182 changes: 182 additions & 0 deletions notebooks/Spanish/chat_pdf_images.ipynb
@@ -0,0 +1,182 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "fd77e3b8",
"metadata": {},
"source": [
"# Chat with PDF page images\n",
"\n",
"**If you are looking for the web app, check the src/ folder.**\n",
"\n",
"This notebook demonstrates how to convert PDF pages to images and send them to a vision model for inference."
]
},
{
"cell_type": "markdown",
"id": "e5eb545b",
"metadata": {},
"source": [
"## Authenticate with OpenAI\n",
"\n",
"The following code connects to OpenAI using either an Azure OpenAI account, GitHub Models, or a local Ollama model. See the README for instructions on configuring the `.env` file."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ae3a4d3",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import azure.identity\n",
"import openai\n",
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv(\".env\", override=True)\n",
"\n",
"openai_host = os.getenv(\"OPENAI_HOST\", \"github\")\n",
"\n",
"if openai_host == \"github\":\n",
" print(\"Using GitHub Models with GITHUB_TOKEN as the key\")\n",
" openai_client = openai.OpenAI(\n",
" api_key=os.environ[\"GITHUB_TOKEN\"],\n",
" base_url=\"https://models.github.ai/inference\",\n",
" )\n",
" model_name = os.getenv(\"OPENAI_MODEL\", \"openai/gpt-4o\")\n",
"elif openai_host == \"local\":\n",
" print(\"Using a local OpenAI-compatible API with no key\")\n",
" openai_client = openai.OpenAI(api_key=\"no-key-required\", base_url=os.environ[\"LOCAL_OPENAI_ENDPOINT\"])\n",
" model_name = os.getenv(\"OPENAI_MODEL\", \"gpt-4o\")\n",
"elif openai_host == \"azure\" and os.getenv(\"AZURE_OPENAI_KEY_FOR_CHATVISION\"):\n",
" # Authentication using an Azure OpenAI API key\n",
" # This is generally not recommended, but is provided for convenience\n",
" print(\"Using Azure OpenAI with an API key\")\n",
" openai_client = openai.OpenAI(\n",
" base_url=os.environ[\"AZURE_OPENAI_ENDPOINT\"] + \"/openai/v1/\",\n",
" api_key=os.environ[\"AZURE_OPENAI_KEY_FOR_CHATVISION\"],\n",
" )\n",
" # This is actually the deployment name, not the model name\n",
" model_name = os.getenv(\"OPENAI_MODEL\", \"gpt-4o\")\n",
"elif openai_host == \"azure\" and os.getenv(\"AZURE_OPENAI_ENDPOINT\"):\n",
" tenant_id = os.environ[\"AZURE_TENANT_ID\"]\n",
" print(\"Using Azure OpenAI with Azure Developer CLI credential for tenant ID\", tenant_id)\n",
" default_credential = azure.identity.AzureDeveloperCliCredential(tenant_id=tenant_id)\n",
" token_provider = azure.identity.get_bearer_token_provider(\n",
" default_credential, \"https://cognitiveservices.azure.com/.default\"\n",
" )\n",
" openai_client = openai.OpenAI(\n",
" base_url=os.environ[\"AZURE_OPENAI_ENDPOINT\"] + \"/openai/v1/\",\n",
" api_key=token_provider,\n",
" )\n",
" # This is actually the deployment name, not the model name\n",
" model_name = os.getenv(\"OPENAI_MODEL\", \"gpt-4o\")\n",
"\n",
"print(f\"Using model {model_name}\")"
]
},
{
"cell_type": "markdown",
"id": "74df1ca5",
"metadata": {},
"source": [
"## Convert PDF pages to images"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e3664e1d",
"metadata": {},
"outputs": [],
"source": [
"%pip install Pillow PyMuPDF"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c8c56b19",
"metadata": {},
"outputs": [],
"source": [
"import pymupdf\n",
"from PIL import Image\n",
"\n",
"filename = \"../plants.pdf\"\n",
"doc = pymupdf.open(filename)\n",
"for i in range(doc.page_count):\n",
" page = doc.load_page(i)\n",
" pix = page.get_pixmap()\n",
" original_img = Image.frombytes(\"RGB\", [pix.width, pix.height], pix.samples)\n",
" original_img.save(f\"page_{i}.png\")"
]
},
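{
"cell_type": "markdown",
"id": "b1c2d3e4",
"metadata": {},
"source": [
"By default, `get_pixmap()` renders pages at 72 DPI, which can leave small text illegible to the vision model. As an optional sketch (using the PyMuPDF `Matrix` zoom API), a page can be rendered at double resolution like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c2d3e4f5",
"metadata": {},
"outputs": [],
"source": [
"# Optional: render at 2x zoom so small text stays legible\n",
"# (a sketch; adjust the zoom factor to trade file size for detail)\n",
"page = doc.load_page(0)\n",
"pix = page.get_pixmap(matrix=pymupdf.Matrix(2, 2))\n",
"print(pix.width, pix.height)  # roughly double the default pixel dimensions"
]
},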
{
"cell_type": "markdown",
"id": "f822fb8f",
"metadata": {},
"source": [
"## Send images to the vision model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1bdaa995",
"metadata": {},
"outputs": [],
"source": [
"import base64\n",
"\n",
"\n",
"def open_image_as_base64(filename):\n",
" with open(filename, \"rb\") as image_file:\n",
" image_data = image_file.read()\n",
" image_base64 = base64.b64encode(image_data).decode(\"utf-8\")\n",
" return f\"data:image/png;base64,{image_base64}\""
]
},
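{
"cell_type": "markdown",
"id": "d3e4f5a6",
"metadata": {},
"source": [
"As an optional sanity check (a sketch, not required for the workflow), the helper's output should be a `data:` URI whose base64 payload round-trips back to the original file bytes:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e4f5a6b7",
"metadata": {},
"outputs": [],
"source": [
"# Verify the data URI round-trips back to the original bytes\n",
"uri = open_image_as_base64(\"page_0.png\")\n",
"assert uri.startswith(\"data:image/png;base64,\")\n",
"with open(\"page_0.png\", \"rb\") as f:\n",
" assert base64.b64decode(uri.split(\",\", 1)[1]) == f.read()"
]
},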
{
"cell_type": "code",
"execution_count": null,
"id": "4de5dca7",
"metadata": {},
"outputs": [],
"source": [
"user_content = [{\"text\": \"What plants are listed on these pages?\", \"type\": \"text\"}]\n",
"# Process only the first few pages, since processing all pages (doc.page_count) is slow\n",
"for i in range(3):\n",
" user_content.append({\"image_url\": {\"url\": open_image_as_base64(f\"page_{i}.png\")}, \"type\": \"image_url\"})\n",
"\n",
"response = openai_client.chat.completions.create(model=model_name, messages=[{\"role\": \"user\", \"content\": user_content}])\n",
"\n",
"print(response.choices[0].message.content)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}