Merge pull request #94 from aymeric-roucher/agents
Transformers agents cookbook
aymeric-roucher committed May 13, 2024
2 parents 0336d8c + bba8b7e commit b49ff72
Showing 2 changed files with 370 additions and 0 deletions.
2 changes: 2 additions & 0 deletions notebooks/en/_toctree.yml
@@ -50,6 +50,8 @@
title: Create a legal preference dataset
- local: semantic_cache_chroma_vector_database
title: Implementing semantic cache to improve a RAG system.
- local: agents
title: Build an agent with tool-calling superpowers using Transformers Agents

- title: Computer Vision
sections:
368 changes: 368 additions & 0 deletions notebooks/en/agents.ipynb
@@ -0,0 +1,368 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Build an agent with tool-calling superpowers 🦸 using Transformers Agents\n",
"_Authored by: [Aymeric Roucher](https://huggingface.co/m-ric)_\n",
"\n",
"This notebook demonstrates how you can use [**Transformers Agents**](https://huggingface.co/docs/transformers/en/transformers_agents) to build awesome **agents**!\n",
"\n",
"What are **agents**? Agents are systems that are powered by an LLM and enable the LLM (with careful prompting and output parsing) to use specific *tools* to solve problems.\n",
"\n",
"These *tools* are basically functions that the LLM couldn't perform well by itself: for instance for a text-generation LLM like [Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct), this could be an image generation tool, a web search tool, a calculator...\n",
"\n",
"What is **Transformers Agents**? it's an extension of our `transformers` library that provides building blocks to build your own agents! Learn more about it in the [documentation](https://huggingface.co/docs/transformers/en/transformers_agents).\n",
"\n",
"Let's see how to use it, and which use cases it can solve.\n",
"\n",
"We install transformers agents from source since it has not been released as of writing, but later this week when it gets release you can simply install it with `pip install transformers[agents]`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install \"git+https://github.com/huggingface/transformers.git#egg=transformers[agents]\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install huggingface_hub langchain sentence-transformers faiss-cpu serpapi google-search-results -q"
]
},
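{
"cell_type": "markdown",
"metadata": {},
"source": [
"Depending on your setup, you may need some credentials before running the agents below: a Hugging Face token for the calls to the Inference API, and a [SerpAPI](https://serpapi.com/) key for the LangChain search tool, which reads it from the `SERPAPI_API_KEY` environment variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from huggingface_hub import notebook_login\n",
"\n",
"# Prompts for a Hugging Face token, used for the Inference API calls below\n",
"notebook_login()\n",
"\n",
"# The LangChain SerpAPI tool reads its key from this environment variable\n",
"os.environ[\"SERPAPI_API_KEY\"] = \"your_serpapi_key\"  # replace with your own key"
]
},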
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🏞️ Multimodal + 🌐 Web-browsing assistant\n",
"\n",
"For this use case, we want to show an agent that browses the web and is able to generate image.\n",
"\n",
"To build it, we simply need to have two tools ready: image generation and web search.\n",
"- For image generation, we load a tool from the Hub that uses the HF Inference API (Serverless) to generate images using Stable Diffusion.\n",
"- For the web search, we load a LangChain tool."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/aymeric/venvs/test/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n",
"You're loading a tool from the Hub from None. Please make sure this is a source that you trust as the code within that tool will be executed on your machine. Always verify the code of the tools that you load. We recommend specifying a `revision` to ensure you're loading the code that you have checked.\n",
"\u001b[33;1m======== New task ========\u001b[0m\n",
"\u001b[37;1mGenerate me a photo of the car that James bond drove in the latest movie.\u001b[0m\n",
"\u001b[33;1m==== Agent is executing the code below:\u001b[0m\n",
"\u001b[0m\u001b[38;5;7mlatest_movie\u001b[39m\u001b[38;5;7m \u001b[39m\u001b[38;5;109;01m=\u001b[39;00m\u001b[38;5;7m \u001b[39m\u001b[38;5;7msearch\u001b[39m\u001b[38;5;7m(\u001b[39m\u001b[38;5;144m\"\u001b[39m\u001b[38;5;144mlatest James Bond movie\u001b[39m\u001b[38;5;144m\"\u001b[39m\u001b[38;5;7m)\u001b[39m\n",
"\u001b[38;5;109mprint\u001b[39m\u001b[38;5;7m(\u001b[39m\u001b[38;5;144m\"\u001b[39m\u001b[38;5;144mLatest movie:\u001b[39m\u001b[38;5;144m\"\u001b[39m\u001b[38;5;7m,\u001b[39m\u001b[38;5;7m \u001b[39m\u001b[38;5;7mlatest_movie\u001b[39m\u001b[38;5;7m)\u001b[39m\u001b[0m\n",
"\u001b[33;1m====\u001b[0m\n",
"\u001b[33;1mPrint outputs:\u001b[0m\n",
"\u001b[32;20mLatest movie: No Time to Die\n",
"\u001b[0m\n",
"\u001b[33;1m==== Agent is executing the code below:\u001b[0m\n",
"\u001b[0m\u001b[38;5;7mbond_car\u001b[39m\u001b[38;5;7m \u001b[39m\u001b[38;5;109;01m=\u001b[39;00m\u001b[38;5;7m \u001b[39m\u001b[38;5;7msearch\u001b[39m\u001b[38;5;7m(\u001b[39m\u001b[38;5;144m\"\u001b[39m\u001b[38;5;144mwhat car did James Bond drive in No Time to Die\u001b[39m\u001b[38;5;144m\"\u001b[39m\u001b[38;5;7m)\u001b[39m\n",
"\u001b[38;5;109mprint\u001b[39m\u001b[38;5;7m(\u001b[39m\u001b[38;5;144m\"\u001b[39m\u001b[38;5;144mJames Bond\u001b[39m\u001b[38;5;144m'\u001b[39m\u001b[38;5;144ms car:\u001b[39m\u001b[38;5;144m\"\u001b[39m\u001b[38;5;7m,\u001b[39m\u001b[38;5;7m \u001b[39m\u001b[38;5;7mbond_car\u001b[39m\u001b[38;5;7m)\u001b[39m\u001b[0m\n",
"\u001b[33;1m====\u001b[0m\n",
"\u001b[33;1mPrint outputs:\u001b[0m\n",
"\u001b[32;20mJames Bond's car: Aston Martin DB5\n",
"\u001b[0m\n",
"\u001b[33;1m==== Agent is executing the code below:\u001b[0m\n",
"\u001b[0m\u001b[38;5;7mimage\u001b[39m\u001b[38;5;7m \u001b[39m\u001b[38;5;109;01m=\u001b[39;00m\u001b[38;5;7m \u001b[39m\u001b[38;5;7mimage_generator\u001b[39m\u001b[38;5;7m(\u001b[39m\u001b[38;5;144m\"\u001b[39m\u001b[38;5;144mA photorealistic image of an Aston Martin DB5, the car driven by James Bond in No Time to Die\u001b[39m\u001b[38;5;144m\"\u001b[39m\u001b[38;5;7m)\u001b[39m\n",
"\u001b[38;5;7mfinal_answer\u001b[39m\u001b[38;5;7m(\u001b[39m\u001b[38;5;7mimage\u001b[39m\u001b[38;5;7m)\u001b[39m\u001b[0m\n",
"\u001b[33;1m====\u001b[0m\n",
"\u001b[33;1mPrint outputs:\u001b[0m\n",
"\u001b[32;20m\u001b[0m\n",
"\u001b[33;1m>>> Final answer:\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from transformers import Tool, load_tool, ReactCodeAgent, HfEngine\n",
"\n",
"# Import tool from Hub\n",
"image_generation_tool = load_tool(\"m-ric/text-to-image\")\n",
"\n",
"# Import tool from LangChain\n",
"from langchain.agents import load_tools\n",
"\n",
"search_tool = Tool.from_langchain(load_tools([\"serpapi\"])[0])\n",
"\n",
"llm_engine = HfEngine(\"meta-llama/Meta-Llama-3-70B-Instruct\")\n",
"# Initialize the agent with both tools\n",
"agent = ReactCodeAgent(\n",
" tools=[image_generation_tool, search_tool], llm_engine=llm_engine\n",
")\n",
"\n",
"# Run it!\n",
"agent.run(\"Generate me a photo of the car that James bond drove in the latest movie.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Image of an Aston Martin DB5](\"https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/agents_db5.png\\\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 📚💬 Retrieval-Augmented Generation with source selection\n",
"\n",
"Quick definition: Retrieval-Augmented-Generation (RAG) is “using an LLM to answer a user query, but basing the answer on information retrieved from a knowledge base”. It has many advantages over using a vanilla or fine-tuned LLM: to name a few, it allows to ground the answer on true facts and reduce confabulations, it allows to provide the LLM with domain-specific knowledge, and it allows fine-grained control of access to information from the knowledge base.\n",
"\n",
"Now let’s say we want to perform RAG, but with the additional constraint that some parameters must be dynamically generated. For example, depending on the user query we could want to restrict the search to specific subsets of the knowledge base, or we could want to adjust the number of documents retrieved. The difficulty is: **how to dynamically adjust these parameters based on the user query?**\n",
"\n",
"🔧 Well, we can solve this by in a simple way: we will **give our agent control over these parameters!**\n",
"\n",
"➡️ Let's show how to do this. We first load a knowledge base on which we want to perform RAG: this dataset is a compilation of the documentation pages for many `huggingface` packages, stored as markdown.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"import datasets\n",
"\n",
"knowledge_base = datasets.load_dataset(\"m-ric/huggingface_doc\", split=\"train\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we prepare the knowledge base by processing the dataset and storing it into a vector database to be used by the retriever. We are going to use LangChain, since it features excellent utilities for vector databases:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.docstore.document import Document\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain.vectorstores import FAISS\n",
"from langchain_community.embeddings import HuggingFaceEmbeddings\n",
"\n",
"source_docs = [\n",
" Document(page_content=doc[\"text\"], metadata={\"source\": doc[\"source\"].split(\"/\")[1]})\n",
" for doc in knowledge_base\n",
"]\n",
"\n",
"docs_processed = RecursiveCharacterTextSplitter(chunk_size=500).split_documents(\n",
" source_docs\n",
")[:1000]\n",
"\n",
"embedding_model = HuggingFaceEmbeddings(model_name=\"thenlper/gte-small\")\n",
"vectordb = FAISS.from_documents(documents=docs_processed, embedding=embedding_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have the database ready, let’s build a RAG system that answers user queries based on it!\n",
"\n",
"We want our system to select only from the most relevant sources of information, depending on the query.\n",
"\n",
"Our documentation pages come from the following sources:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"all_sources = list(set([doc.metadata[\"source\"] for doc in docs_processed]))\n",
"all_sources"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"from transformers.agents import Tool\n",
"from langchain_core.vectorstores import VectorStore\n",
"\n",
"\n",
"class RetrieverTool(Tool):\n",
" name = \"retriever\"\n",
" description = \"Retrieves some documents from the knowledge base that have the closest embeddings to the input query.\"\n",
" inputs = {\n",
" \"query\": {\n",
" \"type\": \"text\",\n",
" \"description\": \"The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.\",\n",
" },\n",
" \"source\": {\"type\": \"text\", \"description\": \"\"},\n",
" }\n",
" output_type = \"text\"\n",
"\n",
" def __init__(self, vectordb: VectorStore, all_sources: str, **kwargs):\n",
" super().__init__(**kwargs)\n",
" self.vectordb = vectordb\n",
" self.inputs[\"source\"][\n",
" \"description\"\n",
" ] = f\"The source of the documents to search, as a str representation of a list. Possible values in the list are: {all_sources}. If this argument is not provided, all sources will be searched.\"\n",
"\n",
" def forward(self, query: str, source: str = None) -> str:\n",
" assert isinstance(query, str), \"Your search query must be a string\"\n",
"\n",
" if source:\n",
" if isinstance(source, str) and \"[\" not in str(\n",
" source\n",
" ): # if the source is not representing a list\n",
" source = [source]\n",
" source = json.loads(str(source).replace(\"'\", '\"'))\n",
"\n",
" docs = self.vectordb.similarity_search(\n",
" query, filter=({\"source\": source} if source else None), k=3\n",
" )\n",
"\n",
" if len(docs) == 0:\n",
" return \"No documents found with this filtering. Try removing the source filter.\"\n",
" return \"Retrieved documents:\\n\\n\" + \"\\n===Document===\\n\".join(\n",
" [doc.page_content for doc in docs]\n",
" )"
]
},
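{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before handing the tool to an agent, we can sanity-check it by calling it directly. This is a minimal check: it assumes `datasets` appears in `all_sources` (swap in any source printed above if it does not), and the documents returned will depend on your embeddings:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retriever_tool = RetrieverTool(vectordb, all_sources)\n",
"\n",
"# Restrict the search to a single source, passed as a str representation of a list\n",
"print(retriever_tool(query=\"How to load a dataset\", source=\"['datasets']\"))"
]
},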
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### [Optional] Share your Retriever tool to Hub\n",
"\n",
"To share your tool to the Hub, first copy-paste the code in the RetrieverTool definition cell to a new file named for instance `retriever.py`.\n",
"\n",
"When the tool is loaded from a separate file, you can then push it to the Hub using the code below (make sure to login with a `write` access token)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from huggingface_hub import login\n",
"\n",
"login(\"your_token\")\n",
"\n",
"from retriever import RetrieverTool\n",
"\n",
"tool = RetrieverTool(vectordb, all_sources)\n",
"\n",
"tool.push_to_hub(repo_id=\"m-ric/langchain-retriever-tool\")"
]
},
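{
"cell_type": "markdown",
"metadata": {},
"source": [
"The shared tool can then be loaded back with `load_tool`, just like the image-generation tool earlier. The sketch below assumes that extra keyword arguments to `load_tool` are forwarded to the tool's `__init__`, which this tool needs since its constructor takes a vector store and the list of sources. As the warning earlier reminds us, only load tools from sources you trust:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import load_tool\n",
"\n",
"# Rebuild the tool from the Hub repo, passing the local vector store and sources\n",
"retriever_tool = load_tool(\n",
"    \"m-ric/langchain-retriever-tool\", vectordb=vectordb, all_sources=all_sources\n",
")"
]
},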
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Run the agent!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[33;1m======== New task ========\u001b[0m\n",
"\u001b[37;1mPlease show me a LORA finetuning script\u001b[0m\n",
"\u001b[33;1mCalling tool: 'retriever' with arguments: {'query': 'LORA finetuning script', 'source': 'transformers'}\u001b[0m\n",
"\u001b[33;1mCalling tool: 'retriever' with arguments: {'query': 'LORA finetuning script'}\u001b[0m\n",
"\u001b[33;1mCalling tool: 'final_answer' with arguments: {'answer': 'https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/text_to_image_lora.py'}\u001b[0m\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Final output:\n",
"https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/text_to_image_lora.py\n"
]
}
],
"source": [
"from transformers.agents import HfEngine, ReactJsonAgent\n",
"\n",
"llm_engine = HfEngine(\"meta-llama/Meta-Llama-3-70B-Instruct\")\n",
"\n",
"agent = ReactJsonAgent(\n",
" tools=[RetrieverTool(vectordb, all_sources)], llm_engine=llm_engine\n",
")\n",
"\n",
"agent_output = agent.run(\"Please show me a LORA finetuning script\")\n",
"\n",
"print(\"Final output:\")\n",
"print(agent_output)"
]
},
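{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the logs above, the agent adjusted the `source` argument on its own: it first restricted the search to `transformers`, then retried without a filter. You can also steer the source selection from the prompt. A sketch (the exact trajectory depends on the model, and it assumes `peft` is one of the sources in `all_sources`):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Nudge the agent toward a specific subset of the knowledge base\n",
"agent_output = agent.run(\n",
"    \"Please show me a LORA finetuning script. Search only the 'peft' source.\"\n",
")\n",
"\n",
"print(\"Final output:\")\n",
"print(agent_output)"
]
},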
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ➡️ Conclusion\n",
"\n",
"These two use cases should give you a glimpse into the possibilities of our Agents framework!\n",
"\n",
"For more advanced usage, read the [documentation](https://huggingface.co/docs/transformers/en/transformers_agents), and [this experiment](https://github.com/aymeric-roucher/agent_reasoning_benchmark/blob/main/benchmark_gaia.ipynb) that allowed us to build our own agent based on Llama-3-70B that beats many GPT-4 agents on the very difficult [GAIA Leaderboard](https://huggingface.co/spaces/gaia-benchmark/leaderboard)!\n",
"\n",
"All feedback is welcome, it will help us improve the framework together! 🚀"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "test",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
