diff --git a/notebooks/en/_toctree.yml b/notebooks/en/_toctree.yml index a87b25a..d2de2e7 100644 --- a/notebooks/en/_toctree.yml +++ b/notebooks/en/_toctree.yml @@ -74,6 +74,8 @@ sections: - local: agents title: Build an agent with tool-calling superpowers using Transformers Agents + - local: agent_rag + title: Agentic RAG - turbocharge your RAG with query reformulation and self-query - title: Enterprise Hub Cookbook isExpanded: True @@ -88,4 +90,3 @@ title: Inference Endpoints (Dedicated) - local: enterprise_cookbook_argilla title: Data annotation with Argilla Spaces - diff --git a/notebooks/en/agent_rag.ipynb b/notebooks/en/agent_rag.ipynb new file mode 100644 index 0000000..78c571d --- /dev/null +++ b/notebooks/en/agent_rag.ipynb @@ -0,0 +1,827 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Agentic RAG: turbocharge your RAG with query reformulation and self-query! 🚀\n", + "_Authored by: [Aymeric Roucher](https://huggingface.co/m-ric)_\n", + "\n", + "> This tutorial is advanced. You should be familiar with the concepts covered in [this other cookbook](advanced_rag) first!\n", + "\n", + "> Reminder: Retrieval-Augmented Generation (RAG) is “using an LLM to answer a user query, but basing the answer on information retrieved from a knowledge base”. It has many advantages over using a vanilla or fine-tuned LLM: to name a few, it lets you ground the answer on true facts and reduce confabulations, it lets you provide the LLM with domain-specific knowledge, and it gives you fine-grained control over access to information from the knowledge base.\n", + "\n", + "But vanilla RAG has two important limitations:\n", + "- It **performs only one retrieval step**: if the results are bad, the generation in turn will be bad.\n", + "- __Semantic similarity is computed with the *user query* as a reference__, which might be suboptimal: for instance, the user query will often be a question while the document containing the true answer is phrased in the affirmative form, so its similarity score will be downgraded compared to source documents phrased as questions, leading to a risk of missing the relevant information.\n", + "\n", + "We can alleviate these problems by building a **RAG agent: very simply, an agent armed with a retriever tool!**\n", + "\n", + "This agent will: ✅ Formulate the query itself and ✅ Critique the retrieved results to re-retrieve if needed.\n", + "\n", + "So it should naturally recover some advanced RAG techniques!\n", + "- Instead of directly using the user query as the reference in semantic search, the agent itself formulates a reference sentence that can be closer to the targeted documents, as in [HyDE](https://huggingface.co/papers/2212.10496)\n", + "- The agent can critique the retrieved snippets and re-retrieve if needed, as in [Self-Query](https://docs.llamaindex.ai/en/stable/examples/evaluation/RetryQuery/)\n", + "\n", + "Let's build this system. 🛠️\n", + "\n", + "Run the line below to install the required dependencies:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install pandas langchain langchain-community sentence-transformers faiss-cpu \"transformers[agents]\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We first load a knowledge base on which we want to perform RAG: this dataset is a compilation of the documentation pages for many `huggingface` packages, stored as markdown."
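Note: later cells call models through the Hugging Face Inference API (for the agent's LLM engine and the LLM judge), so make sure you are authenticated with a Hugging Face token. A minimal sketch, assuming you are running in a notebook environment (`huggingface-cli login` from a terminal works too):

```python
# Log in to the Hugging Face Hub so that Inference API calls are authenticated.
# You will be prompted to paste a token with at least read access.
from huggingface_hub import notebook_login

notebook_login()
```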
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/aymeric/Documents/Code/cookbook/.venv/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n" + ] + } + ], + "source": [ + "import datasets\n", + "\n", + "knowledge_base = datasets.load_dataset(\"m-ric/huggingface_doc\", split=\"train\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we prepare the knowledge base by processing the dataset and storing it in a vector database to be used by the retriever.\n", + "\n", + "We use [LangChain](https://python.langchain.com/) for its excellent vector database utilities.\n", + "For the embedding model, we use [thenlper/gte-small](https://huggingface.co/thenlper/gte-small) since it performed well in our `RAG_evaluation` cookbook." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Splitting documents...\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 2647/2647 [00:34<00:00, 76.04it/s] \n", + "/Users/aymeric/Documents/Code/cookbook/.venv/lib/python3.12/site-packages/langchain_core/_api/deprecation.py:139: LangChainDeprecationWarning: The class `HuggingFaceEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 0.3.0. An updated version of the class exists in the langchain-huggingface package and should be used instead. To use it run `pip install -U langchain-huggingface` and import as `from langchain_huggingface import HuggingFaceEmbeddings`.\n", + " warn_deprecated(\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Embedding documents... This should take a few minutes (5 minutes on MacBook with M1 Pro)\n" + ] + } + ], + "source": [ + "from transformers import AutoTokenizer\n", + "from langchain.docstore.document import Document\n", + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "from langchain.vectorstores import FAISS\n", + "from langchain_community.embeddings import HuggingFaceEmbeddings\n", + "from langchain_community.vectorstores.utils import DistanceStrategy\n", + "from tqdm import tqdm\n", + "\n", + "source_docs = [\n", + " Document(page_content=doc[\"text\"], metadata={\"source\": doc[\"source\"].split(\"/\")[1]})\n", + " for doc in knowledge_base\n", + "]\n", + "\n", + "text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(\n", + " AutoTokenizer.from_pretrained(\"thenlper/gte-small\"),\n", + " chunk_size=200,\n", + " chunk_overlap=20,\n", + " add_start_index=True,\n", + " strip_whitespace=True,\n", + " separators=[\"\\n\\n\", \"\\n\", \".\", \" \", \"\"],\n", + ")\n", + "\n", + "# Split docs and keep only unique chunks\n", + "print(\"Splitting documents...\")\n", + "docs_processed = []\n", + "unique_texts = {}\n", + "for doc in tqdm(source_docs):\n", + " new_docs = text_splitter.split_documents([doc])\n", + " for new_doc in new_docs:\n", + " if new_doc.page_content not in unique_texts:\n", + " unique_texts[new_doc.page_content] = True # key on the chunk text so deduplication actually works\n", + " docs_processed.append(new_doc)\n", + "\n", + "print(\n", + " \"Embedding documents...
This should take a few minutes (5 minutes on MacBook with M1 Pro)\"\n", + ")\n", + "embedding_model = HuggingFaceEmbeddings(model_name=\"thenlper/gte-small\")\n", + "vectordb = FAISS.from_documents(\n", + " documents=docs_processed,\n", + " embedding=embedding_model,\n", + " distance_strategy=DistanceStrategy.COSINE,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now the database is ready: let’s build our agentic RAG system!\n", + "\n", + "πŸ‘‰ We only need a `RetrieverTool` that our agent can leverage to retrieve information from the knowledge base." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "from transformers.agents import Tool\n", + "from langchain_core.vectorstores import VectorStore\n", + "\n", + "\n", + "class RetrieverTool(Tool):\n", + " name = \"retriever\"\n", + " description = \"Using semantic similarity, retrieves some documents from the knowledge base that have the closest embeddings to the input query.\"\n", + " inputs = {\n", + " \"query\": {\n", + " \"type\": \"text\",\n", + " \"description\": \"The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.\",\n", + " }\n", + " }\n", + " output_type = \"text\"\n", + "\n", + " def __init__(self, vectordb: VectorStore, **kwargs):\n", + " super().__init__(**kwargs)\n", + " self.vectordb = vectordb\n", + "\n", + " def forward(self, query: str) -> str:\n", + " assert isinstance(query, str), \"Your search query must be a string\"\n", + "\n", + " docs = self.vectordb.similarity_search(\n", + " query,\n", + " k=7,\n", + " )\n", + "\n", + " return \"\\nRetrieved documents:\\n\" + \"\".join(\n", + " [\n", + " f\"===== Document {str(i)} =====\\n\" + doc.page_content\n", + " for i, doc in enumerate(docs)\n", + " ]\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now it’s straightforward to create an agent that leverages this tool!\n", + "\n", + "The agent will need these arguments upon initialization:\n", + "- *`tools`*: a list of tools that the agent will be able to call.\n", + "- *`llm_engine`*: the LLM that powers the agent.\n", + "\n", + "Our `llm_engine` must be a callable that takes as input a list of [messages](https://huggingface.co/docs/transformers/main/chat_templating) and returns text. It also needs to accept a `stop_sequences` argument that indicates when to stop its generation. For convenience, we directly use the `HfEngine` class provided in the package to get a LLM engine that calls our [Inference API](https://huggingface.co/docs/api-inference/en/index).\n", + "\n", + "And we use [CohereForAI/c4ai-command-r-plus](https://huggingface.co/CohereForAI/c4ai-command-r-plus) as the llm engine because:\n", + "- It has a long 128k context, which is helpful for processing long source documents\n", + "- It is served for free at all times on HF's Inference API!" 
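To make this contract concrete, here is a minimal sketch of what a custom `llm_engine` could look like, built on `huggingface_hub`'s `InferenceClient`. This is an illustrative assumption, not part of the original notebook: the built-in `HfEngine` already handles all of this for you, including remapping the agent's internal message roles.

```python
from typing import Dict, List, Optional

from huggingface_hub import InferenceClient

client = InferenceClient("CohereForAI/c4ai-command-r-plus")


def custom_llm_engine(
    messages: List[Dict[str, str]], stop_sequences: Optional[List[str]] = None
) -> str:
    # Note: transformers.agents uses extra roles such as "tool-response";
    # a production engine (like HfEngine) remaps them to roles the chat API accepts.
    response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1500)
    answer = response.choices[0].message.content
    # Cut the generation at the first stop sequence, if the model echoed one back
    for stop in stop_sequences or []:
        if stop in answer:
            answer = answer[: answer.index(stop)]
    return answer
```

You could then pass `llm_engine=custom_llm_engine` when building the agent below instead of an `HfEngine` instance.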
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from transformers.agents import HfEngine, ReactJsonAgent\n", + "\n", + "llm_engine = HfEngine(\"CohereForAI/c4ai-command-r-plus\")\n", + "\n", + "retriever_tool = RetrieverTool(vectordb)\n", + "agent = ReactJsonAgent(\n", + " tools=[retriever_tool], llm_engine=llm_engine, max_iterations=4, verbose=2\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since we initialized the agent as a `ReactJsonAgent`, it has been automatically given a default system prompt that tells the LLM engine to process step-by-step and generate tool calls as JSON blobs (you could replace this prompt template with your own as needed).\n", + "\n", + "Then when its `.run()` method is launched, the agent takes care of calling the LLM engine, parsing the tool call JSON blobs and executing these tool calls, all in a loop that ends only when the final answer is provided." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001b[33;1m======== New task ========\u001b[0m\n", + "\u001b[37;1mHow can I push a model to the Hub?\u001b[0m\n", + "\u001b[38;20mSystem prompt is as follows:\u001b[0m\n", + "\u001b[38;20mYou are an expert assistant who can solve any task using JSON tool calls. You will be given a task to solve as best you can.\n", + "To do so, you have been given access to the following tools: 'retriever', 'final_answer'\n", + "The way you use the tools is by specifying a json blob, ending with ''.\n", + "Specifically, this json should have an `action` key (name of the tool to use) and an `action_input` key (input to the tool).\n", + "\n", + "The $ACTION_JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. It should be formatted in json. Do not try to escape special characters. Here is the template of a valid $ACTION_JSON_BLOB:\n", + "{\n", + " \"action\": $TOOL_NAME,\n", + " \"action_input\": $INPUT\n", + "}\n", + "\n", + "Make sure to have the $INPUT as a dictionary in the right format for the tool you are using, and do not put variable names as input if you can find the right values.\n", + "\n", + "You should ALWAYS use the following format:\n", + "\n", + "Thought: you should always think about one action to take. Then use the action as follows:\n", + "Action:\n", + "$ACTION_JSON_BLOB\n", + "Observation: the result of the action\n", + "... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $ACTION_JSON_BLOB must only use a SINGLE action at a time.)\n", + "\n", + "You can use the result of the previous action as input for the next action.\n", + "The observation will always be a string: it can represent a file, like \"image_1.jpg\".\n", + "Then you can use it as input for the next action. You can do it for instance as follows:\n", + "\n", + "Observation: \"image_1.jpg\"\n", + "\n", + "Thought: I need to transform the image that I received in the previous observation to make it green.\n", + "Action:\n", + "{\n", + " \"action\": \"image_transformer\",\n", + " \"action_input\": {\"image\": \"image_1.jpg\"}\n", + "}\n", + "\n", + "To provide the final answer to the task, use an action blob with \"action\": \"final_answer\" tool. It is the only way to complete the task, else you will be stuck on a loop. 
So your final output should look like this:\n", + "Action:\n", + "{\n", + " \"action\": \"final_answer\",\n", + " \"action_input\": {\"answer\": \"insert your final answer here\"}\n", + "}\n", + "\n", + "\n", + "Here are a few examples using notional tools:\n", + "---\n", + "Task: \"Generate an image of the oldest person in this document.\"\n", + "\n", + "Thought: I will proceed step by step and use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.\n", + "Action:\n", + "{\n", + " \"action\": \"document_qa\",\n", + " \"action_input\": {\"document\": \"document.pdf\", \"question\": \"Who is the oldest person mentioned?\"}\n", + "}\n", + "Observation: \"The oldest person in the document is John Doe, a 55 year old lumberjack living in Newfoundland.\"\n", + "\n", + "\n", + "Thought: I will now generate an image showcasing the oldest person.\n", + "Action:\n", + "{\n", + " \"action\": \"image_generator\",\n", + " \"action_input\": {\"text\": \"\"A portrait of John Doe, a 55-year-old man living in Canada.\"\"}\n", + "}\n", + "Observation: \"image.png\"\n", + "\n", + "Thought: I will now return the generated image.\n", + "Action:\n", + "{\n", + " \"action\": \"final_answer\",\n", + " \"action_input\": \"image.png\"\n", + "}\n", + "\n", + "---\n", + "Task: \"What is the result of the following operation: 5 + 3 + 1294.678?\"\n", + "\n", + "Thought: I will use python code evaluator to compute the result of the operation and then return the final answer using the `final_answer` tool\n", + "Action:\n", + "{\n", + " \"action\": \"python_interpreter\",\n", + " \"action_input\": {\"code\": \"5 + 3 + 1294.678\"}\n", + "}\n", + "Observation: 1302.678\n", + "\n", + "Thought: Now that I know the result, I will now return it.\n", + "Action:\n", + "{\n", + " \"action\": \"final_answer\",\n", + " \"action_input\": \"1302.678\"\n", + "}\n", + "\n", + "---\n", + "Task: \"Which city has the highest population , Guangzhou or Shanghai?\"\n", + "\n", + "Thought: I need to get the populations for both cities and compare them: I will use the tool `search` to get the population of both cities.\n", + "Action:\n", + "{\n", + " \"action\": \"search\",\n", + " \"action_input\": \"Population Guangzhou\"\n", + "}\n", + "Observation: ['Guangzhou has a population of 15 million inhabitants as of 2021.']\n", + "\n", + "\n", + "Thought: Now let's get the population of Shanghai using the tool 'search'.\n", + "Action:\n", + "{\n", + " \"action\": \"search\",\n", + " \"action_input\": \"Population Shanghai\"\n", + "}\n", + "Observation: '26 million (2019)'\n", + "\n", + "Thought: Now I know that Shanghai has a larger population. Let's return the result.\n", + "Action:\n", + "{\n", + " \"action\": \"final_answer\",\n", + " \"action_input\": \"Shanghai\"\n", + "}\n", + "\n", + "\n", + "Above example were using notional tools that might not exist for you. You only have access to those tools:\n", + "\n", + "- retriever: Using semantic similarity, retrieves some documents from the knowledge base that have the closest embeddings to the input query.\n", + " Takes inputs: {'query': {'type': 'text', 'description': 'The query to perform. This should be semantically close to your target documents. 
Use the affirmative form rather than a question.'}}\n", + "\n", + "- final_answer: Provides a final answer to the given problem\n", + " Takes inputs: {'answer': {'type': 'text', 'description': 'The final answer to the problem'}}\n", + "\n", + "Here are the rules you should always follow to solve your task:\n", + "1. ALWAYS provide a 'Thought:' sequence, and an 'Action:' sequence that ends with , else you will fail.\n", + "2. Always use the right arguments for the tools. Never use variable names in the 'action_input' field, use the value instead.\n", + "3. Call a tool only when needed: do not call the search agent if you do not need information, try to solve the task yourself.\n", + "4. Never re-do a tool call that you previously did with the exact same parameters.\n", + "\n", + "Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000.\n", + "\u001b[0m\n", + "\u001b[38;20m===== New step =====\u001b[0m\n", + "===== Calling LLM with this last message: =====\n", + "{'role': , 'content': 'Task: How can I push a model to the Hub?'}\n", + "\u001b[38;20m===== Output message of the LLM: =====\u001b[0m\n", + "\u001b[38;20mThought: I can use the \"retriever\" tool to find documents relevant to the question, \"How can I push a model to the Hub?\" I will then read through the retrieved documents to find the relevant information and provide an answer to the question.\n", + "\n", + "Action: ```json\n", + "{\n", + " \"action\": \"retriever\",\n", + " \"action_input\": {\n", + " \"query\": \"How can I push a model to the Hub?\"\n", + " }\n", + "}\u001b[0m\n", + "\u001b[38;20m===== Extracting action =====\u001b[0m\n", + "\u001b[33;1mCalling tool: 'retriever' with arguments: {'query': 'How can I push a model to the Hub?'}\u001b[0m\n", + "Retrieved documents:\n", + "===== Document 0 =====\n", + "# Step 7. Push everything to the Hub\n", + " api.upload_folder(\n", + " repo_id=repo_id,\n", + " folder_path=repo_local_path,\n", + " path_in_repo=\".\",\n", + " )\n", + "\n", + " print(\"Your model is pushed to the Hub. You can view your model here: \", repo_url)\n", + "```\n", + "\n", + "### .\n", + "\n", + "By using `push_to_hub` **you evaluate, record a replay, generate a model card of your agent and push it to the Hub**.===== Document 1 =====\n", + "```py\n", + ">>> trainer.push_to_hub()\n", + "```\n", + "\n", + "\n", + "Share a model to the Hub with [`PushToHubCallback`]. In the [`PushToHubCallback`] function, add:\n", + "\n", + "- An output directory for your model.\n", + "- A tokenizer.\n", + "- The `hub_model_id`, which is your Hub username and model name.\n", + "\n", + "```py\n", + ">>> from transformers import PushToHubCallback\n", + "\n", + ">>> push_to_hub_callback = PushToHubCallback(\n", + "... output_dir=\"./your_model_save_path\", tokenizer=tokenizer, hub_model_id=\"your-username/my-awesome-model\"\n", + "... )\n", + "```===== Document 2 =====\n", + "Let's pretend we've now fine-tuned the model. The next step would be to push it to the Hub! We can do this with the `timm.models.hub.push_to_hf_hub` function.\n", + "\n", + "```py\n", + ">>> model_cfg = dict(labels=['a', 'b', 'c', 'd'])\n", + ">>> timm.models.hub.push_to_hf_hub(model, 'resnet18-random', model_config=model_cfg)\n", + "```\n", + "\n", + "Running the above would push the model to `/resnet18-random` on the Hub. 
You can now share this model with your friends, or use it in your own code!\n", + "\n", + "## Loading a Model===== Document 3 =====\n", + "processor.push_to_hub(hub_model_id)\n", + "trainer.push_to_hub(**kwargs)\n", + "```\n", + "\n", + "# 4. Inference\n", + "\n", + "Now comes the exciting part, using our fine-tuned model! In this section, we'll show how you can load your model from the hub and use it for inference.===== Document 4 =====\n", + "--push_to_hub\n", + "```===== Document 5 =====\n", + ". The second way to upload a model, though, is to call model.push_to_hub(). So this is more of a once-off method - it's not called regularly during training. You can just call this manually whenever you want to upload a model to the hub. So we recommend running this after the end of training, just to make sure that you have a commit message just to guarantee that this was the final version of the model at the end of training. And it just makes sure that you're working with the definitive end-of-training model and not accidentally using a model that's from a checkpoint somewhere along the way===== Document 6 =====\n", + "Finally, if you want, you can push your model up to the hub. Here, we'll push it up if you specified `push_to_hub=True` in the training configuration. Note that in order to push to hub, you'll have to have git-lfs installed and be logged into your Hugging Face account (which can be done via `huggingface-cli login`).\n", + "\n", + "```python\n", + "kwargs = {\n", + " \"finetuned_from\": model.config._name_or_path,\n", + " \"tasks\": \"image-classification\",\n", + " \"dataset\": 'beans',\n", + " \"tags\": ['image-classification'],\n", + "}\n", + "\u001b[38;20m===== New step =====\u001b[0m\n", + "===== Calling LLM with this last message: =====\n", + "{'role': , 'content': 'Observation: Retrieved documents:\\n===== Document 0 =====\\n# Step 7. Push everything to the Hub\\n api.upload_folder(\\n repo_id=repo_id,\\n folder_path=repo_local_path,\\n path_in_repo=\".\",\\n )\\n\\n print(\"Your model is pushed to the Hub. You can view your model here: \", repo_url)\\n```\\n\\n### .\\n\\nBy using `push_to_hub` **you evaluate, record a replay, generate a model card of your agent and push it to the Hub**.===== Document 1 =====\\n```py\\n>>> trainer.push_to_hub()\\n```\\n\\n\\nShare a model to the Hub with [`PushToHubCallback`]. In the [`PushToHubCallback`] function, add:\\n\\n- An output directory for your model.\\n- A tokenizer.\\n- The `hub_model_id`, which is your Hub username and model name.\\n\\n```py\\n>>> from transformers import PushToHubCallback\\n\\n>>> push_to_hub_callback = PushToHubCallback(\\n... output_dir=\"./your_model_save_path\", tokenizer=tokenizer, hub_model_id=\"your-username/my-awesome-model\"\\n... )\\n```===== Document 2 =====\\nLet\\'s pretend we\\'ve now fine-tuned the model. The next step would be to push it to the Hub! We can do this with the `timm.models.hub.push_to_hf_hub` function.\\n\\n```py\\n>>> model_cfg = dict(labels=[\\'a\\', \\'b\\', \\'c\\', \\'d\\'])\\n>>> timm.models.hub.push_to_hf_hub(model, \\'resnet18-random\\', model_config=model_cfg)\\n```\\n\\nRunning the above would push the model to `/resnet18-random` on the Hub. You can now share this model with your friends, or use it in your own code!\\n\\n## Loading a Model===== Document 3 =====\\nprocessor.push_to_hub(hub_model_id)\\ntrainer.push_to_hub(**kwargs)\\n```\\n\\n# 4. Inference\\n\\nNow comes the exciting part, using our fine-tuned model! 
In this section, we\\'ll show how you can load your model from the hub and use it for inference.===== Document 4 =====\\n--push_to_hub\\n```===== Document 5 =====\\n. The second way to upload a model, though, is to call model.push_to_hub(). So this is more of a once-off method - it\\'s not called regularly during training. You can just call this manually whenever you want to upload a model to the hub. So we recommend running this after the end of training, just to make sure that you have a commit message just to guarantee that this was the final version of the model at the end of training. And it just makes sure that you\\'re working with the definitive end-of-training model and not accidentally using a model that\\'s from a checkpoint somewhere along the way===== Document 6 =====\\nFinally, if you want, you can push your model up to the hub. Here, we\\'ll push it up if you specified `push_to_hub=True` in the training configuration. Note that in order to push to hub, you\\'ll have to have git-lfs installed and be logged into your Hugging Face account (which can be done via `huggingface-cli login`).\\n\\n```python\\nkwargs = {\\n \"finetuned_from\": model.config._name_or_path,\\n \"tasks\": \"image-classification\",\\n \"dataset\": \\'beans\\',\\n \"tags\": [\\'image-classification\\'],\\n}'}\n", + "\u001b[38;20m===== Output message of the LLM: =====\u001b[0m\n", + "\u001b[38;20mThought: I have found multiple ways to push a model to the Hub. In order to answer comprehensively, I will provide each code snippet that describes how to push a model to the Hub.\n", + "\n", + "Action: ```json\n", + "{\n", + " \"action\": \"final_answer\",\n", + " \"action_input\": {\n", + " \"answer\": \"There are multiple ways to push a model to the Hub. Here are a few examples using different libraries and functions:\\n\\nUsing the `api`:\\n```python\\napi.upload_folder(\\n repo_id=repo_id,\\n folder_path=repo_local_path,\\n path_in_repo=\\\".\\\",\\n)\\n\\nprint(\\\"Your model is pushed to the Hub. You can view your model here:\\\", repo_url)\\n```\\n\\nWith Transformers:\\n```python\\nfrom transformers import PushToHubCallback\\n\\n# Initialize the callback with the output directory,\\ntokenizer, and your Hub username and model name\\npush_to_hub_callback = PushToHubCallback(\\n output_dir=\\\"./your_model_save_path\\\",\\n tokenizer=tokenizer,\\n hub_model_id=\\\"your-username/my-awesome-model\\\"\\n)\\n\\n# Assuming `trainer` is your Trainer object\\ntrainer.add_callback(push_to_hub_callback)\\n```\\n\\nUsing `timm`:\\n```python\\nfrom timm.models.hub import push_to_hf_hub\\n\\n# Assuming `model` is your fine-tuned model\\nmodel_cfg = {\\\"labels\\\": [\\\"a\\\", \\\"b\\\", \\\"c\\\", \\\"d\\\"]}\\npush_to_hf_hub(model, 'resnet18-random', model_config=model_cfg)\\n```\\n\\nFor computer vision models, you can also use `push_to_hub`:\\n```python\\nprocessor.push_to_hub(hub_model_id)\\ntrainer.push_to_hub(**kwargs)\\n```\\n\\nYou can also manually push a model with `model.push_to_hub()`:\\n```python\\nmodel.push_to_hub()\\n```\\n\\nAdditionally, you can opt to push your model to the Hub at the end of training by specifying `push_to_hub=True` in the training configuration. Don't forget to have git-lfs installed and be logged into your Hugging Face account.\"\n", + " }\n", + "}\u001b[0m\n", + "\u001b[38;20m===== Extracting action =====\u001b[0m\n", + "\u001b[33;1mCalling tool: 'final_answer' with arguments: {'answer': \"There are multiple ways to push a model to the Hub. 
Here are a few examples using different libraries and functions:\\n\\nUsing the `api`:\\npython\\napi.upload_folder(\\n repo_id=repo_id,\\n folder_path=repo_local_path,\\n path_in_repo='.',\\n)\\n\\nprint('Your model is pushed to the Hub. You can view your model here:', repo_url)\\n\\n\\nWith Transformers:\\npython\\nfrom transformers import PushToHubCallback\\n\\n# Initialize the callback with the output directory,\\ntokenizer, and your Hub username and model name\\npush_to_hub_callback = PushToHubCallback(\\n output_dir='./your_model_save_path',\\n tokenizer=tokenizer,\\n hub_model_id='your-username/my-awesome-model'\\n)\\n\\n# Assuming `trainer` is your Trainer object\\ntrainer.add_callback(push_to_hub_callback)\\n\\n\\nUsing `timm`:\\npython\\nfrom timm.models.hub import push_to_hf_hub\\n\\n# Assuming `model` is your fine-tuned model\\nmodel_cfg = {'labels': ['a', 'b', 'c', 'd']}\\npush_to_hf_hub(model, 'resnet18-random', model_config=model_cfg)\\n\\n\\nFor computer vision models, you can also use `push_to_hub`:\\npython\\nprocessor.push_to_hub(hub_model_id)\\ntrainer.push_to_hub(**kwargs)\\n\\n\\nYou can also manually push a model with `model.push_to_hub()`:\\npython\\nmodel.push_to_hub()\\n\\n\\nAdditionally, you can opt to push your model to the Hub at the end of training by specifying `push_to_hub=True` in the training configuration. Don't forget to have git-lfs installed and be logged into your Hugging Face account.\"}\u001b[0m\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Final output:\n", + "There are multiple ways to push a model to the Hub. Here are a few examples using different libraries and functions:\n", + "\n", + "Using the `api`:\n", + "python\n", + "api.upload_folder(\n", + " repo_id=repo_id,\n", + " folder_path=repo_local_path,\n", + " path_in_repo='.',\n", + ")\n", + "\n", + "print('Your model is pushed to the Hub. You can view your model here:', repo_url)\n", + "\n", + "\n", + "With Transformers:\n", + "python\n", + "from transformers import PushToHubCallback\n", + "\n", + "# Initialize the callback with the output directory,\n", + "tokenizer, and your Hub username and model name\n", + "push_to_hub_callback = PushToHubCallback(\n", + " output_dir='./your_model_save_path',\n", + " tokenizer=tokenizer,\n", + " hub_model_id='your-username/my-awesome-model'\n", + ")\n", + "\n", + "# Assuming `trainer` is your Trainer object\n", + "trainer.add_callback(push_to_hub_callback)\n", + "\n", + "\n", + "Using `timm`:\n", + "python\n", + "from timm.models.hub import push_to_hf_hub\n", + "\n", + "# Assuming `model` is your fine-tuned model\n", + "model_cfg = {'labels': ['a', 'b', 'c', 'd']}\n", + "push_to_hf_hub(model, 'resnet18-random', model_config=model_cfg)\n", + "\n", + "\n", + "For computer vision models, you can also use `push_to_hub`:\n", + "python\n", + "processor.push_to_hub(hub_model_id)\n", + "trainer.push_to_hub(**kwargs)\n", + "\n", + "\n", + "You can also manually push a model with `model.push_to_hub()`:\n", + "python\n", + "model.push_to_hub()\n", + "\n", + "\n", + "Additionally, you can opt to push your model to the Hub at the end of training by specifying `push_to_hub=True` in the training configuration. 
Don't forget to have git-lfs installed and be logged into your Hugging Face account.\n" + ] + } + ], + "source": [ + "agent_output = agent.run(\"How can I push a model to the Hub?\")\n", + "\n", + "print(\"Final output:\")\n", + "print(agent_output)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Agentic RAG vs. standard RAG\n", + "\n", + "Does the agent setup make a better RAG system? Well, let's compare it to a standard RAG system using an LLM judge!\n", + "\n", + "We will use [meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) for evaluation since it's one of the strongest open-source models we have tested for LLM judge use cases." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "eval_dataset = datasets.load_dataset(\"m-ric/huggingface_doc_qa_eval\", split=\"train\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before running the test, let's make the agent less verbose." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "\n", + "agent.logger.setLevel(logging.WARNING)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "outputs_agentic_rag = []\n", + "\n", + "for example in tqdm(eval_dataset):\n", + " question = example[\"question\"]\n", + "\n", + " enhanced_question = f\"\"\"Using the information contained in your knowledge base, which you can access with the 'retriever' tool,\n", + "give a comprehensive answer to the question below.\n", + "Respond only to the question asked, response should be concise and relevant to the question.\n", + "If you cannot find information, do not give up and try calling your retriever again with different arguments!\n", + "Make sure to have covered the question completely by calling the retriever tool several times with semantically different queries.\n", + "Your queries should not be questions but affirmative form sentences: e.g.
rather than \"How do I load a model from the Hub in bf16?\", query should be \"load a model from the Hub bf16 weights\".\n", + "\n", + "Question:\n", + "{question}\"\"\"\n", + " answer = agent.run(enhanced_question)\n", + " print(\"=======================================================\")\n", + " print(f\"Question: {question}\")\n", + " print(f\"Answer: {answer}\")\n", + " print(f'True answer: {example[\"answer\"]}')\n", + "\n", + " results_agentic = {\n", + " \"question\": question,\n", + " \"true_answer\": example[\"answer\"],\n", + " \"source_doc\": example[\"source_doc\"],\n", + " \"generated_answer\": answer,\n", + " }\n", + " outputs_agentic_rag.append(results_agentic)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from huggingface_hub import InferenceClient\n", + "\n", + "reader_llm = InferenceClient(\"CohereForAI/c4ai-command-r-plus\")\n", + "\n", + "outputs_standard_rag = []\n", + "\n", + "for example in tqdm(eval_dataset):\n", + " question = example[\"question\"]\n", + " context = retriever_tool(question)\n", + "\n", + " prompt = f\"\"\"Given the question and supporting documents below, give a comprehensive answer to the question.\n", + "Respond only to the question asked, response should be concise and relevant to the question.\n", + "Provide the number of the source document when relevant.\n", + "If you cannot find information, do not give up and try calling your retriever again with different arguments!\n", + "\n", + "Question:\n", + "{question}\n", + "\n", + "{context}\n", + "\"\"\"\n", + " messages = [{\"role\": \"user\", \"content\": prompt}]\n", + " answer = reader_llm.chat_completion(messages).choices[0].message.content\n", + "\n", + " print(\"=======================================================\")\n", + " print(f\"Question: {question}\")\n", + " print(f\"Answer: {answer}\")\n", + " print(f'True answer: {example[\"answer\"]}')\n", + "\n", + " results_agentic = {\n", + " \"question\": question,\n", + " \"true_answer\": example[\"answer\"],\n", + " \"source_doc\": example[\"source_doc\"],\n", + " \"generated_answer\": answer,\n", + " }\n", + " outputs_standard_rag.append(results_agentic)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The evaluation prompt follows some of the best principles shown in [our llm_judge cookbook](llm_judge): it follows a small integer Likert scale, has clear criteria, and a description for each score." + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "metadata": {}, + "outputs": [], + "source": [ + "EVALUATION_PROMPT = \"\"\"You are a fair evaluator language model.\n", + "\n", + "You will be given an instruction, a response to evaluate, a reference answer that gets a score of 3, and a score rubric representing a evaluation criteria are given.\n", + "1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general.\n", + "2. After writing a feedback, write a score that is an integer between 1 and 3. You should refer to the score rubric.\n", + "3. The output format should look as follows: \\\"Feedback: {{write a feedback for criteria}} [RESULT] {{an integer number between 1 and 3}}\\\"\n", + "4. Please do not generate any other opening, closing, and explanations. Be sure to include [RESULT] in your output.\n", + "5. 
Do not score conciseness: a correct answer that covers the question should receive max score, even if it contains additional useless information.\n", + "\n", + "The instruction to evaluate:\n", + "{instruction}\n", + "\n", + "Response to evaluate:\n", + "{response}\n", + "\n", + "Reference Answer (Score 3):\n", + "{reference_answer}\n", + "\n", + "Score Rubrics:\n", + "[Is the response complete, accurate, and factual based on the reference answer?]\n", + "Score 1: The response is completely incomplete, inaccurate, and/or not factual.\n", + "Score 2: The response is somewhat complete, accurate, and/or factual.\n", + "Score 3: The response is completely complete, accurate, and/or factual.\n", + "\n", + "Feedback:\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": 69, + "metadata": {}, + "outputs": [], + "source": [ + "from huggingface_hub import InferenceClient\n", + "\n", + "evaluation_client = InferenceClient(\"meta-llama/Meta-Llama-3-70B-Instruct\")" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 65/65 [02:24<00:00, 2.23s/it]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Average score for agentic RAG: 78.5%\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 65/65 [02:17<00:00, 2.12s/it]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Average score for standard RAG: 70.0%\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "for rag_type, outputs in [\n", + " (\"agentic\", outputs_agentic_rag),\n", + " (\"standard\", outputs_standard_rag),\n", + "]:\n", + " for experiment in tqdm(outputs):\n", + " eval_prompt = EVALUATION_PROMPT.format(\n", + " instruction=experiment[\"question\"],\n", + " response=experiment[\"generated_answer\"],\n", + " reference_answer=experiment[\"true_answer\"],\n", + " )\n", + "\n", + " eval_result = evaluation_client.text_generation(\n", + " eval_prompt, max_new_tokens=1000\n", + " )\n", + " try:\n", + " feedback, score = [item.strip() for item in eval_result.split(\"[RESULT]\")]\n", + " experiment[\"eval_score_LLM_judge\"] = score\n", + " experiment[\"eval_feedback_LLM_judge\"] = feedback\n", + " except Exception:\n", + " print(f\"Parsing failed - output was: {eval_result}\")\n", + "\n", + " results = pd.DataFrame.from_dict(outputs)\n", + " results = results.loc[~results[\"generated_answer\"].str.contains(\"Error\")]\n", + " results[\"eval_score_LLM_judge_int\"] = (\n", + " results[\"eval_score_LLM_judge\"].fillna(1).apply(lambda x: int(x))\n", + " )\n", + " results[\"eval_score_LLM_judge_int\"] = (results[\"eval_score_LLM_judge_int\"] - 1) / 2\n", + "\n", + " print(\n", + " f\"Average score for {rag_type} RAG: {results['eval_score_LLM_judge_int'].mean()*100:.1f}%\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Let's recap: the agent setup improves scores by 8.5 points compared to standard RAG!** (from 70.0% to 78.5%)\n", + "\n", + "This is a great improvement, with a very simple setup 🚀\n", + "\n", + "(For a baseline, using Llama-3-70B without the knowledge base scored 36%.)"\n + ] + } + ],
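As a side note on how the percentages above are computed: the judge emits a Likert score in {1, 2, 3}, which the evaluation loop rescales to [0, 1] with (score - 1) / 2 before averaging. A tiny illustration with made-up scores:

```python
# Made-up judge scores, purely to illustrate the normalization used in the loop above.
scores = [3, 3, 2, 1]

normalized = [(s - 1) / 2 for s in scores]  # 3 -> 1.0, 2 -> 0.5, 1 -> 0.0
print(f"Average score: {sum(normalized) / len(normalized) * 100:.1f}%")  # 62.5%
```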
"metadata": { + "kernelspec": { + "display_name": "disposable", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.0" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/en/agents.ipynb b/notebooks/en/agents.ipynb index 92ba0ed..f156c8c 100644 --- a/notebooks/en/agents.ipynb +++ b/notebooks/en/agents.ipynb @@ -703,9 +703,9 @@ ], "metadata": { "kernelspec": { - "display_name": "test2", + "display_name": "disposable", "language": "python", - "name": "test2" + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -717,7 +717,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.0" + "version": "3.10.14" } }, "nbformat": 4, diff --git a/notebooks/en/rag_evaluation.ipynb b/notebooks/en/rag_evaluation.ipynb index 7d33aa1..812a2bb 100644 --- a/notebooks/en/rag_evaluation.ipynb +++ b/notebooks/en/rag_evaluation.ipynb @@ -1148,7 +1148,7 @@ "\n", "def run_rag_tests(\n", " eval_dataset: datasets.Dataset,\n", - " llm: BaseChatModel,\n", + " llm,\n", " knowledge_index: VectorStore,\n", " output_file: str,\n", " reranker: Optional[RAGPretrainedModel] = None,\n", @@ -1257,7 +1257,7 @@ "\n", "def evaluate_answers(\n", " answer_path: str,\n", - " eval_chat_model: BaseChatModel,\n", + " eval_chat_model,\n", " evaluator_name: str,\n", " evaluation_prompt_template: ChatPromptTemplate,\n", ") -> None:\n", @@ -1465,7 +1465,7 @@ " },\n", " color_continuous_scale=\"bluered\",\n", ")\n", - "fig.update_layout(w\n", + "fig.update_layout(\n", " width=1000,\n", " height=600,\n", " barmode=\"group\",\n",