Chain-of-Verification - Prompt Engineering

Chain-of-Verification is a prompt engineering technique to reduce hallucinations from LLMs! Research Paper (Meta AI): https://arxiv.org/pdf/2309.11495.pdf This recipe uses AIConfig - an open-source config-based framework for building generative AI applications. https://github.com/lastmile-ai/aiconfig/tree/main
huggingface · Feb 16, 2024 · 129d4fe
1 parent 938c4ae
commit 129d4fe
Showing 1 changed file with 386 additions and 0 deletions.
diff --git a/notebooks/en/chain_of_verification.ipynb b/notebooks/en/chain_of_verification.ipynb
@@ -0,0 +1,386 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "_KIVLBz840Bi"
+      },
+      "source": [
+        "# Chain-of-Verification Recipe - Prompt Engineering\n",
+        "Chain-of-Verification (CoVe) is a **prompt engineering technique to reduce hallucinations!** An LLM generates a baseline response to a user query, but this response might contain errors. CoVe plans a set of verification questions, answers them, and uses the answers to validate the baseline response. The final answer is then revised based on these validations, which tends to make it more accurate than the initial response. **[Link to Paper](https://arxiv.org/pdf/2309.11495.pdf)**\n",
+        "\n",
+        "**Check out the open-source tool used here! 🚀 [AIConfig Github Repo](https://github.com/lastmile-ai/aiconfig)**\n",
+        "\n",
+        "[Link to Colab](https://colab.research.google.com/drive/1h_Cneit5S2wI4nVPKI8AWGzTadFHwDk3#scrollTo=4MiNxiJc9GPI)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "k3tsITZhVFp-"
+      },
+      "outputs": [],
+      "source": [
+        "# Install AIConfig package\n",
+        "!pip install python-aiconfig"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 3,
+      "metadata": {
+        "id": "51w-3OZC_Z97"
+      },
+      "outputs": [],
+      "source": [
+        "# Import required modules from AIConfig and other dependencies\n",
+        "import os\n",
+        "import openai\n",
+        "import pandas as pd\n",
+        "from aiconfig import AIConfigRuntime, CallbackManager, InferenceOptions\n",
+        "from google.colab import userdata  # Colab-only: reads notebook secrets\n",
+        "from IPython.display import display, Markdown\n",
+        "\n",
+        "# Use your OpenAI key (stored here as a Colab secret named 'openai_key')\n",
+        "os.environ['OPENAI_API_KEY'] = userdata.get('openai_key')"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Z1Y91C-iIyq4"
+      },
+      "source": [
+        "**The cell below defines the CoVe prompt template config.**\n",
+        "\n",
+        "Alternatively, you can also download the config [here](https://github.com/lastmile-ai/aiconfig/blob/main/cookbooks/Chain-of-Verification/cove_template_config.json) and load the config with\n",
+        "\n",
+        "`config = AIConfigRuntime.load('cove_template_config.json')`."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 9,
+      "metadata": {
+        "cellView": "form",
+        "id": "8VQVZicOGN5b"
+      },
+      "outputs": [],
+      "source": [
+        "# @title\n",
+        "cove_template_config = {\n",
+        "  \"name\": \"Chain-of-Verification (CoVe)  Template\",\n",
+        "  \"schema_version\": \"latest\",\n",
+        "  \"metadata\": {\n",
+        "    \"models\": {\n",
+        "      \"gpt-4\": {\n",
+        "        \"model\": \"gpt-4\",\n",
+        "        \"top_p\": 1,\n",
+        "        \"temperature\": 0,\n",
+        "        \"presence_penalty\": 0,\n",
+        "        \"frequency_penalty\": 0\n",
+        "      }\n",
+        "    },\n",
+        "    \"parameters\": {\n",
+        "      \"baseline_prompt\": \"Name 25 politicians who were born in New York City, New York. \",\n",
+        "      \"verification_question\": \"Where was {{entity}} born? \"\n",
+        "    }\n",
+        "  },\n",
+        "  \"prompts\": [\n",
+        "    {\n",
+        "      \"name\": \"baseline_response_gen\",\n",
+        "      \"input\": \"{{baseline_prompt}}\",\n",
+        "      \"metadata\": {\n",
+        "        \"model\": {\n",
+        "          \"name\": \"gpt-4\",\n",
+        "          \"settings\": {\n",
+        "            \"system_prompt\": \"\"\n",
+        "          }\n",
+        "        },\n",
+        "        \"parameters\": {},\n",
+        "        \"remember_chat_context\": False\n",
+        "      }\n",
+        "    },\n",
+        "    {\n",
+        "      \"name\": \"verification\",\n",
+        "      \"input\": \"{{verification_question}}\",\n",
+        "      \"metadata\": {\n",
+        "        \"model\": {\n",
+        "          \"name\": \"gpt-4\",\n",
+        "          \"settings\": {\n",
+        "            \"system_prompt\": \"{{entity}}\"\n",
+        "          }\n",
+        "        },\n",
+        "        \"parameters\": {\n",
+        "          \"entity\": \"George Pataki\"\n",
+        "        },\n",
+        "        \"remember_chat_context\": False\n",
+        "      }\n",
+        "    },\n",
+        "    {\n",
+        "      \"name\": \"final_response_gen\",\n",
+        "      \"input\": \"Cross-check the provided list of verification data with the original baseline response that is supposed to accurately answer the baseline prompt. \\n\\nBaseline prompt: {{baseline_prompt}} \\nBaseline response: {{baseline_response_gen.output}}\\nVerification data: {{verification_results}}\",\n",
+        "      \"metadata\": {\n",
+        "        \"model\": {\n",
+        "          \"name\": \"gpt-4\",\n",
+        "          \"settings\": {\n",
+        "            \"system_prompt\": \"For each entity from the baseline response, verify that the entity met the criteria asked for in the baseline prompt based on the verification data. \\n\\nOutput Format: \\n\\n### Revised Response \\nThis is the revised response after running chain-of-verification. \\n(Please output the revised response after the cross-check.)\\n\\n### Failed Entities \\nThese are the entities that failed the cross-check and are no longer included in revised response. \\n(List the entities that failed the cross-check with a concise reason why)\"\n",
+        "          }\n",
+        "        },\n",
+        "        \"parameters\": {\n",
+        "          \"verification_results\": \"Theodore Roosevelt was born in New York City, New York on October 27, 1858. Franklin D. Roosevelt was born in Hyde Park, New York on January 30, 1882. Alexander Hamilton was born in Charlestown, Nevis on January 11, 1755. John Jay was born in New York City, New York on December 12, 1745. DeWitt Clinton was born in Little Britain, New York on March 2, 1769. William H. Seward was born in Florida, New York on May 16, 1801. Charles Evans Hughes was born in Glens Falls, New York on April 11, 1862. Nelson Rockefeller was born in Bar Harbor, Maine on July 8, 1908. Robert F. Wagner Jr. was born in Manhattan, New York on April 20, 1910. Bella Abzug was born in New York City, New York on July 24, 1920. Shirley Chisholm was born in Brooklyn, New York on November 30, 1924. Geraldine Ferraro was born in Newburgh, New York on August 26, 1935. Eliot Spitzer was born in The Bronx, New York on June 10, 1959. Michael Bloomberg was born in Boston, Massachusetts on February 14, 1942. Andrew Cuomo was born in New York City, New York on December 6, 1957. Bill de Blasio was born in Manhattan, New York on May 8, 1961. Charles Rangel was born in Harlem, New York City on June 11, 1930. Daniel Patrick Moynihan was born in Tulsa, Oklahoma on March 16, 1927. Jacob Javits was born in New York City, New York on May 18, 1904. Al Smith was born in New York City, New York on December 30, 1873. Rudy Giuliani was born in Brooklyn, New York on May 28, 1944. George Pataki was born in Peekskill, New York on June 24, 1945. Kirsten Gillibrand was born in Albany, New York on December 9, 1966. Chuck Schumer was born in Brooklyn, New York on November 23, 1950. Alexandria Ocasio-Cortez was born in The Bronx, New York City, New York on October 13, 1989.\"\n",
+        "        },\n",
+        "        \"remember_chat_context\": False\n",
+        "      }\n",
+        "    }\n",
+        "  ]\n",
+        "}\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "ykZE2ieO6ryn"
+      },
+      "source": [
+        "## 1. Baseline Response\n",
+        "Prompt the LLM with a user question whose answer is a list. The baseline response might contain inaccuracies, which the following steps verify.\n",
+        "\n",
+        "**Prompt: Name 20 programming languages that were developed in the United States.**"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 18,
+      "metadata": {
+        "id": "6cAw1ekXCxGn"
+      },
+      "outputs": [],
+      "source": [
+        "\n",
+        "config = AIConfigRuntime.create(**cove_template_config) # loads config (see code above)\n",
+        "config.callback_manager = CallbackManager([])\n",
+        "\n",
+        "inference_options = InferenceOptions() # setup streaming"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 19,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "nbolW2mVDeZD",
+        "outputId": "07cbaff4-3125-442a-cd2a-0126aa09b1b7"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "1. C (Dennis Ritchie, Bell Labs)\n",
+            "2. C++ (Bjarne Stroustrup, Bell Labs)\n",
+            "3. Java (James Gosling, Sun Microsystems)\n",
+            "4. Python (Guido van Rossum, Python Software Foundation)\n",
+            "5. JavaScript (Brendan Eich, Netscape Communications)\n",
+            "6. Ruby (Yukihiro Matsumoto, Ruby community)\n",
+            "7. Swift (Apple Inc.)\n",
+            "8. Go (Robert Griesemer, Rob Pike, and Ken Thompson, Google Inc.)\n",
+            "9. Perl (Larry Wall)\n",
+            "10. PHP (Rasmus Lerdorf)\n",
+            "11. Rust (Graydon Hoare, Mozilla Foundation)\n",
+            "12. TypeScript (Microsoft)\n",
+            "13. C# (Microsoft)\n",
+            "14. Objective-C (Brad Cox and Tom Love, Stepstone)\n",
+            "15. Lua (Roberto Ierusalimschy, Waldemar Celes, and Luiz Henrique de Figueiredo, PUC-Rio)\n",
+            "16. Dart (Google)\n",
+            "17. Kotlin (JetBrains)\n",
+            "18. Groovy (James Strachan, Guillaume Laforge, Jochen Theodorou, Paul King, Cedric Champeau)\n",
+            "19. R (Ross Ihaka and Robert Gentleman, University of Auckland)\n",
+            "20. Julia (Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and Alan Edelman, Julia Computing)"
+          ]
+        }
+      ],
+      "source": [
+        "# <<TODO>>: Update baseline_prompt but ensure it is structured in a way that outputs a list of entities where each can be verified.\n",
+        "baseline_prompt = \"Name 20 programming languages that were developed in the United States. Include the developer name in parentheses.\"\n",
+        "\n",
+        "# Run baseline prompt to generate initial response which might contain errors\n",
+        "async def run_baseline_prompt(baseline_prompt):\n",
+        "    config.update_parameter(\"baseline_prompt\", baseline_prompt)\n",
+        "    config.save()\n",
+        "\n",
+        "    await config.run(\"baseline_response_gen\", options=inference_options) # run baseline prompt\n",
+        "    return config.get_output_text(\"baseline_response_gen\")\n",
+        "\n",
+        "baseline_response = await run_baseline_prompt(baseline_prompt)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "7OQNz9cM7Myv"
+      },
+      "source": [
+        "\n",
+        "## 2. Setup and Test Verification Question\n",
+        "Given both the query and the baseline response, generate a verification\n",
+        "question that helps check the original response for mistakes. We will use one verification question here.\n",
+        "\n",
+        "**Verification Prompt: Where was this coding language developed: {{entity}}?**"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 20,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "jD9S3q5mMtqd",
+        "outputId": "6d3446a9-32ff-4fc1-9e3d-1f3600455b26"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Clojure was developed in the United States."
+          ]
+        }
+      ],
+      "source": [
+        "#  <<TODO>>: Update verification question that takes in entity as a parameter\n",
+        "# verification_question = \"Where was {{entity}} born?\"\n",
+        "verification_question =  \"Where was this coding language developed: {{entity}}?\"\n",
+        "\n",
+        "# Run verification on a single entity from baseline response to test\n",
+        "async def run_single_verification(verification_question, entity):\n",
+        "    params = {\"entity\": entity}\n",
+        "    config.update_parameter(\"verification_question\", verification_question)\n",
+        "    config.save()\n",
+        "\n",
+        "    verification_completion = await config.run(\"verification\", params, options=inference_options)\n",
+        "    return verification_completion\n",
+        "\n",
+        "#  <<TODO>>: Update with an entity from the baseline response\n",
+        "verification_completion = await run_single_verification(verification_question, \"clojure\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "B9Zaypp075f9"
+      },
+      "source": [
+        "## 3. Execute Verifications\n",
+        "Answer the verification question for each entity from the baseline response. Save the verification results in a single string."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "QFew6GhONR8X",
+        "outputId": "71af66ce-d410-49ab-c65b-87eaee836219"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "The C programming language was developed at Bell Labs in the United States.\n",
+            "\n"
+          ]
+        }
+      ],
+      "source": [
+        "# Extracts entity names from a baseline response by matching each numbered line with a regex.\n",
+        "# TODO: Update regex if the format of the baseline response changes. (ex. not a numbered list)\n",
+        "import re\n",
+        "def gen_entities_list(baseline_response):\n",
+        "  entities = []\n",
+        "  for row in baseline_response.split('\\n'):\n",
+        "      # Capture the text after the 'N. ' prefix, up to the first comma\n",
+        "      match = re.search(r'\\d+\\.\\s([^,]*)', row)\n",
+        "      if match:  # skip blank lines and lines that are not list items\n",
+        "          entities.append(match.group(1))\n",
+        "\n",
+        "  return entities\n",
+        "\n",
+        "# Runs the verification question for each entity and concatenates the returned answers into a single string.\n",
+        "async def gen_verification_results(entities):\n",
+        "  verification_data = \"\"\n",
+        "  for n in entities:\n",
+        "      params = {\n",
+        "          \"verification_question\": verification_question,\n",
+        "          \"entity\": n\n",
+        "      }\n",
+        "      verification_completion = await config.run(\"verification\", params, options=inference_options)\n",
+        "      single_verification_text = config.get_output_text(\"verification\")\n",
+        "      verification_data += \" \" + single_verification_text\n",
+        "      print(\"\\n\")\n",
+        "\n",
+        "  return verification_data\n",
+        "\n",
+        "\n",
+        "entities = gen_entities_list(baseline_response)\n",
+        "verification_data = await gen_verification_results(entities)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Ldof6NdR86qI"
+      },
+      "source": [
+        "## 4. Generate Revised Response\n",
+        "Given the discovered inconsistencies (if any), generate a revised response incorporating the verification results."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "4MiNxiJc9GPI"
+      },
+      "outputs": [],
+      "source": [
+        "# Generate the revised response using the verification data\n",
+        "params = {\"verification_results\": verification_data}\n",
+        "revised_response = await config.run(\"final_response_gen\", params)\n",
+        "\n",
+        "# Display with Markdown\n",
+        "display(Markdown(config.get_output_text(\"final_response_gen\")))"
+      ]
+    }
+  ],
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "display_name": "Python 3",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.11.7"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}
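
Note on the notebook above: its four steps (baseline response, verification question, per-entity verification, revised response) can be sketched without AIConfig or an API key. The following is a minimal, library-free illustration and not the notebook's implementation. The names `extract_entities`, `chain_of_verification`, and `fake_ask` are hypothetical, and `fake_ask` is a stand-in for a real model call so the sketch runs offline.

```python
import asyncio
import re


def extract_entities(baseline_response):
    """Return the text after each 'N. ' prefix, up to the first comma."""
    entities = []
    for row in baseline_response.split("\n"):
        match = re.search(r"\d+\.\s([^,]*)", row)
        if match:  # skip blank lines and lines that are not list items
            entities.append(match.group(1).strip())
    return entities


async def chain_of_verification(baseline_prompt, question_template, ask):
    """Run the four CoVe steps with a caller-supplied async `ask(prompt)`."""
    # 1. Baseline response (may contain errors)
    baseline = await ask(baseline_prompt)

    # 2-3. Ask one verification question per entity, each answered on its own
    answers = []
    for entity in extract_entities(baseline):
        question = question_template.replace("{{entity}}", entity)
        answers.append(await ask(question))
    verification_data = " ".join(answers)

    # 4. Request a revised response that cross-checks baseline and verifications
    final_prompt = (
        f"Baseline prompt: {baseline_prompt}\n"
        f"Baseline response: {baseline}\n"
        f"Verification data: {verification_data}"
    )
    return await ask(final_prompt)


# Hypothetical stand-in for a model call (assumption: no real LLM is contacted).
async def fake_ask(prompt):
    if prompt.startswith("Name"):
        return "1. C (Bell Labs)\n2. Lua (PUC-Rio)"
    if prompt.startswith("Where"):
        return f"Checked: {prompt}"
    return prompt  # echo the final prompt so the assembled context is visible


result = asyncio.run(
    chain_of_verification(
        "Name 2 programming languages that were developed in the United States.",
        "Where was this coding language developed: {{entity}}?",
        fake_ask,
    )
)
print(result)
```

Inside a notebook, where an event loop is already running, the call would be `await chain_of_verification(...)` rather than `asyncio.run(...)`. Answering each verification question in a separate call, as the notebook also does, keeps the baseline response out of the verification context.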