From 85749553d3d6b965449414a544ba46e7b8280303 Mon Sep 17 00:00:00 2001 From: Shubhagyta Jayswal Date: Tue, 28 May 2024 00:05:34 +0530 Subject: [PATCH 1/3] Added eviction policy to semantic_cache class --- ...emantic_cache_chroma_vector_database.ipynb | 998 +++++++++--------- 1 file changed, 511 insertions(+), 487 deletions(-) diff --git a/notebooks/en/semantic_cache_chroma_vector_database.ipynb b/notebooks/en/semantic_cache_chroma_vector_database.ipynb index 885e4a00..99dfd052 100644 --- a/notebooks/en/semantic_cache_chroma_vector_database.ipynb +++ b/notebooks/en/semantic_cache_chroma_vector_database.ipynb @@ -1,48 +1,10 @@ { - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - }, - "language_info": { - "name": "python", - "version": "3.10.12", - "mimetype": "text/x-python", - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "pygments_lexer": "ipython3", - "nbconvert_exporter": "python", - "file_extension": ".py" - }, - "kaggle": { - "accelerator": "gpu", - "dataSources": [ - { - "sourceId": 6104553, - "sourceType": "datasetVersion", - "datasetId": 3496946 - } - ], - "dockerImageVersionId": 30527, - "isInternetEnabled": true, - "language": "python", - "sourceType": "notebook", - "isGpuEnabled": true - }, - "colab": { - "provenance": [], - "machine_shape": "hm", - "gpuType": "T4" - }, - "accelerator": "GPU" - }, - "nbformat_minor": 0, - "nbformat": 4, "cells": [ { "cell_type": "markdown", + "metadata": { + "id": "AVv_M1Dz9TDz" + }, "source": [ "# Implementing semantic cache to improve a RAG system with FAISS.\n", "\n", @@ -60,13 +22,13 @@ "\n", "\n", "\n" - ], - "metadata": { - "id": "AVv_M1Dz9TDz" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "5gtBERjX1vFd" + }, "source": [ "Most tutorials that guide you through creating a RAG system are designed for single-user use, meant to operate in a testing environment. 
In other words, within a notebook, interacting with a local vector database and making API calls or using a locally stored model.\n", "\n", @@ -85,13 +47,13 @@ "But both requests will require the same information to enrich the prompt. This is the main reason why I chose to place the semantic cache system between the user's request and the retrieval of information from the vector database.\n", "\n", "However, this is a design decision. Depending on the type of responses and system requests, it can be placed at one point or another. It's evident that caching model responses would yield the most time savings, but as I've already explained, it comes at the cost of losing user influence over the response.\n" - ], - "metadata": { - "id": "5gtBERjX1vFd" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "uizxY8679TDz" + }, "source": [ "# Import and load the libraries.\n", "To start we need to install the necesary Python packages.\n", @@ -99,13 +61,23 @@ "* **[xformers](https://github.com/facebookresearch/xformers)**. it's a package that provides libraries an utilities to facilitate the work with transformers models. We need to install in order to avoid an error when we work with the model and embeddings. \n", "* **[chromadb](https://www.trychroma.com/)**. This is our vector Database. ChromaDB is easy to use and open source, maybe the most used Vector Database used to store embeddings.\n", "* **[accelerate](https://github.com/huggingface/accelerate)** Necesary to run the Model in a GPU. 
" - ], - "metadata": { - "id": "uizxY8679TDz" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2024-02-29T17:30:10.787688Z", + "iopub.status.busy": "2024-02-29T17:30:10.787382Z", + "iopub.status.idle": "2024-02-29T17:34:12.804579Z", + "shell.execute_reply": "2024-02-29T17:34:12.80338Z", + "shell.execute_reply.started": "2024-02-29T17:30:10.787657Z" + }, + "id": "r1nUzd1u9TD0", + "trusted": true + }, + "outputs": [], "source": [ "!pip install -q transformers==4.38.1\n", "!pip install -q accelerate==0.27.2\n", @@ -113,53 +85,45 @@ "!pip install -q xformers==0.0.24\n", "!pip install -q chromadb==0.4.24\n", "!pip install -q datasets==2.17.1" - ], - "metadata": { - "execution": { - "iopub.status.busy": "2024-02-29T17:30:10.787382Z", - "iopub.execute_input": "2024-02-29T17:30:10.787688Z", - "iopub.status.idle": "2024-02-29T17:34:12.804579Z", - "shell.execute_reply.started": "2024-02-29T17:30:10.787657Z", - "shell.execute_reply": "2024-02-29T17:34:12.80338Z" - }, - "trusted": true, - "id": "r1nUzd1u9TD0" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", - "source": [ - "import numpy as np\n", - "import pandas as pd" - ], + "execution_count": 2, "metadata": { "execution": { - "iopub.status.busy": "2024-02-29T17:35:23.197205Z", "iopub.execute_input": "2024-02-29T17:35:23.197598Z", + "iopub.status.busy": "2024-02-29T17:35:23.197205Z", "iopub.status.idle": "2024-02-29T17:35:23.202259Z", - "shell.execute_reply.started": "2024-02-29T17:35:23.197556Z", - "shell.execute_reply": "2024-02-29T17:35:23.201404Z" + "shell.execute_reply": "2024-02-29T17:35:23.201404Z", + "shell.execute_reply.started": "2024-02-29T17:35:23.197556Z" }, - "trusted": true, - "id": "5jUwC_eE9TD0" + "id": "5jUwC_eE9TD0", + "trusted": true }, - "execution_count": 2, - "outputs": [] + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd" + ] }, { "cell_type": "markdown", + "metadata": { + 
"id": "9P-kYtc79TD1" + }, "source": [ "# Load the Dataset\n", "As we are working in a free and limited space, and we can use just a few GB of memory I limited the number of rows to use from the Dataset with the variable `MAX_ROWS`." - ], - "metadata": { - "id": "9P-kYtc79TD1" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xZsN8yzUvfjN" + }, + "outputs": [], "source": [ "#Login to Hugging Face. It is mandatory to use the Gemma Model,\n", "#and recommended to acces public models and Datasets.\n", @@ -167,80 +131,49 @@ "if 'hf_key' not in locals():\n", " hf_key = getpass(\"Your Hugging Face API Key: \")\n", "!huggingface-cli login --token $hf_key" - ], - "metadata": { - "id": "xZsN8yzUvfjN" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": 47, + "metadata": { + "id": "9IVxu-uxtCTw" + }, + "outputs": [], "source": [ "from datasets import load_dataset\n", "\n", "data = load_dataset(\"keivalya/MedQuad-MedicalQnADataset\", split='train')" - ], - "metadata": { - "id": "9IVxu-uxtCTw" - }, - "execution_count": 47, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "ChromaDB requires that the data has a unique identifier. We can make it with this statement, which will create a new column called **Id**.\n" - ], "metadata": { "id": "hmor-i1j9TD1" - } + }, + "source": [ + "ChromaDB requires that the data has a unique identifier. 
We can make it with this statement, which will create a new column called **Id**.\n" + ] }, { "cell_type": "code", - "source": [ - "data = data.to_pandas()\n", - "data[\"id\"]=data.index\n", - "data.head(10)" - ], + "execution_count": 48, "metadata": { - "id": "WbLf8c7_yHwy", - "outputId": "492eac81-2f7b-4063-f444-405bf489d08e", "colab": { "base_uri": "https://localhost:8080/", "height": 536 - } + }, + "id": "WbLf8c7_yHwy", + "outputId": "492eac81-2f7b-4063-f444-405bf489d08e" }, - "execution_count": 48, "outputs": [ { - "output_type": "execute_result", "data": { - "text/plain": [ - " qtype Question \\\n", - "0 susceptibility Who is at risk for Lymphocytic Choriomeningiti... \n", - "1 symptoms What are the symptoms of Lymphocytic Choriomen... \n", - "2 susceptibility Who is at risk for Lymphocytic Choriomeningiti... \n", - "3 exams and tests How to diagnose Lymphocytic Choriomeningitis (... \n", - "4 treatment What are the treatments for Lymphocytic Chorio... \n", - "5 prevention How to prevent Lymphocytic Choriomeningitis (L... \n", - "6 information What is (are) Parasites - Cysticercosis ? \n", - "7 susceptibility Who is at risk for Parasites - Cysticercosis? ? \n", - "8 exams and tests How to diagnose Parasites - Cysticercosis ? \n", - "9 treatment What are the treatments for Parasites - Cystic... \n", - "\n", - " Answer id \n", - "0 LCMV infections can occur after exposure to fr... 0 \n", - "1 LCMV is most commonly recognized as causing ne... 1 \n", - "2 Individuals of all ages who come into contact ... 2 \n", - "3 During the first phase of the disease, the mos... 3 \n", - "4 Aseptic meningitis, encephalitis, or meningoen... 4 \n", - "5 LCMV infection can be prevented by avoiding co... 5 \n", - "6 Cysticercosis is an infection caused by the la... 6 \n", - "7 Cysticercosis is an infection caused by the la... 7 \n", - "8 If you think that you may have cysticercosis, ... 8 \n", - "9 Some people with cysticercosis do not need to ... 
9 " - ], + "application/vnd.google.colaboratory.intrinsic+json": { + "summary": "{\n \"name\": \"data\",\n \"rows\": 16407,\n \"fields\": [\n {\n \"column\": \"qtype\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 16,\n \"samples\": [\n \"susceptibility\",\n \"symptoms\",\n \"information\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Question\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 14979,\n \"samples\": [\n \"What are the symptoms of Danon disease ?\",\n \"What is (are) Dowling-Degos disease ?\",\n \"What are the genetic changes related to Pearson marrow-pancreas syndrome ?\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Answer\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 15817,\n \"samples\": [\n \"These resources address the diagnosis or management of glycogen storage disease type III: - Gene Review: Gene Review: Glycogen Storage Disease Type III - Genetic Testing Registry: Glycogen storage disease type III These resources from MedlinePlus offer information about the diagnosis and management of various health conditions: - Diagnostic Tests - Drug Therapy - Surgery and Rehabilitation - Genetic Counseling - Palliative Care\",\n \"Diagnostic Challenges\\n \\nFor doctors, diagnosing chronic fatigue syndrome (CFS) can be complicated by a number of factors:\\n \\n - There's no lab test or biomarker for CFS.\\n - Fatigue and other symptoms of CFS are common to many illnesses.\\n - For some CFS patients, it may not be obvious to doctors that they are ill.\\n - The illness has a pattern of remission and relapse.\\n - Symptoms vary from person to person in type, number, and severity.\\n \\n \\nThese factors have contributed to a low diagnosis rate. 
Of the one to four million Americans who have CFS, less than 20% have been diagnosed.\\n Exams and Screening Tests for CFS\\n \\nBecause there is no blood test, brain scan, or other lab test to diagnose CFS, the doctor should first rule out other possible causes.\\n \\nIf a patient has had 6 or more consecutive months of severe fatigue that is reported to be unrelieved by sufficient bed rest and that is accompanied by nonspecific symptoms, including flu-like symptoms, generalized pain, and memory problems, the doctor should consider the possibility that the patient may have CFS. Further exams and tests are needed before a diagnosis can be made:\\n \\n - A detailed medical history will be needed and should include a review of medications that could be causing the fatigue and symptoms\\n - A thorough physical and mental status examination will also be needed\\n - A battery of laboratory screening tests will be needed to help identify or rule out other possible causes of the symptoms that could be treated\\n - The doctor may also order additional tests to follow up on results of the initial screening tests\\n \\n \\nA CFS diagnosis requires that the patient has been fatigued for 6 months or more and has 4 of the 8 symptoms for CFS for 6 months or more. 
If, however, the patient has been fatigued for 6 months or more but does not have four of the eight symptoms, the diagnosis may be idiopathic fatigue.\\n \\nThe complete process for diagnosing CFS can be found here.\\n \\nAdditional information for healthcare professionals on use of tests can be found here.\",\n \"Eating, diet, and nutrition have not been shown to play a role in causing or preventing simple kidney cysts.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4736,\n \"min\": 0,\n \"max\": 16406,\n \"num_unique_values\": 16407,\n \"samples\": [\n 3634,\n 15104,\n 4395\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}", + "type": "dataframe", + "variable_name": "data" + }, "text/html": [ "\n", "
\n", @@ -551,239 +484,265 @@ "
\n", " \n" ], - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "dataframe", - "variable_name": "data", - "summary": "{\n \"name\": \"data\",\n \"rows\": 16407,\n \"fields\": [\n {\n \"column\": \"qtype\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 16,\n \"samples\": [\n \"susceptibility\",\n \"symptoms\",\n \"information\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Question\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 14979,\n \"samples\": [\n \"What are the symptoms of Danon disease ?\",\n \"What is (are) Dowling-Degos disease ?\",\n \"What are the genetic changes related to Pearson marrow-pancreas syndrome ?\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Answer\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 15817,\n \"samples\": [\n \"These resources address the diagnosis or management of glycogen storage disease type III: - Gene Review: Gene Review: Glycogen Storage Disease Type III - Genetic Testing Registry: Glycogen storage disease type III These resources from MedlinePlus offer information about the diagnosis and management of various health conditions: - Diagnostic Tests - Drug Therapy - Surgery and Rehabilitation - Genetic Counseling - Palliative Care\",\n \"Diagnostic Challenges\\n \\nFor doctors, diagnosing chronic fatigue syndrome (CFS) can be complicated by a number of factors:\\n \\n - There's no lab test or biomarker for CFS.\\n - Fatigue and other symptoms of CFS are common to many illnesses.\\n - For some CFS patients, it may not be obvious to doctors that they are ill.\\n - The illness has a pattern of remission and relapse.\\n - Symptoms vary from person to person in type, number, and severity.\\n \\n \\nThese factors have contributed to a low diagnosis rate. 
Of the one to four million Americans who have CFS, less than 20% have been diagnosed.\\n Exams and Screening Tests for CFS\\n \\nBecause there is no blood test, brain scan, or other lab test to diagnose CFS, the doctor should first rule out other possible causes.\\n \\nIf a patient has had 6 or more consecutive months of severe fatigue that is reported to be unrelieved by sufficient bed rest and that is accompanied by nonspecific symptoms, including flu-like symptoms, generalized pain, and memory problems, the doctor should consider the possibility that the patient may have CFS. Further exams and tests are needed before a diagnosis can be made:\\n \\n - A detailed medical history will be needed and should include a review of medications that could be causing the fatigue and symptoms\\n - A thorough physical and mental status examination will also be needed\\n - A battery of laboratory screening tests will be needed to help identify or rule out other possible causes of the symptoms that could be treated\\n - The doctor may also order additional tests to follow up on results of the initial screening tests\\n \\n \\nA CFS diagnosis requires that the patient has been fatigued for 6 months or more and has 4 of the 8 symptoms for CFS for 6 months or more. 
If, however, the patient has been fatigued for 6 months or more but does not have four of the eight symptoms, the diagnosis may be idiopathic fatigue.\\n \\nThe complete process for diagnosing CFS can be found here.\\n \\nAdditional information for healthcare professionals on use of tests can be found here.\",\n \"Eating, diet, and nutrition have not been shown to play a role in causing or preventing simple kidney cysts.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4736,\n \"min\": 0,\n \"max\": 16406,\n \"num_unique_values\": 16407,\n \"samples\": [\n 3634,\n 15104,\n 4395\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" - } + "text/plain": [ + " qtype Question \\\n", + "0 susceptibility Who is at risk for Lymphocytic Choriomeningiti... \n", + "1 symptoms What are the symptoms of Lymphocytic Choriomen... \n", + "2 susceptibility Who is at risk for Lymphocytic Choriomeningiti... \n", + "3 exams and tests How to diagnose Lymphocytic Choriomeningitis (... \n", + "4 treatment What are the treatments for Lymphocytic Chorio... \n", + "5 prevention How to prevent Lymphocytic Choriomeningitis (L... \n", + "6 information What is (are) Parasites - Cysticercosis ? \n", + "7 susceptibility Who is at risk for Parasites - Cysticercosis? ? \n", + "8 exams and tests How to diagnose Parasites - Cysticercosis ? \n", + "9 treatment What are the treatments for Parasites - Cystic... \n", + "\n", + " Answer id \n", + "0 LCMV infections can occur after exposure to fr... 0 \n", + "1 LCMV is most commonly recognized as causing ne... 1 \n", + "2 Individuals of all ages who come into contact ... 2 \n", + "3 During the first phase of the disease, the mos... 3 \n", + "4 Aseptic meningitis, encephalitis, or meningoen... 4 \n", + "5 LCMV infection can be prevented by avoiding co... 5 \n", + "6 Cysticercosis is an infection caused by the la... 
6 \n", + "7 Cysticercosis is an infection caused by the la... 7 \n", + "8 If you think that you may have cysticercosis, ... 8 \n", + "9 Some people with cysticercosis do not need to ... 9 " + ] }, + "execution_count": 48, "metadata": {}, - "execution_count": 48 + "output_type": "execute_result" } + ], + "source": [ + "data = data.to_pandas()\n", + "data[\"id\"]=data.index\n", + "data.head(10)" ] }, { "cell_type": "code", - "source": [ - "MAX_ROWS = 15000\n", - "DOCUMENT=\"Answer\"\n", - "TOPIC=\"qtype\"" - ], + "execution_count": 6, "metadata": { "execution": { - "iopub.status.busy": "2024-02-29T17:35:25.527688Z", "iopub.execute_input": "2024-02-29T17:35:25.528374Z", + "iopub.status.busy": "2024-02-29T17:35:25.527688Z", "iopub.status.idle": "2024-02-29T17:35:25.709895Z", - "shell.execute_reply.started": "2024-02-29T17:35:25.528341Z", - "shell.execute_reply": "2024-02-29T17:35:25.709127Z" + "shell.execute_reply": "2024-02-29T17:35:25.709127Z", + "shell.execute_reply.started": "2024-02-29T17:35:25.528341Z" }, - "trusted": true, - "id": "DZf0zCI29TD1" + "id": "DZf0zCI29TD1", + "trusted": true }, - "execution_count": 6, - "outputs": [] + "outputs": [], + "source": [ + "MAX_ROWS = 15000\n", + "DOCUMENT=\"Answer\"\n", + "TOPIC=\"qtype\"" + ] }, { "cell_type": "code", - "source": [ - "#Because it is just a sample we select a small portion of News.\n", - "subset_data = data.head(MAX_ROWS)" - ], + "execution_count": 7, "metadata": { "execution": { - "iopub.status.busy": "2024-02-29T17:35:29.183979Z", "iopub.execute_input": "2024-02-29T17:35:29.184342Z", + "iopub.status.busy": "2024-02-29T17:35:29.183979Z", "iopub.status.idle": "2024-02-29T17:35:29.189229Z", - "shell.execute_reply.started": "2024-02-29T17:35:29.184313Z", - "shell.execute_reply": "2024-02-29T17:35:29.1881Z" + "shell.execute_reply": "2024-02-29T17:35:29.1881Z", + "shell.execute_reply.started": "2024-02-29T17:35:29.184313Z" }, - "trusted": true, - "id": "Mkoj9IrZ9TD1" + "id": "Mkoj9IrZ9TD1", + "trusted": true 
}, - "execution_count": 7, - "outputs": [] + "outputs": [], + "source": [ + "#Because it is just a sample we select a small portion of News.\n", + "subset_data = data.head(MAX_ROWS)" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "rZHg_Qh69TD1" + }, "source": [ "# Import and configure the Vector Database\n", "To store the information, I've chosen to use ChromaDB, one of the most well-known and widely used open-source vector databases.\n", "\n", "First we need to import ChromaDB." - ], - "metadata": { - "id": "rZHg_Qh69TD1" - } + ] }, { "cell_type": "code", - "source": [ - "import chromadb" - ], + "execution_count": 8, "metadata": { "execution": { - "iopub.status.busy": "2024-02-29T17:35:31.849199Z", "iopub.execute_input": "2024-02-29T17:35:31.849551Z", + "iopub.status.busy": "2024-02-29T17:35:31.849199Z", "iopub.status.idle": "2024-02-29T17:35:32.31736Z", - "shell.execute_reply.started": "2024-02-29T17:35:31.849525Z", - "shell.execute_reply": "2024-02-29T17:35:32.316617Z" + "shell.execute_reply": "2024-02-29T17:35:32.316617Z", + "shell.execute_reply.started": "2024-02-29T17:35:31.849525Z" }, - "trusted": true, - "id": "npJhuZQw9TD1" + "id": "npJhuZQw9TD1", + "trusted": true }, - "execution_count": 8, - "outputs": [] + "outputs": [], + "source": [ + "import chromadb" + ] }, { "cell_type": "markdown", - "source": [ - "Now we only need to indicate the path where the vector database will be stored." - ], "metadata": { "id": "8okox5C89TD1" - } + }, + "source": [ + "Now we only need to indicate the path where the vector database will be stored." 
+ ] }, { "cell_type": "code", - "source": [ - "chroma_client = chromadb.PersistentClient(path=\"/path/to/persist/directory\")" - ], + "execution_count": 9, "metadata": { "execution": { - "iopub.status.busy": "2024-02-29T17:35:34.410268Z", "iopub.execute_input": "2024-02-29T17:35:34.410646Z", + "iopub.status.busy": "2024-02-29T17:35:34.410268Z", "iopub.status.idle": "2024-02-29T17:35:34.872817Z", - "shell.execute_reply.started": "2024-02-29T17:35:34.410614Z", - "shell.execute_reply": "2024-02-29T17:35:34.872039Z" + "shell.execute_reply": "2024-02-29T17:35:34.872039Z", + "shell.execute_reply.started": "2024-02-29T17:35:34.410614Z" }, - "trusted": true, - "id": "9yK6y0hm9TD1" + "id": "9yK6y0hm9TD1", + "trusted": true }, - "execution_count": 9, - "outputs": [] + "outputs": [], + "source": [ + "chroma_client = chromadb.PersistentClient(path=\"/path/to/persist/directory\")" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "7MhMwk3J9TD1" + }, "source": [ "# Filling and Querying the ChromaDB Database\n", "The Data in ChromaDB is stored in collections. If the collection exist we need to delete it.\n", "\n", "In the next lines, we are creating the collection by calling the `create_collection` function in the `chroma_client` created above." 
- ], - "metadata": { - "id": "7MhMwk3J9TD1" - } + ] }, { "cell_type": "code", - "source": [ - "collection_name = \"news_collection\"\n", - "if len(chroma_client.list_collections()) > 0 and collection_name in [chroma_client.list_collections()[0].name]:\n", - " chroma_client.delete_collection(name=collection_name)\n", - "\n", - "collection = chroma_client.create_collection(name=collection_name)\n" - ], + "execution_count": 10, "metadata": { "execution": { - "iopub.status.busy": "2024-02-29T17:35:36.1156Z", "iopub.execute_input": "2024-02-29T17:35:36.116012Z", + "iopub.status.busy": "2024-02-29T17:35:36.1156Z", "iopub.status.idle": "2024-02-29T17:35:36.16922Z", - "shell.execute_reply.started": "2024-02-29T17:35:36.115977Z", - "shell.execute_reply": "2024-02-29T17:35:36.168504Z" + "shell.execute_reply": "2024-02-29T17:35:36.168504Z", + "shell.execute_reply.started": "2024-02-29T17:35:36.115977Z" }, - "trusted": true, - "id": "kRCsunE19TD1" + "id": "kRCsunE19TD1", + "trusted": true }, - "execution_count": 10, - "outputs": [] + "outputs": [], + "source": [ + "collection_name = \"news_collection\"\n", + "if len(chroma_client.list_collections()) > 0 and collection_name in [chroma_client.list_collections()[0].name]:\n", + " chroma_client.delete_collection(name=collection_name)\n", + "\n", + "collection = chroma_client.create_collection(name=collection_name)\n" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "rdEtcETr9TD2" + }, "source": [ "We are now ready to add the data to the collection using the `add` function. This function requires three key pieces of information:\n", "\n", "* In the **document** we store the content of the `Answer` column in the Dataset.\n", "* In **metadatas**, we can inform a list of topics. I used the value in the column `qtype`.\n", "* In **id** we need to inform an unique identificator for each row. 
I'm creating the ID using the range of `MAX_ROWS`.\n" - ], - "metadata": { - "id": "rdEtcETr9TD2" - } + ] }, { "cell_type": "code", - "source": [ - "collection.add(\n", - " documents=subset_data[DOCUMENT].tolist(),\n", - " metadatas=[{TOPIC: topic} for topic in subset_data[TOPIC].tolist()],\n", - " ids=[f\"id{x}\" for x in range(MAX_ROWS)],\n", - ")" - ], + "execution_count": 11, "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, "execution": { - "iopub.status.busy": "2024-02-29T17:35:38.051179Z", "iopub.execute_input": "2024-02-29T17:35:38.051601Z", + "iopub.status.busy": "2024-02-29T17:35:38.051179Z", "iopub.status.idle": "2024-02-29T17:36:38.612836Z", - "shell.execute_reply.started": "2024-02-29T17:35:38.051569Z", - "shell.execute_reply": "2024-02-29T17:36:38.611814Z" + "shell.execute_reply": "2024-02-29T17:36:38.611814Z", + "shell.execute_reply.started": "2024-02-29T17:35:38.051569Z" }, - "trusted": true, "id": "4dDoqJE79TD2", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "36f579dc-ec60-48b1-807a-1e68113cc9f4" + "outputId": "36f579dc-ec60-48b1-807a-1e68113cc9f4", + "trusted": true }, - "execution_count": 11, "outputs": [ { - "metadata": { - "tags": null - }, "name": "stderr", "output_type": "stream", "text": [ "/root/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 100%|██████████| 79.3M/79.3M [00:01<00:00, 68.1MiB/s]\n" ] } + ], + "source": [ + "collection.add(\n", + " documents=subset_data[DOCUMENT].tolist(),\n", + " metadatas=[{TOPIC: topic} for topic in subset_data[TOPIC].tolist()],\n", + " ids=[f\"id{x}\" for x in range(MAX_ROWS)],\n", + ")" ] }, { "cell_type": "markdown", + "metadata": { + "id": "du6-iuUisRkM" + }, "source": [ "Once we have the information in the Database we can query it, and ask for data that matches our needs. The search is done inside the content of the document, and it dosn't look for the exact word, or phrase. 
The results will be based on the similarity between the search terms and the content of documents.\n", "\n", "Metadata isn't directly involved in the initial search process, it can be used to filter or refine the results after retrieval, enabling further customization and precision.\n", "\n", "Let's define a function to query the ChromaDB Database." - ], - "metadata": { - "id": "du6-iuUisRkM" - } + ] }, { "cell_type": "code", - "source": [ - "def query_database(query_text, n_results=10):\n", - " results = collection.query(query_texts=query_text, n_results=n_results )\n", - " return results" - ], + "execution_count": 12, "metadata": { "execution": { - "iopub.status.busy": "2024-02-29T17:36:38.615302Z", "iopub.execute_input": "2024-02-29T17:36:38.616047Z", + "iopub.status.busy": "2024-02-29T17:36:38.615302Z", "iopub.status.idle": "2024-02-29T17:36:38.620516Z", - "shell.execute_reply.started": "2024-02-29T17:36:38.616008Z", - "shell.execute_reply": "2024-02-29T17:36:38.619561Z" + "shell.execute_reply": "2024-02-29T17:36:38.619561Z", + "shell.execute_reply.started": "2024-02-29T17:36:38.616008Z" }, - "trusted": true, - "id": "UjdhZ4MJ9TD2" + "id": "UjdhZ4MJ9TD2", + "trusted": true }, - "execution_count": 12, - "outputs": [] + "outputs": [], + "source": [ + "def query_database(query_text, n_results=10):\n", + " results = collection.query(query_texts=query_text, n_results=n_results )\n", + " return results" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "CL0Crl3x9TD2" + }, "source": [ "## Creating the semantic cache system\n", "To implement the cache system, we will use Faiss, a library that allows storing embeddings in memory. It's quite similar to what Chroma does, but without its persistence.\n", @@ -793,46 +752,46 @@ "In this class, we first query the cache implemented with Faiss, that contains the previous petitions, and if the returned results are above a specified threshold, it will return the content of the cache. 
Otherwise, it will fetch the result from the Chroma database.\n", "\n", "The cache is stored in a .json file." - ], - "metadata": { - "id": "CL0Crl3x9TD2" - } + ] }, { "cell_type": "code", - "source": [ - "!pip install -q faiss-cpu==1.8.0" - ], + "execution_count": null, "metadata": { "execution": { - "iopub.status.busy": "2024-02-29T17:36:38.621655Z", "iopub.execute_input": "2024-02-29T17:36:38.621968Z", + "iopub.status.busy": "2024-02-29T17:36:38.621655Z", "iopub.status.idle": "2024-02-29T17:36:51.313356Z", - "shell.execute_reply.started": "2024-02-29T17:36:38.621936Z", - "shell.execute_reply": "2024-02-29T17:36:51.312232Z" + "shell.execute_reply": "2024-02-29T17:36:51.312232Z", + "shell.execute_reply.started": "2024-02-29T17:36:38.621936Z" }, - "trusted": true, - "id": "6OzUbRUe9TD2" + "id": "6OzUbRUe9TD2", + "trusted": true }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!pip install -q faiss-cpu==1.8.0" + ] }, { "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "0yGE4cTEp3QJ" + }, + "outputs": [], "source": [ "import faiss\n", "from sentence_transformers import SentenceTransformer\n", "import time\n", "import json" - ], - "metadata": { - "id": "0yGE4cTEp3QJ" - }, - "execution_count": 14, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "yi_riXHhcLy0" + }, "source": [ "The `init_cache()` function below initializes the semantic cache.\n", "\n", @@ -848,13 +807,15 @@ "* IVF. 
Works well with large datasets without consuming much memory or compromising performance.\n", "\n", "More information about the different indices available with Faiss can be found at this link: https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index" - ], - "metadata": { - "id": "yi_riXHhcLy0" - } + ] }, { "cell_type": "code", + "execution_count": 15, + "metadata": { + "id": "9poNBxbPl7xE" + }, + "outputs": [], "source": [ "def init_cache():\n", " index = faiss.IndexFlatL2(768)\n", @@ -865,24 +826,24 @@ " encoder = SentenceTransformer('all-mpnet-base-v2')\n", "\n", " return index, encoder" - ], - "metadata": { - "id": "9poNBxbPl7xE" - }, - "execution_count": 15, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "In the `retrieve_cache` function, the .json file is retrieved from disk in case there is a need to reuse the cache across sessions." - ], "metadata": { "id": "_uZzX60odo1U" - } + }, + "source": [ + "In the `retrieve_cache` function, the .json file is retrieved from disk in case there is a need to reuse the cache across sessions." + ] }, { "cell_type": "code", + "execution_count": 16, + "metadata": { + "id": "FDJJ86TSp5CO" + }, + "outputs": [], "source": [ "def retrieve_cache(json_file):\n", " try:\n", @@ -892,37 +853,35 @@ " cache = {'questions': [], 'embeddings': [], 'answers': [], 'response_text': []}\n", "\n", " return cache" - ], - "metadata": { - "id": "FDJJ86TSp5CO" - }, - "execution_count": 16, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "The `store_cache` function saves the file containing the cache data to disk." - ], "metadata": { "id": "3uO-12UIdtSD" - } + }, + "source": [ + "The `store_cache` function saves the file containing the cache data to disk." 
+ ] }, { "cell_type": "code", + "execution_count": 17, + "metadata": { + "id": "jx1CiKOcwKGn" + }, + "outputs": [], "source": [ "def store_cache(json_file, cache):\n", " with open(json_file, 'w') as file:\n", " json.dump(cache, file)" - ], - "metadata": { - "id": "jx1CiKOcwKGn" - }, - "execution_count": 17, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "t9AdmnhQd2E8" + }, "source": [ "These functions will be used within the `SemanticCache` class, which includes the search function and its initialization function.\n", "\n", @@ -931,26 +890,60 @@ "Afterward, checks if it is within the specified threshold. If positive, it directly returns the response from the cache; otherwise, it calls the `query_database` function to retrieve the data from ChromaDB.\n", "\n", "I've used Euclidean distance instead of Cosine, which is widely employed in vector comparisons. This choice is based on the fact that Euclidean distance is the default metric used by Faiss. Although Cosine distance can also be calculated, doing so adds complexity that may not significantly contribute to the final result.\n" - ], - "metadata": { - "id": "t9AdmnhQd2E8" - } + ] }, { "cell_type": "code", + "execution_count": 51, + "metadata": { + "execution": { + "iopub.execute_input": "2024-02-29T17:36:51.31678Z", + "iopub.status.busy": "2024-02-29T17:36:51.316449Z", + "iopub.status.idle": "2024-02-29T17:36:55.197427Z", + "shell.execute_reply": "2024-02-29T17:36:55.196616Z", + "shell.execute_reply.started": "2024-02-29T17:36:51.316746Z" + }, + "id": "t_HVtwww9TD2", + "trusted": true + }, + "outputs": [], "source": [ "class semantic_cache:\n", - " def __init__(self, json_file=\"cache_file.json\", thresold=0.35):\n", - " # Initialize Faiss index with Euclidean distance\n", - " self.index, self.encoder = init_cache()\n", + " def __init__(self, json_file=\"cache_file.json\", thresold=0.35, max_response=100, eviction_policy=None):\n", + " \"\"\"Initializes the semantic cache.\n", + "\n", + 
" Args:\n", + " json_file (str): The name of the JSON file where the cache is stored.\n", + " thresold (float): The threshold for the Euclidean distance to determine if a question is similar.\n", + " max_response (int): The maximum number of responses the cache can store.\n", + " eviction_policy (str): The policy for evicting items from the cache. \n", + " This can be 'LRU' (Least Recently Used) or 'FIFO' (First In First Out).\n", + " If None, no eviction policy will be applied.\n", + " \"\"\"\n", + " \n", + " # Initialize Faiss index with Euclidean distance\n", + " self.index, self.encoder = init_cache()\n", + "\n", + " # Set Euclidean distance threshold\n", + " # a distance of 0 means identical sentences\n", + " # We only return from cache sentences under this threshold\n", + " self.euclidean_threshold = thresold\n", "\n", - " # Set Euclidean distance threshold\n", - " # a distance of 0 means identicals sentences\n", - " # We only return from cache sentences under this thresold\n", - " self.euclidean_threshold = thresold\n", + " self.json_file = json_file\n", + " self.cache = retrieve_cache(self.json_file)\n", + " self.max_response = max_response\n", + " self.eviction_policy = eviction_policy\n", "\n", - " self.json_file = json_file\n", - " self.cache = retrieve_cache(self.json_file)\n", + " def evict(self):\n", + "\n", + " \"\"\"Evicts an item from the cache based on the eviction policy.\"\"\"\n", + " if self.eviction_policy and len(self.cache[\"questions\"]) > self.max_response:\n", + " for _ in range((len(self.cache[\"questions\"]) - self.max_response)):\n", + " if self.eviction_policy == 'FIFO':\n", + " self.cache[\"questions\"].pop(0)\n", + " self.cache[\"embeddings\"].pop(0)\n", + " self.cache[\"answers\"].pop(0)\n", + " self.cache[\"response_text\"].pop(0)\n", "\n", " def ask(self, question: str) -> str:\n", " # Method to retrieve an answer from the cache or generate a new one\n", @@ -991,6 +984,9 @@ " print(f'response_text: {response_text}')\n", "\n", " 
self.index.add(embedding)\n", + "\n", + " self.evict()\n", + "\n", " store_cache(self.json_file, self.cache)\n", " end_time = time.time()\n", " elapsed_time = end_time - start_time\n", @@ -999,169 +995,153 @@ " return response_text\n", " except Exception as e:\n", " raise RuntimeError(f\"Error during 'ask' method: {e}\")\n" - ], - "metadata": { - "execution": { - "iopub.status.busy": "2024-02-29T17:36:51.316449Z", - "iopub.execute_input": "2024-02-29T17:36:51.31678Z", - "iopub.status.idle": "2024-02-29T17:36:55.197427Z", - "shell.execute_reply.started": "2024-02-29T17:36:51.316746Z", - "shell.execute_reply": "2024-02-29T17:36:55.196616Z" - }, - "trusted": true, - "id": "t_HVtwww9TD2" - }, - "execution_count": 51, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "### Testing the semantic_cache class." - ], "metadata": { "id": "UBWTqGM7i71N" - } + }, + "source": [ + "### Testing the semantic_cache class." + ] }, { "cell_type": "code", - "source": [ - "# Initialize the cache.\n", - "cache = semantic_cache('4cache.json')" - ], + "execution_count": 52, "metadata": { - "id": "JH8s8eUtCMIS", "colab": { "base_uri": "https://localhost:8080/" }, + "id": "JH8s8eUtCMIS", "outputId": "c613bbfc-9f84-4a96-cd39-45972e69c15b" }, - "execution_count": 52, "outputs": [ { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ "Index trained\n" ] } + ], + "source": [ + "# Initialize the cache.\n", + "cache = semantic_cache('4cache.json')" ] }, { "cell_type": "code", - "source": [ - "results = cache.ask(\"How do vaccines work?\")" - ], + "execution_count": 53, "metadata": { - "id": "mKqKLfDe_8bC", - "outputId": "8a92ed95-c822-4382-c6db-d9de289341af", "colab": { "base_uri": "https://localhost:8080/" - } + }, + "id": "mKqKLfDe_8bC", + "outputId": "8a92ed95-c822-4382-c6db-d9de289341af" }, - "execution_count": 53, "outputs": [ { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ "Answer recovered from ChromaDB. 
\n", "response_text: Summary : Shots may hurt a little, but the diseases they can prevent are a lot worse. Some are even life-threatening. Immunization shots, or vaccinations, are essential. They protect against things like measles, mumps, rubella, hepatitis B, polio, tetanus, diphtheria, and pertussis (whooping cough). Immunizations are important for adults as well as children. Your immune system helps your body fight germs by producing substances to combat them. Once it does, the immune system \"remembers\" the germ and can fight it again. Vaccines contain germs that have been killed or weakened. When given to a healthy person, the vaccine triggers the immune system to respond and thus build immunity. Before vaccines, people became immune only by actually getting a disease and surviving it. Immunizations are an easier and less risky way to become immune. NIH: National Institute of Allergy and Infectious Diseases\n", "Time taken: 0.057 seconds\n" ] } + ], + "source": [ + "results = cache.ask(\"How do vaccines work?\")" ] }, { "cell_type": "markdown", + "metadata": { + "id": "dP7H6TypknLN" + }, "source": [ "As expected, this response has been obtained from ChromaDB. The class then stores it in the cache.\n", "\n", "Now, if we send a second question that is quite different, the response should also be retrieved from ChromaDB. This is because the question stored previously is so dissimilar that it would surpass the specified threshold in terms of Euclidean distance." 
- ], - "metadata": { - "id": "dP7H6TypknLN" - } + ] }, { "cell_type": "code", - "source": [ - "\n", - "results = cache.ask(\"Explain briefly what is a Sydenham chorea\")" - ], + "execution_count": 54, "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, "execution": { - "iopub.status.busy": "2024-02-29T17:37:15.335288Z", "iopub.execute_input": "2024-02-29T17:37:15.335593Z", + "iopub.status.busy": "2024-02-29T17:37:15.335288Z", "iopub.status.idle": "2024-02-29T17:37:17.320691Z", - "shell.execute_reply.started": "2024-02-29T17:37:15.335566Z", - "shell.execute_reply": "2024-02-29T17:37:17.319671Z" + "shell.execute_reply": "2024-02-29T17:37:17.319671Z", + "shell.execute_reply.started": "2024-02-29T17:37:15.335566Z" }, - "trusted": true, "id": "CvJykqVf9TD2", "outputId": "7137919e-e417-47b3-a638-18026b3edfe6", - "colab": { - "base_uri": "https://localhost:8080/" - } + "trusted": true }, - "execution_count": 54, "outputs": [ { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ "Answer recovered from ChromaDB. \n", "response_text: Sydenham chorea (SD) is a neurological disorder of childhood resulting from infection via Group A beta-hemolytic streptococcus (GABHS), the bacterium that causes rheumatic fever. SD is characterized by rapid, irregular, and aimless involuntary movements of the arms and legs, trunk, and facial muscles. It affects girls more often than boys and typically occurs between 5 and 15 years of age. Some children will have a sore throat several weeks before the symptoms begin, but the disorder can also strike up to 6 months after the fever or infection has cleared. Symptoms can appear gradually or all at once, and also may include uncoordinated movements, muscular weakness, stumbling and falling, slurred speech, difficulty concentrating and writing, and emotional instability. 
The symptoms of SD can vary from a halting gait and slight grimacing to involuntary movements that are frequent and severe enough to be incapacitating. The random, writhing movements of chorea are caused by an auto-immune reaction to the bacterium that interferes with the normal function of a part of the brain (the basal ganglia) that controls motor movements. Due to better sanitary conditions and the use of antibiotics to treat streptococcal infections, rheumatic fever, and consequently SD, are rare in North America and Europe. The disease can still be found in developing nations.\n", "Time taken: 0.082 seconds\n" ] } + ], + "source": [ + "\n", + "results = cache.ask(\"Explain briefly what is a Sydenham chorea\")" ] }, { "cell_type": "markdown", + "metadata": { + "id": "8aPWvU64lxOU" + }, "source": [ "Perfect, the semantic cache system is behaving as expected.\n", "\n", "Let's proceed to test it with a question very similar to the one we just asked.\n", "\n", "In this case, the response should come directly from the cache without the need to access the ChromaDB database." 
- ], - "metadata": { - "id": "8aPWvU64lxOU" - } + ] }, { "cell_type": "markdown", - "source": [], "metadata": { "id": "sPmmTGGM0pVj" - } + }, + "source": [] }, { "cell_type": "code", - "source": [ - "results = cache.ask(\"Briefly explain me what is a Sydenham chorea.\")" - ], + "execution_count": 55, "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, "execution": { - "iopub.status.busy": "2024-02-29T17:37:17.32865Z", "iopub.execute_input": "2024-02-29T17:37:17.328926Z", + "iopub.status.busy": "2024-02-29T17:37:17.32865Z", "iopub.status.idle": "2024-02-29T17:37:17.463363Z", - "shell.execute_reply.started": "2024-02-29T17:37:17.328902Z", - "shell.execute_reply": "2024-02-29T17:37:17.462397Z" + "shell.execute_reply": "2024-02-29T17:37:17.462397Z", + "shell.execute_reply.started": "2024-02-29T17:37:17.328902Z" }, - "trusted": true, "id": "9_5IcGB-9TD2", "outputId": "13563a7d-01f7-47d1-c345-6ad128f303c3", - "colab": { - "base_uri": "https://localhost:8080/" - } + "trusted": true }, - "execution_count": 55, "outputs": [ { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ "Answer recovered from Cache. \n", "0.028 smaller than 0.35\n", @@ -1170,37 +1150,36 @@ "Time taken: 0.019 seconds\n" ] } + ], + "source": [ + "results = cache.ask(\"Briefly explain me what is a Sydenham chorea.\")" ] }, { "cell_type": "markdown", + "metadata": { + "id": "M4H8RoXFqdwE" + }, "source": [ "The two questions are so similar that their Euclidean distance is truly minimal, almost as if they were identical.\n", "\n", "Now, let's try another question, this time a bit more distinct, and observe how the system behaves." 
- ], - "metadata": { - "id": "M4H8RoXFqdwE" - } + ] }, { "cell_type": "code", - "source": [ - "question_def = \"Write in 20 words what is a Sydenham chorea.\"\n", - "results = cache.ask(question_def)" - ], + "execution_count": 56, "metadata": { - "id": "ysj5P_MBCqju", - "outputId": "d4639f73-dc7e-4c25-93ba-2a8c66dc7c61", "colab": { "base_uri": "https://localhost:8080/" - } + }, + "id": "ysj5P_MBCqju", + "outputId": "d4639f73-dc7e-4c25-93ba-2a8c66dc7c61" }, - "execution_count": 56, "outputs": [ { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ "Answer recovered from Cache. \n", "0.228 smaller than 0.35\n", @@ -1209,19 +1188,26 @@ "Time taken: 0.016 seconds\n" ] } + ], + "source": [ + "question_def = \"Write in 20 words what is a Sydenham chorea.\"\n", + "results = cache.ask(question_def)" ] }, { "cell_type": "markdown", - "source": [ - "We observe that the Euclidean distance has increased, but it still remains within the specified threshold. Therefore, it continues to return the response directly from the cache." - ], "metadata": { "id": "MFzXsQwB9TD3" - } + }, + "source": [ + "We observe that the Euclidean distance has increased, but it still remains within the specified threshold. Therefore, it continues to return the response directly from the cache." 
+ ] }, { "cell_type": "markdown", + "metadata": { + "id": "Ot3wrq0p9TD3" + }, "source": [ "# Loading the model and creating the prompt\n", "Time to use the library **transformers**, the most famous library from [hugging face](https://huggingface.co/) for working with language models.\n", @@ -1231,54 +1217,64 @@ "* **AutoModelForCausalLM**: it provides an interface to pre-trained language models specifically designed for language generation tasks using causal language modeling (e.g., GPT models), or the model used in this notebook [Gemma-2b-it](https://huggingface.co/google/gemma-2b-it).\n", "\n", "Please, feel free to test [different Models](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending), you need to search for NLP models trained for text-generation.\n" - ], - "metadata": { - "id": "Ot3wrq0p9TD3" - } + ] }, { "cell_type": "code", - "source": [ - "!pip install torch" - ], + "execution_count": null, "metadata": { "execution": { - "iopub.status.busy": "2024-02-29T17:40:32.797334Z", "iopub.execute_input": "2024-02-29T17:40:32.797669Z", + "iopub.status.busy": "2024-02-29T17:40:32.797334Z", "iopub.status.idle": "2024-02-29T17:40:44.152114Z", - "shell.execute_reply.started": "2024-02-29T17:40:32.797635Z", - "shell.execute_reply": "2024-02-29T17:40:44.151056Z" + "shell.execute_reply": "2024-02-29T17:40:44.151056Z", + "shell.execute_reply.started": "2024-02-29T17:40:32.797635Z" }, - "trusted": true, - "id": "tdxiKqjT9TD3" + "id": "tdxiKqjT9TD3", + "trusted": true }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!pip install torch" + ] }, { "cell_type": "code", - "source": [ - "from torch import cuda, torch\n", - "#In a MAC Silicon the device must be 'mps'\n", - "# device = torch.device('mps') #to use with MAC Silicon\n", - "device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'" - ], + "execution_count": 25, "metadata": { "execution": { - "iopub.status.busy": "2024-02-29T17:40:44.153914Z", 
"iopub.execute_input": "2024-02-29T17:40:44.15434Z", + "iopub.status.busy": "2024-02-29T17:40:44.153914Z", "iopub.status.idle": "2024-02-29T17:40:44.160144Z", - "shell.execute_reply.started": "2024-02-29T17:40:44.154292Z", - "shell.execute_reply": "2024-02-29T17:40:44.159154Z" + "shell.execute_reply": "2024-02-29T17:40:44.159154Z", + "shell.execute_reply.started": "2024-02-29T17:40:44.154292Z" }, - "trusted": true, - "id": "pIDMTCnH9TD7" + "id": "pIDMTCnH9TD7", + "trusted": true }, - "execution_count": 25, - "outputs": [] + "outputs": [], + "source": [ + "from torch import cuda, torch\n", + "#In a MAC Silicon the device must be 'mps'\n", + "# device = torch.device('mps') #to use with MAC Silicon\n", + "device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2024-02-29T17:41:25.628804Z", + "iopub.status.busy": "2024-02-29T17:41:25.628412Z", + "iopub.status.idle": "2024-02-29T17:41:30.202141Z", + "shell.execute_reply": "2024-02-29T17:41:30.200774Z", + "shell.execute_reply.started": "2024-02-29T17:41:25.628766Z" + }, + "id": "CU2T4lp-9TD7", + "trusted": true + }, + "outputs": [], "source": [ "from transformers import AutoTokenizer, AutoModelForCausalLM\n", "\n", @@ -1287,30 +1283,20 @@ "model = AutoModelForCausalLM.from_pretrained(model_id,\n", " device_map=\"cuda\",\n", " torch_dtype=torch.bfloat16)" - ], - "metadata": { - "execution": { - "iopub.status.busy": "2024-02-29T17:41:25.628412Z", - "iopub.execute_input": "2024-02-29T17:41:25.628804Z", - "iopub.status.idle": "2024-02-29T17:41:30.202141Z", - "shell.execute_reply.started": "2024-02-29T17:41:25.628766Z", - "shell.execute_reply": "2024-02-29T17:41:30.200774Z" - }, - "trusted": true, - "id": "CU2T4lp-9TD7" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [], "metadata": { "id": "0kdqsEbUEywG" - } + }, + "source": [] }, { "cell_type": 
"markdown", + "metadata": { + "id": "GzHuFrAX9TD7" + }, "source": [ "## Creating the extended prompt\n", "To create the prompt we use the result from query the 'semantic_cache' class and the question introduced by the user.\n", @@ -1318,81 +1304,73 @@ "The prompt have two parts, the **relevant context** that is the information recovered from the database and the **user's question**.\n", "\n", "We only need to put the two parts together to create the prompt then send it to the model." - ], - "metadata": { - "id": "GzHuFrAX9TD7" - } + ] }, { "cell_type": "code", - "source": [ - "prompt_template = f\"Relevant context: {results}\\n\\n The user's question: {question_def}\"\n", - "prompt_template" - ], + "execution_count": 44, "metadata": { - "id": "TdjbfAHhFuhS", - "outputId": "4090da66-328e-478e-c2d7-1957597f8786", "colab": { "base_uri": "https://localhost:8080/", "height": 209 - } + }, + "id": "TdjbfAHhFuhS", + "outputId": "4090da66-328e-478e-c2d7-1957597f8786" }, - "execution_count": 44, "outputs": [ { - "output_type": "execute_result", "data": { - "text/plain": [ - "\"Relevant context: Sydenham chorea (SD) is a neurological disorder of childhood resulting from infection via Group A beta-hemolytic streptococcus (GABHS), the bacterium that causes rheumatic fever. SD is characterized by rapid, irregular, and aimless involuntary movements of the arms and legs, trunk, and facial muscles. It affects girls more often than boys and typically occurs between 5 and 15 years of age. Some children will have a sore throat several weeks before the symptoms begin, but the disorder can also strike up to 6 months after the fever or infection has cleared. Symptoms can appear gradually or all at once, and also may include uncoordinated movements, muscular weakness, stumbling and falling, slurred speech, difficulty concentrating and writing, and emotional instability. 
The symptoms of SD can vary from a halting gait and slight grimacing to involuntary movements that are frequent and severe enough to be incapacitating. The random, writhing movements of chorea are caused by an auto-immune reaction to the bacterium that interferes with the normal function of a part of the brain (the basal ganglia) that controls motor movements. Due to better sanitary conditions and the use of antibiotics to treat streptococcal infections, rheumatic fever, and consequently SD, are rare in North America and Europe. The disease can still be found in developing nations.\\n\\n The user's question: Write in 20 words what is a Sydenham chorea.\"" - ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" - } + }, + "text/plain": [ + "\"Relevant context: Sydenham chorea (SD) is a neurological disorder of childhood resulting from infection via Group A beta-hemolytic streptococcus (GABHS), the bacterium that causes rheumatic fever. SD is characterized by rapid, irregular, and aimless involuntary movements of the arms and legs, trunk, and facial muscles. It affects girls more often than boys and typically occurs between 5 and 15 years of age. Some children will have a sore throat several weeks before the symptoms begin, but the disorder can also strike up to 6 months after the fever or infection has cleared. Symptoms can appear gradually or all at once, and also may include uncoordinated movements, muscular weakness, stumbling and falling, slurred speech, difficulty concentrating and writing, and emotional instability. The symptoms of SD can vary from a halting gait and slight grimacing to involuntary movements that are frequent and severe enough to be incapacitating. The random, writhing movements of chorea are caused by an auto-immune reaction to the bacterium that interferes with the normal function of a part of the brain (the basal ganglia) that controls motor movements. 
Due to better sanitary conditions and the use of antibiotics to treat streptococcal infections, rheumatic fever, and consequently SD, are rare in North America and Europe. The disease can still be found in developing nations.\\n\\n The user's question: Write in 20 words what is a Sydenham chorea.\"" + ] }, + "execution_count": 44, "metadata": {}, - "execution_count": 44 + "output_type": "execute_result" } + ], + "source": [ + "prompt_template = f\"Relevant context: {results}\\n\\n The user's question: {question_def}\"\n", + "prompt_template" ] }, { "cell_type": "code", - "source": [ - "input_ids = tokenizer(prompt_template, return_tensors=\"pt\").to(\"cuda\")" - ], + "execution_count": 45, "metadata": { "id": "DmYAcXEEECnz" }, - "execution_count": 45, - "outputs": [] + "outputs": [], + "source": [ + "input_ids = tokenizer(prompt_template, return_tensors=\"pt\").to(\"cuda\")" + ] }, { "cell_type": "markdown", - "source": [ - "Now all that remains is to send the prompt to the model and wait for its response!\n" - ], "metadata": { "id": "S-QXeuJ09TD8" - } + }, + "source": [ + "Now all that remains is to send the prompt to the model and wait for its response!\n" + ] }, { "cell_type": "code", - "source": [ - "outputs = model.generate(**input_ids,\n", - " max_new_tokens=256)\n", - "print(tokenizer.decode(outputs[0]))" - ], + "execution_count": 46, "metadata": { - "id": "lheL8vHpEMDD", - "outputId": "b646d648-b88d-4a29-ab30-427d00296255", "colab": { "base_uri": "https://localhost:8080/" - } + }, + "id": "lheL8vHpEMDD", + "outputId": "b646d648-b88d-4a29-ab30-427d00296255" }, - "execution_count": 46, "outputs": [ { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ "Relevant context: Sydenham chorea (SD) is a neurological disorder of childhood resulting from infection via Group A beta-hemolytic streptococcus (GABHS), the bacterium that causes rheumatic fever. 
SD is characterized by rapid, irregular, and aimless involuntary movements of the arms and legs, trunk, and facial muscles. It affects girls more often than boys and typically occurs between 5 and 15 years of age. Some children will have a sore throat several weeks before the symptoms begin, but the disorder can also strike up to 6 months after the fever or infection has cleared. Symptoms can appear gradually or all at once, and also may include uncoordinated movements, muscular weakness, stumbling and falling, slurred speech, difficulty concentrating and writing, and emotional instability. The symptoms of SD can vary from a halting gait and slight grimacing to involuntary movements that are frequent and severe enough to be incapacitating. The random, writhing movements of chorea are caused by an auto-immune reaction to the bacterium that interferes with the normal function of a part of the brain (the basal ganglia) that controls motor movements. Due to better sanitary conditions and the use of antibiotics to treat streptococcal infections, rheumatic fever, and consequently SD, are rare in North America and Europe. 
The disease can still be found in developing nations.\n", "\n", @@ -1403,10 +1381,25 @@ "Sydenham chorea is a neurological disorder of childhood resulting from infection via Group A beta-hemolytic streptococcus (GABHS).\n" ] } + ], + "source": [ + "outputs = model.generate(**input_ids,\n", + " max_new_tokens=256)\n", + "print(tokenizer.decode(outputs[0]))" ] }, { "cell_type": "markdown", + "metadata": { + "execution": { + "iopub.execute_input": "2023-07-12T22:01:56.993351Z", + "iopub.status.busy": "2023-07-12T22:01:56.992775Z", + "iopub.status.idle": "2023-07-12T22:01:57.001309Z", + "shell.execute_reply": "2023-07-12T22:01:56.999431Z", + "shell.execute_reply.started": "2023-07-12T22:01:56.993305Z" + }, + "id": "Uo7lGXBV9TD8" + }, "source": [ "# Conclusion.\n", "There's a 50% reduction in data retrieval time between accessing ChromaDB and going directly to the cache. However, in larger projects, this difference increases, leading to enhancements of 90-95%.\n", @@ -1416,17 +1409,48 @@ "It's common to have multiple instances of the cache class, usually based on user typology, as questions tend to repeat more among users who share common traits.\n", "\n", "In summary, we have created a very simple RAG (Retrieval-Augmented Generation) system and enhanced it with a semantic cache layer between the user's question and obtaining the information necessary to create the enriched prompt." 
+ ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "machine_shape": "hm", + "provenance": [] + }, + "kaggle": { + "accelerator": "gpu", + "dataSources": [ + { + "datasetId": 3496946, + "sourceId": 6104553, + "sourceType": "datasetVersion" + } ], - "metadata": { - "execution": { - "iopub.status.busy": "2023-07-12T22:01:56.992775Z", - "iopub.execute_input": "2023-07-12T22:01:56.993351Z", - "iopub.status.idle": "2023-07-12T22:01:57.001309Z", - "shell.execute_reply.started": "2023-07-12T22:01:56.993305Z", - "shell.execute_reply": "2023-07-12T22:01:56.999431Z" - }, - "id": "Uo7lGXBV9TD8" - } + "dockerImageVersionId": 30527, + "isGpuEnabled": true, + "isInternetEnabled": true, + "language": "python", + "sourceType": "notebook" + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" } - ] -} \ No newline at end of file + }, + "nbformat": 4, + "nbformat_minor": 0 +} From 99e61298bb7e0729d6b829d19322f82a6e1ac3f0 Mon Sep 17 00:00:00 2001 From: shubhagyta swaraj Date: Tue, 28 May 2024 00:08:09 +0530 Subject: [PATCH 2/3] Added eviction policy to semantic_cache class --- .../en/semantic_cache_chroma_vector_database.ipynb | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/notebooks/en/semantic_cache_chroma_vector_database.ipynb b/notebooks/en/semantic_cache_chroma_vector_database.ipynb index 99dfd052..5b7aac73 100644 --- a/notebooks/en/semantic_cache_chroma_vector_database.ipynb +++ b/notebooks/en/semantic_cache_chroma_vector_database.ipynb @@ -889,7 +889,14 @@ "\n", "Afterward, checks if it is within the specified threshold. 
If positive, it directly returns the response from the cache; otherwise, it calls the `query_database` function to retrieve the data from ChromaDB.\n", "\n", - "I've used Euclidean distance instead of Cosine, which is widely employed in vector comparisons. This choice is based on the fact that Euclidean distance is the default metric used by Faiss. Although Cosine distance can also be calculated, doing so adds complexity that may not significantly contribute to the final result.\n" + "I've used Euclidean distance instead of Cosine, which is widely employed in vector comparisons. This choice is based on the fact that Euclidean distance is the default metric used by Faiss. Although Cosine distance can also be calculated, doing so adds complexity that may not significantly contribute to the final result.\n", + "\n", + "I have added a FIFO eviction policy to the `semantic_cache` class so that users can control how the cache behaves once it reaches its maximum capacity (`max_response`). Bounding the cache keeps it from growing without limit, which matters for cache performance and for situations where the available memory is constrained. \n", + "\n", + "Looking at the structure of the cache, the implementation of FIFO seemed straightforward. Whenever a new question-answer pair is added to the cache, it's appended to the end of the lists. Thus, the oldest (first-in) items are at the front of the lists. When the cache reaches its maximum size and you need to evict an item, you remove (pop) the first item from each list. This is the FIFO eviction policy. \n", + "\n", + "\n", + "However, for LRU (Least-Recently-Used), the implementation is more complex because this policy requires knowledge of when each item in the cache was last accessed.
\n" ] }, { @@ -988,6 +995,7 @@ " self.evict()\n", "\n", " store_cache(self.json_file, self.cache)\n", + " \n", " end_time = time.time()\n", " elapsed_time = end_time - start_time\n", " print(f\"Time taken: {elapsed_time:.3f} seconds\")\n", From 2b349acff943c4af13296e7b4409d81e989daa41 Mon Sep 17 00:00:00 2001 From: Shubhagyta Swaraj Jayswal Date: Thu, 30 May 2024 02:25:51 +0530 Subject: [PATCH 3/3] removed LRU from the docstring, will be implemented later --- notebooks/en/semantic_cache_chroma_vector_database.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/notebooks/en/semantic_cache_chroma_vector_database.ipynb b/notebooks/en/semantic_cache_chroma_vector_database.ipynb index 5b7aac73..db4395c2 100644 --- a/notebooks/en/semantic_cache_chroma_vector_database.ipynb +++ b/notebooks/en/semantic_cache_chroma_vector_database.ipynb @@ -896,7 +896,7 @@ "Looking at the structure of the cache, the implementation of FIFO seemed straightforward. Whenever a new question-answer pair is added to the cache, it's appended to the end of the lists. Thus, the oldest (first-in) items are at the front of the lists. When the cache reaches its maximum size and you need to evict an item, you remove (pop) the first item from each list. This is the FIFO eviction policy. \n", "\n", "\n", - "However, for LRU (Least-Recently-Used), the implementation is more complex because this policy requires knowledge of when each item in the cache was last accessed. \n" + "Another eviction policy is the Least Recently Used (LRU) policy, which is more complex because it requires knowledge of when each item in the cache was last accessed. 
However, this policy is not yet available and will be implemented later.\n" ] }, { @@ -924,7 +924,7 @@ " thresold (float): The threshold for the Euclidean distance to determine if a question is similar.\n", " max_response (int): The maximum number of responses the cache can store.\n", " eviction_policy (str): The policy for evicting items from the cache. \n", - " This can be 'LRU' (Least Recently Used) or 'FIFO' (First In First Out).\n", + " Only 'FIFO' (First In First Out) is implemented for now; other values are not handled.\n", " If None, no eviction policy will be applied.\n", " \"\"\"\n", " \n",