From e0353f795325b5adc2671f82bc2ccde31c70f012 Mon Sep 17 00:00:00 2001 From: lloydmeta Date: Fri, 21 Jun 2024 14:09:41 +0900 Subject: [PATCH 1/8] Add an RAG via Elasticsearch cookbook I thought it would be useful to provide a cookbook for a RAG backed by Elasticsearch, Huggingface models, and Gemma. Users can easily toggle between vectorisation that is offloaded to ES and self-provided (pre)-vectorisation. This is based on the excellent MongoDB notebook and dataset. Signed-off-by: lloydmeta --- notebooks/en/_toctree.yml | 2 + ...ith_hugging_face_gemma_elasticsearch.ipynb | 6320 +++++++++++++++++ 2 files changed, 6322 insertions(+) create mode 100644 notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb diff --git a/notebooks/en/_toctree.yml b/notebooks/en/_toctree.yml index 85b5853..fd0a37c 100644 --- a/notebooks/en/_toctree.yml +++ b/notebooks/en/_toctree.yml @@ -27,6 +27,8 @@ title: Detecting Issues in a Text Dataset with Cleanlab - local: annotate_text_data_transformers_via_active_learning title: Annotate text data using Active Learning with Cleanlab + - local: rag_with_hugging_face_gemma_elasticsearch + title: Building a RAG System with Gemma, Elasticsearch and Open Source Models - local: rag_with_hugging_face_gemma_mongodb title: Building A RAG System with Gemma, MongoDB and Open Source Models - local: rag_zephyr_langchain diff --git a/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb b/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb new file mode 100644 index 0000000..5f5b7c2 --- /dev/null +++ b/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb @@ -0,0 +1,6320 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "qsmx4MGD6QSp" + }, + "source": [ + "# Building A RAG System with Gemma, Elasticsearch and Huggingface Models\n", + "\n", + "\n", + " \"Open\n", + "\n", + "\n", + "\n", + "Authored By: [lloydmeta](https://huggingface.co/lloydmeta)\n", + "\n", + "This notebook walks you through building a 
Retrieve-Augmented-Generation (RAG) powered by Elasticsearch (ES) and Huggingface models, letting you toggle between ES-vectorising vs self-vectorising.\n", + "\n", + "**Note**: this notebook has been tested with ES 8.12.2." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BIL0BjjF6QSt" + }, + "source": [ + "## Step 0: Installing Libraries\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gVSo_nNOUsdn" + }, + "outputs": [], + "source": [ + "!pip install datasets elasticsearch sentence_transformers transformers eland accelerate" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "asQZzrNASBPI" + }, + "source": [ + "## Step 1: Set up\n", + "\n", + "### Credentials\n", + "\n", + "#### Huggingface\n", + "This allows you to authenticate with Huggingface to download models and datasets." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NL-NG4jjXb0I" + }, + "outputs": [], + "source": [ + "from huggingface_hub import notebook_login\n", + "\n", + "notebook_login()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ov9J5o5AEjzK" + }, + "source": [ + "#### Elasticsearch deployment\n", + "\n", + "Let's make sure that you can access your Elasticsearch deployment. If you don't have one, create one at [Elastic Cloud](https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-a-cloud-deployment)\n", + "\n", + "Ensure you have `CLOUD_ID` and `ELASTIC_DEPL_API_KEY` saved as Colab secrets.\n", + "See [this tweet for details](https://twitter.com/GoogleColab/status/1719798406195867814)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "2pSSD57kn14y" + }, + "outputs": [], + "source": [ + "from google.colab import userdata\n", + "\n", + "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id\n", + "CLOUD_ID = userdata.get(\"CLOUD_ID\") # or \"\"\n", + "\n", + "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key\n", + "ELASTIC_API_KEY = userdata.get(\"ELASTIC_DEPL_API_KEY\") # or \"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uewOyerWGx9p" + }, + "source": [ + "Set up the client and make sure the credentials work." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "WDt5s-AFYVZE", + "outputId": "baa481f9-50d6-43fd-9d87-78cadbd75dd6" + }, + "outputs": [], + "source": [ + "from elasticsearch import Elasticsearch, helpers\n", + "\n", + "# Create the client instance\n", + "client = Elasticsearch(cloud_id=CLOUD_ID, api_key=ELASTIC_API_KEY)\n", + "\n", + "# Successful response!\n", + "client.info()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xuczg29FFZVN" + }, + "source": [ + "### Choose data and query vectorisation options\n", + "\n", + "Here, you need to make a decision: do you want Elasticsearch to vectorise your data and queries, or do you want to do it yourself?\n", + "\n", + "Setting `USE_ELASTICSEARCH_VECTORISATION` to `True` will make the rest of this notebook set up and use ES-hosted-vectorisation for your data and your querying, but **BE AWARE** that this requires your ES deployment to have at least 1 ML node (I would recommend setting autoscaling to true on your Cloud deployment in case the model you chose is to obig).\n", + "\n", + "If `USE_ELASTICSEARCH_VECTORISATION` is `False`, this notebook will set up and use the provided model \"locally\" for data and query 
vectorisation.\n", + "\n", + "What should you use for your use case? *It depends* πŸ€·β€β™‚οΈ. Running vectorisation on ES means your clients don't have to implement it, so that's the default here; however, if you don't have any ML nodes, or your own embedding setup is better/faster, feel free to toggle it to `False`!\n", + "\n", + "**Note**: if you change these values, you'll likely need to re-run the notebook from this step." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "whuUt6GySrkk" + }, + "outputs": [], + "source": [ + "USE_ELASTICSEARCH_VECTORISATION = True\n", + "\n", + "EMBEDDING_MODEL_ID = \"thenlper/gte-small\"\n", + "# https://huggingface.co/thenlper/gte-small's page shows the dimensions of the model\n", + "# If you use the `gte-base` or `gte-large` embedding models, the\n", + "# EMBEDDING_DIMENSIONS value below must be set to 768 and 1024, respectively.\n", + "EMBEDDING_DIMENSIONS = 384" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kF3A7uGc6QSv" + }, + "source": [ + "## Step 2: Data sourcing and preparation\n", + "\n", + "The data utilised in this tutorial is sourced from Hugging Face datasets, specifically the\n", + "[MongoDB/embedded_movies dataset](https://huggingface.co/datasets/MongoDB/embedded_movies)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 232, + "referenced_widgets": [ + "3416975d721243048c39b99af297af44", + "c96f16ab241d4b60813235ed2d5e3580", + "a728d91c3714483a93469ec826e19e08", + "23f3b48bb1644f91b2733f2045bae5f5", + "9a4d9e10fd824105a3a0f66dbcbe9767", + "77a8d8603e81430ab884314918ca45f9", + "a988b6a810b34e7495c086fde8dedd7f", + "a676673d26834b628283d76b5d4302dc", + "01d348af5fe547f4b24e98539cf91f2f", + "dacffd74d0bf46d99f293cfc4430d0f4", + "12e80316a20e4fef8d19139dc5aaf240", + "0e5e3e07ae6a44df99b963259178fe87", + "9e30a1f48eb742ff91b00e4951c2902f", + "27f80f2d47f14feeb6cc0c02076290d5", + "7e7695b6889d42f6afa66d61e0b9f000", + "8149fd2e5eee427cb15ab22644da99ea", + "d3f7f9a12c004d62a48db61ae1bb413a", + "b78cf8ca5f244198a272ab4501c3f28d", + "67784de672194cb885e40ba4d43bbeb4", + "c22e87fbd1754eb6abd336d111e699a2", + "bf91048aaafc45ceb56c83755f9f140e", + "0cef3c2b245241ad938a8b8c9a00e335", + "ab7454174422451b94a6d1beea3a4b61", + "7de4ad29fc594f7db60ccac865e945c4", + "6b74827e673e4530aac1ea1defae1b8c", + "dae854ebfff04e0ebe132e9588195c17", + "3ba409ad57e84f86950ed12b611b5bf2", + "5e418aea478c4c28a95a4ac7c5fb420e", + "6f3cfa2a8e374a0faed68861d19db428", + "1d1f2d5311934718baaded44f1e495ec", + "ac3a6ce7852340089cceff5f1ce18ee3", + "17a8be19a6b1465bbf00d4a0d905841e", + "491da9ab200d4680a4b55cfb8b0d9c4f" + ] + }, + "id": "5gCzss27UwWw", + "outputId": "862ad48b-34ad-4206-c93e-c82f7e587638" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3416975d721243048c39b99af297af44", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Downloading readme: 0%| | 0.00/6.18k [00:00 list[float]:\n", + " if USE_ELASTICSEARCH_VECTORISATION:\n", + " raise Exception(\n", + " f\"Disabled when USE_ELASTICSEARCH_VECTORISATION is [{USE_ELASTICSEARCH_VECTORISATION}]\"\n", + " )\n", + " else:\n", + " if not text.strip():\n", + " 
print(\"Attempted to get embedding for empty text.\")\n", + " return []\n", + "\n", + " embedding = embedding_model.encode(text)\n", + " return embedding.tolist()\n", + "\n", + "\n", + "def add_fullplot_embedding(x):\n", + " if USE_ELASTICSEARCH_VECTORISATION:\n", + " raise Exception(\n", + " f\"Disabled when USE_ELASTICSEARCH_VECTORISATION is [{USE_ELASTICSEARCH_VECTORISATION}]\"\n", + " )\n", + " else:\n", + " full_plots = x[\"fullplot\"]\n", + " return {\"embedding\": [get_embedding(full_plot) for full_plot in full_plots]}\n", + "\n", + "\n", + "if not USE_ELASTICSEARCH_VECTORISATION:\n", + " dataset = dataset.map(add_fullplot_embedding, batched=True)\n", + " dataset[\"train\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7gZ5fno6QSw" + }, + "source": [ + "## Step 4: Create a Search Index with vector search mappings.\n", + "\n", + "At this point, we create an index in Elasticsearch with the right index mappings to handle vector searches.\n", + "\n", + "Go here to read more about [Elasticsearch vector capabilities](https://www.elastic.co/what-is/vector-search)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "n3gERSl_uFO2", + "outputId": "3307ca4d-6a32-4a6c-dfe3-b8cc3280d938" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Creating index movies\n" + ] + }, + { + "data": { + "text/plain": [ + "ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'movies'})" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Needs to match the id returned from Eland\n", + "# in general for Huggingface models, you just replace the forward slash with\n", + "# double underscore\n", + "model_id = EMBEDDING_MODEL_ID.replace(\"/\", \"__\")\n", + "\n", + "index_name = \"movies\"\n", + "\n", + "index_mapping = {\n", + " \"properties\": {\n", + " \"fullplot\": {\"type\": \"text\"},\n", + " \"plot\": {\"type\": \"text\"},\n", + " \"title\": {\"type\": \"text\"},\n", + " }\n", + "}\n", + "# define index mapping\n", + "if USE_ELASTICSEARCH_VECTORISATION:\n", + " index_mapping[\"properties\"][\"embedding\"] = {\n", + " \"properties\": {\n", + " \"is_truncated\": {\"type\": \"boolean\"},\n", + " \"model_id\": {\n", + " \"type\": \"text\",\n", + " \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}},\n", + " },\n", + " \"predicted_value\": {\n", + " \"type\": \"dense_vector\",\n", + " \"dims\": EMBEDDING_DIMENSIONS,\n", + " \"index\": True,\n", + " \"similarity\": \"cosine\",\n", + " },\n", + " }\n", + " }\n", + "else:\n", + " index_mapping[\"properties\"][\"embedding\"] = {\n", + " \"type\": \"dense_vector\",\n", + " \"dims\": EMBEDDING_DIMENSIONS,\n", + " \"index\": \"true\",\n", + " \"similarity\": \"cosine\",\n", + " }\n", + "\n", + "# flag to check if index has to be deleted before creating\n", + "should_delete_index = True\n", + "\n", + "# check if we want to delete index before creating the index\n", + "if 
should_delete_index:\n", + " if client.indices.exists(index=index_name):\n", + " print(\"Deleting existing %s\" % index_name)\n", + " client.indices.delete(index=index_name, ignore=[400, 404])\n", + "\n", + "print(\"Creating index %s\" % index_name)\n", + "\n", + "\n", + "# ingest pipeline definition\n", + "if USE_ELASTICSEARCH_VECTORISATION:\n", + " pipeline_id = \"vectorize_fullplots\"\n", + "\n", + " client.ingest.put_pipeline(\n", + " id=pipeline_id,\n", + " processors=[\n", + " {\n", + " \"inference\": {\n", + " \"model_id\": model_id,\n", + " \"target_field\": \"embedding\",\n", + " \"field_map\": {\"fullplot\": \"text_field\"},\n", + " }\n", + " }\n", + " ],\n", + " )\n", + "\n", + " index_settings = {\n", + " \"index\": {\n", + " \"default_pipeline\": pipeline_id,\n", + " }\n", + " }\n", + "else:\n", + " index_settings = {}\n", + "\n", + "client.options(ignore_status=[400, 404]).indices.create(\n", + " index=index_name, mappings=index_mapping, settings=index_settings\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "neANZEH96QSx" + }, + "source": [ + "Ingesting data into a Elasticsearch is best done in batches. Luckily `helpers` offers an esasy way to do this." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "mH2BAuhYva6U", + "outputId": "0beaf822-1052-4f65-cfd2-eb8cec96e666" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "batch: start [0], end [100]\n", + "batch: start [100], end [200]\n", + "batch: start [200], end [300]\n", + "batch: start [300], end [400]\n", + "batch: start [400], end [500]\n", + "batch: start [500], end [600]\n", + "batch: start [600], end [700]\n", + "batch: start [700], end [800]\n", + "batch: start [800], end [900]\n", + "batch: start [900], end [1000]\n", + "batch: start [1000], end [1100]\n", + "batch: start [1100], end [1200]\n", + "batch: start [1200], end [1300]\n", + "batch: start [1300], end [1400]\n", + "batch: start [1400], end [1452]\n", + "Data ingestion into Elasticsearch complete!\n" + ] + } + ], + "source": [ + "from elasticsearch.helpers import BulkIndexError\n", + "\n", + "def batch_to_bulk_actions(batch):\n", + " for record in batch:\n", + " action = {\n", + " \"_index\": \"movies\",\n", + " \"_source\": {\n", + " \"title\": record[\"title\"],\n", + " \"fullplot\": record[\"fullplot\"],\n", + " \"plot\": record[\"plot\"],\n", + " },\n", + " }\n", + " if not USE_ELASTICSEARCH_VECTORISATION:\n", + " action[\"_source\"][\"embedding\"] = record[\"embedding\"]\n", + " yield action\n", + "\n", + "\n", + "def bulk_index(ds):\n", + " start = 0\n", + " end = len(ds)\n", + " batch_size = 100\n", + " if USE_ELASTICSEARCH_VECTORISATION:\n", + " # If using auto-embedding, bulk requests can take a lot longer,\n", + " # so pass a longer request_timeout here (defaults to 10s), otherwise\n", + " # we could get Connection timeouts\n", + " batch_client = client.options(request_timeout=600)\n", + " else:\n", + " batch_client = client\n", + " for batch_start in range(start, end, batch_size):\n", + " batch_end = min(batch_start + batch_size, end)\n", + " print(f\"batch: 
start [{batch_start}], end [{batch_end}]\")\n", + " batch = ds.select(range(batch_start, batch_end))\n", + " actions = batch_to_bulk_actions(batch)\n", + " helpers.bulk(batch_client, actions)\n", + "\n", + "\n", + "try:\n", + " bulk_index(dataset[\"train\"])\n", + "except BulkIndexError as e:\n", + " print(f\"{e.errors}\")\n", + "\n", + "print(\"Data ingestion into Elasticsearch complete!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rDl8GBg_6QSx" + }, + "source": [ + "## Step 5: Perform Vector Search on User Queries\n", + "\n", + "The following step implements a function that returns a vector search result.\n", + "\n", + "If `USE_ELASTICSEARCH_VECTORISATION` is true, the text query is sent directly to\n", + "ES where the uploaded model will be used to vectorise it first before doing a vector search. If `USE_ELASTICSEARCH_VECTORISATION` is false, then we do the\n", + "vectorising locally before sending a query with the vectorised form of the query." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "id": "e9RLHJsdwG44" + }, + "outputs": [], + "source": [ + "def vector_search(plot_query):\n", + " if USE_ELASTICSEARCH_VECTORISATION:\n", + " knn = {\n", + " \"field\": \"embedding.predicted_value\",\n", + " \"k\": 10,\n", + " \"query_vector_builder\": {\n", + " \"text_embedding\": {\n", + " \"model_id\": model_id,\n", + " \"model_text\": plot_query,\n", + " }\n", + " },\n", + " \"num_candidates\": 150,\n", + " }\n", + " else:\n", + " question_embedding = get_embedding(plot_query)\n", + " knn = {\n", + " \"field\": \"embedding\",\n", + " \"query_vector\": question_embedding,\n", + " \"k\": 10,\n", + " \"num_candidates\": 150,\n", + " }\n", + "\n", + " response = client.search(index=\"movies\", knn=knn, size=5)\n", + " results = []\n", + " for hit in response[\"hits\"][\"hits\"]:\n", + " id = hit[\"_id\"]\n", + " score = hit[\"_score\"]\n", + " title = hit[\"_source\"][\"title\"]\n", + " plot = 
hit[\"_source\"][\"plot\"]\n", + " fullplot = hit[\"_source\"][\"fullplot\"]\n", + " result = {\n", + " \"id\": id,\n", + " \"_score\": score,\n", + " \"title\": title,\n", + " \"plot\": plot,\n", + " \"fullplot\": fullplot,\n", + " }\n", + " results.append(result)\n", + " return results\n", + "\n", + "def pretty_search(query):\n", + "\n", + " get_knowledge = vector_search(query)\n", + "\n", + " search_result = \"\"\n", + " for result in get_knowledge:\n", + " search_result += f\"Title: {result.get('title', 'N/A')}, Plot: {result.get('fullplot', 'N/A')}\\n\"\n", + "\n", + " return search_result" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bMou2fWE6QSy" + }, + "source": [ + "## Step 6: Handling user queries and loading Gemma\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Z4L4SfueU6PY", + "outputId": "f6343803-30e6-4c40-cc81-5246af8f91a5" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Query: What is the best romantic movie to watch and why?\n", + "Continue to answer the query by using these Search Results:\n", + "Title: Shut Up and Kiss Me!, Plot: Ryan and Pete are 27-year old best friends in Miami, born on the same day and each searching for the perfect woman. Ryan is a rookie stockbroker living with his psychic Mom. Pete is a slick surfer dude yet to find commitment. Each meets the women of their dreams on the same day. Ryan knocks heads in an elevator with the gorgeous Jessica, passing out before getting her number. Pete falls for the insatiable Tiara, but Tiara's uncle is mob boss Vincent Bublione, charged with her protection. This high-energy romantic comedy asks to what extent will you go for true love?\n", + "Title: Titanic, Plot: The plot focuses on the romances of two couples upon the doomed ship's maiden voyage. 
Isabella Paradine (Catherine Zeta-Jones) is a wealthy woman mourning the loss of her aunt, who reignites a romance with former flame Wynn Park (Peter Gallagher). Meanwhile, a charming ne'er-do-well named Jamie Perse (Mike Doyle) steals a ticket for the ship, and falls for a sweet innocent Irish girl on board. But their romance is threatened by the villainous Simon Doonan (Tim Curry), who has discovered about the ticket and makes Jamie his unwilling accomplice, as well as having sinister plans for the girl.\n", + "Title: Dark Blue World, Plot: March 15, 1939: Germany invades Czechoslovakia. Czech and Slovak pilots flee to England, joining the RAF. After the war, back home, they are put in labor camps, suspected of anti-Communist ideas. This film cuts between a post-war camp where Franta is a prisoner and England during the war, where Franta is like a big brother to Karel, a very young pilot. On maneuvers, Karel crash lands by the rural home of Susan, an English woman whose husband is MIA. She spends one night with Karel, and he thinks he's found the love of his life. It's complicated by Susan's attraction to Franta. How will the three handle innocence, Eros, friendship, and the heat of battle? When war ends, what then?\n", + "Title: Dark Blue World, Plot: March 15, 1939: Germany invades Czechoslovakia. Czech and Slovak pilots flee to England, joining the RAF. After the war, back home, they are put in labor camps, suspected of anti-Communist ideas. This film cuts between a post-war camp where Franta is a prisoner and England during the war, where Franta is like a big brother to Karel, a very young pilot. On maneuvers, Karel crash lands by the rural home of Susan, an English woman whose husband is MIA. She spends one night with Karel, and he thinks he's found the love of his life. It's complicated by Susan's attraction to Franta. How will the three handle innocence, Eros, friendship, and the heat of battle? 
When war ends, what then?\n", + "Title: No Good Deed, Plot: About a police detective, Jack, who, while doing a friend a favor and searching for a runaway teenager on Turk Street, stumbles upon a bizarre band of criminals about to pull off a bank robbery. Jack finds himself being held hostage while the criminals decide what to do with him, and the leader's beautiful girlfriend, Erin, is left alone to watch Jack. Erin, who we discover is a master manipulator of the men in the gang, reveals another side to Jack - a melancholy romantic who could have been a classical cellist. She finds Jack's captivity an irresistible turn-on and he can't figure out if she's for real, or manipulating him, too. Before the gang returns, Jack and Erin's connection intensifies and who ends up with the money is anyone's guess.\n", + ".\n" + ] + } + ], + "source": [ + "# Conduct query with retrieval of sources, combining results into something that\n", + "# we can feed to Gemma\n", + "def combined_query(query):\n", + " source_information = pretty_search(query)\n", + " return f\"Query: {query}\\nContinue to answer the query by using these Search Results:\\n{source_information}.\"\n", + "\n", + "\n", + "query = \"What is the best romantic movie to watch and why?\"\n", + "combined_results = combined_query(query)\n", + "\n", + "print(combined_results)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "j93-caKRLyCZ" + }, + "source": [ + "Load our LLM" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 369, + "referenced_widgets": [ + "fd3e46e5b3dc4a95a4b752559ca59976", + "b48ad574aad04703b3b3e8a7c8c4e3e7", + "27c1dd63c0d24c1aae3eb679428191d8", + "4309ca1f71a142fba32037d1f3737992", + "f97b7986c1a14fe28c0f17f0b278b9a3", + "4d12a4f8c0e142c2bcde3e2f602cd642", + "0d2d03d1ce0b4c8eac9c08afc5fced88", + "bba35b6e6e064493841300004e4bebca", + "e89387022e8c40b698d5539f1d4a46eb", +
"b14a00ba05ba475388043c03c86889f3", + "00f6634f1fc745f2964638d9c6aae4b4", + "88b6aeb3eac141b3afd8e58d594d3312", + "7a005901e17a4618a063279f97aed88c", + "731163f3573a496e95007f78b0dca252", + "dc57a47113624f55bd7b26a9596b980c", + "8d51f6c97d764d4dae0aeb0d62feff3d", + "04601c70d7994a4fb6a53280a54ce10b", + "2701d95ed5574bbc857e6390c487efcc", + "a8917741fa9e4274a3389d91bb4401d6", + "0e647c5d080a412690b9482ca075fc64", + "69626ea9a332403b8fa7395ec9f38620", + "bd57ef338a1746eaa99448db76d4a63d", + "875a84222e774d3c9eb80284932fe2e7", + "2b624e96c2ca4bbda2923698f72b747d", + "d29fa38babda469d9035a36f7b1f2127", + "e07a89d5ec874dab8d9247091bc7ba36", + "7e8b50ba62b443f89bb9edcccdd445bf", + "f8b43c667bbb4d2db94f9be05ac98157", + "1e52150646c242aab30241bbdaf114a4", + "e804fa638cde44ff948c4598754aa388", + "7db129e34e444e48ab6c580f8a04c45e", + "389aba0a89aa41b4af6b66db3ee706c4", + "1a3d7ebea4b34c76a680331a2908cd03", + "48045a8f337f427989391d2ac9364e19", + "1d1779ae159449c7b9331c2a5ef59f8d", + "c6d097ea17784cb58f4345a63576f9f5", + "efe1219392414cb788a9edbfa27910fa", + "26cfed8a88844a57b895e5a653036db8", + "f372b720336d4cb79b3506a16574f5b3", + "0acbd23304d048a09465c972c4552eac", + "c5f8460b95644d23894f2b84609cf694", + "a8dcb560626641a89ee6cf3a479e950d", + "bebc8a25e3684914841e7a044e86e187", + "d861941d02c54bcf9fdb1c2da2fbd097", + "ca19c76d304b4f64a29de0942cf89289", + "cb6bf240a24f4f339c33e245eb5d33b2", + "dbdbc998460f45558a12297f18dd29cf", + "aa3b99440ba3483bb8bb3f820409dcf9", + "43a1a7621c7f4da5923846b09191148f", + "0955002919cc4953b07b26da573f5f2c", + "73d953e5a0684b77be6d135e774e26fd", + "982306a9c4ce46c897cc0e8d9de21662", + "dc56c3e068df460a8214fff574268dea", + "d44c2d61b6fa4fc78364561652d1871b", + "b2ddc042c22e48bd8dd08c4d2e0595eb", + "8bc49277bed148c7b3865ab562c8ef13", + "1bf54ca09a6146d5af48197841813ed3", + "1c836ba3f588469ca1a0e7f13d0a1713", + "4663eb311c204607bda0d964c1081016", + "470595f960a54c68a92709b39f03147c", + "f6cc7e16a884443f9262efd56a18cd82", + 
"b745b5ede8f44515b7af076cb87dfde9", + "fbcd0bcc58064ae88a3e7ebe0eff6210", + "97ca6a79d1ea4499939bf5f20277c2a6", + "42893f0d8ad1498db9006daeeae7d24a", + "9a6c67596a4c41c5b9855d14aeb8ab33", + "a7160629d0514f3dbf0823d619d1c697", + "b211c75d452542a2aae17527a85749cd", + "e4a3742738384c8caa6d581eae599b04", + "9647ce9a103b48a0bace5791bb9d7f4d", + "486e4d6aa81c4aa0afdf3aadc6cdec73", + "f5689f11415f4c54bfa08d20f0e69b94", + "d71df090341542eab8449a92564b1511", + "246acdcd2afa490ba80e9cd5d65e37d1", + "75d7da5e10034274aa888cc53239d2f1", + "c6a56702d0d14261aa3321f0025f3ac3", + "7ae3854f9f794a88b0bb435e33f21769", + "5a2acc95b8024a51b49d5b637899b186", + "77d1f0e90be9440cbfbeb45e7b99865c", + "073d83dd85834233819348ab10b668ab", + "8a63fd367ce440e9a71314e90e4fe60d", + "cb9686bc58d341a9a05088b8eaf7f8b3", + "689619a86e0245b69c89d9f4248ca1cf", + "4b775768a9c84bf98c280cec1d9004d2", + "2ec150b4d6ec436fa675d728a4b37ce4", + "0e586865eed240da8d72a19e958ee0b8", + "cfd943bc01dd45efa7a84e21620a2149", + "8000001faa1e402e93abad8f0d147499", + "a9098f2236db4e65bd243b6b81bd5677", + "c3b52f6b69ce4b018e561d68314e068b", + "e7ce8eef05f142c4be9cc18894581def", + "1a3ce0f10d3d46d5b452a0e39e6bf067", + "40e167a97d0a42d1b06e463cc828ee31", + "21723e2868e04f208e9fb848a2c8fecb", + "cd21eabf159a49b5af791870f594eef4", + "789d82b09c7f4604a10f80511f44a42e", + "e1adfad5e87b42248876dbc085b70581", + "72bdcbe0f2d84b90b22031ebcbaa587d", + "e00a7057b6474ee48acfc0c527abb1d1", + "f4fdda1191cf40c3881ca7cf1dccb442", + "fff6ba8fd4204d47a98dad66e97fcc54", + "075798e3bbf548d8b5084e0d70bcf9ed", + "2577a409f2804c6e980a8d03fd17ef15", + "b7d2ba60ac544507b4c4d94adc2a05d0", + "0497cec156c242fd8fba2f62cc6f90ce", + "0a26adeb0cf6486bb0486efccf7ee7c5", + "75c34781e2924e9f8e6d8fecbd1b40d3", + "533e39e0a66245a1a7911c10c2cbaec3", + "1995e3e20bd843f9b564fb0454d0edea", + "876ab2c59f984e9dbaeee4af4192414e", + "25936775f2944468a770a8d811e6221d", + "297d68dba8a34274aa9d781377df37ba", + "e0f8443147214925bdc8438130987c1b", + 
"fa084b9c19104efa846f2b0999499f88", + "cb473ea349e44ecabe9abd69ff45ce9d", + "033c9145b92b4644b2ed815d42a0938c", + "f561ee8791ec4b2b8f6422ec4e367584", + "fd8d0ae20554493694445059c17aa050", + "b58747cea7d54725820633740e01d82e", + "3f47aa54d2b84ecabcb14b29220bd0c0", + "4af02bc4a30346b79c4714953aab6165" + ] + }, + "id": "OYGmKVv9mm8g", + "outputId": "5ae11253-29ee-4215-c1c0-8e57164fa9ae" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "fd3e46e5b3dc4a95a4b752559ca59976", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "tokenizer_config.json: 0%| | 0.00/2.16k [00:00 Date: Tue, 25 Jun 2024 10:06:06 +0900 Subject: [PATCH 2/8] Address review feedback Signed-off-by: lloydmeta --- ...ith_hugging_face_gemma_elasticsearch.ipynb | 54 +++++++++++-------- 1 file changed, 31 insertions(+), 23 deletions(-) diff --git a/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb b/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb index 5f5b7c2..3ac4f69 100644 --- a/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb +++ b/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb @@ -6,7 +6,7 @@ "id": "qsmx4MGD6QSp" }, "source": [ - "# Building A RAG System with Gemma, Elasticsearch and Huggingface Models\n", + "# Building A RAG System with Gemma, Elasticsearch and Hugging Face Models\n", "\n", "\n", " \"Open\n", @@ -15,9 +15,12 @@ "\n", "Authored By: [lloydmeta](https://huggingface.co/lloydmeta)\n", "\n", - "This notebook walks you through building a Retrieve-Augmented-Generation (RAG) powered by Elasticsearch (ES) and Huggingface models, letting you toggle between ES-vectorising vs self-vectorising.\n", + "This notebook walks you through building a Retrieval-Augmented Generation (RAG) system powered by Elasticsearch (ES) and Hugging Face models, letting you toggle between ES-vectorising (your ES cluster vectorises for you when ingesting and querying) vs self-vectorising (you vectorise all your data 
before sending it to ES).\n", - "**Note**: this notebook has been tested with ES 8.12.2." + "What should you use for your use case? *It depends* πŸ€·β€β™‚οΈ. ES-vectorising means your clients don't have to implement it, so that's the default here; however, if you don't have any ML nodes, or your own embedding setup is better/faster, feel free to set `USE_ELASTICSEARCH_VECTORISATION` to `False` in the `Choose data and query vectorisation options` section below!\n", + "\n", + "> [!TIP]\n", + "> This notebook has been tested with ES 8.12.2." ] }, { @@ -26,9 +29,16 @@ "id": "BIL0BjjF6QSt" }, "source": [ - "## Step 0: Installing Libraries\n" + "## Step 1: Installing Libraries\n" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "code", "execution_count": null, @@ -46,12 +56,10 @@ "id": "asQZzrNASBPI" }, "source": [ - "## Step 1: Set up\n", + "## Step 2: Set up\n", "\n", - "### Credentials\n", - "\n", - "#### Huggingface\n", - "This allows you to authenticate with Huggingface to download models and datasets." + "### Hugging Face\n", + "This allows you to authenticate with Hugging Face to download models and datasets." ] }, { @@ -78,7 +86,8 @@ "Let's make sure that you can access your Elasticsearch deployment. If you don't have one, create one at [Elastic Cloud](https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-a-cloud-deployment)\n", "\n", "Ensure you have `CLOUD_ID` and `ELASTIC_DEPL_API_KEY` saved as Colab secrets.\n", - "See [this tweet for details](https://twitter.com/GoogleColab/status/1719798406195867814)." + "\n", + "![Image of how to set up secrets using Google Colab](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/colab-secrets.jpeg)."
] }, { @@ -138,11 +147,10 @@ "\n", "Here, you need to make a decision: do you want Elasticsearch to vectorise your data and queries, or do you want to do it yourself?\n", "\n", - "Setting `USE_ELASTICSEARCH_VECTORISATION` to `True` will make the rest of this notebook set up and use ES-hosted-vectorisation for your data and your querying, but **BE AWARE** that this requires your ES deployment to have at least 1 ML node (I would recommend setting autoscaling to true on your Cloud deployment in case the model you chose is to obig).\n", + "Setting `USE_ELASTICSEARCH_VECTORISATION` to `True` will make the rest of this notebook set up and use ES-hosted-vectorisation for your data and your querying, but **BE AWARE** that this requires your ES deployment to have at least 1 ML node (I would recommend setting autoscaling to true on your Cloud deployment in case the model you chose is too big).\n", "\n", "If `USE_ELASTICSEARCH_VECTORISATION` is `False`, this notebook will set up and use the provided model \"locally\" for data and query vectorisation.\n", "\n", - "What should you use for your use case? *It depends* πŸ€·β€β™‚οΈ. Running vectorisation on ES means your clients don't have to implement it, so that's the default here; however, if you don't have any ML nodes, or your own embedding setup is better/faster, feel free to toggle it to `False`!\n", "\n", "**Note**: if you change these values, you'll likely need to re-run the notebook from this step." ] @@ -170,7 +178,7 @@ "id": "kF3A7uGc6QSv" }, "source": [ - "## Step 2: Data sourcing and preparation\n", + "## Step 3: Data sourcing and preparation\n", "\n", "The data utilised in this tutorial is sourced from Hugging Face datasets, specifically the\n", "[MongoDB/embedded_movies dataset](https://huggingface.co/datasets/MongoDB/embedded_movies)." @@ -299,7 +307,7 @@ "source": [ "The operations within the following code snippet below focus on enforcing data integrity and quality.\n", "1. 
The first process ensures that each data point's `fullplot` attribute is not empty, as this is the primary data we utilise in the embedding process.\n", - "2. This step also ensures we remove the `plot_embedding` attribute from all data points as this will be replaced by new embeddings created with a different embedding model, the `gte-large`." + "2. The second step removes the `plot_embedding` attribute from all data points, as it will be replaced by new embeddings created with a different embedding model, `gte-large`." ] }, { @@ -374,7 +382,7 @@ "id": "XB7YbW1f6QSw" }, "source": [ - "## Step 3: Load Elasticsearch with vectorised data" + "## Step 4: Load Elasticsearch with vectorised data" ] }, { @@ -383,9 +391,9 @@ "id": "uQuE6pwJZUEp" }, "source": [ - "### Load Huggingface model into Elasticsearch if needed\n", + "### Load Hugging Face model into Elasticsearch if needed\n", "\n", - "This step loads and deploys the Huggingface model into Elasticsearch using [Eland](https://eland.readthedocs.io/en/v8.12.1/), if `USE_ELASTICSEARCH_VECTORISATION` is `True`. This allows Elasticsearch to vectorise your queries, and data in later steps." + "This step loads and deploys the Hugging Face model into Elasticsearch using [Eland](https://eland.readthedocs.io/en/v8.12.1/), if `USE_ELASTICSEARCH_VECTORISATION` is `True`. This allows Elasticsearch to vectorise your queries and data in later steps."
] }, { @@ -461,7 +469,7 @@ "id": "i7gZ5fno6QSw" }, "source": [ - "## Step 4: Create a Search Index with vector search mappings.\n", + "## Step 5: Create a Search Index with vector search mappings.\n", "\n", "At this point, we create an index in Elasticsearch with the right index mappings to handle vector searches.\n", "\n", @@ -499,7 +507,7 @@ ], "source": [ "# Needs to match the id returned from Eland\n", - "# in general for Huggingface models, you just replace the forward slash with\n", + "# in general for Hugging Face models, you just replace the forward slash with\n", "# double underscore\n", "model_id = EMBEDDING_MODEL_ID.replace(\"/\", \"__\")\n", "\n", @@ -585,7 +593,7 @@ "id": "neANZEH96QSx" }, "source": [ - "Ingesting data into a Elasticsearch is best done in batches. Luckily `helpers` offers an esasy way to do this." + "Ingesting data into Elasticsearch is best done in batches. Luckily `helpers` offers an easy way to do this." ] }, { @@ -673,7 +681,7 @@ "id": "rDl8GBg_6QSx" }, "source": [ - "## Step 5: Perform Vector Search on User Queries\n", + "## Step 6: Perform Vector Search on User Queries\n", "\n", "The following step implements a function that returns a vector search result.\n", "\n", @@ -747,7 +755,7 @@ "id": "bMou2fWE6QSy" }, "source": [ - "## Step 6: Handling user queries and loading Gemma\n" + "## Step 7: Handling user queries and loading Gemma\n" ] }, { @@ -796,7 +804,7 @@ "id": "j93-caKRLyCZ" }, "source": [ - "Load our LLM" + "Load our LLM (here we use [google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it))" ] }, { From 579a60d84c06ebc86ce3b0a49c396612067bb40a Mon Sep 17 00:00:00 2001 From: lloydmeta Date: Wed, 26 Jun 2024 12:04:19 +0900 Subject: [PATCH 3/8] Review feedback Signed-off-by: lloydmeta --- ...ith_hugging_face_gemma_elasticsearch.ipynb | 19 ++++++------------- 1 file changed, 6 insertions(+), 13 deletions(-) diff --git a/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb
b/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb index 3ac4f69..7b79dcd 100644 --- a/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb +++ b/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb @@ -15,9 +15,9 @@ "\n", "Authored By: [lloydmeta](https://huggingface.co/lloydmeta)\n", "\n", - "This notebook walks you through building a Retrieve-Augmented-Generation (RAG) powered by Elasticsearch (ES) and Hugging Face models, letting you toggle between ES-vectorising (your ES cluster vectorises for you when ingesting and querying) vs self-vectorising (you vectorise all your data before sending it to ES).\n", + "This notebook walks you through building a Retrieve-Augmented Generation (RAG) powered by Elasticsearch (ES) and Hugging Face models, letting you toggle between ES-vectorising (your ES cluster vectorises for you when ingesting and querying) vs self-vectorising (you vectorise all your data before sending it to ES).\n", "\n", - "What should you use for your use case? *It depends* πŸ€·β€β™‚οΈ. ES-vectorising means your clients don't have to implement it, so that's the default here; however, if you don't have any ML nodes, or your own embedding setup is better/faster, feel free to set `USE_ELASTICSEARCH_VECTORISATION` to `False` in the `Choose data and query vectorisation options` section below !\n", + "What should you use for your use case? *It depends* πŸ€·β€β™‚οΈ. ES-vectorising means your clients don't have to implement it, so that's the default here; however, if you don't have any ML nodes, or your own embedding setup is better/faster, feel free to set `USE_ELASTICSEARCH_VECTORISATION` to `False` in the `Choose data and query vectorisation options` section below!\n", "\n", "> [!TIP]\n", "> This notebook has been tested with ES 8.12.2." 
@@ -32,13 +32,6 @@ "## Step 1: Installing Libraries\n" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, { "cell_type": "code", "execution_count": null, @@ -83,11 +76,11 @@ "source": [ "#### Elasticsearch deployment\n", "\n", - "Let's make sure that you can access your Elasticsearch deployment. If you don't have one, create one at [Elastic Cloud](https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-a-cloud-deployment)\n", + "Let's make sure that you can access your Elasticsearch deployment. If you don't have one, create one at [Elastic Cloud](https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-a-cloud-deployment).\n", "\n", - "Ensure you ahve `CLOUD_ID` and `ELASTIC_DEPL_API_KEY` saved as Colab secrets.\n", + "Ensure you have `CLOUD_ID` and `ELASTIC_DEPL_API_KEY` saved as Colab secrets.\n", "\n", - "![Image of how to set up secrets using Google Colab](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/colab-secrets.jpeg)." 
+ "![Image of how to set up secrets using Google Colab](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/colab-secrets.jpeg)" ] }, { @@ -147,7 +140,7 @@ "\n", "Here, you need to make a decision: do you want Elasticsearch to vectorise your data and queries, or do you want to do it yourself?\n", "\n", - "Setting `USE_ELASTICSEARCH_VECTORISATION` to `True` will make the rest of this notebook set up and use ES-hosted-vectorisation for your data and your querying, but **BE AWARE** that this requires your ES deployment to have at least 1 ML node (I would recommend setting autoscaling to true on your Cloud deployment in case the model you chose is too big).\n", + "Setting `USE_ELASTICSEARCH_VECTORISATION` to `True` will make the rest of this notebook set up and use ES-hosted-vectorisation for your data and your querying, but **BE AWARE** that this requires your ES deployment to have at least 1 ML node (I would recommend setting autoscaling to true on your Cloud deployment in case the model you choose is too big).\n", "\n", "If `USE_ELASTICSEARCH_VECTORISATION` is `False`, this notebook will set up and use the provided model \"locally\" for data and query vectorisation.\n", "\n", From c6ed6822504057b2223c53ad6da88cb1639ba8cb Mon Sep 17 00:00:00 2001 From: lloydmeta Date: Wed, 26 Jun 2024 20:41:37 +0900 Subject: [PATCH 4/8] Review feedback Move the embedding and embedding model-related settings down. 
Add some links and info Signed-off-by: lloydmeta --- ...ith_hugging_face_gemma_elasticsearch.ipynb | 71 ++++++++++--------- 1 file changed, 36 insertions(+), 35 deletions(-) diff --git a/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb b/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb index 7b79dcd..69b6620 100644 --- a/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb +++ b/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb @@ -130,41 +130,6 @@ "client.info()" ] }, - { - "cell_type": "markdown", - "metadata": { - "id": "xuczg29FFZVN" - }, - "source": [ - "### Choose data and query vectorisation options\n", - "\n", - "Here, you need to make a decision: do you want Elasticsearch to vectorise your data and queries, or do you want to do it yourself?\n", - "\n", - "Setting `USE_ELASTICSEARCH_VECTORISATION` to `True` will make the rest of this notebook set up and use ES-hosted-vectorisation for your data and your querying, but **BE AWARE** that this requires your ES deployment to have at least 1 ML node (I would recommend setting autoscaling to true on your Cloud deployment in case the model you choose is too big).\n", - "\n", - "If `USE_ELASTICSEARCH_VECTORISATION` is `False`, this notebook will set up and use the provided model \"locally\" for data and query vectorisation.\n", - "\n", - "\n", - "**Note**: if you change these values, you'll likely need to re-run the notebook from this step." 
- ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": { - "id": "whuUt6GySrkk" - }, - "outputs": [], - "source": [ - "USE_ELASTICSEARCH_VECTORISATION = True\n", - "\n", - "EMBEDDING_MODEL_ID = \"thenlper/gte-small\"\n", - "# https://huggingface.co/thenlper/gte-small's page shows the dimensions of the model\n", - "# If you use the `gte-base` or `gte-large` embedding models, the numDimension\n", - "# value in the vector search index must be set to 768 and 1024, respectively.\n", - "EMBEDDING_DIMENSIONS = 384" - ] - }, { "cell_type": "markdown", "metadata": { @@ -378,6 +343,42 @@ "## Step 4: Load Elasticsearch with vectorised data" ] }, + { + "cell_type": "markdown", + "metadata": { + "id": "xuczg29FFZVN" + }, + "source": [ + "### Choose data and query vectorisation options\n", + "\n", + "Here, you need to make a decision: do you want Elasticsearch to vectorise your data and queries, or do you want to do it yourself?\n", + "\n", + "Setting `USE_ELASTICSEARCH_VECTORISATION` to `True` will make the rest of this notebook set up and use ES-hosted-vectorisation for your data and your querying, but **BE AWARE** that this requires your ES deployment to have at least 1 ML node (I would recommend setting autoscaling to true on your Cloud deployment in case the model you choose is too big).\n", + "\n", + "If `USE_ELASTICSEARCH_VECTORISATION` is `False`, this notebook will set up and use the provided model \"locally\" for data and query vectorisation.\n", + "\n", + "Here, I've picked the [thenlper/gte-small](https://huggingface.co/thenlper/gte-small) model for really no other reason than it was used in another cookbook, and it worked well enough for me. Please feel free to try others if you'd like - the only important thing is that you update the `EMBEDDING_DIMENSIONS` according to the model.\n", + "\n", + "**Note**: if you change these values, you'll likely need to re-run the notebook from this step." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "whuUt6GySrkk" + }, + "outputs": [], + "source": [ + "USE_ELASTICSEARCH_VECTORISATION = True\n", + "\n", + "EMBEDDING_MODEL_ID = \"thenlper/gte-small\"\n", + "# https://huggingface.co/thenlper/gte-small's page shows the dimensions of the model\n", + "# If you use the `gte-base` or `gte-large` embedding models, the `EMBEDDING_DIMENSIONS`\n", + "# value below must be set to 768 and 1024, respectively.\n", + "EMBEDDING_DIMENSIONS = 384" + ] + }, { "cell_type": "markdown", "metadata": { From 6dad0c827a74cd1907eb5cf39223f2e7ea5c65ec Mon Sep 17 00:00:00 2001 From: lloydmeta Date: Wed, 26 Jun 2024 21:27:36 +0900 Subject: [PATCH 5/8] * Tested with 8.13 as well * Fix Eland version to avoid breaking changes * Switch to CPU usage by default Signed-off-by: lloydmeta --- .../rag_with_hugging_face_gemma_elasticsearch.ipynb | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb b/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb index 69b6620..35ad709 100644 --- a/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb +++ b/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb @@ -20,7 +20,7 @@ "What should you use for your use case? *It depends* πŸ€·β€β™‚οΈ. ES-vectorising means your clients don't have to implement it, so that's the default here; however, if you don't have any ML nodes, or your own embedding setup is better/faster, feel free to set `USE_ELASTICSEARCH_VECTORISATION` to `False` in the `Choose data and query vectorisation options` section below!\n", "\n", "> [!TIP]\n", - "> This notebook has been tested with ES 8.12.2."
+ "> This notebook has been tested with ES 8.12.2 and 8.13.4" ] }, { @@ -40,7 +40,8 @@ }, "outputs": [], "source": [ - "!pip install datasets elasticsearch sentence_transformers transformers eland accelerate" + "!pip install datasets elasticsearch sentence_transformers transformers eland==8.12.1 accelerate\n", + "!pip install -U pandas # https://github.com/huggingface/datasets/pull/6978" ] }, { @@ -1096,9 +1097,9 @@ "\n", "tokenizer = AutoTokenizer.from_pretrained(\"google/gemma-2b-it\")\n", "# CPU Enabled uncomment below πŸ‘‡πŸ½\n", - "# model = AutoModelForCausalLM.from_pretrained(\"google/gemma-2b-it\")\n", + "model = AutoModelForCausalLM.from_pretrained(\"google/gemma-2b-it\")\n", "# GPU Enabled use below πŸ‘‡πŸ½\n", - "model = AutoModelForCausalLM.from_pretrained(\"google/gemma-2b-it\", device_map=\"auto\")" + "# model = AutoModelForCausalLM.from_pretrained(\"google/gemma-2b-it\", device_map=\"auto\")" ] }, { @@ -1145,7 +1146,7 @@ " combined_information = combined_query(query)\n", "\n", " # Moving tensors to GPU\n", - " input_ids = tokenizer(combined_information, return_tensors=\"pt\").to(\"cuda\")\n", + " input_ids = tokenizer(combined_information, return_tensors=\"pt\") # .to(\"cuda\") # Add if using GPU\n", " response = model.generate(**input_ids, max_new_tokens=700)\n", "\n", " return tokenizer.decode(response[0], skip_special_tokens=True)\n", From fdd2655399623856affcbaa6817bbe31afe4a3dc Mon Sep 17 00:00:00 2001 From: lloydmeta Date: Wed, 26 Jun 2024 22:31:08 +0900 Subject: [PATCH 6/8] Add testing with 8.14.x Signed-off-by: lloydmeta --- .../rag_with_hugging_face_gemma_elasticsearch.ipynb | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb b/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb index 35ad709..f3674d7 100644 --- a/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb +++ b/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb @@ 
-8,11 +8,6 @@ "source": [ "# Building A RAG System with Gemma, Elasticsearch and Hugging Face Models\n", "\n", - "\n", - " \"Open\n", - "\n", - "\n", - "\n", "Authored By: [lloydmeta](https://huggingface.co/lloydmeta)\n", "\n", "This notebook walks you through building a Retrieve-Augmented Generation (RAG) powered by Elasticsearch (ES) and Hugging Face models, letting you toggle between ES-vectorising (your ES cluster vectorises for you when ingesting and querying) vs self-vectorising (you vectorise all your data before sending it to ES).\n", @@ -20,7 +15,7 @@ "What should you use for your use case? *It depends* πŸ€·β€β™‚οΈ. ES-vectorising means your clients don't have to implement it, so that's the default here; however, if you don't have any ML nodes, or your own embedding setup is better/faster, feel free to set `USE_ELASTICSEARCH_VECTORISATION` to `False` in the `Choose data and query vectorisation options` section below!\n", "\n", "> [!TIP]\n", - "> This notebook has been tested with ES 8.12.2 and 8.13.4" + "> This notebook has been tested with ES 8.12.x, 8.13.x, and 8.14.x" ] }, { @@ -40,8 +35,8 @@ }, "outputs": [], "source": [ - "!pip install datasets elasticsearch sentence_transformers transformers eland==8.12.1 accelerate\n", - "!pip install -U pandas # https://github.com/huggingface/datasets/pull/6978" + "!pip install datasets elasticsearch sentence_transformers transformers eland==8.12.1 # accelerate # uncomment if using GPU\n", + "!pip install -U pandas # Remove once https://github.com/huggingface/datasets/pull/6978 (has been released, as this causes dep conflicts)" ] }, { From 8671807311ba7992c0fbec73f1d66427aec162c8 Mon Sep 17 00:00:00 2001 From: lloydmeta Date: Wed, 26 Jun 2024 23:20:35 +0900 Subject: [PATCH 7/8] Remove 8.12, move to fixed datasets instead of updating panda Signed-off-by: lloydmeta --- notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git 
a/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb b/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb index f3674d7..b05a5c5 100644 --- a/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb +++ b/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb @@ -15,7 +15,7 @@ "What should you use for your use case? *It depends* πŸ€·β€β™‚οΈ. ES-vectorising means your clients don't have to implement it, so that's the default here; however, if you don't have any ML nodes, or your own embedding setup is better/faster, feel free to set `USE_ELASTICSEARCH_VECTORISATION` to `False` in the `Choose data and query vectorisation options` section below!\n", "\n", "> [!TIP]\n", - "> This notebook has been tested with ES 8.12.x, 8.13.x, and 8.14.x" + "> This notebook has been tested with ES 8.13.x and 8.14.x" ] }, { @@ -36,7 +36,7 @@ "outputs": [], "source": [ "!pip install datasets elasticsearch sentence_transformers transformers eland==8.12.1 # accelerate # uncomment if using GPU\n", - "!pip install -U pandas # Remove once https://github.com/huggingface/datasets/pull/6978 (has been released, as this causes dep conflicts)" + "!pip install datasets==2.19.2 # Remove version lock if https://github.com/huggingface/datasets/pull/6978 has been released" ] }, { From b48ef67a163a70bdc1305c05c88c7c4b9111e03c Mon Sep 17 00:00:00 2001 From: lloydmeta Date: Wed, 26 Jun 2024 23:25:10 +0900 Subject: [PATCH 8/8] Remove datasets from first dep install Signed-off-by: lloydmeta --- notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb b/notebooks/en/rag_with_hugging_face_gemma_elasticsearch.ipynb index b05a5c5..8f1aad4 100644
elasticsearch sentence_transformers transformers eland==8.12.1 # accelerate # uncomment if using GPU\n", + "!pip install elasticsearch sentence_transformers transformers eland==8.12.1 # accelerate # uncomment if using GPU\n", "!pip install datasets==2.19.2 # Remove version lock if https://github.com/huggingface/datasets/pull/6978 has been released" ] },