
Commit

Notebooks modified with comments
chetan thote authored and chetan thote committed Nov 5, 2024
1 parent e8e1b96 commit 46d462f
Showing 1 changed file with 20 additions and 37 deletions.
57 changes: 20 additions & 37 deletions notebooks/similarity-search-on-vector-data/notebook.ipynb
@@ -39,11 +39,12 @@
 "## What's in this notebook:\n",
 "\n",
 "1. Create and use a database.\n",
-"2. Create a table and load data.\n",
-"3. Create a full-text and a vector index.\n",
-"4. Similarity search.\n",
-"5. Hybrid search.\n",
-"6. Clean up.\n",
+"2. Create a table to hold vector data and load data.\n",
+"3. Search based on vector similarity.\n",
+"4. Search using metadata filtering.\n",
+"5. Create and use a vector index.\n",
+"6. Check that your query is using a vector index.\n",
+"7. Clean up.\n",
 "\n",
 "## Questions?\n",
 "\n",
@@ -156,13 +157,11 @@
 "id": "981b4ba7-109c-418b-ab39-1df64426a1f2",
 "metadata": {},
 "source": [
-"## 4. Similarity search.\n",
+"## 3. Search based on vector similarity.\n",
 "\n",
-"Similarity search finds a set of vectors that are most similar to a query vector. This example finds vectors representing paragraphs that are similar to a vector about the Mario Kart game. The vector for the first paragraph about Mario Kart serves as our query vector. This is a good semantic query vector for Mario Kart.\n",
+"To find the most similar vectors to a query vector, use an <code>ORDER BY\u2026 LIMIT\u2026</code> query. The <code>ORDER BY</code> clause sorts the vectors by a similarity score produced by a vector similarity function, with the closest matches at the top.\n",
 "\n",
-"To find the most similar vectors to a query vector, use an <code>ORDER BY\u2026 LIMIT\u2026</code> query. The <code>ORDER BY</code> clause arranges the vectors by their similarity score produced by a vector similarity function, with the closest matches at the top.\n",
-"\n",
-"The SQL below finds three paragraphs that are the most similar to the first paragraph about Mario Kart, a semantic similarity search for information about Mario Kart."
+"The SQL below sets up a query vector, then uses the <code>DOT_PRODUCT</code> infix operator (<code><\\*></code>) to find the two vectors that are most similar to the query vector."
 ]
 },
 {
@@ -188,9 +187,11 @@
 "id": "7220f9af-7a0c-4142-ace1-32102bedf869",
 "metadata": {},
 "source": [
-"## 5. Hybrid search.\n",
+"## 4. Search using metadata filtering.\n",
 "\n",
+"When building vector search applications, you may wish to filter on the fields of a record, with simple filters or via joins, in addition to applying vector similarity operations.\n",
+"\n",
-"Hybrid search combines multiple search methods in one query. It blends full-text search (which finds keyword matches) and vector search (which finds semantic matches), allowing search results to be (re-)ranked by a score that combines full-text and vector rankings."
+"The following query combines an <code>ORDER BY ... LIMIT</code> query with a metadata filter on category. It first filters to the comments in the category <code>\"Food\"</code>, then calculates the score for each of those comments and ranks them in descending order of score."
 ]
 },
 {
@@ -201,32 +202,14 @@
 "outputs": [],
 "source": [
 "%%sql\n",
-"SET @v_mario_kart = (SELECT v FROM video_games\n",
-" WHERE URL = \"https://en.wikipedia.org/wiki/Super_Mario_Kart\"\n",
-" ORDER BY id LIMIT 1);\n",
+"SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4):>BLOB;\n",
 "\n",
-"WITH fts AS (\n",
-" SELECT id, paragraph,\n",
-" MATCH(paragraph) AGAINST(\"mario kart\") AS SCORE\n",
-" FROM video_games\n",
-" WHERE MATCH(paragraph) AGAINST(\"mario kart\")\n",
-" ORDER BY SCORE desc\n",
-" LIMIT 200\n",
-"),\n",
-"vs AS (\n",
-" SELECT id, paragraph, v <*> @v_mario_kart AS SCORE\n",
-" FROM video_games\n",
-" ORDER BY score DESC\n",
-" LIMIT 200\n",
-")\n",
-"SELECT vs.id, SUBSTRING(vs.paragraph,0,25),\n",
-" FORMAT(IFNULL(fts.score, 0) * .3\n",
-" + IFNULL(vs.score, 0) * .7, 4) AS score,\n",
-" FORMAT(fts.score, 4) AS fts_s,\n",
-" FORMAT(vs.score, 4) AS vs_s\n",
-"FROM fts FULL OUTER JOIN vs ON fts.id = vs.id\n",
-"ORDER BY score DESC\n",
-"LIMIT 5;"
+"SELECT id, comment, category,\n",
+" comment_embedding <*> @query_vec AS score\n",
+" FROM comments\n",
+" WHERE category = \"Food\"\n",
+" ORDER BY score DESC\n",
+" LIMIT 3;"
 ]
 },
 {

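Note on the new section 4 query: as a rough illustration of what it computes, the sketch below walks through the same steps in plain Python. It filters rows by category, scores each remaining row by its dot product with the query vector, sorts in descending order of score, and keeps the top few. The in-memory `comments` list, the two-dimensional embeddings, and the `top_matches` helper are hypothetical stand-ins for illustration only. They are not part of the notebook, and the database may evaluate the query differently.

```python
# Hypothetical in-memory stand-in for the notebook's `comments` table.
# Embeddings are 2-dimensional here purely to keep the example readable.
comments = [
    {"id": 1, "category": "Food", "embedding": [0.9, 0.1]},
    {"id": 2, "category": "Food", "embedding": [0.2, 0.8]},
    {"id": 3, "category": "Travel", "embedding": [1.0, 0.0]},
    {"id": 4, "category": "Food", "embedding": [0.5, 0.5]},
]

query_vec = [1.0, 0.0]


def dot(a, b):
    """Dot product of two equal-length vectors (the role `<*>` plays in the SQL)."""
    return sum(x * y for x, y in zip(a, b))


def top_matches(rows, query, category=None, limit=3):
    """Filter by category, score by dot product, and return the top `limit` rows.

    Returns a list of (id, score) pairs, highest score first.
    """
    # WHERE category = ...
    candidates = [r for r in rows if category is None or r["category"] == category]
    # comment_embedding <*> @query_vec AS score
    scored = [(r["id"], dot(r["embedding"], query)) for r in candidates]
    # ORDER BY score DESC LIMIT ...
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    for row_id, score in top_matches(comments, query_vec, category="Food", limit=3):
        print(row_id, round(score, 4))
```

Because the filter runs before scoring in this sketch, the "Travel" row is never ranked even though its embedding matches the query vector exactly. This is the behavior the section 4 text describes for the category filter.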