diff --git a/qdrant-landing/content/documentation/tutorials/retrieval-quality.md b/qdrant-landing/content/documentation/tutorials/retrieval-quality.md
index 307a315b3..68adebbd0 100644
--- a/qdrant-landing/content/documentation/tutorials/retrieval-quality.md
+++ b/qdrant-landing/content/documentation/tutorials/retrieval-quality.md
@@ -8,104 +8,78 @@ weight: 21
 | Time: 30 min | Level: Intermediate | | |
 |--------------|---------------------|--|----|
-Semantic search pipelines are as good as the embeddings they use. If your model cannot properly represent input data, similar objects might
-be far away from each other in the vector space. No surprise, that the search results will be poor in this case. There is, however, another
-component of the process which can also degrade the quality of the search results. It is the ANN algorithm itself.
+In this tutorial, you will:
-In this tutorial, we will show how to measure the quality of the semantic retrieval and how to tune the parameters of the HNSW, the ANN
-algorithm used in Qdrant, to obtain the best results.
+1. Load and prepare a dataset of arXiv titles with pre-computed embeddings.
+2. Create a collection in Qdrant and index the training data.
+3. Implement a function to measure retrieval quality using precision@k.
+4. Compare the results of ANN search with exact search.
+5. Tune HNSW parameters to improve retrieval quality.
+6. Analyze the results and explore the trade-offs between precision and performance.
-## Embeddings quality
+By the end of this tutorial, you'll have a practical understanding of how to measure and improve retrieval quality in Qdrant.
-The quality of the embeddings is a topic for a separate tutorial. In a nutshell, it is usually measured and compared by benchmarks, such as
-[Massive Text Embedding Benchmark (MTEB)](https://huggingface.co/spaces/mteb/leaderboard). The evaluation process itself is pretty
-straightforward and is based on a ground truth dataset built by humans. We have a set of queries and a set of the documents we would expect
-to receive for each of them. In the [evaluation process](https://qdrant.tech/rag/rag-evaluation-guide/), we take a query, find the most similar documents in the vector space and compare
-them with the ground truth. In that setup, **finding the most similar documents is implemented as full kNN search, without any approximation**.
-As a result, we can measure the quality of the embeddings themselves, without the influence of the ANN algorithm.
+### Step 0: Understanding the Impact of Search Algorithms on Your Retrieval Quality
-## Retrieval quality
+Semantic search pipelines rely heavily on the quality of the embeddings they use. If your model cannot properly represent input data, similar objects might be far apart in the vector space, leading to poor search results. Another critical component that affects search quality is the Approximate Nearest Neighbor (ANN) algorithm itself.
-Embeddings quality is indeed the most important factor in the semantic search quality. However, vector search engines, such as Qdrant, do not
-perform pure kNN search. Instead, they use **Approximate Nearest Neighbors** (ANN) algorithms, which are much faster than the exact search,
-but can return suboptimal results. We can also **measure the retrieval quality of that approximation** which also contributes to the overall
-search quality.
+Qdrant harnesses the power of ANN algorithms to handle massive datasets efficiently.
+As your dataset grows, exact k-Nearest Neighbors (kNN) search starts to suffer in performance due to the increased computational time required to compute distances between the query and every point in the dataset. This makes exact kNN impractical for large-scale applications. ANN algorithms like HNSW provide a significant advantage by approximating the nearest neighbors, allowing you to perform fast and scalable searches over large volumes of data.
-### Quality metrics
+While ANN algorithms are faster, they may return approximate results, which can impact search quality. It's important to measure the retrieval quality of the ANN algorithm against exact kNN search to ensure it approximates the exact results effectively.
-There are various ways of how quantify the quality of semantic search. Some of them, such as [Precision@k](https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Precision_at_k),
-are based on the number of relevant documents in the top-k search results. Others, such as [Mean Reciprocal Rank (MRR)](https://en.wikipedia.org/wiki/Mean_reciprocal_rank),
-take into account the position of the first relevant document in the search results. [DCG and NDCG](https://en.wikipedia.org/wiki/Discounted_cumulative_gain)
-metrics are, in turn, based on the relevance score of the documents.
-If we treat the search pipeline as a whole, we could use them all. The same is true for the embeddings quality evaluation. However, for the
-ANN algorithm itself, anything based on the relevance score or ranking is not applicable. Ranking in vector search relies on the distance
-between the query and the document in the vector space, however distance is not going to change due to approximation, as the function is
-still the same.
+### Step 1: Load and Prepare the Dataset
-Therefore, it only makes sense to measure the quality of the ANN algorithm by the number of relevant documents in the top-k search results,
-such as `precision@k`. It is calculated as the number of relevant documents in the top-k search results divided by `k`. In case of testing
-just the ANN algorithm, we can use the exact kNN search as a ground truth, with `k` being fixed. It will be a measure on **how well the ANN
-algorithm approximates the exact search**.
-
-## Measure the quality of the search results
-
-Let's build a quality [evaluation](https://qdrant.tech/rag/rag-evaluation-guide/) of the ANN algorithm in Qdrant. We will, first, call the search endpoint in a standard way to obtain
-the approximate search results. Then, we will call the exact search endpoint to obtain the exact matches, and finally compare both results
-in terms of precision.
-Before we start, let's create a collection, fill it with some data and then start our evaluation. We will use the same dataset as in the
-[Loading a dataset from Hugging Face hub](/documentation/tutorials/huggingface-datasets/) tutorial, `Qdrant/arxiv-titles-instructorxl-embeddings`
-from the [Hugging Face hub](https://huggingface.co/datasets/Qdrant/arxiv-titles-instructorxl-embeddings). Let's download it in a streaming
-mode, as we are only going to use part of it.
+First, we need to load a dataset and prepare it for evaluation. We'll use the `Qdrant/arxiv-titles-instructorxl-embeddings` dataset from the Hugging Face Hub.
 ```python
+# Install the required libraries first: pip install qdrant-client datasets
+
 from datasets import load_dataset
+# Load dataset
 dataset = load_dataset(
     "Qdrant/arxiv-titles-instructorxl-embeddings", split="train", streaming=True
 )
-```
-
-We need some data to be indexed and another set for the testing purposes. Let's get the first 50000 items for the training and the next 1000
-for the testing.
-```python
+# Split data into training and testing sets
 dataset_iterator = iter(dataset)
-train_dataset = [next(dataset_iterator) for _ in range(60000)]
-test_dataset = [next(dataset_iterator) for _ in range(1000)]
+train_dataset = [next(dataset_iterator) for _ in range(50000)]
+test_dataset = [next(dataset_iterator) for _ in range(5000)]
 ```
-Now, let's create a collection and index the training data. This collection will be created with the default configuration. Please be aware that
-it might be different from your collection settings, and it's always important to test exactly the same configuration you are going to use later
-in production.
+This code loads the dataset and splits it into two parts: 50,000 items for training and 5,000 items for testing. This separation will allow us to build the search index with training data and evaluate its quality using the test data.
+
+### Step 2: Create a Collection in Qdrant
-
+Next, we need to create a collection in Qdrant to store the vectors.
 ```python
 from qdrant_client import QdrantClient, models
+# Set up Qdrant client
 client = QdrantClient("http://localhost:6333")
+
+# Create a collection in Qdrant
 client.create_collection(
     collection_name="arxiv-titles-instructorxl-embeddings",
     vectors_config=models.VectorParams(
-        size=768,  # Size of the embeddings generated by InstructorXL model
+        size=768,  # Size of the embeddings generated by the InstructorXL model
         distance=models.Distance.COSINE,
     ),
 )
 ```
-We are now ready to index the training data. Uploading the records is going to trigger the indexing process, which will build the HNSW graph.
-The indexing process may take some time, depending on the size of the dataset, but your data is going to be available for search immediately
-after receiving the response from the `upsert` endpoint. **As long as the indexing is not finished, and HNSW not built, Qdrant will perform
-the exact search**. We have to wait until the indexing is finished to be sure that the approximate search is performed.
+This code creates a collection in Qdrant. The `size` parameter represents the dimensionality of the embedding vectors, and the `distance` parameter specifies the distance function to use (e.g., cosine). Using the correct distance function is important for achieving good retrieval quality.
+
+### Step 3: Index the Training Data
+
+Once the collection is set up, we need to upload the training data into Qdrant for indexing.
 ```python
-client.upload_points(  # upload_points is available as of qdrant-client v1.7.1
+# Upload training data to Qdrant
+client.upload_points(  # Available as of qdrant-client v1.7.1
     collection_name="arxiv-titles-instructorxl-embeddings",
     points=[
         models.PointStruct(
@@ -117,111 +91,132 @@ client.upload_points(  # upload_points is available as of qdrant-client v1.7.1
     ]
 )
+# Wait for indexing to complete
 while True:
     collection_info = client.get_collection(collection_name="arxiv-titles-instructorxl-embeddings")
     if collection_info.status == models.CollectionStatus.GREEN:
-        # Collection status is green, which means the indexing is finished
         break
 ```
-## Standard mode vs exact search
+This code uploads the training dataset to Qdrant, creating an index that can be used for approximate nearest neighbor (ANN) searches. The status check ensures that indexing is complete before moving forward.
+
+### Step 4: Measure Retrieval Quality
-Qdrant has a built-in exact search mode, which can be used to measure the quality of the search results. In this mode, Qdrant performs a
-full kNN search for each query, without any approximation. It is not suitable for production use with high load, but it is perfect for the
-evaluation of the ANN algorithm and its parameters. It might be triggered by setting the `exact` parameter to `True` in the search request.
-We are simply going to use all the examples from the test dataset as queries and compare the results of the approximate search with the
-results of the exact search. Let's create a helper function with `k` being a parameter, so we can calculate the `precision@k` for different
-values of `k`.
+To assess retrieval quality, we need to compare the results of an ANN search with the results of an exact search. This comparison will help determine how well the ANN algorithm approximates the true nearest neighbors.
+
+The function `avg_precision_at_k(k)` calculates the average precision at k by comparing the ANN results against the exact search results for each item in the test dataset. Precision at k measures how many of the top k results from ANN match the exact search results. For example, if 4 of the top 5 ANN results also appear in the exact top 5, precision@5 for that query is 0.8.
 ```python
 def avg_precision_at_k(k: int):
     precisions = []
     for item in test_dataset:
+        # Perform ANN search
         ann_result = client.query_points(
             collection_name="arxiv-titles-instructorxl-embeddings",
             query=item["vector"],
             limit=k,
         ).points
+        # Perform exact search
         knn_result = client.query_points(
             collection_name="arxiv-titles-instructorxl-embeddings",
             query=item["vector"],
             limit=k,
             search_params=models.SearchParams(
-                exact=True,  # Turns on the exact search mode
+                exact=True,  # Enables exact search mode
             ),
         ).points
-        # We can calculate the precision@k by comparing the ids of the search results
-        ann_ids = set(item.id for item in ann_result)
-        knn_ids = set(item.id for item in knn_result)
+        # Calculate precision@k
+        ann_ids = set(point.id for point in ann_result)
+        knn_ids = set(point.id for point in knn_result)
         precision = len(ann_ids.intersection(knn_ids)) / k
         precisions.append(precision)
     return sum(precisions) / len(precisions)
 ```
-Calculating the `precision@5` is as simple as calling the function with the corresponding parameter:
+### Step 5: Measure the Initial Precision
 ```python
-print(f"avg(precision@5) = {avg_precision_at_k(k=5)}")
+print(f"Initial avg(precision@5) = {avg_precision_at_k(k=5)}")
 ```
-Response:
+This step measures the initial retrieval quality before any tuning of the HNSW (Hierarchical Navigable Small World) parameters.
+
+HNSW is a graph-based indexing algorithm that builds a multi-layer navigation structure. The upper layers are sparse with nodes far apart, while lower layers are denser with closer nodes. The search starts from the top layer, finds the closest node to the target, then moves down to denser layers, iteratively approaching the target position.
-```text
-avg(precision@5) = 0.9935999999999995
-```
+In order to improve performance, HNSW limits the maximum degree of nodes on each layer of the graph to `m`. There are two types of parameters that can be tuned:
+
+Index-time parameters:
+- `m`: This parameter limits the maximum number of connections (degree) per node in each layer of the HNSW graph. A higher value for `m` allows more connections between nodes, potentially improving search accuracy but requiring more memory and indexing time. The default value for `m` is 16.
+- `ef_construct`: This parameter defines the search range during index construction. A higher value of `ef_construct` means a wider search range during indexing, resulting in a higher quality graph structure. However, this increases the indexing time. The default value for `ef_construct` is 100.
+
+Search-time parameter:
+- `ef` (also known as `efSearch`): This parameter controls the search range when looking for nearest neighbors during queries. A higher value of `ef` widens the search range, increasing the likelihood of finding true nearest neighbors at the cost of higher search latency. The default value depends on the value of `ef_construct`. A query-time example is sketched below.
-As we can see, the precision of the approximate search vs exact search is pretty high. There are, however, some scenarios when we
-need higher precision and can accept higher latency. HNSW is pretty tunable, and we can increase the precision by changing its parameters.
-
-## Tweaking the HNSW parameters
+We'll use the default `m` and `ef_construct` values as a baseline, then double them and observe how that affects the precision of the search.
-HNSW is a hierarchical graph, where each node has a set of links to other nodes. The number of edges per node is called the `m` parameter.
-The larger the value of it, the higher the precision of the search, but more space required. The `ef_construct` parameter is the number of
-neighbours to consider during the index building. Again, the larger the value, the higher the precision, but the longer the indexing time.
-The default values of these parameters are `m=16` and `ef_construct=100`. Let's try to increase them to `m=32` and `ef_construct=200` and
-see how it affects the precision. Of course, we need to wait until the indexing is finished before we can perform the search.
+Qdrant allows us to easily compare the performance between exact and approximate searches. For smaller datasets (e.g., up to 20,000 documents), exact search can be practical, but as the dataset scales, ANN algorithms like HNSW become necessary to handle the increased data volume efficiently.
+
+### Step 6: Tune HNSW Parameters for Better Precision
+
+The HNSW algorithm used in Qdrant can be fine-tuned for better precision by adjusting parameters like `m` and `ef_construct`.
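+
+Changing `m` and `ef_construct` requires the HNSW graph to be rebuilt. The search-time parameter, in contrast, can be adjusted per request: in the Python client it is exposed as `hnsw_ef` on `models.SearchParams`. The snippet below is only a minimal sketch of that option; the value `128` is an arbitrary illustration, not a recommendation from this tutorial.
+
+```python
+# Hypothetical example: widen the HNSW search beam for a single query
+sample = test_dataset[0]
+
+wide_beam_result = client.query_points(
+    collection_name="arxiv-titles-instructorxl-embeddings",
+    query=sample["vector"],
+    limit=5,
+    search_params=models.SearchParams(
+        hnsw_ef=128,  # assumed value; a larger beam improves recall at the cost of latency
+    ),
+).points
+```
+
+Because `hnsw_ef` only affects the query, you can experiment with it freely before deciding to rebuild the index. Now let's tune the index-time parameters themselves.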
 ```python
+# Tune HNSW parameters for improved precision
 client.update_collection(
     collection_name="arxiv-titles-instructorxl-embeddings",
     hnsw_config=models.HnswConfigDiff(
-        m=32,  # Increase the number of edges per node from the default 16 to 32
-        ef_construct=200,  # Increase the number of neighbours from the default 100 to 200
+        m=32,  # Increase from the default 16 to 32
+        ef_construct=200,  # Increase from the default 100 to 200
     )
 )
+# Wait for re-indexing to complete
 while True:
     collection_info = client.get_collection(collection_name="arxiv-titles-instructorxl-embeddings")
     if collection_info.status == models.CollectionStatus.GREEN:
-        # Collection status is green, which means the indexing is finished
         break
+
+# Measure new precision
+print(f"New avg(precision@5) = {avg_precision_at_k(k=5)}")
 ```
-The same function can be used to calculate the average `precision@5`:
+By increasing `m` and `ef_construct`, you're allowing for a more connected graph and a more exhaustive search during indexing, which should lead to higher precision. After tuning, it's important to measure the precision again to verify improvements.
-```python
-print(f"avg(precision@5) = {avg_precision_at_k(k=5)}")
-```
+### How to Select HNSW Parameters
+- If you require higher precision, increase `m` and `ef_construct` while considering the increased memory usage and indexing time.
+- If memory and indexing time are critical constraints, tune the parameters incrementally to find the right balance.
+- Consider adjusting `ef` (also known as `efSearch`), which controls the number of neighbors evaluated during the search. A higher value may increase precision but also increases latency.
-Response:
+### Measuring Search Quality in the WebUI
-```text
-avg(precision@5) = 0.9969999999999998
-```
+If you prefer a visual interface, you can also measure search quality in the Qdrant WebUI. To do this, first go to [Qdrant Cloud](https://cloud.qdrant.io/) and select a collection. Then, navigate to the Search Quality tab.
+
+![Search Quality Report in Qdrant WebUI](/articles_data/retrieval-quality/user_clean.png)
+
+From here you can run a simple Search Quality Report on your collection. Qdrant will select a random sample of 100 points for you and run an exact search and an ANN search. It will then compare the results and calculate the precision@10.
+
+![Search Quality Report in Qdrant WebUI](/articles_data/retrieval-quality/clean_precision_ran.png)
+
+You can also run a more comprehensive Search Quality Report by clicking on the `Advanced Mode` button. This will open an interface where you can specify parameters like `ef` and `m` and evaluate the search quality directly in the WebUI.
+
+![Search Quality Report in Qdrant WebUI](/articles_data/retrieval-quality/clean_advanced.png)
+
+### Step 7: Analyze Results and Conclusion
+
+Finally, analyze the results:
+
+- Compare the initial and new precision scores to determine the impact of parameter changes (a small helper for this is sketched below).
+- Assess whether the improvements justify the increased resource usage.
-The precision has obviously increased, and we know how to control it. However, there is a trade-off between the precision and the search
-latency and memory requirements. In some specific cases, we may want to increase the precision as much as possible, so now we know how
-to do it.
+By following these steps, you can effectively measure and improve the retrieval quality of your Qdrant-based semantic search system.
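+
+To make the comparison from Step 7 concrete, you could wrap `avg_precision_at_k` in a small helper that labels each measurement. This is purely a sketch; the helper name below is ours, not part of the official tutorial code.
+
+```python
+# Hypothetical helper: print and return a labelled precision@k score
+def report_precision(stage: str, k: int = 5) -> float:
+    score = avg_precision_at_k(k=k)
+    print(f"[{stage}] avg(precision@{k}) = {score:.4f}")
+    return score
+
+# Call it once after Step 5 (default HNSW config) and once after Step 6
+# (m=32, ef_construct=200), then compare the two returned values.
+```
+
+If the gain turns out to be marginal, the extra memory and indexing time of the larger graph may not be worth it.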
Qdrant's ANN capabilities provide the ability to efficiently handle massive datasets while maintaining acceptable levels of accuracy. -## Wrapping up +### Additional Considerations -Assessing the quality of retrieval is a critical aspect of [evaluating](https://qdrant.tech/rag/rag-evaluation-guide/) semantic search performance. It is imperative to measure retrieval quality when aiming for optimal quality of. -your search results. Qdrant provides a built-in exact search mode, which can be used to measure the quality of the ANN algorithm itself, -even in an automated way, as part of your CI/CD pipeline. +- **Embeddings Quality Is Crucial**: While tuning ANN parameters can improve retrieval quality, the quality of the embeddings is a major factor. +- **Exact Search as Baseline**: Use exact search as a baseline to compare with ANN to separate the impact of embedding quality from the search methods. +- **Monitor Performance**: Continuously monitor retrieval quality and adjust parameters accordingly to optimize both performance and accuracy. -Again, **the quality of the embeddings is the most important factor**. HNSW does a pretty good job in terms of precision, and it is -parameterizable and tunable, when required. There are some other ANN algorithms available out there, such as [IVF*](https://github.com/facebookresearch/faiss/wiki/Faiss-indexes#cell-probe-methods-indexivf-indexes), -but they usually [perform worse than HNSW in terms of quality and performance](https://nirantk.com/writing/pgvector-vs-qdrant/#correctness). +By leveraging Qdrant's ANN capabilities, you can build a robust and scalable semantic search system that delivers high-quality results even at the scale of billions of vectors. diff --git a/qdrant-landing/package-lock.json b/qdrant-landing/package-lock.json index 28fd4ecc0..a003ae794 100644 --- a/qdrant-landing/package-lock.json +++ b/qdrant-landing/package-lock.json @@ -3974,4 +3974,4 @@ "dev": true } } -} +} \ No newline at end of file diff --git a/qdrant-landing/package.json b/qdrant-landing/package.json index 6197c041a..5e42fd961 100644 --- a/qdrant-landing/package.json +++ b/qdrant-landing/package.json @@ -33,4 +33,4 @@ }, "name": "qdrant-landing", "version": "0.1.0" -} +} \ No newline at end of file diff --git a/qdrant-landing/static/articles_data/retrieval-quality/clean_advanced.png b/qdrant-landing/static/articles_data/retrieval-quality/clean_advanced.png new file mode 100644 index 000000000..9b88c7b80 Binary files /dev/null and b/qdrant-landing/static/articles_data/retrieval-quality/clean_advanced.png differ diff --git a/qdrant-landing/static/articles_data/retrieval-quality/clean_precision_ran.png b/qdrant-landing/static/articles_data/retrieval-quality/clean_precision_ran.png new file mode 100644 index 000000000..75f4f65db Binary files /dev/null and b/qdrant-landing/static/articles_data/retrieval-quality/clean_precision_ran.png differ diff --git a/qdrant-landing/static/articles_data/retrieval-quality/user_clean.png b/qdrant-landing/static/articles_data/retrieval-quality/user_clean.png new file mode 100644 index 000000000..6b3af2d3e Binary files /dev/null and b/qdrant-landing/static/articles_data/retrieval-quality/user_clean.png differ