@@ -1,6 +1,4 @@
Setup: check that np.random is used to construct the dataset, and that it is somehow passed and used as the dataset in the HNSW index.
Variables: check that mean, variance and dataset size are all varied individually and tested to see their effects.
Expected result:
1. **Effect on Increasing Data Size**: The HNSW index generally maintains high recall as data size increases, with occasional slight performance drops in certain cases.
1. **Effect on Increasing Data Size**: Increasing the dataset size leads to a slight decrease in recall.
2. **Effect on Increasing Mean**: The recall is not significantly affected by changes in the mean, showing robustness.
3. **Effect on Increasing Variance**: The recall remains mostly stable, with some rare instances of decreased performance.
3. **Effect on Increasing Variance**: Higher variance leads to a decline in recall.
Contributor

I think the previous ground truth is more correct

Collaborator

When the variance increases, the vectors become more dispersed, making it harder to capture all neighboring points, and the recall rate decreases. Does this reasoning make sense?
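The Collaborator's claim can be checked against the numbers in this PR. The sketch below copies the R@1 values for three efSearch settings from the "Effect on Increasing Variance" runs in `q11_data.txt` and tests whether recall falls as Std grows. The dictionary and helper names are illustrative assumptions, not part of the benchmark scripts.

```python
# R@1 values copied from the "Effect on Increasing Variance" runs in q11_data.txt.
# Outer key: efSearch, inner key: Std.
recall_by_std = {
    16: {1: 0.8337, 2: 0.7662, 3: 0.7324, 4: 0.7081},
    64: {1: 0.9844, 2: 0.9688, 3: 0.9541, 4: 0.9472},
    256: {1: 0.9997, 2: 0.9984, 3: 0.9974, 4: 0.9966},
}


def is_strictly_decreasing(values):
    """Return True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


def recall_drop(ef_search):
    """Absolute R@1 lost between Std=1 and Std=4 for one efSearch setting."""
    by_std = recall_by_std[ef_search]
    return by_std[1] - by_std[4]


for ef in sorted(recall_by_std):
    series = [recall_by_std[ef][std] for std in (1, 2, 3, 4)]
    print(f"efSearch={ef}: decreasing={is_strictly_decreasing(series)}, "
          f"drop={recall_drop(ef):.4f}")
```

On these runs the decline is largest at low efSearch and shrinks as efSearch grows, so the size of the effect depends on the search budget.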

@@ -1 +1 @@
Using higher \( M \) values improves recall with manageable increases in query latency, making it a favorable trade-off for applications prioritizing accuracy.
Using higher \( M \) values improves recall with manageable increases in query latency and memory usage, making it a favorable trade-off for applications prioritizing accuracy.
@@ -0,0 +1 @@
Memory usage remains the same as efSearch varies. With M=16, setting efSearch to 64 or higher achieves at least 90% recall while minimizing memory usage.
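One way to arrive at this answer is to filter the measured configurations by the recall threshold and then take the one with the smallest memory footprint. The sketch below is illustrative only: it hard-codes the "bounded queue True" values reported in `q2_data.txt`, and the names `MEMORY_KB`, `RECALL`, and `cheapest_config` are assumptions made for this example.

```python
# Memory usage (KB) per M, as reported in q2_data.txt; it does not change with efSearch.
MEMORY_KB = {16: 1280532.00, 24: 1336632.00, 32: 1405588.00}

# R@1 per M and efSearch ("bounded queue True" rows of q2_data.txt).
RECALL = {
    16: {16: 0.8015, 32: 0.8912, 64: 0.9490, 128: 0.9797, 256: 0.9930},
    24: {16: 0.8827, 32: 0.9487, 64: 0.9815, 128: 0.9944, 256: 0.9987},
    32: {16: 0.9088, 32: 0.9623, 64: 0.9872, 128: 0.9964, 256: 0.9992},
}


def cheapest_config(min_recall=0.90):
    """Return (M, efSearch, memory_kb) with the lowest memory meeting min_recall.

    Ties on memory are broken by the smaller efSearch, which is also the
    faster setting in the reported timings.
    """
    candidates = [
        (MEMORY_KB[m], ef, m)
        for m, by_ef in RECALL.items()
        for ef, recall in by_ef.items()
        if recall >= min_recall
    ]
    if not candidates:
        return None
    memory_kb, ef, m = min(candidates)
    return m, ef, memory_kb


print(cheapest_config())
```

Because memory is fixed per M, the search reduces to picking the smallest M that can reach the threshold at some efSearch, then the smallest efSearch that does so.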
@@ -1,4 +1 @@
Check that the script installs a package for memory measurement or uses some utility that can legitimately measure memory used.

HNSW should be better, something like this:
The HNSW index type achieves the required recall rate of at least 96% and does so with significantly less memory usage compared to the IVF index type.
The IVF index type achieves the required recall rate of at least 96% and does so with significantly less memory usage compared to the HNSW index type.
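For the memory-measurement check mentioned in this file, one utility that needs no extra package is the standard-library `resource` module, which reports the peak resident set size of the current process. This is only a sketch of one possible approach, not necessarily what the benchmark scripts use. It is Unix-only, and the helper name `peak_rss_kb` is an assumption.

```python
import resource
import sys


def peak_rss_kb():
    """Peak resident set size of this process, in KB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS,
    so the macOS value is converted to KB here.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        peak = peak / 1024
    return float(peak)


# Example: read the peak before and after building an index to compare runs.
print(f"peak RSS so far: {peak_rss_kb():.2f} KB")
```

Since this is a peak value, it never decreases within a process. Comparing index types fairly would likely require running each one in its own process.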
@@ -3,7 +3,7 @@ The starter file can be found under "faiss"
Instructions:
1. Set up the environment using `pip install faiss-cpu`.

2. Please use `np.random.random` to construct a normally distributed synthetic dataset. Refer to `faiss/tutorial/python/1-Flat.py` for code.
2. Please use `np.random.normal` to construct a normally distributed synthetic dataset. Refer to `faiss/tutorial/python/1-Flat.py` for code.

3. Read faiss/benchs/bench_hnsw.py. You will use this script for testing the HNSW index with the synthetic dataset. You will need to adapt the script to be able to use this synthetic dataset.
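Step 2 could be sketched as follows. This is a minimal example under stated assumptions: the function name `make_normal_dataset`, the default dimension, and the seed are illustrative choices, not values taken from the task description. Faiss indexes expect `float32` arrays, hence the cast.

```python
import numpy as np


def make_normal_dataset(nb, nq, d=64, mean=0.0, std=1.0, seed=1234):
    """Draw database and query vectors from N(mean, std^2), as float32.

    nb: number of database vectors, nq: number of query vectors,
    d: vector dimension (assumed value, adjust as needed).
    """
    np.random.seed(seed)  # fixed seed so runs are repeatable
    xb = np.random.normal(loc=mean, scale=std, size=(nb, d)).astype("float32")
    xq = np.random.normal(loc=mean, scale=std, size=(nq, d)).astype("float32")
    return xb, xq


# Vary one parameter at a time, e.g. the standard deviation.
xb, xq = make_normal_dataset(nb=1000, nq=100, mean=0.0, std=2.0)
print(xb.shape, xq.shape, xb.dtype)
```

Keeping mean, standard deviation, and dataset size as separate arguments makes it straightforward to vary each one individually while holding the others fixed.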

@@ -8,4 +8,4 @@ Instructions:
3. Read /starter_file/faiss/benchs/bench_hnsw.py. You will use this script for testing the HNSW index with SIFT1M dataset. Make sure to provide all input arguments required by the file. There are 3 of them.

Question:
What is the optimal combination of M and efSearch to minimize memory usage while maintaining a recall of at least 90%? Use k=10, efConstruction=40, and use varying M values of 16, 24, 32. efSearch is not a parameter that you need to touch.
What is the optimal combination of M and efSearch to minimize memory usage while maintaining a recall of at least 90%? Use k=10, efConstruction=40, and use varying M values of 16, 24, 32.
141 changes: 141 additions & 0 deletions benchmark/experimentation_bench/vector_index/raw_data/q11_data.txt
@@ -0,0 +1,141 @@
All experiments are based on fixed input arguments: 10 32 40

Effect on Increasing Data Size:

nb = 50000
efSearch 16 bounded queue True 0.002 ms per query, R@1 0.8257, missing rate 0.0000
efSearch 16 bounded queue False 0.001 ms per query, R@1 0.8257, missing rate 0.0000
efSearch 32 bounded queue True 0.002 ms per query, R@1 0.9368, missing rate 0.0000
efSearch 32 bounded queue False 0.002 ms per query, R@1 0.9368, missing rate 0.0000
efSearch 64 bounded queue True 0.004 ms per query, R@1 0.9832, missing rate 0.0000
efSearch 64 bounded queue False 0.004 ms per query, R@1 0.9832, missing rate 0.0000
efSearch 128 bounded queue True 0.008 ms per query, R@1 0.9964, missing rate 0.0000
efSearch 128 bounded queue False 0.007 ms per query, R@1 0.9964, missing rate 0.0000
efSearch 256 bounded queue True 0.013 ms per query, R@1 0.9995, missing rate 0.0000
efSearch 256 bounded queue False 0.013 ms per query, R@1 0.9995, missing rate 0.0000
nb = 75000
efSearch 16 bounded queue True 0.002 ms per query, R@1 0.8250, missing rate 0.0000
efSearch 16 bounded queue False 0.002 ms per query, R@1 0.8250, missing rate 0.0000
efSearch 32 bounded queue True 0.002 ms per query, R@1 0.9322, missing rate 0.0000
efSearch 32 bounded queue False 0.002 ms per query, R@1 0.9322, missing rate 0.0000
efSearch 64 bounded queue True 0.004 ms per query, R@1 0.9837, missing rate 0.0000
efSearch 64 bounded queue False 0.004 ms per query, R@1 0.9837, missing rate 0.0000
efSearch 128 bounded queue True 0.007 ms per query, R@1 0.9977, missing rate 0.0000
efSearch 128 bounded queue False 0.007 ms per query, R@1 0.9977, missing rate 0.0000
efSearch 256 bounded queue True 0.014 ms per query, R@1 0.9995, missing rate 0.0000
efSearch 256 bounded queue False 0.013 ms per query, R@1 0.9995, missing rate 0.0000
nb = 100000
efSearch 16 bounded queue True 0.002 ms per query, R@1 0.8326, missing rate 0.0000
efSearch 16 bounded queue False 0.002 ms per query, R@1 0.8326, missing rate 0.0000
efSearch 32 bounded queue True 0.003 ms per query, R@1 0.9371, missing rate 0.0000
efSearch 32 bounded queue False 0.002 ms per query, R@1 0.9371, missing rate 0.0000
efSearch 64 bounded queue True 0.004 ms per query, R@1 0.9853, missing rate 0.0000
efSearch 64 bounded queue False 0.004 ms per query, R@1 0.9853, missing rate 0.0000
efSearch 128 bounded queue True 0.007 ms per query, R@1 0.9970, missing rate 0.0000
efSearch 128 bounded queue False 0.007 ms per query, R@1 0.9970, missing rate 0.0000
efSearch 256 bounded queue True 0.013 ms per query, R@1 0.9996, missing rate 0.0000
efSearch 256 bounded queue False 0.012 ms per query, R@1 0.9996, missing rate 0.0000
nb = 125000
efSearch 16 bounded queue True 0.001 ms per query, R@1 0.8247, missing rate 0.0000
efSearch 16 bounded queue False 0.001 ms per query, R@1 0.8247, missing rate 0.0000
efSearch 32 bounded queue True 0.002 ms per query, R@1 0.9336, missing rate 0.0000
efSearch 32 bounded queue False 0.002 ms per query, R@1 0.9336, missing rate 0.0000
efSearch 64 bounded queue True 0.004 ms per query, R@1 0.9815, missing rate 0.0000
efSearch 64 bounded queue False 0.004 ms per query, R@1 0.9815, missing rate 0.0000
efSearch 128 bounded queue True 0.007 ms per query, R@1 0.9965, missing rate 0.0000
efSearch 128 bounded queue False 0.007 ms per query, R@1 0.9965, missing rate 0.0000
efSearch 256 bounded queue True 0.014 ms per query, R@1 0.9995, missing rate 0.0000
efSearch 256 bounded queue False 0.013 ms per query, R@1 0.9995, missing rate 0.0000

Effect on Increasing Mean:
Mean= 0
efSearch 16 bounded queue True 0.001 ms per query, R@1 0.8297, missing rate 0.0000
efSearch 16 bounded queue False 0.002 ms per query, R@1 0.8297, missing rate 0.0000
efSearch 32 bounded queue True 0.002 ms per query, R@1 0.9364, missing rate 0.0000
efSearch 32 bounded queue False 0.002 ms per query, R@1 0.9364, missing rate 0.0000
efSearch 64 bounded queue True 0.004 ms per query, R@1 0.9852, missing rate 0.0000
efSearch 64 bounded queue False 0.005 ms per query, R@1 0.9852, missing rate 0.0000
efSearch 128 bounded queue True 0.008 ms per query, R@1 0.9971, missing rate 0.0000
efSearch 128 bounded queue False 0.010 ms per query, R@1 0.9971, missing rate 0.0000
efSearch 256 bounded queue True 0.017 ms per query, R@1 0.9999, missing rate 0.0000
efSearch 256 bounded queue False 0.014 ms per query, R@1 0.9999, missing rate 0.0000
Mean= 1
efSearch 16 bounded queue True 0.002 ms per query, R@1 0.8332, missing rate 0.0000
efSearch 16 bounded queue False 0.002 ms per query, R@1 0.8332, missing rate 0.0000
efSearch 32 bounded queue True 0.003 ms per query, R@1 0.9367, missing rate 0.0000
efSearch 32 bounded queue False 0.002 ms per query, R@1 0.9367, missing rate 0.0000
efSearch 64 bounded queue True 0.004 ms per query, R@1 0.9865, missing rate 0.0000
efSearch 64 bounded queue False 0.004 ms per query, R@1 0.9865, missing rate 0.0000
efSearch 128 bounded queue True 0.008 ms per query, R@1 0.9976, missing rate 0.0000
efSearch 128 bounded queue False 0.007 ms per query, R@1 0.9976, missing rate 0.0000
efSearch 256 bounded queue True 0.014 ms per query, R@1 0.9998, missing rate 0.0000
efSearch 256 bounded queue False 0.013 ms per query, R@1 0.9998, missing rate 0.0000
Mean= 2
efSearch 16 bounded queue True 0.001 ms per query, R@1 0.8292, missing rate 0.0000
efSearch 16 bounded queue False 0.001 ms per query, R@1 0.8292, missing rate 0.0000
efSearch 32 bounded queue True 0.002 ms per query, R@1 0.9364, missing rate 0.0000
efSearch 32 bounded queue False 0.002 ms per query, R@1 0.9364, missing rate 0.0000
efSearch 64 bounded queue True 0.004 ms per query, R@1 0.9848, missing rate 0.0000
efSearch 64 bounded queue False 0.004 ms per query, R@1 0.9848, missing rate 0.0000
efSearch 128 bounded queue True 0.007 ms per query, R@1 0.9972, missing rate 0.0000
efSearch 128 bounded queue False 0.007 ms per query, R@1 0.9972, missing rate 0.0000
efSearch 256 bounded queue True 0.013 ms per query, R@1 0.9998, missing rate 0.0000
efSearch 256 bounded queue False 0.013 ms per query, R@1 0.9998, missing rate 0.0000
Mean= 3
efSearch 16 bounded queue True 0.001 ms per query, R@1 0.8276, missing rate 0.0000
efSearch 16 bounded queue False 0.001 ms per query, R@1 0.8276, missing rate 0.0000
efSearch 32 bounded queue True 0.003 ms per query, R@1 0.9359, missing rate 0.0000
efSearch 32 bounded queue False 0.002 ms per query, R@1 0.9359, missing rate 0.0000
efSearch 64 bounded queue True 0.004 ms per query, R@1 0.9855, missing rate 0.0000
efSearch 64 bounded queue False 0.005 ms per query, R@1 0.9855, missing rate 0.0000
efSearch 128 bounded queue True 0.008 ms per query, R@1 0.9974, missing rate 0.0000
efSearch 128 bounded queue False 0.008 ms per query, R@1 0.9974, missing rate 0.0000
efSearch 256 bounded queue True 0.013 ms per query, R@1 0.9998, missing rate 0.0000
efSearch 256 bounded queue False 0.013 ms per query, R@1 0.9998, missing rate 0.0000


Effect on Increasing Variance:
Std=1
efSearch 16 bounded queue True 0.002 ms per query, R@1 0.8337, missing rate 0.0000
efSearch 16 bounded queue False 0.001 ms per query, R@1 0.8337, missing rate 0.0000
efSearch 32 bounded queue True 0.002 ms per query, R@1 0.9365, missing rate 0.0000
efSearch 32 bounded queue False 0.002 ms per query, R@1 0.9365, missing rate 0.0000
efSearch 64 bounded queue True 0.004 ms per query, R@1 0.9844, missing rate 0.0000
efSearch 64 bounded queue False 0.004 ms per query, R@1 0.9844, missing rate 0.0000
efSearch 128 bounded queue True 0.007 ms per query, R@1 0.9972, missing rate 0.0000
efSearch 128 bounded queue False 0.007 ms per query, R@1 0.9972, missing rate 0.0000
efSearch 256 bounded queue True 0.013 ms per query, R@1 0.9997, missing rate 0.0000
efSearch 256 bounded queue False 0.013 ms per query, R@1 0.9997, missing rate 0.0000
Std=2
efSearch 16 bounded queue True 0.001 ms per query, R@1 0.7662, missing rate 0.0000
efSearch 16 bounded queue False 0.002 ms per query, R@1 0.7662, missing rate 0.0000
efSearch 32 bounded queue True 0.002 ms per query, R@1 0.8961, missing rate 0.0000
efSearch 32 bounded queue False 0.002 ms per query, R@1 0.8961, missing rate 0.0000
efSearch 64 bounded queue True 0.004 ms per query, R@1 0.9688, missing rate 0.0000
efSearch 64 bounded queue False 0.004 ms per query, R@1 0.9688, missing rate 0.0000
efSearch 128 bounded queue True 0.008 ms per query, R@1 0.9933, missing rate 0.0000
efSearch 128 bounded queue False 0.008 ms per query, R@1 0.9933, missing rate 0.0000
efSearch 256 bounded queue True 0.015 ms per query, R@1 0.9984, missing rate 0.0000
efSearch 256 bounded queue False 0.015 ms per query, R@1 0.9984, missing rate 0.0000
Std=3
efSearch 16 bounded queue True 0.002 ms per query, R@1 0.7324, missing rate 0.0000
efSearch 16 bounded queue False 0.002 ms per query, R@1 0.7324, missing rate 0.0000
efSearch 32 bounded queue True 0.003 ms per query, R@1 0.8702, missing rate 0.0000
efSearch 32 bounded queue False 0.003 ms per query, R@1 0.8702, missing rate 0.0000
efSearch 64 bounded queue True 0.005 ms per query, R@1 0.9541, missing rate 0.0000
efSearch 64 bounded queue False 0.005 ms per query, R@1 0.9541, missing rate 0.0000
efSearch 128 bounded queue True 0.009 ms per query, R@1 0.9895, missing rate 0.0000
efSearch 128 bounded queue False 0.009 ms per query, R@1 0.9895, missing rate 0.0000
efSearch 256 bounded queue True 0.017 ms per query, R@1 0.9974, missing rate 0.0000
efSearch 256 bounded queue False 0.016 ms per query, R@1 0.9974, missing rate 0.0000
Std=4
efSearch 16 bounded queue True 0.002 ms per query, R@1 0.7081, missing rate 0.0000
efSearch 16 bounded queue False 0.002 ms per query, R@1 0.7081, missing rate 0.0000
efSearch 32 bounded queue True 0.003 ms per query, R@1 0.8555, missing rate 0.0000
efSearch 32 bounded queue False 0.003 ms per query, R@1 0.8555, missing rate 0.0000
efSearch 64 bounded queue True 0.005 ms per query, R@1 0.9472, missing rate 0.0000
efSearch 64 bounded queue False 0.005 ms per query, R@1 0.9472, missing rate 0.0000
efSearch 128 bounded queue True 0.010 ms per query, R@1 0.9870, missing rate 0.0000
efSearch 128 bounded queue False 0.010 ms per query, R@1 0.9870, missing rate 0.0000
efSearch 256 bounded queue True 0.019 ms per query, R@1 0.9966, missing rate 0.0000
efSearch 256 bounded queue False 0.018 ms per query, R@1 0.9966, missing rate 0.0000
10 changes: 10 additions & 0 deletions benchmark/experimentation_bench/vector_index/raw_data/q1_data.txt
@@ -0,0 +1,10 @@
efSearch 16 bounded queue True 0.008 ms per query, R@1 0.9054, missing rate 0.0000
efSearch 16 bounded queue False 0.008 ms per query, R@1 0.9054, missing rate 0.0000
efSearch 32 bounded queue True 0.013 ms per query, R@1 0.9599, missing rate 0.0000
efSearch 32 bounded queue False 0.012 ms per query, R@1 0.9599, missing rate 0.0000
efSearch 64 bounded queue True 0.021 ms per query, R@1 0.9859, missing rate 0.0000
efSearch 64 bounded queue False 0.022 ms per query, R@1 0.9859, missing rate 0.0000
efSearch 128 bounded queue True 0.039 ms per query, R@1 0.9962, missing rate 0.0000
efSearch 128 bounded queue False 0.038 ms per query, R@1 0.9962, missing rate 0.0000
efSearch 256 bounded queue True 0.065 ms per query, R@1 0.9989, missing rate 0.0000
efSearch 256 bounded queue False 0.062 ms per query, R@1 0.9989, missing rate 0.0000
35 changes: 35 additions & 0 deletions benchmark/experimentation_bench/vector_index/raw_data/q2_data.txt
@@ -0,0 +1,35 @@
M=16
efSearch 16 bounded queue True 0.004 ms per query, R@1 0.8015, missing rate 0.0000, memory usage 1280532.00 KB
efSearch 16 bounded queue False 0.004 ms per query, R@1 0.8017, missing rate 0.0000, memory usage 1280532.00 KB
efSearch 32 bounded queue True 0.007 ms per query, R@1 0.8912, missing rate 0.0000, memory usage 1280532.00 KB
efSearch 32 bounded queue False 0.006 ms per query, R@1 0.8913, missing rate 0.0000, memory usage 1280532.00 KB
efSearch 64 bounded queue True 0.010 ms per query, R@1 0.9490, missing rate 0.0000, memory usage 1280532.00 KB
efSearch 64 bounded queue False 0.011 ms per query, R@1 0.9491, missing rate 0.0000, memory usage 1280532.00 KB
efSearch 128 bounded queue True 0.020 ms per query, R@1 0.9797, missing rate 0.0000, memory usage 1280532.00 KB
efSearch 128 bounded queue False 0.019 ms per query, R@1 0.9797, missing rate 0.0000, memory usage 1280532.00 KB
efSearch 256 bounded queue True 0.037 ms per query, R@1 0.9930, missing rate 0.0000, memory usage 1280532.00 KB
efSearch 256 bounded queue False 0.033 ms per query, R@1 0.9930, missing rate 0.0000, memory usage 1280532.00 KB

M=24
efSearch 16 bounded queue True 0.007 ms per query, R@1 0.8827, missing rate 0.0000, memory usage 1336632.00 KB
efSearch 16 bounded queue False 0.006 ms per query, R@1 0.8828, missing rate 0.0000, memory usage 1336632.00 KB
efSearch 32 bounded queue True 0.010 ms per query, R@1 0.9487, missing rate 0.0000, memory usage 1336632.00 KB
efSearch 32 bounded queue False 0.010 ms per query, R@1 0.9488, missing rate 0.0000, memory usage 1336632.00 KB
efSearch 64 bounded queue True 0.016 ms per query, R@1 0.9815, missing rate 0.0000, memory usage 1336632.00 KB
efSearch 64 bounded queue False 0.017 ms per query, R@1 0.9815, missing rate 0.0000, memory usage 1336632.00 KB
efSearch 128 bounded queue True 0.030 ms per query, R@1 0.9944, missing rate 0.0000, memory usage 1336632.00 KB
efSearch 128 bounded queue False 0.029 ms per query, R@1 0.9944, missing rate 0.0000, memory usage 1336632.00 KB
efSearch 256 bounded queue True 0.053 ms per query, R@1 0.9987, missing rate 0.0000, memory usage 1336632.00 KB
efSearch 256 bounded queue False 0.055 ms per query, R@1 0.9987, missing rate 0.0000, memory usage 1336632.00 KB

M=32
efSearch 16 bounded queue True 0.009 ms per query, R@1 0.9088, missing rate 0.0000, memory usage 1405588.00 KB
efSearch 16 bounded queue False 0.008 ms per query, R@1 0.9088, missing rate 0.0000, memory usage 1405588.00 KB
efSearch 32 bounded queue True 0.012 ms per query, R@1 0.9623, missing rate 0.0000, memory usage 1405588.00 KB
efSearch 32 bounded queue False 0.012 ms per query, R@1 0.9623, missing rate 0.0000, memory usage 1405588.00 KB
efSearch 64 bounded queue True 0.020 ms per query, R@1 0.9872, missing rate 0.0000, memory usage 1405588.00 KB
efSearch 64 bounded queue False 0.021 ms per query, R@1 0.9873, missing rate 0.0000, memory usage 1405588.00 KB
efSearch 128 bounded queue True 0.036 ms per query, R@1 0.9964, missing rate 0.0000, memory usage 1405588.00 KB
efSearch 128 bounded queue False 0.035 ms per query, R@1 0.9964, missing rate 0.0000, memory usage 1405588.00 KB
efSearch 256 bounded queue True 0.063 ms per query, R@1 0.9992, missing rate 0.0000, memory usage 1405588.00 KB
efSearch 256 bounded queue False 0.063 ms per query, R@1 0.9992, missing rate 0.0000, memory usage 1405588.00 KB
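The raw result lines in these data files follow a regular pattern, so they can be turned into structured records for comparison. The parser below is a sketch written against the line format shown in this PR. The regular expression and the field names are assumptions made for this example, and the memory field is treated as optional because only `q2_data.txt` reports it.

```python
import re

# Matches lines such as:
# "efSearch 16 bounded queue True 0.004 ms per query, R@1 0.8015,
#  missing rate 0.0000, memory usage 1280532.00 KB"
LINE_RE = re.compile(
    r"efSearch\s+(?P<ef>\d+)\s+bounded queue\s+(?P<bounded>True|False)\s+"
    r"(?P<ms>[\d.]+) ms per query, R@1 (?P<recall>[\d.]+), "
    r"missing rate (?P<missing>[\d.]+)"
    r"(?:, memory usage (?P<mem_kb>[\d.]+) KB)?"
)


def parse_line(line):
    """Parse one result line into a dict, or return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "efSearch": int(m["ef"]),
        "bounded_queue": m["bounded"] == "True",
        "ms_per_query": float(m["ms"]),
        "recall_at_1": float(m["recall"]),
        "missing_rate": float(m["missing"]),
        "memory_kb": float(m["mem_kb"]) if m["mem_kb"] else None,
    }


sample = ("efSearch 64 bounded queue True 0.010 ms per query, R@1 0.9490, "
          "missing rate 0.0000, memory usage 1280532.00 KB")
print(parse_line(sample))
```

Header lines such as `M=16` or `nb = 50000` return `None`, so a caller can use them to track which group the following records belong to.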