[SIGMOD 2025] SymphonyQG: Towards Symphonious Integration of Quantization and Graph for Approximate Nearest Neighbor Search
- AVX512 is required
- For details, please refer to our technical report.
../
├── data/ # datasets and indices
├── symqglib/
| ├── index/
| | ├── fastscan/ # helper function for FastScan
| | └── qg/ # quantized graph
| ├── third/ # third party dependency
| └── utils/ # common utils
├── python/ # python bindings
└── reproduce/ # code for reproduction
- Install from sources in Python env (recommended version: 3.10):
apt-get install -y python-setuptools python-pip
cd python/
pip install -r requirements.txt
sh build.shsymphonyqg.Index(index_type, metric, num_elements, dimension, degree_bound=32)- intialize a non-constructed indexindex_typedefines the index type, currently only support 'QG'metricdefines the metric space, currently only support 'L2'num_elementsdefines the number of elementsdimensiondefines the dimension of data vectordegree_bounddefines the maximum out-degree of graph, must be a multiple of 32
symphonyqg.Index methods:
build_index(data, EF, num_iter=3, num_threads=ALL_THREDS)- construct the index fromdatadatanumpy array of vectors,dtype=float32, shape:(num_elements, dimension)EFa parameter that controls the number of candidates during graph constructionnum_iternumber of interation for indexing, 3 by defaultnum_threadsnumber of threads for indexing, use all threads in system by default
save(filename)- save theIndexto given pathload(filename)- load theIndexfrom given path, the loaded index must have same initialization parameters as the objectset_ef(EF)- set the beam size to control time-accuracy trade-off of queryingsearch(query, k)- search approximateknearest neighbors for a givenqueryquerynumpy array of a query vector,dtype=float32, shape:(dimension,)or(1, dimension)
For examples on real-world datasets, please refer to ./reproduce
import symphonyqg
import numpy as np
D = 64
N = 100000
# Random data
data = np.random.random((N, D)).astype('float32')
# Init index
index = symphonyqg.Index("QG", "L2", num_elements=N, dimension=D, degree_bound=32)
# Construct index
index.build_index(data, 200)
# Set beam size for querying
index.set_ef(100)
# Search query
K = 10
for i in range(10):
query = data[i]
knn = index.search(query, K)
print(knn)
# Save index
index.save("./test.index")
del index
# Load index
index = symphonyqg.Index("QG", "L2", num_elements=N, dimension=D, degree_bound=32)
index.load("./test.index")- For downloading datasets and preprocessing, please refer to
./data/README.md - To build index and test query performance. please refer to
./reproduce/README.mdfor details
mkdir bin/ build/
cd build
cmake ..
make- Currently, we only add an example for indexing. The APIs will be updated later.