Use IVF_PQ for GPU index build for large datasets #137126
base: main
Conversation
Use the IVF_PQ algorithm for GPU index building for large datasets (>= 1M vectors). Temporarily add a factory for calculating IVF_PQ params. Also skip estimation of needed memory when IVF_PQ is used.
Pinging @elastic/es-search-relevance (Team:Search Relevance)

Hi @mayya-sharipova, I've created a changelog YAML for you.
With these params (1M byte vectors): @achirkin Notice how, when switching from NN_DESCENT, we got worse graph-building time and denser graphs. gist: 1_000_000 docs; 960 dims; euclidean metric
Add adaptive fallback to IVF_PQ algorithm when GPU memory is insufficient for NN_DESCENT. Include distance type awareness to avoid Cosine distance with IVF_PQ (unsupported in CUVS 25.10). Add TODO for CUVS 25.12+ upgrade.
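A minimal sketch of what this fallback decision could look like, with hypothetical names (GraphBuildAlgo, estimatedNnDescentBytes, gpuTotalBytes, and isCosine are illustrative, not the PR's actual API):

enum GraphBuildAlgo { NN_DESCENT, IVF_PQ }

static GraphBuildAlgo chooseBuildAlgo(long estimatedNnDescentBytes, long gpuTotalBytes, boolean isCosine) {
    // Fall back to IVF_PQ only when NN_DESCENT would not fit in GPU memory
    // and the metric allows it: cosine distance is unsupported with IVF_PQ
    // in CUVS 25.10 (to be revisited after the CUVS 25.12+ upgrade).
    if (estimatedNnDescentBytes > gpuTotalBytes && !isCosine) {
        return GraphBuildAlgo.IVF_PQ;
    }
    return GraphBuildAlgo.NN_DESCENT;
}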
These are benchmarks on an NVIDIA GeForce RTX 4060 (8 GB memory), which before this PR could not force-merge to 1 segment because of insufficient GPU memory. With this PR:
openai: 2.6M docs; float32; 1536 dims; dot_product
hotpotqa-arctic: 5.2M docs; float32; 768 dims; dot_product
double kmeansTrainsetFraction = Math.clamp(1.0 / Math.sqrt(nRows * 1e-5), minKmeansTrainsetFraction, maxKmeansTrainsetFraction);

// Calculate number of probes based on number of lists
int nProbes = Math.round((float) (Math.sqrt(nLists) / 20.0 + 4.0));
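Putting the two heuristics above together, a rough sketch of the params factory might look like this (the class name and the clamp bounds are illustrative assumptions; the PR's actual constants may differ):

final class IvfPqParamsSketch {
    final double kmeansTrainsetFraction;
    final int nProbes;

    IvfPqParamsSketch(long nRows, int nLists) {
        double minKmeansTrainsetFraction = 0.05; // assumed lower bound
        double maxKmeansTrainsetFraction = 1.0;  // assumed upper bound
        // Train k-means on a shrinking fraction of rows as the dataset grows,
        // clamped to [min, max]; at 1M rows this evaluates to ~0.316.
        this.kmeansTrainsetFraction = Math.clamp(1.0 / Math.sqrt(nRows * 1e-5), minKmeansTrainsetFraction, maxKmeansTrainsetFraction);
        // Scale the number of probes with the square root of the list count.
        this.nProbes = Math.round((float) (Math.sqrt(nLists) / 20.0 + 4.0));
    }
}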
In the cuVS version, this parameter is overridden by CAGRA after calling the ivf-pq params constructor to:
std::round(2 + std::sqrt(ivf_pq_params.build_params.n_lists) / 20 + ef_construction / 16);
The difference is rather small though.
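For illustration, with n_lists = 1000 and ef_construction = 64 (assumed values), the PR's formula gives round(sqrt(1000) / 20 + 4) = 6 probes, while the cuVS override gives round(2 + sqrt(1000) / 20 + 64 / 16) = 8.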
Addressed in fec345f
ldematte left a comment
The params factory looks good, but I have reservations about the resource manager changes. Let's see if we can figure out a more robust/cleaner way!
One good option could be to limit this PR to changes to CagraIndexParams creation, and we can worry about how to include the fallback for datasets > total GPU memory later.
- Implement automatic switching to the IVF_PQ algorithm when NN_DESCENT requires more GPU memory than available.
- Cache GPU total memory in the GPUSupport module (sketched below)
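A sketch of the caching, assuming a hypothetical query method (GPUSupportSketch and queryTotalGpuMemoryBytes are illustrative, not the module's actual API):

final class GPUSupportSketch {
    // Total GPU memory does not change at runtime, so query it once
    // and reuse the cached value for every fallback decision.
    private static volatile long totalGpuMemoryBytes = -1;

    static long totalGpuMemoryBytes() {
        long cached = totalGpuMemoryBytes;
        if (cached < 0) {
            cached = queryTotalGpuMemoryBytes();
            totalGpuMemoryBytes = cached;
        }
        return cached;
    }

    private static long queryTotalGpuMemoryBytes() {
        // Hypothetical stub: in the real module this would come from the CUDA runtime.
        return 8L << 30;
    }
}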
@ldematte Thanks for the review so far, I have updated the PR. It is ready for another round of review.
Use IVF_PQ fallback for insufficient GPU memory
Add adaptive fallback to IVF_PQ algorithm when GPU memory is insufficient for
NN_DESCENT. Include distance type awareness to avoid Cosine distance with
IVF_PQ (unsupported in CUVS 25.10). Add TODO for CUVS 25.12+ upgrade.
Use IVF_PQ algorithm for GPU index building for large datasets (>= 5M vectors).
Temporarily add a factory for calculating IVF_PQ params (to be
removed with the CUVS 25.12+ upgrade).