
Conversation

@mayya-sharipova
Contributor

@mayya-sharipova mayya-sharipova commented Oct 24, 2025

Use IVF_PQ fallback for insufficient GPU memory

Add adaptive fallback to IVF_PQ algorithm when GPU memory is insufficient for
NN_DESCENT. Include distance type awareness to avoid Cosine distance with
IVF_PQ (unsupported in CUVS 25.10). Add TODO for CUVS 25.12+ upgrade.

Use IVF_PQ algorithm for GPU index building for large datasets (>= 5M vectors).
Temporarily add a factory for calculating IVF_PQ params (to be
removed with the CUVS 25.12+ upgrade).

Use IVF_PQ algorithm for GPU index building for large datasets (>= 1M vectors).
Temporarily add a factory for calculating IVF_PQ params.
Also skip the estimation of needed memory when IVF_PQ is used.
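The fallback described in the commits above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the class and method names, the memory-estimate formula, and the graph degree are all assumptions; only the NN_DESCENT-to-IVF_PQ switch on insufficient memory and the cosine restriction (IVF_PQ does not support cosine in CUVS 25.10) come from the PR description.

```java
// Hypothetical sketch of the adaptive fallback: prefer NN_DESCENT, but
// switch to IVF_PQ when the estimated working set exceeds available GPU
// memory -- unless the distance is cosine, which IVF_PQ does not support
// in CUVS 25.10.
public class GraphAlgoChooser {
    enum GraphAlgo { NN_DESCENT, IVF_PQ }
    enum Distance { EUCLIDEAN, DOT_PRODUCT, COSINE }

    // Rough (assumed) estimate: raw dataset plus the intermediate
    // NN_DESCENT graph of int neighbor ids, in bytes.
    static long estimateNnDescentBytes(long numVectors, int dims, int bytesPerDim, int graphDegree) {
        long dataset = numVectors * (long) dims * bytesPerDim;
        long graph = numVectors * (long) graphDegree * Integer.BYTES;
        return dataset + graph;
    }

    static GraphAlgo choose(long numVectors, int dims, int bytesPerDim, int graphDegree,
                            long freeGpuBytes, Distance distance) {
        long needed = estimateNnDescentBytes(numVectors, dims, bytesPerDim, graphDegree);
        if (needed <= freeGpuBytes || distance == Distance.COSINE) {
            // Cosine must stay on NN_DESCENT in CUVS 25.10.
            return GraphAlgo.NN_DESCENT;
        }
        return GraphAlgo.IVF_PQ;
    }

    public static void main(String[] args) {
        // 5.2M float32 vectors at 768 dims (~15 GB) on an 8 GB card:
        // falls back to IVF_PQ.
        System.out.println(choose(5_200_000L, 768, 4, 64, 8L << 30, Distance.DOT_PRODUCT));
    }
}
```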
@mayya-sharipova mayya-sharipova added >enhancement auto-backport Automatically create backport pull requests when merged :Search Relevance/Vectors Vector search v9.2.1 v9.3.0 labels Oct 24, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Oct 24, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Collaborator

Hi @mayya-sharipova, I've created a changelog YAML for you.

@mayya-sharipova mayya-sharipova marked this pull request as draft October 24, 2025 19:17
@mayya-sharipova
Contributor Author

With these params (1M byte vectors):

@achirkin Notice how here, when switching from NN_DESCENT, we got worse graph building time and denser graphs.

gist: 1,000,000 docs; 960 dims; euclidean metric

| index_type | force_merge_time (ms) | QPS (1 seg) | recall (1 seg) |
|---|---|---|---|
| cpu | 130129 | 421 | 0.91 |
| gpu NN_DESCENT | 20643 | 467 | 0.92 |
| gpu IVF_PQ | 36536 | 149 | 1 |

@mayya-sharipova mayya-sharipova marked this pull request as ready for review October 26, 2025 20:21
@mayya-sharipova
Contributor Author

mayya-sharipova commented Oct 26, 2025

These are benchmarks on an NVIDIA GeForce RTX 4060 with 8 GB of memory, which before this PR could not force-merge to 1 segment because of insufficient GPU memory. With this PR:

openai: 2.6M docs; float32; 1536 dims; dot_product

| index_type | index_time (ms) | force_merge_time (ms) | QPS (1 seg) | recall (1 seg) |
|---|---|---|---|---|
| cpu | 638066 | 807401 | 139 | 0.99 |
| gpu | 121300 | 191283 | 140 | 1 |

hotpotqa-arctic: 5.2M docs; float32; 768 dims; dot_product

| index_type | index_time (ms) | force_merge_time (ms) | QPS (1 seg) | recall (1 seg) |
|---|---|---|---|---|
| cpu | 666102 | 1231002 | 430 | 0.69 |
| gpu | 238028 | 506233 | 137 | 0.95 |

```java
double kmeansTrainsetFraction = Math.clamp(1.0 / Math.sqrt(nRows * 1e-5), minKmeansTrainsetFraction, maxKmeansTrainsetFraction);

// Calculate number of probes based on number of lists
int nProbes = Math.round((float) (Math.sqrt(nLists) / 20.0 + 4.0));
```
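For context, the two formulas quoted above can be pulled into a small, self-contained factory sketch. The clamp bounds here (0.05 and 1.0) and the example inputs are assumptions for illustration; only the two formulas themselves come from the PR. The PR uses `Math.clamp` (JDK 21+); the sketch below uses `Math.min`/`Math.max` so it runs on older JDKs.

```java
// Hedged sketch of the temporary IVF_PQ params factory formulas.
public final class IvfPqParamsFactory {
    static final double MIN_KMEANS_TRAINSET_FRACTION = 0.05; // assumed bound
    static final double MAX_KMEANS_TRAINSET_FRACTION = 1.0;  // assumed bound

    static double kmeansTrainsetFraction(long nRows) {
        // Use a smaller k-means training subset as the dataset grows;
        // equivalent to Math.clamp(...) on JDK 21+.
        double f = 1.0 / Math.sqrt(nRows * 1e-5);
        return Math.min(MAX_KMEANS_TRAINSET_FRACTION,
                Math.max(MIN_KMEANS_TRAINSET_FRACTION, f));
    }

    static int nProbes(int nLists) {
        // More lists -> probe slightly more of them at search time.
        return Math.round((float) (Math.sqrt(nLists) / 20.0 + 4.0));
    }

    public static void main(String[] args) {
        System.out.println(kmeansTrainsetFraction(1_000_000L)); // ~0.316
        System.out.println(nProbes(1024));                      // 6
    }
}
```

For 1M rows the trainset fraction works out to about 0.316, i.e. k-means trains on roughly a third of the data, and it shrinks toward the lower bound as the dataset grows.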


In the cuVS version, this parameter is overridden by CAGRA after calling the IVF_PQ params constructor to:

```cpp
std::round(2 + std::sqrt(ivf_pq_params.build_params.n_lists) / 20 + ef_construction / 16);
```

The difference is rather small, though.
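To see how close the two formulas land, here is a small hypothetical comparison harness; the `ef_construction` value of 32 is an assumed example, and the formulas are transcribed from the snippets above.

```java
// Compare the PR's n_probes formula with the value CAGRA overrides it to.
public class NProbesComparison {
    static int prFormula(int nLists) {
        // round(sqrt(n_lists) / 20 + 4), as in the PR's params factory.
        return Math.round((float) (Math.sqrt(nLists) / 20.0 + 4.0));
    }

    static int cagraOverride(int nLists, int efConstruction) {
        // Mirrors std::round(2 + std::sqrt(n_lists) / 20 + ef_construction / 16);
        // note ef_construction / 16 is integer division in the C++ original.
        return (int) Math.round(2 + Math.sqrt(nLists) / 20.0 + efConstruction / 16);
    }

    public static void main(String[] args) {
        for (int nLists : new int[] {256, 1024, 4096}) {
            System.out.println(nLists + ": PR=" + prFormula(nLists)
                    + " CAGRA=" + cagraOverride(nLists, 32));
        }
    }
}
```

With a small `ef_construction`, the two formulas agree for typical list counts, which matches the observation that the difference is small.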

Contributor Author


Addressed in fec345f

Contributor

@ldematte ldematte left a comment


The params factory looks good, but I have reservations about the resource manager changes. Let's see if we can figure out a more robust/cleaner way!

@ldematte
Contributor

ldematte commented Nov 5, 2025

One good option could be to limit this PR to the changes to CagraIndexParams creation, and we can worry about how to include the fallback for datasets larger than total GPU memory later.

- Implement automatic switching to IVF_PQ algorithm when NN_DESCENT requires
  more GPU memory than available.
- Cache GPU total memory in the GPUSupport module
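The "cache GPU total memory" step in the commit above could look roughly like this. Everything here is a hypothetical sketch: the class name, and the `LongSupplier` standing in for whatever native call actually reads the device's total memory, are assumptions.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: query total GPU memory once and cache it, since the
// value does not change over the life of the process.
public final class GpuMemoryCache {
    private final LongSupplier query; // stands in for the native GPU query
    private volatile long cached = -1;

    GpuMemoryCache(LongSupplier query) {
        this.query = query;
    }

    long totalGpuBytes() {
        long v = cached;
        if (v < 0) {
            synchronized (this) {
                if (cached < 0) {
                    cached = query.getAsLong(); // native query runs once
                }
                v = cached;
            }
        }
        return v;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        GpuMemoryCache cache = new GpuMemoryCache(() -> { calls[0]++; return 8L << 30; });
        System.out.println(cache.totalGpuBytes());
        System.out.println(cache.totalGpuBytes());
        System.out.println(calls[0]); // the supplier ran exactly once
    }
}
```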
@mayya-sharipova
Contributor Author

@ldematte Thanks for the review so far. I have updated the PR; it is ready for another round of review.

