
[Enhance]: parallelization for large batch sizes #49

@mmbejani

Description

Affected Component

ailego

Current Behavior

The current compute_one_to_many_* implementations in zvec::ailego::DistanceBatch process batches sequentially.

  • Inner-product computation is vectorized (AVX2) but single-threaded.
  • Throughput is limited by a single core when BatchSize is large.
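For reference, here is a simplified sketch of the sequential structure described above. It is a hypothetical stand-in, not the actual ailego code. `InnerProduct` and `ComputeOneToManySequential` are illustrative names, vectors are assumed to be stored row-major as `float`, and the inner loop is plain scalar code rather than AVX2. The point it shows is that every block of `dp_batch` vectors runs one after another on the calling thread.

```cpp
#include <cassert>
#include <cstddef>

// Scalar inner product (assumption: the real kernels vectorize this with AVX2).
inline float InnerProduct(const float *a, const float *b, std::size_t dim) {
  float sum = 0.0f;
  for (std::size_t k = 0; k < dim; ++k) sum += a[k] * b[k];
  return sum;
}

// Hypothetical one-to-many kernel: writes query . row_i into out[i] for each of
// the batch_size rows of `matrix`. Blocks of dp_batch rows are processed
// sequentially on a single thread, so only one core is ever busy.
inline void ComputeOneToManySequential(const float *query, const float *matrix,
                                       std::size_t dim, std::size_t batch_size,
                                       std::size_t dp_batch, float *out) {
  assert(dp_batch > 0);
  for (std::size_t begin = 0; begin < batch_size; begin += dp_batch) {
    // The last block may be partial when batch_size % dp_batch != 0.
    const std::size_t end =
        begin + dp_batch < batch_size ? begin + dp_batch : batch_size;
    for (std::size_t i = begin; i < end; ++i) {
      out[i] = InnerProduct(query, matrix + i * dim, dim);
    }
  }
}
```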

Desired Improvement

For large batch sizes (e.g. hundreds or thousands of vectors), the outer loop over BatchSize / dp_batch is embarrassingly parallel, since its iterations are independent, and can benefit significantly from multi-core CPUs.

Introduce optional OpenMP parallelization when the batch size exceeds a configurable threshold.

Example strategy:

  • Keep current behavior for small batches (to avoid OpenMP overhead).
  • Use #pragma omp parallel for over the batch dimension for large batches.
  • Guard with #ifdef _OPENMP to preserve portability.
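A minimal sketch of this strategy, assuming vectors are stored row-major as `float`. `ComputeOneToMany`, `ProcessBlock` and `kDefaultParallelThreshold` are hypothetical names, not existing ailego APIs, and the default threshold of 1024 vectors is a placeholder that would need benchmarking. The inner product is scalar here; the real kernels would keep their AVX2 code inside each block.

```cpp
#include <cassert>
#include <cstddef>

// Placeholder default (assumption): batches smaller than this stay sequential
// to avoid OpenMP fork/join overhead. The best value is hardware-dependent.
constexpr std::size_t kDefaultParallelThreshold = 1024;

// Scalar inner product (assumption: the real kernels use AVX2 here).
inline float InnerProduct(const float *a, const float *b, std::size_t dim) {
  float sum = 0.0f;
  for (std::size_t k = 0; k < dim; ++k) sum += a[k] * b[k];
  return sum;
}

// Handles rows [block * dp_batch, min((block + 1) * dp_batch, batch_size)).
// Each block writes a disjoint slice of `out`, so blocks need no locking.
inline void ProcessBlock(const float *query, const float *matrix,
                         std::size_t dim, std::size_t batch_size,
                         std::size_t dp_batch, std::size_t block, float *out) {
  const std::size_t begin = block * dp_batch;
  const std::size_t end =
      begin + dp_batch < batch_size ? begin + dp_batch : batch_size;
  for (std::size_t i = begin; i < end; ++i) {
    out[i] = InnerProduct(query, matrix + i * dim, dim);
  }
}

inline void ComputeOneToMany(
    const float *query, const float *matrix, std::size_t dim,
    std::size_t batch_size, std::size_t dp_batch, float *out,
    std::size_t parallel_threshold = kDefaultParallelThreshold) {
  assert(dp_batch > 0);
  // Round up so a partial final block is still processed.
  const std::size_t num_blocks = (batch_size + dp_batch - 1) / dp_batch;
#ifdef _OPENMP
  if (batch_size >= parallel_threshold) {
    // Large batch: distribute whole blocks across the OpenMP thread team.
    // Signed loop index, as required by older OpenMP versions.
    const long long n = static_cast<long long>(num_blocks);
#pragma omp parallel for schedule(static)
    for (long long b = 0; b < n; ++b) {
      ProcessBlock(query, matrix, dim, batch_size, dp_batch,
                   static_cast<std::size_t>(b), out);
    }
    return;
  }
#endif
  // Small batch, or built without OpenMP: unchanged sequential path.
  for (std::size_t b = 0; b < num_blocks; ++b) {
    ProcessBlock(query, matrix, dim, batch_size, dp_batch, b, out);
  }
}
```

With GCC or Clang, building with `-fopenmp` defines `_OPENMP` and enables the parallel branch; without it, only the sequential loop is compiled, which matches the portability goal above. Parallelizing whole blocks, rather than the inner product itself, keeps each block's AVX2 kernel untouched.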

Impact

  • Improved throughput for large-scale distance computations.
  • No behavior change for small batch sizes.
  • Backward compatible when OpenMP is not enabled.
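Because only the outer loop is parallelized, each output element is still computed by one thread with the same inner-loop order, so the parallel and sequential paths should give bitwise-identical results. A hypothetical check for that property is sketched below; `ComputeOneToMany` and `PathsAgree` are illustrative names, and this variant uses OpenMP's `if` clause instead of an explicit branch, so a "small" batch still enters a parallel region but runs it on a single thread.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical kernel: out[i] = query . row_i for a row-major matrix, with
// blocks of dp_batch rows spread across threads when batch_size >= threshold.
inline void ComputeOneToMany(const float *query, const float *matrix,
                             std::size_t dim, std::size_t batch_size,
                             std::size_t dp_batch, float *out,
                             std::size_t threshold) {
  assert(dp_batch > 0);
  const long long num_blocks =
      static_cast<long long>((batch_size + dp_batch - 1) / dp_batch);
  const bool parallel = batch_size >= threshold;
  (void)parallel;  // unused when OpenMP is disabled
#ifdef _OPENMP
#pragma omp parallel for schedule(static) if (parallel)
#endif
  for (long long b = 0; b < num_blocks; ++b) {
    const std::size_t begin = static_cast<std::size_t>(b) * dp_batch;
    const std::size_t end =
        begin + dp_batch < batch_size ? begin + dp_batch : batch_size;
    for (std::size_t i = begin; i < end; ++i) {
      float sum = 0.0f;  // same summation order on either path
      for (std::size_t k = 0; k < dim; ++k) sum += query[k] * matrix[i * dim + k];
      out[i] = sum;
    }
  }
}

// Runs the kernel forced-sequential and forced-parallel on the same
// deterministic input and reports whether all outputs are exactly equal.
inline bool PathsAgree(std::size_t dim, std::size_t batch_size,
                       std::size_t dp_batch) {
  std::vector<float> query(dim), matrix(dim * batch_size);
  for (std::size_t k = 0; k < dim; ++k) query[k] = 0.5f + static_cast<float>(k % 7);
  for (std::size_t j = 0; j < matrix.size(); ++j)
    matrix[j] = static_cast<float>(j % 13) * 0.25f;
  std::vector<float> seq(batch_size), par(batch_size);
  // Threshold = max size_t: never parallel. Threshold = 0: always parallel.
  ComputeOneToMany(query.data(), matrix.data(), dim, batch_size, dp_batch,
                   seq.data(), std::numeric_limits<std::size_t>::max());
  ComputeOneToMany(query.data(), matrix.data(), dim, batch_size, dp_batch,
                   par.data(), 0);
  return seq == par;
}
```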

Metadata

Assignees

Labels

enhancement (Improve an existing feature or component)

Type

No type

Projects

Status

Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests