Matryoshka prefilter: adopt SearchDimension for large-corpus query latency #341

@autholykos

Problem

Pituitary's vector search currently runs at the full stored dimension. For small corpora (< ~5k chunks) that's fine. For larger corpora, especially the cross-repo governance use case (#173) and multi-repo workspaces (#228), full-dim cosine across many thousands of chunks becomes the dominant query cost.

Proposal

Adopt stroma v2's matryoshka prefilter via SearchParams.SearchDimension:

  1. Run the first-pass vector scan at a smaller truncated dimension (e.g. 256 for an embedder trained with matryoshka loss).
  2. Rescore the shortlist with full-dimension cosine.

Expected result: first-pass scan latency drops roughly in proportion to the prefilter dimension, while recall stays within ~1-2% of full-dim for matryoshka-trained embedders.
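To make the two passes concrete, here is a minimal standalone sketch in Go. It is not stroma's implementation and does not use its API: `Hit`, `TwoPassSearch`, the in-memory map of vectors, and the `shortlist` parameter are all hypothetical names chosen for illustration. It assumes every stored vector has the same length as the query.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Hit is a hypothetical result type: a chunk ID and its cosine score.
type Hit struct {
	ID    string
	Score float64
}

// cosine compares only the first dim components of a and b. Dividing by the
// norms of the truncated prefixes re-normalises them implicitly.
func cosine(a, b []float32, dim int) float64 {
	var dot, na, nb float64
	for i := 0; i < dim; i++ {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK scores the given ids at dimension dim and keeps the k best.
// Ties are broken by ID so the result is deterministic.
func topK(query []float32, vecs map[string][]float32, ids []string, dim, k int) []Hit {
	hits := make([]Hit, 0, len(ids))
	for _, id := range ids {
		hits = append(hits, Hit{ID: id, Score: cosine(query, vecs[id], dim)})
	}
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].Score != hits[j].Score {
			return hits[i].Score > hits[j].Score
		}
		return hits[i].ID < hits[j].ID
	})
	if len(hits) > k {
		hits = hits[:k]
	}
	return hits
}

// TwoPassSearch prefilters at searchDim, then rescores the shortlist at full
// dimension. searchDim <= 0 (or >= the full dimension) means a single
// full-dimension scan with no prefilter.
func TwoPassSearch(query []float32, vecs map[string][]float32, searchDim, shortlist, k int) []Hit {
	fullDim := len(query)
	all := make([]string, 0, len(vecs))
	for id := range vecs {
		all = append(all, id)
	}
	if searchDim <= 0 || searchDim >= fullDim {
		return topK(query, vecs, all, fullDim, k)
	}
	coarse := topK(query, vecs, all, searchDim, shortlist) // pass 1: truncated
	ids := make([]string, len(coarse))
	for i, h := range coarse {
		ids[i] = h.ID
	}
	return topK(query, vecs, ids, fullDim, k) // pass 2: full-dim rescore
}

func main() {
	vecs := map[string][]float32{
		"a": {1, 0, 0, 0},
		"b": {1, 0, 1, 0},
		"c": {0, 1, 0, 0},
	}
	query := []float32{1, 0, 0, 0}
	for _, h := range TwoPassSearch(query, vecs, 2, 2, 2) {
		fmt.Printf("%s %.3f\n", h.ID, h.Score)
	}
}
```

In the toy data, "a" and "b" are indistinguishable at 2 dimensions and only the full-dimension rescore separates them, which is why the shortlist should be comfortably larger than the final k.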

Why this matters for Pituitary

Implementation notes

  • Configuration surface: runtime.search.prefilter_dimension = 256 (or similar). Default 0 = full-dim, no prefilter.
  • Only meaningful for matryoshka-trained embedders (e.g. OpenAI's text-embedding-3 models, some OSS models). Gate behind embedder capability detection or explicit opt-in to avoid recall regressions on non-matryoshka models.
  • Three call sites today embed VectorSearchQuery: internal/index/search.go, internal/analysis/semantic_terminology.go, internal/analysis/repository_similarity.go. All three need to thread the prefilter config.
  • Benchmark against representative corpus to confirm recall stays within tolerance.
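One possible shape for the config default and the capability gate described above, as a rough sketch. Only the `prefilter_dimension` setting comes from this issue; the `SearchConfig` and `Embedder` types, the `ForcePrefilter` opt-in flag, and the `Matryoshka` capability field are hypothetical and would need to match whatever Pituitary's config and embedder metadata actually expose.

```go
package main

import (
	"errors"
	"fmt"
)

// SearchConfig is a hypothetical mirror of the proposed runtime.search settings.
type SearchConfig struct {
	PrefilterDimension int  // 0 = full-dim, no prefilter (the safe default)
	ForcePrefilter     bool // hypothetical explicit opt-in for unknown embedders
}

// Embedder is hypothetical capability metadata for the configured embedder.
type Embedder struct {
	Name       string
	Dimension  int
	Matryoshka bool // true if the model is known to be matryoshka-trained
}

// ErrInvalidPrefilter reports a prefilter dimension outside [0, embedder dimension].
var ErrInvalidPrefilter = errors.New("prefilter dimension must be between 0 and the embedder dimension")

// EffectiveSearchDimension returns the value to use as the search dimension.
// A result of 0 means "search at full dimension".
func EffectiveSearchDimension(cfg SearchConfig, e Embedder) (int, error) {
	d := cfg.PrefilterDimension
	if d < 0 || d > e.Dimension {
		return 0, ErrInvalidPrefilter
	}
	if d == 0 || d == e.Dimension {
		return 0, nil // prefilter disabled or a no-op
	}
	if !e.Matryoshka && !cfg.ForcePrefilter {
		return 0, nil // gate: stay full-dim unless the user opted in explicitly
	}
	return d, nil
}

func main() {
	cfg := SearchConfig{PrefilterDimension: 256}
	for _, e := range []Embedder{
		{Name: "matryoshka-model", Dimension: 1024, Matryoshka: true},
		{Name: "plain-model", Dimension: 1024, Matryoshka: false},
	} {
		dim, err := EffectiveSearchDimension(cfg, e)
		fmt.Println(e.Name, dim, err)
	}
}
```

Resolving the dimension in one helper would let all three call sites share the same gate instead of each re-implementing it.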

Acceptance criteria

  • SearchParams.SearchDimension threaded through all three vector-search call sites
  • Config surface with safe default (off)
  • Embedder-capability gate (don't enable on non-matryoshka embedders without explicit opt-in)
  • Benchmark: latency reduction AND recall@k on a representative corpus at ≥5k chunks
  • Documentation explaining when to enable and at what dimension
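For the recall half of the benchmark criterion, one simple way to measure recall@k is to treat the full-dimension top-k as ground truth and count how many of those IDs the prefiltered search also returns. The helper below is a sketch under that assumption; `RecallAtK` and its signature are hypothetical, not part of stroma or Pituitary.

```go
package main

import "fmt"

// RecallAtK returns the fraction of the baseline's top-k IDs that also appear
// in the candidate's top-k. Both slices are assumed to be ranked best-first.
// It returns 0 when k <= 0 or the baseline is empty.
func RecallAtK(baseline, candidate []string, k int) float64 {
	if k <= 0 || len(baseline) == 0 {
		return 0
	}
	if len(baseline) > k {
		baseline = baseline[:k]
	}
	if len(candidate) > k {
		candidate = candidate[:k]
	}
	want := make(map[string]struct{}, len(baseline))
	for _, id := range baseline {
		want[id] = struct{}{}
	}
	hits := 0
	for _, id := range candidate {
		if _, ok := want[id]; ok {
			hits++
			delete(want, id) // count each baseline ID at most once
		}
	}
	return float64(hits) / float64(len(baseline))
}

func main() {
	fullDim := []string{"a", "b", "c", "d"}     // ranking from the full-dim scan
	prefiltered := []string{"a", "c", "x", "b"} // ranking with the prefilter on
	fmt.Printf("recall@4 = %.2f\n", RecallAtK(fullDim, prefiltered, 4))
}
```

Averaging this over a fixed query set, at the same k the product uses, gives a single number to compare against the tolerance.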

Context

Unlocked by stroma v2.0.0 (merged in #337). Part of the Phase 4 stroma adoption plan.

Labels

  • area:core (Core source, config, and model capabilities)
  • area:performance (Performance and scale characteristics)
  • ccd/priority:next (CCD: next up)
  • type:feature (implementing a new feature)
