Matryoshka prefilter: adopt SearchDimension for large-corpus query latency

## Problem

Pituitary's vector search runs at full stored dimension today. For small corpora (< ~5k chunks) that's fine. For larger corpora — especially the cross-repo governance use case (#173) and multi-repo workspaces (#228) — full-dim cosine across thousands of chunks becomes the dominant query cost.

## Proposal

Adopt stroma v2's matryoshka prefilter via `SearchParams.SearchDimension`:

1. Run the first-pass vector scan at a smaller truncated dimension (e.g. 256 for an embedder trained with matryoshka loss).
2. Rescore the shortlist with full-dimension cosine.

Result: latency drops roughly proportionally to the prefilter dimension, recall stays within ~1-2% of full-dim for matryoshka-trained embedders.

## Why this matters for Pituitary

- Unlocks cross-repo governance (#173) and multi-repo workspaces (#228) at realistic corpus sizes without a CPU budget explosion.
- Enables richer retrieval features (wider shortlist for rerankers, contextual retrieval, spec-family-scoped search) by making each individual vector scan cheaper.
- Improves watch-mode / pre-commit hook responsiveness.

## Implementation notes

- Configuration surface: `runtime.search.prefilter_dimension = 256` (or similar). Default 0 = full-dim, no prefilter.
- Only meaningful for matryoshka-trained embedders (most OpenAI models, some OSS models). Gate behind embedder capability detection or explicit opt-in to avoid recall regressions on non-matryoshka models.
- Three call sites today embed `VectorSearchQuery`: `internal/index/search.go`, `internal/analysis/semantic_terminology.go`, `internal/analysis/repository_similarity.go`. All three need to thread the prefilter config.
- Benchmark against representative corpus to confirm recall stays within tolerance.

## Acceptance criteria

- [ ] `SearchParams.SearchDimension` threaded through all three vector-search call sites
- [ ] Config surface with safe default (off)
- [ ] Embedder-capability gate (don't enable on non-matryoshka embedders without explicit opt-in)
- [ ] Benchmark: latency reduction AND recall@k on a representative corpus at ≥5k chunks
- [ ] Documentation explaining when to enable and at what dimension

## Context

Unlocked by stroma v2.0.0 (merged in #337). Part of the Phase 4 stroma adoption plan.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Matryoshka prefilter: adopt SearchDimension for large-corpus query latency #341

Problem

Proposal

Why this matters for Pituitary

Implementation notes

Acceptance criteria

Context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Matryoshka prefilter: adopt SearchDimension for large-corpus query latency #341

Description

Problem

Proposal

Why this matters for Pituitary

Implementation notes

Acceptance criteria

Context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions