Evaluate quantized vector storage (int8/binary) for snapshot size reduction #340

@autholykos

Problem

Stroma snapshots store embeddings as float32 blobs. On a modest corpus (~200 spec/doc records, ~3k chunks, dim=1536) that is already about 18 MB of raw vector data per snapshot (3,000 × 1,536 × 4 bytes). Snapshots are also content-addressed, so every non-trivial corpus change writes a fresh file.
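For reference, the size estimate is plain arithmetic over the numbers quoted above. A minimal sketch (the helper name is made up for illustration, and it counts raw vector bytes only, not any per-record or SQLite overhead):

```go
package main

import "fmt"

// float32BlobBytes returns the raw size of `chunks` float32 vectors of
// dimension `dim`. Hypothetical helper, not part of stroma.
func float32BlobBytes(chunks, dim int) int {
	const bytesPerFloat32 = 4
	return chunks * dim * bytesPerFloat32
}

func main() {
	b := float32BlobBytes(3000, 1536)
	fmt.Printf("%d bytes (%.1f MB)\n", b, float64(b)/1e6) // 18432000 bytes (18.4 MB)
}
```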

This starts to matter when:

  • Users commit snapshot fixtures to repos (dogfooding, tests, reproducible CI).
  • CI artifacts carry snapshots across jobs.
  • Pre-commit hooks rebuild and discard many snapshot revisions per day.

Proposal

Expose stroma v2's quantization options through BuildOptions:

  • int8 quantization — 4× smaller, near-identical recall in typical conditions.
  • binary quantization — 32× smaller, 1-bit sign + full-precision rescore. More aggressive; worth measuring recall impact on spec/doc corpora specifically.
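To make the 4× and 32× figures concrete, here is a rough sketch of what the two encodings could look like. This is not stroma's actual blob format: `QuantizeInt8` and `QuantizeBinary` are illustrative names, and the per-vector symmetric scale is one common int8 scheme that I am assuming here, not something confirmed for stroma v2.

```go
package main

import (
	"fmt"
	"math"
)

// QuantizeInt8 maps each component into [-127, 127] using a per-vector scale
// (symmetric quantization). The scale has to be stored with the vector so
// values can be approximately reconstructed as float32(q[i]) * scale.
func QuantizeInt8(v []float32) (q []int8, scale float32) {
	var maxAbs float32
	for _, x := range v {
		if a := float32(math.Abs(float64(x))); a > maxAbs {
			maxAbs = a
		}
	}
	q = make([]int8, len(v))
	if maxAbs == 0 {
		return q, 0 // all-zero vector: nothing to scale
	}
	scale = maxAbs / 127
	for i, x := range v {
		q[i] = int8(math.Round(float64(x / scale)))
	}
	return q, scale
}

// QuantizeBinary keeps only the sign of each component, packed 8 per byte
// (bit set when the component is strictly positive).
func QuantizeBinary(v []float32) []byte {
	out := make([]byte, (len(v)+7)/8)
	for i, x := range v {
		if x > 0 {
			out[i/8] |= 1 << (i % 8)
		}
	}
	return out
}

func main() {
	v := make([]float32, 1536)
	for i := range v {
		v[i] = float32(math.Sin(float64(i))) // arbitrary test data
	}
	q, _ := QuantizeInt8(v)
	b := QuantizeBinary(v)
	// Bytes per vector: float32, int8 (plus a 4-byte scale), binary.
	fmt.Println(len(v)*4, len(q), len(b)) // 6144 1536 192
}
```

The int8 path carries a small constant overhead for the scale, so the real ratio is slightly under 4× per vector.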

Configure via the runtime block (e.g. runtime.quantization = "float32" | "int8" | "binary"), default float32 so nothing changes without opt-in.
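A possible shape for the config validation, as a sketch only. The `Quantization` type, its constants, and `ParseQuantization` are hypothetical names, not existing Pituitary or stroma identifiers; the point is just that an unset value resolves to float32 and unknown values fail loudly.

```go
package main

import "fmt"

// Quantization is a hypothetical type for the proposed runtime.quantization value.
type Quantization string

const (
	QuantFloat32 Quantization = "float32"
	QuantInt8    Quantization = "int8"
	QuantBinary  Quantization = "binary"
)

// ParseQuantization validates the configured value. An empty string means the
// key was not set, which keeps today's float32 behavior.
func ParseQuantization(raw string) (Quantization, error) {
	switch q := Quantization(raw); q {
	case "":
		return QuantFloat32, nil
	case QuantFloat32, QuantInt8, QuantBinary:
		return q, nil
	default:
		return "", fmt.Errorf("runtime.quantization: unsupported value %q (want float32, int8, or binary)", raw)
	}
}

func main() {
	for _, raw := range []string{"", "int8", "binary", "fp16"} {
		q, err := ParseQuantization(raw)
		fmt.Printf("%q -> %q, err=%v\n", raw, q, err)
	}
}
```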

Why this matters for Pituitary

  • Makes snapshot-in-repo a realistic pattern — a 32× smaller .stroma.db fits in a repo without bloating it.
  • Enables snapshot-as-CI-artifact without paying bandwidth + storage taxes.
  • Positions Pituitary for larger corpora (multi-repo governance and the cross-repo work in RFC: Cross-repo spec governance #173) where full-precision vectors stop being free.

Implementation notes

  • Use stroma/v2/store.{Encode,Decode}VectorBlob{,Int8,Binary}.
  • Binary quantization's rescore step adds a cosine-similarity pass at full dim — confirm it stays within the SearchParams latency budget on a representative corpus before defaulting.
  • The embedder stays float32 on the query side; quantization is a storage-and-prefilter concern.
  • Verify the reuse-probe path works correctly across quantization changes (quantization change should be equivalent to an embedder-fingerprint mismatch → forces rebuild).
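To illustrate the prefilter-then-rescore flow mentioned in the notes above, here is a self-contained sketch: rank candidates by Hamming distance over sign bits, then rescore a shortlist with cosine similarity at full dimension. Everything here is an assumption for illustration: `Search`, `signBits`, and the `shortlist` parameter are made-up names, and it assumes full-precision vectors are available at rescore time, which the issue does not spell out for stroma.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
	"sort"
)

// signBits packs the sign of each component, 8 per byte.
func signBits(v []float32) []byte {
	out := make([]byte, (len(v)+7)/8)
	for i, x := range v {
		if x > 0 {
			out[i/8] |= 1 << (i % 8)
		}
	}
	return out
}

// hamming counts differing bits between two equal-length codes.
func hamming(a, b []byte) int {
	d := 0
	for i := range a {
		d += bits.OnesCount8(a[i] ^ b[i])
	}
	return d
}

// cosine returns the cosine similarity, or 0 if either vector has zero norm.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Search prefilters by Hamming distance on binary codes, then rescores the
// shortlist with full-precision cosine similarity and returns the top k indices.
func Search(query []float32, corpus [][]float32, codes [][]byte, shortlist, k int) []int {
	type cand struct {
		idx   int
		score float64
	}
	qc := signBits(query) // the query stays float32; only its code is binary
	cands := make([]cand, len(codes))
	for i, c := range codes {
		cands[i] = cand{i, float64(hamming(qc, c))}
	}
	sort.SliceStable(cands, func(a, b int) bool { return cands[a].score < cands[b].score })
	if shortlist > len(cands) {
		shortlist = len(cands)
	}
	cands = cands[:shortlist]
	for i := range cands {
		cands[i].score = cosine(query, corpus[cands[i].idx]) // the rescore pass
	}
	sort.SliceStable(cands, func(a, b int) bool { return cands[a].score > cands[b].score })
	if k > len(cands) {
		k = len(cands)
	}
	out := make([]int, k)
	for i := range out {
		out[i] = cands[i].idx
	}
	return out
}

func main() {
	corpus := [][]float32{{1, 1, 1, 1}, {-1, -1, -1, -1}, {1, 1, 1, -1}}
	codes := make([][]byte, len(corpus))
	for i, v := range corpus {
		codes[i] = signBits(v)
	}
	fmt.Println(Search([]float32{1, 1, 1, 0.9}, corpus, codes, 2, 2)) // [0 2]
}
```

The rescore cost scales with `shortlist × dim`, so the shortlist size is the knob to tune when checking the latency budget.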

Acceptance criteria

  • runtime.quantization config surface
  • int8 and binary paths exercised through rebuild → search end-to-end
  • Benchmark: snapshot size reduction AND precision@k/recall@k delta on a representative corpus
  • Quantization change triggers rebuild (not a reuse-compatible delta)
  • Documentation recommending int8 as the near-always-safe choice and calling out binary's rescore overhead
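For the benchmark criterion, one way to measure the recall delta is to treat the float32 search results as ground truth and compare the quantized results against them. A minimal sketch, with `RecallAtK` as a hypothetical helper name and result lists given as record indices in rank order:

```go
package main

import "fmt"

// RecallAtK returns the fraction of the top-k ground-truth results (for
// example, from the float32 index) that also appear in the top-k results
// of the index under test.
func RecallAtK(truth, got []int, k int) float64 {
	if k > len(truth) {
		k = len(truth)
	}
	if k <= 0 {
		return 0
	}
	want := make(map[int]struct{}, k)
	for _, id := range truth[:k] {
		want[id] = struct{}{}
	}
	n := k
	if n > len(got) {
		n = len(got)
	}
	hits := 0
	for _, id := range got[:n] {
		if _, ok := want[id]; ok {
			hits++
			delete(want, id) // count each ground-truth id at most once
		}
	}
	return float64(hits) / float64(k)
}

func main() {
	float32Top := []int{1, 2, 3, 4} // ground truth from full precision
	int8Top := []int{2, 1, 9, 3}    // results from a quantized index
	fmt.Printf("recall@4 = %.2f\n", RecallAtK(float32Top, int8Top, 4)) // recall@4 = 0.75
}
```

Averaging this over a fixed query set per quantization mode, next to the snapshot byte size, gives the two numbers the benchmark asks for.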

Context

Unlocked by stroma v2.0.0 (merged in #337). Part of the Phase 4 stroma adoption plan.

Labels: area:core, area:performance, ccd/priority:next, type:feature
