
Add outline-guided retrieval path for governance analysis #385

@autholykos

Context

The useful PageIndex lesson for Pituitary is not "vectorless RAG" as a replacement for Stroma. It is the retrieval workflow: inspect document structure, select likely sections, expand local/parent context, then run product-specific reasoning.

That workflow belongs in Pituitary because section-selection prompts, status handling, historical-vs-active preference, and governance confidence are product semantics.

Current adjacent work:

Proposed Capability

Add an outline-guided retrieval path for governance analysis commands such as review-spec, check-doc-drift, and check-compliance.

A first implementation can be simple:

  1. Search candidate records/chunks through the existing Stroma-backed retrieval path.
  2. Read the candidate record outline/tree once Stroma exposes it, or use current section lineage where available.
  3. Select relevant sections deterministically first, with optional model-assisted selection only when the analysis runtime is configured.
  4. Expand selected chunks with ExpandContext, including parent context when available.
  5. Pass bounded, provenance-rich contexts into the governance analyzer.
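The steps above could be sketched roughly as follows. This is a minimal illustration, not Pituitary's or Stroma's actual API: `Section`, `select_sections`, `expand`, and the in-memory outline are hypothetical stand-ins, and heading-term overlap is only one possible deterministic selection rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Section:
    """Hypothetical outline node for one record (not a Stroma type)."""
    chunk_id: str
    heading: str
    text: str
    parent_id: Optional[str] = None


def select_sections(outline: list[Section], query: str, limit: int = 2) -> list[Section]:
    """Deterministic selection: rank sections by query-term overlap with the heading."""
    terms = set(query.lower().split())
    scored = []
    for position, section in enumerate(outline):
        overlap = len(terms & set(section.heading.lower().split()))
        if overlap > 0:
            scored.append((-overlap, position, section))
    # Highest overlap first; outline order breaks ties so results are stable.
    scored.sort(key=lambda item: (item[0], item[1]))
    return [section for _, _, section in scored[:limit]]


def expand(section: Section, outline: list[Section], include_parent: bool = True) -> dict:
    """Stand-in for ExpandContext: attach the parent section when one exists."""
    by_id = {s.chunk_id: s for s in outline}
    parent = by_id.get(section.parent_id) if include_parent and section.parent_id else None
    return {
        "chunk_id": section.chunk_id,
        "heading": section.heading,
        "text": section.text,
        "parent_heading": parent.heading if parent else None,
        "parent_text": parent.text if parent else None,
    }


# Illustrative outline; in practice this would come from the candidate record.
outline = [
    Section("c1", "Retention Policy", "Data is retained for 90 days."),
    Section("c2", "Retention Exceptions", "Legal holds override retention.", parent_id="c1"),
    Section("c3", "Glossary", "Definitions of terms."),
]

selected = select_sections(outline, "retention exceptions")
contexts = [expand(section, outline) for section in selected]
```

Because selection here depends only on the outline and the query, the same input always yields the same contexts, which is what lets the path run without a configured analysis runtime.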

Complexity Reduction Constraint

This issue should introduce a small shared retrieval/context layer rather than wiring outline selection separately into each analysis command.

The implementation should make the PageIndex-inspired workflow explicit as a reusable phase boundary:

  • candidate record/chunk search;
  • outline/tree projection;
  • deterministic or bounded model-assisted section selection;
  • context expansion;
  • provenance-rich context payload assembly.

check-doc-drift, check-compliance, and review-spec should consume that layer through a narrow API. Avoid expanding the existing analysis monoliths with more embedded search/outline/expansion branching. If the first implementation touches only one command, the new code should still be shaped so the next command can reuse it without copying retrieval logic.
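One way to picture the narrow API is a single pipeline object that owns the phases, with each command calling one method on it. The sketch below is an assumption about shape only: `ContextPipeline`, `build_contexts`, and the command functions are hypothetical names, and the lambdas stand in for the real Stroma-backed phases.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical payload types; plain dicts keep the sketch short.
Candidate = dict
Context = dict


@dataclass(frozen=True)
class ContextPipeline:
    """Shared retrieval/context layer composed of swappable phases."""
    search: Callable[[str], list]
    project_outline: Callable[[list], list]
    select: Callable[[list, str], list]
    expand: Callable[[dict], Context]

    def build_contexts(self, query: str) -> list:
        # The phase order is fixed here so commands cannot reorder or skip phases.
        candidates = self.search(query)
        outline = self.project_outline(candidates)
        selected = self.select(outline, query)
        return [self.expand(section) for section in selected]


def check_doc_drift(pipeline: ContextPipeline, query: str) -> dict:
    """Command-level code only consumes contexts; it holds no retrieval logic."""
    return {"command": "check-doc-drift", "contexts": pipeline.build_contexts(query)}


def review_spec(pipeline: ContextPipeline, query: str) -> dict:
    return {"command": "review-spec", "contexts": pipeline.build_contexts(query)}


# Fake phases for illustration; real ones would wrap the existing retrieval path.
pipeline = ContextPipeline(
    search=lambda query: [
        {"record_ref": "spec/auth", "chunk_id": "c7", "heading": "Token Expiry"}
    ],
    project_outline=lambda candidates: [dict(c) for c in candidates],
    select=lambda outline, query: [
        s for s in outline if query.lower() in s["heading"].lower()
    ],
    expand=lambda section: {**section, "expanded": True},
)

drift = check_doc_drift(pipeline, "token")
review = review_spec(pipeline, "token")
```

With this shape, tests can target `build_contexts` directly with fake phases, below the command and transport layers.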

Non-Goals

Acceptance Criteria

  • At least one governance command can use the outline-guided context path behind a flag or config option.
  • Results include enough provenance to debug selection: record ref, chunk id, heading, snapshot fingerprint, and selected/expanded context boundaries.
  • The path works without an LLM when deterministic section selection is used.
  • Optional LLM section selection is bounded and falls back cleanly.
  • The implementation can be benchmarked with the Arm B retrieval benchmark (#361: LLM-graded RAG with ExpandContext(IncludeParent) on/off).
  • A reusable retrieval/context abstraction exists below command and MCP transport layers, with tests at that boundary.
  • The PR compares complexity against the 2026-05-07 baseline and explicitly states whether PageIndex work reduced, preserved, or increased complexity in the touched hotspots.
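The provenance and fallback criteria above might look something like the sketch below. The `Provenance` fields follow the list in the criteria, but the type itself, `select_chunk_ids`, the `max_sections` cap, and all sample values are hypothetical and only illustrate one way to bound an optional model selector.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Provenance:
    """Fields named in the acceptance criteria; the shape is an assumption."""
    record_ref: str
    chunk_id: str
    heading: str
    snapshot_fingerprint: str
    selected_span: tuple   # boundaries of the selected section
    expanded_span: tuple   # boundaries after context expansion
    selector: str          # "deterministic" or "model"


def select_chunk_ids(
    candidate_ids: list,
    deterministic: Callable[[list], list],
    model: Optional[Callable[[list], list]] = None,
    max_sections: int = 3,
) -> tuple:
    """Try the optional model selector, then fall back to the deterministic one."""
    if model is not None:
        try:
            picked = model(candidate_ids)
            known = set(candidate_ids)
            # Bound the output: known ids only, de-duplicated in order, capped.
            valid = list(dict.fromkeys(c for c in picked if c in known))[:max_sections]
            if valid:
                return valid, "model"
        except Exception:
            pass  # any model failure falls through to the deterministic path
    return deterministic(candidate_ids)[:max_sections], "deterministic"


def first_two(ids):
    return ids[:2]


def failing_model(ids):
    raise TimeoutError("model unavailable")


def noisy_model(ids):
    # Returns an unknown id and a duplicate to exercise the bounding step.
    return ["c9", "c2", "c2"]


ids = ["c1", "c2", "c3"]
no_model = select_chunk_ids(ids, first_two)
failed = select_chunk_ids(ids, first_two, failing_model)
bounded = select_chunk_ids(ids, first_two, noisy_model)

# Placeholder values; real fingerprints and spans would come from the snapshot.
provenance = Provenance(
    record_ref="spec/auth",
    chunk_id=bounded[0][0],
    heading="Token Expiry",
    snapshot_fingerprint="placeholder-fingerprint",
    selected_span=(120, 180),
    expanded_span=(100, 240),
    selector=bounded[1],
)
```

Recording which selector produced each context makes it possible to tell, when debugging a result, whether the model path or the fallback chose a given section.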

Labels

  • area:core (Core source, config, and model capabilities)
  • area:performance (Performance and scale characteristics)
  • ccd/priority:next (CCD: next up)
  • ccd:queue (Active CCD work queue item)
  • need:breakdown (Call for participation: issue too big to solve as is)
  • type:feature (implementing a new feature)