Context
Episode Memories (Issue #190) introduces write-time narrative synthesis — detecting episodes across sessions and generating progressive summaries that surface via standard vector search.
Task
As part of Phase 4 (MCP Tools & Polish), create a custom evaluation suite for episode-level retrieval:
Scenarios to test:
- House-hunting scenario (from design doc) — 5 sessions, cross-session activity, aggregation query ("How many properties before Brookside?")
- Job search scenario — applications, interviews, offers across sessions
- Debugging episode — production incident spanning multiple sessions with different services
- Trip planning — flights, hotels, activities discussed across sessions
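One way to keep the four scenarios uniform is to describe each as a small fixture object. The sketch below is only an illustration: `Session`, `EpisodeScenario`, and all field names are hypothetical and not part of the existing codebase, and the session messages are left empty because the real content would come from the design doc.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """One conversation session within a scenario (hypothetical shape)."""
    session_id: str
    messages: list[str] = field(default_factory=list)


@dataclass
class EpisodeScenario:
    """A multi-session scenario plus the aggregation query it should answer."""
    name: str
    sessions: list[Session]
    aggregation_query: str
    # IDs of memories expected to belong to the detected episode (assumed field).
    expected_memory_ids: set[str] = field(default_factory=set)


# House-hunting scenario: 5 sessions, messages to be filled in from the design doc.
house_hunting = EpisodeScenario(
    name="house-hunting",
    sessions=[Session(session_id=f"s{i}") for i in range(1, 6)],
    aggregation_query="How many properties before Brookside?",
)
```

The job search, debugging, and trip planning scenarios could follow the same shape, which lets a single test runner iterate over all of them.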
Metrics:
- Episode summary surfaces in top-3 recall results for aggregation queries
- Constituent memories are reachable via graph expansion from episode summary
- Queries that don't embed near individual memories DO embed near episode summaries
- Progressive summary accuracy vs full regeneration (drift detection)
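The first two metrics can be expressed as small pure functions over recall output. This is a minimal sketch under assumptions: recall results are taken to be a ranked list of memory IDs, and the memory graph is taken to be a plain adjacency dict. The function names and the `max_hops` parameter are hypothetical, not the project's API.

```python
from collections import deque


def surfaces_in_top_k(ranked_ids, episode_summary_id, k=3):
    """True if the episode summary is among the first k recall results."""
    return episode_summary_id in ranked_ids[:k]


def reachable_from(start_id, edges, max_hops=2):
    """Breadth-first expansion over an adjacency dict, limited to max_hops."""
    seen = {start_id}
    frontier = deque([(start_id, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in edges.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen - {start_id}


def constituents_reachable(summary_id, constituent_ids, edges, max_hops=2):
    """True if every constituent memory is reachable from the summary."""
    return set(constituent_ids) <= reachable_from(summary_id, edges, max_hops)
```

Keeping these checks independent of the storage layer means they can be unit-tested with hand-built rankings and graphs before being wired to real recall output.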
Success criteria:
- All scenarios pass with episode summaries surfacing correctly
- No regression on existing LoCoMo/LongMemEval benchmarks
- Latency: remember() adds <10ms synchronous overhead (async detection is unbounded)
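The latency criterion could be checked by timing the write path with and without episode detection and comparing medians. The sketch below assumes both variants are exposed as callables that take a payload; `measure_overhead_ms` and the two stand-in functions are hypothetical, and the stand-ins do no real work.

```python
import statistics
import time


def measure_overhead_ms(baseline, candidate, payload, runs=200):
    """Median per-call difference in milliseconds between two callables."""
    def median_ms(fn):
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            fn(payload)
            samples.append((time.perf_counter() - start) * 1000.0)
        return statistics.median(samples)

    return median_ms(candidate) - median_ms(baseline)


# Stand-ins for remember() without and with episode detection enabled
# (hypothetical; replace with the real calls in the evaluation suite).
def remember_baseline(payload):
    return None


def remember_with_episodes(payload):
    return None


overhead_ms = measure_overhead_ms(remember_baseline, remember_with_episodes, "note")
```

Using the median rather than the mean keeps a single slow run from dominating the result. Only the synchronous portion is measured here, since async detection is explicitly unbounded.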
Part of
Episode Memories epic (#190), Phase 4