Problem
Multi-session, multi-hop questions like "How many properties did I view before making an offer on the Brookside townhouse?" fail with current retrieval. The answer spans 4+ separate sessions with no semantic overlap to the query. Vector search returns results about Brookside itself, missing the earlier property viewings entirely.
Current architecture is read-optimized: store raw memories, try to be clever at query time with graph expansion and temporal filtering. But multi-hop temporal reasoning requires a query planner / agentic RAG layer on top — which defeats the purpose of a memory system.
Proposed Solution: Episode Memories
Flip the paradigm: write-time reasoning instead of read-time reasoning.
At ingest time, detect when memories belong to an ongoing episode (an activity/goal spanning multiple sessions) and auto-generate progressive summary memories that capture the narrative state.
How it works
- EpisodeDetector — At ingest, classify whether a new memory belongs to an existing episode
  - Embedding similarity to existing episode summaries
  - Or LLM-based classification (more accurate, higher cost)
  - Episodes represent activities/goals: "house hunting", "planning a trip", "job search"
- EpisodeSummarizer — When a memory joins an episode, regenerate the summary
  - Progressive: each new memory triggers an update, not a full recompute
  - Example output: "User is house hunting. Properties viewed: bungalow (rejected — kitchen renovation needed), Cedar Creek (rejected — over budget), 1-bed condo (rejected — highway noise), 2-bed condo (rejected — outbid). Offer made on Brookside townhouse $340k, accepted."
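A progressive update can be framed as a single LLM call that folds one new memory into the existing summary, instead of re-summarizing every constituent memory. A rough sketch, where `call_llm` is a placeholder for whatever completion client is used:

```python
# Hypothetical sketch of a progressive summary update. The prompt wording
# and the call_llm callable are illustrative assumptions, not a real API.

def build_update_prompt(current_summary: str, new_memory: str) -> str:
    """Fold one new memory into the running episode summary."""
    return (
        "You maintain a running summary of an ongoing user activity.\n"
        f"Current summary:\n{current_summary}\n\n"
        f"New memory:\n{new_memory}\n\n"
        "Rewrite the summary to incorporate the new memory. "
        "Preserve the narrative state (what was tried, rejected, decided) "
        "and keep it under the configured length limit."
    )

def update_episode_summary(episode: dict, new_memory: str, call_llm) -> str:
    """Replace the episode's summary with the LLM-updated version."""
    episode["summary"] = call_llm(build_update_prompt(episode["summary"], new_memory))
    return episode["summary"]
```

Because the input is one summary plus one memory, the update cost stays roughly constant as the episode grows.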
- Episode summaries as first-class memories
  - Stored with source_type: "episode_summary"
  - Directly searchable via vector search — no multi-hop needed
  - Graph links to constituent memories for drill-down
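Storing the summary plus its drill-down edges could look something like this. The `store` dict is an in-memory stand-in for the real memory and graph stores; all names are illustrative:

```python
# Hypothetical sketch: persist an episode summary as a regular memory,
# tagged with source_type "episode_summary", plus one graph edge per
# constituent memory so recall can drill down to the raw evidence.

def store_episode_summary(store: dict, episode_id: str,
                          summary: str, member_ids: list[str]) -> None:
    store["memories"][episode_id] = {
        "text": summary,
        "source_type": "episode_summary",  # marks it as derived, not raw
    }
    # drill-down edges: episode_summary -> constituent memory
    store["edges"].extend((episode_id, mid) for mid in member_ids)
```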
Why this is novel
- Most memory systems optimize the READ path (embeddings, reranking, graph traversal)
- This optimizes the WRITE path — invest compute at ingest to make retrieval trivially easy
- Mirrors how human memory works (schemas/narratives, not isolated facts)
- More token-efficient at query time (one summary vs. 10 raw messages)
- Solves multi-hop without needing agentic retrieval
Implementation Sketch
New components
- EpisodeDetector service — episode matching/creation at ingest time
- EpisodeSummarizer service — progressive summary generation
- EpisodeStore — persistence layer (could extend GraphStore or be standalone)
Integration points
- Hook into remember() flow after memory storage
- Episode summaries written back via remember() with special source_type
- Graph edges: episode_summary → constituent memories
- Recall: episode summaries naturally surface via existing vector search
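The integration points above can be shown end to end with stubbed detection (an explicit topic key) and stubbed summarization (concatenation). Every name here is an illustrative assumption about the surrounding system, not the project's actual API:

```python
# Minimal end-to-end sketch of the remember() hook. Detection and
# summarization are stubbed; the point is where episode handling sits
# relative to raw memory storage and the write-back of the summary.
import itertools

_ids = itertools.count()

def remember(state: dict, text: str, topic: str) -> str:
    """Store a raw memory, then fold it into its episode."""
    mid = f"mem-{next(_ids)}"
    state["memories"][mid] = {"text": text, "source_type": "raw"}  # existing path
    # stub detector: episode keyed by topic instead of embedding/LLM matching
    ep = state["episodes"].setdefault(topic, {"members": [], "summary": ""})
    ep["members"].append(mid)
    ep["summary"] = (ep["summary"] + " " + text).strip()  # stub summarizer
    # summary written back as a first-class, searchable memory
    state["memories"][f"ep-{topic}"] = {
        "text": ep["summary"],
        "source_type": "episode_summary",
    }
    return mid
```

Because the summary re-enters the store through the normal write path, recall needs no changes: existing vector search surfaces it like any other memory.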
Configuration
episodes.enabled: true/false
episodes.detector: "embedding" | "llm"
episodes.summarizer_model: "gpt-4o-mini" (or local)
episodes.similarity_threshold: 0.7
episodes.max_summary_length: 500
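If the project uses a YAML config file, the settings above might be laid out like this (a hypothetical layout, shown only to make the key grouping concrete):

```yaml
episodes:
  enabled: true
  detector: embedding        # or "llm" for higher accuracy at higher cost
  summarizer_model: gpt-4o-mini   # or a local model
  similarity_threshold: 0.7
  max_summary_length: 500
```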
Success Criteria
- LongMemEval multi-session questions that currently score 0% should achieve >60% accuracy
- Episode detection precision >80% (memories correctly assigned to episodes)
- Write latency increase <2x (episode detection + summary update)
- No degradation on single-hop queries (existing LoCoMo benchmark stays at 100%)
Discovery: LongMemEval Benchmark (2026-02-10)
Found during v0.7.3 LongMemEval benchmark run. Q1 ("How many properties before Brookside?") returned 10 results all about Brookside itself — 0% retrieval quality on multi-hop questions. Temporal reasoning and graph expansion exist but have no query planner to decompose multi-step questions. Episode memories solve this at the source.