Episode Memories: Write-time narrative synthesis for multi-hop retrieval #190

@abbudjoe

Problem

Multi-session, multi-hop questions like "How many properties did I view before making an offer on the Brookside townhouse?" fail with current retrieval. The answer is spread across 4+ separate sessions that have no semantic overlap with the query. Vector search returns results about Brookside itself and misses the earlier property viewings entirely.

The current architecture is read-optimized: it stores raw memories and tries to be clever at query time with graph expansion and temporal filtering. But multi-hop temporal reasoning on that design requires a query planner or agentic RAG layer on top, which defeats the purpose of a memory system.

Proposed Solution: Episode Memories

Flip the paradigm: write-time reasoning instead of read-time reasoning.

At ingest time, detect when memories belong to an ongoing episode (an activity/goal spanning multiple sessions) and auto-generate progressive summary memories that capture the narrative state.

How it works

  1. EpisodeDetector — At ingest, classify whether a new memory belongs to an existing episode

    • Embedding similarity to existing episode summaries
    • Or LLM-based classification (more accurate, higher cost)
    • Episodes represent activities/goals: "house hunting", "planning a trip", "job search"
  2. EpisodeSummarizer — When a memory joins an episode, regenerate the summary

    • Progressive: each new memory triggers an update, not a full recompute
    • Example output: "User is house hunting. Properties viewed: bungalow (rejected — kitchen renovation needed), Cedar Creek (rejected — over budget), 1-bed condo (rejected — highway noise), 2-bed condo (rejected — outbid). Offer made on Brookside townhouse $340k, accepted."
  3. Episode summaries as first-class memories

    • Stored with source_type: "episode_summary"
    • Directly searchable via vector search — no multi-hop needed
    • Graph links to constituent memories for drill-down
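
A minimal sketch of the embedding-based detection step, to make the matching rule concrete. Everything here is an assumption for illustration: the `Episode` and `EpisodeDetector` shapes, the `assign` method, and the use of plain Python lists as embeddings. It also keeps the episode embedding fixed at the first memory's embedding, whereas the proposal compares against the embedding of the regenerated summary.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Episode:
    episode_id: int
    summary_embedding: list  # hypothetical: embedding of the episode summary
    memory_ids: list = field(default_factory=list)


class EpisodeDetector:
    """Hypothetical detector: join the closest episode or open a new one."""

    def __init__(self, similarity_threshold=0.7):
        self.threshold = similarity_threshold
        self.episodes = []

    def assign(self, memory_id, embedding):
        # Pick the most similar existing episode, if there is one.
        best = max(
            self.episodes,
            key=lambda e: cosine(embedding, e.summary_embedding),
            default=None,
        )
        # Below the threshold (or no episodes yet): start a new episode.
        if best is None or cosine(embedding, best.summary_embedding) < self.threshold:
            best = Episode(len(self.episodes), list(embedding))
            self.episodes.append(best)
        best.memory_ids.append(memory_id)
        return best
```

With a threshold of 0.7, two memories whose embeddings point in nearly the same direction land in one episode, and an orthogonal one opens a second episode. The LLM-based variant would replace the `cosine` comparison with a classification call.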

Why this is novel

  • Most memory systems optimize the READ path (embeddings, reranking, graph traversal)
  • This optimizes the WRITE path — invest compute at ingest to make retrieval trivially easy
  • Mirrors how human memory works (schemas/narratives, not isolated facts)
  • More token-efficient at query time (one summary vs. 10 raw messages)
  • Solves multi-hop without needing agentic retrieval

Implementation Sketch

New components

  • EpisodeDetector service — episode matching/creation at ingest time
  • EpisodeSummarizer service — progressive summary generation
  • EpisodeStore — persistence layer (could extend GraphStore or be standalone)

Integration points

  • Hook into remember() flow after memory storage
  • Episode summaries written back via remember() with special source_type
  • Graph edges: episode_summary → constituent memories
  • Recall: episode summaries naturally surface via existing vector search
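
The integration points above could be wired roughly as follows. This is a sketch under stated assumptions, not the project's actual `remember()` signature: `MemoryStore`, `Memory`, the `links` parameter, and the injected `episode_for` and `summarize` callables are all hypothetical stand-ins for the detector, the summarizer, and the graph edges.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Memory:
    memory_id: int
    text: str
    source_type: str
    links: list = field(default_factory=list)  # stand-in for graph edges


class MemoryStore:
    """Hypothetical in-memory store with an episode hook in remember()."""

    def __init__(
        self,
        episode_for: Callable[[str], Optional[str]],  # detector stand-in
        summarize: Callable[[str, str], str],  # (previous summary, new text)
    ):
        self.memories = []
        self.episode_for = episode_for
        self.summarize = summarize
        self.summaries = {}  # episode key -> latest summary Memory

    def remember(self, text, source_type="conversation", links=()):
        memory = Memory(len(self.memories), text, source_type, list(links))
        self.memories.append(memory)
        # Summaries re-enter remember(); skip the hook to avoid recursion.
        if source_type == "episode_summary":
            return memory
        key = self.episode_for(text)
        if key is None:
            return memory
        previous = self.summaries.get(key)
        # Progressive update: only the old summary and the new memory are read.
        new_summary = self.summarize(previous.text if previous else "", text)
        new_links = (previous.links if previous else []) + [memory.memory_id]
        self.summaries[key] = self.remember(new_summary, "episode_summary", new_links)
        return memory
```

In this sketch superseded summaries stay in the store. A real implementation would probably replace or retire the previous summary so that recall does not surface stale narrative state.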

Configuration

  • episodes.enabled: true/false
  • episodes.detector: "embedding" | "llm"
  • episodes.summarizer_model: "gpt-4o-mini" (or local)
  • episodes.similarity_threshold: 0.7
  • episodes.max_summary_length: 500
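
One way these keys might be loaded and validated, as a sketch only. The `EpisodeConfig` class and `load_episode_config` helper are hypothetical names, the default of `enabled=False` is an assumption (the proposal does not state a default), and the unit of `max_summary_length` (characters or tokens) is left open because the proposal does not specify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeConfig:
    enabled: bool = False  # assumed default; not stated in the proposal
    detector: str = "embedding"  # "embedding" | "llm"
    summarizer_model: str = "gpt-4o-mini"
    similarity_threshold: float = 0.7
    max_summary_length: int = 500  # unit unspecified in the proposal

    def __post_init__(self):
        if self.detector not in ("embedding", "llm"):
            raise ValueError(f"unknown episodes.detector: {self.detector!r}")
        if not 0.0 <= self.similarity_threshold <= 1.0:
            raise ValueError("episodes.similarity_threshold must be in [0, 1]")
        if self.max_summary_length <= 0:
            raise ValueError("episodes.max_summary_length must be positive")


def load_episode_config(raw: dict) -> EpisodeConfig:
    """Build the config from a parsed settings dict with an 'episodes' section."""
    return EpisodeConfig(**raw.get("episodes", {}))
```

For example, `load_episode_config({"episodes": {"enabled": True, "detector": "llm"}})` keeps the remaining fields at the values listed above.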

Success Criteria

  • LongMemEval multi-session questions that currently score 0% should achieve >60% accuracy
  • Episode detection precision >80% (memories correctly assigned to episodes)
  • Write latency increase <2x (episode detection + summary update)
  • No degradation on single-hop queries (existing LoCoMo benchmark stays at 100%)
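
The criteria above can be turned into a simple pass/fail check. The sketch below rests on two assumptions: predicted and gold episode labels share one label space, and "write latency increase <2x" is read as total write latency staying under twice the baseline. The function names are hypothetical.

```python
def detection_precision(predicted: dict, gold: dict) -> float:
    """Share of episode assignments that match the gold episode label.

    Memories the detector left unassigned (None) are not counted.
    """
    assigned = [m for m, episode in predicted.items() if episode is not None]
    if not assigned:
        return 0.0
    correct = sum(1 for m in assigned if gold.get(m) == predicted[m])
    return correct / len(assigned)


def meets_criteria(
    multi_session_accuracy: float,
    precision: float,
    baseline_write_ms: float,
    episode_write_ms: float,
    single_hop_accuracy: float,
) -> dict:
    """Evaluate each success criterion; accuracies are fractions in [0, 1]."""
    return {
        "multi_session_accuracy": multi_session_accuracy > 0.60,
        "detection_precision": precision > 0.80,
        # Assumed reading: total latency under 2x the baseline.
        "write_latency": episode_write_ms < 2 * baseline_write_ms,
        "single_hop_no_regression": single_hop_accuracy >= 1.0,
    }
```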

Discovery: LongMemEval Benchmark (2026-02-10)

Found during the v0.7.3 LongMemEval benchmark run. Q1 ("How many properties before Brookside?") returned 10 results, all about Brookside itself, which amounts to 0% retrieval quality on multi-hop questions. Temporal reasoning and graph expansion exist, but there is no query planner to decompose multi-step questions. Episode memories solve this at the source.

Metadata

Labels: enhancement
