A technical comparison for teams building AI support agents that need durable, structured memory.
AI support agents forget. Every session starts from zero. Returning customers re-explain who they are, what plan they're on, what they asked last time. Agents make the same mistakes they made before. Context degrades as conversation history grows.
This isn't a capability gap in the LLM — it's an infrastructure gap. Most AI applications have no memory layer.
The agent receives only the current message. No history, no identity, no memory.
- ✗ Cannot recognise a returning customer
- ✗ Cannot recall prior issues or preferences
- ✗ Every session starts cold
- ✗ Fails completely for multi-session workflows
Concatenate the entire conversation history into the prompt.
- ✗ Blows token budgets on long-running subjects (support customers often have 10–50+ sessions)
- ✗ No ranking — irrelevant history competes with critical facts
- ✗ No provenance — you can't trace why the agent said something
- ✗ Cost scales linearly with customer lifetime
- ✗ Context window limits force arbitrary truncation
Embed all messages, retrieve top-k by similarity to the current query.
- ✗ Non-deterministic — same query can return different results depending on index state
- ✗ No structured extraction — "Alice is on the Enterprise plan" is just a substring in a chunk
- ✗ No confidence scoring — all retrieved chunks are treated equally
- ✗ No temporal reasoning — superseded information (old email, resolved issue) has equal weight to current facts
- ✗ No provenance — retrieved chunks aren't traced back to specific interactions
- ✗ Token budget is managed by truncation, not by ranked priority
Statewave is a structured memory runtime. It doesn't store raw text and hope retrieval works — it compiles durable memories from raw events, scores them, and assembles ranked, token-bounded context bundles with full provenance.
Episodes (raw events) → Compilation → Memories (typed, scored) → Context assembly → Bundle (prompt-ready)
- Episodes — immutable, append-only records of interactions. The raw truth.
- Memories — compiled, typed facts with confidence scores, validity windows, and provenance back to source episodes.
- Context bundles — ranked, token-bounded output with sections (identity facts, procedures, history, recent interactions) ready to inject into any prompt.
| Property | What it means |
|---|---|
| Deterministic | Same subject + task + budget → same context bundle. No non-determinism from vector-only retrieval. |
| Token-bounded | Context assembly respects a configurable token budget. Items are packed by ranked score, not truncated arbitrarily. |
| Ranked | Scoring formula: kind priority × recency × task relevance × temporal validity × semantic similarity (when available). |
| Provenance-traced | Every memory traces to its source episode IDs. Every context bundle reports which facts, summaries, and episodes were included. |
| Auditable | Every assembly call can emit an immutable state-assembly receipt — a ULID-addressable record of which memories and episodes influenced the bundle, with a SHA-256 hash of the bytes delivered to the agent. Compliance reviewers can answer "what state did the agent actually see at decision time?" without trusting application logs. |
| Governable | Per-memory sensitivity labels feed a declarative YAML policy engine. Operators express rules like "PII memories cannot be read by marketing tools" once, and /v1/context enforces them per call — with the decision recorded into the receipt either way (log_only for audit-without-filtering, enforce for hard blocking). |
| Idempotent | Recompiling the same subject produces no duplicate memories. |
| Subject-centric | Everything is organised around subjects (customer, account, workspace). Full lifecycle: ingest → compile → retrieve → inspect → delete. |
These claims are backed by the support-agent context quality eval, which runs 7 tests with 15 binary assertions against a live Statewave instance, the handoff eval (6 tests, 15 assertions), the advanced eval (10 tests, 26 assertions), and the support-agent benchmark:
| Claim | Evidence |
|---|---|
| Identity facts persist across sessions | Eval test 1: name, company, plan recalled correctly for a billing task |
| Relevant preferences surface for matching tasks | Eval test 2: integration preferences (Python SDK, webhooks) appear for integration task |
| Issue history surfaces for follow-up tasks | Eval test 3: SSO issue + ticket number appear for follow-up task |
| Token budget is respected | Eval test 4: token estimate ≤ configured budget |
| Identity persists even for unrelated tasks | Eval test 4: identity facts included regardless of task topic |
| Provenance traces facts to source episodes | Eval test 5: bundle contains fact_ids, each fact has source_episode_ids |
| Compilation is idempotent | Eval test 6: recompile produces 0 new memories |
| Memory extraction is reasonable | Eval test 7: 8–30 memories from 8 episodes, ≥3 profile facts |
| Session-aware ranking surfaces active-session content | Advanced eval: the active month-end-update thread appears in context |
| Open/escalated issues surface in context | Advanced eval: escalation + connection-pool details appear for the open issue |
| Task-relevant facts outrank off-topic ones | Advanced eval: billing/gateway facts rank above an unrelated password reset |
| Repeat-issue signal surfaces the prior fix | Advanced eval: a recurring timeout brings back the earlier "restart" resolution |
| Customer health scoring is explainable | Advanced eval: at_risk/watch state, score < 70, named factors (unresolved, repeated, escalations) |
| Health-aware handoff carries risk level | Advanced eval: handoff health_state/score match the health endpoint, with icon + label in notes |
| Handoff health factors stay compact | Advanced eval: at most 3 health factors in the handoff pack |
| Resolution-aware ranking works | Advanced eval: open billing issue is the active issue; resolution history (≥2) present |
| Handoff is compact and deterministic | Advanced eval: ≤4000-token pack; identical requests produce identical notes + score |
| Handoff carries provenance | Advanced eval: handoff provenance includes episode_ids and resolution_ids |
| Proactive health alerts on degradation | Unit tests: webhook fired on healthy→watch, watch→at_risk, healthy→at_risk; no spam on unchanged |
| Health recovery confirmation | Unit tests: subject.health_improved fired on at_risk→watch, watch→healthy, at_risk→healthy |
| Support workflow superiority vs naive | Workflow benchmark: Statewave 8/8 vs Naive 2/8 on active-issue, repeat-detection, health, provenance, resolution-ranking |
| SLA tracking with breach detection | Unit tests: first-response time, resolution time, breach flags, custom thresholds; integrated into health scoring and handoff |
| SLA breaches degrade health score | Unit tests: sla_resolution_breaches and slow_first_response signals penalize health deterministically |
| SLA context in handoff packs | Unit tests: breach flags and open-issue age appear in handoff when relevant, absent when clean |
The eval exits non-zero on failure and is CI-friendly.
Getting context for a support agent prompt is one SDK call:
from statewave import StatewaveClient
client = StatewaveClient()
context = client.get_context_string("customer-123", "Help with billing question")
# → structured text ready to inject into your system promptOr the full bundle with provenance:
bundle = client.get_context("customer-123", "Help with billing question")
# bundle.assembled_context → prompt text
# bundle.facts → list of memory objects with source_episode_ids
# bundle.provenance → {"fact_ids": [...], "summary_ids": [...], "episode_ids": [...]}
# bundle.token_estimate → integer- Self-hosted storage — Postgres-only, no external services required. Episodes and compiled memories stay in your infrastructure. Whether content leaves the network during compilation or embedding depends on the provider you configure — see Privacy & Data Flow.
- No vendor lock-in — heuristic compiler works without any LLM API key (and is the default). Embeddings and LLM compilation are optional enhancements; the heuristic path runs fully local with zero data egress.
- Operator-friendly — Docker Compose, health endpoints, structured logging, OpenTelemetry tracing, configurable via environment variables.
- Reliable webhook delivery — persistent queue with retries and dead-letter (v0.5). Proactive health alerts emit
subject.health_degradedon state transitions. - Clean API — versioned REST, OpenAPI docs, structured error responses with request-ID correlation.
- Typed SDKs — Python (sync + async, Pydantic models) and TypeScript (full type definitions), both with proper error handling.
- Transparent scoring — the ranking formula is documented, deterministic, and inspectable. No black-box relevance.
| Area | Status |
|---|---|
| Production scale (>10k subjects, high throughput) | Not load-tested at this scale. Multi-replica API supported and verified since v0.8 (Fly multi-machine + Helm HPA); single-Postgres only, no cross-region clustering. |
| Multi-tenant isolation | App-layer query scoping; no Postgres RLS yet. Not battle-tested at scale. |
| LLM compiler vs heuristic compiler quality | LLM compiler exists but no comparative eval published. |
| 50-session production-scale benchmark | Not yet run. |
We are honest about these gaps. If any of these are blockers for your use case, Statewave may not be ready for you yet.
- Single-Postgres only — multi-replica API deployments are supported and verified since v0.8, but cross-region / multi-Postgres clustering is not supported yet
- No built-in auth provider (validates keys you configure, doesn't issue them)
- No streaming (context returned as complete JSON)
- Operator admin console is early — dashboards plus policy and per-tenant config management; no memory editing or advanced ops yet
- Rate limiting is per-IP (distributed/Postgres-backed, but not per-tenant or per-API-key)
- Teams building AI support agents that interact with returning customers across sessions
- Engineering leads who want structured, measurable context quality instead of "we hope RAG works"
- Teams that need provenance — "why did the agent say X?" must be answerable
- Self-hosted storage requirements — episodes and memories must stay in your infrastructure (heuristic compiler keeps the entire pipeline local; LLM/embedding choices determine whether content leaves)
- Small, capable teams using AI coding tools who want a focused product, not an enterprise platform
- Teams that need a hosted SaaS (Statewave is self-hosted infrastructure)
- Teams that just need a vector database (use pgvector/Pinecone/Weaviate directly)
- Teams building chatbots with no multi-session requirement
- Teams that need cross-region clustering or multi-Postgres scale-out today (multi-replica API deployments are supported; cross-region clustering is not)
- Teams looking for a complete agent framework (Statewave is a memory/context layer, not an orchestrator)
| Resource | Link |
|---|---|
| Getting started | getting-started.md |
| Product overview | product.md |
| API contract | api/v1-contract.md |
| Support agent example | statewave-examples/support-agent-python |
| Context quality eval | statewave-examples/eval-support-agent |
| Live demo | Embedded chat widget on statewave.ai |
| Python SDK | pip install statewave · source |
| TypeScript SDK | npm install @statewavedev/sdk · source |