fix(backend): snapshot compaction for AGUI events to prevent OOM #914
Gkrumbach07 wants to merge 1 commit into main
The backend was OOMKilled (512Mi limit) when replaying large event streams for finished sessions. Multiple concurrent SSE clients each loaded 36K+ events into memory and ran delta compaction, exceeding the memory limit within ~44 seconds.

This implements AG-UI snapshot compaction per the serialization spec: finished sessions are collapsed into MESSAGES_SNAPSHOT events (36K events → ~3 events), cached to disk, and served from cache on subsequent reads.

Changes:
- Add compactToSnapshots() using AG-UI MESSAGES_SNAPSHOT pattern
- Add disk caching (agui-events-compacted.jsonl) with atomic writes
- Invalidate cache on RUN_STARTED and RUN_ERROR events
- Use strings.Builder for O(n) delta concatenation (was O(n²))
- Reuse existing readJSONLFile helper instead of duplicating
- Remove dead compactStreamingEvents (180 lines, no longer called)
- Bump backend memory limit from 512Mi to 768Mi as safety net

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
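The "atomic writes" mentioned for the disk cache are commonly done by writing to a temporary file in the same directory and renaming it over the target, so readers never see a half-written file. A minimal sketch, assuming that pattern; `writeFileAtomic` and `demoRoundTrip` are hypothetical names for illustration and are not the PR's `writeCompactedFile`:

```go
package main

import (
	"os"
	"path/filepath"
)

// writeFileAtomic is a hypothetical helper (not taken from the PR).
// It writes data to a temp file next to path, then renames it into place.
func writeFileAtomic(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	// Removes the temp file if we bail out early; after a successful
	// rename the temp name no longer exists and the error is ignored.
	defer os.Remove(tmpName)

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	// Flush to stable storage before the rename makes the file visible.
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// demoRoundTrip writes a stale file, overwrites it atomically, and returns
// what a reader would then see. Hypothetical usage example.
func demoRoundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "agui-cache-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "agui-events-compacted.jsonl")
	if err := writeFileAtomic(path, []byte("stale\n")); err != nil {
		return "", err
	}
	if err := writeFileAtomic(path, []byte(content)); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}
```

Creating the temp file in the target's directory keeps the rename on one filesystem, which is what lets the swap happen in a single step.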
Walkthrough
The changes refactor event handling in the websocket backend by replacing delta-based compaction with a snapshot-based approach. Finished event streams are now converted to MESSAGES_SNAPSHOT events containing fully assembled messages and tool calls. The proxy handler is simplified to use a new loadEventsForReplay function uniformly. Backend memory resource limits are increased.
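To illustrate the idea of collapsing a delta stream into assembled messages, here is a rough sketch for text messages only. The `Event` and `Message` structs and their field names are simplified assumptions, not the PR's types, and `assembleMessages` is a hypothetical stand-in for `compactToSnapshots()`; tool calls and metadata are left out.

```go
package main

import "strings"

// Event is a simplified, assumed shape for a streamed AG-UI event.
type Event struct {
	Type      string
	MessageID string
	Role      string
	Delta     string
}

// Message is a simplified, assumed shape for one snapshot entry.
type Message struct {
	ID      string
	Role    string
	Content string
}

// assembleMessages folds TEXT_MESSAGE_START/CONTENT events into complete
// messages, preserving the order in which messages first appeared.
// Each message gets its own strings.Builder, so appending deltas is
// linear in the total text length rather than quadratic as with +=.
func assembleMessages(events []Event) []Message {
	builders := map[string]*strings.Builder{}
	roles := map[string]string{}
	var order []string

	for _, ev := range events {
		switch ev.Type {
		case "TEXT_MESSAGE_START":
			if _, seen := builders[ev.MessageID]; !seen {
				builders[ev.MessageID] = &strings.Builder{}
				roles[ev.MessageID] = ev.Role
				order = append(order, ev.MessageID)
			}
		case "TEXT_MESSAGE_CONTENT":
			// Deltas for unknown message IDs are ignored in this sketch.
			if b, ok := builders[ev.MessageID]; ok {
				b.WriteString(ev.Delta)
			}
		}
	}

	msgs := make([]Message, 0, len(order))
	for _, id := range order {
		msgs = append(msgs, Message{ID: id, Role: roles[id], Content: builders[id].String()})
	}
	return msgs
}
```

The resulting slice is what a single MESSAGES_SNAPSHOT event would carry in place of the many per-delta events.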
Estimated code review effort: 3 (Moderate), ~25 minutes
Pre-merge checks: 2 passed, 1 failed (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/backend/websocket/agui_store_test.go`:
- Around line 521-522: The test currently uses a fixed sleep to wait for the
async cache write which is racy; update the test to synchronize
deterministically: either change writeCompactedFile to expose a synchronization
primitive (return a done channel or accept a *sync.WaitGroup) and wait on that
in the test, or replace the time.Sleep in agui_store_test.go with a polling loop
that checks for the file’s existence (os.Stat) with a short interval and overall
timeout (failing the test if timeout elapses). Remove the time.Sleep(100 *
time.Millisecond) and use the chosen synchronization approach around
writeCompactedFile to avoid flakes.
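The polling option described in this comment could look roughly like the sketch below. `waitForFile` and `exampleAsyncWrite` are hypothetical helpers, and the goroutine only simulates an asynchronous cache write; it is not the PR's `writeCompactedFile`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// waitForFile polls with os.Stat until path exists or timeout elapses.
// Hypothetical test helper, not part of the PR.
func waitForFile(path string, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if _, err := os.Stat(path); err == nil {
			return nil
		} else if !os.IsNotExist(err) {
			return err // unexpected error, e.g. permission denied
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out after %s waiting for %s", timeout, path)
		}
		time.Sleep(interval)
	}
}

// exampleAsyncWrite simulates an async cache write and waits for the file
// to appear instead of sleeping for a fixed duration.
func exampleAsyncWrite() error {
	dir, err := os.MkdirTemp("", "agui-wait-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "agui-events-compacted.jsonl")
	go func() {
		time.Sleep(20 * time.Millisecond)
		_ = os.WriteFile(path, []byte("{}\n"), 0o644)
	}()
	return waitForFile(path, 5*time.Millisecond, 2*time.Second)
}
```

In a test, a non-nil error from `waitForFile` would be passed to `t.Fatal`. Polling only bounds the wait; returning a done channel from the writer would remove the timing dependency entirely.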
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 13c147d5-0432-4476-9f09-606d669d7fd3
📒 Files selected for processing (4)
- components/backend/websocket/agui_proxy.go
- components/backend/websocket/agui_store.go
- components/backend/websocket/agui_store_test.go
- components/manifests/base/backend-deployment.yaml
Review Queue Status
Action needed: Fix CI failures
Summary
- Collapse finished sessions into `MESSAGES_SNAPSHOT` events (36K events → ~3 events)
- Cache compacted events to disk (`agui-events-compacted.jsonl`) with atomic writes; subsequent reads serve from cache
- Remove the dead `compactStreamingEvents` delta compaction code (180 lines, replaced by snapshot compaction)

Key changes
- `compactToSnapshots()`: assembles TEXT_MESSAGE and TOOL_CALL sequences into Message objects per AG-UI spec
- `loadEventsForReplay()`: serves cached snapshots for finished sessions, raw events for active runs
- Invalidate the cache on `RUN_STARTED` and `RUN_ERROR`
- Use `strings.Builder` for O(n) delta concatenation (was O(n²) via `+=`)
- Reuse the existing `readJSONLFile` helper instead of duplicating

Test plan
- `TestCompactToSnapshots`: verifies text messages, tool calls, RAW passthrough, metadata preservation
- `TestLoadEventsForReplay`: verifies finished/active session handling, cache write/read, cache invalidation
- `go vet`, `gofmt`, `go build` all clean

🤖 Generated with Claude Code
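The replay and invalidation behaviour listed above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the `session` type is hypothetical, the cache lives in memory rather than on disk, and treating a trailing `RUN_FINISHED` event as the definition of "finished" is an assumption the PR text does not spell out.

```go
package main

// session is a hypothetical in-memory stand-in for one session's stored
// events and its compacted cache.
type session struct {
	raw       []string // event types in arrival order
	snapshots []string // cached compacted events; nil means no valid cache
}

// appendEvent records an event and drops the cache when a run starts or
// errors, mirroring the invalidation rule described in the PR.
func (s *session) appendEvent(eventType string) {
	s.raw = append(s.raw, eventType)
	if eventType == "RUN_STARTED" || eventType == "RUN_ERROR" {
		s.snapshots = nil
	}
}

// finished reports whether the last event ended a run.
// Assumption: RUN_FINISHED as the final event marks a finished session.
func (s *session) finished() bool {
	return len(s.raw) > 0 && s.raw[len(s.raw)-1] == "RUN_FINISHED"
}

// eventsForReplay returns raw events for active runs and cached snapshots
// for finished sessions, compacting at most once per cache lifetime.
func (s *session) eventsForReplay(compact func([]string) []string) []string {
	if !s.finished() {
		return s.raw
	}
	if s.snapshots == nil {
		s.snapshots = compact(s.raw)
	}
	return s.snapshots
}
```

With this shape, concurrent readers of a finished session would share one compacted result instead of each compacting the full stream, which is the memory saving the PR is after.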