fix(backend): stream event logs to prevent OOMKill #890
Changes from 2 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -11,6 +11,7 @@ package websocket | |
|
|
||
| import ( | ||
| "ambient-code-backend/types" | ||
| + "bufio" | ||
| "encoding/json" | ||
|
Comment on lines +14 to 15
Run … The lint job is already red for … As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity." Also applies to: 57-61 |
||
| "fmt" | ||
| "log" | ||
|
|
@@ -53,6 +54,12 @@ func evictStaleWriteMutexes() { | |
| // Set from the STATE_BASE_DIR env var (default "/workspace") at startup. | ||
| var StateBaseDir string | ||
|
|
||
| + const ( | ||
| + // Scanner buffer sizes for reading JSONL files | ||
| + scannerInitialBufferSize = 64 * 1024 // 64KB initial buffer | ||
| + scannerMaxLineSize = 1024 * 1024 // 1MB max line size | ||
| + ) | ||
|
Comment on lines +57 to +61
Analysis chain
Script executed:
#!/bin/bash
rg -n -C2 'scannerMaxLineSize|bufio.NewScanner|scanner.Buffer|scanner.Err|loadEvents\(' \
components/backend/websocket/agui_store.go \
components/backend/websocket/agui_proxy.go
Repository: ambient-code/platform
Length of output: 2700
Script executed:
# Get the complete loadEvents function implementation
sed -n '171,210p' components/backend/websocket/agui_store.go
Repository: ambient-code/platform
Length of output: 1303
Script executed:
# Check the context around the loadEvents call in agui_proxy.go
sed -n '145,170p' components/backend/websocket/agui_proxy.go
Repository: ambient-code/platform
Length of output: 1000
Script executed:
# Look for any downstream validation or error checking on the returned events
rg -n -A10 'events := loadEvents' components/backend/websocket/agui_proxy.go
Repository: ambient-code/platform
Length of output: 383
The scanner's 1MB line limit can silently truncate event replay without signaling failure. When any JSONL line exceeds 1MB, `scanner.Scan()` returns false with `bufio.ErrTooLong`; `loadEvents` only logs that error and returns the events read so far, so the caller cannot tell a partial replay from a complete one. |
||
|
|
||
| // ─── Live event pipe (multi-client broadcast) ─────────────────────── | ||
| // The run handler pipes raw SSE lines to ALL connect handlers tailing | ||
| // the same session. Zero latency — same as the direct run() path. | ||
|
|
@@ -157,21 +164,22 @@ func persistEvent(sessionID string, event map[string]interface{}) { | |
|
|
||
| // ─── Read path ─────────────────────────────────────────────────────── | ||
|
|
||
| - // loadEvents reads all AG-UI events for a session from the JSONL log. | ||
| + // loadEvents reads all AG-UI events for a session from the JSONL log | ||
| + // using a streaming scanner to avoid loading the entire file into memory. | ||
| // Automatically triggers legacy migration if the log doesn't exist but | ||
| // a pre-AG-UI messages.jsonl file does. | ||
| func loadEvents(sessionID string) []map[string]interface{} { | ||
| path := fmt.Sprintf("%s/sessions/%s/agui-events.jsonl", StateBaseDir, sessionID) | ||
|
|
||
| - data, err := os.ReadFile(path) | ||
| + f, err := os.Open(path) | ||
| if err != nil { | ||
| if os.IsNotExist(err) { | ||
| // Attempt legacy migration (messages.jsonl → agui-events.jsonl) | ||
| if mErr := MigrateLegacySessionToAGUI(sessionID); mErr != nil { | ||
| log.Printf("AGUI Store: legacy migration failed for %s: %v", sessionID, mErr) | ||
| } | ||
| // Retry after migration | ||
| - data, err = os.ReadFile(path) | ||
| + f, err = os.Open(path) | ||
| if err != nil { | ||
| return nil | ||
| } | ||
|
|
@@ -180,9 +188,14 @@ func loadEvents(sessionID string) []map[string]interface{} { | |
| return nil | ||
| } | ||
| } | ||
| + defer f.Close() | ||
|
|
||
| events := make([]map[string]interface{}, 0, 64) | ||
| - for _, line := range splitLines(data) { | ||
| + scanner := bufio.NewScanner(f) | ||
| + // Allow lines up to 1MB (default 64KB may truncate large tool outputs) | ||
| + scanner.Buffer(make([]byte, 0, scannerInitialBufferSize), scannerMaxLineSize) | ||
| + for scanner.Scan() { | ||
| + line := scanner.Bytes() | ||
| if len(line) == 0 { | ||
| continue | ||
| } | ||
|
|
@@ -191,6 +204,9 @@ func loadEvents(sessionID string) []map[string]interface{} { | |
| events = append(events, evt) | ||
| } | ||
| } | ||
| + if err := scanner.Err(); err != nil { | ||
| + log.Printf("AGUI Store: error scanning event log for %s: %v", sessionID, err) | ||
| + } | ||
| return events | ||
| } | ||
|
|
||
|
|
||
This still materializes the full session set before paginating.
`allItems` accumulates every page and the next loop copies them again into `sessions`, so peak memory is still O(total sessions). In a large namespace this endpoint can still hit the same memory ceiling despite the K8s `Limit`. At minimum, build `sessions` page-by-page instead of buffering `allItems`; longer term, this API needs server-side or cursor-based pagination if the request is supposed to stay bounded.
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
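One way to read the page-by-page suggestion is sketched below. This is an illustration under stated assumptions, not the project's handler: `item`, `summary`, `pageFunc`, `collectPage`, and `fakeLister` are all hypothetical names, and the lister only imitates a limit-plus-continue-token API in memory. It also assumes the listing order is acceptable as returned; if the endpoint must sort or filter across all sessions first, this shape does not apply directly.

```go
package main

import (
	"fmt"
	"strconv"
)

// item and summary are hypothetical stand-ins for the raw listed objects and
// the API's session type.
type item struct{ Name string }
type summary struct{ Name string }

// pageFunc is a hypothetical lister: it returns up to limit items starting at
// the continue token, plus the token for the next page ("" when exhausted).
type pageFunc func(token string, limit int) ([]item, string, error)

// collectPage walks the lister one page at a time and converts only the items
// inside [offset, offset+size), so memory holds one raw page plus the
// requested window rather than every session.
func collectPage(list pageFunc, limit, offset, size int) ([]summary, error) {
	out := make([]summary, 0, size)
	seen := 0
	token := ""
	for {
		items, next, err := list(token, limit)
		if err != nil {
			return nil, err
		}
		for _, it := range items {
			if seen >= offset && len(out) < size {
				out = append(out, summary{Name: it.Name})
			}
			seen++
		}
		// Stop as soon as the window is full or the listing is exhausted.
		if next == "" || len(out) == size {
			return out, nil
		}
		token = next
	}
}

// fakeLister serves n items from memory, using the start index as the token.
func fakeLister(n int) pageFunc {
	return func(token string, limit int) ([]item, string, error) {
		start := 0
		if token != "" {
			v, err := strconv.Atoi(token)
			if err != nil {
				return nil, "", err
			}
			start = v
		}
		end := start + limit
		if end > n {
			end = n
		}
		items := make([]item, 0, end-start)
		for i := start; i < end; i++ {
			items = append(items, item{Name: fmt.Sprintf("session-%d", i)})
		}
		next := ""
		if end < n {
			next = strconv.Itoa(end)
		}
		return items, next, nil
	}
}

func main() {
	page, err := collectPage(fakeLister(10), 3, 4, 2)
	fmt.Println(page, err) // prints "[{session-4} {session-5}] <nil>"
}
```

The walk still visits earlier pages to reach the offset, so request latency grows with the offset; only peak memory is bounded. Cursor-based pagination, as the comment suggests for the longer term, would remove that scan as well.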