feat: stream resumption for interrupted conversations #2710
anubra266 wants to merge 2 commits into durable-execution from
TL;DR — Adds two complementary stream-resumption mechanisms for interrupted conversations: an in-memory buffer (classic mode) that tees encoded SSE chunks and replays them on reconnect.

Key changes
Summary | 50 files | 12 commits | base:

In-memory
| Layer | Change |
|---|---|
| Manage schema | execution_mode varchar(50) NOT NULL DEFAULT 'classic' on agents |
| Runtime schema | New workflow_executions table with status enum (running/suspended/completed/failed) |
| Validation | WorkflowExecutionSelectSchema, InsertSchema, UpdateSchema; AgentInsertSchema gains executionMode |
| DAL | createWorkflowExecution, getWorkflowExecution, getWorkflowExecutionByConversation, updateWorkflowExecutionStatus |
| UI | executionMode field in AgentMetadata, Select component in MetadataEditor, serialization in serialize.ts |
manage-schema.ts · runtime-schema.ts · workflowExecutions.ts · metadata-editor.tsx
Thorough review of the durable execution mode and stream resumption feature. This is a well-structured addition, but there are several issues that need attention before merge — ranging from a credential-handling blocker to race conditions and type-safety concerns.
Critical (blocks merge):
- `buildAgentForStep` passes `credentialStoreRegistry: undefined` and `userId: undefined`, which breaks any MCP tool requiring authenticated credentials
- Orphan migration file `0022_futuristic_ozymandias.sql` (identical to `0023`) will cause confusion and potential conflicts
- Race condition in `createReadable()` — TOCTOU gap between replaying and subscribing
High:
- Removing the `[DONE]` sentinel from `SSEStreamHelper.complete()` breaks the agents-sdk, which uses it as a stream-termination signal
- `WritableBackedVercelWriter.write()` fire-and-forget async pattern violates the WritableStream contract
- Unguarded `JSON.parse` in generateTaskHandler can crash the task handler on malformed metadata
Medium:
- Changeset should be `minor` (new column, new table, new endpoints)
- `approved: 'pending'` string in a boolean union is fragile
- Metadata replacement vs merge in `updateWorkflowExecutionStatus`
```ts
credentialStoreRegistry: undefined,
userId: undefined,
```
credentialStoreRegistry: undefined and userId: undefined are passed to getMcpToolById. In the core data-access layer, discoverToolsFromServer throws when a tool has a credential reference and the registry is undefined. This will break any MCP tool requiring credentials in the durable path, while the non-durable path (generateTaskHandler.ts) passes the real registry from the request context.
Since workflow steps can't access the Hono request context, you'll likely need to serialize credential store config into the workflow payload or use a service-level credential store that doesn't depend on the request.
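A minimal sketch of the payload-serialization approach suggested above. All names here (`CredentialStoreConfig`, `buildRegistryFromConfig`, the payload fields beyond those shown in the PR) are hypothetical illustrations, not the project's actual API:

```typescript
// Hypothetical sketch: carry credential store configuration inside the
// workflow payload so steps can rebuild a registry without the Hono
// request context.
interface CredentialStoreConfig {
  storeId: string;
  providerUrl: string;
}

interface DurableExecutionPayload {
  tenantId: string;
  projectId: string;
  agentId: string;
  conversationId: string;
  // Serialized at enqueue time, rebuilt inside the workflow step:
  credentialStores?: CredentialStoreConfig[];
}

function buildRegistryFromConfig(
  configs: CredentialStoreConfig[] = []
): Map<string, CredentialStoreConfig> {
  // A service-level registry keyed by store id; no request context needed.
  const registry = new Map<string, CredentialStoreConfig>();
  for (const cfg of configs) registry.set(cfg.storeId, cfg);
  return registry;
}
```

The step would then pass the rebuilt registry to `getMcpToolById` instead of `undefined`.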
```ts
async complete(finishReason = 'stop'): Promise<void> {
  await this.flushQueuedOperations();

  await this.writeCompletion(finishReason);
  await this.writeDone();
}
```
Removing the [DONE] sentinel from SSEStreamHelper.complete() is a breaking change. The agents-sdk (packages/agents-sdk/src/agent.ts) uses data: [DONE] as a break signal to stop reading the stream. Without it, the SDK will hang waiting for the stream to close via TCP FIN instead of cleanly terminating.
If durable mode doesn't need [DONE], consider keeping it in SSEStreamHelper and omitting it only in the durable stream adapters.
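One way to express the suggested split, sketched with a hypothetical `emitDoneSentinel` option (the real `SSEStreamHelper` internals are not shown in this PR, so this is a simplified stand-in):

```typescript
// Sketch: default to emitting the [DONE] sentinel so the agents-sdk's
// break signal keeps working; durable stream adapters opt out explicitly.
class SSEStreamHelper {
  private chunks: string[] = [];

  constructor(private readonly opts: { emitDoneSentinel?: boolean } = {}) {}

  writeEvent(data: unknown): void {
    this.chunks.push(`data: ${JSON.stringify(data)}\n\n`);
  }

  complete(finishReason = 'stop'): void {
    this.writeEvent({ type: 'finish', finishReason });
    // Classic mode keeps the sentinel; durable adapters pass { emitDoneSentinel: false }.
    if (this.opts.emitDoneSentinel !== false) {
      this.chunks.push('data: [DONE]\n\n');
    }
  }

  output(): string {
    return this.chunks.join('');
  }
}
```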
```ts
write(chunk: unknown): void {
  const bytes = encoder.encode(`data: ${JSON.stringify(chunk)}\n\n`);
  this.writer.write(bytes).catch((err) => {
    logger.warn({ err }, 'Failed to write to durable stream');
  });
```
write() is synchronous but calls this.writer.write(bytes).catch(...) without awaiting it. If two write() calls happen in rapid succession, the second writer.write() is called before the first resolves. This violates the WritableStream contract (must not call write() while a previous write is pending) and can silently reorder chunks or throw.
Either make write() async and await the write, or chain writes via a promise queue.
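A minimal sketch of the promise-queue alternative: each `write()` chains onto the previous one, so the underlying writer never sees overlapping writes. Error handling is simplified (`console.warn` stands in for the project's logger):

```typescript
// Each write() chains on the previous pending write, preserving order and
// satisfying the WritableStream contract (no concurrent write() calls).
const encoder = new TextEncoder();

class QueuedSSEWriter {
  private tail: Promise<void> = Promise.resolve();

  constructor(private readonly writer: WritableStreamDefaultWriter<Uint8Array>) {}

  write(chunk: unknown): void {
    const bytes = encoder.encode(`data: ${JSON.stringify(chunk)}\n\n`);
    // Catch keeps the queue alive even if one write fails.
    this.tail = this.tail
      .then(() => this.writer.write(bytes))
      .catch((err) => console.warn('Failed to write to durable stream', err));
  }

  // Await this before closing to guarantee all queued chunks have flushed.
  flush(): Promise<void> {
    return this.tail;
  }
}
```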
```ts
const message = error instanceof Error ? error.message : String(error);
if (!message.includes('not found') && !message.includes('already')) {
  throw error;
}
```
Swallowing errors whose message includes('not found') or includes('already') is fragile string-matching that could mask real errors (e.g. a database 'connection not found' error). Consider catching a specific error type or checking for an exact error code from the WDK hook resumption API.
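An illustrative sketch of matching on a structured error code instead of message substrings. The error class name and codes here are hypothetical stand-ins for whatever the WDK hook resumption API actually throws:

```typescript
// Hypothetical structured error: a code field makes the benign cases
// ('not found', 'already resumed') distinguishable from real failures
// like a database 'connection not found' error.
class WorkflowHookError extends Error {
  constructor(
    message: string,
    readonly code: 'HOOK_NOT_FOUND' | 'HOOK_ALREADY_RESUMED' | 'INTERNAL'
  ) {
    super(message);
  }
}

function isBenignResumeError(error: unknown): boolean {
  return (
    error instanceof WorkflowHookError &&
    (error.code === 'HOOK_NOT_FOUND' || error.code === 'HOOK_ALREADY_RESUMED')
  );
}
```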
```
@@ -0,0 +1,5 @@
---
"@inkeep/agents-api": patch
```
This changeset is patch, but the PR adds a new column (execution_mode) to the agents table, a new workflow_executions table, and new API endpoints (/executions). Per the project's semver guidance, schema changes requiring migration should be minor.
```ts
const updateData: Record<string, unknown> = {
  status: params.status,
  updatedAt: now,
};

if (params.metadata !== undefined) {
  updateData.metadata = params.metadata;
}
```
updateWorkflowExecutionStatus fully replaces the metadata column when params.metadata is provided, rather than merging with existing metadata. This means markWorkflowSuspendedStep (which sets { continuationStreamNamespace }) followed by markWorkflowFailedStep (which sets { error }) will erase the namespace. If any code later needs the namespace after a failure, it will be lost. Consider using a JSON merge or documenting that metadata is intentionally replaced per-status-transition.
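A sketch of the merge alternative using plain objects; the real DAL would apply this inside the update (or via a SQL JSON patch), but the semantics are the same:

```typescript
// Shallow-merge new metadata over the existing row instead of replacing it,
// so later status transitions keep earlier keys unless explicitly overwritten.
type Metadata = Record<string, unknown>;

function mergeExecutionMetadata(existing: Metadata | null, incoming: Metadata): Metadata {
  // e.g. a suspend sets continuationStreamNamespace; a later failure adds
  // error without erasing the namespace.
  return { ...(existing ?? {}), ...incoming };
}
```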
```ts
const firstArtifactData = (messageResponse.result as any)?.artifacts?.[0]?.parts?.[0]
  ?.data as { type?: string; toolCallId?: string; toolName?: string; args?: unknown };
if (firstArtifactData?.type === 'durable-approval-required') {
```
Reaching into (messageResponse.result as any)?.artifacts?.[0]?.parts?.[0]?.data with a chain of any casts and checking a magic string 'durable-approval-required' is fragile. If the artifact shape changes in generateTaskHandler.ts, this silently breaks. Consider extracting a shared type/constant for the durable-approval-required artifact shape.
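A sketch of the shared constant plus a type guard, so the producer and consumer stay in lockstep. The field names mirror the cast in the snippet above; the module layout is hypothetical:

```typescript
// Shared constant and type guard for the approval artifact, replacing the
// any-cast chain and the magic string at the call site.
const DURABLE_APPROVAL_REQUIRED = 'durable-approval-required' as const;

interface DurableApprovalArtifactData {
  type: typeof DURABLE_APPROVAL_REQUIRED;
  toolCallId: string;
  toolName: string;
  args?: unknown;
}

function isDurableApprovalData(value: unknown): value is DurableApprovalArtifactData {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.type === DURABLE_APPROVAL_REQUIRED &&
    typeof v.toolCallId === 'string' &&
    typeof v.toolName === 'string'
  );
}
```

If the artifact shape changes in generateTaskHandler.ts, the guard fails loudly in tests instead of silently breaking the consumer.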
```ts
await executeToolStep({
  payload,
  currentSubAgentId,
  toolCallId: toolCall.toolCallId,
  toolName: toolCall.toolName,
  args: toolCall.args,
  workflowRunId,
  streamNamespace: `r${approvalRound}`,
  taskId,
  preApproved: approvalResult.approved,
  approvalReason: approvalResult.reason,
});
```
The return value of executeToolStep is discarded — { type: 'needs_approval' } is silently ignored. If the tool executed inside executeToolStep itself triggers another tool that requires approval (nested tool calling), the pending approval is lost and the workflow continues as if the tool completed successfully.
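One possible control flow for honoring nested approvals, sketched with stub signatures (`executeToolStep` and `waitForApproval` here are hypothetical stand-ins for the real workflow steps):

```typescript
// Keep consuming needs_approval results until the step completes, instead of
// discarding the return value and losing nested pending approvals.
type ToolStepResult =
  | { type: 'completed' }
  | { type: 'needs_approval'; toolCallId: string };

async function runToolWithApprovals(
  executeToolStep: () => Promise<ToolStepResult>,
  waitForApproval: (toolCallId: string) => Promise<boolean>
): Promise<'completed' | 'denied'> {
  while (true) {
    const result = await executeToolStep();
    if (result.type === 'completed') return 'completed';
    // Nested approval surfaced by an inner tool call: suspend again instead
    // of silently continuing as if the tool completed.
    const approved = await waitForApproval(result.toolCallId);
    if (!approved) return 'denied';
  }
}
```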
PR Review Summary
(12) Total Issues | Risk: High
🔴❗ Critical (3) ❗🔴
🔴 1) stream-buffer-registry.ts Module-scoped singleton not shared across bundle boundaries
Issue: StreamBufferRegistry uses a module-level singleton (export const streamBufferRegistry), but the existing stream-registry.ts uses globalThis with a key to ensure the registry is shared across module boundaries (e.g., WDK workflow bundle and main Hono app bundle).
Why: The comment in stream-registry.ts explicitly states this pattern is needed so both bundles resolve to the same Map. Without this, the buffer registered by chatDataStream (main bundle) won't be found by the resume endpoint or workflow steps (potentially different bundle), causing stream resumption to silently fail with 204 responses.
Fix: Follow the globalThis pattern from stream-registry.ts:

```ts
const REGISTRY_KEY = '__inkeep_streamBufferRegistry';
function getBufferRegistry(): Map<string, BufferEntry> {
  const g = globalThis as Record<string, unknown>;
  if (!g[REGISTRY_KEY]) {
    g[REGISTRY_KEY] = new Map<string, BufferEntry>();
  }
  return g[REGISTRY_KEY] as Map<string, BufferEntry>;
}
```
🔴 2) 0022_futuristic_ozymandias.sql Duplicate migration file
Issue: This migration file is byte-identical to 0023_futuristic_ozymandias.sql — both create the same workflow_executions table. The journal references 0022_superb_micromacro at index 22, but this orphaned file is not registered.
Why: If accidentally included in a migration run, it will fail with 'table already exists'. This indicates possible migration state corruption from rebasing/merging that could cause deployment failures.
Fix: Remove this orphaned file using pnpm db:drop or manually delete since it's NOT in the journal. Per AGENTS.md: "NEVER manually delete migration files - use pnpm db:drop".
🔴 3) system PR title/description misrepresents actual scope
Issue: The PR is titled "feat: in-memory stream resumption" but the actual diff is +2460/-145 lines implementing comprehensive durable execution infrastructure. The stream resumption feature is ~260 lines; durable execution infrastructure is ~2200+ lines.
Why: The changeset itself says "Add durable execution mode for agent runs with tool approvals and crash recovery." Future maintainers reviewing this PR will be misled about the scope of changes introduced.
Fix: Update PR title to accurately reflect scope: "feat: durable execution mode for agent runs with tool approvals, crash recovery, and stream resumption". Lead with durable execution architecture in the description.
Inline Comments:
- 🔴 Critical: stream-buffer-registry.ts:25 — Module-scoped singleton pattern
- 🔴 Critical: chatDataStream.ts:656-667 — Fire-and-forget async IIFE swallows errors
- 🔴 Critical: 0022_futuristic_ozymandias.sql:1 — Duplicate migration file
🟠⚠️ Major (5) 🟠⚠️
🟠 1) chatDataStream.ts:448-491 Durable execution path doesn't register with stream buffer
Issue: The durable execution path streams directly from run.readable without registering with streamBufferRegistry. Stream resumption via GET /conversations/:conversationId/stream won't work for durable executions.
Why: This is the use case where crash recovery is most needed, yet the resume endpoint returns 204 because the buffer was never registered.
Fix: Either tee run.readable and register with buffer, update the resume endpoint to check durable executions, or document the limitation clearly.
🟠 2) stream-buffer-registry.ts:15-25 Overwriting existing buffer leaves active consumers hanging
Issue: If a new chat request arrives for a conversation with an active buffer, register() overwrites without emitting 'done' to existing subscribers, leaving them hanging indefinitely.
Why: Previous resume clients will wait forever for a 'done' event that never comes.
Fix: Emit 'done' on existing emitter before overwriting (see inline comment for code).
🟠 3) stream-buffer-registry.ts:49-69 Race condition between done check and event listener setup
Issue: After checking if (entry.done) and before attaching the 'done' listener, the stream could complete, causing the 'done' event to be missed.
Why: This can cause the resume stream to hang indefinitely in edge cases.
Fix: Subscribe to events FIRST, then replay buffered chunks (see inline comment for code).
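A minimal model of the subscribe-first fix: attach the `chunk` and `done` listeners before replaying the buffer, so nothing emitted in between is lost. The EventEmitter shape mirrors the description above; the exact class internals are assumed:

```typescript
import { EventEmitter } from 'node:events';

interface BufferEntry {
  chunks: Uint8Array[];
  done: boolean;
  emitter: EventEmitter;
}

function createReadable(entry: BufferEntry): ReadableStream<Uint8Array> {
  return new ReadableStream<Uint8Array>({
    start(controller) {
      let closed = false;
      const onChunk = (c: Uint8Array) => controller.enqueue(c);
      const onDone = () => {
        if (closed) return;
        closed = true;
        entry.emitter.off('chunk', onChunk);
        controller.close();
      };
      // 1. Subscribe BEFORE replaying: closes the TOCTOU window.
      entry.emitter.on('chunk', onChunk);
      entry.emitter.once('done', onDone);
      // 2. Replay what was buffered before this consumer attached.
      for (const c of entry.chunks) controller.enqueue(c);
      // 3. If the stream already finished, complete immediately.
      if (entry.done) onDone();
    },
  });
}
```

Because `start()` runs synchronously, no event can fire between the subscribe and the replay, so chunks cannot be missed or reordered.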
🟠 4) multi-file Missing test coverage for critical new functionality
Issue: The StreamBufferRegistry class (core of stream resumption), the resume endpoint, and durable execution workflow steps have zero test coverage. Per AGENTS.md: "ALL new work MUST include these three components - NO EXCEPTIONS: 1. Unit Tests".
Why: Without tests, race conditions, memory leaks, and cleanup failures could slip through undetected. The existing generateTaskHandler.test.ts adds mocks for durable methods but no tests exercise them.
Fix: Add dedicated tests for:
- StreamBufferRegistry: registration, push, replay, cleanup timeouts, concurrent access
- Resume endpoint: 204 responses, authorization, SSE headers
- Durable execution: workflow context propagation, approval queue consumption
🟠 5) multi-file Missing changeset for @inkeep/agents-core schema changes
Issue: Schema changes (new execution_mode column, new workflow_executions table) are in @inkeep/agents-core but the changeset only bumps @inkeep/agents-api.
Why: Downstream consumers of @inkeep/agents-core who depend on Drizzle schema types won't receive proper semver signaling.
Fix: Add @inkeep/agents-core: minor to the changeset per data-model-changes skill guidance.
Inline Comments:
- 🟠 Major: stream-buffer-registry.ts:15-25 — Overwriting existing buffer
- 🟠 Major: stream-buffer-registry.ts:49-69 — Race condition in createReadable
- 🟠 Major: chatDataStream.ts:448-491 — Durable path missing buffer registration
🟡 Minor (2) 🟡
🟡 1) stream-buffer-registry.ts:10 Hardcoded cleanup delay without configurability
Issue: The 60-second cleanup delay is hardcoded with no environment variable override and no observability into buffer lifecycle.
Why: Self-hosted deployments may need different windows; debugging buffer state during incidents is difficult.
Fix: Move to execution-limits constants; add logging for buffer events.
🟡 2) multi-file executionMode naming is opaque to customers
Issue: 'classic' implies 'old/legacy'; 'durable' is infrastructure jargon that doesn't communicate customer benefit.
Why: Customers choosing between modes won't understand tradeoffs.
Fix: Consider renaming to 'ephemeral' | 'resumable' or document clearly.
Inline Comments:
- 🟡 Minor: stream-buffer-registry.ts:10 — Hardcoded cleanup delay
💭 Consider (2) 💭
💭 1) executions.ts:100-101 Path parameter style inconsistency
Issue: Uses :executionId Express-style while conversations.ts uses {conversationId} OpenAPI-style.
Fix: Standardize on OpenAPI-style {param} format for consistency.
💭 2) stream-buffer-registry.ts No graceful shutdown handling
Issue: Active setTimeout handles will keep the process alive during shutdown. Active streams may be left inconsistent.
Fix: Add a shutdown() method that clears timeouts and emits 'done' to all active buffers.
🚫 REQUEST CHANGES
Summary: This PR introduces valuable durable execution infrastructure, but has several critical issues that need addressing before merge:
- The stream buffer registry uses a module-scoped singleton that won't work across bundle boundaries — this will cause stream resumption to silently fail in production.
- Duplicate migration file indicates migration state corruption that could fail deployments.
- Missing error handling in the background buffering task will swallow errors silently.
- Durable mode doesn't actually support stream resumption via the documented endpoint.
- Critical functionality lacks test coverage per AGENTS.md requirements.
The architecture is sound and the feature is valuable, but these issues need resolution to avoid production incidents.
Discarded (8)
| Location | Issue | Reason Discarded |
|---|---|---|
| tool-approval.ts | Cross-tenant tool approval bypass via unvalidated toolCallId | Further investigation shows the conversationId is validated before approval; toolCallId is internally generated and not user-controllable in the classic path |
| agentExecutionSteps.ts | Code duplication with generateTaskHandler | WDK workflow steps have serialization constraints requiring separate implementations; documented architectural decision |
| workflowExecutions.ts | Missing optimistic locking on status updates | WDK handles step sequencing; concurrent updates are expected to fail-fast at the workflow level |
| PendingToolApprovalManager.ts | Variable extraction refactoring unrelated to feature | Incidental cleanup that doesn't change behavior; not blocking |
| stream-helpers.ts | Removal of writeDone() [DONE] marker | OpenAI spec doesn't require this; Vercel AI SDK handles stream termination via connection close |
| conversations.ts | Authorization check returns 204 same as no-stream | This is intentional information-leakage protection; logging suggestion is valid but optional |
| durable-stream-helper.ts | Silent write failures in write() | Errors are logged at warn level; stream write failures are inherently non-recoverable |
| agent-types.ts | Record type allows undefined access | Current code uses optional chaining correctly; runtime safety is preserved |
Reviewers (11)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| pr-review-standards | 3 | 1 | 0 | 0 | 2 | 0 | 0 |
| pr-review-sre | 7 | 2 | 1 | 0 | 2 | 0 | 2 |
| pr-review-architecture | 8 | 2 | 0 | 0 | 1 | 0 | 5 |
| pr-review-tests | 7 | 1 | 0 | 0 | 0 | 0 | 6 |
| pr-review-errors | 7 | 0 | 0 | 0 | 1 | 0 | 6 |
| pr-review-breaking-changes | 5 | 2 | 0 | 0 | 1 | 0 | 2 |
| pr-review-precision | 6 | 1 | 0 | 0 | 0 | 0 | 5 |
| pr-review-product | 5 | 0 | 1 | 0 | 0 | 0 | 4 |
| pr-review-consistency | 6 | 0 | 1 | 0 | 0 | 0 | 5 |
| pr-review-types | 5 | 0 | 0 | 0 | 0 | 0 | 5 |
| pr-review-security-iam | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| Total | 60 | 9 | 3 | 0 | 7 | 0 | 41 |
Note: Many findings were deduplicated across reviewers (e.g., multiple reviewers flagged the singleton pattern and error handling issues).
```
@@ -0,0 +1,15 @@
CREATE TABLE "workflow_executions" (
```
🔴 CRITICAL: Duplicate migration file
Issue: This migration file (0022_futuristic_ozymandias.sql) is byte-identical to 0023_futuristic_ozymandias.sql — both create the same workflow_executions table. The journal (drizzle/runtime/meta/_journal.json) references 0022_superb_micromacro at index 22, but this file is not registered and appears to be a duplicate.
Why: If this orphaned file is accidentally included in a migration run, it will fail with 'table already exists'. This indicates possible migration state corruption that could cause deployment failures.
Fix: Remove this orphaned file using pnpm db:drop to manage migration state properly, or manually delete ONLY this file since it's NOT in the journal. The journaled migrations should remain.
```ts
if (agent.executionMode === 'durable') {
  const requestId = `chatds-${Date.now()}`;
  const run = await start(agentExecutionWorkflow, [
    {
      tenantId,
      projectId,
      agentId,
      conversationId,
      userMessage: userText,
      messageParts: messageParts.length > 0 ? messageParts : undefined,
      requestId,
      resolvedRef: executionContext.resolvedRef,
      forwardedHeaders:
        Object.keys(forwardedHeaders).length > 0 ? forwardedHeaders : undefined,
      outputFormat: 'vercel',
    },
  ]);
  logger.info(
    { runId: run.runId, conversationId, agentId },
    'Durable execution started via /chat'
  );
  c.header('x-workflow-run-id', run.runId);
  c.header('content-type', 'text/event-stream');
  c.header('cache-control', 'no-cache');
  c.header('connection', 'keep-alive');
  c.header('x-vercel-ai-data-stream', 'v2');
  c.header('x-accel-buffering', 'no');
  return stream(c, async (s) => {
    try {
      const reader = run.readable.getReader();
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        await s.write(value);
      }
    } catch (error) {
      logger.error(
        { error, runId: run.runId },
        'Error streaming durable execution via /chat'
      );
      await s.write(`event: error\ndata: ${JSON.stringify({ error: 'Stream error' })}\n\n`);
    }
  });
}
```
🟠 MAJOR: Durable execution path doesn't register with stream buffer
Issue: The durable execution path (executionMode === 'durable') streams directly from run.readable without registering with streamBufferRegistry. This means stream resumption via GET /conversations/:conversationId/stream won't work for durable executions — the endpoint returns 204 because the buffer was never registered.
Why: The PR describes stream resumption as a key feature, but it only works for 'classic' mode. This is the use case where crash recovery is most needed.
Fix: Either:
- Tee `run.readable` and register with `streamBufferRegistry` similar to the non-durable path, OR
- Update the resume endpoint to also check for active durable executions via `getRun().getReadable()`, OR
- Document that stream resumption via GET `/conversations/:id/stream` only works for classic mode, while durable mode uses the separate `GET /executions/:id/stream` endpoint
```ts
  timeoutId?: ReturnType<typeof setTimeout>;
}

const CLEANUP_DELAY_MS = 60_000;
```
🟡 Minor: Hardcoded cleanup delay without configurability
Issue: The 60-second cleanup delay is hardcoded and not configurable via environment variable. Additionally, there's no observability into buffer lifecycle events (registration, completion, cleanup).
Why: During incident debugging, it's difficult to understand buffer state. Self-hosted deployments may need different cleanup windows. Under load, completed streams accumulate in memory for 60s.
Fix: Consider:
- Move `CLEANUP_DELAY_MS` to execution-limits constants for environment override support
- Add trace spans or log statements for buffer registration, completion, and cleanup events
- Expose a `getStats()` method for health checks: `{ activeBuffers: number, totalBytes: number }`
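A tiny illustration of the suggested `getStats()` health-check method; the buffer map shape is assumed from the surrounding comments:

```typescript
// Aggregate buffer count and total buffered bytes for a health endpoint.
interface BufferEntry {
  chunks: Uint8Array[];
  done: boolean;
}

function getStats(buffers: Map<string, BufferEntry>): { activeBuffers: number; totalBytes: number } {
  let totalBytes = 0;
  for (const e of buffers.values()) {
    for (const c of e.chunks) totalBytes += c.byteLength;
  }
  return { activeBuffers: buffers.size, totalBytes };
}
```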
c63a846 to 42077a3 (Compare)
PR Review Summary
(7) Total Issues | Risk: High
🔴❗ Critical (2) ❗🔴
Inline Comments:
- 🔴 Critical: stream-buffer-registry.ts:13 — Module-scoped singleton not shared across bundle boundaries
- 🔴 Critical: chatDataStream.ts:656-667 — Fire-and-forget async IIFE lacks error handling
🟠⚠️ Major (4) 🟠⚠️
🟠 1) stream-buffer-registry.ts Missing test coverage for critical new functionality
files: stream-buffer-registry.ts, conversations.ts:376-405
Issue: The StreamBufferRegistry class (79 lines) and the new resume endpoint have zero test coverage. Per AGENTS.md: "ALL new work MUST include... Unit Tests - NO EXCEPTIONS".
Why: Without tests, these regressions could go undetected:
- Memory leaks if cleanup timer fails to delete entries after 60s
- Race condition if `createReadable()` called between chunk replay and event subscription
- EventEmitter listener leaks if `onDone` handler fails to clean up
- Authorization bypass in resume endpoint (userId check untested)
Fix: Create agents-api/src/domains/run/stream/__tests__/stream-buffer-registry.test.ts with test cases for:
- `register()` creates new buffer entry
- `push()` adds chunks and emits events
- `complete()` marks done and schedules 60s cleanup
- `createReadable()` replays buffered chunks then forwards live ones
- Race condition scenarios
- Resume endpoint authorization (204 vs 200 responses)
Inline Comments:
- 🟠 Major: stream-buffer-registry.ts:48-70 — Race condition between chunk replay and event subscription
- 🟠 Major: conversations.ts:387-389 — Error response pattern inconsistency — 204 for unauthorized vs 404
- 🟠 Major: conversations.ts:404 — Resume endpoint lacks error handling around stream piping
🟡 Minor (1) 🟡
🟡 1) stream-buffer-registry.ts:10 Hardcoded cleanup delay without observability
Issue: The 60-second cleanup delay (CLEANUP_DELAY_MS) is hardcoded with no environment variable override and no logging/metrics for buffer lifecycle events.
Why: During incident debugging, understanding buffer state is difficult. Self-hosted deployments may need different cleanup windows.
Fix: Consider moving to execution-limits constants; add debug logging for register/complete/cleanup events.
🚫 REQUEST CHANGES
Summary: This PR implements the Vercel AI SDK resume pattern for stream reconnection — a valuable feature. However, there are several issues that need addressing before merge:
- The stream buffer registry uses a module-scoped singleton that won't work across bundle boundaries — this will cause stream resumption to silently fail with 204 responses. Follow the `globalThis` pattern from the existing stream-registry.ts.
- Fire-and-forget async IIFE in chatDataStream.ts has no error handling — errors become unhandled promise rejections with no logging, making debugging impossible.
- Missing test coverage for critical new functionality violates AGENTS.md requirements.
- Race condition in `createReadable()` between replaying buffered chunks and subscribing to new chunks — chunks can be permanently lost.
The architecture is sound and the feature is well-scoped. These issues are straightforward to fix.
Discarded (6)
| Location | Issue | Reason Discarded |
|---|---|---|
| stream-buffer-registry.ts:3-8 | Unbounded memory growth from Uint8Array[] accumulation | Valid concern but applies to existing stream-registry pattern too; acceptable for in-memory architecture with 60s cleanup |
| stream-buffer-registry.ts:39-41 | setTimeout keeps process alive during shutdown | Minor operational concern; .unref() is nice-to-have, not blocking |
| stream-buffer-registry.ts:34-42 | No error state distinction in complete() | The SSE stream protocol handles success/error via event content, not buffer state |
| stream-buffer-registry.ts:27-32 | push() silently ignores missing entries | Intentional defensive behavior; logging would be noisy |
| chatDataStream.ts:653 | tee() backpressure coupling | Expected behavior for resumption; buffer reader completing is the goal |
| conversations.ts:396 | Debug-level logging insufficient | Valid but nitpick; info level would be better but not blocking |
Reviewers (5)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| pr-review-standards | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| pr-review-sre | 6 | 1 | 0 | 0 | 1 | 0 | 4 |
| pr-review-tests | 4 | 1 | 0 | 0 | 0 | 0 | 3 |
| pr-review-consistency | 2 | 0 | 0 | 0 | 1 | 0 | 1 |
| pr-review-errors | 5 | 0 | 0 | 0 | 2 | 0 | 3 |
| Total | 18 | 2 | 0 | 0 | 5 | 0 | 11 |
Note: Many findings were deduplicated across reviewers (e.g., error handling in IIFE flagged by both standards and errors reviewers).
```ts
if (!conversation || conversation.userId !== endUserId) {
  return new Response(null, { status: 204 });
}
```
🟠 MAJOR: Error response pattern inconsistency — 204 for unauthorized vs established 404 pattern
Issue: Returns HTTP 204 when conversation is not found or doesn't belong to the authenticated user, but the sibling endpoint in this same file (get conversation, lines 287-291) throws createApiError({ code: 'not_found' }) for the identical check.
Why: This creates two different error semantics for the same authorization failure condition within the same route file. The OpenAPI spec declares 204 only for "No active stream", not for authorization failures.
Fix: Distinguish between the two 204 cases:
```diff
-if (!conversation || conversation.userId !== endUserId) {
-  return new Response(null, { status: 204 });
-}
+if (!conversation || conversation.userId !== endUserId) {
+  throw createApiError({ code: 'not_found', message: 'Conversation not found' });
+}
```
Refs:
- Same file, lines 287-291 — uses `createApiError`
```ts
c.header('x-vercel-ai-data-stream', 'v2');
c.header('x-accel-buffering', 'no');

return stream(c, (s) => s.pipe(readable));
```
🟠 MAJOR: Resume endpoint lacks error handling around stream piping
Issue: The stream pipe has no try-catch, unlike every other streaming endpoint in this codebase (see executions.ts:276-289, chatDataStream.ts:475-490) which wrap stream operations in try-catch blocks with error logging and error event transmission.
Why: If the readable stream errors during transmission (e.g., EventEmitter error, client disconnect), the error propagates as an unhandled exception with no logging. No error event is written to the SSE stream, so clients see an abrupt disconnect with no indication of what went wrong.
Fix: Follow the established pattern:
```diff
-return stream(c, (s) => s.pipe(readable));
+return stream(c, async (s) => {
+  try {
+    const reader = readable.getReader();
+    while (true) {
+      const { done, value } = await reader.read();
+      if (done) break;
+      await s.write(value);
+    }
+  } catch (error) {
+    logger.error({ error, conversationId }, 'Error resuming conversation stream');
+    await s.write(`event: error\ndata: ${JSON.stringify({ error: 'Stream error' })}\n\n`);
+  }
+});
```
PR Review Summary
(4) Total Issues | Risk: Medium
Delta Review: This review is scoped to changes in commit 329e1ec9b, which addresses prior review feedback. The delta is +42/-17 lines across 2 files.
✅ Prior Critical Issues — ADDRESSED
The following critical issues from prior reviews have been addressed in this delta:
| Issue | Status | How Addressed |
|---|---|---|
| Module-scoped singleton not shared across bundle boundaries | ✅ Fixed | Now uses globalThis pattern matching stream-registry.ts (lines 11-19) |
| Fire-and-forget async IIFE lacks error handling | ✅ Fixed | Added try/catch with error logging (lines 664-666) |
| Race condition between chunk replay and event subscription | ✅ Fixed | Now subscribes to emitter events BEFORE replaying buffered chunks (lines 77-82) |
| Missing cancel() handler for listener cleanup | ✅ Fixed | Added cancel() handler that removes event listeners (lines 88-93) |
| Overwriting buffer leaves existing consumers hanging | ✅ Fixed | Now emits 'done' to existing subscribers before overwriting (lines 30-33) |
💭 Consider (1) 💭
Inline Comments:
- 💭 Consider: chatDataStream.ts:664-667 — Error logged but buffer not marked as errored; resumed streams won't know if source terminated due to error vs completion
🕐 Pending Recommendations (3)
These issues were raised in prior reviews and remain unaddressed:
- 🟠 stream-buffer-registry.ts — Missing test coverage for StreamBufferRegistry class
- 🟠 conversations.ts:387-389 — Error response pattern inconsistency — 204 for unauthorized vs 404
- 🟠 conversations.ts:404 — Resume endpoint lacks error handling around stream piping
💡 APPROVE WITH SUGGESTIONS
Summary: The delta commit addresses all the critical issues raised in prior reviews — excellent work! The globalThis singleton pattern, race condition fix, cancel handler, and error handling are all properly implemented. The remaining pending recommendations from prior reviews (test coverage, error response consistency, stream piping error handling) are still valid but are not blocking. The "Consider" item about distinguishing error vs normal completion in the buffer is a minor enhancement opportunity. This PR is in good shape for merge once the pending major items (particularly test coverage) are addressed or explicitly deferred.
Discarded (0)
No findings were discarded in this delta review.
Reviewers (3)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| pr-review-standards | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| pr-review-sre | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| pr-review-tests | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
Note: This was a focused delta review. Prior review findings (test coverage, etc.) were confirmed as still pending but not re-raised per review scope guidelines.
```ts
} catch (error) {
  logger.error({ error, conversationId }, 'Error buffering stream for resumption');
} finally {
  streamBufferRegistry.complete(conversationId);
```
💭 Consider: Error logged but stream buffer not marked as errored
Issue: The catch block logs the error but completes the buffer normally via finally. Consumers resuming via createReadable() will receive all buffered chunks followed by a clean close, with no indication the stream terminated due to error.
Why: If the source stream throws after buffering partial content, the resume endpoint serves truncated data as if it were complete. The client has no signal that the response is incomplete. This is a minor concern since the SSE stream format itself typically includes terminal events that indicate completion status.
Fix: Consider adding an error(conversationId) method to StreamBufferRegistry that marks the entry as errored and emits an error event, allowing resumed streams to relay the error condition to clients. Alternatively, the Vercel AI SDK data stream format may already handle this via error events in the stream content.
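A minimal sketch of the suggested `error()` extension. The class shape, entry fields, and method names below are assumptions modeled on the API this review describes (register/push/complete), not the actual `stream-buffer-registry.ts` implementation:

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical stand-in for a buffer entry; field names are illustrative.
interface BufferEntry {
  chunks: string[];
  emitter: EventEmitter;
  done: boolean;
  error?: Error;
}

class ErrorAwareBufferRegistry {
  private entries = new Map<string, BufferEntry>();

  register(conversationId: string): void {
    this.entries.set(conversationId, { chunks: [], emitter: new EventEmitter(), done: false });
  }

  push(conversationId: string, chunk: string): void {
    const entry = this.entries.get(conversationId);
    if (!entry || entry.done) return; // ignore writes after a terminal state
    entry.chunks.push(chunk);
    entry.emitter.emit('chunk', chunk);
  }

  complete(conversationId: string): void {
    const entry = this.entries.get(conversationId);
    if (!entry || entry.done) return;
    entry.done = true;
    entry.emitter.emit('done');
  }

  // The suggested addition: mark the buffer as errored so resumed
  // consumers can relay the failure instead of seeing a clean close.
  error(conversationId: string, err: Error): void {
    const entry = this.entries.get(conversationId);
    if (!entry || entry.done) return;
    entry.done = true;
    entry.error = err;
    // Guard: EventEmitter throws if 'error' is emitted with zero listeners.
    if (entry.emitter.listenerCount('error') > 0) {
      entry.emitter.emit('error', err);
    }
  }

  isErrored(conversationId: string): boolean {
    return this.entries.get(conversationId)?.error !== undefined;
  }
}
```

A resumed stream could then check `isErrored()` (or subscribe to the `'error'` event) and enqueue an SSE error frame before closing, rather than ending silently.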
Refs:
TL;DR — Adds in-memory stream buffering so clients that disconnect mid-response can reconnect and replay the full SSE stream from the beginning. Implements the Vercel AI SDK resume pattern via a new resume endpoint.

Key changes
Summary | 6 files | 3 commits | base:
PR Review Summary
(0) Total Issues | Risk: Low
Delta Review: This review is scoped to changes in commit 5ca41d078 since the last automated review. The delta is +44/-8 lines across 2 files.
✅ Delta Changes — APPROVED
The delta adds a PoW (Proof of Work) check bypass for GET requests in web_client app authentication. This is a sound security decision:
| Aspect | Analysis |
|---|---|
| Why skip PoW for GET? | The stream resume endpoint (GET /conversations/:conversationId/stream) needs to work without PoW since the Vercel AI SDK's useChat hook calls it on mount — before any user interaction that could trigger PoW solving. |
| Security preserved | GET requests still require: valid x-inkeep-app-id, valid Origin matching allowedDomains, valid anonymous JWT (signed, not expired), and JWT app claim matching request app ID. |
| PoW intent maintained | Per the PoW spec, PoW prevents "unlimited chat messages" and "mass identity creation" — both POST operations. GET requests don't create identities, send messages, or consume LLM tokens. |
| Test coverage | A new test, *should skip PoW for GET requests (e.g. stream resume)*, verifies the behavior. Existing PoW tests were updated to use POST appropriately. |
💭 Consider (1) 💭
💭 1) runAuth.ts:515 HEAD requests also safe but not included in skip
Issue: The condition method !== 'GET' means HEAD requests (another safe HTTP method) still require PoW. The codebase has precedent for treating GET and HEAD together (branchScopedDb.ts:97).
Why: Very low risk — HEAD requests aren't used by the Vercel AI SDK or this feature. If HEAD requests were ever made to web_client authenticated endpoints, they'd unexpectedly require PoW.
Fix: Optional: ['GET', 'HEAD'].includes(method) for consistency. Not blocking since HEAD isn't in scope.
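For illustration, the suggested condition could be extracted like this. `SAFE_METHODS` and `requiresProofOfWork` are hypothetical names for this sketch, not the actual `runAuth.ts` code:

```typescript
// Hypothetical sketch: exempt safe HTTP methods from the PoW requirement.
const SAFE_METHODS = ['GET', 'HEAD'];

function requiresProofOfWork(method: string): boolean {
  // PoW guards state-changing operations (sending messages, creating
  // identities); safe, read-only methods are exempt.
  return !SAFE_METHODS.includes(method.toUpperCase());
}
```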
🕐 Pending Recommendations (3)
These issues were raised in prior reviews and remain unaddressed (outside delta scope):
- 🟠 `stream-buffer-registry.ts` — Missing test coverage for StreamBufferRegistry class
- 🟠 `conversations.ts:387-389` — Error response pattern inconsistency: 204 for unauthorized vs 404
- 🟠 `conversations.ts:404` — Resume endpoint lacks error handling around stream piping
💡 APPROVE WITH SUGGESTIONS
Summary: The delta commit correctly implements PoW bypass for GET requests, enabling the stream resume endpoint to work with the Vercel AI SDK. The security analysis confirms this change aligns with PoW's intended purpose (preventing POST-based abuse) while maintaining all other authentication requirements for GET requests. The pending recommendations from prior reviews remain valid but are outside this delta's scope. 🎉
Discarded (0)
No findings were discarded in this delta review.
Reviewers (3)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| pr-review-security-iam | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| pr-review-standards | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| pr-review-tests | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| Total | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
Note: Security-IAM reviewer confirmed the change is appropriate. Tests reviewer noted a minor consideration about HEAD requests.
PR Review Summary
(1) Total Issues | Risk: High
Delta Review: This review is scoped to changes in commit abff65491, which reorders the authorization logic in the resume endpoint. The delta is +24/-14 lines in conversations.ts.
🔴❗ Critical (1) ❗🔴
Inline Comments:
- 🔴 Critical: `conversations.ts:390-392` — Authorization bypass when conversation not found in database
💭 Consider (1) 💭
💭 1) conversations.ts:382-392 Orphaned EventEmitter listeners on failed authorization
Issue: The reordering calls createReadable(conversationId) at line 382 before validating ownership. When authorization fails at line 391, the ReadableStream's EventEmitter listeners (attached in stream-buffer-registry.ts:77-78) remain attached because cancel() is not called on the abandoned stream.
Why: Minor memory pressure for adversarial traffic (repeated unauthorized resume attempts on active streams). Impact is bounded: listeners are cleaned up automatically when the stream completes or the 60-second cleanup timer fires. The tradeoff appears intentional to optimize the happy path.
Fix: Optional — consider checking ownership with streamBufferRegistry.has() before calling createReadable(), or explicitly cancel the readable when ownership fails.
🕐 Pending Recommendations (3)
These issues were raised in prior reviews and remain unaddressed:
- 🟠 `stream-buffer-registry.ts` — Missing test coverage for StreamBufferRegistry class
- 🟠 `conversations.ts:387-389` — Error response pattern inconsistency: 204 for unauthorized vs 404
- 🟠 `conversations.ts:404` — Resume endpoint lacks error handling around stream piping
🚫 REQUEST CHANGES
Summary: The delta introduces a critical authorization bypass — the condition at line 390 allows streaming when the conversation is not found in the database. The fix is a one-liner: change conversation && to !conversation ||. The prior critical issues from earlier reviews have been addressed, but this new delta introduces a regression. The "Consider" item about listener leakage is minor and bounded by the 60-second cleanup.
Discarded (0)
No findings were discarded in this delta review.
Reviewers (3)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| pr-review-security-iam | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| pr-review-standards | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| pr-review-sre | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| Total | 3 | 0 | 1 | 0 | 1 | 0 | 0 |
Note: All three reviewers independently identified the same critical authorization issue — high convergence indicates high confidence.
```ts
if (conversation && conversation.userId !== endUserId) {
  return new Response(null, { status: 204 });
}
```
🔴 CRITICAL: Authorization bypass when conversation not found in database
Issue: The condition conversation && conversation.userId !== endUserId returns 204 only when the conversation exists AND belongs to a different user. If conversation is null (not found in DB), the condition evaluates to false and execution proceeds to return the stream without authorization.
Why: This allows an attacker who knows a conversationId with an active stream buffer to resume the stream even if the conversation record is missing (e.g., deleted, race condition, or DB replication lag). Compare to the correct pattern at lines 410-411 which handles the null case: if (!conversation || conversation.userId !== endUserId).
Fix: (1-click apply)
```ts
// before
if (conversation && conversation.userId !== endUserId) {
  return new Response(null, { status: 204 });
}

// after
if (!conversation || conversation.userId !== endUserId) {
  return new Response(null, { status: 204 });
}
```
Refs:
Force-pushed from eb36d22 to 6d5adc1.
PR Review Summary
(5) Total Issues | Risk: High
🔴❗ Critical (1) ❗🔴
🔴 1) conversations.ts:390-392 Authorization bypass when conversation not found in database
Issue: The authorization check uses conversation && conversation.userId !== endUserId which only denies access when the conversation IS found AND belongs to a different user. When the conversation is NOT found in the database (conversation is null/undefined), the condition evaluates to false and execution continues to line 402, returning the stream to the caller.
Why: This enables cross-user data access: if User A starts a chat (stream buffered under conversationId), and User B calls the resume endpoint with that conversationId before the conversation is persisted to the database (or if the conversation was deleted), User B receives User A's stream. The stream buffer registry is keyed by conversationId alone without tenant isolation, amplifying the risk.
Fix:

```ts
// Change line 390 from:
if (conversation && conversation.userId !== endUserId) {
// to:
if (!conversation || conversation.userId !== endUserId) {
```

This matches the correct authorization pattern used at line 410 in the same file.
Refs:
🟠⚠️ Major (2) 🟠⚠️
🟠 1) stream-buffer-registry.ts Missing test coverage for critical new functionality
files: stream-buffer-registry.ts (106 lines), conversations.ts:376-415
Issue: The StreamBufferRegistry class and the new resume endpoint have zero test coverage. Per AGENTS.md: "ALL new work MUST include... Unit Tests - NO EXCEPTIONS".
Why: Without tests, these regressions could go undetected:
- Memory leaks if the cleanup timer fails to delete entries after 60s
- EventEmitter listener leaks if the onDone handler fails to clean up
- Authorization bypass in the resume endpoint (the critical bug at line 390 would have been caught by an authorization test)
Fix: Create agents-api/src/domains/run/stream/__tests__/stream-buffer-registry.test.ts with test cases for register, push, complete, createReadable, and cleanup timeout behavior.
Refs:
🟠 2) conversations.ts:402 Resume endpoint lacks error handling around stream piping
Issue: The stream(c, (s) => s.pipe(readable)) call has no try-catch. If the readable stream errors during transmission, the error propagates as an unhandled exception with no logging, and no error event is written to the SSE stream.
Why: Clients will see an abrupt connection termination with no indication of what went wrong. Compare to chatDataStream.ts:487-492 which logs errors and writes an SSE error event.
Fix:
```ts
return stream(c, async (s) => {
  try {
    const reader = readable.getReader();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      await s.write(value);
    }
  } catch (error) {
    logger.error({ error, conversationId }, 'Error resuming conversation stream');
    await s.write(`event: error\ndata: ${JSON.stringify({ error: 'Stream error' })}\n\n`);
  }
});
```

Refs:
🟡 Minor (1) 🟡
🟡 1) stream-buffer-registry.ts:40 No EventEmitter maxListeners configuration
Issue: Each BufferEntry creates a new EventEmitter() with the default maxListeners limit of 10. If multiple clients simultaneously resume the same conversation stream (user opens multiple tabs), exceeding 10 concurrent resume attempts will trigger Node.js MaxListenersExceededWarning.
Why: This can mask real listener leaks in production logs.
Fix: Call emitter.setMaxListeners(0) to disable the warning, or set an explicit reasonable limit like emitter.setMaxListeners(50).
💭 Consider (1) 💭
💭 1) conversations.ts:382-392 Orphaned EventEmitter listeners on failed authorization
Issue: The code calls createReadable(conversationId) at line 382 before validating ownership. When authorization fails at line 391, the ReadableStream's EventEmitter listeners remain attached because cancel() is not called on the abandoned stream.
Why: Minor memory pressure for adversarial traffic. Impact is bounded by the 60-second cleanup timer.
Fix: Consider calling streamBufferRegistry.has(conversationId) before createReadable(), or explicitly cancel the readable when ownership fails.
🚫 REQUEST CHANGES
Summary: This PR implements a valuable stream resumption feature following the Vercel AI SDK pattern. However, there is a critical authorization bypass at line 390 that must be fixed before merge — the condition allows streaming to any user when the conversation is not found in the database. Additionally, the new functionality lacks test coverage per AGENTS.md requirements, and the stream piping lacks error handling.
The architecture is sound (globalThis singleton pattern, EventEmitter for live tailing, 60s cleanup), and prior feedback on race conditions and error handling in the buffer IIFE has been addressed. The remaining issues are straightforward to fix.
Discarded (4)
| Location | Issue | Reason Discarded |
|---|---|---|
| stream-buffer-registry.ts | Unbounded memory growth in chunks array | Valid concern but consistent with existing stream-registry pattern; acceptable for in-memory architecture with 60s cleanup. Would require significant redesign. |
| conversations.ts | Response pattern inconsistency (204 vs 404 for auth failures) | Intentional per PR description — Vercel AI SDK expects 204 to fall back to DB messages. Different semantic than GET conversation endpoint. |
| conversations.ts | SSE header casing inconsistency | Pre-existing split-world issue in codebase; not introduced by this PR. |
| runAuth.ts:515 | HEAD requests not included in PoW skip | HEAD requests not used by this feature or Vercel AI SDK. Very low risk. |
Reviewers (6)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| pr-review-standards | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| pr-review-security-iam | 2 | 1 | 0 | 0 | 0 | 0 | 1 |
| pr-review-sre | 6 | 1 | 1 | 0 | 0 | 0 | 4 |
| pr-review-tests | 3 | 1 | 0 | 0 | 0 | 0 | 2 |
| pr-review-errors | 3 | 1 | 0 | 0 | 0 | 0 | 2 |
| pr-review-consistency | 4 | 0 | 0 | 0 | 0 | 0 | 4 |
| Total | 18 | 4 | 1 | 0 | 0 | 0 | 13 |
Note: The authorization bypass (Critical) was independently identified by security-iam and errors reviewers with HIGH confidence convergence.
…into feat/stream-resumption
TL;DR — Adds an in-memory stream buffer that captures SSE chunks during active conversations, enabling clients to reconnect and replay the full stream via a new resume endpoint.

Key changes
Summary | 6 files | 2 commits | base:

In-memory stream buffer registry
The registry is stored on globalThis so a single instance is shared across bundle boundaries.
Stream resume endpoint
The handler checks the buffer registry first to avoid a database round-trip when no stream is active. Ownership is validated by comparing the authenticated endUserId against the conversation's userId.
PoW bypass for GET requests
Existing PoW tests were updated to use POST to reflect the narrowed scope, and a new test, *should skip PoW for GET requests (e.g. stream resume)*, covers the bypass.
PR Review Summary
(4) Total Issues | Risk: High
Re-review: This review is scoped to the delta since the last automated review and validates prior findings.
🔴❗ Critical (1) ❗🔴
Inline Comments:
- 🔴 Critical: `conversations.ts:438-439` — Authorization bypass when conversation not found in database — STILL UNRESOLVED
🕐 Pending Recommendations (3)
These issues were raised in prior reviews and remain unaddressed:
- 🟠 `stream-buffer-registry.ts` — Missing test coverage for StreamBufferRegistry class — per AGENTS.md: "ALL new work MUST include... Unit Tests"
- 🟠 `conversations.ts:450` — Resume endpoint lacks error handling around stream piping — compare to chatDataStream.ts patterns that wrap streams in try-catch
- 🟡 `stream-buffer-registry.ts:11` — Hardcoded 60s cleanup delay without observability
✅ Prior Critical Issues — ADDRESSED
The following critical issues from prior reviews have been addressed:
| Issue | Status |
|---|---|
| Module-scoped singleton not shared across bundle boundaries | ✅ Fixed — now uses globalThis pattern (lines 12, 15-21) |
| Fire-and-forget async IIFE lacks error handling | ✅ Fixed — added try/catch with error logging (chatDataStream.ts:633-634) |
| Race condition between chunk replay and event subscription | ✅ Fixed — subscribes to events BEFORE replaying buffered chunks (lines 81-82) |
| Missing cancel() handler for listener cleanup | ✅ Fixed — added cancel() handler (lines 92-96) |
| Overwriting buffer leaves existing consumers hanging | ✅ Fixed — emits 'done' to existing subscribers (lines 32-35) |
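The subscribe-before-replay fix can be sketched as follows. This is a simplified stand-in assuming one `chunks[]` plus `EventEmitter` per buffer entry, as described in this review; it is not the actual `stream-buffer-registry.ts` code:

```typescript
import { EventEmitter } from 'node:events';

// Simplified stand-in for createReadable(): subscribe to live events BEFORE
// replaying buffered chunks, closing the TOCTOU gap between replay and
// subscription.
function createReadable(chunks: string[], emitter: EventEmitter): ReadableStream<string> {
  return new ReadableStream<string>({
    start(controller) {
      const onChunk = (chunk: string) => controller.enqueue(chunk);
      const onDone = () => {
        emitter.off('chunk', onChunk);
        controller.close();
      };
      // 1) Subscribe first. The replay below runs synchronously, so no live
      //    chunk can be emitted (and lost) between these two steps.
      emitter.on('chunk', onChunk);
      emitter.once('done', onDone);
      // 2) Then replay history in order; live chunks follow afterwards.
      for (const chunk of chunks) controller.enqueue(chunk);
    },
    cancel() {
      // Client disconnected mid-resume: detach listeners to avoid leaks.
      emitter.removeAllListeners('chunk');
      emitter.removeAllListeners('done');
    },
  });
}
```

Because `start()` runs synchronously and JavaScript is single-threaded, no `'chunk'` emission can interleave between the subscription and the replay loop, which is what makes this ordering race-free.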
💭 Consider (1) 💭
💭 1) runAuth.ts:523 HEAD requests also safe but not included in PoW skip
Issue: The condition method !== 'GET' means HEAD requests still require PoW. The codebase has precedent for treating GET and HEAD together.
Why: Very low risk — HEAD isn't used by this feature. Optional enhancement for consistency.
🚫 REQUEST CHANGES
Summary: The critical authorization bypass at line 438 remains unresolved despite being flagged in 6 prior automated reviews and confirmed by the itoqa testing bot. This is a one-line fix: change conversation && to !conversation ||. The other prior critical issues (singleton pattern, race conditions, error handling) have all been properly addressed — nice work on those!
The remaining pending recommendations (test coverage, stream error handling) are also valid but less urgent than the auth bypass. Fix the authorization logic and this PR is in good shape.
Discarded (2)
| Location | Issue | Reason Discarded |
|---|---|---|
| conversations.ts:450 | Using s.pipe() instead of manual reader loop | Both patterns are valid; pipe() is more concise and the error handling concern was already raised in pending recommendations |
| stream-buffer-registry.ts | Unbounded memory growth in chunks array | Valid concern but consistent with existing stream-registry pattern; acceptable for in-memory architecture with 60s cleanup |
Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| orchestrator | 5 | 0 | 1 | 0 | 1 | 3 | 2 |
| Total | 5 | 0 | 1 | 0 | 1 | 3 | 2 |
Note: This was a focused re-review validating prior findings. The authorization bypass (Critical) was already identified by prior reviews but remains unfixed.
```ts
if (conversation && conversation.userId !== endUserId) {
  return new Response(null, { status: 204 });
}
```
🔴 CRITICAL: Authorization bypass when conversation not found in database
Issue: The condition conversation && conversation.userId !== endUserId only denies access when the conversation IS found AND belongs to a different user. When conversation is null/undefined (not found in DB), the condition evaluates to false and execution proceeds to line 450, returning the stream.
Why: This enables cross-user data leakage: if User A starts a chat (stream buffered under conversationId), and User B calls the resume endpoint with that conversationId before the conversation is persisted to the database (or after deletion), User B receives User A's stream. The itoqa test bot confirmed this vulnerability (ADV-1 test failed).
Fix: (1-click apply)
```ts
if (!conversation || conversation.userId !== endUserId) {
  return new Response(null, { status: 204 });
}
```
This matches the correct authorization pattern used at line 458 in the same file.
Refs:
- Line 458 — correct pattern
- itoqa test report confirming cross-user access
Ito Test Report ❌

14 test cases ran: 4 failed, 10 passed. Across 14 test cases, 10 passed and 4 failed, with strong coverage confirming that run-stream resumption behaves correctly in core and edge scenarios (replay plus live tailing, concurrent consumers, immediate post-completion replay, TTL-based 204 after buffer cleanup, and reconnect burst stability), while GET conversation listing without PoW, origin enforcement on resume, and conversationId input hardening all worked as intended. The most important issues were two High-severity backend defects (non-GET PoW challenge solution replay being accepted on repeated POST /run/api/chat requests, and context validation masking client header errors as 500 internal_server_error) and two Medium-severity playground continuity bugs where conversation IDs are regenerated/reset on reload or remount, causing lost resume/history in mobile reload and rapid dual-tab navigation stress flows.

❌ Failed (4)

✅ Passed (10)
| Category | Summary |
|---|---|
| Adversarial | Origin validation denied missing/disallowed origins and allowed approved origin on resume GET. |
| Adversarial | ConversationId injection payloads returned safe non-5xx responses without error disclosure. |
| Edge | Re-run confirmed concurrent resume consumers both received full ordered replay with terminal completion. |
| Edge | Immediate resume after completion returned replay data on a valid conversation stream. |
| Edge | Completion-based TTL validation matched intended cleanup behavior (resume returned empty after delay). |
| Edge | Rapid reconnect burst completed without 5xx errors and final resume/detail checks succeeded. |
| Happy-path | Resume stream replayed pre-disconnect frames and tailed to completion as expected. |
| Happy-path | Resume endpoint returned 204 for a conversation with no active stream buffer. |
| Happy-path | Conversation list GET succeeded without PoW header, matching GET auth behavior. |
| Happy-path | After TTL expiry, resume returned 204 while conversation detail still returned persisted assistant content. |
Commit: 03d9b64





















Summary
Implements the Vercel AI SDK resume pattern using an in-memory stream buffer, so users who disconnect mid-stream can return and receive the response seamlessly.
- `stream-buffer-registry`: new module that buffers encoded SSE chunks per `conversationId` in memory using a `string[]` + `EventEmitter`. On resume, replays historical chunks then forwards live ones.
- `chatDataStream`: tees the encoded SSE stream — one copy to the HTTP client, one to the buffer registry. Cleans up automatically when the stream completes.
- `GET /run/v1/conversations/:conversationId/stream`: new endpoint the Vercel AI SDK `useChat` calls on mount. Returns the buffered stream if one is active, or `204` if not (causing the SDK to fall back to loaded DB messages).

Closes PRD-6312
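A hypothetical client-side sketch of the resume contract described above: GET the stream endpoint, treat a 204 as "no active buffer, fall back to persisted messages", and otherwise consume the replayed SSE stream. The endpoint path comes from this PR; `resumeConversation` is an illustrative helper, not SDK code:

```typescript
// Hypothetical client sketch (Node 18+ / browser: fetch, TextDecoderStream).
async function resumeConversation(
  baseUrl: string,
  conversationId: string
): Promise<string | null> {
  const res = await fetch(`${baseUrl}/run/v1/conversations/${conversationId}/stream`);
  if (res.status === 204) {
    return null; // no active stream buffer: fall back to persisted DB messages
  }
  if (!res.ok || !res.body) {
    throw new Error(`resume failed: ${res.status}`);
  }
  // Drain the replayed SSE stream (buffered history first, then live chunks).
  let text = '';
  const reader = res.body.pipeThrough(new TextDecoderStream()).getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    text += value;
  }
  return text;
}
```

In the real integration `useChat` performs this call automatically on mount; the sketch only illustrates the two response shapes the endpoint exposes.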