fix: Claude causeSquash crash, Codex accept enum, bindings persistence#17
fix: Claude causeSquash crash, Codex accept enum, bindings persistence#17ranvier2d2 wants to merge 10 commits intomainfrom
Conversation
…thread visibility P2: Add persist_binding to cursor_session.ex and opencode_session.ex so resume works for all harness providers, not just Codex. Fix normalize_resume_cursor to re-encode Cursor/OpenCode cursors as JSON strings (their session modules call Jason.decode on the resumeCursor). P0: Add concurrency option to DrainableWorker (default 1, set to 8 in ProviderCommandReactor). Independent thread operations now run in parallel — 12 concurrent session starts no longer timeout sequentially. P1: Stress test reuses bootstrapProjectId from welcome payload instead of creating a separate project, so threads persist in the main sidebar. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…inistic wait - Reject all pending promises on schema error (id="unknown"), not just oldest - Drop ephemeral port from OpenCode persist_binding (stale after restart) - Replace sleep(5000) with deterministic wait polling sessionStopped tracker Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
processSessionStopRequested calls providerService.stopSession() which for Claude interrupts the stream fiber via Fiber.interrupt(). This interruption propagated through processDomainEventSafely (which re-throws interrupt-only causes) and killed the DrainableWorker and the entire Node server. Fix: wrap stopSession in Effect.uninterruptible + Effect.ignore so the stop completes without propagating the fiber interrupt upward. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClaudeAdapter.stopSessionInternal used yield* Fiber.interrupt(streamFiber) which joins the dying fiber and propagates the interruption Exit up through the fiber tree, crashing the entire Node server. Changed to forkChild so the interrupt fires without joining. Also added --exclude <provider> flag to stress test script for isolating provider-specific issues during debugging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex CLI sends exec_approval_request even with approvalPolicy: "never" on resumed sessions (especially when the model changes, e.g. gpt-5.3-codex → gpt-5.4). The harness now auto-approves all non-user-input RPC requests in full-access runtime mode, matching OpenCode's auto-approve pattern. This was the root cause of the Codex resume test failure — the turn hung waiting for an approval response that never came because the event was unmapped in the Node layer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex CLI's CommandExecutionRequestApprovalResponse serde deserializer expects the variant "approve" (without trailing 'd'). Our auto-approve was sending "approved" which caused: "failed to deserialize: unknown variant 'approved'" — the approval was rejected by the sandbox. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Codex app-server protocol defines CommandExecutionRequestApprovalResponse as a Rust serde enum with variants: accept, acceptForSession, decline, cancel. Neither "approve" nor "approved" are valid — the deserializer rejects both. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
oxfmt --check flagged docs/index.html, docs/pitch.html, and scripts/stress-test-resume-multi.ts. Also added a comment explaining why the auto-approve branch emits request/resolved without a prior request.opened event — intentional since auto-approved requests bypass state.pending entirely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ClaudeAdapter: add guardedAsyncIterable wrapper that short-circuits
the Claude SDK async iterator to {done: true} when stopped, preventing
query.close() errors from reaching causeSquash. Remove explicit
Fiber.interrupt (redundant and harmful).
- ProviderService: add catchCause + hasInterruptsOnly guard on stream
consumer fork as defense-in-depth.
- opencode_session.ex: add H4 timing instrumentation (sse_connected_at,
first_permission_asked_at, [H4-TIMING] log markers).
- scripts/stress-test-hypotheses.ts: H1-H4 hypothesis test harness.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rash
Queue.shutdown produces an exitInterrupt cause, but Stream.toAsyncIterable
only handles Done causes gracefully (Pull.isDoneCause check at Stream.js:6873).
Interrupt causes fall through to `throw Cause.squash(exit.cause)` — a raw JS
throw that bypasses all Effect error handlers and crashes the process.
Queue.end produces an exitFailDone cause which toAsyncIterable recognizes and
converts to {done: true}, ending the stream cleanly.
Changes:
- promptQueue: Queue<T> → Queue<T, Cause.Done>, shutdown → end
- runtimeEventQueue: same treatment
- 4/4 providers now pass resume E2E (including Claude)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughThe PR introduces auto-approval logic for Codex full-access mode, adds resume binding persistence for Cursor and OpenCode sessions, implements SSE connection timing tracking with permission-timing events, normalizes resume cursors as JSON strings for downstream systems, enables concurrent domain-event processing, improves queue shutdown semantics in the Claude adapter, and adds comprehensive stress-test scripts validating multiple hypothesis flows. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (5)
docs/pitch.html (2)
756-1167: Consider adding accessibility attributes to the SVG diagram.The inline SVG diagram lacks accessibility metadata. For users relying on screen readers, adding a
<title>androle="img"oraria-hidden="true"(if purely decorative) would improve accessibility.♿ Suggested accessibility improvement
- <svg class="svg-boundary" viewBox="0 0 680 490"> + <svg class="svg-boundary" viewBox="0 0 680 490" role="img" aria-labelledby="arch-diagram-title"> + <title id="arch-diagram-title">Architecture diagram showing Node server and OTP engine layers</title> <defs>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/pitch.html` around lines 756 - 1167, The SVG diagram lacks accessibility metadata; add a concise <title> describing the diagram and set appropriate accessibility attributes on the <svg> element (e.g., role="img" and aria-labelledby referencing the <title> id, or aria-hidden="true" if it's purely decorative). Locate the <svg class="svg-boundary" ...> element and insert a <title id="..."> with a short description, and ensure the svg has either role="img" and aria-labelledby="..." or aria-hidden="true" depending on whether the graphic conveys content.
1239-1263: Add explicittype="button"to interactive buttons.Without
type="button", buttons default totype="submit"which could cause unexpected form submission behavior if the markup is ever placed inside a form.🔧 Suggested fix
- <button class="mpill active-node" id="btn-node" onclick="setMode('node')"> + <button type="button" class="mpill active-node" id="btn-node" onclick="setMode('node')"> Node — shared V8 heap </button> - <button class="mpill" id="btn-otp" onclick="setMode('otp')"> + <button type="button" class="mpill" id="btn-otp" onclick="setMode('otp')"> OTP — per-process heaps </button>- <button class="crash-btn" id="crash-btn" onclick="crashSession()">Crash session 3</button> - <button class="reset-btn" onclick="resetAll()">Reset</button> + <button type="button" class="crash-btn" id="crash-btn" onclick="crashSession()">Crash session 3</button> + <button type="button" class="reset-btn" onclick="resetAll()">Reset</button>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/pitch.html` around lines 1239 - 1263, Buttons in this snippet (e.g., elements with ids "btn-node", "btn-otp", "crash-btn" and the reset button that calls resetAll()) lack an explicit type and will default to type="submit"; update each <button> element to include type="button" to prevent accidental form submission—specifically add type="button" to the buttons that call setMode('node')/setMode('otp'), crashSession(), and resetAll().scripts/stress-test-resume-multi.ts (1)
200-202: Push listeners accumulate without cleanup.
onPushappends listeners to an array that's never cleared. In this bounded test script, this is fine since the client connects once and disconnects at the end. However, if theT3Clientwere reused across tests, listeners would accumulate.Optional: Return unsubscribe function
onPush(listener: (event: PushEvent) => void) { this.pushListeners.push(listener); + return () => { + const idx = this.pushListeners.indexOf(listener); + if (idx >= 0) this.pushListeners.splice(idx, 1); + }; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/stress-test-resume-multi.ts` around lines 200 - 202, The onPush method in T3Client currently appends listeners to pushListeners without a way to remove them; change onPush to return an unsubscribe function (or add a corresponding offPush method) that removes the specific listener from this.pushListeners so listeners can be deregistered when a client is reused; update the T3Client method named onPush and any call sites to use the returned unsubscribe or call offPush to avoid accumulation.scripts/stress-test-hypotheses.ts (2)
92-194: Consider extracting shared T3Client to a common module.The
T3Clientclass is nearly identical betweenstress-test-resume-multi.tsandstress-test-hypotheses.ts. Both scripts would benefit from a shared module to avoid drift and reduce maintenance burden.Suggested extraction
Create
scripts/lib/t3-client.ts:// scripts/lib/t3-client.ts import { WebSocket } from "ws"; import crypto from "node:crypto"; export interface PendingRequest { ... } export interface PushEvent { ... } export class T3Client { ... }Then import in both scripts:
import { T3Client } from "./lib/t3-client.ts";🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/stress-test-hypotheses.ts` around lines 92 - 194, The T3Client class is duplicated across stress-test-resume-multi.ts and stress-test-hypotheses.ts; extract it to a shared module (e.g., scripts/lib/t3-client.ts) exporting the PendingRequest and PushEvent interfaces and the T3Client class, preserving implementations of connect, send, dispatch, onPush, and disconnect, then replace the local class declarations in both scripts with an import { T3Client, PendingRequest, PushEvent } from "./lib/t3-client"; to avoid drift and centralize maintenance.
243-275: Push listeners accumulate across hypothesis tests.Each hypothesis test (H1, H2, H3, H4) calls
client.onPush()to register a listener, but these listeners are never removed. When runningallhypotheses, the client accumulates four push handlers that all receive every event.While the threadId filtering prevents cross-contamination between tests, this wastes cycles processing events for already-completed tests. For this bounded test script it's acceptable, but consider clearing listeners between tests if this pattern is reused.
Also applies to: 419-442, 582-601, 748-779
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/stress-test-hypotheses.ts` around lines 243 - 275, The push listeners registered with client.onPush(...) accumulate across hypothesis runs; capture the listener callback you pass to client.onPush (or the unsubscribe function returned by client.onPush, if it returns one) and unregister it at the end of each hypothesis run so handlers are removed between tests—e.g., store the listener function passed to client.onPush or the returned disposer and call client.offPush(listener) or disposer() respectively when the test completes (apply the same change for the other occurrences around the symbols shown at the top and the other blocks mentioned).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/harness/lib/harness/providers/opencode_session.ex`:
- Line 160: persist_binding/1 currently may raise via
Harness.Storage.upsert_binding/3 (which can fail inside Sqlite3.step/2) after
session/ready is emitted; wrap the call to persist_binding(state) to catch
exceptions and handle error returns: change the caller to pattern-match on an
{:ok, ...} / {:error, reason} or rescue exceptions, log the failure with context
(including the exception/reason and session id), and avoid leaving the session
marked ready on persistence failure (either mark not ready, retry, or emit an
error state); update the persist_binding/1 helper to return {:ok, _} on success
and {:error, reason} on failure (catch/rescue Sqlite3 errors) so callers can
handle failures deterministically.
---
Nitpick comments:
In `@docs/pitch.html`:
- Around line 756-1167: The SVG diagram lacks accessibility metadata; add a
concise <title> describing the diagram and set appropriate accessibility
attributes on the <svg> element (e.g., role="img" and aria-labelledby
referencing the <title> id, or aria-hidden="true" if it's purely decorative).
Locate the <svg class="svg-boundary" ...> element and insert a <title id="...">
with a short description, and ensure the svg has either role="img" and
aria-labelledby="..." or aria-hidden="true" depending on whether the graphic
conveys content.
- Around line 1239-1263: Buttons in this snippet (e.g., elements with ids
"btn-node", "btn-otp", "crash-btn" and the reset button that calls resetAll())
lack an explicit type and will default to type="submit"; update each <button>
element to include type="button" to prevent accidental form
submission—specifically add type="button" to the buttons that call
setMode('node')/setMode('otp'), crashSession(), and resetAll().
In `@scripts/stress-test-hypotheses.ts`:
- Around line 92-194: The T3Client class is duplicated across
stress-test-resume-multi.ts and stress-test-hypotheses.ts; extract it to a
shared module (e.g., scripts/lib/t3-client.ts) exporting the PendingRequest and
PushEvent interfaces and the T3Client class, preserving implementations of
connect, send, dispatch, onPush, and disconnect, then replace the local class
declarations in both scripts with an import { T3Client, PendingRequest,
PushEvent } from "./lib/t3-client"; to avoid drift and centralize maintenance.
- Around line 243-275: The push listeners registered with client.onPush(...)
accumulate across hypothesis runs; capture the listener callback you pass to
client.onPush (or the unsubscribe function returned by client.onPush, if it
returns one) and unregister it at the end of each hypothesis run so handlers are
removed between tests—e.g., store the listener function passed to client.onPush
or the returned disposer and call client.offPush(listener) or disposer()
respectively when the test completes (apply the same change for the other
occurrences around the symbols shown at the top and the other blocks mentioned).
In `@scripts/stress-test-resume-multi.ts`:
- Around line 200-202: The onPush method in T3Client currently appends listeners
to pushListeners without a way to remove them; change onPush to return an
unsubscribe function (or add a corresponding offPush method) that removes the
specific listener from this.pushListeners so listeners can be deregistered when
a client is reused; update the T3Client method named onPush and any call sites
to use the returned unsubscribe or call offPush to avoid accumulation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 49dfa836-9976-4fd9-bee4-b4880a6712b0
📒 Files selected for processing (14)
apps/harness/lib/harness/providers/codex_session.exapps/harness/lib/harness/providers/cursor_session.exapps/harness/lib/harness/providers/opencode_session.exapps/harness/lib/harness/session_manager.exapps/harness/test/harness/storage_test.exsapps/server/src/orchestration/Layers/ProviderCommandReactor.tsapps/server/src/provider/Layers/ClaudeAdapter.tsapps/server/src/provider/Layers/ProviderService.tsdocs/index.htmldocs/pitch.htmlpackages/shared/src/DrainableWorker.test.tspackages/shared/src/DrainableWorker.tsscripts/stress-test-hypotheses.tsscripts/stress-test-resume-multi.ts
| }) | ||
|
|
||
| emit_event(state, :session, "session/ready", %{}) | ||
| persist_binding(state) |
There was a problem hiding this comment.
Consider handling persist_binding/1 failure.
Harness.Storage.upsert_binding/3 returns :ok on success, but if the underlying Sqlite3.step/2 raises an exception, the session will appear ready (since session/ready was already emitted) while the binding wasn't persisted. This causes silent resume failures later.
Consider logging or handling the result:
🛡️ Suggested fix
emit_event(state, :session, "session/ready", %{})
- persist_binding(state)
+ case persist_binding(state) do
+ :ok -> :ok
+ {:error, reason} ->
+ Logger.warning("Failed to persist OpenCode binding for thread #{state.thread_id}: #{inspect(reason)}")
+ endAnd update the helper to catch exceptions:
defp persist_binding(state) do
cursor_json =
Jason.encode!(%{
"threadId" => state.thread_id,
"sessionId" => state.opencode_session_id
})
- Harness.Storage.upsert_binding(state.thread_id, state.provider, cursor_json)
+ try do
+ Harness.Storage.upsert_binding(state.thread_id, state.provider, cursor_json)
+ rescue
+ e -> {:error, Exception.message(e)}
+ end
end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/harness/lib/harness/providers/opencode_session.ex` at line 160,
persist_binding/1 currently may raise via Harness.Storage.upsert_binding/3
(which can fail inside Sqlite3.step/2) after session/ready is emitted; wrap the
call to persist_binding(state) to catch exceptions and handle error returns:
change the caller to pattern-match on an {:ok, ...} / {:error, reason} or rescue
exceptions, log the failure with context (including the exception/reason and
session id), and avoid leaving the session marked ready on persistence failure
(either mark not ready, retry, or emit an error state); update the
persist_binding/1 helper to return {:ok, _} on success and {:error, reason} on
failure (catch/rescue Sqlite3 errors) so callers can handle failures
deterministically.
Summary
"approve"→"accept"per official Codex app-server protocol (Rust serde rejects both"approve"and"approved")Queue.shutdown→Queue.endonpromptQueueandruntimeEventQueue. Root cause:Queue.shutdownproducesexitInterruptwhichStream.toAsyncIterabledoes NOT handle — falls through tothrow Cause.squash(), a raw JS throw bypassing all Effect error handlers.Queue.endproduces a Done cause thattoAsyncIterablerecognizes and converts to{done: true}gracefullyguardedAsyncIterablewraps Claude SDK async iterator to absorb errors when stopped.catchCause+hasInterruptsOnlyguard on ProviderService stream consumer[H4-TIMING]log markers +h4/permission-timingevent inopencode_session.exfor SSE→permission gap measurementscripts/stress-test-hypotheses.ts— H1-H4 scriptable tests via WebSocketTest plan
tsc --noEmitpassesNote
Recommend squash merge — branch has 10 commits including iterative fix exploration (forkChild → forkDetach → remove interrupt → guardedAsyncIterable → Queue.end).
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Performance
Tests
Documentation