Merge upstream: Add WebSocket disconnect recovery and slow RPC toast UX (#1730) #49

Merged
aaditagrawal merged 4 commits into main from merge/upstream-1730-ws-disconnect-recovery
Apr 5, 2026

Conversation

@aaditagrawal
Owner

@aaditagrawal aaditagrawal commented Apr 5, 2026

What

Cherry-picks upstream commit f2cd53f2 (PR #1730) onto the fork.

Upstream changes

Adds WebSocket disconnect recovery and slow RPC toast UX:

  • WebSocketConnectionSurface -- New component that shows connection status (disconnected/reconnecting/slow) with toast notifications
  • wsConnectionState -- State machine tracking WS connection lifecycle (connected/disconnected/reconnecting)
  • requestLatencyState -- Tracks RPC request latency to surface "slow" warnings
  • transportError -- Structured transport error types
  • wsTransport.ts -- Major refactor to session-based architecture with reconnect support, retry loops for streams, and graceful disconnect handling
  • orchestrationRecovery.ts -- Hooks into reconnect for orchestration replay
  • __root.tsx -- Integrates connection surface into the app shell
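
The connection lifecycle that wsConnectionState tracks can be pictured as a small reducer. This is a minimal sketch and not the upstream module: only the three status names come from the description above, while the `WsEvent` names and the `nextWsStatus` function are hypothetical.

```typescript
// Only these three status names come from the PR description.
type WsStatus = "connected" | "disconnected" | "reconnecting";

// Hypothetical event names, chosen for illustration only.
type WsEvent = "opened" | "closed" | "retryStarted";

// Hypothetical transition function: maps the current status and an event
// to the next status.
function nextWsStatus(current: WsStatus, event: WsEvent): WsStatus {
  switch (event) {
    case "opened":
      return "connected";
    case "closed":
      return "disconnected";
    case "retryStarted":
      // A retry only applies when the socket is not already open.
      return current === "connected" ? "connected" : "reconnecting";
  }
}
```

Keeping the transitions in one pure function would make each path easy to unit test without a real socket.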

Conflict resolution

  • wsTransport.ts (7 conflict regions) -- Adopted upstream's session-based architecture replacing the old per-instance runtime pattern. Fork had no unique changes to these code paths.
  • wsTransport.test.ts (2 conflict regions) -- Merged upstream's new test utilities and cleanup hooks.

Additional changes

Verification

  • bun typecheck passes (all 7 packages)
  • bun fmt and bun lint clean
  • All new tests pass: wsTransport (12), wsConnectionState (5), requestLatencyState (4), transportError (3), WebSocketConnectionSurface (3) -- 27 tests total

Summary by CodeRabbit

  • New Features

    • Full-screen connection overlay with retry countdown, contextual icon, manual Retry, and expandable connection details
    • Background coordinators for smarter reconnects (online/focus) and toast lifecycle
    • Slow-RPC warning toasts when server acknowledgements are delayed
  • Bug Fixes

    • Transport-related connection errors are sanitized so transient network errors no longer surface as thread error details or copyable messages
  • Quality

    • Expanded test coverage for connection, reconnection, and latency scenarios
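
The sanitization described under Bug Fixes might look roughly like the sketch below. The function names are the ones the review later in this thread refers to; the bodies and the regex patterns are assumptions, since the real transportError.ts is not shown on this page.

```typescript
// Hypothetical patterns; the actual regexes in transportError.ts are not shown here.
const TRANSPORT_ERROR_PATTERNS: readonly RegExp[] = [
  /socketcloseerror/i,
  /socketopenerror/i,
];

// Returns true when the message looks like a transient transport failure.
function isTransportConnectionErrorMessage(
  message: string | null | undefined,
): boolean {
  if (message == null) return false;
  const trimmed = message.trim();
  if (trimmed.length === 0) return false;
  return TRANSPORT_ERROR_PATTERNS.some((pattern) => pattern.test(trimmed));
}

// Assumed behaviour: transport errors are dropped (null) so they never reach
// the thread error details; every other message passes through unchanged.
function sanitizeThreadErrorMessage(
  message: string | null | undefined,
): string | null {
  if (message == null) return null;
  return isTransportConnectionErrorMessage(message) ? null : message;
}
```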

juliusmarminge and others added 2 commits April 5, 2026 10:06
Co-authored-by: Hwanseo Choi <hwanseoc@nvidia.com>
Co-authored-by: codex <codex@users.noreply.github.com>
This commit depends on clientTracing from upstream pingdotgg#1739 (OTLP trace proxy),
which is in a separate PR (#48). This stub provides the required exports
so this PR compiles independently. The real implementation from PR #48
will replace this stub when merged.
@coderabbitai

coderabbitai bot commented Apr 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9d656d71-abda-49fb-b4e3-e04cd38d09d7

📥 Commits

Reviewing files that changed from the base of the PR and between 3da2cae and 7594027.

📒 Files selected for processing (5)
  • apps/web/src/components/ChatView.browser.tsx
  • apps/web/src/components/KeybindingsToast.browser.tsx
  • apps/web/src/components/settings/SettingsPanels.browser.tsx
  • apps/web/src/nativeApi.ts
  • apps/web/src/wsNativeApi.ts

📝 Walkthrough

Walkthrough

Implements comprehensive WebSocket/reconnect infrastructure: connection state atom, transport session management with reconnect/resubscribe and reconnection UI, RPC request-latency tracking, transport-error classification/sanitization, wiring for resubscribe recovery, and test + API signature updates to surface resubscribe hooks.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| WebSocket Connection State | apps/web/src/rpc/wsConnectionState.ts, apps/web/src/rpc/wsConnectionState.test.ts | Adds typed connection status atom, UI/reconnect phases, backoff constants, APIs to record attempts/open/error/close, browser-online handling, exhaustion helper, and hook for UI consumption. |
| Transport / Protocol Instrumentation | apps/web/src/rpc/protocol.ts, apps/web/src/rpc/transportError.ts, apps/web/src/rpc/transportError.test.ts | Wraps socket constructor with connection-attempt instrumentation; tracks/acknowledges RPC requests and clears tracked requests on protocol errors; adds transport-error detection and sanitization utilities with tests. |
| Session-Based Transport & Client | apps/web/src/wsTransport.ts, apps/web/src/wsTransport.test.ts, apps/web/src/wsRpcClient.ts | Refactors to session-based transport with ManagedRuntime per session, subscription loop with restart/resubscribe and optional onResubscribe hook, adds reconnect() and dispose() semantics, and forwards subscription options through client APIs; extensive test updates. |
| RPC Request Latency Tracking | apps/web/src/rpc/requestLatencyState.ts, apps/web/src/rpc/requestLatencyState.test.ts | New module to track pending RPC requests, promote slow requests after 2.5s, cap tracked entries (256), provide ack/clear/reset APIs and a hook for UI. |
| Connection UI & Coordinators | apps/web/src/components/WebSocketConnectionSurface.tsx, apps/web/src/components/WebSocketConnectionSurface.logic.test.ts | Adds UI surface and coordinators: blocking overlay when config missing, reconnect/toast coordinator that listens to online/focus and triggers forced reconnects, and slow-ack toast coordinator. Exports shouldAutoReconnect. |
| Root Layout & Recovery Wiring | apps/web/src/routes/__root.tsx, apps/web/src/orchestrationRecovery.ts | Wraps root with new coordinators and surface; replaces sequence-gap helper with runReplayRecovery(reason) and adds "resubscribe" recovery reason and resubscribe-triggered replay paths. |
| Chat/Error Sanitization & Store Mapping | apps/web/src/components/ChatView.tsx, apps/web/src/store.ts, apps/web/src/components/ui/toast.tsx | Routes thread error messages through sanitizeThreadErrorMessage before persistence/caching; adds hideCopyButton?: boolean to ThreadToastData and conditionally renders copy button; updates wrapping classes. |
| Native API / IPC / Tests | apps/web/src/wsNativeApi.ts, apps/web/src/wsNativeApi.test.ts, apps/web/src/nativeApi.ts, packages/contracts/src/ipc.ts | Makes test API resets async and include request-latency/connection-state resets; updates createWsNativeApi().orchestration.onDomainEvent and IPC contract to accept optional { onResubscribe?: () => void } and forwards options through transport; updates various tests to await resets. |
| Misc Tests / Harness Adjustments | apps/web/src/components/*.browser.tsx, apps/web/src/wsTransport.test.ts | Updates test setup/teardown to await async native API resets, adds MockWebSocket error emitter, and extends integration tests for reconnect/resubscribe and slow-ack tracking. |
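
To make the RPC Request Latency Tracking row concrete, here is a rough sketch of a tracker with the two numbers quoted above (2.5s threshold, 256-entry cap). The class name, method names, and the evict-oldest policy are assumptions for illustration; the real requestLatencyState.ts may be structured quite differently. The clock is passed in explicitly so the behaviour can be checked without timers.

```typescript
const SLOW_THRESHOLD_MS = 2_500; // "promote slow requests after 2.5s"
const MAX_TRACKED = 256; // "cap tracked entries (256)"

interface PendingRequest {
  readonly tag: string;
  readonly sentAt: number;
}

// Hypothetical tracker; names and eviction policy are illustrative assumptions.
class RequestLatencyTracker {
  private readonly pending = new Map<string, PendingRequest>();

  track(id: string, tag: string, now: number): void {
    // At capacity, evict the oldest entry (Map preserves insertion order).
    if (this.pending.size >= MAX_TRACKED && !this.pending.has(id)) {
      const oldest = this.pending.keys().next().value;
      if (oldest !== undefined) this.pending.delete(oldest);
    }
    this.pending.set(id, { tag, sentAt: now });
  }

  // An acknowledged request is no longer pending, so it can never become slow.
  acknowledge(id: string): void {
    this.pending.delete(id);
  }

  // Ids of requests that have waited at least the threshold without an ack.
  slowRequests(now: number): string[] {
    return Array.from(this.pending.entries())
      .filter(([, request]) => now - request.sentAt >= SLOW_THRESHOLD_MS)
      .map(([id]) => id);
  }

  get size(): number {
    return this.pending.size;
  }
}
```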

Sequence Diagram(s)

sequenceDiagram
    actor Browser
    participant WsTransport as WsTransport
    participant WsConnectionState as WsConnectionState
    participant RpcProtocol as RpcProtocol
    participant SlowRpcTracker as SlowRpcTracker
    participant ToastCoordinator as ToastCoordinator

    Browser->>WsTransport: start connection
    WsTransport->>WsConnectionState: recordWsConnectionAttempt()
    WsTransport->>RpcProtocol: socket opens

    alt open succeeded
        RpcProtocol->>WsConnectionState: recordWsConnectionOpened()
        WsConnectionState->>ToastCoordinator: update (connected)
    else open failed / error
        RpcProtocol->>WsConnectionState: recordWsConnectionErrored(msg)
        WsConnectionState->>ToastCoordinator: update (error/retrying)
        ToastCoordinator->>Browser: show retry/offline toast
    end

    Browser->>RpcProtocol: send RPC request(id, tag)
    RpcProtocol->>SlowRpcTracker: trackRpcRequestSent(id, tag)
    SlowRpcTracker->>SlowRpcTracker: schedule 2.5s threshold

    alt ack before threshold
        RpcProtocol->>SlowRpcTracker: acknowledgeRpcRequest(id)
        SlowRpcTracker->>ToastCoordinator: clear slow toast if needed
    else no ack
        SlowRpcTracker->>SlowRpcTracker: mark as slow
        SlowRpcTracker->>ToastCoordinator: show slow-request toast
    end

    alt disconnect occurs
        RpcProtocol->>WsConnectionState: recordWsConnectionClosed(code)
        WsConnectionState->>WsTransport: compute nextRetryAt / waiting
        WsTransport->>WsTransport: reconnect() after delay
        WsTransport->>RpcProtocol: new socket open (resubscribe)
        RpcProtocol->>WsConnectionState: recordWsConnectionOpened()
    end
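
The "compute nextRetryAt / waiting" step in the diagram is commonly implemented as capped exponential backoff. The sketch below assumes that approach; every constant and function name here is hypothetical, since the summary only says that backoff constants and an exhaustion helper exist without giving their values.

```typescript
// All three constants are hypothetical placeholders, not upstream values.
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;
const MAX_ATTEMPTS = 8;

// Delay before the given 1-based attempt: 1s, 2s, 4s, ... capped at MAX_DELAY_MS.
function retryDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS);
}

// Timestamp of the next retry, or null once the retry budget is exhausted.
function nextRetryAt(attempt: number, now: number): number | null {
  if (attempt > MAX_ATTEMPTS) return null;
  return now + retryDelayMs(attempt);
}
```

A UI countdown could then be derived from `nextRetryAt - Date.now()`, and a `null` result would be the cue to stop retrying automatically and offer a manual Retry instead.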

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

Poem

🐰 I hopped where sockets blink and hum,

Reconnects bounce back, the queues go “thrum,”
Slow acks I note, then sound a drum,
Resubscribe seeds make systems come—
Tiny whiskers, big nets, hop, hop, woo-hoo!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: merging upstream commits that add WebSocket disconnect recovery and slow RPC toast UX improvements. |
| Description check | ✅ Passed | The description includes all required sections: What Changed (detailed upstream changes), Why (provides context), and additional notes on conflict resolution and verification. The PR explains both the scope and specific implementations clearly. |


@github-actions github-actions bot added the vouch:trusted (PR author is trusted by repo permissions or the VOUCHED list) and size:XXL (1,000+ effective changed lines; test files excluded in mixed PRs) labels on Apr 5, 2026
Replace clientTracing stub with real implementation from #48 and
reconcile wsTransport changes from both OTLP tracing (#48) and
WS disconnect recovery (#49).

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
apps/web/src/rpc/transportError.test.ts (1)

5-24: Consider adding edge case tests for robustness.

The current tests cover the main scenarios well. For completeness, consider adding tests for edge cases that the implementation handles:

  • null / undefined inputs to both functions
  • Empty string or whitespace-only inputs
  • Case variations (the regex patterns use /i flag)
💡 Optional: Additional edge case tests
+  it("handles null/undefined inputs", () => {
+    expect(isTransportConnectionErrorMessage(null)).toBe(false);
+    expect(isTransportConnectionErrorMessage(undefined)).toBe(false);
+    expect(sanitizeThreadErrorMessage(null)).toBeNull();
+    expect(sanitizeThreadErrorMessage(undefined)).toBeNull();
+  });
+
+  it("handles empty/whitespace strings", () => {
+    expect(isTransportConnectionErrorMessage("")).toBe(false);
+    expect(isTransportConnectionErrorMessage("   ")).toBe(false);
+    expect(sanitizeThreadErrorMessage("")).toBe("");
+  });
+
+  it("matches case-insensitively", () => {
+    expect(isTransportConnectionErrorMessage("socketcloseerror: 1006")).toBe(true);
+    expect(isTransportConnectionErrorMessage("SOCKETOPENERROR: TIMEOUT")).toBe(true);
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/src/rpc/transportError.test.ts` around lines 5 - 24, Add edge-case
unit tests for isTransportConnectionErrorMessage and sanitizeThreadErrorMessage:
include assertions for null and undefined inputs, empty string and
whitespace-only strings, and case-variation inputs (e.g., mixed/capitalized
versions of known transport messages) to verify the functions handle these
inputs consistently; locate tests near the existing describe("transportError")
block and add cases that expect boolean results for
isTransportConnectionErrorMessage and expect sanitized string, original string,
or null for sanitizeThreadErrorMessage as appropriate.
apps/web/src/wsRpcClient.ts (1)

20-22: Reuse the contract's subscription-options type here.

The same onResubscribe shape now lives in two places: packages/contracts/src/ipc.ts (lines 197-202) and this local interface. Deriving it from NativeApi keeps future option changes to one edit instead of two and avoids drift between the public contract and the transport-facing client.

Suggested refactor
-interface StreamSubscriptionOptions {
-  readonly onResubscribe?: () => void;
-}
+type StreamSubscriptionOptions = NonNullable<
+  Parameters<NativeApi["orchestration"]["onDomainEvent"]>[1]
+>;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/src/wsRpcClient.ts` around lines 20 - 22, Replace the local
StreamSubscriptionOptions declaration with the subscription options type from
the public contract so the shape is single-sourced: import the contract type
from NativeApi (or the exact subscription/options export inside it) and use that
type (e.g., type StreamSubscriptionOptions = /* contract's subscription/options
type */) instead of redefining onResubscribe locally; update any call sites that
referenced the local interface to use the imported/aliased type (refer to
StreamSubscriptionOptions and NativeApi to locate the code to change).
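
For readers unfamiliar with the `Parameters`/`NonNullable` combination in the suggested refactor, the following self-contained sketch shows how it resolves. The `NativeApi` interface here is a trimmed-down, hypothetical stand-in for the real contract in packages/contracts/src/ipc.ts, and `subscribe` is an illustrative helper, not a function from this PR.

```typescript
// Hypothetical stand-in for the real NativeApi contract.
interface NativeApi {
  orchestration: {
    onDomainEvent(
      callback: (event: unknown) => void,
      options?: { readonly onResubscribe?: () => void },
    ): () => void;
  };
}

// Parameters<...>[1] is the optional second argument, typed as
// `{ onResubscribe?: ... } | undefined`; NonNullable strips the `undefined`.
type StreamSubscriptionOptions = NonNullable<
  Parameters<NativeApi["orchestration"]["onDomainEvent"]>[1]
>;

// Illustrative consumer: invokes the hook if present and reports whether one was given.
function subscribe(options?: StreamSubscriptionOptions): boolean {
  options?.onResubscribe?.();
  return options?.onResubscribe !== undefined;
}
```

With this shape, adding a field to the contract's options object would flow through to the client type without a second edit.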
apps/web/src/wsTransport.test.ts (1)

676-692: Consider awaiting the dispose call for deterministic cleanup.

The test calls void WsTransport.prototype.dispose.call(transport) without awaiting. While this tests the synchronous disposal flag behavior, the comment at line 692 indicates intentional non-awaiting. This is fine for testing the immediate flag set, but be aware that callOrder assertions at line 704 depend on waitFor polling rather than deterministic sequencing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/src/wsTransport.test.ts` around lines 676 - 692, The test invokes
WsTransport.prototype.dispose via "void
WsTransport.prototype.dispose.call(transport)" without awaiting, causing
non-deterministic async cleanup; change the test to await the disposal (e.g.
make the test async and replace the void call with "await
WsTransport.prototype.dispose.call(transport)") so the runtime.dispose and
runPromise work complete deterministically before asserting callOrder (and then
you can simplify/remove the waitFor polling if desired). Ensure you update the
test function signature to async and reference WsTransport.prototype.dispose,
transport.session.runtime.dispose, and transport.session.runtime.runPromise
where relevant.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/web/src/wsNativeApi.ts`:
- Around line 11-16: The test reset helper __resetWsNativeApiForTests currently
calls __resetWsRpcClientForTests() without awaiting it; since
__resetWsRpcClientForTests is async and only clears the shared client after
disposal completes, change __resetWsNativeApiForTests to await
__resetWsRpcClientForTests() so disposal finishes before proceeding to
resetRequestLatencyStateForTests(), resetServerStateForTests(), and
resetWsConnectionStateForTests(); update the function signature if necessary to
be async to accommodate the await.
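
The ordering concern in this inline comment can be illustrated with a small sketch. The function names below are simplified, hypothetical stand-ins for the reset helpers named above; the point is only that the `await` forces client disposal to finish before the later resets run.

```typescript
// Records the order in which the stand-in helpers actually complete.
const order: string[] = [];

// Hypothetical stand-in for the async client reset.
async function resetRpcClient(): Promise<void> {
  await Promise.resolve(); // simulates asynchronous disposal
  order.push("client disposed");
}

// Hypothetical stand-in for the synchronous state resets.
function resetLatencyState(): void {
  order.push("latency reset");
}

// Awaiting the client reset guarantees disposal completes first.
// Without the await, "latency reset" would be recorded before "client disposed".
async function resetNativeApi(): Promise<void> {
  await resetRpcClient();
  resetLatencyState();
}
```

Callers then need to await `resetNativeApi()` as well, which is why the change has to propagate into test setup and teardown hooks.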


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 452376d7-8742-4b6c-b53c-1be6b7710cbf

📥 Commits

Reviewing files that changed from the base of the PR and between fd42c42 and 3da2cae.

📒 Files selected for processing (20)
  • apps/web/src/components/ChatView.tsx
  • apps/web/src/components/WebSocketConnectionSurface.logic.test.ts
  • apps/web/src/components/WebSocketConnectionSurface.tsx
  • apps/web/src/components/ui/toast.tsx
  • apps/web/src/orchestrationRecovery.ts
  • apps/web/src/routes/__root.tsx
  • apps/web/src/rpc/protocol.ts
  • apps/web/src/rpc/requestLatencyState.test.ts
  • apps/web/src/rpc/requestLatencyState.ts
  • apps/web/src/rpc/transportError.test.ts
  • apps/web/src/rpc/transportError.ts
  • apps/web/src/rpc/wsConnectionState.test.ts
  • apps/web/src/rpc/wsConnectionState.ts
  • apps/web/src/store.ts
  • apps/web/src/wsNativeApi.test.ts
  • apps/web/src/wsNativeApi.ts
  • apps/web/src/wsRpcClient.ts
  • apps/web/src/wsTransport.test.ts
  • apps/web/src/wsTransport.ts
  • packages/contracts/src/ipc.ts

__resetWsRpcClientForTests() is async but was called without await,
risking stale client leaking into subsequent tests. Propagate async
through the call chain: wsNativeApi, nativeApi, and all browser test
beforeEach/afterEach hooks.
@aaditagrawal aaditagrawal merged commit 0196390 into main Apr 5, 2026
8 checks passed