feat(bft): keepalive + typed payloads + pending-map race fix #15

Merged: github-actions[bot] merged 1 commit into main from feat/bft-keepalive-typed-payloads on May 10, 2026

@satyakwok (Member)

Summary

WebSocket robustness pass on `SubscriptionManager`. Three concrete bugs + one ergonomic add + a small naming drift fix.

Bug 1: pending-map race on socket error / close

Pre-fix: only the first pending subscribe saw the rejection; every other in-flight subscribe hung until its 10 s timeout fired one-by-one. Now error + close handlers iterate the pending map and reject every entry with the same surfaced error so callers see failure immediately.
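
A minimal sketch of the reject-all behaviour, assuming the pending map is keyed by request id and stores each caller's `reject` callback together with its timeout handle. `PendingEntry` and `rejectAllPending` are illustrative names and may not match the ones used in `SubscriptionManager`.

```typescript
// Hypothetical shape of one in-flight subscribe request.
interface PendingEntry {
  resolve: (subscriptionId: string) => void;
  reject: (err: Error) => void;
  timer: ReturnType<typeof setTimeout>; // the per-request 10 s timeout
}

// Reject every pending subscribe with the same error and empty the map.
// Meant to be called from both the socket "error" and "close" handlers.
function rejectAllPending(pending: Map<number, PendingEntry>, err: Error): number {
  const entries = Array.from(pending.values());
  // Clear first, so a reject callback that immediately re-subscribes
  // does not have its new entry wiped.
  pending.clear();
  for (const entry of entries) {
    clearTimeout(entry.timer); // stop the timeout from rejecting a second time
    entry.reject(err);
  }
  return entries.length;
}
```

Returning the count is only a convenience for logging; the important part is that no entry is left waiting on its own timer.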

Bug 2: subscribe-response error path swallowed

Pre-fix: only `{ id, result }` was handled; a server-side error reply `{ id, error: { message } }` left the pending caller hanging until timeout. Now the message handler checks for `.error` and rejects with the server's message.
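
The error branch could look roughly like the sketch below. The two response shapes come from the description above; `PendingSubscribe`, `settleSubscribeResponse`, the string-typed `result`, and the handling of a reply with neither field are assumptions made for illustration.

```typescript
// Hypothetical pending entry: just the two promise callbacks.
interface PendingSubscribe {
  resolve: (subscriptionId: string) => void;
  reject: (err: Error) => void;
}

// Either { id, result } or { id, error: { message } }.
interface SubscribeResponse {
  id: number;
  result?: string; // assumed to be the subscription id
  error?: { message: string };
}

// Settle the pending subscribe that matches msg.id. Returns false when no
// caller is waiting on that id (for example, it already timed out).
function settleSubscribeResponse(
  pending: Map<number, PendingSubscribe>,
  msg: SubscribeResponse,
): boolean {
  const entry = pending.get(msg.id);
  if (!entry) return false;
  pending.delete(msg.id);
  if (msg.error) {
    // Surface the server's own message instead of waiting for the timeout.
    entry.reject(new Error(msg.error.message));
  } else if (msg.result !== undefined) {
    entry.resolve(msg.result);
  } else {
    entry.reject(new Error("malformed subscribe response"));
  }
  return true;
}
```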

Bug 3: middlebox idle-kill

Caddy `reverse_proxy idle_timeout`, NAT, AWS ALB all drop quiet connections at 60–120 s. Added:

  • `KEEPALIVE_INTERVAL_MS = 30 s` — WebSocket ping every 30 s.
  • `STALE_TIMEOUT_MS = 90 s` — if no frame (event, pong, subscribe-response) lands within 90 s, terminate the socket; close handler reconnects via exponential backoff.
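
One way to picture how the two constants interact is the sketch below. `KeepaliveMonitor`, `SocketLike`, and `startKeepalive` are hypothetical names; the `ping()` and `terminate()` methods are modelled on the Node `ws` package, and whether the manager uses that package is an assumption. The clock is injectable so the logic can be exercised without real timers.

```typescript
const KEEPALIVE_INTERVAL_MS = 30000;
const STALE_TIMEOUT_MS = 90000;

// Minimal socket surface needed here (assumed, modelled on `ws`).
interface SocketLike {
  ping(): void;
  terminate(): void;
}

class KeepaliveMonitor {
  private socket: SocketLike;
  private now: () => number;
  private lastFrameAt: number;

  constructor(socket: SocketLike, now: () => number = () => Date.now()) {
    this.socket = socket;
    this.now = now;
    this.lastFrameAt = now();
  }

  // Call on every inbound frame: event, pong, or subscribe-response.
  noteFrame(): void {
    this.lastFrameAt = this.now();
  }

  // Runs once per keepalive interval: terminate a stale socket, else ping.
  tick(): "ping" | "terminate" {
    if (this.now() - this.lastFrameAt >= STALE_TIMEOUT_MS) {
      this.socket.terminate(); // the close handler then reconnects with backoff
      return "terminate";
    }
    this.socket.ping();
    return "ping";
  }
}

// Wire the monitor to a real timer; the returned function stops it and
// should be called from the close handler.
function startKeepalive(monitor: KeepaliveMonitor): () => void {
  const timer = setInterval(() => monitor.tick(), KEEPALIVE_INTERVAL_MS);
  return () => clearInterval(timer);
}
```

With a 30 s ping and a 90 s stale window, a healthy peer has roughly three ping intervals to produce at least one frame before the socket is torn down.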

Feature: `subscribeTyped()`

`ChannelPayloadMap` discriminated union maps each Channel to its payload type — `subscribeTyped("newHeads", ...)` gives `payload: NewHeadsPayload` instead of `unknown`. Backwards-compatible: untyped `subscribe()` still works.
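
The typing can be sketched as a lookup from channel name to payload type. Everything below is illustrative: the payload fields are assumptions (the test plan only implies that `NewHeadsPayload` has a `number` field), `logs` is a hypothetical second channel, and `TypedHub` is a stand-in for the manager rather than its real implementation.

```typescript
// Assumed payload shapes.
interface NewHeadsPayload {
  number: number;
  hash: string;
}
interface LogsPayload {
  address: string;
  topics: string[];
}

// Maps each channel name to the payload delivered on it.
interface ChannelPayloadMap {
  newHeads: NewHeadsPayload;
  logs: LogsPayload; // hypothetical channel
}
type Channel = keyof ChannelPayloadMap;
type Handler<C extends Channel> = (payload: ChannelPayloadMap[C]) => void;

class TypedHub {
  private handlers = new Map<Channel, Array<(payload: unknown) => void>>();

  // Untyped entry point, kept for backwards compatibility.
  subscribe(channel: Channel, onEvent: (payload: unknown) => void): void {
    const list = this.handlers.get(channel) ?? [];
    list.push(onEvent);
    this.handlers.set(channel, list);
  }

  // Typed wrapper: the payload type is looked up from the channel literal.
  subscribeTyped<C extends Channel>(channel: C, onEvent: Handler<C>): void {
    // The cast trusts the server to send the shape registered for this channel.
    this.subscribe(channel, (payload) => onEvent(payload as ChannelPayloadMap[C]));
  }

  // Stand-in for the socket message handler delivering an event.
  emit(channel: Channel, payload: unknown): void {
    for (const handler of this.handlers.get(channel) ?? []) handler(payload);
  }
}
```

Because `C` is inferred from the string literal, a callback passed to `subscribeTyped("newHeads", ...)` receives a `NewHeadsPayload` without any annotation at the call site. The cast is compile-time only; nothing here validates the payload at runtime.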

Plus per-sub `onError` stored on `InternalSub` so a reconnect-time re-subscribe routes failures back to the original caller, not just the manager-level handler.
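
A rough sketch of that routing follows. `InternalSubSketch` only approximates `InternalSub` (its real fields are not shown in this PR), `resubscribeAll` is a hypothetical helper, and `send` is simplified to a synchronous call that throws on failure, whereas the real re-subscribe is presumably asynchronous.

```typescript
interface InternalSubSketch {
  channel: string;
  onEvent: (payload: unknown) => void;
  onError?: (err: Error) => void; // stored at subscribe time
}

// After a reconnect, re-send each subscription. A failure is reported to the
// subscription's own onError (when present) and to the manager-level handler.
function resubscribeAll(
  subs: InternalSubSketch[],
  send: (sub: InternalSubSketch) => void,
  managerOnError: (err: Error) => void,
): number {
  let failures = 0;
  for (const sub of subs) {
    try {
      send(sub);
    } catch (e) {
      const err = e instanceof Error ? e : new Error(String(e));
      sub.onError?.(err); // original caller hears about it first
      managerOnError(err);
      failures++;
    }
  }
  return failures;
}
```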

Status method

`mgr.status()` returns `{ socketState, subs, secondsSinceLastFrame }` for ops dashboards / debug pages.
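
For orientation, a status snapshot of this shape could be assembled as below. Only the three field names come from this PR; the field types, the state labels, the `null` for "no frame yet", and the `buildStatus` helper are assumptions.

```typescript
type SocketState = "connecting" | "open" | "closing" | "closed"; // assumed labels

interface ManagerStatus {
  socketState: SocketState;
  subs: number; // assumed to be a count of active subscriptions
  secondsSinceLastFrame: number | null; // null until a first frame arrives
}

function buildStatus(
  readyState: number, // standard WebSocket readyState, 0 to 3
  subCount: number,
  lastFrameAt: number | null, // epoch milliseconds
  now: number,
): ManagerStatus {
  // Index order follows CONNECTING=0, OPEN=1, CLOSING=2, CLOSED=3.
  const states: SocketState[] = ["connecting", "open", "closing", "closed"];
  return {
    socketState: states[readyState] ?? "closed",
    subs: subCount,
    secondsSinceLastFrame:
      lastFrameAt === null ? null : Math.floor((now - lastFrameAt) / 1000),
  };
}
```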

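Naming drift fix

`network.ts`: `sentrixMainnet.name` was "Sentrix Mainnet"; the canonical brand is "Sentrix Chain" (matches chainlist registry ethereum-lists/chains#8266 and every frontend chain config). Also pointed the testnet `explorerUrl` at the dedicated `scan-testnet.sentrixchain.com` host so that EIP-3091 deeplinks no longer send testnet tx links to the mainnet view.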

Test plan

  • `pnpm build` passes
  • Open a long-running subscription against testnet; verify the connection stays alive across a 5-min idle window (ping keeps Caddy from killing it)
  • Trigger a subscribe-response error (e.g. an invalid filter): the caller's promise rejects with the server's message instead of timing out
  • Trigger a sudden socket close (ngrok kill, Caddy reload) — every in-flight `subscribe()` rejects immediately with `"websocket closed before subscribe response"`
  • Use `subscribeTyped("newHeads", ...)` in a TS playground — payload is typed as `NewHeadsPayload`, `.number` autocompletes

…naming align

WebSocket robustness pass on SubscriptionManager. Three concrete bugs +
one ergonomic add + a small naming drift fix.

Bug 1: pending-map race on socket error / close.
  Pre-fix: only the first pending subscribe saw the rejection; every
  other in-flight subscribe hung until its 10 s timeout fired
  one-by-one. Now error + close handlers iterate the pending map and
  reject every entry with the same surfaced error so the caller sees
  failure immediately.

Bug 2: subscribe-response error path swallowed.
  Pre-fix: only `{ id, result }` was handled; a server-side error
  reply `{ id, error: { message } }` left the pending caller hanging
  until timeout. Now the message handler checks for `.error` and
  rejects with the server's message.

Bug 3: middlebox idle-kill.
  Caddy reverse_proxy idle_timeout, NAT, AWS ALB all drop quiet
  connections at 60-120 s. Added KEEPALIVE_INTERVAL_MS=30s ping +
  STALE_TIMEOUT_MS=90s half-open detection. If no frame (event, pong,
  subscribe-response) lands within 90 s the manager terminates the
  socket and the close handler reconnects through the existing
  exponential-backoff path.

Feature: subscribeTyped<C>().
  ChannelPayloadMap discriminated union maps each Channel to its
  payload type — `subscribeTyped("newHeads", ...)` gives
  `payload: NewHeadsPayload` instead of `unknown`. Backwards-
  compatible: untyped subscribe() still works.

Plus per-sub onError stored on InternalSub so a reconnect-time
re-subscribe routes failures back to the original caller, not just
to the manager-level handler.

network.ts naming drift: sentrixMainnet.name was "Sentrix Mainnet" —
the canonical brand is "Sentrix Chain" (matches chainlist registry
ethereum-lists/chains#8266 + every frontend chain config). Also fixed
testnet explorerUrl to the dedicated scan-testnet.sentrixchain.com
host for EIP-3091 deeplink routing — the previous shared
scan.sentrixchain.com pointed testnet tx links at the mainnet view.
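
To illustrate why the explorer host matters, here is a small sketch of EIP-3091 transaction links built from the chain config. The brand name and both hosts come from the message above; the `https` scheme, the `sentrixTestnet` identifier and its display name, and the `ChainConfigSketch` and `txLink` helpers are assumptions.

```typescript
interface ChainConfigSketch {
  name: string;
  explorerUrl: string;
}

const sentrixMainnet: ChainConfigSketch = {
  name: "Sentrix Chain",
  explorerUrl: "https://scan.sentrixchain.com",
};

// Identifier and display name are assumed; only the host is given above.
const sentrixTestnet: ChainConfigSketch = {
  name: "Sentrix Testnet",
  explorerUrl: "https://scan-testnet.sentrixchain.com",
};

// EIP-3091 transaction deeplink: <explorer>/tx/<hash>.
function txLink(chain: ChainConfigSketch, txHash: string): string {
  const base = chain.explorerUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return `${base}/tx/${txHash}`;
}
```

With a shared explorer URL, `txLink` would return the same host for both networks, which is how testnet hashes ended up on the mainnet view.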

Status method added for ops dashboards / debug pages —
`mgr.status()` returns `{ socketState, subs, secondsSinceLastFrame }`.
github-actions[bot] enabled auto-merge (squash) on May 10, 2026 21:30
github-actions[bot] merged commit 83a8ecc into main on May 10, 2026
4 checks passed