feat(bft): keepalive + typed payloads + pending-map race fix #15
Merged
…naming align
WebSocket robustness pass on SubscriptionManager. Three concrete bugs +
one ergonomic add + a small naming drift fix.
Bug 1: pending-map race on socket error / close.
Pre-fix: only the first pending subscribe saw the rejection; every
other in-flight subscribe hung until its 10 s timeout fired
one-by-one. Now error + close handlers iterate the pending map and
reject every entry with the same surfaced error so the caller sees
failure immediately.
Bug 2: subscribe-response error path swallowed.
Pre-fix: only `{ id, result }` was handled; a server-side error
reply `{ id, error: { message } }` left the pending caller hanging
until timeout. Now the message handler checks for `.error` and
rejects with the server's message.
Bug 3: middlebox idle-kill.
Caddy reverse_proxy idle_timeout, NAT, AWS ALB all drop quiet
connections at 60-120 s. Added KEEPALIVE_INTERVAL_MS=30s ping +
STALE_TIMEOUT_MS=90s half-open detection. If no frame (event, pong,
subscribe-response) lands within 90 s the manager terminates the
socket and the close handler reconnects through the existing
exponential-backoff path.
Feature: subscribeTyped<C>().
ChannelPayloadMap discriminated union maps each Channel to its
payload type — `subscribeTyped("newHeads", ...)` gives
`payload: NewHeadsPayload` instead of `unknown`. Backwards-
compatible: untyped subscribe() still works.
Plus per-sub onError stored on InternalSub so a reconnect-time
re-subscribe routes failures back to the original caller, not just
to the manager-level handler.
network.ts naming drift: sentrixMainnet.name was "Sentrix Mainnet" —
the canonical brand is "Sentrix Chain" (matches chainlist registry
ethereum-lists/chains#8266 + every frontend chain config). Also fixed
testnet explorerUrl to the dedicated scan-testnet.sentrixchain.com
host for EIP-3091 deeplink routing — the previous shared
scan.sentrixchain.com pointed testnet tx links at the mainnet view.
Status method added for ops dashboards / debug pages —
`mgr.status()` returns `{ socketState, subs, secondsSinceLastFrame }`.
Summary
WebSocket robustness pass on `SubscriptionManager`. Three concrete bugs + one ergonomic add + a small naming drift fix.
Bug 1: pending-map race on socket error / close
Pre-fix: only the first pending subscribe saw the rejection; every other in-flight subscribe hung until its 10 s timeout fired one-by-one. Now error + close handlers iterate the pending map and reject every entry with the same surfaced error so callers see failure immediately.
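The fan-out rejection might look roughly like the sketch below. This is not the actual `SubscriptionManager` code: `PendingEntry`, its fields, and `rejectAllPending` are hypothetical names chosen for illustration.

```typescript
// Hypothetical shape of one in-flight subscribe request (illustrative only).
interface PendingEntry {
  resolve: (subscriptionId: string) => void;
  reject: (err: Error) => void;
  timer: ReturnType<typeof setTimeout>; // the per-request timeout
}

// Reject every pending subscribe with the same error and empty the map.
// Returns how many callers were rejected.
function rejectAllPending(pending: Map<number, PendingEntry>, err: Error): number {
  const entries = Array.from(pending.values());
  pending.clear(); // clear first so a re-entrant handler cannot reject twice
  for (const entry of entries) {
    clearTimeout(entry.timer); // the timeout is no longer needed
    entry.reject(err);
  }
  return entries.length;
}
```

Calling the same helper from both the error and the close handler keeps the two paths consistent, and clearing the map first makes a second call a no-op.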
Bug 2: subscribe-response error path swallowed
Pre-fix: only `{ id, result }` was handled; a server-side error reply `{ id, error: { message } }` left the pending caller hanging until timeout. Now the message handler checks for `.error` and rejects with the server's message.
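A minimal sketch of the reply handling, assuming the two reply shapes quoted above. `PendingCall`, `SubscribeReply`, and `settleReply` are hypothetical names, not the manager's real internals.

```typescript
// Hypothetical pending-call record (illustrative only).
interface PendingCall {
  resolve: (result: unknown) => void;
  reject: (err: Error) => void;
}

// Reply shapes from the description: { id, result } or { id, error: { message } }.
interface SubscribeReply {
  id: number;
  result?: unknown;
  error?: { message: string };
}

// Settle the pending call matching reply.id. Returns false for unknown ids.
function settleReply(pending: Map<number, PendingCall>, reply: SubscribeReply): boolean {
  const call = pending.get(reply.id);
  if (call === undefined) return false; // unknown or already-settled id
  pending.delete(reply.id);
  if (reply.error !== undefined) {
    // Server-side failure: surface the server's message instead of waiting for the timeout.
    call.reject(new Error(reply.error.message));
  } else {
    call.resolve(reply.result);
  }
  return true;
}
```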
Bug 3: middlebox idle-kill
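Caddy `reverse_proxy idle_timeout`, NAT, and AWS ALB all drop quiet connections at 60–120 s. Added a `KEEPALIVE_INTERVAL_MS` = 30 s ping plus `STALE_TIMEOUT_MS` = 90 s half-open detection. If no frame (event, pong, subscribe-response) lands within 90 s, the manager terminates the socket and the close handler reconnects through the existing exponential-backoff path.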
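The keepalive and half-open detection (a 30 s ping, plus termination after 90 s without any frame) could be sketched as below. The constant names and values come from the commit message; `LivenessTracker`, `KeepaliveSocket`, `keepaliveTick`, and `startKeepalive` are hypothetical, and the socket interface is a stand-in rather than any specific library's type.

```typescript
const KEEPALIVE_INTERVAL_MS = 30_000;
const STALE_TIMEOUT_MS = 90_000;

// Tracks when the last frame (event, pong, subscribe-response) arrived.
class LivenessTracker {
  private readonly now: () => number;
  private lastFrameAt: number;

  constructor(now: () => number = Date.now) {
    this.now = now;
    this.lastFrameAt = now();
  }

  noteFrame(): void {
    this.lastFrameAt = this.now();
  }

  isStale(): boolean {
    return this.now() - this.lastFrameAt >= STALE_TIMEOUT_MS;
  }
}

// Hypothetical minimal socket surface needed by the keepalive loop.
interface KeepaliveSocket {
  ping(): void;
  terminate(): void;
}

// One keepalive step: kill a stale socket, otherwise send a ping.
function keepaliveTick(sock: KeepaliveSocket, tracker: LivenessTracker): "ping" | "terminate" {
  if (tracker.isStale()) {
    sock.terminate(); // the close handler is expected to trigger the reconnect
    return "terminate";
  }
  sock.ping();
  return "ping";
}

// Runs keepaliveTick every KEEPALIVE_INTERVAL_MS; returns a stop function.
function startKeepalive(sock: KeepaliveSocket, tracker: LivenessTracker): () => void {
  const timer = setInterval(() => keepaliveTick(sock, tracker), KEEPALIVE_INTERVAL_MS);
  return () => clearInterval(timer);
}
```

In this sketch the message and pong handlers would call `tracker.noteFrame()`. Injecting the clock keeps the stale check testable without real timers.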
Feature: `subscribeTyped<C>()`
`ChannelPayloadMap` discriminated union maps each Channel to its payload type — `subscribeTyped("newHeads", ...)` gives `payload: NewHeadsPayload` instead of `unknown`. Backwards-compatible: untyped `subscribe()` still works.
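To illustrate how a channel-to-payload map can drive the handler type, here is a small self-contained sketch. Only `ChannelPayloadMap`, `NewHeadsPayload`, `subscribeTyped`, `subscribe`, and the `newHeads` channel come from the description. The payload fields, the `logs` channel, and the `TypedHub` class are assumptions made for the example.

```typescript
// Hypothetical payload fields; the real NewHeadsPayload may differ.
interface NewHeadsPayload {
  number: number;
  hash: string;
}

// Hypothetical second channel, included only to show per-channel typing.
interface LogsPayload {
  address: string;
}

interface ChannelPayloadMap {
  newHeads: NewHeadsPayload;
  logs: LogsPayload;
}

type Channel = keyof ChannelPayloadMap;
type UntypedHandler = (payload: unknown) => void;

class TypedHub {
  private readonly handlers = new Map<Channel, UntypedHandler[]>();

  // Untyped entry point: still available, payload is `unknown`.
  subscribe(channel: Channel, handler: UntypedHandler): void {
    const list = this.handlers.get(channel) ?? [];
    list.push(handler);
    this.handlers.set(channel, list);
  }

  // Typed wrapper: the payload type is looked up from the channel literal.
  subscribeTyped<C extends Channel>(
    channel: C,
    handler: (payload: ChannelPayloadMap[C]) => void,
  ): void {
    this.subscribe(channel, (payload) => handler(payload as ChannelPayloadMap[C]));
  }

  // Test helper standing in for frames arriving from the server.
  emit<C extends Channel>(channel: C, payload: ChannelPayloadMap[C]): void {
    for (const handler of this.handlers.get(channel) ?? []) handler(payload);
  }
}
```

With this shape, `hub.subscribeTyped("newHeads", (p) => p.number)` type-checks, while reading `p.address` in the same handler is a compile-time error.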
Plus per-sub `onError` stored on `InternalSub` so a reconnect-time re-subscribe routes failures back to the original caller, not just the manager-level handler.
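The routing described above might be sketched as follows, assuming the failure goes to the manager-level handler and, when present, to the subscription's own callback. `InternalSub` and `onError` are named in the description. The `channel` field, `ManagerErrorHandler`, and `routeResubscribeError` are hypothetical.

```typescript
// Hypothetical subset of the stored subscription record.
interface InternalSub {
  channel: string;
  onError?: (err: Error) => void; // supplied by the original caller, if any
}

type ManagerErrorHandler = (err: Error, sub: InternalSub) => void;

// Deliver a reconnect-time re-subscribe failure to both audiences.
function routeResubscribeError(
  sub: InternalSub,
  err: Error,
  managerOnError: ManagerErrorHandler,
): void {
  managerOnError(err, sub); // manager-level handler still sees every failure
  sub.onError?.(err); // the original caller is told too, if it asked to be
}
```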
Status method
`mgr.status()` returns `{ socketState, subs, secondsSinceLastFrame }` for ops dashboards / debug pages.
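As a rough sketch of how such a snapshot could be assembled: the three field names come from the description, but their types are assumptions here (`subs` as a count, `socketState` as a label derived from the standard WebSocket `readyState` values 0 to 3). `buildStatus` is a hypothetical helper.

```typescript
type SocketState = "connecting" | "open" | "closing" | "closed";

interface ManagerStatus {
  socketState: SocketState;
  subs: number; // assumed to be a count of active subscriptions
  secondsSinceLastFrame: number | null; // null when no frame has been seen yet
}

// Standard WebSocket readyState order: 0 CONNECTING, 1 OPEN, 2 CLOSING, 3 CLOSED.
const READY_STATE_LABELS: readonly SocketState[] = ["connecting", "open", "closing", "closed"];

function buildStatus(
  readyState: number,
  subCount: number,
  lastFrameAt: number | null,
  now: number = Date.now(),
): ManagerStatus {
  return {
    socketState: READY_STATE_LABELS[readyState] ?? "closed", // treat unknown values as closed
    subs: subCount,
    secondsSinceLastFrame:
      lastFrameAt === null ? null : Math.floor((now - lastFrameAt) / 1000),
  };
}
```

A dashboard could poll this and flag any connection whose `secondsSinceLastFrame` approaches the 90 s stale threshold.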
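Naming drift fix
`sentrixMainnet.name` in `network.ts` was "Sentrix Mainnet"; the canonical brand is "Sentrix Chain" (matches chainlist registry ethereum-lists/chains#8266 and every frontend chain config). Also fixed the testnet `explorerUrl` to the dedicated `scan-testnet.sentrixchain.com` host for EIP-3091 deeplink routing; the previous shared `scan.sentrixchain.com` pointed testnet tx links at the mainnet view.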
Test plan