Revamp NetworkController RPC endpoint events #7166
In a future commit we will introduce changes to `network-controller` so that it will keep track of the status of each network as requests are made. These updates to `createServicePolicy` assist with that. See the changelog for a list of changes to the `ServicePolicy` API. Besides the changes listed there, the tests for `createServicePolicy` have been refactored slightly so that they are easier to maintain in the future.
In a future commit we will introduce changes to `network-controller` so
that it will keep track of the status of each network as requests are
made. This commit paves the way for this to happen by redefining the
existing RPC endpoint-related events that NetworkController produces.
Currently, when requests are made through the network clients that
NetworkController exposes, three events are published:
- `NetworkController:rpcEndpointDegraded`
- Published when enough successive retriable errors are encountered
while making a request to an RPC endpoint that the maximum number of
retries is reached.
- `NetworkController:rpcEndpointUnavailable`
- Published when enough successive errors are encountered while making
a request to an RPC endpoint that the underlying circuit breaks.
- `NetworkController:rpcEndpointRequestRetried`
- Published when a request is retried (mainly used for testing).
It's important to note that in the context of the RPC failover feature,
an "RPC endpoint" can encompass multiple URLs, so the above events fire
for any of those URLs.
While these events are useful for reporting metrics on RPC endpoints, in
order to effectively be able to update the status of a network, we need
events that are less granular and are guaranteed not to fire multiple
times in a row. We also need a new event.
Now the list of events looks like this:
- `NetworkController:rpcEndpointInstanceDegraded`
- The same as `NetworkController:rpcEndpointDegraded` before.
- `NetworkController:rpcEndpointInstanceUnavailable`
- The same as `NetworkController:rpcEndpointUnavailable` before.
- `NetworkController:rpcEndpointInstanceRetried`
- Renamed from `NetworkController:rpcEndpointRequestRetried`.
- `NetworkController:rpcEndpointDegraded`
- Similar to `NetworkController:rpcEndpointInstanceDegraded`, but
won't be published again if the RPC endpoint is already in a
degraded state.
- `NetworkController:rpcEndpointUnavailable`
- Published when all of the circuits underlying all of the URLs for an
RPC endpoint have broken (none of the URLs are available). Won't be
published again if the RPC endpoint is already in an unavailable
state.
- `NetworkController:rpcEndpointAvailable`
- A new event. Published the first time a successful request is made
to one of the URLs for an RPC endpoint, or following a degraded or
unavailable status.
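To illustrate the "won't be published again" guarantee described above, here is a minimal sketch of a status tracker that yields an event name only when the endpoint's status changes. `EndpointStatusTracker`, `record`, and the status strings are hypothetical names chosen for illustration; they are not taken from the NetworkController source.

```typescript
// Hypothetical sketch; names are illustrative, not the real NetworkController code.
type TrackedStatus = 'unknown' | 'available' | 'degraded' | 'unavailable';
type ObservedStatus = Exclude<TrackedStatus, 'unknown'>;

class EndpointStatusTracker {
  // Starts as 'unknown' so that the first successful request counts as a change.
  private status: TrackedStatus = 'unknown';

  // Records the latest observation and returns the name of the event to
  // publish, or null when the status is unchanged (so nothing fires twice
  // in a row).
  record(observed: ObservedStatus): string | null {
    if (observed === this.status) {
      return null;
    }
    this.status = observed;
    const suffix = observed.charAt(0).toUpperCase() + observed.slice(1);
    return `NetworkController:rpcEndpoint${suffix}`;
  }
}
```

With this shape, a run of successful requests produces a single "available" event, and another one is produced only after the endpoint has passed through a degraded or unavailable status.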
| ): Promise<JsonRpcResponse<Result | null>> { | ||
| return this.#services[0].request(jsonRpcRequest, fetchOptions); | ||
| } | ||
| // Start with the primary (first) service and switch to failovers as the |
Prior to these changes, each RpcService object could have an optional failoverService property. This class, RpcServiceChain, would then build a chain (really, a linked list) of services. In order to make a request, RpcServiceChain would call request on the first service in the chain, and RpcService would decide whether it needed to call the next service in the chain, etc.
While this model is easy to understand, I needed access to certain data along the way, and it seemed easier to use a loop rather than a linked list. Anyway, I figured it really should be the responsibility of RpcServiceChain to manage how requests are sent across the chain.
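As a rough sketch of the loop-based approach described here (as opposed to each service calling its own failover), the chain can own the iteration. `ChainService` and `requestThroughChain` are hypothetical, simplified names; the real `RpcService` and `RpcServiceChain` have richer interfaces. This sketch also fails over on any error, which is an assumption made to keep the example short.

```typescript
// Hypothetical, simplified stand-in for a service in the chain.
interface ChainService {
  endpointUrl: string;
  request(method: string): Promise<string>;
}

// Tries each service in order, starting with the primary (first) one, and
// returns the first successful response along with the URL that served it.
async function requestThroughChain(
  services: ChainService[],
  method: string,
): Promise<{ endpointUrl: string; result: string }> {
  let lastError: unknown = new Error('No services in chain');
  for (const service of services) {
    try {
      const result = await service.request(method);
      return { endpointUrl: service.endpointUrl, result };
    } catch (error) {
      // Remember the failure and move on to the next failover, if any.
      lastError = error;
    }
  }
  // Every service failed, so surface the most recent error.
  throw lastError;
}
```

Because the loop lives in one place, data such as which URL answered or how many services have failed is available without threading it through a linked list.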
| onRetry( | ||
| listener: AddToCockatielEventData< | ||
| Parameters<ServicePolicy['onRetry']>[0], | ||
| listener: CockatielEventToEventListenerWithData< |
This should be doing the same thing as the previous code. I just thought that `AddToCockatielEventData<Parameters<...>[0], ...>` was a bit ugly (and I added some more self-descriptive utility types, so this is one of them).
| export function createAutoManagedNetworkClient< | ||
| Configuration extends NetworkClientConfiguration, | ||
| >({ | ||
| networkClientId, |
We now expose the network client ID in the rpcEndpoint* messenger events, so we need to receive it and pass it down to createNetworkClient.
| }); | ||
| }); | ||
|
|
||
| it('publishes the NetworkController:rpcEndpointUnavailable event when the failover occurs', async () => { |
These tests got moved to src/create-network-client-tests/rpc-endpoint-events.test.ts as I realized they didn't belong here. It's true that this test file is concerned with the RPC failover feature (which is required to test the rpcEndpoint* events), but all of the tests in the tests/network-client directory really exercise the middleware stack that createNetworkClient builds, and in so doing loop over all of the RPC methods that our internal provider handles specially. We don't need to go to all that trouble to test the rpcEndpoint* events; we can just use an arbitrary RPC method. (I think eventually I will rename tests/network-client to tests/internal-provider-api or something like that, but that's a PR for another time.)
| /** | ||
| * Obtains the event data type from a Cockatiel event or event listener type. | ||
| */ | ||
| export type ExtractCockatielEventData<CockatielEventOrEventListener> = |
The Cockatiel types are a bit awkward to work with (especially since event listeners whose event payloads are empty are typed as Event<void> which is rather strange). These utilities just make them a bit easier to work with.
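For readers unfamiliar with the pattern, a utility type of this kind might look roughly like the sketch below. `CockatielEvent` and `ListenerHandle` are local stand-ins declared here so that the example is self-contained (their shapes are an assumption), and `ExtractEventData`, `ListenerWithData`, and the payload fields are hypothetical names rather than the utilities added in this PR.

```typescript
// Local stand-ins for the library's event types (assumed shapes).
type ListenerHandle = { dispose(): void };
type CockatielEvent<EventData> = (
  listener: (data: EventData) => void,
) => ListenerHandle;

// Hypothetical utility: pulls the payload type out of an event type.
type ExtractEventData<EventType> =
  EventType extends CockatielEvent<infer EventData> ? EventData : never;

// Hypothetical utility: a listener whose payload gains extra fields. An empty
// (`void`) payload is treated as "no fields" so the result stays usable.
type ListenerWithData<EventType, ExtraData> = (
  data: (ExtractEventData<EventType> extends void
    ? unknown
    : ExtractEventData<EventType>) &
    ExtraData,
) => void;

// Example event types: one with a payload, one without.
type OnDegradedEvent = CockatielEvent<{ duration: number }>;
type OnBreakEvent = CockatielEvent<void>;

const seenMessages: string[] = [];

const onDegradedListener: ListenerWithData<
  OnDegradedEvent,
  { endpointUrl: string }
> = (data) => {
  seenMessages.push(`${data.endpointUrl} degraded after ${data.duration}ms`);
};

const onBreakListener: ListenerWithData<
  OnBreakEvent,
  { endpointUrl: string }
> = (data) => {
  seenMessages.push(`${data.endpointUrl} broke`);
};
```

The `void` branch is the awkward case mentioned above: without it, intersecting `void` with the extra fields would not give a useful payload type.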
| - This ought to be unobservable, but we mark it as breaking out of an abundance of caution. | ||
| - **BREAKING:** Split up and update payload data for `NetworkController:rpcEndpoint{Degraded,Unavailable}` ([#7166](https://github.com/MetaMask/core/pull/7166)) | ||
| - The existing events are now called `NetworkController:rpcEndpointInstance{Degraded,Unavailable}` and retain their present behavior. | ||
| - `NetworkController:rpcEndpoint{Degraded,Unavailable}` do still exist, but they are now designed to represent the entire RPC endpoint and are guaranteed to not be published multiple times in a row. In particular, `NetworkController:rpcEndpointUnavailable` is published only after trying all of the designated URLs for a particular RPC endpoint and the underlying circuit for the last URL breaks, not as each primary's or failover's circuit breaks. |
> In particular, `NetworkController:rpcEndpointUnavailable` is published only after trying all of the designated URLs for a particular RPC endpoint and the underlying circuit for the last URL breaks, not as each primary's or failover's circuit breaks.
This change is I suppose a bit controversial, and the first version of this was slightly different before I landed on this approach. But I think it makes sense? Basically, if we can reach the network somehow, don't broadcast that it's unavailable until we're absolutely sure. That does mean that the NetworkController:rpcEndpointUnavailable may never be published for Infura networks if the failover does its job, but I think that's precisely the intent.
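Put as code, the rule amounts to something like the following sketch, where the endpoint as a whole counts as unavailable only once every URL's circuit is open. `SketchCircuitState` and `isEndpointChainUnavailable` are hypothetical names, and the string states are a simplification rather than the library's own circuit state values.

```typescript
// Hypothetical, simplified circuit states for each URL in the endpoint.
type SketchCircuitState = 'closed' | 'half-open' | 'open';

// The endpoint is unavailable only if it has at least one URL and the circuit
// for every one of its URLs (primary and failovers) has broken.
function isEndpointChainUnavailable(
  circuitStates: SketchCircuitState[],
): boolean {
  return (
    circuitStates.length > 0 &&
    circuitStates.every((state) => state === 'open')
  );
}
```

So a broken primary with a healthy failover would not count as unavailable, which matches the intent for Infura networks described above.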
|
|
## Explanation

In a future commit we will introduce changes to `network-controller` so that it will keep track of the status of each network as requests are made. These updates to `createServicePolicy` assist with that. See the changelog for more. Besides this, the tests for `createServicePolicy` have been refactored slightly so that they are easier to maintain in the future.

## References

Progresses https://consensyssoftware.atlassian.net/browse/WPC-99.

You can see how these changes will be used in the next PR: #7166

## Checklist

- [x] I've updated the test suite for new or updated code as appropriate
- [x] I've updated documentation (JSDoc, Markdown, etc.) for new or updated code as appropriate
- [x] I've communicated my changes to consumers by [updating changelogs for packages I've changed](https://github.com/MetaMask/core/tree/main/docs/contributing.md#updating-changelogs), highlighting breaking changes as necessary
- [x] I've prepared draft pull requests for clients and consumer packages to resolve any breaking changes

---

> [!NOTE]
> Adds `getCircuitState`, `onAvailable`, and `reset` to `ServicePolicy`, exports Cockatiel types, and updates logic/tests to support availability tracking and circuit state introspection.
>
> - **controller-utils**:
>   - **ServicePolicy API**:
>     - Add `getCircuitState()` to expose underlying circuit state.
>     - Add `onAvailable` event for first success and post-recovery success.
>     - Add `reset()` to close the circuit and reset breaker counters.
>   - **Behavior/Internals**:
>     - Track availability status and emit `onAvailable`/`onDegraded` appropriately.
>     - Update `onBreak` to mark unavailable; wire `ConsecutiveBreaker` for reset.
>   - **Exports**:
>     - Export `CockatielEventEmitter` and `CockatielFailureReason`; re-export via `index`.
>   - **Tests**:
>     - Expand/refactor tests to cover `onAvailable`, `getCircuitState`, `reset`, and timing cases; update export snapshot.
>   - **Docs**:
>     - Update `CHANGELOG.md` with new methods and exports.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit e597d0b. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
Add comprehensive documentation for the getError function that extracts errors from Cockatiel's FailureReason type in circuit breaker event handlers. Documents both possible shapes of the FailureReason object.
Remove primaryEndpointUrl field from NetworkController chain-level events:
- NetworkController:rpcEndpointChainUnavailable
- NetworkController:rpcEndpointChainDegraded
- NetworkController:rpcEndpointChainAvailable
Chain-level events are designed to represent the overall status of an endpoint chain, not individual endpoints. The primaryEndpointUrl field is redundant since consumers can derive endpoint information from the networkClientId using getNetworkClientById() or getNetworkConfigurationByNetworkClientId().
Individual endpoint events (rpcEndpointUnavailable, rpcEndpointDegraded, rpcEndpointRetried) retain the primaryEndpointUrl field, as it's useful for comparing endpointUrl to primaryEndpointUrl to determine whether the affected endpoint is a primary or a failover.
Updated event type definitions, event publishing logic, test assertions, and changelog to reflect these changes.
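The primary-versus-failover comparison mentioned for individual endpoint events can be sketched as follows. `EndpointEventPayload` and `describeEndpointRole` are hypothetical names, and the payload shape shows only the two fields discussed here; the real event payloads may carry more.

```typescript
// Hypothetical, reduced payload: only the two URL fields discussed above.
type EndpointEventPayload = {
  endpointUrl: string;
  primaryEndpointUrl: string;
};

// An event concerns the primary when the affected URL is the primary URL;
// otherwise it concerns one of the failovers.
function describeEndpointRole(
  payload: EndpointEventPayload,
): 'primary' | 'failover' {
  return payload.endpointUrl === payload.primaryEndpointUrl
    ? 'primary'
    : 'failover';
}
```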
Add the same undefined check that exists in onBreak to ensure type safety and prevent publishing events with undefined error values.
Use CockatielFailureReason type instead of generic object type for better type safety and clarity.
Capture the chain status before calling service.request() to prevent spurious onBreak emissions. The onDegraded handler can fire synchronously during service.request() and change the status from Unavailable to Degraded before the catch block checks it, causing incorrect onBreak events when recovery attempts fail.
Revert the previous fix that captured previousStatus before the request. Checking the current status (this.#status) is correct because it accounts for status changes that may occur during the request from other services in the chain. The original check prevents duplicate onBreak emissions when the chain is already Unavailable.
The test 'calls onAvailable when a service becomes degraded by responding slowly, and then recovers' was not actually simulating a slow response, so it was only testing initial availability, not recovery from degraded state.
Changes:
- Add clock.tick(DEFAULT_DEGRADED_THRESHOLD + 1) to first mock to simulate slow response
- Add onDegraded listener to verify degradation actually occurred
- Add assertions to verify both onDegraded and onAvailable are called
- Add assertion to verify call order (degradation before recovery)
…vents
- Remove primaryEndpointUrl from event type definitions for onBreak, onDegraded, and onAvailable
- Remove primaryEndpointUrl from event emissions in RpcServiceChain
- Update event listener type signatures to not include primaryEndpointUrl
- Update all test expectations to remove primaryEndpointUrl from assertions
- Update create-network-client.ts to remove primaryEndpointUrl from event handlers
- Note: onService* methods still include primaryEndpointUrl as they were not changed
- Remove endpointUrl from onBreak, onDegraded, and onAvailable events in RpcServiceChain
- Update type definitions to exclude endpointUrl using ExcludeCockatielEventData
- Update event emissions to exclude endpointUrl from chain-level events
- Update NetworkController event types to remove endpointUrl from chain-level events (rpcEndpointChainDegraded, rpcEndpointChainAvailable, rpcEndpointChainUnavailable)
- Update event handlers in create-network-client.ts to not destructure endpointUrl
- Update all test assertions to remove endpointUrl from chain-level event expectations
- Remove unused rpcUrl parameters from test functions
- Align all chain-level events to not include endpointUrl (consistent with unavailable event)
- Change tertiaryEndpointUrl from 'https://second.endpoint' to 'https://third.endpoint'
Gudahtt left a comment
LGTM!
Explanation
In a future commit we will introduce changes to `network-controller` so that it will keep track of the status of each network as requests are made. This commit paves the way for this to happen by redefining the existing RPC endpoint-related events that NetworkController produces.
Currently, when requests are made through the network clients that NetworkController exposes, three events are published:
- `NetworkController:rpcEndpointDegraded`
- `NetworkController:rpcEndpointUnavailable`
- `NetworkController:rpcEndpointRequestRetried`
It's important to note that in the context of the RPC failover feature, an "RPC endpoint" can actually encompass multiple URLs, so the above events actually fire for any URL.
While these events are useful for reporting metrics on RPC endpoints, in order to effectively be able to update the status of a network, we need events that are less granular and are guaranteed not to fire multiple times in a row. We also need a new event.
Now the list of events looks like this:
- `NetworkController:rpcEndpointDegraded`
- `NetworkController:rpcEndpointUnavailable`
- `NetworkController:rpcEndpointRetried`
  - Renamed from `NetworkController:rpcEndpointRequestRetried`.
- `NetworkController:rpcEndpointChainDegraded`
  - Similar to `NetworkController:rpcEndpointDegraded`, but won't be published again if the RPC endpoint is already in a degraded state.
- `NetworkController:rpcEndpointChainUnavailable`
- `NetworkController:rpcEndpointChainAvailable`
Going a bit deeper, in order to make the changes above, it was necessary to rewrite the core logic responsible for diverting traffic to failovers from `RpcService` to `RpcServiceChain`, which was a more natural fit, anyway. This also meant that we could simplify `RpcService`, as well as its tests.
References
Progresses https://consensyssoftware.atlassian.net/browse/WPC-99.
Checklist
Note
Introduce chain-level RPC endpoint events and a new RpcServiceChain, rename/update retry and payloads, and refactor failover logic with extensive test updates.
- … `NetworkController:rpcEndpointChainAvailable` and chain-level events `rpcEndpointChainDegraded`/`rpcEndpointChainUnavailable` with non-repeating semantics in `src/NetworkController.ts`.
- Rename `NetworkController:rpcEndpointRequestRetried` to `NetworkController:rpcEndpointRetried`.
- … `networkClientId`, add `primaryEndpointUrl` to per-endpoint events; chain-level events omit `primaryEndpointUrl`.
- … `RpcServiceChain` (`src/rpc-service/rpc-service-chain.ts`) to manage primary/failover endpoints, circuit states, and emit chain/service events.
- … `RpcService` (`src/rpc-service/rpc-service.ts`): remove embedded failover, add `onAvailable`, `resetPolicy`, `getCircuitState`, improved error handling/logging.
- `create-network-client.ts` builds `RpcServiceChain` and publishes updated events; `create-auto-managed-network-client.ts` now passes `networkClientId` to `createNetworkClient`.
- … `src/index.ts`; adjust `RpcServiceRequestable` event listener types and remove `{ isolated: true }` from `onBreak` data.
- … `CHANGELOG.md`.
- … `cockatiel` to devDependencies.

Written by Cursor Bugbot for commit 916a0e2. This will update automatically on new commits. Configure here.