Skip to content
Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
Show all changes
50 commits
Select commit Hold shift + click to select a range
cca8f81
Extend createServicePolicy to support live network status
mcmire Nov 14, 2025
6a3cff1
Fix tests
mcmire Nov 15, 2025
c08f398
Add more tests
mcmire Nov 17, 2025
5e0e3e1
No need for getLastInnerFailureReason
mcmire Nov 17, 2025
e2eba7a
Fix an issue with onAvailable
mcmire Nov 17, 2025
246b2b5
Reduce the diff
mcmire Nov 17, 2025
199bb79
Fix tests
mcmire Nov 17, 2025
ff6d832
Use a quasi-enum for the availability status
mcmire Nov 17, 2025
fa66813
Fix test
mcmire Nov 17, 2025
9d090e9
Revamp NetworkController RPC endpoint events
mcmire Nov 14, 2025
0da865b
Remove this comment
mcmire Nov 17, 2025
b3909af
Add 'degraded' status
mcmire Nov 17, 2025
6b628d7
Use similar terminology as in createServicePolicy
mcmire Nov 17, 2025
4a3985a
Merge branch 'update-create-service-policy' into update-network-contr…
mcmire Nov 17, 2025
2d38446
Adjust createServicePolicy as well
mcmire Nov 17, 2025
3d8da80
Adjust createServicePolicy as well
mcmire Nov 17, 2025
7860897
Merge branch 'update-create-service-policy' into update-network-contr…
mcmire Nov 18, 2025
f67839a
Update some of the terminology
mcmire Nov 18, 2025
110cb0b
Update more of the terminology
mcmire Nov 18, 2025
b16597a
RpcEndpointUnvailable -> RpcEndpointUnavailable
mcmire Nov 18, 2025
137c3d7
Merge branch 'main' into update-network-controller-rpc-endpoint-events
mcmire Nov 19, 2025
cb498e3
Address Cursor comment
mcmire Nov 19, 2025
cdd8491
Make the RPC endpoint event tests more realistic, and address Cursor …
mcmire Nov 19, 2025
73017d1
Reword JSDoc for events, remove endpointUrl from chainUnavailable event
mcmire Nov 20, 2025
7de4def
Merge branch 'main' into update-network-controller-rpc-endpoint-events
mcmire Nov 20, 2025
026ce00
Use the primaryEndpointUrl from the Cockatiel event in the messenger …
mcmire Nov 20, 2025
0e13332
Merge branch 'main' into update-network-controller-rpc-endpoint-events
mcmire Nov 21, 2025
74d9d0f
Remove RpcService.getLastInnerFailureReason, use lastError directly
mcmire Nov 21, 2025
bbbfa9c
Remove redundant tag in log
mcmire Nov 21, 2025
4f9d84b
data -> context
mcmire Nov 21, 2025
1862024
Remove unused utility type
mcmire Nov 21, 2025
2f29875
Tweak log statement
mcmire Nov 21, 2025
827696f
Tweak comment
mcmire Nov 21, 2025
72de1d6
Revert log change
mcmire Nov 21, 2025
0f55384
Add comment for why we are ignoring 'isolated'
mcmire Nov 21, 2025
067a807
refactor(network-controller): improve RPC service chain configuration
cryptodev-2s Nov 25, 2025
4b57f32
refactor(network-controller): replace magic number with calculated co…
cryptodev-2s Nov 25, 2025
056703f
test(network-controller): rollback test description change
cryptodev-2s Nov 25, 2025
321ff7c
Group RPC endpoint event tests by failover enabled/disabled
cryptodev-2s Nov 25, 2025
7b9c736
docs: add JSDoc for getError helper in createNetworkClient
cryptodev-2s Nov 25, 2025
2c35648
refactor: remove primaryEndpointUrl from chain-level RPC events
cryptodev-2s Nov 25, 2025
30a9486
docs: clarify event payload changes in network-controller changelog
cryptodev-2s Nov 26, 2025
4e7a37a
fix: add undefined check for error in onServiceBreak handler
cryptodev-2s Nov 26, 2025
61bdbbc
refactor: improve type safety for getError function
cryptodev-2s Nov 26, 2025
9fea617
fix: prevent race condition in onBreak chain status check
cryptodev-2s Nov 26, 2025
e4459ee
revert: remove previousStatus check that introduced race condition
cryptodev-2s Nov 26, 2025
c59898d
test: fix test to properly verify degraded then recover scenario
cryptodev-2s Nov 26, 2025
bcc840f
Remove primaryEndpointUrl from onBreak, onDegraded, and onAvailable e…
cryptodev-2s Nov 26, 2025
3918d08
Remove endpointUrl from chain-level RPC events
cryptodev-2s Nov 26, 2025
916a0e2
Fix typo: correct tertiary endpoint URL in test
cryptodev-2s Nov 26, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions packages/controller-utils/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- Add `getCircuitState` method to `ServicePolicy` ([#7164](https://github.com/MetaMask/core/pull/7164))
- This can be used when working with a chain of services to know whether a service's underlying circuit is open or closed.
- Add `onAvailable` method to `ServicePolicy` ([#7164](https://github.com/MetaMask/core/pull/7164))
- This can be used to listen for the initial successful execution of the service, or the first successful execution after the service becomes degraded or the circuit breaks.
- Add `reset` method to `ServicePolicy` ([#7164](https://github.com/MetaMask/core/pull/7164))
- This can be used when working with a chain of services to reset the state of the circuit breaker policy (e.g. if a primary recovers and we want to reset the failovers).
- Export `CockatielEventEmitter` and `CockatielFailureReason` from Cockatiel ([#7164](https://github.com/MetaMask/core/pull/7164))
- These can be used to further transform types for event emitters/listeners.

## [11.15.0]

### Added
Expand Down
2,893 changes: 1,822 additions & 1,071 deletions packages/controller-utils/src/create-service-policy.test.ts

Large diffs are not rendered by default.

104 changes: 96 additions & 8 deletions packages/controller-utils/src/create-service-policy.ts
Original file line number Diff line number Diff line change
Expand Up @@ -23,14 +23,15 @@ import type {

export {
BrokenCircuitError,
CockatielEventEmitter,
CircuitState,
ConstantBackoff,
ExponentialBackoff,
handleAll,
handleWhen,
};

export type { CockatielEvent };
export type { CockatielEvent, FailureReason as CockatielFailureReason };

/**
* The options for `createServicePolicy`.
Expand Down Expand Up @@ -85,12 +86,21 @@ export type ServicePolicy = IPolicy & {
* maximum consecutive failures is reached.
*/
circuitBreakDuration: number;
/**
* @returns The state of the underlying circuit.
*/
getCircuitState: () => CircuitState;
/**
* If the circuit is open and ongoing requests are paused, returns the number
* of milliseconds before the requests will be attempted again. If the circuit
* is not open, returns null.
*/
getRemainingCircuitOpenDuration: () => number | null;
/**
* Resets the internal circuit breaker policy (if it is open, it will now be
* closed).
*/
reset: () => void;
/**
* The Cockatiel retry policy that the service policy uses internally.
*/
Expand All @@ -108,6 +118,12 @@ export type ServicePolicy = IPolicy & {
* number of consecutive failures has been reached.
*/
onDegraded: CockatielEvent<FailureReason<unknown> | void>;
/**
* A function which is called when the service succeeds for the first time,
* or when the service fails enough times to cause the circuit to break and
* then recovers.
*/
onAvailable: CockatielEvent<void>;
/**
* A function which will be called by the retry policy each time the service
* fails and the policy kicks off a timer to re-run the service. This is
Expand All @@ -127,6 +143,25 @@ type InternalCircuitState =
}
| { state: Exclude<CircuitState, CircuitState.Open> };

/**
* List of avalability statuses.
*
* Used to keep track of whether the `onAvailable` event should be fired.
*/
const AVAILABILITY_STATUSES = {
Unknown: 'unknown',
Available: 'available',
Unavailable: 'unavailable',
};

/**
* An availability status.
*
* Used to keep track of whether the `onAvailable` event should be fired.
*/
type AvailabilityStatus =
(typeof AVAILABILITY_STATUSES)[keyof typeof AVAILABILITY_STATUSES];

/**
* The maximum number of times that a failing service should be re-run before
* giving up.
Expand Down Expand Up @@ -249,6 +284,8 @@ export function createServicePolicy(
backoff = new ExponentialBackoff(),
} = options;

let availabilityStatus: AvailabilityStatus = AVAILABILITY_STATUSES.Unknown;

const retryPolicy = retry(retryFilterPolicy, {
// Note that although the option here is called "max attempts", it's really
// maximum number of *retries* (attempts past the initial attempt).
Expand All @@ -259,6 +296,7 @@ export function createServicePolicy(
});
const onRetry = retryPolicy.onRetry.bind(retryPolicy);

const consecutiveBreaker = new ConsecutiveBreaker(maxConsecutiveFailures);
const circuitBreakerPolicy = circuitBreaker(handleWhen(isServiceFailure), {
// While the circuit is open, any additional invocations of the service
// passed to the policy (either via automatic retries or by manually
Expand All @@ -267,7 +305,7 @@ export function createServicePolicy(
// service will be allowed to run again. If the service succeeds, the
// circuit will close, otherwise it will remain open.
halfOpenAfter: circuitBreakDuration,
breaker: new ConsecutiveBreaker(maxConsecutiveFailures),
breaker: consecutiveBreaker,
});

let internalCircuitState: InternalCircuitState = getInternalCircuitState(
Expand All @@ -276,27 +314,56 @@ export function createServicePolicy(
circuitBreakerPolicy.onStateChange((state) => {
internalCircuitState = getInternalCircuitState(state);
});

circuitBreakerPolicy.onBreak(() => {
availabilityStatus = AVAILABILITY_STATUSES.Unavailable;
});
const onBreak = circuitBreakerPolicy.onBreak.bind(circuitBreakerPolicy);

const onDegradedEventEmitter =
new CockatielEventEmitter<FailureReason<unknown> | void>();
const onDegraded = onDegradedEventEmitter.addListener;

const onAvailableEventEmitter = new CockatielEventEmitter<void>();
const onAvailable = onAvailableEventEmitter.addListener;

retryPolicy.onGiveUp((data) => {
if (circuitBreakerPolicy.state === CircuitState.Closed) {
onDegradedEventEmitter.emit(data);
}
});
retryPolicy.onSuccess(({ duration }) => {
if (
circuitBreakerPolicy.state === CircuitState.Closed &&
duration > degradedThreshold
) {
onDegradedEventEmitter.emit();
if (circuitBreakerPolicy.state === CircuitState.Closed) {
if (duration > degradedThreshold) {
onDegradedEventEmitter.emit();
} else if (availabilityStatus !== AVAILABILITY_STATUSES.Available) {
availabilityStatus = AVAILABILITY_STATUSES.Available;
onAvailableEventEmitter.emit();
}
}
});
const onDegraded = onDegradedEventEmitter.addListener;

// Every time the retry policy makes an attempt, it executes the circuit
// breaker policy, which executes the service.
//
// Calling:
//
// policy.execute(() => {
// // do what the service does
// })
//
// is equivalent to:
//
// retryPolicy.execute(() => {
// circuitBreakerPolicy.execute(() => {
// // do what the service does
// });
// });
//
// So if the retry policy succeeds or fails, it is because the circuit breaker
// policy succeeded or failed. And if there are any event listeners registered
// on the retry policy, by the time they are called, the state of the circuit
// breaker will have already changed.
const policy = wrap(retryPolicy, circuitBreakerPolicy);

const getRemainingCircuitOpenDuration = () => {
Expand All @@ -306,14 +373,35 @@ export function createServicePolicy(
return null;
};

const getCircuitState = () => {
return circuitBreakerPolicy.state;
};

const reset = () => {
// Set the state of the policy to "isolated" regardless of its current state
const { dispose } = circuitBreakerPolicy.isolate();
// Reset the state to "closed"
dispose();

// Reset the counter on the breaker as well
consecutiveBreaker.success();

// Re-initialize the availability status so that if the service is executed
// successfully, onAvailable listeners will be called again
availabilityStatus = AVAILABILITY_STATUSES.Unknown;
};

return {
...policy,
circuitBreakerPolicy,
circuitBreakDuration,
getCircuitState,
getRemainingCircuitOpenDuration,
reset,
retryPolicy,
onBreak,
onDegraded,
onAvailable,
onRetry,
};
}
1 change: 1 addition & 0 deletions packages/controller-utils/src/index.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ describe('@metamask/controller-utils', () => {
Array [
"BrokenCircuitError",
"CircuitState",
"CockatielEventEmitter",
"ConstantBackoff",
"DEFAULT_CIRCUIT_BREAK_DURATION",
"DEFAULT_DEGRADED_THRESHOLD",
Expand Down
2 changes: 2 additions & 0 deletions packages/controller-utils/src/index.ts
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
export {
BrokenCircuitError,
CircuitState,
CockatielEventEmitter,
ConstantBackoff,
DEFAULT_CIRCUIT_BREAK_DURATION,
DEFAULT_DEGRADED_THRESHOLD,
Expand All @@ -14,6 +15,7 @@ export {
export type {
CockatielEvent,
CreateServicePolicyOptions,
CockatielFailureReason,
ServicePolicy,
} from './create-service-policy';
export {
Expand Down
14 changes: 14 additions & 0 deletions packages/network-controller/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- Add `NetworkController:rpcEndpointAvailable` messenger event ([#7166](https://github.com/MetaMask/core/pull/7166))
- These are counterparts to the (new) `NetworkController:rpcEndpointUnavailable` and `NetworkController:rpcEndpointDegraded` events, but are published when a request to an RPC endpoint URL is made either initially or following a previously established degraded or unavailable status.

### Changed

- **BREAKING:** Use `InternalProvider` instead of `SafeEventEmitterProvider` ([#6796](https://github.com/MetaMask/core/pull/6796))
Expand All @@ -19,6 +24,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- In practice, this should happen rarely if ever.
- **BREAKING:** Migrate `NetworkClient` to `JsonRpcEngineV2` ([#7065](https://github.com/MetaMask/core/pull/7065))
- This ought to be unobservable, but we mark it as breaking out of an abundance of caution.
- **BREAKING:** Split up and update payload data for `NetworkController:rpcEndpoint{Degraded,Unavailable}` ([#7166](https://github.com/MetaMask/core/pull/7166))
- The existing events are now called `NetworkController:rpcEndpointInstance{Degraded,Unavailable}` and retain their present behavior.
- `NetworkController:rpcEndpointInstance{Degraded,Unavailable}` do still exist, but they are now designed to represent the entire RPC endpoint and are guaranteed to not be published multiple times in a row. In particular, `NetworkController:rpcEndpointUnavailable` is published only after trying all of the designated URLs for a particular RPC endpoint and the underlying circuit for the last URL breaks, not as each primary's or failover's circuit breaks.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, NetworkController:rpcEndpointUnavailable is published only after trying all of the designated URLs for a particular RPC endpoint and the underlying circuit for the last URL breaks, not as each primary's or failover's circuit breaks.

This change is I suppose a bit controversial, and the first version of this was slightly different before I landed on this approach. But I think it makes sense? Basically, if we can reach the network somehow, don't broadcast that it's unavailable until we're absolutely sure. That does mean that the NetworkController:rpcEndpointUnavailable may never be published for Infura networks if the failover does its job, but I think that's precisely the intent.

- The event payloads have been changed as well: `failoverEndpointUrl` has been renamed to `endpointUrl`, and `endpointUrl` has been renamed to `primaryEndpointUrl`. In addition, `networkClientId` has been added.
- **BREAKING:** Rename and update payload data for `NetworkController:rpcEndpointRequestRetried` ([#7166](https://github.com/MetaMask/core/pull/7166))
- This event is now called `NetworkController:rpcEndpointInstanceRequestRetried`
- The event payload has been changed as well: `failoverEndpointUrl` has been renamed to `endpointUrl`, and `endpointUrl` has been renamed to `primaryEndpointUrl`. In addition, `networkClientId` and `attempt` have been added.
- **BREAKING:** Update `AbstractRpcService`/`RpcServiceRequestable` to remove `{ isolated: true }` from the `onBreak` event data type ([#7166](https://github.com/MetaMask/core/pull/7166))
- This represented the error produced when `.isolate` is called on a Cockatiel circuit breaker policy, which we never do.
- Bump `@metamask/controller-utils` from `^11.14.1` to `^11.15.0` ([#7003](https://github.com/MetaMask/core/pull/7003))

### Fixed
Expand Down
1 change: 1 addition & 0 deletions packages/network-controller/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,7 @@
"@types/jest-when": "^2.7.3",
"@types/lodash": "^4.14.191",
"@types/node-fetch": "^2.6.12",
"cockatiel": "^3.1.2",
"deep-freeze-strict": "^1.1.1",
"deepmerge": "^4.2.2",
"jest": "^27.5.1",
Expand Down
Loading
Loading