43 commits
74f4283
Circuit Breakers on Subgraph Execution
ardatan Apr 8, 2026
505dd34
docs: update documentation
theguild-bot Apr 8, 2026
17d2853
fixes
ardatan Apr 8, 2026
42ec510
docs: update documentation
theguild-bot Apr 8, 2026
541f998
Update lib/executor/src/executors/map.rs
ardatan Apr 8, 2026
e147a33
..
ardatan Apr 8, 2026
8a54b57
.
ardatan Apr 8, 2026
c18b1c3
docs: update documentation
theguild-bot Apr 8, 2026
a38a733
Changeset
ardatan Apr 9, 2026
8d4d2c1
..
ardatan Apr 9, 2026
647c8c6
Update lib/router-config/src/traffic_shaping.rs
ardatan Apr 9, 2026
ef7300e
Update lib/executor/Cargo.toml
ardatan Apr 9, 2026
0f9b01e
docs: update documentation
theguild-bot Apr 9, 2026
0f2c88a
Update docs/README.md
ardatan Apr 9, 2026
6602155
Address
ardatan Apr 9, 2026
fbec585
addressed
ardatan Apr 9, 2026
18de8ce
docs: update documentation
theguild-bot Apr 9, 2026
dda0978
Go
ardatan Apr 9, 2026
1da6569
..
ardatan Apr 9, 2026
5797f34
docs: update documentation
theguild-bot Apr 9, 2026
967fbb2
Fix typo
ardatan Apr 9, 2026
965ddde
docs: update documentation
theguild-bot Apr 9, 2026
0fb7484
Improvements
ardatan Apr 9, 2026
b249e93
Merge origin/main into circuit-breaker-subgraph: resolve conflicts in…
Copilot Apr 15, 2026
7a15ead
docs: regenerate README from merged schema
Copilot Apr 15, 2026
fcacdb7
Merge main into circuit-breaker-subgraph and resolve conflicts
Copilot Apr 15, 2026
e547ed8
Merge branch 'main' into circuit-breaker-subgraph
ardatan Apr 15, 2026
fca1aa9
Merge origin/main into circuit-breaker-subgraph: resolve conflicts
Copilot Apr 29, 2026
7489919
Address Devin comment
ardatan Apr 29, 2026
f63509c
docs: update documentation
theguild-bot Apr 29, 2026
40c27f0
fix: clone circuit breaker before await to avoid DashMap lock held ac…
Copilot Apr 29, 2026
8f852f7
..
ardatan Apr 30, 2026
237a3e4
5XX as errors tests
ardatan Apr 30, 2026
e3daccf
More
ardatan Apr 30, 2026
ce48ba7
docs: update documentation
theguild-bot Apr 30, 2026
d05da9c
Update JSON schema
ardatan Apr 30, 2026
24908e6
docs: update documentation
theguild-bot Apr 30, 2026
9ae8038
Lets go
ardatan May 4, 2026
22f32d6
Merge branch 'main' into circuit-breaker-subgraph
ardatan May 8, 2026
1cb1e24
docs: update documentation
theguild-bot May 8, 2026
7a5f42a
Reorder percentage module in mod.rs
ardatan May 8, 2026
ac63b56
Merge branch 'main' into circuit-breaker-subgraph
ardatan May 8, 2026
7d15c8f
Merge branch 'main' into circuit-breaker-subgraph
ardatan May 11, 2026
11 changes: 11 additions & 0 deletions .changeset/circuit-breaker.md
@@ -0,0 +1,11 @@
---
hive-router: patch
hive-router-plan-executor: patch
hive-router-config: patch
---

# Implement Circuit Breaker for Subgraph Requests

This change introduces a circuit breaker for subgraph requests in the Hive Router. The circuit breaker monitors the success and failure rates of requests to each subgraph and opens when the failure rate exceeds the configured `error_threshold`. While the circuit breaker is open, requests to that subgraph fail immediately without being sent; after `reset_timeout`, the router tries the subgraph again.

This implementation helps improve the resilience and stability of the Hive Router when dealing with unreliable subgraphs.
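
The behaviour described above can be sketched as a small state machine. This is a minimal illustration using only the standard library, not the code from this PR: the `CircuitBreaker` type, its methods, and the cumulative counters are assumptions made for the example (a production breaker would typically evaluate a sliding window), and only the three parameter names come from the configuration keys.

```rust
use std::time::{Duration, Instant};

/// Breaker states in the usual closed / open / half-open model.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Closed,
    Open { since: Instant },
    HalfOpen,
}

/// Hypothetical breaker; field names mirror the config keys in this PR.
pub struct CircuitBreaker {
    error_threshold: f64,  // fraction in [0, 1], e.g. 0.5 for "50%"
    volume_threshold: u32, // minimum calls before the rate is evaluated
    reset_timeout: Duration,
    successes: u32,
    failures: u32,
    state: State,
}

impl CircuitBreaker {
    pub fn new(error_threshold: f64, volume_threshold: u32, reset_timeout: Duration) -> Self {
        Self {
            error_threshold,
            volume_threshold,
            reset_timeout,
            successes: 0,
            failures: 0,
            state: State::Closed,
        }
    }

    pub fn state(&self) -> State {
        self.state
    }

    /// Returns true if a request may be sent at `now`.
    pub fn allow(&mut self, now: Instant) -> bool {
        let state = self.state;
        match state {
            State::Closed | State::HalfOpen => true,
            State::Open { since } => {
                if now.duration_since(since) >= self.reset_timeout {
                    // The timeout elapsed: allow probe requests again.
                    self.state = State::HalfOpen;
                    true
                } else {
                    false // short-circuit without contacting the subgraph
                }
            }
        }
    }

    /// Records the outcome of a request that was allowed through.
    pub fn record(&mut self, success: bool, now: Instant) {
        let state = self.state;
        match state {
            State::HalfOpen => {
                if success { self.reset() } else { self.trip(now) }
            }
            State::Closed => {
                if success { self.successes += 1 } else { self.failures += 1 }
                let total = self.successes + self.failures;
                if total >= self.volume_threshold {
                    let rate = f64::from(self.failures) / f64::from(total);
                    if rate >= self.error_threshold {
                        self.trip(now);
                    }
                }
            }
            State::Open { .. } => {} // late results while open are ignored
        }
    }

    fn trip(&mut self, now: Instant) {
        self.state = State::Open { since: now };
        self.successes = 0;
        self.failures = 0;
    }

    fn reset(&mut self) {
        self.state = State::Closed;
        self.successes = 0;
        self.failures = 0;
    }
}
```

Passing `now` explicitly keeps the sketch deterministic in tests. Whether the real breaker opens at or strictly above the threshold is not stated in this PR; the sketch opens at `>=`.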
2 changes: 2 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -87,6 +87,7 @@ strum = { version = "0.28.0", features = ["derive"] }
mockito = "1.7.0"
futures-util = "0.3.31"
axum = "0.8.4"
recloser = "1.3.1"

# Telemetry
opentelemetry = "0.31.0"
46 changes: 44 additions & 2 deletions docs/README.md
@@ -21,7 +21,7 @@
|[**subscriptions**](#subscriptions)|`object`|Configuration for subscriptions.<br/>Default: `{"broadcast_capacity":0,"enabled":false}`<br/>||
|[**supergraph**](#supergraph)|`object`|Configuration for the Federation supergraph source. By default, the router will use a local file-based supergraph source (`./supergraph.graphql`).<br/>||
|[**telemetry**](#telemetry)|`object`|Default: `{"client_identification":{"name_header":"graphql-client-name","version_header":"graphql-client-version"},"hive":null,"metrics":{"exporters":[],"instrumentation":{"common":{"histogram":{"aggregation":"explicit","bytes":{"buckets":[128,512,1024,2048,4096,8192,16384,32768,65536,131072,262144,524288,1048576,2097152,3145728,4194304,5242880],"record_min_max":false},"seconds":{"buckets":[0.005,0.01,0.025,0.05,0.075,0.1,0.25,0.5,0.75,1,2.5,5,7.5,10],"record_min_max":false}}},"instruments":{}}},"resource":{"attributes":{}},"tracing":{"collect":{"max_attributes_per_event":16,"max_attributes_per_link":32,"max_attributes_per_span":128,"max_events_per_span":128,"parent_based_sampler":false,"sampling":1},"exporters":[],"instrumentation":{"spans":{"mode":"spec_compliant"}},"propagation":{"b3":false,"baggage":false,"jaeger":false,"trace_context":true}}}`<br/>||
|[**traffic\_shaping**](#traffic_shaping)|`object`|Configuration for the traffic-shaping of the executor. Use these configurations to control how requests are being executed to subgraphs.<br/>Default: `{"all":{"dedupe_enabled":true,"pool_idle_timeout":"50s","request_timeout":"30s"},"max_connections_per_host":100,"router":{"dedupe":{"enabled":false,"headers":"all"},"max_long_lived_clients":128,"request_timeout":"1m"}}`<br/>||
|[**traffic\_shaping**](#traffic_shaping)|`object`|Configuration for the traffic-shaping of the executor. Use these configurations to control how requests are being executed to subgraphs.<br/>Default: `{"all":{"circuit_breaker":null,"dedupe_enabled":true,"pool_idle_timeout":"50s","request_timeout":"30s"},"max_connections_per_host":100,"router":{"dedupe":{"enabled":false,"headers":"all"},"max_long_lived_clients":128,"request_timeout":"1m"}}`<br/>||
|[**websocket**](#websocket)|`object`|Configuration of router's WebSocket server.<br/>Default: `{"enabled":false,"headers":{"persist":false,"source":"connection"},"path":null}`<br/>||

**Additional Properties:** not allowed
@@ -195,6 +195,7 @@ telemetry:
      trace_context: true
traffic_shaping:
  all:
    circuit_breaker: null
    dedupe_enabled: true
    pool_idle_timeout: 50s
    request_timeout: 30s
@@ -3040,7 +3041,7 @@ Configuration for the traffic-shaping of the executor. Use these configurations

|Name|Type|Description|Required|
|----|----|-----------|--------|
|[**all**](#traffic_shapingall)|`object`|The default configuration that will be applied to all subgraphs, unless overridden by a specific subgraph configuration.<br/>Default: `{"dedupe_enabled":true,"pool_idle_timeout":"50s","request_timeout":"30s"}`<br/>||
|[**all**](#traffic_shapingall)|`object`|The default configuration that will be applied to all subgraphs, unless overridden by a specific subgraph configuration.<br/>Default: `{"circuit_breaker":null,"dedupe_enabled":true,"pool_idle_timeout":"50s","request_timeout":"30s"}`<br/>||
|**max\_connections\_per\_host**|`integer`|Limits the concurrent amount of requests/connections per host/subgraph.<br/>Default: `100`<br/>Format: `"uint"`<br/>Minimum: `0`<br/>||
|[**router**](#traffic_shapingrouter)|`object`|Configuration for the router itself, e.g., for handling incoming requests, or other router-level traffic shaping configurations.<br/>Default: `{"dedupe":{"enabled":false,"headers":"all"},"max_long_lived_clients":128,"request_timeout":"1m"}`<br/>||
|[**subgraphs**](#traffic_shapingsubgraphs)|`object`|Optional per-subgraph configurations that will override the default configuration for specific subgraphs.<br/>||
@@ -3050,6 +3051,7 @@ Configuration for the traffic-shaping of the executor. Use these configurations

```yaml
all:
  circuit_breaker: null
  dedupe_enabled: true
  pool_idle_timeout: 50s
  request_timeout: 30s
@@ -3073,6 +3075,7 @@ The default configuration that will be applied to all subgraphs, unless overridd

|Name|Type|Description|Required|
|----|----|-----------|--------|
|[**circuit\_breaker**](#traffic_shapingallcircuit_breaker)|`object`, `null`|Circuit Breaker configuration for all subgraphs.<br/>||
|**dedupe\_enabled**|`boolean`|Enables/disables request deduplication to subgraphs.<br/><br/>When requests exactly match the hashing mechanism (e.g., subgraph name, URL, headers, query, variables), and are executed at the same time, they will<br/>be deduplicated by sharing the response of other in-flight requests.<br/>Default: `true`<br/>||
|**pool\_idle\_timeout**|`string`|Timeout for idle sockets being kept-alive.<br/>Default: `"50s"`<br/>||
|**request\_timeout**||Optional timeout configuration for requests to subgraphs.<br/><br/>Example with a fixed duration:<br/>```yaml<br/> timeout:<br/> duration: 5s<br/>```<br/><br/>Or with a VRL expression that can return a duration based on the operation kind:<br/>```yaml<br/> timeout:<br/> expression: \|<br/> if (.request.operation.type == "mutation") {<br/> "10s"<br/> } else {<br/> "15s"<br/> }<br/>```<br/>Default: `"30s"`<br/>||
@@ -3081,12 +3084,32 @@ The default configuration that will be applied to all subgraphs, unless overridd
**Example**

```yaml
circuit_breaker: null
dedupe_enabled: true
pool_idle_timeout: 50s
request_timeout: 30s

```

<a name="traffic_shapingallcircuit_breaker"></a>
#### traffic\_shaping\.all\.circuit\_breaker: object,null

Circuit Breaker configuration for all subgraphs.
When the circuit breaker is open, requests to the subgraph will be
short-circuited and an error will be returned to the client.
The circuit breaker opens when the error rate of requests to the subgraph exceeds `error_threshold`, and it attempts to reset after `reset_timeout`.


**Properties**

|Name|Type|Description|Required|
|----|----|-----------|--------|
|**enabled**|`boolean`|Enable or disable the circuit breaker for the subgraph.<br/>Default: `false` (circuit breaker is disabled)<br/>||
|**error\_threshold**|`string`|Error-rate percentage above which the circuit breaker opens.<br/>Default: 50%<br/>||
|**reset\_timeout**|`string`|The duration after which the circuit breaker will attempt to retry sending requests to the subgraph.<br/>Default: 30s<br/>||
|**volume\_threshold**|`integer`, `null`|Minimum number of requests to observe before the error rate is evaluated.<br/>Default: 5<br/>Format: `"uint"`<br/>Minimum: `0`<br/>||

**Additional Properties:** not allowed
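
To make the interplay of `error_threshold` and `volume_threshold` concrete, here is a minimal sketch of how a percentage string such as `50%` could be parsed and then checked against request counts. Both `parse_percentage` and `should_open` are hypothetical helpers written for this example; the router's actual parsing and comparison rules may differ (for instance, on whether the threshold itself counts as exceeded).

```rust
/// Parses strings such as "50%" or "12.5%" into a fraction in [0.0, 1.0].
/// Hypothetical helper, not the router's own parser.
pub fn parse_percentage(input: &str) -> Result<f64, String> {
    let trimmed = input.trim();
    let number = trimmed
        .strip_suffix('%')
        .ok_or_else(|| format!("expected a trailing '%': {:?}", trimmed))?;
    let value: f64 = number
        .trim()
        .parse()
        .map_err(|e| format!("invalid number {:?}: {}", number, e))?;
    // Rejects negatives, values above 100, NaN, and infinities.
    if !(0.0..=100.0).contains(&value) {
        return Err(format!("percentage out of range 0-100: {}", value));
    }
    Ok(value / 100.0)
}

/// Returns true when enough requests were observed (`volume_threshold`)
/// and the failure fraction has reached `error_threshold`.
pub fn should_open(failures: u32, total: u32, error_threshold: f64, volume_threshold: u32) -> bool {
    total > 0
        && total >= volume_threshold
        && f64::from(failures) / f64::from(total) >= error_threshold
}
```

With the documented defaults (`50%`, `5`), three failures out of five requests would open the breaker in this sketch, while three failures out of four would not, because the volume threshold has not been reached yet.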
<a name="traffic_shapingrouter"></a>
### traffic\_shaping\.router: object

@@ -3151,10 +3174,29 @@ Optional per-subgraph configurations that will override the default configuratio

|Name|Type|Description|Required|
|----|----|-----------|--------|
|[**circuit\_breaker**](#traffic_shapingsubgraphsadditionalpropertiescircuit_breaker)|`object`, `null`|Circuit Breaker configuration for the subgraph.<br/>||
|**dedupe\_enabled**|`boolean`, `null`|Enables/disables request deduplication to subgraphs.<br/><br/>When requests exactly match the hashing mechanism (e.g., subgraph name, URL, headers, query, variables), and are executed at the same time, they will<br/>be deduplicated by sharing the response of other in-flight requests.<br/>||
|**pool\_idle\_timeout**|`string`, `null`|Timeout for idle sockets being kept-alive.<br/>||
|**request\_timeout**||Optional timeout configuration for requests to subgraphs.<br/><br/>Example with a fixed duration:<br/>```yaml<br/> timeout:<br/> duration: 5s<br/>```<br/><br/>Or with a VRL expression that can return a duration based on the operation kind:<br/>```yaml<br/> timeout:<br/> expression: \|<br/> if (.request.operation.type == "mutation") {<br/> "10s"<br/> } else {<br/> "15s"<br/> }<br/>```<br/>||

**Additional Properties:** not allowed
<a name="traffic_shapingsubgraphsadditionalpropertiescircuit_breaker"></a>
##### traffic\_shaping\.subgraphs\.additionalProperties\.circuit\_breaker: object,null

Circuit Breaker configuration for the subgraph.
When the circuit breaker is open, requests to the subgraph will be short-circuited and an error will be returned to the client.
The circuit breaker opens when the error rate of requests to the subgraph exceeds `error_threshold`, and it attempts to reset after `reset_timeout`.


**Properties**

|Name|Type|Description|Required|
|----|----|-----------|--------|
|**enabled**|`boolean`|Enable or disable the circuit breaker for the subgraph.<br/>Default: `false` (circuit breaker is disabled)<br/>||
|**error\_threshold**|`string`|Error-rate percentage above which the circuit breaker opens.<br/>Default: 50%<br/>||
|**reset\_timeout**|`string`|The duration after which the circuit breaker will attempt to retry sending requests to the subgraph.<br/>Default: 30s<br/>||
|**volume\_threshold**|`integer`, `null`|Minimum number of requests to observe before the error rate is evaluated.<br/>Default: 5<br/>Format: `"uint"`<br/>Minimum: `0`<br/>||

**Additional Properties:** not allowed
<a name="websocket"></a>
## websocket: object
13 changes: 13 additions & 0 deletions e2e/configs/circuit_breaker_global.router.yaml
@@ -0,0 +1,13 @@
# yaml-language-server: $schema=../../router-config.schema.json
supergraph:
Review comment (Contributor): Can't we have inlined configs instead?

Review comment (Contributor): I think those are not even used

Review comment (Member Author): You're right. They are not used, removed!

  source: file
  path: ../supergraph.graphql
traffic_shaping:
  all:
    circuit_breaker:
      enabled: true
      error_threshold: 50%
      volume_threshold: 5
      reset_timeout: 30s
    # Disable deduplication to better test circuit breaker behavior
    dedupe_enabled: false
18 changes: 18 additions & 0 deletions e2e/configs/circuit_breaker_mixed.router.yaml
@@ -0,0 +1,18 @@
# yaml-language-server: $schema=../../router-config.schema.json
supergraph:
  source: file
  path: ../supergraph.graphql
traffic_shaping:
  all:
    circuit_breaker:
      enabled: true
      error_threshold: 50%
      volume_threshold: 10
      reset_timeout: 30s
    dedupe_enabled: false
  subgraphs:
    accounts:
      circuit_breaker:
        enabled: true
        volume_threshold: 3 # Override only volume_threshold
        # error_threshold and reset_timeout inherit from global
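
The inheritance described in the comments of the mixed config above (the `accounts` subgraph overrides only `volume_threshold` and takes the rest from `all`) can be sketched as a field-by-field merge. The `CircuitBreakerConfig` and `CircuitBreakerOverride` types and the `with_override` method are hypothetical names for illustration; the diff shown here does not include the router's actual merge logic.

```rust
use std::time::Duration;

/// Fully resolved settings (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct CircuitBreakerConfig {
    pub enabled: bool,
    pub error_threshold: f64, // fraction, e.g. 0.5 for "50%"
    pub volume_threshold: u32,
    pub reset_timeout: Duration,
}

/// Per-subgraph settings; `None` means "inherit from the global config".
#[derive(Debug, Clone, Default)]
pub struct CircuitBreakerOverride {
    pub enabled: Option<bool>,
    pub error_threshold: Option<f64>,
    pub volume_threshold: Option<u32>,
    pub reset_timeout: Option<Duration>,
}

impl CircuitBreakerConfig {
    /// Applies a per-subgraph override on top of the global values.
    pub fn with_override(&self, o: &CircuitBreakerOverride) -> Self {
        Self {
            enabled: o.enabled.unwrap_or(self.enabled),
            error_threshold: o.error_threshold.unwrap_or(self.error_threshold),
            volume_threshold: o.volume_threshold.unwrap_or(self.volume_threshold),
            reset_timeout: o.reset_timeout.unwrap_or(self.reset_timeout),
        }
    }
}
```

Under these assumptions, merging the `accounts` override into the global block yields `volume_threshold: 3` with `error_threshold: 50%` and `reset_timeout: 30s` carried over unchanged.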
22 changes: 22 additions & 0 deletions e2e/configs/circuit_breaker_per_subgraph.router.yaml
@@ -0,0 +1,22 @@
# yaml-language-server: $schema=../../router-config.schema.json
supergraph:
  source: file
  path: ../supergraph.graphql
traffic_shaping:
  all:
    circuit_breaker:
      enabled: false # Disabled globally
    dedupe_enabled: false
  subgraphs:
    accounts:
      circuit_breaker:
        enabled: true
        error_threshold: 60%
        volume_threshold: 3
        reset_timeout: 10s
    products:
      circuit_breaker:
        enabled: true
        error_threshold: 70%
        volume_threshold: 4
        reset_timeout: 15s