Description
The intent of the persistence circuit-breaker is to shed load when that load is likely to fail due to an overload of the persistence store or loss of connectivity. How desirable this behavior is may differ from persistence plugin to persistence plugin.
For instance, if the persistence backend is sharded in some way (e.g. Cassandra, DynamoDB, or the R2DBC plugin's database partitioning/sharding support), the success or failure of one call may tell you little about whether the next call will succeed. This (as Marc Brooker argues in https://brooker.co.za/blog/2022/02/16/circuit-breakers.html) suggests that the breaker will often make poor decisions:
- if only some fraction of operations go to failing shards, then it's unlikely that enough operations will fail consecutively to ever open the breaker (in which case the overhead of having a breaker might not be worth the meager benefit)
- conversely, if there happens to be a burst of operations (e.g. due to retries) against the failing shards, the circuit breaker escalates the partial failure of losing a shard into a total failure
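The two cases above can be illustrated with a minimal sketch. It assumes a deliberately simplified breaker that opens after a fixed number of consecutive failures and never half-opens; `ConsecutiveFailureBreaker` and `SharedBreakerDemo` are hypothetical names for this illustration, not Akka's `CircuitBreaker` API (which also has call timeouts and a reset timeout).

```java
import java.util.List;

// Hypothetical, simplified breaker: opens after maxFailures consecutive failures.
// It never half-opens, which is enough to illustrate the two cases.
final class ConsecutiveFailureBreaker {
    private final int maxFailures;
    private int consecutiveFailures = 0;
    private boolean open = false;

    ConsecutiveFailureBreaker(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    boolean isOpen() {
        return open;
    }

    // Returns true if the call was attempted, false if the breaker rejected it.
    boolean call(boolean operationSucceeds) {
        if (open) {
            return false;
        }
        if (operationSucceeds) {
            consecutiveFailures = 0; // any success resets the count
        } else if (++consecutiveFailures >= maxFailures) {
            open = true;
        }
        return true;
    }
}

final class SharedBreakerDemo {
    // Sends one operation per shard id through a single shared breaker.
    // Operations against failingShard fail; all others succeed.
    // Returns how many operations against healthy shards were rejected.
    static int rejectedHealthyCalls(ConsecutiveFailureBreaker breaker,
                                    List<Integer> shardIds,
                                    int failingShard) {
        int rejected = 0;
        for (int shard : shardIds) {
            boolean healthy = shard != failingShard;
            boolean attempted = breaker.call(healthy);
            if (!attempted && healthy) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // Case 1: failures interleaved with successes never reach the threshold.
        ConsecutiveFailureBreaker interleaved = new ConsecutiveFailureBreaker(3);
        int r1 = rejectedHealthyCalls(interleaved, List.of(0, 1, 2, 3, 0, 1, 2, 3), 0);
        System.out.println("interleaved: open=" + interleaved.isOpen() + " rejectedHealthy=" + r1);

        // Case 2: a burst against the failing shard opens the breaker for everyone.
        ConsecutiveFailureBreaker burst = new ConsecutiveFailureBreaker(3);
        int r2 = rejectedHealthyCalls(burst, List.of(0, 0, 0, 1, 2, 3, 1, 2, 3), 0);
        System.out.println("burst: open=" + burst.isOpen() + " rejectedHealthy=" + r2);
    }
}
```

In the interleaved run the breaker stays closed despite a quarter of the operations failing; in the burst run every later operation is rejected, including those aimed at healthy shards.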
Letting the plugin take responsibility for the breaker would allow it to use multiple breakers (e.g. the partitioned R2DBC plugin controls the partitioning, so it could have per-partition circuit breakers without a global breaker). Other plugins (e.g. for highly available distributed DBs) might decide to opt out of the circuit breaker.
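As a rough sketch of what per-partition breakers could look like, the following keeps a separate consecutive-failure count for each partition. `PartitionedBreakers` is a hypothetical name and a simplified model (no timeouts, no half-open state); it is not how the R2DBC plugin or Akka's `CircuitBreaker` is implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one simplified breaker per partition, no global breaker.
final class PartitionedBreakers {
    private final int maxFailures;
    // Consecutive failure count per partition id; absent means zero.
    private final Map<Integer, Integer> consecutiveFailures = new HashMap<>();

    PartitionedBreakers(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    boolean isOpen(int partition) {
        return consecutiveFailures.getOrDefault(partition, 0) >= maxFailures;
    }

    // Returns true if the call was attempted, false if that partition's
    // breaker rejected it. Other partitions are unaffected.
    boolean call(int partition, boolean operationSucceeds) {
        if (isOpen(partition)) {
            return false;
        }
        if (operationSucceeds) {
            consecutiveFailures.put(partition, 0);
        } else {
            consecutiveFailures.merge(partition, 1, Integer::sum);
        }
        return true;
    }

    public static void main(String[] args) {
        PartitionedBreakers breakers = new PartitionedBreakers(3);
        // A burst of failures against partition 0 opens only its breaker.
        for (int i = 0; i < 3; i++) {
            breakers.call(0, false);
        }
        System.out.println("partition 0 open: " + breakers.isOpen(0));
        System.out.println("partition 1 attempted: " + breakers.call(1, true));
    }
}
```

With this shape, losing one partition sheds load only for that partition, so the partial failure stays partial.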
At present, plugins and users can effectively opt out of the circuit breaker by setting an unreasonably large failure threshold and a very short reset timeout, though the extra overhead of the breaker is almost surely not worth it in this case.
The default would of course be the current behavior.