-
-
Notifications
You must be signed in to change notification settings - Fork 320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Backoff-based error policy #314
base: main
Are you sure you want to change the base?
Conversation
This also has some side effects with breaking changes. Notably: 1. `error_policy` is now told about successful reconciliations. This is required so that stateful error policies can reset upon successes. 2. `error_policy` is now told the reference to the reconciled object. 3. `controller::Error` is now keyed by the object type. This isn't strictly required, but allows for some internal cleanup. 4. `Controller::run` takes a `Backoff` factory rather than a raw `error_policy`. This provides a much cleaner interface for custom policies, and the `backoff` crate provides both constant and exponential policies. If you need the extra control that the old interface provided, use the "hard-mode" `controller::applier` API.
The signature was getting unwieldy, and with the recent changes it was hard to know whether all instances had "kept up". This also gets rid of the weirdness where `error_policy` could rewrite the `ReconcilerAction` of a successful reconciliation.
One downside of this PR is that it worsens the ergonomics of error policies that differ depending on the error case... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments as I was reading through. It looks solid, even if it makes the interface sightly more awkward. I think therefore we should put the error policy on a separate instruction to make that less clumsy. Internals seem super solid though.
#[derive(Debug)] | ||
pub struct BackoffErrorPolicy<MkBackoff, B, K: RuntimeResource, Err, Ctx> { | ||
make_backoff: MkBackoff, | ||
backoffs: HashMap<ObjectRef<K>, B>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh. One back off per object. Interesting. I thought we would have one global one. This makes sense though. Probably smarter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I /think/ we should encourage backoff-per-object by default. There are cases where global backoffs make sense (gating access to some central resource for example), but you probably want to gate those by specific error cases..
@@ -261,7 +360,7 @@ where | |||
/// let cms = Api::<ConfigMap>::all(client.clone()); | |||
/// Controller::new(cmgs, ListParams::default()) | |||
/// .owns(cms, ListParams::default()) | |||
/// .run(reconcile, error_policy, context) | |||
/// .run(reconcile, || backoff::backoff::Constant::new(Duration::from_secs(60)), context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could have a .backoff builder on Controller here for readability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The downside there is that we either need to introduce a new generic or box the Backoff
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, does it get awkward if we try to do something like:
Controller::(..)
.error(|| backoff::Constant::new(Duration::from_secs(60)))
.run(reconcile, context)
I guess I wanted to have a default like what you have there without having to be explicit about the lambda.
With maybe https://docs.rs/backoff/0.2.1/backoff/type.ExponentialBackoff.html as the default if omitted.
Sorry for the slow responses here. This is ultimately great. Have approved it so merge if you are happy, otherwise there's a bit of bikeshedding in the comments (closed all my exploratory ones). |
Having thought about this some more. You seem unsure about this as well (since it is still here), and there are some downsides to this that is making me more sceptical:
I think our lack of defining a good default start time here is the most problematic. All those values (1s/10s/100s), feel sensible depending on what needs to be created of the back of a CRD. So users probably want to set a minimum time to avoid a bunch of initial errors in logs from early retries. However, doing so in ExponentialBackoff looks pretty awkward to me. Manually creating the instance, selecting a generic clock, possibly using its Default impl. I'm starting to think this is probably not worth merging at the moment. Not sure what you think? |
This adds an
error_policy
based on the backoff crate, which gives us the ability to use stateful error policies. It also comes with some useful default implementations, such as exponential backoff, retry immediately, and constant retry.This also has some side effects with breaking changes. Notably:
error_policy
is now told about successful reconciliations. This isrequired so that stateful error policies can reset upon successes.
error_policy
is now told the reference to the reconciled object.controller::Error
is now keyed by the object type. This isn't strictlyrequired, but allows for some internal cleanup.
Controller::run
takes aBackoff
factory rather than a rawerror_policy
.This provides a much cleaner interface for custom policies, and the
backoff
crate provides both constant and exponential policies. If you need the extra
control that the old interface provided, use the "hard-mode"
controller::applier
API.Fixes #297. Arguably fixes #34.