Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReconcilerAction should be an enum #317

Closed
ctron opened this issue Sep 2, 2020 · 11 comments · Fixed by #851
Closed

ReconcilerAction should be an enum #317

ctron opened this issue Sep 2, 2020 · 11 comments · Fixed by #851
Labels
ergonomics ergonomics of the public interface runtime controller runtime related

Comments

@ctron
Copy link
Contributor

ctron commented Sep 2, 2020

IMHO ReconcilerAction should be an enum, rather than a struct. This would handling the outcome much simpler.

I would imagine something like:

pub enum ReconcilerAction {
  Done,
  Requeue,
  RequeueAfter(Duration),
}

Or maybe:

pub enum ReconcilerAction {
  Done,
  Requeue(Option<Duration>),
}

This would allow you to handle outcomes like this:

match do_reconcile() {
  None => {},
  Requeue => {},
  RequeueAfter(duration) => {},
}
@nightkr
Copy link
Member

nightkr commented Sep 2, 2020

From the reconciler's PoV all of those cases are already supported. (Done => requeue_after: None, Requeue => requeue_after: Some(Duration::zero()), RequeueAfter => Some(duration)). I'm not sure I'm comfortable introducing a distinction between "requeue soon" and "requeue now", especially considering that the old Soon(now) representation would still work too.

Do you have a specific use-case in mind that would require this?

@ctron
Copy link
Contributor Author

ctron commented Sep 2, 2020

From the reconciler's PoV all of those cases are already supported.

I didn't say that anything isn't support. I just looks awkward to me from a Rust perspective.

Take the Result for example, that is an enum as well and makes it easy to handle the outcome e.g. with a match.

Do you have a specific use-case in mind that would require this?

I think the applier function could benefit from it. Making it easier to understand how things get processed.

The same for the actual implementation returning a result:

fn reconcile() -> Result<ReconcilerAction> {
  Ok(ReconcileResult{requeue_after: None})
}
// or
fn reconcile() -> Result<ReconcilerAction> {
  Ok(ReconcileResult::default())
}

That looks simple awkward to me, versus:

fn reconcile() -> Result<ReconcilerAction> {
  Ok(Done)
}

So again, it is not about "making it work", it is about making a nice API that aligns with Rust concepts.

@nightkr
Copy link
Member

nightkr commented Sep 7, 2020

Take the Result for example, that is an enum as well and makes it easy to handle the outcome e.g. with a match.

Yeah.. but as a user you should rarely be matching on the ReconcileResult anyway.

The same for the actual implementation returning a result:
That looks simple awkward to me, versus:

Disabling periodic reconciliation should (IMO) feel somewhat awkward, since it's rarely something that you want to do. External systems change, and it's easy to forget to watch some subresource. Periodic reconciliation helps mitigate both of those cases (but is of course strictly worse than a proper watch).

There are only a few cases where you are truly done with a resource, and legitimately don't care anymore (for example: a Job that has completed).

@nightkr
Copy link
Member

nightkr commented Sep 7, 2020

I could agree that it would be slightly more clear to have something like:

struct ReconcilerAction {
    requeue: ReconcileDelay,
}
enum ReconcileDelay {
    Never,
    After(Duration),
}

But that seems like a very marginal gain to me. I'm not so sure about making ReconcilerAction itself into an enum, since any future actions that we'd want to enable would likely not be mutually exclusive with periodic reconciliation.

@Pscheidl
Copy link

Pscheidl commented Nov 10, 2020

I got interested in this issue :) Well, I created a small showcase pullrequest to better imagine how the change would look like. #338

I don't have all the @teozkr has,but with my limited knowledge from user's perspective, the enum is also what I would expect and the struct was a little bit unexpected. Secondly, you mention there will be a lot more actions in the future, mutually exclusive with re-scheduling the reconciliation after some delay. That seems fine to me, as there will one be one enum level with rescheduling. The ReconcilerAction enum could in future look like:

pub enum ReconcilerAction {
    Requeue(Duration),
    None,
    ActionX,
    ActionY,
    ActionZ,
    SomeOtherDelayedTaskThatOccured(Duration),
    ///etc
}

Also, the original proposal had Requeue and RequeueAfter levels, I think this could be solved with just one RequeueAfter(Duration) - just like the struct does it.

Just my two cents.

@nightkr
Copy link
Member

nightkr commented Nov 11, 2020

Secondly, you mention there will be a lot more actions in the future, mutually exclusive with re-scheduling the reconciliation after some delay.

I said they will likely not be mutually exclusive with rescheduling.

Apart from that, I don't want to promote "never requeue" as the common or easy option.. the existing ReconcilerAction API is already way to close to that..

@clux
Copy link
Member

clux commented Nov 11, 2020

Thanks a lot @Pscheidl . I agree with the enum being the more expected solution compared to master, and your PR illustrates that.

This does intersect with another case that was in development; if we wanted to add a case like ReconcilerAction::AutomaticRequeue, and use backoff crate internally to retry at increasing intervals, then this pr would clash with @teozkr 's work in #314. I had a revisit of the state of that PR now, which has legs, but it comes with some complexity.

I feel that either we expose backoff as the main setup to deal with rescheduling (#314 - which uses backoff::constant as a new way of doing RequeueAfter), or we don't deal with backoffs at all internally (and perhaps add more modes).

So the question I think is really, what are our use cases for other ReconcilerActions? Some thoughts:

  • None - should really not be the default, and should come with a warning (as teo says ^)
  • RequeAfter(static_duration) - 300s default should probably be the main default, as per controller-runtime
  • Manual - and instead we have a user method to inject a requeue, perhaps in response to an external web event?
  • ExponentialBackoff(MinimumTime) - using ideas from or all of Add Backoff-based error policy #314

Which of these are the most useful?

@Pscheidl
Copy link

The PR of mine can be discarded, it was meant just to try it out (talk is cheap, I wanted the code). My apologies to @teozkr for misreading the info above.

How will the exponential backoff work ? Ideally, as far as I understand, the controller should be failure-tolerant. E.g. finalizers stay in the resource metadata unless explicitely removed by the controller, making sure if the controller is down, nothing happens. There seems to be a hashmap in this PR: https://github.com/clux/kube-rs/pull/314/files#diff-e8024b5d2267b4aa47ca3f4fcc5d2338edbee63a282057b55fe3b13ebc28cabaR137

If the controller fails, the time of last reconciliation is lost. That's acceptable probably. Not sure. Just thinking out loud.

Another thought is the RequeueAfter - if we were somehow able to persist the times in the Kubernetes object itself, it would be easy provide users with the option to specify their own function, which would provide time of next reconciliation. This seems like a generic way to solve many problems - instead of having two levels of that enum just for requeing after certain amount of time, there could be one accepting a RequeueStrategy enum, which would provide some default options like exponential or static, and there would always be the Custom option to provide with your own fn.

Anyway, I do think exponential backoff is useful, as working deployments need to be checked periodically, often they can fail (in ours cases, users might cause them to fail) and resources have to be cleaned in case of failure. Such a period can be prolonged after a while, not to stress Kubernetes that much. At least that's how I think about it - I always try for the code to be nice to its surroundings.

@clux
Copy link
Member

clux commented Mar 17, 2022

Revisiting this after writing docs. It's clear that this interface is still one of the awkward ones, and it's front and centre, so would like to address this.

In the lifespan of this issue we have NOT had any other reasonable requests for other modes of reconcile actions to be taken. Controller-runtime has exactly mode (requeAfter) to configure - albeit with an awkward bool to set it. The only potential option that makes sense to me would be a backoff based one, but I think we already support this by adding a Backoff struct to the Context and bumping it from error_policy (and resetting it from reconcile).

Ultimately, the two options we have here are either:

  • requeue with a duration
  • wait for a (occasionally miss-able due to desyncs) change

So I propose that rather than have the escape hook be a magic requeueAfter: None have a documented enum variant with a longer name, that lets the owner bear the responsibility for retaining eventual consistency. E.g. like having a hook for reconcile_all_on or having a fast moving object.

I.e.:

pub enum Action {
    /// When to next trigger the reconciliation if no external watch triggers hit
    ///
    /// This is the best-practice action that ensures eventual consistency of your controller
    /// even in the case of missed changes (which can happen).
    ///
    /// Watch events are not normally missed, so running this once per hour as a fallback is reasonable.
    Requeue(Duration),

    /// Do nothing until a change is detected
    ///
    /// This stops the controller periodically reconciling this object until a relevant watch event
    /// was **detected**.
    ///
    /// **Warning**: If you have watch desyncs, it is possible to miss changes entirely.
    /// It is therefore not recommended to disable requeuing this way, unless you have
    /// frequent changes to the underlying object, or some other hook to retain eventual consistency.
    AwaitChange,
}

as a benefit we get the much snappier reconcile impls:

-    Ok(ReconcilerAction {
-        requeue_after: Some(Duration::from_secs(300)),
-    })
+    Ok(Action::Requeue(Duration::from_secs(300)))

@nightkr
Copy link
Member

nightkr commented Mar 17, 2022

How about adding associated functions, so it'd be Ok(Action::requeue(Duration::from_secs(300))? That way we would get the added convenience for the simple cases, while still not locking us into having a single action forever.

@clux
Copy link
Member

clux commented Mar 17, 2022

Sure, that makes sense to me. Also makes the change in applier easier.

clux added a commit that referenced this issue Mar 17, 2022
@clux clux closed this as completed in #851 Mar 17, 2022
clux added a commit that referenced this issue Mar 17, 2022
* Change ReconcileAction to Action and add associated ctors

fixes #317

Signed-off-by: clux <[email protected]>

* clippy

Signed-off-by: clux <[email protected]>

* clippy + docs

Signed-off-by: clux <[email protected]>

* fix extraneous doc warnings in entry

Signed-off-by: clux <[email protected]>

* remove Default for Action

Signed-off-by: clux <[email protected]>

* fix secret-syncer

Signed-off-by: clux <[email protected]>

* fix remaining references to reconciler action

Signed-off-by: clux <[email protected]>
@clux clux added runtime controller runtime related ergonomics ergonomics of the public interface labels Mar 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ergonomics ergonomics of the public interface runtime controller runtime related
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants