Pod disruption schedule #1719

Open · jukie opened this issue Sep 29, 2024 · 9 comments
Labels: kind/feature, needs-triage

Comments

jukie commented Sep 29, 2024

Description

What problem are you trying to solve?
I have some workloads that are sensitive to interruptions at certain points of the day and thus use the karpenter.sh/do-not-disrupt annotation. I'd like the ability to allow disruptions to these pods at specific times via a cron-format schedule.

How important is this feature to you?
To allow reclaiming nodes for expiration or underutilization, I'm currently running my own controller that watches DisruptionBlocked events and removes the do-not-disrupt annotation if the pods are marked with another annotation indicating the schedule for when disruptions are allowed. I'd like something similar to be added upstream so I can get rid of my own controller.

  1. karpenter.sh/disruption-schedule - cron expression for when disruptions are allowed (e.g. 0 14 * * 6)
  2. karpenter.sh/disruption-schedule-duration - duration for which the schedule stays active (e.g. 3h); see the sketch below
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
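A minimal sketch of how these two annotations might be evaluated, assuming the robfig/cron parser (the same library Karpenter uses for NodePool disruption budget schedules); the helper name, error handling, and window semantics here are illustrative, not taken from the PR:

```go
package scheduling

import (
	"fmt"
	"time"

	"github.com/robfig/cron/v3"
	corev1 "k8s.io/api/core/v1"
)

// Proposed annotation keys (names from this issue; not yet upstream).
const (
	DisruptionScheduleKey         = "karpenter.sh/disruption-schedule"
	DisruptionScheduleDurationKey = "karpenter.sh/disruption-schedule-duration"
)

// disruptionWindowActive reports whether clock falls inside the pod's
// disruption window, i.e. within `duration` of the schedule's most
// recent firing.
func disruptionWindowActive(pod *corev1.Pod, clock time.Time) (bool, error) {
	scheduleStr, ok := pod.Annotations[DisruptionScheduleKey]
	if !ok {
		return false, nil // no schedule: plain do-not-disrupt semantics apply
	}
	durationStr, ok := pod.Annotations[DisruptionScheduleDurationKey]
	if !ok {
		return false, fmt.Errorf("disruption schedule set without a duration")
	}
	schedule, err := cron.ParseStandard(scheduleStr)
	if err != nil {
		return false, fmt.Errorf("parsing schedule: %w", err)
	}
	duration, err := time.ParseDuration(durationStr)
	if err != nil {
		return false, fmt.Errorf("parsing duration: %w", err)
	}
	// Step back one duration and ask when the schedule fires next; if that
	// firing is at or before clock, a window opened recently enough that it
	// is still open. Karpenter's disruption budgets use the same trick.
	windowStart := schedule.Next(clock.Add(-duration))
	return !windowStart.After(clock), nil
}
```

With the example values above, a schedule of 0 14 * * 6 plus a duration of 3h would allow disruption between 14:00 and 17:00 every Saturday.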
jukie added the kind/feature label Sep 29, 2024
k8s-ci-robot (Contributor) commented:

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the needs-triage label Sep 29, 2024
jukie changed the title from "Pod disruption window" to "Pod disruption schedule" Sep 29, 2024
njtran (Contributor) commented Oct 1, 2024

Is this for your job/task-related pods? Would it be sufficient for you if the do-not-disrupt annotation respected a duration string for how long the pod couldn't be disrupted, after which it would be fine to ignore?

jukie (Author) commented Oct 2, 2024

This would be for always-running job/task workers or singleton services. terminationGracePeriod solves the duration piece for do-not-disrupt and gives us a guaranteed maximum lifetime for a node, but the use case here is workloads that want to allow disruption at specific times of day, such as a legacy monolith that only runs during business hours. In that scenario I want an extension of do-not-disrupt so that if a node is marked for disruption during the disruption-schedule window, it's safe to disrupt immediately.

redhug1 commented Oct 7, 2024

Would you be able to share your "own controller" code? Thanks.

njtran (Contributor) commented Oct 7, 2024

@jukie What I'm wondering is whether this is a function of the pod's lifetime in any way. What sort of workloads only want to be disrupted at a certain time, as opposed to on some other signal in the cluster (like other pods going away)? I'm not sure I like encoding this sort of API surface onto the pod itself. It's very loosely defined, easy to run into validation issues with, and doesn't promote elasticity.

On top of this, Karpenter has to reason about when it's fine to enqueue a disruption versus when it's fine to actually drain the pod. Let's say I had this schedule + duration: do I want to nominate the node for consolidation if it's not in its disruptable period? Or do I wait for it to be in a disruptable period before I nominate it? If so, and the node then falls out of its disruptable period, I'm left with a pod I can't evict until my TGP, which could mean overall higher cost.

jukie (Author) commented Oct 8, 2024

@njtran I'll try to expand a bit on the long-running task executor example: these don't execute as Jobs or ephemeral pods but poll a queue, constantly checking for available work. Some of these tasks take minutes, some might even take days, and they're expensive to re-execute, so I want to limit disruption of these workers to a "maintenance window", such as 12am-3am on a particular day of the week (with this proposal, e.g. karpenter.sh/disruption-schedule: "0 0 * * 1" plus karpenter.sh/disruption-schedule-duration: "3h").

  • Configuring terminationGracePeriod and setting do-not-disrupt on the pod won't allow for this, as the pods will stay running until the TGP is reached, at which point they'll be forcefully killed at any time of day.
  • NodePool disruption budgets are also insufficient since Karpenter would still reclaim Expired NodeClaims outside the budget window.

For your other questions:

> Let's say I had this schedule + duration: do I want to nominate the node for consolidation if it's not in its disruptable period?

My PR (#1720) reuses the existing do-not-disrupt logic by updating podutil.IsEvictable() and podutil.IsDisruptable() to consider the new annotation. If the window is inactive, this produces the same DisruptionBlocked events, refreshing every ~5 minutes, until the window becomes active.
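As a rough illustration of that flow (not the actual diff in #1720), the window check could sit behind the existing do-not-disrupt gate, reusing disruptionWindowActive from the sketch in the issue description above:

```go
// Illustrative only: an IsDisruptable-style check that lets an active
// disruption window override do-not-disrupt. Not the code from #1720.
func IsDisruptable(pod *corev1.Pod, clock time.Time) bool {
	if pod.Annotations["karpenter.sh/do-not-disrupt"] != "true" {
		return true // no do-not-disrupt annotation, nothing to override
	}
	active, err := disruptionWindowActive(pod, clock)
	if err != nil {
		return false // malformed annotations: fail closed, keep blocking disruption
	}
	return active // disruptable only while the window is active
}
```

When the window is inactive this returns false, which is what would surface as the recurring DisruptionBlocked events mentioned above.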

> If so, and the node then falls out of its disruptable period, I'm left with a pod I can't evict until my TGP, which could mean overall higher cost.

Wouldn't the higher-cost scenario already be the default? Adding the ability to consider a disruption window would lower overall cost by allowing nodes to be disrupted before the TGP. It'd probably be a good idea to enforce a minimum window duration to avoid the scenario you describe, though.
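For example, such a minimum could be checked wherever the duration annotation is parsed; a trivial sketch, with an invented 15-minute floor that is not part of the proposal:

```go
// Hypothetical guard against windows too short to drain a node safely.
const minWindowDuration = 15 * time.Minute // placeholder floor, not from the proposal

func validateWindowDuration(d time.Duration) error {
	if d < minWindowDuration {
		return fmt.Errorf("disruption window %s is below the %s minimum", d, minWindowDuration)
	}
	return nil
}
```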

jukie (Author) commented Oct 8, 2024

@zack-johnson5455

zack-johnson5455 commented:

I'd like to add support to this issue. My use case:

We have services with varying tolerance for disruption. We'd like to allow services to express their own requirements: "I can tolerate X restarts every Y days, and (optionally) only within Z timeframe."

We'd prefer to limit the number of different nodepools we manage.

We're currently running 0.36 and are having to implement something similar to https://github.com/jukie/karpenter-deprovision-controller (where we strategically remove do-not-disrupt annotations).

Our understanding is that in 1.0 we can take advantage of expiration + terminationGracePeriod to enforce a maximum node age, regardless of the do-not-disrupt annotation. But that would still subject every service that uses the do-not-disrupt annotation to the same frequency of disruption.

I think the proposal described in this issue would give us what we want. Any thoughts or recommendations?

jukie (Author) commented Oct 17, 2024

@njtran any more thoughts on this one?
