[KEP] Add ResourceFlavor fallbacks KEP #2561
/assign @tenzen-y @alculquicondor
I like this feature, and I'm looking forward to seeing it delivered.
However, I'm wondering whether this KEP contains any rabbit holes, or any hidden or implied specifications.
#### Story 2

As a batch admin, I configured my ClusterQueue to have 3 flavors: FlavorA, FlavorB, FlavorC. I want my users to run their jobs on FlavorA first. On fallback, I want them to try to run their jobs on FlavorB, but if there is no capacity in the region, use FlavorC instead.
Can you add a story for situations where fragmented resource consumption occurs?
Even if the cloud provider has enough compute capacity, users generally configure limits on cluster autoscaling. Once fragmentation happens under those limits, this feature is useful.
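The fallback order described in Story 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: `pickFlavor`, the `tried` set, and the `hasCapacity` callback are hypothetical names invented for this sketch and are not part of the KEP or of Kueue.

```go
package main

import "fmt"

// pickFlavor is a hypothetical helper: it walks the flavors in the order
// configured on the ClusterQueue, skips flavors that were already tried,
// and returns the first remaining flavor that reports capacity.
func pickFlavor(ordered []string, tried map[string]bool, hasCapacity func(string) bool) (string, bool) {
	for _, flavor := range ordered {
		if tried[flavor] {
			continue
		}
		if hasCapacity(flavor) {
			return flavor, true
		}
	}
	return "", false
}

func main() {
	ordered := []string{"FlavorA", "FlavorB", "FlavorC"}
	// Assumed capacity state for the example: FlavorB has no capacity in the region.
	capacity := map[string]bool{"FlavorA": true, "FlavorB": false, "FlavorC": true}
	hasCapacity := func(flavor string) bool { return capacity[flavor] }

	tried := map[string]bool{}
	first, _ := pickFlavor(ordered, tried, hasCapacity)

	// Assume the workload hit its fallback trigger on the first flavor.
	tried[first] = true
	second, _ := pickFlavor(ordered, tried, hasCapacity)

	fmt.Println(first, second) // prints "FlavorA FlavorC"
}
```

The sketch takes the flavor order from the ClusterQueue list rather than from the fallback rules, which matches the KEP's statement that the order of rules does not affect the order in which flavors are assigned.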
We propose to introduce a new API to the ClusterQueue object, similar to AdmissionCheckStrategy. The new field, called FlavorFallbackStrategy, falls under the FlavorFungibility configuration. The order of rules does not affect the order in which Flavors are assigned. This field should be treated more as a mapping between a ResourceFlavor and the timeout, and a list of policies applied to all flavors.

```golang
type FlavorFungibility struct {
```
Could you add JSON tags to every field so that we can understand and reach consensus on the field names?
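As a starting point for that discussion, here is a minimal sketch of what tagged fields could look like. The field names `Name`, `Trigger`, and `TimeoutMinutes` come from the excerpts in this PR; the rule type name `FlavorFallbackRule`, the slice shape of `FlavorFallbackStrategy`, and every JSON tag name are assumptions made for illustration, not the KEP's actual API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ResourceFlavorReference stands in for the reference type used in the excerpt.
type ResourceFlavorReference string

// FlavorFallbackRule is a hypothetical name for a single fallback rule.
// The JSON tag names below are illustrative guesses only.
type FlavorFallbackRule struct {
	Name           ResourceFlavorReference `json:"name"`
	Trigger        string                  `json:"trigger"`
	TimeoutMinutes *int                    `json:"timeoutMinutes,omitempty"`
}

// FlavorFungibility shows only the assumed new field, not the existing ones.
type FlavorFungibility struct {
	FlavorFallbackStrategy []FlavorFallbackRule `json:"flavorFallbackStrategy,omitempty"`
}

// marshalExample serializes one example rule so the wire format is visible.
func marshalExample() (string, error) {
	timeout := 10
	ff := FlavorFungibility{
		FlavorFallbackStrategy: []FlavorFallbackRule{
			{Name: "FlavorA", Trigger: "TimeoutForPodsReadyExceeded", TimeoutMinutes: &timeout},
		},
	}
	out, err := json.Marshal(ff)
	return string(out), err
}

func main() {
	s, err := marshalExample()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Seeing the serialized form side by side with the Go field names may make it easier to agree on casing and on which fields should be optional.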
// Values: [TimeoutForPodsReadyExceeded, AdmissionCheckRejected]
Trigger string

TimeoutMinutes *int
Could we validate that this timeout is larger than `waitForPodsReady.timeout`?
If there are no restrictions on this timeout, then when both waitForPodsReady and fallbackStrategy are enabled, waitForPodsReady could accidentally lose its functionality, right?
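A minimal sketch of the kind of check being asked about, assuming the rule is "the fallback timeout must be strictly larger than the waitForPodsReady timeout". The function `validateFallbackTimeout` and the helper `minutes` are hypothetical names for this illustration; whether the comparison should be strict, and where such validation would live in Kueue, is left open.

```go
package main

import (
	"fmt"
	"time"
)

// minutes converts a whole number of minutes into a time.Duration.
func minutes(n int) time.Duration {
	return time.Duration(n) * time.Minute
}

// validateFallbackTimeout is a hypothetical validation helper. It rejects a
// fallback timeout that is not strictly larger than the waitForPodsReady
// timeout. A nil timeoutMinutes is treated as "not configured" and accepted.
func validateFallbackTimeout(timeoutMinutes *int, waitForPodsReadyTimeout time.Duration) error {
	if timeoutMinutes == nil {
		return nil
	}
	fallback := minutes(*timeoutMinutes)
	if fallback <= waitForPodsReadyTimeout {
		return fmt.Errorf("fallback timeout %v must be larger than waitForPodsReady timeout %v",
			fallback, waitForPodsReadyTimeout)
	}
	return nil
}

func main() {
	short, long := 5, 15
	fmt.Println(validateFallbackTimeout(&short, minutes(10)) != nil) // true: 5m is rejected
	fmt.Println(validateFallbackTimeout(&long, minutes(10)) != nil)  // false: 15m is accepted
}
```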
// Values: [TimeoutForPodsReadyExceeded, AdmissionCheckRejected]
Trigger string

TimeoutMinutes *int
Could you provide the formula used to calculate the time consumed throughout the request process?
Specifically, I would like to know whether the consumed time includes 1. the backoff time, and 2. the time spent in the queue after the workload is requeued.
I understand that you mentioned that the time spent in the queue is not counted toward the consumed time.
But I cannot confirm whether that applies to all time spent in the queue, regardless of requeueing.
// Values: [TimeoutForPodsReadyExceeded, AdmissionCheckRejected]
Trigger string

TimeoutMinutes *int
Once the Workload reaches this timeout, what happens? Will the Workload get a dedicated condition?
Or will the Workload just be stopped, the same as with the PodsReadyTimeout mechanism?
Name ResourceFlavorReference

// trigger is an enum describing whether the fallback is AdmissionCheck based, or timeout based
// Values: [TimeoutForPodsReadyExceeded, AdmissionCheckRejected]
The fact that `AdmissionCheckRejected` is selectable here implies that we can reuse this fallback mechanism in MultiKueue and other AdmissionCheck mechanisms, right?
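For reference, the two trigger values listed in the excerpt could be modeled as a typed enum instead of a bare `string`. This is a sketch only: the type name `FallbackTrigger` and the function `validateTrigger` are hypothetical, and the excerpt itself declares `Trigger string`.

```go
package main

import "fmt"

// FallbackTrigger is a hypothetical typed enum for the Trigger field.
type FallbackTrigger string

const (
	// The two values below are the ones listed in the KEP excerpt.
	TimeoutForPodsReadyExceeded FallbackTrigger = "TimeoutForPodsReadyExceeded"
	AdmissionCheckRejected      FallbackTrigger = "AdmissionCheckRejected"
)

// validateTrigger accepts only the two listed values.
func validateTrigger(t FallbackTrigger) error {
	switch t {
	case TimeoutForPodsReadyExceeded, AdmissionCheckRejected:
		return nil
	default:
		return fmt.Errorf("unsupported trigger %q", t)
	}
}

func main() {
	fmt.Println(validateTrigger(AdmissionCheckRejected) == nil) // true
	fmt.Println(validateTrigger("SomethingElse") == nil)        // false
}
```

A typed enum keeps the set of triggers explicit, so adding a new trigger later for another AdmissionCheck-based mechanism would be a visible API change.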
keps/2560-flavor-fallback/kep.yaml
Outdated
Could you mention the flavorFungibility KEP in the see-also section?
https://github.com/kubernetes-sigs/kueue/tree/main/keps/582-preempt-based-on-flavor-order
Co-authored-by: Yuki Iwai <[email protected]>
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: PBundyra
What type of PR is this?
/kind documentation
What this PR does / why we need it:
Introduces a KEP for ResourceFlavor fallbacks. The details are described in depth in the KEP itself.
Which issue(s) this PR fixes:
Part of #2560
Special notes for your reviewer:
Does this PR introduce a user-facing change?