KEP-5278: update KEP for NominatedNodeName, narrowing down the scope of the feature and moving it to beta #5618
```diff
@@ -630,7 +630,7 @@ Pods that are processed by Permit or PreBind plugins get NominatedNodeName durin
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
 Yes.
-The feature can be disabled in Alpha version by restarting the kube-scheduler and kube-apiserver with the feature-gate off.
+The feature can be disabled in Beta version by restarting the kube-scheduler and kube-apiserver with the feature-gate off.
 
 ###### What happens if we reenable the feature if it was previously rolled back?
```
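For context, restarting with the gate off works as a rollback because the relevant code paths are guarded by the gate. A minimal Go sketch of the standard Kubernetes gating pattern (an illustration only, not the actual scheduler code; the gate name `NominatedNodeNameForExpectation` follows the alpha implementation and may differ):

```go
package scheduler

import (
	v1 "k8s.io/api/core/v1"
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/kubernetes/pkg/features"
)

// setNominatedNodeName records the expected placement only while the
// feature gate is enabled; after a restart with the gate off, this
// path becomes a no-op, which is what makes the rollback safe.
func setNominatedNodeName(pod *v1.Pod, nodeName string) {
	if !utilfeature.DefaultFeatureGate.Enabled(features.NominatedNodeNameForExpectation) {
		return
	}
	pod.Status.NominatedNodeName = nodeName
}
```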
In … Is that correct?

In … What was the result of the test?
The idea behind enablement/disablement tests is that, depending on the feature gate, the functionality either is or is not working. So the question is more about ensuring that when you turn the appropriate FG off, the functionality doesn't set NNN (or, in the case of kube-apiserver, doesn't clear it), and vice versa when it's on. This matters especially at the beta stage, where we need to ensure that users can safely turn off this on-by-default (beta) functionality.
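A sketch of the kind of unit-level enablement/disablement test being asked for here, using the standard feature-gate testing helper (the gate name follows the alpha implementation and may differ; `setNNNIfEnabled` is a hypothetical stand-in for the gated code path, which in reality runs inside the scheduling cycle):

```go
package scheduler_test

import (
	"testing"

	v1 "k8s.io/api/core/v1"
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	featuregatetesting "k8s.io/component-base/featuregate/testing"
	"k8s.io/kubernetes/pkg/features"
)

// setNNNIfEnabled is a hypothetical stand-in for the gated code path.
func setNNNIfEnabled(pod *v1.Pod, node string) {
	if utilfeature.DefaultFeatureGate.Enabled(features.NominatedNodeNameForExpectation) {
		pod.Status.NominatedNodeName = node
	}
}

// TestNNNRespectsFeatureGate verifies the on/off contract: with the gate
// on, NominatedNodeName gets set; with the gate off, it stays empty.
func TestNNNRespectsFeatureGate(t *testing.T) {
	for _, enabled := range []bool{true, false} {
		// Flip the gate for this test only; it is restored on cleanup.
		featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate,
			features.NominatedNodeNameForExpectation, enabled)

		pod := &v1.Pod{}
		setNNNIfEnabled(pod, "node-a")

		if got := pod.Status.NominatedNodeName != ""; got != enabled {
			t.Errorf("gate=%v: NominatedNodeName set=%v, want %v", enabled, got, enabled)
		}
	}
}
```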
As @stlaz pointed out, this description is required for beta promotion.

Below in …

@soltysh Thank you for bringing my attention to this point; I missed it earlier when editing the KEP. The trouble is that this is not a typical promotion from alpha to beta. The scope of this KEP in alpha allowed NNN to be set by components other than kube-scheduler, and established the semantics for this field. The designed behavior was that the NNN field would be cleared in some situations, but not after a failed scheduling attempt. After the v1.34 release, however, there was a change of plans in sig-scheduling: with the idea of Gang Scheduling coming up (and, with it, ideas for new approaches to resource reservation), it seems that NNN might not be the mechanism we want to invest in right now as a means for other components to suggest pod placement to kube-scheduler. At the same time, using NNN as "set in kube-scheduler, read-only in CA" seems like a good and worthwhile approach to solving the buggy scenario "if pod P is scheduled to bind on node N, but binding P takes a long time, and N is otherwise empty, CA might turn down N before P gets bound". Please note that before the alpha KEP, the scheduler's code would clear NNN after a failed scheduling attempt.

So what this hoping-to-be-beta KEP does vs. pre-alpha is: …

And what this beta KEP does vs. the alpha KEP is: …

With all that, in the beta KEP, NNN should be set when a pod is either waiting for preemption to complete (which had been the case before the alpha KEP) or during the prebinding/binding phases, and it should be cleared after binding in the api-server (see the sketch after this thread). Can you please help me with the following questions? …

Thank you!
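To make the clearing rule above concrete, here is a minimal sketch of the "clear after binding" semantics (an illustration under the beta-KEP rules, not the actual kube-apiserver registry code; the Pod fields themselves are real API fields):

```go
package registry

import v1 "k8s.io/api/core/v1"

// clearNNNAfterBinding models the beta-KEP rule: once a pod is bound
// (Spec.NodeName is set), its NominatedNodeName no longer carries any
// information, so the api-server clears it.
func clearNNNAfterBinding(pod *v1.Pod) {
	if pod.Spec.NodeName != "" && pod.Status.NominatedNodeName != "" {
		pod.Status.NominatedNodeName = ""
	}
}
```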
As it turns out, there are no tests running with this feature enabled (probably because the original plan was to launch as beta in 1.34, where the FG would be on by default and all tests would run with it).
1. Yes, you'll want to update the doc afterwards.
2. Both options are fine by me, but it seems the versions with (a) will be easier to perform.
3. Yes, it can be updated in a follow-up.

Thank you! I will make sure to update the doc with all the results.
```diff
@@ -752,8 +752,8 @@ No.
 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
 
 Yes - but it should be negligible impact.
-The memory usage in kube-scheduler is supposed to increase by external components starting to use this
-because when `NominatedNodeName` is added on the pods, the scheduler's internal component called `nominator` has to record them so that scheduling cycles can refer to them as necessary.
+The memory usage in kube-scheduler is supposed to increase because when `NominatedNodeName` is added on the pods, the scheduler's
+internal component called `nominator` has to record them so that scheduling cycles can refer to them as necessary.
 
 ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
```
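For a sense of where the extra memory goes, a rough sketch of the nominator's bookkeeping (structure and field names here are simplified assumptions, not the actual scheduler internals):

```go
package scheduler

import (
	"sync"

	v1 "k8s.io/api/core/v1"
)

// nominator indexes nominated pods by node name so that scheduling
// cycles can account for pods expected to land on a node before they
// are bound. Each pod carrying NominatedNodeName adds one entry here,
// which is the (small) memory growth discussed above.
type nominator struct {
	sync.RWMutex
	nominatedPodsByNode map[string][]*v1.Pod
}

// addNominatedPod records a pod under its nominated node.
func (n *nominator) addNominatedPod(pod *v1.Pod) {
	n.Lock()
	defer n.Unlock()
	node := pod.Status.NominatedNodeName
	n.nominatedPodsByNode[node] = append(n.nominatedPodsByNode[node], pod)
}
```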
ania-borowiec marked this conversation as resolved.