[KEP-5573]: Remove cgroup v1 support #5574
**Upgrade considerations**:
- Clusters upgrading to Kubernetes v1.35+ on cgroup v1 hosts will fail to start kubelet unless `FailCgroupV1` is set to false
For kubeadm view, init/upgrade precheck should fail for cgroup v1.
This should be applied after the kubelet change.
Why would this fail?
Oh you mean if kubelet removes cgroup v1. If we don't do removal, is there anything needed from kubeadm?
I mean that kubeadm needs an update if the default value of FailCgroupV1 is changed. In kubeadm, we need to do a precheck before kubeadm upgrade to avoid kubelet failing to restart.
This KEP focuses on kubelet, so this may be out of scope here.
Good point!
Let's explicitly document this as Out-of-scope / TODO. I am ok with a follow up KEP.
I don't think we should release in a state where upgrades are unsafe with a core release binary, that's not a great message to users. kubeadm is part of the core release binaries.
The work required seems small enough that we can handle it before releasing this.
2fde3ad to 59dadda
FYI @aojea @stmcginnis @medyagh @afbjorklund
I do think this is something we need to do eventually, and there is clearly evidence that the Linux ecosystem is moving in this direction, but I think there's still a question of timing.
I don't see much evidence provided that non-bleeding-edge distros are done heading this way (including both those that we run "production" clusters on, and those that developers may locally test on with many tools, yes, including kind; I'd be happy to not have to deal with this, but we need to consider the impact).
In the linked issue I only see mentions of some major managed clusters, and Fedora / bleeding-edge systemd...
I updated this to focus on deprecation in the first phase, and we can leave removal of cgroup v1 for a future release. We will not commit to a date, as it sounds like it's still up for debate when we can remove cgroup v1.
@kannon92 minikube has a large user base. Due to fast image builds (on cri-dockerd), minikube has not changed to containerd as its default container runtime.
This KEP would affect the local Kubernetes experience and the onboarding experience for Kubernetes if it breaks minikube.
I suggest this KEP only be approved if cri-dockerd support for cgroup v2 is merged and ready OR if minikube could switch to containerd before this change.
Otherwise, expect unneeded disruption to the local development experience for Kubernetes and issues onboarding new users.
The first round of this will just flip the field to fail on cgroup v1. You can turn this off and minikube will still work. Is this sufficient for you? We changed it to not give a date for removal, but we will remove cgroup v1 support at some point, so migrating off of cgroup v1 is important.
### Removal of cgroup v1

<UNRESOLVED @haircommander>
Once all supported releases of Kubernetes have `FailCGroupV1` set to true, we can begin the removal of the cgroup v1 support.

In this section, we should call the places where we are going to remove cgroup v1.
</UNRESOLVED>
I think we can fill it up when we will plan actual removal
That sounds good. I think it's okay to leave this in though.
/lgtm
/approve
only minor comments
07ca3e3 to 23c6068
## Motivation

Following the transition of cgroup v1 support to maintenance mode in KEP-4569, the next logical step is to move cgroup v1 to an unsupported state. This aligns with the broader ecosystem's migration to cgroup v2, including major Linux distributions and the Linux kernel community's focus on cgroup v2 for new features and improvements.
Can we clarify what we mean by unsupported? For example, will we fix bugs? Will there be CI running to check for regressions? (We discussed some of it in the sig-node weekly call today.)
maybe refer to the maintenance mode document, suggesting the same level of commitment as we have had for a few versions now
I did a round of updates here. PTAL.
Yes, that sounds reasonable. The goal should be to avoid causing a bad local developer experience and a bad new-user onboarding experience for Kubernetes.
/lgtm
### Removal of cgroup v1

<UNRESOLVED @haircommander>
Once all supported releases of Kubernetes have `FailCGroupV1` set to true, we can begin the removal of the cgroup v1 support.
s/FailCGroupV1/FailCgroupV1/g
a nit
In the PR that makes this switch, please add a release-note-action-required that directs cgroupsv1 users how to prepare in advance. PRR lgtm /approve
/lgtm Please send the follow up PR for those minor changes.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dchen1107, deads2k, kannon92, SergeyKanzhelev
### Risks and Mitigations

The primary risks involve potential disruptions for users who have not yet migrated to cgroup v2:
Also the oomkill change? (which can be mitigated by the config option?)
Belated LGTM, minor comments addressed in #5646
Claude Code was used to aid in development of this KEP.