Contributor

@kannon92 kannon92 commented Sep 26, 2025

  • One-line PR description:
  • Other comments:

Claude Code was used to aid in the development of this KEP.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 26, 2025
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 26, 2025
@kannon92 kannon92 mentioned this pull request Sep 26, 2025
4 tasks
-->

**Upgrade considerations**:
- Clusters upgrading to Kubernetes v1.35+ on cgroup v1 hosts will fail to start kubelet unless `FailCgroupV1` is set to false
Member

From the kubeadm side, the init/upgrade precheck should fail on cgroup v1.

This should be applied after the kubelet change.

Contributor Author

Why would this fail?

Oh you mean if kubelet removes cgroup v1. If we don't do removal, is there anything needed from kubeadm?

Member

@pacoxu pacoxu Oct 9, 2025

I mean that kubeadm needs an update if the default value of FailCgroupV1 is changed. In kubeadm, we need to do a precheck before kubeadm upgrade to avoid kubelet failing to restart.

This KEP focuses on the kubelet, so this may be out of scope here.
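
As a rough illustration of the precheck described above, here is a minimal sketch. It is not kubeadm code: the function names `detect_cgroup_version` and `upgrade_precheck` are hypothetical, and checking for a `cgroup.controllers` file at the cgroup root is only one common heuristic for spotting the cgroup v2 unified hierarchy (it assumes the hierarchy is mounted at `/sys/fs/cgroup`).

```python
import os


def detect_cgroup_version(cgroup_root="/sys/fs/cgroup"):
    """Return 2 if cgroup_root looks like a cgroup v2 unified hierarchy, else 1.

    Heuristic (an assumption of this sketch): the root of a cgroup v2
    hierarchy exposes a cgroup.controllers file.
    """
    marker = os.path.join(cgroup_root, "cgroup.controllers")
    return 2 if os.path.isfile(marker) else 1


def upgrade_precheck(cgroup_version, fail_cgroup_v1):
    """Return a list of blocking errors; an empty list means the upgrade may proceed."""
    errors = []
    # Only the combination "cgroup v1 host + FailCgroupV1 true" blocks the upgrade.
    if cgroup_version == 1 and fail_cgroup_v1:
        errors.append(
            "host is on cgroup v1 and FailCgroupV1 is true: "
            "the kubelet would fail to restart after the upgrade"
        )
    return errors


if __name__ == "__main__":
    version = detect_cgroup_version()
    # fail_cgroup_v1=True models the changed default discussed in this thread.
    for problem in upgrade_precheck(version, fail_cgroup_v1=True):
        print("precheck failed:", problem)
```

Keeping detection separate from the decision means the decision can be exercised without touching the host's real cgroup mount.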

Member

Good point!

Let's explicitly document this as Out-of-scope / TODO. I am ok with a follow up KEP.

Member

I don't think we should release in a state where upgrades are unsafe with a core release binary, that's not a great message to users. kubeadm is part of the core release binaries.

Member

The work required seems small enough that we can handle it before releasing this.

@kannon92 kannon92 force-pushed the move-cgroup-v1-unsupported branch 2 times, most recently from 2fde3ad to 59dadda Compare October 1, 2025 19:48
@BenTheElder
Member

FYI @aojea @stmcginnis @medyagh @afbjorklund
(Re: local clusters & cgroupv1)

@kannon92 kannon92 changed the title [KEP-5573]: Move cgroup v1 to unsupported [KEP-5573]: Remove cgroup v1 support Oct 1, 2025
Member

@BenTheElder BenTheElder left a comment

I do think this is something we need to do eventually, and there is clear evidence that the Linux ecosystem is moving in this direction, but I think there's still a question of timing.

I don't see much evidence provided that non-bleeding-edge distros have finished moving this way (including both those we run "production" clusters on and those that developers may test on locally with many tools, including kind; I'd be happy not to have to deal with this, but we need to consider the impact).

In the linked issue I only see mentions of some major managed clusters, and Fedora / bleeding-edge systemd...

@kannon92
Contributor Author

kannon92 commented Oct 3, 2025

I updated this to focus on deprecation in the first phase; we can leave removal of cgroup v1 for a future release.

We will not set a date, as it sounds like it's still up for debate when we can remove cgroup v1.

Member

@medyagh medyagh left a comment

@kannon92 minikube has a large user base. Because of fast image builds (on cri-dockerd), minikube has not switched to containerd as its default container runtime.

This KEP would affect the local Kubernetes experience and the onboarding experience for Kubernetes if it breaks minikube.

I suggest this KEP only be approved if cri-dockerd support for cgroup v2 is merged and ready, or if minikube can switch to containerd before this change.

Otherwise, expect unneeded disruption to the local development experience for Kubernetes and issues onboarding new users.

@kannon92
Contributor Author

kannon92 commented Oct 3, 2025

> @kannon92 minikube has a large user base. Because of fast image builds (on cri-dockerd), minikube has not switched to containerd as its default container runtime.
>
> This KEP would affect the local Kubernetes experience and the onboarding experience for Kubernetes if it breaks minikube.
>
> I suggest this KEP only be approved if cri-dockerd support for cgroup v2 is merged and ready, or if minikube can switch to containerd before this change.
>
> Otherwise, expect unneeded disruption to the local development experience for Kubernetes and issues onboarding new users.

@medyagh

The first round of this will just flip the field to fail on cgroup v1. You can turn this off and minikube will still work. Is this sufficient for you?

We changed it to not give a date for removal, but we will remove cgroup v1 support at some point, so migrating off cgroup v1 is important.

Comment on lines +240 to +247
### Removal of cgroup v1

<UNRESOLVED @haircommander>
Once all supported releases of Kubernetes have `FailCGroupV1` set to true, we can begin the removal of the cgroup v1 support.

In this section, we should call the places where we are going to remove cgroup v1.
</UNRESOLVED>
Member

I think we can fill this in when we plan the actual removal.

Contributor Author

That sounds good. I think it's okay to leave this in though.

Member

@SergeyKanzhelev SergeyKanzhelev left a comment

/lgtm
/approve

only minor comments

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 6, 2025
@kannon92 kannon92 force-pushed the move-cgroup-v1-unsupported branch from 07ca3e3 to 23c6068 Compare October 7, 2025 15:22
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 7, 2025
@kannon92
Contributor Author

kannon92 commented Oct 7, 2025

for sig-node approval
/assign @mrunalp
For PRR.
/assign @deads2k


## Motivation

Following the transition of cgroup v1 support to maintenance mode in KEP-4569, the next logical step is to move cgroup v1 to an unsupported state. This aligns with the broader ecosystem's migration to cgroup v2, including major Linux distributions and the Linux kernel community's focus on cgroup v2 for new features and improvements.
Contributor

Can we clarify what we mean by unsupported? For example, will we fix bugs? Will there be CI running to check for regressions? (We discussed some of it in the sig-node weekly call today.)

Member

Maybe refer to the maintenance mode document, suggesting the same level of commitment as we have had for a few versions now.

Contributor Author

I did a round of updates here. PTAL.

@medyagh
Member

medyagh commented Oct 8, 2025

> > @kannon92 minikube has a large user base. Because of fast image builds (on cri-dockerd), minikube has not switched to containerd as its default container runtime.
> > This KEP would affect the local Kubernetes experience and the onboarding experience for Kubernetes if it breaks minikube.
> > I suggest this KEP only be approved if cri-dockerd support for cgroup v2 is merged and ready, or if minikube can switch to containerd before this change.
> > Otherwise, expect unneeded disruption to the local development experience for Kubernetes and issues onboarding new users.
>
> @medyagh
>
> The first round of this will just flip the field to fail on cgroup v1. You can turn this off and minikube will still work. Is this sufficient for you?
>
> We changed it to not give a date for removal, but we will remove cgroup v1 support at some point, so migrating off cgroup v1 is important.

Yes, that sounds reasonable. The goal should be to avoid causing a bad local developer experience and a bad new-user onboarding experience for Kubernetes.

Member

@pacoxu pacoxu left a comment

/lgtm

### Removal of cgroup v1

<UNRESOLVED @haircommander>
Once all supported releases of Kubernetes have `FailCGroupV1` set to true, we can begin the removal of the cgroup v1 support.
Member

s/FailCGroupV1/FailCgroupV1/g
a nit

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 9, 2025
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 9, 2025
@deads2k
Contributor

deads2k commented Oct 9, 2025

In the PR that makes this switch, please add a release-note-action-required that tells cgroup v1 users how to prepare in advance.

PRR lgtm

/approve

@dchen1107
Member

/lgtm
/approve

Please send a follow-up PR for those minor changes.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 9, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, deads2k, kannon92, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 9, 2025

### Risks and Mitigations

The primary risks involve potential disruptions for users who have not yet migrated to cgroup v2:
Member

Also the oomkill change? (which can be mitigated by the config option?)

@k8s-ci-robot k8s-ci-robot merged commit 7bf6ad0 into kubernetes:master Oct 9, 2025
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.35 milestone Oct 9, 2025
@BenTheElder
Member

Belated LGTM, minor comments addressed in #5646

@neolit123
Member

@kannon92 please ping @pacoxu or me on the k/k PR so that we can update kubeadm / k/system-validators in 1.35.
