Skip to content

Conversation

jaredledvina
Copy link

@jaredledvina jaredledvina commented Sep 23, 2025

This is a backport of

to the 1.3 release branch.

I'm looking to get the fix for #4772 back on 1.3 (we're currently running 1.3.1) instead of waiting until the end of October for 1.4 to be released per https://github.com/opencontainers/runc/blob/main/RELEASES.md. We have noticed the side-effects of this issue in production while migrating to cgroup v2.

Instead of providing systemd CPU quota value (CPUQuotaPerSec),
calculate it based on how opencontainers/cgroups/systemd handles
it (see addCPUQuota).

Signed-off-by: Kir Kolyshkin <[email protected]>
For changes, see https://github.com/opencontainers/cgroups/releases/tag/v0.0.2

Fix integration tests according to changes in [1] (now the CPU quota value set
is rounded the same way systemd does it).

[1]: opencontainers/cgroups#4
Signed-off-by: Kir Kolyshkin <[email protected]>
Since opencontainers/cgroups v0.0.2 (commit b206a01), all stuct
Resources fields are annotated with "omitempty" attribute.
As a result, the loaded configuration may have Resources == nil.

It is totally OK (rootless containers may have no resources configured)
except since commit 6c5441e, cgroup v1 fs manager requires Resources to
be set in the call to NewManager (this is a cgroup v1 deficiency,
or maybe our implementation deficiency, or both).

To work around this, let's add code to ensure Resources is never nil
after loading from state.json.

Signed-off-by: Kir Kolyshkin <[email protected]>
For changelog, see https://github.com/opencontainers/cgroups/releases/tag/v0.0.3

This fixes two runc issues:

1. JSON incompatibility introduced in cgroups v0.0.2 (see
   opencontainers/cgroups#22).

2. Bad CPU shares to CPU weight conversion (see
   opencontainers#4772).

Due to item 2, modify some tests accordingly.

Signed-off-by: Kir Kolyshkin <[email protected]>
@rata
Copy link
Member

rata commented Sep 24, 2025

@jaredledvina Thanks! I'm fine with the idea of a backport if it fixes a bug in prod. I haven't been following that issue, though, can you comment briefly what issue do you hit in prod?

@jaredledvina
Copy link
Author

👋 Hey @rata, sure thing! In short, after migrating to cgroup v2, containers that are created with a low cpu request values will be given a significantly lower relative cpu weight.

As a specific example, we run 1 container that is configured to get 20 mcores. After migrating to cgroup v2, the determined cpu weight is equivalent to if this container requested 0. The updated formula in opencontainers/cgroups#20 improves the granularity of the allocated cpu weights for containers in this sort of situation. kubernetes/kubernetes#131216 (comment) has some nice vizuals of the algorithm improvement.

@rata
Copy link
Member

rata commented Sep 24, 2025

Cool, thanks! @kolyshkin any thoughts on this?

@cyphar
Copy link
Member

cyphar commented Sep 24, 2025

I just want to point out that I didn't really plan to do a 1.3.2 release for another month (meaning this won't be much faster than waiting for 1.4.0), but I could be convinced to cut a release earlier if necessary...

@jaredledvina
Copy link
Author

Ah okay, do you all happen to know when containerd opt's to bump to new minor releases of runc?
In my mind, I was hoping it'd be easier to cherry-pick just these changes back to 1.3 and release as a bug fix versus including all of the other bits in 1.4. However, if the timelines workout that 1.4 would be cut around when 1.3.2 would, then it's probably not worth it.

I'm working on cutting a release off our fork for now to unblock the immediate issue regardless.

@kolyshkin
Copy link
Contributor

Cool, thanks! @kolyshkin any thoughts on this?

I'm all for it. In addition, I was looking into whether we should port cgroups v0.0.4, too (which fixes the annoying bug #4798). The only other change in v0.0.4 which is of interest here is opencontainers/cgroups#24, and it looks like it can't bring other regressions.

So maybe we should shove v0.0.4 here, too; @jaredledvina can you please cherry-pick #4808 as well?

Bumps [github.com/opencontainers/cgroups](https://github.com/opencontainers/cgroups) from 0.0.3 to 0.0.4.
- [Release notes](https://github.com/opencontainers/cgroups/releases)
- [Changelog](https://github.com/opencontainers/cgroups/blob/main/RELEASES.md)
- [Commits](opencontainers/cgroups@v0.0.3...v0.0.4)

---
updated-dependencies:
- dependency-name: github.com/opencontainers/cgroups
  dependency-version: 0.0.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
@jaredledvina
Copy link
Author

@kolyshkin - Thanks for taking a look! I've cherry-picked fc8162e here too.

@kolyshkin kolyshkin changed the title [1.3] cherry-pick cgroup v0.0.2 & v0.0.3 upgrade + related test fixes [1.3] bump cgroups to v0.0.4 + related test fixes Sep 24, 2025
@kolyshkin kolyshkin added this to the 1.3.2 milestone Sep 24, 2025
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming tests are green

@kolyshkin
Copy link
Contributor

I've updated the title and description to better reflect the current state of this PR, and also added labels and a milestone.

@cyphar
Copy link
Member

cyphar commented Sep 25, 2025

Actually, I might as well just prepare a release with this next week.

@cyphar cyphar merged commit e0b5e1e into opencontainers:release-1.3 Sep 25, 2025
36 checks passed
@jaredledvina
Copy link
Author

I greatly appreciate all the help, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants