-
Notifications
You must be signed in to change notification settings - Fork 2.2k
[1.3] bump cgroups to v0.0.4 + related test fixes #4897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[1.3] bump cgroups to v0.0.4 + related test fixes #4897
Conversation
Instead of providing systemd CPU quota value (CPUQuotaPerSec), calculate it based on how opencontainers/cgroups/systemd handles it (see addCPUQuota). Signed-off-by: Kir Kolyshkin <[email protected]>
For changes, see https://github.com/opencontainers/cgroups/releases/tag/v0.0.2 Fix integration tests according to changes in [1] (now the CPU quota value set is rounded the same way systemd does it). [1]: opencontainers/cgroups#4 Signed-off-by: Kir Kolyshkin <[email protected]>
Since opencontainers/cgroups v0.0.2 (commit b206a01), all stuct Resources fields are annotated with "omitempty" attribute. As a result, the loaded configuration may have Resources == nil. It is totally OK (rootless containers may have no resources configured) except since commit 6c5441e, cgroup v1 fs manager requires Resources to be set in the call to NewManager (this is a cgroup v1 deficiency, or maybe our implementation deficiency, or both). To work around this, let's add code to ensure Resources is never nil after loading from state.json. Signed-off-by: Kir Kolyshkin <[email protected]>
For changelog, see https://github.com/opencontainers/cgroups/releases/tag/v0.0.3 This fixes two runc issues: 1. JSON incompatibility introduced in cgroups v0.0.2 (see opencontainers/cgroups#22). 2. Bad CPU shares to CPU weight conversion (see opencontainers#4772). Due to item 2, modify some tests accordingly. Signed-off-by: Kir Kolyshkin <[email protected]>
@jaredledvina Thanks! I'm fine with the idea of a backport if it fixes a bug in prod. I haven't been following that issue, though, can you comment briefly what issue do you hit in prod? |
👋 Hey @rata, sure thing! In short, after migrating to cgroup v2, containers that are created with a low cpu request values will be given a significantly lower relative cpu weight. As a specific example, we run 1 container that is configured to get 20 mcores. After migrating to cgroup v2, the determined cpu weight is equivalent to if this container requested 0. The updated formula in opencontainers/cgroups#20 improves the granularity of the allocated cpu weights for containers in this sort of situation. kubernetes/kubernetes#131216 (comment) has some nice vizuals of the algorithm improvement. |
Cool, thanks! @kolyshkin any thoughts on this? |
I just want to point out that I didn't really plan to do a 1.3.2 release for another month (meaning this won't be much faster than waiting for 1.4.0), but I could be convinced to cut a release earlier if necessary... |
Ah okay, do you all happen to know when containerd opt's to bump to new minor releases of runc? I'm working on cutting a release off our fork for now to unblock the immediate issue regardless. |
I'm all for it. In addition, I was looking into whether we should port cgroups v0.0.4, too (which fixes the annoying bug #4798). The only other change in v0.0.4 which is of interest here is opencontainers/cgroups#24, and it looks like it can't bring other regressions. So maybe we should shove v0.0.4 here, too; @jaredledvina can you please cherry-pick #4808 as well? |
Bumps [github.com/opencontainers/cgroups](https://github.com/opencontainers/cgroups) from 0.0.3 to 0.0.4. - [Release notes](https://github.com/opencontainers/cgroups/releases) - [Changelog](https://github.com/opencontainers/cgroups/blob/main/RELEASES.md) - [Commits](opencontainers/cgroups@v0.0.3...v0.0.4) --- updated-dependencies: - dependency-name: github.com/opencontainers/cgroups dependency-version: 0.0.4 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]>
@kolyshkin - Thanks for taking a look! I've cherry-picked fc8162e here too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM assuming tests are green
I've updated the title and description to better reflect the current state of this PR, and also added labels and a milestone. |
Actually, I might as well just prepare a release with this next week. |
I greatly appreciate all the help, thank you! |
This is a backport of
to the 1.3 release branch.
I'm looking to get the fix for #4772 back on 1.3 (we're currently running 1.3.1) instead of waiting until the end of October for 1.4 to be released per https://github.com/opencontainers/runc/blob/main/RELEASES.md. We have noticed the side-effects of this issue in production while migrating to cgroup v2.