@ngopalak-redhat ngopalak-redhat commented Nov 12, 2025

TODO: Before Review

  • Complete upgrade testing

What I did

This PR enables system-reserved-compressible enforcement by default for all new OpenShift 4.21+ clusters to allow better CPU allocation for system reserved processes through cgroup-based enforcement.

Template Changes:

  • Added systemReservedCgroup: /system.slice to default kubelet configuration for all node types (master, worker, arbiter)
  • Added system-reserved-compressible to enforceNodeAllocatable alongside pods in kubelet template files

Performance Profile Compatibility:
The kubelet cannot simultaneously enforce both systemReservedCgroup and --reserved-cpus (used by Performance Profiles in the Node Tuning Operator). To resolve this conflict, I added logic in the Kubelet Config Controller (pkg/controller/kubelet-config/helpers.go) to:

  • Detect when reservedSystemCPUs (--reserved-cpus) is set
  • Automatically clear systemReservedCgroup when reservedSystemCPUs is detected
  • Set enforceNodeAllocatable to ["pods"] only in this scenario
  • Preserve existing Performance Profile behavior without requiring any operator changes

This approach leverages the fact that --reserved-cpus already supersedes system-reserved, making systemReservedCgroup enforcement redundant in PerformanceProfile scenarios.
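The override described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in helpers.go: the kubeletSettings struct and the resolveReservedCPUConflict function are hypothetical names that stand in for the real kubelet configuration type and helper.

```go
package main

import "fmt"

// kubeletSettings is a hypothetical stand-in for the subset of kubelet
// configuration fields discussed in this PR. It is not the real type.
type kubeletSettings struct {
	ReservedSystemCPUs     string
	SystemReservedCgroup   string
	EnforceNodeAllocatable []string
}

// resolveReservedCPUConflict (hypothetical name) clears systemReservedCgroup
// and restricts enforcement to "pods" when reservedSystemCPUs is set.
// It reports whether the override was applied.
func resolveReservedCPUConflict(cfg *kubeletSettings) bool {
	if cfg.ReservedSystemCPUs == "" {
		// No reserved CPUs: keep the template defaults untouched.
		return false
	}
	cfg.SystemReservedCgroup = ""
	cfg.EnforceNodeAllocatable = []string{"pods"}
	return true
}

func main() {
	// Example input: template defaults plus a Performance Profile style
	// reservedSystemCPUs value (the "0-3" value is an assumption).
	cfg := kubeletSettings{
		ReservedSystemCPUs:     "0-3",
		SystemReservedCgroup:   "/system.slice",
		EnforceNodeAllocatable: []string{"pods", "system-reserved-compressible"},
	}
	applied := resolveReservedCPUConflict(&cfg)
	fmt.Println(applied, cfg.SystemReservedCgroup == "", cfg.EnforceNodeAllocatable)
}
```

Keying the override on reservedSystemCPUs alone is what lets the Node Tuning Operator stay unchanged: it keeps setting the field it already sets, and the controller reacts to it.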

Validation:

  • Added validation to ensure systemReservedCgroup matches systemCgroups when both are user-specified
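A minimal sketch of what such a check might look like. The function name validateSystemCgroups is hypothetical, and the sketch assumes a plain string comparison that is skipped when either field is left unset; the actual validation in this PR may differ.

```go
package main

import "fmt"

// validateSystemCgroups (hypothetical name) returns an error when both
// fields are specified but point at different cgroups. An empty value is
// treated as "not user-specified" and skips the check (an assumption).
func validateSystemCgroups(systemReservedCgroup, systemCgroups string) error {
	if systemReservedCgroup == "" || systemCgroups == "" {
		return nil
	}
	if systemReservedCgroup != systemCgroups {
		return fmt.Errorf("systemReservedCgroup %q must match systemCgroups %q",
			systemReservedCgroup, systemCgroups)
	}
	return nil
}

func main() {
	// Matching values pass; mismatched values are rejected.
	fmt.Println(validateSystemCgroups("/system.slice", "/system.slice"))
	fmt.Println(validateSystemCgroups("/system.slice", "/other.slice"))
}
```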

How to verify it

For New OCP 4.21+ Clusters:

  1. Deploy a new OCP 4.21+ cluster
  2. SSH into a node and verify kubelet configuration:
    cat /etc/kubernetes/kubelet.conf | grep -A2 systemReservedCgroup
    cat /etc/kubernetes/kubelet.conf | grep -A3 enforceNodeAllocatable
  3. Verify the output shows:
    systemReservedCgroup: /system.slice
    enforceNodeAllocatable:
    - pods
    - system-reserved-compressible
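The same expectation could also be expressed as a small programmatic check, for example in a test that has already decoded the kubelet configuration. This is only a sketch: the kubeletSettings struct and the checkCompressibleDefaults function are hypothetical names, and decoding the file is left out.

```go
package main

import "fmt"

// kubeletSettings is a hypothetical stand-in for the two kubelet
// configuration fields inspected in the steps above.
type kubeletSettings struct {
	SystemReservedCgroup   string
	EnforceNodeAllocatable []string
}

// contains reports whether list includes item.
func contains(list []string, item string) bool {
	for _, v := range list {
		if v == item {
			return true
		}
	}
	return false
}

// checkCompressibleDefaults (hypothetical name) verifies the defaults this
// PR describes for new clusters: systemReservedCgroup is /system.slice and
// enforceNodeAllocatable includes both pods and system-reserved-compressible.
func checkCompressibleDefaults(cfg kubeletSettings) error {
	if cfg.SystemReservedCgroup != "/system.slice" {
		return fmt.Errorf("systemReservedCgroup = %q, want %q",
			cfg.SystemReservedCgroup, "/system.slice")
	}
	for _, want := range []string{"pods", "system-reserved-compressible"} {
		if !contains(cfg.EnforceNodeAllocatable, want) {
			return fmt.Errorf("enforceNodeAllocatable is missing %q", want)
		}
	}
	return nil
}

func main() {
	cfg := kubeletSettings{
		SystemReservedCgroup:   "/system.slice",
		EnforceNodeAllocatable: []string{"pods", "system-reserved-compressible"},
	}
	fmt.Println(checkCompressibleDefaults(cfg))
}
```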

For Clusters with Performance Profiles:

  1. Create a Performance Profile with reservedSystemCPUs set (via Node Tuning Operator)
  2. Wait for the MachineConfig to be applied and nodes to reboot
  3. SSH into the affected node and check kubelet configuration:
    cat /etc/kubernetes/kubelet.conf | grep systemReservedCgroup
    cat /etc/kubernetes/kubelet.conf | grep -A2 enforceNodeAllocatable
  4. Verify that:
    - systemReservedCgroup is NOT present (empty/cleared)
    - enforceNodeAllocatable only contains ["pods"]
    - Kubelet starts successfully without errors
  5. Check kubelet logs to confirm no conflicts:
    journalctl -u kubelet | grep -iE "system-reserved|reserved-cpus"

For OCP 4.20 to 4.21 Upgrades:

  1. Verify that the migration MachineConfig from PR #5412 (WIP: [release-4.20] kubelet-config compressible patch) is present and preserves the old behavior
  2. Confirm no unexpected node reboots occur during upgrade

Description for the changelog

Enable system-reserved-compressible enforcement by default in new OCP 4.21+ clusters. The kubelet now enforces CPU limits on system daemons via systemReservedCgroup (/system.slice), improving CPU allocation for system reserved processes on nodes with high CPU counts. To prevent conflicts, systemReservedCgroup enforcement is automatically disabled when a Performance Profile sets reserved-cpus. Existing OCP 4.20 clusters upgrading to 4.21+ preserve their current behavior via a migration MachineConfig.


Related:

@ngopalak-redhat ngopalak-redhat changed the title from "Implement system-reserved-compressible" to "WIP: Implement system-reserved-compressible" Nov 12, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 12, 2025

openshift-ci bot commented Nov 12, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci bot commented Nov 12, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ngopalak-redhat
Once this PR has been reviewed and has the lgtm label, please assign yuqi-zhang for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ngopalak-redhat ngopalak-redhat force-pushed the ngopalak/system-reserved-compressible-1 branch from ca28d80 to 00bb8e1 November 17, 2025 03:53