fix: Fix topology spread constraints with zonal volume #1907

Open: leoryu wants to merge 5 commits into main from fix-incorrect-topology-spread-constraints-with-zonal-volume

@leoryu leoryu commented Jan 9, 2025

Fixes #1239

Description
At present, topology spread constraints in Karpenter have three problems:

  1. Karpenter injects the volume's nodeAffinity into the pod, so nodes that are not compatible with the volume's nodeAffinity are ignored, which breaks the topology spread constraints.
  2. When Karpenter counts domains, existing nodes that do not run any of the related pods are not counted, which can cause some domains to be missing from the topology spread calculations.
  3. In the topology spread calculations, Karpenter chooses a single, random minimum-count domain from the eligible domains as the requirement, but instances in that domain may not be compatible with the volume requirement(s).
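
To make problem 3 concrete, here is a minimal, self-contained sketch. It is not Karpenter's code: `pickMinDomain`, `compatible`, and the zone names are hypothetical, and ties are broken by name only so the output is deterministic (the description above says the real choice is random).

```go
package main

import (
	"fmt"
	"sort"
)

// pickMinDomain returns one domain with the lowest pod count.
// Hypothetical helper: ties are broken alphabetically to keep the
// example deterministic, standing in for the random pick.
func pickMinDomain(counts map[string]int) string {
	domains := make([]string, 0, len(counts))
	for d := range counts {
		domains = append(domains, d)
	}
	sort.Strings(domains)
	best := ""
	for _, d := range domains {
		if best == "" || counts[d] < counts[best] {
			best = d
		}
	}
	return best
}

// compatible reports whether the chosen domain is one of the zones
// the pod's volume allows.
func compatible(domain string, volumeZones map[string]bool) bool {
	return volumeZones[domain]
}

func main() {
	// Matching pods per zone; zone-b and zone-c are tied for the minimum.
	counts := map[string]int{"zone-a": 1, "zone-b": 0, "zone-c": 0}
	// Assumption for illustration: the pod's volume lives in zone-c only.
	volumeZones := map[string]bool{"zone-c": true}

	chosen := pickMinDomain(counts)
	fmt.Println(chosen, compatible(chosen, volumeZones))
}
```

In this sketch the single pick lands on zone-b even though zone-c is equally valid for the spread and is the only zone the volume accepts, so the pod cannot be scheduled against that requirement.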

The main changes in this PR are:

  1. Handle pod volume requirements independently.
  2. Include the domains of all existing nodes in topology spread calculations.
  3. Add all candidate domains to the topology spread constraints when the pod has volume requirement(s).
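
The idea behind items 2 and 3 can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `candidateDomains` and `restrictToVolume` are hypothetical names, and the skew check uses the standard Kubernetes rule that placing one more pod in a domain must keep its count within `maxSkew` of the global minimum.

```go
package main

import (
	"fmt"
	"sort"
)

// candidateDomains returns every domain where adding one more pod keeps
// (count + 1 - global minimum) within maxSkew. Hypothetical helper.
func candidateDomains(counts map[string]int, maxSkew int) []string {
	if len(counts) == 0 {
		return nil
	}
	lowest, first := 0, true
	for _, c := range counts {
		if first || c < lowest {
			lowest, first = c, false
		}
	}
	var out []string
	for d, c := range counts {
		if c+1-lowest <= maxSkew {
			out = append(out, d)
		}
	}
	sort.Strings(out) // stable order for printing
	return out
}

// restrictToVolume keeps only the candidates that the volume's zones allow.
func restrictToVolume(candidates []string, volumeZones map[string]bool) []string {
	var out []string
	for _, d := range candidates {
		if volumeZones[d] {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	// zone-c has an existing node but no matching pods; it is still listed
	// with a zero count so it takes part in the calculation (item 2).
	counts := map[string]int{"zone-a": 1, "zone-b": 0, "zone-c": 0}
	candidates := candidateDomains(counts, 1)
	fmt.Println(candidates)

	// Assumption for illustration: the pod's volume lives in zone-c only.
	fmt.Println(restrictToVolume(candidates, map[string]bool{"zone-c": true}))
}
```

Keeping the whole candidate set and intersecting it with the volume's zones leaves zone-c available here, whereas committing to one minimum-count domain up front could discard it.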

How was this change tested?
make presubmit
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 9, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: leoryu
Once this PR has been reviewed and has the lgtm label, please assign maciekpytel for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 9, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 9, 2025
@k8s-ci-robot
Contributor

Hi @leoryu. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 9, 2025
@leoryu leoryu force-pushed the fix-incorrect-topology-spread-constraints-with-zonal-volume branch 5 times, most recently from c355d14 to a993af1, January 12, 2025 02:56
@coveralls

coveralls commented Jan 12, 2025

Pull Request Test Coverage Report for Build 12735266711

Details

  • 108 of 112 (96.43%) changed or added relevant lines in 7 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 81.323%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/controllers/provisioning/scheduling/topologygroup.go | 53 | 54 | 98.15% |
| pkg/controllers/provisioning/scheduling/volumetopology.go | 4 | 5 | 80.0% |
| pkg/controllers/provisioning/scheduling/existingnode.go | 6 | 8 | 75.0% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 12718181288: | 0.1% |
| Covered Lines: | 9135 |
| Relevant Lines: | 11233 |

💛 - Coveralls

@leoryu leoryu force-pushed the fix-incorrect-topology-spread-constraints-with-zonal-volume branch 4 times, most recently from 6d8d793 to 1dbfa0a, January 12, 2025 13:23
@leoryu leoryu force-pushed the fix-incorrect-topology-spread-constraints-with-zonal-volume branch from 1dbfa0a to 420766f, January 12, 2025 14:49
@leoryu leoryu changed the title from "[WIP]fix: Fix topology spread constraints with zonal volume" to "fix: Fix topology spread constraints with zonal volume" Jan 13, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 13, 2025
@leoryu
Author

leoryu commented Jan 13, 2025

@jmdeal @engedaam @tallaxes @jonathan-innis @njtran hi, can you help review this PR?

Labels
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
  • size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zonal Volume Requirements Break Topology Spread Constraints
3 participants