fix(controller): clean stale status entries during node deletion reco… #258
SeeyaVhora wants to merge 1 commit into
/cc @ajaysundark
/ok-to-test

@SeeyaVhora Can you start by creating an issue describing in detail how to reproduce the symptoms you aim to fix here?

Thank you for your response, @ajaysundark. Please let me know if any additional details or adjustments would be helpful.

Hello @ajaysundark @mrunalp @SergeyKanzhelev, please look into this PR and review it.

Summary
This PR fixes a stale status lifecycle issue during node deletion reconciliation in the NodeReadinessRule controller.
Previously, deleted nodes could remain temporarily or indefinitely persisted inside NodeEvaluations and FailedNodes status fields due to reconciliation ordering and cleanup semantics.
The controller would:
This created:
FailedNodes.
Root Cause
cleanupDeletedNodes() executed after updateRuleStatus() and independently patched the API using its own retry loop.
As a result:
FailedNodes entries for deleted nodes were never cleaned consistently.
Changes
Reconciliation Ordering Fix
Reordered reconciliation flow so cleanupDeletedNodes() executes before updateRuleStatus().
This ensures the single authoritative status patch never contains stale deleted-node entries.
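To illustrate why the ordering matters, here is a minimal sketch rather than the controller's actual code. RuleStatus is a hypothetical, simplified type that holds node names only, updateRuleStatus is modeled as returning a snapshot of what would be persisted, and reconcileOld and reconcileNew are illustrative names. The old flow's independent retry loop is deliberately left out.

```go
package main

import "fmt"

// RuleStatus is a hypothetical, simplified stand-in for the rule's status;
// the real type in the controller has richer fields.
type RuleStatus struct {
	NodeEvaluations []string // node names only, for illustration
}

// cleanupDeletedNodes prunes entries for nodes missing from live.
// It only mutates the in-memory status and performs no API call.
func cleanupDeletedNodes(s *RuleStatus, live map[string]bool) {
	kept := make([]string, 0, len(s.NodeEvaluations))
	for _, name := range s.NodeEvaluations {
		if live[name] {
			kept = append(kept, name)
		}
	}
	s.NodeEvaluations = kept
}

// updateRuleStatus stands in for the single status patch: it returns a
// copy of what would be persisted at the moment it is called.
func updateRuleStatus(s *RuleStatus) []string {
	return append([]string(nil), s.NodeEvaluations...)
}

// reconcileOld models the previous ordering: patch first, clean up afterwards.
func reconcileOld(s *RuleStatus, live map[string]bool) []string {
	persisted := updateRuleStatus(s)
	cleanupDeletedNodes(s, live)
	return persisted
}

// reconcileNew models the reordered flow: clean up first, then patch once.
func reconcileNew(s *RuleStatus, live map[string]bool) []string {
	cleanupDeletedNodes(s, live)
	return updateRuleStatus(s)
}

func main() {
	live := map[string]bool{"node-a": true} // node-b has been deleted
	old := reconcileOld(&RuleStatus{NodeEvaluations: []string{"node-a", "node-b"}}, live)
	cur := reconcileNew(&RuleStatus{NodeEvaluations: []string{"node-a", "node-b"}}, live)
	fmt.Println(old) // [node-a node-b]
	fmt.Println(cur) // [node-a]
}
```

In this model the old ordering persists node-b even though it no longer exists, while the new ordering never writes it.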
In-Memory Status Cleanup
Refactored cleanupDeletedNodes() to mutate the in-memory rule.Status state directly instead of performing an independent GET/PATCH retry loop.
Persistence is now fully owned by updateRuleStatus().
Lifecycle Consistency
Extended deleted-node cleanup semantics to FailedNodes in addition to NodeEvaluations, ensuring both status fields remain consistent during node churn and autoscaling scenarios.
Regression Coverage
Added deterministic regression coverage validating that deleted nodes are absent from the synchronous post-reconcile status state.
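As a rough illustration of the in-memory cleanup across both fields and the kind of check the regression coverage performs, the sketch below uses hypothetical, simplified types. NodeEvaluation, FailedNode and RuleStatus here are not the project's real API types, and pruneDeletedNodes and staleEntries are illustrative helper names, not functions from this PR.

```go
package main

import "fmt"

// Hypothetical, simplified status types; the controller's real API types
// carry more fields than shown here.
type NodeEvaluation struct {
	NodeName string
}

type FailedNode struct {
	NodeName string
}

type RuleStatus struct {
	NodeEvaluations []NodeEvaluation
	FailedNodes     []FailedNode
}

// pruneDeletedNodes removes entries for nodes that are not in live from
// both status lists. It touches only the in-memory struct and makes no API
// calls, leaving persistence to whoever patches the status afterwards.
func pruneDeletedNodes(status *RuleStatus, live map[string]bool) {
	evals := make([]NodeEvaluation, 0, len(status.NodeEvaluations))
	for _, e := range status.NodeEvaluations {
		if live[e.NodeName] {
			evals = append(evals, e)
		}
	}
	status.NodeEvaluations = evals

	failed := make([]FailedNode, 0, len(status.FailedNodes))
	for _, f := range status.FailedNodes {
		if live[f.NodeName] {
			failed = append(failed, f)
		}
	}
	status.FailedNodes = failed
}

// staleEntries reports node names still referenced by either list although
// they are not live; a regression check expects this to be empty.
func staleEntries(status RuleStatus, live map[string]bool) []string {
	var stale []string
	for _, e := range status.NodeEvaluations {
		if !live[e.NodeName] {
			stale = append(stale, e.NodeName)
		}
	}
	for _, f := range status.FailedNodes {
		if !live[f.NodeName] {
			stale = append(stale, f.NodeName)
		}
	}
	return stale
}

func main() {
	live := map[string]bool{"node-a": true} // node-b has been deleted
	status := RuleStatus{
		NodeEvaluations: []NodeEvaluation{{NodeName: "node-a"}, {NodeName: "node-b"}},
		FailedNodes:     []FailedNode{{NodeName: "node-b"}},
	}
	fmt.Println(staleEntries(status, live)) // [node-b node-b]
	pruneDeletedNodes(&status, live)
	fmt.Println(staleEntries(status, live)) // []
}
```

Because the prune step is a pure in-memory mutation, the check can run synchronously right after it with no API round trip, which is what makes this kind of assertion deterministic.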
Before vs After
cleanupDeletedNodes() behavior
FailedNodes cleanup
Impact