synhershko

When nodes are drained during Kubernetes or OS upgrades, pods are deleted and recreated, but the StatefulSet revision does not change. The existing logic incorrectly triggered a rolling restart in these scenarios, terminating all pods with lower ordinal numbers simultaneously.

This change adds revision comparison checks to prevent rolling restarts when StatefulSet current and update revisions match, indicating no actual spec changes occurred. Rolling restarts now only happen when there are legitimate configuration updates requiring pod recreation.

Changes:

  • Add revision match checks in RollingRestartReconciler.Reconcile()
  • Enhance WorkingPodForRollingRestart() to skip when revisions match
  • Add debug logging to indicate when rolling restarts are skipped
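
The revision check described in the list above could look roughly like the following. This is a minimal, dependency-free sketch rather than the operator's actual code: `statefulSetStatus`, `revisionsMatch`, and `shouldRollingRestart` are hypothetical names used only for illustration, and the local struct stands in for the `CurrentRevision` and `UpdateRevision` fields of the Kubernetes `apps/v1` `StatefulSetStatus`.

```go
package main

import "fmt"

// statefulSetStatus is a hypothetical stand-in that mirrors only the two
// revision fields of the Kubernetes apps/v1 StatefulSetStatus.
type statefulSetStatus struct {
	CurrentRevision string
	UpdateRevision  string
}

// revisionsMatch reports whether the current and update revisions are equal,
// which indicates that no spec change is waiting to be rolled out.
func revisionsMatch(s statefulSetStatus) bool {
	return s.CurrentRevision == s.UpdateRevision
}

// shouldRollingRestart returns false, plus a message suitable for a debug
// log, when the revisions match. In that case missing pods were removed by
// something else (for example a node drain) and need no rolling restart.
func shouldRollingRestart(name string, s statefulSetStatus) (bool, string) {
	if revisionsMatch(s) {
		return false, fmt.Sprintf(
			"skipping rolling restart for %s: current and update revisions match (%s)",
			name, s.UpdateRevision)
	}
	return true, fmt.Sprintf(
		"rolling restart needed for %s: revision %s -> %s",
		name, s.CurrentRevision, s.UpdateRevision)
}
```

In a reconciler, a check of this kind would run before any pod is selected for deletion, so a StatefulSet whose pods were recreated by a drain is left for the StatefulSet controller to recover on its own.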

Fixes issues during node upgrades and other node draining scenarios where pods are recreated without spec changes. Fixes #312.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Itamar Syn-Hershko <[email protected]>

Successfully merging this pull request may close these issues.

Operator terminates nodes via restart after k8s node removed via spot request termination.