volume not being reattached to healthy node when initial node shutdown #720

Open
evgenii-avdiukhin opened this issue Sep 11, 2024 · 2 comments
Labels
bug Something isn't working Stale

Comments

@evgenii-avdiukhin
TL;DR

I configured the csi-driver and deployed a Jenkins StatefulSet to test it.
The volume was automatically created and attached to worker-1.
The Jenkins pod was then scheduled on the same node.
Then I wanted to test how reattachment works,
so I shut down the worker-1 Hetzner VM.
But nothing happened: the volume is not being reattached.
Since tolerations are configured, the Jenkins pod goes into Terminating and then tries to schedule on the node that has the PVC, but it can't, because the PVC is still on the dead node.
What am I doing wrong? Or is this behaviour not supported by the csi-driver?

Expected behavior

The Hetzner volume is moved to a healthy node and the pod is scheduled successfully.

Observed behavior

The volume is not being reattached.

Minimal working example

No response

Log output

No response

Additional information

No response

@evgenii-avdiukhin evgenii-avdiukhin added the bug Something isn't working label Sep 11, 2024
@mpepping
By design, StatefulSet pods do not get rescheduled to a new node when the original node becomes unavailable. This is because Kubernetes cannot distinguish between a deliberate shutdown and a network partition, so it marks the pods on the down node as Unknown rather than deleting them. That is what you see when you power off or shut down a node. In the case of a StatefulSet, this requires manual rescheduling.
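
The rule behind this can be sketched as a toy Python model. It is a simplification under stated assumptions, not Kubernetes code: the classes and methods (`Cluster`, `reconcile_statefulset`, `force_delete`) are hypothetical, and taints, grace periods, and the attach/detach controller are not modeled at all. What it shows is that a replacement for `jenkins-0` is not created while the old pod object still exists, and that removing that object by hand is what unblocks it. In a real cluster, that manual step is typically a force delete of the stale pod (for example `kubectl delete pod jenkins-0 --grace-period=0 --force`) or deleting the Node object.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Toy model only (hypothetical names): it captures the StatefulSet rule that
# at most one pod object may exist per ordinal, and nothing else.


@dataclass
class Pod:
    name: str
    node: str
    phase: str = "Running"  # "Running" or "Unknown"


@dataclass
class Cluster:
    ready_nodes: Set[str]
    pods: Dict[str, Pod] = field(default_factory=dict)

    def shut_down(self, node: str) -> None:
        # A powered-off node looks the same as a partitioned one, so its pods
        # are marked Unknown while their objects stay in the API.
        self.ready_nodes.discard(node)
        for pod in self.pods.values():
            if pod.node == node:
                pod.phase = "Unknown"

    def force_delete(self, name: str) -> None:
        # The manual step: an operator removes the stale pod object.
        self.pods.pop(name, None)

    def reconcile_statefulset(self, name: str) -> Optional[Pod]:
        # No replacement is created while an object with this name exists,
        # so two copies of the same ordinal never run at the same time.
        if name in self.pods or not self.ready_nodes:
            return None
        pod = Pod(name, min(self.ready_nodes))  # deterministic node choice
        self.pods[name] = pod
        return pod


cluster = Cluster(ready_nodes={"worker-1", "worker-2"})
first = cluster.reconcile_statefulset("jenkins-0")  # placed on worker-1
cluster.shut_down("worker-1")
stuck = cluster.reconcile_statefulset("jenkins-0")  # None: old object remains
cluster.force_delete("jenkins-0")
moved = cluster.reconcile_statefulset("jenkins-0")  # placed on worker-2
```

The model deliberately keeps the stale object around after `shut_down`, because that is the safety property at stake: without a confirmation from the node, the control plane cannot know the old pod has stopped writing to the volume.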

However, if you drain or delete the node running the Jenkins pod, everything works as you would expect; the behavior is most responsive when draining or deleting nodes. You will see some 'exclusively attached' events on the workload, but overall the PVC reattaches in a reasonable time:

Normal   Scheduled               22s   default-scheduler        Successfully assigned jenkins/jenkins-0 to dev-pool-small-static-worker2
Warning  FailedAttachVolume      23s   attachdetach-controller  Multi-Attach error for volume "pvc-8b1a23a1-cc85-4b09-9231-2c963885e366" Volume is already exclusively attached to one node and can't be attached to another 
Normal   SuccessfulAttachVolume  0s    attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-8b1a23a1-cc85-4b09-9231-2c963885e366"                  
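
The event sequence above can be illustrated with a toy Python model of an exclusively attached (ReadWriteOnce-style) volume. This is an assumption-laden sketch, not how the attach/detach controller or the Hetzner csi-driver is implemented: `ExclusiveVolume`, `attach_with_retry`, and the node names `old-node` and `new-node` are hypothetical (only the PVC ID and `dev-pool-small-static-worker2` come from the log). It shows why one Multi-Attach warning is expected during a drain: the new node's attach fails until the old node releases the volume, then succeeds on a later retry. If the old node never releases it, every retry fails, which mirrors the stuck state described in the issue.

```python
from typing import Callable, List, Optional


class MultiAttachError(Exception):
    """Raised when an exclusively attached volume is requested by a second node."""


class ExclusiveVolume:
    # Toy ReadWriteOnce-style volume: attached to at most one node at a time.
    def __init__(self, volume_id: str) -> None:
        self.volume_id = volume_id
        self.attached_to: Optional[str] = None

    def attach(self, node: str) -> None:
        if self.attached_to not in (None, node):
            raise MultiAttachError(
                f'Multi-Attach error for volume "{self.volume_id}": '
                f"already exclusively attached to {self.attached_to}"
            )
        self.attached_to = node

    def detach(self, node: str) -> None:
        if self.attached_to == node:
            self.attached_to = None


def attach_with_retry(volume: ExclusiveVolume, node: str,
                      between_attempts: Callable[[], None],
                      max_attempts: int = 5) -> List[str]:
    # Retry the attach, recording one event per attempt; between_attempts
    # stands in for whatever the old node does meanwhile (e.g. finishing a drain).
    events: List[str] = []
    for _ in range(max_attempts):
        try:
            volume.attach(node)
        except MultiAttachError as err:
            events.append(f"FailedAttachVolume: {err}")
            between_attempts()
            continue
        events.append("SuccessfulAttachVolume")
        break
    return events


# Drain: the old node releases the volume after the first failed attempt.
vol = ExclusiveVolume("pvc-8b1a23a1-cc85-4b09-9231-2c963885e366")
vol.attach("old-node")
drain_events = attach_with_retry(
    vol, "dev-pool-small-static-worker2", lambda: vol.detach("old-node"))

# Shutdown: nothing ever releases the volume, so every attempt fails.
stuck = ExclusiveVolume("pvc-example")
stuck.attach("old-node")
shutdown_events = attach_with_retry(stuck, "new-node", lambda: None)
```

In the drain case, `drain_events` holds one failure followed by a success, matching the Warning/Normal pair in the log; in the shutdown case, `stuck` stays attached to `old-node`.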

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

@github-actions github-actions bot added the Stale label Dec 21, 2024