Uninstall handle cases when directories are mounts and cannot be removed #4470

Merged
merged 2 commits into from
Jul 31, 2023

aceeric
Contributor

@aceeric aceeric commented Jul 12, 2023

Proposed Changes

In some cases a machine is configured with /var/lib/kubelet as a mount rather than a plain directory. This causes the rke2-uninstall.sh script to fail at the point where it removes directories, so the machine isn't fully cleaned up. If you use the Rancher Federal Ansible RKE2 installer to install or re-install RKE2, the leftover files cause the re-install to fail. The change in this PR accomplishes the same thing as the code it replaces, because rm -rf removes all files under a directory before trying to remove the directory itself. When /var/lib/kubelet is a mount, the script now ignores the error from removing the mount point and moves on to clean up /var/lib/rancher.
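
The diff itself is not shown in this conversation. As a rough sketch of the behavior described above, assuming a POSIX shell, a tolerant removal helper might look like the following. The function name remove_tree_tolerant is hypothetical and is not taken from rke2-uninstall.sh.

```shell
#!/bin/sh
# Hypothetical helper (not taken from rke2-uninstall.sh): delete a directory
# tree, but keep going if the removal reports an error, for example because
# the top-level directory is a mount point and cannot be unlinked.
remove_tree_tolerant() {
    target="$1"
    # rm -rf deletes the contents first and only then attempts the directory
    # itself, so the files under a mount point are still cleaned up.
    if ! rm -rf -- "$target"; then
        echo "warning: could not fully remove $target; continuing" >&2
    fi
    return 0
}

# Intended use (destructive, so shown here only as comments):
#   remove_tree_tolerant /var/lib/kubelet
#   remove_tree_tolerant /var/lib/rancher
```

Because the helper always returns 0, a script running under set -e would continue to the next cleanup step. The trade-off is that a genuine failure is reduced to a warning on stderr.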

Verification

In AWS, provision an instance with two drives and configure /var/lib/kubelet as a mount:

# lsblk
NAME        MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0  30G  0 disk 
└─nvme0n1p1 259:1    0  30G  0 part /
nvme1n1     259:2    0  50G  0 disk /var/lib/kubelet

Then install rke2 and uninstall using the modified script. Confirm the script exits without error and /var/lib/rancher is removed.

Linked issue:

@aceeric aceeric requested a review from a team as a code owner July 12, 2023 14:34
bundle/bin/rke2-uninstall.sh Outdated
Member

@cwayne18 cwayne18 left a comment

Thanks so much for the PR!

@brandond brandond merged commit 3963d6b into rancher:master Jul 31, 2023
1 check passed
@andrewkhalaf

andrewkhalaf commented Aug 14, 2023

Why does this PR just ignore the issue instead of fixing the root cause?
If the uninstallation script completes normally without unmounting the directories under /var/lib/kubelet, the next installation will fail.

Logs:

+ rm -f /usr/local/bin/rke2-killall.sh
+ rm -rf /usr/local/share/rke2
+ rm -rf /etc/rancher/rke2
+ rm -rf /etc/rancher/node
+ rm -d /etc/rancher
+ rm -rf /etc/cni
+ rm -rf /opt/cni/bin
+ rm -rf /var/lib/kubelet
rm: cannot remove '/var/lib/kubelet/pods/03dab7eb-e905-47ad-81a9-c68fa1f592ea/volumes/kubernetes.io~projected/kube-api-access-5wf7p': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/915aaae6-a02a-4580-b912-9be7e1bcb98a/volumes/kubernetes.io~secret/webhook-cert': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/915aaae6-a02a-4580-b912-9be7e1bcb98a/volumes/kubernetes.io~projected/kube-api-access-8q7sr': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/6d5e0117-e648-480d-91ca-7ebd7b0e6e6d/volumes/kubernetes.io~secret/longhorn-grpc-tls': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/6d5e0117-e648-480d-91ca-7ebd7b0e6e6d/volumes/kubernetes.io~projected/kube-api-access-kltqs': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/5934e6a9-0eaa-4ead-bb60-9cb5581b1d53/volumes/kubernetes.io~projected/kube-api-access-lrq6m': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/bed13512-25b9-4066-aa6e-7c9eea292151/volumes/kubernetes.io~projected/kube-api-access-qwzsn': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/61501c04-b4d9-4ca2-a20f-f53c34f3a6fa/volumes/kubernetes.io~projected/kube-api-access-lsh9c': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/38cf225e-1276-47c8-b544-433ef89a3f68/volumes/kubernetes.io~projected/kube-api-access-5pxrn': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/cbc91bf0-1bad-434d-b353-67aec5c753a9/volumes/kubernetes.io~projected/kube-api-access-68688': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/cbebd42d-e06f-4852-983b-7de372b10534/volumes/kubernetes.io~secret/longhorn-grpc-tls': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/cbebd42d-e06f-4852-983b-7de372b10534/volumes/kubernetes.io~projected/kube-api-access-dt9cc': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/d205746b-6c7f-43bb-97de-be95a4ca9cfa/volumes/kubernetes.io~secret/longhorn-grpc-tls': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/d205746b-6c7f-43bb-97de-be95a4ca9cfa/volumes/kubernetes.io~projected/kube-api-access-gnsn9': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/c41214a8-6f3f-4620-b36d-393727e4d30a/volumes/kubernetes.io~projected/kube-api-access-hbcp9': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/2a6fdb4c-6748-4dae-a373-59ee6ddecdb9/volumes/kubernetes.io~projected/kube-api-access-rmqw8': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/959e4e7f-3916-41bd-8e7b-b11069cacb0a/volumes/kubernetes.io~projected/kube-api-access-glq5m': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/2b5a1526-26db-4433-b2ec-af996e531d4f/volumes/kubernetes.io~projected/kube-api-access-qgwkf': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/babd6f7b-a3b4-4bcb-a4b9-f3071c85878d/volumes/kubernetes.io~projected/kube-api-access-mdpw9': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/dc0e82b6-7fb1-4fdd-adc4-a269e8733b89/volumes/kubernetes.io~projected/kube-api-access-htqrm': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/2417edbc-c4d6-47b4-abfb-98021fb21722/volumes/kubernetes.io~projected/kube-api-access-x2wtn': Device or resource busy

The /var/lib/kubelet directory is not removed, and the automation script didn't catch the error because the rke2-uninstall.sh script completed normally.
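
One way automation could catch this case regardless of the uninstall script's exit status is to check the directory afterwards. The sketch below assumes a POSIX shell; the function name dir_is_clean is hypothetical and is not part of RKE2 or its scripts.

```shell
#!/bin/sh
# Hypothetical post-uninstall check: succeed only if the path is absent or is
# an empty directory (an empty mount point counts as clean).
dir_is_clean() {
    dir="$1"
    # Nothing at this path: nothing was left behind.
    [ -e "$dir" ] || return 0
    # Something exists but is not a directory: treat it as leftover state.
    [ -d "$dir" ] || return 1
    # ls -A lists all entries except . and .. (including dotfiles).
    [ -z "$(ls -A -- "$dir" 2>/dev/null)" ]
}

# Example use in an automation step:
#   dir_is_clean /var/lib/kubelet || { echo "kubelet dir not clean" >&2; exit 1; }
```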

@caroline-suse-rancher
Contributor

Hello @andrewkhalaf, you can check out where our QA has validated that this issue is fixed here. If you follow the same reproduction steps and still see problems, please open a new bug report or feature request. Thanks!

@brandond
Member

brandond commented Oct 24, 2023

@andrewkhalaf the umount of those paths should have been handled by the rke2-killall script, prior to the uninstall script trying to delete the directory. Unfortunately you've not included the portion of the log where the killall script was run, so I can't say why it didn't work.
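
To see whether any mounts are still present under /var/lib/kubelet before the uninstall script runs, one option on Linux is to read the mount table. This is a sketch only: the function name mounts_under is hypothetical, it assumes /proc/self/mounts is available, and it is not how rke2-killall.sh is known to work.

```shell
#!/bin/sh
# Hypothetical diagnostic: print every mount point equal to, or located under,
# the given prefix. The second argument (mount table path) is optional and
# defaults to /proc/self/mounts; it exists mainly to make testing easy.
mounts_under() {
    prefix="${1%/}"
    mounts_file="${2:-/proc/self/mounts}"
    # Field 2 of each mount table line is the mount point. Match the prefix
    # itself or anything below it, but not siblings sharing the same leading
    # characters (for example /var/lib/kubelet-old).
    awk -v p="$prefix" '$2 == p || index($2, p "/") == 1 { print $2 }' "$mounts_file"
}

# Example: mounts_under /var/lib/kubelet
```

Any paths printed after running the killall script would point to mounts that were not released, which matches the "Device or resource busy" errors in the log above.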
