Failed Mount Error (exit status 32) When Creating Pod with PVC using the csi-driver-smb #852

Open
hajedkh opened this issue Sep 24, 2024 · 8 comments

Comments

@hajedkh

hajedkh commented Sep 24, 2024

What happened:
Pod Creation Error with event:
Warning FailedMount 28s (x8 over 92s) kubelet MountVolume.MountDevice failed for volume "pvc-f03018a6-a450-41a9-b4f7-0609a57120e7" : rpc error: code = Internal desc = volume(viaps012-int.lia.int/archives#pvc-f03018a6-a450-41a9-b4f7-0609a57120e7##) mount "//<HOST>/archives" on "/var/lib/kubelet/plugins/kubernetes.io/csi/smb.csi.k8s.io/0a24123840085c6b252ac47fff4245d291dfda1381a23183f0c8b394e4183af5/globalmount" failed with mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t cifs -o dir_mode=0777,file_mode=0777,uid=1001,gid=1001,<masked> //viaps012-int.lia.int/archives /var/lib/kubelet/plugins/kubernetes.io/csi/smb.csi.k8s.io/0a24123840085c6b252ac47fff4245d291dfda1381a23183f0c8b394e4183af5/globalmount
Output: mount error(13): Permission denied
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log messages (dmesg)

What you expected to happen:
Volume provisioned and pod created.

How to reproduce it:
Random: after multiple volume mounts, the mount fails for some of them and the pod stays blocked in ContainerCreationError.

Anything else we need to know?:
When we reschedule the pod on another node in the cluster, it works fine (this happens on all worker nodes).
Environment:

  • CSI Driver version: v1.16.0
  • Kubernetes version (use kubectl version): v1.27.13+e709aa5
  • OS (e.g. from /etc/os-release): CoreOS
  • Install tools: Helm
  • Others: Openshift cluster 4.14.26
@andyzhangx
Member

That's a Permission denied error. Does a manual mount on the node work?
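The event above carries two different codes: exit status 32 is the generic "mount failure" return code of mount(8), while the number in "mount error(13)" is the errno reported for the CIFS mount attempt. A minimal sketch, assuming the error text has been captured as a string (the function name `decode_mount_error` is hypothetical), that decodes the errno into its symbolic name:

```python
import errno
import os
import re

# Matches the line mount.cifs prints on failure, e.g. "mount error(13): Permission denied".
MOUNT_ERROR_RE = re.compile(r"mount error\((\d+)\)")


def decode_mount_error(output: str):
    """Return (errno number, symbolic name, message) for a mount.cifs error line, or None."""
    match = MOUNT_ERROR_RE.search(output)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numbers to names for the platform running this script.
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)


if __name__ == "__main__":
    # Sample text taken from the kubelet event in this issue.
    sample = "Output: mount error(13): Permission denied Refer to the mount.cifs(8) manual page"
    print(decode_mount_error(sample))
```

On Linux, errno 13 is EACCES, so the server or kernel rejected the session or share access; the kernel log (dmesg) on the node usually has the more specific CIFS status.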

@hajedkh
Author

hajedkh commented Oct 13, 2024

It is the exact same issue when using the mount command on the node: randomly, sometimes it succeeds and sometimes it doesn't.

@andyzhangx
Member

Then it's not a CSI driver issue.

@pcking999

@hajedkh do the shares you are connecting to happen to be DFS shares? I just had this exact same issue with the same error; permissions on the shares were fine and hadn't changed. When I remoted into my worker node and ran journalctl -xe, I noticed this error repeated multiple times: "the device mount path ... is still mounted by other references". It appears that when our main file server went down for patching, the DFS shares resolved to our backup file server. I could see that shares were still mounted to the backup server on the worker host by running "cat /proc/mounts" and looking at the IP address. I think that once the main file server was back online, the system tried to mount the shares against the main file server when pods were brought up, but it couldn't because it already had a connection to the backup file server, hence the "mounted by other references" error. I ended up changing everything to point to the server shares directly, not through DFS, but it would be nice if DFS worked seamlessly.
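The /proc/mounts check described above can be scripted. This is a minimal sketch, assuming the cifs module reports the resolved server IP as an `addr=` mount option (as it does on the kernels I have seen); the function name `cifs_mounts` and the sample share in the usage note are hypothetical:

```python
import os
from typing import Dict, List


def cifs_mounts(mounts_text: str) -> List[Dict[str, str]]:
    """List CIFS mounts together with the server address each one is connected to."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] != "cifs":
            continue
        options = fields[3].split(",")
        # Pick the value of the first "addr=" option, or "" if none is present.
        addr = next((o[len("addr="):] for o in options if o.startswith("addr=")), "")
        result.append({"share": fields[0], "mountpoint": fields[1], "addr": addr})
    return result


if __name__ == "__main__":
    # Only meaningful on a Linux node; skipped elsewhere.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            for m in cifs_mounts(f.read()):
                print(f'{m["share"]} -> {m["addr"]} on {m["mountpoint"]}')
```

Run on the affected worker node, this prints one line per CIFS mount; an `addr` that belongs to the backup or DR server rather than the expected file server would match the stale-connection situation described in this comment.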

@kxs-jnadeau

Also consider the CIFS driver that ships with your worker node kernel. We've seen many instabilities in the CIFS driver of shipping Linux kernels before upstream version 6.5, causing similar issues.

@hajedkh
Author

hajedkh commented Nov 13, 2024

@kxs-jnadeau Could you please specify which versions of cifs you recommend? I am using CoreOS RHEL 9.2 and the cifs module version is 2.37.

@kxs-jnadeau

We have seen stability with the CIFS driver as shipped by AKS on Ubuntu 22.04, but they appear to be backporting it from kernel 6.5 onto the 5.15 Linux baseline, at version 2.44.
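To compare a node against the versions mentioned in this thread, the loaded module's version can be read and compared numerically. This is a rough sketch, assuming the cifs module is loaded and exposes its version at /sys/module/cifs/version (`modinfo cifs` is an alternative); the helper names are hypothetical, and 2.44 is only the version reported as stable in this comment, not an official minimum:

```python
import os
from typing import Optional, Tuple

# Assumed location of the loaded module's version string.
CIFS_VERSION_PATH = "/sys/module/cifs/version"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "2.37" into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def loaded_cifs_version(path: str = CIFS_VERSION_PATH) -> Optional[Tuple[int, ...]]:
    """Return the loaded cifs module version, or None if the file is absent."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return parse_version(f.read())


def is_older(current: str, reference: str) -> bool:
    """True if `current` is strictly older than `reference` (numeric, not string, compare)."""
    return parse_version(current) < parse_version(reference)


if __name__ == "__main__":
    print("loaded cifs module version:", loaded_cifs_version())
    # Versions quoted in this thread: 2.37 on the affected node, 2.44 on AKS.
    print("2.37 older than 2.44:", is_older("2.37", "2.44"))
```

Comparing integer tuples avoids the string-comparison trap where "2.9" would sort after "2.37".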

@pcking999

Just had this happen again, but this time everything was pointing directly to the server shares and not using the DFS shares at all. I could see that one host had the shares mounted pointing to the IP address of our DR server. That doesn't make much sense to me: since it wasn't using the DFS shares, I'm not sure how that happened. I believe the permission error is a red herring and the true error to focus on is exit status 32.

Our hosts are running RHEL 8.9
Kubernetes version v1.27.2


4 participants