csi-driver-nfs stops working for new PVC on randomly affected nodes. #649
Comments
A previous volume mount is still in progress and stuck. This can happen for various reasons, for example the network being unreachable or the NFS client not responding.
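To illustrate why a stuck mount shows up as a repeated Aborted error, here is a minimal sketch of a per-volume in-flight lock. It assumes the driver guards each volume ID this way, which is what the error message suggests; `VolumeLocks`, `node_publish_volume` and the `mount` callback are hypothetical names for illustration, not the driver's actual code.

```python
import threading


class VolumeLocks:
    """Tracks volume IDs that currently have an operation in flight (illustrative only)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._in_flight = set()

    def try_acquire(self, volume_id: str) -> bool:
        # Returns False when another operation on this volume ID has not finished yet.
        with self._mutex:
            if volume_id in self._in_flight:
                return False
            self._in_flight.add(volume_id)
            return True

    def release(self, volume_id: str) -> None:
        with self._mutex:
            self._in_flight.discard(volume_id)


def node_publish_volume(locks: VolumeLocks, volume_id: str, mount) -> str:
    # Hypothetical handler: refuse to start a second operation on the same volume ID.
    if not locks.try_acquire(volume_id):
        return f"Aborted: An operation with the given Volume ID {volume_id} already exists"
    try:
        mount()  # If this call never returns, the lock is never released.
        return "OK"
    finally:
        locks.release(volume_id)
```

Under this assumption, every retry for the same volume ID keeps failing with Aborted for as long as the first mount call hangs, while other volume IDs are not blocked by the lock itself.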
I ran the command and got the following output. The command stays stuck after that output.
172.20.0.14 is the service IP of the NFS server.
It seems DNS is fine, assuming 172.20.0.14 is accessible. Is there any useful log from the kernel? Such as checking the log in host's
Last few messages from dmesg:
Do you think the NFS server running inside the cluster has anything to do with this? I noticed the host network trying to connect to the service network here:
I've been noticing inconsistencies, especially when using helm 4.7.0.
[MountVolume.SetUp Failure on Pod Reschedule with csi-driver-nfs]
Hello guys, I hope everything is fine for you. I am encountering a similar issue here:
Expected Behavior:
Actual Behavior:
Describe:
Environment:
Deployment method:
Side nfs server:
Side csi-nfs-node :
Side container nfs registry.k8s.io/sig-storage/nfsplugin:v4.8.0:
Let me know if further information is needed. Take care.
You could check whether a manual NFS mount works on the node. If it fails too, this is not a CSI driver issue.
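Before trying a manual mount, a quick TCP check from the affected node can show whether the NFS server is reachable at all. The sketch below is only a connectivity probe, not a mount; `tcp_port_reachable` is a hypothetical helper, and it assumes the server listens on the standard NFS port 2049.

```python
import socket


def tcp_port_reachable(host: str, port: int = 2049, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling `tcp_port_reachable("172.20.0.14")` on the node tests the service IP mentioned earlier in this thread. A False result points at networking rather than the driver; a True result only shows the port is open, so a manual mount is still the real test.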
Shame on me, you are right... I initialized my other master without these two arguments:
I reinitialized the second master with the correct arguments, then restarted the csi-nfs-node pod, and everything is working fine now. Sorry for the noise. Take care.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
What happened: We use csi-driver-nfs to dynamically provision NFS volumes through PVCs and attach them to pods. Sometimes the driver on a node gets stuck in a bad state and stops working altogether on that node: it cannot mount any new NFS PVC into any pod scheduled there. Creating a new PVC still creates the directory on the NFS server, but mounting that directory on the affected node fails. We have to replace the node to get the driver working again.
What you expected to happen: The NFS driver should be able to mount the NFS directory on the affected node, just as it does on other nodes.
How to reproduce it: This happens randomly and there are no specific steps to replicate it. We observe it once or twice every few months. The only way we can recover is to replace the node.
Anything else we need to know?: Not sure if this matters, but we have an in-cluster NFS server that is exposed using a Kubernetes Service object. This is an EKS cluster using the default Amazon Linux AMIs. We observed the issue on Kubernetes versions 1.25 and 1.27. We have not tested on any other version.
Logs:
Running kubectl logs -n kube-system csi-nfs-node-pzs7j -c nfs, we see repeated log entries like the ones below.
I0416 15:22:39.454827 1 utils.go:109] GRPC call: /csi.v1.Node/NodePublishVolume
I0416 15:22:39.454846 1 utils.go:110] GRPC request: {"target_path":"/var/lib/kubelet/pods/096e1ea2-4c1e-46eb-8006-ebcc57e34563/volumes/kubernetes.io~csi/pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8/mount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["nfsvers=4.1"]}},"access_mode":{"mode":5}},"volume_context":{"csi.storage.k8s.io/pv/name":"pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8","csi.storage.k8s.io/pvc/name":"env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-home","csi.storage.k8s.io/pvc/namespace":"opaque-user","server":"nfs.nfs-server-domain-name","share":"/","storage.kubernetes.io/csiProvisionerIdentity":"1713274642129-9461-nfs.csi.k8s.io","subdir":"pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8"},"volume_id":"nfs.nfs-server-domain-name##pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8##"}
E0416 15:22:39.454957 1 utils.go:114] GRPC error: rpc error: code = Aborted desc = An operation with the given Volume ID nfs.nfs-server-domain-name##pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8## already exists
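To see which volume IDs are affected and how often, the Aborted entries can be counted from saved log output. This is a small sketch that assumes the error message keeps the exact wording shown in the log above; `aborted_volume_ids` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the Aborted error text exactly as it appears in the node plugin log above.
ABORTED_RE = re.compile(
    r"code = Aborted desc = An operation with the given Volume ID (\S+) already exists"
)


def aborted_volume_ids(log_text: str) -> Counter:
    """Count how many times each volume ID hit the 'already exists' Aborted error."""
    return Counter(ABORTED_RE.findall(log_text))
```

Passing the text saved from the kubectl logs command to `aborted_volume_ids` and calling `.most_common()` on the result lists the volume IDs whose operations never completed, which helps identify the first mount that got stuck.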
nfs.nfs-server-domain-name is a custom DNS entry mapped to the service IP of the in-cluster NFS server using a CoreDNS custom config.
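Since the mount depends on this custom DNS entry, it may be worth confirming that the name resolves to the expected service IP from wherever the mount is issued. The sketch below uses only the standard resolver of the process running it, so the result depends on whether it runs on the host or inside a pod; `resolved_ipv4_addresses` and `resolves_to` are hypothetical helper names.

```python
import socket


def resolved_ipv4_addresses(hostname: str) -> set[str]:
    """Return the IPv4 addresses the local resolver gives for hostname (empty set on failure)."""
    try:
        infos = socket.getaddrinfo(
            hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def resolves_to(hostname: str, expected_ip: str) -> bool:
    """Check whether expected_ip is among the addresses hostname resolves to."""
    return expected_ip in resolved_ipv4_addresses(hostname)
```

For this setup the check would be `resolves_to("nfs.nfs-server-domain-name", "172.20.0.14")`, using the service IP quoted earlier in the thread.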
Environment:
Kubernetes version (use kubectl version): 1.25 and 1.27
OS (e.g. from /etc/os-release):
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
SUPPORT_END="2025-06-30"
Kernel (e.g. uname -a): Linux ip-10-0-2-227.ec2.internal 5.10.213-201.855.amzn2.x86_64 #1 SMP Mon Mar 25 18:16:11 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Install tools:
Others: