pull-npd-e2e-test failing ssh handshake #970
Comments
This looks like an infra issue. @BenTheElder Do you know who we should talk to? CC @hakman
It's a problem with the jobs. SIG K8S infra does not create your test VMs. The test is attempting to SSH to a disposable test VM created by your job. It seems like the VM is not serving SSH, or something similar.
CC @DigitalVeer
If these are like node e2e tests, folks in SIG Node might be familiar with this. SIG Testing strongly discourages SSH usage in cluster e2e tests, relying instead on hostexec pods when necessary. For some node-style testing that's not sufficient, though, and it's mostly folks in SIG Node who work with this.
It's possible there is an issue with the GCP projects rented by this test. It's unclear to me why the SSH connection is not working, but I'll try to debug with @hakman.
This is an issue with:
echo "fake filesystem error from problem-maker" > /sys/fs/ext4/sda1/trigger_fs_error
Once this runs, the filesystem is mounted as read-only and SSH stops working with
There may be some recent changes that affect the behaviour of
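For anyone debugging this on a VM that still has a working console, the read-only remount described above can be checked from the mount table. The following is only a rough sketch: the helper name `is_mounted_read_only` is hypothetical and not part of NPD, and it assumes the usual `/proc/mounts` layout (device, mount point, filesystem type, comma-separated options).

```python
def is_mounted_read_only(mounts_text, mount_point="/"):
    """Return True/False for the given mount point, or None if it is not listed.

    mounts_text is the content of /proc/mounts (passed in as a string so the
    function can be exercised without touching a real system).
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        if fields[1] == mount_point:
            # Later entries shadow earlier ones, so keep the last match.
            result = "ro" in fields[3].split(",")
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        print(is_mounted_read_only(f.read(), "/"))
```

This only reports the mount options; it says nothing about whether files on that filesystem are still readable.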
New updates: Talked to the COS team and found the root cause: https://www.spinics.net/lists/linux-ext4/msg90066.html The kernel commit changes EXT4_MF_FS_ABORTED to EXT4_FLAGS_SHUTDOWN when an fs error happens, so although the fs is remounted as read-only, files can't be read by anyone and SSH connections fail. This is an intentional change in the upstream kernel, so the COS side won't change it. The path forward is to update the NPD test case for newer kernel versions (>=6.5.0-rc3).
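One way a test could gate on the kernel version mentioned above is to parse the release string and compare it against 6.5.0-rc3. This is a minimal sketch, not the actual NPD change: the function names are hypothetical, and it assumes release strings shaped like the output of `uname -r` (for example `6.5.0-rc3` or `5.15.0-1048-gke`). It also cannot account for distributions that backport the change to older kernels.

```python
import re

# Sentinel rc number for final (non-rc) releases, so they sort after any -rcN.
_FINAL = 10**9


def parse_kernel_release(release):
    """Parse a kernel release string into a comparable (major, minor, patch, rc) tuple."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    patch = int(m.group(3) or 0)
    rc = int(m.group(4)) if m.group(4) else _FINAL
    return (major, minor, patch, rc)


def fs_error_shuts_down_filesystem(release):
    """True if the kernel is at or after 6.5.0-rc3, the threshold quoted in this thread."""
    return parse_kernel_release(release) >= (6, 5, 0, 3)


if __name__ == "__main__":
    import platform
    print(fs_error_shuts_down_filesystem(platform.release()))
```

A test could use a check like this to skip or adjust the `trigger_fs_error` case on newer kernels instead of expecting SSH to keep working.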
@wangzhen127 I don't think SSH failing after this is intended behaviour.
Yeah, this is from the COS team's perspective. Because the change is in upstream, there is not much they can do, so they recommend that we update the tests. Sorry for the confusion.
No worries, I just meant that maybe they can configure the SSH server to not fail completely. I agree that the FS should become read-only, but not accepting SSH connections is quite unexpected.
https://testgrid.k8s.io/presubmits-node-problem-detector#pull-npd-e2e-test started failing recently.
This is affecting several different PRs: #955, #961, #969.