SFTP Connection fails with Alpine image on Linux Kernel 5.14.0-427.16.1.el9_4.x86_64 #416
Comments
I ran into a similar situation, so I am curious: did you ever wait about 15 minutes to see whether the connection comes up late? And how much CPU is the container consuming?
We tried several changes to the configuration and firewalls, and yes, it ran for more than 15 minutes several times, but it didn't work. Incoming packets arrived at the container, but we could not see any outbound TCP packets. The Debian image works fine, so we are using that one now.
I was asking because, from what I found out, the issue seems to happen only if your host OS supports large FD numbers. After login and chrooting, OpenSSH tries to close all file descriptors up to the largest FD number in use. On Linux, however, finding the largest used FD requires the process to either access the /proc filesystem or have the libproc library available. If neither is the case, it tries to close ALL file descriptors up to the largest possible FD number.

For some reason, the sshd in the Alpine image cannot access FD information in the /proc filesystem. The image also does not contain libproc. So OpenSSH tries to close all FDs (see below). The Debian image has libproc available, so OpenSSH simply uses it to find the largest used FD and closes only a handful of FDs.

If you are using the Alpine image on a host with a low maximum file descriptor number, the issue might never occur for you. But if you are on a host with billions of file descriptor numbers available, OpenSSH tries to close all those billions of file descriptors after authentication and before presenting the prompt, resulting in ridiculous waiting times (in my case about 15 minutes).
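To make the difference concrete, here is a rough Python sketch of the two code paths described above. It is not OpenSSH's actual code, only an illustration of the idea: `open_fds` and `close_attempts` are hypothetical helper names, and the fallback limit of 256 when the system reports no limit is an assumption made for this sketch.

```python
import os


def open_fds(fd_dir="/proc/self/fd"):
    """Return the sorted descriptor numbers listed in fd_dir, or None if it cannot be read."""
    try:
        names = os.listdir(fd_dir)
    except OSError:
        # Directory missing or not accessible: the caller has to fall back.
        return None
    return sorted(int(name) for name in names if name.isdigit())


def close_attempts(lowfd, fd_dir="/proc/self/fd"):
    """Count the close() calls a closefrom-style cleanup would issue for FDs >= lowfd."""
    fds = open_fds(fd_dir)
    if fds is not None:
        # Fast path: only descriptors that are actually open get closed.
        return sum(1 for fd in fds if fd >= lowfd)
    # Slow path: every number from lowfd up to the per-process limit is tried.
    limit = os.sysconf("SC_OPEN_MAX")
    if limit < 0:
        limit = 256  # assumption for this sketch when the limit is indeterminate
    return max(0, limit - lowfd)


if __name__ == "__main__":
    print("with fd listing:   ", close_attempts(3))
    print("without fd listing:", close_attempts(3, fd_dir="/nonexistent/fd"))
```

Running this inside both containers should show how far apart the two numbers are on a given host. If the second number is in the billions, the long delay after "Accepted password" is what you would expect from the slow path.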
On connection container terminal shows:
"Accepted password for from 10.0.10.150 port 42379 ssh2"
But no connection is established. It seems that outbound traffic is being blocked.
This happens only with the Alpine images (both a year-old image and the newest one) and, we assume, only when the host runs a newer Linux kernel.
The Debian images work fine, no matter which Linux kernel is installed on the container host.
The problem has occurred since we migrated our Kubernetes cluster from older CentOS nodes to Rocky Linux (5.14.0-427.16.1.el9_4.x86_64) while using the Alpine images.