SFTP Connection fails with Alpine image on Linux Kernel 5.14.0-427.16.1.el9_4.x86_64 #416

Open
GitGuruGangsta opened this issue Sep 6, 2024 · 4 comments


GitGuruGangsta commented Sep 6, 2024

On connection, the container terminal shows:
"Accepted password for from 10.0.10.150 port 42379 ssh2"

But there is no connection established. It seems that outbound connection is prevented.

This happens only with the Alpine images (both a year-old image and the newest one) and, we assume, only when the host runs a newer Linux kernel.

The Debian images work fine, no matter which Linux kernel is installed on the container host.

The problem has occurred since we moved our Kubernetes cluster from older CentOS nodes to Rocky Linux (5.14.0-427.16.1.el9_4.x86_64) while using the Alpine images.


zorbla commented Sep 12, 2024

As I encountered a similar situation, I am curious: did you by any chance wait about 15 minutes to see whether the connection comes up late? How much CPU is the container consuming?


GitGuruGangsta commented Sep 13, 2024

We tried several configuration and firewall changes, and yes, it ran for more than 15 minutes several times, but it didn't work. Incoming packets arrived at the container, but we could not see any outbound TCP packets.
Both the Alpine and Debian images in their newest versions rely on kernel 6.1+, so running on a node with kernel 5.14 should not affect one image and not the other. That makes a kernel incompatibility problem seem unlikely.
Maybe it is some unusual security measure of Red Hat-based Linux (Rocky Linux) that prevents outbound SSH, but only from containers?

The Debian image works fine, though, so we are using that one now.

@GitGuruGangsta GitGuruGangsta changed the title SFTP Connection fails with Alpine image on Linux Kernel Linux 5.14.0-427.16.1.el9_4.x86_64 SFTP Connection fails with Alpine image on Linux Kernel 5.14.0-427.16.1.el9_4.x86_64 Sep 13, 2024

zorbla commented Sep 16, 2024

I was asking because, from what I found, the issue only really shows up if your host OS allows very large file descriptor (FD) numbers.

After login and chroot, OpenSSH tries to close all file descriptors up to the largest FD number in use. On Linux, however, finding the largest FD in use requires the process either to access the /proc filesystem or to have the libproc library available. If neither is possible, it tries to close ALL file descriptors up to the largest possible FD number.

For some reason, sshd in the Alpine image cannot access FD information in the /proc filesystem, and the image does not contain libproc. So OpenSSH tries to close every possible FD (see below).

The Debian image has libproc available, so OpenSSH simply uses that to find the largest FD in use and closes only a handful of FDs.

If you use the Alpine image on a host with a low maximum file descriptor number, you might never notice the issue. But on a host with billions of file descriptor numbers available, OpenSSH tries to close all of them after authentication and before presenting the prompt, resulting in ridiculous waiting times (in my case about 15 minutes).
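To check whether a given container is likely to be affected, it may help to look at the open-file limit and at whether the FD list in /proc is readable. The following is only a rough sketch, meant to be run in a shell inside the container. It inspects the shell's own process, not the chrooted sshd session, so it can only hint at what sshd sees. The variable names are illustrative.

```shell
#!/bin/sh
# Soft limit on open file descriptors for this shell and its children.
# A very large value here means a brute-force close loop would be slow.
nofile_limit=$(ulimit -n)
echo "soft nofile limit: $nofile_limit"

# Check whether the per-process FD list in /proc is readable from here.
if [ -r /proc/self/fd ]; then
    open_fds=$(ls /proc/self/fd | wc -l)
    echo "/proc/self/fd is readable, $open_fds entries listed"
else
    echo "/proc/self/fd is not readable from this process"
fi
```

If the reported limit is in the hundreds of millions or higher, the long delay described above becomes plausible.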


sazzle2611 commented Sep 21, 2024

We had the same issue after switching from CentOS to AlmaLinux. For us this was a massive issue, as we have around 50 SFTP pods for clients. We had to disable chroot to get them working while I tried to figure out how to fix it. Info came from here

I copied the code from this repo so that I could try things and build our own images. I spent hours trying to find a way to reduce the max open file limit, but the pod (running in Kubernetes) ignored both the limits.conf file and the sysctl.conf file I added. It simply refused to reduce the limit from the ridiculous max open file limit of 1073741816.

Eventually I tried adding this to the bottom of the entrypoint file, just above the call to run sshd:
log "Setting ulimit to 512"
ulimit -n 512
It worked!!!!!! I was so relieved, although now I'm kicking myself that the answer was so simple!!

We actually had this issue with the Debian image as well, so it is strange that you don't, but maybe our file limit is higher. I stuck with the Alpine image because we want the images to be as small as possible, but I don't see any reason the same fix wouldn't work for Debian.

Maybe an environment variable could be added to this repo to 'optionally' allow the setting of a specified open file limit?
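A minimal sketch of what such an optional setting might look like in the entrypoint is shown here. The variable name SFTP_NOFILE_LIMIT and the function apply_nofile_limit are hypothetical and not part of this repo. The sketch assumes the entrypoint shell supports `ulimit -n`, as bash, dash, and BusyBox ash do.

```shell
#!/bin/sh
# SFTP_NOFILE_LIMIT is a hypothetical variable name, not an existing
# option of the image. When it is unset or empty, nothing is changed.
apply_nofile_limit() {
    limit="${SFTP_NOFILE_LIMIT:-}"

    # No value given: keep whatever limit the container runtime set.
    [ -z "$limit" ] && return 0

    # Accept plain positive integers only.
    case "$limit" in
        *[!0-9]*)
            echo "SFTP_NOFILE_LIMIT must be a number, got: $limit" >&2
            return 1
            ;;
    esac

    # Runs in the entrypoint's own process, so sshd inherits the limit
    # when it is started afterwards from the same shell.
    ulimit -n "$limit"
}

apply_nofile_limit
# ...the entrypoint's existing call that starts sshd would follow here.
```

The call has to happen in the same shell process that later starts sshd, which matches placing it just above the sshd call in the entrypoint.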

I tried adding the ulimit command to a script in /etc/sftp.d, since those scripts are run automatically anyway, but it didn't work. I guess that is because it ran in a different bash process, but I don't know.
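The guess about a different process is consistent with how resource limits work: a limit set in a child process applies to that child and its descendants, not to the parent. Assuming the scripts in /etc/sftp.d are executed as child processes rather than sourced, a small demonstration of the effect could look like this (the value 64 is arbitrary):

```shell
#!/bin/sh
# Limit in the parent shell before anything is changed.
before=$(ulimit -n)

# Lower the limit inside a child shell and report what the child sees.
child=$(sh -c 'ulimit -n 64; ulimit -n')

# Limit in the parent shell afterwards: unchanged by the child.
after=$(ulimit -n)

echo "parent before: $before"
echo "child:         $child"
echo "parent after:  $after"
```

The parent's limit stays the same, which would explain why only a `ulimit` call placed directly in the entrypoint affects sshd.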

3 participants