Large number of open files causes issues #130

Open
Champ-Goblem opened this issue May 4, 2023 · 1 comment

@Champ-Goblem
Contributor

We are seeing several issues caused by the number of files this crate keeps open when running as part of Nydus.

The first issue is with workloads that perform a large number of filesystem operations: the longer the pod runs, the more file descriptors accumulate. Nydus raises the rlimit on the host, but on some systems this is capped at 2^20 (1048576) and cannot go any higher. We have seen this push a workload into a constant crash loop from which it cannot recover unless the pod is deleted and recreated. The pod repeatedly reports OSError: [Errno 24] Too many open files, even though the actual workload inside the VM is nowhere near the descriptor limit.

Inspecting the nr-open count for the pod's Linux namespace on the host node and comparing it to the ulimit, we see that nr-open is maxed out at the ulimit value, and the majority of those open files belong to the Nydus process.

Keeping so many files permanently open also drives kubelet CPU usage up drastically, because the kubelet runs cadvisor, which collects metrics on open file descriptors and their type (e.g. whether each is a socket or a regular file). We recently opened an issue with cadvisor about this metric collection (google/cadvisor#3233), but it would be good to try to solve the issue at the source.

I assume the open file descriptors are “cached” so that the overhead of repeated open syscalls is reduced? If so, is there a way to automatically close a descriptor that is not used often? Something like a per-descriptor timeout: if a descriptor has not been accessed for x amount of time it gets closed, and it can simply be reopened when it is needed again. A rough sketch of what I have in mind is below.
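To make the proposal concrete, here is a minimal sketch of the kind of idle-timeout cache I mean. This is not fuse-backend-rs code: the FdCache type, the keying by inode number, the periodic sweep, and the use of the libc crate for close are all assumptions made purely for illustration.

use std::collections::HashMap;
use std::os::unix::io::RawFd;
use std::time::{Duration, Instant};

// Hypothetical cache of open descriptors keyed by inode number.
// Each entry remembers when it was last used.
struct FdCache {
    fds: HashMap<u64, (RawFd, Instant)>,
    idle_timeout: Duration,
}

impl FdCache {
    fn new(idle_timeout: Duration) -> Self {
        Self { fds: HashMap::new(), idle_timeout }
    }

    // Record (or refresh) the descriptor for an inode.
    fn insert(&mut self, inode: u64, fd: RawFd) {
        self.fds.insert(inode, (fd, Instant::now()));
    }

    // Look up a descriptor and bump its last-used timestamp.
    fn get(&mut self, inode: u64) -> Option<RawFd> {
        self.fds.get_mut(&inode).map(|(fd, last_used)| {
            *last_used = Instant::now();
            *fd
        })
    }

    // Close and drop every descriptor that has been idle for longer than
    // the timeout; this would run periodically, e.g. from a background thread.
    fn sweep(&mut self) {
        let timeout = self.idle_timeout;
        self.fds.retain(|_, (fd, last_used)| {
            if last_used.elapsed() > timeout {
                // The fd can simply be reopened the next time it is needed.
                unsafe { libc::close(*fd) };
                false
            } else {
                true
            }
        });
    }
}

On the next access a missing descriptor would just be reopened and reinserted, so the cost is one extra open syscall for cold entries rather than an ever-growing fd table.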

Any thoughts or ideas would be greatly appreciated.

@eryugey
Contributor

eryugey commented May 17, 2023

Yes, this is an issue in fuse-backend-rs when the cache policy is not None (i.e. when entry_timeout is not 0). fuse-backend-rs stores an O_PATH fd for each inode in its inode store, and right now that fd is only closed and removed from the inode store when the fuse kernel module sends a Forget request, e.g. when memory pressure triggers inode & dentry reclaim. Roughly, the pattern looks like the sketch below.
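For readers unfamiliar with the pattern, a heavily simplified illustration of that lifecycle (not the actual fuse-backend-rs code; the InodeStore name, its fields, and the path-based lookup are made up here) would be: lookup opens the entry with O_PATH and caches the fd, and only forget closes it.

use std::collections::HashMap;
use std::ffi::CString;
use std::os::unix::io::RawFd;

// Illustrative inode store: maps inode numbers to cached O_PATH fds.
struct InodeStore {
    fds: HashMap<u64, RawFd>,
}

impl InodeStore {
    // On lookup, open the entry with O_PATH (no read/write access, just a
    // handle to the path) and keep the fd around for later reuse.
    fn lookup(&mut self, inode: u64, path: &str) -> std::io::Result<RawFd> {
        if let Some(&fd) = self.fds.get(&inode) {
            return Ok(fd);
        }
        let c_path = CString::new(path).unwrap();
        let fd = unsafe { libc::open(c_path.as_ptr(), libc::O_PATH | libc::O_CLOEXEC) };
        if fd < 0 {
            return Err(std::io::Error::last_os_error());
        }
        self.fds.insert(inode, fd);
        Ok(fd)
    }

    // The fd is only closed when the kernel sends a Forget request for the
    // inode, which typically happens on inode/dentry reclaim.
    fn forget(&mut self, inode: u64) {
        if let Some(fd) = self.fds.remove(&inode) {
            unsafe { libc::close(fd) };
        }
    }
}

Since the kernel normally only sends Forget on inode/dentry reclaim (or at unmount), a long-running, lookup-heavy workload accumulates one cached O_PATH fd per inode it has ever touched, which matches the behaviour reported above.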

I think one workaround is to trigger inode & dentry reclaim manually, e.g.

echo 2 > /proc/sys/vm/drop_caches

or remount the fuse mount, which only affects the fuse mount in question rather than being a system-wide operation:

mount -o remount /mnt/path/to/fuse

Your suggested "close it after a timeout" approach should work as well, but it requires FUSE_NOTIFY_INVAL_INODE|ENTRY support, and it seems only the fuse device supports these notifications; virtiofs doesn't support them right now. We'll look into this.
