Large number of open files causes issues #130

Open
Champ-Goblem opened this issue May 4, 2023 · 1 comment

@Champ-Goblem
Contributor

We are seeing several issues caused by the number of files this crate keeps open when running as part of Nydus.

The first issue is with workloads that perform a large number of filesystem operations: the longer the pod runs, the more file descriptors accumulate. Nydus raises the rlimit on the host, but on some systems this is capped at 2^20 (1048576) and cannot go any higher. We have seen this push a workload into a constant crash loop from which it cannot recover unless the pod is deleted and recreated. The pod repeatedly reports OSError: [Errno 24] Too many open files, even though the actual workload inside the VM is nowhere near the descriptor limit.

Inspecting the nr-open count for the pod's Linux namespace on the host node and comparing it to the ulimit, we see that nr-open is maxed out at the ulimit value, and the majority of those open files belong to the Nydus process.

Keeping so many files permanently open also drives kubelet CPU usage up drastically, because the kubelet runs cadvisor, which collects metrics on open file descriptors and their type (e.g. whether each is a socket or a regular file). We recently opened an issue with cadvisor about this metric collection (google/cadvisor#3233), but it would be good to try to solve the issue at the source.

I assume the open file descriptors are “cached” so that the overhead of repeated open syscalls is reduced? If so, is there a way to automatically close a descriptor that is not used often? Something like a per-descriptor timeout: if a descriptor has not been accessed for x amount of time it gets closed, and it can simply be reopened when it is needed again. A rough sketch of what I have in mind is below.
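To make the proposal concrete, here is a minimal sketch of the kind of idle-timeout cache I mean. This is not fuse-backend-rs code: the FdCache type, the keying by inode number, the periodic sweep, and the use of the libc crate for close are all assumptions made purely for illustration.

use std::collections::HashMap;
use std::os::unix::io::RawFd;
use std::time::{Duration, Instant};

// Hypothetical cache of open descriptors keyed by inode number.
// Each entry remembers when it was last used.
struct FdCache {
    fds: HashMap<u64, (RawFd, Instant)>,
    idle_timeout: Duration,
}

impl FdCache {
    fn new(idle_timeout: Duration) -> Self {
        Self { fds: HashMap::new(), idle_timeout }
    }

    // Record (or refresh) the descriptor for an inode.
    fn insert(&mut self, inode: u64, fd: RawFd) {
        self.fds.insert(inode, (fd, Instant::now()));
    }

    // Look up a descriptor and bump its last-used timestamp.
    fn get(&mut self, inode: u64) -> Option<RawFd> {
        self.fds.get_mut(&inode).map(|(fd, last_used)| {
            *last_used = Instant::now();
            *fd
        })
    }

    // Close and drop every descriptor that has been idle for longer than
    // the timeout; this would run periodically, e.g. from a background thread.
    fn sweep(&mut self) {
        let timeout = self.idle_timeout;
        self.fds.retain(|_, (fd, last_used)| {
            if last_used.elapsed() > timeout {
                // The fd can simply be reopened the next time it is needed.
                unsafe { libc::close(*fd) };
                false
            } else {
                true
            }
        });
    }
}

On the next access a missing descriptor would just be reopened and reinserted, so the cost is one extra open syscall for cold entries rather than an ever-growing fd table.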

Any thoughts or ideas would be greatly appreciated.

@eryugey
Contributor

eryugey commented May 17, 2023

Yes, this is an issue in fuse-backend-rs when the cache policy is not None (i.e. when entry_timeout is not 0). fuse-backend-rs stores an O_PATH fd for each inode in its inode store, and right now that fd is only closed and removed from the inode store when the fuse kernel module sends a Forget request, e.g. when memory pressure triggers inode & dentry reclaim. Roughly, the pattern looks like the sketch below.
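For readers unfamiliar with the pattern, a heavily simplified illustration of that lifecycle (not the actual fuse-backend-rs code; the InodeStore name, its fields, and the path-based lookup are made up here) would be: lookup opens the entry with O_PATH and caches the fd, and only forget closes it.

use std::collections::HashMap;
use std::ffi::CString;
use std::os::unix::io::RawFd;

// Illustrative inode store: maps inode numbers to cached O_PATH fds.
struct InodeStore {
    fds: HashMap<u64, RawFd>,
}

impl InodeStore {
    // On lookup, open the entry with O_PATH (no read/write access, just a
    // handle to the path) and keep the fd around for later reuse.
    fn lookup(&mut self, inode: u64, path: &str) -> std::io::Result<RawFd> {
        if let Some(&fd) = self.fds.get(&inode) {
            return Ok(fd);
        }
        let c_path = CString::new(path).unwrap();
        let fd = unsafe { libc::open(c_path.as_ptr(), libc::O_PATH | libc::O_CLOEXEC) };
        if fd < 0 {
            return Err(std::io::Error::last_os_error());
        }
        self.fds.insert(inode, fd);
        Ok(fd)
    }

    // The fd is only closed when the kernel sends a Forget request for the
    // inode, which typically happens on inode/dentry reclaim.
    fn forget(&mut self, inode: u64) {
        if let Some(fd) = self.fds.remove(&inode) {
            unsafe { libc::close(fd) };
        }
    }
}

Since the kernel normally only sends Forget on inode/dentry reclaim (or at unmount), a long-running, lookup-heavy workload accumulates one cached O_PATH fd per inode it has ever touched, which matches the behaviour reported above.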

I think one workaround is to trigger inode & dentry reclaim manually, e.g.

echo 2 > /proc/sys/vm/drop_caches

or remount the fuse mount, which only affects the fuse mount in question rather than being a system-wide operation:

mount -o remount /mnt/path/to/fuse

Your suggested "close it after a timeout" approach should work as well, but it requires FUSE_NOTIFY_INVAL_INODE|ENTRY support, and it seems only the fuse device supports these notifications; virtiofs doesn't support them right now. We'll look into this.
