Closed
Labels: status:done (Completed)
Description
Summary
PR #119 fixed a file descriptor leak in the NVIDIA reader by caching the sysinfo::System instance. However, the same per-call System::new() anti-pattern still exists in the NVIDIA Jetson and Tenstorrent readers, causing the same FD leak in API mode (the long-running metrics loop).
Problem
Both readers construct a fresh instance with System::new() on every get_process_info() call. In API mode, this call runs once per polling interval (every interval seconds) indefinitely, leaking /proc file descriptors on each cycle.
NVIDIA Jetson (src/device/readers/nvidia_jetson.rs)
- Line 176: System::new() in get_process_info()
- Line 261: System::new() in get_gpu_processes() helper
Tenstorrent (src/device/readers/tenstorrent.rs)
- Line 201: System::new() in get_process_info()
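The per-call pattern described above can be sketched in isolation. This is only an illustration: `System` here is a hypothetical stand-in for `sysinfo::System` (so the snippet has no external dependencies), and `refresh()` and `run_metrics_loop()` are made-up names, not code from the readers. It shows only that the number of constructed instances grows linearly with the number of metrics cycles; it does not reproduce the FD leak itself.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many stand-in `System` values have been constructed.
static CONSTRUCTED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `sysinfo::System` (assumption, not the real type).
pub struct System;

impl System {
    pub fn new() -> Self {
        CONSTRUCTED.fetch_add(1, Ordering::SeqCst);
        System
    }

    /// Placeholder for whatever refresh call the reader performs.
    pub fn refresh(&mut self) {}
}

/// Per-call pattern: a fresh `System` is built on every invocation.
pub fn get_process_info() {
    let mut system = System::new();
    system.refresh();
}

/// Simulates `cycles` iterations of the API-mode loop and returns how many
/// `System` values were constructed during those iterations.
pub fn run_metrics_loop(cycles: usize) -> usize {
    let before = CONSTRUCTED.load(Ordering::SeqCst);
    for _ in 0..cycles {
        get_process_info();
    }
    CONSTRUCTED.load(Ordering::SeqCst) - before
}
```

In this sketch, `run_metrics_loop(n)` constructs `n` instances, one per cycle, which is the growth pattern the issue attributes to the Jetson and Tenstorrent readers.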
Solution
Replace per-call System::new() with with_global_system() from src/utils/system.rs, which is the standard pattern already used by:
- AMD reader (amd.rs:583-585)
- Apple Silicon reader (apple_silicon_native.rs:388)
- Local collector (local_collector.rs:304, 433)
This reuses a single global Mutex<System> instance (GLOBAL_SYSTEM) instead of allocating a new System on every call.
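A minimal sketch of what a `with_global_system()`-style helper might look like, using only the standard library. The actual signature and implementation in src/utils/system.rs are not shown in this issue, so everything below is an assumption: `System` is a hypothetical stand-in for `sysinfo::System`, and the closure-based signature, the `OnceLock` initialization, and the `constructed()` helper are illustrative choices.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, OnceLock};

/// Counts constructions of the stand-in type, for illustration only.
static CONSTRUCTED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `sysinfo::System` (assumption, not the real type).
pub struct System {
    refreshes: u64,
}

impl System {
    pub fn new() -> Self {
        CONSTRUCTED.fetch_add(1, Ordering::SeqCst);
        System { refreshes: 0 }
    }

    pub fn refresh(&mut self) {
        self.refreshes += 1;
    }

    pub fn refreshes(&self) -> u64 {
        self.refreshes
    }
}

/// Single process-wide instance, created lazily on first use.
static GLOBAL_SYSTEM: OnceLock<Mutex<System>> = OnceLock::new();

/// Runs `f` with exclusive access to the shared instance.
/// The signature is an assumption about the real helper.
pub fn with_global_system<R>(f: impl FnOnce(&mut System) -> R) -> R {
    let lock = GLOBAL_SYSTEM.get_or_init(|| Mutex::new(System::new()));
    // Recover the guard even if another thread panicked while holding it.
    let mut guard = lock.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    f(&mut *guard)
}

/// Reader code borrows the shared instance instead of calling `System::new()`.
pub fn get_process_info() -> u64 {
    with_global_system(|system| {
        system.refresh();
        system.refreshes()
    })
}

/// Number of stand-in `System` values constructed so far.
pub fn constructed() -> usize {
    CONSTRUCTED.load(Ordering::SeqCst)
}
```

However many times `get_process_info()` is called in this sketch, `constructed()` stays at 1, because the instance is created once and every later call only takes the lock.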
Additionally consider
- NVIDIA reader (nvidia.rs): PR #119 used a struct-level Mutex<System> field. Consider migrating this to with_global_system() as well for consistency.
- Furiosa reader (furiosa.rs:231): list_devices() is called per-cycle in RS mode. Investigate whether this creates persistent file handles in the furiosa-smi-rs library.
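One way to investigate whether a per-cycle call leaves file handles open is to compare the process's open-FD count before and after a number of cycles. The sketch below assumes Linux (it reads /proc/self/fd); `open_fd_count()` and `fd_growth()` are hypothetical helper names, not part of this project or of furiosa-smi-rs.

```rust
use std::fs;
use std::io;

/// Counts this process's open file descriptors by listing /proc/self/fd.
/// Linux only; returns an error on platforms without procfs.
/// The directory handle used for listing is itself counted, but it is
/// counted equally in every measurement, so differences are unaffected.
pub fn open_fd_count() -> io::Result<usize> {
    Ok(fs::read_dir("/proc/self/fd")?.count())
}

/// Runs `cycle` `n` times and returns the change in the open-FD count.
/// A result that grows with `n` suggests handles persist across cycles.
pub fn fd_growth(n: usize, mut cycle: impl FnMut()) -> io::Result<isize> {
    let before = open_fd_count()?;
    for _ in 0..n {
        cycle();
    }
    let after = open_fd_count()?;
    Ok(after as isize - before as isize)
}
```

Passing a closure that performs one reader cycle to `fd_growth()` and comparing the result for, say, 10 and 100 cycles would give a rough indication: growth proportional to the cycle count points to a leak, while a constant or zero difference suggests handles are released.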
References
- PR #119: fix: prevent file descriptor leak in API mode by reusing resource handles
- Issue #118 (fix: File descriptor leak in API mode causes "Too many open files" error): Original FD leak report