fix: prevent file descriptor leak in API mode #119

Merged: inureyes merged 1 commit into main from fix/118-fd-leak-api-mode on Feb 6, 2026


@inureyes (Member) commented on Feb 6, 2026

Summary

  • Cache the Nvml handle in NvidiaGpuReader and reuse it across get_gpu_info() / get_process_info() calls, reinitializing it gracefully if the handle becomes invalid (e.g., after a GPU hot-unplug)
  • Reuse a single sysinfo::System instance across get_process_info() calls instead of creating a new one on every iteration
  • Create the Disks instance once, before the API metrics collection loop, and refresh it in place each cycle via disks.refresh(true)
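A minimal, std-only sketch of the handle-caching pattern from the first bullet. FakeNvml, GpuReader, and INIT_COUNT are hypothetical stand-ins, not the PR's code or the API of any NVML crate; a real reader would presumably hold an Nvml value (for example from the nvml-wrapper crate, which is an assumption here). The counter shows that repeated calls initialize the handle only once, and that an error on the cached handle drops it so the next call reinitializes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for an expensive handle such as an NVML session.
// Each init would normally open device files, so count how often it happens.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

struct FakeNvml {
    healthy: bool,
}

impl FakeNvml {
    fn init() -> Result<Self, String> {
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
        Ok(FakeNvml { healthy: true })
    }

    // Fails once the handle has been invalidated (e.g. GPU hot-unplug).
    fn device_count(&self) -> Result<u32, String> {
        if self.healthy {
            Ok(1)
        } else {
            Err("handle invalidated".into())
        }
    }
}

// Hypothetical reader that caches the handle instead of creating one per call.
struct GpuReader {
    nvml: Option<FakeNvml>,
}

impl GpuReader {
    fn new() -> Self {
        GpuReader { nvml: FakeNvml::init().ok() }
    }

    // Lazily (re)initialize the cached handle and return a reference to it.
    fn handle(&mut self) -> Result<&FakeNvml, String> {
        if self.nvml.is_none() {
            self.nvml = Some(FakeNvml::init()?);
        }
        Ok(self.nvml.as_ref().unwrap())
    }

    fn get_gpu_info(&mut self) -> Result<u32, String> {
        let result = self.handle()?.device_count();
        if result.is_err() {
            // Drop the stale handle so the next call reinitializes it.
            self.nvml = None;
        }
        result
    }
}

fn main() {
    let mut reader = GpuReader::new();
    for _ in 0..100 {
        reader.get_gpu_info().unwrap();
    }
    println!("inits after 100 calls: {}", INIT_COUNT.load(Ordering::SeqCst));

    // Simulate invalidation, then recover on the following call.
    reader.nvml.as_mut().unwrap().healthy = false;
    assert!(reader.get_gpu_info().is_err());
    assert_eq!(reader.get_gpu_info(), Ok(1));
    println!("inits after recovery: {}", INIT_COUNT.load(Ordering::SeqCst));
}
```

Dropping the handle only on error keeps the common path cheap: whatever the init opened is reused for every poll, and a fresh init happens only after invalidation.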

Test plan

  • cargo build passes
  • cargo clippy passes with no warnings
  • All 169 tests pass
  • Run API mode for an extended period and verify that the FD count does not grow, via ls /proc/<pid>/fd | wc -l
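The FD check in the last test-plan item can also be done from inside the process. This is a Linux-only sketch: open_fd_count is a hypothetical helper, and repeatedly opening /proc/self/stat merely simulates a per-iteration handle. Holding one new handle per iteration makes the count climb, while reusing a single handle keeps it flat.

```rust
use std::fs::{self, File};
use std::io;

// Linux-only: count this process's open file descriptors by listing /proc/self/fd.
// The directory handle opened by read_dir is itself counted, but it adds the
// same +1 to every sample, so differences between samples stay valid.
fn open_fd_count() -> io::Result<usize> {
    Ok(fs::read_dir("/proc/self/fd")?.count())
}

fn main() -> io::Result<()> {
    let baseline = open_fd_count()?;

    // Leaky pattern: a new handle per "iteration" that is kept alive.
    let mut leaked = Vec::new();
    for _ in 0..10 {
        leaked.push(File::open("/proc/self/stat")?);
    }
    println!("after leaking: +{}", open_fd_count()? - baseline);
    drop(leaked);

    // Fixed pattern: one handle created up front and reused every iteration.
    let reused = File::open("/proc/self/stat")?;
    for _ in 0..10 {
        let _meta = reused.metadata()?;
    }
    println!("with reuse: +{}", open_fd_count()? - baseline);
    Ok(())
}
```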

Closes #118


Cache Nvml, System, and Disks instances instead of recreating them on
every metrics collection iteration, preventing FD exhaustion in
long-running API server processes.

- Store Nvml handle in NvidiaGpuReader with graceful reinit on failure
- Reuse sysinfo::System across get_process_info() calls
- Create Disks once before the API loop and refresh in-place each cycle
- Remove now-unused standalone get_gpu_processes() function

Closes #118
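The create-once, refresh-in-place structure used for Disks and System in the API loop can be sketched with std only. Collector is a hypothetical stand-in, not sysinfo's Disks (whose refresh(true) the PR calls): here refresh() rewinds the one descriptor opened in new() and re-reads it, so the loop never opens a new file per cycle. The temp file name collector_demo.txt is arbitrary.

```rust
use std::fs::File;
use std::io::{self, Read, Seek};
use std::path::Path;

// Hypothetical collector that owns one open handle for its whole lifetime
// and re-reads through it on refresh instead of reopening.
struct Collector {
    file: File,
    latest: String,
}

impl Collector {
    fn new(path: &Path) -> io::Result<Self> {
        let mut c = Collector { file: File::open(path)?, latest: String::new() };
        c.refresh()?;
        Ok(c)
    }

    // Refresh in place: rewind the existing descriptor and read fresh data.
    fn refresh(&mut self) -> io::Result<()> {
        self.file.rewind()?;
        self.latest.clear();
        self.file.read_to_string(&mut self.latest)?;
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("collector_demo.txt");
    std::fs::write(&path, "cycle 0")?;

    // Created once, before the collection loop.
    let mut collector = Collector::new(&path)?;

    for cycle in 1..=3 {
        // Simulate the metric source changing between cycles.
        std::fs::write(&path, format!("cycle {cycle}"))?;
        collector.refresh()?;
        println!("{}", collector.latest);
    }

    drop(collector);
    std::fs::remove_file(&path)?;
    Ok(())
}
```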
@inureyes merged commit f467263 into main on Feb 6, 2026 (1 check passed)
@inureyes deleted the fix/118-fd-leak-api-mode branch on February 6, 2026 at 04:23
@inureyes self-assigned this on Feb 6, 2026
@inureyes added the labels type:enhancement (New feature or request), status:done (Completed), priority:high (High priority issue), and device:nvidia-gpu (NVIDIA GPU related) on Feb 6, 2026

Development

Linked issue (#118): fix: File descriptor leak in API mode causes "Too many open files" error
