Skip to content

fix: use with_global_system() to prevent FD leak in Jetson, Tenstorrent, and NVIDIA readers (#120)#121

Merged
inureyes merged 1 commit intomainfrom
fix/120-fd-leak-jetson-tenstorrent
Feb 8, 2026
Merged

fix: use with_global_system() to prevent FD leak in Jetson, Tenstorrent, and NVIDIA readers (#120)#121
inureyes merged 1 commit intomainfrom
fix/120-fd-leak-jetson-tenstorrent

Conversation

@inureyes
Copy link
Member

@inureyes inureyes commented Feb 8, 2026

Summary

Fixes #120 - Replace per-call System::new() instances with with_global_system() to prevent file descriptor leaks in API mode (long-running metrics loop).

Problem

The NVIDIA Jetson, Tenstorrent, and NVIDIA readers were creating new System::new() instances on every get_process_info() call. In API mode, this runs every interval indefinitely, leaking /proc file descriptors each cycle.

Changes

NVIDIA Jetson (src/device/readers/nvidia_jetson.rs)

  • Line 176: get_process_info() now uses with_global_system()
  • Line 261: get_gpu_processes() helper now uses with_global_system()
  • Removed unused use sysinfo::System; import

Tenstorrent (src/device/readers/tenstorrent.rs)

  • Line 201: get_process_info() now uses with_global_system()
  • Removed unused use sysinfo::System; import

NVIDIA (src/device/readers/nvidia.rs)

  • Removed system: Mutex<System> field from NvidiaGpuReader struct
  • Migrated get_process_info() to use with_global_system() for consistency with the rest of the codebase

Approach

Uses the with_global_system() function from src/utils/system.rs which provides a global Mutex<System> instance (GLOBAL_SYSTEM). This is the standard pattern already used by:

  • AMD reader (src/device/readers/amd.rs:583-585)
  • Apple Silicon reader (src/device/readers/apple_silicon_native.rs:388)
  • Local collector (src/view/data_collection/local_collector.rs:304, 433)

Testing

  • ✅ Code compiles successfully with cargo build
  • ✅ All three readers now use the same consistent pattern as other readers in the codebase
  • ✅ No functional changes to the API, only internal optimization

Closes #120

Replace per-call System::new() instances with with_global_system() to prevent file descriptor leaks in API mode (long-running metrics loop).

Changes:
- NVIDIA Jetson (nvidia_jetson.rs):
  - Line 176: get_process_info() now uses with_global_system()
  - Line 261: get_gpu_processes() helper now uses with_global_system()
- Tenstorrent (tenstorrent.rs):
  - Line 201: get_process_info() now uses with_global_system()
- NVIDIA (nvidia.rs):
  - Removed system: Mutex<System> field from struct
  - Migrated get_process_info() to use with_global_system() for consistency

This follows the standard pattern already used by AMD reader, Apple Silicon reader, and local collector.

Fixes #120
@inureyes
Copy link
Member Author

inureyes commented Feb 8, 2026

Security & Performance Review

Analysis Summary

  • PR Branch: fix/120-fd-leak-jetson-tenstorrent (all analysis performed here)
  • Scope: changed-files with full-context cross-reference
  • Languages: Rust
  • Total issues: 0 actionable (1 pre-existing observation noted below)
  • Critical: 0 | High: 0 | Medium: 0 | Low: 0

Review Checklist

Security

  • No security vulnerabilities introduced
  • No new unsafe blocks
  • No new external input handling
  • No hardcoded credentials or secrets
  • No command injection vectors

Performance - Global System Pattern

  • with_global_system() correctly applied in all three readers (nvidia.rs, nvidia_jetson.rs, tenstorrent.rs)
  • No nested with_global_system() calls that could deadlock -- all uses are sequential within each call chain
  • Mutex contention analysis: In local_collector.rs, GPU reader get_process_info() (which now uses with_global_system) runs concurrently with the "full process collection" task (which also uses with_global_system). This introduces contention on GLOBAL_SYSTEM, but NOT deadlock -- one task will simply wait for the other. This is acceptable and consistent with the existing AMD reader behavior.
  • Lock hold duration is appropriate -- only held during process refresh and data extraction, then released
  • No System::new() calls remain in any of the three changed files

Correctness

  • All System::new() calls properly replaced with with_global_system()
  • NVIDIA reader: system: Mutex<System> field correctly removed from struct, constructor, and usage
  • Jetson reader: Both get_process_info() (line 179) and get_gpu_processes() fallback path (line 261) correctly use with_global_system()
  • Tenstorrent reader: get_process_info() correctly uses with_global_system()
  • use sysinfo::System; import correctly removed from all three files
  • with_global_system import correctly added to all three files
  • Code compiles cleanly (cargo check passes)
  • No clippy warnings (cargo clippy -- -D warnings passes)
  • All 19 tests pass

Code Quality

  • Consistent pattern with existing AMD and Apple Silicon readers
  • No unused imports
  • Comments properly updated to reflect the new pattern
  • Proper scoping of use sysinfo::{...} imports within functions

Pre-existing Observation (Not Introduced by This PR)

In nvidia_jetson.rs get_gpu_processes() fallback path (line 261-296): When with_global_system is used, only system.refresh_memory() is called before iterating system.processes(). Process data is NOT refreshed, so it reads whatever process state was last populated by another caller. This was the same in the original code (System::new() + refresh_memory() would have had an empty process list), so this PR does not regress behavior -- it actually slightly improves it since the global system may contain previously-refreshed process data.

Verdict

This PR is clean and ready to merge. The changes are mechanical, consistent, and correctly follow the established with_global_system() pattern. No issues found that require fixing.

@inureyes
Copy link
Member Author

inureyes commented Feb 8, 2026

PR Finalization Report

Project Structure

  • Project Type: Rust (Cargo)
  • Test Framework: cargo test (166 lib + 169 bin + 32 integration + 19 doc tests = 386 total)
  • Documentation System: Plain markdown in docs/
  • Lint Tools: cargo fmt, cargo clippy (with .clippy.toml config)

Checklist

Tests

  • Analyzed existing test structure (unit tests in src/, integration tests in tests/)
  • Identified test coverage needs: No new tests required -- this is a pure internal refactor replacing System::new() with with_global_system() across three readers. No public API changes.
  • All 386 tests passing (0 failures)

Documentation

  • Reviewed documentation needs: No updates required
    • No public API changes (internal refactor only)
    • No new features or configuration options
    • No changes to CLI behavior or output

Code Quality

  • cargo fmt --check: Passed (no formatting issues)
  • cargo clippy -- -D warnings: Passed (no warnings)
  • All unused imports (sysinfo::System) properly removed
  • Consistent pattern applied across all three readers (nvidia, nvidia_jetson, tenstorrent)

Changes Verified

  • src/device/readers/nvidia.rs: Removed system: Mutex<System> field, migrated get_process_info() to with_global_system()
  • src/device/readers/nvidia_jetson.rs: Migrated both get_process_info() and get_gpu_processes() to with_global_system()
  • src/device/readers/tenstorrent.rs: Migrated get_process_info() to with_global_system()

All three readers now match the pattern already used by AMD, Apple Silicon, and local collector readers.

Status

All checks passing. Ready for merge.

@inureyes inureyes merged commit 31389b5 into main Feb 8, 2026
4 checks passed
@inureyes inureyes deleted the fix/120-fd-leak-jetson-tenstorrent branch February 8, 2026 08:37
@inureyes inureyes self-assigned this Feb 8, 2026
@inureyes inureyes added type:enhancement New feature or request status:done Completed priority:high High priority issue device:amd-gpu AMD GPU related mode:api API mode related device:nvidia-gpu NVIDIA GPU related device:npu NPU (Neural Processing Unit) related mode:local Local mode related labels Feb 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

device:amd-gpu AMD GPU related device:npu NPU (Neural Processing Unit) related device:nvidia-gpu NVIDIA GPU related mode:api API mode related mode:local Local mode related priority:high High priority issue status:done Completed type:enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: use with_global_system() in Jetson and Tenstorrent readers to prevent FD leak

1 participant