fix: use with_global_system() to prevent FD leak in Jetson, Tenstorrent, and NVIDIA readers (#120) by inureyes · Pull Request #121 · lablup/all-smi

inureyes · 2026-02-08T08:30:48Z

Summary

Fixes #120 - Replace per-call System::new() instances with with_global_system() to prevent file descriptor leaks in API mode (long-running metrics loop).

Problem

The NVIDIA Jetson, Tenstorrent, and NVIDIA readers were creating new System::new() instances on every get_process_info() call. In API mode, this runs every interval indefinitely, leaking /proc file descriptors each cycle.

Changes

NVIDIA Jetson (`src/device/readers/nvidia_jetson.rs`)

Line 176: get_process_info() now uses with_global_system()
Line 261: get_gpu_processes() helper now uses with_global_system()
Removed unused use sysinfo::System; import

Tenstorrent (`src/device/readers/tenstorrent.rs`)

Line 201: get_process_info() now uses with_global_system()
Removed unused use sysinfo::System; import

NVIDIA (`src/device/readers/nvidia.rs`)

Removed system: Mutex<System> field from NvidiaGpuReader struct
Migrated get_process_info() to use with_global_system() for consistency with the rest of the codebase

Approach

Uses the with_global_system() function from src/utils/system.rs which provides a global Mutex<System> instance (GLOBAL_SYSTEM). This is the standard pattern already used by:

AMD reader (src/device/readers/amd.rs:583-585)
Apple Silicon reader (src/device/readers/apple_silicon_native.rs:388)
Local collector (src/view/data_collection/local_collector.rs:304, 433)

Testing

✅ Code compiles successfully with cargo build
✅ All three readers now use the same consistent pattern as other readers in the codebase
✅ No functional changes to the API, only internal optimization

Closes #120

Replace per-call System::new() instances with with_global_system() to prevent file descriptor leaks in API mode (long-running metrics loop). Changes: - NVIDIA Jetson (nvidia_jetson.rs): - Line 176: get_process_info() now uses with_global_system() - Line 261: get_gpu_processes() helper now uses with_global_system() - Tenstorrent (tenstorrent.rs): - Line 201: get_process_info() now uses with_global_system() - NVIDIA (nvidia.rs): - Removed system: Mutex<System> field from struct - Migrated get_process_info() to use with_global_system() for consistency This follows the standard pattern already used by AMD reader, Apple Silicon reader, and local collector. Fixes #120

inureyes · 2026-02-08T08:34:39Z

Security & Performance Review

Analysis Summary

PR Branch: fix/120-fd-leak-jetson-tenstorrent (all analysis performed here)
Scope: changed-files with full-context cross-reference
Languages: Rust
Total issues: 0 actionable (1 pre-existing observation noted below)
Critical: 0 | High: 0 | Medium: 0 | Low: 0

Review Checklist

Security

No security vulnerabilities introduced
No new unsafe blocks
No new external input handling
No hardcoded credentials or secrets
No command injection vectors

Performance - Global System Pattern

with_global_system() correctly applied in all three readers (nvidia.rs, nvidia_jetson.rs, tenstorrent.rs)
No nested with_global_system() calls that could deadlock -- all uses are sequential within each call chain
Mutex contention analysis: In local_collector.rs, GPU reader get_process_info() (which now uses with_global_system) runs concurrently with the "full process collection" task (which also uses with_global_system). This introduces contention on GLOBAL_SYSTEM, but NOT deadlock -- one task will simply wait for the other. This is acceptable and consistent with the existing AMD reader behavior.
Lock hold duration is appropriate -- only held during process refresh and data extraction, then released
No System::new() calls remain in any of the three changed files

Correctness

All System::new() calls properly replaced with with_global_system()
NVIDIA reader: system: Mutex<System> field correctly removed from struct, constructor, and usage
Jetson reader: Both get_process_info() (line 179) and get_gpu_processes() fallback path (line 261) correctly use with_global_system()
Tenstorrent reader: get_process_info() correctly uses with_global_system()
use sysinfo::System; import correctly removed from all three files
with_global_system import correctly added to all three files
Code compiles cleanly (cargo check passes)
No clippy warnings (cargo clippy -- -D warnings passes)
All 19 tests pass

Code Quality

Consistent pattern with existing AMD and Apple Silicon readers
No unused imports
Comments properly updated to reflect the new pattern
Proper scoping of use sysinfo::{...} imports within functions

Pre-existing Observation (Not Introduced by This PR)

In nvidia_jetson.rs get_gpu_processes() fallback path (line 261-296): When with_global_system is used, only system.refresh_memory() is called before iterating system.processes(). Process data is NOT refreshed, so it reads whatever process state was last populated by another caller. This was the same in the original code (System::new() + refresh_memory() would have had an empty process list), so this PR does not regress behavior -- it actually slightly improves it since the global system may contain previously-refreshed process data.

Verdict

This PR is clean and ready to merge. The changes are mechanical, consistent, and correctly follow the established with_global_system() pattern. No issues found that require fixing.

inureyes · 2026-02-08T08:36:30Z

PR Finalization Report

Project Structure

Project Type: Rust (Cargo)
Test Framework: cargo test (166 lib + 169 bin + 32 integration + 19 doc tests = 386 total)
Documentation System: Plain markdown in docs/
Lint Tools: cargo fmt, cargo clippy (with .clippy.toml config)

Checklist

Tests

Analyzed existing test structure (unit tests in src/, integration tests in tests/)
Identified test coverage needs: No new tests required -- this is a pure internal refactor replacing System::new() with with_global_system() across three readers. No public API changes.
All 386 tests passing (0 failures)

Documentation

Reviewed documentation needs: No updates required
- No public API changes (internal refactor only)
- No new features or configuration options
- No changes to CLI behavior or output

Code Quality

cargo fmt --check: Passed (no formatting issues)
cargo clippy -- -D warnings: Passed (no warnings)
All unused imports (sysinfo::System) properly removed
Consistent pattern applied across all three readers (nvidia, nvidia_jetson, tenstorrent)

Changes Verified

src/device/readers/nvidia.rs: Removed system: Mutex<System> field, migrated get_process_info() to with_global_system()
src/device/readers/nvidia_jetson.rs: Migrated both get_process_info() and get_gpu_processes() to with_global_system()
src/device/readers/tenstorrent.rs: Migrated get_process_info() to with_global_system()

All three readers now match the pattern already used by AMD, Apple Silicon, and local collector readers.

Status

All checks passing. Ready for merge.

inureyes merged commit 31389b5 into main Feb 8, 2026
4 checks passed

inureyes deleted the fix/120-fd-leak-jetson-tenstorrent branch February 8, 2026 08:37

inureyes self-assigned this Feb 8, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: use with_global_system() to prevent FD leak in Jetson, Tenstorrent, and NVIDIA readers (#120)#121

fix: use with_global_system() to prevent FD leak in Jetson, Tenstorrent, and NVIDIA readers (#120)#121
inureyes merged 1 commit intomainfrom
fix/120-fd-leak-jetson-tenstorrent

inureyes commented Feb 8, 2026

Uh oh!

inureyes commented Feb 8, 2026

Uh oh!

inureyes commented Feb 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

inureyes commented Feb 8, 2026

Summary

Problem

Changes

NVIDIA Jetson (src/device/readers/nvidia_jetson.rs)

Tenstorrent (src/device/readers/tenstorrent.rs)

NVIDIA (src/device/readers/nvidia.rs)

Approach

Testing

Uh oh!

inureyes commented Feb 8, 2026

Security & Performance Review

Analysis Summary

Review Checklist

Security

Performance - Global System Pattern

Correctness

Code Quality

Pre-existing Observation (Not Introduced by This PR)

Verdict

Uh oh!

inureyes commented Feb 8, 2026

PR Finalization Report

Project Structure

Checklist

Tests

Documentation

Code Quality

Changes Verified

Status

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

NVIDIA Jetson (`src/device/readers/nvidia_jetson.rs`)

Tenstorrent (`src/device/readers/tenstorrent.rs`)

NVIDIA (`src/device/readers/nvidia.rs`)