- 
                Notifications
    
You must be signed in to change notification settings  - Fork 670
 
feat: add trtllm env variables + gpu part numbers to config dump #3912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: William Arnold <[email protected]>
          
WalkthroughThe configuration dump module has been extended in two ways: environment variable prefixes have been expanded with 14 new prefixes to capture additional system configurations, and GPU system information extraction has been refactored to parse nvidia-smi XML output instead of CSV format. Changes
 Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 
 Poem
 Pre-merge checks✅ Passed checks (3 passed)
 Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment   | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
components/src/dynamo/common/config_dump/system_info.py (1)
138-164: Consider adding default values for missing GPU fields.When optional fields like
product_name,board_part_number, orgpu_part_numberare missing, they're omitted from thegpu_infodict entirely. This creates inconsistent dict structures across GPUs.Consider providing default values for consistency:
# Extract product name product_name = gpu_elem.find("product_name") - if product_name is not None: - gpu_info["name"] = product_name.text + gpu_info["name"] = product_name.text if product_name is not None else "unknown" # Extract driver version gpu_info["driver_version"] = driver_version # Extract memory total fb_memory = gpu_elem.find("fb_memory_usage/total") - if fb_memory is not None: - gpu_info["memory_total"] = fb_memory.text + gpu_info["memory_total"] = fb_memory.text if fb_memory is not None else "unknown" # Extract board part number board_part = gpu_elem.find("board_part_number") - if board_part is not None: - gpu_info["board_part_number"] = board_part.text + gpu_info["board_part_number"] = board_part.text if board_part is not None else None # Extract GPU part number gpu_part = gpu_elem.find("gpu_part_number") - if gpu_part is not None: - gpu_info["gpu_part_number"] = gpu_part.text + gpu_info["gpu_part_number"] = gpu_part.text if gpu_part is not None else None
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/src/dynamo/common/config_dump/environment.py(1 hunks)components/src/dynamo/common/config_dump/system_info.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.1)
components/src/dynamo/common/config_dump/system_info.py
114-118: Starting a process with a partial executable path
(S607)
124-124: Using xml to parse untrusted data is known to be vulnerable to XML attacks; use defusedxml equivalents
(S314)
171-171: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: sglang
 - GitHub Check: trtllm (amd64)
 - GitHub Check: trtllm (arm64)
 - GitHub Check: vllm (arm64)
 - GitHub Check: vllm (amd64)
 - GitHub Check: operator (amd64)
 - GitHub Check: Build and Test - dynamo
 
🔇 Additional comments (4)
components/src/dynamo/common/config_dump/environment.py (1)
24-37: Verify broad prefix scope is intentional.Several newly added prefixes are quite broad and may capture many environment variables:
NVIDIA_- catches all NVIDIA-related variablesOVERRIDE_- catches any override-related variablesPYTORCH_- overlaps with existingTORCH_prefix (line 20)Ensure this broad scope aligns with your config dump requirements, as it may capture more variables than necessary.
components/src/dynamo/common/config_dump/system_info.py (3)
111-111: LGTM: XML parsing approach is sound.The switch from CSV to XML format provides structured access to additional fields like part numbers.
Also applies to: 116-117
124-124: Static analysis warning is a false positive here.The static analysis tool flags XML parsing as potentially vulnerable (S314), but since you're parsing output from the locally-executed
nvidia-smicommand (a trusted system utility), this is not a security concern in this context.
167-172: Broad exception handling is acceptable here.While catching bare
Exception(flagged by BLE001) is generally discouraged, it's appropriate for this best-effort function that should never crash the caller. The debug logging ensures issues are still visible.
| "TOKENIZERS_", | ||
| "DISABLE_TORCH_", | ||
| "PYTORCH_", | ||
| "ENABLE_PERFECT_ROUTER", | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix inconsistent prefix pattern.
Line 35 lacks a trailing underscore, unlike all other prefixes. This means it will only match the exact variable name ENABLE_PERFECT_ROUTER, not variants like ENABLE_PERFECT_ROUTER_FOO.
Apply this diff to fix the inconsistency:
-    "ENABLE_PERFECT_ROUTER",
+    "ENABLE_PERFECT_ROUTER_",📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "ENABLE_PERFECT_ROUTER", | |
| "ENABLE_PERFECT_ROUTER_", | 
🤖 Prompt for AI Agents
In components/src/dynamo/common/config_dump/environment.py around line 35, the
prefix "ENABLE_PERFECT_ROUTER" is missing the trailing underscore used by all
other prefixes, so it only matches the exact variable name instead of variants;
update the prefix to "ENABLE_PERFECT_ROUTER_" (add the trailing underscore) so
it follows the same pattern and will match environment variables like
ENABLE_PERFECT_ROUTER_FOO.
Overview:
Adds gpu/board part numbers and trtllm environment varibales to config dump.
Details:
Summary by CodeRabbit
Release Notes