feat: restructure dsr1 recipes and add gb200 #3891
Conversation
# 4096 = 256 * 16
# moe_max_num_tokens: 4096
load_balancer: /mnt/recipes/deepseek-r1/trtllm/wide_ep/eplb.yaml
load_balancer: /mnt/recipes/deepseek-r1/trtllm/agg/wide_ep/eplb.yaml
Is this assuming some specific PVC mounting path?
Yes, this actually comes from the Slurm guides. In a separate PR we should remove those and turn all of these into k8s examples (if we want to keep Slurm examples, they need to be fully separated out).
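For context, the path above only resolves if the recipes tree is mounted at /mnt/recipes inside the worker container. A minimal sketch of that assumption (the volume and claim names are illustrative, not from this PR; on Slurm this could equally be a hostPath):

```yaml
volumeMounts:
  - name: recipes
    mountPath: /mnt/recipes   # eplb.yaml is referenced relative to this mount
volumes:
  - name: recipes
    persistentVolumeClaim:
      claimName: recipes-pvc  # illustrative name
```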
values:
- "true"
mainContainer:
image: rohanv672/dynamo:0.5.1-trtllm-ssh
Will need to update this obviously 🙂
Yes, testing with actual dynamo container today
updated
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3000
We should probably fix this to a more reasonable number.
What's a good value, do you think? Depending on PVC speed it can end up taking around an hour. I'm thinking 1.5 hours, with a comment saying it's dependent on your PVC speed.
Set to around 2 hours and added a comment noting that a slow PVC may require updating the probe.
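As a rough illustration of the arithmetic discussed above (the values are illustrative, not necessarily the final recipe values): with a 10-second period, a failureThreshold of 720 gives roughly 2 hours before the pod is restarted.

```yaml
startupProbe:
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  # ~2 hours total (720 * 10s); increase if your PVC is slow to serve model weights
  failureThreshold: 720
```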
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: nvidia.com/gpu.present
Is this necessary given we also specify it in the limits?
good point
removed
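The point above is that requesting GPUs in the container resources already constrains scheduling to GPU nodes, which makes the explicit nodeAffinity on nvidia.com/gpu.present redundant. A minimal sketch (the GPU count is illustrative):

```yaml
resources:
  limits:
    nvidia.com/gpu: "4"   # scheduling onto GPU nodes is implied by this request
```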
metadata:
name: trtllm-test-compute-domain
spec:
numNodes: 9
We should add a comment here and in prefill/decode that these need to match.
Or ideally that you don't need to specify it ahead of time and the compute domain will grow/shrink accordingly.
Once Grove adds support for automatic DRA we'll remove this; will add the comment.
added comment
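A minimal sketch of the kind of comment being added, mirroring the snippet above (the 1 + 8 prefill/decode split comes from the later review discussion):

```yaml
metadata:
  name: trtllm-test-compute-domain
spec:
  # NOTE: numNodes must equal prefill.nodeCount + decode.nodeCount (1 + 8 = 9 here).
  # Once Grove supports automatic DRA sizing, this manual value can go away.
  numNodes: 9
```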
name: trtllm-disagg-multinode
spec:
pvcs:
- name: modelcache-pvc
Is there a reason we need the PVC here and in each component? Also where is this PVC defined?
Yeah, will be adding the same model_downloader setup as biswas.
Added a model-downloader for the PVC; also, the PVC is no longer mounted into the frontend.
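For reference, a minimal sketch of what the model-cache PVC definition can look like (the access mode, size, and storage class are assumptions, not the PR's actual values):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache-pvc
spec:
  accessModes:
    - ReadWriteMany          # shared across prefill/decode workers (assumption)
  resources:
    requests:
      storage: 2Ti           # illustrative size; DeepSeek-R1 weights are large
  # storageClassName: <your-rwx-storage-class>   # cluster-specific
```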
9bb62ad to 1692c43 (force-push)
Walkthrough
The changes reorganize TRTLLM configuration file paths from a flat structure to hierarchical aggregated/disaggregated subdirectories, rename model cache PVC references for consistency, add a Kubernetes model-download Job, and introduce new disaggregated deployment configurations for GB200 systems.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks: ❌ Failed checks (1 warning, 1 inconclusive); ✅ Passed checks (1 passed)
Actionable comments posted: 2
🧹 Nitpick comments (3)
recipes/deepseek-r1/model_cache/model-download.yaml (1)
1-37: Approve with note on root user security posture.
The model-download Job is well-structured, and the hardcoded model names (nvidia/DeepSeek-R1-FP4 and deepseek-ai/DeepSeek-R1) follow the intended recipe-specific pattern per learnings. The PVC naming is consistent, and downloading both FP4 and standard variants provides flexibility for different hardware (GB200 vs. Hopper, as noted in the documentation). However, Checkov flags CKV_K8S_23 and CKV_K8S_20 for running as root. While this is acceptable for a one-time batch download job, consider running as a non-root user for an improved security posture.

recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/deploy.yaml (2)
76-78: Add clarifying comment for decode nodeCount dependency.
Lines 76–78 appropriately document that ComputeDomain.numNodes must equal prefill.nodeCount + decode.nodeCount. While this is correctly calculated as 1 + 8 = 9 (line 84), the decode section (line 192) lacks a similar comment explaining that its nodeCount: 8 contributes to the ComputeDomain total. For consistency and to aid future maintainers, add a brief comment above line 192 as well.

Add this comment above line 192:

multinode:
  # NOTE: This nodeCount contributes to ComputeDomain.numNodes (see above)
  nodeCount: 8

Also applies to: 76-192
155-162: Startup probe configuration is appropriate but document the intent.
Both prefill (lines 155–162) and decode (lines 213–220) services use identical startup probe settings with failureThreshold: 500 (~83 minutes at 10-second intervals). A past review suggested adding a comment explaining why these values match and their relationship to PVC speed. Consider adding a clarifying comment at line 155 (or above the prefill startupProbe block):

# NOTE: Prefill and decode startupProbe thresholds must match.
# With PVC latency, model loading can take considerable time.
# Adjust failureThreshold based on your PVC performance.
startupProbe:

Also applies to: 213-220
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
- docs/backends/trtllm/README.md (1 hunks)
- docs/backends/trtllm/multinode/multinode-examples.md (2 hunks)
- examples/basics/multinode/trtllm/srun_aggregated.sh (1 hunks)
- examples/basics/multinode/trtllm/srun_disaggregated.sh (1 hunks)
- recipes/deepseek-r1/model-cache/model-cache.yaml (1 hunks)
- recipes/deepseek-r1/model_cache/model-download.yaml (1 hunks)
- recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml (5 hunks)
- recipes/deepseek-r1/sglang/disagg-8gpu/deploy.yaml (5 hunks)
- recipes/deepseek-r1/trtllm/agg/wide_ep/wide_ep_agg.yaml (1 hunks)
- recipes/deepseek-r1/trtllm/disagg/wide_ep/eplb.yaml (1 hunks)
- recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/deploy.yaml (1 hunks)
- recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/perf.yaml (1 hunks)
- recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_decode.yaml (1 hunks)
- recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_prefill.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: KrishnanPrash
PR: ai-dynamo/dynamo#2217
File: components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml:18-0
Timestamp: 2025-07-31T11:26:48.422Z
Learning: TRTLLM LLM-API expects all caps for backend field names in configuration files. When migrating TRTLLM configurations, backend values like "WideEP" should be changed to "WIDEEP" to comply with the API requirements.
Learnt from: biswapanda
PR: ai-dynamo/dynamo#3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
📚 Learning: 2025-07-31T11:26:48.422Z
Learnt from: KrishnanPrash
PR: ai-dynamo/dynamo#2217
File: components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml:18-0
Timestamp: 2025-07-31T11:26:48.422Z
Learning: TRTLLM LLM-API expects all caps for backend field names in configuration files. When migrating TRTLLM configurations, backend values like "WideEP" should be changed to "WIDEEP" to comply with the API requirements.
Applied to files:
- docs/backends/trtllm/README.md
- recipes/deepseek-r1/trtllm/agg/wide_ep/wide_ep_agg.yaml
- docs/backends/trtllm/multinode/multinode-examples.md
- recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_decode.yaml
- recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_prefill.yaml
- recipes/deepseek-r1/trtllm/disagg/wide_ep/eplb.yaml
📚 Learning: 2025-07-02T13:20:28.800Z
Learnt from: fsaady
PR: ai-dynamo/dynamo#1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the `--total_nodes` parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.
Applied to files:
- docs/backends/trtllm/multinode/multinode-examples.md
- examples/basics/multinode/trtllm/srun_disaggregated.sh
📚 Learning: 2025-10-24T04:21:08.751Z
Learnt from: biswapanda
PR: ai-dynamo/dynamo#3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
Applied to files:
- recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml
- recipes/deepseek-r1/model_cache/model-download.yaml
- recipes/deepseek-r1/sglang/disagg-8gpu/deploy.yaml
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.
Applied to files:
- recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/deploy.yaml
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project's devcontainer setup, hard-coded container names in devcontainer.json files serve as templates that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.
Applied to files:
- recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/deploy.yaml
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.
Applied to files:
- recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/deploy.yaml
📚 Learning: 2025-06-05T01:46:15.509Z
Learnt from: GuanLuo
PR: ai-dynamo/dynamo#1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Applied to files:
- examples/basics/multinode/trtllm/srun_aggregated.sh
🪛 Checkov (3.2.334)
recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/perf.yaml
[medium] 3-153: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[high] 3-153: Container should not be privileged
(CKV_K8S_16)
[medium] 3-153: Minimize the admission of root containers
(CKV_K8S_23)
recipes/deepseek-r1/model_cache/model-download.yaml
[medium] 3-37: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 3-37: Minimize the admission of root containers
(CKV_K8S_23)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3891/merge) by nvrohanv.
recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/deploy.yaml
[error] 73-73: Trailing whitespace found and removed by pre-commit. (pre-commit: trailing-whitespace)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: operator (amd64)
- GitHub Check: trtllm (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: sglang (arm64)
- GitHub Check: trtllm (arm64)
- GitHub Check: sglang (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: vllm (arm64)
- GitHub Check: Build and Test - dynamo
- GitHub Check: Mirror Repository to GitLab
🔇 Additional comments (14)
examples/basics/multinode/trtllm/srun_disaggregated.sh (1)
20-20: Path reorganization for disaggregated deployments looks good.
The default engine config paths now clearly reflect the disaggregated subdirectory structure (disagg/wide_ep/), making the distinction from aggregated configs explicit and improving discoverability.
Also applies to: 24-24
examples/basics/multinode/trtllm/srun_aggregated.sh (1)
21-21: Export addition and path reorganization are appropriate.
Explicitly exporting ENGINE_CONFIG ensures the variable is properly passed to worker containers via the --container-env flag (line 66), improving clarity and reliability of environment propagation in multi-node setups.

recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_prefill.yaml (1)
20-20: Load balancer path reorganization is consistent with disaggregated layout.
The path update to disagg/wide_ep/eplb.yaml aligns with the new directory structure, and the WIDEEP backend field (line 19) is correctly in all caps per TRTLLM LLM-API requirements.

docs/backends/trtllm/README.md (1)
165-165: Documentation path update maintains consistency with reorganized config structure.
The AGG_ENGINE_ARGS path update correctly reflects the new aggregated subdirectory layout (agg/mtp/), ensuring users follow the current file organization when running examples.

recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_decode.yaml (1)
20-20: Load balancer path correctly reflects disaggregated config organization.
Mirrors the prefill config structure with the disagg/wide_ep/eplb.yaml path, ensuring a consistent load balancer reference across disaggregated deployment components.

recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml (1)
10-10: Deployment manifest consistently updates PVC reference and model paths.
All PVC references now use model-cache-pvc (consistent with the renamed PVC), and model paths correctly reference the mount point /model-cache/deepseek-r1. Changes are consistently applied to both decode and prefill workers.
Also applies to: 37-38, 58-58, 90-91, 111-111
recipes/deepseek-r1/trtllm/agg/wide_ep/wide_ep_agg.yaml (1)
14-14: Aggregated config load balancer path correctly reorganized.
The path update to agg/wide_ep/eplb.yaml aligns with the aggregated subdirectory structure, mirroring the disaggregated pattern and improving config file organization and discoverability.

recipes/deepseek-r1/model-cache/model-cache.yaml (1)
6-6: PVC rename is incomplete; stale references remain across recipes. Clarify scope and verify deepseek-r1 manifests are fully updated.
The verification reveals the PVC rename to model-cache-pvc is incomplete. While only 12 references to model-cache-pvc exist, stale references to the old model-cache name persist across multiple recipes (qwen3-32b-fp8, llama-3-70b, gpt-oss-120b, and deepseek-r1 itself). Confirm whether this PR intends to:
- Update only deepseek-r1 manifests (then verify all deepseek-r1 files reference model-cache-pvc)
- Update all recipes consistently (then address stale references in qwen3-32b-fp8, llama-3-70b, gpt-oss-120b, and remaining deepseek-r1 files)
Cross-recipe inconsistency risks deployment failures if some manifests reference the old name.
recipes/deepseek-r1/trtllm/disagg/wide_ep/eplb.yaml (1)
1-7: LGTM on structure and licensing.
The new EPLB configuration file is clean, properly licensed, and the MOE load balancer settings (num_slots: 288, layer_updates_per_iter: 2) align with the TensorRT-LLM documentation referenced in the comment. This externalization of the load balancer config from inline definitions is a good refactor for maintainability.
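Based only on the settings named above, a sketch of what the externalized EPLB config contains (the key layout and license header are assumptions, not a copy of the file):

```yaml
# SPDX-License-Identifier: Apache-2.0   # header assumed; the review only notes it is properly licensed
# MOE expert-parallel load balancer settings, per the TensorRT-LLM wide-EP docs
num_slots: 288
layer_updates_per_iter: 2
```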
recipes/deepseek-r1/sglang/disagg-8gpu/deploy.yaml (1)
10-10: Verify consistency of model cache directory naming.
The PVC name is correctly updated to model-cache-pvc (line 10), and mount paths consistently use /model-cache (lines 35–36, 84–85). However, the --model-path arguments reference /model-cache/deepseek-r1 (lines 56, 105), while the model-download job (recipes/deepseek-r1/model_cache/model-download.yaml) downloads to both /model-cache/deepseek-r1-fp4 and /model-cache/deepseek-r1. Confirm whether this is intentional (using the non-FP4 variant) or should match the FP4 path used in other deployments.
Also applies to: 35-36, 84-85, 56-56, 105-105
docs/backends/trtllm/multinode/multinode-examples.md (1)
139-139: Documentation paths correctly aligned with new structure.
The path updates properly reflect the reorganization into the agg/wide_ep/ and disagg/wide_ep/ subdirectories. The example commands (lines 139, 168–169) will guide users to the correct configuration files under the new hierarchy. No functional issues detected.
Also applies to: 168-169
recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/perf.yaml (1)
59-59: Align hardcoded tokenizer path with model-download outputs.
Line 59 hardcodes the tokenizer path to /model-cache/deepseek-r1-fp4, but the model-download job (recipes/deepseek-r1/model_cache/model-download.yaml) downloads models to both /model-cache/deepseek-r1-fp4 and /model-cache/deepseek-r1. Additionally, the deployment manifest (recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/deploy.yaml) references /model-cache/deepseek-r1 for the model path. Confirm whether the tokenizer should be parameterized (via an environment variable or ConfigMap) or hardcoded to match the deployment's model path.
recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/deploy.yaml (2)

119-119: Container image requires confirmation.
Line 119 references nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.6.0. A past review comment indicated this placeholder "will need to update," and there's a corresponding author comment stating "testing with actual dynamo container today." Confirm that this image tag is finalized and available in the registry before merge, or document if it's intentionally a placeholder for manual deployment configuration.
69-69: Backend field correctly uses all-caps WIDEEP.
Line 69 correctly specifies backend: WIDEEP (all capitals) in the decode config. This aligns with TRTLLM LLM-API requirements for backend field names. Based on learnings, this is the correct format.
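For reference, a minimal sketch of the casing requirement (the surrounding key is an assumption based on the wide-EP configs discussed here, not a quote from the file):

```yaml
moe_config:
  backend: WIDEEP   # TRTLLM LLM-API expects the all-caps form, not "WideEP"
```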
cache_transceiver_config:
max_tokens_in_buffer: 4608
backend: DEFAULT
stream_interval: 20
Fix trailing whitespace on line 73.
The pre-commit hook flagged trailing whitespace at line 73. This is in the decode-config ConfigMap YAML. Remove any trailing spaces on this line before merge.
🧰 Tools
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3891/merge) by nvrohanv.
[error] 73-73: Trailing whitespace found and removed by pre-commit. (pre-commit: trailing-whitespace)
🤖 Prompt for AI Agents
In recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/deploy.yaml around line 73,
there is trailing whitespace after "stream_interval: 20"; open the file, remove
any spaces or tabs after the value on line 73 so the line ends immediately after
"20", save the file, and run the pre-commit hook or lint to confirm the trailing
whitespace is gone.
imagePullPolicy: IfNotPresent
name: perf
securityContext:
privileged: true
🧩 Analysis chain
Remove privileged mode unless strictly necessary for performance testing.
The container runs with securityContext.privileged: true (line 142), which is flagged as a high-severity security risk (CKV_K8S_16). While the network tuning at line 34 (sysctl -w net.ipv4.ip_local_port_range) may require elevated privileges, consider whether this can be accomplished via allowPrivilegeEscalation and targeted Linux capabilities instead, or defer to the host network configuration.
Is the privileged mode strictly necessary, or can network tuning be handled outside the container or via setLinuxCapabilities?
I'll verify this review comment by examining the actual file and understanding the security context requirements.
🏁 Script executed:
#!/bin/bash
# Find and examine the perf.yaml file
fd -t f "perf.yaml" | head -20Length of output: 334
🏁 Script executed:
#!/bin/bash
# If found, examine the security context and network tuning sections
fd -t f "perf.yaml" -x cat {}Length of output: 41288
🏁 Script executed:
#!/bin/bash
# Search for securityContext patterns in Job specs across the codebase
rg -n "securityContext:" -A 5 -B 2 --type=yaml | grep -A 5 -B 2 "privileged"Length of output: 3917
🏁 Script executed:
#!/bin/bash
# Search for other network tuning or sysctl usage patterns
rg -n "sysctl" -B 5 -A 5 --type=yamlLength of output: 22978
---
🌐 Web query:
Kubernetes pod securityContext sysctls net.ipv4.ip_local_port_range namespace restriction
💡 Result:
Short answer
- net.ipv4.ip_local_port_range is in Kubernetes' "safe" (namespaced) sysctls and may be set per-Pod via spec.securityContext.sysctls. [1]
- It cannot be used when the Pod uses hostNetwork (net namespace must be namespaced). [1]
- Other sysctls that are not namespaced (node-level) cannot be set in pod securityContext; unsafe sysctls must be explicitly allowed on each node with kubelet's --allowed-unsafe-sysctls. [1]
- Use admission policies (e.g., Kyverno/PSP replacements) to restrict which sysctls (including allowing only net.ipv4.ip_local_port_range) are permitted in your cluster. [2]
Relevant docs
- Kubernetes docs: "Using sysctls in a Kubernetes Cluster" (safe vs unsafe, namespaced sysctls, hostNetwork rule, kubelet flag). [1]
- Example policy enforcement (Kyverno) showing restricting pods to the safe sysctl set including net.ipv4.ip_local_port_range. [2]
Replace container-level privileged: true with pod-level sysctls to eliminate unnecessary privilege escalation.
The container runs with securityContext.privileged: true (line 142) to execute sysctl -w net.ipv4.ip_local_port_range="1024 65000". However, net.ipv4.ip_local_port_range is a namespaced (safe) sysctl in Kubernetes and can be set at the pod level via spec.securityContext.sysctls without requiring privileged mode.
Replace the privileged security context with:
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
    sysctls:
    - name: net.ipv4.ip_local_port_range
      value: "1024 65000"Remove the sysctl command from the container startup script. This pattern applies to all perf.yaml files in the repository (recipes/*/perf.yaml).
🤖 Prompt for AI Agents
In recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/perf.yaml around line 142,
the container securityContext sets privileged: true solely to run sysctl -w
net.ipv4.ip_local_port_range="1024 65000"; remove privileged: true and instead
add a pod-level spec.securityContext with runAsUser: 1000, runAsGroup: 1000,
fsGroup: 1000 and a sysctls list containing name: net.ipv4.ip_local_port_range
and value: "1024 65000"; also remove the sysctl command from the container
startup script; apply the same change to all recipes/*/perf.yaml files.
…nities to empty, remove pvc from frontend
8406768 to 6b83018 (force-push)
Restructured the example layout, separating the agg cases from disagg for a cleaner directory tree. In the future, each example config should have its own directory with a deploy.yaml and perf.yaml. I added the DeepSeek-R1 wide-ep GB200 recipe in this format. After my changes there are still Slurm examples that need to be migrated.

Cleaned up the model-download flow; the model-cache directory now contains:
model-cache.yaml — defines a PVC named model-cache-pvc
model-download.yaml — performs the model download into the PVC

Several changes were made within model-download.yaml:
Updated volume mount names to consistently use model-cache.
Shifted the hf download command to hf download <model> --local-dir /model-cache to avoid dealing with model revisions and needing to figure out snapshot id values.
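For readers following along, a minimal sketch of what such a download Job can look like. The image name, HF token secret, and inline pip install are illustrative assumptions, not the PR's actual manifest; the model names, target paths, and PVC name come from the review discussion above.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: model-download
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: model-download
          image: python:3.12-slim            # illustrative; the recipe uses its own image
          command: ["/bin/sh", "-c"]
          args:
            - |
              pip install -U huggingface_hub && \
              hf download nvidia/DeepSeek-R1-FP4 --local-dir /model-cache/deepseek-r1-fp4 && \
              hf download deepseek-ai/DeepSeek-R1 --local-dir /model-cache/deepseek-r1
          env:
            - name: HF_TOKEN                 # secret name/key assumed
              valueFrom:
                secretKeyRef:
                  name: hf-token-secret
                  key: HF_TOKEN
          volumeMounts:
            - name: model-cache
              mountPath: /model-cache
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: model-cache-pvc
```

Note that a one-shot Job like this runs as root by default, which is what Checkov flags (CKV_K8S_20/23) in the review above; adding a non-root securityContext addresses that nitpick.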