
Conversation


@csabakecskemeti csabakecskemeti commented Oct 27, 2025

Overview:

Support DGX Spark for VLLM backend

Details:

New Dockerfile with DGX Spark-specific instructions:

  • Uses an ARM64 base image and CUDA 13
  • NIXL is built on the fly from version 0.7.0 for CUDA 13 compatibility

build.sh is also updated to build for DGX Spark:
./container/build.sh --framework VLLM --dgx-spark
This uses the new Dockerfile and sets the proper arm64 platform.

Where should the reviewer start?

container/build.sh
container/Dockerfile.vllm.dgx-spark

Then the new READMEs, especially
docs/backends/vllm/DGX-SPARK_README.md

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: N/A

Tested scenarios:

Aggregated serving ✅
  • Dynamo inference service works properly with a single DGX Spark node
Disaggregated serving ❗

Tested setup:
[node1] DGX Spark hosts the dynamo dcgm-exporter, nats, etcd, and the dynamo frontend, and serves as the prefill worker
[node2] x86_64 RTX 3090 (CUDA 12) serves as the decode worker

Setup I've implemented

Issue ❌
Inference stalls: no response, no activity on any of the workers. I assume it's an issue with my setup, or related to CUDA 13 support (NIXL 0.7.0).

Both workers seem registered:

```text
2025-10-27T17:28:11.052173Z INFO dynamo_llm::discovery::watcher: added model model_name="Qwen/Qwen3-0.6B" namespace="dynamo"
2025-10-27T17:28:11.057018Z INFO dynamo_llm::kv_router::scheduler: Runtime config found for worker_id: 7587890414798442552
2025-10-27T17:28:11.057031Z WARN dynamo_llm::kv_router::sequence: Adding worker WorkerWithDpRank { worker_id: 7587890414798442552, dp_rank: 0 }
```

At inference time the router is seemingly aware of the two workers:

```text
2025-10-27T17:30:43.780177Z INFO dynamo_llm::kv_router::scheduler: Formula for worker_id=7587890414798442538 dp_rank=0 with 26 cached blocks: 0.438 = 1.0 * prefill_blocks + decode_blocks = 1.0 * 0.438 + 0.000
2025-10-27T17:30:43.780211Z INFO dynamo_llm::kv_router::scheduler: Selected worker: worker_id=7587890414798442538 dp_rank=0, logit: 0.438, cached blocks: 26, total blocks: 61707
2025-10-27T17:30:43.839306Z WARN nixl_connector.get_finished: Releasing expired KV blocks for request 635e21b8-c622-4501-a580-57bae888e2d6 which were retrieved by 0 decode worker(s) within 120 seconds.
2025-10-27T17:30:43.863645Z INFO dynamo_llm::kv_router::scheduler: Formula for worker_id=7587890414798442552 dp_rank=0 with 0 cached blocks: 26.000 = 0.0 * prefill_blocks + decode_blocks = 0.0 * 26.438 + 26.000
2025-10-27T17:30:43.863670Z INFO dynamo_llm::kv_router::scheduler: Selected worker: worker_id=7587890414798442552 dp_rank=0, logit: 26.000, cached blocks: 0, total blocks: 11432
```

But nothing happens.
Note: both workers (DGX Spark, 3090) are able to infer standalone with the Dynamo frontend running on the DGX Spark.

Summary by CodeRabbit

New Features

  • Added support for building and deploying vLLM on DGX-SPARK infrastructure with ARM64 architecture.

Documentation

  • Added comprehensive guides covering DGX-SPARK build procedures, configuration options, multi-node deployment patterns, quick start examples, and troubleshooting resources.

@csabakecskemeti csabakecskemeti requested review from a team as code owners October 27, 2025 18:11
@copy-pr-bot

copy-pr-bot bot commented Oct 27, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

👋 Hi csabakecskemeti! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the external-contribution Pull request is from an external contributor label Oct 27, 2025

coderabbitai bot commented Oct 27, 2025

Walkthrough

The changes introduce support for DGX-SPARK (ARM64) vLLM builds, adding a specialized Dockerfile, updating the build script with --dgx-spark CLI option, and providing comprehensive documentation for build procedures and deployment configurations.

Changes

Cohort / File(s) Summary
Documentation
container/BUILD_DGX_SPARK_GUIDE.md, docs/backends/vllm/DGX-SPARK_README.md
Added guides for DGX-SPARK Dockerfile selection logic, build/run usage examples, feature support matrix, multi-node deployment patterns, configuration reference, and troubleshooting.
Build Infrastructure
container/build.sh, container/Dockerfile.vllm.dgx-spark
Introduced --dgx-spark CLI option and USE_DGX_SPARK flag to enforce linux/arm64 platform and select DGX-specific Dockerfile for VLLM. Added multi-stage Dockerfile with custom UCX and NIXL builds, Python environment setup, and dev tooling stage.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Dockerfile complexity: Multi-stage build with custom UCX and NIXL compilation from source, dependency management, and virtual environment setup requires verification of build correctness and layer ordering.
  • Build script conditionals: DGX-SPARK vs. standard build branching logic, base image handling, and platform override behavior need validation against existing build flows.
  • Integration points: Verify that build arguments, environment variable passing, and Dockerfile selection logic align across the script modifications.

Poem

🐰 DGX-SPARK ignites with ARM64 might,
Multi-stage builds dance through CUDA night!
UCX and NIXL bloom in layers deep,
Documentation guides where builders leap! 🚀✨

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title "[proposal, review only] dgx spark support - initial; cuda13 and arm base image" directly relates to the primary objective of the changeset, which is to add DGX Spark support for VLLM with CUDA 13 and an ARM64 base image. The title is specific and highlights the main technical changes—DGX Spark support, CUDA 13 compatibility, and ARM base image selection—making it clear to teammates reviewing the history what this PR accomplishes. While the title is moderately long, it avoids vague terminology and accurately represents the core of the changeset. The "[proposal, review only]" prefix appropriately flags the review status without compromising clarity of the actual change being introduced.
Description Check ✅ Passed The PR description comprehensively follows the required template structure. It includes all mandatory sections: Overview (clearly stating "Support DGX Spark for VLLM backend"), Details (explaining the new Dockerfile, NIXL build strategy, and build.sh changes), Where should the reviewer start (pointing to key files), and Related Issues (explicitly noting N/A). The description goes beyond the template by providing tested scenarios with specific outcomes, including successful aggregated serving and the known limitation with disaggregated serving, along with relevant logs. This additional context demonstrates thoroughness while remaining well-organized and easy to follow.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (1)
container/build.sh (1)

627-632: Build argument logic is correct but uses double negation; consider improving clarity.

The condition if [[ ! ((...)) ]] (skip build args when NOT DGX) is functionally correct—it prevents passing BASE_IMAGE args to Dockerfile.vllm.dgx-spark, which hardcodes its base images. However, the double negation reduces readability.

Consider this refactor for improved maintainability:

# Skip BASE_IMAGE and BASE_IMAGE_TAG for DGX-SPARK builds
# (DGX-SPARK Dockerfile hardcodes its own base images)
if ! ( [[ "$USE_DGX_SPARK" == "true" ]] || ( [[ "$PLATFORM" == *"linux/arm64"* ]] && [[ $FRAMEWORK == "VLLM" ]] ) ); then
    BUILD_ARGS+=" --build-arg BASE_IMAGE=$BASE_IMAGE --build-arg BASE_IMAGE_TAG=$BASE_IMAGE_TAG"
fi

Or, using positive logic:

# Only pass BASE_IMAGE args for non-DGX builds
if [[ "$USE_DGX_SPARK" != "true" ]] && ! ( [[ "$PLATFORM" == *"linux/arm64"* ]] && [[ $FRAMEWORK == "VLLM" ]] ); then
    BUILD_ARGS+=" --build-arg BASE_IMAGE=$BASE_IMAGE --build-arg BASE_IMAGE_TAG=$BASE_IMAGE_TAG"
fi
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 07cf074 and 5ace02d.

📒 Files selected for processing (4)
  • container/BUILD_DGX_SPARK_GUIDE.md (1 hunks)
  • container/Dockerfile.vllm.dgx-spark (1 hunks)
  • container/build.sh (6 hunks)
  • docs/backends/vllm/DGX-SPARK_README.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.

Applied to files:

  • container/Dockerfile.vllm.dgx-spark
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3909/merge) by csabakecskemeti.
docs/backends/vllm/DGX-SPARK_README.md

[error] 1-1: Trailing whitespace detected and fixed by pre-commit hook (trailing-whitespace).

🪛 LanguageTool
docs/backends/vllm/DGX-SPARK_README.md

[grammar] ~8-~8: Use a hyphen to join words.
Context: ...rage vLLM's native KV cache events, NIXL based transfer mechanisms, and metric re...

(QB_NEW_EN_HYPHEN)


[grammar] ~80-~80: Use a hyphen to join words.
Context: ...n.md) | ✅ | ARM64 supported | ### Large Scale P/D and WideEP Features | Feature...

(QB_NEW_EN_HYPHEN)


[style] ~231-~231: The verb “get” can be informal. Consider replacing it with a form of “to be”.
Context: ...e components. The final components that get spawned depend upon the chosen deployment patte...

(GET_USED_ARE_USED)


[grammar] ~265-~265: Use a hyphen to join words.
Context: ... launch/disagg_router.sh ``` ### Single Node Data Parallel Attention / Expert Pa...

(QB_NEW_EN_HYPHEN)


[style] ~429-~429: The double modal “needed Cost” is nonstandard (only accepted in certain dialects). Consider “to be Cost”.
Context: ...efill or decode workers as needed > - Cost efficiency: Leverage existing hardwar...

(NEEDS_FIXED)


[grammar] ~456-~456: Ensure spelling is correct
Context: ...vLLM processes when relying on Python's builtin hashing for prefix caching. - If your v...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~523-~523: Use a hyphen to join words.
Context: ...ARK ``` ### Getting Help For DGX-SPARK specific issues: 1. Check this README fo...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
container/BUILD_DGX_SPARK_GUIDE.md

9-9: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


67-67: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


81-81: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


86-86: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


91-91: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/backends/vllm/DGX-SPARK_README.md

19-19: Link text should be descriptive

(MD059, descriptive-link-text)


30-30: Link fragments should be valid

(MD051, link-fragments)


217-217: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


299-299: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


320-320: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


336-336: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (6)
docs/backends/vllm/DGX-SPARK_README.md (1)

66-87: Feature support matrix is comprehensive and accurate.

The tables clearly communicate which features are supported, in progress, or experimental on DGX-SPARK, with helpful notes. Status indicators and cross-references align well with the rest of the documentation.

container/Dockerfile.vllm.dgx-spark (4)

9-12: Base image and environment setup looks correct for DGX-SPARK.

The Dockerfile properly:

  • Declares hardcoded NVIDIA vLLM base image (nvcr.io/nvidia/vllm:25.09-py3) with CUDA 13 support
  • Uses dynamo_base for utilities (NATS, etcd, uv)
  • Sets PYTHONPATH to include system Python site-packages for NVIDIA's vLLM (line 40)
  • Configures NIXL-specific environment variables (NIXL_PREFIX, NIXL_LIB_DIR, NIXL_PLUGIN_DIR) at lines 44-47

This approach avoids building vLLM from source, which sidesteps the compute_121a nvcc error mentioned in PR objectives.

Also applies to: 33-48


92-139: UCX and NIXL builds from source are appropriately configured for ARM64/CUDA 13.

The builds correctly:

  • Use UCX v1.19.0 with CUDA 13 support and RDMA flags
  • Build NIXL 0.7.0 (first version with CUDA 13) with GDS backend disabled for ARM64 (line 130)
  • Create both C++ library and Python wheel (lines 131-137) with matching build flags
  • Configure library paths and runtime linking (lines 133-135, 144-150)

The approach of building from source (rather than copying from base image) ensures CUDA 13 compatibility and ARM64 optimization.
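As an illustrative sketch only (the clone URL, tag format, and install prefix here are assumptions; container/Dockerfile.vllm.dgx-spark in this PR is authoritative), a from-source NIXL build step for ARM64/CUDA 13 could look like:

```dockerfile
# Sketch: build NIXL 0.7.0 from source for ARM64/CUDA 13.
# Assumed repo URL and prefix; GDS backend disabled per the ARM64 notes above.
ARG NIXL_VERSION=0.7.0
RUN git clone --depth 1 --branch ${NIXL_VERSION} https://github.com/ai-dynamo/nixl.git /opt/nixl-src \
    && cd /opt/nixl-src \
    && meson setup build --prefix=/usr/local/nixl -Ddisable_gds_backend=true \
    && ninja -C build \
    && ninja -C build install
```

Pinning the version via a build arg keeps the step reproducible regardless of what NIXL version is cached in `dynamo_base`.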


152-195: Virtual environment and dependency installation is well-designed for DGX-SPARK.

Key design choices are sound:

  • Creates isolated venv with uv, keeping NVIDIA's system vLLM separate (line 40's PYTHONPATH)
  • Installs fresh NIXL 0.7.0 wheel built in this Dockerfile (line 168) rather than cached version
  • Properly separates runtime stage (prod-ready) from dev stage (line 210) for multi-stage efficiency
  • ENTRYPOINT uses NVIDIA's wrapper (line 194) enabling flexible CMD invocations

This maintains clarity between NVIDIA's vLLM (system Python) and Dynamo's runtime (venv).


210-259: Dev stage properly extends runtime with build and debugging tools.

The multi-stage design:

  • Reuses runtime stage as base for dev (line 210), inheriting all NVIDIA/Dynamo setup
  • Adds comprehensive dev utilities (nvtop, tmux, vim, cmake, etc.) for local debugging
  • Includes all Rust/C++ build dependencies (clang, protobuf, pybind11) needed for maturin editable installs
  • Maintains proper Cargo/Rustup paths from dynamo_base (lines 253-254)

This aligns with the learnings about maintaining consistency across framework-specific Dockerfiles while enabling local development workflows.

container/build.sh (1)

494-494: Help text for --dgx-spark is clear and informative.

The new help entry accurately describes the flag's purpose, mentions Blackwell GPU support, and explains ARM64 auto-detection behavior. It aligns well with the implementation and user-facing documentation.

Comment on lines +9 to +21
```
IF framework == "VLLM":
IF --dgx-spark flag is set OR platform is linux/arm64:
Use: Dockerfile.vllm.dgx-spark (NVIDIA's pre-built vLLM with Blackwell support)
ELSE:
Use: Dockerfile.vllm (Build from source)
ELSE IF framework == "TRTLLM":
Use: Dockerfile.trtllm
ELSE IF framework == "SGLANG":
Use: Dockerfile.sglang
ELSE:
Use: Dockerfile
```
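The pseudocode above can be sketched as a small shell function (variable names mirror build.sh but this is an illustration, not the script's actual code):

```shell
#!/usr/bin/env bash
# Sketch of the Dockerfile selection logic described in the guide.
# FRAMEWORK/USE_DGX_SPARK/PLATFORM names are taken from the PR summary;
# the real build.sh is authoritative.
select_dockerfile() {
    local framework="$1" use_dgx_spark="$2" platform="$3"
    case "$framework" in
        VLLM)
            if [[ "$use_dgx_spark" == "true" || "$platform" == *"linux/arm64"* ]]; then
                echo "Dockerfile.vllm.dgx-spark"   # NVIDIA's pre-built vLLM
            else
                echo "Dockerfile.vllm"             # build from source
            fi
            ;;
        TRTLLM) echo "Dockerfile.trtllm" ;;
        SGLANG) echo "Dockerfile.sglang" ;;
        *)      echo "Dockerfile" ;;
    esac
}

select_dockerfile VLLM true  linux/amd64   # -> Dockerfile.vllm.dgx-spark
select_dockerfile VLLM false linux/arm64   # -> Dockerfile.vllm.dgx-spark
select_dockerfile VLLM false linux/amd64   # -> Dockerfile.vllm
```

Note that either the explicit flag or an arm64 platform triggers the DGX-Spark path, matching the auto-detection behavior described in the help text.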

⚠️ Potential issue | 🟡 Minor

Add language specifier to fenced code blocks (markdownlint-cli2: MD040).

Lines 9-21 (flowchart/pseudocode) and line 67 (error message) are fenced code blocks without language specifiers. Add appropriate language identifiers:

-```
+```text
 IF framework == "VLLM":
    ...

This helps with syntax highlighting and linting compliance.

Also applies to: 67-69

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

9-9: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In container/BUILD_DGX_SPARK_GUIDE.md around lines 9-21 and 67-69, the fenced
code blocks lack a language specifier causing markdownlint MD040 failures;
update the opening triple-backtick lines to include an appropriate language tag
(e.g., ```text for the pseudocode/flowchart and the error message block) so each
fenced block is like ```text ... ``` and ensure spacing/indentation remains
unchanged.

Comment on lines +81 to +93
**Reason 1: CUDA 13 Compatibility**
- NIXL 0.7.0 is the first version with native CUDA 13.0 support
- Building from source ensures proper linkage against `libcudart.so.13` (not `libcudart.so.12`)
- Avoids runtime errors: `libcudart.so.12: cannot open shared object file`

**Reason 2: Cache Independence**
- The base image (`dynamo_base`) may have cached NIXL 0.6.x built with CUDA 12
- Building fresh in the DGX-SPARK Dockerfile ensures we always get NIXL 0.7.0 with CUDA 13
- Self-contained build = predictable results

**Reason 3: ARM64 Optimization**
- UCX and NIXL are built specifically for `aarch64` architecture
- GDS backend is disabled (`-Ddisable_gds_backend=true`) as it's not supported on ARM64

⚠️ Potential issue | 🟡 Minor

Convert emphasis to proper headings (markdownlint-cli2: MD036).

Lines 81, 86, 91 use emphasis for "Reason N" subsection labels. Convert to headings for proper document structure:

-**Reason 1: CUDA 13 Compatibility**
+#### Reason 1: CUDA 13 Compatibility

-**Reason 2: Cache Independence**
+#### Reason 2: Cache Independence

-**Reason 3: ARM64 Optimization**
+#### Reason 3: ARM64 Optimization

Suggested change
**Reason 1: CUDA 13 Compatibility**
- NIXL 0.7.0 is the first version with native CUDA 13.0 support
- Building from source ensures proper linkage against `libcudart.so.13` (not `libcudart.so.12`)
- Avoids runtime errors: `libcudart.so.12: cannot open shared object file`
**Reason 2: Cache Independence**
- The base image (`dynamo_base`) may have cached NIXL 0.6.x built with CUDA 12
- Building fresh in the DGX-SPARK Dockerfile ensures we always get NIXL 0.7.0 with CUDA 13
- Self-contained build = predictable results
**Reason 3: ARM64 Optimization**
- UCX and NIXL are built specifically for `aarch64` architecture
- GDS backend is disabled (`-Ddisable_gds_backend=true`) as it's not supported on ARM64
#### Reason 1: CUDA 13 Compatibility
- NIXL 0.7.0 is the first version with native CUDA 13.0 support
- Building from source ensures proper linkage against `libcudart.so.13` (not `libcudart.so.12`)
- Avoids runtime errors: `libcudart.so.12: cannot open shared object file`
#### Reason 2: Cache Independence
- The base image (`dynamo_base`) may have cached NIXL 0.6.x built with CUDA 12
- Building fresh in the DGX-SPARK Dockerfile ensures we always get NIXL 0.7.0 with CUDA 13
- Self-contained build = predictable results
#### Reason 3: ARM64 Optimization
- UCX and NIXL are built specifically for `aarch64` architecture
- GDS backend is disabled (`-Ddisable_gds_backend=true`) as it's not supported on ARM64
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

81-81: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


86-86: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


91-91: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🤖 Prompt for AI Agents
In container/BUILD_DGX_SPARK_GUIDE.md around lines 81 to 93, replace the
emphasized subsection labels currently using bold (e.g., **Reason 1: CUDA 13
Compatibility**) with proper Markdown headings (e.g., "### Reason 1: CUDA 13
Compatibility") for each "Reason N" line to satisfy markdownlint MD036; keep the
following bullet lists unchanged and ensure consistent heading level for all
three reason sections.


⚠️ Potential issue | 🟡 Minor

Fix trailing whitespace detected by pre-commit hook.

Line 1 contains trailing whitespace that was flagged by the pre-commit pipeline. Remove it before merging.

🧰 Tools
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3909/merge) by csabakecskemeti.

[error] 1-1: Trailing whitespace detected and fixed by pre-commit hook (trailing-whitespace).

🤖 Prompt for AI Agents
In docs/backends/vllm/DGX-SPARK_README.md around line 1, remove the trailing
whitespace character at the end of the line that the pre-commit hook flagged;
edit the file to delete the extra space (or newline-only whitespace) so the line
ends cleanly, save, and re-run the pre-commit checks.


# LLM Deployment using vLLM on DGX-SPARK

This directory contains reference implementations for deploying Large Language Models (LLMs) in various configurations using vLLM on **DGX-SPARK systems**. For Dynamo integration, we leverage vLLM's native KV cache events, NIXL based transfer mechanisms, and metric reporting to enable KV-aware routing and P/D disaggregation.

⚠️ Potential issue | 🟡 Minor

Apply hyphenation to compound adjectives in static analysis findings.

Lines 8 and 80 use compound adjectives ("NIXL based" and "Large Scale P/D") that should be hyphenated per grammar rules:

  • Line 8: "NIXL based" → "NIXL-based"
  • Line 80: "Large Scale P/D" → "Large-Scale P/D"

Similar issues may appear throughout the document.

Also applies to: 80-80

🧰 Tools
🪛 LanguageTool

[grammar] ~8-~8: Use a hyphen to join words.
Context: ...rage vLLM's native KV cache events, NIXL based transfer mechanisms, and metric re...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
In docs/backends/vllm/DGX-SPARK_README.md around lines 8 and 80, compound
adjectives are missing hyphens; change "NIXL based" to "NIXL-based" at line 8
and "Large Scale P/D" to "Large-Scale P/D" at line 80, and scan the rest of the
document for similar compound adjectives (e.g., "X based", "Large Scale", "KV
aware") and apply hyphenation consistently (e.g., "X-based", "Large-Scale",
"KV-aware").


This figure shows an overview of the major components to deploy:

```

⚠️ Potential issue | 🟡 Minor

Add language specifier to fenced code block (markdownlint-cli2).

Line 217 has a fenced code block without a language specifier (the block contains an ASCII component diagram beginning with `+---`). Add a language identifier such as `text` to the opening fence for better syntax highlighting and linting compliance.

This applies to any other fenced code blocks missing language specifiers in the document.

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

217-217: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In docs/backends/vllm/DGX-SPARK_README.md around line 217, there is a fenced
code block missing a language specifier; update the opening triple-backtick to
include an appropriate language (e.g., ```diff or ```bash or ```yaml depending
on the snippet) so markdownlint-cli2 passes and syntax highlighting is enabled,
and scan the file for any other fenced blocks without language specifiers and
add the correct language identifiers to each.


#### Step-by-Step Multi-Node Setup

**1. Infrastructure Setup**

⚠️ Potential issue | 🟡 Minor

Convert emphasis to proper headings (markdownlint-cli2: MD036).

Lines 299, 320, and 336 use emphasis (bold text: text) for section labels, but markdown linting expects proper heading syntax. Convert these to headings:

-**1. Infrastructure Setup**
+### 1. Infrastructure Setup

-**2. Build Containers**
+### 2. Build Containers

-**3. Deploy Workers**
+### 3. Deploy Workers

This improves document structure and accessibility.

Also applies to: 320-320, 336-336

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

299-299: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🤖 Prompt for AI Agents
In docs/backends/vllm/DGX-SPARK_README.md around lines 299, 320, and 336,
convert the section labels currently using bold text (**...**) into proper
Markdown headings (e.g., starting with one or more # characters) to satisfy
MD036; replace each bolded label with an appropriate heading level (choose
consistent levels matching surrounding sections, e.g., ## or ###) and remove the
surrounding ** markers so the sections are parsed as headings by markdownlint.


When using KV-aware routing, ensure deterministic hashing across processes to avoid radix tree mismatches. Choose one of the following:

- Set `PYTHONHASHSEED=0` for all vLLM processes when relying on Python's builtin hashing for prefix caching.

⚠️ Potential issue | 🟡 Minor

Fix hyphenation in compound adjectives.

Line 456: "Python's builtin hashing" → "Python's built-in hashing"
Line 523: "DGX-SPARK specific issues" → "DGX-SPARK-specific issues"

These are compound modifiers that should be hyphenated.

Also applies to: 523-523

🧰 Tools
🪛 LanguageTool

[grammar] ~456-~456: Ensure spelling is correct
Context: ...vLLM processes when relying on Python's builtin hashing for prefix caching. - If your v...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
In docs/backends/vllm/DGX-SPARK_README.md around lines 456 and 523, fix
hyphenation in compound adjectives: change "Python's builtin hashing" to
"Python's built-in hashing" at line 456 and change "DGX-SPARK specific issues"
to "DGX-SPARK-specific issues" at line 523; update the two phrases in place
preserving surrounding text and punctuation.
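To illustrate why the `PYTHONHASHSEED=0` recommendation above matters (a minimal sketch, not part of the PR): Python randomizes `str` hashing per process unless the seed is pinned, so hash-derived prefix-cache keys would not match across workers.

```shell
#!/usr/bin/env bash
# With a pinned seed, str hashes are identical across interpreter processes.
a=$(PYTHONHASHSEED=0 python3 -c 'print(hash("prefix-block"))')
b=$(PYTHONHASHSEED=0 python3 -c 'print(hash("prefix-block"))')
echo "seeded: $a vs $b"   # identical

# Without it, two processes almost certainly disagree (hash randomization).
c=$(python3 -c 'print(hash("prefix-block"))')
d=$(python3 -c 'print(hash("prefix-block"))')
echo "unseeded: $c vs $d"
```

The same determinism concern applies to any KV-aware routing setup that compares hashes computed in different vLLM worker processes.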

@dmitry-tokarev-nv

@csabakecskemeti thank you for opening the PR!
This PR introduces a dedicated vllm.dgx-spark Dockerfile based on the NVIDIA vLLM container 25.09, which uses vllm==0.10.1.1, but our main branch is already on vllm==0.11.0. This may create an issue of having to support different vLLM versions.
Overall this looks like a development container with many build tools that are not required at runtime, and it installs dynamo in editable mode.
We are trying to avoid creating Dockerfiles that are HW-specific.
I appreciate the detailed docs and may extract some meaningful info from them.
At this point I would not merge it as is.
There is ongoing work to support CUDA 13 and Spark within Dynamo. I will update this PR if we indeed find that we cannot avoid having a dedicated Dockerfile for Spark.

Comment on lines +9 to +10
ARG BASE_IMAGE="nvcr.io/nvidia/vllm"
ARG BASE_IMAGE_TAG="25.09-py3"

We are trying to avoid having HW-specific containers. Dynamo has not been using the latest vLLM releases, and the main branch is already on vllm==0.11.0.
Leaving this comment so this PR is not accidentally merged.
More feedback in the main thread.


@csabakecskemeti csabakecskemeti Oct 27, 2025


Thanks @dmitry-tokarev-nv
I see, good point. I wanted to play it safe and not change the existing container setup.
I also prefixed the PR with [proposal, review only] as I don't think it's 100% ready until I can prove the disaggregated setup works with it.

As stated, it works for aggregated serving, and part of my goal was to seek help with the disaggregated setup.
I'll continue to investigate on my own too.
