[proposal, review only] dgx spark support - initial; cuda13 and arm base image #3909
👋 Hi csabakecskemeti! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
Walkthrough
The changes introduce support for DGX-SPARK (ARM64) vLLM builds, adding a specialized Dockerfile, updating the build script with a --dgx-spark CLI option, and providing comprehensive documentation for build procedures and deployment configurations.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks:
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 7
🧹 Nitpick comments (1)
container/build.sh (1)
627-632: Build argument logic is correct but uses double negation; consider improving clarity.
The condition `if [[ ! ((...)) ]]` (skip build args when NOT DGX) is functionally correct: it prevents passing BASE_IMAGE args to Dockerfile.vllm.dgx-spark, which hardcodes its base images. However, the double negation reduces readability.
Consider this refactor for improved maintainability:
    # Skip BASE_IMAGE and BASE_IMAGE_TAG for DGX-SPARK builds
    # (DGX-SPARK Dockerfile hardcodes its own base images)
    if ! ( [[ "$USE_DGX_SPARK" == "true" ]] || ( [[ "$PLATFORM" == *"linux/arm64"* ]] && [[ $FRAMEWORK == "VLLM" ]] ) ); then
        BUILD_ARGS+=" --build-arg BASE_IMAGE=$BASE_IMAGE --build-arg BASE_IMAGE_TAG=$BASE_IMAGE_TAG"
    fi
Or, using positive logic:
    # Only pass BASE_IMAGE args for non-DGX builds
    if [[ "$USE_DGX_SPARK" != "true" ]] && ! ( [[ "$PLATFORM" == *"linux/arm64"* ]] && [[ $FRAMEWORK == "VLLM" ]] ); then
        BUILD_ARGS+=" --build-arg BASE_IMAGE=$BASE_IMAGE --build-arg BASE_IMAGE_TAG=$BASE_IMAGE_TAG"
    fi
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- container/BUILD_DGX_SPARK_GUIDE.md (1 hunks)
- container/Dockerfile.vllm.dgx-spark (1 hunks)
- container/build.sh (6 hunks)
- docs/backends/vllm/DGX-SPARK_README.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.
Applied to files:
container/Dockerfile.vllm.dgx-spark
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3909/merge) by csabakecskemeti.
docs/backends/vllm/DGX-SPARK_README.md
[error] 1-1: Trailing whitespace detected and fixed by pre-commit hook (trailing-whitespace).
🪛 LanguageTool
docs/backends/vllm/DGX-SPARK_README.md
[grammar] ~8-~8: Use a hyphen to join words.
Context: ...rage vLLM's native KV cache events, NIXL based transfer mechanisms, and metric re...
(QB_NEW_EN_HYPHEN)
[grammar] ~80-~80: Use a hyphen to join words.
Context: ...n.md) | ✅ | ARM64 supported | ### Large Scale P/D and WideEP Features | Feature...
(QB_NEW_EN_HYPHEN)
[style] ~231-~231: The verb “get” can be informal. Consider replacing it with a form of “to be”.
Context: ...e components. The final components that get spawned depend upon the chosen deployment patte...
(GET_USED_ARE_USED)
[grammar] ~265-~265: Use a hyphen to join words.
Context: ... launch/disagg_router.sh ``` ### Single Node Data Parallel Attention / Expert Pa...
(QB_NEW_EN_HYPHEN)
[style] ~429-~429: The double modal “needed Cost” is nonstandard (only accepted in certain dialects). Consider “to be Cost”.
Context: ...efill or decode workers as needed > - Cost efficiency: Leverage existing hardwar...
(NEEDS_FIXED)
[grammar] ~456-~456: Ensure spelling is correct
Context: ...vLLM processes when relying on Python's builtin hashing for prefix caching. - If your v...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~523-~523: Use a hyphen to join words.
Context: ...ARK ``` ### Getting Help For DGX-SPARK specific issues: 1. Check this README fo...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.18.1)
container/BUILD_DGX_SPARK_GUIDE.md
9-9: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
67-67: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
81-81: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
86-86: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
91-91: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
docs/backends/vllm/DGX-SPARK_README.md
19-19: Link text should be descriptive
(MD059, descriptive-link-text)
30-30: Link fragments should be valid
(MD051, link-fragments)
217-217: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
299-299: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
320-320: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
336-336: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (6)
docs/backends/vllm/DGX-SPARK_README.md (1)
66-87: Feature support matrix is comprehensive and accurate.
The tables clearly communicate which features are supported, in progress, or experimental on DGX-SPARK, with helpful notes. Status indicators and cross-references align well with the rest of the documentation.
container/Dockerfile.vllm.dgx-spark (4)
9-12: Base image and environment setup looks correct for DGX-SPARK.
The Dockerfile properly:
- Declares hardcoded NVIDIA vLLM base image (nvcr.io/nvidia/vllm:25.09-py3) with CUDA 13 support
- Uses dynamo_base for utilities (NATS, etcd, uv)
- Sets PYTHONPATH to include system Python site-packages for NVIDIA's vLLM (line 40)
- Configures NIXL-specific environment variables (NIXL_PREFIX, NIXL_LIB_DIR, NIXL_PLUGIN_DIR) at lines 44-47
This approach avoids building vLLM from source, which sidesteps the `compute_121a` nvcc error mentioned in the PR objectives.
Also applies to: 33-48
92-139: UCX and NIXL builds from source are appropriately configured for ARM64/CUDA 13.
The builds correctly:
- Use UCX v1.19.0 with CUDA 13 support and RDMA flags
- Build NIXL 0.7.0 (first version with CUDA 13) with GDS backend disabled for ARM64 (line 130)
- Create both C++ library and Python wheel (lines 131-137) with matching build flags
- Configure library paths and runtime linking (lines 133-135, 144-150)
The approach of building from source (rather than copying from base image) ensures CUDA 13 compatibility and ARM64 optimization.
152-195: Virtual environment and dependency installation is well-designed for DGX-SPARK.
Key design choices are sound:
- Creates isolated venv with uv, keeping NVIDIA's system vLLM separate (line 40's PYTHONPATH)
- Installs fresh NIXL 0.7.0 wheel built in this Dockerfile (line 168) rather than cached version
- Properly separates runtime stage (prod-ready) from dev stage (line 210) for multi-stage efficiency
- ENTRYPOINT uses NVIDIA's wrapper (line 194) enabling flexible CMD invocations
This maintains clarity between NVIDIA's vLLM (system Python) and Dynamo's runtime (venv).
210-259: Dev stage properly extends runtime with build and debugging tools.
The multi-stage design:
- Reuses runtime stage as base for dev (line 210), inheriting all NVIDIA/Dynamo setup
- Adds comprehensive dev utilities (nvtop, tmux, vim, cmake, etc.) for local debugging
- Includes all Rust/C++ build dependencies (clang, protobuf, pybind11) needed for maturin editable installs
- Maintains proper Cargo/Rustup paths from dynamo_base (lines 253-254)
This aligns with the learnings about maintaining consistency across framework-specific Dockerfiles while enabling local development workflows.
container/build.sh (1)
494-494: Help text for --dgx-spark is clear and informative.
The new help entry accurately describes the flag's purpose, mentions Blackwell GPU support, and explains ARM64 auto-detection behavior. It aligns well with the implementation and user-facing documentation.
```
IF framework == "VLLM":
    IF --dgx-spark flag is set OR platform is linux/arm64:
        Use: Dockerfile.vllm.dgx-spark (NVIDIA's pre-built vLLM with Blackwell support)
    ELSE:
        Use: Dockerfile.vllm (Build from source)
ELSE IF framework == "TRTLLM":
    Use: Dockerfile.trtllm
ELSE IF framework == "SGLANG":
    Use: Dockerfile.sglang
ELSE:
    Use: Dockerfile
```
Add language specifier to fenced code blocks (markdownlint-cli2: MD040).
Lines 9-21 (flowchart/pseudocode) and line 67 (error message) are fenced code blocks without language specifiers. Add appropriate language identifiers:
-```
+```text
IF framework == "VLLM":
...
This helps with syntax highlighting and linting compliance.
Also applies to: 67-69
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
9-9: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In container/BUILD_DGX_SPARK_GUIDE.md around lines 9-21 and 67-69, the fenced
code blocks lack a language specifier causing markdownlint MD040 failures;
update the opening triple-backtick lines to include an appropriate language tag
(e.g., ```text for the pseudocode/flowchart and the error message block) so each
fenced block is like ```text ... ``` and ensure spacing/indentation remains
unchanged.
**Reason 1: CUDA 13 Compatibility**
- NIXL 0.7.0 is the first version with native CUDA 13.0 support
- Building from source ensures proper linkage against `libcudart.so.13` (not `libcudart.so.12`)
- Avoids runtime errors: `libcudart.so.12: cannot open shared object file`

**Reason 2: Cache Independence**
- The base image (`dynamo_base`) may have cached NIXL 0.6.x built with CUDA 12
- Building fresh in the DGX-SPARK Dockerfile ensures we always get NIXL 0.7.0 with CUDA 13
- Self-contained build = predictable results

**Reason 3: ARM64 Optimization**
- UCX and NIXL are built specifically for `aarch64` architecture
- GDS backend is disabled (`-Ddisable_gds_backend=true`) as it's not supported on ARM64
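One way to check the linkage claim in Reason 1 is to look at which `libcudart` major version a built library resolves to. The sketch below is an assumption-laden illustration: the helper name `cudart_major_from_ldd` is made up, and it only parses `ldd`-style text from stdin, so the library path in the usage comment is a placeholder rather than a path from this PR.

```bash
# Hypothetical helper: reads `ldd`-style output on stdin and prints the
# libcudart major version it finds (e.g. "13"), or "none" if absent.
cudart_major_from_ldd() {
    ver=$(sed -n 's/.*libcudart\.so\.\([0-9][0-9]*\).*/\1/p' | head -n 1)
    echo "${ver:-none}"
}

# On a real system (the library path is a placeholder):
#   ldd /path/to/built/nixl/library.so | cudart_major_from_ldd

# Self-contained demonstration with canned ldd output:
printf '\tlibcudart.so.13 => /usr/local/cuda/lib64/libcudart.so.13 (0x0000)\n' \
    | cudart_major_from_ldd
```

A result of `12` inside the DGX-SPARK image would suggest that a cached CUDA 12 build was picked up, which is the situation Reason 2 describes.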
Convert emphasis to proper headings (markdownlint-cli2: MD036).
Lines 81, 86, 91 use emphasis for "Reason N" subsection labels. Convert to headings for proper document structure:
-**Reason 1: CUDA 13 Compatibility**
+#### Reason 1: CUDA 13 Compatibility
-**Reason 2: Cache Independence**
+#### Reason 2: Cache Independence
-**Reason 3: ARM64 Optimization**
+#### Reason 3: ARM64 Optimization
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
81-81: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
86-86: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
91-91: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🤖 Prompt for AI Agents
In container/BUILD_DGX_SPARK_GUIDE.md around lines 81 to 93, replace the
emphasized subsection labels currently using bold (e.g., **Reason 1: CUDA 13
Compatibility**) with proper Markdown headings (e.g., "### Reason 1: CUDA 13
Compatibility") for each "Reason N" line to satisfy markdownlint MD036; keep the
following bullet lists unchanged and ensure consistent heading level for all
three reason sections.
@@ -0,0 +1,527 @@
<!--
Fix trailing whitespace detected by pre-commit hook.
Line 1 contains trailing whitespace that was flagged by the pre-commit pipeline. Remove it before merging.
🧰 Tools
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3909/merge) by csabakecskemeti.
[error] 1-1: Trailing whitespace detected and fixed by pre-commit hook (trailing-whitespace).
🤖 Prompt for AI Agents
In docs/backends/vllm/DGX-SPARK_README.md around line 1, remove the trailing
whitespace character at the end of the line that the pre-commit hook flagged;
edit the file to delete the extra space (or newline-only whitespace) so the line
ends cleanly, save, and re-run the pre-commit checks.
# LLM Deployment using vLLM on DGX-SPARK

This directory contains reference implementations for deploying Large Language Models (LLMs) in various configurations using vLLM on **DGX-SPARK systems**. For Dynamo integration, we leverage vLLM's native KV cache events, NIXL based transfer mechanisms, and metric reporting to enable KV-aware routing and P/D disaggregation.
Apply hyphenation to compound adjectives in static analysis findings.
Lines 8 and 80 use compound adjectives ("NIXL based" and "Large Scale P/D") that should be hyphenated per grammar rules:
- Line 8: "NIXL based" → "NIXL-based"
- Line 80: "Large Scale P/D" → "Large-Scale P/D"
Similar issues may appear throughout the document.
Also applies to: 80-80
🧰 Tools
🪛 LanguageTool
[grammar] ~8-~8: Use a hyphen to join words.
Context: ...rage vLLM's native KV cache events, NIXL based transfer mechanisms, and metric re...
(QB_NEW_EN_HYPHEN)
🤖 Prompt for AI Agents
In docs/backends/vllm/DGX-SPARK_README.md around lines 8 and 80, compound
adjectives are missing hyphens; change "NIXL based" to "NIXL-based" at line 8
and "Large Scale P/D" to "Large-Scale P/D" at line 80, and scan the rest of the
document for similar compound adjectives (e.g., "X based", "Large Scale", "KV
aware") and apply hyphenation consistently (e.g., "X-based", "Large-Scale",
"KV-aware").
|
|
||
| This figure shows an overview of the major components to deploy: | ||
|
|
||
| ``` |
Add language specifier to fenced code block (markdownlint-cli2).
Line 217 has a fenced code block without a language specifier. Add a language identifier for better syntax highlighting and linting compliance:
# Change from:
+---...
To:
+---...
This applies to any other fenced code blocks missing language specifiers in the document.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
217-217: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In docs/backends/vllm/DGX-SPARK_README.md around line 217, there is a fenced
code block missing a language specifier; update the opening triple-backtick to
include an appropriate language (e.g., ```diff or ```bash or ```yaml depending
on the snippet) so markdownlint-cli2 passes and syntax highlighting is enabled,
and scan the file for any other fenced blocks without language specifiers and
add the correct language identifiers to each.
#### Step-by-Step Multi-Node Setup

**1. Infrastructure Setup**
Convert emphasis to proper headings (markdownlint-cli2: MD036).
Lines 299, 320, and 336 use emphasis (bold text: `**text**`) for section labels, but markdown linting expects proper heading syntax. Convert these to headings:
-**1. Infrastructure Setup**
+### 1. Infrastructure Setup
-**2. Build Containers**
+### 2. Build Containers
-**3. Deploy Workers**
+### 3. Deploy Workers
This improves document structure and accessibility.
Also applies to: 320-320, 336-336
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
299-299: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🤖 Prompt for AI Agents
In docs/backends/vllm/DGX-SPARK_README.md around lines 299, 320, and 336,
convert the section labels currently using bold text (**...**) into proper
Markdown headings (e.g., starting with one or more # characters) to satisfy
MD036; replace each bolded label with an appropriate heading level (choose
consistent levels matching surrounding sections, e.g., ## or ###) and remove the
surrounding ** markers so the sections are parsed as headings by markdownlint.
When using KV-aware routing, ensure deterministic hashing across processes to avoid radix tree mismatches. Choose one of the following:

- Set `PYTHONHASHSEED=0` for all vLLM processes when relying on Python's builtin hashing for prefix caching.
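A minimal sketch of the first option, assuming a wrapper is acceptable in your launch scripts: the function name `run_with_fixed_hash_seed` is hypothetical, and the command it wraps is whatever you already use to start a vLLM process.

```bash
# Hypothetical wrapper: run any command with PYTHONHASHSEED pinned to 0,
# so every Python process started through it hashes strings identically.
run_with_fixed_hash_seed() {
    env PYTHONHASHSEED=0 "$@"
}

# Demonstration: the child process sees the pinned seed.
run_with_fixed_hash_seed sh -c 'echo "PYTHONHASHSEED=$PYTHONHASHSEED"'
```

Using the same wrapper for every worker launch keeps the seed from being forgotten on one node, which is the mismatch the paragraph above warns about.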
Fix hyphenation in compound adjectives.
Line 456: "Python's builtin hashing" → "Python's built-in hashing"
Line 523: "DGX-SPARK specific issues" → "DGX-SPARK-specific issues"
These are compound modifiers that should be hyphenated.
Also applies to: 523-523
🧰 Tools
🪛 LanguageTool
[grammar] ~456-~456: Ensure spelling is correct
Context: ...vLLM processes when relying on Python's builtin hashing for prefix caching. - If your v...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🤖 Prompt for AI Agents
In docs/backends/vllm/DGX-SPARK_README.md around lines 456 and 523, fix
hyphenation in compound adjectives: change "Python's builtin hashing" to
"Python's built-in hashing" at line 456 and change "DGX-SPARK specific issues"
to "DGX-SPARK-specific issues" at line 523; update the two phrases in place
preserving surrounding text and punctuation.
|
@csabakecskemeti thank you for opening the PR! |
ARG BASE_IMAGE="nvcr.io/nvidia/vllm"
ARG BASE_IMAGE_TAG="25.09-py3"
We are trying to avoid having HW-specific containers. Dynamo has not been using the latest vLLM releases, and the main branch is already on vllm==0.11.0.
Leaving this comment so this PR is not accidentally merged.
More feedback on the main thread.
Thanks @dmitry-tokarev-nv
I see, good point. I wanted to play this safe and not change the existing container setup.
I also prefixed the PR with [proposal, review only] as I don't think it's 100% ready until I can prove the disaggregated setup works with it.
As I stated, it works for aggregated serving, and my goal was partly to seek help with the disaggregated setup.
I'll continue to investigate on my own too.
Overview:
Support DGX Spark for VLLM backend
Details:
New dockerfile for DGX Spark specific instructions:
build.sh also updated to build for DGX Spark
`./container/build.sh --framework VLLM --dgx-spark`
This uses the new Dockerfile and sets the proper arm64 platform
Where should the reviewer start?
- container/build.sh
- container/Dockerfile.vllm.dgx-spark
- Then the new READMEs, especially docs/backends/vllm/DGX-SPARK_README.md
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Tested scenarios:
Aggregated serving ✅
Disaggregated serving ❗
Tested setup:
[node1] DGX Spark: hosts the Dynamo dcgm-exporter, NATS, etcd, and the Dynamo frontend, and serves as the prefill worker
[node2] x86/64 RTX 3090 (CUDA 12): serves as the decode worker
Setup I've implemented
Issue ❌
Inference stalls: no response and no activity on any of the workers. I assume it's an issue with my setup or related to CUDA 13 support (NIXL 0.7.0).
Both workers seem to be registered:
2025-10-27T17:28:11.052173Z INFO dynamo_llm::discovery::watcher: added model model_name="Qwen/Qwen3-0.6B" namespace="dynamo"
2025-10-27T17:28:11.057018Z INFO dynamo_llm::kv_router::scheduler: Runtime config found for worker_id: 7587890414798442552
2025-10-27T17:28:11.057031Z WARN dynamo_llm::kv_router::sequence: Adding worker WorkerWithDpRank { worker_id: 7587890414798442552, dp_rank: 0 }
At inference the router is seemingly aware of the 2 workers:
2025-10-27T17:30:43.780177Z INFO dynamo_llm::kv_router::scheduler: Formula for worker_id=7587890414798442538 dp_rank=0 with 26 cached blocks: 0.438 = 1.0 * prefill_blocks + decode_blocks = 1.0 * 0.438 + 0.000
2025-10-27T17:30:43.780211Z INFO dynamo_llm::kv_router::scheduler: Selected worker: worker_id=7587890414798442538 dp_rank=0, logit: 0.438, cached blocks: 26, total blocks: 61707
2025-10-27T17:30:43.839306Z WARN nixl_connector.get_finished: Releasing expired KV blocks for request 635e21b8-c622-4501-a580-57bae888e2d6 which were retrieved by 0 decode worker(s) within 120 seconds.
2025-10-27T17:30:43.863645Z INFO dynamo_llm::kv_router::scheduler: Formula for worker_id=7587890414798442552 dp_rank=0 with 0 cached blocks: 26.000 = 0.0 * prefill_blocks + decode_blocks = 0.0 * 26.438 + 26.000
2025-10-27T17:30:43.863670Z INFO dynamo_llm::kv_router::scheduler: Selected worker: worker_id=7587890414798442552 dp_rank=0, logit: 26.000, cached blocks: 0, total blocks: 11432
But nothing happens.
Note: both workers (DGX Spark, 3090) are able to run inference standalone with the Dynamo frontend running on the DGX Spark.
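When triaging a stall like this, it may help to pull out the requests whose KV blocks expired without being fetched, since the `nixl_connector.get_finished` warning above reports "retrieved by 0 decode worker(s)". The following is a rough sketch only: the helper name `expired_kv_requests` is hypothetical, and it assumes the warning text matches the log line shown in this description.

```bash
# Hypothetical helper: reads worker logs on stdin and prints the request IDs
# from "Releasing expired KV blocks" warnings, one per line.
expired_kv_requests() {
    sed -n 's/.*Releasing expired KV blocks for request \([0-9a-f-]*\) .*/\1/p'
}

# Demonstration using the warning line from the logs above:
printf '%s\n' '2025-10-27T17:30:43.839306Z WARN nixl_connector.get_finished: Releasing expired KV blocks for request 635e21b8-c622-4501-a580-57bae888e2d6 which were retrieved by 0 decode worker(s) within 120 seconds.' \
    | expired_kv_requests
```

Matching these IDs against the decode worker's log would show whether the request ever reached it, which separates a routing problem from a NIXL transfer problem.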