130 changes: 130 additions & 0 deletions container/BUILD_DGX_SPARK_GUIDE.md
@@ -0,0 +1,130 @@
# Building Dynamo for DGX-SPARK (vLLM)

## How `build.sh` Chooses the Dockerfile

The `build.sh` script automatically selects the correct Dockerfile based on the platform and optional flags:

### Dockerfile Selection Logic

```
IF framework == "VLLM":
IF --dgx-spark flag is set OR platform is linux/arm64:
Use: Dockerfile.vllm.dgx-spark (NVIDIA's pre-built vLLM with Blackwell support)
ELSE:
Use: Dockerfile.vllm (Build from source)
ELSE IF framework == "TRTLLM":
Use: Dockerfile.trtllm
ELSE IF framework == "SGLANG":
Use: Dockerfile.sglang
ELSE:
Use: Dockerfile
```
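For readers who find shell easier to scan than pseudocode, here is a minimal sketch of the same selection logic. It is illustrative only: the function and variable names (`select_dockerfile`, `framework`, `platform`, `dgx_spark`) are assumptions and do not necessarily match the identifiers used inside `build.sh`.

```bash
# Illustrative sketch only -- not build.sh's actual internals.
select_dockerfile() {
    local framework="$1" platform="$2" dgx_spark="$3"
    case "$framework" in
        VLLM)
            if [[ "$dgx_spark" == "true" || "$platform" == "linux/arm64" ]]; then
                echo "Dockerfile.vllm.dgx-spark"  # NVIDIA's pre-built vLLM with Blackwell support
            else
                echo "Dockerfile.vllm"            # build vLLM from source
            fi
            ;;
        TRTLLM) echo "Dockerfile.trtllm" ;;
        SGLANG) echo "Dockerfile.sglang" ;;
        *)      echo "Dockerfile" ;;
    esac
}

select_dockerfile VLLM linux/arm64 false   # prints: Dockerfile.vllm.dgx-spark
```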
Comment on lines +9 to +21
⚠️ Potential issue | 🟡 Minor

Add language specifier to fenced code blocks (markdownlint-cli2: MD040).

Lines 9-21 (flowchart/pseudocode) and line 67 (error message) are fenced code blocks without language specifiers. Add appropriate language identifiers:

-```
+```text
 IF framework == "VLLM":
    ...

This helps with syntax highlighting and linting compliance.

Also applies to: 67-69

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

9-9: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In container/BUILD_DGX_SPARK_GUIDE.md around lines 9-21 and 67-69, the fenced
code blocks lack a language specifier causing markdownlint MD040 failures;
update the opening triple-backtick lines to include an appropriate language tag
(e.g., ```text for the pseudocode/flowchart and the error message block) so each
fenced block is like ```text ... ``` and ensure spacing/indentation remains
unchanged.


### How to Use

#### For DGX-SPARK (Blackwell GPUs)

**Automatic detection (recommended):**
```bash
./container/build.sh --framework VLLM --platform linux/arm64
```

**Explicit flag:**
```bash
./container/build.sh --framework VLLM --dgx-spark
```

#### For x86_64 (standard GPUs)

```bash
./container/build.sh --framework VLLM
# or explicitly
./container/build.sh --framework VLLM --platform linux/amd64
```

## Key Differences

### Standard vLLM Dockerfile (`Dockerfile.vllm`)
- Builds vLLM from source
- Uses CUDA 12.8
- Supports: Ampere, Ada, Hopper GPUs
- **Does NOT support Blackwell (compute_121)**

### DGX-SPARK Dockerfile (`Dockerfile.vllm.dgx-spark`)
- Uses NVIDIA's pre-built vLLM container (`nvcr.io/nvidia/vllm:25.09-py3`)
- Uses CUDA 13.0
- Supports: **Blackwell GPUs (compute_121)** via DGX-SPARK
- Skips building vLLM from source (avoids nvcc errors)
- **Builds UCX v1.19.0 from source** with CUDA 13 support
- **Builds NIXL 0.7.0 from source** with CUDA 13 support (self-contained, no cache dependency)
- **Builds NIXL Python wheel** with CUDA 13 support
- Adds Dynamo's runtime customizations and integrations
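To sanity-check what a built image actually ships, you can query the vLLM and CUDA runtime versions from inside it. This is only a sketch: the image tag below is a placeholder for whatever tag your build produced, and it assumes the image's entrypoint passes the command through as NVIDIA base images normally do.

```bash
# Placeholder tag -- substitute the tag your build produced.
docker run --rm --gpus all dynamo:latest-vllm-dgx-spark \
    python3 -c "import torch, vllm; print('vLLM', vllm.__version__, '| CUDA', torch.version.cuda)"
```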

## Why DGX-SPARK Needs Special Handling

DGX-SPARK systems use **Blackwell GPUs** with architecture `compute_121`. Attempting to build vLLM from source with an older CUDA toolchain fails with:

```
ERROR: nvcc fatal : Unsupported gpu architecture 'compute_121a'
```

**Solution:** Use NVIDIA's pre-built vLLM container that already includes:
- CUDA 13.0 support
- Blackwell GPU architecture support
- DGX Spark functional support
- NVFP4 format optimization
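Before picking a build path, it can help to confirm what architecture your GPU actually reports. The query below is a sketch and assumes a driver recent enough to expose the `compute_cap` field; on Blackwell it should report a 12.x compute capability.

```bash
# Compute capability reported by the driver (12.x indicates Blackwell / compute_121).
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader

# If a local CUDA toolkit is installed, list the architectures its nvcc can target.
nvcc --list-gpu-arch
```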

### Why Build UCX and NIXL from Source?

The DGX-SPARK Dockerfile builds UCX v1.19.0 and NIXL 0.7.0 **from source** instead of copying from the base image:

**Reason 1: CUDA 13 Compatibility**
- NIXL 0.7.0 is the first version with native CUDA 13.0 support
- Building from source ensures proper linkage against `libcudart.so.13` (not `libcudart.so.12`)
- Avoids runtime errors: `libcudart.so.12: cannot open shared object file`

**Reason 2: Cache Independence**
- The base image (`dynamo_base`) may have cached NIXL 0.6.x built with CUDA 12
- Building fresh in the DGX-SPARK Dockerfile ensures we always get NIXL 0.7.0 with CUDA 13
- Self-contained build = predictable results

**Reason 3: ARM64 Optimization**
- UCX and NIXL are built specifically for `aarch64` architecture
- GDS backend is disabled (`-Ddisable_gds_backend=true`) as it's not supported on ARM64
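As a quick post-build check that UCX came out as intended, the commands below (run inside the built container, on a host with a visible GPU) print the UCX version and whether CUDA transports were compiled in. This is a sketch using the install prefix from the Dockerfile in this PR.

```bash
# Inside the built container: verify the UCX build.
/usr/local/ucx/bin/ucx_info -v                  # version and configure-time options
/usr/local/ucx/bin/ucx_info -d | grep -i cuda   # expect cuda_copy / cuda_ipc transports listed
```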
Comment on lines +81 to +93
⚠️ Potential issue | 🟡 Minor

Convert emphasis to proper headings (markdownlint-cli2: MD036).

Lines 81, 86, 91 use emphasis for "Reason N" subsection labels. Convert to headings for proper document structure:

-**Reason 1: CUDA 13 Compatibility**
+#### Reason 1: CUDA 13 Compatibility

-**Reason 2: Cache Independence**
+#### Reason 2: Cache Independence

-**Reason 3: ARM64 Optimization**
+#### Reason 3: ARM64 Optimization
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

81-81: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


86-86: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


91-91: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🤖 Prompt for AI Agents
In container/BUILD_DGX_SPARK_GUIDE.md around lines 81 to 93, replace the
emphasized subsection labels currently using bold (e.g., **Reason 1: CUDA 13
Compatibility**) with proper Markdown headings (e.g., "### Reason 1: CUDA 13
Compatibility") for each "Reason N" line to satisfy markdownlint MD036; keep the
following bullet lists unchanged and ensure consistent heading level for all
three reason sections.


## Build Arguments

When using the `--dgx-spark` flag, `build.sh` automatically:
- Selects `Dockerfile.vllm.dgx-spark`
- Sets `PLATFORM=linux/arm64` (forced)
- Sets `NIXL_REF=0.7.0` (for CUDA 13 support)
- Sets `ARCH=arm64` and `ARCH_ALT=aarch64`

The DGX-SPARK Dockerfile itself hardcodes:
- `BASE_IMAGE=nvcr.io/nvidia/vllm`
- `BASE_IMAGE_TAG=25.09-py3`

All other build arguments work the same way.
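As a rough illustration of what the flag expands to, the invocation that `build.sh` assembles looks approximately like the following. This is an approximation, not the script's exact command line: the image tag is a placeholder and other build arguments (for example `DYNAMO_BASE_IMAGE`) are left at their defaults here.

```bash
# Approximation of `./container/build.sh --framework VLLM --dgx-spark`; the real
# script may pass more build args and use a different tag.
docker build \
    --platform linux/arm64 \
    -f container/Dockerfile.vllm.dgx-spark \
    --build-arg NIXL_REF=0.7.0 \
    --build-arg ARCH=arm64 \
    --build-arg ARCH_ALT=aarch64 \
    -t dynamo:latest-vllm-dgx-spark \
    .
```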

## Troubleshooting

### Error: `exec /bin/sh: exec format error`
- **Cause:** Building with the wrong platform
- **Fix:** Use `--platform linux/arm64` for DGX-SPARK

### Error: `nvcc fatal : Unsupported gpu architecture 'compute_121a'`
- **Cause:** Building from source without Blackwell support
- **Fix:** Use `--dgx-spark` or `--platform linux/arm64` to use the pre-built container

### Error: `libcudart.so.12: cannot open shared object file`
- **Cause:** NIXL was built with CUDA 12 but the container has CUDA 13
- **Fix:** Rebuild with `--dgx-spark` flag to ensure NIXL 0.7.0 with CUDA 13 support
- **Verify:** Inside container: `ldd /opt/nvidia/nvda_nixl/lib/aarch64-linux-gnu/plugins/libplugin_UCX_MO.so | grep cudart` should show `libcudart.so.13` (not `.so.12`)
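To check every NIXL plugin at once rather than a single library, a short loop like the one below works. It is a sketch that relies on `NIXL_PLUGIN_DIR` being set inside the container, as the Dockerfile in this PR does.

```bash
# Inside the container: show which CUDA runtime each NIXL plugin links against.
for so in "$NIXL_PLUGIN_DIR"/*.so; do
    printf '%s: ' "$so"
    ldd "$so" | grep -o 'libcudart\.so\.[0-9]*' | sort -u | tr '\n' ' '
    echo
done
# Plugins that link the CUDA runtime should show libcudart.so.13, not .so.12.
```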

## References

- [NVIDIA vLLM Release 25.09 Documentation](https://docs.nvidia.com/deeplearning/frameworks/vllm-release-notes/rel-25-09.html)
- [NVIDIA NGC Container Registry](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm)
- [NIXL 0.7.0 Release Notes](https://github.com/ai-dynamo/nixl/releases/tag/0.7.0) - CUDA 13.0 support
- [DGX-SPARK README](../docs/backends/vllm/DGX-SPARK_README.md) - Complete deployment guide

263 changes: 263 additions & 0 deletions container/Dockerfile.vllm.dgx-spark
@@ -0,0 +1,263 @@
# syntax=docker/dockerfile:1.10.0
# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# DGX-SPARK specific Dockerfile for vLLM
# Uses NVIDIA's pre-built vLLM container that supports Blackwell GPUs (compute_121)
# See: https://docs.nvidia.com/deeplearning/frameworks/vllm-release-notes/rel-25-09.html

ARG BASE_IMAGE="nvcr.io/nvidia/vllm"
ARG BASE_IMAGE_TAG="25.09-py3"
Comment on lines +9 to +10
We are trying to avoid having HW-specific containers. Dynamo has not been using the latest vLLM releases, and the main branch is already on vllm==0.11.0.
Leaving this comment so this PR is not accidentally merged.
More feedback on the main thread.

@csabakecskemeti (Author) Oct 27, 2025

Thanks @dmitry-tokarev-nv
I see, good point. I wanted to play it safe and not change the existing container setup.
I also prefixed the PR with [proposal, review only] as I don't think it's 100% ready until I can prove the disaggregated setup works with it.

As I stated, it works for aggregated serving, and part of my goal was to seek help with the disaggregated setup.
I'll continue to investigate on my own too.


ARG DYNAMO_BASE_IMAGE="dynamo:latest-none"
FROM ${DYNAMO_BASE_IMAGE} AS dynamo_base

########################################################
########## Runtime Image (based on NVIDIA vLLM) #######
########################################################
#
# PURPOSE: Production runtime environment for DGX-SPARK
#
# This stage uses NVIDIA's pre-built vLLM container that already includes:
# - vLLM with DGX Spark functional support (Blackwell compute_121)
# - CUDA 13.0 support
# - NVFP4 format support
# - All necessary GPU acceleration libraries
#
# We add Dynamo's customizations on top:
# - Dynamo runtime libraries
# - NIXL for KV cache transfer
# - Custom backend integrations
#

FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS runtime

WORKDIR /workspace
ENV DYNAMO_HOME=/opt/dynamo
ENV VIRTUAL_ENV=/opt/dynamo/venv
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
# Add system Python site-packages to PYTHONPATH so we can use NVIDIA's vLLM
ENV PYTHONPATH="/usr/local/lib/python3.12/dist-packages:${PYTHONPATH}"

# NVIDIA vLLM container already has Python 3.12 and vLLM installed
# We just need to set up Dynamo's virtual environment and dependencies
ARG ARCH_ALT=aarch64
ENV NIXL_PREFIX=/opt/nvidia/nvda_nixl
ENV NIXL_LIB_DIR=$NIXL_PREFIX/lib/${ARCH_ALT}-linux-gnu
ENV NIXL_PLUGIN_DIR=$NIXL_LIB_DIR/plugins

# Install additional dependencies for Dynamo
# Note: NVIDIA vLLM container already has Python and CUDA tools
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    # Python runtime - CRITICAL for virtual environment to work
    python3.12-dev \
    build-essential \
    # jq and curl for polling various endpoints and health checks
    jq \
    git \
    git-lfs \
    curl \
    # Libraries required by UCX to find RDMA devices
    libibverbs1 rdma-core ibverbs-utils libibumad3 \
    libnuma1 librdmacm1 ibverbs-providers \
    # JIT Kernel Compilation, flashinfer
    ninja-build \
    g++ \
    # prometheus dependencies
    ca-certificates && \
    rm -rf /var/lib/apt/lists/*

# NVIDIA vLLM container has CUDA already, but ensure CUDA tools are in PATH
ENV PATH=/usr/local/cuda/bin:$PATH

# DeepGemm runs nvcc for JIT kernel compilation, however the CUDA include path
# is not properly set for compilation. Set CPATH to help nvcc find the headers.
ENV CPATH=/usr/local/cuda/include

### COPY NATS & ETCD ###
# Copy nats and etcd from dev image
COPY --from=dynamo_base /usr/bin/nats-server /usr/bin/nats-server
COPY --from=dynamo_base /usr/local/bin/etcd/ /usr/local/bin/etcd/
# Add ETCD and CUDA binaries to PATH so cicc and other CUDA tools are accessible
ENV PATH=/usr/local/bin/etcd/:/usr/local/cuda/nvvm/bin:/usr/local/cuda/bin:$PATH

### COPY UV EARLY (needed for building NIXL Python wheel) ###
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
COPY --from=ghcr.io/astral-sh/uv:latest /uvx /bin/uvx

# Build UCX and NIXL directly in this stage for CUDA 13.0 support
# This ensures we get fresh NIXL 0.7.0 with CUDA 13 support, not cached CUDA 12 version

# Build UCX from source
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    autoconf automake libtool pkg-config \
    libibverbs-dev librdmacm-dev \
    && rm -rf /var/lib/apt/lists/* \
    && cd /usr/local/src \
    && git clone https://github.com/openucx/ucx.git \
    && cd ucx && git checkout v1.19.0 \
    && ./autogen.sh \
    && ./configure \
        --prefix=/usr/local/ucx \
        --enable-shared \
        --disable-static \
        --disable-doxygen-doc \
        --enable-optimizations \
        --enable-cma \
        --enable-devel-headers \
        --with-cuda=/usr/local/cuda \
        --with-verbs \
        --with-dm \
        --enable-mt \
    && make -j$(nproc) \
    && make -j$(nproc) install-strip \
    && echo "/usr/local/ucx/lib" > /etc/ld.so.conf.d/ucx.conf \
    && echo "/usr/local/ucx/lib/ucx" >> /etc/ld.so.conf.d/ucx.conf \
    && ldconfig \
    && cd /usr/local/src \
    && rm -rf ucx

# Build NIXL 0.7.0 from source with CUDA 13.0 support
# Build both C++ library and Python wheel
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    meson ninja-build python3-pip \
    && rm -rf /var/lib/apt/lists/* \
    && git clone --depth 1 --branch 0.7.0 "https://github.com/ai-dynamo/nixl.git" /opt/nixl \
    && cd /opt/nixl \
    && meson setup build/ --buildtype=release --prefix=$NIXL_PREFIX -Ddisable_gds_backend=true \
    && ninja -C build/ -j$(nproc) \
    && ninja -C build/ install \
    && echo "$NIXL_LIB_DIR" > /etc/ld.so.conf.d/nixl.conf \
    && echo "$NIXL_PLUGIN_DIR" >> /etc/ld.so.conf.d/nixl.conf \
    && ldconfig \
    && mkdir -p /opt/dynamo/wheelhouse/nixl \
    && /bin/uv build . --out-dir /opt/dynamo/wheelhouse/nixl --config-settings=setup-args="-Ddisable_gds_backend=true" \
    && cd - \
    && rm -rf /opt/nixl

ENV PATH=/usr/local/ucx/bin:$PATH

# Set library paths for NIXL and UCX
ENV LD_LIBRARY_PATH=\
/usr/local/cuda/lib64:\
$NIXL_LIB_DIR:\
$NIXL_PLUGIN_DIR:\
/usr/local/ucx/lib:\
/usr/local/ucx/lib/ucx:\
$LD_LIBRARY_PATH

### VIRTUAL ENVIRONMENT SETUP ###
# Note: uv was already copied earlier (needed for building NIXL Python wheel)

# Create Dynamo's virtual environment
RUN uv venv /opt/dynamo/venv --python 3.12

# Install Dynamo dependencies
# Note: vLLM is available via PYTHONPATH pointing to system Python
# Note: We copy dynamo wheels from base, but NIXL wheel was built fresh above with CUDA 13 support
COPY benchmarks/ /opt/dynamo/benchmarks/
RUN mkdir -p /opt/dynamo/wheelhouse
COPY --from=dynamo_base /opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl /opt/dynamo/wheelhouse/
COPY --from=dynamo_base /opt/dynamo/wheelhouse/ai_dynamo*.whl /opt/dynamo/wheelhouse/
RUN uv pip install \
    /opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl \
    /opt/dynamo/wheelhouse/ai_dynamo*any.whl \
    /opt/dynamo/wheelhouse/nixl/nixl*.whl \
    && cd /opt/dynamo/benchmarks \
    && UV_GIT_LFS=1 uv pip install --no-cache . \
    && cd - \
    && rm -rf /opt/dynamo/benchmarks

# Install common and test dependencies
RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \
    --mount=type=bind,source=./container/deps/requirements.test.txt,target=/tmp/requirements.test.txt \
    UV_GIT_LFS=1 uv pip install \
    --no-cache \
    --requirement /tmp/requirements.txt \
    --requirement /tmp/requirements.test.txt

# Copy benchmarks, examples, and tests for CI
COPY . /workspace/

# Copy attribution files
COPY ATTRIBUTION* LICENSE /workspace/

# Copy launch banner
RUN --mount=type=bind,source=./container/launch_message.txt,target=/workspace/launch_message.txt \
    sed '/^#\s/d' /workspace/launch_message.txt > ~/.launch_screen && \
    echo "cat ~/.launch_screen" >> ~/.bashrc && \
    echo "source $VIRTUAL_ENV/bin/activate" >> ~/.bashrc

ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []

###########################################################
########## Development (run.sh, runs as root user) ########
###########################################################
#
# PURPOSE: Local development environment for use with run.sh (not Dev Container plug-in)
#
# This stage runs as root and provides:
# - Development tools and utilities for local debugging
# - Support for vscode/cursor development outside the Dev Container plug-in
#
# Use this stage if you need a full-featured development environment with extra tools,
# but do not use it with the Dev Container plug-in.

FROM runtime AS dev

# Don't want ubuntu to be editable, just change uid and gid.
ARG WORKSPACE_DIR=/workspace

# Install utilities as root
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
    # Install utilities
    nvtop \
    wget \
    tmux \
    vim \
    git \
    openssh-client \
    iproute2 \
    rsync \
    zip \
    unzip \
    htop \
    # Build Dependencies
    autoconf \
    automake \
    cmake \
    libtool \
    meson \
    net-tools \
    pybind11-dev \
    # Rust build dependencies
    clang \
    libclang-dev \
    protobuf-compiler && \
    rm -rf /var/lib/apt/lists/*

# Set workspace directory variable
ENV WORKSPACE_DIR=${WORKSPACE_DIR} \
    DYNAMO_HOME=${WORKSPACE_DIR} \
    RUSTUP_HOME=/usr/local/rustup \
    CARGO_HOME=/usr/local/cargo \
    CARGO_TARGET_DIR=/workspace/target \
    VIRTUAL_ENV=/opt/dynamo/venv \
    PATH=/usr/local/cargo/bin:$PATH

COPY --from=dynamo_base /usr/local/rustup /usr/local/rustup
COPY --from=dynamo_base /usr/local/cargo /usr/local/cargo

# Install maturin, for maturin develop
# Editable install of dynamo
RUN uv pip install maturin[patchelf] && \
    uv pip install --no-deps -e .

ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []
