130 changes: 130 additions & 0 deletions container/BUILD_DGX_SPARK_GUIDE.md
@@ -0,0 +1,130 @@
# Building Dynamo for DGX-SPARK (vLLM)

## How `build.sh` Chooses the Dockerfile

The `build.sh` script automatically selects the correct Dockerfile based on the platform and optional flags:

### Dockerfile Selection Logic

```
IF framework == "VLLM":
IF --dgx-spark flag is set OR platform is linux/arm64:
Use: Dockerfile.vllm.dgx-spark (NVIDIA's pre-built vLLM with Blackwell support)
ELSE:
Use: Dockerfile.vllm (Build from source)
ELSE IF framework == "TRTLLM":
Use: Dockerfile.trtllm
ELSE IF framework == "SGLANG":
Use: Dockerfile.sglang
ELSE:
Use: Dockerfile
```
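
A minimal bash sketch of that selection logic is shown below; the variable names (`FRAMEWORK`, `DGX_SPARK`, `PLATFORM`, `DOCKERFILE`) are illustrative and not taken from `build.sh` itself:

```bash
# Illustrative sketch only -- build.sh's real variable names and structure may differ.
if [[ "$FRAMEWORK" == "VLLM" ]]; then
    if [[ "$DGX_SPARK" == "true" || "$PLATFORM" == "linux/arm64" ]]; then
        DOCKERFILE="container/Dockerfile.vllm.dgx-spark"  # NVIDIA's pre-built vLLM, Blackwell support
    else
        DOCKERFILE="container/Dockerfile.vllm"            # build vLLM from source
    fi
elif [[ "$FRAMEWORK" == "TRTLLM" ]]; then
    DOCKERFILE="container/Dockerfile.trtllm"
elif [[ "$FRAMEWORK" == "SGLANG" ]]; then
    DOCKERFILE="container/Dockerfile.sglang"
else
    DOCKERFILE="container/Dockerfile"
fi
```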

### How to Use

#### For DGX-SPARK (Blackwell GPUs)

**Automatic detection (recommended):**
```bash
./container/build.sh --framework VLLM --platform linux/arm64
```

**Explicit flag:**
```bash
./container/build.sh --framework VLLM --dgx-spark
```

#### For x86_64 (standard GPUs)

```bash
./container/build.sh --framework VLLM
# or explicitly
./container/build.sh --framework VLLM --platform linux/amd64
```

## Key Differences

### Standard vLLM Dockerfile (`Dockerfile.vllm`)
- Builds vLLM from source
- Uses CUDA 12.8
- Supports: Ampere, Ada, Hopper GPUs
- **Does NOT support Blackwell (compute_121)**

### DGX-SPARK Dockerfile (`Dockerfile.vllm.dgx-spark`)
- Uses NVIDIA's pre-built vLLM container (`nvcr.io/nvidia/vllm:25.09-py3`)
- Uses CUDA 13.0
- Supports: **Blackwell GPUs (compute_121)** via DGX-SPARK
- Skips building vLLM from source (avoids nvcc errors)
- **Builds UCX v1.19.0 from source** with CUDA 13 support
- **Builds NIXL 0.7.0 from source** with CUDA 13 support (self-contained, no cache dependency)
- **Builds NIXL Python wheel** with CUDA 13 support
- Adds Dynamo's runtime customizations and integrations
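
A quick way to tell which variant a built image corresponds to (the tag below is an example; use whatever tag your build produced):

```bash
IMAGE=dynamo:latest-vllm-dgx-spark   # example tag

# DGX-SPARK images should report CUDA release 13.0.
docker run --rm "$IMAGE" nvcc --version | grep release

# For the DGX-SPARK variant this should resolve to NVIDIA's system site-packages
# (picked up via PYTHONPATH), not to a source-built vLLM in Dynamo's virtual environment.
docker run --rm "$IMAGE" python3 -c "import vllm; print(vllm.__file__)"
```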

## Why DGX-SPARK Needs Special Handling

DGX-SPARK systems use **Blackwell GPUs** with architecture `compute_121`. Attempting to build vLLM from source with an older CUDA toolchain fails with:

```
ERROR: nvcc fatal : Unsupported gpu architecture 'compute_121a'
```
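
You can confirm on the host that the GPUs really are Blackwell-class before choosing a build path; `compute_121` corresponds to compute capability 12.1, and the `compute_cap` query field requires a reasonably recent driver:

```bash
# Expected to report 12.1 on DGX-SPARK (Blackwell).
nvidia-smi --query-gpu=name,compute_cap --format=csv
```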

**Solution:** Use NVIDIA's pre-built vLLM container that already includes:
- CUDA 13.0 support
- Blackwell GPU architecture support
- DGX Spark functional support
- NVFP4 format optimization

### Why Build UCX and NIXL from Source?

The DGX-SPARK Dockerfile builds UCX v1.19.0 and NIXL 0.7.0 **from source** instead of copying them from the base image:

**Reason 1: CUDA 13 Compatibility**
- NIXL 0.7.0 is the first version with native CUDA 13.0 support
- Building from source ensures proper linkage against `libcudart.so.13` (not `libcudart.so.12`)
- Avoids runtime errors: `libcudart.so.12: cannot open shared object file`

**Reason 2: Cache Independence**
- The base image (`dynamo_base`) may have cached NIXL 0.6.x built with CUDA 12
- Building fresh in the DGX-SPARK Dockerfile ensures we always get NIXL 0.7.0 with CUDA 13
- Self-contained build = predictable results

**Reason 3: ARM64 Optimization**
- UCX and NIXL are built specifically for `aarch64` architecture
- GDS backend is disabled (`-Ddisable_gds_backend=true`) as it's not supported on ARM64
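
For reference, the core of the NIXL build in `Dockerfile.vllm.dgx-spark` condenses to the following (apt package installation, `ldconfig`, and cleanup omitted; `NIXL_PREFIX` is `/opt/nvidia/nvda_nixl` in the Dockerfile):

```bash
git clone --depth 1 --branch 0.7.0 https://github.com/ai-dynamo/nixl.git /opt/nixl
cd /opt/nixl
# Configure and build the C++ library with the GDS backend disabled (not supported on ARM64).
meson setup build/ --buildtype=release --prefix="$NIXL_PREFIX" -Ddisable_gds_backend=true
ninja -C build/ -j"$(nproc)"
ninja -C build/ install
# The Python wheel is built from the same checkout, also with GDS disabled.
uv build . --out-dir /opt/dynamo/wheelhouse/nixl --config-settings=setup-args="-Ddisable_gds_backend=true"
```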

## Build Arguments

When using the `--dgx-spark` flag, `build.sh` automatically:
- Selects `Dockerfile.vllm.dgx-spark`
- Sets `PLATFORM=linux/arm64` (forced)
- Sets `NIXL_REF=0.7.0` (for CUDA 13 support)
- Sets `ARCH=arm64` and `ARCH_ALT=aarch64`

The DGX-SPARK Dockerfile itself hardcodes:
- `BASE_IMAGE=nvcr.io/nvidia/vllm`
- `BASE_IMAGE_TAG=25.09-py3`

All other build arguments work the same way.
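
For reference, the `--dgx-spark` invocation is roughly equivalent to a manual build along these lines, run from the repository root with BuildKit enabled (a sketch only; `build.sh` passes additional arguments, and the image tag here is just an example):

```bash
docker build \
    --platform linux/arm64 \
    -f container/Dockerfile.vllm.dgx-spark \
    --build-arg ARCH_ALT=aarch64 \
    --build-arg DYNAMO_BASE_IMAGE=dynamo:latest-none \
    -t dynamo:latest-vllm-dgx-spark \
    .
```

`DYNAMO_BASE_IMAGE` must point at a Dynamo base image you have already built locally (the Dockerfile defaults to `dynamo:latest-none`).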

## Troubleshooting

### Error: `exec /bin/sh: exec format error`
- **Cause:** The image was built for the wrong platform (e.g., an `amd64` image on the ARM64 DGX-SPARK host)
- **Fix:** Use `--platform linux/arm64` for DGX-SPARK
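
You can check which platform an image was actually built for (the tag is an example):

```bash
# Should print linux/arm64 for a DGX-SPARK image.
docker image inspect --format '{{.Os}}/{{.Architecture}}' dynamo:latest-vllm-dgx-spark
```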

### Error: `nvcc fatal : Unsupported gpu architecture 'compute_121a'`
- **Cause:** vLLM is being built from source with a CUDA toolchain that does not support the Blackwell architecture
- **Fix:** Use `--dgx-spark` or `--platform linux/arm64` so the pre-built NVIDIA container is used instead

### Error: `libcudart.so.12: cannot open shared object file`
- **Cause:** NIXL was built against CUDA 12, but the container ships CUDA 13
- **Fix:** Rebuild with the `--dgx-spark` flag so NIXL 0.7.0 is built with CUDA 13 support
- **Verify:** Inside container: `ldd /opt/nvidia/nvda_nixl/lib/aarch64-linux-gnu/plugins/libplugin_UCX_MO.so | grep cudart` should show `libcudart.so.13` (not `.so.12`)
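
To check every NIXL plugin at once rather than a single library:

```bash
# Each plugin should link against libcudart.so.13 (path taken from this guide).
for lib in /opt/nvidia/nvda_nixl/lib/aarch64-linux-gnu/plugins/*.so; do
    echo "== ${lib}"
    ldd "${lib}" | grep cudart
done
```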

## References

- [NVIDIA vLLM Release 25.09 Documentation](https://docs.nvidia.com/deeplearning/frameworks/vllm-release-notes/rel-25-09.html)
- [NVIDIA NGC Container Registry](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm)
- [NIXL 0.7.0 Release Notes](https://github.com/ai-dynamo/nixl/releases/tag/0.7.0) - CUDA 13.0 support
- [DGX-SPARK README](../docs/backends/vllm/DGX-SPARK_README.md) - Complete deployment guide

263 changes: 263 additions & 0 deletions container/Dockerfile.vllm.dgx-spark
@@ -0,0 +1,263 @@
# syntax=docker/dockerfile:1.10.0
# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# DGX-SPARK specific Dockerfile for vLLM
# Uses NVIDIA's pre-built vLLM container that supports Blackwell GPUs (compute_121)
# See: https://docs.nvidia.com/deeplearning/frameworks/vllm-release-notes/rel-25-09.html

ARG BASE_IMAGE="nvcr.io/nvidia/vllm"
ARG BASE_IMAGE_TAG="25.09-py3"

ARG DYNAMO_BASE_IMAGE="dynamo:latest-none"
FROM ${DYNAMO_BASE_IMAGE} AS dynamo_base

########################################################
########## Runtime Image (based on NVIDIA vLLM) #######
########################################################
#
# PURPOSE: Production runtime environment for DGX-SPARK
#
# This stage uses NVIDIA's pre-built vLLM container that already includes:
# - vLLM with DGX Spark functional support (Blackwell compute_121)
# - CUDA 13.0 support
# - NVFP4 format support
# - All necessary GPU acceleration libraries
#
# We add Dynamo's customizations on top:
# - Dynamo runtime libraries
# - NIXL for KV cache transfer
# - Custom backend integrations
#

FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS runtime

WORKDIR /workspace
ENV DYNAMO_HOME=/opt/dynamo
ENV VIRTUAL_ENV=/opt/dynamo/venv
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
# Add system Python site-packages to PYTHONPATH so we can use NVIDIA's vLLM
ENV PYTHONPATH="/usr/local/lib/python3.12/dist-packages:${PYTHONPATH}"

# NVIDIA vLLM container already has Python 3.12 and vLLM installed
# We just need to set up Dynamo's virtual environment and dependencies
ARG ARCH_ALT=aarch64
ENV NIXL_PREFIX=/opt/nvidia/nvda_nixl
ENV NIXL_LIB_DIR=$NIXL_PREFIX/lib/${ARCH_ALT}-linux-gnu
ENV NIXL_PLUGIN_DIR=$NIXL_LIB_DIR/plugins

# Install additional dependencies for Dynamo
# Note: NVIDIA vLLM container already has Python and CUDA tools
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
# Python runtime - CRITICAL for virtual environment to work
python3.12-dev \
build-essential \
# jq and curl for polling various endpoints and health checks
jq \
git \
git-lfs \
curl \
# Libraries required by UCX to find RDMA devices
libibverbs1 rdma-core ibverbs-utils libibumad3 \
libnuma1 librdmacm1 ibverbs-providers \
# JIT Kernel Compilation, flashinfer
ninja-build \
g++ \
# prometheus dependencies
ca-certificates && \
rm -rf /var/lib/apt/lists/*

# NVIDIA vLLM container has CUDA already, but ensure CUDA tools are in PATH
ENV PATH=/usr/local/cuda/bin:$PATH

# DeepGEMM runs nvcc for JIT kernel compilation, but the CUDA include path
# is not set up for that compilation by default. Set CPATH to help nvcc find the headers.
ENV CPATH=/usr/local/cuda/include

### COPY NATS & ETCD ###
# Copy nats and etcd from dev image
COPY --from=dynamo_base /usr/bin/nats-server /usr/bin/nats-server
COPY --from=dynamo_base /usr/local/bin/etcd/ /usr/local/bin/etcd/
# Add ETCD and CUDA binaries to PATH so cicc and other CUDA tools are accessible
ENV PATH=/usr/local/bin/etcd/:/usr/local/cuda/nvvm/bin:/usr/local/cuda/bin:$PATH

### COPY UV EARLY (needed for building NIXL Python wheel) ###
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
COPY --from=ghcr.io/astral-sh/uv:latest /uvx /bin/uvx

# Build UCX and NIXL directly in this stage for CUDA 13.0 support
# This ensures we get a fresh NIXL 0.7.0 built against CUDA 13, not a cached CUDA 12 build

# Build UCX from source
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
autoconf automake libtool pkg-config \
libibverbs-dev librdmacm-dev \
&& rm -rf /var/lib/apt/lists/* \
&& cd /usr/local/src \
&& git clone https://github.com/openucx/ucx.git \
&& cd ucx && git checkout v1.19.0 \
&& ./autogen.sh \
&& ./configure \
--prefix=/usr/local/ucx \
--enable-shared \
--disable-static \
--disable-doxygen-doc \
--enable-optimizations \
--enable-cma \
--enable-devel-headers \
--with-cuda=/usr/local/cuda \
--with-verbs \
--with-dm \
--enable-mt \
&& make -j$(nproc) \
&& make -j$(nproc) install-strip \
&& echo "/usr/local/ucx/lib" > /etc/ld.so.conf.d/ucx.conf \
&& echo "/usr/local/ucx/lib/ucx" >> /etc/ld.so.conf.d/ucx.conf \
&& ldconfig \
&& cd /usr/local/src \
&& rm -rf ucx

# Build NIXL 0.7.0 from source with CUDA 13.0 support
# Build both C++ library and Python wheel
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
meson ninja-build python3-pip \
&& rm -rf /var/lib/apt/lists/* \
&& git clone --depth 1 --branch 0.7.0 "https://github.com/ai-dynamo/nixl.git" /opt/nixl \
&& cd /opt/nixl \
&& meson setup build/ --buildtype=release --prefix=$NIXL_PREFIX -Ddisable_gds_backend=true \
&& ninja -C build/ -j$(nproc) \
&& ninja -C build/ install \
&& echo "$NIXL_LIB_DIR" > /etc/ld.so.conf.d/nixl.conf \
&& echo "$NIXL_PLUGIN_DIR" >> /etc/ld.so.conf.d/nixl.conf \
&& ldconfig \
&& mkdir -p /opt/dynamo/wheelhouse/nixl \
&& /bin/uv build . --out-dir /opt/dynamo/wheelhouse/nixl --config-settings=setup-args="-Ddisable_gds_backend=true" \
&& cd - \
&& rm -rf /opt/nixl

ENV PATH=/usr/local/ucx/bin:$PATH

# Set library paths for NIXL and UCX
ENV LD_LIBRARY_PATH=\
/usr/local/cuda/lib64:\
$NIXL_LIB_DIR:\
$NIXL_PLUGIN_DIR:\
/usr/local/ucx/lib:\
/usr/local/ucx/lib/ucx:\
$LD_LIBRARY_PATH

### VIRTUAL ENVIRONMENT SETUP ###
# Note: uv was already copied earlier (needed for building NIXL Python wheel)

# Create Dynamo's virtual environment
RUN uv venv /opt/dynamo/venv --python 3.12

# Install Dynamo dependencies
# Note: vLLM is available via PYTHONPATH pointing to system Python
# Note: We copy dynamo wheels from base, but NIXL wheel was built fresh above with CUDA 13 support
COPY benchmarks/ /opt/dynamo/benchmarks/
RUN mkdir -p /opt/dynamo/wheelhouse
COPY --from=dynamo_base /opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl /opt/dynamo/wheelhouse/
COPY --from=dynamo_base /opt/dynamo/wheelhouse/ai_dynamo*.whl /opt/dynamo/wheelhouse/
RUN uv pip install \
/opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl \
/opt/dynamo/wheelhouse/ai_dynamo*any.whl \
/opt/dynamo/wheelhouse/nixl/nixl*.whl \
&& cd /opt/dynamo/benchmarks \
&& UV_GIT_LFS=1 uv pip install --no-cache . \
&& cd - \
&& rm -rf /opt/dynamo/benchmarks

# Install common and test dependencies
RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \
--mount=type=bind,source=./container/deps/requirements.test.txt,target=/tmp/requirements.test.txt \
UV_GIT_LFS=1 uv pip install \
--no-cache \
--requirement /tmp/requirements.txt \
--requirement /tmp/requirements.test.txt

# Copy benchmarks, examples, and tests for CI
COPY . /workspace/

# Copy attribution files
COPY ATTRIBUTION* LICENSE /workspace/

# Copy launch banner
RUN --mount=type=bind,source=./container/launch_message.txt,target=/workspace/launch_message.txt \
sed '/^#\s/d' /workspace/launch_message.txt > ~/.launch_screen && \
echo "cat ~/.launch_screen" >> ~/.bashrc && \
echo "source $VIRTUAL_ENV/bin/activate" >> ~/.bashrc

ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []

###########################################################
########## Development (run.sh, runs as root user) ########
###########################################################
#
# PURPOSE: Local development environment for use with run.sh (not Dev Container plug-in)
#
# This stage runs as root and provides:
# - Development tools and utilities for local debugging
# - Support for vscode/cursor development outside the Dev Container plug-in
#
# Use this stage if you need a full-featured development environment with extra tools,
# but do not use it with the Dev Container plug-in.

FROM runtime AS dev

# Don't want ubuntu to be editable, just change uid and gid.
ARG WORKSPACE_DIR=/workspace

# Install utilities as root
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
# Install utilities
nvtop \
wget \
tmux \
vim \
git \
openssh-client \
iproute2 \
rsync \
zip \
unzip \
htop \
# Build Dependencies
autoconf \
automake \
cmake \
libtool \
meson \
net-tools \
pybind11-dev \
# Rust build dependencies
clang \
libclang-dev \
protobuf-compiler && \
rm -rf /var/lib/apt/lists/*

# Set workspace directory variable
ENV WORKSPACE_DIR=${WORKSPACE_DIR} \
DYNAMO_HOME=${WORKSPACE_DIR} \
RUSTUP_HOME=/usr/local/rustup \
CARGO_HOME=/usr/local/cargo \
CARGO_TARGET_DIR=/workspace/target \
VIRTUAL_ENV=/opt/dynamo/venv \
PATH=/usr/local/cargo/bin:$PATH

COPY --from=dynamo_base /usr/local/rustup /usr/local/rustup
COPY --from=dynamo_base /usr/local/cargo /usr/local/cargo

# Install maturin, for maturin develop
# Editable install of dynamo
RUN uv pip install maturin[patchelf] && \
uv pip install --no-deps -e .

ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []
