130 changes: 130 additions & 0 deletions container/BUILD_DGX_SPARK_GUIDE.md
@@ -0,0 +1,130 @@
# Building Dynamo for DGX-SPARK (vLLM)

## How `build.sh` Chooses the Dockerfile

The `build.sh` script automatically selects the correct Dockerfile based on the platform and optional flags:

### Dockerfile Selection Logic

```
IF framework == "VLLM":
IF --dgx-spark flag is set OR platform is linux/arm64:
Use: Dockerfile.vllm.dgx-spark (NVIDIA's pre-built vLLM with Blackwell support)
ELSE:
Use: Dockerfile.vllm (Build from source)
ELSE IF framework == "TRTLLM":
Use: Dockerfile.trtllm
ELSE IF framework == "SGLANG":
Use: Dockerfile.sglang
ELSE:
Use: Dockerfile
```
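
A minimal bash sketch of that selection logic is shown below; the variable names (`FRAMEWORK`, `DGX_SPARK`, `PLATFORM`, `DOCKERFILE`) are illustrative and not taken from `build.sh` itself:

```bash
# Illustrative sketch only -- build.sh's real variable names and structure may differ.
if [[ "$FRAMEWORK" == "VLLM" ]]; then
    if [[ "$DGX_SPARK" == "true" || "$PLATFORM" == "linux/arm64" ]]; then
        DOCKERFILE="container/Dockerfile.vllm.dgx-spark"  # NVIDIA's pre-built vLLM, Blackwell support
    else
        DOCKERFILE="container/Dockerfile.vllm"            # build vLLM from source
    fi
elif [[ "$FRAMEWORK" == "TRTLLM" ]]; then
    DOCKERFILE="container/Dockerfile.trtllm"
elif [[ "$FRAMEWORK" == "SGLANG" ]]; then
    DOCKERFILE="container/Dockerfile.sglang"
else
    DOCKERFILE="container/Dockerfile"
fi
```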

### How to Use

#### For DGX-SPARK (Blackwell GPUs)

**Automatic detection (recommended):**
```bash
./container/build.sh --framework VLLM --platform linux/arm64
```

**Explicit flag:**
```bash
./container/build.sh --framework VLLM --dgx-spark
```

#### For x86_64 (standard GPUs)

```bash
./container/build.sh --framework VLLM
# or explicitly
./container/build.sh --framework VLLM --platform linux/amd64
```

## Key Differences

### Standard vLLM Dockerfile (`Dockerfile.vllm`)
- Builds vLLM from source
- Uses CUDA 12.8
- Supports: Ampere, Ada, Hopper GPUs
- **Does NOT support Blackwell (compute_121)**

### DGX-SPARK Dockerfile (`Dockerfile.vllm.dgx-spark`)
- Uses NVIDIA's pre-built vLLM container (`nvcr.io/nvidia/vllm:25.09-py3`)
- Uses CUDA 13.0
- Supports: **Blackwell GPUs (compute_121)** via DGX-SPARK
- Skips building vLLM from source (avoids nvcc errors)
- **Builds UCX v1.19.0 from source** with CUDA 13 support
- **Builds NIXL 0.7.0 from source** with CUDA 13 support (self-contained, no cache dependency)
- **Builds NIXL Python wheel** with CUDA 13 support
- Adds Dynamo's runtime customizations and integrations
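
A quick way to tell which variant a built image corresponds to (the tag below is an example; use whatever tag your build produced):

```bash
IMAGE=dynamo:latest-vllm-dgx-spark   # example tag

# DGX-SPARK images should report CUDA release 13.0.
docker run --rm "$IMAGE" nvcc --version | grep release

# For the DGX-SPARK variant this should resolve to NVIDIA's system site-packages
# (picked up via PYTHONPATH), not to a source-built vLLM in Dynamo's virtual environment.
docker run --rm "$IMAGE" python3 -c "import vllm; print(vllm.__file__)"
```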

## Why DGX-SPARK Needs Special Handling

DGX-SPARK systems use **Blackwell GPUs** with architecture `compute_121`. Attempting to build vLLM from source with an older CUDA toolchain fails with:

```
ERROR: nvcc fatal : Unsupported gpu architecture 'compute_121a'
```
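
You can confirm on the host that the GPUs really are Blackwell-class before choosing a build path; `compute_121` corresponds to compute capability 12.1, and the `compute_cap` query field requires a reasonably recent driver:

```bash
# Expected to report 12.1 on DGX-SPARK (Blackwell).
nvidia-smi --query-gpu=name,compute_cap --format=csv
```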

**Solution:** Use NVIDIA's pre-built vLLM container that already includes:
- CUDA 13.0 support
- Blackwell GPU architecture support
- DGX Spark functional support
- NVFP4 format optimization

### Why Build UCX and NIXL from Source?

The DGX-SPARK Dockerfile builds UCX v1.19.0 and NIXL 0.7.0 **from source** instead of copying them from the base image:

**Reason 1: CUDA 13 Compatibility**
- NIXL 0.7.0 is the first version with native CUDA 13.0 support
- Building from source ensures proper linkage against `libcudart.so.13` (not `libcudart.so.12`)
- Avoids runtime errors: `libcudart.so.12: cannot open shared object file`

**Reason 2: Cache Independence**
- The base image (`dynamo_base`) may have cached NIXL 0.6.x built with CUDA 12
- Building fresh in the DGX-SPARK Dockerfile ensures we always get NIXL 0.7.0 with CUDA 13
- Self-contained build = predictable results

**Reason 3: ARM64 Optimization**
- UCX and NIXL are built specifically for `aarch64` architecture
- GDS backend is disabled (`-Ddisable_gds_backend=true`) as it's not supported on ARM64
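
For reference, the core of the NIXL build in `Dockerfile.vllm.dgx-spark` condenses to the following (apt package installation, `ldconfig`, and cleanup omitted; `NIXL_PREFIX` is `/opt/nvidia/nvda_nixl` in the Dockerfile):

```bash
git clone --depth 1 --branch 0.7.0 https://github.com/ai-dynamo/nixl.git /opt/nixl
cd /opt/nixl
# Configure and build the C++ library with the GDS backend disabled (not supported on ARM64).
meson setup build/ --buildtype=release --prefix="$NIXL_PREFIX" -Ddisable_gds_backend=true
ninja -C build/ -j"$(nproc)"
ninja -C build/ install
# The Python wheel is built from the same checkout, also with GDS disabled.
uv build . --out-dir /opt/dynamo/wheelhouse/nixl --config-settings=setup-args="-Ddisable_gds_backend=true"
```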

## Build Arguments

When using the `--dgx-spark` flag, `build.sh` automatically:
- Selects `Dockerfile.vllm.dgx-spark`
- Sets `PLATFORM=linux/arm64` (forced)
- Sets `NIXL_REF=0.7.0` (for CUDA 13 support)
- Sets `ARCH=arm64` and `ARCH_ALT=aarch64`

The DGX-SPARK Dockerfile itself hardcodes:
- `BASE_IMAGE=nvcr.io/nvidia/vllm`
- `BASE_IMAGE_TAG=25.09-py3`

All other build arguments work the same way.
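
For reference, the `--dgx-spark` invocation is roughly equivalent to a manual build along these lines, run from the repository root with BuildKit enabled (a sketch only; `build.sh` passes additional arguments, and the image tag here is just an example):

```bash
docker build \
    --platform linux/arm64 \
    -f container/Dockerfile.vllm.dgx-spark \
    --build-arg ARCH_ALT=aarch64 \
    --build-arg DYNAMO_BASE_IMAGE=dynamo:latest-none \
    -t dynamo:latest-vllm-dgx-spark \
    .
```

`DYNAMO_BASE_IMAGE` must point at a Dynamo base image you have already built locally (the Dockerfile defaults to `dynamo:latest-none`).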

## Troubleshooting

### Error: `exec /bin/sh: exec format error`
- **Cause:** The image was built for the wrong platform (e.g., an `amd64` image on the ARM64 DGX-SPARK host)
- **Fix:** Use `--platform linux/arm64` for DGX-SPARK
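
You can check which platform an image was actually built for (the tag is an example):

```bash
# Should print linux/arm64 for a DGX-SPARK image.
docker image inspect --format '{{.Os}}/{{.Architecture}}' dynamo:latest-vllm-dgx-spark
```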

### Error: `nvcc fatal : Unsupported gpu architecture 'compute_121a'`
- **Cause:** vLLM is being built from source with a CUDA toolchain that does not support the Blackwell architecture
- **Fix:** Use `--dgx-spark` or `--platform linux/arm64` so the pre-built NVIDIA container is used instead

### Error: `libcudart.so.12: cannot open shared object file`
- **Cause:** NIXL was built against CUDA 12, but the container ships CUDA 13
- **Fix:** Rebuild with the `--dgx-spark` flag so NIXL 0.7.0 is built with CUDA 13 support
- **Verify:** Inside container: `ldd /opt/nvidia/nvda_nixl/lib/aarch64-linux-gnu/plugins/libplugin_UCX_MO.so | grep cudart` should show `libcudart.so.13` (not `.so.12`)
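
To check every NIXL plugin at once rather than a single library:

```bash
# Each plugin should link against libcudart.so.13 (path taken from this guide).
for lib in /opt/nvidia/nvda_nixl/lib/aarch64-linux-gnu/plugins/*.so; do
    echo "== ${lib}"
    ldd "${lib}" | grep cudart
done
```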

## References

- [NVIDIA vLLM Release 25.09 Documentation](https://docs.nvidia.com/deeplearning/frameworks/vllm-release-notes/rel-25-09.html)
- [NVIDIA NGC Container Registry](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm)
- [NIXL 0.7.0 Release Notes](https://github.com/ai-dynamo/nixl/releases/tag/0.7.0) - CUDA 13.0 support
- [DGX-SPARK README](../docs/backends/vllm/DGX-SPARK_README.md) - Complete deployment guide

263 changes: 263 additions & 0 deletions container/Dockerfile.vllm.dgx-spark
@@ -0,0 +1,263 @@
# syntax=docker/dockerfile:1.10.0
# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# DGX-SPARK specific Dockerfile for vLLM
# Uses NVIDIA's pre-built vLLM container that supports Blackwell GPUs (compute_121)
# See: https://docs.nvidia.com/deeplearning/frameworks/vllm-release-notes/rel-25-09.html

ARG BASE_IMAGE="nvcr.io/nvidia/vllm"
ARG BASE_IMAGE_TAG="25.09-py3"

ARG DYNAMO_BASE_IMAGE="dynamo:latest-none"
FROM ${DYNAMO_BASE_IMAGE} AS dynamo_base

########################################################
########## Runtime Image (based on NVIDIA vLLM) #######
########################################################
#
# PURPOSE: Production runtime environment for DGX-SPARK
#
# This stage uses NVIDIA's pre-built vLLM container that already includes:
# - vLLM with DGX Spark functional support (Blackwell compute_121)
# - CUDA 13.0 support
# - NVFP4 format support
# - All necessary GPU acceleration libraries
#
# We add Dynamo's customizations on top:
# - Dynamo runtime libraries
# - NIXL for KV cache transfer
# - Custom backend integrations
#

FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS runtime

WORKDIR /workspace
ENV DYNAMO_HOME=/opt/dynamo
ENV VIRTUAL_ENV=/opt/dynamo/venv
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
# Add system Python site-packages to PYTHONPATH so we can use NVIDIA's vLLM
ENV PYTHONPATH="/usr/local/lib/python3.12/dist-packages:${PYTHONPATH}"

# NVIDIA vLLM container already has Python 3.12 and vLLM installed
# We just need to set up Dynamo's virtual environment and dependencies
ARG ARCH_ALT=aarch64
ENV NIXL_PREFIX=/opt/nvidia/nvda_nixl
ENV NIXL_LIB_DIR=$NIXL_PREFIX/lib/${ARCH_ALT}-linux-gnu
ENV NIXL_PLUGIN_DIR=$NIXL_LIB_DIR/plugins

# Install additional dependencies for Dynamo
# Note: NVIDIA vLLM container already has Python and CUDA tools
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
# Python runtime - CRITICAL for virtual environment to work
python3.12-dev \
build-essential \
# jq and curl for polling various endpoints and health checks
jq \
git \
git-lfs \
curl \
# Libraries required by UCX to find RDMA devices
libibverbs1 rdma-core ibverbs-utils libibumad3 \
libnuma1 librdmacm1 ibverbs-providers \
# JIT Kernel Compilation, flashinfer
ninja-build \
g++ \
# prometheus dependencies
ca-certificates && \
rm -rf /var/lib/apt/lists/*

# NVIDIA vLLM container has CUDA already, but ensure CUDA tools are in PATH
ENV PATH=/usr/local/cuda/bin:$PATH

# DeepGEMM runs nvcc for JIT kernel compilation, but the CUDA include path
# is not set up for that compilation by default. Set CPATH to help nvcc find the headers.
ENV CPATH=/usr/local/cuda/include

### COPY NATS & ETCD ###
# Copy nats and etcd from dev image
COPY --from=dynamo_base /usr/bin/nats-server /usr/bin/nats-server
COPY --from=dynamo_base /usr/local/bin/etcd/ /usr/local/bin/etcd/
# Add ETCD and CUDA binaries to PATH so cicc and other CUDA tools are accessible
ENV PATH=/usr/local/bin/etcd/:/usr/local/cuda/nvvm/bin:/usr/local/cuda/bin:$PATH

### COPY UV EARLY (needed for building NIXL Python wheel) ###
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
COPY --from=ghcr.io/astral-sh/uv:latest /uvx /bin/uvx

# Build UCX and NIXL directly in this stage for CUDA 13.0 support
# This ensures we get a fresh NIXL 0.7.0 built against CUDA 13, not a cached CUDA 12 build

# Build UCX from source
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
autoconf automake libtool pkg-config \
libibverbs-dev librdmacm-dev \
&& rm -rf /var/lib/apt/lists/* \
&& cd /usr/local/src \
&& git clone https://github.com/openucx/ucx.git \
&& cd ucx && git checkout v1.19.0 \
&& ./autogen.sh \
&& ./configure \
--prefix=/usr/local/ucx \
--enable-shared \
--disable-static \
--disable-doxygen-doc \
--enable-optimizations \
--enable-cma \
--enable-devel-headers \
--with-cuda=/usr/local/cuda \
--with-verbs \
--with-dm \
--enable-mt \
&& make -j$(nproc) \
&& make -j$(nproc) install-strip \
&& echo "/usr/local/ucx/lib" > /etc/ld.so.conf.d/ucx.conf \
&& echo "/usr/local/ucx/lib/ucx" >> /etc/ld.so.conf.d/ucx.conf \
&& ldconfig \
&& cd /usr/local/src \
&& rm -rf ucx

# Build NIXL 0.7.0 from source with CUDA 13.0 support
# Build both C++ library and Python wheel
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
meson ninja-build python3-pip \
&& rm -rf /var/lib/apt/lists/* \
&& git clone --depth 1 --branch 0.7.0 "https://github.com/ai-dynamo/nixl.git" /opt/nixl \
&& cd /opt/nixl \
&& meson setup build/ --buildtype=release --prefix=$NIXL_PREFIX -Ddisable_gds_backend=true \
&& ninja -C build/ -j$(nproc) \
&& ninja -C build/ install \
&& echo "$NIXL_LIB_DIR" > /etc/ld.so.conf.d/nixl.conf \
&& echo "$NIXL_PLUGIN_DIR" >> /etc/ld.so.conf.d/nixl.conf \
&& ldconfig \
&& mkdir -p /opt/dynamo/wheelhouse/nixl \
&& /bin/uv build . --out-dir /opt/dynamo/wheelhouse/nixl --config-settings=setup-args="-Ddisable_gds_backend=true" \
&& cd - \
&& rm -rf /opt/nixl

ENV PATH=/usr/local/ucx/bin:$PATH

# Set library paths for NIXL and UCX
ENV LD_LIBRARY_PATH=\
/usr/local/cuda/lib64:\
$NIXL_LIB_DIR:\
$NIXL_PLUGIN_DIR:\
/usr/local/ucx/lib:\
/usr/local/ucx/lib/ucx:\
$LD_LIBRARY_PATH

### VIRTUAL ENVIRONMENT SETUP ###
# Note: uv was already copied earlier (needed for building NIXL Python wheel)

# Create Dynamo's virtual environment
RUN uv venv /opt/dynamo/venv --python 3.12

# Install Dynamo dependencies
# Note: vLLM is available via PYTHONPATH pointing to system Python
# Note: We copy dynamo wheels from base, but NIXL wheel was built fresh above with CUDA 13 support
COPY benchmarks/ /opt/dynamo/benchmarks/
RUN mkdir -p /opt/dynamo/wheelhouse
COPY --from=dynamo_base /opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl /opt/dynamo/wheelhouse/
COPY --from=dynamo_base /opt/dynamo/wheelhouse/ai_dynamo*.whl /opt/dynamo/wheelhouse/
RUN uv pip install \
/opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl \
/opt/dynamo/wheelhouse/ai_dynamo*any.whl \
/opt/dynamo/wheelhouse/nixl/nixl*.whl \
&& cd /opt/dynamo/benchmarks \
&& UV_GIT_LFS=1 uv pip install --no-cache . \
&& cd - \
&& rm -rf /opt/dynamo/benchmarks

# Install common and test dependencies
RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \
--mount=type=bind,source=./container/deps/requirements.test.txt,target=/tmp/requirements.test.txt \
UV_GIT_LFS=1 uv pip install \
--no-cache \
--requirement /tmp/requirements.txt \
--requirement /tmp/requirements.test.txt

# Copy benchmarks, examples, and tests for CI
COPY . /workspace/

# Copy attribution files
COPY ATTRIBUTION* LICENSE /workspace/

# Copy launch banner
RUN --mount=type=bind,source=./container/launch_message.txt,target=/workspace/launch_message.txt \
sed '/^#\s/d' /workspace/launch_message.txt > ~/.launch_screen && \
echo "cat ~/.launch_screen" >> ~/.bashrc && \
echo "source $VIRTUAL_ENV/bin/activate" >> ~/.bashrc

ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []

###########################################################
########## Development (run.sh, runs as root user) ########
###########################################################
#
# PURPOSE: Local development environment for use with run.sh (not Dev Container plug-in)
#
# This stage runs as root and provides:
# - Development tools and utilities for local debugging
# - Support for vscode/cursor development outside the Dev Container plug-in
#
# Use this stage if you need a full-featured development environment with extra tools,
# but do not use it with the Dev Container plug-in.

FROM runtime AS dev

# Don't want ubuntu to be editable, just change uid and gid.
ARG WORKSPACE_DIR=/workspace

# Install utilities as root
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
# Install utilities
nvtop \
wget \
tmux \
vim \
git \
openssh-client \
iproute2 \
rsync \
zip \
unzip \
htop \
# Build Dependencies
autoconf \
automake \
cmake \
libtool \
meson \
net-tools \
pybind11-dev \
# Rust build dependencies
clang \
libclang-dev \
protobuf-compiler && \
rm -rf /var/lib/apt/lists/*

# Set workspace directory variable
ENV WORKSPACE_DIR=${WORKSPACE_DIR} \
DYNAMO_HOME=${WORKSPACE_DIR} \
RUSTUP_HOME=/usr/local/rustup \
CARGO_HOME=/usr/local/cargo \
CARGO_TARGET_DIR=/workspace/target \
VIRTUAL_ENV=/opt/dynamo/venv \
PATH=/usr/local/cargo/bin:$PATH

COPY --from=dynamo_base /usr/local/rustup /usr/local/rustup
COPY --from=dynamo_base /usr/local/cargo /usr/local/cargo

# Install maturin, for maturin develop
# Editable install of dynamo
RUN uv pip install maturin[patchelf] && \
uv pip install --no-deps -e .

ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []
