4 changes: 2 additions & 2 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -15,8 +15,8 @@ Include any tests here.
- [ ] Manual testing

## Checklist
- [ ] My code follows the style guidelines, e.g. `format.sh`.
- [ ] I have run `build_and_install.sh` to verify compilation.
- [ ] I have run `format.sh` to follow the style guidelines.
- [ ] I have run `build.sh` to verify compilation.
- [ ] I have removed redundant variables and comments.
- [ ] I have updated the documentation.
- [ ] I have added tests.
6 changes: 3 additions & 3 deletions .github/workflows/uccl-build-test.yml
@@ -75,7 +75,7 @@ jobs:
conda activate uccl

cd /home/skytestuser/uccl-test
./build_and_install.sh cuda all 3.11 2>&1 | tee build.log
./build.sh cuda all 3.11 --install 2>&1 | tee build.log

grep -q \"Successfully installed uccl-0.0.1.post4\" build.log
"'
@@ -170,8 +170,8 @@ jobs:
exit 1
fi

if ! python -c 'import torch; import uccl.ep'; then
echo 'Import of torch and uccl.ep failed. Cleaning up and exiting...'
if ! python -c 'import torch; import uccl_ep'; then
echo 'Import of torch and uccl_ep failed. Cleaning up and exiting...'
python setup.py clean
exit 1
fi
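The updated import gate can be reproduced outside CI. A minimal sketch of the pattern — the helper name `check_import` is illustrative, and `uccl_ep` is the module name this PR switches to:

```shell
#!/usr/bin/env bash
# Illustrative helper mirroring the CI gate: run a Python import and fail fast.
check_import() {
  local stmt="$1"
  if ! python3 -c "$stmt" 2>/dev/null; then
    echo "Import failed: $stmt"
    return 1
  fi
  echo "Import OK: $stmt"
}

# In the workflow this would be: check_import 'import torch; import uccl_ep'
check_import 'import json'
```

In the workflow, a failed import additionally triggers `python setup.py clean` before exiting.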
52 changes: 8 additions & 44 deletions README.md
@@ -93,12 +93,17 @@ The easiest way to use UCCL is to first build based on your platform. The build

```bash
git clone https://github.com/uccl-project/uccl.git --recursive && cd uccl
bash build_and_install.sh [cuda|rocm|therock] [all|ccl_rdma|ccl_efa|p2p|ep] [py_version] [rocm_index_url]
# Eg, bash build_and_install.sh cuda ep

# For collective and p2p: eg, bash build.sh cuda ccl_rdma --install
bash build.sh [cuda|rocm|therock] [all|ccl_rdma|ccl_efa|p2p] [py_version] [rocm_index_url] --install

# For ep:
cd ep && bash build.sh [cuda|rocm] [py_version] --install
```
> Note:
> - When building for ROCm with Python packaging through TheRock, specify your ROCm index URL; the default is `https://rocm.prereleases.amd.com/whl/gfx94X-dcgpu`, which may not be what you want. When installing UCCL wheels for TheRock, pass pip the index URL and add the optional `[rocm]` extra to the wheel, e.g., `pip install --extra-index-url https://rocm.prereleases.amd.com/whl/gfx94X-dcgpu wheelhouse-therock/uccl-0.0.1.post4-py3-none-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl[rocm]`.
> - You can build with different CUDA or ROCm versions by specifying tags such as `cuda13` or `rocm6`. The defaults are CUDA 12.x for the `cuda` tag and ROCm 7.x for the `rocm` tag.
> - Check [docs/wheel_build.md](./docs/wheel_build.md) for details.

Then, when running your PyTorch applications, set the environment variable accordingly:
```bash
@@ -127,8 +132,6 @@ export UCCL_HOME=$(pwd)/uccl
```

To build UCCL for development, you need to install some common dependencies:
<details><summary>Click me</summary>

```bash
# Note if you are using docker+wheel build, there is no need to install the following dependencies.
sudo apt update
@@ -149,7 +152,6 @@ pip install paramiko pybind11
# Upgrade conda glibc to a modern version
conda install -c conda-forge "libstdcxx-ng>=12" "libgcc-ng>=12"
```
</details>

For quick installation with docker, you can directly dive into:
* [`UCCL-Collective RDMA`](collective/rdma/README.md): Collectives for Nvidia/AMD GPUs + IB/RoCE RDMA NICs (currently support Nvidia and Broadcom NICs)
@@ -160,44 +162,6 @@ For quick installation with docker, you can directly dive into:
* [`UCCL-P2P`](p2p/README.md): P2P for RDMA NICs and GPU IPCs (currently support Nvidia/AMD GPUs and Nvidia/Broadcom NICs)
* [`UCCL-EP`](ep/README.md): EP for MoE training and inference with DeepEP-compatible APIs (currently support Nvidia/AMD GPUs and Nvidia/Broadcom/EFA NICs)

### Python Wheel Build

Run the following to build Python wheels:
```bash
cd $UCCL_HOME
./build.sh [cuda|rocm|therock] [all|rdma|p2p|efa|ep] [py_version] [rocm_index_url]
```

Run the following to install the wheels locally:
```bash
cd $UCCL_HOME
pip install wheelhouse-[cuda/rocm]/uccl-*.whl
```

The cross-compilation matrix is as follows:

| Platform/Feature | rdma-cuda | rdma-rocm | rdma-arm | p2p-cuda | p2p-rocm | p2p-arm | efa |
|--------------------|-----------|-----------|----------|----------|----------|---------|-----|
| cuda + x86 | ✓ | ✓ | x | ✓ | ✓ | x | ✓ |
| cuda + arm (gh200) | ✓ | x | x | ✓ | x | x | x |
| rocm + x86 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | x |
| aws p4d/p4de | ✓ | ✓ | x | ✓ | x | x | ✓ |

Note that you need ARM hosts to build ARM wheels, as cross-compilation tool `qemu-user-static` cannot emulate CUDA or ROCm.

### On Cloudlab CPU Machines

If you want to build nccl and nccl-tests on cloudlab ubuntu22, you need to install cuda and openmpi:

```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo apt install ./cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install cuda-toolkit -y
sudo apt install nvidia-driver-550 nvidia-utils-550 -y
sudo apt-get install openmpi-bin openmpi-doc libopenmpi-dev -y
```

</details>

## Citation
@@ -212,7 +176,7 @@ The code in this repository is mostly described in the papers below. Please cons
}
```
```bibtex
@article{mao2025uccl,
@article{uccl_ep,
title={UCCL-EP: Portable Expert-Parallel Communication},
author={Mao, Ziming and Zhang, Yihan and Cui, Chihan and You, Kaichao and Chen, Zhongjie and Xu, Zhiying and Shenker, Scott and Raiciu, Costin and Zhou, Yang and Stoica, Ion},
journal={arXiv preprint arXiv:2512.19849},
88 changes: 50 additions & 38 deletions build.sh
@@ -7,7 +7,7 @@ set -e
# a purpose-built Docker/Podman image derived from Ubuntu 22.04.
#
# Usage:
# ./build.sh [cuda|rocm|therock] [all|ccl_rdma|ccl_efa|p2p|ep] [py_version] [rocm_index_url] [therock_base_image]
# ./build.sh [cuda|rocm|therock] [all|ccl_rdma|ccl_efa|p2p] [py_version] [rocm_index_url] [therock_base_image] [--install]
#
# Environment Variables:
# CONTAINER_ENGINE=podman Use podman instead of docker.
@@ -18,9 +18,19 @@ set -e
# The wheels are written to wheelhouse-[cuda|rocm|therock]
# -----------------------

TARGET=${1:-cuda}
BUILD_TYPE=${2:-all}
PY_VER=${3:-$(python3 -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')")}
# Parse arguments: positional args + --install flag
DO_INSTALL=0
POSITIONAL_ARGS=()
for arg in "$@"; do
case "$arg" in
--install) DO_INSTALL=1 ;;
*) POSITIONAL_ARGS+=("$arg") ;;
esac
done

TARGET=${POSITIONAL_ARGS[0]:-cuda}
BUILD_TYPE=${POSITIONAL_ARGS[1]:-all}
PY_VER=${POSITIONAL_ARGS[2]:-$(python3 -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')")}
ARCH="$(uname -m)"

# Container engine: "docker" (default) or "podman"
@@ -30,14 +40,14 @@ if [[ "$CONTAINER_ENGINE" != "docker" && "$CONTAINER_ENGINE" != "podman" ]]; the
exit 1
fi
# The default for ROCM_IDX_URL depends on the gfx architecture of your GPU and the index URLs may change.
ROCM_IDX_URL=${4:-https://rocm.prereleases.amd.com/whl/gfx94X-dcgpu}
ROCM_IDX_URL=${POSITIONAL_ARGS[3]:-https://rocm.prereleases.amd.com/whl/gfx94X-dcgpu}
# The default for THEROCK_BASE_IMAGE is current, but may change. Make sure to track TheRock's dockerfile.
THEROCK_BASE_IMAGE=${5:-quay.io/pypa/manylinux_2_28_x86_64@sha256:d632b5e68ab39e59e128dcf0e59e438b26f122d7f2d45f3eea69ffd2877ab017}
THEROCK_BASE_IMAGE=${POSITIONAL_ARGS[4]:-quay.io/pypa/manylinux_2_28_x86_64@sha256:d632b5e68ab39e59e128dcf0e59e438b26f122d7f2d45f3eea69ffd2877ab017}
IS_EFA=$( [ -d "/sys/class/infiniband/" ] && ls /sys/class/infiniband/ 2>/dev/null | grep -q rdmap && echo "EFA support: true" ) || echo "EFA support: false"


if [[ $TARGET != cuda* && $TARGET != rocm* && $TARGET != "therock" ]]; then
echo "Usage: $0 [cuda|rocm|therock] [all|ccl_rdma|ccl_efa|p2p|ep] [py_version] [rocm_index_url] [therock_base_image]" >&2
echo "Usage: $0 [cuda|rocm|therock] [all|ccl_rdma|ccl_efa|p2p] [py_version] [rocm_index_url] [therock_base_image] [--install]" >&2
exit 1
fi

@@ -175,32 +185,6 @@ build_p2p() {
fi
}

build_ep() {
local TARGET="$1"
local ARCH="$2"
local IS_EFA="$3"

set -euo pipefail
echo "[container] build_ep Target: $TARGET"

if [[ "${USE_INTEL_RDMA_NIC:-0}" == "1" ]]; then
echo "[container] Building EP with Intel RDMA NIC support (USE_INTEL_RDMA_NIC=1)"
fi

if [[ "$TARGET" == "therock" ]]; then
echo "Skipping GPU-driven build on therock (no GPU-driven support yet)."
elif [[ "$TARGET" == rocm* || "$TARGET" == cuda* ]]; then
cd ep
# This may be needed if you traverse through different git commits
# make clean && rm -r build || true
USE_INTEL_RDMA_NIC=${USE_INTEL_RDMA_NIC:-0} python3 setup.py build
cd ..
echo "[container] Copying GPU-driven .so to uccl/"
mkdir -p uccl/lib
cp ep/build/**/*.so uccl/
fi
}

build_ukernel() {
local TARGET="$1"
local ARCH="$2"
@@ -303,7 +287,7 @@ echo "[2/3] Running build inside container..."

# Auto-detect CUDA architecture for ep build
DETECTED_GPU_ARCH=""
if [[ "$BUILD_TYPE" =~ (ep|all|p2p) ]];then
if [[ "$BUILD_TYPE" =~ (all|p2p) ]];then
if [[ "$TARGET" == cuda* ]] && command -v nvidia-smi &> /dev/null; then
DETECTED_GPU_ARCH="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n1 | tr -d ' ' || true)"

@@ -387,7 +371,7 @@ ${CONTAINER_ENGINE} "${CONTAINER_RUN_ARGS[@]}" \
-e TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST:-}" \
-e DISABLE_AGGRESSIVE_ATOMIC="${DISABLE_AGGRESSIVE_ATOMIC:-0}" \
-e UCCL_WHEEL_PLAT="${UCCL_WHEEL_PLAT:-}" \
-e FUNCTION_DEF="$(declare -f build_rccl_nccl_h build_ccl_rdma build_ccl_efa build_p2p build_ep build_ukernel)" \
-e FUNCTION_DEF="$(declare -f build_rccl_nccl_h build_ccl_rdma build_ccl_efa build_p2p build_ukernel)" \
-w /io \
"$IMAGE_NAME" /bin/bash -c '
set -euo pipefail
@@ -417,15 +401,12 @@ ${CONTAINER_ENGINE} "${CONTAINER_RUN_ARGS[@]}" \
build_ccl_efa "$TARGET" "$ARCH" "$IS_EFA"
elif [[ "$BUILD_TYPE" == "p2p" ]]; then
build_p2p "$TARGET" "$ARCH" "$IS_EFA"
elif [[ "$BUILD_TYPE" == "ep" ]]; then
build_ep "$TARGET" "$ARCH" "$IS_EFA"
elif [[ "$BUILD_TYPE" == "ukernel" ]]; then
build_ukernel "$TARGET" "$ARCH" "$IS_EFA"
elif [[ "$BUILD_TYPE" == "all" ]]; then
build_ccl_rdma "$TARGET" "$ARCH" "$IS_EFA"
build_ccl_efa "$TARGET" "$ARCH" "$IS_EFA"
build_p2p "$TARGET" "$ARCH" "$IS_EFA"
# build_ep "$TARGET" "$ARCH" "$IS_EFA"
# build_ukernel "$TARGET" "$ARCH" "$IS_EFA"
fi

@@ -527,3 +508,34 @@ def initialize():
# 3. Done
echo "[3/3] Wheel built successfully (stored in ${WHEEL_DIR}):"
ls -lh "${WHEEL_DIR}"/uccl-*.whl || true

# 4. Optionally install the built wheel
if [[ "$DO_INSTALL" == "1" ]]; then
# Auto-detect uv vs pip
if command -v uv &> /dev/null && [[ -n "${VIRTUAL_ENV:-}" ]]; then
PIP_CMD="uv pip"
else
PIP_CMD="pip"
fi
echo "[4/4] Installing uccl wheel (using ${PIP_CMD})..."
${PIP_CMD} install -r requirements.txt 2>/dev/null || true
${PIP_CMD} uninstall uccl -y 2>/dev/null || true
if [[ "$TARGET" != "therock" ]]; then
${PIP_CMD} install "${WHEEL_DIR}"/uccl-*.whl --no-deps
else
${PIP_CMD} install --extra-index-url "${ROCM_IDX_URL}" "$(ls "${WHEEL_DIR}"/uccl-*.whl)[rocm]"
fi

UCCL_INSTALL_PATH=$(${PIP_CMD} show uccl 2>/dev/null | grep "^Location:" | cut -d' ' -f2 || echo "")
if [[ -n "$UCCL_INSTALL_PATH" && -d "$UCCL_INSTALL_PATH" ]]; then
UCCL_PACKAGE_PATH="$UCCL_INSTALL_PATH/uccl"
if [[ -d "$UCCL_PACKAGE_PATH" ]]; then
echo "UCCL installed at: $UCCL_PACKAGE_PATH"
echo "Set LIBRARY_PATH: export LIBRARY_PATH=\"$UCCL_PACKAGE_PATH/lib:\$LIBRARY_PATH\""
else
echo "UCCL package directory not found at: $UCCL_PACKAGE_PATH"
fi
else
echo "Warning: Could not detect UCCL installation path"
fi
fi
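The argument handling introduced above can be sketched in isolation. This is not the full script, only the parse: `parse_build_args` is a hypothetical wrapper, while the variable names (`DO_INSTALL`, `POSITIONAL_ARGS`, `TARGET`, `BUILD_TYPE`, `PY_VER`) match the diff. The `PY_VER` default is hardcoded here instead of detected from `python3`:

```shell
#!/usr/bin/env bash
# Separate the --install flag from positional arguments, as build.sh now does.
parse_build_args() {
  DO_INSTALL=0
  POSITIONAL_ARGS=()
  local arg
  for arg in "$@"; do
    case "$arg" in
      --install) DO_INSTALL=1 ;;
      *) POSITIONAL_ARGS+=("$arg") ;;
    esac
  done
  # Positional defaults mirror the script: target, build type, python version.
  TARGET=${POSITIONAL_ARGS[0]:-cuda}
  BUILD_TYPE=${POSITIONAL_ARGS[1]:-all}
  PY_VER=${POSITIONAL_ARGS[2]:-3.11}   # real script detects this via python3
}

parse_build_args cuda p2p 3.11 --install
echo "target=$TARGET build=$BUILD_TYPE py=$PY_VER install=$DO_INSTALL"
# → target=cuda build=p2p py=3.11 install=1
```

Because the flag is filtered out before positionals are assigned, `--install` may appear anywhere on the command line.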
60 changes: 0 additions & 60 deletions build_and_install.sh

This file was deleted.

2 changes: 1 addition & 1 deletion collective/efa/README.md
@@ -44,7 +44,7 @@ make MPI=1 MPI_HOME=/opt/amazon/openmpi CUDA_HOME=/usr/local/cuda NCCL_HOME=$UCC

The easiest way is to use docker, which packs all needed external libraries into a Python wheel and installs it into your local Python env:
```bash
cd $UCCL_HOME && bash build_and_install.sh cuda efa
cd $UCCL_HOME && bash build.sh cuda efa --install
```

The following alternative is best for development where you have installed all needed external libraries:
4 changes: 2 additions & 2 deletions collective/rdma/README.md
@@ -58,7 +58,7 @@ make MPI=1 MPI_HOME=/usr/lib/x86_64-linux-gnu/openmpi CUDA_HOME=/usr/local/cuda

The easiest way is to use docker, which packs all needed external libraries into a Python wheel and installs it into your local Python env:
```bash
cd $UCCL_HOME && bash build_and_install.sh cuda rdma
cd $UCCL_HOME && bash build.sh cuda rdma --install
```

The following alternative is best for development where you have installed all needed external libraries:
@@ -107,7 +107,7 @@ make MPI=1 MPI_HOME=/opt/ohpc/pub/mpi/openmpi4-gnu12/4.1.5 HIP_HOME=/opt/rocm-6.

The easiest way is to use docker:
```bash
cd $UCCL_HOME && bash build_and_install.sh rocm rdma
cd $UCCL_HOME && bash build.sh rocm rdma --install
```

The following alternative is best for development where you have installed all needed external libraries: