Merge pull request #115 from shlee007/dev/docker
Unify docker images and update inference scripts.
JacobKong authored Dec 12, 2024
2 parents 3ef9a88 + 80a1a69 commit fae0bec
Showing 8 changed files with 104 additions and 172 deletions.
48 changes: 19 additions & 29 deletions README.md
@@ -85,7 +85,6 @@ The video is heavily compressed to comply with GitHub's policy. The high-quality
- [Run a Gradio Server](#run-a-gradio-server)
- [More Configurations](#more-configurations)
- [🚀 Parallel Inference on Multiple GPUs by xDiT](#-parallel-inference-on-multiple-gpus-by-xdit)
- [Install Dependencies Compatible with xDiT](#install-dependencies-compatible-with-xdit)
- [Using Command Line](#using-command-line-1)
- [🔗 BibTeX](#-bibtex)
- [🧩 Projects that use HunyuanVideo](#-projects-that-use-hunyuanvideo)
@@ -201,24 +200,32 @@ cd HunyuanVideo

### Installation Guide for Linux

We provide an `environment.yml` file for setting up a Conda environment.
Conda's installation instructions are available [here](https://docs.anaconda.com/free/miniconda/index.html).

We recommend CUDA versions 12.4 or 11.8 for the manual installation.

Conda's installation instructions are available [here](https://docs.anaconda.com/free/miniconda/index.html).

```shell
# 1. Prepare conda environment
conda env create -f environment.yml
# 1. Create conda environment
conda create -n HunyuanVideo python==3.10.9

# 2. Activate the environment
conda activate HunyuanVideo

# 3. Install pip dependencies
# 3. Install PyTorch and other dependencies using conda
# For CUDA 11.8
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# For CUDA 12.4
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

# 4. Install pip dependencies
python -m pip install -r requirements.txt

# 4. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/[email protected]

# 6. Install xDiT for parallel inference (It is recommended to use torch 2.4.0 and flash-attn 2.6.3)
python -m pip install xfuser==0.4.0
```
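
Step 3 above lists two alternative `conda install` lines, one per CUDA version, and only one of them should be run. As a minimal sketch (the function name `pytorch_install_cmd` is hypothetical and not part of this repository), the choice can be made explicit so that an unsupported version fails early instead of installing a mismatched build:

```shell
# Hypothetical helper (not part of the repo): print the conda command
# for one of the two CUDA versions recommended in the guide above.
pytorch_install_cmd() {
  case "$1" in
    11.8|12.4)
      echo "conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=$1 -c pytorch -c nvidia"
      ;;
    *)
      # Any other version is outside what the guide recommends.
      echo "unsupported CUDA version: $1 (expected 11.8 or 12.4)" >&2
      return 1
      ;;
  esac
}

# Print the command for CUDA 12.4; review it, then run it manually.
pytorch_install_cmd 12.4
```

The helper only prints the command, so nothing is installed until you run the printed line yourself.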

If you run into a floating point exception (core dump) on a specific GPU type, try the following solutions:
@@ -230,9 +237,12 @@ export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/

# Option 2: Force the explicit use of the CUDA 11.8 build of PyTorch and all other packages
pip uninstall -r requirements.txt # uninstall all packages
pip uninstall -y xfuser
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
python -m pip install git+https://github.com/Dao-AILab/[email protected]
pip install ninja
pip install git+https://github.com/Dao-AILab/[email protected]
pip install xfuser==0.4.0
```
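
The `LD_LIBRARY_PATH` in Option 1 hard-codes a `python3.8` site-packages directory, while the conda environment created above uses Python 3.10.9, so the path may not exist in a manual install. A hedged sketch of deriving the directory from the active interpreter instead is shown below. It assumes the `nvidia-cublas-cu12` wheel installs under `nvidia/cublas/lib` in the interpreter's `platlib` directory, as the path in Option 1 suggests; `cublas_lib_dir` is a hypothetical name.

```shell
# Hypothetical helper (not part of the repo): print the cuBLAS library
# directory under the active interpreter's site-packages.
# Assumption: the wheel layout is <site-packages>/nvidia/cublas/lib.
cublas_lib_dir() {
  python3 -c 'import sysconfig; print(sysconfig.get_paths()["platlib"] + "/nvidia/cublas/lib")'
}

# Prepend the directory only if it actually exists in this environment.
dir="$(cublas_lib_dir)"
if [ -d "$dir" ]; then
  export LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
fi
```

Unlike the one-liner in Option 1, this prepends to any existing `LD_LIBRARY_PATH` rather than replacing it.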

HunyuanVideo also provides a pre-built Docker image. Use the following commands to pull and run it.
@@ -306,26 +316,6 @@ We list some more useful configurations for easy usage:
[xDiT](https://github.com/xdit-project/xDiT) is a scalable inference engine for Diffusion Transformers (DiTs) on multi-GPU clusters.
It has provided low-latency parallel inference solutions for a variety of DiT models, including mochi-1, CogVideoX, Flux.1, SD3, etc. This repo adopts the [Unified Sequence Parallelism (USP)](https://arxiv.org/abs/2405.07719) APIs for parallel inference of the HunyuanVideo model.

### Install Dependencies Compatible with xDiT

```
# 1. Create a blank conda environment
conda create -n hunyuanxdit python==3.10.9
conda activate hunyuanxdit
# 2. Install PyTorch components with CUDA 11.8
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# 3. Install pip dependencies
python -m pip install -r requirements_xdit.txt
```

You can skip the above steps and pull the pre-built docker image directly, which is built from [docker/Dockerfile_xDiT](./docker/Dockerfile_xDiT)

```
docker pull thufeifeibear/hunyuanvideo:latest
```

### Using Command Line

For example, to generate a video with 8 GPUs, you can use the following command:
61 changes: 26 additions & 35 deletions README_zh.md
@@ -81,12 +81,11 @@
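- [🛠️ Dependencies and Installation](#️-安装和依赖)
- [Installation Guide for Linux](#linux-安装指引)
- [🧱 Download Pretrained Models](#-下载预训练模型)
- [🔑 Inference](#-推理)
- [🔑 Single-GPU Inference](#-单卡推理)
- [Using Command Line](#使用命令行)
- [Run a Gradio Server](#运行gradio服务)
- [More Configurations](#更多配置)
- [🚀 Parallel Inference on Multiple GPUs by xDiT](#-使用-xdit-实现多卡并行推理)
- [Install Dependencies Compatible with xDiT](#安装与-xdit-兼容的依赖项)
- [Using Command Line](#使用命令行-1)
- [🔗 BibTeX](#-bibtex)
- [🧩 Projects that use HunyuanVideo](#-使用-hunyuanvideo-的项目)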
@@ -194,46 +193,58 @@ cd HunyuanVideo

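### Installation Guide for Linux

We provide an `environment.yml` file for setting up a Conda environment. Conda's installation instructions are available [here](https://docs.anaconda.com/free/miniconda/index.html).
We recommend CUDA version 12.4 or 11.8.

We run inference with CUDA version 12.4 or 11.8.
Conda's installation instructions are available [here](https://docs.anaconda.com/free/miniconda/index.html).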

```shell
# 1. Prepare conda environment
conda env create -f environment.yml
# 1. Create conda environment
conda create -n HunyuanVideo python==3.10.9

# 2. Activate the environment
conda activate HunyuanVideo

# 3. Install pip dependencies
# 3. Install PyTorch and other dependencies using conda
# For CUDA 11.8
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# For CUDA 12.4
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

# 4. Install pip dependencies
python -m pip install -r requirements.txt

# 4. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/[email protected]

# 6. Install xDiT for parallel inference (It is recommended to use torch 2.4.0 and flash-attn 2.6.3)
python -m pip install xfuser==0.4.0
```

If you encounter a float point exception(core dump) on a specific GPU model, try the following fixes:
If you encounter a float point exception (core dump) on a specific GPU model, try the following fixes:

```shell
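# Option 1: Make sure CUDA 12.4, CUBLAS>=12.4.5.8, and CUDNN>=9.00 are installed correctly (or simply use our CUDA 12 docker image)
# Option 1: Make sure CUDA 12.4, CUBLAS>=12.4.5.8, CUDNN>=9.00 are installed correctly (or simply use our CUDA 12 docker image)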
pip install nvidia-cublas-cu12==12.4.5.8
export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/

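# Option 2: Force the explicit use of the CUDA11.8 build of Pytorch and all other packages
# Option 2: Force the explicit use of the CUDA 11.8 build of PyTorch and all other packages
pip uninstall -r requirements.txt # make sure all dependencies are uninstalled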
pip uninstall -y xfuser
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
python -m pip install git+https://github.com/Dao-AILab/[email protected]
pip install ninja
pip install git+https://github.com/Dao-AILab/[email protected]
pip install xfuser==0.4.0
```

Additionally, we provide a pre-built Docker image. Use the following commands to pull and run it.
```shell
# For CUDA 12.4 (updated to avoid float point exception)
# For CUDA 12.4 (updated to avoid float point exception)
docker pull hunyuanvideo/hunyuanvideo:cuda_12
docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_12

# For CUDA 11.8
# For CUDA 11.8
docker pull hunyuanvideo/hunyuanvideo:cuda_11
docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_11
```
@@ -242,7 +253,7 @@ docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyua

To download the pretrained models, see [here](ckpts/README.md).

## 🔑 Inference
## 🔑 Single-GPU Inference
We list the supported height/width/frame settings in the table below.

| Resolution | h/w=9:16 | h/w=16:9 | h/w=4:3 | h/w=3:4 | h/w=1:1 |
@@ -297,26 +308,6 @@ python3 gradio_server.py --flow-reverse
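[xDiT](https://github.com/xdit-project/xDiT) is a scalable inference engine for Diffusion Transformers (DiTs) on multi-GPU clusters.
It has provided low-latency parallel inference solutions for a variety of DiT models, including mochi-1, CogVideoX, Flux.1, SD3, etc. This repository adopts the [Unified Sequence Parallelism (USP)](https://arxiv.org/abs/2405.07719) APIs for parallel inference of the HunyuanVideo model.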

### Install Dependencies Compatible with xDiT

```
# 1. Create a blank conda environment
conda create -n hunyuanxdit python==3.10.9
conda activate hunyuanxdit
# 2. Install PyTorch components with CUDA 11.8
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# 3. Install pip dependencies
python -m pip install -r requirements_xdit.txt
```

You can skip the steps above and pull the pre-built Docker image directly, which is built from [docker/Dockerfile_xDiT](./docker/Dockerfile_xDiT)

```
docker pull thufeifeibear/hunyuanvideo:latest
```

### Using Command Line

For example, to run inference on 8 GPUs, you can use the following command:
41 changes: 0 additions & 41 deletions docker/Dockerfile_xDiT

This file was deleted.

8 changes: 0 additions & 8 deletions environment.yml

This file was deleted.

6 changes: 2 additions & 4 deletions requirements.txt
@@ -1,6 +1,5 @@
torchvision==0.16.1
opencv-python==4.9.0.80
diffusers==0.30.2
diffusers==0.31.0
transformers==4.46.3
tokenizers==0.20.3
accelerate==1.1.1
@@ -12,5 +11,4 @@ loguru==0.7.2
imageio==2.34.0
imageio-ffmpeg==0.5.1
safetensors==0.4.3
gradio==4.43.0
urllib3==1.26.6
gradio==4.43.0
16 changes: 0 additions & 16 deletions requirements_xdit.txt

This file was deleted.

50 changes: 11 additions & 39 deletions scripts/run_sample_video.sh
@@ -1,42 +1,14 @@
#!/bin/bash
# Description: This script demonstrates how to generate a video with the HunyuanVideo model

TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node=8 \
sample_video.py --video-size 1280 720 --video-length 129 \
--infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
--flow-reverse --ulysses-degree=8 --ring-degree=1 --seed 42 --save-path ./results

TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node 4 \
sample_video.py --video-size 1280 720 --video-length 129 \
--infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
--flow-reverse --ulysses-degree=4 --ring-degree=1 --seed 42 --save-path ./results

TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node=2 \
sample_video.py --video-size 1280 720 --video-length 129 \
--infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
--flow-reverse --ulysses-degree=2 --ring-degree=1 --seed 42 --save-path ./results

TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node=1 \
sample_video.py --video-size 1280 720 --video-length 129 \
--infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
--flow-reverse --ulysses-degree=1 --ring-degree=1 --seed 42 --save-path ./results

TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node=6 \
sample_video.py --video-size 960 960 --video-length 129 \
--infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
--flow-reverse --ulysses-degree=6 --ring-degree=1 --seed 42 --save-path ./results

TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node=3 \
sample_video.py --video-size 960 960 --video-length 129 \
--infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
--flow-reverse --ulysses-degree=3 --ring-degree=1 --seed 42 --save-path ./results

TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node=2 \
sample_video.py --video-size 960 960 --video-length 129 \
--infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
--flow-reverse --ulysses-degree=2 --ring-degree=1 --seed 42 --save-path ./results

TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node=1 \
sample_video.py --video-size 1280 720 --video-length 129 \
--infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
--flow-reverse --ulysses-degree=1 --ring-degree=1 --seed 42 --save-path ./results
python3 sample_video.py \
--video-size 720 1280 \
--video-length 129 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--seed 42 \
--embedded-cfg-scale 6.0 \
--flow-shift 7.0 \
--flow-reverse \
--use-cpu-offload \
--save-path ./results
46 changes: 46 additions & 0 deletions scripts/run_sample_video_multigpu.sh
@@ -0,0 +1,46 @@
#!/bin/bash
# Description: This script demonstrates how to generate a video with the HunyuanVideo model

# Supported Parallel Configurations
# | --video-size | --video-length | --ulysses-degree x --ring-degree | --nproc_per_node |
# |----------------------|----------------|----------------------------------|------------------|
# | 1280 720 or 720 1280 | 129 | 8x1,4x2,2x4,1x8 | 8 |
# | 1280 720 or 720 1280 | 129 | 1x5 | 5 |
# | 1280 720 or 720 1280 | 129 | 4x1,2x2,1x4 | 4 |
# | 1280 720 or 720 1280 | 129 | 3x1,1x3 | 3 |
# | 1280 720 or 720 1280 | 129 | 2x1,1x2 | 2 |
# | 1104 832 or 832 1104 | 129 | 4x1,2x2,1x4 | 4 |
# | 1104 832 or 832 1104 | 129 | 3x1,1x3 | 3 |
# | 1104 832 or 832 1104 | 129 | 2x1,1x2 | 2 |
# | 960 960 | 129 | 6x1,3x2,2x3,1x6 | 6 |
# | 960 960 | 129 | 4x1,2x2,1x4 | 4 |
# | 960 960 | 129 | 3x1,1x3 | 3 |
# | 960 960 | 129 | 1x2,2x1 | 2 |
# | 960 544 or 544 960 | 129 | 6x1,3x2,2x3,1x6 | 6 |
# | 960 544 or 544 960 | 129 | 4x1,2x2,1x4 | 4 |
# | 960 544 or 544 960 | 129 | 3x1,1x3 | 3 |
# | 960 544 or 544 960 | 129 | 1x2,2x1 | 2 |
# | 832 624 or 624 832 | 129 | 4x1,2x2,1x4 | 4 |
# | 832 624 or 624 832 | 129 | 3x1,1x3 | 3 |
# | 832 624 or 624 832 | 129 | 2x1,1x2 | 2 |
# | 720 720 | 129 | 1x5 | 5 |
# | 720 720 | 129 | 3x1,1x3 | 3 |

export TOKENIZERS_PARALLELISM=false

export NPROC_PER_NODE=8
export ULYSSES_DEGREE=8
export RING_DEGREE=1

torchrun --nproc_per_node=$NPROC_PER_NODE sample_video.py \
--video-size 720 1280 \
--video-length 129 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--seed 42 \
--embedded-cfg-scale 6.0 \
--flow-shift 7.0 \
--flow-reverse \
--ulysses-degree=$ULYSSES_DEGREE \
--ring-degree=$RING_DEGREE \
--save-path ./results
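
In every row of the configuration table in this script, `--ulysses-degree` multiplied by `--ring-degree` equals `--nproc_per_node`. A minimal sketch of a pre-flight check for that relationship follows; `check_parallel_config` is a hypothetical helper, not part of the repository. The check is necessary but not sufficient, because the table also restricts which process counts each video size supports.

```shell
# Hypothetical helper (not part of the repo): succeed only when
# ulysses-degree x ring-degree equals the number of processes.
check_parallel_config() {
  local nproc="$1" ulysses="$2" ring="$3"
  if [ $((ulysses * ring)) -ne "$nproc" ]; then
    echo "error: ulysses-degree ($ulysses) x ring-degree ($ring) != nproc_per_node ($nproc)" >&2
    return 1
  fi
  echo "ok: ${ulysses}x${ring} on ${nproc} GPUs"
}

# Two combinations taken from the table: 8x1 on 8 GPUs and 2x2 on 4 GPUs.
check_parallel_config 8 8 1
check_parallel_config 4 2 2
```

Calling it as `check_parallel_config "$NPROC_PER_NODE" "$ULYSSES_DEGREE" "$RING_DEGREE" || exit 1` before the `torchrun` line would stop the script early on a mismatched combination.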
