Commit
Merge pull request #115 from shlee007/dev/docker
Unify docker images and update inference scripts.
Showing 8 changed files with 104 additions and 172 deletions.
````diff
@@ -85,7 +85,6 @@ The video is heavily compressed to comply with GitHub's policy. The high-quality
 - [Run a Gradio Server](#run-a-gradio-server)
 - [More Configurations](#more-configurations)
 - [🚀 Parallel Inference on Multiple GPUs by xDiT](#-parallel-inference-on-multiple-gpus-by-xdit)
-- [Install Dependencies Compatible with xDiT](#install-dependencies-compatible-with-xdit)
 - [Using Command Line](#using-command-line-1)
 - [🔗 BibTeX](#-bibtex)
 - [🧩 Projects that use HunyuanVideo](#-projects-that-use-hunyuanvideo)
````
````diff
@@ -201,24 +200,32 @@ cd HunyuanVideo
 
 ### Installation Guide for Linux
 
-We provide an `environment.yml` file for setting up a Conda environment.
-Conda's installation instructions are available [here](https://docs.anaconda.com/free/miniconda/index.html).
-
 We recommend CUDA versions 12.4 or 11.8 for the manual installation.
 
+Conda's installation instructions are available [here](https://docs.anaconda.com/free/miniconda/index.html).
+
 ```shell
-# 1. Prepare conda environment
-conda env create -f environment.yml
+# 1. Create conda environment
+conda create -n HunyuanVideo python==3.10.9
 
 # 2. Activate the environment
 conda activate HunyuanVideo
 
-# 3. Install pip dependencies
+# 3. Install PyTorch and other dependencies using conda
+# For CUDA 11.8
+conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
+# For CUDA 12.4
+conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
+
+# 4. Install pip dependencies
 python -m pip install -r requirements.txt
 
-# 4. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
+# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
 python -m pip install ninja
 python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
+
+# 6. Install xDiT for parallel inference (It is recommended to use torch 2.4.0 and flash-attn 2.6.3)
+python -m pip install xfuser==0.4.0
 ```
 
 In case of running into float point exception(core dump) on the specific GPU type, you may try the following solutions:
````
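Step 3 of the updated guide asks the reader to copy one of two `conda install` lines depending on the local CUDA version. A minimal sketch of the same choice as a function, assuming a hypothetical helper name `install_pytorch` and a `DRY_RUN` switch (neither exists in the repository); it accepts only the two CUDA versions the README lists and otherwise runs exactly the command given there:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: install the PyTorch 2.4.0
# stack pinned in the README for one of its two supported CUDA versions.
# Set DRY_RUN=1 to print the conda command instead of running it.
install_pytorch() {
  local cuda="$1"
  case "$cuda" in
    11.8|12.4) ;;  # the only versions the README gives commands for
    *)
      echo "unsupported CUDA version: '$cuda' (expected 11.8 or 12.4)" >&2
      return 1
      ;;
  esac
  # Same packages, versions, and channels as step 3 of the README.
  local -a cmd
  cmd=(conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0
       "pytorch-cuda=$cuda" -c pytorch -c nvidia)
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage (inside the activated HunyuanVideo environment):
#   install_pytorch 12.4            # or 11.8
#   DRY_RUN=1 install_pytorch 11.8  # print only
```

Rejecting other versions up front avoids silently installing a `pytorch-cuda` build that the README does not cover.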
````diff
@@ -230,9 +237,12 @@ export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/
 
 # Option 2: Forcing to explictly use the CUDA 11.8 compiled version of Pytorch and all the other packages
 pip uninstall -r requirements.txt # uninstall all packages
+pip uninstall -y xfuser
 pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu118
 pip install -r requirements.txt
-python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
+pip install ninja
+pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
+pip install xfuser==0.4.0
 ```
 
 Additionally, HunyuanVideo also provides a pre-built Docker image. Use the following command to pull and run the docker image.
````
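This hunk's header quotes Option 1's `export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/`, which hardcodes a Python 3.8 prefix, while step 1 of the updated guide creates a Python 3.10.9 environment. A minimal sketch that derives the directory from whichever interpreter is active, assuming the `nvidia-cublas-cu12` wheel keeps the `nvidia/cublas/lib` layout that path implies; the helper names and the `PYTHON` override are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: find nvidia/cublas/lib under the active interpreter's site-packages
# instead of hardcoding /opt/conda/lib/python3.8/... (helper names are hypothetical).
# PYTHON defaults to python3; override it to point at another interpreter.
cublas_lib_dir() {
  local site
  site="$("${PYTHON:-python3}" -c 'import sysconfig; print(sysconfig.get_paths()["platlib"])')" || return 1
  echo "$site/nvidia/cublas/lib"
}

# Prepend the directory so any existing LD_LIBRARY_PATH entries are kept.
prepend_cublas_path() {
  local dir
  dir="$(cublas_lib_dir)" || return 1
  if [[ ! -d "$dir" ]]; then
    echo "not found: $dir (is nvidia-cublas-cu12 installed?)" >&2
    return 1
  fi
  export LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Usage, in the shell that will launch inference:
#   pip install nvidia-cublas-cu12==12.4.5.8
#   prepend_cublas_path
```

Unlike the one-line `export`, this prepends to `LD_LIBRARY_PATH` instead of overwriting it and fails loudly when the directory is missing. Call `prepend_cublas_path` in the same shell that starts inference, since an exported value does not propagate back from a child process.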
````diff
@@ -306,26 +316,6 @@ We list some more useful configurations for easy usage:
 [xDiT](https://github.com/xdit-project/xDiT) is a Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters.
 It has successfully provided low-latency parallel inference solutions for a variety of DiTs models, including mochi-1, CogVideoX, Flux.1, SD3, etc. This repo adopted the [Unified Sequence Parallelism (USP)](https://arxiv.org/abs/2405.07719) APIs for parallel inference of the HunyuanVideo model.
 
-### Install Dependencies Compatible with xDiT
-
-```
-# 1. Create a black conda environment
-conda create -n hunyuanxdit python==3.10.9
-conda activate hunyuanxdit
-# 3. Install PyTorch component with CUDA 11.8
-conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
-# 3. Install pip dependencies
-python -m pip install -r requirements_xdit.txt
-```
-
-You can skip the above steps and pull the pre-built docker image directly, which is built from [docker/Dockerfile_xDiT](./docker/Dockerfile_xDiT)
-
-```
-docker pull thufeifeibear/hunyuanvideo:latest
-```
-
 ### Using Command Line
 
 For example, to generate a video with 8 GPUs, you can use the following command:
````
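The removed section set up a separate `hunyuanxdit` environment from `requirements_xdit.txt`. After this change, step 6 of the installation guide installs `xfuser==0.4.0` into the main environment and recommends torch 2.4.0 and flash-attn 2.6.3 alongside it. A minimal sketch for checking those pins after installing, using Python's standard `importlib.metadata`; the helper names are hypothetical, and `flash_attn` is assumed to be the distribution name the flash-attention package registers:

```shell
#!/usr/bin/env bash
# Sketch: compare installed distribution versions with the pins the README
# recommends. Helper names are hypothetical, not part of the repository.
pkg_version() {
  "${PYTHON:-python3}" -c 'import sys, importlib.metadata as md; print(md.version(sys.argv[1]))' "$1" 2>/dev/null
}

# Succeeds when PKG is installed at EXPECTED. A local version label such as
# "+cu118" (added to wheels from the cu118 index) is ignored when comparing.
check_pkg() {
  local pkg="$1" expected="$2" actual
  actual="$(pkg_version "$pkg")" || { echo "$pkg: not installed" >&2; return 1; }
  if [[ "${actual%%+*}" != "$expected" ]]; then
    echo "$pkg: found $actual, expected $expected" >&2
    return 1
  fi
  echo "$pkg $actual OK"
}

# Usage (inside the activated HunyuanVideo environment):
#   check_pkg torch 2.4.0 && check_pkg flash_attn 2.6.3 && check_pkg xfuser 0.4.0
```

The local-version label is stripped because Option 2 installs torch from the `cu118` wheel index, whose versions typically carry a `+cu118` suffix.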
````diff
@@ -81,12 +81,11 @@
 - [🛠️ 安装和依赖](#️-安装和依赖)
 - [Linux 安装指引](#linux-安装指引)
 - [🧱 下载预训练模型](#-下载预训练模型)
-- [🔑 推理](#-推理)
+- [🔑 单卡推理](#-单卡推理)
 - [使用命令行](#使用命令行)
 - [运行gradio服务](#运行gradio服务)
 - [更多配置](#更多配置)
 - [🚀 使用 xDiT 实现多卡并行推理](#-使用-xdit-实现多卡并行推理)
-- [安装与 xDiT 兼容的依赖项](#安装与-xdit-兼容的依赖项)
 - [使用命令行](#使用命令行-1)
 - [🔗 BibTeX](#-bibtex)
 - [🧩 使用 HunyuanVideo 的项目](#-使用-hunyuanvideo-的项目)
````
````diff
@@ -194,46 +193,58 @@ cd HunyuanVideo
 
 ### Linux 安装指引
 
-我们提供了 `environment.yml` 文件来设置 Conda 环境。Conda 的安装指南可以参考[这里](https://docs.anaconda.com/free/miniconda/index.html)。
+我们推荐使用 CUDA 12.4 或 11.8 的版本。
 
-我们推理使用 CUDA 12.4 或 11.8 的版本。
+Conda 的安装指南可以参考[这里](https://docs.anaconda.com/free/miniconda/index.html)。
 
 ```shell
-# 1. Prepare conda environment
-conda env create -f environment.yml
+# 1. Create conda environment
+conda create -n HunyuanVideo python==3.10.9
 
 # 2. Activate the environment
 conda activate HunyuanVideo
 
-# 3. Install pip dependencies
+# 3. Install PyTorch and other dependencies using conda
+# For CUDA 11.8
+conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
+# For CUDA 12.4
+conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
+
+# 4. Install pip dependencies
 python -m pip install -r requirements.txt
 
-# 4. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
+# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
 python -m pip install ninja
 python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
+
+# 6. Install xDiT for parallel inference (It is recommended to use torch 2.4.0 and flash-attn 2.6.3)
+python -m pip install xfuser==0.4.0
 ```
 
-如果在特定GPU型号上遭遇float point exception(core dump)问题,可尝试以下方案修复:
+如果在特定 GPU 型号上遭遇 float point exception(core dump) 问题,可尝试以下方案修复:
 
 ```shell
-#选项1:确保已正确安装CUDA 12.4, CUBLAS>=12.4.5.8, and CUDNN>=9.00(或直接使用我们提供的CUDA12镜像)
+#选项1:确保已正确安装 CUDA 12.4, CUBLAS>=12.4.5.8, 和 CUDNN>=9.00 (或直接使用我们提供的CUDA12镜像)
 pip install nvidia-cublas-cu12==12.4.5.8
 export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/
 
-#选项2:强制显式使用CUDA11.8编译的Pytorch版本以及其他所有软件包
+#选项2:强制显式使用 CUDA11.8 编译的 Pytorch 版本以及其他所有软件包
 pip uninstall -r requirements.txt # 确保卸载所有依赖包
+pip uninstall -y xfuser
 pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu118
 pip install -r requirements.txt
-python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
+pip install ninja
+pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
+pip install xfuser==0.4.0
 ```
 
 另外,我们提供了一个预构建的 Docker 镜像,可以使用如下命令进行拉取和运行。
 ```shell
-# 用于CUDA 12.4 (已更新避免float point exception)
+# 用于 CUDA 12.4 (已更新避免 float point exception)
 docker pull hunyuanvideo/hunyuanvideo:cuda_12
 docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_12
 
-# 用于CUDA 11.8
+# 用于 CUDA 11.8
 docker pull hunyuanvideo/hunyuanvideo:cuda_11
 docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_11
 ```
````
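The Chinese README offers two images, `hunyuanvideo/hunyuanvideo:cuda_12` (CUDA 12.4) and `hunyuanvideo/hunyuanvideo:cuda_11` (CUDA 11.8). A minimal sketch for picking one, assuming the deciding input is the `CUDA Version:` field in the `nvidia-smi` header, i.e. the newest CUDA version the installed driver supports. The thresholds are a conservative assumption drawn from those two version numbers, not a rule stated by the project, and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: map the driver's newest supported CUDA version (e.g. "12.4") to one
# of the two image tags. Function names and thresholds are assumptions.
hunyuan_image_for_cuda() {
  local ver="$1" major minor rest n
  IFS=. read -r major minor rest <<< "$ver"
  if [[ ! "$major" =~ ^[0-9]+$ || ! "${minor:-0}" =~ ^[0-9]+$ ]]; then
    echo "cannot parse CUDA version: '$ver'" >&2
    return 1
  fi
  n=$(( 10#$major * 100 + 10#${minor:-0} ))  # 12.4 -> 1204, 11.8 -> 1108
  if (( n >= 1204 )); then
    echo "hunyuanvideo/hunyuanvideo:cuda_12"
  elif (( n >= 1108 )); then
    echo "hunyuanvideo/hunyuanvideo:cuda_11"
  else
    echo "CUDA $ver is older than 11.8; neither image is expected to work" >&2
    return 1
  fi
}

# Read the "CUDA Version: X.Y" field from the nvidia-smi header.
driver_cuda_version() {
  nvidia-smi | sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage:
#   image="$(hunyuan_image_for_cuda "$(driver_cuda_version)")" && docker pull "$image"
```

Choosing `cuda_11` for a driver that reports, say, 12.2 is deliberately cautious; CUDA minor-version compatibility may allow the newer image, but that is not something the README claims.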
````diff
@@ -242,7 +253,7 @@ docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyua
 
 下载预训练模型参考[这里](ckpts/README.md)。
 
-## 🔑 推理
+## 🔑 单卡推理
 我们在下表中列出了支持的高度/宽度/帧数设置。
 
 | 分辨率 | h/w=9:16 | h/w=16:9 | h/w=4:3 | h/w=3:4 | h/w=1:1 |
````
````diff
@@ -297,26 +308,6 @@ python3 gradio_server.py --flow-reverse
 [xDiT](https://github.com/xdit-project/xDiT) 是一个针对多 GPU 集群的扩展推理引擎,用于扩展 Transformers(DiTs)。
 它成功为各种 DiT 模型(包括 mochi-1、CogVideoX、Flux.1、SD3 等)提供了低延迟的并行推理解决方案。该存储库采用了 [Unified Sequence Parallelism (USP)](https://arxiv.org/abs/2405.07719) API 用于混元视频模型的并行推理。
 
-### 安装与 xDiT 兼容的依赖项
-
-```
-# 1. 创建一个空白的 conda 环境
-conda create -n hunyuanxdit python==3.10.9
-conda activate hunyuanxdit
-# 2. 使用 CUDA 11.8 安装 PyTorch 组件
-conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
-# 3. 安装 pip 依赖项
-python -m pip install -r requirements_xdit.txt
-```
-
-您可以跳过上述步骤,直接拉取预构建的 Docker 镜像,这个镜像是从 [docker/Dockerfile_xDiT](./docker/Dockerfile_xDiT) 构建的
-
-```
-docker pull thufeifeibear/hunyuanvideo:latest
-```
-
 ### 使用命令行
 
 例如,可用如下命令使用8张GPU卡完成推理
````
This file was deleted.
This file was deleted.
This file was deleted.
````diff
@@ -1,42 +1,14 @@
 #!/bin/bash
 # Description: This script demonstrates how to inference a video based on HunyuanVideo model
 
-TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node=8 \
-    sample_video.py --video-size 1280 720 --video-length 129 \
-    --infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
-    --flow-reverse --ulysses-degree=8 --ring-degree=1 --seed 42 --save-path ./results
-
-TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node 4 \
-    sample_video.py --video-size 1280 720 --video-length 129 \
-    --infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
-    --flow-reverse --ulysses-degree=4 --ring-degree=1 --seed 42 --save-path ./results
-
-TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node=2 \
-    sample_video.py --video-size 1280 720 --video-length 129 \
-    --infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
-    --flow-reverse --ulysses-degree=2 --ring-degree=1 --seed 42 --save-path ./results
-
-TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node=1 \
-    sample_video.py --video-size 1280 720 --video-length 129 \
-    --infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
-    --flow-reverse --ulysses-degree=1 --ring-degree=1 --seed 42 --save-path ./results
-
-TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node=6 \
-    sample_video.py --video-size 960 960 --video-length 129 \
-    --infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
-    --flow-reverse --ulysses-degree=6 --ring-degree=1 --seed 42 --save-path ./results
-
-TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node=3 \
-    sample_video.py --video-size 960 960 --video-length 129 \
-    --infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
-    --flow-reverse --ulysses-degree=3 --ring-degree=1 --seed 42 --save-path ./results
-
-TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node=2 \
-    sample_video.py --video-size 960 960 --video-length 129 \
-    --infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
-    --flow-reverse --ulysses-degree=2 --ring-degree=1 --seed 42 --save-path ./results
-
-TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node=1 \
-    sample_video.py --video-size 1280 720 --video-length 129 \
-    --infer-steps 50 --prompt "A cat walks on the grass, realistic style." \
-    --flow-reverse --ulysses-degree=1 --ring-degree=1 --seed 42 --save-path ./results
+python3 sample_video.py \
+    --video-size 720 1280 \
+    --video-length 129 \
+    --infer-steps 50 \
+    --prompt "A cat walks on the grass, realistic style." \
+    --seed 42 \
+    --embedded-cfg-scale 6.0 \
+    --flow-shift 7.0 \
+    --flow-reverse \
+    --use-cpu-offload \
+    --save-path ./results
````
````diff
@@ -0,0 +1,46 @@
+#!/bin/bash
+# Description: This script demonstrates how to inference a video based on HunyuanVideo model
+
+# Supported Parallel Configurations
+# | --video-size | --video-length | --ulysses-degree x --ring-degree | --nproc_per_node |
+# |----------------------|----------------|----------------------------------|------------------|
+# | 1280 720 or 720 1280 | 129 | 8x1,4x2,2x4,1x8 | 8 |
+# | 1280 720 or 720 1280 | 129 | 1x5 | 5 |
+# | 1280 720 or 720 1280 | 129 | 4x1,2x2,1x4 | 4 |
+# | 1280 720 or 720 1280 | 129 | 3x1,1x3 | 3 |
+# | 1280 720 or 720 1280 | 129 | 2x1,1x2 | 2 |
+# | 1104 832 or 832 1104 | 129 | 4x1,2x2,1x4 | 4 |
+# | 1104 832 or 832 1104 | 129 | 3x1,1x3 | 3 |
+# | 1104 832 or 832 1104 | 129 | 2x1,1x2 | 2 |
+# | 960 960 | 129 | 6x1,3x2,2x3,1x6 | 6 |
+# | 960 960 | 129 | 4x1,2x2,1x4 | 4 |
+# | 960 960 | 129 | 3x1,1x3 | 3 |
+# | 960 960 | 129 | 1x2,2x1 | 2 |
+# | 960 544 or 544 960 | 129 | 6x1,3x2,2x3,1x6 | 6 |
+# | 960 544 or 544 960 | 129 | 4x1,2x2,1x4 | 4 |
+# | 960 544 or 544 960 | 129 | 3x1,1x3 | 3 |
+# | 960 544 or 544 960 | 129 | 1x2,2x1 | 2 |
+# | 832 624 or 624 832 | 129 | 4x1,2x2,1x4 | 4 |
+# | 624 832 or 624 832 | 129 | 3x1,1x3 | 3 |
+# | 832 624 or 624 832 | 129 | 2x1,1x2 | 2 |
+# | 720 720 | 129 | 1x5 | 5 |
+# | 720 720 | 129 | 3x1,1x3 | 3 |
+
+export TOKENIZERS_PARALLELISM=false
+
+export NPROC_PER_NODE=8
+export ULYSSES_DEGREE=8
+export RING_DEGREE=1
+
+torchrun --nproc_per_node=$NPROC_PER_NODE sample_video.py \
+    --video-size 720 1280 \
+    --video-length 129 \
+    --infer-steps 50 \
+    --prompt "A cat walks on the grass, realistic style." \
+    --seed 42 \
+    --embedded-cfg-scale 6.0 \
+    --flow-shift 7.0 \
+    --flow-reverse \
+    --ulysses-degree=$ULYSSES_DEGREE \
+    --ring-degree=$RING_DEGREE \
+    --save-path ./results
````
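Every row of the table in the new multi-GPU script pairs a `--ulysses-degree x --ring-degree` factorization with a `--nproc_per_node` equal to its product, for example 4x2 with 8 or 1x5 with 5. A minimal pre-flight sketch, assuming it runs before the `torchrun` call and reads the `NPROC_PER_NODE`, `ULYSSES_DEGREE`, and `RING_DEGREE` variables the script exports; the function names are hypothetical, and it does not encode the per-resolution rows, so a combination that passes may still be unsupported for a given `--video-size`:

```shell
#!/usr/bin/env bash
# Pre-flight sketch for the multi-GPU script: the product of the Ulysses and
# Ring degrees should equal the number of processes torchrun starts.
# Function names are hypothetical; arguments mirror the script's variables.
check_parallel_config() {
  local nproc="$1" ulysses="$2" ring="$3" v
  for v in "$nproc" "$ulysses" "$ring"; do
    if [[ ! "$v" =~ ^[1-9][0-9]*$ ]]; then
      echo "degrees must be positive integers, got '$v'" >&2
      return 1
    fi
  done
  if (( ulysses * ring != nproc )); then
    echo "ulysses ($ulysses) x ring ($ring) = $(( ulysses * ring )), but nproc_per_node is $nproc" >&2
    return 1
  fi
}

# Optional: count the GPUs nvidia-smi lists on this node.
gpu_count() {
  nvidia-smi -L | grep -c '^GPU '
}

# Usage, placed before the torchrun call:
#   check_parallel_config "$NPROC_PER_NODE" "$ULYSSES_DEGREE" "$RING_DEGREE" || exit 1
#   (( NPROC_PER_NODE <= $(gpu_count) )) || { echo "not enough GPUs" >&2; exit 1; }
```

`nvidia-smi -L` lists every GPU on the node regardless of `CUDA_VISIBLE_DEVICES`, so the optional count check is an upper bound rather than a guarantee.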