Merge pull request #106 from OutstanderWang/main
Fix the repeated saving for xDiT and float point exception
Showing 3 changed files with 49 additions and 20 deletions.
@@ -198,7 +198,7 @@ cd HunyuanVideo
 We provide an `environment.yml` file for setting up a Conda environment.
 Conda's installation instructions are available [here](https://docs.anaconda.com/free/miniconda/index.html).
 
-We recommend CUDA versions 11.8 and 12.0+.
+We recommend CUDA versions 12.4 or 11.8 for the manual installation.
 
 ```shell
 # 1. Prepare conda environment
@@ -212,19 +212,33 @@ python -m pip install -r requirements.txt
 
 # 4. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
 python -m pip install ninja
-python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.5.9.post1
+python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
 ```
 
-Additionally, HunyuanVideo also provides a pre-built Docker image. Use the following command to pull and run the docker image.
+In case of running into a float point exception (core dump) on a specific GPU type, you may try the following solutions:
 
 ```shell
-# For CUDA 11
-docker pull hunyuanvideo/hunyuanvideo:cuda_11
-docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_11
+# Option 1: Make sure you have installed CUDA 12.4, CUBLAS>=12.4.5.8, and CUDNN>=9.00 (or simply use our CUDA 12 docker image).
+pip install nvidia-cublas-cu12==12.4.5.8
+export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/
 
-# For CUDA 12
+# Option 2: Force the explicit use of the CUDA 11.8 compiled version of PyTorch and all the other packages
+pip uninstall -r requirements.txt  # uninstall all packages
+pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu118
+pip install -r requirements.txt
+python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
+```
+
+Additionally, HunyuanVideo also provides a pre-built Docker image. Use the following command to pull and run the docker image.
+
+```shell
+# For CUDA 12.4 (updated to avoid the float point exception)
 docker pull hunyuanvideo/hunyuanvideo:cuda_12
 docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_12
+
+# For CUDA 11.8
+docker pull hunyuanvideo/hunyuanvideo:cuda_11
+docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_11
 ```
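The Option 1 export above hard-codes a Python 3.8 path under `/opt/conda`; on other layouts the wheel's `lib` directory can be located programmatically. A minimal sketch, assuming only that `python3` is on PATH (the `nvidia-cublas-cu12` wheel may or may not be present):

```shell
# Locate the cuBLAS wheel's lib directory instead of hard-coding
# /opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/.
CUBLAS_DIR="$(python3 -c '
import os
try:
    import importlib.util as u
    spec = u.find_spec("nvidia.cublas")
    root = os.path.dirname(spec.origin) if spec and spec.origin else ""
    print(os.path.join(root, "lib") if root else "")
except Exception:
    print("")
')"
# Prepend only when the wheel was found, so a missing wheel leaves
# LD_LIBRARY_PATH untouched.
if [ -n "$CUBLAS_DIR" ]; then
    export LD_LIBRARY_PATH="$CUBLAS_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
fi
echo "cuBLAS lib dir: ${CUBLAS_DIR:-not found}"
```

Sourcing a snippet like this before launching inference keeps the Option 1 fix working across Python versions and conda prefixes.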
@@ -189,7 +189,7 @@ cd HunyuanVideo
 
 We provide an `environment.yml` file for setting up the Conda environment. Conda's installation guide is available [here](https://docs.anaconda.com/free/miniconda/index.html).
 
-For inference we use CUDA 11.8 or 12.0+.
+For inference we use CUDA 12.4 or 11.8.
 
 ```shell
 # 1. Prepare conda environment
@@ -203,18 +203,32 @@ python -m pip install -r requirements.txt
 
 # 4. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
 python -m pip install ninja
-python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.5.9.post1
+python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
 ```
 
-Additionally, we provide a pre-built Docker image, which can be pulled and run with the following commands.
+If you run into a float point exception (core dump) on a specific GPU model, you can try the following fixes:
 
 ```shell
-# For CUDA 11
-docker pull hunyuanvideo/hunyuanvideo:cuda_11
-docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_11
+# Option 1: Make sure CUDA 12.4, CUBLAS>=12.4.5.8, and CUDNN>=9.00 are installed correctly (or simply use our CUDA 12 image)
+pip install nvidia-cublas-cu12==12.4.5.8
+export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/
 
-# For CUDA 12
+# Option 2: Force the explicit use of PyTorch and all other packages compiled with CUDA 11.8
+pip uninstall -r requirements.txt  # make sure all dependencies are uninstalled
+pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu118
+pip install -r requirements.txt
+python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
+```
+
+Additionally, we provide a pre-built Docker image, which can be pulled and run with the following commands.
+```shell
+# For CUDA 12.4 (updated to avoid the float point exception)
 docker pull hunyuanvideo/hunyuanvideo:cuda_12
 docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_12
+
+# For CUDA 11.8
+docker pull hunyuanvideo/hunyuanvideo:cuda_11
+docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_11
 ```
 
 ## 🧱 Download Pretrained Models
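After applying either option, it helps to confirm which CUDA variant of the torch wheel pip actually resolved; wheels from the cu118 index carry a `+cu118` local version tag. A sketch, assuming only that `python3` with pip is available (torch itself may be absent):

```shell
# Read the installed torch version; the local tag (e.g. 2.4.0+cu118)
# names the CUDA toolkit the wheel was compiled against.
TORCH_VERSION="$(python3 -m pip show torch 2>/dev/null | sed -n 's/^Version: //p')"
case "$TORCH_VERSION" in
    *+cu118*) echo "torch build: CUDA 11.8 ($TORCH_VERSION)" ;;
    "")       echo "torch is not installed" ;;
    *)        echo "torch build: $TORCH_VERSION" ;;
esac
```

A plain version with no `+cuXXX` tag means the default PyPI wheel was installed, not the one from the cu118 index URL.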