Run hangs indefinitely on two A100s #111
Comments
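Where exactly does it hang? Could you try setting the Ulysses degree to 2?
    --ulysses-degree 2 \
    --ring-degree 1
It is probably because I do not have enough memory; I only have 90GB of RAM 🤡 Sorry about that.
Does each card have only 40GB of memory?
No, each card has 80GB (VRAM), and there are two A100s. But this runs on a cluster, and the machine I requested has only 90GB of RAM. I am not sure whether that is enough 😳
This is unrelated to CPU memory; what matters is how much memory each individual GPU has.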
I installed everything following the "🚀 Parallel Inference on Multiple GPUs by xDiT" section of the documentation, and the command I ran is:
torchrun --nproc_per_node=2 sample_video.py \
    --video-size 960 960 \
    --video-length 129 \
    --infer-steps 20 \
    --prompt "A cat walks on the grass, realistic style." \
    --flow-reverse \
    --seed -1 \
    --ulysses-degree 1 \
    --ring-degree 2 \
    --save-path ./results
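As a rough pre-flight check for the parallelism flags in this command, here is a minimal Python sketch. It rests on two assumptions that are not confirmed in this thread: that the product of `--ulysses-degree` and `--ring-degree` should equal `--nproc_per_node`, and that the Ulysses degree should evenly divide the model's attention head count. The function name `check_parallel_config` and the `num_heads` parameter are hypothetical and not part of the project's API.

```python
from typing import List, Optional


def check_parallel_config(
    nproc_per_node: int,
    ulysses_degree: int,
    ring_degree: int,
    num_heads: Optional[int] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    if ulysses_degree < 1 or ring_degree < 1:
        problems.append("ulysses_degree and ring_degree must be positive integers")
        return problems
    # Assumption: the sequence-parallel degree (ulysses * ring) must match the process count.
    if ulysses_degree * ring_degree != nproc_per_node:
        problems.append(
            f"ulysses_degree * ring_degree = {ulysses_degree * ring_degree}, "
            f"but nproc_per_node = {nproc_per_node}"
        )
    # Assumption: Ulysses splits attention heads across ranks, so they must divide evenly.
    if num_heads is not None and num_heads % ulysses_degree != 0:
        problems.append(
            f"num_heads = {num_heads} is not divisible by ulysses_degree = {ulysses_degree}"
        )
    return problems


if __name__ == "__main__":
    # The configuration from the issue and the one suggested in the comments.
    print(check_parallel_config(nproc_per_node=2, ulysses_degree=1, ring_degree=2))
    print(check_parallel_config(nproc_per_node=2, ulysses_degree=2, ring_degree=1))
```

Under these assumptions both configurations pass the check, so a check like this would not by itself explain the hang; it only rules out an inconsistent flag combination.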
Everything up to that point ran normally (no errors), but the run stays stuck at 0%| | 0/20 [00:00<?, ?it/s. The GPU status at this step is:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          Off |   00000000:AD:00.0 Off |                    0 |
| N/A   43C    P0            101W /  300W |   47373MiB /  81920MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          Off |   00000000:AF:00.0 Off |                    0 |
| N/A   42C    P0            100W /  300W |   46875MiB /  81920MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
I do not know how to resolve this.
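Because the discussion turns on per-GPU memory rather than host RAM, it can help to read those figures programmatically. The sketch below parses the CSV output of `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`. The helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, and the sample values are copied from the nvidia-smi output in this issue.

```python
import subprocess
from typing import List, Tuple


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse lines like '47373, 81920' into (used_mib, total_mib) tuples."""
    rows: List[Tuple[int, int]] = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory() -> List[Tuple[int, int]]:
    """Ask nvidia-smi for per-GPU memory; requires the NVIDIA driver tools on PATH."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    # Sample values taken from the nvidia-smi table in this issue (MiB).
    sample = "47373, 81920\n46875, 81920\n"
    for index, (used, total) in enumerate(parse_gpu_memory(sample)):
        print(f"GPU {index}: {used} / {total} MiB ({used / total:.0%} used)")
```

On a machine with GPUs, calling `query_gpu_memory()` instead of using the sample string gives the live values.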