We used CUDA_VISIBLE_DEVICES=0,4,2,6,1,5,3,7 to work around a bug in NCCL that caused a NIC port usage conflict at a specific tensor-parallel size. The bug has since been fixed, and using the ascending-order mapping should yield the same performance.
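For illustration, a minimal sketch of how the two mappings could look in a shell config such as config_common.sh; the exact line and comments here are assumptions, not the repository's actual contents:

```bash
# Workaround mapping used in the submission: interleaved GPU order to avoid
# the NCCL NIC port usage conflict at a specific tensor-parallel size.
export CUDA_VISIBLE_DEVICES=0,4,2,6,1,5,3,7

# With the NCCL bug fixed, the plain ascending mapping should perform the same:
# export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
```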
@erhoo82 Thank you for your reply. Can you explain why using CUDA_VISIBLE_DEVICES=0,4,2,6,1,5,3,7 works around it? Or is there an NCCL PR related to the fix? Thank you.
Why is the value of CUDA_VISIBLE_DEVICES not configured in ascending order? For example, is CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 better suited for PXN?

Reference: training_results_v3.1/Azure+NVIDIA/benchmarks/gpt3/implementations/pytorch/config_common.sh, line 1 in 5b62935
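As a hedged illustration of what the non-ascending list means in practice (assuming the common convention that the launcher binds local rank i to logical CUDA device i, which is not confirmed by this thread):

```bash
# CUDA_VISIBLE_DEVICES=0,4,2,6,1,5,3,7 makes logical CUDA device i refer to
# the i-th physical GPU in the list, so e.g. local rank 1 lands on physical
# GPU 4 rather than GPU 1.
order=(0 4 2 6 1 5 3 7)
for local_rank in 0 1 2 3 4 5 6 7; do
  echo "local rank ${local_rank} -> logical CUDA device ${local_rank} -> physical GPU ${order[$local_rank]}"
done
```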