Update nccl-plugin and rxdm versions and add --install-nccl to prolog #235

Open
wants to merge 2 commits into
base: master


akiki-liang0

  • Update the NCCL plugin and RxDM images to the latest recommended versions (v1.0.7 and v1.0.13, respectively).
  • Add the --install-nccl flag to the libnccl installation command in the prolog script. This installs a newer version of libnccl (2.21.5), which is compatible with the updated NCCL plugin and RxDM images.

Tests performed:

  • Ran llama3 training on A3-Mega with NCCL v1.0.7 and RxDM v1.0.13, without the --install-nccl flag in the prolog script: the run fails with segmentation faults.
  • Ran llama3 training on A3-Mega with NCCL v1.0.7 and RxDM v1.0.13, with the --install-nccl flag in the prolog script: the run completes successfully.
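
The two test runs above suggest that the libnccl version loaded on the node matters for the new images. As a rough pre-flight check, the sketch below reads the version from the loaded library through NCCL's ncclGetVersion call and compares it with 2.21.5. The library name libnccl.so.2, the helper names, and the use of 2.21.5 as a minimum (rather than simply the version that --install-nccl provides) are assumptions made for illustration and are not part of this PR.

```python
import ctypes

# Version that the PR description says --install-nccl provides. Treating it as
# the minimum acceptable version is an assumption made for this sketch.
MIN_LIBNCCL = (2, 21, 5)


def decode_nccl_version(code: int) -> tuple[int, int, int]:
    """Split the integer reported by ncclGetVersion into (major, minor, patch)."""
    if code < 10000:
        # Releases up to 2.8 encode as major*1000 + minor*100 + patch.
        return code // 1000, (code % 1000) // 100, code % 100
    # Later releases encode as major*10000 + minor*100 + patch.
    return code // 10000, (code % 10000) // 100, code % 100


def loaded_libnccl_version(soname: str = "libnccl.so.2") -> tuple[int, int, int]:
    """Query the libnccl found by the dynamic loader (soname is an assumption)."""
    lib = ctypes.CDLL(soname)
    code = ctypes.c_int(0)
    status = lib.ncclGetVersion(ctypes.byref(code))
    if status != 0:  # ncclSuccess is 0
        raise RuntimeError(f"ncclGetVersion failed with status {status}")
    return decode_nccl_version(code.value)


def is_new_enough(version, minimum=MIN_LIBNCCL) -> bool:
    """Tuple comparison orders versions by major, then minor, then patch."""
    return tuple(version) >= tuple(minimum)
```

On a compute node, is_new_enough(loaded_libnccl_version()) returning False would indicate that an older libnccl is still being picked up, which matches the configuration of the failing run.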

@akiki-liang0 changed the title from "Update nccl-plugin and rxdm versions" to "Update nccl-plugin and rxdm versions and add --install-nccl to prolog" on Dec 4, 2024