The model cannot be trained with multiple GPUs on a single machine #57

Open
langge52 opened this issue Nov 3, 2023 · 0 comments


langge52 commented Nov 3, 2023

Dear author,

Thank you very much for sharing this code.

The problem I am currently facing is that I cannot train on multiple GPUs on a single machine. Because `torch.distributed.launch` is deprecated, I tried `CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.run --nnodes 1 --nproc_per_node 4 train.py --config configs/demo.yaml`, `torchrun train.py --config configs/demo.yaml`, and other launch commands. None of them start training, and no log output is produced.

Could you advise me on how to solve this problem? Thank you very much, and I look forward to your reply.
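The repository's `train.py` is not shown in this issue, so the following is only a hypothetical sketch of what an entry point typically needs when it is started with `torchrun` (or `python -m torch.distributed.run`). torchrun hands each worker its rank through the environment variables `RANK`, `LOCAL_RANK` and `WORLD_SIZE` rather than through a `--local_rank` command-line argument, so a script written for the older `torch.distributed.launch` may need adapting. The names `DistEnv`, `read_torchrun_env`, `setup_distributed` and `log` are illustrative assumptions, not code from this project.

```python
import os
from dataclasses import dataclass


@dataclass
class DistEnv:
    """Rank information for one worker process (hypothetical helper)."""
    rank: int
    local_rank: int
    world_size: int

    @property
    def is_distributed(self) -> bool:
        return self.world_size > 1


def read_torchrun_env(environ=None) -> DistEnv:
    """Read the rank variables that torchrun exports to every worker.

    Falls back to a single-process setup when the variables are absent,
    e.g. when the script is started with plain `python train.py`.
    """
    env = os.environ if environ is None else environ
    return DistEnv(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


def setup_distributed(dist_env: DistEnv) -> None:
    """Bind this worker to its GPU and join the process group (needs PyTorch)."""
    if not dist_env.is_distributed:
        return  # single process: nothing to initialise
    import torch
    import torch.distributed as dist

    torch.cuda.set_device(dist_env.local_rank)
    # torchrun also sets MASTER_ADDR/MASTER_PORT, so the env:// method works.
    dist.init_process_group(backend="nccl", init_method="env://")


def log(dist_env: DistEnv, msg: str) -> None:
    # Tag every line with the rank so a silent worker is easy to spot;
    # flush so output is not held in a buffer when stdout is not a terminal.
    print(f"[rank {dist_env.rank}/{dist_env.world_size}] {msg}", flush=True)


if __name__ == "__main__":
    dist_env = read_torchrun_env()
    log(dist_env, f"local_rank={dist_env.local_rank}")
    setup_distributed(dist_env)
    log(dist_env, "process group ready")
```

Started with `torchrun --nnodes 1 --nproc_per_node 4 sketch.py` on a machine with four GPUs, each worker prints lines tagged with its own rank, which makes it easier to tell whether the workers launched at all and where a silent one stops.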

langge52 closed this as completed on Nov 3, 2023
langge52 reopened this on Nov 3, 2023