
Can I run a job with pytorch distributed training? #3

Open
chiyuzhang94 opened this issue Dec 19, 2023 · 1 comment
Labels
help wanted Extra attention is needed

Comments

@chiyuzhang94

chiyuzhang94 commented Dec 19, 2023

Can I run a job with PyTorch distributed training?
If I run this command, does it work?
torchrun --nproc_per_node=$WORLD_SIZE --master_port=1234 newsreclib/train.py experiment=nrms_mindsmall_pretrainedemb_celoss_bertsent

@andreeaiana
Owner

You can run a job with PyTorch distributed training by changing the trainer's accelerator, strategy, and number of devices. For example, you can use the ddp_config.

Alternatively, you can do this from the command line: python newsreclib/train.py experiment=nrms_mindsmall_pretrainedemb_celoss_bertsent trainer.accelerator=gpu trainer.strategy=ddp trainer.devices=4
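A minimal sketch of what such a DDP trainer config could look like (field names follow the PyTorch Lightning Trainer API; the file path and exact contents of newsreclib's ddp_config are assumptions here and may differ in the repo):

```yaml
# configs/trainer/ddp.yaml (hypothetical path) -- DDP overrides for the trainer
accelerator: gpu   # run on GPUs
strategy: ddp      # PyTorch DistributedDataParallel
devices: 4         # number of GPUs per node
num_nodes: 1       # single-node training
```

With Hydra, selecting this config from the command line would look like `python newsreclib/train.py experiment=... trainer=ddp`, which is equivalent to passing the three `trainer.*` overrides individually.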

@andreeaiana andreeaiana added the help wanted Extra attention is needed label Dec 22, 2023
Poseidondon added a commit to Poseidondon/newsreclib-ru that referenced this issue Jun 11, 2024
Poseidondon added a commit to Poseidondon/newsreclib-ru that referenced this issue Jun 24, 2024
Poseidondon added a commit to Poseidondon/newsreclib-ru that referenced this issue Jul 29, 2024