The paper says the algorithm was trained on 8 A100 GPUs.
I have two instances, each equipped with 4 A100s, instead of a single instance with 8 A100 GPUs.
Is there any way to specify the instances in the configuration? In other words, where can I set the number of nodes in the code? https://lightning.ai/docs/pytorch/stable/common/trainer.html#num-nodes
I would appreciate any comments on this.
Update:
I added the number of nodes to the training process and opened a pull request. If it is accepted, this issue can be closed.
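For reference, a minimal sketch of how the two-instance setup maps onto the linked Lightning `Trainer` options: `num_nodes=2` with `devices=4` gives a world size of 8, matching the paper's 8-GPU run. The model and datamodule names here are placeholders, and the `Trainer` call is commented out since it requires a configured multi-node environment (each node must launch the same script, e.g. via `torchrun` or SLURM, with `MASTER_ADDR`/`NODE_RANK` set).

```python
# Two GPU instances (nodes), four A100s each.
num_nodes = 2
devices = 4
world_size = num_nodes * devices  # 8 processes/GPUs total

# Hypothetical trainer construction (uncomment in a real multi-node job):
# import pytorch_lightning as pl
# trainer = pl.Trainer(
#     accelerator="gpu",
#     devices=devices,       # GPUs per node
#     num_nodes=num_nodes,   # number of instances
#     strategy="ddp",        # distributed data parallel across all 8 GPUs
# )
# trainer.fit(MyModel(), datamodule=MyDataModule())
```

With this configuration the effective batch size and gradient averaging should behave the same as the paper's single 8-GPU instance, since DDP treats the 8 processes uniformly regardless of how they are spread across nodes.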
CheruscanArminius changed the title from "Unable to train the algorithms with 2 GPU instances, each with 4 A100s" to "Unable to train the algorithms with 2 GPU instances (multi-node), each with 4 A100s" on Jan 5, 2024.