
9 days on 720 GPUs? #24

Open
jens321 opened this issue Feb 8, 2023 · 1 comment

Comments

@jens321

jens321 commented Feb 8, 2023

In section 4.2 (on the VPT Foundation Model Training), the paper states that

Preliminary experiments suggested that our model could benefit from 30 epochs of training and that a 0.5 billion parameter model was required to stay in the efficient learning regime [63] for that training duration (Appendix H), which took ∼9 days on 720 V100 GPUs.

Could you give some insight into why this many GPUs were needed? Was it for data parallelism, model parallelism, or something else?

Thank you.

@Miffyli
Collaborator

Miffyli commented Feb 8, 2023

Hey! You could try emailing the authors directly. I am not one of the authors, but my understanding is that it was purely for data parallelism; even the largest VPT model fits on a single 32GB V100. With more GPUs they could shorten the training wall-clock time, so I guess they just used as many as they had available :D
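
To make the distinction concrete, here is a minimal, generic PyTorch DistributedDataParallel sketch (not the actual VPT training code, and the model here is just a placeholder): every rank keeps a full copy of the model and trains on its own shard of the data, so adding GPUs mainly reduces wall-clock time rather than enabling a larger model.

```python
# Generic data-parallel training sketch with PyTorch DDP.
# Each process (one per GPU) holds the full model; gradients are
# all-reduced across processes after backward().
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE in the environment
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; a ~0.5B-parameter model still fits on one 32GB V100
    model = torch.nn.Linear(1024, 1024).to(f"cuda:{local_rank}")
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for step in range(10):
        # Each rank draws its own batch shard (random data here for illustration)
        x = torch.randn(32, 1024, device=f"cuda:{local_rank}")
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()  # DDP averages gradients across all ranks here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --nproc_per_node=8 train.py` on each node, the same script scales from 8 GPUs to hundreds; the per-GPU memory footprint stays roughly constant while the effective batch size and throughput grow with the number of ranks.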
