GPU memory increases with the number of GPUs used #370

Open
DuanXiaoyue-LittleMoon opened this issue Nov 7, 2023 · 0 comments

@DuanXiaoyue-LittleMoon

I'm training the base u-net using the 'accelerate' command provided in the repo (i.e., 'accelerate launch train.py').

I make sure that the batch size on each GPU is 1. My expectation is that, no matter how many GPUs I use, as long as the per-GPU batch size stays at 1, the memory usage on each GPU should be roughly the same.
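
For clarity, this is roughly what I mean by a per-GPU batch size of 1 (a minimal sketch with a dummy model and dummy data, not the repo's actual train.py). As far as I understand Accelerate's defaults, the DataLoader's batch_size is per process, so each GPU sees one sample per step and the effective global batch size is num_processes × 1.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

# Dummy stand-ins for the real dataset / u-net used in train.py.
dataset = TensorDataset(torch.randn(16, 3, 64, 64))
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters())

# batch_size=1 is per process: every GPU gets one sample per step when
# the script is started with `accelerate launch`.
dataloader = DataLoader(dataset, batch_size=1, shuffle=True)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```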

However, I find that the more GPUs I use, the higher the memory usage on each GPU becomes, even though the per-GPU batch size is still 1.

For example, when I train the base u-net on a single GPU, the memory usage is:
[0] 19876 / 32510 MB

When I train it with 2 GPUs, the memory usage is:
[0] 23892 / 32510 MB
[1] 23732 / 32510 MB

When I train it with 3 GPUs, the memory usage is:
[0] 25132 / 32510 MB
[1] 24962 / 32510 MB
[2] 24962 / 32510 MB

When I train it with 8 GPUs, the memory usage is:
[0] 31176 / 32510 MB
[1] 31000 / 32510 MB
[2] 30930 / 32510 MB
[3] 30958 / 32510 MB
[4] 30940 / 32510 MB
[5] 30996 / 32510 MB
[6] 31070 / 32510 MB
[7] 30994 / 32510 MB

It would be greatly appreciated if someone could tell me why this is the case.
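
In case it helps with debugging, here is a minimal sketch of how per-rank memory could be logged from inside the training loop (the helper name log_gpu_memory is just illustrative, and it assumes the script has an Accelerator instance). PyTorch's own counters only cover tensor allocations, whereas the nvidia-smi numbers above also include the CUDA context and any communication buffers each process allocates, so the two views can differ.

```python
import torch

def log_gpu_memory(accelerator, tag=""):
    # PyTorch's view of GPU memory on this rank. nvidia-smi reports more,
    # since it also counts the CUDA context and NCCL/communication buffers.
    allocated = torch.cuda.memory_allocated() / 2**20  # MB currently held in tensors
    reserved = torch.cuda.memory_reserved() / 2**20    # MB held by the caching allocator
    print(f"[rank {accelerator.process_index}] {tag} "
          f"allocated={allocated:.0f} MB, reserved={reserved:.0f} MB")

# e.g. call log_gpu_memory(accelerator, tag="after first step") inside the loop
```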
