GPU memory increases with the number of GPUs used #370

Open
DuanXiaoyue-LittleMoon opened this issue Nov 7, 2023 · 0 comments

@DuanXiaoyue-LittleMoon

I'm training the base u-net using the 'accelerate' command provided in the repo (i.e., 'accelerate launch train.py').

I make sure that the batch size on each GPU is 1. My expectation is that, no matter how many GPUs I use, as long as the per-GPU batch size stays at 1, the memory usage on each GPU should be roughly the same.
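
For clarity, this is roughly what I mean by a per-GPU batch size of 1 (a minimal sketch with a dummy model and dummy data, not the repo's actual train.py). As far as I understand Accelerate's defaults, the DataLoader's batch_size is per process, so each GPU sees one sample per step and the effective global batch size is num_processes × 1.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

# Dummy stand-ins for the real dataset / u-net used in train.py.
dataset = TensorDataset(torch.randn(16, 3, 64, 64))
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters())

# batch_size=1 is per process: every GPU gets one sample per step when
# the script is started with `accelerate launch`.
dataloader = DataLoader(dataset, batch_size=1, shuffle=True)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```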

However, I find that the more GPUs I use, the higher the memory usage on each GPU becomes, even though the per-GPU batch size is still 1.

For example, when I train the base u-net on a single GPU, the memory usage is:
[0] 19876 / 32510 MB

When I train it with 2 GPUs, the memory usage is:
[0] 23892 / 32510 MB
[1] 23732 / 32510 MB

When I train it with 3 GPUs, the memory usage is:
[0] 25132 / 32510 MB
[1] 24962 / 32510 MB
[2] 24962 / 32510 MB

When I train it with 8 GPUs, the memory usage is:
[0] 31176 / 32510 MB
[1] 31000 / 32510 MB
[2] 30930 / 32510 MB
[3] 30958 / 32510 MB
[4] 30940 / 32510 MB
[5] 30996 / 32510 MB
[6] 31070 / 32510 MB
[7] 30994 / 32510 MB

It would be greatly appreciated if someone could tell me why this is the case.
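
In case it helps with debugging, here is a minimal sketch of how per-rank memory could be logged from inside the training loop (the helper name log_gpu_memory is just illustrative, and it assumes the script has an Accelerator instance). PyTorch's own counters only cover tensor allocations, whereas the nvidia-smi numbers above also include the CUDA context and any communication buffers each process allocates, so the two views can differ.

```python
import torch

def log_gpu_memory(accelerator, tag=""):
    # PyTorch's view of GPU memory on this rank. nvidia-smi reports more,
    # since it also counts the CUDA context and NCCL/communication buffers.
    allocated = torch.cuda.memory_allocated() / 2**20  # MB currently held in tensors
    reserved = torch.cuda.memory_reserved() / 2**20    # MB held by the caching allocator
    print(f"[rank {accelerator.process_index}] {tag} "
          f"allocated={allocated:.0f} MB, reserved={reserved:.0f} MB")

# e.g. call log_gpu_memory(accelerator, tag="after first step") inside the loop
```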
