Hello,
As the title suggests, I am unable to train this model on a multi-GPU configuration. I am trying to train it on 4 RTX 2080 Ti cards.
The model loads only onto the 1st GPU, using around 10.5 GB of its 11 GB of memory, while each of the remaining GPUs uses only about 155 MB of 11 GB.
Also, the training speed is independent of the number of GPUs I select with CUDA_VISIBLE_DEVICES, so apparently only the 1st GPU is actually being used.
I dug into the code to check the relevant function, multi_gpu_model, but everything there looked fine to me.
So, can you confirm whether this implementation can be trained on multiple GPUs, or tell me how to do it?
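For reference, this is roughly how I would expect the multi-GPU wrapping to look with Keras' multi_gpu_model. This is just a minimal sketch with a placeholder model and dummy data, not the actual code from this repo:

```python
import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import multi_gpu_model

# Build the template model on the CPU so its weights live in host memory;
# multi_gpu_model then places one replica of it on each visible GPU.
with tf.device('/cpu:0'):
    base_model = Sequential([
        Dense(64, activation='relu', input_shape=(100,)),
        Dense(10, activation='softmax'),
    ])

# Wrap the model: each batch is split into 4 sub-batches, one per GPU,
# and the gradients are merged back on the CPU before the weight update.
parallel_model = multi_gpu_model(base_model, gpus=4)
parallel_model.compile(optimizer='adam', loss='categorical_crossentropy')

# Dummy data just to exercise the parallel model.
x_train = np.random.random((1024, 100))
y_train = np.random.random((1024, 10))
parallel_model.fit(x_train, y_train, batch_size=256, epochs=1)
```

With something like this I would expect all 4 GPUs to show comparable memory usage and utilization during training, which is not what I'm seeing.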