While trying to train with resnext_101_64x4d taken from the fastai/fastai/models repository, using FP32, there is a huge spike in memory usage at the start of training. I observed that if I reduce the batch size from 512 to something like 64 or 32 (depending on the network being trained), the training goes through, but it is a lot slower than it would be with a batch size of 512.
Shortly after the start of training, probably after the first batch, the memory usage drops drastically. For example, here is a capture of memory usage over time for one of the GPUs in the machine, with the batch size set to 32 and everything else unchanged, for resnext_101_64x4d.
Memory usage in MiB, sampled over time: 73, 73, 81, 122, 508, 680, 1084, 1094, 1094, 1094, 1094, 3048, 14024, 13976, 4776, 9356, 4696, 1706, 3110, 5170, 5596, 5596
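(The exact way this trace was captured isn't stated above; as a rough sketch, a similar per-batch log can be produced from inside the training loop with PyTorch's allocator statistics. Note this reports the allocator's peak, while nvidia-smi-style numbers also include the CUDA context overhead.)

```python
# Sketch only: log the PyTorch allocator's peak usage per batch to reproduce a trace
# like the one above. The function name and its placement in the loop are illustrative.
import torch

def log_peak_memory(step, device=0):
    peak_mib = torch.cuda.max_memory_allocated(device) / 1024 ** 2
    print(f"step {step}: peak allocated {peak_mib:.0f} MiB")
    torch.cuda.reset_max_memory_allocated(device)  # start fresh for the next batch
```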
Note that the same pattern repeats on all 8 GPUs, except for small variations in the values.
As can be seen above, with a batch size of 32 the memory occupancy is only 5596 MiB for the rest of the training (up to 14 epochs; after that the size changes and the memory usage changes with it). The rest of the GPU memory sits unused.
If it were possible to reduce this initial spike in memory usage, or if there were some way to switch to a bigger batch size once training reaches a stable state, it would make the training a lot faster. I tried setting up a bigger batch size from epoch 2 by adding an additional phase with sz: 128 and bs: 512, but it doesn't seem to work for some reason (see the sketch below).
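For reference, assuming the phases are given as a list of dicts with ep/sz/bs keys (the exact schedule format here is an assumption; only sz: 128, bs: 512, starting at epoch 2, and the bs=32 fallback are from the description above), the attempted schedule would look roughly like this:

```python
# Hypothetical sketch of the attempted schedule, not the exact config used.
phases = [
    {'ep': 0, 'sz': 128, 'bs': 32},   # epochs 0-1: small batch to get past the initial spike
    {'ep': 2, 'sz': 128, 'bs': 512},  # from epoch 2: larger batch once memory has settled
]
```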
Thanks for this amazing work.