Training is much slower than you described in the paper. #8
Comments
@zhaone Hi, have you found the reason? Here's my environment: 4 * Titan RTX, batch size 128 (4 * 32), distributed training using Horovod. Btw, one more thing I noticed is that my log shows one epoch takes over 2440, while the provided log file shows ~900, and in #2 they report ~1200 (4 * RTX 2080 Ti). But the evaluation results are similar. Here's my training log:
Provided log file:
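For reference, here is a minimal sketch of the kind of Horovod setup described above (4 GPUs, per-GPU batch size 32 for an effective batch size of 128). The model, dataset, learning rate, and epoch count are placeholders, not this repository's actual training code.

```python
# Minimal Horovod + PyTorch training sketch (placeholder model/dataset,
# not the repository's actual training script).
# Launch with: horovodrun -np 4 python train_hvd.py
import torch
import horovod.torch as hvd

hvd.init()                                   # one process per GPU
torch.cuda.set_device(hvd.local_rank())

model = torch.nn.Linear(128, 30).cuda()      # placeholder model
dataset = torch.utils.data.TensorDataset(    # placeholder dataset
    torch.randn(20000, 128), torch.randn(20000, 30))

# Each of the 4 workers gets its own shard; batch_size=32 per GPU -> 128 total.
sampler = torch.utils.data.distributed.DistributedSampler(
    dataset, num_replicas=hvd.size(), rank=hvd.rank())
loader = torch.utils.data.DataLoader(
    dataset, batch_size=32, sampler=sampler, num_workers=4, pin_memory=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # placeholder lr
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# Keep all workers consistent at the start of training.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

num_epochs = 36                              # placeholder epoch count
for epoch in range(num_epochs):
    sampler.set_epoch(epoch)                 # reshuffle shards each epoch
    for x, y in loader:
        x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
```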
No, I have not solved this problem yet, but your speed is not so ridiculously slow compared with mine (which is 3 times slower than yours). Have you checked where the speed bottleneck is, for example IO?
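One rough way to check this (a sketch, not code from this repository): time the data-loading and the forward/backward parts of each iteration separately. If the loader time dominates, the bottleneck is IO/preprocessing rather than optimization. The `(x, y)` batch format and `loss_fn` here are placeholders for whatever the training script actually uses.

```python
# Rough per-iteration timing that separates DataLoader wait time from compute.
# model / loader / optimizer / loss_fn are whatever the training script already
# builds; the (x, y) batch layout is a placeholder.
import time
import torch

def profile_one_epoch(model, loader, optimizer, loss_fn, log_every=100):
    data_time, compute_time = 0.0, 0.0
    end = time.time()
    for step, (x, y) in enumerate(loader):
        data_time += time.time() - end      # time spent waiting on the DataLoader

        start = time.time()
        x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        torch.cuda.synchronize()            # make GPU work show up in wall-clock time
        compute_time += time.time() - start

        if step % log_every == 0:
            print(f"step {step}: data {data_time:.1f}s, compute {compute_time:.1f}s")
        end = time.time()
    return data_time, compute_time
```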
@MasterIzumi I have the same question. And when I use …
Hi, I recently wanted to reproduce your results. I can get the metrics you described in the paper, but the training takes much longer (almost 3 days) than you described in the paper (less than 12 hours).
Environment:
I changed horovod to PyTorch DDP, since the horovod framework is really hard to set up (even with the official horovod docker I still got some errors I couldn't resolve). Did I do something wrong? I'm sure that I use DDP correctly, and I'm also sure that the bottleneck of the training speed is the optimization (not IO or something else). Have others met the same problem?
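For comparison, a minimal sketch of the kind of PyTorch DDP setup being described, with one process per GPU; the model, dataset, learning rate, and epoch count are placeholders, not this repository's code. When porting from Horovod, the main points are sharding the data with a DistributedSampler and keeping the per-GPU batch size the same as in the Horovod run so the effective (global) batch size is unchanged.

```python
# Minimal PyTorch DistributedDataParallel sketch (placeholder model/dataset).
# Launch with: torchrun --nproc_per_node=4 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")          # one process per GPU
local_rank = int(os.environ["LOCAL_RANK"])       # set by torchrun
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(128, 30).cuda()          # placeholder model
model = DDP(model, device_ids=[local_rank])

dataset = torch.utils.data.TensorDataset(        # placeholder dataset
    torch.randn(20000, 128), torch.randn(20000, 30))
# Shard the data across processes; keep the per-GPU batch size the same as in
# the Horovod run so the effective (global) batch size is unchanged.
sampler = torch.utils.data.distributed.DistributedSampler(dataset)
loader = torch.utils.data.DataLoader(
    dataset, batch_size=32, sampler=sampler, num_workers=4, pin_memory=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # placeholder lr

num_epochs = 36                                  # placeholder epoch count
for epoch in range(num_epochs):
    sampler.set_epoch(epoch)                     # reshuffle shards each epoch
    for x, y in loader:
        x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
```

A sanity check under this setup is that exactly one process is launched per GPU (here, 4): if only one process runs, DDP quietly trains on a single GPU and the wall-clock time per epoch grows accordingly.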