Training speed for transformer using SCST #165

liuaohanjsj · 2022-10-03T13:00:36Z

Hi,
I'm wondering about the training speed of the transformer when using new-self-critical or SCST. During training the model has to run inference, and transformer inference is much slower than a training pass. With an RNN this should not be a problem, but I think training would be much slower with a transformer (I implemented a version myself, and training with RL was about 20x slower).
I'm curious about the training speed in your experiments. Do you have any suggestions?
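
To make the concern above concrete, here is a rough cost model, not a measurement and not code from this repository. It counts sequential decoder passes and token positions processed for one sequence of length `seq_len`. The function names, the `use_cache` flag, and the default of two rollouts per step (one sampled sequence plus one greedy baseline, as in standard SCST) are assumptions made for illustration; attention cost, batching, and reward computation are ignored.

```python
def decoder_cost(seq_len: int, mode: str) -> dict:
    """Rough decoder cost for one sequence (hypothetical model, for illustration).

    passes:    number of sequential decoder forward calls
    positions: total token positions pushed through the decoder layers
    """
    if mode == "teacher_forcing":
        # Cross-entropy training: all positions are processed in one parallel pass.
        return {"passes": 1, "positions": seq_len}
    if mode == "decode_no_cache":
        # Naive autoregressive decoding: step t re-runs the decoder on the whole prefix.
        return {"passes": seq_len, "positions": seq_len * (seq_len + 1) // 2}
    if mode == "decode_with_cache":
        # Cached keys/values: each step only processes the newest token.
        return {"passes": seq_len, "positions": seq_len}
    raise ValueError(f"unknown mode: {mode}")


def scst_rollout_cost(seq_len: int, use_cache: bool, n_rollouts: int = 2) -> dict:
    """Cost of the rollouts in one SCST step.

    n_rollouts=2 assumes one sampled sequence and one greedy baseline sequence.
    """
    mode = "decode_with_cache" if use_cache else "decode_no_cache"
    one = decoder_cost(seq_len, mode)
    return {key: n_rollouts * value for key, value in one.items()}


if __name__ == "__main__":
    T = 20  # hypothetical sequence length
    print("teacher forcing:", decoder_cost(T, "teacher_forcing"))
    print("SCST, no cache: ", scst_rollout_cost(T, use_cache=False))
    print("SCST, cached:   ", scst_rollout_cost(T, use_cache=True))
```

Under these assumptions, caching reduces the work per rollout but not the number of sequential passes, so an RL step would still be expected to take noticeably longer than a teacher-forced step.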

ruotianluo · 2022-10-03T14:13:13Z

Yes, it is much slower. I did add some optimizations to speed it up a little, but I didn't do any quantitative comparisons.
