What is the specific formula for learning rate used in adam optimizer during pre-train? #29

ShuGao0810 · 2018-10-18T06:31:56Z

In your paper, the learning rate used with the Adam optimizer during pre-training is described as follows:
'We used the Adam optimization scheme [27] with a max learning rate of 2.5e-4. The learning rate was increased linearly from zero over the first 2000 updates and annealed to 0 using a cosine schedule.'
What is the exact formula for this learning rate schedule?
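
For reference, here is one plausible reading of the quoted sentence, written as a minimal sketch rather than the authors' confirmed formula. With step `t`, warmup length `W = 2000` and maximum rate `lr_max = 2.5e-4`, the rate would be `lr_max * t / W` while `t < W`, and `lr_max * 0.5 * (1 + cos(pi * (t - W) / (T - W)))` afterwards. The total number of updates `T` is not given in the quote, so `total_steps` below is a hypothetical placeholder, and the function name `lr_at` is my own.

```python
import math


def lr_at(step, max_lr=2.5e-4, warmup_steps=2000, total_steps=100_000):
    """Learning rate at a given update step (illustrative sketch).

    Assumptions not stated in the quoted text:
      - total_steps is a made-up placeholder for the length of training;
      - the cosine phase starts right after warmup and ends at total_steps.
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to max_lr over the first warmup_steps updates.
        return max_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine annealing from max_lr down to 0.
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions, `lr_at(0)` is 0, `lr_at(2000)` equals the maximum rate, and `lr_at(total_steps)` is 0. Other variants fit the description too, for example measuring the cosine phase against total training progress instead of restarting it at the end of warmup, so the exact form still needs confirmation from the authors.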
