Behavior of learning rate decay #3
Comments
We don't know if this causes any significant differences, but you can check through experiments as part of the project.
On 2020-03-22 11:13, djshen wrote:
Currently, the learning rate decay happens after each iteration and the update rule is
    lr = config.lr/(1 + args.lr_decay*step)
So, the learning rates of steps 0 and 1 will be the same value, config.lr.
Is this the expected behavior? Or is the following correct?
    lr = config.lr/(1 + args.lr_decay*(step+1))
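For concreteness, a minimal sketch of the behavior described in the quoted question (the values below are illustrative, not simpleNN's actual config):

    config_lr = 0.1   # stands in for config.lr (illustrative value)
    lr_decay = 0.01   # stands in for args.lr_decay (illustrative value)

    lr = config_lr
    for step in range(4):
        print('step {} trains with lr = {:.6f}'.format(step, lr))
        # current rule: the decay is applied after the iteration,
        # using the index of the step that just finished
        lr = config_lr / (1 + lr_decay * step)

Steps 0 and 1 both train with lr = 0.100000 under this rule, whereas the (step + 1) variant would already decay the rate used at step 1.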
If I use the current update rule, the learning rate in simpleNN does not follow the same schedule as Keras's InverseTimeDecay. The following is a simple example.

import tensorflow as tf

lr_init = 0.1
lr_decay = 0.1
lr_keras = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=lr_init,
    decay_steps=1,
    decay_rate=lr_decay)

lr_simplenn = lr_init
for step in range(11):
    # Keras evaluates the learning rate "before" a batch
    lr_keras_value = float(lr_keras(step))  # convert the scalar tensor for printing
    print('Step {:2d}: train one batch with lr_keras {:6f} and lr_simplenn {:6f}'.format(
        step, lr_keras_value, lr_simplenn))
    # simpleNN updates the learning rate "after" a batch
    lr_simplenn = lr_init / (1 + lr_decay * step)

The output shows that lr_simplenn lags lr_keras by one step: lr_keras has already decayed at step 1, while lr_simplenn still uses the initial value 0.1 for both step 0 and step 1.
If I change the update to lr_simplenn = lr_init / (1 + lr_decay * (step + 1)), the two schedules agree at every step. With this modification, I can get exactly the same loss values between simpleNN and its Keras counterpart, where I replace almost everything in simpleNN with tf.keras.
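A minimal sketch of that check, reusing the settings from the example above (my own verification code, not part of simpleNN): with the (step + 1) rule, the simpleNN-style schedule reproduces InverseTimeDecay exactly.

    import tensorflow as tf

    lr_init = 0.1
    lr_decay = 0.1
    lr_keras = tf.keras.optimizers.schedules.InverseTimeDecay(
        initial_learning_rate=lr_init, decay_steps=1, decay_rate=lr_decay)

    lr_simplenn = lr_init
    for step in range(11):
        # with the (step + 1) update, the value used for each batch matches Keras
        assert abs(float(lr_keras(step)) - lr_simplenn) < 1e-6
        lr_simplenn = lr_init / (1 + lr_decay * (step + 1))
    print('schedules agree for all 11 steps')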