Behavior of learning rate decay #3

Open
djshen opened this issue Mar 22, 2020 · 3 comments

djshen commented Mar 22, 2020

Currently, the learning rate decay happens after each iteration and the update rule is

lr = config.lr/(1 + args.lr_decay*step)

So the learning rate at step 0 and at step 1 will be the same value, config.lr.
Is this the expected behavior? Or is the following correct:

lr = config.lr/(1 + args.lr_decay*(step+1))
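To make the off-by-one concrete, here is a small arithmetic sketch with illustrative values (0.1 is not taken from the repository): under the current rule, the post-batch update at step 0 reproduces config.lr, so batches 0 and 1 run with the same learning rate, while the (step+1) variant already decays it for batch 1.

config_lr, lr_decay = 0.1, 0.1               # illustrative values only

# current rule: update applied after the batch at step 0
print(config_lr / (1 + lr_decay * 0))        # 0.1      -> batch 1 runs with an unchanged lr

# proposed rule with (step + 1)
print(config_lr / (1 + lr_decay * (0 + 1)))  # 0.090909 -> batch 1 runs with a decayed lr
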
cjlin1 (Owner) commented Mar 22, 2020 via email

djshen (Author) commented Mar 22, 2020

If I use tf.keras.optimizers.schedules.InverseTimeDecay in my code, I need to modify either simpleNN or TensorFlow to get exactly "the same results".
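For reference, here is a minimal stand-alone sketch of what tf.keras.optimizers.schedules.InverseTimeDecay computes, based on its documented non-staircase formula (the function name below is only illustrative):

def inverse_time_decay(initial_lr, decay_rate, decay_steps, step):
    # documented non-staircase formula:
    #   lr(step) = initial_lr / (1 + decay_rate * step / decay_steps)
    return initial_lr / (1 + decay_rate * step / decay_steps)

# With decay_steps=1 this reduces to initial_lr / (1 + decay_rate * step),
# the same expression simpleNN uses; the difference is only when it is
# evaluated: Keras uses the index of the batch that is about to run,
# while simpleNN recomputes the rate after the batch has finished.
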

djshen (Author) commented Mar 22, 2020

The following is a simple example.

import tensorflow as tf


lr_init = 0.1
lr_decay = 0.1

lr_keras = tf.keras.optimizers.schedules.InverseTimeDecay(
        initial_learning_rate=lr_init,
        decay_steps=1,
        decay_rate=lr_decay)
lr_simplenn = lr_init

for step in range(11):
        # Keras evaluates the learning rate "before" the batch at this step
        lr_keras_value = float(lr_keras(step))  # convert the tensor so it can be formatted
        print('Step {:2d}: train one batch with lr_keras {:6f} and lr_simplenn {:6f}'.format(
                step, lr_keras_value, lr_simplenn))
        # simpleNN updates the learning rate "after" the batch
        lr_simplenn = lr_init / (1 + lr_decay * step)

The output is

Step  0: train one batch with lr_keras 0.100000 and lr_simplenn 0.100000
Step  1: train one batch with lr_keras 0.090909 and lr_simplenn 0.100000
Step  2: train one batch with lr_keras 0.083333 and lr_simplenn 0.090909
Step  3: train one batch with lr_keras 0.076923 and lr_simplenn 0.083333
Step  4: train one batch with lr_keras 0.071429 and lr_simplenn 0.076923
Step  5: train one batch with lr_keras 0.066667 and lr_simplenn 0.071429
Step  6: train one batch with lr_keras 0.062500 and lr_simplenn 0.066667
Step  7: train one batch with lr_keras 0.058824 and lr_simplenn 0.062500
Step  8: train one batch with lr_keras 0.055556 and lr_simplenn 0.058824
Step  9: train one batch with lr_keras 0.052632 and lr_simplenn 0.055556
Step 10: train one batch with lr_keras 0.050000 and lr_simplenn 0.052632

If I change step to (step + 1) in the last line, the output will be

Step  0: train one batch with lr_keras 0.100000 and lr_simplenn 0.100000
Step  1: train one batch with lr_keras 0.090909 and lr_simplenn 0.090909
Step  2: train one batch with lr_keras 0.083333 and lr_simplenn 0.083333
Step  3: train one batch with lr_keras 0.076923 and lr_simplenn 0.076923
Step  4: train one batch with lr_keras 0.071429 and lr_simplenn 0.071429
Step  5: train one batch with lr_keras 0.066667 and lr_simplenn 0.066667
Step  6: train one batch with lr_keras 0.062500 and lr_simplenn 0.062500
Step  7: train one batch with lr_keras 0.058824 and lr_simplenn 0.058824
Step  8: train one batch with lr_keras 0.055556 and lr_simplenn 0.055556
Step  9: train one batch with lr_keras 0.052632 and lr_simplenn 0.052632
Step 10: train one batch with lr_keras 0.050000 and lr_simplenn 0.050000

With this modification, I get exactly the same loss values from simpleNN and its Keras counterpart, in which I replaced almost everything in simpleNN with tf.keras.
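For completeness, a minimal sketch of how such a schedule is attached on the Keras side (illustrative, not the actual code from this comparison): a LearningRateSchedule can be passed directly as an optimizer's learning rate, and the optimizer evaluates it at its current iteration, i.e. before each update is applied.

import tensorflow as tf

lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
        initial_learning_rate=0.1, decay_steps=1, decay_rate=0.1)

# The optimizer queries the schedule with its own iteration counter,
# so the learning rate for a batch is computed before that batch is applied.
opt = tf.keras.optimizers.SGD(learning_rate=lr_schedule)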
