Behavior of learning rate decay #3
Comments
We don't know if this causes any significant differences, but you can check through experiments as part of the project.
On 2020-03-22 11:13, djshen wrote:
Currently, the learning rate decay happens after each iteration and the update rule is
    lr = config.lr/(1 + args.lr_decay*step)
So, the learning rates of steps 0 and 1 will be the same value, config.lr.
Is this the expected behavior? Or is the following correct?
    lr = config.lr/(1 + args.lr_decay*(step+1))
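For concreteness, a minimal sketch of the behavior described in the quoted question (the values below are illustrative, not simpleNN's actual config):

    config_lr = 0.1   # stands in for config.lr (illustrative value)
    lr_decay = 0.01   # stands in for args.lr_decay (illustrative value)

    lr = config_lr
    for step in range(4):
        print('step {} trains with lr = {:.6f}'.format(step, lr))
        # current rule: the decay is applied after the iteration,
        # using the index of the step that just finished
        lr = config_lr / (1 + lr_decay * step)

Steps 0 and 1 both train with lr = 0.100000 under this rule, whereas the (step + 1) variant would already decay the rate used at step 1.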
If I use the current update rule, the learning rate in simpleNN does not follow the same schedule as Keras's InverseTimeDecay. The following is a simple example.

import tensorflow as tf

lr_init = 0.1
lr_decay = 0.1
lr_keras = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=lr_init,
    decay_steps=1,
    decay_rate=lr_decay)

lr_simplenn = lr_init
for step in range(11):
    # Keras evaluates the learning rate "before" a batch
    lr_keras_value = float(lr_keras(step))  # convert the scalar tensor for printing
    print('Step {:2d}: train one batch with lr_keras {:6f} and lr_simplenn {:6f}'.format(
        step, lr_keras_value, lr_simplenn))
    # simpleNN updates the learning rate "after" a batch
    lr_simplenn = lr_init / (1 + lr_decay * step)

The output shows that lr_simplenn lags lr_keras by one step: lr_keras has already decayed at step 1, while lr_simplenn still uses the initial value 0.1 for both step 0 and step 1.
If I change the update to lr_simplenn = lr_init / (1 + lr_decay * (step + 1)), the two schedules agree at every step. With this modification, I can get exactly the same loss values between simpleNN and its Keras counterpart, where I replace almost everything in simpleNN with tf.keras.
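A minimal sketch of that check, reusing the settings from the example above (my own verification code, not part of simpleNN): with the (step + 1) rule, the simpleNN-style schedule reproduces InverseTimeDecay exactly.

    import tensorflow as tf

    lr_init = 0.1
    lr_decay = 0.1
    lr_keras = tf.keras.optimizers.schedules.InverseTimeDecay(
        initial_learning_rate=lr_init, decay_steps=1, decay_rate=lr_decay)

    lr_simplenn = lr_init
    for step in range(11):
        # with the (step + 1) update, the value used for each batch matches Keras
        assert abs(float(lr_keras(step)) - lr_simplenn) < 1e-6
        lr_simplenn = lr_init / (1 + lr_decay * (step + 1))
    print('schedules agree for all 11 steps')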