Add a batch normalizer that has test and train hooks #11
Conversation
A suggested way to do this is to make execution of the
model optimizer force it, e.g., by:

update_assignments = tf.group(bn1.get_assigner(),
I almost get this part but not quite 😜 How does it relate to the optimizer?
Do you think there's a mistake in this example here? The line optimizer = tf.group(update_assignments)
doesn't make much sense to me, especially considering this is the second grouping of the updates. Also, I assume we'd just run
# skipping quite a bit
loss = create_loss_graph()
train = optimizer.minimize(loss)
with sess.as_default():
    # get the update assignments and batch normalize
    ...
    update_assignments.run()
    train.run(batch_feed_stuff)
I know the documentation for the moving average also calls tf.group twice for some reason. Am I missing something?
So, basically, what this is saying is that you should set things up so the optimizer has to run before the assigners do; then, instead of running the optimizer, you run the assigners, which automatically run the optimizer. I think the second group can probably be replaced with an identity, but I'm not 100% sure (see below).
From the doc https://www.tensorflow.org/versions/r0.8/api_docs/python/framework.html#Graph.control_dependencies
# WRONG
def my_func(pred, tensor):
  t = tf.matmul(tensor, tensor)
  with tf.control_dependencies([pred]):
    # The matmul op is created outside the context, so no control
    # dependency will be added.
    return t

# RIGHT
def my_func(pred, tensor):
  with tf.control_dependencies([pred]):
    # The matmul op is created in the context, so a control dependency
    # will be added.
    return tf.matmul(tensor, tensor)
Oh, so optimizer is just a reference to the optimizer, and now that it's run by control_dependencies, it's ok to re-assign it to the updater ops?
So, basically what it's saying is
opt = trainer.minimize(loss)
with tf.control_dependencies([opt]):
    new_opt = tf.group(<all batch norm updates>)
So, new_opt is actually the batch norm updates, but it calls the minimizer when you run it.
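To make the ordering concrete, here is a minimal, self-contained sketch of that pattern. It substitutes a plain ExponentialMovingAverage update for the BatchNormalizer assigners and makes up a toy variable and loss, so none of these names come from the PR; the point is just that running train_op forces the minimize step to run first.

import tensorflow as tf

# Toy variable and loss; stand-ins for the real model graph.
w = tf.Variable(1.0)
loss = tf.square(w - 3.0)

# Stand-in for the BatchNormalizer's moving-average assigners.
ema = tf.train.ExponentialMovingAverage(decay=0.99)

opt = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
with tf.control_dependencies([opt]):
    # The update op is created inside the context (the "RIGHT" case from
    # the docs above), so running train_op runs the minimize step and only
    # then the moving-average update.
    train_op = tf.group(ema.apply([w]))

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    for _ in range(100):
        sess.run(train_op)
    print(sess.run(ema.average(w)))  # the shadow value trails w toward 3.0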
Nice code. But it seems like it requires the user to know things, and I wish it didn't. Maybe the assignment stuff, which I don't quite understand, makes this hard. I like the class interface, but instead of the user doing:

ewma = tf.train.ExponentialMovingAverage(decay=0.99)
bn = BatchNormalizer(input, 0.001, ewma, True)
update_assignments = bn.get_assigner()
x = bn.normalize(y, train=training)

can the user do:

x, update_assignments = batch_normalize(input, phrase_train, name)
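Not to put words in the author's mouth, but a one-call interface like that could probably just wrap the existing class. A rough sketch, where the function itself is hypothetical and the argument names and 0.99/0.001 constants are simply carried over from the snippet above:

def batch_normalize(input, phrase_train, name):
    # Hypothetical wrapper, not part of this PR: builds the BatchNormalizer
    # internally and returns both the normalized tensor and the assigner op,
    # so the caller never has to touch the EWMA object.
    ewma = tf.train.ExponentialMovingAverage(decay=0.99)
    bn = BatchNormalizer(input, 0.001, ewma, name)
    update_assignments = bn.get_assigner()
    x = bn.normalize(input, train=phrase_train)
    return x, update_assignments

The caller would still need to run update_assignments (or group it with the train op as discussed above), so the hook doesn't disappear; it just gets surfaced in one place.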
ewma = tf.train.ExponentialMovingAverage(decay=0.99)
bn = BatchNormalizer(input, 0.001, ewma, True)
update_assignments = bn.get_assigner()
x = bn.normalize(y, train=training?)
There is a stray ? on this line.
Is it usual to include weight decay for bn parameters?
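For what it's worth (my understanding, not something settled in this PR), weight decay is usually not applied to the batch-norm scale and shift parameters. A sketch of one way to leave them out, assuming the variables are created with 'beta'/'gamma' in their names and that model_loss is the task loss built elsewhere in the graph:

# Sketch only: exclude batch-norm parameters from L2 weight decay by name.
decayed = [v for v in tf.trainable_variables()
           if 'beta' not in v.name and 'gamma' not in v.name]
l2_penalty = 1e-4 * tf.add_n([tf.nn.l2_loss(v) for v in decayed])
total_loss = model_loss + l2_penalty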
""" | ||
|
||
def __init__(self, input, epsilon, ewma_trainer, name): | ||
rank = len(input.get_shape().as_list()) |
Why not rank = len(input.get_shape())?
I tried it, works fine.
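That works because get_shape() returns a TensorShape, and TensorShape supports len() directly when the rank is known; as_list() just adds an extra list round-trip. A quick check (the shape here is made up):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 32, 32, 3])
shape = x.get_shape()            # a TensorShape with known rank
print(len(shape))                # 4
print(len(shape.as_list()))      # 4 -- same value, extra list conversion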
Force-pushed from a1eb434 to 0cbf633
Conflicts: aiutils/tftools/batch_normalizer.py
So I'm pretty happy with the final interface, but I wish the class wasn't so big 😿. I guess we can leave that as an exercise for future motivated collaborators. TODO: Shrink the implementation while maintaining the same behavior and interface. So LGTM 🎉 (once you get the tests to pass).