
Add a batch normalizer that has test and train hooks #11

Merged

jrock08 merged 6 commits into aiUIUC:master from batchnorm_redux on Apr 29, 2016
Conversation

@jrock08 jrock08 (Collaborator) commented Apr 21, 2016

No description provided.

A suggested way to do this is to make execution of the
model optimizer force it, e.g., by:

update_assignments = tf.group(bn1.get_assigner(),
Member

I almost get this part but not quite 😜 How does it relate to the optimizer?

@kevjshih kevjshih (Collaborator) Apr 21, 2016

Do you think there's a mistake in this example here? The line optimizer = tf.group(update_assignments) doesn't make much sense to me, especially considering this is the second grouping of the updates. Also, I assume we'd just run:

# skipping quite a bit
loss = create_loss_graph()
train = optimizer.minimize(loss)
with sess.as_default():
    # get the update assignments and batch normalize
    ...
    update_assignments.run()
    train.run(batch_feed_stuff)

I know the documentation for the moving average also calls tf.group twice for some reason. Am I missing something?
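(For reference, a hedged sketch of the moving-average idiom being referred to, roughly as the tf.train.ExponentialMovingAverage docs present it; the toy variables, loss, and learning rate below are placeholders, not anything from this PR.)

import tensorflow as tf

# Toy variables and loss, only to make the snippet self-contained.
var0 = tf.Variable(1.0)
var1 = tf.Variable(2.0)
loss = tf.square(var0) + tf.square(var1)

opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)
opt_op = opt.minimize(loss, var_list=[var0, var1])

ema = tf.train.ExponentialMovingAverage(decay=0.9999)
maintain_averages_op = ema.apply([var0, var1])

# The grouping op is created inside the context, so running training_op
# runs the optimizer step first and then updates the moving averages.
with tf.control_dependencies([opt_op]):
    training_op = tf.group(maintain_averages_op)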

Collaborator (Author)

So, basically, what this is saying is that you should make the assigners depend on the optimizer, so that the optimizer has to run before they do; then, rather than running the optimizer directly, you run the assigners, which automatically runs the optimizer first. I think the second group can probably be replaced with an identity, but I'm not 100% sure (see below).

From the doc https://www.tensorflow.org/versions/r0.8/api_docs/python/framework.html#Graph.control_dependencies

# WRONG
def my_func(pred, tensor):
  t = tf.matmul(tensor, tensor)
  with tf.control_dependencies([pred]):
    # The matmul op is created outside the context, so no control
    # dependency will be added.
    return t
# RIGHT
def my_func(pred, tensor):
  with tf.control_dependencies([pred]):
    # The matmul op is created in the context, so a control dependency
    # will be added.
    return tf.matmul(tensor, tensor)

Collaborator

Oh, so optimizer is just a reference to the optimizer, and now that it's run by control_dependencies, it's ok to re-assign it to the updater ops?

Collaborator (Author)

So, basically what it's saying is

opt = trainer.minimize(loss)
with tf.control_dependencies([opt]):
  new_opt = tf.group(<all batch norm updates>)

So new_opt is really just the batch norm updates, but running it also runs the minimizer first because of the control dependency.
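(A hedged end-to-end sketch of that pattern, using the class interface quoted further down in this thread; the placeholders, loss, and exact constructor arguments are illustrative assumptions, not the PR's actual test code.)

import tensorflow as tf
# The class under review; assumed importable from the module this PR adds.
from aiutils.tftools.batch_normalizer import BatchNormalizer

y = tf.placeholder(tf.float32, shape=[None, 64])
target = tf.placeholder(tf.float32, shape=[None, 64])

ewma = tf.train.ExponentialMovingAverage(decay=0.99)
bn = BatchNormalizer(y, 0.001, ewma, True)   # arguments as in the usage quoted below
update_assignments = bn.get_assigner()
y_bn = bn.normalize(y, train=True)

loss = tf.reduce_mean(tf.square(y_bn - target))
opt = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# new_opt is "really" just the batch norm updates, but the control dependency
# forces the minimize step to run first whenever new_opt is run.
with tf.control_dependencies([opt]):
    new_opt = tf.group(update_assignments)

# A training step would then run new_opt instead of opt, e.g.
# sess.run(new_opt, feed_dict={y: batch_y, target: batch_target})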

@tomlepaine tomlepaine (Member) commented Apr 21, 2016

Nice code. But it seems like it requires the user to know implementation details, and I wish it didn't.

Maybe the assignment stuff, which I don't quite understand, makes this hard.

I like the class interface, but instead of the user doing:

ewma = tf.train.ExponentialMovingAverage(decay=0.99)
bn = BatchNormalizer(input, 0.001, ewma, True)
update_assignments = bn.get_assigner()
x = bn.normalize(y, train=training)

Can the user do:

x, update_assignments = batch_normalize(input, phase_train, name)

And batch_normalize takes care of everything else?
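(One possible shape for that wrapper, sketched under the assumption that it just wraps the existing class; batch_normalize here is hypothetical, not code from this PR.)

import tensorflow as tf
from aiutils.tftools.batch_normalizer import BatchNormalizer  # class under review

def batch_normalize(input, phase_train, name):
    # Hypothetical convenience wrapper: build the EMA and the normalizer
    # internally, and hand back both the normalized tensor and the update
    # op that the caller still has to run (or hook up via control deps).
    ewma = tf.train.ExponentialMovingAverage(decay=0.99)
    bn = BatchNormalizer(input, 0.001, ewma, name)
    update_assignments = bn.get_assigner()
    x = bn.normalize(input, train=phase_train)
    return x, update_assignments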

ewma = tf.train.ExponentialMovingAverage(decay=0.99)
bn = BatchNormalizer(input, 0.001, ewma, True)
update_assignments = bn.get_assigner()
x = bn.normalize(y, train=training?)
Member

There is a stray ? on this line.

@kevjshih (Collaborator)

Is it usual to include weight decay for bn parameters?

"""

def __init__(self, input, epsilon, ewma_trainer, name):
    rank = len(input.get_shape().as_list())
Collaborator

Why not rank = len(input.get_shape())?
I tried it; it works fine.
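(A tiny hedged check of the two forms, assuming the placeholder's static rank is known; len() on an unknown-rank TensorShape would raise.)

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])

rank_via_list = len(x.get_shape().as_list())   # 4
rank_direct = len(x.get_shape())               # also 4; TensorShape supports len()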

Conflicts:
	aiutils/tftools/batch_normalizer.py
@tomlepaine (Member)

So I'm pretty happy with the final interface, but I wish the class wasn't so big 😿. I guess we can leave that as an exercise for future motivated collaborators. TODO: Shrink the implementation while maintaining the same behavior and interface.

So LGTM 🎉 (once you get the tests to pass).

@jrock08 jrock08 merged commit 8bb396b into aiUIUC:master Apr 29, 2016
@jrock08 jrock08 deleted the batchnorm_redux branch April 29, 2016 17:24