Multi GPU (WIP) #169
base: master
Conversation
Cleaned version of ibab#126.
I will squash the commits after the review is completed. :)
@nakosung GitHub takes care of squashing during merge, don't worry about that :)
Thanks nakosung.
I don't see anything wrong with this. I'm assuming the somewhat complicated code for summing up gradients from the different GPUs works. With a change this big, it's probably better if someone else independently verifies that it works and hasn't broken anything before we merge. Can anyone confirm? I can check that it still works on a single GPU, but that's as much as I would be able to do at the moment.
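For anyone trying to verify the approach, here is a minimal sketch of the standard multi-tower pattern with gradient averaging (the pattern the PR title describes), using the pre-1.0 TensorFlow graph API this repository targets. The names `NUM_GPUS` and `build_tower_loss` are illustrative placeholders, not code from this PR:

```python
import tensorflow as tf

NUM_GPUS = 2  # illustrative; not a value taken from this PR


def build_tower_loss(gpu_index):
    """Hypothetical stand-in for the real per-tower WaveNet loss."""
    w = tf.get_variable('w', shape=[10],
                        initializer=tf.constant_initializer(0.1))
    return tf.reduce_sum(tf.square(w))


def average_gradients(tower_grads):
    """Average (gradient, variable) pairs across towers.

    tower_grads has one entry per tower; each entry is the list returned
    by optimizer.compute_gradients() for that tower.
    """
    averaged = []
    for grad_and_vars in zip(*tower_grads):
        # The towers share variables, so every pair in grad_and_vars refers
        # to the same variable; only the gradients differ.
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        mean_grad = tf.reduce_mean(tf.concat(0, grads), 0)  # pre-1.0 concat signature
        averaged.append((mean_grad, grad_and_vars[0][1]))
    return averaged


optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
tower_grads = []
for i in range(NUM_GPUS):
    with tf.device('/gpu:%d' % i), tf.name_scope('tower_%d' % i):
        loss = build_tower_loss(i)
        tower_grads.append(optimizer.compute_gradients(loss))
    # Reuse the same variables for all subsequent towers.
    tf.get_variable_scope().reuse_variables()

train_op = optimizer.apply_gradients(average_gradients(tower_grads))
```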
@@ -3,21 +3,36 @@
from .ops import causal_conv, mu_law_encode


-def create_variable(name, shape):
+def _create_variable(name, shape):
I thought the underscore was for member functions (methods) of classes. Am I wrong?
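For what it's worth, PEP 8 treats a single leading underscore as a general "internal use" marker for any name, including module-level functions, not only methods. A tiny illustration with purely hypothetical names:

```python
_SAMPLE_RATE = 16000           # module-internal constant

def _load_audio(path):         # module-internal helper; not part of the public API
    ...

def load_dataset(directory):   # public entry point
    ...
```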
# Set up logging for TensorBoard.
writer = tf.train.SummaryWriter(logdir)
writer.add_graph(tf.get_default_graph())
run_metadata = tf.RunMetadata()
-summaries = tf.merge_all_summaries()
+summaries = tf.merge_summary(summaries)
+#summaries = tf.merge_all_summaries()
Delete the commented-out line.
FWIW, I've tried running this branch on an Amazon p2.8xlarge (with 8 GPUs), and I am finding that while all GPUs log full memory utilization, the …
@astanway Actually this branch runs additional minibatches, so you should tweak your learning rate and batch size to determine whether this branch works or not.
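To spell out the bookkeeping behind that remark: with one tower per GPU, the effective batch size grows with the number of GPUs, so the learning rate may need retuning. The numbers and the linear-scaling heuristic below are illustrative assumptions, not values taken from this PR:

```python
# Illustrative numbers only.
num_gpus = 8                        # e.g. a p2.8xlarge
batch_size_per_gpu = 1              # samples processed by each tower per step
effective_batch_size = num_gpus * batch_size_per_gpu   # samples per optimizer step

# One common heuristic (not something this PR prescribes) is to scale the
# learning rate linearly with the effective batch size; treat it only as a
# starting point to tune.
base_learning_rate = 1e-3
scaled_learning_rate = base_learning_rate * effective_batch_size
print(effective_batch_size, scaled_learning_rate)
```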
'''Create a bias variable with the specified name and shape and initialize
it to zero.'''
initializer = tf.constant_initializer(value=0.0, dtype=tf.float32)
return tf.Variable(initializer(shape=shape), name)


def _get_variable(name, shape):
Should the create_embedding_table() function be changed to use tf.get_variable() too? I'm not quite sure: if the embedding table is also trained as a model parameter, should it be shared among the GPUs?
@weixsong Yes. This PR is outdated. :)
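For illustration, this is roughly what sharing an embedding table across towers looks like with tf.get_variable and variable-scope reuse; the scope, function, and variable names below are hypothetical, not the PR's actual code:

```python
import tensorflow as tf


def _create_embedding_table(name, shape):
    # tf.get_variable returns the existing variable when the enclosing
    # variable scope has reuse enabled, so every tower shares one table.
    initializer = tf.truncated_normal_initializer(stddev=0.1)
    return tf.get_variable(name, shape=shape, initializer=initializer)


with tf.variable_scope('embeddings') as scope:
    tables = []
    for gpu in range(2):
        with tf.device('/gpu:%d' % gpu):
            tables.append(_create_embedding_table('gc_embedding', [128, 32]))
        # After the first tower, ask the scope to hand back existing variables.
        scope.reuse_variables()

# Both towers refer to the same underlying variable.
assert tables[0].name == tables[1].name
```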
Hi @nakosung, do you have a model trained with this multi-GPU code? Are the results as good as with a single GPU?
@weixsong I haven't generated samples with the multi-node version, but I think it should not differ from the single-node version, because its loss seemed good enough.
Multi-GPU implementation (multi-tower, gradient averaging)