
Multi GPU (WIP) #169
Open
nakosung wants to merge 15 commits into master
Conversation

@nakosung (Contributor) commented on Nov 2, 2016

Multi-GPU implementation (multi-tower, with gradient averaging).
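A minimal sketch of the multi-tower, gradient-averaging pattern this PR describes, not the PR's actual code: one replica of the model is built per GPU, and the per-tower gradients are averaged before applying the update. It is written against the TensorFlow 0.x API in use at the time; `build_loss` and `NUM_GPUS` are hypothetical placeholders.

```python
import tensorflow as tf

NUM_GPUS = 2  # hypothetical; set to the number of available GPUs

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
tower_grads = []
for i in range(NUM_GPUS):
    with tf.device('/gpu:%d' % i):
        # Reuse variables after the first tower so all towers share the
        # single set of parameters created via tf.get_variable().
        with tf.variable_scope('wavenet', reuse=(i > 0)):
            loss = build_loss()  # hypothetical: per-tower loss on its data shard
            tower_grads.append(optimizer.compute_gradients(loss))

# Average each variable's gradient across the towers.
average_grads = []
for grad_and_vars in zip(*tower_grads):
    grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
    mean_grad = tf.reduce_mean(tf.concat(0, grads), 0)  # TF 0.x concat signature
    average_grads.append((mean_grad, grad_and_vars[0][1]))

train_op = optimizer.apply_gradients(average_grads)
```

This is the same averaging scheme used in TensorFlow's CIFAR-10 multi-GPU example from that era, which PRs of this kind typically followed.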

@nakosung (Contributor, Author) commented on Nov 2, 2016

I will squash the commits after the review is completed. :)

@lemonzi (Collaborator) commented on Nov 2, 2016

@nakosung GitHub takes care of squashing during the merge; don't worry about that :)

@jyegerlehner (Contributor) left a comment:
Thanks, nakosung.

I don't see anything wrong with this. I'm assuming the somewhat complicated code for summing up gradients from the different GPUs works. With a change this big, it's probably better if someone else independently verifies that it works and hasn't broken anything before we merge. Can anyone confirm? I can check that it still works on a single GPU, but that's as much as I can do at the moment.

```diff
@@ -3,21 +3,36 @@
 from .ops import causal_conv, mu_law_encode


-def create_variable(name, shape):
+def _create_variable(name, shape):
```
@jyegerlehner (Contributor) commented:

I thought the underscore was for member functions (methods) of classes. Am I wrong?
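For reference on the naming question: per PEP 8, a single leading underscore marks any internal name, at module level as well as inside classes, not just methods. A minimal illustration:

```python
# Module-level "private" helper: the leading underscore signals internal use
# and keeps the name out of `from module import *`.
def _create_variable(name, shape):
    pass


class Model(object):
    def _helper(self):  # the same convention applies to methods
        pass
```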


# Set up logging for TensorBoard.
writer = tf.train.SummaryWriter(logdir)
writer.add_graph(tf.get_default_graph())
run_metadata = tf.RunMetadata()
summaries = tf.merge_all_summaries()
summaries = tf.merge_summary(summaries)
#summaries = tf.merge_all_summaries()
@jyegerlehner (Contributor) commented:

Delete the commented-out line.
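For context on this hunk, a hedged sketch (TensorFlow 0.x API) of the two merge styles involved: `tf.merge_all_summaries()` merges every summary registered in the graph, while `tf.merge_summary()` merges an explicit list, such as summaries collected per tower. The `tower_losses` list here is hypothetical.

```python
import tensorflow as tf

# Explicit list of per-tower summary ops (hypothetical values).
tower_losses = [tf.constant(0.0), tf.constant(0.0)]
tower_summaries = [tf.scalar_summary('tower_%d/loss' % i, loss)
                   for i, loss in enumerate(tower_losses)]

# Multi-tower style: merge only the listed summaries.
merged = tf.merge_summary(tower_summaries)

# Single-graph style: merge everything in the default summary collection.
all_merged = tf.merge_all_summaries()
```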

@astanway commented:
FWIW, I've tried running this branch on an Amazon p2.8xlarge (with 8 GPUs), and I'm finding that while all GPUs log full memory utilization, the volatile GPU-util reported by nvidia-smi stays at 0% across all of them, and the training time per step is the same as with a single GPU (~3 seconds).

@nakosung (Contributor, Author) commented:
@astanway Actually, this branch feeds an additional minibatch to each GPU, so the effective batch size changes; you should tweak your learning rate and batch size before judging whether the branch works or not.
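To illustrate the point (these numbers are my own, not from the PR): with one minibatch per GPU, the effective batch size scales with the GPU count, and the learning rate is often rescaled to match.

```python
# Hypothetical numbers for illustration only.
per_gpu_batch_size = 1
num_gpus = 8
effective_batch_size = per_gpu_batch_size * num_gpus  # 8x the single-GPU batch

base_learning_rate = 1e-3
# Linear scaling is one common heuristic, not something this PR prescribes.
scaled_learning_rate = base_learning_rate * num_gpus
```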

@akademi4eg (Collaborator) commented:
This PR is marked as WIP. Is anything still unfinished (besides resolving the conflicts)?
Also, has anyone had a chance to test this PR on a multi-GPU setup? @astanway @nakosung

```python
    '''Create a bias variable with the specified name and shape and initialize
    it to zero.'''
    initializer = tf.constant_initializer(value=0.0, dtype=tf.float32)
    return tf.Variable(initializer(shape=shape), name)


def _get_variable(name, shape):
```

@weixsong commented:

Should the create_embedding_table() function be changed to use tf.get_variable() too? I'm not quite sure: if the embedding table is also trained as part of the model parameters, should it be shared among the GPUs?

@nakosung (Contributor, Author) commented:

@weixsong Yes. This PR is outdated. :)
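For context on the sharing question above, a minimal sketch (TensorFlow 0.x API, not the PR's code) of why `tf.get_variable()` matters here: inside a reusing variable scope it returns the same underlying variable, so every tower, including an embedding table created this way, shares one set of weights.

```python
import tensorflow as tf

def _get_variable(name, shape):
    # Zero initializer is an arbitrary choice for this illustration.
    return tf.get_variable(name, shape=shape,
                           initializer=tf.constant_initializer(0.0))

with tf.variable_scope('wavenet'):
    table_a = _get_variable('embedding_table', [256, 32])
with tf.variable_scope('wavenet', reuse=True):
    table_b = _get_variable('embedding_table', [256, 32])

assert table_a is table_b  # both towers see the same shared parameters
```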

@weixsong commented:

Hi @nakosung, do you have a model that was trained with this multi-GPU code? Are the results as good as with a single GPU?

@nakosung (Contributor, Author) commented:

@weixsong I haven't generated samples with the multi-node version, but I think it should not differ from the single-node version, because its loss seemed good enough.
