Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merging AudioReader, TextReader and ImageReader #166

Open
wants to merge 33 commits into
base: master
Choose a base branch
from

Conversation

Zeta36
Copy link

@Zeta36 Zeta36 commented Oct 26, 2016

As asked in #162 and in #117, I've made a merging from the AudioReader, the TextReader and the ImageReader so anybody can train and generate not only wav files but also texts and images (#117 and #129).

Changes are transparent, so the only thing it has to be done is change two new parameters in the wavenet_params.json:
{
....
"raw_type": "Audio",
"file_ext": "*.wav"
}

In this way, you can decide to train the model in texts by just copying a folder with texts and setting later the params in wavenet_params.json to:
{
....
"raw_type": "Text",
"file_ext": "*.txt"
}

For image testing just:
{
....
"raw_type": "Image",
"file_ext": "*.jpg"
}

The file_ext can be change to any pattern like ".gif", ".mp3", etc.

In the generation time the only change is that I've change the args param --wav_out_path to --file_out_path. Nothing else.

I've tested everything in my machine and it's all fine. I don't know if these changes are going to pass the test cases. Anyway, I think this is a good thing for the team.

Somebody could help me with this by doing two things:

  1. Change test case if this PR doesn't pass it due to an old test case implementation problem.
  2. Change the Readme file and explain a little better the new changes and how to set the two new params in wavenet_params.json.

Thank you!!

Regards,
Samu.

@Zeta36
Copy link
Author

Zeta36 commented Oct 26, 2016

Merging from the AudioReader, the TextReader and the ImageReader so anybody can train and generate not only wav files but also texts and images (#117 and #129).

@jyegerlehner
Copy link
Contributor

Regarding the test failures: you can run these tests locally on your own machine with:

sh ci/test.sh

@Zeta36
Copy link
Author

Zeta36 commented Oct 26, 2016

@jyegerlehner , yes I know it. But I have no too much time. I did my best this morning (in Spain) trying to merge all the readers. That's why I asked somebody to help me with this two simple ways:

  1. Changing the test case if this PR doesn't pass it due to an old test case implementation problem.
  2. And changing the Readme file and explain a little better the new changes and how to set the two new params in wavenet_params.json.

I think you could help me with this @jyegerlehner. I'd be very grateful.

Regards.

@jyegerlehner
Copy link
Contributor

jyegerlehner commented Oct 26, 2016

@Zeta36 I also have demands on my time. There's no rush. You can get to it whenever you can get around to it.

@Zeta36
Copy link
Author

Zeta36 commented Oct 26, 2016

I've passed all the tests in Python 2.7, but I don't know why I have an error in Python 3.5 (https://travis-ci.org/ibab/tensorflow-wavenet/builds/170878554):

In the four test files I have the next error:

 File "/home/travis/build/ibab/tensorflow-wavenet/test/test_model.py", line 11, in <module>

    from wavenet import (WaveNetModel, time_to_batch, batch_to_time, causal_conv,

  File "/home/travis/build/ibab/tensorflow-wavenet/wavenet/__init__.py", line 1, in <module>

    from .model import WaveNetModel

  File "/home/travis/build/ibab/tensorflow-wavenet/wavenet/model.py", line 3, in <module>

    from .ops import causal_conv

  File "/home/travis/build/ibab/tensorflow-wavenet/wavenet/ops.py", line 5, in <module>

    import audio_reader

ImportError: No module named 'audio_reader'

Somebody knows what can it be? (In python 2.7 all works well). Locally, the test is passed too in 3.5.

Regards.

@Zeta36 Zeta36 closed this Oct 26, 2016
@Zeta36 Zeta36 reopened this Oct 26, 2016
@Zeta36
Copy link
Author

Zeta36 commented Oct 26, 2016

No way. All test are passed locally in my machine with:
sh ci/test.sh

But in the github Travis-CI it is passed for Python 2.7 but not for Python 3.5 (https://travis-ci.org/ibab/tensorflow-wavenet/builds/170909383)

I don't know what's the problem.

@lemonzi
Copy link
Collaborator

lemonzi commented Oct 26, 2016

This is great, thanks a lot! I had assumed you would have made enough changes to make a merge that supported all cases very difficult.

Regarding the bug, I found this on StackOverflow: http://stackoverflow.com/questions/12172791/changes-in-import-statement-python3. So, from what people say and if I understood correctly, the safest way to proceed would be to always import wavenet.whatever, both in the script and in the internal code. I'm no Python expert though, and using a relative import seemed to work so far.

@Zeta36
Copy link
Author

Zeta36 commented Oct 26, 2016

Thank you @lemonzi. You were right about the problem.

All tests passed now ;).

If you finally commit these changes to the master I think I could have soon a working version of the global conditioning too.

@jyegerlehner
Copy link
Contributor

Thanks for this @Zeta36

I pulled the branch to sanity check that the usual use-cases still work. When I try to train on the usual audio stuff I got this exception:

  File "train.py", line 316, in <module>
    main()
  File "train.py", line 217, in main
    input_batch = reader.dequeue(args.batch_size)
  File "/home/jd/dev/models/tensorflow/tensorflow-wavenet/wavenet/audio_reader.py", line 89, in dequeue
    encode_output = mu_law_encode(output, self.quantization_channels)
NameError: global name 'mu_law_encode' is not defined

whereas with the current master it runs OK.

def FileReader(data_dir, coord, sample_rate, sample_size,
silence_threshold, quantization_channels,
pattern, EPSILON=0.001, raw_type="Audio"):
if raw_type == "Audio":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than this if-elif we might prefer a FileReader factory, something like lemonzi had me do here for creating the optimizer.



def write_output(waveform, filename, sample_rate, raw_type="Audio"):
if raw_type == "Image":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking the above idea a bit further, we could have a dictionary of FileHandler objects with the factory method for creating the right kind of Reader, a write method, bool IsAudio() and so on. That would allow us to eliminate the if statements with magic string literals that are litered across the code. I think it will be more maintainable because it won't force anyone changing the code to know all the places where those string literals appear, and the code would be more compact. For example, this if-else statement could be replaced with a single line

 file_handler[raw_type].write_output(waveform, filename)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants