Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Global Style Token Module #605

Merged
merged 7 commits into from
Jan 20, 2025
Merged

Global Style Token Module #605

merged 7 commits into from
Jan 20, 2025

Conversation

roedoejet
Copy link
Member

@roedoejet roedoejet commented Nov 29, 2024

PR Goal?

Add a Global Style Token Module à la https://arxiv.org/abs/1803.09017

Fixes?

#293

Feedback sought?

Sanity. Check training/synthesis with GST turned on

Priority?

medium

Tests added?

all for model code, so no added tests, but I still need to update existing tests since the commands and schemas have changed.

How to test?

  1. train a new model and try synthesizing with a style reference.

Confidence?

Modelling - medium
I'm medium-confident for the modelling side since it has successfully trained models and seems to result in better models when training with noisy data. That said, I'm not including it by default due to the extra complications at inference time (you have to provide a reference audio)

Versioning - low
Here's an example of a new item in a config. Should I be doing something in the versioning to automatically add use_global_style_token_module=False if Version==1.0 and that key is not found? Should I also be bumping up the version of the config here?

Version change?

minor version bump for FastSpeech2 and change to schemas

Related PRs?

EveryVoiceTTS/FastSpeech2_lightning#100

Copy link

semanticdiff-com bot commented Nov 29, 2024

@joanise
Copy link
Member

joanise commented Dec 9, 2024

For config file versioning:

  • yes, you need a minor bump, touching the schemas automatically require a minor bump
  • yes, you should write logic so that when we load a model written with an older version, the new parameter is instantiated with the value that keeps the old semantics, i.e. with False in this case. Samuel left hooks in place just for that when we load models.

@joanise
Copy link
Member

joanise commented Dec 9, 2024

  • yes, you should write logic so that when we load a model written with an older version, the new parameter is instantiated with the value that keeps the old semantics, i.e. with False in this case. Samuel left hooks in place just for that when we load models

Oh, since you made the field optional and defaulting to False, this may already all happen automatically for you. The test would be to load a model written with 0.2, and write it back with 0.3, if it works as expected it's all good.

@joanise
Copy link
Member

joanise commented Dec 9, 2024

  • yes, you need a minor bump, touching the schemas automatically require a minor bump

And let's bump to 0.3.0, without the a element, as discussed at the meeting today.

@marctessier
Copy link
Collaborator

When I run all tests, I get this error below or see attached log file:

======================================================================
ERROR: test_filelist_language (model.feature_prediction.FastSpeech2_lightning.fs2.tests.test_cli.PrepareSynthesizeDataTest)
Use a different language than the one provided in the filelist.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/gpfs/fs5/nrc/nrc-fs1/ict/others/u/tes001/TxT2SPEECH/EveryVoice_pr605_gst/everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_cli.py", line 117, in test_filelist_language
    data = prepare_synthesize_data(
TypeError: prepare_data() missing 1 required positional argument: 'style_reference'

======================================================================
ERROR: test_filelist_speaker (model.feature_prediction.FastSpeech2_lightning.fs2.tests.test_cli.PrepareSynthesizeDataTest)
Use a different speaker than the one provided in the filelist.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/gpfs/fs5/nrc/nrc-fs1/ict/others/u/tes001/TxT2SPEECH/EveryVoice_pr605_gst/everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_cli.py", line 140, in test_filelist_speaker
    data = prepare_synthesize_data(
TypeError: prepare_data() missing 1 required positional argument: 'style_reference'

======================================================================
ERROR: test_plain_filelist (model.feature_prediction.FastSpeech2_lightning.fs2.tests.test_cli.PrepareSynthesizeDataTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/gpfs/fs5/nrc/nrc-fs1/ict/others/u/tes001/TxT2SPEECH/EveryVoice_pr605_gst/everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_cli.py", line 166, in test_plain_filelist
    data = prepare_synthesize_data(
TypeError: prepare_data() missing 1 required positional argument: 'style_reference'

----------------------------------------------------------------------
Ran 213 tests in 81.990s

FAILED (errors=3)
============ Finished job 3365462 on Wed 11 Dec 2024 02:59:24 PM EST with rc=1

TEST-all.e3365462.txt

@marctessier
Copy link
Collaborator

I tried to train a FP , it starts but crashed before finishing the first epoch when I run as-is with this : use_global_style_token_module: true

RuntimeError: It looks like your LightningModule has parameters that were not 
used in producing the loss returned by training_step. If this is intentional, 
you must enable the detection of unused parameters in DDP, either by setting the
string value `strategy='ddp_find_unused_parameters_true'` or by setting the flag
in the strategy with `strategy=DDPStrategy(find_unused_parameters=True)`.
Loading EveryVoice modules: 100%|██████████| 4/4 [00:10<00:00,  2.73s/it]   
srun: error: ib14gpu-002: task 0: Exited with exit code 1

gst.e3365525.txt
gst.o3365525.txt

@marctessier
Copy link
Collaborator

If I use use_global_style_token_module: false training will work...

Copy link
Contributor

github-actions bot commented Jan 14, 2025

CLI load time: 0:00.27
Pull Request HEAD: 6b8cb165a98712e7347f8f0bd1b97486a65aef3a
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package
import time:      1048 |     103678 |     typer.main
import time:       284 |     122986 |   typer
import time:      7952 |     202276 | everyvoice.cli

Copy link

codecov bot commented Jan 14, 2025

Codecov Report

Attention: Patch coverage is 12.50000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 76.34%. Comparing base (dccad14) to head (6b8cb16).
Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
everyvoice/demo/app.py 0.00% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #605      +/-   ##
==========================================
- Coverage   76.49%   76.34%   -0.16%     
==========================================
  Files          47       47              
  Lines        3476     3483       +7     
  Branches      477      479       +2     
==========================================
  Hits         2659     2659              
- Misses        714      721       +7     
  Partials      103      103              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@joanise
Copy link
Member

joanise commented Jan 15, 2025

Needs the Copyright exceptions listed in LICENSE, otherwise looks good.

Copy link
Collaborator

@marctessier marctessier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After rebuilding a new ENV from scratch and running a few more sanity tests . I was able to use use_global_style_token_module: true with no more errors. Looks good .

@roedoejet roedoejet merged commit aa1326e into main Jan 20, 2025
4 checks passed
@roedoejet roedoejet deleted the dev.ap/gst branch January 20, 2025 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants