
How to include stats.h5 of PWG vocoder during ONNX conversion for TTS #94

Open
anirpipi opened this issue Jul 1, 2023 · 2 comments

@anirpipi

anirpipi commented Jul 1, 2023

Hi,
I am trying to convert a pretrained LJSpeech TTS model, based on kan-bayashi/ljspeech_fastspeech2 with the parallel_wavegan/ljspeech_parallel_wavegan.v1 vocoder, to ONNX using the code below:

########################### ONNX Conversion ############################

from espnet2.bin.tts_inference import Text2Speech
from espnet_onnx.export import TTSModelExport

m = TTSModelExport()

# FastSpeech2 acoustic model: checkpoint and training config
tag_exp = "exp/tts_train_fastspeech2_raw_phn_tacotron_g2p_en_no_space/train.loss.ave_5best.pth"
train_config = "exp/tts_train_fastspeech2_raw_phn_tacotron_g2p_en_no_space/config.yaml"

# Parallel WaveGAN vocoder: checkpoint and config (stats.h5 lives in this folder too)
vocoder_tag = 'parallel_wavegan.v1/checkpoint-400000steps.pkl'
vocoder_config = 'parallel_wavegan.v1/config.yml'

text2speech = Text2Speech.from_pretrained(
    train_config=train_config,
    model_file=tag_exp,
    vocoder_file=vocoder_tag,
    vocoder_config=vocoder_config,
    speed_control_alpha=1.0,
    always_fix_seed=False,
)

# Export to ~/.cache/espnet_onnx/<tag_name>/
tag_name = 'ljspeech_pretrained'
m.export(text2speech, tag_name, quantize=True)

########################### Inference ############################

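from espnet_onnx import Text2Speech
import soundfile

# Load the exported ONNX model (same tag_name as used in the export step)
text2speech = Text2Speech(tag_name)

text = 'hello world!'
output = text2speech(text)  # inference returns a dict; the waveform is under 'wav'
wav = output['wav']

# LJSpeech models run at 22.05 kHz
soundfile.write("ljspeech_pretrained_test.wav", wav, 22050, "PCM_16")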

######################################################################

The synthesized audio quality is very low.
I noticed that the exported ONNX folder does not contain the stats.h5 file from the PWG vocoder folder. Its contents are:
~/.cache/espnet_onnx/ljspeech_pretrained/: config.yaml feats_stats.npz full quantize

Can anyone please explain how to include stats.h5 during inference with espnet_onnx?
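For context, here is a minimal sketch of what stats.h5 is typically used for. It assumes the file was written by ParallelWaveGAN's statistics step and therefore holds `mean` and `scale` datasets, and that the PyTorch vocoder runs with `normalize_before=True`, in which case mel features are standardized as `(feats - mean) / scale` before entering the generator. If the exported ONNX vocoder skips that step, un-normalized input would plausibly explain degraded audio. The helpers `load_pwg_stats`, `normalize_feats` and `denormalize_feats` are hypothetical (not part of espnet_onnx), `load_pwg_stats` needs `h5py`, and the sketch does not show how to hook this into espnet_onnx's `Text2Speech`.

```python
import numpy as np


def load_pwg_stats(path):
    """Read 'mean' and 'scale' from a ParallelWaveGAN stats.h5 (hypothetical helper).

    Assumes the file contains those two datasets and that h5py is installed.
    """
    import h5py  # imported lazily so the normalization helpers work without h5py

    with h5py.File(path, "r") as f:
        mean = np.asarray(f["mean"][()], dtype=np.float32)
        scale = np.asarray(f["scale"][()], dtype=np.float32)
    return mean, scale


def normalize_feats(feats, mean, scale):
    """Standardize mel features of shape (frames, n_mels) per mel bin.

    Mirrors the (feats - mean) / scale step assumed above; mean and scale
    have shape (n_mels,) and broadcast over the frame axis.
    """
    feats = np.asarray(feats, dtype=np.float32)
    if feats.shape[-1] != mean.shape[-1]:
        raise ValueError(f"feature dim {feats.shape[-1]} != stats dim {mean.shape[-1]}")
    return (feats - mean) / scale


def denormalize_feats(feats, mean, scale):
    """Inverse of normalize_feats: map standardized features back to the original scale."""
    return np.asarray(feats, dtype=np.float32) * scale + mean


if __name__ == "__main__":
    # Synthetic stand-ins for an 80-bin mel spectrogram and its statistics
    rng = np.random.default_rng(0)
    mean = rng.normal(size=80).astype(np.float32)
    scale = rng.uniform(0.5, 2.0, size=80).astype(np.float32)
    mel = rng.normal(size=(100, 80)).astype(np.float32)

    norm = normalize_feats(mel, mean, scale)
    restored = denormalize_feats(norm, mean, scale)
    print(norm.shape, np.allclose(restored, mel, atol=1e-5))
```

With real data, the statistics would come from something like `mean, scale = load_pwg_stats("parallel_wavegan.v1/stats.h5")`, and the feature dimension must match the vocoder's mel bins, which is why `normalize_feats` checks it explicitly.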

@Masao-Someki
Collaborator

Hi @anirpipi, sorry for the late reply, and thank you for reporting the issue.
It may be a bug, so I would like to look into it.
It seems you are using your own trained model; can you confirm whether this issue still happens with the published models? If it is reproducible, I will download the model and investigate.

@anirpipi
Author

Hi, thanks for the response.
The same happens with the pretrained models.
VITS works fine, but the problem occurs with FastSpeech2+PWG.
Could you please look into it?
Thanks in advance.
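One rough, hedged way to put a number on the FastSpeech2+PWG degradation is to synthesize the same text with espnet2's PyTorch `Text2Speech` and with the exported espnet_onnx model, then compare the two waveforms spectrally. Both are assumed to return a dict with a `'wav'` entry, as the inference code above does for espnet_onnx, and both waveforms are assumed to be 1-D arrays at the same sampling rate. Parallel WaveGAN feeds random noise into its generator at inference, so two runs never match sample by sample; that is why this sketch compares log-magnitude spectra instead of raw samples. The helper names are hypothetical, the distance is in natural-log magnitude units (not the usual dB convention), and no time alignment is done beyond truncating to the shorter waveform.

```python
import numpy as np


def log_magnitude_spectrogram(wav, n_fft=1024, hop=256, eps=1e-7):
    """Framewise log-magnitude STFT of a 1-D waveform (Hann window, no padding)."""
    wav = np.asarray(wav, dtype=np.float64)
    if wav.size < n_fft:
        raise ValueError("waveform shorter than one FFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (wav.size - n_fft) // hop
    frames = np.stack([wav[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # rfft over the last axis gives n_fft // 2 + 1 frequency bins per frame
    return np.log(np.abs(np.fft.rfft(frames, axis=-1)) + eps)


def log_spectral_distance(ref, test, **stft_kwargs):
    """Mean over frames of the RMS difference between the two log spectra.

    Both waveforms are truncated to the shorter length before the STFT.
    """
    ref = np.asarray(ref)
    test = np.asarray(test)
    n = min(ref.size, test.size)
    a = log_magnitude_spectrogram(ref[:n], **stft_kwargs)
    b = log_magnitude_spectrogram(test[:n], **stft_kwargs)
    return float(np.mean(np.sqrt(np.mean((a - b) ** 2, axis=-1))))


if __name__ == "__main__":
    # Synthetic check: a 220 Hz tone against itself and against a noisy copy
    sr = 22050
    t = np.arange(sr) / sr
    clean = 0.5 * np.sin(2 * np.pi * 220 * t)
    noisy = clean + 0.05 * np.random.default_rng(0).normal(size=clean.size)
    print(log_spectral_distance(clean, clean), log_spectral_distance(clean, noisy))
```

In practice `ref` would be the PyTorch output and `test` the ONNX output for the same text; a distance much larger than between two PyTorch runs would point at the export path rather than the model itself.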
