I successfully fine-tuned the 300M model with my own data. However, when using the latest streaming inference methods and specifying one female speaker for long-text speech synthesis, I find that some words or phrases suddenly switch to a male speaker's voice, which is quite jarring.
I'm curious about the reason behind this sudden change in the speaker's voice during streaming inference. Could it be because speech tokens are segmented as input? Is it possible to solve this problem by adjusting `token_min_hop_len`, `token_max_hop_len`, `token_overlap_len`, or other parameters?
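For context, streaming synthesis typically feeds the decoder overlapping chunks of speech tokens, and the hop/overlap parameters named above control where those chunk boundaries fall. A minimal sketch of such overlap segmentation (a hypothetical helper for illustration, not the actual CosyVoice implementation):

```python
def segment_tokens(tokens, hop_len, overlap_len):
    """Split a token sequence into overlapping chunks for streaming synthesis.

    Each chunk carries `overlap_len` extra tokens of lookahead context.
    If that context is too short, the model conditions on less history at
    each boundary, which is where artifacts like voice drift can appear.
    """
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + hop_len + overlap_len, len(tokens))
        chunks.append(tokens[start:end])
        start += hop_len
    return chunks

# e.g. 10 tokens, hop of 4, overlap of 2 -> adjacent chunks share 2 tokens
print(segment_tokens(list(range(10)), hop_len=4, overlap_len=2))
# [[0, 1, 2, 3, 4, 5], [4, 5, 6, 7, 8, 9], [8, 9]]
```

Under this reading, increasing the overlap (or the minimum hop) trades latency for more context at each boundary, which is why tuning those parameters could plausibly affect the artifact.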
Looking forward to your reply.
Set `stream=False`.
This is due to a train/inference mismatch. First, we use fp16 inference (we will add AMP training later). Second, we train on whole utterances, so there is some performance degradation in streaming inference mode, and this is not easy to solve.
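A minimal numeric illustration of the fp16/fp32 mismatch mentioned above: weights trained in fp32 lose precision when cast to fp16 for inference, and this quantization error can compound across many layers (the values below are illustrative, not from the model):

```python
import numpy as np

# A weight value as trained (fp32) versus as used at fp16 inference time.
w32 = np.float32(0.1234567)
w16 = np.float16(w32)  # fp16 has a 10-bit mantissa (~3 decimal digits)

# The cast is lossy: the fp16 value no longer equals the trained value.
error = abs(float(w16) - float(w32))
print(w32, float(w16), error)
```

AMP training (mixed fp16/fp32) would expose the model to this reduced precision during training, closing part of the train/inference gap.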
Thanks for your reply. Do you have any recommendations for mitigating the voice-changing phenomenon during streaming inference? It rarely appears in short-sentence synthesis compared with long-text synthesis.
Great work, thanks.