This repository provides audio samples of Zero-Shot Grad-TTS.
The audio samples for Zero-Shot Grad-TTS can be found at the following link.
To compare synthesis performance on seen speakers, we randomly selected speakers from the LibriTTS dataset used for training and synthesized speech for them.
To compare synthesis performance on unseen speakers, we selected a total of 11 speakers from the VCTK dataset and synthesized speech for them. The 11 selected speakers are as follows:
VCTK: p225, p234, p245, p302
For model comparison, we compare against SC-GlowTTS and YourTTS, two flow-based zero-shot multi-speaker speech synthesis models.
The synthesized audio samples for SC-GlowTTS and YourTTS were taken from the demo samples released by their respective authors. You can download the authors' samples from the links below.
This work was partially supported by the Artificial Intelligence Industry Cluster Agency (AICA) grant funded by the Korea government (MSIT) (K-Digital Challenge: AI Startup Foundation Competition, 2023), and by a research fund from Chosun University, 2023.
These audio samples are MIT-licensed.