✨ SuperVoice VoiceBox

Feel free to join my Discord Server to discuss this model!

An independent VoiceBox implementation for voice synthesis. Currently in BETA.

Features

⚡️ Natural sounding
🎤 High quality - 24 kHz audio
🤹‍♂️ Versatile - synthesized voice has high variability
📕 Currently only English is supported, but nothing stops us from adding more languages.

Samples

sample_1.mp4

sample_2.mp4

sample_3.mp4

sample_4.mp4

How to use

SuperVoice consists of three networks: a GPT model for phoneme and prosody generation, an audio model that synthesizes the mel spectrogram, and a vocoder that turns the spectrogram into a waveform. SuperVoice is published via Torch Hub, so you can use it as follows:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Vocoder
vocoder = torch.hub.load(repo_or_dir='ex3ndr/supervoice-vocoder', model='bigvsan')
vocoder.to(device)
vocoder.eval()

# GPT Model
gpt = torch.hub.load(repo_or_dir='ex3ndr/supervoice-gpt', model='phonemizer')
gpt.to(device)
gpt.eval()

# Main Model
model = torch.hub.load(repo_or_dir='ex3ndr/supervoice-voicebox', model='phonemizer', gpt=gpt, vocoder=vocoder)
model.to(device)
model.eval()

# Generate audio
# SuperVoice ships three example voices: "voice_1", "voice_2" (my favorite), and "voice_3".
# Omit the voice parameter to use a random voice, or provide your own; a custom voice requires a TextGrid alignment.
# steps controls audio quality; recommended values are 4, 8, or 32.
# alpha controls randomness and should be below 1.0: 0.1 gives stable synthesis with small variations,
# 0.3 is a good value for more expressive synthesis, and 0.5 is the maximum recommended value.
output = model.synthesize("What time is it, Steve?", voice = "voice_1", steps = 8, alpha = 0.1)

# Mel spectrogram output
melspec = output['melspec']

# 1D tensor of 24 kHz audio (missing if no vocoder is provided)
waveform = output['wav']

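# Play audio in a notebook (move the tensor to CPU first, in case the model runs on CUDA)
from IPython.display import Audio, display
display(Audio(data=waveform.detach().cpu().numpy(), rate=24000))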
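
Outside a notebook you will probably want to write the result to a file instead. The following is a minimal sketch using only the Python standard library; `save_wav` is a hypothetical helper, not part of SuperVoice, and it assumes the waveform samples are floats nominally in the range [-1.0, 1.0].

```python
import struct
import wave

SAMPLE_RATE = 24000  # Hz, the output rate stated in this README


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write a 1D sequence of float samples as a 16-bit mono PCM WAV file.

    Assumption: samples are floats nominally in [-1.0, 1.0].
    Values outside that range are clipped. Returns the duration in seconds.
    """
    values = [max(-1.0, min(1.0, float(s))) for s in samples]  # clip to [-1, 1]
    pcm = bytearray()
    for v in values:
        # Scale to signed 16-bit little-endian integers
        pcm += struct.pack("<h", int(round(v * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)          # mono
        f.setsampwidth(2)          # 2 bytes = 16 bit
        f.setframerate(sample_rate)
        f.writeframes(bytes(pcm))
    return len(values) / sample_rate
```

With the `waveform` tensor from above, this could be called as `save_wav(waveform.detach().cpu().tolist(), "output.wav")`. For long outputs a vectorized writer (for example via NumPy) would be faster; the loop here just keeps the sketch dependency-free.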

License

MIT

