Korean multi-speaker VITS

This project builds on the official PyTorch implementation of VITS by jaywalnut310. After training for 10 epochs (batch size 32, 460k steps), inference results for two randomly chosen male and female speakers are available in inference_samples.

Getting Started

Setting up the development environment for this project can be challenging due to version conflicts among the required libraries, so the environment is managed with a Docker container.

The Docker image used to create the container can be downloaded from Docker Hub.
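Once you have the image name from Docker Hub, pulling it and starting a GPU container looks roughly like the following. The image name and mount path below are placeholders, not values taken from this repository:

docker pull <dockerhub_user>/<image_name>
docker run --gpus all -it -v /path/to/dataset:/workspace/dataset <dockerhub_user>/<image_name>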

Dataset

The data used for model training can be downloaded from the following link.

Data Preprocessing

After downloading the dataset, you need to preprocess it so it can be used for training.

  1. If any file paths contain Korean or special characters, rename them using only English letters or digits.
  2. Convert the training .wav files to a 22 kHz sampling rate (see the first sketch after this list).
  3. If there are stereo files, convert them to mono (also covered in the first sketch).
  4. The filelists downloaded from Google Drive link the labels to the wav files. (The .cleaned files are the result of converting the sentences with g2pk.)
     4-1. If you use a different phoneme-conversion module, convert the few English words to their Korean pronunciation and remove the '\xa0' special character.
  5. Place make_mels.py at the top-level path of the dataset and generate the mel spectrograms, pre-creating the files required for training (a rough sketch of this step follows below).
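The following is a minimal sketch of steps 2 and 3, assuming librosa and soundfile are installed; this helper is not part of the repository, and the dataset path is a placeholder:

import glob

import librosa
import soundfile as sf

# Resample every .wav to 22,050 Hz and downmix to mono, overwriting in place.
for path in glob.glob("dataset/**/*.wav", recursive=True):
    audio, sr = librosa.load(path, sr=22050, mono=True)  # mono float32 at 22 kHz
    sf.write(path, audio, sr)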
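Step 5 itself is handled by the repository's make_mels.py. Purely as a hypothetical illustration, a pass like it could look as follows, assuming the standard VITS spectrogram settings (1024-point FFT, hop length 256, 80 mel bins); the actual script and its output filenames may differ:

import glob

import torch
import torchaudio

# Standard VITS mel settings; the repository's make_mels.py may use others.
mel_fn = torchaudio.transforms.MelSpectrogram(
    sample_rate=22050, n_fft=1024, win_length=1024,
    hop_length=256, n_mels=80)

for path in glob.glob("dataset/**/*.wav", recursive=True):
    wav, sr = torchaudio.load(path)                   # (1, T) mono after steps 2-3
    mel = mel_fn(wav)                                 # (1, 80, frames)
    torch.save(mel, path.replace(".wav", ".mel.pt"))  # assumed output name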

Installing

You can clone this GitHub repository and use it directly:

git clone https://github.com/0913ktg/vits_korean_multispeaker

You can download the model checkpoints and filelists from the Google Drive link.

Train

Once you have the 22 kHz audio files and the train and validation filelists, and have completed the data preprocessing above, you can start training by running train_ms.py. Multi-GPU training has been confirmed to work.
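Assuming this fork keeps the upstream VITS entry point, where -c points at the training config and -m names the directory that checkpoints are written to, a typical invocation looks like this (both values are placeholders for your setup):

python train_ms.py -c configs/your_config.json -m your_model_name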

Synthesis

  1. In inference.py, set the path to the Generator checkpoint.
  2. Enter the desired Korean sentences in texts. (Separate multiple sentences with commas.)
  3. Enter the speaker number in sid (from 0 to 184).
  4. Running inference.py creates one file named test{i}.wav per sentence.
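For reference, here is a minimal sketch of what such an inference script looks like, modeled on the upstream jaywalnut310/vits example code; the checkpoint and config paths are placeholders, and this fork's Korean text cleaners and config keys may differ:

import torch
from scipy.io.wavfile import write

import commons
import utils
from models import SynthesizerTrn
from text import text_to_sequence
from text.symbols import symbols

def get_text(text, hps):
    # Convert a sentence to a symbol-id sequence, interspersing blanks
    # when the training config expects them.
    text_norm = text_to_sequence(text, hps.data.text_cleaners)
    if hps.data.add_blank:
        text_norm = commons.intersperse(text_norm, 0)
    return torch.LongTensor(text_norm)

hps = utils.get_hparams_from_file("configs/your_config.json")  # placeholder
net_g = SynthesizerTrn(
    len(symbols),
    hps.data.filter_length // 2 + 1,
    hps.train.segment_size // hps.data.hop_length,
    n_speakers=hps.data.n_speakers,
    **hps.model).cuda().eval()
utils.load_checkpoint("path/to/G_xxx.pth", net_g, None)  # Generator checkpoint

texts = ["안녕하세요."]  # the Korean sentences to synthesize
for i, text in enumerate(texts):
    stn_tst = get_text(text, hps)
    with torch.no_grad():
        x_tst = stn_tst.cuda().unsqueeze(0)
        x_tst_lengths = torch.LongTensor([stn_tst.size(0)]).cuda()
        sid = torch.LongTensor([0]).cuda()  # speaker id in [0, 184]
        audio = net_g.infer(
            x_tst, x_tst_lengths, sid=sid,
            noise_scale=0.667, noise_scale_w=0.8, length_scale=1.0,
        )[0][0, 0].data.cpu().float().numpy()
    write(f"test{i}.wav", hps.data.sampling_rate, audio)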
