
ultranationalism/GPT-SoVITS-mindspore


GPT-SoVITS-MindSpore-WebUI

A Powerful Few-shot Voice Conversion and Text-to-Speech WebUI.

English | 中文简体


This repo is a MindSpore implementation of the GPT-SoVITS model, based on the reference implementation by RVC-BOSS.

Features:

  1. Zero-shot TTS: Input a 5-second vocal sample and experience instant text-to-speech conversion.

  2. Few-shot TTS: Fine-tune the model with just 1 minute of training data for improved voice similarity and realism.

  3. Cross-lingual Support: Inference in languages different from the training dataset, currently supporting English, Japanese, and Chinese.

  4. WebUI Tools (TODO): Integrated tools for automatic training set segmentation, Chinese ASR, and text labeling, to help beginners create training datasets and GPT/SoVITS models.

Installation

Tested Environments

  • Python 3.9, MindSpore 2.2.3, CUDA 11.6
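
To compare a local setup with the tested environment, a small check like the one below may help. It is only a sketch and not part of this repo: the function name `describe_environment` is hypothetical, and it assumes MindSpore was installed under the distribution name `mindspore`. It reports versions and does not enforce them.

```python
import sys
from importlib import metadata


def describe_environment():
    """Return the local Python version and the installed MindSpore version (or None)."""
    python_version = "{}.{}".format(*sys.version_info[:2])
    try:
        # Assumption: the package was installed under the distribution name "mindspore".
        # This reads package metadata only; it does not import MindSpore.
        mindspore_version = metadata.version("mindspore")
    except metadata.PackageNotFoundError:
        mindspore_version = None
    return {"python": python_version, "mindspore": mindspore_version}


if __name__ == "__main__":
    env = describe_environment()
    print("Python:", env["python"], "(tested: 3.9)")
    print("MindSpore:", env["mindspore"] or "not installed", "(tested: 2.2.3)")
```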

Linux

conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
bash install.sh

Install Manually

Install Dependencies

pip install -r requirements.txt

Install FFmpeg

Conda Users

conda install ffmpeg

Ubuntu/Debian Users

sudo apt install ffmpeg
sudo apt install libsox-dev
conda install -c conda-forge 'ffmpeg<7'
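
After installing, it can be useful to confirm that the `ffmpeg` executable is actually visible on `PATH` from the active environment. The helper below is a minimal sketch using only the standard library; `find_tool` is a hypothetical name, not something this repo provides.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("ffmpeg")
    print("ffmpeg found at " + path if path else "ffmpeg not found on PATH")
```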

Pretrained Models

You can use the model conversion tool GPT_SoVITS/convert.py to transform PyTorch model weights into MindSpore model weights.

cd GPT-SoVITS-mindspore
python GPT_SoVITS/convert.py --g_path path_to_your_GPT_model \
    --s_path path_to_your_Sovits_model

Download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models.
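
The conversion command above can also be driven from Python, which makes it easy to fail early when a weight file is missing. This is a sketch under assumptions: `build_convert_command` and `convert` are hypothetical helpers, only the `--g_path` and `--s_path` flags shown above are used, and the script is expected to be run from the repository root.

```python
import subprocess
import sys
from pathlib import Path


def build_convert_command(g_path, s_path, script="GPT_SoVITS/convert.py"):
    """Build the argument list for the conversion script, using the current interpreter."""
    return [sys.executable, script, "--g_path", str(g_path), "--s_path", str(s_path)]


def convert(g_path, s_path):
    """Run the conversion after checking that both PyTorch weight files exist."""
    for path in (g_path, s_path):
        if not Path(path).is_file():
            raise FileNotFoundError(f"weight file not found: {path}")
    # check=True raises CalledProcessError if convert.py exits with a non-zero status.
    subprocess.run(build_convert_command(g_path, s_path), check=True)
```

Calling `convert("path_to_your_GPT_model", "path_to_your_Sovits_model")` raises `FileNotFoundError` until the placeholders are replaced with real paths.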

Start Inference

Launch WebUI

You can use a startup script:

cd GPT-SoVITS-mindspore
bash launch_webui.sh

Or directly launch the Python file:

cd GPT-SoVITS-mindspore
python GPT_SoVITS/inference_webui.py

Reference information

Upload a reference audio clip (must be 3-10 seconds long), then fill in the Text for reference audio field with a transcript of what the speaker says in the clip. Choose the language on the right.

The reference audio is very important, as it determines the speed and emotion of the output. If you do not get the output you want, try a different reference clip.
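
Because the clip must fall in the 3-10 second range, it can save time to check its length before uploading. The sketch below assumes the clip is an uncompressed PCM WAV file readable by Python's standard `wave` module; the function names are hypothetical and other formats would need a different reader.

```python
import wave


def clip_duration_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav:
        # Duration is the number of audio frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(wav_path, min_seconds=3.0, max_seconds=10.0):
    """Check the clip against the 3-10 second limit for reference audio."""
    return min_seconds <= clip_duration_seconds(wav_path) <= max_seconds
```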

Inference

Fill in the inference text and set the inference language, then click Start inference.
