Reference Paper: Neural Speech Synthesis with Transformer Network
Final Report
The problem I have implemented over the course of this project is text-to-speech (TTS). TTS has gained prominence in many fields and solves real-world problems we encounter in daily life. It has been useful from the beginning, but it has since become far more powerful: with recent advances, we can generate intelligible, natural-sounding audio that is nearly indistinguishable from human recordings. The main idea behind the project is to enable users to customize their lexicon to convey their thoughts, emotions, and requests in their day-to-day lives. It immensely helps users with disabilities and changes the way they communicate, since they can rely on this technology to express their ideas and share them with those around them. TTS has many applications in daily life; among the most popular are personal voice assistants and Google products such as Ok Google, Google Translate, and Google Maps.
The main objective of this project is to build a text-to-speech application whose generated audio is indistinguishable from a human voice. The TTS system focuses on the following objectives:
- The input to the TTS system is text, so the text must be preprocessed first: special characters are removed, while syllable boundaries and punctuation are kept as special markers. Phoneme sequences are then generated from the text and used as input for model training (see the phonemization sketch after this list).
- Preprocess the audio files: normalize the audio, trim it to a specified length, and extract the Mel-spectrograms. The extracted Mel-spectrograms are stored and used for training the TTS model (see the feature-extraction sketch after this list).
- Design and implement the TTS model described in the reference paper. For this project, I have used the repository linked below as a reference implementation.
- Integrate the HiFi-GAN vocoder with the TTS model. HiFi-GAN is used to generate high-quality audio from the Mel-spectrograms.
- Integrate the Griffin-Lim vocoder with the TTS model. Griffin-Lim is likewise used to generate audio from the Mel-spectrograms.
- Design and implement the TTS system so that it can read out text at any speed rate the user specifies (see the speed-control sketch after this list).
- Finally, build a web app for this TTS system using Streamlit.
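As a rough illustration of the text-preprocessing objective above, the sketch below cleans the input text and converts it to phonemes. It assumes the g2p_en package for grapheme-to-phoneme conversion; the exact cleaning rules, marker symbols, and G2P tool used in this project may differ.

```python
import re
from g2p_en import G2p  # assumed G2P library; the project may use a different one

g2p = G2p()

def clean_text(text: str) -> str:
    """Lowercase, drop unsupported special characters, and keep punctuation
    that later serves as special (pause/boundary) markers."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9 ,.?!';:\-]", "", text)  # remove unsupported characters
    return re.sub(r"\s+", " ", text).strip()         # collapse extra whitespace

def text_to_phonemes(text: str) -> list:
    """Convert cleaned text to a phoneme sequence for model training."""
    return g2p(clean_text(text))

if __name__ == "__main__":
    print(text_to_phonemes("Please, say something."))
    # e.g. ['P', 'L', 'IY1', 'Z', ',', ' ', 'S', 'EY1', ...]
```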
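The audio-preprocessing objective can be sketched with librosa as below. The sampling rate, trim threshold, and Mel parameters are illustrative assumptions, not necessarily the exact values used for training.

```python
import numpy as np
import librosa

SR = 22050         # assumed sampling rate
N_FFT = 1024       # assumed FFT size
HOP_LENGTH = 256   # assumed hop length
N_MELS = 80        # typical Mel-channel count for TTS

def extract_mel(wav_path: str) -> np.ndarray:
    # Load and resample the waveform to a common rate
    wav, _ = librosa.load(wav_path, sr=SR)
    # Peak-normalize so all clips share a comparable amplitude scale
    wav = wav / max(np.abs(wav).max(), 1e-8)
    # Trim leading/trailing silence
    wav, _ = librosa.effects.trim(wav, top_db=40)
    # Extract the Mel-spectrogram and compress it to log scale
    mel = librosa.feature.melspectrogram(
        y=wav, sr=SR, n_fft=N_FFT, hop_length=HOP_LENGTH, n_mels=N_MELS)
    return np.log(np.clip(mel, 1e-5, None))  # shape: (n_mels, frames)
```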
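One way to realize the user-specified speed rate is to time-stretch the predicted Mel-spectrogram before vocoding. The sketch below uses simple linear interpolation in PyTorch; this is an illustrative assumption, not necessarily the mechanism used in this code base.

```python
import torch
import torch.nn.functional as F

def stretch_mel(mel: torch.Tensor, speed: float) -> torch.Tensor:
    """Time-stretch a Mel-spectrogram of shape (n_mels, frames).

    speed > 1.0 yields faster (shorter) speech;
    speed < 1.0 yields slower (longer) speech.
    """
    n_frames = max(1, int(round(mel.size(1) / speed)))
    # F.interpolate expects input of shape (batch, channels, length)
    mel = mel.unsqueeze(0)
    mel = F.interpolate(mel, size=n_frames, mode="linear", align_corners=False)
    return mel.squeeze(0)
```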
Transformer TTS implementation taken from: https://github.com/soobinseo/Transformer-TTS
- Create a new environment using Anaconda or Miniconda
conda create -n env_name python=3.6
- Activate the conda environment
conda activate env_name
- Install dependencies using requirements.txt
pip install -r requirements.txt
- Inference using the pre-trained model (Griffin-Lim vocoder; a Griffin-Lim sketch follows the commands below):
python predict_tts.py -t "Please, say something."
After running the above command, the output audio file is saved in the outputs directory.
- Inference using the pre-trained model (HiFi-GAN vocoder; a HiFi-GAN sketch follows below):
python predict_tts_vocoder.py -t "please say something" --voc hifigan
- To use the web app, run the following command (a minimal app.py sketch follows below)
streamlit run app.py
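For reference, here is a sketch of how Griffin-Lim can reconstruct a waveform from a Mel-spectrogram using librosa's built-in inverse. The parameters must match those used during feature extraction and are assumptions here.

```python
import numpy as np
import librosa

def mel_to_wav_griffinlim(log_mel: np.ndarray, sr: int = 22050) -> np.ndarray:
    mel = np.exp(log_mel)  # undo the log compression applied at extraction time
    # Invert the Mel filterbank and run Griffin-Lim phase estimation
    wav = librosa.feature.inverse.mel_to_audio(
        mel, sr=sr, n_fft=1024, hop_length=256, n_iter=60)
    return wav  # can be saved with soundfile.write(...)
```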
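And a sketch of HiFi-GAN inference, assuming the Generator class and checkpoint format from the official jik876/hifi-gan repository; the paths and config handling below are placeholders.

```python
import json
import torch
from models import Generator  # from the jik876/hifi-gan repository
from env import AttrDict      # config wrapper used by that repository

def load_hifigan(config_path: str, ckpt_path: str, device: str = "cpu"):
    with open(config_path) as f:
        h = AttrDict(json.load(f))
    generator = Generator(h).to(device)
    state = torch.load(ckpt_path, map_location=device)
    generator.load_state_dict(state["generator"])
    generator.eval()
    generator.remove_weight_norm()  # required for inference in that repo
    return generator

@torch.no_grad()
def mel_to_wav_hifigan(generator, mel: torch.Tensor) -> torch.Tensor:
    # mel: (n_mels, frames) -> add a batch dim -> waveform of shape (samples,)
    wav = generator(mel.unsqueeze(0))
    return wav.squeeze()
```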
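Finally, a minimal sketch of what app.py might look like. Here synthesize() is a hypothetical helper standing in for this project's actual inference code, and the vocoder names mirror the CLI options above.

```python
import streamlit as st

# Hypothetical helper wrapping this project's inference code:
# returns raw WAV bytes for the given text, vocoder choice, and speed.
from tts_inference import synthesize  # assumed module, not part of the repo

st.title("Transformer TTS Demo")

text = st.text_input("Text to synthesize", "Please, say something.")
vocoder = st.selectbox("Vocoder", ["griffin-lim", "hifigan"])
speed = st.slider("Speed rate", 0.5, 2.0, 1.0)

if st.button("Synthesize"):
    wav_bytes = synthesize(text, vocoder=vocoder, speed=speed)
    st.audio(wav_bytes, format="audio/wav")
```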
- Generated Audio Samples:
- For the text: "Transformers were introduced in 2017 by a team at Google Brain and are increasingly the model of choice for NLP problems, replacing RNN models such as long short-term memory."
- Griffin-Lim Vocoder: sample audio output
- HiFi-Gan Vocoder: sample audio output
- Screenshot of the web app: