This repository contains the implementation of a Bangla Text-to-Speech (TTS) system based on the paper "Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention", along with related notes, code, and supporting work. For detailed insights and notes on the paper, refer to Bangla TTS with Guided Attention Notes.
- The LJ Speech Dataset: a public-domain speech dataset consisting of 13,100 short audio clips of a single female speaker.
- OpenSLR – High quality TTS data for the Bengali language: a public-domain speech dataset consisting of 10,100 short audio clips of a single speaker.
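For reference, LJ Speech ships with a `metadata.csv` whose rows are pipe-delimited (`clip_id|raw transcript|normalized transcript`). A minimal sketch of loading it (the function name and the choice to keep the normalized column are illustrative, not taken from this repository):

```python
import csv

def load_ljspeech_metadata(path):
    """Parse LJ Speech metadata.csv into (clip_id, normalized_text) pairs.

    Each row is pipe-delimited: clip_id|raw transcript|normalized transcript.
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            if len(row) >= 3:
                # Keep the normalized transcript (numbers etc. spelled out).
                pairs.append((row[0], row[2]))
    return pairs
```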
To train a model using the LJ Speech Dataset:

- Download the dataset and extract it into a directory, then set the directory path in `pkg/hyper.py`.
- Run the preprocessing script:

```
python3 main.py --action preprocess
```

- Train the Text2Mel network:

```
python3 main.py --action train --module Text2Mel
```

- Train the SSRN network:

```
python3 main.py --action train --module SuperRes
```
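The `--action`/`--module` flags above suggest a simple command-line dispatcher inside `main.py`. A hypothetical sketch of such a parser (the function names and validation logic are assumptions, not the repository's actual code):

```python
import argparse

def build_parser():
    """Build the CLI: --action selects the stage, --module the network."""
    parser = argparse.ArgumentParser(description="Bangla DC-TTS")
    parser.add_argument("--action", required=True,
                        choices=["preprocess", "train", "synthesis"])
    parser.add_argument("--module", choices=["Text2Mel", "SuperRes"],
                        help="network to train (required when --action is train)")
    return parser

def dispatch(args):
    """Map parsed flags to a (stage, module) pair; stands in for real handlers."""
    if args.action == "train" and args.module is None:
        raise SystemExit("--module is required for training")
    return (args.action, args.module)
```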
Synthesized samples along with their corresponding sentences are contained in the `synthesis` directory. The pre-trained models for Text2Mel and SuperRes (auto-saved during training at `logdir/text2mel/pkg/trained.pkg` and `logdir/superres/pkg/trained.pkg`, respectively) are loaded during synthesis.

To synthesize the samples listed in `sentences.txt`:

```
python3 main.py --action synthesis
```
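The synthesis input is assumed here to be one sentence per line in `sentences.txt`; a minimal sketch of reading it that way (the helper name and blank-line handling are assumptions):

```python
def load_sentences(path):
    """Read one sentence per line, skipping blank lines and edge whitespace."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```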
An example of the attention matrix for a specific sentence is also provided.
The current pre-trained models were trained for 20k batches (Text2Mel) and 19k batches (SuperRes). The results are not yet satisfactory; further hyperparameter tuning should improve them. You can download the pre-trained models from our Google Drive.
Ensure you have the following dependencies installed:
- numpy, scipy, librosa, num2words, matplotlib
- PyTorch == 1.8.1
- CUDA 10.2
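The Python dependencies above can be captured in a `requirements.txt` (a sketch; the CUDA toolkit is installed separately, not via pip):

```
numpy
scipy
librosa
num2words
matplotlib
torch==1.8.1
```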
For TensorFlow implementation, refer to Kyubyong/dc_tts.
For any questions or suggestions, please contact Sajid Ahmed ([email protected]) or Arifuzzaman Arman ([email protected]).