This repository contains the implementation of a Bangla Text-to-Speech (TTS) system based on the paper "Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention", along with related notes, code, and supporting work. For detailed insights and notes on the paper, refer to Bangla TTS with Guided Attention Notes.
- The LJ Speech Dataset: a public-domain speech dataset consisting of 13,100 short audio clips of a single female speaker.
- OpenSLR – High quality TTS data for the Bengali language: a public-domain speech dataset consisting of 10,100 short audio clips of a single speaker.
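For reference, LJ Speech ships with a `metadata.csv` whose rows are pipe-delimited (`clip_id|raw transcript|normalized transcript`). A minimal sketch of loading it (the function name and the choice to keep the normalized column are illustrative, not taken from this repository):

```python
import csv

def load_ljspeech_metadata(path):
    """Parse LJ Speech metadata.csv into (clip_id, normalized_text) pairs.

    Each row is pipe-delimited: clip_id|raw transcript|normalized transcript.
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            if len(row) >= 3:
                # Keep the normalized transcript (numbers etc. spelled out).
                pairs.append((row[0], row[2]))
    return pairs
```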
To train a model using the LJ Speech Dataset:

- Download the dataset and extract it into a directory, then set the directory path in `pkg/hyper.py`.
- Run the preprocessing script:

```
python3 main.py --action preprocess
```

- Train the Text2Mel network:

```
python3 main.py --action train --module Text2Mel
```

- Train the SSRN network:

```
python3 main.py --action train --module SuperRes
```
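The `--action`/`--module` flags above suggest a simple command-line dispatcher inside `main.py`. A hypothetical sketch of such a parser (the function names and validation logic are assumptions, not the repository's actual code):

```python
import argparse

def build_parser():
    """Build the CLI: --action selects the stage, --module the network."""
    parser = argparse.ArgumentParser(description="Bangla DC-TTS")
    parser.add_argument("--action", required=True,
                        choices=["preprocess", "train", "synthesis"])
    parser.add_argument("--module", choices=["Text2Mel", "SuperRes"],
                        help="network to train (required when --action is train)")
    return parser

def dispatch(args):
    """Map parsed flags to a (stage, module) pair; stands in for real handlers."""
    if args.action == "train" and args.module is None:
        raise SystemExit("--module is required for training")
    return (args.action, args.module)
```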
Synthesized samples along with their corresponding sentences are contained in the `synthesis` directory. The pre-trained models for Text2Mel and SuperRes (auto-saved during training at `logdir/text2mel/pkg/trained.pkg` and `logdir/superres/pkg/trained.pkg`, respectively) are loaded during synthesis.

To synthesize the samples listed in `sentences.txt`:

```
python3 main.py --action synthesis
```
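The synthesis input is assumed here to be one sentence per line in `sentences.txt`; a minimal sketch of reading it that way (the helper name and blank-line handling are assumptions):

```python
def load_sentences(path):
    """Read one sentence per line, skipping blank lines and edge whitespace."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```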
An example of the attention matrix for a specific sentence is also provided.
The current pre-trained models were trained for 20k batches (Text2Mel) and 19k batches (SuperRes). The results are not yet satisfactory; further hyperparameter tuning should improve them. You can download the pre-trained models from our Google Drive.
Ensure you have the following dependencies installed:
- numpy, scipy, librosa, num2words, matplotlib
- PyTorch == 1.8.1
- CUDA 10.2
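The Python dependencies above can be captured in a `requirements.txt` (a sketch; the CUDA toolkit is installed separately, not via pip):

```
numpy
scipy
librosa
num2words
matplotlib
torch==1.8.1
```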
For TensorFlow implementation, refer to Kyubyong/dc_tts.
For any questions or suggestions, please contact Sajid Ahmed ([email protected]) or Arifuzzaman Arman ([email protected]).