Emotional-Speech-Synthesis

This work was done for the course DT2119 Speech and Speaker Recognition (2022) at the KTH Royal Institute of Technology, Stockholm. See report.pdf for full details about the project.

The project aimed to condition a TTS system (Tacotron2 + WaveGlow) on an emotion or a speaker identity, using the IEMOCAP and VCTK datasets. It builds upon the following repositories:
https://github.com/NVIDIA/tacotron2
https://github.com/NVIDIA/waveglow
https://github.com/resemble-ai/Resemblyzer
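
To illustrate what "conditioning" on an emotion or speaker identity can mean, here is a minimal sketch that assumes the embedding is concatenated to every encoder output frame before decoding. This is a common approach in multi-speaker Tacotron variants, but it is an assumption here; report.pdf describes what this project actually does. The function name `condition_on_embedding` and the toy values are hypothetical and do not come from this repository.

```python
from typing import List


def condition_on_embedding(
    encoder_outputs: List[List[float]], embedding: List[float]
) -> List[List[float]]:
    """Append the same embedding vector to every encoder time step.

    Hypothetical helper for illustration only; not part of this repository.
    """
    # List concatenation builds a new frame, so the inputs are left unchanged.
    return [frame + embedding for frame in encoder_outputs]


# Toy example: 3 time steps with 2 features each, and a 2-dimensional embedding.
encoder_outputs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
emotion_embedding = [1.0, 0.0]

conditioned = condition_on_embedding(encoder_outputs, emotion_embedding)
print(len(conditioned), len(conditioned[0]))  # 3 4
```

In a real model the same idea would typically be applied to tensors, with the embedding (for example a speaker embedding from Resemblyzer) repeated along the time axis before concatenation.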

Repository contents:
data
logs/tensorboard
processing
tacotron2
.gitignore
README.md
config.py
dataset.py
emotion_embedding_network.py
report.pdf
speaker_embedding2.py
speaker_embeddings.py
style_loss.py
train_emotion_embedding_network.py
