Dimensional Speech Emotion Recognition by Using Acoustic Features and Word Embeddings using Multitask Learning
by Bagus Tris Atmaja, Masato Akagi
This paper has been published in APSIPA Transactions on Signal and Information Processing.
The majority of research in speech emotion recognition (SER) is conducted to recognize emotion categories. Recognizing dimensional emotion attributes is also important, however, and it has several advantages over categorical emotion. For this research, we investigate dimensional SER using both speech features and word embeddings. A concatenation network joins the acoustic and text networks built from these bimodal features. We demonstrate that these bimodal features, both extracted from speech, improve the performance of dimensional SER over unimodal SER using either acoustic features or word embeddings alone. The addition of word embeddings to the SER system contributes a significant improvement on the valence dimension, while the arousal and dominance dimensions are also improved. We propose a multitask learning (MTL) approach for the prediction of all emotional attributes. This MTL maximizes the concordance correlation between predicted emotion degrees and true emotion labels for all attributes simultaneously. The findings suggest that MTL with two parameters represents the interrelation of emotional attributes better than the other evaluated methods. In unimodal results, speech features attain higher performance on arousal and dominance, while word embeddings are better for predicting valence. The overall evaluation uses the concordance correlation coefficient (CCC) score of the three emotional attributes. We also discuss some differences between categorical and dimensional emotion results from psychological and engineering perspectives.
The algorithm proposed in the paper was implemented using NumPy, Keras (v2.3), and TensorFlow (v1.15).
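To give a sense of how a CCC-based multitask objective can be expressed in this stack, here is a minimal sketch (not the repository code): the CCC follows its standard definition, while the toy three-head network, the 100-dimensional input, the output names 'val'/'aro'/'dom', and the placeholder values of the two weighting parameters alpha and beta are assumptions for illustration only and may differ from the weighting scheme used in the paper.

```python
# Minimal sketch of a CCC-based multitask loss in Keras (illustrative only).
from tensorflow.keras import layers, Model, backend as K

def ccc(y_true, y_pred):
    """Concordance correlation coefficient computed over a batch."""
    mean_true, mean_pred = K.mean(y_true), K.mean(y_pred)
    var_true, var_pred = K.var(y_true), K.var(y_pred)
    cov = K.mean((y_true - mean_true) * (y_pred - mean_pred))
    return (2.0 * cov) / (var_true + var_pred
                          + K.square(mean_true - mean_pred) + K.epsilon())

def ccc_loss(y_true, y_pred):
    """Loss that is minimized when the CCC is maximized."""
    return 1.0 - ccc(y_true, y_pred)

# Hypothetical toy network: one shared layer and three attribute heads.
inputs = layers.Input(shape=(100,))
hidden = layers.Dense(64, activation='relu')(inputs)
outputs = [layers.Dense(1, name=name)(hidden) for name in ('val', 'aro', 'dom')]
model = Model(inputs, outputs)

# Two MTL parameters weight the attribute losses; the values are placeholders.
alpha, beta = 0.1, 0.5
model.compile(optimizer='adam',
              loss={'val': ccc_loss, 'aro': ccc_loss, 'dom': ccc_loss},
              loss_weights={'val': alpha, 'aro': beta, 'dom': 1.0 - alpha - beta})
```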
All source code used to generate the results and figures in the paper is in the code folder. The calculations and figure generation are all run inside Jupyter notebooks.
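To run them yourself, starting Jupyter from the repository root after installing the dependencies (see below) should be enough; pointing it at the code folder is an assumption about where the notebooks live:

```
jupyter notebook code/
```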
The data used in this study are provided in data, and the sources for the manuscript text and figures are in latex. Results generated by the code are saved in results. See the README.md files in each directory for a full description.
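For orientation, the repository layout described above is roughly the following (only the items named in this README are shown):

```
dimensional-ser/
├── code/      source code and Jupyter notebooks
├── data/      CSV data used for the plots
├── latex/     manuscript text and figures
├── results/   results generated by the code
└── README.md
```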
Figure: Architecture of the proposed dimensional SER with the main results.
You can download a copy of all the files in this repository by cloning the git repository:
git clone https://github.com/bagustris/dimensional-ser.git
A copy of the paper is also archived at https://doi.org/10.1017/ATSIP.2020.14
You'll need a working Python environment to run the code.
One convenient way to set up your environment is through the
Anaconda Python distribution, which
provides the conda
package manager.
Anaconda can be installed in your user directory and does not interfere with
the system Python installation.
The required dependencies are specified in the file requirements.txt
.
We use pip with Python
virtual environments (venv) to manage the project dependencies in
isolation.
Thus, you can install our dependencies without causing conflicts with your
setup (even with different Python versions).
Run the following commands in the repository folder (where requirements.txt
is located) to create a separate environment and install all required
dependencies in it:
python3.6 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
Since the dataset is not included, it is difficult to reproduce the full results. However, the plot in the paper can be reproduced from the CSV file in the data directory.
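As a rough illustration, a plot can be regenerated along the following lines, assuming pandas and matplotlib are available; the file name example.csv and the choice of the first column as the x-axis are hypothetical placeholders, so check the README.md in data for the actual file and column names.

```python
# Hedged sketch: regenerate a figure from a CSV stored in data/.
# "example.csv" is a hypothetical placeholder file name.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data/example.csv")
print(df.head())                                        # inspect available columns
ax = df.plot(x=df.columns[0], y=list(df.columns[1:]))   # plot the remaining columns
ax.figure.savefig("results/example.png", dpi=150)
plt.show()
```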
All source code is made available under a BSD 3-clause license. You can freely
use and modify the code, without warranty, so long as you provide attribution
to the authors. See LICENSE.md
for the full license text.
The manuscript text is not open source. The authors reserve the rights to the article content, which is published in APSIPA Transactions on Signal and Information Processing.
B. T. Atmaja and M. Akagi, “Dimensional speech emotion recognition from speech
features and word embeddings by using multitask learning,” APSIPA Transactions
on Signal and Information Processing, vol. 9, p. e17, 2020.
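For convenience, the same reference as a BibTeX entry (the citation key is arbitrary; the field values are taken from the citation and DOI above):

```bibtex
@article{atmaja2020dimensional,
  author  = {Atmaja, Bagus Tris and Akagi, Masato},
  title   = {Dimensional Speech Emotion Recognition from Speech Features and Word Embeddings by Using Multitask Learning},
  journal = {APSIPA Transactions on Signal and Information Processing},
  volume  = {9},
  pages   = {e17},
  year    = {2020},
  doi     = {10.1017/ATSIP.2020.14}
}
```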