AES E-Library

AES E-Library

Toward Language-Agnostic Speech Emotion Recognition

Cross-language speech emotion recognition is receiving increased attention due to its extensive real-world applicability. This work proposes a language-agnostic speech emotion recognition algorithm focusing on Italian and German languages. The mel-scaled and temporal modulation spectral representations are combined and then subsequently modeled by means of Gaussian mixture models. Emotion prediction is carried out via a Kullback Leibler divergence scheme. The proposed methodology is applied to two problem settings: one including positive vs. negative emotion classification and a second one where all Big Six emotional states are considered. A thorough experimental campaign demonstrated the efficacy of such a method, as well as its superiority over other generative modeling schemes and state-of-the-art approaches. The results demonstrate the feasibility of recognizing emotional states in a language-, gender- and speaker-independent setting.

JAES Volume 68 Issue 1/2 pp. 7-13; January 2020
Publication Date:

Click to purchase paper as a non-member or login as an AES member. If your company or school subscribes to the E-Library then switch to the institutional version. If you are not an AES member and would like to subscribe to the E-Library then Join the AES!

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.

Learn more about the AES E-Library

E-Library Location:


Start a discussion about this paper!

AES - Audio Engineering Society