Cross-language speech emotion recognition is receiving increased attention due to its extensive real-world applicability. This work proposes a language-agnostic speech emotion recognition algorithm focusing on the Italian and German languages. Mel-scaled and temporal modulation spectral representations are combined and then modeled by means of Gaussian mixture models. Emotion prediction is carried out via a Kullback-Leibler divergence scheme. The proposed methodology is applied to two problem settings: one involving positive vs. negative emotion classification, and a second in which all Big Six emotional states are considered. A thorough experimental campaign demonstrated the efficacy of the method, as well as its superiority over other generative modeling schemes and state-of-the-art approaches. The results demonstrate the feasibility of recognizing emotional states in a language-, gender- and speaker-independent setting.
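To make the prediction step concrete, the following is a minimal sketch of classification by Kullback-Leibler divergence between Gaussian mixture models. It is not the authors' implementation. The abstract does not say how the divergence is computed, so the Monte Carlo estimator, the diagonal covariances, the minimum-divergence decision rule, and all function names and toy parameters here are assumptions made for illustration. Feature extraction (the mel-scaled and temporal modulation representations) and model training are omitted; the mixtures are given directly as parameters.

```python
import math
import random

# Assumption: a diagonal-covariance GMM is represented as a list of
# (weight, means, variances) components. This layout is hypothetical.

def log_density(gmm, x):
    """Log-density of point x under the mixture, computed with log-sum-exp."""
    logs = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            ll += -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        logs.append(ll)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def sample(gmm, rng):
    """Draw one point: pick a component by weight, then sample each dimension."""
    weights = [component[0] for component in gmm]
    _, means, variances = rng.choices(gmm, weights=weights, k=1)[0]
    return [rng.gauss(mu, math.sqrt(var)) for mu, var in zip(means, variances)]

def kl_divergence(p, q, n_samples=2000, seed=0):
    """Monte Carlo estimate of KL(p || q) = E_p[log p(x) - log q(x)].

    There is no closed form for the KL divergence between two GMMs,
    so the expectation is approximated by sampling from p.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = sample(p, rng)
        total += log_density(p, x) - log_density(q, x)
    return total / n_samples

def classify(utterance_gmm, class_gmms):
    """Return the label whose class model is closest to the utterance model."""
    return min(class_gmms,
               key=lambda label: kl_divergence(utterance_gmm, class_gmms[label]))

# Toy two-dimensional models with made-up parameters (not from the paper).
class_gmms = {
    "positive": [(0.5, [2.0, 2.0], [1.0, 1.0]), (0.5, [3.0, 1.0], [1.0, 1.0])],
    "negative": [(1.0, [-2.0, -2.0], [1.0, 1.0])],
}
utterance_gmm = [(1.0, [2.5, 1.5], [0.5, 0.5])]

predicted = classify(utterance_gmm, class_gmms)
print(predicted)
```

In this toy setup the utterance model lies between the two components of the "positive" class model, so its estimated divergence to that class is far smaller than to the "negative" class. The same decision rule extends to the six-class setting by adding one mixture per emotional state to `class_gmms`.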