Vocal Affects Perceived from Spontaneous and Posed Speech

Oh, Eunmi; Suhr, Jinsun

AES E-Library

This study examines listeners’ natural ability to identify an anonymous speaker’s emotions from speech samples covering a broad range of emotional intensity. It compares emotional ratings of posed and spontaneous speech samples and analyzes how basic acoustic parameters are used. The spontaneous samples were extracted from the Korean Spontaneous Speech corpus, which consists of casual conversations. The posed samples, each expressing one of four emotions (happiness, neutrality, anger, sadness), were obtained from the Emotion Classification dataset. Non-native listeners rated the speech samples on seven pairs of opposing affective attributes. Listeners perceived fewer spontaneous samples as having negative valence. The posed samples had higher mean rating scores than the spontaneous samples only for negative valence. Listeners reacted more sensitively to posed than to spontaneous speech in negative valence and had difficulty detecting happiness in the posed samples. The spontaneous samples perceived as positive had higher pitch variance and higher maximum pitch than those perceived as negative. In contrast, for the posed samples, being perceived as negative was positively correlated with the pitch parameters. These results can be used to assign specific vocal affects to artificial intelligence voice agents or virtual humans, making their voices more human-like.
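The abstract names pitch variance and maximum pitch as the acoustic parameters that separated positive from negative ratings. As a minimal sketch only, the snippet below shows one way such summary values could be computed from a fundamental-frequency (F0) contour. The abstract does not say how pitch was extracted or summarized, so the function name `pitch_summary`, the convention that unvoiced frames are marked as 0 or None, and the use of population variance are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def pitch_summary(f0_hz):
    """Summarize an F0 contour given as one value in Hz per frame.

    Assumption: unvoiced frames are marked as 0 or None and are ignored.
    """
    voiced = [f for f in f0_hz if f]  # drop unvoiced frames (0 or None)
    if not voiced:
        raise ValueError("contour contains no voiced frames")
    return {
        "mean_hz": mean(voiced),
        # Population variance is an illustrative choice, not the paper's.
        "variance_hz2": pvariance(voiced),
        "max_hz": max(voiced),
    }


# Hypothetical contour: three voiced frames surrounded by unvoiced ones.
contour = [0.0, 200.0, 220.0, 240.0, 0.0]
summary = pitch_summary(contour)
```

In practice the contour would come from a pitch tracker applied to each speech sample, and the resulting values could then be compared across samples grouped by perceived valence.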

Open Access

Authors: Oh, Eunmi; Suhr, Jinsun
Affiliations: Yonsei University; Yonsei University (See document for exact affiliation information.)
AES Convention: 155 (October 2023)
Paper Number: 10671
Publication Date: October 25, 2023
Subject: Signal Processing
Permalink: https://www.aes.org/e-lib/browse.cfm?elib=22252
