Human and Machine Performance in Counting Sound Classes in Single-Channel Soundscapes

Abeßer, Jakob; Ullah, Asad; Ziegler, Sebastian; Grollmisch, Sascha

AES E-Library

Individual sounds are difficult to detect in complex soundscapes because of a strong overlap. This article explores the task of estimating sound polyphony, which is defined here as the number of audible sound classes. Sound polyphony measures the complexity of a soundscape and can be used to inform sound classification algorithms. First, a listening test is performed to assess the difficulty of the task. The results show that humans are only able to reliably count up to three simultaneous sound sources and that they underestimate the degree of polyphony for more complex soundscapes. Human performance depends mainly on the spectral characteristics of the sounds and, in particular, on the number of overlapping noise-like and transient sounds. In a second step, four deep neural network architectures, including an object detection approach for natural images, are compared to contrast human performance with machine learning-based approaches. The results show that machine listening systems can outperform human listeners for the task at hand. Based on these results, an implicit modeling of the sound polyphony based on the number of previously detected sound classes seems less promising than the explicit modeling strategy.

Open Access

Authors: Abeßer, Jakob; Ullah, Asad; Ziegler, Sebastian; Grollmisch, Sascha
Affiliation (all authors): Semantic Music Technologies, Fraunhofer Institute for Digital Media Technology (IDMT), Ilmenau, Germany (See document for exact affiliation information.)
JAES Volume 71 Issue 12 pp. 860-872; December 2023
Publication Date: December 12, 2023
Permalink: https://www.aes.org/e-lib/browse.cfm?elib=22348

This paper is Open Access, which means you can download it for free.

E-Library Location: (CD JAES71) /jaes71/12/pg860.pdf

DOI: https://doi.org/10.17743/jaes.2022.0106
