AES E-Library

Human and Machine Performance in Counting Sound Classes in Single-Channel Soundscapes

Individual sounds are difficult to detect in complex soundscapes because they overlap strongly. This article explores the task of estimating sound polyphony, which is defined here as the number of audible sound classes. Sound polyphony measures the complexity of a soundscape and can be used to inform sound classification algorithms. First, a listening test is performed to assess the difficulty of the task. The results show that humans are only able to reliably count up to three simultaneous sound sources and that they underestimate the degree of polyphony for more complex soundscapes. Human performance depends mainly on the spectral characteristics of the sounds and, in particular, on the number of overlapping noise-like and transient sounds. In a second step, four deep neural network architectures, including an object detection approach originally developed for natural images, are compared to contrast human performance with machine learning-based approaches. The results show that machine listening systems can outperform human listeners for the task at hand. These results also suggest that modeling sound polyphony implicitly, from the number of previously detected sound classes, is less promising than modeling it explicitly.
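
The difference between the implicit and explicit modeling strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, class labels, probabilities, and the 0.5 threshold are all hypothetical and chosen only for illustration. The implicit variant derives polyphony by counting the classes a classifier has detected, whereas the explicit variant reads the count directly from a model that predicts a distribution over counts.

```python
def implicit_polyphony(class_probs, threshold=0.5):
    """Implicit estimate: count classes whose detection probability
    reaches the (assumed) decision threshold."""
    return sum(1 for p in class_probs.values() if p >= threshold)


def explicit_polyphony(count_probs):
    """Explicit estimate: pick the most probable count from a
    distribution over the counts 0..N predicted directly by a model."""
    return max(range(len(count_probs)), key=lambda k: count_probs[k])


# Hypothetical per-class detection probabilities for one clip.
class_probs = {"speech": 0.91, "dog_bark": 0.62, "rain": 0.48, "siren": 0.07}

# Hypothetical output of a model trained to predict the count itself;
# index k holds the probability that k sound classes are audible.
count_probs = [0.02, 0.10, 0.25, 0.55, 0.08]

print(implicit_polyphony(class_probs))
print(explicit_polyphony(count_probs))
```

In this made-up example the implicit estimate misses the quiet "rain" class because its probability falls just below the threshold, which shows how detection errors propagate into an implicitly derived count.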

Open Access

Authors:
Affiliations:
JAES Volume 71 Issue 12 pp. 860-872; December 2023
Publication Date:
Permalink: https://www.aes.org/e-lib/browse.cfm?elib=22348



E-Library Location:

DOI:


AES - Audio Engineering Society