AES E-Library

AES E-Library

Crowdsourcing Audio Semantics by Means of Hybrid Bimodal Segmentation with Hierarchical Classification

Document Thumbnail

The task of general audio detection and segmentation is quite common in contemporary audio applications where computationally intensive processes are frequently involved. Machine learning is usually employed along with user-enabled data labeling that is intended to detect, segment, and semantically annotate the relevant audio events. This work focuses on a generic audio detection and classification method that combines hierarchical bimodal segmentation with hybrid pattern classification at different temporal resolutions. This paper presents the algorithmic perspective of a mobile back-end system to facilitate the construction, validation, and continuous update of generic audio ground-truth data. The goal is the implementation of a system that is capable of performing well in different conditions without relying on complicated pattern recognition systems and taxonomies. For this reason, minimal prior knowledge is necessary so that there is consistent behavior for different input signals and computational environments. Novel “classification confidence” metrics are implemented.

JAES Volume 64 Issue 12 pp. 1042-1054; December 2016
Publication Date:

Click to purchase paper as a non-member or login as an AES member. If your company or school subscribes to the E-Library then switch to the institutional version. If you are not an AES member and would like to subscribe to the E-Library then Join the AES!

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.

Learn more about the AES E-Library

E-Library Location:


Start a discussion about this paper!

AES - Audio Engineering Society