The task of general audio detection and segmentation is common in contemporary audio applications, where computationally intensive processes are frequently involved. Machine learning is usually employed, together with user-enabled data labeling, to detect, segment, and semantically annotate the relevant audio events. This work focuses on a generic audio detection and classification method that combines hierarchical bimodal segmentation with hybrid pattern classification at different temporal resolutions. The paper presents the algorithmic perspective of a mobile back-end system that facilitates the construction, validation, and continuous updating of generic audio ground-truth data. The goal is a system that performs well under different conditions without relying on complicated pattern-recognition systems and taxonomies. For this reason, minimal prior knowledge is required, so that behavior remains consistent across different input signals and computational environments. Novel "classification confidence" metrics are also introduced.
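The abstract does not define the form of its "classification confidence" metrics, so the following is only a minimal sketch of one common formulation: scoring a frame's class posteriors by the margin between the two most likely classes, so that low-margin (ambiguous) frames can be flagged for user labeling. The function name and the margin formula are assumptions for illustration, not the paper's method.

```python
def classification_confidence(posteriors):
    """Margin between the top two class posteriors.

    Returns a value in [0, 1]: near 0 means the classifier is torn
    between two classes; near 1 means one class clearly dominates.
    (Hypothetical formulation; the paper's actual metric is unspecified.)
    """
    if len(posteriors) < 2:
        return 1.0  # a single candidate class is trivially unambiguous
    top_two = sorted(posteriors, reverse=True)[:2]
    return top_two[0] - top_two[1]

# Example: a confident vs. an ambiguous frame-level prediction
confident = classification_confidence([0.90, 0.05, 0.05])  # large margin
ambiguous = classification_confidence([0.40, 0.38, 0.22])  # small margin
```

A margin-based score like this would let a ground-truth pipeline route only low-confidence segments to the user for manual annotation, consistent with the user-enabled labeling the abstract describes.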