Experimenting with 1D CNN Architectures for Generic Audio Classification

Vrysis, Lazaros; Thoidis, Iordanis; Dimoulas, Charalampos; Papanikolaou, George

AES E-Library

In recent years, convolutional neural networks have become the standard in audio semantics, surpassing traditional classification approaches that employed hand-crafted feature engineering as the front-end and various classifiers as the back-end. Early studies were based on prominent 2D convolutional topologies for image recognition, adapting them to audio classification tasks. After the surge of deep learning in the past decade, truly end-to-end audio learning, employing algorithms that directly process waveforms, is set to become the standard. This paper attempts a comparison between deep neural setups on typical audio classification tasks, focusing on optimizing 1D convolutional neural networks that can be deployed on various audio information retrieval tasks, such as general audio detection and classification, environmental sound recognition, or speech emotion recognition.
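To illustrate what processing a waveform directly with 1D convolutions can look like, the following is a minimal sketch in plain Python. It is not the architecture evaluated in the paper: the function names, the toy sine-wave input, the two hand-set filters, and the stride value are all assumptions chosen for illustration, and in a real network the filter weights would be learned from data.

```python
import math

def conv1d(signal, kernel, stride=1):
    """Valid 1D cross-correlation of a waveform with a single filter."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]

def relu(xs):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in xs]

def global_max_pool(xs):
    """Collapse a feature map of any length to one value."""
    return max(xs)

def extract_features(waveform, kernels, stride=1):
    """One feature per filter: conv -> ReLU -> global max pool."""
    return [
        global_max_pool(relu(conv1d(waveform, kernel, stride)))
        for kernel in kernels
    ]

# Toy input (hypothetical): 64 samples of a sine wave, 8 samples per period.
waveform = [math.sin(2 * math.pi * n / 8) for n in range(64)]

# Hand-set filters standing in for learned weights (illustrative only).
kernels = [
    [1.0, -1.0],                # responds to sample-to-sample change
    [0.25, 0.25, 0.25, 0.25],   # local average, a crude low-pass filter
]

features = extract_features(waveform, kernels, stride=2)
```

Each filter slides along the time axis of the raw samples, so no spectrogram or hand-crafted front-end is needed; the resulting per-filter values would then feed a classifier.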

Authors: Vrysis, Lazaros; Thoidis, Iordanis; Dimoulas, Charalampos; Papanikolaou, George
Affiliation: Aristotle University of Thessaloniki
AES Convention: 148 (May 2020)
Paper Number: 10329
Publication Date: May 28, 2020
Subject: Posters: Signal Processing
Permalink: https://www.aes.org/e-lib/browse.cfm?elib=20746

E-Library Location: /conv/148/10329.pdf
