Comparison of Audio Spectral Features in a Convolutional Neural Network

Vines, Greg; Nemer, Elias

AES E-Library

Time-frequency transformations and spectral representations of audio signals are commonly used in machine learning applications. Typically, the Mel-spectrogram is used to create the network's input features, a choice justified by the Mel scale's basis in the human auditory system. In this paper, we compare the performance of several spectral features in a speech gender-detection model and show that the Mel-spectrogram is not always the best choice of input features.
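As a rough illustration of the two kinds of input feature the abstract contrasts, the sketch below computes a linear-frequency power spectrogram and a Mel-spectrogram from the same signal using only NumPy. It is not the authors' pipeline: the HTK-style Mel formula, the function names (`hz_to_mel`, `mel_to_hz`, `mel_filterbank`, `power_spectrogram`), the frame parameters (512-point FFT, hop of 256, 40 Mel bands, 16 kHz sample rate), and the synthetic test tone are all assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel formula; one common convention, assumed here.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        # Triangle = lower of the two slopes, floored at zero.
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def power_spectrogram(x, n_fft=512, hop=256):
    """Hann-windowed STFT power, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

# Synthetic 1-second, 220 Hz tone standing in for a speech recording.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220.0 * t)

linear_spec = power_spectrogram(x)                    # linear-frequency feature
mel_spec = mel_filterbank(40, 512, sr) @ linear_spec  # Mel-spectrogram feature
log_mel = np.log(mel_spec + 1e-10)                    # log compression, as often fed to a CNN
```

Either `linear_spec` or `log_mel` could serve as the two-dimensional input to a convolutional network. The Mel version trades frequency resolution at high frequencies for a smaller feature map, which is the kind of design choice the paper evaluates.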

Open Access

Authors: Vines, Greg; Nemer, Elias
Affiliations: San Diego, CA, USA; San Diego, CA, USA (see document for exact affiliation information)
AES Convention: 153 (October 2022)
Paper Number: 10634
Publication Date: October 19, 2022
Subject: Applications in Audio
Permalink: https://www.aes.org/e-lib/browse.cfm?elib=21963

This paper is Open Access, which means you can download it for free.

E-Library Location: /conv/153/10634.pdf
