Time-frequency transformations and spectral representations of audio signals are commonly used in machine learning applications. The Mel-Spectrogram is typically used to create a network's input features, a choice justified by the Mel scale's basis in the human auditory system. In this paper, we compare several spectral features as inputs to a speech-based gender detection model, evaluating their performance and showing that the Mel-Spectrogram is not always the best choice of input features.
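The abstract does not say how the Mel-Spectrogram is computed. As a minimal sketch, assuming the HTK-style Mel formula m = 2595 log10(1 + f/700) and triangular filters (common conventions, not confirmed as the authors' choice), the Python below shows how a linear-frequency power spectrum of one frame is pooled into log Mel-band energies. The helper names (`hz_to_mel`, `mel_to_hz`, `power_spectrum`, `mel_filterbank`, `log_mel_frame`) are hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import math

def hz_to_mel(f_hz):
    # HTK-style Mel formula (assumed; other Mel conventions exist)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    # Hann-windowed naive DFT of one frame; returns |X[k]|^2 for k = 0..N/2
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append(re * re + im * im)
    return spec

def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    # Triangular filters whose edges are equally spaced on the Mel scale
    f_max = f_max if f_max is not None else sample_rate / 2.0
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    mel_points = [m_min + (m_max - m_min) * i / (n_mels + 1) for i in range(n_mels + 2)]
    hz_points = [mel_to_hz(m) for m in mel_points]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_mels + 1):
        left, centre, right = hz_points[j - 1], hz_points[j], hz_points[j + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= centre:
                row.append((f - left) / (centre - left))    # rising edge
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel_frame(frame, sample_rate, n_mels=8):
    # Pool the linear power spectrum into Mel bands, then take the log
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_mels, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
    # Small floor avoids log(0) for bands with no energy
    return [math.log(e + 1e-10) for e in energies]

sr = 8000
tone = [math.sin(2 * math.pi * 500 * t / sr) for t in range(256)]  # 500 Hz test tone
features = log_mel_frame(tone, sr, n_mels=8)
```

Feeding `power_spectrum(frame)` directly to a model instead of `log_mel_frame(frame, sr)` is one way to obtain a linear-frequency feature for comparison; the Mel step re-weights the spectrum toward finer resolution at low frequencies, and whether that helps is the empirical question the paper examines.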
https://www.aes.org/e-lib/browse.cfm?elib=21963
This paper is Open Access, which means you can download it for free.