Analysis of 2D Feature Spaces for Deep Learning-Based Speech Recognition

Korvel, Gražina; Treigys, Povilas; Tamulevicus, Gintautas; Bernataviciene, Jolita; Kostek, Bozena

AES E-Library


The aim of this study was to evaluate the suitability of 2D audio signal feature maps for speech recognition based on deep learning. The proposed methodology employs a convolutional neural network (CNN), a class of deep, feed-forward artificial neural networks. The authors analyzed four types of audio signal feature maps: spectrograms, linear and Mel-scale cepstrograms, and chromagrams. These representations were chosen because CNNs perform well on 2D data. The feature maps were employed in a Lithuanian word-recognition task. Spectral analysis led to the highest word recognition rate, and the spectral and Mel-scale cepstral feature spaces outperformed linear cepstra and chroma. On the test data set, the 111-word classification experiment yielded an F1 score of 0.99 for the spectrum, 0.91 for the Mel-scale cepstrum, 0.76 for the chromagram, and 0.64 for the linear cepstrum feature space.
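As a rough illustration of what two of these 2D feature maps look like in practice, the sketch below computes a log-magnitude spectrogram and a linear-frequency cepstrogram with NumPy. It is not the authors' pipeline: the frame length, hop size, Hann window, number of cepstral coefficients, sampling rate, and the synthetic test tone are all assumptions chosen for the example, and the function names are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Split a 1D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def spectrogram(x, frame_len=512, hop=256):
    """Log-magnitude spectrogram, shape (n_frames, frame_len // 2 + 1)."""
    # Assumed windowing: Hann window applied to every frame.
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0) for empty bins.
    return np.log(mag + 1e-10)

def cepstrogram(x, frame_len=512, hop=256, n_coeffs=40):
    """Linear-frequency real cepstrum per frame, shape (n_frames, n_coeffs)."""
    log_spec = spectrogram(x, frame_len, hop)
    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    ceps = np.fft.irfft(log_spec, n=frame_len, axis=1)
    return ceps[:, :n_coeffs]

# Synthetic stand-in for a speech recording (assumed 16 kHz, 1 second).
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

S = spectrogram(signal)   # 2D map: time frames x frequency bins
C = cepstrogram(signal)   # 2D map: time frames x quefrency bins
print(S.shape, C.shape)
```

Each resulting array is a time-by-feature matrix that could be treated as a single-channel image and passed to a CNN. Mel-scale cepstrograms and chromagrams would additionally need a Mel filter bank or a pitch-class mapping, which this sketch leaves out.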

Authors: Korvel, Gražina; Treigys, Povilas; Tamulevicus, Gintautas; Bernataviciene, Jolita; Kostek, Bozena
Affiliations: Institute of Data Science and Digital Technologies, Vilnius University, Vilnius, Lithuania; Audio Acoustics Laboratory, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Gdansk, Poland (See document for exact affiliation information.)
JAES Volume 66 Issue 12 pp. 1072-1081; December 2018
Publication Date: December 20, 2018
Permalink: https://www.aes.org/e-lib/browse.cfm?elib=19880


E-Library Location: (CD JAES66) /jaes66/12/pg1072.pdf

DOI: https://doi.org/10.17743/jaes.2018.0066
