Analysis of 2D Feature Spaces for Deep Learning-Based Speech Recognition
G. Korvel, P. Treigys, G. Tamulevicus, J. Bernataviciene, and B. Kostek, "Analysis of 2D Feature Spaces for Deep Learning-Based Speech Recognition," J. Audio Eng. Soc., vol. 66, no. 12, pp. 1072-1081, Dec. 2018. doi: https://doi.org/10.17743/jaes.2018.0066
Abstract: The aim of this study was to evaluate the suitability of 2D audio signal feature maps for speech recognition based on deep learning. The proposed methodology employs a convolutional neural network (CNN), a class of deep, feed-forward artificial neural networks. The authors analyzed four audio signal feature maps: spectrograms, linear and mel-scale cepstrograms, and chromagrams. These were chosen because CNNs perform well on 2D data. The feature maps were employed in a Lithuanian word-recognition task. Spectral analysis led to the highest word recognition rate, and the spectral and mel-scale cepstral feature spaces outperformed linear cepstra and chroma. On the test data set, the 111-word classification experiment yielded an F1 score of 0.99 for the spectrum, 0.91 for the mel-scale cepstrum, 0.76 for the chromagram, and 0.64 for the cepstrum feature space.
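To make two of the feature maps named in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of how a magnitude spectrogram and a linear-frequency cepstrogram might be computed as 2D arrays with NumPy. The frame length, hop size, Hann window, and synthetic test tone are all assumptions chosen for illustration; the abstract does not state the analysis parameters. Mel-scale cepstrograms and chromagrams additionally require a filter bank or pitch-class mapping and are not shown.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def spectrogram(x, frame_len=512, hop=128):
    """Magnitude spectrogram with shape (frequency bins, frames)."""
    # Hann window is an assumed choice, not taken from the paper.
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1)).T

def cepstrogram(x, frame_len=512, hop=128, eps=1e-10):
    """Linear-frequency real cepstrogram with shape (quefrency bins, frames)."""
    # Real cepstrum: inverse FFT of the log magnitude spectrum of each frame.
    log_mag = np.log(spectrogram(x, frame_len, hop).T + eps)
    return np.fft.irfft(log_mag, n=frame_len, axis=1).T

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)

spec = spectrogram(signal)
ceps = cepstrogram(signal)
print(spec.shape, ceps.shape)
```

Each function returns a 2D array with time along one axis, which is the kind of image-like input a CNN can consume.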
@article{korvel2018analysis,
author={Korvel, Gražina and Treigys, Povilas and Tamulevicus, Gintautas and Bernataviciene, Jolita and Kostek, Bozena},
journal={Journal of the Audio Engineering Society},
title={Analysis of 2D Feature Spaces for Deep Learning-Based Speech Recognition},
year={2018},
volume={66},
number={12},
pages={1072-1081},
doi={https://doi.org/10.17743/jaes.2018.0066},
month={december},
}
TY - report
TI - Analysis of 2D Feature Spaces for Deep Learning-Based Speech Recognition
SP - 1072
EP - 1081
AU - Korvel, Gražina
AU - Treigys, Povilas
AU - Tamulevicus, Gintautas
AU - Bernataviciene, Jolita
AU - Kostek, Bozena
PY - 2018
JO - Journal of the Audio Engineering Society
IS - 12
VO - 66
VL - 66
Y1 - December 2018
Authors:
Korvel, Gražina; Treigys, Povilas; Tamulevicus, Gintautas; Bernataviciene, Jolita; Kostek, Bozena
Affiliations:
Institute of Data Science and Digital Technologies, Vilnius University, Vilnius, Lithuania; Audio Acoustics Laboratory, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Gdansk, Poland (See document for exact affiliation information.)
JAES Volume 66 Issue 12 pp. 1072-1081; December 2018
Publication Date:
December 20, 2018
Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19880