Noise Robustness Automatic Speech Recognition with Convolutional Neural Network and Time Delay Neural Network
J. Wang, D. Wang, Y. Chen, X. Lu, and C. Zheng, "Noise Robustness Automatic Speech Recognition with Convolutional Neural Network and Time Delay Neural Network," Paper 10272, (October 2019).
Abstract: To improve the performance of automatic speech recognition in noisy environments, a convolutional neural network (CNN) is combined with a time-delay neural network (TDNN); the combined model is referred to as CNN-TDNN. The CNN-TDNN model is further optimized by factoring the parameter matrix in the time-delay neural network hidden layers and adding a time-restricted self-attention layer after the CNN-TDNN hidden layers. Experimental results show that the optimized CNN-TDNN model performs better than DNN, CNN, TDNN, and CNN-TDNN. The average word error rate (WER) can be reduced by 11.76% compared with the baselines.
@article{wang2019noise,
author={Wang, Jie and Wang, Dunze and Chen, Yunda and Lu, Xun and Zheng, Chengshi},
journal={Journal of the Audio Engineering Society},
title={Noise Robustness Automatic Speech Recognition with Convolutional Neural Network and Time Delay Neural Network},
year={2019},
volume={},
number={},
pages={},
doi={},
month={October},
abstract={To improve the performance of automatic speech recognition in noisy environments, a convolutional neural network (CNN) is combined with a time-delay neural network (TDNN); the combined model is referred to as CNN-TDNN. The CNN-TDNN model is further optimized by factoring the parameter matrix in the time-delay neural network hidden layers and adding a time-restricted self-attention layer after the CNN-TDNN hidden layers. Experimental results show that the optimized CNN-TDNN model performs better than DNN, CNN, TDNN, and CNN-TDNN. The average word error rate (WER) can be reduced by 11.76% compared with the baselines.},}
TY - paper
TI - Noise Robustness Automatic Speech Recognition with Convolutional Neural Network and Time Delay Neural Network
SP -
EP -
AU - Wang, Jie
AU - Wang, Dunze
AU - Chen, Yunda
AU - Lu, Xun
AU - Zheng, Chengshi
PY - 2019
JO - Journal of the Audio Engineering Society
IS -
VO -
VL -
Y1 - October 2019
AB - To improve the performance of automatic speech recognition in noisy environments, a convolutional neural network (CNN) is combined with a time-delay neural network (TDNN); the combined model is referred to as CNN-TDNN. The CNN-TDNN model is further optimized by factoring the parameter matrix in the time-delay neural network hidden layers and adding a time-restricted self-attention layer after the CNN-TDNN hidden layers. Experimental results show that the optimized CNN-TDNN model performs better than DNN, CNN, TDNN, and CNN-TDNN. The average word error rate (WER) can be reduced by 11.76% compared with the baselines.
Authors:
Wang, Jie; Wang, Dunze; Chen, Yunda; Lu, Xun; Zheng, Chengshi
Affiliations:
Guangzhou University, Guangzhou, China; Power Grid Planning Center, Guangdong Power Grid Company, Guangdong, China; Institute of Acoustics, Chinese Academy of Sciences, Beijing, China (See document for exact affiliation information.)
AES Convention:
147 (October 2019)
Paper Number:
10272
Publication Date:
October 8, 2019
Subject:
Posters: Applications in Audio
Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=20645