AES E-Library

Noise Robustness Automatic Speech Recognition with Convolutional Neural Network and Time Delay Neural Network

To improve automatic speech recognition in noisy environments, a convolutional neural network (CNN) is combined with a time-delay neural network (TDNN); the combined model is referred to as CNN-TDNN. The CNN-TDNN model is further optimized by factoring the parameter matrix in the TDNN hidden layers and adding a time-restricted self-attention layer after the CNN-TDNN hidden layers. Experimental results show that the optimized CNN-TDNN model outperforms the DNN, CNN, TDNN, and unoptimized CNN-TDNN models. The average word error rate (WER) is reduced by 11.76% compared with the baselines.
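
The two optimizations named in the abstract can be illustrated with a minimal sketch. The abstract gives no layer sizes, ranks, or attention details, so everything below is an assumption for illustration only: the function names are hypothetical, the factorization is shown simply as a parameter count for a rank-limited product of two matrices, and the attention uses the input frames directly as queries, keys, and values with no learned projections.

```python
import math


def factored_param_count(d_in, d_out, rank):
    """Parameter count when a d_out x d_in weight matrix is replaced by
    the product of a d_out x rank matrix and a rank x d_in matrix."""
    return d_out * rank + rank * d_in


def time_restricted_attention(frames, left, right):
    """Single-head dot-product self-attention in which frame t attends
    only to frames t-left .. t+right (clipped at the sequence edges).

    Simplification (assumption): queries, keys, and values are the raw
    input frames; a trained layer would apply learned projections.
    """
    n = len(frames)
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for t in range(n):
        lo, hi = max(0, t - left), min(n - 1, t + right)
        window = range(lo, hi + 1)
        # Scaled dot-product scores against frames inside the window only.
        scores = [scale * sum(q * k for q, k in zip(frames[t], frames[s]))
                  for s in window]
        # Numerically stable softmax over the window.
        peak = max(scores)
        exps = [math.exp(x - peak) for x in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the windowed frames.
        outputs.append([sum(w * frames[s][d] for w, s in zip(weights, window))
                        for d in range(dim)])
    return outputs


if __name__ == "__main__":
    # Hypothetical sizes: a 1024 x 1024 layer factored through rank 128.
    print(1024 * 1024, factored_param_count(1024, 1024, 128))
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(time_restricted_attention(frames, left=1, right=1))
```

Restricting the window keeps the cost of attention proportional to the window width rather than to the full utterance length, which is the usual motivation for a time-restricted variant; whether the paper uses this exact formulation is not stated in the abstract.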



AES - Audio Engineering Society