Research on Additive Margin Softmax Speaker Recognition Based on Convolutional and Gated Recurrent Neural Networks
Citation & Abstract
C. Lan, Y. Wang, L. Zhang, and H. Zhao, "Research on Additive Margin Softmax Speaker Recognition Based on Convolutional and Gated Recurrent Neural Networks," J. Audio Eng. Soc., vol. 70, no. 7/8, pp. 611-620 (July 2022).
Abstract: To improve the recognition rate of the speaker recognition system, a model scheme combined with the Additive Margin--Softmax loss function is proposed from the perspective of model differentiation and based on the fusion of Convolutional Neural Network and Gated Recurrent Unit, which not only reduces the distance of similar sample features and increases the distance among different types of sample features simultaneously but also uses layer normalization to constrain the distribution of high-dimensional features. In order to address the problem of poor robustness of the speaker recognition system in real scenes, the SpecAugment data enhancement method is proposed to train the speaker model to combat external environmental interference. Based on the experimental data, the speech recognition performance of the proposed and traditional methods is analyzed. The experimental results show that, compared with other models, the equal error rate based on the Additive Margin--Convolutional Neural Network--Gated Recurrent Unit method is 4.48%, and the recognition rate is 99.18%. Adding layer normalization to the training model can improve the training speed to a certain extent, and the speaker model has better robustness.
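The abstract credits the Additive Margin Softmax loss with pulling same-speaker features together and pushing different-speaker features apart. As a rough illustration of that mechanism only, the sketch below computes the commonly used form of the loss for a single sample: cosine similarities between the embedding and each class weight vector, a margin subtracted from the true-class similarity, and a scaled softmax cross-entropy on the result. The function name, the `scale` and `margin` defaults, and the toy vectors are assumptions chosen for illustration; the abstract does not state the values or the implementation used in the paper.

```python
import math


def am_softmax_loss(embedding, class_weights, target, scale=30.0, margin=0.35):
    """Additive Margin Softmax loss for one sample (illustrative sketch).

    embedding     -- feature vector for one utterance (list of floats)
    class_weights -- one weight vector per speaker class
    target        -- index of the true speaker class
    scale, margin -- hypothetical hyperparameters, not taken from the paper
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    logits = []
    for j, weight in enumerate(class_weights):
        cos_theta = cosine(embedding, weight)
        if j == target:
            # The margin is applied to the true class only, so the model
            # must score it higher to reach the same loss value.
            cos_theta -= margin
        logits.append(scale * cos_theta)

    # Cross-entropy via a numerically stable log-sum-exp.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[target]


# Toy example: two speaker classes, one embedding close to class 0.
weights = [[1.0, 0.0], [0.0, 1.0]]
sample = [0.9, 0.1]
loss_with_margin = am_softmax_loss(sample, weights, target=0)
loss_without_margin = am_softmax_loss(sample, weights, target=0, margin=0.0)
```

With the margin in place the loss for the same embedding is strictly larger than without it, which is what forces training toward tighter same-speaker clusters.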
@article{lan2022research,
author={Lan, Chaofeng and Wang, Yuqiao and Zhang, Lei and Zhao, Hongyun},
journal={Journal of the Audio Engineering Society},
title={Research on Additive Margin Softmax Speaker Recognition Based on Convolutional and Gated Recurrent Neural Networks},
year={2022},
volume={70},
number={7/8},
pages={611--620},
doi={},
month={July},
}
TY  - JOUR
TI  - Research on Additive Margin Softmax Speaker Recognition Based on Convolutional and Gated Recurrent Neural Networks
SP  - 611
EP  - 620
AU  - Lan, Chaofeng
AU  - Wang, Yuqiao
AU  - Zhang, Lei
AU  - Zhao, Hongyun
PY  - 2022
JO  - Journal of the Audio Engineering Society
IS  - 7/8
VL  - 70
Y1  - July 2022
ER  - 
Authors:
Lan, Chaofeng; Wang, Yuqiao; Zhang, Lei; Zhao, Hongyun
Affiliations:
College of Measurement and Communication Engineering, Harbin University of Science and Technology, Harbin, China; College of Measurement and Communication Engineering, Harbin University of Science and Technology, Harbin, China; Beidahuang Industry Group General Hospital, Harbin, China; College of Measurement and Communication Engineering, Harbin University of Science and Technology, Harbin, China (See document for exact affiliation information.)
JAES Volume 70 Issue 7/8 pp. 611-620; July 2022
Publication Date:
July 19, 2022
Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=21827