Robust Speaker Recognition in Noisy Conditions by Means of Online Training with Noise Profiles
A. H. Y. Al-Noori and P. Duncan, "Robust Speaker Recognition in Noisy Conditions by Means of Online Training with Noise Profiles," J. Audio Eng. Soc., vol. 67, no. 4, pp. 174-189, Apr. 2019. doi: https://doi.org/10.17743/jaes.2019.0004
Abstract: Automated speaker recognition attains impressive reliability when tested under controlled laboratory acoustic conditions. However, the environmental noise that inevitably exists in many real-world speech samples causes considerable degradation of recognition accuracy due to the so-called “channel mismatch” that occurs between the enrollment and recognition phases. A new online training method is proposed to improve robustness of speaker recognition in noisy conditions. An estimate of the signal-to-noise ratio and an emulated ambient noise spectral profile found in the silence intervals of the speech signal are used to re-enroll the reference model for a claimed speaker to generate a new noisy reference model. Based on a large number of tests using two datasets for speech samples contaminated with cafeteria babble and street noise, the proposed method shows promising improvement. When the signal-to-noise ratio is higher than 20 dB, typical speaker recognition algorithms normally function well, and the use of the proposed online training does not offer any benefit. When the signal-to-noise ratio is below 15 dB, the proposed method improves robustness of recognition. However, the new method shows limitations with speech samples that have been contaminated with interior train noise. Train noise contains slowly time-varying components that require prolonged observation to create a reliable estimate.
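The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: take the quietest frames of a recording as stand-ins for the silence intervals, derive a noise spectral profile and an SNR estimate from them, and contaminate clean enrollment speech with noise at a chosen SNR. The energy-based silence selection, the frame length, the 20% silence fraction, and all function names are assumptions made for illustration; none of them come from the paper.

```python
import numpy as np


def split_frames(x, frame_len):
    """Cut a 1-D signal into non-overlapping frames, dropping the tail."""
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)


def silence_frames(x, frame_len=400, silence_fraction=0.2):
    """Return the lowest-energy frames, assumed here to be noise-only.

    This is a crude stand-in for a real voice activity detector.
    """
    frames = split_frames(x, frame_len)
    energies = np.mean(frames ** 2, axis=1)
    n_sil = max(1, int(len(energies) * silence_fraction))
    quietest = np.argsort(energies)[:n_sil]
    return frames[quietest]


def noise_profile(x, frame_len=400, silence_fraction=0.2):
    """Average power spectrum of the assumed noise-only frames."""
    sil = silence_frames(x, frame_len, silence_fraction)
    return np.mean(np.abs(np.fft.rfft(sil, axis=1)) ** 2, axis=0)


def estimate_snr_db(x, frame_len=400, silence_fraction=0.2):
    """Rough SNR estimate: (total power - noise power) / noise power."""
    sil = silence_frames(x, frame_len, silence_fraction)
    noise_power = max(float(np.mean(sil ** 2)), 1e-12)
    total_power = float(np.mean(split_frames(x, frame_len) ** 2))
    signal_power = max(total_power - noise_power, 1e-12)
    return 10.0 * np.log10(signal_power / noise_power)


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech, scaled so the mixture has the given SNR."""
    noise = np.resize(noise, clean.shape)  # repeat or truncate to match length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise
```

In a pipeline of this kind, `estimate_snr_db` and `noise_profile` would be applied to the test utterance, and `mix_at_snr` would then contaminate the claimed speaker's clean enrollment data before the reference model is retrained. How the emulated noise is synthesized from the profile, and how the model is re-enrolled, is left open here.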
@article{al-noori2019robust,
author={Al-Noori, Ahmed H.Y. and Duncan, Philip},
journal={Journal of the Audio Engineering Society},
title={Robust Speaker Recognition in Noisy Conditions by Means of Online Training with Noise Profiles},
year={2019},
volume={67},
number={4},
pages={174--189},
doi={10.17743/jaes.2019.0004},
month={april},
}
TY - JOUR
TI - Robust Speaker Recognition in Noisy Conditions by Means of Online Training with Noise Profiles
SP - 174
EP - 189
AU - Al-Noori, Ahmed H.Y.
AU - Duncan, Philip
PY - 2019
JO - Journal of the Audio Engineering Society
IS - 4
VL - 67
Y1 - April 2019
ER -
Authors:
Al-Noori, Ahmed H.Y.; Duncan, Philip
Affiliation:
School of Computing Science and Engineering, University of Salford, Salford, UK
JAES Volume 67 Issue 4 pp. 174-189; April 2019
Publication Date:
April 5, 2019
Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=20450