Automated speaker recognition attains impressive reliability when tested under controlled laboratory acoustic conditions. However, the environmental noise that inevitably exits in many real-world speech samples causes considerable degradation of recognition accuracy due to the so-called “channel mismatch” that occurs between the enrollment and recognition phases. A new online training method is proposed to improve robustness of speaker recognition in noisy conditions. An estimate of the signal-to-noise ratio and an emulated ambient noise spectral profile found in the silence intervals of the speech signal are used to re-enroll the reference model for a claimed speaker to generate a new noisy reference model. Based on a large number of tests using two datasets for speech samples contaminated with cafeteria babble and street noise, the proposed method shows promising improvement. When the signal-to-noise ratio is higher than 20 dB, typical speaker recognition algorithms normally function well, and the use of the proposed online training does not offer any benefit. When the signal-to-noise ratio is below 15 dB, the proposed method improves robustness of recognition. However, the new method shows limitations with speech samples that have been contaminated with interior train noise. Train noise contains slow time-varying components that require prolonged observation to create a reliable estimate.
Click to purchase paper as a non-member or login as an AES member. If your company or school subscribes to the E-Library then switch to the institutional version. If you are not an AES member and would like to subscribe to the E-Library then Join the AES!
This paper costs $33 for non-members and is free for AES members and E-Library subscribers.