For machine hearing in complex scenes (i.e., with reverberation and noise), sound localization either serves as the front end or is implicitly encoded in speech enhancement models. However, it has been suggested that there may be cross-talk between the identification and localization streams of the auditory system. Based on this idea, a multi-task sound localization method is proposed in this study. The proposed model takes the waveform as input and simultaneously estimates the azimuth of the sound source and the time-frequency (T-F) masks. Localization experiments were performed using binaural simulation in a reverberant environment, and the results show that, compared to a single-task sound localization method, the presence of the speech enhancement task improves localization performance.
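The abstract does not give architectural details, but the multi-task idea it describes (a shared encoder over the binaural waveform feeding two heads, one for azimuth and one for T-F masks) can be sketched as follows. This is a minimal NumPy illustration, not the paper's model; all dimensions, layer choices, and names (`W_shared`, `W_azi`, `W_mask`, the azimuth grid, the mask size) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions; the paper does not specify its architecture).
n_samples = 512    # waveform frame length per ear
n_hidden = 64      # shared representation size
n_azimuths = 37    # e.g. -90..+90 degrees in 5-degree steps (assumed grid)
n_tf_bins = 257    # T-F mask bins per frame (assumed)

# Shared encoder weights plus two task-specific heads.
W_shared = rng.standard_normal((2 * n_samples, n_hidden)) * 0.01
W_azi = rng.standard_normal((n_hidden, n_azimuths)) * 0.01
W_mask = rng.standard_normal((n_hidden, n_tf_bins)) * 0.01

def forward(binaural_frame):
    """Map a stacked binaural frame to (azimuth posterior, T-F mask)."""
    h = np.tanh(binaural_frame @ W_shared)       # shared representation
    logits = h @ W_azi
    azi_post = np.exp(logits - logits.max())
    azi_post /= azi_post.sum()                   # softmax over azimuth classes
    mask = 1.0 / (1.0 + np.exp(-(h @ W_mask)))   # sigmoid T-F mask in [0, 1]
    return azi_post, mask

frame = rng.standard_normal(2 * n_samples)       # left + right channels stacked
azi_post, mask = forward(frame)
```

Training such a model would minimize a weighted sum of the two task losses (e.g. cross-entropy on the azimuth posterior and a mask-estimation loss), which is how the speech enhancement task can regularize the localization stream.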