End-to-end frameworks have been introduced into binaural localization modeling and have achieved higher localization accuracy than other models; however, the rationale for and interpretability of the neural networks involved remain unclear. It is well documented that the auditory system relies on binaural cues for sound localization, and the equalization and cancellation (EC) theory describes how these cues are extracted. In this paper, an end-to-end binaural localization model based on the EC theory is proposed. In the proposed model, a convolutional neural network (CNN) with a specifically designed activation function is used to implement the EC theory. The proposed model was trained in synthesized rooms and evaluated in real rooms. Experimental results show that the CNN kernels learned by the proposed model correspond to binaural cues, and that the proposed model outperforms the current end-to-end model by 10.73% in localization accuracy and by 12.91% in root mean square error (RMSE).
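To make the EC idea concrete, the following is a minimal sketch of a classical equalization and cancellation search in plain Python, not the CNN or the activation function proposed in the paper. It assumes a single broadband source, a pure interaural delay with no level difference, and noise-free signals. The function names `ec_residual_energy` and `estimate_itd_samples`, the sign convention (a positive delay means the left channel lags the right), and the toy signal are all illustrative assumptions.

```python
import random


def ec_residual_energy(left, right, delay, gain=1.0):
    """Mean residual energy after equalizing `right` (delay and gain)
    and cancelling it against `left` (subtraction)."""
    n = len(left)
    energy = 0.0
    count = 0
    for i in range(n):
        j = i - delay  # index into the delayed right-ear signal
        if 0 <= j < n:
            diff = left[i] - gain * right[j]
            energy += diff * diff
            count += 1
    return energy / count if count else float("inf")


def estimate_itd_samples(left, right, max_delay):
    """Return the candidate delay (in samples) with the smallest EC residual."""
    candidates = range(-max_delay, max_delay + 1)
    return min(candidates, key=lambda d: ec_residual_energy(left, right, d))


# Toy signal (assumed for illustration): white noise reaching the left ear
# 5 samples later than the right ear.
random.seed(0)
source = [random.gauss(0.0, 1.0) for _ in range(2000)]
true_delay = 5
right = source
left = [0.0] * true_delay + source[:-true_delay]

estimated = estimate_itd_samples(left, right, max_delay=20)
print("estimated interaural delay (samples):", estimated)
```

In this sketch, the delay that gives the best cancellation serves as an interaural time difference estimate. A model of the kind described in the abstract would learn such cue extraction from data rather than search over a fixed grid of delays.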
https://www.aes.org/e-lib/browse.cfm?elib=21689