This article presents a method for estimating the directions of speech sources in real time from captured binaural audio. Accurate direction estimates are required to embed the sound sources correctly in the auditory environment of the far-end user when two augmented reality audio (ARA) users communicate. The method avoids the dependency of the estimation accuracy on the near-end user's orientation by combining information from an orientation tracker with the direction estimates. Results from anechoic experiments show that the presented method can estimate the direction(s) of non-simultaneous speech source(s) in real time, and that head movement improves the estimation accuracy for sources at the sides of the user.
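As an illustration of what combining tracker data with direction estimates can mean, the sketch below converts a head-relative azimuth into a world-fixed azimuth by adding the tracked head yaw. It is not the paper's algorithm. The function names, the use of yaw only, the degree units, and the shared sign convention for both angles are all assumptions made for this example.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def to_world_azimuth(head_relative_azimuth, head_yaw):
    """Convert a head-relative azimuth estimate to a world-fixed azimuth.

    Assumption (hypothetical convention): both angles are in degrees and
    increase in the same rotational direction, and only head yaw is
    compensated (pitch and roll are ignored).
    """
    return wrap_degrees(head_relative_azimuth + head_yaw)


# A source estimated 30 degrees off the nose while the head is turned
# 45 degrees maps to 75 degrees in the world frame.
world = to_world_azimuth(30.0, 45.0)
```

With such a compensation, the world-frame estimate of a stationary source stays the same while the user turns, so successive estimates can be compared or averaged in a common frame.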
https://www.aes.org/e-lib/browse.cfm?elib=15540