The use of neural networks for modeling and interpolating binaural room impulse responses (BRIRs) is investigated to facilitate spatial audio applications that require head tracking in multiple degrees of freedom. A deep neural network model, adopted from an architecture originally proposed for neural representation problems, is used to predict unknown BRIRs that contain salient early reflection peaks, given head coordinates. In place of the original time-domain formulation, a frequency-domain formulation is proposed to improve the model's efficiency and flexibility for band-limited BRIRs. Both formulations are evaluated with measured and simulated BRIRs in terms of modeling accuracy and interpolation performance, respectively. It is shown that the frequency-domain formulation is more efficient at modeling band-limited BRIRs than its time-domain counterpart because it learns only part of the frequency spectrum, and that models with both formulations significantly outperform conventional methods for interpolating sparse BRIRs.
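The efficiency argument for band-limited BRIRs can be illustrated with a simple count of regression targets. The sketch below is not taken from the paper: the function name `output_sizes`, the example sample rate, BRIR length, and band limit are all hypothetical, and it assumes a frequency-domain model that predicts one real and one imaginary value per retained bin of a one-sided spectrum.

```python
def output_sizes(num_samples, sample_rate_hz, band_limit_hz):
    """Compare the number of real-valued targets per BRIR channel.

    Assumption (not from the paper): the frequency-domain model predicts
    real and imaginary parts for each retained bin of a one-sided DFT.
    """
    # A real signal of length N has N // 2 + 1 one-sided DFT bins.
    num_onesided_bins = num_samples // 2 + 1
    # Bin k sits at k * sample_rate / N Hz; keep bins at or below the limit.
    num_kept_bins = band_limit_hz * num_samples // sample_rate_hz + 1
    num_kept_bins = min(num_onesided_bins, num_kept_bins)
    time_domain_targets = num_samples
    freq_domain_targets = 2 * num_kept_bins  # real + imaginary per bin
    return time_domain_targets, freq_domain_targets


# Hypothetical example: 100 ms BRIR at 48 kHz, band-limited to 8 kHz.
time_targets, freq_targets = output_sizes(4800, 48000, 8000)
print(time_targets, freq_targets)
```

Under these assumed values the frequency-domain target vector is roughly a third of the time-domain one, while for a full-band BRIR the two are nearly the same size, so the saving comes from the band limit rather than from the transform itself.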
https://www.aes.org/e-lib/browse.cfm?elib=22312
This paper is Open Access, which means you can download it for free.