Ipsilateral and contralateral head-related transfer functions (HRTFs) are used to create the perception of a virtual sound source at a virtual location. Publicly available databases cover only a subset of a full grid of angular directions because of the time and complexity required to acquire and deconvolve the responses. In this paper we compare subspace-based techniques for reconstructing HRTFs at arbitrary directions from a sparse dataset (e.g., the IRCAM-Listen HRTF database) using (i) a hybrid (combined linear and nonlinear) approach that pairs principal component analysis (PCA) with a fully connected neural network (FCNN), and (ii) a fully nonlinear (viz., deep-learning-based) autoencoder (AE) approach. The AE-based approach shows improvement over the hybrid approach in both objective and subjective tests, and we validate the AE-based approach on the MIT dataset.
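As a rough illustration of the subspace idea behind approach (i), the sketch below projects a set of responses onto a few principal components and then maps direction to PCA weights. It is not the paper's implementation. The data are synthetic (no IRCAM-Listen or MIT measurements are loaded), the sizes `n_dirs`, `n_bins`, and `n_components` are arbitrary assumptions, and a linear least-squares fit on hypothetical azimuth features stands in for the FCNN purely to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for measured HRTF magnitudes (assumed sizes, not real data):
# 40 azimuths on a sparse grid, 64 frequency bins each.
n_dirs, n_bins, n_components = 40, 64, 5
azimuth = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
true_basis = rng.standard_normal((3, n_bins))
true_weights = np.stack(
    [np.cos(azimuth), np.sin(azimuth), np.cos(2 * azimuth)], axis=1
)
hrtf = true_weights @ true_basis + 0.01 * rng.standard_normal((n_dirs, n_bins))

# Linear step: PCA via SVD of the mean-centred responses.
mean = hrtf.mean(axis=0)
_, _, vt = np.linalg.svd(hrtf - mean, full_matrices=False)
components = vt[:n_components]            # (n_components, n_bins)
weights = (hrtf - mean) @ components.T    # (n_dirs, n_components)


def direction_features(az):
    """Hypothetical direction encoding: low-order harmonics of the azimuth."""
    az = np.atleast_1d(az)
    return np.stack(
        [np.ones_like(az), np.cos(az), np.sin(az), np.cos(2 * az), np.sin(2 * az)],
        axis=1,
    )


# Stand-in for the FCNN: least-squares map from direction features to PCA weights.
coef, *_ = np.linalg.lstsq(direction_features(azimuth), weights, rcond=None)


def reconstruct(az):
    """Estimate responses at arbitrary azimuths from the learned subspace."""
    return direction_features(az) @ coef @ components + mean


# Query a direction that is not on the measurement grid.
estimate = reconstruct(azimuth[3] + 0.05)
```

In the hybrid approach described above, the regression step would be a trained fully connected network rather than a linear fit. The AE approach would instead learn both the low-dimensional code and the decoder nonlinearly.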