AES New York 2017
Paper Session P16
P16 - Spatial Audio—Part 2
Saturday, October 21, 9:00 am — 12:30 pm (Rm 1E11)
Chair: Jean-Marc Jot, Magic Leap - Sunnyvale, CA, USA
P16-1 On Data-Driven Approaches to Head-Related-Transfer Function Personalization—Haytham Fayek, Oculus Research and Facebook - Redmond, WA, USA; Laurens van der Maaten, Facebook AI Research - New York, NY, USA; Griffin Romigh, Oculus Research - Redmond, WA, USA; Ravish Mehra, Oculus Research - Redmond, WA, USA
Head-Related Transfer Function (HRTF) personalization is key to improving spatial audio perception and localization in virtual auditory displays. We investigate the task of personalizing HRTFs from anthropometric measurements, which can be decomposed into two subtasks: Interaural Time Delay (ITD) prediction and HRTF magnitude spectrum prediction. We explore both problems using state-of-the-art Machine Learning (ML) techniques. First, we show that ITD prediction can be significantly improved by smoothing the ITD using a spherical harmonics representation. Second, our results indicate that prior approaches based on unsupervised dimensionality reduction may be unsuitable for HRTF personalization. Finally, we show that neural network models trained on the full HRTF representation improve HRTF prediction over prior methods.
Convention Paper 9890
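The abstract does not say how the spherical-harmonic smoothing is done. A minimal sketch of the general idea, assuming ITDs sampled at scattered directions and a truncation at spherical-harmonic degree 2 (the authors' truncation order and pipeline may differ): fit the ITDs by least squares to a basis spanning the low-order harmonics, then resynthesize them from the fitted coefficients, which discards high-order, direction-to-direction jitter. The function names, the toy ITD model, and the noise level are hypothetical.

```python
import numpy as np

def sh_basis_deg2(az, el):
    """Nine functions spanning all real spherical harmonics up to degree 2.

    Degree-1 real SH are proportional to x, y, z and degree-2 ones to
    xy, yz, xz, x^2 - y^2 and 3z^2 - 1, so this Cartesian basis spans the
    same space; normalization does not matter for a least-squares fit.
    """
    x = np.cos(el) * np.cos(az)
    y = np.cos(el) * np.sin(az)
    z = np.sin(el)
    return np.stack([np.ones_like(x), x, y, z,
                     x * y, y * z, x * z, x**2 - y**2, 3 * z**2 - 1], axis=1)

def smooth_itd(az, el, itd):
    """Least-squares projection of ITDs onto the truncated basis, then resynthesis."""
    basis = sh_basis_deg2(az, el)
    coeffs, *_ = np.linalg.lstsq(basis, itd, rcond=None)
    return basis @ coeffs, coeffs

def rms(err):
    return float(np.sqrt(np.mean(err**2)))

# Synthetic demo (hypothetical data, not from the paper)
rng = np.random.default_rng(0)
n = 500
az = rng.uniform(-np.pi, np.pi, n)
el = np.arcsin(rng.uniform(-1.0, 1.0, n))        # directions uniform on the sphere
itd_true = 650e-6 * np.cos(el) * np.sin(az)      # toy ITD model in seconds (y = interaural axis)
itd_meas = itd_true + rng.normal(0.0, 50e-6, n)  # additive measurement noise
itd_smooth, coeffs = smooth_itd(az, el, itd_meas)
print(f"RMS error before: {rms(itd_meas - itd_true) * 1e6:.1f} us, "
      f"after: {rms(itd_smooth - itd_true) * 1e6:.1f} us")
```

Raising the truncation degree keeps more spatial detail but also more noise; with nine coefficients fitted to 500 directions, most of the independent per-direction noise averages out.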
P16-2 Eigen-Images of Head-Related Transfer Functions—Christoph Hold, Technische Universität Berlin - Berlin, Germany; Fabian Seipel, TU Berlin - Berlin, Germany; Fabian Brinkmann, Audio Communication Group, Technical University Berlin - Berlin, Germany; Athanasios Lykartsis, TU Berlin - Berlin, Germany; Stefan Weinzierl, Technical University of Berlin - Berlin, Germany
The individualization of head-related transfer functions (HRTFs) leads to perceptually enhanced virtual environments. In particular, the peak-notch structure of HRTF spectra, which depends on the listener’s specific head and pinna anthropometry, contains crucial auditory cues, e.g., for the perception of sound-source elevation. Inspired by the eigen-faces approach, we decompose image representations of individual full-spherical HRTF data sets into linear combinations of orthogonal eigen-images by principal component analysis (PCA). These eigen-images reveal regions of inter-subject variability across sets of HRTFs depending on direction and frequency. Results show common features as well as spectral variation within the individual HRTFs. Moreover, the measured HRTFs can be statistically de-noised using dimensionality reduction.
Convention Paper 9891
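The eigen-image decomposition is ordinary PCA applied to flattened images. A sketch in NumPy, assuming each subject's data set has already been arranged as a direction-by-frequency magnitude image on a common grid (the paper's image layout and preprocessing are not described here); `eigen_images`, `denoise`, and the synthetic rank-3 data are hypothetical:

```python
import numpy as np

def eigen_images(images):
    """PCA of HRTF images.

    images: (n_subjects, n_directions, n_freqs) array, e.g. magnitudes in dB.
    Returns the mean image, the eigen-images (orthonormal when flattened),
    per-subject weights, and the explained-variance ratio of each component.
    """
    n, d, f = images.shape
    X = images.reshape(n, d * f)
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return mean.reshape(d, f), Vt.reshape(-1, d, f), U * s, explained

def denoise(images, k):
    """Rebuild each subject's image from the mean plus the first k eigen-images."""
    mean, eig_imgs, weights, _ = eigen_images(images)
    return mean + np.tensordot(weights[:, :k], eig_imgs[:k], axes=1)

# Synthetic demo: 40 subjects whose images share 3 latent patterns, plus noise
rng = np.random.default_rng(1)
n_subj, n_dir, n_freq = 40, 50, 64
patterns = rng.normal(size=(3, n_dir, n_freq))       # hypothetical shared structure
clean = np.tensordot(rng.normal(size=(n_subj, 3)), patterns, axes=1)
noisy = clean + 0.3 * rng.normal(size=clean.shape)   # synthetic measurement noise

mean_img, eig_imgs, weights, explained = eigen_images(noisy)
err_before = np.sqrt(np.mean((noisy - clean) ** 2))
err_after = np.sqrt(np.mean((denoise(noisy, 3) - clean) ** 2))
print(f"first 3 components explain {explained[:3].sum():.1%} of the variance")
print(f"RMS error vs. clean images: {err_before:.3f} -> {err_after:.3f}")
```

Keeping only the leading components is what performs the statistical de-noising: uncorrelated measurement noise spreads over all components, while structure shared across subjects concentrates in the first few.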
P16-3 A Method for Efficiently Calculating Head-Related Transfer Functions Directly from Head Scan Point Clouds—Rahulram Sridhar, Princeton University - Princeton, NJ, USA; Edgar Choueiri, Princeton University - Princeton, NJ, USA
A method is developed for efficiently calculating head-related transfer functions (HRTFs) directly from head scan point clouds of a subject, using a database of HRTFs and corresponding head scans of many subjects. Consumer applications require that HRTFs be estimated accurately and efficiently, but existing methods do not meet both requirements simultaneously. The presented method uses efficient matrix multiplications to compute HRTFs from spherical harmonic representations of head scan point clouds that may be obtained from consumer-grade cameras. When the method was applied to a database of only 23 subjects, the calculated interaural time difference errors were above estimated perceptual thresholds for some spatial directions, whereas HRTF spectral distortions up to 6 kHz fell below perceptual thresholds for most directions.
Convention Paper 9892
P16-4 Head Rotation Data Extraction from Virtual Reality Gameplay Using Non-Individualized HRTFs—Juan Simon Calle, New York University - New York, NY, USA; THX; Agnieszka Roginska, New York University - New York, NY, USA
A game was created to analyze subjects’ head rotation while they localize a sound anywhere on a 360-degree sphere during VR gameplay. In this game, the subjects are asked to locate a series of sounds that are randomly placed on a sphere around their heads and rendered with generalized (non-individualized) HRTFs. The only instruction given to the subjects is to locate the sounds as quickly and accurately as possible by looking at where the sound was and then pressing a trigger. The tool was tested with 16 subjects. On average, the subjects took 3.7 ± 1.8 seconds to locate a sound, and the average localization error was 15.4 degrees. The subjects started moving their heads after approximately 0.2 seconds on average. The average rotation speed peaked at 0.8 seconds, at approximately 102 degrees per second.
Convention Paper 9893
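Metrics such as the movement onset, peak rotation speed, and time of peak reported above can be derived from head-tracker samples. A minimal sketch, assuming yaw and pitch angles in degrees at a fixed 90 Hz rate, a 5 deg/s onset threshold, and a synthetic minimum-jerk 90-degree turn in place of real gameplay data; the thresholds and function names are hypothetical, not the authors' analysis:

```python
import numpy as np

def forward_vectors(yaw_deg, pitch_deg):
    """Unit look-direction vectors from head-tracker yaw/pitch angles (degrees)."""
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    return np.stack([np.cos(pitch) * np.cos(yaw),
                     np.cos(pitch) * np.sin(yaw),
                     np.sin(pitch)], axis=1)

def head_motion_metrics(t, yaw_deg, pitch_deg, onset_threshold=5.0):
    """Movement onset, peak angular speed, and time of peak from tracker samples.

    Speed is the angular speed of the look direction, so head roll is ignored.
    """
    v = forward_vectors(yaw_deg, pitch_deg)
    a, b = v[:-1], v[1:]
    # Great-circle angle between consecutive look directions (robust for tiny steps)
    step = np.arctan2(np.linalg.norm(np.cross(a, b), axis=1), np.sum(a * b, axis=1))
    speed = np.degrees(step) / np.diff(t)      # deg/s, assigned to interval midpoints
    t_mid = 0.5 * (t[1:] + t[:-1])
    moving = np.flatnonzero(speed > onset_threshold)
    i_peak = int(np.argmax(speed))
    return {
        "onset_s": float(t_mid[moving[0]]) if moving.size else None,
        "peak_time_s": float(t_mid[i_peak]),
        "peak_speed_deg_s": float(speed[i_peak]),
    }

# Synthetic check: a 90-degree turn starting at 0.2 s and lasting 1.2 s
fs = 90.0                                            # hypothetical tracker rate (Hz)
t = np.arange(0.0, 2.0, 1.0 / fs)
tau = np.clip((t - 0.2) / 1.2, 0.0, 1.0)
yaw = 90.0 * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)  # minimum-jerk profile
metrics = head_motion_metrics(t, yaw, np.zeros_like(t))
print(metrics)
```

The synthetic turn only checks the arithmetic and is not the study's data: for a minimum-jerk movement the peak speed is 1.875 times the average speed, here about 141 deg/s at mid-movement.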
P16-5 Relevance of Headphone Characteristics in Binaural Listening Experiments: A Case Study—Florian Völk, Technische Universität München - Munich, Germany; WindAcoustics - Windach, Germany; Jörg Encke, Technical University of Munich - Munich, Germany; Jasmin Kreh, Technical University of Munich - Munich, Germany; Werner Hemmert, Technical University of Munich - Munich, Germany
Listening experiments typically target the performance and capabilities of the auditory system. Another common application scenario is the perceptual validation of algorithms and technical systems. In both cases, systems other than the device or subject under test must not affect the results in an uncontrolled manner. Binaural listening experiments require that two signals with predefined amplitude or phase differences stimulate the left and right ears, respectively. Headphone playback is a common method for presenting these signals. This study quantifies potential headphone-induced interaural differences through physical measurements on selected circum-aural headphones and compares them with psychoacoustic data. The results indicate that perceptually relevant effects may occur in binaural listening experiments, in traditional binaural headphone listening, and in virtual acoustics rendering such as binaural synthesis.
Convention Paper 9894
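One way to express headphone-induced interaural differences in the units used for psychoacoustic thresholds is to compare the left- and right-channel transfer functions measured on an artificial head. A sketch assuming equal-length impulse responses for both channels; the FFT size, the cross-correlation delay estimate, and the toy responses (a 0.9 gain and a 3-sample delay on the right channel) are assumptions, not the paper's measurement procedure:

```python
import numpy as np

def interaural_differences(h_left, h_right, fs, n_fft=4096):
    """ILD per frequency and a broadband interaural delay from two impulse responses.

    h_left, h_right: equal-length impulse responses of the left and right
    headphone channels, e.g. measured at the ears of a head and torso simulator.
    """
    H_l = np.fft.rfft(h_left, n_fft)
    H_r = np.fft.rfft(h_right, n_fft)
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    eps = 1e-12                                   # avoids log of zero in deep notches
    ild_db = 20 * np.log10((np.abs(H_l) + eps) / (np.abs(H_r) + eps))
    # Cross-correlation lag; positive means the right channel lags the left one
    xcorr = np.correlate(h_right, h_left, mode="full")
    lag = int(np.argmax(np.abs(xcorr))) - (len(h_left) - 1)
    return freqs, ild_db, lag / fs

# Toy example: right channel is 0.9 times the left one, delayed by 3 samples
fs = 48000
rng = np.random.default_rng(3)
k = np.arange(256)
h_left = rng.normal(size=256) * np.exp(-k / 20.0)           # decaying toy response
h_right = 0.9 * np.concatenate([np.zeros(3), h_left[:-3]])
freqs, ild_db, itd_s = interaural_differences(h_left, h_right, fs)
print(f"median ILD: {np.median(ild_db):.2f} dB, delay: {itd_s * 1e6:.1f} us")
```

A 0.9 amplitude ratio corresponds to an ILD of 20·log10(1/0.9), about 0.92 dB, and 3 samples at 48 kHz to 62.5 µs; such values can then be compared with just-noticeable interaural level and time differences.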
P16-6 Evaluating Binaural Reproduction Systems from Behavioral Patterns in a Virtual Reality—A Case Study with Impaired Binaural Cues and Tracking Latency—Olli Rummukainen, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany; Sebastian Schlecht, International Audio Laboratories - Erlangen, Germany; Axel Plinge, International Audio Laboratories Erlangen - Erlangen, Germany; Emanuël A. P. Habets, International Audio Laboratories Erlangen - Erlangen, Germany
This paper proposes a method for evaluating real-time binaural reproduction systems by means of a wayfinding task in six degrees of freedom. Participants physically walk to sound objects in a virtual reality created with a head-mounted display and binaural audio. The method allows for comparative evaluation of different rendering and tracking systems. We show how the localization accuracy of spatial audio rendering is reflected in objective measures of the participants' behavior and task performance. As independent variables, we add tracking latency or reduce the binaural cues. For comparison, we provide a reference scenario with loudspeaker reproduction and an anchor scenario with monaural reproduction.
Convention Paper 9895
P16-7 Coding Strategies for Multichannel Wiener Filters in Binaural Hearing Aids—Roberto Gil-Pita, University of Alcalá - Alcalá de Henares, Madrid, Spain; Beatriz Lopez-Garrido, Servicio de Salud de Castilla la Mancha (SESCAM) - Castilla-Mancha, Spain; Manuel Rosa-Zurera, University of Alcalá - Alcalá de Henares, Madrid, Spain
Binaural hearing aids use spatial sound processing techniques to increase intelligibility, but designing algorithms for these devices involves strong constraints. To minimize power consumption and maximize battery life, the digital signal processors embedded in these devices run at very low clock frequencies and have little available memory. In the binaural case, the wireless communication between the two hearing devices further increases power consumption, which makes it necessary to study the relationship between intelligibility improvement and required transmission bandwidth. To this end, this paper proposes and compares several coding strategies for implementing binaural multichannel Wiener filters, with the aim of minimizing communication bandwidth and transmission power. The obtained results demonstrate the suitability of the proposed coding strategies.
Convention Paper 9896
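The trade-off between link bandwidth and Wiener-filter performance can be illustrated with a deliberately simplified model. This sketch is not one of the paper's coding strategies. It assumes a broadband multichannel Wiener filter with one tap per microphone (practical hearing-aid filters usually operate per frequency band), an oracle target signal for computing the cross-correlation vector, a uniform PCM quantizer on the single transmitted microphone, a synthetic instantaneous mixture, and a hypothetical 16 kHz sample rate:

```python
import numpy as np

def quantize(x, bits, clip):
    """Uniform mid-rise quantizer with 2**bits levels over [-clip, clip]."""
    step = 2.0 * clip / 2**bits
    idx = np.floor(np.clip(x, -clip, clip) / step)
    idx = np.minimum(idx, 2**(bits - 1) - 1)      # keep x == clip in the top level
    return (idx + 0.5) * step

def wiener_weights(Y, d):
    """Wiener solution w = R_yy^{-1} r_yd from sample statistics."""
    R = Y.T @ Y / len(d)
    r = Y.T @ d / len(d)
    return np.linalg.solve(R, r)

def mse(Y, d):
    w = wiener_weights(Y, d)
    return float(np.mean((d - Y @ w) ** 2))

rng = np.random.default_rng(2)
n = 20000
s = rng.normal(size=n)                        # target signal (stand-in for speech)
i1, i2 = rng.normal(size=(2, n))              # two independent interferers
e = 0.05 * rng.normal(size=(3, n))            # microphone self-noise
y1 = s + 1.0 * i1 + 0.3 * i2 + e[0]           # left aid, reference mic
y2 = 0.9 * s + 0.2 * i1 + 1.2 * i2 + e[1]     # left aid, second mic
y3 = 0.8 * s + 1.5 * i1 + 0.9 * i2 + e[2]     # right aid mic, sent over the link

fs = 16000                                    # hypothetical sample rate (Hz)
local = np.column_stack([y1, y2])
results = {"local only": mse(local, s)}
for bits in (2, 4, 8, 16):
    remote = quantize(y3, bits, clip=4.0 * np.std(y3))
    key = f"{bits}-bit link"
    results[key] = mse(np.column_stack([local, remote]), s)
    print(f"{bits:2d} bits/sample = {bits * fs / 1000:.0f} kbit/s -> MSE {results[key]:.4f}")
print(f"local only -> MSE {results['local only']:.4f}")
```

With two interferers, the two local microphones alone cannot cancel both, so the contralateral signal helps; coarser quantization lowers the bitrate but adds quantization noise that the filter cannot remove.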