AES New York 2017
Poster Session P11
P11 - Spatial Audio
Thursday, October 19, 2:00 pm — 3:30 pm (Poster Area)
P11-1 Deep Neural Network Based HRTF Personalization Using Anthropometric Measurements—Chan Jun Chun, Korea Institute of Civil Engineering and Building Technology (KICT) - Goyang, Korea; Jung Min Moon, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Geon Woo Lee, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Nam Kyun Kim, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Hong Kook Kim, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea
A head-related transfer function (HRTF) is a simple and powerful tool for producing spatial sound by filtering monaural sound. It represents the effects of the head, body, and pinna as well as the pathway from a given source position to a listener’s ears. Unfortunately, although HRTF characteristics differ slightly from person to person, it is common practice to use a head-related impulse response (HRIR) averaged over all subjects. In addition, it is difficult to measure individual HRTFs for all horizontal and vertical directions. This paper therefore proposes a deep neural network (DNN)-based HRTF personalization method using anthropometric measurements. To this end, the CIPIC HRTF database, a public-domain database of HRTF measurements, is analyzed to generate a DNN model for HRTF personalization. The input features for the DNN are the anthropometric measurements, including head, torso, and pinna information. The output labels are the HRIR samples of the left ear. The performance of the proposed method is evaluated by computing the root-mean-square error (RMSE) and log-spectral distortion (LSD) between the reference HRIR and the HRIR estimated by the proposed method. The results show that the RMSE and LSD of the estimated HRIR are smaller than those of the HRIR averaged over all subjects in the CIPIC HRTF database.
Convention Paper 9860
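The two error measures named in the P11-1 abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas used in the paper, so the definitions below are common textbook forms and should be read as assumptions: RMSE over time-domain samples, and LSD as the root-mean-square of the dB difference between magnitude spectra. The function names and the `eps` guard are hypothetical, and a naive DFT is used only to keep the example free of dependencies.

```python
import cmath
import math


def dft_magnitude(x):
    """Magnitude spectrum (bins 0..N//2) of a real sequence via a naive DFT."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def rmse(reference, estimate):
    """Root-mean-square error between two equal-length sample sequences."""
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))


def log_spectral_distortion(reference, estimate, eps=1e-12):
    """RMS difference in dB between the two magnitude spectra (assumed definition)."""
    ref_mag = dft_magnitude(reference)
    est_mag = dft_magnitude(estimate)
    # eps avoids log(0) for empty bins; it is an implementation convenience.
    squared_db = [
        (20.0 * math.log10((r + eps) / (e + eps))) ** 2
        for r, e in zip(ref_mag, est_mag)
    ]
    return math.sqrt(sum(squared_db) / len(squared_db))
```

Under these definitions, an estimate that is a uniformly scaled copy of the reference has a constant LSD equal to the scale factor expressed in dB, while its RMSE grows with the signal amplitude.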
P11-2 The Upmix Method for 22.2 Multichannel Sound Using Phase Randomized Impulse Responses—Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan; Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan
An upmix technique for 22.2 multichannel sound was studied using room impulse responses (RIRs) processed with a phase-randomization technique. In the first experiment, the spatial impression of the proposed method was close to that of the original sound, but the timbre differed. In the second experiment, we divided the RIRs at the mixing time, the moment when the diffuse reverberation tail begins, using two approaches: a mixing time fixed at 80 ms and a different mixing time for each frequency band. The results show that the similarity between the proposed methods and the original sound improved; however, they suggest that the similarity of the timbre depends on the sound source and on a suitable mixing time for the RIRs.
Convention Paper 9861
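Phase randomization of an impulse response, as mentioned in the P11-2 abstract, can be sketched generically as follows. This is not the authors' implementation: the abstract does not describe their procedure, so the sketch assumes the simplest form, in which the magnitude spectrum is kept and each phase is replaced with a uniformly random value while conjugate symmetry keeps the output real. The function names are hypothetical, and the naive DFT is only suitable for short sequences.

```python
import cmath
import math
import random


def dft(x):
    """Naive forward DFT of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_real(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def phase_randomize(rir, seed=None):
    """Keep the magnitude spectrum of rir and randomize its phase."""
    rng = random.Random(seed)
    n = len(rir)
    spectrum = dft(rir)
    out = [0j] * n
    out[0] = spectrum[0]  # the DC bin is real for a real input; keep it unchanged
    for k in range(1, (n - 1) // 2 + 1):
        phase = rng.uniform(-math.pi, math.pi)
        out[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
        out[n - k] = out[k].conjugate()  # mirror bin so the output stays real
    if n % 2 == 0:
        out[n // 2] = spectrum[n // 2]  # the Nyquist bin is also real; keep it
    return idft_real(out)
```

Different seeds give mutually decorrelated responses with the same magnitude spectrum. Applying such a function only to the part of an RIR after a chosen mixing time would be one possible way to treat the reverberation tail separately, although the abstract does not confirm that the paper does this.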
P11-3 A 3D Sound Localization System Using Two Side Loudspeaker Matrices—Yoshihiko Sato, University of Aizu - Aizuwakamatsu-shi, Fukushima, Japan; Akira Saji, University of Aizu - Aizuwakamatsu City, Japan; Jie Huang, University of Aizu - Aizuwakamatsu City, Japan
We have proposed a new 3D sound reproduction system that consists of two side loudspeaker matrices, each with four loudspeakers. The 3D sound images applied to this system were created by amplitude panning and convolution with head-related transfer functions (HRTFs). In our past research we arranged the loudspeaker matrices in a square shape, but the accuracy of sound image localization needed improvement. To improve direction perception, we changed the shape of the loudspeaker matrices from a square to a diamond by rotating them 45 degrees. As a result, the diamond-shaped loudspeaker matrices placed the localized sound images closer to the intended directions than the square-shaped matrices did.
Convention Paper 9862
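As a rough illustration of the amplitude panning mentioned in the P11-3 abstract, the sketch below computes constant-power gains for a single pair of loudspeakers. The abstract does not state which panning law the authors use or how gains are distributed across the four loudspeakers of each matrix, so the sine/cosine law, the function name, and the `position` parameter are all assumptions made for illustration.

```python
import math


def constant_power_pan(position):
    """Gains for two loudspeakers; position 0.0 is the first, 1.0 the second."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie in [0, 1]")
    angle = position * math.pi / 2.0
    # cos^2 + sin^2 = 1, so the summed power is constant across positions.
    return math.cos(angle), math.sin(angle)
```

With this law, a source panned midway between the two loudspeakers receives equal gains of about 0.707 on each, which keeps the total radiated power the same as at either extreme.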
P11-4 Optimization of Interactive Binaural Processing—François Salmon, CMAP - Ecole Polytechnique - Paris, France; Ecole nationale supérieure Louis-Lumière - Paris, France; Matthieu Aussal, CMAP - Ecole Polytechnique - Paris, France; Etienne Hendrickx, University of Brest - Paris, France; Jean-Christophe Messonnier, CNSMDP Conservatoire de Paris - Paris, France; Laurent Millot, ENS Louis-Lumière - Paris, France; Acte Institute (UMR 8218, CNRS/University Paris 1) - Paris, France
Several monitoring devices may be involved during post-production. Given its lower cost and practicality, head-tracked binaural processing could help professionals monitor spatialized audio content. However, this technology introduces significant spectral coloration for some sound incidences and suffers in comparison with a stereophonic signal reproduced through headphones. Therefore, different processing methods are proposed to optimize the binaural rendering and to find a new balance between externalization and timbral coloration. For this purpose, altering the HRTF spectral cues in the frontal area only has been studied. Listening tests were conducted to evaluate these treatments. One HRTF processing method offered as much externalization as the original HRTFs while having a timbre closer to that of the original stereo signal.
Convention Paper 9863
P11-5 A Direct Comparison of Localization Performance When Using First, Third, and Fifth Ambisonics Order for Real Loudspeaker and Virtual Loudspeaker Rendering—Lewis Thresh, University of York - York, UK; Cal Armstrong, University of York - York, UK; Gavin Kearney, University of York - York, UK
Ambisonics is being used in applications such as virtual reality to render 3-dimensional sound fields over headphones through the use of virtual loudspeakers, the performance of which has previously been assessed up to third order. Through a localization test, the performance of first-, third-, and fifth-order Ambisonics is investigated for optimized real and virtual loudspeaker arrays utilizing a generic HRTF set. Results indicate a minor improvement in localization accuracy when using fifth order over third, though both show a vast improvement over first. It is shown that individualized HRTFs are required to fully investigate the performance of Ambisonic binaural rendering.
Convention Paper 9864