Thursday, October 19, 2:00 pm — 3:30 pm
P11-1 Deep Neural Network Based HRTF Personalization Using Anthropometric Measurements—Chan Jun Chun, Korea Institute of Civil Engineering and Building Technology (KICT) - Goyang, Korea; Jung Min Moon, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Geon Woo Lee, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Nam Kyun Kim, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Hong Kook Kim, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea
A head-related transfer function (HRTF) is a simple and powerful tool for producing spatial sound by filtering monaural sound. It represents the effects of the head, body, and pinna as well as the pathway from a given source position to a listener’s ears. Unfortunately, although the characteristics of HRTFs differ slightly from person to person, it is common to use a head-related impulse response (HRIR) averaged over all subjects. In addition, it is difficult to measure individual HRTFs for all horizontal and vertical directions. This paper therefore proposes a deep neural network (DNN)-based HRTF personalization method using anthropometric measurements. To this end, the CIPIC HRTF database, a public-domain database of HRTF measurements, is analyzed to generate a DNN model for HRTF personalization. The input features of the DNN are anthropometric measurements, including head, torso, and pinna information, and the output labels are the HRIR samples of the left ear. The performance of the proposed method is evaluated by computing the root-mean-square error (RMSE) and log-spectral distortion (LSD) between the reference HRIR and the HRIR estimated by the proposed method. The RMSE and LSD of the estimated HRIR are shown to be smaller than those of the HRIR averaged over all subjects in the CIPIC HRTF database.
Convention Paper 9860
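The two evaluation metrics named in the P11-1 abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact LSD definition, so the sketch assumes a common one (the RMS of the dB magnitude difference over DFT bins 0 to N/2). The function names and the `floor` parameter are hypothetical.

```python
import cmath
import math


def rmse(reference, estimate):
    """Root-mean-square error between two equal-length HRIRs."""
    if len(reference) != len(estimate):
        raise ValueError("HRIRs must have the same length")
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))


def magnitude_spectrum(signal):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def log_spectral_distortion(reference, estimate, floor=1e-12):
    """Assumed LSD definition: RMS over bins of 20*log10(|H_ref| / |H_est|).

    `floor` (an assumption) avoids taking the logarithm of zero.
    """
    if len(reference) != len(estimate):
        raise ValueError("HRIRs must have the same length")
    ref_mag = magnitude_spectrum(reference)
    est_mag = magnitude_spectrum(estimate)
    squared_db = [
        (20.0 * math.log10(max(r, floor) / max(e, floor))) ** 2
        for r, e in zip(ref_mag, est_mag)
    ]
    return math.sqrt(sum(squared_db) / len(squared_db))
```

Under this definition, an estimate that is a uniformly scaled copy of the reference yields an LSD equal to the scale factor expressed in dB, which makes the metric easy to sanity-check.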
P11-2 The Upmix Method for 22.2 Multichannel Sound Using Phase Randomized Impulse Responses—Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan; Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan
An upmix technique for 22.2 multichannel sound was studied using room impulse responses (RIRs) processed by a phase randomization technique. In the first experiment, the spatial impression of the proposed method was close to that of the original sound, but the timbre differed. In the second experiment, the RIRs were divided at the mixing time, the moment when the diffuse reverberation tail begins, using two definitions: a mixing time fixed at 80 ms and a different mixing time for each frequency band. The similarity between the proposed methods and the original sound improved; however, the results suggest that the similarity of timbre depends on the sound source and on a suitable choice of RIR mixing time.
Convention Paper 9861
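The general idea of phase randomization mentioned in the P11-2 abstract can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' procedure: it keeps the magnitude spectrum of an impulse response, replaces the phase of each bin with a uniformly random value, and preserves conjugate symmetry so the result stays real. The mixing-time split from the second experiment is not modeled, and the function names and `seed` parameter are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
        for k in range(n)
    ]


def idft_real(spectrum):
    """Inverse DFT, returning the real part of each output sample."""
    n = len(spectrum)
    return [
        (sum(v * cmath.exp(2j * math.pi * k * t / n)
             for k, v in enumerate(spectrum)) / n).real
        for t in range(n)
    ]


def phase_randomize(rir, seed=None):
    """Return a real signal with the magnitude spectrum of `rir` and random phase."""
    rng = random.Random(seed)
    n = len(rir)
    spectrum = dft(rir)
    randomized = list(spectrum)
    # The DC bin (and the Nyquist bin when n is even) is left unchanged,
    # since it must remain real for a real-valued output.
    for k in range(1, (n - 1) // 2 + 1):
        phase = rng.uniform(-math.pi, math.pi)
        randomized[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
        # Mirror bin: the complex conjugate keeps the time signal real.
        randomized[n - k] = randomized[k].conjugate()
    return idft_real(randomized)
```

Because only the phase changes, the output has the same energy per frequency bin as the input, while its temporal structure is scrambled.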
P11-3 A 3D Sound Localization System Using Two Side Loudspeaker Matrices—Yoshihiko Sato, University of Aizu - Aizuwakamatsu-shi, Fukushima, Japan; Akira Saji, University of Aizu - Aizuwakamatsu City, Japan; Jie Huang, University of Aizu - Aizuwakamatsu City, Japan
We have proposed a new 3D sound reproduction system that consists of two side loudspeaker matrices, each with four loudspeakers. The 3D sound images applied to this system were created by amplitude panning and convolution with head-related transfer functions (HRTFs). In our past research, the loudspeaker matrices were arranged in a square shape, but the accuracy of sound image localization needed improvement. To improve direction perception, we changed the shape of the loudspeaker matrices from a square to a diamond by rotating them 45 degrees. As a result, the diamond-shaped loudspeaker matrices brought the localized sound images closer to the intended directions than the square-shaped matrices did.
Convention Paper 9862
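Amplitude panning, one of the two techniques named in the P11-3 abstract, can be illustrated with a minimal sketch for a single pair of loudspeakers. The abstract does not state which panning law the authors use or how gains are distributed across the four loudspeakers of each matrix, so the constant-power sine/cosine law and the function names below are assumptions chosen for illustration only.

```python
import math


def constant_power_gains(pan):
    """Gains for a loudspeaker pair under an assumed constant-power law.

    pan = 0.0 sends the signal fully to the first loudspeaker,
    pan = 1.0 fully to the second; the squared gains always sum to 1.
    """
    if not 0.0 <= pan <= 1.0:
        raise ValueError("pan must lie in [0, 1]")
    angle = pan * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def pan_signal(samples, pan):
    """Scale one mono signal into two loudspeaker feeds."""
    gain_first, gain_second = constant_power_gains(pan)
    first = [gain_first * s for s in samples]
    second = [gain_second * s for s in samples]
    return first, second
```

A constant-power law is a common choice because the total radiated power stays the same as the image moves between the two loudspeakers.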
P11-4 Optimization of Interactive Binaural Processing—François Salmon, Ecole Nationale Supérieur Louis-Lumière - Paris, France; CMAP - Ecole Polytechnique - Paris, France; Matthieu Aussal, CMAP - Ecole Polytechnique - Paris, France; Etienne Hendrickx, Paris Conservatory (CNSMDP) - Paris, France; Jean-Christophe Messonnier, CNSMDP Conservatoire de Paris - Paris, France; Laurent Millot, ENS Louis-Lumière - Paris, France; Acte Institute (UMR 8218, CNRS/University Paris 1) - Paris, France
Several monitoring devices may be involved in post-production. Given its lower cost and practicality, head-tracked binaural processing could help professionals monitor spatialized audio content. However, this technology introduces significant spectral coloration for some directions of sound incidence and suffers when compared with a stereophonic signal reproduced over headphones. Therefore, different processing methods are proposed to optimize the binaural rendering and to find a new balance between externalization and timbral coloration. For this purpose, altering the HRTF spectral cues in the frontal area only was studied. Listening tests were conducted to evaluate the accuracy of these treatments. One HRTF processing method offered as much externalization as the original HRTFs while having a timbre closer to that of the original stereo signal.
Convention Paper 9863
P11-5 A Direct Comparison of Localization Performance When Using First, Third, and Fifth Ambisonics Order for Real Loudspeaker and Virtual Loudspeaker Rendering—Lewis Thresh, University of York - York, UK; Calum Armstrong, University of York - York, UK; Gavin Kearney, University of York - York, UK
Ambisonics is being used in applications such as virtual reality to render three-dimensional sound fields over headphones through the use of virtual loudspeakers, the performance of which has previously been assessed up to third order. Through a localization test, the performance of first-, third-, and fifth-order Ambisonics is investigated for optimized real and virtual loudspeaker arrays utilizing a generic HRTF set. Results indicate a minor improvement in localization accuracy when using fifth order over third, though both show a vast improvement over first. It is shown that individualized HRTFs are required to fully investigate the performance of Ambisonic binaural rendering.
Convention Paper 9864