AES Milan 2018
Poster Session P24
P24 - Posters: Spatial Audio
Saturday, May 26, 09:30 — 11:00 (Arena 2)
P24-1 Acoustic and Subjective Evaluation of 22.2- and 2-Channel Reproduced Sound Fields in Three Studios—Madhu Ashok, University of Rochester - Rochester, NY, USA; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan; Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA
Three studios of similar outer-shell dimensions, with varying acoustic treatments and absorptivity, were evaluated via both recorded and simulated binaural stimuli for 22.2- and 2-channel playback. A series of analyses, including acoustic modelling in CATT-Acoustic and subjective evaluation, was conducted to test whether 22.2-channel playback preserved common perceptual impressions regardless of room-dependent physical characteristics. Results from multidimensional scaling (MDS) indicated that listeners used one perceptual dimension to differentiate between reproduction formats and others for physical room characteristics. Clarity and early decay time measured in the three studios showed a similar pattern when scaling from 2- to 22.2-channel reproduced sound fields. Subjective evaluation revealed that the inherent perceptual characteristics of 22.2-channel playback tended to be preserved despite the different playback conditions.
Convention Paper 10018
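The MDS analysis the abstract refers to can be illustrated with a minimal classical (Torgerson) MDS implementation. This is a sketch of the general technique, not the paper's actual analysis; the toy dissimilarity data in the usage example is an assumption.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed an n-by-n dissimilarity
    matrix D into k dimensions. A sketch of how perceptual
    dimensions can be recovered from pairwise stimulus ratings."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]        # largest eigenvalues first
    vals, vecs = vals[order[:k]], vecs[:, order[:k]]
    return vecs * np.sqrt(np.maximum(vals, 0))
```

The embedding is recovered only up to rotation and reflection, which is why MDS dimensions must be interpreted (e.g., as "reproduction format" or "room character") after the fact.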
P24-2 Audio Source Localization as an Input to Virtual Reality Environments—Agneya A. Kerure, Georgia Institute of Technology - Atlanta, GA, USA; Jason Freeman, Georgia Institute of Technology - Atlanta, GA, USA
This paper details an effort to incorporate audio source localization as an input to virtual reality systems, focusing primarily on games. The goal of this research is to find a novel method of using localized live audio as an input for level generation or for the creation of elements and objects in a virtual reality environment. The paper discusses the current state of audio-based games and virtual reality and details the design requirements of a system built around a circular microphone array that localizes the input audio. It also briefly discusses the signal processing techniques used for audio information retrieval and introduces a prototype of an asymmetric virtual reality first-person shooter game as a proof of concept of the potential of audio source localization to augment the immersive nature of virtual reality.
Convention Paper 10019
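A standard building block for localization with microphone arrays of this kind is time-difference-of-arrival estimation via GCC-PHAT; the abstract does not name the specific method used, so the sketch below is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the time delay (seconds) of `sig` relative to `ref`
    using the generalized cross-correlation with phase transform."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12            # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs
```

With delays estimated for several microphone pairs of a circular array, the source azimuth can then be triangulated from the known array geometry.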
P24-3 High Order Ambisonics Encoding Method Using Differential Microphone Array—Shan Gao, Peking University - Beijing, China; Xihong Wu, Peking University - Beijing, China; Tianshu Qu, Peking University - Beijing, China
High-order Ambisonics (HOA) is a flexible way to represent and analyze a sound field. The spherical Fourier transform requires the microphones to be distributed uniformly over the surface of a sphere, which limits the applicability of the theory. In this paper we introduce an HOA encoding method using differential microphone arrays (DMAs). We obtain the particular beam patterns of the different orders of spherical harmonics as weighted sums of time-delayed outputs from a closely spaced differential microphone array. The HOA coefficients are then estimated by projecting the signals onto these beam patterns. The coefficients calculated by the DMA are compared with those derived from the theoretical spherical harmonics, which demonstrates the effectiveness of our method.
Convention Paper 10020
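The theoretical baseline the paper compares against, projecting array signals onto a spherical-harmonic basis, can be sketched at first order with a least-squares encoder. The real-SH convention and the octahedral microphone layout below are illustrative assumptions; the paper's DMA method replaces this direct projection.

```python
import numpy as np

def sh_first_order(az, el):
    """Real spherical harmonics up to order 1 (a common convention;
    normalizations vary between Ambisonics formats)."""
    return np.array([
        np.ones_like(az),          # W (order 0, omnidirectional)
        np.sin(az) * np.cos(el),   # Y
        np.sin(el),                # Z
        np.cos(az) * np.cos(el),   # X
    ])

def encode_hoa(mic_dirs, signals):
    """Least-squares first-order encoding: project microphone signals
    onto the SH basis sampled at the microphone directions."""
    Y = np.stack([sh_first_order(a, e) for a, e in mic_dirs])  # (M, 4)
    return np.linalg.pinv(Y) @ signals                          # (4, T)
```

This projection is well conditioned only when the sampling directions cover the sphere adequately, which is exactly the uniform-distribution requirement the paper's DMA approach aims to relax.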
P24-4 Development of a 64-Channel Spherical Microphone Array and a 122-Channel Loudspeaker Array System for 3D Sound Field Capturing and Reproduction Technology Research—Shoken Kaneko, Yamaha Corporation - Iwata-shi, Japan; Tsukasa Suenaga, Yamaha Corporation - Iwata-shi, Japan; Hitoshi Akiyama, Yamaha Corporation - Iwata-shi, Japan; Yoshiro Miyake, Yamaha Corporation - Iwata-shi, Japan; Satoshi Tominaga, Yamaha Corporation - Hamamatsu-shi, Japan; Futoshi Shirakihara, Yamaha Corporation - Iwata-shi, Japan; Hiraku Okumura, Yamaha Corporation - Iwata-shi, Japan; Kyoto University - Kyoto, Japan
In this paper we present our recent activities in building facilities to drive research and development on 3D sound field capturing and reproduction. We developed a 64-channel spherical microphone array, the ViReal Mic, and a 122-channel loudspeaker array system, the ViReal Dome. The ViReal Mic is a microphone array whose capsules are mounted on a rigid sphere at positions determined by the spherical Fibonacci spiral. The ViReal Dome is a loudspeaker array system consisting of 122 active coaxial loudspeakers. We present the details of the developed systems and discuss directions for future research.
Convention Paper 10021
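The spherical Fibonacci spiral used to place the 64 capsules is a standard construction for near-uniform sampling of a sphere; a minimal sketch (one common parameterization, not necessarily the exact one used for the ViReal Mic):

```python
import numpy as np

def fibonacci_sphere(n):
    """Near-uniform points on the unit sphere via the spherical
    Fibonacci spiral: latitudes are equal-area slices, longitudes
    advance by the golden angle."""
    i = np.arange(n)
    golden = (1 + 5 ** 0.5) / 2
    z = 1 - (2 * i + 1) / n            # equal-area latitude samples
    az = 2 * np.pi * i / golden        # golden-angle longitude steps
    r = np.sqrt(1 - z ** 2)
    return np.stack([r * np.cos(az), r * np.sin(az), z], axis=1)
```

Unlike the uniform spherical samplings required by classical HOA theory, this layout exists for any channel count, which makes it convenient for a fixed 64-capsule design.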
P24-5 A Recording Technique for 6 Degrees of Freedom VR—Enda Bates, Trinity College Dublin - Dublin, Ireland; Hugh O'Dwyer, Trinity College Dublin - Dublin, Ireland; Karl-Philipp Flachsbarth, Trinity College Dublin - Dublin, Ireland; Francis M. Boland, Trinity College Dublin - Dublin, Ireland
This paper presents a new multichannel microphone technique and reproduction system intended to support six degrees of freedom of listener movement. The technique is based on a modified form of the equal segment microphone array (ESMA) concept and utilizes four Ambisonic (B-format) microphones in a near-coincident arrangement with a 50 cm spacing. Upon playback, these Ambisonic microphones are transformed into virtual microphones with different polar patterns that change based on the listener's position within the reproduction area. The results of an objective analysis and an informal subjective listening test indicate some inconsistencies in the on- and off-axis response but suggest that the technique can potentially support six degrees of freedom in a recorded audio scene using a compact microphone array well suited to Virtual Reality (VR) and particularly Free Viewpoint Video (FVV) applications.
Convention Paper 10022
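Deriving a steerable virtual microphone from B-format channels is a standard operation; a sketch of the usual first-order formula is below. The FuMa-style W gain compensation and the pattern parameter convention are assumptions, since B-format normalizations vary and the paper does not specify its decoding in the abstract.

```python
import numpy as np

def virtual_mic(w, x, y, z, az, el, p=0.5):
    """Steer a first-order virtual microphone in a B-format scene.
    p = 1: omni, p = 0.5: cardioid, p = 0: figure-of-eight.
    Assumes traditional FuMa-style B-format, where W carries a
    1/sqrt(2) gain; treat the convention as a sketch, not a spec."""
    direction = (x * np.cos(az) * np.cos(el)
                 + y * np.sin(az) * np.cos(el)
                 + z * np.sin(el))
    return p * np.sqrt(2) * w + (1 - p) * direction
```

Position-dependent rendering of the kind described in the paper would vary `az`, `el`, and `p` per Ambisonic microphone as the listener moves through the reproduction area.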
P24-6 On the Use of Bottleneck Features of CNN Auto-Encoder for Personalized HRTFs—Geon Woo Lee, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Jung Min Moon, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Chan Jun Chun, Korea Institute of Civil Engineering and Building Technology (KICT) - Goyang, Korea; Hong Kook Kim, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea
The most effective way of providing immersive sound effects is to use head-related transfer functions (HRTFs). HRTFs are defined by the path from a given sound source to the listener's ears. However, the sound propagation captured by an HRTF differs slightly from person to person because each head, body, and set of ears differs. Recently, a method for estimating HRTFs using a neural network has been developed, in which anthropometric pinna measurements and head-related impulse responses (HRIRs) serve as the input and output layers of the network. However, it is inefficient to measure such anthropometric data accurately. This paper proposes a feature extraction method that operates on an ear image instead of measuring pinna anthropometry directly. The proposed method uses the bottleneck features of a convolutional neural network (CNN) auto-encoder applied to an edge-detected ear image. The proposed feature extraction method using the CNN-based auto-encoder will be incorporated into the HRTF estimation approach.
Convention Paper 10023
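The edge-detection front end that feeds the auto-encoder can be sketched with a Sobel filter, one common choice of edge detector; the abstract does not specify which operator the authors use, so this is an illustrative assumption rather than the paper's pipeline.

```python
import numpy as np

def sobel_edges(img):
    """Sobel edge-magnitude map of a grayscale image: the kind of
    edge-detected input that could feed a CNN auto-encoder
    (a sketch, not the paper's code)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    gx = np.zeros(img.shape, float)
    gy = np.zeros(img.shape, float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)   # horizontal gradient
            gy[i, j] = np.sum(patch * ky)   # vertical gradient
    return np.hypot(gx, gy)
```

The auto-encoder would then compress such edge maps to a low-dimensional bottleneck vector, which replaces hand-measured pinna anthropometry as the input to HRTF estimation.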