Friday, October 20, 9:00 am — 12:00 pm
P12-1 Efficient Structures for Virtual Immersive Audio Processing—Jean-Marc Jot, Magic Leap - Sunnyvale, CA, USA; Daekyoung Noh, Xperi Corp - Santa Ana, CA, USA; Themis Katsianos, Xperi Corp - Highland, CA, USA
New consumer audio formats have been developed in recent years for the production and distribution of immersive multichannel audio recordings including surround and height channels. HRTF-based binaural synthesis and cross-talk cancellation techniques can simulate virtual loudspeakers, localized in the horizontal plane or at elevated apparent positions, for audio reproduction over headphones or convenient loudspeaker playback systems. In this paper we review and discuss the practical design and implementation challenges of immersive audio virtualization methods, and describe computationally efficient processing approaches and topologies enabling more robust and consistent reproduction of directional audio cues in consumer applications.
Convention Paper 9865
P12-2 Robust 3D Sound Capturing with Planar Microphone Arrays Using Directional Audio Coding—Oliver Thiergart, International Audio Laboratories Erlangen - Erlangen, Germany; Guendalina Milano, International Audio Laboratories Erlangen - Erlangen, Germany; Tobias Ascherl, International Audio Laboratories Erlangen - Erlangen, Germany; Emanuël A. P. Habets, International Audio Laboratories Erlangen - Erlangen, Germany
Real-world VR applications require capturing 3D sound with microphone setups that are hidden from the field of view of the 360-degree camera. Directional audio coding (DirAC) is a spatial sound capturing approach that can be applied to a wide range of compact microphone arrays. Unfortunately, its underlying parametric sound field model is often violated, which degrades the spatial sound quality. Therefore, we combine the non-linear DirAC processing with a linear beamforming approach that approximates the panning gains in DirAC, so that less non-linear processing is required and robustness against model violations increases. Additionally, we derive a DOA estimator that enables 3D sound capturing with DirAC using compact 2D microphone arrays, which are often preferred in VR applications.
Convention Paper 9866
P12-3 Frequency Bands Distribution for Virtual Source Widening in Binaural Synthesis—Hengwei Su, Tokyo University of the Arts - Tokyo, Japan; Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan; Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan
The aim of this study is to investigate perceived source width in binaural synthesis. To synthesize sounds with extended source widths, monophonic signals were divided by a 1/3-octave filter bank, and the components were then distributed to different directions within the intended width by convolution with head-related transfer functions. A subjective listening experiment using pairwise comparison was conducted to evaluate differences in perceived width between stimuli with different synthesis widths and distribution methods. The results showed that this processing method can achieve a wider sound source width in binaural synthesis. However, its effectiveness may vary with the spectral characteristics of the source signals. Thus, further revision of the method is needed to improve its stability and performance.
Convention Paper 9867
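The band-distribution idea described in the P12-3 abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the brick-wall FFT band split, the evenly spaced azimuth assignment, and the function names (`split_bands`, `toy_hrir`, `widen`) are all assumptions made for illustration. In particular, `toy_hrir` is a hypothetical stand-in that models only an interaural delay (Woodworth's spherical-head formula) and a crude level difference; a real system would convolve with measured HRTFs.

```python
import numpy as np

HRIR_LEN = 64  # length of the toy impulse responses (assumption)

def split_bands(x, fs, f_lo=100.0):
    """Split x into brick-wall bands whose edges are a third of an octave apart.

    The masks partition every FFT bin, so the bands sum back to x.
    """
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    n = int(np.ceil(3 * np.log2((fs / 2) / f_lo)))
    inner = f_lo * 2.0 ** (np.arange(1, n) / 3.0)  # interior band edges
    edges = np.concatenate(([0.0], inner, [np.inf]))
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(np.fft.irfft(X * mask, n=len(x)))
    return np.array(bands)

def toy_hrir(az_deg, fs):
    """Hypothetical HRIR pair (left, right) for an azimuth in degrees.

    Positive azimuth means the source is to the right. Only delay and
    level differ between the ears; this is NOT a measured HRTF.
    """
    az = np.deg2rad(abs(az_deg))
    itd = 0.0875 / 343.0 * (az + np.sin(az))  # Woodworth spherical-head ITD
    delay = int(round(itd * fs))
    near = np.zeros(HRIR_LEN)
    far = np.zeros(HRIR_LEN)
    near[0] = 1.0
    far[delay] = 1.0 - 0.3 * np.sin(az)  # crude level difference (assumption)
    return (far, near) if az_deg >= 0 else (near, far)

def widen(x, fs, width_deg):
    """Spread the bands of a mono signal evenly across width_deg of azimuth."""
    bands = split_bands(x, fs)
    azimuths = np.linspace(-width_deg / 2, width_deg / 2, len(bands))
    out = np.zeros((2, len(x) + HRIR_LEN - 1))
    for band, az in zip(bands, azimuths):
        h_left, h_right = toy_hrir(az, fs)
        out[0] += np.convolve(band, h_left)
        out[1] += np.convolve(band, h_right)
    return out

if __name__ == "__main__":
    fs = 48000
    mono = np.random.default_rng(0).standard_normal(4096)
    stereo = widen(mono, fs, width_deg=60.0)
    print(stereo.shape)
```

With a width of 0 degrees every band is rendered from straight ahead, so both ear signals reduce to the original mono signal; larger widths place low and high bands on opposite sides. The abstract compares several distribution methods, and the simple low-to-high sweep used here is only one possible assignment.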
P12-4 Improving Elevation Perception in Single-Layer Loudspeaker Array Display Using Equalizing Filters and Lateral Grouping—Julian Villegas, University of Aizu - Aizu Wakamatsu, Fukushima, Japan; Naoki Fukasawa, University of Aizu - Aizu Wakamatsu, Japan; Yurina Suzuki, University of Aizu - Aizu Wakamatsu, Japan
A system to improve the perception of elevated sources is presented. This method relies on “equalizing filters,” a technique that aims to compensate for unintended changes in the magnitude spectrum produced by the placement of loudspeakers with respect to the desired location. In the proposed method, when sources are on the horizon, a maximum of two loudspeakers are used for reproduction. Otherwise, the horizon spatialization is mixed with one that uses side loudspeakers grouped by lateral direction. Results from a subjective experiment suggest that the proposed method is capable of producing elevated images, but the perceived elevation range is somewhat compressed.
Convention Paper 9868
P12-5 Development and Application of a Stereophonic Multichannel Recording Technique for 3D Audio and VR—Helmut Wittek, SCHOEPS GmbH - Karlsruhe, Germany; Günther Theile, VDT - Geretsried, Germany
A newly developed microphone arrangement is presented that aims at an optimal pickup of ambient sound for 3D Audio. The ORTF-3D is a discrete 8-channel setup that can be routed to the channels of a 3D Stereo format such as Dolby Atmos or Auro3D. It is also ideally suited for immersive sound formats such as wavefield synthesis or VR/Binaural, as it creates a complex 3D ambience that can be mixed or binauralized. The ORTF-3D setup was developed on the basis of stereophonic rules. It creates an optimal directional image in all directions as well as a high spatial sound quality due to highly uncorrelated signals in the diffuse sound. Reports from sound engineers affirm that it creates a highly immersive sound over a large listening area while remaining compact and practical to use.
Convention Paper 9869
P12-6 Apparent Sound Source De-Elevation Using Digital Filters Based on Human Sound Localization—Adrian Celestinos, Samsung Research America - Valencia, CA, USA; Elisabeth McMullin, Samsung Research America - Valencia, CA, USA; Ritesh Banka, Samsung Research America - Valencia, CA, USA; William Decanio, Samsung Research America - Valencia, CA, USA; Allan Devantier, Samsung Research America - Valencia, CA, USA
This study presents the possibility of creating an apparent sound source that is elevated or de-elevated from its physical location. For situations where loudspeakers must be placed away from the ideal position for accurate sound reproduction, digital filters are created and inserted in the audio reproduction chain to either elevate or de-elevate the perceived sound from its physical location. The filters are based on head-related transfer functions (HRTFs) measured on human subjects. They represent the average head, ear, and torso transfer functions of humans, isolating the effect of elevation or de-elevation only. Preliminary tests in a movie theater setup indicate that an apparent de-elevation of about –20 degrees from the physical location can be achieved.
Convention Paper 9870