AES New York 2018
Paper Session P09
P09 - Spatial Audio-Part 1
Thursday, October 18, 1:30 pm — 3:30 pm (1E11)
Chair: Jean-Marc Jot, Magic Leap - San Francisco, CA, USA
P09-1 Impression of Spatially Distributed Reverberation in Multichannel Audio Reproduction—Sarvesh Agrawal, Rensselaer Polytechnic Institute - Troy, NY, USA; Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA
Auditory immersion and spatial impression in multichannel audio reproduction can be altered by changing the number of loudspeakers and independent reverberation channels. The spatial impression can change drastically as one moves away from the sweet spot. Since multichannel audio reproduction is not limited to one position, it is critical to investigate listener envelopment (LEV) and immersion at off-axis positions. This work discusses the impression of spatially distributed decorrelated reverberation at on- and off-axis positions. A laboratory environment is used to reproduce a diffuse sound field in the horizontal plane through 128 independent audio channels and loudspeakers. Results from psychoacoustic experiments show that there are perceptible differences even at higher channel counts; however, spatial impression does not change significantly beyond 16 channels of decorrelated reverberation and equally spaced loudspeakers, at both on- and off-axis positions.
Convention Paper 10076
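The abstract above hinges on feeding loudspeakers mutually decorrelated reverberation channels. One simple way to model such channels (an illustrative sketch, not the paper's method) is to generate independent exponentially decaying noise tails, which are decorrelated by construction:

```python
import numpy as np

def decorrelated_reverb_tails(n_channels, fs=48000, rt60=1.5, length_s=2.0, seed=0):
    """Generate n_channels mutually decorrelated, exponentially decaying
    noise tails -- a common simple model of late diffuse reverberation.
    Illustrative sketch only; not the reproduction method used in the paper."""
    rng = np.random.default_rng(seed)
    n = int(length_s * fs)
    t = np.arange(n) / fs
    envelope = 10.0 ** (-3.0 * t / rt60)          # reaches -60 dB at t = rt60
    noise = rng.standard_normal((n_channels, n))  # independent rows -> decorrelated
    return noise * envelope

tails = decorrelated_reverb_tails(16)
# Normalized cross-correlation between distinct channels stays near zero
a, b = tails[0], tails[1]
r = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

Each row would drive one loudspeaker; the study's finding is that the perceptual benefit of adding more such independent channels saturates around 16.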
P09-2 From Spatial Recording to Immersive Reproduction—Design & Implementation of a 3DOF Audio-Visual VR System—Maximillian Kentgens, RWTH Aachen University - Aachen, Germany; Stefan Kühl, RWTH Aachen University - Aachen, Germany; Christiane Antweiler, RWTH Aachen University - Aachen, Germany; Peter Jax, RWTH Aachen University - Aachen, Germany
The complex mutual interaction between human visual perception and hearing demands combined examinations of 360° video and spatial audio systems for Virtual Reality (VR) applications. Therefore, we present a joint audio-visual end-to-end chain from spatial recording to immersive reproduction with full rotational three degrees of freedom (3DOF). The audio subsystem is based on Higher Order Ambisonics (HOA) obtained from Spherical Microphone Array (SMA) recordings, while the video is captured with a 360° camera rig. A spherical multi-loudspeaker setup for audio, in conjunction with a VR head-mounted video display, is used to reproduce a scene as close as possible to the original with regard to the perceptual modalities of the user. A database of immersive content as a basis for future research in spatial signal processing was set up by recording several rehearsals and concerts of the Aachen Symphony Orchestra. The data was used for a qualitative assessment of the suitability of the proposed end-to-end system. A discussion shows the potential and limitations of the approach. Therein, we highlight the importance of coherent audio and video to achieve a high degree of immersion with VR recordings.
Convention Paper 10077
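Rotational 3DOF playback of an Ambisonics scene amounts to applying a rotation matrix to the spherical-harmonic channels. For first order (W, X, Y, Z) a pure yaw rotation is especially simple; the sketch below is a hedged first-order illustration (the paper uses higher orders, and sign conventions vary between Ambisonics toolkits):

```python
import numpy as np

def rotate_foa_yaw(wxyz, alpha):
    """Rotate a first-order Ambisonics signal (rows: W, X, Y, Z) about the
    vertical axis by alpha radians. To compensate a listener head yaw of
    theta, rotate the scene by -theta. Assumes azimuth 0 = front,
    counter-clockwise positive; conventions differ between toolkits."""
    w, x, y, z = wxyz
    c, s = np.cos(alpha), np.sin(alpha)
    return np.vstack([w,
                      c * x - s * y,   # X' (front-back component)
                      s * x + c * y,   # Y' (left-right component)
                      z])              # Z is unchanged by yaw

# A plane wave from azimuth 0 (front): W=1, X=1, Y=0, Z=0
sig = np.array([[1.0], [1.0], [0.0], [0.0]])
rotated = rotate_foa_yaw(sig, np.pi / 2)  # source moves to azimuth +90°
```

Higher orders follow the same pattern with larger block-diagonal rotation matrices per spherical-harmonic order.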
P09-3 Required Bit Rate of MPEG-4 AAC for 22.2 Multichannel Sound Contribution and Distribution—Shu Kitajima, NHK Science & Technology Research Laboratories - Tokyo, Japan; Takehiro Sugimoto, NHK - Setagaya-ku, Tokyo, Japan; Satoshi Oode, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Tomoyasu Komori, NHK Science and Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Waseda University - Shinjuku-ku, Tokyo, Japan; Joji Urano, Japan Television Network Corporation - Tokyo, Japan
22.2 multichannel sound (22.2 ch sound) is currently broadcast using MPEG-4 Advanced Audio Coding (AAC) in 8K Super Hi-Vision broadcasting in Japan. The use of MPEG-4 AAC for contribution and distribution transmissions is also planned. Contribution and distribution transmissions require sufficient audio quality to withstand repeated coding and decoding processes. In this study the bit rate of MPEG-4 AAC for a 22.2 ch sound signal satisfying tandem transmission quality was investigated by subjective evaluation as specified in Recommendation ITU-R BS.1116-3. The basic audio quality of 72 stimuli, made from combinations of 6 bit rates, 3 tandem counts, and 4 program items, was evaluated by 28 listeners. The required bit rates for 22.2 ch sound material transmission over 3, 5, and 7 tandems were concluded to be 96, 144, and 160 kbit/s per channel, respectively.
Convention Paper 10078
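Since 22.2 ch sound comprises 24 audio channels (22 full-band plus 2 LFE), the per-channel figures above translate directly into total stream rates. Simple arithmetic on the paper's stated results:

```python
CHANNELS = 24  # 22.2 ch sound = 22 full-band + 2 LFE channels

# Required per-channel bit rates from the study: tandems -> kbit/s per channel
required_per_channel = {3: 96, 5: 144, 7: 160}

# Total bit rate for the full 22.2 ch stream at each tandem count
totals = {n: rate * CHANNELS for n, rate in required_per_channel.items()}
# e.g. surviving 7 tandem stages needs 160 * 24 = 3840 kbit/s in total
```

This makes the practical trade-off explicit: each additional pair of tandem stages in the contribution chain raises the total required rate by roughly 0.4-1.2 Mbit/s.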
P09-4 Effect of Binaural Difference in Loudspeaker Directivity on Spatial Audio Processing—Daekyoung Noh, Xperi/DTS - Santa Ana, CA, USA; Oveal Walker, Xperi/DTS - Calabasas, CA, USA
Head Related Transfer Functions (HRTFs) are typically measured with loudspeakers facing the listener; it is therefore assumed that loudspeaker directivity toward the left and right ears is equal. In practice, however, the directivity toward the two ears may differ, for instance because of changes in the listener's location or uncommon loudspeaker driver orientations. This paper discusses the effect of binaural differences in loudspeaker directivity on spatial audio processing and proposes an efficient solution that improves the spatial effect by compensating for the directivity difference. A subjective evaluation is conducted to measure the performance of the proposed solution.
Convention Paper 10079
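The abstract does not detail the proposed compensation, but a generic way to correct an interaural directivity mismatch is a magnitude-ratio equalizer that makes the speaker's measured response toward one ear match its response toward the other. A hedged sketch under that assumption (function name and approach are illustrative, not the paper's method):

```python
import numpy as np

def compensation_gains(h_left, h_right, eps=1e-6):
    """Per-frequency-bin gains that equalize the loudspeaker path toward the
    right ear to match the path toward the left ear. h_left / h_right are
    measured impulse responses. Generic illustration only -- not the
    specific solution proposed in the paper."""
    H_l = np.abs(np.fft.rfft(h_left))
    H_r = np.abs(np.fft.rfft(h_right))
    return H_l / np.maximum(H_r, eps)  # apply to the right-ear path

# Toy check: if the right-ear path is simply 6 dB quieter (scaled by 0.5),
# the compensation gain is ~2 at every frequency.
h = np.zeros(64)
h[0] = 1.0                       # ideal (flat) left-ear response
g = compensation_gains(h, 0.5 * h)
```

In a real system the gains would be smoothed and converted back to a short minimum-phase filter before being inserted ahead of the binaural (HRTF-based) processing.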