AES Los Angeles 2014
Paper Session P1
P1 - Spatial Audio: Part 1
Thursday, October 9, 9:00 am — 12:30 pm (Room 308 AB)
Chair: Jason Corey, University of Michigan - Ann Arbor, MI, USA
P1-1 MPEG-H Audio—The New Standard for Universal Spatial / 3D Audio Coding—Jürgen Herre, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany; Johannes Hilpert, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Achim Kuntz, International Audio Laboratories Erlangen - Erlangen, Germany; Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Recently, a new generation of spatial audio formats was introduced that includes elevated loudspeakers and surpasses traditional surround sound formats, such as 5.1, in terms of spatial realism. To facilitate high-quality, bitrate-efficient distribution and flexible reproduction of 3D sound, the MPEG standardization group has started the MPEG-H Audio Coding development for the universal carriage of encoded 3D sound from channel-based, object-based, and HOA-based input. High-quality reproduction is supported for many output formats, from 22.2 and beyond down to 5.1, stereo, and binaural reproduction, independently of the original encoding format, thus overcoming incompatibility between the various 3D formats. The paper describes the current status of the standardization project and provides an overview of the system architecture, its capabilities, and performance.
Convention Paper 9095
P1-2 Bit Rate of 22.2 Multichannel Sound Signal Meeting Broadcast Quality—Takehiro Sugimoto, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Tokyo Institute of Technology - Midori-ku, Yokohama, Japan; Yasushige Nakayama, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Satoshi Oode, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan
The bit rate of a 22.2 multichannel sound (22.2 ch) signal meeting broadcast quality was investigated by performing several subjective evaluations. 22.2 ch is currently planned to be transmitted using MPEG-4 AAC (advanced audio coding) in 8K Super Hi-Vision broadcasting. A subjective evaluation of the basic audio quality of a coded 22.2 ch signal was carried out using 49 stimuli made from combinations of seven bit rates and seven content items. Moreover, subjective evaluations at two different listening positions and of a downmixed 5.1 ch signal were also carried out for comparison with the 22.2 ch signal at the sweet spot. A bit rate meeting broadcast quality was determined from the obtained results.
Convention Paper 9096
P1-3 Design, Coding and Processing of Metadata for Object-Based Interactive Audio—Simone Füg, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Andreas Hölzer, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Christian Borß, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Christian Ertel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Michael Kratschmer, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
For object-based audio, an appropriate definition of metadata is needed to ensure flexible playback in any reproduction scenario and to allow for interactivity. Important use cases for object-based audio and audio interactivity are described, and metadata requirements are derived. A metadata scheme is defined that allows for enhanced audio rendering techniques such as content-dependent processing, automatic scene scaling, and enhanced level control. Also, a metadata preprocessing logic is proposed that prepares rendering and playout and allows for user interaction with the audio content of an object-based scene. In addition, the paper points out how the metadata can be transported efficiently in a bitstream. The proposed metadata scheme has been adopted and integrated into the currently finalized MPEG-H 3D Audio standard.
Convention Paper 9097
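The ideas in the abstract above (content-dependent processing, authored interactivity bounds, user-controlled level changes) can be illustrated with a minimal sketch. The field names and the clamping rule below are illustrative assumptions, not the MPEG-H metadata syntax:

```python
from dataclasses import dataclass

# Hypothetical, simplified object metadata record. It shows the kinds of
# fields such a scheme needs: static object properties, a rendering hint
# (content kind), and the bounds within which user interaction is allowed.
@dataclass
class AudioObjectMetadata:
    object_id: int
    content_kind: str              # e.g. "dialogue", "effects" (content-dependent processing)
    azimuth_deg: float
    elevation_deg: float
    gain_db: float = 0.0
    user_gain_range_db: tuple = (-6.0, 6.0)  # authored interactivity bounds
    allow_position_change: bool = False

def apply_user_gain(meta: AudioObjectMetadata, requested_gain_db: float) -> float:
    """Preprocessing step: clamp a user-requested gain offset to the
    authored bounds before handing the object to the renderer."""
    lo, hi = meta.user_gain_range_db
    return meta.gain_db + min(max(requested_gain_db, lo), hi)

dialogue = AudioObjectMetadata(1, "dialogue", azimuth_deg=0.0, elevation_deg=0.0)
print(apply_user_gain(dialogue, +9.0))   # request exceeds bounds, clamped to +6 dB
```

The key design point mirrored here is that interactivity limits travel with the content, so any playout device can enforce the producer's intent.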
P1-4 On Spatial-Aliasing-Free Sound Field Reproduction Using Finite Length Line Source Arrays—Frank Schultz, University of Rostock / Institute of Communications Engineering - Rostock, Germany; Till Rettberg, University of Rostock - Rostock, Germany; Sascha Spors, University of Rostock - Rostock, Germany
Concert sound reinforcement systems aim at the reproduction of homogeneous sound fields over extended audiences for the whole audio bandwidth. For the last two decades this has mostly been approached using so-called line source arrays, for which Wavefront Sculpture Technology (WST) was introduced in the literature. The paper utilizes a signal processing model developed for sound field synthesis in order to analyze and expand WST criteria for straight arrays. Starting with the driving function for an infinite and continuous linear array, spatial truncation and discretization are subsequently taken into account. The role of the involved loudspeakers as spatial lowpass filters is stressed, which can reduce undesired spatial aliasing contributions. The paper aims to give better insight into how to interpret the synthesized sound fields.
Convention Paper 9098
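The abstract's central point, that discretizing a continuous line source introduces spatial aliasing above a frequency set by the driver spacing, can be illustrated with a rule-of-thumb calculation. The generic spatial-Nyquist bound below is an assumption for illustration, not the paper's refined WST criteria:

```python
def spatial_aliasing_frequency(driver_spacing_m: float, c: float = 343.0) -> float:
    """Rule-of-thumb spatial Nyquist frequency for a discretized linear
    array: above roughly f = c / (2 * dx), grating lobes (spatial
    aliasing) can appear in the synthesized field."""
    return c / (2.0 * driver_spacing_m)

for dx in (0.10, 0.20, 0.40):
    f = spatial_aliasing_frequency(dx)
    print(f"driver spacing {dx:.2f} m -> aliasing possible above ~{f:.0f} Hz")
```

This makes the engineering trade-off concrete: halving the driver spacing doubles the alias-free bandwidth, which is why the loudspeakers' own directivity (acting as a spatial lowpass) matters so much at practical spacings.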
P1-5 The Focal Shift Phenomena for Focused Source Reproduction Using Loudspeaker Arrays—Robert Oldfield, University of Salford - Salford, Greater Manchester, UK; Ian Drumm, University of Salford - Salford, Greater Manchester, UK
The focal shift phenomenon in optics describes how the position of the focus point in a focusing system is not simply defined by geometrical ray-based models but is affected by diffraction and is consequently a function of the size of the lens and the frequency of the light. The same effect is also observed in acoustics when examining the focused field produced by physical focusing reflectors. This paper describes the focal shift phenomenon as applied to the reproduction of focused sources with sound field synthesis systems, and presents a formula for predicting the actual rendered focal point position as well as a frequency-dependent positional correction for the improved rendering of a focused source with a given loudspeaker setup.
Convention Paper 9099
P1-6 Impulse Response Upmixing Using Particle Systems—Nuno Fonseca, ESTG/CIIC, Polytechnic Institute of Leiria - Leiria, Portugal
With the increase in the computational power of DSPs and CPUs, impulse responses (IRs) and the convolution process are becoming a very popular approach to recreating audio effects such as reverb. But although many IR repositories exist, most IR recordings are only mono or stereo. This paper presents an approach for impulse response upmixing using particle systems. Using a reverse engineering process, a particle system is created that is capable of reproducing the original impulse response. By re-rendering the obtained particle system with virtual microphones, an upmixed result can be obtained. Depending on the type of virtual microphone, several different output formats can be supported, ranging from stereo to surround, and including binaural reproduction, Ambisonics, or even custom speaker scenarios (VBAP).
Convention Paper 9100
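The re-rendering step described above can be sketched as follows. This is a toy model under stated assumptions: each "particle" is reduced to an arrival time, amplitude, and azimuth, and the virtual microphone is a first-order pattern; the paper's actual particle representation and fitting process are not reproduced here:

```python
import numpy as np

def render_ir(particles, fs, length_s, mic_azimuth_rad, pattern=0.5):
    """Render a mono IR for one virtual microphone.

    particles: list of (arrival_time_s, amplitude, azimuth_rad) tuples.
    pattern:   first-order directivity mix, 0 = omni, 0.5 = cardioid,
               1 = figure-of-eight.
    """
    ir = np.zeros(int(fs * length_s))
    for t, amp, azimuth in particles:
        # first-order directivity gain: (1 - p) + p * cos(angle off mic axis)
        gain = (1.0 - pattern) + pattern * np.cos(azimuth - mic_azimuth_rad)
        ir[int(round(t * fs))] += amp * gain
    return ir

# toy particle set: direct sound from the front, two lateral reflections
particles = [(0.000, 1.0, 0.0), (0.012, 0.5, np.pi / 2), (0.020, 0.4, -np.pi / 2)]
fs = 48000
# stereo upmix: two cardioid virtual mics angled +/-45 degrees
left  = render_ir(particles, fs, 0.05, mic_azimuth_rad=+np.pi / 4)
right = render_ir(particles, fs, 0.05, mic_azimuth_rad=-np.pi / 4)
```

Swapping the virtual microphones changes the output format: a pair of spaced or coincident patterns yields stereo, a set of patterns pointed at speaker positions yields surround, and HRTF filtering per particle direction would yield a binaural render.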
P1-7 ECMA-407: New Approaches to 3D Audio Content Data Rate Reduction with RVC-CAL—Junaid Jameel Ahmad, Swiss Federal Institute of Technology (EPFL) - Lausanne, Switzerland; Claudio Alberti, Swiss Federal Institute of Technology (EPFL) - Lausanne, Switzerland; Jung Wook (Jonathan) Hong, McGill University - Montreal, QC, Canada; GKL Audio Inc. - Montreal, QC, Canada; Brett Leonard, University of Nebraska at Omaha - Omaha, NE; McGill University - Montreal, Quebec, Canada; Marco Mattavelli, Swiss Federal Institute of Technology (EPFL) - Lausanne, Switzerland; Clemens Par, Swiss Audec - Morges, Switzerland; Schuyler Quackenbush, Audio Research Labs - Scotch Plains, USA; Wieslaw Woszczyk, McGill University - Montreal, QC, Canada
Inverse problems have only been known in spatial audio for a very short time; their only solution, called "inverse coding" in the literature, is essentially based on time-level modeling. Unlike parametric coding, however, inverse problems require only an initial transmission of spatial side information and can thus achieve much lower bitrates than parametric coding. For instance, inversely coded NHK 22.2 multichannel signals in combination with USAC may be delivered at bitrates as low as 48 kb/s, and optimum performance can be achieved in combination with commercially available HE-AAC v2 at 256 kb/s, without any scaling of the output channel order and with moderate complexity in the decoder. A new way to perceptually eliminate redundant information makes use of invariant theory inside the encoder. Invariants of Gaussian processes were unknown until 2010 and represented a major problem in pure mathematics for more than a century: David Hilbert's proof that these coefficient functions form a field suggested that their existence in random processes was very likely. As will be shown, when applied to spatial audio coding, invariants represent a numerically efficient and perceptually powerful algebraic tool. We likewise present a 3D audio codec design for signals up to NHK 22.2 with two profiles: one profile, based on coincidence, is able to code and synthesize a full Higher Order Ambisonics sound field, up to order 6, at 48 kb/s, 64 kb/s, 96 kb/s, 128 kb/s, and above. The second profile, which optimizes decorrelation for phantom source imaging, codes channel-based or object-based signals at the same bitrates. The technology has been specified as the world's first international 3D audio standard, ECMA-407, and may be further extended with static models in the frequency domain.
A preliminary version of this technology, based on a frequency-domain downmix, was submitted to MPEG's "Phase 2" selection of low-bitrate 3D coding technologies and made use of a USAC binary, which unfortunately offered no tuning options.
Convention Paper 9218