AES Amsterdam 2008
Spatial Audio Perception and Processing - Part 2
Paper Session P10
Sunday, May 18, 13:00 — 17:30
Chair: Renato Pellegrini, sonic emotion ag - Oberglatt (Zurich), Switzerland
P10-1 Analysis and Adjustment of Planar Microphone Arrays for Application in Directional Audio Coding—Markus Kallinger, Fabian Kuech, Richard Schultz-Amling, Giovanni Del Galdo, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jukka Ahonen, Ville Pulkki, Helsinki University of Technology - Espoo, Finland
Directional Audio Coding (DirAC) is a well-established and efficient way to capture and reproduce a spatial sound event. In a recording room, DirAC requires four spatially coincident microphones to estimate the desired parameters, i.e., direction-of-arrival and diffuseness of sound: one omnidirectional and three figure-of-eight microphones pointing along the axes of a three-dimensional Cartesian coordinate system. In most consumer applications only two-dimensional scenes need to be reproduced, implying that only two figure-of-eight microphones are required. Furthermore, instead of directional microphones, arrays of omnidirectional microphones are considered for economic reasons. Therefore, we investigate various two-dimensional microphone configurations with respect to their usability for DirAC. We derive theoretical limits for the correct estimation of both direction-of-arrival and diffuseness for the most suitable planar arrays. Furthermore, we suggest a way to compensate for the systematic bias in the direction-of-arrival estimation introduced by the discrete planar arrays.
Convention Paper 7374 (Purchase now)
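The DirAC analysis the abstract refers to can be illustrated with a small numerical sketch. The function below estimates the two 2-D parameters, direction-of-arrival and diffuseness, from band-limited B-format signals via the active intensity vector; the exact scaling constants and sign conventions vary between publications, so this is an illustrative simplification, not the authors' implementation.

```python
import numpy as np

def dirac_parameters(w, x, y):
    """Estimate 2-D DirAC parameters from B-format STFT bins of one band.

    w    : omnidirectional (pressure) signal, complex samples
    x, y : figure-of-eight signals along the Cartesian axes
    Convention assumed here: for a plane wave from azimuth theta,
    x = cos(theta) * w and y = sin(theta) * w.
    """
    # Time-averaged active intensity components (up to constants).
    ix = np.mean(np.real(w * np.conj(x)))
    iy = np.mean(np.real(w * np.conj(y)))
    # Under the convention above, the intensity estimate points
    # toward the source.
    doa = np.arctan2(iy, ix)
    # Energy density: pressure plus velocity terms (up to constants).
    energy = 0.5 * np.mean(np.abs(w)**2 + np.abs(x)**2 + np.abs(y)**2)
    # Diffuseness: 0 for a single plane wave, -> 1 for a diffuse field.
    psi = 1.0 - np.hypot(ix, iy) / max(energy, 1e-12)
    return doa, psi
```

For a single plane wave the estimator returns the true azimuth and zero diffuseness, which is the ideal case against which the paper analyzes the bias of discrete planar arrays.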
P10-2 Planar Microphone Array Processing for the Analysis and Reproduction of Spatial Audio Using Directional Audio Coding—Richard Schultz-Amling, Fabian Kuech, Markus Kallinger, Giovanni Del Galdo, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jukka Ahonen, Ville Pulkki, Helsinki University of Technology - Espoo, Finland
Recording and reproduction of spatial audio become more and more important as multichannel audio applications gain increasing attention. Directional Audio Coding (DirAC) represents a well proven approach for the analysis and reproduction of spatial sound. In the analysis part, the direction-of-arrival and the diffuseness of the sound field are estimated in subbands using B-format signals, which can be created with 3-D omnidirectional microphone arrays. However, 3-D microphone configurations are not practical in consumer applications, e.g., due to physical design constraints. In this paper we propose a new approach that allows for an approximation of the required B-format signals but is based on a planar microphone configuration only. Comparisons with the standard DirAC approach confirm that the proposed method is able to correctly estimate the desired parameters over a wide frequency range and that the spatial resolution matches human perception.
Convention Paper 7375 (Purchase now)
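A common way to approximate a figure-of-eight (B-format velocity) signal from omnidirectional capsules, in the spirit of the planar configurations discussed above, is a first-order pressure difference with frequency equalization. The sketch below assumes two capsules spaced d meters apart along the look axis; it is a generic differential-array approximation, not necessarily the specific processing proposed in the paper.

```python
import numpy as np

def dipole_from_omni_pair(p1, p2, d, fs, c=343.0):
    """Approximate a figure-of-eight signal from two omni capsules.

    p1, p2 : sampled pressure signals of the two capsules
    d      : capsule spacing in meters (look axis points toward p1)
    Valid well below the spatial aliasing region, roughly f << c / d.
    """
    n = len(p1)
    f = np.fft.rfftfreq(n, 1.0 / fs)
    diff = np.fft.rfft(np.asarray(p1) - np.asarray(p2))
    # The finite difference acts as a jw high-pass; equalize it,
    # flooring the frequency axis to avoid a blowup at DC.
    omega = 2 * np.pi * np.maximum(f, f[1])
    dipole = diff * c / (1j * omega * d)
    return np.fft.irfft(dipole, n)
```

For an axial plane wave the equalized difference recovers the pressure signal with near-unit gain; the low-frequency noise amplification and high-frequency bias of this approximation are exactly the kinds of limits the paper's analysis addresses.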
P10-3 User-Dependent Optimization of Wave Field Synthesis Reproduction for Directive Sound Fields—Frank Melchior, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany, and Delft University of Technology, Delft, The Netherlands; Christoph Sladeczek, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany, and Bauhaus Universität Weimar, Germany; Diemer de Vries, Delft University of Technology - Delft, The Netherlands; Bernd Fröhlich, Bauhaus Universität Weimar - Weimar, Germany
The use of wave field synthesis (WFS) enables the correct localization of sources over a large listening area. This works well for simulated point sources outside the listening area. The perception of focused sources is only correct for a subspace of the listening area, which depends on the set of loudspeakers selected for the reproduction of the focused source. If the position of the listener is known, both the selection of loudspeakers and the signal processing can be optimized. With continuous tracking of the listener, this adaptation can be done in real time. The same data can be used to simulate a specific directivity of a source and to optimize a corresponding room simulation for the tracked listener. We present a wave field synthesis system for the simulation of directive focused sources, including room simulation, which is continuously optimized for the position of a tracked listener. Our observations confirm that this approach significantly improves the localization and sound quality of focused sources located inside the listening area.
Convention Paper 7376 (Purchase now)
P10-4 Spatial Audio Object Coding (SAOC)—The Upcoming MPEG Standard on Parametric Object Based Audio Coding—Jonas Engdegård, Barbara Resch, Dolby Sweden - Stockholm, Sweden; Cornelia Falch, Oliver Hellmuth, Johannes Hilpert, Andreas Hoelzer, Leonid Terentiev, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jeroen Breebaart, Philips Research Laboratories - Eindhoven, The Netherlands; Jeroen Koppens, Erik Schuijers, Werner Oomen, Philips Applied Technologies - Eindhoven, The Netherlands
Following the recent trend of employing parametric enhancement tools for increasing coding or spatial rendering efficiency, Spatial Audio Object Coding (SAOC) is one of the recent additions to the standardization activities in the MPEG audio group. SAOC is a technique for efficient coding and flexible, user-controllable rendering of multiple audio objects based on transmission of a mono or stereo downmix of the object signals. The SAOC system extends the MPEG Surround standard by re-using its spatial rendering capabilities. This paper will describe the chosen reference model architecture, the association between the different operational modes and applications, and the current status of the standardization process.
Convention Paper 7377 (Purchase now)
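The parametric idea behind SAOC, as summarized above, can be caricatured in a few lines: transmit a downmix of the object signals plus compact per-object side information, and let the decoder re-render objects with user-controllable gains. The sketch below uses one relative-power parameter per object per frame; the real standard operates on time/frequency tiles with considerably more machinery, so this is only a toy illustration of the principle.

```python
import numpy as np

def encode_objects(objects):
    """Toy SAOC-style encoder: mono downmix + normalized object powers.

    objects : (K, N) array, K object signals of N samples
    """
    downmix = objects.sum(axis=0)
    powers = (objects**2).mean(axis=1)        # per-object energies
    old = powers / max(powers.sum(), 1e-12)   # relative object levels
    return downmix, old

def render_object(downmix, old, k, user_gain=1.0):
    """Toy decoder: estimate object k from the downmix with a single
    Wiener-like gain derived from the transmitted parameters,
    scaled by a user-controlled rendering gain."""
    return user_gain * old[k] * downmix
```

The side information is tiny compared to coding each object separately, which is the efficiency argument the abstract makes; rendering quality then rests on the parametric model rather than on discrete object channels.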
P10-5 Focusing of Virtual Sound Sources in Higher Order Ambisonics—Jens Ahrens, Sascha Spors, Technische Universität Berlin - Berlin, Germany
Higher order Ambisonics (HOA) is an approach to the physical (re-)synthesis of a given wave field. It is based on the orthogonal expansion of the involved wave fields formulated for interior problems. This implies that HOA is per se only capable of recreating the wave field generated by events outside the listening area. When a virtual source is intended to be reproduced inside the listening area, strong artifacts arise in certain listening positions. These artifacts can be significantly reduced when a wave field with a focus point is reproduced instead of a virtual source. However, the reproduced wave field only coincides with that of the virtual source in one half-space defined by the location and nominal orientation of the focus point. The wave field in the other half-space converges toward the focus point.
Convention Paper 7378 (Purchase now)
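The half-space restriction stated in the abstract is a purely geometric condition: the reproduced field matches the intended virtual source only on one side of the plane through the focus point whose normal is the focus point's nominal orientation. A minimal check of that condition for a listening position might look as follows (an illustration of the stated geometry, not of the HOA synthesis itself):

```python
import numpy as np

def in_valid_half_space(listener, focus, orientation):
    """True if `listener` lies in the half-space where the focused
    reproduction coincides with the intended virtual source, i.e.,
    on the side of the plane through `focus` that its nominal
    `orientation` vector points into."""
    offset = np.asarray(listener, float) - np.asarray(focus, float)
    return float(offset @ np.asarray(orientation, float)) > 0.0
```

Listeners on the other side of this plane are in the region where, per the abstract, the field converges toward the focus point instead of diverging from it.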
P10-6 Listener Envelopment—What Has Been Done and What Future Research Is Needed?—Dan Nyberg, Jan Berg, Luleå University of Technology - Piteå, Sweden
In concert hall acoustics, the perceived spatial impression and/or spaciousness are characterized by two attributes: apparent source width (ASW) and listener envelopment (LEV). For LEV, there is no clear consensus across the results of previous work. This paper discusses the research performed on LEV and how these results confirm or contradict each other. There is consensus on the arrival angle of the later sound energy and its influence on LEV, whereas there is no clear agreement on the delay time and frequency content of the late reflections.
Convention Paper 7379 (Purchase now)
P10-7 Obtaining a Highly Directive Center Channel from Coincident Stereo Microphone Signals—Christof Faller, Illusonic LLC - Lausanne, Switzerland
Time-frequency based postprocessing applied to a coincident stereo recording is proposed to generate an audio signal with a highly directive directional response pointing straight forward. Assuming an ideal coincident stereo microphone, the directional response of this center channel is effectively time and frequency invariant. Further, the look direction can be steered to left and right front directions. The technique is based on the insight that the signal that predicts left from right, modified by limiting the magnitude of the frequency domain prediction gains, has a center-forward directional response. The center channel is generated using both a left-right and a right-left magnitude-limited predictor signal. Applications of the proposed scheme include the use of stereo microphones as "digital steerable shotgun microphones" and center channel generation for music recording.
Convention Paper 7380 (Purchase now)
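The magnitude-limited-predictor idea in the abstract can be sketched directly: predict each channel from the other with a per-bin least-squares gain, clamp the gain magnitude, and combine the two predictor signals. The single-frame version below is a simplification of what the paper describes; a practical implementation would run block-wise with overlap-add and smoothed cross-spectral statistics.

```python
import numpy as np

def center_from_coincident_stereo(L, R, gmax=1.0):
    """Derive a forward-pointing center channel from coincident stereo.

    L, R : left/right time-domain signals (one analysis frame)
    gmax : magnitude limit for the prediction gains
    """
    Lf, Rf = np.fft.rfft(L), np.fft.rfft(R)
    eps = 1e-12
    # Per-bin least-squares prediction gains (single-frame estimates).
    g_lr = (Lf * np.conj(Rf)) / (np.abs(Rf)**2 + eps)  # predicts L from R
    g_rl = (Rf * np.conj(Lf)) / (np.abs(Lf)**2 + eps)  # predicts R from L
    def clamp(g):
        # Limit the prediction-gain magnitude, keeping the phase.
        mag = np.abs(g)
        return np.where(mag > gmax, g * (gmax / np.maximum(mag, eps)), g)
    # Combine both magnitude-limited predictor signals.
    Cf = 0.5 * (clamp(g_lr) * Rf + clamp(g_rl) * Lf)
    return np.fft.irfft(Cf, len(L))
```

A center-panned source (L equals R) passes through unchanged, while a hard-panned source is strongly attenuated, which is the center-forward directional behavior the paper exploits.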
P10-8 Spatial Sound in the Use of Multimodal Interfaces for the Acquisition of Motor Skills—Pablo F. Hoffmann, Aalborg University - Aalborg, Denmark
This paper discusses the potential effectiveness of spatial sound in the use of multimodal interfaces and virtual environment technologies for the acquisition of motor skills. Because skills are generally of a multimodal nature, spatial sound is discussed in terms of the role it may play in facilitating skill acquisition by complementing, or substituting for, other sensory modalities. An overview of related research on audiovisual and audiotactile interaction is given in connection with the potential benefits of spatial sound as a means to improve the perceptual quality of the interfaces as well as to convey information that may prove critical for the transfer of motor skills.
Convention Paper 7381 (Purchase now)
P10-9 Evaluating the Sensation of Envelopment Arising from 5-Channel Surround Sound Recordings—Sunish George, Slawomir Zielinski, Francis Rumsey, University of Surrey - Guildford, Surrey, UK; Søren Beck, Bang & Olufsen a/s - Struer, Denmark
This paper discusses a series of listening tests conducted in the UK and Denmark to evaluate the perceived envelopment of surround audio recordings. The listening tests were designed to overcome some drawbacks (such as range equalization bias) present in the scores of a listening test based on Recommendation ITU-R BS.1534-1 (MUSHRA). In this method, listeners were asked to evaluate the envelopment of 5-channel surround sound recordings using a 100-point continuous scale. In order to calibrate the scale, two anchor recordings were used to define points 15 and 85 on the scale. The anchor recordings were selected by means of a formal listening test and interviews with the listeners. According to the obtained results, the proposed method provides repeatable results.
Convention Paper 7382 (Purchase now)
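The role of the two anchors in such a test is to pin down the scale so that scores are comparable across listeners. The sketch below shows the underlying geometry as a post-hoc linear remapping of raw scores so the anchors land at 15 and 85; note that the paper fixes the anchors during the test itself rather than remapping scores afterwards, so this is only an illustration of the calibration idea.

```python
def calibrate_scores(raw, low_anchor_raw, high_anchor_raw,
                     low_point=15.0, high_point=85.0):
    """Linearly rescale raw scores so the two anchor recordings map
    to the designated points on the 100-point envelopment scale."""
    scale = (high_point - low_point) / (high_anchor_raw - low_anchor_raw)
    return [low_point + (r - low_anchor_raw) * scale for r in raw]
```

Anchoring the interior of the scale (15 and 85 rather than its end points) leaves listeners room to rate items as less or more enveloping than either anchor, which mitigates the range-equalization bias mentioned in the abstract.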
Last Updated: 20080612, tendeloo