AES Amsterdam 2008: Poster Session P9

P9 - Spatial Audio Perception and Processing

Sunday, May 18, 09:30 — 11:00
P9-1 Audio-Visual Processing Tools for Auditory Scene Synthesis—Gavin Kearney, Rozenn Dahyot, Frank Boland, Trinity College Dublin - Dublin, Ireland
We present an integrated set of audio-visual tracking and synthesis tools to aid matching of the audio to the video position in both horizontal and periphonic sound reinforcement systems. Compensation for screen size and loudspeaker layout for high-definition formats is incorporated, and the spatial localization of the source is rendered using advanced spatialization techniques. A subjective comparison of several original and enhanced film sequences using the Vector Base Amplitude Panning (VBAP) method is presented. The results show that encoding non-contradictory audio-visual spatial information for presentation on different loudspeaker layouts significantly improves the naturalness of the listening/viewing experience.
Convention Paper 7365
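
As a rough illustration of the VBAP method named in the abstract, the sketch below computes the gains for a single loudspeaker pair in the horizontal plane. It is a minimal textbook-style example, not the authors' tool chain: the function name `vbap_2d_gains`, the angle convention (degrees, counterclockwise from straight ahead), and the ±30° stereo pair are assumptions made for illustration.

```python
import math

def vbap_2d_gains(source_deg, spk1_deg, spk2_deg):
    """Return power-normalized gains (g1, g2) for one loudspeaker pair.

    Hypothetical helper: angles are azimuths in degrees in the horizontal plane.
    """
    # Unit vectors pointing at the virtual source and at each loudspeaker.
    px, py = math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg))
    l1x, l1y = math.cos(math.radians(spk1_deg)), math.sin(math.radians(spk1_deg))
    l2x, l2y = math.cos(math.radians(spk2_deg)), math.sin(math.radians(spk2_deg))

    # Solve p = g1*l1 + g2*l2 for (g1, g2) with Cramer's rule.
    det = l1x * l2y - l1y * l2x
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    g1 = (px * l2y - py * l2x) / det
    g2 = (l1x * py - l1y * px) / det

    # Scale so that g1^2 + g2^2 = 1 (constant total power).
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

# Assumed layout: a stereo pair at +30 and -30 degrees.
g_center = vbap_2d_gains(0.0, 30.0, -30.0)   # source midway between the pair
g_left = vbap_2d_gains(30.0, 30.0, -30.0)    # source at the first loudspeaker
```

A source midway between the pair receives equal gains, and a source that coincides with one loudspeaker is reproduced by that loudspeaker alone. A negative gain would indicate that the source lies outside the arc spanned by the pair, in which case a different pair would normally be selected.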

P9-2 Encoding Higher Order Ambisonics with AAC—Erik Hellerud, Norwegian University of Science and Technology - Trondheim, Norway; Ian Burnett, University of Wollongong - Wollongong, New South Wales, Australia; Audun Solvang, U. Peter Svensson, Norwegian University of Science and Technology - Trondheim, Norway
In this paper we explore a simple method for reducing the bit rate needed for transmitting and storing Higher Order Ambisonics (HOA). The HOA B-format signals are simply encoded using Advanced Audio Coding (AAC) as if they were individual mono signals. Wave field simulations show that, when more bits are allocated to the lower-order signals than to the higher-order ones, the resulting error is very low in the sweet spot but increases as a function of distance from the center. Encoding the higher-order signals with a low bit rate does not lead to reduced audio quality. The spatial information is improved when higher-order channels are included, even if these are encoded with a low bit rate.
Convention Paper 7366

P9-3 Virtualized Listening Tests for Loudspeakers—Timo Hiekkanen, Helsinki University of Technology - Espoo, Finland; Aki Mäkivirta, Genelec Oy - Iisalmi, Finland; Matti Karjalainen, Helsinki University of Technology - Espoo, Finland
The precise location of a loudspeaker in a listening room is known to affect loudspeaker preference ratings. When multiple loudspeakers are compared, the evaluation is limited by poor human auditory memory. To overcome these problems, a method to evaluate and compare loudspeakers using headphones is proposed. The method utilizes personal head-related transfer functions in rendering the sound field recorded in a standard listening room with an artificial head. Equalization of the circumaural headphones and the artificial head is investigated. Formal listening tests are conducted to examine differences between the proposed binaural method and real loudspeakers in a standard listening room. The tests show that the virtualized loudspeakers can be nearly indistinguishable from the real ones in many, but not all, cases.
Convention Paper 7367

P9-4 Binaural Rendering in MDCT Domain for Multi-Object Audio Coding—Shinya Iizuka, Kei Kikuiri, Nobuhiko Naka, NTT DoCoMo, Inc. - Yokosuka, Kanagawa, Japan
We propose a binaural rendering method in the Modified Discrete Cosine Transform (MDCT) domain. It has good compatibility with audio codecs because a number of audio codecs utilize an MDCT filter bank for the time-frequency transform. The proposal maps MDCT coefficients to the real part of the Modulated Complex Lapped Transform (MCLT) coefficients and processes the amplitudes and phases according to the binaural information. The inverse MCLT is applied to the coefficients with a synthesis window function, which is derived from the perfect reconstruction condition for the phase-shifted signal under the assumption of a linear phase property. The proposed method is applicable to Binaural Cue Coding Type I and offers subjective quality equivalent to that of the original binaural signal.
Convention Paper 7368

P9-5 Room-Dependent Preference of Virtual Surround Sound—Frederick Scott, Agnieszka Roginska, New York University - New York, NY, USA
A common method for simulating surround sound over headphones, so-called virtual surround sound, is the convolution of content information with binaural cues. Often, room information is included. This paper examines whether using HRTFs with room impulse responses customized to the room the listener is in enhances the listening experience. Perceptual experiments were conducted to evaluate whether listeners prefer a room-accurate rendering over a rendering of a room that is dissimilar to the one in which the listener is seated. A preference test was conducted using music as the test material.
Convention Paper 7369
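
The convolution step described in the abstract can be sketched as follows. This is only an illustration of the general idea: the helper names and the very short impulse responses are invented placeholders, not measured HRTFs or room responses, and a practical renderer would use FFT-based convolution on much longer filters.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out

def render_binaural(channel, hrir_left, hrir_right):
    """Filter one loudspeaker channel with a left/right impulse-response pair."""
    return convolve(channel, hrir_left), convolve(channel, hrir_right)

# Placeholder data (assumptions): a three-sample channel and toy two-tap responses.
channel = [1.0, 2.0, 3.0]
hrir_left = [1.0, 0.5]    # hypothetical near-ear response
hrir_right = [0.0, 0.6]   # hypothetical far-ear response: delayed and attenuated
ear_left, ear_right = render_binaural(channel, hrir_left, hrir_right)
```

For a multichannel program, each loudspeaker channel would be filtered with the response pair for its direction, and the results summed per ear. Room information can be included by using responses measured in, or simulated for, a particular room.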

P9-6 Quantization of 2-D Higher Order Ambisonics Wave Fields—Audun Solvang, U. Peter Svensson, Erik Hellerud, Norwegian University of Science and Technology - Trondheim, Norway
The spatial distribution of the quantization noise for a 2-D Higher Order Ambisonics (HOA) signal is investigated analytically. Uniformly distributed loudspeakers radiating plane waves in a nonreverberant environment and frequency-domain quantization are assumed. It is found that employing the same quantization interval for all orders leads to quantization noise that is uniformly distributed in space. Assigning a larger quantization interval (i.e., fewer bits) to higher orders leads to radially increasing quantization noise. Matching the quantization error to the reproduction error at the near-perfect reconstruction boundary suggests that as few as four bits per sample can be used for quantization. Furthermore, high-pass filtering the HOA components allows as few as three bits per sample to be used. This quantization strategy seems very promising for reducing the bit rate of HOA.
Convention Paper 7370
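
To make the idea of order-dependent quantization intervals concrete, the sketch below encodes one plane-wave sample into 2-D HOA components and quantizes each order with its own word length. It is a simplified illustration under stated assumptions: the function names, the unnormalized cosine/sine encoding, the time-domain (rather than frequency-domain) quantizer, and the bit allocation `[8, 6, 4, 3]` are all chosen for the example and are not taken from the paper.

```python
import math

def encode_2d_hoa(sample, azimuth_rad, order):
    """Encode a plane-wave sample into 2*order + 1 circular-harmonic components.

    Assumed convention: [W, cos(phi), sin(phi), cos(2*phi), sin(2*phi), ...]
    with no order-dependent normalization.
    """
    comps = [sample]
    for m in range(1, order + 1):
        comps.append(sample * math.cos(m * azimuth_rad))
        comps.append(sample * math.sin(m * azimuth_rad))
    return comps

def quantize(value, bits, full_scale=1.0):
    """Uniform quantizer; the step doubles for every bit removed (no clipping)."""
    step = 2.0 * full_scale / (2 ** bits)
    return step * round(value / step)

def quantize_by_order(comps, bits_per_order):
    """Quantize order 0 and each cos/sin pair with that order's word length."""
    out = [quantize(comps[0], bits_per_order[0])]
    for m in range(1, len(bits_per_order)):
        out.append(quantize(comps[2 * m - 1], bits_per_order[m]))
        out.append(quantize(comps[2 * m], bits_per_order[m]))
    return out

# Third-order example with fewer bits assigned to higher orders (assumed values).
components = encode_2d_hoa(0.3, math.radians(40.0), 3)
quantized = quantize_by_order(components, [8, 6, 4, 3])
```

Each component's error is bounded by half the step of its order, so the coarsely quantized high-order components carry the largest error.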

P9-7 A Binaural Auditory Model for the Evaluation of Reproduced Stereophonic Sound—Marko Takanen, Helsinki University of Technology - Espoo, Finland; Gaëtan Lorho, Nokia Corporation - Helsinki, Finland
Binaural cues describing the differences in phase and power between signals at the two ears enable our auditory system to localize sound sources and spatially segregate multiple auditory events. Recent publications on binaural auditory models have shown how the interaural coherence can be utilized to estimate these cues and therefore model the localization ability of our auditory system. This approach is exploited in this paper to estimate the binaural cues in different frequency bands and identify the spatial location of sound sources from recorded broadband signals. We illustrate the application of a binaural auditory model to evaluate sound reproduced by a stereophonic loudspeaker setup in terms of source localization and specific loudness.
Convention Paper 7371
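
The binaural cues mentioned in the abstract can be illustrated with a broadband toy example. The sketch below estimates the interaural level difference from the power ratio, the interaural time difference from the lag of the cross-correlation peak, and the interaural coherence from the normalized peak value. It is not the auditory model of the paper, which works per frequency band; the function name, the synthetic noise burst, and the sign convention (positive lag means the right-ear signal is delayed) are assumptions.

```python
import math
import random

def binaural_cues(left, right, max_lag):
    """Return (ILD in dB, ITD in samples, interaural coherence)."""
    e_l = sum(v * v for v in left)
    e_r = sum(v * v for v in right)
    ild_db = 10.0 * math.log10(e_l / e_r)

    def xcorr(lag):
        # Cross-correlation of left[n] with right[n + lag] over valid indices.
        return sum(left[n] * right[n + lag]
                   for n in range(len(left)) if 0 <= n + lag < len(right))

    itd = max(range(-max_lag, max_lag + 1), key=xcorr)
    coherence = xcorr(itd) / math.sqrt(e_l * e_r)
    return ild_db, itd, coherence

# Synthetic test signal (assumption): a zero-padded noise burst at the left ear,
# and the same burst delayed by 3 samples and halved in amplitude at the right ear.
rng = random.Random(1)
burst = [rng.uniform(-1.0, 1.0) for _ in range(64)]
left = [0.0] * 8 + burst + [0.0] * 8
right = [0.0] * 3 + [0.5 * v for v in left[:-3]]
ild_db, itd_samples, coherence = binaural_cues(left, right, max_lag=6)
```

Because the right-ear signal here is an exact scaled and delayed copy, the coherence is 1; reverberation or competing sources would lower it, which is what makes coherence useful for deciding when the other two cues are reliable.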

P9-8 An Augmented Reality Audio Mixer and Equalizer—Ville Riikonen, Miikka Tikander, Matti Karjalainen, Helsinki University of Technology - Espoo, Finland
In Augmented Reality Audio (ARA) applications the real sound environment of the user is extended with virtual objects. The real environment is reproduced as a pseudo-acoustic world via a special ARA headset that consists of binaural microphones and headphones. However, the headset causes coloration to the pseudo-acoustic representation. In order to make the headset acoustically transparent, equalization is needed. Digital equalization easily causes unacceptable delays. This paper presents a novel ARA mixer with real-time analog equalization to correct the coloration caused by the leakage through the headset and changed resonances in the closed ear canal.
Convention Paper 7372

P9-9 Sub-Band Adaptive Crosstalk Cancellation: A Novel Approach for Immersive Audio—Stefania Cecchi, Lorenzo Palestini, Paolo Peretti, Francesco Piazza, Universita Politecnica delle Marche - Ancona, Italy; Ferruccio Bettarelli, Leaff Engineering - Porto Potenza Picena (MC), Italy
In the field of immersive audio, a crosstalk canceller is required when a virtual sound is rendered over two loudspeakers. In the last decade several adaptive algorithms have been proposed; at present the least mean square (LMS) algorithm seems to be the best compromise between simplicity and robustness, although its convergence is slower for colored inputs. In this paper a new approach to crosstalk cancellation based on a sub-band adaptive algorithm is derived. The effectiveness of this algorithm for colored input is presented in terms of matrix inversion quality and convergence rate, in comparison with the conventional LMS algorithm.
Convention Paper 7373
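
For readers unfamiliar with the conventional LMS algorithm that the abstract uses as its baseline, a minimal full-band sketch follows. It identifies a short FIR path from white input; it is not the sub-band crosstalk canceller proposed in the paper. The function name, the two-tap path `h`, the step size, and the seeded white-noise input are assumptions chosen so the example converges quickly.

```python
import random

def lms_identify(x, d, num_taps, mu):
    """Adapt FIR weights w so that the filtered input x tracks the desired signal d."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps            # most recent input samples, newest first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))          # filter output
        e = dn - y                                          # error signal
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]    # LMS weight update
    return w

# Assumed setup: white input and an unknown two-tap path to be identified.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
h = [0.5, -0.3]
d = [h[0] * x[n] + (h[1] * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
w = lms_identify(x, d, num_taps=2, mu=0.05)
```

With white input the weights settle close to `h` within a few hundred samples. For colored input the eigenvalue spread of the input correlation matrix grows and convergence slows, which is the weakness that sub-band schemes are intended to address.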

Last Updated: 20080612, tendeloo
