AES New York 2007: Paper Session P19

AES 123rd Convention - Where Audio Comes Alive

P19 - Signal Processing for 3-D Audio, Part 1

Monday, October 8, 9:00 am — 12:00 pm
Chair: Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA

P19-1 Spatial Audio Scene Coding in a Universal Two-Channel 3-D Stereo Format—Jean-Marc Jot, Arvindh Krishnaswami, Jean Laroche, Juha Merimaa, Mike Goodwin, Creative Advanced Technology Center - Scotts Valley, CA, USA
We describe a frequency-domain method for phase-amplitude matrix decoding and up-mixing of two-channel stereo recordings, based on spatial analysis of 2-D or 3-D directional and ambient cues in the recording and re-synthesis of these cues for consistent reproduction over any headphone or loudspeaker playback system. The decoder is compatible with existing two-channel phase-amplitude stereo formats; however, unlike existing time-domain decoders, it preserves source separation and allows accurate reproduction of ambiance and reverberation cues. The two-channel spatial encoding/decoding scheme is extended to incorporate 3-D elevation, without relying on HRTF cues. Applications include data-efficient storage or transmission of multichannel soundtracks and computationally-efficient interactive audio spatialization in a backward-compatible stereo encoding format.
Convention Paper 7276

P19-2 Binaural 3-D Audio Rendering Based on Spatial Audio Scene Coding—Michael Goodwin, Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA
In standard virtualization of stereo or multichannel recordings for headphone reproduction, channel-dependent interaural relationships based on head-related transfer functions are imposed on each input channel in the binaural mix. In this paper we describe a new binaural reproduction paradigm based on frequency-domain spatial analysis-synthesis. The input content is analyzed for channel-independent positional information on a time-frequency basis, and the binaural signal is generated by applying appropriate HRTF cues to each time-frequency component, resulting in a high spatial resolution that overcomes a fundamental limitation of channel-centric virtualization methods. The spatial analysis and synthesis algorithms are discussed in detail and a variety of applications are described.
Convention Paper 7277

P19-3 Real-Time Spatial Representation of Moving Sound Sources—Christos Tsakostas, Holistiks Engineering Systems - Athens, Greece; Andreas Floros, Ionian University - Corfu, Greece
The simulation of moving sound sources is fundamental to efficiently representing virtual worlds and acoustic environments, but it is limited by the measurement resolution of Head-Related Transfer Functions, a limitation usually overcome by interpolation techniques. In this paper a novel time-varying binaural convolution/filtering algorithm is presented that, based on a frequency morphing mechanism that takes into account both physical and psychoacoustic criteria, can efficiently simulate a moving sound source. It is shown that the proposed algorithm overcomes the excessive computational load usually incurred by legacy techniques for spatially representing moving sound sources, while achieving high-quality 3-D spatial sound in terms of both objective and subjective criteria.
Convention Paper 7279
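
As a point of reference for the interpolation techniques mentioned in the abstract above, the following minimal Python sketch linearly interpolates between two measured head-related impulse responses (HRIRs). It is a conventional baseline, not the frequency morphing algorithm of the paper. The function name `interpolate_hrir`, its parameters, and the toy three-tap responses are assumptions made for illustration.

```python
def interpolate_hrir(hrir_a, hrir_b, angle_a, angle_b, angle):
    """Linearly interpolate two equal-length HRIRs measured at angle_a and angle_b.

    Hypothetical helper: angles are in degrees and `angle` must lie between
    the two measurement angles.
    """
    if len(hrir_a) != len(hrir_b):
        raise ValueError("HRIRs must have the same length")
    if angle_a == angle_b:
        raise ValueError("measurement angles must differ")
    weight = (angle - angle_a) / (angle_b - angle_a)
    if not 0.0 <= weight <= 1.0:
        raise ValueError("angle must lie between the measurement angles")
    # Weighted sum of the two impulse responses, tap by tap.
    return [(1.0 - weight) * a + weight * b for a, b in zip(hrir_a, hrir_b)]


# Toy data (not measured): responses at 30 and 40 degrees, source at 35 degrees.
hrir_30 = [1.0, 0.5, 0.0]
hrir_40 = [0.0, 0.5, 1.0]
hrir_35 = interpolate_hrir(hrir_30, hrir_40, 30.0, 40.0, 35.0)
```

For a moving source, an interpolation of this kind would be repeated for each audio block as the source angle changes. Plain time-domain interpolation can smear interaural delays when the two responses are not time-aligned, which is one reason more elaborate schemes exist.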

P19-4 The Use of Cephalometric Features for Headmodels in Spatial Audio Processing—Sunil Bharitkar, Audyssey Labs - Los Angeles, CA, USA, and University of Southern California, Los Angeles, CA, USA; Pall Gislason, Audyssey Labs - Los Angeles, CA, USA
In two-channel or stereo applications, such as televisions, automotive infotainment, and hi-fi systems, the loudspeakers are typically placed close to each other. The sound field generated by such a setup creates an image that is perceived as monophonic and lacks sufficient spatial “presence.” To address this limitation, a stereo expansion technique may be used to widen the soundstage, giving listeners the perception that sound originates from a wider angle (e.g., ±30 degrees relative to the median plane) using head-related transfer functions (HRTFs). In this paper we propose extensions to the headmodel (viz., the ipsilateral and contralateral headshadow functions) based on analysis of the diffraction of sound around cephalometric features of the head, such as the nose, whose dimensions are of an order that causes variations in the headshadow responses in the high-frequency region. Modeling these variations is important for accurate rendering of a spatialized sound field for 3-D audio applications. Specifically, this paper presents refinements to the existing spherical head models for spatial audio applications.
Convention Paper 7280
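
For context, the kind of spherical-head baseline that such refinements build on can be illustrated with Woodworth's classic far-field formula for the interaural time difference, ITD = (a/c)(θ + sin θ), where a is the head radius, c the speed of sound, and θ the azimuth from the median plane. The sketch below is an illustration under stated assumptions, not the headshadow model proposed in the paper: the head radius of 0.0875 m and the speed of sound of 343 m/s are common textbook defaults, and the function name `woodworth_itd` is hypothetical.

```python
import math


def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Interaural time difference (seconds) for a rigid spherical head.

    Assumes a far-field source in the horizontal plane with azimuth measured
    from the median plane, limited to the range -90..+90 degrees.
    """
    theta = math.radians(abs(azimuth_deg))
    if theta > math.pi / 2:
        raise ValueError("azimuth must be within -90..+90 degrees")
    itd = head_radius_m / speed_of_sound * (theta + math.sin(theta))
    # Keep the sign of the azimuth so left and right sources are distinguishable.
    return math.copysign(itd, azimuth_deg)


itd_30 = woodworth_itd(30.0)   # source at the +30 degree position mentioned above
itd_90 = woodworth_itd(90.0)   # fully lateral source, roughly 0.66 ms
```

A smooth sphere of this kind has no nose or other facial features, so it cannot reproduce the high-frequency headshadow variations that the abstract attributes to them.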

P19-5 MDCT Domain Analysis and Synthesis of Reverberation for Parametric Stereo Audio—K. Suresh, T. V. Sreenivas, Indian Institute of Science - Bangalore, India
We propose a parametric stereo coding analysis and synthesis directly in the MDCT domain using analysis-by-synthesis parameter estimation. The stereo signal is represented by an equalized sum signal and spatialization parameters. The equalized sum signal and the spatialization parameters are obtained by sub-band analysis in the MDCT domain. The de-correlated signal required for the stereo synthesis is also generated in the MDCT domain. A subjective evaluation using MUSHRA shows that the synthesized stereo signal is perceptually satisfactory and comparable to state-of-the-art parametric coders.
Convention Paper 7281
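
The general idea of representing a stereo band by a sum signal plus a spatial parameter can be sketched as follows. This is a generic illustration, not the analysis-by-synthesis estimator or the equalization described in the paper: it uses a plain average as the downmix, a single inter-channel level difference per band, and no de-correlated signal. The function names, the `eps` guard, and the toy coefficients are assumptions.

```python
import math


def analyze_band(left, right, eps=1e-12):
    """Return (downmix, level difference in dB) for one sub-band.

    left, right: real transform coefficients (for example MDCT bins) of the band.
    eps: hypothetical guard against division by zero and log of zero.
    """
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    downmix = [0.5 * (a + b) for a, b in zip(left, right)]
    level_diff_db = 10.0 * math.log10((energy_left + eps) / (energy_right + eps))
    return downmix, level_diff_db


def synthesize_band(downmix, level_diff_db):
    """Rebuild left/right band coefficients from the downmix and level difference."""
    ratio = 10.0 ** (level_diff_db / 10.0)  # left-to-right energy ratio
    # Gains chosen so that gain_left^2 / gain_right^2 == ratio
    # and gain_left^2 + gain_right^2 == 2.
    gain_left = math.sqrt(2.0 * ratio / (1.0 + ratio))
    gain_right = math.sqrt(2.0 / (1.0 + ratio))
    return [gain_left * s for s in downmix], [gain_right * s for s in downmix]


# Toy band (not real audio): the right channel is the left channel at half amplitude.
band_left = [1.0, -2.0, 0.5]
band_right = [0.5 * x for x in band_left]
mono, diff_db = analyze_band(band_left, band_right)
out_left, out_right = synthesize_band(mono, diff_db)
```

In this sketch the synthesized channels are scaled copies of one signal, so they are fully correlated. A de-correlated component, as mentioned in the abstract, is what allows a parametric coder to restore some of the original stereo width.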

P19-6 Correlation-Based Ambience Extraction from Stereo Recordings—Juha Merimaa, Michael M. Goodwin, Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA
One of the key components in current multichannel upmixing techniques is identification and extraction of ambience from original stereo recordings. This paper describes correlation-based ambience extraction within a time-frequency analysis-synthesis framework. Two new estimators for the time- and frequency-dependent amount of ambience in the input channels are analytically derived. These estimators are discussed in relationship to two other algorithms from the literature and evaluated with simulations. It is also shown that the time constant used in a recursive correlation computation is an important factor in determining the performance of the algorithms. Short-time correlation estimates are typically biased such that the amount of ambience is underestimated.
Convention Paper 7282
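
The role of the time constant in a recursive correlation computation can be illustrated with a small Python sketch. It tracks the inter-channel correlation coefficient of a single time-frequency bin with first-order recursive averaging. It does not implement the estimators derived in the paper; the function names, the `forgetting` parameter, and the synthetic Gaussian test signal are assumptions.

```python
import math
import random


def recursive_correlation(left, right, forgetting):
    """Magnitude of the inter-channel correlation coefficient, frame by frame.

    left, right: complex STFT values of one frequency bin over time.
    forgetting: hypothetical smoothing factor in [0, 1); values closer to 1
    correspond to a longer time constant.
    """
    r_ll = r_rr = 0.0
    r_lr = 0j
    result = []
    for x_l, x_r in zip(left, right):
        # First-order recursive averages of the auto- and cross-correlations.
        r_ll = forgetting * r_ll + (1.0 - forgetting) * abs(x_l) ** 2
        r_rr = forgetting * r_rr + (1.0 - forgetting) * abs(x_r) ** 2
        r_lr = forgetting * r_lr + (1.0 - forgetting) * x_l * x_r.conjugate()
        denom = math.sqrt(r_ll * r_rr)
        result.append(abs(r_lr) / denom if denom > 0.0 else 0.0)
    return result


def gaussian_bin(rng, count):
    """Synthetic complex Gaussian samples standing in for one STFT bin."""
    return [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(count)]


rng = random.Random(0)
left = gaussian_bin(rng, 2000)
right = gaussian_bin(rng, 2000)  # independent of left: pure "ambience"

short = recursive_correlation(left, right, forgetting=0.5)
long_ = recursive_correlation(left, right, forgetting=0.99)
mean_short = sum(short[-500:]) / 500
mean_long = sum(long_[-500:]) / 500
```

Although the two synthetic channels are independent, the short averaging window reports a noticeably higher correlation than the long one (`mean_short` exceeds `mean_long`). A method that treats high correlation as a sign of little ambience would therefore underestimate the ambience, which is consistent with the bias described in the abstract.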

Last Updated: 20070906, mei
