AES London 2010
Paper Session P18

P18 - Audio Coding and Compression

Monday, May 24, 14:00 — 17:30 (Room C5)
Chair: Jamie A. S. Angus, University of Salford - Salford, Greater Manchester, UK

P18-1 High-Level Sound Coding with Parametric Blocks
Daniel Möhlmann, Otthein Herzog, Universität Bremen - Bremen, Germany
This paper proposes a new parametric encoding model for sound blocks that is specifically designed for manipulation, block-based comparison, and morphing operations. Unlike other spectral models, only the temporal evolution of the dominant tone and its time-varying spectral envelope are encoded, thus greatly reducing perceptual redundancy. All sounds are synthesized from the same set of model parameters, regardless of their length. Therefore, new instances can be created with greater variability than through simple interpolation. A method for creating the parametric blocks from an audio stream through partitioning is also presented. An example of sound morphing is shown and applications of the model are discussed.
P18-2 Exploiting High-Level Music Structure for Lossless Audio Compression
Florin Ghido, Tampere University of Technology - Tampere, Finland
We present a novel concept of "noncontiguous" audio segmentation by exploiting the high-level music structure. The existing lossless audio compressors working in asymmetrical mode divide the audio into quasi-stationary segments of variable length by recursive splitting (MPEG-4 ALS) or by dynamic programming (asymmetrical OptimFROG) before computing a set of linear prediction coefficients for each segment. Instead, we combine several variable length segments into a group and use a single set of linear prediction coefficients for each group. The optimal algorithm for combining has exponential complexity and we propose a quadratic time approximation algorithm. Integrated into asymmetrical OptimFROG, the proposed algorithm obtains up to 1.20% (on average 0.23%) compression improvements with no increase in decoder complexity.
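As a rough illustration of what "a single set of linear prediction coefficients for each group" can mean, the sketch below pools the autocorrelation of several noncontiguous segments and solves for one shared predictor with the Levinson-Durbin recursion. This is an assumption-laden toy, not the paper's method: the abstract does not describe the grouping algorithm or the estimator used in OptimFROG, and the names `autocorrelation`, `levinson_durbin`, and `shared_lpc` are hypothetical.

```python
def autocorrelation(segment, order):
    """Autocorrelation of one segment for lags 0..order."""
    n = len(segment)
    return [
        sum(segment[i] * segment[i - lag] for i in range(lag, n))
        for lag in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for a predictor
    x[n] ~ sum(a[k] * x[n - k] for k in 1..order); returns a[1..order]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input (e.g. silence): keep remaining taps at zero
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:]


def shared_lpc(segments, order):
    """One coefficient set for a whole group: sum the per-segment
    autocorrelations, then solve once (assumed pooling strategy)."""
    pooled = [0.0] * (order + 1)
    for seg in segments:
        for lag, value in enumerate(autocorrelation(seg, order)):
            pooled[lag] += value
    return levinson_durbin(pooled, order)
```

Because the segments are pooled only through their autocorrelation sums, they do not need to be adjacent in time, which is the property the "noncontiguous" grouping relies on.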
P18-3 Interactive Teleconferencing Combining Spatial Audio Object Coding and DirAC Technology
Jürgen Herre, Cornelia Falch, Dirk Mahne, Giovanni del Galdo, Markus Kallinger, Oliver Thiergart, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The importance of telecommunication continues to grow in our everyday lives. An ambitious goal for developers is to provide the most natural way of audio communication by giving users the impression of being located next to each other. MPEG Spatial Audio Object Coding (SAOC) is a technology for coding, transmitting, and interactively reproducing spatial sound scenes on any conventional multi-loudspeaker setup (e.g., ITU 5.1). This paper describes how Directional Audio Coding (DirAC) can be used as a recording front-end for SAOC-based teleconference systems to capture acoustic scenes and to extract the individual objects (talkers). By introducing a novel DirAC-to-SAOC parameter transcoder, a highly efficient way of combining both technologies is presented that enables interactive, object-based spatial teleconferencing.
P18-4 A New Parametric Stereo- and Multichannel Extension for MPEG-4 Enhanced Low Delay AAC (AAC-ELD)
María Luis Valero, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Andreas Hölzer, DSP Solutions GmbH & Co. - Regensburg, Germany; Markus Schnell, Johannes Hilpert, Manfred Lutzky, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jonas Engdegård, Heiko Purnhagen, Per Ekstrand, Kristofer Kjörling, Dolby Sweden AB - Stockholm, Sweden
ISO/MPEG standardizes two communication codecs with low delay: AAC-LD is a well-established low delay codec for high quality communication applications such as video conferencing, telepresence, and Voice over IP. Its successor AAC-ELD offers enhanced bit rate efficiency, making it an ideal solution for broadcast audio gateway codecs. Many existing and upcoming communication applications benefit from the transmission of stereo or multichannel signals at low bit rates. With low delay MPEG Surround, ISO has recently standardized a low delay parametric extension for AAC-LD and AAC-ELD. It is based on MPEG Surround technology with specific adaptation for low delay operation. This extension brings significantly improved coding efficiency for the transmission of stereo and multichannel signals.
P18-5 Efficient Combination of Acoustic Echo Control and Parametric Spatial Audio Coding
Fabian Kuech, Markus Schmidt, Meray Zourub, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
High-quality teleconferencing systems utilize surround sound to provide a natural communication experience. Directional Audio Coding (DirAC) is an efficient parametric approach to capture and reproduce spatial sound. It uses a monophonic audio signal together with parametric spatial cue information. For reproduction, multiple loudspeaker signals are determined based on the DirAC stream. To allow for hands-free operation, multichannel acoustic echo control (AEC) has to be employed. Standard approaches apply multichannel adaptive filtering to address this problem. However, computational complexity constraints and convergence issues inhibit practical applications. This paper proposes an efficient combination of AEC and DirAC that explicitly exploits DirAC's parametric sound field representation. The approach suppresses the echo components in the microphone signals based solely on the single-channel audio signal used for the DirAC synthesis of the loudspeaker signals.
P18-6 Sampling Rate Discrimination: 44.1 kHz vs. 88.2 kHz
Amandine Pras, Catherine Guastavino, McGill University - Montreal, Quebec, Canada
It is currently common practice for sound engineers to record digital music using high-resolution formats and then down-sample the files to 44.1 kHz for commercial release. This study investigates whether listeners can perceive differences between musical files recorded at 44.1 kHz and 88.2 kHz with the same analog chain and type of AD-converter. Sixteen expert listeners were asked to compare 3 versions (44.1 kHz, 88.2 kHz, and the 88.2 kHz version down-sampled to 44.1 kHz) of 5 musical excerpts in a blind ABX task. Overall, participants were able to discriminate between files recorded at 88.2 kHz and their 44.1 kHz down-sampled version. Furthermore, for the orchestral excerpt, they were able to discriminate between files recorded at 88.2 kHz and files recorded at 44.1 kHz.
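For readers unfamiliar with how ABX results are usually judged, the sketch below computes the one-sided binomial probability of scoring at least a given number of correct answers by pure guessing. It is a generic illustration, assuming independent trials with a 50% chance level; the abstract does not state which statistical test the authors used, and the function name `abx_p_value` and the trial counts in the usage note are hypothetical.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of at least `correct` right answers out of `trials`
    when every answer is a coin flip (chance level 0.5)."""
    if trials < 0 or not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    # Count all outcomes with `correct` or more hits, divided by 2**trials.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials
```

For example, `abx_p_value(8, 10)` returns 0.0546875, so 8 correct answers in 10 made-up trials would fall just short of the conventional 0.05 threshold.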
P18-7 Comparison of Multichannel Audio Decoders for Use in Mobile and Handheld Devices
Manish Nema, Ashish Malot, Nokia India Pvt. Ltd. - Bangalore, Karnataka, India
Multichannel audio provides an immersive experience to listeners. Consumer demand coupled with technological improvements will drive consumption of high-definition content in mobile and handheld devices. Several multichannel audio coding algorithms are available in the market, both proprietary ones such as Dolby Digital, Dolby Digital Plus, Windows Media Audio Professional (WMA Pro), and Digital Theater Surround High Definition (DTS-HD), and standard ones such as Advanced Audio Coding (AAC) and MPEG Surround. This paper presents the salient features and coding techniques of important multichannel audio decoders and compares these decoders on key parameters such as processor complexity, memory requirements, complexity/features for stereo playback, and quality/coding efficiency. The paper also ranks these multichannel audio decoders on the key parameters in a single table for easy comparison.
