Last Updated: 20050817, mei
P12 - Audio Coding - 1
Sunday, October 9, 9:30 am — 12:00 pm
Chair: Alan Seefeldt, Dolby Laboratories - San Francisco, CA, USA
P12-1 Upfront Time Segmentation Methods for Transform Coding of Audio—Omar Niamut, Richard Heusdens, Huib Lincklaen Arriëns, Delft University of Technology - Delft, The Netherlands
We study a transform coder that employs a dynamic programming-based rate-distortion optimization framework for time segmentation. Although this coder achieves high performance, its computational complexity makes it infeasible for many practical applications. We investigate whether up-front time segmentation can reduce computational complexity without a significant loss in performance. Up-front time segmentation can be accomplished by replacing the rate-distortion cost functional with low-complexity cost measures that are independent of bit rate and perceptual distortion. Quantitative and qualitative evaluations show that dynamic programming-based up-front time segmentation that minimizes perceptual entropy can be a viable alternative to rate-distortion optimal time segmentation.
Convention Paper 6585
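The dynamic-programming segmentation described in the P12-1 abstract can be illustrated with a minimal sketch. It is not the authors' coder: the function names, the set of allowed segment lengths, and the toy cost (a fixed per-segment overhead plus the absolute deviation of block energies from the segment mean) are assumptions made for illustration. In an up-front scheme, the toy cost would be replaced by a low-complexity measure such as perceptual entropy.

```python
from typing import Callable, List, Sequence, Tuple


def best_segmentation(
    n_blocks: int,
    allowed_lengths: Sequence[int],
    segment_cost: Callable[[int, int], float],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total_cost, [(start, length), ...]) with minimal summed cost."""
    inf = float("inf")
    best = [inf] * (n_blocks + 1)  # best[t]: minimal cost of covering blocks [0, t)
    back = [0] * (n_blocks + 1)    # back[t]: length of the last segment in that optimum
    best[0] = 0.0
    for end in range(1, n_blocks + 1):
        for length in allowed_lengths:
            start = end - length
            if start < 0 or best[start] == inf:
                continue
            candidate = best[start] + segment_cost(start, length)
            if candidate < best[end]:
                best[end] = candidate
                back[end] = length
    if best[n_blocks] == inf:
        raise ValueError("allowed lengths cannot cover the signal")
    # Walk the back-pointers from the end to recover the segment boundaries.
    segments: List[Tuple[int, int]] = []
    end = n_blocks
    while end > 0:
        length = back[end]
        segments.append((end - length, length))
        end -= length
    segments.reverse()
    return best[n_blocks], segments


def toy_cost(energies: Sequence[float], overhead: float = 1.0) -> Callable[[int, int], float]:
    """Hypothetical stand-in cost: segment overhead plus deviation from the segment mean."""
    def cost(start: int, length: int) -> float:
        seg = energies[start:start + length]
        mean = sum(seg) / length
        return overhead + sum(abs(e - mean) for e in seg)
    return cost
```

With block energies `[1, 1, 1, 1, 9, 9, 9, 9]` and allowed lengths `[1, 2, 4, 8]`, the search places a boundary at the energy jump and returns two segments of four blocks each. Because the cost of a segment does not depend on bit rate or on the other segments, the search runs once per signal instead of once per rate-distortion operating point.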
P12-2 Enhanced Accuracy of the Tonality Measure and Control Parameter Extraction Modules in MPEG-4 HE-AAC—Sang-Uk Ryu, Kenneth Rose, University of California, Santa Barbara - Santa Barbara, CA, USA
This paper investigates possible enhancements of the High Efficiency Advanced Audio Coding (HE-AAC) encoder, with a focus on the spectral band replication modules. The HE-AAC encoder generates side information, including control parameters, that characterizes the energy distribution across time and frequency as well as tonal and noise components, to ensure perceptually coherent regeneration of the high band at the decoder. The accuracy of the encoder's tonality measure and control parameter extraction modules is analyzed, leading to the proposal of an alternative approach employing sinusoidal analysis, which offers enhanced estimation of tonal and noise energy levels, as well as an improved control parameter extraction procedure. A comparative performance evaluation of the standard and modified encoders on a set of audio signals demonstrates the perceptual impact of estimation inaccuracy on the quality of the regenerated high band and identifies the types of audio for which it causes meaningful degradation.
Convention Paper 6586
P12-3 New Techniques in Spatial Audio Coding—Alan Seefeldt, Mark Vinton, Charles Robinson, Dolby Laboratories - San Francisco, CA, USA
The goal of spatial audio coding is to compress multichannel audio material by combining channels into a composite signal and transmitting supporting side-information so that a decoder can reconstruct an approximation of the original signal from the composite. Many techniques have been discussed in the literature, most of which manipulate the magnitude and phase of the composite channels across time and frequency to create a perceptual approximation of the original multichannel sound field. Building on this framework, we discuss new techniques for computing and applying the side-information, new de-correlation techniques, and a new way of using a traditional spatial coding system to synthesize a multichannel signal blindly from an existing stereo signal. We also compare the performance of this system to that of other existing systems.
Convention Paper 6587
P12-4 A New Broadcast Quality Low Bit Rate Audio Coding Scheme Utilizing Novel Bandwidth Extension Tools—Deepen Sinha, ATC Labs - Chatham, NJ, USA; Anibal Ferreira, University of Porto - Porto, Portugal, and ATC Labs - Chatham, NJ, USA
In this paper we describe the components of a novel audio coding algorithm capable of delivering high-fidelity, CD-like stereo audio at bit rates of 40 to 48 kbps and natural-sounding, FM-grade mono at bit rates of 18 to 22 kbps. Bandwidth Extension has emerged as an important tool for the satisfactory performance of low bit rate audio codecs. Previously, we proposed one of a newer class of Bandwidth Extension techniques that are applied directly to the high-resolution frequency representation of the signal (e.g., MDCT). This technique is based on a Fractal Self-Similarity Model (FSSM) for the signal spectrum. The FSSM bandwidth extension forms a key component of the proposed codec. Other important components of the proposed scheme include a novel parametric stereo coding technique and a wideband psychoacoustic model that makes explicit use of the Comodulation Release of Masking (CMR) phenomenon. This audio coding scheme is geared toward broadcast applications, where codec latency and encoder complexity are generally not overriding concerns. Algorithmic details, audio demonstrations, and a comparison to other audio coding schemes will be presented.
Convention Paper 6588
P12-5 The MPEG-4 Audio Lossless Coding (ALS) Standard—Technology and Applications—Tilman Liebchen, Technical University of Berlin - Berlin, Germany; Takehiro Moriya, Noboru Harada, Yutaka Kamamoto, NTT Communication Science Labs - Atsugi, Japan; Yuriy Reznik, Real Networks, Inc. - Seattle, WA, USA
MPEG-4 Audio Lossless Coding (ALS) is a new extension of the MPEG-4 audio coding family. The ALS core codec is based on forward-adaptive linear prediction, which offers remarkable compression together with low complexity. Additional features include long-term prediction, multichannel coding, and compression of floating-point audio material. In this paper, authors who have actively contributed to the standard describe the basic elements of the ALS codec, with a focus on prediction, entropy coding, and related tools. We also present the latest developments in the standardization process and point out the most important applications of this new lossless audio format.
Convention Paper 6589
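The forward-adaptive linear prediction mentioned in the P12-5 abstract can be illustrated with a minimal sketch. It is not the ALS codec: the function names, the fixed-point precision `SHIFT`, and the direct quantization of predictor coefficients are assumptions made for illustration, and the entropy coding of the residual is omitted. The encoder estimates the coefficients from the block it is about to code and would transmit them as side information. Both sides then compute the prediction with identical integer arithmetic, which is what makes the reconstruction exact.

```python
from typing import List, Tuple

SHIFT = 12  # assumed fixed-point precision of the quantized coefficients


def lpc_coefficients(x: List[int], order: int) -> List[int]:
    """Estimate predictor coefficients (Levinson-Durbin) and quantize them."""
    n = len(x)
    r = [float(sum(x[i] * x[i - lag] for i in range(lag, n))) for lag in range(order + 1)]
    a = [0.0] * (order + 1)  # a[k] weights x[n-k] in the prediction of x[n]
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted block: keep remaining taps at zero
        k = (r[m] - sum(a[j] * r[m - j] for j in range(1, m))) / err
        new_a = a[:]
        new_a[m] = k
        for j in range(1, m):
            new_a[j] = a[j] - k * a[m - j]
        a = new_a
        err *= 1.0 - k * k
    return [int(round(c * (1 << SHIFT))) for c in a[1:]]


def predict(history: List[int], coeffs: List[int]) -> int:
    """Integer prediction from past samples; history[-1] is the newest sample."""
    acc = 0
    for k, c in enumerate(coeffs, start=1):
        if k <= len(history):
            acc += c * history[-k]
    return acc >> SHIFT  # identical rounding in encoder and decoder


def encode_block(x: List[int], order: int) -> Tuple[List[int], List[int]]:
    """Return (quantized coefficients, integer prediction residual)."""
    coeffs = lpc_coefficients(x, order)
    residual = [x[n] - predict(x[:n], coeffs) for n in range(len(x))]
    return coeffs, residual


def decode_block(coeffs: List[int], residual: List[int]) -> List[int]:
    """Rebuild the samples by adding each residual to the same prediction."""
    out: List[int] = []
    for e in residual:
        out.append(e + predict(out, coeffs))
    return out
```

Calling `decode_block(*encode_block(samples, order=2))` returns the original integer samples for any input. Compression comes from the residual typically having much lower energy than the signal, so an entropy coder can represent it with fewer bits.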