Last Updated: 20060907, saj
P3 - Audio Coding
Thursday, October 5, 1:30 pm — 5:30 pm
Chair: Brett Crockett, Dolby Laboratories - San Francisco, CA, USA
P3-1 An Enhanced Encoder for the MPEG-4 ALS Lossless Coding Standard—Takehiro Moriya, Noboru Harada, Yutaka Kamamoto, NTT Communication Science Labs. - Atsugi, Kanagawa, Japan
MPEG-4 Audio Lossless Coding (ALS) is a lossless coding standard for audio signals based on time-domain prediction. This paper describes enhanced encoder algorithms and implementation examples for MPEG-4 ALS. To reduce the computational complexity of the encoder, simplified algorithms have been developed for the multichannel prediction coding and the long-term prediction tools. In addition, processing speed has been enhanced by means of software optimization. As a result of these improvements, encoding is as much as six times faster than with the MPEG reference software. This makes the standard more useful for various practical applications.
Convention Paper 6869 (Purchase now)
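As a rough illustration of the time-domain prediction idea that P3-1 builds on, the sketch below predicts each integer sample from the two before it and keeps only the prediction error. The fixed second-order predictor and the function names are assumptions chosen for brevity; this is not the ALS algorithm, which uses adaptive prediction and entropy-codes the residual, and it does not reproduce the paper's encoder enhancements.

```python
def encode_residual(samples):
    """Return the integer prediction error for each sample.

    Hypothetical fixed predictor: x_hat[n] = 2*x[n-1] - x[n-2],
    with samples before the start of the signal taken as 0.
    """
    residual = []
    for n, x in enumerate(samples):
        prev1 = samples[n - 1] if n >= 1 else 0
        prev2 = samples[n - 2] if n >= 2 else 0
        residual.append(x - (2 * prev1 - prev2))
    return residual


def decode_residual(residual):
    """Rebuild the samples by running the same predictor on decoded output."""
    samples = []
    for n, e in enumerate(residual):
        prev1 = samples[n - 1] if n >= 1 else 0
        prev2 = samples[n - 2] if n >= 2 else 0
        samples.append(e + (2 * prev1 - prev2))
    return samples


pcm = [0, 3, 6, 9, 11, 12, 12, 11]
res = encode_residual(pcm)
print(res)                          # [0, 3, 0, 0, -1, -1, -1, -1]
print(decode_residual(res) == pcm)  # True
```

Because encoder and decoder apply the identical integer predictor, the reconstruction is bit-exact, and the residual values are smaller than the samples on smooth signals, which is what makes them cheaper to code.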
P3-2 Perceptually Biased Linear Prediction—Arijit Biswas, Technische Universiteit Eindhoven - Eindhoven, The Netherlands; Albertus C. den Brinker, Philips Research Laboratories - Eindhoven, The Netherlands
A perceptually biased linear prediction scheme is proposed for audio coding, which uses only simple modifications of the coefficients defining the normal equations for a least-squares error. Thereby, the spectral masking effects are mimicked in the prediction synthesis filter without using an explicit psychoacoustic model. The main advantage is the reduced computational complexity. The proposed approach was implemented in a Laguerre-based linear prediction scheme, and its performance has been evaluated in comparison with a linear prediction approach controlled by the ISO MPEG-1 Layer I-II model as well as with one of the latest spectral integration-based psychoacoustic models. Listening tests clearly demonstrate the viability of the proposed method.
This paper has been selected as the winner of the first Audio Engineering Society Student Paper Award.
Convention Paper 6870 (Purchase now)
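For readers unfamiliar with the normal equations mentioned in P3-2, the sketch below sets up and solves them for conventional (unbiased) least-squares linear prediction using the autocorrelation method and the Levinson-Durbin recursion. The function names and the test signal are assumptions for illustration; the paper's perceptual modification of the coefficients and its Laguerre-based structure are not reproduced here.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n - k], for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve sum_j a[j] * r[|i - j|] = r[i] for i = 1..order.

    Returns (a, err), where a[j - 1] weights x[n - j] in the prediction
    and err is the remaining prediction error energy.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


signal = [1.0, 0.9, 0.7, 0.4, 0.0, -0.4, -0.7, -0.9, -1.0, -0.9]
order = 2
r = autocorrelation(signal, order)
coeffs, err = levinson_durbin(r, order)
print(coeffs, err)
```

A scheme of the kind described in the abstract would alter the values entering these equations before solving them; how that is done is specific to the paper and is left out of this sketch.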
P3-3 Fast Complex Quadrature Mirror Filterbanks for MPEG-4 HE-AAC—Han-Wen Hsu, Chi-Min Liu, Wen-Chieh Lee, National Chiao Tung University - Hsinchu, Taiwan
Spectral Band Replication (SBR) has been introduced in MPEG-4 HE-AAC as a bandwidth extension tool. The entire SBR framework operates in the complex-valued domain to avoid aliasing, which results in considerable time complexity. This paper focuses on the complex Quadrature Mirror Filter (QMF) banks used in HE-AAC encoders and decoders, and proposes two fast decomposition methods, based on the DCT-IV and the DFT respectively, for the time-consuming matrix operations in the filterbanks. The time complexity can thus be reduced effectively by using the fast algorithms available for the DCT and the FFT.
Convention Paper 6871 (Purchase now)
P3-4 Compression Artifacts in Perceptual Audio Coding—Chi-Min Liu, Han-Wen Hsu, Chung-Han Yang, Kan-Chun Lee, Shou-Hung Tang, Yung-Cheng Yang, Wen-Chieh Lee, National Chiao Tung University - Hsinchu, Taiwan
Perceptual audio coding is widely understood to encode audio signals transparently to human auditory perception under typical conditions. Several types of artifacts have previously been defined for linear quantization and for MP3 music tracks. However, newer technologies such as AAC, SBR, and parametric coding give rise to various new types of artifacts. This paper models these newly identified audible artifacts and analyzes the problematic encoder modules that lead to them. These artifacts should be a major criterion when developing objective or subjective tests. We also consider how the artifacts can be relieved through concealment schemes in decoders or through module design in encoders.
Convention Paper 6872 (Purchase now)
P3-5 Design of HE-AAC Version 2 Encoder—Chung-Han Yang, Han-Wen Hsu, Kan-Chun Lee, Shou-Hung Tang, Yung-Cheng Yang, Chia-Ming Chang, Chi-Min Liu, Wen-Chieh Lee, National Chiao Tung University - Hsinchu, Taiwan
HE-AAC version 2 consists of three encoders: MPEG-4 AAC low complexity, spectral band replication (SBR), and parametric stereo coding (PS). Our previous work has considered several modules in these three encoders, including T/F grid search, tone/noise component adjustment, coupling coding, time region decision, and the down-mix method. This paper considers the associated solutions for HE-AAC version 2 and proposes an integrated design. Objective and subjective tests will be used to check the quality improvement.
Convention Paper 6873 (Purchase now)
P3-6 Analysis and Synthesis for Universal Spatial Audio Coding—Michael Goodwin, Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA
Spatial audio coding (SAC) addresses the need for efficient representation of multichannel audio content. SAC methods are typically based on analyzing interchannel relationships in the input audio and resynthesizing those same relationships between the output channels. Recently, a method was proposed and demonstrated based on analyzing the input audio scene and describing it without reference to the channel configuration, thereby enabling flexible, accurate rendering on arbitrary output systems. In this paper we provide further mathematical treatment of this universal spatial audio coding system; we develop an analysis-synthesis method based on a linear algebraic model; present an efficient approach for adapting the synthesis to arbitrary loudspeaker configurations; and describe a straightforward scheme for scalable reduction of the spatial cue data.
Convention Paper 6874 (Purchase now)
P3-7 A Novel Very Low Bit Rate Multichannel Audio Coding Scheme Using Accurate Temporal Envelope Coding and Signal Synthesis Tools—Chandresh Dubey, Richa Gupta, Deepen Sinha, ATC Labs - Chatham, NJ, USA; Anibel Ferreira, ATC Labs - Chatham, NJ, USA, University of Porto, Porto, Portugal
Multichannel audio is increasingly ubiquitous in consumer audio applications such as satellite radio broadcast systems, surround sound playback systems, multichannel audio streaming, and other emerging applications. These applications often present challenging bandwidth constraints, making parametric multichannel coding schemes attractive. Several techniques have been proposed recently to address this problem. Here we present a novel low bit rate five-channel encoding system that has shown promising results. This technique, called the Immersive Soundfield Rendition (ISR) System, emphasizes accurate reproduction of the multiband temporal envelope. The ISR system also incorporates a very low overhead (blind upmixing) mode. The proposed multichannel coding system has yielded promising results for multichannel coding in the 0 to 12 kbps range. More information and audio demonstrations will be available at www.atc-labs.com/isr.
Convention Paper 6875 (Purchase now)
P3-8 New Results in Low Bit Rate Speech Coding and Bandwidth Extension—Raghuram Annadana, Harinarayanan E. V., ATC Labs - Chatham, NJ, USA; Anibel Ferreira, ATC Labs - Chatham, NJ, USA, University of Porto, Porto, Portugal; Deepen Sinha, ATC Labs - Chatham, NJ, USA
Emerging digital audio applications for broadcast radio and multimedia systems are presenting new challenges such as the need to code mixed audio content, error robustness, higher audio bandwidth, and the need to deliver high quality audio at low bit rates, demanding a paradigm shift in existing low bit rate speech coding techniques. This paper describes the continuation of our research in the area of low bit rate speech coding and enhancements to the recently introduced Audio Bandwidth Extension Toolkit (ABET). Several new modes of operation have been introduced in the codec, in particular making innovative use of perceptual coding tools. In addition, a new mode has been added to ABET to improve the efficiency of the temporal shaping tool, Multi Band Temporal Amplitude Coding (MBTAC), by exploiting the time and frequency correlation in signals. The structure of the codec and its performance in these modes of operation are detailed. Audio demonstrations and further information are available at www.atc-labs.com/lbr/ and www.atc-labs.com/abet/.
Convention Paper 6876 (Purchase now)