AES San Francisco 2010
Paper Session P21: Low-Bit-Rate Audio Coding
Sunday, November 7, 9:00 am — 12:30 pm (Room 220)
P21-1 Combination of Different Perceptual Models with Different Audio Transform Coding Schemes—Implementation and Evaluation—Armin Taghipour, Nicole Knölke, Bernd Edler, Jörn Ostermann, Leibniz Universität Hannover - Hannover, Germany
In this paper four combinations of perceptual models and transform coding systems are implemented and compared. The first perceptual model is based on a DFT with a uniform frequency resolution. The second model uses IIR filters designed in accordance with the temporal/spectral resolution of the auditory system. Both transform coding systems use a uniform spectral decomposition (MDCT). While in the first system the quantizers are directly controlled by the perceptual model, the second system uses a pre- and post-filter with frequency warping to shape the quantization noise with a temporal/spectral resolution better adapted to the auditory system. Implementation details are given and results of subjective tests are presented.
Convention Paper 8283
P21-2 Using Noise Substitution for Backwards-Compatible Audio Codec Improvement—Colin Raffel, Experimentalists Anonymous - Stanford, CA, USA
A method for representing error in perceptual audio coding as filtered noise is presented. Various techniques are compared for analyzing and re-synthesizing the noise representation. A focus is placed on improving the perceived audio quality with minimal data overhead. In particular, it is demonstrated that per-critical-band energy levels are sufficient to provide an increase in quality. Methods for including the coded error data in an audio file in a backwards-compatible manner are also discussed. The MP3 codec is treated as a case study, and an implementation of this method is presented.
Convention Paper 8284
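The per-critical-band idea above can be sketched in a few lines: analyze the coding error's energy in a handful of frequency bands, then re-synthesize it as spectrally shaped noise from those energies alone. This is a minimal illustration, not the paper's implementation; the band edges below are placeholders, not an actual critical-band layout.

```python
import numpy as np

def band_energies(signal, band_edges, fs):
    """Energy of `signal` in each [lo, hi) frequency band."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    energies = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        energies.append(np.sum(np.abs(spec[mask]) ** 2))
    return np.array(energies)

def synthesize_noise(energies, band_edges, n, fs, rng):
    """Shape white noise so each band matches the stored energy."""
    spec = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    for (lo, hi), target in zip(zip(band_edges[:-1], band_edges[1:]), energies):
        mask = (freqs >= lo) & (freqs < hi)
        cur = np.sum(np.abs(spec[mask]) ** 2)
        if cur > 0:
            spec[mask] *= np.sqrt(target / cur)  # match stored band energy
    return np.fft.irfft(spec, n)

fs, n = 8000, 1024
edges = [0, 500, 1000, 2000, 4000]  # illustrative band edges only
rng = np.random.default_rng(0)
error = rng.standard_normal(n)      # stand-in for a coding-error frame
e = band_energies(error, edges, fs)
noise = synthesize_noise(e, edges, n, fs, rng)
e2 = band_energies(noise, edges, fs)
```

Only the few band energies (here four numbers per frame) would need to be stored alongside the coded file, which is what keeps the data overhead small.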
P21-3 An Introduction to AVS Lossless Audio Coding—Haiyan Shu, Haibin Huang, Ti-Eu Chan, Rongshan Yu, Susanto Rahardja, Institute for Infocomm Research, Agency for Science, Technology & Research - Singapore
Recently, the audio video coding standard workgroup of China (AVS) issued a call for proposals for lossless audio coding. Several proposals were received, among which the proposal from the Institute for Infocomm Research was selected as the Reference Model (RM). The RM is based on time-domain linear prediction and residual entropy coding. It introduces a novel residual pre-processing method for random-access data frames and a memory-efficient arithmetic coder with dynamic symbol probability generation. The performance of the RM is found to be comparable to that of MPEG-4 ALS and SLS. The AVS lossless coding standard is expected to be finalized at the end of 2010 and will become the latest extension of the AVS-P3 audio coding standard.
Convention Paper 8285
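The lossless principle behind time-domain linear prediction plus residual coding can be shown with a deliberately minimal first-order integer predictor: each sample is predicted from the previous one and only the integer residual is transmitted, so the decoder reconstructs bit-exactly. The RM uses higher-order predictors and entropy-codes the residual; this sketch shows only the lossless round trip.

```python
import numpy as np

def encode(samples):
    """First-order integer prediction: transmit s[i] - s[i-1]."""
    residual = np.empty_like(samples)
    prev = 0
    for i, s in enumerate(samples):
        residual[i] = s - prev  # integer prediction error
        prev = s
    return residual

def decode(residual):
    """Invert the prediction by running integration."""
    samples = np.empty_like(residual)
    prev = 0
    for i, r in enumerate(residual):
        prev = prev + r
        samples[i] = prev
    return samples

x = np.array([0, 3, 7, 6, 2, -1, -4], dtype=np.int32)
y = decode(encode(x))
```

Because prediction and reconstruction both stay in integer arithmetic, no rounding error can accumulate, which is what makes the scheme lossless rather than merely low-distortion.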
P21-4 Audio Re-Synthesis Based on Waveform Lookup Tables—Sebastian Heise, Michael Hlatky, Accessive Tools GmbH - Bremen, Germany; Jörn Loviscach, Hochschule Bielefeld, University of Applied Sciences - Bielefeld, Germany
Transmitting speech signals at optimum quality over a weak narrowband network requires audio codecs that must not only be robust to packet loss and operate at low latency, but also offer a very low bit rate and maintain the original sound of the coded signal. Advanced speech codecs for real-time communication based on code-excited linear prediction provide bit rates as low as 2 kbit/s. We propose a new coding approach that promises even lower bit rates through a synthesis approach based not on the source-filter model, but merely on a lookup table of audio waveform snippets and their corresponding Mel-Frequency Cepstral Coefficients (MFCC). The encoder performs a nearest-neighbor search for the MFCC features of each incoming audio frame against the lookup table. This process is sped up considerably by building a multi-dimensional search tree of the MFCC features. In a speech coding application, for each audio frame, only the index of the nearest neighbor in the lookup table would need to be transmitted. The decoder synthesizes the audio signal from the waveform snippets corresponding to the transmitted indices.
Convention Paper 8286
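The lookup-table scheme above reduces to: match a frame's feature vector against the table, transmit only the index, and concatenate the corresponding snippets at the decoder. A minimal sketch, with random placeholder features standing in for real MFCCs and a brute-force search standing in for the paper's multi-dimensional search tree:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical table: 256 entries of 13-dim feature vectors (MFCCs in the
# paper) paired with 20 ms waveform snippets at 8 kHz (160 samples).
table_features = rng.standard_normal((256, 13))
table_snippets = rng.standard_normal((256, 160))

def encode_frame(features):
    """Nearest neighbor in feature space; only this index is transmitted."""
    d = np.sum((table_features - features) ** 2, axis=1)
    return int(np.argmin(d))

def decode_frames(indices):
    """Concatenate the snippets addressed by the received indices."""
    return np.concatenate([table_snippets[i] for i in indices])

# A frame whose features sit very close to table entry 42 maps back to it.
idx = encode_frame(table_features[42] + 0.01)
out = decode_frames([idx])
```

With 256 entries, one frame costs only 8 bits (the index), which is where the very low bit rates come from; quality then hinges entirely on how well the table covers the speaker's material.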
P21-5 A Low Bit Rate Mobile Audio High Frequency Reconstruction—Bo Hang, Ruimin Hu, Yuhong Yang, Ge Gao, Wuhan University - Wuhan, China
In present communication systems, high-quality audio signals are expected to be provided at low bit rate and low computational complexity. To increase the quality of the high-frequency band in current communication systems, this paper proposes a novel bandwidth extension method for audio coding, which can improve decoded audio quality while adding only a few coding bits per frame and little computational complexity. The method calculates high-frequency synthesis filter parameters using a codebook mapping method and transmits quantized gain corrections for the high-frequency part in the multiplexed bit stream. Test results show that this method provides comparable audio quality with lower bit consumption and computational complexity than the high frequency regeneration of AVS-P10.
Convention Paper 8287
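The codebook-mapping step can be sketched as follows: the decoder picks the nearest low-frequency envelope entry in a paired codebook, takes its associated high-frequency envelope, and applies the transmitted gain correction. The codebook contents, dimensions, and band layout below are illustrative placeholders, not the paper's trained codebooks.

```python
import numpy as np

# Paired codebooks (hypothetical values): row i of lf_codebook maps to
# row i of hf_codebook.
lf_codebook = np.array([[0.90, 0.50],
                        [0.40, 0.20],
                        [0.10, 0.05]])
hf_codebook = np.array([[0.30, 0.10],
                        [0.15, 0.05],
                        [0.02, 0.01]])

def extend(lf_envelope, gain_correction):
    """Map a decoded LF envelope to an HF envelope via the paired codebook,
    then apply the gain correction received in the bit stream."""
    idx = int(np.argmin(np.sum((lf_codebook - lf_envelope) ** 2, axis=1)))
    return hf_codebook[idx] * gain_correction

hf = extend(np.array([0.85, 0.48]), gain_correction=1.2)
```

Only the gain correction (and not the HF envelope itself) is transmitted, which is why the per-frame bit overhead stays small.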
P21-6 Perceptual Distortion-Rate Optimization of Long Term Prediction in MPEG AAC—Tejaswi Nanjundaswamy, Vinay Melkote, University of California, Santa Barbara - Santa Barbara, CA, USA; Emmanuel Ravelli, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Kenneth Rose, University of California, Santa Barbara - Santa Barbara, CA, USA
Long Term Prediction (LTP) in MPEG Advanced Audio Coding (AAC) exploits inter-frame redundancies via predictive coding of the current frame, given previously reconstructed data. In particular, AAC Low Delay mandates LTP to exploit correlations that would otherwise be ignored due to the shorter frame size. The LTP parameters are typically selected by time-domain techniques aimed at minimizing the mean squared prediction error, which is mismatched with the ultimate perceptual criteria of audio coding. We thus propose a novel trellis-based approach that optimizes the LTP parameters, in conjunction with the quantization and coding parameters of the frame, explicitly in terms of perceptual distortion and rate trade-offs. A low-complexity "two-loop" search alternative to the trellis is also proposed. Objective and subjective results provide evidence for substantial gains.
Convention Paper 8288
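The conventional time-domain LTP search that the paper improves on can be sketched directly: for each candidate lag into the previously reconstructed samples, compute the least-squares-optimal gain and keep the lag/gain pair that minimizes the mean squared prediction error. This is the MSE baseline the abstract calls "mismatched" with perceptual criteria, not the proposed trellis method.

```python
import numpy as np

def ltp_search(history, frame, min_lag, max_lag):
    """Exhaustive lag search with least-squares-optimal gain per lag.
    Requires min_lag >= len(frame) so predictions stay inside history."""
    n = len(frame)
    best_lag, best_gain, best_err = 0, 0.0, float(np.sum(frame ** 2))
    for lag in range(min_lag, max_lag + 1):
        start = len(history) - lag
        pred = history[start:start + n]
        denom = float(np.dot(pred, pred))
        gain = float(np.dot(frame, pred)) / denom if denom > 0 else 0.0
        err = float(np.sum((frame - gain * pred) ** 2))
        if err < best_err:
            best_lag, best_gain, best_err = lag, gain, err
    return best_lag, best_gain

# Synthetic periodic signal: the search should recover the true period.
rng = np.random.default_rng(2)
period = 50
pattern = rng.standard_normal(period)
history = np.tile(pattern, 4)   # previously reconstructed samples
frame = pattern[:32].copy()     # current frame repeats the pattern
lag, gain = ltp_search(history, frame, min_lag=32, max_lag=100)
```

The paper's point is that the (lag, gain) minimizing this MSE need not minimize perceptual distortion at a given rate once quantization is taken into account, which motivates the joint trellis optimization.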
P21-7 Stereo Audio Coding Improved by Phase Parameters—Miyoung Kim, Eunmi Oh, Hwan Shim, SAIT, Samsung Electronics Co. Ltd. - Gyeonggi-do, Korea
Parametric stereo coding that exploits phase parameters in a bit-efficient way is part of the MPEG-D USAC (Unified Speech and Audio Coding) standard. This paper describes a down-mixing and up-mixing scheme that further enhances stereo coding for strongly or nearly out-of-phase signals. Conventional down-mixing, as a sum of the left and right channels for parametric stereo coding, has a potential problem: phase cancellation in out-of-phase signals, which results in audible artifacts. This paper proposes phase alignment based on the estimated overall phase difference (OPD) and inter-channel phase difference (IPD) parameters. Furthermore, it describes a phase modification that minimizes the phase discontinuity of the down-mixed signal by scaling the stereo channels.
Convention Paper 8289
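The phase-cancellation problem and the IPD-style fix can be illustrated on a single frequency bin: a plain (L + R)/2 down-mix nearly annihilates out-of-phase content, while rotating the right channel by the measured inter-channel phase difference before summing preserves the energy. Complex per-bin values stand in for the codec's filter-bank domain; the OPD-based alignment and the channel-scaling phase modification are omitted from this sketch.

```python
import numpy as np

L = np.array([1.0 + 0.0j])          # one bin, unit magnitude
R = np.array([np.exp(1j * np.pi)])  # same content, 180 degrees out of phase

# Conventional down-mix: the out-of-phase bin cancels almost completely.
naive = 0.5 * (L + R)

# IPD-style alignment: measure the inter-channel phase difference and
# rotate R into phase with L before summing, preserving the magnitude.
ipd = np.angle(L * np.conj(R))
aligned = 0.5 * (L + R * np.exp(1j * ipd))
```

The same per-bin rotation, applied with quantized IPD/OPD parameters, is what lets the decoder undo the alignment during up-mixing without the audible artifacts of the cancelled down-mix.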