AES Vienna 2007
Last Updated: 20070321, mei

P3 - Low Bit-Rate Audio Coding

Saturday, May 5, 10:00 — 11:30

P3-1 Enhanced MPEG-4 Low Delay AAC - Low Bit-Rate High-Quality Communication
Markus Schnell, Ralf Geiger, Markus Schmidt, Manuel Jander, Markus Multrus, Fraunhofer IIS - Erlangen, Germany; Gerald Schuller, Fraunhofer IDMT - Ilmenau, Germany; Jürgen Herre, Fraunhofer IIS - Erlangen, Germany
The MPEG-4 Low Delay Advanced Audio Coding (AAC-LD) scheme has recently evolved into a popular algorithm for audio communication. It produces excellent audio quality at bit rates between 48 kbit/s and 64 kbit/s per channel. This paper introduces an enhancement to AAC-LD that reduces the bit rate demand by 25 to 33 percent. This is achieved by adding a delay-optimized version of the Spectral Band Replication (SBR) tool and by using a dedicated low delay filterbank. The introduced techniques maintain the high audio quality and offer an algorithmic delay low enough for use in two-way communication systems. This paper describes the coder enhancements, including a detailed discussion of algorithmic delay issues, a performance assessment, and possible applications.
Convention Paper 6998
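
As a rough illustration of the figures quoted in the P3-1 abstract, the sketch below applies a 25 to 33 percent reduction to the 48 and 64 kbit/s operating points. It assumes the reduction applies uniformly across that range, which the abstract does not state; the function name is hypothetical.

```python
def reduced_bit_rate(rate_kbps: float, reduction: float) -> float:
    """Bit rate after a fractional reduction (0.25 means 25 percent less)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return rate_kbps * (1.0 - reduction)


# Operating points quoted in the abstract, in kbit/s per channel.
# Assumption: the same percentage range applies at both points.
for rate in (48.0, 64.0):
    low = reduced_bit_rate(rate, 0.33)
    high = reduced_bit_rate(rate, 0.25)
    print(f"{rate:.0f} kbit/s -> roughly {low:.1f} to {high:.1f} kbit/s")
```

Under that assumption, a 64 kbit/s channel would need roughly 43 to 48 kbit/s after the enhancement.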

P3-2 On the Design of Low Power MPEG-4 HE-AAC Encoder
Wen-Chieh Lee, Chung-Han Yang, Cheng-Lun Hu, National Chiao Tung University - Hsinchu, Taiwan
Spectral Band Replication (SBR) has been combined with MPEG AAC as a bandwidth extension tool. The resulting scheme is referred to as MPEG-4 High Efficiency (HE) AAC or aacPlus. With the SBR module taking care of the high-frequency content, the conventional AAC encoder can compress the low-frequency part using most of the available bits. In the conventional QMF architecture, the SBR encoder calculates all SBR parameters in the complex domain. If the components of the SBR encoder can be implemented in the real domain, the computational complexity of HE-AAC will be reduced by half. The paper proposes a low power MPEG-4 HE-AAC encoder that reduces the computational complexity. Both subjective and objective experiments are conducted to demonstrate the quality of the low power HE-AAC encoder on critical music tracks.
Convention Paper 6999

P3-3 High Quality, Low Power QMF Bank Design for SBR, Parametric Coding, and MPEG Surround Decoders
Hsin-Yao Tseng, Han-Wen Hsu, Chi-Min Liu, National Chiao Tung University - Hsin-Chu, Taiwan
Owing to its alias-free properties, the complex quadrature mirror filter (QMF) bank has been used in the MPEG-4 audio standard for SBR, parametric, and surround coding. The high complexity overhead of the complex QMF bank and of the complex data processing in the decoder has led to the development of a low power decoder, which adopts the real QMF bank as its basic building block to reduce the complexity. However, artifacts caused by aliasing in the real QMF bank are the major concern. This paper studies the artifacts of the real QMF bank and proposes a novel QMF bank design that achieves both low complexity and high quality. The paper also applies the novel QMF bank to develop high-quality, low-power SBR, parametric, and MPEG Surround decoders and shows the merits in complexity and quality.
Convention Paper 7000

P3-4 Low Power Stereo Perceptual Audio Coding Based on Adaptive Masking Threshold Reuse
Evelyn Kurniawati, Sapna George, ST Microelectronics Asia Pacific Pte. Ltd. - Singapore
The term perceptual audio coder refers to audio compression schemes that exploit the properties of human auditory perception. The idea is to keep the quantization noise below the masking threshold so that it is imperceptible to the ear. The process requires considerable computational effort, especially for the psychoacoustic analysis and the bit allocation and quantization process. This paper proposes a new method that simplifies the psychoacoustic modeling process by adaptively reusing the computed masking threshold depending on the signal characteristics. The method also devises a scheme to patch the potential spectral hole problems that might occur when the quantization parameters are reused. This proposal can be applied to generic stereo perceptual audio encoders where low computational complexity is required.
Convention Paper 7001
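
The idea of reusing a masking threshold while the signal stays similar can be sketched as follows. This is only a minimal illustration under stated assumptions: the chunk-energy analysis, the fixed-ratio `expensive_threshold` placeholder, the `tolerance` parameter, and all function names are hypothetical and are not taken from the paper, and the spectral hole patching mentioned in the abstract is not modeled.

```python
from typing import List, Optional, Tuple


def band_energies(frame: List[float], n_bands: int) -> List[float]:
    """Crude stand-in for spectral analysis: energy of equal-sized chunks."""
    size = len(frame) // n_bands
    return [sum(x * x for x in frame[b * size:(b + 1) * size])
            for b in range(n_bands)]


def expensive_threshold(energies: List[float]) -> List[float]:
    """Placeholder for a full psychoacoustic model (hypothetical fixed ratio)."""
    return [0.1 * e for e in energies]


def thresholds_with_reuse(frames: List[List[float]], n_bands: int = 4,
                          tolerance: float = 0.2) -> List[Tuple[List[float], bool]]:
    """Return (threshold, reused) per frame.

    The threshold is recomputed only when some band energy differs from the
    reference frame by more than `tolerance` (relative change).
    """
    results: List[Tuple[List[float], bool]] = []
    ref_energy: Optional[List[float]] = None
    ref_threshold: List[float] = []
    for frame in frames:
        energy = band_energies(frame, n_bands)
        if ref_energy is None:
            change = float("inf")  # first frame: nothing to reuse yet
        else:
            change = max(abs(e - r) / (r + 1e-12)
                         for e, r in zip(energy, ref_energy))
        if change <= tolerance:
            results.append((ref_threshold, True))   # reuse stored threshold
        else:
            ref_energy = energy
            ref_threshold = expensive_threshold(energy)
            results.append((ref_threshold, False))  # recomputed
    return results
```

For a stationary stretch of signal, only the first frame pays for the model; a sudden level change triggers a recomputation.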

P3-5 A Hybrid Warped Linear Prediction (WLP) AAC Audio Coding Algorithm
Jaeseong Lee, Young-Cheol Park, Dae-Hee Youn, Hong-Goo Kang, Yonsei University - Seoul, Korea
We propose a hybrid warped linear prediction (WLP) AAC audio coding algorithm. The proposed algorithm employs a WLP processor to construct a perceptual pre- and post-filter for MPEG-4 AAC. The WLP residual is applied to the MDCT filter bank, and the signal-to-mask ratio (SMR) of the corresponding block is modified to set a masking threshold for the WLP residual. In the decoder, the reconstructed residual signal is passed to a modified WLP synthesis filter to restore the audio signal. Subjective tests show that the proposed audio codec operating at 50 kbps has perceptual quality comparable to that of conventional MPEG-4 AAC operating at 58 kbps.
Convention Paper 7002

P3-6 Comparison of Stereo Redundancy Reduction Schemes for an Ultra Low Delay Audio Coder
Tobias Albert, Fraunhofer IIS - Erlangen, Germany; Gerald Schuller, Stefan Wabnik, Ulrich Krämer, Jens Hirschfeld, Fraunhofer IDMT - Ilmenau, Germany
In the Fraunhofer Ultra Low Delay Audio Coder (ULD), a pre-filter controlled by a psychoacoustic model is followed by a quantizer and a predictive coder that code signals in the time domain. The output of the predictor is entropy coded and transmitted. The predictor and the entropy coder form the lossless redundancy reduction part of the coder. Our goal is to improve the lossless redundancy reduction part for stereo signals. We present and evaluate six different alternatives for the stereo redundancy reduction, and we combine those alternatives to obtain a higher compression ratio.
Convention Paper 7003
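
The P3-6 abstract does not name the six alternatives. As one widely used example of lossless stereo redundancy reduction (an assumption for illustration, not necessarily one of the schemes evaluated in the paper), the sketch below shows a reversible integer mid/side transform. The function names are hypothetical.

```python
from typing import Tuple


def ms_forward(left: int, right: int) -> Tuple[int, int]:
    """Map an integer (left, right) pair to (mid, side) without losing information."""
    mid = (left + right) >> 1   # floor of the average
    side = left - right         # difference keeps the parity dropped by `mid`
    return mid, side


def ms_inverse(mid: int, side: int) -> Tuple[int, int]:
    """Exactly invert ms_forward using integer arithmetic only."""
    left = mid + ((side + 1) >> 1)
    right = left - side
    return left, right


# Round trip on a few sample pairs, including negative values.
for pair in [(5, 2), (2, 5), (-3, 4), (0, 0)]:
    assert ms_inverse(*ms_forward(*pair)) == pair
```

When the two channels are strongly correlated, the side signal stays small, which is what lets a subsequent entropy coder spend fewer bits on it.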

P3-7 Speech Codec Enhancements Utilizing Time Compression and Perceptual Coding
Maciej Kulesza, Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
A method for encoding a wideband speech signal using standardized narrowband speech codecs is presented, along with experimental results concerning the detection of tonal spectral components. The speech signal, sampled at a higher rate than is suitable for a narrowband coding algorithm, is time-compressed in order to reduce the number of samples. Next, the time-compressed representation of the signal is encoded using a narrowband speech codec. After transmission and decoding, a time expansion procedure is applied to the speech signal in order to restore the original time relations. Finally, the wideband speech signal is presented to the user. A method for spectral envelope estimation involving perceptual criteria is described. The algorithms for tonal component detection were evaluated and compared experimentally.
Convention Paper 7004

P3-8 Design and Implementation of a Web-Based Software Framework for Real Time Intelligent Audio Coding Based on Speech/Music Discrimination
Jose Enrique Muñoz Exposito, Nicolas Ruiz Reyes, Sebastian Garcia-Galan, Pedro Vera Candeas, University of Jaén - Jaén, Spain
In this paper, a software framework based on a client-server architecture is implemented for real-time intelligent audio coding. A speech/music discrimination scheme analyzes the input audio signal and decides on the nature of the signal (speech or music) on a frame-by-frame basis. According to the decision of the speech/music discriminator, a suitable coder is selected for each frame. The designed software framework makes use of the speech and audio coders incorporated into the MPEG-4 audio standard (HVXC or CELP for speech frames and TwinVQ or AAC for music frames) to evaluate the performance of an intelligent multimode audio coder. The framework supports several types of audio features (timbral texture features and rhythmic content features) and classifiers (classical Statistical Pattern Recognition (SPR) classifiers, Multilayer Perceptron Neural Networks (MLPNN), Support Vector Machines (SVM), Fuzzy Expert Systems (FES), and Hidden Markov Models (HMM)). An intelligent audio coder based on speech/music discrimination has been compared with MPEG-4 AAC using audio signals representative of the two classes (speech and music). Subjective and objective tests were carried out to assess the behavior of the intelligent audio coding scheme.
Convention Paper 7005
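
The frame-by-frame coder selection described in the P3-8 abstract can be sketched as below. The coder names come from the abstract, but the zero-crossing-rate rule, its threshold, and all function names are hypothetical stand-ins: the paper's framework uses richer features and classifiers (SPR, MLPNN, SVM, FES, HMM), none of which is reproduced here.

```python
from typing import Dict, List


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def classify_frame(frame: List[float], zcr_threshold: float = 0.3) -> str:
    """Toy discriminator (hypothetical rule, not one of the paper's classifiers)."""
    return "speech" if zero_crossing_rate(frame) > zcr_threshold else "music"


# One speech coder and one music coder from the pairs named in the abstract.
CODER_FOR_CLASS: Dict[str, str] = {"speech": "CELP", "music": "AAC"}


def select_coders(frames: List[List[float]]) -> List[str]:
    """Pick a coder name for every frame based on the per-frame decision."""
    return [CODER_FOR_CLASS[classify_frame(frame)] for frame in frames]


# Two synthetic frames: one alternating in sign, one constant.
print(select_coders([[1.0, -1.0, 1.0, -1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0]]))
```

Keeping the class-to-coder mapping in a table makes it easy to swap in HVXC or TwinVQ, or to replace the toy rule with a trained classifier, without touching the selection loop.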

P3-9 Quantization of Laguerre-Based Stereo Linear Predictors
Albertus C. den Brinker, Philips Research Laboratories - Eindhoven, The Netherlands; Arijit Biswas, Technische Universiteit Eindhoven - Eindhoven, The Netherlands
Recently, a quantization scheme for stereo linear prediction systems was proposed and tested using random data as input. The current paper extends this research by incorporating Laguerre filters in the stereo linear prediction scheme. First, it is shown that the associated normalized reflection matrices (NRM) can be obtained efficiently. Second, the system is tested using stereo audio data in order to gain insight into the bit rates required for practical applications.
Convention Paper 7006