AES Barcelona 2005: Poster Session Z3


	AES Barcelona 2005 Poster Session Z3 - Low Bit Rate Coding

Last Updated: 20050331, mei

Sunday, May 29, 09:30 — 11:00

Z3-1 A Sinusoidal Modeling Approach Based on Perceptual Matching Pursuits for Parametric Audio Coding—Pedro Vera Candeas, Nicolas Ruiz Reyes, University of Jaén - Jaén, Spain; M. Rosa-Zurera, University of Alcalá - Alcalá de Henares, Madrid, Spain; J. C. Cuevas-Martinez, J. L. Blanco-Claraco, University of Jaén - Jaén, Spain
In this paper we propose a new sinusoidal modeling method based on perceptual matching pursuits with application to parametric audio coding. Complex exponentials compose the over-complete dictionary for matching pursuits. The main contribution is the minimization of a perceptual distortion measure defined on the Bark scale to select the optimum atom at each iteration of the pursuits. Furthermore, a psychoacoustic stopping criterion for the pursuits is presented. The proposed sinusoidal modeling method is suitable for integration into a parametric audio coder based on the three-part model of Sines, Transients and Noise (STN model), as the experimental results show. Our method provides significant advantages over previous work, mainly because it operates on the Bark scale instead of the frequency scale.
Convention Paper 6377
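
As an illustration of the pursuit loop the abstract above refers to, here is a minimal sketch of plain matching pursuit over an over-complete dictionary of complex exponentials. It is not the authors' method: atoms are selected by the ordinary energy criterion (largest inner product) rather than by a Bark-domain perceptual distortion measure, and the loop stops after a fixed number of atoms rather than at a psychoacoustic criterion. The function name and parameters are hypothetical.

```python
import cmath
import math


def matching_pursuit(signal, n_freqs, n_atoms):
    """Greedy matching pursuit with unit-norm complex exponential atoms.

    Atom k is exp(2j*pi*k*t/n_freqs) / sqrt(n); choosing n_freqs > n makes
    the dictionary over-complete. Selection uses the plain energy criterion
    (an assumption for illustration, not a perceptual measure).
    """
    n = len(signal)
    norm = math.sqrt(n)
    residual = [complex(s) for s in signal]
    picks = []
    for _ in range(n_atoms):
        best_k, best_c = 0, 0j
        for k in range(n_freqs):
            # Inner product of the residual with atom k.
            c = sum(r * cmath.exp(-2j * math.pi * k * t / n_freqs)
                    for t, r in enumerate(residual)) / norm
            if abs(c) > abs(best_c):
                best_k, best_c = k, c
        picks.append((best_k, best_c))
        # Remove the projection onto the selected atom from the residual.
        residual = [r - best_c * cmath.exp(2j * math.pi * best_k * t / n_freqs) / norm
                    for t, r in enumerate(residual)]
    return picks, residual


# A real cosine is the sum of two complex exponentials (positive and
# negative frequency), so two atoms should account for it.
n = 64
x = [math.cos(2 * math.pi * 5 * t / n) for t in range(n)]
picks, residual = matching_pursuit(x, n_freqs=128, n_atoms=2)
residual_energy = sum(abs(r) ** 2 for r in residual)
```

A perceptual variant would replace the `abs(c)` comparison with a distortion measure evaluated per candidate atom, which is the part this sketch leaves out.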

Z3-2 An Audio Quantizer Based on a Time Domain Auditory Masking Model—Thomas Zarouchas, University of Patras - Patras, Greece; Joerg Buchholz, University of Western Sydney - Penrith South, Australia; John Mourjopoulos, University of Patras - Patras, Greece
A novel, nonuniform PCM audio quantizer is described, employing a time-domain computational auditory masking model (CAMM). The model uses signal-dependent compression to produce an internal representation of the input signal, from which a decision device derives a time-varying threshold. Based on this model, the proposed quantizer evaluates masked and unmasked regions of the signal and, through an iterative process, generates the variable bit allocation table used to quantize the audio samples. Preliminary results indicate good perceptual quality for an average rate of 6.7 bits/sample.
Convention Paper 6378

Z3-3 A Perceptual Post Filter for Wideband Speech and Audio ACELP Codecs—Cathy Shao, Martin Bouchard, University of Ottawa - Ottawa, Ontario, Canada
In this paper a novel post filtering method is proposed to improve the perceptual quality of wideband speech and audio Algebraic Code Excited Linear Prediction (ACELP) codecs (such as the ITU-T G.722.2 Adaptive Multi-Rate Wideband speech codec, AMR-WB). This perceptual processing is derived from the characteristics of the human hearing system. Based on these characteristics, and more specifically on the analysis of the perceptual loudness difference between the original and the coded signal, it is proposed to add a post filter to reduce the perception of the coding noise. Simulation results show that the objective assessment scores obtained using wideband Perceptual Evaluation of Speech Quality (w-PESQ) can be improved significantly using the proposed method, especially for wideband female speech.
Convention Paper 6379

Z3-4 A New Audio Compression Method Based On Spectral Oriented Trees—Alvin W. Y. Su, Wei-Chen Chang, Jing-Xin Wang, National Cheng-Kung University - Tainan, Taiwan
A new audio compression method based on a Spectral Oriented Tree is presented. After frequency transformation is performed over a frame of audio samples, the transform coefficients are arranged to form one or more quad trees, depending on the harmonic structure of the signal. In each quad tree, coefficients with larger magnitudes, regarded as more important, are placed closer to the root of the tree. A method called Concurrent Encoding In Descendant Tree (CEIDT) is employed to encode the tree coefficients so that the important coefficients are encoded before the less important ones. Scalability is therefore easily achieved by discarding the trailing bits at any position of the bitstream. The quality is comparable to that of MP3. The proposed method does not use a psychoacoustic model, and its computational complexity is lower than that of MP3 and AAC. Only one small coding table is used for the CEIDT method, while the tables used in MP3 or AAC require a lot of memory space. Thus, the proposed method also provides a lower-cost alternative.
Convention Paper 6380
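
The scalability property described in the abstract above (important information first, so that the bitstream can be cut anywhere) can be illustrated with a generic bit-plane embedded coder. This is a swapped-in technique for illustration only: it uses neither quad trees nor CEIDT, whose details the abstract does not give, and the function names are hypothetical. Integer coefficients are assumed.

```python
def encode_bitplanes(coeffs, n_planes):
    """Emit magnitude bits from the most to the least significant plane.

    A sign bit follows the first 1 bit of each coefficient, so the large
    coefficients become usable early in the stream.
    """
    bits = []
    significant = [False] * len(coeffs)
    for plane in range(n_planes - 1, -1, -1):
        for i, c in enumerate(coeffs):
            bit = (abs(c) >> plane) & 1
            bits.append(bit)
            if bit and not significant[i]:
                significant[i] = True
                bits.append(1 if c < 0 else 0)
    return bits


def decode_bitplanes(bits, n_coeffs, n_planes):
    """Decode a possibly truncated stream into approximate coefficients."""
    mags = [0] * n_coeffs
    signs = [1] * n_coeffs
    significant = [False] * n_coeffs
    pos = 0
    for plane in range(n_planes - 1, -1, -1):
        for i in range(n_coeffs):
            if pos >= len(bits):
                return [s * m for s, m in zip(signs, mags)]
            bit = bits[pos]
            pos += 1
            if bit:
                mags[i] |= 1 << plane
                if not significant[i]:
                    significant[i] = True
                    # A stream cut right before the sign bit leaves the sign positive.
                    if pos < len(bits):
                        signs[i] = -1 if bits[pos] else 1
                        pos += 1
    return [s * m for s, m in zip(signs, mags)]


coeffs = [13, -6, 2, 0]
stream = encode_bitplanes(coeffs, n_planes=4)
full = decode_bitplanes(stream, len(coeffs), 4)        # [13, -6, 2, 0]
coarse = decode_bitplanes(stream[:10], len(coeffs), 4)  # [12, -4, 0, 0]
```

Cutting the stream after the first ten bits keeps only the two most significant planes, which yields a coarser but still usable approximation of the largest coefficients.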

Z3-5 Fast Bit Allocation Method for MP3/AAC Encoders—Kyoung Ho Bang, Yonsei University - Seoul, Korea; Keun Sup Lee, Samsung Electronics Co. - Suwon, Korea; Young Cheol Park, Dae Hee Youn, Yonsei University - Seoul, Korea
In MP3/AAC encoders, the quantization parameter called scale factor controls the quantization noise and the bit rate. Tuning these encoders would require a characterization of the rate-distortion function per subband, which seems to be available only in a parametric manner. In this paper a fast bit allocation method for an MP3/AAC encoder is presented. The resulting encoder is able to produce an ISO/MPEG compliant bitstream that can guarantee better audio quality. More importantly, the number of computational steps is greatly reduced as compared to the method recommended by the ISO/MPEG committee, because the efficient bit allocation algorithm significantly reduces the number of iterations required. It was found that the efficient bit allocation algorithm works best when the bit rate demanded by the psychoacoustic model in order to keep the quantization noise below the masking threshold is almost equal to the operational bit rate.
Convention Paper 6381
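
The trade-off stated at the start of the abstract above, that the scale factor controls both quantization noise and bit rate, can be sketched with a simplified quantizer. The assumptions are loud ones: a uniform quantizer with step size 2^(sf/4) stands in for the power-law quantizers of MP3/AAC, and the bit count is a crude proxy rather than real Huffman coding. None of this reproduces the fast allocation method of the paper, and the function names are hypothetical.

```python
def quantize(coeffs, scale_factor):
    """Uniform quantizer whose step grows by 2**(1/4) per scale factor unit."""
    step = 2.0 ** (scale_factor / 4.0)
    return [round(c / step) for c in coeffs], step


def noise_and_bits(coeffs, scale_factor):
    """Return (quantization noise energy, crude bit-count proxy)."""
    q, step = quantize(coeffs, scale_factor)
    noise = sum((c - v * step) ** 2 for c, v in zip(coeffs, q))
    # Proxy only: magnitude bits plus one sign bit, or one bit for a zero.
    bits = sum(abs(v).bit_length() + 1 if v else 1 for v in q)
    return noise, bits


band = [10.3, -7.7, 3.2, 0.9, -0.4]
noise_fine, bits_fine = noise_and_bits(band, scale_factor=0)      # step 1
noise_coarse, bits_coarse = noise_and_bits(band, scale_factor=8)  # step 4
```

For this band the coarser step spends fewer bits and leaves more noise, which is the quantity an encoder's allocation loop has to balance against the masking threshold.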

Z3-6 Bit Reservoir Design for HE-AAC—Chi-Min Liu, Li-Wei Chen, Han-Wen Hsu, Wen-Chieh Lee, National Chiao-Tung University - Hsin-Chu, Taiwan
High Efficiency AAC (HE-AAC) combines Spectral Band Replication (SBR) with AAC to achieve high audio quality at bit rates lower than 96 kbit/s. SBR reconstructs the high-frequency part of the signal by replicating the low-frequency part. The bits allocated to the AAC encoder module and the SBR module determine the quality and compression efficiency. In the past, we designed a bit reservoir for AAC to reserve and predict the bits necessary for each time frame. The bit reservoir should be extended for HE-AAC, especially for the SBR module. This paper considers the design of the bit reservoir for HE-AAC. The efficiency of the bit reservoir is verified through extensive objective tests.
Convention Paper 6382
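
The basic bookkeeping behind a bit reservoir, as mentioned in the abstract above, can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' design: it does not predict future demand and does not split bits between the AAC and SBR modules. The function name, parameters, and numbers are hypothetical.

```python
def allocate_with_reservoir(demands, mean_bits, max_reservoir):
    """Grant bits per frame, saving unused bits for later demanding frames.

    Each frame may spend the mean frame budget plus whatever is currently in
    the reservoir. Unspent bits are saved up to max_reservoir; anything
    beyond that cap is lost in this simplified model.
    """
    reservoir = 0
    granted = []
    for demand in demands:
        available = mean_bits + reservoir
        use = min(demand, available)
        granted.append(use)
        reservoir = min(available - use, max_reservoir)
    return granted


# Two easy frames fill the reservoir; the demanding third frame draws on it.
granted = allocate_with_reservoir([500, 500, 2000, 1000],
                                  mean_bits=1000, max_reservoir=800)
# granted == [500, 500, 1800, 1000]
```

The third frame asks for 2000 bits but receives 1800, the mean budget plus the capped reservoir, which shows why the reservoir size bounds how much any one frame can borrow.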

Z3-7 Accurate Spectral Replacement—Aníbal Ferreira, University of Porto/ATC Labs - Porto, Portugal; Deepen Sinha, ATC Labs - Chatham, NJ, USA
Recent advances in perceptual audio coding are strongly based on the concept of bandwidth extension. Most techniques implementing bandwidth extension require an analysis/synthesis filter bank in addition to that used by the associated perceptual audio coder, which increases the overall system complexity and coding delay and makes it difficult to align the operation of the audio coder correctly with that of the bandwidth extension technique. We present a new Accurate Spectral Replacement (ASR) technique that is based on a suitable decomposition of the MDCT filter bank and that implements synthesis of sinusoidal components with an accuracy much higher than the natural frequency resolution of the filter bank. The ASR technique is described, its performance is assessed with both synthetic and natural audio signals, and its main areas of application are addressed. Audio demonstrations are available at http://www.atc-labs.com/asr/.
Convention Paper 6383