AES Amsterdam 2008: Paper Session P4

P4 - Low Bit-Rate Audio Coding

Saturday, May 17, 14:00 — 18:00
Chair: Karlheinz Brandenburg, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany

P4-1 Time-Varying Transform for High Quality Audio Communication Codecs —Pierrick Philippe, France Télécom R&D - Cesson Sevigne, France; David Virette, Balázs Kövesi, France Télécom R&D - Lannion, France
High quality audio communication is a current challenge addressed by the standardization committees. In this context, ITU and MPEG recently issued standards for high quality coding of both speech and music content. Transform coding is used and allows quality commensurate with bit rate regardless of the audio content. Up to now, only constant transform sizes have been used in these coding schemes, since time-varying transforms needed look-ahead for perfect reconstruction and hence added further delay. In this paper we demonstrate how variable transform sizes can be used without affecting the coding delay. Based on filter bank theory, a framework avoiding look-ahead is presented. The quality improvement offered by the proposed solution is illustrated in the context of MPEG-4 Enhanced Low Delay AAC.
Convention Paper 7333

P4-2 Differential Graph-Based Coding of Spikes in a Biologically-Inspired Universal Audio Coder—Ramin Pichevar, Hossein Najaf-Zadeh, Louis Thibault, Hassan Lahdili, Communications Research Centre - Ottawa, Ontario, Canada
In previous work we showed that it is possible to code audio material using a biologically-inspired universal audio coder based on matching pursuit. The best atoms/kernels chosen by matching pursuit are represented by spikes to reflect the biologically-inspired nature of the algorithm. In that work, each spike or atom was defined by parameters such as timing, channel frequency, amplitude, and chirp factor, which were encoded independently. However, encoding each atom/spike as a separate entity is costly in bits. In the present paper, we propose algorithms that encode only the differences between parameters associated with spikes. We treat each spike/atom as a node in a graph and choose the sequence of spikes that minimizes the differential encoding cost. Methods based on the minimum spanning tree and the traveling salesman problem are proposed and compared for the graph-based optimization of the code.
Convention Paper 7334
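
The graph formulation in this abstract can be illustrated with a small sketch. It is not the authors' coder: the spike tuples (time, channel, amplitude), the `diff_cost` function (a sum of absolute differences standing in for a real bit cost), and the choice of Prim's algorithm are all assumptions made for illustration.

```python
# Hypothetical sketch: spikes are graph nodes, edges are weighted by a
# stand-in differential coding cost, and Prim's algorithm finds a
# minimum spanning tree that says which spike each spike is coded against.

def diff_cost(a, b):
    """Assumed proxy for the bits needed to code spike b relative to spike a."""
    return sum(abs(x - y) for x, y in zip(a, b))

def minimum_spanning_tree(spikes):
    """Return (parent, total); parent[i] is the reference spike for spike i."""
    n = len(spikes)
    parent = [None] * n
    best = [float("inf")] * n   # cheapest known link into the tree
    in_tree = [False] * n
    best[0] = 0                 # spike 0 is the root and has no reference
    total = 0
    for _ in range(n):
        # Pick the cheapest spike that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # Relax the links from the newly added spike.
        for v in range(n):
            if not in_tree[v]:
                c = diff_cost(spikes[u], spikes[v])
                if c < best[v]:
                    best[v] = c
                    parent[v] = u
    return parent, total

# Made-up spikes: (time, channel, amplitude), all integers.
spikes = [(0, 3, 10), (2, 3, 9), (40, 7, 2), (41, 7, 3), (3, 4, 10)]
parent, total = minimum_spanning_tree(spikes)

# Same proxy cost if every non-root spike were coded on its own.
independent = sum(sum(abs(x) for x in s) for s in spikes[1:])
print(parent, total, independent)
```

With these made-up spikes, the tree groups the two early low-channel spikes and the two late high-channel spikes, so most differences are small. A traveling-salesman ordering would instead force a single chain through all spikes.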

P4-3 Unraveling the Relationship between Basic Audio Quality and Fidelity Attributes in Low Bit-Rate Multichannel Audio Codecs —Paulo Marins, Francis Rumsey, Slawomir Zielinski, University of Surrey - Guildford, Surrey, UK
Prior to this study, the evaluation of multichannel audio codecs has been done mainly according to the ITU-R standards BS.1116 and BS.1534. Basic audio quality is the only perceptual attribute assessed in the majority of these tests. This approach, although efficient for measuring the overall quality of several codecs at once, does not provide reasons why a particular codec is rated as better or worse than another. In this paper fidelity attributes were included; these were based on the attributes suggested in the ITU-R standards but have not been used explicitly in codec evaluation up to now. In this experiment the perceptual importance of these attributes and their contribution to the basic audio quality of low bit-rate surround sound codecs were investigated.
Convention Paper 7335

P4-4 A New Perceptual Model for Audio Coding Based on Spectro-Temporal Masking —Steven van de Par, Philips Research Europe - Eindhoven, The Netherlands; Jeroen Koppens, Philips Applied Technologies - Eindhoven, The Netherlands; Armin Kohlrausch, Philips Research Europe - Eindhoven, The Netherlands, and Eindhoven University of Technology, Eindhoven, The Netherlands; Werner Oomen, Philips Applied Technologies - Eindhoven, The Netherlands
In psychoacoustics, considerable advances have been made recently in developing computational models that can predict the discriminability of two sounds taking into account spectro-temporal masking effects. These models operate as artificial observers by making predictions about the discriminability of arbitrary signals [e.g., Dau et al., J. Acoust. Soc. Am., Vol. 99, p. 3615, 1996]. Therefore, such models can be applied in the context of a perceptual audio coder. A drawback, however, is the computational complexity of such advanced models, especially because the model needs to evaluate each quantization option separately. In this paper a model is introduced and evaluated that is a computationally lighter version of the Dau model but maintains its essential spectro-temporal masking predictions. Listening test results in a transform coder setting show that the proposed model outperforms both a conventional purely spectral masking model and the original model proposed by Dau.
Convention Paper 7336

P4-5 Delayless Mixing—On the Benefits of MPEG-4 AAC-ELD in High Quality Communication Systems —Markus Schnell, Markus Schmidt, Fraunhofer IIS - Erlangen, Germany; Per Ekstrand, Dolby Sweden - Stockholm, Sweden; Tobias Albert, Daniel Przioda, Manfred Lutzky, Ralf Geiger, Fraunhofer IIS - Erlangen, Germany; Vesa Ruoppila, Dolby Germany - Nuremberg, Germany; Fredrik Henn, Dolby Sweden - Stockholm, Sweden; Erlend Tårnes, Tandberg - Oslo, Norway
Tele- and video conferencing systems for modern business communication are managed by central hubs, so-called multipoint control units (MCU). One major task of these units is the mixing of audio streams from the participating sites. This is traditionally done by decoding the streams, mixing in the time domain, and then re-encoding the mixed signals. This requires additional processing power, increases delay, and degrades audio quality. The paper demonstrates how the recently standardized MPEG-4 Enhanced Low Delay AAC (AAC-ELD) codec offers a solution to these problems by efficient and delayless mixing in the transform domain of the codec.
Convention Paper 7337
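
Mixing in the transform domain is possible in principle because the transform is linear: the transform of a sum equals the sum of the transforms. The sketch below checks this numerically with a plain, unwindowed MDCT used as a stand-in. It is not the AAC-ELD low-delay filter bank, and it ignores quantization, re-quantization of the mixed spectrum, and frame alignment between streams, all of which a real MCU would have to handle.

```python
import math

def mdct(block):
    """Direct MDCT of 2N samples, returning N coefficients (no window)."""
    n2 = len(block)
    n = n2 // 2
    return [
        sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(n2))
        for k in range(n)
    ]

# Two made-up input blocks standing in for two conference participants.
a = [math.sin(0.3 * t) for t in range(16)]
b = [0.5 * math.cos(0.7 * t + 1.0) for t in range(16)]

# Mix first, then transform ...
mixed_in_time = mdct([x + y for x, y in zip(a, b)])
# ... versus transform first, then mix the coefficients.
mixed_in_transform = [x + y for x, y in zip(mdct(a), mdct(b))]

max_diff = max(abs(x - y) for x, y in zip(mixed_in_time, mixed_in_transform))
print(max_diff)  # differs from zero only by floating-point rounding
```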

P4-6 Low-Power MPEG-4 HE-AAC Version-2 Encoder —Chi-Min Liu, Han-Wen Hsu, Chung-Han Yang, Wen-Chieh Lee, National Chiao Tung University - Hsinchu, Taiwan
In the MPEG-4 HE-AAC version 2 encoder, analysis/synthesis complex-exponential modulated filter banks (CEMFB) are used in spectral band replication (SBR) and parametric stereo (PS) coding. Because of aliasing interference, complex filter banks rather than real ones are adopted in SBR and PS coding. However, the complex values in the CEMFB and the subsequent processing lead to high operational overhead. In our previous work we designed SBR encoders based on real-domain cosine modulated filter banks and proposed a complexification-based approach for SBR coding. This paper extends that work to PS coding. An approximate method for parameter estimation is proposed to save operational overhead, using only one CEMFB analysis channel. In addition, a phase-adjustment down-mixing method is proposed to reduce energy-vanishing effects.
Convention Paper 7338
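
The energy-vanishing effect mentioned in this abstract can be illustrated generically. When the left and right subband signals are close to anti-phase, a plain average cancels them; rotating one channel toward the other before summing avoids the loss. The sketch below is not the authors' method. The function names, the single phase estimate per block, and the test signals are assumptions for illustration only.

```python
import cmath

def passive_downmix(left, right):
    """Plain average of two complex subband signals."""
    return [(a + b) / 2 for a, b in zip(left, right)]

def phase_aligned_downmix(left, right):
    """Rotate `right` by the estimated inter-channel phase, then average."""
    # The cross-correlation angle estimates the phase of left relative to right.
    cross = sum(a * b.conjugate() for a, b in zip(left, right))
    rot = cmath.exp(1j * cmath.phase(cross)) if cross != 0 else 1
    return [(a + b * rot) / 2 for a, b in zip(left, right)]

def energy(x):
    return sum(abs(v) ** 2 for v in x)

# Made-up complex subband signal; the right channel is in exact anti-phase.
left = [cmath.exp(1j * 0.4 * n) for n in range(32)]
right = [-v for v in left]

e_passive = energy(passive_downmix(left, right))
e_aligned = energy(phase_aligned_downmix(left, right))
print(e_passive, e_aligned)
```

In this extreme case the passive downmix loses all energy, while the phase-aligned one keeps the energy of a single channel.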

P4-7 Low Complexity Bit Allocation Algorithms for MP3/AAC Encoding —S Nithin, National Institute of Technology - Surathkal, India; Kumaraswamy Suresh, T. V. Sreenivas, Indian Institute of Science - Bangalore, India
We have developed two reduced-complexity bit-allocation algorithms for MP3/AAC-based audio encoding, which can be useful at low bit rates. The first algorithm derives the optimum bit allocation using constrained optimization of the weighted noise-to-mask ratio; the second uses decoupled iterations for distortion control and rate control, with convergence criteria. MUSHRA-based evaluation indicated that the new algorithms are comparable to AAC but require only about one-tenth of the complexity.
Convention Paper 7339
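
As a rough illustration of allocating bits against a noise-to-mask ratio (NMR), the sketch below uses a simple greedy rule: give the next bit to whichever band currently has the worst NMR. This is a textbook-style stand-in, not either of the algorithms in the paper. The noise model (each extra bit cuts noise power by a factor of four, about 6 dB) and all the numbers are assumptions.

```python
def greedy_allocate(signal_power, mask_threshold, total_bits):
    """Greedy allocation: each bit goes to the band with the highest NMR."""
    bits = [0] * len(signal_power)

    def nmr(i):
        # Assumed model: noise power = signal power * 4^(-bits), i.e. ~6 dB per bit.
        return signal_power[i] * 4.0 ** (-bits[i]) / mask_threshold[i]

    for _ in range(total_bits):
        worst = max(range(len(bits)), key=nmr)
        bits[worst] += 1
    return bits

# Made-up per-band signal powers and masking thresholds (linear units).
signal_power = [100.0, 10.0, 1.0, 0.1]
mask_threshold = [1.0, 1.0, 0.5, 0.5]

bits = greedy_allocate(signal_power, mask_threshold, 8)
print(bits)
```

The last band receives no bits here because its quantization noise is already below the assumed masking threshold.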

P4-8 Linear Filtering in MDCT Domain —Kumaraswamy Suresh, T. V. Sreenivas, Indian Institute of Science - Bangalore, India
In this paper, expressions for the convolution-multiplication properties of the MDCT are derived starting from equivalent DFT representations. Using these expressions, methods for implementing linear filtering through block convolution in the MDCT domain are presented. The implementation is exact for symmetric filters and approximate for non-symmetric filters in the case of a rectangular-window MDCT. For a general MDCT window function, the filtering is done on the windowed segments and hence the convolution is approximate for symmetric as well as non-symmetric filters. This approximation error is shown to be perceptually insignificant for symmetric impulse response filters. Moreover, the inherent 50% overlap between adjacent frames used in MDCT computation reduces this approximation error, similar to the smoothing of other block processing errors. The presented techniques are useful for compressed-domain processing of audio signals.
Convention Paper 7340

Last Updated: 20080612, tendeloo
