Events of the AES: AES 112th Convention: Session L: LOW BIT RATE AUDIO CODING

Session L: LOW BIT RATE AUDIO CODING - PART 2

Sunday, May 12, 14:00 – 18:00 h
Chair: Markus Erne, Scopein Research, Aarau, Switzerland

14:00 h
L1 Perceptual Coding Using Sinusoidal Modeling in the MDCT Domain – Aníbal Ferreira, INESC Porto, Porto, Portugal

MDCT-based perceptual audio coders shape the quantization noise according to simple psychoacoustic rules and general behavioral aspects of the audio signal, such as ‘stationarity’ and tonality. As a consequence, the resulting compressed audio representation has little semantic value, which makes MPEG-7 oriented operations such as feature extraction and audio modification difficult to perform directly in the compressed domain. First results in this direction are reported using an enhanced version of an MDCT-based perceptual coder that implements sinusoidal modeling and subtraction directly in the MDCT frequency domain, as well as spectral envelope modeling and normalization. The implications for coding efficiency are also addressed.
Convention Paper 5569

14:30 h
L2 Energy-Adapted Matching Pursuits in Multi-Parts Models for Audio Coding Purposes – Manuel Zurera¹, Pedro Vera Candeas², Nicolas Reyes², Francisco Ferreras¹, Damian Muñoz³ - ¹UAH, Alcalá de Henares, Spain; ²UJA, Alcalá de Henares, Spain; ³Universidad de Jaen, Linares, Spain

The application of the matching pursuit algorithm for extracting sinusoidal components and transients from audio signals is proposed. The resulting residue is perceptually modeled as a noise-like signal. This multi-part model (Sines + Transients + Noise) is used for audio coding purposes. First, an accurate detection of transients in audio signals is required. When a transient is detected, energy-adapted matching pursuits are performed using a wavelet-packet based dictionary and a dictionary of sinusoidal functions. Otherwise, the matching pursuit algorithm is applied with the harmonic dictionary only. In both cases, the resulting residue is then modeled as a noise-like signal using the Equivalent Rectangular Bandwidth (ERB) model. The parameters of this multi-part model are efficiently quantized, taking into account psychoacoustic information, so as to ensure high perceptual quality at low bit rates. The combination of all these ideas results in nearly transparent audio coding at bit rates below 32 kbps for most of the CD-quality single-channel audio signals considered for testing.
Convention Paper 5570
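
As an illustration of the basic idea behind the abstract above, the following is a minimal sketch of plain matching pursuit, not the energy-adapted variant the paper proposes. The cosine dictionary, the function names, and the fixed iteration count are all assumptions made for this example; the paper's wavelet-packet and sinusoidal dictionaries are not reproduced here.

```python
import math


def make_cosine_dictionary(frame_len, num_atoms):
    """Build unit-norm cosine atoms (a hypothetical stand-in dictionary)."""
    atoms = []
    for k in range(1, num_atoms + 1):
        atom = [math.cos(math.pi * k * (n + 0.5) / frame_len)
                for n in range(frame_len)]
        norm = math.sqrt(sum(a * a for a in atom))
        atoms.append([a / norm for a in atom])
    return atoms


def matching_pursuit(signal, atoms, num_iters):
    """Greedy decomposition: pick the best-correlated atom, subtract, repeat."""
    residue = list(signal)
    picks = []  # list of (atom index, gain) pairs
    for _ in range(num_iters):
        # Inner product of the current residue with every atom.
        corrs = [sum(r * a for r, a in zip(residue, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(corrs[i]))
        gain = corrs[best]
        # Remove the selected component from the residue.
        residue = [r - gain * a for r, a in zip(residue, atoms[best])]
        picks.append((best, gain))
    return picks, residue


if __name__ == "__main__":
    atoms = make_cosine_dictionary(64, 32)
    # Test signal made of two dictionary atoms with different gains.
    signal = [3.0 * a + 0.5 * b for a, b in zip(atoms[4], atoms[10])]
    picks, residue = matching_pursuit(signal, atoms, 2)
    print(picks)
    print(sum(r * r for r in residue))
```

In a Sines + Transients + Noise scheme of the kind described, whatever remains in `residue` after the pursuit would be the part handed to the noise model.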

15:00 h
L3 Perceptual Audio Modeling Based on Total Least Squares Algorithms – Kris Hermus, Werner Verhelst, Patrick Wambacq, KU Leuven, Leuven, Belgium

Total Least Squares (TLS) algorithms automatically decompose (audio) frames into a number of exponentially damped sinusoids. This can provide more efficient modeling than plain sinusoidal modeling, especially in the case of transitional frames. Straightforward implementations of TLS optimize an SNR criterion. In our implementation we apply TLS in a subband scheme in which the number of damped sinusoids depends on both the frame and the subband. This is made possible through the use of perceptual information provided by the MPEG-1 psychoacoustic model 1. Experiments on different audio tracks provide proof of concept for our perceptual exponential sinusoidal model (ESM), and illustrate the significant reduction in modeling components compared to a non-perceptual ESM.
Convention Paper 5571

15:30 h
L4 High-Order LPC and Line Spectrum Pairs – Kelvin Eng¹, Dong Yan Huang², Say Wei Foo¹ - ¹National University of Singapore, Singapore; ²Institute of Microelectronics, Singapore

High-order linear predictive coding (LPC) analysis, as a pre-processing stage in an audio codec designed for wideband arbitrary audio signals, is found to be particularly beneficial for audio samples of an instrumental nature compared with samples of a vocal nature. With increasing LPC orders, it is imperative to keep the bits consumed by the pre-processing stage constant as a proportion of the total bit rate. To achieve this, the properties of the Line Spectrum Pairs (LSP) parameter are exploited in a proposed multistage vector quantization scheme for high-order LPC. Notably, incorporating LSP differences in the design of the quantizer proved the most efficient, with no perceptible differences at an average of 1.645 bits/sample, compared to the case of scalar quantization, which is used as a benchmark at 2 bits/sample. In particular, using LSP differences as a bit allocation mechanism proves to be especially effective in dealing with clips of a percussive nature.
Convention Paper 5572

16:00 h
L5 A New Bit Allocation Method for Low Delay Audio Coding at Low Bit Rates – Kelvin Eng¹, Dong Yan Huang², Say Wei Foo¹ - ¹National University of Singapore, Singapore; ²Institute of Microelectronics, Singapore

A low delay variable bit rate audio codec, implemented for wideband arbitrary audio signals, combines inter-frame and intra-frame bit allocation in an adaptive scheme. An outer loop uses a moving average noise-to-mask ratio (NMR) indicator and a bit reserve to adaptively reallocate bits from frames of lesser perceptual significance to frames of greater perceptual significance. An inner loop allocates the available bits to each line of the spectrum via an adaptive algorithm based on a weighting function derived from the masking thresholds. In informal listening tests, the proposed bit allocation method improved audio quality for most samples compared with a method using a single adaptive intra-frame loop. These improvements were more perceptible at the lower bit rates of about 36 kbps than at the higher bit rates of about 64 kbps. Numerical results also indicate a saving of 8 to 10% of the total bit rate.
Convention Paper 5573
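
The outer loop described in the abstract above can be pictured with a minimal sketch, assuming a simple rule that is not taken from the paper: a frame whose NMR exceeds the moving average of recent frames borrows from the bit reserve, and an easier frame deposits bits into it. The function name, the window length, and the `bits_per_db` scaling are hypothetical.

```python
def allocate_frame_bits(nmr_db, base_bits, reserve_max, window=4, bits_per_db=20):
    """Shift bits between frames through a bounded bit reserve (illustrative rule)."""
    reserve = 0
    history = []
    budgets = []
    for nmr in nmr_db:
        recent = history[-window:]
        # Moving average of recent NMR values; the first frame compares to itself.
        avg = sum(recent) / len(recent) if recent else nmr
        # Positive: harder than the recent average. Negative: easier.
        wanted = int(round((nmr - avg) * bits_per_db))
        if wanted > 0:
            extra = min(wanted, reserve)  # cannot borrow more than is saved
        else:
            extra = -min(-wanted, reserve_max - reserve, base_bits)
        reserve -= extra
        budgets.append(base_bits + extra)
        history.append(nmr)
    return budgets, reserve


if __name__ == "__main__":
    budgets, reserve = allocate_frame_bits([0, -5, -5, 10, 0], 1000, 200)
    print(budgets, reserve)
```

By construction the per-frame budgets plus the final reserve always add up to `base_bits` times the number of frames, so the average bit rate is never exceeded. An inner loop would then distribute each frame's budget across spectral lines.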

16:30 h
L6 Binaural Cue Coding Applied to Stereo and Multichannel Audio Compression – Christof Faller, Frank Baumgarte, Agere Systems, Murray Hill, NJ, USA

Binaural Cue Coding (BCC) is an efficient representation for spatial audio that can be applied to stereo and multi-channel audio compression. Conventional mono audio coders are enhanced with BCC for coding of stereo and multi-channel audio signals. Encoding stereo and multi-channel audio signals adds only a relatively small bit rate overhead to the bit rate of the mono audio coder alone. The presented implementations have low complexity and are suitable for real-time applications. Results from subjective tests suggest that the proposed scheme provides better audio quality for encoding of stereo audio signals than conventional perceptual transform audio coders over a wide range of bit rates.
Convention Paper 5574
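
To make the "mono coder plus small side information" idea concrete, here is a minimal sketch that transmits a mono downmix and a single broadband inter-channel level difference. This is a deliberate simplification and an assumption of this example: it uses one cue for the whole frame, whereas a scheme such as BCC estimates spatial cues per frequency band. The function names are hypothetical.

```python
import math


def encode_level_cue(left, right, eps=1e-12):
    """Return a mono downmix and the left/right level difference in dB."""
    mono = [0.5 * (l + r) for l, r in zip(left, right)]
    energy_left = sum(x * x for x in left) + eps
    energy_right = sum(x * x for x in right) + eps
    level_diff_db = 10.0 * math.log10(energy_left / energy_right)
    return mono, level_diff_db


def decode_level_cue(mono, level_diff_db):
    """Rebuild two channels from the downmix by applying level-based gains."""
    ratio = 10.0 ** (level_diff_db / 20.0)  # amplitude ratio left/right
    # Gains satisfy gain_left / gain_right = ratio and gain_left + gain_right = 2,
    # which is exact for a single in-phase, amplitude-panned source.
    gain_right = 2.0 / (1.0 + ratio)
    gain_left = ratio * gain_right
    return [gain_left * m for m in mono], [gain_right * m for m in mono]


if __name__ == "__main__":
    source = [math.sin(2.0 * math.pi * 5.0 * n / 256) for n in range(256)]
    left = [0.8 * s for s in source]
    right = [0.2 * s for s in source]
    mono, cue = encode_level_cue(left, right)
    left_out, right_out = decode_level_cue(mono, cue)
    print(cue)
    print(max(abs(a - b) for a, b in zip(left, left_out)))
```

The side information here is one number per frame, which is why the bit rate overhead on top of the mono coder can stay small. Signals with time or phase differences between channels are not reconstructed correctly by a level cue alone.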

17:00 h
L7 Why Binaural Cue Coding is Better Than Intensity Stereo Coding – Frank Baumgarte, Christof Faller, Agere Systems, Murray Hill, NJ, USA

Intensity Stereo Coding (ISC) is a joint-channel audio coding tool that is part of the ISO/MPEG standards. ISC can introduce severe distortions if applied to the full audio bandwidth or to audio signals with a dynamic or wide spatial image. In contrast, Binaural Cue Coding (BCC) is a systematic approach for representing auditory spatial cues which includes ISC as a subset. BCC is independent of the time/frequency resolution used by the coder and can therefore be optimized for spatial image reproduction. Subjective listening tests confirm that ISC is significantly compromised by an inappropriate time/frequency resolution and that BCC has superior quality and robustness.
Convention Paper 5575

17:30 h
L8 Analyzing Decompressed Audio with the “Inverse Decoder” - Toward an Operative Algorithm – Sascha Moehrs¹, Jürgen Herre¹, Ralf Geiger² - ¹Fraunhofer Institute for Integrated Circuits, Erlangen, Germany; ²Fraunhofer Institute for Integrated Circuits, Ilmenau, Germany

Perceptual audio coding of high-quality audio signals is now widely used. To reproduce the audio data, the decoding algorithm expands the bitstream into an uncompressed audio format. As shown previously, it is feasible to recover the encoding compression parameters from the decoded audio signal and even translate a decoded audio signal back into its original bitstream representation. This technique is referred to as “inverse decoding” and has several interesting applications, including tandem-resistant re-encoding of audio signals. The paper illustrates practical results obtained by the first working implementation of an “Inverse Decoder” based on the popular MP3 coder. The performance of the algorithm is evaluated in terms of reconstruction precision and computational complexity. Finally, algorithmic issues are discussed.
Convention Paper 5576