120th AES Convention - Paris, France - Dates: Saturday May 20 - Tuesday May 23, 2006 - Porte de Versailles

Last Updated: 20060403, mei

P16 - Low Bit-Rate Audio Coding, Part 1

Monday, May 22, 08:40 — 12:20

Chair: Mark Vinton, Dolby Laboratories - San Francisco, CA, USA

P16-1 The Relationship between Selected Artifacts and Basic Audio Quality in Perceptual Audio Codecs
Paulo Marins, Francis Rumsey, Slawomir Zielinsky, University of Surrey - Guildford, Surrey, UK
Up to this point, perceptual audio codecs have been evaluated according to ITU-R standards such as BS.1116-1 and BS.1534-1. The majority of these tests tend to measure the performance of audio codecs using only one perceptual attribute, namely the basic audio quality. This approach, although effective in terms of the assessment of the overall performance of codecs, does not provide any further information about the perceptual importance of different artifacts inherent to low bit-rate audio coding. Therefore in this study an alternative style of listening test was performed, investigating not only basic audio quality but also the perceptual significance of selected audio artifacts. The choice of the artifacts included in this investigation was inspired by the CD-ROM published by the AES Technical Committee on Audio Coding entitled “Perceptual Audio Coders: What to Listen For.”

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 08:40
Convention Paper 6745

P16-2 Improved Noise Weighting in CELP Coding of Speech - Applying the Vorbis Psychoacoustic Model to Speex
Jean-Marc Valin, CSIRO ICT Centre - Epping, New South Wales, Australia; Xiph.Org Foundation; Christopher Montgomery, Red Hat - Westford, MA, USA; Xiph.Org Foundation
One key aspect of the CELP algorithm is that it shapes the coding noise using a simple, yet effective, weighting filter. In this paper we improve the noise shaping of CELP using a more modern psychoacoustic model. This has the significant advantage of improving the quality of an existing codec without the need to change the bit-stream. More specifically, we improve the Speex CELP codec by using the psychoacoustic model used in the Vorbis audio codec. The results show a significant increase in quality, especially at high bit-rates, where the improvement is equivalent to a 20 percent reduction in bit-rate. The technique itself is not specific to Speex and could be applied to other CELP codecs.

Presentation is scheduled to begin at 09:00
Convention Paper 6746
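
As an illustration of the "simple, yet effective, weighting filter" mentioned in the P16-2 abstract, the sketch below implements the classic CELP form W(z) = A(z/g1) / A(z/g2), where A(z) is the linear prediction polynomial. This is not taken from the paper or from the Speex source: the function names are hypothetical, and the values g1 = 0.9 and g2 = 0.6 are illustrative assumptions only.

```python
def bandwidth_expand(a, gamma):
    """Return the coefficients of A(z/gamma): a[k] is scaled by gamma**k."""
    return [c * gamma ** k for k, c in enumerate(a)]


def filter_iir(b, a, x):
    """Direct-form IIR filter: y[n] = (sum b[k] x[n-k] - sum a[k] y[n-k]) / a[0]."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


def perceptual_weighting(a, x, g1=0.9, g2=0.6):
    """Filter x with W(z) = A(z/g1) / A(z/g2).

    a  -- LPC polynomial [1, a1, ..., ap] (assumed already estimated)
    g1 -- numerator bandwidth expansion factor (illustrative assumption)
    g2 -- denominator bandwidth expansion factor (illustrative assumption)
    """
    return filter_iir(bandwidth_expand(a, g1), bandwidth_expand(a, g2), x)


if __name__ == "__main__":
    # Impulse response of W(z) for a first-order predictor A(z) = 1 - 0.9 z^-1.
    print(perceptual_weighting([1.0, -0.9], [1.0, 0.0, 0.0]))
```

When g1 equals g2 the numerator and denominator cancel and the filter passes the signal unchanged; the further apart the two factors are, the more the coding noise is allowed to follow the spectral envelope of the speech.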

P16-3 Reduced Bit Rate Ultra Low Delay Audio Coding
Stefan Wabnik, Gerald Schuller, Jens Hirschfeld, Ulrich Krämer, Fraunhofer Institute for Digital Media Technology (IDMT) - Ilmenau, Germany
An audio coder with a very low delay (6 to 8 ms) for reduced bit rates is presented. Previous coder versions were based on backward adaptive coding, which has suboptimal noise shaping capabilities for reduced rate coding. We propose to use a different noise shaping method instead, resulting in an approach that uses forward adaptive predictive coding. We will show that, in comparison, the forward adaptive method has the following advantages: it is more robust against high quantization errors, has additional noise shaping capabilities, has a better ability to obtain a constant bit rate, and shows improved error resilience.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 09:20
Convention Paper 6747

P16-4 Real-Time Subband-ADPCM Low-Delay Audio Coding Approach
Florian Keiler, Thomson Corporate Research - Hannover, Germany
A low-delay audio codec using the adaptive differential pulse code modulation (ADPCM) structure in subbands is presented. With the use of eight subbands a coarse spectral shaping of the coding noise is obtained and the signal delay is approximately 3 ms. The targeted bit rate is in the range of 128 to 176 kbit/s per channel for near transparent audio quality. The codec uses a cosine-modulated filter bank and backward adaptive calculation of the prediction coefficients and quantization scaling factors. The computations are optimized for a real-time implementation on a fixed-point DSP with an almost constant workload over time. A comparison with the Philips Subband Coder (SBC) and the Fraunhofer Ultra Low Delay Codec (ULD) is performed.

Presentation is scheduled to begin at 09:40
Convention Paper 6748

P16-5 Scalable Bitplane Runlength Coding
Chris Dunn, Scala Technology Ltd. - London, UK
Low-complexity audio compression offering fine-grain bit rate scalability can be realized with bitplane runlength coding. Adaptive Golomb codes are computationally simple runlength codes that allow bitplane runlength coding to achieve notable coding efficiency. For multiblock audio frames, coefficient interleaving prior to bitplane runlength coding results in a substantial increase in coding efficiency. It is shown that bitplane runlength coding is more compact than the best known SPIHT arrangement for audio bitplane coding and achieves coding efficiency that is competitive with fixed-rate quantization.

Presentation is scheduled to begin at 10:00
Convention Paper 6749
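
To make the idea of bitplane runlength coding in the P16-5 abstract more concrete, here is a minimal sketch that extracts one bitplane from integer coefficients, measures the runs of zeros, and codes each run with a Rice code (a Golomb code whose parameter is a power of two). It is a simplification under stated assumptions: the parameter k is fixed rather than adaptive, no coefficient interleaving is performed, and all function names are hypothetical rather than taken from the paper.

```python
def bitplane(coeffs, plane):
    """Return bit number `plane` of each coefficient magnitude."""
    return [(abs(c) >> plane) & 1 for c in coeffs]


def zero_runs(bits):
    """Split a bitplane into zero-run lengths, each terminated by a 1.

    Returns (runs, tail), where tail is the number of trailing zeros
    that are not followed by a 1.
    """
    runs, run = [], 0
    for b in bits:
        if b:
            runs.append(run)
            run = 0
        else:
            run += 1
    return runs, run


def rice_encode(n, k):
    """Rice code of n >= 0: unary quotient, a '0' stop bit, k-bit remainder."""
    q, r = n >> k, n & ((1 << k) - 1)
    remainder = format(r, "0{}b".format(k)) if k else ""
    return "1" * q + "0" + remainder


def rice_decode(code, k):
    """Inverse of rice_encode for a single codeword."""
    q = code.index("0")
    r = int(code[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r


if __name__ == "__main__":
    coeffs = [5, 0, 0, 2, 0, 7, 0, 0, 0, 1]
    runs, tail = zero_runs(bitplane(coeffs, 2))
    print(runs, tail, [rice_encode(n, 1) for n in runs])
```

Coding the most significant bitplanes first is what gives such a scheme its fine-grain scalability: truncating the bit stream simply drops the least significant planes.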

P16-6 Scalable Audio Coding with Iterative Auditory Masking
Christophe Veaux, Pierrick Philippe, France Telecom R&D - Cesson-Sévigné, France
This paper investigates how to reduce the cost of scalability. A coding scheme based on a cascaded MDCT is presented, for which masking thresholds are iteratively calculated from the transform coefficients quantized at previous layers. As a result, the masking thresholds are updated at the decoder in the same way as at the encoder without the need to transmit explicit information such as scale factors. By eliminating this overhead, this approach significantly improves the coding efficiency. It is also shown that further improvements are made possible by allowing the transmission of some side information depending on the frame or on the layer.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 10:20
Convention Paper 6750

P16-7 A Frequency-Domain Framework for Spatial Audio Coding Based on Universal Spatial Cues
Michael Goodwin, Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA
Spatial audio coding (SAC) addresses the emerging need to efficiently represent high-fidelity multichannel audio. The SAC methods previously described involve analyzing the input audio for interchannel relationships, encoding a downmix signal with these relationships as side information, and using the side data at the decoder for spatial rendering. These approaches are channel-centric in that they are generally designed to reproduce the input channel content over the same output channel configuration. In this paper we propose a frequency-domain SAC framework based on the perceived spatial audio scene rather than on the channel content. We propose time-frequency spatial direction vectors as cues to describe the input audio scene, present an analysis method for robust estimation of these cues from arbitrary multichannel content, and discuss the use of the cues to achieve accurate spatial decoding and rendering for arbitrary output systems.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 10:40
Convention Paper 6751

P16-8 Parametric Joint-Coding of Audio Sources
Christof Faller, EPFL - Lausanne, Switzerland
The following coding scenario is addressed. A number of audio source signals need to be transmitted or stored for the purpose of mixing stereo, multichannel surround, wave field synthesis, or binaural signals after decoding the source signals. The proposed technique offers significant coding gain when jointly coding the source signals, compared to separately coding them, even when no redundancy is present between the source signals. This is possible by considering statistical properties of the source signals, properties of mixing techniques, and spatial perception. The sum of the source signals is transmitted plus the statistical properties that determine the spatial cues at the mixer output. Subjective evaluation indicates that the proposed scheme achieves high audio quality.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 11:00
Convention Paper 6752

P16-9 Improved Time Delay Analysis/Synthesis for Parametric Stereo Audio Coding
Christophe Tournery, Christof Faller, EPFL - Lausanne, Switzerland
For parametric stereo and multichannel audio coding, it has been proposed to use level difference, time difference, and coherence cues between audio channels to represent the perceptual spatial features of stereo and multichannel audio signals. In practice, it has turned out that high audio quality can already be achieved by considering only level difference and coherence cues. Time difference cue analysis/synthesis did not contribute much to audio quality and even decreased it when not done properly. However, for binaural audio signals, e.g., binaural recordings or signals mixed with HRTFs, time differences play an important role. We investigate problems of time difference analysis/synthesis with such critical signals and propose an algorithm for improving it. A subjective evaluation indicates significant improvement over our previous time difference analysis/synthesis.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 11:20
Convention Paper 6753
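
The three cues named in the P16-9 abstract (level difference, time difference, coherence) can be illustrated with a small sketch. It is a textbook-style estimate, not the algorithm proposed in the paper: it works on one full-band time-domain frame rather than on subbands, takes the time difference as the lag that maximizes the cross-correlation, and assumes both channels have nonzero energy. All function names are hypothetical.

```python
import math


def xcorr(x, y, lag):
    """Cross-correlation sum of x[n] * y[n + lag] over the valid overlap."""
    n = len(x)
    return sum(x[i] * y[i + lag] for i in range(max(0, -lag), min(n, n - lag)))


def spatial_cues(left, right, max_lag):
    """Estimate (level difference in dB, time difference in samples, coherence).

    Assumes equal-length frames with nonzero energy in both channels.
    A positive time difference means the right channel lags the left.
    """
    e_left = sum(v * v for v in left)
    e_right = sum(v * v for v in right)
    level_db = 10.0 * math.log10(e_left / e_right)
    lags = range(-max_lag, max_lag + 1)
    best_lag = max(lags, key=lambda d: xcorr(left, right, d))
    coherence = xcorr(left, right, best_lag) / math.sqrt(e_left * e_right)
    return level_db, best_lag, coherence


if __name__ == "__main__":
    left = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0]
    # Right channel: the left channel delayed by 3 samples at half amplitude.
    right = [0, 0, 0, 0, 0, 0.5, 1, 0.5, 0, 0, 0, 0]
    print(spatial_cues(left, right, max_lag=5))
```

In the usage example the right channel is a scaled, delayed copy of the left, so the estimate yields a level difference of about 6 dB, a time difference of 3 samples, and a coherence of 1.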

P16-10 Closing the Gap between the Multichannel and the Stereo Audio World: Recent MP3 Surround Extensions
Bernhard Grill, Oliver Hellmuth, Johannes Hilpert, Jürgen Herre, Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
After more than 10 years of commercial availability, MP3 continues to be the most utilized format for compressed audio. New technologies extend the use from stereo to multichannel applications. Presented in 2004, the MP3 Surround format allows representation of high-quality 5.1 surround sound at bit rates so far used for stereo signals while remaining compatible with any MP3 playback device. Recently, add-on technologies have complemented the usability of MP3 Surround. The capability of spatializing stereo content into MP3 Surround files provides listener envelopment also for the reproduction of legacy stereo content. Improved spatial reproduction is offered by the auralized reproduction of MP3 Surround via regular stereo headphones. This paper describes the underlying concepts and the interworking of the technology components.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 11:40
Convention Paper 6754

P16-11 Design for High Frequency Adjustment Module in MPEG-4 HE-AAC Encoder Based on Linear Prediction Method
Han-Wen Hsu, Yung-Cheng Yang, Chi-Min Liu, Wen-Chieh Lee, National Chiao Tung University - Hsinchu, Taiwan
The high-frequency adjustment module is the kernel module of spectral band replication (SBR) in MPEG-4 HE-AAC. Its objective is to recover the tonality of the reconstructed high-frequency band. There are two crucial issues: the accurate measurement of tonality and the decision of shared control parameters. The control parameters, which are extracted according to signal tonalities, are used to determine the gain control and the energy level of additional components in the decoder. In other words, the quality of the reconstructed signal is directly related to the high-frequency adjustment module. In this paper an efficient method based on the Levinson-Durbin algorithm is proposed to measure tonality by a linear prediction approach with adaptive orders to fit different subband contents. Furthermore, the artifact due to the sharing of control parameters is investigated, and an efficient decision criterion for the control parameters is proposed.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 12:00
Convention Paper 6755
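
The Levinson-Durbin algorithm named in the P16-11 abstract can be sketched as follows. The recursion itself is standard; using the prediction gain as a tonality indicator is an assumption made here for illustration and is not necessarily the measure used in the paper. The sketch works on a real-valued time-domain frame with a fixed order, and the function names are hypothetical.

```python
import math


def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (no windowing)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for the predictor A(z) = 1 + a1 z^-1 + ...

    Returns (a, err): the coefficient list with a[0] == 1 and the
    residual (prediction error) energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # signal already perfectly predicted (or silent)
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        refl = -acc / err  # reflection coefficient of stage m
        new_a = a[:]
        for k in range(1, m):
            new_a[k] = a[k] + refl * a[m - k]
        new_a[m] = refl
        a = new_a
        err *= 1.0 - refl * refl
    return a, err


def prediction_gain_db(x, order):
    """Prediction gain in dB; higher values suggest a more tonal frame.

    Using this as a tonality indicator is an assumption of this sketch.
    """
    r = autocorr(x, order)
    if r[0] <= 0.0:
        return 0.0
    _, err = levinson_durbin(r, order)
    return float("inf") if err <= 0.0 else 10.0 * math.log10(r[0] / err)


if __name__ == "__main__":
    tone = [math.sin(0.3 * n) for n in range(256)]
    print(prediction_gain_db(tone, 2))
```

A sinusoid is almost perfectly predictable from two past samples, so its prediction gain is high, whereas a noise-like frame leaves most of its energy in the residual.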

  (C) 2006, Audio Engineering Society, Inc.