AES 116th Convention: PAPERS

v3.2, 20040408, ME

Session I Sunday, May 9 13:00 h–15:30 h
LOW BIT-RATE AUDIO CODING—PART 2
(focus on new technologies)
Chair: Markus Erne, Scopein Research, Aarau, Switzerland

I-1 Audio Coder Enhancement Using Scalable Binaural Cue Coding with Equalized Mixing—Frank Baumgarte, Christof Faller, Peter Kroon, Agere Systems, Allentown, PA, USA
A major application for Binaural Cue Coding (BCC) is multichannel audio coding. A previously proposed system combines a full-band BCC coder for spatial parameters with an audio coder for a down-mixed representation of the multichannel input. This paper presents a scalable hybrid coder combining a partial-band BCC as preprocessor and postprocessor with a subband coder. The hybrid system supports a gradual tradeoff of bit rate and spatial image ranging from transparent multichannel and stereo to full-band BCC. To avoid coloration from the required up- and downmixing within BCC, an equalized mixing scheme based on a binaural loudness model is proposed. Subjective tests and bit rate simulations confirm the expected benefits of the hybrid coder in the transition range from full-band BCC to stereo.
I-2 Spatial Decomposition of Time-Frequency Regions: Subbands or Sinusoids—Aki Härmä¹, Christof Faller²
¹ Helsinki University of Technology, Espoo, Finland
² Agere Systems, Allentown, PA, USA
Techniques where a stereo or a multichannel signal is decomposed into spatial source-labeled time-frequency slots by level, time-difference, and coherence metrics have become popular in recent years. Good examples are binaural cue coding and up/downmixing techniques. In this paper we will provide an overview and discuss parallel approaches in the field of array processing and blind source separation. Typically, time-frequency slots are formed from subband representations of signals. However, it is also possible to produce a similar spatial decomposition for a parametric representation (sinusoids, transients, and noise) of a stereo or multichannel audio signal. Advantages and disadvantages of the two approaches in audio coding applications are discussed in this paper.
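As a rough illustration of the level and coherence metrics mentioned in this abstract, the sketch below computes an inter-channel level difference and a normalized coherence for a single time-frequency slot. It is not taken from the paper: the function name `slot_cues`, the input layout (lists of complex subband samples), and the small `eps` regularizer are all assumptions made for this example.

```python
import math


def slot_cues(left, right, eps=1e-12):
    """Return (level difference in dB, coherence) for one time-frequency slot.

    left, right: equal-length sequences of complex subband samples
    (hypothetical layout chosen for this sketch).
    eps: small constant that avoids division by zero for silent slots.
    """
    # Energy of each channel within the slot.
    p_left = sum(abs(x) ** 2 for x in left)
    p_right = sum(abs(x) ** 2 for x in right)

    # Cross-correlation between the channels at zero lag.
    cross = sum(l * r.conjugate() for l, r in zip(left, right))

    # Level difference: positive when the right channel is louder.
    level_diff_db = 10.0 * math.log10((p_right + eps) / (p_left + eps))

    # Normalized coherence: close to 1 for fully correlated channels.
    coherence = abs(cross) / math.sqrt((p_left + eps) * (p_right + eps))
    return level_diff_db, coherence


if __name__ == "__main__":
    # Right channel is a half-amplitude copy of the left channel.
    print(slot_cues([1 + 0j, 1j], [0.5 + 0j, 0.5j]))
```

In this example the right channel is a scaled copy of the left, so the coherence is close to 1 and the level difference is about -6 dB. A time-difference cue would need a search over lags or a phase estimate, which this sketch leaves out.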
I-3 A Guideline to Audio Codec Delay—Manfred Lutzky¹, Gerald Schuller², Marc Gayer¹, Ulrich Krämer², Stefan Wabnik²
¹ Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany
² Fraunhofer Institute for Digital Media Technology IDMT, Ilmenau, Germany
Digital audio processing has been revolutionized by perceptual audio coding in the past decade. The main parameter to benchmark different codecs is the audio quality at a certain bit rate. For many applications, however, delay is another key parameter that varies between only a few and hundreds of milliseconds depending on the algorithmic properties of the codec. Latest research results in low-delay audio coding can significantly improve the performance of applications such as communications, digital microphones, and wireless loudspeakers with lip synchronicity to a video signal. This paper describes the delay sources and magnitude of the most common audio codecs and thus provides a guideline for the choice of the most suitable codec for a given application.
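To make the dependence of delay on algorithmic properties concrete, the following sketch adds up the sample counts a block-based codec must wait for before it can emit output. This is a simplified model assumed for illustration, not the paper's method: the function name and parameters are hypothetical, the frame sizes are example values only, and contributions such as a bit reservoir, processing time, or transmission are ignored.

```python
def algorithmic_delay_ms(sample_rate_hz, frame_samples,
                         overlap_samples=0, lookahead_samples=0):
    """Estimate algorithmic delay in milliseconds under a simplified model.

    frame_samples: samples collected before a frame can be encoded.
    overlap_samples: extra samples needed for overlapping windows.
    lookahead_samples: extra samples inspected ahead of the current frame.
    All parameter names are assumptions made for this sketch.
    """
    total_samples = frame_samples + overlap_samples + lookahead_samples
    return 1000.0 * total_samples / sample_rate_hz


if __name__ == "__main__":
    # Illustrative values only: a long-frame and a short-frame configuration.
    print(algorithmic_delay_ms(48000, 1024, overlap_samples=1024))
    print(algorithmic_delay_ms(48000, 128, overlap_samples=128))
```

Under this model, shrinking the frame and overlap from 1024 to 128 samples at 48 kHz reduces the estimate from roughly 43 ms to roughly 5 ms, which shows why frame length dominates the delay budget of transform codecs.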
I-4 Parametric Audio Coding Based Wavetable Synthesis—Marek Szczerba, Werner Oomen, Marc Klein Middelink, Philips Digital Systems Laboratories, Eindhoven, The Netherlands
For mobile applications memory and computational complexity requirements are very strict. Therefore, traditional wavetable/FM synthesis methods have to compromise between the number and the quality of instruments in the soundbank. This paper presents a wavetable synthesizer employing a parametric representation of the soundbank samples, sharing the advantages of both wavetable and parametric synthesis methods. The soundbank is compact and thus easy to store and transmit, and the sound quality can match that of traditional wavetable synthesis. Moreover, postprocessing of samples in a parametric representation—such as pitch change, filtering, and envelope—can be performed directly in the parametric domain, effectively reducing synthesizer complexity.
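The idea of changing pitch directly in the parametric domain can be sketched with a purely sinusoidal model. This is an assumption-laden illustration rather than the authors' synthesizer: the representation of a sample as a list of `(frequency_hz, amplitude, phase)` tuples and the helper names `shift_pitch` and `render` are hypothetical, and transients and noise components are not modeled.

```python
import math


def shift_pitch(partials, semitones):
    """Scale every partial's frequency by an equal-tempered pitch ratio.

    partials: list of (frequency_hz, amplitude, phase) tuples
    (hypothetical parametric representation for this sketch).
    """
    ratio = 2.0 ** (semitones / 12.0)
    return [(freq * ratio, amp, phase) for (freq, amp, phase) in partials]


def render(partials, sample_rate_hz, num_samples):
    """Synthesize samples by summing the sinusoidal partials."""
    return [
        sum(amp * math.sin(2.0 * math.pi * freq * n / sample_rate_hz + phase)
            for (freq, amp, phase) in partials)
        for n in range(num_samples)
    ]


if __name__ == "__main__":
    note = [(440.0, 1.0, 0.0), (880.0, 0.5, 0.0)]
    octave_up = shift_pitch(note, 12)
    print(octave_up)
    print(render(octave_up, 8000, 4))
```

Because the pitch change only rescales stored frequencies, no resampling of waveform data is needed, which is the kind of complexity saving the abstract describes.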
I-5 Removal of Birdie Artifact in Perceptual Audio Coders—Vinod Prakash, Anil Kumar, Preethi Konda, Sarat Chandra Vadapalli, Ittiam Systems Pvt. Ltd., Bangalore, India
The birdie artifact is the predominant factor affecting audio quality of perceptual coders operating at very low bit rates. Conventional approaches to overcome the birdie artifact involve use of low-pass filters to reduce the amount of signal to quantize. This approach does not eliminate the birdie artifact if the effect is seen in the in-band components. This paper proposes a new algorithm to overcome the birdie artifact and hence improve the audio quality. The proposed algorithm modifies the bit allocation strategy such that the critical bands are preserved, while still maintaining the perceptual distortion criteria. Results of spectrogram analysis are presented.