AES Conventions and Conferences


v3.0, 20040325, ME

Session Z10 Monday, May 10 13:00 h–14:30 h
Posters: Low Bit-Rate Audio Coding

Z10-1 Optimal Bit Allocation Strategy for Perceptual Audio Coders Employing Uniform Quantization Schemes—Preethi Konda, Vinod Prakash, Ittiam Systems Pvt. Ltd., Bangalore, India
Using the perceptual distortion metric returned by the psychoacoustic module, conventional bit allocation schemes operate iteratively to maintain equal perceptual distortion in all critical bands. For codecs employing uniform quantization schemes, this paper proposes a new approach to determine the optimal MNR (Mask-to-Noise Ratio) levels for the critical bands. The scheme exploits the fact that the quantizer used is uniform in nature and all critical bands are equally distorted, to arrive at a noniterative solution. Additionally, this method is independent of the target bit-rate. The proposed scheme achieves a 2- to 3-times reduction in the complexity of the quantization block. An example application for this scheme is given with reference to the MPEG-2 Layer 1 and 2 encoder.
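As a rough illustration of the contrast drawn in this abstract, the sketch below places a conventional iterative loop next to a closed-form allocation. It is not the authors' algorithm: it assumes the textbook rule that a uniform quantizer gains about 6.02 dB of SNR per bit, defines MNR as SNR minus SMR, and, unlike the scheme in the paper, still takes the bit budget as an input. The function names, parameters, and sample values are hypothetical.

```python
def greedy_allocation(smr_db, samples_per_band, bit_budget, db_per_bit=6.02):
    """Conventional iterative loop: keep giving one more bit per sample to the
    band whose mask-to-noise ratio (MNR = SNR - SMR) is currently lowest.
    samples_per_band must hold positive integers."""
    bits = [0] * len(smr_db)
    remaining = bit_budget
    while True:
        # Bands that can still afford one more bit for each of their samples.
        affordable = [i for i, n in enumerate(samples_per_band) if n <= remaining]
        if not affordable:
            break
        worst = min(affordable, key=lambda i: db_per_bit * bits[i] - smr_db[i])
        bits[worst] += 1
        remaining -= samples_per_band[worst]
    return bits


def equal_mnr_allocation(smr_db, samples_per_band, bit_budget, db_per_bit=6.02):
    """Closed form under the assumed 6.02 dB/bit rule: choose one common MNR so
    that sum(n_i * b_i) equals the budget, with b_i = (SMR_i + MNR) / 6.02.
    Returns (mnr_db, real-valued bits per sample); no clamping or rounding."""
    total_samples = sum(samples_per_band)
    weighted_smr = sum(n * s for n, s in zip(samples_per_band, smr_db))
    mnr_db = (db_per_bit * bit_budget - weighted_smr) / total_samples
    bits = [(s + mnr_db) / db_per_bit for s in smr_db]
    return mnr_db, bits


if __name__ == "__main__":
    smr = [20.0, 10.0, 0.0]      # hypothetical signal-to-mask ratios in dB
    samples = [1, 1, 1]          # hypothetical samples per band
    print(greedy_allocation(smr, samples, 12))
    print(equal_mnr_allocation(smr, samples, 12))
```

The closed form returns fractional and possibly negative bit counts, so a practical encoder would still need to round and clamp them; the sketch only shows why equal distortion with a uniform quantizer admits a direct solution.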
Z10-2 Embedded Speech Codec Based on Speex—Md. Kamaruzzaman, Hervé Taddei, Siemens AG, Munich, Germany
Embedded speech coding is of interest for many applications, such as VoIP, multimedia broadcasting, and video conferencing. We propose a CELP-based embedded speech codec that operates on both narrowband and wideband speech signals. Our three-layered embedded codec, which is based on the Speex codec, offers three bit rates. The innovation vectors of the higher layers are embedded in the innovation vector of the lowest layer. All speech coding parameters except the innovation vector are shared between the lowest layer and the higher layers. In our algorithm, the higher bit rates are rewarded with better quality at the expense of the lowest bit rate.
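The general idea of an embedded (layered) bitstream can be sketched with successive-refinement scalar quantization, where each layer encodes what the previous layers left over and a decoder may stop after any layer. This is a generic illustration only: it does not model Speex, CELP codebooks, or the specific embedding of innovation vectors described in the abstract, and the step sizes and function names are assumptions.

```python
def encode_layers(target, steps):
    """Quantize `target` in successive layers, one step size per layer.
    Each layer stores integer indices for the residual left by earlier layers."""
    residual = list(target)
    layers = []
    for step in steps:
        indices = [round(r / step) for r in residual]
        layers.append(indices)
        residual = [r - i * step for r, i in zip(residual, indices)]
    return layers


def decode_layers(layers, steps, num_layers):
    """Rebuild the vector from the first `num_layers` layers only."""
    out = [0.0] * len(layers[0])
    for indices, step in zip(layers[:num_layers], steps):
        out = [o + i * step for o, i in zip(out, indices)]
    return out


if __name__ == "__main__":
    innovation = [0.9, -0.35, 0.12, 0.6]   # hypothetical excitation samples
    steps = [0.5, 0.125, 0.03125]          # hypothetical per-layer step sizes
    layers = encode_layers(innovation, steps)
    for k in range(1, len(steps) + 1):
        approx = decode_layers(layers, steps, k)
        err = max(abs(a - b) for a, b in zip(innovation, approx))
        print(k, "layer(s): max error", err)
```

Decoding only the first layer corresponds to the lowest bit rate; each additional layer refines the same vector, so the bitstream can be truncated at a layer boundary.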
Z10-3 A Memory and Computationally Efficient Synthesis Sub-band Filter for MPEG Audio Decoding—Mahabaleswara Bhatt, India Product Development Center, Analog Devices, Bangalore, India
This paper proposes a novel method for memory and computationally efficient implementation of a sub-band synthesis filter for MPEG audio decoding. In contrast to the conventional approach, the derived approach computes 64 sets of windowing operations at the beginning, each with eight input samples and four re-arranged window coefficients. Subsequently, these windowed sequences are used for two matrixing operations. The proposed fast algorithm exploits not only the DCT relationship for the matrixing operations but also procedure pruning for the required DCT coefficient computations. Moreover, the windowing operations make use of the symmetry that exists in the window coefficient array. Additionally, the derived approach eliminates the intermediate arrays and the explicit filtering operation by merging them into the windowing and matrixing operations themselves. This reduces both the memory requirement and the data transfers involved in the computation.
Z10-4 Transient Detection for Transform Domain Coders—Venkata Suresh Babuu, Ashish Kumar Malot, Vijayachandran V. M., Vinay M. K., Emuzed India Pvt. Ltd., Bangalore, India
State-of-the-art audio encoders are based on transform-domain coding algorithms. Due to time-frequency uncertainty, transform-domain coders suffer from “pre-echo” and “diffusion” artifacts during transient portions of the signal. These artifacts occur because of the large transform lengths used to achieve higher coding gains. Audio encoders employ various tools, such as adaptive transform lengths and TNS, to code transient portions of the audio signal efficiently. Typically, audio signals contain time-domain transients (e.g., castanets), frequency-domain transients (e.g., flute, clarinet), and transients observed in speech signals during consonant-to-vowel transitions. Identifying these transients in an audio signal is vital to achieving good perceptual quality at low bit rates. This paper discusses the various transient classes present in audio signals and describes a transient detector designed to model all of these classes efficiently. The proposed transient detector has been incorporated in an MPEG-4 AAC encoder, independent of the psychoacoustic analysis methodology used. Listening tests as well as OPERA scores indicate substantial improvement in audio quality over the baseline encoder.
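For the time-domain class of transients mentioned above, a common baseline is a block-energy detector that flags a block when its energy jumps well above a smoothed estimate of recent energy. The sketch below shows that baseline only; it is not the detector proposed in the paper, it does not handle frequency-domain or speech transients, and the block size, threshold, and smoothing factor are assumed values.

```python
def detect_transients(samples, block_size=128, ratio_threshold=8.0,
                      smoothing=0.9, floor=1e-9):
    """Return indices of blocks whose energy exceeds `ratio_threshold` times
    a smoothed estimate of the energy of the preceding blocks."""
    energies = [
        sum(x * x for x in samples[start:start + block_size])
        for start in range(0, len(samples) - block_size + 1, block_size)
    ]
    flagged = []
    if not energies:
        return flagged
    smoothed = energies[0]          # the first block has no history to compare with
    for index, energy in enumerate(energies[1:], start=1):
        if energy > ratio_threshold * max(smoothed, floor):
            flagged.append(index)
        # Update the running estimate after the decision for this block.
        smoothed = smoothing * smoothed + (1.0 - smoothing) * energy
    return flagged


if __name__ == "__main__":
    signal = [0.01] * 1024          # hypothetical low-level steady input
    signal[520] = 1.0               # a single click, standing in for an attack
    print(detect_transients(signal))
```

In a transform coder, a flagged block would typically trigger a switch to shorter transform windows for the affected frame, which limits how far pre-echo can spread in time.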
Z10-5 Signal-Adaptive Parametric Modeling for High Quality Low Bit-Rate Audio Coding—Pedro Vera-Candeas¹, Nicolas Ruiz-Reyes¹, Manuel Rosa-Zurera², Jose Curpián-Alonso¹, Pedro Jesús Reche-López¹
¹ University of Jaén, Linares, Spain
² University of Alcalá, Alcalá de Henares, Madrid, Spain
In this paper, signal-adaptive parametric models based on over-complete dictionaries of time-frequency atoms are considered for high-quality low bit-rate parametric audio coding. There are a variety of frameworks for deriving over-complete signal expansions, which differ in the structure of the dictionary and the manner in which dictionary atoms are selected for the expansion. Psychoacoustic-adapted matching pursuits are used to extract sinusoidal components with a harmonic dictionary, while energy-adapted matching pursuits are used for transient modeling with a wavelet-based dictionary. First, transients are detected, modeled (with wavelet functions), and removed from the original audio signal, leaving a residue. Then, sinusoids are modeled using complex exponential functions and removed from the initial residue, leaving a noise-like signal. This final residue is modeled by taking advantage of the good time-frequency localization of the wavelet transform and by considering psychoacoustic principles. An M-depth wavelet transform is first applied to the residue. The energy of each wavelet sub-band is then computed, and finally a Temporal Noise Shaping (TNS) process is applied to each one, which involves a parametric model for the noise-like signal. The resulting multipart model (Sines + Transients + Noise) is applied efficiently by taking psychoacoustic information into account for audio coding purposes. The combination of all these ideas results in nearly transparent parametric audio coding at bit rates close to 16 kbps for most of the CD-quality one-channel audio signals considered for testing. Listening tests indicate that our coder achieves better results than MPEG-4 AAC at very low bit rates (close to 16 kbps).
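The atom selection underlying this approach can be illustrated with plain matching pursuit: at each step, pick the dictionary atom most correlated with the current residue and subtract its contribution. The sketch below is the basic greedy algorithm only, assuming unit-norm real-valued atoms; it omits the psychoacoustic and energy adaptations, the harmonic and wavelet dictionaries, and the noise model described in the abstract, and the tiny dictionary is hypothetical.

```python
import math


def matching_pursuit(signal, atoms, num_iterations):
    """Greedy decomposition of `signal` over unit-norm `atoms`.
    Returns (list of (atom_index, coefficient), final residue)."""
    residue = list(signal)
    picks = []
    for _ in range(num_iterations):
        # Correlate the residue with every atom in the dictionary.
        products = [sum(r * a for r, a in zip(residue, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(products[i]))
        coefficient = products[best]
        picks.append((best, coefficient))
        # Remove the selected component from the residue.
        residue = [r - coefficient * a for r, a in zip(residue, atoms[best])]
    return picks, residue


if __name__ == "__main__":
    s = 1.0 / math.sqrt(2.0)
    dictionary = [[1.0, 0.0], [0.0, 1.0], [s, s]]   # over-complete in 2-D
    picks, residue = matching_pursuit([2.0, 2.0], dictionary, 1)
    print(picks, residue)
```

Because the dictionary is over-complete, a signal aligned with the diagonal atom is captured in a single step, whereas the two-atom standard basis would need two coefficients; this sparsity is what parametric coders try to exploit.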
Z10-6 Decoder-Based Approach to Enhance Low Bit-Rate Audio—Evelyn Kurniawati¹, Chiew Tong Lau¹, Benjamin Premkumar¹, Javed Absar², Sapna George²
¹ Nanyang Technological University, Singapore
² ST Microelectronics Asia Pacific Pte. Ltd., Singapore
A method to improve the PSNR of a perceptual audio coder is presented. It is based on the use of a noise estimator at the decoder side to relate the quantization parameters and the quantization error. The quartic equation established contains two real roots, of which one is the desired spectral value. This value contains less quantization error than the dequantized spectral value of a normal decoder. This leads to an improvement of up to 12 dB in SNR without a significant increase in decoder complexity.
Z10-7 Efficient Intraframe Coding of Monophonic Audio—Aníbal Ferreira, University of Porto / INESC, Porto, Portugal
This paper describes the design of an Advanced Audio Spectral Coder (ASC) that seeks: coding efficiency by combining source and perceptual audio coding techniques; bitstream semantic scalability by segmenting the audio signal into transients, sinusoids and noise; low delay coding by using a moderate transform size and no bit stream buffer; and embedded error robustness by not using interframe coding. The operation of ASC is explained, its performance is assessed using a few test results, and potential application areas are also addressed.


(C) 2004, Audio Engineering Society, Inc.