AES 116th Convention: POSTERS

v3.0, 20040325, ME

Session Z10 Monday, May 10 13:00 h–14:30 h
Posters: Low Bit-Rate Audio Coding

Z10-1 Optimal Bit Allocation Strategy for Perceptual Audio Coders Employing Uniform Quantization Schemes—Preethi Konda, Vinod Prakash, Ittiam Systems Pvt. Ltd., Bangalore, India
Using the perceptual distortion metric returned by the psychoacoustic module, conventional bit allocation schemes operate iteratively to maintain equal perceptual distortion in all critical bands. For codecs employing uniform quantization schemes, this paper proposes a new approach to determine the optimal MNR (Mask-to-Noise Ratio) levels for the critical bands. The scheme exploits the fact that the quantizer used is uniform in nature and all critical bands are equally distorted, to arrive at a noniterative solution. Additionally, this method is independent of the target bit-rate. The proposed scheme achieves a 2- to 3-times reduction in the complexity of the quantization block. An example application for this scheme is given with reference to the MPEG-2 Layer 1 and 2 encoder.
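As a rough illustration of how a noniterative allocation can work for uniform quantizers, the sketch below uses the textbook rule that each added bit buys about 6.02 dB of SNR. It is not the scheme from the paper: the function names, the example signal-to-mask ratios (SMR), and the bit budget are all assumptions made for this example, and the result is left as real-valued bits that may be negative for heavily masked bands.

```python
# Minimal sketch, assuming SNR_i ~= 6.02 * b_i dB for a uniform quantizer.
# Function names and example values are hypothetical, not taken from the paper.

DB_PER_BIT = 6.02  # approximate SNR gain per bit of a uniform quantizer


def allocate_bits(smr_db, total_bits):
    """Closed-form allocation that equalizes MNR across bands.

    MNR_i = SNR_i - SMR_i, so equal MNR with a fixed bit budget gives
    b_i = total_bits / N + (SMR_i - mean(SMR)) / 6.02.
    Bits are real-valued and unclamped; a real encoder would still have
    to round them and handle negative values.
    """
    n = len(smr_db)
    mean_smr = sum(smr_db) / n
    return [total_bits / n + (s - mean_smr) / DB_PER_BIT for s in smr_db]


def mnr_db(smr_db, bits):
    """Mask-to-noise ratio per band under the 6.02 dB-per-bit assumption."""
    return [DB_PER_BIT * b - s for s, b in zip(smr_db, bits)]


smr = [18.0, 12.0, 6.0, 0.0]      # hypothetical SMR per band, in dB
bits = allocate_bits(smr, total_bits=16)
mnr = mnr_db(smr, bits)
print(bits)
print(mnr)
```

Bands with a higher SMR receive more bits, and every band ends up with the same MNR in a single pass, with no search loop over candidate allocations.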
Z10-2 Embedded Speech Codec Based on Speex—Md. Kamaruzzaman, Hervé Taddei, Siemens AG, Munich, Germany
Embedded speech coding is of interest for many applications such as VoIP, multimedia broadcasting, and video conferencing. We propose a CELP-based embedded speech codec that operates on both narrowband and wideband speech signals. Our three-layered embedded codec, which is based on the Speex codec, offers three bit rates. The innovation vectors of the higher layers are embedded in the innovation vector of the lowest layer. All speech coding parameters except the innovation vector are shared between the lowest layer and the higher layers. In our algorithm, the higher bit rates are rewarded with better quality at the expense of the lowest bit rate.
Z10-3 A Memory and Computationally Efficient Synthesis Sub-band Filter for MPEG Audio Decoding—Mahabaleswara Bhatt, India Product Development Center, Analog Devices, Bangalore, India
This paper proposes a novel method for a memory-efficient and computationally efficient implementation of a sub-band synthesis filter for MPEG audio decoding. In contrast to the conventional approach, the derived approach first computes 64 sets of windowing operations, each with eight input samples and four re-arranged window coefficients. These windowed sequences are then used for two matrixing operations. The proposed fast algorithm exploits not only the DCT relationship for the matrixing operations but also procedure pruning for the required DCT coefficient computations. Moreover, the windowing operations make use of the symmetry that exists in the window coefficient array. Additionally, the derived approach eliminates the intermediate arrays and the explicit filtering operation by merging them into the windowing and matrixing operations themselves. This reduces both the memory requirement and the data transfers involved in the computation.
Z10-4 Transient Detection for Transform Domain Coders—Venkata Suresh Babuu, Ashish Kumar Malot, Vijayachandran V. M., Vinay M. K., Emuzed India Pvt. Ltd., Bangalore, India
State-of-the-art audio encoders are based on transform-domain coding algorithms. Due to time-frequency uncertainty, transform-domain coders suffer from “pre-echo” and “diffusion” artifacts during transient portions of the signal. These artifacts occur because of the large transform lengths used to achieve higher coding gains. Audio encoders employ various tools, such as adaptive transform lengths and TNS, to code transient portions of the audio signal efficiently. Typically, audio signals contain time-domain transients (e.g., castanets), frequency-domain transients (e.g., flute, clarinet), and transients observed in speech signals during consonant-to-vowel transitions. Identifying these transients in an audio signal is vital for achieving perceptual quality at low bit rates. This paper discusses the various transient classes present in audio signals and describes a transient detector employed for efficient modeling of all classes of transients. The proposed transient detector has been incorporated in an MPEG-4 AAC encoder, independent of the psychoacoustic analysis methodology used. Listening tests as well as OPERA scores indicate substantial improvement in audio quality over the baseline encoder.
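For readers unfamiliar with transient detection, the simplest form is an energy-ratio test over short sub-blocks of a frame, sketched below. This is a generic illustration and not the detector described in the paper: it only catches time-domain attacks (the castanet-like class), and the function name, block count, and threshold are assumptions chosen for the example.

```python
import math

# Minimal sketch of an energy-ratio attack detector.
# detect_transient, num_blocks and ratio_threshold are hypothetical choices.


def detect_transient(frame, num_blocks=8, ratio_threshold=10.0):
    """Return the index of the first sub-block whose energy exceeds
    ratio_threshold times the mean energy of the preceding sub-blocks,
    or None if no such jump is found."""
    block_len = len(frame) // num_blocks
    energies = [
        sum(x * x for x in frame[i * block_len:(i + 1) * block_len])
        for i in range(num_blocks)
    ]
    eps = 1e-12  # keeps silent frames from triggering on rounding noise
    for i in range(1, num_blocks):
        past_mean = sum(energies[:i]) / i
        if energies[i] > ratio_threshold * (past_mean + eps):
            return i
    return None


# Steady tone: every 128-sample block holds a whole number of periods.
steady = [math.sin(2 * math.pi * 8 * n / 128) for n in range(1024)]

# Same tone, but very quiet for the first half and loud for the second half.
attack = [(0.01 if n < 512 else 1.0) * s for n, s in enumerate(steady)]

steady_result = detect_transient(steady)
attack_result = detect_transient(attack)
print(steady_result, attack_result)
```

An encoder could use the returned block index to decide when to switch to short transform windows; detecting frequency-domain transients would need a different measure, such as a change in spectral shape between frames.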
Z10-5 Signal-Adaptive Parametric Modeling for High Quality Low Bit-Rate Audio Coding—Pedro Vera-Candeas¹, Nicolas Ruiz-Reyes¹, Manuel Rosa-Zurera², Jose Curpián-Alonso¹, Pedro Jesús Reche-López¹

¹ University of Jaén, Linares, Spain
² University of Alcalá, Alcalá de Henares, Madrid, Spain
In this paper, signal-adaptive parametric models based on over-complete dictionaries of time-frequency atoms are considered for high-quality low bit-rate parametric audio coding. There are a variety of frameworks for deriving over-complete signal expansions, which differ in the structure of the dictionary and the manner in which dictionary atoms are selected for the expansion. Psychoacoustic-adapted matching pursuits are used to extract sinusoidal components with a harmonic dictionary, while energy-adapted matching pursuits are used for transient modeling with a wavelet-based dictionary. First, transients are detected, modeled (with wavelet functions), and removed from the original audio signal, leaving a residue. Then, sinusoids are modeled using complex exponential functions and removed from the initial residue, leaving a noise-like signal. This final residue is modeled by taking advantage of the good time-frequency localization of the wavelet transform and by considering psychoacoustic principles. An M-depth wavelet transform is first applied to the residue. The energy of each wavelet sub-band is then computed, and finally a Temporal Noise Shaping (TNS) process is applied to each one, which yields a parametric model for the noise-like signal. The resulting multipart model (Sines + Transients + Noise) is applied efficiently by taking psychoacoustic information into account for audio coding purposes. The combination of all these ideas results in nearly transparent parametric audio coding at bit rates close to 16 kbps for most of the CD-quality one-channel audio signals considered for testing. Listening tests indicate that our coder achieves better results than MPEG-4 AAC at very low bit rates (close to 16 kbps).
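The matching pursuit idea underlying this abstract can be sketched in a few lines: repeatedly pick the dictionary atom most correlated with the residue and subtract its contribution. The example below is plain matching pursuit, without the psychoacoustic or energy adaptation described above, and its small dictionary of unit spikes plus cosine atoms is an assumption standing in for the harmonic and wavelet dictionaries. All names are hypothetical.

```python
import math

# Minimal sketch of plain (unweighted) matching pursuit over a toy
# over-complete dictionary. Names and dictionary are illustrative only.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, dictionary, num_atoms):
    """Greedy decomposition; atoms are assumed to have unit norm.

    Returns a list of (atom_index, coefficient) pairs and the final residue.
    """
    residue = list(signal)
    picks = []
    for _ in range(num_atoms):
        corrs = [dot(residue, atom) for atom in dictionary]
        k = max(range(len(dictionary)), key=lambda i: abs(corrs[i]))
        coeff = corrs[k]
        residue = [r - coeff * a for r, a in zip(residue, dictionary[k])]
        picks.append((k, coeff))
    return picks, residue


N = 8
# Atoms 0..7: unit spikes (good for transients).
spikes = [[1.0 if n == k else 0.0 for n in range(N)] for k in range(N)]
# Atoms 8..15: unit-norm cosine atoms (good for tonal content).
cosines = []
for k in range(N):
    raw = [math.cos(math.pi * (n + 0.5) * k / N) for n in range(N)]
    norm = math.sqrt(dot(raw, raw))
    cosines.append([v / norm for v in raw])
dictionary = spikes + cosines

# Test signal: one tonal component (cosine k=2) plus one click at n=5.
signal = [3.0 * c + 2.0 * s for c, s in zip(cosines[2], spikes[5])]

picks, residue = matching_pursuit(signal, dictionary, num_atoms=2)
signal_energy = dot(signal, signal)
residue_energy = dot(residue, residue)
print(picks, residue_energy / signal_energy)
```

Because the dictionary is over-complete and its atoms are not mutually orthogonal, the two selected coefficients do not exactly equal the mixing weights, and a small residue remains after two iterations. The adapted variants in the abstract change the selection criterion, not this basic loop.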
Z10-6 Decoder-Based Approach to Enhance Low Bit-Rate Audio—Evelyn Kurniawati¹, Chiew Tong Lau¹, Benjamin Premkumar¹, Javed Absar², Sapna George²

¹ Nanyang Technological University, Singapore
² ST Microelectronics Asia Pacific Pte. Ltd., Singapore
A method to improve the PSNR of a perceptual audio coder is presented. It is based on the use of a noise estimator at the decoder side to relate the quantization parameters and the quantization error. The quartic equation established has two real roots, one of which is the desired spectral value. This value contains less quantization error than the dequantized spectral value of a normal decoder. This leads to an improvement of up to 12 dB in SNR without a significant increase in decoder complexity.
Z10-7 Efficient Intraframe Coding of Monophonic Audio—Aníbal Ferreira, University of Porto / INESC Porto, Porto, Portugal
This paper describes the design of an advanced Audio Spectral Coder (ASC) that seeks: coding efficiency by combining source and perceptual audio coding techniques; bitstream semantic scalability by segmenting the audio signal into transients, sinusoids, and noise; low-delay coding by using a moderate transform size and no bitstream buffer; and embedded error robustness by not using interframe coding. The operation of ASC is explained, its performance is assessed using a few test results, and potential application areas are addressed.