AES Munich 2009 Friday, May 8, 13:00 — 16:30
Paper Session P11
P11 - Audio Coding
Chair: Nick Zacharov
P11-1 A Novel Scheme for Low Bit Rate Unified Speech and Audio Coding—MPEG RM0—Max Neuendorf, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Philippe Gournay, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Markus Multrus, Jérémie Lecomte, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Bruno Bessette, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Ralf Geiger, Stefan Bayer, Guillaume Fuchs, Johannes Hilpert, Nikolaus Rettelbach, Frederik Nagel, Julien Robilliard, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Redwan Salami, VoiceAge Corporation - Montreal, Quebec, Canada; Gerald Schuller, Fraunhofer IDMT - Ilmenau, Germany; Roch Lefebvre, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Bernhard Grill, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Coding of speech signals at low bit rates, such as 16 kbps, has to rely on an efficient speech reproduction model to achieve reasonable speech quality. However, for audio signals that do not fit the model, this approach generally fails. On the other hand, generic audio codecs, designed to handle any kind of audio signal, tend to show unsatisfactory results for speech signals, especially at low bit rates. To overcome this, ISO/MPEG initiated a process aiming to standardize a new codec with consistently high quality for speech, music, and mixed content over a broad range of bit rates. After a formal listening test evaluating several proposals, MPEG selected the best-performing codec as the reference model for the standardization process. This paper describes this codec in detail and shows that the new reference model reaches the goal of consistently high quality for all signal types.
Convention Paper 7713
P11-2 A Time-Warped MDCT Approach to Speech Transform Coding—Bernd Edler, Sascha Disch, Leibniz Universität Hannover - Hannover, Germany; Stefan Bayer, Guillaume Fuchs, Ralf Geiger, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The modified discrete cosine transform (MDCT) is often used for audio coding due to its critical sampling property and good energy compaction, especially for harmonic tones with constant fundamental frequencies (pitch). However, in voiced human speech the pitch is time-varying and thus the energy is spread over several transform coefficients, leading to a reduction of coding efficiency. The approach presented herein compensates for pitch variation in each MDCT block by application of time-variant re-sampling. A dedicated signal adaptive transform window computation ensures the preservation of the time domain aliasing cancellation (TDAC) property. Re-sampling can be designed such that the duration of the processed blocks is not altered, facilitating the replacement of the conventional MDCT in existing audio coders.
Convention Paper 7710
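The TDAC property mentioned in the P11-2 abstract can be illustrated with the conventional (not time-warped) MDCT. The sketch below is a minimal, unoptimized pure-Python illustration and is not taken from the paper: the sine window, the function names, and the zero-padding scheme are assumptions chosen for brevity. It shows that windowed, 50%-overlapped blocks reconstruct the input once the aliasing terms of adjacent blocks cancel in the overlap-add.

```python
import math

def sine_window(N):
    # Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley).
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block, N):
    # Direct O(N^2) MDCT: 2N input samples -> N coefficients (critical sampling).
    n0 = 0.5 + N / 2
    return [sum(block[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs, N):
    # Inverse MDCT: N coefficients -> 2N time-aliased samples.
    # The 2/N scale is the normalization used when a window is applied twice.
    n0 = 0.5 + N / 2
    return [(2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                            for k in range(N))
            for n in range(2 * N)]

def analyze_synthesize(x, N):
    # Assumes len(x) is a multiple of N. Pads N zeros on each side so that
    # every input sample is covered by exactly two overlapping blocks.
    w = sine_window(N)
    padded = [0.0] * N + list(x) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        block = [padded[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(block, N), N)
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]  # synthesis window + overlap-add
    return out[N:N + len(x)]

if __name__ == "__main__":
    signal = [math.sin(0.3 * n) for n in range(32)]
    rebuilt = analyze_synthesize(signal, 8)
    print(max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

Each block on its own is not invertible, since 2N samples are mapped to N coefficients; only the overlap-add of neighboring blocks removes the aliasing. A time-warped variant, as the abstract describes, would additionally need windows adapted to the re-sampling so that this cancellation still holds.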
P11-3 A Phase Vocoder Driven Bandwidth Extension Method with Novel Transient Handling for Audio Codecs—Frederik Nagel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Sascha Disch, Leibniz Universität Hannover - Hannover, Germany; Nikolaus Rettelbach, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Storage or transmission of audio signals is often subject to strict bit-rate constraints. This is accommodated by audio encoders that encode the lower frequency part in a waveform-preserving way and approximate the high frequency signal from the lower frequency data by using a set of reconstruction parameters. This so-called bandwidth extension can lead to roughness and other unpleasant auditory sensations. In this paper the origin of these artifacts is identified, and an improved bandwidth extension method called Harmonic Bandwidth Extension (HBE) is outlined that avoids auditory roughness in the reconstructed audio signal. Since HBE is based on phase vocoders, and thus intrinsically not well suited for transient signals, an enhancement of the method by a novel transient handling approach is presented. A listening test demonstrates the advantage of the proposed method over a simple phase vocoder approach.
Convention Paper 7711
P11-4 Efficient Cross-Fade Windows for Transitions between LPC-Based and Non-LPC-Based Audio Coding—Jérémie Lecomte, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Philippe Gournay, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Ralf Geiger, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Bruno Bessette, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Max Neuendorf, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The reference model selected by MPEG for the forthcoming unified speech and audio codec (USAC) switches between a non-LPC-based coding mode (based on AAC) operating in the transform domain and an LPC-based coding mode (derived from AMR-WB+) operating either in the time domain (ACELP) or in the frequency domain (wLPT). Seamlessly switching between these different coding modes required the design of a new set of cross-fade windows optimized to minimize the amount of overhead information sent during transitions between LPC-based and non-LPC-based coding. This paper presents the new set of windows, which was designed to provide an adequate trade-off between overlap duration and time/frequency resolution, and to maintain the benefits of critical sampling through all coding modes.
Convention Paper 7712
P11-5 Low Bit-Rate Audio Coding in Multichannel Digital Wireless Microphone Systems—Stephen Wray, APT Licensing Ltd. - Belfast, Northern Ireland, UK
Despite advances in voice and data communications in other domains, sound production for live events (concerts, theater, conferences, sports, worship, etc.) still largely depends on spectrum-inefficient forms of analog wireless microphone technology. In these live scenarios, low-latency transmission of high-quality audio is mission critical. However, while demand increases for wireless audio channels (for microphones, in-ear monitoring, and talkback systems), some of the radio bands available for “Program Making and Special Events” are to be re-assigned for new wireless mobile telephony and Internet connectivity services: the FCC recently decided to permit so-called White Space Devices to operate in sections of UHF spectrum previously reserved for shared use by analog TV and wireless microphones. This paper examines the key performance aspects of low bit-rate audio codecs for the next generation of bandwidth-efficient digital wireless microphone systems that meet the future needs of live events.
Convention Paper 7714
P11-6 Krasner’s Audio Coder Revisited—Jamie Angus, Chris Ball, Thomas Peeters, Rowan Williams, University of Salford - Salford, Greater Manchester, UK
An audio compression encoder and decoder system based on Krasner’s work was implemented. An improved Quadrature Mirror Filter tree, which more closely approximates modern critical band measurements, splits the input signal into subbands that are encoded using both adaptive quantization and entropy coding. The uniform adaptive quantization scheme developed by Jayant was implemented and enhanced through the addition of non-uniform quantization steps and look-ahead. The complete codecs are evaluated using the perceptual audio evaluation algorithm PEAQ, and their performance is compared to equivalent MPEG-1 Layer III files. Initial, limited tests reveal that the proposed codecs score Objective Difference Grades close to or even better than MPEG-1 Layer III files encoded at a similar bit rate.
Convention Paper 7715
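The uniform adaptive quantization attributed to Jayant in the P11-6 abstract is commonly described as a backward-adaptive scheme: the step size is scaled after every sample by a multiplier that depends only on the magnitude of the code just sent, so the decoder can track it without side information. The sketch below is a minimal illustration of that basic idea only. The multiplier values, step limits, and function names are hypothetical, and the non-uniform steps and look-ahead described in the paper are not modeled.

```python
def adapt_step(step, mag, multipliers, min_step=1e-4, max_step=10.0):
    # Scale the step by the multiplier for the last magnitude index, then clamp.
    # Small codes shrink the step, large codes grow it.
    return min(max(step * multipliers[mag], min_step), max_step)

def jayant_encode(samples, step=0.1, multipliers=(0.8, 0.9, 1.2, 1.6)):
    # Returns a list of (sign, magnitude index) codes.
    # The multiplier table is illustrative, not taken from the paper.
    codes = []
    for x in samples:
        mag = min(int(abs(x) / step), len(multipliers) - 1)  # clip on overload
        sign = -1 if x < 0 else 1
        codes.append((sign, mag))
        step = adapt_step(step, mag, multipliers)
    return codes

def jayant_decode(codes, step=0.1, multipliers=(0.8, 0.9, 1.2, 1.6)):
    # Mirrors the encoder's step adaptation using only the received codes.
    out = []
    for sign, mag in codes:
        out.append(sign * (mag + 0.5) * step)  # mid-point reconstruction
        step = adapt_step(step, mag, multipliers)
    return out

if __name__ == "__main__":
    # A constant input far above the initial range: the step grows until
    # the quantizer is no longer overloaded.
    codes = jayant_encode([5.0] * 20)
    print(jayant_decode(codes))
```

Because the adaptation depends only on transmitted codes, encoder and decoder must start from the same initial step size and multiplier table.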
P11-7 Inter-Channel Prediction to Prevent Unmasking of Quantization Noise in Beamforming—Mauri Väänänen, Nokia Research Center - Tampere, Finland
This paper studies the use of inter-channel prediction to prevent or reduce the risk of noise unmasking when beamforming-type processing is applied to quantized microphone array signals. The envisaged application is the re-use and postprocessing of user-created content. Simulations with an AAC coder and real-world recordings using two microphones are performed to study the suitability of two existing coding tools for this purpose: M/S stereo coding and the AAC Long Term Predictor (LTP) tool adapted for inter-channel prediction. The results indicate that LTP adapted for inter-channel prediction often gives more coding gain than mere M/S stereo coding, both in terms of signal-to-noise ratio and perceptual entropy.
Convention Paper 7716
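The comparison in the P11-7 abstract between M/S stereo coding and inter-channel prediction can be illustrated with a toy two-microphone signal. The sketch below is not the AAC LTP tool and is not taken from the paper. It assumes a hypothetical one-tap predictor (a single lag and gain found by exhaustive search) and a synthetic test case in which the second channel is a delayed, attenuated copy of the first. In that case the side signal keeps substantial energy, while the prediction residual almost vanishes.

```python
import random

def energy(signal):
    return sum(v * v for v in signal)

def mid_side(left, right):
    # Standard mid/side transform: M = (L + R) / 2, S = (L - R) / 2.
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def best_lag_and_gain(ref, target, max_lag):
    # Hypothetical one-tap inter-channel predictor:
    #   target[n] is approximated by gain * ref[n - lag].
    # Returns (lag, gain, residual energy) for the best lag in 0..max_lag.
    best = (0, 0.0, energy(target))
    for lag in range(max_lag + 1):
        idx = range(lag, len(target))
        den = sum(ref[n - lag] ** 2 for n in idx)
        if den == 0:
            continue
        gain = sum(target[n] * ref[n - lag] for n in idx) / den  # least squares
        residual = sum((target[n] - gain * ref[n - lag]) ** 2 for n in idx)
        residual += sum(target[n] ** 2 for n in range(lag))  # unpredicted head
        if residual < best[2]:
            best = (lag, gain, residual)
    return best

if __name__ == "__main__":
    rng = random.Random(0)
    left = [rng.uniform(-1, 1) for _ in range(200)]
    # Second microphone: same source, 3 samples later and slightly quieter.
    right = [0.0] * 3 + [0.9 * v for v in left[:-3]]
    _, side = mid_side(left, right)
    lag, gain, residual = best_lag_and_gain(left, right, 8)
    print("side energy:", energy(side))
    print("lag:", lag, "gain:", gain, "residual energy:", residual)
```

M/S coding only removes redundancy between time-aligned channels, which is why a delay between microphones favors a predictor with an explicit lag in this toy case. Real recordings would not match the idealized delay model this closely.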