AES New York 2007: Paper Session P9

AES 123rd Convention - Where Audio Comes Alive

P9 - Audio Coding

Saturday, October 6, 12:30 pm — 5:00 pm
Chair: James Johnston, Microsoft Corporation - Redmond, WA, USA

P9-1 Network Music Performance with Ultra-Low-Delay Audio Coding under Unreliable Network Conditions—Ulrich Kraemer, Jens Hirschfeld, Gerald Schuller, Stefan Wabnik, Fraunhofer IDMT - Ilmenau, Germany; Alexander Carôt, Christian Werner, University of Lübeck - Lübeck, Germany
A key issue for successfully interconnecting musicians in real time over the Internet is minimizing the end-to-end signal delay for transmission and coding. The variance of transmission delay (“jitter”) occasionally causes some packets to arrive too late for playback. To avoid this problem, previous approaches use rather large receive buffers and accept the resulting larger delay. In this paper we present a novel solution that keeps buffer sizes and delay minimal. On the network layer we use a highly optimized audio framework called “Soundjack” and on the coding layer we use an ultra-low-delay codec for high-quality audio. We analyze and evaluate a modified transmission and coding scheme for the Fraunhofer Ultra-Low-Delay (ULD) audio coder, which is designed to be more resilient to lost and late-arriving data packets.
Convention Paper 7214

P9-2 A Very Low Bit-Rate Protection Layer to Increase the Robustness of the AMR-WB+ Codec against Bit Errors—Philippe Gournay, University of Sherbrooke - Sherbrooke, Quebec, Canada
Audio codecs face various channel impairments when used in challenging applications such as digital radio. The standard AMR-WB+ audio codec includes a concealment procedure to handle lost frames. It is also inherently robust to bit errors, although some bits within any given frame are more sensitive than others. Motivated by this observation, the present paper makes two contributions. First, a detailed study of the sensitivity of individual bits in AMR-WB+ frames is provided. All the bits in a frame are then divided into three sensitivity classes so that efficient unequal error protection (UEP) schemes can be designed. Second, a very low bit-rate protection layer to increase the robustness of the codec against bit errors is proposed and assessed using the results of subjective audio quality tests. Remarkably, in contrast to the standard codec, where some errors have a very discernible effect, the protection layer ensures that the decoded audio is free of major channel artifacts even at a significant 0.5 percent bit error rate.
Convention Paper 7215

P9-3 Trellis Based Approach for Joint Optimization of Window Switching Decisions and Bit Resource Allocation—Vinay Melkote, Kenneth Rose, University of California at Santa Barbara - Santa Barbara, CA, USA
The fact that audio compression for streaming or storage is usually performed offline alleviates traditional constraints on encoding delay. We propose a rate-distortion optimized approach, within the MPEG Advanced Audio Coding framework, to trade delay for optimal window switching and resource allocation across frames. A trellis is constructed where stages correspond to audio frames, nodes represent window choices, and branches implement transition constraints. A suitable cost, comprising bit consumption and psychoacoustic distortion, is optimized via multiple passes through the trellis until the desired bit rate is achieved. The procedure offers optimal window switching as well as better bit distribution than conventional bit-reservoir schemes that are restricted to “borrow” bits from past frames. Objective and subjective tests show considerable performance gains.
Convention Paper 7216

P9-4 Transcoding of Dynamic Range Control Coefficients and Other Metadata into MPEG-4 HE AAC—Wolfgang Schildbach, Kurt Krauss, Coding Technologies - Nuremberg, Germany; Jonas Rödén, Coding Technologies - Stockholm, Sweden
With the introduction of HE-AAC (also known as aacPlus) into several new broadcasting systems, the question of how best to encode new metadata and transcode pre-existing metadata, such as dynamic range control (DRC) data, program reference level, and downmix coefficients, into HE-AAC has gained renewed interest. This paper discusses the means of carrying metadata within HE-AAC and derived standards like DVB, and presents studies on how to convert metadata existing in other formats into HE-AAC. Listening tests are employed to validate the results.
Convention Paper 7217

P9-5 Advanced Audio for Advanced IPTV Services—Roland Vlaicu, Oren Williams, Dolby Laboratories - San Francisco, CA, USA
Television service providers have significant new requirements for audio delivery in next-generation broadcast systems such as high-definition television and IPTV. These include the capability to deliver soundtracks from mono to 5.1 channels and beyond with greater efficiency than current systems. Compatibility with existing consumer home cinema systems must also be maintained. A new audio delivery system, Enhanced AC-3, has been developed to meet these requirements, and has been standardized in DVB, . . ., as well as in ATSC. Also, Enhanced AC-3 is being included in widely used middleware solutions and paired with RTP considerations. This paper describes how operators can manage multichannel assets on linear broadcast turn-around and video-on-demand services in order to provide a competitive IPTV offering.
Convention Paper 7218

P9-6 A Study of the MPEG Surround Quality versus Bit-Rate Curve—Jonas Rödén, Coding Technologies - Stockholm, Sweden; Jeroen Breebaart, Philips Research Laboratories - Eindhoven, The Netherlands; Johannes Hilpert, Fraunhofer Institute for Integrated Circuits - Erlangen, Germany; Heiko Purnhagen, Coding Technologies - Stockholm, Sweden; Erik Schuijers, Jeroen Kippens, Philips Applied Technologies - Eindhoven, The Netherlands; Karsten Linzmeier, Andreas Hölzer, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
MPEG Surround provides unsurpassed multichannel audio compression efficiency by extending a mono or stereo audio coder with additional side information. This compression method has two important advantages. The first is its backward compatibility, which is important when MPEG Surround is employed to upgrade an existing service. Second, the amount of side information can be varied over a wide range to enable high-quality multichannel audio compression at extremely low bit rates, up to perceptual transparency at higher bit rates. The present paper provides a study of the performance of MPEG Surround, highlighting the various tradeoffs it offers. Furthermore, a quality versus bit rate curve describing the MPEG Surround performance will be presented.
Convention Paper 7219

P9-7 Quality Impact of Diotic Versus Monaural Hearing on Processed Speech—Arnault Nagle, Catherine Quinquis, Aurélien Sollaud, Anne Battistello, France Telecom Research and Development - Lannion, France; Dirk Slock, Institut Eurecom - Antipolis Cedex, France
In VoIP audio conferencing, listening takes place over handsets or headphones, that is, with one ear or two. To keep the perceived loudness the same in both modes, a listener can only adjust the sound level. The goal of this paper is to show that monaural or diotic hearing has an impact on the quality of speech processed by VoIP coders. It can increase or decrease the differences in perceived quality between the tested coders and even change their ranking, depending on the sound level. This impact on the ranking of the coders is explained using the normal equal-loudness-level contours over headphones and the specifics of some coders. It is important to be aware of the impact of the hearing system and its associated sound level.
Convention Paper 7220

P9-8 A Novel Audio Post-Processing Toolkit for the Enhancement of Audio Signals Coded at Low Bit Rates—Raghuram Annadana, Harinarayanan E.V., ATC Labs - Noida, India; Deepen Sinha, ATC Labs - Chatham, NJ, USA; Anibal Ferreira, University of Porto - Porto, Portugal, and ATC Labs, Chatham, NJ, USA
Low bit rate audio coding often results in the loss of a number of key audio attributes such as audio bandwidth and stereo separation. Additionally, there is typically a loss in the level of detail and intelligibility and/or warmth in the signal. Due to the proliferation, e.g., on the Internet, of low bit rate audio coded using a variety of coding schemes and bit rates over which the listener has no control, it is becoming increasingly attractive to incorporate processing tools in the player that can ensure a consistent listener experience. We describe a novel post-processing toolkit that incorporates tools for (i) stereo enhancement, (ii) blind bandwidth extension, (iii) automatic noise removal and audio enhancement, and (iv) blind 2-to-5 channel upmixing. Algorithmic details, listening results, and audio demonstrations will be presented.
Convention Paper 7221

P9-9 Subjective Evaluation of Immersive Sound Field Rendition System and Recent Enhancements—Chandresh Dubey, Raghuram Annadana, ATC Labs - Noida, India; Deepen Sinha, ATC Labs - Chatham, NJ, USA; Anibal Ferreira, University of Porto - Porto, Portugal, and ATC Labs, Chatham, NJ, USA
Consumer audio applications such as satellite radio broadcasts, multichannel audio streaming, and playback systems, coupled with the need to meet stringent bandwidth requirements, are posing new challenges for parametric multichannel audio coding schemes. This paper describes the continuation of our research concerning the Immersive Soundfield Rendition (ISR) system. In particular, we present detailed subjective result data benchmarking the ISR system in comparison to MPEG Surround and characterizing the audio quality level at different sub-modes of the system. We also describe enhancements to various algorithmic components, in particular the blind 2-to-5 channel upmixing algorithm, and describe a novel scheme for providing enhanced stereo downmix at the receiver for improved decoding by conventional matrix decoding systems.
Convention Paper 7222

Last Updated: 20070820, mei
