Last Updated: 20050913, wtm
P14 - Audio Coding - 2
Sunday, October 9, 1:00 pm — 4:00 pm
Chair: Jürgen Herre, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
P14-1 Jointly Optimal Time Segmentation, Distribution, and Quantization for Sinusoidal Coding of Audio and Speech—Richard Heusdens, Jesper Jensen, Pim Korten, Delft University of Technology - Delft, The Netherlands
In this paper we propose a rate-distortion optimal algorithm for sinusoidal coding of audio and speech. The algorithm determines for a pre-specified target bit-rate the optimal (variable-length) time segmentation, the optimal distribution of sinusoidal components over the segments, and the optimal (scalar) quantizers for quantizing the sinusoid parameters amplitude, phase, and frequency. The optimization is done by jointly optimizing the segment lengths, number of sinusoids, and quantizers using high-resolution quantization theory and dynamic programming techniques, which makes it possible to execute the algorithm in polynomial time. A particular advantage of the proposed method is that, given a target bit-rate, it solves the problem of finding the optimal balance between total number of sinusoids and number of bits per sinusoid.
Convention Paper 6596
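The dynamic-programming step mentioned in the P14-1 abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes only that the total cost is a sum of independent per-segment costs, and the names `optimal_segmentation`, `toy_cost`, `signal`, and `lam` are hypothetical. The toy cost (squared deviation from the segment mean plus a fixed per-segment penalty `lam`) stands in for a rate-distortion cost of the form D + lambda * R; the paper's joint choice of sinusoids and quantizers is not modeled here.

```python
from typing import Callable, List, Tuple


def optimal_segmentation(
    n: int, max_len: int, cost: Callable[[int, int], float]
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split frames 0..n-1 into consecutive segments [start, end)
    of at most max_len frames, minimizing the summed segment cost."""
    inf = float("inf")
    best = [inf] * (n + 1)   # best[e]: minimal cost of covering frames 0..e-1
    back = [0] * (n + 1)     # back[e]: start of the last segment in that optimum
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            candidate = best[start] + cost(start, end)
            if candidate < best[end]:
                best[end] = candidate
                back[end] = start
    # Walk the back-pointers to recover the segment boundaries.
    segments: List[Tuple[int, int]] = []
    end = n
    while end > 0:
        segments.append((back[end], end))
        end = back[end]
    segments.reverse()
    return best[n], segments


# Toy data (assumption): a piecewise-constant "signal" of six frames.
signal = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
lam = 0.5  # hypothetical fixed penalty per segment, standing in for lambda * R


def toy_cost(start: int, end: int) -> float:
    """Squared deviation from the segment mean plus the per-segment penalty."""
    chunk = signal[start:end]
    mean = sum(chunk) / len(chunk)
    distortion = sum((v - mean) ** 2 for v in chunk)
    return distortion + lam


total, segments = optimal_segmentation(len(signal), max_len=6, cost=toy_cost)
print(total, segments)
```

With n frames and segments of at most max_len frames, the double loop evaluates on the order of n * max_len candidate segments, which is the kind of polynomial-time behavior the abstract attributes to dynamic programming.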
P14-2 Enhanced Performance in the Functionality of Fine Grain Scalability—KiHyun Choo, Eunmi Oh, Jung-Hoe Kim, ChangYong Son, Samsung Advanced Institute of Technology - Suwon, Korea
The purpose of this paper is to take advantage of the characteristics of arithmetic decoding to improve the coding efficiency of codecs that provide fine-grain scalability. The smart decoding algorithm exploits the fact that a decoding buffer still contains meaningful information for arithmetic decoding when there are no more bits to be fed into the buffer. We tested the effect of the symbols additionally decoded from truncated MPEG-4 BSAC and MPEG-4 scalable lossless audio coding (SLS) bit streams. On average, approximately 41 and 13 additional symbols are uniquely decodable per frame in MPEG-4 BSAC and MPEG-4 SLS, respectively. The experimental results show a much smaller spectral difference and a higher SNR with the smart arithmetic decoding. This additional “compression” can be effective when transmitting truncated bit streams at lower bit rates.
Convention Paper 6597
P14-3 Scalability in KOZ Audio Compression Technology—Kevin Short, Ricardo Garcia, Michelle Daniels, Chaoticom Technologies - Andover, MA, USA
Intra-codec scalability in the KOZ audio compression technology is presented in detail. The KOZ codec uses a psychoacoustic model and high-resolution spectral analysis to create, prioritize, and layer audio objects, making it inherently scalable by varying the number of layers. The layers are sufficiently fine-grained to allow both small-step and large-step bit rate variations in real time during content delivery. Decoder scalability based on the availability of device resources is introduced. An overview of the architecture of the KOZ technology and some of the applications of its scalability are discussed.
Convention Paper 6598
P14-4 MPEG Spatial Audio Coding/MPEG Surround: Overview and Current Status—Jeroen Breebaart, Philips Research Laboratories - Eindhoven, The Netherlands; Jürgen Herre, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Christof Faller, Agere Systems - Allentown, PA, USA; J. Rödén, Coding Technologies - Stockholm, Sweden; F. Myburg, Philips Applied Technologies - Eindhoven, The Netherlands; S. Disch, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Heiko Purnhagen, Coding Technologies - Stockholm, Sweden; G. Hotho, Philips Research Laboratories - Eindhoven, The Netherlands; M. Neusinger, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; K. Kjörling, Coding Technologies - Stockholm, Sweden; W. Oomen, Philips Applied Tec
Recently, the MPEG audio standardization group started a new work item on spatial audio coding. This new approach allows for a fully backward compatible representation of multichannel audio at bit rates that are only slightly higher than common rates currently used for coding of mono/stereo sound. This paper briefly describes the underlying idea and reports on the current status of the MPEG standardization activities. It provides an overview of the resulting "MPEG Surround" technology and discusses its capabilities. The current level of performance is illustrated by listening test results.
Convention Paper 6599
P14-5 Efficient Design of Time-Frequency Stereo Parameter Sets for Parametric HE-AAC—Kan-Chun Lee, Chung-Han Yang, Han-Wen Hsu, Wen-Chieh Lee, Chi-Min Liu, Tzu-Wen Chang, National Chiao Tung University - Hsinchu, Taiwan
A parametric stereo (PS) coding tool is used to reconstruct a stereo signal from a monaural signal. The tool can be used jointly with HE-AAC to achieve a high compression ratio; this combination is referred to as parametric HE-AAC in this paper. The PS tool is able to capture the stereo image of the audio input signal in a limited number of parameters, requiring only a small overhead. In MPEG-4 HE-AAC, the PS tool segments a frame into several regions in the time domain and into stereo bands in the frequency domain to deliver stereo parameter sets. This paper considers the design of these stereo parameter sets. The resulting methods are integrated in the NCTU-HE-AAC, and objective experiments are conducted to check the quality.
Convention Paper 6600
P14-6 Structural Analysis of Low Latency Audio Coding Schemes—Ralf Geiger, Manfred Lutzky, Markus Schnell, Markus Schmidt, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Low latency audio coding is gaining importance in upcoming high quality communication applications such as videoconferencing and VoIP. This paper provides a comparison of two low latency audio codecs suitable for these tasks: MPEG-4 ER AAC-LD and ITU-T G.722.1 Annex C. Despite their similar coding strategies, the two codecs show significant differences with respect to the tools used and the coding performance. A comparison of the coding tools is provided and the influence on different signal classes is discussed.
Convention Paper 6601