Last Updated: 20060404
P23 - Posters: Low Bit-Rate Audio Coding
Monday, May 22, 16:00 — 17:30
P23-1 The Relationship between Selected Artifacts and Basic Audio Quality in Perceptual Audio Codecs—Paulo Marins, Francis Rumsey, Slawomir Zielinsky, University of Surrey - Guildford, Surrey, UK
Up to this point, perceptual audio codecs have been evaluated according to ITU-R standards such as BS.1116-1 and BS.1534-1. The majority of these tests tend to measure the performance of audio codecs using only one perceptual attribute, namely the basic audio quality. This approach, although effective in terms of the assessment of the overall performance of codecs, does not provide any further information about the perceptual importance of different artifacts inherent to low bit-rate audio coding. Therefore in this study an alternative style of listening test was performed, investigating not only basic audio quality but also the perceptual significance of selected audio artifacts. The choice of the artifacts included in this investigation was inspired by the CD-ROM published by the AES Technical Committee on Audio Coding entitled “Perceptual Audio Coders: What to Listen For.”
[Poster Presentation Associated with Paper Presentation P16-1]
Convention Paper 6745 (Purchase now)
P23-2 Reduced Bit Rate Ultra Low Delay Audio Coding—Stefan Wabnik, Gerald Schuller, Jens Hirschfeld, Ulrich Krämer, Fraunhofer Institute for Digital Media Technology (IDMT) - Ilmenau, Germany
An audio coder with a very low delay (6 to 8 ms) for reduced bit rates is presented. Previous coder versions were based on backward adaptive coding, which has suboptimal noise shaping capabilities for reduced rate coding. We propose to use a different noise shaping method instead, resulting in an approach that uses forward adaptive predictive coding. We will show that, in comparison, the forward adaptive method has the following advantages: it is more robust against high quantization errors, has additional noise shaping capabilities, has a better ability to obtain a constant bit rate, and shows improved error resilience.
[Poster Presentation Associated with Paper Presentation P16-3]
Convention Paper 6747 (Purchase now)
P23-3 Scalable Audio Coding with Iterative Auditory Masking—Christophe Veaux, Pierrick Philippe, France Telecom R&D - Cesson-Sévigné, France
In this paper the reduction of the cost of scalability is investigated. A coding scheme based on cascaded MDCT transforms is presented, in which masking thresholds are iteratively calculated from the transform coefficients quantized at previous layers. As a result, the masking thresholds are updated at the decoder in the same way as at the encoder without the need to transmit explicit information such as scale factors. By eliminating this overhead, this approach significantly improves the coding efficiency. It is also shown that further improvements are made possible by allowing the transmission of some side information depending on the frame or on the layer.
[Poster Presentation Associated with Paper Presentation P16-6]
Convention Paper 6750 (Purchase now)
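The key idea above—deriving each layer's quantizer from the spectrum already reconstructed, so that encoder and decoder stay in sync without transmitted scale factors—can be sketched as follows. The masking model here is a deliberately crude smoothed-power stand-in, not the threshold calculation of the paper, and all names are illustrative:

```python
import numpy as np

def encode_layers(spectrum, layers=3):
    """Each layer's step size is derived only from the spectrum
    reconstructed so far, so the decoder can recompute it and no
    scale factors need to be transmitted."""
    recon = np.zeros_like(spectrum)
    indices, base = [], 1.0
    for _ in range(layers):
        # toy masking estimate: smoothed power of the current reconstruction
        mask = 0.01 + 0.1 * np.convolve(recon ** 2, np.ones(5) / 5, mode='same')
        step = base * np.sqrt(mask)            # coarser where masking is high
        idx = np.round((spectrum - recon) / step)
        indices.append(idx)
        recon = recon + idx * step
        base /= 4.0                            # each layer refines the last
    return indices

def decode_layers(indices, n):
    recon, base = np.zeros(n), 1.0
    for idx in indices:
        mask = 0.01 + 0.1 * np.convolve(recon ** 2, np.ones(5) / 5, mode='same')
        step = base * np.sqrt(mask)            # identical thresholds, no side info
        recon = recon + idx * step
        base /= 4.0
    return recon

spec = np.random.default_rng(0).standard_normal(64)
recon = decode_layers(encode_layers(spec), 64)
print(np.max(np.abs(spec - recon)))            # error shrinks with each layer
```

Because the step sizes depend only on previously decoded layers, the decoder recomputes them exactly, which is the property the abstract exploits to drop the scale-factor overhead.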
P23-4 A Frequency-Domain Framework for Spatial Audio Coding Based on Universal Spatial Cues—Michael Goodwin, Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA
Spatial audio coding (SAC) addresses the emerging need to efficiently represent high-fidelity multichannel audio. The SAC methods previously described involve analyzing the input audio for interchannel relationships, encoding a downmix signal with these relationships as side information, and using the side data at the decoder for spatial rendering. These approaches are channel-centric in that they are generally designed to reproduce the input channel content over the same output channel configuration. In this paper we propose a frequency-domain SAC framework based on the perceived spatial audio scene rather than on the channel content. We propose time-frequency spatial direction vectors as cues to describe the input audio scene, present an analysis method for robust estimation of these cues from arbitrary multichannel content, and discuss the use of the cues to achieve accurate spatial decoding and rendering for arbitrary output systems.
[Poster Presentation Associated with Paper Presentation P16-7]
Convention Paper 6751 (Purchase now)
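A direction cue of the kind described can be illustrated with an energy-weighted average of loudspeaker unit vectors, in the spirit of a Gerzon energy vector; the paper's actual cue definition may differ, and the names below are illustrative:

```python
import numpy as np

def energy_direction_vector(channel_powers, speaker_angles_deg):
    """Energy-weighted average of loudspeaker unit vectors for one
    time-frequency tile: the angle is a direction cue, the length a
    measure of how localized the tile is (1 = fully localized)."""
    angles = np.radians(speaker_angles_deg)
    units = np.column_stack([np.cos(angles), np.sin(angles)])
    p = np.asarray(channel_powers, dtype=float)
    g = (p[:, None] * units).sum(axis=0) / p.sum()
    return np.degrees(np.arctan2(g[1], g[0])), np.hypot(g[0], g[1])

# Standard 5.0 angles (C, L, R, Ls, Rs); energy mostly in L and C
direction, focus = energy_direction_vector(
    [1.0, 2.0, 0.0, 0.1, 0.0], [0, 30, -30, 110, -110])
print(direction, focus)   # the direction lies between C (0) and L (30)
```

Because the cue is defined on the scene rather than on a channel layout, a decoder can re-render it over any output speaker configuration, which is the channel-independence the abstract argues for.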
P23-5 Parametric Joint-Coding of Audio Sources—Christof Faller, EPFL - Lausanne, Switzerland
The following coding scenario is addressed. A number of audio source signals need to be transmitted or stored for the purpose of mixing stereo, multichannel surround, wave field synthesis, or binaural signals after decoding the source signals. The proposed technique offers significant coding gain when jointly coding the source signals, compared to separately coding them, even when no redundancy is present between the source signals. This is possible by considering statistical properties of the source signals, properties of mixing techniques, and spatial perception. The sum of the source signals is transmitted plus the statistical properties that determine the spatial cues at the mixer output. Subjective evaluation indicates that the proposed scheme achieves high audio quality.
[Poster Presentation Associated with Paper Presentation P16-8]
Convention Paper 6752 (Purchase now)
P23-6 Improved Time Delay Analysis/Synthesis for Parametric Stereo Audio Coding—Christophe Tournery, Christof Faller, EPFL - Lausanne, Switzerland
For parametric stereo and multichannel audio coding, it has been proposed to use level difference, time difference, and coherence cues between audio channels to represent the perceptual spatial features of stereo and multichannel audio signals. In practice, it has turned out that merely considering level difference and coherence cues already achieves high audio quality. Time difference cue analysis/synthesis has not contributed much to higher audio quality and can even degrade audio quality when not done properly. However, for binaural audio signals, e.g., binaural recordings or signals mixed with HRTFs, time differences play an important role. We investigate problems of time difference analysis/synthesis with such critical signals and propose an algorithm for improving it. A subjective evaluation indicates significant improvement over our previous time difference analysis/synthesis.
[Poster Presentation Associated with Paper Presentation P16-9]
Convention Paper 6753 (Purchase now)
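Time-difference cue analysis of this kind typically starts from an estimate of the interchannel time difference. A minimal broadband sketch, using the lag of the peak normalized cross-correlation (the per-subband operation and the authors' improvements are omitted; names are illustrative):

```python
import numpy as np

def estimate_ictd(left, right, max_lag):
    """Estimate the interchannel time difference (in samples) as the
    lag that maximizes the normalized cross-correlation."""
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = left[lag:], right[:len(right) - lag]
        else:
            a, b = left[:len(left) + lag], right[-lag:]
        corr = np.dot(a, b) / (np.sqrt(np.dot(a, a) * np.dot(b, b)) + 1e-12)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

# Construct a pair in which the right channel lags the left by 3 samples
s = np.random.default_rng(0).standard_normal(1024)
left, right = s[3:], s[:-3]
print(estimate_ictd(left, right, max_lag=8))   # -3: the left channel leads
```

For binaural material such lags encode the interaural time differences the abstract identifies as critical, which is why a naive estimator like this one needs the robustness improvements the paper proposes.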
P23-7 Closing the Gap between the Multichannel and the Stereo Audio World: Recent MP3 Surround Extensions—Bernhard Grill, Oliver Hellmuth, Johannes Hilpert, Jürgen Herre, Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
After more than 10 years of commercial availability, MP3 continues to be the most utilized format for compressed audio. New technologies extend the use from stereo to multichannel applications. Presented in 2004, the MP3 Surround format allows representation of high-quality 5.1 surround sound at bit rates so far used for stereo signals while remaining compatible with any MP3 playback device. Recently, add-on technologies complemented the usability of MP3 Surround. The capability of spatializing stereo content into MP3 Surround files provides listener envelopment for the reproduction of legacy stereo content as well. Improved spatial reproduction is offered by the auralized reproduction of MP3 Surround via regular stereo headphones. This paper describes the underlying concepts and the interworking of the technology components.
[Poster Presentation Associated with Paper Presentation P16-10]
Convention Paper 6754 (Purchase now)
P23-8 Design for High Frequency Adjustment Module in MPEG-4 HE-AAC Encoder Based on Linear Prediction Method—Han-Wen Hsu, Yung-Cheng Yang, Chi-Min Liu, Wen-Chieh Lee, National Chiao Tung University - Hsin-Chu, Taiwan
The high frequency adjustment module is the kernel module of spectral band replication (SBR) in MPEG-4 HE-AAC. Its objective is to recover the tonality of the reconstructed high frequency band. There are two crucial issues: the accurate measurement of tonality and the decision of shared control parameters. Control parameters, which are extracted according to signal tonality, are used to determine the gain control and the energy level of additional components in the decoder. In other words, the quality of the reconstructed signal is directly related to the high-frequency adjustment module. In this paper an efficient method based on the Levinson-Durbin algorithm is proposed to measure tonality by a linear prediction approach with adaptive orders to fit different subband contents. Furthermore, the artifact due to the sharing of control parameters is investigated and an efficient decision criterion for the control parameters is proposed.
[Poster Presentation Associated with Paper Presentation P16-11]
Convention Paper 6755 (Purchase now)
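The tonality measure discussed above can be approximated by the prediction gain of a linear predictor fitted with the Levinson-Durbin recursion: highly predictable (tonal) content yields a large gain, while noise-like content yields a gain near one. A minimal sketch (not the paper's exact measure; names are illustrative):

```python
import numpy as np

def prediction_gain(x, order):
    """Tonality proxy: ratio of signal power to the final prediction
    error of an LPC model fitted via the Levinson-Durbin recursion."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0:
            break
        acc = np.dot(a[:m], r[m:0:-1])       # sum_i a[i] * r[m - i]
        k = -acc / err
        a[:m + 1] += k * a[:m + 1][::-1]     # update predictor coefficients
        err *= 1.0 - k * k                   # updated prediction-error power
    return r[0] / max(err, 1e-300)

n = np.arange(2048)
tone = np.sin(2 * np.pi * 0.07 * n)          # strongly tonal
noise = np.random.default_rng(1).standard_normal(2048)
print(prediction_gain(tone, 8) > prediction_gain(noise, 8))   # True
```

Varying `order` per subband, as the abstract suggests, trades the cost of the recursion against how well narrow spectral peaks are captured.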
P23-9 Evaluation of Real-Time Transport Protocol Configurations Using aacPlus—Andreas Schneider, Kurt Krauss, Andreas Ehret, Coding Technologies - Nuremberg, Germany
aacPlus is a highly efficient audio codec that is being used in a growing number of applications where the compressed audio data is encapsulated in a real-time transport protocol and transmitted over error-prone channels. In this paper the implications of packet loss during transmission and techniques to mitigate its impact on the resulting audio quality are discussed. Example transmission channel characteristics are used to show how typical protocol configuration parameters are derived. The benefits of the described techniques are evaluated and verified by setting up a complete simulation chain and performing listening tests.
[Poster Presentation Associated with Paper Presentation P20-2]
Convention Paper 6789 (Purchase now)
P23-10 Audio Communication Coder—Anibal Ferreira, University of Porto - Porto, Portugal, ATC Labs, Chatham, NJ, USA; Deepen Sinha, ATC Labs - Chatham, NJ, USA
3G mobile and wireless communication networks elicit new ways of multimedia human interaction and communication, notably two-way high-quality audio communication. This is in line with both the consumer expectation of new audio experiences and functionalities, and with the motivation of telecom operators to offer consumers new services and communication modalities. In this paper we describe the design and optimization of a monophonic audio coder (Audio Communication Coder - ACC) that features low-delay coding (< 50 ms) and intrinsic error robustness, while minimizing complexity and achieving competitive coding gains and audio quality at bit rates around 32 kbit/s and higher. ACC source, perceptual, and bandwidth extension tools are described, and an emphasis is placed on ACC structural and operational features making it suitable for real-time, two-way audio communication. A few performance results are also presented. Audio demonstrations are available at http://www.atc-labs.com/acc/.
[Poster Presentation Associated with Paper Presentation P20-3]
Convention Paper 6790 (Purchase now)
P23-11 ISO/IEC MPEG-4 High-Definition Scalable Advanced Audio Coding—Ralf Geiger, Fraunhofer IIS - Erlangen, Germany; Rongshan Yu, Institute for Infocomm Research - Singapore; Jürgen Herre, Fraunhofer IIS - Erlangen, Germany; Susanto Rahardja, Institute for Infocomm Research - Singapore; Sang-Wook Kim, Samsung Electronics - Suwon, Korea; Xiao Lin, Institute for Infocomm Research - Singapore; Markus Schmidt, Fraunhofer IIS - Erlangen, Germany
Recently, the MPEG Audio standardization group has successfully concluded the standardization process on technology for lossless coding of audio signals. This paper provides a summary of the Scalable Lossless Coding (SLS) technology as one of the results of this standardization work. MPEG-4 Scalable Lossless Coding provides a fine-grain scalable lossless extension of the well-known MPEG-4 AAC perceptual audio coder up to fully lossless reconstruction at word lengths and sampling rates typically used for high-resolution audio. The underlying innovative technology is described in detail and its performance is characterized for lossless and near-lossless representation, both in conjunction with an AAC coder and as a stand-alone compression engine. A number of application scenarios for the new technology are also discussed.
[Poster Presentation Associated with Paper Presentation P20-4]
Convention Paper 6791 (Purchase now)
P23-12 A Scalable CELP/Transform Coder for Low Bit Rate Speech and Audio Coding—Guillaume Fuchs, Roch Lefebvre, University of Sherbrooke - Sherbrooke, Quebec, Canada
With the increase of channel capacity in communication systems, several emerging applications require an acceptable reproduction quality for speech signals at low bit rates and a superior quality for any kind of audio inputs when more bandwidth is available. To meet this requirement, we propose a new scalable audio coding algorithm. The proposed coder consists of a wideband speech coder embedded in a multilayer transform coding algorithm. The transform coefficients are quantized using scalable lattice vector quantization. The global system exhibits low computational complexity and memory requirements and leads to a very fine-grained scalability. The new coding algorithm is suitable for communications over heterogeneous networks with no or uneven guarantee on the quality of service (QoS) for packet delivery.
Convention Paper 6802 (Purchase now)
P23-13 A New Low Bit-Rate Speech Coding Scheme for Mixed Content—Raghuram A., ATC Labs - Chatham, NJ, USA; Anibal Ferreira, ATC Labs - Chatham, NJ, USA, University of Porto, Porto, Portugal; Deepen Sinha, ATC Labs - Chatham, NJ, USA
Speech coding is a very mature research area and many coding schemes are available that provide speech qualities ranging from highly intelligible synthetic speech at about 2 kbit/s, to wideband natural speech at about 16 kbit/s. However, emerging application scenarios such as information services on broadcast radio are eliciting additional concurrent challenges not easily addressed by current speech coding technology, namely the need to code mixed audio material, the need to permit flexible bit-rate coding configurations, the need to scale effectively in quality in the range of 2 to 8 kbit/s, and the need to offer pleasant natural sound. In this paper we present a new very low rate speech/audio coding technology addressing those concurrent challenges thanks to the use of innovative approaches regarding accurate reconstruction of harmonic complexes, optimal coding of the excitation, efficient side information coding, and suitable combination of new bandwidth extension techniques. The structure of the speech/audio coder is detailed and its performance in the range of 2.4 to 12 kbit/s is illustrated and compared to that of reference coders.
Convention Paper 6803 (Purchase now)
P23-14 On Improving Parametric Stereo Audio Coding—Jimmy Lapierre, Roch Lefebvre, Sherbrooke University - Sherbrooke, Quebec, Canada
Existing schemes for stereo and spatial audio coding rely on psychoacoustically-relevant parametric models. These systems generally encode and transmit interchannel intensity, coherence, and phase parameters extracted from a time-frequency plane. Building on this framework, we discuss a number of potential refinements that can improve the quality or reduce the bit rate of these existing schemes using information already transmitted to the decoder. We also evaluate and assess the performance of these enhancements with a distortion analysis of the relevant parameters.
Convention Paper 6804 (Purchase now)
P23-15 Stack-Run Audio Coding—Marie Oger, Julien Bensa, Stéphane Ragot, France Télécom R&D - Lannion Cedex, France; Marc Antonini, Lab. I3S-UMR 6070 CNRS and University of Nice Sophia Antipolis - Sophia Antipolis, France
In this paper we present an application of stack-run entropy coding to audio compression. Stack-run coding represents signed integers and zero run lengths by adaptive arithmetic coding using a quaternary alphabet (0, 1, +, -). We use this method to encode scalar quantization indices representing the MDCT spectrum of perceptually-weighted wideband audio signals (sampled at 16000 Hz). Noise injection and pre-echo reduction are also used to improve quality. The average quality of the proposed technique is similar to ITU-T G.722.1. In addition, we compare the performance of scalar quantization with stack-run coding to the multirate lattice vector quantization of 3GPP AMR-WB+.
Convention Paper 6805 (Purchase now)
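The quaternary alphabet (0, 1, +, -) can be illustrated with a toy encoder/decoder pair. The mapping below is a simplified scheme in the spirit of stack-run coding, not the exact symbol semantics of the paper, and the adaptive arithmetic coding of the symbol stream is omitted:

```python
def stack_run_encode(coeffs):
    """Toy stack-run-style code: zero-run lengths are written over
    {0, 1} (binary of run+1, MSB first), magnitudes over {+, -}
    (LSB first, '-' = bit 1), with a final sign symbol standing in
    for the magnitude's implicit leading 1 bit."""
    out, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
            continue
        out.append(bin(run + 1)[2:])                  # run field, never empty
        bits = bin(abs(c))[2:]
        for b in bits[::-1][:-1]:                     # magnitude, implicit MSB dropped
            out.append('-' if b == '1' else '+')
        out.append('+' if c > 0 else '-')             # sign terminates the value
        run = 0
    if run:
        out.append(bin(run + 1)[2:])                  # trailing zeros
    return ''.join(out)

def stack_run_decode(sym):
    coeffs, i = [], 0
    while i < len(sym):
        j = i
        while j < len(sym) and sym[j] in '01':        # read the run field
            j += 1
        coeffs.extend([0] * (int(sym[i:j], 2) - 1))
        i = j
        if i == len(sym):
            break
        j = i
        while j < len(sym) and sym[j] in '+-':        # read one value field
            j += 1
        body, sign = sym[i:j - 1], sym[j - 1]
        m = int('1' + ''.join('1' if s == '-' else '0' for s in body)[::-1], 2)
        coeffs.append(m if sign == '+' else -m)
        i = j
    return coeffs

data = [0, 0, 3, 0, -1, 6, 0, 0, 0]
print(stack_run_encode(data))   # '11-+10-1+-+100'
```

The point of the four-symbol alphabet is that run and value fields alternate and self-delimit, so a single adaptive arithmetic coder over the symbols can exploit the skewed statistics of sparse MDCT spectra.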
P23-16 A Codebook-Based Cascade Coder for Embedded Lossless Audio Coding—Christian Ritz, Kevin Adistambha, Jason Lukasiak, Ian Burnett, University of Wollongong - Wollongong, New South Wales, Australia
Embedded lossless audio coding embeds a perceptual audio coding bit stream within a lossless audio coding bit stream. Such an approach provides access to both a lossy and lossless version of the audio signal within the one coding scheme. Previously, a lossless embedded audio coder based on the Advanced Audio Coding (AAC) approach and utilizing both backward Linear Predictive Coding (LPC) and cascade coding was proposed. This paper further investigates the adaptation of cascade coding to lossless audio compression using a novel codebook-based approach. The codebook is trained using LPC residual signals obtained from the decorrelation stage of the embedded coder. Results show that the overall lossless compression performance of cascade coding closely follows Rice coding.
Convention Paper 6806 (Purchase now)
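The Rice coding baseline referred to above encodes each residual as a unary quotient plus a k-bit binary remainder. A minimal sketch with a zig-zag map for signed residuals (per-block parameter selection is omitted; k is assumed to be at least 1):

```python
def rice_encode(values, k):
    """Rice code (k >= 1): zig-zag map signed values to unsigned,
    then emit a unary quotient and a k-bit binary remainder."""
    out = []
    for v in values:
        u = 2 * v if v >= 0 else -2 * v - 1   # zig-zag: sign into the LSB
        q, r = divmod(u, 1 << k)
        out.append('1' * q + '0' + format(r, '0{}b'.format(k)))
    return ''.join(out)

def rice_decode(bits, k, count):
    vals, i = [], 0
    for _ in range(count):
        q = 0
        while bits[i] == '1':                 # unary part
            q += 1
            i += 1
        i += 1                                # skip the terminating '0'
        r = int(bits[i:i + k], 2)             # k-bit remainder
        i += k
        u = (q << k) | r
        vals.append(u // 2 if u % 2 == 0 else -(u + 1) // 2)
    return vals

residuals = [0, -1, 3, 7, -6]
code = rice_encode(residuals, 2)
print(code, rice_decode(code, 2, len(residuals)))
```

Rice codes are near-optimal for the geometrically distributed LPC residuals mentioned in the abstract, which is why the cascade coder's performance is benchmarked against them.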
P23-17 A Unified Transient Detector for Enhanced aacPlus Encoder—Samsudin, Boon Poh Ng, Nanyang Technological University - Singapore; Evelyn Kurniawati, STMicroelectronics Asia Pacific Pte. Ltd. - Singapore; Farook Sattar, Nanyang Technological University - Singapore; Sapna George, STMicroelectronics Asia Pacific Pte. Ltd. - Singapore
An enhanced aacPlus audio codec is a combination of MPEG-4 Advanced Audio Coding (AAC), Spectral Band Replication (SBR), and Parametric Stereo (PS). To deal with transient signals, SBR and AAC employ separate transient detectors, although both detectors basically perform detection on the same signal. This paper presents a low-complexity transient detector that operates in a PS encoder. It performs online detection at the same time as PS spatial parameter extraction and takes advantage of some computations performed for subband grouping. Testing on percussive solo instrument signals and percussive mixtures shows good agreement with the transient information generated by the original SBR and AAC detectors, at a much lower computational cost. This implies that the complexity of the encoder can be reduced by replacing both detectors with the proposed unified low-complexity detector.
Convention Paper 6807 (Purchase now)
P23-18 New Results in Rate-Distortion Optimized Parametric Audio Coding—Mads Græsbøll Christensen, Søren Holdt Jensen, Aalborg University - Aalborg, Denmark
In this paper we summarize some recently published methods and results in parametric audio coding. These are all based on rate-distortion optimized coding using a perceptual distortion measure. We summarize how a number of well-known computationally efficient methods for incorporating perception in sinusoidal parameter estimation relate to minimizing this perceptual distortion measure. Then a number of methods for parametric coding of transients are compared, and results of listening tests are presented. Finally, we show how the complexity of rate-distortion optimized audio coding can be reduced by rate-distortion estimation.
Convention Paper 6808 (Purchase now)
P23-19 Harmonic Structure Reconstruction in Audio Compression Method Based on Spectral-Oriented Trees—Wei-Chen Chang, Jing-Xin Wang, Alvin Su, National Cheng-Kung University - Tainan, Taiwan
A novel audio compression method called Harmonic Structure Quad Tree is presented. The method employs a bit-plane based quantization-encoding method called Concurrent Encoding in Hierarchical Tree to encode MDCT coefficients of the overlapping audio frames. Scalability is easily achieved by discarding the trailing bits at any position of the bit stream as long as the header information is preserved. In this paper an embedded harmonic structure reconstruction method is proposed to predict and restore the coefficients missed during the encoding process. The proposed method compares favorably with the popular MP3 coder on both pop and classical audio programs. No psychoacoustic model is used. The computational complexity and the coding table size are much smaller than those of an MP3 coder.
Convention Paper 6809 (Purchase now)
P23-20 An Experimental Audio Coder Using Rate-Distortion Controlled Temporal Block Switching—Johannes Boehm, Sven Kordon, Peter Jax, Thomson Corporate Research - Hannover, Germany
To address the requirement of piecewise stationarity within the analyzed signal segments, today's state-of-the-art audio codecs make use of two filter bank resolutions. Short temporal resolution sequences are used to adapt to transient signal onsets; long temporal resolutions are used to effectively code the more steady or slowly drifting waveforms. With increasing computational capacity, a better adaptation of the filter bank to the signal becomes feasible. This paper presents an experimental MDCT-based transform coder that is capable of switching between four filter bank resolutions. A distortion measure is employed that is driven by a simple psychoacoustic model incorporating masking effects for both stationary and transient signals. A rate-distortion control is proposed to partition the signal so as to optimally match the signal contour with the temporal resolutions of the filter bank. Performance results are presented and compared to the conventional two-resolution approach. Proposals for further developments such as pre-segmentation are evaluated.
Convention Paper 6810 (Purchase now)
P23-21 Detection and Extraction of Transients for Audio Coding—Oliver Niemeyer, Thomson Corporate Research - Hannover, Germany; Bernd Edler, University of Hannover - Hannover, Germany
An algorithm for the detection and extraction of transient signal components is presented. It is based on detecting sharp onsets of signal power along the time axis of a complex time-frequency representation. The detected transients are then extracted in the corresponding MDCT spectrum. The audio signal containing only the extracted transients is synthesized using the inverse MDCT. In an audio coding application, this transient signal and the resulting residual signal can be coded separately using specifically optimized coders. One such audio coding scheme, using an MDCT-based coder for the transient signal, is also presented.
Convention Paper 6811 (Purchase now)
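The onset-detection idea can be illustrated in its crudest form with short-time power ratios in the time domain (the paper operates on a complex time-frequency representation; the names and the threshold below are illustrative):

```python
import numpy as np

def detect_onsets(x, frame_len=256, threshold=4.0):
    """Flag frames whose short-time power jumps sharply relative to
    the previous frame: a crude proxy for transient onsets."""
    n = len(x) // frame_len
    frames = x[:n * frame_len].reshape(n, frame_len)
    power = (frames ** 2).sum(axis=1) + 1e-12
    return [i for i in range(1, n) if power[i] / power[i - 1] > threshold]

# Silence with a single click; the click lands in frame 2 (samples 512..767)
x = np.zeros(2048)
x[600] = 1.0
print(detect_onsets(x))   # [2]
```

Working per frequency band instead, as the paper does, lets a weak high-frequency attack be detected even when it is buried under strong steady low-frequency content.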
P23-22 Audio Coding Using a Genetic Algorithm—David Marston, BBC R&D - Tadworth, Surrey, UK
Current MPEG-1 Layer II encoders incorporate a feedforward technique in which a psychoacoustic model derived from the input signal drives the bit allocation. This paper describes a novel approach in which a psychoacoustic metric compares the output signal with the input signal to drive the bit allocation, using a genetic algorithm in the feedback process. The audio quality is compared with that of leading conventional audio coders. The aim of the work was to assess how far Layer II coding can be improved and whether any further progress can be made with conventional coding.
Convention Paper 6812 (Purchase now)
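A feedback bit-allocation loop driven by a genetic algorithm can be sketched as follows. The fitness function here is a closed-form noise-versus-mask model standing in for the paper's decode-and-compare psychoacoustic metric, and all parameters are illustrative:

```python
import random

def ga_allocate(power, mask, total_bits, pop=30, gens=200, seed=0):
    """Toy GA: evolve per-band bit allocations so modeled quantization
    noise (power * 4**-bits) stays below the masking threshold."""
    rng = random.Random(seed)
    nb = len(power)

    def rand_alloc():
        a = [0] * nb
        for _ in range(total_bits):
            a[rng.randrange(nb)] += 1
        return a

    def fitness(a):  # penalize only noise that rises above the mask
        return -sum(max(0.0, p * 4.0 ** -b - m) for p, m, b in zip(power, mask, a))

    popu = [rand_alloc() for _ in range(pop)]
    for _ in range(gens):
        popu.sort(key=fitness, reverse=True)
        popu = popu[:pop // 2]                    # keep the fitter half
        while len(popu) < pop:
            p1, p2 = rng.sample(popu[:5], 2)      # breed from the elite
            cut = rng.randrange(1, nb)
            child = p1[:cut] + p2[cut:]
            while sum(child) > total_bits:        # repair the bit budget
                i = rng.randrange(nb)
                if child[i] > 0:
                    child[i] -= 1
            while sum(child) < total_bits:
                child[rng.randrange(nb)] += 1
            popu.append(child)
    return max(popu, key=fitness)

alloc = ga_allocate(power=[100.0, 10.0, 1.0, 0.1],
                    mask=[0.5, 0.5, 0.5, 0.5], total_bits=12)
print(alloc, sum(alloc))
```

In the paper's setting the fitness evaluation runs the actual encoder and a perceptual comparison of decoded versus input audio, which is what makes the feedback loop expensive but allocation-optimal in a way a feedforward model cannot be.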
P23-23 Parametric Representation of Multichannel Audio Based on Principal Component Analysis—Manuel Briand, David Virette, France Telecom R&D - Lannion Cedex, France; Nadine Martin, Laboratoire des Images et des Signaux - St. Martin D’Hères Cedex, France
Low-bit-rate parametric audio coding for multichannel audio is mainly based on Binaural Cue Coding (BCC). In this paper we show that the recently introduced Unified Domain Representation (UDR) of multichannel audio is equivalent to the BCC scheme in a parametric stereo coding context. Because spatial parameters can be represented by rotation angles, we propose a general model based on the Principal Component Analysis (PCA) approach. This model may be applied both to the parametric representation of multichannel audio signals and to upmix methods. Moreover, we apply the analysis results to propose a new parametric audio coding method based on frequency-subband PCA processing.
Convention Paper 6813 (Purchase now)
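The link between spatial parameters and rotation angles can be illustrated for the two-channel case: the PCA rotation angle of the interchannel covariance matrix doubles as a panning cue, and the first principal component serves as the downmix (a broadband sketch, not the paper's subband processing; names are illustrative):

```python
import numpy as np

def pca_downmix(left, right):
    """Rotate the stereo pair onto its principal axes: the rotation
    angle is the spatial parameter, the first component the downmix,
    the second the (low-energy) ambience/residual."""
    c = np.cov(np.vstack([left, right]))
    theta = 0.5 * np.arctan2(2.0 * c[0, 1], c[0, 0] - c[1, 1])
    rot = np.array([[np.cos(theta), np.sin(theta)],
                    [-np.sin(theta), np.cos(theta)]])
    principal, ambient = rot @ np.vstack([left, right])
    return principal, ambient, theta

# A coherent source panned toward the left
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
principal, ambient, theta = pca_downmix(0.9 * s, 0.4 * s)
print(theta)              # close to arctan(0.4 / 0.9): the pan angle
print(np.var(ambient))    # near zero: the downmix captures the source
```

Transmitting the angle per frequency subband, as the abstract proposes, lets the decoder invert the rotation and redistribute the principal component over the output channels.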
P23-24 A Dual Audio Transcoding Algorithm for Digital Multimedia Broadcasting Services—Kyoung Ho Bang, Yonsei University - Seoul, Korea; Young Cheol Park, Yonsei University - Wonju, Korea; Dae Hee Youn, Yonsei University - Seoul, Korea
In this paper we propose a dual audio transcoding algorithm to deliver high quality audio streams over a broadcasting network comprising heterogeneous audio formats. As two typical cases, audio transcoding from DTV to T-DMB and to S-DMB services is considered. While the Korean DTV audio standard employs Dolby AC-3, the Korean T-DMB and S-DMB services use the MPEG-4 BSAC and MPEG-4 HE-AAC audio coding technologies, respectively. In the proposed algorithm, the bit allocation information of AC-3 is reused in the BSAC and HE-AAC encoding processes, and the nested loops are restructured as two independent loops, which saves a significant amount of computation. Overall, the transcoding algorithm can save about 65 percent of the computational cost for BSAC encoding and 31 percent for HE-AAC encoding. Subjective quality evaluations show that the proposed algorithm has mean diffgrades of -0.02 and -0.01 relative to the tandem method. Due to its computational simplicity and effective performance, the proposed algorithm is suitable for mobile multimedia services.
Convention Paper 6814 (Purchase now)
P23-25 A Subband Domain Downmixing Scheme for Parametric Stereo Encoder—Samsudin, Nanyang Technological University - Singapore; Evelyn Kurniawati, STMicroelectronics Asia Pacific Pte. Ltd. - Singapore; Farook Sattar, Ng Boon Poh, Nanyang Technological University - Singapore; Sapna George, STMicroelectronics Asia Pacific Pte. Ltd. - Singapore
Parametric Stereo (PS) coding describes a stereo audio signal with a monaural signal and a set of spatial parameters. This paper describes a signal-dependent, subband-domain downmixing scheme for a PS encoder to obtain the monaural signal. The downmixing is performed on the subband signals output from the PS analysis filtering, hence no extra signal decomposition is required. The scheme minimizes phase cancellation by aligning the phases of the stereo signals prior to mixing. In addition, power equalization ensures that the overall power of the original stereo signal is preserved in the monaural downmixed signal. Additional computational requirements are kept low by reusing the available PS spatial parameter data for the phase alignment. Testing on synthetic and real-life audio recordings shows good performance, especially for recordings with a significant out-of-phase side-signal component.
Convention Paper 6815 (Purchase now)
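The phase alignment and power equalization steps can be sketched per complex subband sample as follows (the actual encoder reuses the PS spatial parameters rather than recomputing phases; names are illustrative):

```python
import numpy as np

def phase_aligned_downmix(L, R):
    """Per-bin complex downmix: rotate R onto L's phase before
    averaging (avoiding cancellation), then equalize power so the
    mono bin keeps the mean power of the two input channels."""
    ipd = np.angle(L * np.conj(R))        # interchannel phase difference
    d = 0.5 * (L + R * np.exp(1j * ipd))  # align, then average
    target = 0.5 * (np.abs(L) ** 2 + np.abs(R) ** 2)
    return d * np.sqrt(target / (np.abs(d) ** 2 + 1e-12))

# Bins that are 180 degrees out of phase cancel in a plain average
L = np.array([1.0 + 0j, 1j])
R = np.array([-1.0 + 0j, -1j])
print(np.abs(0.5 * (L + R)))                 # cancellation: zeros
print(np.abs(phase_aligned_downmix(L, R)))   # power preserved: ~ones
```

This is the failure mode the abstract targets: a plain average loses out-of-phase side-signal energy entirely, while alignment plus equalization keeps it in the mono downmix.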
(C) 2006, Audio Engineering Society, Inc.