AES Vienna 2007
Last Updated: 20070405, mei

P15 - Low Bit-Rate Audio Coding

Monday, May 7, 09:00 — 13:00

Chair: Karlheinz Brandenburg, Technical University of Ilmenau - Ilmenau, Germany

P15-1 A Biologically-Inspired Low-Bit-Rate Universal Audio Coder
Ramin Pichevar, Hossein Najaf-Zadeh, Louis Thibault, Communications Research Centre - Ottawa, Ontario, Canada
We propose a new biologically-inspired paradigm for universal audio coding based on neural spikes. Our approach generates sparse 2-D representations of audio signals, dubbed spikegrams. The spikegrams are generated by projecting the signal onto an over-complete set of adaptive gammachirp kernels (gammatones with additional tuning parameters). A masking model is applied to the spikegrams to remove inaudible spikes and increase the coding efficiency. The paradigm proposed in this paper is a first step toward a high-quality audio encoder that further processes the acoustical events generated in the spikegrams. After the necessary optimization and fine-tuning, our coding system, operating at 1 bit/sample for sound sampled at 44.1 kHz, is expected to deliver high-quality audio for broadcast and for other applications such as archiving and audio recording.
Convention Paper 7078
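The abstract of P15-1 does not name the projection method, so the following is only a minimal sketch under stated assumptions: it uses greedy matching pursuit, plain gammatone kernels rather than the adaptive gammachirps of the paper, and no masking model. The function names and all parameter values (filter order, bandwidth, kernel length) are hypothetical.

```python
import math

def gammatone(freq_hz, fs, length, order=4, bandwidth_hz=100.0):
    """Unit-norm gammatone kernel: t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*f*t)."""
    k = [(n / fs) ** (order - 1)
         * math.exp(-2 * math.pi * bandwidth_hz * n / fs)
         * math.cos(2 * math.pi * freq_hz * n / fs)
         for n in range(length)]
    norm = math.sqrt(sum(v * v for v in k))
    return [v / norm for v in k]

def matching_pursuit(signal, kernels, n_spikes):
    """Greedy decomposition into (channel, time, amplitude) spikes.

    Assumption: matching pursuit stands in for the projection step;
    the paper's adaptive kernels and masking model are not modelled.
    """
    residual = list(signal)
    spikes = []
    for _ in range(n_spikes):
        best = None
        for ch, kern in enumerate(kernels):
            for t in range(len(residual) - len(kern) + 1):
                # Correlation of the residual with the kernel placed at time t.
                amp = sum(residual[t + i] * kern[i] for i in range(len(kern)))
                if best is None or abs(amp) > abs(best[2]):
                    best = (ch, t, amp)
        ch, t, amp = best
        # Subtract the selected atom so later spikes explain what is left.
        for i, v in enumerate(kernels[ch]):
            residual[t + i] -= amp * v
        spikes.append(best)
    return spikes, residual
```

The list of spikes plays the role of a (very small) spikegram: each entry records which kernel fired, when, and with what amplitude.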

P15-2 The Relationship Between Basic Audio Quality and Selected Artifacts in Perceptual Audio Codecs—Part II: Validation Experiment
Paulo Marins, Francis Rumsey, Slawomir Zielinski, University of Surrey - Guildford, Surrey, UK
A pilot study was conducted to investigate the perceptual importance of selected audio coding artifacts and their relationship with basic audio quality. An additional experiment was undertaken to validate the results obtained in the pilot calibration experiment. A listening test was designed that required a panel of expert subjects to evaluate the selected artifacts used in the initial study. In this second experiment, however, certain experimental parameters were modified; these included different levels of degradation and program material. The outcomes of the validation experiment are presented in this paper along with a detailed evaluation of the impact of the chosen experimental artifacts on basic audio quality assessments for perceptual audio codecs.
Convention Paper 7079

P15-3 New Enhancements to Immersive Sound Field Rendition (ISR) System
Chandresh Dubey, Raghuram Annadana, Deepen Sinha, ATC Labs - Chatham, NJ, USA; Anibal Ferreira, ATC Labs - Chatham, NJ, USA, and University of Porto, Porto, Portugal
Consumer audio applications such as satellite radio broadcasts, multichannel audio streaming, and playback systems, coupled with the need to meet stringent bandwidth requirements, pose new challenges for parametric multichannel audio coding schemes. This paper describes the continuation of our research on the Immersive Soundfield Rendition (ISR) system and the enhancements made to its algorithmic components. Many applications need a constant bit rate, which requires a rate control mechanism; the strategies used in the rate loop are presented. In addition, an innovative phase-compensated downmixing scheme has been incorporated in the ISR system to generate a high-quality carrier signal. The blind up-mixing scheme has also been enhanced, with considerable gains in acoustic diversity. The enhancements to the ISR system and its performance are detailed. Audio demonstrations are available at http://www.atc-labs.com/isr.
Convention Paper 7080

P15-4 Aspects of Scalable Audio Coding
Chris Dunn, Independent Consultant - London, UK
Banded weight data is transmitted as side information within coded audio bit streams in order to achieve psychoacoustically-appropriate shaping of quantization noise. Methods of reducing the information overhead corresponding to weight data are discussed in the context of scalable bit-plane coding. Two approaches to coding band weight data are compared in terms of coding efficiency and error resilience. In the first approach, weights are coded as a block of data at the beginning of each frame, using a predictor and Golomb coding of weight prediction residuals to achieve high coding efficiency. This approach is compared to coding weights for bands as they become significant, with weight data distributed across each coded bit stream frame.
Convention Paper 7081
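The first approach in P15-4 (a predictor followed by Golomb coding of the weight prediction residuals) can be illustrated with a small sketch. It assumes the simplest possible predictor (the previous band's weight) and the Rice special case of Golomb coding, where the Golomb parameter is a power of two; the abstract specifies neither, and all function names are hypothetical.

```python
def zigzag(n: int) -> int:
    """Map a signed residual to a non-negative integer: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * n if n >= 0 else -2 * n - 1

def unzigzag(u: int) -> int:
    """Inverse of zigzag()."""
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)

def rice_encode(u: int, k: int) -> str:
    """Golomb-Rice code with m = 2**k: unary quotient, a '0' stop bit, k-bit remainder."""
    q, r = u >> k, u & ((1 << k) - 1)
    remainder = format(r, "b").zfill(k) if k else ""
    return "1" * q + "0" + remainder

def encode_weights(weights, k=1):
    """Code each band weight as the residual against the previous band's weight."""
    prev, bits = 0, []
    for w in weights:
        bits.append(rice_encode(zigzag(w - prev), k))
        prev = w
    return "".join(bits)

def decode_weights(bits, count, k=1):
    """Recover `count` band weights from a bit string made by encode_weights()."""
    pos, prev, out = 0, 0, []
    for _ in range(count):
        q = 0
        while bits[pos] == "1":  # unary-coded quotient
            q += 1
            pos += 1
        pos += 1  # skip the '0' stop bit
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        prev += unzigzag((q << k) | r)
        out.append(prev)
    return out
```

Because each weight is predicted from the previous one, a single bit error early in the block can corrupt every later weight in the frame, which is the kind of error-resilience tradeoff the abstract compares against distributing the weight data across the frame.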

P15-5 Source-Controlled Variable Bit Rate Extension for the AMR-WB+ Audio Codec
Amélie Marty, Roch Lefebvre, Université de Sherbrooke - Sherbrooke, Quebec, Canada
This paper presents a source-controlled, variable bit rate extension to the AMR-WB+ standard audio codec. AMR-WB+ allows multirate operation and, in particular, rate switching at every frame. However, the standard does not support source-controlled rate determination since it does not include a signal classifier. The proposed extension includes a signal classifier and rate mapping function for each signal class. Classification is performed at a lower frame rate compared to AMR-WB+, with typically one classification decision every second. Significant rate savings can be achieved by encoding speech at lower rates than other signals such as music. Applications include audio broadcasting over packet networks and storage of multimedia signals with mixed signals in the audio track.
Convention Paper 7082
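The rate mapping described in P15-5 can be sketched as a lookup from signal class to bit rate, applied once per classification decision. The class names and the rates below are illustrative values only; they are not taken from the paper or from the AMR-WB+ standard, and the classifier itself is not modelled (class labels are taken as input).

```python
# Hypothetical class-to-rate mapping in kbit/s (illustrative values only).
RATE_KBPS = {"speech": 12.0, "music": 24.0, "silence": 6.0}

def average_rate(labels, rate_map=RATE_KBPS):
    """Mean bit rate when each label covers one equal-length decision interval."""
    return sum(rate_map[label] for label in labels) / len(labels)

def saving_vs_fixed(labels, fixed_kbps, rate_map=RATE_KBPS):
    """Fraction of bits saved relative to coding everything at fixed_kbps."""
    return 1.0 - average_rate(labels, rate_map) / fixed_kbps
```

For example, with one decision per second, a ten-second clip labelled as six seconds of speech, three of music, and one of silence averages 15 kbit/s under this mapping, compared with 24 kbit/s if everything were coded at the music rate.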

P15-6 Multiple Description Coding for Audio Transmission Using Conjugate Vector Quantization
Mylene Kwong, Roch Lefebvre, Soumaya Cherkaoui, Université de Sherbrooke - Sherbrooke, Quebec, Canada
This paper explores robustness issues for real-time audio transmission over perturbed networks where multiple paths can be considered. Conjugate vector quantization (CVQ), a form of multiple description coding, can improve resilience to packet losses. This work presents a generalized CVQ structure, where K>2 different conjugate codebooks are trained to create the best resulting codebook. Experiments show that four-description CVQ performs very close to unconstrained VQ in clear channel conditions, while providing significant improvements in lossy channels. We also present a fast search algorithm that allows tradeoffs between computational complexity and memory storage at the encoder. This robust quantization scheme can encode sensitive information such as spectral coefficients in a speech coder or a perceptual audio coder.
Convention Paper 7083
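As a rough illustration of the idea behind P15-6, the sketch below shows the basic two-description case of conjugate VQ, not the generalized K>2 structure or the fast search of the paper. It assumes the reconstruction is the average of the codevectors that arrive, with one index sent per network path; the codebooks are tiny hand-made examples rather than trained ones, and all names are hypothetical.

```python
from itertools import product

# Hypothetical 2-D codebooks; a real system would train them jointly.
CODEBOOK_A = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
CODEBOOK_B = [(0.0, 1.0), (1.0, 0.0), (2.0, 2.0)]

def sq_err(x, y):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def mean(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(c) / n for c in zip(*vectors))

def encode(x):
    """Exhaustive search for the index pair whose averaged codevectors best match x."""
    pairs = product(range(len(CODEBOOK_A)), range(len(CODEBOOK_B)))
    return min(pairs, key=lambda ij: sq_err(
        x, mean([CODEBOOK_A[ij[0]], CODEBOOK_B[ij[1]]])))

def decode(i=None, j=None):
    """Average whichever descriptions arrived; pass None for a lost index."""
    received = []
    if i is not None:
        received.append(CODEBOOK_A[i])
    if j is not None:
        received.append(CODEBOOK_B[j])
    if not received:
        raise ValueError("both descriptions lost")
    return mean(received)
```

When both indices arrive the decoder reproduces the jointly optimized vector; when one is lost it falls back to the surviving codevector alone, a coarser but still usable reconstruction.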

P15-7 MPEG Surround—The ISO/MPEG Standard for Efficient and Compatible Multichannel Audio Coding
Juergen Herre, Fraunhofer IIS - Erlangen, Germany; Kristofer Kjörling, Coding Technologies AB - Stockholm, Sweden; Jeroen Breebaart, Philips Research - Eindhoven, The Netherlands; Christof Faller, Agere Systems - Allentown, PA, USA; Sascha Disch, Fraunhofer Institute IIS - Erlangen, Germany; Heiko Purnhagen, Coding Technologies - Stockholm, Sweden; Jeroen Koppens, Philips Applied Technologies - Eindhoven, The Netherlands; Johannes Hilpert, Fraunhofer Institute IIS - Erlangen, Germany; Jonas Rödén, Coding Technologies - Stockholm, Sweden; et al.
In 2004 the ISO/MPEG Audio standardization group started a new work item on efficient and backward-compatible coding of high-quality multichannel sound using parametric coding techniques. Finalized in the fall of 2006, the resulting MPEG Surround specification allows surround sound to be carried at bit rates that have commonly been used for coding mono or stereo sound. This paper summarizes the results of the standardization process by describing the underlying ideas and providing an overview of the MPEG Surround technology. The performance of the scheme is characterized by the results of the recent verification tests. These tests cover several operation modes as they would be used in typical application scenarios to introduce multichannel audio into existing audio services.
Convention Paper 7084

P15-8 Adaptive Design of the Preprocessing Stage for Stereo Lossless Audio Compression
Florin Ghido, Ioan Tabus, Tampere University of Technology - Tampere, Finland
We propose a novel lossless audio compression scheme that combines stereo preprocessing with stereo prediction. We show that such a scheme provides improved asymmetrical compression at almost no complexity increase for the decoder (compared with stereo prediction alone), or the same compression at lower decoder complexity. The stereo prediction stage is preceded by a rotation-like channel transformation, which improves compression by requiring smaller optimal inter-channel prediction orders and by yielding prediction coefficients of smaller magnitude. On a corpus of 84 audio files in CD-audio format (51.6 GB in total), the OptimFROG-AS (asymmetric) lossless audio compressor using stereo prediction with orders 8/4 achieved compression improvements of up to 5.10 percent, and 0.23 percent on average.
Convention Paper 7085
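The abstract of P15-8 does not say how the rotation-like channel transformation is kept lossless. One generic way, shown here purely as an assumption and not as the authors' method, is to factor a rotation into three lifting (shear) steps with rounding, which makes it exactly invertible on integer samples. The function names and the choice of angle are hypothetical.

```python
import math

def _lift_params(theta):
    """Shear coefficients for R(theta) = shear_p * shear_s * shear_p.

    theta must not be a multiple of pi, since sin(theta) would be zero.
    """
    return (math.cos(theta) - 1.0) / math.sin(theta), math.sin(theta)

def rotate(l, r, theta):
    """Integer approximation of a rotation of the sample pair (l, r)."""
    p, s = _lift_params(theta)
    l += round(p * r)
    r += round(s * l)
    l += round(p * r)
    return l, r

def unrotate(l, r, theta):
    """Exact inverse of rotate(): undo the three lifting steps in reverse order."""
    p, s = _lift_params(theta)
    l -= round(p * r)
    r -= round(s * l)
    l -= round(p * r)
    return l, r
```

With an angle of -pi/4 and strongly correlated channels, most of the energy ends up in the first output and the second stays small, which is the kind of decorrelation that can make the following prediction stage cheaper. Each lifting step adds a rounded function of the other channel, so subtracting the same quantity in reverse order restores the original samples bit for bit.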