AES 17th Conference
High Quality Audio Coding
1999, September 2nd-5th


Villa Castelletti
Signa (near Florence)



1-1 Audio Coding in Digital Broadcasting Systems
Martin Dietz, Fraunhofer-Institut für Integrierte Schaltungen IIS-A, Erlangen, Germany (Invited)

High quality audio coding is one of the key technologies for digital broadcasting systems, since it allows data rates low enough for cost- and spectrum-efficient audio broadcasting in the digital domain. The paper gives an overview of existing and emerging digital broadcasting systems, including Eureka DAB, WorldSpace, DVB, ATSC, ARIB, IBOC and DRM.

1-2 Streaming-Audio@Internet: Perspectives for the broadcasters!
Gerhard Stoll, IRT, Munich, Germany (Invited)

The Internet's impact on broadcasters has changed from its initial position as a site for posting broadcasting information to its emergence as a valuable additional medium to reach new customers through streaming audio and new interactive concepts using push and pull technologies. Increased availability of bandwidth, faster modems, and improved, scalable audio coding schemes will create the opportunity for better audio quality on the Web. Will the Internet be able to broadcast in the near future not only news but also music channels and sports events with a sufficient quality of service, maybe even in surround sound?

1-3 DVD-Audio Specification
Hiroaki Suzuki, JVC Multimedia System Development Center, Yamato, Kanagawa, Japan (Invited)

DVD-Audio Specifications Ver. 1.0 will have been published in early spring of 1999. DVD-Audio is a product developed through discussions between WG-4 of the DVD Forum, mainly representing hardware manufacturers, and the ISC (International Steering Committee), representing the three recording industry associations IFPI, RIAA, and RIAJ. In addition to features for high-quality pure audio and multi-channel music, DVD-Audio offers various multimedia and value-added capabilities. The paper describes the major features which will surely make DVD-Audio successful as a new audio format.

1-4 Dolby Digital: Audio Coding for Digital Television and Storage Applications
Steve Vernon, Dolby Laboratories Inc., San Francisco, California, USA

Dolby Digital is a worldwide audio coding standard, with applications that include digital television and home theater. Its underlying perceptual coding engine provides high quality multichannel audio at low bitrates, without requiring excessive computational complexity. Dolby Digital also supports a number of system features that improve the overall listening experience, such as volume normalization and dynamics processing. This paper presents an overview of Dolby Digital, focusing on the design elements that differentiate it from competing systems and that have contributed to its success.

2-1 The MLP Lossless Compression System
M.A. Gerzon (deceased), P.G. Craven, J.R. Stuart, M.J. Law and R.J. Wilson, Meridian Audio, Cambridge, UK (Invited)

Lossless compression provides bit-exact delivery of the original signal and is ideal where the highest possible confidence in the final sound quality is required. MLP (Meridian Lossless Packing) has recently been adopted as the lossless coding method used on DVD-Audio.
MLP uses four principal strategies to reduce both the total quantity and the peak rate of encoded data. MLP can invert a matrix transformation losslessly: this allows a 2-channel representation to be transmitted alongside a multichannel signal, with minimal increase in data rate.
The paper illustrates how the characteristics of the incoming audio affect the coding performance, and indicates MLP's versatility, achieved by use of substreams and an open-ended metadata specification.
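
As a rough illustration of how a matrix transformation can be inverted losslessly, the following sketch shows a single integer "lifting" step: one channel is modified by adding a quantized multiple of another, and the decoder subtracts exactly the same quantized value. This is a generic technique and only an assumption about the principle involved; the function names, the coefficient format (numerator and shift) and the test values are hypothetical and are not taken from the MLP specification.

```python
def quantized_mix(sample, numerator, shift):
    """Integer approximation of sample * numerator / 2**shift (rounds toward -inf)."""
    return (sample * numerator) >> shift


def lift_forward(ch_a, ch_b, numerator, shift):
    """Encoder side: ch_a passes through, ch_b becomes ch_b + Q(c * ch_a)."""
    mixed = [b + quantized_mix(a, numerator, shift) for a, b in zip(ch_a, ch_b)]
    return ch_a, mixed


def lift_inverse(ch_a, ch_b_mixed, numerator, shift):
    """Decoder side: subtract the identical quantized term to recover ch_b bit-exactly."""
    restored = [b - quantized_mix(a, numerator, shift) for a, b in zip(ch_a, ch_b_mixed)]
    return ch_a, restored


if __name__ == "__main__":
    a = [3, -7, 12, 0, -1]
    b = [5, 2, -9, 4, -3]
    enc_a, enc_b = lift_forward(a, b, numerator=3, shift=2)  # c = 0.75
    dec_a, dec_b = lift_inverse(enc_a, enc_b, numerator=3, shift=2)
    print(dec_a == a and dec_b == b)
```

Because the rounding is applied to a value that both encoder and decoder can compute from an unmodified channel, the rounding error cancels exactly. A full matrix can be built by chaining several such steps, each modifying one channel at a time.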

2-2 DTS Multi-Channel 96 kHz Audio Compression
Yu-Li You, Zoran Fejzo, Stephen Smyth, Paul Smith, Marina Bosi and Ming Yan, Digital Theater Systems, Inc., Agoura Hills, CA, USA (Invited)

This paper presents DTS multi-channel audio compression technology that operates at a sampling frequency of up to 96 kHz and remains compatible with DTS first generation technology, which delivers 5.1 channel audio at a sampling frequency of up to 48 kHz. High-sampling frequency (up to 96 kHz) multi-channel audio is decomposed into core audio (up to 48 kHz) and a residual. The core audio is encoded using DTS first generation technology and the encoded core bit stream is fully compatible with first generation DTS decoders in the market. The residual is encoded using technologies that extend the sampling frequency to up to 96 kHz and can even improve the quality of the core audio. The compressed residual is attached as an extension to the core bit stream. The extension bit stream is ignored by the first generation DTS decoders but can be decoded by the second generation DTS decoders. The encoding algorithm can deliver 96 kHz multi-channel audio at 24 bits per sample with a rate of up to 4.5 Mbps. Details of the main processing blocks of the core and extension decoding algorithms, their computational load and memory requirements are presented. One implementation is described of a 5.1 channel, 96 kHz, 24-bit decoding process operating on dual SHARC 21065L 32-bit floating-point processors. This particular configuration confines all the essential high sampling rate operations to the second processor, allowing a simple hardware upgrade path to be considered for the 96/24 'high definition' audio format.

2-3 The Design of a Video Friendly Audio Coding System for Distribution Applications
Louis D. Fielder, Craig C. Todd, Dolby Laboratories, Inc., San Francisco, CA, USA (Invited)

Dolby E is an audio bit rate reduction system for audio-visual distribution applications. The coding system was developed for situations involving tandem coding operations, editing and switching at video frame boundaries, and transport on AES3 serial-digital data paths. Special TDAC transforms are organized in a frame structure that allows for adaptive transform lengths and frame synchrony with video framing.

3-1 MP3 and AAC explained
Karlheinz Brandenburg, Fraunhofer-Institut für Integrierte Schaltungen IIS-A, Erlangen, Germany (Invited)

Recent years have seen a widespread proliferation of .mp3 files, from both legal and illegal sources. Yet most people using these audio files do not know much about audio compression and how to use it. The paper gives an introduction to audio compression for music file exchange. Beyond the basics, the focus is on quality issues and the tradeoffs between compression ratio, audio bandwidth, and artifacts.

3-2 MPEG-4 Intellectual Property Management & Protection (IPMP) Overview & Applications
Jack Lacy, AT&T Laboratories, Florham Park, NJ, USA; Niels Rump, Fraunhofer-Institut für Integrierte Schaltungen IIS-A, Erlangen, Germany; Talal Shamoon, InterTrust Technologies Corporation, Sunnyvale, CA, USA; Panos Kudumakis, CRL, London, England (Invited)

This presentation acquaints you with the MPEG-4 Intellectual Property Management & Protection (IPMP) Framework, a generic approach to the specification of a framework for the management and protection of intellectual property rights. Besides specifying a flexible way to identify content, MPEG-4 IPMP offers a "hooks" mechanism that allows for the inclusion of information in the MPEG-4 bit stream that is used to indicate that a particular non-normative IPMP system is to be used to manage access to that stream. This enables the design of applications that provide efficient management and protection of both content-related and patent-related IP.

3-3 AESSC SC-06-04 Activities on Digital Music Distribution
Phil Wiser, Liquid Audio, Redwood City, CA, USA; Niels Rump, Fraunhofer-Institut für Integrierte Schaltungen IIS-A, Erlangen, Germany (Invited)

Since forming in July 1997, the SC-06-04 standard committee has addressed many online audio issues ranging from audio quality to security aspects of digital music. This group initially focused on mechanisms for calibrating often abused audio quality descriptors such as "CD quality". The group has also hosted discussions with international rights organizations to address the needs of that community. Finally, the SC-06-04 has provided a forum for discussion of ideas that are critical issues for the newly formed SDMI process.

3-4 SDMI: Creating a Marketplace for Secure Digital Music
Cary Sherman, Recording Industry Association of America, Washington D.C., USA (Invited)

A progress report on the Secure Digital Music Initiative. SDMI is a forum for the worldwide recording industry and technology companies to develop an open, interoperable architecture and specification for digital music security. The goal is to meet consumer demand for convenient access to digital music, protect artists' rights online, and allow technology and music companies to build successful businesses.

4-1 MPEG-4 speech coding
Masayuki Nishiguchi, Audio and Speech Group, Sony Corporation, Tokyo, Japan (Invited)

The MPEG-4 Audio standard (ISO/IEC 14496-3) supports highly efficient representation of speech signals at 2.0-24 kbps using two basic algorithms: Harmonic Vector eXcitation Coding (HVXC) and Code Excited Linear Prediction (CELP). This paper gives an overview of the MPEG-4 speech coding algorithms and of new functionalities such as speed/pitch change and bit-rate scalability, which, together with high coding efficiency, distinguish MPEG-4 from other standards.

4-2 MPEG-4 Structured Audio Authoring Considerations
Lee Ray, Create Audio Technology Center, Scotts Valley, CA, USA (Invited)

MPEG-4 Structured Audio (SA) provides sets of materials which can be manipulated either as the author has specified in creation, as the listener specifies during the listening session or with a combination of both kinds of control. Unlike a system that is a "straight wire with (coding) gain", SA can transmit new kinds of interactive content. Various strategies for authoring interactive content with SA are reviewed.

4-3 The MPEG-4 General Audio Coder
Bernhard Grill, Fraunhofer-Institut für Integrierte Schaltungen IIS-A, Erlangen, Germany (Invited)

The paper presents the MPEG-4 general audio (GA) coder which - unlike the MPEG-4 speech or synthetic coding schemes - is designed to work with any type of audio material. The GA coder includes all features of MPEG-2 AAC, plus extensions for improved coding quality at very low bitrates, including the TwinVQ scheme. In addition, embedded coding techniques, where one program is represented by a hierarchical set of coding layers, further extend the capabilities.

4-4 An Overview of MPEG-4 Audio Version 2
Heiko Purnhagen, Laboratorium für Informationstechnologie, University of Hannover, Germany (Invited)

The MPEG-4 Audio Standard provides audio and speech coding for natural and synthetic content at bitrates ranging from 2 to 64 kbit/s and above. While the first version of MPEG-4 Audio was finalized in 1998, work continues for Version 2, complementing MPEG-4 Audio by the following new tools: Error Resilience, Low-Delay Audio Coding, Small Step Scalability, Parametric Audio Coding, and Environmental Spatialisation.

5-1 Application of a physiological ear model to irrelevance reduction in audio coding
Frank Baumgarte, Institut für Theoretische Nachrichtentechnik und Informationsverarbeitung, Universität Hannover, Germany

A previously published physiological ear model is applied as the perceptual model in an audio coder complying with the ISO/MPEG-2 AAC standard. The subjective sound quality achieved is compared to results from an optimized psychoacoustical model. Significant deviations between the masked thresholds generated by the physiological ear model and those of the psychoacoustical model are evaluated with respect to psychoacoustical measurements.

5-2 Audio coding with auditory time-frequency noise shaping and irrelevancy reducing vector quantization
Markus Vaalgamaa, Aki Härmä, Unto Laine, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Espoo, Finland

A perceptual audio codec using Warped Linear Prediction, Temporal Noise Shaping and Vector Quantization is presented. These techniques are used to shape the quantization noise according to the spectral and temporal masking characteristics of the ear. The quantization process is controlled by an auditory model. In addition, simulations and subjective listening test results for the codec are presented.

5-3 Generalized Audio Coding with MPEG-4 Structured Audio
Eric D. Scheirer, Youngmoo E. Kim, Machine Listening Group, MIT Media Laboratory, Cambridge, MA, USA

The MPEG-4 Structured Audio standard was created to enable low-bitrate yet high-quality transmission of synthetic audio soundtracks. However, structured-audio techniques are suitable for flexible natural coding of audio as well as audio synthesis. This article introduces the concept of generalized audio coding, in which the Structured Audio decoder is used to emulate the behavior of other audio decoders.

6-1 Warped low-delay CELP for wideband audio coding
Aki Härmä, Unto K. Laine, Matti Karjalainen, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Espoo, Finland

Low-delay audio coding is a fairly new trend in perceptual wideband audio coding. Low coding delay is important, e.g., in applications based on bidirectional real-time audio transmission. The technical aspects and psychoacoustics of such applications are reviewed and an audio codec with a coding delay of less than 2 ms is introduced in the paper.

6-2 Exploiting Excess Masking for Audio Compression
Ye Wang, Miikka Vilermo, Nokia Research Center, Speech and Audio Systems Lab., Tampere, Finland

In order to improve audio coding performance, excess masking has been employed for the compression of complex audio signals. A new algorithm is developed to classify and preprocess maskers. A psychoacoustic model is used to estimate the simultaneous masking threshold, which is then used for quantizing the audio signal coefficients in the frequency domain. Preliminary test results show improved coding efficiency.

6-3 Audio Coding based on Rate-Distortion and Perceptual Optimization Techniques
Markus Erne, Institut für Signal- und Informationsverarbeitung, ETH Zürich, Zürich, Switzerland

A new approach to audio coding based on optimization techniques is presented. In contrast to existing coding algorithms, this framework is based on a signal-adaptive filterbank which is controlled using optimization techniques for a rate-distortion or perceptual metric. Dynamic programming techniques make it possible to optimize the time-frequency tiling of the filterbank and to minimize the distortion for a given bit budget.

6-4 Sinusoids plus noise modelling for audio signals
A.W.J. Oomen, Philips Research Laboratories, Eindhoven, The Netherlands; A.C. den Brinker, Philips Research Laboratories, Eindhoven, The Netherlands

The paper considers the problem of describing quasi-stationary audio segments as a sum of sinusoids plus noise. After the extraction of the sinusoids, the noise is described in stochastic terms. An essential problem here is to determine how many sinusoidal components should be taken into account. It is shown how this problem can be handled by the psychoacoustic model.

6-5 Advanced Perceptual Digital Audio Coding Algorithm
Juriy A. Kovalgin, Dmitriy A. Hitrov, Maxim V. Zyrianov, State University of Telecommunications, Chair of Radio Broadcasting, St. Petersburg, Russia

A new, efficient coding scheme for high-quality audio signals with digital data compression is proposed. It is based on a signal-adaptive filter bank, a modified psychoacoustic model (MPEG-ISO/IEC-FCD14496-3, Subpart 1), and a different bit-allocation algorithm. Masking in the frequency and time domains, the binaural unmasking effect, and particular features of virtual sound source localization in binaural perception of stereo signals are considered.

6-6 High quality consistent analysis-synthesis in sinusoidal coding
Koen Vos, Department of Electrical Engineering, Delft University of Technology, Delft, NL; Renat Vafin, Department of Speech, Music and Hearing, Royal Institute of Technology, Stockholm, Sweden; Richard Heusdens, W. Bastiaan Kleijn

In this paper, we discuss several improvements in sinusoidal coding. First, we present an analysis method which estimates windowed sinusoids to represent a segment of an input speech or audio signal. The window used in the analysis is the same amplitude-complementary window as is used in overlap-add synthesis, which makes analysis consistent with synthesis. Second, we present techniques for optimization of sinusoidal parameters based on the squared difference between the input signal and reconstruction. It is shown how the overlapping nature of segments can be accounted for in the sinusoidal estimation. Efficient methods for computation are also discussed. Experimental results verify that our procedures provide a significant improvement in reconstruction accuracy.
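
The amplitude-complementary property mentioned above can be checked numerically: overlapping copies of the window, shifted by the hop size, must sum to one. The sketch below assumes a periodic Hann window with 50% overlap, which is one common window with this property; the abstract does not say which window the authors use, and the function names are hypothetical.

```python
import math


def hann_periodic(length):
    """Periodic Hann window; amplitude-complementary at a hop of length // 2."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / length) for k in range(length)]


def overlap_add_weights(window, hop, n_frames):
    """Sum shifted copies of the window, as overlap-add synthesis does with segments."""
    total = [0.0] * (hop * (n_frames - 1) + len(window))
    for frame in range(n_frames):
        for k, w in enumerate(window):
            total[frame * hop + k] += w
    return total


if __name__ == "__main__":
    window = hann_periodic(8)
    weights = overlap_add_weights(window, hop=4, n_frames=4)
    # Away from the first and last half-window, the weights should all be 1.
    interior = weights[4:16]
    print(all(abs(x - 1.0) < 1e-12 for x in interior))
```

Using the same window for analysis and synthesis means that a sinusoid estimated under that window is reproduced with the same weighting when the segments are added back together.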

6-7 Developments with a Zerotree Audio Codec
Ben Leslie, Chris Dunn, Mark Sandler, Department of Electronic Engineering, King's College London, Strand, London, UK

We present a detailed description of the EZK algorithm, a version of zerotree quantisation developed by the authors, and describe why it provides higher compression than the alternatives. We compare the performance of the algorithm using several alternative transforms, and describe the incorporation of psychoacoustic modelling. Finally, we compare the performance of the codec with some alternatives.

6-8 The Perceptual Audio Coding Concept: From Speech to High-Quality Audio Coding
Anibal Ferreira, FEUP University of Porto, Porto, Portugal

High-quality audio coding has its roots in the speech coding arena, whose research activity not only produced the first coding solutions and standards historically, but also drew attention to the importance of modeling perceptual phenomena. This paper tracks the evolution of the perceptual audio coding concept by identifying its importance in reference speech, wideband speech, and high-quality audio coders. Representative algorithms are reviewed, with emphasis on the current state of the art as well as on future developments and technological trends.

6-9 An Audio Codec for Multiple Generations Compression Without Loss of Perceptual Quality
Frank Kurth, Institut für Informatik V, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany

We describe a generic audio codec allowing for multiple, i.e. cascaded, lossy compression without loss of perceptual quality as compared to the first generation of compressed audio. To this end, we transfer encoding information to all subsequent codecs in a cascade. The new method is applicable to a wide range of current audio codecs, as documented by our MPEG-1 implementation.

6-10 Improving Lossless Audio Coding
Jürgen Koller, Thomas Sporer, Karlheinz Brandenburg, Fraunhofer-Institut für Integrierte Schaltungen, Erlangen, Germany

In recent years, low bit-rate coding of high quality digital audio signals has spread across a wide range of applications. Such lossy audio coding schemes should not be used in studio applications. Lossless audio coding makes it possible to preserve as much as possible of the precious audio material. At the 103rd AES Convention we presented first results on a new coding scheme, which combines the advantages of low bit rate audio coding and lossless compression.
In this paper we describe the improvements made to that coding scheme: a different quantization characteristic and improved strategies for quantization control and bit-stream representation lead to a distinct improvement in compression.

6-11 Higher Order Estimation of Sinusoidality with Applications for Quality Coding of Musical Instruments
Shlomo Dubnov, Department of Communication Systems Engineering, Faculty of Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel

Sinusoidal models capture time-varying spectral characteristics of a sound by representing harmonic components with time-varying sinusoids. An important characteristic of harmonic components is their quality of fit to a "sinusoidality" assumption. In the paper we consider Higher Order Statistical (HOS) methods for sinusoidality determination. HOS are sensitive to non-linear interactions between spectral components, such as phase and frequency coupling. We show that sinusoidality is closely related to frequency but NOT to phase coupling. We present a bispectral algorithm for sinusoidality detection and a simplified version of the algorithm that determines an upper sinusoidality cutoff frequency from kurtosis estimates of the residual (pre-whitened) signal.

7-1 Subjective Evaluation of Large and Small Impairments in Audio Codecs
Gilbert A. Soulodre, Michel Lavoie, Advanced Sound Systems Group, Communications Research Centre, Ottawa, Ont., Canada (Invited)

A significant amount of research has been conducted in the development of a test methodology for evaluating small impairments in audio codecs. Specifically, ITU-R Recommendation BS.1116 provides a full description of the accepted methodology which has been used extensively in the development and evaluation of high quality audio codecs. This methodology has proven effective at generating consistent results in subjective tests and can provide a high degree of resolution in discriminating between codecs. With the recent trend towards very low bitrate audio codecs and the corresponding lower quality, a need has arisen for a subjective test methodology which will allow the performance of these codecs to be evaluated and compared in a rigorous fashion. The method described in BS.1116 is not entirely appropriate for this purpose and so a new method has been developed for evaluating audio codecs with larger impairments. The new method strives to maintain those aspects of BS.1116 which have proven most effective, while extending it to address the particular difficulties encountered when evaluating large impairments. This paper describes the methodologies for evaluating small and large impairments, and includes results from formal subjective tests.

7-2 Perceptual Quality Assessment for Digital Audio: PEAQ – the Proposed ITU Standard for Objective Measurement of Perceived Audio Quality
Catherine Colomes, Centre Commun d'Etudes de Télédiffusion et Télécommunications, Rennes, France; Christian Schmidmer, OPTICOM, Erlangen, Germany; Thilo Thiede, Tepholm & Westermann ApS, Denmark; William C. Treurniet, Communications Research Centre, Ottawa, Canada (Invited)

Increasing attention is being paid to the issue of objective quality assessment, mainly in the perceptual coding and metrology domains. Perceptual coding of audio signals is increasingly used in transmission and storage of high quality digital audio, and there is an obvious demand for an acceptable objective method to measure the quality of such signals. A new measurement method is described that combines features from several earlier methods. This method meets the requirements of the user community, and it has become a recommendation within the ITU Radiocommunication Study Groups.

Updated: 03.08.99