- 1-1 Audio Coding in Digital
Broadcasting Systems
Martin Dietz, Fraunhofer-Institut für Integrierte
Schaltungen IIS-A, Erlangen, Germany (Invited)
- High quality audio coding is one of the key
technologies for digital broadcasting systems, since this technology allows sufficiently
low datarates for cost and spectrum efficient audio broadcasting in the
digital domain. The paper gives an overall view of existing and emerging
digital broadcasting systems, including Eureka DAB, WorldSpace, DVB, ATSC,
ARIB, IBOC and DRM.
- 1-2 Streaming-Audio@Internet:
Perspectives for the broadcasters!
Gerhard Stoll, IRT, Munich, Germany (Invited)
- The Internet's impact on broadcasters has changed from its initial
position as a site for posting broadcasting information to its emergence as a
valuable additional medium to reach new customers through streaming audio and
new interactive concepts using push and pull technologies. Increased
availability of bandwidth, faster modems, improved and scalable audio coding
schemes will create the opportunity for better definition of audio quality on
the Web. Will the Internet be able to broadcast in the near future not only
news but also music channels and sport events with a sufficient quality of
service, maybe even in surround sound?
- 1-3 DVD-Audio Specification
Hiroaki Suzuki, JVC Multimedia System Development Center, Yamato, Kanagawa, Japan (Invited)
- DVD-Audio Specifications Ver. 1.0 will have been published in early
spring of 1999. DVD-Audio is the product of
discussions between WG-4 of the DVD-Forum, mainly representing hardware
manufacturers, and ISC (International Steering Committee), representing the
three recording industry associations, IFPI, RIAA, and RIAJ.
Besides providing features related to high-quality pure audio and
multi-channel music, DVD-Audio also offers various multimedia and value-added
capabilities. The paper describes the major features which will surely make
DVD-Audio successful as a new audio format.
- 1-4 Dolby Digital: Audio Coding
for Digital Television and Storage Applications
Steve Vernon, Dolby Laboratories Inc., San Francisco,
California, USA
- Dolby Digital is a worldwide audio coding
standard, with applications that include digital television and home theater.
Its underlying perceptual coding engine provides high quality multichannel audio
at low bitrates, without requiring excessive computational complexity. Dolby
Digital also supports a number of system features that improve the overall
listening experience, such as volume normalization and dynamics processing.
This paper presents an overview of Dolby Digital, focusing on the design
elements that differentiate it from competing systems and that have contributed
to its success.
- 2-1 The MLP Lossless Compression System
M.A. Gerzon (deceased), P.G. Craven,
J.R. Stuart, M.J. Law and R.J. Wilson, Meridian Audio, Cambridge, UK (Invited)
- Lossless compression provides bit-exact delivery of the original
signal and is ideal where the highest possible confidence in the final sound
quality is required. MLP (Meridian Lossless Packing) has recently been
adopted as the lossless coding method used on DVD-Audio.
MLP uses four principal strategies to reduce both the total quantity and
the peak rate of encoded data. MLP can invert a matrix transformation
losslessly: this allows a 2-channel representation to be transmitted
alongside a multichannel signal, with minimal increase in data rate.
The paper illustrates how the characteristics of the incoming audio affect
the coding performance, and indicates MLP's versatility, achieved by use
of substreams and an open-ended metadata specification.
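The idea of a losslessly invertible matrix can be illustrated with a single integer "lifting" step. This is only a minimal sketch and not the MLP algorithm itself: the function names, the coefficient (`num / 2**shift`) and the sample values are hypothetical. One channel is modified by adding a rounded multiple of another; since the decoder can recompute exactly the same rounded term from the untouched channel, subtracting it restores the original samples bit for bit.

```python
def matrix_encode(left, right, num=3, shift=2):
    """Replace `right` by right + floor(num * left / 2**shift), per sample.

    `left` is passed through unchanged, so the decoder can rebuild the
    same rounded term. Coefficient values here are illustrative only.
    """
    mixed = [r + ((num * l) >> shift) for l, r in zip(left, right)]
    return left, mixed


def matrix_decode(left, mixed, num=3, shift=2):
    """Undo matrix_encode exactly by subtracting the same rounded term."""
    restored = [m - ((num * l) >> shift) for l, m in zip(left, mixed)]
    return left, restored


# Hypothetical 16-bit integer samples, including both extremes.
left = [1000, -37, 5, -32768]
right = [-999, 12, 0, 32767]

encoded = matrix_encode(left, right)
decoded = matrix_decode(*encoded)
print(decoded == (left, right))
```

Because the rounding happens identically on both sides, the step is exactly invertible even though the coefficient is fractional; several such steps can be chained to build a larger matrix.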
- 2-2 DTS Multi-Channel 96 kHz Audio Compression
Yu-Li You, Zoran Fejzo, Stephen Smyth, Paul Smith, Marina Bosi and Ming Yan, Digital Theater Systems, Inc., Agoura Hills, CA, USA (Invited)
- This paper presents DTS multi-channel audio compression technology that
operates at a sampling frequency of up to 96 kHz and remains compatible with
DTS first generation technology which delivers 5.1 channel audio at a sampling
frequency of up to 48 kHz. High-sampling frequency (up to 96 kHz)
multi-channel audio is decomposed into core audio (up to 48 kHz) and a residual.
The core audio is encoded using DTS first generation technology and the
encoded core bit stream is fully compatible with first generation DTS
decoders in the market. The residual is encoded using technologies that
extend the sampling frequency to up to 96 kHz and can even improve the quality
of the core audio. The compressed residual is attached as an extension to
the core bit stream. The extension bit stream is ignored by the first
generation DTS decoders but can be decoded by the second generation DTS
decoders. The encoding algorithm can deliver 96 kHz multi-channel audio
at 24 bits per sample with a rate of up to 4.5 Mbps. Details of the main
processing blocks of the core and extension decoding algorithms, their
computational load and memory requirements are presented. One implementation
is described of a 5.1 channel, 96 kHz, 24-bit decoding process operating on dual
SHARC 21065L 32-bit floating-point processors. This particular configuration
confines all the essential high sampling rate operations to the second
processor, allowing a simple hardware upgrade path to be considered for the
96/24 'high definition' audio format.
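To put the quoted 4.5 Mbps figure in context, the short calculation below compares it with the uncompressed PCM rate of the same signal. It is a rough sketch under stated assumptions: the 5.1 layout is counted as six full-rate channels (the LFE channel is treated like the others), and 1 Mbps is taken as 10^6 bit/s. Neither assumption comes from the abstract.

```python
# Assumed signal format: 5.1 counted as six full-rate channels.
channels = 6
sample_rate_hz = 96_000
bits_per_sample = 24

# Uncompressed PCM rate in bit/s.
pcm_rate_bps = channels * sample_rate_hz * bits_per_sample

# Maximum coded rate quoted in the abstract, taking 1 Mbps = 10**6 bit/s.
coded_rate_bps = 4_500_000

compression_ratio = pcm_rate_bps / coded_rate_bps

print(f"PCM rate: {pcm_rate_bps / 1e6:.3f} Mbps")
print(f"Compression ratio at 4.5 Mbps: {compression_ratio:.2f}:1")
```

Under these assumptions the PCM rate is 13.824 Mbps, so the maximum coded rate corresponds to a compression ratio of roughly 3:1.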
- 2-3 The Design of a Video
Friendly Audio Coding System for Distribution Applications
Louis D. Fielder, Craig C. Todd, Dolby Laboratories, Inc.,
San Francisco, CA, USA (Invited)
- Dolby E is an audio bit rate reduction system for audio-visual
distribution applications. The coding system was developed for situations
involving tandem coding operations, editing and switching at video frame
boundaries, and transport on AES3 serial-digital data paths. Special TDAC
transforms are organized in a frame structure that allows for adaptive transform
lengths and frame synchrony with video framing.
- 3-1 MP3 and AAC explained
Karlheinz Brandenburg, Fraunhofer-Institut für Integrierte Schaltungen IIS-A, Erlangen, Germany (Invited)
- Recent years have seen the widespread proliferation of .mp3 files,
both from legal and illegal sources. Yet most people using these audio
files do not know much about audio compression and how to use it. The
paper gives an introduction to audio compression for music file
exchange. Beyond the basics the focus is on quality issues and the
compression ratio / audio bandwidth / artifacts tradeoffs.
- 3-2 MPEG-4 Intellectual Property Management & Protection (IPMP) Overview & Applications
Jack Lacy, AT&T Laboratories, Florham Park, NJ, USA;
Niels Rump, Fraunhofer-Institut für Integrierte Schaltungen IIS-A, Erlangen, Germany;
Talal Shamoon, InterTrust Technologies Corporation, Sunnyvale, CA, USA;
Panos Kudumakis, CRL, London, England (Invited)
- This presentation acquaints you with the MPEG-4 Intellectual Property
Management & Protection (IPMP) Framework, a generic approach to the
specification of a framework for the management and protection of
intellectual property rights.
Besides specifying a flexible way to identify content, MPEG-4 IPMP
offers a "hooks" mechanism that allows for the inclusion of information
in the MPEG-4 bit stream that is used to indicate that a particular
non-normative IPMP system is to be used to manage access to that stream.
This enables the design of applications that provide efficient
management and protection of both content-related and patent-related IP.
- 3-3 AESSC SC-06-04 Activities on Digital Music Distribution
Phil Wiser, Liquid Audio, Redwood City, CA, USA;
Niels Rump, Fraunhofer-Institut für Integrierte Schaltungen IIS-A, Erlangen, Germany (Invited)
- Since forming in July 1997, the SC-06-04 standard committee has
addressed many online audio issues ranging from audio quality to security
aspects of digital music. This group initially focused on mechanisms for
calibrating often abused audio quality descriptors such as "CD quality".
The group has also hosted discussions with international rights organizations
to address the needs of that community. Finally, the SC-06-04 has provided a
forum for discussion of ideas that are critical issues for the newly formed
SDMI process.
- 3-4 SDMI: Creating a Marketplace for Secure Digital Music
Cary Sherman, Recording Industry Association of America,
Washington D.C., USA (Invited)
- A progress report on the Secure Digital Music Initiative. SDMI is a
forum for the worldwide recording industry and technology companies to
develop an open, interoperable architecture and specification for digital
music security. The goal is to meet consumer demand for convenient access to
digital music, protect artists' rights online, and allow technology and music
companies to build successful businesses.
- 4-1 MPEG-4 speech coding
Masayuki Nishiguchi, Audio and Speech Group, Sony Corporation, Tokyo, Japan (Invited)
- The MPEG-4 Audio standard (ISO/IEC 14496-3) supports highly efficient
representation of speech signals at 2.0-24 kbps using two basic
algorithms: Harmonic Vector eXcitation Coding (HVXC) and Code
Excited Linear Prediction (CELP). This paper gives an overview
of the MPEG-4 speech coding algorithms and new functionalities such
as speed/pitch change and bit-rate scalability, which, together
with high coding efficiency, distinguish MPEG-4 from other standards.
- 4-2 MPEG-4 Structured Audio Authoring Considerations
Lee Ray, Create Audio Technology Center, Scotts Valley, CA,
USA (Invited)
- MPEG-4 Structured Audio (SA) provides sets of materials which can be
manipulated either as the author has specified in creation, as the listener
specifies during the listening session or with a combination of both kinds
of control. Unlike a system that is a "straight wire with (coding) gain",
SA can transmit new kinds of interactive content. Various strategies for
authoring interactive content with SA are reviewed.
- 4-3 The MPEG-4 General Audio Coder
Bernhard Grill, Fraunhofer-Institut für Integrierte
Schaltungen IIS-A, Erlangen, Germany (Invited)
- The paper presents the MPEG-4 general audio (GA) coder which - unlike
the MPEG-4 speech or synthetic coding schemes - is designed to work with
any type of audio material. The GA coder includes all features of MPEG-2
AAC, plus extensions for improved coding quality at very low bitrates,
including the TwinVQ scheme. In addition, embedded coding techniques,
where one program is represented by a hierarchical set of coding layers,
further extend the capabilities.
- 4-4 An Overview of MPEG-4 Audio Version 2
Heiko Purnhagen, Laboratorium für Informationstechnologie,
University of Hannover, Germany (Invited)
- The MPEG-4 Audio Standard provides audio and speech coding for
natural and synthetic content at bitrates ranging from 2 to 64 kbit/s and
above. While the first version of MPEG-4 Audio was finalized in 1998, work
continues for Version 2, complementing MPEG-4 Audio by the following new
tools: Error Resilience, Low-Delay Audio Coding, Small Step Scalability,
Parametric Audio Coding, and Environmental Spatialisation.
- 5-1 Application of a
physiological ear model to irrelevance reduction in audio coding
Frank Baumgarte, Institut für Theoretische
Nachrichtentechnik und Informationsverarbeitung, Universität Hannover, Germany
- A previously published physiological ear model is applied as the
perceptual model to an audio coder complying with the ISO/MPEG-2 AAC standard.
The achieved subjective sound quality is compared to results from an optimized
psychoacoustical model. Significant deviations of the generated masked
thresholds from the physiological ear model and the psychoacoustical model are
evaluated with respect to psychoacoustical measurements.
- 5-2 Audio coding with
auditory time-frequency noise shaping and irrelevancy reducing vector quantization
Markus Vaalgamaa, Aki Härmä, Unto Laine, Helsinki
University of Technology, Laboratory of Acoustics and Audio Signal Processing, Espoo,
Finland
- A perceptual audio codec using Warped Linear Prediction, Temporal
Noise Shaping and Vector Quantization is presented. These techniques are used
to shape the quantization noise according to spectral and temporal masking
characteristics of the ear. The quantization process is controlled by an auditory
model. In addition, simulations and subjective listening test results of the
codec are presented.
- 5-3 Generalized Audio Coding with MPEG-4 Structured Audio
Eric D. Scheirer, Youngmoo E. Kim, Machine Listening Group,
MIT Media Laboratory, Cambridge, MA, USA
- The MPEG-4 Structured Audio standard was created to enable
low-bitrate yet high-quality transmission of synthetic audio soundtracks.
However, structured-audio techniques are suitable for flexible natural coding
of audio as well as audio synthesis. This article introduces the concept of
generalized audio coding, in which the Structured Audio decoder is used to
emulate the behavior of other audio decoders.
- 6-1 Warped low-delay CELP for wideband audio coding
Aki Härmä, Unto K. Laine, Matti Karjalainen,
Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing,
Espoo, Finland
- Low-delay audio coding is a somewhat new trend in perceptual wideband
audio coding. Low coding delay is important, e.g., in applications based on
bidirectional real-time audio transmission. The technical aspects and
psychoacoustics of such applications are reviewed and an audio codec with a
coding delay of less than 2 ms is introduced in the paper.
- 6-2 Exploiting Excess Masking for Audio Compression
Ye Wang, Miikka Vilermo, Nokia Research Center, Speech and
Audio Systems Lab., Tampere, Finland
- In order to improve audio coding performance, excess masking has been
employed for the compression of complex audio signals. A new algorithm is
developed to classify and preprocess maskers. A psychoacoustic model is used
to estimate the simultaneous masking threshold. This masking threshold is used for
quantizing audio signal coefficients in the frequency domain. Preliminary test
results show improved coding efficiency.
- 6-3 Audio Coding based on
Rate-Distortion and Perceptual Optimization Techniques
Markus Erne, Institut für Signal- und Informationsverarbeitung, ETH Zürich, Zürich, Switzerland
- A new approach to audio coding based on optimization techniques will
be presented. In contrast to existing coding algorithms, this framework is
based on a signal-adaptive filterbank which is controlled using optimization
techniques for a rate-distortion or perceptual metric. Dynamic programming
techniques make it possible to optimize the time-frequency tiling of the
filterbank and to minimize the distortion for a given bit budget.
- 6-4 Sinusoids plus noise modelling for audio signals
A.W.J. Oomen, Philips Research Laboratories, Eindhoven,
The Netherlands; A.C. den Brinker, Philips Research
Laboratories, Eindhoven, NL
- Considered is the problem of describing quasi-stationary audio
segments as a sum of sinusoids plus noise. After the extraction of the
sinusoids, the noise is described in stochastic terms. An essential problem
here is to determine how many sinusoidal components should be taken into
account. It is shown how this problem can be handled by the psychoacoustic
model.
- 6-5 Advanced Perceptual Digital Audio Coding Algorithm
Juriy A. Kovalgin, Dmitriy A. Hitrov, Maxim V. Zyrianov, State University of Telecommunications, Chair of Radio Broadcasting, St. Petersburg, Russia
- A new efficient coding scheme for high-quality audio signals with
digital data compression is proposed. It is based on a signal-adaptive filter
bank, a modified psychoacoustic model (MPEG-ISO/IEC-FCD14496-3, Subpart 1), and
an alternative bit-allocation algorithm. Masking in the frequency and time domains,
the binaural unmasking effect and particular features of virtual sound source
localization in binaural perception of stereo signals are considered.
- 6-6 High quality consistent
analysis-synthesis in sinusoidal coding
Koen Vos, Department of Electrical Engineering, Delft
University of Technology, Delft, NL; Renat Vafin,
Department of Speech, Music and Hearing, Royal Institute of Technology, Stockholm,
Sweden; Richard Heusdens, W. Bastiaan Kleijn
- In this paper, we discuss several improvements in sinusoidal coding. First, we
present an analysis method which estimates windowed sinusoids to represent a
segment of an input speech or audio signal. The window used in the analysis is
the same amplitude-complementary window as is used in overlap-add synthesis,
which makes analysis consistent with synthesis. Second, we present techniques
for optimization of sinusoidal parameters based on the squared difference
between the input signal and reconstruction. It is shown how the overlapping
nature of segments can be accounted for in the sinusoidal estimation. Efficient
methods for computation are also discussed. Experimental results verify that our
procedures provide a significant improvement in reconstruction accuracy.
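The amplitude-complementary property of an overlap-add window can be checked numerically. The sketch below is only an illustration: the abstract does not say which window the authors use, so a periodic Hann window with 50% overlap is assumed here, and the function names are hypothetical. With this choice, the shifted copies of the window sum to one wherever two frames overlap.

```python
import math


def hann_periodic(length):
    """Periodic Hann window: w[n] = 0.5 - 0.5 * cos(2*pi*n / length)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length)
            for n in range(length)]


def overlap_sum(window, hop, num_frames):
    """Sum num_frames copies of `window`, each shifted by `hop` samples."""
    total = [0.0] * (hop * (num_frames - 1) + len(window))
    for frame in range(num_frames):
        for n, w in enumerate(window):
            total[frame * hop + n] += w
    return total


window = hann_periodic(64)
total = overlap_sum(window, hop=32, num_frames=5)

# Skip the first and last half-window, where only one frame contributes.
interior = total[32:-32]
print(max(abs(v - 1.0) for v in interior))
```

Using the same window for analysis and synthesis, as the abstract describes, means the analysis stage already accounts for the weighting each segment receives when the segments are added back together.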
- 6-7 Developments with a Zerotree Audio Codec
Ben Leslie, Chris Dunn, Mark Sandler, Department of Electronic
Engineering, King's College London, Strand, London, UK
- We present a detailed description of the EZK algorithm, a version of
zerotree quantisation developed by the authors, and describe why it provides
higher compression than the alternatives. We compare the performance of the
algorithm using several alternative transforms, and describe the incorporation
of psychoacoustic modelling. Finally, we compare the performance of the codec
with some alternatives.
- 6-8 The Perceptual Audio
Coding Concept: From Speech to High-Quality Audio Coding
Anibal Ferreira, FEUP University of Porto, Porto, Portugal
- High-quality audio coding finds its roots in the speech coding arena
whose research activity not only historically produced the first coding
solutions and standards, but also raised the attention to the importance of
modeling perceptual phenomena. This paper tracks the evolution of the
perceptual audio coding concept by identifying its importance on reference
speech, wideband speech, and high-quality audio coders. Representative
algorithms are reviewed and an emphasis is placed on the current
state-of-the-art as well as on future developments and technological trends.
- 6-9 An Audio Codec for Multiple
Generations Compression Without Loss of Perceptual Quality
Frank Kurth, Institut für Informatik V, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- We describe a generic audio codec allowing for multiple, i.e.
cascaded, lossy compression without loss of perceptual quality as compared to
the first generation of compressed audio. To this end we transfer encoding
information to all subsequent codecs in a cascade. The new method is
applicable to a wide range of current audio codecs as documented by our MPEG-1
implementation.
- 6-10 Improving Lossless Audio Coding
Jürgen Koller, Thomas Sporer, Karlheinz Brandenburg,
Fraunhofer-Institut für Integrierte Schaltungen, Erlangen, Germany
- In recent years, low bit-rate coding of high quality digital
audio signals has spread over a large range of applications. Such lossy audio
coding schemes should not be used in studio applications. Lossless audio coding
gives the opportunity to preserve as much as possible of the precious
audio material. At the 103rd AES convention we presented first results on
a new coding scheme, which combines the advantages of low bit rate audio
coding and lossless compression.
In this paper we describe the improvements made to that coding scheme: A
different quantization characteristic and improved strategies for
quantization control and bit-stream representation lead to a distinct
improvement in compression.
- 6-11 Higher Order Estimation
of Sinusoidality with Applications for Quality Coding of Musical Instruments
Shlomo Dubnov, Department of Communication Systems
Engineering, Faculty of Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Sinusoidal models capture time-varying
spectral characteristics of a sound by representing harmonic components with
time-varying sinusoids. An important characteristic of harmonic components is
their quality of fit to a "sinusoidality" assumption. In the paper
we consider Higher Order Statistical (HOS) methods for sinusoidality determination.
HOS are sensitive to non-linear interactions between spectral components, such
as phase and frequency coupling. We show that sinusoidality is closely related
to frequency but NOT to phase coupling. We present a bispectral algorithm for
sinusoidality detection and a simplified version of the algorithm that determines
an upper sinusoidality cutoff frequency from kurtosis estimates of the residual
(pre-whitened) signal.
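Why kurtosis is a useful cue for sinusoidality can be seen from a simple comparison. The sketch below is not the algorithm of the paper: it only computes the (non-excess) kurtosis of two hypothetical test signals, a pure sinusoid and Gaussian noise, with a helper function whose name is an assumption. A sinusoid sampled over whole periods has a kurtosis of 1.5, while Gaussian noise has a kurtosis close to 3.

```python
import math
import random


def kurtosis(samples):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((v - mean) ** 2 for v in samples) / n
    m4 = sum((v - mean) ** 4 for v in samples) / n
    return m4 / (m2 ** 2)


# Pure sinusoid: exactly 5 periods in 1000 samples.
sine = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]

# Gaussian noise from a seeded generator, so the run is repeatable.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]

print(f"sinusoid kurtosis: {kurtosis(sine):.3f}")
print(f"noise kurtosis:    {kurtosis(noise):.3f}")
```

A residual band whose kurtosis stays well below the Gaussian value therefore still looks sinusoidal, which suggests how kurtosis estimates across frequency could indicate a cutoff of the kind the abstract mentions.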
- 7-1 Subjective Evaluation of Large and Small Impairments in Audio Codecs
Gilbert A. Soulodre, Michel Lavoie, Advanced Sound Systems
Group, Communications Research Centre, Ottawa, Ont., Canada (Invited)
- A significant amount of research has been conducted in the
development of a test methodology for evaluating small impairments in audio
codecs. Specifically, ITU-R Recommendation BS.1116 provides a full description
of the accepted methodology which has been used extensively in the development
and evaluation of high quality audio codecs. This methodology has proven
effective at generating consistent results in subjective tests and can
provide a high degree of resolution in discriminating between codecs. With
the recent trend towards very low bitrate audio codecs and the corresponding
lower quality, a need has arisen for a subjective test methodology which will
allow the performance of these codecs to be evaluated and compared in a
rigorous fashion. The method described in BS.1116 is not entirely appropriate
for this purpose and so a new method has been developed for evaluating audio
codecs with larger impairments. The new method strives to maintain those
aspects of BS.1116 which have proven most effective, while extending it to
address the particular difficulties encountered when evaluating large
impairments. This paper describes the methodologies for evaluating small
and large impairments, and includes results from formal subjective tests.
- 7-2 Perceptual Quality Assessment for Digital Audio:
PEAQ the Proposed ITU Standard for Objective Measurement of Perceived Audio Quality
Catherine Colomes, Centre Commun d'Etudes de Télédiffusion
et Télécommunications, Rennes, France; Christian Schmidmer, OPTICOM,
Erlangen, Germany; Thilo Thiede, Tepholm & Westermann ApS, Denmark; William
C. Treurniet, Communications Research Centre, Ottawa, Canada (Invited)
- Increasing attention is being paid to the issue of objective
quality assessment, mainly in the perceptual coding and the
metrology domains. Perceptual coding of audio signals is increasingly
used in transmission and storage of high quality digital audio, and
there is an obvious demand for an acceptable objective method to
measure the quality of such signals. A new measurement method is
described that combines features from several earlier methods. This
method meets the requirements of the user community, and it has
become a recommendation within the ITU Radiocommunication Study
Groups.