Audio Engineering Society Preprints

AES 111th Convention

New York
November 30-December 3, 2001

AES Preprint Ordering

Single Convention Preprints are available through the AES Preprint Search and Shop facility.

Preprints Listing

5409
Gottfried K. Behler, Michael Makarski
Horn-drivers and horns are generally measured and characterized only as combinations. This is a restriction compared with the possibility of arbitrarily combining the two parts at their standardized connection. With the method presented here, an individual measurement of each part is possible, and the overall transfer characteristics of the combined system are calculated by a computational tool.
Two-Port Representation of the Junction Between Horn-driver and Horn

5410
Mark Dodd
Magneto-static Finite Element Methods have been applied to designing loudspeaker motors for some time. Advances in both hardware and software present the opportunity of extending our knowledge about the detailed electro-magnetic behaviour of loudspeaker motors. In particular by coupling a lumped element kinematic model to a transient-magnetic finite element model the motion of a loudspeaker voice coil may be calculated for an arbitrary electrical input. This motion includes components due to transduction non-linearities such as flux-modulation, eddy currents, and force factor profile. A great advantage of this technique is that the results show the distribution of eddy currents throughout the structure allowing their display and analysis. Furthermore, by calculating the motion of a loudspeaker voice coil for a sinusoidal input, the harmonic distortion of the transduction mechanism may be calculated at any chosen level and frequency. Preliminary results from a new motor structure are presented to illustrate some of the benefits and current limitations of this technique.
The Transient Magnetic Behaviour of Loudspeaker Motors

5411
Andrew Bright
Non-linear distortion compensation by DSP has been actively researched over the last decade. This paper analyses the potential benefits such distortion compensation systems may provide in complete electroacoustic systems. Specifically, this paper compares the increase in amplifier output required by such systems with the increase in sensitivity provided by shorter voice coils. It is shown that a net increase in the sensitivity is provided if a loudspeaker's voice coil height is equal to its magnet gap height, and used over a range of peak-to-peak excursions equal to three times this height.
Compensating Non-linear Distortion in an 'Equal-Hung' Voice Coil

5412
Malcolm O.J. Hawksford, Neil Harris
The degree to which radiation from a loudspeaker is diffuse may be quantified by a spatial correlation function normalised to the on-axis response. This is true for any loudspeaker type, including the distributed-mode loudspeaker (DML). However, because of the variation in material damping and design-related constraints, correlation commonly varies both with frequency and direction. A modified function, the offset spatial bandwidth of correlation function, is introduced as a means of describing diffuse performance and quantifying its variation over the radiation field.
Spatial Bandwidth of Diffuse Radiation in Distributed-Mode Loudspeakers

5413
James J. McTigue
A commercially available acoustic simulation software with integral normalized Head Related Transfer Function (HRTF) was utilized to create virtual sound spaces, reverberation, and auralization. Four men and four women with self-reported normal hearing were tested individually in a room specifically designed for audiological research. Stimuli were presented via headphones and subjects responded via a custom Graphical User Interface (GUI). Azimuth had a significant effect on all of the dependent variables. Gender differed significantly with respect to distance judgement. Room type significantly affected elevation judgement. In addition, there were interactions between gender and azimuth (azimuth judgement), room and reverberation type (distance judgement), and reverberation type and azimuth (distance judgement).
Impact of Artificial Reverberation on Perceived Sound Localization During a Headphone Listening Task

5414
Chulmin Choi, Lai-Hoon Kim, Sejin Doo, Yangki Oh, Koeng-Mo Sung
Measuring and reproducing spatial impressions of a sound field has been an important issue in auditorium acoustics for the study of existing auditory spaces. The directional impulse response, which contains spatial information, can be measured and auralized. The typical method for synthesizing a binaural impulse response from the measured data treats each reflection as an ideal impulse with the measured time delay and level. This creates a difference between the auralized situation and the real situation, because the shape of each reflection carries information about sound color. We propose an improved auralization algorithm, which uses real reflection samples from the measured impulse response. In this paper, we measured the early reflection profile of a hall using a 5-microphone system and synthesized the binaural impulse response using the proposed algorithm.
Research on the Improvement of Naturalness in Auralization

5415
Matti Karjalainen, Hanna Jarvelainen
The perceptual aspects of reverberation are less well known than the acoustic principle itself and its DSP-based simulation in artificial reverberators. In this paper, a series of psychoacoustic experiments are reported, along with their interpretation using auditory modeling, in order to reveal the underlying principles of late reverberation perception. Motivated by the results, a simple technique for reverb design is proposed.
More About This Reverberation Science: Perceptually Good Late Reverberation

5416
Chris Kyriakakis, Athanasios Mouchtaris
Multichannel audio offers significant advantages for music reproduction that include the ability to provide better localization and envelopment, as well as reduced imaging distortion. Consumer media such as DVD-Audio and SACD allow the delivery of multichannel program material today. However, although there are thousands of music recordings available in mono or two-channel stereo, only a handful have been recorded using microphone techniques that would allow subsequent multichannel rendering. In this paper we propose a new method that is capable of synthesizing the required microphone signals from a smaller set of signals recorded in a particular venue. These synthesized "virtual" microphone signals can be used to produce multichannel recordings that accurately capture the acoustics of the particular venue. Applications of the proposed system include remastering of existing monophonic and stereophonic recordings for multichannel rendering, as well as transmission of multichannel audio over the current internet infrastructure.
Time-Frequency Methods for Virtual Microphone Signal Synthesis

5417
Jason Corey, Wieslaw Woszczyk, Geoff Martin, Rene Quesnel
A system for control and synthesis of auditory perspective in a multichannel soundfield is described in this paper. The system employs a soundfield synthesis engine comprised of several acoustic simulation devices working in parallel that are all controlled by one intuitive, programmable controller. The controller allows smooth, efficient, and dynamic modification of the spatial attributes of a multichannel soundfield.
An Integrated Multidimensional Controller of Auditory Perspective in a Multichannel Sound Field

5418
Wolfgang Klippel
A new method is presented for the numerical simulation of the large signal performance of drivers and loudspeaker systems. The basis is an extended loudspeaker model considering the dominant nonlinear and thermal effects. The use of a two-tone excitation allows the response of fundamental, DC-component, harmonics, and intermodulation components to be measured as a function of frequency and amplitude. After measurement of the linear and nonlinear parameters, the electrical, mechanical, and acoustical state variables may be calculated by numerical integration. The relationship between large signal parameters and non-linear transfer behavior is discussed by modeling two drivers. The good agreement between simulated and measured responses shows the basic modeling, parameter identification, and numerical predictions are valid even at large amplitudes. The method presented reduces time-consuming measurements and provides essential information for quality assessment and diagnosis. The extended loudspeaker model also allows prediction of the effect of design changes on the large signal performance by changing the model parameters to reflect the driver design changes. The incorporation of nonlinear parameters into the loudspeaker model allows for optimization in both the small and large signal domains by model prediction.
Prediction of Speaker Performance at High Amplitudes

5419
Ryan J. Mihelich
A new method for the estimation of the nonlinear modeling parameters of an electrodynamic loudspeaker is presented. Measurements of time-domain voice coil displacement are compared with the predicted displacement from a modeled loudspeaker. An optimizer adjusts the coefficients of the functions describing the nonlinear parameters. Loudspeaker nonlinear parameters are obtained through minimization of error between the measurements and the modeled response. The resulting nonlinear model yields good agreement with measured data over a broad frequency and amplitude range.
Loudspeaker Nonlinear Parameter Estimation: An Optimization Method

5420
Martin Rausch, Manfred Kaltenbacher, Hermann Landes, Reinhard Lerch, Gerhard Krump, Leonhard Kreitmeier
In this paper the applicability of an efficient numerical calculation scheme in the computer-aided design of electrodynamic loudspeakers is demonstrated. This modeling scheme is based on a finite element method (FEM) and allows the precise calculation of the electromagnetic, mechanical and acoustic fields including their couplings. Furthermore, nonlinear effects in the mechanical behavior of the spider as well as magnetic nonlinearities due to the nonhomogeneity of the magnetic field are taken into account.
Computer-Aided Design of Electrodynamic Loudspeakers by Using a Finite Element Method

5421
J.R. Wright
This paper describes a method of increasing the acoustic compliance of a loudspeaker cabinet, by introducing activated carbon into the enclosure. The process is explained and working examples discussed.
The Virtual Loudspeaker Cabinet

5422
Jong-Soong Lim, Chris Kyriakakis
One of the key limitations in spatial audio rendering over loudspeakers is the degradation that occurs as the listener's head moves away from the intended sweet spot. In this paper, we propose a method for designing immersive audio rendering filters using adaptive synthesis methods that can update the filter coefficients in real time. These methods can be combined with a head tracking system to compensate for changes in the listener's head position. The rendering filter's weight vectors are synthesized in the frequency domain using magnitude and phase interpolation in frequency sub-bands.
Adaptive Synthesis of Immersive Audio Rendering Filters

5423
Nick Zacharov, Kalle Koivuniemi
This paper presents the external preference mapping of the perception of spatial sound reproduction systems. Thirteen spatial sound samples and eight reproduction systems were subjectively tested in terms of preference and direct attribute ratings. The unravelling of this data to establish what perceptual attributes contribute to subjective preference is performed using multivariate calibration techniques, the results of which are presented and analyzed in detail. A predictive model of subjective preference is developed and presented for this class of spatial sound reproduction systems.
Unravelling the Perception of Spatial Sound Reproduction: Analysis & External Preference Mapping

5424
Nick Zacharov, Kalle Koivuniemi
This paper presents the methods used in developing a descriptive language for a set of samples created for evaluating different spatial sound reproduction systems. The different methods of language development are discussed, and the language development process employed is explained in detail. The developed descriptive language is presented with the associated direct attribute scales. Lastly, the development of training samples is presented.
Unravelling the Perception of Spatial Sound Reproduction: Language Development, Verbal Protocol Analysis and Listener Training

5425
Amber Naqvi, Francis Rumsey
This paper presents the results of computer simulation of active reflectors in a reference listening room, which are used to create artificial reflections in a two-speaker stereo listening configuration. This forms the second phase of experiments in the active listening room project, involving the analysis of computer modeling results and loudspeaker selection based on free field response. The aim of this project is to create a truly variable listening condition in a reference listening room by means of active simulation of key acoustic parameters such as the early reflection pattern, early decay time and reverberation time.
The Active Listening Room Simulator: Part 2

5426
Ralph Glasgal
Ambiophonics is the logical successor to stereophonics, 5.1, 6.0, 7.1, 10.2, or Ambisonics in the periphonic recording and reproduction of frontally staged music or drama. We show how only two recording media channels, driving a multi-speaker surround Ambiophonic system, can consistently and optimally generate a "you are there" sound field that the domestic concert hall listener can sense has normal binaural physiological verisimilitude. Ambiophonics can deliver such realism even from standard two media channel recordings such as the existing library of LPs, CDs, DVDs, or SACDs or via super-wide-stage recordings made using an Ambiophone.
Ambiophonics. Achieving Physiological Realism in Music Recording and Reproduction

5427
Vladimir Z. Mesarovic, Raghunath Rao, Miroslav V. Dokic, Sanjay Joshi
In recent years we have witnessed the explosion of digital multimedia technologies and content, in both the video and audio arenas. The development of digital audio technologies was even more dramatic and versatile than any other form of multimedia. The proliferation of the new audio technologies had a profound impact on content providers, chip manufacturers, and product manufacturers, as well as end consumers. Besides affecting component integration, this audio format proliferation also increased the complexity of multimedia systems that support multiple audio formats. One of the major technical challenges in these systems is how to enable these new features in existing systems with minimal cost and time-to-market penalties. Typically, the solution is to add a separate external audio decoder, but then the problem is how to achieve a glueless interface between the existing system and the new decoder. This paper addresses the problem of interfacing external audio decoders with the rest of the multimedia system at the software protocol level.
A/V Synchronization Using Modified Packetized Elementary Stream (PES) Over Synchronous Audio Interface

5428
Johann Gaboriau, Xiaofan Fei, Eric Walburger
A method used to remove the distortion in a digital PWM amplifier is introduced. This method is based on correction factors added to each integrator of a multi-bit delta-sigma modulator loop. This results in a tremendous improvement in the distortion performance of the system. A dynamic range of 100 dB is obtained, with all harmonics suppressed below 102 dB. It is particularly useful for digital audio amplifiers.
High Performance PWM Power Audio Amplifier

5429
Philip Nye
Over the last five years the trade organisation Entertainment Technology and Services Association has been working on ACN – a modern network protocol suite optimised for control of large numbers of diverse pieces of equipment in live performance and other challenging environments where speed and ease of configuration and rapid and reliable response are essential. Input has come from major players right across the entertainment technology industries, with contribution of substantial resources from several major companies in the field.
ACN - A Protocol Suite for Entertainment Technology Networking

5430
Hiroyuki Hashimoto, Kenichi Terai, Isao Kakuhari, Yoshio Nakamura, Hisashi Sano
We have developed an active control system for low frequency road noise in automobiles, combined with an audio system. It is the first commercial application of its kind in the world. The system adopts feedback control in the front seats and feedforward control in the rear seat; in addition, a music compensation circuit is applied. As a result, it reduces only the noise, by about 10 dB in the front seats, without canceling out the music, making it more comfortable for passengers to listen to music.
Active Control System for Low Frequency Road Noise Combined with an Audio System

5431
Peter Mapp
A survey carried out by the authors on a range of commercial jet aircraft in normal flight found a wide variation in perceived intelligibility and operational signal-to-noise ratios of their on-board PA systems. Typical passenger cabin background noise levels of 80-85 dBA (100-105 dBC) were recorded, whilst many systems were found to operate with either zero or negative S/N ratios. The results of extensive mock-up testing are reported. It is shown that high frequency dispersion is a major factor contributing to the perceived intelligibility. The use of Distributed Mode Loudspeaker technology was found to bring about significant improvements in clarity and intelligibility, but the effectiveness of RASTI as an accurate intelligibility descriptor under these conditions is questioned.
Improving the Intelligibility of Aircraft PA Systems

5432
Mario Di Cola, Davide Doldi, Davide Saronni
The directional properties of horn devices are governed by the shape of the wavefront presented at the mouth. An analysis of the sound pressure distribution across the horn's mouth can help in understanding how the wavefront is shaped there, and what happens in particular circumstances. Midrange beaming and high frequency mouth diffraction, for example, are two well-known obstacles to overcome when designing a broadband constant directivity horn. The method we put forward in previous work is extended here to several different cases, with improved data processing. The results of this analysis are shown through graphic illustrations. Results obtained from measurements on real devices are also presented and correlated with traditional directivity plots.
Analysis of Directivity Anomalies in Mid and High Frequency Horn Loudspeakers

5433
Takashi Katayama, Kosuke Nishio, Masaharu Matsumoto, Yoshiaki Takagi, Yasuhito Watanabe, Kazuhiro Iida
We have developed an MPEG-2 AAC (LC profile) encoder for professional authoring systems for Electric Music Distribution (hereafter EMD). Encoders for professional use are required to deliver higher sound quality than encoders for consumer use, so that music content is expressed correctly. We therefore investigated several quantization methods in the MPEG-2 AAC encoding algorithm that we believe seriously affect sound quality, and developed new methods to achieve high sound quality. In this paper, we describe the new methods and the construction of the EMD encoder system that uses them.
High-quality Encoding Algorithms in MPEG-2 AAC for Electric Music Distribution

5434
Vijay Parsa, Donald G. Jamieson
Distortion in hearing aids degrades the sound quality and reduces user satisfaction with these devices. In this paper, a distortion measure derived from the hearing aid response to natural speech stimuli is presented. The hearing aid response was modelled using a time-varying ARMA system whose coefficients were estimated using the Multiple Model Least Squares (MMLS) algorithm. The amount of distortion in the hearing aid was quantified using an "auditory distance" parameter, which computes the distance between the hearing aid response and the model output. It is shown that the auditory distance parameter correlates better with perceptual judgements of hearing aid sound quality, by both normal-hearing and hearing-impaired listeners, than the conventional hearing aid distortion measures.
Hearing Aid Distortion Measurement Using the Auditory Distance Parameter

5435
Ye Wang, Juha Ojanpera, Miikka Vilermo, Mauri Vaananen
This paper presents three schemes for re-compressing MP3 (MPEG-1 Layer III) audio bitstreams. The first two schemes are lossless ones, which exploit the inter-frame redundancies of the main data (the scale factors and the quantized MDCT coefficients) of the MP3 bitstream. The third scheme is a lossy approach, which exploits the redundancies between consecutive beat-patterns. The aim is to study the potential of the new coding schemes. Preliminary results are demonstrated in this paper.
Schemes for Re-Compressing MP3 Audio Bitstreams

5436
Vladimir Z. Mesarovic, Raghunath Rao, Miroslav V. Dokic, Sachin Deo
In today's competitive consumer audio market the Advanced Audio Coding (AAC) format has quickly become a must-have technology with its adoption on the Internet, in digital radio, digital television and home theatre. Compression using AAC retains high audio quality even at low bit rates. One reason for this effectiveness is the use of Huffman variable length coding to represent frequency domain information. However, this requires performing the relatively complex task of Huffman decoding in the audio decoder, which is typically very sensitive to cost and processor speed requirements. Furthermore, encoders can sometimes create worst-case scenarios consisting of very long code words, even when unnecessary. Thus, one needs to optimize the Huffman decoding for these worst-case scenarios without giving up average performance. This paper discusses various methods for Huffman decoding and their inherent implementation tradeoffs on a DSP platform, and proposes improvements that are specific to the Huffman codebooks used in AAC.
Selecting an Optimal Huffman Decoder for AAC

5437
Christof Faller
Perceptual audio coders use a varying number of bits to encode subsequent frames according to the perceptual entropy of the audio signal. For transmission over a constant bitrate channel the bitstream must be buffered. The buffer must be large enough to absorb variations in the bitrate, otherwise the quality of the audio will be compromised. We present a new scheme for buffer control of perceptual audio coders. In contrast to conventional schemes the proposed scheme systematically reduces the variation in a perceptual distortion measure over time. The new scheme applied to a perceptual audio coder (PAC) improves the quality of the encoded signal for a given buffer size. The same technique can be used to increase the performance of other coders such as MPEG-1 Layer III or MPEG-2 AAC while maintaining backward compatibility.
Audio Coding Using Perceptually Controlled Bitstream Buffering
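
The buffering problem that the abstract starts from can be sketched in a few lines of Python. This is a minimal illustration of buffer occupancy under a constant-bitrate channel only; it is not the perceptually controlled scheme proposed in the paper. The function name, the per-frame bit counts, and the buffer size are all hypothetical values chosen for illustration.

```python
def simulate_buffer(frame_bits, channel_bits_per_frame, buffer_size_bits):
    """Track encoder buffer occupancy for a constant-bitrate channel.

    frame_bits: bits produced by the coder for each frame (hypothetical data).
    channel_bits_per_frame: bits the channel drains during one frame period.
    buffer_size_bits: capacity of the buffer.
    Returns the occupancy after each frame and whether capacity was exceeded.
    """
    occupancy = 0
    trace = []
    overflowed = False
    for bits in frame_bits:
        # The coder writes its frame, the channel drains a fixed amount.
        # Occupancy cannot go below zero (the channel would send padding).
        occupancy = max(0, occupancy + bits - channel_bits_per_frame)
        if occupancy > buffer_size_bits:
            overflowed = True
        trace.append(occupancy)
    return trace, overflowed


# Hypothetical frame sizes that vary around a 2000-bit channel allowance.
trace, overflowed = simulate_buffer([1000, 3000, 500, 2500], 2000, 2000)
```

When `overflowed` is true, a real coder would have to spend fewer bits on the offending frames, which is where a buffer control strategy such as the one described in the abstract comes in.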

5438
Richard Barnert, Martin Opitz
The growing use of multimedia applications in mobile equipment such as notebooks, MP3 players, and mobile phones creates a need for miniature speakers with optimized performance. In order to improve their acoustic performance, the detailed behavior of transducers has to be studied. Modern tools such as numeric simulation programs and laser vibrometry give deeper insight into the required characteristics. In this paper the application of three important tools is described and examples are shown.
Modern Development Tools for Dynamic Transducers

5439
William R. Hoy, Charles McGregor
For complex directional response data to be useful, it must be gathered and deployed in a much more disciplined manner than has typically been applied to magnitude-only data. The loudspeaker under test and measurement microphone must be precisely positioned; geometrical errors must be corrected; and temperature variations must be accounted for. An object-oriented data structure is described which facilitates solutions to each of these challenges. Practical applications employing the new data structure are also presented.
Loudspeaker Complex Directional Response Characterization

5440
David W. Gunness
Transfer functions of acoustical systems usually include significant phase lag due to propagation delay. When this delay varies from one transfer function to another, basic mathematical operations such as averaging and interpolation produce unusable results. A calculation method is presented which produces much better results, using well-known mathematical operations. Applications of the technique include loudspeaker complex directional response characterization, complex averaging, and DSP filter design for loudspeaker steering.
Loudspeaker Transfer Function Averaging and Interpolation
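
The difficulty described in the abstract, that averaging transfer functions with different propagation delays gives unusable results, can be illustrated with a short Python sketch. The abstract does not spell out the calculation method, so the approach below (remove a known delay from each response, average, then reapply the mean delay) is a generic delay-compensation technique assumed for illustration, not necessarily the author's. The function names, frequencies, and delays are hypothetical.

```python
import cmath
import math


def remove_delay(response, freqs, delay_s):
    """Multiply by exp(+j*2*pi*f*tau) to strip a pure delay of tau seconds."""
    return [h * cmath.exp(2j * math.pi * f * delay_s)
            for h, f in zip(response, freqs)]


def average_with_delay_compensation(responses, delays_s, freqs):
    """Average complex responses after aligning their (assumed known) delays."""
    aligned = [remove_delay(h, freqs, tau)
               for h, tau in zip(responses, delays_s)]
    count = len(responses)
    mean_aligned = [sum(column) / count for column in zip(*aligned)]
    mean_delay = sum(delays_s) / count
    # Reapply the mean delay so the result still carries a plausible lag.
    return [h * cmath.exp(-2j * math.pi * f * mean_delay)
            for h, f in zip(mean_aligned, freqs)]


# Two hypothetical unit-gain responses that differ only in delay.
freqs = [100.0 * k for k in range(1, 11)]   # 100 Hz to 1 kHz
delays = [0.001, 0.0015]                    # seconds
responses = [[cmath.exp(-2j * math.pi * f * tau) for f in freqs]
             for tau in delays]

naive = [sum(column) / 2 for column in zip(*responses)]
compensated = average_with_delay_compensation(responses, delays, freqs)
```

In this example the naive average cancels completely at 1 kHz, where the two delays put the responses half a cycle apart, while the compensated average keeps unit magnitude at every frequency.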

5441
Leandro de Campos Teixeira Gom, Emilia Gomez, Nicolas Moreau
Depending on the application, audio watermarking systems must be robust to piracy attacks. Desynchronization attacks, aimed at preventing the detector from correctly locating the information contained in the watermark, are particularly difficult to neutralize. In this paper, we introduce resynchronization methods for audio watermarking based on the use of training sequences. These methods reverse the effect of a large class of desynchronization attacks. Simulation results confirm the efficiency of the proposed methods.
Resynchronization Methods for Audio Watermarking

5442
Jurgen Herre, Christian Neubauer, Frank Siebenhaar, Ralph Kulessa
Today, music distribution over the Internet is an increasingly important business. In this context, watermarking can provide beneficial means to transmit rights information within the content. To convey the origin of such content, the combination of simultaneous low bitrate encoding and watermark embedding is a promising novel technique. This paper describes the basic concept of combined compression/watermarking for audio signals. In contrast to separate steps of encoding and watermarking, the combined approach enables an optimal coordination between the quantization strategy in the audio encoder and the watermark embedding process. This allows the system to be adjusted to specific needs in terms of audio quality and watermark robustness. Experimental results obtained from an extended MPEG-2/4 AAC encoder and a first implementation of an extended MPEG-1/2 Layer-3 encoder confirm the potential of the concept.
New Results on Combined Audio Compression/Watermarking

5443
Dane Grant, Louis D. Fielder, Grant Davidson
Subjective quality is a critical indicator of the suitability of bit-rate reduction codecs for digital audio contribution/distribution applications. Accordingly, a formal double-blind test was conducted to evaluate the subjective quality of a contribution/distribution cascade (eight Dolby E audio codecs in tandem), both separately and when combined with three standardized stereo emission codecs. In this test, the contribution/distribution cascade exceeded the ITU-R basic audio quality requirements for broadcast applications. The results further indicate that when using the contribution/distribution cascade in tandem with any one of the emission codecs, audio quality is effectively limited by the emission codec alone. The test methodologies conformed to Recommendation ITU-R BS.1116, and hence represent an estimate of worst-case performance.
Subjective Evaluation of an Audio Distribution Coding System

5444
Joao Manuel Rodrigues, Ana Maria Tome, Tomas Oliveira e Silva
Two strategies have been used to evaluate the perceptual significance of a distortion introduced into a signal: the masking threshold concept, and the internal representation approach. Psychoacoustic models found in the perceptual audio coding literature generally follow the first approach in spite of the recognized unsuitability of this approach for the task. The more plausible internal representation approach is currently exploited in objective measurement of audio quality but not, to our knowledge, in audio coders. In this paper, we explore a standard internal representation model, and find some statistical relations that might, in the future, be applied to audio coding.
Auditory Models in Audio Coding

5445
Piotr Kleczkowski
The action of the dynamic compressor introduces non-linear distortion. This is significant for the attack portion of this action. The distortion is analyzed and the feasibility of its reduction is investigated. It is possible to reduce the distortion by the appropriate choice of the time function actually controlling the gain. It is shown that the distortion can be measured on an absolute scale, but it is difficult to develop a psychoacoustically justified measure. Some listening tests have been performed, and their results are compared with quantitative analysis, leading to interesting conclusions.
The Reduction of Distortion in the Dynamic Compressor

5446
Matti Karjalainen, Kazuho Ono, Ville Pulkki
Binaural modeling of coloration perceived due to multiple coherent sources is studied under the condition that sounds arrive at a listener successively within a certain time delay. The model simulates the perception of coloration of two horizontally located sources under prominent precedence effect conditions. Listening experiments are conducted to ensure the validity of the modeling. A new methodology is adopted in the experiments to minimize the error owing to individuality of HRTFs and inaccuracy of HRTF measurements.
Binaural Modeling of Multiple Sound Source Perception: Methodology and Coloration Experiments

5447
Thomas J. Loredo
Many common audio test and measurement procedures require characterization of the output signal of the device under test in terms of harmonic (sinusoidal) components and residual noise when the device processes sinusoidal input signals. This work uses the Bayesian approach to statistical inference to address such problems as parameter estimation problems when discrete samples of the output signal are given. In the resulting Bayesian harmonic analysis the power spectrum computed from the discrete-time Fourier transform appears as the logarithm of the posterior probability for the frequency of a single sinusoid rather than as an estimate of the signal spectrum; more complicated functions of the transform arise when analyzing signals with multiple sinusoids. Problems such as spectral leakage are addressed by nonlinear processing of the Fourier transform, offering several advantages over methods that use (linear) windowing of data.
Bayesian Harmonic Analysis for Audio Testing and Measurement
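
The statement that the power spectrum appears as the logarithm of the posterior probability for the frequency of a single sinusoid can be sketched numerically. The Python below is an illustrative sketch under stated assumptions, not the paper's implementation: it assumes Gaussian noise of known standard deviation, so that the log posterior over frequency is, up to an additive constant and under the usual large-sample approximation, the Schuster periodogram divided by the noise variance. The function names, the test signal, and `noise_sigma` are hypothetical.

```python
import cmath
import math


def schuster_periodogram(samples, freq, sample_rate):
    """|sum_k x_k exp(-j*2*pi*f*t_k)|^2 / N at one trial frequency."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / sample_rate)
              for k, x in enumerate(samples))
    return abs(acc) ** 2 / len(samples)


def log_posterior(samples, freqs, sample_rate, noise_sigma):
    """Unnormalized log posterior for a single sinusoid's frequency.

    Assumes known noise standard deviation (an assumption of this sketch).
    """
    return [schuster_periodogram(samples, f, sample_rate) / noise_sigma ** 2
            for f in freqs]


# Hypothetical test data: a noiseless 1 kHz sine sampled at 8 kHz.
sample_rate = 8000.0
true_freq = 1000.0
samples = [math.sin(2 * math.pi * true_freq * k / sample_rate)
           for k in range(256)]

candidates = [900.0 + 10.0 * i for i in range(21)]   # 900 Hz to 1100 Hz
logp = log_posterior(samples, candidates, sample_rate, noise_sigma=0.1)
best = candidates[max(range(len(logp)), key=logp.__getitem__)]
```

Because the periodogram is exponentiated to obtain the posterior, a modest peak in the spectrum becomes a very sharp peak in probability, which is why frequency estimates from this kind of analysis can be much finer than the spacing of DFT bins.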

5448
Jon D. Paul
Transformers function in digital audio systems primarily to reject common mode noise interference, to break ground loops and to enhance balance to reduce inductive emissions. A new test characterizes the interference rejection of digital audio transformers used at the receiver input of transmission systems. A sample set of decoded frame sync clocks is accumulated by a statistical time interval analyzer. The analyzer calculates the mean value of the set of periods and the standard deviation (jitter), and provides a period histogram. The histogram and standard deviation establish a basis for comparing the high frequency interference rejection of various transformers and for quantifying the nature of the induced jitter. Test data are presented for 7 different types of transformers.
Characterizing Digital Audio Transformers with Induced Jitter Histograms
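
The three quantities the analyzer reports (mean period, standard deviation as jitter, and a period histogram) can be computed as in the Python sketch below. This is only an illustration of the statistics named in the abstract; the function name, the period values, the bin width, and the choice of population rather than sample standard deviation are assumptions, not details from the paper.

```python
import statistics
from collections import Counter


def period_statistics(periods_ns, bin_width_ns):
    """Summarize a set of measured clock periods.

    Returns (mean period, jitter as population standard deviation,
    histogram mapping bin index -> count). A bin index i covers the
    interval [i * bin_width_ns, (i + 1) * bin_width_ns).
    """
    mean_period = statistics.fmean(periods_ns)
    jitter = statistics.pstdev(periods_ns)
    histogram = Counter(int(p // bin_width_ns) for p in periods_ns)
    return mean_period, jitter, histogram


# Hypothetical frame sync periods in nanoseconds (near 1 / 48 kHz).
periods = [20833.0, 20834.0, 20832.0, 20833.0, 20835.0, 20831.0]
mean_period, jitter, histogram = period_statistics(periods, bin_width_ns=2.0)
```

Comparing `jitter` and the spread of `histogram` across transformers, measured under the same injected interference, is the kind of comparison the abstract describes.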

5449
Matti Karjalainen, Juha Merimaa, Timo Peltonen, Tapio Lokki
Room impulse responses are inherently multidimensional, including components in three coordinate directions, each one further being described as a time-frequency representation. Such 5-dimensional data is difficult to visualize and interpret. We propose methods that apply 3-D microphone arrays, directional analysis of measured room responses, and visualization of data, yielding useful information about the time-frequency-direction properties of the responses. The applicability of the methods is demonstrated with three different cases of real measurements.
Measurement, Analysis, and Visualization of Directional Room Responses

5450
Chris Kyriakakis,Sunil Bharitkar,
Room acoustical modes, particularly in small rooms, cause a significant variation in the room responses measured at different locations. Responses measured only a few cm apart can vary by up to 15-20 dB at certain frequencies. This makes it difficult to equalize an audio system for multiple simultaneous listeners. Previous methods have utilized multiple microphones and spatial averaging with equal weighting. In this paper we present a different multiple point equalization method. We first determine representative prototypical room responses derived from several room responses that share similar characteristics, using the fuzzy unsupervised learning method. These prototypical responses can then be combined to form a general point response. When we use the inverse of the general point response as an equalizing filter, our results show a significant improvement in equalization performance over the spatial averaging methods. Applications of this method include equalization and multiple point sound control at home and in automobiles.
New Factors in Room Equalization Using a Fuzzy Logic Approach

5451
D.B. Keele, Jr.,
The EIA-426-B standard: “Loudspeakers, Optimum Amplifier Power” (April 1998) specifies a test CD that contains the calibration and test signals for all the tests defined in the standard. This CD is intended to improve the consistency and convenience of the standard and will be made available through the EIA and other sources. This paper describes the development process of the signals placed on the CD with emphasis on the spectral-shaped random noise signal used for life testing and the variable-rate sine-wave sweep test signal used for power compression tests. All signals were generated analytically using a signal processing and data analysis program. In the process of creating the signals, a couple of errors were detected in the standard in its description of the method for generating the variable-rate sweep signal. The paper also develops the math for generating variable-rate sweeps whose spectra roll off at an arbitrary given rate. Complete statistics and measurements are described for the signals as placed on the CD and for the signals as played back on a typical CD player. Also described are a series of 6.5-cycle shaped tone bursts that are included on the CD. These are intended for use as a test stimulus for short-term power assessment of loudspeakers and electronics, and for testing the frequency response, energy decay and narrow-band phase/polarity of systems.
Development of Test Signals for the EIA-426-B Loudspeaker Power Rating Compact Disk

5452
Alex Loscos,Jordi Bonada,Hideki Kenmochi,Xavier Serra,Pedro Cano,
In this paper we present two different approaches to the modeling of the singing voice. Each approach is tailored to the specific requirements of one of two applications: an automatic voice impersonator for karaoke systems and a singing voice synthesizer.
Spectral Approach to the Modeling of the Singing Voice

5453
Frantisek Kadlec,
Design and generation of test signals for measurement of electroacoustic systems, as well as for psychoacoustic testing, is discussed. The main focus is on the digital generation of low-level harmonic test signals and on how subsequent DSP processing of those signals introduces distortion. A mathematical analysis describes the origin of this distortion, and its audibility is also examined. Further, it is shown how the influence of the various distortion components on audible perception can be minimized by using dither. Additionally, the generation of swept-frequency signals and multitone signals is discussed.
Design, Generation and Analysis of Digital Test Signals
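
The effect of dither on a low-level signal, as mentioned in the abstract above, can be illustrated with a small sketch. It is not taken from the paper: it assumes a mid-tread quantizer with a step of 1 LSB, a constant input of 0.3 LSB standing in for a low-level signal, and triangular-PDF (TPDF) dither spanning ±1 LSB. All names are hypothetical.

```python
import math
import random

def quantize(x):
    """Round to the nearest integer LSB (mid-tread quantizer)."""
    return math.floor(x + 0.5)

def tpdf_dither(rng):
    """Triangular-PDF dither spanning +/-1 LSB: sum of two uniform +/-0.5 LSB values."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

rng = random.Random(0)   # fixed seed so the run is repeatable
level = 0.3              # input level in LSB, below half a quantizer step
count = 20000

plain = [quantize(level) for _ in range(count)]
dithered = [quantize(level + tpdf_dither(rng)) for _ in range(count)]

plain_mean = sum(plain) / count
dithered_mean = sum(dithered) / count
print(plain_mean, dithered_mean)
```

Without dither the input is lost entirely, because every sample rounds to zero. With dither the individual samples are noisy, but their average tracks the input level, so the quantization error behaves like noise rather than signal-dependent distortion.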

5454
Thibaud Guichardan,
With the increasing availability of surround sound in cinemas, home theatres and multimedia computers, the demand for surround sound programs is constantly rising. Still, it remains difficult to assess to what extent the use of surround sound increases the feeling of immersion in the total movie-viewing experience. The purpose of this study is to attempt an evaluation of how the viewer's immersive sensation varies with the proportion of surround sound, also taking into account the scenes' dramatic content.
Interactions Between Surround Sound Level and the Immersive Feeling in the Multichannel Movie Experience

5455
Ville-Veikko Mattila,
Perceptual analysis of speech quality in mobile communications was carried out by semantic differentiation and external preference mapping, and the developed attributes were mapped to overall quality judgements. A clean speech sample and a speech sample corrupted by car cabin noise from two speakers were processed by different processing chains representing, e.g., transmission of speech over real GSM networks, various standardised speech coders and speech coding with erroneous transmission channels, etc., resulting in a total of 170 samples. The perceptual characteristics of the test samples were described by 18 screened and trained subjects. The final descriptive language with 21 attributes and their rating scales were developed in panel discussions. The scaled attributes were mapped to overall quality evaluations collected from 30 screened and trained subjects by partial least-squares regression (PLSR).
Descriptive Analysis of Speech Quality in Mobile Communications: Descriptive Language Development and External Preference Mapping

5456
Sheila Flanagan,Brian C.J. Moore,
Listeners were required to identify which of six vowel-like harmonic complexes was presented on each trial. The components of the complexes were added either in cosine or in random phase and the fundamental frequency was 50 or 100 Hz. The sounds were reproduced in a typical listening room via a distributed mode loudspeaker (DML) or a conventional loudspeaker. Overall accuracy of vowel identification was similar for the two loudspeakers. For both loudspeakers, performance was better for cosine phase than for random phase, indicating that phase information is preserved to some extent even in the far field.
The Effect of Loudspeaker Type on the Identification of Vowel-like Harmonic Complexes

5457
Francis Rumsey,Russell Mason,Bart de Bruyn,
The subjective spatial effect of continuous noise signals with interaural time difference fluctuations was investigated. These fluctuations were created by sinusoidal interchannel time difference fluctuations between signals that were presented over loudspeakers. Both verbal and non-verbal elicitation techniques were applied to examine the subjective effect. It was found that the predominant effect of increasing the fluctuation magnitude was an increase in the apparent width of the perceived sound source.
An Investigation of Interaural Time Difference Fluctuations, Part 3: The Subjective Effect of Fluctuations in Continuous Stimuli Delivered Over Loudspeakers

5458
Francis Rumsey,Russell Mason,Bart de Bruyn,
The subjective spatial effect of decaying noise signals with interaural time difference fluctuations was investigated. These fluctuations were created by sinusoidal interchannel time difference fluctuations between signals which were presented over loudspeakers. Both verbal and non-verbal elicitation techniques were applied to examine the subjective effect. It was found that the predominant effect of increasing the fluctuation magnitude was an increase in the apparent width of the acoustical environment whilst the apparent size of the perceived sound source did not change.
An Investigation of Interaural Time Difference Fluctuations, Part 4: The Subjective Effect of Fluctuations in Decaying Stimuli Delivered Over Loudspeakers

5459
Jeong-Il Seo,Jong-Won Seok,Jin-Woo Hong,Tae-Jin Lee,
Few attempts have been made to adopt a watermarking technique for Internet Audio Streaming Service. In this paper, we integrate an audio watermarking technique into MPEG-2 AAC Audio Streaming Service. Our novel audio watermarking scheme, which uses linear prediction based watermark embedding and extraction, is robust to common signal processing attacks. We design and implement a simple packet loss recovery algorithm for the robust extraction of watermark data, and design a business model for Internet audio streaming service. Experimental results show that our future AAC Streaming Service can safely protect the streaming audio contents from any kind of unauthorized copying or reproduction.
Internet Audio Streaming Service Technology Integrated with Copyright Protection

5460
Ralph Sperschneider,Pierre Lauber,
In digital audio, received bit streams of compressed audio data might be corrupted due to error-prone transmission channels. During decompression, errors will be propagated towards the audio output. Concealing these errors makes it possible to minimize the resulting obtrusive degradation. The paper describes techniques for concealing transmission errors in ISO/MPEG-2/4 AAC digital audio signals by exploiting specific audio signal characteristics. These techniques have been successfully applied to both simulation and real-time processing.
Error Concealment for Compressed Digital Audio

5461
Thomas Thaler,Georg Dickmann,
IEEE 1394 can now be used to create a scalable, synchronous, integrated services infrastructure; the key to building this is the IEEE p1394.1 bridging standard. The paper shows how scalability can be achieved via split multiportal bridges, shows how to distribute network time for synchronization, and demonstrates feasibility through simulations.
Scalability and Synchronization in IEEE 1394-Based Content-Creation Networks

5462
Gabriele Spenger,Jurgen Herre,Christian Neubauer,Niels Rump,
Despite widespread interest, electronic commerce for audio currently still presents a major challenge with regard to aspects such as security, transaction handling, interoperability of devices and services and end user experience. While many component technologies are already available addressing certain aspects of this scenario, a seamless integration of these functionalities into one unified framework has not been established yet. In response to this problem, the vision of the ongoing MPEG-21 standardization effort is to define a multimedia framework enabling transparent and augmented use of multimedia resources across a wide range of networks and devices used by different communities. This paper attempts to provide an overview of the concepts and the current status of the MPEG-21 framework and discusses its relevance for future electronic distribution and commerce of audio.
MPEG-21 - What does it bring to Audio?

5463
Oliver Hellmuth,Jurgen Herre,Eric Allamanche,Markus Cremer,Thorsten Kastner,Wolfgang Hirsch,
Driven by an increasing need for characterizing multimedia material, much research effort has been spent in the field of content-based classification recently. This paper presents a system for automatic identification of audio material from a database of registered works. The system is designed to allow reliable, fast and robust detection of audio material with the resources provided by today's standard computing platforms. Based on low level signal features standardized within the MPEG-7 framework, the underlying audio fingerprint format bears the potential for worldwide interoperability. Particular attention is given to issues of robustness to common signal distortions, providing good performance not only under laboratory conditions, but also in real-world applications. Improvements in discrimination, speed of search and scalability are discussed.
Advanced Audio Identification Using MPEG-7 Content Description

5464
Richard Foss,Bob Moses,Rob Laubscher,
Digital Harmony Studio is a specification for an IEEE-1394-based studio architecture for professional audio production. The specification identifies a number of device categories, including legacy adapters. Legacy adapters provide a vital link between the pro studio environments and current pro audio devices, and will typically take the form of breakout boxes exposing legacy ports. This paper describes a reference design for the first working device within the ‘Legacy Adapter’ category of the specification.
A Legacy Adapter Component of a 1394-Based Professional Studio Architecture

5465
Manfred Hibbing,
Recent improvements of digital audio, represented by the new CD formats DVD-A and SACD, demand appropriately designed studio microphones with improved high-frequency characteristics. It will be shown that purely acoustical means are not suitable, as the noise performance would be degraded considerably. However, the problem can be solved by combining an alternative transducer design with electrical equalization. Details of the design backgrounds and practical results will be presented.
Design of Studio Microphones with Extended High-Frequency Response

5466
Richard Barnert,
Techniques are shown which make it possible to alter the frequency response and therefore the sound of a condenser microphone by using the nature of diaphragm modal shapes. The modal behaviour can further be influenced by stretching the diaphragm at certain points, which leads to increased sensitivity. By applying these low cost methods, it is possible to modify specific frequency responses and to improve the signal-to-noise ratio in an easy way.
Modal Improved Condenser Microphone

5467
Stephan Peus,Otmar Kern,
With the introduction of a new technique of analog-to-digital conversion, a digitally interfaced microphone could be developed that retains the full dynamic range and quality of analog microphones. Similar to known gain-ranging procedures, two separate conversion circuits are employed. In contrast to those procedures, however, critical signal switching processes are completely avoided, resulting in a very high dynamic range and proper signal processing up to maximum signal levels. The advantages and possibilities of the new technique are shown using an example that includes remotely controllable functions which were so far available only in subsequent signal processing stages, e.g. in a mixing console.
Benefits of a Digitally Interfaced Studio Microphone

5468
Barry Blesser,
Artificial reverberator algorithms should be evaluated using stochastic methods. The reverberation impulse response is separated into the early part, containing the unique spatial personality, and the late part, containing the statistically random process. Stochastic models collapse the large amount of data in the late reverberation into a small number of temporal and spectral metrics. When they match the perceptual criteria, the process is transparent. This provides a scientific method for achieving high quality without the need for extensive ad hoc listening experiments. Other disciplines, notably statistical physics, psychoacoustics, architecture and music, contribute critically important insights. Some of the apparent paradoxes converge into a coherent theory using this approach.
An Interdisciplinary Integration of Reverberation

5469
Frank Foti,Robert Orban,
Few people in the record industry really know how a radio station processes its material before it hits the FM airwaves. This article’s purpose is to remove the many myths and misconceptions surrounding this arcane art. Every radio station uses a transmission audio processor in front of its transmitter. The processor’s most important function is to control the peak modulation of the transmitter to the legal requirements of the regulatory body in each station’s nation. However, very few stations use a simple peak limiter for this function. Instead, they use more complex audio chains. These can accurately constrain peak modulation while significantly decreasing the peak-to-average ratio of the audio. This makes the station sound louder within the allowable peak modulation.
What Happens to My Recording When it’s Played on the Radio?

5470
Duane Wise,
This paper addresses issues of digital IIR filter performance, namely the likelihood of fixed-point overflow and the propagation of quantization error. Measuring these quantities requires a method of determining a transfer function between arbitrary nodes of a filter structure. Norm functions are defined for application to these transfer functions which address overflow and error propagation issues depending on the signals employed. In addition, functions of the state matrix of an IIR filter are defined that measure the potential for limit cycles.
A Tutorial on Performance Metrics and Noise Propagation in Digital IIR Filters

5471
Ralf Geiger,Thomas Sporer,Jurgen Koller,Karlheinz Brandenburg,
Most of the current audio coding schemes use transforms like the Modified Discrete Cosine Transform (MDCT) to calculate a blockwise frequency representation of the audio signal. Since these transforms usually produce floating point values even for integer input samples, a quantization process is necessary to achieve a reduction of data rate. This paper presents a new transform with perfect reconstruction that produces integer output values. The transform is called IntMDCT and is derived from the MDCT preserving most of its attractive properties. It provides a good spectral representation of the audio signal, critical sampling and overlapping of blocks. A lossless audio coding scheme may be built by simply cascading IntMDCT with an entropy coding scheme.
Audio Coding based on Integer Transforms

5472
Joshua D. Reiss,Mark B. Sandler,
Sigma delta modulation is a popular technique for high-resolution analog-to-digital conversion and digital-to-analog-conversion. It has been considered as a new format for recording and storage of audio signals. To reduce the storage capacity, a lossless compression scheme can be applied. However, this scheme offers less than 3:1 compression. This may not be sufficient for storage on media such as a Digital Versatile Disk (DVD). We propose a scheme based on a technique known as bit-grouping. Errors are introduced in the compression, but they are confined to frequencies outside the audible range. Our studies indicate that bit-grouping allows one to achieve greater than 4:1 compression.
Efficient Compression of Oversampled 1-bit Audio Signals

5473
Mark Kahrs,
The Teager Energy Operator (TEO) is a nonlinear time domain operator with a delightfully simple implementation. We first review the TEO as well as the Discrete Energy Separation Algorithm (DESA). We compare Short Time Fourier Transform techniques such as the well known Phase Vocoder with the Energy Separation operator using synthetic signals. We also study the performance with noise. We also compare the Hilbert Transform Instantaneous Frequency detection with the TEO. We review the use of the DESA operator in musical instrument analysis and discuss the use of the TEO in transient detection.
Audio Applications of the Teager Energy Operator
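
The simplicity of the Teager Energy Operator mentioned in the abstract above is easy to show. The sketch below uses the commonly cited discrete form, psi[n] = x[n]^2 - x[n-1]*x[n+1]; it is an illustration under that assumption rather than the paper's own code, and the function name, sample rate and tone parameters are hypothetical. For a sinusoid A*cos(W*n + phi) the operator returns the constant A^2 * sin(W)^2, which the example checks numerically.

```python
import math

def teager_energy(x):
    """Discrete Teager energy psi[n] = x[n]^2 - x[n-1]*x[n+1] for interior samples."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Hypothetical test tone: 1 kHz at a 48 kHz sample rate, amplitude 0.5.
amplitude = 0.5
omega = 2 * math.pi * 1000 / 48000   # radians per sample
tone = [amplitude * math.cos(omega * n + 0.3) for n in range(64)]

psi = teager_energy(tone)
expected = (amplitude * math.sin(omega)) ** 2
print(psi[0], expected)
```

Because the output depends on both amplitude and frequency, separating the two requires a further step such as the energy separation algorithm the abstract refers to; that step is not shown here.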

5474
Mark R. Avis,
Low frequency normal modes of an enclosed soundfield introduce unwanted frequency, spatial and temporal artefacts to reproduced electroacoustic signals. A novel control approach is presented based on an analytical modal decomposition, which incorporates a low-frequency soundfield model formed from the sum of a number of second-order IIR filter sections. It is shown that within constraints determined by the model accuracy, IIR controllers may be constructed and may be applied to control tasks such as point pressure cancellation and the reduction of modal quality factor.
IIR Biquad Controllers for Low Frequency Acoustic Resonance

5475
Robert Adams,Karl Sweetland,
An audio DAC with internal DSP core has been designed that includes three high-quality audio DACs as well as a DSP programmed to provide algorithms that correct for speaker/amplifier deficiencies. These algorithms include equalization, two-band compression/limiting with arbitrary compression curve, spatial enhancement, delay and interpolation. All signal processing parameters are register-programmable using an external microcontroller. The DAC architecture uses a multi-bit sigma-delta design with a mismatch-shaping scrambler.
A Single-Chip Three-Channel 112 dB Audio DAC with Audio DSP Capability

5476
Claus-Christian Burgel,Reinfried Bartholomaus,Wolfgang Fiesel,Johannes Hilpert,Andreas Hoelzer,Karsten Linzmeier,Martin Weishart,
Contrary to the MPEG-1 Audio compression schemes, Advanced Audio Coding (AAC) in its MPEG-2 and MPEG-4 flavours has no inherent upper limit for the sampling frequency of the input signal. Furthermore, the bitstream format can cover a dynamic range far beyond the range provided by 24 bit linear PCM coding. This makes the AAC coder an ideal candidate for representing audio signals with parameters that are usually associated with high resolution audio systems. This paper discusses the application of this highly efficient compression scheme to digital programme material represented with 24 bit and 96 kHz.
Beyond CD-Quality: Advanced Audio Coding (AAC) for High Resolution Audio with 24 Bit Resolution and 96 kHz Sampling Frequency

5477
Stanley P. Lipshitz,John Vanderkooy,
This paper extends our previous studies of 1-bit sigma-delta modulators. We now investigate 1½-bit (i.e., 3-level) systems, and again find that we can predict their idle-tone behavior and the spectral structure of their output. We address the question of whether sigma-delta modulators are adequately dithered by their internal noise, and compare the behavior of minimum-phase and non-minimum-phase (i.e., chaotic) modulator designs. Numerous computer simulations, greatly aided by coherent averaging, have guided us to two basic mechanisms that explain the distortion: the sweeping of the idle tone, and the saturating quantizer characteristic. A model of the nonlinearity of a dithered quantizer explains the general nature of the harmonic distortion components and their phases. The data also show unambiguously that the maximum possible dither should always be used.
Towards a Better Understanding of 1-Bit Sigma-Delta Modulators - Part 2

5478
James A.S. Angus,
This paper clarifies some of the confusion which has arisen over the efficacy of dither in PCM and Sigma-Delta Modulation systems. It describes a fair means of comparison between them. It presents results that show dither is effective in Sigma-Delta Modulation systems of any order, and proposes methods for achieving optimum performance in both systems.
Effective Dither in High Order Sigma-Delta Modulators

5479
Malcolm O.J. Hawksford,
A non-invasive method of audio file identification is described to ascertain the proximity of processed audio, of possibly dubious origin, to a reference file. A differential correlation function forms the basis of a comparative metric. Testing embraces linear and non-linear processes including perceptual based codecs. Applications include DVD-A and SACD when used as the only legitimate non-watermarked release formats.
Non-invasive Identification of Audio Content for High-resolution Applications

5480
Matti Karjalainen,Aki Makivirta,Poju Antsalo,Vesa Valimaki,
In a room with strong low-frequency modes the control of excessively long decays is problematic or impossible with conventional passive means. In this paper we present a systematic methodology for active modal equalization able to correct the modal decay behavior of a loudspeaker-room system. Two methods of modal equalization are proposed. The first method modifies the primary sound such that modal decays are controlled. The second method uses separate primary and secondary radiators and controls modal decays with sound fed into the secondary radiator. Case studies of the first method of implementation are presented.
Low-Frequency Modal Equalization of Loudspeaker-Room Responses

5481
Louis D. Fielder,
Traditionally, electronic equalization is used to improve the subjective quality of sound reproduction through the use of simple linear filters of low complexity. It will be shown that the properties of typical rooms combine with psychoacoustics to limit practical equalization to the use of minimum-phase filters of relatively low order despite the existence of new and powerful digital signal processing tools. The high Q and non minimum-phase nature of the room-loudspeaker-listener transfer function due to wave interference effects creates severe problems for more complete equalization. Typical cinemas and a listening room will be used to investigate the difficulties of more powerful equalization approaches.
Practical Limits for Room Equalization

5482
Seyed-Ali Azizi,
The overall frequency response of a graphic or parametric equalizer bank, consisting of a number of serially connected cut or boost equalizers, may show serious deviations from the user-defined gain setting. The deviations are caused by mutual interference between the equalizers and depend on the gains, quality factors and center frequencies of the individual equalizers. This paper first treats this problem as a nonlinear optimization problem and then introduces a new approach to efficiently counteract the undesired interference effects. It is based on the "Opposite Filter Concept": based on the user-defined parameter setting of an equalizer bank, a set of simple filters counteracting the interference is suitably parameterized and serially inserted into the equalizer bank, resulting in a substantial reduction of the interference effects and consequently in an overall frequency response close to the desired one.
A New Concept of Interference Compensation for Parametric and Graphic Equalizer Banks

5483
Keith Weiner,
This paper presents a general-purpose interactive audio pipeline. It supports a set of sound streams and a set of processing algorithms, such that each stream may traverse through any number of processing algorithms in any order, and that control parameters may be changed on the fly. Unique to this system is support for on the fly buffer length changes at any stage of the pipeline, and programmatic changes to control parameters.
An Efficient Pipeline for Digital Signal Processing in Interactive Audio

5484
Brahim Hamadicharef,Emmanuel Ifeachor,
An intelligent audio system for sound design using artificial intelligence techniques is reported. The system is used to analyse acoustic recordings, extract salient sound features and to process them to generate parameters for sound synthesis, in a manner that mimics human audio experts. Preliminary tests show that the use of the system reduces design time and yet the quality of the resulting sound is considered high by audio experts.
An Intelligent System Approach to Sound Synthesis Parameter Optimisation

5485
Mark Ureda,
Straight-line arrays produce highly directional polar response curves in the vertical plane resulting in high on-axis gain. In many venues, however, it is useful to blend this high on-axis gain with improved response in the near field beneath and in front of the array. To accomplish this the lower section of a straight-line array is curved. This paper derives the directivity functions of two such arrays, namely the "J-Array" and the "Spiral Array."
"J" and "Spiral" Line Arrays

5486
Vance Breshears,
An increasing number of sound system designers are implementing multi-channel (primarily Left/Center/Right) sound reinforcement systems for permanent installation, touring and show applications. Due to the large size of most performance venues and the speaker locations required to provide a stereo listening environment, the arrival times of the direct sound from primary speaker sources can vary widely. Mixing techniques will be discussed and demonstrated through the use of auralizations. Basic mixing guidelines will be outlined.
Mixing Techniques for Multi-Channel (Left/Center/Right) Sound Reinforcement Systems

5487
Miomir Mijic,
Traditionally, Serbian Orthodox churches were small enough not to require sound reinforcement systems. However, during the last decade larger churches that required the use of electroacoustical equipment were erected. This raised the question of which aesthetic and functional requirements church audio systems need to satisfy. The service in Serbian churches consists of polyphonic a cappella choral singing coupled with the preachers’ chant. Results of recent acoustical research based on the analysis of the autocorrelation function of signals recorded during the service, as well as subjective evaluation of acoustical quality, led to preliminary conclusions regarding the desirable acoustical response in churches. This paper analyzes requirements for sound reinforcement systems in Serbian Orthodox churches.
Design Requirements for Sound Reinforcement Systems in Serbian Orthodox Churches

5488
Paul Bauman,Marcel Urban,Christian Heil,
We introduce Fresnel’s ideas in optics to the field of acoustics. Fresnel analysis provides an effective, intuitive approach to the understanding of complex interference phenomena and thus opens the road to establishing the criteria for the effective coupling of sound sources and for the coverage of a given audience geometry in sound reinforcement applications. The derived criteria form the basis of what is termed Wavefront Sculpture Technology.
Wavefront Sculpture Technology

5489
Markus Erne,
Low-bit-rate audio coding has become a widely used technology in recent years. Because these coders use sophisticated signal processing techniques that exploit psychoacoustic phenomena, nontransparent coding results in artifacts that sound very different from traditional distortions and are frequently not obvious at all to the untrained listener. The AES Technical Committee on Audio Coding has therefore started an activity to produce a CD-ROM which presents some of the most common coding artifacts in more detail. The CD-ROM not only explains and comments on each of the coding artifacts separately but also presents audio examples for each artifact, using different degrees of distortion, varying from "subtle" up to "obvious".
Perceptual Audio Coders "What to listen for"

5490
Michael J. Smithers,Matt C. Fellers,
Presented are modifications to the MPEG-2 AAC encoder that significantly increase computational efficiency while maintaining high sound quality. These modifications include changes to the perceptual model, block-switching control, pre-estimation of quantizer scale-factors, and changes to the quantizer rate/distortion loop. These changes result in an overall speedup (when combined with processor-specific optimizations) of approximately 250% compared to the reference Low Complexity professional MPEG-2 AAC encoder. Tests show a mean degradation of 0.2 on the ITU-R 5-point audio impairment scale.
Increased Efficiency MPEG-2 AAC Encoding

5491
Sang-Wook Kim,Sung-Hee Park,Yeon-Bae Kim,
Previous MPEG-1, MPEG-2 Audio standards provided a single bitrate, single bandwidth tool set, with different configurations of that tool set specified for use in various applications. MPEG-4 provides several bitrate and bandwidth options within a single bitstream, providing a scalability functionality that permits a given bitstream to scale to the requirement of different channels and applications or to be responsive to a given channel that has dynamic throughput characteristics. Many of the tools specified in MPEG-4 are the state-of-the-art tools providing scalable compression of speech and audio signals. In this paper, we present the fine grain scalability tool in MPEG-4 Audio.
Fine Grain Scalability in MPEG-4 Audio

5492
Chris Dunn,
A comparison of audio coder quantisation schemes that offer fine-grain bitrate scalability is made with reference to fixed-rate quantisation. Coding efficiency is assessed in terms of the number of bits allocated to significant transform coefficients, and the average number of significant coefficients coded. A new method of arranging the transform hierarchy for SPIHT zero tree algorithms is shown to result in significantly improved performance relative to previously reported SPIHT implementations. Results for a new quantisation algorithm are presented which suggest low-complexity fine-grain scalable coding is possible with no coding efficiency penalty relative to fixed-rate coding.
Efficient Audio Coding with Fine-Grain Scalability

5493
Michael Truman,John White,Michael J. Smithers,
A key requirement of interactive applications is that they respond quickly to user input. This demands that the audio signal processing be performed with minimal delay. Generally, perceptual audio coders are in conflict with this requirement because they process data in long blocks to improve compression performance. This paper describes a real-time, multi-channel audio encoder that is designed to minimize delay and is compatible with current consumer home theatre decoding technology.
Low-Latency Encoding for Consumer Applications

5494
Shyh-shiaw Kuo,James Johnston,
We have studied time-domain cross-channel prediction and concluded that it is generally not applicable to perceptual audio coding.
A Study Of Why Cross Channel Prediction Is Not Applicable To Perceptual Audio Coding

5495
Earl Vickers,
Traditional audio level control devices, such as automatic gain controls (AGCs) and compressors, generally have little or no advance knowledge of the dynamic characteristics of the remainder of the current audio program. If such advance knowledge is available (i.e., if audio files can be pre-analyzed), it becomes possible to match desired values of overall loudness and dynamics. We introduce two new measures, "long-term loudness matching level" and "dynamic spread," and present new methods for long-term loudness and dynamics matching.
Automatic Long-term Loudness and Dynamics Matching

5496
Aki Makivirta,Christophe Anet,
The in-situ responses of a total of 372 loudspeakers in 164 professional monitoring rooms around the world have been measured after acoustical calibration. All measured rooms were equipped with factory-calibrated three-way monitors and acoustically calibrated with standardized apparatus. The results provide a thorough understanding of typical monitoring conditions for stereo and multichannel rooms, the distribution of room parameters, and the quality of reproduced audio. Results are compared to current standards and recommendations.
A Survey Study of In-Situ Stereo and Multi-Channel Monitoring Conditions

5497
Robert Jay Ellis-Geiger,
It is a good time to reflect on where the music and audio recording industry has come from and to consider how best to use the new hardware/software tools, options, views and opinions that are being presented to us. In support of this paper, the author will give an audio demonstration covering recording, production and mastering using a computer. The author will also give a live demonstration of composing music for film to illustrate the creative possibilities of using a computer to assist ‘traditional’ orchestral composition and orchestration.
Music and Sound Production Within a Computer

(C) 2003, Audio Engineering Society, Inc.