Audio Engineering Society Preprints

AES 109th Convention

Los Angeles, CA, USA
2000 Sept. 22-25

AES Preprint Ordering

Single Convention Preprints are available through the AES Preprint Search and Shop facility.

Preprints Listing

5174
Tero Tolonen
This study presents a framework for audio and music processing, which comprises an analysis and a synthesis path that are connected at three representational levels. Auditory signal analysis techniques include a multipitch analysis model, an event detector, and sinusoidal modeling, which are combined in an iterative sound separation system. Techniques are presented for detection of perceptually relevant features, such as inharmonicity, vibrato, and decay characteristics, from polyphonic mixtures of harmonic sounds. The integration of the analysis and synthesis parts is illustrated with examples where two-voice acoustic guitar signals are analyzed into an object-based representation and resynthesized using sound source models.
Object-Based Sound Source Modeling for Musical Signals

5175
Duane K. Wise
This paper investigates the stopband sensitivity of linear-phase FIR filters, specifically sensitivity degradation when the stopband level is on the order of the resolution of the tap weights. The tap weights of the FIR filter can be divided into subgroups and individually scaled so that sensitivity improves at a relatively small computational expense. This advantage can be gained only with a specific variant of convolution that exploits a common pattern of linear-phase FIR tap weights.
Block Floating-Point FIR Filters Using a Fixed-Point Multiplier

5176
Christian Neubauer, Jürgen Herre
Audio watermarking is a technique for the transmission of additional data along with audio material within existing distribution channels. Recent research has produced a number of algorithms for the embedding and retrieval of watermarks in audio signals. While most known systems operate in the linear domain, few are capable of embedding watermarks into compressed material. This paper investigates the different areas of application and their specific requirements with respect to watermarking technology. Watermarking in the linear domain and in the compressed domain are contrasted in their concepts and properties, and recent technological advances are described.
Advanced Watermarking and Its Applications

5177
Heiko Purnhagen, Nikolaus Meine, Bernd Edler
Parametric modeling permits an efficient representation of audio signals and is utilized for very low bit-rate coding in the MPEG-4 Standard. The paper examines the MPEG-4 parametric audio coding tools, Harmonic and Individual Lines plus Noise (HILN), which are based on a decomposition of the audio signal into components that are described by appropriate source models and represented by model parameters. Until now, HILN encoding has mainly focused on maximum audio quality at the expense of high computational complexity. In this paper different approaches to speed up HILN encoding are presented and the tradeoff between computational complexity and audio quality is analyzed.
Speeding Up HILN-MPEG-4 Parametric Audio Encoding with Reduced Complexity

5178
Ye Wang, Miikka Vilermo, Leonid Yaroslavsky
This paper focuses on the energy compaction properties of five different transforms: DFT, DCT, SDFT((N+1)/2, 1/2), MDCT, and DST. Energy compaction properties of these transforms are compared experimentally. In addition to sinusoidal signals, 16 classical and pop music pieces were used for the experiments. The influence of different window sizes (256, 512, and 1024 samples) and different window shapes (rectangular and sine) was investigated. The results of the experiments are presented and analyzed.
Energy Compaction Property of the MDCT in Comparison with Other Transforms

5179
Guy Torio, Jeff Segota
Dual-diaphragm condenser capsules are commonly used in popular multipattern microphones as well as in some single-pattern unidirectional designs. These capsules respond quite differently to sound sources than single-diaphragm capsules do at frequencies up to approximately 1 kHz. To develop intuition, frequency response and directionality as a function of distance from the sound source are demonstrated using a simple model. Pop sensitivity and multipattern operation are also investigated. Comparisons and conclusions are validated with measurements made on physical microphones.
Unique Directional Properties of Dual-Diaphragm Microphones

5180
Chris Woolf, Oliver Prudden
The manufacturers of microphones pay considerable attention to the response curves and polar patterns of their products. However, the addition of any mounting clips, suspension systems, or windshielding devices negates much of that effort. Understanding the deterioration is assisted by being able to visualize it clearly. This paper describes a display technique that can be associated with conventional measurement systems and shows both response and polar pattern on a single plot in a manner that is easy to interpret. It also shows how this can be extended to display the effect of individual elements of a microphone accessory in an unambiguous graphical fashion. The paper illustrates several typical examples of pattern disturbance and how their effects can be explained.
A Display Technique for Evaluating the Disturbance of Microphone Response Patterns

5181
H.E. de Bree
Pressure-gradient microphones are not very sensitive to low-frequency sound waves. The Microflown, however, has a high signal-to-noise ratio at these lower frequencies. A combination of a Microflown and a pressure-gradient microphone has been realized to create a high-end wide-band velocity microphone pair with a figure-eight polar pattern. The electronics, realization, and test results of this add-on microphone are presented. With the use of the wide-band microphone pair, the Blumlein recording technique was tested.
Add-On Microflown for a High-End Pressure-Gradient Microphone

5182
J. W. van Honschoten, H.E. de Bree, F. J. M. van Eerden, G. J. M. Krijnen
The Microflown, a novel acoustic flow sensor that is capable of measuring particle velocity in a fluid, can easily and accurately be calibrated in a standing-wave tube. This calibration method is generally the most favorable one. Even very low frequencies can be measured with this simple setup. It is, therefore, important to analyze the limits of validity of the calibration method. In this paper the effects of the viscous and thermal properties of the fluid on the measurements of the transfer function, when calibrating sensors, are considered. It is shown both theoretically and experimentally over a broad frequency spectrum that these viscothermal effects are relatively small.
The Influence of Viscothermal Effects on Calibration Measurements in a Tube

5183
Jong-Soong Lim, Chris Kyriakakis
This paper describes a method for implementing immersive audio rendering filters for multiple listeners simultaneously. In particular, the paper focuses on the case of two listeners and four loudspeakers to determine the optimum weighting vectors for the necessary FIR filters using the least-mean-squares (LMS) adaptive algorithm. It also presents the results of four different adaptive filter implementations which can deliver a binaural audio signal to each listener's ears for two different situations. The first situation renders a virtual source in the same direction for both listeners, and the second renders the virtual source in a different direction for each listener. Several signal processing and implementation issues are presented. Results are presented for the general case in which each listener is seated at an arbitrary position relative to the loudspeakers as well as for the simpler-to-implement symmetric case in which each listener is seated at the center of each loudspeaker pair.
Virtual Loudspeaker Rendering for Multiple Listeners

5184
Aki Mäkivirta, Genelec OY, Jan Abildgaard Pedersen
This paper presents a method to change independently the amplitude and delay responses in subjective listening test material. It is necessary in subjective listening experiments to apply modern statistical methods treating simultaneously several statistical variables. A case study of producing audio test material with this method is presented. It is related to an experiment where the audibility of the amplitude response variation and the delay response variation are studied at low frequencies based on data obtained from impulse responses of a room.
A Method for Orthogonal Amplitude and Delay Processing of Subjective Listening Test Material

5185
Michael J. Kemp, S. Marcos
A method for applying nonlinear synthesis to analyze and simulate traditional audio dynamics compressors and limiters is described. A set of level-dependent impulse response measurements was taken at each of the multiple attenuation levels. In addition, a measurement of attenuation characteristic against applied signal amplitude was made at various settings of the device and applied in simulation, and a bilinear dynamic convolution was performed varying on a sample-by-sample basis.
Analysis and Simulation of Analogue Dynamic Compressors and Limiters in the Digital Domain

5186
Ricardo A. Garcia
An overview of an algorithm that searches through the space of sound synthesis techniques is presented. A modular approach to construct sound synthesis techniques is introduced. The preparatory steps needed to use genetic programming as a search tool for this space are explained, focusing on the manipulation and evaluation of the modular descriptions of the topologies.
Towards the Automatic Generation of Sound Synthesis Techniques: Preparatory Steps

5187
Mark Phillips, Jeff Barish, Rob Maher
Multidimensional signal modeling of musical notes can produce expressive and realistic synthesis for musical applications. The PRISM synthesis technology employs time-varying signal models using envelopes for pitch, power, and timbre evolution. PRISM also models the statistical variations of parameters over time in order to produce indefinitely long, nonlooped sustains while maintaining the character of the original note's sustain. The models from multiple signals are combined into a multidimensional parameter space. During synthesis, the desired location in the parameter space is mapped from input control values. Based on the desired location, the neighboring data sets are interpolated to form an intermediate, smoothly varying data set for synthesis. Using signal models allows intuitive and powerful modifications of note properties both offline and in real time.
The Modeling and Synthesis of Musical Signals with PRISM

5188
Stanley P. Lipshitz, John Vanderkooy
Single-stage, 1-bit sigma-delta converters are, in principle, imperfectible. The authors prove this fact. The reason, simply stated, is that when properly dithered they are in constant overload. The consequence is that distortion, limit cycles, instability, and noise modulation can never be totally avoided. Recording, editing, or storage systems based upon single-stage, 1-bit sigma-delta conversion and, in particular, professional systems using this type of conversion are thus a bad idea. In contrast multibit sigma-delta converters, which output linear PCM code (here, multibit refers to five or so bits in the converter), are in principle infinitely perfectible. They can be properly dithered to guarantee the absence of all distortion, limit cycles, and noise modulation. The audio industry is making a tragic mistake if it adopts 1-bit sigma-delta conversion as an archival format to replace multibit, linear PCM.
Why Professional 1-Bit Sigma-Delta Conversion is a Bad Idea

5189
Godwin L. Bainbridge, Malcolm O. J. Hawksford, Peter J. Hughes
This paper explores the use of stereo acoustic echo cancellation, where the method of sound spatialization utilizes a pair-wise loudspeaker paradigm using head-related transfer functions (HRTFs) with crosstalk cancellation. The algorithms utilized for the adaptive filters are the fast affine and the block recursive least squares. The effects of using distributed mode loudspeakers with this method of sound spatialization are also considered.
Stereo Acoustic Echo Cancellation for Sound Spatialisation Using Pair-Wise Loudspeakers with Cross-Talk Cancellation

5190
Shige Nakao, Hitoshi Terasawa, Fumitaka Aoyagi, Norio Terada
A current-mode multibit DAC has been designed to meet the needs of high-end audio with a dynamic range of 117 dB and THD+N of 0.0004%. The device supports the 24-bit PCM audio format at sample rates ranging from 16 kHz to 192 kHz, which allows DVD-Audio playback. The architecture implemented in the device differs significantly from previous multibit sigma-delta modulators. Compared to previous proposals, the new architecture achieves a significant improvement in level linearity, signal-to-noise ratio (SNR), and idle tones, which arise from element mismatching. This circuit-block configuration also provides FIR filtering for DSD (SACD) input data.
A 117-dB D-Range Current-Mode Multi-Bit Audio DAC for PCM and DSD Audio Playback

5191
Kevin James McLaughlin, Robert Adams
An asynchronous sample-rate converter is presented, which supports rates of up to 192 kHz with a dynamic range of 139 dB. Down-sampling ratios up to 7.75:1 and up-sampling ratio of 1:8 are achieved while maintaining a minimum of 120-dB THD+N within these sampling-rate ratios. A digital servo loop provides excellent jitter rejection and automatic tracking of the sampling ratio. The output samples of the sample-rate converter are obtained by convolving the input samples with the desired antialiasing polyphase filter. To perform the convolution, the upper bits of the output of the digital servo loop are used as the starting pointer to the input samples stored in RAM, while the lower bits are used to select the correct polyphase coefficients from a highly oversampled polyphase filter.
An Asynchronous Sample-Rate Converter with 120-dB THD+N Supporting Sample Rates Up to 192 kHz

5192
Malcolm O. J. Hawksford
A family of current-steering transimpedance amplifier circuits is presented for use in high-resolution digital-to-analog converters. The problems of achieving accurate current-to-voltage conversion are discussed, with a specific emphasis on digital audio applications. Comparisons are made with conventional virtual-earth feedback amplifiers, and the inherent distortion mechanisms relating to dynamic open-loop gain are discussed. Motivation for this work follows the introduction of DVD-Audio carrying linear PCM with a resolution of 24 bit at a sampling rate of 192 kHz.
Current-Steering Transimpedance Amplifiers for High-Resolution Digital-to-Analogue Converters

5193
Pallab Midya, Matt Miller, Mark Sandler
Integral noise shaping (INS) is introduced as a method for quantization of pulse-width modulation (PWM), which takes into account the fact that it is PWM that is being quantized. The quantization errors are noise shaped by analytically integrating the ideal and quantized PWM waveforms. The resulting quantizer can noise shape the rising and falling edges of two-sided PWM through the same quantization process. This effectively doubles the oversampling ratio without any changes to the switching or clock frequency. This method results in a tremendous improvement in signal-to-noise ratio (SNR). It is particularly useful for high-end digital audio amplifiers.
Integral Noise Shaping for Quantization of Pulse-Width Modulation

5194
Pallab Midya, Bill Roeckner, Pat Rakers, Poojan Wagh
This paper introduces a computationally efficient, highly linear, natural sampling algorithm. The natural sampler is an essential block in a power efficient PWM digital audio amplifier. The algorithm is implemented on a DSP and comprises a prediction filter and an interpolator. The prediction filter uses feedback to get a very accurate estimate of the required sampling time for audio frequency signals. The interpolator extracts the PWM duty ratio from the input signal based upon the prediction filter's sampling estimate. The algorithm is scalable with performance requirements in terms of computation required as well as data and program memory.
Prediction Correction Algorithm for Natural Pulse-Width Modulation

5195
Mark Takita
A compact, high-performance PWM hybrid amplifier module, currently under development at Nikon Research Corporation of America, is described. This module has a volume of 6 cubic inches and a power output capability of 1400 W continuous and 2450 VA peak. The circuitry is optimized to reduce distortion and increase the linearity of this device, making it appropriate for audio amplification.
A Hybridized, High Performance, Compact PWM Amplifier for Audio

5196
Michael Score, Donald Dapkus
Class-D amplifiers are two to four times more efficient than Class-AB amplifiers and provide a solution for applications where lower supply current, lower heat dissipation, or longer battery life are necessary. However, traditional Class-D amplifiers introduce other problems: increased solution size and cost. These Class-D amplifiers require an output filter to operate. A new modulation scheme has been developed to eliminate the need for an output filter. This paper discusses the advantages of Class-D over Class-AB and focuses on the details of the new modulation scheme and why it enables filterless operation. This paper also discusses the effect that eliminating the output filter has on supply current, efficiency, sound quality, and electromagnetic interference (EMI).
Optimized Modulation Scheme Eliminates Output Filter

5197
Thomas Frederiksen, Henrik Bengtsson, Karsten Nielsen
A novel high-efficiency power amplifier topology for audio reproduction is presented. The topology breaks previous performance barriers in switching technology by combining an effective error-correction method, Multivariable Enhanced Cascade Control (MECC), with a new integrated modulator topology, Controlled Oscillation Modulation (COM). This topological combination proves to be very elegant. Extensive measurements are given on 250-W, 500-W, and 1000-W case implementations of the MECC/COM topology, showing, e.g., 0.0005% (-106-dB) true THD combined with state-of-the-art power and volume efficiency.
A Novel Audio Power Amplifier Topology with High Efficiency and State-of-the-Art Performance

5198
César Pascual, Bill Roeckner
This paper describes two new algorithms to convert a uniformly sampled sequence into a naturally sampled pulse-width modulation (NPWM) signal. They are suitable for both single- and double-edge NPWM. Single-edge algorithm A requires nine additions and seven multiplications per output sample and achieves a total harmonic distortion (THD) of -88.48 dB when the input signal is a 6.66-kHz tone upsampled 8 times. Single-edge algorithm B requires 14 additions, 11 multiplications, and one comparison per output sample and yields -113.07 dB of THD under the same conditions. Intermodulation distortion is very low for both algorithms. Polynomial interpolation methods with the same degree of accuracy are computationally more expensive. The algorithms have enabled the implementation of digital audio amplifiers performing efficient real-time digital signal processing.
Computationally Efficient Conversion from Pulse-Code Modulation to Naturally Sampled Pulse-Width Modulation

5199
Søren Bech
The audibility of changes in the lower system cutoff frequency and slope and in passband amplitude and group delay ripple has been investigated for a subwoofer system for two situations: a real loudspeaker in an anechoic chamber and a simulated system reproduced via headphones. The signals were program material, selected to ensure a sufficient energy content at the relevant frequencies. The experiments were conducted with six subjects with normal hearing using a paired comparisons procedure. The subjects evaluated the magnitude of lower bass and upper bass in relation to a fixed reference condition. The first experiment investigated the influence of filter order (second, fourth, and sixth) and lower cutoff frequency (20, 35, and 50 Hz) at three reproduction levels and for four programs. The second experiment examined the influence of amplitude and delay ripple corresponding to four reverberation times at three reproduction levels and for four programs. The results of the first experiment show that the lower cutoff frequency has a significant influence on the perceived level of lower and upper bass reproduction, independent of reproduction levels. The filter order was found not to be of significant importance to the conditions investigated. The results of the second experiment show that the amplitude ripple has a significant influence on the perceived level of lower and upper bass reproduction. The results further show that there were no significant differences between the data produced by the two reproduction methods.
Quantification of Subwoofer Requirements, Part II: The Influence of Lower System Cut-Off Frequency and Slope and Pass-Band Amplitude and Group Delay Ripple

5200
Ulrich Horbach
It is well known that the inherent linear and nonlinear distortion characteristics of conventional multiway studio transducers can only partly be corrected by digital filter techniques because the acoustic output is distributed over the area of interest in a more or less nonuniform way. In this investigation, two different approaches to overcome this problem are presented and compared. First, it is shown that local equalization of a distributed mode panel loudspeaker around the center of excitation, using the early part of the impulse response, yields a very good result over a wide area. The second approach employs a wide-band waveguide design, where the acoustic output of the compression driver is equalized, and nonlinear distortion is corrected by a cost-efficient implementation of a Volterra filter.
Design of High-Quality Studio Loudspeakers Using Digital Correction Techniques

5201
Floyd E. Toole
At the point of delivering recorded sounds to listeners, the complex interactions of loudspeakers, rooms, and listeners present many challenges to delivering satisfying listening experiences. In the past the directional and spatial limitations of two-channel stereo led to many creative solutions in both loudspeakers and room design. However, multichannel audio changes the rules and sets new requirements for optimum listening conditions, not all of which are yet understood. Underlying all of this, though, are the basic requirements for timbral accuracy, that is, good sound quality. In this respect there is a need for the recording industry to catch up to the highest standards of consumer audio if the "art" is to be adequately preserved. This presentation is a review of the science of sound reproduction in rooms, with a specific focus on what may be needed in order for our industry to face the challenges of multichannel recording and reproduction.
The Acoustics and Psychoacoustics of Loudspeakers and Rooms: The Stereo Past and the Multichannel Future

5202
James D. Johnston, Yin Hay
In the usual stereo audio presentation, a partial sound stage consisting primarily of the front elements of the sound stage is created by two channels, either sampled from several microphones set in the original sound field or more often by a mixdown of many microphones placed both in proximity to the performers and out in the hall to capture the ambience. The information presented by the two channels, in either case, is a small fraction of the information in the original sound field. Additionally, this fraction is presented to the front of the listener. The presentation does not create an envelopment experience, where one is immersed in the original sound field, as the information is not present. While some processors mimic the effect, such effects are not based on the actual venue but rather on some hypothetical model of a venue. In holographic or auralized two-channel presentation, a presumed human head-related transfer function (HRTF) is used to create an impression of sound arising from other than the front of the listener. This works well in headphones or with interaural cancellation for one listener facing directly ahead and on the central axis between the loudspeakers. This method can, with some difficulty, produce an immersive effect for one point in the sound field, assuming that the subject maintains the proper head position, and the subject's head has an HRTF like that of the presumed functions. The ultimate form of this is, of course, binaural recording, where an actual head model is used to capture the information for one head location. Beyond two-channel presentation, one can think of analytically capturing an original sound field to some degree of accuracy. This would require the use of many channels, perhaps placed in a sphere about the listener's head in the simplest form, requiring very high data rates (1000 to 10 000 channels, perhaps) and creating a very high probability of influencing the sound field in the space with the microphones and the supporting mechanisms. As a result this technique is currently infeasible, and is likely to remain infeasible, for basic physical reasons as well as data-rate reasons, and actual analytic capture of the spatial aspects of a sound field in this fashion is unlikely.
Perceptual Soundfield Reconstruction

5203
Yew-Hin Liew, Jun Yang, See-Ee Tan, Woon-Seng Gan
The intention of crosstalk cancellation is to invert the transmission path of the system. Based on psychoacoustic masking in the frequency domain, the inversion problem is relaxed when many perceptually equivalent outputs exist. This paper presents a scheme that is capable of reducing the power consumption by employing frequency masking and frequency-dependent regularization. Subjective testing was also conducted to verify the perceptual quality.
Power Improvement in Crosstalk Cancellation Using Psychoacoustic Frequency Masking

5204
See-Ee Tan, Jun Yang, Yew-Hin Liew, Woon-Seng Gan
A real-time postprocessing technique is proposed to enhance the sound images for playback over elevated loudspeakers for both two-loudspeaker and multiple-loudspeaker configurations. This is achieved by filter design using either one-third-octave filtering or regularization to prevent sharp peaks in the frequency response. The algorithm was implemented on a TMS320C6201 EVM DSP board, and subjective testing was carried out.
Elevated Speakers Image Correction Using 3-D Audio Processing

5205
Tero Tolonen, Hanna Järveläinen
A listening experiment was conducted to study the audibility of variation of decay parameters in plucked string synthesis. A digital commuted-waveguide-synthesis model was used to generate the test sounds. The decay of each tone was parameterized with an overall and a frequency-dependent decay parameter. Two different fundamental frequencies, tone durations, and types of excitation signals were used, totaling eight test sets for both parameters. The results indicate that variations between 25% and 40% in the time constant of decay are inaudible. This suggests that large deviations in decay parameters can be allowed from a perceptual viewpoint. The results are applied to model-based audio processing.
Perceptual Study of Decay Parameters in Plucked String Synthesis

5206
Jan Berg, Francis Rumsey
In an experiment, inspired by aspects of the Repertory Grid Technique, to find the dimensions forming the perceived spatial impression of a sound reproducing system, subjects frequently described their experiences as being either natural or artificial. These results are analyzed using multivariate methods to investigate the correlation between attributes relating to naturalness and other more descriptive attributes.
Correlation between Emotive, Descriptive and Naturalness Attributes in Subjective Data Relating to Spatial Sound Reproduction

5207
Douglas S. Brungart
This paper describes a novel auditory distance display that uses prerecorded speech samples to simulate the loudness and vocal effort of a live talker at distances ranging from 0.25 m to 64 m from the listener. The speech samples were recorded from six talkers at vocal efforts ranging from a quiet whisper to a loud shout and were carefully processed to reproduce the same audio signal at the listener's ears as a live talker at the simulated distance location. Listening tests showed that the perceived distances of the stimuli were compressed relative to the simulated distances of the talkers, but that changes in the vocal effort of the talker reliably produced large changes in the apparent distance of the utterance.
A Speech-Based Auditory Distance Display

5208
Josef Chalupper
In this study two so-called "psychoacoustic processors" are examined as examples by applying concepts, models, and methods of scientific psychoacoustics. Physical measurements of processed sounds and results of hearing experiments on speech intelligibility and sound quality (Aural Exciter) and loudness (Loudness Maximizer) are presented and discussed with regard to classic psychoacoustic models and potential new applications. To that end, relevant psychoacoustic facts, in particular the perception of nonlinear distortion, are reviewed.
Aural Exciter and Loudness Maximizer: What's Psychoacoustic about "Psychoacoustic Processors?"

5209
Mark A. Ericson
Linear motion of a harmonic sound source was simulated for various trajectory paths. The acoustic signal was processed to include Doppler shifts in frequency, overall intensity changes, and atmospheric absorption. While listening to these sounds over headphones, four participants were asked to make magnitude estimations of the sound source speed under various combinations of motion effects. The frequency and intensity changes were found to contribute to the ability of the listeners to judge sound source speed. Inclusion of these motion attributes produced a veridical simulation of sound source motion.
Magnitude Estimation of Sound Source Speed

5210
David W. Gunness, Ryan J. Mihelich
The traditional method of predicting the acoustical field produced by an arbitrarily shaped source is a high-frequency, angle-limited reduction of the Kirchhoff-Helmholtz equation. The broadband, broad-angle version of the Kirchhoff-Helmholtz equation is derived and implemented as a numerical method. Acoustical field predictions of real sources developed with this method agree closely with measured data. This agreement even extends to low frequencies and angles near and beyond 90 degrees off the primary axis. Applications of the technique are described, including a powerful and efficient directional response characterization method.
Loudspeaker Acoustical Field Calculations with Application to Directional Response Measurement

5211
David W. Gunness, William R. Hoy
The modeling technique presented in Part 1 of the work is extended to three-dimensional space through the use of a flat tessellation of the horn mouth. This is made possible by a more complete version of the Kirchhoff-Helmholtz integral, which is applicable to a surface of arbitrary shape. The three-dimensional technique is effective with asymmetrical devices and produces better agreement with measurements at low frequencies and at angles near and beyond 90 degrees off axis.
Improved Loudspeaker Array Modeling, Part 2

5212
Juha Backman
Numerical criteria for describing the frequency response of a loudspeaker are discussed with respect to their usefulness for numerical optimization of loudspeakers. Bandpass and transmission-line enclosures are used as examples to illustrate how bandwidth, flatness of response, smoothness of response, etc., can be defined in a numerically efficient manner, using similar definitions of the criteria for these loudspeaker types that are very different in their characteristics.
Optimization of Bandpass and Transmission-Line Loudspeakers

5213
Andrew Bright,
The vibration behavior of small, single-suspension loudspeakers is investigated experimentally. A theoretical model of rocking-mode behavior is developed from these measurements. The model is used to assess the problems of rocking modes and other behavior introduced by the single-suspension construction.
Vibration Behaviour of Single-Suspension Electrodynamic Loudspeakers

5214
Mario Di Cola,Davide Doldi,
Current horn studies are based almost entirely on directivity control. While good power response is now recognized as important in professional loudspeaker system engineering, the directivity behavior of horns is a useful tool for achieving it. The directional properties of these devices are governed by the shape of the wavefront presented at the mouth. An analysis of the magnitude and phase distribution of the sound pressure across the horn mouth can help in understanding how the wavefront is shaped there and what happens in particular circumstances. For example, midrange beaming and high-frequency mouth diffraction are two well-known obstacles to overcome when designing a broadband constant-directivity horn. A method for performing such an analysis is shown through graphic illustrations. This paper presents the results of measurements on real devices and their correlation with the polar plots.
Horn's Directivity Related to the Pressure Distribution at Their Mouth

5215
Neil Harris,Vladimir Gontcharov,Malcolm O. J. Hawksford,
The acoustics of conventional, dipolar, and panel loudspeakers are compared in the presence of a dominant specular reflection, using both measurement and numerical simulation. Following the suggestions of earlier papers, the degree of correlation between the direct and indirect sounds is investigated. The concept of a "correlation map" is introduced.
Measurement and Simulation Results Comparing the Acoustics of Various Direct Radiators in the Presence of a Dominant Specular Reflection

5216
D. B. (Don) Keele, Jr.,
A brief tutorial review is given of Constant Beamwidth Transducer (CBT) theory, as first developed by the military for underwater transducers (JASA, July 1978 and June 1983). In this theory the transducer is a circular spherical cap of arbitrary half-angle with Legendre function shading. This provides a constant beam pattern and directivity with extremely low side lobes for all frequencies above a certain cutoff frequency. This paper extends the theory by simulation to discrete-source loudspeaker arrays, including: 1) circular wedge line arrays of arbitrary sector angle, which provide controlled coverage in one plane only; 2) circular spherical caps of arbitrary half-angle, which provide controlled axially symmetric coverage; and 3) elliptical toroidal caps, which provide controlled coverage for arbitrary and independent vertical and horizontal angles.
The Application of Broadband Constant Beamwidth Transducer (CBT) Theory to Loudspeaker Arrays

5217
James A. S. Angus,
The distributed mode loudspeaker's performance is analyzed with reference to its resonance structure. In particular the effect of the wave propagation type, shear or bending, over the frequency range is examined. The paper also examines the effect of diffusing boundaries on the resonance structure.
Distributed Mode Loudspeaker Resonance Structures

5218
Yuvi Kahana,Philip A. Nelson,
The theory for finding a set of orthogonal basis functions describing sound radiation and scattering from irregular-shaped bodies is discussed. The technique is based on the use of singular value decomposition (SVD). It is shown how the "mode shapes" of a radiating sphere, described by the complex spherical harmonics, are related to those extracted using SVD. The method is implemented numerically, using the boundary-element method (BEM), for the cases of an ellipsoid, a baffled shallow cylinder (to describe the concha), and an accurate baffled pinna to show the relationships between the basis functions on the surface of the body and those in the far field. For the latter case, numerical simulations of the mode shapes, as investigated by E. A. G. Shaw, are also presented.
Spatial Acoustic Mode Shapes of the Human Pinna

5219
Felipe Orduña,José Javier López,Alberto González,Grao Gandia,
The condition number of the matrix of electroacoustic head-related transfer functions (HRTFs) in a two-channel sound reproduction system has been used as a measure of robustness of the Atal-Schroeder crosstalk canceller. A comparative study has been made using results produced by computer simulations and HRTFs measured in an anechoic chamber by means of a dummy head. It has been found that acoustic scattering by the head has a very important and beneficial influence on robustness, especially for large loudspeaker separations. For narrow loudspeaker separations of less than about 40 degrees, it has been found that the robustness of crosstalk cancellation varies widely, alternating between very low and very high values. Simulations and measurements have also been made of the natural channel separation under the same conditions. Scattering by the head seems to provide a good level of natural channel separation at high frequencies and large loudspeaker angles. At low frequencies or small loudspeaker angles, natural channel separation is poor.
Robustness of Acoustic Crosstalk Cancellation as a Function of Frequency and Loudspeaker Separation
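To make the robustness measure in the abstract above concrete, the sketch below computes the condition number of a 2x2 loudspeaker-to-ear transfer matrix. It is a simplified free-field illustration, not the authors' model: the ears are treated as two points with no head between them (the abstract reports that head scattering matters a great deal), and the function names, geometry defaults, and speed of sound are assumptions. For a symmetric layout the matrix has the form [[a, b], [b, a]], whose singular values are |a + b| and |a - b|.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def cond_symmetric_2x2(a, b):
    """Condition number of [[a, b], [b, a]] for complex a and b.

    The matrix is normal, so its singular values are |a + b| and |a - b|.
    """
    s1, s2 = abs(a + b), abs(a - b)
    smallest, largest = min(s1, s2), max(s1, s2)
    return math.inf if smallest == 0.0 else largest / smallest


def free_field_condition_number(freq_hz, span_deg, distance_m=2.0,
                                ear_spacing_m=0.18):
    """Condition number for two point 'ears' and two symmetric loudspeakers.

    span_deg is the full angle between the loudspeakers as seen from the
    listener. All geometry defaults are hypothetical.
    """
    half_angle = math.radians(span_deg) / 2.0
    sx = distance_m * math.sin(half_angle)   # right loudspeaker, x coordinate
    sy = distance_m * math.cos(half_angle)   # right loudspeaker, y coordinate
    r_same = math.hypot(sx - ear_spacing_m / 2.0, sy)    # to the near ear
    r_cross = math.hypot(sx + ear_spacing_m / 2.0, sy)   # to the far ear
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    a = cmath.exp(-1j * k * r_same) / r_same     # direct path
    b = cmath.exp(-1j * k * r_cross) / r_cross   # crosstalk path
    return cond_symmetric_2x2(a, b)


if __name__ == "__main__":
    for span in (10.0, 40.0, 60.0, 120.0):
        values = [free_field_condition_number(f, span) for f in (250, 1000, 4000)]
        print(span, [round(v, 1) for v in values])
```

A condition number near 1 means the inverse filters are well behaved at that frequency; large values mean small errors in the assumed transfer functions are strongly amplified.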

5220
Carlos I. Beltran,Jesse H. Spence,
Desktop computers can now execute exhaustive numerical models of loudspeakers in a matter of minutes. The ability to model mechanical structures, acoustical spaces, and their interaction has been demonstrated. However, the accuracy has been limited to approximately one octave above the first mechanical breakup of the diaphragm. A more precise model is needed. Prior limitations are discussed, and new methods are introduced that have been successfully integrated into commercially available finite-element analysis software. For the first time, high-bandwidth drivers, such as full-range drivers and woofer-midranges, are being modeled accurately over their entire operating range. The analysis takes five minutes on a typical PC as of May 2000.
High-Accuracy Wide-Bandwidth Automated Loudspeaker Modeling Using Finite-Element Analysis

5221
Kuang-tao Chiao,Neil Harris,Chris Kyriakakis,
This paper examines methods for digital room correction and loudspeaker equalization as they apply to distributed mode flat-panel loudspeakers. It also presents a method that combines linear predictive coding (LPC) inverse filtering and tunable discrete wavelet transform (DWT) octave-band equalization. Furthermore, it discusses frequency response and time-domain smearing/spread issues and considerations for real-time implementation.
A New Approach to Speaker/Room Equalization

5222
Mitsukazu Kuze ,Kazue Satoh,
The authors have developed a piezoelectric super tweeter that reproduces the frequency range from 10 kHz to 100 kHz with a very flat response. It is suitable for the next generation of high-fidelity audio products, such as DVD-Audio.
Development of a Piezo-Electric Super Tweeter Suitable for DVD-Audio

5223
Daniel M. Warren,
A method has been developed by which differential-algebraic equations governing nonlinear transducer systems can be formed. The technique is particularly well suited for symbolic math engines, allowing a high degree of automation. Transducers are represented as networks of nonlinear elements; the equations formed are consistent with the nonlinear constitutive laws of the elements and the constraints implied by their networking. The governing equations can be solved using freely available software, although the solver must be provided with consistent initial conditions. This technique allows rapid formulation of the governing equations and simulation of prototype transducer designs.
Differential-Algebraic Equations Governing Nonlinear Transducer Networks

5224
William L. Martens,Nick Zacharov,
Multidimensional perceptual unfolding was executed for a set of spatially processed speech samples reproduced via headphones. This paper describes only the first stage of a two-part study employing the analytic technique termed external unfolding. In this first stage, global dissimilarity ratings were made for all pair-wise comparisons of the experimental stimuli under each of four listening conditions, which included broadband or band-limited speech samples presented either simultaneously or sequentially. These four data sets were analyzed independently using INDSCAL (INdividual Differences SCALing), a method for processing inter-stimulus dissimilarity data, which has specific advantages over classical MultiDimensional Scaling (MDS) analysis. First, it is designed to characterize quantitatively the individual differences in responses obtained from a group of experimental subjects. Second, the spatial configuration of points derived for the experimental stimuli, termed the stimulus space, has an inherently unique orientation that has none of the ambiguity that makes the interpretation of classical MDS results problematic. The stimulus space derived in the first stage of this study is one of two inputs required for external unfolding. It is combined with discrete attribute ratings collected in the second stage of this study to reveal the principal perceptual attributes of the set of spatially processed speech samples and their relative salience under various conditions.
Multidimensional Perceptual Unfolding of Spatially Processed Speech I: Deriving Stimulus Space Using INDSCAL

5225
Russell Mason,Natanya Ford,Francis Rumsey,Bart de Bruyn,
Current research into spatial audio has shown an increasing interest in the way subjective attributes of reproduced sound are elicited from listeners. The emphasis at present is on verbal semantics; however, studies suggest that nonverbal methods of elicitation could be beneficial. Research into the relative merits of these methods has shown that nonverbal responses may result in different elicited attributes compared with verbal techniques. Nonverbal responses may be closer to the perception of the stimuli than the verbal interpretation of this perception. There is evidence that drawing is not as accurate as other nonverbal methods of elicitation when it comes to reporting the localization of auditory images. However, the advantage of drawing is its ability to describe the whole auditory space rather than a single dimension.
Verbal and Non-Verbal Elicitation Techniques in the Subjective Assessment of Spatial Sound Reproduction

5226
Sheila Flanagan,Brian C. J. Moore,
Perceived timbre depends strongly on spectral shape. The authors compared spectral shape discrimination for a distributed mode loudspeaker (DML) and a cone loudspeaker. Subjects were asked to distinguish between a tone complex with a flat spectral envelope and a tone complex with a ripple in its spectral envelope for each type of loudspeaker. The ripple depth was varied to determine threshold. Performance was significantly better for the DML than for the cone loudspeaker.
The Influence of Loudspeaker Type on Timbre Perception

5227
Panayiotis G. Georgiou,Athanasios Mouchtaris,Stergios I. Roumeliotis,Chris Kyriakakis,
This paper describes the underlying concepts behind the spatial sound renderer built at the University of Southern California's Immersive Audio Laboratory. In creating this sound rendering system, the authors were faced with three main challenges: first, the rendering of sound using the head-related transfer functions (HRTFs); second, the cancellation of the crosstalk terms; and third, the localization of the listener's ears. To deal with the spatial rendering of sound, a two-layer method of modeling the HRTFs was used. The first layer accurately reproduced the ITDs and IADs, and the second layer reproduced the spectral characteristics of the HRTFs. A novel method for generating the required crosstalk cancellation filters as the listener moves was developed based on low-rank modeling. Using Karhunen-Loeve expansion, the authors can interpolate among listener positions from a small number of HRTF measurements. Finally, a head detection algorithm for tracking the location of the listener's ears in real time using a laser scanner is presented.
Immersive Sound Rendering Using Laser-Based Tracking

5228
David J. M. Robinson ,Malcolm O. J. Hawksford,
Nonlinearity in the human ear can cause audible distortion not present in the original signal. Such distortion is generated within the ear by intermodulation of a spectral complex, which may itself contain masked components. When psychoacoustic codecs remove these supposedly masked components, the in-ear generated distortion is also removed, and so our listening experience is modified. In this paper the in-ear distortion is quantified, and a method is suggested for predicting the in-ear distortion arising from an audio signal. The potential performance gains from incorporating this knowledge into an audio codec are assessed.
Psychoacoustic Models and Non-Linear Human Hearing

5229
Renato S. Pellegrini,
A new rendering algorithm is introduced that allows a given room to be modeled by a set of perceptual parameters. The processing cost and memory requirements are minimized, and the system is capable of reproducing a large number of sound sources and independently processing many different listening positions. Rather than independently reproducing a large number of reflections (as in mirror-image rendering or ray tracing), sets of reflections are combined in a simple statistical representation of direction of incidence, diffuseness, absorption, etc. For all perceptual parameters, a statistical representation is defined, which can be easily used to reproduce impulse responses for any number of reproduction channels from 2 to n. For a high number of reproduction channels, wave-field synthesis techniques can be used to reproduce a complete sound field, rather than a sweet spot-based perception for one listening position.
Perception-Based Room Rendering for Auditory Scenes

5230
Henryk Lopacz ,Piotr Kleczkowski,
The energetic impulse response of a modeled room was processed with the discrete wavelet transform in order to obtain an approximation of the room impulse response. Listening tests comparing the audible effects of the measured and synthesized impulse responses showed that good-quality synthesis was achieved with this method. Further improvement was achieved by a phase randomization procedure.
Synthesis of Room Impulse Response Based on the Discrete Wavelet Transform

5231
Hrvoje Domitrovic ,Sinisa Fajt,Ivan Stamac,
Since the acoustics of two of the three halls of the Slovenian National Theatre in Maribor proved to be inadequate, its management asked for an expert analysis of the existing acoustical status. The objective part of the analysis comprised investigations of useful and detrimental reflections, reverberation time curves, direct-to-reflected energy ratios, clarity values, and ALcons and RASTI values. Subjective investigation included syllabic intelligibility tests and qualified listeners' evaluation. Subsequent analysis gave the designer all the necessary parameters for reconstruction.
Compound of Objective and Subjective Investigation Aimed at Acoustical Amelioration of a Playhouse

5232
Eric Benjamin ,Benjamin Gannon,
Subwoofers are used as a part of both two-channel and multichannel domestic sound reproduction systems, both with and without bass redirection. Room effects and a lack of convenient means for setting the subwoofer level make system setup difficult. The objective and subjective effects of room acoustics on subwoofer performance are evaluated. The room acoustic transfer function was evaluated for eight rooms, and the amount of low-frequency acoustic gain and loudspeaker interaction was measured. Subjective experiments were conducted to assess the effect of low-frequency room acoustics by applying the measured room impulse response to program material reproduced over headphones. Subwoofer level calibration using both program material and test signals was studied, and specific recommendations were made.
The Effect of Room Acoustics on Subwoofer Performance and Level Setting

5233
Zihou Meng,Kimihiro Sakagami,Masayuki Morimoto,Guoan Bi,
To estimate the predictability of the observed data in the prediction of a room impulse response, the concept of concentrated space is introduced. The predictability is discussed based on the dimension analysis of the concentrated space and estimated for the optimum prediction method using the prolate spheroidal functions. The theoretic result is compared with the result of a psychological acoustic experiment designed to evaluate the extrapolation methods with different bandwidth and different length of observed data.
Predictability of a Room Impulse Response

5234
Wolfgang Teuber ,Ernst-Joachim Voelker,
Sound sources produce both airborne and structure-borne sound, which are transmitted via building constructions. Acoustical measures reduce the sound pressure level of structure-borne sound by introducing floating constructions on springs. For different applications, permitted levels of structure-borne sound are known and have been published. They relate, for instance, to the proper function of sensitive machinery, computers, or sensitive instruments. In the same way, permitted noise values for sound radiated from the studio surfaces must be considered. Some acoustical measures for floating floors, heavy constructions with low resonant frequencies, and sound-insulating boxes are explained. Level differences resulting from different acoustical measures are included.
Measures to Avoid the Transmissions of Structure-Borne Sound: Sound Sources Next to Studios

5235
Tom Langlotz,Ernst-Joachim Voelker,
Many permitted noise levels for studios are well known and in practical use for different types of studios and control rooms. They are strict and difficult to reach. Disturbing noise originates from adjacent studios or outside the building. Acoustical measures are related to building constructions and single elements such as windows, doors or suspended ceilings. The different sound paths can be described to evaluate their contribution to the overall sound insulation. Double wall and double floor constructions are important for damping the sound radiation from the surfaces. In this paper constructions are described and measurement results are included. Building defects usually impair studio use and make restrictions inevitable, and repair costs exceed expectations. Therefore, very careful craftsmanship is required. It should be emphasized that these studio constructions are used in many other situations as well, e.g., theaters, music schools or high-quality apartments.
Double Wall and Double Floor Constructions for Obtaining the Permitted Noise Levels in Studios

5236
Ernst-Joachim Voelker,Wolfgang Teuber,
For microphones both the sensitivity in mV/Pa and the noise level, e.g. in dB(A), are defined. They are part of the product specification. The minimum noise level is like an unavoidable noise floor, which remains when high amplification is required. The frequency response and the weighted noise level can be measured and compared with the permitted noise levels for studios, which are generally required in order to allow high-quality recordings and transmission. The question arises whether the studios are good enough for these recordings, e.g., for microphones at a large distance to the sound source, as it applies normally for drama studios or special music recordings. The signal-to-noise ratio of digital transmission allows lower levels to be recorded and reproduced in control rooms. With special microphones the sound of lower levels, e.g., 1 dB(A), can be measured. Noise rating curves for 15 to 20 dB(A) indicate that a much higher noise level is permitted in many cases. In this paper a rating curve is proposed for microphones in different studios in order to define the noise levels for both microphones and studios.
Noise Levels of Microphones for High-Quality Recordings: Are Our Studios Good Enough?

5237
Masamichi Otani,Toshio Wakatsuki,Mikihiko Okamoto,Mitsuo Kubo,
CR-300 is a unique studio for sound effects recording (Foley Recording Studio) at NHK (Japan Broadcasting Corporation), which has a recording studio floor, two control rooms, and a preproduction room. This paper presents an outline of the architectural acoustic design and the acoustical properties of the studio floor, which is utilized for recording various sound effects used on all the media of NHK; and the control room, which is suitable for multichannel recording.
Architectural Acoustic Design of a Sound Effect Studio for Multi-Channel Recording

5238
Dai Yang,Hongmei Ai,Chris Kyriakakis,C.C. Jay Kuo,
A new high-quality multichannel audio compression algorithm based on the MPEG-2 Advanced Audio Coding (AAC) is presented in this paper. The Karhunen-Loeve Transform (KLT) is used on multichannel audio signals in the preprocessing stage to remove the interchannel redundancy. Then signals in the decorrelated channels are compressed using a modified AAC main profile encoder. Experimental results show that compared to the original AAC, the proposed algorithm achieves better performance with the objective mask-to-noise-ratio (MNR) measurement while maintaining similar computational complexity.
An Inter-Channel Redundancy Removal Approach for High-Quality Multichannel Audio Compression
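The decorrelating role of the Karhunen-Loeve Transform described in the abstract above can be illustrated in a reduced two-channel case, where the transform is a single rotation. This is only a sketch under that assumption: the paper applies the KLT to multichannel signals ahead of a modified AAC encoder, whereas the function names and the closed-form rotation angle below are illustrative choices and no coding stage is included.

```python
import math


def covariance(a, b):
    """Covariance of two equal-length sequences (means removed)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def klt_two_channel(left, right):
    """Rotate a channel pair onto its principal axes.

    Returns (theta, first, second). The outputs are uncorrelated and the
    first output carries the larger variance. A decoder would need theta
    to undo the rotation.
    """
    c_ll = covariance(left, left)
    c_rr = covariance(right, right)
    c_lr = covariance(left, right)
    # Angle that diagonalizes the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    first = [cos_t * l + sin_t * r for l, r in zip(left, right)]
    second = [-sin_t * l + cos_t * r for l, r in zip(left, right)]
    return theta, first, second


if __name__ == "__main__":
    left = [1.0, 2.0, 3.0, 4.0]
    right = [1.1, 1.9, 3.2, 3.8]
    theta, first, second = klt_two_channel(left, right)
    print("input covariance: ", covariance(left, right))
    print("output covariance:", covariance(first, second))
```

Because the rotation is orthogonal, the total variance is preserved; most of it moves into the first output, leaving a low-energy second channel.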

5239
Jerry Bauck,
A compatibility problem exists when two-channel program material is played over loudspeaker layouts with more than two front loudspeakers. Two-channel music recordings, when used either as cinema sound tracks or played on a typical home theater setup, are examples, as are older quadraphonic and related recordings played over a 5.1 layout. The theory of generalized head-related stereophony of Cooper and Bauck applies to this problem, indicating an infinite number of solutions. A few solutions are examined in detail, resulting in several remedial network specifications. These solutions are incorporated into a simulation, and a signal-based error function, essentially a plot of the sweet spot, is plotted for each and compared with the errors that arise from conventional methods. Under several types of input signals, the new processor exhibits substantially better performance than conventional methods, yet it is simple and economical to implement.
Conversion of Two-Channel Stereo for Presentation by Three Frontal Loudspeakers

5240
Jerry Bauck,
The problem of properly equalizing the centered phantom image (e.g., centered vocalist or virtual center loudspeaker in a home theater) is examined in detail. The strong dependence of the correct EQ on loudspeaker spacing is emphasized. Applying the theory of head-related stereophony, correct EQ curves are derived for several common loudspeaker spacings. Complete results are presented as magnitude responses of second- and fourth-order analog and digital filters as well as poles and zeros, making possible cookbook implementations. A surface plot shows magnitude responses for correct EQ for loudspeaker spacings from ±90° to 0°. Results are also presented for the situation in which the correct EQ has been applied for a particular loudspeaker spacing, but the loudspeakers are at other spacings. Filters realizing the inverses of these results will correct the mistake. A modest standardization effort will allow manufacturers to build user-adjustable correction filters into their equipment.
Equalization for Central Phantom Images and Dependence on Loudspeaker Spacing: Reformatting from Three Loudspeakers to Two Loudspeakers

5241
Nick Zacharov,Søren Bech,
In an effort to establish a better understanding of multichannel level alignment, a correlation and regression analysis has been performed on objective metrics and subjective data collected and subsequently reported in Parts I-III and V. This study considers subjective data acquired with nine test signals with different setups considering loudspeaker placement, directivity, reproduction bandwidth, and absolute reproduction level. Objective metrics considered include linear and weighted SPL and several loudness metrics. A constant specific loudness signal with a high-pass filter at 500 Hz was found to provide the best prediction of subjective level alignment under a wide range of practical usage situations with a wide range of metrics.
Multichannel Level Alignment, Part IV: The Correlation between Physical Measures and Subjective Level Calibration

5242
Itai M. Neoran,
Mixing monophonic and stereophonic sound sources into multichannel surround is traditionally done using divergence-based X-Y controls or by grouping stereo panners. An enhanced method, based on surround phantom positioning, is proposed, presenting a natural generalization of traditional stereo mixing techniques. Using matrix manipulations, each stereophonic source is rotated in the surround coordinate system to an arbitrary direction within 360 degrees, its stereophonic width is stretched to any desired stage width from 0 to 360 degrees, and its distance from the listener is set using a distance pan-pot, based on a room simulation. The proposed method is also compatible with surround sound inputs. The method has been implemented in real-time DSP, with applications in both audio production and consumer audio.
Surround Sound Mixing Using Rotation, Stereo Width, and Distance Pan Pots

5243
Thomas Lund,
Power panning relies on the creation of phantom images. Outside a narrow listening sweet spot, there can be only as many consistent directions obtained as the number of loudspeaker channels, i.e., positions not depending on phantom images. Source localization in the 5.1 format may be enhanced if panning is not only based on level but also involves room rendering utilizing multidirectional reflections. As an added benefit the listening sweet spot can be widened, thereby giving the whole 5.1 idea more advantages over traditional two-channel reproduction. The paper discusses experiences from film and music production incorporating room rendering into the panning, especially how the image stabilization with integrated room rendering can be used as a tool to involve the listener in ways not possible before.
Enhanced Localization in 5.1 Production

5244
Charlie Fox,
The paper outlines a facet of ongoing research into the development of acquisition techniques for surround sound location recording. Specifically, the research for this presentation entailed experimentation with omnidirectional microphones in various array configurations (A/B/C/+) in order to discover field recording methods that may produce solid, authentic surround sound imaging through a standard 5.1 loudspeaker/monitor arrangement. Examples of recordings and the preliminary results of blind listening tests will be included in the presentation.
Investigating the Potential of Omnidirectional Mic Arrays in the Reproduction of Surround Sound

5245
Setsu Komiyama,Hiroyuki Okubo,Kazuho Ono,Koichiro Hiyama,
This paper describes an interactive multichannel audio system linked to the Virtual Reality Modeling Language (VRML). In this system dry sources, source positions, and simulated room impulse responses are transmitted together with a room shape composed with VRML. The multichannel stereo sound is resynthesized at the receiver, and the graphics of the room are displayed on the screen. The reproduced sound is automatically changed in real time as the listener moves his viewpoint in the VRML graphics. The required number of channels and the method of resynthesizing are discussed in detail. Synchronization of picture and multichannel sound effectively increases the sensation of reality.
Interactive Multichannel Sound Reproduction Linked with VRML Graphics

5246
Rob Laubscher ,Bob Moses,Richard Foss,
With the advent of the IEEE 1394 standard, many audio device manufacturers have anticipated its employment within audio production systems, with the hope that future audio production systems will utilize this single connection type for the transmission of all audio and control data. This paper outlines extensive work that has been performed on the design and implementation of 1394 audio production components and, in the process, describes the resolution of the problems that arise when audio and audio device control data pass across 1394.
A 1394-Based Architecture for Professional Audio Production

5247
Stephen R. Macatee ,Devin Cook,
Networked audio systems are the present and future of the professional audio industry. The use of Microsoft ActiveX controls to control multiple CobraNet devices from one or more PC locations over an Ethernet network is discussed. Because of the professional audio industry's understandably conservative transition into such a future, a means to incorporate current, familiar, and trusted equipment into networked systems is also a worthy endeavor. This is necessary until networked equipment is more prevalent, understood, and trusted. A means to control such legacy equipment and to incorporate tools, such as relays, switches and indicators, into networks is also illustrated using ActiveX over standard CobraNet or any Ethernet network. The biggest advantage of the ActiveX approach is its ability to control many manufacturers' devices from a single user interface while needing no cooperation between manufacturers.
Controlling Audio Systems with ActiveX Controls over CobraNet and Other Ethernet-Based Networks

5248
Stephen H. Lampen,
This paper is an analysis of premise/data cables, generally known as Category 5, and their potential use for carrying analog and digital audio signals.
Transporting Audio Signals on Category 5 UTP

5249
Brent Karley ,Teddy Chen,Jayant Datta,
The design and implementation of a PC-based graphic user interface and control engine for an audio processing system is presented. Design considerations are discussed, and a breakdown between PC-based control tasks and real-time signal processing tasks on a dedicated audio processor is provided. The PC controller provides an innovative means to control the audio processor, manage its memory, and download software modules to it through a real-time nonintrusive interface. There are several possible applications for this PC-based controller, such as a stand-alone PC-based audio system, an audio processor development tool, and a design reference and documentation tool.
A PC-Based Graphic User Interface and Control Engine for an Audio Processing System

5250
Natalie Packham,Frank Kurth,
This paper introduces a technique to embed context-based information into an audio signal and to extract this information from the signal during playback. Context-based information can be any kind of data related to the audio signal, such as the lyrics of a song or the score of a piece of music. A typical application will display the extracted information. Techniques from lossy audio compression, especially MPEG-1 Layer II, are used to ensure inaudibility of the noise introduced into the audio signal by the embedded data. Another aspect of the work deals with synchronized processing of the context-based information. Applications for embedding text information and for extracting this information during playback have been implemented and are presented in this paper. A variety of tests have been conducted to study the embedding capacity of various audio signals and to verify inaudibility of the embedded information.
Transport of Context-Based Information in Digital Audio Data

5251
Søren H. Nielsen,Thomas Lund,
A sine tone at 0 dB FS is often believed to be the maximum level obtainable from a digital medium. Therefore, it is typically the maximum level that digital filters and analog circuitry in consumer equipment are designed to reproduce. As the authors have shown in previous papers, intersample peaks may be considerably higher than 0 dB FS. This paper examines the sonic consequences when 0 dB FS + signals are reproduced in typical consumer equipment. The performance of a variety of domestic CD players exposed to such signals is presented and evaluated.
0 dB FS + Levels in Digital Mastering
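
The claim that intersample peaks can exceed 0 dB FS can be illustrated with a textbook case that is not taken from the preprint: a sine at one quarter of the sampling rate with a 45-degree phase offset. The sketch below assumes an ideal reconstruction of the underlying sine; the function names are illustrative only.

```python
import math

def sample_sine(freq_ratio, phase, count):
    """Return `count` samples of sin(2*pi*freq_ratio*k + phase)."""
    return [math.sin(2 * math.pi * freq_ratio * k + phase) for k in range(count)]

def to_db(value):
    """Convert a linear amplitude to decibels."""
    return 20 * math.log10(value)

# Sine at fs/4 with a 45-degree phase offset: every sample falls at
# about +/-0.7071 of the waveform's true amplitude, never on its crest.
raw = sample_sine(0.25, math.pi / 4, 64)

# Normalize so that the largest SAMPLE sits exactly at full scale.
gain = 1.0 / max(abs(s) for s in raw)
samples = [s * gain for s in raw]

sample_peak_db = to_db(max(abs(s) for s in samples))

# The continuous sine had amplitude 1.0 before normalization, so after
# normalization its crest (the intersample peak) has amplitude `gain`.
true_peak_db = to_db(gain)

print(f"sample peak: {sample_peak_db:.2f} dB FS")
print(f"intersample peak: {true_peak_db:.2f} dB FS")
```

In this idealized case the sample values never exceed full scale, yet the reconstructed waveform peaks about 3 dB higher. Real program material and real reconstruction filters will behave differently.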

5252
Marc Ihle,Kristian Kroschel,Rainer Riedlinger,
This paper describes a new noise suppression algorithm using a small microphone array. It can be used to generate noise suppressed speech signals, e.g., in conference systems, hands-free mobile phones, and man-machine interfaces. The microphone array comprises three omnidirectional microphones, located very close to each other. Using digital signal processing, the directivity of the array is controlled to generate a noisy signal in which the speech signal is suppressed. This signal is then used as a noise reference for a noise reduction algorithm based on the well-known spectral subtraction (spectral weighting) method. As a result, a significant noise suppression is achieved, whereas musical noise and other distortions are kept to a minimum.
A Novel Noise Suppression Algorithm Using a Very Small Microphone Array

5253
Flemming Christensen,Clemen Boje Jensen,Henrik Møller,
An artificial head has been designed. The design is based on acoustical measurements on real people. The design method includes a specially developed measurement technique that separates the pinna from the body and head, thus making it possible to adjust separately the shape of the body, head, and pinna. This approach has led to a design for which the head-related transfer functions (HRTFs) are representative of a typical human subject for all directions. Furthermore, the measurements reveal which parts of an HRTF are formed by the body, head, and pinna, respectively.
The Design of VALDEMAR-An Artificial Head for Binaural Recording Purposes

5254
Jong-Won Seok,Jin-Woo Hong,
This paper presents a novel watermarking algorithm using prediction-based watermark embedding and detection as a solution for protecting digital content against unauthorized copying. Unlike the direct sequence spread-spectrum-based watermarking methods, a delayed version of a modified original audio signal, obtained from linear prediction analysis, is used as the watermark signal. Experimental results show that the authors' watermarking scheme is robust against common signal processing attacks, such as mixdown, amplitude compression, time-scale modification, and data compression.
Prediction-Based Audio Watermark Detection Algorithm

5255
Francois Pachet,Olivier Delerue,
This paper proposes a system for performing on-the-fly mixing of audio sources to produce spatialized sound. It introduces a constraint system linked to a graphical user interface representing the sound sources and connected to a spatializer. The constraint system expresses various sorts of properties on the configuration of sound sources. When the user moves one source through the interface, the constraint system is activated and attempts to enforce the constraints that may have been violated. The resulting system allows mixing on the fly, while letting the user choose between several listening viewpoints in a coherent manner. The design of the system and an audio implementation using the Microsoft DirectX API are reported.
On-the-Fly Multi-Track Mixing

5256
Jürgen Herre,Michael Schug,
Perceptual audio coding has become a widely used technique for economic transmission and storage of high-quality audio signals. For the reproduction of compressed audio, the bitstream is expanded into a standard audio format, such as linear PCM, by the decoding algorithm. This paper investigates the questions: Is it possible to recover the compression parameters from an analysis of the decoded audio signal? Can a decoded audio signal be translated back into its bitstream representation (the "inverse decoding" problem)? An algorithmic approach to these questions is presented and detailed based on the MPEG-2/4 AAC coder. Several interesting applications for such techniques are discussed, including identification of the coding algorithm used and tandem-resistant re-encoding of audio signals.
Analysis of Decompressed Audio-The "Inverse Decoder"

5257
Frank Kurth,Viktor Hassenrik,
This paper presents an audio codec for multiple, i.e. cascaded, lossy compression without degeneration of perceptual quality. The coding scheme is based on a previously reported psychoacoustic embedding strategy, which allows transfer of encoding information to all subsequent codecs in a cascade. A new dynamic embedding scheme has been developed, which allows an optimal exploitation of a signal's perceptual embedding capacity. As opposed to previous methods, the authors' approach allows one to overcome embedding capacity problems and possible signal distortions. Moreover, it allows the realization of a new embedding strategy, which guarantees the perfect reconstruction of the first-generation signal after an arbitrary number of encoding and decoding steps. In the presented MPEG-1-based codec implementations, the dynamic embedding scheme was realized by a specific bit reservoir technique.
A Dynamic Embedding Codec for Multiple Generations Compression

5258
Leandro de C. T. Gomes,Madeleine Bonnet,Nicolas Moreau,
Audio watermarking techniques generally include embedding a noiselike sequence (the watermark) into a signal. This sequence must remain secret, since its knowledge allows the erasure of the watermark. The detection is performed through a correlation measure between the watermark and the possibly jammed watermarked signal. Only the signal owner is able to carry out the detection, which is private. This paper presents a novel approach that uses a cyclostationary sequence as the watermark. While still privately detectable through a correlation measure, the watermark may also be publicly detected by exploiting its property of cyclostationarity. The suitable choice of cyclostationary sequences enables one to hide both private and public data in the signal.
Cyclostationarity-Based Audio Watermarking with Private and Public Hidden Data

5259
Zoran Fejzo,Stephen Smyth,Keith McDowell,Yu-Li You,
The DTS core-plus extension multichannel audio coding methodology permits established audio coding systems to operate at higher bit rates and/or higher sampling frequencies while remaining compatible with first-generation decoders (legacy decoders). While popular algorithms such as DTS Coherent Acoustics, Dolby AC-3 and MPEG II can all benefit from our process, the paper describes in detail its specific application to the DTS subband ADPCM coding algorithm. A compatible bitstream is made up of "core" data, representing 48-kHz encoded audio signals; and "extension" data, representing differential audio data and/or higher frequency components. Existing core-only decoders make use of the core data to reproduce 48-kHz sampled audio. New 96-kHz decoders make use of both core and extension data to reproduce 96-kHz audio. The paper also describes the efficient real-time implementations of the 5.1-channel, core+extension decoder on a single SHARC 21065L 32-bit floating-point processor and the encoder on a Pentium III processor.
Backward Compatible Enhancement of DTS Multi-Channel Audio Coding That Delivers 96-kHz/24-Bit Audio Quality

5260
Antony W. Rix,Michael P. Hollier,John G. Beerends,Andries P. Hekstra,
This paper describes a new model for perceptual evaluation of speech quality (PESQ). This model is based on the integration of the perceptual speech quality measure (PSQM99) and the perceptual analysis measurement system (PAMS). PESQ is currently draft ITU-T Recommendation P.862 and is expected to replace P.861. PESQ provides a new international standard for objective assessment of speech codecs and end-to-end measurement of telephone networks.
PESQ-The New ITU Standard for End-to-End Speech Quality Assessment

5261
Wolfgang Klippel,
Nonlinear and thermal mechanisms in woofers, headphones, shakers, and other actuators produce signal distortion and limit the acoustic output at high amplitudes. Large signal parameters based on extended transducer modeling and measured by system identification techniques reveal the physical causes, allow an objective assessment of the performance, and give indications for practical improvements. Typical problems are discussed in a case study based on a set of drivers intended for woofer application.
Diagnosis and Remedy of Nonlinearities in Electrodynamical Transducers

5262
D. Preis,R. Gregg,
A quantitative comparison is given between the signal-to-distortion ratio (SDR) for wide-band operation of slightly nonlinear electronic circuits or systems and that derived from conventional total harmonic distortion (THD) for single-frequency excitation at the equivalent input-power level. Simulated or measured SDRs are presented graphically for three different, commonly encountered nonlinearities (soft compression, hard clipping and slew-rate limiting), each operated over a range of successively higher input-power levels. The results indicate that for slight to moderate nonlinear operation, the wide-band SDR based on coherence analysis of white-noise excitation can be from 2 to 20 dB worse than that predicted from the THD produced by a single sine wave.
Coherence-Based, Wide-Band, Signal-to-Distortion Ratio versus Total Harmonic Distortion of Slightly Non-Linear Audio Systems

5263
Henrik Staffeldt,
In a previous paper, a new method was presented that allowed a detailed prediction of array sound pressure fields when only low-resolution one-third-octave, 5° polar data was available for the individual loudspeakers in the array. The present paper investigates the accuracy of such predictions by comparing predicted and measured on- and off-axis frequency responses and polar patterns of a number of different array configurations. The predictions are based on complex summation of the sound pressure contributions from the individual loudspeakers in the array under the assumption that interaction effects, such as reflections, diffraction and shadowing among the loudspeakers in the array, are negligible. The way interaction effects can modify the array sound field and reduce the accuracy of the predictions is also discussed in the paper.
The Accuracy of Loudspeaker Array Sound Field Predictions Using Low-Resolution 1/3-Octave, 5° Polar Data
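
The complex summation mentioned in the abstract can be sketched in a heavily simplified form. The example below is not the authors' method: it assumes ideal omnidirectional point sources in free field (no measured polar data, no interaction effects) and a speed of sound of 343 m/s, and all names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def array_pressure(sources, listener, freq_hz):
    """Sum the complex pressures of ideal point sources at `listener`.

    `sources` is a list of (x, y, amplitude) tuples in meters;
    `listener` is an (x, y) position in meters.
    """
    wavenumber = 2 * math.pi * freq_hz / SPEED_OF_SOUND
    total = 0j
    for x, y, amplitude in sources:
        distance = math.hypot(listener[0] - x, listener[1] - y)
        # Spherical spreading (1/r) and propagation phase (-k*r).
        total += (amplitude / distance) * cmath.exp(-1j * wavenumber * distance)
    return total

# Two equal sources 1 m apart, driven at a frequency whose wavelength is 1 m.
pair = [(-0.5, 0.0, 1.0), (0.5, 0.0, 1.0)]
frequency = SPEED_OF_SOUND / 1.0

on_axis = abs(array_pressure(pair, (0.0, 100.0), frequency))

# 30 degrees off axis the path difference is about half a wavelength,
# so the two contributions nearly cancel.
angle = math.radians(30)
off_axis_point = (100.0 * math.sin(angle), 100.0 * math.cos(angle))
off_axis = abs(array_pressure(pair, off_axis_point, frequency))

print(f"on-axis magnitude:  {on_axis:.6f}")
print(f"30-degree magnitude: {off_axis:.6f}")
```

A prediction of the kind described in the abstract would replace the constant amplitude with direction- and frequency-dependent data for each loudspeaker; that step is left out of this sketch.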

5264
Bjarke P. Bovbjerg,Flemming Christensen,Pauli Minnaar,Xiaoping Chen,
Head-related transfer functions were measured on the artificial head VALDEMAR for nearly 12 000 directions on the sphere around the head. The design and the precision of the setup are described, and the results are presented. Because of the high-directional resolution, contributions from different parts of the artificial head to the head-related transfer functions were revealed.
Measuring the Head-Related Transfer Functions of an Artificial Head with a High-Directional Resolution

5265
Marshall Buck,
A shaped sine burst wavelet is very effective for revealing audible loudspeaker distortion. The same stimulus can be used to measure both frequency response and distortion and lends itself to gated operation. A wavelet can be designed with a flat-top spectrum one-half octave wide using an IFFT and windowing. A lower crest factor wavelet with attractive qualities can be constructed with multiple synchronous sine waves and used in a measurement system particularly suitable for quality control testing. The synchronous wavelet has a flat spectrum over one-quarter to one-half octave, and is zero outside that region, leaving a wide dynamic range for distortion components to be detected. Comparisons with standard swept sine measurements are presented.
Testing Loudspeakers with Wavelets

5266
Kazue Satoh,Toshikazu Chiba,
In order to construct an optimum multichannel sound reproduction system, it is important to measure and evaluate correctly the sound field reproduced by each loudspeaker. The developed sound field measurement system performs highly precise sound analysis with an easy setup. It comprises seven microphones installed at the vertices of a tetrad structure, a microphone amplifier, and a PCMCIA card that collects and feeds data into a compact PC. The measurement is carried out by reproducing a DVD or CD containing the test signal. The authors believe that this system will play an important role in the development of audio systems incorporating DVD-Audio players.
The Development of a Car Sound Field Measurement System Using Compact PC

5267
José Javier López,Alberto González,Grao Gandia,Felipe Orduña,
The performance of two-channel transaural sound reproduction systems has been assessed with respect to the form and size of the crosstalk cancellation zones for different loudspeaker arrangements. Channel separation has been measured for small displacements of a dummy head in the horizontal plane around the point of optimal crosstalk cancellation. Behavior of the system in different frequency bands and for loudspeaker arrangements with different angular separations has also been considered. The experimental study has been compared with computer simulations based on models of the transaural sound reproduction system, with and without consideration of the acoustic scattering produced by the human head. Experimental results indicate that it is important to include the acoustic scattering of the human head in computer simulations and that this scattering is a very important factor in the design of transaural systems. This work has also provided some useful guidelines for practical implementations of 3-D sound reproduction systems.
Modeling and Measurement of Cross-Talk Cancellation Zones for Small Displacements of the Listener in Transaural Sound Reproduction with Different Loudspeaker Arrangements

5268
Menno van der Veen,Francisco de Leon,Brian Gladstone,Valeriu Tatu,
This paper proposes a new measurement standardization procedure to measure and quantify the acoustic noise produced by power transformers in audio and video equipment under normal and adverse mains conditions. These conditions and the physical mechanisms that cause the noise are discussed. A new measurement setup and units are proposed to measure and quantify the transformer noise. Examples of measurements of four transformers are given for comparison and illustration.
Measuring Acoustic Noise Emitted by Power Transformers

5269
Chris Woolf,Oliver Prudden,
The turbulent and variable character of wind, and its silent generation, make it difficult to reproduce in the laboratory. A few standard wind machines have been built, but they are large, expensive, and inaccessible to many designers. This paper describes a simple technique to measure the effectiveness of microphone windshields without the need for such a device. It is based on real-time comparison methods and real wind. The technique has given statistically reliable and useful results. The paper gives an example of the performance measurement for a new product.
Windnoise Measurement Using Real Wind

5270
Bernhard Grill,Stefan Geyersberger,Johannes Hilpert,Bodo Teichmann,
The specification of MPEG-4 Audio, the next-generation audio coding technology, has been finalized by ISO/IEC. This paper investigates the implementation options of the new standard on various platforms. Special attention has been given to the new low-delay AAC coder variant and the MPEG-4 scalable profile as well as to several transport options for MPEG-4 Audio content.
Implementation of MPEG-4 Audio Components on Various Platforms

5271
Ralph Sperschneider,
This paper describes an error resilient source coding approach for variable length codes. The presented algorithm has been designed to make the Huffman coded spectral data of MPEG-2/4 Advanced Audio Coding (AAC) more resilient against transmission errors. It has become part of ISO/IEC 14496-3/AMD 1 (MPEG-4 Audio Version 2). With this approach, the perceptual audio quality can be significantly increased in the presence of adverse channel conditions as they appear in the case of wireless transmission applications. New subjective test results are presented based on the MUSHRA test method.
Error Resilient Source Coding with Variable Length Codes and Its Application to MPEG Advanced Audio Coding

5272
Kelvin H. C. Eng,Say Wei Foo,Dong-Yan Huang,
An analysis-by-synthesis stage and additional noiseless compression (Huffman coding) tools are employed to implement the quantization and coding process in a perceptual audio transform coding system. The objective of the process is to keep the quantization noise below the masked threshold, given an available number of bits. An iterative method with two nested loops realizes the control of distortion and bit rate for the process. Preliminary analysis indicates that in some instances more than the required number of bits, estimated by the perceptual entropy (PE), is allocated to subbands when these additional bits could be re-allocated to regions of higher need. Moreover, quantization at a fixed bit rate neglects the bits not used in the coding process. A proposed third iteration loop addresses these two inadequacies by dynamically allocating bits according to the PE. Results indicate that there are improvements in the audio quality, particularly at low bit rates, but at a higher computational cost. Moreover, with the proposed method, perceptually transparent coding is achieved at an estimated bit rate of 1.797 bits/sample, a significant improvement over past estimates of 2.1 bits/sample.
Dynamic Allocation of Bits Based on Perceptual Entropy in Perceptual Audio Coding Systems

5273
Ashish Aggarwal,Shankar L. Regunathan,
An estimation-theoretic framework is proposed for prediction in scalable coding of stereophonic audio, which overcomes fundamental drawbacks of conventional prediction approaches. The framework offers the means to combine all the information available at the enhancement layer to produce the optimal signal estimate. The method efficiently exploits both interchannel and intrachannel redundancy without incurring additional side information. The proposed estimation-theoretic prediction is implemented within an MPEG-4-based scalable coder for stereophonic audio signals. Objective and subjective performance comparison on the MPEG-4 SQAM database shows that optimal prediction yields substantial bit-rate reduction while maintaining the same reproduction quality. The improvement is achieved at the cost of a modest increase in computational complexity.
Optimal Prediction in Scalable Coding of Stereophonic Audio

5274
Bernd Edler,Christof Faller,Gerald Schuller,
Recently a new concept for perceptual audio coding was presented, which is based on a prefilter in the encoder and a corresponding postfilter in the decoder, both controlled by a psychoacoustic model. It enables individual selection of spectral and temporal resolutions for irrelevancy reduction and redundancy reduction. This paper addresses problems related to the efficient transmission of the filter parameters and presents techniques for efficient temporal and spectral modeling of masked thresholds using linear time-varying filters.
Perceptual Audio Coding Using a Time-Varying Linear Pre- and Post-Filter

5275
Matthew A. Watson,Michael Truman,
This paper examines the parameters involved in determining the achievable performance for lossless audio coding techniques. Entropy theory can be difficult to apply to a practical system. Therefore, many compromises are made for practical implementation because of complexity and memory constraints. In order to develop an efficient implementation, it is essential to have a metric that takes all these factors into consideration. The authors propose a metric and show how it can be applied to several different coding systems.
Analyzing the Performance of Lossless Coding Techniques Used in Audio Coders

5276
Roger Shively,
The automotive space is a harsh and challenging listening environment. This paper covers the various spectral and spatial aspects of the vehicle environment and how system designers approach the challenges provided by the environment, including amplifier and loudspeaker design, loudspeaker placement, and system tuning and control.
Automotive Audio Design (A Tutorial)

5277
Alberto Bellini,Angelo Farina,Carlo Morandi,Emanuele Ugolotti,
This paper summarizes the results of the ESPRIT project APLODSP. The goal of this application experiment has been to develop adequate models for the simulation of the nonlinear behavior of loudspeakers and to design a dedicated audio processor to reduce sound distortion. This involves the definition of a systematic design flow for antidistortion audio processors and the effective exploitation of CAD tools for the automatic implementation of the defined algorithms. The audio processor was implemented with a DSP using state-of-the-art tools for simulation, validation, and synthesis. DSP is the emerging low-cost technology for audio processing and, in particular, for car audio systems. In fact, car manufacturers are planning to reduce the cables inside the car and to use a single cable to distribute the main signals multiplexed all around the vehicle. This transition to digital audio signal transmission will foster the use of active loudspeakers equipped with dedicated digital audio processors. The audio processor, designed and tested within this ESPRIT project, can be seen as a first step in this direction. It shows how the loudspeaker distortion can be reduced by digital signal processing, and it exploits the versatility of digital designs in order to allow hardware re-use for different car models. It also allows a very fast redesign to fit many different purposes. A semiautomatic design flow for the design of antidistortion audio processors is available, which synthesizes a dedicated audio processor for each loudspeaker model, once suitable loudspeaker parameters have been specified.
APLODSP, Design of Customizable Audio Processors for Loudspeaker System Compensation by DSP

5278
Alberto Bellini,Angelo Farina,Gianfranco Cibelli,Emanuele Ugolotti,
Sound reproduction within a car is a difficult task. Reverberation, reflection, echo, noise, and vibration are some of the issues that account for the difficulty. The first step in the direction of increasing sound comfort is to equalize the acoustic pressure response in the frequency domain. To accomplish this task, the inversion of the measured sound pressure level (SPL) should be performed. The inversion of this system's transfer function is the key point of the equalizing procedure, and many advanced techniques were developed toward this aim. However, they produce a large number of FIR filter taps and thus cannot be implemented in real time on a low-cost DSP. Frequency warping theory allows one to design a filter especially fitted for low frequencies with a small number of taps. In this paper an automatic tool to develop warped filters for the equalization of target car cockpits was designed and validated through experiments in commercial cars.
Experimental Validation of Equalizing Filters for Car Cockpits Designed with Warping Techniques

5279
Alan S. Phillips,
Acoustic lever loudspeaker designs with a given driver are discussed. Advantages over current low-frequency loudspeaker system designs are examined. Measured small and large signal performance is compared to computer simulations.
Design of Acoustic Lever Loudspeaker Systems, Part One

5280
Michael Christel,Alexander Hauptmann,Howard Wactlar,
This paper discusses the role of speech recognition in the Informedia system and how it is augmented with other technologies.
Improving Access to Digital Video Archives Through Informedia Technology



(C) 2003, Audio Engineering Society, Inc.