AES 113th Convention
Los Angeles, CA, USA
October 5-8, 2002
The preference in loudspeaker product design is for small size, while preserving maximum low-frequency extension and output capability. If the goal is to create a realistic reproduction of a live event, then certain speaker parameters must be adequately controlled, such as volume displacement, intermodulation distortion, stored energy, and off-axis frequency response. Components must be carefully selected for low-distortion performance. Parameters such as phase linearity and cabinet diffraction are sometimes overrated. Multi-channel speaker setups require propagation delay correction and, if not all speakers cover the full frequency range, bass management. These issues are reviewed at the advent of high-resolution surround sound. The new technology can only fulfill its promise and expand beyond a niche market if capable loudspeakers are widely available.
Which Loudspeaker Parameters are Important to Create the Illusion of a Live Performance in the Living Room?
The amplitude response of a loudspeaker system is characterized by a series of spatially averaged measurements. The proposed approach recognizes that the listener hears three acoustical events in a typical domestic environment: the direct sound, the early arrivals and the reverberant sound field. A survey of 15 domestic multi-channel installations was used to determine the typical angle of the direct sound and the early arrivals. The reflected sound that arrives at the listener after encountering only one room boundary is used to approximate the early arrivals, and the total sound power is used to approximate the reverberant sound field. Two unique directivity indices are also defined and the in-room response of the loudspeaker is predicted from anechoic data.
Characterizing the Amplitude Response of Loudspeaker Systems
Gene Czerwinski, Sergei Alexandrov, Alexander Voishvillo, Alexander Terekhov
Harmonic distortion and THD do not convey sufficient information about nonlinearity in loudspeakers and horn drivers. Multitone stimuli and Gaussian noise produce a more informative nonlinear response. The reaction to Gaussian noise can be transformed into a coherence or incoherence function, which provides information about nonlinearity in the form of easy-to-grasp frequency-dependent curves. In contrast, the results of multitone measurements are difficult to interpret, compare, and overlay. A new method of depicting the results of multitone measurements has been developed: the distortion products are averaged in a moving frequency window, and the result is a single continuous frequency-dependent curve that takes into account both the level of the distortion products and their density. Such curves can be easily overlaid and compared. Further development of the method may lead to a correlation between the level of the distortion curves and the audibility of nonlinear distortion.
Graphing, Interpretation, and Comparison of the Results of Loudspeaker Nonlinearity Measurement
Ryan J. Mihelich
The magnetic field in the air gap of a conventional loudspeaker motor is most often an asymmetric nonlinear function of axial position. Placement of the voice-coil into this asymmetrical field yields an asymmetric nonlinear force-factor, Bl, which is a primary cause of amplitude modulation distortion in loudspeakers. Adjustment of the rest position of the voice-coil in this field can alter the nature of this modulation distortion. Common practice is to nominally set the voice-coil at the geometric center of the gap or at the position generating maximum Bl. A time-domain nonlinear simulator has been used to investigate effects of voice-coil placement in an asymmetric flux field on amplitude modulation distortion.
The Effects of Voice-Coil Axial Rest Position on Amplitude Modulation Distortion in Loudspeakers
Compared to the other components of a professional sound system (omitting distortion during free propagation), horn drivers exhibit the worst nonlinear distortion. Some of the driver's distortions can be mitigated by proper mechanical measures. However, distortion caused by nonlinear air compression and propagation is inherent to any horn driver. In this work, a comparison of the nonlinear distortion produced by different sources is carried out through measurements and modeling. The new dynamic model of a compression driver is based on a system of nonlinear differential and algebraic equations. The complex impedance of an arbitrary horn is taken into account by converting the impedance into a system of differential equations describing the pressure and velocity at the horn's throat. The comparison is carried out using harmonic distortion and the reaction to a multitone stimulus.
Nonlinearity in Horn Drivers - Where the Distortion Comes From?
Kazunori Kobayashi, Ken-ichi Furuya, Akitoshi Kataoka
We propose a beamforming method applicable to near sound fields, in which a filter-and-sum microphone array maintains better quality for the target sound than the conventional delay-and-sum array, and describe a real-time implementation that steers the beam toward detected talker locations. By using a microphone array, our system also reduces noise levels to achieve high-quality sound acquisition; furthermore, it allows the talker to be in any position. Computer simulations and experiments show that our method is effective for teleconferencing systems.
A Talker-Tracking Microphone Array for Teleconferencing Systems
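As an illustrative aside, the delay-and-sum baseline that the above abstract compares against is easy to sketch numerically. The snippet below is not the authors' implementation; the array geometry, frequency, and angles are arbitrary assumptions. It computes the far-field response of a uniform line array steered to broadside and shows the attenuation of an off-axis arrival:

```python
import math, cmath

def das_response(n_mics, spacing, freq, steer_deg, arrival_deg, c=343.0):
    """Far-field response of a delay-and-sum line array.

    The weights time-align a plane wave arriving from steer_deg; the
    returned magnitude is the array's gain for a plane wave from
    arrival_deg (1.0 means passed unattenuated).
    """
    k = 2 * math.pi * freq / c
    total = 0j
    for m in range(n_mics):
        x = m * spacing                      # microphone position on the line
        steer = cmath.exp(1j * k * x * math.sin(math.radians(steer_deg)))
        wave = cmath.exp(-1j * k * x * math.sin(math.radians(arrival_deg)))
        total += steer * wave
    return abs(total) / n_mics

on_axis = das_response(8, 0.04, 2000.0, 0.0, 0.0)    # target direction
off_axis = das_response(8, 0.04, 2000.0, 0.0, 60.0)  # interferer at 60 degrees
```

The target arrival sums coherently to unity gain while the 60-degree arrival is substantially attenuated; the filter-and-sum array of the paper generalizes the per-microphone weights to full filters so the same idea works for spherical (near-field) wavefronts.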
Chris Kyriakakis, Panayiotis G. Georgiou
In this paper we investigate an alternative to the Gaussian density for modeling signals encountered in audio environments. The observation that sound signals are impulsive in nature, combined with the reverberation effects commonly encountered in audio, motivates the use of the Sub-Gaussian density. The new Sub-Gaussian statistical model and the separable solution of its Maximum Likelihood estimator are derived. These are used in an array scenario to demonstrate with both simulations and two different microphone arrays the achievable performance gains. The simulations exhibit the robustness of the sub-Gaussian based method while the real world experiments reveal a significant performance gain, supporting the claim that the sub-Gaussian model is better suited for sound signals.
An Alternative Model for Sound Signals Encountered in Reverberant Environments; Robust Maximum Likelihood Localization and Parameter Estimation Based on a Sub-Gaussian Model
Hyen-O Oh, Dae-Hee Youn, Jin Woo Hong, Jong Won Seok
In echo watermarking, efforts to improve robustness often conflict with the requirement of imperceptibility; this trade-off is inherent in audio watermarking techniques in general. In this paper, we address the development of imperceptible but detectable echo kernels that are embedded directly into high-quality audio signals. The mathematical and perceptual characteristics of echo kernels are analyzed in the frequency domain. By combining closely spaced positive and negative echoes, we obtain a much flatter frequency response in the perceptually significant bands. The proposed echo makes it possible to improve the robustness of the echo watermark without breaking imperceptibility.
Imperceptible Echo for Robust Audio Watermarking
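The flattening effect described in the abstract can be illustrated with a small numeric sketch (the echo gain and lags below are illustrative assumptions, not values from the paper). A single echo 1 + a·z^(-d) ripples by roughly ±a across all bands, whereas a closely spaced positive/negative pair 1 + a·z^(-d) - a·z^(-(d+1)) cancels itself at low frequencies:

```python
import math, cmath

def max_ripple(kernel, w_max, steps=2000):
    """Maximum deviation of |H(e^jw)| from 1 over the band [0, w_max].

    kernel maps delay_in_samples -> gain.
    """
    worst = 0.0
    for i in range(steps + 1):
        w = w_max * i / steps
        h = sum(g * cmath.exp(-1j * w * d) for d, g in kernel.items())
        worst = max(worst, abs(abs(h) - 1.0))
    return worst

a, d = 0.2, 100
single = {0: 1.0, d: a}             # a single positive echo
pair = {0: 1.0, d: a, d + 1: -a}    # closely spaced positive/negative pair

rip_single = max_ripple(single, 0.2 * math.pi)  # low-frequency band
rip_pair = max_ripple(pair, 0.2 * math.pi)
```

In the low band the pair's response is 1 + a·e^(-jwd)·(1 - e^(-jw)), and the (1 - e^(-jw)) factor shrinks toward DC, which is the mechanism behind the "much flatter response in perceptually significant bands" claimed above.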
Christian Neubauer, Frank Siebenhaar, Jurgen Herre, Robert Bauml
Nowadays, distribution of audio material is not limited to physical media anymore. Instead, distribution via Internet is of increasing importance. In order to attach additional information to the audio content, either for forensic or digital rights management purposes or for annotation purposes, watermarking is a promising technique since it is independent of the audio format and transmission technology. State-of-the-art spread spectrum watermarking systems can offer high robustness against unintentional and intentional signal modifications. However, their data rate is typically comparatively low, often below 100 bit/s. This paper describes the adaptation of a new watermarking scheme called Scalar Costa Scheme (SCS), which is based on dithered quantization of audio signals. In order to fulfill the demands of high quality audio signal processing, modifications of the basic SCS scheme, such as the introduction of a psychoacoustic model and new algorithms to determine quantization intervals are required. Simulation figures and results of a sample implementation, which show the potential of this new watermarking scheme, are presented in this paper along with a short theoretical introduction to the SCS watermarking scheme.
New High Data Rate Audio Watermarking Based on SCS (Scalar Costa Scheme)
Ronald Streicher, Wes Dooley
Despite being one of the progenitors of all modern microphones and recording techniques, the bidirectional pattern is still not very well understood; its proper and effective use remains somewhat of a mystery to many recording and sound reinforcement engineers. In this paper, the bidirectional microphone will be examined from historical, technical, and operational perspectives. We will review how it developed and exists as a fundamental element of almost all other single-order microphone patterns. In the course of describing how this unique pattern responds to soundwaves arriving from different angles of incidence, we will show that it very often can be successfully employed where other more commonly-used microphones cannot.
The Bidirectional Microphone: A Forgotten Patriarch
Chris Kyriakakis, Athanasios Mouchtaris, Shrikanth S. Narayanan
Multichannel audio can immerse a group of listeners in a seamless aural environment. However, several issues must be addressed, such as the excessive transmission requirements of multichannel audio, as well as the fact that to date only a handful of music recordings have been made with multiple channels. Previously, we proposed a system capable of synthesizing the multiple channels of a virtual multichannel recording from a smaller set of reference recordings. In this paper these methods are extended to provide a more general coverage of the problem. The emphasis here is on time-varying filtering techniques that can be used to enhance particular instruments in the recording, which is desired in order to simulate virtual microphones at several locations close to and around the sound source.
Gaussian Mixture Model Based Methods for Virtual Microphone Signal Synthesis
Jan Abildgaard Pedersen, Gert Munch
The directivity of a single loudspeaker driver is controlled by adding an acoustic reflector to an ordinary driver. The driver radiates upwards and the sound is redistributed by being reflected off the acoustic reflector. The shape of the acoustic reflector is non-trivial and yields an interesting and useful directivity both in the vertical and horizontal plane. 2D FEM simulations and 3D BEM simulations are compared to free field measurements performed on a loudspeaker using the acoustic reflector. The resulting directivity is related to results of previously reported psychoacoustic experiments.
Driver Directivity Control by Sound Redistribution
Mark S. Ureda
The on-axis pressure response of a vertical line source is known to decrease at 3 dB per doubling of distance in the near field and at 6 dB in the far field. The present paper shows that the conventional mathematics used to achieve this result understates the distance at which the -3 dB to -6 dB transition occurs. An examination of the pressure field of a line source reveals that the near field extends to a greater distance at positions laterally displaced from the centerline, normal to the source. The paper introduces the "endpoint" convention for the pressure response and compares the on-axis response of straight and hybrid line sources.
Pressure Response of Line Sources
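The 3 dB/6 dB behaviour cited in the abstract can be reproduced by brute-force summation of point sources. This is an illustrative sketch, not Ureda's formulation; the line length, frequency, and fit distances are arbitrary choices. Fitting the level-versus-distance slope per doubling in the near and far regions:

```python
import math, cmath

C = 343.0           # speed of sound, m/s
FREQ = 1000.0
K = 2 * math.pi * FREQ / C
L = 4.0             # line source length, m
N = 800             # number of point sources approximating the line

def on_axis_level(d):
    """Level (dB, arbitrary reference) on the normal through the line's centre."""
    p = 0j
    for i in range(N):
        y = -L / 2 + (i + 0.5) * L / N
        r = math.sqrt(d * d + y * y)
        p += cmath.exp(-1j * K * r) / r   # coherent sum of spherical waves
    return 20 * math.log10(abs(p) / N)

def slope_per_doubling(d0, d1):
    return (on_axis_level(d1) - on_axis_level(d0)) / math.log2(d1 / d0)

near = slope_per_doubling(1.0, 8.0)        # well inside the near field
far = slope_per_doubling(200.0, 1600.0)    # far field (transition here ~ 23 m)
```

The fitted slopes come out close to -3 dB and -6 dB per doubling respectively, with some near-field ripple from the endpoint contribution, which is exactly the region the paper's "endpoint" convention addresses.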
The narrow vertical pattern achieved by line arrays has prompted much recent interest in the method for many forms of sound reinforcement. The live sound segment of the audio community has used horns and compression drivers for sound reinforcement for several decades. Adopting a line-array philosophy to meet the demands of high-level sound reinforcement requires an approach that allows a line source to be created from the output of compression drivers. Additionally, it is desirable that the line array take on different vertical patterns dependent upon use, which requires a solution that allows the array to be articulated. Outlined in this work is a waveguide/compression-driver combination that is compact, simple in approach, and highly suited to articulated arrays.
High Frequency Components for High Output Articulated Line Arrays
John Vanderkooy, Paul M. Boers
Direct-radiator loudspeakers become more efficient as the total magnetic flux is increased, but the accompanying equalization and amplifier modify the gains thus made. We study the combination of an efficient high-Bl driver with several amplifier types, including a highly efficient class-D amplifier. Comparison is made of a typical simulated driver, excited with a few different amplifier types, using various audio signals. The comparison is quite striking as the Bl value of the driver increases, significantly favouring the class-D amplifier.
High-Efficiency Direct-Radiator Loudspeaker Systems
Victor W. Sparrow, Wontak Kim
Implementing the parametric array for audio applications is examined through numerical modeling and analytical approximation. The analytical solution of the nonlinear wave equation is used to provide guidelines on the design parameters of the parametric array. The solution relates the source size, input pressure level, and the carrier frequency to the audible signal response including the output level, beam width, and length of the interaction region. A time domain finite difference code that accurately solves the KZK nonlinear parabolic wave equation is used to predict the response of the parametric array. The accuracy of the numerical model is established by a simple parametric array experiment. In considering the implementation issues for audio applications of the parametric array, the emphasis is given to the poor frequency response and the harmonic distortion. Signal processing techniques to improve the frequency response and the harmonic distortion are suggested and implemented through the numerical simulation.
Audio Application of the Parametric Array - Demonstration through a Numerical Model
D. B. Keele, Jr.
Conventional CBT arrays require a driver configuration that conforms to either a spherical-cap curved surface or a circular arc. CBT arrays can also be implemented in flat-panel or straight-line array configurations using signal delays and Legendre function shading of the driver amplitudes. Conventional CBT arrays do not require any signal processing except for simple frequency-independent shifts in loudspeaker level. However, the signal processing for the delay-derived CBT configurations, although more complex, is still frequency independent. This is in contrast with conventional constant-beamwidth flat-panel and straight-line designs which require strongly frequency-dependent signal processing. Additionally, the power response roll-off of the delay-derived CBT arrays is one half the roll-off rate of the conventional designs, i.e., 3- or 6-dB/octave (line or flat) for the CBT array versus 6- or 12-dB/octave for the conventional designs.
Implementation of Straight-Line and Flat-Panel Constant Beamwidth Transducer (CBT) Loudspeaker Arrays using Signal Delays
Ricardo A. Garcia
Design of Sound Synthesis Techniques (SST) is a hard problem. It is usually assumed that it requires human ingenuity to find a suitable solution. Many of the SSTs commonly used are the fruit of experimentation and long refinement processes. An automated approach for design of SSTs is proposed. The problem is stated as a search in the multidimensional SST space. It uses Genetic Programming (GP) to suggest valid functional forms, and standard optimization techniques to fit their internal parameters. A psychoacoustic auditory model is used to compute the perceptual distance between the target and test sounds. The developed AGeSS (Automatic Generator of Sound Synthesizers) system is introduced, and a simple example of the evolved SSTs is shown.
Automatic Design of Sound Synthesis Techniques by means of Genetic Programming
Keith A. McMillen
Advances in technology have afforded listeners an available dynamic range in excess of 120 dB. While impressive in proper concert halls and listening rooms, large dynamic ranges are not always realistic for all environments and musical styles. This paper describes a practical multi-band dynamics processor software object that can reside in low cost consumer products and allow the user to adjust dynamic range to fit his or her taste and listening environment.
A Consumer Adjustable Dynamic Range Control System
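A dynamics processor of the kind described above is built, per band, around a static gain curve. As a minimal generic sketch (the threshold and ratio values are arbitrary assumptions, not parameters from this paper), the gain computer of a downward compressor can be written as:

```python
def compressor_gain_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static curve of a downward compressor: above the threshold the
    output level rises only 1/ratio dB per input dB."""
    if level_db <= threshold_db:
        return 0.0                                   # below threshold: unity gain
    return (threshold_db - level_db) * (1.0 - 1.0 / ratio)

quiet = compressor_gain_db(-40.0)   # low-level signal passes untouched
loud = compressor_gain_db(0.0)      # a full-scale peak is pulled down 15 dB
```

Exposing the threshold and ratio (per band) to the listener is one plausible way such a product lets dynamic range be adjusted to the environment; real designs add attack/release smoothing around this static curve.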
We present a very simple, effective, and computationally efficient algorithm to reduce the typical hiss amplification (or “breathing”) artifact of dynamics compressors working under high compression ratios. The algorithm works in the time domain, is very easy to implement, has a very low computational cost and requires little program memory, therefore being of special interest for consumer-audio applications.
A Simple, Efficient Algorithm for Reduction of Hiss Amplification Under High Dynamics Compression
Minki Yang, Jong-Hoon Oh
The paper presents a method to compensate for nonlinear distortion of digital Pulse Width Modulation (PWM) power amplifiers by prefiltering the input signals using artificial neural networks. We first construct a model of the digital amplifier using artificial neural networks. Using this model, the artificial neural network model of a predistortion filter is trained such that the combined system, the predistortion filter followed by the digital amplifier, produces an output that is linearly proportional to the input. The simulation results show that the artificial neural network predistortion filter can effectively correct the nonlinear distortion of the digital amplifier.
Adaptive Predistortion Filter for Linearization of Digital PWM Power Amplifier using Neural Networks
Dragana A. Sagovnovic
Contemporary means of communication (e.g., mobile telephony) have brought new limitations that telecom operators must take into consideration. One of them is the fact that the types of speech-quality deterioration perceived in mobile telephony differ from the degradations noted in fixed telephony. In this paper, comparative (objective) methods are applied to a database of degradations collected from a real mobile communications system in order to estimate the quality of speech transmission. The analysis of the objective-method test results forms the basis for the development of a new objective method of speech-quality assessment. The results gathered during the comparison tests are displayed and interpreted for different types of Serbian vowels: the front vowel /e/, the mid vowel /a/, and the back vowel /u/.
Objective Measures of the Quality of Speech Transmission in a Real Mobile Network – Measuring, Estimate, and Prediction Method
Timoleon Papadopoulos, Philip A. Nelson
Inverse filtering in a single channel or in multiple channels arises as a problem in a number of applications in the areas of communications, active control, sound reproduction and virtual acoustic imaging. In the single-channel case, when the plant C(z) to be inverted has zeros outside the unit circle in the z-plane, an approximation to the inverse 1/C(z) can be realized with an FIR filter if an appropriate amount of modeling delay is introduced into the system. But the closer the zeros of C(z) are to the unit circle (either inside or outside it), the longer the FIR inverse has to be, typically several tens of times longer than the plant. An off-line implementation utilizing a variant of the backward-in-time filtering technique usually associated with zero-phase FIR filtering is presented. This forms the basis on which a single-channel mixed-phase plant can be inverted with an IIR filter of order roughly double that of C(z), thus drastically reducing the processing time required for the inverse filtering computation.
Computationally Efficient Inversion of Mixed Phase Plants with IIR Filters.
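The backward-in-time idea can be demonstrated on the simplest mixed-phase case. This is a toy example with a single zero outside the unit circle, not the authors' general construction: the factor (1 - a·z^-1) with |a| > 1 has a stable anticausal inverse, realised off-line by running the recursion backwards through the block so the startup transient decays:

```python
import random

def apply_plant(x, a):
    """y[n] = x[n] - a*x[n-1]: a plant with a zero at z = a, |a| > 1."""
    return [x[n] - a * (x[n - 1] if n else 0.0) for n in range(len(x))]

def invert_backward(y, a):
    """Recover x from y via the anticausal recursion x[n-1] = (x[n] - y[n]) / a.

    Iterated from the end of the block, the error in the (unknown) final
    sample shrinks by a factor 1/|a| per backward step, so all but the
    trailing edge of the block converges to the true input.
    """
    x = [0.0] * len(y)                 # final-sample guess: 0 (wrong, but decays)
    for n in range(len(y) - 1, 0, -1):
        x[n - 1] = (x[n] - y[n]) / a
    return x

random.seed(0)
a = 2.0
x = [random.uniform(-1, 1) for _ in range(200)]
y = apply_plant(x, a)
x_hat = invert_backward(y, a)
err = max(abs(x[n] - x_hat[n]) for n in range(100))  # first half of the block
```

Away from the block's trailing edge the recovery is essentially exact, which is why an off-line forward/backward scheme can invert a mixed-phase plant with an IIR filter of modest order.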
A new algorithm to find an optimal filter partition for efficient long convolution with low input/output delay is presented. For a specified input/output delay and filter length, our algorithm finds the non-uniform filter partition that minimizes computational cost of the convolution. We perform a detailed cost analysis of different block convolution schemes, and show that our optimal-partition finder algorithm allows for significant performance improvement. Furthermore, when several long convolutions are computed in parallel and their outputs are mixed down (as is the case in multiple-source 3-D audio rendering), the algorithm finds an optimal partition (common to all channels) that allows for further performance optimization.
Optimal Filter Partition for Efficient Convolution with Short Input/Output Delay
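While the paper's optimizer searches over non-uniform partitions, even the uniform case shows why partitioning matters. The sketch below uses a conventional textbook cost model for frequency-domain block convolution (the constants are illustrative assumptions, not the paper's exact cost analysis) and finds the uniform block size minimizing cost per output sample under an input/output delay constraint:

```python
import math

def cost_per_sample(block, filt_len):
    """Rough per-output-sample cost of uniform partitioned convolution:
    one forward and one inverse FFT of size 2*block per block, plus one
    complex multiply-accumulate per partition per frequency bin."""
    parts = math.ceil(filt_len / block)
    fft_ops = 2 * (2 * block) * math.log2(2 * block)  # forward + inverse FFT
    cmac_ops = 4 * (2 * block) * parts                # spectrum multiply-adds
    return (fft_ops + cmac_ops) / block

def best_uniform_block(filt_len, max_delay):
    """Search power-of-two block sizes no larger than the allowed delay."""
    blocks = []
    b = 64
    while b <= max_delay:
        blocks.append(b)
        b *= 2
    return min(blocks, key=lambda b: cost_per_sample(b, filt_len))

N = 131072                                # ~3 s filter at 44.1 kHz
best = best_uniform_block(N, max_delay=4096)
```

Under this model the cheapest uniform block is the largest the delay budget allows, since the multiply-accumulate term dominates for long filters; a non-uniform partition, as in the paper, goes further by covering the filter's head with small blocks (for low delay) and its tail with large ones.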
Rob Clark, Emmanuel Ifeachor, Glenn Rogers
Digital filter morphing techniques exist to reduce audible transient distortion during filter frequency response change. However, such distortions are heavily dependent on signal content, frequency response settings, filter topology, interpolation scheme, and sampling rates. This paper presents an investigation into these issues, implementing various filter topologies using different input stimuli and filter state change scenarios. The paper identifies the mechanisms causing these distortions, specifying worst case filter state change scenarios. The effects of existing interpolator schemes, finite word length, and system sampling rates on signal distortion are presented. The paper provides an understanding of filter state change, critical in the design of filter morphing algorithms.
Filter Morphing – Topologies, Signals and Sampling Rates
Scott G. Norcross, Gilbert A. Soulodre, Michel C. Lavoie
Inverse filtering has been proposed for numerous applications in audio and telecommunications, such as speaker equalization, virtual source creation and room deconvolution. When an impulse response (IR) is non-minimum phase, its corresponding inverse can produce artifacts that become distinctly audible. These artifacts produced by the inverse filtering can actually degrade the signal rather than improve it. The severity of these artifacts is affected by the characteristics of the filter and the method (time or frequency domain) used to compute its inverse. In this paper, objective and subjective tests were conducted to investigate and highlight the potential limitations associated with several different inverse-filtering techniques. The subjective tests were conducted in compliance with the ITU-R MUSHRA method.
Evaluation of Inverse Filtering Techniques for Room/Speaker Equalization
Chris Kyriakakis, J. Michael Peterson
Several high-quality audio applications require the use of long finite impulse response (FIR) filters to model the acoustical properties of a room. This paper examines a number of structures for subband filtering, which divide a long FIR filter into smaller FIR filters that are easier to use. Two structures for processing the signals in real time are discussed: time-convolution of spectrograms and generalized filterbanks. Filter estimation is also discussed.
Using Subband Filters to Reduce the Complexity of Real-time Signal Processing
Edward V. Semyonov, Stanley P. Lipshitz, John Vanderkooy
We extend an idea, proposed in an earlier paper, to use noise-shaping techniques in the generation of digital test signals. The previous paper proposed using noise shaping around an undithered quantizer to generate sinusoidal digital test signals with spectra having error nulls at the harmonics of the signal frequency, thus making digital distortion measurements of very great dynamic range possible. We extend this idea in a number of ways. We show: (a) that dither is necessary in order to suppress spurious artifacts caused by the nonlinearity of an undithered noise shaper, (b) that wider and deeper nulls at the harmonic frequencies can be achieved by using higher-order noise-shaper designs, (c) that IIR filter designs can moderate the increased noise power that accompanies an increased FIR filter order, and (d) some other novel uses of noise shaping in digital signal generation.
Noise Shaping in Digital Test-Signal Generation
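A minimal first-order error-feedback quantizer illustrates the basic mechanism behind the abstract (this is a generic textbook noise shaper with TPDF dither, not the higher-order designs of the paper): the quantization error is fed back so the shaped error spectrum follows |1 - e^(-jw)|, pushing noise away from low frequencies:

```python
import math, random

def noise_shape(x, step=1.0 / 128, seed=1):
    """First-order error-feedback quantizer with TPDF dither.

    Returns (quantized output, error signal = output - input)."""
    rng = random.Random(seed)
    out, e_prev = [], 0.0
    for s in x:
        v = s - e_prev                               # feed back previous error
        d = (rng.random() - rng.random()) * step     # TPDF dither, +/- 1 LSB
        q = step * round((v + d) / step)             # mid-tread quantizer
        e_prev = q - v                               # error to feed back
        out.append(q)
    return out, [o - s for o, s in zip(out, x)]

N = 4096
sig = [0.5 * math.sin(2 * math.pi * 8 * n / N) for n in range(N)]
_, err = noise_shape(sig)

def band_mag(x, lo, hi):
    """Mean DFT magnitude of x over bins [lo, hi)."""
    total = 0.0
    for k in range(lo, hi):
        re = sum(x[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
        im = sum(x[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
        total += math.hypot(re, im)
    return total / (hi - lo)

low = band_mag(err, 2, 12)        # near DC: shaped noise is suppressed
high = band_mag(err, 1900, 1910)  # near Nyquist: shaped noise is boosted
```

The paper's contribution is to generalize this: higher-order and IIR shaping filters place wider, deeper nulls at the harmonics of a test tone rather than only at DC, and the dither is what keeps the nulls free of spurious artifacts.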
The delay of a signal from the input terminals of a loudspeaker amplifier to the output terminals of a microphone can be represented in two parts: a delay from the electrical input to the acoustical output of the loudspeaker, and an acoustical propagation delay from some point on the loudspeaker to the microphone. For computational models of mixtures of loudspeakers to be correct, these delays must be measured accurately. It will be shown that temperature differences as small as 1 degree Celsius between measurements of two models of loudspeaker can cause significant differences in the predicted sound field. Though the speed of sound is much less sensitive to changes in humidity, the difference between assuming a typical humidity and assuming zero humidity (which is the norm) can also be significant.
Factors Affecting Accuracy of Loudspeaker Measurements for Computational Predictions
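The temperature sensitivity claimed above is easy to quantify with the standard dry-air approximation c ≈ 331.3 + 0.606·T m/s (a textbook formula; the humidity correction discussed in the paper is omitted here, and the distance is an arbitrary choice). A 1 °C shift changes the propagation time over a few metres by roughly 20 microseconds, a large fraction of a period at high audio frequencies:

```python
def speed_of_sound(temp_c):
    """Dry-air approximation, m/s (ignores the smaller humidity term)."""
    return 331.3 + 0.606 * temp_c

def delay_us(distance_m, temp_c):
    """One-way propagation delay in microseconds."""
    return 1e6 * distance_m / speed_of_sound(temp_c)

d = 4.0                                          # measurement distance, m
shift = delay_us(d, 20.0) - delay_us(d, 21.0)    # effect of a +1 deg C change

# Relative to a 100 us period (10 kHz), this is a large phase error
phase_deg = 360.0 * shift / 100.0
```

A phase error of this size between two drivers is enough to move interference nulls in a predicted mixture, which is why the paper insists the measurement temperature be controlled or recorded.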
Although stereo systems for large rooms were pioneered in well-documented work at Bell Labs in the 1930s, most modern practitioners appear to be unaware of the most important parts of that work as applied to modern sound reinforcement. This paper draws on the author's experience over nearly twenty years with both portable and permanent systems using two and three front referenced channels. Design criteria and examples are presented to illustrate both good and bad design practices, and some important pitfalls are noted.
Systems for Stereo Sound Reinforcement - Performance Criteria, Design Techniques, and Practical Examples
Stephen H. Lampen, David A. DeSmidt
One of the key differences between cable designed for analog signals and cable designed for digital signals is the impedance of the cable. Why is impedance important for digital but not for analog? What effect do impedance variations or mismatching have on digital signals? Can you use Category 5e or Category 6 computer cable to run digital audio? Can you use coaxial cable to carry digital audio? This paper addresses all these questions, and also outlines the limitations of digital cable designs.
Cable Impedance and Digital Audio
The role of emergency sound and voice alarm systems in life safety management has never been more important. However, to be effective, such systems must be adequately intelligible, so verification of system intelligibility is assuming ever-greater importance. While a number of verification techniques are available, none is without its drawbacks. The paper reviews the available methods and, using the results of new research, highlights areas of weakness of the current techniques.
Limitations of Current Sound System Intelligibility Verification Techniques
Chris Kyriakakis, Sunil Bharitkar, Philip Hilmes
Traditionally, room response equalization is performed to improve sound quality at a given listener. However, room responses vary with source and listener positions. Hence, in a multiple listener environment, equalization may be performed through spatial averaging of magnitude responses at locations of interest. However, the performance of averaging based equalization at the listeners may be affected when listener positions change. In this paper, we present a statistical approach to map variations in listener positions to a performance metric of equalization for magnitude response averaging. The results indicate that, for the analyzed listener configurations, the zone of equalization depends on distance of microphones from a source and the frequencies in the sound.
Robustness of Multiple Listener Equalization with Magnitude Response Averaging
Stephen H. Lampen, David A. DeSmidt
Coaxial cables have been used to run digital audio signals for many years, and have been added to the AES specs (AES3-id). How is coax different from twisted pairs? What are the distance limitations? What trade-offs are made going from digital twisted pairs to digital coax? These questions are all answered in this paper including a discussion of baluns, which are used to convert from one format to the other.
Coax and Digital Audio
There is a measurable interference between correlated signals produced by multiple loudspeakers in a standard five-channel loudspeaker configuration, resulting in an audible comb filter effect. This is due to small individual differences in distances between the ears of the listener and the various loudspeakers. Although this effect is caused by the dimensions and characteristics of the monitoring environment, it can be minimized in the recording process, particularly through the relative placement of microphones and choice of their directional characteristics. In order to analyse this effect, the correlation of microphone signals and their amplitude differences in a recording environment are evaluated using theoretical models. This procedure is applied to coincident and spaced pairs of transducers for direct and reverberant sounds.
The Significance of Interchannel Correlation, Phase and Amplitude Differences on Multichannel Microphone Techniques
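The comb-filter effect described in the abstract can be sketched directly (the path-length difference below is an illustrative number, not the authors' model): two coherent, equal-level copies with delay t = Δd/c sum to a response |1 + e^(-jwt)|, with nulls at odd multiples of 1/(2t):

```python
import math, cmath

C = 343.0  # speed of sound, m/s

def comb_mag(freq, delta_d):
    """Magnitude of the sum of two equal coherent arrivals whose path
    lengths differ by delta_d metres."""
    tau = delta_d / C
    return abs(1 + cmath.exp(-2j * math.pi * freq * tau))

delta = 0.2                   # 20 cm path difference, ear to two loudspeakers
f_null = C / (2 * delta)      # first cancellation, 857.5 Hz
f_peak = C / delta            # first reinforcement, 1715 Hz

dip = comb_mag(f_null, delta)    # complete cancellation
peak = comb_mag(f_peak, delta)   # 6 dB reinforcement
```

Decorrelating the channels, for example through microphone spacing or directivity as the paper investigates, reduces the depth of these nulls at the listening position.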
It is often colloquially said that there are “no rules” for the spatialisation of multichannel content. This paper seeks to identify creative ways in which the multichannel systems of today and tomorrow can be harnessed to provide aesthetically convincing surround environments. The suitability of production techniques based on Ambisonic methods is re-evaluated for this task, and the discussion focuses on the role of multichannel systems in sound design for presentation in the movie theatre.
Breaking and Making the Rules: Sound Design in 6.1
Jason Corey, Wieslaw Woszczyk
Phantom images that rely on interchannel level differences can be produced easily for two-channel stereo. Yet one of the most difficult challenges in production for a five-channel environment is the creation of stable phantom images to the side of the listening position. The addition of simulated early reflection patterns from all five loudspeakers influences the localization of lateral phantom sources. Listening tests were conducted to compare participants' abilities to localize lateral sources under three conditions: power-panned sources alone, sources with simulated early reflection patterns, and simulated early reflection patterns alone (without direct sound). Results compare localization error for the three conditions at different locations and suggest that early reflection patterns alone can be sufficient for source localization.
Localization of Lateral Phantom Images in a 5-channel System with and without Simulated Early Reflections
Koichiro Hiyama, Setsu Komiyama, Kimio Hamasaki
It is important to find out how many loudspeakers are necessary for multichannel sound systems to reproduce the spatial impression of a diffuse sound field. This paper discusses the issue for the case in which loudspeakers are placed symmetrically along a circle centered on the listener, at the height of the listener's ears. It becomes clear that the spatial impression of a diffuse sound field can be reproduced by only two symmetrical pairs of loudspeakers (that is, four loudspeakers in all). In this arrangement, one pair of loudspeakers should be placed in the frontal area with an angle of about 60 degrees between them, and the other pair in the rear area with an angle of 120 to 180 degrees.
The Minimum Number of Loudspeakers and its Arrangement for Reproducing the Spatial Impression of Diffuse Sound Field
Chris Kyriakakis, Ching-Shun Lin
New high-capacity optical discs and high-bandwidth networks provide the capability for delivering multichannel audio. Although there are many one- and two-channel recordings in existence, only a handful of multichannel recordings exist. In this paper we propose a neural network approach that can synthesize microphone signals with the correct acoustical characteristics of specific venues that have been characterized in advance. These signals can be used to generate a multichannel recording with the acoustical characteristics of the original venue. The complex semi-cepstrum technique is employed to extract features from musical signals recorded in a venue, and these signals are fed into the fuzzy cerebellar model articulation controller (FCMAC) for training.
A Fuzzy Cerebellar Model Approach for Synthesizing Multichannel Recordings
Scott G. Norcross, Gilbert A. Soulodre, Michel C. Lavoie
It is now well understood that listener envelopment (LEV) is an essential component of good concert hall acoustics. An objective measure based on the late-lateral energy has been shown to perform well at predicting LEV. One goal of multichannel surround systems is to improve the re-creation of the concert hall experience in a home listening environment. By varying the amount of late-lateral energy, such systems should allow the perception of LEV to be enhanced and controlled. In this paper the loudspeaker/listening room interactions are shown to limit the range of acoustical conditions that can be re-created. A series of formal subjective tests were conducted to determine if objective measures of late-lateral energy are suitable for predicting LEV in multichannel surround systems.
Investigation of Listener Envelopment in Multichannel Surround Systems
It is commonly accepted that the use of a five-channel surround sound reproduction system increases the size of the listening area relative to two-channel stereophonic systems. In fact, for many types of program material, the area of this so-called "sweet spot" is reduced due to interference between the channels at the listener's ears. This effect is described and analysed through theoretical evaluation and psychoacoustic listening tests.
Interchannel Interference at the Listening Position in a Five-channel Loudspeaker Configuration
Myung Soo Kang, Kyoo Nyun Kim
As systems increasingly employ multichannel audio, surround sound capability must be considered in the recording and playback process. This paper presents a structured audio format and its application to the design of a more efficient surround sound system. 3D sound technology is used for the localization of sound sources. We define a reusable sound object to clarify the audio format: a sound object is a unit of recorded sound samples that can be modified by various effect properties. Filter and 3D properties are applied to modify the sound objects in each track.
A Multichannel Surround Audio Format Applying 3D Sound Technology
Doh-Hyung Kim, Jung-Hoe Kim, Sang-Wook Kim
In this paper, a new hybrid scalable lossless audio coding scheme based on MPEG-4 BSAC is proposed. The method introduces two residual error signals, the lossy coding error signal and the prediction error signal, and uses Rice coding as the lossless coding tool. These steps increase the compression ratio: in experiments, the average total file size is reduced to about 50-60% of the original. Consequently, a slight modification of the conventional MPEG-4 general audio coding scheme provides scalable coding functionality spanning lossy to lossless bitstreams.
Scalable Lossless Audio Coding Based on MPEG-4 BSAC
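Rice coding, the lossless tool named in the abstract above, is simple enough to sketch. The fragment below is an illustrative Python sketch (not the paper's implementation): signed residuals are zigzag-mapped to non-negative integers, then coded as a unary quotient followed by a k-bit binary remainder.

```python
def rice_encode(values, k):
    """Rice-code a list of signed prediction residuals with parameter k.

    Each value is zigzag-mapped to a non-negative integer u, then coded as
    (u >> k) in unary followed by the low k bits in binary. The code is
    returned as a string of '0'/'1' characters for clarity.
    """
    bits = []
    for n in values:
        u = 2 * n if n >= 0 else -2 * n - 1            # zigzag map
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0")                     # unary quotient
        bits.append(format(r, f"0{k}b") if k else "")  # k-bit remainder
    return "".join(bits)

def rice_decode(bits, count, k):
    """Invert rice_encode for `count` values."""
    out, i = [], 0
    for _ in range(count):
        q = 0
        while bits[i] == "1":
            q += 1
            i += 1
        i += 1                                         # skip terminating '0'
        r = int(bits[i:i + k], 2) if k else 0
        i += k
        u = (q << k) | r
        out.append(u // 2 if u % 2 == 0 else -(u + 1) // 2)
    return out
```

The parameter k is normally adapted per block to the residual statistics; small residuals from a good predictor yield short codes.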
Lossless audio coding enables the compression of digital audio data without any loss in quality due to a perfect reconstruction of the original signal. The compression is achieved by means of decorrelation methods such as linear prediction. However, since audio signals usually consist of at least two channels, which are often highly correlated with each other, it is worthwhile to make use of inter-channel correlations as well. In the current paper it is shown how conventional (mono) prediction can be extended to stereo and multichannel prediction in order to improve compression efficiency. Results for stereo and multichannel recordings are given.
Lossless Audio Coding Using Adaptive Multichannel Prediction
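The idea of inter-channel prediction can be sketched as fitting a short FIR predictor from one channel to another and entropy-coding only the residual. The following Python fragment is an illustration under an assumed least-squares fit; the paper's adaptive multichannel predictor differs in detail.

```python
import numpy as np

def interchannel_residual(left, right, order=4):
    """Predict `right` from `left` with a short FIR filter fitted by least
    squares, returning the residual that would be entropy-coded instead of
    the raw channel, plus the predictor coefficients.
    """
    # matrix of delayed left-channel samples (delays 0..order-1)
    X = np.column_stack([np.roll(left, d) for d in range(order)])
    X[:order, :] = 0.0                    # discard wrapped-around samples
    coeffs, *_ = np.linalg.lstsq(X, right, rcond=None)
    return right - X @ coeffs, coeffs
```

When the channels are highly correlated, the residual carries far less energy than the original channel, which is exactly what makes the joint coding pay off.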
Dai Yang, Hongmei Ai, Chris Kyriakakis, C.-C. Jay Kuo
Current high quality audio coding techniques mainly focus on coding efficiency, which makes them extremely sensitive to channel noise, especially in high error rate wireless channels. In this work, we propose an error-resilient layered audio codec (ERLAC) which provides functionalities of both fine-grain scalability and error-resiliency. A progressive quantization, a dynamic segmentation scheme, a frequency interleaving technique and an unequal error protection scheme are adopted in the proposed algorithm to construct the final error robust layered audio bitstream. The performance of the proposed algorithm is tested under different error patterns of WCDMA channels with several test audio materials. Our experimental results show that the proposed approach achieves excellent error resilience at a regular user bit rate of 64 kb/s.
Design of Error-Resilient Layered Audio Codec
Dong Yan Huang, Ju-Nia Al Lee, Say Wei Foo, Lin Weisi
Reliable delivery of the audio bitstream is vital to ensure acceptable audio quality as perceived by 3G network customers. In other words, the audio coding scheme employed must be fairly robust over error-prone channels. Various error resilience techniques can be utilized for this purpose. Because some parts of the audio bitstream are less sensitive to transmission errors than others, Unequal Error Protection (UEP) is used to reduce the redundancy introduced by error resilience requirements. The current UEP scheme with convolutional codes and multi-stage interleaving has an unfortunate tendency to generate burst errors at the decoder output as the noise level is increased. A concatenated system combining Reed-Solomon codes with convolutional codes in a UEP scheme is investigated for MPEG Advanced Audio Coding (AAC). Under severe channel conditions with random bit error rates of up to 5×10⁻², the proposed scheme achieved more than 50% improvement in residual bit error rate over the original scheme at a bitrate of 64 kb/s and a sampling frequency of 48 kHz. Under burst error conditions with burst lengths of up to 4 ms, it achieved more than 65% improvement in bit error rate. The average overhead incurred by the concatenated system is about 3.5% relative to the original UEP scheme. Further improvements are made by decreasing the puncturing rate of the convolutional codes; however, this should be adopted only when high protection is needed in extremely noisy conditions (e.g., channel BER significantly exceeding 10⁻²), since it incurs increased overhead.
Application of a Concatenated Coding System with Convolutional Codes and Reed-Solomon Codes for MPEG Advanced Audio Coding
Jeongil Seo, Jin Woo Hong, Kyeoungok Kang, Daeyoung Jang
In this paper, we describe a simple method for reproducing high-frequency components in low-bit-rate audio coding. To compress an audio signal at low bit rates (below 16 kbps per channel), one must use a lower sampling frequency (below 16 kHz) or high-performance audio coding technology. When an audio signal is sampled at a low frequency and coded at a low bit rate, high-frequency components are lost and the sound becomes reverberant because of quantization noise between pitch pulses. Over a short-term period, the harmonic structure of an audio signal is stationary, so replicating the high-frequency bands from the low-frequency bands can extend the frequency range of the resulting sound and enhance its quality. To reduce the number of bands to be reproduced, we apply this algorithm in the Bark-scale domain. For compatibility with conventional audio decoders, the additional bitstream is appended to the end of each frame generated by the conventional audio coder. We applied the proposed algorithm to MPEG-2 AAC and confirmed that it improves audio quality compared with conventional MPEG-2 AAC coding at the same rate. The computational cost of the proposed algorithm is similar to, or slightly higher than, that of a conventional MPEG-2 AAC decoder.
A Simple Method for Reproducing High Frequency Components at Low-Bit Rate Audio Coding
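The core trick of replicating lost high-frequency bands from the transmitted low band can be shown in a toy form. The Python below is a crude linear-frequency sketch under assumed names (`replicate_high_band`, a single transmitted band energy); the paper's method works on Bark-scale bands with per-band side information.

```python
import numpy as np

def replicate_high_band(frame, cutoff_bin, target_energy):
    """Crude high-frequency reconstruction: mirror the spectrum just below
    `cutoff_bin` into the missing band above it and scale the copy to a
    transmitted target energy. Illustrates reusing the short-term harmonic
    structure of the low band.
    """
    spec = np.fft.rfft(frame)
    low = spec[cutoff_bin // 2:cutoff_bin]            # top half of low band
    copy = np.resize(low, len(spec) - cutoff_bin)     # tile into high band
    e = float(np.sum(np.abs(copy) ** 2)) or 1.0
    spec[cutoff_bin:] = copy * np.sqrt(target_energy / e)
    return np.fft.irfft(spec, n=len(frame))
```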
Christof Faller, Raziel Haimi-Cohen, Peter Kroon, Joseph Rothweiler
Some digital audio broadcasting systems, such as Satellite Digital Audio Radio Services (SDARS), transmit many audio programs over the same transmission channel. Instead of splitting up the channel into fixed bitrate subchannels, each carrying one audio program, one can dynamically distribute the channel capacity among the audio programs. We describe an algorithm which implements this concept taking into account statistics of the bitrate variation of audio coders and perception. The result is a dynamic distribution of the channel capacity among the coders depending on the perceptual entropy of the individual programs. This solution provides improved audio quality compared with fixed bitrate subchannels for the same total transmission capacity. The proposed scheme is non-iterative and has a low computational complexity.
Perceptually-Based Joint-Program Audio Coding
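The essence of joint-program coding — sharing one channel among several coders according to momentary demand — can be sketched as proportional allocation. The Python below is a simplified illustration with assumed names and a per-program floor; the paper's algorithm additionally models coder bitrate statistics and perception.

```python
def allocate_rates(entropies, total_rate, floor=8.0):
    """Distribute a shared channel capacity among programs in proportion to
    their momentary perceptual entropy, with a minimum per-program floor.
    All quantities in kb/s.
    """
    spare = total_rate - floor * len(entropies)
    assert spare >= 0, "total rate cannot cover the per-program floor"
    total_pe = sum(entropies) or 1.0
    return [floor + spare * pe / total_pe for pe in entropies]

# three programs of increasing complexity sharing a 224 kb/s channel
rates = allocate_rates([10.0, 30.0, 60.0], total_rate=224.0)
```

The allocation is non-iterative, mirroring the low-complexity property claimed for the scheme.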
Hossein Najaf-Zadeh, Hassan Lahdili, Louis Thibault
In this work, the effect of the inharmonic structure of audio maskers on the produced masking pattern is addressed. In most auditory models, the tonal structure of the masker is analyzed to determine the masking threshold. Psychoacoustic data show that masking thresholds caused by tonal signals are lower than those produced by noise-like maskers. However, the relationship between spectral components has not been considered. It has been found that of two multi-tonal maskers with the same power, the one with a harmonic structure produces the lower masking threshold. This paper proposes a modification to MPEG psychoacoustic model 2 to take into account the inharmonic structure of the input signal. Informal listening tests have shown that the bit rate required for transparent coding of inharmonic (multi-tonal) audio material can be reduced by 10% when the modified psychoacoustic model 2 is used in the MPEG-1 Layer II encoder.
Incorporation of Inharmonicity Effects into Auditory Masking Models
Christof Faller, Frank Baumgarte
In this paper we describe an efficient scheme for compression and flexible spatial rendering of audio signals. The method is based on Binaural Cue Coding (BCC) which was recently introduced for efficient compression of multi-channel audio signals. The encoder input consists of separate signals without directional spatial cues, such as separate sound source signals, i.e. several monophonic signals. The signal transmitted to the decoder consists of the mono sum-signal of all input signals plus a low bit rate (e.g. 2 kb/s) set of BCC parameters. The mono signal can be encoded with any conventional audio or speech coder. Using the BCC parameters and the mono signal, the BCC synthesizer can flexibly render a spatial image by determining the perceived direction of the audio content of each of the encoder input signals. We provide the results of an audio quality assessment using headphones, which is a more critical scenario than loudspeaker playback.
Binaural Cue Coding Applied to Audio Compression with Flexible Rendering
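The core idea — one mono sum-signal plus a low-rate stream of spatial cues — can be sketched in a few lines. The toy Python below keeps only per-band level cues; real BCC also carries time and coherence cues, and `bcc_encode`/`bcc_render` are illustrative names, not the authors' API.

```python
import numpy as np

def bcc_encode(channels, nbands=8):
    """Downmix the inputs to mono and keep per-band level differences (dB)
    of each channel relative to the mono signal as side information."""
    mono = np.mean(channels, axis=0)
    mono_spec = np.fft.rfft(mono)
    bands = np.array_split(np.arange(len(mono_spec)), nbands)
    cues = []
    for ch in channels:
        spec = np.fft.rfft(ch)
        cues.append([10 * np.log10((np.sum(np.abs(spec[b]) ** 2) + 1e-12) /
                                   (np.sum(np.abs(mono_spec[b]) ** 2) + 1e-12))
                     for b in bands])
    return mono, cues

def bcc_render(mono, cues, nbands=8):
    """Resynthesize each channel by re-imposing its band level cues."""
    mono_spec = np.fft.rfft(mono)
    bands = np.array_split(np.arange(len(mono_spec)), nbands)
    out = []
    for ch_cues in cues:
        spec = mono_spec.copy()
        for b, level_db in zip(bands, ch_cues):
            spec[b] *= 10 ** (level_db / 20)
        out.append(np.fft.irfft(spec, n=len(mono)))
    return out
```

Because the cues occupy only a few kb/s, the mono signal can be compressed by any conventional audio or speech coder, as the abstract notes.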
Martin Weishart, Jürgen Herre, Ralph Goebel
Perceptual audio coding has become a widely-used technique for economic transmission and storage of high-quality audio signals. Audio compression schemes, as known from MPEG-1, 2 and 4, allow encoding with either a constant or a variable bitrate over time. While many applications demand a constant bitrate due to channel characteristics, the use of variable bitrate encoding becomes increasingly attractive, e.g. for Internet Audio and portable audio players. Using such an approach can lead to significant improvements in audio quality compared to traditional constant bitrate encoding, but the consumed average bitrate will generally depend on the compressed audio material. This paper presents investigations into "two-pass encoding", which combines the flexibility of variable bitrate encoding with a predictable target bit consumption.
Two-Pass Encoding Of Audio Material Using MP3 Compression
Christian Neubauer, Frank Siebenhaar, Karlheinz Brandenburg
In today's multimedia world, digital content is easily available to and widely used by end consumers. On the one hand, high quality, the ability to be copied without loss of quality and the existence of portable players make digital content, in particular digital music, very attractive to consumers. On the other hand, the music industry is facing increasing revenue loss due to illegal copying. To cope with this problem, so-called Digital Rights Management (DRM) systems have been developed to control the usage of content. However, currently no vendor and no DRM system is widely accepted by the market, due to the incompatibility of different systems, the lack of open standards and other reasons. This paper analyzes the current situation of DRM systems, derives requirements for them and presents technological building blocks to meet these requirements. Finally, an alternative approach to a DRM system is presented that better respects the rights of consumers.
Technical Aspects of Digital Rights Management Systems
Gary S. Brown
Using a configurable microprocessor to implement low-bit-rate audio applications by tailoring the instruction set reduces algorithm complexity and implementation cost. As an example, this paper describes a Dolby Digital (AC-3) decoder implementation that uses a commercially-available configurable microprocessor to achieve 32-bit floating-point precision while minimizing the required processor clock rate and die size. This paper focuses on how the audio quality and features of the reference decoder algorithm dictate the customization of the microprocessor. This paper shows examples of audio specific extensions to the processor's instruction set to create a family of AC-3 decoder implementations that meet multiple performance and cost points. How this approach benefits other audio applications is also discussed.
Configurable Microprocessor Implementation of Low Bit Rate Audio Decoding
Nathan Bentall, Gary Cook, Eamon Hughes, Mike Smith, Christopher Sleight, Peter Eastty, Michael Page
As demand continues to grow for production equipment targeting the high resolution, multi-channel capabilities of SACD [Super Audio CD], there is increasing interest in adding DSD capability to both new and existing systems. The prospect of researching and implementing the necessary algorithms from scratch can be daunting. The high data rates, coupled with the asymmetric multipliers often required by the algorithms, make conventional von-Neumann-type DSP platforms (where most developers traditionally have their DSP expertise) seem sub-optimal. Based on custom DSD audio processing engines and packaged in the very compact SODIMM form factor, the modules described in this paper can add high quality, real time, low latency DSD Audio processing functionality to a system with a minimum of development time.
DSD Processing Modules – Architecture and Application.
Michael Page, Nathan Bentall, Peter Eastty, Christopher Sleight, Gary Cook, Eamon Hughes
The development of large-scale multi-track production equipment for Direct Stream Digital (DSD) audio will be substantially aided by the availability of a convenient multi-channel interface. DSD is the high-resolution 1-bit audio coding system for the Super Audio CD consumer disc format. Existing multi-channel audio interconnection and networking protocols are not easily able to support the high-frequency sample clock required by DSD. A new interconnection type is introduced to provide reliable, low-latency, full-duplex transfer of 24 channels of DSD, using a single conventional office networking cable. The interconnection transfers audio data using Ethernet physical layer technology, whilst conveying a DSD sample clock over the same cable.
Multi-Channel Audio Connection for Direct Stream Digital
James A. S. Angus
This paper clarifies some of the confusion that has arisen over the efficacy of dither in PCM and Sigma-Delta Modulation (SDM) systems. It presents a means of analyzing "in-band" idle tone structure using chaos theory and describes a fair means of comparison between PCM and SDM. It presents results showing that dither can be effective in SDM systems.
The Effect of Idle Tone Structure on Effective Dither in Delta-Sigma Modulation Systems: Part 2
Derk Reefman, Erwin Janssen
A new method for the DC analysis of a Sigma-Delta modulator (SDM) is presented. The model used for the description of an SDM is adopted from Candy's model for a first-order SDM. However, where Candy's model is exact for a first-order SDM, it fails to be so in the higher-order case. In our model, we deal with this by introducing stochastic behaviour of the SDM, and we obtain the probability density function of certain variables that determine many of the characteristics of the SDM in the time domain. Comparison with simulation results shows that the assumption of stochastic behaviour is rather good for SDM orders greater than 3, which display significant noise shaping. For lower orders (or less aggressive noise shaping) the approximation is less good. As an aside, the new model of sigma-delta modulation also clarifies why the 'time-quantized' dither approach presented by Hawksford is much better than standard quantizer dithering.
DC Analysis of High Order Sigma Delta Modulators
Audio power amplifiers have typically been supplied power by the simplest possible means, usually an offline supply with no line or load regulation, most commonly based on a line-frequency transformer. Even modern amplifiers using switchmode power supplies are usually designed without line or load regulation; the exception has been high-end audiophile amplifiers. The pros and cons of a regulated power supply are investigated.
Power Supply Regulation in Audio Power Amplifiers
This paper reviews a progression of circuits used for protecting bipolar power transistors in the output stages of audio power amplifiers. Design oriented methods of determining the protection locus are shown in a mathematical and graphical procedure. The circuits are then expanded from their standard configurations to allow for transient excursion beyond steady state limits, and thermally dependent protection limits, to better match the protection limits to the actual output stage capability. This allows the protection scheme to prevent output stage failure in the least restrictive way. A new method is shown for achieving a junction temperature estimation system without the use of a multiplier.
Audio Power Amplifier Output Stage Protection
Hundreds of millions of tapes are deteriorating, and as they age, more and more of them will become unplayable. This paper describes how to recover these "unplayable" tapes as well as how to store them properly. It also covers the broader issues of archiving audio, including inexpensive high-capacity hard disk drives, equipment obsolescence and new media.
This paper discusses the past, present and future of audio recording education. It describes how the job market and educational requirements have changed, and looks at how to plan for a successful career. It also provides valuable information on getting and keeping a job in today's fast-paced world of professional audio.
Finding A Recording Audio Education Program that Suits Your Career Choice
Robert Laubscher, Richard Foss
This paper explores the use of Universal Plug and Play (UPnP) as a studio control technology. The architecture of a possible studio control technology is introduced. The elements of this studio control architecture are related to the architecture of UPnP. A sample implementation demonstrates the key aspects of using UPnP as a studio control technology.
Exploring Universal Plug and Play as a Studio Control Technology
Richard Foss, Jun-ichi Fujimori
“mLAN” describes a network that allows for the transmission and receipt of audio and music control data by audio devices. IEEE 1394 was chosen as the specification upon which to implement mLAN. mLAN has built on IEEE 1394 and related standards, introducing formats, structures, and procedures that enable the deployment of IEEE 1394 within a music studio context. This paper discusses these standards, their implementations, and provides pointers to the future evolution of mLAN.
mLAN - The Current Status and Future Directions
Norman P. Jouppi, Michael J. Pan
Mutually-immersive audio telepresence attempts to create for the user the audio perception of being in a remote location, as well as simultaneously create the perception for people in the remote location that the user of the system is present there. The system provides bidirectional multi-channel audio with relatively good fidelity, isolation from local sounds, and a reduction of local reflections. The system includes software for reducing unwanted feedback and joystick control of the audio environment. We are investigating the use of this telepresence system as a substitute for business travel. Initial user response has been positive, and future improvements in networking quality of service (QOS) will improve the interactivity of the system.
Mutually-Immersive Audio Telepresence
Chulmin Choi, Lae-hoon Kim, Yangki Oh, Sejin Doo, Koeng-Mo Sung
This paper describes a measurement and assessment method for the sound field in a car cabin, which is so small that conventional room-acoustical parameters cannot be employed directly. First, we measured the sound field using a multi-channel microphone system and calculated several room-acoustical parameters to judge their validity in the car cabin. We conclude that many of the conventional parameters are not meaningful for a car and its audio system. By analyzing the impulse responses of many cars, we developed parameters for a more profound assessment of the sound field in a car.
Assessment of Sound Field in a Car
Fabio Bozzoli, Angelo Farina
The paper describes a measurement system developed for assessing the speech intelligibility inside car compartments. This relates directly to the understandability of the voices being reproduced through the radio receiving system (news, traffic info, etc.), but in the future it will be used also for assessing the direct speech communications between the passengers, and the performance of hands-free communication devices. The system is based on the use of two head and torso simulators, one equipped with an artificial mouth, the second equipped with binaural microphones. Only the second is used when the sound is being reproduced through the car’s sound system. The MLS-based version of the STI method is used for performing the measurements, taking into account the effect of the background noise and the electro-acoustic propagation inside the compartment.
Measurement of the Speech Intelligibility Inside Cars
Level measurements do not necessarily correlate well with subjective loudness. Several methods have been described for making objective measurements that are designed to correlate better with actual loudness. Some of these measures are: A-weighting and B-weighting, the methods of Stevens and Zwicker as described in ISO 532A and B, and the method described by Moore and Glasberg. All of these measures are intended to describe (measure) the loudness of sounds with constant spectra. How well do these measures work with typical audio signals distributed via broadcast or recording?
Comparison of Objective Measures of Loudness using Audio Program Material
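Of the measures named in the abstract above, A-weighting is the simplest: a fixed frequency weighting applied before level measurement. A minimal sketch of its standard analytic form (the usual IEC 61672 approximation, normalised to 0 dB at 1 kHz) in Python:

```python
import math

def a_weighting_db(f):
    """A-weighting relative response in dB at frequency f (Hz), using the
    standard analytic approximation; the +2.00 dB term normalises the
    curve to 0 dB at 1 kHz.
    """
    f2 = f * f
    ra = (12194.0**2 * f2 * f2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20 * math.log10(ra) + 2.00
```

Because the weighting is fixed, it cannot track the spectral and temporal variation of real program material — which is precisely the question the paper investigates.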
Descriptive analysis of processed speech quality was carried out by semantic differentiation, and external preference mapping was used to relate the attributes to overall quality judgements. Clean and noisy speech samples from different speakers were processed by various processing chains representing mobile communications, resulting in a total of 170 samples. The perceptual characteristics of the test samples were described by 18 screened subjects, and the final descriptive language of 21 attributes was developed in panel discussions. The scaled attributes were mapped to overall quality evaluations collected from 30 screened subjects by partial least squares regression. Phase II ideal-point modelling was used to predict quality with an average error of about 6% and to study the linearity of the attributes.
Descriptive Analysis and Ideal Point Modelling of Speech Quality in Mobile Communication
William L. Martens, Charith N. W. Giragama
A single, common perceptual space for a small set of processed guitar timbres was derived for two groups of listeners, one group comprising native speakers of the Japanese language, the other comprising native speakers of Sinhala, a language of Sri Lanka. Subsets of these groups made ratings on 13 bipolar adjective scales for the same set of sounds, each of the two groups using anchoring adjectives taken from their native language. The adjectives were those freely chosen most often in a preliminary elicitation. The results showed that the Japanese and Sinhalese semantic scales related differently to the dimensions of their shared timbre space, which was derived using MDS analysis of the combined dissimilarity ratings of listeners from both groups.
Relating Multilingual Semantic Scales to a Common Timbre Space
Frank Baumgarte, Christof Faller
Binaural Cue Coding (BCC) offers a compact parametric representation of auditory spatial information such as localization cues inherent in multi-channel audio signals. BCC allows to reconstruct the spatial image given a mono signal and spatial cues that require a very low rate of a few kbit/s. This paper reviews relevant auditory perception phenomena exploited by BCC. The BCC core processing scheme design is discussed from a psychoacoustic point of view. This approach leads to a BCC implementation based on binaural perception models. The audio quality of this implementation is compared with low-complexity FFT-based BCC schemes presented earlier. Furthermore, spatial equalization schemes are introduced to optimize the auditory spatial image of loudspeaker or headphone presentation.
Design and Evaluation of Binaural Cue Coding Schemes
Natanya Ford, Francis Rumsey, Tim Nind
An investigation is described which further develops a Graphical Assessment Language (GAL) for subjectively evaluating spatial attributes of audio reproductions. Two groups of listeners, those with previous experience of using a GAL and listeners new to graphical elicitation, were involved in the study which considered the influence of a central automotive loudspeaker on listeners’ perception of ensemble width, instrument focus and image skew. Listeners represented these attributes from both driver’s and passenger’s seats using their own graphical descriptors. Source material for the study consisted of simple instrumental and vocal sources chosen for their spectral and temporal characteristics. Sources ranged from a sustained cello melody to percussion and speech extracts. When analysed using conventional statistical methods, responses highlighted differences in listeners’ perceptions of width, focus and skew for the various experimental conditions.
Evaluating Influences of a Central Automotive Loudspeaker on Perceived Spatial Attributes using a Graphical Assessment Language
Atsushi Marui, William L. Martens
Controlled non-linear distortion effects processing produces a wide range of musically useful outputs, especially in the production of popular guitar sounds. But systematic control of distortion effects has been difficult to attain, due to the complex interaction of input gain, "drive" level, and "tone" controls. Rather than attempting to calibrate the output of commercial effects processing hardware, which typically employs proprietary distortion algorithms, a realtime software-based distortion effects processor was implemented and tested. Three distortion effect types were modeled using both waveshaping and a second order filter to provide more complete control over the parameters typically manipulated in controlling effects for electric guitars. The motivation was to relate perceptual differences between effects processing outputs and the mathematical functions describing the non-linear waveshaping producing variation in distortion. Perceptual calibration entailed the following listening sessions: First, listeners adjusted the tone of each of nine test outputs, then they made both pairwise dissimilarity ratings and attribute ratings for those nine stimuli. The results provide a basis for an effects-processing interface that is perceptually-calibrated for system users.
Multidimensional Perceptual Calibration for Distortion Effects Processing Software
The National Radio Systems Committee’s testing and evaluation program for in-band on-channel digital audio broadcast systems is described. The results of laboratory and field tests performed during 2001 on iBiquity Digital Corporation’s AM-band and FM-band IBOC DAB systems are reported. The conclusions drawn from the laboratory and field test results are also reported, and implications for the future are discussed.
Industry Evaluation of In-Band On-Channel Digital Audio Broadcast Systems
Eunmi L. Oh, Jung-Hoe Kim
The current study assesses the sound quality of various audio codecs, including ubiquitous de-facto standards. Formal listening tests were conducted based on ITU-R Recommendation BS.1116 in order to provide an objective measure of sound quality. The codecs under test included de-facto standards that are commercially and non-commercially available, as well as the MPEG General Audio coder. In addition, our recently updated codec was tested. Test items consisted of the usual MPEG test sequences and other sensitive sound excerpts at bitrates of 64 and 96 kbps stereo. Experimental results show that the sound quality of our newest codec surpasses that of most other codecs.
Comparisons of De-Facto and MPEG Standard Audio Codecs in Sound Quality
Portions of the digital audio chain have been incrementally improved by development, such that objective specifications indicate a very high level of performance. Subjective reviews of these components often claim to observe substantial differences between products. This investigation uses the tool PEAQ (Perceptual Evaluation of Audio Quality) to measure the audio degradation caused by Analog to Digital Converters, Digital to Analog Converters, and Sample Rate Conversion, and also to measure the minute incremental changes of codec audio quality which accompany very small changes in data rate.
Evaluating Digital Audio Artifacts with PEAQ
Richard O. Duda, V. Ralph Algazi, Dennis M. Thompson
This paper concerns the use of a simple head-and-torso model to correct deficiencies in the low-frequency behavior of experimentally measured head-related transfer functions (HRTFs). This so-called "snowman" model consists of a spherical head located above a spherical torso. In addition to providing improved low-frequency response for music reproduction, the model provides the major low-frequency localization cues, including cues for low-elevation as well as high-elevation sources. The model HRTF and the measured HRTF can be easily combined by using the phase response of the model at all frequencies and by "cross-fading" between the dB magnitude responses of the model and the measurements. For efficient implementation, the exact snowman HRTF is approximated by two time delays and two first-order IIR filters. Because the poles are independent of the location of the virtual source, this supports a simple real-time implementation that allows for arbitrarily rapid head and source motion.
The Use of Head-and-Torso Models for Improved Spatial Sound Synthesis
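The combination rule described in the abstract above — model phase at all frequencies, cross-faded dB magnitudes — is simple to state in code. The Python below is an illustrative sketch; the function name and the crossover band (500 Hz to 3 kHz) are assumptions for the example, not values from the paper.

```python
import numpy as np

def combine_hrtfs(model_spec, measured_spec, freqs, lo=500.0, hi=3000.0):
    """Combine a model HRTF and a measured HRTF: keep the model's phase at
    all frequencies and cross-fade between the dB magnitudes, using the
    model below `lo` Hz and the measurement above `hi` Hz.
    """
    w = np.clip((freqs - lo) / (hi - lo), 0.0, 1.0)   # 0 = model, 1 = measured
    mag_db = ((1 - w) * 20 * np.log10(np.abs(model_spec))
              + w * 20 * np.log10(np.abs(measured_spec)))
    phase = np.angle(model_spec)                       # model phase everywhere
    return 10 ** (mag_db / 20) * np.exp(1j * phase)
```

Taking the phase from the model at all frequencies keeps the interaural time cues smooth at low frequencies, where measured HRTFs are typically least reliable.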
Daniel Schobben, Ronald Aarts
Headphone signal processing systems that are commercially available today are not optimized for the individual listener. This results in large localization errors for most listeners. In the present work, a system is introduced that requires a one-time calibration procedure, which can be carried out conveniently by the listener. This system consists of conventional headphones into which small microphones have been mounted. An active noise cancellation method is used to achieve sound reproduction via headphones that is as close as possible to a reference loudspeaker setup. The active noise cancellation system is based on adaptive filters implemented in the frequency domain.
Three-dimensional Headphone Sound Reproduction Based on Active Noise Cancellation
V. Ralph Algazi, Eric J. Angel, Richard O. Duda
This paper addresses the design of virtual auditory spaces that optimize the localization of sound sources under engineering constraints. Such a design incorporates some critical cues commonly provided by rooms and by head motion. Different designs are evaluated by psychoacoustics tests with several subjects. Localization accuracy is measured by the azimuth and elevation errors and the front/back confusion rate. We present a methodology and results for some simple canonical environments that optimize the localization of sounds.
On the Design of Canonical Sound Localization Environments