Audio Engineering Society Papers

AES 121st Convention

San Francisco, CA, USA
October 5-8, 2006

AES Paper Ordering

Single Convention Papers are available through the AES Paper Search and Shop facility.

Papers Listing

6859
CMOS Self-Oscillating Class D Amplifier with Output Optimization
Balkir, Sina; Hoffman, Michael; White, Dan
Simple amplifier topologies are not the norm for integrated circuit (IC) class D amplifiers. A simple self-oscillating topology [1] is mapped into a standard CMOS technology and fabricated in a 0.5 micron process. The output stage is optimized for a range of modulation indices [2], simultaneously increasing average efficiency and reducing chip area. Modifications to the optimization methodology are proposed to enhance efficiency and reduce the large transient currents inherent in CMOS inverter chains. Test results are presented and compared to predicted and simulated values. This project shows that design complexity is not requisite for good performance and high efficiency.

6860
A Digital Class-D Amplifier with Power Supply Correction
Biallais, Arnaud; de Buys, Frans; de Saint-Moulin, Renaud; Dooper, Lûtsen; Putzeys, Bruno; Reefman, Derk; Rutten, Robert; Tol, Jeroen; van den Boom, Jeroen
Digital class-D amplifiers are cost-effective solutions for a wide range of digital audio applications because of their high power efficiency and ease of integration. This paper presents a real-time, cost-effective power supply correction algorithm that substantially increases the power supply rejection of an open-loop digital class-D amplifier. It enables open-loop digital class-D amplifiers to use inexpensive power supplies with less decoupling. Measurements on a prototype amplifier with single-ended speaker loads show 57 dB suppression of 100 Hz supply ripple, while intermodulation products between a 100 Hz ripple and a 1 kHz tone are attenuated by 40 dB. The dynamic range of the amplifier is 94 dB, which is in agreement with phase noise measurements on the on-chip clock generator.
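To illustrate why supply correction matters in an open-loop design, the sketch below models the average output of an idealized switching stage as the modulation value times the supply voltage, and rescales the modulation by the ratio of nominal to measured supply. This is a minimal feed-forward sketch under that assumption, not the algorithm from the paper; the function names, the [-1, 1] modulation range, and the voltages are hypothetical.

```python
def output_voltage(modulation, v_supply):
    """Average output of an idealized open-loop stage (assumed model)."""
    return modulation * v_supply


def correct_modulation(modulation, v_supply, v_nominal):
    """Rescale the modulation so the output tracks the nominal supply.

    Hypothetical feed-forward correction: multiply by v_nominal / v_supply
    and clamp to the assumed valid modulation range [-1, 1].
    """
    if v_supply <= 0.0:
        raise ValueError("supply voltage must be positive")
    corrected = modulation * v_nominal / v_supply
    return max(-1.0, min(1.0, corrected))


# A supply that has drifted from 20 V to 25 V (illustrative values).
uncorrected = output_voltage(0.5, 25.0)
corrected = output_voltage(correct_modulation(0.5, 25.0, 20.0), 25.0)
print(uncorrected, corrected)
```

Without correction the supply deviation appears directly in the output; with the rescaled modulation the output stays at the value the nominal supply would have produced, as long as the clamp is not reached.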

6861
Automotive Class D Digital Amplifier Output Stage
Babb, Mike; Deken, Richard; Lee, Jim; Pauk, Ondrej; Stewart, Brad; Wildhaber, Daniel
Automotive applications using Class D Digital PWM Switching amplifiers have long been limited by the approach’s perceived higher cost, the potential for radiated electromagnetic interference affecting in-vehicle electronics, and the difficulties of designing this type of amplifier using discrete components. An IC built on a high performance Analog Mixed Signal plus Power silicon process, coupled with innovative circuit design, allows the deployment of high power Class D amplifiers in vehicles that have previously been confined to Class AB amplifier designs. Class D PWM amplifiers, switching at 400 kHz or more, can achieve an unweighted dynamic range in excess of 100 dB, linearity in excess of 80 dB, and PSRR greater than 50 dB with a full scale audio signal.

6862
High Performance Digital Feedback for PWM Digital Audio Amplifiers
Midya, Pallab; Paulo, Theresa; Roeckner, Bill
Non-idealities associated with the power stage of pulse width modulated (PWM) based, open loop digital audio amplifiers limit their performance. A high performance digital feedback system corrects for both power supply noise and power stage non-linearity in a PWM digital audio amplifier. An integrated circuit (IC) implementation of this system, along with measured results, is presented. The PWM amplifier, switching between 300 kHz and 400 kHz, achieves an unweighted dynamic range in excess of 100 dB, linearity in excess of 80 dB, and excellent power supply rejection (PSR) with a large scale audio signal.

6863
Asynchronous Sample Rate Converter for Digital Audio Amplifiers
Midya, Pallab; Roeckner, Bill; Schooler, Tony
A high performance digital audio amplifier system requires an asynchronous sample rate converter to synchronize the input digital data stream to the low jitter system clock used to generate the digital PWM output. By performing the sample rate conversion with highly oversampled signals the computation and memory requirements are minimized. The performance of the digital amplifier system is not limited by the sample rate converter while accommodating multiple input and output rates. The digital amplifier system, including the asynchronous sample rate converter, is implemented in an IC. Measured data shows linearity performance exceeding 120 dB.

6864
Jitter Simulation in High Resolution Digital Audio
Hawksford, Malcolm J.
To reconstruct an audio waveform, samples must be located precisely in time. Practical systems have sources of jitter described by both correlated and uncorrelated elements that result in low-level distortion. However, less well known is how different forms of jitter distort an audio signal. Jitter theory is developed to produce a simulator that enables jitter-induced distortion to be determined. Distortion spectra can then be observed and time domain distortion auditioned. Jitter-induced distortion is compared to a range of errors, including DAC errors and incorrect use of dither. System architectures studied include LPCM with up-sampling and noise shaping, and SDM.

6865
The Performance of Look-Ahead Sigma-Delta Modulators with Unstable Noise Shaping Filters
Angus, Jamie A. S.; Hoare, Steve
Look-ahead Sigma-Delta modulators look forward k samples before deciding to output a “one” or a “zero”. We look at the performance of such modulators, when used with unstable noise shaping filters, and examine the trade-off between the number of paths that must be kept alive, and the look-ahead depth needed, to assure stability. Such information helps define the number of required paths in a “Pruned Tree” algorithm or the stack size in a “Stack” algorithm, as well as the minimum search depth required. Results are presented showing that an unstable noise-shaping filter, with an out-of-band gain of 3.0, can be used successfully. This gives a 10-12 dB improvement in signal-to-noise ratio, compared to a conventional modulator.

6866
Prediction and Verification of Powered Loudspeaker Requirements for an Assisted Reverberation System
Poletti, Mark A.; Schwenke, Roger
Electronic enhancement systems are being increasingly used to provide control of the acoustics of multipurpose venues. Reverberation enhancement systems are an important component of such systems. These provide an electroacoustic field which supports the naturally occurring reverberant field using multiple microphones and loudspeakers. This paper derives a simple formula for the on-axis SPL required to support a given sound field. The formula is verified by measurements on a theatre with an installed assisted reverberation system. A relative measurement method is also developed to allow the maximum level to be determined from measurements at a lower level.

6867
The Representation of, and Control over Mixing Desks via a Software-based Matrix
Foss, Richard; Foulkes, Philip
The control over all the parameters of a mixing desk can be a daunting task. This paper describes a software system that has been created to represent the signal processing and routing functions of MIDI-controllable mixing desks in a conceptually clear manner. Input to output routings are displayed in the form of a matrix, while signal processing functionality can be accessed at the inputs, outputs, and cross-points. XML is used to capture the elements of the mixing desk, and to associate appropriate MIDI control messages with these elements. This enables the same matrix template to be used for many mixing desks. Remote control is enabled by IP-based MIDI routing software known as MIDINet.

6868
A Cascaded Delta-Sigma DAC with DWA for Decreasing Mismatch Effect
Katsumi, Syunsuke; Terada, Yousuke; Yasuda, Akira; Zen, Masao
In this paper, we propose a small, high-performance cascaded delta-sigma DAC (CDS-DAC) that uses an analog FIR filter with a second-order mismatch shaping function. In a multi-bit DAC, mismatch caused by element variation degrades the overall performance of the analog part. The proposed CDS-DAC has a second-order mismatch shaping function, realized with several switches in the analog part and a first-order mismatch shaper in the digital part, to reduce the performance degradation caused by the mismatches. Simulation results show that the SNR of the proposed CDS-DAC is 122 dB when the oversampling ratio is 128 and the component mismatch is 1%.

6869
An Enhanced Encoder for the MPEG-4 ALS Lossless Coding Standard
Harada, Noboru; Kamamoto, Yutaka; Moriya, Takehiro
MPEG-4 ALS is a lossless coding standard for audio signals based on time-domain prediction, and it was officially published in March 2006. Enhanced encoder algorithms and implementation examples of MPEG-4 ALS are shown in this paper. To reduce the computational complexity of the encoder, new algorithms have been developed for the multi-channel prediction coding (MCC) tool, the long-term prediction (LTP) tool, and the hierarchical block switching (BS) tool. In addition, processing speed has been enhanced by means of software optimization, including assembler code. As a result of these improvements, the software encodes up to 10 times faster than the MPEG reference software and becomes more useful for various practical applications.

6870
Perceptually Biased Linear Prediction
Biswas, Arijit; den Brinker, Albertus C.
A perceptually biased linear prediction scheme is proposed for audio coding. Using only simple modifications of the coefficients defining the normal equations for a least-squares error, the spectral masking effects are mimicked in the prediction synthesis filter without using an explicit psycho-acoustic model. The main advantage of such a scheme is the reduced computational complexity. The proposed approach was implemented in a Laguerre-based Linear Prediction scheme and its performance has been evaluated in comparison with a Laguerre-based Linear Prediction approach controlled by the ISO MPEG-1 Layer I-II model, as well as with one of the latest spectral integration based psycho-acoustic models. Listening tests clearly demonstrate the viability of the proposed method.

6871
Fast Complex Quadrature Mirror Filterbanks for MPEG-4 HE-AAC
Hsu, Han-Wen; Lee, Wen-Chieh; Liu, Chi-Min
Spectral Band Replication (SBR) has been introduced in MPEG-4 HE-AAC as a bandwidth extension tool. The entire SBR framework operates in the complex-valued domain to avoid aliasing, which results in considerable time complexity. This paper focuses on the complex Quadrature Mirror Filter (QMF) banks used in the HE-AAC encoder and decoder, and proposes two fast decomposition methods, based on the DCT and DFT respectively, for the time-consuming matrix operations in the filterbanks. The time complexity can then be effectively reduced by fast DCT and FFT algorithms.

6872
Compression Artifacts in Perceptual Audio Coding
Chang, Chia-Ming; Hsu, Han-Wen; Lee, Kan-Chun; Lee, Wen-Chieh; Liu, Chi-Min; Tang, Shou-Hung; Yang, Chung-Han; Yang, Yung-Cheng
Perceptual audio coding achieves a high compression ratio by exploiting perceptual irrelevance and data redundancy. Through the use of advanced and sophisticated signal processing techniques, perceptual coding generates artifacts that sound very different from traditional distortions. In the audio industry, modeling, measuring, and listening to the artifacts induced by a technology is always an important step in maturing it. In the past, some types of artifacts have been defined for linear quantization or MP3 music tracks. With the advance of the new coding modules in AAC, SBR, and parametric coding, various new types of artifacts are generated. This paper models the frequently induced audible artifacts and analyzes the problematic encoder modules.

6873
Design of HE-AAC Version 2 Encoder
Chang, Chia-Ming; Hsu, Han-Wen; Lee, Kan-Chun; Lee, Wen-Chieh; Liu, Chi-Min; Yang, Chung-Han; Yang, Yung-Cheng; Tang, Shou-Hung
HE-AAC version 2 primarily consists of three encoders: AAC, Spectral Band Replication (SBR), and Parametric Stereo Coding (PS). In the past, we have considered several modules in the three encoders. This paper defines other design issues and considers the associated solutions related to HE-AAC version 2. Subjective and objective tests are used to check the quality improvement.

6874
Analysis and Synthesis for Universal Spatial Audio Coding
Goodwin, Michael M.; Jot, Jean-Marc
Spatial audio coding (SAC) addresses the need for efficient representation of multichannel audio content. SAC methods are typically based on analyzing inter-channel relationships in the input audio and resynthesizing those same relationships between the output channels. Recently, a method was proposed and demonstrated based on analyzing the input audio scene and describing it without reference to the channel configuration, thereby enabling flexible, accurate rendering on arbitrary output systems. In this paper, we provide further mathematical treatment of this universal spatial audio coding system; we develop an analysis-synthesis method based on a linear algebraic model, present an efficient approach for adapting the synthesis to arbitrary speaker configurations, and describe a straightforward scheme for scalable reduction of the spatial cue data.

6875
A Novel Very Low Bit Rate Multi-Channel Audio Coding Scheme using Accurate Temporal Envelope Coding and Signal Synthesis Tools
Dubey, Chandresh; Ferreira, Anibel J. S.; Gupta, Richa; Sinha, Deepen
Multichannel audio is increasingly ubiquitous in consumer audio applications such as satellite radio broadcast systems, surround sound playback systems, multichannel audio streaming, and other emerging applications. These applications often present a challenging bandwidth constraint, making parametric multichannel coding schemes attractive. Several techniques have been proposed recently to address this problem. Here we present a novel low bit rate 5-channel encoding system that has shown promising results. This technique, called the Immersive Soundfield Rendition (ISR) System, emphasizes accurate reproduction of the multi-band temporal envelope. The ISR system also incorporates a very low overhead (blind upmixing) mode. The proposed multichannel coding system has yielded promising results for multi-channel coding in the 0-12 kbps range. More information and audio demos will be available at http://ww.atc-labs.com/isr

6876
New Results in Low Bit Rate Speech Coding and Bandwidth Extension
Annadana, Raghuram; E. V., Harinarayanan; Ferreira, Anibel; Sinha, Deepen
Emerging digital audio applications for broadcast radio and multimedia systems present new challenges, such as the need to code mixed audio content, error robustness, higher audio bandwidth, and the need to deliver high quality audio at low bit rates. These demand a paradigm shift in existing low bit rate speech coding techniques. This paper describes the continuation of our research concerning low bit rate speech coding and enhancements in the recently introduced bandwidth extension toolkit, Audio Bandwidth Extension Toolkit (ABET). Several new modes of operation have been introduced in the codec with the innovative use of perceptual coding tools. In addition, a new mode in ABET is added to improve the efficiency of the temporal shaping tool, Multi Band Temporal Amplitude Coding (MBTAC), by exploiting the time and frequency correlation in signals. The structure of the codec and its performance in these modes of operation are detailed. Audio demos and further information are available at http://www.atc-labs.com/lbr/

6877
A New Method for Measuring Distortion using a Multitone Stimulus and Non-Coherence
Brunet, Pascal; Temme, Steve
A new approach for measuring distortion based on dual-channel analysis of non-coherence between a stimulus and response is presented. This method is easy to implement, provides a continuous distortion curve against frequency, and can be used with a multitone stimulus, noise, or even music. Multitone is a desirable test signal for fast frequency response measurements and also for assessing system non-linearities. However, conventional single-channel multitone measurements are challenging because the number of intermodulation tones grows exponentially with the number of stimulus tones and makes it extremely difficult to separate harmonics from intermodulation products. By using dual-channel measurement techniques, only well-known, standard signal processing techniques are used, resulting in simplicity, accuracy and repeatability.
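The dual-channel idea can be sketched with standard spectral estimates. The example below assumes that non-coherence is taken as one minus the magnitude-squared coherence between stimulus and response, estimated by averaging auto- and cross-spectra over non-overlapping segments; the abstract does not spell out the exact definition, and the function names, segment length, and test signals are hypothetical.

```python
import cmath
import random


def dft_bin(segment, k):
    """Return DFT bin k of one segment (direct sum, no window)."""
    n = len(segment)
    return sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
               for i, v in enumerate(segment))


def non_coherence(stimulus, response, seg_len, k):
    """Estimate 1 - coherence at bin k over non-overlapping segments."""
    gxx = gyy = 0.0
    gxy = 0j
    for start in range(0, len(stimulus) - seg_len + 1, seg_len):
        x = dft_bin(stimulus[start:start + seg_len], k)
        y = dft_bin(response[start:start + seg_len], k)
        gxx += abs(x) ** 2          # stimulus auto-spectrum
        gyy += abs(y) ** 2          # response auto-spectrum
        gxy += x.conjugate() * y    # cross-spectrum
    if gxx == 0.0 or gyy == 0.0:
        raise ValueError("no signal energy in this bin")
    return 1.0 - abs(gxy) ** 2 / (gxx * gyy)


# Illustrative signals: seeded noise through a linear gain and
# through a memoryless quadratic nonlinearity.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
linear = [2.0 * v for v in noise]
distorted = [v + 0.5 * v * v for v in noise]

nc_linear = non_coherence(noise, linear, seg_len=16, k=3)
nc_distorted = non_coherence(noise, distorted, seg_len=16, k=3)
print(nc_linear, nc_distorted)
```

For the purely linear path the estimate is zero up to rounding error, while the quadratic term adds response energy that is not linearly related to the stimulus and so raises the non-coherence.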

6878
Gradient Microphone Design using Miniature Microphone Arrays
Backman, Juha
The paper describes a method of using a dense array of miniature microphones (e.g. MEMS or miniature electret) to yield precise one-point multi-channel gradient microphones. The signals obtained from individual microphones in the array are used to obtain an estimate for the zeroth, first-, and second-order components of the gradient of the sound field at the center of the array. (Higher orders of the gradient tend to be too noisy for actual sound recording purposes.) These can be used to form stereo or multi-channel signals with adjustable polar patterns for recording purposes.
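One simple way to picture the gradient estimate is with finite differences. The sketch below assumes three collinear microphones at positions -d, 0, and +d and uses central differences; the array geometry and estimator used in the paper are not given in the abstract, so the function name and layout here are hypothetical.

```python
def gradient_components(p_minus, p_center, p_plus, spacing):
    """Estimate zeroth-, first-, and second-order components at the center.

    Assumes pressure samples taken at -spacing, 0, and +spacing
    along one axis (an illustrative geometry).
    """
    if spacing <= 0.0:
        raise ValueError("spacing must be positive")
    zeroth = p_center
    first = (p_plus - p_minus) / (2.0 * spacing)
    second = (p_plus - 2.0 * p_center + p_minus) / (spacing ** 2)
    return zeroth, first, second


def field(x):
    """Illustrative quadratic pressure field along the axis."""
    return 1.0 + 2.0 * x + 3.0 * x * x


d = 0.5
components = gradient_components(field(-d), field(0.0), field(d), d)
print(components)
```

Because the second difference divides by the square of the spacing, sensor noise is amplified more strongly at each higher order, which is consistent with the abstract's remark that higher orders tend to be too noisy for recording.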

6879
Wind Generated Noise in Microphones - An Overview - Part II
Brixen, Eddy B.
When microphones are exposed to wind, noise is generated. The amount of noise generated depends on many factors, the speed and direction of the wind being two of the most important. However, the size, shape, and design principles of the microphones are also very important factors. At higher wind speeds, not only is noise generated but distortion is also introduced, normally as a result of clipping. This paper is the second in a series of two. The first paper presented standard condenser microphones with first-order characteristics. This paper presents a number of comparative measurements on electrodynamic microphones, some of the workhorses in the field.

6880
A Portable Record Player for Wax Cylinders Using both Laser-beam Reflection and Stylus Methods
Fukube, Tohru; Shimizu, Yasuyuki
The wax phonograph cylinder invented by Thomas Edison in 1885 was the medium for recording sound until about 1930. Although many historically valuable cylinders have been preserved all over the world, most of them have deteriorated through re-crystallization and have many cracks on their surfaces. We have developed a portable record player (weight 3.4 kg, width 45 cm, length 33 cm, height 10.5 cm) for cylinders (2-minute, 55 mm diameter, 105 mm height, 400 grooves) using both laser-beam reflection and stylus methods. In this article, the record player is shown to be easy to carry by hand and able to reproduce sound in real time from damaged as well as undamaged wax cylinders.

6881
High-Accuracy Full-Sphere Electro Acoustic Polar Measurements at High Frequencies using the HELS Method
Keele, Jr., Don. B. (Don); Lu, Huancai; Wu, Sean
Traditionally, high-accuracy full-sphere polar measurements require dense sampling of the sound field at very-fine angular increments, particularly at high frequencies. The proposed HELS (Helmholtz Equation Least Squares) method allows this restriction to be relaxed significantly. Using this method, far fewer sampling points are needed for full and accurate reconstruction of the radiated sound field. Depending on the required accuracy, sound fields can be reconstructed using only 10 to 20% of the number of sampling points required by conventional techniques. The HELS method allows accurate reconstruction even for sample spacing that violates the Nyquist spatial sampling rate in certain directions. This paper examines the convergence of HELS solutions via theory and simulation for reconstruction of the acoustic radiation patterns generated by a rectangular plate mounted on an infinite rigid flat baffle. In particular, the impact of the numbers of expansion terms and measurement points as well as errors imbedded in the input data on the resultant accuracy of reconstruction is analyzed.

6882
Measurement and Visualization of Loudspeaker Cone Vibration
Klippel, Wolfgang; Schlechter, Joachim
Optical measurement of loudspeaker cone vibration (scanning vibrometry) can also be accomplished by using a laser triangulation technique, which is a cost-effective alternative to Doppler interferometry. Since triangulation sensors primarily provide displacement, advanced signal processing is required to measure the break-up modes up to 20 kHz at a sufficient signal-to-noise ratio. In addition to stroboscopic animation of the radiation pattern, a new decomposition technique is presented for the visualization of the measured data. Radial and circular modes can be separated, and the total vibration can be split into radiating and non-radiating vibration components. This kind of post-processing reveals critical vibration modes, simplifies the interpretation, and gives indications for further improvements.

6883
An Extended Small Signal Parameter Loudspeaker Model for the Linear Array Transducer
Axelsson, Jens-Peter; Jabbari, Ali; Little, Richard W.; Struck, Christopher J.; Unruh, Andrew D.
The Linear Array Transducer (LAT) is a tubular form-factor loudspeaker driver technology which, to a good first approximation, can be modeled by the standard linear time invariant small signal parameter (SSP) loudspeaker circuit model. However, to understand the behavior of a LAT in greater detail, the SSP model can be extended with four additional parameters. The nature of these additional parameters in the model is explained. Additionally, the model is correlated to measurements of currently available LATs. Finally, it is shown how the LAT extended SSP model is approximated by the standard loudspeaker SSP model.

6884
An Optimized Full-Bandwidth 20Hz-20kHz Digitally Controlled Co-Axial Source
Debail, Bernard G.A.; Shaiek, Hmaied; Kerneis, Yvon; Boucher, Jean Marc; Diquelou, Pierre Yves
In this paper, we address the design considerations of the first four-way full-bandwidth co-axial loudspeaker system. Unlike conventional co-axial drivers, a special motor configuration has been considered in order to approach a real co-incident source. Practical constraints of moving mass element and cone shape will be shown with reference to the targeted sound radiation characteristics. Dedicated digital signal processing techniques have been implemented in order to optimize relevant parameters such as frequency response, directivity index, and radiation pattern of the system, especially in the drivers’ overlap bands. The optimization is obtained by a complex weighting of the crossover filter transfer functions. The optimal weights are obtained with a new routine using the gradient algorithm.

6885
Linear Phase Crossover Filters Advantages in Concert Sound Reinforcement Systems: a practical approach
Di Cola, Mario; Hadelich, Miguel T.; Ponteggia, Daniele; Saronni, Davide
Today's concert sound reinforcement systems face constant demand for improved performance. The crossover approach can be key to improving overall loudspeaker performance, and modern DSP offers several ways to do so. FIR linear phase filtering, a form of processing that belongs strictly to the digital domain, could be one of the most useful. Although the FIR linear phase crossover technique is well known and has long been studied, for practical reasons it is still not widely applied. Nevertheless, better system transient response together with increased depth and warmth can be obtained with this method, even though some limitations need to be considered. The results of applying an existing DSP-based device to several real-world loudspeaker systems will be shown together with the achievable advantages: improvements in overall time and phase response, improved polar response stability, and increased general output capability if very steep brick-wall functions are used. Issues and limitations that can arise when setting a loudspeaker crossover with linear phase filters will also be analyzed and shown. To compare the results, several loudspeaker systems were processed and analyzed using both FIR linear phase and standard crossover techniques. The results will be shown and supported by extensive real measurements.

6886
Optimum Diaphragm and Waveguide Geometry for Coincident source Drive Units
Dodd, Mark
Coincident source loudspeakers avoid the response and directivity irregularities seen with conventional spaced drivers in the crossover region. Earlier work has shown that by placing the high frequency driver at the apex of the low frequency diaphragm the directivity of both drivers may be regularized at the crossover frequency. This paper describes the application of finite element and boundary element methods to explore how some simple sources interact with various boundary conditions. A novel geometry giving much-improved bandwidth and directivity is introduced. Simulated results of an idealized high frequency driver using this geometry are compared to those of an idealized direct radiating dome. The implementation of the new design using this novel structure is discussed and measured results from the complete design are presented.

6887
Investigation of Hearing Loss Influence on Music Perception, In Auditoria, by Means of Stereo Dipole Reproduction
Binelli, Marco; Capra, Andrea; Farina, Angelo; Marmiroli, Daniela; Martignon, Paolo
Most people who sit in theatres or auditoria do not have an optimal perception of sound because of hearing loss. To find a correlation between objective parameters and subjective descriptors, and therefore to run meaningful listening tests, we first need to study the perception of the audience. For this purpose, selected theatergoers differing in age, sex, and degree were chosen as subjects. The listening test was based on the virtual spatial recreation of several theaters by means of an optimized stereo dipole technology. The test was repeated for 30 subjects, with and without a hearing aid that had previously been set to compensate for the auditory loss. Some preliminary data analysis results are shown here.

6888
Audibility of Linear Distortion with Variations in Sound Pressure Level and Group Delay
Geddes, Earl R.; Lee, Lidia W.
Recent psychoacoustic studies of nonlinear distortion have yielded new insights into what audible problems in loudspeakers might be related to. This paper presents the results of recent subjective tests that extend previous work to show that sound level significantly affects the perception of linear distortion in audio systems. This means that the hearing system itself is nonlinear, and what has been thought of as nonlinear distortion in the audio system may actually be a nonlinear perception directly in the receiver itself.

6889
Development and Evaluation of Short-Term Loudness Meters
Lavoie, Michel C.; Soulodre, Gilbert A.
Recently, much effort has been devoted to developing and evaluating an algorithm that can accurately measure the long-term loudness of mono, stereo, and multichannel audio signals. This has resulted in a new ITU Recommendation that provides a single loudness reading for the overall audio sequence. In many applications it is desirable to also have a measure that can continuously track the short-term loudness of the audio signal over time. Such a meter would be used in conjunction with existing metering methods to provide additional information about the audio signal. In the present study, subjective test methods are devised to aid in the development of a short-term loudness meter. Subjective methods for evaluating the meter’s performance are also explored.

6890
Headphones Listening Tests
Opitz, Martin
In the present work a dedicated listening environment for the subjective evaluation of headphones is described. Focus is on the subjective sound quality of different headphones. The listening environment comprises a dedicated playback-system in a reference listening room. A software package consisting of 3 parts is developed allowing the design, execution and post-processing of headphones listening tests. First experience with this tool and results of subjective audio ratings for 4 different headphones are described. Factorial analysis of the obtained responses suggests that a projection of the ratings in a 2-dimensional factorial space results in negligible loss of information. The benefit of a new membrane technology results in significantly improved subjective ratings of the respective headphones.

6891
Distance Perception of Phantom Sound Images Presented by Multiple-Loudspeakers Placed at Different Distance in Front of Listener
Hamasaki, Kimio; Kurozumi, Kohichi; Okumura, Reiko
Distance perception of the composite sound image reproduced by multiple loudspeakers, which were placed facing a listener (near and far) on a horizontal plane, was investigated. The loudspeakers were placed at different distances in front of the listener and reproduced both phantom and real sound images. The results of subjective evaluations indicated that phantom sound images could be formed by the near and far loudspeakers and that listeners could distinguish the distance of these phantom images from that of the real sound images presented by each loudspeaker.

6892
Implementation of Swing Sound Image and Its Localization Accuracy in Two-Channel Stereo Sound Reproduction
Kudo, Akihiro; Kubo, Seiya; Hokari, Haruhide; Shimada, Shoji
In virtual sound reproduction with headphones, a well-known problem is that using nonindividualized head-related transfer functions (HRTFs) yields front-back confusion in sound image localization. To overcome this problem, a swing sound image method has already been reported that significantly reduces the front-back confusion in single sound source reproduction. In order to apply the method to two-channel stereo sound reproduction, this paper describes two methods of producing the swing sound image: the twist and compand methods. Three listening tests are used to assess their localization validity. The results show that, with suitable parameters, these methods can reduce front-back confusion.

6893
Error-Robust Frame Splitting For Audio Streaming Over the Lossy Packet Network
Chang, Joon-Hyuk; Kim, Jong Kyu; Kim, Jung Su; Kim, Nam Soo; Yun, Hwan Sik
In this paper, we propose a novel audio streaming scheme for perceptual audio coders over packet-based networks. First, a single frame is split into several subframes, sized according to the packet size, that can be decoded independently for robust error concealment. We then further improve the subframe splitting technique by allocating spectral lines adaptively. An informal listening test shows that our approach improves the audio signal over a lossy packet network.

6894
Adaptive Filter Banks using Fixed Size MDCT and Subband Merging for Audio Coding - Comparison with the MPEG AAC Filter Banks
Bimbot, Frederic; Camberlein, Ewen; Philippe, Pierrick
The MPEG Audio AAC Standard [1] uses two MDCT sizes and transition windows in order to adapt these uniform transforms to the signal specifics. Here, we present a new adaptation scheme based on a fixed size MDCT where subband merging is done in order to obtain better temporal resolution on chosen parts of the spectrum. The performance of the proposed approach is evaluated against AAC using an objective measure that takes the temporal masking effect into account.

6895
An Audio Archiving Format Based on the MPEG-4 Audio Lossless Coding
Harada, Noboru; Kamamoto, Yutaka; Moriya, Takehiro
An audio archiving tool using MPEG-4 Audio Lossless Coding as the encoding engine offers excellent compression performance and the functionality to handle several audio files as one archived file. The archiving tool is suitable for a variety of applications. Test results show that the compression performance of the proposed tool is much better than that of ZIP when the input data are audio files. This application format for the archiving tool has been proposed to MPEG-A and is being discussed as one of the Multimedia Application Formats under development.

6896
Quality Improvement of Scalable Audio Codec Based on Phase Estimation Technique For Reconstructed Harmonic Structure
Chang, Wei-Chen; Su, Alvin W.Y.
An audio coder based on spectral oriented trees with harmonic structure reconstruction was proposed recently. Its fine scalability, low complexity, and almost MP3 quality make it suitable for Internet applications. However, during the reconstruction process, the phase information of the reconstructed coefficients is absent. This may create phase discontinuities between two adjacent frames, which are audible in listening tests and degrade the objective grade for many test sources. In this paper, we present a new inter/intra-frame phase estimation method to reduce the problem, and a refined harmonic reconstruction method is also applied. The quality improvement is significant. The new method outperforms the previous one, and its performance is closer to that of the popular MP3pro for many music sources.

6897
The Singing Tutor: Expression Categorization and Segmentation of the Singing Voice
Bonada, Jordi; Loscos, Alex; Mayor, Oscar
Computer evaluation of singing interpretation has traditionally been based exclusively on tuning and tempo. This article presents a tool for the automatic evaluation of singing voice performances that considers not only tuning and tempo but also the expression of the voice. For this purpose, the system performs analysis at note and intra-note levels. Note level analysis outputs traditional note pitch, note onset and note duration information, while intra-note level analysis is in charge of the location and the expression categorization of note attacks, sustains, transitions, releases and vibratos. Segmentation is done using an algorithm based on untrained HMMs with probabilistic models built out of a set of heuristic rules. A graphical tool for the evaluation and fine-tuning of the system will be presented. The interface gives feedback about analysis descriptors and rule probabilities.

6898
Expert System for Automatic Classification and Quality Assessment of Singing Voices
Zwan, Pawel
The aim of the research work presented is an automatic singing voice quality/type recognition system. For this purpose a database containing singers’ sample recordings is constructed and parameters are extracted from recorded voices of trained and untrained singers of different voice types. Parameters, which are especially designed for the analysis of the singing voice, are analyzed and a feature vector is formed. Each of the singers’ voice samples is judged by experts and information about voice type/quality is obtained. The extracted parameters are used in the training process of a neural network, and the effectiveness of automatic voice timbre/quality classification is tested by comparing automatic recognition results with subjective expert judgements. Finally, a discussion of the results is presented and conclusions are derived.

6899
Facilities Used for Introductory Electronic Music: A Survey of Universities with an Undergraduate Degree in Audio
Akins, Joseph
This study reports the facilities used for introductory electronic music in United States universities that offered an undergraduate degree in audio production and technology in fall of 2005. The population included 54 programs listed on the Audio Engineering Society’s Directory of Educational Programs. With an online questionnaire, each university reported on the first hands-on electronic music course offered at their institution. With a response rate of 81%, the respondents reported on specific hardware, software, purposes, and curricular application. For example, 93% of the respondents reported using Mac OS, while 20% reported using Microsoft Windows.

6900
Improvements to a Sample-Concatenation Based Singing Voice Synthesizer
Blaauw, Merlijn; Bonada, Jordi; Loscos, Alex
This paper describes recent improvements to our singing voice synthesizer based on concatenation and transformation of audio samples using spectral models. Improvements include, first, robust automation of the previous singer database creation process, a lengthy and tedious task that involved recording script generation, studio sessions, audio editing, spectral analysis, and phonetic-based segmentation; and, second, enhancement of the synthesis technique, improving the quality of sample transformations and concatenations, and discriminating between phonetic intonation and musical articulation.

6901
Modeling Musical Articulation Gestures in Singing Voice Performances
Bonada, Jordi; Maestre, Esteban; Mayor, Oscar
We present a procedure to automatically describe musical articulation gestures used in singing voice performances. We detail a method to characterize temporal evolution of fundamental frequency and energy contours by a set of piece-wise fitting techniques. Based on this, we propose a meaningful parameterization that allows reconstructing contours from a compact set of parameters at different levels. We test the characterization method by applying it to fundamental frequency contours of manually segmented transitions between adjacent notes, and train several classifiers with manually labeled examples. We show the recognition accuracy for different parameterizations and levels of representation.

6902
Automatic Tonal Analysis from Music Summaries for Version Identification
Gómez, Emilia; Herrera, Perfecto; Ong, Beesuan
Identifying versions of the same song by means of automatically extracted audio features is a complex task to achieve using computers, even though it may seem very simple for a human listener. The design of a system to perform this job gives the opportunity to analyze which features are relevant for music similarity. This paper focuses on the analysis of tonal and structural similarity and its application to the identification of different versions of the same piece. This work describes the situations where a song is versioned and several musical aspects are transformed with respect to the canonical version. A quantitative evaluation is made using tonal descriptors, including chroma representations and tonality, combined with the automatic extraction of summaries of pieces through music structural analysis.

6903
Groovator - An Implementation of Real-Time Rhythm Transformations
Bonada, Jordi; Janer, Jordi; Jordà, Sergi
This paper describes a real-time system for rhythm manipulation of polyphonic audio signals. A rhythm analysis module extracts information about tempo and beat location. Based on this rhythm information, we apply different transformations: "Tempo variation", "Groove", "Meter variation" and "Accent variation". This type of manipulation is generally referred to as content-based transformation. We address characteristics of the analysis and transformation algorithms. In addition, user interaction also plays an important role in this system. Tempo variations can be controlled either by tapping the rhythm with a MIDI interface or by using an external audio signal as tempo control. We conclude by pointing out several use cases, focusing on live performance situations.

6904
A Personalized Preset-based Audio System for Interactive Service
Jang, Daeyoung; Lee, Taejin; Lee, Yongju; Yoo, Jae-hyoun
A conventional audio service provides a single mixed audio scene to the user, so the user can control only the overall volume. In a personalized audio service, however, the user can control properties of audio objects such as loudness, direction, and distance to construct his/her own audio scene. Because creating an audio scene is not easy for ordinary users, we adopted a preset-based system that offers various audio scenes from which the user can conveniently choose according to his/her preference. The system consists of an authoring tool, a streaming server, and a terminal. In this paper, we present the design and implementation of a personalized preset-based audio system and describe the simulation results and applications.

6905
Ensemble Hand-Clapping Experiments under the Influence of Delay and Various Acoustic Environments
Farner, Snorre; Sæbø, Asbjørn; Solvang, Audun; Svensson, Peter
Hand-clapping experiments were performed by pairs of subjects under the influence of a delay up to 68 ms in various acoustic environments. The mean tempo decreased close to linearly as a function of the delay. During each sequence the tempo slowed down to a degree that increased with the delay, but for delays shorter than about 15–23 ms, the tempo increased during the sequence. For the timing imprecision, and for the subjects’ judgements of their own ensemble performance, no effect of the delay could be observed up to 20 ms. Above 32 ms the effects were observed to increase with the delay. Virtual anechoic conditions led to a higher imprecision than the reverberant conditions, and real-reverberation conditions led to a slightly lower tempo.

6906
Audio System for Portable Market
Archibald, Fitzgerald J.
This paper describes the audio system software for portable audio players with respect to software and System-on-a-chip (SoC) architecture. The software system for portable devices includes audio playback, radio, audio recording, movie playback, and image viewer applications. In addition, portable systems can contain gaming and navigation applications. Portable audio players demand low power and small form factors, and are differentiated by a wide array of audio effects such as equalizer, Time Scale Modification (TSM), and cross-fade. The SoC plays a critical role in determining audio quality, audio features, power efficiency, battery life, form factor, time to market and cost.

6907
Design and Evaluation of a High Performance Class D Headphone Driver
Magrath, Anthony J.
This paper presents the design and bench evaluation of a Class D headphone amplifier and provides compelling arguments as to why it is advantageous to use Class D, even for output powers as low as 40mW. Design tradeoffs are discussed which show how significant savings in power can be achieved at typical listening levels when compared with a conventional class AB amplifier.

6908
Loudspeaker-Room Adaptation for a Specific Listening Position using Information about the Complete Sound Field
Abildgaard Pedersen, Jan
A novel method is presented for equalizing a loudspeaker for a specific listening position in order to compensate for the influence of the room in which it is positioned. The method is based on measuring the sound pressure in the listening position (focus position) and in at least 3 randomly selected positions scattered across the entire listening room (room positions). The measurement in the listening position holds information about the listener’s access to the sound field while the room positions hold information about the energy in the 3D sound field. The correction for the listening position is then bound by upper and lower gain limits calculated as a function of frequency from the information about the 3D sound field.

6909
Allpass Arrays: Theory, Design, and Applications
Goodwin, Michael M.
The realization of non-directional linear electroacoustic arrays using Bessel weighting has been described in the literature. In this paper, we discuss generalized allpass arrays; since the far-field response of a uniformly spaced linear array is specified by a mapping of the DTFT of the array weights, any FIR approximation of an allpass filter gives weights which result in a nearly uniform array response as achieved by Bessel arrays. We explain the fundamental array theory and present a straightforward method for the design of arbitrary-order allpass arrays. We then discuss applications of allpass arrays in crossover-filtered configurations and in the implementation of efficient frequency-invariant beamformers.
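As a rough illustration of the allpass idea in the abstract above (not the paper's design method), the sketch below truncates the impulse response of a first-order allpass filter to obtain array weights, then evaluates the magnitude of their DTFT, which stands in for the far-field response of a uniformly spaced line array. The coefficient `a`, the tap count, and the function names are assumptions chosen for the example.

```python
import cmath
import math


def allpass_weights(a, num_taps):
    """Truncated impulse response of H(z) = (a + z^-1) / (1 + a z^-1)."""
    weights = [a]
    for n in range(1, num_taps):
        weights.append((1.0 - a * a) * (-a) ** (n - 1))
    return weights


def array_response_magnitude(weights, omega):
    """|DTFT| of the weights at normalized frequency omega (radians/sample).

    For a uniformly spaced line array, omega plays the role of
    k * d * sin(theta) (wavenumber, spacing, look angle).
    """
    total = sum(w * cmath.exp(-1j * omega * n) for n, w in enumerate(weights))
    return abs(total)


# Hypothetical example values: a = 0.5 and 16 elements.
weights = allpass_weights(a=0.5, num_taps=16)
omegas = [math.pi * i / 64 for i in range(-64, 65)]
magnitudes = [array_response_magnitude(weights, w) for w in omegas]

# Largest departure from a perfectly flat (unit-magnitude) response.
max_deviation = max(abs(m - 1.0) for m in magnitudes)
```

With these values the response stays very close to unity across the whole range of `omega`, because the discarded tail of the allpass impulse response is small.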

6910
Assessment of Nonlinearity in Transducers and Sound Systems – From THD to Perceptual Models
Voishvillo, Alex
Research on the audibility of loudspeaker nonlinear distortion has not shown good correlation between traditionally used metrics (harmonics and intermodulation) and the subjective performance. The problem of sound fidelity-related methods to assess nonlinearity in transducers has not been solved. Wide application of low-bit rate compression systems (MP3, etc.) demanded the development of objective measurement methods based on perceptual models. These methods, however, have not been used for measurement of loudspeakers, and they may not be optimal for that due to the different nature of nonlinearity in transducers. Recently, perceptual models created specifically for the assessment of nonlinearity in transducers have emerged. This work analyzes and compares the old and new methods and discusses the prospects for future developments.

6911
An Important Aspect of Underhung Voice-Coils: A Technical Tribute to Ray Newman
Carlson, David; Frye, Kent; Keele, Jr., Don. B. (Don); Long, Jim; Newman, Raymond J.; Ruhlen, Matt
In the 1970s, Ray Newman, while at Electro-Voice, single-handedly and very successfully promoted to the loudspeaker industry the use of the then new concept of the Thiele/Small parameters and related design techniques for categorizing loudspeakers and systems. This paper posthumously recounts the contents of three significant Electro-Voice memos written in 1992 by Ray Newman concerning a comparison of overhung versus underhung loudspeaker motor assemblies. The information in the memos is still very relevant today. He proposed a comparison between the two assembly types assuming motors that had the: 1. same Xmax, 2. same efficiency, 3. similar thermal behavior, and 4. same voice coil. He calculated the required magnetic gap energy and discovered to his surprise that the magnet requirements actually went down dramatically when switching from an overhung to an underhung structure and depended only on the ratio between Xmax and the voice-coil length. This is in contrast with “common sense” that dictates that longer gaps mean larger magnets. He showed that for high-excursion motors, a switch could be made from a ferrite overhung structure to an equivalent high-energy neodymium underhung structure with little cost penalty. This paper recounts this early work and then presents motor predictions using present-day magnetic FEM simulators illustrating his concepts. Ray’s original memos and notes will also be included as an appendix to the paper.

6912
The Acoustic Center: A New Concept for Loudspeakers at Low Frequencies
Vanderkooy, John
This paper focuses on the acoustic center, which represents a particular point for a normal sealed-box loudspeaker that acts as the origin of its low-frequency radiation. At low frequencies, the radiation from such a loudspeaker becomes simpler as the wavelength of the sound becomes large relative to the enclosure dimensions, and the system behaves externally as a spherical point source. Although there are near-field effects very close to the loudspeaker, the acoustic center has a clear meaning even a short distance from the enclosure, up to frequencies of about 200 Hz for typical systems. The low-frequency response of loudspeakers in rooms is determined by the position of their acoustic centers. The study is underpinned by: (1) a mathematical multipole expansion of the output of a loudspeaker, (2) an acoustic boundary-element calculation of a number of loudspeaker systems, (3) some measurements that corroborate the concept of the acoustic center, and (4) a discussion of a number of relevant concepts.

6913
Contextual Effects on Sound Quality Judgements: Part II – Multi-Stimulus vs. Single Stimulus Method
Beresford, Kathryn; Ford, Natanya; Rumsey, Francis; Zielinski, Slawomir K.
In a previous pilot experiment (Part I), a single stimulus method was employed to evaluate contextual effects on sound quality judgements. In this investigation (Part II), a multi-stimulus comparison method was used to evaluate the potential influence of listening context on sound quality judgements. Audio quality was assessed, as before, in two differing audio environments: a left-hand drive vehicle and an ITU-R BS.1116-conformant listening room. Trained and untrained listeners compared and graded audio quality for four stimuli with degradations in the mid-frequency range. No identified reference (anchor) was used in the listening test, providing the opportunity for the influence of the audio environment to be observed in the results. Contraction bias, which was caused by the single stimulus method, was not evident in the results of this second study. Additionally, listeners were able to discriminate between differently degraded stimuli where this was not possible in the initial research. Some small contextual effects were observed; however, biases resulting from the indirect context comparison make it difficult to draw substantial conclusions.

6914
Audibility of Time Differences in Adjacent Head-Related Transfer Functions (HRTFs)
Hoffmann, Pablo F.; Møller, Henrik
Changes in the temporal and spectral characteristics of the sound reaching the two ears are known to be of great importance for the perception of spatial sound. The smallest change that can be reliably perceived provides a measure of how accurate directional hearing is. The present study investigates audibility of changes in the temporal characteristics of HRTFs. A listening test is conducted to measure the smallest change in the interaural time difference (ITD) that produces an audible difference of any nature. Results show a large inter-individual variation with a range of audibility thresholds from about 20 µs to more than 300 µs.

6915
Perceptual Evaluation of Algorithms for Blind Up-mix
Bube, Sebastian; Fabris, Christian; Hohberger, Thomas; Köhler, Anja; Liebetrau, Judith; Sporer, Thomas; Walther, Andreas
The number of consumer home theatre systems with surround capabilities has increased considerably. Nonetheless, most audio content is still 2-channel stereo. Thus, to enjoy the advantages of their surround systems for all types of content, consumers resort to systems that automatically create multi-channel sound from legacy sources ("blind up-mix"). While a number of algorithms are used today, there is no commonly accepted test methodology to evaluate their sonic performance. Standardized listening test procedures evaluate audio quality relative to an unimpaired reference as a ground truth and thus are not applicable to up-mix scenarios. In this paper a new listening test procedure is described which is designed to consistently assess the quality of up-mix (or down-mix) algorithms. First test results are presented.

6916
Pitch Transposition of Flute Tones Based on Variation of Average Spectral Distribution
Griffith, Niall J. L.; O'Leary, Sean
The problem of pitch transposition in relation to the consistency of the timbre of a flute over its pitch range is investigated. The transposition method outlined here is based on the average variation of spectral distribution with pitch, and on preserving the spectral behaviour in relation to the productive mechanism. A set of measures is proposed to quantify the variation of the average spectral distribution with pitch, and a set of samples is analysed over the pitch range of the instrument. These measures are used in the transposition model to correct the average spectral distribution.

6917
Pitch Coherence as a Measure of Apparent Distance in Performance Spaces and Muddiness in Sound Recordings
Griesinger, David
This paper demonstrates a physiological method whereby sonic distance and muddiness can be quantified through the detection of pitch fundamentals from the phase coherence of harmonics in the vocal formant range. The method allows the perceived direct/reverberant ratio of a performance or a recording to be determined from a single channel of a recording of speech or music, allowing quality assessments during actual performances. Preferred values of the direct/reverberant ratios above 1000 Hz obtained by this method are +3 to +6 dB. This result has important consequences both for performance acoustics and recording.

6918
A Comparison of Various Multichannel Loudness Measurement Techniques
Lyman, Steve; Seefeldt, Alan
In this study, two recently proposed objective measures of perceived loudness for monophonic audio signals are extended in several ways to deal with multichannel audio. The extensions range in complexity from a simple sum of the individual channels to the use of measured HRTFs to simulate the audio signals arriving at the ears. A database of subjective loudness matching data of multichannel audio is generated, and the performance of the various objective measures, including the particular multichannel measure recently adopted by the ITU-R, is compared against this data.

6919
Predicting Listener Preferences for Surround Microphone Technique through Binaural Analysis of Loudspeaker-Reproduced Piano Performances
Kim, Sungyoung; Martens, William L.; Marui, Atsushi; Walker, Kent
Four solo piano pieces were presented through a five-channel loudspeaker reproduction system for a pairwise preference test in a previous study, and the results of that test were described in terms of the interaction between program material and surround microphone technique. In an attempt to predict the obtained preference choices on the basis of the binaural signals recorded during loudspeaker reproduction of differing versions of these musical programs, a number of electroacoustic measures on the test stimuli were examined via stepwise multiple regression. The most successful prediction resulted from a combination of Ear Signal Incoherence (ESI) and Side Bass Ratio (SBR), regardless of methodological differences between two independently tested groups of listeners.

6920
The Accuracy and Consistency of Spectrographic Analysis for Voice Identification
Smith, Jeff M.
This test investigated the accuracy and consistency of voice identification comparisons made by 5 trained examiners over a three-week period. These individuals were all students of the University of Colorado at Denver and had taken a semester-long course in Audio Forensics with limited training in voice identification. Each week, examiners conducted 8 closed-trial comparisons of 4 clue-phrases from both male and female speakers. In simulating a closed set spectrographic line-up, each comparison consisted of spectrograms from a pool of 4 “known” speakers and one “unknown” speaker; audio recordings of the known and unknown speakers were made 9 months apart. From the pool of known speakers, the examiner made a positive identification match to the unknown. After the three-week period, data revealed that examiners reached the same conclusion in all three examinations for only 50% of the comparisons. The average accuracy of these examinations was 65%. This paper discusses the outcome of the experiment including interpretation of these and other results.

6921
Loudspeaker Thermal and Safety Data Acquisition System
Buck, Marshall
A four-channel data acquisition system has been designed for measurement of four parameters needed for safety and thermal testing and modeling in a voice coil driven loudspeaker: voice coil temperature, RMS voltage drive, current, and true V x I power dissipated. These data are stored in Excel format for post-processing. Safety tests with alternating or direct current stimulation require the current versus time displays needed for pre-testing to UL 1480 and ANSI/CEA-636 standards. Rated at 32 amperes and 125 volts, the instrument is suitable for voice coils rated up to 5000 watts. A real-time graphics display on a standard PC is implemented with a USB interface.

6922
Surface Scattering Uniformity Measurements in Reflection Free Environments
Conti, Lorenzo; Farina, Angelo; Galaverna, Paolo; Martignon, Paolo; Rizzi, Lorenzo; Rosati, Andrea
Following a previous investigation carried out at the University of Parma in 1999 and 2000, LAE (Laboratory of Acoustics and Electroacoustics) started a new measurement campaign to compare with the original results on the same type of diffusor panels, to verify the AES-4id-2001 measurement standard, and to investigate the nature of scattering phenomena in more detail. Measurements are conducted on the floor of a large closed space to obtain a reflection-free time window long enough to study the first reflection from the panel; the use of sine sweep excitation signals instead of the recommended MLS ones improves the acquisition process. The present article discusses background studies and the results from the first round of measurements.

6923
Picturing Dither: Dithering Pictures
Christou, Cameron N.; Lipshitz, Stanley P.
The desirable properties that follow from the use of (nonsubtractive) triangular probability density function (TPDF) random dither in digital audio quantization and noise shaping are now well known in the audio community. The principal purpose of this paper is to use a visual analogy to aid audio engineers in their understanding of how proper TPDF dithering and noise shaping can convert otherwise objectionable, correlated quantization errors into benign, uncorrelated and less visible ones. As they say, “a picture is worth a thousand words.” Our secondary purpose is to demonstrate, in the process, that the very same concepts, applied now in the spatial instead of the temporal domain, are just as useful and beneficial in the field of digital picture processing too. We present color (in the PDF version of this paper) and monochrome images of the results of coarse quantization, both with and without dither and/or noise shaping, to help us make our points. [In the “live” presentation of this paper, we shall play an audio example at the same time as we show each picture, so that one can simultaneously both see and hear each effect being discussed.]
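To make the dithering idea in the abstract above concrete, here is a minimal sketch of nonsubtractive TPDF dither applied before a coarse quantizer. It is an illustration under assumed values (step size, input level, seed, and all function names are hypothetical), not the authors' implementation, and it omits noise shaping.

```python
import math
import random


def tpdf_dither(rng, step):
    """Triangular-PDF dither: difference of two uniform variates, +/- one step wide."""
    return (rng.random() - rng.random()) * step


def quantize(value, step, rng=None):
    """Round to the nearest multiple of step; add TPDF dither first if rng is given."""
    if rng is not None:
        value += tpdf_dither(rng, step)
    return step * math.floor(value / step + 0.5)


# Assumed example: a constant input sitting between two quantizer levels.
rng = random.Random(1234)
value, step, count = 0.3, 1.0, 20000

undithered = [quantize(value, step) for _ in range(count)]
dithered = [quantize(value, step, rng) for _ in range(count)]

mean_undithered = sum(undithered) / count  # stuck on one level
mean_dithered = sum(dithered) / count      # close to the true input on average
```

Without dither the quantizer output is locked to a single level, so the error is fully determined by the input. With TPDF dither the output toggles between neighbouring levels and its average tracks the input, which is the behaviour the paper illustrates visually.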

6924
Comparison of Frequency-Warped Representations for Source Separation of Stereo Mixtures
Burred, Juan José; Sikora, Thomas
We evaluate the use of different frequency-warped, nonuniform time-frequency representations for the purpose of blind sound source separation from stereo mixtures. Such transformations enhance resolution in spectral areas relevant for the discrimination of the different sources, improving sparsity and mixture disjointness. In this paper, we study the effect of using such representations on the localization and detection of the sources, as well as on the quality of the separated signals. Specifically, we evaluate a constant-Q and several auditory warpings in combination with a shortest path separation algorithm and show that they improve detection and separation quality in comparison to using the Short Time Fourier Transform.

6925
Auditory Component Analysis
Boley, Jon
Two of the principal research areas currently being evaluated for the so-called sound source separation problem are Auditory Scene Analysis and a class of statistical analysis techniques known as Independent Component Analysis. This paper presents a methodology for combining these two techniques. It suggests a framework that first separates sounds by analyzing the incoming audio for patterns and synthesizing or filtering them accordingly. It then measures features of the resulting tracks and separates the sounds statistically by matching feature sets and attempting to make the output streams statistically independent. The proposed system is found to successfully separate artificial and acoustic mixes of sounds. As expected, the amount of separation is inversely proportional to the amount of reverberation present, number of sources, and interchannel correlation.

6926
Frequency Domain Artificial Reverberation using Spectral Magnitude Decay
Krishnan, Praveen Gobichettipalayam; Sadanandam, Ravirala Narayana Karthik; Vickers, Earl; Wu, Jian-Lung (Larry)
A novel method of producing artificial reverberation in the frequency domain, using spectral magnitude decay, is presented. The method involves scaling and decaying the magnitude response of the short-time Fourier transform, based on the desired room level and decay time as functions of frequency. Unlike time domain methods such as feedback delay networks, the current method requires less memory and provides independent control of the reverb energy and decay time in each frequency bin. Compared to convolution reverbs, the current approach offers flexible parametric control over the decay spectra and a computational cost that is independent of the decay time.
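The per-bin decay described in the abstract above can be sketched as a simple recursion on stored magnitudes. This is only an assumed form of the idea: the STFT analysis and synthesis, the phase handling, and the exact update rule used in the paper are not given in the abstract, and the function names and example values below are hypothetical.

```python
def decay_gain_per_frame(t60_seconds, hop_size, sample_rate):
    """Per-frame gain that produces a 60 dB drop after t60_seconds."""
    frames_per_t60 = t60_seconds * sample_rate / hop_size
    return 10.0 ** (-3.0 / frames_per_t60)


def update_reverb_magnitudes(state, input_magnitudes, gains, levels):
    """One STFT frame: decay each bin's stored magnitude, then add the scaled input."""
    return [
        gain * stored + level * magnitude
        for stored, magnitude, gain, level in zip(state, input_magnitudes, gains, levels)
    ]


# Assumed example: two bins with decay times of 1.0 s and 0.5 s,
# a hop of 480 samples at 48 kHz (100 frames per second).
sample_rate, hop_size = 48000, 480
gains = [decay_gain_per_frame(t60, hop_size, sample_rate) for t60 in (1.0, 0.5)]
levels = [1.0, 1.0]

state = [0.0, 0.0]
state = update_reverb_magnitudes(state, [1.0, 1.0], gains, levels)  # impulse frame
for _ in range(100):  # one second of silence
    state = update_reverb_magnitudes(state, [0.0, 0.0], gains, levels)
```

Because each bin carries its own gain and level, the decay time and reverb energy can be set independently per frequency, and the memory needed is one value per bin regardless of the decay time.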

6927
Design of an Automatic Beat-Matching Algorithm for Portable Media Devices
Fedigan, Stephen; Jochelson, Danny
Methods to achieve accurate beat detection for musical signals have received much attention recently; however, very little literature has addressed techniques for achieving beat matching between two streams on portable devices with limited memory and processing power. This paper describes the architecture, design methods, obstacles, optimizations, and results for a new beat matching algorithm created for real-time use on embedded devices. This algorithm produces promising performance for use on portable media devices that often play modern musical genres.

6928
Artificial Reverberation: Comparing Algorithms by Using Monaural Analysis Tools
Bitzer, Joerg; Extra, Denis; Fischer, Sven; Simmer, Uwe
In this paper a comparison of different algorithms for artificial reverberation is presented. The tested algorithms are commercially available devices and digital plug-ins in a broad price range plus algorithms known from literature. For the analysis we developed an analysis toolbox, which contains several monaural analysis methods, including the energy decay curve in fractional octave bands, auto-correlation, and other known measures of reverberation qualities. Furthermore, the behaviour over time will be analyzed, showing that many systems cannot be considered as time-invariant. Some statistical analysis of the impulse response will also be given. The purpose is to investigate whether synthetic reverberation is created pertaining to attributes of real rooms and whether there are differences between algorithms or not.

6929
Inverse Filtering Design Using a Minimal-Phase Target Function from Regularization
Bouchard, Martin; Norcross, Scott G.; Soulodre, Gilbert A.
Inverse filtering methods commonly use amplitude regularization as a technique to limit the amount of work done by the inverse filter. The amount of regularization needed must be carefully selected so that the audio quality is not degraded. This paper introduces a method of using the magnitude of the regularization to design a target/desired response in which the phase response can be arbitrarily chosen. By choosing a minimum-phase response, one can reduce any pre-response in the corrected signal that is introduced by the regularization. Objective measures, such as PEAQ, and subjective tests conducted in accordance with the MUSHRA method are used to evaluate the subjective performance of this new approach of designing a target response.

6930
The Origins of DSP and Compression: Some Pale Gleams from the Past; A Historical Perspective on Early Speech Synthesis and Scramblers, and the Foundations of Digital Audio
Paul, Jon D.
This paper explores the history that led to modern day DSP and Audio Compression. The roots of modern digital audio sprang from Dudley's 1928 VOCODER, and the WWII era SIGSALY speech scrambler. We highlight these key inventions, detail their hardware and block diagrams, describe how they functioned, and illustrate their relationship to modern day DSP and compression algorithms.

6931
Determining the Need for Dither when Re-Quantizing a 1-D Signal
Benitez-Quiroz, Carlos Fabian; Hunt, Shawn D.
This paper presents novel methods for determining if dither is needed when reducing the bit depth of a one-dimensional digital signal. These are statistically based methods in both the time and frequency domains, and are based on determining whether the quantization noise with no dither added is white. If this is the case, then no undesired harmonics are added in the quantization or re-quantization process. Experiments showing the effectiveness of the methods with both synthetic and real audio signals are presented.

6932
Shape-changing Symmetric Objects for Sound Synthesis
Bindel, David; Bruyns, Cynthia
In the last decade, many researchers have used modal synthesis for sound generation. Using a modal decomposition, one can convert a large system of coupled differential equations into simple, independent differential equations in one variable. To synthesize sound from the system, one solves these decoupled equations numerically, which is much more efficient than solving the original coupled system. For large systems, such as those obtained from finite-element analysis of a musical instrument, the initial modal decomposition is time-consuming. To design instruments from physical simulation, one would like to be able to compute modes in real-time, so that the geometry, and therefore spectrum, of an instrument can be changed interactively. In this paper, we describe how to quickly compute modes of instruments which have rotational symmetry in order to synthesize sounds of new instruments quickly enough for interactive instrument design.
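The synthesis stage described above can be sketched as a bank of independent two-pole resonators, one per mode, whose outputs are summed. This covers only the decoupled synthesis step, not the fast mode computation the paper addresses. The `(frequency, decay, gain)` mode format and the unit-impulse excitation are assumptions for illustration.

```python
import math


def modal_synthesis(modes, n_samples, sample_rate):
    """Sum the impulse responses of independent damped resonators.

    modes: list of (frequency_hz, decay_per_second, gain) tuples.
    """
    out = [0.0] * n_samples
    for freq, decay, gain in modes:
        w = 2.0 * math.pi * freq / sample_rate   # pole angle
        r = math.exp(-decay / sample_rate)       # pole radius
        a1 = 2.0 * r * math.cos(w)
        a2 = -r * r
        y1 = y2 = 0.0
        for n in range(n_samples):
            x = gain if n == 0 else 0.0          # impulse excitation
            y = a1 * y1 + a2 * y2 + x
            out[n] += y
            y2, y1 = y1, y
    return out
```

Because each mode is updated on its own, changing the geometry only requires replacing the mode list. The per-sample cost grows linearly with the number of modes.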

6933
Unisong: A Choir Singing Synthesizer
Blaauw, Merlijn; Bonada, Jordi; Hideki, Kenmochi; Loscos, Alex
Computer generated singing choir synthesis can be achieved by two means: clone transformation of a single voice or concatenation of real choir recording snippets. As of today, the synthesis quality of these two methods lacks naturalness and intelligibility, respectively. Unisong is a new concatenation based choir singing synthesizer able to generate a high quality synthetic performance of the score and lyrics specified by the user. This article describes all actions and techniques that take place in the process of virtual synthesis generation: choir recording script design and realization, human supervised automatic segmentation of the recordings, creation of the sample database, and sample acquisition, transformation and concatenation. The synthesizer will be demonstrated with a song sample.

6934
Accurate Low-Frequency Magnitude and Phase Estimation in the Presence of DC and Near-DC Aliasing
Garcia, Ricardo A.; Short, Kevin M.
Efficient high resolution parameter estimation of sinusoidal elements has been shown to be of fundamental importance in applications such as measurement, parametric decomposition of sounds and low bitrate audio coding. It has been shown that certain methods such as the Complex Spectral Phase Evolution (CSPE) can be used to estimate the true frequency, magnitude and phase of underlying tones in a signal with accuracy that is significantly more precise than the signal resolution of a transform based analysis. These methods usually require the signal elements to be spectrally separated so that the mutual interference is minimal. This paper extends the methods introduced in CSPE for low frequency real tone signals, where the interference or “leakage” from the negative frequencies is unavoidable, regardless of what analysis window is used. The new techniques give improved magnitude and phase estimates for the signal parameters.

6935
Frequency Domain Phase Model of Transient Events
Short, Kevin M.
Short time transient events are extremely challenging to represent in the transform domain employed by common transform-based codecs used in applications such as audio compression. These short-time events last for a duration that is much shorter than a typical data window, and consequently have power distributed throughout the transform domain. Accurate representation of these events in the transform domain requires higher bitrates than usually available. A common solution is to use window switching, where smaller windows are used for short time transient events, but this has a negative impact on the bitrate as well. In this paper, we show that with certain simplifying assumptions, transient reconstruction can be reduced to a tractable problem that is performed in the frequency domain, so that the transient event can be easily mixed in with the representation of the non-transient events. A closed form frequency domain representation for the phase of a transient event is introduced, and it is shown that this can be done in an iterative way that allows for increasingly complex transient structures back in the time domain.

6936
Doing Good by the "Bad Boy": Performing George Antheil's Ballet mécanique with Robots
Lehrman, Paul; Singer, Eric
The Ballet mécanique by George Antheil was a musical composition far ahead of its time. Written in 1924, it required technology that didn't exist: multiple synchronized player pianos. Not until 1999, with the aid of computers and MIDI, could the piece be performed the way the composer envisioned it. Since then, it has been played over 20 times in North America and Europe. But its most unusual performance was the result of a collaboration between the authors: one, the music technologist who revived the piece and the other, a musical robotics expert. At the request of the National Gallery of Art in Washington, DC, they built a completely automated 27-piece orchestra, which played the piece nearly 100 times, without a serious failure.

6937
Linear Array Transducer Technology
Struck, Christopher J.; Unruh, Andrew D.
The Linear Array Transducer (LAT) is a loudspeaker technology using multiple opposed interleaved diaphragms to create a bass transducer with a cylindrical form factor and almost no mechanical vibration. Construction and operating principles are described. Equivalent circuit diagrams are provided and frequency response and impedance measurements are shown. Recent structural improvements in the LAT are also discussed. System (box) design considerations are discussed and examples of its use in a product are shown.

6938
A Novel Flexible Loudspeaker Driven by an Electret Diaphragm
Chen, Jen-Luan; Chiang, Dar-Ming
In anticipation of the intensive use of flexible electronics in future consumer products, a new flexible electrostatic loudspeaker driven by an electret diaphragm is developed. The electret diaphragms of the flexible loudspeakers are fabricated from a fluoro-polymer with nano-meso-micro pores and charged by the corona method at room temperature. The interior surface areas of the pores of the electret films effectively increase the retention and stability of charges. The experimental results reveal that the retention and stability of charges of the electret diaphragm are sufficient to drive the flexible electrostatic loudspeaker. The sound pressure level of the flexible electrostatic loudspeaker (60 mm × 80 mm × 1 mm) is measured as 80 dB/0.2 W at 1 kHz and 20 cm distance.

6939
Stress Analysis on Moving Assemblies and Suspensions of Loudspeakers
Bolaños, Fernando
The paper explains the basic results of numerical and experimental analysis of moving assemblies and suspensions of speakers, taking into account the bending forces and the in-plane forces that act on these slender bodies. The distribution of these stresses is shown in cones of direct radiators and in domes of compression drivers as well. An explanation of the generation of subharmonics is obtained by this technique. The sudden jump of the working point on moving assemblies is explained by the compression forces that act on the suspensions. These compression forces are the cause of the buckling or snapping that speakers very often exhibit. This article analyzes different suspension types, showing the compromises facing the designer.

6940
Non Linear Stiffness of the Loudspeaker Measured in the Evacuated Space
Demoli, Nazif; Djurek, Danijel; Djurek, Ivan; Petosic, Antonio
The impedance of the mechanical vibration system of the loudspeaker was measured in vacuo in order to remove the contributions of the radiation impedance of air. The Hooke constant k was evaluated by the use of calibrated weights and from the membrane displacements due to the force exerted by DC current in the voice coil. The resonant frequency was found to decrease with increasing non-linear Hooke constant, which is attributed to the effective mass of the vibration system, dependent upon elongation. The effective mass was evaluated from the fitting of measured and calculated loudspeaker impedance curve.

6941
Linearization of Nonlinear Loudspeakers
Pedersen, Bo Rohde; Rubak, Per
Feed forward methods for compensation of nonlinearities in loudspeakers are studied and tested in simulation cases. An adaptive feed forward controller is investigated to handle the drift caused by temperature, ageing and production spread. For estimating the needed parameter accuracy (match between controller and plant parameters) we have tested a simple feed forward controller with different degrees of parameter mistuning between plant (loudspeaker) and controller. The required system identification (tracking of the changes in linear loudspeaker parameters) is investigated using a simple 2nd order IIR model for the linear loudspeaker. Different techniques to handle the stability problem for adaptive IIR filters are investigated.

6942
Response Adaptation of Loudspeaker System
Jung, Hyun-Ju; Lee, Mingu; Lee, Sinlyul; Sung, Koeng-Mo
In this paper, variations in the frequency response of vented-box loudspeakers due to the adjustment of several less constrained physical parameters are predicted. In addition, with this information, the optimum values of the parameters are estimated such that the corresponding frequency response optimally fits an arbitrary objective response. Also, the MATLAB® GUI program, which performs the procedure automatically, with the vented-box loudspeaker parameters as input, is presented. The extendibility of the limitations of this method in terms of the type of the loudspeaker, the optimality criterion, etc., is discussed.

6943
Digital-Driven Piezoelectric Speaker using Multi-Bit Delta-Sigma Modulation
Ogata, Katsuya; Soga, Tsuyoshi; Ueno, Hajime; Yasuda, Akira
Although a substantial quantity of music data is stored as digital information, as in the case of CDs and MDs, an analog drive is still the main component of a loudspeaker. If the speaker can be driven digitally, it becomes possible to perform all processes from the input to the output digitally. As a result, the analog power amplifier and some other components become unnecessary and a small, light, and high-quality speaker can be achieved. In this paper, we propose a digital-driven piezoelectric speaker employing multi-bit delta-sigma modulation.
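To make the modulation step concrete, here is a minimal sketch of a first-order delta-sigma loop with a uniform multi-bit quantizer. The paper's modulator order, number of levels and mapping onto piezoelectric elements are not given in the abstract, so the first-order loop, the five levels and the [-1, 1] input range are assumptions.

```python
def delta_sigma_multibit(samples, levels=5):
    """First-order delta-sigma modulator with a uniform multi-bit quantizer.

    Inputs are assumed to lie in [-1, 1]; each output takes one of
    `levels` equally spaced values between -1 and 1.
    """
    step = 2.0 / (levels - 1)
    integrator = 0.0
    previous_output = 0.0
    output = []
    for x in samples:
        # Accumulate the difference between the input and the fed-back output.
        integrator += x - previous_output
        # Quantize the integrator state to the nearest available level.
        index = round((integrator + 1.0) / step)
        index = min(levels - 1, max(0, index))
        previous_output = -1.0 + index * step
        output.append(previous_output)
    return output
```

The feedback keeps the running average of the coarse output close to the input, which is what allows a small set of discrete drive levels to represent a finely resolved signal.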

6944
Design Considerations for Shallow Subwoofers
Futtrup, Claus
Conventional subwoofers are usually quite deep to accommodate long throw. A shallow subwoofer (SSW) design is presented which aims to maintain the quality of low distortion and long throw bass reproduction. Two design concepts, applicable to larger drivers, are described and results are shown. The result is more bass for a given speaker depth without compromising the sound quality at low frequencies. One such concept is the sandwich cone, another the strut supported cone. The chosen design, a low profile diaphragm with strut support, is described in detail. Another issue is the motor and spider design. Considerations for joint motor and spider design are analyzed for a series of configurations. Advantages and disadvantages of each are described. The chosen design integrates the spider into the motor system to preserve space but still allows for a large diameter spider to be applied.

6945
A Plane Wave Transducer: Technology and Applications
Kelloniemi, Antti; Mettälä, Kari
A method of high volume manufacturing of a plane wave transducer element is presented with application examples and measurement results. The element produces a plane wave, which exhibits remarkably less geometric attenuation with increasing distance than conventional cone loudspeakers. These new, highly directive audio transducers are beneficial in several uses. As the sound is transmitted only in the wanted direction, the amount of reflections deteriorating the sound quality is minimized and the amount of disturbance in the surrounding space is diminished. A directive microphone can be produced using the same technology, which in turn enables the construction of a locally controlled active noise cancellation panel.

6946
Digital Correction of Switching Amplifier by Error Re-modulation method
Jang, Seong-cheol; Lee, Heesoo; Park, Haekwang; Shimanskiy, Vladislav; Song, Youngsuk
In this paper, the error re-modulation method for digital correction of a pulse width modulation switching amplifier is proposed. This method extracts an error signal from the difference between the reference pulse width modulation signal and the power stage output, and generates the error pulse width modulation signal by using a re-modulation method. The error pulse width modulation signal is then used to compensate for the power supply noise and nonlinearity of the power stage. The proposed method is suitable for the correction of the PWM controller in a full digital amplifier.

6947
Iterative Method for Natural Sampling
Jang, Seong-cheol; Shimanskiy, Vladislav
Performance of pure digital audio amplifiers using pulse width modulation (PWM) depends strongly on the accuracy of the conversion from the pulse-code modulated (PCM) audio signal to the PWM sequence. This process implies the recovery of original analog signal values at irregular time instances, based only on uniformly sampled PCM data. The recovery, or “natural sampling”, requires interpolation processing giving a trade-off between accuracy of the result and computation speed. In this paper we propose a method for natural sampling providing tunable speed-performance constraints while giving the advantage of easy implementation in VLSI. Cubic polynomial interpolation and an iterative solving algorithm, as well as experimental results, are presented in the paper.
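One way to picture the combination of cubic interpolation and iterative solving is the sketch below. It interpolates four neighboring PCM samples with a cubic Lagrange polynomial and uses fixed-point iteration to find where the interpolated signal meets a rising ramp carrier. The abstract does not specify the authors' iteration, so the fixed-point scheme, the carrier c(t) = 2t - 1 on [0, 1] and all names here are assumptions.

```python
def cubic_interpolate(y_m1, y0, y1, y2, t):
    """Cubic Lagrange interpolation through samples at times -1, 0, 1, 2."""
    return (
        -y_m1 * t * (t - 1) * (t - 2) / 6
        + y0 * (t + 1) * (t - 1) * (t - 2) / 2
        - y1 * (t + 1) * t * (t - 2) / 2
        + y2 * (t + 1) * t * (t - 1) / 6
    )


def natural_sampling_crossing(y_m1, y0, y1, y2, iterations=8):
    """Estimate t in [0, 1] where the signal crosses the ramp c(t) = 2t - 1."""
    t = (y0 + 1) / 2  # starting guess: the uniform-sampling pulse edge
    for _ in range(iterations):
        s = cubic_interpolate(y_m1, y0, y1, y2, t)
        # Solve 2t - 1 = s for t, then clamp to the carrier period.
        t = min(1.0, max(0.0, (s + 1) / 2))
    return t
```

The `iterations` argument is the tuning knob in this sketch: fewer passes cost less, more passes move the pulse edge closer to the true crossing. The iteration converges when the signal changes slowly compared with the carrier slope.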

6948
A High Performance S/PDIF Receiver
Lesso, Paul
This paper details the design and implementation of a novel S/PDIF transceiver with a very low jitter bandwidth. We describe and demonstrate a system based on multiple-loops that synchronises to the incoming data stream with a very low bandwidth and provides the original data unmodified on a clean low jitter output clock without the need for a sample rate converter. Thus we eliminate any jitter above a low frequency (typically 10Hz) on the input data and also avoid any distortion caused by sample rate converters.

6949
Loudspeaker-Based 3-D Audio System Design Using the M-S Shuffler Matrix
Jot, Jean-Marc; Walsh, Martin
This paper outlines a new design methodology that can help to achieve higher quality 3-D audio reproduction over loudspeakers for a variety of applications using only adapted M-S matrices. Several key M-S matrix based topologies are summarized and a new design methodology is presented that allows the design and efficient implementation of any new 3-D audio system using only M-S matrix-based topologies. A real-world design example is used to highlight how this new design methodology can not only help the 3-D audio system design process, but also improve the audio quality of the resulting reproduction.
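The basic M-S shuffler structure that such topologies build on can be sketched as follows: convert left/right to mid/side, process each path independently, then convert back. The filters are passed in as plain callables, and that interface, like the function names, is an assumption for illustration rather than the authors' design.

```python
def ms_encode(left, right):
    """Convert a left/right pair into mid (sum) and side (difference) signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Convert mid/side signals back into a left/right pair."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def shuffle(left, right, mid_filter, side_filter):
    """Apply independent processing to the mid and side paths."""
    mid, side = ms_encode(left, right)
    return ms_decode(mid_filter(mid), side_filter(side))
```

With this structure, a symmetric two-input, two-output processor needs only two filters (one for the sum path, one for the difference path) instead of four.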

6950
Binaural Simulation of Complex Acoustic Scenes for Interactive Audio
Jot, Jean-Marc; Philp, Adam; Walsh, Martin
We describe a computationally efficient spatial reverberation and 3-D positional audio mixing architecture for real-time virtual acoustics using headphones or loudspeakers. A new method for binaural synthesis of massive numbers of sound sources is introduced. Extensions of the processing architecture are described for modeling spatially extended sound events, simulating near-field emitters, rendering multi-room reverberation and incorporating the perceptually salient features of early reflections and acoustic obstructions in the listener's immediate virtual environment. The proposed approach enables the implementation of scalable interactive 3D audio rendering systems in personal computers, game consoles, set top boxes or mobile phones. The associated scene representation model is compatible with current interactive audio standards, including OpenAL, MPEG-4 and JSR-234.

6951
A Technique for Nonlinear System Measurement
Abel, Jonathan S.; Berners, David P.
A method for measuring nonlinear systems having a Volterra series is presented. The Volterra series is the parallel combination of elements having series input and output filters around a power-law distortion, and may be used to represent a wide variety of systems combining filtering and memoryless distortion functions. The technique is to measure the system using a swept sinusoid at a variety of amplitudes, and to use least squares to first separate the element responses and then identify the unknown input and output filters.

6952
Esophageal Voice Enhancement by Modeling Radiated Pulses in Frequency Domain
Bonada, Jordi; Loscos, Alex
Although esophageal speech has proven to be the most popular voice recovery method after laryngectomy surgery, it is difficult to master and shows a poor degree of intelligibility. This article proposes a new method for esophageal voice enhancement using speech digital signal processing techniques based on modeling radiated voice pulses in the frequency domain. The analysis-transformation-synthesis technique creates a non-pathological spectrum for those utterances featured as voiced and filters those unvoiced. Healthy spectrum generation implies transforming the original timbre, modeling harmonic phase coupling from the spectral shape envelope, and deriving pitch from frame energy analysis. Resynthesized speech aims to improve intelligibility, minimize artifacts, and acquire resemblance to the patient’s original pre-surgery voice.

6953
A Novel IIR Equalizer for Non-Minimum Phase Loudspeaker Systems
Freitas, Diamantino; Marques, Avelino
A novel approach for the equalization of non-minimum phase loudspeaker systems based on the design of an IIR inverse filter is presented. This IIR inverse filter is designed in the time domain by minimization of the least squares error function that results from using the typical “Output Error” configuration in the inverse modeling of non-minimum phase systems, with an adjustable delay. Due to the nonlinear nature of the error function, iterative optimization methods for nonlinear least squares problems were applied, namely the Levenberg-Marquardt method. This approach allows the design of inverse filter based equalization solutions with lower computational requirements, lower equalization error and lower delay of the equalized loudspeaker system than the most widely used alternative, the FIR based inverse filter. The advantages of this new approach are demonstrated with its application to the equalization of two loudspeaker systems. The results of the objective evaluation of this application are outlined, presented and discussed regarding time and frequency domain equalization errors and the delay of the equalized loudspeaker.

6954
Spring Reverb Emulation Using Dispersive Allpass Filters in a Waveguide Structure
Abel, Jonathan S.; Berners, David P.; Costello, Sean; Smith, Julius O., III
Wave propagation along springs in a spring reverberator is studied, and digital emulations of several popular spring reverberator models are presented. Measurements on a number of springs reveal several dispersive propagation modes and evidence of coupling among them. The torsional mode typically used by spring reverberators is seen to be highly dispersive, giving the spring its characteristic sound. Spring reverberators often have several springs operating in parallel, and the emulations presented here use a set of parallel waveguide structures, one for each spring element. The waveguides explicitly compute the left-going and right-going torsional waves, including dispersion, propagation and reflection effects. Scattering from spring imperfections and from the rings coupling counter-wound springs is modeled via waveguide scattering junctions.
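Dispersion of this kind is commonly approximated with a cascade of allpass sections, which leave the magnitude spectrum untouched while delaying different frequencies by different amounts. The sketch below uses identical first-order sections. The section order, the coefficient value and the number of stages are assumptions, and the full waveguide loop with reflections and scattering junctions is not shown.

```python
def allpass_cascade(signal, coefficient, stages):
    """Run a signal through `stages` identical first-order allpass filters.

    Each stage implements H(z) = (a + z^-1) / (1 + a z^-1) with |a| < 1.
    """
    out = list(signal)
    for _ in range(stages):
        x_prev = 0.0
        y_prev = 0.0
        stage_out = []
        for x in out:
            y = coefficient * x + x_prev - coefficient * y_prev
            stage_out.append(y)
            x_prev, y_prev = x, y
        out = stage_out
    return out
```

Feeding an impulse through the cascade spreads it into a chirp-like response while preserving its total energy, which is the qualitative behavior associated with the dispersive torsional mode.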

6955
Characteristics of Inharmonic Frequency Analysis of GHA and its Application to Audio Signal Processing
Fukube, Tohru; Muraoka, Teruo
GHA is a frequency analysis originally proposed by N. Wiener in 1930. He aimed to analyze stochastic signals utilizing harmonic frequency analysis and clarified that any signal can be represented by an almost periodic function whose frequency components are in an inharmonic relationship. In 1993 Dr. Hirata proposed an inharmonic frequency analysis applicable to audio signal processing, and it came to be called “GHA”. The authors have been engaged in its improvement and utilization, and have reported several applications. Among them, the authors reported intensive noise reduction for damaged SP records at a past convention. In this paper, the principle of GHA and its fundamental characteristics are explained together with its application to noise reduction, in comparison with the conventional spectral subtraction method.

6956
A Hybrid Speech Codec Employing Parametric and Perceptual Coding Techniques
Czyzewski, Andrzej; Kulesza, Maciej; Szwoch, Grzegorz
A hybrid speech codec for VoIP telephony applications is presented employing combined parametric and perceptual coding techniques. The signal is divided into voiced signal components that are encoded using the perceptual algorithm, unvoiced components that are encoded parametrically and transients that are not encoded with a lossy method. A codec architecture in which the voiced part of the CELP residual signal is perceptually encoded and transmitted to the decoder along with the CELP main bit stream is also examined. Various methods for transient detection in the speech signal are discussed. The results of experiments revealing the improved subjective quality of the transmitted speech are also presented.

6957
EuCon: An Object-Oriented Protocol for Connecting Control Surfaces to Software Applications
Boyer, Rob; Campbell, Phil; Freshour, Scott; Kloiber, Martin; McTigue, Jim; Milne, Steve
This paper describes a control surface to application protocol that addresses the problem of raising user interface efficiency in increasingly complex software applications. Compared with existing MIDI based protocols, this protocol was designed to have enough bandwidth, high control resolution, and wide variety of controls to provide software application users with the rich and efficient experience offered by modern large format mixing consoles. Recognizing that today’s audio engineer uses many different applications, it is able to simultaneously control multiple applications running on one or more computers from a single control surface. To give users the widest possible choice of applications, object-oriented design was utilized to promote ease of adoption by software developers.

6958
Considerations on Audio for Flash: Getting to the Vector Soundstage
Van Winkle, Charles
The Flash Platform has been known for animations and interactivity for some time now and research shows the Flash Player is one of the world’s more pervasive software platforms [1]. Although providing audio-rich video or interactive content through Flash is not new, preparing audio assets for Flash is new to many audio professionals. Audio for Flash poses noteworthy changes to audio professionals’ workflows when compared to more customary mediums for video or interactive content e.g. DVDs or video games. This paper gives an overview of the Flash Platform and takes a first look at the considerations audio professionals must make when preparing audio assets for Flash with modified practices suggested when necessary.

6959
5.1 Surround and 3D (Full Sphere with Height) Reproduction for Interactive Gaming and Training Simulation
Miller III, Robert (Robin)
Immersive sound for gaming and simulation, perhaps more than for music and movies, requires preserving directionality of direct sounds, both fixed and moving, and acoustical reflections dynamically affecting those sounds, to effect the spatiality being presented. Conventionally (as with popular music), sources are panned close-microphone signals or synthesized sounds; the presentation pretends “They are here,” where spatiality is largely that of the listening environment. Convolution with room impulse responses can contribute diffuse ambience but not “real” spatiality and tone color. These issues pertain not only to 5.1 where reproduction is a 2D horizontal circle of speakers, but to advanced 3D interactive reproduction, where the listener perceives the experience at the center of the sphere of natural hearing. Production techniques are introduced that satisfy both 3D and compatible 5.1. Independent measurement confirms that the system preserves directionality and reproduces life-like spatiality and tone color continuously in the 3D perception sphere.

6960
Automatic Volume and Equalization Control in Mobile Devices
Sergey, Kib; Budkin, Alexey; Goldin, Alexander A.
Noise spectrum and level change dynamically in mobile environments. Speaker volume that is comfortable in quiet conditions becomes too low when the ambient noise level increases significantly. Speaker volume adjusted for good intelligibility in high ambient noise becomes annoyingly loud in quiet. Automatic Volume Control may compensate for different levels of ambient noise by increasing or decreasing the speaker gain accordingly. However, if the noise and sound spectra are very different, such simple gain adjustment may not work well. Instead, more advanced technology will dynamically equalize reproduced sound so that it exceeds the noise level by a defined amount all over the frequency range. This paper describes practical aspects of Automatic Volume and Equalization Control in mobile audio and communication devices.
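The per-band rule described above (keep the reproduced sound a fixed margin above the noise in every band) can be sketched as a simple gain computation. The band levels are assumed to be already measured in dB. The 6 dB margin, the 12 dB boost cap and the boost-only behavior are assumptions, not values from the paper.

```python
def band_gains_db(signal_db, noise_db, margin_db=6.0, max_boost_db=12.0):
    """Per-band boost so the signal exceeds the noise by `margin_db`.

    Bands that already clear the margin get 0 dB; boosts are capped to
    protect the speaker and the listener.
    """
    gains = []
    for s, n in zip(signal_db, noise_db):
        needed = (n + margin_db) - s
        gains.append(min(max_boost_db, max(0.0, needed)))
    return gains
```

A practical system would also smooth these gains over time and across neighboring bands to avoid audible pumping. That is left out here for brevity.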

6961
Speech Source Enhancement using a Modified ADRess Algorithm for Applications in Mobile Communications
Cahill, Niall; Cooney, Rory; Humphreys, Kenneth; Lawlor, Robert
An approach to refine and adapt an existing music sound source separation algorithm to speech enhancement is presented. The existing algorithm has the capability to extract music sources from stereo recordings using the position of the sources in the stereo field. Described in this paper is the ability of a modified Azimuth Discrimination and Resynthesis algorithm (m-ADRess) to enhance speech in the presence of noise using a two-microphone array. Also proposed is a novel extension to the algorithm, which enables further noise removal from speech based on elevation angle of arrival. Objective measures and an informal listening test of processed speech show the suitability of m-ADRess for cleaning noisy speech mixtures in an anechoic environment.

6962
Frame Loss Concealment for Audio Decoders Employing Spectral Band Replication
Rose, Kenneth; Ryu, Sang-Uk
An efficient frame loss concealment technique is proposed for audio decoders employing spectral band replication (SBR). The high frequency bands of the lost frame are reconstructed by estimating the parametric information involved in the SBR process. Utilizing all SBR data from the previous and the next frame, the high-band envelope is adaptively estimated from the energy evolution of the surrounding frames. The tonality control parameters are determined so as to ensure smooth transition between the lost frame and its neighbors. Subjective quality evaluation demonstrates that the proposed technique achieves better quality of the concealed audio than the technique adopted in the standard aacPlus decoder.

6963
High-Frequency Interpolation for Motion-Tracked Binaural Sound
Algazi, V. Ralph; Duda, Richard O.; Hom, Roger C-M
Motion-tracked binaural (MTB) recording captures and exploits localization cues resulting from head rotation. The pressure field around the recording head is sampled with several microphones, and a head tracker on the listener’s head is used to interpolate between the microphone signals. Although time-domain interpolation works at low frequencies, phase interference causes problems at high frequencies. We previously reported on a simple procedure whereby low-frequency components were continuously interpolated but high-frequency components were obtained from the microphone nearest to the listener’s ear. Although effective, this technique may result in audible switching artifacts. In this paper we present and evaluate methods for continuous high-frequency interpolation of the spectral magnitudes of adjacent microphones that essentially eliminate spectral discontinuities arising from head rotation.

6964
Perceptual Importance of Karhunen-Loève Transformed Multichannel Audio Signals
Henning, Lars; Jiao, Yu; Rumsey, Francis; Zielinski, Slawomir K.
The Karhunen-Loève Transform (KLT) can be used to reduce the interchannel redundancy of multichannel audio signals. For this paper, the perceptual importance of Karhunen-Loève transformed multichannel audio signals was systematically studied using two experiments. The first experiment investigated the perceptual effects caused by removing some KLT eigenchannels. The results showed that some eigenchannels are not perceptually important and consequently can be discarded with minimal degradation of basic audio quality. The second experiment further investigated the perceptual effect of KLT processing on the audio quality of multichannel audio as a function of the nature of the multichannel audio and the eigenvalue extraction methods of KLT processing. An attempt was also made to establish the relationship between the order of perceptual importance and the order of statistical importance of KLT eigenchannels.
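For two channels the KLT reduces to a rotation whose angle follows in closed form from the covariance matrix, which makes the redundancy-removal idea easy to see. The sketch below handles only this two-channel case and computes one transform for the whole signal. The paper deals with multichannel material and compares eigenvalue extraction methods, none of which is reproduced here.

```python
import math


def klt_two_channel(left, right):
    """Rotate a centered stereo pair into two uncorrelated eigenchannels.

    Returns (first, second, theta), where `first` carries the larger variance.
    """
    n = len(left)
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    l = [v - mean_l for v in left]
    r = [v - mean_r for v in right]
    c_ll = sum(a * a for a in l) / n
    c_rr = sum(b * b for b in r) / n
    c_lr = sum(a * b for a, b in zip(l, r)) / n
    # Rotation angle that diagonalizes the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)
    c, s = math.cos(theta), math.sin(theta)
    first = [c * a + s * b for a, b in zip(l, r)]
    second = [-s * a + c * b for a, b in zip(l, r)]
    return first, second, theta
```

When the input channels are highly correlated, most of the energy lands in `first` and `second` becomes small, which is why low-variance eigenchannels are natural candidates for removal.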

6965
A New Upmixer for Enhancement of Reverberance Imagery in Multichannel Loudspeaker Audio Scenes
Usher, John
This paper introduces a new signal processing system which enhances reverberance imagery (i.e. perceived ambiance or listener envelopment) in loudspeaker audio scenes. Sound components which affect reverberance imagery are extracted from a pair of unencoded audio signals and are radiated with two additional loudspeakers behind the listener. The new “ambiance extraction” system improves upon all extant systems by using a novel automatic (blind) equalizer based on the normalized least mean squares (NLMS) algorithm to align the input signals with respect to both level and time in order to create the difference signal. The alignment is typically undertaken using a 1024-tap frequency and ±10 ms time equalizer, which allows sound components with a high short-term correlation to be removed from the input audio signals. Subjective and objective evaluations were undertaken with recordings of solo musical performances in a concert hall, and show that the new system provides a computationally practical, high-quality solution to the problem of ambiance extraction for audio upmixing.
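The NLMS update at the core of such an equalizer can be sketched in a few lines. The adaptive filter predicts one channel from the other, and the prediction error is the part that the filter cannot explain, which is the role the difference signal plays above. The step size, the three-tap demo path and the synthetic input are assumptions. The paper's 1024-tap equalizer is not reproduced.

```python
import random


def nlms(reference, desired, taps, mu=0.5, eps=1e-9):
    """Adapt an FIR filter so that filtered `reference` tracks `desired`.

    Returns the final weights and the per-sample error (difference) signal.
    """
    weights = [0.0] * taps
    history = [0.0] * taps
    errors = []
    for x, d in zip(reference, desired):
        history = [x] + history[:-1]                  # newest sample first
        y = sum(w * h for w, h in zip(weights, history))
        e = d - y
        norm = eps + sum(h * h for h in history)      # input power normalization
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        errors.append(e)
    return weights, errors


# Hypothetical demo: the second channel is a filtered copy of the first.
rng = random.Random(1)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
true_path = [0.5, -0.3, 0.2]
desired = [
    sum(true_path[k] * reference[n - k] for k in range(3) if n - k >= 0)
    for n in range(2000)
]
weights, errors = nlms(reference, desired, taps=3)
```

In this demo the two channels are fully correlated, so the error decays toward zero. With real recordings, the uncorrelated reverberant content would remain in `errors`.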

6966
Natural Reproduction of Symphony Orchestra Music by an Advanced Multichannel Live Sound System
Hamasaki, Kimio; Iwaki, Masakazu; Nakayama, Yasushige; Nishiguchi, Toshiyuki; Okubo, Hiroyuki; Okumura, Reiko
An advanced multichannel audio system for reproducing a live sound field with an ultimate sensation of presence and reality was set up and studied. The goal of this system is to provide listeners with a natural reproduction of orchestral music, as if they were hearing it in an actual sound field such as that in a concert hall. Subjective evaluations of hearing impression on orchestral sound were carried out to determine which attributes of a front sound stage were necessary for the natural reproduction of an orchestra. The results of the evaluations showed that perceptions of width, depth and localization of the orchestral sound influence the impressions of presence and reality.

6967
Localization in Horizontal-Only Ambisonic Systems
Benjamin, Eric; Heller, Aaron; Lee, Richard
Ambisonic reproduction systems are unique in their ability to separately reproduce the pressure and velocity components of the recorded audio signals. Gerzon proposed a theory of localization [1,2] in which the human auditory system is presumed to localize using the direction of the velocity vector in the reproduced sound at low frequencies, and the energy vector at high frequencies. An Ambisonic decoder has the energy and velocity vectors coincident. These are the directions of the apparent source when the listener can turn to face it [2]. Separately maximizing the low-frequency and mid/high-frequency operation of the reproduction system can optimize localization where the listener cannot turn to face the apparent source. We test the localization of horizontal-only Ambisonic reproduction systems using various narrow-band test signals to separately evaluate low-frequency and mid-frequency localization.
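The two localization vectors can be computed directly from the loudspeaker directions and gains: the velocity vector is the gain-weighted mean of the speaker unit vectors, and the energy vector is the same mean weighted by squared gains. The sketch below assumes a horizontal layout, real-valued gains and angles in degrees measured counterclockwise from the front. The function name is hypothetical.

```python
import math


def gerzon_vectors(speaker_angles_deg, gains):
    """Return ((rV magnitude, rV angle), (rE magnitude, rE angle)) in degrees."""
    ux = [math.cos(math.radians(a)) for a in speaker_angles_deg]
    uy = [math.sin(math.radians(a)) for a in speaker_angles_deg]
    pressure = sum(gains)
    energy = sum(g * g for g in gains)
    # Velocity vector: gain-weighted mean direction.
    vx = sum(g * x for g, x in zip(gains, ux)) / pressure
    vy = sum(g * y for g, y in zip(gains, uy)) / pressure
    # Energy vector: squared-gain-weighted mean direction.
    ex = sum(g * g * x for g, x in zip(gains, ux)) / energy
    ey = sum(g * g * y for g, y in zip(gains, uy)) / energy

    def polar(x, y):
        return math.hypot(x, y), math.degrees(math.atan2(y, x))

    return polar(vx, vy), polar(ex, ey)
```

For example, `gerzon_vectors([45, -45, 135, -135], [1, 1, -0.2, -0.2])` gives the two vectors for a square layout with slightly out-of-phase rear speakers. Both point to the front, but their magnitudes differ, which illustrates why low- and high-frequency decoding can be optimized separately.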

6968
An Experimental Verification of Localization in Two-Channel Stereo
Benjamin, Eric
In two-channel stereo the ratio of intensities between two loudspeakers is varied, and at low frequencies the differences in times-of-arrival of the sounds create phase differences between the two ears. These phase differences mimic those experienced in natural hearing, and thus the perceived localization is similar. The experiments described in this paper test the localization provided by stereo in actual use. The perceptions of listeners were collected and the acoustic signals at the entrance to their ear canals were recorded for analysis. Localization under optimum conditions gave results which are substantially similar to what is predicted by theory. Localization in sub-optimum conditions, such as at very low frequencies and such as are encountered in automobiles, was found to be substantially in error.

6969
Solving the Sticky Shed Problem in Magnetic Recording Tapes: New Laboratory Research and Analysis Provides a Safe and Effective Remedy
Richardson, Charles A.
The goal is to make available to AES members new research by the author and a leading analytical laboratory concerning: a) the primary causes and principal source of sticky shed material found on magnetic tapes; b) the unnecessary damage which baking tapes causes; and c) the development of a new, safe and effective process which restores contaminated tapes to their originally anticipated life span and allows repeated, trouble-free playbacks with excellent sonic performance. The methods used were: a) chemical analysis of tapes’ composition with and without sticky shed, b) electron microscope imaging of contaminated and remediated tapes and c) stiction-friction measurements of tapes without back coating and free of sticky shed, with back coating and sticky shed, and after restoration. The key findings are: a) heat and hydrolysis cause sticky shed, b) back coating is the source of most of the sticky shed, c) baking causes degraded playback and permanent damage, and d) correct removal of back coating restores most problem tapes to long life, allowing many trouble-free playbacks with excellent sonic performance.

6970
Tape Degradation Factors and Predicting Tape Life
Hess, Richard L.
From 1947 through the 1990s, most of the world’s sound was entrusted to analog magnetic recording tape for archival storage. Now that analog magnetic tape has moved into a niche market, audio professionals and archivists worry about the remaining lifetime of existing tapes. This paper defines the basic tape types and the current state of knowledge of their degradation mechanisms. Conflicting prior work is reviewed and correlated with current experience. A new playback method for squealing tapes is described. Illustrations of various types of tape degradations and a survey of many of the techniques used for tape restoration are included. Suggestions are made for further research and archival practices.

6971
Music MetaData Quality: A Multiyear Case Study using the Music of Skip James
Freed, Adrian
The case study reported here is an exploratory step towards developing a quantitative system for audio and music metadata quality measurement. Errors, their sources and their propagation mechanisms are carefully examined in a small but meaningful subset of music metadata centered on a single artist, Skip James.

6972
Stop Counting Samples
Lund, Thomas
Level restriction in digital music production has traditionally been based on simply measuring the value of individual samples. While sample counting may have been appropriate in the early days of digital, previous work has revealed how processing now exploits our archaic principles to an extent where significant distortion can be expected to develop downstream of the studio in signal converters and perceptual codecs. The paper shows how production methods in combination with simplistic level assessment are responsible not only for more distortion and listener fatigue, but also for level jumps where digital interfacing or file transferring is used, e.g. at a broadcast station. Improved working practices and measurement methods are suggested.

6973
A Real-Time Rhythmic Analyzer and Equalizer
Loviscach, Joern
The rhythmic analyzer and equalizer presented here allows the user to cut or boost the signal at a given audio frequency and a given rhythmic frequency, that is, number of beats per minute (BPM). A task that can be addressed with the rhythmic equalizer is, for instance, to emphasize a series of 8th-note triplets played on the hi-hat of a drum set. The software works in real time and offers an interactive graphical user interface that supports both analysis and adjustment. The current energy distribution in the two-dimensional audio frequency / BPM space is displayed as a continuously updated backdrop image. The user paints the intended adjustments of levels and phases onto an image layer on top of this image.

6974
Blind Dereverberation of Audio Signals Using a Modified Constant Modulus Algorithm
Huang, Hesu; Kyriakakis, Chris
The single-channel blind dereverberation approach we present in this paper is an extension of the one based on the Constant Modulus Algorithm (CMA) that we proposed in previous work. By substituting the Modified CMA for the original CMA, we demonstrate a more suitable approach for blind deconvolution of reverberant audio signals with super-Gaussian distribution. To further improve the performance, the Modified CMA is applied to the Linear Prediction (LP) residual instead of the time-domain signal because of the flatter spectrum of the LP residual. In real implementations, a Delayless Subband Adaptive Filtering (DSAF) architecture is also combined with CMA to further reduce the computational complexity. Experimental results show that our modified method outperforms previous approaches in audio signal blind dereverberation.

6975
Perceptual Importance of the Number of Loudspeakers for Reproducing the Late Part of a Room Impulse Response in a Listening Room
Solvang, Audun; Svensson, Peter
A sound field generated by 16 loudspeakers in the horizontal plane was used as reference and the impairment introduced by using 8, 4 and 3 loudspeakers for reproducing the late part of the room impulse response was investigated using listening tests. Stimuli were synthesized from repetitive octave-band wide pulses that were convolved with room impulse responses, and tempo as well as octave-band center frequencies were varied. Results show generally a barely perceptible impairment. Increasing the tempo led to a larger impairment for all loudspeaker configurations and frequencies. The impairment depended on the number of loudspeakers at 8 kHz but not at 250 Hz or 1 kHz. The reverberation in the listening room, 0.12 - 0.20 s, might have masked fluctuations in interaural time differences that are the dominating cue for 250 Hz and 1 kHz. The reverberation time was, however, so short that it hardly influenced fluctuations in the interaural level differences, the dominating cue at 8 kHz.

6976
A System for Adapting Broadcast Sound to the Aural Characteristics of Elderly Listeners
Komori, Tomoyasu; Takagi, Tohru
This paper describes an adaptive sound reproduction system for elderly listeners. We developed new audiometric equipment to gauge the MAF (Minimum Audible Field) of listeners in the range from 125 Hz to 16 kHz. We found the average MAF by age for people from their twenties to eighties and investigated ways to adapt speech signals for elderly listeners based on their aural characteristics. The system adjusts the speech signal energy with reference to the partitioned frequency band below the average MAF. We have broadcast pilot programs for elderly listeners using a simple method that reduces BGM (background music) by 6 dB from the original level. Subjective evaluation showed that the proposed method is more effective than the simple method.

6977
A Comparison between Spatial Audio Listener Training and Repetitive Practice
Brookes, Tim; Kassier, Rafael; Rumsey, Francis
Despite the existence of various timbral ear training systems, relatively little work has been carried out into listener training for spatial audio. Additionally, listener training in published studies has tended to extend only to repetitive practice without feedback. In order for a generalised training system for spatial audio listening skills to prove effective, it must demonstrate that learned skills are transferable away from the training environment and it must compare favourably with repetitive practice on specific tasks. A novel study has been conducted to compare a generalised training system with repetitive practice on performance in spatial audio evaluation tasks. Transfer is assessed and practice and training are compared against a control group for tasks involving both near and far transfer.

6978
Quantified Total Consonance as an Assessment Parameter for the Sound Quality
Choi, In Yong; Chon, Sang Bae; Lee, Mingu; Sung, Koeng-Mo
Many attempts have been made over a long period to quantify consonance. This paper introduces an algorithm for consonance quantification that is more efficient and systematic than the conventional definitions used in the past. We also verify that the quantified consonance can be treated as an additional psychoacoustical parameter to evaluate the sound quality of a certain noise-like sound from dual horns of a vehicle.

6979
Music Genre Categorization in Humans and Machines
Guaus, Enric; Herrera, Perfecto
Music Genre Classification is one of the most active tasks in Music Information Retrieval (MIR). Many successful approaches can be found in the literature. Most of them are based on Machine Learning algorithms applied to different audio features automatically computed for a specific database. But there is no computational model that explains how musical features are combined in order to yield a genre decision in humans. In this work we present a listening experiment where audio has been altered in order to preserve some properties of music (rhythm, harmony, etc.) while degrading others. Results are compared with a series of state-of-the-art genre classifiers based on these musical properties and we draw some lessons from that comparison.

6980
Decoding Second Order Ambisonics to 5.1 Surround Systems
Neukom, Martin
In order to play back Higher Order Ambisonics (HOA) in concert, symmetric speaker set-ups with a large number of speakers are used. At the moment the only possibilities to provide Ambisonics to home users are the rendering for headphones with HRTF and the conversion to surround 5.1 systems. This paper shows the difficulties and limitations of the conversion of Higher Order Ambisonics to 5.1 surround and presents some viable solutions.

6981
Artificial Reverberation: Comparing Algorithms by Using Binaural Analysis Tools
Bitzer, Joerg; Extra, Denis
Different objective measurements are known to rate the spatial quality of concert halls. Many measures are based on analyzing the binaural impulse responses. In this paper, we will compare different algorithms for artificial reverberation in terms of these measures. The tested algorithms are commercially available devices and digital plug-ins in a broad price range. For the analysis, we programmed an analysis toolbox which contains several binaural analysis methods, including the interaural cross-correlation and the interaural difference. Furthermore, less known measures, modifications, and new techniques will be presented. The results indicate that objective measures can give some first impression of the spatial quality of reverberation devices.

6982
Loudspeaker and Room Response Modeling with Psychoacoustic Warping, Linear Prediction, and Parametric Filters
Bharitkar, Sunil; Kyriakakis, Chris
Traditionally, room response modeling is performed to obtain lower order room impulse response models for real-time applications. These models can be FIR or IIR, and may be either linear-phase or minimum-phase. In this paper, we present an approach to model room responses using linear predictive coding (LPC) and parametric filters designed in the frequency warped domain. Frequency warping to the psychoacoustic Bark scale allows significantly lower filter order designs. Within this context, the LPC model utilizes a significantly lower number of poles to model room resonances at low frequencies in the warped domain. The relatively low-order LPC pole locations and gains are then used to determine the center frequencies, the gain, and Q of a parametric filter bank. Gain and Q optimization of the parametric filter bank is performed to match the parametric filter spectrum to the LPC spectrum. Subsequently, the second-order poles and zeros of the parametric filter bank are directly unwarped back into the linear domain for low-complexity real-time applications. The results show that warping lowers the computational requirements for determining the roots, as the density of the roots and the number of roots of the LPC polynomial are substantially reduced. Furthermore, results from using just 4-6 parametric filter banks, modeled from the LPC spectrum, below 400 Hz show significant equalization.

6983
Contactless Hearing Aid for Infants Employing Signal Processing Algorithms
Dalka, Piotr; Kostek, Bozena; Kulesza, Maciej
The proposed contactless hearing aid is designed to be attached to the infant’s crib for sound amplification in a free field. It consists of a matrix of four electret microphones and a prototype DSP board. The compressed speech is transmitted and amplified via miniature loudspeakers. The algorithms developed deal with parasitic feedback, which occurs due to the small distance between microphone and monitors and the potentially high amplification required. The beamforming algorithm is based on an artificial neural network (ANN). The ANN is used as a nonlinear filter in the frequency domain. The principles of the algorithms engineered and the design of the prototype DSP unit are presented in the paper. Results of experiments simulating real-life conditions are also analysed and discussed.

6984
An Enhanced Implementation of the ADRess (Azimuth Discrimination and Resynthesis) Music Source Separation Algorithm
Cahill, Niall; Cooney, Rory; Lawlor, Robert
In this paper we present a novel enhancement to an existing music source separation algorithm which allows for a 76% decrease in computational load whilst enhancing its separation capabilities. The enhanced implementation is based on the ADRess (Azimuth Discrimination and Resynthesis) algorithm which performs a separation of sources within stereo music recordings based on the spatial audio cues created by source localization techniques. The ADRess algorithm employs gain scaling and phase cancellation techniques to isolate sources based on their position across the stereo field. Objective measures and subjective listening tests have shown the separation performance of the enhanced algorithm to be objectively and perceptually comparable with that of the original ADRess algorithm, whilst realizing a finer spatial resolution.

6985
A Simple, Robust Measure of Reverberation Echo Density
Abel, Jonathan S.; Huang, Patty
A simple, robust method for measuring echo density from a reverberation impulse response is presented. Based on the property that a reverberant field takes on a Gaussian distribution once an acoustic space is fully mixed, the measure counts samples lying outside a standard deviation in a given impulse response window and normalizes by that expected for Gaussian noise. The measure is insensitive to equalization and level, and is seen to perform well on simulated data, artificial reverberation, and measurements of room impulse responses. Listening tests indicate a correlation between echo density measured in this way and perceived temporal quality or texture of the reverberation.
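
The measure described in the abstract above can be sketched in a few lines. This is a minimal reading of the abstract, not the authors' code: it assumes a zero-mean signal (so the standard deviation is taken as the RMS value), a plain rectangular window, and hypothetical function names and window sizes.

```python
import math

# Expected fraction of zero-mean Gaussian samples lying more than one
# standard deviation from zero: erfc(1/sqrt(2)), roughly 0.3173.
GAUSSIAN_OUTLIER_FRACTION = math.erfc(1.0 / math.sqrt(2.0))


def echo_density(window):
    """Normalized echo density of one impulse-response window.

    Counts samples whose magnitude exceeds the window's standard deviation
    (computed here as RMS, assuming zero mean) and divides the resulting
    fraction by the fraction expected for Gaussian noise.
    """
    n = len(window)
    if n == 0:
        raise ValueError("window must not be empty")
    sigma = math.sqrt(sum(x * x for x in window) / n)
    if sigma == 0.0:
        return 0.0  # silent window: no echoes to count
    outliers = sum(1 for x in window if abs(x) > sigma)
    return (outliers / n) / GAUSSIAN_OUTLIER_FRACTION


def echo_density_profile(impulse_response, window_len=1024, hop=512):
    """Echo density over sliding rectangular windows (sizes are assumptions)."""
    last_start = len(impulse_response) - window_len
    return [
        echo_density(impulse_response[start:start + window_len])
        for start in range(0, last_start + 1, hop)
    ]
```

Under these assumptions a window of Gaussian noise gives a value near 1, while a window holding a single isolated reflection gives a value near 0. Because the threshold is the window's own standard deviation, scaling the window leaves the result unchanged, which matches the insensitivity to level noted in the abstract.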

6986
Pole-Zero Analysis of the Soundfield in Small Rooms at Low Frequencies
Oclee-Brown, Jack
At low frequencies in small rooms the acoustic modes are sparse. In this region the sound field can be most easily modelled using the method of modal decomposition. From this it is possible to extract the transfer behaviour of the room for different source and receiver locations and to analyse the pole-zero positions. In this paper it is demonstrated that the locations of the poles are independent of the source and receiver. The effects of some room correction methods are shown in this context.

6987
Modelling of Loudspeaker Systems using High-Resolution Data
Ahnert, Wolfgang; Feistel, Stefan
The need for high-resolution loudspeaker data is evaluated in detail, particularly for complex data in original impulse response or frequency response formats. It is also shown how a new data format proposed earlier can be used for storing this and other information required to adequately describe a complex loudspeaker system. Prediction results for several loudspeaker models are compared based on different spectral and spatial resolutions. Calculations are also compared against measurements for different loudspeaker types, such as multi-way loudspeakers, clusters and line array systems. Finally, the advantages of more precise predictions are discussed with respect to increasing requirements regarding computer performance and data storage.

6988
Software Based Live Sound Measurements
Ahnert, Wolfgang; Feistel, Stefan; Finder, Enno; Miron, Radu Alexandru
In previous publications the authors introduced the software-based measurement system EASERA, to be used for measurements with different excitation signals like Sweep, MLS or Noise. The current approach extends the range of excitations to natural signals like speech and music. This work investigates selected parameters like frequency range, dynamic range, fluctuation of the signal, and signal duration in order to reach conclusions about the conditions required to obtain results comparable with those from standard excitation signals. In this respect the limitations of the standard stimuli and the proposed natural stimuli are also discussed.

6989
Detection of Localized Sound Leaks in Walls and Their Effects on the Speech Privacy (Security) of Closed Rooms
Bradley, John S.; Gover, Bradford N.
A new speech privacy measurement procedure accurately indicates the degree of speech privacy at individual listening locations outside of a closed room, including near localized weak spots. To investigate the importance of various defects (such as penetrations, electrical outlets), they were introduced into a test wall dividing two reverberation rooms. For each configuration of the wall, the following sound transmission measurements were made from one room to the other: i.) a standard transmission loss test, ii.) the new speech privacy measurement procedure, and iii.) impulse response measurements to a highly-directional spherical microphone array. The results indicate the degree to which the various defects affect the speech privacy conditions, and the extent to which they are detectable by the various methods.

(C) 2007, Audio Engineering Society, Inc.