AES San Francisco 2006
P1 - Digital Amplifiers
Paper Session Details
Thursday, October 5, 9:00 am — 11:30 am
Chair: Bob Adams
P1-1 CMOS Class D Amplifier with Output Optimization—Dan White, Sina Balkir, Michael Hoffman, University of Nebraska-Lincoln - Lincoln, NE, USA
Simple amplifier topologies are not the norm for integrated circuit (IC) class D amplifiers. A simple self-oscillating topology is mapped into a standard CMOS technology and fabricated in a 0.5 micron process. The output stage is optimized for a range of modulation indices, simultaneously increasing average efficiency and reducing chip area. Modifications to the optimization methodology are proposed to enhance efficiency and reduce the large transient currents inherent in CMOS inverter chains. Test results are presented and compared to predicted and simulated values. This project shows that design complexity is not a prerequisite for good performance and high efficiency.
Convention Paper 6859 (Purchase now)
P1-2 A Digital Class-D Amplifier with Power Supply Correction—Jeroen Tol, Robert Rutten, Derk Reefman, Philips Research Laboratories - Eindhoven, The Netherlands; Arnaud Biallais, Lûtsen Dooper, Jeroen van den Boom, Philips Semiconductors - Nijmegen, The Netherlands; Renaud de Saint-Moulin, Philips Applied Technologies - Leuven, Belgium; Bruno Putzeys, Hypex Electronics - Groningen, The Netherlands; Frans de Buys, Philips Software - Leuven, Belgium
Digital class-D amplifiers are cost-effective solutions for a wide range of digital audio applications because of their high power efficiency and ease of integration. This paper presents a real-time, cost-effective power supply correction algorithm that substantially increases the power supply rejection of an open-loop digital class-D amplifier. It enables open-loop digital class-D amplifiers to use inexpensive power supplies with less decoupling. Measurements on a prototype amplifier with single-ended loudspeaker loads show 57 dB suppression of 100 Hz supply ripple, while intermodulation products between a 100 Hz supply ripple and a 1 kHz audio tone are attenuated by 40 dB. The dynamic range of the amplifier is 96 dB, in agreement with jitter calculations and the measured phase noise of the on-chip clock generator.
Convention Paper 6860 (Purchase now)
P1-3 Automotive Class D Digital Amplifier Output Stage—Brad Stewart, Freescale Semiconductor - Chandler, AZ, USA; Jim Lee, Freescale Semiconductor - Tempe, AZ, USA; Daniel Wildhaber, Freescale Semiconductor - Ridgeland, MI, USA; Ondrej Pauk, Freescale Semiconductor - Tempe, AZ, USA; Richard Deken, James Babb, Freescale Semiconductor - Ridgeland, MI, USA
Automotive applications using class D digital PWM switching amplifiers have long been limited by the approach’s perceived higher cost, the potential for radiated electromagnetic interference affecting in-vehicle electronics, and the difficulties of designing this type of amplifier using discrete components. An integrated circuit built on a high-performance analog mixed-signal plus power silicon process, coupled with innovative circuit design, now allows the deployment of high power class D amplifiers in vehicles that had previously been confined to class AB amplifier designs. Class D PWM amplifiers, switching at 400 kHz or more, can achieve an unweighted dynamic range exceeding 100 dB, linearity in excess of 80 dB, and PSRR greater than 80 dB with a full-scale audio signal.
Convention Paper 6861 (Purchase now)
P1-4 High Performance Digital Feedback for PWM Digital Audio Amplifiers—Pallab Midya, Bill Roeckner, Theresa Paulo, Freescale Semiconductor - Lake Zurich, IL, USA
Non-idealities associated with the power stage of pulse width modulated (PWM) based, open loop digital audio amplifiers limit their performance. A high performance digital feedback system corrects for both power supply noise and power stage nonlinearity in a PWM digital audio amplifier. An integrated circuit (IC) implementation of this system, along with measured results, is presented. The PWM amplifier, switching between 300 kHz and 400 kHz, achieves an unweighted dynamic range in excess of 100 dB, linearity in excess of 80 dB, and excellent power supply rejection (PSR) with a large scale audio signal.
Convention Paper 6862 (Purchase now)
P1-5 Asynchronous Sample Rate Converter for Digital Audio Amplifiers—Pallab Midya, Bill Roeckner, Freescale Semiconductor - Lake Zurich, IL, USA; Tony Schooler, Motorola, Inc. - Schaumburg, IL, USA
A high performance digital audio amplifier system requires an asynchronous sample rate converter to synchronize the input digital data stream to the low jitter system clock used to generate the digital pulse width modulated (PWM) output. By performing the sample rate conversion on highly oversampled signals, the computation and memory requirements are minimized. The sample rate converter accommodates multiple input and output rates without limiting the performance of the digital amplifier system. The digital amplifier system, including the asynchronous sample rate converter, is implemented in an IC. Measured data shows linearity performance exceeding 120 dB.
Convention Paper 6863 (Purchase now)
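The oversampled-conversion idea in the abstract can be illustrated with a minimal sketch (not the paper's IC implementation; the function name, filter length, and parameters are all illustrative): oversample the input with a windowed-sinc interpolation filter, then read the result out at fractional positions determined by the asynchronous clock ratio.

```python
import numpy as np

def asrc_linear(x, ratio, oversample=32):
    """Toy asynchronous sample-rate converter: oversample with a
    windowed-sinc interpolation filter, then linearly interpolate
    the oversampled signal at read-out positions set by the
    (asynchronous) output/input clock ratio."""
    m = oversample
    up = np.zeros(len(x) * m)
    up[::m] = x                                  # zero-stuff
    taps = 8 * m + 1
    n = np.arange(taps) - taps // 2
    h = np.sinc(n / m) * np.hanning(taps)        # interpolation low-pass
    y = np.convolve(up, h, mode="same")
    pos = np.arange(0, len(y) - 1, m / ratio)    # asynchronous read-out grid
    i = pos.astype(int)
    frac = pos - i
    return y[i] * (1.0 - frac) + y[i + 1] * frac
```

Because the signal is already highly oversampled before the final interpolation, a simple linear interpolator suffices at the read-out stage, which is the computational saving the abstract alludes to.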
P2 - High Resolution and Live Sound
Thursday, October 5, 9:00 am — 11:30 am
Chair: Joshua Reiss, Queen Mary, University of London - London, UK
P2-1 Jitter Simulation in High Resolution Digital Audio—Malcolm Hawksford, University of Essex - Colchester, Essex, UK
To reconstruct an audio waveform, samples must be located precisely in time. Practical systems have jitter sources, described by both correlated and uncorrelated components, that result in low-level distortion. However, less well known is how different forms of jitter distort an audio signal. Jitter theory is developed to produce a simulator that enables jitter-induced distortion to be determined. Distortion spectra can then be observed and time-domain distortion auditioned. Jitter-induced distortion is compared to a range of errors, including DAC errors and incorrect use of dither. The system architectures studied include LPCM with up-sampling and noise shaping, and SDM.
Convention Paper 6864 (Purchase now)
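The effect simulated in the paper can be reproduced in miniature: sample an ideal sine at nominally uniform instants perturbed by uncorrelated timing jitter and compare against the unjittered waveform. A rough sketch under assumed parameters (the function and values are illustrative, not taken from the paper):

```python
import numpy as np

def jittered_sine(freq, fs, n, jitter_rms, rng):
    """Sample a sine at instants t = k/fs perturbed by random
    (uncorrelated) timing jitter with the given RMS in seconds."""
    t = np.arange(n) / fs + rng.normal(0.0, jitter_rms, n)
    return np.sin(2 * np.pi * freq * t)

# For small jitter the error is approximately the signal slope times
# the timing error, so the distortion floor rises with both signal
# frequency and jitter RMS: rms_error ~ 2*pi*f*sigma_t / sqrt(2).
```

Subtracting the ideal waveform exposes the jitter-induced error signal, whose spectrum can then be examined as in the paper's simulator.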
P2-2 The Performance of Look-Ahead Delta-Sigma Modulators with Unstable Noise Shaping Filters—Steve Hoare, Jamie Angus, University of Salford - Salford, UK
Look-ahead sigma-delta modulators look forward k samples before deciding to output a “one” or a “zero.” We look at the performance of such modulators when used with unstable noise shaping filters and examine the trade-off between the number of paths that must be kept alive and the look-ahead depth needed to assure stability. Such information helps define the number of required paths in a “Pruned Tree” algorithm or the stack size in a “Stack” algorithm, as well as the minimum search depth required. Results are presented showing that an unstable noise-shaping filter, with an out-of-band gain of 3.0, can be used successfully. This gives a 10 to 12 dB improvement in signal-to-noise ratio compared to a conventional modulator.
Convention Paper 6865 (Purchase now)
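The look-ahead principle can be sketched with a deliberately simple first-order loop and an exhaustive search over future bit patterns (the paper's interest is unstable higher-order filters with pruned-tree or stack searches; this toy version only illustrates the search itself, and all names are illustrative):

```python
import numpy as np
from itertools import product

def lookahead_dsm(x, depth=4):
    """First-order delta-sigma modulator with exhaustive k-sample
    look-ahead: at each step, evaluate every future +/-1 bit pattern
    over the horizon and commit the first bit of the pattern with the
    lowest accumulated squared loop error.  A 'pruned tree' or
    'stack' algorithm keeps only the best few of these paths alive
    instead of enumerating all 2^depth of them."""
    bits = np.empty(len(x))
    state = 0.0
    for n in range(len(x)):
        horizon = x[n:n + depth]
        best, best_cost = 1.0, np.inf
        for pattern in product((-1.0, 1.0), repeat=len(horizon)):
            s, cost = state, 0.0
            for xk, b in zip(horizon, pattern):
                s = s + xk - b          # integrator (loop filter) update
                cost += s * s
            if cost < best_cost:
                best_cost, best = cost, pattern[0]
        bits[n] = best
        state = state + x[n] - best     # commit only the first bit
    return bits
```

For a DC input the committed bit stream's average tracks the input, as in a conventional modulator, while the look-ahead keeps the loop state bounded.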
P2-3 Prediction and Verification of Powered Loudspeaker Requirements for an Assisted Reverberation System—Mark Poletti, Industrial Research Limited - Lower Hutt, Wellington, New Zealand; Roger Schwenke, Meyer Sound Laboratories - Berkeley, CA, USA
Electronic enhancement systems are being increasingly used to provide control of the acoustics of multipurpose venues. Reverberation enhancement systems are an important component of such systems. These provide an electroacoustic field that supports the naturally occurring reverberant field using multiple microphones and loudspeakers. This paper derives a simple formula for the on-axis SPL required to support a given sound field. The formula is verified by measurements in two rooms with installed assisted reverberation systems. A relative measurement method is also developed to allow the maximum level to be determined from measurements at a lower level.
Convention Paper 6866 (Purchase now)
P2-4 The Representation of and Control Over Mixing Desks via a Software-Based Matrix—Richard Foss, Philip Foulkes, Rhodes University - Grahamstown, South Africa
The control over all the parameters of a mixing desk can be a daunting task. This paper describes a software system that has been created to represent the signal processing and routing functions of MIDI-controllable mixing desks in a conceptually clear manner. Input to output routings are displayed in the form of a matrix, while signal processing functionality can be accessed at the inputs, outputs, and cross-points. XML is used to capture the elements of the mixing desk and to associate appropriate MIDI control messages with these elements. This enables the same matrix template to be used for many mixing desks. Remote control is enabled by IP-based MIDI routing software known as MIDINet.
Convention Paper 6867 (Purchase now)
P2-5 A Cascaded Delta Sigma DAC with DWA Decreasing Mismatch Effect—Yousuke Terada, Akira Yasuda, Masao Zen, Syunsuke Katsumi, Hosei University - Tokyo, Japan
In this paper we propose a compact, high-performance cascaded delta-sigma DAC (CDS-DAC) that uses an analog FIR filter with second-order mismatch shaping. When a multibit DAC is constructed, mismatch caused by element variation degrades the overall performance of the analog part. We propose a novel CDS-DAC with a second-order mismatch shaping function, realized with several switches in the analog part and a first-order mismatch shaper in the digital part, to mitigate the performance degradation caused by these mismatches. Simulations show that the SNR of the proposed CDS-DAC is 122 dB at an oversampling ratio of 128 with a component mismatch of 1 percent.
Convention Paper 6868 (Purchase now)
P3 - Audio Coding
Thursday, October 5, 1:30 pm — 5:30 pm
Chair: Brett Crockett, Dolby Laboratories - San Francisco, CA, USA
P3-1 An Enhanced Encoder for the MPEG-4 ALS Lossless Coding Standard—Takehiro Moriya, Noboru Harada, Yutaka Kamamoto, NTT Communication Science Labs. - Atsugi, Kanagawa, Japan
MPEG-4 Audio Lossless Coding (ALS) is a lossless coding standard for audio signals based on time-domain prediction. Enhanced encoder algorithms and implementation examples of the MPEG-4 ALS are described in this paper. To reduce the computational complexity of the encoder, simplified algorithms have been developed for the multichannel prediction coding and the long-term prediction tools. In addition, processing speed has been enhanced by means of software optimization. As a result of these improvements, encoding speed becomes as much as six times faster than that of the MPEG reference software. This makes the standard more useful for various practical applications.
Convention Paper 6869 (Purchase now)
P3-2 Perceptually Biased Linear Prediction—Arijit Biswas, Technische Universiteit Eindhoven - Eindhoven, The Netherlands; Albertus C. den Brinker, Philips Research Laboratories - Eindhoven, The Netherlands
A perceptually biased linear prediction scheme is proposed for audio coding, using only simple modifications of the coefficients defining the normal equations for a least-squares error. Thereby, spectral masking effects are mimicked in the prediction synthesis filter without using an explicit psychoacoustic model. The main advantage is reduced computational complexity. The proposed approach was implemented in a Laguerre-based linear prediction scheme, and its performance has been evaluated against a linear prediction approach controlled by the ISO MPEG-1 Layer I-II model as well as one of the latest spectral-integration-based psychoacoustic models. Listening tests clearly demonstrate the viability of the proposed method.
This paper has been selected as the winner of the first Audio Engineering Society Student Paper Award.
Convention Paper 6870 (Purchase now)
P3-3 Fast Complex Quadrature Mirror Filterbanks for MPEG-4 HE-AAC—Han-Wen Hsu, Chi-Min Liu, Wen-Chieh Lee, National Chiao Tung University - Hsinchu, Taiwan
Spectral Band Replication (SBR) has been introduced in MPEG-4 HE-AAC as a bandwidth extension tool. The entire SBR framework operates in the complex-valued domain to avoid aliasing effects, which results in considerable time complexity. This paper focuses on the complex Quadrature Mirror Filter (QMF) banks used in HE-AAC encoders and decoders and proposes two fast decomposition methods, based on the DCT-IV and DFT respectively, for the time-consuming matrix operations in the filterbanks. The time complexity can thereby be effectively reduced using the available fast algorithms for the DCT and FFT.
Convention Paper 6871 (Purchase now)
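The kind of speed-up the paper exploits can be demonstrated on the DCT-IV, the matrixing at the heart of cosine- and complex-modulated QMF banks: the direct O(N^2) matrix product can be replaced by a 2N-point FFT with pre- and post-twiddles. A self-contained sketch (the decomposition below is a standard identity, not the paper's specific factorization):

```python
import numpy as np

def dct4_direct(x):
    """Direct O(N^2) DCT-IV: X[k] = sum_n x[n] cos(pi/N (n+1/2)(k+1/2))."""
    N = len(x)
    n = np.arange(N)
    k = n[:, None]
    return np.cos(np.pi / N * (n + 0.5) * (k + 0.5)) @ x

def dct4_fft(x):
    """Same transform via a 2N-point FFT with pre/post twiddle factors,
    reducing the matrixing to O(N log N)."""
    N = len(x)
    n = np.arange(N)
    pre = x * np.exp(-1j * np.pi * n / (2 * N))       # pre-twiddle
    X = np.fft.fft(pre, 2 * N)[:N]                    # zero-padded FFT
    post = np.exp(-1j * np.pi * (2 * n + 1) / (4 * N))  # post-twiddle
    return (post * X).real
```

Both routines produce identical spectra; the FFT route is what makes the complex QMF matrixing affordable at HE-AAC block sizes.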
P3-4 Compression Artifacts in Perceptual Audio Coding—Chi-Min Liu, Han-Wen Hsu, Chung-Han Yang, Kan-Chun Lee, Shou-Hung Tang, Yung-Cheng Yang, Wen-Chieh Lee, National Chiao Tung University - Hsinchu, Taiwan
Under typical conditions, perceptual audio coding can encode an audio signal transparently for human auditory perception. In the past, several types of artifacts have been defined for linear quantization or MP3 music tracks. However, with the advance of new technologies in AAC, SBR, and parametric coding, various new types of artifacts arise. This paper models these newly identified audible artifacts and analyzes the problematic encoder modules that lead to them. These artifacts should be a major consideration in developing objective or subjective tests. We also consider artifact relief through concealment schemes in decoders and module design in encoders.
Convention Paper 6872 (Purchase now)
P3-5 Design of HE-AAC Version 2 Encoder—Chung-Han Yang, Han-Wen Hsu, Kan-Chun Lee, Shou-Hung Tang, Yung-Cheng Yang, Chia-Ming Chang, Chi-Min Liu, Wen-Chieh Lee, National Chiao Tung University - Hsinchu, Taiwan
HE-AAC version 2 consists of three encoders: MPEG-4 AAC low complexity, spectral band replication (SBR), and parametric stereo coding (PS). Our previous works have considered several modules in these three encoders, including T/F grid search, tone/noise component adjustment, coupling coding, time region decision, and down-mix method. This paper considers the associated solutions for HE-AAC version 2 and proposes an integrated design. Objective and subjective tests are used to verify the quality improvement.
Convention Paper 6873 (Purchase now)
P3-6 Analysis and Synthesis for Universal Spatial Audio Coding—Michael Goodwin, Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA
Spatial audio coding (SAC) addresses the need for efficient representation of multichannel audio content. SAC methods are typically based on analyzing interchannel relationships in the input audio and resynthesizing those same relationships between the output channels. Recently, a method was proposed and demonstrated based on analyzing the input audio scene and describing it without reference to the channel configuration, thereby enabling flexible, accurate rendering on arbitrary output systems. In this paper we provide further mathematical treatment of this universal spatial audio coding system; we develop an analysis-synthesis method based on a linear algebraic model; present an efficient approach for adapting the synthesis to arbitrary loudspeaker configurations; and describe a straightforward scheme for scalable reduction of the spatial cue data.
Convention Paper 6874 (Purchase now)
P3-7 A Novel Very Low Bit Rate Multichannel Audio Coding Scheme Using Accurate Temporal Envelope Coding and Signal Synthesis Tools—Chandresh Dubey, Richa Gupta, Deepen Sinha, ATC Labs - Chatham, NJ, USA; Anibel Ferreira, ATC Labs - Chatham, NJ, USA, University of Porto, Porto, Portugal
Multichannel audio is increasingly ubiquitous in consumer audio applications such as satellite radio broadcast systems, surround sound playback systems, multichannel audio streaming, and other emerging applications. These applications often present challenging bandwidth constraints, making parametric multichannel coding schemes attractive. Several techniques have been proposed recently to address this problem. Here we present a novel low bit rate five-channel encoding system that has shown promising results. This technique, called the Immersive Soundfield Rendition (ISR) System, emphasizes accurate reproduction of the multiband temporal envelope. The ISR system also incorporates a very low-overhead (blind upmixing) mode. The proposed system has yielded promising results for multichannel coding in the 0 to 12 kbps range. More information and audio demonstrations will be available at www.atc-labs.com/isr.
Convention Paper 6875 (Purchase now)
P3-8 New Results in Low Bit Rate Speech Coding and Bandwidth Extension—Raghuram Annadana, Harinarayanan E. V., ATC Labs - Chatham, NJ, USA; Anibel Ferreira, ATC Labs - Chatham, NJ, USA, University of Porto, Porto, Portugal; Deepen Sinha, ATC Labs - Chatham, NJ, USA
Emerging digital audio applications for broadcast radio and multimedia systems are presenting new challenges such as the need to code mixed audio content, error robustness, higher audio bandwidth, and the need to deliver high quality audio at low bit rates, demanding a paradigm shift in existing low bit rate speech coding techniques. This paper describes the continuation of our research in the area of low bit rate speech coding and enhancements in the recently introduced bandwidth extension toolkit, the Audio Bandwidth Extension Toolkit (ABET). Several new modes of operation have been introduced in the codec, in particular making innovative use of perceptual coding tools. In addition, a new mode in ABET improves the efficiency of the temporal shaping tool, Multi Band Temporal Amplitude Coding (MBTAC), by exploiting the time and frequency correlation in signals. The structure of the codec and its performance in these modes of operation are detailed. Audio demonstrations and further information are available at www.atc-labs.com/lbr/ and www.atc-labs.com/abet/.
Convention Paper 6876 (Purchase now)
P4 - Instrumentation and Measurement
Thursday, October 5, 1:30 pm — 4:00 pm
Chair: Eric Benjamin, Dolby Laboratories - San Francisco, CA, USA
P4-1 A New Method for Measuring Distortion Using a Multitone Stimulus and Noncoherence—Steve Temme, Pascal Brunet, Listen, Inc. - Boston, MA, USA
A new approach for measuring distortion based on dual-channel analysis of noncoherence between a stimulus and response is presented. This method is easy to implement, provides a continuous distortion curve against frequency, and can be used with a multitone stimulus, noise, or even music. Multitone is a desirable test signal for fast frequency response measurements and also for assessing system nonlinearities. However, conventional single-channel multitone measurements are challenging because the number of intermodulation tones grows rapidly with the number of stimulus tones and makes it extremely difficult to separate harmonics from intermodulation products. By using dual-channel measurement techniques, only well-known, standard signal processing techniques are used, resulting in simplicity, accuracy, and repeatability.
Convention Paper 6877 (Purchase now)
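The dual-channel principle behind this method can be sketched with standard signal processing: estimate the magnitude-squared coherence between stimulus and response by Welch-style segment averaging; the noncoherent power fraction (1 - coherence) then captures distortion plus noise. A minimal sketch, not the authors' implementation, with illustrative segment sizes:

```python
import numpy as np

def msc(x, y, nseg=256, navg=64):
    """Magnitude-squared coherence between x and y via Hann-windowed
    segment averaging.  The noncoherent fraction 1 - msc estimates
    the distortion-plus-noise power ratio at each frequency."""
    w = np.hanning(nseg)
    Sxx = Syy = Sxy = 0.0
    for i in range(navg):
        seg = slice(i * nseg, (i + 1) * nseg)
        X = np.fft.rfft(w * x[seg])
        Y = np.fft.rfft(w * y[seg])
        Sxx = Sxx + np.abs(X) ** 2         # averaged auto-spectra
        Syy = Syy + np.abs(Y) ** 2
        Sxy = Sxy + np.conj(X) * Y         # averaged cross-spectrum
    return np.abs(Sxy) ** 2 / (Sxx * Syy)
```

A purely linear system yields coherence of one at every frequency, while a nonlinear transfer (e.g., a cubic term) pushes part of the response power into the noncoherent residual, giving the continuous distortion-versus-frequency curve the abstract describes.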
P4-2 Gradient Microphone Design Using Miniature Microphone Arrays—Juha Backman, Nokia Corporation - Espoo, Finland
The paper describes a method of using a dense array of miniature microphones (e.g., MEMS or miniature electret) to yield precise one-point multichannel gradient microphones. The signals obtained from individual microphones in the array are used to estimate the zeroth-, first-, and second-order gradient components of the sound field at the center of the array. (Higher-order gradients tend to be too noisy for practical sound recording.) These components can be used to form stereo or multichannel signals with adjustable polar patterns for recording purposes.
Convention Paper 6878 (Purchase now)
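The zeroth- and first-order estimation can be illustrated with the simplest possible array, a pair of omni capsules: the sum approximates the pressure at the center, the difference approximates the spatial derivative along the pair's axis and therefore has a figure-eight (cosine) pattern. A hypothetical simulation sketch (the spacing, geometry, and function names are assumptions, not from the paper):

```python
import numpy as np

def plane_wave_pair(theta, f, fs, d=0.01, c=343.0, n=4096):
    """Signals at two omni capsules spaced d meters apart on the x axis,
    for a plane wave of frequency f arriving from angle theta."""
    t = np.arange(n) / fs
    delay = (d / 2) * np.cos(theta) / c          # per-capsule arrival offset
    return (np.sin(2 * np.pi * f * (t - delay)),
            np.sin(2 * np.pi * f * (t + delay)))

def omni_and_gradient(p1, p2):
    """Zeroth-order (pressure) and first-order (gradient) estimates:
    the average approximates the field at the array center, the
    difference approximates the derivative along the capsule axis."""
    return 0.5 * (p1 + p2), p1 - p2
```

Sweeping theta shows the difference channel's amplitude following cos(theta), which is the first-order pattern used to synthesize adjustable polar responses.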
P4-3 Wind Generated Noise in Microphones—An Overview: Part II—Eddy B. Brixen, DPA Microphones A/S - Allerød, Denmark, EBB-consult, Smorum, Denmark
When microphones are exposed to wind, noise is generated. The amount of noise depends on many factors, the speed and direction of the wind being, of course, two of the most important. However, the size, shape, and design principles of the microphones are also important factors. At higher wind speeds, not only is noise generated but distortion is also introduced, normally as a result of clipping. This paper is the second in a series of two. The first paper presented standard condenser microphones with first-order characteristics. This paper presents a number of comparative measurements on electrodynamic microphones, some of the workhorses in the field.
Convention Paper 6879 (Purchase now)
P4-4 A Portable Record Player for Wax Cylinders Using both Laser-Beam Reflection and Stylus Methods—Tohru Ifukube, The University of Tokyo - Meguro-ku, Tokyo, Japan; Yasuyuki Shimizu, Japan Women’s University - Bunkyou-ku, Tokyo, Japan
The wax phonograph cylinder invented by Thomas Edison in 1885 was the medium for recording sound until about 1930. Although many historically valuable cylinders have been preserved all over the world, most have degraded through recrystallization and have many cracks on their surfaces. We have developed a portable record player (3.4 kg; 45 cm wide, 33 cm long, 10.5 cm high) for these cylinders using both laser-beam reflection and stylus methods. In this paper the record player is shown to be easily carried by hand and capable of reproducing sound in real time from damaged as well as undamaged wax cylinders.
Convention Paper 6880 (Purchase now)
P4-5 High-Accuracy Full-Sphere Electro-Acoustic Polar Measurements at High Frequencies Using the HELS Method—Huancai Lu, Sean Wu, SenSound LLC - Grosse Point Farms, MI, USA; D. B. (Don) Keele, Jr., Harman International Industries - Northridge, CA, USA
Traditionally, high-accuracy full-sphere polar measurements require dense sampling of the sound field at very fine angular increments, particularly at high frequencies. The proposed HELS (Helmholtz Equation Least Squares) method allows this restriction to be relaxed significantly. Using this method, far fewer sampling points are needed for full and accurate reconstruction of the radiated sound field. Depending on the required accuracy, sound fields can be reconstructed using only 10 to 20 percent of the number of sampling points required by conventional techniques. The HELS method allows accurate reconstruction even for sample spacing that violates the Nyquist spatial sampling rate in certain directions. This paper examines the convergence of HELS solutions via theory and simulation for reconstruction of the acoustic radiation patterns generated by a rectangular plate mounted on an infinite rigid flat baffle. In particular, the impact of the numbers of expansion terms and measurement points, as well as errors embedded in the input data, on the resulting accuracy of reconstruction is analyzed.
Convention Paper 6881 (Purchase now)
P5 - Loudspeakers - Part 1
Friday, October 6, 8:30 am — 11:00 am
Chair: Steve Temme, Listen, Inc. - Boston, MA, USA
P5-1 Measurement and Visualization of Loudspeaker Cone Vibration—Wolfgang Klippel, Klippel GmbH - Dresden, Germany; Joachim Schlechter, University of Technology - Dresden, Germany
Optical measurement of loudspeaker cone vibration (scanning vibrometry) can also be accomplished using a laser triangulation technique, a cost-effective alternative to Doppler interferometry. Since triangulation sensors primarily provide displacement, advanced signal processing is required to measure break-up modes up to 20 kHz at a sufficient signal-to-noise ratio. In addition to stroboscopic animation of the radiation pattern, a new decomposition technique is presented for the visualization of the measured data. Radial and circular modes can be separated, and the total vibration can be split into radiating and non-radiating components. This kind of postprocessing reveals critical vibration modes, simplifies interpretation, and gives indications for further improvements.
Convention Paper 6882 (Purchase now)
P5-2 An Extended Small Signal Parameter Loudspeaker Model for the Linear Array Transducer—Andrew Unruh, Christopher Struck, Richard Little, Ali Jabbari, Jens-Peter Axelsson, Tymphany Corporation - Cupertino, CA, USA
The Linear Array Transducer (LAT) is a tubular form-factor loudspeaker driver technology that, to a good first approximation, can be modeled by the standard linear time invariant small signal parameter (SSP) loudspeaker circuit model. However, to understand the behavior of a LAT to a greater level of detail, the SSP model can be extended with the addition of eight additional mechanical parameters. In this paper the nature of these additional parameters in the model are explained. Further, an extended blocked impedance model is introduced that may be used with LATs or conventional loudspeakers. Additionally, the model is correlated to measurements of currently available LATs. Finally, it is shown how the LAT extended SSP model is approximated by the standard loudspeaker SSP model.
Convention Paper 6883 (Purchase now)
P5-3 An Optimized Full Bandwidth 20 Hz to 20 kHz Digitally Controlled Coaxial Source—Hmaied Shaiek, ENST de Bretagne - Brest Cedex, France; Bernard Debail, Yvon Kerneis, Cabasse Acoustic Center - Plouzané, France; Jean Marc Boucher, ENST de Bretagne - Brest Cedex, France; Pierre Yves Diquelou, Cabasse Acoustic Center - Plouzané, France
This paper addresses the design considerations of the first four-way, full-bandwidth coaxial loudspeaker system. Unlike conventional coaxial drivers, a special motor configuration has been adopted in order to approach a true coincident source. Practical constraints on the moving-mass elements and cone shape are examined with regard to the targeted sound radiation characteristics. Dedicated digital signal processing techniques are implemented to optimize relevant parameters such as frequency response, directivity index, and radiation pattern, especially in the drivers’ overlap bands. This optimization is achieved through complex weighting of the crossover filter transfer functions, with the optimal weights found by a new gradient-based routine.
Convention Paper 6884 (Purchase now)
P5-4 Linear Phase Crossover Filter Advantages in Concert Sound Reinforcement Systems: A Practical Approach—Mario Di Cola, Audio Labs Systems - Milano, Italy; Miguel T. Hadelich, Dolby Laboratories - San Francisco, CA, USA; Daniele Ponteggia, Studio Ponteggia - Terni, Italy; Davide Saronni, Audio Labs Systems - Milano, Italy
Today’s concert sound reinforcement systems constantly demand improved performance, and the chosen crossover approach can be a key point in improving overall loudspeaker performance; modern DSP offers several ways to achieve this. FIR linear phase filters, a processing method that belongs strictly to the digital domain, can be very useful tools. Although well known and studied for some time, FIR linear phase crossovers are, for practical reasons, not yet widely applied. Practical issues in real-world applications suggest mixing FIR and IIR techniques to arrive at a more efficient and practical approach. While such a combination will not yield a perfect, ideal linear phase system, the overall result is very similar to that of a minimum phase system. Better transient response, increased depth and warmth, and improved coverage stability compared to a standard crossover approach are some of the advantages that can be achieved. In this practical study, an existing DSP-based device was used to process several real-world loudspeaker systems, performing all the required processing with mixed FIR and IIR filter techniques. The differences and advantages in time response, phase response, directivity, and output capability are reported, with the systems configured for both minimum phase and other all-pass responses. The results are demonstrated with extensive measurement data.
Convention Paper 6885 (Purchase now)
P5-5 Optimum Diaphragm and Waveguide Geometry for Coincident Source Drive Units—Mark Dodd, KEF Audio - Maidstone, Kent, UK
Coincident source loudspeakers avoid the response and directivity irregularities seen with conventional spaced drivers in the crossover region. Earlier work has shown that by placing the high frequency driver at the apex of the low frequency diaphragm the directivity of both drivers may be regularized in the crossover region. This paper describes the application of transient finite element method to explore how some simple sources interact with various boundary conditions. A novel geometry giving much-improved bandwidth and directivity is introduced. Simulated results of an idealized high frequency driver using this geometry are compared to those of an idealized direct radiating dome. The implementation of a new design incorporating this novel structure is discussed, and results from a complete design are presented.
Convention Paper 6886 (Purchase now)
P6 - Psychoacoustics, Perception, and Listening Tests - Part 1
Friday, October 6, 8:30 am — 11:30 am
Chair: Sean Olive, Harman-JBL - Northridge, CA, USA
P6-1 Investigation of Hearing Loss Influence on Music Perception, in Auditoria, by Means of Stereo Dipole Reproduction—Andrea Capra, Marco Binelli, Daniela Marmiroli, Paolo Martignon, Angelo Farina, University of Parma - Parma, Italy
Many people who sit in theaters or auditoria do not have an optimal perception of sound because of hearing loss. In order to find a correlation between objective parameters and subjective descriptors, and thus to conduct meaningful listening tests, listener perception must first be studied. For this purpose, theatergoers differing in age, sex, and degree of hearing loss were chosen as subjects. The listening test was based on virtual spatial recreation of several theaters by means of an optimized stereo dipole technique. The test was repeated with 30 subjects, with and without hearing aids previously fitted to compensate for their auditory loss. Some preliminary data analysis results are shown.
Convention Paper 6887 (Purchase now)
P6-2 Audibility of Linear Distortion with Variations in Sound Pressure Level and Group Delay—Lidia Lee, Eastern Michigan University - Ypsilanti, MI, USA; Earl Geddes, GedLee LLC - Northville, MI, USA
Recent psychoacoustic studies of nonlinear distortion have yielded new insights into the possible causes of audible problems in loudspeakers. This paper presents the results of recent subjective tests that extend previous work, showing that sound level significantly affects the perception of linear distortion in audio systems. This implies that the hearing system itself is nonlinear, and that what has been regarded as nonlinear distortion in the audio system may actually be a nonlinear perception arising in the receiver itself.
Convention Paper 6888 (Purchase now)
P6-3 Development and Evaluation of Short-Term Loudness Meters—Gilbert Soulodre, Michel Lavoie, Communications Research Centre - Ottawa, Ontario, Canada
Recently, much effort has been devoted to developing and evaluating an algorithm that can accurately measure the long-term loudness of mono, stereo, and multichannel audio signals. This has resulted in a new ITU recommendation that provides a single loudness reading for the overall audio sequence. In many applications it is desirable to also have a measure that can continuously track the short-term loudness of the audio signal over time. Such a meter would be used in conjunction with existing metering methods to provide additional information about the audio signal. In the present paper subjective test methods are devised to aid in the development of a short-term loudness meter. Subjective methods for evaluating the meter’s performance are also explored.
Convention Paper 6889 (Purchase now)
P6-4 Headphones Listening Tests—Martin Opitz, AKG Acoustics GmbH - Vienna, Austria
In the present paper a dedicated listening environment for the subjective evaluation of headphones is described. The focus is on the subjective sound quality of different headphones. The listening environment comprises a dedicated playback system in a reference listening room. A three-part software package was developed that supports the design, execution, and postprocessing of headphone listening tests. First experiences with this tool and the results of subjective audio ratings for four different headphones are described. Factorial analysis of the obtained responses suggests that projecting the ratings into a two-dimensional factorial space results in negligible loss of information. The benefit of a new membrane technology is reflected in significantly improved subjective ratings of the respective headphones.
Convention Paper 6890 (Purchase now)
P6-5 Distance Perception of Phantom Sound Images Presented by Multiple Loudspeakers Placed at Different Distances in Front of a Listener—Reiko Okumura, Kimio Hamasaki, Kohichi Kurozumi, NHK Science & Technical Research Laboratories - Tokyo, Japan
Distance perception of composite sound images reproduced by multiple loudspeakers, placed at different distances (near and far) in front of a listener on a horizontal plane, was investigated. The loudspeakers reproduced both phantom and real sound images. The results of subjective evaluations suggest that phantom sound images can be composed by the near and far loudspeakers, and that listeners can distinguish the distance of these phantom images from that of the real sound images presented by the individual loudspeakers.
Convention Paper 6891 (Purchase now)
P6-6 Implementation of Swing Sound Image and its Localization Accuracy in Two-Channel Stereo Sound Reproduction—Akihiro Kudo, Seiya Kubo, Haruhide Hokari, Shoji Shimada, Nagaoka University of Technology - Nagaoka, Niigata, Japan
In virtual sound reproduction with headphones, a well-known problem is that using nonindividualized head-related transfer functions (HRTFs) yields front-back confusion in sound image localization. To overcome this problem, a swing sound image method has already been reported that significantly reduces the front-back confusion in single sound source reproduction. In order to apply the method to two-channel stereo sound reproduction, this paper proposes two methods of producing the swing sound image: the twist and compand methods. Three listening tests are used to assess their localization accuracy. The results show that, with suitable parameters, these methods can reduce front-back confusion.
Convention Paper 6892 (Purchase now)
P7 - Posters: Audio Coding
Friday, October 6, 9:00 am — 10:30 am
P7-1 Error-Robust Frame Splitting for Audio Streaming Over the Lossy Packet Network—Jong Kyu Kim, Hwan Sik Yun, Jung Su Kim, Seoul National University - Seoul, Korea; Joon Hyuk Chang, Inha University - Incheon, Korea; Nam Soo Kim, Seoul National University - Seoul, Korea
In this paper we propose a novel audio streaming scheme for perceptual audio coders operating over packet-switching networks. Each frame is split into several subframes, which can be decoded independently, based on the specified packet size, for robust error concealment. We further improve the subframe splitting technique by allocating the spectral lines to each subframe adaptively. An informal listening test showed that our approach improves audio quality in lossy packet network environments.
Convention Paper 6893 (Purchase now)
P7-2 Adaptive Filter Banks Using Fixed Size MDCT and Subband Merging for Audio Coding—Comparison with the MPEG AAC Filter Banks—Ewen Camberlein, Pierrick Philippe, France Télécom R&D - Cesson-Sévigné, France; Frederic Bimbot, IRISA - Rennes, France
The MPEG Audio AAC standard uses two MDCT sizes and transition windows in order to adapt these uniform transforms to the characteristics of the signal. We present a new adaptation scheme based on a fixed-size MDCT in which subband merging is applied to obtain the appropriate temporal resolution in selected parts of the spectrum. The performance of the proposed approach is evaluated against the AAC filter banks using an objective measure that takes the temporal masking effect into account.
Convention Paper 6894 (Purchase now)
P7-3 An Audio Archiving Format Based on the MPEG-4 Audio Lossless Coding—Noboru Harada, Takehiro Moriya, Yutaka Kamamoto, NTT Communication Science Labs. - Atsugi, Kanagawa, Japan
An audio archiving tool using MPEG-4 Audio Lossless Coding as the encoding engine offers excellent compression performance and the ability to handle several audio files as one archived file. The archiving tool is suitable for a variety of applications. Test results show that the compression performance of the proposed tool is much better than that of ZIP when the input data are audio files. This application format for the archiving tool has been proposed to MPEG-A and is being discussed as one of the multimedia application formats under development.
Convention Paper 6895 (Purchase now)
P7-4 Quality Improvement of a Scalable Audio Codec Based on a Phase Estimation Technique for Reconstructed Harmonic Structure—Wei-Chen Chang, Alvin Su, National Cheng-Kung University - Tainan, Taiwan
A spectral-oriented-trees-based audio coder with harmonic structure reconstruction was proposed recently. Its fine scalability, low complexity, and near-MP3 quality make it suitable for Internet applications. However, during the reconstruction process, the phase information of the reconstructed coefficients is absent. This can create phase discontinuities between adjacent frames, which are audible in listening tests and degrade the objective grades for many test sources. In this paper we propose a new inter/intra-frame phase estimation method to reduce this problem; a refined harmonic reconstruction method is also applied. The quality improvement is significant: the proposed coder outperforms the earlier method, and its performance approaches that of the popular MP3 Pro for many music sources.
Convention Paper 6896 (Purchase now)
P8 - Posters: Analysis and Synthesis
Friday, October 6, 11:00 am — 12:30 pm
P8-1 The Singing Tutor: Expression Categorization and Segmentation of the Singing Voice—Oscar Mayor, Jordi Bonada, Alex Loscos, Pompeu Fabra University - Barcelona, Spain
Computer evaluation of singing interpretation has traditionally been based exclusively on tuning and tempo. This paper presents a tool for the automatic evaluation of singing voice performances that considers not only tuning and tempo but also the expression of the voice. For this purpose, the system performs analysis at the note and intra-note levels. Note-level analysis outputs traditional note pitch, note onset, and note duration information, while intra-note-level analysis locates and categorizes the expression of the note's attacks, sustains, transitions, releases, and vibratos. Segmentation is done using an algorithm based on untrained hidden Markov models (HMMs) with probabilistic models built from a set of heuristic rules. A graphical tool for the evaluation and fine-tuning of the system is presented. The interface gives feedback about analysis descriptors and rule probabilities.
Convention Paper 6897 (Purchase now)
P8-2 Expert System for Automatic Classification and Quality Assessment of Singing Voices—Pawel Zwan, University of Technology Gdansk - Gdansk, Poland
The aim of the research work presented is an automatic singing voice quality recognition system. For this purpose a database of singers' sample recordings was constructed, and parameters were extracted from recorded voices of trained and untrained singers of different voice types. Parameters designed specifically for the analysis of the singing voice are analyzed and a feature vector is formed. Each singer's voice sample was judged by experts, providing information about voice quality. The extracted parameters are used to train a neural network, and the effectiveness of automatic voice quality classification is tested by comparing the automatic recognition results with the subjective expert judgments. Finally, the results are discussed and conclusions are drawn.
Convention Paper 6898 (Purchase now)
P8-3 Facilities Used for Introductory Electronic Music: A Survey of Universities with an Undergraduate Degree in Audio—Joseph Akins, Middle Tennessee State University - Murfreesboro, TN, USA
This paper reports on the facilities used for introductory electronic music at United States universities that offered an undergraduate degree in audio production and technology in fall 2005. The population included 54 programs listed in the Audio Engineering Society's Directory of Educational Programs. Through an online questionnaire, each university reported on the first hands-on electronic music course offered at its institution. With a response rate of 81 percent, the respondents reported on specific hardware, software, purposes, and curricular applications. For example, 93 percent of the respondents reported using Mac OS, while 20 percent reported using Microsoft Windows.
Convention Paper 6899 (Purchase now)
P8-4 Improvements to a Sample-Concatenation Based Singing Voice Synthesizer—Jordi Bonada, Merlijn Blaauw, Alex Loscos, Universitat Pompeu Fabra - Barcelona, Spain
This paper describes recent improvements to our singing voice synthesizer, which is based on the concatenation and transformation of audio samples using spectral models. The improvements include robust automation of the singer database creation process, previously a lengthy and tedious task involving recording script generation, studio sessions, audio editing, spectral analysis, and phonetic-based segmentation; and enhancements to the synthesis technique that improve the quality of sample transformations and concatenations and discriminate between phonetic intonation and musical articulation.
Convention Paper 6900 (Purchase now)
P8-5 Modeling Musical Articulation Gestures in Singing-Voice Performances—Esteban Maestre, Jordi Bonada, Oscar Mayor, Universitat Pompeu Fabra - Barcelona, Spain
We present a procedure to automatically describe musical articulation gestures used in singing voice performances. We detail a method to characterize temporal evolution of fundamental frequency and energy contours by a set of piece-wise fitting techniques. Based on this, we propose a meaningful parameterization that allows reconstructing contours from a compact set of parameters at different levels. We test the characterization method by applying it to fundamental frequency contours of manually segmented transitions between adjacent notes and train several classifiers with manually labeled examples. We show the recognition accuracy for different parameterizations and levels of representation.
Convention Paper 6901 (Purchase now)
P8-6 Automatic Tonal Analysis from Music Summaries for Version Identification—Emilia Gómez, Beesuan Ong, Perfecto Herrera, Universitat Pompeu Fabra - Barcelona, Spain
Identifying versions of the same song by means of automatically extracted audio features is a complex task to achieve using computers, even though it may seem very simple for a human listener. The design of a system to perform this job gives the opportunity to analyze which features are relevant for music similarity. This paper focuses on the analysis of tonal and structural similarity and its application to the identification of different versions of the same piece. It describes the situations where a song is versioned and several musical aspects are transformed with respect to the canonical version. A quantitative evaluation is made using tonal descriptors, including chroma representations and tonality, combined with the automatic extraction of summary of pieces through music structural analysis.
Convention Paper 6902 (Purchase now)
P8-7 Groovator—An Implementation of Real-Time Rhythm Transformations—Jordi Janer, Jordi Bonada, Sergi Jordà, Universitat Pompeu Fabra - Barcelona, Spain
This paper describes a real-time system for rhythm manipulation of polyphonic audio signals. A rhythm analysis module extracts tempo and beat location information. Based on this rhythm information, we apply different transformations: tempo, swing, meter, and accent. This type of manipulation is generally referred to as content-based transformation. We address characteristics of the analysis and transformation algorithms. In addition, user interaction plays an important role in this system: tempo variations can be controlled either by tapping the rhythm on a MIDI interface or by using an external audio signal, such as percussion or the voice, as tempo control. We conclude by pointing out several use cases, focusing on live performance situations.
Convention Paper 6903 (Purchase now)
P9 - Posters: Computers & Mobile Audio
Friday, October 6, 1:30 pm — 3:00 pm
P9-1 A Personalized Preset-Based Audio System for Interactive Service—Taejin Lee, Jae-hyoun Yoo, Yongju Lee, Daeyoung Jang, ETRI (Electronics and Telecommunication Research Institute) - Daejeon, Korea
A conventional audio service provides one mixed audio scene to the user, so the user can control only the overall volume. In a personalized audio service, however, the user can control properties of audio objects, such as loudness, direction, and distance, to construct his or her own audio scene. Because it is not easy for ordinary users to create an audio scene, we adopted a preset-based system that provides various audio scenes from which the user can choose according to his or her preference. The system consists of an authoring tool, a streaming server, and a terminal. In this paper we present the design and implementation of a personalized preset-based audio system and describe simulation results and applications.
Convention Paper 6904 (Purchase now)
P9-2 Ensemble Hand-Clapping Experiments under the Influence of Delay and Various Acoustic Environments—Snorre Farner, Norwegian University of Science and Technology - Trondheim, Norway, presently at IRCAM, Paris, France; Audun Solvang, Asbjørn Sæbø, U. Peter Svensson, Norwegian University of Science and Technology - Trondheim, Norway
Hand-clapping experiments were performed by pairs of subjects under the influence of delays up to 68 ms in various acoustic environments. The mean tempo decreased close to linearly as a function of the delay. During each sequence the tempo slowed down to a degree that increased with the delay, but for delays shorter than about 15 to 23 ms, the tempo increased during the sequence. For timing imprecision, and for the subjects' judgments of their own ensemble performance, no effect of the delay could be observed up to 20 ms; above 32 ms the effects increased with the delay. Virtual anechoic conditions led to a higher imprecision than the reverberant conditions, and real-reverberation conditions led to a slightly lower tempo.
Convention Paper 6905 (Purchase now)
P9-3 Audio System for Portable Market—Archibald Fitzgerald, Texas Instruments, Inc. - Bangalore, India
This paper describes the audio system software for portable audio players with respect to software and system-on-a-chip (SoC) architecture. The software system for portable devices includes audio playback, radio, audio recording, movie playback, and image viewer applications; portable systems may also contain gaming and navigation applications. Portable audio players demand low power consumption and small form factors, differentiated by a wide array of audio effects such as equalization, time-scale modification (TSM), and cross-fade. System-on-a-chip architecture and capabilities are critical for audio quality, audio features, power efficiency, battery life, form factor, time to market, and cost.
Convention Paper 6906 (Purchase now)
P9-4 Design and Evaluation of a High Performance Class D Headphone Driver—Anthony Magrath, Wolfson Microelectronics - Edinburgh, Scotland, UK
This paper presents the design and bench evaluation of a class D headphone amplifier and provides compelling arguments as to why it is advantageous to use class D, even for output powers as low as 40 mW. Design tradeoffs are discussed that show how significant savings in power can be achieved at typical listening levels when compared with a conventional class AB amplifier.
Convention Paper 6907 (Purchase now)
P10 - Loudspeakers - Part 2
Friday, October 6, 2:00 pm — 4:30 pm
Chair: Richard Little, Tymphany - Cupertino, CA, USA
P10-1 Loudspeaker-Room Adaptation for a Specific Listening Position Using Information about the Complete Sound Field—Jan Abildgaard Pedersen, Lyngdorf Audio - Skive, Denmark
A novel method is presented for equalizing a loudspeaker for a specific listening position in order to compensate for the influence of the room in which it is placed. The method is based on measuring the sound pressure at the listening position (focus position) and in at least three randomly selected positions scattered across the listening room (room positions). The measurement at the listening position holds information about the listener's access to the sound field, while the room positions hold information about the energy in the 3-D sound field. The correction for the listening position is then bounded by upper and lower gain limits, calculated as a function of frequency from the information about the 3-D sound field.
Convention Paper 6908 (Purchase now)
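The idea of a gain-bounded correction can be sketched numerically. The paper derives frequency-dependent limits from the measured 3-D sound-field energy; the fixed symmetric limit and the hypothetical magnitude values below are illustrative assumptions only, not the authors' method:

```python
import math

def db(x):
    # linear magnitude to decibels
    return 20 * math.log10(x)

def bounded_correction(focus_mag, limit_db=6.0):
    """Per-frequency-bin correction gain in dB for the focus position,
    clamped to +/- limit_db. (In the paper the limits vary with frequency
    and come from the room-position measurements; a fixed +/-6 dB bound
    is an illustrative simplification.)"""
    gains = []
    for m in focus_mag:
        g = -db(m)                       # invert toward a flat 0 dB target
        g = max(-limit_db, min(limit_db, g))  # bound the boost/cut
        gains.append(g)
    return gains

# Hypothetical 4-bin focus-position response (linear magnitude): a +12 dB
# room mode at bin 1 would naively demand a -12 dB cut; the bound caps it.
focus = [1.0, 10 ** (12 / 20), 10 ** (-3 / 20), 1.0]
gains = bounded_correction(focus)
```

Bounding the correction keeps the equalizer from pouring energy into deep position-specific notches, which is the usual failure mode of unconstrained single-point room equalization.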
P10-2 Allpass Arrays: Theory, Design, and Applications—Michael Goodwin, Creative Advanced Technology Center - Scotts Valley, CA, USA
The realization of nondirectional linear electroacoustic arrays using Bessel weighting and other methods has been described in the literature. In this paper we discuss generalized allpass arrays; since the far-field response of a uniformly spaced linear array is specified by a mapping of the DTFT of the array weights, any FIR approximation of an allpass filter gives weights that result in a nearly uniform array response. We explain the fundamental array theory and present a straightforward method for the design of arbitrary-order allpass arrays. We further discuss applications of allpass arrays in crossover-filtered configurations and in the implementation of efficient frequency-invariant beamformers.
Convention Paper 6909 (Purchase now)
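The DTFT mapping the abstract describes can be checked with a small numerical sketch. The first-order allpass filter and tap count below are arbitrary assumptions for illustration, not values from the paper: truncating the impulse response of an allpass filter gives FIR array weights whose far-field magnitude response (the DTFT evaluated at omega = k*d*sin(theta)) is nearly uniform over all angles.

```python
import cmath
import math

def allpass_fir_weights(a, n_taps):
    """Truncated impulse response of the first-order allpass
    H(z) = (a + z^-1) / (1 + a z^-1); truncation is the FIR approximation."""
    h = [a]
    for n in range(1, n_taps):
        h.append((1 - a * a) * (-a) ** (n - 1))
    return h

def array_response(weights, omega):
    # Far-field response of a uniformly spaced line array: the DTFT of the
    # weights, with omega = k * d * sin(theta) the inter-element phase.
    return sum(w * cmath.exp(-1j * omega * n) for n, w in enumerate(weights))

w = allpass_fir_weights(0.5, 16)
mags = [abs(array_response(w, math.pi * t / 10)) for t in range(-10, 11)]
# Allpass-derived weights yield a nearly nondirectional magnitude response.
print(max(abs(m - 1.0) for m in mags))
```

With |a| = 0.5 and 16 taps the truncation tail is tiny, so the deviation from unit magnitude stays far below audibility; higher-order allpass prototypes trade longer arrays for different weight distributions.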
P10-3 Assessment of Nonlinearity in Transducers and Sound Systems —From THD to Perceptual Models—Alex Voishvillo, JBL Professional - Northridge, CA, USA
Research on the audibility of loudspeaker nonlinear distortion has not shown good correlation between traditionally used metrics (harmonics and intermodulation) and subjective performance. The problem of assessing nonlinearity in transducers in a way that relates to sound fidelity has not been solved. The wide application of low-bit-rate compression systems (MP3, etc.) prompted the development of objective measurement methods based on perceptual models. These methods, however, have not been used for the measurement of loudspeakers, and they may not be optimal for that purpose owing to the different nature of nonlinearity in transducers. Recently, perceptual models created specifically for the assessment of nonlinearity in transducers have emerged. This paper analyzes the old and new methods, compares them, and discusses prospects for future developments.
Convention Paper 6910 (Purchase now)
P10-4 An Important Aspect of Underhung Voice-Coils: A Technical Tribute to Ray Newman—Raymond J. Newman (Deceased), Electro-Voice Inc. - Buchanan, MI, USA; D. B. (Don) Keele, Jr., Harman International Industries, Inc. - Martinsville, IN, USA; David Carlson, Jim Long, Electro-Voice Div. Telex Communications - Burnsville, MN, USA; Kent Frye, Gentex Corporation - Zeeland, MI, USA; Matthew Ruhlen, John Sheerin, Harman/Becker Automotive Systems - Martinsville, IN, USA
In the 1970s, Ray Newman, while at Electro-Voice, single-handedly and very successfully promoted to the loudspeaker industry the then-new concept of Thiele/Small parameters and related design techniques for characterizing loudspeakers and systems. This paper posthumously recounts the contents of three significant Electro-Voice memos written in 1992 by Ray Newman concerning a comparison of overhung versus underhung loudspeaker motor assemblies. The information in the memos is still very relevant today. He proposed a comparison between the two assembly types assuming motors that had: (1) the same Xmax, (2) the same efficiency, (3) similar thermal behavior, and (4) the same voice coil. He calculated the required magnetic gap energy and discovered, to his surprise, that the magnet requirements actually went down dramatically when switching from an overhung to an underhung structure, and depended only on the ratio between Xmax and the voice-coil length. This is in contrast with the "common sense" that dictates that longer gaps mean larger magnets. He showed that for high-excursion motors, a switch could be made from a ferrite overhung structure to an equivalent high-energy neodymium underhung structure with little cost penalty. This paper recounts this early work and then presents motor predictions using a present-day magnetic FEM simulator. The results show that the magnetic energy required by an underhung motor is indeed less than that of an overhung motor, as long as the operating flux in the overhung motor's core is below the point where the core and fringe losses are comparable to its gap energy. Ray's original memos and notes are included as an appendix to the paper, along with reminiscences from the paper's co-authors.
Convention Paper 6911 (Purchase now)
P10-5 The Acoustic Center: A New Concept for Loudspeakers at Low Frequencies—John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada
This paper focuses on the acoustic center, which represents a particular point for a normal sealed-box loudspeaker that acts as the origin of its low-frequency radiation. At low frequencies, the radiation from such a loudspeaker becomes simpler as the wavelength of the sound becomes large relative to the enclosure dimensions, and the system behaves externally as a spherical point source. Although there are near-field effects very close to the loudspeaker, the acoustic center has a clear meaning even a short distance from the enclosure, up to frequencies of about 200 Hz for typical systems. The low-frequency response of loudspeakers in rooms is determined by the position of their acoustic centers. The study is underpinned by: (1) a mathematical multipole expansion of the output of a loudspeaker, (2) an acoustic boundary-element calculation of a number of loudspeaker systems, (3) some measurements that corroborate the concept of the acoustic center, and (4) a discussion of a number of relevant concepts.
Convention Paper 6912 (Purchase now)
P11 - Psychoacoustics, Perception, and Listening Tests - Part 2
Friday, October 6, 2:00 pm — 5:30 pm
Chair: Brent Edwards, Starkey Laboratories - Eden Prairie, MN, USA
P11-1 Contextual Effects on Sound Quality Judgments: Part II—Multistimulus vs. Single Stimulus Method—Kathryn Beresford, University of Surrey - Guildford, Surrey, UK; Natanya Ford, Harman Becker Automotive Systems - Bridgend, Wales, UK; Francis Rumsey, Slawomir Zielinski, University of Surrey - Guildford, Surrey, UK
In a previous pilot experiment (Part I; Convention Paper 6648 presented at the 120th AES Convention, Paris, France), a single stimulus method was employed to evaluate contextual effects on sound quality judgments. In this investigation (Part II) a multistimulus comparison method is used to evaluate the potential influence of listening context on sound quality judgments. Audio quality is assessed, as before, in two differing audio environments: a left-hand-drive vehicle and an ITU-R BS.1116-conformant listening room. Trained and untrained listeners compared and graded audio quality for four stimuli with degradations in the midfrequency range. No identified reference (anchor) was used in the listening test, providing the opportunity for the influence of the audio environment to be observed in the results. Contraction bias, which was caused by the single stimulus method, was not evident in the results of this second study. Additionally, listeners were able to discriminate between differently degraded stimuli, which was not possible in the initial research. Some small contextual effects were observed; however, biases resulting from the indirect context comparison make it difficult to draw firm conclusions.
Convention Paper 6913 (Purchase now)
P11-2 Audibility of Time Differences in Adjacent Head-Related Transfer Functions (HRTFs)—Pablo Hoffmann, Henrik Møller, Aalborg University - Aalborg, Denmark
Changes in the temporal and spectral characteristics of the sound reaching the two ears are known to be of great importance for the perception of spatial sound. The smallest change that can be reliably perceived provides a measure of how accurate directional hearing is. The present paper investigates audibility of changes in the temporal characteristics of HRTFs. A listening test is conducted to measure the smallest change in the inter-aural time difference (ITD) that produces an audible difference of any nature. Results show a large inter-individual variation with a range of audibility thresholds from about 20 µs to more than 300 µs.
Convention Paper 6914 (Purchase now)
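For a sense of the timing resolution such a test demands: at a 48 kHz sampling rate one sample spans about 20.8 µs, so ITD changes near the reported 20 µs threshold require fractional-sample delays. The sketch below is an illustrative assumption, not the authors' apparatus; it applies a sub-sample delay to one ear's signal using simple linear interpolation (a sinc-based interpolator would be more accurate):

```python
import math

FS = 48000  # assumed sampling rate in Hz

def fractional_delay(x, delay_us, fs=FS):
    """Delay signal x by delay_us microseconds via linear interpolation."""
    d = delay_us * 1e-6 * fs            # delay in (fractional) samples
    k, frac = int(d), d - int(d)
    y = []
    for n in range(len(x)):
        a = x[n - k] if 0 <= n - k < len(x) else 0.0
        b = x[n - k - 1] if 0 <= n - k - 1 < len(x) else 0.0
        y.append((1 - frac) * a + frac * b)  # interpolate between samples
    return y

# A 20 us ITD is less than one sample period at 48 kHz.
x = [math.sin(2 * math.pi * 500 * n / FS) for n in range(256)]
y = fractional_delay(x, 20.0)
```

In practice the same fractional delay would be applied to one HRTF of the pair while the other is left untouched, and the listener judges whether the modified and unmodified pairs sound different.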
P11-3 Perceptual Evaluation of Algorithms for Blind Up-Mix—Thomas Sporer, Fraunhofer Institute for Digital Media Technology (IDMT) - Ilmenau, Germany; Andreas Walther, Fraunhofer IIS, Technical University of Ilmenau, Ilmenau, Germany - Erlangen, Germany; Judith Liebetrau, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany; Sebastian Bube, Christian Fabris, Thomas Hohberger, Anja Köhler, Technical University Ilmenau - Ilmenau, Germany
The number of consumer home theater systems with surround capabilities has increased substantially. Nonetheless, most audio content is still two-channel stereo. Thus, to enjoy the advantages of their surround systems for all types of content, consumers resort to systems that automatically create multichannel sound from legacy sources ("blind up-mix"). While a number of such algorithms are in use today, there is no commonly accepted test methodology for evaluating their sonic performance. Standardized listening test procedures evaluate audio quality relative to an unimpaired reference as ground truth and thus are not applicable to up-mix scenarios. In this paper a new listening test procedure is described that is designed to consistently assess the quality of up-mix (or down-mix) algorithms. First test results are presented.
Convention Paper 6915 (Purchase now)
P11-4 Pitch Transposition of Flute Tones Based on Variation of Average Spectral Distribution—Sean O’Leary, Niall Griffith, University of Limerick - Limerick, Ireland
The problem of pitch transposition in relation to the consistency of flute timbre over the instrument's pitch range is investigated. The transposition method outlined here is based on how the average spectral distribution varies with pitch, while preserving the spectral behavior associated with the production mechanism. A set of measures is proposed to quantify the variation of the average spectral distribution with pitch, and a set of samples is analyzed over the pitch range of the instrument. These measures are then used in the transposition model to correct the average spectral distribution.
Convention Paper 6916 (Purchase now)
P11-5 Pitch Coherence as a Measure of Apparent Distance in Performance Spaces and Muddiness in Sound Recordings—David Griesinger, Harman Specialty Group - Bedford, MA, USA
This paper demonstrates a physiological method whereby sonic distance and muddiness can be quantified through the detection of pitch fundamentals from the phase coherence of harmonics in the vocal formant range. The method allows the perceived direct/reverberant ratio of a performance or a recording to be determined from a single channel of a recording of speech or music, allowing quality assessments during actual performances. Preferred values of the direct/reverberant ratios above 1000 Hz obtained by this method are +3 to +6 dB. This result has important consequences both for performance acoustics and recording.
Convention Paper 6917 (Purchase now)
P11-6 A Comparison of Various Multichannel Loudness Measurement Techniques—Alan Seefeldt, Steve Lyman, Dolby Laboratories - San Francisco, CA, USA
In this paper two recently proposed objective measures of perceived loudness for monophonic audio signals are extended in several ways to deal with multichannel audio. The extensions range in complexity from a simple sum of the individual channels to the use of measured HRTFs to simulate the audio signals arriving at the ears. A database of subjective loudness matching data of multichannel audio is generated, and the performance of the various objective measures, including the particular multichannel measure recently adopted by the ITU-R, is compared against this data.
Convention Paper 6918 (Purchase now)
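The simplest extension mentioned, a power sum over the individual channels, can be sketched as follows. This is a deliberately simplified illustration, not the ITU-R measure, which additionally applies a frequency-weighting filter and specific per-channel gains before summing:

```python
import math

def channel_power(x):
    # mean-square power of one channel's samples
    return sum(s * s for s in x) / len(x)

def simple_multichannel_loudness(channels, weights=None):
    """Power-sum loudness in dB: weighted sum of per-channel mean-square
    powers. Frequency weighting and standardized channel gains, which the
    ITU-R measure requires, are omitted in this sketch."""
    if weights is None:
        weights = [1.0] * len(channels)
    total = sum(g * channel_power(x) for g, x in zip(weights, channels))
    return 10 * math.log10(total)

mono = [math.sin(2 * math.pi * n / 64) for n in range(6400)]
L1 = simple_multichannel_loudness([mono])
L2 = simple_multichannel_loudness([mono, mono])  # same signal in 2 channels
print(round(L2 - L1, 2))  # prints 3.01: duplicating a channel doubles power
```

Even this crude measure illustrates the core design question of the paper: how the per-channel contributions should be combined so that the single number tracks what listeners actually match in loudness.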
P11-7 Predicting Listener Preferences for Surround Microphone Technique through Binaural Signal Analysis of Loudspeaker-Reproduced Piano Performances—Sungyoung Kim, William Martens, McGill University - Montreal, Quebec, Canada; Atsushi Marui, McGill University - Montreal, Quebec, Canada, Tokyo National University of Fine Arts and Music, Tokyo, Japan; Kent Walker, McGill University - Montreal, Quebec, Canada
Four solo piano pieces were presented through a five-channel loudspeaker reproduction system for a pairwise preference test in a previous study, and the results of that test were described in terms of the interaction between program material and surround microphone technique. In an attempt to predict the obtained preference choices on the basis of the binaural signals recorded during loudspeaker reproduction of differing versions of these musical programs, a number of electroacoustic measures on the test stimuli were examined via stepwise multiple regression. The most successful prediction resulted from a combination of Ear Signal Incoherence (ESI) and Side Bass Ratio (SBR), regardless of methodological differences between two independently tested groups of listeners.
Convention Paper 6919 (Purchase now)
P12 - Posters: Measurements & Modeling
Friday, October 6, 4:00 pm — 5:30 pm
P12-1 The Accuracy and Consistency of Spectrographic Analysis for Voice Identification—Jeff Smith, University of Colorado at Denver - Denver, CO, USA
This test investigated the accuracy and consistency of voice identification comparisons made by five trained examiners over a three-week period. These individuals were all students of the University of Colorado at Denver and had taken a semester-long course in audio forensics with limited training in voice identification. Each week, examiners conducted eight closed-trial comparisons of four clue phrases from both male and female speakers. Simulating a closed-set spectrographic line-up, each comparison consisted of spectrograms from a pool of four "known" speakers and one "unknown" speaker; audio recordings of the known and unknown speakers were made nine months apart. From the pool of known speakers, the examiner made a positive identification match to the unknown. After the three-week period, the data revealed that examiners reached the same conclusion in all three examinations for only 50 percent of the comparisons. The average accuracy of these examinations was 65 percent. This paper discusses the outcome of the experiment, including interpretation of these and other results.
Convention Paper 6920 (Purchase now)
P12-2 Loudspeaker Thermal and Safety Data Acquisition System—Marshall Buck, Psychotechnology, Inc. - Los Angeles, CA, USA
A four-channel data acquisition system has been designed for measurement of four parameters needed for safety and thermal testing and modeling in a voice coil driven loudspeaker: voice coil temperature, RMS voltage drive, current, and true V x I power dissipated. These data are stored in Excel format for postprocessing. Safety tests with alternating or direct current stimulation require the current versus time displays needed for pretesting to UL 1480 and ANSI/CEA-636 standards. Rated at 32 amperes and 125 volts, the instrument is suitable for voice coils rated up to 5000 Watts. A real-time graphics display on a standard PC is implemented with a USB interface.
Convention Paper 6921 (Purchase now)
P12-3 Surface Scattering Uniformity Measurements in Reflection-Free Environments—Lorenzo Rizzi, LAE - Laboratorio di Acustica ed Elettroacustica - Parma, Italy; Angelo Farina, Università di Parma - Parma, Italy; Paolo Galaverna, Genesis Acoustic Workshop - Parma, Italy; Paolo Martignon, Lorenzo Conti, Andrea Rosati, LAE - Laboratorio di Acustica ed Elettroacustica - Parma, Italy
Following previous investigations, carried out at the University of Parma in 1999 and 2000, LAE (Laboratory of Acoustics and Electroacoustics) started a new measurement campaign to compare with the original results on the same type of diffusor panels, to verify the AES-4id-2001 measurement standard, and to investigate the nature of scattering phenomena in more detail. Measurements were conducted on the floor of a large closed space to obtain a reflection-free time window long enough to study the first reflection from the panel; the use of sine sweep excitation signals instead of the recommended MLS ones makes it possible to improve the acquisition process. The present paper discusses the research background and the results from the first round of measurements.
Convention Paper 6922 (Purchase now)
P13 - Signal Processing - Part 1
Saturday, October 7, 8:30 am — 12:30 pm
Chair: Ronald Aarts, Philips Research Laboratories - Eindhoven, The Netherlands
P13-1 Picturing Dither: Dithering Pictures—Stanley Lipshitz, Cameron Christou, University of Waterloo - Waterloo, Ontario, Canada
The desirable properties that follow from the use of (nonsubtractive) triangular probability density function (TPDF) random dither in digital audio quantization and noise shaping are now well known in the audio community. The principal purpose of this paper is to use a visual analogy to aid audio engineers in their understanding of how proper TPDF dithering and noise shaping can convert otherwise objectionable, correlated quantization errors into benign, uncorrelated, and less visible ones. As they say, “a picture is worth a thousand words.” Our secondary purpose is to demonstrate, in the process, that the very same concepts, applied now in the spatial instead of the temporal domain, are just as useful and beneficial in the field of digital picture processing too. We present color and monochrome images of the results of coarse quantization, both with and without dither and/or noise shaping, to help us make our points. [In the “live” presentation of this paper, we shall play an audio example at the same time as we show each picture, so that one can simultaneously both see and hear each effect being discussed.]
Convention Paper 6923 (Purchase now)
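The effect the authors illustrate can be sketched numerically (a toy example with invented parameters and signal names, not the paper's code): nonsubtractive TPDF dither, formed by summing two independent uniform random variables of one LSB each, turns the quantization error into benign noise whose statistics no longer depend on the signal.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 440 * t)     # test tone in [-1, 1]

step = 2.0 / 2 ** 8                        # coarse 8-bit quantizer step


def quantize(sig, step):
    return step * np.round(sig / step)


# TPDF dither: sum of two independent uniform RVs, each spanning one LSB
dither = (rng.uniform(-0.5, 0.5, x.size) + rng.uniform(-0.5, 0.5, x.size)) * step

err_plain = quantize(x, step) - x          # undithered quantization error
err_tpdf = quantize(x + dither, step) - x  # total error with TPDF dither

# With proper TPDF dither the total error behaves like signal-independent
# noise of variance step^2/4, uncorrelated with the input signal.
corr_plain = abs(np.corrcoef(x, err_plain)[0, 1])
corr_tpdf = abs(np.corrcoef(x, err_tpdf)[0, 1])
```

The same mechanism applies unchanged when the sample index is a pixel position instead of a time instant, which is the visual analogy the paper draws.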
P13-2 Comparison of Frequency-Warped Representations for Source Separation of Stereo Mixtures—Juan José Burred, Thomas Sikora, Technical University Berlin - Berlin, Germany
We evaluate the use of different frequency-warped, nonuniform time-frequency representations for the purpose of blind sound source separation from stereo mixtures. Such transformations enhance resolution in spectral areas relevant for the discrimination of the different sources, improving sparsity and mixture disjointness. In this paper we study the effect of using such representations on the localization and detection of the sources, as well as on the quality of the separated signals. Specifically, we evaluate a constant-Q and several auditory warpings in combination with a shortest path separation algorithm and show that they improve detection and separation quality in comparison to using the Short Time Fourier Transform.
Convention Paper 6924 (Purchase now)
P13-3 Auditory Component Analysis—Jon Boley, University of Miami - Coral Gables, FL, USA
Two of the major research areas currently being evaluated for the so-called sound source separation problem are auditory scene analysis and a class of statistical analysis techniques known as independent component analysis. This paper presents a methodology for combining these two techniques. It suggests a framework that first separates sounds by analyzing the incoming audio for patterns and synthesizing or filtering them accordingly. It then measures features of the resulting tracks and separates the sounds statistically by matching feature sets and attempting to make the output streams statistically independent. The proposed system is found to successfully separate artificial and acoustic mixes of sounds. As expected, the amount of separation is inversely proportional to the amount of reverberation present, the number of sources, and the interchannel correlation.
Convention Paper 6925 (Purchase now)
P13-4 Frequency Domain Artificial Reverberation Using Spectral Magnitude Decay—Earl Vickers, The Sound Guy, Inc. - Seaside, CA, USA; Jian-Lung (Larry) Wu, Stanford Center for Computer Research in Music and Acoustics - Stanford, CA, USA; Praveen Gobichettipalayam Krishnan, Ravirala Narayana Karthik Sadanandam, University of Missouri - Rolla, MO, USA
A novel method of producing artificial reverberation in the frequency domain, using spectral magnitude decay, is presented. The method involves accumulating the magnitudes of the short-time Fourier transform, based on the desired decay time as a function of frequency. Compared to time domain methods such as feedback delay networks, the current method requires less memory and provides independent control of the reverberation energy and decay time in each frequency bin. Compared to convolution reverbs, the current approach offers flexible parametric control over the decay spectra and a computational cost that is independent of decay time.
Convention Paper 6926 (Purchase now)
P13-5 Design of an Automatic Beat-Matching Algorithm for Portable Media Devices—Danny Jochelson, Texas Instruments, Inc. - Dallas, TX, USA; Stephen Fedigan, General Dynamics Vertex RSI - Richardson, TX, USA
Methods to achieve accurate beat detection for musical signals have received much attention recently; however, very little literature has addressed techniques for achieving beat matching between two streams on portable devices with limited memory and processing power. This paper describes the architecture, design methods, obstacles, optimizations, and results for a new beat matching algorithm created for real-time use on embedded devices. This algorithm produces promising performance for use on portable media devices that often play modern musical genres.
Convention Paper 6927 (Purchase now)
P13-6 Artificial Reverberation: Comparing Algorithms by Using Monaural Analysis Tools—Denis Extra, Uwe Simmer, Sven Fischer, Joerg Bitzer, University of Applied Science Oldenburg/Ostfriesland/Wilhelmshaven - Oldenburg, Germany
In this paper a comparison of different algorithms for artificial reverberation is presented. The tested algorithms are commercially available devices and digital plug-ins in a broad price range, plus algorithms known from the literature. For the analysis we developed a toolbox containing several monaural analysis methods, including the energy decay curve in fractional-octave bands, autocorrelation, and other known measures of reverberation quality. Furthermore, the behavior over time is analyzed, showing that many systems cannot be considered time-invariant. Some statistical analysis of the impulse response is also given. The purpose is to investigate whether synthetic reverberation reproduces the attributes of real rooms and whether there are differences between algorithms.
Convention Paper 6928 (Purchase now)
P13-7 Inverse Filtering Design Using a Minimal-Phase Target Function from Regularization—Scott Norcross, Communications Research Centre - Ottawa, Ontario, Canada; Martin Bouchard, University of Ottawa - Ottawa, Ontario, Canada; Gilbert Soulodre, Communications Research Centre - Ottawa, Ontario, Canada
Inverse filtering methods commonly use amplitude regularization as a technique to limit the amount of work done by the inverse filter. The amount of regularization needed must be carefully selected so that the audio quality is not degraded. This paper introduces a method of using the magnitude of the regularization to design a target/desired response in which the phase response can be arbitrarily chosen. By choosing a minimum-phase response, one can reduce any pre-response in the corrected signal that is introduced by the regularization. A phase response that consists of a frequency-dependent mixture of minimum- and zero-phase components is also introduced. Informal listening tests were performed to verify the effectiveness of the new method.
Convention Paper 6929 (Purchase now)
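A minimum-phase spectrum with a prescribed magnitude, such as the regularization-derived target discussed above, can be built with the standard real-cepstrum construction (a textbook method, not necessarily the paper's exact procedure):

```python
import numpy as np


def minimum_phase(mag):
    """Build a minimum-phase spectrum with the given magnitude.

    `mag` is the magnitude on the full FFT grid (conjugate-symmetric,
    even length assumed). Standard construction: fold the real cepstrum
    of log|H| onto causal quefrencies, then exponentiate.
    """
    n = len(mag)
    log_mag = np.log(np.maximum(mag, 1e-12))   # floor to avoid log(0)
    cep = np.fft.ifft(log_mag).real            # real cepstrum of log magnitude
    fold = np.zeros(n)
    fold[0] = cep[0]
    fold[1:n // 2] = 2 * cep[1:n // 2]         # double causal quefrencies
    fold[n // 2] = cep[n // 2]                 # Nyquist term (even n)
    return np.exp(np.fft.fft(fold))            # magnitude kept, min phase added
```

The returned spectrum has exactly the requested magnitude but the minimum-phase response, which is what suppresses pre-response in the corrected signal.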
P13-8 The Origins of DSP and Compression: Some Pale Gleams from the Past—Jon Paul, Scientific Conversion, Inc. - Novato, CA, USA
This paper explores the history that led to modern day DSP and audio compression. The roots of modern digital audio sprang from Dudley’s 1936 VOCODER and the WWII-era SIGSALY speech scrambler. We highlight these key inventions, detail their hardware and block diagrams, describe how they functioned, and illustrate their relationship to modern day DSP and compression algorithms.
Convention Paper 6930 (Purchase now)
P14 - Analysis and Synthesis of Sound
Saturday, October 7, 8:30 am — 11:30 am
Chair: Duane Wise, Consultant - Boulder, CO, USA
P14-1 Determining the Need for Dither when Re-Quantizing a 1-D Signal—Carlos Fabian Benitez-Quiroz, Shawn D. Hunt, University of Puerto Rico - Mayaguez, Puerto Rico
This paper presents novel methods for determining if dither is needed when reducing the bit depth of a one-dimensional digital signal. These are statistical-based methods in both the time and frequency domains, and are based on determining whether the quantization noise with no dither added is white. If this is the case, then no undesired harmonics are added in the quantization or re-quantization process. Experiments showing the effectiveness of the methods with both synthetic and real audio signals are presented.
Convention Paper 6931 (Purchase now)
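One plausible time-domain reading of this idea (the authors' actual statistics differ in detail; the threshold and names here are invented for illustration) is to test the requantization error's sample autocorrelation against the bounds expected for white noise:

```python
import numpy as np


def error_is_white(x, new_bits=8, lags=50, z_thresh=4.0):
    """Check whether requantization error looks white via its autocorrelation.

    For white noise of length N, the normalized autocorrelation at a
    nonzero lag is roughly N(0, 1/N); values far outside that band flag
    structure (unwanted harmonics), i.e., a case where dither is needed.
    """
    step = 2.0 / 2 ** new_bits
    err = step * np.round(x / step) - x      # requantization error
    err = err - err.mean()
    n = len(err)
    denom = np.dot(err, err)
    for lag in range(1, lags + 1):
        r = np.dot(err[:-lag], err[lag:]) / denom
        if abs(r) > z_thresh / np.sqrt(n):   # outside the white-noise band
            return False
    return True
```

A slowly varying signal produces a strongly correlated error (dither needed), while a noise-like signal produces an error indistinguishable from white noise.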
P14-2 Shape-Changing Symmetric Objects for Sound Synthesis—Cynthia Bruyns, David Bindel, University of California at Berkeley - Berkeley, CA, USA
In the last decade, many researchers have used modal synthesis for sound generation. Using a modal decomposition, one can convert a large system of coupled differential equations into simple, independent differential equations in one variable. To synthesize sound from the system, one solves these decoupled equations numerically, which is much more efficient than solving the original coupled system. For large systems, such as those obtained from finite-element analysis of a musical instrument, the initial modal decomposition is time-consuming. To design instruments from physical simulation, one would like to be able to compute modes in real-time, so that the geometry, and therefore spectrum, of an instrument can be changed interactively. In this paper we describe how to quickly compute modes of instruments that have rotational symmetry in order to synthesize sounds of new instruments quickly enough for interactive instrument design.
Convention Paper 6932 (Purchase now)
P14-3 Unisong: A Choir Singing Synthesizer—Jordi Bonada, Merlijn Blaauw, Alex Loscos, Universitat Pompeu Fabra - Barcelona, Spain; Kenmochi Hideki, YAMAHA Corporation - Hamamatsu, Japan
Computer-generated choir singing can be achieved by two means: clone transformation of a single voice or concatenation of snippets of real choir recordings. As of today, the synthesis quality of these two methods lacks naturalness and intelligibility, respectively. Unisong is a new concatenation-based choir singing synthesizer able to generate a high-quality synthetic performance from the score and lyrics specified by the user. This paper describes all steps and techniques that take place in the synthesis process: design and realization of the choir recording scripts, human-supervised automatic segmentation of the recordings, creation of the sample database, and sample acquisition, transformation, and concatenation. The synthesizer will be demonstrated with a song sample.
Convention Paper 6933 (Purchase now)
P14-4 Accurate Low-Frequency Magnitude and Phase Estimation in the Presence of DC and Near-DC Aliasing—Kevin Short, University of New Hampshire - Durham, NH, Groove Mobile, Bedford, MA, USA; Ricardo Garcia, Groove Mobile - Bedford, MA, USA
Efficient high resolution parameter estimation of sinusoidal elements has been shown to be of fundamental importance in applications such as measurement, parametric decomposition of signals, and low bit-rate audio coding. Certain methods such as the Complex Spectral Phase Evolution (CSPE) can be used to estimate the true frequency, magnitude, and phase of underlying tones in a signal with accuracy that is significantly more precise than the signal resolution of a transform-based analysis. These methods usually require the signal elements to be spectrally separated so that the mutual interference is minimal (often referred to as the “analysis window main lobe width”). This paper extends the methods introduced in CSPE to low-frequency real tone signals, where the interference or “leakage” from the negative frequencies is unavoidable, regardless of what analysis window is used. The new technique gives improved magnitude and phase estimates for the sinusoidal parameters.
Convention Paper 6934 (Purchase now)
P14-5 Frequency Domain Phase Model of Transient Events—Kevin Short, University of New Hampshire - Durham, NH, USA, Groove Mobile, Bedford, MA, USA
Short-time transient events are extremely challenging to represent in the transform domain employed by common transform-based codecs used in applications such as audio compression. These short-time events last for a duration much shorter than a typical data window and, consequently, have power distributed throughout the transform domain. Accurate representation of these events in the transform domain requires higher bit rates than are usually available. A common solution is window switching, where smaller windows are used for short-time transient events, but this has a negative impact on the bit rate as well. In this paper we show that, with certain simplifying assumptions, transient reconstruction can be reduced to a tractable problem solved in the frequency domain, so that the transient event can easily be mixed in with the representation of the nontransient events. A closed-form frequency-domain representation for the phase of a transient event is introduced, and it is shown that this can be done in an iterative way that allows for increasingly complex transient structures back in the time domain.
Convention Paper 6935 (Purchase now)
P14-6 Doing Good by the “Bad Boy”: Performing George Antheil’s Ballet mécanique with Robots—Paul Lehrman, Tufts University - Medford, MA, USA; Eric Singer, League of Electronic Music Urban Robots - Brooklyn, NY, USA
The Ballet mécanique, by George Antheil, was a musical composition far ahead of its time. Written in 1924, it required technology that didn't exist: multiple synchronized player pianos. Not until 1999, with the aid of computers and MIDI, could the piece be performed the way the composer envisioned it. Since then, it has been played over 20 times in North America and Europe. But its most unusual performance was the result of a collaboration between the authors: one, the music technologist who revived the piece and the other, a musical robotics expert. At the request of the National Gallery of Art in Washington, DC, they built a completely automated 27-piece orchestra, which played the piece nearly 100 times, without a serious failure.
Convention Paper 6936 (Purchase now)
P15 - Posters: Loudspeakers
Saturday, October 7, 9:00 am — 10:30 am
P15-1 Linear Array Transducer Technology—Andrew Unruh, Christopher Struck, Tymphany Corporation - Cupertino, CA, USA
The Linear Array Transducer (LAT) is a loudspeaker technology using multiple opposed interleaved diaphragms to create a bass transducer with a cylindrical form factor and almost no mechanical vibration. Design goals, construction, and operating principles are described and frequency response and impedance measurements are shown. Recent structural improvements in the LAT are also discussed. System design considerations are discussed and examples of its use in a product are shown.
Convention Paper 6937 (Purchase now)
P15-2 A Novel Flexible Loudspeaker Driven by an Electret Diaphragm—Dar-Ming Chiang, Jen-Luan Chen, Industrial Technology Research Institute (ITRI) - Chutung, Hsinchu, Taiwan
As flexible electronics are increasingly applied to consumer products, a new flexible electrostatic loudspeaker driven by an electret diaphragm has been developed. The electret diaphragms of the flexible loudspeakers are fabricated from a fluoropolymer with nano-, meso-, and micro-scale pores and charged by the corona method at room temperature. The interior surface areas of the pores of the electret films effectively increase the retention and stability of charges. The experimental results reveal that the charge retention and stability of the electret diaphragm are sufficient to drive the flexible electrostatic loudspeaker. The sound pressure level of the flexible electrostatic loudspeaker (60 mm × 80 mm × 1 mm) is measured as 80 dB/0.2 W at 1 kHz at a distance of 30 cm.
Convention Paper 6938 (Purchase now)
P15-3 Stress Analysis on Moving Assemblies and Suspensions of Loudspeakers—Fernando Bolaños, Acústica Beyma S. A. - Valencia, Spain
This paper explains the basic results of numerical and experimental analysis of moving assemblies and suspensions of loudspeakers, taking into account the bending forces and the in-plane forces that act on these slender bodies. The distribution of these stresses is shown in cones of direct radiators and in domes (for example, in compression drivers) as well. An explanation of the generation of subharmonics is obtained by this technique. The sudden jump of the working point of moving assemblies is explained by means of the compression forces that act on the suspensions. These compression forces are the cause of the buckling or snapping that very often occurs in loudspeakers. This paper analyzes different types of suspension, showing the compromises the designer has to deal with.
Convention Paper 6939 (Purchase now)
P15-4 Nonlinear Stiffness of the Loudspeaker Measured in the Evacuated Space—Ivan Djurek, Faculty of Elec. Eng. and Computing - Zagreb, Croatia; Danijel Djurek, AVAC – Alessandro Volta Applied Ceramics - Zagreb, Croatia; Antonio Petosic, Faculty of Elec. Eng. and Computing - Zagreb, Croatia; Nazif Demoli, Institute of Physics - Zagreb, Croatia
The impedance of the mechanical vibration system of the loudspeaker was measured in vacuo in order to remove the contributions of the radiation impedance of air. The Hooke constant k was evaluated by the use of calibrated weights and from the membrane displacements due to the force exerted by DC current in the voice coil. The resonant frequency was found to decrease with increasing nonlinear Hooke constant, which is attributed to the effective mass of the vibration system, dependent upon elongation. The effective mass was evaluated from the fitting of measured and calculated loudspeaker impedance curve.
Convention Paper 6940 (Purchase now)
P15-5 Linearization of Nonlinear Loudspeakers—Bo Rohde Pedersen, Aalborg University - Esbjerg, Denmark; Per Rubak, Aalborg University - Aalborg, Denmark
Feed-forward methods for compensation of nonlinearities in loudspeakers are studied and tested in simulation. An adaptive feed-forward controller is investigated to handle the drift caused by temperature, aging, and production spread. To estimate the required parameter accuracy (the match between controller and plant parameters), we tested a simple feed-forward controller with different degrees of parameter mistuning between the plant (loudspeaker) and the controller. The required system identification (tracking of the changes in the linear loudspeaker parameters) is investigated using a simple 2nd-order IIR model for the linear loudspeaker. Different techniques to handle the stability problem for adaptive IIR filters are investigated.
Convention Paper 6941 (Purchase now)
P15-6 Response Adaptation of Loudspeaker System—Mingu Lee, Hyun-Ju Jung, Sinlyul Lee, Koeng-Mo Sung, Seoul National University - Seoul, Korea
In this paper, variations in the frequency response of vented-box loudspeakers due to the adjustment of several less constrained physical parameters are predicted. In addition, with this information, the optimum values of the parameters are estimated such that the corresponding frequency response optimally fits an arbitrary objective response. A MATLAB GUI program that performs the procedure automatically, with the vented-box loudspeaker parameters as input, is also presented. Ways of extending the method beyond its current limitations (the type of loudspeaker, the optimality criterion, etc.) are discussed.
Convention Paper 6942 (Purchase now)
P15-7 Digitally-Driven Piezoelectric Loudspeaker Using Multibit Sigma-Delta Modulation—Hajime Ueno, Tsuyoshi Soga, Katsuya Ogata, Akira Yasuda, Hosei University - Tokyo, Japan
Although a substantial quantity of music data is stored as digital information, as in the case of CDs and MDs, an analog drive is still the main component of a loudspeaker. If the speaker can be driven digitally, it becomes possible to perform all processes from the input to the output digitally. As a result, the analog power amplifier and some other components become unnecessary and a small, light, and high-quality loudspeaker can be achieved. In this paper we propose a digitally-driven piezoelectric speaker employing multibit delta-sigma modulation.
Convention Paper 6943 (Purchase now)
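A toy first-order multibit sigma-delta modulator (a textbook structure; the paper's modulator order, bit width, and output stage are not specified here) illustrates how a digital drive signal can encode audio in a small number of coarse levels:

```python
import numpy as np


def sigma_delta_multibit(x, levels=9):
    """First-order multibit sigma-delta modulator (textbook form).

    The integrator accumulates the difference between the input and the
    quantized output; the coarse quantizer has `levels` steps on [-1, 1].
    """
    step = 2.0 / (levels - 1)
    integ = 0.0
    y_prev = 0.0
    out = np.empty(len(x))
    for i, s in enumerate(x):
        integ += s - y_prev                # accumulate tracking error
        y_prev = float(np.clip(step * round(integ / step), -1.0, 1.0))
        out[i] = y_prev
    return out
```

Because the integrator keeps the running error bounded, the local average of the coarse output tracks the input, which is what lets a digitally switched drive reproduce the audio waveform after the transducer's mechanical low-pass response.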
P15-8 Design Considerations for Shallow Subwoofers—Claus Futtrup, Tymphany Corporation - Cupertino, CA, USA
Conventional subwoofers are usually quite deep to accommodate long throw. A shallow subwoofer (SSW) design is presented that aims to maintain the quality of low distortion and long throw bass reproduction. Two design concepts, applicable to larger drivers, are described and results are shown. The result is more bass for a given loudspeaker-depth without compromising the sound quality at low frequencies. One such concept is the sandwich cone, another the strut supported cone. The chosen design of a low profile diaphragm with strut support is described in detail. Another issue is the motor and spider design. Considerations for joint motor and spider design are analyzed for a series of configurations. Advantages and disadvantages of each are described. The chosen design integrates the spider into the motor system to preserve space but still allows for a large diameter spider to be applied.
Convention Paper 6944 (Purchase now)
P15-9 A Plane Wave Transducer: Technology and Applications—Antti Kelloniemi, Kari Mettälä, Panphonics Oy - Espoo, Finland
A plane wave transducer technology is presented with application examples and measurement results. The technology enables high-volume manufacturing of audio elements. These new, highly directive audio transducers are beneficial in several uses. A plane wave source exhibits remarkably less geometric attenuation with increasing distance than conventional cone loudspeakers. As the sound is transmitted only in the wanted direction, reflections that deteriorate the sound quality are minimized and disturbance to the surrounding space is diminished. A directive microphone can be produced using the same technology, which in turn enables the construction of a locally controlled active noise cancellation panel.
Convention Paper 6945 (Purchase now)
P16 - Posters: Amplifiers & High Resolution
Saturday, October 7, 1:30 pm — 3:00 pm
P16-1 Digital Correction of Switching Amplifier by Error Remodulation Method—Haekwang Park, Vladislav Shimanskiy, Youngsuk Song, Heesoo Lee, Seongcheol Jang, Samsung Electronics Co. Ltd. - Seoul, Korea
In this paper the error remodulation method for digital correction of a pulse width modulation switching amplifier is proposed. This method extracts an error signal from the difference between the reference pulse width modulation signal and power stage output and generates the error pulse width modulation signal by using a remodulation method. The error pulse width modulation signal is then used to compensate for the power supply noise and nonlinearity of the power stage. The proposed method is suitable for the correction of the PWM controller in a full digital amplifier.
Convention Paper 6946 (Purchase now)
P16-2 Iterative Method for Natural Sampling—Vladislav Shimanskiy, Seong-cheol Jang, Samsung Electronics Co. Ltd. - Seoul, Korea
The performance of pure digital audio amplifiers using pulse-width modulation depends strongly on the accuracy of converting the pulse-code modulated audio signal into a pulse-width modulated sequence. This process implies the recovery of the original analog signal values at irregular time instances based only on uniformly sampled pulse-code modulated data. The recovery, or “natural sampling,” requires interpolation processing, giving a trade-off between accuracy of the result and computation speed. In this paper we propose a method for natural sampling with tunable speed-performance constraints that offers the advantage of easy implementation in VLSI. Cubic polynomial interpolation and an iterative solving algorithm, as well as experimental results, are presented in the paper.
Convention Paper 6947 (Purchase now)
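The natural-sampling problem reduces to finding where the interpolated audio signal crosses the PWM reference ramp within one period. A minimal sketch (all names are hypothetical; the paper's polynomial form and iteration details may differ, and Catmull-Rom is chosen here simply as one convenient cubic):

```python
import numpy as np


def natural_sample(s, n, iters=4):
    """Find the natural-sampling crossing time within one PWM period.

    Uses samples s[n-1..n+2] to build a Catmull-Rom cubic c(t) on [0, 1]
    between s[n] and s[n+1]. The crossing with a rising ramp
    r(t) = 2t - 1 solves r(t) = c(t), found here by fixed-point
    iteration t <- (c(t) + 1) / 2, which contracts for band-limited
    audio whose slew rate is below the ramp's.
    """
    p0, p1, p2, p3 = s[n - 1], s[n], s[n + 1], s[n + 2]

    def cubic(t):
        # Catmull-Rom interpolation between p1 (t=0) and p2 (t=1)
        return 0.5 * ((2 * p1) + (-p0 + p2) * t
                      + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                      + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3)

    t = 0.5                        # start at mid-period
    for _ in range(iters):         # iterate toward the ramp crossing
        t = (cubic(t) + 1.0) / 2.0
    return t
```

The iteration count is the tunable speed-accuracy knob the abstract alludes to: more iterations refine the crossing estimate at higher computational cost.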
P16-3 A High Performance S/PDIF Transceiver—Paul Lesso, Wolfson Microelectronics - Edinburgh, Scotland, UK
This paper details the design and implementation of a novel S/PDIF transceiver with a very low jitter bandwidth. We describe and demonstrate a system based on multiple loops that synchronizes to the incoming data stream with a very low bandwidth and provides the original data unmodified on a clean, low-jitter output clock without the need for a sample rate converter. Thus we eliminate any jitter above a low frequency (typically 10 Hz) on the input data and also avoid any distortion caused by sample rate converters.
Convention Paper 6948 (Purchase now)
P17 - Signal Processing - Part 2
Saturday, October 7, 2:30 pm — 6:00 pm
Chair: Jürgen Herre, Fraunhofer IIS - Erlangen, Germany
P17-1 Loudspeaker-Based 3-D Audio System Design Using the M-S Shuffler Matrix—Martin Walsh, Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA
This paper outlines a new design methodology that can help to achieve higher quality 3-D audio reproduction over loudspeakers for a variety of applications using only adapted M-S matrices. Several key M-S matrix-based topologies are summarized, and a new design methodology is presented that allows the design and efficient implementation of any new 3-D audio system using only M-S matrix-based topologies. A real-world design example is used to highlight how this new design methodology can not only help the 3-D audio system design process, but also improve the audio quality of the resulting reproduction.
Convention Paper 6949 (Purchase now)
P17-2 Binaural Simulation of Complex Acoustic Scenes for Interactive Audio—Jean-Marc Jot, Martin Walsh, Creative Advanced Technology Center - Scotts Valley, CA, USA; Adam Philp, Creative Labs – Sensaura - Egham, Surrey, UK
We describe a computationally efficient 3-D positional audio and spatial reverberation processing architecture for real-time virtual acoustics using headphones or loudspeakers. An advantageous method for binaural synthesis of massive numbers of sound sources is introduced. Extensions of the architecture are described for simulating nearfield emitters, modeling spatially extended sound events, rendering multiroom reverberation, and incorporating the perceptually salient features of early reflections and acoustic obstructions in the listener's immediate virtual environment. The proposed approach enables the implementation of scalable interactive 3-D audio rendering systems in personal computers, game consoles, set top boxes or mobile phones. The associated scene representation model is compatible with current interactive audio standards including OpenAL, MPEG-4, and JSR-234.
Convention Paper 6950 (Purchase now)
P17-3 A Technique for Nonlinear System Measurement—Jonathan Abel, David Berners, Universal Audio, Inc. - Santa Cruz, CA, USA, Stanford University - Stanford, CA, USA
A method for measuring nonlinear systems having a certain type of Volterra series is presented. The Volterra series studied is the parallel combination of elements having series input and output filters around a power-law distortion and may be used to represent a wide variety of systems combining filtering and memoryless distortion functions. The technique is to measure the system using a swept sinusoid at a variety of amplitudes and to use least squares to first separate the element responses and then identify the unknown input and output filters.
Convention Paper 6951 (Purchase now)
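The amplitude-separation step can be sketched as a least-squares solve (under the simplifying assumption that the measured output at input amplitude a is a sum of a^k-scaled per-order element responses; names and shapes are invented for illustration):

```python
import numpy as np


def separate_orders(responses, amps, max_order=3):
    """Separate per-order element responses from multi-amplitude sweeps.

    If the measured response at input amplitude a is
        y_a(t) = sum_k a^k * e_k(t),   k = 1..max_order,
    then stacking measurements at several amplitudes a_i yields a
    Vandermonde system A @ E = Y, solved per time sample by least squares.
    """
    A = np.vander(np.asarray(amps, dtype=float),
                  N=max_order + 1, increasing=True)[:, 1:]  # cols a, a^2, a^3
    # responses: shape (n_amps, n_samples); E: (max_order, n_samples)
    E, *_ = np.linalg.lstsq(A, np.asarray(responses, dtype=float), rcond=None)
    return E
```

With more amplitudes than orders, the least-squares fit also averages out measurement noise; the subsequent identification of the input and output filters is a separate step not sketched here.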
P17-4 Esophageal Voice Enhancement by Modeling Radiated Pulses in Frequency Domain—Alex Loscos, Jordi Bonada, Universitat Pompeu Fabra - Barcelona, Spain
Although esophageal speech has been demonstrated to be the most popular voice-recovery method after laryngectomy surgery, it is difficult to master and shows a poor degree of intelligibility. This paper proposes a new method for esophageal voice enhancement using speech digital signal processing techniques based on modeling radiated voice pulses in the frequency domain. The analysis-transformation-synthesis technique creates a nonpathological spectrum for those utterances classified as voiced and filters those classified as unvoiced. Healthy spectrum generation implies transforming the original timbre, modeling harmonic phase coupling from the spectral shape envelope, and deriving pitch from frame energy analysis. The resynthesized speech aims to improve intelligibility, minimize artificial artifacts, and acquire resemblance to the patient’s presurgery voice.
Convention Paper 6952 (Purchase now)
P17-5 A Novel IIR Equalizer for Nonminimum Phase Loudspeaker Systems—Avelino Marques, Polytechnical Institute of Engineering of Porto - Porto, Portugal; Diamantino Freitas, University of Porto - Porto, Portugal
A novel approach to the equalization of nonminimum-phase loudspeaker systems, based on the design of an IIR inverse filter, is presented. This IIR inverse filter is designed in the time domain by minimization of the least-squares error function that results from using the typical “Output Error” configuration in the inverse modeling of nonminimum-phase systems, with an adjustable delay. Due to the nonlinear nature of the error function, iterative optimization methods for nonlinear least-squares problems were applied, namely the Levenberg-Marquardt method. This approach allows the design of inverse-filter-based equalization solutions with lower computational requirements, lower equalization error, and lower delay of the equalized loudspeaker system than the most commonly used approach, the FIR inverse filter. The advantages of this new approach are demonstrated by applying it to the equalization of two loudspeaker systems. The results of the objective evaluation of this application are presented and discussed regarding time- and frequency-domain equalization errors and the delay of the equalized loudspeaker.
Convention Paper 6953 (Purchase now)
P17-6 Spring Reverb Emulation Using Dispersive Allpass Filters in a Waveguide Structure—Jonathan Abel, David Berners, Universal Audio, Inc. - Santa Cruz, CA, USA, Stanford University, Stanford, CA, USA; Sean Costello, Analog Devices - San Jose, CA, USA; Julius O. Smith III, Stanford University - Stanford, CA, USA
Wave propagation along springs in a spring reverberator is studied, and digital emulations of several popular spring reverberator models are presented. Measurements on a number of springs reveal several dispersive propagation modes and evidence of coupling among them. The torsional mode typically used by spring reverberators is seen to be highly dispersive, giving the spring its characteristic sound. Spring reverberators often have several springs operating in parallel, and the emulations presented here use a set of parallel waveguide structures, one for each spring element. The waveguides explicitly compute the left-going and right-going torsional waves, including dispersion, propagation, and reflection effects. Scattering from spring imperfections and from the rings coupling counter-wound springs are modeled via waveguide scattering junctions.
Convention Paper 6954 (Purchase now)
P17-7 Characteristics of Inharmonic Frequency Analysis of GHA and its Application to Audio Signal Processing—Teruo Muraoka, Tohru Fukube, The University of Tokyo - Tokyo, Japan
GHA is a frequency analysis originally proposed by N. Wiener in 1930. His aim was to analyze stochastic signals using harmonic frequency analysis, and he clarified that any signal can be represented by an almost periodic function whose frequency components are in an inharmonic relationship. In 1993 Dr. Hirata proposed an inharmonic frequency analysis applicable to audio signal processing, and it became known as “GHA.” The authors have been engaged in its improvement and utilization and have reported several applications. Among them, the authors reported intensive noise reduction on damaged SP records at the last convention [AES 120th Convention, Paris, France, Convention Paper 6725]. In this paper the principle of GHA and its fundamental characteristics are explained, together with its application to noise reduction in comparison with the conventional spectral subtraction method.
Convention Paper 6955 (Purchase now)
P18 - Computers & Mobile Audio
Saturday, October 7, 2:30 pm — 6:00 pm
Chair: Jerry Bauck, Cooper Bauck Corporation - Tempe, AZ, USA
P18-1 A Hybrid Speech Codec Employing Parametric and Perceptual Coding Techniques—Maciej Kulesza, Grzegorz Szwoch, Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
A hybrid speech codec for VoIP telephony applications is presented, employing combined parametric and perceptual coding techniques. The signal is divided into voiced signal components that are encoded using the perceptual algorithm, unvoiced components that are encoded parametrically, and transients that are not encoded with a lossy method. A codec architecture in which the voiced part of the CELP residual signal is perceptually encoded and transmitted to the decoder along with the main CELP bit stream is also examined. Various methods for transient detection in the speech signal are discussed. The results of experiments revealing the improved subjective quality of the transmitted speech are also presented.
Convention Paper 6956 (Purchase now)
P18-2 EuCon: An Object-Oriented Protocol for Connecting Control Surfaces to Software Applications—Steve Milne, Euphonix, Inc. - Palo Alto, CA, USA; Phil Campbell, Hobbyhorse Music LLC - Palo Alto, CA, USA; Scott Freshour, Rob Boyer, Jim McTigue, Martin Kloiber, Euphonix, Inc. - Palo Alto, CA, USA
This paper describes a control-surface-to-application protocol that addresses the problem of raising user interface efficiency in increasingly complex software applications. Compared with existing MIDI-based protocols, this protocol was designed with enough bandwidth, high control resolution, and a wide enough variety of controls to give software application users the rich and efficient experience offered by modern large-format mixing consoles. Recognizing that today’s audio engineer uses many different applications, the protocol can simultaneously control multiple applications running on one or more computers from a single control surface. To give users the widest possible choice of applications, an object-oriented design was adopted to ease adoption by software developers.
Convention Paper 6957 (Purchase now)
P18-3 Considerations on Audio for Flash: Getting to the Vector Soundstage—Charles Van Winkle, Adobe Systems Incorporated - Seattle, WA, USA
The Flash Platform has been known for animations and interactivity for some time now, and research shows that Flash Player is one of the world’s most pervasive software platforms. Although delivering audio-rich video or interactive content through Flash is not new, preparing audio assets for Flash is new to many audio professionals, and it poses noteworthy changes to their workflows compared with more customary media for video or interactive content, e.g., DVDs or video games. This paper gives an overview of the Flash Platform and takes a first look at the considerations audio professionals must make when preparing audio assets for Flash, with modified practices suggested where necessary.
Convention Paper 6958 (Purchase now)
P18-4 5.1 Surround and 3-D (Full Sphere with Height) Reproduction for Interactive Gaming and Training Simulation—Robert (Robin) Miller III, FilmakerTechnology - Bethlehem, PA, USA
Immersive sound for gaming and simulation, perhaps more than for music and movies, requires preserving directionality of direct sounds, both fixed and moving, and acoustical reflections dynamically affecting those sounds, to effect the spatiality being presented. Conventionally (as with popular music), sources are panned close-microphone signals or synthesized sounds; the presentation pretends “They are here,” where spatiality is largely that of the listening environment. Convolution with room impulse responses can contribute diffuse ambience but not “real” spatiality and tone color. These issues pertain not only to 5.1 where reproduction is a 2-D horizontal circle of loudspeakers, but to advanced 3-D interactive reproduction, where the listener perceives the experience at the center of the sphere of natural hearing. Production techniques are introduced that satisfy both 3-D and compatible 5.1. Independent measurement confirms that the system preserves directionality and reproduces life-like spatiality and tone color continuously in the 3-D perception sphere.
Convention Paper 6959 (Purchase now)
P18-5 Automatic Volume and Equalization Control in Mobile Devices—Alexander Goldin, Alango Ltd. - Haifa, Israel; Alexey Budkin, Alango Ltd. - St. Petersburg, Russia
Noise spectrum and level change dynamically in mobile environments. A loudspeaker volume that is comfortable in quiet conditions becomes too low when the ambient noise level increases significantly, while a volume adjusted for good intelligibility in high ambient noise becomes annoyingly loud in quiet surroundings. Automatic Volume Control can compensate for different levels of ambient noise by increasing or decreasing the loudspeaker gain accordingly. However, if the noise and sound spectra are very different, such simple gain adjustment may not work well. A more advanced technology dynamically equalizes the reproduced sound so that it exceeds the noise level by a specified ratio across the whole frequency range. This paper describes principles and practical aspects of Automatic Volume and Equalization Control in mobile audio and communication devices.
Convention Paper 6960 (Purchase now)
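The band-wise principle described in this abstract can be sketched with a simple gain rule (a hypothetical illustration of the general idea, not Alango's actual algorithm): each band's gain is raised just enough for the reproduced signal to exceed the measured noise by a target ratio, subject to a ceiling.

```python
import numpy as np

def adaptive_band_gains(signal_db, noise_db, target_snr_db=15.0, max_gain_db=20.0):
    """Per-band gain (dB) so the reproduced sound exceeds the noise
    by target_snr_db in every band, capped at max_gain_db.
    (Illustrative sketch; function name and parameters are assumptions.)"""
    deficit = (noise_db + target_snr_db) - signal_db   # dB still missing per band
    return np.clip(deficit, 0.0, max_gain_db)          # never attenuate, never exceed cap

# Example: three bands with quiet, moderate, and loud ambient noise
gains = adaptive_band_gains(np.array([60.0, 60.0, 60.0]),
                            np.array([30.0, 50.0, 70.0]))
```

In the quiet band no boost is applied; in the loud band the requested boost is limited by the gain ceiling to avoid distortion.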
P18-6 Speech Source Enhancement Using Modified ADRess Algorithm for Applications in Mobile Communications—Niall Cahill, Rory Cooney, Kenneth Humphreys, Robert Lawlor, National University of Ireland - Maynooth, Ireland
An approach to refine and adapt an existing music sound source separation algorithm to speech enhancement is presented. The existing algorithm has the capability to extract music sources from stereo recordings using the position of the sources in the stereo field. Described in this paper is the ability of a Modified Azimuth Discrimination and Resynthesis algorithm (M-ADRess) to enhance speech in the presence of noise using a two-microphone array. Also proposed is a novel extension to the algorithm, which enables further noise removal from speech based on elevation angle of arrival. Objective measures of processed speech show the suitability of M-ADRess for cleaning noisy speech mixtures in an anechoic environment.
Convention Paper 6961 (Purchase now)
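The azimuth-discrimination idea underlying ADRess can be sketched for a single STFT frame (a much-simplified illustration under assumed details; the published algorithm works frame-by-frame with overlap-add resynthesis, and the paper's modifications are not reproduced here): for each frequency bin, find the left/right gain ratio that best cancels the bin, and keep bins whose cancellation null lies near the target azimuth.

```python
import numpy as np

def adress_frame(L, R, beta=50):
    """L, R: complex spectra of one stereo frame. Returns, per bin,
    the azimuth index of the deepest cancellation null and its depth."""
    g = np.arange(beta + 1) / beta                  # candidate gain ratios in [0, 1]
    A = np.abs(L[:, None] - g[None, :] * R[:, None])  # frequency-azimuth plane
    null = A.argmin(axis=1)                         # azimuth of deepest null per bin
    depth = A.max(axis=1) - A.min(axis=1)           # estimated source magnitude
    return null, depth

def separate(L, R, target, width=2, beta=50):
    """Keep bins whose null azimuth is within `width` of `target`."""
    null, depth = adress_frame(L, R, beta)
    mask = np.abs(null - target) <= width
    return depth * mask * np.exp(1j * np.angle(R))  # resynthesize with R's phase
```

A source panned with left/right gains (0.4, 0.8) cancels exactly at gain ratio 0.5, i.e., azimuth index 25 for beta = 50, so selecting that azimuth recovers its bins while rejecting a center-panned source.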
P18-7 Frame Loss Concealment for Audio Decoders Employing Spectral Band Replication—Sang-Uk Ryu, Kenneth Rose, University of California at Santa Barbara - Santa Barbara, CA, USA
This paper presents an advanced frame loss concealment technique for audio decoders employing spectral band replication (SBR). The high frequency signal of the lost frame is reconstructed by estimating the parametric information involved in the SBR process. Utilizing all SBR data from the previous and next frame, the high-band envelope is adaptively estimated from the energy evolution in the surrounding frames. The tonality control parameters are determined in a way that ensures smooth inter-frame transition. Subjective quality evaluation demonstrates that the proposed technique implemented within the aacPlus SBR decoder offers better audio quality after concealment than that achieved by the technique adopted in the standard aacPlus decoder.
Convention Paper 6962 (Purchase now)
P19 - Multichannel Sound
Sunday, October 8, 9:00 am — 11:30 am
Chair: Nicolas Saint-Arnaud, Dolby Laboratories - San Francisco, CA, USA
P19-1 High-Frequency Interpolation for Motion-Tracked Binaural Sound—Roger Hom, Intel Corp. - Folsom, CA, USA; V. Ralph Algazi, Richard Duda, University of California at Davis - Davis, CA, USA
Motion-tracked binaural (MTB) recording captures and exploits localization cues resulting from head rotation. The pressure field around the recording head is sampled with several microphones, and a head tracker on the listener’s head is used to interpolate between the microphone signals. Although time-domain interpolation works at low frequencies, phase interference causes problems at high frequencies. We previously reported on a simple procedure whereby low-frequency components were continuously interpolated, but high-frequency components were obtained from the microphone nearest to the listener’s ear. Although effective, this technique may result in audible switching artifacts. In this paper we present and evaluate methods for continuous high-frequency interpolation of the spectral magnitudes of adjacent microphones that essentially eliminate spectral discontinuities arising from head rotation.
Convention Paper 6963 (Purchase now)
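The notion of continuously interpolating spectral magnitudes between adjacent microphones can be sketched as follows (a hypothetical simplification, not the exact methods evaluated in the paper): interpolate the magnitude spectra of the two nearest microphone frames and take the phase from the nearer one, so the spectral envelope varies smoothly with head rotation.

```python
import numpy as np

def interp_hf(frame_a, frame_b, alpha):
    """Interpolate spectral magnitudes of two adjacent microphone frames.
    alpha: 0 -> mic A, 1 -> mic B (listener's ear position between them).
    (Illustrative sketch; names and parameters are assumptions.)"""
    A, B = np.fft.rfft(frame_a), np.fft.rfft(frame_b)
    mag = (1 - alpha) * np.abs(A) + alpha * np.abs(B)   # continuous magnitude blend
    phase = np.angle(A if alpha < 0.5 else B)           # phase from the nearest mic
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(frame_a))
```

At alpha = 0 the output reduces exactly to mic A's frame, and the magnitude (hence timbre) changes continuously as alpha sweeps toward mic B, avoiding a hard switch.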
P19-2 Perceptual Importance of Karhunen-Loève Transformed Multichannel Audio Signals—Lars Henning, Yu Jiao, Slawomir Zielinski, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
The Karhunen-Loève Transform (KLT) can be used to reduce the interchannel redundancy of multichannel audio signals. For this paper the perceptual importance of Karhunen-Loève transformed multichannel audio signals was systematically studied in two experiments. The first experiment investigated the perceptual effects caused by removing some KLT eigenchannels. The results showed that some eigenchannels are not perceptually important and consequently can be discarded with minimal degradation of basic audio quality. The second experiment investigated further how KLT processing affects the audio quality of multichannel audio as a function of the nature of the material and the eigenvalue extraction method used. An attempt was also made to establish the relationship between the order of perceptual importance and the order of statistical importance of KLT eigenchannels.
Convention Paper 6964 (Purchase now)
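The basic eigenchannel-discarding operation studied here can be sketched in a few lines (a generic KLT sketch, not the specific eigenvalue-extraction variants compared in the paper): decorrelate the channels via the covariance eigenvectors, keep the strongest eigenchannels, and transform back.

```python
import numpy as np

def klt_reduce(x, keep):
    """x: (channels, samples). Discard all but the `keep` strongest
    eigenchannels and reconstruct. (Illustrative sketch.)"""
    mean = x.mean(axis=1, keepdims=True)
    x0 = x - mean
    C = np.cov(x0)                         # interchannel covariance
    w, V = np.linalg.eigh(C)               # eigenvalues ascending
    V = V[:, ::-1][:, :keep]               # strongest `keep` eigenvectors
    e = V.T @ x0                           # retained eigenchannels
    return V @ e + mean                    # back-transform to loudspeaker channels
```

When the channels are highly correlated, as the abstract's first experiment suggests, most of the energy concentrates in a few eigenchannels: three perfectly correlated channels are reconstructed exactly from a single eigenchannel.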
P19-3 A New Upmixer for Enhancement of Reverberance Imagery in Multichannel Loudspeaker Audio Scenes—John Usher, McGill University - Montreal, Quebec, Canada
This paper introduces a new signal processing system that enhances reverberance imagery (i.e., perceived ambiance or listener envelopment) in multichannel loudspeaker audio scenes. Sound components that affect reverberance imagery are extracted from a pair of unencoded audio signals and are radiated with two additional loudspeakers behind the listener. The new “ambiance extraction” system improves upon extant systems by using a novel automatic (blind) equalizer based on the normalized least mean square (NLMS) algorithm to align the input signals in both level and phase before the difference signal is created. The alignment is typically undertaken using a 1024-tap frequency and ±10 ms time equalizer, which allows sound components with a high short-term correlation to be removed from the input audio signals. Subjective and objective evaluations were undertaken with recordings of solo musical performances in a concert hall and show that the new system provides a computationally practical, high-quality solution to the problem of ambiance extraction for audio upmixing.
Convention Paper 6965 (Purchase now)
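The NLMS-based alignment step can be sketched as follows (a minimal textbook NLMS sketch under assumed parameters, not the paper's 1024-tap frequency/time equalizer): adapt an FIR filter so one channel tracks the other, then take the residual as the "ambience" difference signal from which correlated components have been removed.

```python
import numpy as np

def nlms_align(x, d, taps=16, mu=0.5, eps=1e-8):
    """Adapt FIR w so that (w * x) tracks d; return the aligned estimate
    and the residual difference signal. (Illustrative sketch.)"""
    w = np.zeros(taps)
    y = np.zeros(len(d))
    for n in range(taps - 1, len(d)):
        u = x[n - taps + 1:n + 1][::-1]     # most recent input samples first
        y[n] = w @ u
        e = d[n] - y[n]
        w += mu * e * u / (u @ u + eps)     # normalized LMS update
    return y, d - y                         # aligned estimate, residual ("ambience")
```

For a channel that is simply a delayed, scaled copy of the other, the filter converges and the residual energy collapses, which is exactly the removal of highly correlated components the abstract describes.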
P19-4 Natural Reproduction of a Symphony Orchestra by the Advanced Multichannel Live Sound System—Kimio Hamasaki, Toshiyuki Nishiguchi, Hiroyuki Okubo, Yasushige Nakayama, Reiko Okumura, Masakazu Iwaki, NHK Science & Technical Research Laboratories - Tokyo, Japan
An advanced multichannel audio system for reproducing a live sound field with an ultimate sensation of presence and reality was set up and studied. The goal of this system is to provide listeners with a natural reproduction of orchestral music, as if they were hearing it in an actual sound field such as that in a concert hall. Subjective evaluations of hearing impression on orchestral sound were carried out to determine which attributes of a front sound stage were necessary for the natural reproduction of an orchestra. The results of the evaluations showed that perceptions of width, depth, and localization of the orchestral sound influence the impressions of presence and reality.
Convention Paper 6966 (Purchase now)
P19-5 Localization in Horizontal-Only Ambisonic Systems—Eric Benjamin, Dolby Laboratories - San Francisco, CA, USA; Richard Lee, Consultant - Cooktown, Queensland, Australia; Aaron Heller, SRI International - Menlo Park, CA, USA
Ambisonic reproduction systems are unique in their ability to separately reproduce the pressure and velocity components of the recorded audio signals. Gerzon proposed a theory of localization in which the human auditory system is presumed to localize using the direction of the velocity vector in the reproduced sound at low frequencies and the energy vector at high frequencies. An Ambisonic decoder has the energy and velocity vectors coincident. These are the directions of the apparent source when the listener can turn to face it. Separately maximizing the low-frequency and mid/high-frequency operation of the reproduction system can optimize localization where the listener cannot turn to face the apparent source. We test the localization of horizontal-only Ambisonic reproduction systems using various narrow-band test signals to separately evaluate low-frequency and mid-frequency localization.
Convention Paper 6967 (Purchase now)
P20 - Audio Content: Interpretation and Management
Sunday, October 8, 9:00 am — 12:30 pm
Chair: Mark Sandler, Queen Mary, University of London - London, UK
P20-1 An Experimental Verification of Localization in Two-Channel Stereo—Eric Benjamin, Dolby Laboratories - San Francisco, CA, USA
In two-channel stereo the ratio of intensities between the two loudspeakers is varied, and at low frequencies the differences in times-of-arrival of the sounds create phase differences between the two ears. These phase differences mimic those experienced in natural hearing, and thus the perceived localization is similar. The experiments described in this paper test the localization provided by stereo in actual use. The perceptions of listeners were collected, and the acoustic signals at the entrances to their ear canals were recorded for analysis. Localization under optimum conditions gave results substantially similar to what theory predicts. Localization in sub-optimum conditions, such as at very low frequencies or in listening geometries like those encountered in automobiles, was found to be substantially in error.
Convention Paper 6968 (Purchase now)
P20-2 Solving the Sticky Shed Problem in Magnetic Recording Tapes: New Laboratory Research and Analysis Provide a Safe and Effective Remedy—Charles Richardson, Richardson Audio Video - Annapolis, MD, USA
The goal is to make available to AES members new research by the author and a leading analytical laboratory concerning: (a) the primary causes and principal source of the sticky shed material found on magnetic tapes; (b) the unnecessary damage that baking tapes causes; and (c) the development of a new, safe, and effective process that restores contaminated tapes to their originally anticipated life span and allows repeated, trouble-free playbacks with excellent sonic performance. The methods used were: (a) chemical analysis of tapes’ composition with and without sticky shed, (b) electron microscopic imaging of contaminated and remediated tapes, and (c) stiction-friction measurements of tapes without back coating and free of sticky shed, with back coating and sticky shed, and after restoration. The key findings are: (a) heat and hydrolysis cause sticky shed, (b) back coating is the source of most of the sticky shed, (c) baking causes degraded playback and permanent damage, and (d) correct removal of the back coating restores most problem tapes to long life, allowing many trouble-free playbacks with excellent sonic performance.
Convention Paper 6969 (Purchase now)
P20-3 Tape Degradation Factors and Predicting Tape Life—Richard Hess, Vignettes Media - Aurora, Ontario, Canada
From 1947 through the 1990s, most of the world’s sound was entrusted to analog magnetic recording tape for archival storage. Now that analog magnetic tape has moved into a niche market, audio professionals and archivists worry about the remaining lifetime of existing tapes. This paper defines the basic tape types and the current state of knowledge of their degradation mechanisms. Conflicting prior work is reviewed and correlated with current experience. Illustrations of various types of tape degradations and a survey of many of the techniques used for tape restoration are included. Suggestions are made for further research and archival practices.
Convention Paper 6970 (Purchase now)
P20-4 Music Metadata Quality: A Multiyear Case Study Using the Music of Skip James—Adrian Freed, University of California at Berkeley - Berkeley, CA, USA
The case study reported here is an exploratory step toward developing a quantitative system for audio and music metadata quality measurement. Errors, their sources, and their propagation mechanisms are carefully examined in a small but meaningful subset of music metadata centered on a single artist, Skip James.
Convention Paper 6971 (Purchase now)
P20-5 Stop Counting Samples—Thomas Lund, TC Electronic A/S - Risskov, Denmark
Level restriction in digital music production has traditionally been based on simply measuring the value of individual samples. While sample counting may have been appropriate in the early days of digital audio, previous work has revealed how processing now exploits these archaic measurement principles to an extent where significant distortion can be expected to develop downstream of the studio in perceptual codecs, DA converters, and sample rate converters. This paper suggests that production methods, in combination with simplistic level assessment, are responsible not only for more distortion and listener fatigue but also for level jumps where digital interfacing or file transfer is used, e.g., at a broadcast station. Improved working practices and measurement methods are suggested.
Convention Paper 6972 (Purchase now)
P20-6 A Real-Time Rhythmic Analyzer and Equalizer—Jörn Loviscach, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany
The rhythmic analyzer and equalizer presented in this paper allows the user to cut or boost the signal at a given audio frequency and a given rhythmic frequency, that is, a number of beats per minute (BPM). One task that can be addressed with the rhythmic equalizer is, for instance, emphasizing a series of 1/8 triplet notes played on the hi-hat of a drum set. The software works in real time and offers an interactive graphical user interface that supports both analysis and adjustment. The current energy distribution in the two-dimensional audio frequency (Hz) / rhythmic frequency (BPM) space is displayed as a continuously updated backdrop image. The user paints the intended adjustments of BPM levels and phases onto an image layer on top of this image.
Convention Paper 6973 (Purchase now)
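One way to compute such an audio-frequency × BPM energy map is a two-stage transform (a hedged sketch under assumed parameters; the paper's actual analysis details are not given in the abstract): take an STFT to get per-bin magnitude envelopes, then Fourier-transform each bin's envelope across frames to reveal its modulation rate in beats per minute.

```python
import numpy as np

def rhythm_map(x, fs, frame=1024, hop=512):
    """Energy in (audio frequency, BPM) space via STFT followed by an
    FFT over each bin's magnitude envelope. (Illustrative sketch.)"""
    n = (len(x) - frame) // hop + 1
    win = np.hanning(frame)
    S = np.abs(np.array([np.fft.rfft(win * x[i * hop:i * hop + frame])
                         for i in range(n)]))      # (frames, audio bins)
    env = S - S.mean(axis=0)                       # per-bin envelope, DC removed
    M = np.abs(np.fft.rfft(env, axis=0))           # modulation spectrum per bin
    env_fs = fs / hop                              # envelope sample rate
    bpm = np.fft.rfftfreq(env.shape[0], d=1 / env_fs) * 60
    freqs = np.fft.rfftfreq(frame, d=1 / fs)
    return M, freqs, bpm
```

A 1 kHz tone amplitude-modulated at 2 Hz shows up as an energy ridge near 120 BPM at the 1 kHz audio bin, which is the kind of cell a rhythmic equalizer would then cut or boost.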
P20-7 Blind Dereverberation of Audio Signals Using a Modified Constant Modulus Algorithm—Hesu Huang, Chris Kyriakakis, University of Southern California - Los Angeles, CA, USA
The single-channel blind dereverberation approach we present in this paper extends the Constant Modulus Algorithm (CMA) based approach we proposed in previous work. By substituting a modified CMA algorithm for the original one, we obtain an approach better suited to blind deconvolution of reverberant audio signals with super-Gaussian distributions. To further improve performance, the modified CMA is applied to the Linear Prediction (LP) residual instead of the time-domain signal, because the LP residual has a flatter spectrum. In practical implementations, a Delayless Subband Adaptive Filtering (DSAF) architecture is also combined with CMA to reduce the computational complexity. Experimental results show that our modified method outperforms previous approaches in blind dereverberation of audio signals.
Convention Paper 6974 (Purchase now)
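For reference, the baseline update being modified is the textbook real-valued CMA (shown here as a hedged sketch; the paper's modified CMA, LP-residual processing, and subband architecture are not reproduced): a stochastic gradient step that drives the equalizer output power toward a constant modulus.

```python
import numpy as np

def cma_equalize(x, taps=5, mu=2e-3, R2=1.0):
    """Plain Constant Modulus Algorithm: adapt FIR w so |y|^2 approaches R2.
    (Baseline textbook form, for illustration only.)"""
    w = np.zeros(taps); w[0] = 1.0              # leading-tap initialization
    y = np.zeros(len(x))
    for n in range(taps - 1, len(x)):
        u = x[n - taps + 1:n + 1][::-1]
        y[n] = w @ u
        w -= mu * y[n] * (y[n] ** 2 - R2) * u   # gradient step on (y^2 - R2)^2 / 4
    return y, w
```

On a binary source passed through a mild two-tap channel, the adapted equalizer substantially reduces the constant-modulus cost relative to the unequalized signal.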
P21 - Posters: Psychoacoustics and Perception
Sunday, October 8, 9:30 am — 11:00 am
P21-1 Perceptual Importance of the Number of Loudspeakers for Reproducing the Late Part of a Room Impulse Response—Audun Solvang, U. Peter Svensson, Norwegian University of Science and Technology - Trondheim, Norway
A sound field generated by 16 loudspeakers in the horizontal plane was used as reference, and the impairment introduced by using 8, 4, and 3 loudspeakers for reproducing the late part of the room impulse response was investigated using listening tests. Stimuli were synthesized from repetitive octave-band wide pulses that were convolved with room impulse responses, and tempo as well as octave-band center frequencies were varied. Results show generally a barely perceptible impairment. Increasing the tempo led to a larger impairment for all loudspeaker configurations and frequencies. The impairment depended on the number of loudspeakers at 8 kHz but not at 250 Hz or 1 kHz. The reverberation in the listening room, 0.12 - 0.20 s, might have masked fluctuations in interaural time differences that are the dominating cue for 250 Hz and 1 kHz. The reverberation time was, however, so short that it hardly influenced fluctuations in the interaural level differences, the dominating cue at 8 kHz.
Convention Paper 6975 (Purchase now)
P21-2 A System for Adapting Broadcast Sound to the Aural Characteristics of Elderly Listeners—Tomoyasu Komori, Tohru Takagi, NHK Science & Technical Research Laboratories - Setagaya-ku, Tokyo, Japan
This paper describes an adaptive sound reproduction system for elderly listeners. We developed new audiometric equipment to gauge the MAF (Minimum Audible Field) of listeners in the range from 125 Hz to 16 kHz. We found the average MAF by age for people from their twenties to their eighties and investigated ways to adapt speech signals for elderly listeners based on their aural characteristics. The system adjusts the speech signal energy with reference to the partitioned frequency bands below the average MAF. We broadcast pilot programs using a simple method in which the speech is mixed with BGM (background music) reduced by only 6 dB from its original level, and confirmed that our proposed method is preferable to this simple method.
Convention Paper 6976 (Purchase now)
P21-3 A Comparison between Spatial Audio Listener Training and Repetitive Practice—Rafael Kassier, Tim Brookes, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
Despite the existence of various timbral ear training systems, relatively little work has been carried out into listener training for spatial audio. Additionally, listener training in published studies has tended to extend only to repetitive practice without feedback. In order for a generalized training system for spatial audio listening skills to prove effective, it must demonstrate that learned skills are transferable away from the training environment, and it must compare favorably with repetitive practice on specific tasks. A novel study has been conducted to compare a generalized training system with repetitive practice on performance in spatial audio evaluation tasks. Transfer is assessed and practice and training are compared against a control group for tasks involving both near and far transfer.
Convention Paper 6977 (Purchase now)
P21-4 Quantified Total Consonance as an Assessment Parameter for the Sound Quality—Sang Bae Chon, In Yong Choi, Mingu Lee, Koeng-Mo Sung, Seoul National University - Seoul, Korea
There have been many attempts to quantify consonance. This paper introduces a more efficient and systematic algorithm for consonance quantification than the conventional definitions that were used in the past. We also verify that the quantified consonance can be treated as an additional psychoacoustical parameter to evaluate the sound quality of a certain noise-like sound from dual horns of a vehicle.
Convention Paper 6978 (Purchase now)
P21-5 Music Genre Categorization in Humans and Machines—Enric Guaus, Perfecto Herrera, High Music School of Catalonia - Barcelona, Spain, and Universitat Pompeu Fabra, Barcelona, Spain
Music genre classification is one of the most active tasks in music information retrieval (MIR). Many successful approaches can be found in the literature, most of them based on machine learning algorithms applied to different audio features automatically computed for a specific database. But there is no computational model that explains how musical features are combined to yield genre decisions in humans. In this paper we present a series of listening experiments in which audio has been altered so as to preserve some properties of the music (rhythm, harmony, etc.) while degrading others. The results are compared with a series of state-of-the-art genre classifiers based on these musical properties, and we draw some lessons from the comparison.
Convention Paper 6979 (Purchase now)
P22 - Posters: Signal Processing
Sunday, October 8, 11:30 am — 1:00 pm
P22-1 Decoding Second Order Ambisonics to 5.1 Surround Systems—Martin Neukom, Zurich School of Music, Drama and Dance, HMT - Zurich, Switzerland
In order to play back Higher Order Ambisonics (HOA) in concert, symmetric loudspeaker setups with a large number of speakers are used. At the moment, the only possibilities for providing Ambisonics to home users are rendering for headphones with HRTFs and conversion to 5.1 surround systems. This paper shows the difficulties and limitations of converting Higher Order Ambisonics to 5.1 surround and presents some viable solutions.
Convention Paper 6980 (Purchase now)
P22-2 Artificial Reverberation: Comparing Algorithms by Using Binaural Analysis Tools—Joerg Bitzer, Denis Extra, University of Applied Science Oldenburg - Oldenburg, Germany
Different measures are known for objectively assessing the spatial quality of concert halls. Many of these measures are based on analyzing binaural impulse responses. In this paper we compare different algorithms for artificial reverberation in terms of these measures. The tested algorithms are commercially available devices and digital plug-ins in a broad price range. For the analysis, we programmed a toolbox that contains several binaural analysis methods, including the interaural cross-correlation and the interaural difference. Furthermore, lesser-known measures, modifications, and new techniques are presented. The results indicate that objective measures can give a first impression of the spatial quality of reverberation devices.
Convention Paper 6981 (Purchase now)
P22-3 Loudspeaker and Room Response Modeling with Psychoacoustic Warping, Linear Prediction, and Parametric Filters—Sunil Bharitkar, Chris Kyriakakis, Audyssey Laboratories, Inc. - Los Angeles, CA, USA
Traditionally, room response modeling is performed to obtain lower-order room impulse response models for real-time applications. These models can be FIR or IIR and may be either linear-phase or minimum-phase. In this paper we present an approach to modeling room responses using linear predictive coding (LPC) and parametric filters designed in the frequency-warped domain. Frequency warping to the psychoacoustic Bark scale allows significantly lower filter orders. Within this context, the LPC model needs significantly fewer poles to model room resonances at low frequencies in the warped domain. The locations and gains of the relatively low-order LPC poles are then used to determine the center frequencies, gains, and Q values of a parametric filter bank. Gain and Q optimization of the parametric filter bank is performed to match the parametric filter spectrum to the LPC spectrum. Subsequently, the second-order poles and zeros of the parametric filter bank are directly unwarped back into the linear domain for low-complexity real-time applications. The results show that warping lowers the computational requirements for determining the roots, as both the density and the number of roots of the LPC polynomial are substantially reduced. Furthermore, results from using just four to six parametric filters, modeled from the LPC spectrum, show significant equalization below 400 Hz.
Convention Paper 6982 (Purchase now)
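The warped-LPC front end of such an approach can be sketched generically (a hedged sketch using the commonly cited Smith-Abel Bark-warping allpass coefficient; the paper's parametric-filter matching and unwarping stages are not reproduced): replace each unit delay with a first-order allpass and run ordinary autocorrelation LPC on the warped correlation sequence.

```python
import numpy as np

def bark_lambda(fs):
    """Approximate Bark-scale warping allpass coefficient (Smith & Abel)."""
    return 1.0674 * np.sqrt(2 / np.pi * np.arctan(0.06583 * fs / 1000)) - 0.1916

def allpass(x, lam):
    """First-order allpass D(z) = (-lam + z^-1) / (1 - lam z^-1)."""
    y = np.zeros_like(x)
    x1 = y1 = 0.0
    for n in range(len(x)):
        y[n] = -lam * x[n] + x1 + lam * y1
        x1, y1 = x[n], y[n]
    return y

def warped_lpc(x, order, lam):
    """LPC on the frequency-warped axis: unit delays replaced by allpasses.
    (Illustrative sketch; lam = 0 reduces to ordinary autocorrelation LPC.)"""
    y = x.copy()
    r = np.zeros(order + 1)
    for k in range(order + 1):           # warped autocorrelation sequence
        r[k] = x @ y
        y = allpass(y, lam)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])     # warped predictor coefficients
```

Because the allpass compresses the frequency axis toward a Bark-like scale, low-frequency room resonances are captured with far fewer poles than on the linear axis, which is the order reduction the abstract reports.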
P22-4 Contactless Hearing Aid for Infants Employing Signal Processing Algorithms—Maciej Kulesza, Piotr Dalka, Gdansk University of Technology - Gdansk, Poland; Bozena Kostek, Gdansk University of Technology - and Institute of Physiology and Pathology of Hearing, Gdansk, Poland
The proposed contactless hearing aid is designed to be attached to an infant’s crib for sound amplification in a free field. It consists of a four-electret microphone matrix and a prototype DSP board. The compressed speech is transmitted and amplified via miniature loudspeakers. The algorithms developed deal with the parasitic feedback that occurs due to the small distance between microphones and loudspeakers and the potentially high amplification required. The beamforming algorithm is based on an artificial neural network (ANN), which is used as a nonlinear filter in the frequency domain. The principles of the engineered algorithms and the prototype DSP unit design are presented in the paper, and results of experiments simulating real-life conditions are analyzed and discussed.
Convention Paper 6983 (Purchase now)
P22-5 An Enhanced Implementation of the ADRess (Azimuth Discrimination and Resynthesis) Music Source Separation Algorithm—Rory Cooney, Niall Cahill, Robert Lawlor, National University of Ireland - Maynooth, Co. Kildare, Ireland
In this paper we present a novel enhancement to an existing music source separation algorithm that allows for a 76 percent decrease in computational load while enhancing its separation capabilities. The enhanced implementation is based on the ADRess (Azimuth Discrimination and Resynthesis) algorithm, which performs a separation of sources within stereo music recordings based on the spatial audio cues created by source localization techniques. The ADRess algorithm employs gain scaling and phase cancellation techniques to isolate sources based on their position across the stereo field. Objective measures and subjective listening tests have shown the separation performance of the enhanced algorithm to be objectively and perceptually comparable with that of the original ADRess algorithm, while realizing a finer spatial resolution.
Convention Paper 6984
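The gain-scaling and phase-cancellation core of ADRess can be illustrated per STFT frame: for each frequency bin, the magnitude of L minus a scaled R is scanned over a grid of gains, and bins whose deepest cancellation (null) falls near the target azimuth are assigned to the source. A minimal single-frame sketch, with parameter names assumed for illustration:

```python
import numpy as np

def adress_frame(L, R, beta=100, d=40, H=5):
    """One-frame ADRess sketch for a source panned toward the right channel.
    L, R: complex spectra of one windowed STFT frame.
    beta: azimuth resolution; d: target azimuth index; H: subspace half-width.
    Returns an estimated magnitude spectrum for the source near azimuth d."""
    g = np.arange(beta + 1) / beta                    # gain axis, 0..1
    # frequency-azimuth plane: cancellation depth of L - g*R in every bin
    A = np.abs(L[:, None] - R[:, None] * g[None, :])
    null = A.argmin(axis=1)                           # azimuth of deepest null per bin
    # keep bins whose null lies within the chosen azimuth subspace
    return np.where(np.abs(null - d) <= H, A.max(axis=1) - A.min(axis=1), 0.0)
```

A full separator runs this over all frames (and symmetrically with R - g*L for left-panned sources) and resynthesizes with the mixture phase via the inverse STFT.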
P22-6 A Simple, Robust Measure of Reverberation Echo Density—Jonathan Abel, Universal Audio, Inc. - Santa Cruz, CA, USA, and Stanford University - Stanford, CA, USA; Patty Huang, Stanford University - Stanford, CA, USA, and Helsinki University of Technology - Espoo, Finland
A simple, robust method for measuring echo density from a reverberation impulse response is presented. Based on the property that a reverberant field takes on a Gaussian distribution once an acoustic space is fully mixed, the measure counts samples lying outside a standard deviation in a given impulse response window and normalizes by that expected for Gaussian noise. The measure is insensitive to equalization and reverberation time and is seen to perform well on both artificial reverberation and measurements of room impulse responses. Listening tests indicate a correlation between echo density measured in this way and perceived temporal quality or texture of the reverberation.
Convention Paper 6985
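The measure is simple enough to state in code: in each window, count the fraction of samples beyond one standard deviation and normalize by erfc(1/sqrt(2)) (about 0.317), the fraction expected for Gaussian noise. A minimal sketch (the published measure also applies a tapered window weighting, omitted here):

```python
import numpy as np
from math import erfc, sqrt

ERFC1 = erfc(1.0 / sqrt(2.0))   # Gaussian tail fraction beyond one std dev, ~0.3173

def echo_density(h, win):
    """Normalized echo density profile of an impulse response h.
    In each length-`win` sliding window, count the fraction of samples lying
    beyond one (local) standard deviation, normalized by the Gaussian value."""
    prof = np.empty(len(h) - win + 1)
    for i in range(len(prof)):
        w = h[i : i + win]
        prof[i] = np.mean(np.abs(w) > np.std(w)) / ERFC1
    return prof
```

Sparse early reflections score near 0; a fully mixed (Gaussian) late field scores near 1, so the profile tracks how echo density builds over the response.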
P23 - Room and Architectural Acoustics
Sunday, October 8, 2:00 pm — 4:00 pm
Chair: Jan Voetmann, DELTA Acoustics and Vibration - Hoersholm, Denmark
P23-1 Pole-Zero Analysis of the Soundfield in Small Rooms at Low Frequencies—Jack Oclee-Brown, KEF Audio (UK) - Maidstone, Kent, UK
At low frequencies in small rooms the acoustic modes are sparse. In this region the sound field is most easily modeled using the method of modal decomposition. From this it is possible to extract the transfer behavior of the room for different source and receiver positions and to determine the pole and zero locations. In this paper the locations of the poles and zeros are discussed, and it is demonstrated that the locations of the poles are independent of the source and receiver positions. The results of some room correction methods and other effects are shown in this context.
Convention Paper 6986
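For the rigid-walled rectangular-room case underlying such modal decompositions, the mode frequencies follow the classical formula f = (c/2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2). A small sketch (not from the paper) enumerating the modes below a cutoff:

```python
import numpy as np
from itertools import product

def room_modes(Lx, Ly, Lz, fmax, c=343.0):
    """Axial, tangential, and oblique mode frequencies of a rigid-walled
    rectangular room below fmax, sorted ascending."""
    # largest mode index that can fall below fmax along the longest dimension
    nmax = int(2 * fmax * max(Lx, Ly, Lz) / c) + 1
    modes = []
    for nx, ny, nz in product(range(nmax + 1), repeat=3):
        if nx == ny == nz == 0:
            continue
        f = (c / 2) * np.sqrt((nx / Lx) ** 2 + (ny / Ly) ** 2 + (nz / Lz) ** 2)
        if f <= fmax:
            modes.append((f, (nx, ny, nz)))
    return sorted(modes)
```

Each mode frequency corresponds to a pole pair of the room transfer function; only the zeros move with source and receiver position, which is the behavior the paper examines.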
P23-2 Modeling of Loudspeaker Systems Using High-Resolution Data—Stefan Feistel, Wolfgang Ahnert, Ahnert Feistel Media Group - Berlin, Germany
The need for high-resolution loudspeaker data is evaluated in detail, particularly for complex data in original impulse response or frequency response formats. It is shown how a data format proposed earlier can be used to store this and other information required to adequately describe a complex loudspeaker system. Prediction results for several loudspeaker models are compared based on different spectral and spatial resolutions. Calculations are also compared against measurements for different loudspeaker types, such as multi-way loudspeakers, clusters, and line-array systems. Finally, the advantages of more precise predictions are weighed against the increasing requirements they place on computer performance and data storage.
Convention Paper 6987
P23-3 Software Based Live Sound Measurements—Wolfgang Ahnert, Stefan Feistel, Alexander Miron, Enno Finder, Ahnert Feistel Media Group - Berlin, Germany
In previous publications the authors introduced the software-based measurement system EASERA, to be used for measurements with different excitation signals such as sweeps, MLS, or noise. The present approach extends the range of excitations to natural signals like speech and music. This paper investigates selected parameters, such as frequency range, dynamic range, signal fluctuation, and signal duration, in order to reach conclusions about the conditions required to obtain results comparable with those from standard excitation signals. In this respect the limitations of both the standard stimuli and the proposed natural stimuli are discussed.
Convention Paper 6988
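The sweep excitation mentioned above is commonly the exponential sine sweep, whose time-reversed, amplitude-compensated copy deconvolves the impulse response from a recording (the Farina method). A minimal sketch, with a pure delay standing in for the system under test:

```python
import numpy as np

def exp_sweep(f1, f2, T, fs):
    """Exponential sine sweep from f1 to f2 Hz over T seconds, plus its
    inverse filter: the time-reversed sweep with a -6 dB/octave envelope
    so that sweep convolved with inv approximates a delta function."""
    t = np.arange(int(T * fs)) / fs
    Lr = T / np.log(f2 / f1)                      # sweep rate constant
    sweep = np.sin(2 * np.pi * f1 * Lr * (np.exp(t / Lr) - 1.0))
    inv = sweep[::-1] * np.exp(-t / Lr)           # amplitude-compensated reversal
    return sweep, inv

def measure_ir(recorded, inv):
    """Deconvolve the impulse response from a recorded sweep response."""
    return np.convolve(recorded, inv)
```

One advantage over MLS, relevant to the paper's comparison, is that harmonic distortion products land before the main peak of the deconvolved response and can simply be windowed out.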
P23-4 Detection of Localized Sound Leaks in Walls and Their Effects on the Speech Privacy (Security) of Closed Rooms—Bradford Gover, John Bradley, National Research Council Canada - Ottawa, Ontario, Canada
A new speech privacy measurement procedure accurately indicates the degree of speech privacy at individual listening locations outside of a closed room, including near localized weak spots. To investigate the importance of various defects (such as penetrations, electrical outlets), they were introduced into a test wall dividing two reverberation rooms. For each configuration of the wall, the following sound transmission measurements were made from one room to the other: (i) a standard transmission loss test, (ii) the new speech privacy measurement procedure, and (iii) impulse response measurements to a highly-directional spherical microphone array. The results indicate the degree to which the various defects affect the speech privacy conditions and the extent to which they are detectable by the various methods.
Convention Paper 6989