AES New York 2009
Paper Session Details
P1 - Audio Perception
Friday, October 9, 9:00 am — 12:30 pm
Chair: Poppy Crum, Johns Hopkins School of Medicine - Baltimore, MD, USA
P1-1 Effect of Whole-Body Vibration on Speech. Part I: Stimuli Recording and Speech Analysis—Durand Begault, NASA Ames Research Center - Moffett Field, CA, USA
In space launch operations, speech intelligibility for radio communications between the flight deck and ground control is of critical concern, particularly during launch phases with a predicted 12 Hz thrust oscillation. Extreme acceleration and vibration during launch may also affect vocal production. In this study, the effect of 0.5 and 0.7 g whole-body vibration on the production of spoken words (Diagnostic Rhyme Test word list) was evaluated. Six subjects were recorded in a supine position using a specially designed chair and vibration platform. Vocal warbling, pitch modulation, and other effects were observed in the spectrographic and fundamental-frequency analyses.
Convention Paper 7820 (Purchase now)
P1-2 Comparison of Objective Assessment Methods for Intelligibility and Quality of Speech—Juan-Pablo Ramirez, Alexander Raake, Deutsche Telekom Laboratories, TU Berlin - Berlin, Germany
Subjective rating of speech quality in narrow-band telecommunication is parametrically assessed by the so-called E-model. The intelligibility of the transmitted speech signal has a significant impact on users' opinions. The Speech Intelligibility Index quantifies the proportion of perceptual speech features available to the listener under background noise and linear frequency distortion, and it has been shown to correlate highly with subjective speech recognition performance. This paper compares the two models and details improvements toward the modeling of quality in wide-band transmission.
Convention Paper 7821 (Purchase now)
P1-3 A Novel Listening Test-Based Measure of Intelligibility Enhancement—Markus Kallinger, Henning Ochsenfeld, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Anne Schlüter, University of Applied Science Oldenburg - Oldenburg, Germany
One of the main tasks of speech signal processing is to increase the intelligibility of speech. In environments with low ambient noise, appropriate algorithms can also reduce listening effort. Objective and subjective measures are available to evaluate these algorithms' performance; however, most of them are not specifically designed to evaluate speech enhancement approaches in terms of intelligibility improvement. This paper proposes a novel listening-test-based measure that makes use of a speech intelligibility test, the Oldenburg Sentence Test (German: Oldenburger Satztest, OLSA). Recent research results indicate a correlation between listening effort and speech intelligibility. We therefore propose using our measure both to assess intelligibility enhancement for algorithms operating at low signal-to-noise ratios (SNRs) and to assess listening-effort improvement at high SNRs. We compare the novel measure with results obtained from listening-test-based as well as instrumental evaluation procedures. Good correlation and more plausible results in specific situations illustrate the potential of the proposed method.
Convention Paper 7822 (Purchase now)
P1-4 Which Wideband Speech Codec? Quality Impact Due to Room-Acoustics at Send Side and Presentation Method—Alexander Raake, Marcel Wältermann, Sascha Spors, Deutsche Telekom Laboratories, Technische Universität Berlin - Berlin, Germany
We report on two listening tests to determine the speech quality of different wideband (WB) speech codecs. In the first test, we studied various network conditions, including WB–WB and WB–narrowband (WB–NB) tandeming, packet loss, and background noise. In addition to other findings, this test showed some codec quality rank-order changes when compared to the literature. To evaluate the hypothesis that secondary test factors lead to this rank-order effect, we conducted another speech quality listening test. Here we simulated different source material recording conditions (room-acoustics and microphone positions), processed the material with different WB speech coders, and presented the resulting files monotically in one and diotically in another test. The paper discusses why and how these factors impact speech quality.
Convention Paper 7823 (Purchase now)
P1-5 Evaluating Physical Measures for Predicting the Perceived Quality of Blindly Separated Audio Source Signals—Thorsten Kastner, University of Erlangen-Nuremberg - Erlangen, Germany, Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany
For blind source separation (BSS) applications whose aim is the reproduction of the separated signals, the perceived quality of the produced audio signals is a key factor in rating these systems. In this paper several signal-derived features are compared to assess their relevance in reflecting the perceived audio quality of BSS signals. The most relevant features are then combined in a multiple linear regression model to predict the perceptual quality. To cover a large variety of source signals and different algorithms, the reference ratings are obtained from extensive listening tests rating the BSS algorithms that participated in the Stereo Source Separation Campaigns 2007 (SASSEC) and 2008 (SiSEC). Results are presented for predicting the perceived quality of SiSEC items based on a model calibrated using SASSEC material.
Convention Paper 7824 (Purchase now)
P1-6 Statistics of MUSHRA Revisited—Thomas Sporer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany, TU Ilmenau, Ilmenau, Germany; Judith Liebetrau, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Sebastian Schneider, TU Ilmenau - Ilmenau, Germany
Listening tests are the final arbiter when judging perceived audio quality. To achieve reliable and repeatable results, the experimental design and the statistical analysis of results are of great importance. The “triple stimulus with hidden reference” test (Rec. ITU-R BS.1116) and the MUSHRA test (multi-stimulus with hidden reference and anchors, Rec. ITU-R BS.1534) are well-established standardized listening tests. Traditionally, the statistical analysis of both is based on simple parametric statistics. This paper reanalyzes the results of MUSHRA tests with alternative statistical approaches, mainly exploiting the fact that in MUSHRA every listener not only assigns a score to each item but also performs an inherent ranking test and a paired-comparison test (“better–worse”) between pairs of stimuli. Thus, more statistical information is made visible.
Convention Paper 7825 (Purchase now)
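The inherent paired-comparison reading of MUSHRA data described above can be illustrated with a small sketch. The following Python snippet (an illustration of the general idea only, not the authors' actual analysis; the listener scores are invented) turns each listener's scores for two stimuli into a "better/worse" judgment and applies an exact two-sided binomial sign test:

```python
from math import comb

def sign_test_p(wins_a: int, wins_b: int) -> float:
    """Two-sided exact binomial sign test (ties discarded).

    Each listener's MUSHRA scores for stimuli A and B yield one
    'A better' or 'B better' judgment; under the null hypothesis
    of no difference, wins follow Binomial(n, 0.5).
    """
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # One-sided tail probability P(X >= k), doubled and capped at 1.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical scores for two codecs from 12 listeners.
scores_a = [78, 82, 90, 65, 70, 88, 75, 80, 85, 72, 91, 77]
scores_b = [70, 75, 92, 60, 66, 80, 70, 74, 79, 73, 85, 70]
wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
wins_b = sum(b > a for a, b in zip(scores_a, scores_b))
p = sign_test_p(wins_a, wins_b)
```

Note that the mean MUSHRA scores alone could hide such a consistent per-listener preference, which is exactly the extra information the paired-comparison view makes visible.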
P1-7 Statistical Analysis of ABX Results Using Signal Detection Theory—Jon Boley, LSB Audio - Lafayette, IN, USA; Michael Lester, LSB Audio - Lafayette, IN, USA, Shure Incorporated, Niles, IL, USA
ABX tests have been around for decades and provide a simple, intuitive means to determine whether there is an audible difference between two audio signals. However, the results of proper statistical analyses are rarely published along with the results of the ABX test, even though the interpretation of the results may depend critically on such an analysis. In this paper a well-established analysis method known as signal detection theory is presented in a way that is easy to apply to ABX tests. This method is contrasted with other statistical techniques to demonstrate the benefits of the approach.
Convention Paper 7826 (Purchase now)
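A minimal sketch of the kind of signal-detection computation the paper advocates is the sensitivity index d′ derived from response counts. This uses the basic yes/no model (the full ABX "differencing" model maps proportion correct to d′ differently), and the trial counts below are invented for illustration:

```python
from statistics import NormalDist

def d_prime(hits: int, misses: int,
            false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each cell) keeps the
    z-transform finite when a rate would otherwise be 0 or 1.
    """
    h = (hits + 0.5) / (hits + misses + 1.0)
    f = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(h) - z(f)

# Hypothetical ABX session: 40 'X = A' trials and 40 'X = B' trials.
dp = d_prime(hits=32, misses=8, false_alarms=12, correct_rejections=28)
```

Unlike raw percent correct, d′ separates the listener's sensitivity from any response bias toward answering "A" or "B".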
P2 - Music Production
Friday, October 9, 9:00 am — 12:30 pm
Chair: Jason Corey, University of Michigan - Ann Arbor, MI, USA
P2-1 Computational Optimization of a Practical End-Fire Loudspeaker Array—Andrew Christian
The mechanics of an array of loudspeakers that focuses coherent acoustical energy in the longitudinal direction and attempts to cancel it in the transverse direction are discussed. A practical situation is discussed in general terms, leading to two absolute measures over which the performance of an array may be evaluated. A numerical scheme to evaluate the performance of different configurations of the array is proposed and its accuracy verified. A realistic situation is proposed as a test bed, and a simulation is run over all practical configurations of the array, generating graphs that show the features of these configurations. These results are discussed and an optimized design is settled upon. Further practical considerations of end-fire arrays are also discussed.
Convention Paper 7827 (Purchase now)
P2-2 Improved Methods for Controlling Touring Loudspeaker Arrays—Ambrose Thompson, Martin Audio - London, UK
Users of modern array loudspeakers for high-level sound reinforcement demand more precise control of these systems. Current methods of control were examined and found to be inadequate for meeting a new, more stringent set of user requirements. We investigate how these requirements may be formed into a mathematical model of the system suitable for numerical optimization, with the complex transfer function applied to each acoustic source as the primary design variable. We then describe how the optimized transfer functions were implemented with FIR filters on typically available hardware. Finally, a comparison was made between the predicted and measured output for a large array.
Convention Paper 7828 (Purchase now)
P2-3 Investigations on the Inclusion of the LFE Channel in the ITU-R BS.1770-1 Loudness Algorithm—Scott G. Norcross, Michel C. Lavoie, Communication Research Centre - Ottawa, Ontario, Canada
The current ITU-R BS.1770-1 loudness algorithm does not include the LFE channel for 5.1-channel audio signals. It has been proposed that the LFE channel should be included in the loudness measurement to improve measurement accuracy and to fully reflect all the channels of a 5.1 audio signal. On the other hand, the exclusion of the LFE channel in most downmixing systems may be one reason not to include it in the loudness measurement. Along with examining the objective pros and cons of adding the LFE channel to the BS.1770-1 loudness algorithm, results of formal subjective tests are used to show the effect of the LFE channel on the perceived loudness of multichannel program material.
Convention Paper 7829 (Purchase now)
P2-4 Automatic Equalization of Multichannel Audio Using Cross-Adaptive Methods—Enrique Perez-Gonzalez, Joshua Reiss, Queen Mary, University of London - London, UK
A method for automatically equalizing a multi-track mixture has been implemented. The method aims to achieve equal average perceptual loudness on all frequencies amongst all multi-track channels. The method uses accumulative spectral decomposition techniques together with cross-adaptive audio effects to achieve equalization. The method has applications in live and recorded audio mixing where the audio engineer would like to reduce set-up time, or as a tool for inexperienced users wishing to perform audio mixing. Results are reported that show how the frequency content of each channel is modified, and that demonstrate the ability of the automatic equalization method to achieve a well-balanced and equalized final mix.
Convention Paper 7830 (Purchase now)
P2-5 Inside Out— Time Variant Electronic Acoustic Enhancement Provides the Missing Link for Acoustic Music Outdoors—Steve Barbar, E-coustic Systems - Belmont, MA, USA
No matter how good the acoustic ensemble, moving it from the concert hall to an outdoor stage dramatically changes the listening experience for both the musicians and those in attendance—usually not for the better. For the musicians, the loss of reflected and reverberant energy alters communication between members of the ensemble. The physiology of playing the instrument changes as well—without support from reflected and reverberant energy, musicians must compensate. Thus, while the outdoor performance experience may be deemed “good” both for those playing and for those listening, it is not the experience that either desires. This paper describes how time-variant electro-acoustic enhancement has been used successfully to dramatically improve the acoustical musical experience for outdoor performance.
Convention Paper 7831 (Purchase now)
P2-6 Engineering Outreach for Student Chapter Activities—Scott Porter, Todd Marco, Jeremy Joseph, Jason Morris, The Pennsylvania State University - State College PA, USA
The Penn State Audio Engineering Society Student Section has been active since its establishment in late 1991. Recently, the student officers have made a concerted effort to increase the section’s visibility and outreach to university students in science and engineering disciplines at both the graduate and undergraduate level. To accomplish this, the authors built around the existing infrastructure by adding new events and programs to engage students at a variety of technical, artistic, and interpersonal levels. In this paper the section’s core programming will be briefly discussed and followed by an examination of the additional events that have attracted new science and engineering students to the section.
Convention Paper 7832 (Purchase now)
P2-7 Desktop Music Production and the Millennials: A Challenge for Educators, Researchers, and Audio Equipment and Music Software Industry—Jan-Olof Gullö, Royal College of Music - Stockholm, Södertörn University, Huddinge, Sweden
Music is very important for today’s youth, the Millennials. They often listen to music for hours every day, and many also produce music themselves. As a result, young people now show different musical abilities compared with earlier generations. New software for music production, combined with the development of less expensive but more powerful computers, has made Desktop Music Production available to a large public. Producers of music production software also show a growing interest in the educational market. This raises questions about the demands this puts on the training and work of teachers in music production and audio education, as well as the future challenges to suppliers of music production software and music technology.
Convention Paper 7833 (Purchase now)
P3 - Transducers and Amplifiers
Friday, October 9, 10:00 am — 11:30 am
P3-1 Target Modes in Moving Assemblies of Pleated Loudspeaker—Jose Martínez, Fernando Bolaños, Acustica Beyma S.L. - Moncada, Valencia, Spain; Enrique Gonzalo Segovia Eulogio, Jaime Ramis Soriano, Universidad de Alicante - Alicante, Spain
In this paper we present the process followed to fit a finite-element numerical model of the mechanical behavior of a pleated loudspeaker based on AMT (Air Motion Transformer) technology. In this type of transducer the diaphragm is formed by longitudinal folds, with a conductive ribbon printed on the internal face of each fold. We first obtained the participation factors and the generalized mass from the results of a natural-vibration modal analysis. Next, an analysis taking into account the loss factors of the materials is performed, followed by a forced-vibration modal analysis. Finally, a method is described for characterizing the materials (Young's modulus and loss factor) using modal analysis techniques.
Convention Paper 7835 (Purchase now)
P3-2 Cone Shape Optimization Based on FE/BE Simulation to Improve the Radiated Sound Field—Patrick Macey, PACSYS Limited - Nottingham, UK
An optimization procedure is used in conjunction with finite/boundary element simulation to adjust the shape of an axisymmetric cone, defined as initially straight, and improve the radiated sound field, while keeping the maximum depth as a constraint. The effect of several different objective functions is considered. The optimization procedure is made more feasible by reducing the effect of local minima by artificially high damping applied to some components within the drive unit.
Convention Paper 7836 (Purchase now)
P3-3 Study and Improvement of DC-Link Perturbations Models for DCI-NPC Power Amplifiers—Vicent Sala, Luis Romeral, Universitat Politecnica de Catalunya - Terrassa, Spain
This paper presents the most important sources of distortion in high-power DCI-NPC amplifiers. These distortion sources are attributed to power-supply perturbations caused by DC-Link midpoint voltage oscillations. We present a classic model for assessing the magnitude of these disturbances and justify the need to correct it to accommodate real load-impedance variations. A new model is proposed and compared with the classic model by analytical, experimental, and simulation methods. The paper concludes that in control or cancellation applications it is necessary to use models that include the variation of load impedance with frequency.
Convention Paper 7837 (Purchase now)
P3-4 Active Control Based on an Estimator for the Bus-Pumping Cancellation in the Half-Bridge Class-D Audio Power Amplifiers—Vicent Sala, Luis Romeral, Universitat Politecnica de Catalunya - Terrassa, BCN, Spain
This paper presents a new technique to avoid the distortion generated by midpoint voltage variations on the DC bus of Class-D half-bridge audio amplifiers. A common distortion source in a Class-D amplifier is the bus-pumping effect, which causes characteristic distortion, attenuates the output gain, and decreases amplifier efficiency. By including distortion factors in the half-bridge amplifier model, a new Hybrid Active Control (HAC) is implemented that minimizes distortion due to midpoint DC-bus voltage variations. Simulation and experimental results confirm the theoretical analysis of parameter influence on bus pumping and the effectiveness of the implemented HAC. Results show a reduction of overall distortion, which allows increased audio amplifier efficiency.
Convention Paper 7838 (Purchase now)
P3-5 Simple Amplifier for Single Frequency Subwoofer—Vladimir E. Filevski, Audio Expert DOO - Skopje, Macedonia
A frequency-mapped amplifier driving a single-frequency subwoofer is an inexpensive way to add the missing low tones to small satellite loudspeakers. The whole amplifier system consists of a band-pass filter (typically 20–120 Hz), an envelope detector, a single-frequency (typically 50–60 Hz) constant-amplitude generator, a mixer (that multiplies the outputs of the envelope detector and the single-frequency generator), and a conventional power amplifier. The amplifier proposed in this paper unites the functions of the mixer, the generator, and the power amplifier in a single unit and does not need a DC power supply; it runs on a 50/60 Hz AC supply, without a rectifier and without large voltage-smoothing capacitors. With appropriate MOSFETs the proposed amplifier can run directly on the 120 V/60 Hz mains supply line—without a power transformer—but in that case it needs a loudspeaker with sufficiently high impedance at 60 Hz in order not to stress the amplifier's output transistors.
Convention Paper 7839 (Purchase now)
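The frequency-mapping chain described above—band-pass filter, envelope detector, and a fixed-frequency generator recombined through a multiplier—can be sketched digitally. This toy Python version is an illustration only (the paper's amplifier performs these operations in the analog power stage, not in DSP) and assumes the band-pass filtering has already been done upstream:

```python
import math

def single_freq_sub(x, fs=48000, f_gen=60.0, attack=0.01, release=0.1):
    """Re-impose the envelope of a band-limited bass signal onto a
    fixed f_gen tone: rectifier -> one-pole envelope follower ->
    multiply by a constant-amplitude sine generator."""
    env = 0.0
    a_att = math.exp(-1.0 / (attack * fs))   # fast attack coefficient
    a_rel = math.exp(-1.0 / (release * fs))  # slow release coefficient
    out = []
    for n, s in enumerate(x):
        rect = abs(s)                        # full-wave rectifier
        a = a_att if rect > env else a_rel
        env = a * env + (1.0 - a) * rect     # one-pole envelope follower
        out.append(env * math.sin(2 * math.pi * f_gen * n / fs))
    return out

# A 40 Hz burst in; out comes a 60 Hz tone carrying the same envelope.
fs = 48000
burst = [math.sin(2 * math.pi * 40 * n / fs) for n in range(fs // 4)]
y = single_freq_sub(burst, fs)
```

The attack/release constants are arbitrary illustrative choices; in the analog circuit they correspond to the envelope detector's charge and discharge time constants.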
P4 - Transducer Modeling and Design
Friday, October 9, 2:30 pm — 7:00 pm
Chair: Siegfried Linkwitz
P4-1 Modeling the Intermodulation Distortion of a Coaxial Loudspeaker—Edward Dupont, Stanley Lipshitz, University of Waterloo - Waterloo, Ontario, Canada
This paper is an attempt to explain the intermodulation distortion of a coaxial loudspeaker driver. Such a loudspeaker, in which the woofer and tweeter are excited at frequencies f1 and f2 respectively, is known to produce sum and difference frequencies f± = f1 ± f2. Generation of these can be attributed to both the nonlinearity of the equations of motion and the Lagrangian boundary behavior of the woofer. A simplified model is used consisting of an acoustic monopole located in front of a baffled planar piston. To characterize the phenomena of interest the second-order equation for pressure is used. An exact integral solution is then given for the f± pressure terms. A special case analytic solution is also discussed. Several numerical investigations of the model are performed and compared with experiment.
Convention Paper 7840 (Purchase now)
P4-2 Study and Characterization of the Odd and Even Nonlinearities in Electrodynamic Loudspeakers by Periodic Random-Phase Multisines—Pepe Gil-Cacho, Toon van Waterschoot, Marc Moonen, Katholieke Universiteit Leuven (KUL) - Leuven, Belgium; Søren Holdt Jensen, Aalborg University - Aalborg, Denmark
In acoustic echo cancellation (AEC) applications, the acoustic path from a loudspeaker to a microphone is often estimated by means of a linear adaptive filter. However, loudspeakers introduce nonlinear distortions that may strongly degrade the adaptive filter's performance, so nonlinear filters have to be considered. In this paper measurements of three types of loudspeakers are conducted to detect, quantify, and qualify nonlinearities by means of periodic random-phase multisines. It is shown that odd nonlinearities predominate over even nonlinearities across the entire frequency range. The aim of this paper is thus to demonstrate that third-order (cubic) adaptive filters have to be used, which is in clear conflict with the extensive, almost exclusive, use of second-order (quadratic) Volterra filters.
Convention Paper 7841 (Purchase now)
P4-3 The Effect of Sample Variation among Cabinets of a Line Array on Simulation Accuracy—Stefan Feistel, Wolfgang Ahnert, Ahnert Feistel Media Group - Berlin, Germany
Most line array systems consist of a number of discrete sound sources. For typical performance criteria of such arrays, such as homogeneous, controlled radiation of sound or minimum variation among mechanically identical arrays, it is important that the radiation properties of the individual sources, such as sensitivity and directional response, are very similar. Using statistical methods, we discuss the effect of sample variation on overall array performance. We show that for typical modeling applications the influence of sample variation is small and can be neglected in most cases as a minor error. Our results are derived by three different methods: a rigorous mathematical analysis, numerical simulations, and exemplary measurements.
Convention Paper 7842 (Purchase now)
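The statistical intuition—that independent per-cabinet errors average out when many sources sum coherently—can be checked with a toy Monte Carlo experiment. This sketch is not the paper's rigorous analysis; the error magnitudes (1 dB gain spread, 10° phase spread) are assumed purely for illustration:

```python
import cmath
import math
import random

def onaxis_level_spread(n_sources: int, gain_sigma_db: float,
                        phase_sigma_deg: float, trials: int = 2000,
                        seed: int = 1) -> float:
    """Standard deviation (in dB) of the coherent on-axis sum of
    n nominally identical sources whose gains and phases carry
    small independent Gaussian errors."""
    rng = random.Random(seed)
    levels = []
    for _ in range(trials):
        total = 0j
        for _ in range(n_sources):
            g = 10 ** (rng.gauss(0.0, gain_sigma_db) / 20.0)
            ph = math.radians(rng.gauss(0.0, phase_sigma_deg))
            total += g * cmath.exp(1j * ph)
        levels.append(20 * math.log10(abs(total) / n_sources))
    mean = sum(levels) / trials
    return math.sqrt(sum((x - mean) ** 2 for x in levels) / trials)

spread_1 = onaxis_level_spread(1, gain_sigma_db=1.0, phase_sigma_deg=10.0)
spread_16 = onaxis_level_spread(16, gain_sigma_db=1.0, phase_sigma_deg=10.0)
```

The 16-cabinet spread comes out roughly four times smaller than the single-cabinet spread, consistent with the 1/√N averaging the abstract's conclusion relies on.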
P4-4 SPICE Simulation of Headphones and Earphones—Mark Kahrs, University of Edinburgh - Edinburgh, UK
Unlike loudspeakers, headphones and earphones are not in the mainstream of electroacoustical design. In this paper different designs for headphones and earphones are discussed and simulated with the aid of SPICE, the well-known electrical circuit simulator. These simulations can be used to perform elementary design tradeoffs. One significant difficulty is the lack of component measurements in the open literature. The paper begins with an overview of design aspects of headphones. This is followed by a review of the use of SPICE as an electroacoustical simulator. The following section details various experiments done using SPICE to explore headphone design. The conclusion decries the lack of publicly available information as well as the dearth of components.
Convention Paper 7843 (Purchase now)
P4-5 A Preliminary SPICE Model to Calculate the Radiation Impedance of a Baffled Circular Piston—Scott Porter, Stephen Thompson, The Pennsylvania State University - State College PA, USA
Acoustic systems often use circular pistons to radiate sound into fluid media. Mathematically, the solution to the radiation impedance of a baffled circular piston is well known. Implementing the exact solution in circuit analysis packages, such as SPICE, however, can be difficult because many commercial packages do not include Bessel and Struve functions. A SPICE subcircuit is presented that calculates the radiation impedance for all frequencies to a good approximation.
Convention Paper 7844 (Purchase now)
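As a numerical cross-check for such an approximation (this sketch is not the SPICE subcircuit itself), the exact normalized radiation impedance of a baffled circular piston, Z/(ρcπa²) = [1 − J1(2ka)/ka] + j[H1(2ka)/ka], can be evaluated directly from the power series of the Bessel function J1 and the Struve function H1 using only the standard library:

```python
import math

def j1(x: float, terms: int = 40) -> float:
    """Bessel function of the first kind, order 1 (power series)."""
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + 1))
               * (x / 2) ** (2 * m + 1) for m in range(terms))

def h1(x: float, terms: int = 40) -> float:
    """Struve function of order 1 (power series)."""
    return sum((-1) ** m * (x / 2) ** (2 * m + 2)
               / (math.gamma(m + 1.5) * math.gamma(m + 2.5))
               for m in range(terms))

def piston_radiation_impedance(ka: float) -> complex:
    """Normalized radiation impedance Z / (rho * c * pi * a^2) of a
    baffled rigid circular piston of radius a at wavenumber k."""
    return complex(1.0 - j1(2 * ka) / ka, h1(2 * ka) / ka)

z = piston_radiation_impedance(0.1)
```

At low ka the real part approaches (ka)²/2 and the imaginary part 8ka/(3π), while at high ka the real part tends to 1—handy limits for validating any circuit-level approximation against.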
P4-6 Comparison between Measurement and Boundary Element Modelization of Subwoofers—Manuel Melon, Christophe Langrenne, Olivier Thomas, Alexandre Garcia, CNAM - Paris, France
At very low frequencies, even large anechoic chambers cannot be used to measure subwoofers accurately. A solution is to use the Field Separation Method (FSM). This technique subtracts the field reflected by the measurement-room walls from the measured field, thus recovering the acoustic pressure that would have been radiated under free-field conditions. In this paper the Field Separation Method is used to measure two subwoofer prototypes. The results are compared to those given by a boundary-element model of the subwoofers; the input velocities required for the modeling are measured with a laser Doppler vibrometer. Comparisons are performed on on-axis pressure and directivity, and discrepancies between the two methods are discussed and explained where possible.
Convention Paper 7845 (Purchase now)
P4-7 Design of a Coincident Source Driver Array with a Radial Channel Phase-Plug and Novel Rigid Body Diaphragms—Mark Dodd, GPAcoustics (UK) Ltd. - Maidstone, Kent, UK; Jack Oclee-Brown, KEF Audio (UK) Ltd. - Maidstone, Kent, UK
The simple source characteristics that coincident-source driver arrays promise are an attractive design goal, but many engineering obstacles must be overcome to avoid undesirable complex behavior. This paper outlines an innovative design approach that achieves simple source behavior over several octaves and avoids the complex acoustical and vibrational behavior found in traditional drivers. The high-frequency section of the driver combines techniques for optimizing the response introduced by Dodd with the radial-channel phase-plug design introduced by the authors. The midrange unit uses a cone with novel geometry allowing it to move as a rigid body over an octave above the crossover frequency. The resulting driver and its measured behavior are described in the light of some possible alternative approaches.
Convention Paper 7846 (Purchase now)
P4-8 New Induction Drive Transducer Designs—Marshall Buck, Psychotechnology, Inc. - Los Angeles, CA, USA; Patrick Turnmire, RedRock Acoustics - Arroyo Seco, NM, USA; David Graebener, Wisdom Audio, LLC - Carson City, NV, USA
Induction motor designs for a loudspeaker typically use a transformer assembly with a fixed primary and a moving secondary (driving ring) immersed in a static magnetic field. Induction-drive audio transducers can be designed to produce very high-efficiency, high-output mid/high devices; in another configuration, a very linear, long-stroke woofer with high output is realizable. Measurements are provided on prototypes of both types of devices. The midrange exhibits an output of 83 acoustic watts with 300 watts drive, and a maximum efficiency of 45% at 10 watts. The devices have a very linear stroke, with both Bl vs. X and Le vs. X extremely flat. Reliability is enhanced over conventional voice-coil drives due to the elimination of moving lead-in wires. A wide range of nominal impedances can be obtained by changing the wire size and number of turns in the primary.
Convention Paper 7847 (Purchase now)
P4-9 A Thin and Flexible Sound Generator Using an Electro-Active Elastomer—Takehiro Sugimoto, Kazuho Ono, Akio Ando, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Yuichi Morita, Kosuke Hosoda, Daisaku Ishii, Foster Electric Co., Ltd. - Akishima, Tokyo, Japan
We propose a new sound generator using electroactive elastomer (EAE), an elastic and flexible material. Our prototype sound generator is composed of a thin polyurethane elastomer sheet and conducting-polymer electrodes. The electrodes are formed on both surfaces of the elastomer sheet and are driven by audio signals with a DC bias voltage. We conducted a transformation analysis of the EAE and found that using the side-length change is more effective than using the thickness change: a sound generator exploiting the side-length change provides a 30 dB higher sound pressure level (SPL) at 1 kHz than one using the thickness change. The configuration design and operating conditions suitable for a sound generator are discussed.
Convention Paper 7848 (Purchase now)
P5 - Virtual Acoustics
Friday, October 9, 2:30 pm — 7:00 pm
Chair: David Griesinger
P5-1 The Effect of Whole-Body Vibration on Preferred Bass Equalization in Automotive Audio Systems—Germain Simon, Chalmers University - Göteborg, Sweden; Sean Olive, Todd Welti, Harman International - Northridge, CA, USA
A set of experiments was conducted to study the effect of whole-body vibration on preferred low frequency equalization of an automotive audio system. Listeners indicated their preferred bass equalization for four different music programs reproduced through a high quality automotive audio system auditioned in situ (in the car) and through a headphone-based binaural room scanning system. The task was repeated while the listener experienced different levels of simulated and real whole-body vibrations associated with the automotive audio system itself. The results reveal that the presence of whole-body vibration can reduce the preferred level of bass equalization by as much as 3 dB depending on the program, the level of vibration, and the individual listener.
Convention Paper 7956 (Purchase now)
P5-2 Using Programmable Graphics Hardware for Acoustics and Audio Rendering—Nicolas Tsingos, Dolby Laboratories - San Francisco, CA, USA
Over the last decade, the architecture of graphics accelerators (GPUs) has dramatically evolved, outpacing traditional general-purpose processors (CPUs) with an average 2.25-fold increase in performance every year. With massive processing capabilities and high-level programmability, current GPUs can be leveraged for applications far beyond visual rendering. In this paper we offer an overview of modern programmable GPUs and how they can be applied to acoustics and audio rendering for virtual reality or gaming applications. For tasks ranging from sound synthesis and audio signal processing to numerical acoustic simulations, GPUs’ massive parallelism and dedicated instructions can offer a 5- to 100-fold performance improvement over traditional CPU implementations. We illustrate these benefits with results from 3-D audio processing and sound-scattering simulations and discuss future opportunities for audio and acoustics applications on massively multicore processors.
Convention Paper 7850 (Purchase now)
P5-3 Designing Practical Filters for Sound Field Reconstruction—Mihailo Kolundzija, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland; Martin Vetterli, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland, University of California at Berkeley, Berkeley, CA, USA
Multichannel sound field reproduction techniques, such as Wave Field Synthesis (WFS) and Sound Field Reconstruction (SFR), define loudspeaker filters in the frequency domain. To use these techniques in practical systems, however, one needs to convert these frequency-domain characteristics to practical and efficient time-domain digital filters. An additional limitation of SFR comes from its use of a numerical matrix pseudoinversion procedure, in which the obtained filters are sensitive to numerical errors at low levels when the system matrix has a high condition number. This paper describes physically motivated modifications of the SFR approach that mitigate these conditioning problems, together with frequency-domain loudspeaker filter smoothing that allows short time-domain filters to be designed without affecting the sound field reconstruction accuracy. It also compares the sound field reproduction accuracy of WFS and SFR using the obtained discrete-time filters.
Convention Paper 7851 (Purchase now)
P5-4 Investigations on Modeling BRIR Tails with Filtered and Coherence-Matched Noise—Fritz Menzer, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland
This paper investigates to what extent the tails of left and right binaural room impulse responses (BRIRs) can be replaced by white Gaussian noise that has been processed to have the same energy decay relief and interaural coherence as the original BRIRs' tails. For this purpose, BRIRs were generated consisting of two parts, where the first part is taken from the original BRIR and the second part is filtered and coherence-matched noise. A subjective test was carried out to investigate how the perceived similarity between original and modeled BRIRs decreases as the split point between the parts approaches the direct-sound part of the BRIRs. In addition, frequency-dependent and conventional frequency-independent interaural coherence matching were compared.
Convention Paper 7852 (Purchase now)
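The coherence-matching step described in the abstract can be sketched in a few lines. The mixing rule below (left = n1, right = c·n1 + √(1−c²)·n2) is a standard way to give two noise channels a target correlation c; the broadband exponential envelope is a simplified stand-in for the paper's frequency-dependent energy decay relief, and all parameter values are illustrative assumptions:

```python
import numpy as np

def coherence_matched_noise(n_samples, coherence, decay_time, fs=48000, seed=0):
    """Generate a stereo noise tail with a target interaural coherence and an
    exponential energy decay (a broadband stand-in for the energy decay relief)."""
    rng = np.random.default_rng(seed)
    n1 = rng.standard_normal(n_samples)
    n2 = rng.standard_normal(n_samples)
    # Mixing rule: right = c*n1 + sqrt(1-c^2)*n2 has correlation c with left = n1.
    left = n1
    right = coherence * n1 + np.sqrt(1.0 - coherence**2) * n2
    # Shared exponential decay envelope: same energy decay on both ears.
    t = np.arange(n_samples) / fs
    env = np.exp(-3.0 * np.log(10) * t / decay_time)  # reaches -60 dB at decay_time
    return left * env, right * env

l, r = coherence_matched_noise(48000, coherence=0.6, decay_time=0.5)
c_est = float(np.corrcoef(l, r)[0, 1])  # empirical interaural correlation
```

Applying the same envelope to both channels leaves the correlation at the target value, which is why the split into "coherence matching" and "decay matching" can be done independently.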
P5-5 Localization of Sound Sources in Reverberant Environments Based on Directional Audio Coding Parameters—Oliver Thiergart, Richard Schultz-Amling, Giovanni Del Galdo, Dirk Mahne, Fabian Kuech, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Methods for spatial audio processing are becoming more important as the variety of multichannel audio applications steadily increases. Directional Audio Coding (DirAC) is a well-proven technique for capturing and reproducing spatial sound on the basis of a downmix audio signal and parametric side information, namely the direction of arrival and diffuseness of the sound. Beyond spatial audio reproduction, the DirAC parameters can be exploited further. In this paper we propose a computationally efficient approach to determine the position of sound sources based on DirAC parameters. It is shown that the proposed localization method provides reliable estimates even in reverberant environments. The approach also allows a trade-off between localization accuracy and tracking performance for moving sound sources.
Convention Paper 7853 (Purchase now)
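The core geometric idea behind localizing a source from direction-of-arrival parameters at two measurement positions is ray intersection. The sketch below is a generic 2-D triangulation under that assumption, not the paper's DirAC-specific estimator (which also uses diffuseness to weight the estimates):

```python
import numpy as np

def localize_from_doa(p1, a1, p2, a2):
    """Intersect two direction-of-arrival rays (2-D) from observation points
    p1 and p2 with azimuths a1 and a2 (radians) to estimate a source position."""
    d1 = np.array([np.cos(a1), np.sin(a1)])
    d2 = np.array([np.cos(a2), np.sin(a2)])
    # Solve p1 + t1*d1 = p2 + t2*d2 for the ray parameters t1, t2.
    A = np.column_stack([d1, -d2])
    t = np.linalg.solve(A, np.array(p2) - np.array(p1))
    return np.array(p1) + t[0] * d1

# Two arrays at known positions each report a DOA toward the same source at (2, 3).
est = localize_from_doa((0.0, 0.0), np.arctan2(3.0, 2.0),
                        (4.0, 0.0), np.arctan2(3.0, -2.0))
```

In reverberation the individual DOA estimates are noisy, so a practical system would average many such intersections over time and frequency, discarding frames with high diffuseness.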
P5-6 Distance Perception in Loudspeaker-Based Room Auralization—Sylvain Favrot, Jörg M. Buchholz, Technical University of Denmark - Kgs. Lyngby, Denmark
A loudspeaker-based room auralization (LoRA) system has recently been proposed that efficiently combines modern room acoustic modeling techniques with high-order Ambisonics (HOA) auralization to generate virtual auditory environments (VAEs). The reproduction of the distance of sound events in such VAEs is very important for their fidelity. A direct-scaling distance perception experiment was conducted to evaluate the LoRA system, including the use of near-field control (NFC) for HOA. Experimental results showed that (i) loudspeaker-based auralization in the LoRA system provides distance perception similar to that in the corresponding real environment and that (ii) NFC-HOA provides a significant increase in the range of perceived distances for near sound sources as compared to standard HOA.
Convention Paper 7854 (Purchase now)
P5-7 Dual Radius Spherical Cardioid Microphone Arrays for Binaural Auralization—Frank Melchior, IOSONO GmbH - Erfurt, Germany, TU Delft, Delft, The Netherlands; Oliver Thiergart, Giovanni Del Galdo, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Diemer de Vries, TU Delft - Delft, The Netherlands; Sandra Brix, Fraunhofer IDMT - Ilmenau, Germany
The direction-dependent analysis of impulse response measurements using spherical microphone arrays can deliver a universal basis for binaural auralization. A new method using dual-radius open-sphere arrays is proposed to overcome limitations in practical realizations of such arrays. Different methods of combining the two radii have been analyzed and are presented. A plane wave decomposition in conjunction with a high-resolution HRTF database is used to generate a binaural auralization, wherein the different designs are simulated under ideal and real conditions. The results have been evaluated in a quality-grading experiment. It is shown that the dual-radius cardioid design is an effective method to enhance the perceived quality in comparison to conventional spherical array designs.
Convention Paper 7855 (Purchase now)
P5-8 Recording Multichannel Sound within Virtual Acoustics—Wieslaw Woszczyk, Tom Beghin, Martha de Francisco, Doyuen Ko, McGill University - Montreal, Quebec, Canada
Virtual acoustic environments were implemented in a laboratory based on real-time convolution of multiple high-resolution impulse responses previously measured in real rooms. The quality of these environments has been tested during live music performance and recording in a highly demanding Virtual Haydn Project. The technological and conceptual novelty of this project is to allow one to separate the recording of the direct sound and of the ambient sound into two independent processes, each individually optimized and adjusted. The method offers recording and rehearsal rooms that are guaranteed to be quiet and free from traffic noise and other interference. At present, the technical complexity and system cost are still very high but we can expect that these will be reduced in time.
Convention Paper 7856 (Purchase now)
P5-9 The Application of Compressive Sampling to the Analysis and Synthesis of Spatial Sound Fields—Nicolas Epain, Craig Jin, André Van Schaik, University of Sydney - Sydney, NSW, Australia
Compressive sampling provides a new and interesting tool to optimize measurements of physical phenomena with a small number of sensors. The essential idea is that close to perfect reconstruction of the observed phenomenon may be possible when it can be described by a sparse set of basis functions. In this paper we show how to apply compressive sampling techniques to the recording, analysis, and synthesis of spatially extended sound fields. Numerical simulations demonstrate that our proposed method can dramatically improve the playback of spatialized sound fields as compared, for example, with High Order Ambisonics.
Convention Paper 7857 (Purchase now)
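The "close to perfect reconstruction from few sensors when the field is sparse in some basis" idea can be illustrated with a standard greedy recovery algorithm. The sketch below uses orthogonal matching pursuit on a toy random sensing matrix; it is a generic compressive-sampling demonstration under assumed dimensions, not the authors' sound field method:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily recover a k-sparse x from y = A @ x."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))  # atom most correlated with residual
        support.append(j)
        # Re-fit the coefficients on the current support by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(1)
A = rng.standard_normal((32, 64)) / np.sqrt(32)  # 32 "sensor" measurements, 64 basis functions
x_true = np.zeros(64)
x_true[[3, 17, 40]] = [1.0, -0.5, 0.8]           # a 3-sparse field description
x_hat = omp(A, A @ x_true, 3)
```

With only 32 measurements of a 64-coefficient expansion, the 3-sparse vector is typically recovered exactly, which is the effect the abstract exploits for spatially extended sound fields.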
P6 - Production and Analysis of Musical Sounds
Friday, October 9, 3:30 pm — 5:00 pm
P6-1 Automated Cloning of Recorded Sounds by Software Synthesizers—Sebastian Heise, Michael Hlatky, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany; Jörn Loviscach, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany
Any audio recording can be turned into a digital musical instrument by feeding it into an audio sampler. However, it is difficult to edit such a sound in musical terms or even to control it in real time with musical expression. Even the application of a more sophisticated synthesis method will show little change. Many composers of electronic music appreciate the direct and clear access to sound parameters that a traditional analog synthesizer offers. Is it possible to automatically generate a synthesizer setting that approximates a given audio recording and thus clone a given sound to be controlled with the standard functions of the particular synthesizer employed? Even though this problem seems highly complex, we demonstrate that its solution becomes feasible with computer systems available today. We compare sounds on the basis of acoustic features known from Music Information Retrieval and apply a specialized optimization strategy to adjust the settings of VST instruments. This process is sped up using multi-core processors and networked computers.
Convention Paper 7858 (Purchase now)
P6-2 Low-Latency Conversion of Audible Guitar Tones into Visible Light Colors—Nermin Osmanovic, Microsoft Corporation - Redmond, WA, USA
An automated sound-to-color transformation system makes it possible to display a color map matching the note played on the guitar at the same instant. One application is to provide an intelligent color-effect "light show" for live instruments on stage during a performance. By using time and frequency information from the input signal, a computer can analyze sound events and determine which tone is currently being played. Knowing which guitar tone is present at the audio input provides the basis for the implementation of a digital sound-to-light converter. The converter streams live audio input, analyzes frames based on the signal's power threshold, determines the fundamental frequency of the current tone, maps this information to a color, and displays the targeted light color in real time. The final implementation includes a full-screen presentation mode with real-time display of both the pitch and intensity of the sound.
Convention Paper 7859 (Purchase now)
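A toy version of the frame-analyze-map chain described above can be sketched as follows. The autocorrelation pitch detector and the pitch-class-to-hue mapping are illustrative assumptions, not the author's implementation:

```python
import numpy as np
import colorsys

def detect_f0(frame, fs, fmin=70.0, fmax=1000.0):
    """Estimate the fundamental frequency of a frame by autocorrelation:
    the lag of the strongest peak inside the allowed pitch range."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

def tone_to_rgb(f0):
    """Map the pitch class of f0 to a hue on the color wheel
    (a hypothetical mapping: A = red, and each semitone advances 1/12 turn)."""
    semitones = 12 * np.log2(f0 / 440.0)
    pitch_class = int(round(semitones)) % 12
    return colorsys.hsv_to_rgb(pitch_class / 12.0, 1.0, 1.0)

fs = 48000
t = np.arange(2048) / fs
f0 = detect_f0(np.sin(2 * np.pi * 220.0 * t), fs)  # a played "A" string tone
rgb = tone_to_rgb(f0)
```

A real-time converter would only run `detect_f0` on frames whose power exceeds the threshold mentioned in the abstract, holding the previous color during silence.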
P6-3 TheremUS: The Ultrasonic Theremin—André Gomes, Globaltronic-Electrónica e Telecomunicaçoes - Águeda, Portugal; Daniel Albuquerque, Guilherme Campos, José Vieira, University of Aveiro - Aveiro, Portugal
In the Theremin, the performer’s hand movements, detected by two antennas, control the pitch and volume of the generated sound. The TheremUS builds on this concept by using ultrasonic sensing for hand position detection and processing all signals digitally, a distinct advantage in terms of versatility. Not only can different sound synthesis algorithms be programmed directly on the instrument but also it can be easily connected to other digital sound synthesis or multimedia devices; a MIDI interface was included for this purpose. The TheremUS also features translucent panels lit by controllable RGB LED devices. This makes it possible to specify sound-color mappings in the spirit of the legendary Ocular Harpsichord by Castel.
Convention Paper 7860 (Purchase now)
P6-4 Structural Segmentation of Irish Traditional Music Using Chroma at Set Accented Tone Locations—Cillian Kelly, Mikel Gainza, David Dorran, Eugene Coyle, Dublin Institute of Technology - Dublin, Ireland
An approach is presented that provides a structural segmentation of Irish Traditional Music. Chroma information is extracted at certain locations within the music, and the resulting chroma vectors are compared to determine similar structural segments. Chroma is only calculated at "set accented tone" locations within the music. Set accented tones are considered to be impervious to melodic variation and to be entirely representative of an Irish Traditional tune. Results show that comparing set accented tones represented by chroma significantly increases the structural segmentation accuracy compared with representing set accented tones by pitch values.
Convention Paper 7861 (Purchase now)
P6-5 Rendering Audio Using Expressive MIDI—Stephen J. Welburn, Mark D. Plumbley, Queen Mary, University of London - London, UK
MIDI renderings of audio are traditionally regarded as lifeless and unnatural—lacking in expression. However, MIDI is simply a protocol for controlling a synthesizer. Lack of expression is caused by either an expressionless synthesizer or by the difficulty in setting the MIDI parameters to provide expressive output. We have developed a system to construct an expressive MIDI representation of an audio signal, i.e., an audio representation that uses tailored pitch variations in addition to the note base pitch parameters that audio-to-MIDI systems usually attempt. A pitch envelope is estimated from the original audio, and a genetic algorithm is then used to estimate pitch modulator parameters from that envelope. These pitch modulations are encoded in a MIDI file and rerendered using a sampler. We present some initial comparisons between the final output audio and the estimated pitch envelopes.
Convention Paper 7862 (Purchase now)
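The "genetic algorithm estimates pitch modulator parameters from the envelope" step can be sketched with a toy model. The sinusoidal vibrato model, the GA operators, and all bounds below are illustrative assumptions, not the authors' parameterization:

```python
import numpy as np

def vibrato(params, t):
    """Pitch envelope of a note with sinusoidal vibrato:
    base pitch, modulation depth (semitones), modulation rate (Hz)."""
    base, depth, rate = params
    return base + depth * np.sin(2 * np.pi * rate * t)

def fit_vibrato(target, t, generations=60, pop_size=40, seed=0):
    """Toy genetic algorithm: evolve (base, depth, rate) to match a pitch envelope."""
    rng = np.random.default_rng(seed)
    lo = np.array([40.0, 0.0, 1.0])   # assumed search bounds (MIDI pitch, semitones, Hz)
    hi = np.array([90.0, 3.0, 10.0])
    pop = rng.uniform(lo, hi, size=(pop_size, 3))
    for _ in range(generations):
        err = np.array([np.mean((vibrato(p, t) - target) ** 2) for p in pop])
        elite = pop[np.argsort(err)[: pop_size // 4]]        # selection
        parents = elite[rng.integers(0, len(elite), (pop_size, 2))]
        pop = parents.mean(axis=1)                           # blend crossover
        pop += rng.normal(0.0, 0.05, pop.shape) * (hi - lo)  # mutation
        pop = np.clip(pop, lo, hi)
        pop[0] = elite[0]                                    # elitism
    err = np.array([np.mean((vibrato(p, t) - target) ** 2) for p in pop])
    return pop[int(np.argmin(err))]

t = np.linspace(0.0, 1.0, 200)
target = vibrato((60.0, 1.0, 5.0), t)  # envelope "estimated from the audio"
best = fit_vibrato(target, t)
```

The fitted parameters would then be written out as MIDI pitch-bend or modulation data, as the abstract describes.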
P6-6 Editing MIDI Data Based on the Acoustic Result—Sebastian Heise, Michael Hlatky, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany; Jörn Loviscach, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany
MIDI commands provide an abstract representation of audio in terms of note-on and note-off times, velocity, and controller data. The relationship of these commands to the actual audio signal is dependent on the actual synthesizer patch in use. Thus, it is hard to implement effects such as compression of the dynamic range or time correction based on MIDI commands alone. To improve on this, we have developed software that silently computes a software synthesizer’s audio output on each parameter update to support editing of the MIDI data based on the resulting audio data. Precise alignment of sounds to the beat, sample-correct changes in articulation, and musically meaningful dynamic compression through velocity data become possible.
Convention Paper 7863 (Purchase now)
P6-7 Sound Production and Audio Programming of the Sound Installation GROMA—Judith Nordbrock, Martin Rumori, Academy of Media Arts Cologne - Cologne, Germany
In this paper we describe the sound production, in particular the mixing scenarios and the audio programming, of GROMA. GROMA is a permanent urban sound installation that incorporates early historic texts on urbanization combined with environmental sounds from two of Cologne's partner cities, Rotterdam and Liège, in an algorithmic multichannel composition. The installation was inaugurated in 2008 in Europe's largest underground parking lot in Cologne, situated in the Rheinauhafen area. For producing the sound material, special methods had to be developed that allow for fine-grained aesthetic design of the sound in the unconventional venue and that also support the aleatoric combination of the sound situations.
Convention Paper 7864 (Purchase now)
P7 - Audio in Multimodal Applications
Saturday, October 10, 9:00 am — 1:00 pm
Chair: Rob Maher, Montana State University - Bozeman, MT, USA
P7-1 Listening within You and without You: Center-Surround Listening in Multimodal Displays—Thomas Santoro, Naval Submarine Medical Research Laboratory (NSMRL) - Groton, CT, USA; Agnieszka Roginska, New York University - New York, NY, USA; Gregory Wakefield, University of Michigan - Ann Arbor, MI, USA
Listener cognitive performance improvements from implementations of enhanced spatialized auditory displays are considered. Investigations of cognitive decision making driven by stimuli organized in “center-surround” auditory-only and visual-only arrangements are described. These new prototype interfaces employing a center-surround organization (“listening within – listening without”) exploit the capability of the auditory and visual modalities for concurrent operation and facilitate their functioning to support cognitive performance in synthetic, immersive environments.
Convention Paper 7865 (Purchase now)
P7-2 A Loudspeaker Design to Enhance the Sound Image Localization on Large Flat Displays—Gabriel Pablo Nava, Keiji Hirata, NTT Communication Science Laboratories - Seika-cho, Kyoto, Japan; Masato Miyoshi, Kanazawa University - Ishikawa-ken, Japan
A fundamental problem in auditory displays implemented with conventional stereo loudspeakers is that correct localization of sound images can be perceived only at the sweet spot and along the symmetrical axis of the stereo array. Although several signal processing-based techniques have been proposed to expand the listening area, less research on the loudspeaker configuration has been reported. This paper describes a new loudspeaker design that enhances the localization of sound images on the surface of flat display panels over a wide listening area. Numerical simulations of the acoustic field radiated, and subjective tests performed using a prototype panel show that the simple principle of the design effectively modifies the radiation pattern so as to widen the listening area.
Convention Paper 7866 (Purchase now)
P7-3 A Method for Multimodal Auralization of Audio-Tactile Stimuli from Acoustic and Structural Measurements—Clemeth Abercrombie, Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA
A new method for the reproduction of sound and vibration for arbitrary musical source material based on physical measurements is presented. Tactile signals are created by the convolution of “uncoupled” vibration with impulse responses derived from mechanical impedance measurements. Audio signals are created by the convolution of anechoic sound with binaural room impulse responses. Playback is accomplished through headphones and a calibrated motion platform. Benefits of the method include the ability to make multimodal, side-by-side listening tests for audio-tactile stimuli perceived in real music performance situations. Details of the method are discussed along with obstacles and applications. Structural response measurements are presented as validation of the need for measured vibration signals in audio-tactile displays.
Convention Paper 7867 (Purchase now)
P7-4 "Worms" in (E)motion: Visualizing Emotions Evoked by Music—Frederik Nagel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Reinhard Kopiez, Hanover University of Music and Drama - Hanover, Germany; Oliver Grewe, Studienstiftung des deutschen Volkes e. V. - Bonn, Germany; Eckart Altenmüller, Hanover University of Music and Drama - Hanover, Germany
Music plays an important role in everyday human life. One important reason for this is the capacity of music to influence listeners’ emotions. This study describes the application of a recently developed interface for the visualization of emotions felt while listening to music. Subjects (n = 38) listened to 7 musical pieces of different styles. They were asked to report their own emotions, felt in real-time, in a two-dimensional emotion space using computer software. Films were created from the time series of all self-reports as a synopsis. This technique of data visualization allows an appealing method of data analysis while providing the opportunity to investigate commonalities of emotional self-reports as well as differences between subjects. In addition to presenting the films, the authors of this study also discuss its possible applications in areas such as social sciences, musicology, and the music industry.
Convention Paper 7868 (Purchase now)
P7-5 Enhanced Automatic Noise Removal Platform for Broadcast, Forensic, and Mobile Applications—Shamail Saeed, Harinarayanan E.V., ATC Labs - Noida, India; Deepen Sinha, ATC Labs - Chatham, NJ, USA; Balaji V., ATC Labs - Noida, India
We present new enhancements and additions to our previously proposed Adaptive/Automatic Wide Band Noise Removal (AWNR) algorithm. AWNR uses a novel framework employing dominant component subtraction followed by adaptive Kalman filtering and subsequent restoration of the dominant components. The model parameters for Kalman filtering are estimated using a multi-component Signal Activity Detector (SAD) algorithm. Here we present two enhancements to the core filtering algorithm: a multi-band filtering framework and a colored noise model. In the first case, it is shown how the openness of the filtered signal improves through the use of a two-band structure with independent filtering. The colored noise model, on the other hand, improves the level of filtering for a wider range of noise types. We also describe two structural enhancements to the AWNR algorithm that allow it to better handle dual-microphone recording scenarios and forensic/restoration applications, respectively. Using an independent capture from a noise microphone, the level of filtering is substantially increased. Furthermore, for forensic applications, a two- or multiple-pass filtering framework in which SAD profiles may be fine-tuned through manual intervention is desirable.
Convention Paper 7869 (Purchase now)
P7-6 Catch Your Breath— Musical Biofeedback for Breathing Regulation—Diana Siwiak, Jonathan Berger, Yao Yang, Stanford University - Stanford, CA, USA
Catch Your Breath is an interactive musical biofeedback system adapted from a project designed to reduce respiratory irregularity in patients undergoing 4D-CT scans for oncological diagnosis. The medical application is currently implemented and undergoing assessment as a means to reduce motion-induced distortion in CT images. The same framework was implemented as an interactive art installation. The principle is simple: the subject's breathing motion is tracked via a video camera using fiducial markers and interpreted as a real-time variable tempo adjustment to a MIDI file. The subject adjusts breathing to synchronize with a separate accompaniment line. When the subject's breathing is regular and at the desired tempo, the audible result sounds synchronous and harmonious. The accompaniment's tempo gradually decreases, which causes breathing to synchronize and slow down, thus increasing relaxation.
Convention Paper 7870 (Purchase now)
P7-7 Wavefield Synthesis for Interactive Sound Installations—Grace Leslie, IRCAM - Paris, France, University of California, San Diego, La Jolla, CA, USA; Diemo Schwarz, Olivier Warusfel, Frédéric Bevilacqua, Pierre Jodlowski, IRCAM - Paris, France
Wavefield synthesis (WFS), the spatialization of audio through the recreation of a virtual source’s wavefront, is uniquely suited to interactive applications where listeners move throughout the rendering space and more than one listener is involved. This paper describes the features of WFS that make it useful for interactive applications, and takes a recent project at IRCAM as a case study that demonstrates these advantages. The interactive installation GrainStick was developed as a collaboration between the composer Pierre Jodlowski and the European project Sound And Music For Everyone Everyday Everywhere Everyway (SAME) at IRCAM, Paris. The interaction design of GrainStick presents a new development in multimodal interfaces and multichannel sound by allowing users control of their auditory scene through gesture analysis performed on infrared camera motion tracking and accelerometer data.
Convention Paper 7871 (Purchase now)
P7-8 Eidola: An Interactive Augmented Reality Audio-Game Prototype—Nikolaos Moustakas, Andreas Floros, Nikolaos-Grigorios Kanellopoulos, Ionian University - Corfu, Greece
Augmented reality audio represents a new trend in digital technology that enriches the real acoustic environment with synthesized sound produced by virtual sound objects. An audio-game, on the other hand, is based only on audible feedback rather than visual feedback. In this paper an audio-game prototype is presented that takes advantage of the characteristics of augmented reality audio to realize more complex audio-game scenarios. The prototype was realized as an audiovisual interactive installation, allowing the further involvement of the physical game space as the secondary component for constructing the game's ambient environment. A sequence of tests has shown that the proposed prototype can efficiently support complex game scenarios provided that the necessary advanced interaction paths are available.
Convention Paper 7872 (Purchase now)
P8 - Data Compression
Saturday, October 10, 9:00 am — 12:30 pm
Chair: Christof Faller
P8-1 Wireless Transmission of Audio Using Adaptive Lossless Coding—David Trainor, APTX (APT Licensing Ltd.) - Belfast, Northern Ireland, UK
In audio devices, such as smartphones, media players, and wireless headsets, designers face the conflicting requirements of elevating coded audio quality and reducing algorithm complexity, device power dissipation, and transmission bit-rates. As a result, there are significant challenges in providing highest-quality real-time audio streaming between devices over wireless networks. Mathematically-lossless audio coding algorithms are an attractive means of maximizing coded audio quality. However, in the context of wireless audio transmission between portable devices, characteristics of such algorithms such as modest levels of bandwidth reduction, encoding complexity, and robustness to data loss need to be carefully controlled. Such control can be elegantly engineered by incorporating real-time adaptation and scaling into the audio coding algorithm itself. This paper describes a lossless audio coding algorithm called apt-X Lossless, which has been designed with scalability and automated adaptation as its principal characteristics.
Convention Paper 7873 (Purchase now)
P8-2 Quantization with Constrained Relative Entropy and Its Application to Audio Coding—Minyue Li, W. Bastiaan Kleijn, KTH - Royal Institute of Technology - Stockholm, Sweden
Conventional quantization distorts the probability density of the source. In scenarios such as low-bit-rate audio coding, this leads to perceived distortion that is not well characterized by commonly used distortion criteria. We propose the relative entropy between the probability densities of the original and reconstructed signals as an additional fidelity measure. Quantization with a constraint on relative entropy ensures that the probability density of the signal is preserved to a controllable extent. When it is included in an audio coder, the new quantization facilitates a continuous transition between the underlying concepts of the vocoder, bandwidth extension, and a rate-distortion optimized coder. Experiments confirm the effectiveness of the new quantization scheme.
Convention Paper 7874 (Purchase now)
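The motivating observation, that coarse quantization distorts the signal's probability density and that relative entropy quantifies this, can be illustrated numerically. The histogram-based Kullback-Leibler estimate below is an illustrative construction, not the paper's constrained quantizer:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) between two discrete distributions (histogram counts)."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)          # a Gaussian "source" signal
bins = np.linspace(-4.0, 4.0, 41)

p_orig, _ = np.histogram(x, bins)
coarse = np.round(x)                      # coarse uniform quantizer, step 1.0
p_coarse, _ = np.histogram(coarse, bins)
fine = np.round(x * 8) / 8                # finer uniform quantizer, step 0.125
p_fine, _ = np.histogram(fine, bins)

d_coarse = kl_divergence(p_orig, p_coarse)
d_fine = kl_divergence(p_orig, p_fine)
```

The coarse quantizer concentrates all mass on a few reconstruction levels, so its density diverges strongly from the source's; constraining this divergence is exactly what the proposed fidelity measure controls.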
P8-3 Enhanced Stereo Coding with Phase Parameters for MPEG Unified Speech and Audio Coding—JungHoe Kim, Eunmi Oh, Samsung Electronics Co. Ltd. - Gyeonggi-do, Korea; Julien Robilliard, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The proposed technology is concerned with a bit-efficient way to deliver phase information. It encodes only the interchannel phase difference (IPD) parameter and estimates the overall phase difference (OPD) parameter at the decoder from the transmitted interchannel phase difference and channel level difference. The proposed technology reduces the bit rate for phase parameters compared to the case in which both IPD and OPD parameters are transmitted, as specified in MPEG Parametric Stereo. The entropy coding scheme for phase parameters is improved by exploiting the wrapping property of the phase parameters. We introduce phase smoothing at the decoder and adaptive control of the quantization precision for phase parameters to minimize annoying artifacts due to abrupt changes of quantized phase parameters. The proposed phase coding can improve stereo sound quality significantly, and it was accepted as part of the MPEG-D USAC (Unified Speech and Audio Coding) standard.
Convention Paper 7875 (Purchase now)
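The IPD parameter itself can be estimated from the phase of the interchannel cross-spectrum. The sketch below works per FFT bin rather than per parameter band, which is a simplification of how a parametric stereo encoder would group bins:

```python
import numpy as np

def ipd_per_bin(left, right, n_fft=1024):
    """Interchannel phase difference per frequency bin, taken from the
    phase of the cross-spectrum of one windowed stereo frame."""
    win = np.hanning(n_fft)
    L = np.fft.rfft(left[:n_fft] * win)
    R = np.fft.rfft(right[:n_fft] * win)
    return np.angle(L * np.conj(R))  # phase(L) - phase(R), wrapped to (-pi, pi]

fs = 48000
t = np.arange(1024) / fs
f = 750.0                                       # exactly bin 16 at n_fft = 1024
left = np.sin(2 * np.pi * f * t)
right = np.sin(2 * np.pi * f * t - np.pi / 4)   # right channel lags by 45 degrees
ipd = ipd_per_bin(left, right)
```

The cross-spectrum cancels the phase common to both channels, so the result is the pure interchannel difference; the wrapping to (-pi, pi] is the property the abstract's entropy coder exploits.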
P8-4 An Enhanced SBR Tool for Low-Delay Applications—Michael Werner, Ilmenau University of Technology - Ilmenau, Germany; Gerald Schuller, Ilmenau Technical University - Ilmenau, Germany, Fraunhofer IDMT, Ilmenau, Germany
An established technique to reduce the data rate of an audio coder is Spectral Band Replication (SBR). The standard SBR tool is designed for applications where encoding/decoding delay is of no concern. Our goal is to obtain an SBR system with little algorithmic delay for use in real-time applications, such as wireless microphones or video conferencing. We previously developed a low-delay SBR tool (LD-SBR), but it produces a relatively large amount of side information. This paper presents an enhanced SBR tool for low-delay applications that uses techniques from LD-SBR in combination with Codebook Mapping (CBM). This leads to an enhanced low-delay SBR tool with a reduced amount of side information and no degradation of audio quality.
Convention Paper 7876 (Purchase now)
P8-5 Audio Codec Employing Frequency-Derived Tonality Measure—Maciej Kulesza, Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
A transform codec employing an efficient algorithm for the detection of spectral tonal components is presented. The tonality measure used in the MPEG psychoacoustic model is replaced with a method providing adequate tonality estimates even if the tonal components are deeply frequency modulated. The reliability of the hearing threshold estimated using a psychoacoustic model with the standardized tonality measure and with the proposed one is investigated using objective quality testing methods. The proposed tonality estimator is also used as the basis for a detector of noise-like signal bands. Instead of quantizing the noise-like signal components according to the usual transform coding scheme, the signal bands containing only noise-like components are filled with locally generated noise in the decoder. Results of listening tests revealing the usefulness of the employed tonality estimation method for such a coding scenario are presented.
Convention Paper 7877 (Purchase now)
P8-6 New Approaches to Statistical Multiplexing for Perceptual Audio Coders Used in Multi-Program Audio Broadcasting—Deepen Sinha, ATC Labs - Chatham, NJ, USA; Harinarayanan E.V., Ranjeeta Sharma, ATC Labs - Noida, India
In the case of multi-program audio broadcasting or transmission, joint encoding is an attractive proposition. It has previously been reported that joint encoding benefits conventional perceptual audio coders in this scenario. However, previous attempts at such statistical multiplexing have focused primarily on joint bit allocation. Here we show that such an approach is not sufficient to realize the promise of statistical multiplexing. Rather, a successful strategy has two essential ingredients: a highly accurate psychoacoustic model and a coordination mechanism that goes beyond joint allocation. We describe methods to achieve these objectives and present objective and subjective coding results.
Convention Paper 7878 (Purchase now)
P8-7 Subjective Evaluation of mp3 Compression for Different Musical Genres—Amandine Pras, Rachel Zimmerman, Daniel Levitin, Catherine Guastavino, McGill University - Montreal, Quebec, Canada
Mp3 compression is commonly used to reduce the size of digital music files but introduces a number of artifacts, especially at low bit rates. We investigated whether listeners prefer CD quality to mp3 files at various bit rates (96 kb/s to 320 kb/s), and whether this preference is affected by musical genre. Thirteen trained listeners completed an AB comparison task judging CD-quality and compressed files. Listeners significantly preferred CD quality to mp3 files up to 192 kb/s for all musical genres. In addition, we observed a significant effect of expertise (sound engineers vs. musicians) and of musical genre (electric vs. acoustic music).
Convention Paper 7879 (Purchase now)
P9 - Spatial Audio
Saturday, October 10, 10:00 am — 11:30 am
P9-1 Surround Sound Track Productions Based on a More Channel Headphone—Florian M. Koenig, Ultrasone AG & Florian König Enterprises GmbH - Germering, Germany
Stereo headphones have become very popular in the last few years, one reason being the development of mp3 and its huge market acceptance. Meanwhile, portable surround sound devices could become the successor of "stereo" applications in consumer electronics, multimedia, and games. Several problems stand in the way of 3-D sound: listeners need individually reproduced binaural signals (HRTFs depend on outer-ear shape) in all types of headphone use. Additionally, sound coloration cognition in infants differs from that of adults, which we investigate. Commercial headphones need an adaptive, realistic 3-D image over all headphone sound sources (TV, CD, mp3 / mobile phone), with a minimum of elevation effect and a virtual distance perception. We realized surround sound mixes with a 4.0 headphone that are compatible with 5.1 loudspeaker material; these can be demonstrated as well.
Convention Paper 7880 (Purchase now)
P9-2 Reconstruction and Evaluation of Dichotic Room Reverberation for 3-D Sound Generation—Keita Tanno, Akira Saji, Huakang Li, Tatsuya Katsumata, Jie Huang, The University of Aizu - Aizu-Wakamatsu, Fukushima, Japan
Artificial reverberation is often used to increase reality and prevent in-the-head localization in headphone-based 3-D sound systems. In traditional methods, diotic reverberation was used. In this research, we measured the impulse responses of several rooms with a Four Point Microphone method and calculated the sound intensity vectors with the Sound Intensity method. From the sound intensity vectors, we obtained the image sound sources, and dichotic reverberation was reconstructed from the estimated image sound sources. Comparison experiments were conducted for three kinds of reverberation: diotic reverberation, dichotic reverberation, and dichotic reverberation combined with head-related transfer functions (HRTFs). The results clarify that the 3-D sounds reconstructed from dichotic reverberation with HRTFs have more spatial extension than those from the other methods.
Convention Paper 7881 (Purchase now)
P9-3 Reproduction 3-D Sound by Measuring and Construction of HRTF with Room Reverberation—Akira Saji, Keita Tanno, Jie Huang, The University of Aizu - Aizu-Wakamatsu, Fukushima, Japan
In this paper we propose a new method using HRTFs that contain room reverberation (R-HRTFs). The reverberation is not added to a dry source after HRTF filtering; it is captured in the HRTFs themselves during measurement. We measured HRTFs in a real reverberant environment for azimuths of 0, 45, 90, and 135 degrees (left side) and elevations from 0 to 90 degrees in 10-degree steps, then constructed a 3-D headphone sound system from the measured R-HRTFs and examined whether realism is improved. As a result, we succeeded in creating a 3-D spatial sound system with more realism than a traditional HRTF-based system.
Convention Paper 7882 (Purchase now)
P9-4 3-D Sound Synthesis of a Honeybee Swarm—Jussi Pekonen, Antti Jylhä, Helsinki University of Technology - Espoo, Finland
Honeybee swarms are characterized by their buzzing sound, which can be very impressive close to a hive. We present two techniques for real-time sound synthesis of swarming honeybees in a 3-D multichannel setting. Both are based on a source-filter model using a sawtooth oscillator with an all-pole equalization filter. The synthesis is controlled by the motion of the swarm, which is modeled in two different ways: as a set of coupled individual bees or with a swarming algorithm. The synthesized sound can be spatialized using the location information generated by the model. The proposed methods are capable of producing a realistic honeybee swarm effect for use in, e.g., virtual reality applications.
Convention Paper 7883 (Purchase now)
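The source-filter scheme the abstract describes (a sawtooth oscillator shaped by an all-pole filter) can be sketched as follows; the pitch, jitter amount, and resonator settings here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def bee_buzz(dur=1.0, fs=16000, f0=220.0, jitter=0.02, seed=0):
    """One synthetic bee voice: a sawtooth source whose pitch wanders
    slowly, shaped by a two-pole (all-pole) resonator."""
    rng = np.random.default_rng(seed)
    n = int(dur * fs)
    # slowly varying wing-beat frequency (random walk around f0)
    f = f0 * (1.0 + jitter * np.cumsum(rng.standard_normal(n)) / np.sqrt(n))
    phase = np.cumsum(f) / fs
    saw = 2.0 * (phase % 1.0) - 1.0            # naive sawtooth oscillator
    # all-pole resonator; the 500 Hz center and r = 0.95 are assumed values
    fc, r = 500.0, 0.95
    a1, a2 = -2.0 * r * np.cos(2.0 * np.pi * fc / fs), r * r
    y = np.zeros(n)
    for i in range(2, n):
        y[i] = saw[i] - a1 * y[i - 1] - a2 * y[i - 2]
    return y / np.max(np.abs(y))
```

Mixing many such voices, each spatialized by the swarm's motion model, would give the multichannel effect described above.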
P9-5 An Investigation of Early Reflection’s Effect on Front-Back Localization in Spatial Audio—Darrin Reed, Robert C. Maher, Montana State University - Bozeman, MT, USA
In a natural sonic environment a listener is accustomed to hearing reflections and reverberation, so it is plausible that early reflections could reduce front-back confusion in synthetic 3-D audio. This paper describes an experiment to determine whether simulated reflections can reduce front-back confusion for audio presented with nonindividualized HRTFs via headphones. Although the simple addition of a first-order reflection is not shown to eliminate all front-back confusions, lateral reflections from a side boundary can either assist or inhibit localization ability depending on the relationship of the source, observer, and reflective boundary.
Convention Paper 7884 (Purchase now)
P9-6 Some Further Investigations on Estimation of HRIRs from Impulse Responses Acquired in Ordinary Sound Fields—Shouichi Takane, Akita Prefectural University - Akita, Japan
The author’s group previously proposed a method for estimating Head-Related Transfer Functions (HRTFs), or their corresponding impulse responses, referred to as Head-Related Impulse Responses (HRIRs) [Takane et al., Proc. JCA (2007)]. In this paper the method is further investigated with respect to two parameters affecting estimation performance. The first is the order of the AR coefficients, indicating how many past samples are assumed to be related to the current sample. The Signal-to-Deviation Ratio (SDR) was found to improve when the AR order was about half the cutout length. The second is the number of samples used to compute the AR coefficients. The results show that the SDR improves greatly when this number corresponds to the duration of the response, indicating that the proposed method works properly in ideal situations.
Convention Paper 7885 (Purchase now)
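The AR-model idea (predicting each sample from a weighted sum of past samples, then extrapolating past the cutout point) can be illustrated with a generic least-squares fit; this is a sketch of AR extrapolation in general, not the authors' estimator.

```python
import numpy as np

def ar_extend(h, order, n_extra):
    """Fit AR coefficients to h by least squares, then linearly predict
    n_extra samples past its end (extending a truncated impulse response)."""
    N = len(h)
    # rows t = order..N-1; column k holds the lag-(k+1) sample h[t-k-1]
    X = np.column_stack([h[order - k - 1:N - k - 1] for k in range(order)])
    y = h[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    ext = list(h)
    for _ in range(n_extra):
        ext.append(np.dot(a, ext[-1:-order - 1:-1]))  # predict next sample
    return np.asarray(ext)
```

For a purely exponential decay the fit is exact, which is the idealized situation the abstract's last sentence refers to.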
P9-7 Virtual Ceiling Speaker: Elevating Auditory Imagery in a 5-Channel Reproduction—Sungyoung Kim, Masahiro Ikeda, Akio Takahashi, Yusuke Ono, Yamaha Corp. - Iwata, Shizuoka, Japan; William L. Martens, University of Sydney - Sydney, NSW, Australia
In this paper we propose a novel signal processing method called Virtual Ceiling Speaker (VCS) that creates virtually elevated auditory imagery via a 5-channel reproduction system. The proposed method is based on transaural crosstalk cancellation using three channels: center, left-surround, and right-surround. The VCS reproduces a binaurally elevated signal via two surround loudspeakers that inherently reduce transaural crosstalk, while the residual crosstalk component is suppressed by a center channel signal that is optimized for natural perception of elevated sound. Subjective evaluations show that the virtually elevated auditory imagery maintains similar perceptual characteristics when compared to sound produced from an elevated loudspeaker. Moreover, the elevated sound contributes to an enhanced sense of musical expressiveness and spatial presence in music reproduction.
Convention Paper 7886 (Purchase now)
P9-8 Spatial Soundfield Reproduction with Zones of Quiet—Thushara Abhayapala, Yan Jennifer Wu, Australian National University - Canberra, Australia
Reproduction of a spatial soundfield in an extended region of open space with a designated quiet zone is a challenging problem in audio signal processing. We show how to reproduce a given spatial soundfield without altering a nearby quiet zone. In this paper we design a spatial band stop filter over the zone of quiet to suppress the interference from the nearby desired soundfield. This is achieved by using the higher order spatial harmonics to cancel the undesirable effects of the lower order harmonics of the desired soundfield on the zone of quiet. We illustrate the theory and design by simulating a 2-D spatial soundfield.
Convention Paper 7887 (Purchase now)
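The harmonic machinery behind such a design can be stated compactly; the notation below is the standard 2-D interior expansion, assumed here rather than quoted from the paper:

```latex
S(r, \phi, k) = \sum_{n=-N}^{N} \beta_n(k)\, J_n(kr)\, e^{\mathrm{i} n \phi},
\qquad
J_n(kr) \approx \frac{(kr/2)^{|n|}}{|n|!} \quad \text{for small } kr .
```

Because $J_n(kr)$ vanishes rapidly with order near a point, only low-order coefficients (re-expanded about the quiet-zone center via the translation theorem) determine the field inside a small zone; driving compensating higher-order terms so that those residual low-order coefficients cancel is the spatial band-stop filter described above.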
P10 - Consumer Audio
Saturday, October 10, 2:30 pm — 5:00 pm
Chair: John Strawn, S Systems, Inc. - Larkspur, CA, USA
P10-1 The Wii Remote as a Musical Instrument: Technology and Case Studies—Paul D. Lehrman, Tufts University - Medford, MA, USA
The inexpensive and ubiquitous remote for the Nintendo Wii game system uses a combination of technologies that are highly suited for music generation and control. These include position tracking, tilt and motion measurement in three dimensions, a two-dimensional joystick (with its companion "Nunchuk"), and multiple buttons. A new accessory, the "MotionPlus," adds gyroscopic sensing and another, the "Balance Board" adds body-position sensing. Use of the system in several musical performance contexts is examined including conducting a synthetic orchestra and playing expressive single and multi-user instruments.
Convention Paper 7888 (Purchase now)
P10-2 Measurement Techniques for Evaluating Microphone Performance in Windy Environments—Simon Busbridge, University of Brighton, AudioGravity Ltd; David Herman, AudioGravity Ltd
The traditional solution for controlling microphone wind response (foam windshield) is of limited benefit in miniature applications. The use of ECM and MEMS type microphones is typically associated with DSP-type solutions to reduce the unwanted output from air mass flow. Such solutions vary widely in their effectiveness. The situation is compounded by the range of techniques in current use to evaluate microphone wind response. This paper discusses the essential elements necessary for consistent microphone wind measurements and proposes a standard measurement technique that will be of use to all developers and manufacturers concerned with controlling microphone wind noise. Practical implementation of the technique and results obtained for a range of microphones are presented.
Convention Paper 7889 (Purchase now)
P10-3 Psychoacoustical Bandwidth Extension of Lower Frequencies—Judith Liebetrau, Daniel Beer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Matthias Lubkowitz, TU Ilmenau - Ilmenau, Germany
Small, flat loudspeakers are increasingly demanded by the market for home and mobile entertainment. One major problem of such small devices is the reproduction of low frequencies due to physical limitations, and a lack of low frequencies degrades perceived audio quality. To obtain good audio quality even with small loudspeakers, psychoacoustic effects can be exploited. Basic principles of psychoacoustic bandwidth extension, the implemented algorithms, and parameter settings for a considerable extension of bandwidth are explained. Furthermore, a listening test method for evaluating perceived audio quality versus bass extension is described; based on it, increased bandwidth extension is assessed against sound coloration and conclusions are drawn.
Convention Paper 7890 (Purchase now)
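As a concrete illustration of the missing-fundamental principle such algorithms commonly exploit (the paper's specific algorithms are not reproduced here), a minimal sketch: extract the band below the speaker's cutoff, generate harmonics with a rectifier nonlinearity, and fold them back in above the cutoff, where the ear re-fuses them into a perceived fundamental. The cutoff and gain values are hypothetical.

```python
import numpy as np

def brickwall(x, fs, lo, hi):
    """Crude FFT band-pass keeping frequencies in [lo, hi] Hz."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(X, len(x))

def virtual_bass(x, fs, fc=100.0):
    """Replace content below fc with harmonics the speaker can reproduce."""
    bass = brickwall(x, fs, 0.0, fc)        # band the speaker cannot play
    harm = np.abs(bass)                     # rectifier: creates 2f, 4f, ...
    harm = brickwall(harm, fs, fc, 4 * fc)  # keep only reproducible harmonics
    return brickwall(x, fs, fc, fs / 2) + 2.0 * harm
```

A 50 Hz tone processed this way loses its fundamental but gains a strong 100 Hz component carrying the pitch cue.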
P10-4 Reducing the Complexity of Sub-band ADPCM Coding to Enable High-Quality Audio Streaming from Mobile Devices—Neil Smyth, David Trainor, APTX (APT Licensing Ltd.) - Belfast, Northern Ireland, UK
The number of consumer audio applications demanding high quality audio compression and communication across wireless networks continues to grow. Although the consumer is increasingly demanding higher audio quality, devices such as portable media players and wireless headsets also demand low computational complexity, low power dissipation, and practical transmission bit-rates to help conserve battery life. This paper discusses research undertaken to lower the complexity of existing high-quality sub-band ADPCM coding schemes to better satisfy these conflicting criteria.
Convention Paper 7891 (Purchase now)
P10-5 An Interactive Audio System for Mobiles—Yohan Lasorsa, Jacques Lemordant, INRIA - Rhône-Alpes, France
This paper presents an XML format for embedded interactive audio, derived from well-established formats like iXMF and SMIL. We introduce in this format a new paradigm for synchronizing audio elements and animations, using a flexible event-driven system in conjunction with graph description capabilities to replace audio scripting. We have implemented a sound manager for J2ME smartphones and the iPhone. Guidance applications for blind people based on this audio system are being developed.
Convention Paper 7892 (Purchase now)
P11 - Sound in Real Spaces
Saturday, October 10, 3:30 pm — 5:00 pm
P11-1 Acoustics of National Parks and Historic Sites: The 8,760 Hour MP3 File—Robert Maher, Montana State University - Bozeman, MT, USA
According to current U.S. National Park Service (NPS) management policies, the natural soundscape of parks and historic sites is a protected resource just like the ecosystems, landscapes, and historic artifacts for which the parks were formed. While several NPS sites have been studied extensively for noise intrusions by tour aircraft and mechanized recreation, most parks and historic sites do not yet have an acoustic baseline for management purposes. A recent initiative of the NPS Natural Sounds Office is to obtain continuous audio recordings of specific sites for one entire year. This paper explores the engineering and scientific issues associated with obtaining, archiving, and cataloging an 8,760 hour long audio recording for Grant-Kohrs Ranch National Historic Site.
Convention Paper 7893 (Purchase now)
P11-2 Improved Speech Dereverberation Method Using the Kurtosis-Maximization with the Voiced/Unvoiced/Silence Classification—Jae-woong Jeong, Se-Woon Jeon, Yonsei University - Seoul, Korea; Young-cheol Park, Yonsei University - Wonju, Korea; Seok-Pil Lee, Korea Electronics Technology Institute (KETI) - Sungnam, Korea; Dae-hee Youn, Yonsei University - Seoul, Korea
In this paper we present a new speech dereverberation method using kurtosis maximization based on voiced/unvoiced/silence (V/UV/S) classification. Since the kurtosis of UV/S sections is much smaller than that of V sections, adapting the dereverberation filter on those sections often results in slow, nonrobust convergence and, in turn, poor dereverberation. The proposed algorithm controls adaptation of the dereverberation filter using the results of the V/UV/S classification together with the kurtosis of the input speech. For selective control of adaptation, both a hard decision and a voice-likelihood measure based on various features, together with kurtosis, were tried, and the step size of the adaptive algorithm was varied according to several control strategies. Experiments confirm that the proposed algorithm provides better and more robust dereverberation performance than the conventional algorithm.
Convention Paper 7894 (Purchase now)
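The control idea, scaling the adaptive step size with the frame's kurtosis so that UV/S frames barely adapt, can be sketched as below; the thresholds and the linear mapping are hypothetical, not the paper's.

```python
import numpy as np

def kurtosis(x):
    """Sample kurtosis of one frame (Gaussian noise gives about 3)."""
    x = x - np.mean(x)
    return np.mean(x ** 4) / (np.mean(x ** 2) ** 2 + 1e-12)

def step_size(frame, mu_max=0.01, k_lo=3.0, k_hi=10.0):
    """Map kurtosis to an adaptation step size: near-Gaussian (UV/S-like)
    frames get ~0, heavy-tailed (voiced-like) frames get mu_max."""
    w = np.clip((kurtosis(frame) - k_lo) / (k_hi - k_lo), 0.0, 1.0)
    return mu_max * w
```

A voice-likelihood measure, as tried in the paper, would simply replace or augment the kurtosis term in the mapping.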
P11-3 A Survey of Broadcast Television Perceived Relative Audio Levels—Chris Hanna, Matthew Easley, THAT Corporation - Milford, MA, USA
Perceived television volume levels can vary dramatically as audio changes both within a given broadcast channel and between broadcast channels. This paper surveys the broadcast audio levels in two large metropolitan areas (Atlanta and Boston). Both analog and digital broadcasts are monitored from cable and satellite providers. Two-channel perceived loudness is measured utilizing the ITU-R Rec. BS.1770 loudness meter standard. Statistical data is presented showing the severity and nature of the perceived loudness changes. Finally, dynamic volume control technology is applied to the most severe recordings for perceived loudness comparisons.
Convention Paper 7896 (Purchase now)
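The BS.1770 measurement used in the survey reduces, after K pre-filtering of each channel, to a weighted power sum. A sketch with the recommendation's channel weights (the K-weighting filter itself is omitted here and assumed already applied):

```python
import numpy as np

# Channel weights per ITU-R BS.1770 (surrounds get +1.5 dB, i.e., x1.41)
G = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}

def bs1770_loudness(channels):
    """Loudness in LKFS of already K-weighted channel signals:
    L_K = -0.691 + 10 * log10( sum_i G_i * mean(z_i^2) )."""
    s = sum(G[name] * np.mean(x ** 2) for name, x in channels.items())
    return -0.691 + 10.0 * np.log10(s + 1e-12)
```

Duplicating a signal into a second full-weight channel raises the reading by exactly 10·log10(2) ≈ 3.01 dB, which is the kind of behavior the survey's meter exhibits.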
P11-4 Optimizing the Re-enforcement Effect of Early Reflections on Aspects of Live Musical Performance Using the Image Source Model—Michael Terrell, Joshua Reiss, Queen Mary University of London - London, UK
The image source method is used to identify early reflections which have a re-enforcement effect on the sound traveling within an enclosure. The distribution of absorptive material within the enclosure is optimized to produce the desired re-enforcement effect. This is applied to a monitor mix and a feedback prevention case study. In the former it is shown that the acoustic path gain of the vocals can be increased relative to the acoustic path gain of the other instruments. In the latter it is shown that the acoustic path from loudspeaker to microphone can be manipulated to increase the perceived signal level before the onset of acoustic feedback.
Convention Paper 7897 (Purchase now)
P11-5 The Influence of the Rendering Architecture on the Subjective Performance of Blind Source Separation Algorithms—Thorsten Kastner, University of Erlangen-Nuremberg - Erlangen, Germany, Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany
Blind Source Separation algorithms often include a time/frequency (t/f) decomposition / filterbank as an important part allowing for frequency selective separation of the input signal. In order to investigate the importance of the t/f processing architecture for the achieved subjective audio quality, a set of blindly separated audio signals were taken from the Stereo Audio Source Separation Campaign (SASSEC) 2007 and rated in a MUSHRA listening test. The test furthermore included material that was generated by using the separated signals to drive an enhanced time/frequency rendering architecture, as it is offered by MPEG Spatial Audio Object Coding (SAOC). In this way, the same basic separation core algorithm was applied together with different t/f rendering architectures. The listening test reveals an improved subjective quality for the SAOC post-processed items.
Convention Paper 7898 (Purchase now)
P11-6 Real-Time Implementation of Robust PEM-AFROW-Based Solutions for Acoustic Feedback Control—Simone Cifani, Rudy Rotili, Emanuele Principi, Stefano Squartini, Francesco Piazza, Università Politecnica delle Marche - Ancona, Italy
Acoustic feedback is a longstanding problem in audio processing, occurring whenever sound is captured and reproduced in the same environment. Different control strategies have been proposed over the years, among which feedback cancellation based on the prediction error method (PEM) has proven particularly effective. Recent studies have shown that integrating a suppression or noise reduction filter in the system loop can be beneficial from several perspectives. In this paper a real-time implementation of the aforementioned algorithm is presented, exploiting the partitioned-block frequency-domain (PBFD) technique to allow the system to work with long acoustic paths as well. The NU-Tech software platform was used for real-time simulations performed under synthetic and real acoustic conditions.
Convention Paper 7899 (Purchase now)
P11-7 Perception-Based Audio Signal Mixing in Automotive Environments—Wolfgang Hess, Harman/Becker Automotive Systems - Karlsbad-Ittersbach, Germany
Presenting information and announcements in noisy environments such as vehicles requires dynamic adjustment of signals for optimal information audibility and speech intelligibility. Not only varying ambient noise, but in particular its combination with the ability of today’s vehicle infotainment systems to reproduce a variety of entertainment sources, makes information presentation difficult. Widely differing input level ranges as well as a variety of compression ratios of audio signals have to be considered. A further challenge is the dynamic, loudness-dependent binaural intelligibility level difference of the human auditory system. The algorithm presented in this paper addresses these issues by dynamically mixing information and announcement signals into entertainment signals: entertainment signals are attenuated as little as possible, and information or announcement signals are raised in loudness as demanded. As a result, optimal announcement intelligibility and information audibility is achieved.
Convention Paper 7900 (Purchase now)
P11-8 Visualization and Analysis Tools for Low Frequency Propagation in a Generalized 3-D Acoustic Space—Adam J. Hill, Malcolm O. J. Hawksford, University of Essex - Colchester, Essex, UK
A toolbox is described that enables 3-D animated visualization and analysis of low-frequency wave propagation within a generalized acoustic environment. The core computation exploits a Finite-Difference Time-Domain (FDTD) algorithm selected because of its known low frequency accuracy. Multiple sources can be configured and analyses performed at user-selected measurement locations. Arbitrary excitation sequences enable virtual measurements embracing both time-domain and spatio-frequency domain analysis. Examples are presented for a variety of low-frequency loudspeaker placements and room geometries to illustrate the versatility of the toolbox as an acoustics design aid.
Convention Paper 7901 (Purchase now)
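The core FDTD update such a toolbox relies on is the leapfrog scheme for the wave equation. Here is a reduced 2-D sketch with periodic boundaries and an impulsive source (the toolbox itself is 3-D and models real boundary conditions, so this only illustrates the numerical scheme):

```python
import numpy as np

def fdtd_2d(nx=60, ny=60, steps=80, courant=0.5):
    """Leapfrog FDTD for the 2-D wave equation on a periodic grid:
    p_next = 2p - p_prev + (c*dt/dx)^2 * laplacian(p)."""
    p_prev = np.zeros((nx, ny))
    p = np.zeros((nx, ny))
    p[nx // 2, ny // 2] = 1.0                  # impulsive excitation
    c2 = courant ** 2                          # stable while courant <= 1/sqrt(2)
    for _ in range(steps):
        lap = (np.roll(p, 1, 0) + np.roll(p, -1, 0)
               + np.roll(p, 1, 1) + np.roll(p, -1, 1) - 4.0 * p)
        p_next = 2.0 * p - p_prev + c2 * lap
        p_prev, p = p, p_next
    return p
```

Virtual measurements as in the paper amount to recording p at chosen grid cells over time and transforming to the frequency domain.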
P12 - Transducers Manufacturing and Equipment
Sunday, October 11, 9:00 am — 1:00 pm
Chair: Alexander Voishvillo
P12-1 Time Varying Behavior of the Loudspeaker Suspension: Displacement Level Dependency—Finn Agerkvist, Technical University of Denmark - Lyngby, Denmark; Bo Rhode Petersen, Esbjerg Institute of Technology, Aalborg University - Aalborg, Denmark
The compliance of the loudspeaker suspension is known to depend on the recent excitation level history. Previous investigations have shown that electrical power as well as displacement and velocity play a role. In this paper the hypothesis that changes in compliance are caused mainly by how much the suspension has been stretched, i.e., the maximum displacement, is investigated. For this purpose the changes in compliance are measured while exposing the loudspeaker to different levels and types of electrical excitation signals, as well as to purely mechanical excitation. For sinusoidal excitation the change in compliance is shown to depend primarily on maximum displacement, but for square-pulse excitation the duration of the excitation also plays an important role.
Convention Paper 7902 (Purchase now)
P12-2 Fast Measurement of Motor and Suspension Nonlinearities in Loudspeaker Manufacturing—Wolfgang Klippel, University of Technology Dresden - Dresden, Germany; Joachim Schlechter, KLIPPEL GmbH - Dresden, Germany
Nonlinear distortions are measured at the end of the assembly line to check the loudspeaker system and make a pass/fail decision. However, the responses of single components and total harmonic distortion have low diagnostic value because they are difficult to interpret and do not reveal the particular cause of a defect. A new measurement technique is presented that measures the nonlinearities of the motor and suspension system directly. The results are single-valued parameters (e.g., voice coil offset in mm) that are directly related to the geometry and large-signal parameters of the loudspeaker system. The measurement is based solely on the electrical signals at the speaker’s terminals, giving full robustness against ambient noise. The accuracy of the results is investigated for measurements using short stimuli between 0.2 and 1.3 seconds. The paper discusses new possibilities for on-line diagnostics during end-of-line testing and for integration into production control to increase production yield.
Convention Paper 7903 (Purchase now)
P12-3 A Novel Technique for Detecting and Locating Loudspeaker Defects—Yi Yang, Junfeng Wei, Haihong Feng, Zhoubin Wen, Chinese Academy of Sciences - Beijing, China
A novel technique for measuring rub and buzz using a fast-tracking high-pass filter is presented first in this paper. In tests of 100,000 loudspeaker samples on a production line, the missed-detection rate was a very low 0.006% and the false-alarm rate was 4.68% compared with human hearing tests. A method is then introduced that detects loudspeaker defects and estimates loudspeaker displacement without a laser displacement sensor: only the current response of the loudspeaker is used to predict displacement, with less than 3% error in the phase component relative to direct laser measurement. Several experiments proved this method very effective.
Convention Paper 7904 (Purchase now)
P12-4 Practical Measurement of Loudspeaker Distortion Using a Simplified Auditory Perceptual Model—Steve Temme, Pascal Brunet, Listen Inc. - Boston, MA, USA; D. B. (Don) Keele Jr., DBK Associates and Labs - Bloomington, IN, USA
Manufacturing defects in loudspeaker production can often be identified by an increase in rub and buzz distortion. This type of distortion is quite noticeable because it contributes an edgy sound to the reproduction and is annoying because it often sounds separate or disembodied from the fundamental signal. The annoyance of rub and buzz distortion is tied intimately to human perception of sound and psychoacoustics. To properly implement automated production-line testing of loudspeaker rub and buzz defects, one has to model or imitate the hearing process using a sufficiently accurate perceptual model. This paper describes the results of a rub and buzz detection system using a simplified perceptual model based on human masking thresholds that yields excellent results.
Convention Paper 7905 (Purchase now)
P12-5 The Audio Performance Comparison and Effective Error Correction Method of Switching Amplifiers—Jae Cheol Lee, Haekwang Park, Donghyun Lim, Joonhyun Lee, Yongserk Kim, Samsung Electronics Co., Ltd.
This paper introduces various open-loop and closed-loop switching amplifiers and reviews their merits and demerits when applied to consumer electronics products. Audio specifications of products adopting open-loop and closed-loop switching amplifiers are measured and analyzed to identify weak points. The paper then proposes a simple and effective method for error control, which delivers outstanding audio performance in consumer electronics products.
Convention Paper 7906 (Purchase now)
P12-6 Investigation of Switching Frequency Variations and EMI Properties in Self-Oscillating Class D Amplifiers—Dennis Nielsen, Arnold Knott, Technical University of Denmark - Lyngby, Denmark; Gerhard Pfaffinger, Harman/Becker Automotive Systems GmbH - Straubing, Germany; Michael Andreas E. Andersen, Technical University of Denmark - Lyngby, Denmark
Class D audio amplifiers have gained significant influence in sound reproduction due to their high efficiency. One of the most commonly used control methods in these amplifiers is self-oscillation. A parameter of key interest in self-oscillating amplifiers is the switching frequency, which is known to vary. Knowledge of switching frequency variations is of great importance with respect to electromagnetic interference (EMI). This paper investigates whether the switching frequency depends on the modulation index and the audio reference frequency. Validation is done using simulations, and the results are compared with measurements performed on a 50 W prototype amplifier. The switching frequency is tracked through accurate spectrum measurements, and very good agreement with the simulation results is observed.
Convention Paper 7907 (Purchase now)
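As general background on why this variation matters (a commonly quoted first-order result for hysteresis-type self-oscillating modulators, not necessarily the exact model used in the paper), the switching frequency droops with modulation index $M$ roughly as

```latex
f_{\mathrm{sw}}(M) \;\approx\; f_{\mathrm{idle}} \left( 1 - M^{2} \right),
```

so as the audio signal sweeps $M$, the switching residual smears into a band below $f_{\mathrm{idle}}$, which is precisely what an EMI assessment must capture rather than the idle frequency alone.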
P12-7 Design Optimizations for a High Performance Handheld Audio Analyzer—Markus Becker, NTi Audio AG - Schaan, Liechtenstein
All types of advanced mobile devices share certain design challenges, for example incorporating a powerful embedded processor system to support comprehensive functionality via a full-featured, easy-to-use human interface at low power consumption. Designing a multi-function handheld audio analyzer adds further challenges based upon requirements for an extremely low noise floor, wide measurement range, compatibility with measuring microphones, and other demands, including standards compliance. Additional requirements include the efficient display of complex data on a restricted-size display, and efficient and safe operation in many different locations and environments; these place further design burdens on the user interface and the instrument package, respectively.
Convention Paper 7908 (Purchase now)
P12-8 The 48 Volt Phantom Menace Returns—Rosalfonso Bortoni, Wayne Kirkwood, THAT Corporation - Milford, MA, USA
Hebert and Thomas presented a paper at the 110th AES Convention [Convention Paper 5335] that described the “phantom menace” phenomenon, wherein microphone phantom power faults can damage audio input circuitry. Now, a few years later, this paper provides new information about the phantom menace fault mechanisms, analyzes common protection circuits, and introduces a new, more robust protection scheme. In addition, new information is presented relating these input protection schemes to audio performance, and recommendations are made to minimize noise and distortion.
Convention Paper 7909 (Purchase now)
P13 - Spatial Audio
Sunday, October 11, 9:00 am — 1:00 pm
Chair: Jean-Marc Jot
P13-1 Microphone Array Optimization for a Hearing Restoration Headset—Marty Johnson, Philip Gillett, Efrain Perini, Alessandro Toso, Virginia Tech - Blacksburg, VA, USA; Daniel Harris, Sennheiser Research Laboratory - Palo Alto, CA, USA
Subjects wearing communications or hearing protection headsets lose the ability to localize sound accurately. Here we describe a hearing restoration headset designed to restore a user’s natural hearing by processing signals from an array of microphones using a filter-and-sum technique and presenting the result to the user via the headset’s speakers. The filters are designed using a phase compensation technique that maps the microphone array manifolds (or directional transfer functions) onto the target HRTFs. To optimize the performance of the system, a 3-D numerical model of a KEMAR mannequin with headset was built and verified experimentally up to 12 kHz. The numerical model was used to optimize a three-microphone array that demonstrated low reconstruction error up to 12 kHz.
Convention Paper 7910 (Purchase now)
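The filter design step, mapping the measured array manifolds onto target HRTFs, is in essence a per-frequency least-squares problem. A regularized sketch (array shapes and the Tikhonov term are illustrative assumptions, not the paper's phase-compensation method):

```python
import numpy as np

def design_filters(A, h, reg=1e-3):
    """Per-frequency least-squares filter-and-sum design.
    A[f]: (directions x mics) array manifold at frequency bin f.
    h[f]: (directions,) target HRTF values at that bin.
    Returns w[f]: (mics,) minimizing ||A w - h||^2 + reg*||w||^2."""
    F, D, M = A.shape
    w = np.zeros((F, M), complex)
    for f in range(F):
        Af = A[f]
        w[f] = np.linalg.solve(Af.conj().T @ Af + reg * np.eye(M),
                               Af.conj().T @ h[f])
    return w
```

Summing each microphone signal through its filter then approximates, for every measured direction, the ear signal the open ear would have received.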
P13-2 Optimized Parameter Estimation in Directional Audio Coding Using Nested Microphone Arrays—Giovanni Del Galdo, Oliver Thiergart, Fabian Kuech, Maja Taseskma, Divya Sishtla V.N., Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Directional Audio Coding (DirAC) is an efficient technique to capture and reproduce spatial sound on the basis of a downmix audio signal, direction of arrival, and diffuseness of sound. In practice, these parameters are determined using arrays of omnidirectional microphones. The main drawback of such configurations is that the estimates are reliable only in a certain frequency range, which depends on the array size. To overcome this problem and cover large bandwidths, we propose concentric arrays of different sizes. We derive optimal joint estimators of the DirAC parameters with respect to the mean squared error. We address the problem of choosing the optimal array sizes for specific applications such as teleconferencing and we verify our findings with measurements.
Convention Paper 7911 (Purchase now)
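For background, DirAC's two parameters come from the active intensity vector and the energy density (computed per frequency band in practice). In normalized units (density and sound speed set to 1), a broadband time-domain sketch is:

```python
import numpy as np

def dirac_params(p, u):
    """DirAC-style parameters from pressure p (n,) and particle velocity
    u (n, 3). Returns (unit DOA vector, diffuseness in [0, 1])."""
    I = np.mean(p[:, None] * u, axis=0)              # mean intensity vector
    E = 0.5 * np.mean(p ** 2 + np.sum(u ** 2, axis=1))  # mean energy density
    psi = 1.0 - np.linalg.norm(I) / (E + 1e-12)      # diffuseness
    doa = -I / (np.linalg.norm(I) + 1e-12)           # DOA opposes energy flow
    return doa, psi
```

A single plane wave yields diffuseness 0 and the correct arrival direction; the nested arrays in the paper exist to make these estimates reliable across frequency.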
P13-3 Modification of HRTF Filters to Reduce Timbral Effects in Binaural Synthesis—Juha Merimaa, Sennheiser Research Laboratory - Palo Alto, CA, USA
Using head-related transfer functions (HRTFs) in binaural synthesis often produces undesired timbral coloration. In this paper a method for designing modified HRTF filters with reduced timbral effects is proposed. The method is based on reducing the variation in the root-mean-square spectral sum of a pair of HRTFs while preserving the interaural time difference and interaural level difference. In formal listening tests it is shown that the coloration due to the tested non-individualized HRTFs can be significantly reduced without altering the resulting localization.
Convention Paper 7912 (Purchase now)
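The core operation, a gain common to both ears that flattens the RMS spectral sum while leaving the interaural cues untouched, might look like this sketch (the paper's smoothing and constraint details are not reproduced):

```python
import numpy as np

def flatten_hrtf_pair(HL, HR, alpha=1.0):
    """Rescale an HRTF pair per frequency bin so the RMS spectral sum
    becomes flat; the HR/HL ratio (hence ILD and interaural phase)
    is unchanged. alpha=1 flattens fully, alpha=0 leaves the pair alone."""
    s = np.sqrt(np.abs(HL) ** 2 + np.abs(HR) ** 2)   # RMS spectral sum
    target = np.sqrt(np.mean(s ** 2))                # flat reference level
    g = (target / (s + 1e-12)) ** alpha              # common correction gain
    return HL * g, HR * g
```

Intermediate alpha values trade residual coloration against fidelity to the original spectra, which is essentially the compromise the listening tests evaluate.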
P13-4 An Active Multichannel Downmix Enhancement for Minimizing Spatial and Spectral Distortions—Jeffrey Thompson, Aaron Warner, Brandon Smith, DTS, Inc. - Agoura Hills, CA, USA
With the continuing growth of multichannel audio formats, the issue of downmixing to legacy formats such as stereo or mono remains an important problem. Traditional downmix methods use fixed downmix coefficients and mixing equations to blindly combine N input channels into M output channels, where N is greater than M. This commonly produces unpredictable and unsatisfactory results due to the dependence of these passive methods on input signal characteristics. In this paper an active downmix enhancement employing frequency domain analysis of key inter-channel spatial cues is described that minimizes various distortions commonly observed in downmixed audio such as spatial inaccuracy, timbre change, signal coloration, and reduced intelligibility.
Convention Paper 7913 (Purchase now)
P13-5 Physical and Perceptual Properties of Focused Virtual Sources in Wave Field Synthesis—Sascha Spors, Hagen Wierstorf, Matthias Geier, Jens Ahrens, Deutsche Telekom Laboratories, Technische Universität Berlin - Berlin, Germany
Wave field synthesis is a well-established high-resolution spatial sound reproduction technique. Its physical basis allows reproduction of almost any desired wave field, even virtual sources positioned in the area between the loudspeakers and the listener, known as focused sources. A previous paper revealed that focused sources have a number of remarkable physical properties, especially in the context of spatial sampling. This paper further investigates these and other physical artifacts. Additionally, results of perceptual experiments are discussed in order to offer a conclusion on the perceptual relevance of the derived artifacts in practical implementations.
Convention Paper 7914 (Purchase now)
P13-6 Localization Curves for a Regularly-Spaced Octagon Loudspeaker Array—Laurent S. R. Simon, Russell Mason, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
Multichannel microphone array designs often use the localization curves that have been derived for 2-0 stereophony. Previous studies showed that side and rear perception of phantom image locations require somewhat different curves. This paper describes an experiment conducted to evaluate localization curves using an octagon loudspeaker setup. Interchannel level differences were produced between the loudspeaker pairs forming each of the segments of the loudspeaker array, one at a time, and subjects were asked to evaluate the perceived sound event's direction and its locatedness. The results showed that the localization curves derived for 2-0 stereophony are not directly applicable, and that different localization curves are required for each loudspeaker pair.
Convention Paper 7915 (Purchase now)
P13-7 Fixing the Phantom Center: Diffusing Acoustical Crosstalk—Earl Vickers, STMicroelectronics - Santa Clara, CA, USA
When two loudspeakers play the same signal, a "phantom center" image is produced between the speakers. However, this image differs from one produced by a real center speaker. In particular, acoustical crosstalk produces a comb-filtering effect, with cancellations that may be in the frequency range needed for the intelligibility of speech. We present a method for using phase decorrelation to fill in these gaps and produce a flatter magnitude response, reducing coloration and potentially enhancing dialog clarity. This method also improves headphone compatibility and reduces the tendency of the phantom image to move toward the nearest speaker.
Convention Paper 7916 (Purchase now)
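The decorrelation idea in the abstract can be sketched generically (this is not necessarily Vickers' algorithm; the FFT size and seeding are arbitrary choices): filter one channel with a unit-magnitude, random-phase all-pass, so the two loudspeaker signals no longer sum coherently at the ear and the comb-filter nulls fill in, while neither channel is colored on its own.

```python
import numpy as np

def random_phase_allpass(x, n_fft=1024, seed=0):
    """Decorrelate x with a random-phase all-pass FIR (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Unit magnitude in every bin: the filter itself adds no coloration.
    spec = np.ones(n_fft, dtype=complex)
    phase = rng.uniform(-np.pi, np.pi, n_fft // 2 - 1)
    spec[1:n_fft // 2] = np.exp(1j * phase)
    # Hermitian symmetry so the impulse response is real
    spec[n_fft // 2 + 1:] = np.conj(spec[1:n_fft // 2][::-1])
    h = np.fft.ifft(spec).real
    return np.convolve(x, h)[:len(x)], h
```

Applying this to one of the two channels carrying identical material breaks the fixed interchannel phase relationship that produces the comb-filter cancellations.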
P13-8 Frequency-Domain Two- to Three-Channel Upmix for Center Channel Derivation and Speech Enhancement—Earl Vickers, STMicroelectronics - Santa Clara, CA, USA
Two- to three-channel audio upmix can be useful in a number of contexts. Adding a front center loudspeaker provides a more stable center image and an increase in dialog clarity. Even in the absence of a physical center loudspeaker, the ability to derive a center channel can facilitate speech enhancement by making it possible to boost or filter the dialog, which is usually panned to the center. Two- to three-channel upmix can also be a first step in upmixing from two to five channels. We propose a frequency-domain upmix process using a vector-based signal decomposition, including methods for improving the selectivity of the center channel extraction. A geometric interpretation of the algorithm is provided. Unlike most existing frequency-domain upmix methods, the current algorithm does not perform an explicit primary/ambient decomposition. This reduces the complexity and improves the quality of the center channel derivation.
Convention Paper 7917 (Purchase now)
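A minimal single-frame sketch of frequency-domain center extraction (a generic similarity-weighted mid signal, not necessarily the vector-based decomposition the paper proposes): per-bin, content that is identical in both channels is routed to the center, while uncorrelated or opposite-phase content is suppressed.

```python
import numpy as np

def extract_center(left, right):
    """Estimate a center channel from one frame of a stereo pair (sketch)."""
    n = len(left)
    Lf, Rf = np.fft.rfft(left), np.fft.rfft(right)
    # Weight in [0, 1]: 1 where the channels agree in amplitude and phase
    # (center-panned content), 0 where uncorrelated or opposite-phase.
    w = np.clip(2 * np.real(Lf * np.conj(Rf)) /
                (np.abs(Lf) ** 2 + np.abs(Rf) ** 2 + 1e-12), 0.0, 1.0)
    return np.fft.irfft(w * 0.5 * (Lf + Rf), n)
```

In practice this would run frame-by-frame in an STFT with overlap-add; the weighting function is one of several plausible choices for sharpening center selectivity.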
P14 - Signal Processing
Sunday, October 11, 10:00 am — 11:30 am
P14-1 A New Distance Measurement Method for Distance-Based Howling Canceller—Akira Sogami, Arata Kawamura; Youji Iiguni, Osaka University - Toyonaka, Osaka, Japan
In this paper we propose a new distance measurement method for a distance-based howling canceller. We have previously proposed a howling canceller that uses only the distance between the loudspeaker and the microphone, measured with a sonic wave, to suppress howling. The conventional measurement method, however, emits audible noise during the distance measurement. To solve this problem we propose a new distance measurement method that suppresses the noise. Simulation results in a practical environment show that the proposed method estimates the distance almost exactly and more quietly than the conventional method.
Convention Paper 7918 (Purchase now)
P14-2 New Technology for Hearing Stimulation Employing the SPS-S Method—Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland; Henryk Skarzynski, Institute of Physiology and Pathology of Hearing - Warsaw, Poland; Bozena Kostek, University of Gdansk - Gdansk, Poland, Institute of Physiology and Pathology of Hearing, Warsaw, Poland; Piotr Odya, Piotr Suchomski, Gdansk University of Technology - Gdansk, Poland; Piotr Skarzynski, Sense Organs Institute - Nadarzyn, Poland
A prototype of the new Compact Audio Therapy Unit (CATU) is presented that can process any audio signal in real time inside a very compact device, employing advanced digital filtering, signal keying, playback-rate manipulation, various spectral modifications of the signal, phrase repetition, and more. It was designed to provide a platform for therapy with the new Method of Aural Perception Stimulation (SPS-S). The wearable design allows one to use the device effectively in normal everyday conditions, e.g., outdoors. The compact and versatile processing device can potentially open a new era in patient and trainee mobility.
Convention Paper 7919 (Purchase now)
P14-3 Frequency Characteristics Measurements of 78 rpm Acoustic Record Players by the Pulse-Train Method—Teruo Muraoka, Takahiro Miura, Tohru Ifukube, University of Tokyo - Tokyo, Japan
The authors have been engaged in research on the restoration of seriously damaged audio signals employing Generalized Harmonic Analysis (GHA). In this research it is important to know the frequency characteristics of sound-reproducing equipment in order to realize proper sound reproduction. However, the frequency characteristics of vintage acoustic record players such as the "Credenza" are significant but unclear, especially the characteristics when records are actually reproduced. Such characteristics could formerly be measured only with a frequency record, and the vibrator method is no longer in use; in fact, no shellac 78 rpm frequency record can be manufactured today, so the traditional measurement techniques are inapplicable. On the other hand, one of the authors previously developed a pulse-train measurement method for phonograph cartridges to obtain their amplitude and phase frequency characteristics. This method is applicable as long as a pulse waveform is cut into the record surface. The authors therefore cut a radial groove into the surface of a shellac disc record and obtained the pulse-train response by reproducing the record on an acoustic record player. Some examples are exhibited in the report.
Convention Paper 7920 (Purchase now)
P14-4 MDCT for Encoding Residual Signals in Frequency Domain Linear Prediction—Sriram Ganapathy, Johns Hopkins University - Baltimore, MD, USA; Petr Motlicek, Idiap Research Institute - Martigny, Switzerland; Hynek Hermansky, Johns Hopkins University - Baltimore, MD, USA
Frequency domain linear prediction (FDLP) uses autoregressive models to represent Hilbert envelopes of relatively long segments of speech/audio signals. Although the basic FDLP audio codec achieves good quality of the reconstructed signal at high bit-rates, there is a need for scaling to lower bit-rates without degrading the reconstruction quality. Here, we present a method for improving the compression efficiency of the FDLP codec by applying the modified discrete cosine transform (MDCT) to encode the FDLP residual signals. In subjective and objective quality evaluations, the proposed FDLP codec provides reconstruction quality competitive with state-of-the-art audio codecs in the 32–64 kbps range.
Convention Paper 7921 (Purchase now)
P14-5 State-Space Biquad Filters with Low Noise and Improved Efficiency for Pipelined DSPs—David McGrath, Dolby Laboratories - Sydney, NSW, Australia
A State-Space filter structure is presented, along with simplified equations for mapping the coefficients of arbitrary biquad filter coefficients to the State-Space structure. This procedure allows low noise implementation of an arbitrary second-order filter transfer function. A block-processing variant of the State-Space structure is described, with the added benefit that greater efficiency can be achieved on some classes of modern pipelined DSP processors.
Convention Paper 7922 (Purchase now)
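The coefficient-mapping idea can be sketched for the controllable canonical form; the paper's specific low-noise structure differs, so treat this only as a demonstration that an arbitrary biquad maps exactly onto a state-space recursion.

```python
import numpy as np

def biquad_to_state_space(b, a):
    """Map normalized biquad coefficients (a[0] == 1) to (A, B, C, D),
    controllable canonical form. Illustrative, not the paper's structure."""
    b0, b1, b2 = b
    _, a1, a2 = a
    A = np.array([[-a1, -a2], [1.0, 0.0]])
    B = np.array([1.0, 0.0])
    C = np.array([b1 - b0 * a1, b2 - b0 * a2])
    D = b0
    return A, B, C, D

def run_ss(A, B, C, D, x):
    """Run the state-space recursion sample by sample."""
    s = np.zeros(2)
    y = np.empty(len(x))
    for n, xn in enumerate(x):
        y[n] = C @ s + D * xn       # output equation
        s = A @ s + B * xn          # state update
    return y
```

The low-noise benefit the paper describes comes from choosing a different similarity transform of (A, B, C, D); all such realizations share the same transfer function.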
P14-6 A Bandlimited Oscillator by Frequency-Domain Synthesis for Virtual Analog Applications—Glen Deslauriers, Colby Leider, University of Miami - Coral Gables, FL, USA
Problems posed by the bandlimited synthesis of audio signals have long been addressed by the music and audio engineering communities. However, few of the proposed solutions have the flexibility necessary to accurately model and produce the variety of waveform functions present in an analog oscillator. An additive technique would ideally be employed as the method of alias-free synthesis. Inverse Fourier Transform synthesis is one such method that is often discussed but less frequently utilized. Here we propose modifications to the method and implementation of Inverse Fourier Transform synthesis as a viable basis for a software oscillator for a Virtual Analog instrument. Design results show that its quality outperforms a variety of currently implemented methods.
Convention Paper 7923 (Purchase now)
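Inverse-FFT synthesis of a bandlimited waveform can be sketched as follows (an assumed illustration of the general technique, not the authors' implementation): only harmonics below Nyquist are written into the spectrum, so the inverse transform is alias-free by construction.

```python
import numpy as np

def bandlimited_saw(f0, fs, n=4096):
    """One wavetable period of a bandlimited sawtooth via inverse FFT."""
    spec = np.zeros(n, dtype=complex)
    k_max = min(int((fs / 2) / f0), n // 2 - 1)   # highest alias-free harmonic
    for k in range(1, k_max + 1):
        # Sawtooth Fourier series: (2/pi) * (-1)^(k+1) * sin(k*theta) / k
        amp = (2 / np.pi) * ((-1) ** (k + 1)) / k
        spec[k] = -0.5j * amp * n                 # sine of amplitude amp
        spec[n - k] = np.conj(spec[k])            # Hermitian symmetry
    return np.fft.ifft(spec).real
```

A practical oscillator would interpolate such tables (or spectra) across pitch so the harmonic count changes smoothly with f0.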
P14-7 Digital Simulation of Phonograph Tracking Distortion—Richard Tollerton, Isomorphic Software, Inc. - San Francisco, CA, USA
Phonograph tracking distortion results from the misalignment of a playback cartridge with respect to the cutting head. While it has been researched for decades, it remains a source of mystery: it cannot be isolated, it has not been accurately simulated, and its importance remains undecided. Here, a PCM simulation of horizontal and vertical tracking distortion of extremely high quality is presented, operating on the principle of phase modulation, allowing tracking distortion to be evaluated in isolation with real musical content. In this context, tracking distortion is equivalent to digital audio sampling jitter, with the jitter spectrum equal to the signal spectrum. Implications of this connection, as well as simulation accuracy, preliminary listening test results, and potential applications are discussed.
Convention Paper 7924 (Purchase now)
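The stated equivalence to sampling jitter with a signal-shaped jitter spectrum suggests a very small sketch (the depth value and the linear interpolation are illustrative simplifications, not the paper's high-quality implementation): the signal is read at instants offset in proportion to the signal value itself.

```python
import numpy as np

def tracking_distortion(x, fs, depth_us=100.0):
    """Phase-modulate x by itself: jitter spectrum equals signal spectrum."""
    n = np.arange(len(x), dtype=float)
    offset = depth_us * 1e-6 * fs * x     # per-sample read offset, in samples
    return np.interp(n + offset, n, x)    # linear interpolation (crude)
```

On a pure tone this produces the classic phase-modulation harmonic series; a faithful simulation would use high-order interpolation to keep its own distortion below the modeled effect.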
P15 - Digital Audio Effects
Sunday, October 11, 2:00 pm — 5:30 pm
Chair: David Berners
P15-1 Discrete Time Emulation of the Leslie Speaker—Jorge Herrera, Craig Hanson, Jonathan S. Abel, Stanford University - Stanford, CA, USA
A discrete-time emulation of the Leslie loudspeaker acoustics is described. The midrange horn and subwoofer baffle are individually modeled, with their rotational dynamics separately tracked, and used to drive time-varying FIR filters applied to the input. The rotational speeds of the horn and baffle are approximated by first-order difference equations having different time constants for acceleration and deceleration. Several time-varying FIR filter methods were explored, all based on impulse responses tabulated over a dense set of horn and baffle angles. In one method, the input sample scales an interpolated impulse response at the current horn or baffle angle, which is added to the output. An example model of a Leslie 44W is presented.
Convention Paper 7925 (Purchase now)
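The rotor dynamics described in the abstract amount to a one-pole smoother with asymmetric time constants; a sketch with made-up constants:

```python
import numpy as np

def rotor_speed(target_hz, fs=100.0, t_acc=0.8, t_dec=1.5):
    """Rotor speed trajectory: first-order smoothing toward the target,
    with a different time constant for speed-up vs. braking (values are
    illustrative, not measured from a Leslie 44W)."""
    s = 0.0
    out = np.empty(len(target_hz))
    for i, tgt in enumerate(target_hz):
        tau = t_acc if tgt > s else t_dec     # accelerate vs. decelerate
        a = np.exp(-1.0 / (tau * fs))         # per-sample smoothing coeff
        s = a * s + (1.0 - a) * tgt
        out[i] = s
    return out
```

The resulting speed signal would then index the angle-tabulated FIR filters the abstract describes.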
P15-2 A Novel Transient Handling Scheme for Time Stretching Algorithms—Frederik Nagel, Andreas Walther, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Changing either the speed or the pitch of audio signals without affecting the other is often used for music production and creative reproduction, such as remixing. It is also utilized for other purposes such as bandwidth extension and speech enhancement. While stationary signals can be stretched without harming quality, transients are often not well maintained after time stretching. The present paper demonstrates a novel approach for transient handling in time stretching algorithms: transient regions are replaced by stationary signals, and the removed transients are saved and re-inserted into the time-dilated signal after time stretching.
Convention Paper 7926 (Purchase now)
P15-3 The Switched Convolution Reverberator—Keun-Sup Lee, Jonathan S. Abel, Stanford University - Stanford, CA, USA; Vesa Välimäki, Helsinki University of Technology - Espoo, Finland; David P. Berners, Universal Audio, Inc. - Santa Cruz, CA, USA
An artificial reverberator having low memory and small computational cost, appropriate for mobile devices, is presented. The reverberator consists of an equalized comb filter driving a convolution with a short noise sequence. The reverberator equalization and decay rate are controlled by low-order IIR filters, and the echo density is that of the noise sequence. While this structure is efficient and readily generates high echo densities, if a fixed noise sequence is used, the reverberator has an unwanted periodicity at the comb filter delay length. To overcome this difficulty, the noise sequence is regularly updated or “switched.” Several structures for updating the noise sequence, including a leaky integrator sensitive to the signal crest factor, and a multi-band architecture, are described.
Convention Paper 7927 (Purchase now)
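The overall structure can be sketched as follows (all parameter values are illustrative; the paper's leaky-integrator switching criterion and multi-band architecture are omitted): a feedback comb filter drives a convolution with a short noise burst, and the noise sequence is periodically swapped to break the comb's periodicity.

```python
import numpy as np

def switched_conv_reverb(x, comb_delay=400, g=0.7,
                         noise_len=800, switch_every=400, seed=0):
    """Comb filter -> short noise convolution, with the noise 'switched'
    every block (minimal sketch of the structure in the abstract)."""
    rng = np.random.default_rng(seed)
    # feedback comb: c[n] = x[n] + g * c[n - comb_delay]
    c = np.copy(x).astype(float)
    for i in range(comb_delay, len(c)):
        c[i] += g * c[i - comb_delay]
    # block-wise convolution with a fresh unit-energy noise burst per block
    y = np.zeros(len(c) + noise_len - 1)
    for start in range(0, len(c), switch_every):
        noise = rng.normal(size=noise_len) / np.sqrt(noise_len)
        block = c[start:start + switch_every]
        y[start:start + len(block) + noise_len - 1] += np.convolve(block, noise)
    return y[:len(x)]
```

Without the switching, the repeated noise sequence would make the comb's echo pattern audible as a periodic texture.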
P15-4 An Emulation of the EMT 140 Plate Reverberator Using a Hybrid Reverberator Structure—Aaron Greenblatt, Stanford University - Stanford, CA, USA; Jonathan S. Abel, David P. Berners, Stanford University - Stanford, CA, USA, and Universal Audio, Inc. - Scotts Valley, CA, USA
A digital emulation of the Elektromesstechnik (EMT) 140 plate reverberator is presented. The EMT 140 consists of a signal plate and a moveable damping plate; it is approximately linear and time invariant, and its impulse response is characterized by a whip-like onset and high echo density. Here, the hybrid reverberator proposed by Stewart and Murphy, in which a short convolution is run in parallel with a feedback delay network (FDN), is used to model the plate. The impulse response onset is only weakly dependent on the damping control and is modeled by the convolution; the FDN is fit to the impulse response tail. The echo density, equalization, and decay rates are matched at the transition between the convolution and FDN.
Convention Paper 7928 (Purchase now)
P15-5 Simulation of a Guitar Amplifier Stage for Several Triode Models: Examination of Some Relevant Phenomena and Choice of Adapted Numerical Schemes—Ivan Cohen, IRCAM - Paris, France, Orosys R&D, Montpellier, France; Thomas Hélie, IRCAM - Paris, France
This paper deals with the simulation of a high gain triode stage of a guitar amplifier. Triode models taking into account various "secondary phenomena" are considered and their relevance on the stage is analyzed. More precisely, both static and dynamic models (including parasitic capacitances) are compared. For each case, the stage can be modeled by a nonlinear differential algebraic system. For static triode models, standard explicit numerical schemes yield efficient stable simulations of the stage. However, the effect due to the capacitances in dynamic models is audible (Miller effect) and must be considered. The problem becomes stiff and requires the use of implicit schemes. The results are compared for all the models and corresponding VST plug-ins have been implemented.
Convention Paper 7929 (Purchase now)
P15-6 Over-Threshold Power Function Feedback Distortion Synthesis—Tom Rutt, Coast Enterprises, LLC - Asbury Park, NJ, USA
This paper describes an approach to nonlinear distortion synthesis, which uses Over-Threshold Power Function (OTPF) Feedback. The linear gain of an OTPF Feedback distortion synthesizer (using a high gain amplifier) is determined by a linear feedback element. When the output signal becomes greater than a positive threshold value, or less than a negative threshold value, additional OTPF feedback is applied to the distortion synthesizer. The action of this OTPF Feedback Distortion synthesis closely emulates the soft limiting input/output response characteristics of vacuum tube triode grid limit distortion. An important feature of an OTPF feedback distortion synthesizer is that it always behaves as an instantaneous soft limiter and never results in clipping of the output signal peak levels (even at maximum allowable peak input signal levels), if its nonlinear gain constants are set optimally. The paper also describes both circuit and software plug-in realizations of distortion synthesizers that employ Over-Threshold Cubic Function Feedback.
Convention Paper 7930 (Purchase now)
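The over-threshold feedback relation can be sketched as a static input/output curve (the constants and the exact cubic relation are illustrative assumptions): below the threshold the stage is linear; above it, the extra cubic feedback demands ever more input per unit of output, so the output approaches its ceiling smoothly and never hard-clips.

```python
import numpy as np

def otpf_shape(x, thresh=0.5, k=4.0, iters=60):
    """Static curve of an over-threshold cubic-feedback limiter, solved
    from the (assumed) feedback relation
        x = y + k * max(|y| - thresh, 0)**3 * sign(y)
    by bisection; x is strictly monotone in y, so the inverse exists."""
    def fwd(y):
        return y + k * np.maximum(np.abs(y) - thresh, 0.0) ** 3 * np.sign(y)
    lo = -10.0 * np.ones_like(x)
    hi = 10.0 * np.ones_like(x)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        too_big = fwd(mid) > x
        hi = np.where(too_big, mid, hi)
        lo = np.where(too_big, lo, mid)
    return 0.5 * (lo + hi)
```

Because the curve's slope is everywhere positive and decreases smoothly above the threshold, the limiter is instantaneous and soft, as the abstract describes for the triode-style response.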
P15-7 Dynamic Panner: An Adaptive Digital Audio Effect for Spatial Audio—Martin Morrell, Joshua D. Reiss, Queen Mary University of London - London, UK
Digital audio effects usually have their parameters controlled by the user, whereas adaptive digital audio effects, or A-DAFx, have some parameters that are driven by the automatic processing and extraction of sound features in the signal content. In this paper we introduce a new A-DAFx, the Dynamic Panner. Based on RMS measurement of its incoming audio signal, the sound source is panned between two user defined points. The audio effect is described and discussed, detailing the technicalities of all the control parameters and the creative context in which the effect can be used. Objective results that can be obtained from the effect are also presented. The audio effect has been implemented as both stereo and 3-dimensional effects using Max/MSP.
Convention Paper 7931 (Purchase now)
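A sketch of the core idea (the pan law and the RMS-to-position mapping are assumed here; the published effect's parameters differ): a running RMS of the input drives the pan position between two user-defined angles.

```python
import numpy as np

def dynamic_pan(x, fs=8000, tau=0.05, theta_a=-0.5, theta_b=0.5):
    """RMS-driven stereo panner between angles theta_a and theta_b
    (radians), using an equal-power sine/cosine pan law. Illustrative."""
    a = np.exp(-1.0 / (tau * fs))   # RMS smoothing coefficient
    ms = 0.0
    peak = 1e-9
    L = np.empty_like(x)
    R = np.empty_like(x)
    for n, xn in enumerate(x):
        ms = a * ms + (1 - a) * xn * xn
        rms = np.sqrt(ms)
        peak = max(peak, rms)
        pos = rms / peak            # 0..1 control signal (assumed mapping)
        theta = theta_a + pos * (theta_b - theta_a)
        L[n] = np.cos(theta + np.pi / 4) * xn
        R[n] = np.sin(theta + np.pi / 4) * xn
    return L, R
```

The equal-power law keeps L^2 + R^2 equal to x^2 at every sample, so the panning motion does not modulate loudness.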
P16 - Multichannel Sound and Imaging
Sunday, October 11, 3:30 pm — 5:00 pm
P16-1 Evaluation of a Multipoint Equalization System Based on Impulse Responses Prototype Extraction—Stefania Cecchi, Lorenzo Palestini, Paolo Peretti, Laura Romoli, Francesco Piazza, Università Politecnica delle Marche - Ancona, Italy; Alberto Carini, Universita’ di Urbino - Urbino, Italy
In this paper a frequency-domain multipoint equalization algorithm is presented that combines fractional-octave smoothing of impulse responses (IRs) measured in multiple locations with the extraction of a representative prototype. The proposed approach is evaluated considering different methods to combine the IRs for the prototype extraction and to obtain the inverse filter for equalization, using sets of impulse responses measured in realistic environments. With respect to previous work, the influence of the number of considered positions and of the equalization-zone size on equalization performance is investigated in depth. A comparison with the single-point equalization approach is also reported. Finally, the robustness of the multipoint equalization is evaluated at positions different from those used for the equalizer estimation.
Convention Paper 7932 (Purchase now)
P16-2 Acoustic Design of NHK HD-520 Multichannel Postproduction Studio—A New Approach for Room Acoustic Design Using Multi-Layered Random Diffusers—Yasushi Satake, Kazuhiro Makino, Yasuhiro Sakiyama, Hideo Tsuro, Nittobo Acoustic Engineering Co., Ltd. - Tokyo, Japan; Akira Fukada, Ryota Ono, NHK (Japan Broadcasting Corporation) - Shibuya-ku, Tokyo, Japan; Kazutsugu Uchimura, NHK Media Technology - Shibuya-ku, Tokyo, Japan; Junichi Mikami, NHK Integrated Technology - Shibuya-ku, Tokyo, Japan; Masamichi Otani, NHK (Japan Broadcasting Corporation) - Shibuya-ku, Tokyo, Japan; Ikuko Sawaya, NHK Science & Technical Research Laboratories - Setagaya-ku, Tokyo, Japan
In this paper a novel approach for room acoustic design, adopted in the renewal project of the NHK HD-520 multichannel postproduction studio, is introduced. The HD-520 studio is designed for a direct surround loudspeaker arrangement based on ITU-R BS.775-1 and adopts an acoustically transparent screen. Generally, there are three keys to the acoustic design of multichannel postproduction studios: the first is to obtain stable and flat low-frequency responses; the second, smooth panning and accurate phantom sound images; and the third, natural sound with well-balanced frequency characteristics. To address these and produce a superior monitoring environment, a new approach for room acoustic design using multi-layered random diffusers composed of cylindrical diffusers of different sizes (MLRD) is applied in this project. First, an outline and the acoustic design concept for the renewal of the HD-520 studio are introduced. Second, the concrete room acoustic design method, aimed at certifying high monitoring quality for both audio and picture, is described. The measurement results show the preferred acoustic characteristics, and engineers have praised the studio as a comfortable work area with a good surround monitoring environment.
Convention Paper 7933 (Purchase now)
P16-3 Robust Interchannel Correlation (ICC) Estimation Using Constant Interchannel Time Difference (ICTD) Compensation—Dongil Hyun, Yonsei University - Seoul, Korea; Jeongil Seo, Electronics and Telecommunications Research Institute (ETRI) - Daejeon, Korea; Youngcheol Park, Yonsei University - Wonju, Korea; Daehee Youn, Yonsei University - Seoul, Korea
This paper proposes an interchannel correlation (ICC) estimation method that can enhance the performance of spatial audio coding schemes such as the Parametric Stereo of HE-AAC v2 and MPEG Surround. Conventional ICC estimation methods assume that phase differences between the two channel signals are constant within parameter bands, and those phase differences are compensated to maximize the ICC. The proposed method achieves robust ICC estimation by compensating a constant interchannel time difference (ICTD): the ICTD is estimated from the interchannel phase difference (IPD), and the linear phases corresponding to the ICTD are compensated. Simulation results show that the proposed method provides more accurate ICCs than the conventional methods.
Convention Paper 7934 (Purchase now)
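The estimation idea can be sketched in the time domain (the paper works on IPDs within parameter bands; this simplified version uses a cross-correlation peak as the ICTD): estimate a constant delay, compensate it, then compute the normalized correlation.

```python
import numpy as np

def icc_ictd(L, R, max_lag=32):
    """Estimate ICTD from the cross-correlation peak, compensate it,
    then compute the normalized ICC (time-domain sketch)."""
    N = len(L)
    best_d, best_v = 0, -np.inf
    for d in range(-max_lag, max_lag + 1):
        v = np.dot(L[:N - d], R[d:]) if d >= 0 else np.dot(L[-d:], R[:N + d])
        if abs(v) > best_v:
            best_v, best_d = abs(v), d
    # Align the channels by the estimated ICTD before correlating
    d = best_d
    a, b = (L[:N - d], R[d:]) if d >= 0 else (L[-d:], R[:N + d])
    icc = np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b) + 1e-12)
    return icc, d
```

Without the compensation step, a pure time offset between otherwise identical channels reads as low correlation, which is exactly the bias the proposed method removes.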
P16-4 Measurement of Audio System Imaging Performance—David Clark, DLC Design - Northville, MI, USA
Mixing engineers assign different sounds to different channels of stereo or multichannel media with the expectation that the listener will experience the intended directional aspects of the sound. For the standard ±30-degree arrangement of playback loudspeakers and listener, this expectation is usually realized. As non-standard influences, such as listening off the centerline, are introduced, localization and other aspects of spatial rendition are degraded. This paper describes a measurement system for quantifying these degradations in common perceptual dimensions such as image direction, width, distance, and stability.
Convention Paper 7936 (Purchase now)
P16-5 Matching Perceived Auditory Width to the Visual Image of a Performing Ensemble in Contrasting Multi-Modal Environments—Daniel L. Valente, Boys Town National Research Hospital - Omaha, NE, USA; Shane A. Myrbeck, Arup Acoustics - San Francisco, CA, USA; Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA
Participants were given an audio-visual matching test in which they were instructed to align the acoustic width of a performing ensemble to a varying set of audio and visual cues. They assessed a vocal ensemble positioned with varying visual width in five contrasting physical spaces with monotonically increasing reverberation times. Each performance to be assessed began with a forced auditory-visual mismatch (the acoustical location of the sound sources not matching that of the visual imagery), and participants were instructed to align the acoustic presentation to the visual imagery of the performance. The results show that the participants' matching ability depends on the source distance as well as the spacing of the ensemble.
Convention Paper 7937 (Purchase now)
P16-6 Stereo Music Source Separation for 3-D Upmixing—Hwan Shim, Jonathan Abel, Stanford University - Stanford, CA, USA; Koeng-Mo Sung, Seoul National University - Kwanak-Gu, Seoul, Korea
A method for 3-D upmixing based on stereo source separation and a primary-ambient decomposition is presented. The method separately renders primary and ambient components, and separately pans sources derived from the primary signal. Since all separated sources appear in the upmixed output, it is more important that the source separation method be free of audible artifacts than achieve a complete separation of the sources present. Typically, the mixing vector amplitude or energy is allocated to the various sources present, for instance all given to the most likely source, or allocated to each source in proportion to its likelihood. However, these choices produce “musical” noise and source motion artifacts in the upmixed signal. Here, two sources are selected according to the mixing vector direction, and the mixing vector energy is allocated by inverting the panning matrix associated with the selected sources. Listening tests show an upmix with separated sources and few audible artifacts.
Convention Paper 7938 (Purchase now)
P16-7 Automated Assessment of Surround Sound—Richard C. Cabot, Qualis Audio - Lake Oswego, OR, USA
The design of a real time electronic listener, optimized for surround sound program assessment, is described. Problems commonly encountered in surround audio production and distribution are automatically identified, including stereo/mono downmix compatibility, balance, metadata inconsistencies, channel interchange, loudness, excessive or inadequate level, and the presence of hum. Making measurements that correlate with audibility and displaying the results in a form easily understood by non-audio personnel created numerous design challenges. The technology used to solve these challenges, particularly that of downmix compatibility, is described.
Convention Paper 7939 (Purchase now)
P17 - Audio Networks
Monday, October 12, 9:00 am — 12:00 pm
Chair: Richard Foss, Rhodes University - Grahamstown, South Africa
P17-1 Performance Metrics for Network Audio Systems: Methodology and Comparison—Nicolas Bouillot, Mathieu Brulé, Jeremy R. Cooperstock, McGill University - Montreal, Quebec, Canada
Network audio transmission is becoming increasingly popular within the broadcast community, with applications to Voice over IP (VoIP) communications, audio content distribution, and radio broadcast. Issues of end-to-end latency, jitter, and overall quality, including glitches of the delivered signal, all impact the value of the technology. Although considerable literature exists comparing audio codecs, little has been published comparing systems in terms of their real-world performance. In response, we describe methods for accurately assessing the quality of audio streams transmitted over networks. These methods are then applied to an empirical evaluation of several audio compression formats supported by different streaming engines.
Convention Paper 7940 (Purchase now)
P17-2 An Integrated Connection Management and Control Protocol for Audio Networks—Richard Foss, Rhodes University - Grahamstown, South Africa; Robby Gurdan, Bradley Klinkradt, Nyasha Chigwamba, Universal Media Access Networks (UMAN) - Dusseldorf, Germany
With the advent of digital networks that link audio devices, there is a need for a protocol that integrates control and connection management, allows for streaming of all media content such as audio and video between devices from different manufacturers, and that provides a common approach to the control of these devices. This paper proposes such a protocol, named XFN (currently being standardized as part of the AES X170 project). XFN is an IP-based peer to peer network protocol in which any device on the network may send or receive connection management, control, and monitoring messages. Essential to the XFN protocol is the fact that each parameter in a device can be addressed via a hierarchical structure that reflects the natural layout of the device.
Convention Paper 7941 (Purchase now)
P17-3 Mixing Console Design Considerations for Telematic Music Applications—Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA; Chris Chafe, Stanford University - Stanford, CA, USA; Pauline Oliveros, Doug Van Nort, Rensselaer Polytechnic Institute - Troy, NY, USA
This paper describes the architecture for a new mixing console that was especially designed for telematic live music collaborations. The prototype mixer is software-based and programmed in Pure Data. It has many traditional features but also a number of extra modules that are important for telematic projects: transmission test unit, latency meter, remote data link, auralization unit, remote sound level calibration unit, remote monitoring, and a synchronized remote audio recording unit.
Convention Paper 7942 (Purchase now)
P17-4 Comparison of Receiver-Based Concealment and Multiple Description Coding in an 802.11-Based Wireless Multicast Audio Distribution Network—Marcus Purat, Tom Ritter, Beuth Hochschule für Technik Berlin - Berlin, Germany
This paper presents aspects of a study of different methods to mitigate the impact of packet loss in a wireless distribution network on the subjective quality of compressed high-fidelity audio. The system was simulated in Matlab based on parameters of an 802.11a WLAN in multicast mode and the Vorbis codec. Selecting the most appropriate packet-loss concealment strategy requires considering not only the additional bandwidth, the processing requirements, and the latency: one important differentiating factor is the perceived subjective audio quality, for which an accurate estimate is required. Several simulation-based methods that use psychoacoustic models of the human hearing system to quantify subjective audio quality are compared.
Convention Paper 7943 (Purchase now)
P17-5 Audio-Over-IP Acceptance Test Strategy—Matthew O'Donnell, BSkyB (British Sky Broadcasting) - London, UK
Ensuring the integrity of an application that delivers audio-over-IP through Ethernet demands thorough acceptance testing during the development cycle, due to the potentially volatile "Best Effort" nature of IP transport and its effect on the performance of the application. This paper investigates attributes of protocols used on top of IP that must be taken into account during development, and their impact on an audio-over-IP application's Quality of Experience for the end user.
Convention Paper 7944 (Purchase now)
P17-6 Long-Distance Uncompressed Audio Transmission over IP for Postproduction—Nathan Brock, Michelle Daniels, University of California, San Diego - La Jolla, CA, USA; Steve Morris, Skywalker Sound - Marin County, CA, USA; Peter Otto, University of California, San Diego - La Jolla, CA, USA
The highly distributed nature of contemporary cinema postproduction has led many to believe that high-speed networking of uncompressed audio could significantly improve workflow efficiency. This paper will provide an overview of several significant issues with long-distance networking, including synchronization, latency, bandwidth limitations, and control protocols. We will present a recent networked postproduction demonstration, in which audio assets in Seattle, San Francisco, and San Diego along with local video assets were streamed to and controlled from a single DAW. These results are expected to lead to persistent wide-area networked postproduction environments to remotely access and control audiovisual assets.
Convention Paper 7945 (Purchase now)
P18 - Analysis and Synthesis of Sound
Monday, October 12, 9:00 am — 10:30 am
Chair: Sunil Bharitkar, Audyssey Labs/USC - Los Angeles, CA, USA
P18-1 Audio Bandwidth Extension Using Cluster Weighted Modeling of Spectral Envelopes—Nikolay Lyubimov, Alexey Lukin, Moscow State University - Moscow, Russian Federation
This paper presents a method for blind bandwidth extension of band-limited audio signals. A rough generation of the high-frequency content is performed by nonlinear distortion (waveshaping) applied to the mid-range band of the input signal. The second stage is shaping of the high-frequency spectrum envelope. It is done by a Cluster Weighted Model for MFCC coefficients, trained on full-bandwidth audio material. An objective quality measure is introduced and the results of listening tests are presented.
Convention Paper 7946 (Purchase now)
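The first, rough-generation stage can be sketched as band-pass, waveshape, high-pass (the filter band edges and the |x|·x shaper are illustrative assumptions; the Cluster Weighted envelope model of the paper is not included): the nonlinearity creates harmonics of the mid band that land above the original bandwidth.

```python
import numpy as np

def rough_hf_generation(x, fs):
    """Generate rough high-frequency content by waveshaping the mid band
    (illustrative first stage only; envelope shaping is omitted)."""
    def fir_bandpass(lo, hi, taps=129):
        # Windowed-sinc band-pass = LP(hi) - LP(lo)
        n = np.arange(taps) - taps // 2
        h = (np.sinc(2 * hi / fs * n) * 2 * hi / fs -
             np.sinc(2 * lo / fs * n) * 2 * lo / fs)
        return h * np.hamming(taps)
    mid = np.convolve(x, fir_bandpass(1000, 3000), mode='same')
    shaped = np.abs(mid) * mid                        # nonlinear waveshaper
    hf = np.convolve(shaped, fir_bandpass(3000, fs / 2 * 0.99), mode='same')
    return hf
```

In the full method, this raw content would then be scaled per band by the spectral envelope predicted from the MFCCs of the band-limited input.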
P18-2 Applause Sound Detection with Low Latency—Christian Uhle, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
This paper presents a comprehensive investigation of the detection of applause sounds in audio signals. It focuses on the processing of single-microphone recordings in real time with low latency. Particular concerns, investigated experimentally, are the intensity of the applause within the sound mixture and the influence of interfering sounds on recognition performance. Well-known feature sets, feature processing, and classification methods are compared. Additional low-pass filtering of the feature time series leads to the concept of sigma features and yields further improvements in the detection result.
Convention Paper 7947 (Purchase now)
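The sigma-feature idea, low-pass filtering a feature time series before classification, can be sketched as follows. Frame energy stands in for the richer feature sets the paper compares, and the smoothing constant is an arbitrary choice:

```python
import numpy as np

def frame_energy_db(x, frame_len=1024, hop=512):
    """A simple per-frame feature: frame energy in dB (a stand-in for
    the low-level features compared in the paper)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    feats = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len]
        feats[i] = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
    return feats

def sigma_features(feats, alpha=0.9):
    """Sigma features: the feature time series passed through a one-pole
    low-pass smoother, reducing frame-to-frame variance."""
    out = np.empty_like(feats)
    acc = feats[0]
    for i, f in enumerate(feats):
        acc = alpha * acc + (1.0 - alpha) * f
        out[i] = acc
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal(48000)   # stand-in for one second of audio at 48 kHz
raw = frame_energy_db(x)
smooth = sigma_features(raw)
```

The smoothed series varies less from frame to frame than the raw one, which is what makes the subsequent classification more stable.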
P18-3 Loudness Descriptors to Characterize Wide Loudness-Range Material—Esben Skovenborg, Thomas Lund, TC Electronic A/S - Risskov, Denmark
Previously we introduced the concept of loudness descriptors: key numbers that summarize the loudness properties of a broadcast program or music track. This paper presents two such descriptors, Foreground Loudness and Loudness Range. Wide loudness-range material is typically level-aligned based on foreground sound rather than overall loudness: Foreground Loudness measures the level of the foreground sound, and Loudness Range quantifies the variation in loudness. We propose using these descriptors for loudness profiling and alignment, especially when live, raw, or film material is combined with other broadcast programs, thereby minimizing level jumps and preventing unnecessary dynamics processing. The loudness descriptors were computed for audio segments in both linear PCM and perceptually coded versions; this evaluation demonstrates that the descriptors are robust against nearly transparent transformations.
Convention Paper 7948 (Purchase now)
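Descriptors of this kind are typically statistics of the short-term loudness distribution. The sketch below illustrates the general idea with unweighted levels and arbitrary percentiles; the paper's exact definitions, gating, and weighting may differ:

```python
import numpy as np

def short_term_loudness(x, fs, win_s=3.0, hop_s=1.0):
    """Unweighted short-term level in dB over sliding windows (a real
    loudness meter would apply K-weighting per ITU-R BS.1770)."""
    win, hop = int(win_s * fs), int(hop_s * fs)
    levels = []
    for start in range(0, len(x) - win + 1, hop):
        seg = x[start : start + win]
        levels.append(10.0 * np.log10(np.mean(seg ** 2) + 1e-12))
    return np.array(levels)

def loudness_range(levels, lo_pct=10, hi_pct=95):
    """Spread between low and high percentiles of the level
    distribution, so brief silences and peaks do not dominate."""
    return np.percentile(levels, hi_pct) - np.percentile(levels, lo_pct)

def foreground_loudness(levels, pct=95):
    """Foreground level approximated by an upper percentile of the
    distribution (the loud, attention-carrying passages)."""
    return np.percentile(levels, pct)

fs = 8000
t = np.arange(10 * fs) / fs
tone = np.sin(2 * np.pi * 440.0 * t)
x = np.concatenate([0.01 * tone, tone])   # quiet passage, then loud
levels = short_term_loudness(x, fs)
lra = loudness_range(levels)              # close to 40 dB here
fg = foreground_loudness(levels)          # close to -3 dB here
```

Percentile-based statistics are also what makes such descriptors robust against nearly transparent transformations: small codec-induced level changes barely move the distribution's quantiles.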
P19 - Arrays
Monday, October 12, 10:00 am — 11:30 am
P19-1 Control of Acoustic Radiation Pattern in a Dual-Dipole Array—Mincheol Shin, Philip Nelson, University of Southampton - Highfield, Southampton, UK
This paper proposes a control strategy for generating various acoustic radiation patterns associated with a “personalized sound field” using an acoustic source array. To obtain better radiation patterns for generating the desired sound field, a dual-dipole array, which models each loudspeaker as an imperfect dipole, is introduced together with a source-signal control algorithm that considers both energy-difference maximization and radiation efficiency. With the proposed dual-dipole array and control algorithm, wide controllability is obtained from comparatively low to high frequencies, even though the array is small enough to implement in mobile applications. The conceptually novel control strategy also cancels the backward radiation efficiently. The performance of the dual-dipole source array is verified using computer simulations.
Convention Paper 7949 (Purchase now)
P19-2 A Novel Beam-Forming Loudspeaker System Using Digitally Driven Speaker System—Kyosuke Watanabe, Akira Yasuda, Hajime Ohtani, Ryota Suzuki, Naoto Shinkawa, Tomohiro Tsuchiya, Kenzo Tsuihiji, Hosei University - Koganei, Tokyo, Japan
In this paper we propose a beam-forming loudspeaker based on a digitally direct-driven loudspeaker system (digital-SP); the proposed speaker employs multi-bit delta-sigma modulation (DSM) in addition to a line speaker array and a delay circuit, and can be realized using only D flip-flops and the digital-SP. Delay elements are introduced between the mismatch shaper and the sub-loudspeakers, so beam-forming is realized without degrading the noise-shaping performance of the multi-bit DSM. With a small amount of additional hardware, the sound direction can be controlled easily. If a beam-forming loudspeaker can be driven digitally, all processing can be performed digitally without analog components such as power amplifiers, and a small, light, high-quality speaker system can be realized. A prototype was constructed using an FPGA, CMOS drivers, and a line speaker array, and its effectiveness was confirmed with measurement data: a gain of 8 dB or more relative to the normal digital speaker system, and an attenuation of 14.7 dB at 40° off-axis.
Convention Paper 7950 (Purchase now)
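The delay-based steering at the heart of such a system can be illustrated with a simple far-field sketch. Element count, spacing, clock rate, and steering angle below are hypothetical; the actual prototype applies the delays to delta-sigma bitstreams via chains of D flip-flops:

```python
import numpy as np

def steering_delays(n_elems, spacing_m, angle_deg, fs, c=343.0):
    """Per-element delays (in integer samples) that steer a line
    array's main lobe toward angle_deg."""
    n = np.arange(n_elems)
    tau = n * spacing_m * np.sin(np.radians(angle_deg)) / c
    tau -= tau.min()   # keep all delays non-negative (causal)
    return np.round(tau * fs).astype(int)

def array_gain(delays, spacing_m, look_deg, fs, freq, c=343.0):
    """Normalized far-field magnitude response of the delayed array
    observed from direction look_deg at a single frequency."""
    n = np.arange(len(delays))
    arrival = n * spacing_m * np.sin(np.radians(look_deg)) / c
    phase = 2 * np.pi * freq * (arrival - delays / fs)
    return abs(np.sum(np.exp(1j * phase))) / len(delays)
```

Rounding the delays to integer samples is what permits a pure flip-flop implementation; the higher the modulator clock rate, the finer the achievable steering resolution.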
P19-3 Speaker Array System Based on Equalization Method with a Quiet Zone—Soonho Baek, Myung-Suk Song, Yonsei University - Seoul, Korea; Seok-Pil Lee, Korea Electronics Technology Institute - Seongnam, Korea; Hong-Goo Kang, Yonsei University - Seoul, Korea
This paper proposes an equalization-based loudspeaker array system that forms a consistent sound spot for listeners in a reverberant environment. To overcome the poor sound quality of conventional beamforming methods under reverberation, the proposed method uses a novel criterion: reproduce sound as close as possible to the original source at the target point while creating nulls at specified points in the quiet zone. Simulation results with a 16-channel loudspeaker array system confirm the superiority of the proposed method. In addition, we verify that the sound pressure level in the quiet zone depends on the number of quiet points and the area they cover.
Convention Paper 7951 (Purchase now)
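The criterion described, matching a target pressure at the listening point while suppressing the quiet zone, can be written as a regularized least-squares problem. The sketch below uses random complex numbers as stand-ins for measured single-frequency room transfer functions; the dimensions and weighting are arbitrary illustrations, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_spk, n_quiet = 16, 8

# Hypothetical transfer functions at one frequency: g_target[j] from
# speaker j to the target point, G_quiet[i, j] from speaker j to the
# i-th quiet-zone point (stand-ins for simulated room responses).
g_target = rng.standard_normal(n_spk) + 1j * rng.standard_normal(n_spk)
G_quiet = (rng.standard_normal((n_quiet, n_spk))
           + 1j * rng.standard_normal((n_quiet, n_spk)))

# Minimize |g_target^T w - 1|^2 + lam*|G_quiet w|^2 + eps*|w|^2:
# unit pressure at the target, penalized pressure in the quiet zone.
lam, eps = 10.0, 1e-9
A = (np.outer(g_target.conj(), g_target)
     + lam * G_quiet.conj().T @ G_quiet
     + eps * np.eye(n_spk))
w = np.linalg.solve(A, g_target.conj())

p_target = g_target @ w   # pressure reproduced at the target point
p_quiet = G_quiet @ w     # residual pressure at the quiet points
```

With more loudspeakers than quiet points, the weight vector can live largely in the null space of G_quiet, so the quiet-zone pressure collapses while the target pressure stays near the desired value.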
P19-4 On the Secondary Source Type Mismatch in Wave Field Synthesis Employing Circular Distributions of Loudspeakers—Jens Ahrens, Sascha Spors, Deutsche Telekom Laboratories, Technische Universität Berlin - Berlin, Germany
The theory of wave field synthesis has been formulated for linear and planar arrays of loudspeakers but has been found to be applicable, with acceptable error, to arbitrary convex loudspeaker contours. The main source of error is that the required properties of the employed loudspeakers are dictated by the Neumann Green’s function of the array geometry under consideration. For nonlinear and nonplanar arrays a systematic error arises from the mismatch between the spatio-temporal transfer function of the loudspeakers and the Neumann Green’s function of the loudspeaker contour. We investigate this secondary-source type mismatch for the case of circular distributions of loudspeakers.
Convention Paper 7952 (Purchase now)
P19-5 A Configurable Microphone Array with Acoustically Transparent Omnidirectional Elements—Jonathan S. Abel, Nicholas J. Bryan, Travis Skare, Miriam Kolar, Patty Huang, Stanford University - Stanford, CA, USA; Darius Mostowfi, Countryman Associates, Inc. - Menlo Park, CA, USA; Julius O. Smith III, Stanford University - Stanford, CA, USA
An acoustically transparent, configurable microphone array with omnidirectional elements, designed for room acoustics analysis and synthesis and archaeological acoustics applications, is presented. Omnidirectional microphone elements with 2 mm-diameter capsules and 1 mm-diameter wire mounts produce a nearly acoustically transparent array and provide a simplified mathematical framework for processing measured signals. The wire mounts are fitted onto a 1.6 cm-diameter tube forming the microphone stand, with the microphones arranged above the tube so that acoustic energy can propagate freely across the array. The wire microphone mounts are somewhat flexible, allowing the array to be reconfigured. Detachable arms carrying small speakers are used to estimate the element positions to an accuracy better than the 2 mm microphone diameter.
Convention Paper 7953 (Purchase now)
P19-6 Microphone Array Synthetic Reconfiguration—Yoomi Hur, Stanford University - Stanford, CA, USA, and Yonsei University - Seoul, Korea; Jonathan S. Abel, Stanford University - Stanford, CA, USA; Young-cheol Park, Yonsei University - Wonju, Korea; Dae Hee Youn, Yonsei University - Seoul, Korea
This paper describes methods for processing signals recorded at a microphone array so as to estimate the signals that would have appeared at the elements of a different, colocated microphone array, i.e., "translating" measurements made at one microphone array to those hypothetically appearing at another array. Two approaches are proposed, a nonparametric method in which a fixed, low-sidelobe beamformer applied to the "source" array drives virtual sources rendered on the "target" array, and a parametric technique in which constrained beamformers are used to estimate source directions, with the sources extracted and rendered to the estimated directions. Finally, a hybrid method is proposed, which combines both approaches so that the extracted point sources and residual can be separately rendered. Experimental results using an array of 2 mm-diameter microphones and human HRTFs are reported as a simple example.
Convention Paper 7954 (Purchase now)
P19-7 Design and Optimization of High Directivity Waveguide for Vertical Array—Mario Di Cola, Audio Labs Systems - Casoli, CH, Italy; Dario Cinanni, Andrea Manzini, Tommaso Nizzoli, 18 Sound - Division of A.E.B. Srl - Cavriago, RE, Italy; Daniele Ponteggia, Studio Ponteggia - Terni, TR, Italy
Vertically arrayed loudspeaker systems have become widely used in several applications: concert sound, large-scale systems, corporate events, and so on. In this kind of system the design of a proper acoustic waveguide is key to the system's performance. An acoustic waveguide designed for this purpose should be optimized for several features at the same time: acoustic loading properties, proper driver-throat matching, minimal internal reflection, low distortion and, most of all, proper wavefront-curvature optimization for good array-ability. A practical approach to the design, dimensioning, and optimization of such a waveguide is shown through loudspeaker-system design principles together with computer simulations and measured final results.
Convention Paper 7955 (Purchase now)
P20 - Loudspeakers in Rooms
Monday, October 12, 10:30 am — 1:00 pm
Chair: Sunil Bharitkar, Audyssey Labs/USC - Los Angeles, CA, USA
P20-1 Investigation of Bonello Criteria for Use in Small Room Acoustics—Todd Welti, Harman International - Northridge, CA, USA
The Bonello Criteria are a set of conditions that attempt to use rectangular room dimensions as a general predictor of room modal-response quality. Though intuitively satisfying, and often used in room design, the Bonello Criteria make certain assumptions that are almost never met in listening rooms, and the approach has never been systematically validated. An investigation using a computer model and a large number of possible room dimensions was made to see whether meeting the Bonello Criteria results in improved low-frequency acoustical responses. Overall, the Bonello Criteria correlate only weakly with the Variance of Spatial Average, and show no correlation with Mean Spatial Variance.
Convention Paper 7849 (Purchase now)
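For reference, the modal computation underlying the criteria, rigid-wall rectangular-room eigenfrequencies binned into one-third-octave bands, can be sketched as follows. The band edges and the simple monotonicity check reflect one common reading of the criteria, not necessarily the variant tested in the paper:

```python
import numpy as np
from itertools import product

def mode_frequencies(Lx, Ly, Lz, fmax=200.0, c=343.0):
    """Axial, tangential, and oblique mode frequencies (Hz) of a
    rigid-walled rectangular room, up to fmax."""
    freqs = []
    n_max = int(2 * fmax * max(Lx, Ly, Lz) / c) + 1
    for nx, ny, nz in product(range(n_max + 1), repeat=3):
        if nx == ny == nz == 0:
            continue
        f = (c / 2) * np.sqrt((nx / Lx) ** 2 + (ny / Ly) ** 2 + (nz / Lz) ** 2)
        if f <= fmax:
            freqs.append(f)
    return sorted(freqs)

def bonello_counts(freqs, f0=20.0, fmax=200.0):
    """Number of modes per one-third-octave band."""
    edges = [f0]
    while edges[-1] < fmax:
        edges.append(edges[-1] * 2 ** (1 / 3))
    freqs = np.asarray(freqs)
    return [int(np.sum((freqs >= lo) & (freqs < hi)))
            for lo, hi in zip(edges[:-1], edges[1:])]

def satisfies_bonello(counts):
    """Bonello's first condition: the per-band mode count should never
    decrease with increasing band frequency."""
    return all(b >= a for a, b in zip(counts, counts[1:]))
```

The paper's point is that such dimension-only mode counts ignore source and listener placement and wall admittance, which is why the criteria correlate weakly with measured spatial response statistics.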
P20-2 Subwoofers in Rooms: Effect of Absorptive and Resonant Room Structures—Juha Backman, Nokia Corporation - Espoo, Finland
The room-loudspeaker interaction at low frequencies, where individual modes can be easily identified, needs careful consideration when a flat response and controlled spatial distribution are desired. The methods for controlling the low-frequency response are loudspeaker placement, use of multiple subwoofers, use of absorptive materials and, specific to low-frequency acoustics, use of resonators. The effects of various types of resonators and absorptive surfaces are computed using FEM for single and multiple subwoofer configurations in symmetrical and asymmetrical rooms. The results indicate that, for optimal placement of the resonators and/or absorbers, both the symmetry of the mode to be controlled and the loudspeaker placement must be taken into account.
Convention Paper 7957 (Purchase now)
P20-3 In Situ Measurements of Acoustic Absorption Coefficients Using the Surface Pressure Method—Scott Mallais, John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada
This paper revisits a method for determining the acoustic reflection factor from two pressure measurements: one at a surface under study and the other at a rigid surface in the same location in the room. The rigid surface is approximated in situ by placing a steel sheet in front of the surface under study, and measurements are made with and without the sheet at the same location. The ratio of these measurements is used to determine the acoustic reflection factor of the surface. The principle and limitations of the method are discussed, and experimental results are given for a rigid surface, a resonant surface, and an absorptive surface, measured in different environments.
Convention Paper 7958 (Purchase now)
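For ideal normal incidence the core of the method reduces to a frequency-domain ratio: the rigid sheet doubles the incident pressure, while a surface with reflection factor r produces (1 + r) times the incident pressure. The sketch below shows this idealized relation only, without the propagation, windowing, and impedance considerations the paper treats in detail:

```python
import numpy as np

def reflection_factor(p_surface, p_rigid, n_fft=None):
    """Frequency-dependent reflection factor from the ratio of the
    pressure at the test surface to that at the rigid (steel-sheet)
    reference: r(f) = 2 * P_s(f) / P_rigid(f) - 1 for normal incidence."""
    n = n_fft or max(len(p_surface), len(p_rigid))
    Ps = np.fft.rfft(p_surface, n)
    Pr = np.fft.rfft(p_rigid, n)
    Pr = np.where(np.abs(Pr) > 1e-12, Pr, 1e-12)  # avoid division by zero
    return 2.0 * Ps / Pr - 1.0

def absorption_coefficient(r):
    """Normal-incidence absorption coefficient alpha = 1 - |r|^2."""
    return 1.0 - np.abs(r) ** 2

# Synthetic check: a surface that reflects half the incident pressure.
incident = np.zeros(64)
incident[0] = 1.0                       # idealized incident impulse
p_rigid = 2.0 * incident                # pressure doubling at the sheet
p_surface = (1.0 + 0.5) * incident      # surface with r = 0.5
r = reflection_factor(p_surface, p_rigid)
alpha = absorption_coefficient(r)
```

In practice the two measurements also differ by reflections from the rest of the room, which is why the in-situ procedure requires the sheet and the surface to be measured at the same location with the same source.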
P20-4 The Challenge to Find the Optimum Radiation Pattern and Placement of Stereo Loudspeakers in a Room for the Creation of Phantom Sources and Simultaneous Masking of Real Sources—Siegfried Linkwitz, Linkwitz Lab - Corte Madera, CA, USA
Stereo sound reproduction relies upon the creation of an illusion. Ideally the two loudspeakers and the room disappear, leaving only a phantom acoustic scene to be listened to. The polar frequency response of a loudspeaker determines the angular distribution of room reflections and their spectral content. The placement of the loudspeakers relative to the room surfaces determines the initial delay of the reflections. Together they affect the formation of phantom sources. A proven loudspeaker and room configuration is proposed as a starting point for listening tests to determine the optimum loudspeaker radiation pattern. It is an invitation to extend our understanding of the psychoacoustic processes involved in stereo listening in a room and to replace anecdotal with scientific evidence.
Convention Paper 7959 (Purchase now)
P20-5 The Subjective and Objective Evaluation of Room Correction Products—Sean E. Olive, John Jackson, Allan Devantier, David Hunt, Harman International - Northridge, CA, USA
A panel of eight trained listeners gave comparative ratings for five different room correction products based on overall preference and spectral balance. The same loudspeaker/subwoofer without correction was included as a hidden anchor. The results show significant differences among the room correction products in terms of listener preference and perceived spectral balance. The subjective ratings are largely explained by a combination of anechoic and in-room frequency response measurements made on the combined acoustic response of the room correction/loudspeaker.
Convention Paper 7960 (Purchase now)