AES San Francisco 2012
Paper Session Details

P1 - Amplifiers and Equipment


Friday, October 26, 9:00 am — 11:00 am (Room 121)

Chair:
Jayant Datta, THX - San Francisco, CA, USA; Syracuse University - Syracuse, NY, USA

P1-1 A Low-Voltage Low-Power Output Stage for Class-G Headphone Amplifiers
Alexandre Huffenus, EASii IC - Grenoble, France
This paper proposes a new headphone amplifier circuit architecture whose output stage can be powered with very low supply rails, from ±1.8 V down to ±0.2 V. When used inside a Class-G amplifier, with the switched-mode power supply powering the output stage, the power consumption can be significantly reduced. For a typical listening level of 2 x 100 µW, the increase in power consumption compared to idle is only 0.7 mW, instead of 2.5 mW to 3 mW for existing amplifiers. In battery-powered devices like smartphones or portable music players, this can increase battery life by more than 15% during audio playback. Theory of operation, electrical performance, and a comparison with the current state of the art are detailed.
Convention Paper 8684 (Purchase now)

P1-2 Switching/Linear Hybrid Audio Power Amplifiers for Domestic Applications, Part 2: The Class-B+D Amplifier
Harry Dymond, University of Bristol - Bristol, UK; Phil Mellor, University of Bristol - Bristol, UK
The analysis and design of a series switching/linear hybrid audio power amplifier rated at 100 W into 8 Ω are presented. A high-fidelity linear stage controls the output, while the floating mid-point of the power supply for this linear stage is driven by a switching stage. This keeps the voltage across the linear stage output transistors low, enhancing efficiency. Analysis shows that the frequency responses of the linear and switching stages must be tightly matched to avoid saturation of the linear stage output transistors. The switching stage employs separate DC and AC feedback loops in order to minimize the adverse effects of the floating-supply reservoir capacitors, through which the switching stage output current must flow.
Convention Paper 8685 (Purchase now)

P1-3 Investigating the Benefit of Silicon Carbide for a Class D Power Stage
Verena Grifone Fuchs, University of Siegen - Siegen, Germany; CAMCO GmbH - Wenden, Germany; Carsten Wegner, University of Siegen - Siegen, Germany; CAMCO GmbH - Wenden, Germany; Sebastian Neuser, University of Siegen - Siegen, Germany; Dietmar Ehrhardt, University of Siegen - Siegen, Germany
This paper analyzes how silicon carbide transistors reduce the switching errors and losses associated with the power stage. A silicon carbide power stage and a conventional power stage with super-junction devices are compared in terms of switching behavior. Experimental results for switching transitions, delay times, and harmonic distortion, as well as a theoretical evaluation, are presented. By remedying the imperfections of the power stage, silicon carbide transistors show high potential for Class D audio amplification.
Convention Paper 8686 (Purchase now)

P1-4 Efficiency Optimization of Class G Amplifiers: Impact of the Input Signals
Patrice Russo, Lyon Institute of Nanotechnology - Lyon, France; Gael Pillonnet, University of Lyon - Lyon, France; CPE dept; Nacer Abouchi, Lyon Institute of Nanotechnology - Lyon, France; Sophie Taupin, STMicroelectronics, Inc. - Grenoble, France; Frederic Goutti, STMicroelectronics, Inc. - Grenoble, France
Class G amplifiers are an effective solution to increase audio efficiency for headphone applications, but realistic operating conditions have to be taken into account to predict and optimize power efficiency. In fact, power supply tracking, which is a key factor for high efficiency, is poorly optimized with the classical design method because the stimulus used is very different from a real audio signal. Here, a methodology is proposed to find class G nominal operating conditions. By using relevant stimuli and nominal output power, the simulation and test of the class G amplifier are closer to real conditions. Moreover, a novel simulator is used to quickly evaluate the efficiency with these long-duration stimuli, i.e., ten seconds instead of a few milliseconds. This allows longer transient simulation for accurate efficiency and audio quality evaluation by averaging the class G behavior. Based on this simulator, this paper indicates the limitations of the well-established test setup: real efficiencies deviate by up to ±50% from those predicted by the classical methods. Finally, the study underlines the need to use real audio signals to optimize the supply voltage tracking of class G amplifiers in order to achieve maximal efficiency in nominal operation.
Convention Paper 8687 (Purchase now)

 
 

P2 - Networked Audio


Friday, October 26, 9:00 am — 10:30 am (Room 122)

Chair:
Ellen Juhlin, Meyer Sound - Berkeley, CA, USA; AVnu Alliance

P2-1 Audio Latency Masking in Music Telepresence Using Artificial Reverberation
Ren Gang, University of Rochester - Rochester, NY, USA; Samarth Shivaswamy, University of Rochester - Rochester, NY, USA; Stephen Roessner, University of Rochester - Rochester, NY, USA; Akshay Rao, University of Rochester - Rochester, NY, USA; Dave Headlam, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
Network latency poses significant challenges in music telepresence systems designed to enable multiple musicians at different locations to perform together in real time. Since each musician hears a delayed version of the performance from the other musicians, it is difficult to maintain synchronization, and there is a natural tendency for the musicians to slow their tempo while awaiting responses from their fellow performers. We asked whether the introduction of artificial reverberation can enable musicians to better tolerate latency by conducting experiments with performers in which the degree of latency was controllable and artificial reverberation could be added or not. Both objective and subjective evaluations of the ensemble performances were conducted to assess the perceptual responses at the different experimental settings.
Convention Paper 8688 (Purchase now)

P2-2 Service Discovery Using Open Sound Control
Andrew Eales, Wellington Institute of Technology - Wellington, New Zealand; Rhodes University - Grahamstown, South Africa; Richard Foss, Rhodes University - Grahamstown, Eastern Cape, South Africa
The Open Sound Control (OSC) control protocol does not have service discovery capabilities. The approach to adding service discovery to OSC proposed in this paper uses the OSC address space to represent services within the context of a logical device model. This model allows services to be represented in a context-sensitive manner by relating parameters representing services to the logical organization of a device. Implementation of service discovery is done using standard OSC messages and requires that the OSC address space be designed to support these messages. This paper illustrates how these enhancements to OSC allow a device to advertise its services. Controller applications can then explore the device’s address space to discover services and retrieve the services required by the application.
Convention Paper 8689 (Purchase now)

P2-3 Flexilink: A Unified Low Latency Network Architecture for Multichannel Live Audio
Yonghao Wang, Birmingham City University - Birmingham, UK; John Grant, Nine Tiles Networks Ltd. - Cambridge, UK; Jeremy Foss, Birmingham City University - Birmingham, UK
The networking of live audio for professional applications typically uses layer 2-based solutions such as AES50 and MADI utilizing fixed time slots similar to Time Division Multiplexing (TDM). However, these solutions are not effective for best effort traffic where data traffic utilizes available bandwidth and is consequently subject to variations in QoS. There are audio networking methods such as AES47, which is based on asynchronous transfer mode (ATM), but ATM equipment is rarely available. Audio can also be sent over Internet Protocol (IP), but the size of the packet headers and the difficulty of keeping latency within acceptable limits make it unsuitable for many applications. In this paper we propose a new unified low latency network architecture that supports both time deterministic and best effort traffic toward full bandwidth utilization with high performance routing/switching. For live audio, this network architecture allows low latency as well as the flexibility to support multiplexing multiple channels with different sampling rates and word lengths.
Convention Paper 8690 (Purchase now)

 
 

P3 - Audio Effects and Physical Modeling


Friday, October 26, 10:00 am — 11:30 am (Foyer)

P3-1 Luciverb: Iterated Convolution for the Impatient
Jonathan S. Abel, Stanford University - Stanford, CA, USA; Michael J. Wilson, Stanford University - Stanford, CA, USA
An analysis of iteratively applied room acoustics, used by Alvin Lucier to create his piece "I'm Sitting in a Room," is presented, and a real-time system allowing interactive control over the number of rooms in the processing chain is described. Lucier anticipated that repeated application of a room response would bring out room resonances and smear the input sound over time. What was unexpected was the character of the smearing: a transient input is turned into a sequence of crescendos at the room modes, ordered from high frequency to low frequency. Here, a room impulse response convolved with itself L times is shown to have energy at the room modes, each with a roughly Gaussian envelope, peaking at the observed L/2 times the frequency-dependent decay time.
Convention Paper 8691 (Purchase now)
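The iterated-room effect described above can be approximated offline by raising the impulse response's spectrum to a power, since L convolutions in time are an L-th power in frequency. A minimal NumPy sketch (the function name and normalization are illustrative, not the authors' real-time system):

```python
import numpy as np

def iterate_room(ir, count):
    """Apply a room impulse response `count + 1` times in a row
    (the original pass plus `count` re-recordings) by raising its
    spectrum to a power -- a crude offline stand-in for Lucier's
    iterated re-recording process."""
    applications = count + 1
    full_len = applications * len(ir) - (applications - 1)
    nfft = 1 << (full_len - 1).bit_length()       # next power of two
    spectrum = np.fft.rfft(ir, nfft)
    out = np.fft.irfft(spectrum ** applications, nfft)[:full_len]
    return out / np.max(np.abs(out))              # peak-normalize
```

Zero-padding to `nfft` keeps the circular FFT convolution equal to the linear one, so the result matches `count` explicit convolutions of the response with itself.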

P3-2 A Tilt Filter in a Servo Loop
John Lazzaro, University of California, Berkeley - Berkeley, CA, USA; John Wawrzynek, University of California, Berkeley - Berkeley, CA, USA
Tone controls based on the tilt filter first appeared in 1982, in the Quad 34 Hi-Fi preamp. More recently, tilt filters have found a home in specialist audio processors such as the Elysia mpressor. This paper describes a novel dynamic filter design based on a tilt filter. A control system sets the tilt slope of the filter, in order to servo the spectral median of the filter output to a user-specified target. Users also specify a tracking time. Potential applications include single-instrument processing (in the spirit of envelope filters) and mastering (for subtle control of tonal balance). Although we have prototyped the design as an AudioUnit plug-in, the architecture is also a good match for analog circuit implementation.
Convention Paper 8692 (Purchase now)
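To illustrate the tilt characteristic itself (not the authors' servo design), a tilt can be sketched in the frequency domain as a gain that changes by a constant number of dB per octave around a pivot frequency; a servo loop would then adjust the slope to steer the output's spectral median toward a target. The function name and parameters below are assumptions for illustration:

```python
import numpy as np

def tilt_filter(x, slope_db_per_octave, pivot_hz, fs=48000):
    """Frequency-domain tilt EQ sketch: gain varies linearly in dB
    per octave around `pivot_hz`, with unity gain at the pivot."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    octaves = np.log2(np.maximum(freqs, 1.0) / pivot_hz)  # clamp DC
    gain = 10.0 ** (slope_db_per_octave * octaves / 20.0)
    return np.fft.irfft(spec * gain, len(x))
```

A real-time or analog realization would use shelving filter sections rather than a full-length FFT, but the spectral behavior is the same.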

P3-3 Multitrack Mixing Using a Model of Loudness and Partial Loudness
Dominic Ward, Birmingham City University - Birmingham, UK; Joshua D. Reiss, Queen Mary University of London - London, UK; Cham Athwal, Birmingham City University - Birmingham, UK
A method for generating a mix of multitrack recordings using an auditory model has been developed. The proposed method is based on the concept that a balanced mix is one in which the loudness of all instruments is equal. A sophisticated psychoacoustic loudness model is used to measure the loudness of each track, both in quiet and when mixed with any combination of the remaining tracks. These measures are used to control the track gains in a time-varying manner. Finally, we demonstrate how model predictions of partial loudness can be used to counteract energetic masking for any track, allowing the user to achieve better channel intelligibility in complex music mixtures.
Convention Paper 8693 (Purchase now)
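The core balancing idea can be sketched with a much cruder loudness proxy. The following toy (RMS in place of the psychoacoustic model; function name hypothetical) computes per-track gains that equalize level, which is the skeleton the paper's model-driven, time-varying gain control builds on:

```python
import numpy as np

def equal_loudness_gains(tracks, eps=1e-12):
    """Per-track gains that equalize a crude loudness proxy (RMS).
    The paper instead drives this loop with a psychoacoustic model
    of loudness and partial loudness, updated over time."""
    rms = np.array([np.sqrt(np.mean(t ** 2)) for t in tracks]) + eps
    target = np.exp(np.mean(np.log(rms)))    # geometric-mean level
    return target / rms
```

After applying the gains, every track sits at the geometric-mean level, so no single instrument dominates by raw level alone.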

P3-4 Predicting the Fluctuation Strength of the Output of a Spatial Chorus Effects Processor
William L. Martens, University of Sydney - Sydney, NSW, Australia; Robert W. Taylor, University of Sydney - Sydney, NSW, Australia; Luis Miranda, University of Sydney - Sydney, NSW, Australia
The experimental study reported in this paper was motivated by an exploration of a set of related audio effects comprising what has been called “spatial chorus.” In contrast to a single-output, delay-modulation-based effects processor that produces a limited range of results, complex spatial imagery is produced when parallel processing channels are subjected to incoherent delay modulation. In order to develop a more adequate user interface for control of such “spatial chorus” effects processing, a systematic investigation of the relationship between algorithmic parameters and perceptual attributes was undertaken. The starting point for this investigation was to perceptually scale the amount of modulation present in a set of characteristic stimuli in terms of the auditory attribute that Fastl and Zwicker called “fluctuation strength.”
Convention Paper 8694 (Purchase now)

P3-5 Computer-Aided Estimation of the Athenian Agora Aulos Scales Based on Physical Modeling
Areti Andreopoulou, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA
This paper presents an approach to scale estimation for the ancient Greek Aulos with the use of physical modeling. The system is based on manipulation of a parameter set that is known to affect the sound of woodwind instruments, such as the reed type, the active length of the pipe, its inner and outer diameters, and the placement and size of the tone-holes. The method is applied to a single Aulos pipe reconstructed from the Athenian Agora fragments. A discussion follows on the resulting scales and the system's advantages and limitations.
Convention Paper 8695 (Purchase now)

P3-6 A Computational Acoustic Model of the Coupled Interior Architecture of Ancient Chavín
Regina E. Collecchia, Stanford University - Stanford, CA, USA; Miriam A. Kolar, Stanford University - Stanford, CA, USA; Jonathan S. Abel, Stanford University - Stanford, CA, USA
We present a physical, modular computational acoustic model of the well-preserved interior architecture at the 3,000-year-old Andean ceremonial center Chavín de Huántar. Our previous model prototype [Kolar et al. 2010] translated the acoustically coupled topology of Chavín gallery forms to a model based on digital waveguides (bi-directional by definition), representing passageways, connected through reverberant scattering junctions, representing the larger room-like areas. Our new approach treats all architectural units as "reverberant" digital waveguides, with scattering junctions at the discrete planes defining the unit boundaries. In this extensible and efficient lumped-element model, we combine architectural dimensional and material data with sparsely measured impulse responses to simulate multiple and circulating arrival paths between sound sources and listeners.
Convention Paper 8696 (Purchase now)

P3-7 Simulating an Asymmetrically Saturated Nonlinearity Using an LNLNL Cascade
Keun Sup Lee, DTS, Inc. - Los Gatos, CA, USA; Jonathan S. Abel, Stanford University - Stanford, CA, USA
The modeling of a weakly nonlinear system having an asymmetric saturating nonlinearity is considered, and a computationally efficient model is proposed. The nonlinear model is a cascade of linear filters and memoryless nonlinearities, an LNLNL system. The two nonlinearities are upward and downward saturators, limiting, respectively, the amplitude of their input for positive or negative excursions. In this way, distortion in each half of an input sinusoid can be separately controlled. This simple model is applied to simulating the signal chain of the Echoplex EP-4 tape delay, where informal listening tests showed excellent agreement between recorded and simulated program material.
Convention Paper 8697 (Purchase now)
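A minimal sketch of such a cascade follows; the saturator shapes (tanh) and filter coefficients are placeholders, not the paper's EP-4 model:

```python
import numpy as np

def upward_saturator(x, limit=1.0):
    # soft-limits positive excursions only; negatives pass unchanged
    return np.where(x > 0.0, limit * np.tanh(x / limit), x)

def downward_saturator(x, limit=1.0):
    # soft-limits negative excursions only
    return -upward_saturator(-x, limit)

def lnlnl(x, pre, mid, post, up_limit=1.0, down_limit=0.5):
    """Linear -> Nonlinear -> Linear -> Nonlinear -> Linear cascade;
    `pre`, `mid`, and `post` are FIR coefficient arrays."""
    y = np.convolve(x, pre, mode="same")
    y = upward_saturator(y, up_limit)
    y = np.convolve(y, mid, mode="same")
    y = downward_saturator(y, down_limit)
    return np.convolve(y, post, mode="same")
```

Setting different limits for the two saturators distorts the positive and negative halves of a sinusoid independently, which is the asymmetry the abstract describes.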

P3-8 Coefficient Interpolation for the Max Mathews Phasor Filter
Dana Massie, Audience, Inc. - Mountain View, CA, USA
Max Mathews described what he named the "phasor filter," a flexible building block for computer music with many desirable properties. It can be used as an oscillator, a filter, or a hybrid of both. Analysis methods exist to derive synthesis parameters for filter banks based on the phasor filter, for percussive sounds. The phasor filter can be viewed as a complex multiply, as a rotation and scaling of a 2-element vector, or as a real-valued MIMO (multiple-input, multiple-output) 2nd-order filter with excellent numeric properties (low noise gain). In addition, it has been proven that the phasor filter is unconditionally stable under time-varying parameter modifications, which is not true of many common filter topologies. A disadvantage of the phasor filter is the cost of calculating the coefficients, which requires a sine and cosine in the general case. If pre-calculated coefficients are interpolated using linear interpolation, then the poles follow a trajectory that causes the filter to lose resonance. A method is described to interpolate coefficients using a complex multiplication that preserves the filter resonance.
Convention Paper 8698 (Purchase now)
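The contrast between the two interpolation schemes can be sketched directly on the complex coefficient. This is an illustration of the general idea, not Massie's specific method; with the phasor coefficient written as a pole c = R·e^(jθ), linear interpolation moves along a chord (the radius sags mid-sweep), while multiplying by a fixed complex ratio each step keeps a constant radius:

```python
import numpy as np

def interp_linear(c0, c1, steps):
    """Straight-line interpolation between complex coefficients:
    the pole radius dips mid-trajectory, so the filter loses
    resonance during a sweep."""
    t = np.linspace(0.0, 1.0, steps)
    return (1.0 - t) * c0 + t * c1

def interp_rotational(c0, c1, steps):
    """Interpolation by repeated complex multiplication: each step
    multiplies by a fixed ratio, so a sweep between two poles of
    equal radius keeps that radius (and the resonance) throughout."""
    step = (c1 / c0) ** (1.0 / (steps - 1))
    return c0 * step ** np.arange(steps)
```

The rotational update costs one complex multiply per sample, avoiding the per-sample sine/cosine the abstract identifies as the expensive path.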

P3-9 The Dynamic Redistribution of Spectral Energies for Upmixing and Re-Animation of Recorded Audio
Christopher J. Keyes, Hong Kong Baptist University - Kowloon, Hong Kong
This paper details a novel approach to upmixing any n channels of audio to any arbitrary n+ channels of audio using frequency-domain processing to dynamically redistribute spectral energies across however many channels of audio are available. Although primarily an upmixing technique, the process may also help the recorded audio regain the sense of “liveliness” that one encounters in concerts of acoustic music, partially mimicking the effects of sound spectra being redistributed throughout a hall due to the dynamically changing radiation patterns of the instruments and the movements of the instruments themselves, during performance and recording. Preliminary listening tests reveal listeners prefer this technique 3 to 1 over a more standard upmixing technique.
Convention Paper 8699 (Purchase now)
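One crude way to realize frequency-domain redistribution is to split each short-time spectrum into interleaved bin groups and rotate the group-to-channel assignment over time. The sketch below is an assumption-laden toy (function, parameters, and the rotation rule are all illustrative, not the paper's algorithm):

```python
import numpy as np

def redistribute_spectrum(x, n_out, fft_size=1024, hop=512):
    """Toy 1-to-n_out upmix: each short-time spectrum is split into
    `n_out` interleaved groups of bins, and the group-to-channel
    assignment rotates frame by frame so spectral energy wanders
    across the output channels."""
    window = np.hanning(fft_size)
    n_frames = (len(x) - fft_size) // hop + 1
    out = np.zeros((n_out, len(x)))
    for f in range(n_frames):
        seg = x[f * hop : f * hop + fft_size] * window
        spec = np.fft.rfft(seg)
        groups = (np.arange(len(spec)) + f) % n_out  # rotating assignment
        for ch in range(n_out):
            masked = np.where(groups == ch, spec, 0.0)
            frame = np.fft.irfft(masked, fft_size) * window
            out[ch, f * hop : f * hop + fft_size] += frame
    return out
```

Because every bin lands in exactly one channel per frame, the channels sum back to (a windowed overlap-add of) the input while individual loudspeakers receive time-varying slices of the spectrum.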

P3-10 Matching Artificial Reverb Settings to Unknown Room Recordings: A Recommendation System for Reverb Plugins
Nils Peters, International Computer Science Institute - Berkeley, CA, USA; University of California Berkeley - Berkeley, CA, USA; Jaeyoung Choi, International Computer Science Institute - Berkeley, CA, USA; Howard Lei, International Computer Science Institute - Berkeley, CA, USA
For creating artificial room impressions, numerous reverb plugins exist and are often controllable by many parameters. To efficiently create a desired room impression, the sound engineer must be familiar with all the available reverb setting possibilities. Although plugins are usually equipped with many factory presets for exploring available reverb options, it is a time-consuming learning process to find the ideal reverb settings to create the desired room impression, especially if various reverberation plugins are available. For creating a desired room impression based on a reference audio sample, we present a method to automatically determine the best matching reverb preset across different reverb plugins. Our method uses a supervised machine-learning approach and can dramatically reduce the time spent on the reverb selection process.
Convention Paper 8700 (Purchase now)

 
 

P4 - Audio in Education


Friday, October 26, 10:30 am — 11:30 am (Room 122)

Chair:
Jason Corey, University of Michigan - Ann Arbor, MI, USA

P4-1 Teaching Audio Processing Software Development in the Web Browser
Matthias Birkenstock, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany; Jörn Loviscach, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany
Web-based learning is progressing from lectures and videos to hands-on software development problems that are to be solved interactively in the browser. This work looks into the technical underpinnings required and available to support teaching the mathematics of audio processing in this fashion. All intensive computations are to happen on the client to minimize the amount of data transfer and the computational load on the server. This setup requires editing source code and executing it in the browser, handling the audio computation in the browser, and playing back and visualizing computed waveforms. Validating the user's solution cannot be done by a sample-by-sample comparison with a "correct" result, but requires a tolerant comparison based on psychoacoustic features.
Convention Paper 8701 (Purchase now)

P4-2 Distance Learning Strategies for Sound Recording Technology
Duncan Williams, University of Plymouth - Plymouth, UK
This paper addresses the design of a full credit remote access module as part of an undergraduate degree course in Sound Recording Technology at a public university in Texas. While audio engineering has been historically regarded by industry, and to a certain extent the corresponding educational sector, as a vocational skill, and as such, one that must be learned in practice, the client university required that all content be delivered, facilitated, and assessed entirely electronically—a challenge that necessitated a number of particular pedagogical approaches. This work focuses on the advantages and disadvantages of such a system for technical and vocational content delivery in practice.
Convention Paper 8703 (Purchase now)

 
 

P5 - Measurement and Models


Friday, October 26, 2:00 pm — 6:00 pm (Room 121)

Chair:
Louis Fielder, Dolby - San Francisco, CA, USA

P5-1 Measurement of Harmonic Distortion Audibility Using a Simplified Psychoacoustic Model
Steve Temme, Listen, Inc. - Boston, MA, USA; Pascal Brunet, Listen, Inc. - Boston, MA, USA; Parastoo Qarabaqi, Listen, Inc. - Boston, MA, USA
A perceptual method is proposed for measuring harmonic distortion audibility. This method is similar to the CLEAR (Cepstral Loudness Enhanced Algorithm for Rub & buzz) algorithm previously proposed by the authors as a means of detecting audible Rub & Buzz, which is an extreme type of distortion [1,2]. Both methods are based on the Perceptual Evaluation of Audio Quality (PEAQ) standard [3]. In the present work, in order to estimate the audibility of regular harmonic distortion, additional psychoacoustic variables are added to the CLEAR algorithm. These variables are then combined using an artificial neural network approach to derive a metric that is indicative of the overall audible harmonic distortion. Experimental results on headphones are presented to justify the accuracy of the model.
Convention Paper 8704 (Purchase now)

P5-2 Overview and Comparison of and Guide to Audio Measurement Methods
Gregor Schmidle, NTi Audio AG - Schaan, Liechtenstein; Danilo Zanatta, NTi Audio AG - Schaan, Liechtenstein
Modern audio analyzers offer a large number of measurement functions using various measurement methods. This paper categorizes measurement methods from several perspectives. The underlying signal processing concepts, as well as strengths and weaknesses of the most popular methods are listed and assessed for various aspects. The reader is offered guidance for choosing the optimal measurement method based on the specific requirements and application.
Convention Paper 8705 (Purchase now)

P5-3 Spherical Sound Source for Acoustic Measurements
Plamen Valtchev, Univox - Sofia, Bulgaria; Dimitar Dimitrov, BMS Production; Rumen Artarski, Thrax - Sofia, Bulgaria
A spherical sound source for acoustic measurements is proposed, consisting of a pair of coaxial loudspeakers and a pair of compression drivers radiating into a common radially expanding horn covering the full 360-degree horizontal plane. The horn's vertical radiation pattern is defined by the enclosures of the LF arrangement. The LF membranes radiate spherically in the 50 to 500 Hz band, whereas the HF components complete the horizontal horn's ellipsoid-like reference diagram in both vertical directions to a spherical one. The assembly has axial symmetry, and thus a perfect horizontal polar pattern. The vertical pattern is well within ISO 3382 specifications, even without any "gliding." Comparative measurements against a purpose-built typical dodecahedron revealed superior directivity, sound power capability, and distortion performance.
Convention Paper 8706 (Purchase now)

P5-4 Low Frequency Noise Reduction by Synchronous Averaging under Asynchronous Measurement System in Real Sound Field
Takuma Suzuki, Etani Electronics Co., Ltd. - Ohta-ku, Tokyo, Japan; Hiroshi Koide, Etani Electronics Co., Ltd. - Ohta-ku, Tokyo, Japan; Akihiko Shoji, Etani Electronics Co., Ltd. - Ohta-ku, Tokyo, Japan; Kouichi Tsuchiya, Etani Electronics Co., Ltd. - Ohta-ku, Tokyo, Japan; Tomohiko Endo, Etani Electronics Co., Ltd. - Ohta-ku, Tokyo, Japan; Shokichiro Hino, Etani Electronics Co., Ltd. - Ohta-ku, Tokyo, Japan
An important feature of synchronous averaging is the synchronization of the sampling clocks of the transmitting and receiving devices (e.g., the D/A and A/D converters). However, when the devices are placed far apart, synchronization becomes difficult to achieve. For such circumstances, an effective method is proposed that enables synchronization in an asynchronous measurement environment. Normally a swept sine is employed as the measurement signal, but because its power spectrum is flat, the signal-to-noise ratio (SNR) is decreased in a real environment with high levels of low-frequency noise. To solve this, the devised method enhances the signal source power at low frequencies and places random fluctuations in the repetition period of the signal source. Its practicability was subsequently verified.
Convention Paper 8707 (Purchase now)
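The synchronous-averaging core itself is simple once the clocks are aligned; a minimal sketch (function name hypothetical) shows why repetition buys SNR:

```python
import numpy as np

def synchronous_average(recording, period, repeats):
    """Average `repeats` consecutive periods of a repetitive
    measurement signal. The deterministic part is reinforced while
    uncorrelated noise power drops by a factor of `repeats`
    (amplitude ~ 1/sqrt(repeats)) -- provided the sampling clocks
    stay synchronized, which is this paper's central difficulty."""
    frames = recording[:period * repeats].reshape(repeats, period)
    return frames.mean(axis=0)
```

With asynchronous converter clocks the period boundaries drift, the frames no longer line up, and the deterministic part smears instead of reinforcing; the paper's contribution is making this averaging workable in that asynchronous case.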

P5-5 Measurement and Analysis of the Spectral Directivity of an Electric Guitar Amplifier: Vertical Plane
Agnieszka Roginska, New York University - New York, NY, USA; Justin Mathew, New York University - New York, NY, USA; Andrew Madden, New York University - New York, NY, USA; Jim Anderson, New York University - New York, NY, USA; Alex U. Case, fermata audio + acoustics - Portsmouth, NH, USA; University of Massachusetts—Lowell - Lowell, MA, USA
Previous work presented the radiation pattern measurement of an electric guitar amplifier densely sampled spatially on a 3-D grid. Results were presented of the directionally dependent spectral features on-axis with the driver, as a function of left/right position, and distance. This paper examines the directionally dependent features of the amplifier measured at the center of the amplifier, in relationship to the height and distance placement of the microphone. Differences between acoustically measured and estimated frequency responses are used to study the change in the acoustic field. This work results in a better understanding of the spectral directivity of the electric guitar amplifier in all three planes.
Convention Paper 8708 (Purchase now)

P5-6 The Radiation Characteristics of a Horizontally Asymmetrical Waveguide that Utilizes a Continuous Arc Diffraction Slot
Soichiro Hayashi, Bose Corporation - Framingham, MA, USA; Akira Mochimaru, Bose Corporation - Framingham, MA, USA; Paul F. Fidlin, Bose Corporation - Framingham, MA, USA
One of the unique requirements for sound reinforcement speaker systems is the need for flexible coverage control—sometimes this requires an asymmetrical pattern. Vertical control can be achieved by arraying sound sources, but in the horizontal plane, a horizontally asymmetrical waveguide may be the best solution. In this paper the radiation characteristics of horizontally asymmetrical waveguides with continuous arc diffraction slots are discussed. Waveguides with several different angular variations are developed and their radiation characteristics are measured. Symmetrical and asymmetrical waveguides are compared, and the controllable frequency range and limitations are discussed.
Convention Paper 8709 (Purchase now)

P5-7 Analysis on Multiple Scattering between the Rigid-Spherical Microphone Array and Nearby Surface in Sound Field Recording
Guangzheng Yu, South China University of Technology - Guangzhou, Guangdong, China; Bo-sun Xie, South China University of Technology - Guangzhou, China; Yu Liu, South China University of Technology - Guangzhou, China
Sound field recording with a rigid spherical microphone array (RSMA) is a newly developed technique. In room sound field recording, when an RSMA is close to a reflective surface, such as a wall or floor, multiple scattering between the RSMA and the surface occurs and accordingly causes errors in the recorded signals. Based on the mirror-image principle of acoustics, an equivalent two-sphere model is suggested, and the multipole expansion method is applied to analyze the multiple scattering between the RSMA and the reflective surface. Using an RSMA with 50 microphones, the relationships between the error in the RSMA output signals caused by multiple scattering and the frequency, the direction of the incident plane wave, and the distance of the RSMA from the reflective surface are analyzed.
Convention Paper 8710 (Purchase now)

P5-8 Calibration of Soundfield Microphones Using the Diffuse-Field Response
Aaron Heller, SRI International - Menlo Park, CA, USA; Eric M. Benjamin, Surround Research - Pacifica, CA, USA
The soundfield microphone utilizes an array of microphones to derive various components of the sound field to be recorded or measured. Given that at high frequencies the response varies with the angle of incidence, it may be argued that any angle of incidence is as important as another, and thus it is important to achieve a calibration that achieves an optimum perceived response characteristic. Gerzon noted that "Above a limiting frequency F ≈ c/(π r) [. . .] it is found best to equalise the nominal omni and figure-of-eight outputs for an approximately flat response to homogeneous random sound fields." In practice, however, soundfield microphones have been calibrated to realize a flat axial response. The present work explores the theoretical ramifications of using a diffuse-field equalization target as opposed to a free-field equalization target and provides two practical examples of diffuse-field equalization of tetrahedral microphone arrays.
Convention Paper 8711 (Purchase now)
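Gerzon's limiting frequency quoted above is a one-line computation; a sketch (function name illustrative) makes the scale concrete:

```python
import math

def diffuse_field_limit_hz(radius_m, c=343.0):
    """Gerzon's limiting frequency F = c / (pi * r) for an array of
    radius r, above which equalizing to a flat diffuse-field
    (rather than free-field) response is found best."""
    return c / (math.pi * radius_m)
```

For a tetrahedral capsule array with an effective radius on the order of 1 cm, this lands near 11 kHz, i.e., well inside the audible band, which is why the choice of equalization target matters in practice.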

 
 

P6 - Spatial Audio Over Loudspeakers


Friday, October 26, 2:00 pm — 6:00 pm (Room 122)

Chair:
Rhonda Wilson, Dolby Laboratories - San Francisco, CA, USA

P6-1 Higher Order Loudspeakers for Improved Surround Sound Reproduction in Rooms
Mark A. Poletti, Industrial Research Limited - Lower Hutt, New Zealand; Terence Betlehem, Industrial Research Limited - Lower Hutt, New Zealand; Thushara Abhayapala, Australian National University - Canberra, ACT, Australia
Holographic surround sound systems aim to accurately reproduce a recorded field in a small region of space around one or more listeners. This is possible at low frequencies with well-matched loudspeakers and acoustically treated rooms. At high frequencies the region of accurate reproduction shrinks and source localization is compromised. Furthermore, in typical rooms, reflections compromise quality. High-quality reproduction therefore requires large numbers of loudspeakers and the use of techniques to reduce unwanted reverberation. This paper considers the use of higher-order loudspeakers that have multiple modes of radiation to offer an extended frequency range and zone of accurate reproduction. In addition, if a higher-order microphone is used for calibration, room effects can be effectively removed.
Convention Paper 8712 (Purchase now)

P6-2 A Model for Rendering Stereo Signals in the ITD-Range of Hearing
Siegfried Linkwitz, Linkwitz Lab - Corte Madera, CA, USA
Live sounds at a concert have spatial relationships to each other and to their environment. The specific microphone technique used for recording the sounds, the placement and directional properties of the playback loudspeakers, and the room’s response determine the signals at the listener’s ears and thus the rendering of the concert recording. For the frequency range, in which Inter-aural Time Differences dominate directional hearing, a free-field transmission line model will be used to predict the placement of phantom sources between two loudspeakers. Level panning and time panning of monaural sources are investigated. Effectiveness and limitations of different microphone pairs are shown. Recording techniques can be improved by recognizing fundamental requirements for spatial rendering. Observations from a novel 4-loudspeaker setup are presented. It provides enhanced spatial rendering of 2-channel sound.
Convention Paper 8713 (Purchase now)
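As background to the level-panning predictions discussed above, the placement of a phantom source between two loudspeakers in the ITD range is often approximated by the classical stereophonic "law of sines." The sketch below illustrates that generic textbook model only; it is not the author's transmission-line model:

```python
import math

def phantom_azimuth_deg(g_left, g_right, half_base_deg=30.0):
    """Classical stereophonic 'law of sines' for level panning:
    sin(phi) / sin(phi0) = (gL - gR) / (gL + gR),
    where phi0 is the half-angle of the loudspeaker base.
    Returns the predicted phantom-source azimuth in degrees
    (positive toward the left loudspeaker)."""
    ratio = (g_left - g_right) / (g_left + g_right)
    s = ratio * math.sin(math.radians(half_base_deg))
    return math.degrees(math.asin(s))
```

With equal gains the phantom source sits at 0 degrees; panning fully to one channel moves it to the loudspeaker position (here ±30 degrees).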

P6-3 A Method for Reproducing Frontal Sound Field of 22.2 Multichannel Sound Utilizing a Loudspeaker Array FrameHiroyuki Okubo, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Takehiro Sugimoto, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Tokyo Institute of Technology - Midori-ku, Yokohama, Japan; Satoshi Oishi, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Akio Ando, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan
NHK has been developing Super Hi-Vision (SHV), an ultrahigh-definition TV system with a 7,680 x 4,320 pixel video image and a 22.2 multichannel three-dimensional sound system. A loudspeaker array frame (LAF) integrated into a flat panel display can synthesize the wavefront of a frontal sound source and localize sound images on the display and behind the viewer by simulating sound propagation characteristics. This makes it possible to listen to 22.2 multichannel sound without installing 24 discrete loudspeakers surrounding the listener in the room. In this paper we describe a prototype of the LAF and its performance, focusing on frontal sound reproduction.
Convention Paper 8714 (Purchase now)

P6-4 Low-Frequency Temporal Accuracy of Small-Room Sound ReproductionAdam J. Hill, University of Derby - Derby, Derbyshire, UK; Malcolm O. J. Hawksford, University of Essex - Colchester, Essex, CO4 3SQ, UK
Small-room sound reproduction is strongly affected by room-modes in the low-frequency band. While the spectral impact of room-modes is well understood, there is less information on how modes degrade the spatiotemporal response of a sound reproduction system. This topic is investigated using a bespoke finite-difference time-domain (FDTD) simulation toolbox to virtually test common subwoofer configurations using tone bursts to judge waveform fidelity over a wide listening area. Temporal accuracy is compared to the steady-state frequency response to determine any link between the two domains. The simulated results are compared to practical measurements for validation.
Convention Paper 8715 (Purchase now)
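The FDTD method mentioned above marches pressure and particle velocity forward in time on staggered grids. A minimal 1-D illustration of that standard scheme (rigid boundaries, impulse excitation; the paper itself uses a bespoke 3-D toolbox, which this sketch does not reproduce):

```python
import numpy as np

def fdtd_1d_impulse(n_cells=200, n_steps=150, c=343.0, dx=0.05, rho=1.21):
    """Minimal 1-D acoustic FDTD sketch: leapfrog update of pressure
    (cell centers) and velocity (cell faces), rigid (u = 0) ends."""
    dt = dx / c                     # Courant number = 1, stable in 1-D
    p = np.zeros(n_cells)           # pressure at cell centers
    u = np.zeros(n_cells + 1)       # velocity at cell faces
    p[n_cells // 2] = 1.0           # impulse in the middle of the domain
    for _ in range(n_steps):
        u[1:-1] -= dt / (rho * dx) * (p[1:] - p[:-1])   # momentum eq.
        p -= rho * c**2 * dt / dx * (u[1:] - u[:-1])    # continuity eq.
    return p
```

The impulse splits into two traveling half-amplitude pulses that reflect off the rigid ends, the 1-D analogue of the tone-burst tests described in the abstract.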

P6-5 Experiments of Sound Field Reproduction Inside Aircraft Cabin Mock-UpPhilippe-Aubert Gauthier, Université de Sherbrooke - Sherbrooke, Quebec, Canada; McGill University - Montreal, Quebec, Canada; Cédric Camier, Université de Sherbrooke - Sherbrooke, Quebec, Canada; McGill University - Montreal, Quebec, Canada; Felix A. Lebel, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Y. Pasco, Université de Sherbrooke - Sherbrooke, Quebec, Canada; McGill University - Montreal, Quebec, Canada; Alain Berry, Université de Sherbrooke - Sherbrooke, Quebec, Canada; McGill University - Montreal, Quebec, Canada
Sound environment reproduction of various flight conditions in aircraft mock-ups is a valuable tool for the study, prediction, demonstration, and jury testing of interior aircraft sound quality and comfort. To provide a faithful reproduced sound environment, time, frequency, and spatial characteristics should be preserved. Physical sound field reproduction approaches are mandatory to immerse the listener’s body in the proper sound field so that localization cues are recreated. Vehicle mock-ups pose specific problems for sound field reproduction: confined spaces, the need for invisible sound sources, and a singular acoustical environment make open-loop sound field reproduction technologies less than ideal. In this paper preliminary experiments in an aircraft mock-up with classical multichannel least-squares methods are reported. The paper presents objective evaluations of the reproduced sound fields. Promising results, along with practical compromises, are reported.
Convention Paper 8716 (Purchase now)
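The classical multichannel least-squares approach referred to above reduces, per frequency, to a regularized pressure-matching problem: given a matrix G of loudspeaker-to-microphone transfer functions and a target pressure vector p, solve for the loudspeaker driving signals q. A generic sketch of that textbook formulation (not the authors' exact implementation):

```python
import numpy as np

def ls_driving_signals(G, p_target, reg=1e-3):
    """Regularized least-squares pressure matching:
    q = (G^H G + reg * I)^-1  G^H  p_target.
    reg trades reproduction error against driving effort."""
    GhG = G.conj().T @ G
    n = GhG.shape[0]
    return np.linalg.solve(GhG + reg * np.eye(n), G.conj().T @ p_target)
```

With negligible regularization and a well-conditioned G, the solver recovers the driving signals that produced a given field exactly; in practice reg is raised to keep loudspeaker effort bounded.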

P6-6 Wave Field Synthesis with Primary Source Correction: Theory, Simulation Results, and Comparison to Earlier ApproachesFlorian Völk, Technische Universitaet Muenchen - München, Germany; Hugo Fastl, Technical University of Munich - Munich, Germany
Wave field synthesis (WFS) with primary source correction (PSC) extends earlier theoretical derivations by the correct synthesis of primary point sources at a reference point. In this paper the theory of WFS with PSC is revised with respect to other derivations, extended for application to focus points, and validated by numerical simulation. A comparison to earlier approaches to WFS concludes the paper.
Convention Paper 8717 (Purchase now)

P6-7 Limitations of Point-Source Sub-Woofer Array Models for Live SoundAmbrose Thompson, Martin Audio - High Wycombe, UK; Josebaitor Luzarraga Iturrioz, Martin Audio - High Wycombe, UK; Phil Anthony, Martin Audio - High Wycombe, UK
This paper examines the validity of applying simple models to the kind of highly configurable, low-frequency arrays typically used for live sound. Measurements were performed on a single full-sized touring sub-woofer array element at different positions within a number of different array configurations. It was discovered that radiation was rarely omnidirectional, and in some cases deviated from omnidirectionality by more than 20 dB. Additionally, the in-situ polar response differed significantly from that obtained with the cabinet in isolation; the degree of difference (2–10 dB) was strongly dependent on array type and element position. For compact arrays we demonstrate, via the application of the “acoustic center” concept, that even when elemental radiation approaches omnidirectional behavior, some array configurations are particularly susceptible to errors arising from commonly applied assumptions.
Convention Paper 8718 (Purchase now)

P6-8 Improved Methods for Generating Focused Sources Using Circular ArraysMark A. Poletti, Industrial Research Limited - Lower Hutt, New Zealand
Circular loudspeaker arrays allow the reproduction of 2-D sound fields due to sources outside the loudspeaker radius. Sources inside the array can be approximated by focusing sound from a subset of the loudspeakers to a point. The resulting sound field produces the required divergence of wave fronts in a half-space beyond the focus point. This paper presents two new methods for generating focused sources that produce lower errors than previous approaches. The first derives an optimum window for limiting the range of active loudspeakers by matching the field to that of a monopole inside the source radius. The second applies pressure matching to a monopole source over a region where the wavefronts are diverging.
Convention Paper 8719 (Purchase now)


P7 - Amplifiers, Transducers, and Equipment


Friday, October 26, 3:00 pm — 4:30 pm (Foyer)

P7-1 Evaluation of trr Distorting Effects Reduction in DCI-NPC Multilevel Power Amplifiers by Using SiC Diodes and MOSFET TechnologiesVicent Sala, UPC-Universitat Politecnica de Catalunya - Terrassa, Catalunya, Spain; Tomas Resano, Jr., UPC-Universitat Politecnica de Catalunya - Terrassa, Catalunya, Spain; MCIA Research Center; Jose Luis Romeral, UPC-Universitat Politecnica de Catalunya - Terrassa, Catalunya, Spain; Jose Manuel Moreno, UPC-Universitat Politecnica de Catalunya - Terrassa, Catalunya, Spain
In the last decade, power amplifier applications have used multilevel diode-clamped-inverter or neutral-point-clamped (DCI-NPC) topologies to achieve very low distortion at high power. Much research has been done to reduce the sources of distortion in DCI-NPC topologies. One of the most important, yet least studied, sources of distortion is the reverse recovery time (trr) of the clamp diodes and MOSFET parasitic diodes. Today, with the emergence of Silicon Carbide (SiC) technologies, these sources of distortion can be minimized. This paper presents a comparative study and evaluation of the distortion generated by different combinations of diodes and MOSFETs with Si and SiC technologies in a DCI-NPC multilevel power amplifier, with the aim of reducing the distortion generated by the non-idealities of the semiconductor devices.
Convention Paper 8720 (Purchase now)

P7-2 New Strategy to Minimize Dead-Time Distortion in DCI-NPC Power Amplifiers Using COE-Error InjectionTomas Resano, Jr., UPC-Universitat Politecnica de Catalunya - Terrassa, Catalunya, Spain; MCIA Research Center; Vicent Sala, UPC-Universitat Politecnica de Catalunya - Terrassa, Catalunya, Spain; Jose Luis Romeral, UPC-Universitat Politecnica de Catalunya - Terrassa, Catalunya, Spain; Jose Manuel Moreno, UPC-Universitat Politecnica de Catalunya - Terrassa, Catalunya, Spain
The DCI-NPC topology has become one of the best options for optimizing energy efficiency in high-power, high-quality amplifiers. Such amplifiers can use an analog PWM modulator that is prone to generating distortion or error, mainly for two reasons: Carrier Amplitude Error (CAE) and Carrier Offset Error (COE). Another main error and distortion source in the system is the dead time (td). Dead time is necessary to guarantee the proper operation of the power amplifier stage, so the errors and distortions it originates are unavoidable. This work proposes injecting a negative COE to minimize the distortion effects of td. Simulation and experimental results validate this strategy.
Convention Paper 8721 (Purchase now)

P7-3 Further Testing and Newer Methods in Evaluating Amplifiers for Induced Phase and Frequency Modulation via Tones, Amplitude Modulated Signals, and Pulsed WaveformsRonald Quan, Ron Quan Designs - Cupertino, CA, USA
This paper presents further investigations from AES Convention Paper 8194, which studied induced FM distortions in audio amplifiers. Amplitude modulated (AM) signals are used to investigate frequency shifts of the AM carrier signal with different modulation frequencies. A square-wave and sine-wave TIM test signal is used to evaluate FM distortions at the fundamental frequency and harmonics of the square-wave. Newer amplifiers are tested for FM distortion with a large-level low-frequency signal inducing FM distortion on a small-level high-frequency signal. In particular, amplifiers with low and higher open-loop bandwidths are tested for differential phase and FM distortion as the frequency of the large-level signal is increased from 1 kHz to 2 kHz.
Convention Paper 8722 (Purchase now)

P7-4 Coupling Lumped and Boundary Element Methods Using SuperpositionJoerg Panzer, R&D Team - Salgen, Germany
Both the lumped element method and the boundary element method are powerful tools for simulating electroacoustic systems. Each has its preferred domain of application within a system to be modeled: the lumped element method is practical for electronics, simple mechanics, and internal acoustics, while the boundary element method shows its strength in acoustic-field calculations, such as diffraction, reflection, and radiation impedance problems. Coupling both methods allows the total system to be investigated. This paper describes a method for fully coupling the rigid-body mode of the lumped element domain to the boundary element domain with the help of self- and mutual radiation impedance components, using the superposition principle. As a result, the coupling approach features the convenient property of a high degree of independence between the two domains: for example, one can modify parameters and even, to some extent, change the structure of the lumped-element network without having to re-solve the boundary element system. The paper gives the mathematical derivation and a demonstration example that compares calculation results against measurement. In this example the electronics and mechanics of the three loudspeakers involved are modeled with the lumped element method; the waveguide, enclosure, and radiation are modeled with the boundary element method.
Convention Paper 8723 (Purchase now)

P7-5 Study of the Interaction between Radiating Systems in a Coaxial LoudspeakerAlejandro Espi, Acústica Beyma - Valencia, Spain; William A. Cárdenas, Sr., University of Alicante - Alicante, Spain; Jose Martinez, Acustica Beyma S.L. - Moncada (Valencia), Spain; Jaime Ramis, University of Alicante - Alicante, Spain; Jesus Carbajo, University of Alicante - Alicante, Spain
In this work the procedure followed to study the interaction between the mid- and high-frequency radiating systems of a coaxial loudspeaker is explained. For this purpose a numerical finite element model was implemented. To fit the model, an experimental prototype was built and a set of measurements, including electrical impedance and pressure frequency response in an anechoic plane-wave tube, was carried out. To take into account the displacement-dependent nonlinearities, a parametric analysis over different input voltages was performed, and the internal acoustic impedance was computed numerically in the frequency domain for specific phase-plug geometries. By inverse-transforming to a time-domain differential equation scheme, a lumped-element equivalent circuit was obtained to evaluate the mutual acoustic load effect present in this type of coupled acoustic system. Additionally, the crossover frequency range was analyzed using the near-field acoustic holography technique.
Convention Paper 8724 (Purchase now)

P7-6 Flexible Acoustic Transducer from Dielectric-Compound Elastomer FilmTakehiro Sugimoto, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Tokyo Institute of Technology - Midori-ku, Yokohama, Japan; Kazuho Ono, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Akio Ando, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Hiroyuki Okubo, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Kentaro Nakamura, Tokyo Institute of Technology - Midori-ku, Yokohama, Japan
To increase the sound pressure level of a flexible acoustic transducer made from a dielectric elastomer film, this paper proposes compounding various kinds of dielectrics into a polyurethane elastomer, the base material of the transducer. The studied dielectric elastomer film utilizes a change in side length derived from electrostriction for sound generation. The proposed method was conceived from the fact that the amount of dimensional change depends on the relative dielectric constant of the elastomer. Acoustical measurements demonstrated that the proposed method was effective, increasing the sound pressure level by up to 6 dB.
Convention Paper 8725 (Purchase now)

P7-7 A Digitally Driven Speaker System Using Direct Digital Spread Spectrum Technology to Reduce EMI NoiseMasayuki Yashiro, Hosei University - Koganei, Tokyo, Japan; Mitsuhiro Iwaide, Hosei University - Koganei, Tokyo, Japan; Akira Yasuda, Hosei University - Koganei, Tokyo, Japan; Michitaka Yoshino, Hosei University - Koganei, Tokyo, Japan; Kazuyki Yokota, Hosei University - Koganei, Tokyo, Japan; Yugo Moriyasu, Hosei University - Koganei, Tokyo, Japan; Kenji Sakuda, Hosei University - Koganei, Tokyo, Japan; Fumiaki Nakashima, Hosei University - Koganei, Tokyo, Japan
In this paper a novel digitally driven loudspeaker system that reduces electromagnetic interference by incorporating a spread spectrum clock generator is proposed. The driving signal of a loudspeaker, which has large spectral components at specific frequencies, interferes with nearby equipment because it emits electromagnetic waves. The proposed method switches between two clock frequencies according to a clock selection signal generated by a pseudo-noise circuit. The noise performance deterioration caused by the clock frequency switching can be reduced by the proposed modified delta-sigma modulator (DSM), which changes the coefficients of the DSM according to the width of the clock period. The proposed method can reduce out-of-band noise by 10 dB compared to the conventional method.
Convention Paper 8726 (Purchase now)
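At its core, the delta-sigma modulator the authors modify is a noise-shaping 1-bit quantizer: it integrates the error between the input and the fed-back output, so quantization noise is pushed to high frequencies while the low-frequency average tracks the input. A minimal first-order sketch of that standard structure (the paper's coefficient switching with the spread-spectrum clock is deliberately omitted):

```python
import numpy as np

def first_order_dsm(x):
    """First-order, 1-bit delta-sigma modulator sketch."""
    integrator = 0.0
    y_prev = 0.0
    out = np.empty(len(x))
    for i, s in enumerate(x):
        integrator += s - y_prev        # accumulate input minus feedback
        y_prev = 1.0 if integrator >= 0.0 else -1.0   # 1-bit quantizer
        out[i] = y_prev
    return out
```

For a DC input of 0.5 the bitstream settles into a pattern whose average is 0.5, which is the property the modified DSM must preserve while its clock period varies.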

P7-8 Automatic Speaker Delay Adjustment System Using Wireless Audio Capability of ZigBee NetworksJaeho Choi, Seoul National University - Seoul, Korea; Myoung woo Nam, Seoul National University - Seoul, Korea; Kyogu Lee, Seoul National University - Seoul, Korea
The IEEE 802.15.4 (ZigBee) standard defines a low-data-rate, low-power, low-cost, flexible wireless networking protocol for automation and remote control applications. This paper applies these characteristics to a wireless speaker delay compensation system in a large venue (over 500 seats). Traditionally, delay adjustment has been done manually by sound engineers, but the proposed system automatically analyzes the delayed sound from the front speakers at the rear speakers and applies the appropriate delay time to the rear speakers. This paper investigates the feasibility of adjusting the wireless speaker delay over the above-mentioned ZigBee network. We present an implementation of ZigBee audio transmission and an LBS (Location-Based Service) application that allows calculation of the speaker delay time.
Convention Paper 8727 (Purchase now)
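In the simplest free-field view, the delay a rear loudspeaker needs is just the extra propagation distance divided by the speed of sound. A sketch of that basic calculation (illustrative only, using a standard temperature-dependent approximation for c; the paper derives the distance from its ZigBee location service):

```python
def rear_delay_ms(extra_distance_m, temp_c=20.0):
    """Delay (ms) to add to a rear speaker so its output aligns with
    the front speaker's wavefront, given the extra path length."""
    c = 331.3 + 0.606 * temp_c   # approx. speed of sound in air, m/s
    return 1000.0 * extra_distance_m / c
```

For example, roughly 34 m of extra path at 20 °C corresponds to about 100 ms of delay.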

P7-9 A Second-Order Soundfield Microphone with Improved Polar Pattern ShapeEric M. Benjamin, Surround Research - Pacifica, CA, USA
The soundfield microphone is a compact tetrahedral array of four figure-of-eight microphones yielding four coincident virtual microphones: one omnidirectional and three orthogonal pressure-gradient microphones. As described by Gerzon, above a limiting frequency approximated by fc = pc/r, the virtual microphones become progressively contaminated by higher-order spherical harmonics. To improve the high-frequency performance, either the array size must be substantially reduced or a new array geometry must be found. In the present work an array having nominally octahedral geometry is described. It samples the spherical harmonics in a natural way and yields horizontal virtual microphones up to second order with excellent horizontal polar patterns up to 20 kHz.
Convention Paper 8728 (Purchase now)
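For reference, the first-order tetrahedral soundfield microphone the abstract takes as its starting point combines its four capsule signals (A-format) into the omnidirectional and figure-of-eight components (B-format) by simple sums and differences. A sketch of that standard conversion, with capsule gains and equalization omitted (capsule naming per the usual convention):

```python
def a_to_b_format(lfu, rfd, lbd, rbu):
    """Standard tetrahedral A-format to B-format conversion.
    lfu = left-front-up, rfd = right-front-down,
    lbd = left-back-down, rbu = right-back-up capsule signals."""
    w = lfu + rfd + lbd + rbu   # omnidirectional (pressure)
    x = lfu + rfd - lbd - rbu   # front-back figure-of-eight
    y = lfu - rfd + lbd - rbu   # left-right figure-of-eight
    z = lfu - rfd - lbd + rbu   # up-down figure-of-eight
    return w, x, y, z
```

A signal arriving equally at all four capsules contributes only to W, while the difference patterns cancel, which is the coincidence property the octahedral array extends to second order.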

P7-10 Period Deviation Tolerance Templates: A Novel Approach to Evaluation and Specification of Self-Synchronizing Audio ConvertersFrancis Legray, Dolphin Integration - Meylan, France; Thierry Heeb, Digimath - Sainte-Croix, Switzerland; SUPSI, ICIMSI - Manno, Switzerland; Sebastien Genevey, Dolphin Integration - Meylan, France; Hugo Kuo, Dolphin Integration - Meylan, France
Self-synchronizing converters represent an elegant and cost-effective solution for integrating audio functionality into an SoC (System-on-Chip), as they integrate both conversion and clock synchronization functionalities. The audio performance of such converters is, however, very dependent on the jitter rejection capabilities of the synchronization system. A methodology based on two period deviation tolerance templates is described for evaluating such synchronization solutions prior to any silicon measurements. It is also a unique way of specifying the expected performance of a synchronization system in the presence of jitter on the audio interface. The proposed methodology is applied to a self-synchronizing audio converter and its advantages are illustrated by both simulation and measurement results.
Convention Paper 8729 (Purchase now)

P7-11 Loudspeaker Localization Based on Audio WatermarkingFlorian Kolbeck, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Giovanni Del Galdo, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Iwona Sobieraj, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Tobias Bliem, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Localizing the positions of loudspeakers can be useful in a variety of applications, above all the calibration of a home theater setup. For this aim, several existing approaches employ a microphone array and specifically designed signals to be played back by the loudspeakers, such as sine sweeps or maximum length sequences. While these systems achieve good localization accuracy, they are unsuitable for those applications in which the end-user should not be made aware that the localization is taking place. This contribution proposes a system that fulfills these requirements by employing an inaudible watermark to carry out the localization. The watermark is specifically designed to work in reverberant environments. Results from realistic simulations confirm the practicability of the proposed system.
Convention Paper 8730 (Purchase now)
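A generic building block behind loudspeaker localization systems of this kind is estimating the time difference of arrival between microphone signals from the peak of their cross-correlation. The sketch below shows only that textbook step; the paper's contribution is deriving the timing from the embedded watermark rather than from sweeps or MLS signals:

```python
import numpy as np

def delay_samples(x, y):
    """Estimate how many samples y lags x via the peak of the full
    cross-correlation (generic TDOA building block)."""
    corr = np.correlate(x, y, mode="full")
    return (len(y) - 1) - int(np.argmax(corr))
```

With NumPy's "full" mode, output index i corresponds to lag i - (len(y) - 1), hence the offset in the return value.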


P8 - Emerging Audio Technologies


Saturday, October 27, 9:00 am — 12:30 pm (Room 121)

Chair:
Agnieszka Roginska, New York University - New York, NY, USA

P8-1 A Method for Enhancement of Background Sounds in Forensic Audio RecordingsRobert C. Maher, Montana State University - Bozeman, MT, USA
A method for suppressing speech while retaining background sound is presented in this paper. The procedure is useful for audio forensics investigations in which a strong foreground sound source or conversation obscures subtle background sounds or utterances that may be important to the investigation. The procedure uses a sinusoidal speech model to represent the strong foreground signal and then performs a synchronous subtraction to isolate the background sounds that are not well-modeled as part of the speech signal, thereby enhancing the audibility of the background material.
Convention Paper 8731 (Purchase now)
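The sinusoidal-model-plus-subtraction idea described above can be illustrated with a toy frame-based version: estimate the strongest spectral peaks, resynthesize them as sinusoids, and subtract to leave the residual "background." This is a simplified sketch, not the paper's synchronous speech-model procedure:

```python
import numpy as np

def suppress_foreground(frame, n_partials=5):
    """Toy sinusoidal modeling + subtraction: model the strongest
    spectral bins of a frame as cosines and subtract them, leaving
    the residual that the sinusoidal model does not capture."""
    N = len(frame)
    spec = np.fft.rfft(frame)
    mags = np.abs(spec)
    mags[0] = 0.0                       # ignore DC
    peaks = np.argsort(mags)[-n_partials:]
    n = np.arange(N)
    model = np.zeros(N)
    for k in peaks:                     # amplitude/phase from each bin
        amp = 2.0 * np.abs(spec[k]) / N
        phase = np.angle(spec[k])
        model += amp * np.cos(2.0 * np.pi * k * n / N + phase)
    return frame - model
```

For a frame dominated by a strong tonal component, the residual energy after subtraction is far below the original energy.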

P8-2 Transient Room Acoustics Using a 2.5 Dimensional ApproachPatrick Macey, Pacsys Ltd. - Nottingham, UK
Cavity modes of a finite acoustic domain with rigid boundaries can be used to compute the transient response for a point source excitation. Previous work, considering steady state analysis, showed that for a room of constant height the 3-D modes can be computed very rapidly by computing the 2-D cross section modes. An alternative to a transient modal approach is suggested, using a trigonometric expansion of the pressure through the height. Both methods are much faster than 3-D FEM but the trigonometric series approach is more easily able to include realistic damping. The accuracy of approximating an “almost constant height” room to be constant height is investigated by example.
Convention Paper 8732 (Purchase now)
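The cavity modes the method builds on are, for the rigid-walled rectangular case, given by the classical closed-form eigenfrequency formula. A small sketch of that standard background result (not the paper's 2.5-D trigonometric expansion itself):

```python
import math

def rigid_room_modes(lx, ly, lz, c=343.0, n_max=2):
    """Eigenfrequencies of a rigid-walled rectangular room:
    f = (c/2) * sqrt((nx/lx)^2 + (ny/ly)^2 + (nz/lz)^2),
    for mode orders up to n_max in each dimension."""
    freqs = []
    for nx in range(n_max + 1):
        for ny in range(n_max + 1):
            for nz in range(n_max + 1):
                if nx == ny == nz == 0:
                    continue            # skip the trivial (0,0,0) mode
                freqs.append(0.5 * c * math.sqrt(
                    (nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2))
    return sorted(freqs)
```

The lowest mode of a 5 m x 4 m x 3 m room is the axial mode along the longest dimension, c / (2 * 5 m) ≈ 34.3 Hz.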

P8-3 Multimodal Information Management: Evaluation of Auditory and Haptic Cues for NextGen Communication DisplaysDurand Begault, Human Systems Integration Division, NASA Ames Research Center - Moffett Field, CA, USA; Rachel M. Bittner, New York University - New York, NY, USA; Mark R. Anderson, Dell Systems, NASA Ames Research Center - Moffett Field, CA, USA
Auditory communication displays within the NextGen data link system may use multiple synthetic speech messages replacing traditional air traffic control and company communications. The design of an interface for selecting among multiple incoming messages can impact both performance (time to select, audit, and release a message) and preference. Two design factors were evaluated: physical pressure-sensitive switches versus flat panel “virtual switches,” and the presence or absence of auditory feedback from switch contact. Performance with stimuli using physical switches was 1.2 s faster than virtual switches (2.0 s vs. 3.2 s); auditory feedback provided a 0.54 s performance advantage (2.33 s vs. 2.87 s). There was no interaction between these variables. Preference data were highly correlated with performance.
Convention Paper 8733 (Purchase now)

P8-4 Prototype Spatial Auditory Display for Remote Planetary ExplorationElizabeth M. Wenzel, NASA-Ames Research Center - Moffett Field, CA, USA; Martine Godfroy, NASA Ames Research Center - Moffett Field, CA, USA; San Jose State University Foundation; Joel D. Miller, Dell Systems, NASA Ames Research Center - Moffett Field, CA, USA
During Extra-Vehicular Activities (EVA), astronauts must maintain situational awareness (SA) of a number of spatially distributed "entities" such as other team members (human and robotic), rovers, and a lander/habitat or other safe havens. These entities are often outside the immediate field of view and visual resources are needed for other task demands. Recent work at NASA Ames has focused on experimental evaluation of a spatial audio augmented-reality display for tele-robotic planetary exploration on Mars. Studies compared response time and accuracy performance with different types of displays for aiding orientation during exploration: a spatial auditory orientation aid, a 2-D visual orientation aid, and a combined auditory-visual orientation aid under a number of degraded vs. nondegraded visual conditions. The data support the hypothesis that the presence of spatial auditory cueing enhances performance compared to a 2-D visual aid, particularly under degraded visual conditions.
Convention Paper 8734 (Purchase now)

P8-5 The Influence of 2-D and 3-D Video Playback on the Perceived Quality of Spatial Audio Rendering for HeadphonesAmir Iljazovic, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Florian Leschka, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Bernhard Neugebauer, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Algorithms for processing of spatial audio are becoming more attractive for practical applications as multichannel formats and the processing power of playback devices enable more advanced rendering techniques. In this study the influence of the visual context on perceived audio quality is investigated. Three groups of 15 listeners are presented with audio-only, audio with 2-D video, and audio with 3-D video content. The 5.1-channel audio material is processed for headphones using different commercial spatial rendering techniques. Results indicate a preference for spatial audio processing over a downmix to conventional stereo, with the effect being larger in the presence of 3-D video content. Also, the influence of video on perceived audio quality is significant for both 2-D and 3-D video presentation.
Convention Paper 8735 (Purchase now)

P8-6 An Autonomous System for Multitrack Stereo Pan PositioningStuart Mansbridge, Queen Mary University of London - London, UK; Saorise Finn, Queen Mary University of London - London, UK; Birmingham City University - Birmingham, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
A real-time system for automating stereo panning positions for a multitrack mix is presented. Real-time feature extraction of loudness and frequency content, constrained rules, and cross-adaptive processing are used to emulate the decisions of a sound engineer, and pan positions are updated continuously to provide spectral and spatial balance with changes in the active tracks. As such, the system is designed to be highly versatile and suitable for a wide number of applications, including both live sound and post-production. A real-time, multitrack C++ VST plug-in version has been developed. A detailed evaluation of the system is given, where formal listening tests compare the system against professional and amateur mixes from a variety of genres.
Convention Paper 8736 (Purchase now)
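A pan-positioning system like the one described ultimately drives a pan law per track. The constant-power law below is the conventional choice for keeping perceived loudness steady as a source moves across the image; it is shown as generic background, not as the system's rule set:

```python
import math

def constant_power_pan(pos):
    """Equal-power stereo pan law. pos in [-1 (hard left), +1 (hard
    right)]; the returned gains satisfy gL^2 + gR^2 = 1, so the total
    power stays constant across the stereo image."""
    theta = (pos + 1.0) * math.pi / 4.0
    return math.cos(theta), math.sin(theta)
```

At center (pos = 0) both channels receive about -3 dB (0.707), the familiar center-pan attenuation.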

P8-7 DReaM: A Novel System for Joint Source Separation and Multitrack CodingSylvain Marchand, University of Western Brittany - Brest, France; Roland Badeau, Telecom ParisTech - Paris, France; Cléo Baras, GIPSA-Lab - Grenoble, France; Laurent Daudet, University Paris Diderot - Paris, France; Dominique Fourer, University Bordeaux - Talence, France; Laurent Girin, GIPSA-Lab - Grenoble, France; Stanislaw Gorlow, University of Bordeaux - Talence, France; Antoine Liutkus, Telecom ParisTech - Paris, France; Jonathan Pinel, GIPSA-Lab - Grenoble, France; Gaël Richard, Telecom ParisTech - Paris, France; Nicolas Sturmel, GIPSA-Lab - Grenoble, France; Shuhua Zang, GIPSA-Lab - Grenoble, France
Active listening consists of interacting with the music as it plays; it has numerous applications from pedagogy to gaming and involves advanced remixing processes such as generalized karaoke or respatialization. To gain this freedom, one might use the individual tracks that compose the mix. However, multitrack formats lose backward compatibility with popular stereo formats and increase the file size, while classic source separation from the stereo mix is not of sufficient quality. We propose a coder/decoder scheme for informed source separation. The coder determines the information necessary to recover the tracks and embeds it inaudibly in the mix, which remains stereo and has a size comparable to the original. The decoder enhances the source separation with this information, enabling active listening.
Convention Paper 8737 (Purchase now)


P9 - Auditory Perception


Saturday, October 27, 9:00 am — 12:00 pm (Room 122)

Chair:
Scott Norcross, Dolby Laboratories - San Francisco, CA, USA

P9-1 Subjective Evaluation of Personalized Equalization Curves in MusicWeidong Shen, The Institute of Otolaryngology, Department of Otolaryngology, PLA General Hospital - Beijing, China; Tiffany Chua, University of California, Irvine - Irvine, CA, USA; Kelly Reavis, Portland Veterans Affairs Medical Center - Portland, OR, USA; Hongmei Xia, Hubei Zhong Shan Hospital - Wuhan, Hubei, China; Duo Zhang, MaxLinear Inc. - Carlsbad, CA, USA; Gerald A. Maguire, University of California, Irvine - Irvine, CA, USA; David Franklin, University of California, Irvine - Irvine, CA, USA; Vincent Liu, Logitech - Irvine, CA, USA; Wei Hou, Huawei Technologies Co., Ltd. - Shenzhen, Guangdong, China; Hung Tran, AuralWare LLC - Rancho Santa Margarita, CA, USA
This paper investigated the subjective quality of equalized music in which equalization (EQ) curves were tailored to the individuals’ preferences. Listeners subjectively rated a number of pre-selected psychoacoustic-based EQ curves over three test sessions. The personalized EQ curve was the curve that had the highest rating among the pool of pre-selected equalization curves. Listeners were instructed to rate music quality according to the ITU-R BS 1284 scale. Statistical analysis showed that listeners consistently rated music to which personalized EQ curves were applied significantly higher than the original CD-quality music.
Convention Paper 8738 (Purchase now)

P9-2 Thresholds for the Discrimination of Tonal and Narrowband Noise BurstsArmin Taghipour, International AudioLabs Erlangen - Erlangen, Germany; Bernd Edler, International Audio Laboratories Erlangen - Erlangen, Germany; Masoumeh Amirpour, International Audiolabs Erlangen - Erlangen, Germany; Jürgen Herre, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany
Several psychoacoustic models used in perceptual audio coding take into account the difference in masking effects of tones and narrowband noise and therefore incorporate some kind of tonality estimation. To optimize these estimators for time-varying signals, it is desirable to know the duration above which the auditory system is able to discriminate between tone bursts and narrowband noise bursts. This duration is known to depend on frequency, bandwidth, and loudness, but up to now no systematic studies have been performed. This paper therefore presents the setup and results of experiments determining frequency-dependent thresholds.
Convention Paper 8739 (Purchase now)

P9-3 Identification and Evaluation of Target Curves for HeadphonesFelix Fleischmann, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Andreas Silzle, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Generally, loudspeakers are designed to have a flat frequency response, but for headphones there is no consensus about the optimal transfer function and equalization. In this study several equalization strategies were tested on commercially available headphones. The headphones were measured on an artificial head and equalization filters were designed in the frequency domain, consisting of two parts: the first part is specific to each headphone, flattening the magnitude response of the headphone at the entrance of the blocked ear canal; the second part is a generic target curve for headphones. Different target curves were tested on three headphones in a formal listening test using binaural signals. A target curve designed by expert listeners comparing loudspeaker reproduction with binaural headphone reproduction was preferred.
Convention Paper 8740 (Purchase now)

P9-4 Consistency of Balance Preferences in Three Musical Genres
Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Brett Leonard, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Grzegorz Sikora, Bang & Olufsen Deutschland GmbH - Pullach, Germany
Balancing the level of different sound sources is the most basic task performed in the process of mixing. While this task forms a basic building block of music mixing, very little research has been conducted to objectively study mixing habits and balance preferences. In this study data are collected from 15 highly trained subjects performing simple mixing tasks on multiple musical excerpts spanning three musical genres. Balance preference is examined across musical genres, and the results exhibit narrower variances in balance for certain genres than for others.
Convention Paper 8741 (Purchase now)

P9-5 The Effect of Acoustic Environment on Reverberation Level Preference
Brett Leonard, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Grzegorz Sikora, Bang & Olufsen Deutschland GmbH - Pullach, Germany
Reverberation plays a very important role in modern music production, yet the available literature on the interaction between reverberation preference and the listening environment used during critical balancing tasks is minimal. Highly trained subjects are tasked with adding reverberation to a fixed stereo mix in two different environments: a standard studio control room and a highly reflective mix room. Distributions of level preference are shown to be narrower for the more reflective mixing environment, and the mean level is lower than that set in the less reverberant environment.
Convention Paper 8742 (Purchase now)

P9-6 Localization of a Virtual Point Source within the Listening Area for Wave Field Synthesis
Hagen Wierstorf, Technische Universität Berlin - Berlin, Germany; Alexander Raake, Technische Universität Berlin - Berlin, Germany; Sascha Spors, Universität Rostock - Rostock, Germany
One of the main advantages of Wave Field Synthesis (WFS) is the existence of an extended listening area, in contrast to the sweet spot in stereophony. At the moment there is little literature available on the actual localization properties of WFS at different points in the listening area. One reason is the difficulty of placing different subjects reliably at different positions. This study systematically investigates the localization performance of WFS at many positions within the listening area. To overcome the placement difficulty, the different listening positions and loudspeaker arrays were simulated by dynamic binaural synthesis. A pre-study verified that this method is suitable for investigating localization performance in WFS.
Convention Paper 8743 (Purchase now)


P10 - Transducers


Saturday, October 27, 2:00 pm — 6:00 pm (Room 121)

Chair:
Alex Voishvillo, JBL Professional - Northridge, CA, USA

P10-1 The Relationship between Perception and Measurement of Headphone Sound Quality
Sean Olive, Harman International - Northridge, CA, USA; Todd Welti, Harman International - Northridge, CA, USA
Double-blind listening tests were performed on six popular circumaural headphones to study the relationship between their perceived sound quality and their acoustical performance. In terms of overall sound quality, the most preferred headphones were perceived to have the most neutral spectral balance with the lowest coloration. When measured on an acoustic coupler, the most preferred headphones produced the smoothest and flattest amplitude response, a response that deviates from the current IEC recommended diffuse-field calibration. The results provide further evidence that the IEC 60268-7 headphone calibration is not optimal for achieving the best sound quality.
Convention Paper 8744 (Purchase now)

P10-2 On the Study of Ionic Microphones
Hiroshi Akino, Audio-Technica Co. - Machida-shi, Tokyo, Japan; Kanagawa Institute of Technology - Kanagawa, Japan; Hirofumi Shimokawa, Kanagawa Institute of Technology - Kanagawa, Japan; Tadashi Kikutani, Audio-Technica U.S., Inc. - Stow, OH, USA; Jackie Green, Audio-Technica U.S., Inc. - Stow, OH, USA
Diaphragm-less ionic loudspeakers using both low-temperature and high-temperature plasma methods have already been studied and developed for practical use. This study examined using similar methods to create a diaphragm-less ionic microphone. Although the low-temperature method was not practical due to high noise levels in the discharges, the high-temperature method exhibited a useful shifting of the oscillation frequency. By performing FM detection on this oscillation frequency shift, audio signals were obtained. Accordingly, an ionic microphone was tested in which the frequency response level using high-temperature plasma increased as the sound wave frequency decreased. Maintaining performance proved difficult, as discharges in the air led to wear of the needle electrode tip and adhesion of discharge products. Study results showed that the stability of the discharge corresponded to the non-uniform electric field, which was dependent on the formation shape of the high-temperature plasma, the shape of the discharge electrode, and the use of an inert gas to protect the needle electrode. This paper reviews the experimental outcomes of the two ionic methods and the considerations given to resolving the electrode-tip wear, discharge-product, and stability problems.
Convention Paper 8745 (Purchase now)

P10-3 Midrange Resonant Scattering in Loudspeakers
Juha Backman, Nokia Corporation - Espoo, Finland
One of the significant sources of midrange coloration in loudspeakers is the resonant scattering of the exterior sound field from ports, recesses, or horns. This paper discusses the qualitative behavior of the scattered sound and introduces a computationally efficient model for such scattering, based on waveguide models for the acoustical elements (ports, etc.), and mutual radiation impedance model for their coupling to the sound field generated by the drivers. In the simplest case of driver-port interaction in a direct radiating loudspeaker an approximate analytical expression can be written for the scattered sound. These methods can be applied to numerical optimization of loudspeaker layouts.
Convention Paper 8746 (Purchase now)

P10-4 Long Distance Induction Drive Loud Hailer Characterization
Marshall Buck, Psychotechnology, Inc. - Los Angeles, CA, USA; Wisdom Audio; David Graebener, Wisdom Audio Corporation - Carson City, NV, USA; Ron Sauro, NWAA Labs, Inc. - Elma, WA, USA
Further development of the high power, high efficiency induction drive compression driver when mounted on a tight pattern horn results in a high performance loud hailer. The detailed performance is tested in an independent laboratory with unique capabilities, including indoor frequency response at a distance of 4 meters. Additional characteristics tested include maximum burst output level, polar response, and directivity balloons. Outdoor tests were also performed at distances up to 220 meters and included speech transmission index and frequency response. Plane wave tube driver-phase plug tests were performed to assess incoherence, power compression, efficiency, and frequency response.
Convention Paper 8747 (Purchase now)

P10-5 Optimal Configurations for Subwoofers in Rooms Considering Seat to Seat Variation and Low Frequency Efficiency
Todd Welti, Harman International - Northridge, CA, USA
The placement of subwoofers and listeners in small rooms and the size and shape of the room all have profound influences on the resulting low frequency response. In this study, a computer model was used to investigate a large number of room, seating, and subwoofer configurations. For each configuration simulated, metrics for seat to seat consistency and bass efficiency were calculated and combined in a newly proposed metric, which is intended as an overall figure of merit. The data presented has much practical value in small room design for new rooms, or even for modifying existing configurations.
Convention Paper 8748 (Purchase now)
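The abstract does not specify the computer model used, but the core idea (predicting the low-frequency response at several seats and scoring seat-to-seat consistency) can be sketched with a rigid-wall modal sum for a shoebox room. The room dimensions, subwoofer and seat positions, damping constant, and the consistency metric below are all illustrative assumptions, not the paper's values:

```python
import numpy as np

C = 343.0  # speed of sound, m/s

def mode_shape(pos, dims, n):
    # rigid-wall eigenfunction of a rectangular room, evaluated at pos
    x, y, z = pos
    Lx, Ly, Lz = dims
    nx, ny, nz = n
    return (np.cos(nx*np.pi*x/Lx) * np.cos(ny*np.pi*y/Ly) * np.cos(nz*np.pi*z/Lz))

def pressure(freq, src, rcv, dims, max_order=4, damping=5.0):
    # modal summation with an ad-hoc damping term in the denominator
    k = 2*np.pi*freq/C
    total = 0j
    for nx in range(max_order+1):
        for ny in range(max_order+1):
            for nz in range(max_order+1):
                kn = np.pi*np.sqrt((nx/dims[0])**2 + (ny/dims[1])**2 + (nz/dims[2])**2)
                num = mode_shape(src, dims, (nx, ny, nz)) * mode_shape(rcv, dims, (nx, ny, nz))
                total += num / (kn**2 - k**2 - 1j*k*damping/C)
    return abs(total)

dims = (6.0, 4.5, 2.8)                 # assumed room, m
sub = (0.2, 0.2, 0.2)                  # assumed corner subwoofer
seats = [(3.0, 2.0, 1.2), (3.5, 2.5, 1.2), (2.5, 1.5, 1.2)]
freqs = np.arange(20.0, 100.0, 2.0)
spl = np.array([[20*np.log10(pressure(f, sub, s, dims) + 1e-12) for f in freqs]
                for s in seats])
# seat-to-seat consistency: mean over frequency of the level spread across seats
variation = np.mean(np.std(spl, axis=0))
```

A figure-of-merit like the paper's could then combine `variation` with an efficiency term (e.g., mean SPL), though the actual weighting used by the author is not given in the abstract.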

P10-6 Modeling the Large Signal Behavior of Micro-Speakers
Wolfgang Klippel, Klippel GmbH - Dresden, Germany
The mechanical and acoustical losses considered in the lumped parameter modeling of electro-dynamical transducers may become a dominant source of nonlinear distortion in micro-speakers, tweeters, headphones, and some horn compression drivers where the total quality factor Qts is not dominated by the electrical damping realized by a high force factor Bl and a low voice coil resistance Re. This paper presents a nonlinear model describing the generation of the distortion and a new dynamic measurement technique for identifying the nonlinear resistance Rms(v) as a function of voice coil velocity v. The theory and the identification technique are verified by comparing distortion and other nonlinear symptoms measured on micro-speakers as used in cellular phones with the corresponding behavior predicted by the nonlinear model.
Convention Paper 8749 (Purchase now)

P10-7 An Indirect Study of Compliance and Damping in Linear Array Transducers
Richard Little, Far North Electroacoustics - Surrey, BC, Canada
A linear array transducer is a dual-motor, dual-coil, multi-cone, tubularly-shaped transducer whose shape defeats many measurement techniques that can be used to examine directly the force-deflection behavior of its diaphragm suspension system. Instead, the impedance curve of the transducer is compared against theoretical linear models to determine best-fit parameter values. The variation in the value of these parameters with increasing input signal levels is also examined.
Convention Paper 8750 (Purchase now)

P10-8 Bandwidth Extension for Microphone Arrays
Benjamin Bernschütz, Cologne University of Applied Sciences - Cologne, Germany; Technical University of Berlin - Berlin, Germany
Microphone arrays are a focus of interest for spatial audio recording applications and the analysis of sound fields, but one of their major problems is a limited operational frequency range: especially at high frequencies, spatial aliasing artifacts tend to disturb the output signal. This severely restricts the applicability and acceptance of microphone arrays in practice. A new approach to enhance the bandwidth of microphone arrays is presented, which is based on some restrictive assumptions concerning natural sound fields, the separate acquisition and treatment of spatiotemporal and spectrotemporal sound field properties, and the subsequent synthesis of array signals for critical frequency bands. Additionally, the method can be used in spatial audio data reduction algorithms.
Convention Paper 8751 (Purchase now)


P11 - Spatial Audio


Saturday, October 27, 2:00 pm — 3:30 pm (Foyer)

P11-1 Blind Upmixing for Height and Wide Channels Based on an Image Source Method
Sunwoong Choi, Yonsei University - Seoul, Korea; Dong-il Hyun, Yonsei University - Seoul, Korea; Young-cheol Park, Yonsei University - Wonju, Kwangwon-do, Korea; Seokpil Lee, Korea Electronics Technology Institute (KETI) - Seoul, Korea; Dae Hee Youn, Yonsei University - Seoul, Korea
In this paper we present a method of synthesizing height and wide channel signals for stereo upmix to multichannel formats beyond 5.1. To provide improved envelopment, reflections from the ceiling and side walls are considered for the height and wide channel synthesis. Early reflections (ERs) corresponding to the spatial sections covered by the height and wide channel speakers are separately synthesized using the image source method, and the parameters for the ER generation are determined from the primary-to-ambient ratio (PAR) estimated from the stereo signal. The synthesized ERs are then mixed with decorrelated ambient signals and transmitted to the respective channels. Subjective listening tests verify that listener envelopment can be improved by using the proposed method.
Convention Paper 8752 (Purchase now)
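The image source method used here for ER generation can be illustrated with a minimal first-order sketch for a shoebox room: each wall mirrors the source once, and every image yields one delay/gain pair for an early-reflection generator. The room geometry and absorption value below are illustrative assumptions; the paper's PAR-driven parameterization is not reproduced:

```python
import numpy as np

C = 343.0  # speed of sound, m/s

def first_order_images(src, dims):
    # reflect the source across each of the six walls of a shoebox room
    images = []
    for axis in range(3):
        for wall in (0.0, dims[axis]):
            img = list(src)
            img[axis] = 2*wall - src[axis]
            images.append(tuple(img))
    return images

def early_reflections(src, listener, dims, absorption=0.3):
    # one (delay_seconds, gain) pair per first-order image source
    out = []
    for img in first_order_images(src, dims):
        d = np.linalg.norm(np.array(img) - np.array(listener))
        delay = d / C                 # propagation delay
        gain = (1.0 - absorption)/d   # 1/r spreading with one wall bounce
        out.append((delay, gain))
    return sorted(out)
```

In a height/wide upmixer, only the images whose arrival directions fall in the spatial sections covered by the height and wide speakers would be kept, which is the selection step the abstract describes.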

P11-2 Spatial Sound Design Tool for 22.2 Channel 3-D Audio Productions, with Height
Wieslaw Woszczyk, McGill University - Montreal, QC, Canada; Brett Leonard, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; David Benson, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada
Advanced television and cinema systems utilize multiple loudspeakers distributed in three dimensions potentially allowing sound sources and ambiances to appear anywhere in the 3-D space enveloping the viewers, as is the case in 22.2 channel audio format for Ultra High Definition Television (UHDTV). The paper describes a comprehensive tool developed specifically for designing auditory spaces in 22.2 audio but adaptable to any advanced multi-speaker 3-D sound rendering system. The key design goals are the ease of generating and manipulating ambient environments in 3-D and time code automation for creating dynamic spatial narration. The system uses low-latency convolution of high-resolution room impulse responses contained in the library. User testing and evaluation show that the system’s features and architecture enable fast and effective spatial design in 3-D audio.
Convention Paper 8753 (Purchase now)

P11-3 Efficient Primary-Ambient Decomposition Algorithm for Audio Upmix
Yong-Hyun Baek, Yonsei University - Wonju, Kwangwon-do, Korea; Se-Woon Jeon, Yonsei University - Seoul, Korea; Young-cheol Park, Yonsei University - Wonju, Kwangwon-do, Korea; Seokpil Lee, Korea Electronics Technology Institute (KETI) - Seoul, Korea
Decomposition of a stereo signal into primary and ambient components is a key step in stereo upmix, and it is often based on principal component analysis (PCA). However, a major shortcoming of the PCA-based method is that the accuracy of the decomposed components depends on both the primary-to-ambient power ratio (PAR) and the panning angle. Previously, a modified PCA was suggested to solve the PAR-dependence problem, but its performance is still dependent on the panning angle of the primary signal. In this paper we propose a new PCA-based primary-ambient decomposition algorithm whose performance is affected by neither the PAR nor the panning angle. The proposed algorithm finds scale factors based on a criterion set to preserve the powers of the mixed components, so that the original primary and ambient powers are correctly retrieved. Simulation results are presented to show the effectiveness of the proposed algorithm.
Convention Paper 8754 (Purchase now)
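For context, the classical PCA decomposition that this paper improves on can be sketched in a few lines: the stereo frame is projected onto the principal eigenvector of the 2x2 channel covariance to obtain the primary component, and the residual is taken as ambience. This is the baseline method, not the authors' proposed algorithm with power-preserving scale factors:

```python
import numpy as np

def pca_primary_ambient(left, right):
    """Classical PCA primary-ambient split of one stereo frame.
    Returns (primary, ambient), each of shape (2, N)."""
    X = np.vstack([left, right])          # 2 x N stereo frame
    cov = X @ X.T / X.shape[1]            # 2 x 2 channel covariance
    w, V = np.linalg.eigh(cov)            # eigenvalues ascending
    u = V[:, -1]                          # principal direction (panned primary)
    primary = np.outer(u, u @ X)          # projection onto principal axis
    ambient = X - primary                 # orthogonal residual
    return primary, ambient
```

When the two channels carry the same panned source, virtually all energy lands in `primary`; the known weakness (the abstract's point) is that the split degrades as the PAR drops or the source is panned hard to one side.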

P11-4 On the Use of Dynamically Varied Loudspeaker Spacing in Wave Field Synthesis
Rishabh Ranjan, Nanyang Technological University - Singapore, Singapore; Woon-Seng Gan, Nanyang Technological University - Singapore, Singapore
Wave field synthesis (WFS) has evolved as a promising spatial audio rendering technique in recent years and is widely regarded as an optimal sound reproduction technique. Suppressing spatial aliasing artifacts and accurately reproducing the sound field remain the focal points of WFS research. An optimum loudspeaker configuration is necessary to achieve a perceptually correct sound field in the listening space. In this paper we analyze the performance of dynamically spaced loudspeaker arrays whose spacing varies with the frequency content of the audio signal. The proposed technique optimizes the usage of a prearranged set of loudspeaker arrays to avoid spatial aliasing at relatively low frequencies compared to the uniformly fixed array spacing of conventional WFS setups.
Convention Paper 8755 (Purchase now)
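The constraint that motivates frequency-dependent spacing follows from the usual worst-case rule of thumb that a linear array with element spacing d aliases above roughly c/(2d). A tiny helper, assuming c = 343 m/s (the exact aliasing condition in WFS also depends on source and listening geometry, which this sketch ignores):

```python
C = 343.0  # speed of sound, m/s

def aliasing_frequency(spacing_m):
    # worst-case spatial aliasing frequency of a linear array, Hz
    return C / (2.0 * spacing_m)

def max_spacing(freq_hz):
    # largest element spacing that stays alias-free up to freq_hz, m
    return C / (2.0 * freq_hz)
```

A dynamic scheme in the spirit of the paper would route low-frequency band content to a coarse subset of the array and reserve the finest spacing for the band above its own aliasing limit.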

P11-5 A Simple and Efficient Method for Real-Time Computation and Transformation of Spherical Harmonic-Based Sound Fields
Robert E. Davis, University of the West of Scotland - Paisley, Scotland, UK; D. Fraser Clark, University of the West of Scotland - Paisley, Scotland, UK
The potential for higher order Ambisonics to be applied to audio applications such as virtual reality, live music, and computer games relies entirely on the real-time performance characteristics of the system, as the computational overhead determines factors of latency and, consequently, user experience. Spherical harmonic functions are used to describe the directional information in an Ambisonic sound field, and as the order of the system is increased, so too is the computational expense, due to the added number of spherical harmonic functions to be calculated. The present paper describes a method for simplified implementation and efficient computation of the spherical harmonic functions and applies the technique to the transformation of encoded sound fields. Comparisons between the new method and typical direct calculation methods are presented.
Convention Paper 8756 (Purchase now)
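A common way to tame the per-order computational expense described above is to evaluate all normalized associated Legendre terms by stable recurrences, so no factorials appear at run time. The sketch below is a generic implementation of that idea for real spherical harmonics in ACN order (index l(l+1)+m), with the Condon-Shortley phase folded into the recurrence; it is not the authors' specific method:

```python
import math

def real_sh(order, theta, phi):
    """Real spherical harmonics up to `order` at (theta, phi),
    returned as a flat list indexed by ACN = l*(l+1)+m.
    Normalized Legendre values come from the standard three-term
    recurrences, so no per-call factorials are needed."""
    ct, st = math.cos(theta), math.sin(theta)
    # P[l][m]: fully normalized associated Legendre (includes sqrt((2l+1)/4pi)...)
    P = [[0.0]*(l+1) for l in range(order+1)]
    P[0][0] = math.sqrt(1.0/(4.0*math.pi))
    for m in range(1, order+1):                       # sectoral terms
        P[m][m] = -math.sqrt((2*m+1)/(2.0*m)) * st * P[m-1][m-1]
    for m in range(order):                            # first off-diagonal
        P[m+1][m] = math.sqrt(2*m+3.0) * ct * P[m][m]
    for m in range(order+1):                          # general recurrence
        for l in range(m+2, order+1):
            a = math.sqrt((4.0*l*l - 1.0)/(l*l - m*m))
            b = math.sqrt(((l-1.0)**2 - m*m)/(4.0*(l-1.0)**2 - 1.0))
            P[l][m] = a*(ct*P[l-1][m] - b*P[l-2][m])
    out = [0.0]*((order+1)**2)
    for l in range(order+1):
        for m in range(-l, l+1):
            if m == 0:
                v = P[l][0]
            elif m > 0:
                v = math.sqrt(2.0)*P[l][m]*math.cos(m*phi)
            else:
                v = math.sqrt(2.0)*P[l][-m]*math.sin(-m*phi)
            out[l*(l+1)+m] = v
    return out
```

For an order-N system all (N+1)^2 values are produced in one pass, which is what makes higher-order Ambisonic encoding and sound field rotation feasible in real time.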

P11-6 Headphone Virtualization: Improved Localization and Externalization of Non-Individualized HRTFs by Cluster Analysis
Robert P. Tame, DTS, Inc. - Bangor, County Down, UK; QMUL - London, UK; Daniele Barchiese, Queen Mary University of London - London, UK; Anssi Klapuri, Queen Mary University of London - London, UK
Research and experimentation are described that aim to test the hypothesis that improved localization and externalization can be achieved by allowing a listener to choose a single non-individualized profile of HRTFs from a subset of maximally different, best-representative profiles extracted from a database. k-means cluster analysis of entire impulse responses is used to identify the subset of profiles. Experimentation in a controlled environment shows that test subjects who were offered a choice of a preferred HRTF profile were able to consistently discriminate between a front-center or rear-center virtualized sound source 78.6% of the time, compared with 64.3% in a second group given an arbitrary HRTF profile. Similar results were obtained from virtualizations in uncontrolled environments.
Convention Paper 8757 (Purchase now)

P11-7 Searching Impulse Response Libraries Using Room Acoustic Descriptors
David Benson, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Wieslaw Woszczyk, McGill University - Montreal, QC, Canada
The ease with which Impulse Response (IR) libraries can be searched is a principal determinant of the usability of a convolution reverberation system. Popular software packages for convolution reverb typically permit searching over metadata that describe how and where an IR was measured, but this "how and where" information often fails to adequately characterize the perceptual properties of the reverberation associated with the IR. This paper explores an alternative approach to IR searching based not on “how and where” descriptors but instead on room acoustics descriptors that are thought to be more perceptually relevant. This alternative approach was compared with more traditional approaches on the basis of a simple IR search task. Results are discussed.
Convention Paper 8758 (Purchase now)

P11-8 HRIR Database with Measured Actual Source Direction Data
Javier Gómez Bolaños, Aalto University - Espoo, Finland; Ville Pulkki, Aalto University - Espoo, Finland
A database is presented consisting of head-related impulse responses (HRIR) of 21 subjects measured in an anechoic chamber with simultaneous measurement of head position and orientation. The HRIR data for sound sources at 1.35 m and 68 cm in 240 directions, with elevations between ±45 degrees and the full azimuth range, were measured using the blocked ear canal method. The frequency region of the measured responses ranges from 100 Hz up to 20 kHz for a flat response (+0.1 dB / –0.5 dB). The data are accompanied by the measured azimuth and elevation of the source with respect to the position and orientation of the subject's head, obtained with a tracking system based on infrared cameras. The HRIR data are accessible from the Internet.
Convention Paper 8759 (Purchase now)

P11-9 On the Study of Frontal-Emitter Headphone to Improve 3-D Audio Playback
Kaushik Sunder, Nanyang Technological University - Singapore, Singapore; Ee-Leng Tan, Nanyang Technological University - Singapore, Singapore; Woon-Seng Gan, Nanyang Technological University - Singapore, Singapore
Virtual audio synthesis and playback through headphones inherently have several limitations, such as front-back confusion and in-head localization of the sound presented to the listener. The use of non-individual head-related transfer functions (HRTFs) further increases this front-back confusion and degrades the virtual auditory image. In this paper we present a method for customizing non-individual HRTFs by embedding personal cues using the distinctive morphology of the individual's ear, and we study the frontal projection of sound using headphones to reduce front-back confusion in 3-D audio playback. Additional processing blocks, such as decorrelation and front-back biasing, are implemented to externalize and control the auditory depth of the frontal image. Subjective tests are conducted using these processing blocks, and their impact on localization is reported.
Convention Paper 8760 (Purchase now)

P11-10 Kinect Application for a Wave Field Synthesis-Based Reproduction System
Michele Gasparini, Universitá Politecnica della Marche - Ancona, Italy; Stefania Cecchi, Universitá Politecnica della Marche - Ancona, Italy; Laura Romoli, Universitá Politecnica della Marche - Ancona, Italy; Andrea Primavera, Universitá Politecnica della Marche - Ancona, Italy; Paolo Peretti, Universitá Politecnica della Marche - Ancona, Italy; Francesco Piazza, Universitá Politecnica della Marche - Ancona (AN), Italy
Wave field synthesis is a reproduction technique capable of reproducing a realistic acoustic image by taking advantage of a large number of loudspeakers. In particular, it is possible to reproduce moving sound sources, achieving good performance in terms of sound quality and accuracy. In this context, an efficient application of a wave field synthesis reproduction system is proposed, introducing Kinect control in the transmitting room that accurately tracks the source movement and thus preserves the spatial representation of the acoustic scene. The proposed architecture is implemented in a real-time framework with a network connection between the receiving and transmitting rooms; several tests have been performed to evaluate the realism of the achieved performance.
Convention Paper 8761 (Purchase now)


P12 - Sound Analysis and Synthesis


Saturday, October 27, 2:30 pm — 6:00 pm (Room 122)

Chair:
Jean Laroche, Audience, Inc.

P12-1 Drum Synthesis via Low-Frequency Parametric Modes and Altered Residuals
Haiying Xia, CCRMA, Stanford University - Stanford, CA, USA; Electrical Engineering, Stanford University - Stanford, CA, USA; Julius O. Smith, III, Stanford University - Stanford, CA, USA
Techniques are proposed for drum synthesis using a two-band source-filter model. A Butterworth lowpass/highpass band-split is used to separate a recorded “high tom” drum hit into low and high bands. The low band, containing the most salient modes of vibration, is downsampled and Poisson-windowed to accelerate its decay and facilitate mode extraction. A weighted equation-error method is used to fit an all-pole model—the “modal model”—to the first five modes of the low band in the case of the high tom. The modal model is removed from the low band by inverse filtering, and the resulting residual is taken as a starting point for excitation modeling in the low band. For the high band, low-order linear prediction (LP) is used to model the spectral envelope. The bands are resynthesized by feeding the residual signals to their respective all-pole forward filters, upsampling the low band, and summing. The modal model can be modulated to obtain the sound of different drums and other effects. The residuals can be altered to obtain the effects of different striking locations and striker materials.
Convention Paper 8762 (Purchase now)
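The front end of the two-band source-filter model can be sketched directly: a Butterworth lowpass/highpass pair splits the recording. The abstract does not state the crossover frequency or filter order, so the 200 Hz second-order design below is an illustrative assumption:

```python
import math
import numpy as np

def butter2(fc, fs, kind):
    # 2nd-order Butterworth biquad coefficients via the bilinear transform
    K = math.tan(math.pi*fc/fs)
    norm = 1.0/(1.0 + math.sqrt(2.0)*K + K*K)
    if kind == 'low':
        b = np.array([K*K, 2*K*K, K*K])*norm
    else:  # 'high'
        b = np.array([1.0, -2.0, 1.0])*norm
    a = np.array([1.0, 2.0*(K*K - 1.0)*norm, (1.0 - math.sqrt(2.0)*K + K*K)*norm])
    return b, a

def filt(b, a, x):
    # direct-form difference equation y[n] = sum(b*x) - sum(a*y)
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = b[0]*x[n]
        if n >= 1:
            y[n] += b[1]*x[n-1] - a[1]*y[n-1]
        if n >= 2:
            y[n] += b[2]*x[n-2] - a[2]*y[n-2]
    return y

def band_split(x, fs, fc=200.0):
    """Split a drum hit into (low, high) bands around fc."""
    lo = filt(*butter2(fc, fs, 'low'), x)
    hi = filt(*butter2(fc, fs, 'high'), x)
    return lo, hi
```

The low band would then be downsampled and Poisson-windowed for modal fitting, while the high band feeds the low-order LP envelope model described in the abstract.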

P12-2 Drum Pattern Humanization Using a Recursive Bayesian Framework
Ryan Stables, Birmingham City University - Birmingham, UK; Cham Athwal, Birmingham City University - Birmingham, UK; Rob Cade, Birmingham City University - Birmingham, UK
In this study we discuss some of the limitations of Gaussian humanization and consider ways in which the articulation patterns exhibited by percussionists can be emulated using a probabilistic model. Prior and likelihood functions are derived from a dataset of professional drummers to create a series of empirical distributions. These are then used to independently modulate the onset locations and amplitudes of a quantized sequence, using a recursive Bayesian framework. Finally, we evaluate the performance of the model against sequences created with a Gaussian humanizer and sequences created with a Hidden Markov Model (HMM) using paired listening tests. We are able to demonstrate that probabilistic models perform better than instantaneous Gaussian models, when evaluated using a 4/4 rock beat at 120 bpm.
Convention Paper 8763 (Purchase now)
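The instantaneous Gaussian humanizer that the paper uses as its baseline can be sketched as independent, memoryless jitter of onset times and velocities; its limitation, which motivates the empirical Bayesian approach, is exactly this independence across events. The standard deviations below are illustrative assumptions:

```python
import random

def gaussian_humanize(onsets_ms, velocities, timing_sd=8.0, vel_sd=6.0, seed=None):
    """Baseline Gaussian humanizer: jitter each quantized onset (ms) and
    MIDI velocity independently. timing_sd/vel_sd are illustrative values."""
    rng = random.Random(seed)
    new_onsets = [t + rng.gauss(0.0, timing_sd) for t in onsets_ms]
    new_vels = [min(127, max(1, round(v + rng.gauss(0.0, vel_sd))))
                for v in velocities]
    return new_onsets, new_vels
```

The paper's recursive Bayesian model replaces the fixed Gaussians with prior/likelihood distributions learned from professional drummers, so each event's deviation is conditioned on the preceding ones rather than drawn independently.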

P12-3 Procedural Audio Modeling for Particle-Based Environmental Effects
Charles Verron, REVES-INRIA - Sophia-Antipolis, France; George Drettakis, REVES/INRIA Sophia-Antipolis - Sophia-Antipolis, France
We present a sound synthesizer dedicated to particle-based environmental effects, for use in interactive virtual environments. The synthesis engine is based on five physically-inspired basic elements (that we call sound atoms) that can be parameterized and stochastically distributed in time and space. Based on this set of atomic elements, models are presented for reproducing several environmental sound sources. Compared to pre-recorded sound samples, procedural synthesis provides extra flexibility to manipulate and control the sound source properties with physically-inspired parameters. In this paper the controls are used simultaneously to modify particle-based graphical models, resulting in synchronous audio/graphics environmental effects. The approach is illustrated with three models that are commonly used in video games: fire, wind, and rain. The physically-inspired controls simultaneously drive graphical parameters (e.g., distribution of particles, average particles velocity) and sound parameters (e.g., distribution of sound atoms, spectral modifications). The joint audio/graphics control results in a tightly-coupled interaction between the two modalities that enhances the naturalness of the scene.
Convention Paper 8764 (Purchase now)

P12-4 Knowledge Representation Issues in Audio-Related Metadata Model Design
György Fazekas, Queen Mary University of London - London, UK; Mark B. Sandler, Queen Mary University of London - London, UK
In order for audio applications to interoperate, some agreement on how information is structured and encoded has to be in place within developer and user communities. This agreement can take the form of an industry standard or a widely adopted open framework consisting of conceptual data models expressed using formal description languages. There are several viable approaches to conceptualize audio related metadata, and several ways to describe the conceptual models, as well as encode and exchange information. While emerging standards have already been proven invaluable in audio information management, it remains difficult to design or choose the model that is most appropriate for an application. This paper facilitates this process by providing an overview, focusing on differences in conceptual models underlying audio metadata schemata.
Convention Paper 8765 (Purchase now)

P12-5 High-Level Semantic Metadata for the Control of Multitrack Adaptive Digital Audio Effects
Thomas Wilmering, Queen Mary University of London - London, UK; György Fazekas, Queen Mary University of London - London, UK; Mark B. Sandler, Queen Mary University of London - London, UK
Existing adaptive digital audio effects predominantly use low-level features in order to derive control data. These data do not typically correspond to high-level musicological or semantic information about the content. In order to apply audio transformations selectively on different musical events in a multitrack project, audio engineers and music producers have to resort to manual selection or annotation of the tracks in traditional audio production environments. We propose a new class of audio effects that uses high-level semantic audio features in order to obtain control data for multitrack effects. The metadata is expressed in RDF using several music and audio related Semantic Web ontologies and retrieved using the SPARQL query language.
Convention Paper 8766 (Purchase now)

P12-6 On Accommodating Pitch Variation in Long Term Prediction of Speech and Vocals in Audio Coding
Tejaswi Nanjundaswamy, University of California, Santa Barbara - Santa Barbara, CA, USA; Kenneth Rose, University of California, Santa Barbara - Santa Barbara, CA, USA
Exploiting inter-frame redundancies is key to performance enhancement of delay constrained perceptual audio coders. The long term prediction (LTP) tool was introduced in the MPEG Advanced Audio Coding standard, especially for the low delay mode, to capitalize on the periodicity in naturally occurring sounds by identifying a segment of previously reconstructed data as prediction for the current frame. However, speech and vocal content in audio signals is well known to be quasi-periodic and involve small variations in pitch period, which compromise the LTP tool performance. The proposed approach modifies LTP by introducing a single parameter of “geometric” warping, whereby past periodicity is geometrically warped to provide an adjusted prediction for the current samples. We also propose a three-stage parameter estimation technique, where an unwarped LTP filter is first estimated to minimize the mean squared prediction error; then filter parameters are complemented with the warping parameter, and re-estimated within a small neighboring search space to retain the set of S best LTP parameters; and finally, a perceptual distortion-rate procedure is used to select from the S candidates, the parameter set that minimizes the perceptual distortion. Objective and subjective evaluations substantiate the proposed technique’s effectiveness.
Convention Paper 8767 (Purchase now)

P12-7 Parametric Coding of Piano Signals
Michael Schnabel, Ilmenau University of Technology - Ilmenau, Germany; Benjamin Schubert, Ilmenau University of Technology - Ilmenau, Germany; Fraunhofer IIS - Erlangen, Germany; Gerald Schuller, Ilmenau University of Technology - Ilmenau, Germany
In this paper an audio coding procedure for piano signals is presented, based on a physical model of the piano. Instead of coding the waveform of the signal, the compression is realized by extracting relevant parameters at the encoder. The signal is then re-synthesized at the decoder using the physical model. We describe the development and implementation of algorithms for parameter extraction and the combination of all the components into a coder. A formal listening test shows that we can obtain high sound quality at a bit rate lower than that of conventional coders. The proposed piano coder achieves a bit rate of 11.6 kbps; we use HE-AAC as a reference codec at a gross bit rate of 16 kbps. For low and medium chords the proposed piano coder outperforms HE-AAC in terms of subjective quality, while its quality falls below HE-AAC for high chords.
Convention Paper 8768 (Purchase now)

 
 

P13 - Auditory Perception and Evaluation


Saturday, October 27, 4:00 pm — 5:30 pm (Foyer)

P13-1 Real-Time Implementation of Glasberg and Moore's Loudness Model for Time-Varying Sounds
Elvira Burdiel, Queen Mary University of London - London, UK; Lasse Vetter, Queen Mary University of London - London, UK; Andrew J. R. Simpson, Queen Mary University of London - London, UK; Michael J. Terrell, Queen Mary University of London - London, UK; Andrew McPherson, Queen Mary University of London - London, UK; Mark B. Sandler, Queen Mary University of London - London, UK
In this paper, a real-time implementation of the loudness model of Glasberg and Moore [J. Audio Eng. Soc. 50, 331–342 (2002)] for time-varying sounds is presented. This real-time implementation embodies several approximations to the model that are necessary to reduce computational costs, both in the time and frequency domains. A quantitative analysis is given that shows the effect of parametric time and frequency domain approximations by comparison to the loudness predictions of the original model. Using real-world music, both the errors introduced as a function of the optimization parameters and the corresponding reduction in computational costs are quantified. Thus, this work provides an informed, contextual approach to approximation of the loudness model for practical use.
Convention Paper 8769 (Purchase now)

P13-2 Subjective Selection of Head-Related Transfer Functions (HRTF) Based on Spectral Coloration and Interaural Time Differences (ITD) Cues
Kyla McMullen, Clemson University - Clemson, SC, USA; Agnieszka Roginska, New York University - New York, NY, USA; Gregory H. Wakefield, University of Michigan - Ann Arbor, MI, USA
The present study describes an HRTF subjective individualization procedure in which a listener selects from a database those HRTFs that pass several perceptual criteria. Earlier work has demonstrated that listeners are as likely to select a database HRTF as their own when judging externalization, elevation, and front/back discriminability; the procedure employed in that earlier work, however, requires individually measured ITDs. The present study modifies the original procedure so that individually measured ITDs are unnecessary. Specifically, a standardized ITD is used, in place of the listener's ITD, to identify those database minimum-phase HRTFs with desirable perceptual properties. The selection procedure is then repeated for one of the preferred minimum-phase HRTFs, searching over a database of ITDs. Consistent with the original study, listeners prefer a small subset of HRTFs; in contrast, while individual listeners show clear preferences for some ITDs over others, no small subset of ITDs appears to satisfy all listeners.
Convention Paper 8770 (Purchase now)

P13-3 Does Understanding of Test Items Help or Hinder Subjective Assessment of Basic Audio Quality?
Nadja Schinkel-Bielefeld, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; International Audio Laboratories - Erlangen, Germany; Netaya Lotze, Leibniz Universität Hannover - Hannover, Germany; Frederik Nagel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; International Audio Laboratories - Erlangen, Germany
During listening tests for subjective evaluation of intermediate audio quality, test items in various foreign languages are sometimes presented. The perception of basic audio quality may therefore vary depending on the listeners' native language. This study investigated the role of understanding in quality assessment, employing regular German sentences and sentences consisting of semi-automatically generated German-sounding pseudo words. Less experienced listeners in particular rated pseudo words slightly higher than German sentences of matching prosody. While listeners played the reference longer for pseudo items, for the other conditions they tended to play the German items longer. Though the effects of understanding in our study were small, they may play a role for foreign languages that are less understandable than our pseudo sentences and that differ in phoneme inventory.
Convention Paper 8771 (Purchase now)

P13-4 Subjective Assessments of Higher Order Ambisonic Sound Systems in Varying Acoustical Conditions
Andrew J. Horsburgh, University of the West of Scotland - Paisley, Scotland, UK; Robert E. Davis, University of the West of Scotland - Paisley, Scotland, UK; Martyn Moffat, University of the West of Scotland - Paisley, Scotland, UK; D. Fraser Clark, University of the West of Scotland - Paisley, Scotland, UK
Results of subjective assessments of source perception using higher order Ambisonics are presented in this paper. Test stimuli include multiple synthetic and naturally recorded sources presented in various horizontal and mixed-order Ambisonic listening tests. Using a small group of trained and untrained listening participants, materials were evaluated over various Ambisonic orders, each scrutinized for localization accuracy, apparent source width, and realistic impression. The results show a general preference for 3rd-order systems in each of the three test categories: speech, pure tone, and music. Localization results for 7th order show a trend of stable imagery with complex stimulus sources, with pure tones in the anechoic environment providing the highest accuracy.
Convention Paper 8772 (Purchase now)

P13-5 A Viewer-Centered Revision of Audiovisual Content Classifiers
Katrien De Moor, Ghent University - Ghent, Belgium; Ulrich Reiter, Norwegian University of Science and Technology - Trondheim, Norway
There is a growing interest in the potential value of content-driven requirements for increasing the perceived quality of audiovisual material and optimizing the underlying performance processes. However, the categorization of content and the identification of content-driven requirements are still largely based on technical characteristics; there is a gap in the literature when it comes to including viewer- and content-related aspects. In this paper we go beyond purely technical features as content classifiers and contribute to a deeper understanding of viewer preferences and requirements. We present results from a qualitative study using semi-structured interviews, aimed at exploring content-driven associations from a bottom-up perspective. The results show that users’ associations, requirements, and expectations differ across content types, and that these differences should be taken into account when selecting stimulus material for subjective quality assessments. We also relate these results to previous research on content classification.
Convention Paper 8773 (Purchase now)

P13-6 Perception of Time-Varying Signals: Timbre and Phonetic JND of Diphthong
Arthi Subramaniam, Indian Institute of Science - Bangalore, India; Thippur V. Sreenivas, Indian Institute of Science - Bangalore, India
In this paper we propose a linear time-varying model for diphthong synthesis based on linear interpolation of formant frequencies. We then determine the timbre just-noticeable difference (JND) for the diphthong /a I/ (as in ‘buy’) with constant-pitch excitation through a perception experiment involving four listeners, and explore the phonetic JND of the diphthong. JND responses are determined using a 1-up-3-down procedure. Using the experimental data, we map the timbre JND and phonetic JND onto a 2-D region of percentage change of the formant glides. The timbre and phonetic JND contours for constant pitch show that the phonetic JND region encloses the timbre JND region and varies across listeners. The JND is observed to be more sensitive to the ending vowel /I/ than to the starting vowel /a/ in some listeners, and to depend on the direction of perturbation of the starting and ending vowels.
Convention Paper 8774 (Purchase now)

P13-7 Employing Supercomputing Cluster to Acoustic Noise Map Creation
Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland; Jozef Kotus, Gdansk University of Technology - Gdansk, Poland; Maciej Szczodrak, Gdansk University of Technology - Gdansk, Poland; Bozena Kostek, Gdansk University of Technology - Gdansk, Poland
A system is presented for determining acoustic noise distribution and assessing its adverse effects in short time periods inside large urban areas, owing to the employment of a supercomputing cluster. A unique feature of the system is the psychoacoustic noise dosimetry implemented to inform interested citizens about predicted auditory fatigue effects that may be caused by exposure to excessive noise. The noise level computation is based on the engineered Noise Prediction Model (NPM) stemming from the Harmonoise model. The sound level distribution in the urban area can be viewed by users via the provided web service. An example map is presented for consecutive time periods to show the capability of the supercomputing cluster to update noise level maps frequently.
Convention Paper 8775 (Purchase now)

P13-8 Objective and Subjective Evaluations of Digital Audio Workstation Summing
Brett Leonard, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Scott Levine, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Padraig Buttner-Schnirer, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada
Many recording professionals attest to a perceivable difference in sound quality between different digital audio workstations (DAWs), yet there is little in the way of quantifiable evidence to support these claims. To test these assertions, the internal summing of five different DAWs is tested. Multitrack stems are recorded into each DAW and summed to a single, stereo mix. This mix is evaluated objectively in reference to a purely mathematical sum generated in Matlab to avoid any system-specific limitations in the summing process. The stereo sums are also evaluated by highly trained listeners through a three-alternative forced-choice test focusing on three different DAWs. Results indicate that when panning is excluded from the mixing process, minimal objective and subjective differences exist between workstations.
Convention Paper 8776 (Purchase now)

P13-9 Hong Kong Film Score Production: A Hollywood Informed Approach
Robert Jay Ellis-Geiger, City University of Hong Kong - Hong Kong, SAR China
This paper presents a Hollywood-informed approach toward film score production, with special attention given to the impact of dialogue on the recording and mixing of the music score. The author reveals his process for creating a hybrid (real and MIDI) orchestral film score that was recorded, mixed, and produced in Hong Kong for the English-language feature film New York November (2011). The film was shot in New York, directed by Austrian filmmakers Gerhard Fillei and Joachim Krenn in collaboration with film composer and fellow Austrian, Sascha Selke. Additional instruments were remotely recorded in Singapore, and the final soundtrack was mixed at a dubbing theater in Berlin. The author acted as score producer, conductor, co-orchestrator, MIDI arranger, musician, and composer of additional music.
Convention Paper 8777 (Purchase now)

P13-10 Investigation into Electric Vehicles Exterior Noise Generation
Stefania Cecchi, Università Politecnica delle Marche - Ancona, Italy; Andrea Primavera, Università Politecnica delle Marche - Ancona, Italy; Laura Romoli, Università Politecnica delle Marche - Ancona, Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy; Ferruccio Bettarelli, Leaff Engineering - Ancona, Italy; Ariano Lattanzi, Leaff Engineering - Ancona, Italy
Electric vehicles have received increasing interest in recent years for their well-known benefits. However, electric cars do not produce noise as an internal combustion engine vehicle does, leading to safety issues for pedestrians and cyclists. It is therefore necessary to create an external warning sound for electric cars while meeting users’ sound quality expectations. In this context, several sounds generated with different techniques are proposed here, taking into consideration some aspects of real engine characteristics. Furthermore, a subjective investigation is performed in order to determine users’ preferences within the wide range of possible synthetic sounds.
Convention Paper 8778 (Purchase now)

 
 

P14 - Spatial Audio Over Headphones


Sunday, October 28, 9:00 am — 11:30 am (Room 121)

Chair:
David McGrath, Dolby Australia - McMahons Point, NSW, Australia

P14-1 Preferred Spatial Post-Processing of Popular Stereophonic Music for Headphone Reproduction
Ella Manor, The University of Sydney - Sydney, NSW, Australia; William L. Martens, University of Sydney - Sydney, NSW, Australia; Densil A. Cabrera, University of Sydney - Sydney, NSW, Australia
The spatial imagery experienced when listening to conventional stereophonic music via headphones is considerably different from that experienced in loudspeaker reproduction. While the difference might be reduced when stereophonic program material is spatially processed to simulate loudspeaker crosstalk for headphone reproduction, previous listening tests have shown that such processing typically produces results that listeners do not prefer over the original (unprocessed) version of a music program. In this study a double-blind test was conducted in which listeners compared five versions of eight programs from a variety of music genres and gave both preference ratings and ensemble stage width (ESW) ratings. Of the four alternative post-processing algorithms, the most preferred outputs resulted from a nearfield crosstalk simulation mimicking the low-frequency interaural level differences typical of close-range sources.
Convention Paper 8779 (Purchase now)

P14-2 Interactive 3-D Audio: Enhancing Awareness of Details in Immersive Soundscapes?
Mikkel Schmidt, Technical University of Denmark - Kgs. Lyngby, Denmark; Stephen Schwartz, SoundTales - Helsingør, Denmark; Jan Larsen, Technical University of Denmark - Kgs. Lyngby, Denmark
Spatial audio and the possibility of interacting with the audio environment is thought to increase listeners' attention to details in a soundscape. This work examines if interactive 3-D audio enhances listeners' ability to recall details in a soundscape. Nine different soundscapes were constructed and presented in either mono, stereo, 3-D, or interactive 3-D, and performance was evaluated by asking factual questions about details in the audio. Results show that spatial cues can increase attention to background sounds while reducing attention to narrated text, indicating that spatial audio can be constructed to guide listeners' attention.
Convention Paper 8780 (Purchase now)

P14-3 Simulating Autophony with Auralized Oral-Binaural Room Impulse Responses
Manuj Yadav, University of Sydney - Sydney, NSW, Australia; Luis Miranda, University of Sydney - Sydney, NSW, Australia; Densil A. Cabrera, University of Sydney - Sydney, NSW, Australia; William L. Martens, University of Sydney - Sydney, NSW, Australia
This paper presents a method for simulating the sound that one hears from one’s own voice in a room acoustic environment. Impulse responses from the mouth to the two ears of the same head are auralized within a computer-modeled room in ODEON, using higher-order Ambisonics to model the directivity pattern of an anthropomorphic head and torso. These binaural room impulse responses, which can be measured for all possible head movements, are input into a mixed-reality room acoustic simulation system for talking-listeners. With the system, “presence” in a room environment different from the one in which the talker is physically present is created in real time for voice-related tasks.
Convention Paper 8781 (Purchase now)

P14-4 Head-Tracking Techniques for Virtual Acoustics Applications
Wolfgang Hess, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Synthesis of auditory virtual scenes often requires the use of a head-tracker. Virtual sound fields benefit from continuous adaptation to a listener’s position while presented through headphones or loudspeakers. This task requires position- and time-accurate, continuous, robust capturing of the position of the listener’s outer ears. Current head-tracker technologies solve this task with cheap and reliable electronic techniques. Environmental conditions have to be considered to find an optimal tracking solution for each surrounding and each field of application. A categorization of head-tracking systems is presented: inside-out describes tracking stationary references from sensors inside a scene, whereas outside-in is the term for capturing a scene from outside. Marker-based and marker-less approaches are described and evaluated by means of commercially available products, e.g., the MS Kinect, and proprietary systems.
Convention Paper 8782 (Purchase now)

P14-5 Scalable Binaural Synthesis on Mobile Devices
Christian Sander, University of Applied Sciences Düsseldorf - Düsseldorf, Germany; Robert Schumann Hochschule Düsseldorf - Düsseldorf, Germany; Frank Wefers, RWTH Aachen University - Aachen, Germany; Dieter Leckschat, University of Applied Sciences Düsseldorf - Düsseldorf, Germany
The binaural reproduction of sound sources through headphones in mobile applications is becoming a promising opportunity to create an immersive three-dimensional listening experience without the need for extensive equipment. Many ideas for outstanding applications in teleconferencing, multichannel rendering for headphones, gaming, or auditory interfaces implementing binaural audio have been proposed. However, the diversity of applications calls for scalability of quality and performance costs so as to use and share hardware resources economically. For this approach, scalable real-time binaural synthesis on mobile platforms was developed and implemented in a test application in order to evaluate what current mobile devices are capable of in terms of binaural technology, both qualitatively and quantitatively. In addition, the audio part of three application scenarios was simulated.
Convention Paper 8783 (Purchase now)

 
 

P15 - Signal Processing Fundamentals


Sunday, October 28, 2:00 pm — 5:30 pm (Room 121)

Chair:
Lars Villemoes, Dolby Sweden - Stockholm, Sweden

P15-1 Frequency-Domain Implementation of Time-Varying FIR Filters
Earl Vickers, STMicroelectronics, Inc. - Santa Clara, CA, USA; The Sound Guy, Inc. - San Jose, CA, USA
Finite impulse response filters can be implemented efficiently by means of fast convolution in the frequency domain. However, in applications such as speech enhancement or channel upmix, where the filter is a time-varying function of the input signal, standard approaches can suffer from artifacts and distortion due to circular convolution and the resulting time-domain aliasing. Existing solutions can be computationally prohibitive. This paper compares a number of previous algorithms and presents an alternate method based on the equivalence between frequency-domain convolution and time-domain windowing. Additional computational efficiency can be attained by careful choice of the analysis window.
Convention Paper 8784 (Purchase now)
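For background, block-wise fast convolution with a per-block (time-varying) filter can be sketched as plain zero-padded overlap-add. This baseline avoids circular-convolution aliasing by choosing an FFT size of at least block + taps - 1; it is a generic illustration, not the paper's windowing-based method:

```python
import numpy as np

def overlap_add_tv_fir(x, filters, block):
    """Block-wise fast convolution with a (possibly) time-varying FIR.

    x       : input signal (1-D array)
    filters : function block_index -> FIR taps (length L), may vary per block
    block   : hop size B; the FFT size is chosen >= B + L - 1 so the circular
              convolution equals the linear one (no time-domain aliasing).
    """
    L = len(filters(0))
    n_blocks = int(np.ceil(len(x) / block))
    nfft = 1 << int(np.ceil(np.log2(block + L - 1)))  # next power of two
    y = np.zeros(n_blocks * block + L - 1)
    for b in range(n_blocks):
        seg = x[b * block:(b + 1) * block]
        X = np.fft.rfft(seg, nfft)              # zero-padded input block
        H = np.fft.rfft(filters(b), nfft)       # this block's filter
        yb = np.fft.irfft(X * H, nfft)[:block + L - 1]
        y[b * block:b * block + block + L - 1] += yb  # overlap-add the tail
    return y[:len(x) + L - 1]
```

With a static filter this reduces exactly to ordinary linear convolution; with a per-block filter, artifacts at block boundaries are the problem the paper's windowing-based method targets.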

P15-2 Estimating a Signal from a Magnitude Spectrogram via Convex Optimization
Dennis L. Sun, Stanford University - Stanford, CA, USA; Julius O. Smith, III, Stanford University - Stanford, CA, USA
The problem of recovering a signal from the magnitude of its short-time Fourier transform (STFT) is a longstanding one in audio signal processing. Existing approaches rely on heuristics that often perform poorly because of the nonconvexity of the problem. We introduce a formulation of the problem that lends itself to a tractable convex program. We observe that our method yields better reconstructions than the standard Griffin-Lim algorithm. We provide an algorithm and discuss practical implementation details, including how the method can be scaled up to larger examples.
Convention Paper 8785 (Purchase now)
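The Griffin-Lim algorithm cited as the baseline alternates between enforcing the target magnitude and projecting onto the set of consistent spectrograms. A compact NumPy sketch (window size, hop, and iteration count are arbitrary illustrative choices, not the paper's settings):

```python
import numpy as np

def stft(x, n=256, hop=64):
    w = np.hanning(n)
    frames = [w * x[i:i + n] for i in range(0, len(x) - n + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def istft(S, n=256, hop=64):
    # Least-squares inverse: weighted overlap-add, normalized by window power
    w = np.hanning(n)
    length = (len(S) - 1) * hop + n
    y, norm = np.zeros(length), np.zeros(length)
    for i, F in enumerate(S):
        y[i * hop:i * hop + n] += w * np.fft.irfft(F, n)
        norm[i * hop:i * hop + n] += w ** 2
    return y / np.maximum(norm, 1e-12)

def griffin_lim(mag, iters=100, n=256, hop=64):
    """Classic Griffin-Lim: keep the given magnitudes, iteratively replace the
    phase with that of the nearest consistent signal's STFT."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(iters):
        x = istft(mag * phase, n, hop)                  # consistency projection
        phase = np.exp(1j * np.angle(stft(x, n, hop)))  # magnitude projection
    return istft(mag * phase, n, hop)
```

The nonconvexity the abstract mentions shows up here as sensitivity to the random phase initialization, which the convex formulation is designed to avoid.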

P15-3 Distance-Based Automatic Gain Control with Continuous Proximity-Effect Compensation
Walter Etter, Bell Labs, Alcatel-Lucent - Murray Hill, NJ, USA
This paper presents a method of Automatic Gain Control (AGC) that derives the gain from the sound-source-to-microphone distance, utilizing a distance sensor. The concept makes use of the fact that microphone output levels vary inversely with the distance to a spherical sound source. It is applicable to frequently arising situations in which a speaker does not maintain a constant microphone distance. In addition, we address undesired bass response variations caused by the proximity effect. Knowledge of the sound-source-to-microphone distance permits accurate compensation for both frequency response changes and distance-related signal level changes. In particular, a distance-based AGC can normalize these signal level changes without degrading signal quality, as opposed to conventional AGCs, which introduce distortion, pumping, and breathing. Given an accurate distance sensor, gain changes can take effect instantaneously and do not need to be gated by attack and release times. Likewise, frequency response changes due to undesired proximity-effect variations can be corrected adaptively using precise inverse filtering derived from continuous distance measurements, sound arrival angles, and microphone directivity, eliminating the need for inadequate static proximity-effect compensation settings on the microphone.
Convention Paper 8786 (Purchase now)
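The inverse-distance gain law behind such an AGC is simple: free-field level from a spherical source falls as 1/r, so normalizing to a reference distance needs a gain proportional to r. A minimal sketch of that relationship only (the 0.3 m reference distance is an assumed illustration, and the proximity-effect filtering is omitted):

```python
import numpy as np

def distance_gain(r, r_ref=0.3):
    """Free-field level of a point source falls as 1/r, so normalizing the
    microphone signal to reference distance r_ref needs a gain of r / r_ref."""
    return r / r_ref

def distance_gain_db(r, r_ref=0.3):
    """Same gain expressed in dB: doubling the distance costs ~6 dB."""
    return 20.0 * np.log10(distance_gain(r, r_ref))
```

Because the gain is a deterministic function of the measured distance, no attack/release smoothing is needed, which is the quality advantage the abstract claims over level-driven AGCs.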

P15-4 Subband Comfort Noise Insertion for an Acoustic Echo Suppressor
Guangji Shi, DTS, Inc. - Los Gatos, CA, USA; Changxue Ma, DTS, Inc. - Los Gatos, CA, USA
This paper presents an efficient approach to comfort noise insertion for an acoustic echo suppressor. Acoustic echo suppression causes frequent noise-level changes in noisy environments. The proposed algorithm estimates the noise level for each frequency band using a minimum-variance-based noise estimator and generates comfort noise based on the estimated noise level and a random phase generator. Tests show that the proposed comfort noise insertion algorithm inserts an appropriate level of comfort noise that matches the background noise characteristics in an efficient manner.
Convention Paper 8787 (Purchase now)
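The synthesis step, pairing an estimated per-band noise magnitude with random phase, can be sketched as follows (a generic illustration of the idea, not the paper's implementation; the minimum-variance noise estimator itself is omitted):

```python
import numpy as np

def comfort_noise_frame(band_mag, n, rng):
    """Synthesize one frame of comfort noise from an estimated per-bin noise
    magnitude `band_mag` (n // 2 + 1 bins) by pairing it with uniformly
    random phase, then inverting the spectrum to the time domain."""
    phase = np.exp(2j * np.pi * rng.random(len(band_mag)))
    spec = band_mag * phase
    spec[0] = band_mag[0]      # DC must be real for a real output frame
    spec[-1] = band_mag[-1]    # so must the Nyquist bin
    return np.fft.irfft(spec, n)
```

The resulting frame has exactly the target magnitude spectrum, so its coloration matches the estimated background noise while its random phase keeps successive frames uncorrelated.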

P15-5 Potential of Non-uniformly Partitioned Convolution with Freely Adaptable FFT Sizes
Frank Wefers, RWTH Aachen University - Aachen, Germany; Michael Vorländer, RWTH Aachen University - Aachen, Germany
The standard class of algorithms used for FIR filtering with long impulse responses and short input-to-output latencies is non-uniformly partitioned fast convolution. Here a filter impulse response is split into several smaller sub-filters of different sizes. Small sub-filters are needed for a low latency, whereas long filter parts allow for more computational efficiency. Finding an optimal filter partition that minimizes the computational cost is not trivial; however, optimization algorithms are known. Mostly the Fast Fourier Transform (FFT) is used for implementing the fast convolution of the sub-filters. Usually the FFT transform sizes are chosen to be powers of two, which has a direct effect on the partitioning of filters. Recent studies reveal that the use of FFT transform sizes that are not powers of two has strong potential to lower the computational cost of the convolution even further. This paper presents a new real-time low-latency convolution algorithm that performs non-uniformly partitioned convolution with freely adaptable FFT sizes. In addition, an optimization technique is presented that adjusts the FFT sizes to minimize the computational complexity within this new framework of non-uniform filter partitions. Finally, the performance of the algorithm is compared to conventional methods.
Convention Paper 8788 (Purchase now)
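Why non-uniform partitions win can be seen with a rough cost model: each part amortizes its FFT work over its block size, while the shortest blocks alone set the latency. The sketch below uses hypothetical operation counts (indicative only, not the paper's cost model); both example partitions in the usage test cover a 16,384-tap filter at a 128-sample latency:

```python
import numpy as np

def partition_cost_per_sample(partition):
    """Rough per-input-sample cost of non-uniformly partitioned convolution.

    partition : list of (block_size B, sub_filter_count) pairs; the part
                covers B * count filter taps. Model: each part runs one
                forward + inverse FFT of size 2B every B samples, plus one
                complex multiply-accumulate per bin per sub-filter.
    """
    cost = 0.0
    for B, count in partition:
        nfft = 2 * B
        fft_cost = 2 * nfft * np.log2(nfft)        # forward + inverse FFT
        mac_cost = count * 4 * (nfft // 2 + 1)     # complex MACs, all sub-filters
        cost += (fft_cost + mac_cost) / B          # amortize over the block
    return cost
```

Under this model a uniform partition pays the short-block FFT overhead for every sub-filter, while a non-uniform partition pushes most taps into long, cheap blocks, which is the trade-off the paper's FFT-size optimization tunes.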

P15-6 Comparison of Filter Bank Design Algorithms for Use in Low Delay Audio Coding
Stephan Preihs, Leibniz Universität Hannover - Hannover, Germany; Thomas Krause, Leibniz Universität Hannover - Hannover, Germany; Jörn Ostermann, Leibniz Universität Hannover - Hannover, Germany
This paper is concerned with the comparison of filter bank design algorithms for use in audio coding applications with a very low coding delay of less than 1 ms. Different methods for numerical optimization of low delay filter banks are analyzed and compared. In addition, the use of the designed filter banks in combination with a delay-free ADPCM coding scheme is evaluated. Design properties and results of PEAQ (Perceptual Evaluation of Audio Quality) based objective audio quality evaluation as well as a listening test are given. The results show that in our coding scheme a significant improvement of audio quality, especially for critical signals, can be achieved by the use of filter banks designed with alternative filter bank design algorithms.
Convention Paper 8789 (Purchase now)

P15-7 Balanced Phase Equalization: IIR Filters with Independent Frequency Response and Identical Phase Response
Peter Eastty, Oxford Digital Limited - Oxford, UK
It has long been assumed that in order to provide sets of filters with arbitrary frequency responses but matching phase responses, symmetrical, finite impulse response filters must be used. A method is given for the construction of sets of infinite impulse response (recursive) filters that can achieve this aim with lower complexity, power, and delay. The zeros of each filter in a set are rearranged to provide linear phase while the phase shift due to the poles of each filter is counteracted by all-pass compensation filters added to other members of the set.
Convention Paper 8790 (Purchase now)

 
 

P16 - Analysis and Synthesis of Sound


Sunday, October 28, 2:00 pm — 3:30 pm (Foyer)

P16-1 Envelope-Based Spatial Parameter Estimation in Directional Audio Coding
Michael Kratschmer, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Oliver Thiergart, International Audio Laboratories Erlangen - Erlangen, Germany; Ville Pulkki, Aalto University - Espoo, Finland
Directional Audio Coding provides an efficient description of spatial sound in terms of few audio downmix signals and parametric side information, namely the direction-of-arrival (DOA) and diffuseness of the sound. This representation allows an accurate reproduction of the recorded spatial sound with almost arbitrary loudspeaker setups. The DOA information can be efficiently estimated with linear microphone arrays by considering the phase information between the sensors. Due to the microphone spacing, the DOA estimates are corrupted by spatial aliasing at higher frequencies affecting the sound reproduction quality. In this paper we propose to consider the signal envelope for estimating the DOA at higher frequencies to avoid the spatial aliasing problem. Experimental results show that the presented approach has great potential in improving the estimation accuracy and rendering quality.
Convention Paper 8791 (Purchase now)
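The phase-based DOA estimation whose aliasing the paper addresses can be sketched for a two-microphone pair at a single FFT bin; a textbook illustration (the geometry and all parameter values are assumptions, not from the paper):

```python
import numpy as np

def doa_from_phase(x1, x2, f_bin, fs, n, d, c=343.0):
    """Classic phase-difference DOA estimate for one FFT bin of a two-mic
    pair with spacing d. Unambiguous only below the spatial-aliasing limit
    f < c / (2 * d); above it the phase wraps and the estimate aliases,
    which motivates the envelope-based estimator in the paper."""
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    dphi = np.angle(X1[f_bin] * np.conj(X2[f_bin]))  # phase lead of mic 1
    f = f_bin * fs / n
    s = dphi * c / (2 * np.pi * f * d)               # = sin(theta)
    return np.degrees(np.arcsin(np.clip(s, -1.0, 1.0)))
```

With d = 8 cm the aliasing limit is about 2.1 kHz; bins above that frequency are exactly where the envelope-based approach takes over.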

P16-2 Approximation of Dynamic Convolution Exploiting Principal Component Analysis: Objective and Subjective Quality Evaluation
Andrea Primavera, Università Politecnica delle Marche - Ancona, Italy; Stefania Cecchi, Università Politecnica delle Marche - Ancona, Italy; Laura Romoli, Università Politecnica delle Marche - Ancona, Italy; Michele Gasparini, Università Politecnica delle Marche - Ancona, Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy
In recent years several techniques have been proposed in the literature to emulate nonlinear electro-acoustic devices such as compressors, distortion effects, and preamplifiers. Among them, dynamic convolution is one of the most common approaches to this task. In this paper an exhaustive objective and subjective analysis of a dynamic convolution operation based on principal component analysis has been performed. Taking real nonlinear systems into consideration, such as a bass preamplifier, a distortion effect, and a compressor, comparisons with existing state-of-the-art techniques have been carried out in order to prove the effectiveness of the proposed approach.
Convention Paper 8792 (Purchase now)

P16-3 Optimized Implementation of an Innovative Digital Audio Equalizer
Marco Virgulti, Università Politecnica delle Marche - Ancona, Italy; Stefania Cecchi, Università Politecnica delle Marche - Ancona, Italy; Andrea Primavera, Università Politecnica delle Marche - Ancona, Italy; Laura Romoli, Università Politecnica delle Marche - Ancona, Italy; Emanuele Ciavattini, Leaff Engineering - Ancona, Italy; Ferruccio Bettarelli, Leaff Engineering - Ancona, Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy
Digital audio equalization is one of the most common operations in the acoustic field, but its performance depends on computational complexity and filter design techniques. Starting from a previous FIR implementation based on multirate systems and filter bank theory, an optimized digital audio equalizer is derived. The proposed approach employs all-pass IIR filters to improve the filter bank structure, which was developed to avoid ripple between adjacent bands. The effectiveness of the optimized implementation is shown by comparing it with the FIR approach. The solution presented here has several advantages, improving equalization performance through low computational complexity, low delay, and a uniform frequency response.
Convention Paper 8793 (Purchase now)

P16-4 Automatic Mode Estimation of Persian Musical Signals
Peyman Heydarian, London Metropolitan University - London, UK; Lewis Jones, London Metropolitan University - London, UK; Allan Seago, London Metropolitan University - London, UK
Musical mode is central to maqamic musical traditions that span from Western China to Southern Europe. A mode usually represents the scale and is to some extent an indication of the emotional content of a piece. Knowledge of the mode is useful in searching multicultural archives of maqamic musical signals, so modal information is worth including in a file's metadata. An automatic mode classification algorithm has potential applications in music recommendation and playlist generation, where pieces can be ordered by a perceptually accepted criterion such as the mode, and it could also serve as a framework for music composition and synthesis. This paper presents an algorithm for the classification of Persian audio musical signals based on a generative approach, i.e., Gaussian Mixture Models (GMM), with chroma used as the feature. The results are compared with a chroma-based method using a Manhattan distance measure that we developed previously.
Convention Paper 8794 (Purchase now)

P16-5 Generating Matrix Coefficients for Feedback Delay Networks Using Genetic Algorithm
Michael Chemistruck, University of Miami - Coral Gables, FL, USA; Kyle Marcolini, University of Miami - Coral Gables, FL, USA; Will Pirkle, University of Miami - Coral Gables, FL, USA
This paper analyzes the use of the Genetic Algorithm (GA) in conjunction with a length-4 feedback delay network for audio reverberation applications. While it is possible to manually assign coefficient values to the feedback network, our goal was to automate the generation of these coefficients to help produce a reverb with characteristics as similar as possible to those of real room reverberation. To do this we designed a GA to be used in conjunction with a delay-based reverb, which is more desirable for real-time applications than the more computationally expensive convolution reverb.
Convention Paper 8795 (Purchase now)
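As an illustration of the general approach (not the authors' implementation), a genetic algorithm can evolve a 4x4 feedback matrix against a chosen fitness function. The fitness below is an assumption for the sketch: it simply rewards closeness to an energy-preserving (orthogonal) matrix, a common desideratum for lossless FDN feedback, whereas the paper's fitness targets real-room characteristics.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(m):
    # Assumed fitness for this sketch: reward 4x4 matrices close to
    # energy-preserving (orthogonal), i.e. small ||M M^T - I||.
    return -np.linalg.norm(m @ m.T - np.eye(4))

def evolve(pop_size=40, generations=200, sigma=0.05):
    pop = [rng.uniform(-1, 1, (4, 4)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]                         # truncation selection
        children = [p + rng.normal(0, sigma, (4, 4)) for p in parents]  # Gaussian mutation
        pop = parents + children                              # elitist replacement
    return max(pop, key=fitness)

best = evolve()
print(round(-fitness(best), 3))     # residual ||M M^T - I|| of the best matrix
```

The evolved matrix would then be installed as the feedback matrix of the four delay lines; crossover operators, omitted here for brevity, are the other standard GA ingredient.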

P16-6 Low Complexity Transient Detection in Audio Coding Using an Image Edge Detection ApproachJulien Capobianco, France Telecom Orange Labs/TECH/OPERA - Lannion Cedex, France; Université Pierre et Marie Curie - Paris, France; Grégory Pallone, France Telecom Orange Labs/TECH/OPERA - Lannion Cedex, France; Laurent Daudet, University Paris Diderot - Paris, France
In this paper we propose a new low-complexity method of transient detection using an image edge detection approach. The time-frequency spectrum of an audio signal is treated as an image: using an appropriate mapping function to convert energy bins into pixels, audio transients correspond to rectilinear edges in the image, so the transient detection problem becomes an edge detection problem. Inspired by standard image edge detection methods, we derive a detection function specific to rectilinear edges that can be implemented with very low complexity. Our method is evaluated in two practical audio coding applications: as a replacement for the SBR transient detector in HE-AAC v2 and in the stereo parametric tool of MPEG USAC.
Convention Paper 8796 (Purchase now)
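The core idea, treating the spectrogram as an image in which a transient is a vertical rectilinear edge, can be sketched as follows. This is a simplified stand-in for the paper's detector: a half-wave-rectified horizontal gradient of the log spectrogram, summed over frequency, peaks at frames where energy rises across many bins at once.

```python
import numpy as np

def transient_curve(x, n_fft=256, hop=128):
    """Detection function: rectified time-gradient of the log spectrogram,
    summed over frequency. A transient shows up as a vertical rectilinear
    edge (energy rising across all bins at one frame)."""
    frames = []
    for start in range(0, len(x) - n_fft, hop):
        seg = x[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(seg)))
    S = np.log1p(np.array(frames).T)            # frequency x time "image"
    grad = S[:, 1:] - S[:, :-1]                 # horizontal (time) gradient
    return np.maximum(grad, 0).sum(axis=0)      # half-wave rectify, sum over bins

# Toy signal: low-level noise with a short burst starting at sample 4000
rng = np.random.default_rng(1)
x = 0.01 * rng.standard_normal(8000)
x[4000:4064] += 1.0
curve = transient_curve(x)
print(int(np.argmax(curve)))                    # frame index near the burst onset
```

A threshold on this curve yields the binary transient decision; the paper's contribution lies in a detector tailored to rectilinear edges with lower complexity than generic image operators.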

P16-7 Temporal Coherence-Based Howling Detection for Speech ApplicationsChengshi Zheng, Chinese Academy of Sciences - Beijing, China; Hao Liu, Chinese Academy of Sciences - Beijing, China; Renhua Peng, Chinese Academy of Sciences - Beijing, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China
This paper proposes a novel howling detection criterion for speech applications based on temporal coherence (referred to hereafter as TCHD). The criterion exploits the fact that speech has only a relatively short coherence time, while the coherence times of true howling components are nearly infinite, since howling components remain perfectly correlated with themselves at large delays. The proposed TCHD criterion is computationally efficient for two reasons. First, the fast Fourier transform (FFT) can be applied directly to compute the temporal coherence. Second, the criterion does not need to identify spectral peaks in the raw periodogram of the microphone signal. Simulation and experimental results show the validity of the proposed criterion.
Convention Paper 8797 (Purchase now)
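A minimal sketch of the underlying observation, not the paper's full criterion: the normalized autocorrelation at a large lag (computed via FFT, per the Wiener-Khinchin theorem) stays near its zero-lag value for a sustained howling tone but collapses for broadband, speech-like material.

```python
import numpy as np

def long_lag_coherence(x, lag):
    """Normalized autocorrelation at a large lag, via FFT (Wiener-Khinchin)."""
    n = len(x)
    X = np.fft.rfft(x - np.mean(x), 2 * n)      # zero-pad for linear correlation
    r = np.fft.irfft(np.abs(X) ** 2)[:n]
    return r[lag] / r[0]

fs = 8000
t = np.arange(fs) / fs
howl = np.sin(2 * np.pi * 1000 * t)                   # sustained tonal component
noise = np.random.default_rng(2).standard_normal(fs)  # broadband stand-in for speech
lag = 2000
print(round(long_lag_coherence(howl, lag), 2),
      round(long_lag_coherence(noise, lag), 2))
```

A detector built on this would flag frequency components whose long-lag coherence stays above a threshold; the signals and lag here are illustrative.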

P16-8 A Mixing Matrix Estimation Method for Blind Source Separation of Underdetermined Audio MixtureMingu Lee, Samsung Electronics Co. - Suwon-si, Gyeonggi-do, Korea; Keong-Mo Sung, Seoul National University - Seoul, Korea
A new mixing matrix estimation method for underdetermined blind source separation of audio signals is proposed. By statistically modeling the local features of the mixtures in a time-frequency region, i.e., the magnitude ratio and phase difference, each region carries information about the mixing angle of a source, with a reliability given by its likelihood. Regional data are then clustered with statistical tests based on their likelihoods to produce estimates of the mixing angles of the sources as well as their number. Experimental results show that the proposed mixing matrix estimation algorithm outperforms existing methods.
Convention Paper 8798 (Purchase now)
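A much-simplified sketch of the idea (histogram peak-picking on per-sample mixing angles, rather than the paper's likelihood-based regional clustering): with sparse sources, most observations are dominated by a single source, so the magnitude ratio of the two mixtures clusters around the true mixing angles.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy instantaneous 2-channel mixture of three sparse sources
true_angles = np.array([0.3, 0.8, 1.2])
A = np.vstack([np.cos(true_angles), np.sin(true_angles)])   # 2 x 3 mixing matrix
S = rng.laplace(size=(3, 20000)) * (rng.random((3, 20000)) < 0.1)  # sparse sources
X = A @ S

# Per-sample mixing angle from the magnitude ratio of the two mixtures;
# with sparse sources most active samples are dominated by one source.
active = np.abs(X).sum(axis=0) > 1e-6
theta = np.arctan2(np.abs(X[1, active]), np.abs(X[0, active]))

hist, edges = np.histogram(theta, bins=100, range=(0.0, np.pi / 2))
centers = (edges[:-1] + edges[1:]) / 2
h = hist.astype(float)
est = []
for _ in range(3):                      # pick three peaks, suppressing neighbors
    k = int(np.argmax(h))
    est.append(centers[k])
    h[max(0, k - 5):k + 6] = 0.0
est = np.sort(est)
print(np.round(est, 2))
```

The paper additionally estimates the number of sources and weights each time-frequency region by its likelihood instead of counting raw samples.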

P16-9 Speech Separation with Microphone Arrays Using the Mean Shift AlgorithmDavid Ayllón, University of Alcala - Alcalá de Henares, Spain; Roberto Gil-Pita, University of Alcala - Alcalá de Henares, Spain; Manuel Rosa-Zurera, University of Alcala - Alcalá de Henares, Spain
Microphone arrays provide spatial resolution that is useful for speech source separation, because sources located at different positions cause different time and level differences at the elements of the array. This feature can be combined with time-frequency masking to separate speech mixtures by means of clustering techniques, such as the so-called DUET algorithm, which uses only two microphones. However, there are applications where larger arrays are available, and the separation can be performed using all of these microphones. A speech separation algorithm based on the mean shift clustering technique was recently proposed using only two microphones. In this work that algorithm is generalized to arrays with any number of microphones, and its performance is tested on echoic speech mixtures. The results show that the generalized mean shift algorithm notably outperforms the original DUET algorithm.
Convention Paper 8799 (Purchase now)
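A minimal illustration of mean shift clustering on a one-dimensional spatial feature (e.g., an inter-microphone level difference), assuming a flat kernel; the generalization in the paper operates on features derived from more than two microphones.

```python
import numpy as np

def mean_shift_1d(points, bandwidth=0.5, iters=50):
    """Blurring mean shift with a flat kernel: every point repeatedly moves
    to the mean of its neighbors, so clusters collapse onto their modes."""
    pts = np.asarray(points, dtype=float).copy()
    for _ in range(iters):
        for i in range(len(pts)):
            neighbors = pts[np.abs(pts - pts[i]) < bandwidth]
            pts[i] = neighbors.mean()
    centers = []                         # merge converged points into centers
    for p in np.sort(pts):
        if not centers or p - centers[-1] > bandwidth / 2:
            centers.append(p)
    return centers

# Toy 1-D spatial feature (e.g. level difference in dB) from two speakers
rng = np.random.default_rng(4)
feats = np.concatenate([rng.normal(-2.0, 0.2, 300), rng.normal(1.5, 0.2, 300)])
print(np.round(mean_shift_1d(feats), 2))
```

Unlike k-means, mean shift does not need the number of speakers in advance: the number of surviving modes is the cluster count, which is one reason it suits this task.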

P16-10 A Study on Correlation Between Tempo and Mood of MusicMagdalena Plewa, Gdansk University of Technology - Gdansk, Poland; Bozena Kostek, Gdansk University of Technology - Gdansk, Poland
In this paper a study is carried out to identify a relationship between mood description and combinations of various tempos and rhythms. First, a short review of music recommendation systems along with music mood recognition studies is presented. In addition, some details on tempo and rhythm perception and detection are included. Then, the experiment layout is explained in which a song is first recorded and then its rhythm and tempo are changed. This constitutes the basis for a mood tagging test. Six labels are chosen for mood description. The results show a significant dependence between the tempo and mood of the music.
Convention Paper 8800 (Purchase now)

 
 

P17 - Spatial Audio Processing


Monday, October 29, 9:00 am — 1:00 pm (Room 121)

Chair:
Jean-Marc Jot, DTS, Inc. - Calabasas, CA, USA

P17-1 Comparing Separation Quality of Nonnegative Matrix Factorization and Nonnegative Matrix Factor 2-D Deconvolution in Audio Source Separation TasksJulian M. Becker, RWTH Aachen University - Aachen, Germany; Volker Gnann, RWTH Aachen University - Aachen, Germany
Nonnegative Matrix Factorization (NMF) is widely used in audio source separation tasks. However, the separation quality of NMF varies considerably depending on the mixture. In this paper we analyze the use of NMF in source separation tasks and show how separation results can be significantly improved by using Nonnegative Matrix Factor 2-D Deconvolution (NMF2D). NMF2D was originally proposed as an extension of NMF to circumvent the problem of grouping notes; here it is used differently, to improve separation quality without taking the note-grouping problem into account.
Convention Paper 8801 (Purchase now)
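For reference, basic NMF with the Lee-Seung multiplicative updates (Euclidean cost) can be sketched as below; NMF2D extends this by making the factors shift-invariant in time and frequency, which is not shown here.

```python
import numpy as np

def nmf(V, rank, iters=500, seed=0):
    """Basic NMF, V ~= W H, via Lee-Seung multiplicative updates
    (Euclidean cost); all factors stay nonnegative throughout."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3
    H = rng.random((rank, m)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Toy "spectrogram": two spectral templates active in disjoint time spans
rng = np.random.default_rng(5)
W0 = rng.random((32, 2)) + 0.1          # spectral templates
H0 = np.zeros((2, 40))                  # activations
H0[0, :20] = 1.0
H0[1, 20:] = 1.0
V = W0 @ H0
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(round(err, 4))                    # relative reconstruction error
```

In separation use, each column of W models one source's spectrum and the rows of H its activations; masking the mixture with each component's contribution recovers the source spectrograms.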

P17-2 Aspects of Microphone Array Source Separation PerformanceBjoern Erlach, Stanford University - Stanford, CA, USA; Rob Bullen, SoundScience P/L - Australia; Jonathan S. Abel, Stanford University - Stanford, CA, USA
The performance of a blind source separation system based on a custom microphone array is explored. The system prioritizes artifact-free processing over source separation effectiveness and extracts source signals using a quadratically constrained least-squares fit based on estimated source arrival directions. The level of additive noise present in extracted source signals is computed empirically for various numbers of microphones used and different degrees of uncertainty in knowledge of microphone locations. The results are presented in comparison to analytical predictions. The source signal estimate variance is roughly inversely proportional to the number of sensors and roughly proportional to both the additive noise variance and microphone position error variance. Beyond a threshold the advantages of increased channel count and precise knowledge of the sensor locations are outweighed by other limitations.
Convention Paper 8802 (Purchase now)
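The stated scaling, with source-estimate variance roughly inversely proportional to the number of sensors, is easy to check in the simplest case, where the estimator is a plain average of sensors corrupted by independent noise (an illustrative simplification, not the paper's constrained least-squares estimator):

```python
import numpy as np

rng = np.random.default_rng(8)

def estimator_variance(m, trials=4000):
    """Variance of a source estimate formed by averaging m sensors, each
    observing the source sample (here 1.0) plus independent unit noise."""
    x = 1.0 + rng.standard_normal((trials, m))
    return float(np.var(x.mean(axis=1)))

v4, v16 = estimator_variance(4), estimator_variance(16)
print(round(v4 / v16, 1))       # quadrupling the sensors: ratio near 4
```

The paper's point is that this 1/M gain (and the corresponding proportionality to microphone-position error variance) holds only up to a threshold, beyond which other limitations dominate.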

P17-3 A New Algorithm for Generating Realistic Three-Dimensional Reverberation Based on Image Sound Source Distribution in Consideration of Room Shape ComplexityToshiki Hanyu, Nihon University - Funabashi, Chiba, Japan
A new algorithm for generating realistic three-dimensional reverberation based on statistical room acoustics is proposed. The author has previously clarified the relationship between reflected sound density and mean free path in consideration of room shape complexity [Hanyu et al., Acoust. Sci. & Tech. 33, 3 (2012), 197–199]. Using this relationship, the new algorithm can statistically create image sound source distributions that reflect the room shape complexity, room absorption, and room volume by drawing random three-dimensional orthogonal coordinates. The image sound source distribution represents the characteristics of the room's three-dimensional reverberation. Details of the algorithm and how to apply it to multichannel audio, binaural audio, game sound, and so on are introduced.
Convention Paper 8803 (Purchase now)
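An illustrative sketch of the statistical image-source idea under strong simplifying assumptions (image sources drawn uniformly in a ball, which yields the classical t² growth of reflection density; 1/r spreading loss; an exponential decay fitted to a target reverberation time). The parameter values are arbitrary, and the paper's reflected-density/mean-free-path relationship for room-shape complexity is not reproduced here.

```python
import numpy as np

def statistical_ir(fs=8000, rt=0.5, c=343.0, dur=0.6, n_src=20000, seed=6):
    """Toy statistical impulse response: image sources uniform in a ball of
    radius c*dur (giving t^2 reflection-density growth), each attenuated by
    1/r spreading and an exponential decay reaching -60 dB at t = rt."""
    rng = np.random.default_rng(seed)
    r = c * dur * rng.random(n_src) ** (1 / 3)      # uniform-in-ball radii
    t = r / c                                       # arrival times
    amp = rng.choice([-1.0, 1.0], n_src) * 10 ** (-3 * t / rt) / np.maximum(r, 1.0)
    ir = np.zeros(int(dur * fs))
    np.add.at(ir, (t * fs).astype(int), amp)        # accumulate reflections
    return ir

ir = statistical_ir()
early = float(np.sum(ir[:1600] ** 2))               # first 0.2 s
late = float(np.sum(ir[3200:] ** 2))                # last 0.2 s
print(round(early / late, 1))                       # energy decays over time
```

For multichannel or binaural use, each image source would additionally carry a direction, to be panned or convolved with an HRTF.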

P17-4 Audio Signal Decorrelation Based on Reciprocal-Maximal Length Sequence Filters and Its Applications to Spatial SoundBo-sun Xie, South China University of Technology - Guangzhou, China; Bei Shi, South China University of Technology - Guangzhou, China; Ning Xiang, Rensselaer Polytechnic Institute - Troy, NY, USA
An algorithm for audio signal decorrelation is proposed. The algorithm is based on a pair of all-pass filters whose responses match a pair of reciprocal maximal-length sequences (MLSs). Taking advantage of the uniform power spectra and low cross-correlation of reciprocal MLSs, the filters create a pair of output signals with low cross-correlation but magnitude spectra almost identical to that of the input signal. The proposed algorithm is applied to broadening the auditory source width and enhancing subjective envelopment in multichannel sound reproduction. Preliminary psychoacoustic experiments validate its performance.
Convention Paper 8805 (Purchase now)
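The ingredients can be illustrated directly: an MLS generated by an LFSR has a perfectly flat circular autocorrelation apart from the zero lag, and its time reversal is (up to a cyclic shift) the reciprocal MLS. The sketch below checks these properties; the paper's all-pass filter construction itself is not reproduced.

```python
import numpy as np

def mls():
    """Length-1023 maximal length sequence (+/-1) from a Fibonacci LFSR whose
    recurrence s[t] = s[t-3] XOR s[t-10] has the primitive characteristic
    trinomial x^10 + x^7 + 1 (the ITU-T PRBS10 polynomial)."""
    m, n = 10, 2 ** 10 - 1
    state = [1] * m
    seq = np.empty(n)
    for i in range(n):
        fb = state[9] ^ state[2]
        seq[i] = 2 * state[-1] - 1       # map bits {0,1} to {-1,+1}
        state = [fb] + state[:-1]
    return seq

a = mls()
b = a[::-1].copy()      # reciprocal MLS = time reversal (up to a cyclic shift)
n = len(a)

# circular autocorrelation of a: exactly 1 at lag 0 and -1/n elsewhere
ac = np.fft.irfft(np.abs(np.fft.rfft(a)) ** 2, n) / n
# circular cross-correlation between the reciprocal pair: well below the peak
cc = np.fft.irfft(np.fft.rfft(a) * np.conj(np.fft.rfft(b)), n) / n
print(round(float(ac[0]), 3), round(float(np.abs(cc).max()), 3))
```

The flat autocorrelation is what makes an MLS-matched filter all-pass-like in power, and the low cross-correlation of the reciprocal pair is what decorrelates the two outputs.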

P17-5 Utilizing Instantaneous Direct-to-Reverberant Ratio in Parametric Spatial Audio CodingMikko-Ville Laitinen, Aalto University - Espoo, Finland; Ville Pulkki, Aalto University - Espoo, Finland
Scenarios with multiple simultaneous sources in an acoustically dry room may be challenging for parametric spatial sound reproduction techniques, such as directional audio coding (DirAC). It has been found that the decorrelation process used in the reproduction causes a perception of added reverberation. This paper proposes a modified DirAC reproduction method that estimates the instantaneous direct-to-reverberant ratio. The sound is divided into reverberant and non-reverberant parts using this ratio, and decorrelation is applied only to the reverberant part. Results of formal listening tests show that perceptually good audio quality can be obtained with this approach in both dry and reverberant scenarios.
Convention Paper 8804 (Purchase now)

P17-6 A Downmix Approach with Acoustic Shadow, Low Frequency Effects, and Loudness ControlRegis Rossi A. Faria, University of São Paulo - Ribeirão Preto, Brazil; NEAC - Audio Engineering and Coding Center - University of São Paulo - São Paulo, Brazil; José Augusto Mannis, University of Campinas - Campinas, Brazil
Conventional algorithms for converting 5.1 sound fields into 2.0 have systematically suppressed the low-frequency information and neglected the spatial auditory effects produced by the real positions of the loudspeakers in 3/2/1 arrangements. We designed a downmix variation in which the acoustic shadow of the listener's head and the LFE information are considered in the conversion, in order to investigate how such models can help improve the surround experience in two-channel modes and customize the downmix so as to achieve the particular balance required by individual surround programs. A test implementation with integrated loudness control was carried out. Preliminary results show the potential benefits of this approach and point to critical parameters involved in the downmixing task.
Convention Paper 8806 (Purchase now)
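For contrast with the conventional equations, a downmix that retains the LFE channel can be sketched as below. The gain values are illustrative assumptions (the ITU-style 0.7071 for center and surrounds plus an assumed LFE gain), and the paper's head-shadow model, which would add filtering per channel, is omitted.

```python
import numpy as np

def downmix_51_to_20(fl, fr, c, lfe, sl, sr,
                     center_gain=0.7071, surround_gain=0.7071, lfe_gain=0.5):
    """Sketch of a 5.1 -> 2.0 downmix that, unlike the conventional
    equations, keeps the LFE contribution (gains are illustrative)."""
    left = fl + center_gain * c + surround_gain * sl + lfe_gain * lfe
    right = fr + center_gain * c + surround_gain * sr + lfe_gain * lfe
    return left, right

# Toy check: only the LFE channel carries signal
z = np.zeros(4)
lfe = np.ones(4)
left, right = downmix_51_to_20(z, z, z, lfe, z, z)
print(left)
```

In the conventional downmix the `lfe_gain` term is simply absent, which is the suppression of low-frequency information the abstract refers to.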

P17-7 Direct-Diffuse Decomposition of Multichannel Signals Using a System of Pairwise CorrelationsJeffrey Thompson, DTS, Inc. - Calabasas, CA, USA; Brandon Smith, DTS, Inc. - Calabasas, CA, USA; Aaron Warner, DTS, Inc. - Calabasas, CA, USA; Jean-Marc Jot, DTS, Inc. - Calabasas, CA, USA
Decomposing an arbitrary audio signal into direct and diffuse components is useful for applications such as spatial audio coding, spatial format conversion, binaural rendering, and spatial audio enhancement. This paper describes direct-diffuse decomposition methods for multichannel signals using a linear system of pairwise correlation estimates. The expected value of a correlation coefficient is analytically derived from a signal model with known direct and diffuse energy levels. It is shown that a linear system can be constructed from pairwise correlation coefficients to derive estimates of the Direct Energy Fraction (DEF) for each channel of a multichannel signal. Two direct-diffuse decomposition methods are described that utilize the DEF estimates within a time-frequency analysis-synthesis framework.
Convention Paper 8807 (Purchase now)
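The key relation can be illustrated in the simplest two-channel case: if both channels carry the same direct component plus independent diffuse components of equal energy, the pairwise correlation coefficient equals the Direct Energy Fraction of each channel. A Monte Carlo check (an illustration, not the paper's multichannel linear system):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
direct = rng.standard_normal(n)          # common (fully correlated) component
x1 = direct + rng.standard_normal(n)     # independent diffuse parts with the
x2 = direct + rng.standard_normal(n)     # same energy as the direct part

# With equal direct energy in both channels, the expected correlation
# coefficient equals the Direct Energy Fraction: rho = E_d / (E_d + E_diff)
rho = np.mean(x1 * x2) / np.sqrt(np.mean(x1 ** 2) * np.mean(x2 ** 2))
print(round(float(rho), 2))              # theoretical DEF here is 0.5
```

With more channels, each pair contributes one such equation, giving the linear system from which per-channel DEF estimates are solved.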

P17-8 A New Method of Multichannel Surround Sound Coding Utilizing Head Dynamic CueQinghua Ye, Institute of Acoustics, Chinese Academy of Sciences - Beijing, China; Lingling Zhang, Beijing, China; Hefei Yang, Chinese Academy of Sciences - Beijing, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China
Considering the disadvantages of conventional matrix surround systems, a new method of multichannel surround sound coding is proposed. At the encoder, the multichannel signals are converted into two pairs of virtual stereo signals, which are modulated by simulating the dynamic cue of slight head rotation. The left and right channels of the two virtual stereo pairs are then summed, respectively, into a two-channel stereo signal for transmission. At the decoder, the front and back stereo pairs are separated by extracting the dynamic cue and then redistributed to the multichannel system. The new method achieves better surround sound reproduction without affecting the sound of the downmixed stereo.
Convention Paper 8808 (Purchase now)

 
 


Return to Paper Sessions

EXHIBITION HOURS October 27th 10am – 6pm October 28th 10am – 6pm October 29th 10am – 4pm
REGISTRATION DESK October 25th 3pm – 7pm October 26th 8am – 6pm October 27th 8am – 6pm October 28th 8am – 6pm October 29th 8am – 4pm
TECHNICAL PROGRAM October 26th 9am – 7pm October 27th 9am – 7pm October 28th 9am – 7pm October 29th 9am – 5pm
AES - Audio Engineering Society