AES San Francisco 2010
Paper Session Details

 

P1 - Transducers and Processing for Live Sound


Thursday, November 4, 9:30 am — 12:30 pm (Room 220)

Chair:
Scott Norcross

P1-1 A Performance Ranking of Seven Different Types of Loudspeaker Line Arrays
D. B. (Don) Keele, Jr., DBK Associates and Labs - Bloomington, IN, USA
Seven types of loudspeaker line arrays were ranked considering eight performance parameters including (1) beamwidth uniformity, (2) directivity uniformity, (3) sound field uniformity, (4) side lobe suppression, (5) uniformity of polar response, (6) smoothness of off-axis frequency response, (7) sound pressure rolloff versus distance, and (8) near-far polar pattern uniformity. Line arrays analyzed include: (1) un-shaded straight-line array, (2) Hann-shaded straight-line array, (3) “J”-line array, (4) spiral- or progressive-line array, (5) un-shaded circular-arc array, (6) CBT circular-arc array, and (7) a CBT delay-curved straight-line array. All arrays were analyzed assuming no extra drive signal processing other than frequency-independent shading. A weighted performance analysis yielded the following ranking from best to worst: 6, 7, 5, 4, 3, 2, 1, with the CBT Legendre-shaded circular-arc array on top and the un-shaded straight-line array on the bottom.
Convention Paper 8155 (Purchase now)
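As a rough, self-contained illustration of the shading idea behind arrays (1) and (2) above (an illustrative model, not the authors' analysis code; the element count, spacing, and frequency are arbitrary assumptions), the far-field pattern of a straight-line array with and without Hann shading can be sketched as:

```python
import cmath
import math

C = 343.0  # speed of sound, m/s

def array_factor(weights, positions, freq, sin_theta):
    """Normalized far-field pressure magnitude of a line array of
    point sources with real shading weights, at off-axis angle theta."""
    k = 2.0 * math.pi * freq / C
    s = sum(w * cmath.exp(1j * k * x * sin_theta)
            for w, x in zip(weights, positions))
    return abs(s) / sum(weights)

N, d, f = 8, 0.1, 2000.0                 # 8 elements, 0.1 m spacing, 2 kHz
positions = [n * d for n in range(N)]
uniform = [1.0] * N
hann = [math.sin(math.pi * (n + 0.5) / N) ** 2 for n in range(N)]

def worst_sidelobe_db(weights):
    """Peak level (dB re on-axis) well outside the main lobe region."""
    grid = [0.5 + 0.001 * i for i in range(500)]  # sin(theta) in [0.5, 1.0)
    return max(20.0 * math.log10(array_factor(weights, positions, f, s) + 1e-12)
               for s in grid)

print("uniform sidelobe: %.1f dB" % worst_sidelobe_db(uniform))
print("Hann sidelobe:    %.1f dB" % worst_sidelobe_db(hann))
```

The Hann-shaded array trades a wider main lobe for much lower sidelobes, which is the kind of trade-off the paper's weighted ranking quantifies.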

P1-2 A Reliable Procedure for Polarity Measurements on Line Arrays
Gregor Schmidle, Markus Becker, NTi Audio AG - Schaan, Liechtenstein
The performance of a line array strongly depends on the correct installation of its loudspeakers. For instance, a single loudspeaker with incorrect polarity may clearly compromise the sound level and directivity of the whole system. The identification of such errors, however, can be very time consuming. Therefore, it is desirable to have a fast yet reliable procedure for finding such array elements. This paper presents a step-by-step method to check the integrity of a line array and to locate the cause of a polarity problem. Besides the theoretical background, a successful practical case is described.
Convention Paper 8156 (Purchase now)

P1-3 Calculating Time Delays of Multiple Active Sources in Live Sound
Alice Clifford, Josh Reiss, Queen Mary University of London - London, UK
Delays caused by differences in distance between sources and microphones cause many problems in live audio, most notably comb filtering. This paper presents a new method that is able to calculate the relative time delays of multiple active sources to multiple microphones where previous methods are unable to. The calculated time delays can be used to compensate for delays that cause comb filtering and can also be used in source separation methods that utilize delays. The proposed method is shown to be able to calculate delays in configurations where other methods fail and is also able to give an estimate of the sources' physical positions. The results show that multiple delays can be accurately calculated when multiple sources are active and that noise can affect the accuracy of the method.
Convention Paper 8157 (Purchase now)
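The classic single-source building block that multi-source methods such as this one extend is time-delay estimation by cross-correlation: the lag maximizing the correlation between a reference signal and a microphone signal estimates the propagation delay. A minimal sketch (the synthetic white-noise source and sample-domain delay are assumptions for illustration, not the paper's method):

```python
import random

def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) maximizing the cross-correlation
    between reference x and delayed observation y."""
    n = len(x)
    def corr(lag):
        return sum(x[i] * y[i + lag] for i in range(n)
                   if 0 <= i + lag < n)
    return max(range(-max_lag, max_lag + 1), key=corr)

random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(1024)]   # white-noise source
true_delay = 7
y = [x[i - true_delay] if i >= true_delay else 0.0 for i in range(1024)]

print(estimate_delay(x, y, max_lag=20))  # expect 7
```

With several simultaneously active sources the correlation exhibits multiple competing peaks, which is exactly the regime the paper addresses.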

P1-4 Coherent Superposition of Acoustic Sources as a Function of Environmental Parameters
Stefan Feistel, Ahnert Feistel Media Group - Berlin, Germany; Rainer Feistel, Institut für Ostseeforschung - Warnemuende, Germany
Sound reinforcement systems and loudspeaker arrays consist of numerous, spatially distributed sources. The signal alignment of these components is crucial to provide even level coverage and consistent spectral distribution throughout the audience areas. One usually assumes that sources located close to each other sum coherently in contrast to sources spaced far apart which sum energetically at the receiver. However, in reality this assumption seldom holds since environmental conditions, such as fluctuations of temperature and air flow and their spatial correlation, determine the actual level of coherence. We derive the theoretical framework for this important effect based on stochastic theory. We quantify the resulting coherence as a function of the different environmental parameters. Results are verified using numerical models and measurement.
Convention Paper 8158 (Purchase now)

P1-5 Optimizing the Controls of Homogeneous Loudspeaker Array
Michael Terrell, Mark Sandler, Queen Mary University of London - London, UK
An optimization technique is demonstrated that can be used to control the frequency response of a homogeneous loudspeaker array at multiple receiver locations. Each loudspeaker element in the array is identical but can be controlled individually using vertical and horizontal angle, delay, broadband gain, a 7-band parametric equalizer, and a 5th order all pass filter. The arrays examined contain 8 loudspeakers, and the controls are optimized using a genetic algorithm. This is demonstrated for a 2-D case study based on the requirement that the magnitude response at each location is the same.
Convention Paper 8159 (Purchase now)

P1-6 Perceptual Dimensions of Stage-Floor Vibration Experienced During a Musical Performance
Clemeth L. Abercrombie, Artec Consultants Inc. - New York, NY, USA; Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA
The human ability to distinguish differences in tactile signals generated by a musical instrument and experienced on typical stage-floor constructions is explored using an audio-tactile display (headphones and calibrated motion platform). Audio and vibration signals generated by a contrabass are combined with mechanical impedance measurements of five stage floors to create stimuli. Test participants are asked to report differences between tactile signals given a fixed audio environment. Multidimensional scaling is used to identify perceptual dimensions in subjective responses. Results show that stage vibration exceeds the threshold of perception, with Wk-weighted peak acceleration up to 0.04 m/s². Sensation level dominates perceived differences between tactile signals measured for different stage-floor constructions, while audio-tactile time delays have negligible influence.
Convention Paper 8160 (Purchase now)

P2 - Speech Processing


Thursday, November 4, 9:30 am — 12:30 pm (Room 236)

Chair:
John Strawn

P2-1 Language Scrambling for In-Game Voice-Chat Applications
Nicolas Tsingos, Charles Robinson, Dolby Laboratories - San Francisco, CA, USA
We propose a solution to enable speech-driven alien language synthesis in an in-game voice-chat system. Our technique selectively replaces the users’ input speech with a corresponding alien language output synthesized on-the-fly. It is optimized for a client-server architecture and uses a concatenative synthesis framework. To limit memory requirements, as well as preserve forwarding capabilities on the server, the concatenative synthesis is performed in the coded domain. For gaming applications, our approach can be used to selectively scramble the speech of opposing team members in order to provide compelling in-game voice feedback without exposing their strategy. The system has been implemented with multiple alien races in a virtual environment with effective, entertaining results.
Convention Paper 8161 (Purchase now)

P2-2 Speech Referenced Limiting: Controlling the Loudness of a Signal with Reference to its Speech Loudness
Michael Fisher, Nicky Chong-White, The HEARing CRC - Melbourne, Victoria, Australia, National Acoustic Laboratories, Sydney, NSW, Australia; Harvey Dillon, National Acoustic Laboratories - Sydney, NSW, Australia
A novel method of sound amplitude limiting for signals conveying speech is presented. The method uses the frequency-specific levels of the speech conveyed by the signal to generate a set of time-varying speech reference levels. It limits the level of sounds conveyed by the signal to these speech reference levels. The method is called speech referenced limiting (SRL). It provides minimal limiting of speech while providing greater control over the loudness of non-speech sounds compared to conventional (fixed threshold) limiters. It is appropriate for use in applications where speech is the primary signal of interest such as telephones, computers, amplified hearing protectors, and hearing aids. The effect of SRL on speech and non-speech sounds is presented.
Convention Paper 8162 (Purchase now)

P2-3 Individually Adjustable Signal Processing Algorithms for Improved Speech Intelligibility with Cochlear Implants
Isabell Kiral-Kornek, Bernd Edler, Jörn Ostermann, Leibniz Universität Hannover - Hannover, Germany; Andreas Büchner, Hörzentrum Hannover, Hannover Medical School - Hannover, Germany
Thanks to the development of cochlear implants (CIs) the treatment of certain hearing impairments and even deafness has become possible. However, up to now, individual adjustment of the speech processing within a cochlear implant has been limited to unequal amplification of different frequency bands. As the perception of speech among patients differs beyond just loudness, improvements in individually adjustable signal processing are needed. A novel approach is presented that aims to increase intelligibility for patients by extending common speech recognition tests to allow for optimized speech processing tailored to the patients' needs.
Convention Paper 8163 (Purchase now)

P2-4 Objective Evaluation of Wideband Speech Codecs for Bluetooth Voice Communication
Gary Spittle, CSR - Cambridge Silicon Radio - Cambridge, UK; Jacek Spiewla, Walter Kargus, Walter Zuluaga, Xuejing Sun, CSR - Cambridge Silicon Radio - Detroit, MI, USA
Bluetooth devices that stream audio are becoming increasingly popular. User expectations have risen, and as a result the requirements on wireless audio devices using Bluetooth pose a number of challenges. This paper discusses the impact of various forms of adverse connection conditions on wideband speech, measured in terms of speech quality and intelligibility. A detailed understanding of the various forms of degradation allows proper solutions to be provided.
Convention Paper 8164 (Purchase now)

P2-5 Speech Synthesis Controlled by Eye Gazing
Andrzej Czyzewski, Kuba Lopatka, Bartosz Kunka, Rafal Rybacki, Bozena Kostek, Gdansk University of Technology - Gdansk, Poland
A method of communication based on eye-gaze control is presented. Investigations of gaze tracking have been carried out in various application contexts. The solution proposed in the paper could be referred to as "talking by eyes," providing an innovative approach in the domain of speech synthesis. The application proposed is dedicated to disabled people, especially persons with so-called locked-in syndrome who cannot talk or move any part of their body. The paper describes a methodology for determining the fixation point on a computer screen. It then presents the algorithm of concatenative speech synthesis used in the engineered solution. An analysis of working with the system is provided. Conclusions focusing on system characteristics are included.
Convention Paper 8165 (Purchase now)

P2-6 Voice Samples Recording and Speech Quality Assessment for Forensic and Automatic Speaker Identification
Andrey Barinov, Speech Technology Center Ltd. - Saint Petersburg, Russia
The task of speaker recognition or speaker identification has become very important in our digital world. Most law enforcement organizations use either automatic or manual speaker identification tools in their investigations. In either case, before carrying out the identification analysis, they usually need to record a voice sample from the suspect, either for one-to-one comparison or to fill in a database. In this paper we describe the parameters of the speech signal that are important for speaker identification performance, propose approaches to quality assessment, and provide practical recommendations for recording a high-quality voice sample acceptable for speaker identification. The materials of this paper might be useful both for software and hardware developers and for forensic practitioners.
Convention Paper 8166 (Purchase now)

P3 - Acoustical Measurements


Thursday, November 4, 2:30 pm — 6:30 pm (Room 236)

Chair:
John Vanderkooy

P3-1 Methods for Extending Room Impulse Responses beyond Their Noise Floor
Nicholas J. Bryan, Jonathan S. Abel, Stanford University - Stanford, CA, USA
Two methods of extending measured room impulse responses below their noise floor and beyond their measured duration are presented. Both methods extract frequency-dependent reverberation energy decay rates, equalization levels, and noise floor levels, and subsequently extrapolate the reverberation decay toward silence. The first method crossfades impulse response frequency bands with a late-field response synthesized from Gaussian noise. The second method imposes the desired decay rates on the original impulse response bands. Both methods maintain an identical impulse response prior to the noise floor arrival in each band and seamlessly transition to a natural sounding decay after the noise floor arrival.
Convention Paper 8167 (Purchase now)
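The synthesized late-field idea (noise with an imposed, band-specific decay rate) can be illustrated in a single band; the sampling rate, reverberation time, and Gaussian excitation below are assumptions for illustration, not values from the paper:

```python
import math
import random

def synth_band_tail(fs, t60, duration, rng):
    """Gaussian noise with an imposed exponential decay corresponding
    to a -60 dB drop over t60 seconds (one frequency band)."""
    amp_per_sample = 10.0 ** (-(60.0 / t60) * (1.0 / fs) / 20.0)
    return [rng.gauss(0.0, 1.0) * amp_per_sample ** n
            for n in range(int(duration * fs))]

def window_energy_db(x, start, length):
    return 10.0 * math.log10(sum(v * v for v in x[start:start + length]))

rng = random.Random(1)
fs, t60 = 1000, 0.5
tail = synth_band_tail(fs, t60, 1.0, rng)

# Energy should fall by about (60 / t60) * (200 / fs) = 24 dB per 200 samples.
drop = window_energy_db(tail, 0, 200) - window_energy_db(tail, 200, 200)
print("%.1f dB" % drop)
```

In the full method such a tail would be equalized to the measured band level and crossfaded in near the noise floor arrival time.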

P3-2 On the Use of Ultrasound Transducer Arrays to Account for Time-Variance on Room Acoustics Measurements
Joel Preto Paulo, ISEL - Instituto Superior de Engenharia de Lisboa - Lisbon, Portugal, CAPS – Instituto Superior Técnico, TU Lisbon, Lisbon, Portugal; José Bento Coelho, CAPS – Instituto Superior Técnico, TU Lisbon - Lisbon, Portugal
In real room acoustical measurements, the assumption of a time-invariant system usually does not hold. A measurement technique was set up to monitor the acoustical medium, detect time-variance phenomena, and handle low-SNR situations. A probe signal in the ultrasonic band is sent into the room by a parametric loudspeaker array with a highly directive polar pattern, simultaneously with the test signal frames. The relevant parameters for establishing time variance, and the associated thresholds, are then estimated from the acquired ultrasonic sound. Valid test signal frames, which pass the threshold test, are labeled with a weighting factor according to their significance; otherwise, the frames are rejected and do not enter the averaging process. Results are presented and discussed herein.
Convention Paper 8168 (Purchase now)

P3-3 Impulse Response Measurements in the Presence of Clock Drift
Nicholas J. Bryan, Miriam A. Kolar, Jonathan S. Abel, Stanford University - Stanford, CA, USA
There are many impulse response measurement scenarios in which the playback and recording devices maintain separate unsynchronized digital clocks resulting in clock drift. Clock drift is problematic for impulse response measurement techniques involving convolution, including sinusoidal sweeps and pseudo-random noise sequences. We present analysis of both a drifting record clock and playback clock, with a focus on swept sinusoids. When using a sinusoidal sweep without accounting for clock drift, the resulting impulse response is seen to be convolved with an allpass filter having the same frequency trajectory form as the input swept sinusoid with a duration proportional to the input sweep length. Two methods are proposed for estimating the clock drift and compensating for its effects in producing an impulse response measurement. Both methods are shown to effectively eliminate any clock effects in producing room impulse response measurements.
Convention Paper 8169 (Purchase now)

P3-4 Quasi-Anechoic Loudspeaker Measurement Using Notch Equalization for Impulse Shortening
Richard Stroud, Stroud Audio Inc. - Kokomo, IN, USA
The length of the impulse response of a typical piston driver is largely determined by the characteristic second-order high-pass response of the driver. This time response makes anechoic (i.e., gated) measurement difficult in non-anechoic environments, as reflections must be suppressed for return times of 30 ms or more. This paper outlines a quasi-anechoic frequency and phase response modification technique using tuned notch, or band-cut, equalization that shortens the impulse response and allows correct full-range loudspeaker measurement in moderately sized non-anechoic rooms.
Convention Paper 8170 (Purchase now)

P3-5 Estimating Room Impulse Responses from Recorded Balloon Pops
Jonathan S. Abel, Nicholas J. Bryan, Patty P. Huang, Miriam A. Kolar, Bissera V. Pentcheva, Stanford University - Stanford, CA, USA
Balloon pops are convenient for probing the acoustics of a space, as they generate relatively uniform radiation patterns and consistent “N-wave” waveforms. However, the N-wave spectrum contains nulls that impart an undesired comb-filter-like quality when the recorded balloon pop is convolved with audio. Here, a method for converting recorded balloon pops into full bandwidth impulse responses is presented. Rather than directly processing the balloon pop recording, an impulse response is synthesized according to the echo density and frequency band energies estimated in running windows over the balloon pop. Informal listening tests show good perceptual agreement between measured room impulse responses using a loudspeaker source and a swept sine technique and those derived from recorded balloon pops.
Convention Paper 8171 (Purchase now)

P3-6 Complex Modulation Transfer Function and its Applications in Transducer and Room Acoustics Measurements
Juha Backman, Nokia Corporation - Espoo, Finland
Modulation transfer function in audio applications describes well the clarity of sound, but conventional definitions and measurement methods are not easily applicable to transducer measurements, low-frequency acoustics, or capturing effects of narrow-band phenomena. A revised definition of modulation transfer function, taking into account the magnitude and phase of modulation transfer for each carrier and modulator frequency combination is presented. This function is derived from the complex frequency response by analyzing the response at the carrier frequency and at the modulation sidebands. Also the distortion of modulation envelope arising from the asymmetry especially in the phase transfer properties is discussed. Examples of the use of the complex modulation transfer function are presented for simple filters, anechoic response measurements of loudspeakers, and for loudspeakers in rooms.
Convention Paper 8172 (Purchase now)
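The definition sketched in the abstract (modulation transfer evaluated from the complex frequency response at the carrier and its modulation sidebands) admits a toy computation on a one-pole lowpass. The sideband-average form below is one plausible concrete reading, and the filter and frequencies are illustrative assumptions; the paper's exact definition may differ:

```python
import cmath

def H(f, f0=500.0):
    """Complex frequency response of a one-pole lowpass with corner f0."""
    return 1.0 / (1.0 + 1j * f / f0)

def complex_mtf(fc, fm):
    """Complex modulation transfer for carrier fc and modulator fm,
    formed from the response at the carrier and the two sidebands."""
    return (H(fc - fm) + H(fc + fm)) / (2.0 * H(fc))

print(abs(complex_mtf(1000.0, 0.0)))     # 1.0: zero-rate modulation passes unchanged
print(abs(complex_mtf(1000.0, 200.0)))   # magnitude of modulation transfer
print(cmath.phase(complex_mtf(1000.0, 200.0)))  # envelope phase shift
```

Retaining the complex value, rather than only its magnitude, is what exposes the envelope phase distortion the abstract discusses.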

P3-7 Practical Implementation of Perceptual Rub & Buzz Distortion and Experimental Results
Steve Temme, Pascal Brunet, Brian Fallon, Listen, Inc. - Boston, MA, USA
In a previous paper [1], we demonstrated how an auditory perceptual model based on an ITU standard can be used to detect audible Rub & Buzz defects in loudspeakers using a single tone stimulus. In this paper we demonstrate a practical implementation using a stepped sine sweep stimulus and present detailed experimental results on loudspeakers including comparison to human listeners and other perceptual methods.
Convention Paper 8173 (Purchase now)

P3-8 Measurement of Turbulent Air Noise Distortion in Loudspeaker Systems
Wolfgang Klippel, Robert Werner, Klippel GmbH - Dresden, Germany
Air leaks in the dust cap and cabinets of loudspeakers generate turbulent noise that severely impairs the perceived sound quality, as rub and buzz and other loudspeaker defects do. However, traditional measurement techniques often fail in the detection of air leaks because the noise has a large spectral bandwidth but a low power density and similar spectral properties as ambient noise generated in a production environment. The paper models the generation process of turbulent air noise and develops a novel measurement technique based on asynchronous demodulation and envelope averaging. The technique accumulates the total energy of the leak noise radiated during the measurement interval and increases the sensitivity by more than 20 dB for measurement times longer than 1 s. The paper also presents the results of the practical evaluation and discusses the application to end-of-line testing.
Convention Paper 8174 (Purchase now)

P4 - Loudness and Dynamics


Thursday, November 4, 2:30 pm — 5:00 pm (Room 220)

Chair:
Brett Crockett

P4-1 The Loudness War: Background, Speculation, and Recommendations
Earl Vickers, STMicroelectronics, Inc. - Santa Clara, CA, USA
There is growing concern that the quality of commercially distributed music is deteriorating as a result of mixing and mastering practices used in the so-called “loudness war.” Due to the belief that “louder is better,” dynamics compression is used to squeeze more and more loudness into the recordings. This paper reviews the history of the loudness war and explores some of its possible consequences, including aesthetic concerns and listening fatigue. Next, the loudness war is analyzed in terms of game theory. Evidence is presented to question the assumption that loudness is significantly correlated to listener preference and sales rankings. The paper concludes with practical recommendations for de-escalating the loudness war.
Convention Paper 8175 (Purchase now)

P4-2 Subjective Evaluation of Gating Methods for Use with the ITU-R BS.1770 Loudness Algorithm
Scott Norcross, Michel Lavoie, Communications Research Centre - Ottawa, Ontario, Canada
Loudness measurements using ITU-R Recommendation BS.1770 can be biased downward relative to the perceived loudness level when periods of silence and/or low level signals are present in the program being measured. To address this, it has been proposed that some form of gating be added to the loudness algorithm. To evaluate various gating methods, a formal subjective test was conducted to measure the subjective loudness of broadcast material. The results of the subjective test were used to assess the performance of the gating technique proposed by the EBU P/LOUD expert group on loudness. The study further explored the effect of gating threshold and analysis window size on the accuracy of the objective measurement. While the use of gating did improve the accuracy of the loudness algorithm, no single combination could be found that satisfied all scenarios.
Convention Paper 8176 (Purchase now)
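The two-stage gating scheme under evaluation here (later standardized in EBU R 128 and BS.1770-2) can be sketched on precomputed per-block mean-square values. The K-weighting and 400 ms overlapped blocks of the real algorithm are omitted, and the gate values below are the commonly published ones but should be checked against the current recommendation:

```python
import math

ABS_GATE = -70.0   # absolute gate, LUFS
REL_GATE = -10.0   # relative gate, LU below the absolutely-gated mean

def block_loudness(ms):
    """Loudness of one block from its mean-square power."""
    return -0.691 + 10.0 * math.log10(ms + 1e-30)

def integrated_loudness(block_ms, gated=True):
    """Integrated loudness from per-block mean-square power values."""
    blocks = list(block_ms)
    if not gated:
        return block_loudness(sum(blocks) / len(blocks))
    # Stage 1: absolute gate drops near-silent blocks.
    kept = [ms for ms in blocks if block_loudness(ms) > ABS_GATE]
    # Stage 2: relative gate referred to the absolutely-gated mean.
    ref = block_loudness(sum(kept) / len(kept)) + REL_GATE
    kept = [ms for ms in kept if block_loudness(ms) > ref]
    return block_loudness(sum(kept) / len(kept))

# Speech bursts interleaved with near-silence: gating removes the downward bias.
blocks = [0.1] * 5 + [1e-10] * 5
print("ungated: %.1f LUFS" % integrated_loudness(blocks, gated=False))
print("gated:   %.1f LUFS" % integrated_loudness(blocks))
```

The silent blocks drag the ungated average down by several LU, which is exactly the bias the abstract describes.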

P4-3 Comparing Continuous Subjective Loudness Responses and Computational Models of Loudness for Temporally Varying Sounds
Sam Ferguson, University of New South Wales - Sydney, NSW, Australia; Densil Cabrera, The University of Sydney - Sydney, NSW, Australia; Emery Schubert, University of New South Wales - Sydney, NSW, Australia
There are many ways in which loudness can be objectively estimated, including simple weighted models based on physical sound level, as well as complex and computationally intensive models that incorporate many psychoacoustical factors. These complex models have been generated from principles and data derived from listening experiments using highly controlled, usually brief, artificial stimuli; whereas the simple models tend to have a real world emphasis in their derivation and validation. Loudness research has recently also focused on estimating time-varying loudness, as temporal aspects can have a strong effect on loudness. In this paper continuous subjective loudness responses are compared to time-series outputs of loudness models. We use two types of stimuli: a sequence of sine tones and a sequence of band-limited noise bursts. The stimuli were analyzed using a variety of loudness models, including those of Glasberg and Moore, Chalupper and Fastl, and Moore, Glasberg and Baer. Continuous subjective responses were obtained from 24 university students, who rated loudness continuously in time over the period of the experiment, while using an interactive interface.
Convention Paper 8177 (Purchase now)

P4-4 Measuring Dynamics: Comparing and Contrasting Algorithms for the Computation of Dynamic Range
Jon Boley, LSB Audio LLC - Lafayette, IN, USA; Michael Lester, Shure Incorporated - Niles, IL, USA; Christopher Danner, University of Miami - Coral Gables, FL, USA
There is a consensus among many in the audio industry that recorded music has grown increasingly compressed over the past few decades. Some industry professionals are concerned that this compression often results in poor audio quality with little dynamic range. Although some algorithms have been proposed for calculating dynamic range, we have not been able to find any studies suggesting that any of these metrics accurately represent any perceptual dimension of the measured sound. In this paper we review the various proposed algorithms and compare their results with the results of a listening test. We show that none of the tested metrics accurately predict the perceived dynamic range of a musical track, but we identify some potential directions for future work.
Convention Paper 8178 (Purchase now)
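One of the simplest candidates in this family of metrics is the crest factor (peak-to-RMS ratio); a minimal sketch for orientation (the algorithms compared in the paper are more elaborate, and this is not one of their implementations):

```python
import math

def crest_factor_db(x):
    """Peak-to-RMS ratio in dB; higher values suggest less compressed material."""
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20.0 * math.log10(peak / rms)

sine = [math.sin(2.0 * math.pi * n / 100.0) for n in range(1000)]
square = [1.0 if s >= 0 else -1.0 for s in sine]   # heavily "limited" version
print("%.2f dB" % crest_factor_db(sine))    # ~3.01 dB
print("%.2f dB" % crest_factor_db(square))  # 0.00 dB
```

The paper's finding is precisely that such signal-domain numbers need not track the *perceived* dynamic range.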

P4-5 Dynamic Range Control for Audio Signals Using Fourth-Order Processing
Qing Yang, John Harris, University of Florida - Gainesville, FL, USA
The human auditory system has been shown to be more sensitive to transient signals than stationary signals given the same energy. Conventional second-order measurements based on energy or root-mean-squared value cannot adequately characterize the auditory perception of non-stationary audio signals. A fourth-order dynamic range control (DRC) algorithm is proposed in this paper. The perceptual quality and dynamic range reduction effectiveness are evaluated for both second-order and fourth-order DRC algorithms. Evaluation results show that our proposed fourth-order DRC algorithm offers better balance of perceptual quality and dynamic range reduction than the conventional second-order approach.
Convention Paper 8179 (Purchase now)

P5 - Emerging Applications


Thursday, November 4, 3:00 pm — 4:30 pm (Room 226)

P5-1 A Robust Audio Feature Extraction Algorithm for Music Identification
Jiajun Wang, Beijing University of Posts and Telecommunications - Beijing, China; Marie-Luce Bourguet, Queen Mary University of London - London, UK
In this paper we describe a novel audio feature extraction method that can effectively improve the performance of music identification under noisy circumstances. It is based on a dual box approach that extracts from the sound spectrogram point clusters with significant energy variation. This approach was tested in a song finder application that can identify music from samples recorded by microphone in the presence of dominant noise. A series of experiments show that under noisy circumstances, our system outperforms current state-of-the-art music identification algorithms and provides very good precision, scalability, and query efficiency.
Convention Paper 8180 (Purchase now)

P5-2 The Low Complexity MP3-Multichannel Audio Decoding System
Hyun Wook Kim, Han Gil Moon, Samsung Electronics - Suwon, Korea
In this paper a low complexity MP3 multichannel audio system is proposed. Utilizing the proposed decoding system, the advanced multichannel MP3 decoder can play high quality multichannel audio as well as legacy stereo audio with low processing power. The system consists of two main parts: an MP3 decoding part and a parametric multichannel decoding part. A transform-domain convolution-synthesis method replaces the PQMF module in the MP3 decoding part, and several small-point DFT modules replace the large-point DFT module in the multichannel decoding part. This combination can reduce computing power dramatically without any loss in the decoded audio signal.
Convention Paper 8181 (Purchase now)

P5-3 The hArtes CarLab: A New Approach to Advanced Algorithms Development for Automotive Audio
Stefania Cecchi, Andrea Primavera, Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy; Ferruccio Bettarelli, Emanuele Ciavattini, Leaff Engineering - Ancona (AN), Italy; Romolo Toppi, FAITAL S.p.a. - Milano, Italy; Jose Gabriel De Figueiredo Coutinho, Wayne Luk, Imperial College London - London, UK; Christian Pilato, Fabrizio Ferrandi, Politecnico di Milano - Milano, Italy; Vlad M. Sima, Koen Bertels, Delft University of Technology - Delft, The Netherlands
In the last decade automotive audio has been gaining great attention from the scientific and industrial community. In this context, a new approach to testing and developing advanced audio algorithms for a heterogeneous embedded platform has been proposed within the European hArtes project. A real audio laboratory installed in a real car (hArtes CarLab) has been developed employing professional audio equipment. Algorithms can be tested and validated on a PC by running each application as a plug-in of a real-time framework. A set of tools (hArtes Toolchain) can then be used to generate code for the embedded platform starting from the plug-in implementation. An overview of the entire system is presented, showing its effectiveness.
Convention Paper 8182 (Purchase now)

P5-4 Real-Time Speech Visualization System for Speech Training and Diagnosis
Yuichi Ueda, Tadashi Sakata, Akira Watanabe, Kumamoto University - Kumamoto-shi, Japan
We have been interested in visualizing speech information to observe speech phenomena, analyze speech signals, and serve as a substitute sensory channel for people with hearing or speech disorders. To realize such speech visualization, we developed a software tool, Speech-ART, and utilized it in investigating speech. Although the system's functions have proven effective in offline operation, its use as a speech training tool or for real-time observation of speech sound has been restricted. Consequently, we have made the analysis of speech parameters and the display of speech images more efficient, and developed a real-time speech visualization system. In this paper we describe the background of speech visualization, the characteristics of our system, and future applications of the system.
Convention Paper 8184 (Purchase now)

P5-5 Underdetermined Binaural 3-D Sound Localization of Simultaneously Active Sources
Martin Rothbucher, David Kronmüller, Hao Shen, Klaus Diepold, Technische Universität München - München, Germany
Mobile robotic platforms are equipped with multimodal human-like sensing, e.g., haptic, vision, and audition, in order to collect data from the environment. Recently, robotic binaural hearing approaches based on Head-Related Transfer Functions (HRTFs) have become a promising technique to localize sounds in a three-dimensional environment with only two microphones. Usually, HRTF-based sound localization approaches are restricted to one sound source. To cope with this difficulty, Blind Source Separation (BSS) algorithms were utilized to separate the sound sources before applying HRTF localization. However, those approaches are usually computationally expensive and, for the underdetermined case, restricted to sparse and statistically independent signals. In this paper we present underdetermined sound localization that utilizes a super-positioned HRTF database. Our algorithm is capable of localizing sparse as well as broadband signals, even when the signals are not statistically independent.
Convention Paper 8185 (Purchase now)

P5-6 Wireless Multisensor Monitoring of the Florida Everglades: A Pilot Project
Colby Leider, University of Miami - FL, USA; Doug Mann, Peavey Electronics Corporation - Meridian, MS, USA; Daniel P. Dickinson, University of Miami - FL, USA
Prior work (e.g., Calahan 1984; Havstad and Herrick 2003) describes the need for long-term ecological monitoring of environmental data such as surface temperature and water quality. Newer studies by Maher, Gregoire, and Chen (2005) and Maher (2009, 2010) demonstrate the value of similarly documenting natural sound environments in U.S. national parks on the order of a year. Building on these ideas, we describe a new system capable of combined remote audio and environmental monitoring on the order of multiple years that is currently being tested in the Florida Everglades.
Convention Paper 8186 (Purchase now)

P6 - Microphone Processing


Friday, November 5, 9:00 am — 10:30 am (Room 220)

Chair:
Jon Boley

P6-1 Digitally Enhanced Shotgun Microphone with Increased DirectivityHelmut Wittek, SCHOEPS Mikrofone GmbH - Karlsruhe, Germany; Christof Faller, Illusonic LLC - Lausanne, Switzerland; Christian Langen, SCHOEPS Mikrofone GmbH - Karlsruhe, Germany; Alexis Favrot, Christophe Tournery, Illusonic LLC - Lausanne, Switzerland
Shotgun microphones are still state-of-the-art when the goal is to achieve the highest possible directivity and signal-to-noise ratio with high signal fidelity. As opposed to beamformers, properly designed shotgun microphones do not suffer greatly from inconsistencies and sound color artifacts. A digitally enhanced shotgun microphone is proposed, using a second backward-oriented microphone capsule and digital signal processing with the goal of improving directivity and reducing diffuse gain at low and medium frequencies significantly, while leaving the sound color essentially unchanged. Furthermore, the shotgun microphone’s rear lobe is attenuated.
Convention Paper 8187 (Purchase now)

P6-2 Conversion of Two Closely Spaced Omnidirectional Microphone Signals to an XY Stereo SignalChristof Faller, Illusonic LLC - St-Sulpice, Switzerland
For cost and form-factor reasons it is often advantageous to use omnidirectional microphones in consumer devices. If the signals of a pair of such microphones are used directly, time-delay stereo with possibly some weak level-difference cues (from device body shadowing) is obtained. The result is weak localization and little channel separation. If the microphones are relatively closely spaced, time-delay cues can be converted to intensity-difference cues by applying delay-and-subtract processing to obtain two cardioids. The delay-and-subtract processing is generalized to be applicable also when there is a device body between the microphones. The two cardioids could be used directly as a stereo signal, but to prevent low-frequency noise the output signals are instead derived by applying a time-variant filter to the input microphone signals.
Convention Paper 8188 (Purchase now)
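The delay-and-subtract step described above can be sketched in a few lines. This is a minimal illustration of the general principle, assuming free-field propagation and an integer-sample inter-microphone delay; the paper's device-body generalization and time-variant output filtering are not reproduced, and the function name and parameters are our own.

```python
import numpy as np

def omnis_to_cardioids(x1, x2, delay_samples):
    """Delay-and-subtract processing on two closely spaced omni signals.

    Subtracting a delayed copy of the opposite capsule forms two
    back-to-back first-order cardioid beams, assuming the inter-mic
    acoustic delay is an integer number of samples.
    """
    d = int(delay_samples)
    z = np.zeros(d)
    x1d = np.concatenate([z, x1[:len(x1) - d]])  # x1 delayed by d samples
    x2d = np.concatenate([z, x2[:len(x2) - d]])  # x2 delayed by d samples
    front = x1 - x2d   # cardioid with its null toward microphone 2
    back = x2 - x1d    # cardioid with its null toward microphone 1
    return front, back
```

Subtracting the opposite capsule delayed by the acoustic travel time places a null in the corresponding endfire direction: a wave arriving along the axis toward microphone 1 first appears identically in both terms of `back` and cancels.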

P6-3 Determined Source Separation for Microphone Recordings Using IIR FiltersChristian Uhle, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Josh Reiss, Queen Mary University of London - London, UK
A method for determined blind source separation of microphone recordings is presented that attenuates the direct-path cross-talk using IIR filters. The unmixing filters are derived by approximating the transmission paths between the sources and the microphones by a delay and a gain factor. For the evaluation, the proposed method is compared to three other approaches. Degradation of the separation performance, caused by fractional delays and by the directivity of the microphones and sources, is also discussed. The advantages of the proposed method are low latency, low computational complexity, and high sound quality.
Convention Paper 8189 (Purchase now)
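The delay-plus-gain approximation of the cross-paths can be illustrated with a toy two-source, two-microphone case. This sketch (our own names and parameters, not the paper's code) uses a single FIR subtraction with known gains and integer delays, whereas the paper derives IIR unmixing filters and must also cope with fractional delays:

```python
import numpy as np

def delay(x, d):
    """Delay a signal by an integer number of samples (zero padding)."""
    return np.concatenate([np.zeros(d), x[:len(x) - d]])

def unmix(x1, x2, g12, d12, g21, d21):
    """Attenuate direct-path cross-talk by subtracting a scaled, delayed
    copy of the opposite microphone; each cross-path is approximated by
    a gain g and an integer delay d."""
    y1 = x1 - g12 * delay(x2, d12)  # remove source 2's leakage into mic 1
    y2 = x2 - g21 * delay(x1, d21)  # remove source 1's leakage into mic 2
    return y1, y2
```

When the gain/delay model is exact, the cross-talk cancels completely, at the cost of a comb-filter coloration of the wanted source, which is one motivation for the paper's IIR formulation.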

P7 - Loudspeaker Design and Amplifiers


Friday, November 5, 9:00 am — 1:00 pm (Room 236)

Chair:
Christopher Struck

P7-1 An Improved Beryllium Dome Diaphragm Assembly for Large Format Compression DriversMarshall Buck, Psychotechnology, Inc. - Los Angeles, CA, USA; Gordon Simmons, Sam Saye, Brush-Wellman - Fremont, CA, USA
We describe the development, manufacture, and testing of a new large-format compression driver diaphragm using a beryllium dome and a new type of polymer surround that exhibits improved performance. This design promises long life and good reliability, with little or no change in performance anticipated over the life of the diaphragm. A comprehensive set of tests of beryllium, aluminum, and titanium diaphragm compression drivers is described, including frequency response, distortion, and wavelet time-domain analysis on a 2-inch plane-wave tube. Substantial differences were measured in the performance categories, particularly in the frequency range above 4 kHz.
Convention Paper 8190 (Purchase now)

P7-2 Point-Source Loudspeaker Reversely-Attached Acoustic Horn: Improvement of Acoustic Characteristics and Application to Some MeasurementsTakahiro Miura, Teruo Muraoka, Tohru Ifukube, The University of Tokyo - Tokyo, Japan
It is ideal to measure acoustic characteristics with a point-source sound. At a previous convention we proposed a point-source measurement loudspeaker designed by attaching the mouth of a hyperbolic horn to the diaphragm of the driver unit. The differences in directional intensity of the loudspeaker in the frequency range of 20–700 Hz were within 3 dB at any combination of azimuth and elevation. In the frequency range above 700 Hz, differences in azimuthal directional intensity were within 10 dB, while the elevational ones were within 20 dB. Following these results, differences in directional frequency characteristics are discussed. We then applied the loudspeaker to the measurement of the acoustic characteristics of a hall.
Convention Paper 8191 (Purchase now)

P7-3 Ironless Motor Loudspeaker: Quantization of the Subjective Enhanced Sound QualityMathias Remy, Technocentre Renault - Guyancourt, France, Laboratoire d'Acoustique de l'Université du Maine, Le Mans Cedex, France; Guy Lemarquand, Technocentre Renault - Guyancourt, France; Daniele Ceruti, Faital S.p.A., Fabbrica Italiana Altoparlanti S.p.A. - Donato Milanese (MI), Italy; Gaël Guyader, Laboratoire d'Acoustique de l'Université du Maine - Le Mans Cedex, France; Romolo Toppi, Faital S.p.A., Fabbrica Italiana Altoparlanti S.p.A. - Donato Milanese (MI), Italy; Marc-François Six, Hutchinson S.A. - Chalette-sur-Loing Cedex, France
This paper presents a set of measurements performed on two automotive loudspeakers. The two loudspeakers have exactly the same moving and suspension parts but different motors. The first is equipped with a traditional production-model motor made of ferrite and iron, whereas the second is fitted with a prototype ironless motor made entirely of permanent magnets. Blind listening tests performed with these two loudspeakers showed a significant perceived sound-quality advantage for the ironless-motor loudspeaker. Several types of measurements were carried out to quantify and explain this sound-quality enhancement. The results are given in this paper.
Convention Paper 8192 (Purchase now)

P7-4 Air Velocity and Pressure Profiles in the Front of an Electrodynamic LoudspeakerDanijel Djurek, Allesandro Volta Applied Ceramics (AVAC) Laboratory for Nonlinear Dynamics - Zlatar Bistrica, Croatia; Ivan Djurek, Antonio Petosic, University of Zagreb - Zagreb, Croatia
Air velocity was recorded in front of an electrodynamic loudspeaker using the hot-wire anemometric technique. The wire temperature response was detected up to 2 kHz, and harmonics were analyzed using the King formula. Near-field effects were detected at z-axis distances comparable to the loudspeaker diameter, and extended Greenspan theory was applied to explain the measured data. The importance of air viscosity in damping the Morse convection in the near-field regime is stressed. Near-field effects at distances up to 30 cm are discussed in terms of the Morse convection indicated by the imaginary part of the air impedance. Based on the continuity equation of air flow, the microphone signal was correlated with the divergence of the fluid velocity.
Convention Paper 8193 (Purchase now)

P7-5 New Techniques for Evaluating Audio Amplifiers via Measuring for Induced Wow and Flutter and Differential Phase DistortionsRon Quan, Ron Quan Designs - Cupertino, CA, USA
In the past, mechanical systems were measured for wow and flutter (frequency modulation), but amplifiers were not. Instead, amplifiers are typically measured for intermodulation and harmonic distortion. A new method for characterizing audio amplifier/device performance measures frequency-modulation effects and differential phase distortion. Frequency and phase detectors are used to evaluate induced frequency and phase modulation from an amplifier under two conditions. In the first, a low-frequency signal induces the modulation on a high-frequency signal; in the second, a high-frequency AM signal induces the modulation on a lower-frequency signal. Practical design topologies for the new test methods are shown, and the results of the new testing methods are tabulated.
Convention Paper 8194 (Purchase now)

P7-6 Analysis of Two-Pole Compensation in Linear Audio AmplifiersHarry Dymond, Phil Mellor, University of Bristol - Bristol, UK
An analysis of the two-pole compensation technique used in three-stage linear audio amplifiers is presented. An expression for the loop-gain of a linear amplifier incorporating two-pole compensation is derived, allowing the designer to easily select the unity loop-gain frequency and zero location by choosing appropriate values for the compensation components. Also presented is a simulation method that allows the designer to observe an amplifier’s closed-loop and loop-gain responses in a single pass without requiring modification to the circuit’s feedback path; and two separate modifications to the usual two-pole compensation approach that improve phase margin and significantly enhance negative-rail power-supply rejection ratio.
Convention Paper 8195 (Purchase now)

P7-7 A Robust Pseudo-Ternary Modulation Scheme for Filter-Less Digital Class D AmplifiersRossella Bassoli, Carlo Crippa, Germano Nicollini, ST-Ericsson - Monza Brianza, Italy
This paper presents a new pseudo-ternary modulation scheme for bridge-tied-load digital class D amplifiers that is more robust against output-stage distortions than existing ternary modulations. The effects of finite rise and fall times and their mismatches are first introduced for a classical ternary modulation scheme, where a large degradation of the dynamic range can be observed, and the analysis is then extended to reported high-performance ternary modulators. It is shown that, even when linearization pulses are inserted to cope with the finite rise/fall-time problem, these modulators are still affected by edge mismatches.
Convention Paper 8196 (Purchase now)

P7-8 Switching/Linear Hybrid Audio Power Amplifiers for Domestic Applications, Part 1: The Class-B•D AmplifierHarry Dymond, Phil Mellor, University of Bristol - Bristol, UK
The analysis, design, and testing of a parallel switching/linear hybrid audio power amplifier rated at 100 W into 8 ohms are presented. The amplifier employs a hysteretically controlled switching stage and high-bandwidth linear amplifier whose high-gain negative-feedback loop controls the output signal. The majority of the output current is provided by the switching stage, enhancing efficiency. The amplifier’s fidelity has been tested with standard commercially available equipment, while efficiency has been evaluated across a very wide range of signal and load conditions using a custom active-load and automated test procedure. The combined fidelity and efficiency test results are analyzed, and the suitability for domestic applications of this amplifier configuration discussed.
Convention Paper 8197 (Purchase now)

P8 - Audio Processing—1


Friday, November 5, 9:30 am — 11:00 am (Room 226)

P8-1 Near and Far-Field Control of Focused Sound Radiation Using a Loudspeaker ArraySangchul Ko, Youngtae Kim, Jung-Woo Choi, SAIT, Samsung Electronics Co. Ltd. - Gyeonggi-do, Korea
In this paper a sound manipulation technique is proposed to prevent unwanted eavesdropping, or disturbance of others nearby, when a multimedia device is used in a public place. The technique creates a spatial region of high acoustic potential energy at the listener’s position. To this end, the paper discusses the design of multichannel filters with a spatial directivity pattern for a given arbitrary loudspeaker array configuration. First, some limitations of conventional beamforming techniques are presented, and then a novel control strategy is suggested for reproducing a desired acoustic property in a spatial area of interest close to the loudspeaker array. The technique also allows an acoustic property in an area relatively far from the array to be controlled with a single objective function. In order to precisely produce a desired shape of energy distribution in both areas, a spatial weighting technique is introduced. The results are compared with those obtained by controlling each area separately.
Convention Paper 8198 (Purchase now)

P8-2 A Real-Time Implementation of a Novel Psychoacoustic Approach for Stereo Acoustic Echo CancellationStefania Cecchi, Laura Romoli, Paolo Peretti, Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy
Stereo acoustic echo cancellers (SAECs) are used in teleconferencing systems to reduce undesired echoes originating from coupling between loudspeakers and microphones. The main problem of this approach is related to the issue of uniquely identifying each pair of room acoustic paths, due to high interchannel coherence. In this paper a real-time implementation of a novel approach for SAEC based on the psychoacoustic effect of missing fundamental is proposed. An adaptive algorithm is employed to track and remove the fundamental frequency of one of the two channels, ensuring a continuous decorrelation without affecting the stereo quality. Several tests are presented taking into account a real-time implementation on a DSP framework in order to confirm its effectiveness.
Convention Paper 8199 (Purchase now)

P8-3 Solo Plucked String Sound Detection by the Energy-to-Spectral Flux Ratio (ESFR)Byung Suk Lee, LG Electronics Inc. - Seocho-Gu, Seoul, Korea, Columbia University, New York, NY, USA; Chang-Heon Lee, Yonsei University - Seoul, Korea; Gyuhyeok Jeong, In Gyu Kang, LG Electronics Inc. - Seocho-Gu, Seoul, Korea
We address the problem of distinguishing solo plucked string sound from speech. Due to the harmonic components present in both types of signals, a low complexity music/speech classifier often misclassifies these signals. To capture the sustained harmonic structures observed in solo plucked string sound, we propose a new feature, the Energy-to-Spectral Flux Ratio (ESFR). The values and the statistics of the ESFR for solo plucked string sound were distinct from those for speech when calculated over windows of 20 to 50 ms. By building a low complexity detector with the ESFR, we demonstrate the discriminating performance of the ESFR feature for the considered problem.
Convention Paper 8200 (Purchase now)
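The abstract does not give the exact definition of the ESFR, so the sketch below assumes one plausible form: total frame energy divided by the L2 spectral flux between consecutive frames. A sustained plucked-string-like tone (stable spectrum, low flux) then scores much higher than a fluctuating noise signal. The function name, window choice, and normalization are our assumptions, not the paper's:

```python
import numpy as np

def esfr(x, frame=1024, hop=512):
    """Energy-to-Spectral-Flux Ratio over a signal, under the assumed
    definition: total frame energy / L2 spectral flux between frames."""
    frames = [x[i:i + frame] for i in range(0, len(x) - frame + 1, hop)]
    mags = [np.abs(np.fft.rfft(f * np.hanning(frame))) for f in frames]
    energy = sum(float(np.sum(f ** 2)) for f in frames)
    flux = sum(float(np.sum((a - b) ** 2))
               for a, b in zip(mags[1:], mags[:-1]))
    return energy / (flux + 1e-12)  # guard against division by zero
```

A low-complexity detector of the kind the paper describes could then threshold this value, computed over 20–50 ms windows.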

P8-4 Separation of Repeating and Varying Components in Audio MixturesSean Coffin, Stanford University - Stanford, CA, USA
A large amount of modern pop music contains digital “loops” or “samples” (short audio clips) that appear multiple times during a song. In this paper a novel approach to separating these exactly repeating component waveforms from the rest of an audio mixture is presented. By examining time-frequency representations of the mixture during several instances of a single repeating component, and taking for each time-frequency bin the complex value with the smallest magnitude across all instances, we can effectively extract the content that is perceived to be repeating, provided the rest of the mixture varies sufficiently. Results are presented demonstrating successful application to commercially available recordings as well as to constructed audio mixtures, achieving signal-to-interference ratios of up to 42.8 dB.
Convention Paper 8201 (Purchase now)
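The per-bin minimum-magnitude selection can be written compactly with NumPy. This is a sketch under the assumption that the instances are already time-aligned STFTs of equal size; alignment, STFT parameters, and resynthesis are the hard parts the paper addresses and are omitted here:

```python
import numpy as np

def extract_repeating(spectra):
    """Per-bin minimum-magnitude selection across instances of a loop.

    `spectra` is a list of complex STFT arrays (same shape), one per
    occurrence of the repeating component.  For each time-frequency bin,
    keep the complex value whose magnitude is smallest across instances:
    if the non-repeating content varies enough, that value is the one
    least contaminated by it.
    """
    stack = np.stack(spectra)               # (instances, frames, bins)
    idx = np.argmin(np.abs(stack), axis=0)  # winning instance per bin
    return np.take_along_axis(stack, idx[None, ...], axis=0)[0]
```

Because the repeating part is identical in every instance, any bin in which at least one instance is free of extra content returns (approximately) the repeating component's value.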

P8-5 High Quality Time-Domain Pitch Shifting Using PSOLA and Transient PreservationAdrian von dem Knesebeck, Pooya Ziraksaz, Udo Zölzer, Helmut-Schmidt-University - Hamburg, Germany
An enhanced pitch shifting system is presented that uses the Pitch Synchronous Overlap Add (PSOLA) technique and a transient detection for processing of monophonic speech or instrument signals. The PSOLA algorithm requires the pitch information and the pitch marks for the signal segmentation in the analysis stage. The pitch is acquired using a well established pitch detector. A new robust pitch mark positioning algorithm is presented that achieves high quality results and allows the positioning of the pitch marks in a frame-based manner to enable real-time application. The quality of the pitch shifter is furthermore enhanced by extracting the transient components before the PSOLA and reapplying them at the synthesis stage to eliminate repetitions of the transients.
Convention Paper 8202 (Purchase now)

P9 - Listening Tests


Friday, November 5, 11:00 am — 1:00 pm (Room 220)

Chair:
Jon Boley

P9-1 A Digital-Domain Listening Test for High-ResolutionJohn Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada
There is much debate over whether sampling rates and wordlengths greater than the CD standard are significant for high-quality audio. Tests that have been done require extreme care in selecting compatible devices with known characteristics. I propose tests that use the highest-quality wide-band microphones, only one set of ADCs and DACs, and wide-band reproducing loudspeakers. Real music and artificial signals with ultrasonic content can be used. The ADCs and DACs are always used at the same extended bit width and high sampling rate, typically 24 bits and 176.4 or 192 kHz. To perform comparative tests at reduced sampling rates and lower bit widths, the digital data are mathematically altered to conform closely to the reduced specification. Files so created can be played back with precise time registration and identical level. ABX tests can be used to determine whether differences are heard and to ensure that the tests are blind. Switching of program material can be done in the digital domain, so that relays or other compromising connectivity can be avoided. This paper discusses some remaining difficult issues and outlines the mathematical computations that will be necessary for sample-rate conversion, linear-phase aliasing and reconstruction filters, dithering, and noise shaping of the processed signal.
Convention Paper 8203 (Purchase now)
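One of the computations the author lists, wordlength reduction with dither, can be sketched as follows. This is a generic TPDF-dithered requantizer, not the paper's actual processing chain (which also involves sample-rate conversion, linear-phase filters, and noise shaping); the function name and scaling conventions are ours:

```python
import numpy as np

def requantize_tpdf(x, bits=16, rng=None):
    """Reduce the wordlength of float samples in [-1, 1) with TPDF dither:
    add triangular noise of 2 LSB peak-to-peak, then round to the reduced
    quantizer grid (no noise shaping in this sketch)."""
    rng = rng or np.random.default_rng(0)
    q = 2.0 ** (1 - bits)                  # quantization step (1 LSB)
    dither = (rng.random(len(x)) - rng.random(len(x))) * q  # TPDF, +/-1 LSB
    y = np.round((x + dither) / q) * q
    return np.clip(y, -1.0, 1.0 - q)
```

Each output sample lies on the reduced-wordlength grid and differs from the input by at most 1.5 LSB (up to 1 LSB of dither plus 0.5 LSB of rounding).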

P9-2 Variance in Level Preference of Balance Engineers: A Study of Mixing Preference and Variance Over TimeRichard King, Brett Leonard, Grzegorz Sikora, McGill University - Montreal, Quebec, Canada
Limited research has been conducted that quantifies how much expert listeners vary over time. A task-based testing method is employed to discern the range of variance an expert listener displays over both short and long periods of time. Mixing engineers are presented with a basic mixing task comprised of one stereo backing track and a solo instrument or voice. By tracking the range in level in which the mixing engineers place a soloist into an accompanying track over a number of trials, trends are observed. Distributions are calculated for three genres of music and variance is calculated over time. The results show that in fact the variance is relatively low, and even lower for the more experienced subjects. These results also provide a baseline for future testing.
Convention Paper 8204 (Purchase now)

P9-3 Evaluation of Superwideband Speech and Audio CodecsUlf Wüstenhagen, Bernhard Feiten, Jens Kroll, Alexander Raake, Marcel Wältermann, Deutsche Telekom AG Laboratories - Berlin, Germany
The growing use of headphones for telephony applications is paired with increased quality expectations on the part of users. Recently, several standardization bodies have started work on enhancements of telephone services. One objective is to improve quality by providing a low-delay codec that supports super-wideband or fullband operation and, in addition, performs well not only for speech but also for music. Deutsche Telekom Laboratories has evaluated a range of low-delay super-wideband speech and audio codecs in comprehensive listening tests. The tests were conducted using the MUSHRA test method, with a mixture of speech and audio conditions to check the performance of the codecs for different program types. The results of the listening tests are presented and discussed in the light of future applications.
Convention Paper 8205 (Purchase now)

P9-4 Subjective Listening Tests and Neural Correlates of Speech Degradation in Case of Signal-Correlated NoiseJan-Niklas Antons, Anne K. Porbadnigk, Robert Schleicher, Benjamin Blankertz, Sebastian Möller, Berlin Institute of Technology - Berlin, Germany; Gabriel Curio, Charité-University Medicine - Berlin, Germany
In this paper we examine whether the sensitivity of the human cortex to reductions in speech quality is visible in the electroencephalogram (EEG), and whether such measures can be used to improve the behavioral assessment of speech quality. We degraded a speech stimulus (vowel /a/) in a scalable way and asked for behavioral ratings while brain activity was measured with EEG. We trained classifiers that proved capable of distinguishing between events that are seemingly similar at the behavioral level (i.e., no button press) but in which noise contamination is nevertheless detected neurally, possibly affecting long-term contentment with the transmission quality.
Convention Paper 8206 (Purchase now)

P10 - Audio Processing—2


Friday, November 5, 11:30 am — 1:00 pm (Room 226)

P10-1 MPEG-A Professional Archival Application Format and its Application for Audio Data ArchivingNoboru Harada, Yutaka Kamamoto, Takehiro Moriya, NTT Communication Science Labs. - Atsugi, Kanagawa, Japan; Masato Otsuka, Memory-Tech Corporation - Tokyo, Japan
ISO/IEC 23000-6 (MPEG-A) Professional Archival Application Format (PA-AF) has recently been standardized. This paper proposes an optimized, standard-compliant implementation of a PA-AF archiving tool for audio archiving applications. The implementation uses an optimized MPEG-4 Audio Lossless Coding (ALS) codec library for audio data compression and Gzip for other files. The PA-AF specification was extended to support platform-specific attributes of Mac OS while keeping interoperability with other operating systems. Performance tests on real audio data, such as ProTools HD projects, show that the devised PA-AF archiving tool runs twice as fast as MacDMG and WinZip while producing much smaller compressed data.
Convention Paper 8207 (Purchase now)

P10-2 Switched Convolution Reverberator with Two-Stage Decay and Onset Time ControlKeun-Sup Lee, Jonathan S. Abel, Stanford University - Stanford, CA, USA
An efficient artificial reverberator with two-stage decay and onset-time controls is presented. A second-order comb filter controlling the reverberator’s frequency-dependent decay rates and onset times drives a switched convolution with short noise sequences. In this way, a non-exponential reverberation envelope is produced by the comb filter, while the switched convolution structure produces a high echo density. Several schemes for generating two-stage decays and onset-time controls with different onset characteristics in different frequency bands are described.
Convention Paper 8208 (Purchase now)

P10-3 Guitar-to-MIDI Interface: Guitar Tones to MIDI Notes Conversion Requiring No Additional PickupsMamoru Ishikawa, Takeshi Matsuda, Michael Cohen, University of Aizu - Aizu-Wakamatsu, Fukushima-ken, Japan
Many musicians, especially guitarists (both professional and amateur), use effects processors. In recent years a large variety of digital processing effects has become available to consumers. Further, desktop music, the “lingua franca” of which is MIDI, has become widespread through advances in computer technology and DSP. We are therefore developing a “Guitar to MIDI” interface device that analyzes the analog guitar audio signal and emits a standard MIDI stream. Similar products are already on the market (such as the Roland GI-20 GK-MIDI Interface), but almost all of them need additional pickups or guitar modification. The interface we are developing requires no special guitar accessories. We describe a PC-based prototype that anticipates a self-contained embedded system.
Convention Paper 8209 (Purchase now)

P10-4 A Mixed Mechanical/Digital Approach for Sound Beam Pointing with Loudspeakers Line ArrayPaolo Peretti, Stefania Cecchi, Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy; Marco Secondini, Andrea Fusco, FBT Elettronica S.p.a. - Recanati (MC), Italy
Digital steering is often used in line-array sound systems to tilt the reproduced sound beam in a desired direction. Unfortunately, the working frequency range is limited to low and medium frequencies; sound beams at high frequencies can be tilted only by mechanical steering, which involves both expensive manufacturing and a greater environmental impact. The proposed solution is a mixed approach to sound-beam steering that combines an on-axis mechanical rotation of each loudspeaker with the classical digital control applied to the entire system. In this manner the sound beam can also be tilted at high frequencies while maintaining the linear array geometry. Simulations taking real loudspeaker directivity into account are shown to demonstrate the effectiveness of the proposed approach.
Convention Paper 8210 (Purchase now)

P10-5 The Non-Flat and Continually Changing Frequency Response of Multiband CompressorsEarl Vickers, STMicroelectronics, Inc. - Santa Clara, CA, USA
Multiband dynamic range compressors are powerful, versatile tools for audio mastering, broadcast, and playback. However, they are subject to certain problems relating to frequency response. First, when excited by a time-varying narrow-band input such as a swept sinusoid, they create unwanted magnitude peaks at the band boundaries. Second, and more importantly, the frequency response continually changes, which may have unwanted effects on the long-term average spectral balance. This paper proposes a frequency-domain solution for the unwanted magnitude peaks, whereby slight adjustments to the band boundaries prevent sinusoidal peaks from being midway between two bands. For the second problem, real-time spectral balance compensation may be implemented in either the time or frequency domain.
Convention Paper 8211 (Purchase now)

P10-6 Volterra Series-Based Distortion EffectFinn T. Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark
A large part of the characteristic sound of the electric guitar comes from nonlinearities in the signal path. Such nonlinearities may come from the input or output stage of the amplifier, which is often equipped with vacuum tubes, or from a dedicated distortion pedal. In this paper the Volterra series expansion for nonlinear systems is investigated as a means of generating good-sounding distortion. The Volterra series allows unlimited adjustment of the level and frequency dependency of each distortion component. Subjectively relevant ways of linking the different orders are discussed.
Convention Paper 8212 (Purchase now)
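As a concrete illustration, the zero-memory special case of a Volterra expansion reduces to a power series in which each coefficient independently sets the level of one distortion order. The sketch below omits the frequency dependence (memory) of the kernels, which is the part the paper actually exploits; the names are ours:

```python
import numpy as np

def polynomial_distortion(x, gains):
    """Zero-memory special case of a Volterra-series distortion:
    y[n] = sum_k gains[k] * x[n]**(k+1).  Each entry of `gains` sets the
    level of one distortion order independently; the full Volterra model
    additionally gives each order its own frequency dependence."""
    y = np.zeros_like(x, dtype=float)
    for k, g in enumerate(gains):
        y += g * x ** (k + 1)  # k-th order kernel, collapsed to a gain
    return y
```

Driving it with a sine, the quadratic term contributes a second harmonic (sin²θ = ½ − ½cos 2θ) and the cubic term a third (sin³θ = (3 sin θ − sin 3θ)/4).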

P11 - Acoustical and Physical Modeling


Friday, November 5, 2:30 pm — 6:30 pm (Room 220)

Chair:
Julius O. Smith

P11-1 Virtual Acoustic Prototyping—Practical Applications for Loudspeaker DevelopmentAlex Salvatti, JBL Professional - Northridge, CA, USA
Acoustic simulations using finite elements have been used in loudspeaker development for over 20 years, with complexity and accuracy accelerating in tandem with the increases in computing power generally available on the engineering desktop. Using user-friendly, modern FEA software, the author presents an overview of methods for building virtual prototypes of both horns and loudspeaker drivers that allow a significant reduction in the number of physical prototypes as well as reduced development time. A comparison of simulated vs. measured data proves the validity of the methods.
Convention Paper 8213 (Purchase now)

P11-2 Simulation of Horn Driver Response by Combination of Matrix Analysis and FEAAlex Voishvillo, JBL Professional - CA, USA
To assess the performance of a horn driver (a compression driver loaded by a horn), measurements of the frequency response on-axis and off-axis must be carried out. The measurement process is time-consuming, especially if the entire 3-dimensional “balloon” of responses is to be measured. Prediction of the directional responses of the horn alone (without the compression driver) can be performed by FEA (Finite Element Analysis) or BEA (Boundary Element Analysis). However, FEA or BEA of the horn alone provides only the relative directional properties of the horn. The SPL responses of the horn driver at different angles remain unknown, because these responses depend on the interaction of the electrical, mechanical, and acoustical parameters of the compression driver with the acoustical parameters of the horn. A new method based on a combination of FEA and matrix analysis makes it possible to predict the response of a combination of various compression drivers and horns without actually measuring each combination, and even without physically building the horns. The method was verified during the development of the new AM series of JBL Professional loudspeaker systems and showed high accuracy.
Convention Paper 8214 (Purchase now)

P11-3 Dynamic Motion of the Corrugated Ribbon In a Ribbon MicrophoneDaniel Moses Schlessinger, Sennheiser DSP Research Laboratory - Palo Alto, CA, USA; Jonathan S. Abel, Stanford University - Stanford, CA, USA
Ribbon microphones are known for their warm sonics, owing in part to the unique ribbon motion induced by the sound field. Here the motion of the corrugated ribbon element in a sound field is considered, and a physical model of the ribbon motion is presented. The model separately computes propagating torsional disturbances and coupled transverse and longitudinal disturbances. Each propagation mode is implemented as a mass-spring model where a mass is identified with a ribbon corrugation fold. The model is parametrized using ribbon material and geometric properties. Laser vibrometer measurements are presented, revealing stiffness in the transverse and longitudinal propagation and showing close agreement between measured and modeled ribbon motion.
Convention Paper 8215 (Purchase now)
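A mass-spring model of this kind can be sketched in miniature. The toy below simulates only a single transverse mode on a chain of identical masses (one per corrugation fold) with clamped ends, integrated with symplectic Euler steps; the paper's model additionally couples torsional, transverse, and longitudinal propagation and is parametrized with measured ribbon material and geometric properties. All names and values here are illustrative:

```python
import numpy as np

def ribbon_chain(n=60, k=1.0, m=1.0, dt=0.1, steps=400):
    """Toy 1-D mass-spring chain: one mass per corrugation fold, clamped
    ends (virtual zero-displacement neighbors), symplectic Euler steps.
    Returns the displacement history, shape (steps, n)."""
    u = np.zeros(n)      # displacements
    v = np.zeros(n)      # velocities
    v[1] = 1.0           # initial "kick" near one clamped end
    history = []
    for _ in range(steps):
        up = np.concatenate([[0.0], u[:-1]])  # left neighbor (wall at -1)
        dn = np.concatenate([u[1:], [0.0]])   # right neighbor (wall at n)
        a = (k / m) * (up + dn - 2 * u)       # Hooke's-law acceleration
        v += a * dt
        u += v * dt
        history.append(u.copy())
    return np.array(history)
```

Even this toy reproduces the qualitative behavior the paper measures with the laser vibrometer: a localized disturbance propagates fold by fold along the ribbon at a finite speed.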

P11-4 Modeling of Leaky Acoustic Tube for Narrow-Angle Directional MicrophoneKazuho Ono, Takehiro Sugimoto, Akio Ando, Kimio Hamasaki, NHK Science and Technology Research Laboratories - Kinuta Setagaya-ku, Tokyo, Japan; Takeshi Ishii, Yutaka Chiba, Keishi Imanaga, Sanken Microphone Co. Ltd. - Suginami-ku, Tokyo, Japan
Line microphones have long been popular as narrow directional microphones. Their structure adopts a leaky acoustic tube with many slits to suppress off-axis sensitivity, together with a directional capsule attached to the tube. Although many microphones of this type are on the market, there has been no quantitative theory to explain their behavior, which is very important for effectively designing the directivity. We therefore modeled the leaky acoustic tube using a distributed equivalent circuit and combined it with the directional capsule’s equivalent-circuit model. The analysis showed that the model agreed well with the measurement results, particularly in the directional characteristics, whereas an ordinary delay-and-sum model of the acoustic tube did not.
Convention Paper 8216 (Purchase now)

P11-5 Modeling Viscoelasticity of Loudspeaker Suspensions Using Retardation SpectraTobias Ritter, Finn Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark
It is well known that, due to viscoelastic effects in the suspension, the displacement of the loudspeaker increases with decreasing frequency below resonance. Present creep models are either not precise enough or are purely empirical, not derived from physical principles. In this investigation, the viscoelastic retardation spectrum, which provides a more fundamental description of the suspension viscoelasticity, is first used to explain the accuracy of the empirical LOG creep model (Knudsen et al.). Then, two extensions to the LOG model are proposed that include the low- and high-frequency limits of the compliance, not accounted for in the original LOG model. The new creep models are verified by measurements on two 5.5-inch loudspeakers with different surrounds.
Convention Paper 8217 (Purchase now)

P11-6 Physical Modeling and Synthesis of Motor Noise for Replication of a Sound Effects LibrarySimon Hendry, Josh Reiss, Queen Mary University of London - London, UK
This paper presents the results of objective tests exploring the concept of using a small number of physical models to create and replicate a large number of samples from a traditional sound effects library. The design of a DC motor model is presented, and this model is used to create both a household drill and a small boat engine. The harmonic characteristics, as well as the spectral centroid, were compared with those of the original samples, and all features agree to within 6.1%. The results of the tests are discussed with a heavy emphasis on realism and perceived accuracy, and the parameters that must be improved in order to humanize a model are explored.
Convention Paper 8218 (Purchase now)

P11-7 Measures and Parameter Estimation of Triodes for the Real-Time Simulation of a Multi-Stage Guitar PreamplifierIvan Cohen, Ircam - Paris, France, Orosys R&D, Montpellier, France; Thomas Hélie, Ircam - Paris, France
This paper deals with the real-time simulation of a multi-stage guitar preamplifier. Dynamic triode models based on Norman Koren’s model, and "secondary phenomena" such as the grid rectification effect and parasitic capacitances, are considered. The circuit is then modeled by a nonlinear differential-algebraic system with extended state-space representations. Standard numerical schemes yield efficient, stable simulations of the circuit and are implemented as VST plug-ins. Measurements of real triodes have been performed to develop new triode models and to characterize the capabilities of aged and new triodes. The results are compared for all the models, using lookup tables generated from the measurements and Norman Koren’s model with its parameters estimated from the measurements.
Convention Paper 8219 (Purchase now)

P11-8 ZFIT: A MATLAB Tool for Thiele-Small Parameter Fitting and OptimizationChristopher Struck, CJS Labs - San Francisco, CA, USA
Over the years, many approaches to the calculation of the Thiele-Small parameters have been presented. Most current methods rely upon curve-fitting the impedance magnitude data to a specific lumped-parameter model. A flexible MATLAB least-mean-squares optimization tool for complex loudspeaker impedance data is described. Magnitude and phase data are fit to a user-selected lumped-parameter model of variable complexity. Appropriate constraints on the optimization help identify whether the selected model is of sufficient order or overly complex for the given data. Examples are shown for impedance data from several different loudspeaker drivers.
Convention Paper 8220 (Purchase now)
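To make the fitting idea concrete, here is a minimal sketch of fitting a lumped-parameter impedance model to complex impedance data. It is not the ZFIT tool itself: the model form, the parameter values, and the crude one-dimensional scan (standing in for a proper least-mean-squares optimizer) are all illustrative assumptions.

```python
import math

def impedance(f, Re, Le, Bl, Rms, Mms, Cms):
    """Electrical input impedance of a simple lumped Thiele-Small model:
    voice-coil resistance and inductance plus the reflected mechanical branch."""
    w = 2 * math.pi * f
    Zm = Rms + 1j * w * Mms + 1 / (1j * w * Cms)   # mechanical impedance
    return Re + 1j * w * Le + Bl ** 2 / Zm

# Synthetic "measured" data generated from known (hypothetical) parameters.
true = dict(Re=6.0, Le=0.5e-3, Bl=7.0, Rms=1.5, Mms=12e-3, Cms=0.8e-3)
freqs = [10 * 1.1 ** k for k in range(60)]         # ~10 Hz to ~3 kHz, log-spaced
data = [impedance(f, **true) for f in freqs]

def cost(Rms):
    """Sum of squared complex errors, all other parameters held fixed."""
    p = dict(true, Rms=Rms)
    return sum(abs(impedance(f, **p) - z) ** 2 for f, z in zip(freqs, data))

# Crude one-dimensional scan in place of a real optimizer.
best_Rms = min((cost(r), r) for r in [0.5 + 0.05 * i for i in range(60)])[1]
print(best_Rms)  # recovers the true Rms of 1.5
```

Fitting complex impedance (magnitude and phase), as the paper advocates, constrains the parameters much better than magnitude-only fits, because the motional phase rotation around resonance is highly sensitive to the mechanical parameters.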

P12 - Virtual Rooms


Friday, November 5, 2:30 pm — 5:30 pm (Room 236)

Chair:
Jean-Marc Jot

P12-1 Assessing Virtual Teleconferencing RoomsMansoor Hyder, Michael Haun, Olesja Weidmann, Christian Hoene, Universität Tübingen - Tübingen, Germany
Spatial audio makes teleconferencing more natural, helps listeners locate and distinguish talkers in a virtual acoustic environment, and improves comprehension of multiple concurrent talkers. This paper presents a study on how to design virtual acoustic environments for 3-D audio teleconferences so as to maximize localization performance, ease of use, and subjective speech quality ratings. We conducted subjective listening-only tests covering different parameters of the virtual acoustic environment, including acoustic room properties, virtual seating arrangements, reflections from a conference table, the number of concurrent talkers, and different voice types of call participants. The experimental results help us enhance the user experience of our open-source spatial audio teleconferencing solution, "3DTel," in terms of naturalness and speech quality.
Convention Paper 8221 (Purchase now)

P12-2 Stereo Acoustic Echo Cancellation for Telepresence SystemsShreyas Paranjpe, Scott Pennock, Phil Hetherington, QNX Software Systems Inc. (Wavemakers) - Vancouver, BC, Canada
A telepresence system gives its users the sense that they are virtually present in another physical location. Typically, this means providing a high-quality audio and video communication path. Simply adding video to typical audio teleconferencing is not enough: when users see each other, they quickly notice the poor audio performance, such as the half-duplex behavior, that is commonly implemented. To make affordable telepresence systems for everyone, the challenge is to design high-performance audio communication systems that are computationally efficient.
Convention Paper 8222 (Purchase now)

P12-3 Early Energy Conditions in Small Rooms and in Convolutions of Small-Room Impulse ResponsesU. Peter Svensson, Hassan El-Banna Zidan, Norwegian University of Science and Technology - Trondheim, Norway
A simplified prediction model for the early-to-late energy ratio has been tested for small rooms of the kind typically used in video conferences. Measurements have been carried out in a few rooms, and early- and late-energy levels have been compared with predictions according to Barron's model; predicted octave-band levels are typically within 1-2 dB of measured values. Measured impulse responses are then convolved to simulate a video conference setup, and simplified predictions of the early and late energy conditions of the convolved impulse responses are compared with (convolved) measurements.
Convention Paper 8223 (Purchase now)
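As background for the early-to-late energy ratio the abstract discusses, here is a small sketch computing that ratio for an idealized exponential reverberant decay. This is a textbook simplification, not Barron's model; the room values are made up for illustration.

```python
import math

def early_to_late_db(rt60, t_split):
    """Early-to-late energy ratio, in dB, of an ideal exponential decay
    with reverberation time rt60, split at t_split seconds.

    Reflected energy density decays as exp(-13.82 * t / rt60), so
    early/late = exp(13.82 * t_split / rt60) - 1.
    """
    k = 13.82 * t_split / rt60
    return 10 * math.log10(math.exp(k) - 1)

# Small video-conference room: RT60 = 0.5 s, 50-ms early/late split (C50-like)
print(round(early_to_late_db(0.5, 0.05), 1))  # -> 4.7
```

The same formula shows why drier rooms measure "clearer": halving RT60 roughly doubles the exponent and the early-to-late ratio rises sharply.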

P12-4 A Convolution-Based System for Virtual Acoustic Support of Performing MusiciansWieslaw Woszczyk, Doyuen Ko, Brett Leonard, McGill University - Montreal, Quebec, Canada
Musicians performing on stage need to hear a proper balance of themselves and the other players in order to achieve a good sense of ensemble and musicality. This ability is influenced by the quality of the acoustic response of the room. In spaces where the existing acoustic conditions are detrimental to good communication on stage, ‘electronic architecture’ may be used to rebuild the acoustic support for musicians. A system is developed that utilizes measured impulse responses from a variety of superior acoustic spaces to generate, using near zero-latency multichannel convolution, an artificial sound field augmenting the one already present. This method of virtual acoustic technology does not amplify (or reuse) the energy produced by the existing room; instead it generates desirable room response components from other measured spaces. The adjustable acoustic conditions are set using a comprehensive GUI, transducer arrays, and a layered system architecture.
Convention Paper 8224 (Purchase now)

P12-5 Simulating Hearing Loss in Virtual TrainingRamy Sadek, David Krum, Mark Bolas, University of Southern California Institute for Creative Technologies - Playa Vista, CA, USA
Audio systems for virtual reality and augmented reality training environments commonly focus on high-quality audio reproduction. Yet many trainees may face real-world situations wherein hearing is compromised. In these cases, the hindrance caused by impaired or lost hearing is a significant stressor that may affect performance. Because this phenomenon is hard to simulate without actually causing hearing damage, trainees are largely unpracticed at operating with diminished hearing. To improve the match between training scenarios and real-world situations, this effort aims to add simulated hearing loss or impairment as a training variable. The goal is to affect everything users hear—including non-simulated sounds such as their own and each other’s voices—without overt noticeability, risk to hearing, or requiring headphones.
Convention Paper 8225 (Purchase now)

P12-6 OpenAIR: An Interactive Auralization Web Resource and DatabaseSimon Shelley, Damian T. Murphy, University of York - Heslington, York, UK
There have been many recent initiatives to capture the impulse responses of important or interesting acoustic spaces, although not all of this data has been made more widely available to researchers interested in auralization. This paper presents the Open Acoustic Impulse Response (OpenAIR) Library, a new online resource allowing users to share impulse responses and related acoustical information. Open-source software is provided, allowing the user to render the acoustical data using various auralization strategies. Software tools and guidelines for the process of impulse response capture are also provided, aiming to disseminate best practice. The database can accommodate impulse response datasets captured according to different measurement techniques and the use of robust spatial audio coding formats is also considered for the distribution of this type of information. Visitors to the resource can search for acoustical data using keywords and can also browse uploaded datasets on a world map.
Convention Paper 8226 (Purchase now)

P13 - Audio Equipment and Measurement


Friday, November 5, 2:30 pm — 4:00 pm (Room 226)

P13-1 Neutral-Point Oscillation Control Based on a New Audio Space Vector Modulation (A-SVM) for DCI-NPC Power AmplifiersVicent Sala, Luis Romeral, G. Ruiz, UPC-Universitat Politecnica de Catalunya - Terrassa, Spain
In this paper the oscillation, or floating, of the DC-bus neutral point in DCI-NPC (Diode Clamped Inverter – Neutral Point Clamped) amplifiers is presented as one of the most important distortion sources. This perturbation is characterized and studied, along with its causes and distorting effects. Two techniques of vector modulation for audio are also presented. The intelligent use of these techniques in the vector modulation process allows the redistribution of the charge between the two capacitors in the DC bus, enabling control of the voltage at the neutral point of the DC bus and, therefore, cancellation of the floating and its distorting effects. Experimental and simulation results that verify these strategies are presented.
Convention Paper 8227 (Purchase now)

P13-2 Vacuum Tube Amplifiers Using Electronic DC TransformersTheeraphat Poomalee, Kamon Jirasereeamornkul, King Mongkut’s University of Technology Thonburi - Tung-kru, Bangkok Thailand; Marian K. Kazimierczuk, Wright State University - Dayton, OH, USA
This paper proposes a method to synthesize vacuum-tube audio amplifiers using electronic DC transformers to replace the traditional audio-frequency output transformers usually used in the output stage of the amplifier. The proposed amplifiers can achieve a frequency response from DC to 100 kHz if the DC transformers are operated at a 500-kHz switching frequency and an interleaving technique is used. The principle of operation, a DC model, and various examples are given.
Convention Paper 8228 (Purchase now)

P13-3 The Single Stereo Display and Stereo VU MetersMichael D. Callaghan, Radio Station KIIS-FM - Los Angeles, CA, USA
This paper describes the use of a single row of bi-color indicators to replace, and overcome the deficiencies of, the typical pair of meters used to show left and right signal levels in stereo applications. By using bi-color elements, a total of three colors are obtained: one color when the left channel is driven, another when the right channel is driven, and a mixture of the two when both channels are driven. Watching the row of indicators during program operation reveals three different amplitudes: the left channel volume, the right channel volume, and the difference between the two. These amplitudes are immediately obvious and very easy to interpret.
Convention Paper 8229 (Purchase now)

P13-4 Frequency Characteristics Measurements of Cylindrical Record Player by the Pulse-Train MethodTeruo Muraoka, Takahiro Miura, Tohru Ifukube, The University of Tokyo - Tokyo, Japan
The authors have been engaged in research on the restoration of seriously damaged audio signals employing Generalized Harmonic Analysis (GHA). In this research it is important to know the frequency characteristics of the sound reproducing equipment in order to obtain clear sound with proper tonal equalization. The authors previously measured the frequency characteristics of several acoustic 78-rpm shellac-record players utilizing the pulse-train method, and have recently measured cylindrical record players with the same method. Conventionally, the frequency characteristics of phonograph record players were measured using frequency test records; however, shellac or cylindrical test records can no longer be obtained. The authors therefore employed the pulse-train method, which was originally developed for the measurement of phonograph cartridges and cutter heads in the 1970s. For the present measurement, the authors first made a cylindrical record with a silent sound groove cut into it, and cut an additional groove perpendicular to the sound groove on the cylinder surface. The pulse-train response was obtained by reproducing this cylindrical record on the record players under test and on a reference electrical record player. The frequency characteristics of the players under test were then analyzed by applying the DFT to the measured pulse-train waveforms.
Convention Paper 8230 (Purchase now)

P13-5 Seeing Sound: Sound Sensor Array with Optical OutputsCharles Seagrave, Seagrave Instruments - San Rafael, CA, USA; Eric Benjamin, Consultant - Pacifica, CA, USA
Characterization of acoustic spaces frequently involves taking SPL measurements at numerous locations within the space. Such measurements typically require relocation of the measurement apparatus or multiple microphones wired to a multiplexer. This approach can be time consuming, especially if it must be repeated after changes in loudspeaker location, acoustical treatments, or other modifications. This paper presents methods of visualizing both standing waves in rooms and loudspeaker coverage uniformity in outdoor venues, using an array of sound sensors with optical (visible light) output. This new approach allows rapid visual observation of sound fields and simultaneous SPL data collection from multiple positions.
Convention Paper 8231 (Purchase now)

P13-6 Effects of Oversampling on SNR Using Swept-Sine AnalysisChristopher Bennett, Daniel Harris, Adam Tankanow, Ryan Twilley, Oygo Sound, LLC - Miami, FL, USA
The swept-sine technique is an alternative method to acquire impulse response measurements and distortion component responses. Swept-sine analysis has been under recent investigation for its use in auditory applications. In this paper the researchers seek to show that an improvement in signal-to-noise ratio (SNR) can be achieved by applying oversampling while utilizing swept-sine analysis. Oversampling does not give an improvement in SNR in traditional click impulse response methods; however, due to the noise shaping properties of the post-processing involved in swept-sine analysis, the noise floor can be reduced.
Convention Paper 8232 (Purchase now)
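To illustrate the swept-sine stimulus the abstract builds on, here is a minimal sketch of an exponential (logarithmic) sine sweep of the kind commonly used for impulse response measurement. The sweep parameters are illustrative, and this is a generic construction, not the authors' specific processing chain.

```python
import math

def exp_sweep_phase(t, f1, f2, T):
    """Phase (radians) of an exponential sine sweep from f1 to f2 over T seconds."""
    L = math.log(f2 / f1)
    return 2 * math.pi * f1 * T / L * (math.exp(t / T * L) - 1)

def exp_sweep(t, f1, f2, T):
    """Sweep sample at time t."""
    return math.sin(exp_sweep_phase(t, f1, f2, T))

f1, f2, T = 20.0, 20000.0, 5.0

# Instantaneous frequency = phase derivative / 2*pi; check the sweep endpoints
# by finite differences.
dt = 1e-6
inst_f0 = (exp_sweep_phase(dt, f1, f2, T) - exp_sweep_phase(0, f1, f2, T)) / (2 * math.pi * dt)
inst_fT = (exp_sweep_phase(T, f1, f2, T) - exp_sweep_phase(T - dt, f1, f2, T)) / (2 * math.pi * dt)
print(round(inst_f0), round(inst_fT))  # start ~= 20 Hz, end ~= 20000 Hz
```

Deconvolving a response recorded with this stimulus separates harmonic distortion components in time, which is what makes the technique attractive for the distortion-component measurements mentioned above.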

P13-7 Rapid In-Place Measurements of Multichannel VenuesJohn Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada
It is often useful to have transfer-function measurements of large venues with an audience present. This precludes multiple chirps or other long-duration signals from being used. This paper studies the use of simultaneous, multiple “orthogonal” maximum-length sequences applied to the loudspeakers, captured by a number of microphones at selected listening positions. Such MLS signals last only a few seconds and are noise-like, being minimally disruptive to an audience, yet they allow full transfer-function system identification between each loudspeaker and microphone. The main drawback of the method is that the effective noise level is high. This paper studies implementation issues and assesses the S/N of such measurements. It turns out that exciting each loudspeaker separately is usually better than simultaneous excitation, except in special circumstances. An example is shown for the simultaneous measurement of two loudspeakers in a room with two microphones.
Convention Paper 8233 (Purchase now)
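To illustrate the maximum-length-sequence property the abstract relies on, here is a minimal sketch of generating an MLS with a linear-feedback shift register and verifying its impulse-like circular autocorrelation. The short register length and tap choice are illustrative, not the authors' configuration.

```python
def mls(taps, n):
    """Generate a maximum-length sequence of length 2**n - 1 with a
    Fibonacci LFSR; taps are 1-based stage numbers of a primitive polynomial."""
    state = [1] * n
    out = []
    for _ in range(2 ** n - 1):
        out.append(1 if state[-1] else -1)      # map bits {0,1} to {-1,+1}
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]
    return out

seq = mls((3, 2), 3)        # n = 3: taps for x^3 + x^2 + 1, period 7
N = len(seq)

def circ_corr(x, lag):
    """Unnormalized circular autocorrelation at a given lag."""
    return sum(x[i] * x[(i + lag) % N] for i in range(N))

# MLS property: autocorrelation is N at lag 0 and exactly -1 at every other lag,
# which is what enables clean transfer-function identification by correlation.
print([circ_corr(seq, k) for k in range(N)])  # -> [7, -1, -1, -1, -1, -1, -1]
```

"Orthogonal" simultaneous measurement, as studied in the paper, exploits the fact that differently seeded or time-shifted m-sequences have similarly small cross-correlation, so each loudspeaker-microphone path can be separated from one capture.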

P13-8 Ground Loops: The Rest of the StoryBill Whitlock, Jensen Transformers, Inc. - Chatsworth, CA, USA; Jamie Fox, The Engineering Enterprise - Alameda, CA, USA
The mechanisms that enable so-called ground loops to cause well-known hum, buzz, and other audio system noise problems are well known. But what causes power-line related currents to flow in signal cables in the first place? This paper explains how magnetic induction in ordinary premises AC wiring creates the small voltage differences normally found among system ground connections, even if “isolated” or “technical” grounding is used. The theoretical basis is explored, experimental data shown, and an actual case history related. Little has been written about this “elephant in the room” topic in engineering literature and apparently none in the context of audio or video systems. It is shown that simply twisting L-N pairs in the premises wiring can profoundly reduce system noise problems.
Convention Paper 8234 (Purchase now)
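The induction mechanism described above can be sketched with Faraday's law for a single-turn loop; the field strength and loop areas below are hypothetical round numbers, chosen only to show why shrinking the loop area (by twisting the L-N pair) reduces the induced voltage proportionally.

```python
import math

def induced_hum_volts(f_line, b_rms_tesla, loop_area_m2):
    """RMS voltage induced in a single-turn wiring loop by a uniform AC
    magnetic field: V = 2*pi*f*B*A (Faraday's law, sinusoidal field)."""
    return 2 * math.pi * f_line * b_rms_tesla * loop_area_m2

# Hypothetical numbers: a 60-Hz stray field of 1 uT through a 0.5 m^2 loop
v_open = induced_hum_volts(60, 1e-6, 0.5)

# Twisting the pair cancels most of the enclosed area; assume it drops
# to 5 cm^2 for illustration.
v_twisted = induced_hum_volts(60, 1e-6, 0.5e-3)

print(v_open, v_open / v_twisted)  # the ratio tracks the area ratio (1000x)
```

Even the "open" case is only a fraction of a millivolt, but that is precisely the order of magnitude of the ground-potential differences that drive audible hum current through shield connections.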

P14 - Loudspeakers and Microphones


Friday, November 5, 4:30 pm — 6:00 pm (Room 226)

P14-1 Coaxial Flat Panel Loudspeaker System with Dynamic Push-Pull DriveDrazenko Sukalo, DSLab - Device Solution Laboratory - Munich, Germany
Following the successful introduction of flat-panel televisions, acousticians have turned to the design of a “full-range” flat panel loudspeaker. A new design of low depth, consisting of an array of two conventional cone drivers and a transmission line, is presented together with the method for driving it. The main aim was to build a small flat-panel box with extended low-frequency response and low distortion output, thanks to extended linear diaphragm excursion. The PSpice-OrCAD® simulator was used to represent a distributed model of the transmission line. The results of the simulation show the influence of the transmission-line enclosure parameters on the impedance curve and resonant frequency of the woofer driver. The paper is also concerned with an active filter design for driving the loudspeaker drive units in an appropriate phase relationship in the low-frequency region by implementing the dynamic push-pull (DPP) drive. A prototype of the flat panel loudspeaker was built according to the described design concept, and the results of sound pressure level measurements are presented. The design results from work performed for DSLab and is subject to the referenced patent.
Convention Paper 8235 (Purchase now)

P14-2 A Novel Universal-Serial-Bus-Powered Digitally Driven Loudspeaker System with Low Power Dissipation and High FidelityHajime Ohtani, Akira Yasuda, Kenzo Tsuihiji, Ryota Suzuki, Daigo Kuniyoshi, Hosei University - Koganei, Tokyo, Japan; Junichi Okamura, Trigence Semiconductor - Chiyoda, Tokyo, Japan
We propose a novel digitally driven loudspeaker system in which a newly devised mismatch-shaping method, multilevel noise-shaping dynamic element matching, is used to realize high fidelity, high sound pressure level, and low power dissipation. The unit used for the mismatch-shaping method can easily increase the number of sound pressure levels with the aid of an H-bridge circuit, even when the number of sub-speakers is fixed. Further, it reduces the noise caused by quantization and loudspeaker mismatches and decreases the switching loss. The output sound level of the system, equipped with six voice coils, is 94 dB at 1 m when a 3.3-V universal-serial-bus power supply is used exclusively. The power efficiency is 95% at 0 dBFS and 75% at –10 dBFS.
Convention Paper 8236 (Purchase now)

P14-3 Loudspeaker Rub Fault Detection by Means of a New Nonstationary Procedure TestGerman Ruiz, Vicent Sala, Miguel Delgado, Juan Antonio Ortega, UPC-Universitat Politecnica de Catalunya - Terrassa, Spain
This paper addresses the detection of rub defects in loudspeakers. The study includes a simulation with a rub model, based on classical static Coulomb friction added to a parametric model of the loudspeaker nonlinearities, to demonstrate the viability of the current signal for rub failure detection. The electric current signal is analyzed by means of the Zhao-Atlas-Marks distribution (ZAMD). A failure feature extractor based on segmentation of relevant harmonic ZAMD frequency regions and the Mahalanobis distance is presented. The simulation and experimental results show the effectiveness and reliability of the presented rub detection method.
Convention Paper 8237 (Purchase now)

P14-4 Contributions to the Improvement of the Response of a Pleated LoudspeakerJaime Ramis, Rita Martinez, Acustica Beyma S.L. - Moncada, Valencia, Spain; E. Segovia, Obras Públicas e Infraestructura Urbana - Spain; Jesus Carbajo, Jaime Ramis, Universidad de Alicante - Alicante, Spain
In this paper we describe some results that have led to the improvement of the response of an Air Motion Transformer loudspeaker. First, it is noteworthy that an approximate analytical solution has been found to the system of differential equations that governs the behavior of the moving assembly of this type of transducer, valid when the length of the pleat is much greater than the radius of the cylindrical part. This solution holds for any type of analysis (static, modal, and harmonic), and the modes are significantly simplified under the above hypothesis. In addition, we have analyzed the influence of the thickness and the perforation shape of the pole piece on the frequency response of the loudspeaker.
Convention Paper 8238 (Purchase now)

P14-5 Exploring the Ultra-directional Acoustic Responses of an Electret Cell Array LoudspeakerYu-Chi Chen, Wen-Ching Ko, National Taiwan University - Taipei, Taiwan; Chang-Ho Liou, National Taiwan University - Taipei, Taiwan, Industrial Technology Research Institute, Hsinchu Taiwan; Wen-Hsin Hsiao, Chih-Chiang Cheng, Wen-Jong Wu, Pei-Zen Chang, National Taiwan University - Taipei, Taiwan; Chih-Kung Lee, National Taiwan University - Taipei, Taiwan, Institute for Information Industry, Taipei, Taiwan
In recent years, novel thin-plate loudspeakers have attracted much interest. Applications in areas such as 3C peripherals, automobile audio systems, and home theater have been actively discussed. However, the acoustic directivity of a thin-plate loudspeaker depends on the frequency response, and at present thin-plate loudspeakers have poor directivity. If this limitation can be overcome, thin-plate loudspeakers can find useful applications in places such as museums, supermarkets, or exhibition areas that require channeling the sound to a particular area or location without affecting nearby areas or unintended audiences. Previous studies have confirmed that electret cell arrays make an excellent flexible flat loudspeaker, since they can create high-performance sound in the mid-to-high frequency range. An electret loudspeaker can generate ultra-directional audible sound by adjusting the array size, amplitude modulation, and layout structure.
Convention Paper 8239 (Purchase now)

P14-6 A Soundfield Microphone Using Tangential CapsulesEric Benjamin, Surround Research - Pacifica, CA, USA
The traditional soundfield microphone is a tetrahedral array of pressure-gradient microphones, the outputs of which are linearly combined in order to realize signals proportional to those of co-located microphones: one with omnidirectional sensitivity and three orthogonal microphones with figure-of-eight sensitivity. This configuration works well and has been the basis of commercial products for a number of years. Recently, an alternative array type has been disclosed [2,3] by Craven, Law, and Travis, comprised of pressure-gradient sensors arranged with their principal axes oriented tangentially with respect to the center. Additional analysis has been performed, and several prototypes were constructed and evaluated.
Convention Paper 8240 (Purchase now)

P14-7 A 2-Way Loudspeaker Array System with Pseudorandom Spacing for Music ConcertsYuki Ayabe, Saburo Nakano, Tokyo City University - Setagaya-ku, Tokyo, Japan; Kaoru Ashihara, Advanced Industrial Science and Technology - Tsukuba, Japan; Shogo Kiryu, Tokyo City University - Setagaya-ku, Tokyo, Japan
A 96-channel loudspeaker array system that allows real-time control of the sound field has been developed for live music concerts. Multiple sounds focused at different points can be generated and controlled independently using the system. The variable delay circuits, the controller of the power amplifier, and the communication circuit between the hardware and the computer are implemented in FPGAs. In order to extend the frequency range and reduce spatial aliasing, the loudspeaker array is assembled from two-way loudspeakers with pseudorandom spacing.
Convention Paper 8241 (Purchase now)

P15 - Multichannel Audio Playback


Saturday, November 6, 9:00 am — 1:00 pm (Room 220)

Chair:
Alex Voishvillo

P15-1 Why Ambisonics Does WorkEric Benjamin, Surround Research - Pacifica, CA, USA; Richard Lee, Pandit Littoral - Cooktown, Queensland, Australia; Aaron Heller, SRI International - Menlo Park, CA, USA
Several techniques exist for surround sound, including Ambisonics, VBAP, WFS, and pair-wise panning. Each system has strengths and weaknesses, but Ambisonics has long been favored for its extensibility and for being a complete solution, including both recording and playback. Yet Ambisonics has not met with great critical or commercial success despite having been available in one form or another for many years. Some observers have gone so far as to suggest that Ambisonics can’t work. The present paper provides an analysis of the performance of Ambisonics according to various psychoacoustic mechanisms in spatial hearing, such as localization and envelopment.
Convention Paper 8242 (Purchase now)

P15-2 Design of Ambisonic Decoders for Irregular Arrays of Loudspeakers by Non-Linear OptimizationAaron J. Heller, SRI International - Menlo Park, CA, USA; Eric Benjamin, Surround Research - Pacifica, CA, USA; Richard Lee, Pandit Littoral - Cooktown, Queensland, Australia
In previous papers, the present authors described techniques for design, implementation, and evaluation of Ambisonic decoders for regular loudspeaker arrays. However, to accommodate domestic listening rooms, irregular arrays are often required. Because the figures of merit used to predict decoder performance are non-linear functions of loudspeaker positions, non-linear optimization techniques are needed. In this paper we discuss the implementation of an open-source application based on the NLopt non-linear optimization software library that derives decoders for arbitrary arrays of loudspeakers, as well as providing a prediction of their performance using psychoacoustic criteria, such as Gerzon’s velocity and energy localization vectors. We describe the implementation and optimization criteria and report on listening tests comparing the decoders produced.
Convention Paper 8243 (Purchase now)
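The Gerzon velocity vector mentioned above can be illustrated with a small sketch: for a regular array, a basic first-order decode already yields |rV| = 1 in the panned direction, which is exactly the property that irregular arrays lose and the paper's non-linear optimizer tries to restore. The decoder form below is the textbook first-order "basic" decode, used here only as an illustration.

```python
import math

def decoder_gains(theta, speakers):
    """Basic first-order Ambisonic decode for a source panned to angle theta,
    for speakers at azimuths 'speakers' (radians)."""
    N = len(speakers)
    return [(1 + 2 * math.cos(theta - phi)) / N for phi in speakers]

def velocity_vector(gains, speakers):
    """Gerzon velocity localization vector rV: gain-weighted mean of the
    speaker unit vectors, normalized by the total pressure gain."""
    P = sum(gains)
    vx = sum(g * math.cos(phi) for g, phi in zip(gains, speakers)) / P
    vy = sum(g * math.sin(phi) for g, phi in zip(gains, speakers)) / P
    return math.hypot(vx, vy), math.atan2(vy, vx)

square = [math.radians(a) for a in (45, 135, 225, 315)]   # regular quad array
theta = math.radians(30)
rV, direction = velocity_vector(decoder_gains(theta, square), square)
print(round(rV, 3), round(math.degrees(direction), 1))  # -> 1.0 30.0
```

For an irregular layout the same computation generally gives |rV| < 1 and a skewed direction; the figures of merit then become non-linear functions of the speaker positions, motivating the NLopt-based approach described in the abstract.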

P15-3 Discrete Driving Functions for Wave Field Synthesis and Higher Order AmbisonicsCésar D. Salvador, Universidad de San Martín de Porres - Lima, Peru
Practical implementations of physics-based spatial sound reproduction techniques, such as Wave Field Synthesis (WFS) and Higher Order Ambisonics (HOA), require real-time filtering, scaling, and delaying operations on the audio signal to be spatialized. These operations form the so-called loudspeaker driving function. This paper describes a discretization method to obtain a rational representation in the z-plane from the continuous WFS and HOA driving functions. Visual and numerical comparisons between the continuous and discrete driving functions, and between the continuous and discrete sound pressure fields, synthesized with circular loudspeaker arrays, are shown. The percentage discretization errors, in the reproducible frequency range and in the whole listening area, are on the order of 1%. A methodology for the reconstruction of immersive soundscapes composed of nature sounds is also reported as a practical application.
Convention Paper 8244 (Purchase now)
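The general idea of mapping a continuous prototype to a rational z-plane representation can be sketched with the bilinear transform; note this is a standard textbook discretization, not necessarily the method of the paper, and the first-order low-pass prototype and sample rate are illustrative assumptions.

```python
import cmath, math

fs = 48000.0
T = 1.0 / fs

def bilinear_first_order(b1, b0, a1, a0):
    """Map H(s) = (b1*s + b0) / (a1*s + a0) to H(z) via the bilinear
    substitution s -> (2/T) * (1 - z^-1) / (1 + z^-1)."""
    k = 2.0 / T
    B0, B1 = b1 * k + b0, -b1 * k + b0
    A0, A1 = a1 * k + a0, -a1 * k + a0
    return (B0 / A0, B1 / A0), (1.0, A1 / A0)   # coefficients in powers of z^-1

# Continuous prototype: one-pole low-pass H(s) = wc / (s + wc), fc = 1 kHz
wc = 2 * math.pi * 1000
b, a = bilinear_first_order(0.0, wc, 1.0, wc)

def H_z(f):
    z1 = cmath.exp(-2j * math.pi * f / fs)       # z^-1 on the unit circle
    return (b[0] + b[1] * z1) / (a[0] + a[1] * z1)

def H_s(f):
    s = 2j * math.pi * f
    return wc / (s + wc)

# Well below Nyquist, the rational discrete filter tracks the continuous
# prototype closely (frequency warping is negligible there).
err = abs(abs(H_z(1000)) - abs(H_s(1000)))
print(err < 0.01)  # -> True
```

A rational z-plane form like this is what allows the driving functions to run as cheap recursive filters in real time instead of as long FIR approximations.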

P15-4 Reducing Artifacts of Focused Sources in Wave Field SynthesisHagen Wierstorf, Matthias Geier, Sascha Spors, Technische Universität Berlin - Berlin, Germany
Wave Field Synthesis provides the possibility to synthesize virtual sound sources located between the loudspeaker array and the listener. Such sources are known as focused sources. Previous studies have shown that the reproduction of focused sources is subject to audible artifacts. The strength of those artifacts heavily depends on the size of the loudspeaker array. This paper proposes a method to reduce artifacts in the reproduction of focused sources by using only a subset of loudspeakers of the array. A listening test verifies the method and compares it to previous results.
Convention Paper 8245 (Purchase now)
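The focusing principle behind focused sources can be sketched as simple delay-and-sum convergence: each loudspeaker is delayed so that all wavefronts arrive at the focus point simultaneously. This illustrates only the basic geometry, not the paper's artifact-reduction method; the array layout and focus position are made up.

```python
import math

c = 343.0  # speed of sound, m/s

def focus_delays(speakers, focus):
    """Per-speaker delays (seconds) so that waves emitted by all speakers
    arrive at 'focus' at the same instant; the farthest speaker gets zero delay."""
    d = [math.dist(s, focus) for s in speakers]
    dmax = max(d)
    return [(dmax - di) / c for di in d]

# 8-element linear array with 10-cm spacing, focus point 1 m in front of center
speakers = [(0.1 * i - 0.35, 0.0) for i in range(8)]
focus = (0.0, 1.0)
delays = focus_delays(speakers, focus)

# Verify: delay plus propagation time is identical for every speaker.
arrivals = [t + math.dist(s, focus) / c for t, s in zip(delays, speakers)]
print(max(arrivals) - min(arrivals) < 1e-12)  # -> True
```

Because the loudspeakers are sampled points rather than a continuous line, this convergence is exact only at the focus; elsewhere the truncated, discretized array produces the pre-echo and coloration artifacts that the paper's loudspeaker-subset selection aims to reduce.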

P15-5 On the Anti-Aliasing Loudspeaker for Sound Field Synthesis Employing Linear and Circular Distributions of Secondary SourcesJens Ahrens, Sascha Spors, Deutsche Telekom AG Laboratories - Berlin, Germany
The theory of analytical approaches for sound field synthesis, like wave field synthesis, nearfield compensated higher order Ambisonics, and the spectral division method, requires continuous distributions of secondary sources. In practice, discrete loudspeakers are employed and the synthesized sound field is corrupted by a number of artifacts that are commonly referred to as spatial aliasing. This paper presents a theoretical investigation of the properties of the loudspeakers that would be required in order to suppress such spatial aliasing artifacts. It is shown that the employment of such loudspeakers is not desirable, since the suppression of spatial aliasing comes at the cost of an essential restriction of the reproducible spatial information when practical loudspeaker spacings are assumed.
Convention Paper 8246 (Purchase now)

P15-6 The Relationship between Sound Field Reproduction and Near-Field Acoustical HolographyFilippo M. Fazi, Philip Nelson, University of Southampton - Southampton, UK
The problem of reproducing a desired sound field with an array of loudspeakers and the technique known as Near-Field Acoustical Holography share some fundamental theoretical aspects. It is shown that both problems can be formulated as an integral equation that usually defines an ill-posed problem. The example of spherical geometry and planar geometry is discussed in detail. It is shown that for both the reproduction and the acoustical holography cases, the ill-conditioning of the problem is greatly affected by the distance between the source layer and the measurement/control surface.
Convention Paper 8247 (Purchase now)

P15-7 Surround Sound with Height in Games Using Dolby Prologic IIzNicolas Tsingos, Christophe Chabanne, Charles Robinson, Dolby Laboratories - San Francisco, CA, USA; Matt McCallus, RedStorm Entertainment - Cary, NC, USA
Dolby Pro Logic IIz is a new matrix encoding/decoding system that enables the transmission of a pair of height channels within a conventional surround sound stream (e.g., 5.1). In this paper we provide guidelines for the use of Pro Logic IIz in interactive gaming applications, including recommended speaker placement, creation of elevation information, and details on how to embed the height channels within a 5- or 7-channel stream. Surround sound with height is already widely available in home-theater receivers. It offers increased immersion to the user and is a perfect fit for 2-D or stereoscopic 3-D video games.
Convention Paper 8248 (Purchase now)

P15-8 Optimal Location and Orientation for Midrange and High Frequency Loudspeakers in the Instrument Panel of an Automotive InteriorRoger Shively, Harman International - Novi, MI, USA; Jérôme Halley, Harman International - Karlsbad, Germany; François Malbos, Harman International - Chateau du Loir, France; Gabriel Ruiz, Harman International - Bridgend, Wales, UK
In a follow-up to a previous paper (AES Convention Paper 8023, May 2010), the modeling process described there is applied to loudspeakers in an automotive interior, and the optimization of midrange and high-frequency tweeter positions for best acoustic performance on the driver's (left) and passenger's (right) sides of the automotive instrument panel is reported.
Convention Paper 8249 (Purchase now)

P16 - Signal Analysis and Synthesis


Saturday, November 6, 9:00 am — 12:30 pm (Room 236)

Chair:
Agnieszka Roginska, New York University - New York, NY, USA

P16-1 Maintaining Sonic Texture with Time Scale Compression by a Factor of 100 or MoreRobert Maher, Montana State University - Bozeman, MT, USA
Time lapse photography is a common technique for presenting a slowly evolving visual scene on an artificially rapid temporal scale. Events in the scene that unfold over minutes, hours, or days in real time can be viewed in a shorter video clip. Audio time scaling by a large compression factor can be considered the aural equivalent of time lapse video, but obtaining meaningful time-compressed audio presents interesting practical and conceptual challenges in retaining the original sonic texture. This paper reviews a variety of existing techniques for compressing 24 hours of audio into just a few minutes of representative "time lapse" audio and explores several useful modifications and optimizations.
Convention Paper 8250 (Purchase now)
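As a point of reference for the techniques the paper reviews, the crudest time-lapse baseline simply keeps one short grain out of every N and crossfades adjacent grains to avoid clicks. The sketch below is only this naive baseline, not the paper's optimized method; the grain and fade lengths are arbitrary choices.

```python
import numpy as np

def time_lapse(audio, factor, grain=2048, fade=256):
    """Naive time-lapse compression: keep one `grain`-sized snippet out of
    every `factor` grains and crossfade consecutive snippets (overlap-add
    of linear fades). A baseline sketch only."""
    ramp = np.linspace(0.0, 1.0, fade)
    hop = grain * factor            # take one grain per `hop` input samples
    out = []
    for start in range(0, len(audio) - grain, hop):
        g = audio[start:start + grain].astype(float).copy()
        g[:fade] *= ramp            # fade-in
        g[-fade:] *= ramp[::-1]     # fade-out
        if out:
            out[-1][-fade:] += g[:fade]   # crossfade with previous grain
            out.append(g[fade:])
        else:
            out.append(g)
    return np.concatenate(out)

day = np.random.randn(48000 * 60)   # one minute at 48 kHz as a stand-in
clip = time_lapse(day, factor=100)  # roughly 100x shorter
```

Real texture preservation requires far more care (the subject of the paper); this merely shows the bookkeeping of grain selection and crossfading.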

P16-2 Sound Texture Analysis Based on a Dynamical Systems Model and Empirical Mode DecompositionDoug Van Nort, Jonas Braasch, Pauline Oliveros, Rensselaer Polytechnic Institute - Troy, NY, USA
This paper describes a system for separating a musical stream into sections having different textural qualities. This system translates several contemporary approaches to video texture analysis, creating a novel approach in the realm of audio and music. We first represent the signal as a set of mode functions by way of the Empirical Mode Decomposition (EMD) technique for time/frequency analysis, before expressing the dynamics of these modes as a linear dynamical system (LDS). We utilize both linear and nonlinear techniques in order to learn the system dynamics, which leads to a successful separation of the audio in time and frequency.
Convention Paper 8251 (Purchase now)

P16-3 An Improved Audio Watermarking Scheme Based on Complex Spectral Phase Evolution SpectrumJian Wang, Ron Healy, Joe Timoney, NUI Maynooth - Co. Kildare, Ireland
In this paper a new audio watermarking algorithm based on the CSPE algorithm is presented. This is an extension of a previous scheme: peaks in a spectral representation derived from the CSPE are utilized for watermarking, instead of the previously proposed frequency identification. Although this new scheme is simple, it achieves high robustness in addition to perceptual transparency and accuracy, which is a distinguishing advantage over our previous scheme.
Convention Paper 8252 (Purchase now)

P16-4 About This Dereverberation Business: A Method for Extracting Reverberation from Audio SignalsGilbert Soulodre, Camden Labs - Ottawa, Ontario, Canada
There are many situations where the reverberation found in an audio signal is not appropriate for its final use, and therefore we would like to have a means of altering the reverberation. Furthermore we would like to be able to modify this reverberation without having to directly measure the acoustic space in which it was recorded. In the present paper we describe a method for extracting the reverberant component from an audio signal. The method allows an estimate of the underlying dry signal to be derived. In addition, the reverberant component of the signal can be altered.
Convention Paper 8253 (Purchase now)

P16-5 Automatic Recording Environment Identification Using Acoustic ReverberationUsman Amin Chaudhary, Hafiz Malik, University of Michigan-Dearborn - Dearborn, MI, USA
A recording environment leaves its acoustic signature in audio recordings captured in it. For example, the persistence of sound, due to multiple reflections from various surfaces in a room, causes temporal and spectral smearing of the recorded sound. This distortion is referred to as reverberation, and its extent is characterized by the reverberation time. Because the amount of reverberation depends on the geometry and composition of a recording location, differences in the estimated acoustic signature can be used for recording environment identification. We describe a statistical framework based on maximum likelihood estimation to estimate the acoustic signature from an audio recording and use it for automatic recording environment identification. To achieve these objectives, the digital audio recording is first analyzed to estimate the acoustic signature (in the form of reverberation time and variance of the background noise), and competitive neural-network-based clustering is then applied to the estimated acoustic signature for automatic recording location identification. We have also analyzed the impact of source-sensor directivity, microphone type, and the learning rate of the clustering algorithm on the identification accuracy of the proposed method.
Convention Paper 8254 (Purchase now)
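The acoustic signature at the core of this method includes the reverberation time. For intuition, the classical way to measure it from a known impulse response is Schroeder backward integration; the paper's contribution is estimating the signature blindly from the recording itself via maximum likelihood, so the sketch below is only the textbook baseline on a synthetic impulse response.

```python
import numpy as np

def rt60_schroeder(ir, sr, db_lo=-25.0, db_hi=-5.0):
    """Estimate reverberation time from an impulse response via Schroeder
    backward integration: fit the decay slope between -5 and -25 dB and
    extrapolate to -60 dB. A classical baseline, not the paper's blind
    maximum-likelihood estimator."""
    energy = np.cumsum(ir[::-1] ** 2)[::-1]       # Schroeder integral
    edc = 10.0 * np.log10(energy / energy[0])     # energy decay curve in dB
    t = np.arange(len(ir)) / sr
    sel = (edc <= db_hi) & (edc >= db_lo)         # linear-fit region
    slope, _ = np.polyfit(t[sel], edc[sel], 1)    # dB per second (negative)
    return -60.0 / slope

sr = 8000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
# Synthetic exponentially decaying noise IR; true RT60 is about 0.69 s.
ir = np.exp(-t / 0.1) * rng.standard_normal(sr)
rt = rt60_schroeder(ir, sr)
```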

P16-6 Automatic Music Production System Employing Probabilistic Expert SystemsGang Ren, Gregory Bocko, Justin Lundberg, Dave Headlam, Mark F. Bocko, University of Rochester - Rochester, NY, USA
An automatic music production system based on expert audio engineering knowledge is proposed. An expert system based on a probabilistic graphical model is employed to embed professional audio engineering knowledge and infer automatic production decisions based on musical information extracted from audio files. The production pattern, which is represented as a probabilistic graphical model, can be “learned” from the operation data of a human audio engineer or manually constructed from domain knowledge. The authors also discuss the real-time implementation of the proposed automatic production system for live mixing application scenarios. Musical event alignment and prediction algorithms are introduced to improve the time synchronization performance of our production model. The authors conclude with performance evaluations and a brief summary.
Convention Paper 8255 (Purchase now)

P16-7 Musical Eliza: An Automatic Musical Accompaniment System Based on Expressive Feature AnalysisGang Ren, Justin Lundberg, Gregory Bocko, Dave Headlam, Mark F. Bocko, University of Rochester - Rochester, NY, USA
We propose an interactive algorithm that musically accompanies performers by matching expressive feature patterns to existing archival recordings. For each accompaniment segment, multiple realizations with different musical characteristics are performed by master performers and recorded. Musical expressive features are extracted from each accompaniment segment, and a semantic analysis is obtained using a music expressive language model. As the system user's performance is recorded, we extract and analyze its musical expressive features in real time and play back the accompaniment track from the archive database that best matches the expressive feature pattern. By creating a sense of musical correspondence, the proposed system provides an engaging interactive musical communication experience and has versatile entertainment and pedagogical applications.
Convention Paper 8256 (Purchase now)

P17 - Real-Time Audio Processing


Saturday, November 6, 2:30 pm — 6:30 pm (Room 220)

Chair:
Jayant Datta

P17-1 A Time Distributed FFT for Efficient Low Latency ConvolutionJeffrey Hurchalla, Garritan Corp. - Orcas, WA, USA
To enable efficient low latency convolution, a Fast Fourier Transform (FFT) is presented that balances processor and memory load across incoming blocks of input. The proposed FFT transforms a large block of input data in steps spread across the arrival of smaller blocks of input and can be used to transform large partitions of an impulse response and input data for efficiency, while facilitating convolution at very low latency. Its primary advantage over a standard FFT as used for a non-uniform partition convolution method is that it can be performed in the same processing thread as the rest of the convolution, thereby avoiding problems associated with the combination of multithreading and near real-time calculations on general purpose computing architectures.
Convention Paper 8257 (Purchase now)
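For context, a conventional uniform frequency-delay-line (FDL) partitioned convolution, the kind of scheme the paper's time-distributed FFT refines for large partitions, can be sketched as follows. Block size and test signals here are arbitrary.

```python
import numpy as np

def uniform_partitioned_convolution(x, h, block=64):
    """Uniform FDL partitioned convolution: split h into block-sized
    partitions, convolve each in the frequency domain against a delay line
    of input-block spectra, and overlap-add the results."""
    nfft = 2 * block
    parts = [h[i:i + block] for i in range(0, len(h), block)]
    H = [np.fft.rfft(p, nfft) for p in parts]
    nblocks = -(-len(x) // block)                   # ceil division
    x = np.pad(x, (0, nblocks * block - len(x)))
    fdl = [np.zeros(nfft // 2 + 1, complex) for _ in H]  # frequency delay line
    y = np.zeros(nblocks * block + block)
    for b in range(nblocks):
        X = np.fft.rfft(x[b * block:(b + 1) * block], nfft)
        fdl = [X] + fdl[:-1]                        # shift in newest spectrum
        Y = sum(Xd * Hd for Xd, Hd in zip(fdl, H))  # multiply-accumulate
        y[b * block:b * block + nfft] += np.fft.irfft(Y, nfft)  # overlap-add
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
h = rng.standard_normal(256)
y = uniform_partitioned_convolution(x, h, block=64)
```

The paper's point is that the large FFTs of a non-uniform version of this scheme can themselves be spread across incoming blocks, keeping everything in one processing thread.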

P17-2 An Infinite Impulse Response (IIR) Hilbert Transformer Filter Design Guide for AudioDaniel Harris, Sennheiser Research Laboratory - Palo Alto, CA, USA; Edgar Berdahl, Stanford University - Stanford, CA, USA
Hilbert Transformers have found many applications in the signal processing community, from single-sideband communication systems to audio effects. IIR implementations are attractive for computationally sensitive systems due to their lower number of coefficients. However, as in any advanced filter design problem, their tuning and implementation present a number of design challenges and tradeoffs. Furthermore, while literature addressing these problems exists, designers must draw from several sources to find answers. In this paper we present a complete start-to-finish explanation of how to implement an efficient infinite impulse response (IIR) Hilbert transformer filter. We start from a half-band filter design and show how the poles move as the half-band filter is transformed into summed all-pass filters and then from there into a Hilbert transformer filter. The design technique is based largely on pole locations and creates a filter in the cascaded 1st order allpass form, which is numerically robust.
Convention Paper 8258 (Purchase now)
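The summed-allpass structure the paper works toward can be illustrated with two parallel cascades of first-order allpass sections. The coefficients below are placeholders for demonstration only; choosing them so that the two branches stay 90 degrees apart is exactly the design problem the paper solves.

```python
import numpy as np

def allpass1(x, a):
    """First-order allpass H(z) = (a + z^-1) / (1 + a z^-1), real a, |a| < 1."""
    y = np.zeros(len(x))
    x1 = y1 = 0.0
    for n, xn in enumerate(x):
        y[n] = a * xn + x1 - a * y1
        x1, y1 = xn, y[n]
    return y

def hilbert_pair(x, branch_a, branch_b):
    """Run a signal through two parallel cascades of first-order allpass
    sections. With properly designed coefficients the branch outputs are
    approximately 90 degrees apart at all frequencies; here the structure
    is shown with untuned placeholder coefficients."""
    ya, yb = x.astype(float).copy(), x.astype(float).copy()
    for a in branch_a:
        ya = allpass1(ya, a)
    for a in branch_b:
        yb = allpass1(yb, a)
    return ya, yb

imp = np.zeros(4096)
imp[0] = 1.0
ya, yb = hilbert_pair(imp, [0.6, -0.2], [0.3, -0.7])  # placeholder coefficients
```

Whatever the coefficients, each branch remains exactly allpass, which is why the cascaded first-order form is numerically robust.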

P17-3 Automatic Parallelism from Dataflow GraphsRamy Sadek, University of Southern California - Playa Vista, CA, USA
This paper presents a novel algorithm to automate high-level parallelization from graph-based data structures representing data flow. Algorithm correctness is shown via a formal proof by construction. This automatic optimization yields large performance improvements for multi-core machines running host-based applications. Results of these advances are shown through their incorporation into the audio processing engine Application Rendering Immersive Audio (ARIA) presented at AES 117. Although the ARIA system is the target framework, the contributions presented in this paper are generic and therefore applicable in a variety of software such as Pure Data and Max/MSP, game audio engines, non-linear editors and related systems. Additionally, the parallel execution paths extracted are shown to give effectively optimal cache performance, yielding significant speedup for such host-based applications.
Convention Paper 8259 (Purchase now)

P17-4 The Design of Low-Complexity Wavelet-Based Audio Filter Banks Suitable for Embedded PlatformsNeil Smyth, CSR - Cambridge Silicon Radio - Belfast, N. Ireland, UK
Many audio applications require the use of low complexity, low power, and low latency filter banks (e.g., real-time audio streaming to mobile devices). The underlying mathematics of wavelet transforms provides these attractive characteristics for embedded platforms. However, commonly used wavelets (Haar, Daubechies) possess coefficients containing irrational numbers that lead to distortion in fixed-point implementations. This paper discusses the development and provides practical performance comparisons of filter banks using wavelet transforms as an alternative to more commonly used sub-banding filter banks in PCM audio coding algorithms. The advantages and disadvantages of wavelets used in such audio compression applications are also discussed.
Convention Paper 8260 (Purchase now)

P17-5 Application of Optimized Inverse Filtering to Improve Time Response and Phase Linearization in Multiway Loudspeaker SystemsMario Di Cola, Audio Labs Systems - Casoli (CH), Italy; Daniele Ponteggia, Studio Ing. Ponteggia - Terni (TR), Italy
Digital processing has been widely demonstrated to be very useful in improving loudspeaker system performance. Particularly interesting is inverse filtering applied to loudspeaker systems, because it can improve performance and sound quality in terms of transient response and reduced overall phase shift. Inverse filtering can be realized with FIR filtering techniques, with a specific sequence of taps that needs to be synthesized “ad hoc” for a specific transducer and/or a specific loudspeaker system configuration. Most studies on this matter so far, with very few exceptions, have focused on the “DSP processing” point of view, being generally concerned with the mathematics involved and the related numerical problems. This paper discusses the philosophy that should drive the application of this technique to a loudspeaker system in order to really improve it; consequently, it focuses on the nature of the loudspeaker system and on understanding what can really be processed with a one-dimensional “action.” We discuss what can be synthesized as a “2-port” model of the loudspeaker and what can effectively be obtained by processing the input signal of a loudspeaker system.
Convention Paper 8261 (Purchase now)

P17-6 Filter Design for a Double Dipole Flat Panel Loudspeaker System Using Time Domain Toeplitz EquationsTobias Corbach, Martin Holters, Udo Zölzer, Helmut-Schmidt-University/University of the Federal Armed Forces - Hamburg, Germany
Today flat panel loudspeakers are used in multiple applications. Due to their high directivity and their good structural integration properties, flat panel loudspeakers are commonly used for directed acoustic information. A previously proposed system of two parallel flat panel dipole loudspeakers with adapted input filtering ensures high suppression of the backward radiation with only minor influence on the forward radiation. This paper presents a new approach to the filter computation for this application. It makes use of time domain convolution, realized by Toeplitz matrices, and builds the desired filter impulse responses by a least squares approach. The different filter computations as well as numerical and measured results are shown.
Convention Paper 8262 (Purchase now)
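The time-domain least-squares formulation described in the abstract can be sketched generically: build a Toeplitz convolution matrix from a measured impulse response and solve for the filter taps that best reach a target response. The plant and target below are toy values, not the flat-panel dipole system of the paper.

```python
import numpy as np

def design_fir_lstsq(plant_ir, target_ir, ntaps):
    """Least-squares FIR design via a time-domain Toeplitz convolution
    matrix: find h minimizing ||C h - t||, where C convolves by plant_ir.
    A generic sketch of the approach, not the authors' exact formulation."""
    m = len(plant_ir) + ntaps - 1
    C = np.zeros((m, ntaps))
    for j in range(ntaps):                   # column j = plant_ir delayed by j
        C[j:j + len(plant_ir), j] = plant_ir
    t = np.zeros(m)
    t[:len(target_ir)] = target_ir           # zero-pad target to output length
    h, *_ = np.linalg.lstsq(C, t, rcond=None)
    return h

plant = np.array([1.0, 0.5, 0.25])           # toy minimum-phase response
target = np.zeros(6)
target[5] = 1.0                              # unit pulse delayed by 5 samples
h = design_fir_lstsq(plant, target, ntaps=32)
residual = np.linalg.norm(np.convolve(plant, h) - np.pad(target, (0, 34 - len(target))))
```

Because the toy plant is minimum phase, a short delayed inverse exists and the least-squares residual is essentially zero.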

P17-7 A Low Complexity Approach for Loudness CompensationPradeep D. Prasad, Ittiam Systems Pvt. Ltd. - Bangalore, Karnataka, India
The essence of loudness compensation is to maintain the perceived spectral balance of audio content irrespective of the playback volume level. The need for this compensation arises due to the inherent non-linearity in human aural perception manifesting as change in spectral balance. The compensation varies with critical band, original, and playback specific loudness. This results in a computationally intensive approach of estimating original and target specific loudness and calculating required compensation for every frame. A low complexity algorithm is proposed to enable resource constrained devices to efficiently perform loudness compensation. A closed form expression is derived for the proposed compensation followed by an analysis of the quality versus complexity tradeoff.
Convention Paper 8263 (Purchase now)
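To make the idea concrete, a closed-form compensation can be written as a frequency-dependent fraction of the volume cut that is restored in low bands, mimicking the level dependence of equal-loudness contours. The weighting function and constants below are invented for illustration and are not the paper's derived expression.

```python
def compensation_gain_db(volume_db, f_hz):
    """Illustrative closed-form loudness compensation: restore part of the
    volume cut, weighted toward low frequencies where perceived loudness
    falls off fastest at low playback levels. The corner frequency (200 Hz)
    and restore fraction (0.3) are made-up constants, not the paper's."""
    w = 1.0 / (1.0 + (f_hz / 200.0) ** 2)   # ~1 in deep bass, ~0 above 1 kHz
    return -volume_db * 0.3 * w             # partial restore of the cut, in dB

# Example: playback attenuated by 30 dB gets a bass boost of a few dB.
g50 = compensation_gain_db(-30.0, 50.0)
g1k = compensation_gain_db(-30.0, 1000.0)
g8k = compensation_gain_db(-30.0, 8000.0)
```

Such a closed form avoids recomputing specific loudness per frame, which is the complexity saving the abstract describes.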

P17-8 MPEG Spatial Audio Object Coding—The ISO/MPEG Standard for Efficient Coding of Interactive Audio ScenesOliver Hellmuth, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Heiko Purnhagen, Dolby Sweden AB - Stockholm, Sweden; Jeroen Koppens, Philips Applied Technologies - Eindhoven, The Netherlands; Jürgen Herre, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jonas Engdegård, Dolby Sweden AB - Stockholm, Sweden; Johannes Hilpert, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Lars Villemoes, Dolby Sweden AB - Stockholm, Sweden; Leonid Terentiev, Cornelia Falch, Andreas Hölzer, María Luis Valero, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Barbara Resch, Dolby Sweden AB - Stockholm, Sweden; Harald Mundt, Dolby Germany GmbH - Nürnberg, Germany; Hyen-O Oh, Digital TV Lab., LG Electronics - Seoul, Korea
In 2007, the ISO/MPEG Audio standardization group started a new work item on efficient coding of sound scenes comprising several audio objects by parametric coding techniques. Finalized in the summer of 2010, the resulting MPEG “Spatial Audio Object Coding” (SAOC) specification allows the representation of such scenes at bit rates commonly used for coding of mono or stereo sound. At the decoder side, each object can be interactively rendered, supporting applications like user-controlled music remixing and spatial teleconferencing. This paper summarizes the results of the standardization process, provides an overview of MPEG SAOC technology, and illustrates its performance by the results of the recent verification tests. The test includes operation modes for several typical application scenarios that take advantage of object-based processing.
Convention Paper 8264 (Purchase now)

P18 - Binaural Audio


Saturday, November 6, 2:30 pm — 6:30 pm (Room 236)

Chair:
Durand R. Begault

P18-1 Modification of HRTF Filters to Reduce Timbral Effects in Binaural Synthesis, Part 2: Individual HRTFsJuha Merimaa, Sennheiser Research Laboratory - Palo Alto, CA, USA
In the first part of this study [1], a method for designing modified head-related transfer function (HRTF) filters with reduced timbral effects was proposed. Spectral localization cues were effectively scaled down while preserving the interaural time and level differences. For non-individualized HRTFs, the modifications were found to produce no statistically significant changes in localization. This paper continues the investigation using individual HRTFs. It is shown that in this case the reduction in timbral effects comes at a slight listener-dependent cost in localization performance. The filter design thus enables trading off more neutral timbre against more accurate localization.
Convention Paper 8265 (Purchase now)

P18-2 On the Improvement of Auditory Accuracy with Non-Individualized HRTF-Based SoundsCatarina Mendonça, Jorge Santos, University of Minho - Minho, Portugal; Guilherme Campos, Paulo Dias, José Vieira, University of Aveiro - Aveiro, Portugal; João Ferreira, University of Minho - Minho, Portugal
Auralization is a powerful tool to increase the realism and sense of immersion in Virtual Reality environments. The Head Related Transfer Function (HRTF) filters commonly used for auralization are non-individualized, as obtaining individualized HRTFs poses very serious practical difficulties. It is therefore extremely important to understand to what extent this hinders sound perception. In this paper we address this issue from a learning perspective. In a set of experiments, we observed that mere exposure to virtual sounds processed with generic HRTF did not improve the subjects’ performance in sound source localization, but short training periods involving active learning and feedback led to significantly better results. We propose that using auralization with non-individualized HRTF should always be preceded by a learning period.
Convention Paper 8266 (Purchase now)

P18-3 Processing and Improving Head-Related Impulse Response Database for AuralizationBen Supper, Focusrite Audio Engineering Ltd. - High Wycombe, UK
To convert a database of anechoic head-related impulse responses [HRIRs] into a set of data that is suitable for auralization involves many stages of processing. The output data set must be precisely corrected to account for some circumstances of the recording. It must then be equalized to remove coloration. Finally, the database must be interpolated to a finer resolution. This paper explains these stages of correction, equalization, and spatial interpolation for a frequently-used data set obtained from a KEMAR dummy head. The result is a useful database of HRIRs that can be applied dynamically to audio signals for research and entertainment purposes.
Convention Paper 8267 (Purchase now)

P18-4 Stimulus-Dependent HRTF PreferenceAgnieszka Roginska, New York University - New York, NY, USA; Gregory H. Wakefield, University of Michigan - Ann Arbor, MI, USA; Thomas S. Santoro, Naval Submarine Medical Research Lab - Groton, CT, USA
Measurement of individual Head Related Transfer Functions (HRTFs) can be inconvenient, expensive, and time consuming. User-selected HRTFs can alleviate the complexity of individually measured HRTFs and make better quality 3-D audio available to more listeners. This paper presents the results of a study designed to investigate the use of user-selected HRTFs augmented with customized interaural cues. In the study presented, subjects were asked to select HRTFs that resulted in an accurate percept based on three specific criteria: externalization quality, elevation, and front/back discrimination. Subjective tests were conducted using three different stimuli. Results of the experiment are presented.
Convention Paper 8268 (Purchase now)

P18-5 Comparison between Spherical Headmodels and HRTFs in Upmixing for Headphone-Based Virtual Surround and Stereo Expansion—Part ISunil Bharitkar, Chris Kyriakakis, Audyssey Labs., Inc. - Los Angeles, CA, USA, University of Southern California, Los Angeles, CA, USA
In this paper, the first of multiple parts, we compare the performance of previously published head models and head-related transfer functions (HRTFs) using different upmixing techniques for headphone virtual surround. We consider a spherical head, with and without the pinna or the torso model, whereas for the HRTFs we incorporate the CIPIC, Nagoya, and MIT HRTF sets in the up-mixing. The up-mixing techniques include the Moorer reverberator, a modified Moorer reverberator, and modeling the direct sound, the first several discrete reflections (with adjustable delay and amplitude), and the diffuse field reflections with a tunable frequency-dependent decorrelator. Furthermore, since the measured HRTFs can introduce audible coloration, we investigate whether there is a trade-off between localization and timbre by incorporating complex-domain smoothing of the HRTF time responses. To evaluate the localization and timbre performance of the models we use movie and music content (viz., stereo, ITU downmix, and a commercial down-mix method) as well as Gaussian tone noise bursts of critical bandwidth.
Convention Paper 8269 (Purchase now)

P18-6 HRTF Measurements with Recorded Reference SignalMarko Durkovic, Florian Sagstetter, Klaus Diepold, Technische Universität München - München, Germany
Head-Related Transfer Functions (HRTFs) are used for adding spatial information in 3-D audio synthesis or for binaural robotic sound localization. Both tasks work best when using a custom HRTF database that fits the physiology of each person or robot. Usually, measuring HRTFs is a time consuming and complex procedure that is performed with expensive equipment in an anechoic chamber. In this paper we present a method that enables HRTF measurement in everyday environments by passively recording the surroundings without the need to actively emit special excitation signals. Experiments show that our method captures the HRTF's spatial cues and enables accurate sound localization.
Convention Paper 8270 (Purchase now)

P18-7 Angular Resolution Requirements for Binaural Room ScanningTodd Welti, Harman International - Northridge, CA, USA; Xinting Zhang, State University of New York at Binghamton - Binghamton, NY, USA
Binaural Room Scanning is a method of capturing and reproducing a binaural representation of a room or car, using a dummy head incorporating binaural microphones, and individual measurements made with the dummy head positioned at a number of different head angles. The measurement process can be time-consuming. It is therefore important to know how high the angular resolution needs to be. An experiment was performed to see if the angular resolution could be reduced from the current 1 degree resolution to 15 degree resolution, without causing an audible difference. Using a 3 alternative forced choice method, trained listeners compared 1 degree and 15 degree angular resolution and could not reliably detect the difference.
Convention Paper 8271 (Purchase now)
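Whether listeners can "reliably detect" a difference in a 3-alternative forced choice task is typically judged against the 1/3 guessing rate with a one-sided binomial test, which can be computed directly. The trial counts below are illustrative, not the experiment's actual data.

```python
from math import comb

def p_value_3afc(correct, trials):
    """One-sided exact binomial test for a 3-AFC task: the probability of
    scoring `correct` or more out of `trials` by pure guessing (chance
    rate 1/3). Small p-values indicate detection above chance."""
    p = 1.0 / 3.0
    return sum(comb(trials, k) * p ** k * (1 - p) ** (trials - k)
               for k in range(correct, trials + 1))

# Hypothetical example: 12 correct out of 30 is close to the chance rate,
# so the p-value stays large and detection cannot be claimed.
p_chance = p_value_3afc(12, 30)
```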

P18-8 Binaural Reproduction of 22.2 Multichannel Sound over LoudspeakersKentaro Matsui, Akio Ando, NHK Science and Technology Research Laboratories - Tokyo, Japan
NHK has proposed the 22.2 multichannel sound system, which consists of 22 loudspeakers plus 2 for LFE and produces three-dimensional spatial sound, as the format for future TV broadcasting. To allow it to be reproduced in homes, we have investigated various reproduction methods that use fewer loudspeakers. We introduce a design for binaural rendering of 22.2 multichannel sound with three frontal loudspeakers as a minimum configuration for homes. It can stably process the system inverse filters by dividing them into all-pass and minimum-phase components and successfully compensates the sound quality with a peak suppression method.
Convention Paper 8272 (Purchase now)

P19 - Spatial Sound Processing—1


Saturday, November 6, 2:30 pm — 4:00 pm (Room 226)

P19-1 Estimation of the Probability Density Function of the Interaural Level Differences for Binaural Speech SeparationDavid Ayllon, Roberto Gil-Pita, Manuel Rosa-Zurera, University of Alcalá - Alcalá de Henares (Madrid), Spain
Source separation techniques are applied to audio signals to separate several sources from one mixture. One important challenge of speech processing is noise suppression, and several methods have been proposed. However, in some applications, like hearing aids, we are not interested just in removing noise from speech but in amplifying speech while attenuating noise. A novel method based on the estimation of the probability density function of the interaural level differences, in conjunction with time-frequency decomposition and binary masking, is applied to speech-noise mixtures in order to obtain both signals separately. Results show that both signals are clearly separated, and the method entails low computational cost, so it could be implemented in a real-time environment such as a hearing aid device.
Convention Paper 8273 (Purchase now)
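The core masking step can be sketched on magnitude spectrograms: compute the per-bin ILD and keep the bins assigned to the target side. The fixed threshold and the synthetic two-source mixture below stand in for the PDF-based decision the paper estimates.

```python
import numpy as np

def ild_binary_mask(L, R, thresh_db=0.0):
    """Time-frequency binary mask from interaural level differences:
    given left/right magnitude spectrograms, mark bins whose ILD exceeds
    the threshold as belonging to the left-lateralized source. The paper
    estimates the ILD probability density to place this threshold; here
    it is simply fixed at 0 dB."""
    eps = 1e-12
    ild = 20 * np.log10(np.abs(L) + eps) - 20 * np.log10(np.abs(R) + eps)
    return (ild > thresh_db).astype(float)

# Toy mixture: target lateralized left, noise lateralized right.
rng = np.random.default_rng(0)
spec_target = rng.rayleigh(1.0, (64, 32))
spec_noise = rng.rayleigh(1.0, (64, 32))
L = 2.0 * spec_target + 0.5 * spec_noise
R = 0.5 * spec_target + 2.0 * spec_noise
mask = ild_binary_mask(L, R)
target_kept = (mask * spec_target).sum() / spec_target.sum()
```

Most of the target's energy survives the mask because its strongest bins dominate the left channel, which is the principle behind ILD-based binary masking.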

P19-2 The Learning Effect of HRTF-Based 3-D Sound Perception with a Horizontally Arranged 8-Loudspeaker SystemAkira Saji, Keita Tanno, Li Huakang, Tetsuya Watanabe, Jie Huang, The University of Aizu - Aizuwakamatsu City, Fukushima, Japan
This paper examines the learning effect on the localization of HRTF-based 3-D sound using an 8-channel loudspeaker system that creates virtual sound images. The system can render sounds with elevation using 8 loudspeakers arranged on the horizontal plane and HRTF convolution, without high- or low-mounted loudspeakers. The position of the sound image the system creates is initially difficult to perceive because such HRTF-based sounds are unfamiliar. However, after repeated learning sessions, almost all listeners could perceive the position of the sound images better. This paper demonstrates this learning effect for an HRTF-based 3-D sound system.
Convention Paper 8274 (Purchase now)

P19-3 Spatial Audio Attention Model Based Surveillance Event DetectionBo Hang, Ruimin Hu, Xiaochen Wang, Weiping Tu, Wuhan University - Wuhan, China
In this paper we propose a bottom-up audio attention model based on spatial audio cues and subband energy change for unsupervised event detection in stereo audio surveillance. First, the spatial audio parameter Interaural Level Difference (ILD) is extracted to calculate and represent attention events caused by rapidly moving sound sources. Then the subband energy change is computed to capture salient changes in the energy distribution in the frequency domain. Finally, an environment-adaptive normalization is used to assess the normalized attention level. Experimental results demonstrate that the proposed audio attention model is effective for audio surveillance event detection.
Convention Paper 8275 (Purchase now)

P19-4 Investigating Perceptual Effects Associated with Vertically Extended Sound Fields Using Virtual Ceiling SpeakerYusuke Ono, Sungyoung Kim, Masahiro Ikeda, Yamaha Corporation - Iwata, Shizuoka, Japan
Virtual Ceiling Speaker (VCS) is a signal processing method that creates an elevated auditory image using an optimized cross-talk compensation for a 5-channel reproduction system. In order to understand the latent perceptual effects caused by virtually elevated sound images, we experimentally compared the perceptual differences between physically and virtually elevated sound sources in terms of ASW, LEV, Powerfulness, and Clarity. The results showed that listeners perceived higher LEV or Clarity when either physically or virtually elevated early reflections were added to the 5-channel content. This may indicate that attributes related to spatial dimensions are relatively well conveyed by signals virtually elevated using VCS.
Convention Paper 8276 (Purchase now)

P19-5 Enhancing 3-D Audio Using Blind Bandwidth ExtensionTim Habigt, Marko Durkovic, Martin Rothbucher, Klaus Diepold, Technische Universität München - München, Germany
Blind bandwidth extension techniques are used to recreate the high frequency bands of a narrowband audio signal. These methods allow increasing the perceived quality of signals that are transmitted via a narrow frequency band, as in telephone or radio communication systems. We evaluate the possibility of using blind bandwidth extension methods in 3-D audio applications, where high-frequency components are necessary to create an impression of elevated sound sources.
Convention Paper 8277 (Purchase now)

P20 - Spatial Sound Processing—2


Saturday, November 6, 4:30 pm — 6:00 pm (Room 226)

P20-1 Inherent Doppler Properties of Spatial AudioMartin Morrell, Joshua D. Reiss, Queen Mary University of London - London, UK
The Doppler shift is a naturally occurring phenomenon that shifts the pitch of a sound when the emitting object's distance to the listener is not constant. These pitch deviations, alongside amplitude changes, help humans localize a source's position, velocity, and movement direction. In this paper we investigate spatial audio reproduction methods to determine whether Doppler shift is present for a moving sound source. We expand spatialization techniques to include time-variance in order to produce the Doppler shift. Recordings of several different loudspeaker layouts demonstrate the presence of Doppler with and without time-variance, comparing this to pre-calculated theoretical values.
Convention Paper 8278 (Purchase now)
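The pre-calculated theoretical values referred to above follow from the classical Doppler formula for a static listener. In a time-variant spatializer the shift emerges implicitly from a time-varying propagation delay, but the expected frequency can be computed directly:

```python
def doppler_frequency(f_source, v_radial, c=343.0):
    """Doppler-shifted frequency heard by a static listener for a source
    with radial velocity v_radial in m/s (positive = approaching), given
    the speed of sound c: f' = f * c / (c - v_radial)."""
    return f_source * c / (c - v_radial)

# A 1 kHz source approaching at 34.3 m/s (10% of c) is heard about 11% higher.
f_approach = doppler_frequency(1000.0, 34.3)
f_recede = doppler_frequency(1000.0, -34.3)
```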

P20-2 A Binaural Model with Head Motion that Resolves Front-Back Confusions for Analysis of Room Impulse ResponsesJohn T. Strong, Jonas Braasch, Ning Xiang, Rensselaer Polytechnic Institute - Troy, NY, USA
Front-back confusions occur in both psychoacoustic localization tests and interaural cross-correlation-based binaural models. Head motion has been hypothesized and tested successfully as a method of resolving such confusions. The ICC-based model set forth here simulates head motion by filtering test signals with a trajectory of HRTFs and shifting an azimuth remapping function to follow the same trajectory. By averaging estimated azimuths over time, the correct source location prevails while the front-back reversed location washes out. This model algorithm is then extended to room impulse response analysis. The processing is performed on simulated binaural impulse responses at the same position but with different head angles. The averaging allows the model to discriminate reflections coming from the front from those arriving from the rear.
Convention Paper 8279 (Purchase now)

P20-3 A Set of Microphone Array Beamformers Designed to Implement a Constant-Amplitude Panning LawYoomi Hur, Stanford University - Stanford, CA, USA, Yonsei University, Seoul, Korea; Jonathan S. Abel, Stanford University - Stanford, CA, USA; Young-cheol Park, Yonsei University - Wonju, Korea; Dae-Hee Youn, Yonsei University - Seoul, Korea
This paper describes a technique for designing a collection of beamformers, a "beamformer bank," that approximately produces a constant-amplitude panning law. Useful in multichannel recording scenarios, the bank ensures that a point source appears with energy above a specified sidelobe level in at most two adjacent beams, and that the beam sum approximates the source signal. A non-parametric design method is described in which a specified sidelobe level determines beam width as a function of arrival direction and frequency, leading directly to the number and placement of beams at every frequency. Simulation results using several microphone array configurations are reported to verify the performance of the proposed technique.
Convention Paper 8280 (Purchase now)
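The constant-amplitude panning target the beam designs approximate can be sketched independently of any beamformer: for each source direction, at most two adjacent beams receive nonzero gain and the gains sum to one. The function below is a sketch of that panning-law target only (with hypothetical names and linear interpolation between beam centers), not the paper's non-parametric beam design.

```python
import numpy as np

def panning_gains(theta_deg, beam_centers_deg):
    """Constant-amplitude panning gains across a bank of beams.

    beam_centers_deg must be sorted ascending. For any source
    direction, at most two adjacent beams get nonzero gain and the
    gains sum to exactly 1, so summing the beam outputs recovers the
    source amplitude.
    """
    centers = np.asarray(beam_centers_deg, dtype=float)
    gains = np.zeros(len(centers))
    i = np.searchsorted(centers, theta_deg)
    if i == 0:                       # source below the first beam
        gains[0] = 1.0
    elif i == len(centers):         # source above the last beam
        gains[-1] = 1.0
    else:                           # linear crossfade between neighbors
        frac = (theta_deg - centers[i - 1]) / (centers[i] - centers[i - 1])
        gains[i - 1] = 1.0 - frac
        gains[i] = frac
    return gains

g = panning_gains(25.0, [0.0, 30.0, 60.0, 90.0])
print(g, g.sum())  # two adjacent nonzero gains that sum to 1
```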

P20-4 A 3-D Sound Creation System Using Horizontally Arranged LoudspeakersKeita Tanno, Akira Saji, Huakang Li, Jie Huang, The University of Aizu - Fukushima, Japan
In this research we have studied a 3-D sound creation system using 5- and 8-channel loudspeaker arrangements. This system has a great advantage in that it does not require users to purchase a new audio system or to reposition loudspeakers. The only change required of content creators, such as television stations and video game makers, is to adopt the proposed method for creating 3-D sound sources. Head-related transfer functions are used to create the signals of the left and right loudspeaker groups. An extended amplitude panning method is proposed to decide the amplitude ratios between and within loudspeaker groups. Listening experiments show that subjects could also perceive the elevation of sound images created by the system.
Convention Paper 8281 (Purchase now)

P20-5 Locating Sounds Around the ScreenDavid Black, Hochschule Bremen, University of Applied Sciences - Bremen, Germany; Jörn Loviscach, Hochschule Bielefeld, University of Applied Sciences - Bielefeld, Germany
Today's large computer screens can display enough information to overload the user's visual perceptual channel. As a remedy, we investigated providing additional acoustic cues through surround-sound loudspeakers mounted around the screen. In this paper we present the results of user evaluations of interaction with screen elements using this surround-screen setup. The tests show that surround-screen sound can improve response times in a simple task, and that users can localize the approximate origin of a sound played back with this technique.
Convention Paper 8282 (Purchase now)

P21 - Low-Bit-Rate Audio Coding


Sunday, November 7, 9:00 am — 12:30 pm (Room 220)

Chair:
Marina Bosi

P21-1 Combination of Different Perceptual Models with Different Audio Transform Coding Schemes—Implementation and EvaluationArmin Taghipour, Nicole Knölke, Bernd Edler, Jörn Ostermann, Leibniz Universität Hannover - Hannover, Germany
In this paper four combinations of perceptual models and transform coding systems are implemented and compared. The first of the two perceptual models is based on a DFT with uniform frequency resolution. The second model uses IIR filters designed in accordance with the temporal/spectral resolution of the auditory system. Both transform coding systems use a uniform spectral decomposition (MDCT). While in the first system the quantizers are directly controlled by the perceptual model, the second system uses a pre- and post-filter with frequency warping to shape the quantization noise with a temporal/spectral resolution better adapted to the auditory system. Implementation details are given and results of subjective tests are presented.
Convention Paper 8283 (Purchase now)

P21-2 Using Noise Substitution for Backwards-Compatible Audio Codec ImprovementColin Raffel, Experimentalists Anonymous - Stanford, CA, USA
A method for representing error in perceptual audio coding as filtered noise is presented. Various techniques are compared for analyzing and re-synthesizing the noise representation. A focus is placed on improving the perceived audio quality with minimal data overhead. In particular, it is demonstrated that per-critical-band energy levels are sufficient to provide an increase in quality. Methods for including the coded error data in an audio file in a backwards-compatible manner are also discussed. The MP3 codec is treated as a case study, and an implementation of this method is presented.
Convention Paper 8284 (Purchase now)
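The core claim, that per-critical-band energy levels suffice to describe the coding error as filtered noise, can be sketched by shaping white noise to prescribed per-band energies. The FFT-based band scaling and the band edges below are an illustrative stand-in for the critical-band filtering the paper describes; the function name and parameters are hypothetical.

```python
import numpy as np

def shape_noise_to_band_energies(band_edges_hz, band_energies, n, sr):
    """Synthesize noise whose per-band spectral energies match targets.

    White noise is generated, its spectrum is rescaled band by band so
    each band carries the transmitted energy level, and the result is
    returned to the time domain as the substituted "error" signal.
    """
    noise = np.random.randn(n)
    spec = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n, 1.0 / sr)
    for (lo, hi), target in zip(band_edges_hz, band_energies):
        idx = (freqs >= lo) & (freqs < hi)
        current = np.sum(np.abs(spec[idx]) ** 2)
        if current > 0:
            # Real-valued scaling preserves the noise phase in the band
            spec[idx] *= np.sqrt(target / current)
    return np.fft.irfft(spec, n)
```

In the backwards-compatible scheme, only the small set of band energies rides along with the MP3 stream; the decoder regenerates noise like this and adds it to the decoded signal.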

P21-3 An Introduction to AVS Lossless Audio CodingHaiyan Shu, Haibin Huang, Ti-Eu Chan, Rongshan Yu, Susanto Rahardja, Institute for Infocomm Research, Agency for Science, Technology & Research - Singapore
Recently, the audio video coding standard workgroup of China (AVS) issued a call for proposal for audio lossless coding. Several proposals were received, in which the proposal from the Institute for Infocomm Research was selected as Reference Model (RM). The RM is based on time-domain linear prediction and residual entropy coding. It introduces a novel residual pre-processing method for random access data frames and a memory-efficient arithmetic coder with dynamic symbol probability generation. The performance of RM is found to be comparable to those of MPEG-4 ALS and SLS. The AVS lossless coding is expected to be finalized at the end of 2010. It will become the latest extension of the AVS-P3 audio coding standard.
Convention Paper 8285 (Purchase now)

P21-4 Audio Re-Synthesis Based on Waveform Lookup TablesSebastian Heise, Michael Hlatky, Accessive Tools GmbH - Bremen, Germany; Jörn Loviscach, Hochschule Bielefeld, University of Applied Sciences - Bielefeld, Germany
Transmitting speech signals at optimum quality over a weak narrowband network requires audio codecs that are not only robust to packet loss and low in latency, but also offer a very low bit rate while maintaining the original sound of the coded signal. Advanced speech codecs for real-time communication based on code-excited linear prediction reach bit rates as low as 2 kbit/s. We propose a new coding approach that promises even lower bit rates through a synthesis approach based not on the source-filter model, but merely on a lookup table of audio waveform snippets and their corresponding Mel-Frequency Cepstral Coefficients (MFCC). The encoder performs a nearest-neighbor search for the MFCC features of each incoming audio frame against the lookup table. This process is greatly accelerated by building a multi-dimensional search tree of the MFCC features. In a speech coding application, for each audio frame, only the index of the nearest neighbor in the lookup table would need to be transmitted. The decoder synthesizes the audio signal from the waveform snippets corresponding to the transmitted indices.
Convention Paper 8286 (Purchase now)
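The encode/decode pipeline described above can be sketched as a tiny lookup-table codec. Brute-force nearest-neighbor search stands in for the paper's multi-dimensional search tree, and MFCC extraction is assumed to happen elsewhere; class and parameter names are hypothetical.

```python
import numpy as np

class SnippetCodec:
    """Toy waveform-lookup codec: transmit only table indices.

    table_feats holds one feature (e.g., MFCC) vector per stored
    waveform snippet; table_snippets holds the snippets themselves.
    """
    def __init__(self, table_feats, table_snippets):
        self.feats = np.asarray(table_feats, dtype=float)
        self.snippets = table_snippets

    def encode(self, frame_feats):
        # Index of the nearest table entry in feature space
        d = np.sum((self.feats - np.asarray(frame_feats, dtype=float)) ** 2,
                   axis=1)
        return int(np.argmin(d))

    def decode(self, indices):
        # Concatenate the stored snippets for the received indices
        return np.concatenate([self.snippets[i] for i in indices])

codec = SnippetCodec([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
                     [np.zeros(4), np.ones(4), 2.0 * np.ones(4)])
print(codec.encode([0.9, 1.1]))  # → 1 (nearest feature vector is [1, 1])
```

Replacing the brute-force search with a k-d tree reduces the per-frame lookup from linear to roughly logarithmic in the table size, which is the speed-up the abstract refers to.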

P21-5 A Low Bit Rate Mobile Audio High Frequency ReconstructionBo Hang, Ruimin Hu, Yuhong Yang, Ge Gao, Wuhan University - Wuhan, China
Present communication systems are expected to provide high-quality audio signals at low bit rates and low computational complexity. To increase the quality of the high frequency band in current communication systems, this paper proposes a novel high-frequency bandwidth extension method for audio coding, which can improve decoded audio quality while adding only a few coding bits per frame and little computational complexity. The method calculates high-frequency synthesis filter parameters using a codebook mapping method, and transmits quantized gain corrections in the high-frequency parts of the multiplexed coding bit streams. Test results show that this method provides comparable audio quality at lower bit consumption and computational complexity than the high frequency regeneration of AVS-P10.
Convention Paper 8287 (Purchase now)

P21-6 Perceptual Distortion-Rate Optimization of Long Term Prediction in MPEG AACTejaswi Nanjundaswamy, Vinay Melkote, University of California, Santa Barbara - Santa Barbara, CA, USA; Emmanuel Ravelli, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Kenneth Rose, University of California, Santa Barbara - Santa Barbara, CA, USA
Long Term Prediction (LTP) in MPEG Advanced Audio Coding (AAC) exploits inter-frame redundancies via predictive coding of the current frame, given previously reconstructed data. Particularly, AAC Low Delay mandates LTP, to exploit correlations that would otherwise be ignored due to the shorter frame size. The LTP parameters are typically selected by time-domain techniques aimed at minimizing the mean squared prediction error, which is mismatched with the ultimate perceptual criteria of audio coding. We thus propose a novel trellis-based approach that optimizes the LTP parameters, in conjunction with the quantization and coding parameters of the frame, explicitly in terms of the perceptual distortion and rate tradeoffs. A low complexity "two-loop" search alternative to the trellis is also proposed. Objective and subjective results provide evidence for substantial gains.
Convention Paper 8288 (Purchase now)

P21-7 Stereo Audio Coding Improved by Phase ParametersMiyoung Kim, Eunmi Oh, Hwan Shim, SAIT, Samsung Electronics Co. Ltd. - Gyeonggi-do, Korea
Parametric stereo coding exploiting phase parameters in a bit-efficient way is part of the MPEG-D USAC (Unified Speech and Audio Coding) standard. This paper describes a down-mixing and up-mixing scheme to further enhance stereo coding for strongly or nearly out-of-phase signals. Conventional downmixing, as a sum of the left and right channels, suffers from phase cancellation in out-of-phase signals, which results in audible artifacts. This paper proposes phase alignment based on the estimated overall phase difference (OPD) and inter-channel phase difference (IPD) parameters. Furthermore, it describes a phase modification that minimizes phase discontinuities in the down-mixed signal by scaling the stereo channels.
Convention Paper 8289 (Purchase now)
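The phase-cancellation problem and the alignment remedy can be sketched per frequency bin: estimate the inter-channel phase difference and rotate one channel into phase with the other before summing. This is a sketch of the general idea only; the USAC OPD/IPD parameterization, quantization, and smoothing are not reproduced here.

```python
import numpy as np

def phase_aligned_downmix(L, R):
    """Per-bin phase-aligned mono downmix of stereo spectra L and R.

    The inter-channel phase difference is estimated for each frequency
    bin and the right channel is rotated into phase with the left
    before summing, so out-of-phase content no longer cancels.
    """
    ipd = np.angle(L * np.conj(R))      # phase of L relative to R, per bin
    R_aligned = R * np.exp(1j * ipd)    # rotate R into phase with L
    return 0.5 * (L + R_aligned)
```

For a fully out-of-phase pair (R = -L), the plain sum 0.5*(L + R) is zero, while the aligned downmix recovers L: the audible-artifact case the abstract describes.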

P22 - Enhancement of Audio Reproduction


Sunday, November 7, 9:00 am — 10:00 am (Room 236)

Chair:
Richard Foss, Rhodes University - Grahamstown, South Africa

P22-1 Enhancing Stereo Audio with Remix CapabilityHyen-O Oh, LG Electronics Inc. - Seoul, Korea, Yonsei University, Seoul, Korea; Yang-Won Jung, LG Electronics Inc. - Seoul, Korea; Alexis Favrot, Illusonic LLC - Lausanne, Switzerland; Christof Faller, Illusonic LLC - Lausanne, Switzerland, EPFL, Lausanne, Switzerland
Many audio appliances feature capabilities for modifying audio signals, such as equalization, acoustic room effects, etc. However, these modification capabilities are always limited in the sense that they apply to the audio signal as a whole and not to a specific "audio object." We propose a scheme that enables modification of the stereo panning and gain of specific objects inherent in a stereo signal. This capability is enabled (possibly in a stereo backward-compatible manner) by adding a few kilobits of side information to the stereo signal. To generate the side information, the signals of the objects to be modified in the stereo signal are needed.
Convention Paper 8290 (Purchase now)

P22-2 Automatically Optimizing Situation Awareness and Sound Quality for a Sound Isolating EarphoneJohn Usher, Hearium Labs - San Francisco, CA, USA
Sound isolating (SI) earphones are increasingly used by the general public with portable media players in noisy urban and transport environments. The dangers of these SI earphones are becoming increasingly apparent, and an urgent review of their usage is being recommended by legislators. The problem is that the user is removed from the local ambient scene: a reduction in “situation awareness” that often leads to accidents involving unheard oncoming vehicles. This paper introduces a new automatic gain control system that automatically mixes the ambient sound field with reproduced audio material. A discussion of the audio system architecture is given, and an analysis of 20 different warning sounds is used to suggest suitable parameters.
Convention Paper 8291 (Purchase now)

P23 - Perception and Subjective Evaluation of Audio


Sunday, November 7, 9:30 am — 11:00 am (Room 226)

P23-1 Toward an Algorithm to Simulate Ensemble Rhythmic Interaction Based on Quantifiable Strategy FunctionsNima Darabi, U. Peter Svensson, Norwegian University of Science and Technology - Trondheim, Norway; Chris Chafe, Stanford University - Stanford, CA, USA
This paper studies the strategy taken by a pair of ensemble performers under the influence of delay. A general quantifiable measure of the strategy taken by performers in an interactive rhythmic performance is represented in the form of a single-parameter strategy function. This is done by imposing an assumption about a decision-making process for “onset generation” by a participant, with one degree of freedom, on the observed data. We present specific examples of such strategy functions, suitable for different scenarios of rhythmic collaboration. By perpendicular projection of the strategy functions of an ensemble performance trial onto Cartesian axes, a nominal trial was transformed into a “strategy path” showing how the performers change their strategies during the course of a trial. By mathematical induction it was proven that this transformation from the time domain to a “strategy domain” is conditionally reversible, i.e., the time vectors of an ensemble trial can be reconstructed by a domino effect from its time-free strategy path and a given initial state. This algorithm is considered a means of simulating ensemble trials based on the overall strategies guiding them.
Convention Paper 8292 (Purchase now)

P23-2 Hearing Threshold of Pure Tones and a Fire Alarm Sound for People Listening to Music with HeadphonesKaori Sato, Shogo Kiryu, Tokyo City University - Setagaya-ku, Tokyo, Japan; Kaoru Ashihara, Advanced Industrial Science and Technology - Tsukuba, Japan
When listening to music through headphones, the listeners may be less sensitive to environmental sounds. The sound pressure level of the fire alarm bell sound was measured in an actual internet cafe. The hearing thresholds of pure tones and the fire alarm bell sound were measured for the subjects with headphones. The minimum sound pressure level of the fire alarm bell sound recorded in the cafe was about 40 dB under the worst condition. When the subjects listened to pseudo-music signals through headphones, the hearing threshold of the fire alarm sound increased to about 80 dB.
Convention Paper 8293 (Purchase now)

P23-3 Psychoacoustic Measurement and Auditory Brainstem Response in the Frequency Range between 10 kHz and 30 kHzMotoi Koubori, Tokyo City University - Setagaya-ku, Tokyo, Japan; Kaoru Ashihara, Advanced Industrial Science and Technology - Tsukuba, Japan; Mizuki Omata, Masaki Kyoso, Shogo Kiryu, Tokyo City University - Setagaya-ku, Tokyo, Japan
High-frequency components above 20 kHz can be recorded on recent high-resolution audio media. However, it is debated whether such components can be perceived. In this paper a psychoacoustic measurement and auditory brainstem response in the high-frequency range are reported. In the psychoacoustic measurement, some subjects could perceive high-frequency sounds above 20 kHz, and the auditory brainstem response could be measured for one subject at a frequency of 22 kHz. However, the sound pressure levels of the thresholds were beyond 80 dB in both measurements. The results were unremarkable. Because the auditory brainstem response is a direct signal from the auditory nerve, the nerve appears not to be stimulated by weak high-frequency sounds.
Convention Paper 8294 (Purchase now)

P23-4 Acoustical Design of Control Room for Stereo and Multichannel Production and Reproduction—A Novel ApproachBogic Petrovic, Zorica Davidovic, BoZo Electronics, MyRoom Acoustics - Beograd, Serbia
This paper describes a new method of acoustic adaptation for control rooms. The goal is to satisfy the conditions for a quality control room: better mix translation to other systems, less need for the engineer to adapt, and compatibility with both stereo and surround monitoring. Two practical examples of control rooms realized using the new principles are described, along with the experiences of sound engineers who have worked in them.
Convention Paper 8295 (Purchase now)

P23-5 New 10.2-Channel Vertical Surround System (10.2-VSS); Comparison Study of Perceived Audio Quality in Various Multichannel Sound Systems with Height LoudspeakersSunmin Kim, Young Woo Lee, Samsung Electronics - Suwon, Korea; Ville Pulkki, Aalto University School of Science and Technology - Aalto, Finland
This paper presents listening test results on perceived audio quality for several loudspeaker arrangements, in order to find the optimal configuration of loudspeakers for a next-generation multichannel sound system. We compare new reproduction formats with the NHK 22.2-channel setup and the 7.1-channel setup of Recommendation ITU-R BS.775-2. The subjective evaluations, focused on the loudspeaker configurations of the top layer, were carried out with test materials generated by different methods: mixing and reproduction of B-format recordings. The results show that the perceptual difference in overall quality between the new 10.2-channel vertical surround system with 3 top loudspeakers and the NHK 22.2-channel system was imperceptible on the grading scale used in the experiment.
Convention Paper 8296 (Purchase now)

P23-6 Perceptually Motivated Scoring of Musical Meter Classification AlgorithmsMatthias Varewyck, Jean-Pierre Martens, Ghent University - Ghent, Belgium
In this paper perceived confusions between the four most popular meters in Western music (2/4, 3/4, 4/4, and 6/8) are examined. A theoretical framework for modeling these confusions is proposed and translated into a perceptually motivated objective score that can be used to evaluate meter classification algorithms against meter labels elicited from a single annotator. Experiments with three artificial and two real algorithms showed that the new score is preferable to traditional accuracy, since it rewards algorithms that make reasonable errors and appears more robust across different annotators.
Convention Paper 8297 (Purchase now)

P23-7 Classification of Audiovisual Media ContentUlrich Reiter, Norwegian University of Science and Technology - Trondheim, Norway
This paper describes a qualitative experiment designed to ultimately derive a set of meaningful attributes for the classification of audiovisual media content. Whereas such attributes are available for the classification of video-only content, they are missing for audiovisual content. Based on the suggestions made by Woszczyk et al. in their 1995 AES Convention paper [Preprint 4133], we have taken a closer look in a combined set of experiments, one consisting of a quality trade-off decision and one of a relevance sorting task with respect to these attributes.
Convention Paper 8298 (Purchase now)

P23-8 The Influence of Texture and Spatial Quality on the Perceived Quality of Blindly Separated Audio Source SignalsThorsten Kastner, University of Erlangen - Erlangen, Germany, Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany
Blind Audio Source Separation (BASS) algorithms are often employed in applications where the aim is the acoustic reproduction of the separated source signals. The perceived quality of the reproduced signals is therefore a crucial criterion. Two different factors can be roughly distinguished that have influence on the perceived quality of blindly separated source signals. First, the quality of the separation of a desired target source from a signal mixture. Second, the preservation of the spatial image of the source, the spatial position of the target source in the signal mixture as it is perceived by the listener. Based on extensive MUSHRA-style listening tests, results are presented reflecting the influence of both factors on the overall basic audio quality of BASS signals. Further, a nonlinear regression model is set up to parametrize the influence of both factors on the subjective audio quality. A correlation of 0.98 between predicted and measured subjective quality and a root mean square prediction error of 2.7 on a [0,100] MUSHRA-scale was achieved for predicting the basic audio quality from an unknown listening test.
Convention Paper 8299 (Purchase now)

P23-9 Perceptual Evaluation of Spatial Audio QualityHwan Shim, Eunmi Oh, Sangchul Ko, Samsung Electronics - Gyeonggi-do, Korea; Sang Ha Park, Seoul National University - Seoul, Korea
With the rapid development of multimedia devices, realistic spatial audio is of growing interest. In this paper we discuss how to evaluate the realistic audio experience and determine the major perceptual attributes for delivering it to listeners. We propose eight attributes in three categories: “timbre,” “localization,” and “spaciousness.” Each perceptual attribute is evaluated through subjective listening tests using different surround reproduction systems, including 10.2- and 22.2-channel systems. The experimental results show which spatial audio attributes are influential for a realistic audio experience and which are difficult to reproduce with current reproduction systems.
Convention Paper 8300 (Purchase now)

P24 - Audio Transmission


Sunday, November 7, 10:15 am — 11:15 am (Room 236)

Chair:
Richard Foss, Rhodes University - Grahamstown, South Africa

P24-1 Parameter Relationships in High-Speed Audio NetworksNyasha Chigwamba, Richard Foss, Rhodes University - Grahamstown, South Africa; Robby Gurdan, Brad Klindradt, Universal Media Access Networks GmbH - Dusseldorf, Germany
There exists a need to remotely control and monitor parameters within audio devices. It is often necessary for changes in one parameter to affect other parameters. Thus, it is important to create relationships between parameters. The capability for relationships has existed for some time between the parameters within mixing consoles. This paper explores the parameter relationships within mixing consoles, the parameter relationships in current audio networks, and then goes on to propose some fundamental relationships that should exist between parameters. It describes how these relationships have been implemented within the X170 protocol.
Convention Paper 8301 (Purchase now)

P24-2 Experiment of Sixteen-Channel Audio Transmission Over IP Network by MPEG-4 ALS and Audio Rate-Oriented Adaptive Bit-Rate Video CodecYutaka Kamamoto, Noboru Harada, NTT Communication Science Labs. - Atsugi, Kanagawa, Japan, NTT Network Innovation Laboratories, Yokosuka, Kanagawa, Japan; Takehiro Moriya, NTT Communication Science Labs. - Atsugi, Kanagawa, Japan; Sunyong Kim, Masanori Ogawara, Tatsuya Fujii, NTT Network Innovation Laboratories - Yokosuka, Kanagawa, Japan
This paper describes an experiment of lossless audio transmission over the IP network and introduces a prototype codec that combines lossless audio coding and variable bit rate video coding. In the experiment 16-channel acoustic signals compressed by MPEG-4 ALS were transmitted from a live venue to a cafe via the IP network to provide high-quality music. At the cafe, received sound data were decoded losslessly and appropriately remixed for adjustment to the environment at the location. The combination of high-definition video and audio data enables fans to enjoy a musical performance at places other than the live venue at the same time. This experiment motivates us to develop a codec that guarantees audio quality.
Convention Paper 8302 (Purchase now)

P25 - Audio in Education


Sunday, November 7, 11:30 am — 12:30 pm (Room 236)

Chair:
Richard Foss, Rhodes University - Grahamstown, South Africa

P25-1 The Contributions of Thomas Edison to Music EducationKevin D. Kelleher, Stephen F. Austin State University - Nacogdoches, TX, USA
With the invention of the phonograph in 1877, Thomas Edison initiated an expansion of the musical experience. His device provided new learning opportunities for both amateur and professional musicians, in addition to people who claimed no musical background. Advertised as a musical educator, Edison’s phonograph instructed families in the home and children at school. As a result of the recording feature of Edison’s machine, distinct new methods of studying music emerged. Recordings, for example, were utilized to facilitate distance instruction, and the Edison School Phonograph offered music educators the ability to record their pupils. Recording at home, moreover, was marketed with publications that included detailed descriptions and instructive pictures of recording techniques.
Convention Paper 8303 (Purchase now)

P25-2 Shaping Audio Engineering Curriculum: An Expert Panel’s View of the FutureDavid Tough, Belmont University - Nashville, TN, USA
Audio engineering programs are being created and expanded at 4-year universities across the United States due to increasing demand for the subject at the university level. The purpose of this online study was to ask an expert panel of engineers to create a ranking of essential core competencies and technologies needed by audio engineering technology programs 10 years into the future (2019). A panel of 52 audio experts and industry leaders were selected as a purposive sample and an online, modified Delphi methodology was employed. The 3-round process produced 160 competencies that can be used by administrators to construct future curriculum and technologies needed for their AET programs.
Convention Paper 8304 (Purchase now)

P26 - Auditory Perception


Sunday, November 7, 2:30 pm — 5:00 pm (Room 220)

Chair:
Poppy Crum

P26-1 Progress in Auditory Perception Research Laboratories—Multimodal Measurement Laboratory of Dresden University of TechnologyM. Ercan Altinsoy, Ute Jekosch, Sebastian Merchel, Jürgen Landgraf, Dresden University of Technology - Dresden, Germany
This paper presents the general ideas and implementation details of the MultiModal Measurement Laboratory (MMM Lab) of Dresden University of Technology. This lab combines VR equipment for multiple modalities (auditory, tactile, vestibular, visual) and is capable of presenting high-performance, interactive simulations. The goals are to discuss the progress in auditory perception research laboratories in recent years and the technical parameters that should be considered in implementing reproduction systems for different modalities.
Convention Paper 8305 (Purchase now)

P26-2 Families of Sound Attributes for the Assessment of Spatial AudioSarah Le Bagousse, Orange Labs - France Télécom R&D - Cesson Sévigné, France; Mathieu Paquier, LISyC - Université de Bretagne Occidentale - Brest, France; Catherine Colomes, Orange Labs - France Télécom R&D - Cesson Sévigné, France
Over recent years, studies using several elicitation methods have highlighted many features liable to be used for the characterization of sounds. These various experiments have resulted in a long list of sound attributes. But, as their respective meanings and weights differ between assessors and listeners, the analysis of the results of a listening test based on sound criteria remains complex and difficult. The experiments reported in this paper were aimed at shortening the list of attributes by clustering them into sound families, based on the results of two semantic tests using either free categorization or a multi-dimensional scaling method.
Convention Paper 8306 (Purchase now)

P26-3 Listening Tests for the Effect of Loudspeaker Directivity and Positioning on Auditory Scene PerceptionDavid Clark, DLC Design - Northville, MI, USA
Using stereo playback in a typical living room, subjects were exposed to six loudspeaker configurations under double-blind conditions and asked if the auditory scene was better or worse than that presented by a reference stereo system. For all configurations, the auditory scene was judged to be plausible, but mean scores were lower than those for the reference. The reference comprised symmetrically-placed conventional box loudspeakers with subwoofers.
Convention Paper 8307 (Purchase now)

P26-4 Modeling Tempo of Human Response to a Sudden Tempo Change Using Damped Harmonic OscillatorsNima Darabi, Peter Svensson, Jon Forbord, Norwegian University of Science and Technology - Trondheim, Norway
A human-computer interactive subjective test was held in which 12 users tapped along with a suddenly changing metronome by hand-clapping and finger-tapping. Up-sampled recordings of the trials, with different interpolation methods, were used to measure the internal timekeeper's tempo in response to each tempo step. An iterative prediction error minimization method was applied to the step response signals to identify the underlying tempo-changing system of the human users in this sensori-motor synchronization task. Experimental data indicated that the system is fairly LTI and most closely resembles a second-order damped harmonic oscillator. Fit-ratio comparison showed that a delayed two-pole one-zero underdamped oscillator (P2DUZ) offers a good trade-off between model complexity and efficiency. The related parameters for each user (as a set of memory-related built-in factors) were also extracted and shown to be slightly individual-dependent.
Convention Paper 8308 (Purchase now)
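The hypothesized model class can be illustrated with the step response of a delayed underdamped second-order system: after a sudden metronome change, the tapped tempo overshoots and oscillates before settling on the new value. The parameters below (wn, zeta, delay) are generic illustrations, not the paper's fitted P2DUZ values, and the extra zero of the P2DUZ model is omitted for simplicity.

```python
import math

def tempo_step_response(t, wn, zeta, delay):
    """Normalized step response of a delayed underdamped 2nd-order system.

    Models the tempo trajectory after a unit tempo step: 0 before the
    reaction delay, then a damped oscillation settling at 1.
    Requires 0 < zeta < 1 (underdamped).
    """
    if t < delay:
        return 0.0
    t = t - delay
    wd = wn * math.sqrt(1.0 - zeta ** 2)      # damped natural frequency
    env = math.exp(-zeta * wn * t)            # decaying envelope
    return 1.0 - env * (math.cos(wd * t) + (zeta * wn / wd) * math.sin(wd * t))
```

An identification procedure like the one in the paper would fit wn, zeta, and delay per user by minimizing the prediction error between this curve and the measured tempo trajectory.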

P26-5 Increasing Intelligibility of Multiple Talkers by Selective MixingPiotr Kleczkowski, Magdalena Plewa, Marek Pluta, AGH University of Science and Technology - Kraków, Poland
Five tracks of speech signal were recorded. One of the tracks, the target track, consisted of spoken numbers, so that by counting the number of correctly heard words the degree of comprehension of the target talker could be quantified in each trial. Two types of mixes of all five tracks were performed: a simple mix and a selective mix. The latter mix is a development of the processing technique known as binary masking. A large group of subjects (54) listened to both types of mixes and it was found that selective mixing slightly increased the intelligibility of the target talker.
Convention Paper 8309 (Purchase now)
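The binary-masking idea behind the selective mix can be sketched in the time-frequency domain: each bin of the mix keeps either the target or the interferers, depending on which dominates there. This is a simplified sketch of classic binary masking under an illustrative threshold parameter; the paper's selective mixing is a development of this idea, not identical to it.

```python
import numpy as np

def selective_mix(target_spec, interferer_specs, threshold_db=0.0):
    """Binary-mask selective mix of magnitude/complex T-F spectra.

    Each time-frequency bin keeps only the target if the target
    dominates the summed interferers there by threshold_db;
    otherwise it keeps only the interferers.
    """
    interferer_sum = np.sum(interferer_specs, axis=0)
    # Per-bin target-to-interferer ratio in dB (epsilon avoids log(0))
    ratio_db = 20.0 * np.log10(
        (np.abs(target_spec) + 1e-12) / (np.abs(interferer_sum) + 1e-12))
    mask = ratio_db > threshold_db
    return np.where(mask, target_spec, interferer_sum)

out = selective_mix(np.array([10.0, 0.1]), np.array([[1.0, 5.0]]))
print(out)  # keeps the target in bin 0, the interferer in bin 1
```

In practice the spectra would come from an STFT of the five recorded tracks, and the masked result would be inverted back to the time domain before presentation to the listeners.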

P27 - Room Acoustics


Sunday, November 7, 2:30 pm — 5:00 pm (Room 236)

Chair:
Søren Bech

P27-1 First Results from a Large-Scale Measurement Program for Home TheatersTomlinson Holman, Ryan Green, University of Southern California - Los Angeles, CA, USA, Audyssey Laboratories, Los Angeles, CA, USA
The introduction of one auto-equalization system to the home theater market, with an accompanying reporting infrastructure, provides methods of data collection that allow research into many practical system installations. Among the results delivered are histograms of room volume, reverberation time vs. volume and frequency, early-arrival sound frequency response both equalized and unequalized, and steady-state frequency response both equalized and unequalized. The variation in response over the listening area is studied as well and sheds light on contemporary use of the Schroeder frequency.
Convention Paper 8310 (Purchase now)

P27-2 Improving the Assessment of Low Frequency Room Acoustics Using Descriptive AnalysisMatthew Wankling, Bruno Fazenda, William J. Davies, University of Salford - Salford, Greater Manchester, UK
Several factors contribute to the perceived quality of reproduced low-frequency audio in small rooms. Listeners often use descriptive terms such as “boomy” or “resonant.” However, a robust terminology for rating samples during listening tests does not currently exist. This paper reports on a procedure to develop such a set of subjective descriptors for low-frequency reproduced sound, using descriptive analysis. The resulting descriptors are Articulation, Resonance, and Bass Content. These terms have been used in listening tests to measure the subjective effect of changing three objective room parameters: modal decay time, room volume, and source/receiver position. Reducing decay time increased Articulation, while increased preference is associated with increased Articulation and decreased Resonance.
Convention Paper 8311 (Purchase now)

P27-3 Subjective Preference of Modal Control Methods in Listening RoomsBruno M. Fazenda, Lucy A. Elmer, Matthew Wankling, J. A. Hargreaves, J. M. Hirst, University of Salford - Greater Manchester, UK
Room modes are well known to cause unwanted effects in the correct reproduction of low frequencies in critical listening rooms. Methods to control these problems range from simple loudspeaker/listener positioning to quite complex digital signal processing. Nonetheless, the subjective importance and impact of these methods have rarely been quantified. A number of simple control methods were implemented in an IEC standard listening environment. Eight different configurations were set up in the room simultaneously and could therefore be tested in direct comparison with each other. A panel of 20 listeners was asked to state their preferred configuration using the method of paired comparison. Results show clear winners and losers, indicating an informed strategy for efficient control.
Convention Paper 8312 (Purchase now)
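The method of paired comparison used above reduces to tallying, for each pair of configurations presented, which one each listener preferred, then ranking configurations by wins. A minimal sketch (the configuration names are hypothetical, not the paper's eight setups):

```python
from collections import defaultdict

def rank_by_paired_comparison(judgments):
    """Tally method-of-paired-comparison judgments.
    `judgments` is a list of (winner, loser) pairs, one per trial;
    configurations are ranked by total number of wins."""
    wins = defaultdict(int)
    for winner, loser in judgments:
        wins[winner] += 1
        wins.setdefault(loser, 0)  # ensure losers appear in the ranking
    return sorted(wins.items(), key=lambda kv: kv[1], reverse=True)
```

A fuller analysis would fit a preference-scale model (e.g., Bradley-Terry or Thurstone) to the win matrix rather than just counting wins.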

P27-4 Wide-Area Psychoacoustic Correction for Problematic Room Modes Using Non-Linear Bass SynthesisAdam J. Hill, Malcolm O. J. Hawksford, University of Essex - Colchester, UK
Small-room acoustics are characterized by a limited number of dominant low-frequency room modes that produce wide spatio-pressure variations, which traditional room correction systems struggle to correct over a broad listening area. A psychoacoustic-based methodology is proposed whereby signal components coincident with problematic modes are filtered out and substituted by virtual bass components to forge an illusion of the suppressed frequencies. A scalable and hierarchical approach is studied using the Chameleon Subwoofer Array (CSA), and subjective evaluation confirms uniform performance over a large area. Bass synthesis exploits parallel nonlinear and phase-vocoder generators whose outputs are blended as a function of transient and steady-state signal content.
Convention Paper 8313 (Purchase now)
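Virtual bass substitution relies on the ear inferring a missing fundamental from its harmonic series (the "virtual pitch" effect). A toy sketch that synthesizes the harmonics directly for clarity (the paper's actual generators use a nonlinearity and a phase vocoder, blended by signal content):

```python
import numpy as np

def virtual_bass_harmonics(n_samples, fs, f0, num_harmonics=3):
    """Synthesize harmonics 2*f0, 3*f0, ... of a suppressed fundamental
    f0; the ear tends to perceive the missing f0 itself. The 1/k
    amplitude rolloff keeps the substitute timbre mild."""
    t = np.arange(n_samples) / fs
    harmonics = sum(
        np.sin(2 * np.pi * k * f0 * t) / k
        for k in range(2, 2 + num_harmonics)
    )
    return harmonics / np.max(np.abs(harmonics))
```

The output contains energy at 2*f0 and above but essentially none at f0, which is the point: the problematic modal frequency is never physically excited.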

P27-5 Beyond Coding: Reproduction of Direct and Diffuse Sounds in Multiple EnvironmentsJames D. Johnston, DTS Inc. - Kirkland, WA, USA; Jean-Marc Jot, DTS Inc. - Scotts Valley, CA, USA; Zoran Fejzo, DTS Inc. - Calabasas, CA, USA; Steve R. Hastings, DTS Inc. - Scotts Valley, CA, USA
For many years, the difference in perception between perceptually direct sounds (i.e., sounds with a specific direction) and perceptually diffuse sounds (i.e., sounds that "surround" or "envelop" the listener) has been recognized, leading to a variety of approaches for simulating or capturing these perceptual effects. Here we discuss a system that separates direct and diffuse signals or, for synthetic signals (e.g., those made by modern production methods), synthesizes the diffuse signal in one of several ways, enabling the reproduction system, after measuring the characteristics of the playback setup, to provide the best possible sensation from that particular set of playback equipment.
Convention Paper 8314 (Purchase now)


Return to Paper Sessions