AES Los Angeles 2014
Paper Session Details
P1 - Spatial Audio: Part 1
Thursday, October 9, 9:00 am — 12:30 pm
Jason Corey, University of Michigan - Ann Arbor, MI, USA
P1-1 MPEG-H Audio—The New Standard for Universal Spatial / 3D Audio Coding—Jürgen Herre, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany; Johannes Hilpert, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Achim Kuntz, International Audio Laboratories Erlangen - Erlangen, Germany; Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Recently, a new generation of spatial audio formats was introduced that includes elevated loudspeakers and surpasses traditional surround sound formats, such as 5.1, in terms of spatial realism. To facilitate high-quality, bitrate-efficient distribution and flexible reproduction of 3D sound, the MPEG standardization group recently started the MPEG-H Audio Coding development for the universal carriage of encoded 3D sound from channel-based, object-based, and HOA-based input. High-quality reproduction is supported for many output formats, from 22.2 and beyond down to 5.1, stereo, and binaural reproduction—independently of the original encoding format, thus overcoming incompatibility between various 3D formats. The paper describes the current status of the standardization project and provides an overview of the system architecture, its capabilities, and performance.
Convention Paper 9095
P1-2 Bit Rate of 22.2 Multichannel Sound Signal Meeting Broadcast Quality—Takehiro Sugimoto, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Tokyo Institute of Technology - Midori-ku, Yokohama, Japan; Yasushige Nakayama, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Satoshi Oode, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan
The bit rate of a 22.2 multichannel sound (22.2 ch) signal meeting broadcast quality was investigated by performing several subjective evaluations. 22.2 ch is currently planned to be transmitted using MPEG-4 AAC (advanced audio coding) in 8K Super Hi-Vision broadcasting. A subjective evaluation of the basic audio quality of a coded 22.2 ch signal was carried out using 49 stimuli made from combinations of seven bit rates and seven program items. In addition, subjective evaluations at two different listening positions and of a downmixed 5.1 ch signal were carried out for comparison with the 22.2 ch signal at the sweet spot. A bit rate meeting broadcast quality was determined from the obtained results.
Convention Paper 9096
P1-3 Design, Coding and Processing of Metadata for Object-Based Interactive Audio—Simone Füg, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Andreas Hölzer, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Christian Borß, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Christian Ertel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Michael Kratschmer, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
For object-based audio an appropriate definition of metadata is needed to ensure flexible playback in any reproduction scenario and to allow for interactivity. Important use-cases for object-based audio and audio interactivity are described and metadata requirements are derived. A metadata scheme is defined that allows for enhanced audio rendering techniques such as content-dependent processing, automatic scene scaling and enhanced level control. Also, a metadata preprocessing logic is proposed that prepares rendering and playout and allows for user interaction with the audio content of an object-based scene. In addition, the paper points out how the metadata can be transported efficiently in a bitstream. The proposed metadata scheme has been adopted and integrated into the currently finalized MPEG-H 3D Audio standard.
Convention Paper 9097
P1-4 On Spatial-Aliasing-Free Sound Field Reproduction Using Finite Length Line Source Arrays—Frank Schultz, University of Rostock / Institute of Communications Engineering - Rostock, Germany; Till Rettberg, University of Rostock - Rostock, Germany; Sascha Spors, University of Rostock - Rostock, Germany
Concert sound reinforcement systems aim at the reproduction of homogeneous sound fields over extended audiences for the whole audio bandwidth. For the last two decades this has mostly been approached using so-called line source arrays, for which Wavefront Sculpture Technology (WST) was introduced in the literature. The paper utilizes a signal processing model developed for sound field synthesis in order to analyze and expand the WST criteria for straight arrays. Starting with the driving function for an infinite and continuous linear array, spatial truncation and discretization are subsequently taken into account. The role of the loudspeakers involved, acting as a spatial lowpass filter that can reduce undesired spatial aliasing contributions, is stressed. The paper aims to give better insight into how to interpret the synthesized sound fields.
Convention Paper 9098
P1-5 The Focal Shift Phenomena for Focused Source Reproduction Using Loudspeaker Arrays—Robert Oldfield, University of Salford - Salford, Greater Manchester, UK; Ian Drumm, University of Salford - Salford, Greater Manchester, UK
The focal shift phenomenon in optics describes how the position of the focus point in a focusing system is not simply defined by geometrical ray-based models but is affected by diffraction and is consequently a function of the size of the lens and the frequency of the light. The same effect is also observed in acoustics when looking at the focused field using physical focusing reflectors. This paper describes the focal shift phenomenon applied to the reproduction of focused sources with sound field synthesis systems, and presents a formula for the prediction of the actual rendered focal point position and also a frequency dependent positional correction for the improved rendering of a focused source with a given loudspeaker setup.
Convention Paper 9099
P1-6 Impulse Response Upmixing Using Particle Systems—Nuno Fonseca, ESTG/CIIC, Polytechnic Institute of Leiria - Leiria, Portugal
With the increase in computational power of DSPs and CPUs, impulse responses (IRs) and the convolution process are becoming a very popular approach to recreating audio effects such as reverb. But although many IR repositories exist, most IR recordings cover only mono or stereo. This paper presents an approach to impulse response upmixing using particle systems. Using a reverse engineering process, a particle system is created that is capable of reproducing the original impulse response. By re-rendering the obtained particle system with virtual microphones, an upmixing result can be obtained. Depending on the type of virtual microphone, several different output formats can be supported, ranging from stereo to surround, and including binaural support, Ambisonics, or even custom speaker scenarios (VBAP).
Convention Paper 9100
P1-7 ECMA-407: New Approaches to 3D Audio Content Data Rate Reduction with RVC-CAL—Junaid Jameel Ahmad, Swiss Federal Institute of Technology (EPFL) - Lausanne, Switzerland; Claudio Alberti, Swiss Federal Institute of Technology (EPFL) - Lausanne, Switzerland; Jung Wook (Jonathan) Hong, McGill University - Montreal, QC, Canada; GKL Audio Inc. - Montreal, QC, Canada; Brett Leonard, University of Nebraska at Omaha - Omaha, NE; McGill University - Montreal, Quebec, Canada; Marco Mattavelli, Swiss Federal Institute of Technology (EPFL) - Lausanne, Switzerland; Clemens Par, Swiss Audec - Morges, Switzerland; Schuyler Quackenbush, Audio Research Labs - Scotch Plains, USA; Wieslaw Woszczyk, McGill University - Montreal, QC, Canada
Inverse problems have only been known in spatial audio for a very short time; their only solution, called “inverse coding” in the literature, is essentially based on time-level modeling. Inverse problems, however, unlike parametric coding, require only an initial transmission of spatial side information and thus can achieve much lower bitrates than parametric coding. For instance, inversely coded NHK 22.2 multichannel signals in combination with USAC may be delivered at bitrates as low as 48 kb/s, and optimum performance can be achieved in combination with commercially available HE-AAC v2 at 256 kb/s, without any scaling of output channel order and with moderate complexity in the decoder. A new way to perceptually eliminate redundant information makes use of invariant theory inside the encoder. Invariants with Gaussian processes were unknown until 2010 and represented a major problem in pure mathematics for more than a century: David Hilbert’s proof that these coefficient functions form a field suggested that their existence in random processes was very likely. As will be shown, when applied to spatial audio coding, invariants represent a numerically efficient and perceptually powerful algebraic tool. We likewise present a 3D audio codec design for signals up to NHK 22.2 with two profiles: one profile, based on coincidence, is able to code and synthesize a full Higher Order Ambisonics sound field, up to order 6, at 48 kb/s, 64 kb/s, 96 kb/s, 128 kb/s, and above. The second profile, which optimizes decorrelation for phantom source imaging, codes channel-based or object-based signals at the same bitrates. The technology has been specified as the world’s first international 3D audio standard, ECMA-407, and may be further extended with static models in the frequency domain.
A preliminary version of this technology, based on a downmix in the frequency domain, was submitted to MPEG’s “Phase 2” selection of low-bitrate 3D coding technologies and made use of a USAC binary, which unfortunately offered no tuning options.
Convention Paper 9218
P2 - Education
Thursday, October 9, 9:00 am — 11:30 am
Tim Ryan, Webster University - St. Louis, MO, USA
P2-1 Apprenticeship Skills in Audio Education: A Comparison of Classroom and Institutional Focus as Reported by Educators—Doug Bielmeier, Middle Tennessee State University - Murfreesboro, TN, USA
Recent research on audio industry employers indicated that their new hires lacked communication skills that the employers deemed valuable for the new hires’ success. In this research, audio engineering technology (AET) educators were surveyed about the communication skills focused on in their classrooms, the focus of their departments/institutions, and their internship programs. The quantitative data suggested that both educators and their institutions lacked a focus on apprenticeship skills. Also, fewer than half of the institutions required an internship. Further research must be conducted to understand what these educators reported and how it affects AET education as a whole.
Convention Paper 9101
P2-2 Partnering Approaches for Teaching Music Technology—Jeffrey Rodgers, University of Saint Francis - Fort Wayne, IN, USA; Purdue University
The use of collaborative learning techniques is rapidly becoming a popular method for teaching 21st-century skills across the United States. The term “partnering” has been used to refer to different types of collaborative learning, most recently defined by Marc Prensky in his book Teaching Digital Natives: Partnering for Real Learning. This paper addresses the need for more established methods for teaching music technology skills, concepts, and theories that utilize a collaborative, partnering style of instruction. Specifically, these partnering methods are intended for students in high school and higher education.
Convention Paper 9102
P2-3 Pathways through Recording Analysis—William Moylan, University of Massachusetts - Lowell - Lowell, MA, USA
If you are in the audio industry, you analyze recordings. What do you listen for? What do you hear? Why are you listening? There are many relevant answers to these questions depending on one’s role in the industry or purpose for listening. This paper will explore the process of recording analysis and the idea of “pathways” through the many elements, dimensions, and functions that it might address; pathways that can be modified to suit the material and purpose for the analysis. Music recording will then be used as an example to bring a focus to the process. “Recording analysis” will then be the study of sound qualities of recordings and the interrelationships of those qualities and the music’s materials and structure and its text.
Convention Paper 9103
P2-4 Case Study: University Recording Arts Program Seeks to Educate, Engage, and Recruit High School Students—Leslie Gaston-Bird, University of Colorado Denver - Denver, CO, USA; Lorne Bregitzer, University of Colorado Denver - Denver, CO, USA; Lynnae Rome, University of Colorado Denver - Denver, CO, USA
A team of researchers from the University of Colorado Denver Recording Arts program visited the Denver School of the Arts high school in an effort to discover (1) whether students could detect an audible difference between music encoded as AAC and MP3 when comparing them to the original WAV file, (2) whether a trend in music career choices would appear based on gender, and (3) whether this visit could serve as an ongoing recruitment activity. The results presented here could be useful for other universities who want to engage in these activities. We will also consider the time, cost, and impact of a day-long visit.
Convention Paper 9104
P2-5 Musical Chairs: From Spectator to Stage—Mike Godwin, University of Western Ontario - London, ON, Canada; Leslie Linton, University of Western Ontario - London, Ontario, Canada
This presentation demonstrates an interdisciplinary project: the creation of an interactive computer application spanning the fields of music education, music performance, acoustical engineering, and computer science. Typical music computer programs or “apps” usually involve creative strategies to explore various techniques for teaching the elements of music theory, history, or composition, and often use computer-generated characters and music. This “app” is quite different in that it allows users to explore (tap, touch, pinch-to-zoom) actual performances through video; they can seamlessly “walk” through the orchestra, band, or choir, and as they move around, the audio changes according to where they are situated. Imagine “Google Street View” with seamless video and audio instead of connected still photos and no sound.
Convention Paper 9105
P3 - Spatial Audio: Part 2
Thursday, October 9, 2:30 pm — 6:30 pm
Clemens Par, Swiss Audec - Morges, Switzerland
P3-1 A Polygon-Based Panning Method for 3D Loudspeaker Setups—Christian Borß, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
In this paper we introduce the “Edge Fading Amplitude Panning” (EFAP) method for 3D loudspeaker setups. Similar to other panning methods like Vector Base Amplitude Panning (VBAP), it can be used to create phantom sources between the loudspeaker positions. The proposed method features symmetric panning gains for symmetric loudspeaker setups, N-wise panning by using polygons instead of triangles, and a better behavior for large opening angles between loudspeakers while involving a computational complexity that is in the same order of magnitude as VBAP.
Convention Paper 9106
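The EFAP formulation itself is not given in the abstract, but the pairwise VBAP case it generalizes can be sketched as follows (a minimal 2-D example; the function name and setup are illustrative, not from the paper):

```python
import numpy as np

def vbap_pair_gains(source_deg, spk1_deg, spk2_deg):
    """Compute 2-D VBAP gains for a phantom source between two loudspeakers.

    Solves g1*l1 + g2*l2 = p for the gain vector (l1, l2, p are unit
    direction vectors), then normalizes so g1^2 + g2^2 = 1.
    """
    def unit(deg):
        rad = np.radians(deg)
        return np.array([np.cos(rad), np.sin(rad)])

    L = np.column_stack([unit(spk1_deg), unit(spk2_deg)])  # loudspeaker base
    g = np.linalg.solve(L, unit(source_deg))               # unnormalized gains
    return g / np.linalg.norm(g)                           # power normalization

# A source exactly between speakers at +/-30 degrees receives equal gains.
g = vbap_pair_gains(0.0, -30.0, 30.0)
```

EFAP replaces the triangles (in 3D, loudspeaker triplets) of VBAP with polygons, which the abstract credits with symmetric gains for symmetric setups.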
P3-2 Utilizing Contralateral ICTDs to Stabilize Lateral Imaging in 5.1 Surround Systems—Michael Tierney, New York University - New York, NY, USA; Adrian Tregonning, New York University - New York, NY, USA
In 5.1 surround sound systems the problems of lateral image instability and a non-linear lateral panning path are well known. Alternative panning techniques have been developed in an attempt to overcome these problems, but often improved spatial imaging compromises spectral integrity. This is an undesirable tradeoff in the context of music mixing and production. The current paper examines the effect of low-level inter-channel time differences (ICTDs) in contralateral channels with respect to lateral imaging. Subjective experiments evaluated localization perceptions with ICTDs of 1 ms in either the front or surround contralateral channel. This led to more accurate and predictable lateral image positioning with minimal spectral coloration. The results are used to propose a more effective 5.1 lateral panning mechanism.
Convention Paper 9107
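The 1 ms contralateral ICTD studied above amounts to a small delay applied to one channel; a minimal sketch of imposing such a delay (an assumed integer-sample implementation, as the abstract does not describe the processing chain):

```python
import numpy as np

def apply_ictd(channel, fs, delay_ms=1.0):
    """Delay one channel by a small inter-channel time difference,
    implemented here as a plain integer-sample delay with zero padding."""
    n = int(round(fs * delay_ms / 1000.0))
    return np.concatenate([np.zeros(n), channel])[:len(channel)]

# At 48 kHz, a 1 ms ICTD corresponds to a 48-sample delay.
impulse = np.zeros(100)
impulse[0] = 1.0
delayed = apply_ictd(impulse, fs=48000)
```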
P3-3 Investigation into the Impact of 3D Surround Systems on Envelopment—Paul Power, University of Salford - Salford, Greater Manchester, UK; Bill Davies, University of Salford - Salford, Greater Manchester, UK; Jonathan Hirst, University of Salford - Salford, Greater Manchester, UK
This investigation assessed a number of 2D and 3D surround systems, focusing on the attribute “envelopment” to determine whether surround systems with height significantly enhance the perception of envelopment over current 2D systems. To assess each of the systems, objective and subjective methods were used. The objective method consisted of measuring the IACC (Inter-Aural Cross Correlation) of each reproduction system by reproducing three different types of sound scenes over each system. In addition, a subjective listening test was carried out to evaluate perceived envelopment. The objective measure showed that the introduction of height channels did lower the IACC. Further, the subjective listening test results showed that there were significant differences between the height and horizontal surround systems in terms of envelopment; however, this was dependent on the audio stimulus used. Finally, a comparison of the objective and subjective measures showed a strong negative correlation.
Convention Paper 9108
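The IACC measure used above can be estimated from a binaural recording roughly as follows (a simplified sketch; standardized IACC definitions integrate over specified time windows and octave bands):

```python
import numpy as np

def iacc(left, right, fs, max_lag_ms=1.0):
    """Inter-aural cross-correlation coefficient: the maximum of the
    normalized cross-correlation between the two ear signals over
    lags of +/-1 ms (the conventional interaural delay range)."""
    max_lag = int(fs * max_lag_ms / 1000.0)
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.sum(left[lag:] * right[:len(right) - lag])
        else:
            c = np.sum(left[:lag] * right[-lag:])
        best = max(best, abs(c) / norm)
    return best
```

Identical ear signals give an IACC of 1; decorrelated signals (as reported for the height-channel systems) give values closer to 0.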
P3-4 An Architecture for Reverberation in High Order Ambisonics—Fernando Lopez-Lezcano, CCRMA, Stanford University - Stanford, CA, USA
This paper describes a reverberation architecture implemented within the signal chain of a periphonic HOA (High Order Ambisonics) audio stream. An HOA signal (3rd order in the example implementation) representing the dry source signal is decoded into an array of virtual sources uniformly distributed within the reverberant space being simulated. These virtual sources are convolved with independent, decorrelated impulse responses, optionally tailored to model spatial variations of the simulated reverberation. The output of each convolver is then encoded back into High Order Ambisonics and mixed with the original Ambisonics dry signal. The result is a convolution reverberation engine with a HOA input that outputs HOA and maintains the spatial characteristics of the input signal.
Convention Paper 9109
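Reduced to first-order 2-D ambisonics, the decode → convolve → re-encode chain described above might look like this (a sketch only; the paper works with 3rd-order periphonic HOA and spatially tailored impulse responses):

```python
import numpy as np

def hoa_reverb_2d(bformat, virt_angles, irs, mix=0.3):
    """Sketch of the decode->convolve->re-encode reverb architecture,
    reduced to first-order 2-D ambisonics (W, X, Y components).

    bformat: (3, N) dry signal; virt_angles: virtual-source azimuths (rad);
    irs: one decorrelated impulse response per source (all equal length).
    """
    W, X, Y = bformat
    wet = np.zeros((3, W.size + len(irs[0]) - 1))
    for az, h in zip(virt_angles, irs):
        # Basic (projection) decode of the dry field toward this source.
        feed = W + np.cos(az) * X + np.sin(az) * Y
        rev = np.convolve(feed, h)          # independent reverb per source
        # Encode the reverberated source back at the same azimuth.
        wet[0] += rev
        wet[1] += np.cos(az) * rev
        wet[2] += np.sin(az) * rev
    out = wet * mix
    out[:, :W.size] += bformat              # mix with the original dry signal
    return out
```

The key property the paper highlights is preserved here: input and output are both ambisonic signals, so the reverb sits inside the HOA signal chain.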
P3-5 Spatial Calibration of Surround Sound Systems including Listener Position Estimation—Guangji Shi, DTS, Inc. - Los Gatos, CA, USA; Martin Walsh, DTS Inc. - Los Gatos, CA, USA; Edward Stein, DTS, Inc. - Los Gatos, CA, USA
While most modern surround sound formats specify ideal loudspeaker placement, it is often impractical to comply with these specifications in most homes. In this paper we propose a simplified approach to spatial calibration for incorrectly set up surround sound systems. The proposed system utilizes a microphone array embedded into a component of the reproduction system whose location is predictable, such as a sound bar or a front center speaker. In addition to estimating loudspeaker positions, the proposed system is able to estimate a listener’s position using voice input and calibrate the surround sound system utilizing the estimated listener position. Tests conducted in a typical living room setup show that the proposed system is able to improve the listening experience on such compromised systems with only simple user interactions such as voice commands.
Convention Paper 9110
P3-6 Comparison of Pressure-Matching and Mode-Matching Beamforming Methods for Circular Loudspeaker Arrays—Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK; Mincheol Shin, ISVR, University of Southampton - Southampton, Hampshire, UK; Ferdinando Olivieri, University of Southampton - Southampton, Hampshire, UK; Simone Fontana, Huawei European Research Center - Munich, Germany; Yue Lang, Huawei European Research Center - Munich, Germany
Pressure-matching and mode-matching are two well-known strategies used for the computation of beamforming digital filters for microphone and loudspeaker arrays. A theoretical comparison is presented of these two methods when these are applied to a circular loudspeaker array mounted on a rigid cylinder. The pressure-matching method is used to generate the desired acoustic pressure at a number of control points arranged in the far field of a circular loudspeaker array, while in the case of mode-matching an attempt is made to minimize the squared error between the Fourier coefficients that represent the reproduced and target radiation pattern of the array. It is shown that, in the case under consideration, the two strategies are identical if the effect of spatial aliasing is negligible.
Convention Paper 9111
P3-7 MIAP: Manifold-Interface Amplitude Panning in Max/MSP and Pure Data—Zachary Seldess, Sonic Arts R&D, UC San Diego - San Diego, CA, USA
This paper discusses MIAP (Manifold-Interface Amplitude Panning), a new freely available implementation of Meyer Sound’s SpaceMap abstract spatialization software via a collection of C externals for Max/MSP and Pure Data. SpaceMap’s technical and conceptual innovations are discussed and placed within the larger context of widely available codified spatialization algorithms and approaches such as Vector-base amplitude panning. An examination of the new implementation is made along with discussion of added features resulting from the translation.
Convention Paper 9112
P3-8 Aurally Aided Visual Search Performance Comparing Virtual Audio Systems—Camilla H. Larsen, Aalborg University - Aalborg, Denmark; David S. Lauritsen, Aalborg University - Aalborg, Denmark; Jacob J. Larsen, Aalborg University - Aalborg, Denmark; Marc Pilgaard, Aalborg University - Aalborg, Denmark; Jacob B. Madsen, Aalborg University - Aalborg, Denmark; Rasmus Stenholt, Aalborg University - Aalborg, Denmark
Due to increased computational power, reproducing binaural hearing in real-time applications through the use of head-related transfer functions (HRTFs) is now possible. This paper addresses the differences in aurally aided visual search performance between an HRTF-enhanced audio system (3D) and an amplitude panning audio system (panning) in a virtual environment. We present a performance study involving 33 participants locating aurally aided visual targets placed at fixed positions, under different audio conditions. A varying number of visual distractors was present, represented as black circles with white dots. The results indicate that 3D audio yields faster search latencies than panning audio, especially with larger numbers of distractors. The applications of this research could fit virtual environments such as video games or virtual simulations.
Convention Paper 9150
P4 - Transducers—Part 1
Thursday, October 9, 2:30 pm — 5:30 pm
Joerg Panzer, R&D Team - Salgen, Germany
P4-1 The Dynamics Detection and Processing Method for Preventing Large Displacement Transducer Damage Problem—Yu-Ting Tsai, Feng Chia University - Taichung, Taiwan; Jin H. Huang, Feng Chia University - Taichung, Taiwan
A method for avoiding damage to the transducer diaphragm caused by large displacement is presented in this study. To account for displacement and supply current, a set of transduction equations with receiver parameters is established. A numerical solver obtains the displacement values from the transduction equations and then limits the peaks of the current and coil velocity using a dynamics limiter. Once peak limiting is applied, a safe input voltage is obtained that prevents impulsive displacement damage. The numerical and experimental results indicate that the proposed method is efficient and applicable to a wide range of transducers.
Convention Paper 9113
P4-2 Specifying Xmax for Micro-Speakers and Smart Amplifiers—Géraldine Vignon, NXP - Leuven, Belgium; Shawn Scarlett, NXP - Nijmegen, The Netherlands
Smart amplifiers are a new generation of amplifiers using a real-time feedback loop to ensure the speaker stays within its physical limits, and they are capable of delivering significantly more output power than previously possible. This jump requires the definition of new performance ratings for micro-speakers. The purpose of this paper is to recommend test procedures and methods for the specification of a maximal speaker-membrane excursion (Xmax) suitable for an application that uses a smart amplifier. The proposed set of criteria to characterize this usable excursion should allow micro-speaker vendors to provide optimized and reliable solutions to their customers.
Convention Paper 9114
P4-3 Comparative Study of Si and SiC MOSFET for High Voltage Class D Audio Amplifiers—Dennis Nielsen, Technical University of Denmark - Kgs. Lyngby, Denmark; Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark; Michael A. E. Andersen, Technical University of Denmark - Kgs. Lyngby, Denmark
Silicon (Si) Metal–Oxide–Semiconductor Field-Effect Transistors (MOSFETs) are traditionally utilized in class D audio amplifiers. It has been proposed to replace the traditional, inefficient electrodynamic transducer with the electrostatic transducer. This imposes new high-voltage requirements on the MOSFETs of class D amplifiers and significantly reduces the selection of suitable MOSFETs. As a consequence, it is investigated whether Silicon-Carbide (SiC) MOSFETs could represent a valid alternative. The theory of pulse timing errors is revisited for the application of high-voltage, capacitively loaded class D amplifiers. It is shown that SiC MOSFETs can compete with Si MOSFETs in terms of THD. Validation is done using calculations and a ± 500 V amplifier driving a 100 nF load. THD+N below 0.3% is reported.
Convention Paper 9115
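A THD+N figure like the one quoted can be estimated from a recorded sine response roughly as follows (an FFT-notch sketch under assumed conditions; dedicated audio analyzers use calibrated notch filters and weighting curves):

```python
import numpy as np

def thd_n(signal, fs, f0):
    """THD+N as the ratio of non-fundamental RMS to total RMS, estimated
    by notching the fundamental out of a windowed FFT magnitude spectrum."""
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    k0 = int(round(f0 * len(signal) / fs))      # fundamental bin index
    total = np.sqrt(np.sum(spec ** 2))
    guard = 3                                   # bins kept around fundamental
    fund = np.sqrt(np.sum(spec[k0 - guard:k0 + guard + 1] ** 2))
    residual = np.sqrt(max(total ** 2 - fund ** 2, 0.0))
    return residual / total
```

A pure sine yields a value near zero; adding a harmonic 20 dB below the fundamental yields roughly 0.1 (i.e., 10% THD+N).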
P4-4 Adaptive Stabilization of Electro-Dynamical Transducers—Wolfgang Klippel, Klippel GmbH - Dresden, Germany
A new control technique for electro-dynamical transducers is presented that stabilizes the voice coil position, compensates for nonlinear distortion, and generates a desired transfer response by preprocessing the electrical input signal. The control law is derived from transducer modeling using lumped elements, and all free parameters of the model are identified by monitoring the electrical signals at the transducer terminals. The control system stays operative for any stimulus, including music and other audio signals. Active stabilization is important for small loudspeakers generating acoustical output at maximum efficiency.
Convention Paper 9116
P4-5 Retrofitting a Complex, Safety-Critical PA System for Periodic Testing—Gregor Schmidle, NTi Audio AG - Schaan, Liechtenstein; Philipp Schwizer, NTi Audio AG - Schaan, Liechtenstein; Winfried Häns, Kernkraftwerk Gundremmingen GmbH - Gundremmingen, Germany
This paper describes considerations and the implementation of retrofitting a fully-automated procedure, for testing a public address system, into a safety-critical environment (a nuclear power plant). There are over 4000 loudspeakers, about 200 amplifiers as well as various alarm-signal generators that need to be tested every day within a few minutes. Additionally, all command room microphones are checked using a semi-automated procedure. The procedures were designed and configured to not only reliably detect single defective components, but also to not produce any false alarms.
Convention Paper 9117
P4-6 The Correlation between Distortion Audibility and Listener Preference in Headphones—Steve Temme, Listen, Inc. - Boston, MA, USA; Sean Olive, Harman International - Northridge, CA, USA; Steve Tatarunis, Listen Inc. - Boston, MA, USA; Todd Welti, Harman International - Northridge, CA, USA; Elisabeth McMullin, Harman International - Northridge, CA USA
It is well-known that the frequency response of loudspeakers and headphones has a dramatic impact on sound quality and listener preference, but what role does distortion have on perceived sound quality? To answer this question, five popular headphones with varying degrees of distortion were selected and equalized to the same frequency response. Trained listeners compared them subjectively using music as the test signal, and the distortion of each headphone was measured objectively using a well-known commercial audio test system. The correlation between subjective listener preference and objective distortion measurement is discussed.
Convention Paper 9118
P5 - Audio Signal Processing
Thursday, October 9, 3:00 pm — 4:30 pm
P5-1 An Evaluation of Chromagram Weightings for Automatic Chord Estimation—Zhengshan Shi, Stanford University - Stanford, CA, USA; Julius O. Smith, III, Stanford University - Stanford, CA, USA
Automatic Chord Estimation (ACE) is a central task in Music Information Retrieval. Generally, audio files are parsed into chroma-based features for further processing in order to estimate the chord being played. Much work has been done to improve the estimation algorithm by means of statistical models for chroma vector transitions, but not as much attention has been given to the loudness model during the feature extraction stage. In this paper we evaluate the effect on chord-recognition accuracy due to the use of various nonlinear transformations and loudness weightings applied to the power spectrum that is “folded” to form the chromagram in which chords are detected. Nonlinear spectral transformations included square-root magnitude, magnitude, magnitude-squared (power spectrum), and dB magnitude. Weightings included A-weighted dB and Gaussian-weighted magnitude.
Convention Paper 9119
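The folding step, with the compared spectral transformations as selectable weightings, can be sketched as follows (illustrative only; the bin-to-pitch-class mapping and A4 = 440 Hz reference are assumptions, not details from the paper):

```python
import numpy as np

def chroma_from_power(power_spec, fs, n_fft, weighting="magnitude", fref=440.0):
    """Fold a power spectrum into a 12-bin chromagram after applying one
    of the nonlinear transformations compared in the paper."""
    transforms = {
        "sqrt": lambda p: p ** 0.25,                 # square-root magnitude
        "magnitude": np.sqrt,                        # |X|
        "power": lambda p: p,                        # |X|^2
        "db": lambda p: 10.0 * np.log10(np.maximum(p, 1e-12)),
    }
    w = transforms[weighting](np.asarray(power_spec, dtype=float))
    chroma = np.zeros(12)
    for k in range(1, len(power_spec)):              # skip the DC bin
        f = k * fs / n_fft
        pitch_class = int(round(12 * np.log2(f / fref))) % 12  # A = class 0
        chroma[pitch_class] += w[k]
    return chroma
```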
P5-2 CUDA Accelerated Audio Digital Signal Processing for Real-Time Algorithms—Nicholas Jillings, Birmingham City University - Birmingham, UK; Yonghao Wang, Birmingham City University - Birmingham, UK
This paper investigates the use of idle graphics processors to accelerate audio DSP for real-time algorithms. Several common algorithms were identified for acceleration and were executed in multiple thread and block configurations to ascertain the desired configuration for each algorithm. GPU and CPU performance on the same data sizes and algorithms is compared. From these results the paper discusses the importance of optimizing code for GPU operation, including allocating shared resources, optimizing memory transfers, and forced serialization of feedback loops. It also introduces a new method for audio processing using GPUs as the default processor instead of an accelerator.
Convention Paper 9120
P5-3 A Modified Variable Step Size NLMS Algorithm for Acoustic Echo Cancellation Application—Youhong Lu, Microsoft - Redmond, WA, USA; Syavosh Zadissa, Microsoft - Redmond, WA, USA
The Variable Step Size (VSS) Normalized Least Mean-Square (NLMS) algorithm has been studied in depth, and numerous publications have covered the technique from both theoretical and practical points of view. This contribution builds on that past work and proposes an improvement: a single-filter approach without any double-talk detection. We show that the proposed technique, in the context of acoustic echo cancellation, offers superior performance in terms of convergence speed and misalignment error, while offering greater resilience to low echo-to-interference ratios and echo path changes.
Convention Paper 9121
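A generic VSS-NLMS update looks like the following (the paper's specific step-size rule is not given in the abstract; the error-power heuristic below is a common stand-in):

```python
import numpy as np

def vss_nlms(x, d, order=32, mu_max=1.0, eps=1e-6):
    """NLMS adaptive filter with a simple variable step size: the step
    shrinks as the smoothed error-to-signal power ratio falls, so the
    filter adapts quickly at first and freezes as it converges."""
    w = np.zeros(order)
    e = np.zeros(len(x))
    p_e = p_d = eps                            # smoothed error / signal powers
    for n in range(order - 1, len(x)):
        u = x[n - order + 1:n + 1][::-1]       # regressor, newest sample first
        e[n] = d[n] - w @ u
        p_e = 0.99 * p_e + 0.01 * e[n] ** 2
        p_d = 0.99 * p_d + 0.01 * d[n] ** 2
        mu = mu_max * min(1.0, p_e / p_d)      # variable step size
        w += mu * e[n] * u / (u @ u + eps)     # normalized LMS update
    return w, e
```

In echo cancellation, `x` would be the far-end signal, `d` the microphone signal, and `w` the estimate of the echo path.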
P5-4 Robust Artificial Bandwidth Extension Technique Using Enhanced Parameter Estimation—Jonggeun Jeon, Hanyang University - Korea; Yaxing Li, Hanyang University - Korea; Sangwon Kang, Hanyang University - Korea; Kihyun Choo, Samsung Electronics Co., Ltd. - Suwon, Korea; Eunmi Oh, Samsung Electronics Co., Ltd. - Suwon, Korea; Hosang Sung, Samsung Electronics - Korea
We propose a robust artificial bandwidth extension (ABE) technique to improve narrowband speech quality using enhanced excitation and spectral envelope estimation. For excitation estimation, we use a whitened narrowband excitation signal, generated by passing the excitation signal through a whitening filter, and introduce an adaptive spectral double-shifting method to obtain an enhanced wideband excitation signal. For envelope estimation, we propose an enhanced combined method using a codebook and linear mapping. The proposed ABE system is applied to the decoded output of an adaptive multi-rate (AMR) codec at 12.2 kbps. We evaluate its performance using spectral distortion, wideband perceptual evaluation of speech quality, and a formal listening test. The objective and subjective evaluations confirm that the proposed ABE system provides better speech quality than AMR at the same bit rate.
Convention Paper 9122
P5-5 Audio Signal Recovery from Single-Frame Randomly Gated Fourier Magnitudes—Dominic Fannjiang, Davis Senior High School - Davis, CA, USA; Albert Fannjiang, University of California, Davis - Davis, CA, USA
Few-frame phase retrieval, motivated by the demands of real-time audio signal processing, is severely ill-posed and fundamentally challenging. This paper is an exploratory study of single-frame phase retrieval of audio signals with two additional ingredients: a random gating function and a symmetry-breaking DC component. In general, randomly phased gating and a suitably chosen DC component can bring the success rate of single-frame phase retrieval to unity and yield accurate, stable, rapidly convergent numerical reconstruction. The tradeoff between the diversity of the gate and the magnitude of the DC component is investigated.
Convention Paper 9123
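To make the measurement model concrete, here is a minimal error-reduction (alternating projection) loop for data of the form |FFT(gate · (x + dc))|, with the random unimodular gate and the DC offset known. This is an illustrative reconstruction scheme, not necessarily the paper's algorithm.

```python
import numpy as np

def retrieve(y_mag, gate, dc, n_iter=200, x0=None, seed=0):
    """Error-reduction phase retrieval sketch for the model
    y_mag = |FFT(gate * (x + dc))| with known gate and DC.

    Alternates between imposing the measured Fourier magnitudes and
    the known gating/DC/realness structure of the signal.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(len(y_mag)) if x0 is None else x0.copy()
    for _ in range(n_iter):
        Z = np.fft.fft(gate * (x + dc))
        Z = y_mag * np.exp(1j * np.angle(Z))      # keep phase, fix magnitude
        x = np.real(np.fft.ifft(Z) / gate) - dc   # undo gate/DC, force real
    return x
```

The true signal is a fixed point of this iteration, since its Fourier magnitudes already match the data exactly.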
P5-6 Triode Emulator: Part 2—Dimitri Danyuk, Consultant - Palmetto Bay, FL, USA
The paper describes a method for accurately emulating triode behavior at high input levels. Under gross overload the grid current becomes the main source of distortion. Measurements of grid current for the popular 12AX7 triode are presented. In the region of interest, the dependence of grid current on the input signal is emulated with a simple circuit. The output harmonic weighting of the emulator is examined and compared with an existing solution. The results can be applied to solid-state guitar amplifiers.
Convention Paper 9124
P5-7 A SIMULINK Toolbox of Sigma-Delta Modulators for High Resolution Audio Conversions—Isacco Arnaldi, Birmingham City University - Birmingham, UK; Yonghao Wang, Birmingham City University - Birmingham, UK
Sigma-delta modulation is the only form of analog-to-digital conversion that allows high bit resolution to be achieved at relatively low cost. There is a lack of tools in academia and industry that allow entry-level engineers to become familiar with the concepts governing this conversion technique, especially for high orders, multi-bit quantization, and different architectures of sigma-delta modulators. The goal of this paper is to present a graphical toolbox, developed in Simulink and based on the behavioral model previously presented in  and available in , that allows the user to simulate and theoretically evaluate ten different architectures, continuous and discrete time, as well as single- and multi-bit implementations of different orders of analog-to-digital sigma-delta modulators.
Convention Paper 9125
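The core mechanism such a toolbox simulates can be reduced to a few lines. Below is a first-order, single-bit, discrete-time sigma-delta modulator as a minimal textbook sketch (the toolbox itself covers ten architectures in Simulink).

```python
import numpy as np

def sigma_delta_first_order(x):
    """First-order single-bit sigma-delta modulator.

    The integrator accumulates the difference between the input and
    the fed-back 1-bit output; for a DC input in (-1, 1) the average
    of the output bitstream converges to the input value.
    """
    u = 0.0                        # integrator state
    y = np.empty(len(x))           # 1-bit output stream (+/- 1)
    prev = 0.0                     # fed-back quantized output
    for i, s in enumerate(x):
        u += s - prev              # integrate input minus feedback
        y[i] = 1.0 if u >= 0 else -1.0
        prev = y[i]
    return y

bits = sigma_delta_first_order(np.full(20000, 0.3))
```

Averaging (decimating) the bitstream recovers the input: for the DC input 0.3 above, the mean of `bits` is 0.3 to within the bounded integrator error divided by the block length.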
P6 - Spatial Audio: Part 3
Friday, October 10, 9:00 am — 12:00 pm
Bob Schulein, RBS Consultants - Schaumburg, IL, USA
P6-1 PHOnA: A Public Dataset of Measured Headphone Transfer Functions—Braxton B. Boren, Princeton University - Princeton, NJ, USA; Michele Geronazzo, University of Padova - Padova, Italy; Piotr Majdak, Austrian Academy of Sciences - Vienna, Austria; Edgar Choueiri, Princeton University - Princeton, NJ, USA
A dataset of measured headphone transfer functions (HpTFs), the Princeton Headphone Open Archive (PHOnA), is presented. Extensive studies of HpTFs have been conducted for the past twenty years, each requiring a separate set of measurements, but this data has not yet been publicly shared. PHOnA aggregates HpTFs from different laboratories, including measurements for multiple different headphones, subjects, and repositionings of headphones for each subject. The dataset uses the spatially oriented format for acoustics (SOFA), and SOFA conventions are proposed for efficiently storing HpTFs. PHOnA is intended to provide a foundation for machine learning techniques applied to HpTF equalization. This shared data will allow optimization of equalization algorithms to provide more universal solutions to perceptually transparent headphone reproduction.
Convention Paper 9126
P6-2 Converting Two-Channel Stereo Signals to B-Format for Directional Audio Coding Reproduction—Mikko-Ville Laitinen, Aalto University - Espoo, Finland
A method for transforming two-channel stereo audio signals to B-format is proposed, which provides unaltered spatial qualities when the B-format signals are reproduced with directional audio coding (DirAC). The proposed method simulates anechoic B-format recordings of the stereo signals with two different virtual loudspeaker configurations, and the simulated B-format signals are combined according to time-frequency analysis of the stereo signals. The analysis is based on estimating the diffuseness of the generated virtual sound field and the coherence between the loudspeaker channels.
Convention Paper 9127
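The anechoic simulation step can be sketched as plain first-order Ambisonic encoding of two virtual loudspeakers at ±30°. The paper additionally blends two such virtual configurations according to time-frequency analysis of the stereo signal, which is omitted in this assumed, simplified sketch.

```python
import numpy as np

def stereo_to_bformat(left, right, spread_deg=30.0):
    """Encode a stereo pair as virtual loudspeakers at +/- spread_deg
    into first-order B-format (W, X, Y), using the convention
    W = s/sqrt(2), X = s*cos(az), Y = s*sin(az) per source.
    """
    az = np.deg2rad(spread_deg)
    w = (left + right) / np.sqrt(2.0)   # omnidirectional component
    x = (left + right) * np.cos(az)     # front/back figure-of-eight
    y = (left - right) * np.sin(az)     # left/right figure-of-eight
    return w, x, y
```

A hard-left signal then encodes exactly as a single source at +30°, and a center-panned signal produces no Y component.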
P6-3 Binaural Reproduction over Loudspeakers Using Low-Order Modeled HRTFs—Kentaro Matsui, Keio University - Yokohama-shi, Kanagawa, Japan; NHK Science & Technology Research Laboratories - Setagaya, Tokyo, Japan; Yasushige Nakayama, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Maho Sugaya, Keio University - Yokohama-shi, Kanagawa, Japan; Shuichi Adachi, Keio University - Yokohama-shi, Kanagawa, Japan
A method for binaural reproduction over loudspeakers using low-order modeled head-related transfer functions (HRTFs) is proposed. The low-order modeling consists of two steps: high-order model estimation using a prediction error method and subsequent model reduction based on asymptotic theory. Binaural processing over loudspeakers using the low-order modeled HRTFs is done in the time domain. In general, the directly derived controller for crosstalk cancellation is unstable, and so a method for approximating the unstable components in the controller as stable ones with processing delays is proposed. Results of computer simulation indicated that the designed controller worked well for producing equalization and crosstalk cancellation.
Convention Paper 9128
P6-4 Assessment of Ambisonic System Performance Using Binaural Measurements—Eric M. Benjamin, Surround Research - Pacifica, CA, USA; Aaron Heller, SRI International - Menlo Park, CA, USA
The phenomenon described by Solvang as spectral impairment in Ambisonic reproduction is examined. The timbre of reproduced sounds is arguably the most important aspect of an audio system. In multichannel systems audio is almost always reproduced through two or more loudspeakers simultaneously. The combination of those signals produces variable localization, but interference between them also causes comb filtering, which reduces output at high frequencies. The present work reports on measurements, including binaural measurements, of the spectral changes encountered in Ambisonic systems. When a system has more loudspeakers than the minimum required, the amount of interference is increased. What, then, is the best choice when an array designed for higher-order reproduction is used to reproduce lower-order program?
Convention Paper 9129
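The comb filtering referred to above follows directly from summing two coherent arrivals with a path-length difference. A minimal free-field calculation, assuming equal gains purely for illustration:

```python
import numpy as np

def two_path_response(freqs, delta_path, c=343.0):
    """Magnitude of the sum of two coherent, equal-gain arrivals whose
    path lengths differ by delta_path meters: |1 + exp(-j*2*pi*f*d/c)|.
    Nulls fall at odd multiples of c / (2 * delta_path).
    """
    f = np.asarray(freqs, dtype=float)
    return np.abs(1.0 + np.exp(-2j * np.pi * f * delta_path / c))
```

A path difference of 0.343 m (a 1 ms delay) puts the first null at 500 Hz and an in-phase doubling at 1 kHz.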
P6-5 The Design, Calibration, and Validation of a Binaural Recording and Playback System for Headphone and Two-Speaker 3D-Audio Reproduction—Bob Schulein, RBS Consultants - Schaumburg, IL, USA; Dan Mapes-Riordan, Etymotic Research - Elk Grove Village, IL, USA; DMR Consultants - Evanston, IL, USA
The evolution of iOS, Android, Windows Mobile, and other operating systems has fueled a rapid growth in personal entertainment products and has revolutionized the way consumers receive, control, and listen to audio content. Headphones or earphones and two-speaker stereo have become the dominant means of listening. Multichannel loudspeaker audio systems, in contrast, are primarily part of the motion picture and home theater experience and can deliver richer spatial content. This 3D or immersive audio experience is desired by consumers but is not part of the typical listening experience. Binaural sound, reproduced over headphones or two speakers using crosstalk cancellation techniques, has been shown to provide significant spatial audio benefits when properly implemented. This paper presents a detailed look at how these technologies are being refined and applied today to create entertainment content with significantly improved spatial qualities.
Convention Paper 9130
P6-6 Analytical Interaural Time Difference Model for the Individualization of Arbitrary Head-Related Impulse Responses—Ramona Bomhardt, RWTH Aachen University - Aachen, Germany; Janina Fels, RWTH Aachen University - Aachen, Germany
If dummy-head Head-Related Impulse Responses (HRIRs) are used for binaural reproduction, virtual sound sources can be perceived incorrectly; measuring individual HRIRs, on the other hand, poses an enormous measurement effort. Therefore, this paper presents a model that calculates the Interaural Time Difference from anthropometric head data. By means of this Interaural Time Difference, the time of arrival of arbitrary HRIRs can be individualized. The model is compared with 13 individually measured HRIRs and the subjects' anthropometric head data. The comparison leads to the conclusion that the model works well for individualizing the time of arrival of an arbitrary HRIR.
Convention Paper 9131
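A classic analytic ITD model of this kind is Woodworth's spherical-head formula, shown below as an assumed illustration; the paper's actual model and anthropometric parameters may differ.

```python
import numpy as np

def itd_spherical_head(azimuth_deg, head_radius=0.0875, c=343.0):
    """Woodworth spherical-head ITD: (a/c) * (theta + sin(theta)),
    valid for azimuths from 0 to 90 degrees, where the head radius a
    is the single anthropometric parameter.
    """
    th = np.deg2rad(azimuth_deg)
    return head_radius / c * (th + np.sin(th))
```

Individualization then amounts to scaling `head_radius` from a subject's measured head dimensions; for the default radius the ITD at 90° is about 0.66 ms.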
P7 - Cinema Sound, Recording and Production
Friday, October 10, 9:00 am — 11:30 am
Scott Levine, Skywalker Sound - San Francisco, CA, USA; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada
P7-1 Particle Systems for Creating Highly Complex Sound Design Content—Nuno Fonseca, ESTG/CIIC, Polytechnic Institute of Leiria - Leiria, Portugal
Even with current audio technology, many sound design tasks present practical constraints in terms of layering sounds, creating sound variations, fragmenting sound, and ensuring spatial distribution, especially in highly complex scenarios with a significant number of audio sources. This paper presents the use of particle systems and virtual microphones as a new approach to sound design, allowing the mixing of thousands or even millions of sound sources without laborious work and providing true coherence between sound and space, with support for several surround formats, Ambisonics, binaural, and even partial Dolby Atmos support. By controlling a particle system instead of individual sound sources, a large number of sounds can easily be spread over a virtual space. By adding movement or random audio effects, even complex scenarios can be created.
Convention Paper 9132
P7-2 Stage Metaphor Mixing on a Multi-Touch Tablet Device—Steven Gelineck, Aalborg University Copenhagen - Copenhagen, Denmark; Dannie Korsgaard, Aalborg University - Copenhagen, Denmark
This paper presents a tablet based interface (the Music Mixing Surface) for supporting a more natural user experience while mixing music. It focuses on the so-called stage metaphor control scheme where audio channels are represented by virtual widgets on a virtual stage. Through previous research the interface has been developed iteratively with several evaluation sessions with professional users on different platforms. The iteration presented here has been developed especially for the mobile tablet platform and explores this format for music mixing both in a professional and casual setting. The paper first discusses various contexts in which the tablet platform might be optimal for music mixing. It then describes the overall design of the mixing interface (especially focused on the stage metaphor), after which the iOS implementation is briefly described. Finally, the interface is evaluated in a qualitative user study comparing it to two alternative existing tablet solutions. Results are presented and discussed focusing on how the evaluated interfaces invite four different forms of exploration of the mix and on what consequences this has in a mobile mixing context.
Convention Paper 9133
P7-3 The Duplex Panner: Comparative Testing and Applications of an Enhanced Stereo Panning Technique for Headphone-Reproduced Commercial Music—Samuel Nacach, New York University - New York, NY, USA
As a result of recent technological advances, consumers primarily interact with recorded music on the go through headphones. Yet music is primarily mixed on stereo loudspeaker systems, where crosstalk signals are present; these signals are absent in headphone reproduction. Consequently, the audio engineer's intended sound image collapses over headphones. To solve this, the work presented in this paper examines existing 3D audio techniques, primarily binaural audio and Ambiophonics, and enhances them to develop a novel and improved mixing technique, the Duplex Panner, for headphone-reproduced commercial music. Through subjective experiments designed for two groups, the Duplex Panner is compared to conventional stereo panning to determine what the advantages are, if any.
Convention Paper 9134
P7-4 The Role of Acoustic Condition on High Frequency Preference—Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Brett Leonard, University of Nebraska at Omaha - Omaha, NE; McGill University - Montreal, Quebec, Canada; Stuart Bremner, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Grzegorz Sikora, Bang & Olufsen Deutschland GmbH - Pullach, Germany
Subjective preference for high frequency content in music program has shown a wide variance in baseline testing involving expert listeners. The same well-trained subjects are retested for consistency in setting a high frequency shelf equalizer to a preferred level under varying acoustic conditions. Double-blind testing indicates that lateral energy significantly influences high frequency preference. Furthermore, subject polling indicates that blind preference of acoustic condition is inversely related to optimal consistency when performing high frequency equalization tasks.
Convention Paper 9135
P7-5 Listener Preferences for Analog and Digital Summing Based on Music Genre—Eric Tarr, Belmont University - Nashville, TN, USA; Jane Howard, Belmont University - Nashville, TN, USA; Benjamin Stager, Belmont University - Nashville, TN, USA
The summation of multiple audio signals can be accomplished using digital or analog technologies. Digital summing and analog summing are not identical processes and, therefore, produce different results. In this study digital summing and analog summing were performed separately on the audio signals of three different recordings of music. These recordings represented three genres of music: classical, pop/country, and heavy rock. Twenty-one listeners participated in a preference test comparing digital summing to analog summing. Results indicated that listeners preferred one type of summing to the other; this preference was dependent on the genre of music.
Convention Paper 9136
P8 - Perception: Part 1
Friday, October 10, 2:00 pm — 5:00 pm
Dan Mapes-Riordan, Etymotic Research - Elk Grove Village, IL, USA; DMR Consultants - Evanston, IL, USA
P8-1 Effect of Phase on the Perceived Level of Bass—Mikko-Ville Laitinen, Aalto University - Espoo, Finland; Kai Jussila, Aalto University - Espoo, Finland; Ville Pulkki, Aalto University - Espoo, Finland; Technical University of Denmark - Denmark
The perceived level of bass is typically considered to be related to the level of the magnitude spectrum at the corresponding frequencies. However, it has recently been found that, in the case of harmonic complex signals, the phase spectrum can also affect it. This paper studies this effect further using formal listening tests. It is found that the phase spectrum producing the perception of the loudest bass depends on the individual. Furthermore, the loudness of the bass appears to be affected by the phase characteristics of the tone over a relatively wide band.
Convention Paper 9137
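The stimuli in such tests are harmonic complexes that share a magnitude spectrum but differ in phase spectrum. A small generator (the parameters here are illustrative, not the paper's):

```python
import numpy as np

def harmonic_complex(f0, n_harm, fs, dur, phases):
    """Sum of unit-amplitude harmonics of f0 with a given phase
    spectrum. Changing `phases` alters the waveform (and potentially
    the perceived bass) while leaving the magnitude spectrum fixed.
    """
    t = np.arange(int(fs * dur)) / fs
    k = np.arange(1, n_harm + 1)[:, None]       # harmonic numbers
    ph = np.asarray(phases)[:, None]
    return np.sin(2 * np.pi * f0 * k * t + ph).sum(axis=0)
```

Two complexes with zero versus random phases then have identical magnitude spectra but clearly different waveforms.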
P8-2 Auditory Compensation for Spectral Coloration—Cleopatra Pike, University of Surrey - Guildford, Surrey, UK; Russell Mason, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK
The “spectral compensation effect” (Watkins, 1991) describes a decrease in perceptual sensitivity to spectral modifications caused by the transmission channel (e.g., loudspeakers, listening rooms). Few studies have examined this effect: its extent and perceptual mechanisms are not confirmed. The extent to which compensation affects the perception of sounds colored by loudspeakers and other channels should be determined. This compensation has been mainly studied with speech. Evidence suggests that speech engages special perceptual mechanisms, so compensation might not occur with non-speech sounds. The current study provides evidence of compensation for spectrum in nonspeech tests: channel coloration was reduced by approximately 20%.
Convention Paper 9138
P8-3 The Importance of Onset Features in Listeners’ Perception of Vocal Modes in Singing—Eddy B. Brixen, EBB-consult - Smorum, Denmark; Cathrine Sadolin, Complete Vocal Institute - Copenhagen, Denmark; Henrik Kjelin, Complete Vocal Institute - Copenhagen, Denmark
The Complete Vocal Technique defines four vocal modes: Neutral, Curbing, Overdrive, and Edge. This paper reports the results of a listening test involving 59 subjects. The goal has been to determine the importance of onset and decay features in identifying the vocal modes. The conclusion is that the onset is only to a minor degree responsible for the aural detection of vocal modes.
Convention Paper 9139
P8-4 The Influence of Up- and Down-mixes on the Overall Listening Experience—Michael Schoeffler, International Audio Laboratories Erlangen - Erlangen, Germany; Alexander Adami, International Audio Laboratories Erlangen - Erlangen, Germany; Jürgen Herre, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany
Former studies have shown that up- and down-mix algorithms have a significant effect on ratings of audio quality. The question arises whether this significant effect is also verifiable when it comes to rating the overall listening experience of music. When listeners rate the overall listening experience, they are allowed to take everything into account that is important to them for enjoying a listening experience. An experiment was conducted where 25 participants rated the overall listening experience while listening to music that was artistically mixed and up- and down-mixed by six algorithms. The results show that there are no significant differences between the artistic mixes and the up- and down-mix algorithms except for two mixing algorithms which served as “lower anchors” and had a significant negative effect on the ratings.
Convention Paper 9140
P8-5 Measures of Microdynamics—Esben Skovenborg, TC Electronic - Risskov, Denmark
Overall loudness variations such as the distance between soft and loud scenes of a movie are known as macrodynamics and can be quantified with the Loudness Range measure. Microdynamics, in contrast, concern variations on a (much) finer time-scale. In this study six types of objective measures—some based on loudness level, some based on peak-to-average ratio—were evaluated against perceived microdynamics. A novel measure LDR, based on the maximum difference between a “fast” and a “slow” loudness level, had the strongest perceptual correlation. Peak-to-average ratio (or crest factor) type of measures had little or no correlation. The ratings of perceived microdynamics were obtained in a listening experiment, with stimuli consisting of music and speech of different dynamical properties.
Convention Paper 9141
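The idea of a maximum fast-minus-slow loudness-level difference can be sketched as follows, with plain running RMS standing in for the loudness model and guessed window lengths; both are assumptions, and the paper's LDR definition is more elaborate.

```python
import numpy as np

def ldr_sketch(x, fs, fast_s=0.025, slow_s=1.0, eps=1e-12):
    """Maximum difference in dB between a 'fast' and a 'slow' running
    level. Bursty (micro-dynamic) material scores high; steady
    material scores near zero.
    """
    def level_db(sig, win_s):
        n = max(1, int(win_s * fs))
        p = np.convolve(sig ** 2, np.ones(n) / n, mode="same")
        return 10.0 * np.log10(p + eps)

    return float(np.max(level_db(x, fast_s) - level_db(x, slow_s)))
```

A steady sine scores near zero, while the same sine with short loud bursts scores markedly higher, matching the intuition that peak-to-average measures miss while fast/slow level differences capture it.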
P8-6 Real-Time Infant Cry Detection in Diverse Environments: A Novel Approach—Anant Baijal, Samsung Electronics Co. Ltd. - Suwon, Korea; Jinsung Kim, Samsung Electronics Co. Ltd. - Suwon, Republic of Korea; Jae-hoon Jeong, Samsung Electronics Co. Ltd. - Suwon, Korea; Inwoo Hwang, Samsung Electronics Co. Ltd. - Suwon-si, Gyeonggi-do, Korea; JungEun Park, Samsung Electronics Co. Ltd. - Suwon, Korea; Byeong-Seob Ko, Samsung Electronics Co. Ltd. - Suwon, Korea
We present a novel approach to detect infant cry in actual outdoor and indoor settings. Using computationally inexpensive features such as Mel Frequency Cepstral Coefficients (MFCCs) and timbre-related features, the proposed algorithm yields very high recall rates for detecting infant cry in challenging settings such as café, street, playground, office, and home environments, even when the Signal to Noise Ratio (SNR) is as low as 6 dB, while maintaining high precision. The results indicate that our approach is highly accurate, robust, and works in real time.
Convention Paper 9142
P9 - Transducers—Part 2
Friday, October 10, 2:00 pm — 4:30 pm
Juha Backman, Microsoft - Espoo, Finland
P9-1 The Implementation of MEMS Microphones for Urban Sound Sensing—Charlie Mydlarz, New York University - New York, NY, USA; Samuel Nacach, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA; Tae Hong Park, New York University - New York, NY, USA
The urban sound environment of New York City (NYC) is notoriously loud and dynamic. The current project aims to deploy a large number of remote sensing devices (RSDs) throughout the city, to accurately monitor and ultimately understand this environment. To achieve this goal, a process of long-term and continual acoustic measurement is required, due to the complex and transient nature of the urban soundscape. Urban sound recording requires the use of robust and resilient microphone technologies, where unpredictable external conditions can have a negative impact on acoustic data quality. For the presented study a large-scale deployment is necessary to accurately capture the geospatial and temporal characteristics of urban sound. As such, an implementation of this nature requires a high-quality, low-power, and low-cost solution that can scale viably. This paper details the microphone selection process involving the comparison between a range of consumer and custom made MEMS microphone solutions in terms of their environmental durability, frequency response, dynamic range, and directivity. Ultimately a MEMS solution is proposed based on its superior resilience to varying environmental conditions and preferred acoustic characteristics.
Convention Paper 9143
P9-2 Graphene Oxide Based Materials as Acoustic Transducers: A Ribbon Microphone Application Case Study—Peter Gaskell, McGill University - Montreal, Quebec, Canada; GKL Audio Inc. - Montreal, Quebec, Canada; Robert-Eric Gaskell, McGill University - Montreal, QC, Canada; GKL Audio Inc. - Montreal, QC, Canada; Jung Wook (Jonathan) Hong, McGill University - Montreal, QC, Canada; GKL Audio Inc. - Montreal, QC, Canada; Thomas Szkopek, McGill University - Montreal, Quebec, Canada
Materials used in acoustic transducer membranes must meet very specific requirements, and any real system involves many tradeoffs. Graphene and graphene-related materials are a newly discovered class of materials with exceptional properties that have the potential to make significant contributions to the performance of many acoustical transduction systems. The properties of graphene relevant to transducer applications are discussed, and two graphene-based films, an aluminum-coated graphene oxide film and an aluminum-coated reduced graphene oxide film, are tested in a ribbon microphone application. Physical and acoustical measurements of the films indicate that, with minor improvements, ribbon transducers could significantly benefit from graphene-based materials.
Convention Paper 9144
P9-3 Subwoofers in Rooms: Stereophonic Reproduction—Juha Backman, Microsoft - Espoo, Finland
A study based on a computational model of interaural level and time differences at the lowest audio frequencies, often reproduced through subwoofers, is presented. This work studies whether interaural differences can exist and, if they do, what relationship there is between loudspeaker direction and the interaural differences when monophonic and stereophonic subwoofer arrangements are considered. The calculations are made both for simple amplitude-panned signals and for simulated microphone signals. The results indicate that strong narrow-band differences can exist, especially near room eigenfrequencies when the listener is close to nodes of the room modes, and that the modes of the recording room can affect the sound field of the listening room.
Convention Paper 9145
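The eigenfrequencies referred to above are those of a rectangular room; a quick calculation under the standard rigid-wall assumption:

```python
import numpy as np
from itertools import product

def room_modes(lx, ly, lz, fmax=100.0, c=343.0):
    """Eigenfrequencies of a rigid-walled rectangular room below fmax:
    f = (c/2) * sqrt((nx/lx)^2 + (ny/ly)^2 + (nz/lz)^2).
    Returns (frequency, (nx, ny, nz)) pairs sorted by frequency.
    """
    nmax = int(2 * fmax * max(lx, ly, lz) / c) + 1
    modes = []
    for nx, ny, nz in product(range(nmax + 1), repeat=3):
        if nx == ny == nz == 0:
            continue                     # skip the trivial (0,0,0) case
        f = 0.5 * c * np.sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)
        if f <= fmax:
            modes.append((f, (nx, ny, nz)))
    return sorted(modes)
```

For a 5 m x 4 m x 3 m room, the lowest axial mode lies at 343/10 = 34.3 Hz along the long dimension, with the next at 42.875 Hz.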
P9-4 Subwoofer Design with Moving Magnet Linear Motor—Mario Di Cola, Audio Labs Systems - Casoli (CH), Italy; Claudio Lastrucci, Powersoft S.p.a. - Scandicci (FI), Italy; Lorenzo Lombardi, Powersoft S.p.a. - Scandicci, FL, Italy
A new electro-dynamic transducer based on a moving-magnet linear motor, rather than a traditional moving coil, has been studied; it was described in detail in a recently presented paper by Claudio Lastrucci. The moving-magnet motor can considerably improve conversion efficiency and sound quality in the lowest frequency range. Built around a fully balanced and symmetrical moving-magnet geometry, it can reduce distortion in the lowest range to a fraction of that of a conventional moving-coil loudspeaker in the same range, while also offering considerably higher power handling and overall robustness, and is thus able to reproduce the lowest part of the bass spectrum with an unprecedented level of quality and output. The motor also exhibits considerably high acceleration, making it suitable for the upper bass region as well. This paper reviews the methodology that can be pursued in subwoofer design using this motor technology, which requires a different approach from conventional designs. Aspects in common with conventional loudspeakers are outlined, as are the characteristics that differ significantly, and the application of the technology and its results are shown through practical examples and measurements.
Convention Paper 9146
P9-5 A Direct Driver for Electrostatic Transducers—Dennis Nielsen, Technical University of Denmark - Kgs. Lyngby, Denmark; Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark; Michael A. E. Andersen, Technical University of Denmark - Kgs. Lyngby, Denmark
Electrostatic transducers represent a very interesting alternative to traditional, inefficient electrodynamic transducers. In order to establish the full potential of these transducers, power amplifiers that fulfill the strict requirements imposed by such loads (high impedance, frequency-dependent, nonlinear, and requiring a high bias voltage for linearization) must be developed. This paper analyzes a power stage suitable for driving an electrostatic transducer under biasing. Measurement results of a ±400 V prototype amplifier are shown; THD below 1% is reported.
Convention Paper 9147
P10 - Spatial Audio
Friday, October 10, 2:30 pm — 4:00 pm
P10-1 An Object-Based Audio System for Interactive Broadcasting—Robert Oldfield, University of Salford - Salford, Greater Manchester, UK; Ben Shirley, University of Salford - Salford, Greater Manchester, UK; Jens Spille, Technicolor, Research and Innovation - Hannover, Germany
This paper describes audio recording, delivery, and rendering for an end-to-end broadcast system allowing users free navigation of panoramic video content with matching interactive audio. The system is based on one developed as part of the EU FP7 funded project, FascinatE. The premise of the system was to allow users free navigation of an ultra-high definition 180 degree video panorama for a customizable viewing experience. From an audio perspective the complete audio scene is recorded and broadcast so the rendered sound scene at the user end may be customized to match the view point. The approach described here uses an object-based audio paradigm. This paper presents an overview of the system and describes how such a system is useful for facilitating an interactive broadcast.
Convention Paper 9148
P10-2 Effect of Headphone Equalization on Auditory Distance Perception—Kaushik Sunder, Nanyang Technological University - Singapore, Singapore; Ee-Leng Tan, Nanyang Technological University - Singapore, Singapore; Woon-Seng Gan, Nanyang Technological University - Singapore, Singapore
Headphones are not acoustically transparent and thus affect both the timbral and the spatial quality of the input sound source. The effect of the headphones has to be compensated by calculating the inverse of the headphone transfer function and convolving it with the binaurally synthesized audio. The headphone transfer function (HPTF) also depends on the headphone-ear coupling and thus displays high spectral variation between individuals. It has been found that the type of equalization (individual or non-individual) affects the directional perception of virtual audio reproduced over headphones. However, little investigation has been carried out on the effect of headphone equalization on auditory distance perception. In this paper we study in detail the perceptual effects of equalization on auditory distance perception in the proximal region under anechoic conditions. It was found that headphone equalization is critical for good distance perception. The type of equalization (individual or non-individual) did not have a significant effect on auditory distance perception, indicating that distance perception does not depend on idiosyncratic features. The effect of repositioning the headphones on auditory depth perception is also studied in this work.
Convention Paper 9149
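The inversion step can be sketched as a regularized least-squares inverse in the frequency domain. The regularization constant and modeling delay below are common illustrative choices, not values taken from the paper.

```python
import numpy as np

def headphone_inverse(h, n_fft=1024, beta=0.01):
    """Regularized inverse filter for a measured headphone impulse
    response h: H_inv = conj(H) / (|H|^2 + beta). The constant beta
    limits the boost applied at deep spectral notches, and the
    half-window circular shift acts as a modeling delay so the
    resulting FIR filter is usable causally.
    """
    H = np.fft.rfft(h, n_fft)
    H_inv = np.conj(H) / (np.abs(H) ** 2 + beta)
    return np.roll(np.fft.irfft(H_inv, n_fft), n_fft // 2)
```

Convolving a response with its regularized inverse leaves a nearly flat magnitude, with slight dips where the regularization floor dominates.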
P10-3 Perceptual Evaluation of Loudspeaker Binaural Rendering Using a Linear Array—Ismael Nawfal, Beats Electronics - Culver City, CA, USA; Joshua Atkins, Apple, Inc. - Culver City, CA, USA; Stephen Nimick, Beats Electronics, LLC - Culver City, CA, USA
In this paper we evaluate two different techniques for spatial rendering using various linear array arrangements and filter lengths in the context of their perceived ability to render a given sound event. The two techniques explored are a recently introduced numerical technique and a conventional crosstalk cancellation system. Extensive perceptual evaluations were conducted in order to evaluate the perceived quality of the proposed and conventional synthesis methods using a binaural representation over headphones. The data were compiled to show the relationship between linear array loudspeaker arrangement, reproduction angle, filter length, and subjective mean opinion scores.
Convention Paper 9151
P10-4 Uncorrelated Input Signals Design and Identification with Low-Complexity for Simultaneous Estimation of Head-Related Transfer Functions—Sekitoshi Kanai, Keio University - Yokohama-shi, Kanagawa, Japan; Kentaro Matsui, Keio University - Yokohama-shi, Kanagawa, Japan; NHK Science & Technology Research Laboratories - Setagaya, Tokyo, Japan; Yasushige Nakayama, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Shuichi Adachi, Keio University - Yokohama-shi, Kanagawa, Japan
In our previous study we verified that a set of head-related transfer functions (HRTFs) can be estimated simultaneously by treating it as a multi-input single-output (MISO) system. However, this approach lacks accuracy if appropriate input signals are not chosen, and the estimation incurs a high computational cost. To improve the accuracy, a novel input design method is proposed. Moreover, we also propose a system identification method that reduces the space complexity even when the number of measurement directions increases. The effectiveness of the proposed methods was demonstrated through simultaneous estimation experiments of HRTFs.
Convention Paper 9152
P10-5 An Evolutionary Algorithm Approach to Customization of Non-Individualized Head Related Transfer Functions—Eric S. Schwenker, Carnegie Mellon University - Pittsburgh, PA, USA; Griffin D. Romigh, Air Force Research Labs - Wright Patterson Air Force Base, OH, USA
Currently, the commercialization of high-quality virtual auditory display technology is limited by the costly and time-consuming methods required for obtaining listener-specific head-related transfer functions (HRTFs), directionally-dependent filters that encode spatial information. As such, there is an increased interest in the estimation of individualized HRTFs based on non-acoustic data. This study highlights the capabilities of an evolutionary algorithm method applied to the complex parameter optimization problem that arises when HRTFs are fit to individuals (or populations), rather than acoustically measured. Results suggest the algorithm may be capable of providing HRTFs that improve localization through both personalization of generic HRTFs and the generation of an optimized set of generic HRTFs.
Convention Paper 9153
P10-6 Multichannel-Reproduced Music with Height Ambiences: Investigating Physical and Perceptual Factors for Comprehensive 3D Experience—Antonios Karampourniotis, Rochester Institute of Technology - Rochester, NY, USA; Mark J. Indelicato, Rochester Institute of Technology - Rochester, NY, USA; Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada
This study investigated the influence of height loudspeaker positions and their signals on perceived overall sound quality. A total of seventeen loudspeakers were arranged in two layers, one horizontal and one height layer. Twelve participants were asked to subjectively rank and describe eight randomly presented configurations, each consisting of four height loudspeakers. A set of inverse filters was generated and applied to remove the room's acoustic influence, and a new set of listeners was asked to evaluate sound quality. The experimental results indicate the significance of height loudspeaker positioning for a perceived 3D sound field. They also show that the room's acoustics affect desired perceptual characteristics of the sound field and influence subjective preferences.
Convention Paper 9154
P11 - Room Acoustics
Saturday, October 11, 9:00 am — 11:30 am
Doyuen Ko, Belmont University - Nashville, TN, USA
P11-1 Development of a Sound Field Diffusion Coefficient—Alejandro Bidondo, Universidad Nacional de Tres de Febrero - UNTREF - Caseros, Buenos Aires, Argentina; Mariano Arouxet, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina; Sergio Vazquez, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina; Virtual Things; Javier Vazquez, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina; Germán Heinze, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina; Adrián Saavedra, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina
This research addresses the development of an absolute descriptor, with an associated software calculation tool and user interface, that quantifies the degree of diffusion of a sound field, both globally and on a third-octave-band basis, from a monaural impulse response. The degree of sound field diffuseness is related to the probability of finding accumulated energy in discrete reflections, relative to the total energy contained in all the reflections of an impulse response, after extracting its decay and normalizing it with respect to its reverberation time. The coefficient ranges between 0 and 1, zero being "no diffuseness" and one being a maximum absolute reference obtained from analyzing different types of rooms. The challenge has been not only to develop this coefficient theoretically but also to convert the theory into a numerical calculation in dedicated software. The coefficient may be used both to study the effects of sound-diffuser coatings and coated surfaces, and to study the degree to which different diffuseness values within sound fields are perceived.
Convention Paper 9155
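One plausible reading of the energy-ratio idea behind such a coefficient can be sketched as follows. This is emphatically not the authors' algorithm: the linear decay fit, the median-based peak threshold `peak_factor`, and all function names are assumptions of this toy sketch.

```python
import numpy as np

def diffuseness_coefficient(ir, fs, peak_factor=3.0):
    """Toy sketch of a diffuseness coefficient from a monaural impulse
    response: compare the energy concentrated in discrete (peaky)
    reflections with the total reflected energy, after removing the
    overall decay.

    peak_factor (a threshold above the median of the flattened energy)
    is an illustrative assumption, not a parameter from the paper.
    """
    e = ir ** 2
    t = np.arange(len(e)) / fs
    eps = 1e-12
    # Crude decay extraction: fit log-energy with a line, divide it out.
    slope, intercept = np.polyfit(t, np.log(e + eps), 1)
    flattened = e / np.exp(slope * t + intercept)
    # Energy sitting in discrete spikes vs. total energy.
    thresh = peak_factor * np.median(flattened)
    spike_energy = flattened[flattened > thresh].sum()
    total_energy = flattened.sum()
    # 1 => little energy in discrete reflections (diffuse field),
    # 0 => all energy concentrated in discrete reflections.
    return 1.0 - spike_energy / total_energy
```

An impulse response dominated by a single strong reflection scores lower (less diffuse) than a noise-like decaying response, matching the 0-to-1 orientation described in the abstract.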
P11-2 Simulating Talker Directivity for Speech Intelligibility Measurements—Peter Mapp, Peter Mapp Associates - Colchester, Essex, UK
The research investigated how both the frequency response and the directivity of a talker or voice simulator can affect measured and predicted speech intelligibility in a given situation. Current sound system and acoustic standards provide little guidance as to the required acoustic characteristics of a simulator or the effects that its directivity and frequency response may have. It is shown that both the driver size and format, as well as the overall frequency response, can have a marked effect on speech intelligibility measurements. A range of talker loudspeaker simulators was investigated in both real and simulated environments. The research shows that the characteristics of several commonly used simulators varied significantly, which markedly affected the resultant intelligibility measurements. The results of the work are used to formulate a number of recommendations for talker and voice simulator electroacoustic characteristics and for standardization of measurement methods.
Convention Paper 9156
P11-3 Visualization of Early Reflections in Control Rooms—Malcolm Dunn, Marshall Day Acoustics - Auckland, New Zealand; Daniel Protheroe, Marshall Day Acoustics - Auckland, New Zealand
Measurements were undertaken in a variety of control rooms with a system utilizing a compact microphone array and sound intensity technique to estimate the direction of early reflections. This paper presents the results of these measurements including 3D intensity plots that provide a visual representation of sound arrivals at the listener position. The effectiveness of this type of system for the detection of problematic reflections and the evaluation of the listening environment is discussed.
Convention Paper 9157
P11-4 Holistic Acoustic Absorber Design: From Modeling and Simulation to Laboratory Testing and Practical Realization—Rob Toulson, Anglia Ruskin University - Cambridge, UK; Silvia Cirstea, Anglia Ruskin University - Cambridge, UK
In developing a new acoustic absorber, a number of practical design challenges are encountered. Complex mathematical models for many acoustic absorbing methods have previously been developed; however, there is very little accessible data describing how those models perform in a practical implementation of the design. This project describes a holistic approach to the development of a novel slotted-film sound absorber and presents the results at each design iteration. Initially, a number of mathematical models are considered in order to optimize the design geometry for maximum sound absorption. Next, the modeled designs are laboratory tested with an impedance-tube system. Finally, the practical acoustic absorber design, including framing and mounting methods, is finalized and tested in an ISO-accredited reverberation chamber. The results of the modeling, the impedance-tube testing, and the room testing are all considered. It is seen that the simulation and impedance-tube results match very closely, whereas the practical implementation achieves lower acoustic absorption. This research therefore presents a valuable case study to help acoustic absorber designers better predict the final performance of their designs.
Convention Paper 9158
P11-5 Oscillating Measurement Motion – Myth or Magic?—Wolfgang Heß, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Stefan Varga, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Acoustical reproduction in cars differs from acoustical reproduction in rooms or other larger environments. This is caused by the size of the car cabin, by the surrounding materials, and by non-ideal loudspeaker enclosures and reproduction positions. For most sound-tuning engineers, acoustical measurements are essential in the process of designing and optimizing a well-balanced audio system. This paper analyzes and describes different methods of measuring the frequency-gain behavior of single or multiple loudspeakers. When measured at a single position, comb-filter-like dips and notches can often be observed in the frequency response. This work focuses on practical aspects: How can a frequency response be measured that describes the sound across the listening area as accurately as possible, how can comb-filter effects be reduced, and how can the measurement be made both fast and adequate? Results showed a significant reduction of comb-filter effects when the measurement microphone is moved. An evaluation by listening tests showed that frequency-response averaging by microphone movement led not only to smoother magnitude responses but also, through less equalization, to a better sound experience.
Convention Paper 9159
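The effect of averaging measurements taken while the microphone moves can be illustrated by averaging the magnitude responses of several closely spaced measurement positions. This is a minimal sketch of the averaging idea only, not the paper's measurement procedure; the function name and FFT length are assumptions.

```python
import numpy as np

def averaged_magnitude_response(irs, n_fft=4096):
    """Average the magnitude responses measured at several microphone
    positions (e.g., sampled along an oscillating movement path).

    Averaging |H| across positions smooths out the position-dependent
    comb-filter dips and notches that any single-point measurement shows.
    """
    mags = [np.abs(np.fft.rfft(ir, n_fft)) for ir in irs]
    return np.mean(mags, axis=0)
```

Because each position places the comb-filter notches at different frequencies (the direct-to-reflection delay changes with position), the spatial average is markedly flatter than any individual response.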
P12 - Transducers—Part 3
Saturday, October 11, 9:00 am — 12:00 pm
Robert-Eric Gaskell, McGill University - Montreal, QC, Canada; GKL Audio Inc. - Montreal, QC, Canada
P12-1 Compensation of the Flux Modulation Distortion Using an Additional Coil in a Loudspeaker Unit—Niccolo Antonello, Technical University of Denmark - Kgs. Lyngby, Denmark; Finn Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark
Flux modulation is one of the main causes of distortion in electrodynamic loudspeaker units. A new compensation technique that eliminates this type of nonlinearity using an additional compensation coil in the speaker unit is presented. An equivalent circuit model of the device, including the compensation coil, is derived. The compensation technique consists of feeding the compensation coil and the voice coil with filtered versions of the desired audio signal. Simulations show that a significant reduction in flux modulation distortion can be achieved with this technique. A simple magnetic circuit was constructed to test the method on a real device, and the measurements show that the method works even when eddy currents are present.
Convention Paper 9160
P12-2 Physical Requirements of New Acoustic Transducers to Replace Existing Moving-Coil Loudspeakers—Gyeong-Tae Lee, Samsung Electronics - Suwon-si, Gyeonggi-do, Korea; Jong-Bae Kim, Samsung Electronics - Suwon-si, Gyeonggi-do, Korea; Dong-Hyun Jung, Samsung Electronics - Suwon-si, Gyeonggi-do, Korea
New types of acoustic transducers have recently emerged as possible alternatives to moving-coil loudspeakers. However, their performance has not yet been sufficient for them to replace existing loudspeakers commercially. In this paper, to identify the requirements for high performance, we derived an analytical model of a moving-coil loudspeaker based on physical acoustics and electroacoustics and evaluated the simulated results of the model in terms of acoustic performance. Finally, we discussed the physical requirements of new acoustic transducers from the perspectives of bass performance, tonal balance, decay time, and spatial directivity, and then made some suggestions.
Convention Paper 9161
P12-3 Nonlinear Flux Modulation Effects in Moving Coil Transducers—Felix Kochendörfer, JBL/Harman Professional - Northridge, CA USA; Alexander Voishvillo, JBL/Harman Professional - Northridge, CA, USA
The adverse effects on loudspeaker performance produced by the nonlinear force factor in transducer motors are well understood. Nonlinear effects produced by the dependence of the voice-coil inductance and resistance on the coil's position and on the current are less obvious. Previous work by the authors showed the nonlinear behavior of the voice-coil inductive component as a function of voice-coil position, obtained with FEA simulation tools. This work is an attempt to obtain the variation of the aforementioned parameters experimentally, as functions of voice-coil position, current, and frequency.
Convention Paper 9162
P12-4 Design and Construction of a Circular AMT Speaker of 360° Radiation—Rodrigo Fernández Arcani, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina; Alejandro Sánchez Caparrós, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina
Among electro-mechano-acoustical dynamic transducers there is one whose operating principle is not widely publicized: the Air Motion Transformer (AMT). Using this operating principle, a prototype AMT transducer was designed and built with the particularity of a cylindrical diaphragm. The behavior of the prototype was assessed, and it was found to achieve quasi-uniform high-frequency sound-pressure-level radiation over 360 degrees in the horizontal plane. In order to improve the efficiency, several configurations are proposed.
Convention Paper 9163
P12-5 Identification of Compression Driver Parameters Based on a Concept of Diaphragm's Frequency-Dependent Area—Alexander Voishvillo, JBL/Harman Professional - Northridge, CA, USA
In previous work, matrix analysis was applied to the derivation of the transfer matrix of a compression driver's diaphragm. Its mechanical impedance consisted of lumped parameters and a part corresponding to the high-frequency breakup modes. In the current work the mechanical impedance is based on lumped parameters, whereas the area of the diaphragm is represented as a function of frequency. The transfer function of the compression driver is derived from the overall matrix that includes the frequency-dependent area of the diaphragm. The area is included in the transformation matrix that links the mechanical and acoustical parts. By equating the measured SPL for a given input voltage to the SPL derived from the model, an expression for the frequency-dependent area of the diaphragm can be derived. This information is used in the modeling and design of different drivers using the identified diaphragm.
Convention Paper 9164
P12-6 A General Approach for the Acoustic Design of Compression Drivers with "Narrow" Channels and Rigid Diaphragms—Jack Oclee-Brown, GP Acoustics (UK) Ltd. - Maidstone, UK
A generalized approach to determining the position and size of "narrow" phase-plug channels is presented that is applicable to compression drivers with radiating diaphragms of arbitrary geometry. In addition, it is demonstrated that carefully shaping the compression cavity according to the diaphragm motion yields optimal behavior. FEM-computed examples are presented demonstrating the method for the case of a conical diaphragm geometry.
Convention Paper 9165
P13 - Applications in Audio: Part 1
Saturday, October 11, 10:30 am — 12:00 pm
P13-1 Automated Sound Optimization of Car Audio Systems Using Binaural Measurements and Parametric IIR Filters—Friedrich von Tuerckheim, Visteon Electronics Germany GmbH; Tobias Münch, Visteon Electronics Germany GmbH - Karlsruhe, Germany
Sound tuning is an important step towards improved listening conditions in car interiors. In most cases it is done manually by sound engineers. This paper presents an approach for fully automated sound optimization. In a first step, loudspeaker and interior responses are captured by averaged binaural measurements. Then, the resulting frequency response is matched to a given reference curve. As automotive head units often provide limited capacity for audio filters, a small set of second order recursive filters is used for equalization. Numerical optimization leads to a minimum error response while maintaining psychoacoustic specifications. The presented method is used for fast and efficient frequency response correction as well as for copying sound characteristics of different car interiors.
Convention Paper 9166
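The second-order recursive (biquad) filters mentioned above are commonly parameterized as peaking equalizers. As a sketch, here are the peaking-EQ coefficients in the widely used Audio EQ Cookbook form (R. Bristow-Johnson); the paper does not specify which parametric IIR formulation the authors use, so this is an illustrative assumption.

```python
import math

def peaking_biquad(fs, f0, gain_db, q):
    """Second-order (biquad) peaking-EQ coefficients, following the
    common Audio EQ Cookbook formulation.

    Returns (b, a) normalized so that a[0] == 1. A small set of such
    sections, with optimized (f0, gain_db, q) per section, can match a
    measured in-car response to a reference curve.
    """
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1 + alpha / A
    b = [(1 + alpha * A) / a0, (-2 * math.cos(w0)) / a0, (1 - alpha * A) / a0]
    a = [1.0, (-2 * math.cos(w0)) / a0, (1 - alpha / A) / a0]
    return b, a
```

The gain at the center frequency `f0` equals `gain_db` exactly, while the response returns to unity at DC and Nyquist, which is what makes cascades of these sections convenient for head units with limited filter capacity.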
P13-2 Study of TV Sound Level Adjustment System for the Elderly with Speech Rate Conversion Function—Tomoyasu Komori, NHK Engineering System, Inc. - Setagaya-ku, Tokyo, Japan; Waseda University - Shinjuku-ku, Tokyo, Japan; Atsushi Imai, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Nobumasa Seiyama, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Reiko Takou, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Tohru Takagi, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Yasuhiro Oikawa, Waseda University - Shinjuku-ku, Tokyo, Japan
Elderly viewers sometimes feel that background sound (music and sound effects) in TV programs is too loud, or that narration or speech is too fast to understand. That is why we have constructed a prototype system that compensates for both of these problems on the receiver side. The results of evaluation experiments targeting elderly viewers showed that the system can make TV sound significantly easier to listen to. The results also showed that elderly viewers exhibit the "recruitment phenomenon" and tend to select processing with a slowed speech rate that is easier to hear.
Convention Paper 9167
P13-3 Investigation of Gain Adjustment in a Personal Assistive Listening System Using Parametric Array Loudspeakers—Santi Peksi, Nanyang Technological University - Singapore; Woon-Seng Gan, Nanyang Technological University - Singapore, Singapore; Ee-Leng Tan, Nanyang Technological University - Singapore, Singapore; Eu-Chin Ho, Tan Tock Seng Hospital - Singapore; Satya Vijay Reddy Medapati, Tan Tock Seng Hospital - Singapore
Human hearing degrades with age, which makes it difficult for viewers of different age groups to enjoy television together, as they require different audio volumes. To address this problem, Simon et al. proposed loudspeaker arrays that provide a 10 dB boost at all frequencies in a narrow spatial zone where a hearing-impaired listener is located. This paper presents a different approach: a personal assistive listening (PAL) system that aims to deliver a highly directional sound beam, with gain amplification matched to the hearing profile of a hearing-impaired listener, through a parametric array loudspeaker, while delivering normal sound loudness to the remaining normal-hearing listeners through conventional electrodynamic loudspeakers. This paper investigates the gain adjustment of two commercially available parametric loudspeakers over the frequency range used for audiometry testing and relates the gain adjustments to the sound pressure level (SPL) at various positions away from the sound system.
Convention Paper 9168
P13-4 Cinema Sound Facility Design for Higher Education—Robert Jay Ellis-Geiger, City University of Hong Kong - Hong Kong, SAR China
This paper is a narrative of the trials and tribulations that the author went through, from design through to commissioning, of what is probably the most advanced higher-education cinema sound facility in the Asia-Pacific region. The facilities include a 7.1 THX- and Dolby-certified dubbing theater, an audio recording studio integrated into a 30-workstation audio/music technology lab, multiple 5.1 surround screening rooms, a color-correction suite, a multi-format home entertainment environment, and a large sound stage that can accommodate a full symphonic orchestra. The main purpose of the facilities was to support the delivery of undergraduate and postgraduate courses in sound, music, and audio within the academic studios of cinematic arts and animation, and to establish a research center for cinema sound and music technology applications.
Convention Paper 9169
P13-5 A General-Purpose Decorrelator Algorithm with Transient Fidelity—Ross Penniman, University of Miami - Coral Gables, FL, USA
In a multichannel spatial audio presentation, a decorrelator is a signal-processing algorithm that helps to create a diffuse sensation for the listener by defeating any localization cues. In this paper the relevant psychoacoustic and signal processing principles are reviewed, and a new decorrelator algorithm is proposed that operates blindly on a single-channel input signal and creates a 5-channel decorrelated presentation. This algorithm uses transient extraction to achieve better fidelity when decorrelating a wide range of input signals. A subjective listening test compares the performance of the proposed algorithm in relation to two existing algorithms drawn from the literature. Results of the test are discussed as well as suggested improvements to the test methodology.
Convention Paper 9170
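A common baseline for the kind of decorrelation discussed above is a bank of unit-magnitude, random-phase FIR filters. The sketch below is that generic baseline, not the transient-preserving algorithm the paper proposes (which additionally extracts transients before decorrelating); names and parameters are assumptions.

```python
import numpy as np

def decorrelation_filters(n_channels=5, n_taps=1024, seed=1):
    """All-pass-like FIR decorrelation filters: unit magnitude, random
    phase, one filter per output channel. Convolving one input signal
    with each filter yields mutually decorrelated outputs that defeat
    localization cues while preserving the long-term spectrum.
    """
    rng = np.random.default_rng(seed)
    filters = []
    for _ in range(n_channels):
        phase = rng.uniform(-np.pi, np.pi, n_taps // 2 + 1)
        phase[0] = 0.0          # keep DC real
        phase[-1] = 0.0         # keep Nyquist real
        H = np.exp(1j * phase)  # |H(f)| = 1: magnitude spectrum preserved
        filters.append(np.fft.irfft(H, n_taps))
    return filters
```

The well-known drawback of this baseline, which motivates the paper's transient extraction, is that the random phase smears attacks over the filter length.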
P13-6 Applicability of Perceptual Evaluation of Speech Quality in Evaluating Heavily Distorted Speech—Mitsunori Mizumachi, Kyushu Institute of Technology - Kitakyushu, Fukuoka, Japan
Speech quality assessment is indispensable for properly designing a speech enhancement algorithm. Perceptual evaluation of speech quality (PESQ) is frequently employed as an objective speech-distortion measure. PESQ is a methodology for estimating subjective assessments of speech quality that assumes the slight distortion caused by speech codecs in telephony systems. In the case of noise reduction, however, the degree of speech distortion is heavier than that caused by speech codecs. In this paper the applicability of PESQ is investigated for noisy and noise-reduced speech signals under severe noise conditions. The relationship between PESQ scores and subjective mean opinion scores reveals that PESQ is applicable to heavily distorted speech only under non-stationary noise conditions.
Convention Paper 9171
P14 - Perception: Part 2
Saturday, October 11, 2:00 pm — 5:30 pm
Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA
P14-1 Revision of Rec. ITU-R BS.1534—Judith Liebetrau, Fraunhofer IDMT - Ilmenau, Germany; Frederik Nagel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; International Audio Laboratories - Erlangen, Germany; Nick Zacharov, DELTA SenseLab - Iisalmi, Finland; Kaoru Watanabe, NHK Science and Technology Research Labs. - Setagaya-ku, Tokyo, Japan; Catherine Colomes, Orange Labs - Cesson Sevigné, France; Poppy Crum, Dolby Laboratories - San Francisco, CA, USA; Thomas Sporer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Ilmenau University of Technology - Ilmenau, Germany; Andrew Mason, BBC Research and Development - London, UK
In audio quality evaluation, ITU-R BS.1534-1, commonly known as MUSHRA, is widely used for the subjective assessment of intermediate audio quality. Studies have identified limitations of the MUSHRA methodology, which can influence its robustness to biases and errors introduced during the testing process. Therefore, ITU-R BS.1534 was revised to reduce the potential for introducing systematic errors and biases into the resulting data. These modifications improve the validity and reliability of data collected with the MUSHRA method. The main changes affect the post-screening of listeners, the inclusion of a mandatory mid-range anchor, and the number and length of test items, as well as the statistical analysis. In this paper the changes and the reasons for them are given.
Convention Paper 9172
P14-2 Movement Perception of Risset Tones with and without Artificial Spatialization—Julian Villegas, University of Aizu - Aizu Wakamatsu, Fukushima, Japan
The apparent radial movement (approaching or receding) of Risset tones was studied for sources in front, above, and to the right of listeners. Besides regular Risset tones, two kinds of spatialization were included: global (regarding the tone as a whole) and individual (spatializing each of its spectral components). The results suggest that regardless of the direction of the glissando, subjects tend to judge them as approaching. The effect of spatialization type was complex: For upward Risset tones, judgments were, in general, aligned with the direction of the spatialization, but this was not observed in the downward Risset tones. Furthermore, individual spatialization yielded judgments comparable to those of non-spatialized stimuli, whereas spatializing the stimuli as a whole yielded judgments more aligned with the treatment.
Convention Paper 9173
P14-3 The Audibility of Typical Digital Audio Filters in a High-Fidelity Playback System—Helen M. Jackson, Meridian Audio Ltd. - Huntingdon, UK; Michael D. Capp, Meridian Audio Ltd. - Huntingdon, UK; J. Robert Stuart, Meridian Audio Ltd. - Huntingdon, UK
This paper describes listening tests investigating the audibility of various filters applied in high-resolution wideband digital playback systems. Discrimination between filtered and unfiltered signals was compared directly in the same subjects using a double-blind psychophysical test. Filter responses tested were representative of anti-alias filters used in A/D (analog-to-digital) converters or mastering processes. Further tests probed the audibility of 16-bit quantization with or without a rectangular dither. Results suggest that listeners are sensitive to the small signal alterations introduced by these filters and quantization. Two main conclusions are offered: first, there exist audible signals that cannot be encoded transparently by a standard CD; and second, an audio chain used for such experiments must be capable of high-fidelity reproduction.
Convention Paper 9174
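The 16-bit quantization condition with optional rectangular (RPDF) dither can be sketched directly; this is an illustrative implementation of the standard operations, not the authors' test-signal chain, and the function name is an assumption.

```python
import numpy as np

def quantize_16bit(x, dither=False, seed=0):
    """Quantize a float signal in [-1, 1) to 16-bit steps, optionally
    adding rectangular (RPDF) dither of one LSB peak-to-peak before
    rounding.

    Without dither the quantization error is signal-correlated
    (distortion); with RPDF dither it is decorrelated at the cost of a
    slightly higher noise floor.
    """
    lsb = 1.0 / 32768.0
    if dither:
        rng = np.random.default_rng(seed)
        x = x + rng.uniform(-0.5, 0.5, size=len(x)) * lsb
    return np.round(x / lsb) * lsb
```

The error bound differs between the two modes: at most half an LSB without dither, and at most one LSB with it.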
P14-4 Evaluation Criteria for Live Loudness Meters—Jon Allan, Luleå University of Technology - Piteå, Sweden; Jan Berg, Luleå University of Technology - Piteå, Sweden
As a response to discrepancies in loudness levels in broadcast, the recommendations of the International Telecommunication Union and the European Broadcasting Union state that audio levels should be regulated based on loudness measurement. These recommendations differ regarding the definition of meter ballistics for live loudness meters, and this paper seeks to identify additional information that may be needed to attain higher conformity between them. This work suggests that the qualities sought in a live loudness meter could be differentiated between time scales (i.e., the momentary and short-term scales, which are defined by two different integration times) and therefore should also be evaluated by different evaluation criteria.
Convention Paper 9175
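The two integration times behind the momentary and short-term readings can be sketched as sliding mean-square windows. The 400 ms and 3 s windows follow EBU R128 terminology; note this sketch deliberately omits the K-weighting pre-filter and gating of ITU-R BS.1770, keeping only the integration-time behavior that distinguishes the two meter modes.

```python
import numpy as np

def sliding_loudness(x, fs, window_s):
    """Sliding mean-square level in dB over a given integration window.

    This keeps only the window-length behavior of a loudness meter; a
    compliant meter would apply K-weighting (ITU-R BS.1770) first.
    """
    n = int(window_s * fs)
    power = np.convolve(x ** 2, np.ones(n) / n, mode="valid")
    return 10.0 * np.log10(power + 1e-12)

def momentary(x, fs):
    return sliding_loudness(x, fs, 0.400)  # 400 ms window

def short_term(x, fs):
    return sliding_loudness(x, fs, 3.0)    # 3 s window
```

The difference in window length is exactly why the two readings call for different evaluation criteria: the momentary meter tracks short bursts that the short-term meter averages away.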
P14-5 Factors Influencing Listener Preference for Dynamic Range Compression—Malachy Ronan, University of Limerick - Limerick, Ireland; Robert Sazdov, University of Limerick - Limerick, Ireland; Nicholas Ward, University of Limerick - Limerick, Ireland
The introduction of loudness normalization has led some commentators to declare that the loudness wars are over. However, factors contributing to a preference for dynamic range compression have not been removed. The research presented here investigates the role of long-term memory in sound quality judgments. Factors influencing preference judgments of dynamic range compression are discussed along with suggestions of further research areas. Research is presented that indicates that an objective measure of dynamic range will facilitate a greater understanding of how dynamic range compression affects individual sound quality attributes.
Convention Paper 9176
P14-6 The Influence of Listeners’ Experience, Age, and Culture on Headphone Sound Quality Preferences—Sean Olive, Harman International - Northridge, CA, USA; Todd Welti, Harman International - Northridge, CA, USA; Elisabeth McMullin, Harman International - Northridge, CA USA
Double-blind headphone listening tests were conducted in four different countries (Canada, USA, China, and Germany) involving 238 listeners of different ages, gender, and listening experiences. Listeners gave comparative preference ratings for three popular headphones and a new reference headphone that were all virtually presented through a common replicator headphone equalized to match their measured frequency responses. In this way, biases related to headphone brand, price, visual appearance, and comfort were removed from listeners’ judgment of sound quality. On average, listeners preferred the reference headphone that was based on the in-room frequency response of an accurate loudspeaker in a reference listening room. This was generally true regardless of the listeners’ experience, age, gender, and culture. This new evidence suggests a headphone standard based on this new target response would satisfy the tastes of most listeners.
Convention Paper 9177
P14-7 A Hierarchical Approach to Archiving and Distribution—J. Robert Stuart, Meridian Audio Ltd. - Huntingdon, UK; Peter Craven, Algol Applications Ltd. - London, UK
When recording, the ideal is to capture a performance so that the highest possible sound quality can be recovered from the archive. While an archive has no hard limit on the quantity of data assignable to that information, in distribution the data deliverable depends on application-specific factors such as storage, bandwidth or legacy compatibility. Recent interest in high-resolution digital audio has been accompanied by a trend to higher and higher sampling rates and bit depths, yet the sound quality improvements show diminishing returns and so fail to reconcile human auditory capability with the information capacity of the channel. By bringing together advances in sampling theory with recent findings in human auditory science, our approach aims to deliver extremely high sound quality through a hierarchical distribution chain where sample rate and bit depth can vary at each link but where the overall system is managed from end-to-end, including the converters. Our aim is an improved time/frequency balance in a high-performance chain whose errors, from the perspective of the human listener, are equivalent to no more than those introduced by sound traveling a short distance through air.
Convention Paper 9178
P15 - Signal Processing: Part 1
Saturday, October 11, 2:00 pm — 5:00 pm
Jayant Datta, THX - San Francisco, CA, USA; Syracuse University - Syracuse, NY, USA
P15-1 MATLAB Program for Calculating the Parameters of Autocorrelation and Interaural Cross-Correlation Functions Based on a Model of the Signal Processing Performed in the Auditory Pathways—Shin-ichi Sato, Universidad Nacional de Tres de Febrero - Caseros, Buenos Aires, Argentina; Alejandro Bidondo, Universidad Nacional de Tres de Febrero - UNTREF - Caseros, Buenos Aires, Argentina; Yoshiharu Soeta, National Institute of Advanced Industrial Science and Technology (AIST) - Ikeda, Japan
This paper describes a MATLAB program with a graphical user interface (GUI) for signal processing based on the Auditory Image Model [S. Bleeck et al., Acta Acustica united with Acustica, 90 (2004) 781-787], followed by summary autocorrelation function (SACF) and summary interaural cross-correlation function (SIACF) analyses and the calculation of the SACF and SIACF parameters. The effects of the number of channels and the frequency range of the filterbanks on the SACF parameters are investigated.
Convention Paper 9179
P15-2 An Investigation of Temporal Feature Integration for a Low-Latency Classification with Application to Speech/Music/Mix Classification—Joachim Flocon-Cholet, Orange Labs - Lannion, France; Julien Faure, Orange Labs - Lannion, France; Alexandre Guérin, Orange Labs - Lannion, France; Pascal Scalart, INRIA/IRISA, Université de Rennes - Rennes, France
In this paper we propose several methodologies for the use of feature integration and evaluate them in a low-latency classification framework. These general methodologies are based on three key aspects assessed in this study: the selection of the features to be temporally integrated, the choice of the integration techniques (i.e., how the temporal information is extracted), and the size of the integration window. Experiments carried out on the speech/music/mix classification task show that the different methodologies have a significant impact on overall performance. Compared to state-of-the-art procedures, the proposed methodologies achieved the best performance, even under the low-latency constraints.
Convention Paper 9180
P15-3 MATLAB Program for Calculating the Parameters of the Autocorrelation and Interaural Cross-Correlation Functions Based on Ando's Auditory-Brain Model—Shin-ichi Sato, Universidad Nacional de Tres de Febrero - Caseros, Buenos Aires, Argentina
This paper describes a MATLAB program with a graphical user interface (GUI) to calculate the parameters of the autocorrelation and the interaural cross-correlation functions of a binaural signal based on the auditory-brain model proposed by Ando [Y. Ando. (1998) Architectural Acoustics: Sound Source, Sound Fields, and Listeners, Springer-Verlag, New York, Chap. 5], which can describe the various subjective attributes such as pitch, timbre, and spatial impression.
Convention Paper 9181
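The core interaural cross-correlation computation behind such parameters can be sketched briefly. The following numpy version is a generic illustration (the function name and lag range are our own assumptions), not the paper's MATLAB/GUI program:

```python
import numpy as np

def iacc(left, right, fs, max_lag_ms=1.0):
    """Interaural cross-correlation coefficient: the maximum of the
    normalized cross-correlation of the two ear signals over lags of
    +/- max_lag_ms (about 1 ms covers natural interaural delays)."""
    max_lag = int(fs * max_lag_ms / 1000.0)
    norm = np.sqrt(np.dot(left, left) * np.dot(right, right))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            c = np.dot(left[-lag:], right[: len(right) + lag])
        else:
            c = np.dot(left[: len(left) - lag], right[lag:])
        best = max(best, c / norm)
    return best

# Identical ear signals give an IACC of 1 (fully coherent)
fs = 48000
sig = np.random.default_rng(0).standard_normal(fs // 10)
print(round(iacc(sig, sig, fs), 3))  # → 1.0
```

A diffuse, decorrelated field would yield a value near 0, which is why the IACC family of parameters tracks spatial impression.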
P15-4 Perceptual Quality of Audio Separated Using Sigmoidal Masks—Toby Stokes, University of Surrey - Guildford, Surrey, UK; Christopher Hummersone, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK; Andrew Mason, BBC Research and Development - London, UK
Separation of underdetermined audio mixtures is often performed in the time-frequency (TF) domain by masking each TF element according to its target-to-mixture ratio. This work uses sigmoidal functions to map the target-to-mixture ratio to mask values. The series of functions used encompasses the ratio mask and an approximation of the binary mask. Mixtures are chosen to represent a range of amounts of TF overlap, then separated and evaluated using objective measures. PEASS results show that improved interferer-suppression and artifact scores can be achieved using softer masking than that applied by binary or ratio masks. The improvement in these scores yields an improved overall perceptual score; this observation is repeated at multiple TF resolutions.
Convention Paper 9182
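The core idea, mapping each TF element's local target-to-interferer ratio through a sigmoid whose slope sweeps between ratio-mask-like and binary-mask-like behavior, can be sketched as follows (the parameterization here is our own illustrative assumption, not the paper's exact family of functions):

```python
import numpy as np

def sigmoidal_mask(target_mag, interferer_mag, slope):
    """Map the per-element target-to-interferer ratio (in dB) through a
    sigmoid. A steep slope approximates the binary mask; a shallow one
    gives a softer mask."""
    ratio_db = 20.0 * np.log10((target_mag + 1e-12) / (interferer_mag + 1e-12))
    return 1.0 / (1.0 + np.exp(np.clip(-slope * ratio_db, -500, 500)))

target = np.array([1.0, 0.5, 0.1])   # |target| per TF element
interf = np.array([0.1, 0.5, 1.0])   # |interferer| per TF element
soft = sigmoidal_mask(target, interf, slope=0.2)
hard = sigmoidal_mask(target, interf, slope=50.0)  # ~binary mask
```

Applying `hard` keeps elements where the target dominates and zeroes the rest, while `soft` attenuates gradually, trading interferer suppression against separation artifacts.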
P15-5 A New Approach to Impulse Response Measurements at High Sampling Rates—Joseph G. Tylka, 3D3A Lab, Princeton University - Princeton, NJ, USA; Rahulram Sridhar, 3D3A Lab, Princeton University - Princeton, NJ, USA; Braxton B. Boren, Princeton University - Princeton, NJ, USA; Edgar Choueiri, Princeton University - Princeton, NJ, USA
High sampling rates are required to fully characterize some acoustical systems, but capturing the system's high-frequency roll-off decreases the signal-to-noise ratio (SNR). Band-pass filtering can improve the SNR but may create an undesirable pre-response. An iterative procedure is developed to measure impulse responses (IRs) with an improved SNR and a constrained pre-response. First, a quick measurement provides information about the system and ambient noise. A second, longer measurement is then performed, and a suitable band-pass filter is applied to the recorded signal. Experimental results show that the proposed procedure achieves an SNR of 37 dB with a peak pre-response amplitude of <0.2% of the IR peak, whereas a conventional technique achieves an SNR of 32 dB with a peak pre-response amplitude of 16%.
Convention Paper 9183
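For context, a basic non-iterative swept-sine IR measurement, the kind of baseline the proposed procedure refines, can be sketched as follows; this is a generic Farina-style exponential sweep in numpy, not the authors' code:

```python
import numpy as np

def fftconv(a, b):
    """Linear convolution via FFT (fast enough for long sweeps)."""
    n = len(a) + len(b) - 1
    N = 1 << (n - 1).bit_length()
    return np.fft.irfft(np.fft.rfft(a, N) * np.fft.rfft(b, N), N)[:n]

def measure_ir(system, f1=20.0, f2=20000.0, dur=0.5, fs=48000):
    """Excite `system` with an exponential sine sweep and deconvolve the
    response with the amplitude-weighted, time-reversed sweep."""
    t = np.arange(int(dur * fs)) / fs
    R = np.log(f2 / f1)
    sweep = np.sin(2 * np.pi * f1 * dur / R * (np.exp(t * R / dur) - 1.0))
    inverse = sweep[::-1] * np.exp(-t * R / dur)  # attenuates low frequencies
    ir = fftconv(system(sweep), inverse)
    return ir / np.max(np.abs(ir))

# A pure delay comes back as a single dominant peak after the sweep length
delay = 100
ir = measure_ir(lambda x: np.concatenate([np.zeros(delay), x]))
```

In a real measurement the noise floor and the pre-response of any band-pass filtering applied to `inverse` become the limiting factors, which is exactly what the paper's iterative procedure addresses.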
P15-6 IIR Filters for Audio Test and Measurement: Design, Implementation, and Optimization—Thomas Kite, Audio Precision, Inc. - Beaverton, OR, USA
Audio analyzers use filters for many reasons: to define the measurement bandwidth, to isolate tones for measurement, to remove fundamental signals, and so on. In modern instruments, the majority of this filtering is done digitally, following analog-to-digital conversion if the signal is not already digital. Digital filter design is a mature field that encompasses a broad range of techniques, from classical analog filter design to advanced iterative design methods. However, the filter design considerations and techniques unique to audio analyzers do not seem to occupy much space in the published literature. This paper aims to correct this with a discussion of filter design, implementation, and optimization for modern Intel x86 architectures.
Convention Paper 9184
P16 - Applications in Audio: Part 2
Saturday, October 11, 3:00 pm — 4:30 pm
P16-1 General Volterra and Swept-Sine Diagonal System Estimation and Modeling Performance—Russell H. Lambert, Harman International - South Jordan, UT, USA
Volterra system modeling performance results are given for various scenarios using both fully-determined and under-determined models. Three nonlinear system estimation methods are presented and compared, including a novel and efficient Farina-type Hammerstein algorithm. The diagonal-only Hammerstein methods will not model off-diagonal nonlinear energy for general uncorrelated inputs but will model correlated inputs to some degree. The data matrix estimation methods work for generic input signal types. The nonlinear system must be fully determined to yield the best results, but the diagonal-only models are more practical for applications with significantly long memory channels.
Convention Paper 9185
P16-2 Downward Compatibility Configurations when Using a Univalent 12 Channel 3D Microphone Array Design as a Master Recording Array—Michael Williams, Sounds of Scotland - Le Perreux sur Marne, France
It can be shown that microphone array design principles applied to a 12-channel 3D microphone array can create a master recording array that generates downward-compatible signals satisfying most present-day univalent lower-order channel/loudspeaker configurations. The implementation of this compatibility-oriented array design requires no matrixing or processing of the channel signals, while still maintaining the integrity of the overall sound field architecture. This compatibility approach to 3D array design produces a master recording system that can be adopted for an overall production, eventually to be distributed in several different media formats (stereo, DVD, Blu-ray, 3D, etc.). This approach can also be offered as a consumer choice within a global master recording or file downloading facility.
Convention Paper 9186
P16-3 Relative Influence of Spectral Bands in Horizontal Front Localization of White Noise—Tomomi Sugasawa, University of Aizu - Aizu-Wakamatsu, Fukushima, Japan; Jie Huang, University of Aizu - Aizuwakamatsu City, Japan; Julian Villegas, University of Aizu - Aizu Wakamatsu, Fukushima, Japan
The relationship between horizontal-front localization and the energy in different spectral bands is investigated in this paper. Specifically, we tried to identify which spectral regions produced changes in judgments of the position of a white noise when each band was removed from the noise presented through a front loudspeaker and presented instead via side loudspeakers. These loudspeakers were placed to the left and right of the listener's midsagittal plane. Participants were asked to assess whether the noise was coming from the front loudspeaker as bands were moved from the front to the side loudspeakers. Results from a pilot study suggested differences in the relative importance of spectral bands for horizontal-front localization.
Convention Paper 9187
P16-4 Acoustic Digital Communication for Identification Systems—Sergio Vazquez, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina; Virtual Things
In this paper a secure, low-cost, energy-efficient, robust digital communication system for identification purposes is presented. It is an alternative to magnetic stripe cards, Bluetooth LE, and NFC (Near Field Communication) that requires no specific hardware, making it compatible with almost every smartphone or portable device that has a working loudspeaker and is capable of reproducing audio. While previous works on the subject established the possibility of transmitting digital data over the air using acoustic waves, this paper focuses on its implementation.
Convention Paper 9188
P16-5 How Critical Listening Exercises Complement Technical Courses to Effectively Provide Audio Education for Engineering Technology Students—Mark J. Indelicato, Rochester Institute of Technology - Rochester, NY, USA; Clark Hochgraf, Rochester Institute of Technology - Rochester, NY, USA; Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA
Music is important to many aspects of our lives including student life at an institution of higher education. Combining music with academic coursework and programs therefore can be an effective way of engaging students to embrace academic programs and be successful in higher education. Some institutions have purposely incorporated audio engineering into technical programs as a way to not only create interest but to increase retention. Others have created music and technology programs and options, leveraging the student passion for music and keen interest in engineering. This paper discusses the benefit of combining music, technology, and engineering into higher education and, in particular, how the development of critical listening skills is key to the success of such a curriculum.
Convention Paper 9189
P16-6 On the Acoustics of Alleyways—Regina E. Collecchia, Stanford University - Stanford, CA, USA; Jonathan S. Abel, Stanford University - Stanford, CA, USA; Sean Coffin, Stanford University - Stanford, CA, USA; Eoin Callery, Stanford University - Stanford, CA, USA; Yoo Hsiu Yeh, Stanford University - Stanford, CA, USA; Kyle Spratt, University of Texas, Austin - Austin, TX, USA; Julius O. Smith, III, Stanford University - Stanford, CA, USA
Alleyways bounded by flat, reflective, parallel walls and smooth concrete floors can produce impulse responses that are surprisingly rich in texture, featuring a long-lasting modulated tone and a changing timbre, much like the sound of a didgeridoo. This work explores alleyway acoustics with acoustic measurements and presents a computational model based on the image method. Alleyway response spectrograms show spectral zeros rising in frequency with time and a modulated tone lasting noticeably longer than the harmonic series associated with the distance between the walls. With slight canting of the walls and floors to produce the long lasting modulated tone, the image method model captures much of this behavior.
Convention Paper 9190
P17 - Signal Processing: Part 2
Sunday, October 12, 9:00 am — 12:00 pm
J. Keith McElveen, Wave Sciences - Charleston, SC, USA
P17-1 A Practical Approach to Robust Speech Recognition Using Two Microphones in Driving Environments—Jaeyoun Cho, Samsung Electronics Co. Ltd. - Suwon-si, Gyeonggi-do, Korea; Seungyeol Lee, Samsung Electronics Co. Ltd. - Suwon-si, Gyeonggi-do, Korea; Inwoo Hwang, Samsung Electronics Co. Ltd. - Suwon-si, Gyeonggi-do, Korea
Now that automatic speech recognition technologies have matured enough to be applicable to everyday life, people have come to regard speech as the most desirable means of human-device interaction and have begun using speech recognition in vehicles. Nonetheless, it is still challenging to recognize speech correctly in driving environments, for at least two reasons. One is that the speech signal is corrupted by innumerable noise sources such as engine sound, road friction, music from the radio, and, worse still, speech from passengers. The other is that the recognition device may be placed anywhere, such as in a cup holder, on the passenger seat, or on the dashboard. In this paper we propose a robust speech recognition front-end that removes the likely ambient noise in a driving car regardless of where the recognition device is located. The proposed method finds the direction of speech and enhances the speech signal by first detecting the presence of a speech utterance using only two microphones. The front-end is designed with practical considerations in mind, and its implementation in a mobile device showed higher recognition accuracy, shorter processing latency, and lower power consumption than other top-tier methods.
Convention Paper 9191
P17-2 Predistortion of a Bidirectional Cuk Audio Amplifier—Thomas Haagen Birch, Technical University of Denmark - Kgs. Lyngby, Denmark; Dennis Nielsen, Technical University of Denmark - Kgs. Lyngby, Denmark; Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark; Michael A. E. Andersen, Technical University of Denmark - Kgs. Lyngby, Denmark
Some nonlinear amplifier topologies are capable of providing a voltage gain greater than unity from a DC source, which could make them suitable for various applications. However, the nonlinearities introduce a significant amount of total harmonic distortion (THD). Some of this distortion can be reduced using predistortion. This paper suggests linearizing a nonlinear bidirectional Cuk audio amplifier using an analog predistortion approach. A prototype power stage was built, and results show that a voltage gain of up to 9 dB and a reduction in THD from 6% down to 3% were obtainable using this approach.
Convention Paper 9192
P17-3 Frequency Dependent Loss Analysis and Minimization of System Losses in Switch-Mode Audio Power Amplifiers—Akira Yamauchi, Technical University of Denmark - Kgs. Lyngby, Denmark; Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark; Ivan H. H. Jørgensen, Technical University of Denmark - Kgs. Lyngby, Denmark; Michael A. E. Andersen, Technical University of Denmark - Kgs. Lyngby, Denmark
In this paper the frequency-dependent losses in switch-mode audio power amplifiers are analyzed, and the loss model is improved by taking into account the voltage dependence of the parasitic capacitance of the MOSFETs. The estimated power losses are compared to measurements, and good accuracy is achieved. By choosing the optimal switching frequency based on the proposed analysis, the experimental results show that the system power losses of the reference design are minimized and an efficiency improvement of up to 8% is achieved without compromising audio performance.
Convention Paper 9193
P17-4 Resolving Delay-Free Loops in Recursive Filters Using the Modified Härmä Method—Will Pirkle, University of Miami - Coral Gables, FL, USA
Resolving delay-free loops in recursive filter structures is a longstanding problem that has been approached in several ways, including signal flow graph manipulation and, more recently, Zavalishin's instantaneous response technique. Härmä demonstrates a method for resolving delay-less loops in recursive filter structures, but the technique is limited to a specific generic loop topology in which the feedforward branch does not implement signal processing; all processing is implemented in one or more delay-less feedback loops. We modify Härmä's method to accommodate filter processing in the feedforward branch and provide a step-by-step method to resolve delay-less loops in recursive filter structures. We conclude with examples, including a new method of synthesizing fourth-order filters.
Convention Paper 9194
P17-5 Novel Hybrid Virtual Analog Filters Based on the Sallen-Key Architecture—Will Pirkle, University of Miami - Coral Gables, FL, USA
The Sallen-Key filter structure is a revered analog filter design topology. In Sallen-Key lowpass and highpass filters, the cutoff frequency and resonance (Q) controls are decoupled though the cutoff and resonant frequencies are not. In this paper we demonstrate novel variations on the Sallen-Key architecture and we decouple the resonant and cutoff frequencies. This produces multiple hybrid filter designs including resonant quasi-first order lowpass and highpass filters, resonant quasi-first order low and high shelving filters, decoupled resonant second order filters and doubly resonant quasi-second order lowpass and highpass filters. In the doubly-resonant filters all three frequencies may be decoupled and independently adjustable; they also self-oscillate at both resonant frequencies.
Convention Paper 9195
P17-6 Timbre Imitation and Adaptation for Experimental Music Instruments: An interactive Approach Using Real-Time Digital Signal Processing Framework—Mingfeng Zhang, University of Rochester - Rochester, NY, USA; John Granzow, Stanford University - Stanford, CA, USA; Gang Ren, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
We propose a real-time digital signal processing framework to extend the timbre control capability of experimental musical instruments. We focus on two music cognition concepts, timbre imitation and timbre adaptation, to enable experimental musical instruments to be integrated into existing ensemble works. In timbre imitation, we aim to simulate known timbre patterns to enhance the musical coherence during an ensemble performance. In timbre adaptation, we explore extended timbre manipulation settings such as complementary timbre and contrasting timbre. Our proposed framework is implemented on a low-cost real-time digital signal processing system to ensure easy adaptability. Our study is based on saxophone and violin but can be readily generalized to other instrument categories.
Convention Paper 9196
P18 - Applications in Audio: Part 1
Sunday, October 12, 9:00 am — 12:30 pm
Jung Wook (Jonathan) Hong, McGill University - Montreal, QC, Canada; GKL Audio Inc. - Montreal, QC, Canada
P18-1 Measuring Time Varying or Offset Voltage Dependent Harmonic and Intermodulation Distortion via Filter Banks Including a Stairstep Signal and Measuring FM Distortion in IM Distortion Signals—Ronald Quan, Ron Quan Designs - Cupertino, CA, USA
This paper presents methods of measuring dynamic harmonic distortion using filter banks and a staircase signal. The harmonic distortion is measured in real time at each level of the staircase signal. For measuring dynamic, time-varying second- and third-order intermodulation distortion, a low-frequency sinewave is combined with a higher-frequency signal. The FM distortion of the intermodulation distortion signals is also measured. FM distortion is measured in current-mode op amps, and in an op amp first with conventional Miller compensation and then with two-pole compensation. Finally, Volterra series distortion analysis is included as part of an equation that describes phase or frequency modulation effects in a nonlinear system.
Convention Paper 9197
P18-2 DC Servos and Digitally-Controlled Microphone Preamplifiers—Gary Hebert, That Corp. - Milford, MA, USA
Microphone preamplifiers for professional audio applications require a very wide range of gain and low noise in order to provide a high-quality interface with the vast number of available microphones. In many modern systems the preamplifier gain is controlled indirectly via a digital interface in discrete steps. Often dc servo amplifiers are employed as a means of keeping the dc gain fixed to avoid large changes in output offset voltage while the audio band gain is varied. The resulting highpass filter response varies substantially as a function of the preamplifier gain. We investigate the frequency and time-domain effects of this. We also investigate several approaches to minimize these effects.
Convention Paper 9198
P18-3 The Design of Urban Sound Monitoring Devices—Charlie Mydlarz, New York University - New York, NY, USA; Samuel Nacach, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA; Tae Hong Park, New York University - New York, NY, USA
The urban sound environment of New York City is notoriously loud and dynamic. Scientists, recording engineers, and soundscape researchers continue to search for methods to capture and monitor such urban sound environments. One method to accurately monitor and ultimately understand this dynamic environment involves a process of long-term sound capture, measurement, and analysis. Urban sound recording requires the use of robust and resilient acoustic sensors, since unpredictable external conditions can degrade acoustic data quality. Accordingly, this paper describes the design and build of a self-contained urban acoustic sensing device to capture, analyze, and transmit high-quality sound from any given urban environment. The presented acoustic sensing device prototype incorporates a quad-core Android-based mini PC with Wi-Fi capabilities, a custom MEMS microphone, and a USB soundcard. The design considerations, materials used, noise mitigation strategies, and the associated measurements are detailed in the paper.
Convention Paper 9199
P18-4 A Comparison of Real-Time Pitch Detection Algorithms in SuperCollider—Elliot Kermit-Canfield, Stanford University - Stanford, CA, USA
Three readily-available pitch detection algorithms implemented as unit generators in the SuperCollider programming language are evaluated and compared with regard to their accuracy and latency for a variety of test signals consisting of both harmonic and non-harmonic content. Suggestions are made for the type of signal on which each algorithm performs well.
Convention Paper 9200
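As an illustration of the simplest family of algorithms compared, an autocorrelation pitch detector can be written in a few lines. This generic numpy sketch stands in for the SuperCollider unit generators under test, which are not reproduced here:

```python
import numpy as np

def acf_pitch(frame, fs, fmin=50.0, fmax=1000.0):
    """Estimate f0 from the autocorrelation peak within the lag range
    corresponding to [fmin, fmax]. Integer-lag resolution only."""
    frame = frame - np.mean(frame)
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(acf[lo:hi]))
    return fs / lag

fs = 48000
t = np.arange(4096) / fs
tone = np.sin(2 * np.pi * 220.0 * t)
print(round(acf_pitch(tone, fs), 1))  # ≈ 220 Hz (integer-lag quantization)
```

The integer-lag quantization visible here is one reason real detectors add parabolic interpolation around the peak; latency is governed by the frame length.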
P18-5 Performance Comparison Between Nested Differentiating Feedback Loops and Classic Three Stage Operational Amplifier Architectures: A SPICE-Based Simulation Approach—Ariel Muszkat, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina; David Kadener, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina
Since 1970 the three-stage operational amplifier with dominant-pole compensation has been the standard basis for amplifier architectures. During the 1980s and the following years, however, the nested differentiating feedback loops (NDFL) concept was introduced by Edward M. Cherry as an attempt to improve classic power amplifier performance, mainly the distortion caused by class B output stages. This work presents the first part of a comparison between the two topologies' performance in a SPICE-based simulator. The most important parameters analyzed are open-loop gain, distortion, transient response, and, of course, stability. In addition, modern semiconductor devices and improved inner stages are used, so that the comparison circuits are based on small-signal devices such as discrete operational amplifiers.
Convention Paper 9201
P18-6 Making Audio Sound Better One Square Wave at a Time (Or How an Algorithm Called “Undo” Fixes Audio)—Leif Claesson, Omnia Audio - Cleveland, OH, USA
Audio mastering engineers have felt increasing pressure over the years to master recordings at ever-increasing loudness levels compared to other contemporary recordings, by way of dynamic compression, peak limiting, and hard clipping. This pursuit of loudness adds distortion and reduces fidelity. When radio stations play the compromised audio through their FM processing chains, this confluence of degradation causes serious audio quality issues on air. This paper examines what the music endures when broadcast on FM, and how that led to the invention of the “undo” algorithm, which repairs damage caused by these mastering techniques by adaptively de-clipping and de-compressing the mastered recordings.
Convention Paper 9202
P18-7 Acoustic Surveillance of Hazardous Situations Using Nonnegative Matrix Factorization and Hidden Markov Model—Kwang Myung Jeon, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Dong Yun Lee, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea; Myung J. Lee, City University of New York - New York, NY, USA
In this paper an acoustic surveillance method is proposed for accurately detecting hazardous situations under noisy conditions. In order to improve detection accuracy, the proposed method first tries to separate each atypical event from the input noisy audio signal. Next, maximum likelihood classification using multiple hidden Markov models (HMMs) is carried out to decide whether or not an atypical event occurs. Performance evaluation shows that the proposed method achieves higher detection accuracy under various signal-to-noise ratio (SNR) conditions than a conventional HMM-based method.
Convention Paper 9203
P19 - Signal Processing: Part 3
Sunday, October 12, 1:30 pm — 5:00 pm
Duane Wise, Wholegrain Digital Systems LLC - Boulder, CO, USA
P19-1 Eliminating Transducer Distortion in Acoustic Measurements—Finn Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark; Antoni Torras-Rosell, Danish National Metrology Institute - Lyngby, Denmark; Richard McWalter, Technical University of Denmark - Lyngby, Denmark
This paper investigates the influence of nonlinear components that contaminate the linear response of acoustic transducers and presents a method for eliminating the influence of nonlinearities in acoustic measurements. The method is evaluated on simulated as well as experimental data and is shown to perform well even in noisy conditions. The limitations of the total harmonic distortion (THD) measure are discussed, and a new distortion measure, the total distortion ratio (TDR), which more accurately describes the amount of nonlinear power in the measured signal, is proposed.
Convention Paper 9204
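For reference, the conventional THD figure whose limitations the paper discusses can be read off an FFT of a steady-state sine response. A minimal sketch follows; the window choice and harmonic count are our own assumptions:

```python
import numpy as np

def thd(signal, fs, f0, n_harm=5):
    """Classic THD: RMS of harmonics 2..n_harm relative to the fundamental,
    taken from bin magnitudes of a windowed FFT. (The paper's TDR also
    accounts for nonlinear power that does not land on harmonic bins;
    this sketch shows only the conventional measure it improves on.)"""
    n = len(signal)
    spec = np.abs(np.fft.rfft(signal * np.hanning(n))) / n
    bin_of = lambda f: int(round(f * n / fs))
    fund = spec[bin_of(f0)]
    harm = np.sqrt(sum(spec[bin_of(k * f0)] ** 2 for k in range(2, n_harm + 1)))
    return harm / fund

fs = 48000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t) + 0.01 * np.sin(2 * np.pi * 2000 * t)
print(round(thd(x, fs, 1000.0), 3))  # → 0.01 (1% second-harmonic distortion)
```

Because both components fall exactly on FFT bins here, the window's coherent gain cancels in the ratio; off-bin frequencies would need interpolation.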
P19-2 Uniformly-Partitioned Convolution with Independent Partitions in Signal and Filter—Frank Wefers, RWTH Aachen University - Aachen, Germany; Michael Vorländer, RWTH Aachen University - Aachen, Germany
Low-latency real-time FIR filtering is often realized using partitioned convolution algorithms, which split the filter impulse responses into a sequence of sub-filters and process these sub-filters efficiently using frequency-domain methods (e.g., FFT-based convolution). Methods that split both the signal and the filter into uniformly sized sub-filters define a fundamental class of algorithms known as uniformly-partitioned convolution techniques. In these methods both operands, signal and filter, are usually partitioned with the same granularity. This contribution introduces uniformly-partitioned algorithms with independent partitions (block lengths) in the two operands and considers the viable transform sizes that result. The relations between the algorithmic parameters are derived, and the performance of the approach is evaluated.
Convention Paper 9205
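The shared baseline of these methods, uniformly-partitioned overlap-save with one common block length and a frequency-domain delay line, can be sketched as follows (the paper's extension to independent partition sizes per operand is not shown):

```python
import numpy as np

def upols(x, h, B):
    """Uniformly-partitioned overlap-save FIR filtering with block length B.
    The filter is split into length-B partitions; input spectra are kept in
    a frequency-domain delay line, multiplied with the partition spectra,
    and accumulated before a single inverse FFT per block."""
    K = 2 * B                                        # FFT size
    P = -(-len(h) // B)                              # number of partitions
    H = [np.fft.rfft(h[p * B:(p + 1) * B], K) for p in range(P)]
    fdl = [np.zeros(K // 2 + 1, complex) for _ in range(P)]
    x = np.concatenate([x, np.zeros((-len(x)) % B)])  # pad to whole blocks
    prev = np.zeros(B)
    out = []
    for b in range(len(x) // B):
        block = x[b * B:(b + 1) * B]
        fdl = [np.fft.rfft(np.concatenate([prev, block]))] + fdl[:-1]
        acc = sum(X * Hp for X, Hp in zip(fdl, H))
        out.append(np.fft.irfft(acc)[B:])            # keep the alias-free half
        prev = block
    return np.concatenate(out)
```

The output matches direct convolution on the processed samples; input/output latency and per-block cost are both governed by the single granularity B, which is exactly the coupling the paper's independent-partition scheme relaxes.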
P19-3 Modeling the Nonlinear Behavior of Operational Amplifiers—Robert-Eric Gaskell, McGill University - Montreal, QC, Canada; GKL Audio Inc. - Montreal, QC, Canada
Due to the gain-bandwidth characteristics of operational amplifiers, their nonlinearities are frequency dependent, showing a rise in distortion at higher frequencies. Depending on the circuit and system implementations, this distortion can be significant to listener perception of sonic character and quality and is therefore relevant to models of op amp-based analog equipment. Power-series models of the harmonic signature of various op amp nonlinearities are developed with and without this frequency dependence. Listening tests are performed to determine the extent to which the distortion characteristic of the model must match that of the real component to create a perceptually similar result.
Convention Paper 9206
P19-4 More Cowbell: A Physically-Informed, Circuit-Bendable, Digital Model of the TR-808 Cowbell—Kurt James Werner, Center for Computer Research in Music and Acoustics (CCRMA) - Stanford, CA, USA; Stanford University; Jonathan S. Abel, Stanford University - Stanford, CA, USA; Julius O. Smith, III, Stanford University - Stanford, CA, USA
We present an analysis of the cowbell voice circuit from the Roland TR-808 Rhythm Composer. A digital model based on this analysis accurately emulates the original. Through the use of physical and behavioral models of each sub-circuit, this model supports accurate emulation of circuit-bent extensions to the voice's original behavior (including architecture-level alterations and component substitution). Some of this behavior is very complicated and is inconvenient or impossible to capture accurately through black box modeling or structured sampling. The band pass filter sub-circuit is treated as a case study of how to apply Mason's gain formula to finding the continuous-time transfer function of an analog circuit.
Convention Paper 9207
P19-5 A Modal Architecture for Artificial Reverberation with Application to Room Acoustics Modeling—Jonathan S. Abel, Stanford University - Stanford, CA, USA; Sean Coffin, Stanford University - Stanford, CA, USA; Kyle Spratt, University of Texas, Austin - Austin, TX, USA
The modal analysis of a room response is considered, and a computational structure employing a modal decomposition is introduced for synthesizing artificial reverberation. The structure employs a collection of resonant filters, each driven by the source signal, with their outputs summed. With filter resonance frequencies and dampings tuned to the modal frequencies and decay times of the space, and filter gains set according to the source and listener positions, any number of acoustic spaces and resonant objects may be simulated. Issues of sufficient modal density, computational efficiency, and memory use are discussed. Finally, models of measured and analytically derived reverberant systems are presented, including a medium-sized acoustic room and an electro-mechanical spring reverberator.
Convention Paper 9208
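The structure described, a bank of resonant filters driven in parallel by the source and summed, can be sketched directly. The names, the decay-to-pole-radius mapping, and the per-mode gains below are generic assumptions, not the paper's exact formulation:

```python
import numpy as np

def modal_reverb(x, fs, freqs, t60s, gains):
    """Sum of two-pole resonators, one per mode: center frequency from the
    modal frequency, pole radius from the mode's T60 decay time, and gain
    set per mode (e.g., from source and listener positions)."""
    y = np.zeros(len(x))
    for f, t60, g in zip(freqs, t60s, gains):
        r = 10.0 ** (-3.0 / (t60 * fs))          # radius giving -60 dB in t60
        b1 = 2.0 * r * np.cos(2.0 * np.pi * f / fs)
        b2 = -r * r
        y1 = y2 = 0.0
        for n, xn in enumerate(x):               # per-mode recursion
            y0 = g * xn + b1 * y1 + b2 * y2
            y[n] += y0
            y2, y1 = y1, y0
    return y

# Impulse response of a single 1 kHz mode with a 0.5 s decay time
fs = 8000
impulse = np.zeros(4096); impulse[0] = 1.0
ir = modal_reverb(impulse, fs, [1000.0], [0.5], [1.0])
```

A realistic room needs thousands of such modes, which is why the paper's discussion of modal density and computational efficiency matters; in practice the per-sample loop would be vectorized or run in a compiled inner loop.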
P19-6 The Procedural Sound Design of Sim Cell—Leonard J. Paul, School of Video Game Audio - Vancouver, Canada
Synthesis was used to generate all of the audio for the sound design of the educational game Sim Cell, using the open-source language Pure Data. A primary advantage of Pure Data is that it can be easily embedded into games for iOS, Android, and other platforms. This paper illustrates, with examples, how synthesis can be used effectively in video games in contrast to more conventional audio production methods such as sampling. Compared to sampling, synthesis allows high-resolution audio to be rendered accurately while achieving very high rates of data compression.
Convention Paper 9209
P19-7 OBRAMUS: A System for Object-Based Retouch of Amateur Music—Jordi Janer, Universitat Pompeu Fabra - Barcelona, Catalunya, Spain; Stanislaw Gorlow, Gorlow Brainworks - Bordeaux, France; Keita Arimoto, Yamaha Corporation - Iwata, Shizuoka, Japan
In the recent past, the area of semantic audio has attracted special attention due to the increasing appeal of signal representations that allow audio to be manipulated on a symbolic level. The semantics usually refer to audio objects, such as instruments, or musical entities, such as chords or notes. In this paper we present a system for making minor corrections to amateur piano recordings based on nonnegative matrix factorization. Acting as a middleman between the signal and the user, the system enables a simple form of musical recomposition by altering the pitch, timbre, onset, and offset of individual notes. The workflow is iterative; that is, the result improves stepwise through user intervention.
Convention Paper 9210
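A minimal nonnegative matrix factorization of a magnitude spectrogram, the representation on which such note-level edits operate, can be sketched with multiplicative updates. This is a generic Euclidean-distance NMF, not the authors' system:

```python
import numpy as np

def nmf(V, rank, iters=500, seed=0):
    """Factor a nonnegative matrix V ≈ W @ H by Lee-Seung multiplicative
    updates (Euclidean distance). For a spectrogram, columns of W are
    spectral templates (e.g., piano notes) and rows of H their activations
    in time -- the "objects" that can then be edited and resynthesized."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 1e-3
    H = rng.random((rank, V.shape[1])) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)   # update templates
    return W, H

# Exact low-rank nonnegative data is recovered almost perfectly
rng = np.random.default_rng(3)
V = rng.random((20, 2)) @ rng.random((2, 30))
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Editing a note then amounts to modifying one row of H (onset/offset) or one column of W (pitch/timbre) before resynthesizing from W @ H.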
P20 - Applications in Audio: Part 2
Sunday, October 12, 1:30 pm — 5:00 pm
Alexander Voishvillo, JBL/Harman Professional - Northridge, CA, USA
P20-1 Producing Interactive Immersive Sound for MPEG-H: A Field Test for Sports Broadcasting—Hanne Stenzel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Ulli Scuda, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The present paper gives a practical example of how broadcast content can be produced for MPEG-H. Existing production workflows are examined to determine what must be adapted in order to make use of the audio objects and immersive 3D audio provided by the new broadcast standard. After a short introduction to the features of MPEG-H, practical use cases are presented, such as immersive mixes and interactive personalized audio. In a field test, two sports events were covered and original audio material was gathered. Recording methods were tested to determine how much additional effort is needed to make use of these features. The results show that existing TV production techniques can already provide enough audio material for interactive TV mixes, and that immersive 3D audio environments can be created with little additional effort.
Convention Paper 9211
P20-2 Headstock Resonances in the Electric Bass Guitar—Bryan Martin, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, QC, Canada; Goran Petrovic, McGill University - Montreal, Quebec, Canada
The solid-body electric bass has long been established as a staple of jazz and popular music. This investigation examines the resonant characteristics of the headstock. Sine-sweep techniques are employed to extract resonant characteristics from the headstock, and these are compared with those of the plucked open strings. The findings suggest a correlation between the integrity of the open-string resonances in the headstock and the output sound quality of the instrument.
Convention Paper 9212
P20-3 Requirements Specification for Amplifiers and Power Supplies in Active Loudspeakers—Henrik Schneider, Technical University of Denmark - Kgs. Lyngby, Denmark; Lasse C. Jensen, Technical University of Denmark - Kgs. Lyngby, Denmark; Lars Press Petersen, Technical University of Denmark - Kgs. Lyngby, Denmark; Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark; Michael A. E. Andersen, Technical University of Denmark - Kgs. Lyngby, Denmark
This work aims to provide designers with a method for developing a requirements specification for power supplies and amplifiers in active loudspeakers. The motivation is to avoid over-sizing and unnecessary cost. A realistic estimate of the power supplied during audio playback in a given loudspeaker is obtained by considering a wide range of audio source material, loudness normalization of the source material, crossover filtering, driver characteristics, and a perceived maximum loudness/volume level. The results from analyzing a subwoofer and a woofer reveal the peak power, peak voltage, peak current, and apparent power, thus providing a solid foundation for a requirements specification.
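The sizing figures the abstract names (peak power, peak voltage, peak current, apparent power) can be sketched from a signal under a deliberately simplified, purely resistive load model. This is only an illustration of the quantities involved; the paper derives them from measured driver characteristics, crossover filtering, and loudness-normalized program material, none of which this toy function models.

```python
import numpy as np

def amplifier_requirements(signal, peak_voltage_v, nominal_impedance_ohm):
    """Rough sizing figures under a purely resistive load model.

    signal: audio samples normalized to [-1, 1]
    peak_voltage_v: amplifier output voltage at digital full scale
    nominal_impedance_ohm: constant driver impedance (a simplification;
    a real driver's impedance varies strongly with frequency)
    """
    v = signal * peak_voltage_v              # instantaneous output voltage
    i = v / nominal_impedance_ohm            # instantaneous current
    v_rms = np.sqrt(np.mean(v ** 2))
    i_rms = np.sqrt(np.mean(i ** 2))
    return {
        "peak_voltage_v": np.max(np.abs(v)),
        "peak_current_a": np.max(np.abs(i)),
        "peak_power_w": np.max(v * i),
        "apparent_power_va": v_rms * i_rms,
    }

# Example: a full-scale 50 Hz sine driven at 40 V into 8 ohms.
t = np.linspace(0, 1, 48000, endpoint=False)
reqs = amplifier_requirements(np.sin(2 * np.pi * 50 * t), 40.0, 8.0)
```

For this sine the peak power is V^2/R = 200 W while the apparent power is only 100 VA, which is exactly the kind of gap between peak and sustained demand that motivates sizing from real program material rather than worst-case sines.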
Convention Paper 9213
P20-4 Multiphysical Simulation Methods for Loudspeakers—Advanced CAE-Based Simulations of Motor Systems—Alfred Svobodnik, Konzept-X GmbH - Karlsruhe, Germany; Roger Shively, JJR Acoustics, LLC - Seattle, WA, USA; Marc-Olivier Chauveau, Moca Audio - Tours, France
This is the first in a series of papers on the details of loudspeaker design using multiphysical computer aided engineering simulation methods. In this paper, the simulation methodology for accurately modeling the electromagnetics of loudspeakers will be presented. Primarily, the creation of a useful impedance curve in the virtual world will be demonstrated. The influences of the mechanical mounting will also be illustrated, as well as the inherent non-linearities of the loudspeaker motor. Those non-linearities will be illustrated through the correct simulation of the electromagnetic driving force, which has an influence on all loudspeakers, and the voice coil inductance, which can have a profound influence on midrange and high frequency loudspeakers. Results will be presented, correlating the simulated model results to the measured physical parameters and to the impedance curve. From that, the important aspects of the modeling that determine its accuracy will be discussed.
Convention Paper 9214
P20-5 MotionMix: A Gestural Audio Mixing Controller—Jarrod Ratcliffe, New York University - New York, NY, USA
This paper presents a control interface for stereo mixing using real-time computer vision. The user's sense of depth and panorama is improved over the traditional channel strip, while broad accessibility is maintained by integrating the system with digital audio workstation (DAW) software and keeping it portable and affordable. To give the user a heightened sense of sound spatialization over the traditional channel strip, the concept of depth is addressed directly using the stage metaphor. Sound sources are represented as colored spheres in a graphical user interface to provide visual feedback. Moving a source backward and forward controls its volume, while moving it left and right controls panning. A preliminary evaluation is conducted through a pilot study, and user feedback is considered regarding future applications of the interface.
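The stage metaphor described above maps a source's 2D position to level and pan. A minimal sketch of such a mapping is shown below; the attenuation curve and the constant-power pan law are assumptions chosen for illustration, not the paper's actual mapping.

```python
import math

def stage_to_mix(x, depth):
    """Map a source position on a virtual stage to stereo channel gains.

    x:     left-right position in [-1, 1] (-1 = hard left, 1 = hard right)
    depth: distance from the listener in [0, 1] (0 = front of stage)

    Hypothetical mapping: depth attenuates level (inverse-distance-like),
    x drives a constant-power pan law.
    """
    gain = 1.0 / (1.0 + 3.0 * depth)           # farther back = quieter
    theta = (x + 1.0) * math.pi / 4.0          # map [-1, 1] -> [0, pi/2]
    left = gain * math.cos(theta)
    right = gain * math.sin(theta)
    return left, right

# A centered source at the front of the stage gets equal channel gains.
left, right = stage_to_mix(0.0, 0.0)
```

A constant-power law keeps left^2 + right^2 fixed as a source pans, so perceived loudness does not dip at the center; that property is why it is a common default for this kind of interface.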
Convention Paper 9215
P20-6 An Associative Shared Memory Approach to Audio Connection Management—Andrew Eales, Wellington Institute of Technology - Wellington, New Zealand; Rhodes University - Grahamstown, South Africa; Richard Foss, Rhodes University - Grahamstown, Eastern Cape, South Africa
A distributed, associative memory that advertises audio streams and represents audio connections between networked audio devices is described. Characteristic features of a shared, associative memory are discussed, and three parameter-based models that represent audio signals and audio connections are introduced. Connection management is then discussed with reference to a distributed, associative memory environment. This environment allows changes made to audio connections to be automatically propagated to all networked devices, while also eliminating potential race conditions between connection requests. Additionally, connection management applications can be shared between different networked devices and controllers.
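The core idea above is that entries are addressed by their attributes rather than by location, and that every write is propagated to all networked devices. A toy, single-process sketch of such an associative memory follows; the entry fields and callback mechanism are illustrative assumptions and do not reflect the paper's actual protocol or parameter models.

```python
class AssociativeMemory:
    """Minimal tuple-space-style sketch: entries are matched by
    attribute patterns, and every write is pushed to all registered
    devices (illustrative only, not the paper's implementation)."""

    def __init__(self):
        self.entries = []          # list of attribute dicts
        self.subscribers = []      # one callback per networked device

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def write(self, entry):
        self.entries.append(entry)
        for notify in self.subscribers:   # propagate to all devices
            notify(entry)

    def read(self, pattern):
        """Return entries whose attributes match every key in pattern."""
        return [e for e in self.entries
                if all(e.get(k) == v for k, v in pattern.items())]

mem = AssociativeMemory()
seen = []
mem.subscribe(seen.append)
# A device advertises a stream; a controller records a connection.
mem.write({"type": "stream", "device": "console-1", "channels": 8})
mem.write({"type": "connection", "src": "console-1", "dst": "stagebox-2"})
streams = mem.read({"type": "stream"})
```

Because reads match on attributes, a controller can discover streams or connections without knowing which device stored them, which is the property that lets connection state be shared and propagated network-wide.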
Convention Paper 9216
P20-7 Utilizing Gesture Recognition and Ethernet AVB for Distributed Surround Sound Control—Mitchell Hedges, Rhodes University - Grahamstown, South Africa; Richard Foss, Rhodes University - Grahamstown, Eastern Cape, South Africa
Gesture recognition has become a preferred approach to the control of various systems, as it allows users to interact without handling any controls or equipment. This paper investigates the use of gesture recognition to select and transport audio tracks over an Ethernet AVB network to speaker endpoints. The research uses equipment that is commercially available and relatively cost-efficient. The endpoints receive audio samples encapsulated within network packets and process them. The audio tracks are mixed at each endpoint according to gain ratios that vary over time and differ between endpoints.
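The per-endpoint mixing step described above reduces, at each speaker, to a weighted sum of the received tracks using that endpoint's own gain vector. A minimal sketch (array shapes and function name are assumptions for illustration, not the paper's code):

```python
import numpy as np

def endpoint_mix(tracks, gains):
    """Mix received audio tracks at one speaker endpoint.

    tracks: (n_tracks, n_samples) array of decoded sample blocks
    gains:  per-track gain ratios for THIS endpoint; other endpoints
            hold different gain vectors, giving each speaker its own mix
    """
    gains = np.asarray(gains, dtype=float)
    return gains @ tracks        # weighted sum across tracks

# Two tracks, one block of four samples each.
tracks = np.array([[1.0, 0.0, 1.0, 0.0],
                   [0.0, 1.0, 0.0, 1.0]])
front_left = endpoint_mix(tracks, [0.8, 0.2])
rear_right = endpoint_mix(tracks, [0.1, 0.9])
```

Distributing the mix this way means the network carries the unmixed tracks once, and the spatial image is formed locally by each endpoint's gains, so changing the rendering requires updating only small gain vectors rather than restreaming audio.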
Convention Paper 9217