AES New York 2015
Paper Session Details

P1 - Signal Processing

Thursday, October 29, 9:00 am — 12:30 pm (Room 1A08)

Chair:
Scott Norcross, Dolby Laboratories - San Francisco, CA, USA

P1-1 Time-Frequency Analysis of Loudspeaker Sound Power Impulse Response—Pascal Brunet, Samsung Research America - Valencia, CA, USA; Audio Group - Digital Media Solutions; Allan Devantier, Samsung Research America - Valencia, CA, USA; Adrian Celestinos, Samsung Research America - Valencia, CA, USA
In normal conditions (e.g., a living room) the total sound power emitted by the loudspeaker plays an important role in the listening experience. Along with the direct sound and first reflections, the sound power defines the loudspeaker performance in the room. The acoustic resonances of the loudspeaker system are especially important, and thanks to spatial averaging, are more easily revealed in the sound power response. In this paper we use time-frequency analysis to study the spatially averaged impulse response and reveal the structure of its resonances. We also show that the net effect of loudspeaker equalization is not only the attenuation of the resonances but also the shortening of their duration.
Convention Paper 9354

P1-2 Low-Delay Transform Coding Using the MPEG-H 3D Audio Codec—Christian R. Helmrich, International Audio Laboratories - Erlangen, Germany; Michael Fischer, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Recently the ISO/IEC MPEG-H 3D Audio standard for perceptual coding of one or more audio channels has been finalized. It is a little-known fact that, particularly for communication applications, the 3D Audio core codec can be operated in a low-latency configuration that reduces the algorithmic coding/decoding delay to 44, 33, 24, or 18 ms at a sampling rate of 48 kHz. This paper introduces the essential coding tools required for high-quality low-delay coding (transform splitting, intelligent gap filling, and stereo filling) and demonstrates by means of blind listening tests that the achievable subjective performance compares favorably with that of, e.g., HE-AAC, even at low bit rates.
Convention Paper 9355

P1-3 Dialog Control and Enhancement in Object-Based Audio Systems—Jean-Marc Jot, DTS, Inc. - Los Gatos, CA, USA; Brandon Smith, DTS, Inc. - Bellevue, WA, USA; Jeff Thompson, DTS, Inc. - Bellevue, WA, USA
Dialog is often considered the most important audio element in a movie or television program. The potential for artifact-free dialog salience personalization is one of the advantages of new object-based multichannel digital audio formats, along with the ability to ensure that dialog remains comfortably audible in the presence of concurrent sound effects or music. In this paper we review some of the challenges and requirements of dialog control and enhancement methods in consumer audio systems, and their implications in the specification of object-based digital audio formats. We propose a solution incorporating audio object loudness metadata, including a simple and intuitive consumer personalization interface and a practical head-end encoder extension.
Convention Paper 9356

P1-4 Frequency-Domain Parametric Coding of Wideband Speech–A First Validation Model—Aníbal Ferreira, University of Porto - Porto, Portugal; Deepen Sinha, ATC Labs - Newark, NJ, USA
Narrowband parametric speech coding and wideband audio coding represent opposite paradigms for coding audible information, notably in terms of the specificity of the audio material, target bit rates, audio quality, and application scenarios. In this paper we explore a new avenue addressing parametric coding of wideband speech using the potential and accuracy of frequency-domain signal analysis and modeling techniques that typically belong to the realm of high-quality audio coding. A first analysis-synthesis validation framework is described that illustrates the decomposition, parametric representation, and synthesis of perceptually and linguistically relevant speech components while preserving naturalness and speaker-specific information.
Convention Paper 9357

P1-5 Proportional Parametric Equalizers—Application to Digital Reverberation and Environmental Audio Processing—Jean-Marc Jot, DTS, Inc. - Los Gatos, CA, USA
Single-band shelving or presence boost/cut filters are useful building blocks for a wide range of audio signal processing functions. Digital filter coefficient formulas for elementary first- or second-order IIR parametric equalizers are reviewed and discussed. A simple modification of the classic Regalia-Mitra design yields efficient solutions for tunable digital equalizers whose dB magnitude frequency response is proportional to the value of their gain control parameter. Practical applications to the design of tone correctors, artificial reverberators, and environmental audio signal processors are described.
Convention Paper 9358
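For P1-5, the sketch below shows the classic first-order Regalia-Mitra low-shelving filter that the paper starts from, H(z) = (1 + K)/2 + (K − 1)/2 · A(z) with a first-order allpass A(z). It does not include the paper's proportional modification, whose details are not given in the abstract. The function names, the allpass tuning a = (tan(π fc/fs) − 1)/(tan(π fc/fs) + 1), and the example parameters are assumptions for illustration.

```python
import cmath
import math


def regalia_mitra_low_shelf(gain: float, fc: float, fs: float):
    """Classic (unmodified) first-order Regalia-Mitra low shelf.

    H(z) = (1 + K)/2 + (K - 1)/2 * A(z), with the first-order allpass
    A(z) = (a + z^-1) / (1 + a z^-1). Returns (b0, b1, a1) for the
    difference equation y[n] = b0 x[n] + b1 x[n-1] - a1 y[n-1].
    """
    t = math.tan(math.pi * fc / fs)
    a = (t - 1.0) / (t + 1.0)   # allpass coefficient; |a| < 1 for 0 < fc < fs/2
    p = (1.0 + gain) / 2.0      # weight of the direct path
    m = (gain - 1.0) / 2.0      # weight of the allpass path
    b0 = p + m * a
    b1 = p * a + m
    return b0, b1, a


def magnitude(b0: float, b1: float, a1: float, f: float, fs: float) -> float:
    """|H| of (b0 + b1 z^-1) / (1 + a1 z^-1) evaluated at frequency f (Hz)."""
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    return abs((b0 + b1 * z_inv) / (1.0 + a1 * z_inv))
```

With K = 4 (about 12.0 dB of low-frequency gain), the response is K at DC and 1 at Nyquist, but at fc the allpass phase is −90°, so the gain there is sqrt((1 + K²)/2), about 9.3 dB rather than half of 12.0 dB. This is one way to see that the classic form's dB response does not scale proportionally with its gain parameter.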

P1-6 Comparison of Parallel Computing Approaches of a Finite-Difference Implementation of the Acoustic Diffusion Equation Model—Juan M. Navarro, UCAM - Universidad Católica San Antonio - Guadalupe (Murcia), Spain; Baldomero Imbernón, UCAM Catholic University of San Antonio - Murcia, Spain; José J. López, Universitat Politècnica de Valencia - Valencia, Spain; José M. Cecilia, UCAM Catholic University of San Antonio - Murcia, Spain
The diffusion equation model has been intensively researched as a room-acoustics simulation algorithm in recent years. A 3-D finite-difference implementation of this model was proposed to evaluate the propagation of the sound field within rooms over time. Despite the computational savings this model offers for calculating the room energy impulse response, elapsed times are still long when high spatial resolutions and/or simulations in several frequency bands are needed. In this work, several data-parallel approaches to this finite-difference solution on Graphics Processing Units are proposed using the Compute Unified Device Architecture (CUDA) programming model. Their performance is compared on different models of Nvidia GPUs. In general, the 2D vertical block approach running on a Tesla K20C shows the best speed-up, more than 15 times faster than the CPU version.
Convention Paper 9359
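For P1-6, a minimal CPU reference sketch of the kind of explicit finite-difference update being parallelized, reduced to one dimension for brevity. It assumes the common form of the room-acoustics diffusion model with diffusion coefficient D = λc/3 and mean free path λ = 4V/S, perfectly reflecting (zero-flux) walls, and no wall or air absorption. The function names and grid values are hypothetical, and the paper's 3-D solver and GPU kernels are not reproduced.

```python
def diffusion_coefficient(volume_m3: float, surface_m2: float, c: float = 343.0) -> float:
    """D = lambda * c / 3, with mean free path lambda = 4V/S (assumed model form)."""
    mean_free_path = 4.0 * volume_m3 / surface_m2
    return mean_free_path * c / 3.0


def diffuse_1d(w, d_coef: float, dx: float, dt: float, steps: int):
    """Explicit FTCS steps of dw/dt = D * d2w/dx2 on a 1-D grid of energy densities.

    Both ends are zero-flux (perfectly reflecting) walls, so sum(w) is
    conserved; absorption terms are deliberately omitted in this sketch.
    """
    r = d_coef * dt / dx ** 2
    if r > 0.5:
        raise ValueError(f"unstable step: D*dt/dx^2 = {r:.3f} > 0.5")
    w = list(w)
    n = len(w)
    for _ in range(steps):
        new = w[:]
        for i in range(n):  # every i uses only the previous step: data-parallel
            left = w[i - 1] if i > 0 else w[i]        # zero-flux left wall
            right = w[i + 1] if i < n - 1 else w[i]   # zero-flux right wall
            new[i] = w[i] + r * (left - 2.0 * w[i] + right)
        w = new
    return w
```

Because each new grid value depends only on the previous time step, the loop over grid points has no cross-iteration dependencies, which is what makes one GPU thread per point a natural mapping. The stability limit r = D·dt/dx² ≤ 1/2 is the standard 1-D explicit bound (1/6 in 3-D with equal spacing), so the time step shrinks with the square of the grid spacing, one reason high spatial resolutions lead to long run times.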

P1-7 An Improved and Generalized Diode Clipper Model for Wave Digital Filters—Kurt James Werner, Center for Computer Research in Music and Acoustics (CCRMA) - Stanford, CA, USA; Stanford University; Vaibhav Nangia, Stanford University - Stanford, CA, USA; Alberto Bernardini, Politecnico di Milano - Milan, Italy; Julius O. Smith, III, Stanford University - Stanford, CA, USA; Augusto Sarti, Politecnico di Milano - Milan, Italy
We derive a novel explicit wave-domain model for “diode clipper” circuits with an arbitrary number of diodes in each orientation, applicable, e.g., to wave digital filter emulation of guitar distortion pedals. Improving upon and generalizing the model of Paiva et al. (2012), which approximates reverse-biased diodes as open circuits, we derive a model with an approximated correction term using two Lambert W functions. We study the energetic properties of each model and clarify aspects of the original derivation. We demonstrate the model's validity by comparing a modded Tube Screamer clipping stage emulation to SPICE simulation.
Convention Paper 9360
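For P1-7, the sketch below implements a baseline in the spirit of the Paiva et al. (2012) model that the paper improves on: only the forward-biased diode of an antiparallel pair is solved exactly, and the reverse-biased one is treated as an open circuit. It is not the paper's corrected model with two Lambert W terms. Voltage waves follow the convention a = v + Ri, b = v − Ri. The Shockley parameters (Is = 2.52 nA, Vt = 25.85 mV, ideality factor 1), the 1 kΩ port resistance, and all function names are illustrative assumptions.

```python
import math


def lambert_w_exp(log_x: float, tol: float = 1e-13, max_iter: int = 100) -> float:
    """Principal-branch Lambert W of x = exp(log_x), for x > 0.

    Solves w + ln(w) = log_x with Newton's method. The left side is
    increasing and concave, so a start at or below the root converges
    monotonically; working with log_x avoids overflowing exp().
    """
    w = log_x - math.log(log_x) if log_x >= 1.0 else math.exp(log_x - 1.0)
    for _ in range(max_iter):
        step = w * (w + math.log(w) - log_x) / (w + 1.0)
        w -= step
        if abs(step) <= tol * max(1.0, w):
            break
    return w


def paiva_diode_pair(a: float, r_port: float,
                     i_s: float = 2.52e-9, v_t: float = 25.85e-3) -> float:
    """Reflected wave b for an antiparallel diode pair (reverse diode ignored).

    b = sgn(a) * (|a| + 2 R Is - 2 Vt W((R Is / Vt) exp((|a| + R Is) / Vt)))
    """
    log_x = math.log(r_port * i_s / v_t) + (abs(a) + r_port * i_s) / v_t
    b_mag = abs(a) + 2.0 * r_port * i_s - 2.0 * v_t * lambert_w_exp(log_x)
    sign = 1.0 if a >= 0.0 else -1.0
    return sign * b_mag
```

For an incident wave of 5 V at a 1 kΩ port, the diode voltage v = (a + b)/2 settles near 0.37 V, and v and i = (a − b)/(2R) satisfy the Shockley equation, which is a quick way to check any explicit wave-domain diode formula.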

P2 - Audio Education

Thursday, October 29, 9:00 am — 12:00 pm (Room 1A07)

Chair:
Tim Ryan, Webster University - St. Louis, MO, USA

P2-1 LabVIEW as a Music Synthesizer Laboratory Learning Environment—Edward B. Stokes, University of North Carolina at Charlotte - Charlotte, NC, USA; Ed Doering, Rose-Hulman Institute of Technology - Terre Haute, IN, USA
Most electrical engineering (EE) students are familiar with LabVIEW. This graphical programming environment is commonly used in university EE educational and research labs to facilitate data acquisition and processing using a suite of built-in mathematical, DSP, and communication functions. LabVIEW is particularly adept at emulating control panels with a variety of knobs, sliders, and gauges. The audio functionality of LabVIEW, along with its “knobby” user interface, makes it ideal for exploration of music synthesis concepts by EE students. In this paper several types of music synthesis are explored in LabVIEW. Implementation of these in elective EE coursework gives EE students a unique opportunity to experience abstract concepts such as waveforms, frequency, filtering, and envelopes through their auditory cortex, reinforcing what they have learned through traditional pedagogy, and also provides EE students an introduction to some basic audio engineering (AE) concepts.
Convention Paper 9361

P2-2 A Model for International and Industry-Engaged Collaboration and Learning—Mark Thorley, Coventry University - Coventry, Warwickshire, UK
Traditional barriers of geography, organization, and culture are being broken down by emerging technology [1]. In the recording industry, professionals often collaborate on projects globally, engaging in what Tapscott and Williams [2] call “peer-production.” The potential in these concepts extends to those developing their expertise: they can connect with peers and industry professionals on a global scale. Despite this potential, however, most Higher Education institutions fail to engage for cultural reasons. This paper outlines a model for collaborative learning explored and developed through a project funded by the UK's Higher Education Academy. The project involved Coventry University and industry organization JAMES as well as a number of other international partners. The paper looks at the pedagogical background to the project and some typical activities undertaken, before summarizing the key outcomes and opportunities for further work.
Convention Paper 9362

P2-3 From Creativity to Science and Back Again: Supporting Audio Students Through Active Teaching Approaches—Jason Fick, The Art Institute of Dallas - Dallas, TX, USA
Many students enrolling in audio programs are not fully aware of the importance of science for the audio professional. Typically these students are creative but may have deficiencies in math and science. My goal as an instructor is to minimize the negative associations of these subjects through active lesson plans that stress practical audio situations in a compelling and interactive manner. As a result, students develop confidence through their ability to use science as a tool to both solve audio problems and create expressive art forms. My approaches empower them to succeed in early courses, which facilitate creative applications in later classes. Consequently, students are better prepared for the workforce, with skills that promote both technical and creative capacities.
Convention Paper 9363

P2-4 The Use of Digital Reverberation Projects to Teach Audio Signal Processing—Benjamin D. McPheron, Roger Williams University - Bristol, RI, USA; Kelsey M. Cintorino, Roger Williams University - Bristol, RI, USA; Nicholas J. Benoit, Roger Williams University - Bristol, RI, USA; Abdulrahim S. Hasan, Roger Williams University - Bristol, RI, USA; Kevin J. Oliveira, Roger Williams University - Bristol, RI, USA; Andrew D. Senerchia, Roger Williams University - Bristol, RI, USA; Daniel M. Wisniewski, Roger Williams University - Bristol, RI, USA
Hands-on application is essential to the development of practicing engineers capable of designing and implementing digital signal processing methods. The application of digital signal processing to audio applications provides students with instantly gratifying results and further develops future audio engineering professionals. In order to provide deeper understanding of audio processing techniques, students can be presented with projects that challenge them to create unique applications or methods in the field of audio processing. This work reports the project framework and outstanding student work resulting from implementing this method in a digital signal processing course, as well as the assessment strategy used to evaluate student understanding of key audio engineering techniques.
Convention Paper 9364
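For P2-4, the abstract does not list the students' projects. A Schroeder-style reverberator (parallel feedback comb filters followed by series allpass filters) is sketched here only as a typical example of a digital reverberation project. The function names, delay lengths (in samples), and gains are hypothetical values, not taken from the paper.

```python
def feedback_comb(x, delay: int, g: float):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y


def schroeder_allpass(x, delay: int, g: float):
    """Schroeder allpass: y[n] = -g x[n] + x[n - delay] + g y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y


def schroeder_reverb(x, comb_delays=(1557, 1617, 1491, 1422), comb_g=0.84,
                     ap_delays=(225, 556), ap_g=0.7):
    """Parallel feedback combs (averaged) followed by allpasses in series."""
    combs = [feedback_comb(x, d, comb_g) for d in comb_delays]
    y = [sum(samples) / len(combs) for samples in zip(*combs)]
    for d in ap_delays:
        y = schroeder_allpass(y, d, ap_g)
    return y
```

The parallel combs build echo density and set the decay rate through their delays and feedback gains, while the series allpasses add diffusion with a flat magnitude response, so students can relate each difference equation directly to what they hear in the impulse response.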

P2-5 Audio Recording and Production Education: Skills New Hires Have and Where They Reported Learning Them—Doug Bielmeier, Purdue School Of Engineering, IUPUI - Indianapolis, IN, USA
To understand how audio recording and production programs meet the needs of the larger entertainment industry, this study directly asked new hires what skills they have and where they were learned. In the New Hires Survey they were asked to rate the level of proficiency of their skills, where they learned these skills, and what skills they need to learn. The new hires reported learning basic technical skills during formal audio recording and production training but learned social and communication skills on their own or on the job. They requested a greater emphasis on career-critical areas of live sound and music business. Further research is recommended to understand industry needs, identify best practices for the acquisition of skills, and to determine how educational institutions can keep pace with the ever-changing entertainment industry.
Convention Paper 9365

P2-6 Case Study: Expanding Audio Production Facilities at Ohio University to Accommodate Student Needs—Kyle P. Snyder, Ohio University, School of Media Arts & Studies - Athens, OH, USA
Creating a recording facility is equal parts art and science. However, designing and adapting recording studios in higher education environments presents several challenges unseen within the commercial arena. In its third major design iteration since the formation of the College of Communication in 1968, the School of Media Arts & Studies has expanded its facilities to include a new mixing and mastering suite, an expansive 5.1 post-production and critical listening facility, and numerous classrooms and additional lab spaces, more than tripling the space available to faculty, graduate, and undergraduate students.
Convention Paper 9366

P3 - Transducers/Perception

Thursday, October 29, 11:00 am — 12:30 pm (S-Foyer 1)

P3-1 Predicting the Acoustic Power Radiation from Loudspeaker Cabinets: A Numerically Efficient Approach—Mattia Cobianchi, B&W Group Ltd. - West Sussex, UK; Martial Rousseau, B&W Group Ltd. - West Sussex, UK
Loudspeaker cabinets should not contribute at all to the total sound radiation; instead, they should aim to be perfectly rigid boxes that enclose the drive units. To achieve this goal, state-of-the-art FEM software packages and laser Doppler vibrometers are the tools at our disposal. The modeling steps covered in the paper are: measuring and fitting orthotropic material properties, including damping; 3D mechanical modeling with a curvilinear coordinate system and thin elastic layers to represent glue joints; and scanning laser Doppler measurements and single-point vibration measurements with an accelerometer. Additionally, a numerically efficient post-processing approach used to extract the total radiated acoustic power is presented, together with an example of the kind of improvement that can be expected from a typical design optimization.
Convention Paper 9367

P3-2 New Method to Detect Rub and Buzz of Loudspeakers Based on Psychoacoustic Sharpness—Tingting Zhou, Nanjing Normal University - Nanjing, Jiangsu, China; Ming Zhang, Nanjing Normal University - Nanjing, Jiangsu, China; Chen Li, Nanjing Normal University - Nanjing, Jiangsu, China
The distortion detection of loudspeakers has been researched for a very long time. Researchers are committed to finding an objective way to detect Rub and Buzz (R&B) in loudspeakers that is consistent with human auditory perception. This paper applies psychoacoustics to loudspeaker distortion detection and describes a new method to detect R&B based on psychoacoustic sharpness. Experiments show that, compared with existing objective R&B detection methods, the proposed method yields detection results that are more consistent with subjective judgments.
Convention Paper 9368

P3-3 Modal Impedances and the Boundary Element Method: An Application to Horns and Ducts—Bjørn Kolbrek, Norwegian University of Science and Technology - Trondheim, Norway
Loudspeaker horns, waveguides, and other ducts can be simulated by general numerical methods, like the Finite Element or Boundary Element Methods (FEM or BEM), or by a method using a modal description of the sound field, called the Mode Matching Method (MMM). BEM and FEM can describe a general geometry but are often computationally expensive. MMM, on the other hand, is fast, easily scalable, requires no mesh generation and little memory but can only be applied to a limited set of geometries. This paper shows how BEM and MMM can be combined in order to efficiently simulate horns where part of the horn must be described by a general meshed geometry. Both BEM-MMM and MMM-BEM couplings are described, and examples given.
Convention Paper 9369

P3-4 Audibility Threshold of Auditory-Adapted Exponential Transfer-Function Smoothing (AAS) Applied to Loudspeaker Impulse Responses—Florian Völk, Technische Universität München - München, Germany; WindAcoustics UG (haftungsbeschränkt) - Windach, Germany; Yuliya Fedchenko, Technische Universität München - Munich, Germany; Hugo Fastl, Technical University of Munich - Munich, Germany
A reverberant acoustical system’s transfer function may show deep notches or pronounced peaks, requiring large linear amplification in the playback system when used, for example, in auralization or for convolution reverb. It is common practice to apply spectral smoothing, with the aim of reducing spectral fluctuation without degrading auditory-relevant information. A procedure referred to as auditory-adapted exponential smoothing (AAS) was proposed earlier, adapted to the spectral properties of the hearing system by implementing frequency-dependent smoothing bandwidths. This contribution presents listening experiments aimed at determining the audibility threshold of auditory-adapted exponential smoothing, which is the maximum amount of spectral smoothing allowed without being audible. As the results depend on the specific acoustic system, parametrization guidelines are proposed.
Convention Paper 9371

P3-5 Developing a Timbrometer: Perceptually-Motivated Audio Signal Metering—Duncan Williams, University of Plymouth - Devon, UK
Early experiments suggest that a universally agreed-upon timbral lexicon is not possible, nor would such a tool be intrinsically useful to musicians, composers, or audio engineers. Therefore, the goal of this work is to develop perceptually calibrated metering tools, with an interface and usability similar to those of existing loudness meters, by using a linear regression model to match large numbers of acoustic features to listener-reported timbral descriptors. This paper presents work towards a proof-of-concept combination of acoustic measurement and human listening tests in order to explore connections between 135 acoustic features and three timbral descriptors: brightness, warmth, and roughness.
Convention Paper 9372

P3-6 A Method of Equal Loudness Compensation for Uncalibrated Listening Systems—Oliver Hawker, Birmingham City University - Birmingham, UK; Yonghao Wang, Birmingham City University - Birmingham, UK
Equal-loudness contours represent the sound-pressure-level-dependent frequency response of the auditory system, which implies an arbitrary change in the perceived spectral balance of a sound when the sound-pressure-level is modified. The present paper postulates an approximate proportional relationship between loudness and sound-pressure-level, permitting relative loudness modification of an audio signal while maintaining a constant spectral balance without an absolute sound-pressure-level reference. A prototype implementation is presented and accessible at [1]. Preliminary listening tests are performed to demonstrate the benefits of the described method.
Convention Paper 9373

P4 - Transducers—Part 1: Headphones, Amplifiers, and Microphones

Thursday, October 29, 2:30 pm — 5:30 pm (Room 1A08)

Chair:
Christopher Struck, CJS Labs - San Francisco, CA, USA; Acoustical Society of America

P4-1 Headphone Response: Target Equalization Trade-offs and Limitations—Christopher Struck, CJS Labs - San Francisco, CA, USA; Acoustical Society of America; Steve Temme, Listen, Inc. - Boston, MA, USA
The effects of headphone response and equalization are examined with respect to the influence on perceived sound quality. Free field, diffuse field, and hybrid real sound field targets are shown and objective response data for a number of commercially available headphones are studied and compared. Irregular responses are examined to determine the source of response anomalies, whether these can successfully be equalized and what the limitations are. The goal is to develop a robust process for evaluating and appropriately equalizing headphone responses to a psychoacoustically valid target and to understand the constraints.
Convention Paper 9374

P4-2 A Headphone Measurement System Covers both Audible Frequency and beyond 20 kHz—Naotaka Tsunoda, Sony Corporation - Shinagawa-ku, Tokyo, Japan; Takeshi Hara, Sony Corporation - Tokyo, Japan; Koji Nageno, Sony Corporation - Tokyo, Japan
A new headphone measurement system is proposed, consisting of a 1/8” microphone and a newly developed HATS (Head And Torso Simulator) with a coupler that has a realistic ear-canal shape, enabling measurement of the entire frequency response from the audible range up to 140 kHz. At the same time, a new frequency response evaluation scheme based on HRTF correction is proposed. Measurement results obtained with this scheme are much easier to interpret because they can be compared directly with free-field loudspeaker frequency responses.
Convention Paper 9375

P4-3 Measurements of Acoustical Speaker Loading Impedance in Headphones and Loudspeakers—Jason McIntosh, McIntosh Applied Engineering - Eden Prairie, MN, USA
The acoustical design of two circumaural headphones and a desktop computer speaker has been studied by measuring the acoustical impedance of the various components in their design. The impedances were then used to build an equivalent circuit model for each device that predicted its pressure response. Good correlation was seen between the model and measurements. The impedance provides unique insight into the acoustic design that is not observed through the electrical impedance or pressure response measurements commonly relied upon when designing such devices. By building models for each impedance structure, it is possible to obtain an accurate model of the whole system in which the effect of each component upon the device's overall performance can be seen.
Convention Paper 9376

P4-4 Efficiency Investigation of Switch-Mode Power Audio Amplifiers Driving Low Impedance Transducers—Niels Elkjær Iversen, Technical University of Denmark - Lyngby, Denmark; Henrik Schneider, Technical University of Denmark - Kgs. Lyngby, Denmark; Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark; Michael A. E. Andersen, Technical University of Denmark - Kgs. Lyngby, Denmark
The typical nominal resistance of an electrodynamic transducer spans 4 Ohms to 8 Ohms. This work examines the possibility of driving a transducer with a much lower impedance so that the amplifier and loudspeaker can be driven directly from a low-voltage source such as a battery. A method for estimating the amplifier rail voltage requirement as a function of the voice coil nominal resistance is presented. The method is based on a crest factor analysis of music signals and an estimation of the electrical power required to reach a specific target sound pressure level. Experimental measurements confirm a large efficiency improvement compared to a conventional battery-driven sound system. Future optimization of low-voltage, high-current amplifiers for low-impedance loudspeaker drivers is discussed.
Convention Paper 9377
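For P4-4, a rough back-of-the-envelope version of the rail-voltage estimate. It assumes the sensitivity rating (dB SPL for 1 W at 1 m) stays the same when the voice-coil resistance changes, a purely resistive load, and free-field inverse-square spreading; the paper's own crest-factor analysis and power estimation may differ. The function names and example figures are hypothetical.

```python
import math


def crest_factor_db(signal) -> float:
    """Peak-to-RMS ratio of a signal in dB."""
    peak = max(abs(s) for s in signal)
    rms = math.sqrt(sum(s * s for s in signal) / len(signal))
    return 20.0 * math.log10(peak / rms)


def rail_voltage_estimate(target_spl_db: float, sensitivity_db_1w_1m: float,
                          r_nominal: float, crest_factor_db_value: float,
                          distance_m: float = 1.0) -> float:
    """Peak voltage the amplifier rails must supply for a target SPL.

    Hypothetical simplifications: inverse-square law from 1 m to distance_m,
    sensitivity in dB SPL per 1 W at 1 m independent of r_nominal, and a
    purely resistive load of r_nominal ohms.
    """
    spl_at_1m = target_spl_db + 20.0 * math.log10(distance_m)
    p_avg = 10.0 ** ((spl_at_1m - sensitivity_db_1w_1m) / 10.0)  # average electrical power, W
    v_rms = math.sqrt(p_avg * r_nominal)
    return v_rms * 10.0 ** (crest_factor_db_value / 20.0)         # peak voltage, V
```

For example, 95 dB SPL at 1 m from an 85 dB/1 W/1 m, 4 Ω driver with 12 dB of crest factor needs 10 W average and roughly 25 V peak. Because the required voltage scales with the square root of the resistance, halving the resistance at the same power lowers the voltage by about 1.41 and raises the current by the same factor, which is the low-voltage, high-current direction the abstract describes for battery-driven systems.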

P4-5 Self-Oscillating 150 W Switch-Mode Amplifier Equipped with eGaN-FETs—Martijn Duraij, Technical University of Denmark - Lyngby, Denmark; Niels Elkjær Iversen, Technical University of Denmark - Lyngby, Denmark; Lars Press Petersen, Technical University of Denmark - Kgs. Lyngby, Denmark; Patrik Boström, Bolecano Holding AB - Helsingborg, Sweden
Whereas clocked high-frequency switch-mode audio power amplifiers equipped with eGaN-FETs have been introduced in recent years, this paper presents a novel self-oscillating eGaN-FET-based amplifier. A 150 Wrms amplifier has been built and tested for performance and efficiency, with an idle switching frequency of 2 MHz. The amplifier consists of a power-stage module with a self-oscillating loop and an error-reducing global loop. It was found that an eGaN-FET-based amplifier shows promising potential for building high-power-density audio amplifiers with excellent audio performance. However, care must be taken with the effects caused by the higher switching frequency.
Convention Paper 9378

P4-6 Wind Noise Measurements and Characterization Around Small Microphone Ports—Jason McIntosh, Starkey Hearing Technologies - Eden Prairie, MN, USA; Sourav Bhunia, Starkey Hearing Technologies - Eden Prairie, MN, USA
The physical origins of microphone wind noise are discussed and measured. The measured noise levels are shown to correlate well with theoretical estimates of non-propagating local fluid-dynamic turbulence pressure variations called “convective pressure.” Free-stream convective pressure fluctuations may already be present in a flow, independent of its interaction with a device housing a microphone. Consequently, wind noise testing should be performed in turbulent rather than laminar air flows. A metric based on the Speech Intelligibility Index (SII) is proposed for characterizing wind noise effects on devices primarily designed to work with speech signals, making it possible to evaluate how effectively nonlinear processing reduces microphone wind noise.
Convention Paper 9379

P5 - Perception—Part 1

Thursday, October 29, 2:30 pm — 5:00 pm (Room 1A07)

Chair:
Jon Boley, GN ReSound - Chicago, IL, USA

P5-1 Detection of High-Frequency Harmonics in a Complex Tone—Wesley Bulla, Belmont University - Nashville, TN, USA
Prior investigations have generally failed to confirm or deny the influence of high-frequency harmonics contained in musical sounds. Embedded within this experiment were two listening tests: one investigating the threshold for differences in timbre, and thus participant ability, and another seeking to find an influence of high-frequency harmonic content on timbre perception. The study was based on the premise that harmonics outside the range of auditory detection influence the resultant waveform and may therefore alter the percept of a sound’s tonal character; however, it found no evidence that capable listeners noticed an effect of high-frequency harmonics.
Convention Paper 9380

P5-2 Towards a Perceptual Model of “Punch” in Musical Signals—Steven Fenton, University of Huddersfield - Huddersfield, West Yorkshire, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
This paper proposes a perceptual model for the measurement of “punch” in musical signals. Punch is an attribute that is often used to characterize music or sound sources that convey a sense of dynamic power or weight to the listener. A methodology is explored that combines signal separation and low-level parameter measurement to produce a perceptually weighted “punch” score. The parameters explored are the onset time and frequency components of the signal across octave bands. The “punch” score is determined by a weighted sum of these parameters using coefficients derived through a large-scale listening test. The model may have application in music information retrieval (MIR) and music production tools. The paper concludes by evaluating the perceptual model using commercially released music.
Convention Paper 9381

P5-3 Factors That Influence Listeners’ Preferred Bass and Treble Levels in Headphones—Sean Olive, Harman International - Northridge, CA, USA; Todd Welti, Harman International Inc. - Northridge, CA, USA
A listening experiment was conducted to study factors that influence listeners’ preferred bass and treble balance in headphone sound reproduction. Using a method of adjustment, a total of 249 listeners adjusted the relative treble and bass levels of a headphone that was first equalized at the eardrum reference point (DRP) to match the in-room steady-state response of a reference loudspeaker in a reference listening room. Listeners repeated the adjustment five times using three stereo music programs. The listeners included males and females from different age groups, listening experiences, and nationalities. The results provide evidence that the preferred bass and treble balances in headphones were influenced by several factors, including program and the listeners’ age, gender, and prior listening experience. The younger and less experienced listeners on average preferred more bass and treble in their headphones compared to the older, more experienced listeners. Female listeners on average preferred less bass and treble than their male counterparts.
Convention Paper 9382

P5-4 Identifying and Validating Program Material: A Hyper-Compression Perspective—Malachy Ronan, University of Limerick - Limerick, Ireland; Nicholas Ward, University of Limerick - Limerick, Ireland; Robert Sazdov, University of Limerick - Limerick, Ireland
Two listening experiments were conducted to assess: (i) the effect of program material on six sound quality dimensions and (ii) the effect of 20 dB of compression limiting on distraction. Thirty-five participants completed two experiments using a MUSHRA-style interface. The experimental results demonstrate that program material significantly affected dimension and distraction ratings. Dimension ratings were influenced by prior listening experience, while distraction ratings related to audible artifacts in different program material. Program material from the same artist was rated similarly for distraction in two-thirds of the dimensions, suggesting a possible correlation between production aesthetics and audible artifacts. It is concluded that validating program material is a necessary precaution to avoid distracting perceptual cues generated by the process of dynamic range compression.
Convention Paper 9383

P5-5 Validation of Experimental Methods to Record Stimuli for Microphone Comparisons—Andy Pearce, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK; Martin Dewhirst, University of Surrey - Guildford, Surrey, UK
Test recordings can facilitate evaluation of a microphone's characteristics but there is currently no standard or experimentally validated method for making recordings to compare the perceptual characteristics of microphones. This paper evaluates previously used recording methods, concluding that, of these, the most appropriate approach is to record multiple microphones simultaneously. However, perceived differences between recordings made with microphones in a multi-microphone array might be due to (i) the characteristics of the microphones and/or (ii) the different locations of the microphones. Listening tests determined the maximum acceptable size of a multi-microphone array to be 150 mm in diameter, but the diameter must be reduced to no more than 100 mm if the microphones to be compared are perceptually very similar.
Convention Paper 9385 (Purchase now)

P6 - Transducers—Part 2: Loudspeakers

Friday, October 30, 9:00 am — 12:30 pm (Room 1A08)

Chair:
Sean Olive, Harman International - Northridge, CA, USA

P6-1 Wideband Compression Driver Design, Part 1: A Theoretical Approach to Designing Compression Drivers with Non-Rigid Diaphragms—Jack Oclee-Brown, GP Acoustics (UK) Ltd. - Maidstone, UK
This paper presents a theoretical approach to designing compression drivers that have non-rigid radiating diaphragms. The presented method is a generalization of the Smith "acoustic mode balancing" approach to compression driver design that also considers the modal behavior of the radiating structure. It is shown that, if the mechanical diaphragm modes and acoustical cavity modes meet a certain condition, diaphragm non-rigidity is not a factor that limits the linear driver response. A theoretical compression driver design approximately meeting this condition is described, and its performance is evaluated using FEM models.
Convention Paper 9386 (Purchase now)

P6-2 Time/Phase Behavior of Constant Beamwidth Transducer (CBT) Circular-Arc Loudspeaker Line Arrays—D.B. (Don) Keele, Jr., DBK Associates and Labs - Bloomington, IN, USA
This paper explores the time and phase response of circular-arc CBT arrays through simulation and measurement. Although the impulse response of the CBT array is spread out in time, its phase response is found to be minimum phase at all locations in front of the array: up-down, side-to-side, and near-far. When the magnitude response is equalized flat with a minimum-phase filter, the resultant phase is substantially linear over a broad frequency range at all these diverse locations. This means that the CBT array is essentially time aligned and linear phase and, as a result, will accurately reproduce square waves anywhere within its coverage. Accurate reproduction of square waves is not necessarily audible, but many people believe that it is an important loudspeaker characteristic. The CBT array essentially forms a virtual point source, but with the extremely uniform broadband directional coverage of the CBT array itself. When the CBT array is implemented with discrete sources, the impulse response mimics an FIR filter, but with non-linear sample spacing and with a shape that looks like a roller coaster track viewed laterally. An analysis of the constant-phase wave fronts generated by a CBT array reveals that the sound waves essentially radiate from a point located at the center of curvature of the array’s circular arc and are essentially circular at all distances, mimicking a point source.
Convention Paper 9387 (Purchase now)

P6-3 Progressive Degenerate Ellipsoidal Phase Plug—Charles Hughes, Excelsior Audio - Gastonia, NC, USA; AFMG - Berlin, Germany
This paper will detail the concepts and design of a new phase plug. This device can be utilized to transform a circular planar wave front to a rectangular planar wave front. Such functionality can be very useful for line array applications as well as for feeding the input, or throat section, of a rectangular horn from the output of conventional compression drivers. The design of the phase plug allows for the exiting wave front to have either concave or convex curvature if a planar wave front is not desired. One of the novel features of this device is that there are no discontinuities within the phase plug.
Convention Paper 9388 (Purchase now)

P6-4 Low Impedance Voice Coils for Improved Loudspeaker Efficiency—Niels Elkjær Iversen, Technical University of Denmark - Lyngby, Denmark; Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark; Michael A. E. Andersen, Technical University of Denmark - Kgs. Lyngby, Denmark
In modern audio systems using switch-mode amplifiers, total efficiency is dominated by the rather poor efficiency of the loudspeaker. For decades, voice coils have been designed to give nominal resistances of 4 to 8 ohms, even though modern switch-mode audio amplifiers can be designed for much lower loads. A thorough analysis of loudspeaker efficiency is presented, and its relation to the voice coil fill factor is described. A new parameter, the driver's mass ratio, is introduced; it indicates how much a fill-factor optimization will improve a driver's efficiency. Different voice coil winding layouts are described and their fill factors analyzed. It is found that by lowering the nominal resistance of a voice coil and using rectangular wire, the fill factor can be increased. Three voice coils are designed for a standard 10” woofer, and the corresponding frequency responses are estimated. For this woofer it is shown that the sensitivity can be improved by approximately 1 dB, corresponding to a 30% efficiency improvement, simply by increasing the fill factor using a low-impedance voice coil with rectangular wire.
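As a quick check of the figures quoted above: sensitivity is measured at a fixed input power, so a sensitivity change in dB corresponds directly to a ratio of radiated acoustic power, and therefore of efficiency. The minimal Python sketch below performs that conversion in both directions; the function names are illustrative and do not come from the paper.

```python
import math


def db_to_power_ratio(delta_db: float) -> float:
    """Power (efficiency) ratio for a sensitivity change at fixed input power."""
    return 10 ** (delta_db / 10)


def efficiency_gain_to_db(relative_gain: float) -> float:
    """Sensitivity change in dB for a fractional efficiency gain (0.30 = +30 %)."""
    return 10 * math.log10(1 + relative_gain)


print(f"{db_to_power_ratio(1.0):.3f}")       # 1.259, i.e. about +26 %
print(f"{efficiency_gain_to_db(0.30):.2f}")  # 1.14 dB
```

So +1 dB works out to roughly +26% and +30% to roughly 1.1 dB, consistent with the "approximately 1 dB" stated in the abstract.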
Convention Paper 9389 (Purchase now)

P6-5 Effectiveness of Exotic Vapor-Deposited Coatings on Improving the Performance of Hard Dome Tweeters—Peter John Chapman, Harman - Denmark; Bang & Olufsen Automotive
The audio industry is constantly striving for new and different methods with which to improve the sound quality and performance of components in the signal chain. In many cases, however, insufficient evidence is provided for the benefit of so-called improvements. This paper presents the results of a scientific study analyzing the effectiveness of applying vapor-deposited diamond-like-carbon, chromium, and chromium nitride coatings to aluminum and titanium hard dome tweeters. Careful attention was paid during processing, assembly, and measurement of the tweeters to control other factors and keep their influence equal across samples, so that a robust analysis could be made. The objective results were supplemented with listening tests comparing the control with the objectively most significant change.
Convention Paper 9390 (Purchase now)

P6-6 Wideband Compression Driver Design. Part 2, Application to a High Power Compression Driver with a Novel Diaphragm Geometry—Mark Dodd, Celestion - Ipswich, Suffolk, UK
Performance limitations of high-power, wide-bandwidth conventional and co-entrant compression drivers are briefly reviewed. An idealized co-entrant compression driver is modeled, and its acoustic performance limitations are discussed. The beneficial effect of axisymmetry is illustrated using results from numerical models. The vibrational behavior of spherical-cap, conical, and bi-conical diaphragms is compared. Axiperiodic membrane geometries consisting of circular arrays of features are discussed. This discussion leads to the conclusion that, for a given feature size, the vibrational properties of annular axiperiodic diaphragms depend mostly on the width of the annulus rather than its diameter. The numerically modeled and measured acoustic performance of a high-power, wide-bandwidth compression driver using an annular axiperiodic membrane, with optimized vibrational and acoustic modes, is discussed.
Convention Paper 9391 (Purchase now)

P6-7 Dual Diaphragm Asymmetric Compression Drivers—Alexander Voishvillo, JBL/Harman Professional - Northridge, CA, USA
A theory of dual compression drivers was described earlier, and the design was implemented in several JBL Professional loudspeakers. This type of driver consists of two motors and two annular diaphragms connected through similar phasing plugs to a common acoustical load. The new concept is also based on two motors and acoustically similar phasing plugs, but the diaphragms are mechanically “tuned” to different frequency ranges. Summation of the acoustical signals at the common acoustical load provides an extended frequency range compared with the design using identical diaphragms. Theoretically, maximum overall SPL sensitivity is achieved by in-phase radiation of the diaphragms. The operating principles of the new dual asymmetric driver are explained using a combination of matrix analysis, finite element analysis, and scanning-vibrometer data, and electroacoustic measurements are presented. The performance of these dual drivers is compared with that of the earlier fully symmetric designs.
Convention Paper 9392 (Purchase now)

P7 - Perception—Part 2

Friday, October 30, 9:00 am — 12:00 pm (Room 1A07)

Chair:
Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA

P7-1 In-Vehicle Audio System Sound Quality Preference Study—Patrick Dennis, Nissan North America - Farmington Hills, MI, USA
In-vehicle audio systems present a unique listening environment. Listeners were asked to adjust the relative bass and treble levels, as well as the fade and balance levels, according to their preference for three music programs reproduced through a high-quality in-vehicle audio system. The audio system frequency response was initially tuned to a spectrum similar to that preferred for in-room loudspeakers. The fade control was initially set to give a frontal image with some rear envelopment, using two different rear speaker locations (rear deck and rear door), while the balance control was set to give a center image between the center of the steering wheel and the rearview mirror. Stage height was located on top of the instrument panel (head level). Results showed that, on average, listeners preferred +13 dB bass and –2 dB treble relative to a flat response, a fade of +3.5 dB rearward for rear-deck-mounted speakers and +2.6 dB rearward for rear-door-mounted speakers, and a balance of 0 dB. Significant variations between individual listeners were observed.
Convention Paper 9393 (Purchase now)

P7-2 Adapting Audio Quality Assessment Procedures for Engineering Practice—Jan Berg, Luleå University of Technology - Piteå, Sweden; Nyssim Lefford, Luleå University of Technology - Luleå, Sweden
Audio quality is of concern all along the production chain, from content creation to distribution. The technologies employed at each step (equipment, processors such as codecs, downmix algorithms, and loudspeakers) are all scrutinized for their impact. The now well-established field of audio quality research has developed robust methods for assessment. To form a basis for this work, research has investigated how perceptual dimensions are formed and expressed, and the literature includes numerous sonic attributes that may be used to evaluate audio quality. Altogether, these findings have provided benchmarks and guidelines for improving audio technology, setting standards in the manufacture of sound and recording equipment, and furthering the design of reproduction systems and spaces. By comparison, however, they are rarely used to inform recording and mixing practice. In this paper, quality evaluation and mixing practice are compared on selected counts, and observations are made on the points where these fields may mutually inform one another.
Convention Paper 9394 (Purchase now)

P7-3 Perception and Automated Assessment of Audio Quality in User Generated Content—Bruno Fazenda, University of Salford - Salford, Greater Manchester, UK; Paul Kendrick, University of Salford - Salford, UK; Trevor Cox, University of Salford - Salford, UK; Francis Li, University of Salford - Salford, UK; Iain Jackson, University of Manchester - Manchester, UK
Many of us now carry technologies that allow us to record sound, whether that is the sound of our child's first music concert on a digital camera or a recording of a practical joke on a mobile phone. However, the production quality of the sound in user-generated content is often very poor: distorted, noisy, with garbled speech or indistinct music. This paper reports the outcomes of a three-year research project on assessing the quality of user-generated recordings. Our interest lies in the causes of poor recordings, especially what happens between the sound source and the electronic signal emerging from the microphone. We have investigated typical problems: distortion, wind noise, microphone handling noise, and frequency response. From subjective tests on the perceived quality of such errors, and from signal features extracted from the audio files, we developed perceptual models that automatically predict the perceived quality of audio streams unknown to the model. It is shown that perceived quality is most strongly associated with distortion and frequency response, with wind and handling noise only slightly less important. The work presented here has applications in areas such as perception and measurement of audio quality, signal processing, feature detection, and machine learning.
Convention Paper 9395 (Purchase now)

P7-4 Compensating for Tonal Balance Effects Due to Acoustic Cross Talk Removal while Listening with Headphones—Bob Schulein, RBS Consultants - Schaumburg, IL, USA
With the large number of headphones now in use, a large share of recorded music mixed on loudspeakers is now heard through headphones. It is well known that headphone listening changes spatial perception because the crosstalk normally associated with loudspeaker listening is eliminated, resulting in a widening of the perceived sound stage. In addition to this difference, a question arises as to what changes in perceived tonal balance may occur with the removal of acoustic crosstalk. This paper presents a method of measuring such differences based on a series of near-field binaural mannequin recordings from which the spectral influence of crosstalk is determined. Measurement data from this investigation are presented. Results suggest that headphones designed to sound well balanced for most popular music benefit from a low-frequency boost in frequency response, whereas headphones designed primarily for classical listening require less boost.
Convention Paper 9396 (Purchase now)

P7-5 The Use of Microphone Level Balance in Blending the Timbre of Horn and Bassoon Players—Sven-Amin Lembke, McGill University - Montreal, Quebec, Canada; De Montfort University - Leicester, UK; Scott Levine, Skywalker Sound; Martha de Francisco, McGill University - Montreal, QC, Canada; Stephen McAdams, McGill University - Montreal, Quebec, Canada
A common musical aim of orchestration is to achieve a blended timbre for certain instrument combinations. Its success has been shown to also depend on timbral coordination between musicians during performance; this study extends that work by adding the subsequent involvement of sound engineers. We report the results of a production experiment in which sound engineers mixed independent feeds from a main microphone and two spot microphones to blend the timbre of pairs of bassoon and horn players in a two-channel stereo mix. The balance of microphone feeds was shown to be affected by leadership roles between performers, the musical material, and aspects related to room acoustics and performer characteristics.
Convention Paper 9397 (Purchase now)

P7-6 101 Mixes: A Statistical Analysis of Mix-Variation in a Dataset of Multi-Track Music Mixes—Alex Wilson, University of Salford - Salford, Greater Manchester, UK; Bruno Fazenda, University of Salford - Salford, Greater Manchester, UK
The act of mix-engineering is a complex combination of creative and technical processes; analysis is often performed by studying the techniques of a few expert practitioners qualitatively. We propose to study the actions of a large group of mix-engineers of varying experience, introducing quantitative methodology to investigate mix-variation and the perception of quality. This paper describes the analysis of a dataset containing 101 alternate mixes generated by human mixers as part of an on-line mix competition. A varied selection of audio signal features is obtained from each mix and subsequent principal component analysis reveals four prominent dimensions of variation: dynamics, treble, width, and bass. An ordinal logistic regression model suggests that the ranking of each mix in the competition was significantly influenced by these four dimensions. The implications for the design of intelligent music production systems are discussed.
Convention Paper 9398 (Purchase now)

P8 - Signal Processing

Friday, October 30, 11:00 am — 12:30 pm (S-Foyer 1)

P8-1 Robust MPEG-4 High-Efficiency AAC With Fixed- and Variable-Length Soft-Decision Decoding—Sai Han, Technische Universität Braunschweig - Braunschweig, Germany; Tim Fingscheidt, Technische Universität Braunschweig - Braunschweig, Germany
MPEG-4 High-Efficiency Advanced Audio Coding (HE-AAC) is optimized for low-bit-rate applications such as digital radio broadcasting and wireless music streaming. In HE-AAC, the differential scale factors and quantized spectral coefficients are variable-length coded (VLC) with Huffman codes, and the common reference value of the scale factors is a fixed-length coded global gain. Because errors propagate in VLCs, a robust source decoder is desirable for HE-AAC transmission over an error-prone channel. Unlike traditional hard-decision decoding or error concealment, soft-decision decoding that uses bit-wise channel reliability information offers improved audio quality. In this work we apply fixed-length soft-decision decoding to the global gain and variable-length soft-decision decoding to the scale factors. Simulation results show clearly improved performance.
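Why a single error in the differential scale factors is so damaging can be illustrated with a simplified sketch. It assumes only that each scale factor is stored as a difference from its predecessor, with the first one referenced to the global gain; the actual HE-AAC syntax (Huffman tables, index offsets, band grouping) is deliberately omitted, and the function name is hypothetical.

```python
from itertools import accumulate


def decode_scale_factors(global_gain: int, diffs: list[int]) -> list[int]:
    """Rebuild absolute scale factors: each value is the previous value plus
    its decoded difference, starting from the fixed-length global gain."""
    # accumulate(..., initial=g) yields g, g+d0, g+d0+d1, ...; drop g itself.
    return list(accumulate(diffs, initial=global_gain))[1:]


clean = decode_scale_factors(100, [0, -2, 3, 1])
corrupted = decode_scale_factors(100, [0, -5, 3, 1])  # one wrongly decoded diff
print(clean)      # [100, 98, 101, 102]
print(corrupted)  # [100, 95, 98, 99]
```

One corrupted difference shifts every following scale factor, which is why reliable decoding of both the reference value and the differences matters.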
Convention Paper 9399 (Purchase now)

P8-2 Extension of Monaural to Stereophonic Sound Based on Deep Neural Networks—Chan Jun Chun, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Seok Hee Jeong, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Su Yeon Park, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea; City University of New York - New York, NY, USA
In this paper we propose a method of extending monaural sound to stereophonic sound based on deep neural networks (DNNs). First, the monaural signal is assumed to be the mid signal of the extended stereo signal. Residual signals are then obtained by linear prediction (LP) analysis, and the LP coefficients of the monaural signal are converted into line spectral frequency (LSF) coefficients. These LSF coefficients are taken as the DNN features, and the features of the side signal are estimated from those of the mid signal. The performance of the proposed method is evaluated using a log spectral distortion (LSD) measure and a multiple stimuli with hidden reference and anchor (MUSHRA) test. The comparison shows that the proposed method provides lower LSD and a higher MUSHRA score than a conventional method using a hidden Markov model (HMM).
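Once a side signal has been estimated, the stereo pair follows directly from the mid/side relations. The sketch below assumes the common convention M = (L + R)/2 and S = (L - R)/2; the abstract does not state which scaling the authors used, and the DNN-based estimation of S itself is not shown.

```python
def ms_to_lr(mid, side):
    """Left/right from mid/side, assuming M = (L + R)/2 and S = (L - R)/2."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def lr_to_ms(left, right):
    """Inverse mapping under the same convention."""
    mid = [(lv + rv) / 2 for lv, rv in zip(left, right)]
    side = [(lv - rv) / 2 for lv, rv in zip(left, right)]
    return mid, side


# A mono input treated as the mid signal, plus a made-up side estimate.
left, right = ms_to_lr([1.0, 0.5], [0.25, -0.5])
print(left, right)  # [1.25, 0.0] [0.75, 1.0]
```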
Convention Paper 9400 (Purchase now)

P8-3 Nonnegative Tensor Factorization-Based Wind Noise Reduction—Kwang Myung Jeon, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Ji Hyun Park, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Seung Woo Yu, Gwangju Institute of Science and Technology - Gwangju, Korea; Young Han Lee, Korea Electronics Technology Institute (KETI) - Seongnam-si, Gyeonggi-do, Korea; Choong Sang Cho, KETI/Multimedia-IP - SeongNam-si, Gyeonggi-do, Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea; City University of New York - New York, NY, USA
In this paper a wind noise reduction method based on nonnegative tensor factorization (NTF) is proposed to enhance the quality of audio recorded with an outdoor multichannel microphone array. The proposed method first prepares learned bases for NTF by training exemplar blocks of spectral magnitudes for a series of wind noises and audio contents. Then, the spectral magnitudes of the wind noise to be reduced are estimated from the exemplar blocks. Finally, a multichannel wind noise reduction filter is constructed based on a minimum mean squared error (MMSE) criterion and applied to the multichannel noisy signal to obtain a signal with reduced wind noise. The performance of the proposed method is compared with that of conventional wind noise reduction methods using minimum statistics (MS) and nonnegative matrix factorization (NMF). The results show that the proposed method provides a higher signal-to-distortion ratio (SDR), signal-to-interference ratio (SIR), and signal-to-artifact ratio (SAR) than the conventional methods under various signal-to-noise ratio (SNR) conditions.
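The final MMSE filtering step can be illustrated, in a deliberately simplified single-channel form, by the classic Wiener gain, which is the linear MMSE estimator when signal and noise are uncorrelated. This is a sketch under stated assumptions: the paper builds a multichannel filter, whereas here per-bin signal and noise power estimates (which in the paper would come from the NTF model) are simply passed in, and the optional gain floor is a common practical addition rather than something the abstract mentions.

```python
def wiener_gain(signal_power: float, noise_power: float, floor: float = 0.0) -> float:
    """Wiener (linear MMSE) gain for one time-frequency bin: S / (S + N)."""
    total = signal_power + noise_power
    if total <= 0.0:
        return floor  # no energy estimate: fall back to the floor
    return max(floor, signal_power / total)


def apply_gains(noisy_magnitude, signal_power, noise_power):
    """Scale each noisy spectral magnitude by its per-bin Wiener gain."""
    return [
        m * wiener_gain(s, n)
        for m, s, n in zip(noisy_magnitude, signal_power, noise_power)
    ]


# Bin 1: equal signal and noise power, so halved; bin 2: noise only, so removed.
print(apply_gains([2.0, 4.0], [1.0, 0.0], [1.0, 2.0]))  # [1.0, 0.0]
```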
Convention Paper 9401 (Purchase now)

P8-4 Detection and Removal of the Birdies Artifact in Low Bit-Rate Audio—Simon Desrochers, Université de Sherbrooke - Sherbrooke, QC, Canada; Roch Lefebvre, Universite de Sherbrooke - Sherbrooke, QC, Canada
Audio signals compressed at low bit rates are known to generate audible artifacts that degrade perceptual quality. These different artifacts have been documented and solutions have been proposed by many authors to modify the internal mechanisms of codecs that cause these artifacts. In this paper we propose a post-processing approach to detecting and removing the birdies artifact by modeling spectral components as partials. This approach has the advantage of being compatible with any codec as it only requires the compressed signal. Formal listening tests have shown that this prototype algorithm can increase the perceptual quality of birdies-ridden signals. Furthermore, the explicit detection of this artifact could eventually be used in an objective perceptual quality assessment algorithm.
Convention Paper 9402 (Purchase now)

P8-5 Using Cascaded Global Optimization for Filter Bank Design in Low Delay Audio Coding—Stephan Preihs, Leibniz Universität Hannover - Hannover, Germany; Jörn Ostermann, Leibniz Universität Hannover - Hannover, Germany
This paper demonstrates that suitable design parameters for a filter bank optimization procedure can be found using global optimization techniques. After the Nayebi filter bank optimization algorithm is summarized and its degrees of freedom are described, a global optimization framework for the parameters involved in the design process is presented. It includes a cost function that monitors the frequency characteristics and reconstruction properties of different filter bank designs and allows an automatic search for suitable design parameters for a given number of bands and taps and a predefined delay. The global optimization itself is performed using well-known methods such as pattern search and the genetic algorithm. Experiments show that our method makes manual parameter adjustment obsolete. Furthermore, compared with manually adjusted designs, the proposed cascaded optimization achieves a gain of up to 10 dB in stopband attenuation without loss in reconstruction quality.
Convention Paper 9403 (Purchase now)

P8-6 Effect of Reverberation on Overtone Correlations in Speech and Music—Sarah R. Smith, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
This paper explores the effect of reverberation on audio signals that possess a harmonically rich overtone spectrum, such as speech and many musical instrument sounds. A proposed metric characterizes the degree of reverberation based on the cross-correlation of the instantaneous frequency tracks of the signal overtones. It is found that sounds that exhibit near-perfect correlations in an anechoic environment become less correlated when passed through a reverberant channel. These results are demonstrated for a variety of music and speech tones using both natural recordings and synthetic reverberation. The proposed metric corresponds to the speech transmission index and thus may be employed as a quantitative measure of the amount of reverberation in a recording.
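The core of the proposed metric is a correlation between overtone frequency tracks. As a minimal illustration (not the authors' exact metric, and with instantaneous-frequency estimation omitted), the Pearson correlation below returns 1 when one track is a scaled copy of another, as the harmonics of a perfectly harmonic anechoic tone would be, and less than 1 when the tracks wander independently. Because Pearson correlation is scale-invariant, harmonic tracks need no normalization by harmonic number.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


f1 = [220.0, 221.0, 223.0, 222.0, 220.0]  # made-up fundamental track (Hz)
f2 = [2 * f for f in f1]                  # ideal second harmonic
print(round(pearson(f1, f2), 6))          # 1.0
```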
Convention Paper 9404 (Purchase now)

P8-7 Stacked Modulation in a Hall Reverberation Algorithm—Kelsey M. Cintorino, Roger Williams University - Bristol, RI, USA; Daniel M. Wisniewski, Roger Williams University - Bristol, RI, USA; Benjamin D. McPheron, Roger Williams University - Bristol, RI, USA
Reverberation is the reflection of sound from objects in a space, similar to the way the visual world is sensed through reflected light. Novel reverberation algorithms are in high demand in the music industry due to changing trends and the desire for unique sounds. As DSP hardware has improved, it has become easier to implement multiple effects within the same algorithm. This paper presents a hall algorithm augmented with a series of chorus modulation blocks in an attempt to create new sounds. Chorus blocks are added before the early decay phase of the hall algorithm as well as within the late reverb generation phase. The result is a stacked-modulation reverberation algorithm.
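A chorus block is essentially a delay line whose length is modulated by a low-frequency oscillator, with the delayed copy mixed back into the dry signal. The sketch below is a generic, illustrative chorus in Python; the parameter defaults are assumptions, and neither the authors' block settings nor their placement within the hall algorithm are reproduced.

```python
import math


def chorus(x, fs, base_delay_ms=15.0, depth_ms=3.0, rate_hz=0.8, mix=0.5):
    """Mix the dry signal with a copy read from an LFO-modulated delay line.

    Assumes base_delay_ms > depth_ms so the read position never reaches
    future samples.
    """
    y = []
    for n, dry in enumerate(x):
        lfo = math.sin(2 * math.pi * rate_hz * n / fs)
        delay = (base_delay_ms + depth_ms * lfo) * fs / 1000.0  # in samples
        pos = n - delay
        i0 = math.floor(pos)
        frac = pos - i0
        # Linear interpolation between neighbouring samples; zero before the start.
        s0 = x[i0] if 0 <= i0 < len(x) else 0.0
        s1 = x[i0 + 1] if 0 <= i0 + 1 < len(x) else 0.0
        wet = (1 - frac) * s0 + frac * s1
        y.append((1 - mix) * dry + mix * wet)
    return y
```

In a stacked arrangement like the one described, several such blocks would be chained, for example one before the early-decay stage and others within the late reverb generation stage.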
Convention Paper 9405 (Purchase now)

P8-8 Efficient Multi-Band Digital Audio Graphic Equalizer with Accurate Frequency Response Control—Richard J. Oliver, DTS, Inc. - Santa Ana, CA, USA; Jean-Marc Jot, DTS, Inc. - Los Gatos, CA, USA
Graphic equalizers give listeners an intuitive way to modify the frequency response of an audio signal: simply set the sliders to trace the desired curve, and the corresponding filter frequency response will be applied. At least, that is the implied promise of the technology. However, the actual measured response of the equalizer can reveal some surprises. Filter inaccuracy, boost/cut asymmetry, and unexpected nulls can disappoint both the eye and the ear. An equalizer design is presented that uses efficient IIR filter sections tuned with a closed-form algorithm to give an accurate and intuitive frequency response with low complexity and minimal processing overhead. Design parameters and implementation details are discussed.
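For orientation, a single peaking-EQ section of the kind a graphic equalizer cascades per band can be computed from the widely used RBJ "Audio EQ Cookbook" formulas. This is a swapped-in standard design, not the DTS closed-form tuning described in the abstract; its magnitude at the center frequency equals the requested gain, and it is unity at DC.

```python
import cmath
import math


def peaking_biquad(fs, f0, gain_db, q):
    """Peaking-EQ biquad coefficients (b, a), normalized so a[0] == 1,
    following the RBJ Audio EQ Cookbook formulas."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cosw = math.cos(w0)
    b = [1 + alpha * A, -2 * cosw, 1 - alpha * A]
    a = [1 + alpha / A, -2 * cosw, 1 - alpha / A]
    return [c / a[0] for c in b], [c / a[0] for c in a]


def magnitude_db(b, a, fs, f):
    """|H(e^{jw})| in dB for one second-order section."""
    zinv = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * zinv + b[2] * zinv * zinv
    den = a[0] + a[1] * zinv + a[2] * zinv * zinv
    return 20 * math.log10(abs(num / den))


b, a = peaking_biquad(48000, 1000, 6.0, 1.0)
print(round(magnitude_db(b, a, 48000, 1000), 6))  # 6.0
```

Cascading one such section per band and setting each section's gain directly from its slider typically lets neighbouring bands overlap, which is one source of the response inaccuracy the abstract mentions.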
Convention Paper 9406 (Purchase now)

P9 - Transducers—Part 3: Loudspeakers

Friday, October 30, 2:00 pm — 5:00 pm (Room 1A08)

Chair:
Peter John Chapman, Harman - Denmark; Bang & Olufsen Automotive

P9-1 A Model for the Impulse Response of Distributed-Mode Loudspeakers and Multi-Actuator Panels—David Anderson, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
Panels driven into transverse (bending) vibrations by one or more small force drivers are a promising alternative approach in loudspeaker design. A mechanical-acoustical model is presented here that enables computation of the acoustic transient response of such loudspeakers driven by any number of force transducers at arbitrary locations on the panel and at any measurement point in the acoustic radiation field. Computation of the on- and off-axis acoustic radiation from a panel confirms that the radiated sound is spatially diffuse. Unfortunately, this favorable feature of vibrating panel loudspeakers is accompanied by significant reverberant effects and such loudspeakers are poor at reproducing signals with rapid transients.
Convention Paper 9409 (Purchase now)

P9-2 Loudspeaker Rocking Modes (Part 1: Modeling)—William Cardenas, Klippel GmbH - Dresden, Germany; Wolfgang Klippel, Klippel GmbH - Dresden, Germany
The rocking of the loudspeaker diaphragm is a severe problem in headphones, micro-speakers, and other kinds of loudspeakers, causing voice coil rubbing that limits the maximum acoustical output at low frequencies. The root causes of this problem are small irregularities in the circumferential distribution of stiffness, mass, and magnetic field in the gap. A dynamic model describing the mechanism governing rocking modes is presented, and a suitable structure for separating and quantifying the three root causes that excite the rocking modes is developed. The model is validated experimentally for the three root causes, and the responses are discussed as the basis of a diagnostic analysis.
Convention Paper 9410 (Purchase now)

P9-3 Active Transducer Protection Part 1: Mechanical Overload—Wolfgang Klippel, Klippel GmbH - Dresden, Germany
The generation of sufficient acoustical output by smaller audio systems requires maximum exploitation of the usable working range. Digital preprocessing of the audio input signal can be used to prevent mechanical or thermal overload that generates excessive distortion and may eventually damage the transducer. This paper, the first of two related papers, focuses on mechanical protection: it defines useful technical terms and a theoretical framework for comparing existing algorithms and for developing meaningful specifications required to adjust the protection system to a particular transducer. The new concept is illustrated with a micro-speaker, and the data exchange and communication between transducer manufacturer, software provider, and system integrator are discussed.
Convention Paper 9411 (Purchase now)

P9-4 Horns Near Reflecting Boundaries—Bjørn Kolbrek, Norwegian University of Science and Technology - Trondheim, Norway
It is well known that when a sound source is placed near one or more walls, the power output increases due to the mutual coupling between the source and its image sources. This is reflected in an increase in the low-frequency radiation resistance seen by the sources. While direct-radiating loudspeakers may benefit from this whenever the sources are within about a quarter wavelength of each other, horns behave differently depending on whether the increase in radiation resistance falls within the pass band of the horn. This has implications for the placement of corner horns. In this paper the Mode Matching Method (MMM) is used together with the modal mutual radiation impedance and the concept of image sources to compute the throat impedance and radiated sound pressure of horns placed near infinite, perpendicular reflecting boundaries. The MMM is compared with another numerical method, the Boundary Element Rayleigh Integral Method (BERIM), and with measurements, and shows good agreement with both. The MMM also has a significantly shorter computation time than BERIM, making it attractive for the initial iterations of a design or for optimization procedures.
Convention Paper 9412 (Purchase now)

P9-5 State-Space Modeling of Loudspeakers Using Fractional Derivatives—Alexander King, Technical University of Denmark - Kgs. Lyngby, Denmark; Finn T. Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark
This work investigates the use of fractional-order derivatives in modeling moving-coil loudspeakers. A fractional-order state-space solution is developed, leading the way toward incorporating nonlinearities into a fractional-order system. The method is used to calculate the response of a fractional harmonic oscillator, representing the mechanical part of a loudspeaker, showing the effect of the fractional derivative and its relationship to viscoelasticity. Finally, a loudspeaker model with a fractional-order viscoelastic suspension and a fractional-order voice coil is fitted to measurement data. It is shown that the identified parameters can be used in a linear fractional-order state-space model to simulate the loudspeaker's time-domain response.
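One standard way to discretize a fractional derivative is the Grünwald-Letnikov definition, sketched below. The abstract does not say which discretization the authors used, so treat this as background rather than their method. For order alpha = 1 the weights reduce to a backward difference and for alpha = 0 to the identity; intermediate orders weight the entire signal history, which is the "memory" that links fractional derivatives to viscoelastic behavior.

```python
def gl_weights(alpha: float, n: int) -> list[float]:
    """First n Grunwald-Letnikov weights w_k = (-1)^k * C(alpha, k),
    via the recurrence w_k = w_{k-1} * (1 - (alpha + 1) / k)."""
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (1 - (alpha + 1) / k))
    return w[:n]


def gl_derivative(samples: list[float], alpha: float, h: float) -> float:
    """Approximate D^alpha f at the last sample from uniformly spaced history
    (step h), newest sample weighted by w_0."""
    w = gl_weights(alpha, len(samples))
    return sum(wk * samples[-1 - k] for k, wk in enumerate(w)) / h ** alpha


print(gl_weights(0.5, 3))  # [1.0, -0.5, -0.125]
```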
Convention Paper 9413 (Purchase now)

P9-6 Comparative Static and Dynamic FEA Analysis of Single and Dual Voice Coil Midrange Transducers—Felix Kochendörfer, JBL/Harman Professional - Northridge, CA USA; Alexander Voishvillo, JBL/Harman Professional - Northridge, CA, USA
The concept of the dual-coil direct-radiating loudspeaker has been known for several decades. JBL Professional pioneered the design and application of dual-coil woofers and midrange loudspeakers. Several properties differentiate dual-coil transducers from the traditional single-voice-coil design. The first is better heat dissipation: the dual coil may be considered a traditional coil split into two parts, each positioned in its own magnetic gap. The second is the symmetry of the force factor (Bl product) versus voice coil position, because as one coil leaves its gap, the other enters its gap. These two features are well researched and described in the literature [1,2]. Less is known about the advantages of dual-coil transducers related to flux modulation and to the dependence of the alternating magnetic flux (and the corresponding voice coil inductance) on frequency, current, and voice coil position. In this work a regular single-coil design and a dual-coil configuration are compared through dynamic magnetic FEA modeling and measurements.
Convention Paper 9414

P10 - Recording & Production

Friday, October 30, 2:00 pm — 4:30 pm (Room 1A07)

Chair:
Grzegorz Sikora, Harman - Pullach, Germany

P10-1 Lossless Audio Checker: A Software for the Detection of Upscaling, Upsampling, and Transcoding in Lossless Musical Tracks—Julien Lacroix, Independent Developer - Aix-en-Provence, France; Yann Prime, Independent Developer - Aix-en-Provence, France; Alexandre Remy, Independent Developer - Aix-en-Provence, France; Olivier Derrien, University of Toulon / CNRS-LMA - Toulon, France
Internet music retailers currently sell “CD quality” tracks, or even better (“Studio Master”), thanks to lossless audio coding formats (FLAC, ALAC). However, a lossless format does not guarantee that the audio content is what it seems to be. The audio signal may have been upscaled (increasing the resolution), upsampled (increasing the sample rate), or even transcoded from a lossy to a lossless format. In this paper we describe new software that analyzes lossless audio tracks and detects upsampling, upscaling, and transcoding (only for AAC in this early version). Validation tests on a large music database (with ground truth available) show that this method is fast and accurate: 100% success for upscaling and transcoding, and 91.3% for upsampling.
Convention Paper 9416
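The P10-1 abstract does not describe how upscaling is detected. One simple heuristic is to check whether the lowest bits of integer PCM samples are always zero, as happens when a 16-bit track is zero-padded into a 24-bit container. This heuristic is an assumption for illustration and is not the authors' method. The function name and the 24-bit default are hypothetical.

```python
def effective_bit_depth(samples, nominal_bits=24):
    """Estimate how many bits of integer PCM samples are actually used.

    Hypothetical heuristic: if every sample has its k lowest bits equal to
    zero, the track behaves like a (nominal_bits - k)-bit signal that was
    zero-padded, which is one symptom of upscaling.
    """
    combined = 0
    for s in samples:
        # OR-ing keeps a 1 in every bit position used by at least one sample;
        # Python's two's-complement semantics make this work for negatives.
        combined |= s
    if combined == 0:
        return 0  # digital silence: no bits in use
    # Isolate the lowest set bit to count the trailing zero bits shared by all samples
    trailing_zeros = (combined & -combined).bit_length() - 1
    return nominal_bits - trailing_zeros


if __name__ == "__main__":
    # 16-bit values shifted into a 24-bit container
    padded = [v << 8 for v in (1, -3, 1000, -32768)]
    print(effective_bit_depth(padded))  # 16
```

A real detector would also need to handle dithered or gain-adjusted upscaling, where the low bits are no longer exactly zero. That is one reason a spectral or statistical approach, rather than this bit test, may be needed.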

P10-2 Comparison of Audio Signals Obtained with Source Overlay (OAS) and Other Conventional Recording Methods—Juliette Olivella, Universidad de San Buenaventura - Bogotá, Colombia; K2 INGENIERIA; William Romo, Universidad de San Buenaventura - Bogotá, Colombia; Dario Páez, Universidad de San Buenaventura - Bogotá, Colombia
Overlay Model of Acoustic Sources (OAS) is an unconventional recording method using a stereo microphone array. The model was proposed as a methodological alternative for emulating a single-take recording of a musical group. It assumes linear behavior of the recording system and involves making partial captures of the individual instruments that make up the full ensemble. Experimental tests were carried out to corroborate the system's linearity: two loudspeakers were used instead of musicians, and audio was recorded both with conventional techniques and with the OAS model. The recordings were digitized and analyzed in MATLAB to evaluate their physical parameters and the correlation coefficients for energy, maximum values, minimum values, frequency response, zero-crossing rate, and spatiality. The research sought to determine whether a recording made in separate takes can yield an audio signal that imitates the characteristics of a signal captured in real time. The results showed that this is possible when the recording is made with the overlay of acoustic sources (OAS) method.
Convention Paper 9417
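The P10-2 abstract compares recordings through correlation coefficients of features such as the zero-crossing rate. The sketch below is a minimal pure-Python version of both quantities. The abstract does not say how the signals were framed or windowed in MATLAB, so these whole-signal definitions are assumptions.

```python
import math


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

In a comparison like the one described, each feature would be computed per recording or per frame for the conventional and OAS captures. `pearson` would then be applied to the two resulting feature sequences.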

P10-3 Process Improvement in Audio Production from a Sociotechnical Systems Perspective—Gerhard Roux, Stellenbosch University - Stellenbosch, Western Cape, South Africa
Audio professionals involved in live sound reinforcement, record production, and broadcasting are continuously solving complex problems in creative ways. It is wasteful if the pragmatic methodologies used in solving these problems do not contribute towards a reusable model of process improvement. This paper suggests a systems-level engagement with audio production that strikes a balance between human creativity and technological infrastructure. A conceptual model of process improvement is developed through analysis of audio production as a complex system and subsequently implemented through an action research methodology in multiple case studies. The study found that significant quality improvements in audio production could be attained through a sociotechnical systems approach. The results imply that the application of process improvement methodologies can coexist with creative social practice, resulting in improved technical performance of production systems.
Convention Paper 9418

P10-4 Listener Preference for Height Channel Microphone Polar Patterns in Three-Dimensional Recording—Will Howie, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, Quebec, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Matthew Boerum, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT); David Benson, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Alan Joosoo Han, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, Quebec, Canada
A listening experiment was conducted to determine if a preference exists among three microphone polar patterns when recording height channels for three-dimensional music production. Seven-channel 3D recordings of four different musical instruments were made using five-channel surround microphone arrays, augmented with two Sennheiser MKH 800 Twin microphones as height channels. In a double-blind listening test, subjects were asked to rate different mixes of the same recordings based on preference. The independent variable in these mixes was the polar pattern of the height channel microphones. Analysis of the results found that the vast majority of subjects showed no statistically significant preference for any one polar pattern.
Convention Paper 9419

P10-5 Listener Discrimination of High-Speed Digitization from Analog Tape Masters with Spectral Matching—Nick Lobel, Belmont University - Nashville, TN, USA; Eric Tarr, Belmont University - Nashville, TN, USA; Wesley Bulla, Belmont University - Nashville, TN, USA
This study investigated whether listeners could discriminate between real-time (RT) and double-speed (DS) digital transfers from analog tape recordings. Signals were recorded to tape at 15 inches per second (ips), then digitized at two copy rates: 15 ips (RT) and 30 ips (DS). The DS transfers were digitally time-stretched and spectrally processed to match the duration and frequency response of the RT transfers. Thirty-one listeners participated in an ABX experiment to discriminate between the RT and DS transfers. Results show discrimination between RT and DS transfers was not statistically significant. Additionally, discrimination did not vary significantly across different types of source signals.
Convention Paper 9420

P11 - Spatial Audio

Friday, October 30, 2:00 pm — 3:30 pm (S-Foyer 1)

P11-1 Comparison of Techniques for Binaural Navigation of Higher-Order Ambisonic Soundfields—Joseph G. Tylka, 3D3A Lab, Princeton University - Princeton, NJ, USA; Edgar Choueiri, Princeton University - Princeton, NJ, USA
Soundfields that have been decomposed into spherical harmonics (i.e., encoded into higher-order ambisonics—HOA) can be rendered binaurally for off-center listening positions, but doing so requires additional processing to translate the listener and necessarily leads to increased reproduction errors as the listener navigates further away from the original expansion center. Three techniques for performing this navigation (simulating HOA playback and listener movement within a virtual loudspeaker array, computing and translating along plane-waves, and re-expanding the soundfield about the listener) are compared through numerical simulations of simple incident soundfields and evaluated in terms of both overall soundfield reconstruction accuracy and predicted localization. Results show that soundfield re-expansion achieves arbitrarily low reconstruction errors (relative to the original expansion) in the vicinity of the listener, whereas errors generated by virtual-HOA and plane-wave techniques necessarily impose additional restrictions on the navigable range. Results also suggest that soundfield re-expansion is the only technique capable of accurately generating high-frequency localization cues for off-center listening positions, although the frequencies and translation distances over which this is possible are strictly limited by the original expansion order.
Convention Paper 9421
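The P11-1 abstract notes that accurate off-center rendering is limited by the original expansion order. A widely used rule of thumb, assumed here and not taken from the paper, is that an order-N expansion is accurate roughly where kr ≤ N, with k = 2πf/c. This bounds the usable radius at each frequency. The speed of sound of 343 m/s is also an assumption.

```python
import math


def max_accurate_radius(order, freq_hz, c=343.0):
    """Radius (m) within which k*r <= order, using k = 2*pi*f/c.

    Rule-of-thumb estimate only; c = 343 m/s is an assumed speed of sound.
    """
    k = 2.0 * math.pi * freq_hz / c
    return order / k


if __name__ == "__main__":
    for f in (500, 1000, 4000):
        print(f, round(max_accurate_radius(4, f), 3))
```

Under this rule, a fourth-order expansion covers about 22 cm at 1 kHz but only about 5.5 cm at 4 kHz. This is consistent with the abstract's point that high-frequency cues can be reproduced only over short translation distances.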

P11-2 Estimation of Individual HRIRs Based on SPCA from Impulse Responses Acquired in Ordinary Sound Fields—Shouichi Takane, Akita Prefectural University - Yurihonjo, Akita, Japan
In this paper a method for estimating individual Head-Related Impulse Responses (HRIRs) from impulse responses acquired in an ordinary sound field is proposed, based on Spatial Principal Components Analysis (SPCA) of the HRIRs. The average vector and the principal components matrix are assumed to be obtained by applying SPCA to a set of HRIRs from multiple subjects covering all directions. A portion of the impulse response from a sound source to one ear of a given subject, regarded as one of that subject's HRIRs, is then used to estimate the weight coefficients of the principal components. Applying the method to a dataset of HRIRs from multiple subjects covering all sound source directions showed that acceptable estimation accuracy is obtained for HRIRs in ipsilateral directions.
Convention Paper 9422

P11-3 Height Perception in Ambisonic Based Binaural Decoding—Gavin Kearney, University of York - York, UK; Tony Doyle, University of York - York, UK
This paper presents an investigation into the perception of height in Ambisonic decoding schemes for binaural reproduction. We compare the spatial resolution of first, third, and fifth order Ambisonic decoders to that of real-world monophonic sources presented in the vertical plane. Spatial preservation of the spectral cues required for rendering sources with height is investigated and cross-referenced to binaural models of the rendered systems. The results presented address the applicability of higher order Ambisonics to the rendering of sound source elevation given the high frequency distortion of pinnae cues.
Convention Paper 9423

P11-4 An HRTF Database for Virtual Loudspeaker Rendering—Gavin Kearney, University of York - York, UK; Tony Doyle, University of York - York, UK
This paper presents a database of Head Related Transfer Functions (HRTFs), collected from 20 subjects for use in virtual loudspeaker reproduction systems. The paper documents the measurement procedure and the format of the HRTFs. The database accommodates Ambisonic rendering up to 5th order and includes loudspeaker configurations derived from Platonic and other convex polyhedra, as well as other spherical distributions. The datasets are accompanied by matching acoustic responses to assist externalization, and by decode matrices for higher order Ambisonic rendering.
Convention Paper 9424

P11-5 Influence of Energy Distribution on Elevation Judgments—Taku Nagasaka, University of Aizu - Aizu-Wakamatsu, Japan; Shunsuke Nogami, University of Aizu - Aizu-Wakamatsu, Japan; Julian Villegas, University of Aizu - Aizu Wakamatsu, Fukushima, Japan; Jie Huang, University of Aizu - Aizuwakamatsu City, Japan
The relative influence of spectral cues on elevation localization of virtual sources was investigated by comparing judgments of loudspeaker-reproduced stimuli spatialized with three methods. Two of them are based on vector-based amplitude panning: 3D vector-based amplitude panning (3D-VBAP), and 2D-VBAP in conjunction with HRIR convolution. The third filtered the stimuli to simulate the spectral peaks and troughs naturally occurring at different angles (equalizing filters). For the last two methods a single horizontal loudspeaker array was used. The smallest absolute errors were observed for the 3D-VBAP judgments regardless of azimuth; no significant difference in mean absolute error was found between the other two methods. However, for most presentation azimuths, the equalizing filter method yielded the least dispersed results. These results could be used to improve elevation localization in two-dimensional VBAP reproduction systems.
Convention Paper 9425

P11-6 Influence of Spectral Energy Distribution on Subjective Azimuth Judgments—Shunsuke Nogami, University of Aizu - Aizu-Wakamatsu, Japan; Taku Nagasaka, University of Aizu - Aizu-Wakamatsu, Japan; Julian Villegas, University of Aizu - Aizu Wakamatsu, Fukushima, Japan; Jie Huang, University of Aizu - Aizuwakamatsu City, Japan
In this research we compare subjective judgments of azimuth obtained by three methods: Vector-Based Amplitude Panning (VBAP), VBAP mixed with binaural rendition over loudspeakers (VBAP + HRTF), and a newly proposed method based on equalizing spectral energy. In our results, significantly smaller errors were found for the stimuli treated with VBAP and HRTFs; differences between the other two treatments were not significant. Regarding spherical dispersion of the judgments, VBAP results showed the greatest dispersion, whereas the dispersion for the other two methods was significantly smaller and similar between them. These results suggest that horizontal localization using VBAP methods can be improved by applying a frequency-dependent panning factor as opposed to the constant scalar commonly used.
Convention Paper 9426
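P11-5 and P11-6 both build on vector-based amplitude panning, where each loudspeaker normally receives a constant scalar gain. For reference, the sketch below shows the standard 2D VBAP gain computation for one loudspeaker pair, following Pulkki's formulation: it solves p = g1·l1 + g2·l2 and then power-normalizes. It is not the equalizing-filter or frequency-dependent method proposed in these papers.

```python
import math


def vbap_2d_gains(source_az_deg, spk1_az_deg, spk2_az_deg):
    """Power-normalized 2D VBAP gains for one loudspeaker pair.

    Solves p = g1*l1 + g2*l2 for the source direction p and loudspeaker
    unit vectors l1, l2 (Cramer's rule), then scales so g1^2 + g2^2 = 1.
    """
    def unit(az_deg):
        a = math.radians(az_deg)
        return math.cos(a), math.sin(a)

    px, py = unit(source_az_deg)
    l1x, l1y = unit(spk1_az_deg)
    l2x, l2y = unit(spk2_az_deg)
    det = l1x * l2y - l2x * l1y
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    g1 = (px * l2y - l2x * py) / det
    g2 = (l1x * py - px * l1y) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


if __name__ == "__main__":
    # Phantom source at 10 degrees between loudspeakers at +30 and -30 degrees
    print(vbap_2d_gains(10.0, 30.0, -30.0))
```

A negative gain means the source direction lies outside the arc spanned by the pair, and a different pair should be chosen. A frequency-dependent variant, as P11-6 suggests, would apply different gains per frequency band; that extension is not attempted here.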

P11-7 Subjective Diffuseness in Layer-Based Loudspeaker Systems with Height—Michael P. Cousins, University of Southampton - Southampton, UK; Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK; Stefan Bleeck, University of Southampton - Southampton, UK; Frank Melchior, BBC Research and Development - Salford, UK
Loudspeaker systems with more channels and with elevated loudspeakers are becoming more common. There is an opportunity for greater spatial impression with listeners surrounded in three dimensions. Research has shown advantages of more loudspeakers and of 3D layouts over 2D layouts, although it is not clear whether the cause of these improvements is the greater number of loudspeakers, their positions, or both. In this paper two listening tests are presented that investigate the subjective diffuseness of a range of loudspeaker layouts. The first experiment was used to optimize the distribution of loudness between horizontal layers of loudspeakers to allow fair comparison between different layouts. The second experiment investigated the perceived diffuseness of a range of loudspeaker layouts chosen to critically assess parameters of layer-based loudspeaker systems as well as to validate the results of the first experiment. The number of loudspeakers at head height, the number of loudspeakers not at head height, and the relative level between head-height and non-head-height layers were all found to have a statistically significant effect on perceived diffuseness. It was also confirmed that 3D loudspeaker layouts can have statistically greater perceived diffuseness than 2D layouts.
Convention Paper 9427

P11-8 Echo Canceler for Real-Time Audio Communication with Wave Field Reconstruction—Satoru Emura, NTT Media Intelligence Laboratories - Tokyo, Japan; Sachiko Kurihara, NTT Media Intelligence Laboratories - Tokyo, Japan
For immersive sharing of a sound field between two remote sites, wave field synthesis (WFS) and echo cancellation are essential. Although both technologies have been studied for more than a decade, it was not clear whether a real-time system for full-duplex audio communication with WFS could be built. We show in this paper that such a system can be built.
Convention Paper 9428

P12 - Game Audio

Friday, October 30, 4:30 pm — 5:30 pm (Room 1A07)

Chair:
Michael Kelly, DTS, Inc. - London, UK

P12-1 Real-Time Morphing of Impact Sounds—Sadjad Siddiq, Square Enix Co., Ltd. - Tokyo, Japan
This paper introduces an algorithm to morph between two or more sounds, which can be used to synthesize new sounds in real-time whose features lie between the tone color, amplitude envelope, pitch, and length of the source sounds. It is used to increase variation of commonly used impact sounds in video games, but the algorithm can also be applied to other sound types like instrument sounds. Morphing of the tone color is achieved by shifting formants in the frequency spectrum of one sound toward the frequencies of the corresponding formants in the other sounds. Corresponding formants are found automatically by pairing frequency regions of equal normalized cumulative energy. Morphing of the temporal structure is achieved by aligning those frames of all sounds that have equal normalized cumulative amplitude. A link to samples is provided.
Convention Paper 9407
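According to the P12-1 abstract, corresponding formants are found by pairing frequency regions of equal normalized cumulative energy. The sketch below is a simplified version of that pairing on two magnitude spectra. The function names, the choice of energy levels, and the linear interpolation of bin positions by a morph factor `alpha` are illustrative assumptions. The actual shifting of spectral content is not shown.

```python
def normalized_cumulative_energy(spectrum):
    """Running sum of squared magnitudes, scaled so the last value is 1."""
    total = sum(m * m for m in spectrum)
    if total == 0:
        raise ValueError("spectrum has no energy")
    acc, out = 0.0, []
    for m in spectrum:
        acc += m * m
        out.append(acc / total)
    return out


def bin_at_level(cumulative, level):
    """First bin index whose cumulative energy reaches the given level."""
    for i, c in enumerate(cumulative):
        if c >= level:
            return i
    return len(cumulative) - 1


def morphed_bin_positions(spec_a, spec_b, levels, alpha):
    """Pair bins of equal cumulative energy and interpolate their positions.

    alpha = 0 gives the bin positions of spec_a, alpha = 1 those of spec_b.
    """
    ca = normalized_cumulative_energy(spec_a)
    cb = normalized_cumulative_energy(spec_b)
    positions = []
    for q in levels:
        ia, ib = bin_at_level(ca, q), bin_at_level(cb, q)
        positions.append((1.0 - alpha) * ia + alpha * ib)
    return positions
```

For example, if one spectrum has its energy at bin 2 and the other at bin 4, the 50% energy level pairs those bins, and `alpha = 0.5` places the morphed peak at bin 3.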

P12-2 Using Pure Data as a Game Audio Engine—Leonard J. Paul, School of Video Game Audio - Vancouver, Canada
Recent improvements in the Pure Data (Pd) library code (libpd) and significant run-time improvements from the Heavy compiler have made Pd more viable as a free audio engine for video games. Open source projects are now available to help speed the integration of Pd into the popular Unity game engine, creating new possibilities for the use of Pd by game studios with limited budgets as well as for educational purposes. Best practices for using Pd for audio in video games are outlined in this paper.
Convention Paper 9408

P13 - Spatial Audio—Part 1

Saturday, October 31, 9:00 am — 12:30 pm (Room 1A08)

Chair:
Francis Rumsey, Logophon Ltd. - Oxfordshire, UK

P13-1 On the Performance of Acoustic Intensity-Based Source Localization with an Open Spherical Microphone Array—Mert Burkay Cöteli, METU Middle East Technical University - Ankara, Turkey; ASELSAN A.S. - Ankara, Turkey; Hüseyin Hacihabiboglu, Middle East Technical University (METU) - Ankara, Turkey
Sound source localization is important in a variety of contexts. A notable example is acoustic scene analysis for parametric spatial audio, where it is necessary not only to record the sound source but also to deduce its direction. Sound source localization methods based on acoustic intensity provide a viable alternative to more traditional, delay-based techniques. However, they require special sound intensity probes or microphone arrays. This paper evaluates the sound source localization performance of an icosahedral open spherical microphone array using a method based on intensity vector distributions in the time-frequency domain.
Convention Paper 9429
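Intensity-based localization estimates direction from the active intensity vector, which is the time average of pressure times particle velocity. The broadband sketch below shows only that core idea. It assumes that velocity points along the direction of propagation, so the arrival direction is opposite the intensity vector. The paper instead analyzes intensity vector distributions in the time-frequency domain, and deriving pressure and velocity signals from the icosahedral array is not covered.

```python
import math


def intensity_doa_deg(p, vx, vy):
    """Azimuth of arrival (degrees) from pressure and particle-velocity samples.

    Active intensity I = mean(p * v). For a plane wave the intensity points
    along the propagation direction, so the source lies in direction -I.
    """
    n = len(p)
    ix = sum(a * b for a, b in zip(p, vx)) / n
    iy = sum(a * b for a, b in zip(p, vy)) / n
    return math.degrees(math.atan2(-iy, -ix))


if __name__ == "__main__":
    # Synthetic plane wave arriving from 60 degrees (propagating toward 240)
    src = math.radians(60.0)
    p = [math.sin(2 * math.pi * 0.05 * t) for t in range(200)]
    # Velocity up to the constant 1/(rho*c), pointing away from the source
    vx = [-math.cos(src) * s for s in p]
    vy = [-math.sin(src) * s for s in p]
    print(round(intensity_doa_deg(p, vx, vy), 6))  # 60.0
```

Applying the same estimate per time-frequency bin and then examining the distribution of the resulting directions is the kind of analysis the abstract describes. This sketch collapses everything to a single broadband estimate.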

P13-2 A Microphone Array for Recording Music in Surround-Sound with Height Channels—David Bowles, Swineshead Productions LLC - Berkeley, CA, USA
In the past few years, sound recordings with spatial audio have moved from the realm of theoretical research to the actuality of physical and digital releases in the market. At present three Blu-ray disc formats utilize a traditional 5.1 surround-sound recording, with an added 4-channel layer of height channels. The topic of this paper is how to capture vertical localization effectively within this release format, utilizing existing research on hearing localization and techniques learned in the field. The proposed microphone array has time-of-arrival differences between all microphones, yet mixes down to 5.1 and stereo without excessive comb-filtering or other artifacts.
Convention Paper 9430

P13-3 Exploring 3D: A Subjective Evaluation of Surround Microphone Arrays Catered for Auro-3D Reproduction Systems—Alex Ryaboy, New York University - New York, NY, USA
As multichannel systems grow in popularity, audio professionals must make informed decisions when choosing a capture method to deliver their vision. Many of today’s microphone arrays designed for surround sound with height employ traditional spaced surround techniques aided by an additional array in the upper plane, and are widely used to capture performances in large spaces. This paper uses a perceptual study to evaluate a fully coincident microphone array, Double-MSZ, and a semi-coincident array, Twins Square, in terms of envelopment, localization, and spatial impression in a small recording studio environment. The study revealed narrower overall width, better localization, and more stable vertical imaging for Double-MSZ, while the Twins Square technique exhibited higher ensemble envelopment and a more spacious perceived environment.
Convention Paper 9431

P13-4 Three Dimensional Spatial Techniques in 22.2 Multichannel Surround Sound for Popular Music Mixing—Bryan Martin, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, QC, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada
Current multichannel spatial mixing practices are largely limited to constructing three-dimensional space with two-dimensional panning tools (meant for 5.1, 7.1, etc.) and tools designed for common stereo production. A great deal of research is currently underway in spatial sound reproduction through computer modeling and signal processing, with little focus on actual recording and mixing practices. This investigation examines the design and implementation of early and late reflections and reverberant fields in 22.2 multichannel sound system mixing, based on research in listener envelopment. The techniques discussed include the expansion of spatial elements into three dimensions using conventional tools and the use of multichannel impulse responses for reverberant fields. Listening tests were conducted on the final music mix, with positive results reported for listener immersion.
Convention Paper 9432

P13-5 On the Use of a Lebedev Grid for Ambisonics—Pierre Lecomte, Conservatoire National des Arts et Métiers - Paris, France; Université de Sherbrooke - Sherbrooke, Canada; Philippe-Aubert Gauthier, Université de Sherbrooke - Sherbrooke, Quebec, Canada; McGill University - Montreal, Quebec, Canada; Christophe Langrenne, Conservatoire National des Arts et Métiers - Paris, France; Alexandre Garcia, Conservatoire National des Arts et Métiers - Paris, France; Alain Berry, Université de Sherbrooke - Sherbrooke, Quebec, Canada; McGill University - Montreal, Quebec, Canada
Ambisonics provides tools for three-dimensional sound field analysis and synthesis. The theory is based on sound field decomposition using a truncated basis of spherical harmonics. For the three-dimensional problem, both the decomposition and the synthesis of the sound field imply an integration over the sphere that respects the orthonormality of the spherical harmonics. In practice this integration is achieved with discrete angular samples over the sphere. This paper investigates spherical sampling using a Lebedev grid for practical applications of Ambisonics. The paper presents the underlying theory, simulations of reconstructed sound fields, and examples of actual prototypes using a 50-node grid able to perform recording and reconstruction up to order 5. Orthonormality errors are provided up to sixth order and compared for two grids: (1) the Lebedev grid with 50 nodes and (2) the Pentakis-Dodecahedron with 32 nodes. Finally, the paper presents some practical advantages of using Lebedev grids for Ambisonics, in particular the use of sub-grids working up to order 1 or 3 and sharing common nodes with the 50-node grid.
Convention Paper 9433
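The orthonormality check described in P13-5 can be illustrated on the smallest Lebedev grid rather than the paper's 50-node grid. That grid has 6 nodes: the octahedron vertices, with equal weights. It integrates polynomials up to degree 3 exactly, so products of first-order spherical harmonics (degree 2) are integrated without error. The sketch assumes real orthonormal (N3D-style) spherical harmonics in ACN channel order (W, Y, Z, X). It builds the discrete Gram matrix, whose deviation from identity is the orthonormality error.

```python
import math

# 6-node Lebedev grid (octahedron vertices); weights sum to 1
NODES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
WEIGHTS = [1.0 / 6.0] * 6


def real_sh_order1(x, y, z):
    """Orthonormal real spherical harmonics up to order 1, ACN order W, Y, Z, X."""
    c0 = 1.0 / math.sqrt(4.0 * math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    return [c0, c1 * y, c1 * z, c1 * x]


def gram_matrix(nodes, weights):
    """Discrete approximation of the integral of Y_i * Y_j over the sphere."""
    rows = [real_sh_order1(*n) for n in nodes]
    k = len(rows[0])
    g = [[0.0] * k for _ in range(k)]
    for r, w in zip(rows, weights):
        for i in range(k):
            for j in range(k):
                # 4*pi converts unit-sum weights into a sphere-area quadrature
                g[i][j] += 4.0 * math.pi * w * r[i] * r[j]
    return g


def max_orthonormality_error(g):
    """Largest absolute deviation of the Gram matrix from the identity."""
    n = len(g)
    return max(abs(g[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    print(max_orthonormality_error(gram_matrix(NODES, WEIGHTS)))
```

The same check, run with higher-order harmonics and the 50-node grid's nodes and weights, would yield the kind of orthonormality-error comparison the abstract reports. Those node coordinates and weights are not reproduced here.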

P13-6 ISO/MPEG-H 3D Audio: SAOC 3D Decoding and Rendering—Adrian Murtaza, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jürgen Herre, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany; Jouni Paulus, Fraunhofer IIS - Erlangen, Germany; International Audio Laboratories Erlangen - Erlangen, Germany; Leon Terentiv, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Harald Fuchs, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Sascha Disch, Fraunhofer IIS, Erlangen - Erlangen, Germany
The ISO/MPEG standardization group recently finalized the MPEG-H 3D Audio standard for the universal carriage of encoded 3D sound from channel-based, object-based, and HOA-based input. To achieve efficient low-bitrate coding of a high number of channels and objects, an advanced version of the well-known MPEG-D Spatial Audio Object Coding (SAOC) has been developed under the name SAOC 3D. The new SAOC 3D system supports direct reproduction to any output format from 22.2 and beyond down to 5.1 and stereo. This paper describes the SAOC 3D technology as part of the MPEG-H 3D Audio (phase one) International Standard and provides an overview of its features, capabilities, and performance.
Convention Paper 9434

P13-7 Auditory Distance Rendering Using a Standard 5.1 Loudspeaker Layout—Mikko-Ville Laitinen, Aalto University - Espoo, Finland; Andreas Walther, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Ville Pulkki, Aalto University - Espoo, Finland
Human hearing is known to be sensitive to the distances of sound sources. However, spatial-sound rendering systems typically do not allow controlling the distance of the auditory objects. This paper proposes a distance-rendering method that uses standard 5.1 loudspeaker layouts. The proposed method applies an input signal to multiple loudspeakers and controls the gains and the coherence of the loudspeaker signals. In addition, the method is combined with amplitude panning, thus allowing continuous control of both the distance and the direction of the auditory objects. Based on listening tests, the proposed method was found to provide the ability to realistically manipulate the perception of both direction and distance.
Convention Paper 9435

P14 - Perception—Part 3

Saturday, October 31, 9:00 am — 12:00 pm (Room 1A07)

Chair:
Agnieszka Roginska, New York University - New York, NY, USA

P14-1 Spatial Sound Attributes—Development of a Common Lexicon—Nick Zacharov, DELTA SenseLab - Hørsholm, Denmark; Torben Holm Pedersen, DELTA SenseLab - Hørsholm, Denmark
Sound quality and spatial sound have been topics of research for decades in relation to loudspeakers and headphones as well as performance spaces (e.g., concert halls). Attributes may be used as a means to characterize sound quality through listening tests. Attribute development protocols are well reported and have been applied to a wide range of spatial sound applications. However, the use of attributes often leads researchers to discuss the merits of the attributes rather than focusing on the object of the research. Over the last few decades a large number of articles have included the development of spatial sound attributes. This paper describes the collection of many known research articles on spatial sound attributes from a wide range of domains. Rather than repeating traditional word elicitation and group discussion, we have chosen a semantic text data mining approach to find common attribute meanings, followed by a sorting and refinement process with expert assessors. This process is defined in detail, and the results of the semantic text mining are presented as part of the further development of a sound wheel for sound reproduction.
Convention Paper 9436

P14-2 Towards a MATLAB Toolbox for Imposing Speech Signal Impairments Following the P.TCA Schema—Friedemann Köster, University of Technology Berlin - Berlin, Germany; Falk Schiffner, University of Technology Berlin - Berlin, Germany; Dennis Guse, University of Technology Berlin - Berlin, Germany; Jens Ahrens, University of Technology Berlin - Berlin, Germany; Janto Skowronek, University of Technology Berlin - Berlin, Germany; Sebastian Möller, University of Technology Berlin - Berlin, Germany
In this paper we present and validate a freely available MATLAB Toolbox for imposing speech signal impairments similar to those occurring in real-world telecommunication systems. The purpose of the toolbox is to facilitate research on the perception of different dimensions of speech quality and their relation to technical system properties. In that context the International Telecommunication Union (ITU) is working on the annotation method P.TCA, which enables expert listeners to identify the technical cause of an observed speech signal impairment. Our contribution addresses one current challenge of P.TCA: it was found that written definitions of speech degradations, without example listening material, are not sufficient for annotators to understand them reliably. To address this issue and make the schema accessible to a wide range of users, this paper describes a systematic approach to generating and validating such example listening material. A validation experiment shows that experts can identify more than half of the processed examples correctly, which encourages further research toward improving the P.TCA procedure as well as the processing algorithms.
Convention Paper 9437

P14-3 The Influence of Dumping Bias on Timbral Clarity Ratings—Kirsten Hermes, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK; Christopher Hummersone, University of Surrey - Guildford, Surrey, UK
When listening test subjects are required to rate changes in a single attribute, but also hear changes in other attributes, their ratings can become skewed by “dumping bias.” To assess the influence of dumping bias on timbral “clarity” ratings, listeners were asked to rate stimuli: (i) in terms of clarity only; and (ii) in terms of clarity, warmth, fullness, and brightness. Clarity ratings of type (i) showed (up to 20%) larger interquartile ranges than those of type (ii). It is concluded that in single-attribute timbral rating experiments, statistical noise—potentially resulting from dumping bias—can be reduced by allowing listeners to rate additional attributes either simultaneously or beforehand.
Convention Paper 9438
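P14-3 reports the spread of clarity ratings as interquartile ranges and compares the two rating conditions in percent. The sketch below shows that comparison. It uses `statistics.quantiles` with its default "exclusive" method, which may not match the quartile definition the authors used. The function names are hypothetical.

```python
import statistics


def iqr(ratings):
    """Interquartile range (Q3 - Q1) of a list of ratings."""
    q1, _, q3 = statistics.quantiles(ratings, n=4)
    return q3 - q1


def percent_larger(ratings_a, ratings_b):
    """How much larger, in percent, the IQR of ratings_a is than that of ratings_b."""
    return 100.0 * (iqr(ratings_a) - iqr(ratings_b)) / iqr(ratings_b)
```

Under this definition, "up to 20% larger" corresponds to `percent_larger(single_attribute, multi_attribute)` reaching about 20 for some stimulus. Here the two arguments stand for per-stimulus rating lists from conditions (i) and (ii).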

P14-4 Method for Objective Evaluation of Nonlinear Distortion—Mikhail Pahomov, LG Electronics, Inc. - St. Petersburg, Russia; Yong Hyuk Na, Sr., LG Electronics, Inc. - Seoul, South Korea
A perceptual method is presented for assessing the audibility of nonlinear distortion in sound systems that introduce high levels of distortion into the original signal, such as mobile devices. The method is based on the Perceptual Evaluation of Audio Quality (PEAQ) standard [1]. To estimate the audibility of nonlinear distortion, the method generates a content-dependent multitone signal and extracts a distortion signal from it. The properties of the distortion signal are then measured, and regression analysis combines them into a metric that denotes the overall audible harmonic distortion. Experimental results on mobile handsets are provided to verify the high accuracy of the method.
Convention Paper 9439

P14-5 Subjective and Objective Measurements of Speech Loudness in Hands-Free Telephony—Toward an Extended Loudness Model for Telephonometry—Idir Edjekouane, Orange Labs - Lannion Cedex, France; LMA-CNRS; Cyril Plapous, Orange Labs - Lannion Cedex, France; Catherine Quinquis, Orange Labs - Lannion Cedex, France; Sabine Meunier, Centre National de la Recherche Scientifique - Marseille Cedex, France
The loudness rating technique is widely used in telephony, but it shows some limitations in light of recent advances in telecommunications. This paper proposes an alternative to the loudness rating technique based on an extension of Zwicker’s loudness model. We first investigated the loudness of speech transmitted via a telephone system and the ability of Zwicker’s model to predict the perceived loudness. The model predicts the main trends observed in the perceptual data. However, there is a bias between predicted and measured loudness that depends on sound pressure level. Based on our perceptual data and on recent studies, we propose a modification of the model at the specific loudness calculation stage. This modification significantly improved the predictions.
Convention Paper 9440

P14-6 Investigation on the Phantom Image Elevation Effect—Hyunkook Lee, University of Huddersfield - Huddersfield, UK
Listening tests were carried out to evaluate the phantom image elevation effect as a function of the horizontal stereophonic base angle. Seven ecologically valid sound sources and four noise sources were tested. Subjects judged the perceived position of the phantom center image created with seven loudspeaker base angles. Results generally showed that the perceived image was elevated from the front to above as the loudspeaker base angle increased up to around 180°. This tendency depended on the spectral characteristics of the sound source. The results are explained from both physical and cognitive points of view.
Convention Paper 9441

P15 - Spatial Audio—Part 2

Saturday, October 31, 2:00 pm — 5:30 pm (Room 1A08)

Chair:
Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK

P15-1 Capturing the Elevation Dependence of Interaural Time Difference with an Extension of the Spherical-Head Model—Rahulram Sridhar, 3D3A Lab, Princeton University - Princeton, NJ, USA; Edgar Choueiri, Princeton University - Princeton, NJ, USA
An extension of the spherical-head model (SHM) is developed to incorporate the elevation dependence observed in measured interaural time differences (ITDs). The model aims to address the inability of the SHM to capture this elevation dependence, thereby improving ITD estimation accuracy while retaining the simplicity of the SHM. To do so, the proposed model uses an elevation-dependent head radius that is individualized from anthropometry. Calculations of ITD for 12 listeners show that the proposed model is able to capture this elevation dependence and, for high frequencies and at large azimuths, yields a reduction in mean ITD error of up to 13 microseconds (3% of the measured ITD value), compared to the SHM. For low-frequency ITDs, this reduction is up to 160 microseconds (23%).
Convention Paper 9447
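For reference, the baseline spherical-head model that P15-1 extends can be sketched with two textbook rigid-sphere approximations: the Woodworth high-frequency formula ITD = (a/c)(θ + sin θ) and the low-frequency approximation ITD ≈ 3(a/c) sin θ, where θ is the lateral angle from the median plane. This is a minimal Python sketch of the standard SHM, not the authors' elevation-dependent extension; the head radius of 8.75 cm and c = 343 m/s are assumed values rather than individualized ones.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed
HEAD_RADIUS = 0.0875    # m, a commonly used average; an assumption, not an individualized value

def lateral_angle(azimuth, elevation):
    """Angle from the median plane (radians); a rigid sphere's ITD depends only on this."""
    return np.arcsin(np.sin(azimuth) * np.cos(elevation))

def itd_high_freq(azimuth, elevation=0.0, a=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Woodworth high-frequency ITD in seconds: (a/c) * (theta + sin theta)."""
    theta = lateral_angle(azimuth, elevation)
    return (a / c) * (theta + np.sin(theta))

def itd_low_freq(azimuth, elevation=0.0, a=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Low-frequency rigid-sphere ITD in seconds: 3 * (a/c) * sin theta."""
    theta = lateral_angle(azimuth, elevation)
    return 3.0 * (a / c) * np.sin(theta)

az = np.deg2rad(90.0)
for el_deg in (0, 30, 60):
    el = np.deg2rad(el_deg)
    # ITDs printed in microseconds for a source at 90 degrees azimuth
    print(el_deg, round(itd_high_freq(az, el) * 1e6), round(itd_low_freq(az, el) * 1e6))
```

Because the sphere is symmetric about the interaural axis, this model's ITD depends on elevation only through the lateral angle; the additional elevation dependence seen in measured ITDs is what the paper's elevation-dependent head radius is meant to capture.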

P15-2 Temporal Reliability of Subjectively Selected Head-Related Transfer Functions (HRTFs) in a Non-Eliminating Discrimination Task—Yunhao Wan, University of Florida - Gainesville, FL, USA; Ziqi Fan, University of Florida - Gainesville, FL, USA; Kyla McMullen, University of Florida - Gainesville, FL, USA
The emergence of commercial virtual reality devices has reinvigorated the need for research on realistic audio for virtual environments. Realistic virtual audio is often achieved using head-related transfer functions (HRTFs), which are costly to measure and specific to each listener, making their use difficult to scale. Subjective selection allows listeners to pick their own HRTF from a database of premeasured HRTFs. While this is a more scalable option, further research is needed to examine how consistently listeners choose their own HRTFs. The present study extends current subjective selection research by quantifying the reliability over time of the HRTFs subjectively selected by 12 participants in a non-eliminating perceptual discrimination task.
Convention Paper 9448

P15-3 Plane-Wave Decomposition with Aliasing Cancellation for Binaural Sound Reproduction—David L. Alon, Ben-Gurion University of the Negev - Beer-Sheva, Israel; Jonathan Sheaffer, Ben-Gurion University of the Negev - Beer-Sheva, Israel; Boaz Rafaely, Ben-Gurion University of the Negev - Beer Sheva, Israel
Spherical microphone arrays are used for capturing three-dimensional sound fields, from which binaural signals can be obtained. Plane-wave decomposition of the sound field is typically employed in the first stage of the processing. However, with practical arrays the upper operating frequency is limited by spatial aliasing. In this paper a measure of plane-wave decomposition error is formulated to highlight the problem of spatial aliasing. A novel method for plane-wave decomposition at frequencies that are typically considered above the maximal operating frequency is then presented, based on the minimization of aliasing error. The mathematical analysis is complemented by a simulation study and by a preliminary listening experiment. Results show a clear perceptual improvement when aliasing-cancellation is applied to aliased binaural signals, indicating that the proposed method can be used to extend the bandwidth of binaural signals rendered from microphone array recordings.
Convention Paper 9449

P15-4 Modeling ITDs Based on Photographic Head Information—Jordan Juras, New York University - New York, NY, USA; Chris Miller, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA
Research has shown that personalized spatial cues in 3D sound simulation improve the perception and quality of the sound image. This paper introduces a simple method for photographically extracting head size and proposes a fitted spherical head model to predict interaural time differences (ITDs) more accurately. Head-related impulse responses (HRIRs) were measured on eleven subjects, and ITDs were extracted from the measurements. The distance between the ears was measured from a photograph of each subject's face and used, with the proposed adjusted spherical head model, to model that subject's personal ITDs. The acoustically measured ITDs are then compared with the modeled ITDs, demonstrating the effectiveness of the proposed method for photographically extracting personalized ITDs.
Convention Paper 9450

P15-5 Recalibration of Virtual Sound Localization Using Audiovisual Interactive Training—Xiaoli Zhong, South China University of Technology - Guangzhou, China; Jie Zhang, South China University of Technology - Guangzhou, China; Guangzheng Yu, South China University of Technology - Guangzhou, Guangdong, China
In virtual auditory displays, non-individual head-related transfer functions (HRTFs), such as those measured on the KEMAR manikin, degrade localization. This work investigates the efficacy of audiovisual interactive training in recalibrating this localization degradation. First, an audiovisual interactive training system consisting of a control module, a binaural virtual sound module, and a vision module was constructed. Then, ten subjects were divided into a control group and a training group and underwent three days of training and localization tests. Results indicate that in the horizontal plane, azimuth localization accuracy improved significantly with training and front-back confusion was reduced; however, in the median plane, three days of short-term training did not significantly improve elevation localization accuracy.
Convention Paper 9451

P15-6 Analysis and Experiment on Summing Localization of Two Loudspeakers in the Median Plane—Bosun Xie, South China University of Technology - Guangzhou, China; Dan Rao, South China University of Technology - Guangzhou, Guangdong, China
Based on the hypothesis that changes in interaural time difference caused by head rotation and tilting provide dynamic cues for front-back and vertical localization, low-frequency localization equations, or panning laws, for multiple loudspeakers in the median plane were derived in our previous work. In the present work we provide further psychoacoustic explanation of these equations and use them to analyze the summing localization of two loudspeakers in various configurations with pair-wise amplitude panning in the median plane. The relationship between the current method and other localization theorems is also analyzed. Results indicate that pair-wise amplitude panning can create virtual sources between the loudspeakers for some configurations but not for others. A virtual source localization experiment yields results consistent with the analysis, thereby validating the proposed method.
Convention Paper 9452

P15-7 Immersive Audio Content Creation Using Mobile Devices and Ethernet AVB—Richard Foss, Rhodes University - Grahamstown, Eastern Cape, South Africa; Antoine Rouget, DSP4YOU Ltd. - Kowloon, Hong Kong
The goal of immersive sound systems is to localize multiple sound sources such that listeners are enveloped in sound. This paper describes an immersive sound system that allows for the creation of immersive sound content and real-time control over sound source localization. It is a client/server system in which the client is a mobile device. The server receives localization control messages from the client and uses an Ethernet AVB network to distribute the appropriate mix levels to speakers with built-in signal processing.
Convention Paper 9453

P16 - Room Acoustics

Saturday, October 31, 2:15 pm — 3:45 pm (Room 1A07)

Chair:
Rémi Audfray, Dolby Laboratories, Inc. - San Francisco, CA, USA

P16-1 Environments for Evaluation: The Development of Two New Rooms for Subjective Evaluation—Elisabeth McMullin, Samsung Research America - Valencia, CA USA; Adrian Celestinos, Samsung Research America - Valencia, CA, USA; Allan Devantier, Samsung Research America - Valencia, CA, USA
This paper presents an overview of the design, features, and optimization of two new critical listening rooms developed for subjective evaluation of a wide array of audio products. Features include a rotating wall for comparing flat-panel televisions, an all-digital audio switching system, custom tablet-based testing software for running a variety of listening experiments, and modular acoustic paneling for customizing the room acoustics. Using simulations and acoustic measurements, each room was studied to analyze its acoustics and optimize the listening environment for different listening situations.
Convention Paper 9460

P16-2 Low Frequency Behavior of Small Rooms—Renato Cipriano, Walters Storyk Design Group - Belo Horizonte, Brazil; Robi Hersberger, Walters Storyk Design Group - New York, USA; Gabriel Hauser, Walters Storyk Design Group - Basel, Switzerland; Dirk Noy, WSDG - Basel, Switzerland; John Storyk, Architect, Studio Designer and Principal, Walters-Storyk Design Group - Highland, NY, USA
Modeling of sound reinforcement systems and room acoustics in large and medium-size venues has become standard practice in the audio industry. However, acoustic modeling of small rooms has not yet become a widely accepted concept, mainly because suitable tools have not been available. This work introduces a practical and accurate software-based approach, based on the boundary element method (BEM), for simulating the acoustic properties of studio rooms. A detailed case study is presented, and modeling results are compared with measurements, showing that the results match within the given uncertainties. It is also indicated how the simulation software can be extended to optimize loudspeaker locations, room geometry, and absorber placement in order to improve the acoustic quality of the space and thus the listening experience.
Convention Paper 9461

P16-3 Measuring Sound Field Diffusion: SFDC—Alejandro Bidondo, Universidad Nacional de Tres de Febrero - UNTREF - Caseros, Buenos Aires, Argentina; Mariano Arouxet, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina; Sergio Vazquez, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina; Javier Vazquez, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina; Germán Heinze, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina
This research addresses the usefulness of an absolute descriptor that quantifies, in third-octave bands, the degree of diffusion of a sound field. The degree of sound field diffuseness at a point is related to the control of reflection energy multiplied by the uniformity of the temporal distribution of reflections. All of this information is extracted from a monaural, broadband, omnidirectional impulse response with high S/N. The coefficient ranges from 0 to 1, with zero meaning no diffuseness at all, and evaluates the early, late, and total sound field for frequencies above the Schroeder frequency and in the far field of diffusing surfaces. The coefficient allows comparison of different rooms and of different positions within a room, measurement of the effects of different sound diffuser treatments, and assessment of the resulting variation in spatial uniformity, among other applications.
Convention Paper 9462

P17 - Applications in Audio

Saturday, October 31, 2:15 pm — 3:45 pm (S-Foyer 1)

P17-1 Application of Object-Based Audio for Automated Mixing of Live Football Broadcast—Robert Oldfield, University of Salford - Salford, Greater Manchester, UK; Ben Shirley, University of Salford - Salford, Greater Manchester, UK; Darius Satongar, University of Salford - Salford, Greater Manchester, UK
The challenge of creating a live sound mix for a sports event such as a football/soccer match should not be underestimated. The mixing engineer must constantly raise and lower the faders of the pitch-side microphones covering the area of the pitch where the action is taking place, so that the on-pitch sounds can be heard over the crowd noise. This paper presents a system that automates this process by detecting audio objects in the microphone feeds and controlling the fader levels on the mixing console accordingly. It includes a brief description of the underlying algorithms for detecting ball kicks and whistle blows and describes how such a system can be integrated into current broadcast workflows.
Convention Paper 9454

P17-2 Personal Adaptive Tuning of Mobile Computer Audio—Kuba Lopatka, Gdansk University of Technology - Gdansk, Poland; Jozef Kotus, Gdansk University of Technology - Gdansk, Poland; Piotr Suchomski, Gdansk University of Technology - Gdansk, Poland; Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland; Bozena Kostek, Gdansk University of Technology - Gdansk, Poland; Audio Acoustics Lab.
An integrated methodology for enhancing audio quality in mobile computers is presented. Its key feature is the adaptation of the characteristics of the acoustic track to changing conditions and to the user's individual preferences. Original signal processing algorithms are introduced for frequency response linearization, dialogue intelligibility enhancement, and dynamics processing tuned to the user's preferences. The principles of the algorithms, implemented in C++, are described. The processing is performed by custom Audio Processing Objects (APOs) installed in the Windows sound system. The sound enhancement bundle is managed through a user interface that enables control over the sound system. The results of a subjective evaluation of the introduced methods are discussed.
Convention Paper 9455

P17-3 Audio Effects Data on the Semantic Web—Thomas Wilmering, Queen Mary University of London - London, UK; György Fazekas, Queen Mary University of London - London, UK; Alo Allik, Queen Mary University of London - London, UK; Mark B. Sandler, Queen Mary University of London - London, UK
We discuss the development of a linked data service exposing metadata about audio effect implementations. The data is collected automatically from Web sources as well as by extracting information from effect plugin binaries, and by manual data entry and correction using a Web service. Automatically generated RDF data is represented using vocabulary terms defined by the Audio Effects Ontology. A SPARQL endpoint allows for the integration of this data resource in novel audio production software and services for the classification, comparison, and recommendation of effects, taking advantage of semantic descriptors.
Convention Paper 9456

P17-4 Speech Music Discrimination Using an Ensemble of Biased Classifiers—Kibeom Kim, Samsung Electronics Co. Ltd. - Suwon, Gyeonggi-do, Korea; Anant Baijal, Samsung Electronics Co. Ltd. - Suwon, Korea; Byeong-Seob Ko, Samsung Electronics Co. Ltd. - Suwon, Korea; Sangmoon Lee, Samsung Electronics Co. Ltd. - Suwon, Gyeonggi-do, Korea; Inwoo Hwang, Samsung Electronics Co. Ltd. - Suwon-si, Gyeonggi-do, Korea; Youngtae Kim, Samsung Electronics Co. Ltd. - Suwon, Gyeonggi-do, Korea
In this paper we present a novel framework for real-time speech/music discrimination (SMD). The proposed method improves the overall accuracy of automatically classifying the signals into speech, singing, or instrumental categories. In our work, first, we design several groups of classifiers such that each group’s classification decision is biased towards a certain class of sounds; the bias is induced by training different groups of classifiers on perceptual features extracted at different temporal resolutions. Then, we build our system using an ensemble of these biased classifiers organized in a parallel classification fashion. Last, these ensembles are combined with a weighting scheme, which can be tuned in either forward-weighting or inverse-weighting modes, to provide accurate results in real-time. We show, through extensive experimental evaluations, that the proposed ensemble of biased classifiers framework yields superior performance compared to the baseline approach.
Convention Paper 9457

P17-5 Multi-Criteria Decision Aid Analysis of a Musification Approach to the Auditory Display of Micro-Organism Movement—Duncan Williams, University of Plymouth - Devon, UK; Laurence Wilson, University of York - Heslington, York, UK
We evaluate a musification approach to the auditory display of flagellar movement in P. berghei, a micro-organism commonly used in laboratory studies of malaria transmission. High-resolution 3D holography techniques provide the source data. The ultimate goal of this work is to develop an auditory display that could successfully augment existing visual analysis of micro-organism motility in the field. The rationale for musification, as opposed to sonification, and methods for evaluating the success of this implementation are explored. An evenly weighted multi-criteria decision aid analysis of the amenity, immersion, intuitivity, efficiency, and congruency of the musification was undertaken. Listeners consistently rated the amenity, intuitivity, and congruency of the musification above those of the visual-only display and of a randomized audio accompaniment.
Convention Paper 9458

P17-6 An Overview of an Online Audio Electronics Curriculum Offered at the Indiana University Jacobs School of Music—Michael Stucker, Indiana University - Bloomington, IN, USA
This paper gives an overview of both the pedagogical and the technical design of an online curriculum for teaching electronics, specifically analog audio electronics. The approach worked to increase student engagement and to let students work on their own schedule while still receiving instructional support. Engagement is particularly difficult to achieve in online courses, and extra effort must be taken to create activities that increase student participation, focus, and engagement. Much of the engagement in an in-person course comes from interaction among the people involved, whether instructors or students, so creating methods and compelling reasons for student-student and student-instructor interaction is critical to the success of an online course. One benefit of online courses is that students can work according to their own schedule; for such a course to be effective, instructional support must be available during whatever hours the student chooses to work on course materials. An instructor cannot always be available, but course materials can be designed to provide interactive instructional support. The paper describes the course design created to solve these problems, covering both the technical details and the pedagogy behind it.
Convention Paper 9459

P17-7 A Connection Management System to Enable the Wireless Transmission of MIDI Messages—Brent Shaw, Rhodes University - Grahamstown, South Africa; Richard Foss, Rhodes University - Grahamstown, Eastern Cape, South Africa
This paper examines the design and implementation of a wireless system for the distribution of MIDI messages for show control and studio environments. The system makes use of the MIDI and MIDINet protocols, creating wireless nodes that will enable the transmission of MIDI between devices on a wireless network with connection management capabilities through the use of embedded web servers. The paper describes the current state of the art, configuration of the system, hardware architectures, software design, and implementation.
Convention Paper 9474

P18 - Recording & Production

Saturday, October 31, 4:15 pm — 5:45 pm (S-Foyer 1)

P18-1 The Impact of Subgrouping Practices on the Perception of Multitrack Music Mixes—David Ronan, Queen Mary University of London - London, UK; Brecht De Man, Queen Mary University of London - London, UK; Hatice Gunes, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
Subgrouping is an important part of the mix engineering workflow that facilitates the process of manipulating a number of audio tracks simultaneously. We statistically analyze the subgrouping practices of mix engineers in order to establish the relationship between subgrouping and mix preference. We investigate the number of subgroups (relative and absolute), the type of audio processing, and the subgrouping strategy in 72 mixes of 9 songs by 16 mix engineers. We analyze the subgrouping setup for each mix of a particular song and also for each mix by a particular mix engineer. We show that subjective preference for a mix correlates strongly with the number of subgroups and, to a lesser extent, with the types of audio processing applied to the subgroups.
Convention Paper 9442

P18-2 MixViz: A Tool to Visualize Masking in Audio Mixes—Jon Ford, Northwestern University - Evanston, IL, USA; Mark Cartwright, Northwestern University - Evanston, IL, USA; Bryan Pardo, Northwestern University - Evanston, IL, USA
This paper presents MixViz, a real-time audio production tool that helps users visually detect and eliminate masking in audio mixes. This work adapts the Glasberg and Moore time-varying Model of Loudness and Partial Loudness to analyze multiple audio tracks for instances of masking. We extend the Glasberg and Moore model to allow it to account for spatial release from masking effects. Each audio track is assigned a hue and visualized in a 2-dimensional display where the horizontal dimension is spatial location (left to right) and the vertical dimension is frequency. Masking between tracks is indicated via a change of color. The user can quickly drag and drop tracks into and out of the mix visualization to observe the effects on masking. This lets the user intuitively see which tracks are masked in which frequency ranges and take action accordingly. This tool has the potential to both make mixing easier for novices and improve the efficiency of expert mixers.
Convention Paper 9443

P18-3 Sound Capture Technical Parameters of Colombian Folk Music Instruments for Virtual Sound Banks Use—Carlos Andrés Caballero Parra, Instituto Tecnológico Metropolitano - Medellín, Antioquia, Colombia; Jamir Mauricio Moreno Espinal, Sr., Instituto Tecnológico Metropolitano - Medellín, Antioquia, Colombia
This paper describes the appropriate way of handling the technical considerations required for the digital sound capture of Colombian folk music instruments, taking into account the particular parameters of each instrument and the audio file formats currently used in virtual sound banks. The paper does not propose new capture techniques or microphone placements. Instead, it uses well-known methods to obtain precise and clear audio takes that yield a substantial number of audio samples for building sound banks that can be used in music software and as virtual instruments. The tests and analyses carried out showed that a broad sound capture is required (covering the instrument's full range), using flat-frequency-response microphones, high-resolution digital conversion formats (96 kHz/24 bits), and near and distant stereo recordings, all in acoustically controlled, well-conditioned environments.
Convention Paper 9444

P18-4 Vocal Clarity in the Mix: Techniques to Improve the Intelligibility of Vocals—Yuval Ronen, New York University - New York, NY, USA
Common mixing techniques for improving the intelligibility of vocals were gathered from interviews with leading professional mixing engineers and from a review of the literature in the field. An experiment to test these techniques was conducted with randomly selected participants who had normal hearing and no mixing or recording expertise. The results showed statistically significant differences between audio clips processed with these techniques and unprocessed clips. To the author’s knowledge, this is the first study of its kind to show that certain common mixing techniques significantly improve the intelligibility of vocals in popular music as perceived by human subjects.
Convention Paper 9445

P18-5 Affective Potential in Vocal Production—Duncan Williams, University of Plymouth - Devon, UK
The study of affect in music psychology – broadly construed as emotional responses communicated to, or induced in, the listener – increasingly concludes that voice processing can provide a powerful vector for emotional communication in the music production chain. The audio engineer has the ability to create a “definitive article” in the studio that gives listeners an opportunity to engage with the recorded voice in a manner that is quite distinct from everyday speech or the effect that might be achieved in a typical live performance. This paper examines the affective potential of the voice in a number of examples from popular music where the production chain has been exploited to provide a technological mediation to the listener’s emotional response.
Convention Paper 9446

P18-6 Sample-Rate Variance across Portable Digital Audio Recorders—Robert Oldfield, University of Salford - Salford, Greater Manchester, UK; Paul Kendrick, University of Salford - Salford, UK
In recent years there has been an increase in the use of portable digital recording devices, such as smartphones, tablets, dictaphones, and other hand-held recorders, for making informal or in-situ recordings. It is often not possible to connect an external clock signal to these devices, so such recordings are affected by deviations of the device's actual clock rate from the expected rate. This variation causes problems when synchronizing signals from multiple recording devices and can prevent the use of some signal processing algorithms. This paper presents a novel methodology for determining the actual clock rate of a digital recording device by optimizing the correlation between a recording and a ground-truth signal under varying degrees of temporal stretching. The paper further discusses the effects of sample-rate variation on typical applications. The sampling rates of a range of commonly used mobile audio recording devices were found to deviate from the nominal 48 kHz with a standard deviation of 0.8172 Hz. For a single device type used for long-term logging of bio-acoustic signals, the standard deviation of the sampling rate was found to be 0.1983 Hz (at a nominal 48 kHz).
Convention Paper 9470
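The correlation-based clock estimate described in P18-6 can be sketched as a grid search: for each candidate rate, reinterpret the recording's sample times, resample it onto the nominal time grid, and keep the candidate whose resampled signal correlates best with the ground truth. The Python code below is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the recording and reference are already aligned at their start (zero lag), uses linear interpolation via `np.interp` for the temporal stretching, and uses a synthetic chirp as a hypothetical ground-truth signal.

```python
import numpy as np

def estimate_sample_rate(recording, reference, nominal_fs, candidates):
    """Grid search over candidate clock rates (assumes zero lag between the signals)."""
    ref_t = np.arange(len(reference)) / nominal_fs  # reference sample times
    best_fs, best_score = None, -np.inf
    for fs_c in candidates:
        rec_t = np.arange(len(recording)) / fs_c     # sample times if the device ran at fs_c
        valid = ref_t <= rec_t[-1]                   # avoid extrapolating past the recording
        warped = np.interp(ref_t[valid], rec_t, recording)
        ref = reference[valid]
        # Normalized correlation at zero lag between the stretched recording and the reference
        score = np.dot(warped, ref) / (np.linalg.norm(warped) * np.linalg.norm(ref))
        if score > best_score:
            best_fs, best_score = fs_c, score
    return best_fs

def chirp(t, f0=100.0, f1=4000.0, duration=10.0):
    """Hypothetical ground-truth signal: linear chirp evaluated at arbitrary times."""
    k = (f1 - f0) / duration
    return np.sin(2.0 * np.pi * (f0 * t + 0.5 * k * t**2))

nominal_fs, true_fs, duration = 48000.0, 48000.8, 10.0
reference = chirp(np.arange(int(duration * nominal_fs)) / nominal_fs)
recording = chirp(np.arange(int(duration * true_fs)) / true_fs)  # device clock runs slightly fast
candidates = np.arange(47998.0, 48002.0 + 1e-9, 0.1)
est = estimate_sample_rate(recording, reference, nominal_fs, candidates)
print(f"estimated rate: {est:.1f} Hz")
```

A finer grid, or a local refinement around the best candidate, trades computation for resolution; a practical tool would also need to search over the start offset between the two signals.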

P18-7 Comparison of Loudness Features for Automatic Level Adjustment in Mixing—Gordon Wichern, iZotope, Inc. - Cambridge, MA, USA; Aaron Wishnick, iZotope - Cambridge, MA, USA; Alexey Lukin, iZotope, Inc. - Cambridge, MA, USA; Hannah Robertson, iZotope - Cambridge, MA, USA
Manually setting the level of each track of a multitrack recording is often the first step in the mixing process. In order to automate this process, loudness features are computed for each track and gains are algorithmically adjusted to achieve target loudness values. In this paper we first examine human mixes from a multitrack dataset to determine instrument-dependent target loudness templates. We then use these templates to develop three different automatic level-based mixing algorithms. The first is based on a simple energy-based loudness model, the second uses a more sophisticated psychoacoustic model, and the third incorporates masking effects into the psychoacoustic model. The three automatic mixing approaches are compared to human mixes using a subjective listening test. Results show that subjects preferred the automatic mixes created from the simple energy-based model, indicating that the complex psychoacoustic model may not be necessary in an automated level setting application.
Convention Paper 9370
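The simplest of the three approaches in P18-7, an energy-based loudness feature driving per-track gains toward target values, reduces to a few lines. In this minimal Python sketch, mean-square level in dB stands in for the energy-based loudness feature, and the instrument targets are hypothetical placeholders rather than the templates the authors derived from human mixes.

```python
import numpy as np

def energy_loudness_db(x):
    """Energy-based loudness proxy: mean-square level in dB (re full scale)."""
    return 10.0 * np.log10(np.mean(np.square(x)) + 1e-12)  # small floor avoids log(0) on silence

def level_gains(tracks, targets_db):
    """Linear gain per track that moves its energy-based loudness onto its target."""
    return {name: 10.0 ** ((targets_db[name] - energy_loudness_db(x)) / 20.0)
            for name, x in tracks.items()}

rng = np.random.default_rng(0)
tracks = {"vocals": 0.05 * rng.standard_normal(48000),   # stand-in audio: 1 s of noise per track
          "bass": 0.4 * rng.standard_normal(48000),
          "drums": 0.2 * rng.standard_normal(48000)}
targets_db = {"vocals": -18.0, "bass": -22.0, "drums": -20.0}  # hypothetical instrument template
gains = level_gains(tracks, targets_db)
mix = sum(gains[name] * x for name, x in tracks.items())  # summed, level-adjusted mix
for name, x in tracks.items():
    print(name, round(energy_loudness_db(gains[name] * x), 2))
```

Swapping `energy_loudness_db` for a psychoacoustic loudness model, with or without masking, would correspond to the paper's second and third variants; the gain computation itself stays the same.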

P19 - Spatial Audio—Part 3

Sunday, November 1, 9:00 am — 12:30 pm (Room 1A08)

Chair:
Jean-Marc Jot, DTS, Inc. - Los Gatos, CA, USA

P19-1 Estimating the Total Sound Power of Loudspeakers—Adrian Celestinos, Samsung Research America - Valencia, CA, USA; Allan Devantier, Samsung Research America - Valencia, CA, USA; Andri Bezzola, Samsung Research America - Valencia, CA USA; Ritesh Banka, Samsung Research America - Valencia, CA USA; Pascal Brunet, Samsung Research America - Valencia, CA USA; Audio Group - Digital Media Solutions
When designing loudspeakers, a number of parameters must be known, and the total radiated sound power is one of them. Its estimation is typically performed in anechoic conditions and requires a large number of measurements, so it is of interest to know how accurately the estimate reflects the actual radiated power. Two coherent point sources separated by 30 cm are simulated in three scenarios. The sound pressure is calculated at discrete points on a sphere enclosing the two point sources, and the error between the estimated and the analytical sound power is computed. A number of different microphone arrangements are tested. Results suggest that the spatial distribution of the measurement points over the sphere and their number are critical.
Convention Paper 9463
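The kind of simulation described in P19-1 can be approximated as follows: two coherent, in-phase monopoles 30 cm apart, pressure sampled at N points on an enclosing sphere, and the sampled estimate compared against the closed-form power 2 W0 (1 + sin(kd)/(kd)) of two in-phase monopoles. This Python sketch is illustrative only: the frequency (1 kHz), sphere radius (10 m), air constants, and golden-angle (Fibonacci) point layout are assumptions, not the paper's three scenarios or microphone arrangements, and intensity is approximated by the far-field relation |p|^2/(rho c).

```python
import numpy as np

RHO, C = 1.21, 343.0  # air density (kg/m^3) and speed of sound (m/s); assumed values

def fibonacci_sphere(n, radius):
    """n nearly equal-area points on a sphere (golden-angle spiral)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i
    r_xy = np.sqrt(1.0 - z**2)
    return radius * np.column_stack((r_xy * np.cos(phi), r_xy * np.sin(phi), z))

def pressure(points, sources, k, amp=1.0):
    """Complex RMS pressure of in-phase monopoles: sum of amp * exp(-1j k r) / r."""
    p = np.zeros(len(points), dtype=complex)
    for s in sources:
        r = np.linalg.norm(points - s, axis=1)
        p += amp * np.exp(-1j * k * r) / r
    return p

def estimated_power(p, radius):
    """Equal-area quadrature of the far-field intensity |p|^2 / (rho c) over the sphere."""
    return 4.0 * np.pi * radius**2 * np.mean(np.abs(p) ** 2) / (RHO * C)

def analytic_power(k, d, amp=1.0):
    """Exact power of two in-phase monopoles: 2 W0 (1 + sin(kd)/(kd)), W0 = 4 pi amp^2 / (rho c)."""
    w0 = 4.0 * np.pi * amp**2 / (RHO * C)
    return 2.0 * w0 * (1.0 + np.sin(k * d) / (k * d))

d, f, radius = 0.30, 1000.0, 10.0  # 30 cm spacing as in the abstract; f and radius assumed
k = 2.0 * np.pi * f / C
sources = np.array([[0.0, 0.0, d / 2], [0.0, 0.0, -d / 2]])
errors = {}
for n in (12, 100, 2000):
    p = pressure(fibonacci_sphere(n, radius), sources, k)
    errors[n] = abs(estimated_power(p, radius) / analytic_power(k, d) - 1.0)
    print(f"{n:5d} points: relative error {errors[n]:.3%}")
```

Changing `fibonacci_sphere` to another layout (for example, a few rings of fixed elevation) is a quick way to explore how the point arrangement, and not only the point count, affects the error.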

P19-2 Loudness Matching Multichannel Audio Program Material with Listeners and Predictive Models—Jon Francombe, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK; Russell Mason, University of Surrey - Guildford, Surrey, UK; Frank Melchior, BBC Research and Development - Salford, UK
Loudness measurements are often necessary in psychoacoustic research and legally required in broadcasting. However, existing loudness models have not been widely tested with new multichannel audio systems. A trained listening panel used the method of adjustment to balance the loudness of eight reproduction methods: low-quality mono, mono, stereo, 5-channel, 9-channel, 22-channel, ambisonic cuboid, and headphones. Seven program items were used, including music, sport, and a film soundtrack. The results were used to test loudness models including simple energy-based metrics, variants of ITU-R BS.1770, and complex psychoacoustically motivated models. The mean differences between the perceptual results and the model predictions were not statistically significant for any but the simplest model. However, some weaknesses in the model predictions were highlighted.
Convention Paper 9464

P19-3 Dynamic Range and Loudness Control in MPEG-H 3D Audio—Fabian Kuech, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Michael Kratschmer, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Bernhard Neugebauer, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Michael Meier, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Frank Baumgarte, Apple Inc. - Cupertino, CA, USA
Recently the new MPEG-H 3D Audio standard has been finalized. It has been designed for delivery of next generation audio content to the user. In addition to highly efficient immersive audio transmission, MPEG-H 3D Audio allows new capabilities such as personalization and adaptation of the audio content to different use scenarios. It also provides an enhanced concept for loudness and dynamic range control (DRC) to adapt the characteristics of the audio content to the requirements of different playback scenarios and listening conditions. This paper gives a detailed overview of the loudness control and DRC functionality of MPEG-H 3D Audio. Relevant use cases are discussed to exemplify the application of the enhanced DRC and loudness management features.
Convention Paper 9465 (Purchase now)

P19-4 Implementing the Radiation Characteristics of Musical Instruments in a Psychoacoustic Sound Field Synthesis System—Tim Ziemer, University of Hamburg - Hamburg, Germany; Rolf Bader, Universität Hamburg - Hamburg, Germany
A method is introduced to measure the radiation characteristics of musical instruments and to calculate the sound field radiated into an extended listening area. This sound field is synthesized by means of a loudspeaker system to create a natural, spatial instrumental sound. All instruments are treated as complex point sources, which makes it easy to measure, analyze, and compare their radiation characteristics and to propagate the radiated sound to discrete listening points. The sound field at these listening points, as well as the loudspeaker driving signals that synthesize it, are calculated in the frequency domain. This makes spatial windowing superfluous and allows all loudspeakers to be active for any virtual source position. However, this procedure introduces synthesis errors, which are compensated for perceptually by psychoacoustic methods. The synthesis principle already works with low-order loudspeaker systems such as discrete quadraphonic and 5.1 setups, as well as with existing ambisonics and wave field synthesis setups with dozens to hundreds of loudspeakers. The aliasing frequency and synthesis precision depend on the number of loudspeakers and the extent of the listening area, not on the spacing between adjacent loudspeakers. A listening test demonstrates that the approach creates a listening experience comparable to mono and stereo in terms of localization and naturalness of the sound, with increased spaciousness.
Convention Paper 9466 (Purchase now)

P19-5 New Techniques for Sound Motion and Display in a 52.1 Surround Sound Hall—Tomás Henriques, SUNY College at Buffalo - Buffalo, NY, USA
The creation of a 52.1 surround sound system is described with a focus on new strategies for sound motion and localization. Innovative artistic, technical, and research approaches to multichannel electronic music composition, spatial sound design, and sound-localization solutions for the study of auditory perception are introduced. A set of software applications is discussed to illustrate the scope of creative possibilities offered by the surround system as a singular performance and research venue.
Convention Paper 9467 (Purchase now)

P19-6 Physical Properties of Modal Beamforming in the Context of Data-Based Sound Reproduction—Nara Hahn, University of Rostock - Rostock, Germany; Sascha Spors, University of Rostock - Rostock, Germany
A sound field captured by a microphone array can be decomposed into plane waves and auralized by means of sound field synthesis or binaural synthesis. The achievable performance is limited by the spatial resolution of the plane wave decomposition. Typically, the plane wave decomposition is performed with respect to an expansion center. If the expansion center is translated, the accuracy of the plane wave representation decreases. It is thus likely that the reproduced sound field also suffers from artifacts at off-center listening positions. The aim of this paper is to investigate the physical properties of a sound field represented as a plane wave decomposition. The sound field is re-expanded with respect to different positions, and the corresponding modal spectra are investigated. This analysis successfully explains the spectral and temporal properties of spatially continuous and discrete modal beamforming.
Convention Paper 9468 (Purchase now)

P19-7 The Vertical Precedence Effect: Utilizing Delay Panning for Height Channel Mixing in 3D Audio—Adrian Tregonning, New York University - New York, NY, USA; Bryan Martin, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, QC, Canada
A strong understanding of psychoacoustic cues is necessary for effective 3D sound reproduction, and the vertical aspects of acoustics and psychoacoustics become even more important than for stereo. This study investigated vertical inter-channel time differences (ICTDs) for frontal imaging in the Auro-3D 9.1 loudspeaker configuration. It was found that vertical ICTDs had a significant effect on perceived images, indicating the operation of the precedence effect in the vertical direction. In particular, 5 ms was found to be a threshold for maximal source elevation. Above this threshold, elevation effects were less prominent but ICTDs significantly increased both phantom image width and vertical spread. The techniques established in this study can assist in the creation of effective immersive content.
Convention Paper 9469 (Purchase now)

P20 - Forensic Audio

Sunday, November 1, 9:00 am — 10:00 am (Room 1A07)

Chair:
Rob Maher, Montana State University - Bozeman, MT, USA

P20-1 Advancing Forensic Analysis of Gunshot Acoustics—Rob Maher, Montana State University - Bozeman, MT, USA; Tushar Routh, Montana State University - Bozeman, MT, USA
This paper describes our current work to create the apparatus and methodology for scientific and repeatable collection of firearm acoustical properties, including the important direction-dependence of each firearm’s sound field. Gunshot acoustical data is collected for a wide range of firearms using an elevated shooting platform and an elevated spatial array of microphones to allow echo-free directional recordings of each firearm’s muzzle blast. The results of this proposed methodology include a standard procedure for cataloging firearm acoustical characteristics and a database of acoustical signatures as a function of azimuth for a variety of common firearms and types of ammunition.
Convention Paper 9471 (Purchase now)

P20-2 Forensic Sound Analyses of Cellular Telephone Recordings—Durand R. Begault, Audio Forensic Center, Charles M. Salter Associates - San Francisco, CA, USA; Adrian L. Lu, Audio Forensics Center, Charles M. Salter Associates - San Francisco, CA, USA; Philip Perry, Audio Forensics Center, Charles M. Salter Associates - San Francisco, CA, USA
Recordings involving cellular telephones or personal digital assistants (“PDAs”) are increasingly the source of evidence in audio forensic examinations, compared with recordings originating from other devices such as hand-held digital recorders. On modern PDA cellular telephones, recordings can either be made directly on the telephone or be transmitted as voicemail messages. The current investigation focuses on the differences between these two types of recordings in terms of dynamic range and level linearity. Such information can be important for characterizing the distance of sound sources relative to the microphone and for understanding how recorded speech and non-speech sounds are transformed.
Convention Paper 9472 (Purchase now)

P21 - Applications in Audio

Sunday, November 1, 10:00 am — 11:00 am (Room 1A07)

Chair:
Jason Corey, University of Michigan - Ann Arbor, MI, USA

P21-1 Loudness: A Function of Peak, RMS, and Mean Values of a Sound Signal—Hoda Nasereddin, IRIB University - Tehran, Iran; Ayoub Banoushi, IRIB University - Tehran, Iran
Every sound has a loudness perceived by the hearing mechanism. Although loudness is a sensation, it is a function of the properties of the sound signal; however, this function is not completely understood. In this paper we show that loudness can be determined as a function of the root mean square (RMS), peak, and average values of a sound signal using an artificial neural network (ANN). Because we did not have access to experimental data, we generated the required training data with the ITU-R BS.1770 model. The results show that loudness can be estimated simply from the physical features of the sound signal, without modeling the complex hearing mechanism.
Convention Paper 9473 (Purchase now)
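The abstract's data-generation step can be sketched in Python: the signal is K-weighted with the 48 kHz biquad coefficients published in ITU-R BS.1770, its ungated loudness is computed, and peak, RMS, and mean-absolute features are extracted as a training pair. Treating "average value" as the mean absolute value, skipping the gating stage, and the test signal are assumptions; the neural network itself is not reproduced.

```python
import numpy as np

# ITU-R BS.1770 K-weighting biquads; these coefficients are valid only at fs = 48 kHz.
SHELF_B = (1.53512485958697, -2.69169618940638, 1.19839281085285)
SHELF_A = (1.0, -1.69065929318241, 0.73248077421585)
HIGHPASS_B = (1.0, -2.0, 1.0)
HIGHPASS_A = (1.0, -1.99004745483398, 0.99007225036621)


def biquad(x, b, a):
    """Direct-form I second-order IIR section; assumes a[0] == 1."""
    y = np.empty_like(x)
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y


def ungated_loudness_mono(x):
    """BS.1770 loudness (LKFS) of one channel with weight 1, without the gating stage."""
    x = np.asarray(x, dtype=float)
    weighted = biquad(biquad(x, SHELF_B, SHELF_A), HIGHPASS_B, HIGHPASS_A)
    return -0.691 + 10.0 * np.log10(np.mean(weighted**2))


def level_features(x):
    """Peak, RMS and mean absolute value (assumed reading of 'average value')."""
    x = np.asarray(x, dtype=float)
    return np.array([np.max(np.abs(x)), np.sqrt(np.mean(x**2)), np.mean(np.abs(x))])


fs = 48000
t = np.arange(fs) / fs                    # one second of audio
x = 0.5 * np.sin(2 * np.pi * 997.0 * t)   # hypothetical training signal
features, target = level_features(x), ungated_loudness_mono(x)
```

A 0 dBFS 997 Hz sine in one channel should measure close to -3.01 LKFS, which is a quick sanity check of the filter chain.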

P21-2 Robust Audio Fingerprinting for Multimedia Recognition Applications—Sangmoon Lee, Samsung Electronics Co. Ltd. - Suwon, Gyeonggi-do, Korea; Inwoo Hwang, Samsung Electronics Co. Ltd. - Suwon-si, Gyeonggi-do, Korea; Byeong-Seob Ko, Samsung Electronics Co. Ltd. - Suwon, Korea; Kibeom Kim, Samsung Electronics Co. Ltd. - Suwon, Gyeonggi-do, Korea; Anant Baijal, Samsung Electronics Co. Ltd. - Suwon, Korea; Youngtae Kim, Samsung Electronics Co. Ltd. - Suwon, Gyeonggi-do, Korea
For a reliable audio fingerprinting (AFP) system for multimedia services, fingerprints must be robust to time mismatch between the live audio stream and prior recordings, while remaining sensitive to changes in content for accurate discrimination. This paper presents a new AFP method using line spectral frequencies (LSFs), parameters that capture the underlying spectral shape. The proposed method includes a new systematic scheme for robust and discriminative fingerprint generation based on the inter-frame LSF difference, and an efficient matching algorithm using a frame concentration measure based on the frame continuity property. Tests on databases containing a variety of advertisements compare the performance of the Philips Robust Hash (PRH) and the proposed AFP. The results demonstrate that the proposed AFP maintains a true match rate of over 98% even when the overlap ratio is as low as 87.5%. It can be concluded that the proposed AFP algorithm is more robust to time mismatch than the PRH method.
Convention Paper 9475 (Purchase now)
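A rough Python sketch of an LSF-based fingerprint, assuming NumPy: each frame is windowed, LPC coefficients are found with Levinson-Durbin, LSFs are taken as the angles of the roots of the symmetric and antisymmetric polynomials P(z) and Q(z), and one bit per LSF records whether it rose since the previous frame. The frame size, LPC order, and this particular bit rule are hypothetical; the paper's fingerprint generation and its frame-concentration matching are not reproduced, and the bit error rate shown is only a generic comparison measure.

```python
import numpy as np


def lpc(frame, order):
    """LPC polynomial [1, a1, ..., ap] via autocorrelation and Levinson-Durbin."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    r[0] += 1e-12                      # keeps the recursion defined for silent frames
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                 # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lsf(a):
    """LSFs (rad, ascending) of A(z); assumes an even LPC order."""
    ext = np.concatenate((a, [0.0]))
    p_poly = ext + ext[::-1]           # symmetric P(z), trivial root at z = -1
    q_poly = ext - ext[::-1]           # antisymmetric Q(z), trivial root at z = +1
    angles = np.angle(np.concatenate((np.roots(p_poly), np.roots(q_poly))))
    eps = 1e-6                         # drops the trivial roots and negative-frequency twins
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])


def fingerprint(signal, frame_len=1024, hop=512, order=10):
    """Hypothetical bit rule: bit (n, m) is set when LSF m rose from frame n to n + 1."""
    window = np.hamming(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    lsfs = np.array([lsf(lpc(signal[s:s + frame_len] * window, order)) for s in starts])
    return np.diff(lsfs, axis=0) > 0


def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits over the overlapping frames."""
    n = min(len(fp_a), len(fp_b))
    return float(np.mean(fp_a[:n] != fp_b[:n]))


rng = np.random.default_rng(0)
stream = rng.standard_normal(48000)    # stand-in for one second of a live audio stream
fp = fingerprint(stream)
```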

P22 - Sound Reinforcement

Sunday, November 1, 2:00 pm — 4:00 pm (Room 1A08)

Chair:
Peter Mapp, Peter Mapp Associates - Colchester, Essex, UK

P22-1 From Studio to Stage—Guillaume Le Hénaff, Conservatory of Paris - Paris, France
Converting studio-produced music into a live concert is a key issue for many artists. Studio work is often a long-term undertaking during which everything is subject to careful decisions, e.g., instruments, performers, recording venues, and microphones. When songs from a record are performed in concert, all these decisions have to be reviewed or at least questioned. Indeed, studio and stage are two very different production contexts, and they differ on so many points that artists often change their arrangements, line-up, or even the form of their songs. However, live sound engineers may be expected to reproduce the sound quality and aesthetics of the record. In this paper we propose solutions for the switchover from studio to stage that give artists and engineers useful tools when designing the sound of a live show inspired by a studio album. Specifically, we explain why and how performing music in concert differs from performing in the studio, we detail types of microphones suited to both recording and sound reinforcement applications, and we take an inventory of miking tricks and mixing techniques, such as Virtual Soundcheck, that bring a studio workflow to Front of House engineers.
Convention Paper 9476 (Purchase now)

P22-2 Some Effects of Speech Signal Characteristics on PA System Performance and Design—Peter Mapp, Peter Mapp Associates - Colchester, Essex, UK
Although the characteristics of speech signals have been studied extensively for more than 90 years, going back to the pioneering work of Harvey Fletcher and Bell Labs, the characteristics of speech are not as well understood by the PA and sound reinforcement industries as they perhaps should be. Significant differences occur both within the literature and between international standards concerning such basic parameters as speech spectra and level. The paper reviews the primary characteristics of speech relevant to sound system design and shows how differences within the data, or its misapplication, can impair system performance and potentially reduce intelligibility. The implications for compliance with various national and international life safety standards are discussed.
Convention Paper 9477 (Purchase now)

P22-3 Directivity-Customizable Loudspeaker Arrays Using Constant-Beamwidth Transducer (CBT) Overlapped Shading—Xuelei Feng, Nanjing University - Nanjing, Jiangsu, China; Yong Shen, Nanjing University - Nanjing, Jiangsu Province, China; D.B. (Don) Keele, Jr., DBK Associates and Labs - Bloomington, IN, USA; Jie Xia, Nanjing University - Nanjing, China
In this work a multiple constant-beamwidth transducer (Multi-CBT) loudspeaker array is proposed, constructed by applying multiple overlapping CBT Legendre shadings to a circular-arc or straight-line, delay-curved multi-acoustic-source array. Because it has been proved theoretically and experimentally that the CBT array provides constant broadband directivity with nearly no side lobes, the Multi-CBT array can provide a directivity-customizable sound field with frequency-independent element weights by sampling and reconstructing the targeted directivity pattern. Various circularly curved Multi-CBT arrays and straight-line, delay-curved Multi-CBT arrays are analyzed in several application examples based on providing constant sound pressure level (SPL) on a seating plane, and their performance is verified. The power of the method lies in the fact that only a few easily adjustable, real-valued element weights completely control the shape of the polar pattern, which makes matching the polar shape to a specific seating plane very easy. The results indicate that the desired directivity patterns can indeed be achieved.
Convention Paper 9478 (Purchase now)
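To show how a few real-valued, frequency-independent weights can shape a Multi-CBT array, the Python sketch below uses the polynomial approximation of Legendre shading commonly quoted from Keele's CBT work, U(x) = 1 + 0.066x - 1.8x² + 0.743x³ for |x| ≤ 1 and 0 beyond, and simply sums scaled copies of it centered on different arc angles. The overlap rule, the arc geometry, and the gain values are illustrative assumptions, not the authors' design procedure.

```python
import numpy as np


def legendre_shading(x):
    """Polynomial approximation of CBT Legendre shading; x = angle / half-angle."""
    x = np.abs(np.asarray(x, dtype=float))
    u = 1.0 + 0.066 * x - 1.8 * x**2 + 0.743 * x**3
    return np.where(x <= 1.0, u, 0.0)


def multi_cbt_weights(element_angles_deg, sub_shadings):
    """Hypothetical overlapped shading: one scaled CBT shading per (centre, half-angle, gain)."""
    theta = np.asarray(element_angles_deg, dtype=float)
    weights = np.zeros_like(theta)
    for centre, half_angle, gain in sub_shadings:
        weights += gain * legendre_shading((theta - centre) / half_angle)
    return weights


# 41 elements spread over a 60 degree circular arc (assumed geometry).
angles = np.linspace(-30.0, 30.0, 41)
# Three overlapping sub-shadings; the gains are the few real-valued "knobs" (illustrative values).
w = multi_cbt_weights(angles, [(-15.0, 15.0, 1.0), (0.0, 15.0, 0.7), (15.0, 15.0, 0.5)])
```

Changing only the three gains reshapes the element weights, and with them the broadband polar pattern, without any frequency-dependent filtering.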

P22-4 A Novel Approach to Large-Scale Sound Reinforcement Systems—Mario Di Cola, Audio Labs Systems - Casoli (CH), Italy; Alessandro Tatini, K-Array S.r.l. - Florence, Italy
An innovative approach to vertical array technology in large-scale sound reinforcement is presented. The innovation consists of the mechanical arrangement of the array as well as DSP processing for computer-assisted coverage optimization. Beyond these, the approach also involves a different form factor for the vertical array elements, the unusual acoustic principle of the dipole, and an alternative mechanical aiming method. The paper presents a synthesis of this concept, supported by detailed descriptions, test measurement results, and results from real-world applications.
Convention Paper 9479 (Purchase now)

P23 - Cinema Sound

Sunday, November 1, 2:00 pm — 4:00 pm (Room 1A07)

Chair:
Scott Levine, Skywalker Sound

P23-1 Some Observations on Vinegar Syndrome—Scott Dorsey, Kludge Audio - Williamsburg, VA, USA
In the 1980s it became evident that cellulose triacetate, used as a base for motion picture film and recording tape, was unstable and occasionally suffered from deacetylation, with no clear pattern as to which material would be stable and which would not. The author recaps existing research on the nature and basic chemistry of vinegar syndrome, adds some observations of his own on various triggers, and describes an unsuccessful attempt to replasticize damaged film and tape so as to make it playable.
Convention Paper 9480 (Purchase now)

P23-2 Hybrid Channel-Object Approach for Cinema Post-Production Using Particle Systems—Nuno Fonseca, IT/ESTG, Polytechnic Institute of Leiria - Leiria, Portugal
Particle systems are a new sound design approach that is receiving attention from the cinema community due to their ability to handle thousands of sound sources simultaneously. Unfortunately, current immersive object-based audio formats are not designed for such scale, forcing sound designers to fall back on traditional channel-based approaches. This paper presents a hybrid approach that tries to combine the advantages of object-based and channel-based audio. By using audio objects with static positions, a high number of virtual “speaker channels” can be created, adding more spatial resolution than traditional channel-based formats and allowing thousands of sounds to be mixed. The proposed method can be used not only with particle-system software but also in object-based audio downmixing processes or in demanding audio post-production workflows.
Convention Paper 9481 (Purchase now)

P23-3 Measurement of Low Frequencies in Rooms—David Murphy, Krix Loudspeakers - Hackham, South Australia
When aligning and tuning sound systems in commercial cinemas and other rooms there are difficulties with setting an appropriate response for low frequencies and the LFE channel. Long standing practice has been to use pink noise and Real Time Analyzers, but this method is “time blind” and includes the reverberation of the room. A new method is outlined for measuring frequency response at low frequencies. This method uses microphones arranged in a low frequency end-fire array to create useful directivity to discriminate against sound waves from rear wall reflections and reverberation. It also operates in the time domain, processing the acoustic impulse response as it arrives at successive microphones–a shotgun microphone writ large.
Convention Paper 9482 (Purchase now)
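The directional principle behind the end-fire measurement array can be illustrated with a plain delay-and-sum model in Python: each microphone is delayed so that a wave travelling from the loudspeaker down the array axis adds coherently, while a wave returning from the rear wall is misaligned. The microphone count, 1 m spacing, and speed of sound are assumptions, and this frequency-domain view stands in for the paper's time-domain processing of impulse responses, which may differ in detail.

```python
import numpy as np

C = 343.0  # assumed speed of sound, m/s


def endfire_gain(freq, n_mics, spacing, from_front=True):
    """|response| of a delay-and-sum end-fire line to an on-axis plane wave (unit-gain mics)."""
    omega = 2.0 * np.pi * freq
    m = np.arange(n_mics)                         # mic 0 is nearest the loudspeaker under test
    steering = (n_mics - 1 - m) * spacing / C     # delays that line up a wave from the front
    if from_front:
        arrival = m * spacing / C                 # direct sound reaches mic 0 first
    else:
        arrival = (n_mics - 1 - m) * spacing / C  # rear-wall reflection reaches the last mic first
    return abs(np.sum(np.exp(-1j * omega * (arrival + steering))))


for f in (25.0, 50.0, 100.0):
    front = endfire_gain(f, 8, 1.0)
    rear = endfire_gain(f, 8, 1.0, from_front=False)
    print(f"{f:5.0f} Hz: front/rear = {20 * np.log10(front / rear):5.1f} dB")
```

At very low frequencies the rear-arrival phases converge and the rejection vanishes, so the array length relative to the wavelength sets how far down the method can discriminate against reflections.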

P23-4 Subjective Listening Tests for Preferred Room Response in Cinemas-Part 1: System and Test Descriptions—Linda A. Gedemer, University of Salford - Salford, UK; Harman International - Northridge, CA, USA
SMPTE and ISO have specified near identical in-room target response curves for cinemas and dubbing stages. However, to this author's knowledge, to date these standards have never been scientifically tested and validated with modern technology and measurement techniques. For this reason, it is still not known if the current SMPTE and ISO in-room target response curves are optimal or if better solutions exist. This paper describes the Binaural Room Scanning system and listening test methodologies for simulating a cinema sound reproduction system through headphones for the purpose of conducting controlled listening experiments. The method uses a binaural mannequin equipped with a computer-controlled rotating head to accurately capture binaural impulse responses of the sound system and the listening space which are then reproduced via calibrated headphones equipped with a head-tracker. In this way, controlled listening evaluations can be made among different cinema audio systems tuned to different in-room target responses. Two different types of listening tests were developed and are described.
Convention Paper 9483 (Purchase now)


EXHIBITION HOURS October 30th 10am - 6pm October 31st 10am - 6pm November 1st 10am - 4pm

REGISTRATION DESK October 28th 3pm - 7pm October 29th 8am - 6pm October 30th 8am - 6pm October 31st 8am - 6pm November 1st 8am - 4pm

TECHNICAL PROGRAM October 29th 9am - 7pm October 30th 9am - 7pm October 31st 9am - 7pm November 1st 9am - 6pm

Audio Engineering Society
