AES Los Angeles 2016
Paper Session Details

P1 - Transducers—Part 1

Thursday, September 29, 9:00 am — 10:30 am (Rm 403A)

Chair:
Alexander Voishvillo, JBL/Harman Professional Solutions - Northridge, CA, USA

P1-1 Holographic Nearfield Measurement of Loudspeaker Directivity—Wolfgang Klippel, Klippel GmbH - Dresden, Germany; Christian Bellmann, Klippel GmbH - Dresden, Germany
The acoustical output of loudspeaker systems is usually measured in the far field under anechoic conditions, which requires a large measurement distance and special treatment of the room (absorbing room boundaries, air conditioning). Measuring directivity characteristics at sufficient angular resolution is also very time consuming. The measurement in the near field of the sound source provides significant benefits (dominant direct sound, higher SNR, less climate impact) but requires a scanning process and holographic processing of the measured data. This paper describes the theoretical basis of the new measurement technique and the practical consequences for loudspeaker diagnostics.
Convention Paper 9598

P1-2 Fully Coupled Time Domain Simulation of Loudspeaker Transducer Motors—Andri Bezzola, Samsung Research America - Valencia, CA, USA; Pascal Brunet, Samsung Research America - Valencia, CA, USA; Audio Group - Digital Media Solutions
We present a novel time-dependent simulation method to calculate the response of a loudspeaker motor. The model allows for the simulation of complex signals and predicts the large-signal behavior including motor nonlinearities using only the motor geometry and material parameters without the need to measure physical samples. The transient large-signal simulation is made possible by the implementation of a moving-mesh algorithm for the displacement of the voice coil. Two motor geometries are simulated with different input signals, ranging from simple sine to complex random signals. The method provides previously unavailable insight into effects of flux modulation. The results are validated against a lumped parameter model and experimental measurements. The presented method can be used to compare different motor geometries before the prototyping stage, which is a useful tool for loudspeaker transducer engineers.
Convention Paper 9599

P1-3 Necessary Delay and Non-Causal Identification for Online Loudspeaker Modelling Considering Voice Coil Inductance—Rong Hu, Cirrus Logic - Austin, TX, USA; Jie Su, Cirrus Logic - Austin, TX, USA
The authors revisit the topic of online electrical system identification with adaptive filters for dynamic loudspeakers and investigate the causality of the loudspeaker plant to be identified. The effects of the non-causal portion of the plant are analyzed, and simulated results establish the link between the non-causality in the impulse response and the voice coil inductance. Introducing a necessary delay to the desired signal is proposed to enable the characterization of such non-causality. The proposed architecture with a small delay extends the working bandwidth of online loudspeaker system identification and improves on the accuracy of existing adaptive identification schemes without delays, which are traditionally restricted to low frequency bands.
Convention Paper 9600
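
The role of the delay described in the abstract above can be illustrated with a minimal sketch. This is not the authors' architecture: the plant `h`, its two leading (non-causal) taps, the NLMS step size, and all function names are assumptions chosen for illustration. A causal FIR adaptive filter cannot represent plant taps at negative lags, but delaying the desired signal by the same number of samples makes the whole response identifiable.

```python
import numpy as np

def nlms_identify(x, d, num_taps, mu=0.5, eps=1e-8):
    """Normalized LMS: adapt a causal FIR filter w so that (w * x)[n] tracks d[n]."""
    w = np.zeros(num_taps)
    err = np.zeros(len(x))
    for n in range(num_taps - 1, len(x)):
        u = x[n - num_taps + 1:n + 1][::-1]   # x[n], x[n-1], ... (newest first)
        e = d[n] - w @ u                      # a-priori error
        w += mu * e * u / (u @ u + eps)       # normalized gradient step
        err[n] = e
    return w, err

def delay(signal, samples):
    """Delay a signal by a whole number of samples, zero-padding the start."""
    return np.concatenate([np.zeros(samples), signal[:len(signal) - samples]])

rng = np.random.default_rng(0)
x = rng.standard_normal(20000)                # white excitation

# Hypothetical toy plant (an assumption, not a loudspeaker model):
# taps at lags -2..+3, so the first two taps act on future input samples.
h = np.array([0.2, -0.5, 1.0, 0.6, -0.3, 0.1])
lead = 2
d = np.convolve(x, h)[lead:lead + len(x)]     # d[n] = sum_j h[j] * x[n + lead - j]

# Identification without and with the delay on the desired signal.
w_plain, e_plain = nlms_identify(x, d, num_taps=8)
w_delayed, e_delayed = nlms_identify(x, delay(d, lead), num_taps=8)

mse_plain = np.mean(e_plain[-5000:] ** 2)
mse_delayed = np.mean(e_delayed[-5000:] ** 2)
print(f"steady-state MSE without delay: {mse_plain:.3e}")
print(f"steady-state MSE with delay:    {mse_delayed:.3e}")
```

In this toy setup the delayed run recovers all six taps of `h`, while the undelayed run is left with a residual error from the two taps it cannot reach.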

P2 - Education

Thursday, September 29, 9:00 am — 11:00 am (Rm 409B)

Chair:
Doyuen Ko, Belmont University - Nashville, TN, USA

P2-1 Understanding Project-Based Learning in the Audio Classroom: Using PBL to Facilitate Audio Storytelling—Kyle P. Snyder, Ohio University, School of Media Arts & Studies - Athens, OH, USA
One of the more prevalent buzzwords in education today, project-based learning is a natural fit for the audio engineering classroom. With students that thrive by working toward a common goal or “learning by doing,” this constructivist framework is worth examining as implemented by educators. This paper discusses project-based learning as implemented in an audio engineering classroom to facilitate audio storytelling and provides recommendations for faculty looking to implement project-based learning into their curriculum.
Convention Paper 9601

P2-2 The Graduate Audio Database Project: A Look into Pragmatic Decision-Making in Postgraduate AE Curricular Design—Daniel A. Walzer, University of Massachusetts Lowell - Lowell, MA, USA
This paper reports on the first phase of a comparative project to build a Graduate Audio Database (GAD) of North American colleges and universities (N=66) offering 86 Master’s degrees. Data came from available information drawn from institutional websites, course descriptions, professional and educational organizations, and targeted keyword searches. Each credential received categorization across seven areas. Results indicate that 38% of institutions list the Master of Fine Arts (MFA) as the most common degree offering and 92% of universities emphasize the creative aspects of audio and sound. This paper explores the role of action research to build an exploratory review of graduate-level audio degrees and reflect on how decision-making affects postgraduate curricular mapping.
Convention Paper 9602

P2-3 Equalizing Frequencies: Gender and Audio Technology in American Higher Education—Roseanna Tucker, University of Southern California - Los Angeles, CA, USA
Unequal gender representation pervades audio engineering and production programs in higher education in the United States but has hitherto been the subject of limited discourse. This paper intends to corroborate survey data and observations from audio-technology professors and students with research concerning gender and academic performance in audio-technology and other disciplines displaying similar gender inequities. Research pertaining to female science, technology, engineering, arts, and math (STEAM) majors suggests a number of strategies to assist educators in effecting more inclusive, equitable classroom cultures. The author focused primarily on the dearth of female audio-technology professors, gender as a factor in classroom participation, and extracurricular student culture, and the impact of gendered expectations concerning music and audio-technology during the precollege years.
Convention Paper 9603

P2-4 Withdrawn—N/A

Convention Paper 9604

P3 - Transducers—Part 2

Thursday, September 29, 10:45 am — 12:15 pm (Rm 403A)

Chair:
Christopher Struck, CJS Labs - San Francisco, CA, USA; Acoustical Society of America

P3-1 Power Considerations for Distortion Reduction of Loudspeakers—Ajay Iyer, Harman International - Salt Lake City, UT, USA; Douglas J. Button, Harman International - Northridge, CA, USA; Russell H. Lambert, Harman International - Salt Lake City, UT, USA
Over the last 25 years, scientists and engineers have written extensively about methods to reduce distortion in loudspeakers with Digital Signal Processing (DSP). Despite several proposed solutions, no formal product exists on the market today that employs distortion reduction. In this paper, answers to some fundamental questions about what is required to make substantial improvements in loudspeaker performance are investigated through computer simulations. This research examines the volume level achievable while still maintaining acceptable levels of distortion. Transducer designs that are best suited for this application are studied and identified.
Convention Paper 9605

P3-2 Improving the Sound Balance with Dynamic Control of Membrane Excursion—Mikhail Pahomov, LG Electronics Inc. - St. Petersburg, Russia; Ivan S. Tolokonnikov, LG Electronics Inc. - St. Petersburg, Russia; Victor Rozhnov, LG Electronics Inc. - St. Petersburg, Russia; Mikhail Gusev, LG Electronics Inc. - St. Petersburg, Russia
The electrodynamic transducers used in mobile devices are typically prone to voice coil overheating and excessive excursion of the membrane. This paper focuses on the second aspect. Nonlinear distortion is known to depend on membrane excursion amplitude. High sound pressure at low frequencies also requires the maximum vibration amplitude. But now the sound balance is at stake. Thus, we face the challenge of finding the optimal relation between the sound balance and the level of audible distortion to obtain the maximum subjective quality evaluation.
Convention Paper 9606

P3-3 Force Factor Modulation in Electro Dynamic Loudspeakers—Lars Risbo, Purifi - Hvalsoe, Denmark; Finn T. Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark; Carsten Tinggaard, PointSource Acoustics - Roskilde, Denmark; Morten Halvorsen, PointSource Acoustics - Roskilde, Denmark; Bruno Putzeys, Purifi - Rotselaar, Belgium
The relationship between the non-linear phenomenon of “reluctance force” and the position dependency of the voice coil inductance was established in 1949 by Cunningham, who called it “magnetic attraction force.” This paper revisits Cunningham’s analysis and expands it into a generalized form that includes the frequency dependency and applies to coils with non-inductive (lossy) blocked-impedance. The paper also demonstrates that Cunningham’s force can be explained physically as a modulation of the force factor that again is directly linked to modulation of the flux of the coil. A verification based on both experiments and simulations is presented along with discussions of the impact of force factor modulation for various motor topologies. Finally, it is shown that the popular L2R2 coil impedance model does not correctly predict the force unless the new analysis is applied.
Convention Paper 9607

P4 - Spatial Audio 1: Production

Thursday, September 29, 2:15 pm — 3:45 pm (Rm 409B)

Chair:
Hyunkook Lee, University of Huddersfield - Huddersfield, UK

P4-1 A Three-Dimensional Orchestral Music Recording Technique, Optimized for 22.2 Multichannel Sound—Will Howie, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, Quebec, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Denis Martin, McGill University - Montreal, QC, Canada; CIRMMT - Montreal, QC, Canada
Based on results from previous research, as well as a new series of experimental recordings, a technique for three-dimensional orchestral music recording is introduced. This technique has been optimized for 22.2 Multichannel Sound, a playback format ideal for orchestral music reproduction. A novel component of the recording technique is the use of dedicated microphones for the bottom channels, which vertically extend and anchor the sonic image of the orchestra. Within the context of highly dynamic orchestral music, an ABX listening test confirmed that subjects could successfully differentiate between playback conditions with and without bottom channels.
Convention Paper 9612
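
As a hedged illustration of how an ABX result like the one mentioned above is commonly checked against guessing, the sketch below computes a one-sided binomial p-value. The trial counts (12 correct out of 16) and the function name `abx_p_value` are hypothetical; the abstract does not state the paper's actual data or statistical analysis.

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided binomial p-value: probability of at least `correct`
    right answers in `trials` ABX trials if the listener is guessing (p = 0.5)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Hypothetical example numbers, not taken from the paper.
p = abx_p_value(correct=12, trials=16)
print(f"p = {p:.4f}")
```

A small p-value indicates that the observed score would be unlikely under pure guessing, which is the usual sense in which an ABX test "confirms" audible differentiation.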

P4-2 Subjective Graphical Representation of Microphone Arrays for Vertical Imaging and Three-Dimensional Capture of Acoustic Instruments, Part I—Bryan Martin, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, QC, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Wieslaw Woszczyk, McGill University - Montreal, QC, Canada
This investigation employs a simple graphical method in an effort to represent the perceived spatial attributes of three microphone arrays designed to create vertical and three-dimensional audio images. Three separate arrays were investigated in this study: 1. Coincident, 2. M/S-XYZ, and 3. Non-coincident. Instruments of the orchestral string, woodwind, and brass sections were recorded. Test subjects were asked to represent the spatial attributes of the perceived audio image on a horizontal/vertical grid via a pencil drawing. It can be seen in the subjects’ representations that these techniques clearly capture much more information than a single microphone and exhibit vertical as well as horizontal aspects of the audio image.
Convention Paper 9613

P4-3 Grateful Live: Mixing Multiple Recordings of a Dead Performance into an Immersive Experience—Thomas Wilmering, Queen Mary University of London - London, UK; Centre for Digital Music (C4DM); Florian Thalmann, Queen Mary University of London - London, UK; Mark B. Sandler, Queen Mary University of London - London, UK
Recordings of historical live music performances often exist in several versions, either recorded from the mixing desk, on stage, or by audience members. These recordings highlight different aspects of the performance, but they also typically vary in recording quality, playback speed, and segmentation. We present a system that automatically aligns and clusters live music recordings based on various audio characteristics and editorial metadata. The system creates an immersive virtual space that can be imported into a multichannel web or mobile application allowing listeners to navigate the space using interface controls or mobile device sensors. We evaluate our system with recordings of different lineages from the Live Music Archive’s Grateful Dead collection.
Convention Paper 9614

P5 - Acoustics, Transducers, and Audio

Thursday, September 29, 2:15 pm — 3:45 pm (Rm 403B)

P5-1 Combined Inverse Filtering and Feedback Control for Robust Equalization and Distortion Reduction in Loudspeaker Systems—Yusuke Kadowaki, Kyushu University - Kyushu, Japan; Toshiya Samejima, Kyushu University - Kyushu, Japan
A method for the robust equalization and distortion reduction of loudspeakers is proposed. The proposed method adopts both an IIR-type inverse filter and a feedback control. The feedback control based on model-following control theory is used to force a loudspeaker to move as a linear time-invariant (LTI) system. Accordingly, we expect the inverse filter that is specifically designed for the LTI system to work correctly. Furthermore, nonlinear distortion of a loudspeaker is expected to be reduced. Computer simulation shows that the proposed method achieves more robust equalization of a loudspeaker than inverse filtering alone. In addition, the proposed method simultaneously reduces nonlinear distortion of the loudspeaker.
Convention Paper 9608

P5-2 Investigation of Impulse Response Recording Techniques in Binaural Rendering of Virtual Acoustics—Kaushik Sunder, McGill University - Montreal, Quebec, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, Quebec, Canada; Wieslaw Woszczyk, McGill University - Montreal, QC, Canada
With the advent of virtual reality headsets, accurate rendering of the acoustics of the real space is critical to deliver a truly immersive experience. To ensure the veracity of immersion, there is a need to obtain high quality impulse responses that capture all the relevant acoustical features of the space. In this work we investigate and compare the perception of virtual acoustics rendered over headphones using impulse responses captured with (a) a binaural dummy head and (b) a multichannel (8-channel) microphone array. A downmixing algorithm is developed that converts the free-field 8-channel impulse responses to binaural for rendering over headphones. Subjective experiments suggest a higher quality of immersion with binaural responses reconstructed from multichannel room impulse responses than with the measured binaural room impulse responses. This investigation provides important information for understanding the essential elements in creating a convincing perception of an acoustic space.
Convention Paper 9609

P5-3 New Recording Application for Software Defined Media—Masahiro Ikeda, Yamaha Corporation - Hamamatsu, Shizuoka, Japan; Takuro Sone, Yamaha Corporation - Shizuoka, Japan; Kenta Niwa, NTT Media Intelligence Laboratories - Tokyo, Japan; Shoichiro Saito, NTT Media Intelligence Laboratories - Tokyo, Japan; Manabu Tsukada, University of Tokyo - Tokyo, Japan; Hiroshi Esaki, University of Tokyo - Tokyo, Japan
In recent years, hardware-based systems have been becoming software-based and networked. From IP-based media networks, the notion of Software Defined Media (SDM) has arisen. SDM is an architectural approach to media as a service through virtualization and abstraction of networked infrastructure. With this approach, it would be possible to provide more flexible and versatile systems. To test this concept, a baroque orchestra was recorded by various methods with 82 microphone channels in total. All the data was organized according to the object-based concept, and advanced signal processing based on array signal processing technology was applied to produce content suited to various possible applications. Through this study, the value of the SDM concept is verified.
Convention Paper 9610

P5-4 Interference Evaluation of Parametric Loudspeakers on Digital Hearing Aids—Santi Peksi, Nanyang Technological University - Singapore; Woon-Seng Gan, Nanyang Technological University - Singapore, Singapore; Dong-Yuan Shi, Nanyang Technological University - Singapore; Satya Vijay Reddy Medapati, Tan Tock Seng Hospital - Singapore; Eu-Chin Ho, Tan Tock Seng Hospital - Singapore
Parametric loudspeakers are able to generate highly directional sound, and recently they have also been used to help the hearing impaired hear TV programs better. However, during clinical trials some hearing aid users have reported audible interference in the path of the directional sound beams. The interference varies from buzzing noise to static noise for various commercial behind-the-ear (BTE) hearing aids. To investigate the audible interference, hearing aid output measurements were carried out using B&K Head and Torso Simulators (HATS) inside an anechoic room at various distances for four types of parametric loudspeakers. This paper also investigates the possible cause of the interference and raises awareness among professionals of potential audible interference on hearing aids when using parametric loudspeakers.
Convention Paper 9611

P6 - Transducers—Part 3

Thursday, September 29, 4:00 pm — 6:00 pm (Rm 403A)

Chair:
Mark Gander, JBL Professional/Harman International - Northridge, CA, USA

P6-1 Measurement of the Frequency and Angular Responses of Loudspeaker Systems Using Radiation Modes—Maryna Sanalatii, Laboratoire de Mécanique et d'Acoustique UPR CNRS - Marseille, France; Université du Maine, UMR CNRS - Le Mans cedex 9, France; Philippe Herzog, CNRS-LMA - Marseille, France; Manuel Melon, Université du Maine - Le Mans cedex 9, France; Régine Guillermin, Laboratoire de Mécanique et d'Acoustique UPR CNRS - Marseille, France; Jean-Christophe Le Roux, Centre de Transfert de Technologie du Mans - Le Mans, France; Nicolas Poulain, Centre de Transfert de Technologie du Mans - Le Mans, France
In this paper the “radiation mode” (RM) method is applied to the measurement of the frequency response and directivity pattern of two loudspeaker systems. This approach is based on solving the discretized Helmholtz equation on the source boundaries to obtain an efficient expansion suitable to represent any field radiated by a source. Bookshelf and column systems have been tested. Results obtained with the proposed method are then compared to the ones given by two other methods: measurement in an anechoic room and boundary element computation based on the scanning of the membrane velocity. Results show a good agreement between the different methods. Pros and cons of the different approaches are then discussed as well as the possibility to use the “radiation mode” method in non-anechoic rooms.
Convention Paper 9615

P6-2 Vandermonde Method for Separation of Nonlinear Orders and Measurement of Linear Response—Russell H. Lambert, Harman International - Salt Lake City, UT, USA
The Vandermonde matrix method for separation of nonlinear components and full-power linear response measurement is analyzed in this paper. This technique involves making several measurements of a nonlinear system (a woofer or horn system, for example) at different gains and applying the inverse Vandermonde gain matrix to the vector of outputs. The Vandermonde matrix method does more than just return the linear response, as it more generally separates all of the nonlinear orders, breaking difficult nonlinear system estimation tasks into more tractable problems, one for each Volterra kernel. Quantitative measures for the degree of nonlinear order separation are proposed. The Vandermonde matrix order separation method is analyzed for noise robustness and gain spacing sensitivity and found to be a useful and practical tool for audio measurements.
Convention Paper 9616
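
The order-separation idea in the abstract above can be sketched in a few lines, under stated assumptions. An order-n component of the output scales as g^n when the input is scaled by a gain g, so measurements at several gains form a linear system whose matrix has entries g_k^n. The memoryless cubic `toy_system`, the gain values, and the function names below are hypothetical stand-ins, not the paper's test setup; a real transducer has memory, but the gain scaling per order is the same.

```python
import numpy as np

def toy_system(x):
    """Hypothetical stand-in for a weakly nonlinear device (memoryless cubic)."""
    return 1.0 * x + 0.3 * x**2 - 0.1 * x**3

def separate_orders(system, x, gains):
    """Measure `system` at each gain and solve the Vandermonde-type system
    for the order-1 .. order-K output components at unit gain."""
    gains = np.asarray(gains, dtype=float)
    orders = np.arange(1, len(gains) + 1)
    V = gains[:, None] ** orders[None, :]            # V[k, n-1] = g_k ** n
    Y = np.stack([system(g * x) for g in gains])     # one measurement per gain
    return np.linalg.solve(V, Y)                     # row n-1: order-n component

t = np.arange(1024) / 1024.0
x = np.sin(2 * np.pi * 8 * t)                        # test excitation
components = separate_orders(toy_system, x, gains=[0.5, 1.0, 2.0])

linear_part = components[0]                          # full-level linear response
print(np.max(np.abs(linear_part - 1.0 * x)))         # deviation from the true linear term
```

The matrix here starts at the first power of the gain because the toy system has no constant term. The number of gains sets the highest order that can be separated, and the gain spacing affects the conditioning of `V`, which is consistent with the abstract's mention of gain spacing sensitivity.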

P6-3 Fluid Dynamics Analysis of Ported Loudspeakers—Juha Backman, Genelec Oy - Iisalmi, Finland; Microsoft Mobile - Espoo, Finland
The small-signal performance of ported loudspeakers is described very well by traditional models, such as lumped parameters, waveguide models, or numerical solutions of the acoustic wave equation. However, the acoustic models are clearly insufficient to predict the nonlinear behavior of ported enclosures. This paper presents the results of a computational fluid dynamics analysis of an unlined ported enclosure, focusing on the behavior around the tuning frequency. The results indicate that vortex formation around the port ends has a significant effect already at relatively low flow velocities and that the transient behavior of the vortex field can differ from that predicted by the acoustical solution.
Convention Paper 9617

P6-4 Compression Drivers’ Phasing Plugs—Alexander Voishvillo, JBL/Harman Professional Solutions - Northridge, CA, USA
Most compression drivers have phasing plugs with annular slots. Existing theories give recommendations for positioning the annular slots to suppress air resonances in the compression chamber. However, the interaction of the diaphragm’s mechanical resonances with the compression chamber’s air resonances makes the problem very complex, and a general theoretical solution hardly exists. A new approach, based on certain empirical assumptions, is proposed and explained. The new phasing plugs have slots of a “meandering” shape that provide effective “averaging” of the high-frequency acoustical signal received from different parts of the compression chamber. The method is applicable to drivers having dome, cone, and annular diaphragms. Other aspects of the design, such as efficiency, compression ratio, and the difference between air resonances in dome and annular compression chambers, are discussed.
Convention Paper 9618

P7 - Spatial Audio—Part 2

Thursday, September 29, 4:00 pm — 6:00 pm (Rm 409B)

Chair:
Durand R. Begault, NASA Ames Research Center - Moffett Field, CA, USA; Charles M. Salter Associates - Audio Forensic Center - San Francisco, CA, USA

P7-1 Minimum-Audible Angles in Wave-Field Synthesis: A Case Study—Florian Völk, Technische Universität München - München, Germany; WindAcoustics UG (haftungsbeschränkt) - Windach, Germany
Wave-field synthesis aims at creating a predefined sound field within a restricted listening area. Implementing and maintaining a wave-field-synthesis system is rather costly, as a high number of loudspeakers must be set up meticulously and driven individually. Despite this effort, a physically perfect synthesis is not possible. This contribution addresses a critical and relevant benchmark of synthesis quality: perceptual directional resolution. The study was conducted with a typical living-room-scale system by measuring minimum-audible angles in the horizontal plane with different stimuli. The results indicate that the procedure provides a directional resolution close to that of real sound sources.
Convention Paper 9619

P7-2 Accurate Timbre and Frontal Localization without Head Tracking through Individual Eardrum Equalization of Headphones—David Griesinger, David Griesinger Acoustics - Cambridge, MA, USA
The ear and brain perceive the vertical position of sounds by matching the timbre detected at the eardrum of a listener to timbre patterns built up by that individual over a long period of time. But the eardrum timbre depends dramatically on ear canal resonances between 1000 Hz and 6000 Hz that boost the pressure at the eardrum as much as 20 dB. These resonances are highly individual and are either eliminated or altered by headphones. In-head localization is the result. We have developed an app that uses an equal-loudness procedure to measure and restore the natural timbre. Accurate timbre and frontal localization are then perceived without head-tracking, and binaural recordings can be stunningly realistic.
Convention Paper 9620

P7-3 The Room-in-Room Effect and its Influence on Perceived Room Size in Spatial Audio Reproduction—Richard J. Hughes, University of Salford - Salford, Greater Manchester, UK; Trevor Cox, University of Salford - Salford, UK; Ben Shirley, University of Salford - Salford, Greater Manchester, UK; Paul Power, University of Salford - Salford, Greater Manchester, UK
In spatial audio it can be desirable to give the impression of a target space (e.g., a church). Often the reproduction environment is assumed acoustically dead; in practice most listening spaces (e.g., domestic living rooms) introduce significant reflections. The result is a room-in-room effect: a complex interaction of target and reproduction environments. This study investigates the influence on perceived room size. A number of target spaces were measured and rendered for loudspeaker playback. Reproduction rooms were measured, with variations produced via impulse response adjustment. Dynamic binaural playback allowed different target and reproduction room combinations, with participants judging the size of the environment being reproduced. Results indicate that the more reverberant of the target and reproduction rooms is most commonly heard.
Convention Paper 9621

P7-4 Compressing Higher Order Ambisonics of a Personal Stereo Soundfield—Panji Setiawan, Member IEEE; Wenyu Jin, Member IEEE - Wellington, NZ
In this work we propose an approach to encode the multizone soundfield within the desired region that features a so-called bright zone with stereo sound effects based on higher order ambisonics (HOA) formats. We decompose the B-format signals for the complex multizone soundfield into the coefficients of a formulated planewave expansion. The multizone soundfield B-format signals are then directly compressed using state-of-the-art audio codecs. The results confirm the effectiveness of this HOA-based multizone soundfield encoding. A significant reduction in the compression rate of the desired multizone soundfield with sufficient accuracy can be achieved by quantitatively analyzing the reproduction performance.
Convention Paper 9622

P8 - Transducers—Part 4

Friday, September 30, 9:00 am — 10:30 am (Rm 403A)

Chair:
D.B. (Don) Keele, Jr., DBK Associates and Labs - Bloomington, IN, USA

P8-1 Use of Ground-Plane Constant Beamwidth Transducer (CBT) Loudspeaker Line Arrays for Sound Reinforcement—D.B. (Don) Keele, Jr., DBK Associates and Labs - Bloomington, IN, USA
Ground-plane circular-arc CBT line arrays with wide horizontal coverage offer a very viable, high performance, simple, and thrifty alternative to the usual sound reinforcement setup where loudspeakers are elevated or hung overhead. Due to the broadband constant beamwidth/directivity/coverage characteristics and narrow vertical coverage of the CBT array, the ground-plane version offers a number of strong performance and operational advantages even when the arrays are located on stage behind the performers. Among these are: even coverage, minimal front-back variation in sound level, flat-energy response, less energy directed upwards towards the ceiling, improved intelligibility, lower susceptibility to feedback, and greater performer freedom to move around on stage. In addition, these arrays minimize the use of stage monitors, require minimal installation voicing and on-site equalization adjustments, and result in a much simpler system, i.e., fewer speakers, fewer power amps, and fewer processing channels.
Convention Paper 9623

P8-2 Design of Free-Standing Constant Beamwidth Transducer (CBT) Loudspeaker Line Arrays for Sound Reinforcement—D.B. (Don) Keele, Jr., DBK Associates and Labs - Bloomington, IN, USA
This paper presents design guidelines for choosing the parameters of a free-standing CBT line array including its physical height, circular arc angle, location, and downward pitch angle to appropriately cover a single 2D straight-line audience sound-reinforcement listening region with direct sound. These parameters and conditions include: (1) array circular-arc angle and its associated beamwidth, (2) array height and low-frequency beamwidth control limit, (3) array mounting location that includes its height and setback from the front of the seating plane, and (4) the array’s on-axis aiming location and associated downward pitch angle. These parameters are particularly easy to determine in advance for a CBT line array because of the extreme uniformity of its sound field with both frequency and distance, and its inherent constant-directivity characteristics. This paper describes a design scenario that allows the designer to easily choose these system parameters to optimize the direct-field coverage in the prescribed straight-line seating region while minimizing the use of sound-system design and prediction software. The design technique forces the SPL at the front and rear of the listening region to be equal by aiming the array at the rear of the listening region and then choosing its beamwidth (and its associated off-axis rolloff) to provide this front-rear SPL equality. The SPL and frequency response at intermediate points of the covered region are then set by the inherent well-behaved off-axis rolloff of the CBT array.
Convention Paper 9624
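
The front-rear equalization step described above lends itself to a small geometric sketch. Everything here is an assumption made for illustration rather than the paper's procedure: the function name, the parameter names, the example dimensions, and in particular the inverse-distance (6 dB per doubling) spreading law used to estimate how much off-axis rolloff the front position would need. A CBT array's actual level-versus-distance behavior may differ.

```python
import math

def cbt_front_rear_geometry(height, setback, depth):
    """Aim an array at the rear of a straight-line listening region.

    height  : array reference point above the listening line (m)
    setback : horizontal distance from the array to the front listener (m)
    depth   : front-to-rear length of the listening region (m)
    Returns (downward pitch angle in degrees,
             off-axis angle of the front listener in degrees,
             rolloff in dB needed at that angle for equal front/rear SPL).
    """
    d_front = math.hypot(height, setback)
    d_rear = math.hypot(height, setback + depth)
    # The on-axis direction points at the rear listener.
    pitch_deg = math.degrees(math.atan2(height, setback + depth))
    front_off_axis_deg = math.degrees(math.atan2(height, setback)) - pitch_deg
    # ASSUMPTION: level falls as 1/distance, so the nearer front position is
    # louder by 20*log10(d_rear/d_front) and needs that much off-axis rolloff.
    required_rolloff_db = 20.0 * math.log10(d_rear / d_front)
    return pitch_deg, front_off_axis_deg, required_rolloff_db

# Hypothetical room: array 3 m up, 2 m in front of a 20 m deep seating line.
pitch, off_axis, rolloff = cbt_front_rear_geometry(height=3.0, setback=2.0, depth=20.0)
print(f"pitch {pitch:.1f} deg, front off-axis {off_axis:.1f} deg, rolloff {rolloff:.1f} dB")
```

Under these assumptions, the beamwidth would then be chosen so that the array's off-axis response at the computed front angle is down by roughly the computed rolloff.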

P8-3 Constant Coverage Line Arrays Using Passive Components for Beamforming—Douglas J. Button, Harman International - Northridge, CA, USA
The work herein describes a cost-effective method for beamforming that uses passive components in a transmission line architecture to provide successive amounts of group delay from the middle to the ends of the array. The method also provides amplitude shading and some frequency shading, forming a hybrid of several traditional methods. It also provides a simple and straightforward way to change the shape (width) of the beam through changes in the passive network. The resulting network is very cost-effective and rivals the performance of a multichannel DSP-based beamformer.
Convention Paper 9625

P9 - Semantic Audio & Sonification

Friday, September 30, 9:00 am — 10:30 am (Rm 409B)

Chair:
Agnieszka Roginska, New York University - New York, NY, USA

P9-1 The Audio Definition Model—A Flexible Standardized Representation for Next Generation Audio Content in Broadcasting and Beyond—Simone Füg, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; David Marston, BBC R&D - London, UK; Scott Norcross, Dolby Laboratories - San Francisco, CA, USA
Audio for broadcasting around the world is evolving towards new immersive and personalized audio formats, which require accompanying metadata along with the audio essence. This paper introduces the Audio Definition Model (ADM), as specified in Recommendation ITU-R BS.2076 (and EBU Tech 3364), which is an XML-based generalized metadata model that can be used to define the required metadata. The ADM is able to describe channel-based, object-based, and scene-based audio in a formalized way. The ADM can be incorporated into RIFF/WAV-based files, such as BW64 (Recommendation ITU-R BS.2088), and can therefore be deployed in RIFF/WAV-based applications handling immersive content while maintaining compatibility with legacy content. This allows for program production and exchange of audio programs in these new audio formats.
Convention Paper 9626 (Purchase now)

P9-2 Development of Semantic Scales for Music Mastering—Esben Skovenborg, TC Electronic - Risskov, Denmark
Mastering is the last stage in a music production, and entails modifications of the music's spectral and dynamic properties. This paper presents the development of a set of semantic scales to characterize the (change in) perceptual properties of the sound associated with mastering. An experiment was conducted with audio engineers as subjects. Verbal elicitation and refinement procedures resulted in a list of 30 attributes. Next, 70 unmastered music segments were rated on scales corresponding to the attributes. Based on clustering and statistics of the responses, groups of similar and opposite attributes were formed. The outcome was a set of seven bipolar semantic scales. These scales could be used in semantic differentiation, to rate typical alterations of sound caused or desired by mastering.
Convention Paper 9627 (Purchase now)

P9-3 A Hierarchical Sonification Framework Based on Convolutional Neural Network Modeling of Musical Genre—Shijia Geng, University of Miami - Coral Gables, FL, USA; Gang Ren, University of Miami - Coral Gables, FL, USA; Mitsunori Ogihara, University of Miami - Coral Gables, FL, USA
Convolutional neural networks have satisfactory discriminative performance for various music-related tasks. However, the models are implemented as “black boxes” and thus their processed representations are non-transparent for manual interactions. In this paper, a hierarchical sonification framework with a musical genre modeling module and a sample-level sonification module has been implemented for aural interaction. The modeling module trains a convolutional neural network from musical signal segments with genre labels. Then the sonification module performs sample-level modification according to each convolutional layer, where lower sonification levels produce auralized pulses and higher sonification levels produce audio signals similar to the input musical signal. The usage of the proposed sonification framework is demonstrated using a musical stylistic morphing example.
Convention Paper 9628 (Purchase now)

AVAR Paper Session 1: Sound Localization in 3D Space

Friday, September 30, 9:45 am — 11:15 am (Rm 409A)

Moving Virtual Source Perception in 3D Space—Sam Hughes, University of York - York, UK; Gavin Kearney, University of York - York, UK
This paper investigates the rendering of moving sound sources in the context of real-world loudspeaker arrays and virtual loudspeaker arrays for binaural listening in VR experiences. Near Field compensated Higher Order Ambisonics (HOA) and Vector Base Amplitude Panning (VBAP) are investigated for both spatial accuracy and tonal coloration with moving sound source trajectories. A subjective listening experiment is presented over 6, 26, and 50 channel real and virtual spherical loudspeaker configurations to investigate accuracy of spatial rendering and tonal effects. The results show the applicability of different degrees of VBAP and HOA to moving source rendering and illustrate subjective similarities and differences to real and virtual loudspeaker arrays. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

Disparity in Horizontal Correspondence of Sound and Source Positioning: The Impact on Spatial Presence for Cinematic VR—Angela McArthur, BBC R&D - London, UK; Queen Mary University of London - London, UK
This study examines the extent to which disparity in azimuth location between a sound cue and image target can be varied in cinematic virtual reality (VR) content, before presence is broken. It applies disparity consistently and inconsistently across five otherwise identical sound-image events. The investigation explores spatial presence, a sub-construct of presence, hypothesizing that consistently applied disparity in horizontal audio-visual correspondence elicits higher tolerance before presence is broken, than inconsistently applied disparity. Guidance about the interactions of subjective judgments and spatial presence for sound positioning is needed for non-specialists to leverage VR’s spatial sound environment. Although approximate compared to visual localization, auditory localization is paramount for VR: it is lighting condition-independent, omnidirectional, not as subject to occlusion, and creates presence. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

Lateral Listener Movement on the Horizontal Plane (Part 2): Sensing Motion through Binaural Simulation in a Reverberant Environment—Matthew Boerum, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT); Bryan Martin, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, QC, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; George Massenburg, Schulich School of Music, McGill University - Montreal, Quebec, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, Quebec, Canada
In a multi-part study, first-person horizontal movement between two virtual sound source locations in an auditory virtual environment (AVE) was investigated by evaluating the sensation of motion as perceived by the listener. A binaural cross-fading technique simulated this movement while real binaural recordings of motion were made as a reference using a motion apparatus and mounted head and torso simulator (HATS). Trained listeners evaluated the sensation of motion among real and simulated conditions in two opposite environment-dependent experiments: Part 1 (semi-anechoic), Part 2 (reverberant). Results from Part 2 were proportional to Part 1, despite the presence of reflections. The simulation provided the greatest sensation of motion again, showing that binaural audio recordings present less sensation of motion than the simulation. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

P10 - Cinema Sound & Forensic Audio

Friday, September 30, 10:45 am — 12:15 pm (Rm 409B)

Chair:
Brian McCarty, Coral Sea Studios Pty. Ltd - Clifton Beach, QLD, Australia

P10-1 Wideband Audio Recordings of Gunshots: Waveforms and Repeatability—Rob Maher, Montana State University - Bozeman, MT, USA; Tushar Routh, Montana State University - Bozeman, MT, USA
For the purposes of audio forensics research we have obtained multichannel acoustical recordings of gunshots under controlled conditions for several firearms. The recordings are made using an elevated platform and an elevated spatial array of microphones to provide quasi-anechoic directional recordings of the muzzle blast. The consistency and repeatability of gunshot sounds are relevant to many areas of forensic analysis. This paper includes a description of the recording process and a summary comparison of the acoustical waveforms obtained from ten successive shots fired from the same firearm by an experienced marksman. Practical examples and applications are presented.
Convention Paper 9634 (Purchase now)

P10-2 Integration of CGI Information on Audio Post Production—Nuno Fonseca, ESTG/Polytechnic Institute of Leiria - Leiria, Portugal; Sound Particles - Leiria, Portugal
Although CGI is a common tool in cinema, for both VFX shots and animation features, all that 3D information is disregarded in audio post-production, which usually uses only the final image as reference. This paper presents a workflow that uses CGI information to help audio post-production work. Working on top of “Sound Particles” software, a 3D CGI-like software for audio applications currently used at major Hollywood studios, CGI information is used to automatically control several audio parameters (volume, 3D position, Doppler, etc.), while maintaining full creative freedom.
Convention Paper 9633 (Purchase now)

P10-3 Intelligibility of Cinema & TV Sound Dialogue—Peter Mapp, Peter Mapp Associates - Colchester, Essex, UK
In recent years there has been a significant increase in the number of complaints concerning the dialogue intelligibility of both movie sound tracks and TV productions—both in Europe and the USA. The paper reviews the background to dialogue intelligibility and looks at a number of mechanisms that may be responsible for the growing trend of dissatisfaction. The transmission chain is reviewed and new measurements and data concerning domestic listening conditions are presented. The results of a pilot measurement program show that in-situ frequency response of the TV systems, operated by many domestic listeners, is far from ideal with response variations of 10–15 dB being common. Unique Speech Transmission Index (STI) and Clarity data are presented that suggest that the room acoustic conditions of the listening environment should not, in themselves, significantly degrade the received signal.
Convention Paper 9632 (Purchase now)

P11 - Acoustics

Friday, September 30, 10:45 am — 12:15 pm (Rm 403A)

Chair:
Alejandro Bidondo, Universidad Nacional de Tres de Febrero - UNTREF - Caseros, Buenos Aires, Argentina

P11-1 The Influence of Discrete Arriving Reflections on Perceived Intelligibility and Speech Transmission Index Measurements—Ross Hammond, University of Derby - Derby, Derbyshire, UK; Peter Mapp Associates - Colchester, UK; Peter Mapp, Peter Mapp Associates - Colchester, Essex, UK; Adam J. Hill, University of Derby - Derby, Derbyshire, UK; Gand Concert Sound - Elk Grove Village, IL, USA
The most widely used objective intelligibility measurement method, the Speech Transmission Index (STI), does not completely match the highly complex auditory perception and human hearing system. Investigations were made into the impact of discrete reflections (with varying arrival times and amplitudes) on STI scores, subjective intelligibility, and the subjective “annoyance factor.” This allows the effect of comb filtering on the modulation transfer function matrix to be displayed, and demonstrates how the perceptual effects of a discrete delay cause subjective “annoyance” that is not necessarily mirrored by STI. This work provides evidence showing why STI should not be the sole verification method within public address and emergency announcement systems, where temporal properties also need thoughtful consideration.
Convention Paper 9629 (Purchase now)

P11-2 Spatial Stability of the Frequency Response Estimate and the Benefit of Spatial Averaging—Aki Mäkivirta, Genelec Oy - Iisalmi, Finland; Thomas Lund, Genelec Oy - Iisalmi, Finland
In-room estimates of loudspeaker responses at the listening location are typically taken either at one microphone location, replacing the listener with a microphone, or averaging in space, at multiple microphone locations at and relatively close to the listening location. In-frequency averaging can attenuate the locality of the frequency response features in mid and high frequencies. In-space averaging extracts the common frequency response features visible in all the measurement positions. Spatial weighting combined with frequency domain averaging can increase the stability of the frequency response estimate for the features relevant for the subjective compensation of the sound color at the listening location. Spacing out the spatial average measurement points affects the nature of the spatial average and the focus on the frequency response features common to the measurement points. The spatial averaging points used in taking a measurement should be chosen based on the intention of the room equalization.
Convention Paper 9630 (Purchase now)

P11-3 A New and Simple Method to Define the Time Limit between the Early and Late Sound Fields—Alejandro Bidondo, Universidad Nacional de Tres de Febrero - UNTREF - Caseros, Buenos Aires, Argentina; Javier Vazquez, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina; Sergio Vazquez, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina; Mariano Arouxet, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina; Germán Heinze, Universidad Nacional de Tres de Febrero - Buenos Aires, Argentina
In room acoustics the crossover time is defined as the transition period between a clearly deterministic regime (early sound field) and a stochastic, memoryless one (reverberant tail or late sound field). Several studies presented different calculation methods applied to impulse responses, such as running gaussianity test, running kurtosis, eXtensible Fourier Transform, and Matching Pursuit. No clear or “binary moment” was found analyzing the room’s responses, so another new, simple, and robust method is proposed to determine the limiting instant between both sound fields using the autocorrelation function. Conclusions also include the analysis of several rooms and comments on the progressive change of the room’s system behavior.
Convention Paper 9631 (Purchase now)

AVAR Paper Session 2: Real-World Case Studies—Part 1

Friday, September 30, 11:30 am — 12:30 pm (Rm 409A)

Object-Based 3D Audio Production for Virtual Reality Using the Audio Definition Model—Chris Pike, BBC Research and Development - Salford, Greater Manchester, UK; University of York - Heslington, York, UK; Richard Taylor, BBC Research & Development - Salford, Greater Manchester, UK; Tom Parnell, BBC Research & Development - Salford, UK; Frank Melchior, BBC Research and Development - Salford, UK
This paper presents a case study of the production of a virtual reality experience with object-based 3D audio rendering using professional tools and workflows. An object-based production was created using a common digital audio workstation with real-time dynamic binaural sound rendering and visual monitoring of the scene on a head-mounted display. The Audio Definition Model is a standardized meta-data model for representing audio content including object-based, channel-based, and scene-based 3D audio. Using the Audio Definition Model the object-based audio mix could be exported to a single WAV file. Plug-ins were built for a game engine in which the virtual reality application and the graphics were authored to allow import of the object-based audio mix and custom dynamic binaural rendering.

Virtually Replacing Reality: Sound Design and Implementation for Large Scale Room Scale VR Experiences—Sally-Anne Kellaway, Zero Latency VR - Melbourne, Australia
Audio for Virtual Reality (VR) presents a significant array of challenges and augmentations to the traditional requirements of sound designers employed within the video games industry. The change in perspective and embodiment of the player requires the employment of additional tools and consideration of object size, spacing and spatial design as a more significant part of the sound design process. The author presents her approach to these tasks from the perspective of developing audio for the large-scale Room Scale video game developer Zero Latency. Focusing on the design considerations and processes required in this unique medium, the content of this presentation is designed to give insight into this large-scale version of VR technology. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

P12 - Applications in Audio—Part 1

Friday, September 30, 1:30 pm — 3:00 pm (Rm 403A)

Chair:
Bob Schulein, ImmersAV Technology - Schaumburg, IL, USA

P12-1 Development of Shotgun Microphone with Extra-Long Leaky Acoustic Tube—Yo Sasaki, NHK Science & Technology Research Laboratories - Kinuta, Setagaya-ku, Tokyo, Japan; Toshiyuki Nishiguchi, NHK Science & Technology Research Laboratories - Setagaya, Tokyo, Japan; Kazuho Ono, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Takeshi Ishii, Sanken Microphone Co. Ltd. - Tokyo, Japan; Yutaka Chiba, Sanken Microphone Co. Ltd. - Tokyo, Japan; Akira Morita, Sanken Microphone Co. Ltd. - Tokyo, Japan
A shotgun microphone having sharper directivity than a conventional microphone has been studied to capture distant sound clearly. The directivity of the shotgun microphone is known to become sharper with a longer leaky acoustic tube. Thus, we developed a prototype microphone that uses a 1-m-long leaky acoustic tube, which is longer than the conventional one. We also conducted a numerical simulation based on an acoustical distribution constant circuit to develop such a shotgun microphone. The measurement results of the prototype microphone were in fairly good agreement with the simulation results, and they showed that the directivity is very narrow.
Convention Paper 9639 (Purchase now)

P12-2 Non-Traditional Noise and Distortion Sources in Headphones and Earbuds and Their Impact on System Performance and Ear Fatigue—Dennis Rauschmayer, REVx Technologies/REV33 - Austin, TX, USA
Non-traditional dynamic noises and distortions that couple into earbuds and headphones are measured and analyzed. The effective noise and distortion level relative to the signal level for a number of commercially available headphones and earbuds is reported. The impact of this noise and distortion on listener experience is quantified and discussed. Finally, the impact on listener fatigue is quantified. Results presented show that the non-traditional sources studied are significant and audible, and that they present a fundamental limitation to the performance of many earbuds and headphones. In many cases, these sources reduce the effective signal to noise ratio (SNR) of the audio system into the 30-40 dB range, well below the SNR level that would result from a 0.1% total harmonic distortion (THD) system and even below that of a 1% THD system. In addition to limiting system performance, the non-traditional noise and distortion are found to increase ear fatigue experienced by the user when ear fatigue tests are conducted with identical audio levels. Otoacoustic Emission (OAE) results from tested subjects show degradation that is significantly greater when the impairments are present vs. when they are mitigated.
Convention Paper 9640 (Purchase now)
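
The THD comparison in the P12-2 abstract can be made concrete with a small calculation. The sketch below assumes, for illustration only, that THD is an amplitude ratio and that distortion products are the only impairment, so the equivalent signal-to-distortion ratio is -20·log10(THD); the function name is hypothetical and not taken from the paper.

```python
import math

def thd_percent_to_snr_db(thd_percent):
    """Equivalent signal-to-distortion ratio in dB for a THD given in percent.

    Assumes THD is an amplitude ratio and the only impairment present.
    """
    if thd_percent <= 0:
        raise ValueError("thd_percent must be positive")
    return -20.0 * math.log10(thd_percent / 100.0)

snr_for_0p1_percent = thd_percent_to_snr_db(0.1)  # about 60 dB
snr_for_1_percent = thd_percent_to_snr_db(1.0)    # about 40 dB
```

Under this assumption a 0.1% THD system corresponds to roughly 60 dB and a 1% THD system to roughly 40 dB, so an effective SNR in the 30 to 40 dB range sits at or below both.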

P12-3 In Situ Subjective and Objective Acoustic Seal Performance Tests for Insert Earphones—Bob Schulein, ImmersAV Technology - Schaumburg, IL, USA; Brian Fligor, Lantos Technologies Inc. - Wakefield, MA, USA
Insert-type earphones are unique in that when tightly sealed in an ear canal, they have the potential to deliver sound down to 20 Hz and below. In practice, however, obtaining an extended low frequency response is hampered by the difficulty users have in achieving an adequate seal between the ear canal and the insert earphone [1], [2], [3]. Achieving such a seal is not intuitive to users new to insert earphones. Measurements made on actual earphones show that a leak as small as 0.5 mm in diameter and 2.5 mm long can result in a reduction of bass at 50 Hz of approximately 15 dB. Such a loss results when the leak is present, since the actual volume seen by the transducer at low frequencies is considerably greater than for higher frequencies, where the impedance of the leak becomes much higher in value. The result is a very perceptible and often disappointing reduction in bass performance. This paper describes two methods by which a user can confirm the level of acoustic seal obtained for a given combination of transducer, ear tip / ear mold and ear canal by subjective and objective means. The subjective method is based on an experimental observation that even with a poor seal the output of insert earphones at 500 Hz is quite independent of seal quality. Consequently, by subjectively comparing the output of a recorded tone at 500 Hz to one at 50 Hz adjusted to be equal in perceived level based on ISO 226:2003 equal-loudness contours, one can confirm a good seal when the levels tend to subjectively match. The objective method involves fitting an insert earphone with a miniature flat pressure microphone into the sound port, and observing the “in canal” frequency response by means of spectral analysis, while the earphone is in use. As the seal quality improves, the measured low frequency response approaches that of the coupler response of the earphone as measured with a high quality seal.
Convention Paper 9641 (Purchase now)

P13 - Perception & Forensic Audio

Friday, September 30, 1:30 pm — 3:00 pm (Rm 403B)

P13-1 Determining the Muzzle Blast Duration and Acoustical Energy of Quasi-Anechoic Gunshot Recordings—Tushar Routh, Montana State University - Bozeman, MT, USA; Rob Maher, Montana State University - Bozeman, MT, USA
Investigation of gunshot waveforms largely includes analyzing the muzzle blast. Generated by the combustion of gunpowder immediately after firing, these brief duration directional shock waves travel outward in all directions at the speed of sound. Features of these waveforms are analyzed to identify characteristics of a particular shot, for example, the combination of firearm type, ammunition, and orientation. This paper includes measured muzzle blast durations for several common firearms and calculation of the total acoustical energy during the muzzle blast period.
Convention Paper 9635 (Purchase now)

P13-2 Analysis and Localization for ENF Signals in the Tokyo Metropolitan Area—Akira Nishimura, Tokyo University of Information Sciences - Chiba-shi, Japan
This paper addresses the first investigation and analysis of electric network frequency (ENF) signals in the Tokyo metropolitan area, Japan. Electric power signals are recorded directly from a clean power line at seven different sites simultaneously for several weeks. Instantaneous frequency measurements based on time-domain analytic signals are performed on bandpass-filtered electric power signals, thereby providing higher temporal resolution compared with the conventional FFT-based method combined with quadratic interpolation for extracting ENFs. Spectro-temporal analysis of the fluctuations of the ENF signals reveals that temporal correlations between the fluctuation energy in the frequency range of 0.4 Hz to 1.0 Hz obtained at different sites are inversely correlated to the geographical distances between the sites. The similarities of the spectro-temporal ENFs obtained from different sites show generally higher correlations with the geographical distances than the similarities of high-pass-filtered ENFs. Location estimation using linear regression between the similarities of spectro-temporal ENFs and the geographical distances of the anchor sites predicts the location of a target site with a mean prediction error of approximately 20 to 30 km.
Convention Paper 9636 (Purchase now)
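
The analytic-signal approach to instantaneous frequency mentioned in the P13-2 abstract can be sketched as follows. This is a minimal pure-Python illustration, not the author's implementation: it builds the analytic signal with a naive DFT (adequate only for short sequences), omits the bandpass filtering step, and uses a synthetic 50 Hz tone at an assumed 1 kHz sampling rate. All function names are hypothetical.

```python
import cmath
import math

def analytic_signal(x):
    """Return the complex analytic signal of a real sequence via a naive DFT."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double positive bins, zero negative bins.
    weights = [0.0] * n
    weights[0] = 1.0
    upper = n // 2 if n % 2 == 0 else (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]

def instantaneous_frequency(x, fs):
    """Estimate frequency in Hz from the phase step between adjacent analytic samples."""
    z = analytic_signal(x)
    return [
        cmath.phase(z[i + 1] * z[i].conjugate()) * fs / (2.0 * math.pi)
        for i in range(len(z) - 1)
    ]

# Synthetic test tone (assumed values): 50 Hz sampled at 1 kHz, 10 full cycles.
fs = 1000.0
tone = [math.sin(2.0 * math.pi * 50.0 * t / fs) for t in range(200)]
freqs = instantaneous_frequency(tone, fs)
```

Each output value is a per-sample frequency estimate, which is the source of the finer temporal resolution relative to a frame-based FFT peak search.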

P13-3 Does Environmental Noise Influence Preference of Background-Foreground Audio Balance?—Tim Walton, Newcastle University - Newcastle upon Tyne, UK; BBC Research and Development - Salford, UK; Michael Evans, BBC Research & Development - Salford, Greater Manchester, UK; David Kirk, Newcastle University - Newcastle-Upon-Tyne, UK; Frank Melchior, BBC Research and Development - Salford, UK
With an increase in the consumption of mobile media, audio is being consumed in a range of contexts. The literature describes several techniques to improve the experience of mobile listening by utilizing information about the environmental noise of the listening environment; however, none of the previous work utilizes object-based audio. This paper investigates the possibility of using object-based audio to improve the experience of mobile listening by investigating whether environmental noise influences preference of background-foreground audio balance. A listening test was carried out in which listeners were asked to adjust the background-foreground balance to their preference while in the presence of reproduced environmental noise. It was found that environmental noise can have a significant effect on preferred background-foreground balance.
Convention Paper 9637 (Purchase now)

P13-4 Evaluation of a Perceptually-Based Model of “Punch” with Music Material—Steven Fenton, University of Huddersfield - Huddersfield, West Yorkshire, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK; Jonathan Wakefield, University of Huddersfield - Huddersfield, UK
This paper evaluates a perceptually motivated objective model for the measurement of “punch” in musical signals. Punch is a perceptual attribute that is often used to characterize music that conveys a sense of dynamic power or weight. A methodology is employed that combines signal separation, onset detection, and low level parameter measurement to produce a perceptually weighted “punch” score. The model is evaluated against subjective scores derived through a forced pairwise comparison listening test using a wide variety of musical stimuli. The model output indicates a high degree of correlation with the subjective scores. Correlation results are also compared to other objective models such as Crest Factor, Inter-Band-Ratio (IBR), Peak-to-Loudness Ratio (PLR), and Loudness Dynamic Range (LDR).
Convention Paper 9638 (Purchase now)
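
Of the comparison metrics named in the P13-4 abstract, crest factor is the simplest to state: the ratio of peak amplitude to RMS level, usually expressed in dB. The sketch below is a generic definition with a hypothetical function name, not the paper's "punch" model.

```python
import math

def crest_factor_db(samples):
    """Crest factor in dB: 20*log10(peak absolute value / RMS) of a sample sequence."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("crest factor is undefined for an all-zero signal")
    return 20.0 * math.log10(peak / rms)

# One full cycle of a sine wave and a square wave as reference signals.
sine = [math.sin(2.0 * math.pi * k / 1000.0) for k in range(1000)]
square = [1.0 if k < 500 else -1.0 for k in range(1000)]
sine_cf = crest_factor_db(sine)      # close to 3.01 dB
square_cf = crest_factor_db(square)  # 0 dB
```

Heavily transient material has a higher crest factor than sustained material at the same RMS level, which is why it is a natural baseline for a dynamics-related attribute.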

AVAR Paper Session 3: Real-World Case Studies—Part 2

Friday, September 30, 2:00 pm — 3:00 pm (Rm 409A)

Crafting Cinematic High End VR Audio for Etihad Airways—Ola Björling, MediaMonks - New York, NY, USA; Eric Thorsell, MediaMonks
MediaMonks were approached by Etihad Airways via their ad agency, The Barbarian Group, to create a Virtual Reality experience taking place aboard their Airbus A380, the world's largest and most luxurious non-private airplane. Challenges included capturing audio including dialogue aboard the real plane, crafting an experience that encourages repeated viewing, and combining a sense of truthful realism with a sense of dream-like luxury without relying on a musical score, all in a head tracked spatialized mix. Artistic conventions around non-diegetic sound and their psychological impact in VR also required consideration.

Creating an Immersive 360°-A/V Concert Experience at the 50th Montreux Jazz Festival Using Real-time Room Simulation—Sönke Pelzer, Audioborn GmbH - Cologne, Germany; Dirk Schröder, Audioborn GmbH - Cologne, Germany; Fabian Knauber, Audioborn GmbH
The Montreux Jazz Festival is the second largest jazz festival in the world. Since the beginning 50 years ago, all concerts have been recorded for the Montreux Jazz Archive, a unique treasure and the largest collection of live music, declared UNESCO World Heritage. Following the vision of the deceased founder Claude Nobs, who always pushed the boundaries by applying the latest recording technologies, this year’s 50th anniversary of the festival introduced capturing of 3D-audio and 360° stereoscopic video. Using a virtual reality camera, ambisonics microphones, as well as multitrack audio recording with 3D post-processing, an immersive capture and reproduction was achieved. This contribution highlights challenges, experiences and solutions of the preparation, recording, post-processing and release of this immersive production.

AVAR Paper Session 4: Capture, Rendering, and Mixing for VR—Part 1

Friday, September 30, 3:15 pm — 5:15 pm (Rm 409A)

Efficient, Compelling, and Immersive VR Audio Experience Using Scene Based Audio/Higher Order Ambisonics—Shankar Shivappa, Qualcomm Technologies Inc. - San Diego, CA, USA; Martin Morrell, Qualcomm Technologies Inc. - San Diego, CA, USA; Deep Sen, Qualcomm Technologies Inc. - San Diego, CA, USA; Nils Peters, Qualcomm, Advanced Tech R&D - San Diego, CA, USA; S. M. Akramus Salehin, Qualcomm Technologies Inc. - San Diego, CA, USA
Scene-based audio (SBA), also known as Higher Order Ambisonics (HOA), combines the advantages of object-based and traditional channel-based audio schemes. It is particularly suitable for enabling a truly immersive (360, 180) VR audio experience. SBA signals can be efficiently rotated and binauralized. This makes realistic VR audio practical on consumer devices. SBA also provides conducive mechanisms for acquiring live soundfields for VR. MPEG-H is a newly adopted compression standard that can efficiently compress HOA for transmission and storage purposes. It is the only known standard that provides compressed HOA end-to-end. Our paper describes a practical end-to-end chain for SBA/HOA based VR audio. Given its advantages over other formats, SBA should be “the format of choice” for a compelling VR audio experience. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

Soundfield Navigation using an Array of Higher-Order Ambisonics Microphones—Joseph G. Tylka, Princeton University - Princeton, NJ, USA; Edgar Choueiri, Princeton University - Princeton, NJ, USA
A method is presented for soundfield navigation through estimation of the spherical harmonic coefficients (i.e., the higher-order ambisonics signals) of a soundfield at a position within an array of two or more ambisonics microphones. An existing method based on blind source separation is known to suffer from audible artifacts, while an alternative method, in which a weighted average of the ambisonics signals from each microphone is computed, is shown to necessarily introduce comb-filtering and degrade localization for off-center sources. The proposed method entails computing a regularized least-squares estimate of the soundfield at the listening position using the signals from the nearest microphones, excluding those that are nearer to a source than to the listening position. Simulated frequency responses and predicted localization errors suggest that, for interpolation between a pair of microphones, the proposed method achieves both accurate localization and minimal spectral coloration when the product of angular wavenumber and microphone spacing is less than twice the input expansion order. It is also demonstrated that failure to exclude from the calculation those microphones that are nearer to a source than to the listening position can significantly degrade localization accuracy. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
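
The validity condition stated in the abstract above (the product of angular wavenumber and microphone spacing being less than twice the input expansion order) can be turned into an upper frequency bound. With k = 2πf/c, the condition k·d < 2N gives f < N·c/(π·d). The sketch below evaluates that bound; the function name, the 343 m/s speed of sound, and the example order and spacing are assumptions for illustration.

```python
import math

def max_interpolation_frequency_hz(order, spacing_m, speed_of_sound=343.0):
    """Frequency at which k*d equals 2*order, where k = 2*pi*f/c.

    Below this frequency the condition k*d < 2*order from the abstract holds.
    """
    if order <= 0 or spacing_m <= 0:
        raise ValueError("order and spacing_m must be positive")
    return order * speed_of_sound / (math.pi * spacing_m)

# Hypothetical example: fourth-order microphones spaced 0.5 m apart.
f_limit = max_interpolation_frequency_hz(4, 0.5)
```

The bound scales linearly with expansion order and inversely with spacing, so halving the microphone spacing doubles the usable bandwidth under this criterion.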

Immersive Audio Rendering for Interactive Complex Virtual Architectural Environments—Imran Muhammad, Hanyang University - Seoul, Korea; Jin Yong Jeon, Hanyang University - Seoul, Korea; Acoustics Authorized - Seoul, Korea
In this study we investigate methods for sound propagation in complex virtual architectural environments for spatialized audio rendering in immersive virtual reality (VR) scenarios. During the last few decades, sound propagation models have been designed and investigated for complex building structures using the geometrical approach (GA) and hybrid techniques. Sound propagation requires fast simulation tools that can incorporate a sufficient number of dynamically moving sound sources, room acoustical properties, and reflections and diffraction from interactively changing surface elements in VR environments. Using physically based models, we achieved a reasonable trade-off between sound quality and system performance. Furthermore, we describe how the sound rendering pipeline is integrated into a virtual scene to simulate a virtual environment. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

Immersive Audio for VR—Joel Susal, Dolby Laboratories - San Francisco, CA, USA; Kurt Krauss, Dolby Germany GmbH - Nuremberg, Germany; Nicolas Tsingos, Dolby Labs - San Francisco, CA, USA; Marcus Altman, Dolby Laboratories - San Francisco, CA, USA
Object-based sound creation, packaging, and playback of content are now prevalent in the Cinema and Home Theater, delivering immersive audio experiences. This has paved the way for Virtual Reality sound, where precision of sound is necessary for complete immersion in a virtual world. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

P14 - Applications in Audio—Part 2

Friday, September 30, 3:15 pm — 4:15 pm (Rm 403A)

Chair:
Bryan Martin, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, QC, Canada

P14-1 The Fender 5F6-A Bassman Circuit: A 21st Century Adaptation—Bryan Martin, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, QC, Canada
This investigation involves the design of a guitar amplifier conducive to low-volume environments, such as the home studio. Design goals were increased headroom, a relatively flat frequency response (in the context of classic American and British designs), and exceptional tone at low volume. This led to the adaptation of the Fender 5F6-A circuit. Measurements of the completed unit are provided as well as assessment by guitarists. All design goals were met.
Convention Paper 9649

P14-2 Design of Efficient Sound Systems for Low Voltage Battery Driven Applications—Niels Elkjær Iversen, Technical University of Denmark - Kongens Lyngby, Denmark; Rien Oortgiesen, Merus Audio - Herlev, Denmark; Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark; Michael A. E. Andersen, Technical University of Denmark - Kgs. Lyngby, Denmark; Mikkel Høyerby, Merus Audio - Herlev, Denmark
The efficiency of portable battery-driven sound systems is crucial, as it relates to both the playback time and the cost of the system. This paper presents considerations for designing such systems, including loudspeaker and amplifier design. Using a low-resistance voice coil realized with rectangular wire, one can boost the efficiency of the loudspeaker driver and eliminate the need for an additional power supply. A newly developed switching topology is described that is beneficial to near-idle efficiency (< 2 W), which is crucial for real audio applications in the consumer electronics space. A small sound system was implemented using the discussed design considerations. The amplifier efficiency was found to be very high, with near-idle efficiency reaching a remarkable 88% at 2 W. The average output SPL was estimated to be up to 90 dB in half-spherical anechoic conditions. Measured results are compared with the current state of the art and show an efficiency improvement of 14 percentage points.
Convention Paper 9650

P15 - Perception—Part 1

Friday, September 30, 3:15 pm — 4:45 pm (Rm 409B)

Chair:
Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA

P15-1 In-Vehicle Audio System Distortion Audibility versus Level and Its Impact on Perceived Sound Quality—Steve Temme, Listen, Inc. - Boston, MA, USA; Patrick Dennis, Nissan Technical Center North America - Farmington Hills, MI, USA
As in-vehicle audio system output level increases, so too does audio distortion. At what level is distortion audible and how is sound quality perceived as level increases? Binaural recordings of musical excerpts played through the in-vehicle audio system at various volume levels were made in the driver’s position. These were adjusted to equal loudness and played through a low distortion reference headphone. Listeners ranked both distortion audibility and perceived sound quality. The distortion at each volume level was also measured objectively using a commercial audio test system. The correlation between perceived sound quality and objective distortion measurements is discussed.
Convention Paper 9651

P15-2 Effect of Presentation Method Modifications on Standardized Listening Tests—Julian Villegas, University of Aizu - Aizu Wakamatsu, Fukushima, Japan; Tore Stegenborg-Andersen, DELTA SenseLab - Hørsholm, Denmark; Nick Zacharov, DELTA SenseLab - Hørsholm, Denmark; Jesper Ramsgaard, Widex - Lynge, Denmark
This study investigates the impact of relaxing presentation methods on listening tests by comparing results from two identical listening experiments carried out in two countries and comprising two presentation methods: the ITU-T P.800 Absolute Category Rating (ACR) recommendation and a modified version of it in which assessors had more control over the reproduction of the samples. Compared with the standard method, test duration was reduced by 37% on average in the modified version. No significant effects of the method used on the ratings of codecs were found, but a significant effect of site on ratings and duration was found. We hypothesize that in the latter case, cultural differences and instructions to the assessors could explain these effects.
Convention Paper 9652

P15-3 Can We Hear The Difference? Testing the Audibility of Artifacts in High Bit Rate MP3 Audio—Denis Martin, McGill University - Montreal, QC, Canada; CIRMMT - Montreal, QC, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Wieslaw Woszczyk, McGill University - Montreal, QC, Canada; George Massenburg, Schulich School of Music, McGill University - Montreal, Quebec, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, Quebec, Canada; Martha de Francisco, McGill University - Montreal, QC, Canada; CIRMMT Centre for Interdisciplinary Research in Music Media and Technology - Montreal, QC, Canada
A new type of listening test for very small impairments in audio systems is proposed, using audio engineer participants and an approach based on a mix-matching task. A pilot test was conducted in an attempt to reveal perceptual differences between WAV (44.1 kHz, 16 bit) and MP3 (256 kbps) encodings of the same musical material. The participant mixing data were analyzed, and trends in the data generally coincide with the hypotheses. Several factors were also found that can influence participant accuracy and speed in completing this type of test: age, experience (production, musical, listening test), and preferred genre for audio work.
Convention Paper 9653

P16 - Signal Processing

Friday, September 30, 3:15 pm — 4:45 pm (Rm 403B)

P16-1 Analysis of Binaural Features for Supervised Localization in Reverberant Environments—Jiance Ding, Chinese Academy of Science - Beijing, China; University of Chinese Academy of Sciences - Beijing, China; Jie Wang, Guangzhou University - Guangzhou, China; Chengshi Zheng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Renhua Peng, Chinese Academy of Sciences - Beijing, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China
Recent research on supervised binaural sound source localization methods shows that the performance is promising even in reverberant environments when the training and testing environments match perfectly. However, these supervised methods may still suffer some performance degradation when the intensity of the reverberation increases markedly. This paper studies the impact of reverberation on binaural features theoretically. This study reveals that reverberation is a major factor in reducing the accuracy of supervised binaural localization. Accordingly, we use a binaural dereverberation algorithm to reduce the effect of reverberation and thus to improve the performance of the existing supervised binaural localization methods. Experimental results demonstrate that dereverberation can improve the localization accuracy of these supervised binaural localization methods in reverberant environments.
Convention Paper 9642

P16-2 Acoustic Echo Cancellation for Asynchronous Systems Based on Resampling Adaptive Filter Coefficients—Yang Cui, Chinese Academy of Sciences - Beijing, China; University of Chinese Academy of Sciences - Beijing, China; Jie Wang, Guangzhou University - Guangzhou, China; Chengshi Zheng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China
In asynchronous systems, most traditional acoustic echo cancellation (AEC) algorithms cannot track the echo path correctly because the D/A and A/D converters are not synchronized, which can reduce performance dramatically. Based on multirate digital signal processing theory, this paper proposes to solve this problem by resampling adaptive filter coefficients (RAFC), where the adaptive filter coefficients are updated by the normalized least mean square (NLMS) algorithm with a variable step control method. The simulation results indicate that the proposed method can estimate the clock offset quite accurately. Objective test results also show that the proposed RAFCNLMS is much better than the previous adaptive sampling rate correction algorithm in terms of convergence rate and clock offset tracking performance.
Convention Paper 9643
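For readers unfamiliar with the NLMS update that the abstract above builds on, the following is a minimal sketch of plain NLMS identifying a short echo path. It assumes a synchronous system and a fixed step size, so it does not implement the coefficient resampling or the variable step control of the paper; the function name, the three-tap echo path, and the signal lengths are hypothetical.

```python
import random

def nlms_identify(x, d, num_taps, step=0.5, eps=1e-8):
    """Estimate an FIR echo path from far-end signal x and microphone signal d."""
    w = [0.0] * num_taps
    for n in range(num_taps - 1, len(x)):
        # Most recent far-end samples, newest first.
        u = [x[n - k] for k in range(num_taps)]
        echo_estimate = sum(wk * uk for wk, uk in zip(w, u))
        error = d[n] - echo_estimate
        # Normalizing by the input energy makes the step size scale-independent.
        norm = sum(uk * uk for uk in u) + eps
        w = [wk + step * error * uk / norm for wk, uk in zip(w, u)]
    return w

# Hypothetical noise-free test: white far-end signal through a 3-tap echo path.
random.seed(0)
echo_path = [0.6, -0.3, 0.1]
far_end = [random.gauss(0.0, 1.0) for _ in range(2000)]
mic = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]
estimate = nlms_identify(far_end, mic, num_taps=3)
print([round(c, 4) for c in estimate])
```

In an asynchronous system the microphone signal would be sampled on a slightly different clock than the far-end signal, which is the mismatch this plain update cannot follow.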

P16-3 Single-Channel Speech Enhancement Based on Reassigned Spectrogram—Jie Wang, Guangzhou University - Guangzhou, China; Chengcheng Yang, Guangzhou University - Guangzhou, China; Chunliang Zhang, Guangzhou University - Guangzhou, China; Renhua Peng, Chinese Academy of Sciences - Beijing, China
Most of the traditional a priori SNR estimators, such as the decision-directed approach and its improved versions, only consider the correlation of adjacent frames. However, it is well known that voiced speech is a typical harmonic signal that results in strong correlation of harmonics. We can expect that the a priori SNR estimator can be improved if the correlation of adjacent frames and harmonics can be used simultaneously. With this motivation, we propose to use the reassigned spectrogram (RS) to control the forgetting factor of the decision-directed approach. Experimental results indicate that the proposed RS-based SNR estimator is much better than the traditional decision-directed approach.
Convention Paper 9644

P16-4 The a Priori SNR Estimator Based on Cepstral Processing—Jie Wang, Guangzhou University - Guangzhou, China; Guangquan Yang, Guangzhou University - Guangzhou, China; JingJing Liu, Guangzhou University - Guangzhou, China; Renhua Peng, Chinese Academy of Sciences - Beijing, China
For single-channel speech enhancement systems, the a priori SNR is a key parameter for Wiener-type algorithms. The a priori SNR estimators can reduce the noise efficiently when the noise power spectral density (NPSD) can be estimated accurately. However, when the NPSD is overestimated or underestimated, the a priori SNR estimate may lead to speech distortion and residual noise. To solve this problem, this paper proposes to estimate the a priori SNR based on cepstral processing, which not only can suppress harmonic speech components in the noisy speech segments, but also can reduce strong noise components in noise-only segments. Simulation results show that the proposed algorithm has better performance than the traditional decision-directed (DD) and Plapous’s two-step algorithms.
Convention Paper 9645
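The decision-directed baseline named in the two abstracts above can be sketched for a single frequency bin as follows. This shows the commonly cited form of the estimator with a fixed forgetting factor, followed by a Wiener gain; it is not the reassigned-spectrogram or cepstral method of either paper. The function names, the forgetting factor of 0.98, and the example values are assumptions for illustration.

```python
def decision_directed_snr(noisy_power, prev_clean_power, noise_psd, alpha=0.98):
    """A priori SNR estimate for one frequency bin (decision-directed form)."""
    a_posteriori_snr = noisy_power / noise_psd
    # Weighted mix of the previous frame's clean-speech estimate and the
    # current frame's a posteriori SNR, floored at zero.
    return (alpha * prev_clean_power / noise_psd
            + (1.0 - alpha) * max(a_posteriori_snr - 1.0, 0.0))

def wiener_gain(a_priori_snr):
    """Wiener-type spectral gain derived from the a priori SNR."""
    return a_priori_snr / (1.0 + a_priori_snr)

# Hypothetical values for one bin: noisy power 5, previous clean power 2, noise PSD 1.
xi = decision_directed_snr(noisy_power=5.0, prev_clean_power=2.0, noise_psd=1.0)
gain = wiener_gain(xi)
print(round(xi, 3), round(gain, 3))
```

Because the noise PSD appears in both terms, an overestimated or underestimated noise level shifts the a priori SNR and therefore the gain, which is the sensitivity the second abstract addresses.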

P16-5 Quantitative Analysis of Masking in Multitrack Mixes Using Loudness Loss—Gordon Wichern, iZotope, Inc. - Cambridge, MA, USA; Hannah Robertson, iZotope - Cambridge, MA, USA; Aaron Wishnick, iZotope - Cambridge, MA, USA
The reduction of auditory masking is a crucial objective when mixing multitrack audio and is typically achieved through manipulation of gain, equalization, and/or panning for each stem in a mix. However, some amount of masking is unavoidable, acceptable, or even desirable in certain situations. Current automatic mixing approaches often focus on the reduction of masking in general, rather than focusing on particularly problematic masking. As a first step in focusing the attention of automatic masking reduction algorithms on problematic rather than known and accepted masking, we use psychoacoustic masking models to analyze multitrack mixes produced by experienced audio engineers. We measure masking in terms of loudness loss and present problematic masking as outliers (values above the 95th percentile) in instrument and frequency-dependent distributions.
Convention Paper 9646

P16-6 Log Complex Color for Visual Pattern Recognition of Total Sound—Stephen Wedekind, University of Missouri - St. Louis - St. Louis, MO, USA; P. Fraundorf, University of Missouri - St. Louis - St. Louis, MO, USA
While traditional audio visualization methods depict amplitude intensities vs. time, such as in a time-frequency spectrogram, and while some may use complex phase information to augment the amplitude representation, such as in a reassigned spectrogram, the phase data are not generally represented in their own right. By plotting amplitude intensity as brightness/saturation and phase-cycles as hue-variations, our complex spectrogram method displays both amplitude and phase information simultaneously, making such images canonical visual representations of the source wave. As a result, the original sound may be reconstructed (down to the original phases) from an image, simply by reversing our process. This allows humans to apply our highly-developed visual pattern recognition skills to complete audio data in a new way.
Convention Paper 9647
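The idea of mapping amplitude to brightness and phase to hue can be illustrated for a single complex spectrogram value. The sketch below is one plausible mapping, not the authors' exact scheme: the 60 dB display range, the fixed full saturation, and the function names are assumptions.

```python
import cmath
import colorsys
import math

def complex_to_rgb(z, floor_db=-60.0):
    """Map phase to hue and log amplitude to brightness (illustrative mapping)."""
    hue = (cmath.phase(z) % (2.0 * math.pi)) / (2.0 * math.pi)
    level_db = 20.0 * math.log10(max(abs(z), 1e-12))
    # Scale levels between floor_db and 0 dB onto brightness 0..1, clipping outside.
    value = min(max((level_db - floor_db) / -floor_db, 0.0), 1.0)
    return colorsys.hsv_to_rgb(hue, 1.0, value)

def rgb_to_phase(rgb):
    """Recover the phase in radians (0 to 2*pi) from the hue of a pixel."""
    hue, _, _ = colorsys.rgb_to_hsv(*rgb)
    return hue * 2.0 * math.pi

pixel = complex_to_rgb(1j)  # unit amplitude, phase of pi/2
recovered_phase = rgb_to_phase(pixel)
print(pixel, recovered_phase)
```

In this sketch the phase survives the round trip only for values above the display floor, since a fully black pixel carries no hue.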

P16-7 Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions—Bozena Kostek, Gdansk University of Technology - Gdansk, Poland; Audio Acoustics Lab.; Magdalena Plewa, Gdansk University of Technology - Gdansk, Poland; Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
Automatic speech recognition (ASR) is under constant development, especially for cases in which speech is casually produced, acquired in various environmental conditions, or recorded in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is the main focus of the presented work. ASR is widely implemented in mobile device technology, but the need is also encountered in applications such as automatic recognition of speech in movies for non-native speakers, for impaired users, and as a support for multimedia systems. This work contains an attempt to analyze speech recorded in various conditions. First, audio and video recordings of a specially constructed list of English words were prepared in order to perform dedicated audio and video analyses in future stages of the research, which aims at the development of audio-visual speech recognition (AVSR) systems. A dataset of audio-video recordings was prepared and examples of analyses are described in the paper.
Convention Paper 9648

P17 - Applications in Audio—Part 3

Friday, September 30, 5:00 pm — 6:30 pm (Rm 409B)

Chair:
Joshua D. Reiss, Queen Mary University of London - London, UK

P17-1 Measuring Frequency and Amplitude Modulation Effects in Cross-Modulation Distortion from Audio Amplifiers—Ronald Quan, Ron Quan Designs - Cupertino, CA, USA
In the SMPTE IM (Intermodulation) distortion test using 60 Hz and 7000 Hz signals, it is normally assumed that the IM distortion products form amplitude modulation (AM) sidebands, which are commonly measured with an envelope AM detector. However, with other IM test signals the output of the AM detector is minimized or close to zero, and the distortion instead appears as frequency modulation (FM). This paper investigates testing for phase and frequency modulation from cross-modulation and intermodulation distortions in audio amplifiers. The cross-modulation test signal includes a 3 kHz tone and a high frequency amplitude modulation signal. Alternatively, the test signal may include the 3 kHz signal and two high frequency tones. Amplifiers with feedback are tested.
Convention Paper 9654

P17-2 Finite Element Simulation of Ring Radiators with Acoustic Filter—Lakshmikanth Tipparaju, Samsung Research America - Valencia, CA, USA; Allan Devantier, Samsung Research America - Valencia, CA, USA; Andri Bezzola, Samsung Research America - Valencia, CA USA
Ring radiator loudspeakers consist of a phase plug in front of the diaphragm and are typically used to create omnidirectional sound. Potential resonances between the diaphragm and the phase plug create a design challenge and put additional requirements on the equalizer to obtain a flat amplitude response. We present a finite element model to predict and mitigate the undesirable peaks and dips in the amplitude response of ring radiator loudspeakers. Simulations show that a properly designed acoustic filter can minimize resonant behavior between diaphragm and phase plug. A maximum peak attenuation of 12 dB was obtained using this method. We observe good correlation between simulations and experimental results.
Convention Paper 9655

P17-3 Android Based Mobile Application for Home Audio-Visual Localization to Benefit Alzheimer Patients—“Remember It”—Raul Rincón Flórez, Universidad de San Buenaventura - Bogota, Cundinamarca, Colombia; Christopher Vottela Pérez; Esteban Polania Gutierrez; Luis Felipe Rios Zamudio
This article describes the development of a mobile application to benefit people affected by Alzheimer's disease, implementing visual resources such as RGB LEDs and using Arduino as an intermediary between Android and the LEDs. It also covers the implementation in the program “App Inventor,” the Arduino algorithm programming, and the circuit mapping.
Convention Paper 9656

AVAR Paper Session 5: Streaming Immersive Audio Content

Friday, September 30, 5:45 pm — 6:45 pm (Rm 409A)

Streaming Immersive Audio Content—Johannes Kares, Sennheiser - Vienna, Austria; Veronique Larcher, Sennheiser - Switzerland
“Immersion [...] is a perception of being physically present in a non-physical world.” It is critical to think about immersive audio for live music streaming because giving listeners the illusion of being transported to a different acoustic environment makes the experience of streaming much more real. In this paper we describe various approaches that enable audio engineers to create immersive audio content for live streaming, whether using existing tools and network infrastructure and delivering static binaural audio, or getting ready for emerging tools and workflows for Virtual Reality streaming. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

An Augmented Reality Audio Live Network for Live Electroacoustic Music Concerts—Nikos Moustakas, Ionian University - Nikea, Greece; Andreas Floros, Ionian University - Corfu, Greece; Bill Kapralos, University of Ontario Institute of Technology - Ontario, Canada
Augmented reality audio (ARA) represents a well-established and widely investigated concept that typically relies on mixing the real acoustic environment of a listener with a virtual one. In this work, we conceptually extend this legacy ARA framework, aiming to increase the spatial scale of the synthesized acoustic field, primarily in the real-world domain. Such an increase effectively acts as a virtual wide-angle sound focuser of wide-area acoustic fields. We demonstrate the above Augmented Reality Audio Network (ARAN) concept in terms of a live electroacoustic music concert. A subjective evaluation gave strong indications that the proposed ARAN framework may represent a strong alternative to legacy ARA in the artistic and creative domain. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

P18 - Perception—Part 2

Saturday, October 1, 9:00 am — 10:30 am (Rm 403A)

Chair:
Jason Corey, University of Michigan - Ann Arbor, MI, USA

P18-1 Hyper-Compression, Environmental Noise and Preferences for the Ear Bud Listening Experience—Robert W. Taylor, University of Newcastle - Callaghan, NSW, Australia; Luis Miranda, University of Sydney - Sydney, NSW, Australia
The notion that compressed music performs more effectively in automobiles as a consequence of the background noise present has been widely accepted and is particularly relevant to classical music with a very large dynamic range. The environmental noise can act as a masking agent that can interrupt the listening experience when sections of the music fall below the noise level. Similarly, it is assumed that the hyper-compression of contemporary popular music fulfills a similar function when using ear bud headphones in noisy environments. This study examines this assumption and can find no evidence to support the practice. It is suggested that contemporary music most likely does not have a sufficiently large dynamic range to support its use in this instance.
Convention Paper 9657

P18-2 Validation of a Virtual In-Ear Headphone Listening Test Method—Todd Welti, Harman International Inc. - Northridge, CA, USA; Sean Olive, Harman International - Northridge, CA, USA; Omid Khonsaripour, Harman International - Northridge, CA, USA
Controlled, comparative double-blind listening tests on different in-ear (IE) headphones are logistically impractical. One solution is to present listeners virtualized versions of the headphones through a high quality IE replicator headphone equalized to match their measured frequency responses. To test the accuracy of the method, ten trained listeners evaluated the overall quality of both actual and virtualized versions of twelve different IE headphones binaurally recorded and reproduced through the replicator headphone. The results show evidence that the virtualized headphones produce sound quality ratings that are similar to those produced by the actual headphones.
Convention Paper 9658

P18-3 The Physics of Auditory Proximity and its Effects on Intelligibility and Recall—David Griesinger, David Griesinger Acoustics - Cambridge, MA, USA
Cutthroat evolution has given us seemingly magical abilities to hear speech in complex environments. We can tell instantly, independent of timbre or loudness, if a sound is close to us, and in a crowded room we can switch attention at will between at least three different simultaneous conversations. And we involuntarily switch attention if our name is spoken. These feats are only possible if, without conscious attention, each voice has been separated into an independent neural stream. We believe the separation process relies on the phase relationships between the harmonics above 1000 Hz that encode speech information, and the neurology of the inner ear that has evolved to detect them. When phase is undisturbed, once in each fundamental period harmonic phases align to create massive peaks in the sound pressure at the fundamental frequency. Pitch-sensitive filters can detect and separate these peaks from each other and from noise with amazing acuity. But reflections and sound systems randomize phases, with serious effects on attention, source separation, and intelligibility. This talk will detail the many ways ears and speech have co-evolved, and recent work on the importance of phase in acoustics and sound design.
Convention Paper 9659

AVAR Paper Session 6: Perceptual Consideration for VR/AR

Saturday, October 1, 9:30 am — 11:30 am (Rm 409A)

Spatial Auditory Feedback in Response to Tracked Eye Position—Durand R. Begault, NASA Ames Research Center - Moffett Field, CA, USA; Charles M Salter Associates - Audio Forensic Center - San Francisco, CA USA
Fixation of eye gaze toward one or more specific positions or regions of visual space is a desirable feature within several types of high-stress human interfaces, including vehicular operation, flight deck control, target acquisition, etc. It is therefore desirable to have a means to give spatial auditory feedback to a human in such a system about whether or not the gaze is specifically directed towards a desired position. Alternatively, it is desirable to use eye position as a means of controlling a device that provides auditory feedback so that there is a correspondence between eye position and control voltages that manipulate aspects of an auditory cue that includes spatial position, pitch and/or timbre. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

Perceptual Weighting of Binaural Information: Toward an Auditory Perceptual "Spatial Codec" for Auditory Augmented Reality—G. Christopher Stecker, Vanderbilt University School of Medicine - Nashville, TN, USA; Anna Diedesch, Vanderbilt University School of Medicine - Nashville, TN, USA
Auditory augmented reality (AR) requires accurate estimation of spatial information conveyed in the natural scene, coupled with accurate spatial synthesis of virtual sounds to be integrated within it. Solutions to both problems should consider the capabilities and limitations of the human binaural system, in order to maximize relevant over distracting acoustic information and enhance perceptual integration across AR layers. Recent studies have measured how human listeners integrate spatial information across multiple conflicting cues, revealing patterns of “perceptual weighting” that sample the auditory scene in a robust but spectrotemporally sparse manner. Such patterns can be exploited for binaural analysis and synthesis, much as time-frequency masking patterns are exploited by perceptual audio codecs, to improve efficiency and enhance perceptual integration. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

DeepEarNet: Individualizing Spatial Audio with Photography, Ear Shape Modeling, and Neural Networks—Shoken Kaneko, Yamaha Corporation - Iwata-shi, Japan; Tsukasa Suenaga, Yamaha Corporation - Iwata-shi, Japan; Satoshi Sekine, Yamaha Corporation - Iwata-shi, Japan
Individualizing spatial audio is of crucial importance for high-quality virtual and augmented reality audio. In this paper we propose a method for individualizing spatial audio by combining the recently proposed ear shape modeling technique with computer vision and machine learning. We use a convolutional neural network to obtain estimates of the ear shape model parameters from stereo photographs of the user's ear. The individualized ear shape and its associated individualized head-related transfer function (HRTF) can be calculated from the obtained parameters based on the ear shape model and numerical acoustic simulations. Preliminary experiments, evaluating the shapes of the estimated individual ears, demonstrated the effect of individualization. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

Adjustment of the Direct-to-Reverberant-Energy-Ratio to Reach Externalization within a Binaural Synthesis System—Thomas Sporer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Stephan Werner, Technische Universität Ilmenau - Ilmenau, Germany; Florian Klein, Technische Universität Ilmenau - Ilmenau, Germany
The contribution presents a study that investigates the perception of spatial audio reproduced by a binaural synthesis system. The quality features externalization and room congruence are measured within a listening test. Former studies imply that externalization in particular is decreased if acoustic divergence between the synthesized and listening room exists. Other studies show that the adjustment of the Direct-to-Reverberant-Energy-Ratio (DRR) can increase the perceived congruence between synthesized and listening room. Within this experiment, test persons are able to adjust the DRR of the synthesis until perceptual congruence between the synthesis and the internal reference concerning the listening room occurs. The ratings show that the test persons are able to adjust the DRR to that of the listening room and that externalization therefore increases. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

P19 - Signal Processing—Part 1

Saturday, October 1, 10:45 am — 12:15 pm (Rm 409B)

Chair:
Jean-Marc Jot, DTS, Inc. - Los Gatos, CA, USA

P19-1 Efficient Multichannel Audio Transform Coding with Low Delay and Complexity—Florian Schuh, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Sascha Dick, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Richard Füg, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Christian R. Helmrich, International Audio Laboratories - Erlangen, Germany; Nikolaus Rettelbach, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Tobias Schwegler, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
For multichannel input such as 5.1-surround material, contemporary transform-based perceptual audio codecs provide good coding quality even at very low bit-rates. These codecs rely on discrete joint-channel coding and parametric spatial coding schemes. The latter require dedicated complex-valued filter-banks around the core-codec, which increase both the algorithmic complexity and latency. This paper demonstrates that the discrete joint-channel as well as known semi-parametric spatial coding principles can also be realized directly within the real-valued modified discrete cosine transform (MDCT) domain of the core-coder, thereby eliminating the need for auxiliary filter-banks. The resulting fully flexible signal-adaptive coding scheme, when integrated into the MPEG-H 3D Audio codec, offers the same quality as the state of the art even at bit-rates as low as 80 kbit/s for 5.1-surround.
Convention Paper 9660

P19-2 Intelligent Gap Filling in Perceptual Transform Coding of Audio—Sascha Disch, Fraunhofer IIS, Erlangen - Erlangen, Germany; Andreas Niedermeier, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Christian R. Helmrich, International Audio Laboratories - Erlangen, Germany; Christian Neukam, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Konstantin Schmidt, International Audio Laboratories Erlangen - Erlangen, Germany; Ralf Geiger, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jérémie Lecomte, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Florin Ghido, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Frederik Nagel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; International Audio Laboratories - Erlangen, Germany; Bernd Edler, International Audio Laboratories Erlangen - Erlangen, Germany
Intelligent Gap Filling (IGF) denotes a semi-parametric coding technique within modern codecs like MPEG-H 3D Audio or the 3GPP EVS codec. IGF can be applied to fill spectral holes introduced by the quantization process in the encoder due to low-bitrate constraints. Typically, if the limited bit budget does not allow for transparent coding, spectral holes emerge in the high-frequency (HF) region of the signal first and increasingly affect the entire upper spectral range at the lowest bitrates. At the decoder side, such spectral holes are substituted via IGF using synthetic HF content generated out of low-frequency (LF) content, and post-processing controlled by additional parametric side information. This paper provides an overview of the principles and functionalities of IGF and presents listening test data assessing the perceptual quality of IGF coded audio material.
Convention Paper 9661

P19-3 Sonic Quick Response Codes (SQRC) for Embedding Inaudible Metadata in Sound Files—Mark Sheppard, Anglia Ruskin University - Cambridge, Cambridgeshire, UK; Rob Toulson, Anglia Ruskin University - Cambridge, UK; Mariana Lopez, Anglia Ruskin University - Cambridge, UK
With the advent of high definition recording and playback systems, a proportion of the ultrasonic frequency spectrum can potentially be used as a container for unperceivable data and used to trigger events or to hold metadata in the form of text, ISRC (International Standard Recording Code) or a website URL. The Sonic Quick Response Code (SQRC) algorithm is proposed as a method for embedding inaudible acoustic metadata within a 96 kHz audio file in the 30–35 kHz bandwidth range. Thus any receiver that has sufficient bandwidth and decode software installed can immediately find metadata on the audio being played. SQRC data was mixed at random periods into 96 kHz music audio files and listening subjects were asked to identify if they perceived the introduction of the high frequency content. Results show that none of the subjects in this pilot study could perceive the 30–35 kHz material. As a result, it is shown that it is possible to conduct high-resolution audio testing without significant or perceptible artifacts caused by intermodulation distortion.
Convention Paper 9662 (Purchase now)

P20 - Perception—Part 3

Saturday, October 1, 1:30 pm — 3:00 pm (Rm 403A)

Chair:
Sean Olive, Harman International - Northridge, CA, USA

P20-1 Listener Perceptual Threshold for Image Shift Caused by Channel Delays in Stereo Audio—Elisabeth McMullin, Samsung Research America - Valencia, CA USA
To determine a threshold for listener perception of image shift caused by imperfectly synchronized stereo signals, a series of experiments using method of adjustment and ABX procedures was run over headphones and loudspeakers. Listeners adjusted an endless knob to vary delays between the stereo channels of music programs in search of a centered stereo image. The results demonstrated that 9 out of 10 listeners could reliably detect delays between loudspeaker or headphone channels at levels of 0.06 ms or lower. Furthermore, when centering a stereo image 95% of all listener adjustments were under 0.16 ms for headphones and 0.22 ms for loudspeakers. Many variables that may have affected the experiments are explored, including hearing balance, program material, and listening environments.
Convention Paper 9663 (Purchase now)
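
To put the thresholds reported in P20-1 in perspective, the following minimal sketch converts an inter-channel delay into samples and into an equivalent acoustic path-length difference. The function names and the speed of sound (343 m/s) are assumptions made for illustration; they are not taken from the paper.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at about 20 degrees C


def delay_in_samples(delay_ms, sample_rate_hz):
    """Convert a delay in milliseconds to a (fractional) number of samples."""
    return delay_ms * 1e-3 * sample_rate_hz


def delay_as_path_length_cm(delay_ms):
    """Convert a delay in milliseconds to an equivalent path difference in cm."""
    return delay_ms * 1e-3 * SPEED_OF_SOUND_M_PER_S * 100.0


if __name__ == "__main__":
    for rate in (44100, 48000, 96000):
        print(rate, "Hz:", round(delay_in_samples(0.06, rate), 2), "samples")
    print("path difference:", round(delay_as_path_length_cm(0.06), 2), "cm")
```

At 48 kHz the 0.06 ms detection threshold corresponds to just under three samples, or roughly 2 cm of path-length difference under the assumed speed of sound.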

P20-2 Phantom Image Elevation Explained—Hyunkook Lee, University of Huddersfield - Huddersfield, UK
A subjective experiment was conducted to identify frequency bands that produce the effect of phantom image elevation. Subjects judged the perceived image regions of phantom center images for a broadband pink noise burst and its octave bands with seven different loudspeaker base angles. The 500 Hz and 8 kHz bands were found to be the most effective bands for the perception of above image with the base angle of 180°. The role of acoustic crosstalks for the elevation effect was also examined using binaural stimuli created for the 180° angle. It was found that the elevation effect was significantly reduced when the crosstalks were removed or their delay times were manipulated to 0 ms. Furthermore, the low frequency component of the crosstalk was found to produce greater elevation and externalization effects than the high frequency component.
Convention Paper 9664 (Purchase now)

P20-3 Validation of a Perceptual Distraction Model in a Complex Personal Sound Zone System—Jussi Rämö, Aalborg University - Aalborg, Denmark; Bang & Olufsen - Struer, Denmark; Steven Marsh, University of Surrey - Guildford, Surrey, UK; Søren Bech, Bang & Olufsen a/s - Struer, Denmark; Aalborg University - Aalborg, Denmark; Russell Mason, University of Surrey - Guildford, Surrey, UK; Søren Holdt Jensen, Aalborg University - Aalborg, Denmark
This paper evaluates a previously proposed perceptual model predicting user's perceived distraction caused by interfering audio programs. The distraction model was originally trained using a simple sound reproduction system for music-on-music interference situations and it has not been formally tested using more complex sound systems. A listening experiment was conducted to evaluate the performance of the model, using music target and speech interferer reproduced by a complex personal sound-zone system. The model was found to successfully predict the perceived distraction of a more complex sound reproducing system with different target-interferer pairs than it was originally trained for. Thus, the model can be used as a tool for personal sound-zone evaluation and optimization tasks.
Convention Paper 9665 (Purchase now)

P21 - Signal Processing—Part 2

Saturday, October 1, 1:30 pm — 3:00 pm (Rm 409B)

Chair:
Leslie Gaston-Bird, University of Colorado Denver - Denver, CO, USA

P21-1 Automatic Design of Feedback Delay Network Reverb Parameters for Impulse Response Matching—Jay Coggin, University of Miami - Coral Gables, FL, USA; Will Pirkle, University of Miami - Coral Gables, FL, USA
Traditional reverberation algorithms generally fall into two approaches: physical methods, which involve either convolving with room impulse responses (IRs) or modeling a physical space, and perceptual methods, which allow the use of practically any reverberation modules in various combinations to achieve a perceptually realistic reverberation sound. Perceptual reverberator algorithms are typically “hand tuned” where many of their parameters are found empirically. In this paper we present an automatic method of matching Feedback Delay Network parameters to real room impulse responses so that we may produce computationally efficient reverberation algorithms that perceptually match linear convolution with the target room IRs. Features are extracted from the target room IR and used to guide a Genetic Algorithm search to find the reverberator parameters.
Convention Paper 9666 (Purchase now)
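
As background for P21-1, here is a minimal sketch of a four-line Feedback Delay Network of the general kind whose parameters such a search would tune. The delay lengths, the loop gain, and the choice of a scaled Hadamard feedback matrix are illustrative assumptions; the paper's actual reverberator structure, feature extraction, and Genetic Algorithm are not reproduced here.

```python
def fdn_impulse_response(delays, gain, num_samples):
    """Impulse response of a 4-line FDN with a scaled Hadamard feedback matrix.

    delays: four delay-line lengths in samples (assumed, hand-picked values)
    gain:   loop gain below 1.0 so that the response decays
    """
    hadamard = [[1, 1, 1, 1],
                [1, -1, 1, -1],
                [1, 1, -1, -1],
                [1, -1, -1, 1]]
    # The factor 0.5 makes the 4x4 Hadamard matrix orthogonal.
    matrix = [[gain * 0.5 * v for v in row] for row in hadamard]
    buffers = [[0.0] * d for d in delays]  # one circular buffer per line
    positions = [0] * 4
    out = []
    for n in range(num_samples):
        x = 1.0 if n == 0 else 0.0  # unit impulse input
        # Read the oldest sample of every delay line.
        taps = [buffers[i][positions[i]] for i in range(4)]
        out.append(sum(taps))
        # Mix the line outputs through the feedback matrix and write back.
        for i in range(4):
            feedback = sum(matrix[i][j] * taps[j] for j in range(4))
            buffers[i][positions[i]] = x + feedback
            positions[i] = (positions[i] + 1) % delays[i]
    return out


if __name__ == "__main__":
    ir = fdn_impulse_response([149, 211, 263, 293], gain=0.85, num_samples=4800)
    print("first nonzero sample index:", next(i for i, v in enumerate(ir) if v != 0.0))
```

In a parameter-matching setting, values such as `delays` and `gain` would be the quantities a search procedure adjusts until features of this impulse response approach those of the target room IR.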

P21-2 Perceptually Alias-Free Waveform Generation Using the Bandlimited Step Method and Genetic Algorithm—Francisco Valencia, codigoriginal - Medellin, Colombia; Samarth Behura, University of Miami - Coral Gables, FL, USA; Will Pirkle, University of Miami - Coral Gables, FL, USA
Quasi-Bandlimited waveforms may be synthesized by smoothing the discontinuities of trivial waveforms using the Bandlimited Step Method (BLEP) that produces excellent results with low computational overhead [1]. The correction scheme first starts with a sinc function—the impulse response of a low-pass filter—and uses it to generate offset values that are applied to the points around the discontinuity. Windowing the sinc function prior to use is found to reduce aliasing at the expense of the harmonic envelope, whose shape is no longer ideal. In this paper we explore two methods for generating the initial sinc function in an effort to achieve a perceptually alias-free waveform. Both approaches involve using the Genetic Algorithm as a search method.
Convention Paper 9667 (Purchase now)
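
The correction scheme outlined in P21-2 (a windowed sinc that is integrated to produce offset values around a discontinuity) can be sketched as follows. This is a generic table-based BLEP residual, assuming a Blackman window, four zero crossings per side, and 16x table oversampling; none of these choices come from the paper, and the Genetic Algorithm search is not shown.

```python
import math


def blep_residual(zero_crossings=4, oversample=16):
    """Return a table of offsets (bandlimited step minus ideal step)."""
    half = zero_crossings * oversample
    n_points = 2 * half + 1
    windowed_sinc = []
    for k in range(n_points):
        t = (k - half) / oversample  # time in samples, centered on 0
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        # Blackman window centered on t = 0 (assumed window choice).
        window = (0.42 + 0.5 * math.cos(math.pi * t / zero_crossings)
                  + 0.08 * math.cos(2.0 * math.pi * t / zero_crossings))
        windowed_sinc.append(sinc * window)
    # Integrate the windowed sinc and normalize so the step ends at 1.
    total = sum(windowed_sinc)
    step, acc = [], 0.0
    for v in windowed_sinc:
        acc += v
        step.append(acc / total)
    # Subtract the ideal (trivial) step to obtain the correction offsets.
    return [step[k] - (1.0 if k >= half else 0.0) for k in range(n_points)]


if __name__ == "__main__":
    table = blep_residual()
    print("table length:", len(table))
    print("offset just before / at the discontinuity:",
          round(table[len(table) // 2 - 1], 3), round(table[len(table) // 2], 3))
```

The offsets vanish at both ends of the table and are largest next to the discontinuity, which is why only a few points around each edge of a trivial waveform need correcting.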

P21-3 Preserving Reverberation in a Sinusoidally Modeled Pitch Shifter—Sarah R. Smith, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
Many pitch shifting algorithms suffer when the signal contains reverberation. In general, it is possible to preserve the spectral envelope of the original sound, however, an appropriate phase response can only be estimated for minimum phase systems such as vocal formants. This paper presents a pitch shifting algorithm that preserves the reverberant qualities of the original signal by modifying the instantaneous amplitude and frequency trajectories of a sinusoidal model. For each overtone, the sinusoidal trajectories are decomposed into correlated and uncorrelated components and a deviation spectrum is calculated. To synthesize the modified sound, the uncorrelated components are adjusted to preserve the deviation spectrum. The resulting trajectories and sounds are then compared with those of a standard pitch shifter.
Convention Paper 9668 (Purchase now)

AVAR Paper Session 7: Music for VR/AR Projects

Saturday, October 1, 2:00 pm — 3:30 pm (Rm 409A)

Spatial Music, Virtual Reality, and 360 Media—Enda Bates, Trinity College Dublin - Dublin, Ireland; Francis Boland, Trinity College Dublin - Dublin, Ireland
The following paper documents the composition, recording, and post-production of a number of works of instrumental spatial music for a 360 video and audio presentation. The filming and recording of an orchestral work of spatial music is described with particular reference to the various ambisonic microphones used in the recordings, post production techniques, and the delivery of 360 video with matching 360 audio. The recording and production of a second performance of a newly composed spatial work for an acoustic quartet is also presented and the relationship between spatial music and 360 content is discussed. Finally, an exploration of the creative possibilities of VR in terms of soundscape and acousmatic composition is presented. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

Positioning of Musical Foreground Parts in Surrounding Sound Stages—Christoph Hold, Technische Universität Berlin - Berlin, Germany; Lukas Nagel, Technische Universität Berlin - Berlin, Germany; Hagen Wierstorf, Technische Universität Ilmenau - Ilmenau, Germany; Alexander Raake, Technische Universität Ilmenau - Ilmenau, Germany
Object based audio offers several new possibilities during the sound mixing process. While stereophonic mixing techniques are highly developed, not all of them generate promising results in an object-based audio environment. An outstanding feature is the new approach of positioning sound objects in the musical sound scene, providing the opportunity of stable localization throughout the whole listening area. Previous studies have shown that even if object-based audio reproduction systems can enhance the playback situation, the critical and guiding attributes of the mix are still uncertain. This study investigates the impact of different spatial distributions of sound objects on listener preference, with a special emphasis on the distinction of high attention foreground parts of the presented music track. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

The Soundfield as Sound Object: Virtual Reality Environments as a Three-Dimensional Canvas for Music Composition—Richard Graham, Stevens Institute of Technology - Hoboken, NJ, USA; Seth Cluett, Stevens Institute of Technology - Hoboken, NJ, USA
Our paper presents ideas raised by recent projects exploring the embellishment, augmentation, and extension of environmental cues, spatial mapping, and immersive potential of scalable multichannel audio systems for virtual and augmented reality. Moving beyond issues of reproductive veracity raised by merely recreating the soundscape of the physical world, these works exploit characteristics of the natural world to accomplish creative goals that include the development of models for interactive composition, composing with physical and abstract spatial gestures, and linking sound and image. We are presenting a novel system that allows the user to treat the soundfield as a fundamental building block for spatial music composition and sound design. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

P22 - Signal Processing—Part 3

Saturday, October 1, 3:15 pm — 4:45 pm (Rm 409B)

Chair:
Christoph M. Musialik, Sennheiser Audio Labs - Waldshut-Tiengen, Germany

P22-1 Loudspeaker Crossover Network Optimizer for Multiple Amplitude Response Objectives—William Decanio, Samsung Research America - Valencia, CA, USA; Ritesh Banka, Samsung Research America - Valencia, CA USA
Even though the correlation between multi-spatial amplitude response metrics and listener preferences has been established, most commercial loudspeaker crossover optimization tools operate on only a single axis of loudspeaker system’s amplitude response. This paper describes development of a software based crossover network optimizer that is capable of simultaneously optimizing the on-axis, as well as off-axis acoustic response of a loudspeaker system. Choice of off-axis acoustic metrics as well as a general description of software and implementation of numerical optimization algorithms will be briefly discussed. Several design examples are presented, and measured versus predicted results will be shown.
Convention Paper 9669 (Purchase now)

P22-2 The Relationship between the Bandlimited Step Method (BLEP), Gibbs Phenomenon, and Lanczos Sigma Correction—Akhil Singh, University of Miami - Coral Gables, FL, USA; Will Pirkle, University of Miami - Coral Gables, FL, USA
In virtual analog synthesis of traditional waveforms, several approaches have been developed for smoothing the discontinuities in trivial waveforms to reduce or eliminate aliasing while attempting to preserve both the time and frequency domain responses of the original analog waveforms. The Bandlimited Step Method (BLEP) has been found to produce excellent results with low computational overhead. The correction scheme first starts with a sinc function—the impulse response of a low-pass filter—and uses it to generate offset values that are applied to the points around the discontinuity. This paper discusses the relationships that exist between the BLEP method, Gibbs Phenomenon, and the Lanczos Sigma correction method.
Convention Paper 9670 (Purchase now)

P22-3 Modeling and Adaptive Filtering for Systems with Output Nonlinearity—Erfan Soltanmohammadi, Marvell Semiconductor, Inc. - Santa Clara, CA, USA
Many practical systems are nonlinear in nature, and the Volterra series, also known as nonlinear convolution, is widely used to model these systems. For nonlinear systems with infinite memory, such a modeling approach is usually not feasible because of multiple infinite summations. In practice, the full Volterra series representation of such a system is either approximated by just a few terms, or is otherwise simplified. In an audio system, a useful approximation is to model all memoryless and dynamical nonlinear effects as a combined nonlinearity at its output. In this paper we propose a new Volterra-based structure that accommodates nonlinear systems with output nonlinearity and infinite memory. We then propose an adaptation approach to estimate the Volterra kernels based on the Least Mean Squares (LMS) approach.
Convention Paper 9671 (Purchase now)
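
For readers unfamiliar with the building blocks named in P22-3, the sketch below adapts a truncated second-order Volterra filter with the standard LMS update. It is a textbook finite-memory example, not the output-nonlinearity structure proposed in the paper; the memory length, step size, and the synthetic "unknown" system are assumptions chosen for illustration.

```python
import random


def volterra_features(window):
    """Linear terms followed by all products x[n-i]*x[n-j] with i <= j."""
    feats = list(window)
    m = len(window)
    for i in range(m):
        for j in range(i, m):
            feats.append(window[i] * window[j])
    return feats


def lms_identify(x, d, memory=2, mu=0.1):
    """Estimate second-order Volterra kernel weights with the LMS update."""
    n_feats = memory + memory * (memory + 1) // 2
    w = [0.0] * n_feats
    errors = []
    for n in range(memory - 1, len(x)):
        window = [x[n - k] for k in range(memory)]
        phi = volterra_features(window)
        y = sum(wi * pi for wi, pi in zip(w, phi))   # filter output
        e = d[n] - y                                  # a priori error
        w = [wi + mu * e * pi for wi, pi in zip(w, phi)]
        errors.append(e)
    return w, errors


# Synthetic system to identify (assumed for this example only):
# d[n] = 0.8*x[n] - 0.3*x[n-1] + 0.2*x[n]^2
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
d = [0.0] * len(x)
for n in range(1, len(x)):
    d[n] = 0.8 * x[n] - 0.3 * x[n - 1] + 0.2 * x[n] ** 2

weights, errors = lms_identify(x, d)

if __name__ == "__main__":
    print([round(wi, 3) for wi in weights])
```

The number of weights grows quadratically with the memory length even at second order, which illustrates why a full Volterra representation becomes impractical for systems with long or infinite memory.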

AVAR Paper Session 8: Capture, Rendering, and Mixing for VR—Part 2

Saturday, October 1, 4:00 pm — 5:30 pm (Rm 409A)

XY-Stereo Capture and Up-Conversion for Virtual Reality—Nicolas Tsingos, Dolby Labs - San Francisco, CA, USA; Cong Zhou, Dolby Laboratories - San Francisco, CA, USA; Abhay Nadkarni, Dolby Labs - San Francisco, CA, USA
We propose a perceptually-based approach to creating immersive soundscapes for VR applications. We leverage stereophonic content obtained from XY microphones as a basic building block that can be easily recorded, edited, and combined to provide a more compelling experience than can be obtained from recording at a single location. Central to our approach is a novel up-conversion algorithm that derives a nearly full-spherical parametric soundfield, including height information, from an XY recording. This approach enables a simpler, improved capture, when compared to alternative soundfield recording techniques. It can also take advantage of new object-based delivery formats for flexible delivery and playback. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

Augmented Reality Headphone Environment Rendering—Jean-Marc Jot, DTS, Inc. - Los Gatos, CA, USA; Keun Sup Lee, Apple Inc. - Cupertino, CA, USA
In headphone-based augmented reality audio applications, computer-generated audio-visual objects are rendered over headphones or ear buds and blended into a natural audio environment. This requires binaural artificial reverberation processing to match local environment acoustics, so that synthetic audio objects are not distinguishable from sounds occurring naturally or reproduced over loudspeakers. Solutions involving the measurement or calculation of binaural room impulse responses in a consumer environment are limited by practical obstacles and complexity. We propose an approach exploiting a statistical reverberation model, enabling practical acoustical environment characterization and computationally efficient reflection and reverberation rendering for multiple virtual sound sources. The method applies equally to headphone-based “audio-augmented reality”–enabling natural-sounding, externalized virtual 3-D audio reproduction of music, movie or game soundtracks. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

Capturing and Rendering 360º VR Audio Using Cardioid Microphones—Hyunkook Lee, University of Huddersfield - Huddersfield, UK
This paper proposes a new microphone technique and a binaural rendering approach for 360º VR audio. Four cardioid microphones are arranged in a horizontal square, with 30 cm spacing and 90º subtended angle for each of the four pairs of adjacent microphones, in order to obtain the stereophonic recording angle (SRA) of 90º for a quadraphonic loudspeaker reproduction. The signals are binaurally synthesized with quadraphonic head-related impulse responses. This allows production of the same SRA for each of the four 90º segments whenever the listener rotates his or her head by 90º in a VR environment with a head-tracker, which is confirmed by a listening test. For vertical sound capturing, upward- and optional downward-facing cardioid microphones are added. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.

P23 - Signal Processing—Part 4

Sunday, October 2, 9:00 am — 10:30 am (Rm 409B)

Chair:
Dave Berners, Universal Audio; CCRMA, Stanford University - Stanford, CA, USA

P23-1 Physically Derived Synthesis Model of an Aeolian Tone—Rod Selfridge, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK; Eldad J. Avital, Queen Mary University of London - London, UK; Xiaolong Tang, Queen Mary University of London - London, UK
An Aeolian tone is the whistle-like sound that is generated when air moves past a cylinder or similar object; it is one of the primary aeroacoustic sound sources. A synthesis model of an Aeolian tone has been developed based on empirical formulae derived from fundamental fluid dynamics equations. It avoids time-consuming computations and allows real-time operation and interaction. Evaluation of the synthesis model shows that the frequencies produced are close to those measured in a wind tunnel or simulated through traditional offline computations.
Convention Paper 9679 (Purchase now)

P23-2 Binaural Auditory Steering Strategy: A Cupped Ear Study for Hearing Aid Design—Changxue Ma, GN Resound - Glenview, IL, USA; Andrew B. Dittberner, GN Resound - Glenview, IL, USA; Rob de Vries, GN Resound - Eindhoven, The Netherlands
The binaural auditory steering strategy for hearing aids aims to achieve a better hearing experience in terms of both “better ear” listening and situational awareness. We have taken into account the binaural auditory spatial processing strategy to optimize the acoustic beamforming filters. We investigate in this paper how human beings achieve better listening with cupped ears, based on the head-related transfer functions (HRTFs) of subjects with both open and cupped ears. We define two metrics, the better ear index and the situational awareness index. We show that cupped ears can significantly improve the better ear index, while open ears provide better situational awareness. We can automatically choose one hearing aid with directivity similar to a cupped ear and another hearing aid similar to an open ear to achieve both better intelligibility and situational awareness in certain acoustic environments.
Convention Paper 9680 (Purchase now)

P23-3 On the Effect of Artificial Distortions on Objective Performance Measures for Dialog Enhancement—Matteo Torcoli, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Christian Uhle, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The objective evaluation of dialog enhancement systems using computational methods is desired to complement the subjective evaluation using listening tests. It remains a challenge because for this application neither were performance measures specifically designed, nor were existing measures systematically analyzed. This work investigates eight objective performance measurement tools originally developed for audio and speech coding, speech enhancement, or source separation. To this end, a set of basic distortions is presented and used to simulate degradations that are common in dialog enhancement. The effect of the artificial distortions on the performance measures is quantified by means of a so-called response score that is proposed here.
Convention Paper 9681 (Purchase now)

P24 - Spatial Audio and Recording & Production

Sunday, October 2, 9:00 am — 10:30 am (Rm 403B)

P24-1 The Duplex Panner: A Cross-Platform Stereo Widening Tool—Samuel Nacach, Element Audio Group - New York, USA; New York University, Abu Dhabi - Abu Dhabi, UAE
Binaural processing, Ambiophonics, and the Haas effect are some of the most popular methods of achieving 3D audio, virtual space, and wide mixes. However, for commercial music applications, binaural audio is limited by its HRTF and headphone dependency; Ambiophonics by its cross-talk cancellation and a listening sweet-spot; and the Haas effect by its hard pans. To solve this, this paper examines the recently developed Duplex Panner technique to better understand its cross-platform capabilities for playback in both headphone and loudspeaker systems. A subjective experiment comparing processed and unprocessed versions of the same material, along with an in-depth analysis of the effects on phase and its psychoacoustic implications, will help define the limitations of the Duplex Panner algorithm.
Convention Paper 9672 (Purchase now)

P24-2 Women in Audio: Contributions and Challenges in Music Technology and Production—Marlene Mathew, New York University - New York, NY USA; Jennifer Grossman, New York University - New York, NY USA; Areti Andreopoulou, LIMSI-CNRS - Orsay, France
Even though there is a persistent gender gap, the impact of women in audio continues to grow. The achievements of pioneering women in audio go back to the mid-twentieth century. Their accomplishments in the entertainment and academic sectors have helped pave the way for a record number of women in the next generation of women in audio. This paper presents recent contributions and discusses the representation of women in audio, the gender gap, and the challenges women face in this field. Various options, policies, and initiatives are also proposed with the goal of gender parity. The authors hope to provide a valuable contribution to the research on women in audio, and in particular women's representation in audio engineering, production, and electronic music.
Convention Paper 9673 (Purchase now)

P24-3 An Evaluation of Two Microphone Techniques for Bleed Reduction Using Independent Component Analysis—Mark Rau, Center for Computer Research in Music and Acoustics, Stanford University - Palo Alto, CA, USA; McGill University - Montreal, QC, Canada; Wieslaw Woszczyk, McGill University - Montreal, QC, Canada
Independent component analysis is tested as an approach to reduce the effects of instrument bleed in a multi-track audio recording. Two microphone techniques are examined for a case with two sound sources. The first technique uses each microphone facing a separate source, while the second uses both microphones facing the same source. The separation of the microphones as well as polarity is altered to observe the effects. Both techniques are tested using a simulation as well as with a physical experiment. The first technique is shown to be effective less than 50% of the time with minimal bleed reduction. The second technique is shown to be effective between 68–96% of the time and can have an average bleed reduction of up to 2 dB with instances up to 4.6 dB.
Convention Paper 9674 (Purchase now)

P24-4 Music Production for Dolby Atmos and Auro 3D—Robert Jay Ellis-Geiger, City University of Hong Kong - Hong Kong, SAR China
This paper explores 3D cinematic spatial audio techniques adapted and developed for Dolby Atmos and Auro 3D by the author for the production of music for a feature film. The main objective was to develop a way of recording and mixing music that could translate to both Dolby Atmos and Auro-3D cinema reproduction systems.
Convention Paper 9675 (Purchase now)

P24-5 Smartphone-Based 360° Video Streaming/Viewing System including Acoustic Immersion—Kenta Niwa, NTT Media Intelligence Laboratories - Tokyo, Japan; Daisuke Ochi, NTT Media Intelligence Laboratories; Akio Kameda, NTT Media Intelligence Laboratories; Yutaka Kamamoto, NTT Communication Science Laboratories - Kanagawa, Japan; Takehiro Moriya, NTT Communication Science Labs - Atsugi-shi, Kanagawa-ken, Japan
In virtual reality (VR), 360º video services provided through head mounted displays (HMDs) or smartphones are widely used. Despite the fact that the user’s viewpoint seamlessly changes, sounds through the headphones are fixed even when images change in correspondence with user head motion in many 360º video services. We have been studying acoustic immersion technology that is achieved by, for example, generating binaural sounds corresponding to the user head motion. Basically, our method is composed of angular region-wise source enhancement using array observation signals, multichannel audio encoding based on MPEG-4 Audio Lossless Coding (ALS), and binaural synthesizing of enhanced signals using head related transfer functions (HRTFs). In this paper, we constructed a smartphone-based real-time system for streaming/viewing 360º video including acoustic immersion and evaluated it through subjective tests.
Convention Paper 9676 (Purchase now)

P24-6 Analysis on the Timbre of Horizontal Ambisonics with Different Decoding Methods—Yang Liu, South China University of Technology - Guangzhou, China; Guoguang Electric Company Limited - Guangzhou, China; Bosun Xie, South China University of Technology - Guangzhou, China
Based on different physical and psychoacoustic considerations, there are various Ambisonics decoding methods. The perceived performances of reproduction depend on the order and decoding method of Ambisonics. The present paper analyzes and compares the timbre coloration in horizontal Ambisonics with basic, maximize energy location vector (max-rE) and in-phase decoding methods, respectively. The binaural loudness level spectra (BLLS) for Ambisonics reproduction are calculated by using Moore’s revised binaural loudness model and then used as the criterion to evaluate the timbre coloration. Results indicate that, overall, the basic decoding method is superior to the other two methods in terms of the deviation of BLLS. This conclusion is valid for Ambisonics with various orders as well as in the central and off-central listening position.
Convention Paper 9677 (Purchase now)

P24-7 Distance Factor for Frontal Sound Localization with Side Loudspeakers—Yuzuru Nagayama, University of Aizu - Aizu-Wakamatsu, Japan; Akira Saji, University of Aizu - Aizuwakamatsu City, Japan; Jie Huang, University of Aizu - Aizuwakamatsu City, Japan
In our laboratory, we conducted auditory experiments to examine elevation localization by changing frequency spectra without using HRTFs. As a result, we achieved elevation localization at (azimuth, elevation) = (0, 0), (0, 30), (0, 60) using two loudspeakers arranged just beside the listener. However, this was investigated only with the distance between the ear and the loudspeaker fixed at 70 cm. When the distance was not 70 cm, the elevation perception was unidentified. Therefore, the arrangement of loudspeakers was changed in this research to improve the result. As a result, the rate of in-the-head localization was increased, and the distance of the perceived sound image became shorter and shorter.
Convention Paper 9678 (Purchase now)

P25 - Signal Processing—Part 5

Sunday, October 2, 10:45 am — 12:15 pm (Rm 409B)

Chair:
Scott Norcross, Dolby Laboratories - San Francisco, CA, USA

P25-1 An Efficient Algorithm for Clipping Detection and Declipping Audio—Christopher Laguna, Georgia Institute of Technology - Atlanta, GA, USA; Alexander Lerch, Georgia Institute of Technology - Atlanta, GA, USA
We present an algorithm for end to end declipping, which includes clipping detection and the replacement of clipped samples. To detect regions of clipping, we analyze the signal’s amplitude histogram and the shape of the signal in the time-domain. The sample replacement algorithm uses a two-pass approach: short regions of clipping are replaced in the time-domain and long regions of clipping are replaced in the frequency-domain. The algorithm is robust against different types of clipping and is efficient compared to existing approaches. The algorithm has been implemented in an open source JavaScript client-side web application. Clipping detection is shown to give an f-measure of 0.92 and is robust to the clipping level.
Convention Paper 9682 (Purchase now)
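
The histogram-based detection idea mentioned in P25-1 can be illustrated with a minimal sketch: a clipped signal piles up samples in the outermost amplitude bin. The bin count, the test signal, and the decision to report only the top-bin fraction are assumptions made here; the paper's detector also uses the time-domain shape of the signal, which is not modeled.

```python
import math


def top_bin_fraction(samples, num_bins=100):
    """Fraction of samples whose magnitude falls in the highest histogram bin."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return 0.0
    counts = [0] * num_bins
    for s in samples:
        # Map |s| in [0, peak] to a bin index in [0, num_bins - 1].
        idx = min(int(abs(s) / peak * num_bins), num_bins - 1)
        counts[idx] += 1
    return counts[-1] / len(samples)


def hard_clip(samples, limit):
    """Clamp every sample to the range [-limit, +limit]."""
    return [max(-limit, min(limit, s)) for s in samples]


# Illustrative test signal: a sine whose period is not a whole number of samples.
clean = [math.sin(2.0 * math.pi * 0.01237 * n) for n in range(10000)]
clipped = hard_clip(clean, 0.5)

if __name__ == "__main__":
    print("clean:  ", round(top_bin_fraction(clean), 3))
    print("clipped:", round(top_bin_fraction(clipped), 3))
```

A sine wave already spends a noticeable share of its time near its peaks, so a practical detector would need a threshold or a comparison against neighboring bins rather than the raw fraction alone.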

P25-2 A Two-Pass Algorithm for Automatic Loudness Correction—Alexey Lukin, iZotope, Inc. - Cambridge, MA, USA; Russell McClellan, iZotope, Inc. - Cambridge, MA, USA; Aaron Wishnick, iZotope - Cambridge, MA, USA
Loudness standards for broadcast audio, such as BS.1770, establish target values for the integrated loudness, true peak level, and short-term loudness of a record. Compliance with these three targets can be challenging when the dynamic range of a record is high, so software for automatic loudness correction is important for speeding up the workflow of post-production engineers. This work reviews existing software implementations of automatic loudness correction and proposes a new algorithm that provides efficient simultaneous correction of all three targets.
Convention Paper 9683 (Purchase now)

P25-3 A Low Computational Complexity Beamforming Scheme Concatenated with Noise Cancellation—Jin Xie, Marvell Technology Group Ltd. - Santa Clara, CA, USA; Sungyub Daniel Yoo, Marvell Technology Group Ltd. - Santa Clara, CA, USA; Kapil Jain, Marvell Technology Group Ltd. - Santa Clara, CA, USA
In this paper we present a microphone beamforming algorithm. This algorithm has been implemented in Marvell’s proprietary digital signal processor embedded in Marvell’s audio codec chip. The beamforming algorithm features (1) ease of implementation; (2) sound source localization (SSL) and sound source tracking; and (3) single-in, single-out frequency-domain noise cancellation. Lab tests show that the performance is better than that of the existing reference codec.
Convention Paper 9684 (Purchase now)

P26 - Signal Processing—Part 6

Sunday, October 2, 1:30 pm — 3:00 pm (Rm 409B)

Chair:
Jon Boley, GN Hearing - Chicago, IL, USA

P26-1 Discrete-Time Implementation of Arbitrary Delay-Free Feedback Networks—Dave Berners, Universal Audio; CCRMA, Stanford University - Stanford, CA, USA; Jonathan S. Abel, Stanford University - Stanford, CA, USA
The delay-free feedback loop can be directly implemented in discrete time by separately discretizing the forward and backward transfer functions and simultaneously solving the resulting linear system for the nodes connecting the filters within the loop. The ability to form the solutions rests upon the fact that, at sample n, the output of a discrete-time linear system is a linear function of the input to the system at sample n. This technique allows for relatively simple calculation of coefficients for certain time-varying feedback systems, and allows for inclusion of memoryless nonlinearities inside feedback loops. We show that the technique can be generalized to discretize an arbitrary network of LTI systems arranged in multiple-loop feedback networks. Two examples are presented: one time-varying system and one nonlinear system.
Convention Paper 9685 (Purchase now)
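
The key observation in P26-1, that the output at sample n is a linear function of the input at sample n, can be sketched for a single loop. Each filter is written as y[n] = b0*x[n] + s[n], where s[n] depends only on past samples, so the loop equations can be solved for y[n] directly. The first-order filter class, its state update, and the negative-feedback sign convention below are assumptions for illustration, not the paper's general multi-loop formulation.

```python
class FirstOrder:
    """First-order filter in transposed direct form II.

    y[n] = b0*x[n] + s[n],   s[n+1] = b1*x[n] - a1*y[n]
    """

    def __init__(self, b0, b1, a1):
        self.b0, self.b1, self.a1 = b0, b1, a1
        self.s = 0.0

    def update(self, x, y):
        """Advance the state once this sample's input and output are known."""
        self.s = self.b1 * x - self.a1 * y


def feedback_loop(forward, backward, x_in):
    """Run y = F(x - G(y)) with no extra delay inserted into the loop."""
    out = []
    for x in x_in:
        f0, sf = forward.b0, forward.s
        g0, sg = backward.b0, backward.s
        # Loop equations at sample n:
        #   u = x - v,   y = f0*u + sf,   v = g0*y + sg
        # Solving the two linear equations for y gives:
        y = (f0 * (x - sg) + sf) / (1.0 + f0 * g0)
        v = g0 * y + sg
        u = x - v
        forward.update(u, y)
        backward.update(y, v)
        out.append(y)
    return out


if __name__ == "__main__":
    # Forward path: scaled trapezoidal integrator; feedback path: unity gain.
    step = feedback_loop(FirstOrder(0.5, 0.5, -1.0), FirstOrder(1.0, 0.0, 0.0), [1.0] * 50)
    print([round(v, 4) for v in step[:4]], "...", round(step[-1], 6))
```

Because the coefficients `b0` and the states `s` are read afresh at every sample, the same solve still works if the coefficients change over time, which hints at why this formulation suits time-varying systems.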

P26-2 The Time-Varying Bilinear Transform—Jonathan S. Abel, Stanford University - Stanford, CA, USA; Dave Berners, Universal Audio; CCRMA, Stanford University - Stanford, CA, USA
The discretization of continuous-time systems is considered, and an extension of the bilinear transform to the case of time-varying systems is introduced. Termed the “time-varying bilinear transform,” the transform generates a sequence of digital filter coefficients in response to continuous-time system changes that keeps the digital filter state compatible with the changing digital filter coefficients. Accordingly, transients in the digital filter output that don’t appear in the continuous system output are avoided. For an Nth-order continuous-time system, a step change in the system produces a sequence of N intermediate sets of digital filter coefficients, bracketed by what would be generated by the bilinear transform applied to the initial and final systems. Sequences are tabulated for direct and transpose canonical forms and first-order and second-order systems, and examples of first-order and second-order analog filters with time-varying components are presented.
Convention Paper 9686 (Purchase now)
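
As a reference point for P26-2, the sketch below applies the standard (time-invariant) bilinear transform, s = 2*fs*(1 - z^-1)/(1 + z^-1), to a first-order analog lowpass H(s) = wc/(s + wc). The function name and the example cutoff are assumptions, and no frequency prewarping is applied. The time-varying extension introduced in the paper, with its intermediate coefficient sets, is not reproduced here.

```python
import math


def bilinear_first_order_lowpass(cutoff_hz, sample_rate_hz):
    """Return (b0, b1, a1) for H(z) = (b0 + b1*z^-1) / (1 + a1*z^-1)."""
    wc = 2.0 * math.pi * cutoff_hz   # analog cutoff in rad/s
    k = 2.0 * sample_rate_hz         # bilinear transform constant 2*fs
    b0 = wc / (k + wc)
    b1 = b0
    a1 = (wc - k) / (k + wc)
    return b0, b1, a1


def run_filter(coeffs, x_in):
    """Apply y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1] to a list of samples."""
    b0, b1, a1 = coeffs
    x_prev = y_prev = 0.0
    out = []
    for x in x_in:
        y = b0 * x + b1 * x_prev - a1 * y_prev
        x_prev, y_prev = x, y
        out.append(y)
    return out


if __name__ == "__main__":
    coeffs = bilinear_first_order_lowpass(1000.0, 48000.0)
    print("coefficients:", [round(c, 6) for c in coeffs])
    print("step response after 2000 samples:", round(run_filter(coeffs, [1.0] * 2000)[-1], 6))
```

Recomputing these coefficients whenever the cutoff changes, while keeping the old filter state, is the naive approach; the abstract's point is that the state then no longer matches the new coefficients, which can cause output transients.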

P26-3 Active Equalization for Loudspeaker Protection—Christopher Painter, Marvell Semiconductor, Inc. - Santa Clara, CA, USA; Kapil Jain, Marvell Technology Group Ltd. - Santa Clara, CA, USA
We present a time-varying linear equalization algorithm whose purpose is to protect a loudspeaker from damage under high drive conditions. It is suitable for implementation on a low-cost digital signal processor, often integrated on the same die as a high-performance audio codec. A typical application is in a portable wireless (e.g., Bluetooth) loudspeaker. For a given driver and enclosure design, the algorithm allows the power output of the loudspeaker to be maximized while introducing only minimal coloration or distortion. During the loudspeaker design phase, the parameters of the algorithm can be easily tuned by the designer, further optimizing the overall design for power output, robustness and low distortion.
Convention Paper 9687 (Purchase now)
