145th AES CONVENTION Paper Session Details

AES New York 2018

P01 - Perception – Part 1

Wednesday, October 17, 9:00 am — 12:00 pm (1E11)

Elisabeth McMullin, Samsung Research America - Valencia, CA USA

P01-1 Improved Psychoacoustic Model for Efficient Perceptual Audio Codecs
Sascha Disch, Fraunhofer IIS - Erlangen, Germany; Steven van de Par, Fraunhofer Hör-Sprach- und Audio Technologie - Oldenburg, Germany; Andreas Niedermeier, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Elena Burdiel Pérez, Fraunhofer IIS - Erlangen, Germany; Ane Berasategui Ceberio, Fraunhofer IIS - Erlangen, Germany; Bernd Edler, International Audio Laboratories Erlangen - Erlangen, Germany
Since early perceptual audio coders such as mp3, the underlying psychoacoustic model that controls the encoding process has not undergone many dramatic changes. Meanwhile, modern audio coders have been equipped with semi-parametric or parametric coding tools such as audio bandwidth extension. As a result, the initial psychoacoustic model used in a perceptual coder, which considers only added quantization noise, became partly unsuitable. We propose the use of an improved psychoacoustic excitation model based on an existing model proposed by Dau et al. in 1997. By calculating an internal auditory representation, this modulation-based model is essentially independent of the input waveform. Using the example of MPEG-H 3D Audio and its semi-parametric Intelligent Gap Filling (IGF) tool, we demonstrate that we can successfully control the IGF parameter selection process to achieve overall improved perceptual quality.
Convention Paper 10029

P01-2 On the Influence of Cultural Differences on the Perception of Audio Coding Artifacts in Music
Sascha Dick, International Audio Laboratories Erlangen, a joint institution of Universität Erlangen-Nürnberg and Fraunhofer IIS - Erlangen, Germany; Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jiandong Zhang, Academy of Broadcasting Planning, SAPPRFT - Beijing, China; Yili Qin, Academy of Broadcasting Planning, SAPPRFT - Beijing, China; Nadja Schinkel-Bielefeld, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Silvantos GmbH - Erlangen, Germany; Anna Katharina Leschanowsky, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Frederik Nagel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; International Audio Laboratories - Erlangen, Germany
Modern audio codecs are used all over the world, reaching listeners with many different cultures and languages. This study investigates if and how cultural background influences the perception and preference of different audio coding artifacts, focusing on musical content. A subjective listening test was designed to directly compare different types of audio coding and was performed with Mandarin Chinese and German speaking listeners. Overall comparison showed largely consistent results, affirming the validity of the proposed test method. Differential comparison indicates preferences for certain artifacts in different listener groups, e.g., Chinese listeners tended to grade tonality mismatch higher and pre-echoes worse compared to German listeners, and musicians preferred bandwidth limitation over tonality mismatch when compared to non-musicians.
Convention Paper 10030

P01-3 Perception of Phase Changes in the Context of Musical Audio Source Separation
Chungeun Kim, University of Surrey - Guildford, Surrey, UK; Sony Interactive Entertainment Europe - London, UK; Emad M. Grais, University of Surrey - Guildford, Surrey, UK; Russell Mason, University of Surrey - Guildford, Surrey, UK; Mark D. Plumbley, University of Surrey - Guildford, Surrey, UK
This study investigates the perceptual consequence of phase change in conventional magnitude-based source separation. A listening test was conducted in which the participants compared three different source separation scenarios, each with two phase retrieval cases: phase from the original mix or from the target source. The participants’ responses regarding their similarity to the reference showed that (1) the difference between the mix phase and the perfect target phase was perceivable in the majority of cases with some song-dependent exceptions, and (2) use of the mix phase degraded the perceived quality even in the case of perfect magnitude separation. The findings imply that there is room for perceptual improvement by attempting correct phase reconstruction in addition to achieving better magnitude-based separation.
Convention Paper 10031

P01-4 Method for Quantitative Evaluation of Auditory Perception of Nonlinear Distortion: Part II – Metric for Music Signal Tonality and its Impact on Subjective Perception of Distortions
Mikhail Pakhomov, SPb Audio R&D Lab - St. Petersburg, Russia; Victor Rozhnov, SPb Audio R&D Lab - St. Petersburg, Russia
In the first part of the paper we noted that the impact of audible nonlinear distortions on subjective listener preference is strongly dependent on the spectral structure of a test signal. In the second part we propose a method for considering the spectral characteristics of a test signal in the evaluation of the subjective perception of audible nonlinear distortions. To describe the tonal structure of a music signal, a qualitative characteristic, tonality, is taken as a metric, and a tonality coefficient is proposed as a measure of this characteristic. Subjective listening tests were performed to estimate how the auditory perception of nonlinear distortions depends on the tonal structure of a signal and the spectral distribution of the noise-to-mask ratio (NMR).
Convention Paper 10032

P01-5 Developing a Method for the Subjective Evaluation of Smartphone Music Playback
Elisabeth McMullin, Samsung Research America - Valencia, CA USA; Victoria Suha, Samsung Electronics - Valencia, CA, USA; Yuan Li, Samsung Research America - Valencia, CA, USA; Will Saba, Samsung Electronics - Valencia, CA, USA; Pascal Brunet, Samsung Research America - Valencia, CA USA; Audio Group - Digital Media Solutions
To determine the preferred audio characteristics for media playback over smartphones, a series of controlled double-blind listening experiments were run to evaluate the subjective playback quality of six high-end smartphones. Listeners rated products based on their audio quality preference and left comments categorized by attribute. The devices were tested in different orientations in level-matched and maximum-volume scenarios. Positional variation and biases were accounted for using a motorized turntable and audio playback was controlled remotely with remote-access software. Test results were compared to spatially-averaged measurements made using a multitone stimulus and demonstrate that the smoothness of the frequency response is the most important aspect in smartphone preference. Low frequency extension, decreased levels of nonlinear distortion, and higher maximum playback level did not correlate with higher phone ratings.
Convention Paper 10033

P01-6 Investigation into the Effects of Subjective Test Interface Choice on the Validity of Results
Nicholas Jillings, Birmingham City University - Birmingham, UK; Brecht De Man, Birmingham City University - Birmingham, UK; Ryan Stables, Birmingham City University - Birmingham, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
Subjective experiments are a cornerstone of modern research with a variety of tasks being undertaken by subjects. In the field of audio, subjective listening tests provide validation for research and aid fair comparison between techniques or devices such as coding performance, speakers, mixes, and source separation systems. Several interfaces have been designed to mitigate biases and to standardize procedures, enabling indirect comparisons. The number of different combinations of interface and test design makes it extremely difficult to conduct a truly unbiased listening test. This paper resolves the largest of these variables by identifying the impact the interface itself has on a purely auditory test. This information is used to make recommendations for specific categories of listening tests.
Convention Paper 10034


P02 - Signal Processing—Part 1

Wednesday, October 17, 9:00 am — 11:00 am (1E12)

Emre Çakir, Tampere University of Technology - Tampere, Finland

P02-1 Musical Instrument Synthesis and Morphing in Multidimensional Latent Space Using Variational, Convolutional Recurrent Autoencoders
Emre Çakir, Tampere University of Technology - Tampere, Finland; Tuomas Virtanen, Tampere University of Technology - Tampere, Finland
In this work we propose a deep learning based method—namely, variational, convolutional recurrent autoencoders (VCRAE)—for musical instrument synthesis. This method utilizes the higher level time-frequency representations extracted by the convolutional and recurrent layers to learn a Gaussian distribution in the training stage, which will be later used to infer unique samples through interpolation of multiple instruments in the usage stage. The reconstruction performance of VCRAE is evaluated by proxy through an instrument classifier and provides significantly better accuracy than two other baseline autoencoder methods. The synthesized samples for the combinations of 15 different instruments are available on the companion website.
Convention Paper 10035

P02-2 Music Enhancement by a Novel CNN Architecture
Anton Porov, PDMI RAS - St. Petersburg, Russia; Eunmi Oh, Samsung Electronics Co., Ltd. - Seoul, Korea; Kihyun Choo, Samsung Electronics Co., Ltd. - Suwon, Korea; Hosang Sung, Samsung Electronics - Korea; Jonghoon Jeong, Samsung Electronics Co. Ltd. - Seoul, Korea; Konstantin Osipov, PDMI RAS - Russia; Holly Francois, Samsung Electronics R&D Institute UK - Staines-Upon Thames, Surrey, UK
This paper is concerned with music enhancement by removal of coding artifacts and recovery of acoustic characteristics that preserve the sound quality of the original music content. In order to achieve this, we propose a novel convolutional neural network (CNN) architecture called FTD (Frequency-Time Dependent) CNN, which utilizes correlation and context information across spectral and temporal dependency for music signals. Experimental results show that both subjective and objective sound quality metrics are significantly improved. This unique way of applying a CNN to exploit global dependency across frequency bins may effectively restore information that is corrupted by coding artifacts in compressed music content.
Convention Paper 10036

P02-3 The New Dynamics Processing Effect in Android Open Source Project
Ricardo Garcia, Google - Mountain View, CA, USA
The Android “P” Audio Framework’s new Dynamics Processing Effect (DPE) in the Android Open Source Project (AOSP) provides developers with controls to fine-tune the audio experience using several stages of equalization, multi-band compressors, and linked limiters. The API allows developers to configure the DPE’s multichannel architecture to exercise real-time control over thousands of audio parameters. This talk additionally discusses the design and use of DPE in the recently announced Sound Amplifier accessibility service for Android and outlines other uses for acoustic compensation and hearing applications.
Convention Paper 10037

P02-4 On the Physiological Validity of the Group Delay Response of All-Pole Vocal Tract Modeling
Aníbal Ferreira, University of Porto - Porto, Portugal
Magnitude-oriented approaches dominate the voice analysis front-ends of most current technologies addressing, e.g., speaker identification, speech coding/compression, and voice reconstruction and re-synthesis. A popular technique is all-pole vocal tract modeling. The phase response of all-pole models is known to be non-linear and highly dependent on the magnitude frequency response. In this paper we use a shift-invariant phase-related feature that is estimated from signal harmonics in order to study the impact of all-pole models on the phase structure of voiced sounds. We relate that impact to the phase structure that is found in natural voiced sounds to conclude on the physiological validity of the group delay of all-pole vocal tract modeling. Our findings emphasize that harmonic phase models are idiosyncratic, and this is important in speaker identification and in fostering the quality and naturalness of synthetic and reconstructed speech.
Convention Paper 10038


P03 - Recording and Production

Wednesday, October 17, 10:00 am — 11:30 am (Poster Area)

P03-1 Characterizing the Effect on Linear and Harmonic Distortions of AC Bias and Input Levels when Recording to Analog Tape
Thomas Mitchell, University of Miami - Coral Gables, FL, USA; Christopher Bennett, University of Miami - Coral Gables, FL, USA
Analog tape recorders introduce both linear and nonlinear distortions to the audio. While the role of the AC bias and input levels in these distortions is well understood by recording engineers, the impact on specific audio features, for example SNR, fatness, brightness, roughness, and harmonic count, is less well described. In this study we examined with high granularity the impact and interactions of several AC bias and input levels on each of these features. We utilized the exponential swept sine acquisition and deconvolution technique to analyze a Scully 280. The results provide a detailed characterization of the tonal character introduced by the recorder. We conclude with level recommendations that could prove important for primary capture, effects processing, and digital emulation.
Convention Paper 10039

P03-2 Microphone Comparison for Snare Drum Recording
Matthew Cheshire, Birmingham City University - Birmingham, UK; Jason Hockman, Birmingham City University - Birmingham, UK; Ryan Stables, Birmingham City University - Birmingham, UK
We present two experiments to test listener preference for snare microphones within real-world recording scenarios. In the first experiment, listeners evaluated isolated recordings captured with 25 microphones. In the second experiment, listeners performed the same task with the addition of a kick drum and hi-hat as part of a performed drum sequence. Results indicate a prominent contrast between the highest and lowest rated microphones and that condensers were rated higher than other subsets tested. The preference for three microphones significantly changed between the two listening test conditions. A post-test survey revealed that most listeners compared high-frequency characteristics, which were measured using spectral features. A positive correlation was observed between test scores of cardioid microphones and the brightness feature.
Convention Paper 10040

P03-3 Automatic Mixing of Multitrack Material Using Modified Loudness Models
Steven Fenton, University of Huddersfield - Huddersfield, West Yorkshire, UK
This work investigates the perceptual accuracy of the ITU-R Recommendation BS.1770 loudness algorithm when employed in a basic auto mixing system. Optimal filter parameters previously proposed by the author, which incorporate modifications to both the pre-filter response and the integration window sizes, are tested against the standard K-weighted model and filter parameters proposed through other studies. The validation process encompassed two stages: the first was the elicitation of preferred mix parameters used by the mixing system, and the second was the generation of automatic mixes based on these rules using the various filter parameters. A controlled listening test was then employed to evaluate listener preferences for the completed mixes. It is concluded that the optimized filter parameter set based upon stem type results in a more perceptually accurate automatic mix.
Convention Paper 10041

P03-4 Double-MS Decoding with Diffuse Sound Control
Alexis Favrot, Illusonic GmbH - Uster, Switzerland; Christof Faller, Illusonic GmbH - Uster, Zürich, Switzerland; EPFL - Lausanne, Switzerland; Helmut Wittek, SCHOEPS Mikrofone GmbH - Karlsruhe, Germany
The double MS (DMS) setup provides a coincident recording configuration in the horizontal plane for surround sound recording, similar to Ambisonics B-format. DMS uses two cardioid microphones and one dipole microphone in a coincident arrangement. An algorithm for processing DMS recordings is described, which in addition to linear processing provides a diffuse sound gain control and diffuse sound de-correlation. The target stereo or multichannel signals can be made more dry or reverberant with diffuse gain. Diffuse de-correlation improves spaciousness and its importance scales with the number of channels. The consequences of these new controls for the stereophonic image will be depicted.
Convention Paper 10042

P03-5 Real-Time System for the Measurement of Perceived Punch
Andrew Parker, University of Huddersfield - Huddersfield, UK; Steven Fenton, University of Huddersfield - Huddersfield, West Yorkshire, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
Previous work has proposed a perceptually motivated model for the objective measurement of punch in a music recording. This paper presents a real-time implementation of the proposed model as a punch metering plug-in. The plug-in presents both momentary and historical punch metrics in the form of a P95 (95th percentile punch measure), Mean punch score, and P95M (95th percentile punch divided by the mean punch score). The meter’s outputs are compared to subjective punch scores derived through a controlled listening test and show a “strong” correlation with Pearson and Spearman coefficients r=0.840 (p<0.001) and rho=0.937 (p<0.001) respectively. The real-time measure of punch could prove useful in allowing objective control and optimization of this feature during mixing, mastering and broadcast.
Convention Paper 10043
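
As a rough illustration of the three summary metrics named in the abstract above, the sketch below computes P95, the mean, and P95M from a list of momentary punch scores. The function name `punch_summary`, the sample scores, and the nearest-rank percentile definition are assumptions made for this example; the abstract does not state the paper's own percentile method or score scale.

```python
def punch_summary(scores):
    """Return (p95, mean, p95m) for a non-empty list of momentary punch scores.

    Assumption: the 95th percentile uses the nearest-rank definition.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    # Nearest-rank percentile: rank = ceil(0.95 * n), done in integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    p95 = ordered[rank - 1]
    mean = sum(ordered) / len(ordered)
    # P95M: 95th percentile punch divided by the mean punch score.
    p95m = p95 / mean if mean != 0 else float("inf")
    return p95, mean, p95m


# Hypothetical momentary scores, for illustration only.
scores = [float(v) for v in range(1, 21)]  # 1.0 .. 20.0
p95, mean, p95m = punch_summary(scores)
print(p95, mean, round(p95m, 3))
```

A real-time meter would presumably update these values over a sliding or growing history of momentary scores; that windowing is not specified in the abstract and is left out here.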

P03-6 Algorithm to Determine Downmixing Coefficients from Specific Multichannel Format to Reproduction Format with a Smaller Number of Channels
Hiroki Kubo, NHK - Tokyo, Japan; Satoshi Oode, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Takehiro Sugimoto, NHK - Setagaya-ku, Tokyo, Japan; Shu Kitajima, NHK Science & Technology Research Laboratories - Tokyo, Japan; Atsuro Ito, NHK Science & Technology Research Laboratories - Tokyo, Japan; Tomoyasu Komori, NHK Science and Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Waseda University - Shinjuku-ku, Tokyo, Japan; Kazuho Ono, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan
A unified algorithm to derive downmixing coefficients from a specific multichannel format to arbitrary reproduction formats with a smaller number of channels was investigated. The proposed method attaches importance to maintaining both basic audio quality and spatial impression. It involves appropriately changing the positions of channels in source formats, or locating the channels in source formats at equal intervals between the two channels in destination formats when phantom sources are used. Owing to this feature, this method can minimize the use of phantom sources and avoid the deterioration of spatial impression. A subjective evaluation was carried out and the obtained results implied that the proposed algorithm satisfies our requirements.
Convention Paper 10044

P03-7 Subjective Evaluation of Stereo-9.1 Upmixing Algorithms Using Perceptual Band Allocation
Sungsoo Kim, New York University - New York, NY, USA
The purpose of this study was to investigate preexisting algorithms for building an upmixing algorithm that converts a stereo signal to 5.1 and 9.1 multichannel audio formats. Using three algorithms (the passive surround decoding method, the Least Mean Squares (LMS) algorithm, and the adaptive panning algorithm), a stereo audio signal was upmixed to 5.1 and 9.1 in Max. The Max patch provides a GUI in which listeners can select one of the upmixing algorithms and control EQ during playback. Perceptual Band Allocation (PBA) is applied for converting the upmixed 5.1 channel audio to 9.1 that contains four height channels (top front left, top front right, top back left, top back right). A subjective listening test was conducted in New York University’s MARL Research Lab. The LMS algorithm was found to provide more natural and spatial sounds than the other two algorithms. The passive surround decoding method and the adaptive panning algorithm were found to show similar characteristics in terms of low frequency and spaciousness.
Convention Paper 10045


P04 - Transducers—Part 1

Wednesday, October 17, 2:30 pm — 5:30 pm (1E11)

Sean Olive, Harman International - Northridge, CA, USA

P04-1 Numerical Optimization Strategies for Acoustic Elements in Loudspeaker Design
Andri Bezzola, Samsung Research America - Valencia, CA USA
Optimal design of acoustic elements in loudspeakers, such as waveguides and phase plugs, often requires extensive experience on the part of the designer. Numerical simulations and optimization algorithms can aid in reducing the design-test-optimize cycle that is traditionally applied. A general mathematical framework for numerical optimization techniques is introduced and three approaches to design optimization are reviewed: parameter optimization, shape optimization, and topology optimization. This paper highlights strengths and drawbacks of each method using real-world designs of a waveguide and two phase plugs. Where possible, the results are confirmed with prototypes and measurements. The work shows that excellent results can be achieved in just one design iteration with the help of numerical optimization methods.
Convention Paper 10046

P04-2 An Acoustic Model of the Tapped Horn Loudspeaker
Marco Berzborn, RWTH Aachen University - Aachen, Germany; Michael Smithers, Dolby Laboratories - McMahons Point, NSW, Australia
A lumped-parameter model of the Tapped Horn loudspeaker, a design in which the loudspeaker driver radiates into the throat and the mouth of the horn simultaneously, is presented. The model enables the estimation of the far-field sound pressure response from the Thiele/Small parameters of the loudspeaker driver and an additional analytic two-port matrix representation of the Tapped Horn. Simulations, performed using the model for subwoofers, are compared to measurements from an actual loudspeaker.
Convention Paper 10047

P04-3 A Survey and Analysis of Consumer and Professional Headphones Based on Their Objective and Subjective Performances
Sean Olive, Harman International - Northridge, CA, USA; Omid Khonsaripour, Harman International - Northridge, CA, USA; Todd Welti, Harman International Inc. - Northridge, CA, USA
In previous studies [1–3], we presented two statistical models that predict listeners’ sound quality preferences of headphones based on deviations in their measured frequency response. In this paper the models are applied to 156 different consumer and professional headphones that include a wide range of brands, prices, and headphone categories (e.g., in-ear, on-ear, and around-ear). The goal was to gain a better understanding of how these factors influence the subjective and objective performances of the headphone. The predicted preference ratings of the headphones were compared to ratings given by five different headphone review organizations to determine their correlation. Headphones designed to the current IEC/ITU/EBU standards produce significantly lower sound quality ratings.
Convention Paper 10048

P04-4 Minimizing Costs in Audio Devices through Efficient End-of-Line Testing
Wolfgang Klippel, Klippel GmbH - Dresden, Germany
Variances in the parts and uncertainties in the assembly process degrade the performance and the reliability of the manufactured devices. Defective units increase the manufacturing cost if detected during end-of-line testing or increase the after-sales cost if an undetected failure occurs in the field. This paper addresses the role of end-of-line testing to reduce both kinds of failures and to maximize the performance/cost ratio as seen by the end-user. The selection of sensitive and fast measurements, which can be performed under manufacturing conditions, is the basis for the PASS/FAIL classification. The paper shows that optimal production limits used in EoL-testing minimize the overall cost by considering a clear product definition, information from the particular design, statistical data from the manufacturing process, and traceability of the field rejects. The general concept presented here is illustrated using practical examples from automotive and other applications.
Convention Paper 10049

P04-5 Horn Driver Based on Annular Diaphragm and the Side-Firing Compression Chamber
Alexander Voishvillo, JBL/Harman Professional Solutions - Northridge, CA, USA
This work proposes a new type of compression driver based on an annular flexural diaphragm and a topology that combines part of the diaphragm radiating directly into the horn and the other part loaded by a side-firing compression chamber. This configuration’s design is very simple compared to its predecessors. Acoustical theory, numerical simulations, matrix electrical-mechanical-acoustical model, and the results of real transducer measurements substantiate the new design. The new driver provides performance on par with more complex transducers of a similar format.
Convention Paper 10050

P04-6 On the Efficiency of Flown vs. Ground Stacked Subwoofer Configurations
Etienne Corteel, L-Acoustics - Marcoussis, France; Hugo Coste Dombre, L-Acoustics - Marcoussis, France; Christophe Combet, L-Acoustics - Marcoussis, France; Yoachim Horyn, L-Acoustics - Marcoussis, France; François Montignies, L-Acoustics - Marcoussis, France
Modern live loudspeaker systems consist of broadband sources, often using variable curvature line sources, combined with subwoofers. While it is common practice to fly the broadband sources to improve energy distribution in the audience, most subwoofer configurations remain ground-stacked because of practical constraints and alleged efficiency loss of flown configurations. This article aims at evaluating the efficiency of flown subwoofers for large audiences as compared to their ground-stacked counterparts. We use finite element simulations to determine the influence of several factors: baffling effect, trim height. We show that flown configurations remain efficient at the back of the venue while reducing the SPL excess at the front of the audience.
Convention Paper 10051


P05 - Signal Processing—Part 2

Wednesday, October 17, 2:30 pm — 5:30 pm (1E12)

Rémi Audfray, Magic Leap, Inc. - San Francisco, CA, USA

P05-1 A Pseudoinverse Technique for the Pressure-Matching Beamforming Method
Miller Puckette, University of California San Diego - San Diego, CA, USA; Tahereh Afghah, University of California San Diego - San Diego, CA, USA; Elliot Patros, University of California San Diego - San Diego, CA, USA
In this work, an extension to the pressure-matching beamforming method (PMM) that is well suited to transaural sound field control is presented. The method aims to improve performance at dark points (locations relative to the array where sound pressure is minimized) without producing noticeable artifacts at bright points (locations where acoustic interference is minimized). The method’s new performance priorities result from replacing Tikhonov regularization, which is conventionally used in PMM, with a purpose-built regularization strategy for solving the pseudoinverse of ill-conditioned matrices. Discussions of how this method’s formulation affects the filter design process and of performance comparisons between this and conventional PMM filters are included.
Convention Paper 10052
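
For reference, the conventional Tikhonov-regularized pressure-matching solution, which the abstract above says is being replaced, is commonly written as follows. The symbols are assumptions chosen for this note: G is the matrix of transfer functions from the array elements to the control points, p the vector of target pressures at those points, w the vector of source weights at one frequency, and λ > 0 the regularization parameter. The purpose-built regularization proposed in the paper is not described in the abstract and is not reproduced here.

```latex
\mathbf{w}_{\text{Tik}}
  = \arg\min_{\mathbf{w}} \;
    \lVert \mathbf{G}\mathbf{w} - \mathbf{p} \rVert_2^2
    + \lambda \lVert \mathbf{w} \rVert_2^2
  = \left( \mathbf{G}^{\mathsf{H}} \mathbf{G} + \lambda \mathbf{I} \right)^{-1}
    \mathbf{G}^{\mathsf{H}} \mathbf{p}
```

As λ tends to zero, this expression tends to the Moore-Penrose pseudoinverse solution, which is where ill-conditioning of G becomes a problem.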

P05-2 Analog Circuits and Port-Hamiltonian Realizability Issues: A Resolution Method for Simulations via Equivalent Components
Judy Najnudel, IRCAM - Paris, France; Thomas Hélie, IRCAM - CNRS-Sorbonne Université - Paris, France; Henri Boutin, IRCAM - Paris, France; David Roze, CNRS - Paris, France; STMS lab (UMR 9912, CNRS - Ircam - Sorbonne Université) - Paris, France; Thierry Maniguet, CNRS-Musée de la Musique - Paris, France; Stéphane Vaiedelich, CNRS-Musée de la Musique - Paris, France
In order to simulate the Ondes Martenot, a classic electronic musical instrument, we aim to model its circuit using Port-Hamiltonian Systems (PHS). PHS have proven to be a powerful formalism to provide models of analog electronic circuits for audio applications, as they guarantee the stability of simulations, even in the case of non-linear systems. However, some systems cannot be converted directly into PHS because their architecture causes what are called realizability conflicts. The Ondes Martenot circuit is one of those systems. In this paper a method is introduced to resolve such conflicts automatically: problematic components are replaced by equivalent components without altering the overall structure or the content of the modeled physical system.
Convention Paper 10053

P05-3 Practical Realization of Dual-Shelving Filter Using Proportional Parametric Equalizers
Rémi Audfray, Magic Leap, Inc. - San Francisco, CA, USA; Jean-Marc Jot, Magic Leap - San Francisco, CA, USA; Sam Dicker, Magic Leap, Inc. - San Francisco, CA, USA
Proportional Parametric Equalizers have been proposed as an efficient tool for accurate magnitude response control within defined constraints. In particular, a combination of shelving filters can be used to create a 3-band parametric equalizer or tone control with minimal processing overhead. This paper picks up on this concept, fully develops the filter control equations, and proposes a lookup-table-based implementation of the dual-shelving filter design.
Convention Paper 10054

P05-4 Measuring Audio when Clocks Differ
Mark Martin, Audio Precision - Beaverton, OR, USA; Jayant Datta, Audio Precision - Beaverton, OR, USA; Xinhui Zhou, Audio Precision - Beaverton, OR, USA
This paper examines what happens when a digital audio signal is measured with respect to a digital reference signal that was created based on a different clock. The resultant change in frequency and time depends on the degree of mismatch between clocks and may introduce a significant amount of distortion into the measurements. But the distortion is different from the types normally considered important in audio systems and may obscure other types of distortion, e.g., harmonic, IMD, and noise, that are more important to accurately assess. A difference in clocking essentially creates a difference in sample rates. Therefore, sample rate conversion methods can be used to mitigate the discrepancy. Although this approach is effective, it cannot be used when the sample rate difference is too small, it can be computationally intensive, and it cannot entirely eliminate the effects of a clocking difference. This paper describes a much simpler and more effective technique that requires minimal computation.
Convention Paper 10055
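
To give a feel for the magnitudes involved in the abstract above, the sketch below works out the apparent frequency shift and the accumulated time drift for a given clock mismatch. The function names and the example figures (a 1 kHz tone, a 48 kHz nominal rate, a 50 ppm mismatch, a 10 s capture) are assumptions for illustration and do not come from the paper; the paper's own compensation technique is not shown.

```python
def apparent_frequency(f_hz, mismatch_ppm):
    """Frequency of a tone as seen against a reference clock that differs by mismatch_ppm."""
    return f_hz * (1.0 + mismatch_ppm * 1e-6)


def drift_samples(duration_s, sample_rate_hz, mismatch_ppm):
    """Number of samples by which the two clocks diverge over duration_s seconds."""
    return duration_s * sample_rate_hz * mismatch_ppm * 1e-6


# Hypothetical example values.
tone = apparent_frequency(1000.0, 50.0)
drift = drift_samples(10.0, 48000.0, 50.0)
print(f"apparent tone: {tone:.3f} Hz, drift after 10 s: {drift:.1f} samples")
```

Even a mismatch of a few tens of ppm slides the measured signal many samples away from the reference within seconds, which is consistent with the abstract's point that a clocking difference behaves like a small difference in sample rates.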

P05-5 Reducing Musical Noise in Transform Based Audio CodecsElias Nemer, XPERI/DTS - Calabasas, CA, USA; Jeff Thompson, XPERI/DTS - Calabasas, CA, USA; Ton Kalker, XPERI/DTS - Calabasas, CA, USA
This paper addresses the problem of musical noise in transform-based audio codecs. This artifact occurs when encoding audio segments with a noise-like spectrum—at low bit rates where signal quantization results in significant zero-valued coefficients. Due to the quantization commonly used, bands containing several non-zero coefficients are quantized to only one or two, giving rise to a musical artifact. This has been identified in other codecs, such as in CELT/OPUS , where a special transform is used prior to quantization to remedy the problem. In this paper we provide a modified approach consisting of a Hadamard transform combined with an interleaving scheme. Simulation shows the proposed method has a lower complexity and yields improved perceptual scores as measured by PEAQ.
Convention Paper 10056 (Purchase now)
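
As a rough illustration of why a Hadamard transform helps in P05-5, the sketch below applies an orthonormal fast Walsh-Hadamard transform to a band that quantization has reduced to a single non-zero coefficient. It is not the authors' algorithm: the interleaving scheme and the point in the codec chain where the transform is applied are not given in the abstract, and the function name `fwht` and the band length of 8 are assumptions.

```python
import math

def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly stage: combine pairs of elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform energy-preserving
    return [v * scale for v in out]

# Hypothetical band with a single surviving coefficient after quantization.
band = [0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0]
spread = fwht(band)      # energy is now distributed over all eight bins
restored = fwht(spread)  # the orthonormal transform is its own inverse
print(spread)
```

Every output bin has the same magnitude, so the isolated, tone-like coefficient becomes a flat, noise-like band with the same total energy.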

P05-6 Statistical and Analytical Approach to System Alignment
Juan Sierra, Stanford University - Stanford, CA, USA; Meyer Sound Laboratories - Berkeley, CA, USA; Jonathan Kamrava, Meyer Sound Laboratories - Berkeley, CA, USA; Pablo Espinosa, Meyer Sound Laboratories - Berkeley, CA, USA; Jon Arneson, Meyer Sound Laboratories - Berkeley, CA, USA; Paul Kohut, Meyer Sound Laboratories - Berkeley, CA, USA
The current project describes the design of a tuning or alignment processor for a system based on multiple satellite speakers and a single subwoofer. It explains the methodology used to solve this problem and the procedure to arrive at a viable solution. On the one hand, the procedure was based on filter design techniques to optimize phase relationships; however, these phase relations are often unknown due to the possibility of changing the relative positions of the speakers. Accordingly, a statistical analysis was used to determine the most stable set of parameters across different speaker locations and acoustical environments, implying that the same alignment parameters can be implemented in multiple circumstances without significant performance degradation.
Convention Paper 10057 (Purchase now)


P06 - Transducers – Part 2

Thursday, October 18, 9:30 am — 12:00 pm (1E11)

Pascal Brunet, Samsung Research America - Valencia, CA USA; Audio Group - Digital Media Solutions

P06-1 On the Interdependence of Loudspeaker Motor Nonlinearities
Finn T. Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark; Franz Heuchel, Technical University of Denmark - Kgs. Lyngby, Denmark
Two of the main nonlinearities in the electrodynamic loudspeaker are the position dependence of the force factor, Bl, and the voice coil inductance, Le. Since they both are determined by the geometry of the motor structure, they cannot be independent. This paper investigates this dependence both analytically and via FEM simulations. Under certain simplifying assumptions the force factor can be shown to be proportional to the spatial derivative of the inductance. Using FEM simulations the implications of this relation are illustrated for drivers with more realistic geometry and material parameters.
Convention Paper 10058 (Purchase now)

P06-2 Comparison of Dynamic Driver Current Feedback Methods
Juha Backman, Huawei Technologies - Tampere, Finland; Genelec Oy - Iisalmi, Finland
Current feedback is a versatile method of modifying the behavior of a loudspeaker driver, with opportunity for linearization and for matching the driver to the enclosure design targets. Depending on the chosen approach, however, it carries a potential risk of increasing the effects of either voice coil impedance variation or driver mechanical parameter nonlinearity. Using a nonlinear simulation model, this work compares various forms of current feedback, including current drive, finite positive or negative amplifier resistances, and negative resistance combined with a reactance.
Convention Paper 10059 (Purchase now)

P06-3 Non-Linear Optimization of Sound Field Control at Low Frequencies Produced by Loudspeakers in Rooms
Adrian Celestinos, Samsung Research America - Valencia, CA, USA; Pascal Brunet, Samsung Research America - Valencia, CA USA; Audio Group - Digital Media Solutions; Glenn S Kubota, Samsung Research America - Valencia, CA, USA
The low-frequency response of loudspeakers can be affected severely when placed in typical living rooms. Past investigations have focused on reducing the energy at the room resonances but not reducing seat-to-seat variation. Other works, using multiple loudspeakers, nullify the room modes or eliminate them with acoustic interference but are restricted by loudspeaker position and room geometry. In this work the loudspeaker position is not restricted and both frequency and time-domain methods are explored. Nonlinear optimization has been computed to improve the time response of the speaker in the room. Results have shown significant reduction of seat-to-seat variation. The time-frequency analysis reveals elimination of room resonances, producing a clear, tight bass response.
Convention Paper 10060 (Purchase now)

P06-4 Evaluation of Efficiency and Voltage Sensitivity in Horn Drivers
Alexander Voishvillo, JBL/Harman Professional Solutions - Northridge, CA, USA; Brian McLaughlin, Harman Professional - Northridge, CA, USA
There is a belief in horn driver design that if the resistive component of the input impedance of the diaphragm’s acoustical load is equal to the DC component of the voice coil’s electrical impedance, a maximum efficiency of 50% can be reached. This work shows that in reality, the compression driver’s real efficiency differs from the classical theory prediction. The discrepancy is explained by the fact that the driver’s output impedance and the acoustical loading impedance are different functions, not mere resistances. Additionally, the input electrical power and the output acoustical power are essentially integral functions of frequency and can be expressed by single numbers rather than frequency-dependent functions. Examples illustrating dependence of efficiency and sensitivity on various parameters are given.
Convention Paper 10061 (Purchase now)

P06-5 Characterization of Nonlinear Port Parameters in Loudspeaker Modeling
Doug Button, Harman International - Northridge, CA, USA; Russ Lambert, L3 Communications - Salt Lake City, UT, USA; Pascal Brunet, Samsung Research America - Valencia, CA USA; Audio Group - Digital Media Solutions; James Bunning, Harman International - Northridge, CA, USA
The outputs from ports used in common vented box loudspeakers are not linear with input level. With the goal of developing accurate modeling of vented boxes, a new method for estimation of nonlinear port parameters is shown. Acoustic mass and acoustic resistance parameters change with pressure in the enclosure and velocity in the port. Along with all nonlinear speaker parameters required for modeling, a practical way to characterize the changing acoustic mass and acoustic resistance is presented and tested with measurements. Using the new method, nonlinear port parameters are extracted for multiple box and port types.
Convention Paper 10062 (Purchase now)


P07 - Perception – Part 2

Thursday, October 18, 9:30 am — 12:00 pm (1E12)

Hyunkook Lee, University of Huddersfield - Huddersfield, UK

P07-1 Reproducing Low Frequency Spaciousness and Envelopment in Listening Rooms
David Griesinger, David Griesinger Acoustics - Cambridge, MA, USA
Envelopment – the perception of being surrounded by a beautiful acoustic space – is one of the joys of concert halls and great recordings. Recording engineers strive to capture reverberation independently in each channel, and reverberation from digital equipment is similarly non-coherent. But in playback rooms with a single low frequency driver or a single subwoofer reverberation is often flat, frontal, and without motion. In this paper we will show that full range loudspeakers or at least two independent subwoofers can bring low frequency envelopment back to a playback room. In some rooms minimizing room modes with high pressure at the listening position while maximizing lateral modes with minima near the listeners can help. If necessary, we put two independent subwoofers at the sides of the listeners.
Convention Paper 10063 (Purchase now)

P07-2 Evaluation of Implementations of the EBU R128 Loudness Measurement
Brecht De Man, Birmingham City University - Birmingham, UK
The EBU R128 / ITU-R BS.1770 specifications are widely followed in various areas of the audio industry and academic research. Different implementations exist in different programming languages with subtly differing design parameters. In this work we assess these implementations on their performance in terms of accuracy and functionality. As filter equations are notably absent from the standard, we reverse engineer the prescribed filter coefficients and offer more universal filter specifications. We also provide a simple implementation of the integrated loudness measure in JavaScript, MATLAB, and Python.
Convention Paper 10064 (Purchase now)
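
For orientation on P07-2, the following is a minimal sketch of the two-stage gating that ITU-R BS.1770 specifies for integrated loudness. It is not the paper's implementation. It assumes the caller has already applied K-weighting, summed the weighted channels, and computed the mean square of each 400 ms block (75% overlap); the filter stage is omitted because its coefficients are the subject of the paper. Function names are hypothetical.

```python
import math

ABSOLUTE_GATE_LUFS = -70.0  # absolute gating threshold from BS.1770
RELATIVE_GATE_LU = -10.0    # relative gate, in LU below the ungated-by-relative mean

def block_loudness(mean_square):
    """Loudness in LUFS of one block's weighted, channel-summed mean square."""
    return -0.691 + 10.0 * math.log10(mean_square)

def integrated_loudness(block_powers):
    """Gated integrated loudness from per-block mean-square values.

    Assumes K-weighting and channel weighting were applied upstream.
    """
    # Stage 1: drop blocks at or below the absolute threshold (and silent blocks).
    above_abs = [p for p in block_powers
                 if p > 0.0 and block_loudness(p) > ABSOLUTE_GATE_LUFS]
    if not above_abs:
        return float("-inf")
    # Stage 2: drop blocks more than 10 LU below the mean of the survivors.
    threshold = block_loudness(sum(above_abs) / len(above_abs)) + RELATIVE_GATE_LU
    kept = [p for p in above_abs if block_loudness(p) > threshold]
    return block_loudness(sum(kept) / len(kept))

# Hypothetical programme: five loud blocks followed by five quiet ones.
loudness = integrated_loudness([1.0] * 5 + [0.001] * 5)
print(loudness)
```

In this example the quiet blocks pass the absolute gate but fall below the relative gate, so the result reflects only the loud blocks; a plain average over all ten blocks would read about 3 LU lower.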

P07-3 Hyper-Compression in Music Production: Testing the “Louder Is Better” Paradigm
Robert W. Taylor, University of Newcastle - Callaghan, NSW, Australia
Within the scope of the literature surrounding the Loudness War, the “louder is better” paradigm plays a cornerstone role in the motivation and also presumed justification for the continued use of hyper-compression. At the core of this assumption is the non-linearity of frequency response of the human auditory system first identified by Fletcher and Munson [11]. Previous research into listener preferences concerning hyper-compression has attempted to rationalize this production practice with audience expectations. The stimuli used in these studies have invariably been loudness normalized to remove loudness bias in audition so that only the perceptual cues of dynamic range compression (DRC) are under examination. The results of these studies have proven varied and less than conclusive. The research study presented herein examines the extent of influence the “louder is better” paradigm has on listener preferences via a direct comparison between listener preference tasks that present music that is loudness normalized and music that retains the level differentiation which is a by-product of the hyper-compression process. It was found that a level differential of 10 dB had a significant influence on listener preferences as opposed to the arguably weak perceptual cues of DRC.
Convention Paper 10065 (Purchase now)

P07-4 The Effect of Pinnae Cues on Lead-Signal Localization in Elevated, Lowered, and Diagonal Loudspeaker Configurations
Wesley Bulla, Belmont University - Nashville, TN, USA; Paul Mayo, University of Maryland - College Park, MD, USA
In a follow-up to AES-143 #9832, this experiment employed a novel method that altered subjects’ pinnae and examined the effects of modifying salient spectral cues on time-based vertical-oriented precedence in raised, lowered, and diagonal sagittal and medial planes. As suggested in the prior study, outcomes confirm perceptual interference from acoustic patterns generated via lead-lag signal interaction. Results provide clear physical and psychophysical evidence that reliable elevation cues may be rendered ineffective by stimuli such as those used in typical precedence-based experiments. Outcomes here demonstrate the salient and powerful influence of spectral information during lead-lag precedence-oriented tasks and suggest that prior research, in particular that concerned with so-called “vertical” precedence, may have been erroneously influenced by simple, yet profound, acoustic comb-filtering.
Convention Paper 10066 (Purchase now)

P07-5 Perceptual Audio Evaluation of Media Device Orchestration Using the Multi-Stimulus Ideal Profile Method
Alex Wilson, University of Salford - Salford, Greater Manchester, UK; Trevor Cox, University of Salford - Salford, UK; Nick Zacharov, Force Technology, SenseLab - Hørsholm, Denmark; Chris Pike, BBC R&D - Salford, UK; University of York - York, UK
The evaluation of object-based audio reproduction methods in a real-world context remains a challenge as it is difficult to separate the effects of the reproduction system from the effects of the audio mix rendered for that system. This is often compounded by the absence of explicitly-defined reference or anchor stimuli. This paper presents a perceptual evaluation of five spatial audio reproductions including Media Device Orchestration (MDO) in which a stereo mix is augmented by four additional loudspeakers. The systems are evaluated using the Multi-Stimulus Ideal Profile Method (MS-IPM), in which an assessor’s preferred value of each perceptual attribute is recorded in addition to their ratings of the systems. Principal Component Analysis was used to gain insight into the perceptual dimensions of the stimulus set and to investigate if the real systems overlap with the ideal profile. The results indicate that perceptual envelopment is the dominating perceptual factor for this set of reproduction systems, and the ideal system is one of high-envelopment.
Convention Paper 10067 (Purchase now)


P08 - Acoustics and Signal Processing

Thursday, October 18, 10:00 am — 11:30 am (Poster Area)

P08-1 Estimation of MVDR Beamforming Weights Based on Deep Neural Network
Moon Ju Jo, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Geon Woo Lee, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Jung Min Moon, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Choongsang Cho, Artificial Intelligence Research Center, Korea Electronics Technology Institute (KETI) - Sungnam, Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea
In this paper we propose a deep learning-based MVDR beamforming weight estimation method. The proposed method estimates the MVDR beamforming weight with deep learning using GCC-PHAT and needs no information on the source location, whereas the conventional MVDR beamforming weight computation requires that information. In an experiment with REVERB challenge data, the root mean square error between the estimated weight and the MVDR weight was found to be 0.32.
Convention Paper 10068 (Purchase now)
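
As background for P08-1, here is a minimal sketch of GCC-PHAT itself, the feature the abstract names, used in its classic role of estimating the delay between two signals. It says nothing about the authors' network or how they feed GCC-PHAT into it. The naive DFT, the circular-shift test signal, and all function names are assumptions chosen to keep the example dependency-free.

```python
import cmath
import random

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (fine for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def gcc_phat_delay(sig, ref):
    """Integer delay of sig relative to ref, in samples (circular model)."""
    n = len(sig)
    cross = [s * r.conjugate() for s, r in zip(dft(sig), dft(ref))]
    # PHAT weighting: keep only the phase of the cross-spectrum.
    whitened = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    corr = [v.real for v in dft(whitened, inverse=True)]
    peak = max(range(n), key=lambda i: corr[i])
    # Map peaks in the upper half of the frame to negative delays.
    return peak if peak <= n // 2 else peak - n

rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(32)]
sig = [ref[(t - 3) % 32] for t in range(32)]  # ref delayed by 3 samples
delay = gcc_phat_delay(sig, ref)
print(delay)
```

Because the phase transform discards magnitude, the correlation collapses to a sharp peak at the delay, which is what makes it a compact input feature for localization-free methods.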

P08-2 A Statistical Metric for Stability in Instrumental Vibrato
Sarah R. Smith, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
When instrumentalists perform with vibrato, they add a quasi-periodic frequency modulation to the note. Although this modulation is rarely purely sinusoidal, many methods for vibrato parameterization focus exclusively on the rate and depth of the frequency modulation, with less attention given to measuring how a performer’s vibrato changes over the course of a note. In this paper we interpret the vibrato trajectories as instantiations of a random process that can be characterized by an associated autocorrelation function and power spectral density. From these distributions, a coherence time can be estimated that describes the stability of the vibrato within a note. This metric can be used to characterize individual performers as well as for resynthesizing vibratos of different styles.
Convention Paper 10069 (Purchase now)

P08-3 Machine Learning Applied to Aspirated and Non-Aspirated Allophone Classification–An Approach Based on Audio “Fingerprinting”
Magdalena Piotrowska, Gdansk University of Technology - Poland; Hear Candy Mastering - Poland; Grazina Korvel, Vilnius University - Vilnius, Lithuania; Adam Kurowski, Gdansk University of Technology - Gdansk, Poland; Bozena Kostek, Gdansk University of Technology - Gdansk, Poland; Audio Acoustics Lab.; Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
The purpose of this study is to involve both Convolutional Neural Networks and a typical learning algorithm in the allophone classification process. A list of words including aspirated and non-aspirated allophones pronounced by native and non-native English speakers is recorded and then edited and analyzed. Allophones extracted from English speakers’ recordings are presented in the form of two-dimensional spectrogram images and used as input to train the Convolutional Neural Networks. Various settings of the spectral representation are analyzed to determine an adequate option for the allophone classification. Then, testing is performed on the basis of non-native speakers’ utterances. The same approach is repeated employing a learning algorithm based on feature vectors. The achieved classification results are promising, as high accuracy is observed.
Convention Paper 10070 (Purchase now)

P08-4 Combining the Signals of Sound Sources Separated from Different Nodes after a Pairing Process Using an STFT-Based GCC-PHAT
César Clares-Crespo, University of Alcalá - Alcalá de Henares, Madrid, Spain; Joaquín García-Gómez, University of Alcalá - Alcalá de Henares, Madrid, Spain; Alfredo Fernández-Toloba, University of Alcalá - Alcalá de Henares, Madrid, Spain; Roberto Gil-Pita, University of Alcalá - Alcalá de Henares, Madrid, Spain; Manuel Rosa-Zurera, University of Alcalá - Alcalá de Henares, Madrid, Spain; Manuel Utrilla-Manso, University of Alcalá - Alcalá de Henares, Madrid, Spain
This paper presents a Blind Source Separation (BSS) algorithm designed to work in a multi-node network. It is intended to be used indoors as an improvement to overcome some limitations of classical BSS algorithms. The goal is to improve the quality of the speech sources by combining the signals of the same source separated at the different nodes located in the room. The algorithm matches the signals belonging to the same source through cross-correlations. It provides some stability to the quality of the separated sources since it takes advantage of the fact that the sources are likely to be correctly separated in some nodes.
Convention Paper 10071 (Purchase now)

P08-5 Reverberation Modeled as a Random Ensemble of Images
Stephen McGovern, Wire Grind Audio - Sunnyvale, CA, USA
Modeling box-shaped rooms with the image method requires many mathematical operations. The resulting reverberation effect is qualitatively flawed due to sweeping echoes and flutter. Efficiency is improved with the Fast Image Method, and sweeping echoes can be suppressed by using randomization. With both approaches, however, there is still a remaining problem with flutter. To address all of these issues, the Fast Image Method is modified to have both randomized image locations and increased symmetry. Additional optimizations are proposed and applied to the algorithm. The resulting audio effect has improved quality, and the computation time is dramatically reduced (by a factor usually exceeding 200) when compared to its ancestral Allen and Berkley algorithm. Some relevant perceptual considerations are also discussed.
Convention Paper 10072 (Purchase now)

P08-6 On the Accuracy of Audience Implementations in Acoustic Computer Modelling
Ross Hammond, University of Derby - Derby, Derbyshire, UK; Peter Mapp Associates - Colchester, UK; Adam J. Hill, University of Derby - Derby, Derbyshire, UK; Peter Mapp, Peter Mapp Associates - Colchester, Essex, UK
Performance venue acoustics differ significantly due to audience size, largely from the change in absorption and reflection pathways. Creating acoustic models that accurately mimic these changes is problematic, showing significant variance between audience implementation methods and modelling techniques. Changes in total absorption per person due to audience size and density make absorption coefficient selection difficult. In this research, FDTD simulations confirm that for densely packed audiences, diffraction leads to a linear correlation between capacity and total absorption at low frequencies, while at high frequencies there is less increase in total absorption per person. The significance of diffraction renders ray-tracing inaccurate for individually modelled audience members and has further implications regarding accuracy of standard audience modelling procedures.
Convention Paper 10073 (Purchase now)

P08-7 High Resolution Horizontal Arrays for Sound Stages and Room Acoustics: Concepts and Benefits
Cornelius Bradter, State University of New York Korea - Shinan ICT, Incheon, Korea; Kichel Ju, Shinan Information & Communication Co., Ltd. - Gyeonggi-do, Korea
Wavefield-synthesis systems were the first to apply large horizontal arrays for sound reproduction and live sound. While there has been some success for experimental music or sound art, results for music and live sound are limited. The reasons for this are array sound quality, insufficient sound pressure level, spatial aliasing, interference, and inflexible array control algorithms. Nevertheless, the application of high resolution horizontal arrays with appropriate array control systems produces exceptional sound quality, extensive flexibility to adjust to acoustic conditions, and remarkable control over sound distribution. Advanced systems can overcome many shortcomings in traditional sound systems. This paper describes the concepts and technology of high performance HRH-arrays and their utilization for church sound through examples of application in Korean churches.
Convention Paper 10074 (Purchase now)


P09 - Spatial Audio – Part 1

Thursday, October 18, 1:30 pm — 3:30 pm (1E11)

Jean-Marc Jot, Magic Leap - San Francisco, CA, USA

P09-1 Impression of Spatially Distributed Reverberation in Multichannel Audio Reproduction
Sarvesh Agrawal, Rensselaer Polytechnic Institute - Troy, NY, USA; Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA
Auditory immersion and spatial impression in multichannel audio reproduction can be altered by changing the number of loudspeakers and independent reverberation channels. The spatial impression can change drastically as one moves away from the sweet-spot. Since multichannel audio reproduction is not limited to one position, it is critical to investigate Listener Envelopment (LEV) and immersion at off-axis positions. This work discusses the impression of spatially distributed decorrelated reverberation at on- and off-axis positions. A laboratory environment is used to reproduce a diffused sound field in the horizontal plane through 128 independent audio channels and loudspeakers. Results from psychoacoustical experiments show that there are perceptible differences even at higher channel counts. However, spatial impression does not change significantly beyond 16 channels of decorrelated reverberation and equally spaced loudspeakers at on- and off-axis positions.
Convention Paper 10076 (Purchase now)

P09-2 From Spatial Recording to Immersive Reproduction—Design & Implementation of a 3DOF Audio-Visual VR System
Maximillian Kentgens, RWTH Aachen University - Aachen, Germany; Stefan Kühl, RWTH Aachen University - Aachen, Germany; Christiane Antweiler, RWTH Aachen University - Aachen, Germany; Peter Jax, RWTH Aachen University - Aachen, Germany
The complex mutual interaction between human visual perception and hearing demands combined examinations of 360° video and spatial audio systems for Virtual Reality (VR) applications. Therefore, we present a joint audio-visual end-to-end chain from spatial recording to immersive reproduction with full rotational three degrees of freedom (3DOF). The audio-subsystem is based on Higher Order Ambisonics (HOA) obtained by Spherical Microphone Array (SMA) recordings, while the video is captured with a 360° camera rig. A spherical multi-loudspeaker setup for audio in conjunction with a VR head-mounted video display is used to reproduce a scene as close as possible to the original scene with regard to the perceptual modalities of the user. A database of immersive content as a basis for future research in spatial signal processing was set up by recording several rehearsals and concerts of the Aachen Symphony Orchestra. The data was used for a qualitative assessment of the eligibility of the proposed end-to-end system. A discussion shows the potential and limitations of the approach. Therein, we highlight the importance of coherent audio and video to achieve a high degree of immersion with VR recordings.
Convention Paper 10077 (Purchase now)

P09-3 Required Bit Rate of MPEG-4 AAC for 22.2 Multichannel Sound Contribution and Distribution
Shu Kitajima, NHK Science & Technology Research Laboratories - Tokyo, Japan; Takehiro Sugimoto, NHK - Setagaya-ku, Tokyo, Japan; Satoshi Oode, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Tomoyasu Komori, NHK Science and Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Waseda University - Shinjuku-ku, Tokyo, Japan; Joji Urano, Japan Television Network Corporation - Tokyo, Japan
22.2 multichannel sound (22.2 ch sound) is currently broadcast using MPEG-4 Advanced Audio Coding (AAC) in 8K Super Hi-Vision broadcasting in Japan. The use of MPEG-4 AAC for contribution and distribution transmissions is also planned. Contribution and distribution transmissions require sufficient audio quality to withstand repeated coding and decoding processes. In this study the bit rate of MPEG-4 AAC for a 22.2 ch sound signal satisfying tandem transmission quality was investigated by the subjective evaluation specified in Recommendation ITU-R BS.1116-3. The basic audio quality of 72 stimuli made from a combination of 6 bit rates, 3 different numbers of tandems, and 4 contents was evaluated by 28 listeners. The required bit rates of 22.2 ch sound material transmission for 3, 5, and 7 tandems were concluded to be 96, 144, and 160 kbit/s per channel, respectively.
Convention Paper 10078 (Purchase now)

P09-4 Effect of Binaural Difference in Loudspeaker Directivity on Spatial Audio Processing
Daekyoung Noh, Xperi/DTS - Santa Ana, CA, USA; Oveal Walker, XPERI/DTS - Calabasas, CA, USA
Head Related Transfer Functions (HRTFs) are typically measured with loudspeakers facing the listener. Therefore, it is assumed that loudspeaker directivity to the left and the right ear is equal. However, in practice the directivity to both ears may not be equal. For instance, differences can be caused by changes to a listener’s location or uncommon loudspeaker driver orientations. This paper discusses the effect of binaural difference in loudspeaker directivity on spatial audio processing and proposes an efficient solution that improves the spatial effect by compensating for the directivity difference. Subjective evaluation is conducted to measure the performance of the proposed solution.
Convention Paper 10079 (Purchase now)


P10 - Recording and Production

Thursday, October 18, 1:30 pm — 3:30 pm (1E12)

Doug Bielmeier, Northeastern University - Boston, MA, USA

P10-1 The Impact of Compressor Ballistics on the Perceived Style of Music
Gary Bromham, Queen Mary University London - London, UK; Dave Moffat, Queen Mary University London - London, UK; Mathieu Barthet, Queen Mary University London - London, UK; György Fazekas, Queen Mary University of London - London, UK
Dynamic range compressors (DRC) are one of the most commonly used audio effects in music production. The timing settings are particularly important for controlling the manner in which they will shape an audio signal. We present a subjective user study of DRC, where a series of different compressor attack and release settings are varied and applied to a set of 30 sec audio tracks. Participants are then asked to rate which ballistic settings are most appropriate for the style of music in their judgment and asked to select one of a series of tag words to describe the style or setting of the song. Results show that the attack parameter influences perceived style more than the release parameter. From the study this is seen more evidently in the case of Jazz and Rock styles than in EDM or Hip-Hop. The area of intelligent music production systems might benefit from this study in the future as it may help to inform appropriateness for certain DRC settings in varying styles.
Convention Paper 10080 (Purchase now)

P10-2 Active Multichannel Audio Downmix
Aleksandr Karapetyan, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Felix Fleischmann, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Mixing down signals of a multichannel configuration into a format with fewer channels is widely used in many areas of audio coding, production, and recording. Commonly used downmix methods are based on fixed downmix coefficients or mixing equations. Experience has shown that these passive methods cause quality losses in terms of speech intelligibility, vocal/instrumental clarity, and timbre changes. In this paper a novel method is introduced that addresses these problems. The method aims to preserve the energy of the input signals during the downmix. In doing so, magnitude and phase are retrieved from two different approaches and combined afterwards. A listening test was conducted. The results prove that the introduced method has a significant positive effect on the aforementioned quality aspects.
Convention Paper 10081 (Purchase now)

P10-3 Microphone Array Geometry for Two Dimensional Broadband Sound Field Recording
Wei-Hsiang Liao, Sony Corporation - Tokyo, Japan; Yuki Mitsufuji, Sony Corporation - Tokyo, Japan; Keiichi Osako, Sony Corporation - Tokyo, Japan; Kazunobu Ohkuri, Sony Corporation - Tokyo, Japan
Sound field recording with arrays made of omnidirectional microphones suffers from an ill-conditioned problem due to the zero and small values of the spherical Bessel function. This article proposes a geometric design of a microphone array for broadband two dimensional (2D) sound field recording and reproduction. The design is parametric, with a layout having a discrete rotationally symmetric geometry composed of several geometrically similar subarrays. The actual parameters of the proposed layout can be determined for various acoustic situations to give optimized results. This design has the advantage that it simultaneously satisfies many important requirements of microphone arrays such as error robustness, operating bandwidth, and microphone unit efficiency.
Convention Paper 10082 (Purchase now)

P10-4 Risk of Sound-Induced Hearing Disorders for Audio Post Production Engineers: A Preliminary Study
Laura Sinnott, City University of New York / Sensaphonics - Chicago & NYC; Barbara Weinstein, City University of New York - New York, NY, USA
In this preliminary study, sound dosimetry measurements were conducted at film studios to assess whether audio post-production engineers are at risk for sound-induced hearing loss. Additionally, we measured 23 engineers’ hearing thresholds and assessed their self-perception of hearing disorders via a new questionnaire. Our results show that most participants had at least one audiometric notch, which is an early indicator of noise-induced hearing loss, and most reported experiencing hearing disorders such as tinnitus. Dosimetry suggested that sound levels pose a low risk of permanent hearing loss according to NIOSH criteria, but these criteria are not protective for disorders such as tinnitus, cochlear synaptopathy or even early threshold shifts. We recommend routine hearing evaluations and the use of hearing protection to maintain healthy hearing.
Convention Paper 10083 (Purchase now)


P11 - Applications in Audio

Thursday, October 18, 2:45 pm — 4:15 pm (Poster Area)

P11-1 Optimal Exciter Array Placement for Flat-Panel Loudspeakers Based on a Single-Mode, Parallel-Drive Layout
David Anderson, University of Pittsburgh - Pittsburgh, PA, USA; Michael Heilemann, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
Flat-Panel Loudspeakers are most effective at reproducing audio non-directionally when operating at frequencies with many overlapping modes. Frequency regions with low modal overlap produce directional acoustic radiation, long decay times, as well as sharp peaks and notches in pressure. Exciter arrays re-enable use of these frequency regions by restricting structural excitation to a single mode until the frequency region of high modal overlap. An optimization method is described here for determining the placement of exciters such that they are all driven by a single amplifier yet only excite a single structural mode. Experimental results are reported for an acrylic panel with 4, 8, and 11 exciters that demonstrate successful operation of the exciter arrangements.
Convention Paper 10084 (Purchase now)

P11-2 Solar Powered Autonomous Node for Wireless Acoustic Sensor Networks Based on ARM Cortex M4
Alfredo Fernández-Toloba, University of Alcalá - Alcalá de Henares, Madrid, Spain; Héctor A. Sánchez-Hevia, University of Alcalá - Alcalá de Henares, Madrid, Spain; Rubén Espino-Sanjosé, University of Alcalá - Alcalá de Henares, Madrid, Spain; César Clares-Crespo, University of Alcalá - Alcalá de Henares, Madrid, Spain; Joaquín García-Gómez, University of Alcalá - Alcalá de Henares, Madrid, Spain; Roberto Gil-Pita, University of Alcalá - Alcalá de Henares, Madrid, Spain
The project aims to show the hardware and software of a solar powered autonomous node for wireless acoustic sensor networks based on ARM Cortex M4. The device consists of the following components: a microcontroller, four microphones for audio processing, a radio frequency communication module, a microSD to store and read data, four buzzers to emit a sound, a GPS, and a temperature sensor. Furthermore, the device can be powered by a battery or a solar panel. The device is characterized by low consumption and a small size.
Convention Paper 10085

P11-3 Vibrational Contrast Control for Local Sound Source Rendering on Flat Panel Loudspeakers
Ziqing Li, Institute of Acoustics, Chinese Academy of Sciences - Beijing, China; Pingzhan Luo, Institute of Acoustics, Chinese Academy of Sciences - Beijing, China; Chengshi Zheng, Institute of Acoustics, Chinese Academy of Sciences - Beijing, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China
A vibrational contrast control method is proposed for two-dimensional audio display on a thin flat panel. It is based on maximizing the contrast of the average kinetic energy of the transverse motion between the radiated zone and the total zone. With the measured mobility matrix from the actuators to the measurement points, the optimal filter for each actuator is obtained from the solution of the maximization problem. The proposed method does not need to estimate either the natural frequencies or the mode shapes and is therefore easy to implement. Experimental results on an aluminum plate show that the proposed method can achieve a high contrast level over a wide frequency range.
Convention Paper 10086

P11-4 Quantifying Listener Preference of Flat-Panel Loudspeakers
Michael Heilemann, University of Rochester - Rochester, NY, USA; David Anderson, University of Pittsburgh - Pittsburgh, PA, USA; Stephen Roessner, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
We present a perceptual evaluation of flat-panel loudspeakers derived from anechoic amplitude response measurements. Seventy measurements were used to formulate frequency response curves for each loudspeaker characterizing the effects of listener position and in-room reflections. A model developed by Olive [1] was applied to the response curves to predict a preference rating for each loudspeaker. A commercial flat-panel speaker and a flat-panel speaker with a modal crossover network enabled/disabled were measured along with a conventional speaker. The modal crossover speaker scored over 10 points higher than the other flat-panel speakers and displayed a smooth low-frequency response. For flat-panel loudspeakers to produce preference ratings comparable with conventional speakers, structural improvements must be made to reduce the narrow-band deviation at high frequencies.
Convention Paper 10087

P11-5 Virtual Venues—An All-Pass Based Time-Variant Artificial Reverberation System for Automotive Applications
Friedrich von Türckheim, Harman Becker Automotive Systems GmbH - Munich, Germany; Adrian von dem Knesebeck, Harman Becker Automotive Systems GmbH - Munich, Germany; Tobias Münch, Harman International - Munich, Germany
This paper presents an artificial reverberation system for automotive applications. The core reverberation algorithm is based on a time-variant all-pass filter and delay network. It supports an arbitrary number of de-correlated ambience channels and enables the creation of individual direction-dependent early reflection patterns for each output channel. Incorporating de-reverberation technology and microphones, the system allows for 3D direct-ambience upmixing of stereo content simulating the acoustics of existing concert halls as well as for actively modifying and improving the acoustics of car interiors. Ambisonics measurements of real rooms and concert halls serve as the starting point for designing the virtual rooms. Adaptive effect stabilization guarantees a consistent spatial impression in the presence of masking driving noise.
Convention Paper 10088

P11-6 Audio Portraiture – The Sound of Identity, an Indigenous Artistic Enquiry
Maree Sheehan, Auckland University of Technology - Auckland, New Zealand
To date, the potential of 3D immersive and binaural sound technologies has not been applied to audio portraiture nor considered as a means of approaching and expressing indigenous identity. This paper looks at an artistic, practice-led study that uses 3D immersive and binaural sound technologies to create audio portraits and depictions of indigenous Maori women (wahine) from New Zealand/Aotearoa. This enquiry is part of my doctoral research, which seeks to artistically interpret the identity and multiple-dimensionality of these women through sound. By multiple-dimensionality, I refer to historical, physical, cognitive, social, emotional, political, and spiritual dimensions of being.
Convention Paper 10089

P11-7 Precision Maximization in Anger Detection in Interactive Voice Response Systems
Inma Mohíno-Herranz, University of Alcalá - Alcalá de Henares, Madrid, Spain; Cosme Llerena-Aguilar, Sr., University of Alcalá - Alcala de Henares (Madrid), Spain; Joaquín García-Gómez, University of Alcalá - Alcalá de Henares, Madrid, Spain; Manuel Utrilla-Manso, University of Alcalá - Alcalá de Henares, Madrid, Spain; Manuel Rosa-Zurera, University of Alcalá - Alcalá de Henares, Madrid, Spain
Detection is usually carried out following the Neyman-Pearson criterion to maximize the probability of detection (true positive rate) while keeping the probability of false alarm (false positive rate) below a given threshold. When the classes are unbalanced, performance cannot be measured in terms of true positive and false positive rates alone, and further metrics, such as precision, must be introduced. “Anger detection” in Interactive Voice Response (IVR) systems is one application where precision is important. This paper presents a cost function for feature selection that maximizes precision in anger detection applications. The method has been tested on a real database obtained by recording calls managed by an IVR system, demonstrating its suitability.
Convention Paper 10090
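
The point about unbalanced classes in the abstract above can be made concrete with a small calculation. The sketch below is only an illustration of the standard definition of precision, not the cost function proposed in the paper; the operating point and the class prevalences are hypothetical.

```python
def precision(tpr, fpr, prevalence):
    """Precision of a detector from its operating point and class balance.

    tpr        -- true positive rate (probability of detection)
    fpr        -- false positive rate (probability of false alarm)
    prevalence -- fraction of all cases that are truly positive
    """
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        return 0.0  # the detector never fires; precision is undefined
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Same hypothetical operating point, two different class balances.
    for share in (0.5, 0.02):
        value = precision(tpr=0.9, fpr=0.05, prevalence=share)
        print(f"angry share {share:.2f} -> precision {value:.3f}")
```

With the operating point held fixed, precision drops sharply as the positive class becomes rare, which is why true positive and false positive rates alone do not describe performance on unbalanced data.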

P11-8 Automatic Guitar Tablature Transcription from Audio Using Inharmonicity Regression and Bayesian Classification
Jonathan Michelson, Electro-Harmonix / New Sensor Corporation - Queens, NY, USA; Richard Stern, Carnegie Mellon University - Pittsburgh, PA, USA; Thomas Sullivan, Carnegie Mellon University - Pittsburgh, PA, USA
We propose two new methods to classify guitar strings for automated tablature transcription using only monophonic audio. The first method estimates the linear regression of log-inharmonicities of guitar strings with respect to their pitches and assigns unseen notes to the strings whose means and variances maximize the probability of their measured inharmonicities. The second method, developed as a baseline, characterizes the inharmonicity distribution of each fretboard position as a normal probability density, and then similarly assigns unseen notes to the fretboard positions that maximize the likelihood of their observed inharmonicities. Results from the standard Real World Corpus of guitar recordings show that exploiting regressions generally improves accuracy compared to our baseline, while both achieve adequate performance in guitar-independent test scenarios.
Convention Paper 10091

P11-9 Harmonic Drum Design Based on Multi-Objective Shape Optimization
Adam Szwajcowski, AGH University of Science and Technology - Kraków, Poland; Adam Pilch, AGH University of Science and Technology - Krakow, Poland
A vast majority of drums used in music have round membranes. Sound produced by such an instrument is non-harmonic, so one cannot perceive its pitch clearly. The paper aims to present possibilities of fusing additive synthesis and multi-objective optimization in order to find a relatively simple shape for which a membrane could produce harmonic sound. The proposed approach is based on Multi-Objective Particle Swarm Optimization and uses an original shape parametrization method based on Fourier series. Harmonicity of a drum was assessed based on additive synthesis using solutions of the two-dimensional Helmholtz equation on irregular domains obtained by means of the finite difference method.
Convention Paper 10092

P11-10 Troubleshooting Resource Reservation in Audio Video Bridging Networks
Christoph Kuhr, Anhalt University of Applied Sciences - Köthen, Germany; Alexander Carôt, Anhalt University of Applied Sciences - Köthen, Germany
The research project fast-music investigates the requirements of an infrastructure that allows the 60 musicians of a conducted orchestra to rehearse via the public internet. Since a single server would not be able to receive the required number of interleaved audio and video streams, process them, and distribute them again in a reasonable amount of time, a scalable cloud concept is more promising. The real-time audio and video signal processing cloud, at the heart of a distributed live music session on the public internet, operates on an Audio Video Bridging network segment. Such a real-time processing cloud requires proper management of network resources. In this paper we present the concept for the processing cloud, evaluate the resource management, and discuss a troubleshooting strategy for stream reservation in Audio Video Bridging networks.
Convention Paper 10093

P11-11 The Sound Diffusion Simulation Software Basing on Finite-Difference Time-Domain Method
Kamil Piotrowski, AGH University of Science and Technology - Kraków, Poland; Adam Pilch, AGH University of Science and Technology - Krakow, Poland
The aim of the project was to create an application that allows users to simulate acoustic wave propagation according to given input parameters. The program was built in the MATLAB environment, and most of it was designed using the k-Wave toolbox, a package based on finite-difference time-domain (FDTD) calculations. The application enables users to create a heterogeneous medium and measure the sound pressure distribution in a simulated scenario. A separate program module performs time and frequency analysis of the obtained waveforms and lets the user visualize the results. The software also computes the directional diffusion coefficient d of defined sound diffusers in accordance with ISO 17497-2:2012, making the user independent of complex measurements in an anechoic chamber.
Convention Paper 10075
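
For readers unfamiliar with the directional diffusion coefficient d mentioned in the abstract above, the sketch below evaluates it from a set of receiver levels. It assumes the autocorrelation-based definition commonly associated with ISO 17497-2, with receivers that each cover an equal angular area; it is written in Python rather than MATLAB and is not the software described in the paper.

```python
def directional_diffusion_coefficient(levels_db):
    """Directional diffusion coefficient from polar-response levels.

    levels_db -- sound pressure levels (dB) at n receivers on an arc or
                 hemisphere, assumed to cover equal angular areas.
    Returns a value between 0 (all energy at one receiver) and
    1 (energy spread evenly over all receivers).
    """
    n = len(levels_db)
    if n < 2:
        raise ValueError("at least two receiver positions are required")
    energies = [10.0 ** (level / 10.0) for level in levels_db]
    total = sum(energies)
    sum_of_squares = sum(e * e for e in energies)
    return (total * total - sum_of_squares) / ((n - 1) * sum_of_squares)


if __name__ == "__main__":
    uniform = [60.0] * 37                  # hypothetical ideal diffuser
    specular = [60.0] + [30.0] * 36        # hypothetical single strong lobe
    print(directional_diffusion_coefficient(uniform))
    print(directional_diffusion_coefficient(specular))
```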


P12 - Transducers—Part 3

Friday, October 19, 9:00 am — 12:00 pm (1E11)

Alexander Voishvillo, JBL/Harman Professional Solutions - Northridge, CA, USA
Finn T. Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark

P12-1 A Stepped Acoustic Transmission Line Model of Interference Tubes for Microphones
Francesco Bigoni, Aalborg University - Copenhagen, Denmark; Finn T. Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark; Eddy Bøgh Brixen, EBB-consult - Smørum, Denmark; DPA Microphones - Allerød, Denmark
This paper presents an extension of the standing-wave model of interference tubes for microphones by Ono et al. The original model accounts for three acoustic parameters: tube length, tube radius, and constant acoustic conductance per unit length. Our extension allows a varying conductance per unit length along the side wall. The assumptions behind the extended model and its ability to predict the frequency response of interference tubes are validated through simulations and by fitting the model parameters to frequency response measurements of a tube with varying conductance per unit length, using two different mountings. Results suggest that a tube with varying conductance per unit length is most effective at attenuating the off-axis sound if the conductance per unit length is decreased towards the tail end of the tube.
Convention Paper 10094

P12-2 Improving Audio Performance of Microphones Using a Novel Approach to Generating 48 Volt Phantom Powering
Joost Kist, Phantom Sound B.V. - Amsterdam, Netherlands; TritonAudio - Alkmaar, Netherlands; Dan Foley, Audio Precision - Beaverton, OR, USA
The introduction of the 48-volt phantom powering circuit in 1966 led to IEC 61938:1996. Key elements of this powering circuit are the 6.81 kΩ precision resistors that are in parallel with the emitter-follower of the microphone preamplifier. These resistors act as a load on the emitter-follower that causes added distortion. A new approach is presented whereby an electronic circuit that acts as a high input-impedance current source, and therefore does not load the emitter-follower, is placed in series with these 6.81 kΩ resistors. By making this change, THD is decreased by 10 dB while also slightly improving the gain. Measurement results are presented comparing the audio performance of a conventional 48-volt phantom power circuit and this new circuit, along with circuit details.
Convention Paper 10095
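
To give a sense of scale for the conventional circuit discussed in the abstract above, the sketch below estimates the DC voltage left at the microphone once it draws current through the feed resistors. It assumes a conventional arrangement with two matched 6.81 kΩ resistors, one per signal leg, that share the supply current equally; it does not model the current-source circuit proposed in the paper, and the function name is hypothetical.

```python
FEED_RESISTOR_OHMS = 6810.0   # one resistor per signal leg (assumed matched)
SUPPLY_VOLTS = 48.0


def microphone_supply_voltage(total_current_a,
                              supply_v=SUPPLY_VOLTS,
                              feed_r=FEED_RESISTOR_OHMS):
    """DC voltage available at the microphone for a given current draw.

    The current is assumed to split equally between the two legs, so the
    two feed resistors act like a single resistor of feed_r / 2.
    """
    return supply_v - total_current_a * feed_r / 2.0


if __name__ == "__main__":
    for milliamps in (2.0, 5.0, 10.0):
        volts = microphone_supply_voltage(milliamps / 1000.0)
        print(f"{milliamps:4.1f} mA -> {volts:5.2f} V at the microphone")
```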

P12-3 Challenges and Best Practices for Microphone End-of-Line Testing
Gregor Schmidle, NTi Audio AG - Schaan, Liechtenstein; Mark Beach, Beach Dynamics - Cincinnati, OH, USA; Brian MacMillan, NTi Audio Inc. - Tigard, OR, USA
Due to the increasing use of microphones in many applications such as automotive or artificial intelligence, the demand for fast and reliable microphone test processes is growing. This paper covers various aspects of the design of an end-of-line microphone test system. A prevailing challenge is to properly control the sound source, as loudspeakers have a tendency to vary their performance due to many influences. The acoustic environment for the test must provide reproducible conditions and is ideally anechoic. Noise from outside must be damped across the measurement bandwidth, so that it doesn’t affect the results. Different testing requirements for various types of microphones are shown. Different methods for defining limit criteria are discussed.
Convention Paper 10096

P12-4 Shotgun Microphone with High Directivity by Extra-Long Acoustic Tube and Digital Noise Reduction
Yo Sasaki, NHK Science & Technology Research Laboratories - Tokyo, Japan; Kazuho Ono, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan
A prototype of a shotgun microphone having higher directivity than a conventional microphone has been developed to capture target sounds clearly. The shotgun microphone has a structure in which an acoustic tube is attached on a directional microphone capsule. The directivity pattern is formed by adjusting an acoustic resistor attached to orifices along the length of the tube. The prototype we developed has a 1-m long acoustic tube designed on the basis of a numerical calculation. It also includes additional microphone capsules and a digital signal processing circuit that reduce undesired acoustical signals arriving from directions other than the front. Measurements show that the developed shotgun microphone prototype achieves even higher directivity than conventional shotgun microphones.
Convention Paper 10097

P12-5 High Power Density for Class-D Audio Power Amplifiers Equipped with eGaNFETs
Andreas Stybe Petersen, Technical University of Denmark - Kgs. Lyngby, Denmark; Niels Elkjær Iversen, Technical University of Denmark - Kongens Lyngby, Denmark; Michael A. E. Andersen, Technical University of Denmark - Kgs. Lyngby, Denmark
This paper presents how to optimize the power density of Class-D audio power amplifiers. The main task is to ensure that the ratio between the ripple current and the continuous output current is larger than one. When this is satisfied, soft switching conditions are facilitated. Optimizing the amplifier power stage for soft switching while playing audio results in a more even distribution of the power dissipation between switching devices and filter inductors. Measured results on 150 Wrms test amplifiers equipped with eGaNFETs show that the power density can reach 14.3 W/cm³, with THD+N levels as low as 0.03%. Moreover, safe operating temperatures below 100°C are achieved when playing music with peak powers of 200 W. Compared to the state of the art, the power density of the amplifier module is improved by a factor of 2–3.
Convention Paper 10098

P12-6 Estimation of the Headphone "Openness" Based on Measurements of Pressure Division Ratio, Headphone Selection Criterion, and Acoustic Impedance
Roman Schlieper, Leibniz Universität Hannover - Hannover, Germany; Song Li, Leibniz Universität Hannover - Hannover, Germany; Stephan Preihs, Leibniz Universität Hannover - Hannover, Germany; Jürgen Peissig, Leibniz Universität Hannover - Hannover, Germany
The study presented here investigates and compares three different methods regarding their suitability for determining the relative openness of circumaural and supra-aural headphone types, namely: (1) the Pressure Division Ratio (PDR), (2) the Headphone Selection Criterion (HPC), and (3) the Acoustic Impedance Curve (AIC). Measurements were conducted using a custom-built acoustic impedance measuring tube and an artificial dummy head (KEMAR 45BC-12). The results show that the openness of headphones can be determined best by their low-frequency acoustic impedance curves. Estimations using PDR and HPC show large measurement variations, especially in the low-frequency range where the perceptual occlusion effect dominates. We introduce the Occlusion Index (OI), which characterizes the acoustical “openness” well and can possibly be used as a reliable indicator of the perceived headphone occlusion.
Convention Paper 10099


P13 - Acoustics and Live Sound

Friday, October 19, 1:15 pm — 4:15 pm (1E11)

P13-1 A Quiet Zone System, Optimized For Large Outdoor Events, Based on Multichannel FxLMS ANC
Daniel Plewe, Technical University of Denmark - Kgs. Lyngby, Denmark; Finn T. Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark; Jonas Brunskog, Technical University of Denmark - Kgs. Lyngby, Denmark
As part of the larger EU project MONICA (Horizon 2020), a local quiet zone system is being developed. This system provides a zone of quiet close to loud outdoor concerts in order to support communications or minimize the noise exposure of staff. Because the noise sources are the loudspeakers of the venue’s PA system, an ideal reference signal can be obtained from the sound engineer’s mixing console and used for feedforward active noise control. This paper presents a real-time application of the multichannel filtered-reference least mean square algorithm (MCFxLMS) and shows how it was designed, implemented, and tested under laboratory conditions.
Convention Paper 10104
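
The filtered-reference LMS idea named in the abstract above can be sketched in its simplest single-channel form. The example below is a textbook-style illustration, not the multichannel system from the paper: the primary and secondary paths are short hypothetical FIR filters, the secondary-path estimate is assumed to be perfect, and white noise stands in for the mixing-console reference signal.

```python
import random


def fir_output(coeffs, history):
    """FIR filter output for a newest-first sample history."""
    return sum(c * h for c, h in zip(coeffs, history))


def run_fxlms(n_samples=5000, n_taps=8, mu=0.1, seed=0):
    """Single-channel FxLMS; returns final weights and the error signal."""
    rng = random.Random(seed)
    primary = [0.0, 0.0, 0.8, 0.4]    # hypothetical path: PA -> error mic
    secondary = [0.0, 0.5, 0.25]      # hypothetical path: control speaker -> error mic
    secondary_est = list(secondary)   # assumption: perfectly identified

    weights = [0.0] * n_taps
    x_hist = [0.0] * max(n_taps, len(primary), len(secondary_est))
    y_hist = [0.0] * len(secondary)
    fx_hist = [0.0] * n_taps          # reference filtered by the path estimate
    errors = []

    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)                   # reference sample
        x_hist = [x] + x_hist[:-1]
        disturbance = fir_output(primary, x_hist)    # noise at the error mic
        y = fir_output(weights, x_hist)              # anti-noise drive signal
        y_hist = [y] + y_hist[:-1]
        error = disturbance - fir_output(secondary, y_hist)
        fx = fir_output(secondary_est, x_hist)
        fx_hist = [fx] + fx_hist[:-1]
        # LMS update using the filtered reference instead of the raw one.
        weights = [w + mu * error * f for w, f in zip(weights, fx_hist)]
        errors.append(error)
    return weights, errors


if __name__ == "__main__":
    final_weights, err = run_fxlms()
    early = sum(e * e for e in err[:200]) / 200
    late = sum(e * e for e in err[-200:]) / 200
    print(f"mean squared error: first 200 samples {early:.4f}, last 200 {late:.2e}")
```

Filtering the reference through the secondary-path estimate before the weight update is what distinguishes FxLMS from plain LMS; it compensates for the delay and coloration between the control loudspeaker and the error microphone.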

P13-2 Advancements in Propagation Delay Estimation of Acoustic Signals in an Audience Service for Live Events
Marcel Nophut, Leibniz Universität Hannover - Hannover, Germany; Robert Hupke, Leibniz Universität Hannover - Hannover, Germany; Stephan Preihs, Leibniz Universität Hannover - Hannover, Germany; Jürgen Peissig, Leibniz Universität Hannover - Hannover, Germany
In the course of the project PMSE-xG an audience service for live events – the Assistive Live Listening Service – was developed. The service uses supplementary augmented audio content presented through transparent headphones to enhance the traditional audio playback of a PA loudspeaker system. Augmented audio content and ambient sound are temporally aligned by the service. This paper proposes signal processing techniques to improve existing methods for estimating the propagation delay of acoustic signals in strongly reverberant environments. These refined methods aim to reduce the computational costs and allow the service to keep track of moving listeners. The proposed methods are evaluated based on realistic recordings of music and speech samples.
Convention Paper 10105
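
A baseline for the delay estimation problem described in the abstract above is to locate the peak of the cross-correlation between the clean feed and the microphone signal. The sketch below shows only that baseline on synthetic data, without reverberation; it is not the refined method proposed in the paper, and all names and signal parameters are hypothetical.

```python
import random


def estimate_delay(reference, received, max_lag):
    """Lag (in samples) that maximizes the cross-correlation.

    Both signals must have the same length; lags 0..max_lag are searched.
    """
    usable = len(reference) - max_lag
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(reference[n] * received[n + lag] for n in range(usable))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def simulate_received(true_delay, n_samples=2000, seed=1):
    """White-noise reference plus a delayed, attenuated, noisy copy."""
    rng = random.Random(seed)
    reference = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    received = [0.0] * n_samples
    for i in range(true_delay, n_samples):
        received[i] = 0.6 * reference[i - true_delay] + 0.1 * rng.uniform(-1.0, 1.0)
    return reference, received


if __name__ == "__main__":
    ref, rec = simulate_received(true_delay=37)
    lag = estimate_delay(ref, rec, max_lag=100)
    print(f"estimated delay: {lag} samples")
```

The exhaustive search costs on the order of the signal length times the number of lags per estimate, which hints at why reducing computational cost matters when listeners move and the delay has to be tracked continuously.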

P13-3 Perceptual Evaluation of an Augmented Audience Service under Realistic Live Conditions
Robert Hupke, Leibniz Universität Hannover - Hannover, Germany; Marcel Nophut, Leibniz Universität Hannover - Hannover, Germany; Stephan Preihs, Leibniz Universität Hannover - Hannover, Germany; Jürgen Peissig, Leibniz Universität Hannover - Hannover, Germany
With the usage of future 4G+/5G technologies in wireless equipment for “Programme Making and Special Events” (PMSE) new innovative live services and applications are possible. In this paper our novel “Assistive Live Listening Service” is presented. The service provides individualized additional augmented audio content to every single listener at a concert or voice-based live event by using an augmented reality audio headset that is able to provide both environmental sounds and supplemental audio content. A listening experiment was performed under realistic live conditions to investigate if the service enhances speech intelligibility without a loss of perceptual live experience. The results show that a perceptual enhancement is possible. Further steps to improve the service are discussed.
Convention Paper 10106

P13-4 Sound Field Control for Reduction of Noise from Outdoor Concerts
Franz Heuchel, Technical University of Denmark - Kgs. Lyngby, Denmark; Diego Caviedes Nozal, Technical University of Denmark - Kgs. Lyngby, Denmark; Finn T. Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark
We investigate sound field control based on the concept of sound zones for the mitigation of low-frequency noise from outdoor concerts to the surrounding area by adding secondary loudspeakers to the existing primary sound system. The filters for the secondary loudspeakers are the result of an optimization problem that minimizes the total sound pressure level of both primary and secondary loudspeakers in a sensitive area and the impact of the secondary loudspeakers on the audience area of the concert. We report results from three different experiments of increasing complexity and scale. The sound field control system reduced the sound pressure level in the dark zone by 10 dB on average below 1 kHz in a small-scale experiment in anechoic conditions, by up to 14 dB in a controlled large-scale open-air experiment, and by up to 6 dB in a pilot test at a music festival.
Convention Paper 10107

P13-5 Analysis of Piano Duo Tempo Changes in Varying Convolution Reverberation Conditions
James Weaver, Queen Mary University London - London, UK; Mathieu Barthet, Queen Mary University London - London, UK; Elaine Chew, Queen Mary University London - London, UK
Acoustic design of performance spaces often places the performer relatively low in the hierarchy of needs in comparison to the quality of sound for an audience, and while there are a number of studies relating to solo performers’ and symphony orchestras’ preferred acoustic environments, there is a paucity of literature on objective measurements of the impact of acoustic spaces on smaller ensembles. This study aims to build a methodology for analysis of changes in ensemble musical expression caused by different acoustic environments and extends previous research in the area of acoustics and musical performance.
Convention Paper 10108

P13-6 Performance and Installation Criteria for Assistive Listening Systems
Peter Mapp, Peter Mapp Associates - Colchester, Essex, UK
Approximately 12–15% of the population has a noticeable hearing loss and would benefit from a hearing aid or some form of assistive listening system. However, it is not generally appreciated that hearing aids have a limited operating distance and so are often used in conjunction with an assistive listening system. In recent years, as disability legislation has strengthened, the number of Assistive Listening System installations has dramatically increased, yet there is little guidance or standards available concerning the electroacoustic performance that these systems should meet. The paper reports novel research findings into the acoustic effectiveness of both hearing aids and assistive listening systems, reviews current and upcoming technologies, and sets out a number of potential performance guidelines and criteria.
Convention Paper 10109


P14 - Perception

Friday, October 19, 2:45 pm — 4:15 pm (Poster Area)

P14-1 Perception of Stereo Noise Bursts with Controlled Interchannel Coherence
Steven Crawford, University of Rochester - Rochester, NY, USA; Michael Heilemann, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
Lateralization in stereo-rendered acoustic fields with controlled interchannel cross-correlation properties was explored in subjective listening tests. Participants indicated the perceived lateral locations of a series of 2 ms stereo white noise bursts with specified interchannel cross-correlation properties. Additionally, participants were asked to indicate the spatial location and apparent source width of a series of 2 s white noise bursts composed of one thousand 2 ms bursts with specified interchannel cross-correlation. The distribution of peak locations in the signals’ cross-correlation corresponds to the perceived spatial extent of the auditory image. This illustrates the role of the averaging time in the short-time windowed cross-correlation model of binaural hearing and how the coherence properties of audio signals determine source image properties in spatial audio rendering.
Convention Paper 10110
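
One common way to obtain stereo noise with a specified interchannel correlation, as used for stimuli like those in the abstract above, is to mix two independent noise sources. The sketch below shows that standard construction; it is an assumption that it resembles the authors' stimulus generation, and the function names are hypothetical.

```python
import math
import random


def correlated_noise_pair(n_samples, rho, seed=0):
    """Two Gaussian noise channels whose expected correlation is rho."""
    rng = random.Random(seed)
    mix = math.sqrt(1.0 - rho * rho)
    left, right = [], []
    for _ in range(n_samples):
        a = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, 1.0)        # independent of a
        left.append(a)
        right.append(rho * a + mix * b)
    return left, right


def correlation(x, y):
    """Sample (zero-lag) correlation coefficient of two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    for target in (0.0, 0.5, 1.0):
        long_l, long_r = correlated_noise_pair(20000, target)
        short_l, short_r = correlated_noise_pair(96, target, seed=7)
        print(f"target {target:.1f}: long burst {correlation(long_l, long_r):+.3f}, "
              f"96-sample burst {correlation(short_l, short_r):+.3f}")
```

Over a short burst (96 samples is about 2 ms at 48 kHz) the measured correlation scatters around the target value, which relates to the role of averaging time raised in the abstract.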

P14-2 Analysis of the performance of Evolved Frequency Log-Energy Coefficients in Hearing Aids for different Cost Constraints and Scenarios
Joaquín García-Gómez, University of Alcalá - Alcalá de Henares, Madrid, Spain; Inma Mohíno-Herranz, University of Alcalá - Alcalá de Henares, Madrid, Spain; César Clares-Crespo, University of Alcalá - Alcalá de Henares, Madrid, Spain; Alfredo Fernández-Toloba, University of Alcalá - Alcalá de Henares, Madrid, Spain; Roberto Gil-Pita, University of Alcalá - Alcalá de Henares, Madrid, Spain
Hearing loss is a common problem among older people. Today's hearing aids compensate for these losses and improve their users' lives, but they come with important constraints (limited battery life and the need for real-time processing). Because of that, the algorithms implemented in these devices must work at low clock rates. Voice Activity Detection (VAD) is one of the main algorithms used in hearing aids, since it is useful for reducing environmental noise and enhancing speech intelligibility. In this paper a VAD algorithm is tested using the QUT-NOISE-TIMIT corpus, with different computational cost constraints and at different locations.
Convention Paper 10111

P14-3 Evaluation of Additional Virtual Sound Sources in a 9.1 Loudspeaker Configuration
Sungsoo Kim, New York University - New York, NY, USA; Sripathi Sridhar, New York University - New York, NY, USA
This study aims to evaluate the addition of virtual sound sources to a 9.1 loudspeaker configuration in terms of spatial attributes such as envelopment and sound image width. It is the second part of a previous study where different upmixing algorithms to convert stereo to a 9.1 mix were examined. Four virtual sound sources (VSS) are added to a 9.1 configuration to simulate virtual loudspeakers in the height layer with the help of Vector-Based Amplitude Panning (VBAP). A subjective test is conducted to determine whether listeners perceive an improvement due to the addition of VSS channels in the height layer.
Convention Paper 10112
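
The Vector-Based Amplitude Panning used in the study above can be illustrated in its simplest two-dimensional, two-loudspeaker form. The sketch below follows the usual formulation, in which the source direction is written as a gain-weighted sum of the loudspeaker direction vectors and the gains are normalized to constant power. The study itself pans in the height layer, which would require the three-loudspeaker 3D version; the angles used here are hypothetical.

```python
import math


def unit_vector(azimuth_deg):
    """Unit vector in the horizontal plane for a given azimuth."""
    angle = math.radians(azimuth_deg)
    return (math.cos(angle), math.sin(angle))


def vbap_2d_gains(source_az_deg, speaker_az_deg_pair):
    """Constant-power gains for a virtual source between two loudspeakers.

    Solves p = g1 * l1 + g2 * l2 for (g1, g2) by Cramer's rule, then
    scales the gains so that g1^2 + g2^2 = 1.
    """
    p = unit_vector(source_az_deg)
    l1 = unit_vector(speaker_az_deg_pair[0])
    l2 = unit_vector(speaker_az_deg_pair[1])
    det = l1[0] * l2[1] - l2[0] * l1[1]
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    g1 = (p[0] * l2[1] - l2[0] * p[1]) / det
    g2 = (l1[0] * p[1] - p[0] * l1[1]) / det
    norm = math.hypot(g1, g2)
    return (g1 / norm, g2 / norm)


if __name__ == "__main__":
    pair = (30.0, -30.0)   # hypothetical stereo pair
    for azimuth in (0.0, 15.0, 30.0):
        g_left, g_right = vbap_2d_gains(azimuth, pair)
        print(f"source at {azimuth:5.1f} deg: gains {g_left:.3f}, {g_right:.3f}")
```

A negative gain would indicate that the source lies outside the arc spanned by the chosen pair, in which case a different loudspeaker pair should be selected.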

P14-4 Noticeable Rate of Continuous Change of Intensity for Naturalistic Music Listening in Attentive and Inattentive Audiences
Yuval Adler, McGill University - Montreal, Quebec, Canada; CCRMA - Stanford, CA, USA
An investigation was done into the threshold of noticeability for a continuous rate of change in intensity and how listener attention affects this threshold rate. Results suggest listener attention has a strong effect on the threshold. Much previous work has been done to try and find the intensity discrimination threshold of human hearing involving comparison of consecutive stimuli differing in intensity but not with a constant change over a long time for a continuous stimulus. Exposure to high intensity sounds over time can damage hearing, so the driving goal behind this investigation is to inform development of a mechanism that will lower the intensity of sound listeners are subjected to when consuming music while having minimal effect on perceived loudness.
Convention Paper 10113

P14-5 Environment Replication with Binaural Recording: Three-Dimensional (3-D) Quadrant and Elevation Localization Accuracy
Joseph Erichsen, Belmont University - Nashville, TN, USA; Wesley Bulla, Belmont University - Nashville, TN, USA
The purpose of this experiment was to examine the spatial accuracy of an environmental image created via binaural recording. Listeners were asked to localize 10 sources, each positioned in one of four horizontal quadrants in three vertical planes. Binaural recordings were created in both anechoic and reverberant environments, and subjective tests were conducted. The experiment yielded data for a comparative study of the effectiveness of the binaural recording in recreating the perceived source locations in “3-D” space around a specified listening position. ANOVA for overall accuracy and target hit/miss binomial measures in the free field and with binaural recordings via headphone reproduction revealed areas of concern for future investigation as well as measures of relative accuracy for the experimental environments.
Convention Paper 10114

P14-6 Localization of Elevated Virtual Sources Using Four HRTF Datasets
Patrick Flanagan, THX Ltd. - San Francisco, CA, USA; Juan Simon Calle Benitez, THX Ltd. - San Francisco, CA, USA
At the core of spatial audio renderers are the HRTF filters that are used to virtually place sounds in space. There are different ways to calculate these filters, from acoustical measurements to digital calculations using images. In this paper we evaluate the localization of elevated sources using four different HRTF datasets. The datasets used are SADIE (York University), Kemar (MIT), CIPIC (UC Davis), and finally, a personalized dataset that uses an image-capturing technique in which features are extracted from the pinnae. Twenty subjects were asked to determine the location of randomly placed sounds by selecting the azimuth and the elevation from which they felt the sound was coming. It was found that elevation accuracy is better for HRTFs that are located near elevation = 0°. There was a tendency to under-aim and over-aim towards the area between 0° and 20° in elevation. A strong effect of elevation on azimuth localization was observed for sounds placed above 60°.
Convention Paper 10115


P15 - Audio Education

Saturday, October 20, 9:00 am — 10:30 am (1E12)

Elsa Lankford, Towson University - Towson, MD, USA

P15-1 Development and Evaluation of an Audio Signal Processing Educational Tool to Support Somatosensory Singing Control
Evangelos Angelakis, National and Kapodistrian University of Athens - Athens, Greece; George Kosteletos, National and Kapodistrian University of Athens - Athens, Greece; Areti Andreopoulou, Laboratory of Music Acoustics and Technology (LabMAT) National and Kapodistrian University of Athens - Athens, Greece; Anastasia Georgaki, National and Kapodistrian University of Athens - Athens, Greece
This paper discusses a newly designed educational tool called “Match Your Own Voice!,” which aims at complementing modern vocal pedagogy. The software addresses the perceptual challenges faced by aspiring singers and the need for more objective unsupervised practice methods and quantified reference points to guide students through their training course. The tool is a real-time visual feedback application that uses lessons with a professional vocal coach as a reference for the students’ unassisted practice guidance. It is designed for use on portable computers. The conducted longitudinal study evaluating the tool’s effect on the students’ practice accuracy showed promising results.
Convention Paper 10116

P15-2 Future Educational Goals for Audio Recording and Production (ARP) Programs: A Decade of Supporting Research
Doug Bielmeier, Northeastern University - Boston, MA, USA
This paper presentation reviews research collected over the last decade to forecast best practices and goals for future educators and students. A collective mandate of Audio Recording and Production (ARP) educational programs is to provide students with foundational skills/theory to begin a career in the recording and entertainment industry. This paper draws connections between a series of surveys designed to elicit many perspectives of people involved in the education and application of ARP programs, educators in ARP programs, and students enrolled in ARP programs. The review will provide insight and actionable items for continued innovations in ARP educational institutions and the maintenance of a healthy relationship between Industry and ARP programs for employees, employers, educators, and students.
Convention Paper 10117

P15-3 Case Study: An Interdisciplinary Audio Curriculum
Elsa Lankford, Towson University - Towson, MD, USA; Adam Schwartz, Towson University - Towson, MD, USA; Goucher College
Audio is interdisciplinary in nature, connecting the disciplines of music, radio, broadcasting, film, science, liberal and fine arts. Audio students benefit from a wide, multi-disciplined curricular approach. Towson University’s Radio/Audio concentration in the Electronic Media and Film department has evolved to cover a range of topics and disciplines to better prepare students for a constantly changing professional environment.
Convention Paper 10118


P16 - Spatial Audio

Saturday, October 20, 10:30 am — 12:00 pm (Poster Area)

P16-1 Spatial Audio Coding with Backward-Adaptive Singular Value Decomposition
Sina Zamani, University of California Santa Barbara - Santa Barbara, CA, USA; Kenneth Rose, University of California Santa Barbara - Santa Barbara, CA, USA
The MPEG-H 3D Audio standard applies singular value decomposition (SVD) to higher-order ambisonics data, and divides the outcome into prominent and ambient sound components, which are then separately encoded. We recently showed that significant compression gains are achievable by moving the SVD to the frequency domain, and ensuring smooth transition between frames. Frequency domain SVD also enables SVD adaptation to frequency, but the increase in side information, to specify additional basis vectors, compromises the gains. This paper overcomes this shortcoming by introducing backward adaptive estimation of SVD basis vectors, at no cost in side information, thereby approaching the full potential of frequency domain SVD. Objective and subjective tests show considerable gains that validate the effectiveness of the proposed approach.
Convention Paper 10119

P16-2 Virtual Source Reproduction Using Two Rigid Circular Loudspeaker Arrays
Yi Ren, University of Electro-Communications - Tokyo, Japan; Yoichi Haneda, The University of Electro-Communications - Chofu-shi, Tokyo, Japan
In this paper a virtual sound source reproduction method is proposed using two circular loudspeaker arrays with rigid baffles. This study aims to reproduce virtual sources in front of, or outside, the loudspeaker arrays, with each array considered as an infinite-length rigid cylinder with loudspeakers attached to its surface. Transfer functions that consider the reflection between the two arrays are introduced, and the appropriate reflection times to be used in the transfer function are discussed. Using the pressure-matching method and circular harmonic expansion, several methods are proposed and compared via computer simulation.
Convention Paper 10120

P16-3 Design and Implementation of a Binaural Reproduction Controller Applying Output Tracking Control
Atsuro Ito, NHK Science & Technology Research Laboratories - Tokyo, Japan; Kentaro Matsui, NHK Science & Technology Research Laboratories - Setagaya, Tokyo, Japan; Kazuho Ono, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Hisao Hattori, Sharp Corporation - Japan; Takeaki Suenaga, Sharp Corporation - Japan; Kenichi Iwauchi, Sharp Corporation - Japan; Shuichi Adachi, Keio University - Yokohama-shi, Kanagawa, Japan
We have been studying a design method for a controller for binaural reproduction with loudspeakers. The gain of the controller amplifies errors due to external disturbances and system perturbations, and this leads to deterioration of the sound quality. Therefore, the gain should be kept as low as possible. For this purpose, we formulate the design of the controller as a minimization problem of the gain, in which the H∞ norm of the controller is adopted as a measure of the gain. In this article we also introduce a binaural reproduction system as an implementation example. This system virtually reproduces multichannel audio such as 22.2 multichannel audio using line array loudspeakers.
Convention Paper 10121
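
As a rough illustration of the gain measure named in the P16-3 abstract, the sketch below approximates the H-infinity norm of a single-channel FIR filter as the peak of its magnitude response over a dense frequency grid. This is a simplification and not the authors' design method: their controller is multichannel, for which the norm would be the peak over frequency of the largest singular value of the transfer matrix. The function name `hinf_norm_fir` and the tap values are hypothetical.

```python
import cmath
import math


def hinf_norm_fir(taps, grid=4096):
    """Approximate the H-infinity norm of a single-channel FIR filter with
    real taps: the peak of |H(e^{jw})| sampled on a grid over [0, pi]."""
    peak = 0.0
    for k in range(grid + 1):
        w = math.pi * k / grid
        # Frequency response H(e^{jw}) = sum_n taps[n] * e^{-jwn}
        response = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(taps))
        peak = max(peak, abs(response))
    return peak


# Hypothetical controller taps, chosen only for illustration.
controller_taps = [1.0, -0.9, 0.4]
print("approximate peak gain:", hinf_norm_fir(controller_taps))
```

A grid search only bounds the true peak from below; a finer grid narrows the gap for sharply resonant responses.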

P16-4 Horizontal Binaural Signal Generation at Semi-Arbitrary Positions Using a Linear Microphone Array
Asuka Yamazato, University of Electro-Communications - Tokyo, Japan; Yoichi Haneda, The University of Electro-Communications - Chofu-shi, Tokyo, Japan
Binaural technology using a dummy head is a powerful technique to provide realistic sound reproduction through headphones. To obtain the binaural signals as if a listener moves around in the sound field, we need to move the dummy head. A promising way to overcome this problem is to convert signals observed by a microphone array into binaural signals at arbitrary positions. In this paper we aim to reproduce horizontal binaural signals at semi-arbitrary listener positions using linear microphone array signals based on the inverse wave propagation method with spatial oversampling and a simulated head-related transfer function (HRTF) directivity pattern. We perform computer simulations and listening experiments in a reverberant room. A listening test is performed for two cases to verify the performance of the sound localization (case-I) and distance perception (case-II). We confirm that the binaural signals obtained by the proposed method are almost expressed by the HRTF directivity pattern. From the results of case-I, we find that the angle errors of sound localization range from 1.2° to 4.2°. According to the results of case-II, the subjects can perceive a distance change of the virtual sound image when the auditory stimulus is white noise.
Convention Paper 10122

P16-5 Near-Field Compensated Higher-Order Ambisonics Using a Virtual Source Panning Method
Tong Wei, Institute of Acoustics, Chinese Academy of Sciences - Beijing, China; Jinqiu Sang, Institute of Acoustics, Chinese Academy of Sciences - Beijing, China; Chengshi Zheng, Institute of Acoustics, Chinese Academy of Sciences - Beijing, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China
The commonly adopted higher order ambisonics (HOA) mainly concentrates on far-field sources and neglects the rendering of near-field sources. Some studies have introduced near-field compensated HOA (NFC-HOA) to preserve the original spherical wave front curvature, but it requires a large number of loudspeakers. It is worthwhile to combine the advantages of a physical reproduction approach with a hearing-related model approach to avoid using many loudspeakers in a regular arrangement. In this paper an all-around virtual source panning method is proposed to improve the driving functions of NFC-HOA with panning functions. In this way, a near-field sound source encoded in HOA can be rendered to an arbitrary arrangement of only a few loudspeakers. Both the simulation and experimental results show the validity of the proposed method.
Convention Paper 10123

P16-6 Subjective Evaluation of Virtual Room Auralization System Based on the Ambisonics Matching Projection Decoding Method
Zhongshu Ge, Peking University - Beijing, China; Yue Qiao, Peking University - Beijing, China; Shusen Wang, AES (Beijing) Science & Technology Co., Ltd. - Beijing, China; Xihong Wu, Peking University - Beijing, China; Tianshu Qu, Peking University - Beijing, China
Based on the higher order Ambisonics theory, a loudspeaker-based room auralization system was implemented in this paper in combination with a room acoustics computer model. In the decoding part of the Ambisonics technique, the generally used mode-matching decoding method requires a uniformly arranged loudspeaker array, a requirement that sometimes cannot be satisfied. A recently proposed method that can solve this problem, the matching projection decoding method, was introduced in the room auralization system to realize reproduction of room reverberation with non-uniform loudspeaker arrays. Moreover, the performance of the matching projection method was evaluated objectively through room impulse response reconstruction analysis. In addition, the room auralization system was validated through subjective experiments.
Convention Paper 10124

P16-7 A Study of the Effect of Head Rotation on Transaural Reproduction
Marcos Simón, University of Southampton - Southampton, UK; Eric Hamdan, University of Southampton - Southampton, UK; Dylan Menzies, University of Southampton - Southampton, UK; Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK
The reproduction of binaural audio through loudspeakers, also commonly referred to as Transaural audio, allows for the rendering of immersive virtual acoustic images when the original binaural signal is accurately delivered to the listener’s ears. Such accurate reproduction is generally achieved by using a network of cross-talk-cancellation filters designed for a given listener’s position and orientation. This work studies the effect of small rotational movements of the listener’s head on the perceived location of a virtual sound source when the binaural signal is reproduced using an array of loudspeakers. The results of numerical simulations presented in this paper describe how the perceived virtual source position is affected by the variation of the head orientation.
Convention Paper 10125
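
To make the idea of cross-talk cancellation filters designed for one head orientation concrete, here is a toy single-frequency sketch related to the P16-7 abstract. It assumes a free-field point-source model with no head shadowing, two loudspeakers, and two ears 18 cm apart; the geometry, the 2 kHz frequency, and all function names are illustrative assumptions and do not come from the paper. The filters are the inverse of the 2x2 plant matrix at 0° yaw, and the residual leakage is then evaluated after the head turns.

```python
import cmath
import math

C_SOUND = 343.0  # m/s, assumed speed of sound


def plant(freq, speakers, ears):
    """2x2 free-field transfer matrix: rows are ears, columns are loudspeakers."""
    k = 2 * math.pi * freq / C_SOUND
    return [[cmath.exp(-1j * k * math.dist(e, s)) / math.dist(e, s)
             for s in speakers] for e in ears]


def invert2(m):
    """Inverse of a 2x2 complex matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def ear_positions(yaw_deg, radius=0.09):
    """Left and right ear positions for a head at the origin turned by yaw_deg."""
    yaw = math.radians(yaw_deg)
    left = (-radius * math.cos(yaw), -radius * math.sin(yaw))
    right = (radius * math.cos(yaw), radius * math.sin(yaw))
    return [left, right]


def crosstalk_ratio(yaw_deg, freq=2000.0, speakers=((-0.5, 2.0), (0.5, 2.0))):
    """|leakage| / |wanted signal| at the left ear after the head turns by
    yaw_deg, using filters designed once for yaw = 0."""
    filters = invert2(plant(freq, speakers, ear_positions(0.0)))
    achieved = matmul2(plant(freq, speakers, ear_positions(yaw_deg)), filters)
    return abs(achieved[0][1]) / abs(achieved[0][0])


for yaw in (0.0, 5.0, 10.0):
    print(f"yaw {yaw:4.1f} deg: crosstalk ratio {crosstalk_ratio(yaw):.2e}")
```

At the design orientation the leakage is zero up to rounding error; any rotation leaves a residual, which is the kind of degradation the paper investigates with a far more complete model.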

P16-8 A Parametric Spatial Audio Coding Method Based on Convolutional Neural Networks
Qingbo Huang, Peking University - Beijing, China; Xihong Wu, Peking University - Beijing, China; Tianshu Qu, Peking University - Beijing, China
Channel-based 3D audio can be compressed to a down-mix signal with side information. In this paper the inter-channel transfer functions (ITF) are estimated by training convolutional neural networks (CNNs) to overfit a specific frame. The target of the estimation is to reconstruct the original channels perfectly while keeping the spatial cues the same. By taking this approach, more accurate spatial cues are maintained. Subjective evaluation experiments were carried out on stereo signals to evaluate the proposed method.
Convention Paper 10126


P17 - Semantic Audio

Saturday, October 20, 1:30 pm — 3:30 pm (1E11)

Rachel Bittner, New York University - New York, NY, USA

P17-1 Audio Forensic Gunshot Analysis and Multilateration
Robert C. Maher, Montana State University - Bozeman, MT, USA; Ethan Hoerr, Montana State University - Bozeman, MT, USA
This paper considers the opportunities and challenges of acoustic multilateration in gunshot forensics cases. Audio forensic investigations involving gunshot sounds may consist of multiple simultaneous but unsynchronized recordings obtained in the vicinity of the shooting incident. The multiple recordings may provide information useful to the forensic investigation, such as the location and orientation of the firearm, and if multiple guns were present, addressing the common question “who shot first?” Sound source localization from multiple recordings typically employs time difference of arrival (TDOA) estimation and related principles known as multilateration. In theory, multilateration can provide a good estimate of the sound source location, but in practice acoustic echoes, refraction, diffraction, reverberation, noise, and spatial/temporal uncertainty can be confounding.
Convention Paper 10100
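
The TDOA principle mentioned in the P17-1 abstract can be sketched in a few lines. The example below is an idealized, noise-free 2-D case under assumed conditions (synchronized microphones at known positions, a constant speed of sound, no echoes), which are exactly the conditions the abstract warns rarely hold in practice. The microphone layout, the source position, and the function names are hypothetical, and a simple multi-resolution grid search stands in for whatever solver a real investigation would use.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C


def tdoas(source, mics):
    """Arrival-time differences of each microphone relative to the first one."""
    times = [math.dist(source, m) / SPEED_OF_SOUND for m in mics]
    return [t - times[0] for t in times[1:]]


def locate(measured, mics, lo=-50.0, hi=50.0, coarse=1.0, refinements=4):
    """Grid search for the 2-D position whose TDOAs best match `measured`."""
    def cost(p):
        return sum((a - b) ** 2 for a, b in zip(tdoas(p, mics), measured))

    cx = cy = (lo + hi) / 2
    half, step = (hi - lo) / 2, coarse
    best = (cx, cy)
    for _ in range(refinements + 1):
        n = int(round(half / step))
        candidates = [(cx + i * step, cy + j * step)
                      for i in range(-n, n + 1) for j in range(-n, n + 1)]
        best = min(candidates, key=cost)
        cx, cy = best
        half, step = 2 * step, step / 10  # zoom in around the current best
    return best


# Hypothetical scene: four microphones and one source, coordinates in metres.
mics = [(0.0, 0.0), (40.0, 0.0), (35.0, 30.0), (-5.0, 25.0)]
true_source = (12.3, 17.6)
estimate = locate(tdoas(true_source, mics), mics)
print("estimated source position:", estimate)
```

With measured rather than simulated delays, the residual cost at the best position gives a first indication of how badly echoes, noise, and timing uncertainty have corrupted the estimate.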

P17-2 Speech Classification for Acoustic Source Localization and Tracking Applications Using Convolutional Neural Networks
Jonathan D. Ziegler, Stuttgart Media University - Stuttgart, Germany; Eberhard Karls University Tübingen - Tübingen, Germany; Andreas Koch, Stuttgart Media University - Stuttgart, Germany; Andreas Schilling, Eberhard Karls University Tuebingen - Tuebingen, Germany
Acoustic Source Localization and Speaker Tracking are continuously gaining importance in fields such as human computer interaction, hands-free operation of smart home devices, and telecommunication. A set-up using a Steered Response Power approach in combination with high-end professional microphone capsules is described and the initial processing stages for detection angle stabilization are outlined. The resulting localization and tracking can be improved in terms of reactivity and angular stability by introducing a Convolutional Neural Network for signal/noise discrimination tuned to speech detection. Training data augmentation and network architecture are discussed; classification accuracy and the resulting performance boost of the entire system are analyzed.
Convention Paper 10101

P17-3 Supervised Source Localization Using Spot Microphones
Miguel Ibáñez Calvo, International Audio Laboratories Erlangen - Erlangen, Germany; Maria Luis Valero, International Audio Laboratories Erlangen - Erlangen, Germany; Emanuël A. P. Habets, International Audio Laboratories Erlangen - Erlangen, Germany
Spatial microphones are used to acquire sound scenes, while spot microphones are commonly used to acquire individual sound sources with high quality. These recordings are essential when producing spatial audio upmixes. However, to automatically create upmixes, or to assist audio engineers in creating these, information about the position of the sources in the scene is also required. We propose a supervised sound source localization method to estimate the direction-of-arrival (DOA) of several simultaneously active sound sources in reverberant and noisy environments that utilizes several spot microphones and a single spatial microphone. The proposed method employs system identification techniques to estimate the relative impulse responses between each spot microphone and the spatial microphone from which the DOAs can be extracted.
Convention Paper 10102

P17-4 Multichannel Fusion and Audio-Based Features for Acoustic Event Classification
Daniel Krause, AGH University of Science and Technology - Kraków, Poland; Fitech; Konrad Kowalczyk, AGH University of Science and Technology - Kraków, Poland
Acoustic event classification is of interest for various audio applications. The aim of this paper is to investigate the usage of a number of speech and audio based features in the task of acoustic event classification. Several features that originate from audio signal analysis are compared with features typically used in speech processing such as mel-frequency cepstral coefficients (MFCCs). In addition, the approaches to fuse the information obtained from multichannel recordings of an acoustic event are investigated. Experiments are performed using a Gaussian mixture model (GMM) classifier and audio signals recorded using several scattered microphones.
Convention Paper 10103


P18 - Spatial Audio-Part 2 (Evaluation)

Saturday, October 20, 1:30 pm — 4:00 pm (1E12)

Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA

P18-1 Prediction of Binaural Lateralization Percepts from the Coherence Properties of the Acoustic Wavefield
Mark F. Bocko, University of Rochester - Rochester, NY, USA; Steven Crawford, University of Rochester - Rochester, NY, USA; Michael Heilemann, University of Rochester - Rochester, NY, USA
A framework is presented that employs the space-time coherence properties of acoustic wavefields to compute features corresponding to listener percepts in binaural localization. The model employs a short-time windowed cross-correlator to compute a sequence of interaural time differences (ITDs) from the binaurally-sampled acoustic wavefield. The centroid of the distribution of this sequence of measurements indicates the location of the virtual acoustic source and the width determines the perceived spatial extent of the source. The framework provides a quantitative method to objectively assess the performance of various free-space and headphone-based spatial audio rendering schemes and thus may serve as a useful tool for the analysis and design of spatial audio experiences in VR/AR and other spatial audio systems.
Convention Paper 10127
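
The processing chain described in the P18-1 abstract (short-time cross-correlation, a sequence of ITDs, then the centroid and width of their distribution) can be sketched as follows. This is a generic illustration, not the authors' model: the frame length, hop size, lag range, the use of the standard deviation as the width, and the synthetic test signal (white noise with a pure 12-sample interaural delay) are all assumptions.

```python
import random
import statistics


def frame_lag(left, right, max_lag):
    """Lag in samples that maximises the cross-correlation of one frame.
    A positive lag means the right-ear signal is delayed relative to the left."""
    def xcorr(lag):
        return sum(left[n] * right[n + lag]
                   for n in range(max_lag, len(left) - max_lag))
    return max(range(-max_lag, max_lag + 1), key=xcorr)


def itd_statistics(left, right, fs, frame_len=256, hop=128, max_lag=20):
    """Centroid (mean) and width (standard deviation) of per-frame ITDs, in seconds."""
    itds = []
    for start in range(0, len(left) - frame_len + 1, hop):
        l = left[start:start + frame_len]
        r = right[start:start + frame_len]
        itds.append(frame_lag(l, r, max_lag) / fs)
    return statistics.mean(itds), statistics.pstdev(itds)


# Synthetic binaural pair: the right ear receives the left-ear noise 12 samples later.
random.seed(0)
fs = 48000
delay = 12
source = [random.gauss(0.0, 1.0) for _ in range(4096 + delay)]
left = source[delay:]
right = source[:-delay]

centroid, width = itd_statistics(left, right, fs)
print("ITD centroid (s):", centroid, "width (s):", width)
```

For a single dry source the per-frame ITDs cluster tightly; reverberant or decorrelated signals would spread the distribution, which is what the abstract relates to perceived spatial extent.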

P18-2 Influence of Visual Content on the Perceived Audio Quality in Virtual Reality
Olli Rummukainen, Fraunhofer IIS - Erlangen, Germany; Jing Wang, Beijing Institute of Technology - Beijing, China; Zhitong Li, Beijing Institute of Technology - Beijing, China; Thomas Robotham, International Audio Laboratories Erlangen - Erlangen, Germany; Zhaoyu Yan, Beijing Institute of Technology - Beijing, China; Zhuoran Li, Beijing Institute of Technology - Beijing, China; Xiang Xie, Beijing Institute of Technology - Beijing, China; Frederik Nagel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; International Audio Laboratories - Erlangen, Germany; Emanuël A. P. Habets, International Audio Laboratories Erlangen - Erlangen, Germany
To evoke a place illusion, virtual reality builds upon the integration of coherent sensory information from multiple modalities. This integrative view of perception could be contradicted when quality evaluation of virtual reality is divided into multiple uni-modal tests. We show the type and cross-modal consistency of visual content to affect overall audio quality in a six-degrees-of-freedom virtual environment with expert and naïve participants. The effect is observed both in their movement patterns and direct quality scores given to three real-time binaural audio rendering technologies. Our experiments show that the visual content has a statistically significant effect on the perceived audio quality.
Convention Paper 10128

P18-3 HRTF Individualization: A Survey
Corentin Guezenoc, Centrale-Supélec - Rennes, France; 3D Sound Labs - Rennes, France; Renaud Seguier, Centrale-Supélec - Rennes, France
The individuality of head-related transfer functions (HRTFs) is a key issue for binaural synthesis. Although a lot of work has been done over the years to propose end-user-friendly solutions to HRTF personalization, it remains a challenge. In this article we review the state of the art of that work. We classify the various proposed methods, review their respective advantages and disadvantages, and, above all, methodically check if and how the perceptual validity of the resulting HRTFs was assessed.
Convention Paper 10129

P18-4 Spatial Auditory-Visual Integration: The Case of Binaural Sound on a Smartphone
Julian Moreira, Cnam (CEDRIC) - Paris, France; Orange Labs - Lannion, France; Laetitia Gros, Orange - Lannion, France; Rozenn Nicol, Orange Labs - Lannion, France; Isabelle Viaud-Delmon, IRCAM - CNRS - Paris, France
Binaural rendering is a technology for spatialized sound that can be advantageously coupled with the visual of a mobile phone. By rendering the auditory scene out of the screen, all around the user, it is a potentially powerful tool of immersion. However, this audio-visual association may lead to specific perception artifacts. One of them is the ventriloquist effect, i.e., the perception of a sound and an image as if they come from the same location, while they are actually at different places. We investigate the conditions under which this effect occurs using an experimental method called Point of Subjective Spatial Alignment (PSSA). Given the position of a visual stimulus, we determine the integration window, i.e., the range of locations in the horizontal plane where the auditory stimulus is perceived as matching the visual stimulus location. Several parameters are varied: semantic type of the stimuli (neutral or meaningful) and sound elevation (same elevation as the visual or above the subject’s head). Results reveal the existence of an integration window in all cases. But, surprisingly, the sound is attracted by the visual as located in the virtual scene, rather than its real location on screen. We interpret it as a mark of immersion. Besides, we observe that the integration window is not altered by elevation, provided that stimuli are semantically meaningful.
Convention Paper 10130

P18-5 Online vs. Offline Multiple Stimulus Audio Quality Evaluation for Virtual Reality
Thomas Robotham, International Audio Laboratories Erlangen - Erlangen, Germany; Olli Rummukainen, Fraunhofer IIS - Erlangen, Germany; Jürgen Herre, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany; Emanuël A. P. Habets, International Audio Laboratories Erlangen - Erlangen, Germany
Virtual reality technology incorporating six degrees-of-freedom introduces new challenges for the evaluation of audio quality. Here, a real-time “online” evaluation platform is proposed, allowing multiple stimulus comparison of binaural renderers within the virtual environment, to perceptually evaluate audio quality. To evaluate the sensitivity of the platform, tests were conducted using the online platform with audiovisual content, and two traditional platforms with pre-rendered “off-line” audiovisual content. Conditions employed had known relative levels of impairments. A comparison of the results across platforms indicates that only the proposed online platform produced results representative of the known impaired audio conditions. Off-line platforms were found to be not sufficient in detecting the tested impairments for audio as part of a multi-modal virtual reality environment.
Convention Paper 10131

