AES New York 2019
Paper Session Details

P01 - Applications in Audio

Wednesday, October 16, 9:00 am — 11:30 am (1E10)

Chair:
Kevin Bastyr, Harman International - Novi, MI, USA

P01-1 Personal Sound Zones: A Comparison between Frequency and Time Domain Formulations in a Transportation Context—Lucas Vindrola, LAUM - Le Mans, France; PSA Group - Rueil-Malmaison, France; Manuel Melon, Le Mans Université - Le Mans cedex 9, France; Jean-Christophe Chamard, PSA Group - Rueil-Malmaison, France; Bruno Gazengel, Université du Maine - Le Mans Cedex 9, France; Guy Plantier, LAUM - Le Mans, France
This paper compares frequency-domain and time-domain formulations of a least-squares pressure matching algorithm for generating Personal Sound Zones (PSZ) in a transportation application. Because the acoustic environment of a vehicle varies, calculation time is added to the metrics usually found in the PSZ literature (such as Acoustic Contrast and Effort). Both formulations are implemented to control two zones in three configurations (4, 6, and 8 sources), using monopole simulations and anechoic measurements. Although its filters are not always perfectly causal (pre-ringing occurs in some cases), the frequency-domain formulation achieves equal levels of Acoustic Contrast, Effort, and Reproduction Error more than 500 times faster than the time-domain formulation.
Convention Paper 10216
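
As a rough illustration of the frequency-domain side of this comparison, the sketch below solves a regularized least-squares pressure matching problem at a single frequency with free-field monopoles. The geometry, the frequency, the Tikhonov weight `lam`, and all function names are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def monopole_tf(sources, mics, freq_hz, c=343.0):
    """Free-field monopole transfer matrix, shape (n_mics, n_sources)."""
    r = np.linalg.norm(mics[:, None, :] - sources[None, :, :], axis=-1)
    k = 2.0 * np.pi * freq_hz / c
    return np.exp(-1j * k * r) / (4.0 * np.pi * r)

# Hypothetical geometry in metres: 4 sources on a line, 2 microphones per zone.
sources = np.array([[x, 0.0, 0.0] for x in (-0.3, -0.1, 0.1, 0.3)])
bright_mics = np.array([[-0.4, 1.0, 0.0], [-0.3, 1.0, 0.0]])
dark_mics = np.array([[0.3, 1.0, 0.0], [0.4, 1.0, 0.0]])

freq_hz = 500.0  # single analysis frequency (assumed)
G_bright = monopole_tf(sources, bright_mics, freq_hz)
G_dark = monopole_tf(sources, dark_mics, freq_hz)
G = np.vstack([G_bright, G_dark])

# Target pressures: the field of source 0 alone in the bright zone,
# silence in the dark zone.
p_target = np.concatenate([G_bright[:, 0], np.zeros(len(dark_mics))])

# Regularized least squares: minimize |G q - p|^2 + lam |q|^2.
lam = 1e-4  # assumed regularization weight
A = G.conj().T @ G + lam * np.eye(G.shape[1])
q = np.linalg.solve(A, G.conj().T @ p_target)

def cost(weights):
    """Regularized pressure matching cost for a set of source weights."""
    err = G @ weights - p_target
    return np.sum(np.abs(err) ** 2) + lam * np.sum(np.abs(weights) ** 2)

bright_energy = np.mean(np.abs(G_bright @ q) ** 2)
dark_energy = np.mean(np.abs(G_dark @ q) ** 2)
contrast_db = 10.0 * np.log10(bright_energy / dark_energy)  # Acoustic Contrast
effort = np.sum(np.abs(q) ** 2)                             # array Effort

print(f"contrast = {contrast_db:.1f} dB, effort = {effort:.3g}")
```

A time-domain formulation would instead solve for FIR filter taps over all samples at once, which leads to a much larger linear system and is in line with the calculation-time gap the abstract reports.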

P01-2 Mitigating the Effect of In-Vehicle Road Noise Cancellation on Music Playback—Tao Feng, Harman International - Novi, MI, USA; Kevin Bastyr, Harman International - Novi, MI, USA
A Road Noise Cancellation (RNC) system is an Active Noise Cancellation (ANC) system implemented in a vehicle to minimize undesirable road noise inside the passenger cabin. Current RNC systems undesirably affect the frequency response of music playback. The RNC system’s error microphones sense all the sound in the passenger cabin, including the music. Hence, RNC systems cancel this total sensed sound and not only the road-induced noise. A new True Audio algorithm can directly remove the music signal from the error microphone signals and leave only the interior noise portion. To correctly estimate the music portion at the error microphones, True Audio implements a novel control topology based on a new multichannel, real-time modeling of the music’s secondary path transfer function. To validate the effectiveness of the proposed algorithm, experiments and numerical simulations were performed. The numerical studies use logs of real sensors mounted on a vehicle forming an RNC system with six reference accelerometers, five control speakers, and six error microphones. Both the models and the measurements show that the True Audio algorithm preserves the frequency response of music when the RNC system is activated.
Convention Paper 10217

P01-3 Effect of a Global Metronome on Ensemble Accuracy in Networked Music Performance—Robert Hupke, Leibniz Universität Hannover - Hannover, Germany; Lucas Beyer, Leibniz Universität Hannover - Hannover, Germany; Marcel Nophut, Leibniz Universität Hannover - Hannover, Germany; Stephan Preihs, Leibniz Universität Hannover - Hannover, Germany; Jürgen Peissig, Leibniz Universität Hannover - Hannover, Germany
Several rhythmic experiments with pairs drawn from a group of 23 subjects were performed to investigate the effect of a global metronome on ensemble accuracy in Networked Music Performance (NMP). Artificial delays of up to 91 ms were inserted into the audio transmission between the subjects. To investigate the dependencies between delay time, ensemble accuracy, and the highly synchronized global metronome, the experiments were evaluated in terms of tempo acceleration, imprecision, and subjective judgment of the ensemble play. The results show that the global metronome stabilizes the tempo acceleration caused by the delay. The imprecision stays constant up to a threshold of about 28 ms or 36 ms, depending on the delay-compensating strategy the subjects used.
Winner of the 147th AES Convention Student Paper Award
Convention Paper 10218

P01-4 Evaluation of Multichannel Audio in Automobiles versus Mobile Phones—Fesal Toosy, University of Central Punjab - Lahore, Pakistan; Muhammad Sarwar Ehsan, University of Central Punjab - Lahore, Pakistan
Multichannel surround and 3D audio are slowly gaining popularity, and commercial content in these formats will eventually become common. Many automobiles still have a stereo sound system with firmware or software that can render multichannel audio into stereo. This paper shows the results of a listening test for multichannel audio conducted in a medium-sized car. These results are compared with those of a listening test that used the same audio excerpts but was conducted on a mobile phone with headphones. The results show that on mobile phones, multichannel audio clearly outperforms stereo in terms of perceived audio quality as rated by users. In automobiles, however, multichannel audio shows only a marginal improvement in rated audio quality.
Convention Paper 10219

P01-5 Realizing An Acoustic Vector Network Analyzer—Marcus MacDonell, University of Waikato - Hamilton, Waikato, New Zealand; Jonathan Scott, University of Waikato - Hamilton, Waikato, New Zealand
Acoustic absorption, reflection, and transmission are typically measured using an impedance tube. We present the design and initial measurements of a radically different measurement system. The instrument builds on the rich history and deep mathematics developed in pursuit of electromagnetic Vector-corrected Network Analyzers (VNAs). Using acoustic directional couplers and a traditional VNA mainframe, we assembled an “Acoustic Vector Network Analyzer” (AVNA). The instrument measures the acoustic scattering parameters (the complex reflection and transmission coefficients) of materials, transmission lines, ported structures, ducts, etc. After the fashion of electromagnetic VNAs, we have constructed millimeter-wave measurement heads that span the 800 Hz–2200 Hz (420–150 mm) and 10 kHz–22 kHz (35–15 mm) bands, demonstrating scalability. We present initial measurement results.
Convention Paper 10220
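
The wavelengths quoted in parentheses for the two bands follow from the relation wavelength = speed of sound / frequency. The short check below assumes a speed of sound of 343 m/s in air, a value the abstract does not state, so the results agree with the quoted figures only to within rounding.

```python
# Assumed speed of sound in air at about 20 °C; not stated in the abstract.
SPEED_OF_SOUND_M_S = 343.0

def wavelength_mm(freq_hz: float) -> float:
    """Acoustic wavelength in millimetres at the given frequency."""
    return 1000.0 * SPEED_OF_SOUND_M_S / freq_hz

for freq_hz in (800.0, 2200.0, 10_000.0, 22_000.0):
    print(f"{freq_hz:>8.0f} Hz -> {wavelength_mm(freq_hz):6.1f} mm")
```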

P02 - Audio Signal Processing

Wednesday, October 16, 9:00 am — 12:00 pm (1E11)

Chair:
Scott Hawley, Belmont University - Nashville, TN, USA

P02-1 Analyzing and Extracting Multichannel Sound Field—Pei-Lun Hsieh, Ambidio - Glendale, CA, USA
Current post-production workflows require sound engineers to create multiple multichannel audio delivery formats. Inaccurate translation between formats may require extra manual adjustment, adding time and cost, and in sound reproduction it causes misinterpretation of the original mix and deviation from the intended story. This paper proposes a method that combines analysis of an Ambisonics field encoded from the input multichannel signal with analysis of each pair of adjacent channels. This gives an overall understanding of the multichannel sound field while still allowing fine extraction from each channel pair. The result can be used to translate between multichannel formats and to provide a more accurate rendering for immersive stereo playback.
Convention Paper 10221

P02-2 Profiling Audio Compressors with Deep Neural Networks—Scott Hawley, Belmont University - Nashville, TN, USA; Benjamin Colburn, ARiA Acoustics - Washington, DC, USA; Stylianos Ioannis Mimilakis, Fraunhofer Institute for Digital Media Technology (IDMT) - Ilmenau, Germany
We present a data-driven approach for predicting the behavior of (i.e., profiling) a given parameterized, nonlinear, time-dependent audio signal processing effect. Our objective is to learn a function that maps the unprocessed audio to the processed audio, using time-domain samples. We employ a deep auto-encoder model that is conditioned on both time-domain samples and the control parameters of the target audio effect. As a test case, we focus on the offline profiling of two dynamic range compressors, one software-based and the other analog. Our results show that the primary characteristics of the compressors can be captured; however, there is still enough audible noise to merit further investigation before such methods are applied to real-world audio processing workflows.
Convention Paper 10222

P02-3 Digital Parametric Filters Beyond Nyquist Frequency—Juan Sierra, Stanford University - Stanford, CA, USA; Meyer Sound Laboratories - Berkeley, CA, USA
Filter digitization through the Bilinear Transformation is often considered a very good all-around method for producing equalizer sections. The method is well behaved in terms of stability and ease of implementation; however, the frequency warping produced by the transformation leads to abnormalities near the Nyquist frequency. Moreover, it is impossible to design parametric sections whose analog center frequencies are defined above the Nyquist frequency. Such filters, even with center frequencies outside the hearing range, have effects that extend into the hearing bandwidth, with desirable characteristics during mixing and mastering. The purpose of this paper is to overcome these limitations while controlling the warping abnormalities of the Bilinear Transform through an alternative definition of the bilinear constant. A correction factor for the bandwidth of the parametric section is also discussed, to correct abnormalities affecting the digitization of this parameter.
Convention Paper 10224
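
The warping mentioned in this abstract can be illustrated with the textbook bilinear substitution s = K (z - 1) / (z + 1), under which an analog frequency f_a lands at the digital frequency f_d = (fs / pi) * arctan(2 * pi * f_a / K). The sketch below shows only the standard constant K = 2 fs and the usual prewarped constant, which cannot be evaluated at or above Nyquist. The alternative definition proposed in the paper is not reproduced here, and the function names are illustrative.

```python
import math

def bilinear_constant(fs_hz, f_match_hz=None):
    """Standard constant 2*fs, or the prewarped constant matching f_match_hz."""
    if f_match_hz is None:
        return 2.0 * fs_hz
    if not 0.0 < f_match_hz < fs_hz / 2.0:
        # The textbook prewarping formula breaks down at and above Nyquist.
        raise ValueError("prewarping frequency must lie below Nyquist")
    return 2.0 * math.pi * f_match_hz / math.tan(math.pi * f_match_hz / fs_hz)

def digital_frequency(f_analog_hz, fs_hz, k=None):
    """Digital frequency that an analog frequency maps to under the transform."""
    if k is None:
        k = bilinear_constant(fs_hz)
    return (fs_hz / math.pi) * math.atan(2.0 * math.pi * f_analog_hz / k)

fs = 48_000.0
for f_a in (1_000.0, 10_000.0, 20_000.0, 30_000.0):
    print(f"{f_a:>7.0f} Hz analog -> {digital_frequency(f_a, fs):8.1f} Hz digital")
```

With the standard constant, every analog frequency is compressed below fs / 2, so a section designed at 30 kHz with fs = 48 kHz has its center moved well into the audible band.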

P02-4 Using Volterra Series Modeling Techniques to Classify Black-Box Audio Effects—Ethan Hoerr, Montana State University - Bozeman, MT, USA; Robert C. Maher, Montana State University - Bozeman, MT, USA
Digital models of various audio devices are useful for simulating audio processing effects, but developing good models of nonlinear systems can be challenging. This paper reports on in-progress work on determining attributes of black-box audio devices using Volterra series modeling techniques. In general, modeling an audio effect requires determining whether the system is linear or nonlinear, whether it is time-invariant or time-variant, and whether it has memory. For nonlinear systems, we must determine the degree of nonlinearity of the system and the required parameters of a suitable model. We explain our work in making educated guesses about the order of nonlinearity in a memoryless system and then discuss the extension to nonlinear systems with memory.
Convention Paper 10225
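
For the memoryless case, one simple way to guess the order of nonlinearity is to fit polynomials of increasing degree to the measured input/output curve and stop when the residual becomes negligible. The sketch below is a generic illustration of that idea and not the authors' procedure. The `black_box` device, the tolerance, and the function names are hypothetical.

```python
import numpy as np

def black_box(x):
    """Hypothetical memoryless device under test (third-order polynomial)."""
    return x + 0.3 * x**2 - 0.2 * x**3

def estimate_order(device, max_order=7, tol=1e-8):
    """Return (order, coefficients in ascending power) of the lowest-degree
    polynomial that reproduces the device's static input/output curve."""
    x = np.linspace(-1.0, 1.0, 201)           # sweep of input amplitudes
    y = device(x)
    scale = np.sqrt(np.mean(y**2))            # RMS of the output, for a relative test
    for order in range(1, max_order + 1):
        coeffs = np.polyfit(x, y, order)      # highest power first
        residual = y - np.polyval(coeffs, x)
        if np.sqrt(np.mean(residual**2)) <= tol * scale:
            return order, coeffs[::-1]
    return None, None                         # no fit within max_order

order, coeffs = estimate_order(black_box)
print("estimated order:", order)
print("coefficients (ascending):", np.round(coeffs, 6))
```

With measured data instead of a noiseless function, the tolerance would have to be set relative to the noise floor, and a device with memory would need the full Volterra kernels, not a static polynomial.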

P02-5 Modifying Audio Signals for Reproduction with Reduced Room Effect—Christof Faller, Illusonic GmbH - Uster, Zürich, Switzerland; EPFL - Lausanne, Switzerland
Conventionally, equalizers are applied when reproducing audio signals in rooms to reduce coloration and the effect of room resonances. Another approach, filtering audio signals with an inverse of the room impulse response (RIR), can theoretically eliminate the effect of the room at one point. But practical issues arise, such as impaired sound at other positions, a need to update the filter when RIRs change, and signals that are challenging for loudspeakers. A technique is presented that modifies the time-frequency envelopes (spectrogram) of audio signals such that the corresponding spectrogram in the room is more similar to the original signal’s spectrogram, i.e., the room effect is attenuated. The proposed technique has low sensitivity to changes in RIR and listener position.
Convention Paper 10226

P02-6 On the Similarity between Feedback/Loopback Amplitude and Frequency Modulation—Tamara Smyth, University of California, San Diego - San Diego, CA, USA
This paper extends previous work in loopback frequency modulation (FM) to a similar system in which an oscillator is looped back to modulate its own amplitude, so-called feedback amplitude modulation (FBAM). A continuous-time closed-form solution is presented for each, yielding greatly improved numerical properties, reduced dependency on sampling rate, and a more accurate representation of the feedback by eliminating the unit-sample delay required for discrete-time implementation. The two systems produce similar waveforms, and it is shown that FBAM, for a known input frequency, is actually a scaled and offset version of loopback FM with a different carrier frequency but the same sounding frequency. Two distinct representations are used to show mathematical equivalence between the systems while validating the closed-form solution for each.
Convention Paper 10223
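
The unit-sample delay mentioned above appears in the usual discrete-time recursion for feedback AM, y[n] = cos(w0 n) * (1 + beta * y[n-1]). The sketch below implements that recursion as an assumed baseline. It is not the closed-form continuous-time solution presented in the paper, and the parameter names are illustrative.

```python
import numpy as np

def fbam(f0_hz, beta, fs_hz=48_000.0, n_samples=4_800):
    """Discrete-time feedback AM: the oscillator's previous output sample
    modulates the amplitude of the current carrier sample."""
    w0 = 2.0 * np.pi * f0_hz / fs_hz
    carrier = np.cos(w0 * np.arange(n_samples))
    y = np.zeros(n_samples)
    prev = 0.0                                # y[-1], assumed zero
    for n in range(n_samples):
        y[n] = carrier[n] * (1.0 + beta * prev)
        prev = y[n]                           # the unit-sample feedback delay
    return y

plain = fbam(440.0, beta=0.0)                 # beta = 0 reduces to a cosine
bright = fbam(440.0, beta=0.5)                # feedback adds harmonics
print("peak with beta = 0.5:", np.max(np.abs(bright)))
```

For 0 <= beta < 1 the output stays bounded by 1 / (1 - beta), since each sample is at most 1 + beta times the magnitude of the previous one.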

P3 - Posters: Transducers

Wednesday, October 16, 10:30 am — 12:00 pm (South Concourse A)

P3-1 Acoustic Beamforming on Transverse Loudspeaker Array Constructed from Micro-Speakers Point Sources for Effectiveness Improvement in High-Frequency Range—Bartlomiej Chojnacki, AGH University of Science and Technology - Cracow, Poland; Mega-Acoustic - Kepno, Poland; Klara Juros, AGH University of Science and Technology - Cracow, Poland; Daniel Kaczor, AGH University of Science and Technology - Cracow, Poland; Tadeusz Kamisinski, AGH University of Science and Technology - Cracow, Poland
Variable-directivity speaker arrays are very popular in many acoustic applications, such as wearable systems, natural source simulations, and acoustic scanners. Standard systems constructed from traditional drivers, despite sophisticated DSP, have limited beamforming possibilities because loudspeakers have very narrow directivity patterns at high frequencies. This paper presents a new approach to micro-speaker array design using monopole sources, based on an isobaric speaker configuration. The new solution achieves high efficiency over a broad frequency range while keeping the matrix size small. The presentation explains the isobaric speaker principles used and compares a standard transverse transducer matrix with the innovative point-source matrix in two configurations. The results show that the new driver matrix construction improves beamforming effectiveness in the high-frequency range.
Convention Paper 10227

P3-2 Spherical Microphone Array Shape to Improve Beamforming Performance—Sakurako Yazawa, NTT - Tokyo, Japan; Hiroaki Itou, NTT - Tokyo, Japan; Ken'ichi Noguchi, NTT Service Evolution Laboratories - Yokosuka, Japan; Kazunori Kobayashi, NTT - Tokyo, Japan; Noboru Harada, NTT Communication Science Labs - Atsugi-shi, Kanagawa-ken, Japan
A 360-degree steerable super-directional beamforming system is proposed. We designed a new acoustic baffle for a spherical microphone array to achieve both small size and high performance. The baffle is a sphere with parabola-like depressions, so sound-collection performance can be enhanced using reflection and diffraction. We first evaluated its beamforming performance through simulation, then fabricated a 3D prototype of a microphone array with the proposed baffle shape and compared its performance with that of a conventional spherical 3D acoustic baffle. The prototype exhibited better beamforming performance. We built a microphone array system that includes the proposed acoustic baffle and a 360-degree camera; the system can pick up the sound that matches the image in a specific direction, in real time or after recording. Users who experienced the system demo gave it high marks.
Convention Paper 10228

P3-3 Infinite Waveguide Termination by Series Solution in Finite Element Analysis—Patrick Macey, PACSYS Limited - Nottingham, UK
The acoustics of an audio system may comprise several components, e.g., a compression driver producing plane waves, a transition connecting to the throat of a horn, and a cylindrical horn that is baffled at the mouth. While finite elements/boundary elements can model the entire system, it is advantageous from the design perspective to consider simplified systems. A compression driver might be used in many situations and should be designed to radiate plane waves, without cross modes, into a semi-infinite tube. The pressure field in the tube can be represented by a series that is coupled to the finite element mesh by a Dirichlet-to-Neumann (DtN) approach. The method is generalized to cater for ducts of arbitrary cross section and infinite cylindrical horns.
Convention Paper 10229

P3-4 Evaluating Listener Preference of Flat-Panel Loudspeakers—Stephen Roessner, University of Rochester - Rochester, NY, USA; Michael Heilemann, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
Three flat-panel loudspeakers and two conventional loudspeakers were evaluated in a blind listening test. Two of the flat-panel loudspeakers used in the test were prototypes employing both array-based excitation methods and constrained viscoelastic damping to eliminate modal resonant peaks in the mechanical response of the vibrating surface. The remaining flat-panel speaker was a commercially available unit. A set of 21 listeners reported average preference ratings of 7.00/10 and 6.81/10 for the conventional loudspeakers, 6.48/10 and 5.90/10 for the prototype flat-panel loudspeakers, and 2.24/10 for the commercial flat-panel speaker. The results are consistent with those given by a predictive model for listener preference rating, suggesting that designs aimed at smoothing the mechanical response of the panel lead to improved preference ratings.
Convention Paper 10230

P3-5 Modelling of a Chip Scale Package on the Acoustic Behavior of a MEMS Microphone—Yafei Nie, Institute of Acoustics, Chinese Academy of Sciences - Beijing, China; Jinqiu Sang, Chinese Academy of Sciences - Beijing, China; Chengshi Zheng, Institute of Acoustics, Chinese Academy of Sciences - Beijing, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China
Micro-electro-mechanical system (MEMS) microphones have been widely used in mobile devices in recent decades. The acoustic effects of a chip scale package on a MEMS microphone need to be validated. Previously, a lumped equivalent circuit model was adopted to analyze the acoustic frequency response of the package. However, such a theoretical model cannot predict performance at relatively high frequencies. In this paper a distributed parameter model is proposed to simulate the acoustic behavior of the MEMS microphone package. The model illustrates how the acoustic transfer function of the MEMS microphone is affected by the size of the sound hole and the volumes of the front and back chambers. The model can also illustrate the mechanical response of the MEMS microphone. The proposed model provides a more reliable route to an optimized MEMS package structure.
Convention Paper 10231

P3-6 Personalized and Self-Adapting Headphone Equalization Using Near Field Response—Adrian Celestinos, Samsung Research America - Valencia, CA, USA; Elisabeth McMullin, Samsung Research America - Valencia, CA, USA; Ritesh Banka, Samsung Research America - Valencia, CA, USA; Pascal Brunet, Samsung Research America - Valencia, CA, USA; Audio Group - Digital Media Solutions
The acoustical coupling of headphones to human ears varies with a number of factors. Placement, the size of the user’s head and ears, and the headband and ear-pad materials are all major contributors to the sound quality delivered by the headphone to the user. By measuring the transfer function from the driver terminals to a miniature microphone set near the driver, inside the cavity formed by the headphone and the ear, the degree of acoustical coupling and the fundamental frequency of the cavity volume were acquired. An individualized equalization based on these measurements was applied for every user. Listeners rated the personalized EQ significantly higher than a generic target response and slightly higher than the bypassed headphone.
Convention Paper 10232

P3-7 Applying Sound Equalization to Vibrating Sound Transducers Mounted on Rigid Panels—Stefania Cecchi, Università Politecnica delle Marche - Ancona, Italy; Alessandro Terenzi, Università Politecnica delle Marche - Ancona, Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy; Ferruccio Bettarelli, Leaff Engineering - Osimo, Italy
In recent years, loudspeaker manufacturers have brought to market vibrating sound transducers (also called shakers or exciters) that can be installed on a surface or a panel to transform it into an invisible speaker capable of delivering sound. These systems show different frequency behaviors, depending mainly on the type and size of the surface. Audio equalization is therefore crucial for enhancing sound reproduction performance and achieving flat frequency responses. In this paper a multi-point equalization procedure is applied to several surfaces equipped with vibrating transducers, showing its positive effect from both objective and subjective points of view.
Convention Paper 10233

P04 - Room Acoustics

Wednesday, October 16, 1:30 pm — 5:00 pm (1E10)

Chair:
David Griesinger, David Griesinger Acoustics - Cambridge, MA, USA

P04-1 Use of Wavelet Transform for the Computation of Modal Decay Times in Rooms—Roberto Magalotti, B&C Speakers S.p.A. - Bagno a Ripoli (FI), Italy; Daniele Ponteggia, Audiomatica Srl - Firenze, Italy
The acoustic behavior of small rooms in the modal frequency band can be characterized by the modal decay times MT₆₀. The paper explores a method for computing modal decay times from measurements of Room Impulse Responses (RIR) based on the wavelet transform. Once the resonance frequencies have been selected, the procedure computes a series of wavelet transforms of the Morlet type with decreasing bandwidth, exploiting the property that Morlet wavelets preserve the time history of energy decay. Then decay times can be calculated either by linear regression of the non-noisy portion of the curve or by nonlinear fitting of a model of decay plus noise. Examples of application of the method to real RIR measurements are shown.
Convention Paper 10235
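
The linear-regression step mentioned in this abstract can be sketched on a synthetic decay curve. The example below skips the Morlet wavelet analysis entirely and assumes the band-limited energy envelope is already available. The envelope, the noise floor, and the 20 dB headroom used to select the non-noisy portion are all assumptions made for illustration.

```python
import numpy as np

def decay_time_from_envelope(t, energy, floor_db, headroom_db=20.0):
    """Estimate the time for a 60 dB decay by fitting a straight line to the
    part of the level curve that lies well above the noise floor."""
    level_db = 10.0 * np.log10(energy / energy.max())
    mask = level_db > floor_db + headroom_db  # keep the non-noisy portion only
    slope_db_per_s, _ = np.polyfit(t[mask], level_db[mask], 1)
    return -60.0 / slope_db_per_s

# Synthetic envelope: exponential decay with a known decay time plus a
# constant noise floor 60 dB below the initial level (both assumed).
true_decay_time_s = 0.8
t = np.linspace(0.0, 1.5, 3001)
noise_floor = 1e-6
energy = 10.0 ** (-6.0 * t / true_decay_time_s) + noise_floor

estimate_s = decay_time_from_envelope(t, energy, floor_db=-60.0)
print(f"estimated decay time: {estimate_s:.3f} s")
```

The alternative named in the abstract, nonlinear fitting of a decay-plus-noise model, would use the whole curve, including the part near the floor, which the regression has to discard.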

P04-2 What's Old Is New Again: Using a Physical Scale Model Echo Chamber as a Real-Time Reverberator—Kevin Delcourt, École Nationale Supérieure Louis Lumière - Saint-Denis, France; Sorbonne Université, Paris - Paris, France; Franck Zagala, Institute d’Alembert, Group Lutheries—Acoustique-Musique - Paris, France; Alan Blum, École Nationale Supérieure Louis Lumière - Saint-Denis, France; Brian F. G. Katz, Sorbonne Université, CNRS, Institut Jean Le Rond d'Alembert - Paris, France
This paper presents a method for using physical scale models as echo chambers. The proposed framework creates a partitioned convolution engine where the convolution processing is carried out physically on an up-scaled live audio stream in the model. The resulting reverberated sound is captured and down-scaled, providing the result to the user in real time. The scale factor can be changed dynamically to explore different room sizes, and the reduced dimensions of the scale model make it a tangible reverberation tool. Scale factors up to 1:5 have been tested for full bandwidth; higher factors are possible with improved hardware or in exchange for a lower upper frequency limit, which is set primarily by driver performance.
Convention Paper 10236

P04-3 Synthesis of Binaural Room Impulse Responses for Different Listening Positions Considering the Source Directivity—Ulrike Sloma, Technische Universität Ilmenau - Ilmenau, Germany; Florian Klein, Technische Universität Ilmenau - Ilmenau, Germany; Stephan Werner, Technische Universität Ilmenau - Ilmenau, Germany; Tyson Pappachan Kannookadan, TU Ilmenau - Ilmenau, Germany
A popular goal in research on virtual and augmented acoustic realities is the implementation of realistic room acoustics and sound source characteristics. Additionally, listeners want to move around and explore the virtual or augmented environments. One way to realize position-dynamic synthesis is to use binaural technologies based on real measurements. While this approach successfully reproduces the real acoustic environment, many positions need to be measured. To reduce the measurement time, new methods are being developed to calculate binaural room impulse responses from a few positions. The presented work enhances existing synthesis methods by including predefined sound source directivities in the calculation of binaural room impulse responses. The results are analyzed both physically and perceptually.
Convention Paper 10237

P04-4 Extracting the Fundamental Mode from Sound Pressure Measurements in an Acoustic Tube—Joerg Panzer, R&D Team - Salgen, Germany
Acoustic tubes are used to provide a load to loudspeakers or to measure material properties. If the wavelength is comparable to the diameter of the tube, cross-modes can be excited. This paper demonstrates a method that extracts only the fundamental mode from the measured sound-pressure response. The only requirements are three microphones mounted in the tube wall and a circular cross-section.
Convention Paper 10238

P04-5 Accurate Reproduction of Binaural Recordings through Individual Headphone Equalization and Time Domain Crosstalk Cancellation—David Griesinger, David Griesinger Acoustics - Cambridge, MA, USA
We have developed software apps that allow a user to non-invasively match headphones so that they reproduce at the eardrum the same spectrum as a frontal source. The result is correct timbre and forward localization without head tracking. In addition, we have developed a non-individual crosstalk-cancelling algorithm that creates virtual sound sources just outside a listener’s ears. Both systems reproduce binaural recordings with startling realism. The apps enable researchers and students to hear what acoustical features are essential for clarity, proximity, and preference. Listening to any type of music with our apps is beautiful and highly engaging.
Convention Paper 10239

P04-6 Concert Hall Acoustics’ Influence on the Tempo of Musical Performances—Jan Berg, Luleå University of Technology - Piteå, Sweden
The acoustics of a concert hall is an integral and significant part of a musical performance, as it affects the artistic decisions made by the performer. Still, there are few systematic studies of the phenomenon. In this paper the effect of concert hall acoustics, mainly reverberation, on musical tempo is analyzed quantitatively for a selection of genres and ensemble types. The study uses audio recordings made in a concert hall equipped with a movable ceiling, which enables a variable volume and thus a variable reverberation time. The results show that there are cases where the tempo follows a change in acoustics as well as cases where it remains more or less unchanged.
Convention Paper 10240

P04-7 Optimum Measurement Locations for Large-Scale Loudspeaker System Tuning Based on First-Order Reflections Analysis—Samuel Moulin, L-Acoustics - Marcoussis, France; Etienne Corteel, L-Acoustics - Marcoussis, France; François Montignies, L-Acoustics - Marcoussis, France
This paper investigates how first-order reflections affect the response of sound reinforcement systems over large audiences. In the field, only a few acoustical measurements can be performed to drive tuning decisions. The challenge is then to select the right measurement locations so that they provide an accurate representation of the loudspeaker system response. Simulations of each first-order reflection (e.g., floor or side wall reflection) are performed to characterize the average frequency response and its variability over the target audience area. Then, the representativeness of measurements performed at a reduced number of locations is investigated. Results indicate that a subset of eight measurement locations spread over the target audience area is a rational solution for characterizing the loudspeaker system response.
Convention Paper 10234

P05 - Transducers

Wednesday, October 16, 1:30 pm — 5:00 pm (1E11)

Chair:
Todd Welti, Harman International Inc. - Northridge, CA, USA

P05-1 Nonlinear Control of Loudspeaker Based on Output Flatness and Trajectory Planning—Pascal Brunet, Samsung Research America - Valencia, CA, USA; Audio Group - Digital Media Solutions; Glenn S. Kubota, Samsung Research America - Valencia, CA, USA
A loudspeaker is inherently nonlinear and produces timbre alterations, roughness, harshness, lack of clarity, and modulation noise. This may impair reproduction quality and speech intelligibility. These issues increase rapidly with high levels and especially high bass levels. Industrial design and marketing constraints demand smaller speaker systems without sacrificing sound output level. This results in higher distortion. To obtain "big bass from little boxes," an anti-distortion system is needed. We present a new approach that is based on the direct control and linearization of the loudspeaker diaphragm displacement that allows the maximization of the bass output and the minimization of the nonlinearities while keeping the diaphragm displacement within the range of safe operation.
Convention Paper 10241

P05-2 Perceptual Assessment of Distortion in Low-Frequency Loudspeakers—Louis Fielder, Retired - Millbrae, CA, USA; Michael Smithers, Dolby Laboratories - Sydney, NSW, Australia
A perceptually driven distortion metric for loudspeakers is proposed, based on a critical-band spectral comparison of the distortion and noise to an appropriate masking threshold. The loudspeaker is excited by a sine-wave signal composed of windowed 0.3-second bursts. Loudspeaker masking curves for sine waves between 20 and 500 Hz are derived from previously published curves for headphone distortion evaluation and expanded to curves at 1-decibel increments by linear interpolation and extrapolation. For each burst, the ratios of measured distortion and noise levels to the appropriate masking curve values are determined for each critical band, starting at the second harmonic. The audibility of all these contributions is then combined into various audibility values.
Convention Paper 10242

P05-3 Rethinking Flat Panel Loudspeakers—An Objective Acoustic Comparison of Different Speaker Categories—Benjamin Zenker, Technical University Dresden - Dresden, Germany; Hommbru GmbH - Reichenbach, Germany; Sebastian Merchel, TU Dresden - Dresden, Germany; M. Ercan Altinsoy, TU Dresden - Dresden, Germany
The home entertainment market is growing, but connected devices like multi-room and streaming loudspeakers are increasingly replacing traditional audio systems. Compromises in acoustic quality are made to satisfy additional requirements such as smaller, lighter, and cheaper products. The number of smart speakers sold suggests that customers accept speakers with lower acoustical quality for daily use. Concepts like soundbars aim to achieve better spatial reproduction while staying visually unobtrusive. Thanks to their low visual profile, flat panel loudspeakers offer opportunities for invisible integration. This paper presents an objective acoustic comparison of four speaker categories: smart speaker, flat panel, soundbar, and studio monitor. The comparison reveals that recent technological advances could make flat panel loudspeakers an alternative.
Convention Paper 10243

P05-4 Modelling and Measurement of Nonlinear Intermodal Coupling in Loudspeaker Diaphragm Vibrations—William Cardenas, ORA Graphine Audio Inc. - Montreal, Quebec, Canada
Accurate prediction of the nonlinear transfer response of loudspeakers over the full band is important for optimizing the development of audio products. Small, light, and efficient transducers require thin, low-density diaphragms, which may vibrate nonlinearly even at low amplitudes, impairing the sound quality. This paper proposes an extension of the existing transducer model comprising breakup modes with geometrical nonlinearities, adding the nonlinear coupling between the piston mode and the breakup modes that is responsible for large intermodulation problems. A novel measurement technique is presented for estimating the breakup frequency modulation induced by the piston mode excursion. The model is validated with measurements of harmonic and intermodulation distortion and other symptoms relevant to the assessment of acoustic performance.
Convention Paper 10244 (Purchase now)

P05-5 Sound Capture by Microphone Vibration inside Playback Devices—Rivanaldo De Oliveira, Qualcomm Technologies, Inc. - San Diego, CA, USA
Integration of voice capture has become commonplace in devices that formerly were used only as a sound source, whether to interface with the variety of cloud-provided services available to users or to extend voice control capability to otherwise simple devices. Integrating multiple microphones inside a case that houses playback transducers requires careful attention to certain design aspects. These are discussed in this article, using a prototype of a "Smart Speaker" as an example.
Convention Paper 10245 (Purchase now)

P05-6 Low Deviation and High Sensitivity—Optimized Exciter Positioning for Flat Panel Loudspeakers by Considering Averaged Sound Pressure Equalization—Benjamin Zenker, Technical University Dresden - Dresden, Germany; Hommbru GmbH - Reichenbach, Germany; Shanavaz Sanjay Abdul Rawoof, TU Dresden - Dresden, Germany; Sebastian Merchel, TU Dresden - Dresden, Germany; M. Ercan Altinsoy, TU Dresden - Dresden, Germany
Loudspeaker panels represent a class of loudspeakers, whose electrical, mechanical, and acoustical properties differ completely from conventional loudspeakers. However, the acoustic properties are mostly associated with lower performance. The position of the excitation is one of the crucial parameters to optimize multiple parameters such as the frequency response in terms of linearity and sensitivity. This paper describes an approach to find the best excitation position for an exemplary distributed mode loudspeaker (DML) by considering efficiency and the averaged sound pressure equalization. An evaluation of the measured response in the horizontal plane of 25 excitation positions is presented and an optimization algorithm is used to filter every position to a certain acoustical quality standard.
Convention Paper 10246 (Purchase now)

P05-7 A Comparison of Test Methodologies to Personalize Headphone Sound Quality—Todd Welti, Harman International Inc. - Northridge, CA, USA; Omid Khonsaripour, Harman International - Northridge, CA, USA; Sean Olive, Harman International - Northridge, CA, USA; Dan Pye, Harman International - Northridge, CA, USA
There exist many different methods to gather subjective equalization preference data from listeners. From one method to another there are generally tradeoffs between speed, accuracy, and ease of use. In this study four different types of test were compared to see which tests performed the best in each of these categories. The purpose was to select the best methods for headphone personalization applications for mobile devices. All four tests involve test subjects setting filter gain values for bass and treble shelving filters, and thus selecting their preferred response curves for listening to music through headphones. The results of each test, the time taken to complete them, and the ease of use based on a post-test questionnaire are presented.
Convention Paper 10247 (Purchase now)

P06 - Posters: Audio Signal Processing

Wednesday, October 16, 3:00 pm — 4:30 pm (South Concourse A)

P06-1 Modal Representations for Audio Deep Learning—Travis Skare, CCRMA, Stanford University - Stanford, CA, USA; Jonathan S. Abel, Stanford University - Stanford, CA, USA; Julius O. Smith, III, Stanford University - Stanford, CA, USA
Deep learning models for both discriminative and generative tasks have a choice of domain representation. For audio, candidates are often raw waveform data, spectral data, transformed spectral data, or perceptual features. For deep learning tasks related to modal synthesizers or processors, we propose new, modal representations for data. We experiment with representations such as an N-hot binary vector of frequencies, or learning a set of modal filterbank coefficients directly. We use these representations discriminatively–classifying cymbal model based on samples–as well as generatively. An intentionally naive application of a basic modal representation to a CVAE designed for MNIST digit images quickly yielded results, which we found surprising given less prior success when using traditional representations like a spectrogram image. We discuss applications for Generative Adversarial Networks, towards creating a modal reverberator generator.
Convention Paper 10248 (Purchase now)

P06-2 Distortion Modeling of Nonlinear Systems Using Ramped-Sines and Lookup Table—Paul Mayo, University of Maryland - College Park, MD, USA; Wesley Bulla, Belmont University - Nashville, TN, USA
Nonlinear systems identification is used to synthesize black-box models of nonlinear audio effects and as such is a widespread topic of interest within the audio industry. As a variety of implementation algorithms provide a myriad of approaches, questions arise whether there are major functional differences between methods and implementations. This paper presents a novel method for the black-box measurement of distortion characteristic curves and an analysis of the popular “lookup table” implementation of nonlinear effects. Pros and cons of the techniques are examined from a signal processing perspective and the basic limitations and efficiencies of the approaches are discussed.
Convention Paper 10249 (Purchase now)

P06-3 An Open Audio Processing Platform Using SoC FPGAs and Model-Based Development—Trevor Vannoy, Montana State University - Bozeman, MT, USA; Flat Earth Inc. - Bozeman, MT, USA; Tyler Davis, Flat Earth Inc. - Bozeman, MT, USA; Connor Dack, Flat Earth Inc. - Bozeman, MT, USA; Dustin Sobrero, Flat Earth Inc. - Bozeman, MT, USA; Ross Snider, Montana State University - Bozeman, MT, USA; Flat Earth Inc. - Bozeman, MT, USA
The development cycle for high performance audio applications using System-on-Chip (SoC) Field Programmable Gate Arrays (FPGAs) is long and complex. To address these challenges, an open source audio processing platform based on SoC FPGAs is presented. Due to their inherently parallel nature, SoC FPGAs are ideal for low latency, high performance signal processing. However, these devices require a complex development process. To reduce this difficulty, we deploy a model-based hardware/software co-design methodology that increases productivity and accessibility for non-experts. A modular multi-effects processor was developed and demonstrated on our hardware platform. This demonstration shows how a design can be constructed and provides a framework for developing more complex audio designs that can be used on our platform.
Convention Paper 10250 (Purchase now)

P06-4 Objective Measurement of Stereophonic Audio Quality in the Directional Loudness Domain—Pablo Delgado, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jürgen Herre, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany
Automated audio quality prediction is still considered a challenge for stereo or multichannel signals carrying spatial information. A system that accurately and reliably predicts quality scores obtained by time-consuming listening tests can be of great advantage in saving resources, for instance, in the evaluation of parametric spatial audio codecs. Most of the solutions so far work with individual comparisons of distortions of interchannel cues across time and frequency, known to correlate to distortions in the evoked spatial image of the subject listener. We propose a scene analysis method that considers signal loudness distributed across estimations of perceived source directions on the horizontal plane. The calculation of distortion features in the directional loudness domain (as opposed to the time-frequency domain) seems to provide equal or better correlation with subjectively perceived quality degradation than previous methods, as confirmed by experiments with an extensive database of parametric audio codec listening tests. We investigate the effect of a number of design alternatives (based on psychoacoustic principles) on the overall prediction performance of the associated quality measurement system.
Convention Paper 10251 (Purchase now)

P06-5 Detection of the Effect of Window Duration in an Audio Source Separation Paradigm—Ryan Miller, Belmont University - Nashville, TN, USA; Wesley Bulla, Belmont University - Nashville, TN, USA; Eric Tarr, Belmont University - Nashville, TN, USA
Non-negative matrix factorization (NMF) is a commonly used method for audio source separation in applications such as polyphonic music separation and noise removal. Previous research evaluated the use of additional algorithmic components and systems in efforts to improve the effectiveness of NMF. This study examined how the short-time Fourier transform (STFT) window duration used in the algorithm might affect detectable differences in separation performance. An ABX listening test compared speech extracted from two types of noise-contaminated mixtures at different window durations to determine if listeners could discriminate between them. It was found that the window duration had a significant impact on subject performance in both white- and conversation-noise cases with lower scores for the latter condition.
Convention Paper 10252 (Purchase now)

P06-6 Use of DNN-Based Beamforming Applied to Different Microphone Array Configurations—Tae Woo Kim, Gwangju Institute of Science and Technology (GIST) - Gwangju, South Korea; Nam Kyun Kim, Gwangju Institute of Science and Technology (GIST) - Gwangju, South Korea; Geon Woo Lee, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Inyoung Park, Gwangju Institute of Science and Technology (GIST) - Gwangju, South Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea
Minimum variance distortionless response (MVDR) beamforming is one of the most popular multichannel signal processing techniques for dereverberation and/or noise reduction. However, the MVDR beamformer has the limitation that it must be designed to be dependent on the receiver array geometry. This paper demonstrates an experimental setup and results by designing a deep learning-based MVDR beamformer and applying it to different microphone array configurations. Consequently, it is shown that the deep learning-based MVDR beamformer provides more robust performance under mismatched microphone array configurations than the conventional statistical MVDR one.
Convention Paper 10253 (Purchase now)

P06-7 Deep Neural Network Based Guided Speech Bandwidth Extension—Konstantin Schmidt, Friedrich-Alexander-University (FAU) - Erlangen, Germany; International Audio Laboratories Erlangen - Erlangen; Bernd Edler, Friedrich Alexander University - Erlangen-Nürnberg, Germany; Fraunhofer IIS - Erlangen, Germany
To this day, telephone speech is still limited to the range of 200 to 3400 Hz since the predominant codecs in public switched telephone networks are AMR-NB, G.711, and G.722 [1, 2, 3]. Blind bandwidth extension (blind BWE, BBWE) can improve the perceived quality as well as the intelligibility of coded speech without changing the transmission network or the speech codec. The BBWE used in this work is based on deep neural networks (DNNs) and has already shown good performance [4]. Although this BBWE enhances the speech without producing too many artifacts, it sometimes fails to enhance prominent fricatives, which can result in muffled speech. In order to better synthesize prominent fricatives, the BBWE is extended by sending a single bit of side information—here referred to as guided BWE. This bit may be transmitted, e.g., by watermarking so that no changes to the transmission network or the speech codec have to be made. Different DNN configurations (including convolutional (Conv.) layers as well as long short-term memory layers (LSTM)) making use of this bit have been evaluated. The BBWE has a low computational complexity and an algorithmic delay of only 12 ms and can be applied in state-of-the-art speech and audio codecs.
Convention Paper 10254 (Purchase now)

P06-8 Analysis of the Sound Emitted by Honey Bees in a Beehive—Stefania Cecchi, Universitá Politecnica della Marche - Ancona, Italy; Alessandro Terenzi, Universita Politecnica delle Marche - Ancona, Italy; Simone Orcioni, Universita Politecnica delle Marche - Ancona, Italy; Francesco Piazza, Universitá Politecnica della Marche - Ancona (AN), Italy
The increase in honey bee mortality in recent years has drawn great attention to the possibility of intensive beehive monitoring in order to better understand the problems that are seriously affecting honey bee health. It is well known that sound emitted inside a beehive is one of the key parameters for non-invasive monitoring capable of determining some aspects of the bees’ condition. The proposed work aims at analyzing the bees’ sound, introducing feature extraction useful for sound classification techniques and for detecting dangerous situations. Taking into consideration a real scenario, several experiments have been performed focusing on particular events, such as swarming, to highlight the potential of the proposed approach.
Convention Paper 10255 (Purchase now)

P06-9 Improvement of DNN-Based Speech Enhancement with Non-Normalized Features by Using an Automatic Gain Control—Linjuan Cheng, Institute of Acoustics, Chinese Academy of Sciences - Beijing, China; Chengshi Zheng, Institute of Acoustics, Chinese Academy of Sciences - Beijing, China; Renhua Peng, Chinese Academy of Sciences - Beijing, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China
Speech enhancement performance may degrade when the peak level of the noisy speech is significantly different from that of the training datasets in Deep Neural Networks (DNN)-based speech enhancement algorithms, especially when non-normalized features, such as log-power spectra, are used in practical applications. To overcome this shortcoming, we introduce an automatic gain control (AGC) method as a preprocessing technique. By doing so, we can train the model with the same peak level for all the speech utterances. To further improve the proposed DNN-based algorithm, the feature compensation method is combined with the AGC method. Experimental results indicate that the proposed algorithm can maintain consistent performance when the peak of the noisy speech changes over a large range.
Convention Paper 10256 (Purchase now)

P07 - Perception

Thursday, October 17, 9:00 am — 12:00 pm (1E10)

Chair:
Elisabeth McMullin, Samsung Research America - Valencia, CA USA

P07-1 A Binaural Model to Estimate Room Impulse Responses from Running Signals and Recordings—Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA; David Dahlbom, Rensselaer Polytechnic Institute - Troy, NY, USA; Nate Keil, Rensselaer Polytechnic Institute - Troy, NY, USA
A binaural model is described that can use a multichannel signal to robustly localize a sound source in the presence of multiple reflections. The model also estimates a room impulse response from a running multichannel signal, e.g., from a recording, and determines the spatial locations and delays of early reflections, without any prior or additional knowledge of the source. A dual-layer cross-correlation/autocorrelation algorithm is used to determine the interaural time difference (ITD) of the direct sound source component and to estimate a binaural activity pattern. The model is able to accurately localize broadband signals in the presence of real room reflections.
Convention Paper 10257 (Purchase now)

P07-2 Describing the Audible Effects of Nonlinear Loudspeaker Distortion—Elisabeth McMullin, Samsung Research America - Valencia, CA USA; Pascal Brunet, Samsung Research America - Valencia, CA USA; Audio Group - Digital Media Solutions; Zhongran Wang, Samsung Research America, Audio Lab - Valencia, CA, USA
In order to evaluate how and when listeners hear distortion in a nonlinear loudspeaker model, a three-part study was designed. A variety of audio files were processed through both a linear and a nonlinear loudspeaker model and the input signals were calibrated to produce a prescribed level of distortion in the nonlinear model. Listeners completed subjective experiments in which they heard both versions of the clips, selected the audible attributes they believed changed, and described the differences in their own words. In later tests, listeners marked the points in time at which they heard changes in the most commonly used descriptors. A full analysis of listener comments and time-based relationships is presented, with theoretical explanations of the results obtained.
Convention Paper 10258 (Purchase now)

P07-3 Spatial Auditory Masking for Three-Dimensional Audio Coding—Masayuki Nishiguchi, Akita Prefectural University - Yurihonjo Akita, Japan; Kodai Kato, Akita Prefectural University - Yurihonjo Akita, Japan; Kanji Watanabe, Akita Prefectural University - Yurihonjo Akita, Japan; Koji Abe, Akita Prefectural University - Yurihonjo Akita, Japan; Shouichi Takane, Akita Prefectural University - Yurihonjo, Akita, Japan
Spatial auditory masking effects have been examined for developing highly efficient audio coding algorithms for signals in three-dimensional (3D) sound fields. Generally, the masking threshold level is lowered as the directional difference between masker and maskee signals increases. However, we found that when a maskee signal is located at the symmetrical position of the masker signal with respect to the frontal plane of a listener, the masking threshold level is not lowered, contrary to expectations. A mathematical model is proposed to estimate the masking threshold caused by multiple masker signals in the 3D sound field. Using the model, the perceptual entropy of a tune from a two channel stereo CD was reduced by approximately 5.5%.
Convention Paper 10259 (Purchase now)

P07-4 Investigation of Masking Thresholds for Spatially Distributed Sound Sources—Sascha Dick, International Audio Laboratories Erlangen, a joint institution of Universität Erlangen-Nürnberg and Fraunhofer IIS - Erlangen, Germany; Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Rami Sweidan, University of Stuttgart - Stuttgart, Germany; Jürgen Herre, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany
For perceptual audio coding of immersive content, the investigation of masking effects between spatially distributed sound sources is of interest. We conducted subjective listening experiments to determine the masking thresholds for “tone-masking-noise” conditions when masker (1 kHz sine tone) and probe (1 kHz narrow band noise) are spatially distributed using an immersive 22.2 loudspeaker setup. Our results show masking thresholds in the range of –35 dB to –26 dB probe-to-masker-ratio. As expected, least masking was found between left/right opposed sources with up to 5 dB lower than for coincident sources. Other noteworthy observations included an increase of masking for certain elevations and cases of selective masking decrease due to interaural phase difference phenomena.
Convention Paper 10260 (Purchase now)

P07-5 An Attempt to Elicit Horizontal and Vertical Auditory Precedence Percepts without Pinnae Cues—Wesley Bulla, Belmont University - Nashville, TN, USA; Paul Mayo, University of Maryland - College Park, MD, USA
This investigation was a continuation of AES-143 paper #9832 and AES-145 paper #10066 where reliable auditory precedence in the elevated, ear-level, and lowered horizontal planes was examined. This experiment altered and eliminated the spectral influences that govern the detection of elevation and presented two different horizontal and vertical inter-channel time delays during a precedence-suppression task. A robust precedence effect was elicited via ear-level horizontal plane loudspeakers. In contrast, leading signal identification was minimal in the vertical condition and no systematic influence of the leading elevated and lowered median plane loudspeakers was witnessed, suggesting that precedence was not active in the vertical condition. Observed influences that might have been generated by the lead-lag signal in the vertical plane were not consistent with any known precedence paradigms.
Convention Paper 10261 (Purchase now)

P07-6 Perceptual Weighting to Improve Coding of Harmonic Signals—Elias Nemer, XPERI/DTS - Calabasas, CA, USA; Zoran Fejzo, DTS/Xperi Corp. - Calabasas, CA, USA; Jeff Thompson, XPERI/DTS - Calabasas, CA, USA
This paper describes a new approach to improving the coding of harmonic signals in transform-based audio codecs employing pulse vector quantization. The problem occurs when signals with varying levels of harmonics are coded at low rates. As a result of vector quantization (VQ), some lower level harmonics may be missed or may fluctuate, causing perceptual artifacts. The proposed solution consists of applying perceptual weighting to the computed synthesis error in the search loop of the VQ. The objective is to de-emphasize the error in the high tonal peaks where signal energy partially masks the quantization noise. Simulation results over mixed musical content showed a noticeable improvement in perceptual scores, particularly for highly harmonic signals.
Convention Paper 10262 (Purchase now)

P08 - Recording, Production, and Live Sound

Thursday, October 17, 9:00 am — 11:30 am (1E11)

Chair:
Wieslaw Woszczyk, McGill University - Montreal, QC, Canada

P08-1 Microphone Comparison: Spectral Feature Mapping for Snare Drum Recording—Matthew Cheshire, Birmingham City University - Birmingham, UK; Ryan Stables, Birmingham City University - Birmingham, UK; Jason Hockman, Birmingham City University - Birmingham, UK
Microphones are known to exhibit sonic differences and microphone selection is integral in achieving desired tonal qualities of recordings. In this paper an initial multi-stimuli listening test is used to categorize microphones based on user preference when recording snare drums. A spectral modification technique is then applied to recordings made with a microphone from the least preferred category, such that they take on the frequency characteristics of recordings from the most preferred category. To assess the success of the audio transformation, a second experiment is undertaken with expert listeners to gauge pre- and post-transformation preferences. Results indicate spectral transformation dramatically improves listener preference for recordings from the least preferred category, placing them on par with those of the most preferred.
Convention Paper 10263 (Purchase now)

P08-2 An Automated Approach to the Application of Reverberation—Dave Moffat, Queen Mary University London - London, UK; Mark Sandler, Queen Mary University of London - London, UK
The field of intelligent music production has been growing over recent years. There have been several different approaches to automated reverberation. In this paper we automate the parameters of an algorithmic reverb based on analysis of the input signals. Literature is used to produce a set of rules for the application of reverberation, and these rules are then represented directly as audio features. This audio feature representation is then used to control the reverberation parameters from the audio signal in real time.
Convention Paper 10264 (Purchase now)

P08-3 Subjective Graphical Representation of Microphone Arrays for Vertical Imaging and Three-Dimensional Capture of Acoustic Instruments, Part II—Bryan Martin, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, QC, Canada; Denis Martin, McGill University - Montreal, QC, Canada; CIRMMT - Montreal, QC, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Wieslaw Woszczyk, McGill University - Montreal, QC, Canada
This investigation employs a simple graphical method in an effort to represent the perceived spatial attributes of three microphone arrays designed to create vertical and three-dimensional audio images. Three separate arrays were investigated in this study: Coincident, M/S-XYZ, and Non-coincident/Five-point capture. Instruments of the orchestral string, woodwind, and brass sections were recorded. Test subjects were asked to represent the spatial attributes of the perceived audio image on a horizontal/vertical grid and a graduated depth grid, via a pencil drawing. Results show that the arrays exhibit a greater extent in every dimension—vertical, horizontal, and depth—compared to the monophonic image. The statistical trends show that the spatial characteristics of each array are consistent across each dimension. In the context of immersive/3D mixing and post production, a case can be made that the arrays will contribute to a more efficient and improved workflow due to the fact that they are easily optimized during mixing or post-production.
Convention Paper 10265 (Purchase now)

P08-4 Filling The Space: The Impact of Convolution Reverberation Time on Note Duration and Velocity in Duet Performance—James Weaver, Queen Mary University London - London, UK; Mathieu Barthet, Queen Mary University London - London, UK; Elaine Chew, CNRS-UMR9912/STMS (IRCAM) - Paris, France
This paper will not be presented. The impact of reverberation on musical expressivity is an area of growing interest as technology to simulate, and create, acoustic environments improves. Being able to characterize the impact of acoustic environments on musical performance is a problem of interest to acousticians, designers of virtual environments, and algorithmic composers. We analyze the impact of convolution reverberation time on note duration and note velocity, which serve as markers of musical expressivity. To improve note clarity in situations of long reverberation times, we posit musicians performing in a duo would lengthen the separation between notes (note duration) and increase loudness (note velocity) contrast. The data for this study comprises MIDI messages extracted from performances by 2 co-located pianists playing the same piece of music 100 times across 5 different reverberation conditions. To our knowledge, this is the largest data set to date looking at piano duo performance in a range of reverberation conditions. In contrast to prior work, the analysis considers both the entire performance and an excerpt at the opening part of the piece featuring a key structural element of the score. The analysis finds that convolution reverberation time is moderately positively correlated with mean note duration (r = 0.34 and p ≤ 0.001), but no significant correlation was found between convolution reverberation time and mean note velocity (r = -0.19 and p = 0.058).
Convention Paper 10266 (Purchase now)

P08-5 The Effects of Spectators on the Speech Intelligibility Performance of Sound Systems in Stadia and Other Large Venues—Peter Mapp, Peter Mapp Associates - Colchester, Essex, UK; Ross Hammond, University of Derby - Derby, Derbyshire, UK; Peter Mapp Associates - Colchester, UK
Stadiums and similar venues in the UK and throughout most of Europe are subject to strict safety standards and regulations, including the performance of their Public Address systems. The usual requirement is for the PA system to achieve a potential speech intelligibility performance of 0.50 STI, though some authorities and organizations require a higher value than this. However, a problem exists with measuring the performance of the system, as this can only be carried out in the empty stadium. The paper shows that with occupancy, the acoustic conditions change significantly, as the spectators introduce significant sound absorption and also increase the background noise level. The effect this can have on the intelligibility performance of the sound system is examined and discussed. The relationship between the unoccupied starting conditions and audience absorption and distribution is also investigated.
Convention Paper 10267 (Purchase now)

P09 - Posters: Applications in Audio

Thursday, October 17, 9:00 am — 10:30 am (South Concourse A)

P09-1 Analyzing Loudness Aspects of 4.2 Million Musical Albums in Search of an Optimal Loudness Target for Music Streaming—Eelco Grimm, HKU University of the Arts - Utrecht, Netherlands; Grimm Audio - Eindhoven, The Netherlands
In cooperation with music streaming service Tidal, 4.2 million albums were analyzed for loudness aspects such as loudest and softest track loudness. Evidence of development of the loudness war was found and a suggestion for music streaming services to use album normalization at –14 LUFS for mobile platforms and –18 LUFS or lower for stationary platforms was derived from the data set and a limited subject study. Tidal has implemented the recommendation and reports positive results.
Convention Paper 10268 (Purchase now)

P09-2 Audio Data Augmentation for Road Objects Classification by an Artificial Neural Network—Ohad Barak, Mentor Graphics - Mountain View, CA, USA; Nizar Sallem, Mentor Graphics - Mountain View, CA, USA
Following the resurgence of machine learning within the context of autonomous driving, the need for acquiring and labeling data has expanded manyfold. Despite the large amount of available visual data (images, point clouds, etc.), researchers apply augmentation techniques to extend the training dataset, which improves the classification accuracy. When trying to exploit audio data for autonomous driving, two challenges immediately surfaced: first, the lack of available data and second, the absence of augmentation techniques. In this paper we introduce a series of augmentation techniques suitable for audio data. We apply several procedures, inspired by data augmentation for image classification, that transform and distort the original data to produce similar effects on sound. We show the increase in overall accuracy of our neural network for sound classification by comparing it to the non-augmented version.
Convention Paper 10269 (Purchase now)

P09-3 Is Binaural Spatialization the Future of Hip-Hop?—Kierian Turner, University of Lethbridge - Lethbridge, AB, Canada; Amandine Pras, Digital Audio Arts - University of Lethbridge - Lethbridge, Alberta, Canada; School for Advanced Studies in the Social Sciences - Paris, France
Modern hip-hop is typically associated with samples and MIDI and not so much with creative source spatialization since the energy-driving elements are usually located in the center of a stereo image. To evaluate the impact of certain element placements behind, above, or underneath the listener on the listening experience, we experimented beyond standard mixing practices by spatializing beats and vocals of two hip-hop tracks in different ways. Then, 16 hip-hop musicians, producers, and enthusiasts, and three audio engineers compared a stereo and a binaural version of these two tracks in a perceptual experiment. Results showed that hip-hop listeners expect a few elements, including the vocals, to be mixed conventionally in order to create a cohesive mix and to minimize distractions.
Convention Paper 10270 (Purchase now)

P09-4 Alignment and Timeline Construction for Incomplete Analogue Audience Recordings of Historical Live Music Concerts—Thomas Wilmering, Queen Mary University of London - London, UK; Centre for Digital Music (C4DM); Florian Thalmann, Queen Mary University of London - London, UK; Mark Sandler, Queen Mary University of London - London, UK
Analogue recordings pose specific problems during automatic alignment, such as distortion due to physical degradation, or differences in tape speed during recording, copying, and digitization. Oftentimes, recordings are incomplete, exhibiting gaps of different lengths. In this paper we propose a method to align multiple digitized analogue recordings of the same concerts with varying quality and song segmentations. The process includes the automatic construction of a reference concert timeline. We evaluate alignment methods on a synthetic dataset and apply our algorithm to real-world data.
Convention Paper 10271 (Purchase now)

P09-5 Noise Robustness Automatic Speech Recognition with Convolutional Neural Network and Time Delay Neural Network—Jie Wang, Guangzhou University - Guangzhou, China; Dunze Wang, Guangzhou University - Guangzhou, China; Yunda Chen, Guangzhou University - Guangzhou, China; Xun Lu, Power Grid Planning Center, Guangdong Power Grid Company - Guangdong, China; Chengshi Zheng, Institute of Acoustics, Chinese Academy of Sciences - Beijing, China
To improve the performance of automatic speech recognition in noisy environments, a convolutional neural network (CNN) combined with a time-delay neural network (TDNN) is introduced, referred to as CNN-TDNN. The CNN-TDNN model is further optimized by factoring the parameter matrix in the time-delay neural network hidden layers and adding a time-restricted self-attention layer after the CNN-TDNN hidden layers. Experimental results show that the optimized CNN-TDNN model performs better than DNN, CNN, TDNN, and CNN-TDNN. The average word error rate (WER) can be reduced by 11.76% compared with the baselines.
Convention Paper 10272 (Purchase now)
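
For readers unfamiliar with the metric quoted in P09-5, word error rate is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below is illustrative only: the function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the paper, and the abstract does not state whether the 11.76% reduction is absolute or relative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one word dropped from a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Here `wer` is one error over six reference words.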

P10 - Spatial Audio, Part 1

Thursday, October 17, 1:15 pm — 4:15 pm (1E10)

Chair:
Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA

P10-1 Use of the Magnitude Estimation Technique in Reference-Free Assessments of Spatial Audio Technology—Alex Brandmeyer, Dolby Laboratories - San Francisco, CA, USA; Dan Darcy, Dolby Laboratories, Inc. - San Francisco, CA, USA; Lie Lu, Dolby Laboratories - San Francisco, CA, USA; Richard Graff, Dolby Laboratories, Inc. - San Francisco, CA, USA; Nathan Swedlow, Dolby Laboratories - San Francisco, CA, USA; Poppy Crum, Dolby Laboratories - San Francisco, CA, USA
Magnitude estimation is a technique developed in psychophysics research in which participants numerically estimate the relative strengths of a sequence of stimuli along a relevant dimension. Traditionally, the method has been used to measure basic perceptual phenomena in different sensory modalities (e.g., "brightness," "loudness"). We present two examples of using magnitude estimation in the domain of audio rendering for different categories of consumer electronics devices. Importantly, magnitude estimation doesn’t require a reference stimulus and can be used to assess general ("audio quality") and domain-specific (e.g., "spaciousness") attributes. Additionally, we show how this data can be used together with objective measurements of the tested systems in a model that can predict performance of systems not included in the original assessment.
Convention Paper 10273 (Purchase now)

P10-2 Subjective Assessment of the Versatility of Three-Dimensional Near-Field Microphone Arrays for Vertical and Three-Dimensional Imaging—Bryan Martin, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, QC, Canada; Jack Kelly, McGill University - Montreal, QC, Canada; Brett Leonard, University of Indianapolis - Indianapolis, IN, USA; The Chelsea Music Festival - New York, NY, USA
This investigation examines the operational size-range of audio images recorded with advanced close-capture microphone arrays for three-dimensional imaging. It employs a 3D panning tool to manipulate audio images. The 3D microphone arrays used in this study were: Coincident-XYZ, M/S-XYZ, and Non-coincident-XYZ/five-point. Instruments of the orchestral string, woodwind, and brass sections were recorded. The objective of the test was to determine the point of three-dimensional expansion onset, preferred imaging, and image breakdown point. Subjects were presented with a continuous dial to manipulate the three-dimensional spread of the arrays, allowing them to expand or contract the microphone signals from 0° to 90° azimuth/elevation. The results showed that the M/S-XYZ array is the perceptually “biggest” of the capture systems under test and displayed the fastest sense of expansion onset. Subjects agreed much less on the coincident and non-coincident arrays, particularly in terms of preference, and also in expansion onset.
Convention Paper 10274 (Purchase now)

P10-3 Defining Immersion: Literature Review and Implications for Research on Immersive Audiovisual Experiences—Sarvesh Agrawal, Bang & Olufsen a/s - Struer, Denmark; Department of Photonics Engineering, Technical University of Denmark; Adèle Simon, Bang & Olufsen a/s - Struer, Denmark; Søren Bech, Bang & Olufsen a/s - Struer, Denmark; Aalborg University - Aalborg, Denmark; Klaus Bærentsen, Aarhus University - Aarhus, Denmark; Søren Forchhammer, Technical University of Denmark - Lyngby, Denmark
The use of the term “immersion” to describe a multitude of varying experiences in the absence of a definitional consensus has obfuscated and diluted the term. This paper presents a non-exhaustive review of previous work on immersion on the basis of which a definition of immersion is proposed: a state of deep mental involvement in which the subject may experience disassociation from the awareness of the physical world due to a shift in their attentional state. This definition is used to contrast and differentiate interchangeably used terms such as presence and envelopment from immersion. Additionally, an overview of prevailing measurement techniques, implications for research on immersive audiovisual experiences, and avenues for future work are discussed briefly.
Convention Paper 10275 (Purchase now)

P10-4 Evaluation on the Perceptual Influence of Floor Level Loudspeakers for Immersive Audio Reproduction—Yannik Grewe, Fraunhofer IIS - Erlangen, Germany; Andreas Walther, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Julian Klapp, Fraunhofer IIS - Erlangen, Germany
Listening tests were conducted to evaluate the perceptual influence of adding a lower layer of loudspeakers to a setup that is commonly used for immersive audio reproduction. Three setups using horizontally arranged loudspeakers (1M, 2M, 5M), one with added height loudspeakers (5M+4H), and one with additional floor level loudspeakers (5M+4H+3L) were compared. Basic Audio Quality was evaluated in a sweet-spot test with explicit reference, and two preference tests (sweet-spot and off sweet-spot) were performed to evaluate the Overall Audio Quality. The stimuli, e.g., ambient recordings and sound design material, made dedicated use of the lower loudspeaker layer. The results show that reproduction comprising a lower loudspeaker layer is preferred compared to reproduction using the other loudspeaker setups included in the test.
Convention Paper 10276 (Purchase now)

P10-5 Investigating Room-Induced Influences on Immersive Experience Part II: Effects Associated with Listener Groups and Musical Excerpts—Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA; Shuichi Sakamoto, Tohoku University - Sendai, Japan
The authors previously compared four distinct multichannel playback rooms and showed that perceived spatial attributes of program material (width, depth, and envelopment) were similar across all four rooms when reproduced through a 22-channel loudspeaker array. The present study further investigated perceived auditory immersion from two additional variables: listener group and musical style. We found a three-way interaction of variables, MUSIC x (playback) ROOM x GROUP for 22-channel reproduced music. The interaction between musical material and playback room acoustics differentiates perceived auditory immersion across listener groups. However, in the 2-channel reproductions, the room and music interaction is prominent enough to flatten inter-group differences. The 22-channel reproduced sound fields may have shaped idiosyncratic cognitive bases for each listener group.
Convention Paper 10277 (Purchase now)

P10-6 Comparison Study of Listeners’ Perception of 5.1 and Dolby Atmos—Tomas Oramus, Academy of Performing Arts in Prague - Prague, Czech Republic; Petr Neubauer, Academy of Performing Arts in Prague - Prague, Czech Republic
Surround sound reproduction has been a common technology in almost every theater room for several decades. In 2012 Dolby Laboratories, Inc. announced a new spatial 3D audio format – Dolby Atmos [1] that (due to its object-based rendering) pushes the possibilities of spatial reproduction and supposedly listeners' experience forward. This paper examines listeners' perception of this format in comparison with today's unwritten standard for cinema reproduction – 5.1. Two sample groups were chosen for the experiment - experienced listeners (sound designers and sound design students) and inexperienced listeners; the objective was to examine how these two groups perceive selected formats and whether there is any difference between these two groups. We aimed at five aspects – Spatial Immersion (Envelopment), Localization, Dynamics, Audio Quality, and Format Preference. The results show mostly an insignificant difference between these two groups while both of them slightly leaned towards Dolby Atmos over 5.1.
Convention Paper 10278 (Purchase now)

P11 - Semantic Audio

Thursday, October 17, 1:15 pm — 2:45 pm (1E11)

Chair:
Robert C. Maher, Montana State University - Bozeman, MT, USA

P11-1 Impact of Statistical Parameters of Late Reverberation on the Instantaneous Frequencies of Reverberant Audio—Sarah R. Smith, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
This paper addresses the impact of late reverberation on the instantaneous frequency tracks of reverberant audio. While existing models of early reflections and low frequency room modes enable prediction of instantaneous frequency tracks of a filtered signal, the effects of late reverberation are best modeled statistically. After reviewing the parameterization of late reverberation, the effects of frequency dependent decay time and direct to reverberant ratio on instantaneous frequency are investigated using synthetic impulse responses derived from velvet noise. These effects are quantified using the autocorrelation function of the reverberant instantaneous frequency tracks. Finally, the instantaneous frequency deviations that occur when an anechoic sound is filtered with a recorded impulse response are compared to those resulting from synthesized late reverberation.
Convention Paper 10279 (Purchase now)

P11-2 Precise Temporal Localization of Sudden Onsets in Audio Signals Using the Wavelet Approach—Yuxuan Wan, Hong Kong University of Science and Technology - Clean Water Bay, Hong Kong; Yijia Chen, Hong Kong University of Science and Technology - Clean Water Bay, Hong Kong; Keegan Yi Hang Sim, Hong Kong University of Science and Technology - Clean Water Bay, Hong Kong; Lijia Wu, Hong Kong University of Science and Technology - Hong Kong, China; Xianzheng Geng, Hong Kong University of Science and Technology - Clean Water Bay, Hong Kong; Kevin Chau, Hong Kong University of Science and Technology - Clean Water Bay, Hong Kong
Presently reported is a wavelet-based method for the temporal localization of sudden onsets in audio signals with sub-millisecond precision. The method requires only O(n) operations, which makes it highly efficient. The entire audio signal can be processed as a whole without the need to be broken down into individual windowed overlapping blocks. It can also be processed in a streaming mode compatible with real-time processing. In comparison with time-domain and frequency-domain methods, the wavelet-based method proposed here offers several distinct advantages in sudden onset detection, temporal localization accuracy, and computational cost, and may therefore find broad applications in audio signal processing and music information retrieval.
Convention Paper 10280 (Purchase now)

P11-3 Forensic Comparison of Simultaneous Recordings of Gunshots at a Crime Scene—Robert C. Maher, Montana State University - Bozeman, MT, USA; Ethan Hoerr, Montana State University - Bozeman, MT, USA
Audio forensic evidence is of increasing importance in law enforcement investigations because of the growing use in the United States of personal audio/video recorders carried by officers on duty, by bystanders, and by surveillance systems of businesses and residences. These recording systems capture speech, background environmental sounds, and in some cases, gunshots and other firearm sounds. When there are multiple audio recording devices near the scene of a gunfire incident, the similarities and differences of the various recordings can either help or hamper the audio forensic examiner’s efforts to describe the sequence of events. This paper considers several examples and provides recommendations for audio forensic examiners in the interpretation of this gunshot acoustic evidence.
Convention Paper 10281 (Purchase now)

P12 - Posters: Room Acoustics

Thursday, October 17, 3:00 pm — 4:30 pm (South Concourse A)

P12-1 Transparent Office Screens Based on Microperforated Foil—Krzysztof Brawata, Gorycki&Sznyterman Sp. z o.o. - Cracow, Poland; Katarzyna Baruch, Gorycki&Sznyterman Sp. z o.o. - Cracow, Poland; Tadeusz Kamisinski, AGH University of Science and Technology - Cracow, Poland; Bartlomiej Chojnacki, AGH University of Science and Technology - Cracow, Poland; Mega-Acoustic - Kepno, Poland
In recent years, providing comfortable working conditions in open office spaces has become a growing challenge. The ever-increasing demand for office work implies the emergence of ever new spaces and the need to use available space, which generates the need for proper interior design. There are many acoustic solutions available on the market that support the acoustic comfort in office spaces by ensuring appropriate levels of privacy and low levels of acoustic background. One such solution is the desktop screen, which divides employees' space. These screens are based mainly on sound-absorbing materials, e.g., mineral wool and felt, as well as sound-insulating ones, such as glass or MDF. The article presents methods of using microperforated foils for building acoustic screens. The influence of the dimensions and parameters of the microperforated foil was examined. The method of its assembly as well as the use of layered systems made of microperforated foil and sound-insulating material were also considered in this paper.
Convention Paper 10282 (Purchase now)

P12-2 A Novel Spatial Impulse Response Capture Technique for Realistic Artificial Reverberation in the 22.2 Multichannel Audio Format—Jack Kelly, McGill University - Montreal, QC, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Wieslaw Woszczyk, McGill University - Montreal, QC, Canada
As immersive media content and technology begin to enter the marketplace, the need for truly immersive spatial reverberation tools takes on a renewed significance. A novel spatial impulse response capture technique optimized for the 22.2 multichannel audio format is presented. The proposed technique seeks to offer a path for engineers who are interested in creating three-dimensional spatial reverberation through convolution. Its design is informed by three-dimensional microphone techniques for the channel-based capture of acoustic music. A technical description of the measurement system used is given. The processes by which the spatial impulse responses are captured and rendered, including deconvolution and loudness normalization, are described. Three venues that have been measured using the proposed technique are presented. Preliminary listening sessions suggest that the array is capable of delivering a convincing three-dimensional reproduction of several acoustic spaces with a high degree of fidelity. Future research into the perception of realism in spatial reverberation for immersive music production is discussed.
Convention Paper 10283 (Purchase now)

P12-3 Impulse Response Simulation of a Small Room and in situ Measurements Validation—Daniel Núñez-Solano, University of Las Américas - Quito, Ecuador; Virginia Puyana-Romero, University of Las Américas - Quito, Ecuador; Cristian Ordóñez-Andrade, University of Las Américas - Quito, Ecuador; Luis Bravo-Moncayo, Universidad de Las Américas - Quito, Ecuador; Christiam Garzón-Pico, Universidad de Las Américas - Quito, Ecuador
The study of reverberation time in room acoustics presents certain drawbacks when dealing with small spaces. In order to reduce the inaccuracies due to the lack of space for placing measurement devices, finite element methods become a good alternative to support measurement results or to predict the reverberation time on the basis of calculated impulse responses. This paper presents a comparison of the reverberation time obtained by means of in situ and simulated impulse responses. The impulse response is simulated using time-domain finite element methods. The room used for measurements and simulations is a control room at Universidad de Las Américas. Results show a mean absolute error of 0.04 s between the measured and computed reverberation times.
Convention Paper 10284 (Purchase now)

P12-4 Calculation of Directivity Patterns from Spherical Microphone Array Recordings—Carlotta Anemüller, International Audio Laboratories Erlangen - Erlangen, Germany; Jürgen Herre, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany
Taking into account the direction-dependent radiation of natural sound sources (such as musical instruments) can help to enhance auralization processing and thus improve the plausibility of simulated acoustical environments as, e.g., found in virtual reality (VR) systems. In order to quantify this direction-dependent behavior, so-called directivity patterns are usually used. This paper investigates two different methods that can be used to calculate directivity patterns from spherical microphone array recordings. A comparison between both calculation methods is performed based on the resulting directivity patterns. Furthermore, the directivity patterns of several musical instruments are analyzed and important measures are extracted. For all calculations, the publicly available anechoic microphone array measurements database recorded at the Technical University Berlin (TU Berlin) was used.
Convention Paper 10285 (Purchase now)

P13 - Spatial Audio, Part 2

Friday, October 18, 9:00 am — 11:00 am (1E10)

Chair:
Doyuen Ko, Belmont University - Nashville, TN, USA

P13-1 Simplified Source Directivity Rendering in Acoustic Virtual Reality Using the Directivity Sample Combination—Georg Götz, Aalto University - Espoo, Finland; Ville Pulkki, Aalto University - Espoo, Finland
This contribution proposes a simplified rendering of source directivity patterns for the simulation and auralization of auditory scenes consisting of multiple listeners or sources. It is based on applying directivity filters of arbitrary directivity patterns at multiple, supposedly important directions, and approximating the filter outputs of intermediate directions by interpolation. This reduces the amount of required filtering operations considerably and thus increases the computational efficiency of the auralization. As a proof of concept, the simplification is evaluated from a technical as well as from a perceptual point of view for one specific use case. The promising results suggest further studies of the proposed simplification in the future to assess its applicability to more complex scenarios.
Convention Paper 10286 (Purchase now)
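
The idea summarized in P13-1, filtering only at a few sampled directions and interpolating the filter outputs for directions in between, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes horizontal-plane directions given as azimuths in degrees, short FIR directivity filters, and plain linear interpolation between the two neighbouring sampled directions. The names `fir_filter` and `render_directions` and the example filters are hypothetical.

```python
from bisect import bisect_right


def fir_filter(signal, taps):
    """Direct-form FIR filtering; the output has the same length as the input."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * signal[n - k]
        out.append(acc)
    return out


def render_directions(signal, sampled_filters, query_azimuths):
    """sampled_filters maps azimuth (degrees) to FIR taps at a few directions.

    Returns {azimuth: output signal} for each query azimuth, interpolating
    linearly between the outputs of the two neighbouring sampled directions
    (with wrap-around at 360 degrees).
    """
    az = sorted(sampled_filters)
    # Filter once per sampled direction, not once per query direction.
    outputs = {a: fir_filter(signal, sampled_filters[a]) for a in az}
    result = {}
    for q in query_azimuths:
        q = q % 360.0
        i = bisect_right(az, q)
        lo = az[i - 1]        # i == 0 wraps to the last sampled azimuth
        hi = az[i % len(az)]  # i == len(az) wraps to the first one
        span = (hi - lo) % 360.0
        w = ((q - lo) % 360.0) / span if span else 0.0
        result[q] = [(1 - w) * x + w * y
                     for x, y in zip(outputs[lo], outputs[hi])]
    return result


# Hypothetical example: two sampled directions, one intermediate query.
rendered = render_directions(
    [1.0, 0.0, 0.0],
    {0: [1.0], 90: [0.0, 1.0]},
    [0, 45],
)
```

With many query directions (for example, many listeners), the filtering cost depends only on the number of sampled directions, which is the efficiency gain the abstract refers to.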

P13-2 Classification of HRTFs Using Perceptually Meaningful Frequency Arrays—Nolan Eley, New York University - New York, NY, USA
Head-related transfer functions (HRTFs) are essential in binaural audio. Because HRTFs are highly individualized and difficult to acquire, much research has been devoted towards improving HRTF performance for the general population. Such research requires a valid and robust method for classifying and comparing HRTFs. This study used a k-nearest neighbor (KNN) classifier to evaluate the ability of several different frequency arrays to characterize HRTFs. The perceptual impact of these frequency arrays was evaluated through a subjective test. Mel-frequency arrays showed the best results in the KNN classification tests while the subjective test results were inconclusive.
Convention Paper 10288 (Purchase now)

P13-3 An HRTF Based Approach towards Binaural Sound Source Localization—Kaushik Sunder, Embody VR - Mountain View, CA, USA; Yuxiang Wang, University of Rochester - Rochester, NY, USA
With the evolution of smart headphones, hearables, and hearing aids there is a need for technologies to improve situational awareness. The device needs to constantly monitor real-world events and cue the listener to stay aware of the outside world. In this paper we develop a technique to identify the exact location of the dominant sound source using the unique spectral and temporal features of the listener’s head-related transfer functions (HRTFs). Unlike most state-of-the-art beamforming technologies, this method localizes the sound source using just two microphones, thereby reducing the cost and complexity of this technology. An experimental framework is set up at the EmbodyVR anechoic chamber, and hearing aid recordings are carried out for several different trajectories, SNRs, and turn-rates. Results indicate that the source localization algorithms perform well for dynamic moving sources for different SNR levels.
Convention Paper 10289 (Purchase now)

P13-4 Physical Controllers vs. Hand-and-Gesture Tracking: Control Scheme Evaluation for VR Audio Mixing—Justin Bennington, Belmont University - Nashville, TN, USA; Doyuen Ko, Belmont University - Nashville, TN, USA
This paper investigates potential differences in performance for both physical and hand-and-gesture control within a Virtual Reality (VR) audio mixing environment. The test was designed to draw upon prior evaluations of control schemes for audio mixing while presenting sound sources to the user for both controller schemes within VR. A VR audio mixing interface was developed in order to facilitate a subjective evaluation of two control schemes. Response data were analyzed with t-tests and ANOVA. Physical controllers were generally rated higher than the hand-and-gesture controls in terms of perceived accuracy, efficiency, and satisfaction. No significant difference in task completion time between the two control schemes was found. The test participants largely preferred the physical controllers over the hand-and-gesture control scheme. There were no significant differences in the ability to make adjustments in general when comparing groups of more experienced and less experienced audio engineers.
Convention Paper 10290 (Purchase now)

P14 - Spatial Audio, Part 3

Friday, October 18, 1:45 pm — 4:15 pm (1E10)

Chair:
Christof Faller, Illusonic GmbH - Uster, Zürich, Switzerland; EPFL - Lausanne, Switzerland

P14-1 Measurement of Oral-Binaural Room Impulse Response by Singing Scales—Munhum Park, King Mongkut's Institute of Technology Ladkrabang - Bangkok, Thailand
Oral-binaural room impulse responses (OBRIRs) are the transfer functions from mouth to ears measured in a room. Modulated by many factors, OBRIRs contain information for the study of stage acoustics from the performer’s perspective and can be used for auralization. Measuring OBRIRs on a human is, however, a cumbersome and time-consuming process. In the current study some issues of the OBRIR measurement on humans were addressed in a series of measurements. With in-ear and mouth microphones volunteers sang scales, and a simple post-processing scheme was used to refine the transfer functions. The results suggest that OBRIRs may be measured consistently by using the proposed protocol, where only 4~8 diatonic scales need to be sung depending on the target signal-to-noise ratio.
Convention Paper 10291 (Purchase now)

P14-2 Effects of Capsule Coincidence in FOA Using MEMS: Objective Experiment—Gabriel Zalles, University of California, San Diego - La Jolla, CA, USA
This paper describes an experiment attempting to determine the effects of capsule coincidence in First Order Ambisonic (FOA) capture. While the spatial audio technique of ambisonics has been widely researched, it continues to grow in interest with the proliferation of AR and VR devices and services. Specifically, this paper attempts to determine whether the increased capsule coincidence afforded by Micro-Electro-Mechanical Systems (MEMS) capsules can help increase the impression of realism in spatial audio recordings via objective and subjective analysis. This is the first of a two-part paper.
Convention Paper 10292 (Purchase now)

P14-3 Spatial B-Format Equalization—Alexis Favrot, Illusonic GmbH - Uster, Switzerland; Christof Faller, Illusonic GmbH - Uster, Zürich, Switzerland; EPFL - Lausanne, Switzerland
Audio corresponding to the moving picture of a virtual reality (VR) camera can be recorded using a VR microphone. The resulting A or B-format channels are decoded with respect to the look-direction for generating binaural or multichannel audio following the visual scene. Existing post-production tools are limited to only linear matrixing and filtering of the recorded channels when only the signal of a VR microphone is available. A time-frequency adaptive method is presented that provides native B-format manipulations, such as equalization, which can be applied to sound arriving from a specific direction with a high spatial resolution, yielding a backwards compatible modified B-format signal. Both linear and adaptive approaches are compared to the ideal case of truly equalized sources.
Convention Paper 10293 (Purchase now)

P14-4 Exploratory Research into the Suitability of Various 3D Input Devices for an Immersive Mixing Task—Diego I Quiroz Orozco, McGill University - Montreal, QC, Canada; Denis Martin, McGill University - Montreal, QC, Canada; CIRMMT - Montreal, QC, Canada
This study evaluates the suitability of one 2D (mouse and fader) and three 3D (Leap Motion, Space Mouse, Novint Falcon) input devices for an immersive mixing task. A test, in which subjects were asked to pan a monophonic sound object (probe) to the location of a pink noise burst (target), was conducted in a custom 3D loudspeaker array. The objectives were to determine how quickly the subjects were able to perform the task using each input device, which of the four was most appropriate for the task, and which was most preferred overall. Results show significant differences in response time between 2D and 3D input devices. Furthermore, it was found that localization blur had a significant influence over the subject’s response time, as well as “corner” locations.
Convention Paper 10294 (Purchase now)

P14-5 The 3DCC Microphone Technique: A Native B-format Approach to Recording Musical Performance—Kathleen "Ying-Ying" Zhang, New York University - New York, NY, USA; McGill University - Montreal, QC, Canada; Paul Geluso, New York University - New York, NY, USA
In this paper we propose a “native” B-format recording technique that uses dual-capsule microphone technology. The three dual coincident capsule (3DCC) microphone array is a compact soundfield capturing system. 3DCC’s advantage is that it requires minimal matrix processing during post-production to create either a B-format signal or a multi-pattern, discrete six-channel output with high stereo compatibility. Given its versatility, the system is also capable of producing a number of different primary and secondary signals that are either natively available or derived in post-production. A case study of the system’s matrixing technique has resulted in robust immersive imaging in a multichannel listening environment, leading to the possibility of future development of the system as a single six-channel soundfield microphone.
Convention Paper 10295 (Purchase now)

P15 - Audio Education

Saturday, October 19, 9:00 am — 11:30 am (1E10)

Chair:
Amandine Pras, Digital Audio Arts - University of Lethbridge - Lethbridge, Alberta, Canada; School for Advanced Studies in the Social Sciences - Paris, France

P15-1 Production Processes of Pop Music Arrangers in Bamako, Mali—Amandine Pras, Digital Audio Arts - University of Lethbridge - Lethbridge, Alberta, Canada; School for Advanced Studies in the Social Sciences - Paris, France; Kierian Turner, University of Lethbridge - Lethbridge, AB, Canada; Toby Bol, University of Lethbridge - Lethbridge, Canada; Emmanuelle Olivier, CNRS Centre Georg Simmel (EHESS) - Paris, France
Bamako, economic capital of Mali in West Africa, saw the recent multiplication of digital studios based on Cubase 5, FL Studio, cracked plugins, a MIDI keyboard, and a small cabin with a cheap condenser microphone and a pop-filter. From videos and screen captures of recording sessions in three of these studios, we analyzed the creative process of four DAW practitioners from the beginning of the beat production to the mastering of the track. We also examined their interaction with the singers and rappers. Our analyses showed that young Malian DAW practitioners constantly revisit their MIDI arrangement and vocal recordings with advanced editing techniques. Locally successful, they have quickly developed a notoriety that enables them to be directive with their clients.
Convention Paper 10296 (Purchase now)

P15-2 Towards a Pedagogy of Multitrack Audio Resources for Sound Recording Education—Kirk McNally, University of Victoria, School of Music - Victoria, BC, Canada; Paul Thompson, Leeds Beckett University - Leeds, West Yorkshire, UK; Ken Scott, Leeds Beckett University - Leeds, UK
This paper describes preliminary research into pedagogical approaches to teach and train sound recording students using multitrack audio recordings. Two recording sessions are described and used to illustrate where there is evidence of technical, musical, and socio-cultural knowledge in multitrack audio holdings. Approaches for identifying, analyzing, and integrating this into audio education are outlined. This work responds to the recent AESTD 1002.2.15-02 recommendation for delivery of recorded music projects and calls from within the field to address the advantages, challenges, and opportunities of including multitrack recordings in higher education teaching and research programs.
Convention Paper 10297 (Purchase now)

P15-3 Withdrawn—N/A

Convention Paper 10298 (Purchase now)

P15-4 Mental Representations in Critical Listening Education: A Preliminary Study—Stephane Elmosnino, University of Technology Sydney - Sydney, New South Wales, Australia; SAE Institute - Brisbane, Queensland, Australia
This paper reports on a survey of critical listening training offered at tertiary education providers in the USA, UK, Australia, and Canada. The purpose of the investigation is to explore the concept of mental representations in educational contexts, as instructional materials do not always consider this aspect despite a rich research terrain in the field. The analysis shows a wide diversity of instructional methods used, seemingly influenced by course subject matter and institution business model. It also reveals a need to accurately define the concept of critical listening, depending on the context of its use. This study provides the background to a proposed evaluation of the effectiveness of mental representation models applied to new instructional designs.
Convention Paper 10299 (Purchase now)

P15-5 The Generation Gap—Perception and Workflow of Analog vs. Digital Mixing—Ryland Chambers-Moranz, University of Lethbridge - Lethbridge, AB, Canada; Amandine Pras, Digital Audio Arts - University of Lethbridge - Lethbridge, Alberta, Canada; School for Advanced Studies in the Social Sciences - Paris, France; Nate Thomas, University of Lethbridge - Lethbridge, AB, Canada
Are sound engineers showing preference for the mixing technology of their generation? We interviewed producer Ezequiel Morfi, who owns TITANIO in Buenos Aires, and contrasted his opinions with those of four mixers based in Western Canada who were required to use analog-only or digital-only mixing tools when preparing stimuli for this study. To test the myths about which technology sounds superior, 19 trained listeners aged 17–37 compared analog and digital mixes of 8 pop-rock tracks in a double-blind listening test. The main results showed that the analog version of one track was significantly preferred by 79% of the listeners (p=.02), and we observed a slight trend towards an effect of age on preference for the analog format (p=.09).
Convention Paper 10300 (Purchase now)
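
As a rough plausibility check on the figures reported for P15-5, the sketch below reads 79% of 19 listeners as 15 listeners and computes an exact two-sided binomial p-value against chance. Both the head count and the choice of test are assumptions made here for illustration; the abstract does not name the test the authors used.

```python
from math import comb


def binomial_two_sided_p(successes: int, n: int) -> float:
    """Exact two-sided p-value against p = 0.5.

    The null distribution is symmetric, so the upper tail is doubled.
    Valid for successes >= n / 2.
    """
    upper_tail = sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


n_listeners = 19
# Assumption: 79% is a rounded share of the 19 listeners.
preferred = round(0.79 * n_listeners)
p_value = binomial_two_sided_p(preferred, n_listeners)
print(preferred, round(p_value, 3))
```

Under these assumptions the result rounds to the reported p=.02, but that agreement does not establish which test was actually applied.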

P16 - Posters: Spatial Audio

Saturday, October 19, 10:30 am — 12:00 pm (South Concourse A)

P16-1 Calibration Approaches for Higher Order Ambisonic Microphone Arrays—Charles Middlicott, University of Derby - Derby, UK; Sky Labs Brentwood - Essex, UK; Bruce Wiggins, University of Derby - Derby, Derbyshire, UK
Recent years have seen an increase in the capture and production of ambisonic material due to companies such as YouTube and Facebook utilizing ambisonics for spatial audio playback. This uptake has created a greater need for affordable higher-order microphone arrays. This work details the development of a five-channel circular horizontal ambisonic microphone intended as a tool to explore various optimization techniques, focusing on capsule calibration and pre-processing approaches for unmatched capsules.
Convention Paper 10301 (Purchase now)

P16-2 A Qualitative Investigation of Soundbar Theory—Julia Perla, Belmont University - Nashville, TN, USA; Wesley Bulla, Belmont University - Nashville, TN, USA
This study investigated basic acoustic principles and assumptions that form the foundation of soundbar technology. A qualitative listening test compared 12 original soundscape scenes each comprising five stationary and two moving auditory elements. Subjects listened to a 5.1 reference scene and were asked to rate “spectral clarity and richness of sound,” “width and height,” and “immersion and envelopment” of stereophonic, soundbar, and 5.1 versions of each scene. ANOVA revealed a significant effect for all three systems. In all three attribute groups, stereophonic was rated lowest, followed by soundbar, then surround. Results suggest waveguide-based “soundbar technology” might provide a more immersive experience than stereo but will not likely be as immersive as true surround reproduction.
Convention Paper 10302 (Purchase now)

P16-3 The Effect of the Grid Resolution of Binaural Room Acoustic Auralization on Spatial and Timbral Fidelity—Dale Johnson, University of Huddersfield - Huddersfield, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
This paper investigates the effect of the grid resolution of binaural room acoustic auralization on spatial and timbral fidelity. Binaural concert hall stimuli were generated using a virtual acoustics program utilizing image source and ray tracing techniques. Each image source and ray was binaurally synthesized using Lebedev grids of increasing resolution from 6 to 5810 (reference) points. A MUSHRA test was performed in which subjects rated the magnitudes of the spatial and timbral differences between each stimulus and the reference. Overall, on the MUSHRA grading scale, 6 points were perceived to be "Fair," 14 points "Good," and 26 points and above all "Excellent," for both spatial and timbral fidelity.
Convention Paper 10303 (Purchase now)

P16-4 A Compact Loudspeaker Matrix System to Create 3D Sounds for Personal Uses—Aya Saito, University of Aizu - Aizuwakamatsu City, Japan; Takahiro Nemoto, University of Aizu - Aizuwakamatsu, Japan; Akira Saji, University of Aizu - Aizuwakamatsu City, Japan; Jie Huang, University of Aizu - Aizuwakamatsu City, Japan
In this paper we propose a new two-layer 3D sound system arranged as a matrix with five loudspeakers on each side of the listener. The system is effective for sound localization and compact enough for personal use. Sound images in this system are created by an extended amplitude panning method, with the effect of head-related transfer functions (HRTFs). The sound localization performance of the system was evaluated in auditory experiments with listeners. As a result, listeners could distinguish the direction of sound images localized at any azimuth and at high elevations with small biases.
Convention Paper 10304 (Purchase now)

P16-5 Evaluation of Spatial Audio Quality of the Synthesis of Binaural Room Impulse Responses for New Object Positions—Stephan Werner, Technische Universität Ilmenau - Ilmenau, Germany; Florian Klein, Technische Universität Ilmenau - Ilmenau, Germany; Clemens Müller, Technical University of Ilmenau - Ilmenau, Germany
The aim of auditory augmented reality is to create an auditory illusion combining virtual audio objects and scenarios with the perceived real acoustic surrounding. A suitable system like position-dynamic binaural synthesis is needed to minimize perceptual conflicts with the perceived real world. The needed binaural room impulse responses (BRIRs) have to fit the acoustics of the listening room. One approach to minimize the large number of BRIRs for all source-receiver relations is the synthesis of BRIRs using only one measurement in the listening room. The focus of the paper is the evaluation of the spatial audio quality. In most conditions, differences in direct-to-reverberant energy ratio between a reference and the synthesis are below the just noticeable difference. Furthermore, small differences are found for perceived overall difference, distance, and direction perception. Perceived externalization is comparable to the usage of measured BRIRs. Challenges were identified in synthesizing more distant sources from a source position that is closer to the listening positions.
Convention Paper 10305 (Purchase now)

P16-6 Withdrawn—N/A

P16-7 An Adaptive Crosstalk Cancellation System Using Microphones at the Ears—Tobias Kabzinski, RWTH Aachen University - Aachen, Germany; Peter Jax, RWTH Aachen University - Aachen, Germany
For the reproduction of binaural signals via loudspeakers, crosstalk cancellation systems are necessary. To compute the crosstalk cancellation filters, the transfer functions between loudspeakers and ears must be given. If the listener moves, the filters are usually updated based on a model or previously measured transfer functions. We propose a novel architecture: microphones are placed close to the listener’s ears to continuously estimate the true transfer functions, which are then used to adapt the crosstalk cancellation filters. A fast frequency-domain state-space approach is employed for multichannel system tracking. For simulations of slow listener rotations, it is demonstrated by objective and subjective means that the proposed system successfully attenuates crosstalk of the direct sound components.
Convention Paper 10307 (Purchase now)
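
To illustrate why the loudspeaker-to-ear transfer functions are needed, the following minimal sketch inverts a 2x2 transfer matrix at a single frequency bin so that each ear receives only its intended signal. It shows only a static, unregularized inversion, not the adaptive frequency-domain state-space tracking described in the abstract. The matrix values and function names are hypothetical.

```python
def invert_2x2(h):
    """Return the inverse of a 2x2 complex matrix given as nested lists."""
    (h11, h12), (h21, h22) = h
    det = h11 * h22 - h12 * h21
    if abs(det) < 1e-12:
        raise ValueError("transfer matrix is singular at this frequency bin")
    return [[h22 / det, -h12 / det],
            [-h21 / det, h11 / det]]


def matmul_2x2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical transfer matrix for one frequency bin:
# rows are ears, columns are loudspeakers.
H = [[1.0 + 0.0j, 0.4 - 0.2j],
     [0.4 - 0.2j, 1.0 + 0.0j]]

C = invert_2x2(H)           # crosstalk cancellation filter for this bin
product = matmul_2x2(H, C)  # should be close to the identity matrix
```

In practice the inversion is usually regularized and repeated for every frequency bin. When the listener moves, H changes, which is why the filters have to be updated.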

P16-8 Immersive Sound Reproduction in Real Environments Using a Linear Loudspeaker Array—Valeria Bruschi, Università Politecnica delle Marche - Ancona, Italy; Nicola Ortolani, Università Politecnica delle Marche - Ancona (AN), Italy; Stefania Cecchi, Università Politecnica delle Marche - Ancona, Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy
In this paper an immersive sound reproduction system capable of improving the overall listening experience is presented and tested using a linear loudspeaker array. The system aims at providing channel separation over a broadband spectrum by implementing the RACE (Recursive Ambiophonic Crosstalk Elimination) algorithm and a beamforming algorithm based on a pressure matching approach. A real-time implementation of the algorithm has been developed and its performance has been evaluated by comparing it with the state of the art. Objective and subjective measurements have confirmed the effectiveness of the proposed approach.
Convention Paper 10308 (Purchase now)

P16-9 The Influences of Microphone System, Video, and Listening Position on the Perceived Quality of Surround Recording for Sport Content—Aimee Moulson, University of Huddersfield - Huddersfield, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
This paper investigates the influences of the recording/reproduction format, video, and listening position on the quality perception of surround ambience recordings for sporting events. Two microphone systems—First Order Ambisonics (FOA) and Equal Segment Microphone Array (ESMA)—were compared in both 4-channel (2D) and 8-channel (3D) loudspeaker reproductions. One subject group tested audio-only conditions while the other group was presented with video as well as audio. Overall, the ESMA was rated significantly higher than the FOA for all quality attributes tested regardless of the presence of video. The 2D and 3D reproductions did not have a significant difference within each microphone system. Video had a significant interaction with the microphone system and listening position depending on the attribute.
Convention Paper 10309 (Purchase now)

P16-10 Sound Design and Reproduction Techniques for Co-Located Narrative VR Experiences—Marta Gospodarek, New York University - New York, NY, USA; Andrea Genovese, New York University - New York, NY, USA; Dennis Dembeck, New York University - New York, NY, USA; Flavorlab; Corinne Brenner, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA; Ken Perlin, New York University - New York, NY, USA
Immersive co-located theatre aims to bring the social aspects of traditional cinematic and theatrical experience into Virtual Reality (VR). Within these VR environments, participants can see and hear each other, while their virtual seating location corresponds to their actual position in the physical space. These elements create a realistic sense of presence and communication, which enables an audience to create a cognitive impression of a shared virtual space. This article presents a theoretical framework behind the design principles, challenges and factors involved in the sound production of co-located VR cinematic productions, followed by a case-study discussion examining the implementation of an example system for a 6-minute cinematic experience for 30 simultaneous users. A hybrid reproduction system is proposed for the delivery of an effective sound design for shared cinematic VR. Winner of the 147th AES Convention Best Peer-Reviewed Paper Award
Convention Paper 10287 (Purchase now)

P17 - Product Development

Saturday, October 19, 3:00 pm — 5:30 pm (1E09)

Chair:
Phil Brown, Dolby Laboratories - San Francisco, CA, USA

P17-1 Summed Efficiency-Method for Efficient Vented Box Speaker Design—Niels Elkjær Iversen, ICEpower a/s - Copenhagen, Denmark; Technical University of Denmark - Kgs. Lyngby, Denmark; Theis Christensen, ICEpower - Søborg, Denmark; Anders Bjørnskov, ICEpower - Søborg, Denmark; Lars Petersen, ICEpower A/S - Søborg, Denmark
Loudspeakers are inefficient, and conventional design methods do not consider efficiency in the design process. With the rise of Digital Signal Processing (DSP) the frequency response can be corrected with pre-filtering. Based on a frequency domain analysis of vented box enclosures this paper proposes a new method, the Summed efficiency-method (Sη-method), for designing vented box enclosures. The method focuses on mapping the efficiency and SPL output vs. volume and tuning frequency, enabling the designer to make knowledge-based trade-off decisions in the design process. A design example shows how the method can be used to improve the efficiency by over 70% compared to a conventional maximum flat alignment for a compact Public Address (PA) subwoofer application.
Convention Paper 10310 (Purchase now)

P17-2 Loudspeaker Port Design for Optimal Performance and Listening Experience—Andri Bezzola, Samsung Research America - Valencia, CA USA; Allan Devantier, Samsung Research America, Audio Lab - Valencia, CA, USA; Elisabeth McMullin, Samsung Research America - Valencia, CA USA
Bass reflex ports produce noise at high sound-pressure levels due to turbulence and vortex shedding. Flared ports can reduce port noise compared to straight ports, but the optimal flare rate in ports has remained an unsolved problem. This work demonstrates that there is in fact an optimal amount of flare, and it proposes a design method based on acoustic Finite Element simulations to efficiently predict the optimal flare rate for given port dimensions. Optimality of the flare rate is confirmed with noise and compression measurements as well as double-blind listening tests. At onset of unwanted port noise, optimally flared ports can be played 1 to 3 dB louder than slightly under-flared or over-flared ports, and 10 to 16 dB louder than straight ports.
Convention Paper 10311 (Purchase now)

P17-3 A Method for Three-Dimensional Horn Geometry Optimization—Christopher Smolen, QSC, LLC - Costa Mesa, CA, USA; Jerome Halley, QSC, LLC - Costa Mesa, CA, USA
A method for three-dimensional (3D) horn geometry optimization is introduced. The method combines 3D Computer Aided Design (CAD) with Finite Element Analysis (FEA), the Boundary Element Method (BEM), and scientific programming: the acoustical properties of a horn geometry parametrized in CAD are analyzed using FEA and BEM, and scientific programming is used to manipulate the parametrized geometry and optimize the horn according to specified objective functions. An example horn design using this method is presented together with measurements of the resulting geometry.
Convention Paper 10312 (Purchase now)

P17-4 A Perceptually-Motivated Headphone Transparency Algorithm—Josh Lando, Dolby Laboratories - San Francisco, CA, USA; Alex Brandmeyer, Dolby Laboratories - San Francisco, CA, USA; Phil Brown, Dolby Laboratories - San Francisco, CA, USA; Alan Seefeldt, Dolby Labs - San Francisco, CA, USA; Andy Jaspar, Dolby Laboratories - San Francisco, CA, USA
Many modern closed-back wireless headphones now support a user-selectable “hear-through” or “transparency” feature to allow the wearer to monitor their environment. These products typically work by passively mixing the signals from external microphones with the primary media being reproduced by the headphone’s internal speakers. When there is no media playing back, that approach works reasonably well. However, once media is playing, it tends to mask the passthrough of the external audio and the wearer can no longer hear the outside world. Here we describe a perceptually motivated algorithm for improving audibility of the external microphone signals without compromising the media playback experience. Subjective test results of this algorithm as implemented in a consumer headphone product are presented.
Convention Paper 10313 (Purchase now)

P17-5 Temporal Envelope-Based Psychoacoustic Modelling for Evaluating Non-Waveform Preserving Audio Codecs—Steven van de Par, University of Oldenburg - Oldenburg, Germany; Sascha Disch, Fraunhofer IIS - Erlangen, Germany; Andreas Niedermeier, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Elena Burdiel Pérez, Fraunhofer IIS - Erlangen, Germany; Bernd Edler, Friedrich Alexander University - Erlangen-Nürnberg, Germany; Fraunhofer IIS - Erlangen, Germany
Masking models that evaluate the audibility of error signals have a limited validity for assessing perceptual quality of parametric codecs. We propose a model that transforms the audio signal into an Internal Representation (IR) consisting of temporal-envelope modulation patterns. Subsequently, the IRs of the original and encoded signals are compared. Even though the audio signals compared may be uncorrelated, leading to a large error signal, they may exhibit a very similar IR and hence are predicted to sound very similar. Additional post-processing stages modeling higher-level auditory perceptual phenomena such as Comodulation Masking Release are included. Predictions are compared against subjective quality assessment results obtained with encoding methods ranging from parametric processing methods up to classic waveform preserving codecs.
Convention Paper 10314 (Purchase now)

P18 - Posters: Perception

Saturday, October 19, 3:00 pm — 4:30 pm (South Concourse A)

P18-1 Comparison of Human and Machine Recognition of Electric Guitar Types—Renato Profeta, Ilmenau University of Technology - Ilmenau, Germany; Gerald Schuller, Ilmenau University of Technology - Ilmenau, Germany; Fraunhofer Institute for Digital Media Technology (IDMT) - Ilmenau, Germany
Distinguishing between musical instruments of the same type is a challenging task not only for experienced musicians but also in music information retrieval. The goal of this paper is to understand how guitar players with different experience levels perform in distinguishing audio recordings of single guitar notes from two iconic guitar models and to use this knowledge as a baseline to evaluate the performance of machine learning algorithms performing a similar task. For this purpose we conducted a blind listening test with 236 participants in which they listened to 4 single notes from 4 different guitars and had to classify them as a Fender Stratocaster or an Epiphone Les Paul. We found that only 44% of the participants could correctly classify all 4 guitar notes. We also performed machine learning experiments using k-Nearest Neighbours (kNN) and Support Vector Machines (SVM) algorithms applied to a classification problem with 1292 notes from different Stratocaster and Les Paul guitars. The SVM algorithm had an accuracy of 93.9%, correctly predicting 139 audio samples from the 148 present in the testing set.
Convention Paper 10315 (Purchase now)
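
The reported SVM accuracy for P18-1 follows directly from the test-set counts given in the abstract. A minimal arithmetic check, using only those two numbers (the variable names are illustrative):

```python
correct = 139   # test-set samples predicted correctly, from the abstract
total = 148     # size of the testing set, from the abstract

accuracy = correct / total
misclassified = total - correct
print(f"{accuracy:.1%}", misclassified)
```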

P18-2 Preference for Harmonic Intervals Based on Overtone Content of Complex Tones—Benjamin Fox, Belmont University - Nashville, TN, USA; Wesley Bulla, Belmont University - Nashville, TN, USA
This study investigated whether or not overtone structure generated preferential differences for harmonic intervals. The purpose of this study was to determine if the structure of a complex tone affects the perception of consonance in harmonic intervals. Prior studies suggest harmonicity as the basis for so-called “consonance” while others suggest exact ratios are not necessary. This test examined listener responses across three tonal “types” through a randomized double-blind trinomial forced-choice format. Stimuli types used full, odd, and even overtone series at three relative-magnitude loudness levels. Results revealed no effect of loudness and a generalized but highly variable trend for the even overtone series. However, some subjects exhibited a very strong preference for certain overtone combinations, while others demonstrated no preference.
Convention Paper 10316 (Purchase now)

P18-3 Just Noticeable Difference for Dynamic Range Compression via “Limiting” of a Stereophonic Mix—Christopher Hickman, Belmont University - Nashville, TN, USA; Wesley Bulla, Belmont University - Nashville, TN, USA
This study focused on the ability of listeners to discern the presence of dynamic range compression (DRC) when applied to a stereo recording. Past studies have primarily focused on listener preferences for stereophonic master recordings with varying levels of DRC. A modified two-down one-up adaptive test presented subjects with an increasingly “limited” stereophonic mix to determine the 70.7% response threshold. Results of this study suggest that DRC settings considered “normal” in recorded music production may be imperceptible when playback levels are loudness-matched. Outcomes of this experiment indicate the use of so-called “limiting” for commercial purposes, such as signal chain control, may have no influence on perceived quality; whereas, uses for perceived aesthetic advantages should be reconsidered.
Convention Paper 10317 (Purchase now)
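
The 70.7% figure in P18-3 is the convergence point of a standard two-down one-up staircase. The level gets harder only after two consecutive correct responses, so the track settles where the probability of two correct responses in a row is 0.5. The sketch below computes that point for a generic n-down one-up rule. It covers the textbook rule only, since the abstract does not describe the authors' modification, and the function name is hypothetical.

```python
def n_down_one_up_target(n_down: int) -> float:
    """Proportion correct at which an n-down one-up staircase settles.

    Equilibrium requires p ** n_down == 0.5, so p = 0.5 ** (1 / n_down).
    """
    if n_down < 1:
        raise ValueError("n_down must be a positive integer")
    return 0.5 ** (1.0 / n_down)


two_down = n_down_one_up_target(2)    # about 0.707
three_down = n_down_one_up_target(3)  # about 0.794
print(f"{two_down:.1%}", f"{three_down:.1%}")
```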

P18-4 Discrimination of High-Resolution Audio without Music—Yuki Fukuda, Hiroshima City University - Hiroshima-shi, Japan; Shunsuke Ishimitsu, Hiroshima City University - Hiroshima, Japan
Nowadays, the High-Resolution (Hi-Res) audio format, which has a higher sampling frequency (Fs) and quantization bit depth than the Compact Disc (CD) format, is becoming extremely popular. Several studies have been conducted to clarify whether these two formats can be distinguished. However, most of these studies drew their conclusions using only music sources. In this paper we highlight the problems that arise from relying primarily on music sources in such experiments. We also address the question of whether the Hi-Res and CD formats can be discriminated using sources other than music, such as noise.
Convention Paper 10318 (Purchase now)

P18-5 Subjective Evaluation of Multichannel Audio and Stereo on Cell Phones—Fesal Toosy, University of Central Punjab - Lahore, Pakistan; Muhammad Sarwar Ehsan, University of Central Punjab - Lahore, Pakistan
With the increasing trend of using smart phones and other handheld electronic devices for accessing the internet, playback of audio in multichannel format is expected to eventually gain popularity on such devices. Given the limited options for audio output on handheld electronic devices, it is important to know if multichannel audio offers an improvement in audio quality over other existing formats. This paper presents a subjective assessment test of multichannel audio versus stereo played on a mobile phone using headphones. The results show that multichannel audio improves perceived audio quality compared to stereo.
Convention Paper 10319 (Purchase now)
