144th AES CONVENTION Paper Session Details

AES Milan 2018

P01 - Loudspeakers Part 1

Wednesday, May 23, 09:30 — 12:30 (Scala 4)

Chair: Angelo Farina, Università di Parma - Parma, Italy

P01-1 Maximizing Efficiency in Active Loudspeaker Systems
Wolfgang Klippel, Klippel GmbH - Dresden, Germany
Increasing the efficiency of the electro-acoustical conversion is the key to modern audio devices generating the required sound output with minimum size, weight, cost, and energy. There is unused potential for increasing the efficiency of the electro-dynamical transducer by using a nonlinear motor topology, a soft suspension, and cultivating the modal resonances in the mechanical and acoustical system. However, transducers optimized for maximum efficiency are more prone to nonlinear and unstable behavior. Nonlinear adaptive control can compensate for the undesired signal distortion, protect the transducer against overload, stabilize the voice coil position, and cope with time-varying properties of the suspension. The paper discusses the design of modern active systems that combine the new opportunities provided by software algorithms with the optimization of the hardware components in the transducer and power amplifier.
Convention Paper 9908 (Purchase now)

P01-2 How Do We Make an Electrostatic Loudspeaker with Constant Directivity?
Tim Mellow, Mellow Acoustics Ltd - Farnham, Surrey, UK
The idea of broadening the directivity pattern of a push-pull electrostatic loudspeaker by partitioning the stators into concentric annular rings, which are connected to tappings along a delay line, isn't new. However, the delay line has traditionally been attenuated to avoid response irregularities due to the finite size of the membrane. An alternative approach is presented here whereby a constant-impedance delay line is configured to imitate an oscillating sphere, which is an ideal constant-directivity dipole source that needs no attenuation. Walker's equation for the on-axis pressure does not account for the effect of the delay line without taking the vector sum of the currents through all the rings, so a simple alternative that does is presented here.
Convention Paper 9909 (Purchase now)

P01-3 Analysis of Front Loaded Low Frequency Horn Loudspeakers
Bjørn Kolbrek, Celestion International Ltd. - Ipswich, UK
The low frequency horn design procedures described by Keele and Leach are extended and generalized to cases where the horn is already specified, or where maximum output or the smoothest response is desired. The impact of finite-length horns is analyzed. A more detailed analysis of the high frequency range is given, where it is shown how the voice coil inductance can be taken into account to create a third order low pass filter of specified shape. A new analysis of reactance annulling is presented that significantly improves the performance above cutoff for a certain class of horns. Large signal behavior is touched upon, and finally, an analysis of the sensitivity of driver and system parameters is given.
Convention Paper 9910 (Purchase now)
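The Keele/Leach procedures referenced above start from the horn flare law. As a minimal illustrative aside (these are generic textbook relations, not the paper's extended procedure; the throat area, mouth area, and length below are arbitrary), the flare constant and cutoff frequency of an exponential horn can be computed as:

```python
import math

def exp_horn_flare_constant(throat_area, mouth_area, length):
    """Flare constant m of an exponential horn S(x) = S_throat * exp(m * x)."""
    return math.log(mouth_area / throat_area) / length

def exp_horn_cutoff(m, c=343.0):
    """Cutoff frequency fc = m * c / (4 * pi): below fc an infinite
    exponential horn no longer loads the driver resistively."""
    return m * c / (4.0 * math.pi)

# Example: 50 cm^2 throat, 0.5 m^2 mouth, 1.2 m long horn.
m = exp_horn_flare_constant(0.005, 0.5, 1.2)
fc = exp_horn_cutoff(m)   # roughly 105 Hz for these dimensions
```

Finite-length horns, reactance annulling, and the voice-coil-inductance low-pass analyzed in the paper all build on top of this basic flare description.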

P01-4 Design and Implementation of a High Efficiency Subwoofer
Sebastian Tengvall, Technical University of Denmark - Kgs. Lyngby, Denmark; Niels Elkjær Iversen, Technical University of Denmark - Kgs. Lyngby, Denmark; Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark
The demand for battery-driven loudspeakers is increasing, but the challenge of efficient low-frequency reproduction remains. An alternative approach to the conventional 4th-order bandpass enclosure design for a subwoofer, which achieves a high peak in the passband to increase voltage sensitivity, is investigated. The response is corrected with DSP to ensure a flat response in the passband. The results show that this approach can increase the voltage sensitivity dramatically, reaching an average sensitivity of over 100 dB in the passband from 45 Hz to 90 Hz. They also show that the design is sensitive to construction errors: precise assembly is required to achieve satisfactory results, as small errors can defeat the purpose of the design.
Convention Paper 9911 (Purchase now)

P01-5 Parameterization of Micro Speakers Used in Cell Phones and Earbuds
Jason McIntosh, SAATI SPA - Appiano Gentile (CO), Italy
A loudspeaker parameterization based on measuring a transfer matrix is presented. This approach produces nine complex frequency-dependent functions. While Thiele-Small parameters only capture the first piston mode of a moving coil speaker, the transfer matrix approach captures all the linear behavior of the speaker, including the diaphragm modes and internal geometry. Predictions of the baffled response of two speakers are presented, with the transfer matrix parameters producing better results than the Thiele-Small parameters, especially at high frequencies. The SAATI "Ares" acoustic simulator uses the transfer matrix parameters for effective simulation of complete devices, including proper porting geometry and damping acoustic meshes for better audio tuning.
Convention Paper 9912 (Purchase now)
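The way a measured transfer matrix predicts a response can be sketched generically: terminate the two-port with a load impedance and solve for the transfer ratio. The toy 2x2 example below uses made-up matrix values; the parameterization described above is a 3x3 matrix (electrical port plus front and rear acoustic ports), which is where the nine complex functions come from.

```python
def pressure_transfer(T, Z_load):
    """Transfer ratio p / V of a two-port with transmission matrix
    T = ((A, B), (C, D)), where V = A*p + B*U and I = C*p + D*U, and
    the acoustic side is terminated with p = Z_load * U."""
    (A, B), (C, D) = T
    return Z_load / (A * Z_load + B)

# Hypothetical matrix entries at a single frequency (illustrative only).
T = ((1.2 + 0.1j, 15.0 + 2.0j),
     (0.01j, 0.9 - 0.05j))
H = pressure_transfer(T, Z_load=40.0 + 5.0j)  # complex pressure per volt
```

In practice the matrix entries are measured per frequency bin, and the same termination step is repeated across the band.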

P01-6 A Fast and Efficient Model for Transmission Line Loudspeakers
James Hipperson, University of Salford - Salford, UK; Jamie Angus, University of Salford - Salford, Greater Manchester, UK; JASA Consultancy - York, UK; Jonathan Hargreaves, University of Salford - Salford, UK
Transmission line loudspeakers use a tube behind the driver that is lined, or filled, with absorber to remove the rear radiation. They also use the resonances of the pipe to support the radiation of the driver and reduce displacement at low frequencies. While lumped element models are used for modeling sealed and vented box enclosures, they cannot be used for transmission line loudspeakers, because the line's distributed geometry cannot be accurately modeled as a lumped element. Finite Element and Boundary Element models can be used, but they are complex and computationally expensive. A cascaded two-port method has been developed that can model varying tube area and absorption. It has been evaluated against acoustic measurements and shown to provide accurate predictions.
Convention Paper 9913 (Purchase now)
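The cascaded two-port idea can be sketched with the standard uniform-duct transmission matrix, chaining slices of varying area by matrix multiplication and crudely modeling the absorptive filling as a complex wavenumber. This is a generic textbook construction under those simplifications, not the authors' implementation; the segment sizes and damping value are illustrative.

```python
import cmath
import math

RHO, C = 1.21, 343.0  # air density (kg/m^3) and speed of sound (m/s)

def duct_segment(freq, length, area, damping=0.0):
    """Transmission matrix ((A, B), (C, D)) of one uniform duct slice,
    relating (p, U) at its input to (p, U) at its output. A small
    imaginary part of the wavenumber stands in for absorptive filling."""
    k = (2.0 * math.pi * freq / C) * complex(1.0, -damping)
    Zc = RHO * C / area          # characteristic acoustic impedance
    kl = k * length
    return ((cmath.cos(kl), 1j * Zc * cmath.sin(kl)),
            (1j * cmath.sin(kl) / Zc, cmath.cos(kl)))

def cascade(T1, T2):
    """Chain two two-ports: the overall matrix is the product T1 * T2."""
    (a1, b1), (c1, d1) = T1
    (a2, b2), (c2, d2) = T2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

# Taper a 2 m line from 0.05 m^2 down to 0.023 m^2 in ten lossy slices.
T = ((1 + 0j, 0j), (0j, 1 + 0j))
for i in range(10):
    T = cascade(T, duct_segment(60.0, 0.2, 0.05 - 0.003 * i, damping=0.05))
```

Each slice costs one 2x2 multiply per frequency, which is why this approach is far cheaper than FE/BE modeling.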


P02 - Audio Quality Part 1

Wednesday, May 23, 10:30 — 12:30 (Scala 2)

Chair: Thomas Sporer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany

P02-1 An Auditory Model-Inspired Objective Speech Intelligibility Estimate for Audio Systems
Jayant Datta, Audio Precision - Beaverton, OR, USA; Xinhui Zhou, Audio Precision - Beaverton, OR, USA; Joe Begin, Audio Precision - Beaverton, OR, USA; Mark Martin, Audio Precision - Beaverton, OR, USA
Compared with subjective tests, objective measures save time and money. This paper presents the implementation of a new algorithm for objective speech intelligibility, based on the modified rhyme test using real speech. An auditory-model-inspired signal processing framework gathers word selection evidence from auditory filter bank correlations and then uses an auditory attention model to perform word selection. It has been shown to outperform popular measures in terms of the Pearson correlation coefficient with human intelligibility scores. A real-time version of this approach has been integrated into a versatile audio test and measurement system supporting a number of interfaces (different combinations of devices/channels/systems). Examples and measurement results will be presented to show the advantages of this approach.
Convention Paper 9918 (Purchase now)

P02-2 A Statistical Model that Predicts Listeners’ Preference Ratings of Around-Ear and On-Ear Headphones
Sean Olive, Harman International - Northridge, CA, USA; Todd Welti, Harman International Inc. - Northridge, CA, USA; Omid Khonsaripour, Harman International - Northridge, CA, USA
A controlled listening test was conducted on 31 different models of around-ear (AE) and on-ear (OE) headphones to determine listeners’ sound quality preferences. One hundred thirty listeners, both trained and untrained, rated the headphones based on preference using a virtual headphone method, in which a single replicator headphone was equalized to match the magnitude and minimum phase responses of the different headphones. Listeners rated seven different headphones in each trial, which included high (the new Harman AE-OE target curve) and low anchors. On average, both trained and untrained listeners preferred the high anchor to the 31 other choices. Using machine learning, a model was developed that predicts listeners’ preference ratings of the headphones based on deviation in magnitude response from the Harman target curve. [Paper will be presented by Todd Welti.]
Convention Paper 9919 (Purchase now)

P02-3 Comparing the Effect of HRTF Processing Techniques on Perceptual Quality Ratings
Areti Andreopoulou, Laboratory of Music Acoustics and Technology (LabMAT), National and Kapodistrian University of Athens - Athens, Greece; Brian F. G. Katz, Sorbonne Université, CNRS, Institut Jean Le Rond d'Alembert - Paris, France
The use of Head-Related Transfer Functions for binaural rendering of spatial audio is quickly emerging in today’s audio market. Benefits of individual HRTFs, or personalized HRTF selection, have been demonstrated in numerous previous studies. A number of recent works have examined assisted or automated selection of HRTFs for optimized personalization. Such techniques attempt to rank HRTFs according to expected spatial quality for a given user based on signal, morphological, and/or perceptual studies. In parallel, there exist several HRTF processing methods that are often used to compact and/or smooth HRTFs in order to facilitate real-time treatments. Nevertheless, the potential impact of such processes on HRTF spatial quality is not always considered. This study examines the effects of three commonly used HRTF processing techniques (spectral smoothing in constant absolute bandwidths, minimum-phase decomposition, and infinite impulse response modeling) on perceptual quality ratings of selected HRTFs. Results showed that the frequency and phase-spectra variations introduced in the data by the three processing methods can lead to significant changes in HRTF evaluations. In addition, they highlight the challenging nature of non-individualized HRTF rating tasks and establish the need for systematic participant screening and sufficient task repetitions in perceptual HRTF evaluation studies.
Convention Paper 9920 (Purchase now)
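One of the three techniques above, minimum-phase decomposition, can be sketched with the standard real-cepstrum folding construction (a generic textbook method, not the authors' code). A naive O(N^2) DFT keeps the sketch dependency-free; `magnitude` is the full length-N magnitude response, assumed even-symmetric, strictly positive, with N even.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Naive inverse DFT."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def minimum_phase(magnitude):
    """Return the minimum-phase spectrum whose magnitude matches the
    given response: take the real cepstrum of the log-magnitude, fold
    it onto the causal side, and exponentiate the resulting log-spectrum."""
    N = len(magnitude)
    cep = idft([complex(math.log(m), 0.0) for m in magnitude])
    folded = ([cep[0]] + [2 * c for c in cep[1:N // 2]] + [cep[N // 2]]
              + [0j] * (N - N // 2 - 1))
    return [cmath.exp(v) for v in dft(folded)]
```

The construction preserves the magnitude exactly while replacing the phase, which is precisely the perturbation whose perceptual cost the study measures.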

P02-4 The Effect of Visual Cues and Binaural Rendering Method on Plausibility in Virtual Environments
Will Bailey, University of Salford - Salford, UK; Bruno Fazenda, University of Salford - Salford, Greater Manchester, UK
Immersive virtual reality is by its nature a multimodal medium, and the use of spatial audio renderers for VR development is widespread. The aim of this study was to assess the performance of two common rendering methods and the effect of the presence of visual cues on the plausibility of the rendering. While it was found that the plausibility of the rendered audio was low, the results suggest that the use of measured responses performed comparatively better. In addition, the absence of virtual sources reduced the number of simulated stimuli identified as real sources, and the complete absence of visual stimuli increased the rate of simulated audio identified as emitted from the loudspeakers.
Convention Paper 9921 (Purchase now)


P03 - Audio Quality Part 2

Wednesday, May 23, 14:00 — 15:30 (Scala 2)

Chair: Todd Welti, Harman International Inc. - Northridge, CA, USA

P03-1 Stereo Image Localization Maps for Loudspeaker Reproduction in Rooms
Gavriil Kamaris, University of Patras - Rion Campus, Greece; John Mourjopoulos, University of Patras - Patras, Greece
A novel approach is proposed for maps illustrating the accuracy of image source positions reproduced by stereo systems in various listening room / loudspeaker scenarios. Based on previous work by the authors, the maps unify results derived from an image localization classifier and the sweet spot area metric, estimating the direction of arrival (DOA) angles of all potential image source positions reproduced by the system via a perceptual model. The statistical analysis of these maps indicates the skewness and kurtosis of the DOA classification error and hence the consistency and robustness of the image definition across the plane of listener positions. Results utilize such parameter mappings to objectively illustrate the robustness of this qualitative aspect of audio reproduction in rooms.
Convention Paper 9922 (Purchase now)

P03-2 Method for Quantitative Evaluation of Auditory Perception of Nonlinear Distortion
Mikhail Pahomov, SPb Audio R&D Lab - St. Petersburg, Russia; Victor Rozhnov, SPb Audio R&D Lab - St. Petersburg, Russia
All loudspeakers, amplifiers, and transmission paths introduce various types of distortion into audio signals. While the energy contribution of nonlinear distortion to the distortion signal is comparatively small, it has a significant impact on perceived sound quality. Distortion signals with similar energy characteristics can affect the subjective quality evaluation to different extents. It is therefore important to accurately extract the nonlinear distortion signal from a musical signal, even in the simultaneous presence of significant distortions of other types, so that nonlinear distortion can be evaluated objectively in a way that is relevant to auditory perception. The paper offers a method for quantitative evaluation of a nonlinear distortion signal in terms of its audibility and a method for prediction of subjective ratings of signals with nonlinear distortion.
Convention Paper 9923 (Purchase now)

P03-3 On the Effect of Inter-Channel Level Difference Distortions on the Perceived Subjective Quality of Stereo Signals
Armin Taghipour, Empa, Laboratory for Acoustics/Noise Control - Dübendorf, Switzerland; Fraunhofer IIS - Erlangen, Germany; Nadja Schinkel-Bielefeld, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Sivantos GmbH - Erlangen, Germany; Ankit Kumar, International Audio Laboratories Erlangen, a joint institution of Universität Erlangen-Nürnberg and Fraunhofer IIS - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany; Pablo Delgado, International Audio Laboratories Erlangen, a joint institution of Universität Erlangen-Nürnberg and Fraunhofer IIS - Erlangen, Germany; Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jürgen Herre, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany
Perceptual audio coding at low bitrates and stereo enhancement algorithms can affect perceived quality of stereo audio signals. Besides changes in timbre, also the spatial sound image can be altered, resulting in quality degradations compared to an original reference. While effects of timbre degradation on quality are well-understood, effects of spatial distortions are not sufficiently known. This paper presents a study designed to quantify the effect of Inter-Channel Level Difference (ICLD) errors on perceived audio quality. Results show systematic effects of ICLD errors on quality: bigger ICLD errors led to greater quality degradation. Spectral portions containing relatively higher energy were affected more strongly.
Convention Paper 9924 (Purchase now)
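The ICLD itself is a simple per-block quantity; a minimal sketch follows (the study's band splitting, error injection, and statistics are omitted):

```python
import math

def icld_db(left, right, eps=1e-12):
    """Inter-Channel Level Difference in dB between two equal-length
    blocks of samples; positive when the left channel is stronger."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))

# A left channel at twice the amplitude of the right: about +6 dB ICLD.
delta = icld_db([1.0, -1.0, 1.0, -1.0], [0.5, -0.5, 0.5, -0.5])
```

An ICLD *error* in the sense of the study would then be the difference between this value for a processed signal and for the reference, evaluated per spectral band.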


P04 - Loudspeakers Part 2

Wednesday, May 23, 14:00 — 16:30 (Scala 4)

Chair: Jamie Angus, University of Salford - Salford, Greater Manchester, UK; JASA Consultancy - York, UK

P04-1 Live Sound Subwoofer System Performance Quantification
Adam J. Hill, University of Derby - Derby, Derbyshire, UK
The general aim of live sound reinforcement is to deliver an appropriate and consistent listening experience across an audience. Achieving this in the subwoofer range (typically 20-100 Hz) has been the focus of previous work, where techniques have been developed to allow for consistent sound energy distribution over a wide area. While this provides system designers with a powerful set of tools, it brings with it many potential metrics to quantify performance. This research identifies key indicators of subwoofer system performance and proposes a single weighted metric to quantify overall performance. Both centrally-distributed and left/right configurations are analyzed using the new metric to highlight functionality.
Convention Paper 9925 (Purchase now)
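The paper's actual weighted metric is not reproduced here; as a purely illustrative stand-in of the same flavor (the weights are made up), a single figure can reward mean level while penalizing position-to-position spread:

```python
def spatial_consistency_metric(spl_map, w_level=0.5, w_spread=0.5):
    """Toy single-figure score for a list of SPL values sampled across
    the audience area: a higher mean level raises it, while spatial
    standard deviation lowers it. Weights are illustrative only."""
    n = len(spl_map)
    mean = sum(spl_map) / n
    std = (sum((s - mean) ** 2 for s in spl_map) / n) ** 0.5
    return w_level * mean - w_spread * std
```

A perfectly uniform SPL map scores highest for a given mean level, which matches the stated goal of consistent coverage.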

P04-2 Don't Throw the Loudspeaker Out with the Bathwater! Two Case Studies Regarding End-of-Line Tests in the Automotive Loudspeaker Industry
Enrico Esposito, Ask Industries S.p.A. - Monte San Vito, Italy; Angelo Farina, Università di Parma - Parma, Italy; Pietro Massini, Ask Industries S.p.A. - Monte San Vito (AN), Italy
Mass production of loudspeaker drivers for the automotive market is subject to the strong requirements dictated by the sector's Quality System and is heavily conditioned by the low profit margin of what is seen as (and actually is) a commodity, like many other components of a vehicle. But, differently from other components, a loudspeaker is a complex system made of parts whose performance depends on many factors, including ambient conditions. For these reasons it is quite difficult to impose tight tolerances on loudspeakers, and a fair agreement must be found between suppliers and customers to avoid scrapping samples that are fine in every respect, especially considering that the final judgment rests mainly, although not exclusively, with the ears of the end user. In this work two case studies are presented to show how tolerances can be set reasonably.
Convention Paper 9926 (Purchase now)

P04-3 Fast and Sensitive End-of-Line Testing
Stefan Irrgang, Klippel GmbH - Dresden, Germany; Wolfgang Klippel, Klippel GmbH - Dresden, Germany
Measurement time is a crucial factor for the total cost and feasibility of end-of-line quality control. This paper discusses new strategies minimizing the test time for transducers and audio systems while ensuring high sensitivity of defect detection, extracting comprehensive diagnostics information and using available resources in the best possible way. Modern production lines are fully automated and benefit highly from high speed testing. Optimal test stimuli and sophisticated processing in combination with multichannel test design are the key factors for smart testing. Appropriate acoustical, mechanical, and electrical sensors are discussed and suggested. Furthermore, parallel or alternating test schemes reduce the overall test time. Finally, typical concerns and pitfalls when testing at high speed are addressed and illustrated by measurement results.
Convention Paper 9927 (Purchase now)

P04-4 Optimal Material Parameter Estimation by Fitting Finite Element Simulations to Loudspeaker Measurements
William Cardenas, Klippel GmbH - Dresden, Germany; Wolfgang Klippel, Klippel GmbH - Dresden, Germany
Important characteristics for the sound quality of loudspeakers like frequency response and directivity are determined by the size, geometry, and material parameters of the components interfacing the acoustic field. The higher-order modes after cone break-up play an important role in wideband transducers and require a careful design of the cone, surround, and other soft parts to achieve the desired performance. Finite Element Analysis is a powerful simulation tool but requires accurate material parameters (complex Young's modulus as a function of frequency) to provide meaningful results. This paper addresses this problem and provides optimal material parameters by fitting the FEA model to an existing loudspeaker prototype measured by laser vibrometry. This method validates the accuracy of the FEA simulation and gives further information to improve the modeling.
Convention Paper 9928 (Purchase now)

P04-5 Vision Methods Applied to Measure the Displacement of Loudspeaker Membranes
Thomas Durand-Texte, Le Mans Université - Le Mans, France; Manuel Melon, Le Mans Université - Le Mans cedex 9, France; Elisabeth Simonetto, Laboratoire Géomatique et Foncier - Le Mans, France; Stéphane Durand, Laboratoire Géomatique et Foncier - Le Mans, France; Marie-Hélène Moulet, Centre de Transfert de Technologie du Mans - Le Mans, France
Interest in techniques for measuring the vibration of a loudspeaker membrane has increased over the past decades. In this paper, vision methods are adapted to measure the displacement of the cones of low-frequency drivers. The movement of the membrane is recorded with industrial or consumer-market high-speed cameras with frame rates greater than or equal to 240 fps. The measured displacement shows acceptable to very good agreement with that measured by a laser vibrometer, depending on the camera model. The displacement of the membrane, coupled with the measured electrical impedance, can be used to retrieve the small signal parameters or some non-linear parameters of the speaker. Finally, the vision methods are used to retrieve the vibration patterns of the membrane.
Convention Paper 9929 (Purchase now)


P05 - Posters: Applications

Wednesday, May 23, 14:15 — 15:45 (Arena 2)

P05-1 Grid-Based Stage Paradigm with Equalization Extension for “Flat-Mix” Production
Jonathan Wakefield, University of Huddersfield - Huddersfield, UK; Christopher Dewey, University of Huddersfield - Huddersfield, UK; William Gale, University of Huddersfield - Huddersfield, UK
In the Stage Paradigm (SP) the visual position of each channel widget represents the channel’s level and pan position. The SP has received favorable evaluation but does not scale well to high track counts because channels with similar pan positions and level visually overlap/occlude. This paper considers a modified SP for creating a “flat-mix” that provides coarse control of channel level and pan position using a grid-based, rather than continuous, stage and extends the concept to EQ visualization. Its motivation was to convert the “overlap” deficiency of the SP into an advantage. All subjects were faster at creating audibly comparable flat-mixes with the novel SP. Subject-selected satisfaction keywords were also very positive.
Convention Paper 9930 (Purchase now)

P05-2 Real Time Implementation of an Active Noise Control for Snoring Reduction
Stefania Cecchi, Università Politecnica delle Marche - Ancona, Italy; Alessandro Terenzi, Università Politecnica delle Marche - Ancona, Italy; Paolo Peretti, Leaff Engineering - Osimo, Italy; Ferruccio Bettarelli, Leaff Engineering - Osimo, Italy
Snoring is a well-known problem in our society. Active noise control systems can be applied to partially solve this annoyance. In this context, the presented work proposes a real-time implementation of an active noise control algorithm for snoring reduction by means of a DSP embedded platform and an innovative headboard equipped with microphones and loudspeakers. Several experimental results with different snoring signals have shown the potential of the proposed approach in terms of computational complexity and noise reduction.
Convention Paper 9931 (Purchase now)

P05-3 Identification of Nonlinear Audio Devices Exploiting Multiple-Variance Method and Perfect Sequences
Simone Orcioni, Università Politecnica delle Marche - Ancona, Italy; Alberto Carini, University of Urbino Carlo Bo - Urbino, Italy; Stefania Cecchi, Università Politecnica delle Marche - Ancona, Italy; Alessandro Terenzi, Università Politecnica delle Marche - Ancona, Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy
Multiple-variance identification methods are based on the use of input signals with different powers for nonlinear system identification. They overcome the locality of the solution of traditional identification methods, which approximate the system well only for inputs with approximately the same power as the identification signal. In this context, it is possible to further improve the nonlinear filter estimation by exploiting as input signals perfect periodic sequences, which guarantee the orthogonality of the Wiener basis functions used for identification. Experimental results involving real measurements show that the proposed approach can accurately model nonlinear devices over a wide range of input variances. This property is particularly useful when modeling systems with highly dynamic inputs, like audio amplifiers.
Convention Paper 9932 (Purchase now)

P05-4 Power Saving Audio Playback Algorithm Based on Auditory Characteristics
Mitsunori Mizumachi, Kyushu Institute of Technology - Kitakyushu, Fukuoka, Japan; Wataru Kubota, Kyushu Institute of Technology - Fukuoka, Japan; Mitsuhiro Nakagawara, Panasonic - Yokohama City, Kanagawa, Japan
Music can be enjoyed in a variety of ways: on smartphones, portable music players, car audio, and high-end audio systems. Power consumption is one of the important issues for portable electronic devices and electric vehicles. In this paper a power-saving audio playback algorithm is proposed that restrains perceptual distortion. An original music source is passed through filterbanks and is reconstructed after increasing or decreasing the narrow-band component in each channel. The channel-dependent manipulation is carefully done so as not to cause perceptual distortion. The feasibility of the proposed method is evaluated both by measuring current consumption during music playback and by carrying out a listening test.
Convention Paper 9933 (Purchase now)

P05-5 Deep Neural Networks for Road Surface Roughness Classification from Acoustic Signals
Livio Ambrosini, Università Politecnica delle Marche - Ancona, Italy; ASK Industries S.p.A. - Montecavolo di Quattro Castella (RE), Italy; Leonardo Gabrielli, Università Politecnica delle Marche - Ancona, Italy; Fabio Vesperini, Università Politecnica delle Marche - Ancona, Italy; Stefano Squartini, Università Politecnica delle Marche - Ancona, Italy; Luca Cattani, ASK Industries S.p.A. - Montecavolo di Quattro Castella (RE), Italy
Vehicle noise emissions are highly dependent on the road surface roughness and materials. A classification of the road surface conditions may be useful in several regards, from driving assistance to in-car audio equalization. With the present work we exploit deep neural networks for the classification of the road surface roughness using microphones placed inside and outside the vehicle. A database is built to test our classification algorithms and results are reported, showing that the roughness classification is feasible with the proposed approach.
Convention Paper 9934 (Purchase now)

P05-6 Elicitation and Quantitative Analysis of User Requirements for Audio Mixing Interface
Christopher Dewey, University of Huddersfield - Huddersfield, UK; Jonathan Wakefield, University of Huddersfield - Huddersfield, UK
Existing Audio Mixing Interfaces (AMIs) have focused primarily on track level and pan and related visualizations. This paper places the user at the start of the AMI design process by reconsidering which aspects of an AMI’s visual feedback are most important from a user’s perspective and which parameters are most frequently used. An experiment was conducted with a novel AMI which in one mode provides the user with no visual feedback. This enabled the qualitative elicitation of the most desired visual feedback from test subjects. Additionally, logging user interactions enabled the quantitative analysis of time spent on different mix parameters. Results with music technology undergraduate students suggest that AMIs should concentrate on compression and EQ visualization.
Convention Paper 9935 (Purchase now)

P05-7 Real-Time Underwater Spatial Audio: A Feasibility Study
Symeon Delikaris-Manias, Aalto University - Helsinki, Finland; Leo McCormack, Aalto University - Espoo, Finland; Ilkka Huhtakallio, Aalto University - Espoo, Finland; Ville Pulkki, Aalto University - Espoo, Finland
In recent years, spatial audio utilizing compact microphone arrays has seen many advancements due to emerging virtual reality hardware and computational advances. These advances can be observed in three main areas of spatial audio, namely: spatial filtering, direction of arrival estimation, and sound reproduction over loudspeakers or headphones. The advantage of compact microphone arrays is their portability, which permits their use in everyday consumer applications. However, an area that has received minimal attention is the field of underwater spatial audio, using compact hydrophone arrays. Although the principles are largely the same, microphone array technologies have rarely been applied to underwater acoustic arrays. In this feasibility study we present a purpose-built compact hydrophone array, which can be transported by a single diver. This study demonstrates a real-time underwater acoustic camera for underwater sound-field visualization and a parametric binaural rendering engine for auralization.
Convention Paper 9936 (Purchase now)

P05-8 Dual-Band PWM for Filterless Class-D Audio Amplification
Konstantinos Kaleris, University of Patras - Patras, Greece; Charalampos Papadakos, University of Patras - Rio, Greece; John Mourjopoulos, University of Patras - Patras, Greece
The benefits of Dual-Band Pulse Width Modulation (DBPWM) are demonstrated for filterless audio class-D amplifier implementations. DBPWM is evaluated in terms of energy efficiency (thermal loading) and reproduction fidelity for direct driving of loudspeaker units by DBPWM signals. Detailed physical models of low-frequency (woofer) and high-frequency (tweeter) loudspeakers are employed for simulation of the coupling between the DBPWM signal and the electro-mechanical and acoustic properties of loudspeaker systems in the broadband PWM spectral range. Derived frequency responses are used to estimate the reproduced sound signal of the loudspeaker. The equivalent impedance of the speaker's voice coil is used to estimate thermal loading by the DBPWM signal's out-of-band spectral energy, comparing standard filtered PWM implementations to the proposed method.
Convention Paper 9937 (Purchase now)
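For orientation, the single-band natural-sampling PWM that DBPWM extends can be sketched as a triangle-carrier comparator (a generic class-D textbook scheme, not the authors' dual-band modulator; the sample rate and carrier frequency are arbitrary):

```python
def triangle(t, f):
    """Triangle carrier in [-1, 1] at frequency f (Hz), starting at -1."""
    x = (t * f) % 1.0
    return 4.0 * x - 1.0 if x < 0.5 else 3.0 - 4.0 * x

def pwm(signal, fs, f_carrier):
    """Two-level PWM: +1 while the audio sample exceeds the carrier,
    else -1. The audio is recovered as the local average (duty cycle)."""
    return [1.0 if s > triangle(n / fs, f_carrier) else -1.0
            for n, s in enumerate(signal)]

# The mean of the PWM of a DC level approaches that level.
out = pwm([0.3] * 48000, fs=48000, f_carrier=2930.0)
mean = sum(out) / len(out)   # close to 0.3
```

In a filterless design, the loudspeaker's own electro-mechanical response performs the averaging that a passive output filter would otherwise provide, which is why the out-of-band PWM energy matters thermally.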

P05-9 Low Cost Algorithm for Online Segmentation of Electronic Dance Music
Emanuel Aguilera, Universitat Politecnica de Valencia - Valencia, Spain; Jose J. Lopez, Universitat Politecnica de Valencia - Valencia, Spain; Pablo Gutierrez-Parera, Universitat Politecnica de Valencia - Valencia, Spain; Carlos Hernandez, Universitat Politecnica de Valencia - Valencia, Spain
Visual content animation for Electronic Dance Music (EDM) events is an emerging, in-demand, but costly service for the industry. In this paper an algorithm for automatic EDM structure segmentation is presented, suitable for automating video and 3D animation synchronized to the audio content. The algorithm computes low-cost time/frequency features, smoothed with a novel algorithm, resulting in a low-latency output. The segmentation stage is based on a multiple-threshold algorithm specifically tuned to EDM. It has been implemented in C, performs in real time, and has been successfully integrated into a real-time commercial 3D graphics engine, with great results on EDM music sets.
Convention Paper 10027 (Purchase now)
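A bare-bones stand-in for the thresholding idea might look as follows (a single energy feature and one ratio threshold, unlike the paper's tuned multiple-threshold scheme; the window length and threshold are arbitrary):

```python
def short_time_energy(x, win):
    """Mean-square energy of consecutive non-overlapping frames."""
    return [sum(v * v for v in x[i:i + win]) / win
            for i in range(0, len(x) - win + 1, win)]

def segment_boundaries(energy, ratio=2.0, eps=1e-9):
    """Return frame indices where energy jumps or drops by more than
    the given ratio relative to the previous frame."""
    bounds = []
    for i in range(1, len(energy)):
        r = (energy[i] + eps) / (energy[i - 1] + eps)
        if r > ratio or r < 1.0 / ratio:
            bounds.append(i)
    return bounds

# A quiet intro followed by a loud drop yields one boundary.
frames = short_time_energy([0.1] * 64 + [1.0] * 64, win=16)
cuts = segment_boundaries(frames)   # -> [4]
```

Because each frame is processed as it arrives, a scheme of this shape can run online with latency on the order of one frame, which is the property the paper exploits for live visuals.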


P06 - Posters: Transducers

Wednesday, May 23, 16:00 — 17:30 (Arena 2)

P06-1 Acoustic Power Based Evaluation of Loudspeaker Distortion Measurements
Charalampos Ferekidis, Ingenieurbuero Ferekidis - Lemgo, Germany
Traditionally, drive-unit related distortion figures are evaluated “on-axis.” This method leads to straightforward results as long as the frequency of the analyzed distortion component stays below the driver’s modal break-up region. When the distortion component enters the drive-unit’s break-up region, an on-axis based distortion analysis approach can become difficult if not meaningless. Therefore an alternative method is proposed which makes use of acoustic harmonic distortion sound power values, instead of sound pressure levels acquired at a single observation point (on-axis). These acoustic power values are derived from high resolution directivity data for the fundamental, 2nd-, and 3rd-harmonic distortion components. Results are shown for different kinds of drive-units and the implications are discussed.
Convention Paper 9914 (Purchase now)

P06-2 Anti-Rattle System Loudspeaker DeviceDario Cinanni, ASK Industries Spa - Monte San Vito, AN, Italy; Carlo Sancisi, ASK Industries Spa - Monte San Vito, Italy
Motivated by vibration problems in loudspeaker cabinets and panels, this paper presents a new dynamic loudspeaker device capable of reducing the mechanical vibrations transmitted to the panel on which it is mounted. A virtual 3D prototype was designed and optimized through simulations carried out using analytical and finite element methods. A working prototype was then built, measured, and tested on a panel in order to evaluate the vibration reduction.
Convention Paper 9915 (Purchase now)

P06-3 FEM Thermal Model of a Compression Driver: Comparison with Experimental ResultsMarco Baratelli, Faital S.p.A. - Milan, Italy; Grazia Spatafora, Faital S.p.A. - Milan, Italy; Emiliano Capucci, Faital S.p.A. - Milan, Italy; Romolo Toppi, Faital S.p.A. - Milan, Italy
A complete time-domain thermal model of a compression driver was developed using COMSOL Multiphysics in order to predict heating phenomena and minimize potential damage. Heat transfer in the model relies on conduction, natural convection, and radiation together, ensuring a rigorous approach. Considerations accounting for power compression are also included to provide detail in the temperature prediction over time. The results are satisfactory and demonstrate an accurate method for predicting the operating limits of such devices, together with the change in magnetic induction in the air gap due to thermal effects.
Convention Paper 9916 (Purchase now)

P06-4 Dynamic Driver Linearization Using Current FeedbackJuha Backman, Genelec Oy - Iisalmi, Finland; Noiseless Acoustics - Helsinki, Finland
It is well known that the electromechanical parameters of a dynamic driver can be modified through current feedback to suit the requirements of the available enclosure size and desired bandwidth, with a reduced distortion as a commonly observed beneficial effect. However, especially when designing sealed enclosures it is possible to select the loudspeaker parameters from a continuum of combinations of effective moving mass, damping resistance, and compliance. This paper describes optimization of electrical source parameters to improve the linearity based on numerical solution of the nonlinear equation of motion of the driver using the measured driver parameters.
Convention Paper 9917 (Purchase now)


P07 - Spatial Audio-Part 1

Thursday, May 24, 09:00 — 12:30 (Scala 4)

Sascha Spors, University of Rostock - Rostock, Germany

P07-1 Continuous Measurement of Spatial Room Impulse Responses on a Sphere at Discrete ElevationsNara Hahn, University of Rostock - Rostock, Germany; Sascha Spors, University of Rostock - Rostock, Germany
In order to analyze a sound field with a high spatial resolution, a large number of measurements are required. A recently proposed continuous measurement technique is suited for this purpose, in which the impulse response measurement is performed using a moving microphone. In this paper it is applied to the measurement of spatial room impulse responses on a spherical surface. The microphone captures the sound field on the sphere at discrete elevations while the system is periodically excited by a so-called perfect sequence. The captured signal is treated as a spatio-temporal sampling of the sound field, and the impulse responses are obtained by spatial interpolation in the spherical harmonics domain. The elevation angles and the speed of the microphone are chosen in such a way that the spatial sampling points constitute a Gaussian sampling grid.
Convention Paper 9938 (Purchase now)

P07-2 Real-Time Conversion of Sensor Array Signals into Spherical Harmonic Signals with Applications to Spatially Localized Sub-Band Sound-Field AnalysisLeo McCormack, Aalto University - Espoo, Finland; Symeon Delikaris-Manias, Aalto University - Helsinki, Finland; Angelo Farina, Università di Parma - Parma, Italy; Daniel Pinardi, Università di Parma - Parma, Italy; Ville Pulkki, Aalto University - Espoo, Finland
This paper presents two real-time audio plug-ins for processing sensor array signals for sound-field visualization. The first plug-in utilizes spherical or cylindrical sensor array specifications to provide analytical spatial filters which encode the array signals into spherical harmonic signals. The second plug-in utilizes these intermediate signals to estimate the direction-of-arrival of sound sources, based on a spatially localized pressure-intensity (SLPI) approach. The challenge with the traditional pressure-intensity (PI) sound-field analysis is that it performs poorly when presented with multiple sound sources with similar spectral content. Test results indicate that the proposed SLPI approach is capable of identifying sound source directions with reduced error in various environments, when compared to the PI method.
Convention Paper 9939 (Purchase now)

P07-3 Parametric Multidirectional Decomposition of Microphone Recordings for Broadband High-Order Ambisonic EncodingArchontis Politis, Aalto University - Espoo, Finland; Sakari Tervo, Aalto University - Espoo, Finland; Tapio Lokki, Aalto University - Aalto, Finland; Ville Pulkki, Aalto University - Espoo, Finland
Higher-order Ambisonics (HOA) is a flexible recording and reproduction method, which makes it attractive for several applications in virtual and augmented reality. However, the recording of HOA signals with practical compact microphone arrays is limited to a certain frequency range, which depends on the applied microphone array. In this paper we present a parametric signal-dependent approach that improves the HOA signals at all frequencies. The presented method assumes that the sound field consists of multiple directional components and a diffuse part. The performance of the method is evaluated in simulations with a rigid microphone array in different direct-to-diffuse and signal-to-noise ratio conditions. The results show that the proposed method has a better performance than the traditional signal-dependent encoding in all the simulated conditions.
Convention Paper 9940 (Purchase now)

P07-4 Adaptive Non-Coincidence Phase Correction for A to B-Format ConversionAlexis Favrot, Illusonic GmbH - Uster, Switzerland; Christof Faller, Illusonic GmbH - Uster, Zürich, Switzerland; EPFL - Lausanne, Switzerland
B-format is usually obtained from A-format signals, i.e., from four directive microphone capsules pointing in different directions. Ideally these capsules would be coincident, but due to design constraints small distances always remain between them. The resulting phase mismatches between the capsule signals lead to inaccuracies and interference, impairing the B-format directional responses, especially at high frequencies. A non-coincidence correction is proposed, based on adaptive phase matching of the four A-format microphone signals before conversion to B-format; it improves the directional responses at high frequencies, enabling greater focus and a better spatial image and timbre in the B-format signals.
Convention Paper 9941 (Purchase now)

P07-5 Advanced B-Format AnalysisMihailo Kolundzija, Ecole Polytechnique Féderale de Lausanne (EPFL) - Lausanne, Switzerland; Christof Faller, Illusonic GmbH - Uster, Zürich, Switzerland; EPFL - Lausanne, Switzerland
Spatial sound rendering methods that use B-format have moved from static to signal-dependent, making B-format signal analysis a crucial part of B-format decoders. In the established B-format signal analysis methods, the acquired sound field is commonly modeled in terms of a single plane wave and diffuse sound, or in terms of two plane waves. We present a B-format analysis method that models the sound field with two direct sounds and diffuse sound, and computes the three components' powers and direct sound directions as a function of time and frequency. We show the effectiveness of the proposed method with experiments using artificial and realistic signals.
Convention Paper 9942 (Purchase now)

P07-6 Ambisonic Decoding with Panning-Invariant Loudness on Small Layouts (AllRAD2)Franz Zotter, IEM, University of Music and Performing Arts - Graz, Austria; Matthias Frank, University of Music and Performing Arts Graz - Graz, Austria
On ITU-R BS.2051 surround-with-height loudspeaker layouts, Ambisonic panning is practical when using AllRAD decoders, which involve imaginary-loudspeaker insertion and downmix. Yet on the 4+5+0 layout this still yields a loudness difference of nearly 3 dB between sounds panned to the front and sounds panned to the back. AllRAD linearly superimposes two panning functions, optimally sampled Ambisonics and VBAP. Both are perfectly energy-preserving and therefore do not cause the loudness differences themselves, but their linear superposition does. In this contribution we present and analyze a new AllRAD2 approach that achieves constant-loudness decoding by (i) superimposing the squares of both panning functions and (ii) calculating the equivalent linear decoder as the square root thereof.
Convention Paper 9943 (Purchase now)
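The squared-superposition idea in steps (i) and (ii) can be illustrated numerically. Assuming two toy energy-preserving gain vectors standing in for the sampled-Ambisonics and VBAP panning functions (the values are invented, and the real AllRAD2 derivation of the equivalent linear decoder is more involved than this), a sketch:

```python
import numpy as np

def energy(g):
    """Radiated energy of a gain vector (sum of squared gains)."""
    return float(np.sum(g ** 2))

# Toy energy-preserving panning gains for one source direction
gA = np.array([0.6, 0.8, 0.0])  # stand-in for sampled-Ambisonics gains
gV = np.array([0.0, 1.0, 0.0])  # stand-in for VBAP gains (one speaker active)

w = 0.5
# Linear superposition (AllRAD-style): energy is NOT preserved in general
g_lin = w * gA + (1 - w) * gV
# Squared superposition with square root (AllRAD2 idea): energy IS preserved
g_sq = np.sqrt(w * gA ** 2 + (1 - w) * gV ** 2)

print(energy(gA), energy(gV))  # both 1.0
print(energy(g_lin))           # 0.90 here: a panning-dependent loudness error
print(energy(g_sq))            # 1.0 for every direction and weight
```

The toy numbers reproduce the qualitative effect the abstract describes: the linear mix loses energy where the two panning functions disagree, while the squared mix does not.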

P07-7 BRIR Synthesis Using First-Order Microphone ArraysMarkus Zaunschirm, University of Music and Performing Arts - Graz, Austria; IEM; Matthias Frank, University of Music and Performing Arts Graz - Graz, Austria; Franz Zotter, IEM, University of Music and Performing Arts - Graz, Austria
Both the quality and immersion of binaural auralization benefit from head movements and individual measurements. However, measurements of binaural room impulse responses (BRIRs) for various head rotations are both time consuming and costly. Hence, for efficient BRIR synthesis, a separate measurement of the listener-dependent part (head-related impulse responses, HRIRs) and the room-dependent part (RIR) is desirable. The room-dependent part can be measured with compact first-order microphone arrays; however, the inherent spatial resolution is often not satisfactory. Our contribution presents an approach to enhance the spatial resolution using the spatial decomposition method in order to synthesize high-resolution BRIRs that facilitate easy application of arbitrary HRIRs and incorporation of head movements. Finally, the synthesized BRIRs are compared to measured BRIRs.
Convention Paper 9944 (Purchase now)


P08 - Audio Education

Thursday, May 24, 09:00 — 11:00 (Scala 2)

Jan Berg, Luleå University of Technology - Piteå, Sweden

P08-1 Does Spectral Flatness Affect the Difficulty of the Peak Frequency Identification Task in Technical Ear Training?Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan; Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan
Technical ear training is a method for improving the ability to focus on a specific sound attribute and to communicate using the common language and units of the industry. In designing a successful course at a sound engineering institution, it is essential to increase the task difficulty gradually. The authors previously investigated the correlation between students’ subjective ratings of task difficulty and physical measures calculated from the sound materials used in the training; however, an objective measure of difficulty is still not known. The authors created training materials with different spectral envelopes but the same musical content and tested them in ear training sessions.
Convention Paper 9945 (Purchase now)

P08-2 A Case Study of Cultural Influences on Mixing PracticesAmandine Pras, University of Lethbridge - Lethbridge, Alberta, Canada; Brecht De Man, Birmingham City University - Birmingham, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
While sound mixers of popular music may share common principles across cultures, different engineers produce different mixes, and different listeners judge a mix differently. We designed a mixed-methods approach to examine this highly multidimensional problem in both style and perceived quality. Five student sound engineers from the Paris Conservatoire mixed the multitrack source of two pop songs and fully documented their mixing process. The resulting mixes were then used as stimuli for a blind, multi-stimulus listening test in a high-quality listening room that 13 students and 1 faculty member commented on and rated in terms of preference. Our outcomes highlight cultural and generational mixing specificities and offer a better understanding of the artistic side of the practice.
Convention Paper 9946 (Purchase now)

P08-3 Film Sound, Immersion and Learning: Field Study on 3D Surround Sound to Improve Foreign Language LearningFrancisco Cuadrado, Universidad Loyola Andalucía - Sevilla, AE, Spain; Isabel López-Cobo, Universidad Loyola Andalucia - Seville, Spain; Tania Mateos, University of Seville - Seville, Spain; Beatriz Valverde, Universidad Loyola Andalucia - Seville, Spain
This study focuses on the potential of film sound to improve the learning process, according to the level of immersion elicited by 3D sound listening compared to stereo sound, and to the relation between emotion and learning. Three hundred thirty students of English as a foreign language from primary and high school education watched an audiovisual sequence under one of two conditions: a stereo mix or a 3D surround mix. Learning was evaluated with listening comprehension tests, and emotional impact was measured via electrodermal response, a self-perception emotion test, and voice recordings of the participants’ reactions. The results show that students who watched the sequence with the 3D surround sound mix obtained better listening comprehension results than those who listened to the stereo mix.
Convention Paper 9947 (Purchase now)

P08-4 “Touch the Sound”: Tangible Interface for Children Music Learning and Sound ExperimentationFrancisco Cuadrado, Universidad Loyola Andalucía - Sevilla, AE, Spain; Isabel López-Cobo, Universidad Loyola Andalucia - Seville, Spain; Ana Tajadura-Jiménez, Universidad Carlos III de Madrid - Madrid, Spain; University College London - London, UK; David Varona, Universidad Loyola Andalucia - Seville, Spain
“Touch the Sound” is a music learning and sound experimentation system for children, composed of a technological tool (a tablet-based interface that uses a series of physical, tangible elements for music and sound interaction) and a learning app based on the IMLS (Intelligent Music Learning System) project. In this paper we present and discuss the main pedagogical motivations for this tool: to create a tool based on accessible technology and to develop a learning tool that fosters contact with the musical language through direct experimentation and that takes into account children’s understanding of symbols. The design process of the whole system is also described. As presented in the outcomes section, the application possibilities of “Touch the Sound” go beyond the learning of music itself and open new paths for learning different content based on the immersion generated by sound and on the emotional impact that sound has on the listener.
Convention Paper 9948 (Purchase now)


P09 - Posters: Perception

Thursday, May 24, 10:00 — 11:30 (Arena 2)

P09-1 Sound Localization of Beamforming-Controlled Reflected Sound from Ceiling in Presence of Direct SoundHiroo Sakamoto, The University of Electro-Communications - Chofu, Tokyo, Japan; Yoichi Haneda, The University of Electro-Communications - Chofu-shi, Tokyo, Japan
Sound localization from the ceiling is important for 3D surround-sound systems, such as Dolby Atmos, 22.2 channel and higher-order ambisonics. Such systems are difficult to set up in a typical home environment owing to loudspeakers that must be placed on the ceiling. This problem can be solved by using the reflected sound from the ceiling through loudspeaker-array beamforming, such as in commercial sound bars. In the beamforming method, the listener always hears the jammer sound (side lobe) before he/she hears the main sound from the ceiling. To date, the relationship between the time and level differences in the direct and reflected sounds for sound image localization at the reflected position on the ceiling is not clear. This paper investigates this relationship through listening experiments. We also confirmed the localization at the ceiling by using a 32-element spherical loudspeaker array.
Convention Paper 9949 (Purchase now)

P09-2 Evaluation of Player-Controlled Flute Timbre by Flute Players and Non-Flute PlayersMayu Kasahara, Tokyo University of the Arts - Tokyo, Japan; Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan; Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan
In order to investigate how flute players and non-flute players differ in their perception of the instrument, two listening experiments were carried out. Flute sounds were recorded from three flute players at five levels of harmonic-overtone energy. In an attribute-rating experiment on “brightness,” the flute players rated the stimuli “brighter” as the harmonic-overtone energy decreased, while the non-flute players rated them inversely. In a second experiment of pairwise global dissimilarity rating among the stimuli, two dimensions were found, corresponding to the harmonic-overtone energy levels and to the noise levels; here, flute-playing experience did not seem to affect the result. Together, these results indicate that flute-playing experience affected evaluation only when the stimuli were rated using the word “brightness.”
Convention Paper 9950 (Purchase now)

P09-3 Comparing the Effect of Audio Coding Artifacts on Objective Quality Measures and on Subjective RatingsMatteo Torcoli, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Sascha Dick, International Audio Laboratories Erlangen, a joint institution of Universität Erlangen-Nürnberg and Fraunhofer IIS - Erlangen, Germany; Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
A recent work presented the subjective ratings from an extensive perceptual quality evaluation of audio signals in which isolated coding artifact types of varying strength were introduced. We use these ratings as a perceptual reference for studying the performance of 11 well-known tools for objective audio quality evaluation: PEAQ, PEMO-Q, ViSQOLAudio, HAAQI, PESQ, POLQA, fwSNRseg, dLLR, LKR, BSSEval, and PEASS. Some tools achieve high correlation with the subjective data for specific artifact types (Pearson’s r > 0.90, Kendall’s τ > 0.70), corroborating their value during the development of a specific algorithm. Still, the performance of each tool varies depending on the artifact type, and no tool reliably assesses artifacts from parametric audio coding. For now, perceptual evaluation remains irreplaceable, especially when comparing coding schemes that introduce different artifacts.
Convention Paper 9951 (Purchase now)
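As a side note on the evaluation methodology, the two correlation figures quoted above (Pearson's r for linear agreement, Kendall's tau for rank agreement) are straightforward to compute from first principles; the ratings below are invented placeholders, not data from the paper:

```python
import numpy as np

def pearson_r(x, y):
    """Linear correlation between tool scores and subjective ratings."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def kendall_tau(x, y):
    """Rank agreement: mean sign-concordance over all pairs (no tie handling)."""
    n, s = len(x), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return float(s / (n * (n - 1) / 2))

# Hypothetical data: subjective ratings for eight coded signals and the
# scores one objective tool assigned to the same signals.
subjective = np.array([92.0, 85, 70, 64, 55, 41, 33, 20])
tool_score = np.array([4.6, 4.1, 3.8, 3.0, 2.9, 2.1, 1.8, 1.2])

r = pearson_r(tool_score, subjective)
tau = kendall_tau(tool_score, subjective)
# By the abstract's criteria, a tool tracks this artifact type well
# when r > 0.90 and tau > 0.70.
```

Pearson's r rewards a linear relation between scores and ratings, while Kendall's tau only asks that the tool rank the signals in the same order listeners did; a tool can pass one criterion and fail the other.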

P09-4 On the Accuracy and Consistency of Sound Localization at Various Azimuth and Elevation AnglesMaksims Mironovs, University of Huddersfield - Huddersfield, West Yorkshire, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
This study examined the localization of a broadband pink noise burst at various azimuth and elevation angles in a critical listening room. A total of 33 source positions were tested, ranging from 0° to 180° azimuth and from −30° to 90° elevation in 30° intervals. Results indicated that sound source elevation was localized inaccurately, with a large data spread; however, accuracy improved on the off-center planes. Elevation localization was less accurate at the back than at the front, and back-to-front confusion was observed for sources raised to a 60° elevation angle. The proposed listening response method showed consistent localization results and is therefore considered useful for future studies in 3D sound localization.
Convention Paper 9952 (Purchase now)

P09-5 Examination of the Factors Influencing Binaural Rendering on Headphones with the Use of Directivity PatternsBartlomiej Mróz, Gdansk University of Technology - Gdansk, Poland
This paper presents a study on the influence of directional sound sources, using directivity patterns. It also includes a comparison with the work of Wendt et al., which showed several directivity-pattern designs used to gradually control the auditory source distance in a room. While the tests of Wendt et al. auralized the source and room using a loudspeaker ring in an anechoic chamber, this study investigates whether the effect performs similarly in binaural auralization over headphone playback. The study covers not only auditory source distance but also the influence of the auralized room characteristics, source-to-receiver distance, and signal on auditory externalization.
Convention Paper 9953 (Purchase now)


P10 - Audio Coding, Analysis, and Synthesis

Thursday, May 24, 11:15 — 12:45 (Scala 2)

Jürgen Herre, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany

P10-1 Bandwidth Extension Method Based on Generative Adversarial Nets for Audio CompressionQingbo Huang, Peking University - Beijing, China; Xihong Wu, Peking University - Beijing, China; Tianshu Qu, Peking University - Beijing, China
The compression ratio of a core encoder can be improved significantly by reducing the bandwidth of the audio signal, but at the cost of degraded listening quality. This paper proposes a bandwidth extension method based on generative adversarial nets (GAN) for extending the bandwidth of an audio signal to create a more natural sound. The method uses a GAN as a generative model to fit the distribution of the MDCT coefficients of the high-frequency components of audio signals. Through minimax two-player gaming, more natural high-frequency information can be estimated. On this basis, a codec system is built. To evaluate the proposed bandwidth extension system, MUSHRA experiments were carried out; the results show performance comparable with HE-AAC.
Convention Paper 9954 (Purchase now)

P10-2 Device-Specific Distortion Observed in Portable Devices Available for Recording Device IdentificationAkira Nishimura, Tokyo University Information Sciences - Chiba-shi, Japan
This study addresses device-specific distortion observed in recorded audio in order to identify the built-in system-on-a-chip (SoC) in a portable device. A swept sinusoidal signal is emitted from a loudspeaker and recorded by the portable device under study. Three types of distortion are observed by spectral analysis of the recorded signals: components folded at frequencies symmetrical across 4 kHz and 8 kHz of the signal component; non-harmonic, non-subharmonic distortion components whose frequencies lie 4 kHz below and at multiples of 4 kHz above the signal frequency; and mixed non-subharmonic and folded components in the low-frequency region. These are also observed using the correlation matrix of temporal amplitude variations among frequencies derived from recorded speech signals.
Convention Paper 9955 (Purchase now)

P10-3 Physically Derived Synthesis Model of an Edge ToneRod Selfridge, Queen Mary University of London - London, UK; University of Edinburgh - Edinburgh, UK; Joshua D. Reiss, Queen Mary University of London - London, UK; Eldad J. Avital, Queen Mary University of London - London, UK
The edge tone is the sound generated when a planar jet of air from a nozzle comes into contact with a wedge and a number of physical conditions are met. Fluid dynamics equations were used to synthesize authentic edge tones without the need for complex computation. A real-time physically derived synthesis model was designed using the jet airspeed and nozzle exit-to-wedge geometry. We compare different theoretical equations used to predict the tone frequency. A decision tree derived from machine learning based on previously published experimental results was used to predict the correct mode of operation. Results showed an accurate implementation for mode selection and highlighted areas where operation follows or deviates from previously published data.
Convention Paper 9956 (Purchase now)


P11 - Posters: Measurement

Thursday, May 24, 11:45 — 13:15 (Arena 2)

P11-1 Presence Detection by Measuring the Change of Total Sound AbsorptionMichal Luczynski, Wroclaw University of Science and Technology - Wroclaw, Poland
The author analyzed the potential for detecting human presence inside a room by determining the change in total sound absorption: the difference in absorption with and without a person present indicates presence. The method’s limitations were examined; for detection, the change in reverberation time caused by a person must be greater than the measurement uncertainty. Important parameters include the volume of the room, the total sound absorption, the frequency characteristic of the measurement signal, and the signal-to-noise ratio; these parameters serve as criteria for assessing the reliability of the detection. Different types of measurement signals and systems were considered and tested. The aim was to simplify the measurement method as much as possible while keeping the measurement uncertainty to a minimum.
Convention Paper 9957 (Purchase now)
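The link between reverberation time and total absorption that such a method relies on is commonly taken from Sabine's formula, T60 ≈ 0.161 V / A. A minimal sketch of the detection arithmetic, with invented room volume and T60 values:

```python
# Sabine's formula: T60 = 0.161 * V / A  =>  A = 0.161 * V / T60
# (V in m^3, T60 in seconds, A in m^2 Sabine absorption units)
def total_absorption(volume_m3, t60_s):
    return 0.161 * volume_m3 / t60_s

V = 60.0                                 # illustrative room volume, m^3
A_empty = total_absorption(V, 0.55)      # T60 measured without a person
A_occupied = total_absorption(V, 0.52)   # T60 measured with a person

delta_A = A_occupied - A_empty  # extra absorption attributable to the person
# Detection is only reliable when delta_A exceeds what the T60 measurement
# uncertainty alone could produce -- the limitation the abstract notes.
print(f"added absorption: {delta_A:.2f} m^2 Sabine")
```

Because delta_A scales with V/T60², the same person produces a smaller relative change in a large or very dead room, which is why room volume and signal-to-noise ratio enter the reliability criteria above.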

P11-2 The Evolution of Chirp-Based Measurement TechniquesMark Martin, Audio Precision - Beaverton, OR, USA; Jayant Datta, Audio Precision - Beaverton, OR, USA; Xinhui Zhou, Audio Precision - Beaverton, OR, USA
Logarithmic chirp signals, also known as exponentially swept sines, have been used to determine the behavior of audio systems for more than two decades. This is done using a nonlinear deconvolution process that evaluates the direct and harmonic responses of a system. Despite the long history of this technique, improvements continue to be discovered and questions remain. Relatively subtle features of these signals are important for making accurate measurements. This paper describes how these signals have been used in measuring audio systems, describes the current state of the art, and clarifies the theoretical foundations and limitations of the technique.
Convention Paper 9958 (Purchase now)
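A minimal sketch of the technique the paper builds on, the exponential sweep and its amplitude-compensated inverse filter (a common Farina-style formulation; the parameter choices here are arbitrary):

```python
import numpy as np

def ess(f1, f2, T, sr):
    """Exponential (logarithmic) swept sine from f1 to f2 Hz over T seconds."""
    t = np.arange(int(T * sr)) / sr
    L = T / np.log(f2 / f1)
    return np.sin(2 * np.pi * f1 * L * (np.exp(t / L) - 1.0))

def inverse_filter(f1, f2, T, sr):
    """Time-reversed sweep with an exponential amplitude envelope that
    compensates the sweep's pink energy slope, so sweep * inverse is
    approximately a band-limited impulse."""
    x = ess(f1, f2, T, sr)
    t = np.arange(len(x)) / sr
    L = T / np.log(f2 / f1)
    return x[::-1] * np.exp(-t / L)

sr, T = 8000, 1.0
sweep = ess(20.0, 4000.0, T, sr)
inv = inverse_filter(20.0, 4000.0, T, sr)
# Convolving a system's sweep response with `inv` separates the linear
# impulse response (near t = T) from the harmonic-distortion responses,
# which land at earlier times -- the nonlinear deconvolution in question.
ir = np.convolve(sweep, inv)
peak = int(np.argmax(np.abs(ir)))
```

Here the "system" is the sweep itself, so the deconvolution collapses to a single peak; with a real device under test, each harmonic order produces its own earlier peak that can be windowed out and analyzed separately.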

P11-3 A Novel Measurement Procedure for Wiener/Hammerstein Classification of Nonlinear Audio SystemsAndrea Primavera, Universitá Politecnica della Marche - Ancona, Italy; Michele Gasparini, Universitá Politecnica della Marche - Ancona, Italy; Stefania Cecchi, Universitá Politecnica della Marche - Ancona, Italy; Wataru Hariya, KORG Inc. - Tokyo, Japan; Shogo Murai, KORG Inc. - Tokyo, Japan; Koji Oishi, KORG Inc. - Tokyo, Japan; Francesco Piazza, Universitá Politecnica della Marche - Ancona (AN), Italy
Nonlinear system identification is a widespread topic, and many techniques have been developed over the years to identify or synthesize black-box models. Among others, Wiener and Hammerstein structures are two of the most common nonlinear models, and identification techniques based on them are widespread in the literature. However, the choice of one structure over the other requires some a-priori knowledge. In this paper a novel method to determine whether a system has a Wiener or Hammerstein nature is introduced, based on comparing frequency responses in the linear and nonlinear working regions. Simulated and real test results are reported to confirm the validity of the proposed approach.
Convention Paper 9959 (Purchase now)

P11-4 Moving Microphone Measurements for Room Response in CinemaPaul Peace, Community Professional Loudspeakers - Chester, PA, USA; Harman International - Stamford, CT, USA; Shawn Nageli, NXC Systems - West Jordan, UT, USA; Harman International - Stamford, CT, USA; Charles Sprinkle, Kali Audio - Simi Valley, CA, USA; Harman International - Stamford, CT, USA
Static multi-microphone measurements are compared to moving-microphone measurements for the purposes of determining room response and applying appropriate equalization. Several benefits of moving-microphone measurements are shown in this paper, including consistency of measurement and the ability to measure the interaction of correlated signals from multiple sources without the obfuscation of combing artifacts. It is also shown that a moving-microphone measurement is consistent with a multiple-microphone spatial average of sufficient resolution. This paper focuses on the moving-microphone technique in the cinema application; the experiments were conducted in a medium-sized cinema auditorium.
Convention Paper 9960 (Purchase now)

P11-5 Balloon Explosion, Wood-Plank, Revolver Shot, or Traditional Loudspeaker Large-Band Excitation: Which Is Better for Microphone Measurement?Balazs Kakonyi, Université du Mans - Le Mans, France; Harman International Industries - Budapest, Hungary; Riyas Abdul Jaleel, Université du Mans - Le Mans, France; Antonin Novak, Université du Mans - Le Mans, France
The work presented in this paper investigates different sound sources used for microphone measurement in an anechoic room. The microphone under test is measured using three different physical impulse-like sources: balloon explosions, sounds created using wood-planks, and revolver shots; and the results are compared with a measurement in which the sound is generated by a loudspeaker excited with a large band swept-sine signal. The frequency response functions of three different commercial microphones are measured and compared with the data provided by the manufacturer.
Convention Paper 9961 (Purchase now)
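Whatever the source, the microphone's frequency response function is estimated by relating the response spectrum to the excitation spectrum. A generic sketch of such an estimate (an H1-style formulation, not necessarily the authors' exact processing):

```python
import numpy as np

def frf(excitation, response, sr):
    """H1-style frequency response function estimate H(f) = Y X* / |X|^2."""
    n = max(len(excitation), len(response))
    X = np.fft.rfft(excitation, n)
    Y = np.fft.rfft(response, n)
    eps = 1e-12  # guards the division in bins where the source has no energy
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)
    return np.fft.rfftfreq(n, 1 / sr), H
```

With impulse-like sources such as balloons, wood planks, or revolver shots, the excitation itself must be captured by a reference microphone, whereas a loudspeaker-driven swept sine is known exactly; this asymmetry is part of what such a comparison probes.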


P12 - Spatial Audio-Part 2

Thursday, May 24, 13:30 — 15:30 (Scala 4)

Stefania Cecchi, Universitá Politecnica della Marche - Ancona, Italy

P12-1 Binaural Room Impulse Responses Interpolation for Multimedia Real-Time ApplicationsVictor Garcia-Gomez, Universitat Politecnica de Valencia - Valencia, Valencia, Spain; Jose J. Lopez, Universidad Politecnica de Valencia - Valencia, Spain
In this paper a novel method for the interpolation of Binaural Room Impulse Responses (BRIR) is presented. The algorithm is based on decomposition in time and frequency of the BRIRs combined with an elaborated peak searching and matching algorithm for the early reflections followed by interpolation. The algorithm has been tested with real room data, carrying out a perceptual subjective test. It outperforms, both in quality and in computational cost, the state-of-the-art algorithms.
Convention Paper 9962 (Purchase now)

P12-2 Evaluation of Binaural Renderers: LocalizationGregory Reardon, New York University - New York, NY, USA; Andrea Genovese, New York University - New York, NY, USA; Gabriel Zalles, New York University - New York, NY, USA; Patrick Flanagan, THX Ltd. - San Francisco, CA, USA; Agnieszka Roginska, New York University - New York, NY, USA
Binaural renderers can be used to reproduce spatial audio over headphones. A number of different renderers have recently become commercially available for use in creating immersive audio content. High-quality spatial audio can significantly enhance experiences in a number of media applications, such as virtual, mixed, and augmented reality, computer games, and music and movies. A large multi-phase experiment evaluating six commercial binaural renderers was performed. This paper presents the methodology, evaluation criteria, and main findings of the horizontal-plane source localization experiment carried out with these renderers. Significant differences between renderers’ regional localization accuracy were found. Consistent with previous research, subjects tended to localize better in the front and back of the head than at the sides. Differences between renderer performance at the side regions heavily contributed to their overall regional localization accuracy.
Convention Paper 9963 (Purchase now)

P12-3 Characteristics of Vertical Sound Image with Two Parametric LoudspeakersShigeaki Aoki, Kanazawa Institute of Technology - Nonoichi, Ishikawa, Japan; Kazuhiro Shimizu, Kanazawa Institute of Technology - Nonoichi, Japan; Kouki Itou, Kanazawa Institute of Technology - Nonoichi, Japan
A parametric loudspeaker utilizes the nonlinearity of the medium and is known as a super-directivity loudspeaker. So far, its applications have been limited to monaural reproduction systems. We have previously discussed the characteristics of stereo reproduction with two parametric loudspeakers. In this paper sound localization in the vertical direction using upper and lower parametric loudspeakers was confirmed by listening tests. The level difference between the upper and lower parametric loudspeakers was varied as a parameter, and the direction of sound localization in the vertical plane could be controlled. We also obtained interesting characteristics of left-right sound localization in the horizontal plane. A simple geometrical acoustic model was introduced and analyzed; the analysis explains the measured characteristics.
Convention Paper 9964 (Purchase now)

P12-4 Virtual Hemispherical Amplitude Panning (VHAP): A Method for 3D Panning without Elevated LoudspeakersHyunkook Lee, University of Huddersfield - Huddersfield, UK; Dale Johnson, The University of Huddersfield - Huddersfield, UK; Maksims Mironovs, University of Huddersfield - Huddersfield, West Yorkshire, UK
This paper proposes “virtual hemispherical amplitude panning (VHAP),” an efficient 3D panning method exploiting the phantom image elevation effect. Previous research found that a phantom center image produced by two laterally placed loudspeakers is localized above the listener. Based on this principle, VHAP positions a phantom image over a virtual upper hemisphere using just four ear-level loudspeakers placed at the listener’s left side, right side, front center, and back center. A constant-power amplitude panning law is applied among the four loudspeakers. A listening test was conducted to evaluate the localization performance of VHAP. Results indicate that the proposed method can place a phantom image at various spherical coordinates in the upper hemisphere, with some limitations in accuracy and resolution.
Convention Paper 9965 (Purchase now)
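The constant-power amplitude panning law the abstract mentions can be illustrated for the basic two-loudspeaker case (a minimal Python sketch under our own naming; the paper's four-loudspeaker VHAP formulation is not reproduced here):

```python
import math

def constant_power_pan(theta):
    """Constant-power gains for a pair of loudspeakers.

    theta in [0, pi/2]: 0 sends all energy to speaker A,
    pi/2 sends all energy to speaker B. The squared gains
    always sum to one, so perceived loudness stays constant.
    """
    return math.cos(theta), math.sin(theta)

# Image centered between the two loudspeakers:
gA, gB = constant_power_pan(math.pi / 4)
```

VHAP applies this kind of law among four ear-level loudspeakers; the gains above are only the pairwise building block.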


P13 - Posters: Modeling

Thursday, May 24, 16:00 — 17:30 (Arena 2)

P13-1 Nonlinear Real-Time Emulation of a Tube Amplifier with a Long Short Time Memory Neural-NetworkThomas Schmitz, University of Liege - Liege, Belgium; Jean-Jacques Embrechts, University of Liege - Liege, Belgium
Numerous audio systems for musicians are expensive and bulky. It can therefore be advantageous to model them and replace them with computer emulation. Their nonlinear behavior requires the use of complex models. We propose to take advantage of progress in machine learning to build a new model for such nonlinear audio devices (such as the tube amplifier). This paper specifically focuses on the real-time constraints of the model. Modifying the structure of the Long Short Term Memory neural network yielded a model ten times faster while maintaining very good accuracy: the root-mean-square error between the signal coming from the tube amplifier and the output of the neural network is around 2%.
Convention Paper 9966 (Purchase now)
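The ~2% figure quoted above is a relative root-mean-square error; a plain-Python version of that metric (our own sketch, not the authors' code) might look like:

```python
import math

def relative_rmse(reference, estimate):
    """RMS error of `estimate` against `reference`, normalized by the
    reference RMS, so 0.02 corresponds to the paper's ~2% figure."""
    err = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    ref = sum(r ** 2 for r in reference)
    return math.sqrt(err / ref)
```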

P13-2 Audio Control Room Optimization Employing BEM (Boundary Element Method)Robert Hersberger, Walters Storyk Design Group - Basel, Switzerland; Fachschule für Akustik; Gabriel Hauser, Walters Storyk Design Group - Basel, Switzerland; Dirk Noy, WSDG - Basel, Switzerland; John Storyk, Walters-Storyk Design Group - Highland, NY, USA
The Boundary Element Method (BEM) is a state-of-the-art tool in many engineering and science disciplines. In acoustics, the usage of BEM is increasing, especially for low frequency analysis, since the computational effort for small to medium geometries and long wavelengths is comparatively small. While BEM is well known to give reliable results for correctly programmed room shapes, the paper at hand demonstrates that the BEM model can also respond accurately to inserted absorptive materials, and hence the method is useful for virtually prototyping the efficiency of proposed acoustical modifications ahead of actual construction.
Convention Paper 9967 (Purchase now)

P13-3 A Machine Learning Approach to Detecting Sound-Source Elevation in Adverse EnvironmentsHugh O'Dwyer, Trinity College - Dublin, Ireland; Enda Bates, Trinity College Dublin - Dublin, Ireland; Francis M. Boland, Trinity College Dublin - Dublin, Ireland
Recent studies have shown that Deep Neural Networks (DNNs) are capable of detecting sound source azimuth direction in adverse environments to a high level of accuracy. This paper expands on these findings by presenting research that explores the use of DNNs in determining sound source elevation. A simple machine-hearing system is presented that is capable of predicting source elevation to a relatively high degree of accuracy in both anechoic and reverberant environments. Speech signals spatialized across the front hemifield of the head are used to train a feedforward neural network. The effectiveness of Gammatone Filter Energies (GFEs) and the Cross-Correlation Function (CCF) in estimating elevation is investigated, as well as binaural cues such as Interaural Time Difference (ITD) and Interaural Level Difference (ILD). Using a combination of these cues, it was found that elevation could be predicted to within 10 degrees with an accuracy upward of 80%.
Convention Paper 9968 (Purchase now)

P13-4 Design of an Acoustic System with Variable ParametersKarolina Prawda, AGH University of Science and Technology - Kraków, Poland
The number of multifunctional halls requiring acoustic adaptation to many different uses is constantly growing. The present paper shows the design of an acoustic system with variable characteristics and an adjustable sound absorption coefficient that may be used in such spaces.
Convention Paper 9969 (Purchase now)

P13-5 High Frequency Modelling of a Car Audio SystemAleksandra Pyzik, Volvo Car Corporation - Torslanda, Sweden; Andrzej Pietrzyk, Volvo Car Corporation - Torslanda, Sweden
Geometrical Acoustics (GA) is a widely used room acoustic modelling method. Since GA neglects wave phenomena and is strictly applicable only for wavelengths short relative to model and surface sizes, its application in the automotive industry is still a subject of research. The paper studies the feasibility of using GA for high frequency simulations of a car sound system. The GA models of a vehicle at three production stages were created based on FE models. An impedance gun was used for in-situ measurements of the properties of the car interior materials. The directivity of the midrange and tweeter speakers was measured in anechoic conditions. In subsequent simulations, various GA software settings were tested. Simulation results were verified with measurements in the car.
Convention Paper 9970 (Purchase now)


P14 - Perception – Part 1

Thursday, May 24, 16:15 — 17:45 (Scala 4)

Christof Faller, Illusonic GmbH - Uster, Zürich, Switzerland; EPFL - Lausanne, Switzerland

P14-1 The Standard Deviation of the Amplitude Spectrum as a Predictor of the Perception of the 'Power' of Distorted Guitar TimbreKoji Tsumoto, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan; Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan; Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan
According to our previous study, the perception of the wildness and heaviness of distorted guitar timbre can be compiled into one attribute associated with "power." The amount of distortion is one predictor of the perception of "power." Although predictors other than the amount of distortion are yet to be identified, the existence of a predictor for subtle differences in "power" can be assumed from an engineering perspective. Specifically, the amounts of even and odd harmonics are altered by the symmetrical or asymmetrical placement of the diodes in a distortion effect pedal. We investigated how these changes affect the perception of "power." The spectral centroids of the stimuli were equalized to eliminate the influence of perceived "brightness" on "power." A pairwise comparison was conducted for stimuli with three different amounts of distortion and three types of diode placement. Regression analysis indicated that the standard deviation of the amplitude spectrum is an appropriate predictor of the perception of "power."
Convention Paper 9971 (Purchase now)
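The predictor named in the title is simply the standard deviation of the DFT magnitude spectrum; a stdlib-only sketch (our own, with a naive DFT for clarity rather than an FFT) is:

```python
import cmath
import math
import statistics

def amplitude_spectrum_std(signal):
    """Standard deviation of the DFT magnitude spectrum
    (first half, i.e., up to the Nyquist bin), via a naive DFT."""
    n = len(signal)
    mags = []
    for k in range(n // 2):
        bin_k = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(bin_k))
    return statistics.pstdev(mags)

# A pure tone concentrates energy in one bin, giving a nonzero spread:
sine = [math.sin(2 * math.pi * t / 8) for t in range(8)]
spread = amplitude_spectrum_std(sine)
```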

P14-2 Categorization of Isolated Sounds on a Background—Neutral—Foreground ScaleWilliam Coleman, Dublin Institute of Technology - Dublin, Ireland; Charlie Cullen, University of the West of Scotland - UK; Ming Yan, DTS Licensing Inc. - UK
Recent technological advances have driven changes in how media is consumed in home, automotive, and mobile contexts. Multichannel audio home cinema systems are not ubiquitous but have become more prevalent. The consumption of broadcast and gaming content on smart-phone and tablet technology via telecommunications networks is also more common. This has created new possibilities and consequently poses new challenges for audio content delivery such as how media can be optimized for multiple contexts while minimizing file size. For example, a stereo audio file may be adequate for consumption in a mobile context using headphones, but it is limited to stereo presentation in the context of a surround-sound home entertainment system. Another factor is the variability of telecommunications network bandwidths. Object-based audio may offer a solution to this problem by providing audio content on an object level with metadata that controls how the media is delivered depending on the consumption paradigm. In this context, insight into the relative importance of different sounds in the auditory scene will be useful in forming content delivery strategies. This paper presents the results of a listening test investigating categorization of isolated sounds on a Background (BG) — Neutral (N) — Foreground (FG) scale. A continuum of importance was observed among the sounds tested and this shows promise for application in object-based audio delivery.
Convention Paper 9972 (Purchase now)

P14-3 The Relevance of Auditory Adaptation Effects for the Listening Experience in Virtual Acoustic EnvironmentsFlorian Klein, Technische Universität Ilmenau - Ilmenau, Germany; Stephan Werner, Technische Universität Ilmenau - Ilmenau, Germany
Virtual acoustic environments can provide a plausible reproduction of real acoustic scenes. Since the perceived quality is based on expectations and previous sound exposure, a reliable measure of the listening experience is difficult. Listeners are able to learn how to interpret spatial cues and room reflections for certain tasks. To discuss the relevance of auditory adaptation effects, this paper summarizes a series of listening experiments that show adaptation processes with effects on localization accuracy, externalization, and the ability of a listener to identify their own position in a virtual acoustic environment.
Convention Paper 9973 (Purchase now)


P15 - Measurements

Thursday, May 24, 16:30 — 18:00 (Scala 2)

John Mourjopoulos, University of Patras - Patras, Greece

P15-1 Comparison of Effectiveness of Acoustic Enhancement Systems—Comparison of In-Line, Regenerative, and Hybrid-Regenerative Enhancement MethodsTakayuki Watanabe, Yamaha Corp. - Hamamatsu, Shizuoka, Japan; Dai Hashimoto, Yamaha Corp. - Hamamasatsu, Japan; Hideo Miyazaki, Yamaha Corp. - Hamamatsu, Shizuoka, Japan; Ron Bakker, Yamaha Commercial Audio Systems Europe - Rellingen, Germany
Acoustic enhancement systems have become popular in recent years and are broadly used in various kinds of facilities because of their acoustic naturalness and system stability. Today, demand for acoustic enhancement systems exists not only for multi-purpose halls but also for highly absorptive spaces, especially lecture halls and theaters. To investigate an effective enhancement system for highly absorptive spaces, we compared several enhancement methods that are commonly applied in a small auditorium. This paper summarizes the features and the acoustical characteristics of systems configured according to each of the considered enhancement methods in the small auditorium.
Convention Paper 9974 (Purchase now)

P15-2 Comparison of Methods for Estimating the Propagation Delay of Acoustic Signals in an Audience Service for Live EventsMarcel Nophut, Leibniz Universität Hannover - Hannover, Germany; Robert Hupke, Leibniz Universität Hannover - Hannover, Germany; Stephan Preihs, Leibniz Universität Hannover - Hannover, Germany; Jürgen Peissig, Leibniz Universität Hannover - Hannover, Germany
Our novel audience service for live events uses supplementary audio content presented through acoustically transparent headphones to enhance the traditional playback of a PA loudspeaker system. The service requires estimating the propagation delay of sound waves from the PA loudspeakers to the listener in order to individually delay the supplementary audio content and temporally align it with the PA playback. This paper compares two correlation-based methods regarding their computational complexity and their performance in estimating this time delay, using realistic recordings of music and speech samples. Additional measures that make the estimation more robust were developed and are also presented. Typical issues such as tonal components, room reflections, crosstalk, and a large number of correlation lags are addressed.
Convention Paper 9975 (Purchase now)
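A basic correlation-based delay estimator of the kind the paper compares can be sketched as follows (our own naive time-domain version; the paper's optimized variants and robustness measures are not reproduced):

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation
    between `reference` and `delayed`; for delayed[n] = reference[n - d],
    the maximum occurs at lag d."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(max_lag + 1):
        corr = sum(reference[n] * delayed[n + lag]
                   for n in range(len(reference) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag
```

This naive search costs O(N · max_lag); the large number of correlation lags mentioned in the abstract is exactly why FFT-based correlation is attractive in practice.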

P15-3 Experimental Results on Active Road Noise Cancellation in Car InteriorCarlo Tripodi, Ask Industries S.p.A. - Montecavolo di Quattrocastella (RE), Italy; Alessandro Costalunga, Ask Industries S.p.A. - Montecavolo di Quattrocastella (RE), Italy; Lorenzo Ebri, Ask Industries S.p.A. - Montecavolo di Quattrocastella (RE), Italy; University of Parma - Parma (PR), Italy; Marco Vizzaccaro, ASK Industries SPA - Montecavolo di Quattrocastella (RE), Italy; Luca Cattani, Ask Industries S.p.A. - Montecavolo di Quattrocastella (RE), Italy; Emanuele Ugolotti, Ask Industries S.p.A. - Montecavolo di Quattrocastella (RE), Italy; Tiziano Nili, Ask Industries S.p.A. - Montecavolo di Q.Castella(RE), Italy
We discuss the implementation and performance of an active road noise control system. We review the design of a system based on the Least Mean Square (LMS) adaptive algorithm, suitable for wideband road noise reduction inside a car cabin. As the system is based on a feedforward control approach, we discuss the method for selecting the sensors that provide the best noise reference signals. We then give a computational complexity analysis of the overall system and discuss its implementation on prototype hardware for a mid-size sedan. Performance is then evaluated in different road noise scenarios in real driving situations.
Convention Paper 9976 (Purchase now)
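The LMS update at the core of such a feedforward controller is compact; a single adaptation step (a textbook sketch under our own naming, ignoring the secondary-path filtering a real in-car FxLMS system needs) looks like:

```python
def lms_step(weights, x_buf, desired, mu):
    """One LMS iteration: filter output y = w . x, error e = d - y,
    then the stochastic gradient-descent update w <- w + mu * e * x."""
    y = sum(w * x for w, x in zip(weights, x_buf))
    e = desired - y
    return [w + mu * e * x for w, x in zip(weights, x_buf)], e
```

Iterating this step drives the error toward zero for a stationary reference, which is how the adaptive filter tracks the road noise path.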


P16 - Spatial Audio-Part 3

Friday, May 25, 08:45 — 10:45 (Scala 4)

Ville Pulkki, Aalto University - Espoo, Finland

P16-1 Surround with Depth on First-Order Beam-Controlling LoudspeakersThomas Deppisch, University of Technology - Graz, Austria; University of Music and Performing Arts Graz - Graz, Austria; Nils Meyer-Kahlen, University of Technology Graz - Graz, Austria; University of Music and Performaing Arts Graz; Franz Zotter, IEM, University of Music and Performing Arts - Graz, Austria; Matthias Frank, University of Music and Performing Arts Graz - Graz, Austria
Surround systems are typically based on fixed-directivity loudspeakers pointing towards the listener. Laitinen et al. showed for a variable-directivity loudspeaker that directivity control can be used to influence the distance impression of the reproduced sound. As we have shown in a listening experiment using beam-controlling loudspeakers, stable auditory events at directions additional to the loudspeaker positions can be created by exciting specific wall reflections. We use these two effects to enable distance control and increase the number of effective surround directions in two different surround setups. We present an IIR filter design derived from a physical model that achieves low-frequency beam control for our novel cube-shaped 4-channel loudspeakers.
Convention Paper 9977 (Purchase now)

P16-2 A Method to Reproduce a Directional Sound Source by Using a Circular Array of Focused Sources in Front of a Linear Loudspeaker ArrayKimitaka Tsutsumi, NTT Service Evolution Laboratories - Yokosuka-shi, Kanagawa, Japan; University of Electro-Communications - Japan; Yoichi Haneda, The University of Electro-Communications - Chofu-shi, Tokyo, Japan; Ken'ichi Noguchi, NTT Service Evolution Laboratories - Yokosuka, Japan; Hideaki Takada, NTT Service Evolution Laboratories - Kanagawa, Japan
We propose a method to create a directional sound source in front of a linear loudspeaker array. The method creates a virtual circular loudspeaker array comprising multiple focused sources to reproduce directivity patterns. In the proposed method, the driving functions for the secondary sources are defined as a cascade combination of two driving functions: the first for directivity control, derived by an analytical conversion of circular harmonic modes, and the second for creating focused sources. The proposed driving functions can deal with directivity rotation by changing the position of the focused sources, thereby avoiding recalculation of the driving functions. Using computer simulation, we obtained accuracy and algorithmic complexity comparable to or better than those of a conventional method.
Convention Paper 9978 (Purchase now)

P16-3 Subjective Evaluation of Multichannel Upmix Method Based on Interchannel Coherent Component ExtractionYuta Hashimoto, University of Toyama - Toyama, Japan; Yasuto Goto, University of Toyama - Toyama, Japan; Akio Ando, University of Toyama - Toyama, Japan
We developed a method that extracts the interchannel coherent component from multichannel audio signals. In this paper two subjective evaluation experiments testing the upmix performance of our method are presented. In the first experiment, stereo signals were upmixed to 4-channel signals with channels at ±30° and ±60°. A subjective evaluation with the MUSHRA method showed that our method was superior to conventional methods. In the second experiment, 4-channel signals located at ±30° and ±110° were upmixed to 8-channel signals in which four of the channels were set at the upper layer. The subjective evaluation showed no significant differences between the upmixed 8-channel sound and the original 4-channel sound in terms of spatial impression.
Convention Paper 9979 (Purchase now)
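The quantity driving such an upmix is interchannel correlation; a zero-lag version (a minimal sketch only — the authors' extraction operates on coherent components, typically per time-frequency tile, which is not reproduced here) is:

```python
import math

def interchannel_correlation(left, right):
    """Normalized zero-lag correlation of two channels:
    +1 for identical signals, -1 for polarity-inverted signals,
    near 0 for unrelated (decorrelated) signals."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den else 0.0
```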

P16-4 Comparison between Different Microphone Arrays for 3D-AudioLucca Riitano, University of Applied Sciences Darmstadt - Darmstadt, Germany; Jorge Enrique Medina Victoria, University of Applied Sciences Darmstadt - Darmstadt, Germany
The growing need for 3D recordings for film, virtual reality, and games has driven the development of and research on different microphone arrays for 3D audio, such as ORTF-3D, MS-3D, and a wide range of experimental setups. Comparisons between the different microphone arrays have nevertheless been the exception. For this paper three different arrays were placed together to record a piece of music. Based on a listening test, the advantages and disadvantages of the three microphone arrays are compared and discussed in order to find the most suitable array for music recording in 3D.
Convention Paper 9980 (Purchase now)


P17 - Posters: Analysis/Synthesis

Friday, May 25, 09:30 — 11:00 (Arena 2)

P17-1 A Preliminary Study of Sounds Emitted by Honey Bees in a BeehiveStefania Cecchi, Universitá Politecnica della Marche - Ancona, Italy; Alessandro Terenzi, Universita Politecnica delle Marche - Ancona, Italy; Simone Orcioni, Universita Politecnica delle Marche - Ancona, Italy; Paola Riolo, Universita Politecnica delle Marche - Ancona, Italy; Sara Ruschioni, Universita Politecnica delle Marche - Ancona, Italy; Nunzio Isidoro, Universita Politecnica delle Marche - Ancona, Italy
Honey bees (Apis mellifera L.) are well-known insects with positive effects on human life. They are so important that the decline of honey bee colonies in recent years has produced increasing interest in their safeguarding. In this context, the proposed work aims at studying and developing an innovative system capable of monitoring the beehive’s condition by exploiting the sound emitted by the beehives in combination with measurable parameters such as temperature, humidity, CO2, hive weight, and weather conditions. In this paper preliminary results are reported, describing the developed platform and the first results obtained in a real scenario.
Convention Paper 9981 (Purchase now)

P17-2 Analysis of Reports and Crackling Sounds with Associated Magnetic Field Disturbances Recorded during a Geomagnetic Storm on March 7, 2012 in Southern FinlandUnto K. Laine, Aalto University - Aalto, Finland
Audio and magnetic field signals were recorded during a geomagnetic storm on March 7, 2012, in open fields at Karkkila, approximately 70 km north of Helsinki, using a Zoom H4n recorder. Almost 90 distinct sound events, such as short claps, loud reports, or even a crackling sound, were recorded. The paper describes the methods used and the results obtained in the audio and magnetic field signal analysis. The relationship between the instances of the sound events and the geomagnetic activity is described. It is shown that the spectral properties of the crackling sound and the reports are similar. The challenges in finding connections between individual sounds and the corresponding magnetic field fluctuations are demonstrated and discussed.
Convention Paper 9982 (Purchase now)

P17-3 Harmonics and Intermodulation Distortion Analysis of the Even-Order Nonlinearity Controlled Effector PedalMasaki Inui, Hiroshima Institute of Technology - Hiroshima, Japan; Kanako Takemoto, Hiroshima Institute of Technology - Hiroshima, Japan; Toshihiko Hamasaki, Hiroshima Institute of Technology - Hiroshima, Japan
It is well known that components of an electric guitar system, such as vacuum tubes or Fuzz pedal feedback circuits, inherently produce even harmonics. Furthermore, there are several popular pedals in the "Overdrive" category whose differences in timbre are attributed to even harmonics. However, it is difficult to identify the correlation between transfer characteristics and auditory nuances, which seems to involve psychoacoustic factors such as masking. In this study we developed a novel pedal that can continuously control the strength of even and odd harmonics, and we clarified high-order distortion effects through spectrum analysis of intermodulation distortion with Loudness K-weighted Full Scale (LKFS) correction. This effector pedal should be a powerful tool for perceptual distortion analysis using real instrument signals, where masking occurs.
Convention Paper 9983 (Purchase now)

P17-4 Bandwidth Extension with Auditory Filters and Block-Adaptive AnalysisSunil G. Bharitkar, HP Labs., Inc. - San Francisco, CA, USA; Timothy Mauer, HP, Inc. - Vancouver, WA, USA; Charles Oppenheimer, HP, Inc. - San Francisco, CA, USA; Teresa Wells, HP, Inc. - San Francisco, CA, USA; David Berfanger, HP, Inc. - Vancouver, WA, USA
Bandwidth limits incurred by an audio signal due to low-excursion and narrow-bandwidth speakers reduce the perception of bass. One method to overcome this is to synthesize the harmonics of the fundamental frequency using side-chain processing. Depending on the input signal, intermodulation distortion can be introduced, resulting in artifacts. A recent approach selects relevant portions of the low-frequency signal for reproduction using perceptually motivated filters, resulting in cleaner bass reproduction, as confirmed through listening tests. However, one of its limitations is the need for large-duration frames or blocks (e.g., 5296 samples per block at 48 kHz) to obtain adequate frequency resolution at low frequencies. In this paper we present an alternative approach, based on a 1/6-octave filterbank and power analysis, whose performance scales well to smaller block sizes.
Convention Paper 9984 (Purchase now)
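The side-chain harmonic synthesis referred to above is classically done with a memoryless nonlinearity; a half-wave-rectifier blend (our own toy illustration, not the paper's perceptually filtered method) shows the idea:

```python
import math

def synth_bass_harmonics(x, amount=0.5):
    """Blend the dry signal with its half-wave rectification.
    Rectifying a low-frequency tone creates even harmonics (and DC,
    which a real system would high-pass away), so the 'missing
    fundamental' effect lets small speakers imply deeper bass."""
    return [(1 - amount) * s + amount * max(s, 0.0) for s in x]

# A zero-mean tone gains a DC/even-harmonic component after processing:
tone = [math.sin(2 * math.pi * t / 64) for t in range(64)]
processed = synth_bass_harmonics(tone)
```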

P17-5 Evaluating Similarity of Temporal Amplitude Envelopes of Violin SoundsMagdalena Dziecielska, Poznan University of Technology - Poznan, Poland; Krzysztof Martyn, Poznan University of Technology - Poznan, Poland; Ewa Lukasik, Poznan University of Technology - Poznan, Poland
The paper presents a method for evaluating the similarity of the temporal amplitude envelopes of violin sounds. Experienced violinmakers are able to separate good-sounding from bad-sounding violins just by examining the envelopes of individual sounds. The contours of two-sided envelopes of individual open-string sounds are treated as images. Four uncorrelated visual descriptors are used to form a feature vector characterizing image shape, and an individual distance measure is selected for each feature. Similar objects are grouped using the k-means method. Violin sounds from the AMATI database were used in the experiments.
Convention Paper 9985 (Purchase now)


P18 - Perception – Part 2

Friday, May 25, 11:00 — 13:00 (Scala 4)

Franz Zotter, IEM, University of Music and Performing Arts - Graz, Austria

P18-1 The Effect of the Rise Time and Frequency Character of the Sound Source Signal on the Sense of the Early LEVToru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan; Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan
The effect of the rise time of the sound source signal on the sense of early LEV (listener envelopment) was investigated. The authors conducted Scheffé’s pairwise comparison method using seven kinds of bandpass noise with eight rise times, convolved with impulse responses and played back from seven loudspeakers. The results show that, for octave-band noises up to 1 kHz, early LEV is felt strongly when the rise time is 40 to 60 ms, whereas for the high-frequency bands the evaluation varies with the rise time. Additionally, early LEV in the mid-low range is related to the early reflections within 80 ms, consistent with the law of the first wavefront, and showed almost no relationship with IACC.
Convention Paper 9986 (Purchase now)

P18-2 Plausibility of an Interactive Approaching Motion towards a Virtual Sound Source Based on Simplified BRIR SetsAnnika Neidhardt, Technische Universität Ilmenau - Ilmenau, Germany; Alby Ignatious-Tommy, Technical University Ilmenau - Ilmenau, Germany; Anson Davis Pereppadan, Technical University Ilmenau - Ilmenau, Germany
In this paper the interactive approaching motion towards a virtual loudspeaker created with dynamic binaural synthesis is the subject of research. A realization based on a given set of measured binaural room impulse responses (BRIRs) was rated as plausible by all participants in a previous experiment. In this study the same BRIR data are systematically simplified to investigate the consequences for perception. This is of interest in the context of position-dynamic reproduction, related interpolation and extrapolation approaches, as well as attempts at parameterization. The potential for inaudible data simplification is closely related to human sensitivity to position-dependent changes in room acoustics. The results suggest a high potential for simplification, while some kinds of BRIR impairment clearly affect the plausibility.
Convention Paper 9987 (Purchase now)

P18-3 The Impact of Trajectories of Head and Source Movements on Perceived Externalization of a Frontal Sound SourceSong Li, Leibniz Universität Hannover - Hannover, Germany; Jiaxiang E, Leibniz Universität Hannover - Hannover, Germany; Roman Schlieper, Leibniz Universität Hannover - Hannover, Germany; Jürgen Peissig, Leibniz Universität Hannover - Hannover, Germany
Two listening experiments were performed to investigate the influence of different trajectories of head and source movements on the perceived externalization of a frontal sound source. In the first listening test, virtual moving sound sources with seven different trajectories were presented over headphones while subjects’ heads remained stationary. In the second test, subjects were asked to rotate their heads on three predefined trajectories coupled with real-time binaural rendering, while the simulated virtual sound source was kept stationary. After each presentation, subjects rated the degree of perceived externalization. Results suggested that large head and source movements can improve perceived externalization, except for source movements in the front/back direction. In addition, small source or head movements have no influence on externalization.
Convention Paper 9988 (Purchase now)

P18-4 Evaluation of Binaural Renderers: Externalization, Front/Back and Up/Down ConfusionsGregory Reardon, New York University - New York, NY, USA; Gabriel Zalles, New York University - New York, NY, USA; Andrea Genovese, New York University - New York, NY, USA; Patrick Flanagan, THX Ltd. - San Francisco, CA, USA; Agnieszka Roginska, New York University - New York, NY, USA
Binaural renderers can be used to reproduce dynamic spatial audio over headphones and deliver immersive audio content. Six commercially available binaural renderers with different rendering methodologies were evaluated in a multi-phase subjective study. This paper presents and discusses the testing methodology, evaluation criteria, and main findings of the externalization, front/back discrimination and up/down discrimination tasks that are part of the first phase. Statistical analysis over a large number of subjects revealed that the choice of renderer has a significant effect on all three dependent measures. Further, ratings of perceived externalization for the renderers were found to be content-specific, while renderer reversal rates were much more robust to different stimuli.
Convention Paper 9989 (Purchase now)


P19 - Posters: Audio Processing/Audio Education

Friday, May 25, 13:15 — 14:45 (Arena 2)

P19-1 Combining Fully Convolutional and Recurrent Neural Networks for Single Channel Audio Source SeparationEmad M. Grais, University of Surrey - Guildford, Surrey, UK; Mark D. Plumbley, University of Surrey - Guildford, Surrey, UK
Combining different models is a common strategy for building a good audio source separation system. In this work we combine two powerful deep neural networks for single channel audio source separation (SCSS): fully convolutional neural networks (FCNs) and recurrent neural networks, specifically bidirectional long short-term memory networks (BLSTMs). FCNs are good at extracting useful features from the audio data, and BLSTMs are good at modeling the temporal structure of audio signals. Our experimental results show that combining FCNs and BLSTMs achieves better separation performance than using either model individually.
Convention Paper 9990 (Purchase now)

P19-2 A Group Delay-Based Method for Signal DecorrelationElliot K. Canfield-Dafilou, Center for Computer Research in Music and Acoustics (CCRMA), Stanford University - Stanford, CA, USA; Jonathan S. Abel, Stanford University - Stanford, CA, USA
By breaking up the phase coherence of a signal broadcast from multiple loudspeakers, it is possible to control the perceived spatial extent and location of a sound source. This so-called signal decorrelation process is commonly achieved using a set of linear filters and finds applications in audio upmixing, spatialization, and auralization. Allpass filters make ideal decorrelation filters since they have unit magnitude spectra and therefore can be perceptually transparent. Here, we present a method for designing allpass decorrelation filters by specifying group delay trajectories in a way that allows for control of the amount of correlation as a function of frequency. This design is efficiently implemented as a cascade of biquad allpass filters. We present statistical and perceptual methods for evaluating the amount of decorrelation and audible distortion.
Convention Paper 9991 (Purchase now)
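The decorrelator described above is realized as a cascade of biquad allpass sections. The sketch below shows the general construction, assuming the pole radii and angles are hypothetical placeholders rather than the group-delay trajectories designed in the paper:

```python
import numpy as np
from scipy.signal import freqz, lfilter

def allpass_biquad(r, theta):
    """Second-order allpass section with poles at r*e^{±j*theta}; the
    numerator is the reversed denominator, so |H(e^{jw})| = 1 everywhere."""
    a = np.array([1.0, -2.0 * r * np.cos(theta), r * r])
    return a[::-1], a

def decorrelate(x, sections):
    """Run a signal through a cascade of allpass biquads, one simple way
    to realize a group-delay-based decorrelator."""
    y = np.asarray(x, dtype=float)
    for r, theta in sections:
        b, a = allpass_biquad(r, theta)
        y = lfilter(b, a, y)
    return y

# Hypothetical section parameters (pole radius, pole angle in rad/sample):
sections = [(0.9, 0.3), (0.85, 1.0), (0.8, 2.0), (0.9, 2.7)]

# Verify the allpass property of one section: flat magnitude response.
b, a = allpass_biquad(0.9, 1.0)
w, h = freqz(b, a, worN=512)
print(np.max(np.abs(np.abs(h) - 1.0)))   # numerically zero
```

Running the same input through two cascades with different (r, theta) sets yields two outputs that share the input's magnitude spectrum but differ in phase, i.e., decorrelated channels.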

P19-3 Designing Quasi-Linear Phase IIR Filters for Audio Crossover Systems by Using Swarm IntelligenceFerdinando Foresi, Università Politecnica delle Marche - Ancona, Italy; Paolo Vecchiotti, Università Politecnica delle Marche - Ancona, Italy; Diego Zallocco, Elettromedia s.r.l. - Potenza Picena, Italy; Stefano Squartini, Università Politecnica delle Marche - Ancona, Italy
The audio crossover plays a fundamental role in sound reproduction systems. Nowadays, digital crossovers based on IIR filters are commonly employed, and their non-linear phase is a relevant concern. For this reason, solutions aiming at IIR filters that approximate linear-phase behavior have recently been proposed. One of the latest exploits Fractional Derivative theory and uses Evolutionary Algorithms to explore the solution space when designing the IIR filter: the filter's phase error is minimized to achieve a quasi-linear phase response. Nonetheless, this approach is not suitable for crossover design, since the transition-band behavior of the individual filters is not predictable. This pushed the authors to propose a modified design technique that includes suitable constraints, such as the amplitude-response cut-off frequency, in the ad-hoc Particle Swarm Optimization algorithm exploring the space of IIR filter solutions. Simulations show that not only can better-performing filters be obtained, but fully flat-response crossovers can also be achieved.
Convention Paper 9992 (Purchase now)

P19-4 Graduate Attributes in Music Technology: Embedding Design Thinking in a Studio Design CourseMalachy Ronan, Limerick Institute of Technology - Limerick, Ireland; Donagh O'Shea, Limerick Institute of Technology - Limerick, Ireland
Student acquisition of graduate attributes is an increasingly important consideration for educational institutes, yet embedding these attributes in the curriculum is often challenging. This paper recounts the process of embedding design thinking in a studio design course. The process is adapted to suit music technology students and delivered through weekly interactive workshops. Student adaptation to design thinking is assessed against the characteristics of experienced designers to identify issues and derive heuristics for future iterations of the course.
Convention Paper 9993 (Purchase now)

P19-5 Dynamic Range Controller Ear Training: Analysis of Audio Engineering Student Training DataDenis Martin, McGill University - Montreal, QC, Canada; CIRMMT - Montreal, QC, Canada; George Massenburg, Schulich School of Music, McGill University - Montreal, Quebec, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, Quebec, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada
Eight students from the McGill University Graduate Program in Sound Recording participated in a dynamic range controller ear training program over a period of 14 weeks. Analysis of the training data shows a significant improvement in the percentage of correct responses over time. This result agrees with previous findings by other researchers and demonstrates a positive effect of this technical ear training program.
Convention Paper 9994 (Purchase now)


P20 - Audio Processing and Effects – Part 1

Friday, May 25, 13:15 — 15:45 (Scala 4)

Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK

P20-1 Deep Neural Networks for Cross-Modal Estimations of Acoustic Reverberation Characteristics from Two-Dimensional ImagesHomare Kon, Tokyo Institute of Technology - Meguro-ku, Tokyo, Japan; Hideki Koike, Tokyo Institute of Technology - Meguro-ku, Tokyo, Japan
In augmented reality (AR) applications, reproduction of acoustic reverberation is essential for creating an immersive audio experience. The audio component of an AR experience should simulate the acoustics of the environment that users are experiencing. Previously, sound engineers could program all reverberation parameters in advance for a given scene or a fixed audience position. In AR applications, however, not all such parameters can be programmed ahead of time, which makes conventional adjustment methods impractical. Considering that skilled acoustic engineers can estimate reverberation parameters from an image of a room, we trained a deep neural network (DNN) to estimate reverberation parameters from two-dimensional images. The results suggest that a DNN can estimate the acoustic reverberation parameters from a single image.
Convention Paper 9995 (Purchase now)

P20-2 Deep Learning for Timbre Modification and Transfer: An Evaluation StudyLeonardo Gabrielli, Università Politecnica delle Marche - Ancona, Italy; Carmine Emanuel Cella, IRCAM - Paris, France; Fabio Vesperini, Università Politecnica delle Marche - Ancona, Italy; Diego Droghini, Università Politecnica delle Marche - Ancona, Italy; Emanuele Principi, Università Politecnica delle Marche - Ancona, Italy; Stefano Squartini, Università Politecnica delle Marche - Ancona, Italy
In the past years, several hybridization techniques have been proposed to synthesize novel audio content deriving its properties from two audio sources. These algorithms, however, usually provide no feature learning, leaving the user, often intentionally, to explore parameters by trial and error. The introduction of machine learning algorithms in the music processing field calls for an investigation into possible exploitation of their properties, such as the ability to learn semantically meaningful features. In this first work we adopt a Neural Network Autoencoder architecture and enhance it to exploit temporal dependencies. In our experiments the architecture was able to modify the original timbre, resembling what it learned during the training phase, while preserving the pitch envelope from the input.
Convention Paper 9996 (Purchase now)

P20-3 Feature Selection for Dynamic Range Compressor Parameter EstimationDi Sheng, Queen Mary University of London - London, UK; György Fazekas, Queen Mary University of London - London, UK
Casual users of audio effects may lack practical experience or knowledge of their low-level signal processing parameters. An intelligent control tool that allows using sound examples to control effects would strongly benefit these users. In a previous work we proposed a control method for the dynamic range compressor (DRC) using a random forest regression model. It maps audio features extracted from a reference sound to DRC parameter values, such that the processed signal resembles the reference. The key to good performance in this system is the relevance and effectiveness of audio features. This paper focusses on a thorough exposition and assessment of the features, as well as the comparison of different strategies to find the optimal feature set for DRC parameter estimation, using automatic feature selection methods. This enables us to draw conclusions about which features are relevant to core DRC parameters. Our results show that conventional time and frequency domain features well known from the literature are sufficient to estimate the DRC's threshold and ratio parameters, while more specialized features are needed for attack and release time, which induce more subtle changes to the signal.
Convention Paper 9997 (Purchase now)
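As an illustration of the feature-relevance question this paper studies, the sketch below fits a random forest regressor on synthetic data and inspects its impurity-based feature importances. The "features" and their mapping to a DRC threshold are invented for the demo (and scikit-learn is assumed); this is not the paper's feature set, data, or selection procedure:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 400
# Synthetic stand-ins: the DRC threshold (regression target, in dB) and
# four "audio features", of which only the first actually tracks it.
threshold = rng.uniform(-40.0, -10.0, n)
features = np.column_stack([
    threshold + rng.normal(0.0, 1.0, n),   # informative (crest-factor-like)
    rng.normal(size=n),                     # irrelevant
    rng.normal(size=n),                     # irrelevant
    rng.normal(size=n),                     # irrelevant
])

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(features, threshold)
print(forest.feature_importances_.round(3))  # mass concentrates on feature 0
```

Ranking features by importance like this is one common automatic selection strategy; the paper compares several such strategies against the estimation accuracy of each DRC parameter.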

P20-4 Effect of Delay Equalization on Loudspeaker ResponsesAki Mäkivirta, Genelec Oy - Iisalmi, Finland; Juho Liski, Aalto University - Espoo, Finland; Vesa Välimäki, Aalto University - Espoo, Finland
The impulse response of a generalized two-way loudspeaker is modeled and delay equalized using digital filters. The dominant features of a loudspeaker are its low- and high-corner roll-off characteristics and its behavior at the crossover points. The proposed model also characterizes the main effects of the mass-compliance resonant system. The impulse response, its logarithm and spectrogram, and the magnitude and group delay responses are visualized and compared with those measured from a two-way loudspeaker. The model explains the typical group-delay variations and magnitude-response deviations from a flat response in the passband. The group-delay equalization of the loudspeaker is demonstrated with two different methods. The first method, time-alignment of the tweeter and woofer elements using a bulk delay, is shown to cause ripple in the magnitude response. The second method, which flattens the group delay of the speaker model over the whole audio range, leads to pre-ringing in the impulse response.
Convention Paper 9998 (Purchase now)
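The ripple caused by the first (bulk-delay) method can be reproduced with a textbook two-way model: a digital 4th-order Linkwitz-Riley crossover sums to a flat magnitude when aligned, but delaying one branch carves notches into the sum. The crossover frequency and delay below are hypothetical values for illustration, not figures from the paper:

```python
import numpy as np
from scipy.signal import butter, freqz

# Digital 4th-order Linkwitz-Riley crossover: squared 2nd-order Butterworth
# sections, whose low+high sum is allpass (flat magnitude) when aligned.
fs, fc = 48000, 2000.0
w, h_lo = freqz(*butter(2, fc, btype='low', fs=fs), worN=4096, fs=fs)
_, h_hi = freqz(*butter(2, fc, btype='high', fs=fs), worN=4096, fs=fs)
H_lo, H_hi = h_lo ** 2, h_hi ** 2               # LR4 = Butterworth squared

# Bulk-delaying the tweeter branch (0.25 ms here) rotates its phase against
# the woofer and carves ripple into the summed magnitude response.
delay = 0.00025
H_hi_delayed = H_hi * np.exp(-1j * 2 * np.pi * w * delay)

aligned_db = 20 * np.log10(np.abs(H_lo + H_hi))
delayed_db = 20 * np.log10(np.abs(H_lo + H_hi_delayed))
print(np.max(np.abs(aligned_db)))               # ~0 dB: flat when aligned
print(np.min(delayed_db))                       # deep notch near crossover
```

At the crossover frequency the two branches carry equal level, so a delay that rotates their relative phase toward 180° produces a deep cancellation notch, which is the ripple mechanism the paper analyzes.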

P20-5 An Allpass Chirp for Constant Signal-to-Noise Ratio Impulse Response MeasurementElliot K. Canfield-Dafilou, Center for Computer Research in Music and Acoustics (CCRMA), Stanford University - Stanford, CA, USA; Jonathan S. Abel, Stanford University - Stanford, CA, USA
A method for designing an allpass chirp for impulse response measurement that ensures a constant signal-to-noise ratio (SNR) in the measurement is presented. By using the background noise and measurement system's frequency responses, a measurement signal can be designed by specifying the group delay trajectory. This signal will have a small crest factor and will be optimally short such that the measured impulse response will have a desired and constant SNR.
Convention Paper 10014 (Purchase now)
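The underlying construction, a flat-magnitude spectrum whose phase is the negative cumulative integral of a prescribed group-delay trajectory, can be sketched as follows. The linear trajectory used here is a placeholder; the paper derives the trajectory from the measured noise and system responses to equalize the SNR:

```python
import numpy as np

def allpass_chirp(n_fft, fs, tau):
    """Synthesize a flat-magnitude (allpass) chirp from a group-delay
    trajectory tau (seconds, one value per positive-frequency bin).
    Simplified sketch of the general idea, not the authors' exact method."""
    dw = 2.0 * np.pi * fs / n_fft            # angular bin spacing (rad/s)
    phase = -np.cumsum(tau) * dw              # phi = -cumulative integral of tau
    spectrum = np.exp(1j * phase)             # unit magnitude at every bin
    spectrum[0] = 1.0                         # DC bin must be real
    spectrum[-1] = 1.0                        # Nyquist bin must be real (even n_fft)
    return np.fft.irfft(spectrum, n_fft)

fs, n_fft = 48000, 4096
f = np.fft.rfftfreq(n_fft, 1.0 / fs)
# Hypothetical trajectory: delay grows linearly from 5 ms to 40 ms with
# frequency, i.e., low frequencies arrive first (an upward sweep).
tau = np.interp(f, [0.0, fs / 2.0], [0.005, 0.040])
x = allpass_chirp(n_fft, fs, tau)
crest = np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))
print(round(crest, 2))
```

A smooth group-delay trajectory keeps the crest factor small, which is what lets the measurement signal carry maximum energy for a given peak level.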


P21 - Posters: Audio Coding and Quality

Friday, May 25, 15:00 — 16:30 (Arena 2)

P21-1 Quantization with Signal Adding Noise Shaping Using Long Range Look-Ahead OptimizationAkihiko Yoneya, Nagoya Institute of Technology - Nagoya, Aichi-pref., Japan
A re-quantization approach for digital audio signals using noise shaping by extra signal addition is studied. The approach has been proposed by the author, but its properties have not been studied well. In this paper, the features and performance of the approach are investigated. As a result, the noise shaping performance is slightly better than that of the conventional method, and the perceptual evaluation is superior in terms of the fineness of the sound source image, especially when the optimization horizon used in the additional signal calculation is wide. Since a wide horizon requires a lot of computation, a pruning scheme for the optimization is proposed to reduce the calculation time, and the amount of computation is evaluated experimentally.
Convention Paper 9999 (Purchase now)

P21-2 A Comparison of Clarity in MQA Encoded Files vs. Their Unprocessed State as Performed by Three Groups – Expert Listeners, Musicians, and Casual ListenersMariane Generale, McGill University - Montreal, QC, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Denis Martin, McGill University - Montreal, QC, Canada; CIRMMT - Montreal, QC, Canada
This paper examines perceived clarity in MQA encoded audio files compared to their unprocessed state (96-kHz, 24-bit). Utilizing a methodology initially proposed by the authors in a previous paper, the study investigates any reported differences in clarity for three musical sources of varying genres. A double-blind test is conducted using three groups—expert listeners, musicians, and casual listeners—in a controlled environment using high-quality loudspeakers and headphones. The researchers were interested in comparing the responses of the three target groups and in whether playback systems had any significant effect on listeners' perception. The data show that listeners were not able to significantly discriminate between the MQA encoded files and the unprocessed originals, owing to several interaction effects.
Convention Paper 10000 (Purchase now)

P21-3 A Subjective Evaluation of High Bitrate Coding of MusicKristine Grivcova, BBC Research & Development - Salford, UK; Chris Pike, BBC R&D - Salford, UK; University of York - York, UK; Thomas Nixon, BBC R&D - Salford, UK
The demand to deliver high quality audio has led broadcasters to consider lossless delivery. However, the difference in quality compared with the formats used in existing services is not clear. A subjective listening test was carried out to assess the perceived difference in quality between AAC-LC at 320 kbps and an uncompressed reference, using the method of ITU-R BS.1116. Twelve audio samples were used in the test, including orchestral, jazz, vocal music, and speech. A total of 18 participants with critical listening experience took part in the experiment. The results showed no perceptible difference between AAC-LC at 320 kbps and the reference.
Convention Paper 10001 (Purchase now)

P21-4 Subjective Evaluation of a Spatialization Feature for Hearing Aids by Normal-Hearing and Hearing-Impaired SubjectsGilles Courtois, Swiss Federal Institute of Technology (EPFL) - Lausanne, Switzerland; Hervé Lissek, Swiss Federal Institute of Technology (EPFL) - Lausanne, Switzerland; Philippe Estoppey, Acoustique Riponne - Lausanne, Switzerland; Yves Oesch, Phonak Communications AG - Murten, Switzerland; Xavier Gigandet, Phonak Communications AG - Murten, Switzerland
Remote microphone systems significantly improve the speech intelligibility offered by hearing aids. The voice of the speaker(s) is captured close to the mouth by a microphone and then wirelessly sent to the hearing aids. However, the sound is rendered in a diotic way, which bypasses the spatial cues used for localizing and identifying the speaker. The authors had formerly proposed a feature that localizes and spatializes the voice. The current study investigates the perception of that feature by normal-hearing and hearing-impaired subjects with and without remote microphone system experience. Comparing the diotic and binaural reproductions, subjects rated their preference over various audiovisual stimuli. The results show that experienced subjects mostly preferred the processing achieved by the feature, contrary to the other subjects.
Convention Paper 10002 (Purchase now)

P21-5 Virtual Reality for Subjective Assessment of Sound Quality in CarsAngelo Farina, Università di Parma - Parma, Italy; Daniel Pinardi, Università di Parma - Parma, Italy; Marco Binelli, University of Parma - Parma, Italy; Michele Ebri, University of Parma - Parma, Italy; Politecnico di Torino - Torino, Italy; Lorenzo Ebri, Ask Industries S.p.A. - Montecavolo di Quattrocastella (RE), Italy; University of Parma - Parma (PR), Italy
Binaural recording and playback has been used for decades in the automotive industry for performing subjective assessment of sound quality in cars, avoiding expensive and difficult tests on the road. Despite the success of this technology, several drawbacks are inherent in this approach. Playback on headphones does not have the benefit of head-tracking, so the localization is poor; the HRTFs embedded in the binaural rendering are those of the dummy head employed for recording the sound inside the car; and finally there is no visual feedback, so the listener experiences a mismatch between visual and aural stimulation. The new Virtual Reality approach solves all these problems. The research focuses on obtaining a 360° panoramic video of the interior of the vehicle, accompanied by audio processed in High Order Ambisonics format, ready to be rendered on a stereoscopic VR visor. It is also possible to superimpose onto the video a real-time color map of noise levels, with iso-level curves and calibrated SPL values. Finally, both the sound level color map and the spatial audio can be filtered by their coherence with one or more reference signals, making it possible to listen to and very precisely localize individual noise sources while excluding all others. These results were obtained employing a massive spherical microphone array, a 360° panoramic video recording system, and accelerometers or microphones for the reference signals.
Convention Paper 10003 (Purchase now)

P21-6 Quality Evaluation of Sound Broadcasted Via DAB+ System Based on a Single Frequency NetworkStefan Brachmanski, Wroclaw University of Technology - Wroclaw, Poland; Maurycy Kin, Wroclaw University of Science and Technology - Wroclaw, Poland
This paper presents the results of quality assessment of speech and music signals transmitted via the DAB+ system. The musical signals have been evaluated both for overall quality and for some particular attributes. The subjective research was carried out using the ACR procedure according to the ITU recommendation, and the results are presented as MOS values for various bit rates. The speech signals were additionally examined with the PESQ method. The results show that the assumed quality of 4 MOS for this kind of broadcasting could be achieved at 48 kbit/s. This was confirmed by both subjective and objective research. The differences between the results obtained for overall sound quality and for particular sound attributes are discussed.
Convention Paper 10004 (Purchase now)

P21-7 An Investigation into Spatial Attributes of 360° Microphone Techniques for Virtual RealityConnor Millns, University of Huddersfield - Huddersfield, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
Listening tests were conducted to evaluate the perceived spatial attributes of two types of 360° microphone techniques for virtual reality: First Order Ambisonics (FOA) and the Equal Segment Microphone Array (ESMA). A binaural dummy head was also included as a baseline for VR audio. The four attributes tested were: source shift/ensemble spread, source/ensemble distance, environmental width, and environmental depth. The stimuli used in these tests included single and multi-source sounds consisting of both human voice and instruments. The results indicate that listeners can distinguish differences in three of the four spatial attributes. The binaural head was rated the highest for each attribute, and FOA was rated the lowest except for environmental depth.
Convention Paper 10005 (Purchase now)

P21-8 Statistical Tests with MUSHRA DataCatarina Mendonça, Aalto University - Espoo, Finland; Symeon Delikaris-Manias, Aalto University - Helsinki, Finland
This work raises concerns regarding the statistical analysis of data obtained with the MUSHRA method. There is a widespread tendency to prefer the ANOVA test, which is supported by the recommendation. This work analyzes four assumptions underlying the ANOVA tests: interval scale, normality, equal variances, and independence. Data were collected from one experiment and one questionnaire. It is found that MUSHRA data tend to violate all of the above assumptions. The consequences of each violation are debated. The violation of multiple assumptions is of concern. The violation of independence of observations leads to the most serious concern. In light of these findings, it is concluded that ANOVA tests have a high likelihood of resulting in type 1 error (false positives) with MUSHRA data and should therefore never be used with this type of data. The paper finishes with a section devoted to statistical recommendations. It is recommended that when using the MUSHRA method, the Wilcoxon or Friedman tests be used. Alternatively, statistical tests based on resampling methods are also appropriate.
Convention Paper 10006 (Purchase now)
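The recommended alternatives are readily available in SciPy. The sketch below applies the Friedman omnibus test and a pairwise Wilcoxon signed-rank follow-up to a MUSHRA-style score matrix; the listener scores and the three "systems" are fabricated purely for illustration:

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(0)
base = rng.uniform(40.0, 60.0, size=12)          # per-listener baseline bias
scores = np.column_stack([
    base + rng.normal(30.0, 5.0, 12),            # hypothetical system A
    base + rng.normal(10.0, 5.0, 12),            # hypothetical system B
    base + rng.normal(0.0, 5.0, 12),             # hypothetical system C
])

# Friedman: nonparametric omnibus test across the three related samples
# (rank-based, so it avoids the interval-scale and normality assumptions).
f_stat, f_p = friedmanchisquare(scores[:, 0], scores[:, 1], scores[:, 2])

# Wilcoxon signed-rank test for a pairwise follow-up (A vs. B).
w_stat, w_p = wilcoxon(scores[:, 0], scores[:, 1])
print(f_p < 0.05, w_p < 0.05)
```

Because both tests operate on within-listener ranks rather than raw scores, they sidestep the interval-scale, normality, and equal-variance violations the paper documents (though not the independence concern, which is a design issue).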

P21-9 Investigation of Audio Tampering in Broadcast ContentNikolaos Vryzas, Aristotle University of Thessaloniki - Thessaloniki, Greece; Anastasia Katsaounidou, Aristotle University of Thessaloniki - Thessaloniki, Greece; Rigas Kotsakis, Aristotle University of Thessaloniki - Thessaloniki, Greece; Charalampos A. Dimoulas, Aristotle University of Thessaloniki - Thessaloniki, Greece; George Kalliris, Aristotle University of Thessaloniki - Thessaloniki, Greece
Audio content forgery detection in broadcasting is crucial to prevent the spread of misinformation. Tools for the authentication of audio files can prove very useful, and several techniques have been proposed. In the current paper a database for the evaluation of such techniques is introduced. A script was created for the automatic generation of tampered audio files from a number of original source files containing recorded speech, encoded in different audio formats (MP3, AAC, AMR, FLAC) and at different bitrates. The database was subjectively evaluated by experts in terms of the audibility of the alterations. The effect of tampering on several audio features was tested in order to propose semi-automatic methods for discriminating between original and tampered files. The database and the scripts are publicly accessible so that researchers can use the pre-generated files or use the script to create datasets oriented to their research interests.
Convention Paper 10007 (Purchase now)


P22 - Perception – Part 3

Friday, May 25, 16:00 — 18:00 (Scala 4)

Jürgen Peissig, Leibniz Universität Hannover - Hannover, Germany

P22-1 Audibility of Loudspeaker Group-Delay CharacteristicsJuho Liski, Aalto University - Espoo, Finland; Aki Mäkivirta, Genelec Oy - Iisalmi, Finland; Vesa Välimäki, Aalto University - Espoo, Finland
Loudspeaker impulse responses were studied using a paired-comparison listening test to learn about the audibility of the loudspeaker group-delay characteristics. Several modeled and six measured loudspeakers were included in this study. The impulse responses and their time-reversed versions were used in order to maximize the change in the temporal structure and group delay without affecting the magnitude spectrum, and the subjects were asked whether they could hear a difference. Additionally, the same impulse responses were compared after convolving them with a pink impulse, defined in this paper, which causes a low-frequency emphasis. The results give an idea of how much the group delay of a loudspeaker system can vary so that it is unlikely to cause audible effects in sound reproduction. Our results suggest that when the group delay in the frequency range from 300 Hz to 1 kHz is below 1.0 ms, it is inaudible. With low-frequency emphasis the group delay variations can be heard more easily.
Convention Paper 10008 (Purchase now)

P22-2 The Influence of Hearing and Sight on the Perceptual Evaluation of Home Speaker SystemsHans-Joachim Maempel, Federal Institute for Music Research - Berlin, Germany; TU Berlin; Michael Horn, Federal Institute for Music Research - Berlin, Germany
Home speaker systems are not only functional but also aesthetic objects with both acoustic and optical properties. We investigated the perceptual evaluation of four home speaker systems under acoustic, optical, and opto-acoustic conditions (factor Domain). By varying the speakers' acoustic and optical properties under the opto-acoustic condition in a mutually independent manner (factors Acoustic loudspeaker, Optical loudspeaker), we also investigated their proportional influence on perception. To this end, 40 non-expert participants rated 10 auditory, 2 visual, and 4 audiovisual features. The acoustic stimuli were generated by means of data-based dynamic binaural synthesis. Noticeably, participants did not realize that the speakers were acoustically simulated. Results indicated that only the mean ratings of two auditory features and one audiovisual feature were significantly influenced by the factor Domain. There were speaker-dependent effects on three further auditory features. Small crossmodal effects of Optical loudspeaker on six auditory features were observed. Remarkably, the audiovisual features, particularly monetary value, were dominated by the optical properties rather than the acoustic ones. This is due to a low acoustic and a high optical variance of the speakers. The results support the hypothesis that the optical properties imply an overall quality that in turn may influence the rating of auditory features.
Convention Paper 10009 (Purchase now)

P22-3 A VR-Based Mobile Platform for Training to Non-Individualized Binaural 3D AudioChungeun Kim, University of Surrey - Guildford, Surrey, UK; Mark Steadman, Imperial College London - London, UK; Jean-Hugues Lestang, Imperial College London - London, UK; Dan F. M. Goodman, Imperial College London - London, UK; Lorenzo Picinali, Imperial College London - London, UK
Delivery of immersive 3D audio with arbitrarily-positioned sound sources over headphones often requires processing of individual source signals through a set of Head-Related Transfer Functions (HRTFs), the direction-dependent filters that describe the propagation of sound in an anechoic environment from the source to the listener's ears. Individual morphological differences and the impracticality of HRTF measurement make it difficult to deliver completely individualized 3D audio in this manner, and instead lead to the use of previously-measured non-individual sets of HRTFs. In this study a VR-based mobile sound localization training prototype system is introduced that uses such HRTF sets for audio. It consists of a mobile phone as a head-mounted device, a hand-held Bluetooth controller, and a network-enabled PC with a USB audio interface and a pair of headphones. The virtual environment was developed on the mobile phone such that the user can listen to and navigate an acoustically neutral scene and locate invisible target sound sources presented at random directions, rendered with non-individualized HRTFs, in repetitive sessions. Various training paradigms can be designed with this system, with performance-related feedback provided according to the user's localization accuracy, including visual indication of the target location and some aspects of a typical first-person shooting game, such as enemies, scoring, and level advancement. An experiment was conducted using this system in which 11 subjects went through multiple training sessions using non-individualized HRTF sets. The localization performance evaluations showed a reduction of the overall localization angle error over repeated training sessions, reflecting lower front-back confusion rates.
Convention Paper 10010 (Purchase now)

P22-4 Speech-To-Screen: Spatial Separation of Dialogue from Noise towards Improved Speech Intelligibility for the Small ScreenPhilippa Demonte, Acoustics Research Centre, University of Salford - Salford, UK; Yan Tang, University of Salford - Salford, UK; Richard J. Hughes, University of Salford - Salford, Greater Manchester, UK; Trevor Cox, University of Salford - Salford, UK; Bruno Fazenda, University of Salford - Salford, Greater Manchester, UK; Ben Shirley, University of Salford - Salford, Greater Manchester, UK
Can externalizing dialogue when in the presence of stereo background noise improve speech intelligibility? This has been investigated for audio over headphones using head-tracking in order to explore potential future developments for small-screen devices. A quantitative listening experiment tasked participants with identifying target words in spoken sentences played in the presence of background noise via headphones. Sixteen different combinations of 3 independent variables were tested: speech and noise locations (internalized/externalized), video (on/off), and masking noise (stationary/fluctuating noise). The results revealed that the best improvements to speech intelligibility were generated by both the video-on condition and externalizing speech at the screen while retaining masking noise in the stereo mix.
Convention Paper 10011 (Purchase now)


P23 - Audio Processing and Effects – Part 2

Saturday, May 26, 09:00 — 11:30 (Scala 4)

Balázs Bank, Budapest University of Technology and Economics - Budapest, Hungary

P23-1 Stage Compression in Transaural AudioFilippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK; Eric Hamdan, University of Southampton - Southampton, UK
The reproduction of binaural audio with loudspeakers, also referred to as transaural audio, is affected by a number of artifacts. This work focuses on the effect of reproduction error on the low frequency Interaural Time Difference (ITD). Transaural systems do not provide perfect cross-talk cancellation between the left and right ear signals, especially at low frequencies. It is shown that an increase in cross-talk leads to a perceived source azimuth angle that is smaller than intended. The authors show that in ideal theoretical conditions the angular error calculated from the interaural phase difference indicates stage compression at frequencies for which high cross-talk occurs. This trend is shown in the resultant ITD calculated from the Interaural Cross Correlation (IACC), examined in one-third octave bands.
Convention Paper 10012 (Purchase now)
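The low-frequency mechanism can be illustrated with a toy model (a simplified sketch, not the authors' derivation): if a fraction g of each binaural channel leaks into the opposite ear, the interaural phase difference, and hence the effective ITD, shrinks, pulling the perceived azimuth toward the center:

```python
import numpy as np

def effective_itd(itd, g, freq):
    """ITD actually delivered when a fraction g of each binaural channel
    leaks into the opposite ear (g = 0 means perfect crosstalk cancellation)."""
    w = 2.0 * np.pi * freq
    e_left = 1.0 + g * np.exp(-1j * w * itd)      # direct + leaked channel
    e_right = np.exp(-1j * w * itd) + g
    ipd = np.angle(e_left * np.conj(e_right))     # interaural phase difference
    return ipd / w

intended = 0.0005                                  # 0.5 ms: a lateral source
print(effective_itd(intended, 0.0, 300.0))         # perfect XTC: ITD preserved
print(effective_itd(intended, 0.5, 300.0))         # heavy crosstalk: ITD shrinks
```

Sweeping g shows the delivered ITD, and with it the implied azimuth, collapsing toward the median plane as cancellation degrades, which is the stage compression the paper quantifies.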

P23-2 Multi-Track Crosstalk Reduction Using Spectral SubtractionFabian Seipel, TU Berlin - Berlin, Germany; Alexander Lerch, Georgia Institute of Technology - Atlanta, GA, USA
While many music-related blind source separation methods focus on mono or stereo material, the detection and reduction of crosstalk in multi-track recordings is less researched. Crosstalk or “bleed” of one recorded channel into another is a very common phenomenon in specific genres such as jazz and classical, where all instrumentalists are recorded simultaneously. We present an efficient algorithm that estimates the amount of crosstalk in the spectral domain and applies spectral subtraction to remove it. Randomly generated artificial mixtures from various anechoic orchestral source material were employed to develop and evaluate the algorithm, which scores an average SIR-gain of 15.14 dB on various datasets with different amounts of simulated crosstalk.
Convention Paper 10013 (Purchase now)
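A minimal sketch of the magnitude spectral-subtraction step is shown below; the STFT parameters, scaling factor, and spectral floor are illustrative assumptions rather than the paper's estimated crosstalk amounts and tuned algorithm:

```python
import numpy as np
from scipy.signal import stft, istft

def reduce_crosstalk(target, bleed_src, alpha=1.0, floor=0.05,
                     fs=48000, nperseg=1024):
    """Reduce bleed of `bleed_src` captured inside `target` via magnitude
    spectral subtraction; `alpha` scales the estimated bleed magnitude and
    `floor` limits over-subtraction (the musical-noise trade-off)."""
    _, _, T = stft(target, fs=fs, nperseg=nperseg)
    _, _, B = stft(bleed_src, fs=fs, nperseg=nperseg)
    mag = np.abs(T) - alpha * np.abs(B)          # subtract estimated bleed
    mag = np.maximum(mag, floor * np.abs(T))     # spectral floor
    _, y = istft(mag * np.exp(1j * np.angle(T)), fs=fs, nperseg=nperseg)
    return y[:len(target)]

# Demo: a 440 Hz "instrument" track with 2 kHz bleed from a neighboring mic.
fs = 48000
t = np.arange(fs) / fs
bleed = np.sin(2 * np.pi * 2000 * t)
target = np.sin(2 * np.pi * 440 * t) + 0.3 * bleed
cleaned = reduce_crosstalk(target, bleed, alpha=0.3)
```

In practice the crosstalk scaling is not known and must be estimated per channel pair, which is the core of the paper's contribution.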

P23-3 Wave Digital Modeling of the Diode-Based Ring ModulatorAlberto Bernardini, Politecnico di Milano - Milan, Italy; Kurt James Werner, Queen's University Belfast - Belfast, UK; Sonic Arts Research Centre (SARC); Paolo Maffezzoni, Politecnico di Milano - Milan, Italy; Augusto Sarti, Politecnico di Milano - Milan, Italy
The ring modulator is a strongly nonlinear circuit common in audio gear, especially as part of electronic musical instruments. In this paper an accurate model based on Wave Digital (WD) principles is developed for implementing the ring modulator as a digital audio effect. The reference circuit consists of four diodes and two multi-winding transformers. The proposed WD implementation is based on the Scattering Iterative Method (SIM), recently developed for the static analysis of large nonlinear photovoltaic arrays. In this paper SIM is shown to be also suitable for implementing audio circuits for Virtual Analog applications, such as the ring modulator, since it is stable, robust, and comparable to or more efficient than state-of-the-art strategies in terms of computational cost.
Convention Paper 10015 (Purchase now)

P23-4 Improving the Frequency Response Magnitude and Phase of Analogue-Matched Digital FiltersJohn Flynn, Balance Mastering - London, UK; Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
Current closed-form IIR methods for approximating an analogue prototype filter in the discrete-domain do not match frequency response phase. The frequency sampling method can match phase, but requires extremely long filter lengths (and corresponding latency) to perform well at low frequencies. We propose a method for discretizing an analogue prototype that does not succumb to these issues. Contrary to the IIR methods, it accurately approximates the phase, as well as the magnitude response. The proposed method exhibits good low frequency resolution using much smaller filter lengths than design by frequency sampling.
Convention Paper 10016 (Purchase now)

P23-5 Optimization of Personal Audio Systems for Intelligibility ContrastDaniel Wallace, University of Southampton - Southampton, UK; Jordan Cheer, ISVR, University of Southampton - Southampton, Hampshire, UK
Personal audio systems are designed to deliver spatially separated regions of audio to individual listeners. This paper demonstrates a method of personal audio system design that provides a level of contrast in the perceived speech intelligibility between bright and dark audio zones. Limitations in array directivity which would lead to a loss of privacy are overcome by reproducing a synthetic masking signal in the dark zone. This signal is optimized to provide effective masking whilst remaining subjectively pleasant to listeners. Results of this optimization from a simulated personal audio system are presented.
Convention Paper 10017 (Purchase now)


P24 - Posters: Spatial Audio

Saturday, May 26, 09:30 — 11:00 (Arena 2)

P24-1 Acoustic and Subjective Evaluation of 22.2- and 2-Channel Reproduced Sound Fields in Three Studios
Madhu Ashok, University of Rochester - Rochester, NY, USA; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan; Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA
Three studios of similar outer-shell dimensions, with varying acoustic treatments and absorptivity, were evaluated via both recorded and simulated binaural stimuli for 22.2- and 2-channel playback. A series of analyses, including acoustic modelling in CATT-Acoustic and subjective evaluation, was conducted to test whether the 22.2-channel playback preserved common perceptual impressions regardless of room-dependent physical characteristics. Results from multidimensional scaling (MDS) indicated that listeners used one perceptual dimension for differentiating between reproduction formats, and others for physical room characteristics. Clarity and early decay time measured in the three studios showed a similar pattern when scaled from 2- to 22.2-channel reproduced sound fields. Subjective evaluation revealed a tendency for the inherent perceptual characteristics of 22.2-channel playback to be preserved in spite of different playback conditions.
Convention Paper 10018 (Purchase now)

P24-2 Audio Source Localization as an Input to Virtual Reality Environments
Agneya A. Kerure, Georgia Institute of Technology - Atlanta, GA, USA; Jason Freeman, Georgia Institute of Technology - Atlanta, GA, USA
This paper details an effort towards incorporating audio source localization as an input to virtual reality systems, focusing primarily on games. The goal of this research is to find a novel method to use localized live audio as an input for level generation or creation of elements and objects in a virtual reality environment. The paper discusses the current state of audio-based games and virtual reality, and details the design requirements of a system consisting of a circular microphone array that can be used to localize the input audio. The paper also briefly discusses signal processing techniques used for audio information retrieval and introduces a prototype of an asymmetric virtual reality first-person shooter game as a proof-of-concept of the potential of audio source localization for augmenting the immersive nature of virtual reality.
Convention Paper 10019 (Purchase now)
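The abstract above does not disclose the localization algorithm used, but a common baseline for microphone-array localization is estimating the time difference of arrival (TDOA) between a pair of capsules via cross-correlation and converting it to a bearing. The sketch below is a generic, hedged illustration of that idea (all function names are hypothetical, and a real system would fuse estimates across all pairs of the circular array):

```python
import math

def tdoa_samples(sig_a, sig_b, max_lag):
    """Lag (in samples) of sig_b relative to sig_a that maximizes their
    cross-correlation; the basis of simple TDOA-based localization."""
    best_lag, best_corr = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        corr = sum(sig_a[i] * sig_b[i + lag]
                   for i in range(len(sig_a))
                   if 0 <= i + lag < len(sig_b))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

def bearing(tdoa_seconds, mic_spacing_m, c=343.0):
    """Far-field angle of arrival (radians) for one microphone pair,
    from the TDOA, the pair spacing, and the speed of sound."""
    return math.asin(max(-1.0, min(1.0, tdoa_seconds * c / mic_spacing_m)))
```

A production system would typically use a frequency-domain estimator such as GCC-PHAT for robustness to reverberation, but the geometry step is the same.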

P24-3 High Order Ambisonics Encoding Method Using Differential Microphone Array
Shan Gao, Peking University - Beijing, China; Xihong Wu, Peking University - Beijing, China; Tianshu Qu, Peking University - Beijing, China
High order Ambisonics (HOA) is a flexible way to represent and analyze the sound field. In the process of the spherical Fourier transform, the microphones need to be uniformly distributed on the surface of a sphere, which limits the application of the theory. In this paper we introduce an HOA encoding method using differential microphone arrays (DMAs). We obtain the particular beam patterns of the different orders of spherical harmonics by a weighted sum of time-delayed outputs from a closely spaced differential microphone array. HOA coefficients are then estimated by projecting the signals onto the beam patterns. The coefficients calculated by the DMA are compared to results derived from the theoretical spherical harmonics, demonstrating the effectiveness of our method.
Convention Paper 10020 (Purchase now)

P24-4 Development of a 64-Channel Spherical Microphone Array and a 122-Channel Loudspeaker Array System for 3D Sound Field Capturing and Reproduction Technology Research
Shoken Kaneko, Yamaha Corporation - Iwata-shi, Japan; Tsukasa Suenaga, Yamaha Corporation - Iwata-shi, Japan; Hitoshi Akiyama, Yamaha Corporation - Iwata, Shizuoka, Japan; Yoshiro Miyake, Yamaha Corporation - Iwata-shi, Japan; Satoshi Tominaga, Yamaha Corporation - Hamamatsu-shi, Japan; Futoshi Shirakihara, Yamaha Corporation - Iwata-shi, Japan; Hiraku Okumura, Yamaha Corporation - Iwata-shi, Japan; Kyoto University - Kyoto, Japan
In this paper we present our recent activities on building facilities to drive research and development on 3D sound field capturing and reproduction. We developed a 64-channel spherical microphone array, the ViReal Mic, and a 122-channel loudspeaker array system, the ViReal Dome. The ViReal Mic is a microphone array whose microphone capsules are mounted on a rigid sphere, with the positions determined by the spherical Fibonacci spiral. The ViReal Dome is a loudspeaker array system consisting of 122 active coaxial loudspeakers. We present the details of the developed systems and discuss directions of future research.
Convention Paper 10021 (Purchase now)
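The spherical Fibonacci spiral mentioned above is a standard construction for distributing points nearly uniformly over a sphere: equal-area latitude bands combined with a golden-angle longitude step. The sketch below illustrates that construction in general terms; it is not Yamaha's actual capsule layout, and the function name is hypothetical:

```python
import math

def fibonacci_sphere(n):
    """Return n points on the unit sphere along a Fibonacci spiral.

    Latitudes are chosen so each point covers an equal-area band;
    longitudes advance by the golden angle, giving a near-uniform layout.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # ~2.39996 rad
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n         # equal-area latitude
        r = math.sqrt(max(0.0, 1.0 - z * z))  # radius of latitude circle
        theta = golden_angle * i              # longitude
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points

capsule_dirs = fibonacci_sphere(64)  # one direction per capsule
```

Scaling each unit vector by the sphere radius gives candidate capsule positions on the rigid baffle.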

P24-5 A Recording Technique for 6 Degrees of Freedom VR
Enda Bates, Trinity College Dublin - Dublin, Ireland; Hugh O'Dwyer, Trinity College - Dublin, Ireland; Karl-Philipp Flachsbarth, Trinity College - Dublin, Ireland; Francis M. Boland, Trinity College Dublin - Dublin, Ireland
This paper presents a new multichannel microphone technique and reproduction system intended to support six degrees of freedom of listener movement. The technique is based on a modified form of the equal segment microphone array (ESMA) concept and utilizes four Ambisonic (B-format) microphones in a near-coincident arrangement with a 50 cm spacing. Upon playback, these Ambisonic microphones are transformed into virtual microphones with different polar patterns that change based on the listener's position within the reproduction area. The results of an objective analysis and an informal subjective listening test indicate some inconsistencies in the on- and off-axis response, but suggest that the technique can potentially support six degrees of freedom in a recorded audio scene using a compact microphone array, making it well suited to Virtual Reality (VR) and particularly Free Viewpoint Video (FVV) applications.
Convention Paper 10022 (Purchase now)
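Deriving a steerable first-order virtual microphone from horizontal B-format signals, as the technique above does per listener position, is a standard operation. A minimal sketch, assuming a unit-gain (SN3D-style) W channel rather than the traditional -3 dB convention, and with the pattern parameter `p` spanning omni (1), cardioid (0.5), and figure-of-eight (0):

```python
import math

def virtual_mic(w, x, y, azimuth, p):
    """First-order virtual microphone from horizontal B-format channels.

    w, x, y: sample lists for the W, X, Y channels (unit-gain W assumed).
    azimuth: steering angle in radians; p: pattern (1 omni .. 0 figure-8).
    """
    c, s = math.cos(azimuth), math.sin(azimuth)
    return [p * wi + (1.0 - p) * (c * xi + s * yi)
            for wi, xi, yi in zip(w, x, y)]
```

For a plane wave encoded at azimuth θ (W = 1, X = cos θ, Y = sin θ), a cardioid steered at θ passes the source at full gain, while the same cardioid steered in the opposite direction nulls it, which is what lets the rendering crossfade between the four spaced arrays as the listener moves.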

P24-6 On the Use of Bottleneck Features of CNN Auto-Encoder for Personalized HRTFs
Geon Woo Lee, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Jung Min Moon, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Chan Jun Chun, Korea Institute of Civil Engineering and Building Technology (KICT) - Goyang, Korea; Hong Kook Kim, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea
The most effective way of providing immersive sound effects is to use head-related transfer functions (HRTFs). HRTFs are defined by the path from a given sound source to the listener's ears. However, the sound propagation described by HRTFs differs slightly between people because the head, body, and ears differ for each person. Recently, a method for estimating HRTFs using a neural network has been developed, where anthropometric pinna measurements and head-related impulse responses (HRIRs) are used as the input and output layers of the neural network. However, it is inefficient to measure such anthropometric data accurately. This paper proposes a feature extraction method that works on an image of the ear instead of taking pinna measurements directly. The proposed method utilizes the bottleneck features of a convolutional neural network (CNN) auto-encoder applied to the edge-detected ear image. The proposed feature extraction method using the CNN-based auto-encoder will be incorporated into the HRTF estimation approach.
Convention Paper 10023 (Purchase now)


P25 - Audio Applications

Saturday, May 26, 13:00 — 15:00 (Scala 4)

Annika Neidhardt, Technische Universität Ilmenau - Ilmenau, Germany

P25-1 Films Unseen: Approaching Audio Description Alternatives to Enhance Perception, Immersion, and Imagery of Audio-Visual Mediums for Blind and Partially Sighted Audiences: Science Fiction
Cesar Portillo, SAE Institute London - London, UK
“Films Unseen” is a research project analyzing the nature of audio description and the soundtrack features of Science Fiction films and content. The paper explores the distinctive immersion, sound spatialization, and sound design features that could allow blind and partially sighted audiences to perceive and form an accurate interpretation of the visual elements presented within visually complex audio-visual media, such as the case-study film How to Be Human (Centofanti, 2017). Correspondingly, the results collected from 15 experienced audio description users demonstrated the effectiveness of SFX, immersive audio, and binaural recording techniques in stimulating the perception of visual performances by appealing to the auditory senses, evoking a more meaningful and understandable experience for visually impaired audiences in conjunction with experimental approaches to sound design and audio description.
Convention Paper 10024 (Purchase now)

P25-2 "It's about Time!" A Study on the Perceived Effects of Manipulating Time in Overdub Recordings
Tore Teigland, Westerdals University College - Oslo, Norway; Pål Erik Jensen, Westerdals Oslo ACT, University College - Oslo, Norway; Claus Sohn Andersen, Westerdals Oslo School of Arts - Oslo, Norway; Norwegian University of Science and Technology
In this study we made three separate recordings using close, near, and room microphones. These recordings were then the subject of a listening test constructed to study a variety of perceived effects of manipulating time in overdub recordings. While the use of time alignment to decrease comb filtering has been widely studied, there has been little work investigating other perceived effects. Time alignment has become increasingly common, but as this paper concludes, it should not be used without care. The findings shed light on a range of important factors affected by manipulating time between microphones in overdub recordings, and conclude which of the investigated techniques are normally preferred, and when.
Convention Paper 10025 (Purchase now)

P25-3 Musicians’ Binaural Headphone Monitoring for Studio Recording
Valentin Bauer, Paris Conservatoire (CNSMDP) - Paris, France; Hervé Déjardin, Radio France - Paris, France; Amandine Pras, University of Lethbridge - Lethbridge, Alberta, Canada
This study uses binaural technology for headphone monitoring in world music, jazz, and free improvisation recording sessions. We first conducted an online survey with 12 musicians to identify the challenges they face when performing in the studio with wearable monitoring devices. Then, to investigate musicians’ perceived differences between binaural and stereo monitoring, we carried out three comparative tests followed by semi-structured focus groups. The survey analysis highlighted the main challenges of coping with an unusual performance situation and a lack of realism and sound quality in the auditory scene. Tests showed that binaural monitoring improved the perceived sound quality and realism as well as musicians’ comfort and pleasure, and encouraged better musical performances and more creativity in the studio.
Convention Paper 10026 (Purchase now)

P25-4 Estimation of Object-Based Reverberation Using an Ad-Hoc Microphone Arrangement for Live Performance
Luca Remaggi, University of Surrey - Guildford, Surrey, UK; Philip Jackson, University of Surrey - Guildford, Surrey, UK; Philip Coleman, University of Surrey - Guildford, Surrey, UK; Tom Parnell, BBC Research & Development - Salford, UK
We present a novel pipeline to estimate reverberant spatial audio object (RSAO) parameters given room impulse responses (RIRs) recorded by ad-hoc microphone arrangements. The proposed pipeline performs three tasks: direct-to-reverberant-ratio (DRR) estimation; microphone localization; RSAO parametrization. RIRs recorded at Bridgewater Hall by microphones arranged for a BBC Philharmonic Orchestra performance were parametrized. Objective measures of the rendered RSAO reverberation characteristics were evaluated and compared with reverberation recorded by a Soundfield microphone. Alongside informal listening tests, the results confirmed that the rendered RSAO gave a plausible reproduction of the hall, comparable to the measured response. The objectification of the reverb from in-situ RIR measurements unlocks customization and personalization of the experience for different audio systems, user preferences, and playback environments.
Convention Paper 10028 (Purchase now)
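The first task of the pipeline above, direct-to-reverberant ratio (DRR) estimation, is commonly defined as the ratio of the RIR energy in a short window around the direct-path peak to the energy of everything after it. The sketch below is a generic textbook-style estimator, not the paper's exact algorithm, and the function name and default window length are illustrative assumptions:

```python
import math

def estimate_drr(rir, fs, window_ms=2.5):
    """Estimate the direct-to-reverberant ratio (dB) of a room
    impulse response: energy in a short window around the direct-path
    peak versus all later (reverberant) energy."""
    peak = max(range(len(rir)), key=lambda i: abs(rir[i]))
    half = int(window_ms * 1e-3 * fs)
    lo, hi = max(0, peak - half), min(len(rir), peak + half + 1)
    direct = sum(s * s for s in rir[lo:hi])
    reverb = sum(s * s for s in rir[hi:])
    return 10.0 * math.log10(direct / reverb)
```

With equal direct and reverberant energy the estimate is 0 dB; in practice the window must be long enough to capture the whole direct sound but short enough to exclude strong early reflections.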

