AES Dublin 2019
Paper Session Details
P01 - Loudspeakers: Part 1
Wednesday, March 20, 10:30 — 12:30 (Meeting Room 3)
Chair:
Christof Faller, Illusonic GmbH - Uster, Zürich, Switzerland; EPFL - Lausanne, Switzerland
P01-1 Large Horns and Small Rooms – Do They “Play Nicely” Together?—Bjørn Kolbrek, Celestion - Ipswich, UK
For some audiophiles, having a huge, low-cutoff bass horn built into the wall of the listening room represents the ultimate low frequency solution. Without considering the practicalities of such an installation, this paper will look at the performance of low frequency horns mounted in the wall of a small room compared to the performance of a typical point source closed box type sub-woofer and an array of such sub-woofers. Simulation results indicate that in addition to higher efficiency, the horns provide smoother response in the listening position and less seat-to-seat variation.
Convention Paper 10132
P01-2 Predistortion Technique for Generating Spectrally Clean Excitation Signals for Audio and Electro-Acoustic Nonlinear Measurements—Antonin Novak, Université du Mans - Le Mans, France; Laurent Simon, Le Mans Université - Le Mans, France; Pierrick Lotton, Le Mans Cedex 9, France; Manuel Melon, Le Mans Université - Le Mans cedex 9, France
In many audio and electro-acoustic nonlinear measurements we need to excite the nonlinear system under test with an excitation device that is not linear. A typical example is the study of the nonlinear behavior of a loudspeaker mechanical part, where the mechanical part (the nonlinear system under test) is excited externally, either with a shaker or pneumatically using another loudspeaker. The excitation device is often assumed to be linear, which is unfortunately not correct. In this paper we present a simple method that corrects the distorted output signal of the excitation device by pre-distorting the input signal. The process is based on harmonic injection and can be applied to any periodic signal that is used for the measurement, e.g., a sine wave to measure total harmonic distortion (THD), a two-tone signal to measure intermodulation distortion (IMD), or a multi-tone signal. The experimental results provided on an electrodynamic loudspeaker show that the undesired spectral components of the acoustic pressure inside the sealed box can be suppressed to the level of the background noise.
Convention Paper 10133
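As an illustration of the harmonic-injection idea in the abstract above, here is a minimal sketch in Python. It is not the authors' implementation: the memoryless polynomial `device`, the fixed-point iteration, the unit linear gain, and all function names are assumptions made for the example. Each pass measures the unwanted spectral components at the device output and injects them, with opposite sign, into the input signal.

```python
import numpy as np

def device(x):
    # Hypothetical weakly nonlinear excitation device (assumption, memoryless).
    return x + 0.1 * x**2 + 0.05 * x**3

def wanted_bins(target):
    # Boolean mask of the FFT bins that belong to the desired periodic signal.
    mag = np.abs(np.fft.rfft(target))
    return mag > 1e-6 * mag.max()

def predistort(target, n_iter=30, linear_gain=1.0):
    # Iteratively inject harmonics into the input so that they cancel
    # the unwanted components at the device output (DC is left alone).
    keep = wanted_bins(target)
    x = target.copy()
    for _ in range(n_iter):
        spectrum = np.fft.rfft(device(x))
        unwanted = np.where(keep, 0.0, spectrum)
        unwanted[0] = 0.0
        x = x - np.fft.irfft(unwanted / linear_gain, n=len(x))
    return x

def unwanted_ratio(y, target):
    # Largest unwanted (non-DC) bin relative to the largest wanted bin.
    mag = np.abs(np.fft.rfft(y))
    keep = wanted_bins(target)
    mask = ~keep
    mask[0] = False
    return mag[mask].max() / mag[keep].max()

n = np.arange(256)
target = np.sin(2 * np.pi * 4 * n / 256)   # one period-exact test sine
x_pre = predistort(target)
print(unwanted_ratio(device(target), target))  # distortion without predistortion
print(unwanted_ratio(device(x_pre), target))   # distortion with predistortion
```

The same loop applies unchanged to a two-tone or multi-tone target, because `wanted_bins` is derived from whatever periodic signal is passed in. A real device with memory would need its measured linear response in place of the scalar `linear_gain`.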
P01-3 Sensory Profiling of High-End Loudspeakers Using Rapid Methods—Part 4: Flash Profile with Expert Assessors—Irene Arrieta Sagredo, Bang & Olufsen - Struer, Denmark; Samuel Moulin, Bang & Olufsen - Struer, Denmark; Søren Bech, Bang & Olufsen a/s - Struer, Denmark; Aalborg University - Aalborg, Denmark
This study is the fourth in a series of papers investigating different rapid sensory profiling methods applied to audio stimuli [1, 2, 3]. In particular, this paper considers Flash Profile, a verbal-based method that allows assessors to use their own vocabulary, for perceptual audio evaluation. A listening test was conducted with expert listeners investigating the ability of Flash Profile to describe and discriminate five sets of high-end loudspeakers. The value of using different audio stimuli to obtain a broader perceptual image is supported by a track-by-track analysis using Multiple Factor Analysis [4, 5]. The results suggest that the differences between loudspeakers lie in two main dimensions related to the timbral and spatial characteristics of the stimuli. Flash Profile seems to be a time-efficient tool for visualization and reduction of perceptual dimensions, and is useful for describing and discriminating a set of audio stimuli with medium to small audible differences.
Convention Paper 10134
P01-4 Poster Introductions 1—N/A
The purpose of Poster Introductions at the end of certain paper sessions is to give the poster authors a chance to briefly outline what is in their paper and encourage people to come to their poster session and ask questions.
• Optimized Exciter Positioning Based on Acoustic Power of a Flat Panel Loudspeaker—Benjamin Zenker; Shanavaz Sanjay Abdul Rawoof; Sebastian Merchel; Ercan Altinsoy
• Practical Problems in Building Parametric Loudspeakers with Ultrasonic Piezoelectric Emitters—Antonin Novak; Jose Miguel Cadavid Tobon
• Time Stretching of Musical Instrument Tones—Sean O’Leary
P02 - Perception
Wednesday, March 20, 10:30 — 12:30 (Meeting Room 2)
Chair:
Malachy Ronan, Limerick Institute of Technology - Limerick, Ireland
P02-1 Comparison of Recording Techniques for 3D Audio Due to Difference between Listening Positions and Microphone Arrays—Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan; Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan
Listening experiments were conducted to compare three recording techniques for 3D audio, namely Spaced Array, One-point Array, and Ambisonics. First, the evaluation attributes were extracted with reference to the Repertory Grid Technique. Then participants compared the differences between these microphone techniques, including the difference in listening position. The results show that the difference depending on the listening position is smallest for the Spaced Array. In addition, it is estimated that Ambisonics gives the impression of “hard,” One-point Array gives “rich” and “wide,” and Spaced Array gives “clear” and “real.” Furthermore, “real” was evaluated from the viewpoint of clarity and the richness of reverberation, with a negative correlation with the spectral centroid and a positive correlation with lateral and vertical reflections, respectively.
Convention Paper 10136
P02-2 [moved to Session 8]—N/A
P02-3 Investigation into the Influence of Electromechanical Characteristics of Electrodynamic Transducers on Sound Quality Perception—Semyung Son, Hyundai Mobis - Yong-in, Kyung-ki, Korea; Juyoung Jeon, Hyundai Mobis - Yong-in, Kyung-ki, Korea; Junbae Choi, Hyundai Mobis - Yongin-si, Korea; Mikhail Pakhomov, SPB Audio R&D Lab - St. Petersburg, Russia
The two most noticeable types of distortion in an audio signal path—frequency and nonlinear—are frequently analyzed by researchers and developers in terms of auditory perception. The effect of transient distortion, though insufficiently studied, is evident in subjective listening tests when comparing loudspeakers with similar frequency response and no audible nonlinear distortions. In the present study we conducted loudspeaker measurements and subjective evaluations to define the critical factors based on the loudspeaker’s electromechanical characteristics that affect transient distortion and determined relations between the factors’ values and subjective scores.
Convention Paper 10135
P02-4 Poster Introductions 2—N/A
The purpose of Poster Introductions at the end of certain paper sessions is to give the poster authors a chance to briefly outline what is in their paper and encourage people to come to their poster session and ask questions.
• Audio-Driven Multimedia Content Authentication as a Service—Nikolaos Vryzas; Anastasia Katsaounidou; Rigas Kotsakis; George Kalliris; Charalampos Dimoulas
• ANC System Using Secondary Path Modeling Based on Driver’s Position in Vehicle—Seyeong Jang; Jongin Jung; Hyungsub Lim
• Pop and Rock Music Audio Production for 22.2 Multichannel Sound: A Case Study—Will Howie
• Sound Recording Studio Renovation at the University of Victoria—Bezal Benny; Kirk McNally
P03 - Loudspeakers: Part 2
Wednesday, March 20, 14:30 — 16:30 (Meeting Room 2)
Chair:
Alexander Voishvillo, JBL/Harman Professional Solutions - Northridge, CA, USA
P03-1 Green Speaker Design (Part 1: Optimal Use of System Resources)—Wolfgang Klippel, Klippel GmbH - Dresden, Germany
Increasing the efficiency and voltage sensitivity of the electro-acoustical conversion is the key to modern audio devices generating the required sound output with minimum size, weight, cost, and energy. Traditional loudspeaker design sacrifices efficiency for sound quality. Nonlinear adaptive control can compensate for the undesired signal distortion, protect the transducer against overload, stabilize the voice coil position, and cope with time-varying properties of the suspension. The paper presents a new design concept for an active loudspeaker system that uses the new degree of freedom provided by DSP for exploiting a nonlinear motor topology, a soft suspension and modal vibration in the diaphragm, panel, and in the acoustical systems.
Convention Paper 10138
P03-2 Green Speaker Design (Part 2: Optimal Use of Transducer Resources)—Wolfgang Klippel, Klippel GmbH - Dresden, Germany
Green speaker design is a new concept for developing active loudspeaker systems that generate the required sound output with minimum size, weight, cost, and energy. This paper focuses on the optimization of the transducer by exploiting the new opportunities provided by digital signal processing. Nonlinear adaptive control can compensate for the undesired signal distortion, protect the transducer against overload, stabilize the voice coil position, and cope with time varying properties of the suspension. The transducer has to provide maximum efficiency of the electroacoustical conversion and sufficient voltage sensitivity to cope with the amplifier limitations. The potential of the new concept is illustrated on a transducer intended for automotive application.
Convention Paper 10139
P03-3 DSP Loudspeaker 3D Complex Correction—Victor Manuel Catala Iborra, DAS Audio - Fuente Del Jarro, Spain; The University of Salford - Salford, UK; Francis F. Li, University of Salford - Salford, UK
An advantageous approach to DSP equalization of loudspeakers is proposed in this paper, adopting spatial averages of complex responses acquired from 3D balloon measurements. Alignment of the off-axis impulse responses with the on-axis impulse response is accomplished using a cross-correlation technique prior to spatial averaging to attain meaningful statistics of magnitude and phase responses. This is performed over a pre-defined listening window from the complete loudspeaker response balloons (both magnitude and phase). The resulting average of the complex response within a suitably defined listening window is used to obtain, via the least mean square adaptive technique, an inverse filter that corrects the linear behavior of the loudspeaker.
Convention Paper 10140
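The alignment and averaging steps described in the abstract above can be sketched as follows. This is an illustrative Python example, not the authors' code: the function names, the circular shift used for alignment, and the equal weighting of all responses in the listening window are assumptions. The least mean square inverse-filter design is not shown.

```python
import numpy as np

def estimate_lag(reference, response):
    # Lag in samples by which `response` trails `reference`,
    # taken from the peak of their cross-correlation.
    xcorr = np.correlate(response, reference, mode="full")
    return int(np.argmax(np.abs(xcorr))) - (len(reference) - 1)

def align_to_reference(reference, response):
    # Circularly shift `response` so its main arrival lines up with `reference`.
    return np.roll(response, -estimate_lag(reference, response))

def window_average_spectrum(on_axis, off_axis_list):
    # Average the complex spectra of the on-axis response and the
    # time-aligned off-axis responses (equal weights, an assumption).
    aligned = [on_axis] + [align_to_reference(on_axis, h) for h in off_axis_list]
    spectra = np.fft.rfft(np.asarray(aligned), axis=1)
    return spectra.mean(axis=0)

# Toy data: an off-axis response that is a delayed, attenuated copy.
on_axis = np.zeros(64)
on_axis[10], on_axis[11] = 1.0, 0.5
off_axis = 0.7 * np.roll(on_axis, 5)
print(estimate_lag(on_axis, off_axis))
```

Without the alignment step, the differing arrival times would introduce phase rotations that partially cancel in the complex average, which is why the abstract places alignment before averaging.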
P03-4 Poster Introductions 3—N/A
The purpose of Poster Introductions at the end of certain paper sessions is to give the poster authors a chance to briefly outline what is in their paper and encourage people to come to their poster session and ask questions.
• Investigation into How Reference Sources and the Experience of Technical Ear Training Work in Mixing through Headphones—Soohoon Park; Toru Kamekawa; Atsushi Marui
• Proposal of Power-Saving Audio Playback Algorithm Based on Auditory Masking—Mitsunori Mizumachi; Tsukasa Nakashima; Mitsuhiro Nakagawara
• Localization of Natural Sound Sources at Various Azimuth and Elevation Angles—Maksims Mironovs; Hyunkook Lee
P04 - Spatial Audio
Wednesday, March 20, 14:30 — 16:00 (Meeting Room 3)
Chair:
Jorge Medina Victoria, Hochschule Darmstadt/CIT - Darmstadt, Germany; Cork Institute of Technology - Cork, Ireland
P04-1 Toward Six Degrees of Freedom Audio Recording and Playback Using Multiple Ambisonics Sound Fields—Eduardo Patricio, Zylia Sp. z o.o. - Poznan, Poland; Andrzej Ruminski, Zylia sp. z.o.o. - Poznan, Poland; Adam Kuklasinski, Zylia sp. z o. o. - Poznan, Poland; Lukasz Januszkiewicz, Zylia Sp. z o.o. - Poznan, Poland; Tomasz Zernicki, Zylia sp. z o.o. - Poznan, Poland
This paper describes a strategy for recording sound and enabling six-degrees-of-freedom (6DoF) playback making use of multiple simultaneous and synchronized higher-order ambisonics (HOA) recordings. For the evaluation of the proposed approach a 3D audio-visual navigable playback system was implemented. Subjective listening tests were conducted presenting three distinct scenarios, one using spatialized mono sources and the other two interpolated listening points from 1st and 3rd order multiple ambisonics sound fields. The obtained results demonstrate that HOA recordings are suitable for reproduction of 6DoF immersive audio scenes.
Convention Paper 10141
P04-2 Recording and Composing Site-Specific Spatial Music for 360 Video—Enda Bates, Trinity College Dublin - Dublin, Ireland; Sebastian Csadi, Trinity College Dublin - Dublin, Ireland; Hugh O'Dwyer, Trinity College - Dublin, Ireland; Luke Ferguson, Trinity College Dublin - Dublin, Ireland; Francis M. Boland, Trinity College Dublin - Dublin, Ireland
This paper documents the 360 video and audio recording of a newly composed work for saxophone quintet, performed in four distinct locations with differing spatial distributions of performers. The potentially site-specific nature of instrumental spatial music is first discussed via a number of historical examples. A comparative analysis of the recordings of this new work from each location is then performed, and the influence of the acoustic environment on different spatial effects such as mobile performers at varying distances, spill, and spatial trajectories is investigated. The analysis suggests that for exterior locations, localization accuracy in first order Ambisonic recordings is adequately maintained, even when performers are placed at large distances. In addition, the presence or lack of reverberation is shown to strongly influence the effectiveness of spill effects or spatial trajectories in instrumental spatial music compositions.
Convention Paper 10142
P04-3 3D Ambisonic Decoding for Stereo Loudspeakers with Headtracking—Dylan Menzies, University of Southampton - Southampton, UK; Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK
Compensated Amplitude Panning (CAP) is a spatial audio reproduction method for loudspeakers that takes the listener head orientation into account. Using CAP it is possible to produce stable images in all directions using only two loudspeakers. In its original formulation CAP is inherently an object-based method, with each image produced separately. Here a natural method is presented for dynamically decoding a first order Ambisonic encoding that is equivalent to using CAP to reproduce the constituents of the encoding. This has the advantage of channel-based methods that complex scenes can be reproduced with little cost, and existing Ambisonic encodings, such as those used in 360º video, can be reproduced directly.
Convention Paper 10143
P05 - Poster Session 1
Wednesday, March 20, 15:00 — 17:00 (The Liffey B)
P05-1 Optimized Exciter Positioning Based on Acoustic Power of a Flat Panel Loudspeaker—Benjamin Zenker, Technical University Dresden - Dresden, Germany; Hommbru GmbH - Reichenbach, Germany; Shanavaz Sanjay Abdul Rawoof, TU Dresden - Dresden, Germany; Sebastian Merchel, TU Dresden - Dresden, Germany; Ercan Altinsoy, TU Dresden - Dresden, Germany
Loudspeaker panels, such as distributed mode loudspeakers (DML), are a promising alternative approach in loudspeaker design. DML have many advantages compared to pistonic loudspeakers. However, their frequency response typically shows larger deviations. The position of the excitation is one parameter that can be used to optimize the frequency response. An electro-mechanical-acoustical model is presented that enables the optimization of the exciter location, based on the response of the radiated sound power. A simulation model is presented for different surface areas and aspect ratios of the panel. The appropriate positioning and its excitation are discussed based on a single criterion and finally compared with the state-of-the-art method.
Convention Paper 10144
P05-2 Practical Problems in Building Parametric Loudspeakers with Ultrasonic Piezoelectric Emitters—Jose Cadavid, Le Mans Université - Le Mans, France; Antonin Novak, Université du Mans - Le Mans, France
In this paper we deal with some practical issues that one can encounter when building a parametric loudspeaker with ultrasonic piezoelectric emitters. We measured several of those transducers (with resonance frequency 40 kHz) available on the market, observing a strong nonlinear behavior of many of them. We also tested a hundred piezoelectric emitters of the same series and studied the influence of the standard deviation of the resonance frequency and the sensitivity on the performance of the parametric loudspeaker. We conclude that, when constructing a parametric loudspeaker with low-cost piezoelectric emitters, the individual behavior of each of them should be considered. This makes it possible to minimize the effect of their differences and, thus, improve the quality of the sound generated.
Convention Paper 10145
P05-3 Time Stretching of Musical Instrument Tones—Sean O'Leary, Dublin Institute of Technology - Dublin, Ireland
This paper will present an approach to time stretching monophonic sounds such as musical instrument and voice samples. While most time stretching algorithms preserve the pitch of signals, typically they distort some aspects of the temporal evolution—such as onset time, vibrato rate, and random variations in amplitude and frequency. The aim of the time stretching algorithm presented in this paper is to preserve such features of the original signal in the transformed signal.
Convention Paper 10146
P05-4 Audio-Driven Multimedia Content Authentication as a Service—Nikolaos Vryzas, Aristotle University of Thessaloniki - Thessaloniki, Greece; Anastasia Katsaounidou, Aristotle University of Thessaloniki - Thessaloniki, Greece; Rigas Kotsakis, Aristotle University of Thessaloniki - Thessaloniki, Greece; Charalampos A. Dimoulas, Aristotle University of Thessaloniki - Thessaloniki, Greece; George Kalliris, Aristotle University of Thessaloniki - Thessaloniki, Greece
In the current paper we present a framework for providing supervisory tools for multimedia Content Authentication As A Service (CAAAS). A double compression method for discontinuity detection in audio signals is implemented and integrated in the provided web service. The user can upload audio/video content or provide links; a feature vector is then extracted from the audio modality of the selected content to investigate discontinuities of the signal via the proposed algorithms. Several visualizations are returned to the user, indicating possible points of forgery in the audio/visual file. Moreover, an audio tampering detection methodology based on unsupervised clustering of short-window non-vocal segments, intended to identify changes in the acoustic environment of speech signals, is presented and evaluated.
Convention Paper 10148
P05-5 ANC System Using Secondary Path Modeling Based on Driver’s Position in Vehicle—Seyeong Jang, Hyundai Mobis - Seoul, Korea; Jongin Jung, Hyundai Mobis - Seoul, Korea; Hyungsub Lim, Hyundai Mobis - Seoul, Korea
In this paper we propose a study of active noise control systems using the concept of secondary path modeling based on driver position in the vehicle. The system obtains estimates of the secondary path within the range of occupant locations and applies them to the ANC system to compensate for changes that depend on the driver's position. We used the offline secondary path modeling method and the FxLMS algorithm in the ANC system. Under the assumption that a change in position is detected, the secondary path model is applied according to the occupant position and used as the initial value of the ANC system. As a result, ANC performance is better than that of a system that does not account for the changing secondary path.
Convention Paper 10149
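To make the combination of offline secondary path modeling and FxLMS in the abstract above concrete, here is a minimal single-channel sketch in Python. It is an assumption-laden illustration, not the authors' system: the FIR path coefficients, the position names in `s_hat_by_position`, the step size, and the perfect match between the true path and its offline model are all hypothetical.

```python
import numpy as np

def fxlms(x, d, s_true, s_hat, n_taps=16, mu=0.01):
    # Single-channel filtered-x LMS. `s_true` is the physical secondary path,
    # `s_hat` its offline-identified model. Returns the error signal and weights.
    w = np.zeros(n_taps)
    x_buf = np.zeros(n_taps)        # reference history for the controller
    xs_buf = np.zeros(len(s_hat))   # reference history fed to the path model
    fx_buf = np.zeros(n_taps)       # filtered-reference history
    y_buf = np.zeros(len(s_true))   # anti-noise history through the true path
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        y_buf = np.roll(y_buf, 1)
        y_buf[0] = w @ x_buf                  # controller output (anti-noise)
        e[n] = d[n] - s_true @ y_buf          # residual at the error microphone
        xs_buf = np.roll(xs_buf, 1)
        xs_buf[0] = x[n]
        fx_buf = np.roll(fx_buf, 1)
        fx_buf[0] = s_hat @ xs_buf            # reference filtered by the model
        w += mu * e[n] * fx_buf               # LMS update with filtered reference
    return e, w

def simulate(position="driver_upright", n=20000, seed=0):
    # Hypothetical offline-identified secondary paths, one per seat position.
    s_hat_by_position = {
        "driver_upright": np.array([0.0, 0.5, 0.25]),
        "driver_reclined": np.array([0.0, 0.4, 0.3]),
    }
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                      # reference noise signal
    primary = np.array([0.0, 0.0, 0.8, 0.4, 0.2])   # hypothetical primary path
    d = np.convolve(x, primary)[:n]                 # noise at the error microphone
    s = s_hat_by_position[position]                 # assume the model is exact
    e, _ = fxlms(x, d, s_true=s, s_hat=s)
    return d, e
```

Calling `simulate()` and comparing the power of `e` with that of `d` over the last few thousand samples shows the attenuation reached once the controller has converged. Selecting `s_hat` from the lookup when a position change is detected corresponds to the abstract's use of a position-specific model as the starting point.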
P05-6 Pop and Rock Music Audio Production for 22.2 Multichannel Sound: A Case Study—Will Howie, CBC/Radio-Canada - Vancouver, Canada
Advanced sound capture and mixing techniques, optimized for high channel-count three-dimensional audio reproduction systems, are discussed for pop/rock music production. Based on previous research and experimental recordings, newly developed complex close-microphone arrays are designed to deliver realistic sonic images of musical instruments in terms of physical size and timbre. Combined with multiple ambience microphones, these direct sound arrays can be used to create highly realistic or hyper-realistic sound scenes for 22.2 multichannel sound (9+10+3) reproduction, or other 3D audio formats. A specific case study highlights the aesthetic and technical considerations for production of pop/rock music for advanced audio formats such as 22.2 multichannel sound.
Convention Paper 10150
P05-7 University of Victoria Sound Recording Studio Renovation—Bezal Benny, University of Victoria - Victoria, Canada; Kirk McNally, University of Victoria, School of Music - Victoria, BC, Canada
A recent renovation of the sound recording studio at the University of Victoria School of Music represents the first major capital project within the school since its opening in 1968. This case study presents an overview of the project from initial briefing to completion, including discussion of the design opportunities and limitations. Acoustical models used in the design process are presented and used to illustrate the challenges faced when attempting to balance control room performance for both research and teaching purposes.
Convention Paper 10151
P06 - Loudspeakers: Part 3
Thursday, March 21, 09:00 — 11:00 (Meeting Room 3)
Chair:
Bjørn Kolbrek, Celestion - Ipswich, UK
P06-1 Dynamic Driver Current Feedback Methods—Juha Backman, Huawei Technologies - Tampere, Finland; Genelec Oy - Iisalmi, Finland
Current feedback is a versatile method of modifying the behavior of a loudspeaker driver, with opportunities for linearization and for matching the driver to the enclosure design targets. Depending on the chosen approach, however, it carries a potential risk of increasing the effects of either voice coil impedance variation or driver mechanical parameter nonlinearity, so the current feedback approach needs to be designed to keep these effects well controlled for the intended application. Using a nonlinear simulation model, this work compares various forms of current feedback, including current drive, finite positive or negative amplifier resistances, and negative resistance with reactance. This final part of the work extends the examples given in the earlier papers and presents a feedback approach that would appear to offer benefits in both distortion and thermal compression control.
Convention Paper 10152
P06-2 Impact of the Coupling Factor on Lossy Voice Coil Impedance—Isao Anazawa, NY Works - Toronto, ON, Canada
The frequency dependence of the voice coil impedance due to eddy current, skin, and proximity effects (eddy losses) becomes more apparent as the frequency increases. In theory, the magnitude of the lossy impedance varies as √ω. However, in the majority of real loudspeakers, the impedance frequency dependence was empirically found to be clearly higher than this. A voice coil blocked impedance model was developed based on a structure that applies a transformer with the voice coil inductance as the primary winding. Surrounding conductive material is treated as an impedance connected to the secondary winding. The model successfully describes the blocked impedance frequency dependence and agrees to a high degree of accuracy with the actual samples. The model also showed intricate connections between the transformer coupling coefficient k and the magnitude of the frequency dependence.
Convention Paper 10153
P06-3 Compact Stereo Loudspeakers with Dipole Processing—Christof Faller, Illusonic GmbH - Uster, Zürich, Switzerland; EPFL - Lausanne, Switzerland
Compact stereo loudspeakers have become increasingly popular. One category of these uses side-firing left and right transducers featuring a certain spatial effect due to the transducers’ directivity at high frequencies. The presented technique increases the spatial effect by controlling directivity at low/medium frequencies, where the transducers have low directivity. A multi-band filter network is used to increase directivity at these frequencies by partially reproducing the stereo signal with a dipole directivity pattern. The problem of interference between left and right direct and dipole reproduced sound is addressed.
Convention Paper 10154
P06-4 Poster Introductions 4—N/A
The purpose of Poster Introductions at the end of certain paper sessions is to give the poster authors a chance to briefly outline what is in their paper and encourage people to come to their poster session and ask questions.
• Real-Time Measurement System Detecting Tonal Components and Determining Their Audibility in Environmental Noise—Magdalena Matys; Kamil Piotrowski; Tadeusz Wszolek; Bartlomiej Kukulski
• Localization Accuracy of First Order Ambisonics in the Vertical Plane across Multiple Horizontal Positions—Connor Millns; Hyunkook Lee; Maksims Mironovs
• A Case Study on the Perceptual Differences in Finite-Difference Time-Domain-Simulated Diffuser Designs—Julie Meyer; Lauri Savioja; Tapio Lokki
• Analysis of Polish Web Streaming Loudness—Piotr Cieslik; Karolina Szybinska
P07 - Assessment
Thursday, March 21, 09:00 — 10:30 (Meeting Room 2)
Chair:
Federica Bressan, Ghent University - Ghent, Belgium
P07-1 BAQ and QoE: Subjective Assessment of 3D Audio on Mobile Phones—Fesal Toosy, University of Central Punjab - Lahore, Pakistan; Muhammad Sarwar Ehsan, University of Central Punjab - Lahore, Pakistan
With the growing popularity of using cellphones and other handheld electronic devices for surfing the internet and streaming audio and video, it was only a matter of time before technologies like 3D audio would be implemented on such devices and relevant content would start being produced. It is important to know if 3D audio offers an improvement over existing stereo formats in terms of perceived basic audio quality and quality of experience. This paper presents a subjective quality assessment of 3D audio. The results show that 3D audio gives an improvement in perceived basic audio quality and quality of experience over other audio formats.
Convention Paper 10155
P07-2 Segmentation of Listeners Based on Their Preferred Headphone Sound Quality Profiles—Sean Olive, Harman International - Northridge, CA, USA; Todd Welti, Harman International Inc. - Northridge, CA, USA; Omid Khonsaripour, Harman International - Northridge, CA, USA
In previous papers we reported results from two controlled listening tests where both trained and untrained listeners gave sound quality preference ratings for in-ear (IE) and around-ear/on-ear headphones. Both groups of listeners on average preferred headphones with frequency responses that meet the Harman target curves. In this paper we re-analyze the data using cluster analysis to uncover different segments or classes of listeners based on their similarity in headphone ratings and explore common demographic (age, gender, listening experience) and acoustic factors associated with their headphone preferences.
Convention Paper 10156
P07-3 Latency Tolerance Range Measurements in Western Musical Instruments—Jorge Medina Victoria, Hochschule Darmstadt/CIT - Darmstadt, Germany; Cork Institute of Technology - Cork, Ireland
A systematic quantitative listening test was conducted in order to investigate the influence of western musical instruments on the ability to cope with latency. A questionnaire and different control mechanisms, including a predefined score and three different metronomes (aural, visual, and aural-visual), enabled the gathering of data under equal conditions for all participants while performing with self-delay. The influence of the musical instrument was demonstrated with the experimental data. Furthermore, the measurement of the latency tolerance range (LTR) enabled the comparison of different instrument groups and demonstrated the relationship between musical tempo and latency.
Convention Paper 10157
P08 - Industry Issues
Thursday, March 21, 11:15 — 13:15 (Meeting Room 2)
Chair:
Roisin Loughran, UCD - Dublin, Ireland
P08-1 Early Causes for Biodegradation of PVA/PVC Tapes for Audio Recording—Ana Paula da Costa, Instituto Superior Tecnico - Lisbon, Portugal; Teresa Rosa, Instituto Superior Tecnico - Lisbon, Portugal; Federica Bressan, Ghent University - Ghent, Belgium
The degradation of magnetic tapes is one of the main threats to the survival of our collective audio heritage. Archives around the world, big and small, are all concerned with the same challenge, that of counteracting the natural decay of plastic compounds. This study investigates the biodegradation of poly(vinyl alcohol)/poly(vinyl chloride) (PVA/PVC) blend tapes, namely audio magnetic tapes, using spectrophotometry (FTIR), scanning electron microscopy (SEM), and thermogravimetric analysis (TGA). The tapes (both sides) were studied in the light of their degradation in special conditions. The objective is to obtain more information regarding the polymer degradation of magnetic tapes for audio recording and how it affects the structural composition of the tapes. This study contributes to the long-term goal of building a structured knowledge base about diagnostic tools and recovery methods for magnetic tapes.
Convention Paper 10158
P08-2 Factors Contributing to Gender Imbalance in the Audio Industry—Shelley Ann McCarthy Buckingham, Limerick Institute of Technology - Limerick, Ireland; Malachy Ronan, Limerick Institute of Technology - Limerick, Ireland
This paper explores the factors contributing to gender imbalance in the audio industry. The two main goals were (1) to determine whether the traditional gender-related preference for “agency” or “communal” roles holds in the audio industry, and (2) to uncover existing gender-based belief systems in the audio industry. The findings suggest that women in the audio industry possess more agentic personality traits than communal. In a surprising finding, men reported more communal personality traits than agentic. Women reported that they were unsuitable for technical and managerial roles, making the need for more visible role models in these areas a critical concern.
Convention Paper 10159
P08-3 A Psychometric Evaluation of Emotional Responses to Horror Music—Duncan Williams, University of York - York, UK; Chia-Yu Wu, University of York - York, UK; Victoria Hodge, University of York - York, UK; Damian Murphy, University of York - York, UK; Peter Cowling, University of York - York, UK
This research explores and designs an effective experimental interface to evaluate people’s emotional responses to horror music. We studied methodological approaches by using traditional psychometric techniques to measure emotional responses, including self-reporting and galvanic skin response (GSR). GSR correlates with psychological arousal. It can help circumvent a problem in self-reporting where people are unwilling to report particular felt responses, or confuse perceived and felt responses. We also consider the influence of familiarity. Familiarity can induce learned emotional responses rather than listeners describing how it actually makes them feel. The research revealed different findings in self-reports and GSR data. Both measurements had an interaction between music and familiarity but show inconsistent results from the perspective of simple effects.
Convention Paper 10137
P08-4 Poster Introductions 5—N/A
The purpose of Poster Introductions at the end of certain paper sessions is to give the poster authors a chance to briefly outline what is in their paper and encourage people to come to their poster session and ask questions.
• Audio Event Identification in Sports Media Content: the Case of Basketball—Panagiotis-Marios Filippidis; Nikolaos Vryzas; Rigas Kotsakis; Iordanis Thoidis; Charalampos Dimoulas; Charalampos Bratsas
• Objective and Subjective Comparison of Several Machine Learning Techniques Applied for the Real-Time Emulation of the Guitar Amplifier Nonlinear Behavior—Thomas Schmitz; Jean-Jacques Embrechts
• A Generalized Subspace Approach for Multichannel Speech Enhancement Using Machine Learning-Based Speech Presence Probability Estimation—Yuxuan Ke; Yi Hu; Chengshi Zheng; Xiaodong Li
• Road Surface Wetness Detection Using Microphones and Convolutional Neural Networks—Giovani Pepe; Leonardo Gabrielli; Livio Ambrosini; Stefano Squartini; Luca Cattani
• Primary Study on Removing Mains Hum from Recordings by Active Tone Cancellation Algorithms—Michal Luczynski
P09 - Machine Learning: Part 1
Thursday, March 21, 14:00 — 15:30 (Meeting Room 2)
Chair:
Konstantinos Drossos, Tampere University of Technology - Tampere, Finland
P09-1 Feature Selection and its Evaluation in Binaural Ear Acoustic Authentication—Masaki Yasuhara, NIT, Nagaoka College - Nagaoka City, Niigata, Japan; Shohei Yano, Nagaoka College - Nagaoka City, Niigata, Japan; Takayuki Arakawa, NEC Corporation - Tokyo, Japan; Takafumi Koshinaka, NEC Corporation - Japan
Ear acoustic authentication is a type of biometric authentication that uses the transfer characteristics of the ear canal, which reflect its acoustic properties. In ear acoustic authentication, biological information can be acquired from both ears. However, the extant literature on improving accuracy with binaural features is inadequate. In this study we experimentally determine a feature that represents the differences between users in order to perform highly accurate authentication. Feature selection was performed by changing the combination of binaural features, and the combinations were evaluated using the ratio of between-class to within-class variance and the equal error rate (EER). We concluded that a method that concatenates the features of both ears has the highest performance.
Convention Paper 10160 (Purchase now)
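As an illustration of the two evaluation measures named in the P09-1 abstract, the following minimal Python sketch computes a ratio of between-class to within-class variance for a single scalar feature and an equal error rate from similarity scores. The function names, the scalar-feature simplification, and the score convention (a higher score means a better match) are assumptions made for this example and are not taken from the paper.

```python
def variance_ratio(features_by_user):
    """Ratio of between-class to within-class variance for one scalar feature.

    features_by_user maps a user id to a list of feature values (hypothetical layout).
    """
    all_values = [v for values in features_by_user.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    between = 0.0
    within = 0.0
    for values in features_by_user.values():
        user_mean = sum(values) / len(values)
        between += len(values) * (user_mean - grand_mean) ** 2
        within += sum((v - user_mean) ** 2 for v in values)
    return between / within


def equal_error_rate(genuine_scores, impostor_scores):
    """Return (eer, threshold) at the threshold where the false accept rate
    and the false reject rate are closest. A trial is accepted when its
    score is greater than or equal to the threshold."""
    best = None
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


if __name__ == "__main__":
    print(variance_ratio({"a": [1.0, 3.0], "b": [5.0, 7.0]}))
    print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6]))
```

A larger variance ratio suggests that a feature separates users well relative to its spread within each user, while a lower EER indicates fewer combined authentication errors at the operating point where both error types are balanced.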
P09-2 Deep Learning for Synthesis of Head-Related Transfer Functions—Sunil G. Bharitkar, HP Labs., Inc. - San Francisco, CA, USA
Ipsilateral and contralateral head-related transfer functions (HRTF) are used for creating the perception of a virtual sound source at a virtual location. Publicly available databases use a subset of a full-grid of angular directions due to time and complexity to acquire and deconvolve responses. In this paper we compare and contrast subspace-based techniques for reconstructing HRTFs at arbitrary directions for a sparse dataset (e.g., IRCAM-Listen HRTF database) using (i) hybrid-based (combined linear and nonlinear) principal component analysis (PCA)+fully-connected neural network (FCNN), and (ii) a fully nonlinear (viz., deep learning based) Autoencoder (AE) approach. The results from the AE-based approach show improvement over the hybrid approach, in both objective and subjective tests, and we validate the AE-based approach on the MIT dataset.
Convention Paper 10161 (Purchase now)
P09-3 Bayesian Optimization of Deep Learning Techniques for Synthesis of Head-Related Transfer Functions—Sunil G. Bharitkar, HP Labs., Inc. - San Francisco, CA, USA; Timothy Mauer, HP, Inc. - San Francisco, CA, USA; Teresa Wells, HP, Inc. - San Francisco, CA, USA; David Berfanger, HP, Inc. - Vancouver, WA, USA
Head-related transfer functions (HRTF) are used for creating the perception of a virtual sound source at horizontal angle φ and vertical angle θ. Publicly available databases use a subset of a full-grid of angular directions due to time and complexity to acquire and deconvolve responses. In this paper we build upon our prior research [5] by extending the technique to HRTF synthesis, using the IRCAM dataset, while reducing the computational complexity of the autoencoder (AE)+fully-connected-neural-network (FCNN) architecture by ~60% using Bayesian optimization. We also present listening test results, demonstrating the performance of the presented approach, from a pilot study that was designed for assessing the directional cues of the proposed architecture.
Convention Paper 10162 (Purchase now)
P10 - Poster Session 2
Thursday, March 21, 15:15 — 17:15 (The Liffey B)
P10-1 Investigation into How Reference Sources and the Experience of Technical Ear Training Work in Mixing through Headphones—Soohoon Park, Tokyo University of the Arts - Tokyo, Japan; Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan; Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan
This paper reports an investigation into how reference sources and the experience of technical ear training work in mixing through headphones. In the experiment, participants were asked to adjust the EQ of the stimulus source while monitoring through each of five different types of headphones. There were significant differences between the two groups based on the experience of ear training and in the EQ adjustment results of the high-frequency region depending on whether or not the reference was provided. Based on the experimental results, the mixing result has been shown to be influenced by the existence of the reference source and the experience of ear training.
Convention Paper 10163 (Purchase now)
P10-2 Proposal of Power-Saving Audio Playback Algorithm Based on Auditory Masking—Tsukasa Nakashima, Kyushu Institute of Technology - Fukuoka, Japan; Mitsuhiro Nakagawara, Panasonic Corporation - Yokohama City, Kanagawa, Japan; Mitsunori Mizumachi, Kyushu Institute of Technology - Kitakyushu, Fukuoka, Japan
Power consumption is an important issue while listening to music using portable audio devices. The authors have previously proposed a power-saving audio playback algorithm, which has adjusted filter-bank outputs according to our auditory characteristics. It succeeds in reducing power consumption but causes perceptual distortion. In this paper the power-saving audio playback algorithm is improved based on auditory masking, which attenuates audio components below the masking threshold. As a result of a listening test, it is confirmed that the proposed method is subjectively superior to the previous method with the same power consumption.
Convention Paper 10164 (Purchase now)
P10-3 Localization of Natural Sound Sources at Various Azimuth and Elevation Angles—Maksims Mironovs, University of Huddersfield - Huddersfield, West Yorkshire, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
A bird recording was compared against an airplane take-off sample at various azimuth and elevation angles in this study. A total of 33 source positions were tested, ranging from 0° to 180° azimuth and –30° to 90° elevation angles with 30° intervals. The results showed that both perceived azimuth and elevation are significantly affected by the source frequency content. Furthermore, a significant azimuth shift towards the lateral plane was observed on the off-center axis. This effect was stronger for the elevated positions on the rear hemisphere. Additionally, the pitch-height effect was present and was most dominant on the median plane and frontal hemisphere. Last, confusion errors were present for both stimuli; however, they were significant only on the median plane.
Convention Paper 10165 (Purchase now)
P10-4 Real-Time Measurement System Detecting Tonal Components and Determining Their Audibility in Environmental Noise—Magdalena Matys, AGH University of Science and Technology - Krakow, Poland; Kamil Piotrowski, AGH University of Science and Technology - Kraków, Poland; Tadeusz Wszolek, AGH University of Science and Technology - Krakow, Poland; Bartłomiej Kukulski, AGH University of Science and Technology - Kraków, Poland
The presence of tonal components in a sound signal usually increases its annoyance, but their detection and proper qualification is not always unambiguous. Although tonal noise is relatively easy to recognize, its objective identification and the measurement of its tonality are much more difficult. The identification and classification of tonal components in measured noise is described in the standard ISO/PAS 20065:2016(E). In this paper the authors introduce a system created in the LabVIEW environment. The main objective was to develop an easy-to-use system running in real time that is capable of performing automatic calculations based on ISO/PAS 20065:2016(E).
Convention Paper 10166 (Purchase now)
P10-5 Vertical Localization Accuracy of Binauralized First Order Ambisonics across Multiple Horizontal Positions—Connor Millns, University of Huddersfield - Huddersfield, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK; Maksims Mironovs, University of Huddersfield - Huddersfield, West Yorkshire, UK
This paper presents a systematic investigation into the localization accuracy of two First Order Ambisonics (FOA) decoding methods for head-static binaural reproduction: the magnitude least squared method used in the IEM Binaural Decoder and the basic decoding method for the cube virtual loudspeaker layout. The two decoding methods were compared against a directly binauralized reference for five vertical positions (–45° to 45° at 22.5° intervals) at each of eight horizontal positions (0° to 315° at 45° intervals). A train of pink noise bursts was used as the stimulus. Results indicate that little elevation was perceived across the tested azimuths for all three reproduction methods. The lack of elevation has implications for FOA microphone placement in terms of microphone height.
Convention Paper 10167 (Purchase now)
P10-6 A Case Study on the Perceptual Differences in Finite-Difference Time-Domain-Simulated Diffuser Designs—Julie Meyer, Aalto University - Espoo, Finland; Lauri Savioja, Aalto University - Espoo, Finland; Tapio Lokki, Aalto University - Espoo, Finland
This paper presents a method to determine if differences between the scattering created by geometrically-similar diffuser designs are perceivable. Although there exist standards to measure the scattering and diffusion coefficients, the perceptual evaluation of the scattering created by diffusing surfaces has previously been scarcely examined. In the context of the optimization of a diffuser design, such an audibility study can be used to assess the relevance of optimized geometries from a perceptual point of view. The proposed approach uses finite-difference time-domain (FDTD) numerical simulations to generate impulse responses (IRs) from which diffuser responses of geometrically-close designs are extracted. For each diffuser geometry, a set of three such time-domain responses, convolved with a click-like signal, white Gaussian noise, and male speech, is used as stimuli in an ABX listening test. The percentages of correct answers show that subjects are able to perceive differences for the click stimulus for all tested conditions (geometries and receiver positions), while discrimination rates are mitigated across conditions for the white Gaussian noise and are not significant for the speech signal. Results also indicate that subjects’ performance depends on the receiver location.
Convention Paper 10168 (Purchase now)
P10-7 Analysis of Polish Web Streaming Loudness—Piotr Cieslik, AGH University of Science and Technology - Krakow, Poland; Karolina Szybinska, Jagiellonian University - Krakow, Poland
The aim of the study was to identify problems related to the lack of sound normalization and legal regulation in online streaming. The method was based on the analysis of samples recorded from PC web browser players and from Android and iOS apps. Samples were taken from Polish Internet streaming stations. The data were analyzed and the results were compared. The results showed that the loudness of Polish web streaming varied widely. There is a significant discrepancy in loudness between the stations. Moreover, in some cases, there are substantial loudness differences between advertisements, music, and programs.
Convention Paper 10169 (Purchase now)
P10-8 Primary Study on Removing Mains Hum from Recordings by Active Tone Cancellation Algorithms—Michal Luczynski, Wroclaw University of Science and Technology - Wroclaw, Poland
In this paper a method of removing mains hum is presented. This method is based on active tone reduction. Active tone reduction is active noise reduction in which the secondary signal is synthesized from tonal components detected in the primary signal. The author tested his own algorithm. The test signals are mains hum alone and mains hum mixed with a guitar sound. The outcome of the work is an indication of the advantages and disadvantages of the algorithm compared with commonly used solutions.
Convention Paper 10147 (Purchase now)
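The idea described in the P10-8 abstract, synthesizing a secondary signal from a detected tonal component and subtracting it, can be sketched as follows. This is a minimal illustration and not the author's algorithm: it assumes a single stationary hum component whose frequency is already known, and an analysis window that spans a whole number of hum periods so that the sine and cosine projections are exact. The function name `cancel_tone` is hypothetical.

```python
import math


def cancel_tone(signal, freq_hz, sample_rate):
    """Estimate one sinusoid of known frequency and subtract it from the signal.

    Amplitude and phase are estimated by projecting the signal onto a cosine
    and a sine at freq_hz (assumes a whole number of periods in the window).
    """
    n = len(signal)
    w = 2.0 * math.pi * freq_hz / sample_rate
    cos_ref = [math.cos(w * k) for k in range(n)]
    sin_ref = [math.sin(w * k) for k in range(n)]
    a = 2.0 / n * sum(x * c for x, c in zip(signal, cos_ref))
    b = 2.0 / n * sum(x * s for x, s in zip(signal, sin_ref))
    # The synthesized secondary signal is a*cos + b*sin; subtract it.
    return [x - (a * c + b * s) for x, c, s in zip(signal, cos_ref, sin_ref)]


if __name__ == "__main__":
    sr, n = 1000, 1000
    hum = [0.5 * math.sin(2 * math.pi * 50 * k / sr + 0.3) for k in range(n)]
    wanted = [0.1 * math.sin(2 * math.pi * 110 * k / sr) for k in range(n)]
    cleaned = cancel_tone([h + x for h, x in zip(hum, wanted)], 50.0, sr)
    print(max(abs(c - x) for c, x in zip(cleaned, wanted)))
```

Real mains hum usually contains several harmonics and drifts slightly in frequency, so a practical system would need to track frequency and repeat the estimate per harmonic and per block; those steps are left out of this sketch.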
P11 - Machine Learning: Part 2
Thursday, March 21, 16:00 — 18:00 (Meeting Room 2)
Chair:
Bezal Benny, University of Victoria - Victoria, Canada
P11-1 Audio Inpainting of Music by Means of Neural Networks—Andrés Marafioti, Austrian Academy of Sciences - Vienna, Austria; Nicki Holighaus, Austrian Academy of Sciences - Vienna, Austria; Piotr Majdak, Austrian Academy of Sciences - Vienna, Austria; Nathanaël Perraudin, Swiss Data Science Center - Switzerland
We studied the ability of deep neural networks (DNNs) to restore missing audio content based on its context, a process usually referred to as audio inpainting. We focused on gaps in the range of tens of milliseconds. The proposed DNN structure was trained on audio signals containing music and musical instruments, separately, with 64-ms long gaps and represented by time-frequency (TF) coefficients. For music, our DNN significantly outperformed the reference method based on linear predictive coding (LPC), demonstrating a generally good usability of the proposed DNN structure for inpainting complex audio signals like music.
Convention Paper 10170 (Purchase now)
P11-2 A Literature Review of WaveNet: Theory, Application, and Optimization—Jonathan Boilard, Universite de Sherbrooke - Sherbrooke, Quebec, Canada; Philippe Gournay, Universite de Sherbrooke - Sherbrooke, QC, Canada; Roch Lefebvre, Universite de Sherbrooke - Sherbrooke, QC, Canada
WaveNet is a deep convolutional artificial neural network. It is also an autoregressive and probabilistic generative model; it is therefore by nature perfectly suited to solving various complex problems in speech processing. It already achieves state-of-the-art performance in text-to-speech synthesis. It also constitutes a radically new and remarkably efficient tool to perform voice transformation, speech enhancement, and speech compression. This paper presents a comprehensive review of the literature on WaveNet since its introduction in 2016. It identifies and discusses references related to its theoretical foundation, its application scope, and the possible optimization of its subjective quality and computational efficiency.
Convention Paper 10171 (Purchase now)
P11-3 Sparse Autoencoder Based Multiple Audio Objects Coding Method—Shuang Zhang, Peking University - Beijing, China; Xihong Wu, Peking University - Beijing, China; Tianshu Qu, Peking University - Beijing, China
The traditional multiple audio objects codec extracts the parameters of each object in the frequency domain and produces serious confusion because of the high degree of subband coincidence among objects. This paper uses the sparse domain instead of the frequency domain and reconstructs each audio object from the down-mixed signal using a binary mask based on the sparsity of each audio object. In order to overcome the high degree of subband coincidence among different audio objects, a sparse autoencoder neural network is established. On this basis, a multiple audio objects codec system is built. To evaluate the proposed system, objective and subjective evaluations are carried out, and the results show that the proposed system performs better than SAOC.
Convention Paper 10172 (Purchase now)
P11-4 Poster Introductions 6—N/A
The purpose of Poster Introductions at the end of certain paper sessions is to give the poster authors a chance to briefly outline what is in their paper and encourage people to come to their poster session and ask questions.
• jReporter: A Smart Voice-Recording Mobile Application—Lazaros Vrysis; Nikolaos Vryzas; Efstathios Sidiropoulos; Evangelia Avraam; Charalampos Dimoulas
• Two-Channel Sine Sweep Stimuli: A Case Study Evaluating 2-n Channel Upmixers—Laurence J. Hobden; Christopher Gribben
• A Rendering Method for Diffuse Sound—Akio Ando
P12 - Speech
Friday, March 22, 09:00 — 11:00 (Meeting Room 3)
Chair:
Yuxuan Ke, University of Chinese Academy of Sciences - Beijing, China
P12-1 Background Ducking to Produce Esthetically Pleasing Audio for TV with Clear Speech—Matteo Torcoli, Fraunhofer IIS - Erlangen, Germany; Alex Freke-Morin, University of Salford - Salford, UK; Jouni Paulus, Fraunhofer IIS - Erlangen, Germany; International Audio Laboratories Erlangen - Erlangen, Germany; Christian Simon, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Ben Shirley, University of Salford - Salford, Greater Manchester, UK; Salsa Sound Ltd - Salford, Greater Manchester, UK
In audio production, background ducking facilitates speech intelligibility while keeping the background track enjoyable. Technical details for recommendable ducking practices are not currently documented in the literature. Hence, we first analyze common practices found in TV documentaries. Second, a subjective test investigates the preferences of 22 normal-hearing listeners on the Loudness Difference (LD) between commentary and background during ducking. Highly personal preferences are observed, highlighting the importance of object-based personalization. A statistically significant difference is found between non-expert and expert listeners. On average, non-experts prefer LDs that are 4 LU higher than the ones preferred by experts. Based on the test results, we recommend at least 10 LU difference between commentary and music and at least 15 LU between commentary and ambience.
Convention Paper 10175 (Purchase now)
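The recommendation in the P12-1 abstract (at least 10 LU between commentary and music, at least 15 LU between commentary and ambience) can be turned into a simple gain calculation. The sketch below is a hedged illustration only: the function name, the dictionary of background types, and the assumption that the loudness of both signals has already been measured (for example in LUFS) are introduced here and are not part of the paper. It relies on the fact that changing the gain of a signal by x dB changes its loudness by x LU.

```python
# Minimum loudness differences recommended in the abstract, in LU.
MIN_LD_LU = {"music": 10.0, "ambience": 15.0}


def required_duck_gain_db(speech_loudness, background_loudness, background_type):
    """Return the gain (in dB, zero or negative) to apply to the background
    so that the speech is at least the recommended number of LU louder."""
    current_ld = speech_loudness - background_loudness
    shortfall = MIN_LD_LU[background_type] - current_ld
    return -shortfall if shortfall > 0 else 0.0


if __name__ == "__main__":
    # Commentary at -23 LUFS over music at -27 LUFS: LD is 4 LU, 6 LU short.
    print(required_duck_gain_db(-23.0, -27.0, "music"))
    # The same commentary over ambience at -30 LUFS: LD is 7 LU, 8 LU short.
    print(required_duck_gain_db(-23.0, -30.0, "ambience"))
```

Because the abstract reports highly personal preferences, these values are better treated as defaults that a listener can adjust than as fixed targets.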
P12-2 Factors Influencing the Spectral Clarity of Vocals in Music Mixes—Kirsten Hermes, University of Westminster - London, UK
Vocal clarity is one of the most important quality parameters of music mixes. The clarity of isolated sounds depends heavily on spectral factors and can therefore be manipulated with EQ. Spectrum is also an important factor in determining vocal timbral and quality parameters. An experiment where listeners rate the spectral clarity of equalized vocals within a noise backing track can provide insight into spectral predictors of vocal clarity. Overall, higher frequencies contribute to vocal clarity more positively than lower ones, but the relationship is program-item-dependent. Changes in harmonic centroid (or dimensionless spectral centroid) correlate well with changes in clarity and so does the vocal-to-backing track ratio.
Convention Paper 10174 (Purchase now)
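The P12-2 abstract refers to the harmonic centroid (or dimensionless spectral centroid) as a predictor of clarity. A minimal sketch of one common way to compute such a measure is given below, assuming the spectrum is available as a list of partial frequencies and amplitudes and that the dimensionless form is obtained by dividing the spectral centroid by the fundamental frequency. The amplitude weighting and the function names are assumptions for this example; the paper may define the measure differently.

```python
def spectral_centroid(freqs_hz, amplitudes):
    """Amplitude-weighted mean frequency of a spectrum, in Hz."""
    total = sum(amplitudes)
    if total <= 0:
        raise ValueError("spectrum must contain some energy")
    return sum(f * a for f, a in zip(freqs_hz, amplitudes)) / total


def harmonic_centroid(freqs_hz, amplitudes, f0_hz):
    """Spectral centroid expressed as a multiple of the fundamental f0_hz."""
    return spectral_centroid(freqs_hz, amplitudes) / f0_hz


if __name__ == "__main__":
    partials = [100.0, 200.0, 300.0]
    print(harmonic_centroid(partials, [1.0, 1.0, 1.0], 100.0))
    # Boosting the highest partial raises the centroid.
    print(harmonic_centroid(partials, [1.0, 1.0, 2.0], 100.0))
```

In this toy case, boosting the third partial moves the harmonic centroid from 2.0 to 2.25, which mirrors the way an EQ boost at higher frequencies shifts the measure upward.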
P12-3 High-Resolution Analysis of the Directivity Factor and Directivity Index Functions of Human Speech—Samuel Bellows, Brigham Young University - Provo UT, USA; Timothy Leishman, Brigham Young University - Provo, UT, USA
The detailed directivity of a sound source is a powerful tool with broad applications in modeling of sound radiation into various acoustic environments, ideal microphone positioning, and other areas. While the directivity of human speech has been assessed previously, the results have lacked the necessary resolution to accurately model radiation in three dimensions. In this work high-resolution measurements were taken using a multiple-capture spherical-scanning system. The frequency-dependent directivity factors and indices of speech were then calculated from the data and their spherical-harmonic expansions. Although past models have represented these measures in simple terms, high-resolution measurements demonstrate that over the audible range they have more variation than previously known, with important ramifications for three-dimensional modeling and audio.
Convention Paper 10173 (Purchase now)
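For reference, the two quantities named in the P12-3 title are conventionally defined as below. This is the standard textbook formulation, given under the assumption that p(f, θ, φ) is the far-field sound pressure at a fixed radius and (θ₀, φ₀) is the chosen reference direction; the paper may use a different reference axis or a spherical-harmonic form of the same integral.

```latex
Q(f) = \frac{4\pi \,\lvert p(f,\theta_0,\phi_0)\rvert^{2}}
            {\displaystyle\int_{0}^{2\pi}\!\!\int_{0}^{\pi}
             \lvert p(f,\theta,\phi)\rvert^{2}\,\sin\theta \, d\theta \, d\phi},
\qquad
\mathrm{DI}(f) = 10\,\log_{10} Q(f) \ \text{dB}
```

An omnidirectional source gives Q = 1 and DI = 0 dB at every frequency, so deviations from those values indicate how strongly the radiation is concentrated toward the reference direction.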
P12-4 Poster Introductions 7—N/A
The purpose of Poster Introductions at the end of certain paper sessions is to give the poster authors a chance to briefly outline what is in their paper and encourage people to come to their poster session and ask questions.
• Quantitative Analysis of Streaming Protocols for Enabling Internet of Things (IoT) Audio Hardware—Marques Hardin; Rob Toulson
• Automatic Detection of Audio Problems for Quality Control in Digital Music Distribution—Pablo Alonso Jiménez; Luis Joglar Ongay; Xavier Serra; Dmitry Bogdanov
• A High Power Switch-Mode Power Audio Amplifier—Niels Ekljær Iveresen; Jóhann Björnsson; Patrik Boström; Lars Petersen
P13 - DSP: Part 1
Friday, March 22, 09:00 — 10:30 (Meeting Room 2)
Chair:
Emmanouil Theofanis Chourdakis, Queen Mary University London - London, UK
P13-1 Applying Modern Sampling Methods to the Mastering Process for Digitally Recorded Material—Jamie Angus-Whiteoak, University of Salford - Salford, Greater Manchester, UK; JASA Consultancy - York, UK
Mastering often involves a change from a higher sampling rate to the sampling rate required by the distribution medium, such as CD. This rate change implies a resampling process, which can introduce artefacts into the output. Modern sampling theory gives useful insight into how to improve this process. This paper introduces modern sampling theory to highlight both the problems of, and possible solutions to, changing the sample rate of recorded digital audio at the highest quality possible. Possible methods for changing the rate are discussed, and means of reducing the huge computational cost are described. The paper will show that by using modern sampling methods it is possible to change sample rates with near perfect to perfect fidelity.
Convention Paper 10176 (Purchase now)
P13-2 Application of a Resonance-Based Signal Decomposition to the Analysis of Subtractive Synthesizer Filter Resonances—Joseph Timoney, Maynooth University - Maynooth, Kildare, Ireland; Kemal Avci, Izmir Democracy University - Karabaglar/Izmir, Turkey; Victor Lazzarini, Maynooth University - Maynooth, Kildare, Ireland
This paper investigates the analysis of resonant filters as they appear in subtractive synthesizers. These filters and their properties are a key component in the synthesis chain. The work investigates the application of a new wavelet-like signal decomposition for examining the components that make up the filter output. It produces a pair of “low” and “high” components. The results will examine these components spectrally with the intention that they might lead to new insights into synthesis and modeling.
Convention Paper 10177 (Purchase now)
P13-3 An Automatic Mixing System for Multitrack Spatialization for Stereo Based on Unmasking and Best Panning Practices—Ajin Tom, McGill University - Montreal, Quebec, Canada; Joshua D. Reiss, Queen Mary University of London - London, UK; Philippe Depalle, McGill University - Montreal, QC, Canada
One of the most important tasks in audio production is to place sound sources across the stereo field so as to reduce masking and immerse the listener within the space. This process of panning sources of a multitrack recording to achieve spatialization and masking minimization is a challenging optimization problem, mainly because of the complexity of auditory perception. We propose a novel panning system that makes use of a common framework for spectral decomposition, masking detection, multitrack sub-grouping, and frequency-based spreading. It creates a well spatialized mix with increased clarity while complying with best panning practices. Both real-time and offline optimization-based approaches are designed and implemented. We investigate the reduction of inter-track auditory masking using the MPEG psychoacoustic model along with various other masking and spatialization metrics extended for multitrack content. Subjective and objective tests compare the proposed work against mixes by professional sound engineers and existing auto-mix systems.
Convention Paper 10178 (Purchase now)
P14 - DSP: Part 2
Friday, March 22, 11:00 — 13:00 (Meeting Room 2)
Chair:
Thomas Schmitz, University of Liege - Liege, Belgium
P14-1 Prediction of Least Significant Bits from Upper Bits in Linearly Quantized Audio Waveform—Akira Nishimura, Tokyo University Information Sciences - Chiba-shi, Japan
Bit-depth expansion of digital audio is essential for enhancing the quality of digital content in re-mastering and up-conversion processes. The current study predicts the least significant bits for the bit-depth expansion from upper bits in linearly quantized samples of a framed audio waveform. A simulated annealing technique is applied to minimize the effective power of the residual signal derived from linear prediction of the framed waveform by localizing positions of the least significant bit (LSB) to be added in the frame. The results of computer simulation using 100 mono, 10-s music signals of various genres show that the mean correct rate of the predicted LSB is 72% using 8-bit quantized waveforms. Measurements of the objective sound quality degradation reveal that the mean objective difference grade (ODG) of the 8-bit signals improved from –2.96 to –2.56 after addition of the predicted LSB.
Convention Paper 10179 (Purchase now)
P14-2 B-Format Decoding Based on Adaptive Beamforming—Alexis Favrot, Illusonic GmbH - Uster, Switzerland; Christof Faller, Illusonic GmbH - Uster, Zürich, Switzerland; EPFL - Lausanne, Switzerland
B-Format signals can be decoded into signals with first order directivity. For stereo and multichannel decoding it would be desirable to have more channel separation than what is achievable by first order. DirAC (directional audio coding) and HARPEX (high resolution plane wave expansion) achieve higher channel separation by means of using a parametric B-Format model to estimate plane waves and diffuse sound, and adaptively rendering those. A limitation is that plane wave and diffuse models are too simple to represent complex B-Format signals. We propose a B-Format decoder, where each channel is generated by an independent adaptive B-Format beamformer. Each beam is generated independently of the other beams, circumventing the limitation when using a single B-Format signal model.
Convention Paper 10180 (Purchase now)
P14-3 Optimizing Wide-Area Sound Reproduction Using a Single Subwoofer with Dynamic Signal Decorrelation—Adam J. Hill, University of Derby - Derby, Derbyshire, UK; Jonathan Moore, University of Derby - Derby, UK
A central goal in small room sound reproduction is achieving consistent sound energy distribution across a wide listening area. This is especially difficult at low-frequencies where room-modes result in highly position-dependent listening experiences. While numerous techniques for multiple-degree-of-freedom systems exist and have proven to be highly effective, this work focuses on achieving position-independent low-frequency listening experiences with a single subwoofer. The negative effects due to room-modes and comb-filtering are mitigated by applying a time-varying decorrelation method known as dynamic diffuse signal processing. Results indicate that spatial variance in magnitude response can be significantly reduced, although there is a sharp trade-off between the algorithm’s effectiveness and the resulting perceptual coloration of the audio signal.
Convention Paper 10181 (Purchase now)
P14-4 Poster Introductions 8—N/A
The purpose of Poster Introductions at the end of certain paper sessions is to give the poster authors a chance to briefly outline what is in their paper and encourage people to come to their poster session and ask questions.
• Investigation of an Encoder-Decoder LSTM Model on the Enhancement of Speech Intelligibility in Noise for Hearing Impaired Listeners—Iordanis Thoidis; Lazaros Vrysis; Konstantinos Pastiadis; Konstantinos Markou; George Papanikolaou
• Noise Exposure of PC Video Games Players—Gino Iannace; Giuseppe Ciaburro; Amelia Trematerra
• Key Benefits and Drawbacks of Surrounding Sound when Wearing Headphones or Hearing Protection—Oscar Kårekull; Magnus Johansson
• The Assessment of Maximum and Peak Sound Levels of F3 Category Fireworks—Kamil Piotrowski; Adam Szwajcowski; Bartlomiej Kukulski
P15 - Production and Synthesis
Friday, March 22, 13:30 — 15:30 (Meeting Room 3)
Chair:
Joseph Timoney, Maynooth University - Maynooth, Kildare, Ireland
P15-1 Investigating the Behavior of a Recursive Mutual Compression System in a Two-Track Environment—Hairul Hafizi Bin Hasnan, University of York - York, UK; Jeremy J. Wells, University of York - York, UK
Dynamic range compression is a widely used audio process. Recent trends in music production include the emergence of its use as a creative tool rather than just a corrective device. The control for this process is unidirectional, using one signal to manipulate one or many tracks. This paper examines the behavior of a bidirectional mutual compression system implemented in Max/MSP. Tests were conducted using amplitude-modulated sine waves that highlight different attributes.
Convention Paper 10182 (Purchase now)
P15-2 Turning the DAW Inside Out—Charles Holbrow, Massachusetts Institute of Technology - Cambridge, MA, USA; MIT Media Lab
“Turning the DAW Inside Out” describes a speculative, internet-enabled sound recording and music production technology. The internet changed music authorship, ownership, and distribution. We expect connected digital technologies to continue to affect the processes by which music is created and consumed. Our goal is to explore an optimistic future wherein musicians, audio engineers, software developers, and music fans all benefit from an open ecosystem of connected digital services. In the process we review a range of existing tools for internet enabled audio and audio production and consider how they can grow to support a new generation of music creation technology.
Convention Paper 10183 (Purchase now)
P15-3 Real-Time Synthesis of Sound Effects Caused by the Interaction between Two Solids—Pedro Sánchez, Queen Mary University London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
We present the implementation of two sound effect synthesis engines that work in a web environment. These are physically driven models that recreate the sonic behavior of friction and impact interactions. The models are integrated into an online project aimed at providing users with browser-based sound effect synthesis tools that can be controlled in real time. This is achieved thanks to a physical modelling approach and existing web tools like the Web Audio API. A modular architecture was followed, making the code versatile and easy to reuse, which encourages the development of higher-level models based on the existing ones, as well as similar models based on the same principles. The final implementations present satisfactory performance results despite some minor issues.
Convention Paper 10184 (Purchase now)
P15-4 Reproducing Bass Guitar Performances Using Descriptor Driven Synthesis—Dave Foster, Queen Mary University London - London, UK; Swing City Music Ltd - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
Sample-based synthesis is a widely used method of synthesizing the sounds of live instrumental performances, but the control of such sampler instruments is made difficult by the number of parameters that control the output, the expertise required to set those parameters, and by the constraints of the real-time system. In this paper the principles of descriptor-driven synthesis were used to develop a pair of software tools that aid the user in the specific task of reproducing a live performance using a sampler instrument by the automatic generation of MIDI controller messages derived from analysis of the input audio. The techniques employed build on existing work and commercially available products. The output of the system is compared to manipulation by expert users. The results show that the system outperforms the human version, despite the latter taking considerably more time. Future developments of the techniques are discussed, including the application to automatic performer replication.
Convention Paper 10185 (Purchase now)
P16 - Room Acoustics
Friday, March 22, 14:00 — 16:00 (Meeting Room 2)
Chair:
Ben Kok, BEN KOK - acoustic consulting - Uden, The Netherlands
P16-1 Time-Window Differences Evaluation in a Room Acoustic Sound Field Diffuseness Estimation—Bartlomiej Chojnacki, AGH University of Science and Technology - Cracow, Poland; Mega-Acoustic - Kepno, Poland
Diffusion estimation is an unsolved problem that has been identified in many papers over the years. One of the most common problems in currently known methods is the choice of the impulse response time window used for diffuseness estimation. Different methods for diffuseness estimation are described, based on statistical parameters such as kurtosis and standard deviation, with a discussion of time-window selection and possible solutions to this problem, considering the so-called mixing time problem. The paper also discusses the misunderstanding of the term diffuseness as a room acoustic measure, as an introduction to extended diffuseness estimation with a multicriteria method.
Convention Paper 10186 (Purchase now)
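As a rough illustration of the kind of windowed statistic mentioned in the P16-1 abstract, the sketch below computes excess kurtosis over sliding windows of an impulse response. The function name, window length, and hop size are hypothetical choices for illustration, and this is not the author's method; it only shows why the result depends on the chosen time window.

```python
import math


def sliding_kurtosis(samples, window, hop):
    """Return the excess kurtosis of each analysis window.

    Values near 0 indicate a Gaussian-like amplitude distribution;
    large positive values indicate a few dominant peaks in the window.
    All parameter names are illustrative assumptions.
    """
    result = []
    for start in range(0, len(samples) - window + 1, hop):
        frame = samples[start:start + window]
        mean = sum(frame) / window
        var = sum((x - mean) ** 2 for x in frame) / window
        if var == 0.0:
            # Kurtosis is undefined for a constant frame.
            result.append(float("nan"))
            continue
        m4 = sum((x - mean) ** 4 for x in frame) / window
        result.append(m4 / var ** 2 - 3.0)
    return result
```

Changing `window` or `hop` changes the resulting curve, which is the time-window sensitivity the abstract points to.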
P16-2 Modal Decay Times in Ducts and Rooms—Roberto Magalotti, B&C Speakers S.p.A. - Bagno a Ripoli (FI), Italy; Valentina Cardinali, B&C Speakers - Bagno a Ripoli, Italy
In order to model the behavior of environments dominated by modal resonances, it is important to find the relationship between modal decay times and boundary conditions. The paper investigates this relationship in simple systems (rectangular duct and room), with a theoretical approach validated by FEM simulations. In the rectangular room, the classification of modes in axial, tangential, and oblique categories is helpful in assessing how the impedance of walls influences decay times. The results are compared to the Sabine equation for reverberation time. Some hints for exploiting the results experimentally are given.
Convention Paper 10187 (Purchase now)
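For readers unfamiliar with the Sabine equation referred to in the P16-2 abstract, a minimal sketch follows. It uses the textbook form T60 = (24 ln 10 / c) V / A, where A is the total absorption area. The function name and the example room are assumptions made here for illustration and are not taken from the paper.

```python
import math


def sabine_rt60(volume_m3, surfaces, speed_of_sound=343.0):
    """Sabine reverberation time in seconds.

    surfaces: iterable of (area_m2, absorption_coefficient) pairs.
    speed_of_sound defaults to 343 m/s (air at about 20 degrees C),
    which gives the familiar constant of roughly 0.161 s/m.
    """
    absorption_area = sum(area * alpha for area, alpha in surfaces)
    if absorption_area <= 0.0:
        raise ValueError("total absorption area must be positive")
    return 24.0 * math.log(10.0) / speed_of_sound * volume_m3 / absorption_area


# Hypothetical 5 m x 4 m x 3 m room with uniform absorption 0.1.
rt60 = sabine_rt60(60.0, [(94.0, 0.1)])
```

The Sabine estimate assumes a diffuse field, so it serves only as a reference against which the decay times of individual modes can be compared.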
P16-3 How to Prepare Typical Cinema Theater to Become Multipurpose Music Venue—Piotr Kozlowski, Wroclaw University of Science and Technology - Wroclaw, Poland; Pracownia Akustyczna Kozlowski sp. j.
Many small towns or villages can afford to build and maintain only one cultural facility. Such buildings have one hall that must be used to hold various meetings, concerts, performances, lectures, and film screenings. It is well known that individual stage productions have quite different requirements regarding the room acoustic conditions. In order to be able to correctly perform various stage activities in one room, it is necessary to use solutions that adjust the parameters of the room acoustics. The work presents methods for providing flexible acoustics in multipurpose venues. Using existing venues as examples, it shows how the room acoustics of a cinema hall can be adjusted to make it a good space for music and theater shows.
Convention Paper 10188 (Purchase now)
P16-4 A Method for Studying Interactions between Music Performance and Rooms with Real-Time Virtual Acoustics—Elliot K. Canfield-Dafilou, Center for Computer Research in Music and Acoustics (CCRMA), Stanford University - Stanford, CA, USA; Eoin F. Callery, CCRMA, Stanford University - Stanford, CA, USA; Jonathan S. Abel, Stanford University - Stanford, CA, USA; Jonathan Berger, CCRMA, Stanford University - Stanford, CA, USA
An experimental methodology for studying the interplay between music composition and performance and room acoustics is proposed, and a system for conducting such experiments is described. Separate auralization and recording subsystems present live, variable virtual acoustics in a studio recording setting, while capturing individual dry tracks from each ensemble member for later analysis. As an example application, acoustics measurements of the Chiesa di Sant’Aniceto in Rome were used to study how reverberation time modifications affect the performance of a piece for four voices and organ likely composed for the space. Performance details, including note onset times and pitch tracks, are clearly evident in the recordings. Two example performance features are presented illustrating the reverberation time impact on this musical material.
Convention Paper 10189 (Purchase now)
P17 - Poster Session 3
Friday, March 22, 15:00 — 17:00 (The Liffey B)
P17-1 Audio Event Identification in Sports Media Content: The Case of Basketball—Panagiotis-Marios Filippidis, Aristotle University of Thessaloniki - Thessaloniki, Greece; Nikolaos Vryzas, Aristotle University of Thessaloniki - Thessaloniki, Greece; Rigas Kotsakis, Aristotle University of Thessaloniki - Thessaloniki, Greece; Iordanis Thoidis, Aristotle University of Thessaloniki - Thessaloniki, Greece; Charalampos A. Dimoulas, Aristotle University of Thessaloniki - Thessaloniki, Greece; Charalampos Bratsas, Aristotle University of Thessaloniki - Thessaloniki, Greece
This paper presents an audio event recognition methodology in the case of basketball content. The proposed method leverages low-level features of the audio component of basketball videos to identify basic events of the game. Through the process of detecting and defining audio event classes, a sound event taxonomy of the sport is formed. The tasks of detecting acoustic events related to basketball games, namely referee whistles and court air horns, are investigated. For the purpose of audio event detection, a feature vector is extracted and evaluated for the training of one-class classifiers. The detected events are used to segment basketball games, while the results are combined with Speech-To-Text and text mining in order to pinpoint keywords in every segment.
Convention Paper 10190 (Purchase now)
P17-2 Objective and Subjective Comparison of Several Machine Learning Techniques Applied for the Real-Time Emulation of the Guitar Amplifier Nonlinear Behavior—Thomas Schmitz, University of Liege - Liege, Belgium; Jean-Jacques Embrechts, University of Liege - Liege, Belgium
Recent progress in the nonlinear system identification field has improved the ability to emulate nonlinear audio systems such as tube guitar amplifiers. In particular, machine learning techniques have enabled an accurate emulation of such devices. The next challenge lies in the ability to reduce the computation time of these models. The first purpose of this paper is to compare different neural-network architectures in terms of accuracy and computation time. The second purpose is to select the fastest model that keeps the same perceived accuracy, using a subjective evaluation of the model with a listening test.
Convention Paper 10191 (Purchase now)
P17-3 A Generalized Subspace Approach for Multichannel Speech Enhancement Using Machine Learning-Based Speech Presence Probability Estimation—Yuxuan Ke, University of Chinese Academy of Sciences - Beijing, China; Yi Hu, University of Wisconsin - Milwaukee - Milwaukee, WI, USA; Jian Li, University of Chinese Academy of Sciences - Beijing, China; Institute of Acoustics, Chinese Academy of Sciences - Beijing, China; Chengshi Zheng, Institute of Acoustics, Chinese Academy of Sciences - Beijing, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China
A generalized subspace-based multichannel speech enhancement method in the frequency domain is proposed, which estimates the multichannel speech presence probability using machine learning methods. An efficient and low-latency neural network (NN) model is introduced to discriminatively learn a gain mask for separating the speech and the noise components in noisy scenarios. In addition, a generalized subspace-based approach in the frequency domain is proposed, where the speech power spectral density (PSD) matrix and the noise PSD matrix are estimated over short-term and long-term averaging periods, respectively. Experimental results show that the proposed method outperforms the existing NN-based beamforming methods in terms of the perceptual evaluation of speech quality score and the segmental signal-to-noise ratio improvement.
Convention Paper 10192 (Purchase now)
P17-4 Detecting Road Surface Wetness Using Microphones and Convolutional Neural Networks—Giovanni Pepe, Università Politecnica delle Marche - Ancona, Italy; ASK Industries S.p.A. - Montecavolo di Quattro Castella (RE), Italy; Leonardo Gabrielli, Università Politecnica delle Marche - Ancona, Italy; Livio Ambrosini, Università Politecnica delle Marche - Ancona, Italy; ASK Industries S.p.A. - Montecavolo di Quattro Castella (RE), Italy; Stefano Squartini, Università Politecnica delle Marche - Ancona, Italy; Luca Cattani, ASK Industries S.p.A. - Montecavolo di Quattro Castella (RE), Italy
The automatic detection of road conditions in next-generation vehicles is an important task that is attracting increasing interest from the research community. Its main applications concern driver safety, autonomous vehicles, and in-car audio equalization. These applications rely on sensors that must be deployed following a trade-off between installation and maintenance costs and effectiveness. In this paper we tackle road surface wetness classification using microphones, comparing convolutional neural networks (CNN) with bi-directional long short-term memory networks (BLSTM), following previous motivating works. We introduce a new dataset to assess the role of different tire types and discuss the deployment of the microphones. We find a solution that is immune to water and sufficiently robust to in-cabin interference and tire type changes. Classification results with the recorded dataset reach a 95% F-score and a 97% F-score using the CNN and BLSTM methods, respectively.
Convention Paper 10193 (Purchase now)
P17-5 jReporter: A Smart Voice-Recording Mobile Application—Lazaros Vrysis, Aristotle University of Thessaloniki - Thessaloniki, Greece; Nikolaos Vryzas, Aristotle University of Thessaloniki - Thessaloniki, Greece; Efstathios Sidiropoulos, Aristotle University of Thessaloniki - Thessaloniki, Greece; Evangelia Avraam, Aristotle University of Thessaloniki - Thessaloniki, Greece; Charalampos A. Dimoulas, Aristotle University of Thessaloniki - Thessaloniki, Greece
The evaluation of sound level measuring mobile applications shows that the development of a sophisticated audio analysis framework for voice-recording purposes may be useful for journalists. In many audio recording scenarios the repetition of the procedure is not an option, and under unwanted conditions the quality of the capturing is possibly degraded. Many problems are fixed during post-production but others may make the source material useless. This work introduces a framework for monitoring voice-recording sessions, capable of detecting common mistakes and providing the user with feedback to avoid unwanted conditions, ensuring the improvement of the recording quality. The framework specifies techniques for measuring sound level, estimating reverberation time, and performing audio semantic analysis by employing audio processing and feature-based classification.
Convention Paper 10194 (Purchase now)
P17-6 Two-Channel Sine Sweep Stimuli: A Case Study Evaluating 2-n Channel Upmixers—Laurence Hobden, Meridian Audio Ltd. - Huntingdon, Cambridgeshire, UK; Christopher Gribben, Meridian Audio Ltd. - Huntingdon, Cambridgeshire, UK
This paper presents new two-channel test stimuli for the evaluation of systems where traditional monophonic test signals are not suitable. The test stimuli consist of a series of exponential sine sweep signals with varying inter-channel level difference and inter-channel phase difference. As a case study the test signals have been used to evaluate a selection of 2-n channel upmixers within a consumer audio-visual receiver. Results from using the new stimuli have been shown to provide useful insight for the improvement and development of future upmixers.
Convention Paper 10195 (Purchase now)
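To make the idea in the P17-6 abstract concrete, here is a minimal sketch of a two-channel exponential sine sweep with a fixed inter-channel level difference (ILD) and inter-channel phase difference (IPD). The sweep follows the standard exponential (logarithmic) sweep phase law. The function name, the convention that the ILD attenuates the right channel, and all parameter values are assumptions made here; the paper's actual stimuli may be constructed differently.

```python
import math


def two_channel_sweep(f_start, f_end, duration, sample_rate, ild_db, ipd_rad):
    """Return (left, right) sample lists for one exponential sine sweep.

    ild_db:  level of the right channel below the left, in dB (assumed convention).
    ipd_rad: phase offset added to the right channel, in radians.
    """
    n = int(duration * sample_rate)
    log_ratio = math.log(f_end / f_start)
    # Phase law of an exponential sweep from f_start to f_end over `duration`.
    k = 2.0 * math.pi * f_start * duration / log_ratio
    gain_right = 10.0 ** (-ild_db / 20.0)
    left, right = [], []
    for i in range(n):
        t = i / sample_rate
        phase = k * (math.exp(t * log_ratio / duration) - 1.0)
        left.append(math.sin(phase))
        right.append(gain_right * math.sin(phase + ipd_rad))
    return left, right


# A series of stimuli could step through several hypothetical ILD/IPD pairs.
stimuli = [two_channel_sweep(20.0, 20000.0, 0.1, 48000, ild, ipd)
           for ild in (0.0, 6.0) for ipd in (0.0, math.pi)]
```

Stepping through ILD and IPD values in this way probes how an upmixer distributes signals that a single monophonic sweep cannot represent.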
P17-7 A Rendering Method for Diffuse Sound—Akio Ando, University of Toyama - Toyama, Japan
This paper proposes a new audio rendering method that tries to preserve the sound inputs to both ears instead of the sound direction. It uses a conversion matrix that converts the original sound signal into a signal with a different number of channels. The least squares method optimizes the matrix so as to minimize the difference between the ear input signals produced by the original signal and those produced by the rendered signal. To calculate the error function, the method uses the Head Related Impulse Responses. Two rendering experiments were conducted to evaluate the method. In the first experiment, 22-channel signals of 22.2 multichannel sound without the two LFE channels were rendered into three-dimensional 8-channel signals by the conventional directional-based method and the new method. The result showed that the new method could preserve the diffuseness of sound better than the conventional method. In the second experiment, the 22-channel signals were converted into 2-channel signals by the conventional downmix method and the new method. The evaluation result based on the cross correlation coefficient showed that there was little difference between the downmix method and the new method. However, the informal listening test showed that the new method might preserve the diffuseness of sound better than the downmix method.
Convention Paper 10196 (Purchase now)
P18 - MIR
Friday, March 22, 16:30 — 18:00 (Meeting Room 2)
Chair:
Konstantinos Tsioutas, Athens University of Economics and Business - Athens, Greece
P18-1 Evaluating White Noise Degradation on Sonic Quick Response Code (SQRC) Decode Efficacy—Mark Sheppard, Anglia Ruskin University - Cambridge, Cambridgeshire, UK; Rob Toulson, University of Westminster - London, UK
With the advent of high-resolution recording and playback systems, a proportion of the ultrasonic frequency spectrum can potentially be utilized as a carrier for imperceptible data, which can be used to trigger events or to hold metadata in the form of, for example, an ISRC (International Standard Recording Code), a website address, or audio track liner notes. The Sonic Quick Response Code (SQRC) algorithm was previously proposed as a method for encoding inaudible acoustic metadata within a 96 kHz audio file in the 30–35 kHz range. This paper demonstrates the effectiveness of the SQRC decode algorithm when acoustically transmitted over distance while evaluating the degradation effect of adding ultrasonic banded white noise to the pre- and post-transmission SQRC signal.
Convention Paper 10197 (Purchase now)
P18-2 Tagging and Retrieval of Room Impulse Responses Using Semantic Word Vectors and Perceptual Measures of Reverberation—Emmanouil Theofanis Chourdakis, Queen Mary University London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
This paper studies tagging and retrieval of room impulse responses from a labelled library. A similarity-based method is introduced that relies on perceptually relevant characteristics of reverberation. This method is developed using a publicly available dataset of algorithmic reverberation settings. Semantic word vectors are introduced to exploit semantic correlation among tags and allow for unseen words to be used for retrieval. Average precision is reported on a subset of the dataset as well as tagging of recorded room impulse responses. The developed approach manages to assign downloaded room impulse responses to tags that match their short descriptions. Furthermore, introducing semantic word vectors allows it to perform well even when large portions of the training data have been replaced by synonyms.
Convention Paper 10198 (Purchase now)
P18-3 A Custom Integrated Circuit Based Audio-to-CV and Audio-to-MIDI Solution—Brian Kaczynski, Second Sound, LLC - Miami, FL, USA
A new synthesizer technology is demonstrated that tracks the fundamental frequency of virtually any acoustic or electric instrument played monotonically. This technology relies on a mixed analog-digital application-specific integrated circuit (ASIC), which contains a very fast frequency-locked loop (FLL) that tracks with the minimum physically achievable latency of one audio cycle. The ASIC also contains a novel fundamental frequency detection circuit composed of two switched-capacitor peak detectors with decay time proportional to the fundamental period of the audio signal and a novel switched-capacitor, zero-ripple envelope follower. This frequency-tracking technology is fast enough to implement an audio-to-CV or even, with the addition of a simple microcontroller, an audio-to-MIDI solution in real time with very high accuracy and negligible latency.
Convention Paper 10199 (Purchase now)
P19 - Audio and Games
Saturday, March 23, 09:00 — 11:00 (Meeting Room 3)
Chair:
Dylan Menzies, University of Southampton - Southampton, UK
P19-1 A Framework for Understanding and Defining Quality of Musicians’ Experience in Network Music Performance Environments—Konstantinos Tsioutas, Athens University of Economics and Business - Athens, Greece; Ioannis Doumanis, University of Central Lancashire - Preston, UK; George Xylomenos, Athens University of Economics and Business - Athens, Greece
While there is considerable work on network and system level metrics related to Network Music Performance (NMP), assessing the Quality of Musician’s Experience (QoME) in NMP sessions must also take into account the emotional and psychological aspects of the participants. We propose a research framework that integrates both subjective and objective aspects of musicians’ experience by explicitly considering the psychological state and profile of each musician, the environment acoustic variables, and the performance of the network as the key dimensions that impact QoME. We will use the proposed framework to drive empirical studies designed to explore the QoME of musicians performing musical pieces over the Internet; this paper is a first step in this direction.
Convention Paper 10200 (Purchase now)
P19-2 Aretousa: A Competitive Audio Streaming Software for Network Music Performance—Konstantinos Tsioutas, Athens University of Economics and Business - Athens, Greece; George Xylomenos, Athens University of Economics and Business - Athens, Greece; Ioannis Doumanis, University of Central Lancashire - Preston, UK
Many existing open source systems provide support for Network Music Performance (NMP), with each one catering to a specific system and usage scenario. As our research in evaluating the Quality of Experience (QoE) of NMP systems as perceived by musicians involves widely different scenarios and requires extensive instrumentation of the platform, we built a new NMP system, Aretousa. Our system offers a large number of configuration and monitoring options, without sacrificing latency, the most critical factor for NMP. To show that Aretousa provides flexibility while being competitive with the state of the art in terms of latency, we present measurements comparing it against JackTrip in multiple setups over a high speed research network.
Convention Paper 10201 (Purchase now)
P19-3 Measuring the Impact of Level of Detail for Environmental Soundscapes in Digital Games—Igor Dall'Avanzi, Goldsmiths College, University of London - London, UK; Matthew Yee-King, Goldsmiths College, University of London - London, UK
The design of sonic environments in digital games poses an unanswered question of believability. How much time and resources should be used to replicate an element that is stochastic and unpredictable in nature, in order to convey a satisfactory experience? We analyze the effect on players' immersion caused by the detail of digital environmental sounds (soundscapes). Two groups of participants are asked to play two different versions of the same game. One processes audio elements at run time for higher levels of detail, while the other one uses looped files. Players' immersion is measured afterwards using the Immersive Experience Questionnaire [1] and qualitative questions. Results showed no considerable difference between the two groups, and we discuss some possible explanations for this.
Convention Paper 10202 (Purchase now)
P19-4 Augmented Audio-Only Games: A New Generation of Immersive Acoustic Environments through Advanced Mixing—Nikos Moustakas, Ionian University - Corfu, Greece; Andreas Floros, Ionian University - Corfu, Greece; Emmanouel Rovithis, Ionian University - Corfu, Greece; Konstantinos Vogklis, Ionian University - Corfu, Greece
Audio-only games represent an alternative gaming genre that continuously evolves following the technological trends that boost the video-games market. Since augmented reality is now a widespread approach for producing new kinds of immersive applications, including games, it is expected that audio-only games will be influenced by this approach. This work represents the beginning of an attempt to investigate the process of delivering augmented audio-only games, focusing on specific technical factors that can improve the user interaction and the overall game-play experience. In particular, it focuses on a new augmented reality audio mixing process that is optimized for variable acoustic environments, allowing the development of attractive new audio-only game titles.
Convention Paper 10203 (Purchase now)
P20 - Poster Session 4
Saturday, March 23, 10:00 — 12:00 (The Liffey B)
P20-1 Quantitative Analysis of Streaming Protocols for Enabling Internet of Things (IoT) Audio Hardware—Marques Hardin, Anglia Ruskin University - Cambridge, UK; Rob Toulson, University of Westminster - London, UK
Given that traditional music production techniques often incorporate analog audio hardware, the Internet of Things (IoT) presents a unique opportunity to maintain past production workflows. For example, it is possible to enable remote digital connectivity to rare, expensive, and bespoke audio systems, as well as unique spaces for use as echo chambers. In the presented research quantitative testing is conducted to verify the performance of audio streaming platforms. Results show that using a high-speed internet connection, it is possible to stream lossless audio with low distortion, no dropouts and around 30 ms round-trip latency. Therefore, with future integration of audio streaming and IoT control protocols, a new paradigm for remote analog hardware processing in music production could be enabled.
Convention Paper 10204 (Purchase now)
P20-2 Automatic Detection of Audio Problems for Quality Control in Digital Music Distribution—Pablo Alonso-Jiménez, Universitat Pompeu Fabra - Barcelona, Spain; Luis Joglar-Ongay, SonoSuite - Barcelona, Spain; Xavier Serra, Universitat Pompeu Fabra - Barcelona, Spain; Dmitry Bogdanov, Universitat Pompeu Fabra - Barcelona, Spain
Providing content that meets industry quality standards is crucial for digital music distribution companies. For this reason excellent quality control (QC) support is paramount to ensure that the music does not contain audio defects. Manual QC is a very effective and widely used method, but it is very time- and resource-consuming. Therefore, automation is needed in order to develop an efficient and scalable QC service. In this paper we outline the main problems to solve together with the implementation of digital signal processing algorithms and perceptual heuristics to improve the QC workflow. The algorithms are validated on a large music collection of more than 300,000 tracks.
Convention Paper 10205 (Purchase now)
P20-3 Investigation of an Encoder-Decoder LSTM Model on the Enhancement of Speech Intelligibility in Noise for Hearing Impaired Listeners—Iordanis Thoidis, Aristotle University of Thessaloniki - Thessaloniki, Greece; Lazaros Vrysis, Aristotle University of Thessaloniki - Thessaloniki, Greece; Konstantinos Pastiadis, Aristotle University of Thessaloniki - Thessaloniki, Greece; Konstantinos Markou, Aristotle University of Thessaloniki - Thessaloniki, Greece; George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
Hearing impaired (HI) listeners often struggle to follow conversations when exposed to a complex acoustic environment. This is partly due to the reduced ability to recover the target speech Temporal Envelope (ENV) cues from Temporal Fine Structure (TFS). This study investigates the enhancement of speech intelligibility in HI listeners by processing the ENV of speech signals corrupted by real-world environmental noise. An Encoder-Decoder Long Short Term Memory (LSTM) model is exploited after perceptually motivated processing stages to compensate for the important ENV characteristics of comprehensible speech for hearing impairment. The computational model is evaluated using the Short-Time Objective Intelligibility (STOI) measure for speech intelligibility. Results indicate a 6% improvement in the mean STOI measure across different SNR values.
Convention Paper 10206 (Purchase now)
P20-4 Noise Exposure of PC Video Games Players—Gino Iannace, Università della Campania "Luigi Vanvitelli" - Aversa, Italy; Giuseppe Ciaburro, Università della Campania "Luigi Vanvitelli" - Aversa, Italy; Amelia Trematerra, Università della Campania "Luigi Vanvitelli" - Aversa, Italy
Video games are a leisure activity that is being practiced by more and more people. The average age of users is also gradually increasing, as gaming is a pleasant activity at any age. The literature has widely raised the question of whether such widespread use could have negative consequences for the health of its users. This article describes noise exposure measurement activities for video game users. The damage caused by noise depends on both the acoustic power and the exposure time. For this reason, different noise exposure scenarios produced by video games have been simulated. The results of the study show that the daily level of noise exposure is close to the limits imposed by legislation, despite the hours of rest. The measurements were performed in an environment with a low background noise (46.0 dBA).
Convention Paper 10207 (Purchase now)
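The dependence on both level and exposure time noted in the P20-4 abstract is commonly expressed through the daily noise exposure level normalized to an 8-hour day, L_EX,8h = L_Aeq,Te + 10 log10(Te / 8 h). The sketch below applies that standard formula. The function name and the example figures are hypothetical and are not results from the paper.

```python
import math


def daily_exposure_level(laeq_db, exposure_hours, reference_hours=8.0):
    """Daily noise exposure level in dB(A), normalized to reference_hours.

    laeq_db:        A-weighted equivalent continuous level during exposure.
    exposure_hours: effective exposure duration within the day.
    """
    if exposure_hours <= 0.0:
        raise ValueError("exposure_hours must be positive")
    return laeq_db + 10.0 * math.log10(exposure_hours / reference_hours)


# Hypothetical scenario: 85 dB(A) for 4 hours is about 82 dB(A) over 8 hours.
example = daily_exposure_level(85.0, 4.0)
```

Halving the exposure time lowers the daily level by about 3 dB, which is why long sessions at moderate levels can still approach legislative limits.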
P20-5 Key Benefits and Drawbacks of Surrounding Sound when Wearing Headphones or Hearing Protection—Oscar Kårekull, 3M Peltor - Värnamo, Sweden; Tech Lic, 3M Peltor Communications - Värnamo, Sweden; Magnus Johansson, 3M Peltor - Värnamo, Sweden
Reproduction of sound in headphones or hearing protectors is essentially a trade-off between sound from the signal source, e.g., a cellphone, and environmental sounds. Acceptable signal to noise ratios and the useful noise level range for communication can be determined by already available measurement methods. The attenuation of surrounding noise, e.g., measured according to ISO 4869-1, can determine the signal to noise ratio but also the detection threshold of surrounding sound. Speech intelligibility tests can determine the level of surrounding noise where communication with nearby people is possible. In between these limits, a product can be optimized for different situations. Examples of measured detection levels are presented, and the performance between these levels and the speech intelligibility limit is discussed.
Convention Paper 10208 (Purchase now)
P20-6 The Assessment of Maximum and Peak Sound Levels of F3 Category Fireworks—Kamil Piotrowski, AGH University of Science and Technology - Kraków, Poland; Adam Szwajcowski, AGH University of Science and Technology - Kraków, Poland; Bartłomiej Kukulski, AGH University of Science and Technology - Kraków, Poland
Fireworks are widely discussed with respect to the harm they can cause to people. Apart from their spectacular effects, there is a serious danger of hearing loss caused by bursts that are too close, whether attending as an operator or as a spectator. The unpleasant truth is that the documents regarding fireworks are usually not specific enough in terms of limiting excessive noise exposure, and the sound pressure levels generated by some fireworks can simply be too high. The paper presents the results of an examination of the impulsive noise of F3 class firecrackers. The authors measured two types of F3 explosive materials and analyzed the obtained maximum, peak, and exposure values in accordance with PN-EN 15947-4:2016-02. Pyrotechnic articles. Fireworks, Categories F1, F2 and F3. Test methods.
Convention Paper 10209 (Purchase now)
P20-7 A High Power Switch-Mode Power Audio Amplifier—Niels Elkjær Iversen, Technical University of Denmark - Kogens Lyngby, Denmark; Jóhann Björnsson, ICEpower A/S - Søborg, Denmark; Patrik Boström, ICEpower A/S - Helsingborg, Sweden; Lars Petersen, ICEpower A/S - Søborg, Denmark
Switch-mode power audio amplifiers, also known as class-D, have become the conventional choice for high power applications. This paper presents the considerations for designing a high power audio amplifier power stage. This includes an overview of loss mechanisms, including reverse recovery losses that are increasingly important at higher output powers. A 4 kW prototype amplifier is implemented. Absolute maximum ratings show up to ±190 V output voltage swing and 6.5 kW for sine wave burst. THD+N levels go down to 0.003% for 100 Hz and are generally below 0.1% up to clipping at 4 kW in a 4 Ω load.
Convention Paper 10215 (Purchase now)
P21 - Hearing/Perception
Saturday, March 23, 10:30 — 12:00 (Meeting Room 2)
Chair:
Matteo Torcoli, Fraunhofer IIS - Erlangen, Germany
P21-1 Measurement of Bone-Air Differential Transfer Function Based on Hearing Threshold—Huifang Tang, Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences - Beijing, China; Jie Wang, Guangzhou University - Guangzhou, China; Jinqiu Sang, Chinese Academy of Sciences - Beijing, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China
In this paper the bone-air differential transfer function (BADTF) is defined as the difference between the bone conduction (BC) and the air conduction (AC) transfer functions. This can be equivalent to the bone to air hearing threshold gap when the system is linear. The single ear BADTF was measured at 29 frequencies ranging from 0.5 to 8 kHz with 5 normal hearing subjects. Repeatability of this method was also verified. The results show that there are obvious individual differences but all the curves have similar envelopes. With the BADTF, the individual equalization can significantly improve the performance of the BC reproduction at low frequencies and make the BC sound perceived closer to the target timbre.
Convention Paper 10210 (Purchase now)
P21-2 Calibration of Digital Sound Projectors with Scene Uncertainty—Luke Ferguson, Trinity College Dublin - Dublin, Ireland; Enda Bates, Trinity College Dublin - Dublin, Ireland; Hugh O'Dwyer, Trinity College - Dublin, Ireland; Sebastian Csadi, Trinity College Dublin - Dublin, Ireland; Francis M. Boland, Trinity College Dublin - Dublin, Ireland
This paper addresses the calibration problem for digital sound projection in the context of uncertain scene geometry. The image method is extended to handle uncertainties in the description of reflectors, specifically the distance parameter, relative to the source and/or receiver. Under the assumption that the source is a linear array of loudspeakers parallel with the back wall of a rectangular room, a novel extended image method is applied to compute probability distributions for the beamforming parameters for digital sound projection. The calibration is enriched with information available from probability distributions of the planar reflectors in the scene. Computer simulations are conducted to validate the calibration accuracy and to evaluate the performance of the system. The expected deviation from optimum performance is quantified by analyzing the expected soundfield at the receiver position. This paper also highlights the sensitivity of digital sound projectors to measurement errors under certain constrained conditions.
Convention Paper 10211 (Purchase now)
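The idea of propagating reflector-distance uncertainty through the image method can be sketched as follows. This is a plain Monte Carlo illustration, not the extended image method of the paper. It assumes a 2-D scene, a single wall at y = wall_y, a normally distributed wall distance, and a speed of sound of 343 m/s. All names are hypothetical.

```python
import math
import random
import statistics

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def reflected_path_length(source, receiver, wall_y):
    """First-order image method: mirror the source across the wall y = wall_y."""
    image = (source[0], 2.0 * wall_y - source[1])
    return math.dist(image, receiver)


def delay_distribution(source, receiver, wall_mean, wall_sigma,
                       n_samples=10000, seed=0):
    """Sample the wall distance and return (mean, std) of the reflection delay in s."""
    rng = random.Random(seed)
    delays = []
    for _ in range(n_samples):
        wall_y = rng.gauss(wall_mean, wall_sigma)
        delays.append(reflected_path_length(source, receiver, wall_y) / SPEED_OF_SOUND)
    return statistics.fmean(delays), statistics.pstdev(delays)


# Illustrative geometry: source at the origin, receiver 3 m to the side,
# wall nominally 2 m away with 5 cm standard deviation.
mean_delay, std_delay = delay_distribution((0.0, 0.0), (3.0, 0.0), 2.0, 0.05)
print(mean_delay, std_delay)
```

The spread of the sampled delay gives a rough sense of how a distance error in one reflector translates into an error in a steering delay.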
P21-3 Perception of Auditory Events in Scenarios with Projected and Direct Sound from Various Directions—Tom Wühle, TU Dresden, Lehrstuhl für Akustik und Haptik - Dresden, Germany; Sebastian Merchel, TU Dresden - Dresden, Germany; Ercan Altinsoy, TU Dresden - Dresden, Germany
Sound projecting audio systems realize the reproduction of sound from different directions via reflection paths using highly focusing sound sources. However, the limited focusing capabilities of real sources, e.g., loudspeaker arrays, cause the perception of the listener in practice to be influenced by direct sound in addition to the projected sound. This study dealt with the separation of auditory events caused by increasing perceptual dominance of the leading direct sound in sound projection. For that, the perception of auditory events for different directions of direct and projected sound and increasing direct sound level was evaluated. The separation varied with the different directions of direct and projected sound. The effect of sound projection, however, was not influenced.
Convention Paper 10212 (Purchase now)
P22 - Physical Systems and Circuits
Saturday, March 23, 14:00 — 15:00 (Meeting Room 2)
Chair:
John Robert Emmett, Nostairway Creative - Twickenham Studios, UK
P22-1 Statistical Analysis of Audio Triode Tube Properties Based on an Advanced Physical Device Model—Toshihiko Hamasaki, Hiroshima Institute of Technology - Hiroshima, Japan; Reo Sasaki, Hiroshima Institute of Technology - Hiroshima, Japan; Masaki Inui, Hiroshima Institute of Technology - Hiroshima, Japan
The timbre of a tube amplifier has been considered to depend not only on the tube type but also on the tube manufacturer and on individual differences between tubes. However, the quantitative differences in tube properties between manufacturers have not yet been clarified. In this study the manufacturer differences of the 12AX7/ECC83 triode tube are analyzed statistically using an advanced physical device model. The model parameter values are extracted from measured family curves of 60 devices in total from 5 major tube manufacturers for a guitar amplifier. The characteristics of each manufacturer's tubes are clarified by the average and dispersion of the respective parameter value set. Furthermore, the root cause of the significant manufacturing process instability is identified for the first time, based on the correlations among all combinations of parameters. The tube properties of the 5 manufacturers are divided into 3 groups by clustering analysis using cosine similarity on the vector of parameters.
Convention Paper 10213 (Purchase now)
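The cosine-similarity grouping mentioned in the abstract above can be sketched in a few lines. The abstract does not name the clustering algorithm, so the greedy threshold grouping below is a stand-in chosen for brevity. The parameter vectors, labels, and threshold are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two parameter vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def group_by_similarity(vectors, threshold):
    """Greedy grouping: a vector joins the first group whose first member
    has cosine similarity >= threshold, otherwise it starts a new group."""
    groups = []
    for name, vec in vectors.items():
        for group in groups:
            if cosine_similarity(vectors[group[0]], vec) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical averaged parameter vectors, one per manufacturer.
params = {"A": [1.0, 0.0], "B": [2.0, 0.1], "C": [0.0, 1.0]}
print(group_by_similarity(params, threshold=0.9))
```

Cosine similarity compares the direction of the parameter vectors and ignores their overall scale, so in practice the parameters would likely need normalization before such a comparison is meaningful.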
P22-2 A Novel Digital Radio-Frequency Capacitor Microphone with Gain Ranging—Lars Urbansky, Helmut-Schmidt University - Hamburg, Germany; Udo Zölzer, Helmut-Schmidt-University Hamburg - Hamburg, Germany
Most capacitor microphones use an audio-frequency (AF) implementation. In an AF circuit, a capacitor is charged with a constant bias voltage, leading to a high-impedance circuit. In contrast, with a radio-frequency (RF) approach the capacitor is operated in a higher frequency band, which reduces the circuit’s impedance. However, state-of-the-art RF microphones are entirely analog. Thus, a novel digital RF condenser microphone system is proposed. Furthermore, it is extended by a corresponding gain ranging approach. The expected advantages are a further improved demodulation linearity due to digital demodulation and a circumvention of analog disadvantages due to the smaller required analog circuit. Additionally, because of the analog bandpass signal, it is expected to entirely bypass the electrical low-frequency 1/f noise.
Convention Paper 10214 (Purchase now)