Saturday, May 20, 09:00 — 12:30 (Salon 2+3 Rome)
Aki Mäkivirta (Chair)
P02-01 Active Vibration Control of Breakup Modes in Loudspeaker Diaphragms
William Cardenas (Presenting Author)
The breakup modes of the membrane are one of the factors that degrade loudspeaker sound quality, since they cause complex directivity patterns as well as peaks and dips in the frequency response. This paper presents an active vibration control simulation applied to a 2D Finite Element model of a loudspeaker loaded by a fluid domain to demonstrate an innovative alternative for reducing the amplitude of the breakup modes of loudspeaker diaphragms, substantially improving the mechanical and acoustical performance. The benefits of the controlled system are demonstrated in terms of the acceleration response of the cone and the acoustic directivity.
Convention Paper 9693
P02-02 An Acoustic Radiator with Integrated Cavity and Active Control of Surface Vibration
Arthur Berkhoff (Presenting Author), Farnaz Tajdari (Author)
This paper presents a method to realize an acoustic source for low frequencies with relatively small thickness. A honeycomb plate structure that is open on one side combines the radiating surface and the major part of the air cavity. The vibration of the plate is controlled with a decentralized feedback controller. The fundamental resonance is controlled, as well as higher-order bending modes, while avoiding possible instabilities due to the fluid-structure interaction. The smooth and well defined frequency response enables robust feedforward control for further response equalization. The influence of different actuation principles on the overall system efficiency is compared.
Convention Paper 9694
P02-03 The Acoustic Design of Minimum Diffraction Coaxial Loudspeakers with Integrated Waveguides
Aki Mäkivirta (Presenting Author), Thomas Lund (Author), Ilpo Martikainen (Author), Siamäk Naghian (Author), Jussi Väisänen (Author)
Complementary to precision microphones, creating an ideal point source monitoring speaker has long been considered the holy grail of loudspeaker design. Coaxial transducers unfortunately typically come with several design compromises, such as adding intermodulation distortion, giving rise to various sources of diffraction, and resulting in somewhat restricted maximum output performance or frequency response. In this paper we review the history of coaxial transducer design, considerations for an ideal point source loudspeaker, discuss the performance of a minimum diffraction coaxial loudspeaker and describe novel designs where the bottlenecks of conventional coaxial transducers have been eliminated. In these, the coaxial element also forms an integral part of a compact, continuous waveguide, thereby further facilitating smooth off-axis dispersion.
Convention Paper 9695
P02-04 Root Cause Analysis of Rocking Modes in the Nonlinear Domain
Andreas Schwock (Presenting Author), William Cardenas (Author), Mattia Cobianchi (Author), Wolfgang Klippel (Author)
Rocking modes are caused by small imbalances in the distribution of stiffness, mass, and force factor. A measurement technique to determine these root causes, using laser vibrometry, parameter identification, and root cause analysis, has been presented in a previous paper. This paper focuses on the application of this technique to examine rocking modes in the nonlinear domain. An incremental DC offset is applied to the loudspeaker to examine changes in the root causes of the rocking throughout the working range of the loudspeaker.
Convention Paper 9696
P02-05 Nonlinearity of Ported Loudspeaker Enclosures
Juha Backman (Presenting Author)
This paper presents the results of a computational fluid dynamics analysis of an unlined ported enclosure, focusing on the behavior around the tuning frequency. The work presents results for the amplitude dependence of the behavior and the time development of the sound field. The results indicate that vortex formation around the port ends has a significant effect already at relatively low flow velocities, and that the nonlinearity of the port is clearly visible in the acoustical load seen by the driver at the resonance frequency.
Convention Paper 9697
P02-06 Efficiency Investigation of Subwoofer Driven Around Resonance Frequency
Tobias Thydal (Presenting Author), Niels Elkjær Iversen (Author), Arnold Knott (Author)
The need for efficient portable speaker systems has increased tremendously over the past 10 years. Batteries, amplifiers, and filtering have all seen great improvements in efficiency, leaving the speaker units as the most inefficient part of the system, mainly due to the large amounts of current drawn that end up being dissipated as heat in the voice coil. This paper looks at how a speaker system can be designed to take advantage of the resonance of a speaker unit, since that is where the unit is most efficient and draws the least current. A subwoofer speaker system is designed with a focus on driving the speaker units only near their resonance frequency. The tests found that with modern DSP it was rather simple to design a speaker system that operates in a very narrow frequency band around the speaker units’ resonance frequencies, which in turn ensured a very small current draw. The greatest drawback of this method is the increase in components needed, which drives up cost and complexity.
Convention Paper 9698
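The claim in P02-06 that a speaker unit draws the least current at resonance can be illustrated with the standard lumped-parameter impedance model of a moving-coil driver. The following is only a minimal sketch, not the authors' method: voice-coil inductance is ignored, and the driver values in `DRIVER` are hypothetical numbers chosen for illustration.

```python
def voice_coil_impedance(f, re_ohm, res_ohm, f_res, q_ms):
    """Complex input impedance of a driver near resonance.

    Simplified model: DC resistance plus the motional impedance peak.
    Voice-coil inductance is ignored (assumption).
    """
    detune = f / f_res - f_res / f
    return re_ohm + res_ohm / complex(1.0, q_ms * detune)


def rms_current(v_rms, f, **driver):
    """RMS current drawn from a voltage source at frequency f."""
    return v_rms / abs(voice_coil_impedance(f, **driver))


# Hypothetical subwoofer driver parameters (illustrative only).
DRIVER = dict(re_ohm=3.5, res_ohm=40.0, f_res=35.0, q_ms=5.0)

if __name__ == "__main__":
    for freq in (20.0, 35.0, 70.0, 140.0):
        print(freq, "Hz:", round(rms_current(10.0, freq, **DRIVER), 3), "A")
```

In this model the impedance magnitude peaks at `f_res`, so for a fixed drive voltage the current, and with it the resistive heating of the voice coil, is smallest there.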
P02-07 An Analytical Approach for Optimizing the Curving of Line Source Arrays
Florian Straube (Presenting Author), David Albenés Bonillo (Author), Frank Schultz (Author), Stefan Weinzierl (Author)
Line source arrays (LSAs) are used for large-scale sound reinforcement aiming at the synthesis of homogeneous sound fields for the whole audio bandwidth. The deployed loudspeaker cabinets are rigged with different tilt angles and/or electronically controlled in order to provide the intended coverage of the audience zones and to avoid radiation towards the ceiling, reflective walls, or residential areas. This contribution introduces the analytical polygonal audience line curving (PALC) approach for finding appropriate LSA cabinet tilt angles with respect to the geometry of the receiver area and the intended coverage. PALC can be applied prior to a numerical optimization of the loudspeakers’ driving functions. The method can be used with different objectives, such as a constant interaction between adjacent cabinets with respect to the receiver geometry or by additionally considering amplitude attenuation. PALC is compared with typical standard LSA curving schemes. The advantages of the presented approach regarding sound field homogeneity and target-oriented radiation will be shown with the help of technical quality measures.
Convention Paper 9699
Saturday, May 20, 09:30 — 12:00 (Salon 1 Moscow)
David Griesinger (Chair)
P01-01 Perceptual Evaluation of Synthetic Early Binaural Room Impulse Responses Based on a Parametric Model
Philipp Stade (Presenting Author), Johannes M. Arend (Author), Christoph Pörschmann (Author)
Binaural synthesis is often applied in the field of spatial audio to create a virtual acoustic environment using binaural room impulse responses (BRIRs). In the same area of research, spherical microphone arrays are gaining importance and allow for a spatio-temporal analysis. We present a new approach to describe the acoustical environment by a parametric model using sound field analysis. Combining spherical head related impulse responses (HRIRs) with this description, early BRIRs are synthesized and compared to the measured counterparts in a perceptual evaluation. The listening experiment revealed adequate performance of the approach, almost independently of room and test signal. Surprisingly, the synthesis of direct sound and only diffuse reverberation yielded nearly the same results as the entire parametric model.
Convention Paper 9688
P01-02 Implementation and Evaluation of a Low-Cost Headtracker for Binaural Synthesis
Michael Romanov (Presenting Author), Paul Berghold (Author), Matthias Frank (Author), Daniel Rudrich (Author), Markus Zaunschirm (Author), Franz Zotter (Author)
Human auditory localization strongly relies on head movements. Thus, for plausible perception of virtual acoustic scenes, the incorporation of head movements is mandatory. This is achieved with loudspeaker playback, as a listener can move the head relative to the scene. When using binaural synthesis, the head movements need to be tracked and the scene needs to be rotated accordingly to achieve a stable perception of the acoustic scene. We present a low-cost, plug-and-play device (MrHeadTracker) to facilitate head tracking based on the Arduino platform and the BNO055 sensor. Its performance is compared against another low-cost device (GY-85) and an optical tracking system (Optitrack Flex 13). The proposed MrHeadTracker outperforms the GY-85 device in terms of accuracy and latency and yields comparable results to the optical tracking system.
Convention Paper 9689
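The scene rotation described in P01-02 can be sketched for the simplest case, a yaw-only head rotation: a source that is fixed in the room has to be rendered at its room azimuth minus the tracked head yaw. The function names and the sign convention (both angles in degrees, measured in the same direction) are assumptions made for this illustration and are not taken from the paper.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def rendered_azimuth(source_azimuth, head_yaw):
    """Head-relative azimuth for a room-fixed source.

    Counter-rotating the scene by the tracked head yaw keeps the source
    stable in the room while the head turns (yaw only, assumption).
    """
    return wrap_degrees(source_azimuth - head_yaw)
```

A full implementation would also handle pitch and roll, for example with rotation matrices or quaternions, which this sketch leaves out.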
P01-03 Influence of Head Tracking on the Externalization of Auditory Events at Divergence between Synthesized and Listening Room Using a Binaural Headphone System
Stephan Werner (Presenting Author), Georg Götz (Author), Florian Klein (Author)
This contribution presents an investigation of the influence of head tracking on the perceived externalization of auditory events using a binaural headphone system. Individual binaural room impulse responses of a five-channel loudspeaker setup are recorded in two acoustically different rooms. Test persons are divided into two groups: for the first group the listening and synthesized rooms match (convergence), while for the second group they do not (divergence). Moving the head during listening is mandatory and controlled by the test procedure. Perceived externalization of auditory events is used as a quality feature. The analysis of the ratings confirms that head tracking increases perceived externalization. Furthermore, the room divergence effect can be confirmed. Significantly lower externalization is observed if a divergence between the resynthesized and listening room occurs. However, the results clearly show that the benefit of head tracking on externalization does not overcome the room divergence effect.
Convention Paper 9690
P01-04 Laboratory Reproduction of Binaural Concert Hall Measurements through Individual Headphone Equalization at the Eardrum
David Griesinger (Presenting Author)
Progress relating measurements to perception of acoustics of all kinds has been stymied by the difficulty of accurately reproducing a room sound in a laboratory. Spatial aliasing above 1000 Hz, where most information in speech and music resides, severely limits the ability of multiple loudspeaker systems to reproduce proximity. We have developed a simple method of equalizing headphones that accurately reproduces the timbre of a frontal sound source at the eardrums. Combining individual headphone playback with Tapio Lokki’s anechoic recordings makes hall research inexpensive, rapid, and accurate. We can easily test the effects of early reflections and other spatial properties. We find the earliest reflections, whether medial or lateral, are almost always detrimental. Examples from real halls will be presented.
Convention Paper 9691
P01-05 Approaching Immersive 3D Audio Broadcast Streams of Live Performances
Giordano Jacuzzi (Presenting Author), Sofia Brazzola (Author), Johannes Kares (Author)
This paper explores the requirements and best practices of recording, mixing, and streaming broadcasts of live music performances in binaural and ambisonics formats. We outline the optimal workflows for incorporating and executing 3D audio streams with existing in-house infrastructure, as determined from our experience gathered testing and broadcasting public concert events in partnership with Moods jazz club in Zurich, Switzerland, and Vienna State Opera in Vienna, Austria. In addition, this paper discusses the current technological barriers for immersive audio content creation and consumption, areas for growth and improvement, and future projections for 3D immersive audio technology.
Convention Paper 9692
Saturday, May 20, 09:30 — 12:30 (Gallery Window)
P03-01 Analysis of the Subgrouping Practices of Professional Mix Engineers
David Michael Ronan (Presenting Author), Hatice Gunes (Author), Joshua D. Reiss (Author)
Subgrouping facilitates the simultaneous manipulation of a number of audio tracks and is a central aspect of mix engineering. However, the decision process behind subgrouping is poorly documented. This study sheds light on this ubiquitous but poorly defined mix practice and provides rules and constraints derived from a questionnaire that could be used in intelligent audio production tools. We prepared an online questionnaire consisting of 21 questions testing nine assumptions and identifying subgrouping decisions, such as why a mix engineer creates subgroups, when they subgroup, and how many subgroups they use. We analyzed responses from 10 award-winning mix engineers. Thematic analysis enabled us to discover five themes: Decisions, Subgroup Effect Processing, Organization, Exercising Control, and Analogue versus Digital. By analyzing the themes and each respondent’s quantitative data we were able to show that eight out of nine assumptions could be accepted to be true.
Convention Paper 9700
P03-02 Combining Preference Ratings with Sensory Profiling for the Comparison of Audio Reproduction Systems
Tim Walton (Presenting Author), Michael Evans (Author), David Kirk (Author), Frank Melchior (Author)
One aim of perceptual audio evaluation is to understand the relationships between individual sensory attributes and overall quality of experience. This paper discusses one perceptual evaluation method by which this can be realized. Open Profiling of Quality (OPQ), a method first introduced in the field of visual and audiovisual evaluation, involves psychoperceptual evaluation, sensory profiling, and external preference mapping stages and is suitable for use with naïve listeners. Here, a methodological case study is presented in which we discuss the implementation of this method and its adaptation for the comparison of audio reproduction systems.
Convention Paper 9701
P03-03 The Audience Effect on the Acoustics of Ancient Theaters in Modern Use
Gino Iannace (Presenting Author), Amelia Trematerra (Author)
Ancient theaters are used in modern contexts for different types of shows. When ancient theaters are used for musical performances, audiences criticize the acoustics, either because speech is hard to understand or because the music sounds weak. An important aspect is the presence of the audience in the cavea, and it is important to understand whether it has a negative effect. Since it is not possible to take acoustic measurements during theater performances, the effects of the presence of the audience on the acoustics are evaluated virtually with the software “Odeon,” in which the presence of the audience is simulated by changing the absorption coefficient value of the cavea.
Convention Paper 9702
P03-04 Evaluation of Training to Improve Auditory Memory Capabilities on a Mobile Device Based on a Serious Game Application
György Wersényi (Presenting Author), Hunor Nagy (Author)
Capabilities of the auditory memory system were tested in a serious game application developed for the Android mobile platform. Participants played the well-known game of finding pairs by flipping and remembering objects on cards arranged in a matrix structure. Visual objects were replaced by iconic auditory events (auditory icons, earcons). Total time and different error rates were recorded and the effect of training was also evaluated. Results indicate that training contributes to a better performance and human voice samples are the easiest to remember.
Convention Paper 9703
P03-05 Conversational Speech Quality in Noisy Environments
Michal Soloducha (Presenting Author), Stefan Bleiholder (Author), Frank Kettler (Author), Alexander Raake (Author)
The present study reports on a conversation test conducted to reveal insights on how telephony users perceive speech transmission quality in a noisy environment. For this purpose, a telephony setup has been built to simulate different degradations typical of real-life situations. A range of different conditions has been presented during the subjective test including usage of different terminals, environmental noises, and a noise suppression algorithm. A noise reproduction system has been installed on one side of the telecommunication channel to create an immersive noisy environment. Special focus of this paper is on the influence of different terminal devices and their signal processing on subjective quality. Moreover, more generic conclusions regarding conversational quality testing are provided.
Convention Paper 9704
P03-06 Acoustic Room Modelling Using a Spherical Camera for Reverberant Spatial Audio Objects
Luca Remaggi (Presenting Author), Trevor J. Cox (Author), Adrian Hilton (Author), Richard J. Hughes (Author), Philip J. B. Jackson (Author), Hansung Kim (Author), Ben Shirley (Author)
The ability to predict the acoustics of a room without acoustical measurements is a useful capability. The motivation here stems from spatial audio reproduction, where knowledge of the acoustics of a space could allow for more accurate reproduction of a captured environment, or for reproduction room compensation techniques to be applied. A cuboid-based room geometry estimation method using a spherical camera is proposed, assuming a room and objects inside can be represented as cuboids aligned to the main axes of the coordinate system. The estimated geometry is used to produce frequency-dependent acoustic predictions based on geometrical room modelling techniques. Results are compared to measurements through calculated reverberant spatial audio object parameters used for reverberation reproduction customized to the given loudspeaker set up.
Convention Paper 9705
P03-07 Estimating the Diffuseness Level of the Acoustic Field—Reverberation Chamber Under Study
Bartlomiej Chojnacki (Presenting Author), Artur Flach (Author), Tadeusz Kamisinski (Author), Adam Pilch (Author)
A reverberation chamber is a widely used type of laboratory room for measurements according to standards such as ISO 354, ISO 17497-1, or ISO 3741. Chamber shapes and types vary widely around the world, and so do the results of sound absorption coefficient measurements, even though the chambers meet the standard criteria. Verification of the ISO standards' parameters is required, along with the introduction of extra parameters describing the level of sound field diffuseness. Model studies have been conducted using the ray-tracing method in order to verify the level of sound field diffuseness in varying versions of an irregular reverberation chamber, further rated with the kurtosis of the room normalized impulse response and the sound field diffuseness coefficient, in order to assess the geometries under study.
Convention Paper 9706
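One of the ratings named in P03-07 is the kurtosis of a normalized impulse response. The sketch below shows only the plain sample kurtosis (fourth central moment divided by the squared variance). How the impulse response is normalized beforehand is not described in the abstract, so that step is left out and treated as an assumption. For reference, Gaussian-distributed samples have a kurtosis of 3, while a response dominated by a few strong, isolated reflections yields a much larger value.

```python
def kurtosis(samples):
    """Sample kurtosis: fourth central moment over squared variance.

    Assumes the impulse response has already been normalized
    (the normalization itself is not shown here).
    """
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / (m2 * m2)
```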
P03-08 Environmental and Technical Problems in Acoustical Scaled Models
Aleksandra Majchrzak (Presenting Author), Katarzyna Baruch (Author), Bartlomiej Chojnacki (Author), Jaroslaw Rubacha (Author)
Using acoustical scaled models poses numerous theoretical and practical issues. After formulating theoretical requirements regarding the sound field in a given object, one should also conduct a considerable number of measurements of the acoustical properties of the materials to be used in the construction or of the sound source. The paper discusses difficulties encountered while performing acoustical measurements in a scaled reverberation chamber in the Technical Acoustics Laboratory AGH, such as the sound sources used and the ways of adjusting atmospheric conditions (relative humidity and temperature). The paper particularly concerns ideas for changing the conditions of the measurement environment, the sound sources in use, and the frequency range of the performed measurements.
Convention Paper 9707
P03-09 OpenAirLib: A JavaScript Library for the Acoustics of Spaces
Damian Murphy (Presenting Author), Kenneth Brown (Author), Matthew Paradis (Author)
The possibilities for creating online sonic art and virtual acoustic environments have been increased by the introduction of audio Application Programming Interfaces, libraries, and online resources of acoustic impulse responses (which encapsulate the acoustics of real or imaginary spaces). This paper presents the OpenAirLib JavaScript library that extends the capabilities of the World Wide Web Consortium Web Audio API, facilitating the incorporation of three-dimensional acoustics of spaces into web audio projects. It enables first-order ambisonic material from the Open Acoustic Impulse Response Library (OpenAIR) website to superimpose the acoustics of one space onto synthesized or recorded material from another space. An example is then described which uses these technologies to produce instances of generative online sonic art from OpenAIR data.
Convention Paper 9708
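Superimposing the acoustics of one space onto other material, as described in P03-09, comes down to convolving the dry signal with an impulse response of the space. The sketch below shows this for a single channel with direct convolution. It is an illustration of the underlying operation only, not of the OpenAirLib API, and the function name is hypothetical. In a browser this operation would typically be delegated to the Web Audio API's `ConvolverNode`.

```python
def convolve(dry, impulse_response):
    """Direct (time-domain) convolution of a dry signal with an impulse response."""
    if not dry or not impulse_response:
        return []
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            # Each input sample launches a scaled copy of the impulse response.
            out[i + j] += x * h
    return out
```

Direct convolution costs one multiplication per pair of samples, so long room responses are usually processed with FFT-based methods instead.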
P03-10 Measurement and Visualization of Sound Intensity Vector Distribution in Proximity of Acoustic Diffusers
Adam Kurowski (Presenting Author), Bozena Kostek (Author), József Kotus (Author)
In this work, we present analyses and visualizations of the sound intensity distribution measured in proximity of an acoustic diffuser. Such a distribution may be used for estimating basic acoustic parameters of a diffuser. The measurement is performed with a logarithmic sine sweep, which allows the waves scattered by the diffuser to be analyzed and the direct sound signal component to be rejected. Pressure and sound intensity vector impulse responses are measured simultaneously. The measurement is carried out for a grid of 37 points arranged at equal intervals on a semicircle. To investigate the impact of the evaluated objects on sound wave propagation, diffusion coefficients and sound intensity vector distributions are then compared.
Convention Paper 9709
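The logarithmic sine sweep mentioned in P03-10 is commonly generated in the exponential form sketched below, where the instantaneous frequency rises from `f_start` to `f_end` by a constant ratio per unit time. The second helper lists measurement angles for points on a semicircle. It assumes that both end points are included, which gives 5-degree spacing for 37 points, a detail the abstract does not state. All names and parameter values are illustrative.

```python
import math


def log_sweep(f_start, f_end, duration, sample_rate):
    """Exponential (logarithmic) sine sweep from f_start to f_end in Hz."""
    n = int(round(duration * sample_rate))
    rate = math.log(f_end / f_start)
    k = 2.0 * math.pi * f_start * duration / rate
    return [math.sin(k * (math.exp(i / sample_rate / duration * rate) - 1.0))
            for i in range(n)]


def instantaneous_frequency(t, f_start, f_end, duration):
    """Frequency of the sweep at time t (seconds)."""
    return f_start * math.exp(t / duration * math.log(f_end / f_start))


def semicircle_angles(num_points):
    """Equally spaced angles in degrees over 0..180, end points included (assumption)."""
    return [i * 180.0 / (num_points - 1) for i in range(num_points)]
```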
Saturday, May 20, 14:45 — 16:15 (Salon 1 Moscow)
Ramona Bomhardt (Chair)
P04-01 Comparison of Spatial Characteristics of Head-Related Transfer Functions between the Horizontal and Median Planes
Xiaoli Zhong (Presenting Author), Bosun Xie (Author), Guangzheng Yu (Author)
Head-related transfer functions (HRTFs) vary with source direction and thus contain major localization cues. In this work, the directional variation of HRTFs in the horizontal and median planes is studied using directional Fourier expansion. Results indicate that, up to 20 kHz, the first 8 or 9 orders of elevation harmonics account for 99% of the variation of HRTFs in the median plane, while the first 31 or 32 orders of azimuthal harmonics account for 99% of the variation of HRTFs in the horizontal plane. Therefore, the horizontal-plane variation of HRTFs is more complicated than the median-plane variation. Moreover, the front-back spatial symmetry calculated from the expansion weights is compared between the horizontal and median planes.
Convention Paper 9710
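The idea behind the directional Fourier expansion in P04-01 can be sketched for one frequency bin in the horizontal plane: the HRTF values sampled at equally spaced azimuths are expanded into azimuthal harmonics, and the lowest order whose cumulative energy reaches a given fraction is reported. This is a simplified illustration under several assumptions: the energy measure includes the order-0 term, the threshold is applied to a single bin, and all function names are hypothetical. The paper's exact definition of "variation" may differ.

```python
import cmath
import math


def harmonic_energy(samples):
    """Energy per azimuthal harmonic order for equally spaced azimuth samples.

    Orders run from 0 to (N - 1) // 2; each order q > 0 combines the
    coefficients at +q and -q.
    """
    n = len(samples)

    def coeff(q):
        return sum(x * cmath.exp(-2j * math.pi * q * i / n)
                   for i, x in enumerate(samples)) / n

    energy = [abs(coeff(0)) ** 2]
    for q in range(1, (n - 1) // 2 + 1):
        energy.append(abs(coeff(q)) ** 2 + abs(coeff(-q)) ** 2)
    return energy


def order_for_fraction(energy, fraction=0.99):
    """Lowest order whose cumulative energy reaches the given fraction."""
    total = sum(energy)
    running = 0.0
    for q, e in enumerate(energy):
        running += e
        if running >= fraction * total:
            return q
    return len(energy) - 1
```

For example, a pattern of the form 1 + cos(3θ) sampled at 72 azimuths needs harmonics up to order 3 before 99% of its energy is captured.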
P04-02 An Experiment to Evaluate the Performance of a Parametric Model for the Individualization of the HRTF in the Median Plane
Pablo Gutierrez-Parera (Presenting Author), Jose J. Lopez (Author)
Individualized HRTFs for headphone reproduction provide better immersion and more natural localization of sounds than non-individualized ones, especially for elevated positions. This paper presents an experiment to determine the accuracy of individualized, parametrically modeled HRTFs. The HRTFs were modeled with an algorithm that detects the main peaks and notches of the HRTF and models them with a chain of second-order IIR peak filters. A subjective test was carried out to compare the perception and localization of the modeled versus measured HRTFs in the median plane. The analyzed data show that simplified modeled versions of HRTFs, with only a few peaks simulated, can obtain results similar to those of measured HRTFs for elevation angles in the median plane.
Convention Paper 9711
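A chain of second-order IIR peak filters, as mentioned in P04-02, can be sketched with the widely used peaking-EQ biquad from the RBJ Audio EQ Cookbook. This standard design is swapped in for illustration and is not necessarily the filter design used by the authors. The center frequencies, gains, and Q values in the example are hypothetical.

```python
import cmath
import math


def peaking_biquad(f0, gain_db, q, fs):
    """Peaking-EQ biquad (RBJ cookbook form), normalized so that a[0] == 1.

    Negative gain_db produces a notch-like dip, positive gain_db a peak.
    """
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def cascade_response_db(sections, f, fs):
    """Magnitude response in dB of a chain of biquad sections at frequency f."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    h = complex(1.0, 0.0)
    for b, a in sections:
        h *= (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(h))


# Hypothetical model: one peak and one notch (illustrative values only).
EXAMPLE_CHAIN = [
    peaking_biquad(4000.0, 6.0, 2.0, 48000.0),
    peaking_biquad(8000.0, -12.0, 4.0, 48000.0),
]
```

Each section contributes exactly its `gain_db` at its own center frequency and 0 dB at DC, so the chain's overall response is the sum of the individual peak and notch curves in dB.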
P04-03 The Influence of Symmetrical Human Ears on the Front-Back Confusion
Ramona Bomhardt (Presenting Author), Janina Fels (Author)
Human beings have two ears to localize sound sources. At first glance, the dimensions of the right and left ears are generally very similar. Nevertheless, the individual anthropometric dimensions and shape of both ears are disparate. These differences improve localization on the cone of confusion where interaural differences do not exist. To determine the influence of asymmetric ears, individual HRTF data sets are analytically and subjectively compared with their mirrored versions.
Convention Paper 9712
Saturday, May 20, 14:45 — 16:15 (Salon 2+3 Rome)
Steve Temme (Chair)
P05-01 Quantifying Consistency in Loudspeaker System Production
Andrew Goldberg (Presenting Author)
This paper defines a new metric for production consistency of loudspeaker systems. The motivation is to improve stereo imaging of phantom sources. The measurement system’s consistency and three different loudspeaker designs are analyzed as examples to show how the metric can highlight inconsistencies. The analysis and summary metrics clearly highlighted the (artificially induced) inconsistencies. A simple classification system (from A to H) for loudspeakers is also proposed. It is hoped that this, or a similar, metric will be integrated into international standards as a way to help designers improve their products, production to improve their processes, and customers to better inform their purchase decisions.
Convention Paper 9713
P05-02 Loudspeakers Performance Variance Due to Components and Assembly Process
Maria Costanza Bellini (Presenting Author), Angelo Farina (Author)
This paper presents an experimental study of the main causes of scrap during the production of a typical midrange loudspeaker. After analyzing the most critical components of a transducer, various samples with reference and defective components have been built and characterized in terms of frequency response and distortion. In addition, a second set of samples has been built using reference components but varying the assembly process parameters; these samples have been characterized in the same way as the previous ones. Measurements have been performed both in an anechoic chamber and in a real production line and, by analyzing the acquired data, the authors have identified the most influential components and assembly parameters in terms of required performance.
Convention Paper 9714
P05-03 Evaluation of Audio Test Methods and Measurements for End-of-the-Line Automotive Loudspeaker Quality Control
Steve Temme (Presenting Author), Viktor Dobos (Author)
In order to minimize costly warranty repairs, automotive manufacturers impose tight specifications and quality/reliability requirements on their parts suppliers. At the same time, they also require low prices. This makes it important for automotive manufacturers to work with parts suppliers to define reasonable specifications and tolerances, and to understand both how the parts suppliers are testing and how to carry out their own measurements for incoming QC purposes. Specifying and testing automotive loudspeakers can be very tricky since loudspeakers are inherently nonlinear, time-variant, and affected by their working conditions and environment. This paper examines the loudspeaker characteristics that can be measured and discusses common pitfalls and how to avoid them on a loudspeaker production line. Several different audio test methods and measurements for end-of-the-line automotive speaker quality control are evaluated, and the most relevant ones identified. Speed, statistics, and full traceability are also discussed.
Convention Paper 9715
Saturday, May 20, 15:00 — 18:00 (Gallery Window)
P06-01 Determining Pronunciation Differences in English Allophones Utilizing Audio Signal Parameterization
Bozena Kostek (Presenting Author), T. Ciszewski (Author), Andrzej Czyzewski (Author), Magdalena Piotrowska (Author)
An allophonic description of English plosive consonants, based on audio-visual recordings of 600 specially selected words, was developed. First, several speakers were recorded while reading words from a teleprompter. Then, every word was played back from a previously recorded sample read by a phonology expert, and each examined speaker repeated the word, trying to imitate the correct pronunciation. The next step consisted of partitioning the two recorded sets of words into allophones by editing; the signals were then analyzed and the audio excerpts subsequently parametrized. The comparison of the two sets of allophones was reinforced by the phonology expert’s assessment of the produced speech sounds. The analyses presented in this paper allowed a set of parameters describing allophone pronunciation to be determined.
Convention Paper 9716
P06-02 Mathematical Model of the Acoustic Signal Generated by the Combustion Engine
Michal Luczynski (Presenting Author), Stefan Brachmanski (Author)
The development of electric vehicle technology is progressing. The absence of an audible sensation like that of a combustion engine is a significant acoustic problem: such a sound is both a desirable feature and a safety matter. More and more automotive companies are working on engine sound synthesis systems. However, there is no universal standard that helps solve this problem. The aim of this paper is to define a universal form of mathematical model and general assumptions for designing engine sound synthesis systems for electric motorbikes.
Convention Paper 9717
P06-03 Whispered Speech Speaker Recognition. Listening Tests versus Speaker Recognition System
Krzysztof Goliasz (Presenting Author), Michal Luczynski (Author)
In this study, the efficiency of different methods of whispered speech speaker recognition has been compared. Whispered speech, as one form of voice disguise, causes many difficulties in speaker identification. While examining both methods, the authors looked for the key factors of speaker recognition in each. Both methods were tested with and without normal voice as a reference sample. Non-acoustic circumstances such as time and cost were considered in the comparison as well.
Convention Paper 9718
P06-04 A Study on Audio Signal Processed by "Instant Mastering" Services
Magdalena Piotrowska (Presenting Author), Bozena Kostek (Author), Szymon Piotrowski (Author)
The increasing amount of music produced in home and project studios has resulted in the development and growth of "automatic mastering services." The presented investigation explores the changes introduced to the audio signal by various online mastering platforms. A music set consisting of 10 songs produced in small facilities was processed by eight online automatic mastering services. Additionally, some laboratory-constructed signals were tested. To determine whether the changes introduced to the audio are invariable between trials, every music excerpt was submitted several times. For each sample, parameters related to music characteristics such as timbre, dynamics, and loudness were calculated before and after processing. The results obtained make it possible to discover some of the mechanisms underlying the tested automatic mastering services as well as to discern similarities and differences between the various platforms.
Convention Paper 9719
P06-06 Car Infotainment Systems Capabilities vs. Customers’ Needs and Expectations
Bartlomiej Kukulski (Presenting Author)
To make the trip more enjoyable, decades ago vehicles gained radio systems, which evolved further into multimedia systems, and today into infotainment systems, which provide entertainment facilities, access to information (e.g., route guidance), and increasingly, network connectivity, which leads to the perception of those systems as smart devices. The paper presents an overview of the capabilities of modern car infotainment systems, combined with the results of a survey conducted on a representative group of car users, mainly drivers. The purpose of the survey was to obtain information about the car audio systems owned (loudspeakers, amplifiers) and the hardware and software capabilities of the infotainment systems owned, and to learn about user experience, followed by an evaluation of satisfaction with the quality of audio provided by the systems installed in their cars.
Convention Paper 9804
Saturday, May 20, 16:30 — 17:30 (Salon 1 Moscow)
Sascha Spors (Chair)
P07-01 Augmented Reality to Improve Orchestra Conductors’ Headphone Monitoring
Dimitri Soudoplatoff (Presenting Author), Amandine Pras (Author)
Conductors face challenges when conducting the orchestra with headphones to synchronize with a soundtrack or a click track. We sent a survey to 12 international conductors to identify and classify those challenges. They primarily reported on balance issues, aggressive click tracks, and the difficulty of hearing the acoustic sound of the orchestra, leading 70% of them to take the headphones off one ear. A solution using augmented reality monitoring through binaural rendering and head tracking was tested in various situations and showed that it could successfully reproduce the acoustic sound of the orchestra in the headphones. Another perceptual experiment evaluated the potential realism of this solution when merging two binaural auditory scenes recorded in the same acoustic space. Results encourage us to further develop immersive monitoring systems for conductors, with the soundtrack integrated in the real acoustic space.
Convention Paper 9720
P07-02 Wave Field Synthesis Driving Functions for Large-Scale Sound Reinforcement Using Line Source Arrays
Frank Schultz (Presenting Author), Peter Fiala (Author), Gergely Firtha (Author), Sascha Spors (Author)
Wave field synthesis (WFS) can be used for wavefront shaping with line source arrays (LSAs) in large-scale sound reinforcement. For this purpose, the individual drivers might be electronically controlled by the WFS driving functions of a virtual directional point source. From the recently introduced unified 2.5D WFS framework it is known that positions of amplitude-correct synthesis (PCS) exist only along an arbitrarily shaped curve, the reference curve, in front of the LSA. However, its shape can be adapted with the so-called referencing function. We introduce the adaptation of the referencing function along the audience line of typical concert venues for optimized wavefront shaping. This yields considerable improvements in the homogeneity of the sound field and more convenient setups compared to previous WFS-based sound reinforcement.
Convention Paper 9722
Saturday, May 20, 16:30 — 18:00 (Salon 2+3 Rome)
Finn T. Agerkvist (Chair)
P08-01 Position Dependence of Fractional Derivative Models for Loudspeaker Voice Coils with Lossy Inductance
Alexander W. King (Presenting Author), Finn T. Agerkvist (Author)
Commonly used models of moving-coil loudspeaker voice coils, which include effects from eddy current losses, are either inaccurate or contain an abundance of parameters and are difficult to extend to the nonlinear domain. In contrast, fractional derivative models accurately describe the frequency and position dependence of the lossy inductance, with meaningful connections to the underlying physics, while keeping the number of parameters low. These fractional derivatives are also compatible with state-space polynomial methods of modeling nonlinear behavior. It is shown that the fractional order derivative approaches a value of 1, corresponding to an ideal inductance, when the voice coil is completely outside the magnetic system. Finally, the developed model reveals details about the effect of conductive voice coil formers.
Convention Paper 9723
P08-02 Model for Evaluation of Power Consumption of Vented Box Loudspeakers
Filip Sommer Madsen (Presenting Author), Niels Elkjær Iversen (Author), Arnold Knott (Author), Søren Thorsen (Author)
In the design of mobile sound systems, an estimate of power consumption must be made in order to choose a battery of appropriate size and cost. However, poor methods for power estimation tend to result in large and costly batteries. This paper aims to present a more precise method for estimating the power consumption of a vented box sound system. Instead of simplifying a loudspeaker system as a purely ohmic resistance, its mechanical and acoustic parameters are used to create a state space model. Despite deviations at high frequencies, the state space model is at least twice as accurate at estimating the power consumption as a model that simplifies the speaker as a resistor.
Convention Paper 9724
P08-03 Assessment of the Radiation Mode Method for In Situ Measurements of Loudspeaker Systems
Maryna Sanalatii (Presenting Author), Régine Guillermin (Author), Philippe Herzog (Author), Jean-Christophe Le Roux (Author), Manuel Melon (Author), Nicolas Poulain (Author), Clément Vasseur (Author), Lucas Vindrola (Author)
In this paper the radiation mode method is used to measure the frequency response and the directivity pattern of loudspeaker systems. This method, which has been successfully applied to speaker measurement in free field conditions, is now tested in a large non-anechoic hall. Two closed box systems and a switchable bi-directive/cardioid subwoofer have been used. Each system is measured first in an anechoic chamber and then in the large hall. The radiation mode method is then applied to the two different measurement data sets. Results show good agreement between both conditions. Finally, the influence of the mesh coarsening is studied.
Convention Paper 9725
Sunday, May 21, 09:00 — 12:30 (Salon 1 Moscow)
Franz Zotter (Chair)
P09-01 The Median-Plane Summing Localization in Ambisonics Reproduction
Bosun Xie (Presenting Author), Haiming Mai (Author), Xiaoli Zhong (Author)
One aim of Ambisonics reproduction is to recreate the perception of a virtual source in arbitrary directions. Practical Ambisonics reproduction is unable to recreate the correct high-frequency spectra in binaural pressures that are known as front-back and vertical localization cues. Based on both a simple shadowless head model and KEMAR’s HRTFs, the present work proves that the changes of ITD caused by head turning in Ambisonics match those of a real source, and thus provide a dynamic cue for vertical localization, especially in the median plane. In addition, the low-frequency virtual source direction can be approximately evaluated by using a set of localization equations or panning laws. The above analysis is validated by a median-plane virtual source localization experiment with Ambisonics reproduction.
Convention Paper 9726
P09-02 Exploring the Perceptual Sweet Area in Ambisonics
Matthias Frank (Presenting Author), Franz Zotter (Author)
A physical pressure-matching criterion to describe the size of the sweet area in Ambisonics does not quite match the generously large sweet area encountered in practice. To meet the need for a more practical characterization, this paper proposes a simple and systematic method to experimentally determine the perceptual sweet area. The method is not limited to assessing the localization of dry sounds, as it also permits plausibility assessment of more complex scenes. As an example, this contribution presents results for playback of two different audio scenes (dry/reverberant) using different Ambisonic orders.
Convention Paper 9727
P09-03 Phantom Source Widening by Filtered Sound Objects
Franz Zotter (Presenting Author), Matthias Frank (Author)
Audio effects increasing the perceived source extent (width/distance) often employ frequency-dependent panning of a single virtual sound object or real-time-controlled design of stochastic multichannel filters. Both ways imply increased complexity required in the renderer or object representation. In this paper we present a frequency-dependent panning scheme to obtain constellations of 3, 4, 5, or 7 filtered sound objects, as a simplified object-based description of wide/distant sound for any renderer. We deal with the multichannel filter-design questions: Are the filters rather temporally compact or frequency-selective, zero-phase FIR vs. IIR or causal-sided FIR, how strictly power-complementary? By results of a listening experiment for selected examples, we can provide some answers and an effective design of useful width-/distance-increasing filtered sound objects.
Convention Paper 9728
P09-04 Ambisonic Spatial Blur
Thibaut Carpentier (Presenting Author)
This paper presents a technique for controlling the spatial resolution of an Ambisonic sound field while preserving its overall energy. The proposed method transforms a stream encoded in order-N Ambisonics to a lower-order resolution. The transformation can be operated continuously, in effect simulating a fractional-order representation of the Ambisonic stream and varying the "blurriness" of the spatial image.
Convention Paper 9729
P09-05 Comparing Ambisonic Microphones—Part 2
Enda Bates (Presenting Author), Francis M. Boland (Author), Sean Dooney (Author), Luke Ferguson (Author), Marcin Gorzel (Author), Hugh O'Dwyer (Author)
This paper presents some further experiments devised to assess the performance of an expanded number of commercially available Ambisonic microphones. The subjective timbral and spatial quality of five microphones (Soundfield MKV, Core Sound TetraMic, Sennheiser Ambeo, MH Acoustics Eigenmike, and Zoom H2N) is assessed using listening tests and a recording of an acoustic quartet. Localization accuracy is assessed using an objective directional analysis and recordings from a spherical loudspeaker array. Intensity vectors are extracted from 25 critical frequency bands of the Bark scale and used to compute the angle to the source location. Significant differences were found between microphones with the Soundfield MKV and Eigenmike producing the best results in terms of timbral quality and localization respectively.
Convention Paper 9730
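The directional analysis described in P09-05 can be illustrated with a minimal sketch. It assumes first-order B-format signals (W, X, Y, Z) with the traditional -3 dB weighting on W, and it works broadband in the time domain; the paper's split into 25 Bark bands is not reproduced. The function names and the plane-wave encoder are hypothetical helpers for illustration only.

```python
import math


def encode_plane_wave(signal, azimuth_deg, elevation_deg):
    """Encode a mono signal as first-order B-format (illustrative convention)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = [s / math.sqrt(2.0) for s in signal]  # omnidirectional component
    x = [s * math.cos(az) * math.cos(el) for s in signal]  # front-back
    y = [s * math.sin(az) * math.cos(el) for s in signal]  # left-right
    z = [s * math.sin(el) for s in signal]  # up-down
    return w, x, y, z


def estimate_direction(w, x, y, z):
    """Return (azimuth, elevation) in degrees from the averaged pseudo-intensity vector."""
    # Each component is the time average of pressure (W) times one dipole signal.
    ix = sum(a * b for a, b in zip(w, x))
    iy = sum(a * b for a, b in zip(w, y))
    iz = sum(a * b for a, b in zip(w, z))
    azimuth = math.degrees(math.atan2(iy, ix))
    elevation = math.degrees(math.atan2(iz, math.hypot(ix, iy)))
    return azimuth, elevation


# Usage: a 440 Hz tone arriving from 40 degrees azimuth, 10 degrees elevation.
tone = [math.sin(2.0 * math.pi * 440.0 * n / 48000.0) for n in range(4800)]
estimated = estimate_direction(*encode_plane_wave(tone, 40.0, 10.0))
```

With a real microphone the estimate deviates from the true loudspeaker angle, and that deviation is the localization error such an analysis would report.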
P09-06 Object-Based Reverberation Encoding from First-Order Ambisonic RIRs
Philip Coleman (Presenting Author), Andreas Franck (Author), Philip J. B. Jackson (Author), Dylan Menzies (Author)
Recent work on a reverberant spatial audio object (RSAO) encoded spatial room impulse responses (RIRs) as object-based metadata that can be synthesized in an object-based renderer. Encoding reverberation into metadata presents new opportunities for end users to interact with and personalize reverberant content. The RSAO models an RIR as a set of early reflections together with a late reverberation filter. Previous work to encode the RSAO parameters was based on recordings made with a dense array of omnidirectional microphones. This paper describes RSAO parameterization from first-order Ambisonic (B-Format) RIRs, making the RSAO compatible with existing spatial reverb libraries. The object-based implementation achieves reverberation time, early decay time, clarity, and interaural cross-correlation similar to direct Ambisonic rendering of 13 test RIRs.
Convention Paper 9731
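The metrics named in P09-06 (reverberation time, early decay time, clarity) are commonly derived from an impulse response by Schroeder backward integration. The sketch below shows that standard procedure on a synthetic exponential decay; it is not the authors' evaluation code, and the fit ranges (-5 to -35 dB for T30, 0 to -10 dB for EDT) and the 50 ms clarity boundary are conventional choices assumed here.

```python
import math


def energy_decay_curve_db(rir):
    """Schroeder backward integration, normalized to 0 dB at time zero."""
    edc = [0.0] * len(rir)
    tail = 0.0
    for n in range(len(rir) - 1, -1, -1):
        tail += rir[n] * rir[n]
        edc[n] = tail
    total = edc[0]
    # Clamp to avoid log10(0) if the response ends in exact zeros.
    return [10.0 * math.log10(max(e / total, 1e-30)) for e in edc]


def decay_time(edc_db, fs, start_db, stop_db):
    """Least-squares line fit between two EDC levels, extrapolated to -60 dB."""
    pts = [(n / fs, lvl) for n, lvl in enumerate(edc_db) if stop_db <= lvl <= start_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_l = sum(lvl for _, lvl in pts) / len(pts)
    slope = (sum((t - mean_t) * (lvl - mean_l) for t, lvl in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))
    return -60.0 / slope


def clarity_db(rir, fs, early_ms=50.0):
    """Ratio of early to late energy in dB (C50 for the default boundary)."""
    split = int(round(fs * early_ms / 1000.0))
    early = sum(h * h for h in rir[:split])
    late = sum(h * h for h in rir[split:])
    return 10.0 * math.log10(early / late)


# Synthetic response: amplitude falls by 60 dB in 0.5 s, sampled at 1 kHz.
fs = 1000
rir = [10.0 ** (-3.0 * n / (fs * 0.5)) for n in range(fs)]
edc = energy_decay_curve_db(rir)
t30 = decay_time(edc, fs, -5.0, -35.0)
edt = decay_time(edc, fs, 0.0, -10.0)
c50 = clarity_db(rir, fs)
```

For measured responses the noise floor has to be handled before integration; the sketch omits that step.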
P09-07 Further Investigations on the Design of Radial Filters for the Driving Functions of Near-Field Compensated Higher-Order Ambisonics
Nara Hahn (Presenting Author), Sascha Spors (Author)
Analytic driving functions for Near-field Compensated Higher-order Ambisonics (NFC-HOA) are derived based on the spherical harmonics expansions of the desired sound field and the Green’s function that models the secondary sources. In the frequency domain, the radial part of the driving function is given as spherical Hankel functions and compensates the near-field effects of the secondary sources. By exploiting the polynomial expansion of the spherical Hankel functions, the radial filters can be implemented as cascaded biquad filters in the time domain, thereby reducing the computational complexity significantly. In this paper three practical issues regarding the design of the radial filters are addressed: pole-zero computation, pole-zero mapping, and gain normalization. Useful suggestions are given for improvements in terms of stability and numerical stability, which are demonstrated by numerical simulations.
Convention Paper 9732
Sunday, May 21, 09:00 — 11:30 (Salon 2+3 Rome)
Jamie Angus (Chair)
P10-01 Supply Voltage Scaling Technique of Triode Tube Based on Harmonic Distortion Characteristics
Kanako Takemoto (Presenting Author), Toshihiko Hamasaki (Author), Shiori Oshimo (Author)
In recent years, applying a triode tube to the guitar pedal as a signal modulator has been attracting considerable attention. The miniature tube 12AX7/ECC83 is still the key component for creating the main sound properties of a guitar amplifier owing to its specific non-linear characteristics. Consequently, pedals that incorporate tube harmonic distortion in place of a solid-state circuit have come onto the market. This paper describes a technique to generate the same harmonic spectrum, focusing on the tube Crunch sound, at a 150 V power supply voltage, which is half of the conventional 300 V. This voltage scaling reduces the battery power consumption of the tube guitar pedal and thus prolongs the playing time.
Convention Paper 9733
P10-02 Linear Phase Crosstalk Cancellation Filters
Arnaud Reymond (Presenting Author), Christof Faller (Author), Daniel Weiss (Author)
The filter design approach presented here achieves an attractive compromise that produces filters with good crosstalk cancellation, short impulse responses, and a linear phase. As a starting point, basic cancellation filters are considered, i.e., ideal pulses with delay and inversion. These are modified in the frequency domain with a gain applied so as to obtain a nearly spectrally flat total power at the ears. This modification substantially reduces the coloration of the basic cancellation filters at the price of a small decrease in cancellation performance.
Convention Paper 9734
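The "ideal pulses with delay and inversion" that P10-02 takes as its starting point can be sketched under a strongly simplified assumption: a symmetric setup in which the direct path is a unit impulse and the crosstalk path is a pure attenuation g with a delay of d samples. Both parameters and all function names are hypothetical, and the paper's frequency-domain gain modification and linear-phase design are not reproduced.

```python
def basic_ctc_filters(g, d, num_terms):
    """Truncated pulse trains of the classic recursive crosstalk canceller.

    same_side feeds the loudspeaker on the input's side, cross_side feeds the
    opposite loudspeaker. Requires 0 < g < 1 and d >= 1.
    """
    length = 2 * num_terms * d
    same_side = [0.0] * length
    cross_side = [0.0] * length
    for k in range(num_terms):
        same_side[2 * k * d] = g ** (2 * k)  # even-order corrections
        cross_side[(2 * k + 1) * d] = -(g ** (2 * k + 1))  # inverted, delayed pulses
    return same_side, cross_side


def ear_signals(same_side, cross_side, g, d):
    """Ear responses when a unit impulse is fed to the left input."""
    n = len(same_side) + d
    left = [0.0] * n
    right = [0.0] * n
    for i, (s, c) in enumerate(zip(same_side, cross_side)):
        left[i] += s  # left speaker to left ear (direct path)
        right[i + d] += g * s  # left speaker to right ear (crosstalk path)
        right[i] += c  # right speaker to right ear (direct path)
        left[i + d] += g * c  # right speaker to left ear (crosstalk path)
    return left, right


# Usage: crosstalk path 0.7 times the direct path, arriving 3 samples later.
same_side, cross_side = basic_ctc_filters(0.7, 3, 8)
left, right = ear_signals(same_side, cross_side, 0.7, 3)
```

In this model the far ear receives nothing, while the near ear receives the impulse plus one small residual pulse caused by truncating the series. The long pulse train also colors the sound, which is the property the abstract sets out to reduce.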
P10-03 GaN FETs Drive Fidelity and Efficiency in Class-D Audio Amplifiers
Stephen Colino (Author), Bhasy Nair (Presenting Author), Skip Taylor (Author)
With the current maturity of Class-D audio amplifier architectures, amplifier fidelity and efficiency limitations are primarily at the device level. Silicon MOSFETs have been evolving for almost forty years, and their progress towards a perfect switch has slowed dramatically. There are some fundamental characteristics of MOSFETs that degrade sound quality and efficiency. In 2010, the enhancement mode Gallium nitride (GaN) power FET was introduced by Efficient Power Conversion (EPC), providing a large step towards the perfect switch.
Convention Paper 9735
P10-04 Ultra Efficient Linear Amplifiers
Jamie Angus (Presenting Author)
“Class-D” switching amplifiers are considered to be the most efficient amplifiers. However, designers must deal with supply rail and radio frequency interference, and with the need to switch power devices at high frequencies. Because of these and other problems, not everyone wishes to use switching-based amplifier technologies. Unfortunately, linear amplifiers are significantly less efficient than switching amplifiers under sine wave testing. However, real audio signals spend much more time at low amplitudes than a sine wave does. By changing the switch point, “Class-G” or “Class-H” amplifiers can achieve efficiencies that rival those of “Class-D” amplifiers producing the same output. The paper develops optimum switch points for both single and multiple switching points, with respect to the expected amplitude distribution of the audio.
Convention Paper 9736
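The efficiency argument in P10-04 can be illustrated numerically. The sketch below assumes an idealized linear output stage driving a resistive load, with no quiescent current or device losses, that always draws current from the lowest supply rail able to deliver the instantaneous output. The brute-force search over candidate switch points is only an illustration and is not the optimization developed in the paper; all names are hypothetical.

```python
import math


def efficiency(samples, rails):
    """Output power divided by supply power for an ideal rail-switching stage.

    samples are output voltages normalized so that the top rail is 1.0;
    every |sample| must not exceed the highest rail.
    """
    rails = sorted(rails)
    out_power = 0.0
    supply_power = 0.0
    for v in samples:
        level = abs(v)
        rail = next(r for r in rails if r >= level)  # lowest sufficient rail
        out_power += level * level  # v^2 / R with R = 1
        supply_power += rail * level  # rail voltage times load current
    return out_power / supply_power


def sine(amplitude, n=10000):
    """One full period of a sine wave with the given peak amplitude."""
    return [amplitude * math.sin(2.0 * math.pi * k / n) for k in range(n)]


def best_switch_point(samples, candidates):
    """Pick the lower-rail voltage that maximizes efficiency for these samples."""
    return max(candidates, key=lambda a: efficiency(samples, [a, 1.0]))


# Usage: a full-scale sine on a single rail approaches the Class-B limit of pi/4,
# while a quiet signal benefits from a second, lower rail.
class_b_full = efficiency(sine(1.0), [1.0])
class_b_quiet = efficiency(sine(0.25), [1.0])
class_g_quiet = efficiency(sine(0.25), [0.5, 1.0])
```

Feeding recorded audio samples instead of a sine makes the dependence on the amplitude distribution visible, since the best lower rail then trades off time spent below it against headroom wasted above the signal.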
P10-05 Evaluation of Audio Performance over Product Life
Wolfgang Klippel (Presenting Author)
Most measurements are performed on audio products during the development of the first prototype and at the end of the production line. Physical measurements and perceptual evaluation in the target application (e.g., car interior) are also required to define the target performance, to complete the development successfully, and to evaluate the reliability and robustness of the product under the influence of climate, aging, and other external factors. This paper discusses evaluation techniques that are useful in the different phases of the product life cycle to generate a successful product that provides the maximum benefit to the end user.
Convention Paper 9737
Sunday, May 21, 13:00 — 14:00 (Salon 1 Moscow)
Nadja Schinkel-Bielefeld (Chair)
P11-01 Is it Harder to Perceive Coding Artifact in Foreign Language Items? – A Study with Mandarin Chinese and German Speaking Listeners
Nadja Schinkel-Bielefeld (Presenting Author), Zhang Jiandong (Author), Anna Katharina Leschanowsky (Author), Fu Shanshan (Author), Qin Yili (Author)
MUSHRA listening tests for the evaluation of audio coded material often also include speech stimuli in a language the listener does not understand. However, it is not clear to what extent the lack of understanding and the unfamiliarity with that language and its phonemes may influence the listener’s behavior during the test and his or her quality ratings. In a study involving German and Mandarin Chinese speaking listeners as well as items in these two languages, we analyze how ratings and listening times are affected by the foreign language. Pooled over all conditions, we find no significant differences in the ratings. However, for high quality items we find that, compared to native listeners, non-native listeners need longer listening times and compare more between items.
Convention Paper 9739
P11-02 Parametric Joint Channel Coding of Immersive Audio
Heiko Purnhagen (Presenting Author), Stanislaw Gorlow (Author), Janusz Klejsa (Author), Heidi-Maria Lehtonen (Author), Lars Villemoes (Author)
This paper presents a parametric joint channel coding scheme that enables the delivery of channel-based immersive audio content in formats such as 7.1.4, 5.1.4, or 5.1.2 at very low bit rates. It is based on a generalized approach for parametric spatial coding of groups of two, three, or more channels using a single downmix channel together with a compact parametrization that guarantees full covariance re-instatement in the decoder. By arranging the full-band channels of the immersive content into five groups, the content can be conveyed as a 5.1 downmix together with the parameters for each group. This coding scheme is implemented in the A-JCC tool of the AC-4 system recently standardized by ETSI, and listening test results illustrate its performance.
Convention Paper 9740
Sunday, May 21, 15:00 — 18:00 (Salon 1 Moscow)
Ben Kok (Chair)
P12-01 Quantitative Investigation Artificial Room Simulations Reproduced by Channel-Based and Object-Based Surround Sound Systems
Bernard Camilleri (Presenting Author), Jakob Bergner (Author), Christoph Sladeczek (Author)
The introduction of object-based audio reproduction brings new challenges for the sound engineer in recording, designing, and synthesizing reverberant sound fields, due to the increased number of speakers and their placement. The aim of this paper is to show that several parameter settings from a digital reverberation unit produce contrasting reflectograms in a 5.0 channel-based setup and an object-based setup that can have effects on the perceived reverberant sound field. Conversely, established acoustical metrics derived from the measured room impulse responses (RIRs) in both multichannel reproduction setups do not highlight the differences noticed in the reflectograms. The potential consequences regarding individual system properties and the metrics themselves are discussed in this work.
Convention Paper 9741
P12-02 Comparative Perceptual Evaluation between Different Methods for Implementing Reverberation in a Binaural Context
Lorenzo Picinali (Presenting Author), Yuli Levtov (Author), David Poirier-Quinot (Author), Alexander Wallin (Author)
Reverberation has always been considered of primary importance in order to improve the realism, externalization and immersiveness of binaurally spatialized sounds. Different techniques exist for implementing reverberation in a binaural context, each with a different level of computational complexity and spatial accuracy. A perceptual study has been performed in order to compare between the realism and localization accuracy achieved using five different binaural reverberation techniques. These included multichannel Ambisonic-based, stereo and mono reverberation methods. A custom web-based application has been developed implementing the testing procedures and allowing participants to take the test remotely. Initial results with 54 participants show that no major difference in terms of perceived level of realism and spatialization accuracy could be found between four of the five proposed reverberation methods, suggesting that a high level of complexity in the reverberation process does not always correspond to improved perceptual attributes.
Convention Paper 9742
P12-03 Data-Driven Granular Synthesis
Sadjad Siddiq (Presenting Author)
Granular synthesis is a flexible method to create a wide range of complex sounds, like the sound of rain or water, using very short waveforms, called grains. To synthesize realistic, natural sounds, appropriate grains are needed. In an earlier paper we presented a method to extract grains from recordings of complex sounds. In this paper we describe an extension of the earlier method in which the end of incomplete grains is estimated to improve sound quality. Additionally, synthesis parameters that allow us to recreate sound output very close to the original recordings are found automatically. A few seconds of audio input will provide enough data to synthesize sounds of arbitrary length. The necessary grains require only little memory, and since synthesis parameters can also be varied to change the nature of the sound, this method is especially beneficial for video games. While empirical listening suggests that the synthesized waveforms sound natural, a formal listening test was not conducted. Sound samples are provided.
Convention Paper 9743
P12-04 Parametric Synthesis of Crowd Noises in Virtual Acoustic Environments
Vincent Grimaldi (Presenting Author), Christoph Böhm (Author), Stefan Weinzierl (Author), Henrik von Coler (Author)
This paper presents the design and evaluation of a parametric sound texture synthesis for the generation of crowd noise in virtual acoustic environments. It allows the control of the crowd size, its level of excitement, and its spatial distribution in real-time. A corpus-based concatenative approach is used to generate single streams of indistinct speech that are superimposed to create an unintelligible "babbling" texture. Speech material was recorded in semi-supervised group discussions in the anechoic chamber. The database is used in a real-time implementation with a subsequent rendering using dynamic binaural synthesis. Listening tests were conducted to evaluate the effect of different parameter settings, as well as the perceived “naturalness” of the simulation.
Convention Paper 9744
P12-05 Real or Illusion? A Comparative Study of Captured Ambiance vs. Artificial Reverberation in Immersive Audio Applications
Richard King (Presenting Author), Will Howie (Author), Jack Kelly (Author), Brett Leonard (Author)
Spatial audio researchers and content producers agree that the best source material for immersive audio is provided by the capture of acoustic signals at various elevations in a room. Where music recording is concerned, this technique is generally preferred over signal processing, as it provides a more natural and realistic impression of immersion. The authors’ previous work evaluated the content of rear height channels, which demonstrated that a group of listeners could not discriminate between real room sound and artificial reverberation, and showed no significant preference for either version. The current research investigates whether or not there is a preference for real source ambience over artificially generated reverberation in all four of the height channels (i.e., front and rear elevation) of a 9.1 immersive playback system. Results show some subjects can consistently discriminate between ambiences, but no consistent preference for ambience was observed.
Convention Paper 9745
P12-06 Investigating the Impact of a Music Stand on Stage Using Spatial Impulse Responses
Sebastià Vicenç Amengual Gari (Presenting Author), Malte Kob (Author)
A measurement set-up replicating a trumpet solo concert situation is arranged on stage by means of a music stand, a directive loudspeaker, and a microphone array. Spatial Room Impulse Responses are measured and analyzed to evaluate the acoustic impact of the music stand at the musician position, depending on the stand location and orientation. Results show that when the stand is orientated towards the receiver the sound level at high frequencies increases up to 9 dB. In some cases, the level of the stand reflection at high frequencies is higher than the source itself, due to its radiation characteristics. The effects of a possibly perceivable timbre change on stage are discussed.
Convention Paper 9746
Sunday, May 21, 15:00 — 18:00 (Salon 2+3 Rome)
Udo Zölzer (Chair)
P13-01 The Perceptual Effect of Vertical Interchannel Decorrelation on Vertical Image Spread at Different Azimuth Positions
Christopher Gribben (Presenting Author), Hyunkook Lee (Author)
Two subjective experiments have been conducted to investigate the effect of vertical interchannel decorrelation on the perception of vertical image spread (VIS). Pairs of vertically arranged loudspeakers, one at ear level and another elevated by 30°, were positioned at 0°, ±30°, and ±110° azimuth to the listener. The first experiment compared octave-band pink noise stimuli, consisting of two decorrelation methods with three levels of interchannel cross-correlation (ICC), a coherent sample and a monophonic sample. The effect of vertical ICC on VIS perception was found to be most effective for frequencies around 500 Hz and above, with little effect at lower frequencies. The second experiment judged the absolute lower and upper boundaries of perceived VIS, using stimuli from the first experiment, showing a potential association between VIS and vertical localization.
Convention Paper 9747
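The interchannel cross-correlation (ICC) used as the controlled variable in P13-01 can be computed in several ways. The sketch below assumes the simplest one, the normalized correlation coefficient at zero lag, which may differ from the exact definition used in the experiments (for example, a maximum over a range of lags). The function name is hypothetical.

```python
import math


def interchannel_cross_correlation(a, b):
    """Zero-lag normalized cross-correlation of two equal-length signals."""
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return numerator / denominator


# Usage: identical channels are fully correlated, quadrature tones are not.
n = 1000
lower = [math.sin(2.0 * math.pi * 5.0 * k / n) for k in range(n)]
upper_same = list(lower)
upper_quadrature = [math.cos(2.0 * math.pi * 5.0 * k / n) for k in range(n)]
icc_coherent = interchannel_cross_correlation(lower, upper_same)
icc_decorrelated = interchannel_cross_correlation(lower, upper_quadrature)
```

A value near 1 corresponds to the coherent condition in the abstract; values approaching 0 correspond to strongly decorrelated loudspeaker pairs.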
P13-02 Predictors for the Perception of “Wildness” and “Heaviness” in Distorted Guitar Timbre
Koji Tsumoto (Presenting Author), Toru Kamekawa (Author), Atsushi Marui (Author)
Predictors for the perception of wildness and heaviness in distorted guitar timbre were investigated. A pairwise comparison was conducted for the stimuli of five different amounts of distortion and three types of diodes. The result indicated that the perception of wildness and heaviness seemed to be compiled as one attribute associated with the “power” of the timbre. The ratings appeared to correspond to the threshold voltage of diodes and the amount of distortion. Also, the spectral kurtosis had a relatively high negative correlation with the ratings. The types of diodes, the amount of distortion, and the spectral kurtosis seemed to be appropriate predictors for the perception of wildness and heaviness.
Convention Paper 9748
P13-03 An Investigation into the Relationship between the Subjective Descriptor Aggressive and the Universal Audio of the 1176 FET Compressor
Austin Moore (Presenting Author), Jonathan Wakefield (Author)
In popular music productions, the lead vocal is often the main focus of the mix and engineers will work hard to impart creative coloration on this source. This paper conducts listening experiments to test if there is a correlation between perceived distortion and the descriptor “aggressive” which is often used to describe the sonic signature of the Universal Audio 1176. The results from this study show compression settings that impart audible distortion are perceived as aggressive by the listener and there is a strong correlation between the subjective scores for distortion and aggressive. It was also shown there is a strong correlation between compression settings rated to have high aggressive scores and the audio feature roughness.
Convention Paper 9749
P13-04 Investigations Towards Plausible Blind Upmixing of Applause Signals
Alexander Adami (Presenting Author), Lukas Brand (Author), Jürgen Herre (Author)
Blind upmix denotes the process of converting audio content into a higher number of output channels without the aid of any prior spatial information. This is often needed for upmixing legacy monophonic recordings into modern multichannel audio formats. Especially in live-recordings, applause plays a vital role. However, creating a convincing blind upmix of applause signals is a demanding task. Applause can be interpreted as a superposition of distinctive and individually perceivable foreground claps and a more noise-like background. While the background signal can be upmixed by applying decorrelation and distribution across channels, it is important that the foreground claps are spatially distributed in a perceptually meaningful and plausible manner. This paper investigates the effect of the spatial, temporal, and timbral structure of foreground claps on the perceived plausibility of applause signals. The assessment was done by means of two listening tests. Results show that especially for sparse applause, plausibility is significantly reduced if its natural timbral and temporal structure is corrupted.
Convention Paper 9750
P13-05 Joint Parameter Optimization of Differentiated Discretization Schemes for Audio Circuits
Francois Germain (Presenting Author), Kurt James Werner (Author)
We propose a new approach to discretizing audio circuits that involves applying differentiated discretization schemes among the elements of a linear circuit, or sub-circuit, rather than a single uniform scheme. The scheme coefficients are jointly optimized to minimize some frequency response error function for that linear circuit. We describe the mathematical framework for this optimization and apply it to the case of the parametric bilinear transform. Differentiated discretization coefficients are jointly optimized by minimizing the L2-norm error between the discretized frequency response and the frequency response of the original system. To demonstrate the validity of our approach, we apply our method to several examples and show a systematic reduction of the frequency response error in each case.
Convention Paper 9751
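The idea in P13-05 of tuning a discretization parameter against a frequency response error can be sketched for the simplest possible case. The example assumes that the parametric bilinear transform is the substitution s = c(1 - z^-1)/(1 + z^-1) with a free constant c (the standard bilinear transform uses c = 2·fs), and it applies one uniform c to a first-order RC lowpass. The paper's differentiated per-element coefficients and its joint optimization are not reproduced, and a coarse grid search stands in for a real optimizer.

```python
import cmath
import math


def analog_response(omega, omega_c):
    """First-order lowpass H(s) = 1 / (1 + s / omega_c) evaluated at s = j*omega."""
    return 1.0 / (1.0 + 1j * omega / omega_c)


def digital_response(omega, omega_c, c, fs):
    """Same filter after substituting s = c * (1 - z^-1) / (1 + z^-1)."""
    z_inv = cmath.exp(-1j * omega / fs)
    s = c * (1.0 - z_inv) / (1.0 + z_inv)
    return 1.0 / (1.0 + s / omega_c)


def l2_error(c, omega_c, fs, omegas):
    """L2 norm of the complex response error over a set of frequencies."""
    return math.sqrt(sum(
        abs(digital_response(w, omega_c, c, fs) - analog_response(w, omega_c)) ** 2
        for w in omegas))


fs = 48000.0
omega_c = 2.0 * math.pi * 8000.0  # cutoff of the example circuit (assumed value)
omegas = [2.0 * math.pi * f for f in range(100, 16001, 100)]

c_standard = 2.0 * fs
# Prewarped choice: the digital and analog responses coincide exactly at omega_c.
c_prewarped = omega_c / math.tan(omega_c / (2.0 * fs))

# Coarse search for the single c that minimizes the error over the whole band.
candidates = [c_standard] + [c_standard * (0.80 + 0.005 * k) for k in range(41)]
c_best = min(candidates, key=lambda c: l2_error(c, omega_c, fs, omegas))
```

The prewarped value makes the error vanish at one frequency, whereas the searched value spreads the error over the band; giving each circuit element its own coefficient, as the abstract proposes, adds further degrees of freedom to that trade-off.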
P13-06 Virtual Analog Modeling of Dynamic Range Compression Systems
Felix Eichas (Presenting Author), Etienne Gerat (Author), Udo Zölzer (Author)
Dynamic range compression (DRC) systems reduce the dynamic range of an input signal by amplifying low amplitude levels and attenuating higher ones. This work describes a method to digitally model any analog dynamic range compression unit solely with the help of input/output measurements. For this purpose a generic dynamic range compression model is chosen and its structure is adapted to be able to recreate an analog reference device. The linear characteristic as well as the static curve of the reference device are extracted and directly used in the model. Afterwards the parameters of the digital model are adapted with an iterative optimization routine. Finally the output of the digital model and the analog reference system are compared to evaluate the quality of the emulation.
Convention Paper 9752
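The static curve mentioned in P13-06 maps input level to output level. A minimal sketch of a textbook hard-knee curve follows, assuming a threshold, a ratio, and a make-up gain as parameters. In the paper the curve is extracted from measurements of the reference device, so this parametric form and its names are only illustrative.

```python
def static_curve_db(level_db, threshold_db, ratio, makeup_db=0.0):
    """Hard-knee compressor curve: output level in dB for a given input level in dB."""
    if level_db <= threshold_db:
        out_db = level_db  # below threshold the curve has unit slope
    else:
        out_db = threshold_db + (level_db - threshold_db) / ratio  # slope 1/ratio above
    return out_db + makeup_db  # make-up gain lifts the whole curve


def dynamic_range_db(levels_db):
    """Difference between the loudest and the quietest level."""
    return max(levels_db) - min(levels_db)


# Usage: 4:1 compression above -20 dB with 6 dB of make-up gain.
inputs_db = [-30.0, -20.0, -8.0]
outputs_db = [static_curve_db(x, -20.0, 4.0, makeup_db=6.0) for x in inputs_db]
```

With make-up gain, low levels come out louder and high levels come out quieter than they went in, which matches the description in the abstract. A complete compressor model also needs attack and release smoothing of the level estimate, which this sketch omits.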
Monday, May 22, 09:00 — 12:30 (Salon 1 Moscow)
Russell Mason (Chair)
P14-01 Evaluation of Auditory Events with Projected Sound Sources Using Perceptual Attributes
Tom Wühle (Presenting Author), M. Ercan Altinsoy (Author), Sebastian Merchel (Author)
The main aim of the projection of sound sources is to change the perceived direction of the auditory event from the direction of the real source to the direction of the projected source. However, the focusing capabilities of projecting sound sources are physically limited. Therefore, the perception of the listener is influenced not only by the projected sound but also by the sound that is directly radiated from the real source. In a scenario with projected sound sources, a complex mixture of perceptual attributes changes besides the direction of the auditory event. The present study describes these perceptual processes and investigates some of those attributes.
Convention Paper 9753
P14-02 The Evaluation of the Effect of Sound Directionality in Horizontal Plane on the Human Auditory Distance Perception in a Large Reverberant Room
Tahereh Afghah (Presenting Author), Andrew Allen (Author), Aravindan Joseph Benjamin (Author), Peter Otto (Author)
An evaluation of the effect of sound localization on auditory distance estimation in a user study is presented. Binaural Room Impulse Responses of 60 positions were recorded in a reverberant space using a dummy head. The recordings were evaluated by the users in a headphone-based listening test to analyze the listeners’ ability to perceive the distance with and without prior knowledge of the direction of origin. When the direction was known, the distance estimation accuracy on the left and right sides of the head in the near field (2 m, 4 m) improved, and at some angles the improvement was significant. However, known direction did not assist the users in determining the larger distances (6 m, 8 m, 10 m). No improvements were seen on the front and back sides for any direction.
Convention Paper 9754
This paper will not be presented but is available in the E-Library
P14-03 Improvement of the Reporting Method for Closed-Loop Human Localization Experiments
Fiete Winter (Presenting Author), Sascha Spors (Author), Hagen Wierstorf (Author)
Sound Field Synthesis reproduces a desired sound field within an extended listening area using up to hundreds of loudspeakers. The perceptual evaluation of such methods is challenging, as many degrees of freedom have to be considered. Binaural Synthesis simulating the loudspeakers over headphones is an effective tool for the evaluation. A prior study investigated whether non-individual anechoic binaural synthesis is perceptually transparent enough to evaluate human localization in sound field synthesis. With the apparatus used, an undershoot for lateral sound sources was observed for real loudspeakers and their binaural simulation. This paper reassesses human localization for the mentioned technique using a slightly modified setup. The results show that the localization error decreased and no undershoot was observed.
Convention Paper 9755
P14-04 Investigations on Perceptual Phenomena of the Precedence Effect Using a Bessel Sequence
Florian Wendt (Presenting Author)
The precedence effect is typically investigated by presenting two instances of a sound with a delay in between. Respective studies found various phenomena indicating that in human auditory localization the contribution of the first sound instance often prevails over a later sound or an acoustic reflection. In reverberant environments, the direct sound is typically followed by more than one reflection. Nevertheless, little is known about the contribution of multiple reflections to the precedence effect. Understandably, an unconstrained number of sound instances drastically increases the number of conceivable conditions, and an exhaustive systematic investigation appears infeasible. Directionally distributed impulses weighted by a Bessel sequence offer a neat set of free parameters. We chose this scheme to gain quantitative insights into the influence of multiple reflections on the precedence effect. Our study covers the transient precedence effect, the ongoing precedence effect, and the onset capture effect, which we investigate using sounds of different envelope, frequency range, and angular and temporal spread.
Convention Paper 9756
P14-05 Just Noticeable Difference in Apparent Source Width Depending on the Direction of a Single Reflection
Dale Johnson (Presenting Author), Hyunkook Lee (Author)
An investigation of the just noticeable difference in the angle of a single reflection, in terms of apparent source width, was performed using a staircase method to obtain two single-reflection threshold angles between 0° and 180°. In the presence of a direct sound, subjects compared the apparent source width produced by a single 90° reference reflection with that produced by a single test reflection ranging from 0° to 90° and from 0° to 180° for the respective thresholds. Subjects repeated this test for four delay times of 5 ms, 10 ms, 20 ms, and 30 ms. The threshold angles were found to be approximately 40° and 130° and do not appear to vary with delay time. This implies that human hearing is not sensitive to changes in reflection angle in terms of apparent source width between the threshold angles.
Convention Paper 9757
P14-06 Modeling Horizontal Audio-Visual Coherence with the Psychometric Function
Hanne Stenzel (Presenting Author), Jon Francombe (Author), Philip J. B. Jackson (Author)
Studies on perceived audio-visual spatial coherence in the literature have commonly employed continuous judgment scales. This method requires listeners to detect and to quantify their perception of a given feature and is a difficult task, particularly for untrained listeners. An alternative method is the quantification of a percept by conducting a simple forced choice test with subsequent modeling of the psychometric function. An experiment to validate this alternative method for the perception of azimuthal audio-visual spatial coherence was performed. Furthermore, information on participant training and localization ability was gathered. The results are consistent with previous research and show that the proposed methodology is suitable for this kind of test. The main differences between participants result from the presence or absence of musical training.
Convention Paper 9758
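To illustrate the idea in P14-06 of modeling forced-choice responses with a psychometric function, the sketch below fits a two-parameter logistic curve by maximum likelihood over a coarse parameter grid. This is an assumed, generic procedure, not the authors' analysis: the logistic form, the grid search, and the `(offset, responses, trials)` data layout are all illustrative choices.

```python
import math

def logistic(x, midpoint, slope):
    """Probability of a 'yes' response at stimulus value x."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

def neg_log_likelihood(data, midpoint, slope):
    """Binomial negative log-likelihood; data holds (x, n_yes, n_trials)."""
    nll = 0.0
    for x, k, n in data:
        p = min(max(logistic(x, midpoint, slope), 1e-9), 1.0 - 1e-9)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll

def fit_psychometric(data, midpoints, slopes):
    """Return the (midpoint, slope) pair on the grid with the lowest NLL."""
    best = min((neg_log_likelihood(data, m, s), m, s)
               for m in midpoints for s in slopes)
    return best[1], best[2]
```

The fitted midpoint is the stimulus value at which half of the responses change, which is one common way to summarize a threshold from such a test.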
P14-07 How Important Is Accurate Localization in Reproduced Sound?
Russell Mason (Presenting Author)
A meta-analysis was conducted on elicitation studies to examine the perceptual importance of localization-specific and localization-related attributes. It was found that the majority of attributes were localization-related, including (in order of commonality) extent, locatedness, distribution, spaciousness, and movement. The most common localization-specific attribute was distance, with only 2.6% of the attributes relating to the perceived lateral position. It is concluded that localization accuracy experiments may enable experimenters to make predictions about a reasonable proportion of the attributes found, though further research is needed to develop suitable analysis techniques. In addition, more research is required to develop subjective and objective methods for judging perceived distance.
Convention Paper 9759
Monday, May 22, 09:00 — 11:00 (Salon 2+3 Rome)
John Mourjopoulos (Chair)
P15-01 Close Miking Empirical Practice Verification: A Source Separation Approach
Stylianos Ioannis Mimilakis (Presenting Author), Konstantinos Drossos (Author), Andreas Floros (Author), Gerald Schuller (Author), Tuomas Virtanen (Author)
Close miking represents a widely employed practice of placing a microphone very near to the sound source in order to capture more direct sound and minimize any pickup of ambient sound, including other, concurrently active sources. It has been used by the audio engineering community for decades for audio recording, based on a number of empirical rules that evolved during the recording practice itself. But can this empirical knowledge and close miking practice be systematically verified? In this work we aim to address this question based on an analytic methodology that employs techniques and metrics originating from the sound source separation evaluation field. In particular, we apply a quantitative analysis of the source separation capabilities of the close miking technique. The analysis is applied to a recording dataset obtained at multiple positions of a typical music hall, with multiple distances between the microphone and the sound source, multiple microphone types, and multiple level differences between the sound source and the ambient acoustic component. For all the above cases we calculate the Source to Interference Ratio (SIR) metric. The results obtained clearly demonstrate an optimum close-miking performance that matches the current empirical knowledge of professional audio recording.
Convention Paper 9760
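The Source to Interference Ratio used in P15-01 is, in its usual definition, the energy ratio in dB between the target component and the interference component of a captured signal. The sketch below assumes the two components are already available separately (for example from separate recordings), which sidesteps the projection step used in standard source separation evaluation toolkits; the function names are illustrative.

```python
import math

def energy(signal):
    """Sum of squared sample values."""
    return sum(v * v for v in signal)

def sir_db(target_component, interference_component):
    """SIR in dB from separately known target and interference components."""
    e_target = energy(target_component)
    e_interf = energy(interference_component)
    if e_interf == 0.0:
        return math.inf   # no interference at all
    if e_target == 0.0:
        return -math.inf  # no target energy captured
    return 10.0 * math.log10(e_target / e_interf)
```

For a close microphone, the target component would be the nearby source and the interference component the ambient pickup, so a higher SIR indicates better isolation.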
P15-02 Audio System Spatial Image Evaluation via Binaural Feature Classification
Gavriil Kamaris (Presenting Author), Stamatis Karlos (Author), Dimitris Koutsaidis (Author), John Mourjopoulos (Author), Stergios Terpinas (Author)
A method for evaluating different audio systems with respect to their spatial reproduction accuracy is described based on binaural auditory feature classification. The classifier is trained to act as an expert listener judging system spatial quality and achieves high accuracy for a reference ideal system under anechoic conditions. The trained classifier is then employed to assess different suboptimal reproduction setups and listening conditions. The spatial accuracy was assessed relative to this reference in terms of the panned image, image localization accuracy, and the sweet spot area spread. For 2 channel stereo reproduction, the study used loudspeakers of different directivity under anechoic or varying reverberant room conditions. The work also assesses the relative effects of upmixing stereo for 5 channel reproduction.
Convention Paper 9761
P15-03 Long-term Average Spectrum in Popular Music and its Relation to the Level of the Percussion
Anders Elowsson (Presenting Author), Anders Friberg (Author)
The spectral distribution of music audio has an important influence on listener perception, but large-scale characterizations are lacking. Therefore, the long-term average spectrum (LTAS) was analyzed for a large dataset of popular music. The mean LTAS was computed, visualized, and then approximated with two quadratic fittings. The fittings were subsequently used to derive the spectrum slope. By applying harmonic/percussive source separation, the relationship between LTAS and percussive prominence was investigated. A clear relationship was found; tracks with more percussion have a relatively higher LTAS in the bass and high frequencies. We show how this relationship can be used to improve targets in automatic equalization. Furthermore, we assert that variations in LTAS between genres are mainly a side-effect of percussive prominence.
Convention Paper 9762
P15-04 Efficient Music Identification Approach Based on Local Spectrogram Image Descriptors
Massimiliano Zanoni (Presenting Author), Paolo Bestagini (Author), Antonio Canclini (Author), Stefano Lusardi (Author), Augusto Sarti (Author), Stefano Tubaro (Author)
The diffusion of large music collections has created the need for algorithms enabling fast song retrieval from query audio excerpts. This is the case for online media sharing platforms that may want to detect copyrighted material. In this paper we start from a proposed state-of-the-art algorithm for robust music matching based on spectrogram comparison leveraging computer vision concepts. We show that it is possible to further optimize this algorithm by exploiting more recent image processing techniques and carrying out the analysis on limited temporal windows, while still achieving accurate matching performance. The proposed solution is validated on a dataset of 800 songs, reporting an 80% decrease in computational complexity for an accuracy loss of only about 1%.
Convention Paper 9763
Monday, May 22, 09:30 — 12:30 (Gallery Window)
P16-01 MySofa—Design Your Personal HRTF
Christian Hoene (Presenting Author), Alexandru Cacerovschi (Author), Isabel C. Patino Mejia (Author)
Binaural auralizations are present in increasing numbers of applications and devices. Although most of the time they use only generic Head-Related Transfer Functions (HRTFs), the recent standardization of the HRTF format SOFA has paved the way to support individualized HRTFs broadly. We have developed and implemented MySofa: a web service that helps users to design a personal HRTF. In MySofa, based on anthropometric and user inputs, algorithms calculate and tune HRTFs. The result is displayed in the web browser and the user can listen to test renderings to verify whether the personalized HRTF matches their expectations. In order to foster the use of individualized HRTFs, we also implemented a lightweight C library called libmysofa, which helps programmers to read SOFA files and look up FIR filters.
Convention Paper 9764
P16-02 Ecological Validity of Stereo UHJ Soundscape Reproduction
Francis Stevens (Presenting Author), Damian Murphy (Author), Stephen Smith (Author)
This paper contains the results of a study making use of a set of B-format soundscape recordings, presented in stereo UHJ format as part of an online listening test, in order to investigate the ecological validity of such a method of soundscape reproduction. Test participants were presented with a set of soundscapes and asked to rate them using the Self-Assessment Manikin (SAM), and these results were then compared with those from a previous study making use of the same soundscape recordings presented in a surround-sound listening environment (a method previously shown to be ecologically valid). Results show statistically significant correlation between the SAM results for both listening conditions, indicating that the use of stereo UHJ format is valid for soundscape reproduction.
Convention Paper 9765
P16-03 Comparison of HRTFs from a Dummy-Head Equipped with Hair, Cap, and Glasses in a Virtual Audio Listening Task over Equalized Headphones
György Wersényi (Presenting Author), József Répás (Author)
Head-Related Transfer Functions (HRTFs) are frequently used in virtual audio scene rendering in order to simulate sound sources at different spatial locations. The use of dummy-head HRTFs (also referred to as generic sets) is often criticized because of poor localization performance, leading to, e.g., lower spatial resolution, in-the-head localization, front-back reversals, etc. This paper presents results of horizontal plane localization obtained by digital filter representations of dummy-head HRTFs that were recorded normally and with an additional cap, glasses, and hair on the head. Results of untrained subjects over equalized reference headphones showed no significant difference among the HRTF sets despite large magnitude differences. This method for customization of generic HRTFs fails if improvement in localization is needed.
Convention Paper 9766
P16-04 Filter Design of a Circular Loudspeaker Array Considering the Three-Dimensional Directivity Patterns Reproduced by Circular Harmonic Modes
Koya Sato (Presenting Author), Yoichi Haneda (Author)
We propose a filter design method for a circular loudspeaker array. This method is based on extended three-dimensional (3-D) bases that are observed by driving each circular harmonic mode using a prototypical circular loudspeaker array. When a desired 3-D directivity pattern is expanded by the extended 3-D bases, the filter coefficients can be obtained by combining the circular harmonics with the expansion coefficients of the desired 3-D directivity pattern. Moreover, the proposed method can suppress large filter gains at low frequencies by limiting the gain at each order (mode) using L1-norm optimization. We evaluated the performance of directivity and sound distortion using an actual 8-element circular loudspeaker array of radius 0.054 m. These results showed that the proposed method could control the 3-D directivity with little distortion.
Convention Paper 9767
P16-05 Wearable Sound Reproduction System Using Two End-Fire Arrays
Kenta Imaizumi (Presenting Author), Yoichi Haneda (Author)
We propose a personal sound reproduction system that uses two wearable end-fire loudspeaker arrays instead of headphones to present the sound image. The prototype arrays rest on the listener’s chest so that the look direction of each array is directed toward the listener’s ears. To prevent sound leakage around the listener, we designed a narrow directivity for each array. Moreover, we used a crosstalk canceler for localizing the sound image with head-related transfer functions. We verified the performance of the prototype by using simulations. A difference of approximately 15 dB of sound pressure was obtained between the look direction and the other directions. The crosstalk was suppressed by approximately 10 dB to 20 dB. We also conducted a subjective listening test of the sound image localization. The left/right correct answer rate was approximately 90%, and the exact match rate was approximately 40%.
Convention Paper 9768
P16-06 Normalization Schemes in Ambisonic: Does it Matter?
Thibaut Carpentier (Presenting Author)
In the context of Ambisonic processing, various normalizations of the spherical harmonic functions have been proposed in the literature and there is yet no consensus in the community about which one should be preferred (if any). This is a frequent source of confusion for the end users and this may lead to compatibility issues between rendering engines. This paper reviews the different conventions in use, presents an extension of the FuMa scheme to higher orders, and discusses possible pitfalls in the decoding stage.
Convention Paper 9769
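As one concrete case of the compatibility issue raised in P16-06, two widely used conventions, SN3D and N3D, differ only by a per-order factor of sqrt(2n + 1), where n is the Ambisonic order of the channel. The sketch below converts between them for channels in ACN order; the abstract does not single out these two schemes, and the function names are illustrative.

```python
import math

def acn_to_order(acn):
    """Ambisonic order n of a channel with ACN index acn = n*n + n + m."""
    return math.isqrt(acn)

def sn3d_to_n3d(channels):
    """Scale one sample per channel (ACN order) from SN3D to N3D."""
    return [v * math.sqrt(2 * acn_to_order(i) + 1)
            for i, v in enumerate(channels)]

def n3d_to_sn3d(channels):
    """Inverse scaling, from N3D back to SN3D."""
    return [v / math.sqrt(2 * acn_to_order(i) + 1)
            for i, v in enumerate(channels)]
```

Feeding SN3D material to a decoder that expects N3D (or the reverse) leaves the order-0 channel untouched but misweights every higher order, which is why the convention needs to be known at the decoding stage.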
P16-07 Perceptually Motivated Amplitude Panning (PMAP) for Accurate Phantom Image Localization
Hyunkook Lee (Presenting Author)
This paper proposes and evaluates a new constant-power amplitude-panning law named “Perceptually Motivated Amplitude Panning (PMAP).” The method is based on novel image shift functions that were derived from previous psychoacoustic experiments. The PMAP is also optimized for a loudspeaker setup with an arbitrary base angle using a novel phantom image localization model. Listening tests conducted using various sound sources suggest that, for the 60° base angle, the PMAP provides significantly better panning accuracy than the tangent law. For the 90° base angle, on the other hand, both panning methods perform equally well. The PMAP is considered to be useful for intelligent sound engineering applications, where an accurate matching between the target and perceived positions is important.
Convention Paper 9770
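For reference, the tangent law that P16-07 uses as its baseline relates the target angle φ and the loudspeaker half-angle φ0 through tan φ / tan φ0 = (gL − gR) / (gL + gR). The sketch below solves this for constant-power gains; it illustrates the baseline only, not PMAP, and the function name and sign convention (positive angles toward the left loudspeaker) are assumptions.

```python
import math

def tangent_law_gains(target_deg, base_angle_deg=60.0):
    """Constant-power (gL, gR) for a target angle within the loudspeaker base.

    target_deg is measured from the centre line, positive toward the left
    loudspeaker; base_angle_deg is the full angle between the loudspeakers.
    """
    half = math.radians(base_angle_deg / 2.0)
    r = math.tan(math.radians(target_deg)) / math.tan(half)
    norm = math.sqrt(2.0 * (1.0 + r * r))  # enforces gL**2 + gR**2 == 1
    return (1.0 + r) / norm, (1.0 - r) / norm
```

A target at the centre gives equal gains of 1/sqrt(2), and a target at the loudspeaker position drives that loudspeaker alone.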
P16-08 Full-Sphere Binaural Sound Source Localization by Maximum-Likelihood Estimation of Interaural Parameters
Benjamin Hammond (Presenting Author), Philip J. B. Jackson (Author)
Binaural recording technology offers an inexpensive, portable solution for spatial audio capture. In this paper a full-sphere 2D localization method is proposed that utilizes the Model-Based Expectation-Maximization Source Separation and Localization system (MESSL). The localization model is trained using a full-sphere head-related transfer function dataset and produces localization estimates by maximum-likelihood estimation of frequency-dependent interaural parameters. The model’s robustness is assessed using matched and mismatched HRTF datasets between test and training data, with environmental sounds and speech. Results show that the majority of sounds are estimated correctly with the matched condition in low noise levels; for the mismatched condition, a “cone of confusion” arises, albeit with effective estimation of lateral angles. Additionally, the results show a relationship between the spectral content of the test data and the performance of the proposed method.
Convention Paper 9771
P16-09 Spatial Quality and User Preference of Headphone Based Multichannel Audio Rendering Systems for Video Games: A Pilot Study
Joe Rees-Jones (Presenting Author), Damian Murphy (Author)
This paper presents a pilot experiment comparing the perceived spatial quality and preference of virtualized 7.0 surround-sound video game audio with a stereo down-mix of the same material. The benefits of multichannel audio in gaming are clear in that spatialized sound effects can be used to create immersive and dynamically reacting virtual environments, whilst also offering competitive advantages. However, results from this study suggest that the spatial quality of virtual 7.0 surround-sound is not perceived to be significantly different from that of a stereo down-mix and neither rendering method is preferred, based on feedback from 18 participants. These results are interesting but surprising, as they bring into question the current methods used for spatial game audio presentation over headphones.
Convention Paper 9772
Monday, May 22, 11:00 — 12:00 (Salon 2+3 Rome)
Jan Berg (Chair)
P17-01 Do In-Ear Monitors Protect Musicians’ Hearing?
Arne Nykänen (Presenting Author), Jan Berg (Author), Tomas Johannesson (Author), Magnus Löfdahl (Author)
In-ear monitors for live performances are commonly considered to give better sound quality than loudspeaker monitors. They are also often assumed to reduce sound exposure. Because of a lack of evidence for this, sound exposure for pop/rock/jazz musicians was compared between performances with in-ear and loudspeaker monitors. Equivalent sound pressure levels at the musicians’ ears were 94 to 105 dBA with loudspeaker monitors and 86 to 108 dBA with in-ear monitors. Many participants used earplugs when using loudspeaker monitors. Therefore, the recommendation, from a pure hearing protection perspective, is to use loudspeaker monitors and earplugs. However, the large spread in levels between musicians using in-ear monitors suggests that with better training and measurements of sound exposure, in-ear monitors could be used safely.
Convention Paper 9773
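The equivalent sound pressure levels quoted in P17-01 are energy averages, not arithmetic averages of dB values. As a minimal sketch of that standard calculation, the function below combines equally spaced short-term levels into one equivalent level; the function name and the assumption of equal-duration intervals are illustrative, and nothing here reproduces the authors' measurement procedure.

```python
import math

def equivalent_level_db(levels_db):
    """Energy-average a list of short-term levels (dB) of equal duration."""
    if not levels_db:
        raise ValueError("at least one level is required")
    mean_energy = sum(10.0 ** (l / 10.0) for l in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)
```

Because the average is taken over energy, a few loud intervals dominate the result: equal time at 80 dB and 100 dB yields about 97 dB, not 90 dB.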
P17-02 An Open-Source Audio Renderer for 3D Audio with Hearing Loss and Hearing Aid Simulations
Maria Cuevas-Rodriguez (Presenting Author), Carlos Garre (Author), Daniel Gonzalez-Toledo (Author), Luis Molina-Tanco (Author), Lorenzo Picinali (Author), David Poirier-Quinot (Author), Arcadio Reyes-Lecuona (Author), Ernesto de La Rubia-Buestas (Author)
The EU-funded 3D Tune-In (http://www.3d-tune-in.eu/) project introduces an innovative approach using 3D sound, visuals, and gamification techniques to support people using hearing aid devices. In order to achieve a high level of realism and immersiveness within the 3D audio simulations, and to allow for the emulation (within the virtual environment) of hearing aid devices and of different typologies of hearing loss, a custom open-source C++ library (the 3D Tune-In Toolkit) has been developed. The 3DTI Toolkit integrates several novel functionalities for speaker and headphone-based sound spatialization, together with generalized hearing aid and hearing loss simulators. A first version of the 3DTI Toolkit will be released with a non-commercial open-source license in Spring 2017.
Convention Paper 9774
Monday, May 22, 13:00 — 15:30 (Salon 1 Moscow)
Sean Olive (Chair)
P18-01 Sensory Profiling of High-End Loudspeakers Using Rapid Methods—Part 2: Projective Mapping with Expert and Naïve Assessors
Davide Giacalone (Presenting Author), Søren Bech (Author), Torstein Boðason (Author), Jakob Lund Laugesen (Author), Samuel Moulin (Author), Maciej Nitkiewicz (Author)
This is the second of a series of papers evaluating the efficiency of rapid sensory profiling methodologies in the audio field [1]. The present paper introduces projective mapping [2] as a method for perceptual audio evaluation and demonstrates its application for discrimination and description of a set of high-end loudspeakers. Additionally, the suitability of the method with both experts and naïve assessors was evaluated. The results showed a successful discrimination between the loudspeakers with the main differences primarily associated with bass strength and bass depth. A high degree of agreement was observed between perceptual configurations obtained separately by the expert and the naïve assessors, though the former outperformed the latter in the descriptive part of the method.
Convention Paper 9775
P18-02 Potential Audibility and Effects of Ultrasonic Surveillance Monitoring of PA and Life Safety Sound Systems
Peter Mapp (Presenting Author)
Ultrasonic surveillance monitoring, to check the operational integrity of PA and Emergency Communication Systems, has been in existence for over 40 years – particularly in Europe. Since its inception, there has been debate as to the potential audibility that these systems may have. As the vast majority of PA systems engineers and designers have not heard or experienced any effects, it has generally been assumed that the general public do not either. Recently, however, concern has been raised and claims of ill effects have been reported. There is, however, little or no data as to the ultrasonic sound levels that PA systems actually emit. The paper discusses the results of an initial survey of ultrasound radiated by a sample of some 50 PA systems and compares the results with a number of international standards – there currently being little or no specific guidance. The paper reviews the technology involved and typical emission levels, and concludes by making a number of recommendations to assist with the control of ultrasonic emissions from PA systems that should help to mitigate unintended side effects.
Convention Paper 9776
P18-03 Pink Noise Formant Bandwidth Discrimination
Tomira Rogala (Presenting Author)
This paper presents the results of the third part of an experiment aimed at determining discrimination thresholds for the timbre of pink noise modified by a formant. The investigated parameter was the Q factor (Q = f/Δf). Q = 3 was used as the reference and the comparison stimuli had Q > 3. A 3AFC test paradigm was used. The listeners, who were tonmeisters and non-musicians, were asked to indicate which noise burst in each group of three was the different one. The results indicate that: (1) the Q discrimination threshold as a function of formant frequency has a U shape, (2) tonmeisters discriminate Q changes better than non-musicians, and (3) all listeners improved their scores with practice. These results are consistent with those reported previously.
Convention Paper 9777
P18-04 The Influence of Program Material on Sound Quality Ratings of In-Ear Headphones
Sean Olive (Presenting Author), Omid Khonsaripour (Author), Todd Welti (Author)
A listening test was conducted to identify music programs that provide sensitive, discriminating, and reliable ratings for in-ear (IE) headphone evaluations. Ten trained listeners gave sound quality ratings for eight models of IE headphones using ten different music programs. A virtualized headphone method was used to provide double blind, controlled presentations in which headphone leakage effects were monitored and eliminated. The main effect on the sound quality ratings was due to headphones while the program produced no significant effects or interactions. However, certain programs produced more discriminating and reliable ratings than other programs, the key factors being the bandwidth of the program’s spectral content and the subject’s familiarity with it. As expected, the amount of bass content in each program tended to influence the ratings of headphones that had too much or too little bass output in their measured frequency response.
Convention Paper 9778
P18-05 Audio Quality Evaluation in MUSHRA Tests–Influences between Loop Setting and a Listeners’ Ratings
Nadja Schinkel-Bielefeld (Presenting Author)
In many listening tests for audio quality evaluation the listeners have the possibility to set loops, meaning they can focus on a smaller part of the audio signal and listen to that repeatedly. In previous papers we showed that experienced listeners set more loops and that learning to set loops increases the ability of the listener to perceive artifacts. Now we analyze to what extent the loops chosen vary from listener to listener and whether the ratings are influenced by the listener’s choice of loops. We show that, depending on the stimulus, listeners who set different loops may also rate significantly differently.
Convention Paper 9779
Monday, May 22, 15:00 — 18:00 (Gallery Window)
P19-01 Audio Time Stretching with Controllable Phase Coherence
Nicolas Juillerat (Presenting Author)
This paper presents a hybrid audio time stretching technique in which the trade-off between vertical and horizontal phase coherence can be freely controlled by a single parameter. Depending on that parameter, the proposed technique sounds like a time domain technique at one extreme, like a phase-locked vocoder at the other extreme, or anywhere in between. By properly choosing the value of the control parameter, it is possible to manually adjust the algorithm to the characteristics of the audio signal being transformed in order to get an optimal result. Furthermore, appropriate middle values yield good results for a wide range of audio signals with mixed content.
Convention Paper 9780
P19-02 Modelling Nonlinearities on Musical Instruments by Means of Volterra Series
Vanna Lisa Coli (Presenting Author), Francesco F. Gionfalo (Author), Lamberto Tronchin (Author)
The behavior of the soundboard of electroacoustic tools and musical instruments has been investigated for several years. The modelling of such instruments is fundamental in order to determine their acoustic characterization. The determination of nonlinear features of the sound production and propagation allows the definition of acoustical aspects that can’t be reproduced with methods based on linear impulse response. A method that allows approximating nonlinear distortions of musical instruments by exploiting the Volterra series model is presented. Matlab code has been developed in order to test the method on real-world audio signals. Results of applications are presented on a series of different wind instruments. Some sound examples are provided.
Convention Paper 9781
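As a generic illustration of the Volterra series model named in P19-02, the sketch below evaluates a discrete Volterra series truncated at second order: a linear convolution with kernel h1 plus a quadratic term with kernel h2. The truncation order, the kernel layout, and the function name are assumptions for illustration and are not taken from the authors' Matlab code.

```python
def volterra_second_order(x, h1, h2):
    """y[n] = sum_k h1[k] x[n-k] + sum_{k1,k2} h2[k1][k2] x[n-k1] x[n-k2].

    h1 has length M and h2 is an M-by-M nested list; samples before the
    start of x are treated as zero.
    """
    m = len(h1)
    y = []
    for n in range(len(x)):
        past = [x[n - k] if n - k >= 0 else 0.0 for k in range(m)]
        linear = sum(h1[k] * past[k] for k in range(m))
        quadratic = sum(h2[k1][k2] * past[k1] * past[k2]
                        for k1 in range(m) for k2 in range(m))
        y.append(linear + quadratic)
    return y
```

With h2 set to zero the model reduces to an ordinary FIR filter, which is the linear impulse response case that the abstract contrasts against.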
P19-03 The Influence of Source Spectrum and Loudspeaker Azimuth on Vertical Amplitude Panning
Maksims Mironovs (Presenting Author), Hyunkook Lee (Author)
Listening tests were conducted to examine the influence of source spectrum and loudspeaker azimuth on the accuracy of vertical amplitude panning. Subjects judged the perceived elevation of the phantom images created using vertical loudspeaker pairs placed at 0° and 30° azimuths. Six sound sources with different spectral characteristics were used: broadband, low-passed, and high-passed pink noises as well as speech, bird, and tank shot recordings. Results generally indicated that the localization accuracy was poor; however, lower or upper response biases observed in the results were found to be significantly dependent on the target panning angle, the stimuli, and the loudspeaker azimuth angle. In particular, the low-passed noise presented from the loudspeakers at 30° azimuth was perceived to be significantly elevated.
Convention Paper 9782
P19-04 Efficient Natural Sample Calculation for Digital Pulse Width Modulation
Carsten Wegner (Presenting Author), Dietmar Ehrhardt (Author), Robert Schwann (Author)
In this paper, an improved algorithm for natural sampling is presented that is suitable for digitally controlled fixed-frequency PWM modulators. With only 5 MAC operations, 4 multiplications, and 2 additions, the algorithm calculates both switching times for double-sided 3-level PWM, and offers more than 100 dB between signal and PWM-related distortion products for high fidelity audio applications. These features compare well with previously published results [2-6]. The algorithm can be combined with a noise-shaped local feedback for quantized pulse lengths, and the digital modulator can be integrated into a global feedback loop.
Convention Paper 9783
P19-05 Construction of Lightweight Loudspeaker Enclosures
Herle Bagh Juul-Nyholm (Presenting Author), Michael A. E. Andersen (Author), Niels Henrik Mortensen (Author), Henrik Schneider (Author), Jonas Corfitz Severinsen (Author)
On the basis of bass cabinets, this paper deals with the problem of reducing loudspeaker enclosure weight. An introductory market analysis emphasizes that lighter cabinets are sought, but that maintaining sound quality is vital. The problem is addressed through experiments and simulations in COMSOL Multiphysics, which indicate that reducing weight while maintaining sound quality is possible by reducing wall thickness and using adequate bracing and lining.
Convention Paper 9784
P19-06 LAMI: A Gesturally Controlled Three-Dimensional Stage Leap (Motion-Based) Audio Mixing Interface
Jonathan Wakefield (Presenting Author), Christopher Dewey (Author), William Gale (Author)
Interface designers are increasingly exploring alternative approaches to user input/control. LAMI is a Leap (Motion-based) AMI that takes the user’s hand gestures and maps these to a three-dimensional stage displayed on a computer monitor. Audio channels are visualized as spheres whose Y coordinate is spectral centroid and whose X and Z coordinates are controlled by hand position and represent pan and level respectively. Auxiliary send levels are controlled via wrist rotation and vertical hand position and visually represented as dial-like arcs. The channel EQ curve is controlled by manipulating a lathed column visualization. Design of LAMI followed an iterative design cycle with candidate interfaces rapidly prototyped, evaluated, and refined. LAMI was evaluated against Logic Pro X in a defined audio mixing task.
Convention Paper 9785
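The spectral centroid used for the Y coordinate in P19-06 above is the magnitude-weighted mean frequency of a signal frame. The sketch below shows one way to compute it with a naive DFT. The log-frequency mapping to a stage height, including its 20 Hz to 20 kHz range, is an illustrative assumption and is not taken from the paper.

```python
import cmath
import math


def spectral_centroid(samples, sample_rate):
    """Return the magnitude-weighted mean frequency (Hz) of a real frame."""
    n = len(samples)
    weighted_sum = 0.0
    magnitude_sum = 0.0
    # Naive DFT over the non-negative frequency bins (adequate for short frames).
    for k in range(n // 2 + 1):
        bin_value = sum(
            samples[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)
        )
        magnitude = abs(bin_value)
        weighted_sum += (k * sample_rate / n) * magnitude
        magnitude_sum += magnitude
    return weighted_sum / magnitude_sum if magnitude_sum > 0.0 else 0.0


def centroid_to_height(centroid_hz, low_hz=20.0, high_hz=20000.0):
    """Map a centroid to a 0..1 height on a log-frequency axis (assumed mapping)."""
    clamped = min(max(centroid_hz, low_hz), high_hz)
    return math.log(clamped / low_hz) / math.log(high_hz / low_hz)
```

A real-time interface would more likely use an FFT and smooth the centroid over time so that the sphere does not jitter.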
P19-07 OSPW (Open Signal Processing Workstation)–Development of a Stand-Alone Open Platform for Signal-Processing in AV-Networks
Holger Stenschke (Presenting Author), Clemens Fiechter (Author), Peter Glaettli (Author), Thomas Resch (Author), Roman Riedl (Author)
This paper presents the concept and design of a newly developed stand-alone, fully programmable signal processing platform for networked audio and music applications. In recognition of one of the first successful music DSP computation platforms, the ISPW [1], this prototype was named OSPW | Open Signal Processing Workstation. The first part of this paper describes the project's main objectives. The second part provides an overview of the OSPW system components, along with the technologies in use. The third part outlines proof-of-concept demo applications and gives an outlook as to potential user scenarios.
Convention Paper 9786
P19-08 Extending Temporal Feature Integration for Semantic Audio Analysis
Lazaros Vrysis (Presenting Author), Charalampos Dimoulas (Author), George Papanikolaou (Author), Nikolaos Tsipas (Author)
Semantic audio analysis has become a fundamental task in contemporary audio applications; consequently, further improvement and optimization of classification algorithms has also become a necessity. In recent years, standard frame-based audio classification methods have been optimized and modern approaches have introduced additional feature engineering steps, attempting to capture the temporal dependency between successive feature observations. This type of processing is known as Temporal Feature Integration. In this paper, the enhancement of statistical feature integration is proposed by extending and extensively evaluating the measures that can be deployed. Under this scope, new functions for capturing the shape of a texture window are introduced and evaluated. The ultimate goal of this work is to highlight the best performing measures for early temporal integration, focusing on simple feature engineering, avoiding complexity, and forming a compact and robust set of meta-features that can improve performance in audio classification tasks.
Convention Paper 9808
Monday, May 22, 15:30 — 16:30 (Salon 1 Moscow)
John Krivit (Chair)
P20-01 Withdrawn
Convention Paper 9787
P20-02 Facilitating Online International Student Collaborations Through Sound Design
Robert Steel (Presenting Author), Kenneth B. McAlpine (Author)
Cultural exchange and internationalization have grown hugely in significance within higher education in the last few years. In the broadest sense, this agenda is about preparing students for living in and contributing to an increasingly connected global society. At a time when the political and social trend seems to be towards exclusionism, exposing students to a vibrant blend of ideas, opinions and experiences within the stimulating yet safe space of university resonates all the more strongly. Historically, however, it has been difficult to encourage students to participate fully, particularly with regard to student mobility and studying abroad. Socio-economic and cultural factors play an important role here. Abertay and DePaul are both committed to widening participation and have a high proportion of first-generation students from the lowest socio-economic groups. Consequently, less than five percent of students at DePaul study abroad, and Scotland has one of the lowest student mobility rates in Europe, thus limiting opportunities for students to situate their learning within a global context. Recent developments in digital communications and platform sharing technologies have allowed universities to explore online collaboration and virtual exchange, but that raises new challenges. In particular, how do you embed a sense of genuine cultural exchange between students who are geographically remote and still enmeshed in their local culture? This paper explores one response to the problem, using a collaborative sound design project to build a strong sense of cultural exchange between students located at two universities, Abertay University in Scotland and DePaul in the USA.
Convention Paper 9788
Monday, May 22, 15:30 — 17:30 (Salon 2+3 Rome)
Elena Shabalina (Chair)
P21-01 Far-Field Noise Prediction for Open-Air Events. Part 1: Background and Propagation Models
Elena Shabalina (Presenting Author), Daniel Belcher (Author), Matthias Christner (Author), Jochen Schaal (Author), Dieter Zollitsch (Author)
In the past, the main focus of loudspeaker manufacturers and sound system designers was to provide the best possible sound quality for the listeners. With the number of open-air events increasing, along with the number of affected inhabitants and their complaints, the focus is shifting towards predicting and minimizing noise in the neighborhood as part of planning an open-air event. The presented calculation method is designed to close the gap between the environmental noise propagation models and complex loudspeaker system models. The implementations of the Nord2000 and ISO 9613-2 propagation models were extended to include complex loudspeaker setups. This paper presents the motivation and the theoretical background of the new prediction method.
Convention Paper 9790
P21-02 Noise Prediction Software for Open-Air Events Part 2: Experiences and Validation
Daniel Belcher (Presenting Author), Matthias Christner (Author), Elena Shabalina (Author)
The prediction and minimization of noise in the neighborhood during the planning phase of open-air events is becoming more important. Commonly available software for calculating environmental noise did not consider complex summation of sound because typical noise sources in traffic or industry are not coherent. State-of-the-art sound systems with arrays of loudspeakers and subwoofers effectively use coherence in order to achieve their high directivity. The propagation models were extended not only for complex summation, but also for the import of complex data from a system design tool (see Part 1 for details). This paper presents experiences with the simulation software NoizCalc in the field since its launch, its validation by means of a comparison with accompanying measurements, and a derivation of uncertainty, in order to set the informative value of a prediction into context.
Convention Paper 9791
P21-03 Development and Evaluation of an Interface with Four-Finger Pitch Selection
Henrik von Coler (Presenting Author), Hauke Egermann (Author), Gabriel Treindl (Author), Stefan Weinzierl (Author)
This paper presents the development and evaluation of an interface for electronic musical instruments, designed for controlling monophonic synthesizers. The hand-held device allows pitch selection with one hand, using four valve-like metal mechanics and three octave switches. Note events are triggered with a wooden excitation pad, operated with the second hand. The sensors are equipped with an advanced aftertouch, which enables expressive playing. In a user experiment, the controller is compared to a MIDI keyboard with regard to reaction time and error rate in simple tasks. Results show no significant difference in the response time but a higher error rate for the novel interface. The outcome of this work is a list of necessary improvements and a plan for further experiments.
Convention Paper 9792
P21-04 Interactive Display of Microphone Polarity Patterns with Non-Fixed Frequency Point
Jonathan D. Ziegler (Presenting Author), Hendrik Paukert (Author), Bernfried Runow (Author)
With the development of bidirectional and unidirectional microphones dating back to the 1930s, the parameter of directivity has been an integral aspect of microphone construction for nearly 100 years [1]. This characteristic is commonly visualized with the microphone’s sensitivity displayed as a radius r over a 360-degree span within a polar coordinate system. Measured directivity is generally shown as an overlay of well-defined frequencies [2]. Although this is common practice, in-depth analysis of the actual performance of a microphone is difficult. In this paper, a novel approach to displaying the directional characteristics of a microphone is presented, providing an interactive display of the angular sensitivity at any frequency. Furthermore, the application within microphone array development is discussed.
Convention Paper 9793
Tuesday, May 23, 09:00 — 10:30 (Salon 1 Moscow)
Akio Ando (Chair)
P22-01 Optimization of Temporally Diffuse Impulses for Decorrelation of Multiple Discrete Loudspeakers
Jonathan B. Moore (Presenting Author), Adam Hill (Author)
Temporally diffuse impulses (TDIs) were originally developed for large arrays of distributed mode loudspeakers to achieve even radiation patterns. This initial investigation evaluates the performance of TDIs in terms of the reduction of low frequency spatial variance across an audience area when used with conventional loudspeakers. A novel variable decay windowing method is presented, allowing users control of TDI performance and perceptibility. System performance is modelled using an anechoic and an image source acoustic model. Results in the anechoic model show a mean spatial variance reduction of 42%, with a range of source material and using the optimal TDI generation methodology. Results in the image source model are more variable, suggesting that coherence of source reflections reduces static TDI effectiveness.
Convention Paper 9794
P22-02 Seamless Spatial Calibration of Multichannel Sound Systems
Antoine Peillot (Presenting Author)
In a multichannel audio setup, spatial calibration aims at delivering an optimal sound experience at the listening position. Since the listener is not expected to stand at the focal point between the surrounding speakers, which are usually arbitrarily placed, the sweet spot needs to be focused at the listening position. To do so, proper gains and delays need to be applied to each channel composing the audio setup. The work presented in this paper provides a solution to automatically estimate and apply these parameters. It is based on joint user and speaker localization in a seamless way thanks to microphones embedded in surrounding speakers. A patent is currently pending for the spatial calibration method described in this paper.
Convention Paper 9795
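To illustrate the per-channel gains and delays mentioned in P22-02 above, here is a minimal sketch of a textbook alignment once the speaker-to-listener distances are known. It does not reproduce the paper's localization or estimation method. The speed of sound, the 1/r level decay, and the function name are assumptions made for the example.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def align_to_listener(distances_m):
    """Return (delays_s, gains) that align all speakers at the listener.

    The farthest speaker is the reference and gets zero delay and unity gain.
    Nearer speakers are delayed so that all wavefronts arrive together and
    attenuated assuming free-field 1/r level decay (an assumption).
    """
    farthest = max(distances_m)
    delays_s = [(farthest - d) / SPEED_OF_SOUND_M_S for d in distances_m]
    gains = [d / farthest for d in distances_m]
    return delays_s, gains
```

For example, speakers at 3.43 m and 1.715 m would give the nearer speaker a 5 ms delay and a gain of 0.5 under these assumptions.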
P22-03 Extraction of Interchannel Coherent Component from 3D Multichannel Audio
Yuta Hashimoto (Presenting Author), Akio Ando (Author), Hiroki Tanaka (Author)
Extraction of the interchannel coherent component is a useful method that is applicable to improving blurry sound images and setting up an upmix system. In this paper we propose a new method that extracts the component from a three-dimensional (3D) multichannel audio signal. Such a signal sometimes has a negative cross correlation among channels because it includes independent sounds propagated from different directions. To handle this problem, a new method is proposed to estimate the component of one channel signal from the other channel signals that have positive correlations with that signal in each subband. The experimental result showed that estimating the component from selected channel signals gave better performance than estimating it from all channel signals.
Convention Paper 9796
Tuesday, May 23, 10:30 — 12:30 (Salon 1 Moscow)
Will Howie (Chair)
P23-01 Subjective Evaluation of Orchestral Music Recording Techniques for Three-Dimensional Audio
Will Howie (Presenting Author), Florian Grond (Author), Richard King (Author), Denis Martin (Author)
A double-blind study was conducted to evaluate a recently developed microphone technique for three-dimensional orchestral music capture, optimized for 22.2 Multichannel Sound. The proposed technique was evaluated against a current 22.2 production standard for three-dimensional orchestral music capture, as well as a coincident, higher order ambisonics capture system: the Eigenmike. Analysis of the results showed no significant difference in listener evaluation between the proposed technique and the current production standard in terms of the subjective attributes “clarity,” “scene depth,” “naturalness,” “environmental envelopment,” and “quality of orchestral image.”
Convention Paper 9797
P23-02 Formal Usability Evaluation of Audio Track Widget Graphical Representation for Two-Dimensional Stage Audio Mixing Interface
Christopher Dewey (Presenting Author), Jonathan Wakefield (Author)
The two-dimensional stage paradigm (2DSP) has been suggested as an alternative audio mixing interface (AMI). This study seeks to refine the 2DSP by formally evaluating graphical track visualization styles. Track visualizations considered were text only, circles containing text, individually colored circles containing text, circles color coded by instrument type with text, icons with text superimposed, circles with RMS related dynamic opacity, and a traditional AMI. The usability evaluation focused on track selection efficiency and included user visualization preference for this micro-task. Test subjects were instructed to click five randomly selected tracks for a six, sixteen, and thirty-two track mix for each visualization. The results indicate that text only visualization is best for efficiency; however, test subjects preferred icons and the traditional AMI.
Convention Paper 9798
P23-03 In-Ear vs. Loudspeaker Monitoring for Live Sound and the Effect on Audio Quality Attributes and Musical Performance
Jan Berg (Presenting Author), Tomas Johannesson (Author), Magnus Löfdahl (Author), Arne Nykänen (Author)
A successful performance of live music is dependent on how well musicians can hear themselves and the other members of the ensemble. Sound reinforcement systems can offer monitoring either by on-stage loudspeakers or in-ear headphones. These two monitoring conditions were compared to search for perceived auditory differences that affect parts of musical performance. Four jazz/pop/rock bands made live performances where monitor sound was provided to the musicians. Each band repeated their performance, changing from one monitoring condition to the other. After every performance, the musicians responded to questionnaires covering musical performance and audio quality. Experts also assessed recordings of the performances. Results show that perceived differences exist in audio quality and musical performance between loudspeaker monitors and in-ear headphone monitors.
Convention Paper 9799
P23-04 Using a Speech Codec to Suppress Howling in Public Address Systems
David Ditter (Presenting Author), Edgar Berdahl (Author)
Acoustical feedback is present whenever a loudspeaker signal gets redirected to a microphone that feeds its input signal directly or indirectly back into the loudspeaker. If the gain around such a closed feedback loop is close to or higher than unity, unpleasant acoustical artifacts will typically occur and will nearly always lead to a periodic howling sound. Most readers are probably familiar with this noise, which can for example set in when a microphone is accidentally pointed at a speaker. This research project aims to suppress these unwanted effects of acoustical feedback by the insertion of a modified speech coder and decoder into the signal path of the feedback loop. It is demonstrated that the Speex open-source speech codec can be successfully tweaked to increase the maximum stable feedback gain by as much as 3 dB to 7 dB through adjustment of the codec’s quality parameter. This enhancement outperforms the simple introduction of shaped noise into the feedback loop and is compared with the performance of a frequency shifter. Tests are conducted using an automated experimental framework for determining the maximum stable gain of a public address system.
Convention Paper 9800
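The 3 dB to 7 dB figures in P23-04 above can be read as linear amplitude factors, and the "unity" loop gain in the abstract corresponds to 0 dB. The sketch below only performs that arithmetic. The function names are illustrative, and the stability check is a simplification based on loop gain magnitude alone.

```python
def db_to_amplitude_ratio(gain_db):
    """Convert a gain in decibels to a linear amplitude ratio."""
    return 10.0 ** (gain_db / 20.0)


def loop_gain_below_unity(loop_gain_db):
    """Return True when the loop gain magnitude is below unity (0 dB).

    Simplified criterion: a real system also depends on the loop phase
    at each frequency.
    """
    return loop_gain_db < 0.0
```

Under this reading, 3 dB of extra maximum stable gain is an amplitude factor of about 1.41 and 7 dB is a factor of about 2.24.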
Tuesday, May 23, 10:30 — 12:00 (Salon 2+3 Rome)
Annika Neidhardt (Chair)
P24-01 Usability and Effectiveness of Auditory Sensory Substitution Models for the Visually Impaired
Adam Csapo (Presenting Author), Michal Bujacz (Author), Marcelo Herrera Martinez (Author), Gabriel Ivanica (Author), Maciej Janeczek (Author), Alin Moldoveanu (Author), Simone Spagnol (Author), Runar Unnthorsson (Author), György Wersényi (Author)
This paper focuses on auditory sensory substitution for providing visually impaired users with suitable information in both static scene recognition and dynamic obstacle avoidance. We introduce three different sonification models together with three temporal presentation schemes, i.e., ways of temporally organizing the sonic events in order to provide suitable information. Following an overview of the motivation and challenges behind each of the solutions, we describe their implementation and an evaluation of their relative strengths and weaknesses based on a set of experiments in a virtual environment.
Convention Paper 9801
P24-02 Adaptive Audio Engine for EEG-Based Horror Game
Jordan Craig (Presenting Author)
This paper documents the design and play-testing of a videogame that incorporates electroencephalography (EEG) technology to augment traditional controls. A survival horror game was created using Unity3D. The player navigates the game using conventional keyboard and mouse movement; however, they also wear an Emotiv EPOC headset that transmits their level of calm to the game via OSC. In order to complete the game, the player must remain as calm as possible. An adaptive audio engine was developed to act as an auditory display for this complex parameter in lieu of a distracting visual indicator. Every element of the audio was designed to adapt to the constantly fluctuating value. Procedural audio modules were created in Max, where player EEG data was simultaneously mapped to a myriad of modulators. FMOD Studio was used for non-procedural elements due to its facilitation of real-time control parameters, as well as its integration with Unity3D.
Convention Paper 9802
This paper will not be presented but is available in the E-Library
P24-03 Real-Time Reverb Reduction for Improved Automatic Speech Recognition in Far-Field
Adam Kupryjanow (Presenting Author), Lukasz Kurylo (Author), Piotr Lasota (Author), Przemyslaw Maziewski (Author)
In this paper, methods of real-time reverb reduction based on Generalized Weighted Prediction Error (GWPE) are presented. It is shown that the proposed audio processing routines substantially improve the accuracy of an Automatic Speech Recognition (ASR) system: word error rates (WERs) are reduced by 11.36% when the user stands 5 meters from the microphone array. The obtained results are close to those achieved by the offline GWPE implementation (12.06%). Thanks to optimizations and parameter tuning, the computational complexity of the proposed realization of GWPE is greatly reduced, and it achieves real-time factors (RTFs) lower than 1.0 (computation time is shorter than signal duration) when using one CPU core.
Convention Paper 9803
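The real-time factor quoted in P24-03 above is the ratio of computation time to signal duration. The following minimal sketch makes that definition concrete. The function names are illustrative and no timing data from the paper is assumed.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Return computation time divided by the duration of the processed audio."""
    if audio_seconds <= 0.0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def runs_in_real_time(rtf):
    """An RTF below 1.0 means processing finishes faster than the audio plays."""
    return rtf < 1.0
```

For instance, a routine that needs 0.5 s to process 2 s of audio has an RTF of 0.25.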
Tuesday, May 23, 13:00 — 14:30 (Salon 1 Moscow)
Jan Abildgaard Pedersen (Chair)
P25-01 Amplitude Panning between Beamforming-Controlled Direct and Reflected Sound
Franck Zagala (Presenting Author), Matthias Frank (Author), Julian Linke (Author), Franz Zotter (Author)
Loudspeaker beamformers such as commercial sound bars can be used to produce narrow beams of sound that mainly reach the listener on distinct reflection paths or the direct path in the room. What happens if such variable directivity loudspeakers create two simultaneous beams of sound with the same signal, each pointing along a different acoustic path in the room? What is the resulting perceived direction of such a phantom source, and how do changes of time and level differences in the signal pair affect the result? This paper investigates these questions by means of a listening experiment that employs an auralized third-order source.
Convention Paper 9805
P25-02 Sound Zones: On the Effect of Ambient Temperature Variations in Feed-Forward Systems
Martin Olsen (Presenting Author), Martin Bo Møller (Author)
The precondition for realizing personal sound zones, relying on multichannel feed-forward control, is the robustness in the characterization of the sound field inside the control regions. Achieving high separation depends on the ability to accurately estimate the acoustic transfer functions from a set of control loudspeakers to the zones. In this paper the assessment of ambient temperature variations is based on a front-to-rear scenario at low frequencies in a car cabin. Experimental studies in a production vehicle show a significant performance decrease when the temperature conditions in the playback situation differ from those present during the setup procedure. The main cause of the mismatch in the two sets of acoustic transfer functions is analyzed and potential compensation strategies are discussed accordingly.
Convention Paper 9806
P25-03 Assessing the Influence of Loudspeaker Driver Nonlinear Distortion on Personal Sound Zones
Xiaohui Ma (Presenting Author), Patrick J. Hegarty (Author), Lars G. Johansen (Author), Jakob Juul Larsen (Author), Jan Abildgaard Pedersen (Author)
The impacts of loudspeaker nonlinear distortion on sound zones are measured in an anechoic chamber. Two loudspeaker arrays, each with four equally spaced drivers, are used to generate two sound zones, one bright and one dark. Acoustic contrast control (ACC) and planarity control (PC) are employed as control methods. A 250 Hz sinusoidal signal is used as stimulus, and the target sound pressure level for the bright zone is 82 dB. Simulations based on measured transfer functions give an acoustic contrast of 43.1 dB between the two zones, whereas the experimentally measured acoustic contrast is only 32.1 dB for ACC and 29.3 dB for PC. Nonlinear distortion contributes to this contrast loss according to spectrum measurements. Experiments also reveal that the nonlinear distortion can be controlled through regularization of the control effort; the regularization parameter has an optimal value that balances the acoustic contrast and nonlinear distortion.
Convention Paper 9807
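Acoustic contrast, as quoted in P25-03 above, is commonly defined as the ratio of mean squared pressure in the bright zone to that in the dark zone, expressed in decibels. The sketch below assumes that common definition and simple lists of sampled pressures. The function names are illustrative. The contrast losses are then just differences of the figures quoted in the abstract.

```python
import math


def acoustic_contrast_db(bright_pressures, dark_pressures):
    """Return 10*log10 of the mean squared pressure ratio, bright over dark."""
    bright_energy = sum(abs(p) ** 2 for p in bright_pressures) / len(bright_pressures)
    dark_energy = sum(abs(p) ** 2 for p in dark_pressures) / len(dark_pressures)
    return 10.0 * math.log10(bright_energy / dark_energy)


def contrast_loss_db(simulated_db, measured_db):
    """Difference between simulated and measured contrast, in dB."""
    return simulated_db - measured_db


# Figures quoted in the abstract: 43.1 dB simulated, 32.1 dB (ACC), 29.3 dB (PC).
acc_loss_db = contrast_loss_db(43.1, 32.1)  # about 11.0 dB
pc_loss_db = contrast_loss_db(43.1, 29.3)   # about 13.8 dB
```

The function accepts complex pressures as well, since only the magnitudes enter the energy sums.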