AES New York 2019
Engineering Brief Details
EB1 - Recording and Production
Friday, October 18, 9:00 am — 10:15 am (1E11)
Tomasz Zernicki, Zylia sp. z o.o. - Poznan, Poland
EB1-1 Recording and Mixing of Classical Music Using Non-Adjacent Spherical Microphone Arrays and Audio Source Separation Algorithms—Eduardo Patricio, Zylia Sp. z o.o. - Poznan, Poland; Mateusz Skrok, Zylia Sp. z o.o. - Poznan, Poland; Tomasz Zernicki, Zylia sp. z o.o. - Poznan, Poland
The authors present a novel approach to recording classical music using non-adjacent 3rd-order Ambisonics microphone arrays. The flexible combination of source-separated signals with varying degrees of beamforming focus enables independent level control while maintaining the spatial coherence and reverberation qualities of the recorded spaces. The non-coincident placement of the multiple arrays allows post-production manipulation without disrupting the inherent logic of classical music, which values the overall sound over individual sound sources. In addition, the method employs portable, lightweight equipment to record decorrelated signals that can be mixed in surround formats with an enhanced sense of depth.
Engineering Brief 525 (Download now)
EB1-2 Exploring Preference for Multitrack Mixes Using Statistical Analysis of MIR and Textual Features—Joseph Colonel, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
We investigate listener preference in multitrack music production using the Mix Evaluation Dataset, comprising 184 mixes across 19 songs. Features are extracted from verses and choruses of stereo mixdowns. Each observation is associated with an average listener preference rating and the standard deviation of preference ratings. Principal component analysis is performed to analyze how mixes vary within the feature space. We demonstrate that virtually no correlation is found between the embedded features and either the average or the standard deviation of preference. We instead propose using principal component projections as a semantic embedding space by associating each observation with listener comments from the Mix Evaluation Dataset. Initial results disagree with simple descriptions such as “width” or “loudness” for the principal component axes.
Engineering Brief 526 (Download now)
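The feature-space analysis described in EB1-2 can be illustrated with a minimal PCA sketch. This is not the authors' code; the feature matrix below is synthetic, standing in for the MIR features extracted from the 184 stereo mixdowns.

```python
import numpy as np

# Synthetic feature matrix standing in for MIR features of the mixes:
# rows = mix observations, columns = features (values are illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(184, 6))

# Standardize each feature, then perform PCA via the SVD
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)

explained = s**2 / np.sum(s**2)  # variance ratio per principal component
projections = Xs @ Vt.T          # each mix's coordinates in PC space
```

Listener comments could then be attached to each row of `projections` to probe the semantic meaning of the axes, as the brief proposes.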
EB1-3 Machine Learning Multitrack Gain Mixing of Drums—Dave Moffat, Queen Mary University London - London, UK; Mark Sandler, Queen Mary University of London - London, UK
There is a body of work in the field of intelligent music production covering a range of specific audio effects. However, there is a distinct lack of purely machine-learning approaches to automatic mixing, possibly due to a lack of suitable data. This paper presents an approach that uses human-produced audio mixes, along with their source multitracks, to produce a set of mix parameters. The focus is entirely on gain mixing of audio drum tracks. Using existing reverse engineering of music production gain parameters, a target mix gain parameter is identified, and these results are fed into a number of machine learning algorithms along with audio feature vectors for each track. This allows a machine-learning prediction approach to audio gain mixing. A random forest is used to perform multiple-output prediction, and its results are compared with a number of other published automatic gain-mixing approaches. The results demonstrate that the random forest gain-mixing approach performs similarly to a human engineer and outperforms the existing gain-mixing approaches.
Engineering Brief 527 (Download now)
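As a rough illustration of the multiple-output random forest prediction described in EB1-3, the sketch below trains scikit-learn's RandomForestRegressor on synthetic data. The feature and gain values are invented stand-ins, not the paper's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-ins: one row per drum recording, 8 audio features each;
# targets are 4 per-track mix gains (dB), as if reverse-engineered from
# human mixes. All values here are illustrative only.
rng = np.random.default_rng(1)
features = rng.normal(size=(200, 8))
gains_db = -6.0 + 3.0 * rng.normal(size=(200, 4))

# RandomForestRegressor handles multiple outputs natively
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(features, gains_db)

new_tracks = rng.normal(size=(5, 8))
predicted_gains = model.predict(new_tracks)  # shape (5, 4)
```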
EB1-4 Why Microphone Arrays Are Not Better than Single-Diaphragm Microphones with Regard to Their Single Channel Output Quality—Helmut Wittek, SCHOEPS Mikrofone GmbH - Karlsruhe, Germany; Hannes Dieterle, SCHOEPS Mikrofone GmbH - Karlsruhe, Germany
A comparison of the directional characteristics of single-diaphragm microphones vs. multi-microphone arrays is performed on the basis of frequency response and polar diagram measurements. The simple underlying question was: Is a conventional first-order pressure-gradient microphone better than an M/S array or an Ambisonics microphone with regard to the quality of its individual outputs? The study reveals significant differences and a clear superiority of single-diaphragm microphones in the smoothness of on- and off-axis curves, which is believed to correlate strongly with timbral fidelity. Array microphones, on the other hand, can potentially create variable patterns and higher-order directivity. [Presentation only; not in E-Library]
EB1-5 Predicting Objective Difficulty in Peak Identification Task of Technical Ear Training—Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan; Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan
Technical ear training is a method for improving the ability to focus on a specific sound attribute and to communicate using the vocabulary and units shared in audio engineering. In designing a successful course at a sound engineering educational institution, it is essential to increase task difficulty gradually. In this e-Brief the authors investigate a predictive model of the objective difficulty of a given music excerpt when it is used in a peak identification task of technical ear training. Models consisting of six or seven acoustic features, including statistics on attack transients and the power spectrum, showed the best overall results.
Engineering Brief 565 (Download now)
EB2 - Posters: Applications in Audio
Friday, October 18, 9:00 am — 10:30 am (South Concourse A)
EB2-2 A Latency Measurement Method for Networked Music Performances—Robert Hupke, Leibniz Universität Hannover - Hannover, Germany; Sripathi Sridhar, New York University - New York, NY, USA; Andrea Genovese, New York University - New York, NY, USA; Marcel Nophut, Leibniz Universität Hannover - Hannover, Germany; Stephan Preihs, Leibniz Universität Hannover - Hannover, Germany; Tom Beyer, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA; Jürgen Peissig, Leibniz Universität Hannover - Hannover, Germany
New York University and Leibniz University Hannover are working on future immersive Networked Music Performances. One of the biggest challenges of audio data transmission over IP-based networks is latency, which can affect the interplay of the participants. In this contribution two metronomes, using the Global Positioning System to generate a globally synchronized click signal, were used to determine delay times in the data transmission between the two universities with high precision. The aim of this first study is to validate the proposed method by obtaining insights into transmission latency as well as latency fluctuations and asymmetries. The work also serves as a baseline for future studies and helps establish an effective connection between the two institutions.
Engineering Brief 529 (Download now)
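The delay measurement idea in EB2-2, comparing a received click signal against a globally synchronized reference, can be sketched with a cross-correlation estimate. The click rate, sample rate, and delay below are assumptions for illustration, not values from the study.

```python
import numpy as np

fs = 48000                      # assumed sample rate
ref = np.zeros(fs)              # one second of reference click signal
ref[::fs // 2] = 1.0            # GPS-synchronized clicks every 0.5 s
true_delay = 960                # assumed 20 ms one-way network delay

# The received signal is the reference delayed by the network
received = np.concatenate([np.zeros(true_delay), ref])[:fs]

# Estimate the delay as the lag that maximizes the cross-correlation.
# Note: delays longer than the click period would be ambiguous.
corr = np.correlate(received, ref, mode="full")
lag = int(np.argmax(corr)) - (len(ref) - 1)
latency_ms = 1000.0 * lag / fs
```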
EB2-3 An Investigation into the Effectiveness of Room Adaptation Systems: Listening Test Results—Pei Yu, Nanjing University - Nanjing, China; Ziyun Liu, Nanjing University - Nanjing, China; Shufeng Zhang, Nanjing University - Nanjing, China; Yong Shen, Nanjing University - Nanjing, Jiangsu Province, China
Loudspeaker-room interactions are well known to affect the perceived sound quality of low frequencies. To address this problem, different room adaptation systems for adapting a loudspeaker to its acoustic environment have been developed. In this study two listening tests were performed to assess the effectiveness of four room adaptation systems under different circumstances. The factors investigated include the listening room, loudspeaker, listening position, and listener. The results indicate that listeners’ preference for different adaptation systems depends on the specific acoustic environment; the adaptation system based on acoustic power measurement was the most preferred and also performed most consistently.
Engineering Brief 530 (Download now)
EB2-4 Evaluating Four Variants of Sine Sweep Techniques for Their Resilience to Noise in Room Acoustic Measurements—Eric Segerstrom, Rensselaer Polytechnic Institute - Troy, NY, USA; Ming-Lun Lee, University of Rochester - Rochester, NY, USA; Steve Philbert, University of Rochester - Rochester, NY, USA
The sine sweep is one of the most effective methods for measuring room impulse responses; however, ambient room noise or unpredictable impulsive noises can degrade the measurement. This study evaluates four variants of sine sweep techniques for their resilience to noise when used as excitation signals in room impulse response measurements: linear, exponential, noise-whitened, and minimum-noise. The results show that in a pseudo-anechoic environment, exponential and linear sweeps are the most resilient to impulsive noise among the four, while none of the evaluated sweeps are resilient to impulsive noise in an acoustically untreated room. Additionally, minimum-noise sweeps are shown to be the most resilient to ambient noise.
Engineering Brief 531 (Download now)
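For readers unfamiliar with the excitation signals compared in EB2-4, the sketch below generates an exponential (log) sine sweep and its inverse filter in the style of Farina's method. The sweep range and duration are arbitrary choices, and the noise-whitened and minimum-noise variants are not shown.

```python
import numpy as np

fs = 48000
f1, f2, T = 20.0, 20000.0, 4.0            # sweep band and duration (arbitrary)
t = np.arange(int(T * fs)) / fs
L = T / np.log(f2 / f1)

# Exponential (log) sine sweep
sweep = np.sin(2.0 * np.pi * f1 * L * (np.exp(t / L) - 1.0))

# Inverse filter: time-reversed sweep with an amplitude envelope that
# compensates the sweep's pink spectral tilt
inverse = sweep[::-1] * np.exp(-t / L)

# Convolving a recorded response with `inverse` yields the room impulse
# response; harmonic distortion products separate out ahead of it.
```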
EB2-5 Perceptually Affecting Electrical Properties of Headphone Cable – Factor Hunting Approach—Akihiko Yoneya, Nagoya Institute of Technology - Nagoya, Aichi-pref., Japan
An approach for finding the cause of perceptual sound-quality changes attributable to headphone cables is proposed: candidate factors are selected from measurement results, simulated by digital signal processing, and the simulated sounds are evaluated by listening, thereby verifying the validity of each candidate. For the headphone cable studied, the decisive factor was found to be a change in inductance caused by the flowing current. The experimental results make clear that changes in transfer characteristics affect perceptual sound quality very sensitively.
Engineering Brief 532 (Download now)
EB2-6 An Investigation into the Location and Number of Microphone Measurements Necessary for Efficient Active Control of Low-Frequency Sound Fields in Listening Rooms—Tom Bell, Bowers & Wilkins - Southwater, West Sussex, UK; University of Southampton - Southampton, Hampshire, UK; Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK
The purpose of this investigation is to determine the minimum number of control microphone measurements needed, and their optimal placement, to achieve effective active control of the low-frequency sound field over a listening area in a rectangular room. An analytical method was used to model the transfer functions between the loudspeakers and a 3-dimensional array of 75 virtual microphones. A least-squares approach was used to create one filter per sound source from a varying number and arrangement of these measurements, with the goal of minimizing the error between the reproduced and target sound fields. The investigation shows that once enough measurements are taken there is a clear diminishing return in the effectiveness of the filters versus the number of measurements used. [Presentation only; not in E-Library]
EB2-7 Measuring Speech Intelligibility Using Head-Oriented Binaural Room Impulse Responses—Allison Lam, Tufts University - Medford, MA, USA; Ming-Lun Lee, University of Rochester - Rochester, NY, USA; Steve Philbert, University of Rochester - Rochester, NY, USA
Speech intelligibility/speech clarity is important in any setting in which information is verbally communicated. More specifically, a high level of speech intelligibility is crucial in classrooms to allow teachers to effectively communicate with their students. Given the importance of speech intelligibility in learning environments, several studies have analyzed how accurately the standard method of measuring clarity predicts the level of speech intelligibility in a room. In the context of speech measurements, C50 has been widely used to measure clarity. Instead of using a standard omnidirectional microphone to record room impulse responses for clarity measurements, this study examines the effectiveness of room impulse responses measured with a binaural dummy head. The data collected for this experiment show that C50 measurements differ between the left and right channels by varying amounts based on the dummy head’s position in the room and head orientation. To further investigate the effectiveness of binaural C50 measurements in comparison to the effectiveness of omnidirectional C50 measurements, this research explores the results of psychoacoustic testing to determine which recording method more consistently predicts human speech intelligibility. These results, combined with qualitative observations, predict how precisely acousticians are able to measure C50.
Engineering Brief 533 (Download now)
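The clarity index C50 used in EB2-7 is the ratio, in dB, of impulse-response energy arriving within the first 50 ms to the energy arriving later. A minimal sketch, using a synthetic decaying impulse response in place of a real measurement:

```python
import numpy as np

def c50(ir, fs):
    """Clarity index C50 in dB: early (first 50 ms) vs. late energy."""
    n50 = int(0.050 * fs)
    early = np.sum(ir[:n50] ** 2)
    late = np.sum(ir[n50:] ** 2)
    return 10.0 * np.log10(early / late)

# Synthetic exponentially decaying noise as a stand-in for a measured RIR
fs = 48000
t = np.arange(fs) / fs
ir = np.random.default_rng(2).normal(size=fs) * np.exp(-t / 0.2)

val = c50(ir, fs)
# For binaural measurements, evaluate each ear's RIR separately and
# compare the left/right C50 values across head orientations.
```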
EB2-8 Compensation Filters for Excess Exciter Excursion on Flat-Panel Loudspeakers—David Anderson, University of Pittsburgh - Pittsburgh, PA, USA
Inertial exciters are used to actuate a surface into bending vibration, producing sound, but often have a high-Q resonance that can cause the exciter magnet to displace enough to contact the bending panel. The magnet contacting the panel can cause distortion and possibly even damage to the exciter or panel while having a minimal contribution to acoustic output. A method is outlined for deriving a digital biquad filter to cancel out the excessive displacement of the magnet based on measurements of the exciter’s resonant frequency and Q-factor. Measurements of exciter and panel displacement demonstrate that an applied filter reduces magnet excursion by 20 dB at the resonant frequency.
Engineering Brief 534 (Download now)
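One plausible reading of the compensation filter in EB2-8 is a biquad whose zeros cancel the measured high-Q resonance and whose poles re-insert it at a heavily damped Q. The resonance frequency and Q values below are assumed for illustration, not taken from the paper.

```python
import numpy as np
from scipy.signal import bilinear, freqz

# Assumed measurements of the exciter resonance (not from the paper)
f0, q_measured = 180.0, 12.0   # high-Q resonance to cancel
q_target = 0.7                 # desired damped behavior
fs = 48000

w0 = 2.0 * np.pi * f0
# Analog prototype: zeros cancel the measured resonance, poles
# re-insert it at the much lower target Q
b_s = [1.0, w0 / q_measured, w0**2]
a_s = [1.0, w0 / q_target, w0**2]
b, a = bilinear(b_s, a_s, fs)  # digital biquad coefficients

# Gain at resonance is roughly 20*log10(q_target / q_measured), i.e.
# about -25 dB of excursion reduction for these assumed values
w, h = freqz(b, a, worN=8192, fs=fs)
dip_db = 20.0 * np.log10(np.abs(h[np.argmin(np.abs(w - f0))]))
```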
EB3 - Posters: Spatial Audio
Friday, October 18, 11:00 am — 12:30 pm (South Concourse A)
EB3-1 Comparing Externalization Between the Neumann KU100 Versus Low Cost DIY Binaural Dummy Head—Kelley DiPasquale, SUNY Potsdam, Crane School of Music - Potsdam, NY, USA
Music is usually recorded using traditional microphone techniques. With technology continually advancing, binaural recording, in which two microphones are used to create a three-dimensional stereo image, has become more popular. Commercially available binaural heads are prohibitively expensive and not practical for typical educational environments or for casual use in a home studio. This experiment gathered recorded stimuli with a homemade binaural head and the Neumann KU 100. The recordings were played back for 34 subjects instructed to rate the level of externalization of each example. The study investigates whether a homemade binaural head built for under $500 can externalize sound as well as a commercially available binaural head, the Neumann KU 100.
Engineering Brief 535 (Download now)
EB3-2 SALTE Pt. 1: A Virtual Reality Tool for Streamlined and Standardized Spatial Audio Listening Tests—Daniel Johnston, University of York - York, UK; Benjamin Tsui, University of York - York, UK; Gavin Kearney, University of York - York, UK
This paper presents SALTE (Spatial Audio Listening Test Environment), an open-source framework for creating spatial audio perceptual tests within virtual reality (VR). The framework incorporates standard test paradigms such as MUSHRA, 3GPP TS 26.259, and audio localization. A simplified drag-and-drop user interface facilitates rapid and robust construction of customized VR experimental environments within Unity3D without prior knowledge of the game engine or the C# language. All audio is rendered by the dedicated SALTE audio renderer, which is controlled by dynamic participant data sent via Open Sound Control (OSC). Finally, the software can export all experimental conditions, such as visuals, participant interaction mechanisms, and test parameters, allowing streamlined, standardized, and comparable data within and between organizations.
Engineering Brief 536 (Download now)
EB3-3 SALTE Pt. 2: On the Design of the SALTE Audio Rendering Engine for Spatial Audio Listening Tests in VR—Tomasz Rudzki, University of York - York, UK; Chris Earnshaw, University of York - York, UK; Damian Murphy, University of York - York, UK; Gavin Kearney, University of York - York, UK
The dedicated audio rendering engine for conducting listening experiments using the SALTE (Spatial Audio Listening Test Environment) open-source virtual reality framework is presented. The renderer can be used for controlled playback of Ambisonic scenes (up to 7th order) over headphones and loudspeakers. Binaural-based Ambisonic rendering facilitates the use of custom HRIRs contained within separate WAV files or SOFA files as well as head tracking. All parameters of the audio rendering software can be controlled in real-time by the SALTE graphical user interface. This allows for perceptual evaluation of Ambisonic scenes and different decoding schemes using custom HRTFs.
Engineering Brief 537 (Download now)
EB3-4 Mixed Reality Collaborative Music—Andrea Genovese, New York University - New York, NY, USA; Marta Gospodarek, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA
This work illustrates a virtual collaborative experience between a real-time musician and virtual game characters based on pre-recorded performers. A dancer and percussionists were recorded with microphones and a motion capture system so that their data could be converted into game avatars reproducible within VR/AR scenes. The live musician was also converted into a virtual character and rendered in VR, and the whole scene was observable by an audience wearing HMDs. The acoustic character of the live and pre-recorded audio was matched in order to blend the music into a cohesive mixed reality scene and meet the viewer's expectations set by the real-world elements. [Presentation only; not available in E-Library]
EB3-6 Field Report: Immersive Recording of a Wind Ensemble Using Height Channels and Delay Compensation for a Realistic Playback Experience—Hyunjoung Yang, McGill University - Montréal, QC, Canada; Alexander Dobson, McGill University - Montreal, QC, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada
Practical examples of orchestral recording in stereo or surround are relatively easy to obtain, whereas documented recording practice in immersive audio remains limited. This paper shares the experience of an immersive recording of a wind orchestra at McGill University: the concerns that had to be considered before planning the concert recording, the problems encountered during planning, and the solutions to these issues. Finally, the approach and the final result are discussed.
Engineering Brief 538 (Download now)
EB4 - Posters: Recording and Production
Friday, October 18, 3:30 pm — 5:00 pm (South Concourse A)
EB4-1 A Comparative Pilot Study and Analysis of Audio Mixing Using Logic Pro X and GarageBand for iOS—Jiayue Cecilia Wu, University of Colorado Denver - Denver, CO, USA; Orchisama Das, Center for Computer Research in Music and Acoustics (CCRMA), Stanford University - Stanford, CA, USA; Vincent DiPasquale, University of Colorado Denver - Denver, CO, USA
In this pilot study we compare two mixes of a song, one done with GarageBand on iOS and one with Logic Pro X, in a professional studio environment. The audio tracks were recorded and mastered in the same controlled environment by the same engineer. A blind listening survey was given to 10 laypersons and 10 professional studio engineers with at least 10 years of related experience: 80% of the laypersons and 60% of the professionals preferred the Logic Pro X version. To compare the two productions further, we look at (1) short-term perceptual loudness to quantify dynamic range and (2) power spectral densities in different frequency bands to quantify EQ. The analysis provides evidence backing the survey results. The purpose of this study is to examine how, in a real-life scenario, a professional studio engineer can produce the best results using the best plugins, effects, and tools available in the GarageBand on iOS and Logic Pro X environments, and how the two results are perceived by both general audiences and professional audio experts.
Engineering Brief 539 (Download now)
EB4-2 The ANU School of Music Post-Production Suites: Design, Technology, Research, and Pedagogy—Samantha Bennett, Australian National University - Canberra, Australia; Matt Barnes, Australian National University - Canberra, Australia
This engineering brief considers the design, construction, technological capacity, research, and pedagogical remit of two post-production suites built at the ANU School of Music. The suites were constructed simultaneously with the recording studio refurbishment detailed in AES e-Brief #397 (2017). This new e-Brief first considers the intention and purpose behind splitting a single large control room into two separate, versatile post-production spaces. Secondly, it focuses on design and construction, with consideration given to acoustic treatment, functionality, ergonomic workflow, and aesthetics. It also examines technological capacity and the benefits of built-in limitations. Finally, the post-production suites are considered in the broader context of the School's research and pedagogical activities.
Engineering Brief 540 (Download now)
EB4-3 A Case Study of Cultural Influences on Mixing Preference—Targeting Japanese Acoustic Major Students—Toshiki Tajima, Kyushu University - Fukuoka, Japan; Kazuhiko Kawahara, Kyushu University - Fukuoka, Japan
There is no clear rule for the process of mixing in popular music production, so even with the same source material different mix engineers may arrive at completely different mixes. To address this highly multidimensional problem, listening experiments on mixing preference have been conducted in Europe and North America in previous studies. In this study additional experiments targeting Japanese students majoring in acoustics were conducted in an acoustically treated listening room, and the data were integrated with the previous results and analyzed together. The results show a tendency for both British and Japanese students to prefer (or dislike) the same engineers’ work. Furthermore, an analysis of verbal descriptions of the mixes revealed that both groups paid most attention to similar listening points, such as “vocal” and “reverb.”
Engineering Brief 541 (Download now)
EB4-4 A Dataset of High-Quality Object-Based Productions—Giacomo Costantini, University of Southampton - Southampton, UK; Andreas Franck, University of Southampton - Southampton, Hampshire, UK; Chris Pike, BBC R&D - Salford, UK; University of York - York, UK; Jon Francombe, BBC Research and Development - Salford, UK; James Woodcock, University of Salford - Salford, UK; Richard J. Hughes, University of Salford - Salford, Greater Manchester, UK; Philip Coleman, University of Surrey - Guildford, Surrey, UK; Eloise Whitmore, Naked Productions - Manchester, UK; Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK
Object-based audio is an emerging paradigm for representing audio content. However, the limited availability of high-quality object-based content and the need for usable production and reproduction tools impede the exploration and evaluation of object-based audio. This engineering brief introduces the S3A object-based production dataset. It comprises a set of object-based scenes as projects for the Reaper digital audio workstation (DAW). They are accompanied by a set of open-source DAW plugins, the VISR Production Suite, for creating and reproducing object-based audio. In combination, these resources provide a practical way to experiment with object-based audio and facilitate loudspeaker and headphone reproduction. The dataset is provided to enable a larger audience to experience object-based audio, for use in perceptual experiments, and for audio system evaluation.
Engineering Brief 542 (Download now)
EB4-5 An Open-Access Database of 3D Microphone Array Recordings—Hyunkook Lee, University of Huddersfield - Huddersfield, UK; Dale Johnson, University of Huddersfield - Huddersfield, UK
This engineering brief presents open-access 3D sound recordings of musical performances and room impulse responses made using various 3D microphone arrays simultaneously. The microphone arrays comprised OCT-3D, 2L-Cube, PCMA-3D, Decca Tree with height, Hamasaki Square with height, First-order and Higher-order Ambisonics microphone systems, providing more than 250 different front-rear-height combinations. The sound sources recorded were string quartet, piano trio, piano solo, organ, clarinet solo, vocal group, and room impulse responses of a virtual ensemble with 13 source positions captured by all of the microphones. The recordings can be freely downloaded from www.hud.ac.uk/apl/resources. Future studies will use the recordings to formally elicit perceived attributes for 3D recording quality evaluation as well as for spatial audio ear training.
Engineering Brief 543 (Download now)
EB5 - Transducers
Saturday, October 19, 9:00 am — 11:30 am (1E11)
Alexander Voishvillo, JBL/Harman Professional Solutions - Northridge, CA, USA
EB5-1 The Application of Graphene Oxide-Based Loudspeaker Membranes in 40mm Headphone Drivers—William Cardenas, ORA Graphene Audio Inc. - Montreal, Quebec, Canada; Robert-Eric Gaskell, McGill University - Montreal, QC, Canada
Graphene oxide-based materials have shown promise in loudspeaker membrane applications. The material allows the forming of highly stiff, low-mass cones and domes for loudspeakers, enabling improvements in efficiency and linearity over other common loudspeaker membrane materials. This class of graphene material can be engineered to produce an excellent ratio of stiffness (Young’s modulus) to density (g/cm³) and damping (tan δ). In a case study, acoustically optimized graphene materials were formed into membranes for headphone drivers. The performance of headphone drivers made with these membranes was analyzed and compared to standard polymer-membrane headphone drivers. Relative to the polymer drivers, the graphene membranes provide a significant reduction in both intermodulation and harmonic distortion while matching the sensitivity and producing a substantially smoother frequency response.
Engineering Brief 544 (Download now)
EB5-2 MEMS Loudspeakers - A New Chip-Based Technology for Ultra-Small Speakers—Fabian Stoppel, Fraunhofer Institute for Silicon Technology ISIT - Itzehoe, Germany; Florian Niekiel, Fraunhofer Institute for Silicon Technology ISIT - Itzehoe, Germany; Andreas Männchen, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Daniel Beer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Bernhard Wagner, Fraunhofer Institute for Silicon Technology ISIT - Itzehoe, Germany
Due to the ability to combine exceptional functionality with a very small size and a low price, micro-electro-mechanical systems (MEMS) have become the state-of-the-art solution for many miniaturized components like microphones and inertial sensors. Lately, strongly increasing efforts are being made to exploit the miniaturization potential and the advantages of semiconductor manufacturing processes to create ultra-small loudspeakers. In this context, a new technology for integrated chip speakers is presented. The MEMS speakers utilize multiple piezoelectric bending actuators, which are able to generate high sound pressure levels at low power consumption. Based on the results of an in-ear speaker system, an insight into the technology is given. Moreover, possibilities and challenges for MEMS speakers in general are discussed. [Presentation only; not available in E-Library]
EB5-3 A Case Study on a Dynamic Driver: How Electromagnet Can Improve the Performance of a Micro Speaker—Md Mehedi, Carpenter Technology Corporation - Philadelphia, PA, USA
How can designers improve the sound quality of next-generation audio products when the market demands smaller devices? Bigger sound requires bigger speakers, right? Not necessarily. One approach is to re-evaluate the materials used. The alloy materials used within speakers to conduct sound have not changed drastically in the last 10 to 15 years. However, developments in electromagnet alloys allow them to replace standard electrical iron/low-carbon steel and provide higher efficiency and performance. The result is better sound quality, smaller devices, and extended battery life. We studied the performance of a transducer with different electromagnets in the magnet assembly and report the comparison to provide better insight for next-generation dynamic drivers.
Engineering Brief 545 (Download now)
EB5-4 Alignment of Triple Chamber Eighth-Order Band-Pass Loudspeaker Systems—Hao Dong, Nanjing University - Nanjing, China; Yong Shen, Nanjing University - Nanjing, Jiangsu Province, China; Rui Chen, Nanjing University - Nanjing, China
An eighth-order band-pass loudspeaker system consisting of three vented chambers is analyzed. Since its frequency response has equal low-pass and high-pass cut-off slopes of fourth-order, the response function can be aligned to an eighth-order symmetric band-pass filter obtained by the frequency transformation method. For any desired frequency response, required system alignment parameters can be calculated by solving a system of equations. Design examples are presented and compared in terms of the mid-band attenuation factor and the diaphragm displacement.
Engineering Brief 546 (Download now)
EB5-5 Analysis of a Vented-Box Loudspeaker System via the Impedance Function—James Lazar, Samsung Electronics - Valencia, CA, USA; Glenn S. Kubota, Samsung Research America - Valencia, CA, USA
The vented-box loudspeaker system is studied through a small-signal equivalent circuit model via the impedance function. Some traditional models are found to inadequately represent the system losses, lumping them together in an effort to simplify the design process. Here, a low-frequency small-signal equivalent circuit model is proposed, incorporating five loss elements. The impedance function is derived, and system parameters are determined by curve-fitting the impedance function to measured impedance data. It is shown that the reactive elements determine the critical frequencies, and the lossy elements determine the Q-factors or contribute to the impedance level. Moreover, the lossy elements affect the curve-fit in a unique way, allowing their values to be quantified.
Engineering Brief 547 (Download now)
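The parameter-extraction step in EB5-5 amounts to curve-fitting a model impedance function to measured data. The sketch below fits a deliberately simplified single-resonance driver model (not the authors' five-loss vented-box model) to synthetic data with scipy; all parameter values are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def z_mag(f, re, res, f_res, qms):
    """Magnitude of a simplified driver impedance: voice-coil resistance
    in series with a single parallel-RLC motional branch."""
    x = qms * (f / f_res - f_res / f)
    return np.abs(re + res / (1.0 + 1j * x))

# Synthetic "measured" impedance generated from known parameters
f = np.geomspace(10.0, 500.0, 200)
true_params = (6.0, 40.0, 50.0, 5.0)   # re, res, f_res, qms
measured = z_mag(f, *true_params)

# Fit the model to the data, starting from a rough initial guess;
# the recovered popt plays the role of the extracted system parameters
popt, _ = curve_fit(z_mag, f, measured, p0=(5.0, 30.0, 60.0, 3.0))
```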
EB5-6 Designing Listening Tests of SR/PA Systems, A Case Study—Eddy Bøgh Brixen, EBB-consult - Smørum, Denmark; DPA Microphones - Allerød, Denmark
Comparisons of SR/PA systems are commonly arranged; however, these comparisons are often organized in ways that leave the procedures opaque and the results unclear. Standards for the assessment of loudspeakers do exist: assessors must be trained for the purpose, and the setup should support double-blind testing. In tests of large systems, however, the listening panel is not necessarily trained, and the practical problems of rigging huge arrays may to some degree weaken the procedures and the results. This paper describes considerations for the comparative assessment of SR/PA systems and reports the outcome of an experiment in which the considered principles were applied.
Engineering Brief 548 (Download now)
EB5-7 Noise and Distortion Mechanisms Encountered in Switching Audio Power Amplifier Design—Robert Muniz, Harmonic Power Conversion LLC - Douglas, MA, USA
When designing a switching power amplifier, many phenomena are encountered that leave the designer wondering why performance falls short of what theory predicts. While many sources of non-linearity and noise in the conversion process are known and intrinsic to the sub-systems involved, other sources of error are more subtle. The intent of this paper is to outline the noise, distortion, and error mechanisms commonly encountered in practice when designing a switching (Class-D) power amplifier. By understanding the root cause of these mechanisms, a more heuristic approach can be employed in switching power amplifier design. The focus will be on analog systems employing clocked, naturally sampled modulators, but the bulk of the material will be broadly applicable to any modulation scheme.
Engineering Brief 549 (Download now)
EB5-8 Acoustic Metamaterial in Loudspeaker Systems Design—Letizia Chisari, Contralto Audio srl - Casoli, Italy; Mario Di Cola, Contralto Audio srl - Casoli, Italy; Paolo Martignon, Contralto Audio srl - Casoli, Italy
Materials have long been used to control wave propagation, optics being a prime example. In loudspeaker system applications there have also been attempts to control waves with acoustic lenses. Metamaterials are artificial structures, typically periodic, composed of small meta-atoms that, in the bulk, behave like a continuous material with unconventional effective properties, free of the constraints normally imposed by nature. This talk offers the opportunity to share what can be done with acoustic metamaterials in the audio industry, especially in loudspeaker system design. The presentation revisits some approaches from the past that can be re-examined with today's technologies, and shows some already developed technologies that employ these highly innovative materials.
Engineering Brief 550 (Download now)
EB5-9 Application of Matrix Analysis for Derivation of Acoustical Impedance of Horns—Alexander Voishvillo, JBL/Harman Professional Solutions - Northridge, CA, USA; Balázs Kákonyi, Harman Professional Solutions - Northridge, CA, USA; Brian McLaughlin, Harman Professional Solutions - Northridge, CA, USA
The direct measurement of a horn’s acoustical impedance requires knowledge of both the sound pressure and volume velocity at the throat of the horn. While measuring sound pressure is trivial, the measurement of volume velocity requires special equipment. This work proposes a new derivation method for the acoustical impedance of a horn. The method is based on matrix analysis and consists of two stages: derivation of the compression driver’s square transfer matrix of A-parameters, and measurement of electrical impedance and sound pressure at the throat of the horn. These functions yield two matrix equations that relate measured sound pressure and electrical impedance, which allows for an acoustical impedance derivation of the horn. A comparison with COMSOL simulation is provided.
Engineering Brief 551 (Download now)
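The two-stage derivation described above rests on the standard two-port (ABCD) relation, which can be sketched as follows. The A-parameter values and measured impedance below are invented placeholders, not data from the brief; the point is only the algebraic step of recovering the throat impedance from the electrical impedance once the driver's transfer matrix is known.

```python
import numpy as np

# A two-port with transfer matrix [[A, B], [C, D]] relates the electrical
# quantities at the driver terminals to the acoustical ones at the throat,
# so the electrical impedance seen at the terminals is
#   Ze = (A*Zt + B) / (C*Zt + D),  where Zt is the throat impedance.
# Solving for Zt gives: Zt = (B - D*Ze) / (C*Ze - A).
A, B, C, D = 2.0 + 0.1j, 30.0, 0.004, 1.5   # hypothetical A-parameters
Ze = 8.0 + 2.0j                              # hypothetical measured impedance

Zt = (B - D * Ze) / (C * Ze - A)             # derived acoustical impedance

# Check: plugging Zt back into the two-port relation reproduces Ze.
assert np.isclose((A * Zt + B) / (C * Zt + D), Ze)
```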
EB5-10 Application of Modulated Musical Multitone Signal for Evaluation of Horn Driver Sound Quality—Alexander Voishvillo, JBL/Harman Professional Solutions - Northridge, CA, USA; Balázs Kákonyi, Harman Professional Solutions - Northridge, CA, USA; Brian McLaughlin, Harman Professional Solutions - Northridge, CA, USA
This work introduces a new type of test signal called Modulated Musical Multitone (MMM): sinusoidal tones outlining E-minor triads in several octaves with amplitude modulation providing a variable crest factor which can match specific musical signals. Three different signals are used in the corresponding experiments including MMM, sinusoidal sweep, and music. An evaluation of sound quality is conducted for an FIR-filtered single horn driver. The effect of masking is observed when a matching linear low-pass channel is added to the signal. The multitone response is post-processed to obtain the distortion products spectrum.
Engineering Brief 552 (Download now)
EB6 - Spatial Audio
Saturday, October 19, 2:00 pm — 3:00 pm (1E11)
Andreas Franck, University of Southampton - Southampton, Hampshire, UK
EB6-1 Study of the Effect of Tikhonov Regularization on the Low Frequency Performance of Cross-Talk Cancellation Systems—Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK; Eric Hamdan, University of Southampton - Southampton, UK; Marcos Simon, AudioScenic - Southampton, UK; University of Southampton - Southampton, UK; Andreas Franck, University of Southampton - Southampton, Hampshire, UK
Tikhonov regularization is widely applied to the design of cross-talk cancellation (CTC) systems to ensure stability by limiting the loudspeaker array effort. The effect of regularization is especially significant at low frequencies, where the underlying inverse system is generally ill-posed. Previous work by the authors demonstrated that regularization leads to a distortion of the auditory scene, known as stage compression. In this work an analytical formula is derived to calculate, for a given loudspeaker arrangement, the minimum amount of Tikhonov regularization required to ensure that the array effort does not exceed a desired limit. An analytical derivation is also presented of the low-frequency limit below which regularization significantly affects CTC system performance. [Presentation only; not available in E-Library]
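The core mechanism the brief analyzes can be sketched at a single frequency: a Tikhonov-regularized inverse of the plant matrix trades cancellation accuracy for a bounded array effort. The plant values and regularization constant below are invented for illustration and are not the authors' data.

```python
import numpy as np

# Minimal sketch: Tikhonov-regularized CTC filter design at one frequency
# for a 2-loudspeaker, 2-ear plant matrix H (values made up).
H = np.array([[1.0, 0.6], [0.6, 1.0]], dtype=complex)  # hypothetical transfer paths
beta = 0.05                                            # regularization parameter

# Regularized pseudo-inverse: W = H^H (H H^H + beta*I)^-1
W = H.conj().T @ np.linalg.inv(H @ H.conj().T + beta * np.eye(2))

# The array effort (largest singular value of W) is capped by beta;
# the unregularized inverse has strictly larger effort.
effort = np.linalg.norm(W, ord=2)
print(effort)
```

As beta goes to zero, W approaches the exact inverse and the effort grows as the reciprocal of the plant's smallest singular value, which is what makes the ill-posed low-frequency region problematic.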
EB6-2 Subjective Comparison of Binaural Audio Rendering through Headphones and CTC—Jonathan Phillips, University of Southampton - Southampton, UK; Marcos Simon, AudioScenic - Southampton, UK; University of Southampton - Southampton, UK
This work compares the subjective performance of a compact cross-talk cancellation (CTC) soundbar prototype with that of headphone reproduction. Preference scores were obtained from listening to different video content with binaural audio. A graphical user interface (GUI) was designed to show the video content, switch between systems, and elicit preference scores and ratings of several spatial audio attributes. The study aimed to determine how the two methods compare in terms of relative preference and attribute performance across four different items of video content. Emphasis was placed on the perception of envelopment (or immersion) alongside clarity, depth of field, horizontal width, realism, and spatial naturalness. The results show a preference for the CTC soundbar over headphones for most content/attribute configurations. [Presentation only; not available in E-Library]
EB6-3 Tetra-Speaker: Continual Evaluation of the Immersive Experience of a Single-Point Reproduction System—Parichat Songmuang, New York University - New York, NY, USA
In the first phase of experimentation, a tetra-speaker was built as an efficient and compact system for the reproduction of individual sound sources. The reproduction process was based on the relationship between a single sound source, such as an instrument, and an acoustic space, focusing on the radiating behavior of the source. In this paper the tetra-speaker is further evaluated in a case study of the immersive experience it provides, with additional discussion of how its usage may be expanded to improve this experience within virtual environments. Professionals within the audio field were asked to give expert opinions based on defined attributes of spatial impression and realism.
Engineering Brief 553 (Download now)
EB6-4 Tetrahedral Microphones: An Effective A/B Main System—Alexander Dobson, McGill University - Montreal, QC, Canada; Wieslaw Woszczyk, McGill University - Montreal, QC, Canada
A simple approach to produce an effective stereo main audio recording system using tetrahedral microphones is described, capturing a full array of close and distant sound with a substantial amount of early reflections. This allows for the easy possibility of later surround or 3D reproduction. Furthermore, the implementation of a pragmatic and simple two-microphone set-up can lead to an efficient capture of the stereo soundfield that has many applications. Particular attention was paid to binaural mixing of this microphone system to demonstrate an easy first step into immersive reproduction of the sound.
Engineering Brief 554 (Download now)
EB7 - Audio Signal Processing
Saturday, October 19, 2:30 pm — 4:15 pm (1E10)
Dave Moffat, Queen Mary University of London - London, UK
EB7-1 Realistic Procedural Sound Synthesis of Bird Song Using Particle Swarm Optimization—Jorge Zúñiga, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
We present a synthesis algorithm for approximating bird song using particle swarm optimization to match real bird recordings. Frequency and amplitude envelope curves are first extracted from a bird recording. Further analysis identifies the presence of even and odd harmonics. A particle swarm algorithm is then used to find cubic Bezier curves which emulate the envelopes. These curves are applied to modulate a sine oscillator and its harmonics. The synthesized syllable can then be repeated to generate the sound. Thirty-six bird sounds have been emulated this way, and a real-time web-based demonstrator is available, with user control of all parameters. Objective evaluation showed that the synthesized bird sounds captured most audio features of the recordings.
Engineering Brief 555 (Download now)
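The optimization step described above can be illustrated with a toy version: a plain global-best particle swarm fitting the four control heights of a cubic Bezier curve to a synthetic frequency envelope. The envelope, swarm settings, and parameter ranges are all invented for illustration; the paper's actual envelopes come from analyzed bird recordings.

```python
import numpy as np

# Toy sketch of the brief's approach: PSO fits a cubic Bezier curve to a
# (synthetic) frequency envelope that stands in for an extracted bird syllable.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 64)
target = 2000.0 + 1500.0 * np.sin(np.pi * t)   # pretend chirp envelope, Hz

def bezier(params, t):
    p0, p1, p2, p3 = params  # control-point heights of a cubic Bezier
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

def cost(params):
    return np.mean((bezier(params, t) - target) ** 2)

# Plain global-best PSO over the four control heights.
n, dim = 30, 4
pos = rng.uniform(0.0, 5000.0, (n, dim))
vel = np.zeros((n, dim))
best_p = pos.copy()
best_c = np.array([cost(p) for p in pos])
g = best_p[best_c.argmin()].copy()
for _ in range(200):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    vel = 0.7 * vel + 1.5 * r1 * (best_p - pos) + 1.5 * r2 * (g - pos)
    pos = pos + vel
    c = np.array([cost(p) for p in pos])
    improved = c < best_c
    best_p[improved], best_c[improved] = pos[improved], c[improved]
    g = best_p[best_c.argmin()].copy()

print(cost(g))  # residual MSE after optimization
```

The fitted curve would then modulate a sine oscillator (and its harmonics) to synthesize the syllable, as the abstract describes.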
EB7-2 Multi-Scale Auralization for Multimedia Analytical Feature Interaction—Nguyen Le Thanh Nguyen, University of Miami - Coral Gables, FL, USA; Hyunhwan Lee, University of Miami - Coral Gables, FL, USA; Joseph Johnson, University of Miami - Coral Gables, FL, USA; Mitsunori Ogihara, University of Miami - Coral Gables, FL, USA; Gang Ren, University of Miami - Coral Gables, FL, USA; James W. Beauchamp, Univ. of Illinois at Urbana-Champaign - Urbana, IL, USA
Modern human-computer interaction systems use multiple perceptual dimensions to enhance the user's intuition, efficiency, and situational awareness. A signal processing and interaction framework is proposed for auralizing signal patterns and augmenting the visualization-focused analysis tasks of social media content analysis and annotation, with the goal of assisting the user in analyzing, retrieving, and organizing relevant information for marketing research. As an auralization framework, audio signals are generated from video/audio signal patterns, for example using audio frequency modulation that follows the magnitude contours of video color saturation. The integration of visual and aural presentations benefits user interaction by reducing fatigue and sharpening the users' sensitivity, thereby improving work efficiency, confidence, and satisfaction.
Engineering Brief 556 (Download now)
EB7-3 Perceptually Motivated Hearing Loss Simulation for Audio Mixing Reference—Angeliki Mourgela, Queen Mary University of London - London, UK; Trevor Agus, Queens University Belfast - Belfast, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
This paper proposes the development of a hearing loss simulation for use in audio mix referencing, designed according to psychoacoustic and audiology research findings. The simulation aims to reproduce four perceptual aspects of hearing loss: threshold elevation, loss of dynamic range, reduced frequency resolution, and reduced temporal resolution, while providing audio input/output functionality.
Engineering Brief 557 (Download now)
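Two of the four aspects named above, threshold elevation and loss of dynamic range, can be sketched as a per-sample expander in the dB domain. This is an illustrative stand-in, not the authors' implementation; the threshold and ratio are invented example parameters, and a real simulation would work per frequency band.

```python
import numpy as np

# Illustrative sketch: levels below an elevated threshold fall away
# 'ratio' times faster, mimicking the reduced audibility of quiet sounds
# and the compressed residual dynamic range above threshold.
def simulate_loss(x, threshold_db=-40.0, ratio=2.0):
    """Expand dynamics below a (hypothetical) elevated hearing threshold."""
    eps = 1e-12
    level_db = 20 * np.log10(np.abs(x) + eps)
    out_db = np.where(level_db < threshold_db,
                      threshold_db + ratio * (level_db - threshold_db),
                      level_db)
    return np.sign(x) * 10 ** (out_db / 20)

x = np.array([0.5, 0.001, -0.2])
y = simulate_loss(x)
# The quiet sample (-60 dB) drops to -80 dB; the loud ones pass unchanged.
```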
EB7-4 Modeling between Partial Components for Musical Timbre Imitation and Migration—Angela C. Kihiko, Spelman College - Atlanta, GA, USA; Mitsunori Ogihara, University of Miami - Coral Gables, FL, USA; Gang Ren, University of Miami - Coral Gables, FL, USA; James W. Beauchamp, Univ. of Illinois at Urbana-Champaign - Urbana, IL, USA
Most musical sounds have strong and regularly distributed spectral components such as harmonic partials. However, the energy distribution patterns between any two such sonic partials, the in-between low-energy signal patterns such as performance articulation or instrument signatures, are also important for characterizing musical sounds. This paper presents a timbre-modeling framework for detecting and modeling the between-partial components for musical timbre analysis and synthesis. This framework focuses on timbre imitation and migration for electronic music instruments, where timbral patterns obtained from acoustical instruments are re-interpreted for electronic instruments and new music interfaces. The proposed framework will help musicians and audio engineers to better explore musical timbre and musical performance expressions for enhancing the naturalness, expressiveness, and creativeness of electronic/computer music systems.
Engineering Brief 558 (Download now)
EB7-5 Coherence as an Indicator of Distortion for Wide-Band Audio Signals such as M-Noise and Music—Merlijn van Veen, Meyer Sound Laboratories - Berkeley, CA, USA; Roger Schwenke, Meyer Sound Laboratories - Berkeley, CA, USA
M-Noise is a new scientifically derived test signal whose crest factor as a function of frequency is modeled after real music. M-Noise should be used with a complementary procedure for determining a loudspeaker’s maximum linear SPL. The M-Noise Procedure contains criteria for the maximum allowable change in coherence as well as frequency response. When the loudspeaker and microphone are positioned as prescribed by the procedure, reductions in coherence are expected to be caused by distortion. Although higher precision methods for measuring distortion exist, coherence has the advantage that it can be calculated for wide-band signals such as M-Noise as well as music. Examples will demonstrate the perceived audio quality associated with different amounts of distortion-induced coherence loss.
Engineering Brief 559 (Download now)
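The premise above, that distortion shows up as a coherence drop between stimulus and response, can be demonstrated with synthetic signals. The "loudspeaker" models below (a clean gain versus a saturating nonlinearity) and all parameters are invented for illustration; the real procedure uses M-Noise and measured responses.

```python
import numpy as np
from scipy import signal

# Magnitude-squared coherence between a wide-band stimulus and the response
# drops where nonlinear distortion adds components not linearly related
# to the input.
rng = np.random.default_rng(1)
fs = 48000
x = rng.standard_normal(fs * 4)           # wide-band stand-in for M-Noise

y_lin = 0.8 * x                           # idealized linear loudspeaker
y_dist = np.tanh(3.0 * x)                 # overdriven, saturating output

f, c_lin = signal.coherence(x, y_lin, fs=fs, nperseg=4096)
f, c_dist = signal.coherence(x, y_dist, fs=fs, nperseg=4096)

print(c_lin.mean(), c_dist.mean())  # distortion lowers the mean coherence
```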
EB7-6 Fast Time Domain Stereo Audio Source Separation Using Fractional Delay Filters—Oleg Golokolenko, TU Ilmenau - Ilmenau, Germany; Gerald Schuller, Ilmenau University of Technology - Ilmenau, Germany; Fraunhofer Institute for Digital Media Technology (IDMT) - Ilmenau, Germany
Our goal is a system for the separation of two speakers during teleconferencing or for hearing aids. To be useful in real time, it has to work online with as low a delay as possible. The proposed approach works in the time domain, using attenuation factors and fractional delays between microphone signals to minimize cross-talk, on the principle of a fractional delay-and-sum beamformer. Compared to other approaches this has the advantages of lower computational complexity, no system delay, and none of the musical noise found in frequency-domain algorithms. We evaluate our approach on convolutive mixtures generated from speech signals taken from the TIMIT dataset, using a room impulse response simulator.
Engineering Brief 560 (Download now)
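The delay-and-subtract principle above can be illustrated in an idealized form: apply a fractional delay (here a windowed-sinc interpolation filter) and an attenuation factor to one channel, then subtract to cancel the cross-talk. All signals and the (assumed known) mixing parameters are synthetic; the paper's contribution is estimating those parameters online.

```python
import numpy as np

# Toy sketch of cross-talk cancellation with a fractional delay filter.
def fractional_delay(x, d, taps=31):
    """Delay x by d samples (non-integer) with a windowed-sinc filter."""
    n = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(n - d) * np.hamming(taps)
    h /= h.sum()
    return np.convolve(x, h, mode="same")

rng = np.random.default_rng(0)
target = rng.standard_normal(4000)       # desired speaker at mic 1
s = rng.standard_normal(4000)            # interfering speaker
a, d = 0.6, 2.5                          # attenuation and delay (samples)

mic1 = target + a * fractional_delay(s, d)   # mixture at mic 1
mic2 = s                                      # idealized: mic 2 hears only s

out = mic1 - a * fractional_delay(mic2, d)    # cancel the cross-talk term
# With the true (a, d), the interferer is removed and the target remains.
```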
EB7-7 Line Array Optimization through Innovative Multichannel Filtering—Paolo Martignon, Contralto Audio srl - Casoli, Italy; Mario Di Cola, Contralto Audio srl - Casoli, Italy; Letizia Chisari, Contralto Audio srl - Casoli, Italy
Element-dependent filtering offers the possibility to optimize the sound coverage of vertical line arrays: distance-dependent frequency response, as well as mid-low-frequency beaming and air absorption, can be partially compensated. Simulations of the array elements' contributions to the venue acoustics are normally the input data for the filter calculation, but some real-world phenomena are hardly addressed by simulations: for example, the dispersion of transducer responses and the differing atmospheric conditions along the acoustic paths of the individual array elements. This awareness led us to develop an algorithm designed to be robust against these inaccuracies.
Engineering Brief 561 (Download now)
EB8 - Applications in Audio
Saturday, October 19, 3:30 pm — 4:30 pm (1E11)
Sunil G. Bharitkar, HP Labs., Inc. - San Francisco, CA, USA
EB8-1 Vibrary: A Consumer-Trainable Music Tagging Utility—Scott Hawley, Belmont University - Nashville, TN, USA; Jason Bagley, Art+Logic - Pasadena, CA, USA; Brett Porter, Art+Logic - Fanwood, NJ, USA; Daisey Traynham, Art+Logic - Pasadena, CA, USA
We present the engineering underlying a consumer application to help music industry professionals find audio clips and samples of personal interest within their large audio libraries typically consisting of heterogeneously-labeled clips supplied by various vendors. We enable users to train an indexing system using their own custom tags (e.g., instruments, genres, moods), by means of convolutional neural networks operating on spectrograms. Since the intended users are not data scientists and may not possess the required computational resources (i.e., Graphics Processing Units, GPUs), our primary contributions consist of (i) designing an intuitive user experience for a local client application to help users create representative spectrogram datasets, and (ii) "seamless" integration with a cloud-based GPU server for efficient neural network training.
Engineering Brief 562 (Download now)
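The front end of the pipeline described above can be sketched in a few lines: turn a clip into a log-magnitude spectrogram, then apply one convolution-plus-pooling stage of the kind a tagging CNN stacks several of. Every shape and parameter here is invented for illustration; this is not Vibrary's actual network or feature extractor.

```python
import numpy as np

rng = np.random.default_rng(0)
clip = rng.standard_normal(48000)                 # stand-in for 1 s of audio

# Log-magnitude spectrogram via windowed short frames and an FFT.
frame, hop = 512, 256
frames = np.stack([clip[i:i + frame] * np.hanning(frame)
                   for i in range(0, len(clip) - frame, hop)])
spec = np.log1p(np.abs(np.fft.rfft(frames, axis=1)))  # (time, freq)

def conv2d_valid(img, k):
    """Naive 2-D 'valid' convolution, as one CNN layer would compute."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

kernel = rng.standard_normal((3, 3)) * 0.1
feat = np.maximum(conv2d_valid(spec, kernel), 0.0)    # conv + ReLU
pooled = feat[::2, ::2]                                # crude 2x downsampling
print(spec.shape, pooled.shape)
```

A real tagger would learn many such kernels per layer (on a GPU, as the brief notes) and end in a classification head over the user's custom tags.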
EB8-2 Casualty Accessible and Enhanced (A&E) Audio: Trialling Object-Based Accessible TV Audio—Lauren Ward, University of Salford - Salford, UK; BBC R&D - Salford, UK; Matthew Paradis, BBC Research and Development - London, UK; Ben Shirley, University of Salford - Salford, Greater Manchester, UK; Salsa Sound Ltd - Salford, Greater Manchester, UK; Laura Russon, BBC Studios - Cardiff, Wales, UK; Robin Moore, BBC Research & Development - Salford, UK; Rhys Davies, BBC Studios - Cardiff, Wales, UK
Casualty Accessible and Enhanced (A&E) Audio is the first public trial of accessible audio technology using a narrative importance approach. The trial allows viewers to personalize the audio of an episode of the BBC's "Casualty" drama series based on their hearing needs. Using a simple interface, the audio can be varied between the broadcast mix and an accessible mix containing narratively important non-speech sounds, enhanced dialogue, and attenuated background sounds. This paper describes the trial's development and implementation, and its evaluation by normal-hearing and hard-of-hearing listeners (n=5209 on 20/8/2019). 299 participants also completed a survey, rating the technology 3.6/5 stars; 73% reported the technology made the content more enjoyable or easier to understand.
Engineering Brief 563 (Download now)
EB8-3 Generative Modeling of Metadata for Machine Learning Based Audio Content Classification—Sunil G. Bharitkar, HP Labs., Inc. - San Francisco, CA, USA
Automatic content classification is an essential tool in multimedia applications. Present research on audio-based classifiers looks at short- and long-term analysis of signals, using both temporal and spectral features. In this paper we present a neural network that classifies content as movie (cinematic, TV shows), music, or voice using metadata contained in the audio/video stream. Towards this end, statistical models of the various metadata are created, since a large metadata dataset is not available. Synthetic metadata are then generated from these statistical models and input to the ML classifier as feature vectors. The resulting classifier is able to classify real-world content (e.g., YouTube) with an accuracy of ~90% and very low latency (~7 ms on average) based on real-world metadata.
Engineering Brief 564 (Download now)
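The generative strategy above, fitting statistical models to scarce real metadata and training on samples drawn from them, can be sketched with a toy example. The features, class distributions, and the nearest-centroid stand-in for the paper's neural classifier are all invented for illustration.

```python
import numpy as np

# Per-class Gaussian models of two hypothetical metadata features
# (e.g., a dialogue-flag rate and a channel count); values are made up.
rng = np.random.default_rng(0)
classes = {"movie": (np.array([0.8, 6.0]), 0.2),
           "music": (np.array([0.2, 2.0]), 0.2),
           "voice": (np.array([0.1, 1.0]), 0.1)}

# Generate synthetic metadata feature vectors from each class model.
X, y = [], []
for label, (mu, sd) in classes.items():
    X.append(rng.normal(mu, sd, size=(200, 2)))
    y += [label] * 200
X = np.vstack(X)

# Train a trivial classifier on the synthetic data (nearest centroid,
# standing in for the paper's neural network).
y = np.array(y)
centroids = {lbl: X[y == lbl].mean(axis=0) for lbl in classes}

def classify(v):
    return min(centroids, key=lambda lbl: np.linalg.norm(v - centroids[lbl]))

print(classify(np.array([0.75, 5.5])))  # → movie
```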
EB8-4 Individual Headphone Equalization at the Eardrum with New Apps for Computers and Cellphones—David Griesinger, David Griesinger Acoustics - Cambridge, MA, USA
Ear canal resonances that concentrate energy on the eardrum are highly individual, and headphones alter or eliminate them. The result is inaccurate timbre and in-head localization. We have developed apps that use an equal-loudness test to match the sound spectrum at the eardrum from a pair of headphones to the spectrum at the eardrum from a frontal loudspeaker. The result is precise timbre and frontal localization. The improvement in sound is startling. In this presentation we will demonstrate the process and the easy-to-use software that is now available for VST, AAX, Windows, Mac, Android, and iOS. [Presentation only; not available in E-Library]