AES Paris 2016
Engineering Brief Details

EB1 - eBriefs 1: Posters


Saturday, June 4, 09:00 — 10:15 (Foyer)

EB1-1 Sound Pressure Analysis for Closed-Box Loudspeaker Enclosures
Charalampos Papadakos, University of Patras - Rio, Greece; Gavriil Kamaris, University of Patras - Rion Campus, Greece; John Mourjopoulos, University of Patras - Patras, Greece
This study employs a physical modeling method to explore the pressure distribution within typical closed-box loudspeaker enclosures of different shapes and inner volumes. The simulation results are compared to measurements in such enclosures. The results indicate that the sound pressure within such enclosures often exceeds 130 dB. The pressure profile is nearly constant at lower frequencies and displays strong resonances at higher frequencies due to normal modes. Such levels traditionally challenge enclosure air-tightness and box rigidity, but they can also provide useful acoustic energy for harvesting.
Engineering Brief 236 (Download now)
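As a rough sanity check on the reported levels, the interior pressure of a sealed box can be estimated from the adiabatic compression of the enclosed air by the diaphragm. A minimal sketch with assumed driver and box parameters (not taken from the brief):

```python
import math

# Assumed example values (not from the brief): medium woofer in a 20 L sealed box
Sd = 0.022      # diaphragm area in m^2
x_peak = 0.005  # peak excursion in m
V = 0.020       # box volume in m^3
gamma = 1.4     # ratio of specific heats for air
P0 = 101325.0   # static pressure in Pa
p_ref = 20e-6   # reference pressure in Pa

# Adiabatic compression: p ~ gamma * P0 * (delta V / V) well below the first box mode
p_peak = gamma * P0 * (Sd * x_peak) / V
spl_rms = 20 * math.log10((p_peak / math.sqrt(2)) / p_ref)
print(f"Interior pressure: {p_peak:.0f} Pa peak, about {spl_rms:.0f} dB SPL")
# -> roughly 150 dB SPL, consistent with levels well above 130 dB
```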

EB1-2 Mobile Platform Acoustical Noise Identification Using Internal and Reference Microphones
Przemek Maziewski, Intel Technology Poland - Gdansk, Poland
This paper addresses the problem of microphone noise. The performance of built-in microphones in laptops and other mobile devices can suffer in the presence of noise. Identifying noisy components and separating internal from external origins allows the noise sources to be root-caused and eliminated. This capability is crucial when developing new platforms. The proposed method employs a series of recordings conducted with both built-in and reference microphones. The recordings are obtained under different operating conditions of the device, e.g., AC vs. battery power, and are then analyzed to identify the differing characteristics of the internal and external microphone signals. Based on these results, noise components can be separated and the potential noise source identified.
Engineering Brief 237 (Download now)

EB1-3 Automatically Generating VST Plugins from MATLAB Code
Charlie DeVane, MathWorks - Natick, MA, USA; UMass Lowell - Lowell, MA, USA; Gabriele Bunkheila, MathWorks - Cambridge, UK
We describe the automatic generation of VST audio plugins from MATLAB code using the Audio System Toolbox from MathWorks. We provide MATLAB code for three complete example plugins, discuss problems that may be encountered, and describe a workflow to generate VST plugins as quickly and easily as possible.
Engineering Brief 238 (Download now)

EB1-4 Echo Thresholds for a 3-D Loudspeaker Configuration
Lee Davis, University of Huddersfield - Huddersfield, West Yorkshire, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
Echo thresholds were examined with differing stimuli, lag sound directions, and decision criteria in a 3D loudspeaker reproduction environment. Two tests were undertaken to examine two different criteria: echo threshold with fusion and echo threshold with complete separation, each with three stimuli (orchestral, pink noise burst, and speech) and six lag sound directions in total. An adapted method of adjustment was used by subjects to control the delay between the lead and lag loudspeakers. Results showed that there were significant differences in echo threshold when the decision criteria differed. The orchestral stimulus was found to be significantly different from the pink noise burst and speech in both criteria. Few significant differences were noted between angles. In general, echo thresholds were higher with lag sources located in the median plane.
Engineering Brief 239 (Download now)

EB1-5 A New Response Method for Auditory Localization and Spread Tests
Hyunkook Lee, University of Huddersfield - Huddersfield, UK; Dale Johnson, The University of Huddersfield - Huddersfield, UK; Manchester, UK; Maksims Mironovs, University of Huddersfield - Huddersfield, West Yorkshire, UK
This Engineering Brief presents a new response method developed for auditory localization and spread tests. The proposed method uses a flexible strip with a series of LEDs, which are powered by a microcontroller, for eliciting subjective responses. For the localization test, the position of an active LED is controlled and recorded in Max using a dial. For the spread test, multiple LEDs can be positioned on the strip to visually describe the lower and upper boundaries of the perceived image. The required system is easy to build and relatively inexpensive. Vertical stereophonic localization tests were conducted to compare between the LED method and a visual marker method. Results showed that the proposed method was more accurate, consistent, and time-efficient than the marker method.
Engineering Brief 240 (Download now)

EB1-6 Setting Up and Making an AES67 Network Coexist with Standard Network Traffic
Mickaël Henry, UVHC - Valenciennes, France; Digigram - Montbonnot, France; Lucas Rémond, CNSMDP - Paris, France; Nicolas Sturmel, Digigram S.A. - Montbonnot, France
In this paper we show how an AES67 network can coexist with a standard non-audio network. We detail the difficulties usually encountered when setting up and using AES67 networks, and analyze the network protocols required by AES67: (i) IGMP and its impact on device features, (ii) PTP and the clock recovery performance when using PTP-enabled switches, and (iii) QoS and the impact of non-audio traffic such as web and corporate traffic. The test setup comprises 10 AES67-compliant devices from several manufacturers, supporting various AoIP protocols. We conclude with recommendations for maintaining a proper quality of experience when the networks coexist.
Engineering Brief 241 (Download now)

EB1-7 Implementation of Faster than Real Time Audio Analysis for Use with Web Audio API: An FFT Case Study
Luis Joglar-Ongay, University of Huddersfield - Huddersfield, West Yorkshire, UK; Christopher Dewey, University of Huddersfield - Huddersfield, UK; Jonathan Wakefield, University of Huddersfield - Huddersfield, UK
There is significant interest in the audio community in developing web-based applications using HTML5 and the Web Audio API. Whilst this newly emerging API goes some way toward providing offline audio analysis in the web browser, it is limited to a relatively basic FFT with fixed Blackman windowing and no overlap facility. Most previously documented solutions to this issue operate in real time. This paper demonstrates how to perform more sophisticated, faster than real time FFT analysis for use within Web Audio applications, making use of the Web Audio API and the dsp.js library. Academics and researchers can use this paper as a tutorial to develop similar solutions within their own web-based audio applications.
Engineering Brief 242 (Download now)
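For readers who want the same kind of offline, overlap-capable analysis outside the browser, the equivalent computation is a few lines of NumPy. A minimal sketch for comparison only (the brief itself targets JavaScript with the Web Audio API and dsp.js):

```python
import numpy as np

def stft(x, n_fft=2048, hop=512, window="hann"):
    """Offline STFT with a selectable window and overlap, i.e. the flexibility
    the built-in Web Audio analyser (fixed Blackman, no overlap) lacks."""
    win = np.hanning(n_fft) if window == "hann" else np.blackman(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, n_fft // 2 + 1)

# Example: spectrogram magnitude of one second of a 440 Hz tone at 44.1 kHz
fs = 44100
t = np.arange(fs) / fs
spec = np.abs(stft(np.sin(2 * np.pi * 440 * t)))
print(spec.shape)
```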

EB1-8 Block-Sparse Fast Recursive Approximated Memory Improved Proportionate Affine Projection Algorithm
Felix Albu, Valahia University of Targoviste - Targoviste, Romania
A new approximated memory improved proportionate affine projection algorithm for block-sparse echo cancellation is proposed. This contribution presents a fast recursive implementation combined with the use of dichotomous coordinate descent iterations. It is shown that the proposed algorithm has superior convergence speed and tracking ability for echo path changes in acoustic and network echo cancellation applications. It is also shown that these gains are obtained at a lower numerical complexity than that of competing algorithms.
Engineering Brief 243 (Download now)

EB1-9 The Effect of Loop Length and Musical Material on Discrimination Between MP3 and WAV Files
Denis Martin, McGill University - Montreal, QC, Canada; CIRMMT - Montreal, QC, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada
Listening test results generally show that, for bit rates higher than 128 kbps, listeners can rarely distinguish between MP3 and WAV files with any statistical significance, while many audio professionals maintain that it is possible. This project attempts to explain why typical AB and ABX tests often fail by examining the effects of loop length and musical material on listener success. An informal take-home AB listening test was used with varying musical material looped at different lengths. The results show that performance drops significantly with short loop lengths (<2 s, p = .02) and that, overall, the participants were able to discriminate between the two file formats with high statistical significance (p < .001).
Engineering Brief 244 (Download now)
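The significance figures quoted here come from standard binomial statistics on forced-choice trials. A small illustration of how such a p-value is obtained, with made-up trial counts rather than the study's data:

```python
from math import comb

def binomial_p_value(correct, trials, chance=0.5):
    """One-sided p-value: probability of at least `correct` successes
    out of `trials` if the listener were guessing at the `chance` level."""
    return sum(comb(trials, k) * chance**k * (1 - chance)**(trials - k)
               for k in range(correct, trials + 1))

# Hypothetical example: 16 correct answers in 20 AB trials
print(binomial_p_value(16, 20))  # ~0.006, well below the usual 0.05 criterion
```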

EB1-10 Considerations When Calibrating Program Material Stimuli Using LUFS
Malachy Ronan, University of Limerick - Limerick, Ireland; Nicholas Ward, University of Limerick - Limerick, Ireland; Robert Sazdov, University of Limerick - Limerick, Ireland
While the LUFS standard was originally developed for broadcast applications, it offers a convenient means of calibrating program material stimuli to an equal loudness level, while remaining in a multichannel format. However, this calibration is based on an absolute sound pressure level of 60 dBA, the preferred listening level when watching television. Levels used in analytical listening and perceptual experiments tend to be significantly higher. This disparity may affect the accuracy of the Leq(RLB) weighting filter employed in LUFS meters. To address this issue, the development of the LUFS standard is examined to assess its suitability for the task. The findings suggest that a compromise between analytical listening and loudness matching in perceptual experiments requires careful consideration of experimental variables.
Engineering Brief 245 (Download now)
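In practice, calibrating stimuli to equal loudness with LUFS amounts to measuring each program's integrated loudness and applying a static gain toward a common target. A minimal sketch of that gain step, with hypothetical values not taken from the brief (the loudness measurement itself would come from a BS.1770-compliant meter):

```python
import numpy as np

def gain_to_target(measured_lufs, target_lufs=-23.0):
    """Linear gain that moves a program from its measured integrated
    loudness to the target loudness."""
    return 10 ** ((target_lufs - measured_lufs) / 20.0)

# Hypothetical: a stimulus measured at -18.2 LUFS, target -23 LUFS (EBU R128 target)
g = gain_to_target(-18.2)
print(f"apply gain of {20 * np.log10(g):+.1f} dB")   # -4.8 dB
# calibrated = g * stimulus  # applied sample-wise to the multichannel signal
```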

EB1-11 Selective Mixing Improves Reproduction Quality with Portable Loudspeakers
Piotr Kleczkowski, AGH University of Science and Technology - Krakow, Poland; Tomasz Dziedzic, AGH University of Science and Technology - Krakow, Poland
Selective mixing of sounds is an experimental mixing method, first proposed in [1]. Further developments and listening experiments confirmed that inexperienced listeners more often than not prefer this type of processing over direct mixing, while the opposite holds for mixing engineers. It has recently been found that, besides the extent of the effect, there is another independent variable associated with this method: the quality of the reproduction system. Experiments have shown that the percentage of listeners choosing the selectively mixed versions is higher when the music is reproduced over the small loudspeakers of portable devices, such as notebook computers.
Engineering Brief 246 (Download now)

 
 

EB2 - eBriefs 2: Posters


Saturday, June 4, 10:30 — 12:00 (Foyer)

EB2-1 Investigation into the Perceptual Effects of Image Source Method Order
Dale Johnson, The University of Huddersfield - Huddersfield, UK; Manchester, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
This engineering brief explores the perceived effects and characteristics of impulse responses (IRs) generated using a custom hybrid geometric reverb algorithm. The algorithm combines the well-known Image Source Method (ISM) with ray tracing: ISM renders the early reflections up to a specified order, while ray tracing renders the remaining reflections. IRs rendered at varying ISM orders appear to exhibit differences in perceptual characteristics, particularly in the early portion. To understand these characteristics, an elicitation test was devised to acquire terms describing them. These terms were then grouped to provide attributes for future grading tests.
Engineering Brief 247 (Download now)

EB2-2 The Influence of Discrete Arriving Reflections on Perceived Intelligibility and STI Measurements
Ross Hammond, University of Derby - Derby, Derbyshire, UK; Peter Mapp Associates - Colchester, UK; Peter Mapp, Peter Mapp Associates - Colchester, Essex, UK; Adam J. Hill, University of Derby - Derby, Derbyshire, UK; Gand Concert Sound - Elk Grove Village, IL, USA
The most widely used objective intelligibility measure, the Speech Transmission Index (STI), does not completely match the highly complex behavior of human auditory perception. Investigations were made into the impact of discrete reflections (with varying arrival times and amplitudes) on STI scores, subjective intelligibility, and subjective annoyance. This allows the effect of comb filtering on the modulation transfer function matrix to be displayed, and demonstrates how the perceptual effects of a discrete delay cause subjective annoyance that is not necessarily mirrored by STI. This work provides evidence for why STI should not be the sole verification method for public address and emergency announcement systems, where temporal properties also need careful consideration.
Engineering Brief 248 (Download now)
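The effect of a single discrete reflection on the modulation transfer function (the quantity underlying STI) can be written down directly. A sketch using the standard complex-MTF formulation for an impulse response consisting of the direct sound plus one reflection, with illustrative values rather than those of the study:

```python
import numpy as np

def mtf_single_reflection(F, a, tau):
    """|MTF| for h(t) = delta(t) + a*delta(t - tau): direct sound plus one
    reflection of relative amplitude `a` arriving `tau` seconds later."""
    num = np.abs(1.0 + a**2 * np.exp(-2j * np.pi * F * tau))
    return num / (1.0 + a**2)

# Illustrative values: reflection at -3 dB, arriving 50 ms after the direct sound
F = np.array([0.63, 1.25, 2.5, 5.0, 10.0, 12.5])  # subset of STI modulation frequencies (Hz)
print(mtf_single_reflection(F, a=10**(-3 / 20), tau=0.050).round(2))
# The dips at certain modulation frequencies are the comb-filter effect on the MTF matrix.
```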

EB2-3 Immersive Production Techniques in Cinematic Sound Design: Context and Spatialization
Tom Downes, University of Limerick - Limerick, Ireland; Malachy Ronan, University of Limerick - Limerick, Ireland
Immersive formats are fast becoming a ubiquitous feature of film post-production workflows. However, little knowledge exists concerning production techniques that address this increased spatial resolution, and questions therefore remain regarding their function in cinematic sound design. To address this issue, this paper evaluates the context required to prompt the use of elevated loudspeakers and examines the relevance of electroacoustic spatialization techniques to 3D cinematic formats. A contextually relevant scene from the submarine classic Das Boot was selected to probe this question in a 9.1 loudspeaker configuration. It is hoped that this paper will prompt further discourse on the topic.
Engineering Brief 249 (Download now)

EB2-4 Perceptual Comparison of Localization with Soundman Binaural Microphones vs. HRTF Post-Processing
Blas Payri, Universitat Politècnica de València - Valencia, Spain; Ramón Rodríguez Mariño, Universitat Politècnica de València - Valencia, Spain
We present a perceptual comparison of spatial sound localization using synthetic pink noise and four recognizable sound sources: male and female speech, a closing door, and a sea-sound recording. Spatialization is created either via binaural recording (Soundman OKM binaural microphones) or via HRTF post-processing using filters available in Logic Pro, Pro Tools, and MATLAB. Eleven participants had to locate the source position of 72 stimuli combining different azimuth locations. Results show that location recognition is generally low (36%). Although the Soundman recordings show somewhat better results, no significant difference in localization accuracy is found between the HRTF filtering systems and the binaural microphone recordings. We conclude that binaural 3D sound can easily be implemented with available commercial software, with no clear difference between systems.
Engineering Brief 250 (Download now)

EB2-5 VSV (Virtual Source Visualizer): A Practical Tool for 3D-Visualizing Acoustical Properties of Spatial Sounds
Masataka Nakahara, ONFUTURE Ltd. - Tokyo, Japan; SONA Corp. - Tokyo, Japan; Akira Omoto, Kyushu University - Fukuoka, Japan; Onfuture Ltd. - Tokyo, Japan; Yasuhiko Nagatomo, Evixar Inc. - Tokyo, Japan
The authors have developed a practical tool that visualizes the 3D acoustical properties of sound using sound intensity information. The tool, VSV (Virtual Source Visualizer), consists of two main parts: analysis software and measurement instruments. Since the goal is to provide a simple solution to 3D acoustic analysis, the authors have focused on obtaining intuitively understandable results and constructing a reliable system from inexpensive devices. In this paper the usefulness and accuracy of the proposed method are discussed, and some examples of practical applications are introduced.
Engineering Brief 251 (Download now)

EB2-6 Database of Binaural Room Impulse Responses of an Apartment-Like Environment
Fiete Winter, Universität Rostock - Rostock, Germany; Hagen Wierstorf, Technische Universität Ilmenau - Ilmenau, Germany; Ariel Podlubne, Université de Toulouse - Toulouse, France; Thomas Forgue, Université de Toulouse - Toulouse, France; Jérome Manhès, Université de Toulouse - Toulouse, France; Matthieu Herrb, Université de Toulouse - Toulouse, France; Sascha Spors, University of Rostock - Rostock, Germany; Alexander Raake, Technische Universität Ilmenau - Ilmenau, Germany; Patrick Danès, Université de Toulouse - Toulouse, France
We present a database of binaural room impulse responses (BRIRs) measured in an apartment-like environment. The BRIRs were captured at four different sound source positions, each combined with four listener positions. A head and torso simulator (HATS) was used, with head orientations varying over ±78 degrees at 2-degree resolution. Additionally, BRIRs at 20 listener positions along a trajectory connecting two of the four positions were measured, each with a fixed head orientation. The data is provided in the Spatially Oriented Format for Acoustics (SOFA) and is freely available under the Creative Commons (CC-BY-4.0) license. It can be used to simulate complex acoustic scenes in order to study the process of auditory scene analysis by humans and machines.
Engineering Brief 252 (Download now)
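Since SOFA files are netCDF-4 (and therefore HDF5) containers, they can be inspected with generic tools as well as dedicated SOFA libraries. A minimal reading sketch, assuming h5py and a hypothetical file name (the actual file naming is defined by the database authors):

```python
import h5py  # SOFA files are netCDF-4, i.e. readable as HDF5

# Hypothetical file name for one source/listener combination
with h5py.File("BRIR_apartment_src1_pos1.sofa", "r") as f:
    brirs = f["Data.IR"][:]            # shape: (measurements, 2 ears, samples)
    fs = f["Data.SamplingRate"][:][0]  # sampling rate in Hz
    src = f["SourcePosition"][:]       # source position(s) per measurement
print(brirs.shape, fs)
# Each measurement corresponds to one head orientation; convolving a dry signal
# with a left/right BRIR pair auralizes that listener/source combination.
```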

EB2-7 Compatibility Study of Dolby Atmos Objects' Spatial Sound Localization Using a Visualization Method
Takashi Mikami, SONA Co. - Tokyo, Japan; Masataka Nakahara, ONFUTURE Ltd. - Tokyo, Japan; SONA Corp. - Tokyo, Japan; Kazutaka Someya, beBlue Co., Ltd. - Tokyo, Japan
3D sound intensity measurements were carried out in two Dolby Atmos-compliant mixing rooms, and spatial sound localizations were visualized using a newly developed visualizer, VSV (Virtual Source Visualizer), which locates sound directions over the panoramic 4π space from 3D sound intensity. In conventional channel-based sound design, sound localization depends on the loudspeaker positions, so differences between mixing rooms are expected. In object-based sound design such as Atmos, however, sound localization is rendered by the RMU (Rendering and Mastering Unit) from azimuth and elevation metadata and is expected not to depend on the loudspeaker positions. The session discusses the inter-room compatibility and differences of sound direction, evaluated with the visualization method, between two mixing rooms: a small Home Atmos-compliant mixing room and a large Cinema Atmos-compliant stage.
Engineering Brief 253 (Download now)

EB2-8 Controlling Program Loudness in Individualized Binaural Rendering of Multichannel Audio Contents
Emmanuel Ponsot, STMS Lab (Ircam, CNRS, UPMC) - Paris, France; Radio France - Paris, France; Hervé Dejardin, Radio France - Paris, France; Edwige Roncière, Radio France - Paris, France
For practical reasons, we often experience multichannel audio productions in a binaural context (e.g., headphones on mobile devices). To make listeners benefit from optimal binaural rendering (“BiLi project”), Radio France developed nouvOson (http://nouvoson.radiofrance.fr/), an online audio platform where listeners can select HRTFs and ITDs that best fit them. The goal of the present study was to control the program loudness (measured according to the ITU-R BS.1770 / R128 recommendations) after binauralization. To this end, we examined the influence of various parameters such as the audio content (synthetic vs. real broadcast audio), HRTFs, and ITDs on loudness. We propose a dynamic process, which adapts the gain in the binauralization chain so as to control the output loudness of virtual surround audio contents.
Engineering Brief 254 (Download now)

EB2-9 Presenting the S3A Object-Based Audio Drama Dataset
James Woodcock, University of Salford - Salford, Greater Manchester, UK; Chris Pike, BBC Research and Development - Salford, Greater Manchester, UK; University of York - Heslington, York, UK; Frank Melchior, BBC Research and Development - Salford, UK; Philip Coleman, University of Surrey - Guildford, Surrey, UK; Andreas Franck, University of Southampton - Southampton, Hampshire, UK; Adrian Hilton, University of Surrey - Guildford, Surrey, UK
This engineering brief reports on the production of three object-based audio drama scenes commissioned as part of the S3A project. 3D reproduction and an object-based workflow were considered and implemented from the initial script commissioning through to the final mix of the scenes. The scenes are being made available freely and without restriction as Broadcast Wave Format files containing all objects as separate tracks and all metadata necessary to render the scenes as an XML chunk in the header conforming to the Audio Definition Model specification (Recommendation ITU-R BS.2076 [1]). It is hoped that these scenes will find use in perceptual experiments and in the testing of 3D audio systems. The scenes are available via the following link: http://dx.doi.org/10.17866/rd.salford.3043921.
Engineering Brief 255 (Download now)

EB2-10 Installation of a Flexible 3D Audio Reproduction System into a Standardized Listening Room
Russell Mason, University of Surrey - Guildford, Surrey, UK
In order to undertake research into 3D audio reproduction systems, it was necessary to install a flexible loudspeaker rig into the ITU-R BS.1116 standard listening room at the University of Surrey. Using a mixture of aluminum truss and tube, a mounting method was created that allows a wide range of loudspeaker layouts. As an example configuration, an installed 22.2 system is described, along with the bass management of this system and the methods used to align the time of arrival, level, and frequency response of each channel. The resulting configuration is compared to the requirements of ITU-R BS.1116.
Engineering Brief 256 (Download now)
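The time-of-arrival and level alignment described here reduces to simple distance-based trims once the loudspeaker positions are known. A minimal sketch with illustrative geometry, not the Surrey installation:

```python
import numpy as np

c, fs = 343.0, 48000                      # speed of sound (m/s), sample rate (Hz)
distances = np.array([2.10, 2.35, 1.95])  # hypothetical loudspeaker distances (m)

# Delay each channel so all arrivals coincide with the most distant loudspeaker,
# and trim levels to that loudspeaker assuming inverse-distance (6 dB per doubling) loss.
delays_samples = np.round((distances.max() - distances) / c * fs).astype(int)
level_trims_db = 20 * np.log10(distances / distances.max())
print(delays_samples, level_trims_db.round(2))
```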

 
 

EB3 - eBriefs 3: Lectures


Monday, June 6, 11:45 — 14:00 (Room 353)

Chair:
Christian Uhle, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany

EB3-1 The Aerodynamics Phenomena of a Particular Bass-Reflex Port
Victor Manuel Garcia-Alcaide, Universitat Politècnica de Catalunya - Barcelona, Spain; Sergi Palleja-Cabre, Universitat Politècnica de Catalunya - Barcelona, Spain; R. Castilla, Universitat Politècnica de Catalunya - Barcelona, Spain; P. J. Gamez-Montero, Universitat Politècnica de Catalunya - Barcelona, Spain; Jordi Romeu, Universitat Politècnica de Catalunya - Barcelona, Spain; Teresa Pamies, Universitat Politècnica de Catalunya - Barcelona, Spain; Joan Amate, Amate Audio S.L. - Terrassa, Barcelona, Spain; Natalia Milan, Amate Audio S.L. - Barcelona, Spain
The aim of this paper is to study the aerodynamic phenomena of a particular bass-reflex port that cause unwanted noise in the audible frequency range. After discarding structural and mechanical vibration issues, the hypothesis that vortex shedding could be the source of the noise was considered. Experimental and numerical evidence of the vortex, an analysis of its noise, and the similarities between the real and simulated behavior are presented. The simulations were performed on axisymmetric geometries with the open-source OpenFOAM toolbox. Additionally, experiments were carried out: first, acoustic signal measurements to analyze the response of the bass-reflex ports, and second, mechanical vibration tests to rule out that source of noise. Good agreement was found between numerical and experimental results, especially in the frequency band of the detected noise, around 1200 Hz. The presented CFD approach has proved a useful and cost-effective tool for investigating this kind of phenomenon.
Engineering Brief 257 (Download now)
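A quick way to see why vortex shedding is a plausible source of tonal noise near 1200 Hz is the Strouhal relation f = St·U/D. A back-of-the-envelope check with assumed port-flow values, not the values measured in the paper:

```python
St = 0.2     # typical Strouhal number for bluff-body vortex shedding
U = 18.0     # assumed peak air velocity in the port, m/s
D = 0.003    # assumed characteristic dimension (e.g., port lip radius), m

f_shedding = St * U / D
print(f"estimated shedding frequency: {f_shedding:.0f} Hz")  # 1200 Hz for these assumptions
```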

EB3-2 A Novel 32-Speakers Spherical Source
Angelo Farina, Università di Parma - Parma, Italy; Lorenzo Chiesi, University of Parma - Parma, Italy
The construction and testing of a novel compact spherical source equipped with 32 individually driven 2" loudspeakers is presented. The new sound source is designed for making room acoustics measurements and for emulating the directivity pattern of various musical instruments or of human talkers and singers. The 32 signals feeding the loudspeakers can be obtained by three different approaches: (i) a set of High Order Ambisonics coefficients computed to emulate the polar pattern of a fixed-directivity source; (ii) a set of SPS (Spatial PCM Sampling) signals recorded around a real source, employing a corresponding set of 32 microphones placed on a sphere surrounding the real source; or (iii) a matrix of FIR filters, designed employing a mathematical theory almost identical to the one developed for creating virtual microphones from a spherical microphone array [1]. The presentation shows details of the construction of the new loudspeaker array and the results of the first tests performed to evaluate its capability of creating arbitrary polar radiation patterns.
Engineering Brief 258 (Download now)

EB3-3 Distracting Noise
Thomas Sporer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Tobias Clauß, Fraunhofer Institute for Digital Media Technology (IDMT) - Ilmenau, Germany; Nicolas Pachatz, Technical University of Ilmenau - Ilmenau, Germany; Clemens Müller, Technical University of Ilmenau - Ilmenau, Germany; Matthias-Fritz Melzer, Technical University of Ilmenau - Ilmenau, Germany; Judith Liebetrau, Fraunhofer IDMT - Ilmenau, Germany
Noise in domestic and work environments is usually assessed in terms of noise power. This does not reflect the fact that not only the temporal and spectral structure of the noise but also the activity of the test subject influences annoyance. In addition, there is a difference between artificial noise signals and noise signals that may carry meaning for the listener. In this study 15 assessors evaluated the perception of 23 natural noise stimuli at four different levels in two different situations, presented as spatial recordings of a library and a canteen. The test subjects did not focus on listening but on their tasks, and were told to indicate whenever noise disturbed their activities.
Engineering Brief 259 (Download now)

EB3-4 Noise-Robust Speech Emotion Recognition Using Denoising Autoencoder
Hun Kyu Ha, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Nam Kyun Kim, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Woo Kyeong Seong, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea
In this paper, a method of noise-robust speech emotion recognition under music noises is proposed by using a denoising autoencoder (DAE) and a support vector machine (SVM). The proposed method first trains a DAE by using emotional speech signals corrupted by music noises. Then, the output values from a middle layer of the DAE are used as speech features. Next, an SVM is trained to classify emotions using the DAE features. The performance of the proposed method is compared with that of a conventional SVM classifier. Consequently, it is shown that the proposed method relatively improves the overall emotion recognition rate by 9.76% under music noise conditions, compared to the conventional method.
Engineering Brief 260 (Download now)
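The DAE-plus-SVM pipeline can be prototyped in a few lines. A minimal sketch assuming pre-computed acoustic feature vectors for noisy and clean versions of each utterance; the feature front end, layer sizes, and placeholder data are my own assumptions, not the authors' exact setup:

```python
import numpy as np
from tensorflow import keras
from sklearn.svm import SVC

def build_dae(dim, bottleneck=64):
    """Denoising autoencoder trained to map noisy features to clean features;
    the bottleneck activations are later used as noise-robust features."""
    inp = keras.Input(shape=(dim,))
    h1 = keras.layers.Dense(256, activation="relu")(inp)
    code = keras.layers.Dense(bottleneck, activation="relu", name="bottleneck")(h1)
    h2 = keras.layers.Dense(256, activation="relu")(code)
    out = keras.layers.Dense(dim, activation="linear")(h2)
    model = keras.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

# X_noisy, X_clean: (n_utterances, dim) feature matrices; y: emotion labels (placeholders here)
X_noisy, X_clean = np.random.randn(500, 120), np.random.randn(500, 120)
y = np.random.randint(0, 4, 500)

dae = build_dae(dim=120)
dae.fit(X_noisy, X_clean, epochs=20, batch_size=32, verbose=0)

# Middle-layer outputs become the features for the emotion classifier
encoder = keras.Model(dae.input, dae.get_layer("bottleneck").output)
svm = SVC(kernel="rbf").fit(encoder.predict(X_noisy), y)
```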

EB3-5 Non-Intrusive Rumble Filtering by VLF Crossfeed with High Filter Slopes
Douglas Self, The Signal Transfer Company - London, UK
Vinyl discs create subsonic anti-phase signals because they are never perfectly flat, causing vertical stylus movement. This is often made worse by cartridge-arm resonance, giving amplitudes peaking around 10 Hz and requiring 40 dB of attenuation to reduce them to the vinyl noise floor. A conventional rumble filter needs very steep slopes to do this without unduly affecting the bottom of the audio band at 20 Hz. L-R crossfeed at low frequencies cancels the anti-phase signals, converting bass information to mono. This is not a new idea but has never caught on, probably because in published implementations the anti-phase filtering slope always comes out as –6 dB/octave, no matter what order of lowpass filter is used to control the crossfeed. It is demonstrated that time-correction of the lowpass filter group delay with simple allpass filtering gives a much steeper slope of –18 dB/octave for 2nd-, 3rd-, and 4th-order Butterworth filters, and intrusion into the audio band is minimized; this is believed to be novel. A practical design using 2nd-order filters was built and measured and gave the desired results.
Engineering Brief 261 (Download now)
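The underlying idea of VLF crossfeed is easiest to see in mid/side form: the anti-phase (side) component carries the rumble, so high-pass filtering only the side signal mono-sums the bass while leaving in-phase content untouched. A minimal digital sketch of that baseline idea only (a plain Butterworth high-pass on the side channel, not the brief's allpass-corrected, steeper-slope analog design):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def vlf_crossfeed(left, right, fs, fc=20.0, order=3):
    """Convert L/R to mid/side, high-pass the side (anti-phase) signal below fc,
    and convert back: below fc the output collapses to mono, cancelling
    vertical-modulation rumble."""
    mid, side = 0.5 * (left + right), 0.5 * (left - right)
    sos = butter(order, fc, btype="highpass", fs=fs, output="sos")
    side_hp = sosfiltfilt(sos, side)
    return mid + side_hp, mid - side_hp

# Usage: left_out, right_out = vlf_crossfeed(left, right, fs=96000)
```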

EB3-6 The Misunderstood Transformer: “The Answer Lies in the Flux!”
Michael Turner, Nidec Motor Corporation - Harrogate, UK
Whether used for power supply or signal interfacing, transformers are a key component of audio equipment. What is surprising is the extent to which these apparently simple devices are misunderstood. This engineering brief seeks to dispel some of the more common myths, to clarify the relationships between voltage, current, flux, and saturation, and thereby to assist with the proper design, selection, and application of (mainly) power transformers.
Engineering Brief 262 (Download now)

EB3-7 Designing a Laboratory for Immersive Arts
Christopher Keyes, Hong Kong Baptist University - Kowloon, Hong Kong
This brief gives an overview of a facility dedicated to 3D sound and multi-screen video. It houses a control room and a theater with the region’s only 24.2-channel sound system and 5 permanent HD video screens. At roughly 200 m³ it is a relatively small facility but has many uses. In its construction we were afforded a wide range of possibilities for spatial configurations and equipment choices. It is hoped that presenting some detail on these design decisions, including the choices available and those ultimately implemented, may be of use for readers planning and budgeting their own facilities.
Engineering Brief 263 (Download now)

EB3-8 Design and Implementation of a Low-Latency, Lightweight, High-Performance Voice Interface Front-End
Thierry Heeb, ISIN-SUPSI - Manno, Switzerland; Digimath - Sainte-Croix, Switzerland; Andrew Stanford-Jason, XMOS Ltd. - Bristol, UK; Tiziano Leidi, ISIN-SUPSI - Manno, Switzerland
Smart voice interfaces are the enabler of a new generation of consumer products such as network connected, voice enabled personal assistants. These are based on distributed architectures where voice is captured and pre-processed locally before being sent to remote servers for semantic analysis and response generation. A key element to achieve lowest cost and the best natural speech user experience is to keep latency to a minimum. This eBrief presents a lightweight, high-performance voice interface front-end software framework capable of handling multiple PDM microphones and integrating PDM to PCM conversion, high-resolution inter-channel delay, decimation, signal correction, and optional output data framing. The software forms a complete smart voice interface front-end running on the XMOS xCORE-200 architecture and achieving very low latency.
Engineering Brief 264 (Download now)
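PDM-to-PCM conversion is essentially low-pass filtering followed by decimation of the 1-bit oversampled stream. A minimal offline sketch of that step in Python, with illustrative rates and a placeholder bitstream (the brief's framework performs this in fixed real-time stages on the xCORE-200):

```python
import numpy as np
from scipy.signal import decimate

fs_pdm, fs_pcm = 3_072_000, 16_000                  # assumed PDM clock and target PCM rate
pdm = np.random.randint(0, 2, fs_pdm) * 2.0 - 1.0   # placeholder 1-bit stream mapped to +/-1

# 192x overall decimation, done in stages so each anti-aliasing filter stays short
pcm = pdm
for factor in (8, 8, 3):                            # 8 * 8 * 3 = 192
    pcm = decimate(pcm, factor, ftype="fir", zero_phase=True)

print(len(pcm))  # ~16,000 PCM samples per second of PDM input
```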

EB3-9 Multiphysical Simulation Methods for Loudspeakers—Nonlinear CAE-Based Simulations
Alfred Svobodnik, Konzept-X GmbH - Karlsruhe, Germany; Roger Shively, JJR Acoustics, LLC - Seattle, WA, USA; Marc-Olivier Chauveau, Moca Audio - Tours, France; Tommaso Nizzoli, Konzept-X GmbH - Karlsruhe, Germany
This is the third in a series of papers on the details of loudspeaker design using multiphysical computer aided engineering simulation methods. In this paper the simulation methodology for accurately modeling the nonlinear electromagnetics and structural dynamics of a loudspeaker will be presented. Primarily, the calculation of nonlinear force factor Bl(x), nonlinear inductance Le(x), and stiffness Kms(x) in the virtual world will be demonstrated. Finally, results will be presented correlating the simulated model results to the measured physical parameters. From that, the important aspects of the modeling that determine its accuracy will be discussed.
Engineering Brief 265 (Download now)
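The nonlinear parameters discussed here enter the familiar lumped-parameter loudspeaker equations as displacement-dependent coefficients. A minimal time-domain sketch with made-up polynomial fits for Bl(x) and Kms(x) (and Le held constant for brevity), purely to illustrate where the CAE-derived curves are used, not the paper's model:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Made-up small-signal values and large-signal polynomial fits (not from the paper)
Re, Le, Mms, Rms = 3.5, 0.5e-3, 0.012, 1.2           # ohm, H, kg, N*s/m
Bl  = lambda x: 6.0 - 900.0 * x**2                   # force factor Bl(x), T*m
Kms = lambda x: 2500.0 + 4.0e6 * x**2                # stiffness Kms(x), N/m

def loudspeaker(t, s, u):
    x, v, i = s                                      # excursion, velocity, coil current
    dx = v
    dv = (Bl(x) * i - Rms * v - Kms(x) * x) / Mms    # mechanical equation
    di = (u(t) - Re * i - Bl(x) * v) / Le            # electrical equation
    return [dx, dv, di]

drive = lambda t: 10.0 * np.sin(2 * np.pi * 40 * t)  # 10 V, 40 Hz drive signal
sol = solve_ivp(loudspeaker, (0, 0.2), [0, 0, 0], args=(drive,), max_step=1e-5)
print(f"peak excursion: {1e3 * np.abs(sol.y[0]).max():.2f} mm")
```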

 
 

EB4 - eBriefs 4: Lectures


Tuesday, June 7, 12:00 — 13:45 (Room 353)

Chair:
Thomas Görne, Hamburg University of Applied Sciences - Hamburg, Germany

EB4-1 A Survey of Suggested Techniques for Height Channel Capture in Multichannel Recording
Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Will Howie, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, Quebec, Canada; Jack Kelly, McGill University - Montreal, QC, Canada
Capturing audio in three dimensions is becoming a required skill for many recording engineers. Playback formats and systems now exist that take advantage of height channels, which introduce the aspect of elevation into the experience. In this engineering brief several exploratory techniques in height channel capture are reviewed and compared. Techniques optimized for conventional 5.1 surround sound are employed, and additional microphones are added to increase the immersive experience. Methods that have proven to be successful in 5.1 recordings are modified for 3D audio capture and the results are discussed. This case study will show an overview of the groundwork currently underway.
Engineering Brief 266 (Download now)

EB4-2 Perceptually Significant Parameters in Stereo and Binaural Mixing with Logic Pro Binaural Panner
Blas Payri, Universitat Politècnica de València - Valencia, Spain; Juan-Manuel Sanchis-Rico, Universitat Politècnica de València - Valencia, Spain
We conducted a perception experiment using organ chords recorded with 6 microphones and mixed in stereo and binaural formats, varying the maximum angular distribution (0°, 87°, 174°) and, for the binaural mixes, the elevation, front-rear distribution, and post-processing. N=51 participants (audio-related students) listened to the 20 stimuli on headphones, classified them according to similarity, and rated their valence and immersion. Results show high agreement on similarity (Cronbach’s alpha = .93) but very low agreement on valence and immersion ratings. The parameters that were perceived are the binaural/stereo difference, the binaural post-processing style, and, to a lesser degree, the angle. Elevation and rear distribution of sources did not yield any significant response.
Engineering Brief 267 (Download now)
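Cronbach's alpha, used here to quantify inter-rater agreement, is simple to compute from the raters-by-stimuli rating matrix. A small sketch with random placeholder data rather than the study's ratings:

```python
import numpy as np

def cronbach_alpha(ratings):
    """ratings: (n_raters, n_stimuli) matrix. Treating raters as 'items' and
    stimuli as cases: alpha = k/(k-1) * (1 - sum(rater variances) / var(total))."""
    k = ratings.shape[0]
    rater_vars = ratings.var(axis=1, ddof=1).sum()
    total_var = ratings.sum(axis=0).var(ddof=1)
    return k / (k - 1) * (1 - rater_vars / total_var)

print(cronbach_alpha(np.random.rand(51, 20)))  # 51 raters, 20 stimuli (placeholder data)
```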

EB4-3 3D Tune-In: The Use of 3D Sound and Gamification to Aid Better Adoption of Hearing Aid Technologies
Yuli Levtov, Reactify - London, UK; Lorenzo Picinali, Imperial College London - London, UK; Mirabelle D'Cruz, Reactify Music LLP - London, UK; Luca Simeone, 3D Tune-In consortium
3D Tune-In is an EU-funded project with the primary aim of improving the quality of life of hearing aid users. This is an introductory paper outlining the project’s innovative approach to achieving this goal, namely via the 3D Tune-In Toolkit, and a suite of accompanying games and applications. The 3D Tune-In Toolkit is a flexible, cross-platform library of code and guidelines that gives traditional game and software developers access to high-quality sound spatialization algorithms. The accompanying suite of games and applications will then make thorough use of the 3D Tune-In toolkit in order to address the problem of the under-exploitation of advanced hearing aid features, among others.
Engineering Brief 268 (Download now)

EB4-4 Binaural Auditory Feature Classification for Stereo Image Evaluation in Listening Rooms
Gavriil Kamaris, University of Patras - Rion Campus, Greece; Stamatis Karlos, University of Patras - Patras, Greece; Nikos Fazakis, University of Patras - Patras, Greece; Stergios Terpinas, University of Patras - Patras, Greece; John Mourjopoulos, University of Patras - Patras, Greece
Two aspects of stereo imaging accuracy in audio system listening have been investigated: (i) panned phantom image localization accuracy at 5-degree resolution and (ii) sweet spot spatial spread relative to the ideal anechoic reference. The simulation study used loudspeakers of different directivities under ideal anechoic or realistically varying reverberant room conditions and extracted binaural auditory features (ILDs, ITDs, and ICs) from the received audio signals. For evaluation, a Decision Tree classifier was used with sparse-data self-training, achieving localization accuracy ranging from 92% (for the ideal anechoic case when training and test data were from a similar audio category) down to 55% (for high reverberation when training and test data were different music segments). Sweet spot accuracy was defined and evaluated as a spatial spread statistical distribution function.
Engineering Brief 269 (Download now)
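The binaural features used for classification can be extracted with very little code. A minimal sketch of broadband ITD (via cross-correlation) and ILD computation followed by a scikit-learn decision tree, under my own simplifying assumptions rather than the authors' auditory-model front end:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def itd_ild(left, right, fs, max_lag_ms=1.0):
    """Broadband ITD from the cross-correlation peak and ILD from the ear energy ratio."""
    max_lag = int(fs * max_lag_ms / 1000)
    lags = np.arange(-max_lag, max_lag + 1)
    xcorr = [np.dot(left[max_lag:-max_lag], np.roll(right, l)[max_lag:-max_lag]) for l in lags]
    itd = lags[int(np.argmax(xcorr))] / fs
    ild = 10 * np.log10(np.sum(left**2) / (np.sum(right**2) + 1e-12))
    return itd, ild

# Hypothetical use: one (ITD, ILD) pair per binaural excerpt, labels = panned source angle
# X = np.array([itd_ild(l, r, fs) for l, r in excerpts]); y = angles
# clf = DecisionTreeClassifier(max_depth=6).fit(X, y)
```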

EB4-5 Elevation Control in Binaural Rendering
Aleksandr Karapetyan, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Felix Fleischmann, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Most binaural audio algorithms render the sound image solely on the horizontal plane. Recently, immersive and object-based audio applications like VR and games require control of the sound image position in the height dimension. However, measurements from elevated loudspeakers require a 3D loudspeaker setup. By analyzing early reflections of BRIRs with a fixed elevation, spectral cues for the height perception are extracted and applied to HRTFs. The parametrization of these cues allows the control of height perception. The sound image can be moved to higher as well as lower positions. The performance of this method has been evaluated by means of a listening test.
Engineering Brief 270 (Download now)

EB4-6 Headphone Virtualization for Immersive Audio Monitoring
Michael Smyth, Smyth Research Ltd. - Bangor, UK; Stephen Smyth, Smyth Research Ltd. - Bangor, UK
There are a number of competing immersive audio encoding formats, such as Dolby Atmos, Auro-3D, DTS-X, and MPEG-H, but, to date, there is no single loudspeaker format for monitoring them all. While it is argued that one of the benefits of using audio objects within immersive audio is that it allows rendering to different, and even competing, loudspeaker formats, nevertheless the documented native formats of each immersive codec must be considered the reference point when monitoring each immersive audio system. This implies that the ability to switch between different loudspeaker layouts will be important when monitoring different immersive audio formats. The solution outlined here is based on the generation of virtual loudspeakers within DSP hardware and their reproduction over normal stereo headphones. The integrated system is designed to allow the accurate monitoring of any immersive audio system of up to 32 loudspeaker sources, with the ability to switch almost instantly between formats. 
Engineering Brief 271 (Download now)

EB4-7 Temporal Envelope for Audio Classification
Ewa Lukasik, Poznan University of Technology - Poznan, Poland; Cong Yang, University of Siegen - Siegen, Germany; Lukasz Kurzawski, RecArt - Poznan, Poland; Polish Radio Poznan
The paper reviews applications of the temporal envelope of an audio signal from the perspective of a sound engineer. It contrasts the parametric representation of the temporal envelope (e.g., temporal centroid, attack time, attack slope) with a more global representation based on envelope shape descriptors. Such an approach mimics the sound engineer's expertise and could be useful for classification tasks such as music genre, speech/music discrimination, musical instrument classification, and others.
Engineering Brief 272 (Download now)

 
 

EB5 - eBriefs 5: Lectures


Tuesday, June 7, 14:00 — 15:30 (Room 353)

Chair:
Emiliano Caballero Fraccaroli, Electric Lady Studios - New York, NY, USA

EB5-1 An Investigation into Kinect and Middleware Error and Their Suitability for Academic Listening Tests
Thomas Johnson, University of Huddersfield - Huddersfield, West Yorkshire, UK; Ian Gibson, University of Huddersfield - Huddersfield, West Yorkshire, UK; Ben Evans, University of Huddersfield - Huddersfield, West Yorkshire, UK; Mark Wendl, University of Huddersfield - Huddersfield, West Yorkshire, UK
This paper investigates the accuracy and error introduced by middleware applications when used with the Kinect. The middleware applications (Synapse and GMS v3.0) were tested to quantify the error they introduce compared to the error of the Kinect and assess their suitability for use in academic listening tests.
Engineering Brief 273 (Download now)

EB5-2 How Can Actor Network Theory and Ecological Approach to the Perception Be Used to Analyze the Creative Audio Mixing Practice?
Yong Ju Lee, University of West London, London College of Music - London, UK
In audio mixing, communication between the artist or producer and the mix engineer is a crucial element in creating a track that is authentic and aesthetically pleasing. Through the MA Record Production module "Performance in The Studio," the researcher explored the idea that mix engineers, artists, and producers develop and select appropriate sounds for a track through a process of negotiation, and that this negotiation occurs through both verbal and non-verbal communication. Specifically, the researcher looks at subjective, 'vague,' metaphorical descriptions and at moments where the engineer, producer, and artists agree on the sound by recommendation and by synchronizing their expectations. A metaphorical description, however, cannot convey an exact meaning, as the available linguistic tools are insufficient. The researcher uses Actor Network Theory to understand this negotiation between the technical and the creative, and the role of this process of communication and cognition in the interaction and synchronization of the participants' mental representations of the mix and the mix process. Furthermore, the researcher uses the Ecological Approach to Perception to analyze specific behaviors and responses from participants in the mixing process.
Engineering Brief 274 (Download now)

EB5-3 On the Silver Globe Revisited
Joanna Napieralska, Frederic Chopin University of Music - Warsaw, Poland; Dorota Nowocien, The Felix Nowowiejski Academy of Music - Bydgoszcz, Poland; Ronin Group Studio - Radom, Poland
On the Silver Globe is known as the preeminent expression of Zulawski’s visionary ideas. Its shooting began in 1976, was halted in 1978 by the communist authorities, and the film was reconstituted in 1988. Digitally restored in 2016, courtesy of the Polish Film Institute, it had its first real showing on the 20th of February at Lincoln Center in New York, just three days after the director’s death and 28 years after its premiere at the Cannes Film Festival, where the mono loudspeaker had failed. The sound restoration is an example of what modern technology can do in pursuing a compromise between fidelity to the mono 35 mm magnetic original and digital 5.1 cinema in terms of sync, loudness, timbre, and spatial standards.
Engineering Brief 275 (Download now)

EB5-4 Analyzing Sonic Similarity in Hip-Hop through Critical Listening and Music Theory
Denis Martin, McGill University - Montreal, QC, Canada; CIRMMT - Montreal, QC, Canada; Ben Duinker, McGill University - Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, Quebec, Canada; David Benson, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada
The notion of a musical artist’s or genre’s sound is frequently evoked, but what sonic parameters define this sound? We address this question through an in-depth corpus analysis of 100 critically and commercially acclaimed hip-hop tracks from the genre’s golden age that we define as 1986–1996. We operationalize the term sound as the sum of both musical and production parameters. The practices of music theory and critical listening are brought together to analyze each track across 75 musical and production parameters in 9 categories. Through statistical analysis of our data set, we demonstrate that from these parameters we are capable of assembling groups of songs that sound alike. These groups are then compared against pre-existing groupings such as geographical location and recording label.
Engineering Brief 276 (Download now)

EB5-5 "Space Explorations": Broadening Binaural Horizons with Directionally-Matched Impulse Response Convolution ReverbMatthew Lien, Whispering Willows Records Inc. - Whitehorse, Yukon, Canada; Universal Music Publishing - Taipei, Taiwan
More people are listening with earphones than in the history of recorded music. But earphones locate typical audio claustrophobically in-and-around the listener's head due to an absence of localizing information the brain requires to externalize sound. When combined with recent trends to highly compress music, the results are an unnatural and unhealthy listening experience—a dumbing-down of the auditory faculty. But the rise of earphones has also brought binaural technology onto the radar. While most binaural music productions have been limited to capturing live performances within a single space, the pioneering application of directional binaural impulse response convolution reverb paired with directionally-matched binaural studio recordings restores acoustically diverse spatialization to music.
Engineering Brief 277 (Download now)
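The core operation described, convolving a close-miked studio recording with a binaural impulse response captured from the matching direction, is a per-ear convolution. A minimal sketch with hypothetical file names, using scipy in place of whatever convolution engine the production actually employed:

```python
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

dry, fs = sf.read("vocal_dry.wav")                 # mono studio recording (hypothetical file)
brir, fs_ir = sf.read("hall_az030_el000.wav")      # stereo binaural IR measured from the
assert fs == fs_ir                                 # direction the vocal is panned to

left = fftconvolve(dry, brir[:, 0])                # one convolution per ear places the source
right = fftconvolve(dry, brir[:, 1])               # in the measured space and direction
out = np.stack([left, right], axis=1)
sf.write("vocal_binaural.wav", out / np.abs(out).max(), fs)
```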

EB5-6 An Automated Source Separation Technology and Its Practical Applications
Alexandre Vaneph, Audionamix; Ellie McNeil, Audionamix; François Rigaud, Audionamix; Rick Silva, Audionamix
Audio source separation, the process of un-mixing, has long been seen as unreachable, "the holy grail." Recent progress in coupling digital signal processing with deep learning puts this process within reach of the typical audio engineer. Using our technology, we will demonstrate several separations focused on isolating voice tracks from fully arranged mixes, and the opportunities this technology opens up, in a series of industry case studies.
Engineering Brief 278 (Download now)
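At their core, most current separation engines apply a time-frequency mask, typically predicted by a neural network, to the mixture spectrogram. A minimal sketch of the masking step itself, using an oracle mask built from known stems and placeholder signals, since the brief does not disclose Audionamix's model:

```python
import numpy as np
from scipy.signal import stft, istft

fs = 44100
vocals = np.random.randn(5 * fs)            # placeholder stems; in a real system the mask
accomp = np.random.randn(5 * fs)            # would be predicted by a trained network
mix = vocals + accomp

f, t, MIX = stft(mix, fs, nperseg=2048)
_, _, VOC = stft(vocals, fs, nperseg=2048)
_, _, ACC = stft(accomp, fs, nperseg=2048)

mask = np.abs(VOC) / (np.abs(VOC) + np.abs(ACC) + 1e-12)   # ideal ratio mask
_, vocals_est = istft(mask * MIX, fs, nperseg=2048)        # masked mixture back to time domain
print(vocals_est.shape)
```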

 
 


