143rd AES Convention, New York 2017
Spatial Audio Track Event Details

Wednesday, October 18, 9:00 am — 10:00 am (Rm 1E13)


Spatial Audio: SA01 - Crash Course in 3D Audio

Presenter:
Nuno Fonseca, Polytechnic Institute of Leiria - Leiria, Portugal; Sound Particles

A little confused by all the new 3D formats out there? Although most 3D audio concepts have existed for decades, interest in 3D audio has increased in recent years with the new immersive formats for cinema and the rebirth of Virtual Reality (VR). This tutorial will present the most common 3D audio concepts, formats, and technologies, allowing you to finally understand buzzwords like Ambisonics/HOA, binaural, HRTF/HRIR, channel-based audio, object-based audio, Dolby Atmos, and Auro-3D, among others.
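
For readers new to the terminology, the sketch below illustrates the idea behind HRTF/HRIR-based binaural rendering: a mono source is convolved with the left-ear and right-ear head-related impulse responses measured for the desired direction. It is an illustrative example with placeholder HRIRs and arbitrary lengths, not material from the tutorial itself.

    # Minimal sketch of HRIR-based binaural rendering (illustrative only).
    # Real HRIRs would be loaded from a measured set (e.g., a SOFA file);
    # the dummy data below are placeholders.
    import numpy as np
    from scipy.signal import fftconvolve

    def binauralize(mono, hrir_left, hrir_right):
        """Convolve a mono signal with a left/right HRIR pair."""
        left = fftconvolve(mono, hrir_left)
        right = fftconvolve(mono, hrir_right)
        return np.stack([left, right], axis=0)   # (2, N) binaural signal

    fs = 48000
    mono = np.random.randn(fs)                   # 1 s of noise as a test source
    hrir_l = np.random.randn(256) * np.hanning(256)
    hrir_r = np.random.randn(256) * np.hanning(256)
    binaural = binauralize(mono, hrir_l, hrir_r)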

 
 

Wednesday, October 18, 9:30 am — 11:00 am (Rm 1E14)

Spatial Audio: SA02 - Riding the Faders: How to Stay Ahead of the Curve in Immersive Audio

Presenters:
Brennan Anderson, Senior Audio Producer, Pyramind Studios - San Francisco, CA, USA
Steve Horelick, Steve Horelick Music
Paul Special, Special Audio Services - Wantage, NJ, USA; ABC/Disney Television - New York, NY, USA
Richard Warp, Intonic - Emeryville, CA, USA

VR, AR, and XR have highlighted the importance of sound for "presence" in all content types, from live TV to film to games. This represents a unique opportunity for audio professionals, but making the transition to the new world of immersive realities is no easy task. Here you will learn from industry professionals who are creating new 3D audio workflows and growing professional networks while remaining firmly grounded in their craft. The Manhattan Producers' Alliance is a New York/San Francisco-based membership organization of engineers, composers, and producers. Our focus is on nurturing personal creativity within the art and craft of music making.


AES Members can watch a video of this session for free.


 
 

Wednesday, October 18, 9:30 am — 10:30 am (Rm 1E06 - PMC Room)


Spatial Audio: SA19 - Sound Design in 3D for Dance Music

Presenter:
Lasse Nipkow, Silent Work LLC - Zurich, Switzerland

Producers of dance music aim to put nightclub audiences into a trance. 3D audio has great potential to heighten sound impressions and engage the emotions, as it does in film sound: it allows spectacular spatiality and sound sources all around the listener. There are, however, some challenges. Clubbers on the dance floor are not oriented the way listeners of classical music are in a concert hall, because they are dancing, and the PA systems of music clubs are almost exclusively designed for mono and stereo reproduction. The impact of changing to a 3D audio system is therefore significant.

During the presentation, the possibilities of musical design in consideration of psychoacoustic phenomena will be described and demonstrated with example recordings and videos.

 
 

Wednesday, October 18, 11:00 am — 12:00 pm (Rm 1E06 - PMC Room)


Spatial Audio: SA16 - The Butterfly FX—Unintended Consequences of Mastering Tools

Presenter:
Piper Payne, Neato Mastering - San Francisco Bay Area, CA

 
 

Wednesday, October 18, 11:30 am — 12:30 pm (Rm 1E13)


Spatial Audio: SA03 - Spatial Music: Producing, Performing, and Publishing Music in 3D

Presenter:
Kedar Shashidhar, OSSIC - San Diego, CA

Attendees will learn to produce immersive spatial music for an emerging market with familiar tools. Using examples of spatially mixed recordings from award-winning artists, the tutorial will illustrate different approaches in various genres, including EDM, pop, and jazz, among others. Additional takeaways include ways in which produced content can translate to a live performance setting and subsequently be released on online platforms that support spatial formats.

Tutorial Will Cover:
• Basic Concepts in Spatial Audio (HRTFs & Ambisonics)
• Tools, Signal Flow, and Practical Implementation of 3D Audio technology
• Creative Tips and Best Practices when producing a track in 3D


AES Members can watch a video of this session for free.


This session is presented in association with the AES Technical Committee on Spatial Audio.

 
 

Wednesday, October 18, 2:00 pm — 3:00 pm (Rm 1E06 - PMC Room)


Spatial Audio: SA04 - Spatial Audio for Multi-Tracked Recordings—Workflows, Phase Relationships, Equalization, Reverberation and Delivery Specs

Presenter:
Albert Leusink, Tumble and Yaw - Weehawken, NJ

VR and AR are making a big push onto the world's stage, with projected revenues to exceed $120B by 2020. Spatial audio is an integral part of this, whether in Ambisonics, Dolby Atmos, or other formats. In this tutorial we will learn about the differences and similarities between spatial and stereo workflows, illustrated by real-world examples. We will discuss binaural rendering engines and their impact on phase coloration and frequency response, phase relationships of stereo and mono sources in the Ambisonic soundfield, and loudness management for different delivery platforms.
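
As a minimal illustration of how a mono source is placed in an Ambisonic soundfield (one of the workflow topics above), the sketch below encodes a signal into first-order B-format. The AmbiX (ACN/SN3D) conventions and the Python formulation are assumptions chosen for the example, not material from the tutorial.

    # Sketch: encode a mono source into first-order Ambisonics (AmbiX, ACN/SN3D).
    # Conventions are assumptions for illustration; match them to your renderer.
    import numpy as np

    def encode_foa(signal, azimuth_deg, elevation_deg):
        az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
        w = signal                               # omnidirectional component
        y = signal * np.sin(az) * np.cos(el)     # left-right
        z = signal * np.sin(el)                  # up-down
        x = signal * np.cos(az) * np.cos(el)     # front-back
        return np.stack([w, y, z, x])            # ACN channel order: W, Y, Z, X

    foa = encode_foa(np.random.randn(48000), azimuth_deg=30, elevation_deg=15)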

This session is presented in association with the AES Technical Committee on Spatial Audio.

 
 

Wednesday, October 18, 4:00 pm — 5:00 pm (Rm 1E14)


Spatial Audio: SA05 - The State of the Art of Binaural Audio for Loudspeakers and Headphones

Presenter:
Edgar Choueiri, Princeton University - Princeton, NJ, USA

I will describe the challenges of binaural audio through headphones (BAH) and loudspeakers (BAL), recent solutions to these challenges, and the state of the art of binaural processing and content development tools. In particular I will describe BACCH 3D Sound processing, which relies on optimal crosstalk cancellation filters, head tracking and automatic individualization to deliver accurate 3D imaging from binaural audio. I will then describe the recently developed BACCH-HP headphones processing, which significantly enhances the robustness of 3D imaging and the ability to head-externalize binaural audio. I will use the powerful BACCH-dSP software, which allows designing BACCH filters for BAL and BAH, processing binaural audio, translational and rotational head tracking, and 3D mixing, to illustrate the talk and demonstrate the technologies.


AES Members can watch a video of this session for free.


 
 

Thursday, October 19, 9:00 am — 12:00 pm (Rm 1E11)

Paper Session: P06 - Spatial Audio—Part 1

Chair:
Ravish Mehra, Oculus Research - Redmond, WA, USA

P06-1 Efficient Structures for Virtual Immersive Audio Processing
Jean-Marc Jot, Magic Leap - Sunnyvale, CA, USA; Daekyoung Noh, Xperi Corp - Santa Ana, CA, USA
New consumer audio formats have been developed in recent years for the production and distribution of immersive multichannel audio recordings including surround and height channels. HRTF-based binaural synthesis and cross-talk cancellation techniques can simulate virtual loudspeakers, localized in the horizontal plane or at elevated apparent positions, for audio reproduction over headphones or convenient loudspeaker playback systems. In this paper we review and discuss the practical design and implementation challenges of immersive audio virtualization methods, and describe computationally efficient processing approaches and topologies enabling more robust and consistent reproduction of directional audio cues in consumer applications.
Convention Paper 9865 (Purchase now)
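
To illustrate the virtual-loudspeaker concept reviewed in P06-1, the sketch below renders a multichannel signal binaurally by convolving each channel with the HRIR pair of its virtual loudspeaker position and summing. The layout and HRIRs are placeholders chosen for the example; this is not the paper's implementation.

    # Sketch: binaural "virtual loudspeaker" rendering of a multichannel signal.
    # Each channel is convolved with the left/right HRIRs of its speaker
    # direction and the results are summed. Dummy HRIRs stand in for real data.
    import numpy as np
    from scipy.signal import fftconvolve

    def virtual_speakers_to_binaural(channels, hrirs):
        """channels: (n_spk, N) signals; hrirs: (n_spk, 2, L) left/right HRIRs."""
        n = channels.shape[1] + hrirs.shape[2] - 1
        out = np.zeros((2, n))
        for ch, (h_left, h_right) in zip(channels, hrirs):
            out[0] += fftconvolve(ch, h_left)
            out[1] += fftconvolve(ch, h_right)
        return out

    channels = np.random.randn(5, 48000)          # e.g., a 5-channel mix
    hrirs = 0.1 * np.random.randn(5, 2, 256)      # placeholder HRIR pairs
    binaural = virtual_speakers_to_binaural(channels, hrirs)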

P06-2 Robust 3D Sound Capturing with Planar Microphone Arrays Using Directional Audio Coding
Oliver Thiergart, International Audio Laboratories Erlangen - Erlangen, Germany; Guendalina Milano, International Audio Laboratories Erlangen - Erlangen, Germany; Tobias Ascherl, International Audio Laboratories Erlangen - Erlangen, Germany; Emanuël A. P. Habets, International Audio Laboratories Erlangen - Erlangen, Germany
Real-world VR applications require capturing 3D sound with microphone setups that are hidden from the field of view of the 360-degree camera. Directional audio coding (DirAC) is a spatial sound capturing approach that can be applied to a wide range of compact microphone arrays. Unfortunately, its underlying parametric sound field model is often violated, which degrades the spatial sound quality. We therefore combine the non-linear DirAC processing with a linear beamforming approach that approximates the panning gains in DirAC, so that the required amount of non-linear processing is reduced while the robustness against model violations is increased. Additionally, we derive a DOA estimator that enables 3D sound capturing with DirAC using compact 2D microphone arrays, which are often preferred in VR applications.
Convention Paper 9866 (Purchase now)

P06-3 Frequency Bands Distribution for Virtual Source Widening in Binaural Synthesis
Hengwei Su, Tokyo University of the Arts - Tokyo, Japan; Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan; Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan
The aim of this study is to investigate perceived source width in binaural synthesis. To synthesize sounds with extended source widths, monophonic signals were divided by a 1/3-octave filter bank, and each component was then distributed to a different direction within the intended width by convolution with head-related transfer functions. A subjective listening experiment using pairwise comparison was conducted to evaluate differences in perceived width between stimuli with different synthesis widths and distribution methods. The results showed that this processing method can achieve a wider sound source width in binaural synthesis. However, its effectiveness may vary with the spectral characteristics of the source signals, so a further revision of the method is needed to improve its stability and performance.
Convention Paper 9867 (Purchase now)
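
A rough sketch of the band-distribution idea described in P06-3: split the mono signal into 1/3-octave bands and assign each band its own direction inside the intended width before HRIR convolution. The filter design, band edges, and direction mapping below are simplified assumptions, not the authors' exact procedure.

    # Sketch: split a mono signal into 1/3-octave bands and spread the bands
    # across directions within a target width (simplified illustration).
    import numpy as np
    from scipy.signal import butter, sosfilt

    def third_octave_bands(f_lo=100.0, f_hi=8000.0):
        """Return (low, high) edges of 1/3-octave bands between f_lo and f_hi."""
        bands, f = [], f_lo
        while f * 2 ** (1 / 3) <= f_hi:
            bands.append((f, f * 2 ** (1 / 3)))
            f *= 2 ** (1 / 3)
        return bands

    def widen(mono, fs, width_deg=60.0):
        bands = third_octave_bands()
        azimuths = np.linspace(-width_deg / 2, width_deg / 2, len(bands))
        components = []
        for (lo, hi), az in zip(bands, azimuths):
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            components.append((az, sosfilt(sos, mono)))
        # Each (azimuth, band) pair would then be convolved with the HRIR pair
        # for that azimuth and summed to form the binaural output.
        return components

    parts = widen(np.random.randn(48000), fs=48000)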

P06-4 Improving Elevation Perception in Single-Layer Loudspeaker Array Display Using Equalizing Filters and Lateral Grouping
Julian Villegas, University of Aizu - Aizu Wakamatsu, Fukushima, Japan; Naoki Fukasawa, University of Aizu - Aizu Wakamatsu, Japan; Yurina Suzuki, University of Aizu - Aizu Wakamatsu, Japan
A system to improve the perception of elevated sources is presented. This method relies on “equalizing filters,” a technique that aims to compensate for unintended changes in the magnitude spectrum produced by the placement of loudspeakers with respect to the desired location. In the proposed method, when sources are on the horizon, a maximum of two loudspeakers are used for reproduction. Otherwise, the horizon spatialization is mixed with one that uses side loudspeakers grouped by lateral direction. Results from a subjective experiment suggest that the proposed method is capable of producing elevated images, but the perceived elevation range is somewhat compressed.
Convention Paper 9868 (Purchase now)

P06-5 Development and Application of a Stereophonic Multichannel Recording Technique for 3D Audio and VR
Helmut Wittek, SCHOEPS GmbH - Karlsruhe, Germany; Günther Theile, VDT - Geretsried, Germany
A newly developed microphone arrangement is presented that aims at an optimal pickup of ambient sound for 3D Audio. The ORTF-3D is a discrete 8ch setup that can be routed to the channels of a 3D Stereo format such as Dolby Atmos or Auro3D. It is also ideally suited for immersive sound formats such as wavefield synthesis or VR/Binaural, as it creates a complex 3D ambience that can be mixed or binauralized. The ORTF-3D setup was developed on the basis of stereophonic rules. It creates an optimal directional image in all directions as well as a high spatial sound quality due to highly uncorrelated signals in the diffuse sound. Reports from sound engineers affirm that it creates a highly immersive sound in a large listening area and still is compact and practical to use.
Convention Paper 9869 (Purchase now)

P06-6 Apparent Sound Source De-Elevation Using Digital Filters Based on Human Sound Localization
Adrian Celestinos, Samsung Research America - Valencia, CA, USA; Elisabeth McMullin, Samsung Research America - Valencia, CA, USA; Ritesh Banka, Samsung Research America - Valencia, CA, USA; William Decanio, Samsung Research America - Valencia, CA, USA; Allan Devantier, Samsung Research America - Valencia, CA, USA
This study presents the possibility of creating an apparent sound source elevated or de-elevated from its physical location. For situations where loudspeakers must be placed away from the ideal positions for accurate sound reproduction, digital filters are created and inserted into the audio reproduction chain to either elevate or de-elevate the perceived sound relative to its physical location. The filters are based on head-related transfer functions (HRTFs) measured on human subjects; they capture the average head, ear, and torso transfer functions while isolating the effect of elevation/de-elevation only. Preliminary tests in a movie theater setup indicate that an apparent de-elevation of about –20 degrees from the physical location can be achieved.
Convention Paper 9870 (Purchase now)

 
 

Thursday, October 19, 9:30 am — 11:30 am (Rm 1E06 - PMC Room)

Spatial Audio: SA17 - PMC: A 9.1 Musical Experience

Presenters:
Morten Lindberg, 2L (Lindberg Lyd AS) - Oslo, Norway
Daniel Shores, Sono Luminus - Boyce, VA, USA; Shenandoah Conservatory Music Production and Recording Technology - Winchester, VA, USA

 
 

Thursday, October 19, 2:00 pm — 3:30 pm (Poster Area)

Poster: P11 - Spatial Audio

P11-1 Deep Neural Network Based HRTF Personalization Using Anthropometric Measurements
Chan Jun Chun, Korea Institute of Civil Engineering and Building Technology (KICT) - Goyang, Korea; Jung Min Moon, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Geon Woo Lee, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Nam Kyun Kim, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Hong Kook Kim, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea
A head-related transfer function (HRTF) is a very simple and powerful tool for producing spatial sound by filtering monaural sound. It represents the effects of the head, body, and pinna as well as the pathway from a given source position to a listener's ears. Unfortunately, although HRTF characteristics differ from person to person, it is usual to use an HRIR averaged over all subjects, and it is difficult to measure individual HRTFs for all horizontal and vertical directions. This paper therefore proposes a deep neural network (DNN)-based HRTF personalization method using anthropometric measurements. To this end, the CIPIC HRTF database, a public-domain database of HRTF measurements, is analyzed to generate a DNN model for HRTF personalization. The input features for the DNN are the anthropometric measurements, including head, torso, and pinna information, and the output labels are the head-related impulse response (HRIR) samples of the left ear. The performance of the proposed method is evaluated by computing the root-mean-square error (RMSE) and log-spectral distortion (LSD) between the reference HRIR and the one estimated by the proposed method. The RMSE and LSD for the estimated HRIR are shown to be smaller than those of the HRIR averaged over all subjects in the CIPIC HRTF database.
Convention Paper 9860 (Purchase now)
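
A toy sketch of the regression setup described in P11-1: anthropometric measurements in, HRIR samples out, with RMSE as the error measure. The data are synthetic placeholders and the network size is an arbitrary assumption; neither the paper's architecture nor the CIPIC feature set is reproduced here.

    # Toy sketch of DNN-based HRTF personalization (synthetic data, not CIPIC).
    # Inputs: anthropometric measurements; outputs: left-ear HRIR samples.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    n_subjects, n_anthro, n_hrir = 45, 37, 200        # placeholder sizes
    X = rng.normal(size=(n_subjects, n_anthro))       # anthropometric features
    Y = rng.normal(size=(n_subjects, n_hrir))         # HRIR samples (one direction)

    model = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=2000)
    model.fit(X[:-5], Y[:-5])                         # hold out 5 subjects

    pred = model.predict(X[-5:])
    rmse = np.sqrt(np.mean((pred - Y[-5:]) ** 2))     # evaluation metric as in the paper
    print("RMSE on held-out subjects:", rmse)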

P11-2 The Upmix Method for 22.2 Multichannel Sound Using Phase Randomized Impulse Responses
Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan; Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan
An upmix technique for 22.2 multichannel sound was studied using room impulse responses (RIRs) processed by a phase-randomization technique. The results of the first experiment showed that the spatial impression of the proposed method was close to that of the original sound, but the timbre differed. In the second experiment we divided the RIRs at the moment when the diffuse reverberation tail begins (the mixing time), using two approaches: a fixed time of 80 ms and a different mixing time for each frequency band. The results showed that the similarity of the proposed methods to the original sound improved; however, it is suggested that the similarity of the timbre depends on the sound sources and on choosing a suitable mixing time for the RIRs.
Convention Paper 9861 (Purchase now)
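
The core operation in P11-2, phase randomization of a room impulse response, can be sketched as follows: keep the magnitude spectrum and replace the phase with random values. The split of the RIR at the mixing time is omitted, and the synthetic RIR is a placeholder; this is not the authors' code.

    # Sketch: phase-randomize a room impulse response while keeping its
    # magnitude spectrum (real-input FFT keeps the result real-valued).
    import numpy as np

    def phase_randomize(rir, seed=0):
        rng = np.random.default_rng(seed)
        spectrum = np.fft.rfft(rir)
        random_phase = np.exp(1j * rng.uniform(0, 2 * np.pi, spectrum.shape))
        random_phase[0] = 1.0                      # keep the DC bin real
        if rir.size % 2 == 0:
            random_phase[-1] = 1.0                 # keep the Nyquist bin real
        return np.fft.irfft(np.abs(spectrum) * random_phase, n=rir.size)

    rir = np.random.randn(48000) * np.exp(-np.arange(48000) / 8000.0)
    rir_randomized = phase_randomize(rir)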

P11-3 A 3D Sound Localization System Using Two Side Loudspeaker Matrices
Yoshihiko Sato, University of Aizu - Aizuwakamatsu-shi, Fukushima, Japan; Akira Saji, University of Aizu - Aizuwakamatsu City, Japan; Jie Huang, University of Aizu - Aizuwakamatsu City, Japan
We have proposed a new 3D sound reproduction system that consists of two side loudspeaker matrices, each with four loudspeakers. The 3D sound images applied to this system were created by the amplitude panning method and by convolution with head-related transfer functions (HRTFs). In our past research we arranged the loudspeaker matrices in a square shape, but the accuracy of sound image localization needed improvement. We therefore changed the shape of the loudspeaker matrices from a square to a diamond by rotating them 45 degrees to improve direction perception. As a result, the diamond-shaped loudspeaker matrices brought the localized sound images closer to the intended directions than the square-shaped matrices did.
Convention Paper 9862 (Purchase now)

P11-4 Optimization of Interactive Binaural Processing
François Salmon, CMAP - Ecole Polytechnique - Paris, France; Ecole nationale supérieure Louis-Lumière - Paris, France; Matthieu Aussal, CMAP - Ecole Polytechnique - Paris, France; Etienne Hendrickx, University of Brest - Paris, France; Jean-Christophe Messonnier, CNSMDP Conservatoire de Paris - Paris, France; Laurent Millot, ENS Louis-Lumière - Paris, France; Acte Institute (UMR 8218, CNRS/University Paris 1) - Paris, France
Several monitoring devices may be involved during post-production. Given its lower cost and practical advantages, head-tracked binaural processing could help professionals monitor spatialized audio content. However, this technology introduces significant spectral coloration for some sound incidences and suffers in comparison with a stereophonic signal reproduced through headphones. Different processing methods are therefore proposed to optimize the binaural rendering and to find a new balance between externalization and timbral coloration. For this purpose, the alteration of the HRTF spectral cues in the frontal area only has been studied. Listening tests were conducted to evaluate the accuracy of such treatments. One HRTF processing method offered as much externalization as the original HRTFs while having a timbre quality closer to the original stereo signal.
Convention Paper 9863 (Purchase now)

P11-5 A Direct Comparison of Localization Performance When Using First, Third, and Fifth Ambisonics Order for Real Loudspeaker and Virtual Loudspeaker Rendering
Lewis Thresh, University of York - York, UK; Cal Armstrong, University of York - York, UK; Gavin Kearney, University of York - York, UK
Ambisonics is being used in applications such as virtual reality to render 3-dimensional sound fields over headphones through the use of virtual loudspeakers, the performance of which has previously been assessed up to third order. Through a localization test, the performance of first-, third-, and fifth-order Ambisonics is investigated for optimized real and virtual loudspeaker arrays utilizing a generic HRTF set. Results indicate a minor improvement in localization accuracy when using fifth order over third, though both show a vast improvement over first. It is shown that individualized HRTFs are required to fully investigate the performance of Ambisonic binaural rendering.
Convention Paper 9864 (Purchase now)

 
 

Thursday, October 19, 3:00 pm — 4:00 pm (Rm 1E09)


Spatial Audio: SA06 - Perceptual Thresholds of Spatial Audio Latency for Dynamic Virtual Auditory Environments

Presenter:
Ravish Mehra, Oculus Research - Redmond, WA, USA

Generating acoustic signals that reproduce the properties of natural environments through headphones remains a significant technical challenge. One hurdle is the time it takes to update the signal each time the listener moves. The end-to-end spatial audio latency (SAL) is the time elapsed between the listener assuming a new position and the updated sound being delivered to their ears. It comprises latencies in head tracking, HRTF interpolation and filtering, operating-system callbacks, audio driver and hardware (D/A conversion) buffering, and other parts of the signal-processing chain. Because some SAL is currently inevitable, it is important to know how much SAL is detectable in order to set minimum thresholds for SAL in virtual auditory environments.

We used a two-interval forced-choice paradigm to measure SAL detectability at 10- and 60-degree azimuths, both with and without co-located visual stimuli. Overall, mean SAL thresholds were between 128 ms and 158 ms. Consistent with results from minimum audible motion angle data, thresholds were greater at larger azimuthal positions. A retrospective analysis revealed that listeners who strategically varied the velocity, acceleration, and rate of their head rotation were better able to perform the task. This suggests that thresholds for SAL will be lower for applications where users are expected to move their heads more rapidly and abruptly. Results are discussed in the context of prior research and the potential implications for rendering virtual reality audio.
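
The two-interval forced-choice paradigm mentioned above is commonly run as an adaptive staircase that converges on a detection threshold. The sketch below is a generic 2-down/1-up staircase with a simulated observer; the step size, starting value, and observer model are assumptions for illustration, not the presenters' protocol.

    # Generic 2-down/1-up adaptive staircase for a 2IFC detection task
    # (illustrative; converges near the 70.7%-correct point).
    import numpy as np

    def staircase(prob_correct, start=300.0, step=20.0, floor=0.0, n_reversals=8):
        """prob_correct(latency_ms) -> probability of a correct response."""
        rng = np.random.default_rng(1)
        latency, correct_in_a_row, direction = start, 0, 0
        reversal_values = []
        while len(reversal_values) < n_reversals:
            correct = rng.random() < prob_correct(latency)
            if correct:
                correct_in_a_row += 1
                if correct_in_a_row == 2:            # 2-down: make the task harder
                    correct_in_a_row = 0
                    if direction == +1:
                        reversal_values.append(latency)
                    direction = -1
                    latency = max(floor, latency - step)
            else:
                correct_in_a_row = 0                 # 1-up: make the task easier
                if direction == -1:
                    reversal_values.append(latency)
                direction = +1
                latency += step
        return np.mean(reversal_values)              # threshold estimate

    # Simulated observer whose detection improves with larger latency.
    threshold = staircase(lambda ms: 0.5 + 0.5 / (1 + np.exp(-(ms - 140) / 20)))
    print("estimated SAL threshold (ms):", threshold)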

 
 

Thursday, October 19, 3:15 pm — 4:45 pm (Rm 1E06 - PMC Room)

Spatial Audio: SA07 - Practical Immersive Audio at Home

Chair:
Chris Pike, BBC R&D - Salford, Greater Manchester, UK; University of York - Heslington, York, UK
Presenters:
Jon Francombe, University of Surrey - Guildford, Surrey, UK
Hilmar Lehnert, Senior Director Audio Systems, Sonos - Boston, MA, USA
Alan Seefeldt, Dolby Laboratories - San Francisco, CA, USA

Current methods for immersive audio reproduction in the home include channel-based systems, object-based audio, different types of soundbars, multi-room wireless or Bluetooth loudspeakers, up-firing loudspeakers, ad hoc arrays of mobile phones, and so on. These very different approaches all unlock opportunities for creating immersive and personalizable listening experiences, and each has its own merits and limitations. This workshop will feature a panel of experienced industrial practitioners and academic researchers. It will provide a platform for discussion around different perspectives on the challenges and potential solutions for making engaging and personalizable spatial audio experiences available in the living room.

 
 

Friday, October 20, 12:00 pm — 1:00 pm (Rm 1E06 - PMC Room)

Spatial Audio: SA18 - PMC: Mixing Like a Producer

Presenter:
Chris Tabron

 
 

Friday, October 20, 1:45 pm — 3:15 pm (Rm 1E13)

Spatial Audio: SA08 - Immersive Audio for Music—Why Do It?

Presenters:
Stefan Bock, msm-studios GmbH - Munich, Germany
Morten Lindberg, 2L (Lindberg Lyd AS) - Oslo, Norway
Daniel Shores, Sono Luminus - Boyce, VA, USA; Shenandoah Conservatory Music Production and Recording Technology - Winchester, VA, USA

The panel will discuss concepts in the future of immersive music. They will explore ideas in how to reach the masses, explain the current efforts, and the challenges of reaching the consumers. But most of all, they will examine the question; “why are we doing it?”

 
 

Friday, October 20, 3:15 pm — 4:15 pm (Rm 1E06 - PMC Room)


Spatial Audio: SA09 - Kraftwerk and Booka Shade—The Challenge to Create Electro Pop Music in Immersive/3D Audio Formats Like Dolby Atmos

Presenter:
Tom Ammermann, New Audio Technology GmbH - Hamburg, Germany

Music does not take a cinematic approach in which spaceships fly around the listener. Nonetheless, music can become a fantastic spatial listening adventure in immersive/3D audio. How this sounds will be demonstrated with this year's new Kraftwerk and Booka Shade Blu-ray releases. Production philosophies, strategies, and workflows for creating immersive/3D audio in current workflows and DAWs will be shown and explained.

 
 

Friday, October 20, 4:30 pm — 5:30 pm (Rm 1E06 - PMC Room)


Spatial Audio: SA10 - Native Immersive Recordings

Presenter:
Daniel Shores, Sono Luminus - Boyce, VA, USA; Shenandoah Conservatory Music Production and Recording Technology - Winchester, VA, USA

In this tutorial, Sono Luminus head engineer Daniel Shores will demonstrate and discuss the techniques and planning process behind native immersive recordings, as well as play examples from numerous albums, including those by the Los Angeles Percussion Quartet, ACME, Lorelei Ensemble, Skylark, the Iceland Symphony Orchestra, and others.

 
 

Saturday, October 21, 9:00 am — 10:00 am (Rm 1E13)


Spatial Audio: SA11 - Creating Audio for Virtual Reality Applications

Presenter:
Bob Schulein, ImmersAV Technology - Schaumburg, IL, USA

Audio has always been an integral element in the creation of more realistic audio-visual entertainment experiences. With the evolution of personal 3D audio and imaging technologies, entertainment experiences with a higher degree of cognition, commonly referred to as virtual reality, are possible. The quest for more engaging user experiences has raised the challenge for more compelling audio. Elements of binaural hearing and sound capture have come to play a central role in existing and evolving production techniques. Of particular importance is the value of images related to audio content as a means of improving realism and minimizing binaural recording and reproduction artifacts. This tutorial will cover the elements of binaural audio as they relate to producing compelling entertainment and educational content for virtual reality applications. Specific areas to be covered, with supporting audio and 3D anaglyph video demonstrations, include audio for games, music entertainment, radio drama, and music education. Audio production tools, including binaural and higher-order Ambisonic capture microphone systems with and without motion capture, will be presented and demonstrated.

This session is presented in association with the AES Technical Committee on Spatial Audio.

 
 

Saturday, October 21, 9:00 am — 12:30 pm (Rm 1E11)

Paper Session: P16 - Spatial Audio—Part 2

Chair:
Jean-Marc Jot, Magic Leap - Sunnyvale, CA, USA

P16-1 On Data-Driven Approaches to Head-Related-Transfer Function Personalization
Haytham Fayek, Oculus Research and Facebook - Redmond, WA, USA; Laurens van der Maaten, Facebook AI Research - New York, NY, USA; Griffin Romigh, Oculus Research - Redmond, WA, USA; Ravish Mehra, Oculus Research - Redmond, WA, USA
Head-Related Transfer Function (HRTF) personalization is key to improving spatial audio perception and localization in virtual auditory displays. We investigate the task of personalizing HRTFs from anthropometric measurements, which can be decomposed into two subtasks: Interaural Time Delay (ITD) prediction and HRTF magnitude spectrum prediction. We explore both problems using state-of-the-art Machine Learning (ML) techniques. First, we show that ITD prediction can be significantly improved by smoothing the ITD using a spherical harmonics representation. Second, our results indicate that prior unsupervised dimensionality-reduction-based approaches may be unsuitable for HRTF personalization. Last, we show that neural network models trained on the full HRTF representation improve HRTF prediction compared to prior methods.
Convention Paper 9890 (Purchase now)
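
As a sketch of the ITD-smoothing step reported in P16-1, the code below fits a low-order real spherical-harmonic expansion to ITD samples over direction by least squares and evaluates the smoothed result. The order, the measurement grid, and the crude ITD model are assumptions; this is not the authors' implementation.

    # Sketch: smooth ITD over direction with a low-order spherical-harmonic fit.
    import numpy as np
    from scipy.special import sph_harm

    def real_sh_basis(order, azimuth, colatitude):
        """Real spherical-harmonic basis, shape (n_dirs, (order + 1)**2)."""
        cols = []
        for n in range(order + 1):
            for m in range(-n, n + 1):
                y = sph_harm(abs(m), n, azimuth, colatitude)
                if m < 0:
                    cols.append(np.sqrt(2) * (-1) ** m * y.imag)
                elif m == 0:
                    cols.append(y.real)
                else:
                    cols.append(np.sqrt(2) * (-1) ** m * y.real)
        return np.stack(cols, axis=-1)

    # Synthetic measurement grid and a crude noisy ITD model (seconds).
    az = np.random.uniform(0, 2 * np.pi, 500)
    col = np.random.uniform(0, np.pi, 500)
    itd = 0.00066 * np.sin(az) * np.sin(col) + 1e-5 * np.random.randn(500)

    B = real_sh_basis(4, az, col)
    coeffs, *_ = np.linalg.lstsq(B, itd, rcond=None)
    itd_smoothed = B @ coeffs                  # smoothed ITD at the same directions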

P16-2 Eigen-Images of Head-Related Transfer Functions
Christoph Hold, Technische Universität Berlin - Berlin, Germany; Fabian Seipel, TU Berlin - Berlin, Germany; Fabian Brinkmann, Audio Communication Group, Technical University Berlin - Berlin, Germany; Athanasios Lykartsis, TU Berlin - Berlin, Germany; Stefan Weinzierl, Technical University of Berlin - Berlin, Germany
The individualization of head-related transfer functions (HRTFs) leads to perceptually enhanced virtual environments. In particular, the peak-notch structure in HRTF spectra, which depends on the listener's specific head and pinna anthropometry, contains crucial auditory cues, e.g., for the perception of sound source elevation. Inspired by the eigen-faces approach, we have decomposed image representations of individual full-spherical HRTF data sets into linear combinations of orthogonal eigen-images by principal component analysis (PCA). Those eigen-images reveal regions of inter-subject variability across sets of HRTFs depending on direction and frequency. Results show common features as well as spectral variation within the individual HRTFs. Moreover, we can statistically de-noise the measured HRTFs using dimensionality reduction.
Convention Paper 9891 (Purchase now)
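
To illustrate the eigen-images idea of P16-2, the sketch below runs PCA via SVD on a stack of per-subject HRTF magnitude "images" (direction by frequency) and reconstructs each set from a few components. The data and dimensions are random placeholders, not measured HRTFs.

    # Sketch: PCA ("eigen-images") of per-subject HRTF magnitude images.
    # Each subject's HRTF set is flattened into one row; SVD of the
    # mean-removed matrix yields orthogonal eigen-images.
    import numpy as np

    n_subjects, n_dirs, n_freqs = 40, 1250, 128
    hrtf_mag = np.abs(np.random.randn(n_subjects, n_dirs, n_freqs))   # placeholder

    X = hrtf_mag.reshape(n_subjects, -1)
    mean_image = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean_image, full_matrices=False)

    eigen_images = Vt.reshape(-1, n_dirs, n_freqs)   # principal components as images
    k = 5                                            # keep only a few components
    coeffs = (X - mean_image) @ Vt[:k].T
    denoised = coeffs @ Vt[:k] + mean_image          # reduced-rank reconstruction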

P16-3 A Method for Efficiently Calculating Head-Related Transfer Functions Directly from Head Scan Point Clouds
Rahulram Sridhar, Princeton University - Princeton, NJ, USA; Edgar Choueiri, Princeton University - Princeton, NJ, USA
A method is developed for efficiently calculating head-related transfer functions (HRTFs) directly from head scan point clouds of a subject using a database of HRTFs, and corresponding head scans, of many subjects. Consumer applications require HRTFs be estimated accurately and efficiently, but existing methods do not simultaneously meet these requirements. The presented method uses efficient matrix multiplications to compute HRTFs from spherical harmonic representations of head scan point clouds that may be obtained from consumer-grade cameras. The method was applied to a database of only 23 subjects, and while calculated interaural time difference errors are found to be above estimated perceptual thresholds for some spatial directions, HRTF spectral distortions up to 6 kHz fall below perceptual thresholds for most directions.
Convention Paper 9892 (Purchase now)

P16-4 Head Rotation Data Extraction from Virtual Reality Gameplay Using Non-Individualized HRTFs
Juan Simon Calle, New York University - New York, NY, USA; THX; Agnieszka Roginska, New York University - New York, NY, USA
A game was created to analyze subjects' head rotation while localizing a sound in a 360-degree sphere during VR gameplay. In the game the subjects are asked to locate a series of sounds randomly placed on a sphere around their heads, rendered with generalized HRTFs. The only instruction given to the subjects is to locate each sound as quickly and accurately as possible by looking at where the sound was and then pressing a trigger. Sixteen subjects were used to test this tool. The average time to locate a sound was 3.7 ± 1.8 seconds, and the average localization error was 15.4 degrees. Subjects started moving their heads after approximately 0.2 seconds, and the average rotation speed reached its maximum at 0.8 seconds, at approximately 102 degrees per second.
Convention Paper 9893 (Purchase now)

P16-5 Relevance of Headphone Characteristics in Binaural Listening Experiments: A Case Study
Florian Völk, Technische Universität München - Munich, Germany; WindAcoustics - Windach, Germany; Jörg Encke, Technical University of Munich - Munich, Germany; Jasmin Kreh, Technical University of Munich - Munich, Germany; Werner Hemmert, Technical University of Munich - Munich, Germany
Listening experiments typically target the performance and capabilities of the auditory system. Another common application scenario is the perceptual validation of algorithms and technical systems. In both cases, systems other than the device or subject under test must not affect the results in an uncontrolled manner. Binaural listening experiments require that two signals with predefined amplitude or phase differences stimulate the left and right ear, respectively. Headphone playback is a common method for presenting the signals. This study quantifies potential headphone-induced interaural differences by physical measurements on selected circum-aural headphones and by comparison to psychoacoustic data. The results indicate that perceptually relevant effects may occur in binaural listening experiments, traditional binaural headphone listening, and virtual acoustics rendering such as binaural synthesis.
Convention Paper 9894 (Purchase now)

P16-6 Evaluating Binaural Reproduction Systems from Behavioral Patterns in a Virtual Reality—A Case Study with Impaired Binaural Cues and Tracking Latency
Olli Rummukainen, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany; Sebastian Schlecht, International Audio Laboratories - Erlangen, Germany; Axel Plinge, International Audio Laboratories Erlangen - Erlangen, Germany; Emanuël A. P. Habets, International Audio Laboratories Erlangen - Erlangen, Germany
This paper proposes a method for evaluating real-time binaural reproduction systems by means of a wayfinding task in six degrees of freedom. Participants physically walk to sound objects in a virtual reality created by a head-mounted display and binaural audio. The method allows for comparative evaluation of different rendering and tracking systems. We show how the localization accuracy of spatial audio rendering is reflected by objective measures of the participants' behavior and task performance. As independent variables we add tracking latency or reduce the binaural cues. We provide a reference scenario with loudspeaker reproduction and an anchor scenario with monaural reproduction for comparison.
Convention Paper 9895 (Purchase now)

P16-7 Coding Strategies for Multichannel Wiener Filters in Binaural Hearing Aids
Roberto Gil-Pita, University of Alcalá - Alcalá de Henares, Madrid, Spain; Beatriz Lopez-Garrido, Servicio de Salud de Castilla la Mancha (SESCAM) - Castilla-Mancha, Spain; Manuel Rosa-Zurera, University of Alcalá - Alcalá de Henares, Madrid, Spain
Binaural hearing aids use spatial sound techniques to increase intelligibility, but the design of algorithms for these devices faces strong constraints. To minimize power consumption and maximize battery life, the digital signal processors embedded in these devices have very low clock frequencies and limited memory. In the binaural case, the wireless communication between the two hearing devices further increases power consumption, making it necessary to study the relationship between intelligibility improvement and required transmission bandwidth. This paper therefore proposes and compares several coding strategies for the implementation of binaural multichannel Wiener filters, with the aim of keeping communication bandwidth and transmission power minimal. The results demonstrate the suitability of the proposed coding strategies.
Convention Paper 9896 (Purchase now)

 
 

Saturday, October 21, 10:15 am — 12:15 pm (Rm 1E13)

Spatial Audio: SA12 - Binaural Listening Experience

Moderator:
Marta Olko, New York University - New York, NY, USA

This listening session will feature a selection of binaural recordings, mixes, and listening experiences from various artists, composers, and recording and mixing engineers.

 
 

Saturday, October 21, 10:45 am — 11:45 am (Rm 1E06 - PMC Room)

Spatial Audio: SA13 - 3D Ambeo Live and Studio Recordings

Moderator:
Gregor Zielinsky, Sennheiser Electronic GmbH & Co. KG - Germany
Presenters:
Jim Anderson, Anderson Audio NY - New York, NY, USA; Clive Davis Institute of Recorded Music, New York University - New York, NY, USA
Ulrike Schwarz, Anderson Audio NY - New York, NY, USA

Live recording experts Jim Anderson and Ulrike Schwarz of Anderson Audio New York captured this year’s Chelsea Music Festival for delivery via Sennheiser’s immersive Ambeo 3D audio format and will present this at the session.

Host Gregor Zielinsky of Sennheiser will present further examples of live and studio Ambeo productions. The MKH 800 Twin plug-in will also be explained and presented in a live session. This free plug-in makes working with the dual-capsule, two-way-output MKH 800 Twin much easier and more flexible.

 
 

Saturday, October 21, 1:15 pm — 2:45 pm (Rm 1E06 - PMC Room)

Spatial Audio: SA15 - Afternoon Listening Session in 9.1

Co-moderators:
Paul Geluso, New York University - New York, NY, USA
David Bowles, Swineshead Productions LLC - Berkeley, CA, USA
Presenter:
Tom Ammermann, New Audio Technology GmbH - Hamburg, Germany

Please join us for a 90-minute immersive listening journey on Saturday afternoon. This session will be dedicated to experiencing recent recorded works created specifically for multichannel loudspeaker listening environments. The program will include classical, pop, electronic, jazz, and world music recordings created by a variety of engineers and producers who are dedicated to the art of spatial audio.

 
 

Saturday, October 21, 1:30 pm — 3:15 pm (Rm 1E12)

Engineering Brief: EB06 - Spatial Audio

Chair:
Matthieu Parmentier, francetélévisions - Paris, France

EB06-1 How Streaming Object Based Audio Might Work
Adrian Wisbey, BBC Design and Engineering - London, UK
Object based media is being considered as the future platform model by a number of broadcasting and production organizations. This paper is a personal imagining of how object based broadcasting might be implemented with IP media as the primary distribution whilst still supporting traditional distributions such as FM, DAB, and DVB. The examples assume a broadcaster supporting a number of linearly scheduled services providing both live (simulcast) and on-demand (catch-up) content. An understanding of the basics of object based audio production and broadcasting is assumed. Whilst this paper specifically discusses audio and radio broadcasting, many of the components and requirements are equally valid in a video environment.
Engineering Brief 398 (Download now)

EB06-2 DIY Measurement of Your Personal HRTF at Home: Low-Cost, Fast and Validated
Jonas Reijniers, University of Antwerp - Antwerpen, Belgium; Bart Partoens, University of Antwerp - Antwerp, Belgium; Herbert Peremans, University of Antwerp - Antwerpen, Belgium
The breakthrough of 3D audio has been hampered by the lack of personalized head-related transfer functions (HRTF) required to create realistic 3D audio environments using headphones. In this paper we present a new method for the user to personalize his/her HRTF, similar to the measurement in an anechoic room, yet it is low-cost and can be carried out at home. We compare the resulting HRTFs with those measured in an anechoic room. Subjecting the participants to a virtual localization experiment, we show that they perform significantly better when using their personalized HRTF, compared to a generic HRTF. We believe this method has the potential of opening the way for large scale commercial use of 3D audio through headphones.
Engineering Brief 399 (Download now)

EB06-3 Audio Localization Method for VR Application
Joo Won Park, Columbia University - New York, NY, USA
Audio localization is a crucial component of Virtual Reality (VR) projects, as it contributes to a more realistic VR experience for the user. In this paper a method to implement localized audio that is synced with the user's head movement is discussed. The goal is to process an audio signal in real time to represent a three-dimensional soundscape. The paper introduces the mathematical concepts, acoustic models, and audio processing that can be applied to general VR audio development, and it provides a detailed overview of an Oculus Rift and Max/MSP demo.
Engineering Brief 400 (Download now)

EB06-4 Sound Fields Forever: Mapping Sound Fields via Position-Aware Smartphones
Scott Hawley, Belmont University - Nashville, TN, USA; Sebastian Alegre, Belmont University - Nashville, TN, USA; Brynn Yonker, Belmont University - Nashville, TN, USA
Google Project Tango is a suite of built-in sensors and libraries intended for Augmented Reality applications allowing certain mobile devices to track their motion and orientation in three dimensions without the need for any additional hardware. Our new Android app, "Sound Fields Forever," combines locations with sound intensity data in multiple frequency bands taken from a co-moving external microphone plugged into the phone's analog jack. These data are sent wirelessly to a visualization server running in a web browser. This system is intended for roles in education, live sound reinforcement, and architectural acoustics. The relatively low cost of our approach compared to more sophisticated 3D acoustical mapping systems could make it an accessible option for such applications.
Engineering Brief 401 (Download now)

EB06-5 Real-time Detection of MEMS Microphone Array Failure Modes for Embedded Microprocessors
Andrew Stanford-Jason, XMOS Ltd. - Bristol, UK
In this paper we describe an online system for real-time detection of common failure modes of MEMS microphone arrays, with a specific focus on reduced computational complexity for application in embedded microprocessors. The system detects deviations in long-term spectral content and microphone covariance to identify failures while remaining robust to the false negatives inherent in a passively driven online system. Data collected from real compromised microphones show that high rates of failure detection can be achieved.
Engineering Brief 402 (Download now)
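
A simplified sketch of the kind of monitoring EB06-5 describes: track each microphone's long-term spectral level and its covariance with the rest of the array, and flag channels that drift far from the array median. The statistics, thresholds, and simulated failure below are illustrative assumptions, not the authors' algorithm.

    # Sketch: flag suspect microphones in an array from long-term spectral
    # level and inter-microphone covariance (thresholds are illustrative).
    import numpy as np

    def suspect_channels(frames, z_threshold=3.5):
        """frames: (n_mics, n_samples) block of time-aligned audio."""
        spectra = np.abs(np.fft.rfft(frames, axis=1))
        band_energy = spectra.mean(axis=1)              # long-term level per mic
        cov = np.cov(frames)
        coupling = cov.sum(axis=1) - np.diag(cov)       # similarity to the other mics

        def robust_z(x):
            med = np.median(x)
            mad = np.median(np.abs(x - med)) + 1e-12
            return (x - med) / (1.4826 * mad)

        flagged = (np.abs(robust_z(band_energy)) > z_threshold) | \
                  (np.abs(robust_z(coupling)) > z_threshold)
        return np.nonzero(flagged)[0]

    source = np.random.randn(16000)                     # common acoustic field
    mics = source + 0.1 * np.random.randn(8, 16000)     # 8 healthy microphones
    mics[3] = 0.1 * np.random.randn(16000)              # simulate a failed channel
    print("suspect channels:", suspect_channels(mics))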

EB06-6 A Toolkit for Customizing the ambiX Ambisonics-to-Binaural Renderer
Joseph G. Tylka, Princeton University - Princeton, NJ, USA; Edgar Choueiri, Princeton University - Princeton, NJ, USA
An open-source collection of MATLAB functions, referred to as the SOFA/ambiX binaural rendering (SABRE) toolkit, is presented for generating custom ambisonics-to-binaural decoders for the ambiX binaural plug-in. Databases of head-related transfer functions (HRTFs) are becoming widely available in the recently-standardized “SOFA format” (spatially-oriented format for acoustics), but there is currently no (easy) way to use custom HRTFs with the ambiX binaural plug-in. This toolkit enables the user to generate custom binaural rendering configurations for the plug-in from any SOFA-formatted HRTFs or to add HRTFs to an existing ambisonics decoder. Also implemented in the toolkit are several methods of HRTF interpolation and equalization. The mathematical conventions, ambisonics theory, and signal processing implemented in the toolkit are described.
Engineering Brief 403 (Download now)
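
For readers unfamiliar with what such a decoder configuration contains, the sketch below builds a basic ambisonics-to-binaural filter set by decoding to a grid of virtual loudspeakers and collapsing each loudspeaker through an HRIR pair. It is a simplified Python stand-in with placeholder HRIRs; it is not the SABRE toolkit, whose conventions, interpolation, and equalization options differ.

    # Sketch: derive ambisonics-to-binaural FIR filters from HRIRs sampled at a
    # virtual-loudspeaker grid (simplified pseudoinverse decoder).
    import numpy as np
    from scipy.special import sph_harm

    def real_sh(order, az, col):
        """Real spherical-harmonic matrix, shape (n_dirs, (order + 1)**2)."""
        cols = []
        for n in range(order + 1):
            for m in range(-n, n + 1):
                y = sph_harm(abs(m), n, az, col)
                if m < 0:
                    cols.append(np.sqrt(2) * (-1) ** m * y.imag)
                elif m == 0:
                    cols.append(y.real)
                else:
                    cols.append(np.sqrt(2) * (-1) ** m * y.real)
        return np.stack(cols, axis=-1)

    order, hrir_len = 1, 256
    az = np.tile(np.linspace(0, 2 * np.pi, 4, endpoint=False), 2)  # two rings of 4
    col = np.repeat([np.pi / 3, 2 * np.pi / 3], 4)                 # +/-30 deg elevation
    hrirs = np.random.randn(8, 2, hrir_len)                        # placeholder HRIRs

    Y = real_sh(order, az, col)                  # SH sampled at the speaker grid
    D = np.linalg.pinv(Y)                        # decoder, (n_channels, n_speakers)
    # One left/right FIR filter per ambisonic channel: decode to the virtual
    # speakers, then collapse each speaker through its HRIR pair.
    filters = np.einsum('cs,sel->cel', D, hrirs)    # (n_channels, 2, hrir_len)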

EB06-7 Withdrawn (Engineering Brief 404)

 
 

Saturday, October 21, 3:00 pm — 4:00 pm (Rm 1E06 - PMC Room)

Spatial Audio: SA14 - Capturing Height: Recording Techniques that Include the Vertical Dimension

Presenters:
David Bowles, Swineshead Productions LLC - Berkeley, CA, USA
Paul Geluso, New York University - New York, NY, USA
Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA
Panelist:
Gregor Zielinsky, Sennheiser Electronic GmbH & Co. KG - Germany

Height sound contains crucial information about sound sources, the recording space, and ambient sounds. As object-based and dedicated height-channel playback systems become more common, recording engineers need to be familiar with effective height-sound capturing techniques in order to create true three-dimensional sound recordings. Height sound can be captured using specialty microphones and decoders, or by adding dedicated height microphones to an existing mono, stereo, or surround microphone system. At this workshop the panelists will give a comprehensive overview of current height-sound recording techniques and share their personal experiences working with height sound. Recent music recordings made with some of the techniques discussed at the workshop will be played during a technical tour of the studio facilities at NYU Steinhardt.

 
 

