Friday, September 30, 8:30 am — 9:00 am (Theater Room 411)
Friday, September 30, 9:00 am — 9:30 am (Theater Room 411)
Virtual, Augmented, and Mixed Reality have the potential to deliver interactive experiences that take us to places of emotional resonance, give us agency to form our own experiential memories, and become part of the everyday lives we will live in the future. Philip Lelyveld will define what Virtual, Augmented, and Mixed Reality are, present recent developments that will shape how they may impact entertainment, work, learning, social interaction, and life in general, and raise rarely mentioned but important issues that will affect how VR/AR/MR is adopted. Just as TV programming progressed from live broadcasts of staged performances to today’s very complex language of multithread long-form content, so VR/AR/MR will progress from the current “early days” of projecting existing media language with a few tweaks into a headset experience, to a new VR/AR/MR-specific language that both the creatives and the audience understand. Philip's goal is to bring you up to speed on the current state, the potential, and the known barriers to adoption of Virtual, Augmented, and Mixed Reality.
Price: This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
Friday, September 30, 9:45 am — 11:15 am (Rm 409A)
Moving Virtual Source Perception in 3D Space—Sam Hughes, University of York - York, UK; Gavin Kearney, University of York - York, UK
This paper investigates the rendering of moving sound sources in the context of real-world loudspeaker arrays and virtual loudspeaker arrays for binaural listening in VR experiences. Near-Field Compensated Higher Order Ambisonics (HOA) and Vector Base Amplitude Panning (VBAP) are investigated for both spatial accuracy and tonal coloration with moving sound source trajectories. A subjective listening experiment is presented over 6-, 26-, and 50-channel real and virtual spherical loudspeaker configurations to investigate the accuracy of spatial rendering and tonal effects. The results show the applicability of different configurations of VBAP and orders of HOA to moving-source rendering and illustrate subjective similarities and differences between real and virtual loudspeaker arrays. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
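For readers unfamiliar with VBAP, the panning law the abstract names can be sketched in a few lines: the source direction is expressed as a gain-weighted sum of the two nearest loudspeaker directions. This is a minimal 2D illustration under assumed example speaker angles, not the authors' experimental implementation:

```python
import math

def vbap_2d_gains(src_az_deg, spk1_az_deg, spk2_az_deg):
    """2D VBAP: solve p = g1*l1 + g2*l2 for the gain pair, then
    power-normalize so perceived loudness stays constant while panning."""
    p = (math.cos(math.radians(src_az_deg)), math.sin(math.radians(src_az_deg)))
    l1 = (math.cos(math.radians(spk1_az_deg)), math.sin(math.radians(spk1_az_deg)))
    l2 = (math.cos(math.radians(spk2_az_deg)), math.sin(math.radians(spk2_az_deg)))
    # Invert the 2x2 loudspeaker base matrix [l1 l2] by hand (Cramer's rule).
    det = l1[0] * l2[1] - l1[1] * l2[0]
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

A source dead-center between speakers at ±30° receives equal gains of 1/√2; a source coincident with one speaker receives gain 1 on that speaker and 0 on the other.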
Disparity in Horizontal Correspondence of Sound and Source Positioning: The Impact on Spatial Presence for Cinematic VR—Angela McArthur, BBC R&D - London, UK; Queen Mary University of London - London, UK
This study examines the extent to which disparity in azimuth location between a sound cue and image target can be varied in cinematic virtual reality (VR) content, before presence is broken. It applies disparity consistently and inconsistently across five otherwise identical sound-image events. The investigation explores spatial presence, a sub-construct of presence, hypothesizing that consistently applied disparity in horizontal audio-visual correspondence elicits higher tolerance before presence is broken, than inconsistently applied disparity. Guidance about the interactions of subjective judgments and spatial presence for sound positioning is needed for non-specialists to leverage VR’s spatial sound environment. Although approximate compared to visual localization, auditory localization is paramount for VR: it is lighting condition-independent, omnidirectional, not as subject to occlusion, and creates presence. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
Lateral Listener Movement on the Horizontal Plane (Part 2): Sensing Motion through Binaural Simulation in a Reverberant Environment—Matthew Boerum, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT); Bryan Martin, McGill University - Montreal, QC, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, QC, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; George Massenburg, Schulich School of Music, McGill University - Montreal, Quebec, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, Quebec, Canada
In a multi-part study, first-person horizontal movement between two virtual sound source locations in an auditory virtual environment (AVE) was investigated by evaluating the sensation of motion as perceived by the listener. A binaural cross-fading technique simulated this movement, while real binaural recordings of motion were made as a reference using a motion apparatus and a mounted head and torso simulator (HATS). Trained listeners evaluated the sensation of motion among real and simulated conditions in two opposite environment-dependent experiments: Part 1 (semi-anechoic) and Part 2 (reverberant). Results from Part 2 were proportional to Part 1, despite the presence of reflections: the simulation again provided the greatest sensation of motion, showing that real binaural recordings convey less sensation of motion than the simulation. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
Friday, September 30, 9:45 am — 11:15 am (Theater Room 411)
At IBC, the AES 139th Convention, CES, and other events, Virtual Reality has been a huge topic. VR producers increasingly realize the potential of, and need for, spatial audio processing for VR applications. This workshop will discuss the following topics:
1. How to record audio for 360° video? Mics: Can we use the same techniques as for movie/TV productions? Or does a B-format mic do the trick?
2. How to mix audio for 360/VR? Formats: What are the formats to store and deliver: channels, objects, ambisonics, a combination, or binaural? Processing: How do we combine the different tracks? What plugins and production tools are there? Monitoring: How can we monitor what the user will hear?
3. How to deliver audio for different VR applications? Creating a VR app with spatial audio (SDKs, tools). Use cases: mobile, VR glasses, streaming, browser-based. Codecs and limitations.
4. How to render audio for 360/VR? Headphone rendering for VR glasses. Speaker playback for TV. What are the resolution and latency requirements?
5. What quality aspects are important? Accuracy and plausibility. What is the interaction with video?
Price: This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
Friday, September 30, 11:30 am — 12:30 pm (Rm 409A)
Object-Based 3D Audio Production for Virtual Reality Using the Audio Definition Model—Chris Pike, BBC Research and Development - Salford, Greater Manchester, UK; University of York - Heslington, York, UK; Richard Taylor, BBC Research & Development - Salford, Greater Manchester, UK; Tom Parnell, BBC Research & Development - Salford, UK; Frank Melchior, BBC Research and Development - Salford, UK
This paper presents a case study of the production of a virtual reality experience with object-based 3D audio rendering using professional tools and workflows. An object-based production was created using a common digital audio workstation with real-time dynamic binaural sound rendering and visual monitoring of the scene on a head-mounted display. The Audio Definition Model is a standardized metadata model for representing audio content, including object-based, channel-based, and scene-based 3D audio. Using the Audio Definition Model, the object-based audio mix could be exported to a single WAV file. Plug-ins were built for the game engine in which the virtual reality application and graphics were authored, allowing import of the object-based audio mix and custom dynamic binaural rendering.
Virtually Replacing Reality: Sound Design and Implementation for Large Scale Room Scale VR Experiences—Sally-Anne Kellaway, Zero Latency VR - Melbourne, Australia
Audio for Virtual Reality (VR) presents a significant array of challenges and augmentations to the traditional requirements of sound designers employed within the video games industry. The change in perspective and embodiment of the player requires additional tools and makes object size, spacing, and spatial design a more significant part of the sound design process. The author presents her approach to these tasks from the perspective of developing audio for Zero Latency, a developer of large-scale, room-scale VR games. Focusing on the design considerations and processes required in this unique medium, this presentation is designed to give insight into this large-scale form of VR technology. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
Friday, September 30, 11:30 am — 12:30 pm (Theater Room 411)
This presentation will discuss the challenges and provide specific solutions for creating audio within interactive virtual and augmented reality experiences. Audio techniques will be revealed that can be used today to advance storytelling and gameplay in virtual environments while creating a cohesive sense of place. Processes and techniques will be demonstrated for use in the creation of soundscapes in shipping products, ranging from immersive mixed reality experiences to multi-participant, multi-site, location-based games.
Price: This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
Friday, September 30, 12:45 pm — 1:45 pm (Theater Room 411)
The goal of VR and AR is to immerse the user in a created world by fooling the human perceptual system into perceiving rendered objects as real. This must be done without the brain experiencing fatigue, and accurate audio representation plays a crucial role in achieving this. Unlike vision, with its narrow foveated field of view, human hearing covers all directions in full 3D. Spatial audio systems must provide realistic rendering of sound objects in full 3D to complement stereo visual rendering. We will describe several areas of our research, initially conducted at the University of Maryland over a decade and since at VisiSonics, that led to the development of a robust 3D audio pipeline that includes capture, measurement, mathematical modeling, rendering, and personalization. The talk will also demonstrate workflow solutions designed to enrich audio immersion for gaming, video post-production, and capture in VR/AR.
Friday, September 30, 2:00 pm — 3:00 pm (Rm 409A)
Crafting Cinematic High End VR Audio for Etihad Airways—Ola Björling, MediaMonks - New York, NY, USA; Eric Thorsell, MediaMonks
MediaMonks were approached by Etihad Airways, via their ad agency The Barbarian Group, to create a Virtual Reality experience taking place aboard the Airbus A380, the world's largest and most luxurious non-private airplane. Challenges included capturing audio, including dialogue, aboard the real plane; crafting an experience that encourages repeated viewing; and combining a sense of truthful realism with a sense of dream-like luxury without relying on a musical score, all in a head-tracked, spatialized mix. Artistic conventions around non-diegetic sound and their psychological impact in VR also required consideration.
Creating an Immersive 360°-A/V Concert Experience at the 50th Montreux Jazz Festival Using Real-time Room Simulation—Sönke Pelzer, Audioborn GmbH - Cologne, Germany; Dirk Schröder, Audioborn GmbH - Cologne, Germany; Fabian Knauber, Audioborn GmbH
The Montreux Jazz Festival is the second largest jazz festival in the world. Since its beginning 50 years ago, all concerts have been recorded for the Montreux Jazz Archive, a unique treasure and the largest collection of live music, declared UNESCO World Heritage. Following the vision of the late founder Claude Nobs, who always pushed the boundaries by applying the latest recording technologies, this year's 50th anniversary of the festival introduced the capture of 3D audio and 360° stereoscopic video. Using a virtual reality camera, ambisonics microphones, and multitrack audio recording with 3D post-processing, an immersive capture and reproduction was achieved. This contribution highlights the challenges, experiences, and solutions of the preparation, recording, post-processing, and release of this immersive production.
Friday, September 30, 2:00 pm — 3:30 pm (Theater Room 411)
In this tutorial we give an overview of recent research and tools for immersive spatial audio and sound propagation effects for VR. We also discuss sound design and integration aspects of adding these propagation techniques and capabilities to a massive VR platform, Project Sansar from Linden Lab. Spatial audio is important for maintaining audio-visual coherence in VR, giving increased realism and a better sense of presence. It refers to 3D audio and environmental effects such as sound occlusion, reflection, diffraction, and reverberation. However, simulating spatial audio quickly enough to update with orientation and positional changes in VR is computationally challenging. We will give an overview of research techniques developed over the last 10 years for efficiently modeling spatial audio for complex VR worlds. There will also be a hands-on tutorial using Phonon, which implements many of these state-of-the-art spatial audio algorithms. Phonon integrates with a wide variety of game engines and audio engines, and we use these applications to demonstrate the performance of these novel spatial audio algorithms. We also demonstrate how 3D audio effects can be applied to environmental effects in spatial audio. Finally, we will discuss various sound design considerations when adding spatial audio for VR, as well as practical challenges when adding spatial audio to a large VR platform, especially with regard to making spatial audio tools accessible to untrained users and content creators.
Friday, September 30, 3:15 pm — 5:15 pm (Rm 409A)
Efficient, Compelling, and Immersive VR Audio Experience Using Scene Based Audio/Higher Order Ambisonics—Shankar Shivappa, Qualcomm Technologies Inc. - San Diego, CA, USA; Martin Morrell, Qualcomm Technologies Inc. - San Diego, CA, USA; Deep Sen, Qualcomm Technologies Inc. - San Diego, CA, USA; Nils Peters, Qualcomm, Advanced Tech R&D - San Diego, CA, USA; S. M. Akramus Salehin, Qualcomm Technologies Inc. - San Diego, CA, USA
Scene-based audio (SBA), also known as Higher Order Ambisonics (HOA), combines the advantages of object-based and traditional channel-based audio schemes. It is particularly suitable for enabling a truly immersive (360°, 180°) VR audio experience. SBA signals can be efficiently rotated and binauralized, which makes realistic VR audio practical on consumer devices. SBA also provides convenient mechanisms for acquiring live soundfields for VR. MPEG-H is a newly adopted compression standard that can efficiently compress HOA for transmission and storage; it is the only known standard that provides compressed HOA end-to-end. Our paper describes a practical end-to-end chain for SBA/HOA-based VR audio. Given its advantages over other formats, SBA should be “the format of choice” for a compelling VR audio experience. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
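The efficient-rotation property the abstract mentions is easy to illustrate at first order: under a yaw rotation, the omnidirectional (W) and vertical (Z) components are invariant and only the horizontal dipoles rotate like a 2D vector. This is an illustrative sketch assuming ACN channel ordering (W, Y, Z, X), not the higher-order implementation described in the paper:

```python
import math

def rotate_foa_yaw(w, y, z, x, yaw_rad):
    """Rotate one first-order ambisonics sample frame about the vertical
    axis by yaw_rad (counterclockwise scene rotation). For head tracking,
    the negative of the head yaw would typically be applied."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    x_r = c * x - s * y  # horizontal dipoles transform as a 2D rotation
    y_r = s * x + c * y
    return w, y_r, z, x_r  # W and Z pass through unchanged
```

Rotating a frontal source (X = 1, Y = 0) by 90° moves it fully into the Y dipole, as expected for a source now at the side.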
Soundfield Navigation using an Array of Higher-Order Ambisonics Microphones—Joseph G. Tylka, Princeton University - Princeton, NJ, USA; Edgar Choueiri, Princeton University - Princeton, NJ, USA
A method is presented for soundfield navigation through estimation of the spherical harmonic coefficients (i.e., the higher-order ambisonics signals) of a soundfield at a position within an array of two or more ambisonics microphones. An existing method based on blind source separation is known to suffer from audible artifacts, while an alternative method, in which a weighted average of the ambisonics signals from each microphone is computed, is shown to necessarily introduce comb-filtering and degrade localization for off-center sources. The proposed method entails computing a regularized least-squares estimate of the soundfield at the listening position using the signals from the nearest microphones, excluding those that are nearer to a source than to the listening position. Simulated frequency responses and predicted localization errors suggest that, for interpolation between a pair of microphones, the proposed method achieves both accurate localization and minimal spectral coloration when the product of angular wavenumber and microphone spacing is less than twice the input expansion order. It is also demonstrated that failure to exclude from the calculation those microphones that are nearer to a source than to the listening position can significantly degrade localization accuracy. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
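The validity condition reported in the abstract (product of angular wavenumber and microphone spacing less than twice the input expansion order, i.e., kd < 2N with k = 2πf/c) can be rearranged into an upper usable frequency for interpolation, f < Nc/(πd). A small helper, assuming a speed of sound of 343 m/s:

```python
import math

def max_interp_freq(order, spacing_m, c=343.0):
    """Upper frequency (Hz) below which interpolation between a microphone
    pair is reported accurate: k*d < 2N  =>  f < N*c / (pi*d)."""
    return order * c / (math.pi * spacing_m)
```

For example, a 4th-order pair spaced 0.5 m apart would support accurate interpolation up to roughly 870 Hz under this criterion.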
Immersive Audio Rendering for Interactive Complex Virtual Architectural Environments—Imran Muhammad, Hanyang University - Seoul, Korea; Jin Yong Jeon, Hanyang University - Seoul, Korea; Acoustics Authorized - Seoul, Korea
In this study we investigate methods for sound propagation in complex virtual architectural environments for spatialized audio rendering in immersive virtual reality (VR) scenarios. Over the last few decades, sound propagation models for complex building structures have been designed and investigated using geometrical acoustics (GA) and hybrid techniques. For sound propagation, fast simulation tools are required to incorporate a sufficient number of dynamically moving sound sources, room acoustical properties, and reflection and diffraction from interactively changing surface elements in VR environments. Using physically based models, we achieved a reasonable trade-off between sound quality and system performance. Furthermore, we describe the integration of the sound rendering pipeline into a virtual scene to simulate a virtual environment. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
Immersive Audio for VR—Joel Susal, Dolby Laboratories - San Francisco, CA, USA; Kurt Krauss, Dolby Germany GmbH - Nuremberg, Germany; Nicolas Tsingos, Dolby Labs - San Francisco, CA, USA; Marcus Altman, Dolby Laboratories - San Francisco, CA, USA
Object based sound creation, packaging, and playback of content is now prevalent in the Cinema and Home Theater, delivering immersive audio experiences. This has paved the way for Virtual Reality sound where precision of sound is necessary for complete immersion in a virtual world. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
Friday, September 30, 3:45 pm — 4:30 pm (Theater Room 411)
Overview of solutions to some of the creative and practical challenges encountered in the audio post-production pipeline for 360 videos and Virtual Reality. We discuss methodologies for monitoring, editing, designing, mixing, mastering, and delivering audio for VR and 360 Videos. We discuss how to integrate a 3D audio workflow into existing post-production pipelines by merging best practices from games, television, and feature film along with new strategies for this emerging medium. The future of content delivery and playback is considered while still respecting current infrastructures for delivering client projects and accommodating the variety of delivery formats required for various virtual reality and 360 video platforms.
Friday, September 30, 4:45 pm — 6:15 pm (Theater Room 411)
This panel discussion will feature investigators in hearing science and audiology, including experts in binaural hearing, audiological assessment and rehabilitation, next-generation hearing aids, and auditory cognitive neuroscience. Brief presentations will highlight the current and future impacts of hearing science on AVAR—e.g., the evolution of binaural hearing aids as spatially intelligent devices, lessons from auditory scene analysis, and brain-directed signal processing. Applications of AVAR technology to hearing science and the audiology clinic will also be presented, e.g., the use of immersive VR to diagnose and retrain spatial hearing deficits and the benefits of using binaural devices to study hearing “in the wild.”
Friday, September 30, 5:45 pm — 6:45 pm (Rm 409A)
Streaming Immersive Audio Content—Johannes Kares, Sennheiser - Vienna, Austria; Veronique Larcher, Sennheiser - Switzerland
“Immersion [...] is a perception of being physically present in a non-physical world.” It is critical to think about immersive audio for live music streaming, because giving listeners the illusion of being transported to a different acoustic environment makes the streaming experience much more real. In this paper we describe various approaches that enable audio engineers to create immersive audio content for live streaming, whether using existing tools and network infrastructure to deliver static binaural audio, or getting ready for emerging tools and workflows for Virtual Reality streaming. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
An Augmented Reality Audio Live Network for Live Electroacoustic Music Concerts—Nikos Moustakas, Ionian University - Nikea, Greece; Andreas Floros, Ionian University - Corfu, Greece; Bill Kapralos, University of Ontario Institute of Technology - Ontario, Canada
Augmented reality audio (ARA) is a well-established and widely investigated concept that typically relies on mixing the real acoustic environment of a listener with a virtual one. In this work we conceptually extend this legacy ARA framework, aiming to increase the spatial scale of the synthesized acoustic field, primarily in the real-world domain. Such an increase effectively acts as a virtual wide-angle sound focuser for wide-area acoustic fields. We demonstrate this Augmented Reality Audio Network (ARAN) concept in a live electroacoustic music concert. A subjective evaluation gave strong indications that the proposed ARAN framework may represent a compelling alternative to legacy ARA in the artistic and creative domain. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
Friday, September 30, 6:30 pm — 7:15 pm (Theater Room 411)
At the very least, VR audio is an exciting new paradigm for audio professionals to learn and, at most, it is a convergence of all the preceding subcategories of audio professions. Built with game engines, using film cinematography on a mobile platform with the presence of theater and live streaming inspired by broadcast, it will take all of our combined knowledge to pull off convincing VR. Audio is the glue that binds all of these sub-genres. This talk will identify what we can apply to VR audio from each of these proficiencies, what we can learn from other VR system technology (such as cameras or haptics), and how audio can inspire other professions with our standardization and collaboration.
Saturday, October 1, 8:30 am — 9:15 am (Rm 409A)
More than 50% of the emotional impact of a movie comes from its audio, so the creation of audio content is becoming an important topic for VR applications. But using common digital audio workstations (DAWs) for this is currently difficult. How to connect DAWs with VR video devices and game/VR engines, and how to use the new spatial and headphone virtualizations, are questions this presentation will answer, along with a survey of opportunities and tools.
Saturday, October 1, 8:30 am — 9:15 am (Theater Room 411)
GAUDIO LAB has developed VR audio technologies, one of which was adopted into the MPEG-H 3D Audio international standard. This workshop will cover an end-to-end solution for delivering sound across the whole VR ecosystem as the producer intended. The audience will be guided through the basics of VR audio solutions for creating, delivering, and playing back 360/VR content. Attendees of all levels, including beginners, will be able to enjoy this workshop.
This workshop will cover:
- Immersive and interactive binaural rendering: a key technology for real immersive VR audio
- Object-based audio vs. scene-based audio
- VR audio distribution: app-based or codec-based
- Considerations on VR audio creation workflow
- VR sound recording: mono, Ambisonics and binaural
- Mixing and mastering with GWorks, AAX plugin for Pro Tools HD
- Quality matters: how can the same sound quality be guaranteed at the end-user stage?
- GPlayer, a reference quality 360/VR audio player (API)
Saturday, October 1, 9:30 am — 11:30 am (Rm 409A)
Spatial Auditory Feedback in Response to Tracked Eye Position—Durand R. Begault, NASA Ames Research Center - Moffett Field, CA, USA; Charles M. Salter Associates - Audio Forensic Center - San Francisco, CA, USA
Fixation of eye gaze toward one or more specific positions or regions of visual space is a desirable feature within several types of high-stress human interfaces, including vehicular operation, flight deck control, target acquisition, etc. It is therefore desirable to have a means to give spatial auditory feedback to a human in such a system about whether or not the gaze is specifically directed towards a desired position. Alternatively, it is desirable to use eye position as a means of controlling a device that provides auditory feedback so that there is a correspondence between eye position and control voltages that manipulate aspects of an auditory cue that includes spatial position, pitch and/or timbre. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
Perceptual Weighting of Binaural Information: Toward an Auditory Perceptual "Spatial Codec" for Auditory Augmented Reality—G. Christopher Stecker, Vanderbilt University School of Medicine - Nashville, TN, USA; Anna Diedesch, Vanderbilt University School of Medicine - Nashville, TN, USA
Auditory augmented reality (AR) requires accurate estimation of spatial information conveyed in the natural scene, coupled with accurate spatial synthesis of virtual sounds to be integrated within it. Solutions to both problems should consider the capabilities and limitations of the human binaural system, in order to maximize relevant over distracting acoustic information and enhance perceptual integration across AR layers. Recent studies have measured how human listeners integrate spatial information across multiple conflicting cues, revealing patterns of “perceptual weighting” that sample the auditory scene in a robust but spectrotemporally sparse manner. Such patterns can be exploited for binaural analysis and synthesis, much as time-frequency masking patterns are exploited by perceptual audio codecs, to improve efficiency and enhance perceptual integration. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
DeepEarNet: Individualizing Spatial Audio with Photography, Ear Shape Modeling, and Neural Networks—Shoken Kaneko, Yamaha Corporation - Iwata-shi, Japan; Tsukasa Suenaga, Yamaha Corporation - Iwata-shi, Japan; Satoshi Sekine, Yamaha Corporation - Iwata-shi, Japan
Individualizing spatial audio is of crucial importance for high-quality virtual and augmented reality audio. In this paper we propose a method for individualizing spatial audio by combining the recently proposed ear shape modeling technique with computer vision and machine learning. We use a convolutional neural network to estimate the ear shape model parameters from stereo photographs of the user's ear. The individualized ear shape and its associated individualized head-related transfer function (HRTF) can then be calculated from the obtained parameters using the ear shape model and numerical acoustic simulations. Preliminary experiments evaluating the shapes of the estimated individual ears confirmed the effect of individualization. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
Adjustment of the Direct-to-Reverberant Energy Ratio to Reach Externalization within a Binaural Synthesis System—Thomas Sporer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Stephan Werner, Technische Universität Ilmenau - Ilmenau, Germany; Florian Klein, Technische Universität Ilmenau - Ilmenau, Germany
This contribution presents a study that investigates the perception of spatial audio reproduced by a binaural synthesis system. The quality features externalization and room congruence are measured in a listening test. Previous studies imply that externalization in particular is decreased if acoustic divergence exists between the synthesized room and the listening room. Other studies show that adjusting the Direct-to-Reverberant Energy Ratio (DRR) can increase the perceived congruence between the synthesized room and the listening room. In this experiment, test participants adjust the DRR of the synthesis until they perceive congruence between the synthesis and their internal reference for the listening room. The ratings show that test participants are able to adjust the DRR to match the listening room, and externalization therefore increases. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
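The DRR being adjusted here can be estimated from a measured impulse response by splitting its energy at a short window after the direct-sound peak. This is a simplified sketch; the 2.5 ms split and the peak-based alignment are common conventions assumed for illustration, not the authors' exact procedure:

```python
import math

def drr_db(ir, fs, direct_ms=2.5):
    """Direct-to-reverberant energy ratio (dB) of an impulse response.
    Everything up to direct_ms after the main peak counts as direct sound;
    the remainder counts as reverberant energy."""
    peak = max(range(len(ir)), key=lambda n: abs(ir[n]))
    split = peak + int(direct_ms * 1e-3 * fs) + 1
    e_direct = sum(s * s for s in ir[:split])
    e_reverb = sum(s * s for s in ir[split:])
    return 10.0 * math.log10(e_direct / e_reverb)
```

A toy response with equal direct and tail energy yields 0 dB; scaling the direct impulse raises the DRR accordingly.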
Saturday, October 1, 9:30 am — 10:15 am (Theater Room 411)
Dolby has been a pioneer in developing the world's leading object-based audio technologies and mixing tools for filmmakers and sound engineers around the world. Over the last two years we have also been working closely with a number of VR pioneers in the content community to develop the tools and playback technologies for enabling high-quality linear VR experiences. This workshop will cover the unique advantages of using object-based audio mixing for cinematic and experiential VR. The AES audience will walk away with an understanding of the power and flexibility of object-based audio mixing for creating more precise and convincing sound to match the visuals, giving viewers a strong sense of presence.
Saturday, October 1, 10:30 am — 11:15 am (Theater Room 411)
This workshop will guide participants in the use of the FB360 Spatial Workstation toolset for 360 spatialized audio work in VR from both the technology and content creation perspectives. It will cover: Session configuration for VST (Reaper/Nuendo) and AAX (Pro Tools HD). A deep exploration of the tools and feature set for a variety of project types, highlighting examples and case studies. Step-by-step workflows to stay organized and achieve best-in-class VR audio outcomes. Maintaining differentiated spatialized vs. non-spatialized experiences across delivery formats. Encoding, levels, and ingestion considerations across different platforms (mobile iOS/Android, Oculus Rift, etc.). The role and future developments of FB360 Spatial Workstation for audio in VR and AR.
Saturday, October 1, 11:30 am — 12:30 pm (Theater Room 411)
The workshop will explore the requirements and best practices of on-location sound capture for cinematic virtual reality. The panel will sketch out the differences for the on-site sound engineer when working on a cinematic VR shoot compared to a traditional cinema shoot. In this context, it will examine the benefits and drawbacks of different spatial audio capture solutions – such as binaural and ambisonics – and look at their best practices. As well, the panel will look at when and how to capture non-spatial sources on set and discuss the still unresolved pain points facing location engineers today.
Saturday, October 1, 12:45 pm — 1:45 pm (Theater Room 411)
Modern Virtual Reality has given rise to new ways of capturing and delivering audio experiences. It is challenging previously accepted audio standards by compelling the audio community at large to invent and innovate. This panel will discuss some of the techniques and challenges present in today's efforts to deliver live virtual reality audio experiences, both in terms of live capture for future broadcast or release and live capture for "on the day" live streaming. Topics to be discussed include definitions, standards, creative considerations, educational obligations, reflections on the future, and more.
Saturday, October 1, 2:00 pm — 3:30 pm (Rm 409A)
Spatial Music, Virtual Reality, and 360 Media—Enda Bates, Trinity College Dublin - Dublin, Ireland; Francis Boland, Trinity College Dublin - Dublin, Ireland
This paper documents the composition, recording, and post-production of a number of works of instrumental spatial music for 360 video and audio presentation. The filming and recording of an orchestral work of spatial music are described, with particular reference to the various ambisonic microphones used in the recordings, post-production techniques, and the delivery of 360 video with matching 360 audio. The recording and production of a second performance of a newly composed spatial work for an acoustic quartet are also presented, and the relationship between spatial music and 360 content is discussed. Finally, an exploration of the creative possibilities of VR in terms of soundscape and acousmatic composition is presented. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
Positioning of Musical Foreground Parts in Surrounding Sound Stages—Christoph Hold, Technische Universität Berlin - Berlin, Germany; Lukas Nagel, Technische Universität Berlin - Berlin, Germany; Hagen Wierstorf, Technische Universität Ilmenau - Ilmenau, Germany; Alexander Raake, Technische Universität Ilmenau - Ilmenau, Germany
Object-based audio offers several new possibilities during the sound mixing process. While stereophonic mixing techniques are highly developed, not all of them generate promising results in an object-based audio environment. An outstanding feature is the new approach of positioning sound objects in the musical sound scene, providing stable localization throughout the whole listening area. Previous studies have shown that even though object-based audio reproduction systems can enhance the playback situation, the critical and guiding attributes of the mix are still uncertain. This study investigates the impact of different spatial distributions of sound objects on listener preference, with a special emphasis on the distinction of high-attention foreground parts of the presented music track. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
The Soundfield as Sound Object: Virtual Reality Environments as a Three-Dimensional Canvas for Music Composition—Richard Graham, Stevens Institute of Technology - Hoboken, NJ, USA; Seth Cluett, Stevens Institute of Technology - Hoboken, NJ, USA
Our paper presents ideas raised by recent projects exploring the embellishment, augmentation, and extension of environmental cues, spatial mapping, and immersive potential of scalable multichannel audio systems for virtual and augmented reality. Moving beyond issues of reproductive veracity raised by merely recreating the soundscape of the physical world, these works exploit characteristics of the natural world to accomplish creative goals that include the development of models for interactive composition, composing with physical and abstract spatial gestures, and linking sound and image. We are presenting a novel system that allows the user to treat the soundfield as a fundamental building block for spatial music composition and sound design. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
Saturday, October 1, 2:00 pm — 2:45 pm (Theater Room 411)
Using the novel 3D audio engine Auratorium, this tutorial discusses a new approach to designing compelling audio environments for virtual reality applications such as cinematic VR, 360-degree videos, and immersive sound productions. A light-footed production chain for the object-based creation of 3D audio content is presented, including fully fledged room-acoustic simulation and reproduction, targeting binaural headphone reproduction as well as arbitrary loudspeaker layouts. Participants will learn how to set up an easy workflow within their existing design tools (e.g., Unity and Pro Tools) and create realistic 3D audio by placing sound sources and receivers and adjusting material properties, room shape, and directivities while listening to real-time rendered 3D audio.
Saturday, October 1, 2:45 pm — 3:30 pm (Theater Room 411)
Although many VR projects only use the sound of the original venue, other projects require additional audio content, especially high-budget productions incorporating sound design content and other sound material. The problem with adding audio content to VR shots is the need to position sound very accurately, which can be a nightmare when handling moving sound sources. This workshop will present an easy and very accurate way of positioning sounds on 360º videos: dragging sounds on top of the image and tracking them around with keyframe animation, a well-known CGI technique. Using 3D CGI-like software for audio applications with a virtual microphone approach, the same video can then be exported to different output formats without the need to redo all sound positioning.
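The keyframe technique described above can be sketched in a few lines: positions set at keyframes are interpolated for every video frame in between. This is an illustrative example only; the function and data layout are hypothetical and not the workshop's software.

```python
# Hypothetical sketch of keyframe-based sound positioning: azimuth and
# elevation are set at a few keyframes and linearly interpolated per frame.

def interpolate_position(keyframes, frame):
    """Return (azimuth_deg, elevation_deg) for a given video frame.

    keyframes: sorted list of (frame_number, azimuth_deg, elevation_deg).
    Positions are held at the ends and linearly interpolated in between.
    """
    if frame <= keyframes[0][0]:
        return keyframes[0][1:]
    if frame >= keyframes[-1][0]:
        return keyframes[-1][1:]
    for (f0, az0, el0), (f1, az1, el1) in zip(keyframes, keyframes[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)  # fractional position in segment
            return (az0 + t * (az1 - az0), el0 + t * (el1 - el0))

# A source panned from front (0 deg) to hard right (90 deg) over 100 frames:
keys = [(0, 0.0, 0.0), (100, 90.0, 0.0)]
```

At frame 50 the source sits halfway, at 45º azimuth; real tools would add eased (Bezier) interpolation between keys.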
Saturday, October 1, 3:45 pm — 4:30 pm (Theater Room 411)
Nokia OZO is a professional-quality VR camera with spatial sound recording support. The OZO Audio workflow leverages this spatial sound capability to provide an optimal-quality spatial audio pipeline. This is enabled by a lossless interchange format based on the ITU/EBU Audio Definition Model (ADM), a high-efficiency distribution format based on ISO MP4, and import functionality for these formats in digital audio workstations (DAWs). In our approach, OZO Audio makes use of a traditional DAW workflow, helping audio engineers to support VR audio productions. We are proud to announce our first full support for the OZO Audio workflow with Steinberg Nuendo.
Saturday, October 1, 4:00 pm — 5:30 pm (Rm 409A)
XY-Stereo Capture and Up-Conversion for Virtual Reality—Nicolas Tsingos, Dolby Laboratories - San Francisco, CA, USA; Cong Zhou, Dolby Laboratories - San Francisco, CA, USA; Abhay Nadkarni, Dolby Laboratories - San Francisco, CA, USA
We propose a perceptually-based approach to creating immersive soundscapes for VR applications. We leverage stereophonic content obtained from XY microphones as a basic building block that can be easily recorded, edited, and combined to provide a more compelling experience than can be obtained from recording at a single location. Central to our approach is a novel up-conversion algorithm that derives a nearly full-spherical parametric soundfield, including height information, from an XY recording. This approach enables simpler, improved capture when compared to alternative soundfield recording techniques. It can also take advantage of new object-based delivery formats for flexible delivery and playback. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
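As a rough illustration of the kind of parametric analysis such an up-converter might perform (this is a generic sketch, not the authors' algorithm), a source azimuth can be estimated from the level ratio of a coincident XY pair of cardioids at ±45º:

```python
import math

# Generic sketch: invert the XY cardioid panning law by grid search.
# Convention assumed here: positive azimuth = left, capsules aimed at ±45 deg.

def cardioid(angle_rad):
    """Gain of an ideal cardioid pickup for a source at this off-axis angle."""
    return 0.5 * (1.0 + math.cos(angle_rad))

def estimate_azimuth(level_l, level_r):
    """Return the azimuth (degrees) whose XY cardioid gain ratio best
    matches the observed left/right level ratio."""
    observed = level_l / level_r
    best_deg, best_err = 0, float("inf")
    for deg in range(-90, 91):
        th = math.radians(deg)
        gl = cardioid(th - math.radians(45))  # left capsule aimed at +45 deg
        gr = cardioid(th + math.radians(45))  # right capsule aimed at -45 deg
        err = abs(gl / gr - observed)
        if err < best_err:
            best_deg, best_err = deg, err
    return best_deg
```

A real up-converter would run this kind of analysis per time-frequency tile and add a diffuseness estimate; height information cannot come from the level ratio alone.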
Augmented Reality Headphone Environment Rendering—Jean-Marc Jot, DTS, Inc. - Los Gatos, CA, USA; Keun Sup Lee, Apple Inc. - Cupertino, CA, USA
In headphone-based augmented reality audio applications, computer-generated audio-visual objects are rendered over headphones or ear buds and blended into a natural audio environment. This requires binaural artificial reverberation processing to match local environment acoustics, so that synthetic audio objects are not distinguishable from sounds occurring naturally or reproduced over loudspeakers. Solutions involving the measurement or calculation of binaural room impulse responses in a consumer environment are limited by practical obstacles and complexity. We propose an approach exploiting a statistical reverberation model, enabling practical acoustical environment characterization and computationally efficient reflection and reverberation rendering for multiple virtual sound sources. The method applies equally to headphone-based “audio-augmented reality,” enabling natural-sounding, externalized virtual 3-D audio reproduction of music, movie, or game soundtracks. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
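As a toy illustration of what a statistical reverberation model can look like (assumed here to be exponentially decaying noise parameterized by RT60; this is not the authors' method), a late-reverb tail can be synthesized without any measured impulse response:

```python
import numpy as np

# Toy statistical late-reverb model: Gaussian noise shaped by an
# exponential envelope whose decay rate is set by the room's RT60.
# Parameter names are illustrative, not from the paper.

def reverb_tail(rt60, fs, duration):
    """Synthesize a mono late-reverb tail.

    rt60: reverberation time in seconds (-60 dB decay point).
    fs: sample rate in Hz.  duration: tail length in seconds.
    """
    t = np.arange(int(duration * fs)) / fs
    decay = 10.0 ** (-3.0 * t / rt60)  # amplitude reaches -60 dB at t = rt60
    rng = np.random.default_rng(0)     # fixed seed for reproducibility
    return rng.standard_normal(t.size) * decay
```

Such a model needs only a couple of scalar parameters per room, which is what makes in-situ environment characterization practical compared with measuring full binaural room impulse responses.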
Capturing and Rendering 360º VR Audio Using Cardioid Microphones—Hyunkook Lee, University of Huddersfield - Huddersfield, UK
This paper proposes a new microphone technique and a binaural rendering approach for 360º VR audio. Four cardioid microphones are arranged in a horizontal square, with 30 cm spacing and a 90º subtended angle for each of the four pairs of adjacent microphones, in order to obtain a stereophonic recording angle (SRA) of 90º for quadraphonic loudspeaker reproduction. The signals are binaurally synthesized with quadraphonic head-related impulse responses. This produces the same SRA for each of the four 90º segments whenever the listener rotates his or her head by 90º in a VR environment with a head-tracker, which is confirmed by a listening test. For vertical sound capture, upward- and optional downward-facing cardioid microphones are added. This session is part of the co-located AVAR Conference which is not included in the normal convention All Access badge.
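The binaural synthesis step described above amounts to convolving each virtual loudspeaker feed with the head-related impulse response (HRIR) pair for its direction and summing. A minimal sketch, with placeholder HRIR arrays standing in for a measured set:

```python
import numpy as np

# Minimal virtual-loudspeaker binauralizer: each feed is convolved with
# the left/right HRIRs for its direction and the results are summed.
# Direction keys and HRIR contents are placeholders for a measured set.

def binauralize(speaker_feeds, hrirs):
    """speaker_feeds: dict direction -> mono signal (np.ndarray).
    hrirs: dict direction -> (left_hrir, right_hrir).
    Returns a (2, n) array: [left ear, right ear]."""
    n = 0
    for d, sig in speaker_feeds.items():
        hl, hr = hrirs[d]
        n = max(n, len(sig) + max(len(hl), len(hr)) - 1)
    out = np.zeros((2, n))
    for d, sig in speaker_feeds.items():
        hl, hr = hrirs[d]
        out[0, :len(sig) + len(hl) - 1] += np.convolve(sig, hl)
        out[1, :len(sig) + len(hr) - 1] += np.convolve(sig, hr)
    return out
```

With head-tracking, the quadraphonic feeds would be rotated into head coordinates before this step, which is what keeps the SRA constant across 90º head turns.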
Saturday, October 1, 4:30 pm — 5:15 pm (Theater Room 411)
A few months ago VideoStitch announced the Orah brand and its Orah 4i, a live-streaming spherical camera complete with immersive audio. The 4i uses four lenses and real-time stitching algorithms to produce 4K spherical video, and four on-camera MEMS microphones to capture first-order ambisonics. The output can be streamed directly to any platform that accepts live feeds of immersive content. This workshop explores the design of the Orah 4i camera’s integrated ambisonic audio capture, covering selection and placement of the microphones, capsule-correction EQ, level matching, omnidirectional-to-cardioid polar-pattern processing, and conversion to first-order B-format. The final design decisions and several measurements are shown.
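The final conversion step can be illustrated with a common first-order identity: once the omni capsules have been processed into four outward-facing horizontal cardioids, simple sums and differences recover horizontal B-format. This sketch assumes ideal cardioids obeying c = ½(w + x·cos az + y·sin az) and omits the capsule EQ and channel-gain conventions (FuMa/SN3D) a real design must apply:

```python
def cardioids_to_bformat(c_front, c_left, c_back, c_right):
    """Convert four horizontal outward-facing cardioids (at 0, 90, 180,
    and 270 degrees azimuth) to first-order horizontal B-format (W, X, Y).

    Assumes each ideal cardioid is c = 0.5 * (w + x*cos(az) + y*sin(az));
    per-capsule correction EQ and FuMa/SN3D gain scaling are omitted.
    """
    w = 0.5 * (c_front + c_left + c_back + c_right)  # pressure component
    x = c_front - c_back                             # front-back velocity
    y = c_left - c_right                             # left-right velocity
    return w, x, y
```

In practice this runs per sample (or per band after capsule correction); the front/back and left/right differences cancel the pressure term because opposing cardioids share it.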
Saturday, October 1, 5:30 pm — 6:15 pm (Theater Room 411)
Many producers are currently adopting 360 content creation. For image production, multiple hardware and software solutions are now available off the shelf. However, although audio is key to immersion, the audio side has not kept pace. This workshop introduces an adequate, versatile codec for 3D audio that adapts seamlessly to the needs of the VR industry, from storage to production, distribution/delivery, and rendering. Besides VR, the presented technology suite also finds uses in 2D movies, music, telepresence, and teleconferencing. Other related topics will be addressed: hybrid sound capture for live events, production formats, and integrated rendering techniques.
Saturday, October 1, 6:30 pm — 7:15 pm (Theater Room 411)
Two decades of progress can change how we live and think in ways that boggle the mind. Audio is a small piece of that, but it's our piece. Twenty years ago, the PC got rudimentary sound cards; now the entire “multitrack recording studio" lives on our computers. Some of us saw that development as inevitable but in 1996 those smart people sounded fairly edgy, to say the least. And of those smart people, who among them saw that texting would be the “killer app” for smart phones, in many ways trumping audio communication? Let’s take our accumulated wisdom from the past 20 years of growth and non-growth of audio and computing, and see if we can’t get some feel for what it will be like in this room 20 years from now, looking back. With luck, we will be nodding our heads sagely, saying, “Yep, we saw that one coming way back in 2016!"