The bulk of research on spatial audio perception is typically performed on listeners wearing headphones or on listeners who are asked (or forced) to remain still while listening to sounds presented over loudspeakers. While these methods greatly reduce the complexity of an experimental design, neither is representative of how listeners actually listen. Listeners interact with their world: they are nearly continually moving and thus continually changing the spatial relationship between themselves and the sound sources around them. It is known that listeners take advantage of this, being engaged in a constant and sensitive comparison between their own motion and the apparent motion of the acoustic world. This comparison helps them make sense of their 3D surroundings: for example, localization accuracy increases for moving listeners, particularly when they rotate their heads during front/back discrimination tasks, the cone of confusion essentially ceasing to exist the moment listeners are allowed to turn their heads. When the spatial relationship between a listener and the world is artificially made incoherent, on the other hand, whether through static headphone playback or through errors in the motion tracking used for virtual acoustic reproduction or loudspeaker arrays, listeners can experience a variety of spatial illusions, poor localization, and/or a total collapse of externalization.
In this keynote, Owen will present work on how our perception of the spatial properties of the acoustic world, real or virtual, is shaped by our movement through it. This will include discussion of the perceptually critical features of the head related transfer function (HRTF) and how we take advantage of movement-induced changes in monaural and binaural spectral cues, information that can reduce complexity in the creation of convincing spatial environments. Owen will also discuss work on how we arrive at a stable percept of the world in spite of our own motion and how we can create virtual acoustic environments that are “hyperstable,” seeming to be even more world-locked than true physical signals in our environment. This line of research is focused on mapping the perceptual topology of acoustic space, allowing us greater ability to synthesize arbitrary sound sources with pinpoint accuracy. It is hoped that this talk will act both as an overview of past and present work on the perception of moving sounds by moving listeners, with particular relevance for those involved in creating convincing virtual acoustic environments, and as an endorsement of virtual acoustics and virtual reality testing environments as powerful research tools for understanding perception in general.
After a bachelor’s degree in Neuroscience and Behavior at Wesleyan University, Dr. Brimijoin went on to a PhD in Brain and Cognitive Sciences at the University of Rochester. There he studied spectrotemporal auditory receptive fields, attempting to better describe the nonlinear manner in which neurons establish preferences for the way in which sounds like speech change in spectral content over time. He then did a postdoc using multi-unit physiology in mouse models of hearing impairment to examine how age and hearing loss affect this nonlinearity and contribute to increasing difficulty in understanding speech in noisy backgrounds.
He then moved into human psychophysics and spent nearly a decade working as a scientist for the MRC Institute of Hearing Research in Glasgow. There he studied speech intelligibility, sound localization, and auditory motion perception, in particular how normal-hearing and hearing-impaired listeners use their own movement to better understand speech and inform their spatial percept of the acoustic world. These theoretical underpinnings led to work developing hearing-assistance devices that take advantage of motion and spatial hearing. Currently he is at Facebook Reality Labs (formerly Oculus Research) doing work in auditory perception, spatial hearing, motion perception, and novel augmented reality devices that have a chance of improving our hearing in difficult environments.
We are living in an expansive universe of immersive sound applications that span from single- to multi-participant experiences, sometimes distributed across large distances. The quality of these experiences, and the sense of immersion created, depends heavily on the audio engineering techniques we use to capture and reproduce sound, on algorithmic solutions and rendering methods, and on their effectiveness in persuading the auditory system of the plausibility of the experience. What makes an immersive sound experience truly compelling and emotionally engaging, one that connects us to another person, place, or time? In this talk, we will discuss recent research and development in the personalization and adaptation of immersive sound, and the advances in our journey towards creating a high-quality shared immersive sound experience.
Dr. Agnieszka Roginska is Associate Professor of Music Technology at the Steinhardt School, New York University. She conducts research in the simulation and applications of immersive and 3D audio, including the capture, analysis, and synthesis of auditory environments, auditory displays, and their applications in augmented acoustic sensing. She is the author of numerous publications in the areas of the acoustics and psychoacoustics of spatial sound, immersive audio, and auditory displays, and is the co-editor of the book "Immersive Sound: The Art and Science of Binaural and Multi-Channel Audio". Dr. Roginska is President-Elect of the AES, an AES Fellow, and the faculty sponsor of the Society for Women in TeCHnology (SWiTCH) at NYU. She has a B.M. with a double major in Piano Performance and Computer Applications in Music (McGill University), an M.M. in Music Technology (New York University), and a Ph.D. in Music Technology (Northwestern University).
Audio techniques for visual media have developed and matured alongside mediums like film, television, and video games, and professionals working in these mediums enjoy established paradigms and sophisticated tools. But the advancement of technologies like VR, AR, and Spatial Computing has introduced a brand new medium, with a whole new set of audio challenges, considerations, and possibilities. Creating compelling audio in this medium requires designers to change the paradigm of how they think about sound, and to seek out new vocabulary, tools, and processes. In this talk I will discuss how audio in Spatial Computing differs from that in other mediums, and highlight several specific challenges. I will also talk about the future of audio tools and introduce the concept of four domains of audio, a paradigm through which designers can address the challenges and take advantage of the unique opportunities of this medium.
Anastasia Devana is the Audio Director of the Sonic Arts team at Magic Leap, a South Florida tech startup that recently released its first Spatial Computing system, Magic Leap One. With a dual background in software development and music composition, she and her team use technology to push the boundaries of what’s possible with sound, and to advance the role and quality of audio in this new medium. Anastasia is an advocate for the importance of audio and audio innovation in new realities.
Chair: Duncan Williams (University of York, UK)
Owen Brimijoin, Facebook Reality Labs, USA
Angela McArthur, Queen Mary University, London, UK
Annika Neidhardt, TU Ilmenau, Germany
Tapio Lokki, Aalto University, Helsinki, Finland
Ishwarya Ananthabhotla, MIT, Cambridge, Massachusetts, USA
Immersive audio technologies are continuously evolving through the development of improved spatial audio systems with interactive capabilities and more plausible virtual acoustic rendering. However, there are still many perceptual barriers to immersive spatial audio experiences that are indistinguishable from reality. Perceived differences between real and synthetic soundfields, problems with individualized binaural rendering, perceptual issues in cross-modal presentations, and the overall timbral quality of the rendered audio are just some of the many challenges that have yet to be fully overcome. Alongside these are the ever-present challenges of data compression and computational bandwidth in immersive systems. Can sensor technologies such as motion tracking, biometric feedback, and visual data be used to improve the perception of spatial audio rendering or create more immersive experiences? Can auditory attention be used to create more efficient spatial audio algorithms? What are the perceptual limitations of combined audio-visual presentations? How do we begin to evaluate new immersive technologies that exploit auditory attention and interaction?
Chair: Chris Pike (BBC, Salford, UK)
Agnieszka Roginska, New York University, New York, USA
Marcos Felipe Simón Gálvez, Audio Scenic/ University of Southampton, UK
Yesenia Lacouture Parodi, HUAWEI Technologies Duesseldorf GmbH - Munich, Germany
Guillaume Le Nost, L-Acoustics, London, UK
Simon Ashby, AudioKinetic, Canada
Next-generation intelligent and interactive consumer-based spatial audio reproduction technologies will accommodate the creation of engaging and immersive media content that has until now been predominantly rooted in gaming and virtual reality experiences. But how will sensors and intelligent audio systems help us achieve highly personalised yet inclusive immersive experiences for a wide variety of applications? What is the future for immersive audio in broadcast, cinema, live sound, automotive, music, and mobile communications, as well as in augmented and mixed reality experiences that go beyond mere entertainment value into the realm of education and telecommunication? This panel of experts, who work in different application areas of spatial audio, will discuss their perspectives on the future of immersive experiences, the opportunities arising from new technological developments in the field, as well as the technical barriers yet to be overcome.
Chair: Gavin Kearney (University of York, UK)
Mirek Stiles, Abbey Road Studios, UK
Lorenzo Picinali, Imperial College London, UK
Anastasia Devana, Magic Leap, USA
Stephen Barton, Afterlight Inc. / Respawn Entertainment, Philadelphia, USA
Muki Kulhan, Muki-International, London, UK
With different parts of the immersive audio production chain being developed by various third parties, there is a danger of confusing creators and perhaps scaring off talent before we even get off the ground. The consistency of user manuals and videos across the various options still feels a little off in some regards, and the current status quo doesn’t seem good enough. Can we do better? What are the barriers, and how can they be broken down? What are the strengths and weaknesses of existing tools? Can we achieve better clarity in the different formats that are available, and should we move towards standardisation?