AES Section Meeting Reports

Pacific Northwest - May 11, 2016

Meeting Topic: Spatial Audio

Speaker Name: Dr. Ivan Tashev, Microsoft Research

Meeting Location: Microsoft Research, Redmond, WA

Summary

Dr. Ivan Tashev, the architect behind the audio technologies of many Microsoft products (and a PNW Committee member), spoke on spatial audio at the May 2016 meeting of the PNW Section. With the recent release of the Microsoft HoloLens augmented reality device, it was a timely opportunity to learn how sounds can be manipulated to appear to come from anywhere in 3-D space. The meeting took place at Microsoft Research's building on one of Microsoft's major campuses in Redmond, WA. About 58 people attended (roughly 26 of them AES members), including AES Executive Director Bob Moses.

Dr. Tashev received his Master's degree in Electronic Engineering (1984) and his PhD in Computer Science (1990) from the Technical University of Sofia, Bulgaria, where he was also an Assistant Professor. He joined Microsoft in 1998. Currently Dr. Tashev is a Partner Architect and leads the Audio and Acoustics Research Group at Microsoft Research Labs in Redmond, WA. He has published four books and more than 70 papers, and holds 30 U.S. patents. Dr. Tashev created the audio processing technologies incorporated in Windows, the Microsoft Auto Platform, and the RoundTable device. He served as the audio architect for Microsoft Kinect for Xbox and for HoloLens. Dr. Tashev is also an Affiliate Professor in the Department of Electrical Engineering at the University of Washington in Seattle, WA.

Dr. Tashev began by describing spatial audio as the technique of making a listener perceive sound as coming from any desired direction, with applications in music, movies, gaming, and virtual reality devices. He went through some history of multi-channel and spatial audio, from mono through stereo, 5.1, and beyond. These systems, however, are channel-based and usually planar, with speakers at about the same height around the listener (there are exceptions, of course). Newer systems such as Dolby Atmos show clear progress toward making sounds appear to come from any direction.

A discussion of channel-based sound field rendering was next. Speaker arrays can be driven so as to control sound directionality very effectively. Pros and cons of channel-based systems (versus object-based ones) were noted.
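
To make the array-steering idea concrete (this is an illustration, not code from the talk), here is a minimal delay-and-sum sketch in Python for a uniform linear speaker array; the function names and the nearest-sample delays are simplifications.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, at room temperature

def steering_delays(n_speakers, spacing_m, angle_deg):
    """Per-speaker delays (seconds) that steer a uniform linear
    array's main lobe toward angle_deg (0 deg = broadside)."""
    positions = np.arange(n_speakers) * spacing_m
    delays = positions * np.sin(np.radians(angle_deg)) / SPEED_OF_SOUND
    return delays - delays.min()  # keep every delay non-negative

def steer(signal, fs, delays):
    """Feed one mono signal to all speakers, each delayed by a whole
    number of samples (fractional delays omitted for brevity)."""
    total = len(signal) + int(np.ceil(delays.max() * fs)) + 1
    out = np.zeros((len(delays), total))
    for ch, d in enumerate(delays):
        start = int(round(d * fs))
        out[ch, start:start + len(signal)] = signal
    return out  # shape: (n_speakers, n_samples)
```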

Another method is to use headphones. The history and problems of binaural audio were mentioned. To create a really convincing 3-D sound field with headphones, the renderer must apply the Head-Related Transfer Function (HRTF), which accounts for the physical characteristics of a person's head, and must also track the movement of the head.
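
The core binaural operation can be sketched as follows, assuming time-domain head-related impulse responses (HRIRs) for the desired direction are already at hand; a real renderer would also interpolate between measured directions and react to head tracking.

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Place a mono source at one direction by convolving it with the
    left- and right-ear head-related impulse responses (time-domain
    HRTFs); returns a stereo (n_samples, 2) array."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)
```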

He then compared augmented reality, where computer-generated sensory inputs are blended with some aspect of the real, physical world, with virtual reality, where the environment is entirely synthetic and simulates the user's presence within it. He showed components of the devices, products on the market now, and some uses.

Dr. Tashev then discussed object-based rendering of spatial audio. Starting from the known characteristics of human vision, hearing acuity, and localization ability, one can determine what is needed to achieve a good sense of space. The HRTF is especially important: a generic one works nowhere near as well as a personalized one. He described many details of obtaining personalized HRTFs, including a fast, cost-effective "Eigen-face" method utilizing the Microsoft Kinect device.
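
Dr. Tashev's Kinect-based pipeline is considerably more sophisticated, but the general idea of database-driven personalization can be sketched as a nearest-neighbor match on anthropometric features; the function name, feature set, and database layout here are all hypothetical.

```python
import numpy as np

def personalize_hrtf(user_features, db_features, db_hrtfs):
    """Hypothetical sketch: pick the stored HRTF set whose donor's
    anthropometric features (e.g., head width, pinna dimensions) are
    closest to the user's, after z-score normalization.
    db_features: (N, F) array; db_hrtfs: list of N HRTF sets."""
    mu = db_features.mean(axis=0)
    sigma = db_features.std(axis=0)
    dist = np.linalg.norm((db_features - mu) / sigma
                          - (user_features - mu) / sigma, axis=1)
    return db_hrtfs[int(np.argmin(dist))]
```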

He also showed suitable headphones with head-tracking hardware that are already on the market.

Ivan talked next about computing and rendering 3-D audio with the object-based method. A sound "scene" combines the desired sound objects, the positions they are supposed to sound like they occupy (and how to achieve that), the user's HRTF data, head-movement tracking, and so on, with the rendering updated perhaps 50 times per second.
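
A skeletal version of such an update loop might look like the following; the 50 Hz rate is from the talk, while hrir_lookup and the nearest-HRIR switching are assumptions for illustration.

```python
import numpy as np

FS = 48_000              # audio sample rate
UPDATE_HZ = 50           # scene updates per second, per the talk
BLOCK = FS // UPDATE_HZ  # 960 samples, about 20 ms per update

def source_azimuth(src_xy, head_xy, head_yaw):
    """Source direction in head coordinates (radians), compensating
    for the tracked head orientation."""
    dx, dy = src_xy[0] - head_xy[0], src_xy[1] - head_xy[1]
    return np.arctan2(dy, dx) - head_yaw

def render_block(block, azimuth, hrir_lookup):
    """One update: fetch the HRIR pair nearest the current azimuth
    (hrir_lookup is a hypothetical database accessor) and convolve."""
    h_l, h_r = hrir_lookup(azimuth)
    left = np.convolve(block, h_l)[:BLOCK]
    right = np.convolve(block, h_r)[:BLOCK]
    return np.stack([left, right], axis=-1)
```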

An extended break was held, during which attendees had their own HRTFs measured and then tried on a HoloLens to experience an augmented reality 3-D audio scene. Assisting with this was one of Ivan's co-researchers, David Johnston; his name may be familiar to anyone who has used Cool Edit, the audio editing program he wrote. Door prizes were also awarded.

Lastly, Dr. Tashev spoke about modal-based rendering of a sound field. With help from the Fourier transform, a sound field captured by a microphone array can be decomposed into a model built from spatial modes. He showed a spherical device with 64 MEMS (Micro-Electro-Mechanical Systems) microphones on its surface (looking just like a mini Death Star).
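
The talk's own derivation is not reproduced here, but the standard modal expansion from the spherical-array literature captures the idea: the pressure on a sphere of radius r is written as a weighted sum of spherical harmonics, and the 64 microphone signals are used to estimate the weights.

```latex
% Pressure at wavenumber k on a sphere of radius r, expanded in
% spherical harmonics Y_n^m; b_n is the (array-dependent) mode
% strength and C_{nm} are the sound-field coefficients to estimate.
p(k, r, \theta, \phi) \;=\; \sum_{n=0}^{\infty} \sum_{m=-n}^{n}
    b_n(kr)\, C_{nm}(k)\, Y_n^m(\theta, \phi)
```

In practice the sum is truncated at some order N; since estimating the coefficients up to order N requires at least (N+1)^2 microphones, a 64-microphone sphere supports up to N = 7.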

Ivan's crystal ball says that channel-based approaches have reached their limits, parametric modal methods are under active scrutiny, joint object and modal decompositions will become commonplace, and device independence will matter more and more.

A complete video and the PowerPoint slides are on the Microsoft Research website, at: (TBA)

Written By:
