Dr. Ivan Tashev
Photos by boB Gudgel
The PNW Section's December meeting was a presentation by Dr. Ivan Tashev of Microsoft Research on the engineering behind the audio capture abilities of the popular Microsoft Kinect gaming device. Ten AES members and 19 non-members came to the Microsoft Research building in Redmond, WA.
Dr. Tashev has been a software architect in the Speech Technology group at Microsoft Research since 2001. He received his master's (electronics) and PhD (computer science) degrees from the Technical University of Sofia, Bulgaria. Dr. Tashev is the author of the book Sound Capture and Processing: Practical Approaches (Wiley, 2009). He is a senior member of the IEEE and the IEEE Signal Processing Society, and an AES PNW Section Committee member. Dr. Tashev has also published two other books and over 50 scientific papers, and is listed as an inventor on 11 U.S. patents and 40 U.S. patent applications.
Noting that his slides included many detailed mathematical equations, he said they were there for those who are interested and that others could ignore them. He then said a few words about Microsoft Research, the company's special R&D division. Established in 1991, it has some 850 researchers working on technologies that Microsoft may need, so that they are ready when necessary, which gives the company technological agility.
Ivan discussed the problems in sound capture and signal processing that must be solved for a device such as Kinect. Many factors contribute to the eerie ability of the Kinect to localize players. While studio engineers can carefully select mikes, placement and so on, an inexpensive game accessory does not have that luxury. He covered the background of the physics of sound and microphone types, noting that the Kinect enclosure is an integral part of the 4 cardioid mikes used. The 4-mike array is used along with sophisticated signal processing to suppress noise from unwanted directions, to detect speech in noisy environments, and to allow very precise beamforming. He went into some depth on the signal processing mathematics, but also noted that human perception of sounds cannot be so easily expressed using mathematics.
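For readers who want a feel for the beamforming idea, here is a minimal delay-and-sum sketch in Python. It is only an illustration under stated assumptions, not the processing used in Kinect, which was not detailed at that level in this report: the 16 kHz sample rate, the 500 Hz test tone, the per-mike lags, and the function names `delay_and_sum` and `power` are all hypothetical. Each channel is advanced by its own lag so that sound from the chosen direction lines up and adds coherently, while sound from other directions partly cancels.

```python
import math

def delay_and_sum(channels, lags):
    """Average the channels after advancing channel i by lags[i] samples."""
    n_out = min(len(ch) - lag for ch, lag in zip(channels, lags))
    return [
        sum(ch[n + lag] for ch, lag in zip(channels, lags)) / len(channels)
        for n in range(n_out)
    ]

def power(signal):
    """Mean squared amplitude of a list of samples."""
    return sum(x * x for x in signal) / len(signal)

# Hypothetical test signal: a 500 Hz tone sampled at 16 kHz.
tone = [math.sin(2 * math.pi * 500 * n / 16000) for n in range(400)]

# Assumed arrival lags (in samples) at four mikes for one source direction.
true_lags = [0, 3, 6, 9]
channels = [[0.0] * lag + tone for lag in true_lags]

steered = delay_and_sum(channels, true_lags)       # aimed at the source
unsteered = delay_and_sum(channels, [0, 0, 0, 0])  # aimed straight ahead

print("power when steered at the source:", power(steered))
print("power when steered elsewhere:   ", power(unsteered))
```

With the correct lags the tone is recovered essentially unchanged; with the wrong lags the channels add out of phase and the output power drops, which is the basic mechanism by which an array favors one direction over others.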
At the refreshments break, people were able to play the Xbox Kinect system in the room.
Resuming after the break, Ivan went further into the signal processing techniques. Their acoustic echo reduction was a particular breakthrough: they developed a method that works in stereo without the distracting artifacts of approaches such as adding high distortion to decorrelate the channels. They hit upon playing some music that has a known spectral content and using it, instead of a test tone, to analyze the stereo echo.
Distinguishing multiple sound sources in one channel is another processing challenge. Microphone arrays can effectively be used to determine the angular distance and separation of sources. Applications of all of these technologies include cars, where voice recognition and noise suppression can be used for hands-free phones, navigation and entertainment control. A short video of the MS CommuteUX system was shown demonstrating this. Ivan did mention that he nearly crashed the driving simulator only once.
Also noted were the Kinect cameras (infrared and visible), which contribute to the recognition of people and gestures. Finally, a demo of the Xbox voice recognition system showed how it responded when addressed: "Xbox, do this."
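As a rough illustration of how a pair of microphones can yield the angle of a source, the sketch below estimates the time difference of arrival by cross-correlation and converts it to a far-field angle. This is a textbook approach offered under stated assumptions, not the method presented in the talk: the speed of sound (343 m/s), the 16 kHz sample rate, the 0.2 m mike spacing, the noise-burst test signal, and the function names `best_lag` and `arrival_angle_deg` are all hypothetical.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def best_lag(ref, other, max_lag):
    """Lag (in samples) at which `other` best matches `ref` by cross-correlation."""
    def score(lag):
        return sum(
            ref[n] * other[n + lag]
            for n in range(len(ref))
            if 0 <= n + lag < len(other)
        )
    return max(range(-max_lag, max_lag + 1), key=score)

def arrival_angle_deg(lag_samples, sample_rate_hz, mic_spacing_m):
    """Far-field angle from broadside implied by a time difference of arrival."""
    ratio = SPEED_OF_SOUND * lag_samples / (sample_rate_hz * mic_spacing_m)
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))

# Hypothetical test: a noise burst reaches the second mike 4 samples later.
rng = random.Random(0)
burst = [rng.uniform(-1.0, 1.0) for _ in range(200)]
mic_a = burst + [0.0] * 10
mic_b = [0.0] * 4 + burst + [0.0] * 6

lag = best_lag(mic_a, mic_b, max_lag=8)
angle = arrival_angle_deg(lag, sample_rate_hz=16000, mic_spacing_m=0.2)
print("estimated lag (samples):", lag)
print("estimated angle (degrees):", angle)
```

A broadband burst is used here because a pure tone would give ambiguous correlation peaks; with more than two mikes, several such pairwise estimates can be combined.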
Ivan felt that the technologies are an important new way of interacting with computers, and may also remove some age and gender barriers that some people have to computer "games."
A Q&A session covered topics such as firmware upgrades, accuracy of sound source detection, Kinect biometric identification, voice recognition and accent sensitivity, where the processing occurs, and third-party development support.
Finally, everyone got to play the Kinect games for the rest of the evening.
Reported by Gary Louie, PNW Section Secretary
Last modified 10/13/2011.