AES Vienna 2007: Poster Session P17

P17 - Spatial Audio Perception and Processing

Monday, May 7, 11:30 — 13:00

P17-1 Sound Source Localization and B-Format Enhancement Using Sound Field Microphone Sets—Charalampos Dimoulas, Kostantinos Avdelidis, George Kalliris, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
This paper focuses on the use of sound-field microphone arrays for sound source localization and B-format enhancement. Spatial audio information is very important in many applications, yet reverberant sound fields and ambient noise degrade the recording conditions. Examples include sound recording during movie production, virtual reality environments, and teleconferencing and distance learning applications that use 3-D audio capabilities. The B-format components provided by a single sound-field microphone are adequate to estimate the direction of arrival of a sound source, while the combination of two sound-field microphones allows the exact source location to be estimated. In addition, the eight (or more) available signal components can be used to apply delay-and-sum techniques, enabling SNR improvements and the virtual positioning of a single B-format microphone at any desired place. Simplicity, reduced computational load, and effectiveness are some of the advantages of the proposed methodology, which is evaluated via software simulations.
Convention Paper 7091
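
As an illustration of the claim in P17-1 that the B-format components of a single sound-field microphone suffice to estimate direction of arrival, the following is a minimal sketch of one common approach: time-averaging the products of W with X, Y, and Z. It is not necessarily the authors' method. The traditional B-format convention (W scaled by 1/sqrt(2)), a single plane-wave source, and the function names `encode_b_format` and `estimate_doa` are all assumptions made for this example.

```python
import math


def encode_b_format(signal, azimuth, elevation):
    """Encode a mono plane-wave signal into B-format (W, X, Y, Z).

    Assumes the traditional convention in which W carries a 1/sqrt(2) gain.
    Angles are in radians.
    """
    cos_az, sin_az = math.cos(azimuth), math.sin(azimuth)
    cos_el, sin_el = math.cos(elevation), math.sin(elevation)
    w = [s / math.sqrt(2.0) for s in signal]
    x = [s * cos_az * cos_el for s in signal]
    y = [s * sin_az * cos_el for s in signal]
    z = [s * sin_el for s in signal]
    return w, x, y, z


def estimate_doa(w, x, y, z):
    """Estimate (azimuth, elevation) in radians from B-format samples.

    The summed products W*X, W*Y, W*Z form a vector that, under the
    encoding above, points toward the source.
    """
    ix = sum(a * b for a, b in zip(w, x))
    iy = sum(a * b for a, b in zip(w, y))
    iz = sum(a * b for a, b in zip(w, z))
    azimuth = math.atan2(iy, ix)
    elevation = math.atan2(iz, math.hypot(ix, iy))
    return azimuth, elevation


# Hypothetical test signal: a 440 Hz tone sampled at 48 kHz, placed at
# 40 degrees azimuth and 15 degrees elevation.
signal = [math.sin(2.0 * math.pi * 440.0 * n / 48000.0) for n in range(4800)]
w, x, y, z = encode_b_format(signal, math.radians(40.0), math.radians(15.0))
azimuth_est, elevation_est = estimate_doa(w, x, y, z)
print(math.degrees(azimuth_est), math.degrees(elevation_est))
```

In this noise-free, anechoic case the estimate recovers the encoded angles. Reverberation and ambient noise, which the abstract identifies as the practical difficulty, would bias the averaged vector.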

P17-2 Research on Widening the Virtual Listening Space in the Automotive Environment—Jeong-Hun Seo, Lae-Hoon Kim, Hwan Shim, Koeng-Mo Sung, Seoul National University - Seoul, Korea
This paper presents research on a way to widen the virtual listening space in cars. The interior of a car generally has a small volume compared to normal listening environments, which can make listeners feel somewhat confined. A way to widen the virtual space in cars is therefore needed. According to room acoustics, lateral reflections are one of the most important cues for spaciousness, so we widen the virtual space in automotive environments using artificial lateral reflections.
Convention Paper 7092

P17-3 Perceptual Distortion Maps for Room Reverberation—Thomas Zarouchas, John Mourjopoulos, University of Patras - Patras, Greece
From reverberated audio signals, and using the input (anechoic) audio as a reference, a number of distortion maps are extracted that indicate how room reverberation distorts perceived features of the received signal across time-frequency scales. These maps are simplified to describe the monaural time-frequency/level distortions and the distortion of the spatial cues (i.e., inter-channel cues and coherence), which are very important for sound localization in a reverberant environment. Such maps are studied here as functions of room parameters (size, acoustics, distance, etc.) and of input signal properties. Overall perceptual distortion ratings are produced and reverberation-resilient signal features are extracted.
Convention Paper 7093

P17-4 A New Structure for Stereo Acoustic Echo Cancellation Based on Binaural Cue Coding—Yoomi Hur, Young-Choel Park, Dae Hee Youn, Yonsei University - Seoul, Korea
Most stereo teleconferencing systems involve an acoustic echo canceller to remove undesired echoes. However, it is difficult for stereo echo cancellers to converge to the true echo path because the cross-correlation between the stereo input signals is high. To solve this problem, we propose a new structure that is combined with Binaural Cue Coding (BCC). BCC is a method for multichannel spatial rendering based on one down-mixed audio channel and side information. Based on BCC, we propose a new single-channel adaptive filter for stereo echo cancellation, which takes the down-mixed monaural signal as the reference. Efficient voice conference systems can be implemented using the proposed structure, since the BCC scheme, which transmits stereo signals as a mono signal plus some side information, enables low-bit-rate transmission. Simulation results confirm that the convergence speed is increased and the misalignment problem is solved. In addition, the proposed structure has better tracking capability.
Convention Paper 7094
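
The idea in P17-4 of adapting a single filter against the down-mixed mono reference can be sketched under heavy simplification. The sketch below is not the authors' structure: it assumes that the two loudspeaker signals are just broadband-scaled copies of the mono down-mix (real BCC applies per-subband level, time, and coherence cues), it uses a plain NLMS update, and every name and number in it (`convolve`, `nlms`, the gains, the echo paths) is hypothetical. Under that assumption the microphone echo is a linear filtering of the mono signal, so a single adaptive filter has a unique solution to converge to.

```python
import random


def convolve(signal, taps):
    """Causal FIR filtering of `signal` with `taps` (output has len(signal))."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def nlms(reference, microphone, num_taps, step=0.5, eps=1e-8):
    """Normalized LMS: adapt `weights` so that weights * reference tracks microphone."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference sample first
    errors = []
    for x, d in zip(reference, microphone):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        errors.append(error)
    return weights, errors


rng = random.Random(0)
mono = [rng.uniform(-1.0, 1.0) for _ in range(4000)]  # down-mixed reference

# Assumed rendering: each loudspeaker plays a scaled copy of the mono signal.
gain_left, gain_right = 0.8, 0.5
path_left = [0.6, 0.3, -0.1, 0.05]   # hypothetical echo path, left speaker
path_right = [0.4, -0.2, 0.1, 0.0]   # hypothetical echo path, right speaker

echo_left = convolve([gain_left * s for s in mono], path_left)
echo_right = convolve([gain_right * s for s in mono], path_right)
microphone = [a + b for a, b in zip(echo_left, echo_right)]

# The single filter should converge to this combined path.
combined_path = [gain_left * a + gain_right * b
                 for a, b in zip(path_left, path_right)]
weights, errors = nlms(mono, microphone, num_taps=4)
print(weights)
print(combined_path)
```

Because there is only one reference signal, the non-uniqueness caused by highly correlated stereo inputs does not arise in this toy setting. How the side information is handled in the actual system is not described in the abstract and is not modeled here.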

P17-5 Efficient Binaural Filtering in QMF Domain for BRIR—David Virette, France Telecom - Lannion, France; Pierrick Philippe, France Telecom - Cesson-Sévigné, France; Gregory Pallone, Rozenn Nicol, Julien Faure, France Telecom - Lannion, France; Marc Emerit, France Telecom - Cesson-Sévigné, France; Alexandre Guerin, France Telecom - Lannion, France
The MPEG Surround standard includes two "native" binaural processing modules for reproducing 3-D audio content over headphones. In this paper we present a novel and efficient binaural room impulse response (BRIR) modeling algorithm extending their possibilities. It is based on a parametric decomposition of the BRIR and is integrated within the subband domain implementation of the MPEG Surround binaural decoder. First we show that for impulse responses with room effects, our approach offers a significant reduction in terms of computational requirements compared to the native methods. Second, we report results from listening tests comparing different tradeoffs between complexity and quality. We show that using our method, the complexity can be reduced by a factor of two while preserving the optimum quality.
Convention Paper 7095

P17-6 A Parametric Model of Head-Related Transfer Functions for Sound Source Localization—Youngtae Kim, Jungho Kim, Sangchul Ko, Samsung Advanced Institute of Technology - Gyeonggi-do, Korea
A simple and effective parametric model of head-related transfer functions is presented for synthesizing binaural sound in practical 3-D sound reproduction systems. The suggested model is based on a simplified time-domain description of the physics of wave propagation and diffraction, and the components of the model have a one-to-one correspondence with the main characteristics of measured head-related transfer functions, such as sound diffraction, delay, and reflection. The model parameters are derived from sets of measurements, which enables the model to capture the perceptually significant features of head-related transfer functions. Finally, simple subjective listening tests verify the perceptual effectiveness of the model. The results show some promise for efficient implementation in real-world applications.
Convention Paper 7096

P17-7 Binaural Response Synthesis from Center-of-Head Position Measurements for Stereo Applications—Sunil Bharitkar, Chris Kyriakakis, University of Southern California/Audyssey Labs. - Los Angeles, CA, USA
In two-channel or stereo applications, such as televisions, automotive infotainment, and hi-fi systems, the loudspeakers are typically placed close to each other. The sound field generated by such a setup creates an image that is perceived as monophonic and lacks sufficient spatial “presence.” Because of this limitation, a virtual sound technique based on head-related transfer functions (HRTFs) may be used to widen the soundstage, giving the listener(s) the perception that sound is originating from a wider angle. In this paper we present a method to synthesize responses at a listener’s ears given just two room-response measurements, where each measurement is obtained between one loudspeaker of a stereo setup and an assumed center-of-head position (where the listener is assumed to be seated). The binaural response synthesis approach uses head-shadowing models (for inter-aural intensity differences) and the Woodworth-Schlosberg delay model. This approach is useful when dummy heads are not readily available for HRTF measurements, and it can be generalized to reflect measurements that would have been obtained over a large corpus of data (viz., human subjects). We also compare the responses obtained from this approach with measurements made with a dummy head.
Convention Paper 7097
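
The Woodworth-Schlosberg delay model named in P17-7 is commonly written, for a rigid spherical head and a distant source, as ITD = (a / c)(theta + sin theta), with head radius a, speed of sound c, and source azimuth theta between 0 and 90 degrees. The sketch below evaluates that textbook form only. The default head radius of 8.75 cm, the speed of sound of 343 m/s, and the function name `woodworth_itd` are assumptions for illustration, and the abstract does not say how the authors parameterize the model.

```python
import math


def woodworth_itd(azimuth_rad, head_radius_m=0.0875, speed_of_sound=343.0):
    """Interaural time difference in seconds from the spherical-head formula.

    Valid for |azimuth| <= 90 degrees; the sign of the result follows the
    sign of the azimuth.
    """
    theta = abs(azimuth_rad)
    if theta > math.pi / 2.0:
        raise ValueError("formula is only used here for |azimuth| <= 90 degrees")
    itd = head_radius_m / speed_of_sound * (theta + math.sin(theta))
    return math.copysign(itd, azimuth_rad)


# ITD for a few azimuths, in microseconds.
for degrees in (0, 15, 30, 90):
    itd_us = woodworth_itd(math.radians(degrees)) * 1e6
    print(degrees, round(itd_us, 1))
```

With these assumed values the delay at 90 degrees comes out at roughly 0.66 ms, and the delay is zero for a source straight ahead.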

P17-8 Physical and Filter Pinna Models Based on Anthropometry—Patrick Satarzadeh, V. Ralph Algazi, Richard O. Duda, University of California at Davis - Davis, CA, USA
This paper addresses the fundamental problem of relating the anthropometry of the pinna to the localization cues it creates. The HRTFs for isolated pinnae (which are called PRTFs) are analyzed and modeled for sound sources directly in front of the listener. It is shown that a low-order filter model, with parameters suggested by or derived from pinna anthropometry, provides a good fit to the data. Methods are reported for adjusting the model parameters to fit the PRTF data. It is often possible to estimate the model parameters from a few geometrical measurements of the pinna. However, direct estimation from pinna anthropometry in general remains an unsolved problem, and the nature of this problem is discussed.
Convention Paper 7098

P17-9 A Novel Approach to Robotic Monaural Sound Localization—Fakheredine Keyrouz, Abdallah Bou Saleh, Klaus Diepold, Technische Universität München - Munich, Germany
This paper presents a novel monaural 3-D sound localization technique that robustly localizes a sound source to within a 2.5-degree azimuth deviation and a 5-degree elevation deviation. The proposed system, an upgrade of monaural-based localization techniques, uses two microphones: one inserted within the ear canal of a humanoid head equipped with an artificial ear, and the second held outside the ear, 5 cm away from the inner microphone. The outer microphone is small enough that it introduces only minimal reflections that might contribute to localization errors. The system exploits the spectral information of the signals from the two microphones in such a way that a simple correlation mechanism, using a generic set of Head-Related Transfer Functions (HRTFs), is used to localize the sound sources. The low computational requirement provides a basis for robotic real-time applications. The technique was tested through extensive simulations of a noisy reverberant room and further through an experimental setup. Both sets of results demonstrated the capability of the monaural system to localize, with high accuracy, sound sources in a three-dimensional environment even in the presence of strong noise and distortion.
Convention Paper 7099

P17-10 Optimized Binaural Modeling for Immersive Audio Applications—Christos Tsakostas, Holistiks Engineering Systems - Athens, Greece; Andreas Floros, Ionian University - Corfu, Greece
Recent developments related to immersive audio systems mainly originate from binaural audio processing technology. In this paper a novel high-quality binaural modeling engine is presented that is suitable for supporting a wide range of applications in the areas of virtual reality, mobile playback, and computer games. Based on a set of optimized algorithms for Head-Related Transfer Function (HRTF) equalization, acoustic environment modeling, and cross-talk cancellation, it is shown that the proposed binaural engine can achieve significantly improved authenticity for 3-D audio representation in real time. A complete binaural synthesis application is also presented that demonstrates the efficiency of the proposed algorithms.
Convention Paper 7100

P17-11 Head-Related Transfer Function Calculation Using Boundary Element Method—Przemyslaw Plaskota, Andrzej B. Dobrucki, Wroclaw University of Technology - Wroclaw, Poland
Measuring the head-related transfer function (HRTF) is an effective way of taking the influence of the human body on the sound spectrum into account. The database used in reproducing the sound source position is built from the measurement results. The database is individual to each person, which makes it impossible to build a universal database for all listeners. In this paper a numerical model of an artificial head is presented. The model allows HRTF values to be determined without measurements. The model includes both geometrical and acoustical parameters. The boundary element method, which is often used to determine acoustic field parameters, was used to calculate the HRTF values in this paper.
Convention Paper 7101

P17-12 Binaural Room Synthesis and Binaural Sky—Flexible Virtual Monitoring for Multichannel Audio also with Height Speakers—Klaus Laumann, Jörg Hör, Gerd Spikofski, Roman Stumpner, Günther Theile, Institut für Rundfunktechnik (IRT) - Munich, Germany; Helmut Wittek, Schoeps Mikrofone - Karlsruhe, Germany
This paper is for presentation only. No print copy will be available.