AES Munich 2009
Poster Session P7
P7 - Spatial Audio Processing
Thursday, May 7, 16:30 — 18:00
P7-1 Low Complexity Binaural Rendering for Multichannel Sound—Kangeun Lee, Changyong Son, Dohyung Kim, Samsung Advanced Institute of Technology - Suwon, Korea
This paper presents an efficient method for emulating multichannel sound in portable environments, where low power consumption is required. Its goal is to quantify the complexity of binaurally rendering multichannel sound to stereo on portable devices. To this end, we propose binaural rendering based on the modified discrete cosine transform (MDCT), combined with the Dolby Digital (AC-3) multichannel audio decoder. A reverberation algorithm is added to bring the result closer to real sound. The combined structure is implemented on a DSP. Complexity and quality are compared with a conventional head-related transfer function (HRTF) filtering method and with Dolby Headphone, which are the most current commercial binaural rendering technologies, demonstrating a significant complexity reduction and sound quality comparable to Dolby Headphone.
Convention Paper 7687
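For orientation, the conventional HRTF filtering baseline that P7-1 compares against can be sketched as below. This is a minimal time-domain illustration, not the MDCT-domain method proposed in the paper. The function names and the two-channel toy impulse responses are hypothetical placeholders; measured head-related impulse responses would be much longer.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def binaural_downmix(channels, hrirs):
    """Filter each loudspeaker channel with its left/right HRIR and sum.

    channels: dict mapping channel name to a list of samples
    hrirs:    dict mapping channel name to (left_ir, right_ir)
    """
    n = max(
        len(sig) + max(len(hrirs[name][0]), len(hrirs[name][1])) - 1
        for name, sig in channels.items()
    )
    left = [0.0] * n
    right = [0.0] * n
    for name, sig in channels.items():
        left_ir, right_ir = hrirs[name]
        for i, v in enumerate(convolve(sig, left_ir)):
            left[i] += v
        for i, v in enumerate(convolve(sig, right_ir)):
            right[i] += v
    return left, right


# Toy (hypothetical) impulse responses: the far ear is delayed and attenuated.
toy_hrirs = {
    "L": ([1.0, 0.3], [0.0, 0.5, 0.2]),
    "R": ([0.0, 0.5, 0.2], [1.0, 0.3]),
}
toy_channels = {"L": [1.0, 0.0, 0.0, 0.0], "R": [0.0, 0.0, 0.0, 0.0]}
left_out, right_out = binaural_downmix(toy_channels, toy_hrirs)
```

The cost of this baseline grows with the number of channels times the impulse response length, which is the kind of per-sample load a transform-domain approach aims to reduce.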
P7-2 Optimal Filtering for Focused Sound Field Reproductions Using a Loudspeaker Array—Youngtae Kim, Sangchul Ko, Jung-Woo Choi, Jungho Kim, SAIT, Samsung Electronics Co., Ltd. - Gyeonggi-do, Korea
This paper describes audio signal processing techniques for designing multichannel filters that reproduce an arbitrary spatial directivity pattern with a typical loudspeaker array. Several design criteria, based for example on least-squares methods and the maximum energy array, are introduced as non-iterative optimization techniques with lower computational complexity. The criteria are first evaluated with a given loudspeaker configuration for reproducing a desired acoustic property in a spatial area of interest. Additional constraints are also considered for minimizing the error between the amplitudes of the actual and desired spatial directivity patterns. Their limitations in practical applications are revealed by experimental demonstrations, and finally some guidelines are proposed for designing optimal filters.
Convention Paper 7688
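The least-squares criterion mentioned in P7-2 can be illustrated at a single frequency as a regularized least-squares fit of loudspeaker weights to a target directivity pattern. This is a minimal sketch under assumptions that are not taken from the paper: free-field point-source loudspeakers, an 8-element line array with 0.1 m spacing, control points on a 2 m arc, a target of 1 within ±20° and 0 elsewhere, and Tikhonov regularization with a hypothetical weight `lam`. The maximum energy array criterion is not shown.

```python
import cmath
import math


def solve(A, b):
    """Solve A x = b (complex) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def transfer_matrix(speakers, points, freq, c=343.0):
    """Free-field Green's functions from each loudspeaker to each control point."""
    k = 2.0 * math.pi * freq / c
    G = []
    for px, py in points:
        row = []
        for sx, sy in speakers:
            r = math.hypot(px - sx, py - sy)
            row.append(cmath.exp(-1j * k * r) / (4.0 * math.pi * r))
        G.append(row)
    return G


def ls_weights(G, d, lam):
    """Weights minimizing ||G w - d||^2 + lam ||w||^2 via the normal equations."""
    m, n = len(G), len(G[0])
    A = [[sum(G[q][i].conjugate() * G[q][j] for q in range(m))
          + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [sum(G[q][i].conjugate() * d[q] for q in range(m)) for i in range(n)]
    return solve(A, b)


def cost(G, w, d, lam):
    """Regularized squared error of weights w against target pattern d."""
    err = sum(abs(sum(G[q][l] * w[l] for l in range(len(w))) - d[q]) ** 2
              for q in range(len(d)))
    return err + lam * sum(abs(v) ** 2 for v in w)


# Assumed geometry: line array on the x axis, control points on a 2 m arc.
speakers = [((i - 3.5) * 0.1, 0.0) for i in range(8)]
angles = [math.radians(a) for a in range(-90, 91, 10)]
points = [(2.0 * math.sin(a), 2.0 * math.cos(a)) for a in angles]
d = [1.0 + 0j if abs(a) <= math.radians(20) + 1e-9 else 0j for a in angles]
lam = 1e-4
G = transfer_matrix(speakers, points, freq=1000.0)
w = ls_weights(G, d, lam)
```

Because the solution is closed-form, no iteration is needed, which matches the non-iterative character described in the abstract. Repeating the fit per frequency bin would give a filter per loudspeaker.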
P7-3 Single-Channel Sound Source Distance Estimation Based on Statistical and Source-Specific Features—Eleftheria Georganti, Philips Research Europe - Eindhoven, The Netherlands, University of Patras, Patras, Greece; Tobias May, Technische Universiteit Eindhoven - Eindhoven, The Netherlands; Steven van de Par, Aki Härmä, Philips Research Europe - Eindhoven, The Netherlands; John Mourjopoulos, University of Patras - Patras, Greece
In this paper we study the problem of estimating the distance of a sound source from a single microphone recording in a room environment. The room effect cannot be separated from the problem without making assumptions about the properties of the source signal. Therefore, it is necessary to develop methods of distance estimation separately for different types of source signals. In this paper we focus on speech signals. The proposed solution is to compute a number of statistical and source-specific features from the speech signal and to use pattern recognition techniques to develop a robust distance estimator for speech signals. Experiments with a database of real speech recordings showed that the proposed model is capable of estimating source distance with acceptable performance for applications such as ambient telephony.
Convention Paper 7689
P7-4 Implementation of DSP-Based Adaptive Inverse Filtering System for ECTF Equalization—Masataka Yoshida; Haruhide Hokari; Shoji Shimada, Nagaoka University of Technology - Nagaoka, Niigata, Japan
The Head Related Transfer Function (HRTF) and the inverse Ear Canal Transfer Function (ECTF) must be accurately determined if stereo earphones are to achieve out-of-head sound localization (OHL) with high presence. However, the characteristics of the ECTF depend on the type of earphone used and on the number of earphone mounting and demounting operations. Therefore, we present a DSP-based adaptive inverse filtering system for ECTF equalization. The buffer configuration and buffer size of the DSP were studied in order to implement the required processing. As a result, we succeeded in constructing a system that works over an audio band of 15 kHz at a sampling frequency of 44.1 kHz. Listening tests showed that the effective estimation error of the adaptive inverse ECTF for OHL was less than –11 dB, with a convergence time of about 0.3 seconds.
Convention Paper 7690
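The idea of adaptive inverse filtering in P7-4 can be sketched with a normalized LMS (NLMS) update that learns an FIR filter undoing an unknown transfer function. The abstract does not name the adaptation algorithm, so NLMS is an assumption here, as are the two-tap toy response standing in for an ECTF, the white-noise excitation, and every parameter value.

```python
import random


def nlms_inverse(h, n_samples=8000, n_taps=16, mu=0.5, eps=1e-6, seed=0):
    """Adapt an FIR filter w so that w applied to (h * x) approximates x.

    h is a hypothetical stand-in for the transfer function to be equalized.
    Returns the adapted taps and the error signal over time.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Signal observed after the unknown transfer function.
    y = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
         for n in range(n_samples)]
    w = [0.0] * n_taps
    buf = [0.0] * n_taps  # most recent observed samples, newest first
    errors = []
    for n in range(n_samples):
        buf = [y[n]] + buf[:-1]
        out = sum(wi * bi for wi, bi in zip(w, buf))
        e = x[n] - out  # error against the original excitation
        norm = eps + sum(b * b for b in buf)
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors


w_inv, errors = nlms_inverse([1.0, 0.5])
initial_mse = sum(e * e for e in errors[:200]) / 200
final_mse = sum(e * e for e in errors[-200:]) / 200
```

Comparing `final_mse` with `initial_mse` gives a rough view of convergence in this toy setting. The figures reported in the abstract come from the authors' DSP system and listening tests and are not reproduced by this sketch.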
P7-5 Improved Localization of Sound Sources Using Multi-Band Processing of Ambisonic Components—Charalampos Dimoulas, George Kalliris, Konstantinos Avdelidis, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
The current paper focuses on the use of multi-band ambisonic processing for improved sound source localization. Energy-based localization can be readily achieved using soundfield microphone pairs, as long as free-field conditions and the single omnidirectional point-source model apply. Multi-band SNR-based selective processing improves the noise tolerance and the localization accuracy, eliminating the influence of reverberation and background noise. Band-related sound-localization statistics are further exploited to verify the single or multiple sound-source scenario, while continuous spectral fingerprinting indicates the potential arrival of a new source. Different sound-excitation scenarios are examined (single/multiple sources, narrowband/wideband signals, time-overlapping, noise, reverberation). Various time-frequency analysis schemes are considered, including filter banks, windowed FFT, and wavelets with different time resolutions. Evaluation results are presented.
Convention Paper 7691
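As a rough illustration of energy-based localization from ambisonic components (P7-5), the direction of a single free-field source can be estimated from time-averaged products of the omnidirectional and figure-of-eight signals. This sketch assumes horizontal first-order B-format with the traditional 1/sqrt(2) weighting on W, a single broadband band, and no noise. It is a common textbook estimator and not necessarily the authors' multi-band scheme, and the function names are hypothetical.

```python
import math
import random


def encode_bformat(signal, azimuth_rad):
    """Encode a mono plane-wave source into horizontal B-format (W, X, Y)."""
    w = [s / math.sqrt(2.0) for s in signal]
    x = [s * math.cos(azimuth_rad) for s in signal]
    y = [s * math.sin(azimuth_rad) for s in signal]
    return w, x, y


def estimate_azimuth(w, x, y):
    """Estimate source azimuth from the time-averaged W*X and W*Y products."""
    ix = sum(wi * xi for wi, xi in zip(w, x))
    iy = sum(wi * yi for wi, yi in zip(w, y))
    return math.atan2(iy, ix)


rng = random.Random(1)
source = [rng.gauss(0.0, 1.0) for _ in range(1000)]
w_ch, x_ch, y_ch = encode_bformat(source, math.radians(60.0))
estimate_deg = math.degrees(estimate_azimuth(w_ch, x_ch, y_ch))
```

A multi-band variant would apply the same estimator per sub-band and keep only the bands judged reliable, for example by their SNR, before combining the estimates.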
P7-6 Spatial Audio Content Management within the MPEG-7 Standard of Ambisonic Localization and Visualization Descriptions—Charalampos Dimoulas, George Kalliris, Konstantinos Avdelidis, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
The current paper focuses on spatial audio video/imaging and sound field visualization using ambisonic processing, combined with MPEG-7 description schemes for multi-modal content description and management. Sound localization can be readily achieved using multi-band ambisonic processing under free-field and single point-source excitation conditions, together with an estimate of the achieved accuracy. When confident localization accuracy has been achieved, sound source forward propagation models can be applied to visualize the corresponding sound field. Otherwise, 3-D audio/surround sound reproduction simulation can be used instead. In either case, sound level distribution colormap videos and highlighting images can be extracted. MPEG-7 adapted description schemes are proposed for spatial-audio audiovisual content description and management, facilitating a variety of user-interactive postprocessing applications.
Convention Paper 7692