AES London 2011
Poster Session P3
P3 - Sound Field Analysis
Friday, May 13, 11:00 — 12:30 (Room: Foyer)
P3-1 Localization of Multiple Speech Sources Using Distributed Microphones—Maximo Cobos, Amparo Marti, Jose J. Lopez, Universidad Politecnica Valencia - Valencia, Spain
Source localization is an important task in many speech processing systems. There are many microphone array techniques intended to provide accurate source localization, but their performance is severely affected by noise and reverberation. The Steered-Response Power Phase Transform (SRP-PHAT) algorithm has been shown to perform very robustly in adverse acoustic environments; however, its computational cost can be an issue. Recently, the authors presented a modified version of the SRP-PHAT algorithm that improves its performance without adding a significant cost. However, the performance of the modified algorithm has only been analyzed in single-source localization tasks. This paper further explores the possibilities of this localization method by considering multiple simultaneously active speech sources. Experiments considering different numbers of sources and acoustic environments are presented using simulations and real data.
Convention Paper 8327
P3-2 Detection of “Solo Intervals” in Multiple Microphone Multiple Source Audio Applications—Elias Kokkinis, University of Patras - Patras, Greece; Joshua Reiss, Queen Mary University of London - London, UK; John Mourjopoulos, University of Patras - Patras, Greece
In this paper a simple and effective method is proposed to detect time intervals where only a single source is active (solo intervals) in the multiple-microphone, multiple-source settings commonly encountered in audio applications such as live sound reinforcement. The proposed method is based on the short-term energy ratios between all available microphone signals, and a single threshold value is used to determine whether a source is solely active and, if so, which one. The method is computationally efficient, and results indicate that it is accurate and fairly robust with respect to reverberation time and amount of source interference.
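The idea of thresholding short-term energy ratios can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the frame handling, and the decision rule (microphone i is assumed to be the close microphone of source i, and the loudest microphone must exceed every other by a factor of `threshold`) are assumptions made for illustration only.

```python
def frame_energies(signals, start, length):
    """Short-term energy of each microphone signal over one frame."""
    return [sum(x * x for x in sig[start:start + length]) for sig in signals]


def detect_solo(signals, frame_len, threshold, eps=1e-12):
    """Return, per frame, the index of the solely active source, or None.

    Assumptions (not taken from the paper): at least two microphones,
    microphone i is the close microphone of source i, and a source is
    "solo" when its microphone's energy is at least `threshold` times
    the energy of every other microphone.
    """
    n = min(len(sig) for sig in signals)
    decisions = []
    for start in range(0, n - frame_len + 1, frame_len):
        e = frame_energies(signals, start, frame_len)
        loudest = max(range(len(e)), key=lambda i: e[i])
        strongest_other = max(e[i] for i in range(len(e)) if i != loudest)
        ratio = e[loudest] / (strongest_other + eps)
        decisions.append(loudest if ratio >= threshold else None)
    return decisions


# Three frames of four samples: source 0 alone, both sources, source 1 alone.
mic0 = [1.0] * 4 + [1.0] * 4 + [0.1] * 4
mic1 = [0.1] * 4 + [1.0] * 4 + [1.0] * 4
print(detect_solo([mic0, mic1], frame_len=4, threshold=10.0))
```

A single threshold keeps the rule cheap to evaluate per frame; a silent frame yields a ratio of zero and is therefore reported as `None`.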
Convention Paper 8328
P3-3 A Real-Time Sound Source Localization and Enhancement System Using Distributed Microphones—Amparo Marti, Maximo Cobos, Jose J. Lopez, Universidad Politecnica Valencia - Valencia, Spain
The Steered Response Power - Phase Transform (SRP-PHAT) algorithm has been shown to be one of the most robust sound source localization approaches operating in noisy and reverberant environments. A recently proposed modified SRP-PHAT algorithm has been shown to provide robust localization performance in indoor environments without the need for a very fine spatial grid, thus reducing the computational cost required in a practical implementation. Sound source localization methods are commonly employed in many sound processing applications. In our case, we use the modified SRP-PHAT functional for improving noisy speech signals. The estimated position of the speaker is used to calculate the time delay for each microphone, and the speech is then enhanced by correctly aligning the microphone signals.
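The final step, turning an estimated speaker position into per-microphone delays and aligning the signals, can be sketched as a basic delay-and-sum operation. This is a simplified illustration rather than the system described in the paper: the function names, the fixed speed of sound, and the rounding to whole-sample delays are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def relative_delays(source, mics, fs):
    """Propagation delay to each microphone in whole samples,
    relative to the microphone closest to the source."""
    dists = [math.dist(source, m) for m in mics]
    nearest = min(dists)
    return [round((d - nearest) / SPEED_OF_SOUND * fs) for d in dists]


def delay_and_sum(signals, delays):
    """Advance each signal by its delay and average the aligned samples."""
    n = min(len(s) - d for s, d in zip(signals, delays))
    return [
        sum(s[t + d] for s, d in zip(signals, delays)) / len(signals)
        for t in range(n)
    ]


# The same pulse reaches the second microphone two samples later.
s0 = [0.0, 1.0, 0.0, 0.0, 0.0]
s1 = [0.0, 0.0, 0.0, 1.0, 0.0]
print(delay_and_sum([s0, s1], delays=[0, 2]))  # pulse is realigned and averaged
```

After alignment the speech adds coherently across microphones while uncorrelated noise does not, which is the source of the enhancement.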
Convention Paper 8329
P3-4 Binaural Moving Sound Source Localization by Joint Estimation of ITD and ILD—Cheng Zhou, Ruimin Hu, Weiping Tu, Xiaochen Wang, Li Gao, Wuhan University - Wuhan, Hubei, China
The spatial cues interaural time difference (ITD) and interaural level difference (ILD), which carry sound localization information, play a very important role in binaural localization systems. This paper investigates an improved method for binaural localization of moving sound sources that jointly estimates ITD and ILD while accounting for the Doppler effect. Results show that, by removing the influence of the Doppler effect, the proposed method improves accuracy by 0.3% (velocity = 1 m/s), 5.7% (velocity = 5 m/s), and 10.5% (velocity = 10 m/s) in silent conditions. The benefit of the method increases as the source moves faster.
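For readers unfamiliar with the two cues, the sketch below estimates them in the most basic textbook way: ITD from the lag of the cross-correlation peak and ILD from the energy ratio in decibels. It does not include the Doppler compensation or the joint estimation investigated in the paper; the function names and sign conventions are assumptions.

```python
import math


def estimate_itd(left, right, fs, max_lag):
    """ITD in seconds from the peak of the cross-correlation.
    Positive values mean the right-ear signal lags the left-ear signal."""
    def xcorr(lag):
        return sum(
            left[t] * right[t + lag]
            for t in range(len(left))
            if 0 <= t + lag < len(right)
        )
    best_lag = max(range(-max_lag, max_lag + 1), key=xcorr)
    return best_lag / fs


def estimate_ild(left, right, eps=1e-12):
    """ILD in dB; positive when the left-ear signal carries more energy."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))


# A pulse that arrives two samples later and at half amplitude on the right.
left = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.5, 0.0, 0.0]
print(estimate_itd(left, right, fs=1000, max_lag=3))
print(estimate_ild(left, right))
```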
Convention Paper 8330
P3-5 Perceived Level of Late Reverberation in Speech and Music—Jouni Paulus, Christian Uhle, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jürgen Herre, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany, International Audio Laboratories Erlangen, Erlangen, Germany
This paper presents experimental investigations on the perceived level of running reverberation in various types of monophonic audio signals. The design and results of three listening tests are discussed. The tests focus on the influence of the input material, the direct-to-reverberation ratio, and the reverberation time using artificially generated impulse responses for simulating the late reverberation. Furthermore, a comparison between mono and stereo reverberation is conducted. It can be observed that with equal mixing levels, the input material and the shape of the reverberation tail have a prominent effect on the perceived level. The results suggest that mono and stereo reverberation with identical reverberation times and mixing ratios are perceived as having equal level regardless of the material.
Convention Paper 8331
P3-6 Reverberation Enhancement in a Modal Sound Field—Hugh Hopper, David Thompson, Keith Holland, University of Southampton - Hampshire, UK
The reverberation time of a room can be increased by using a reverberation enhancement system. These electronic systems have generally been installed in large rooms, where diffuse field assumptions are sufficiently accurate. Novel applications of the technology can be found by applying it to smaller spaces where isolated modal resonances will dominate at low frequency. An analysis of a multichannel feedback system within a rectangular enclosure is presented. To assess the performance of this system, metrics are defined based on the spatial and frequency variations of a diffuse field. These metrics are then used to optimize the parameters of the system using a genetic algorithm. It is shown that optimization significantly improves the performance of the system.
Convention Paper 8332
P3-7 An Advanced Implementation of a Digital Artificial Reverberator—Andrea Primavera, Stefania Cecchi, Laura Romoli, Paolo Peretti, Francesco Piazza, Universita Politecnica delle Marche - Ancona, Italy
Reverberation is a well-known effect that is particularly important when listening to recorded and live music. In this paper we propose a real implementation of an enhanced approach for a digital artificial reverberator. Starting from a preliminary analysis of the mixing time, the selected impulse response is decomposed in the time domain into early and late reflections. A short FIR filter is then used to synthesize the first part of the impulse response, and a generalized recursive structure is used to synthesize the late reflections, exploiting a minimization criterion in the cepstral domain. Several results are reported for different real impulse responses, and they are compared with those obtained with previous techniques in terms of computational complexity and reverberation quality.
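The early/late split can be illustrated with a toy hybrid structure: a short FIR filter for the early part in parallel with a recursive section for the decaying tail. In this sketch a single feedback comb filter stands in for the paper's generalized recursive structure, and no cepstral-domain fitting is performed; all names and parameter values are hypothetical.

```python
def fir(x, taps):
    """Direct-form FIR filter: models the early reflections."""
    return [
        sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(x))
    ]


def feedback_comb(x, delay, gain):
    """y[n] = x[n - delay] + gain * y[n - delay]: a decaying echo train."""
    y = [0.0] * len(x)
    for n in range(delay, len(x)):
        y[n] = x[n - delay] + gain * y[n - delay]
    return y


def hybrid_reverb(x, early_taps, late_delay, late_gain):
    """Sum of the FIR (early) branch and the recursive (late) branch."""
    early = fir(x, early_taps)
    late = feedback_comb(x, late_delay, late_gain)
    return [e + l for e, l in zip(early, late)]


# Impulse response of the toy structure.
impulse = [1.0] + [0.0] * 7
print(hybrid_reverb(impulse, early_taps=[1.0, 0.5], late_delay=3, late_gain=0.5))
```

The recursive branch produces an arbitrarily long tail at a fixed cost per sample, which is why the late part is usually not implemented as a long FIR filter.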
Convention Paper 8333
P3-8 Evaluation of Spatial Impression Comparing 2-Channel Stereo, 5-Channel Surround, and 7-Channel Surround with Height Channels for 3-D Imagery—Toru Kamekawa, Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan; Toshikiko Date, AVC Networks Company, Panasonic Corporation - Osaka, Japan; Masaaki Enatsu, marimoRECORDS Inc. - Tokyo, Japan
Three-dimensional (3-D) imagery is now spreading widely as one of the next visual formats for Blu-ray and other future media. Since more audio channels are available with future media, the authors aim to find a suitable sound format for 3-D imagery. A pairwise comparison test was carried out comparing combinations of 3-D and 2-D imagery with 2-channel stereo, 5-channel surround, and 7-channel surround sound (5-channel surround plus 2 height channels), asking which combination gave a better sense of depth and a better match between visual and audio images. The results show that 3-D imagery with 7-channel surround gives the highest sense of depth and the best match between visual and audio images.
Convention Paper 8334