AES London 2011
Paper Session P18
P18 - Source Enhancement
Monday, May 16, 09:00 — 13:00 (Room 1)
P18-1 Dereverberation in the Spatial Audio Coding Domain—Markus Kallinger, Giovanni Del Galdo, Fabian Kuech, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Oliver Thiergart, International Audio Laboratories - Erlangen, Germany
Spatial audio coding techniques are fundamental for recording, coding, and rendering spatial sound. Especially in teleconferencing, spatial sound reproduction helps make a conversation feel more natural and reduces listening effort. However, if the acoustic sources are far from the microphone arrangement, the rendered sound may easily be corrupted by reverberation. This paper proposes a dereverberation technique that is integrated efficiently into the parameter domain of Directional Audio Coding (DirAC). Utilizing DirAC’s signal model, we derive a parametric method to reduce the diffuse portion of the recorded signal. Instrumental quality measures and informal listening tests confirm the efficiency of the proposed method in rendering a spatial sound scene less reverberant without introducing noticeable artifacts.
Convention Paper 8429
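The abstract above does not give the paper's exact suppression rule, but the general idea of attenuating the diffuse portion under the DirAC signal model (where a time-frequency tile carries a direct energy fraction 1 − ψ and a diffuse fraction ψ) can be sketched as follows; the `diffuse_gain` parameter and function name are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def suppress_diffuse(W, psi, diffuse_gain=0.3):
    """Attenuate the diffuse portion of a DirAC downmix (illustrative sketch).

    W            : complex downmix spectrum, one value per time-frequency tile
    psi          : diffuseness estimate in [0, 1] per tile (DirAC side info)
    diffuse_gain : attenuation applied to the diffuse part (assumed parameter)

    Under the DirAC model the tile energy splits into a direct fraction
    (1 - psi) and a diffuse fraction psi, so a real-valued per-tile gain
    sqrt((1 - psi) + diffuse_gain**2 * psi) scales only the diffuse energy.
    """
    W = np.asarray(W, dtype=complex)
    psi = np.clip(np.asarray(psi, dtype=float), 0.0, 1.0)
    gain = np.sqrt((1.0 - psi) + diffuse_gain**2 * psi)
    return gain * W
```

A fully direct tile (ψ = 0) passes unchanged, while a fully diffuse tile (ψ = 1) is scaled by `diffuse_gain`, which is what makes the scene sound less reverberant.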
P18-2 Blind Single-Channel Dereverberation for Music Post-Processing—Alexandros Tsilfidis, John Mourjopoulos, University of Patras - Patras, Greece
Although dereverberation can be useful in many audio applications, such techniques often introduce artifacts that are unacceptable in audio engineering scenarios. Recently, the authors have proposed a novel dereverberation approach, suitable for both speech and music signals, based on perceptual reverberation modeling. Here, the method is fine-tuned for sound engineering applications and tested for both natural and artificial reverberation. The results show that the proposed technique efficiently suppresses reverberation without introducing significant processing artifacts and that the method is appropriate for the post-processing of music recordings.
Convention Paper 8430
P18-3 Joint Noise and Reverberation Suppression for Speech Applications—Elias K. Kokkinis, Alexandros Tsilfidis, Eleftheria Georganti, John Mourjopoulos, University of Patras - Patras, Greece
An algorithm for joint suppression of noise and reverberation from speech signals is presented. The method requires a handclap recording that precedes speech activity. A running kurtosis technique is applied to estimate the late reflections of the room impulse response from the clap, while a moving average filter is employed for the noise estimation. Moreover, the excitation signal derived from Linear Prediction (LP) analysis of the noisy speech, along with the estimated power spectrum of the late reflections, is used to suppress late reverberation through spectral subtraction, while a Wiener filter compensates for the ambient noise. A gain magnitude regularization step is also implemented to reduce overestimation errors. Objective and subjective results show that the proposed method achieves significant speech enhancement in all tested cases.
Convention Paper 8431
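The spectral subtraction step with gain magnitude regularization described above follows a standard pattern; a minimal per-frame sketch, with the `floor` parameter assumed rather than taken from the paper, could look like:

```python
import numpy as np

def spectral_subtract(noisy_power, late_power, floor=0.05):
    """Power-spectral subtraction of estimated late reverberation (sketch).

    noisy_power : power spectrum of the reverberant frame
    late_power  : estimated power spectrum of the late reflections
    floor       : spectral floor as a fraction of the observed power
                  (assumed parameter; limits overestimation artifacts)
    """
    clean = np.asarray(noisy_power) - np.asarray(late_power)
    # Gain-magnitude regularization: never let a bin fall below a fraction
    # of the observed power, which suppresses musical-noise artifacts when
    # the late-reflection estimate overshoots.
    return np.maximum(clean, floor * np.asarray(noisy_power))
```

Bins where the late-reverberation estimate exceeds the observed power are clamped to the floor instead of going negative, which is the purpose of the regularization step the abstract mentions.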
P18-4 System Identification for Acoustic Echo Cancellation Using Stepped Sine Method Related to FFT Size—TaeJin Park, Seung Kim, Koeng-mo Sung, Seoul National University - Seoul, Korea
A stepped sine method was applied for system identification to cancel acoustic echoes in the speakerphone systems widely used in recent mobile devices. We applied the stepped sine method by regarding the Discrete Fourier Transform (DFT) as a uniform-DFT filter bank. By using this stepped sine method, we were able to obtain more accurate and detailed characteristics of non-linearity, dependent on the amplitude and frequency of speech. We stored the non-linearity information in linear transform matrices and estimated the responses of the mobile device speaker. The proposed method exhibits higher echo return loss enhancement (ERLE) and increased correlation when compared to the conventional method.
Convention Paper 8432
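ERLE, the figure of merit quoted above, is a standard metric: the ratio in dB between the power of the echo picked up by the microphone and the power of the residual left after cancellation. A small helper (names are illustrative) makes the definition concrete:

```python
import numpy as np

def erle_db(echo, residual, eps=1e-12):
    """Echo return loss enhancement: 10*log10(E[d^2] / E[e^2]) in dB.

    echo     : microphone signal containing the acoustic echo, d[n]
    residual : error signal after the echo canceller, e[n]
    eps      : small constant to avoid log(0) on silent segments
    """
    p_echo = np.mean(np.square(echo)) + eps
    p_resid = np.mean(np.square(residual)) + eps
    return 10.0 * np.log10(p_echo / p_resid)
```

A canceller that reduces the echo amplitude by a factor of 10 reduces its power by a factor of 100, i.e. an ERLE of 20 dB; higher ERLE means better cancellation.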
P18-5 Using Spaced Microphones with Directional Audio Coding—Mikko-Ville Laitinen, Aalto University - Espoo, Finland; Fabian Kuech, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Ville Pulkki, Aalto University - Espoo, Finland
Directional audio coding (DirAC) is a perceptually motivated method to reproduce spatial sound, which typically uses input from first-order coincident microphone arrays. This paper presents a method to additionally use spaced microphone setups with DirAC. It is shown that since diffuse sound is incoherent between spatially separated microphones at certain frequencies, no decorrelation in DirAC processing is needed, which improves the perceived quality. Furthermore, the directions of sound sources are perceived to be more accurate and stable.
Convention Paper 8433
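The claim that diffuse sound is incoherent between spaced microphones at certain frequencies follows from the classical spatial coherence of an ideal diffuse field between two omnidirectional sensors, sin(kd)/(kd) with k = 2πf/c, which decays toward zero as frequency or spacing grows. A short sketch of that model (function name assumed):

```python
import numpy as np

def diffuse_coherence(f, d, c=343.0):
    """Spatial coherence of an ideal diffuse field between two omni mics.

    f : frequency in Hz (scalar or array)
    d : microphone spacing in metres
    c : speed of sound in m/s

    The model is sin(k*d)/(k*d) with k = 2*pi*f/c. NumPy's sinc computes
    sin(pi*x)/(pi*x), hence the 2*f*d/c argument.
    """
    return np.sinc(2.0 * np.asarray(f, dtype=float) * d / c)
```

The first zero falls at f = c/(2d); above roughly that frequency the diffuse components at the two capsules are already (nearly) incoherent, which is why the decorrelation stage of DirAC processing can be skipped there without loss.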
P18-6 Parameter Estimation in Directional Audio Coding Using Linear Microphone Arrays—Oliver Thiergart, International Audio Laboratories - Erlangen, Germany; Michael Kratschmer, Markus Kallinger, Giovanni Del Galdo, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Directional Audio Coding (DirAC) provides an efficient description of spatial sound in terms of a few audio downmix signals and parametric side information, namely the Direction Of Arrival (DOA) and the diffuseness of the sound. Traditionally, the parameters are derived from the active sound intensity vector, which is often determined via 2-D or 3-D microphone grids. Adapting this estimation strategy to linear arrays, which are preferred in various applications due to form factor constraints, yields comparatively poor results. This paper proposes to replace the intensity-based DOA estimation in DirAC with Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT). Moreover, a diffuseness estimator exploiting the correlation between the array sensors is presented. Experimental results show that the DirAC concept can also be applied in practice in conjunction with linear arrays.
Convention Paper 8434
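The paper's specific estimator is not reproduced in the abstract, but the core of least-squares ESPRIT for a uniform linear array is compact enough to sketch: the signal subspace of the sample covariance is rotationally invariant between the two overlapping subarrays, and the phase of that rotation encodes the DOA. All names and the broadside-angle convention below are assumptions for illustration:

```python
import numpy as np

def esprit_doa(X, f, d, c=343.0, n_sources=1):
    """Least-squares ESPRIT DOA estimation for a uniform linear array (sketch).

    X         : (M, N) complex narrowband snapshots at frequency f (Hz)
    d         : inter-sensor spacing in metres
    n_sources : assumed number of plane-wave sources
    Returns estimated DOAs in radians, measured from broadside.
    """
    M, N = X.shape
    R = X @ X.conj().T / N                        # sample covariance matrix
    eigval, eigvec = np.linalg.eigh(R)            # eigenvalues ascending
    Es = eigvec[:, -n_sources:]                   # dominant signal subspace
    # Rotational invariance: the second subarray's subspace equals the
    # first's times Psi; solve for Psi in the least-squares sense.
    Psi, *_ = np.linalg.lstsq(Es[:-1], Es[1:], rcond=None)
    phases = np.angle(np.linalg.eigvals(Psi))     # phase = 2*pi*f*d*sin(theta)/c
    return np.arcsin(phases * c / (2.0 * np.pi * f * d))
```

With half-wavelength spacing (d = c/2f) the phase stays unambiguous over the full ±90° range, which is the usual design point for such arrays.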
P18-7 Extraction of Voice from the Center of the Stereo Image—Aki Härmä, Munhum Park, Philips Research Laboratories Eindhoven - Eindhoven, The Netherlands
Detection and extraction of the center vocal source is important for many audio format conversion and manipulation applications. First, we study some generic properties of stereo signals containing sources panned exactly to the center of the stereo image and propose an algorithm for separating a stereo audio signal into center and side channels. At the 128th AES Convention, a paper was presented (Convention Paper 8071) on listening tests comparing the perceived widths of the stereo images of synthetic signal combinations. In this paper the same experiment is repeated with real stereo audio content using the proposed center separation algorithm. The main observation is that there are clear differences in the results. The reasons for the differences are discussed in light of the literature and an analysis of the test signals and their binaural characteristics in the listening test setup.
Convention Paper 8435
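The abstract does not specify the separation algorithm, but a common family of approaches exploits the property it mentions: a source panned exactly to the center appears with equal amplitude and phase in both channels, so a per-bin similarity mask on the STFT can isolate it. The sketch below is such a heuristic, not the paper's method; the mask exponent `p` and all names are assumptions:

```python
import numpy as np

def extract_center(L, R, p=2.0):
    """Heuristic per-bin centre extraction from stereo STFT frames (sketch).

    L, R : complex STFT coefficients of the left and right channels
    p    : mask sharpness exponent (assumed parameter)

    The similarity measure 2|L R*| / (|L|^2 + |R|^2) equals 1 when the two
    channels carry the same signal (centre-panned) and 0 when only one
    channel is active, so raising it to a power yields a soft centre mask.
    """
    L = np.asarray(L, dtype=complex)
    R = np.asarray(R, dtype=complex)
    eps = 1e-12
    sim = 2.0 * np.abs(L * np.conj(R)) / (np.abs(L) ** 2 + np.abs(R) ** 2 + eps)
    mask = sim ** p
    C = 0.5 * mask * (L + R)          # centre-channel estimate
    return C, L - C, R - C            # centre plus residual side channels
```

Identical channels are routed entirely to the centre estimate, while a hard-panned bin passes through to the side channels untouched.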
P18-8 Directional Segmentation of Stereo Audio via Best Basis Search of Complex Wavelet Packets—Jeremy Wells, University of York - York, North Yorkshire, UK
A system for dividing time-coincident stereo audio signals into directional segments is presented. The purpose is to give greater flexibility in the presentation of spatial information when two-channel audio is reproduced. For example, different inter-channel time shifts could be introduced for segments depending on their direction. A novel aspect of this work is the use of complex wavelet packet analysis, along with “best basis” selection, in an attempt to identify time-frequency atoms that belong to only one segment. The system is described with reference to the relevant underlying theory, and the quality of its output for the best bases from complex wavelet packets is compared with that of more established analysis and processing approaches.
Convention Paper 8436