AES Munich 2009, Friday, May 8, 10:30–12:00
Poster Session P10
P10 - Audio for Telecommunications
P10-1 Harmonic Representation and Auditory Model-Based Parametric Matching and its Application in Speech/Audio Analysis—Alexey Petrovsky, Elias Azarov, Belarusian State University of Informatics and Radioelectronics - Minsk, Belarus; Alexander Petrovsky, Bialystok Technical University - Bialystok, Poland
The paper presents new methods for selecting the sinusoidal and transient components in hybrid sinusoidal modeling of speech/audio. The instantaneous harmonic parameters (magnitude, frequency, and phase) are obtained by narrow-band filtering of the speech/audio signal. A synthesis method for frequency-modulated filters with a closed-form impulse response is proposed. The filter pass-band bounds can be determined during frequency tracking of the components and adjusted to follow modulations of the fundamental frequency. The approach can be used to implement a harmonic/noise decomposition of speech/audio. Transient components are modeled by matching pursuit with a frame-based, psychoacoustically optimized wavelet packet dictionary. The most relevant coefficients are chosen by maximizing the match between the auditory excitation scalograms of the original and modeled signals.
Convention Paper 7705
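The idea of reading instantaneous magnitude, phase, and frequency off a narrow-band filter output can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it uses a fixed-center, Hann-windowed sinc filter shifted to a complex carrier (so only the positive-frequency component passes), whereas the paper's frequency-modulated filters with a closed-form impulse response track the fundamental frequency. The function names `complex_bandpass`, `filter_valid`, and `instantaneous_params` are hypothetical.

```python
import cmath
import math


def complex_bandpass(center_hz, bandwidth_hz, fs, num_taps=201):
    """Hann-windowed sinc low-pass, shifted to +center_hz by a complex carrier.

    Assumption: fixed center frequency. The paper's frequency-modulated
    filters, whose pass band follows the f0 track, are not reproduced here.
    """
    half = num_taps // 2
    cutoff = bandwidth_hz / 2 / fs  # normalized low-pass cutoff, cycles/sample
    taps = []
    for n in range(num_taps):
        k = n - half
        # Ideal low-pass impulse response (sinc), using its limit at k = 0.
        h = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        w = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hann window
        # The complex carrier moves the pass band to +center_hz only.
        taps.append(h * w * cmath.exp(2j * math.pi * center_hz / fs * k))
    return taps


def filter_valid(x, taps):
    """Direct-form convolution, keeping only fully overlapped output samples."""
    m = len(taps)
    return [sum(taps[j] * x[n - j] for j in range(m)) for n in range(m - 1, len(x))]


def instantaneous_params(x, center_hz, bandwidth_hz, fs):
    """Instantaneous magnitude, phase (rad), and frequency (Hz) of one partial."""
    y = filter_valid(x, complex_bandpass(center_hz, bandwidth_hz, fs))
    # A real cosine puts half of its amplitude at +f, hence the factor 2.
    mags = [2 * abs(v) for v in y]
    # Phases refer to the filter output, delayed by num_taps // 2 samples.
    phases = [cmath.phase(v) for v in y]
    # Phase increment between neighboring samples, wrapped to (-pi, pi].
    freqs = [cmath.phase(b * a.conjugate()) * fs / (2 * math.pi) for a, b in zip(y, y[1:])]
    return mags, phases, freqs
```

For a 0.8-amplitude, 440 Hz cosine sampled at 8 kHz, `instantaneous_params(x, 440.0, 200.0, 8000)` returns magnitudes close to 0.8 and frequencies close to 440 Hz. Running one such filter per harmonic of a tracked f0 and subtracting the resynthesized partials would give a rough harmonic/noise split; the pass-band width is a free parameter in this sketch.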
P10-2 Perceptual Compression Methods for Metadata in Directional Audio Coding Applied to Audiovisual Teleconference—Toni Hirvonen, Institute of Computer Science (ICS) of the Foundation for Research and Technology - Hellas, Greece; Jukka Ahonen, Ville Pulkki, TKK - Finland
In the teleconferencing application of Directional Audio Coding (DirAC), the transmitted data consist of a monophonic audio signal and directional metadata measured in frequency bands over time. In reproduction, each frequency channel of the signal is reproduced in the corresponding direction with the corresponding diffuseness. This paper examines methods for reducing the data rate of the metadata. The compression methods are based on psychoacoustic studies of the accuracy of directional hearing and are further developed and validated here. For this purpose, informal tests with one-way reproduction were used, as well as usability tests in which an actual teleconference was arranged. The results indicate that the data rate can be as low as approximately 3 kbit/s without a significant loss of reproduced spatial quality.
Convention Paper 7706
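How the metadata rate follows from band count, quantizer resolution, and update rate can be sketched as below. This is a back-of-the-envelope illustration, assuming one azimuth index and one diffuseness index per band per frame and a plain uniform quantizer. The helper names and every numeric parameter are hypothetical, not the paper's configuration; a perceptually motivated coder would replace the uniform step with one matched to the accuracy of directional hearing.

```python
def quantize_uniform(value, lo, hi, bits):
    """Index of the uniform cell (2**bits cells over [lo, hi]) containing value."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    idx = int((value - lo) // step)
    return min(max(idx, 0), levels - 1)  # clamp values on or beyond the range edges


def dequantize_uniform(idx, lo, hi, bits):
    """Reconstruction value: the center of cell idx."""
    step = (hi - lo) / 2 ** bits
    return lo + (idx + 0.5) * step


def metadata_bitrate(num_bands, azimuth_bits, diffuseness_bits, frame_ms):
    """bit/s when every band sends one azimuth and one diffuseness index per frame.

    Hypothetical framing: no entropy coding, no side information.
    """
    frames_per_second = 1000.0 / frame_ms
    return num_bands * (azimuth_bits + diffuseness_bits) * frames_per_second
```

With hypothetical settings of 20 bands, 5 azimuth bits, 3 diffuseness bits, and 20 ms frames, `metadata_bitrate(20, 5, 3, 20.0)` gives 8000 bit/s; doubling the frame length halves that figure. Reaching the roughly 3 kbit/s reported above therefore takes coarser quantization, fewer bands, slower updates, or entropy coding. Which of these the paper uses is not stated in the abstract.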
P10-3 Speaker Detection and Separation with Small Microphone Arrays—Maximo Cobos, Jose J. Lopez, David Martinez, Universidad Politécnica de Valencia - Valencia, Spain
Small microphone arrays are desirable for many practical speech processing applications. This paper describes a system that detects several sound sources in a room and enhances a predominant target source using a pair of closely spaced microphones. The system consists of three main steps: time-frequency processing of the input signals, source localization via model fitting, and time-frequency masking for interference reduction. Experiments with signals recorded in real scenarios are discussed together with their results.
Convention Paper 7707
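The three steps above (time-frequency analysis, localization, masking) can be sketched for a two-microphone pair as follows. This is a minimal sketch under stated assumptions: the input is already in the STFT domain as `(freq_hz, x1, x2)` tuples, each bin is dominated by a single source, the far-field model applies, and the most populated direction is taken as the target. The localization step here is simple histogram peak picking, a stand-in for the paper's model fitting. All names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def bin_angle(freq_hz, x1, x2, mic_distance):
    """DOA (degrees) of one time-frequency bin from the inter-microphone phase.

    Far-field model: mic 2 receives the wave tau = d*sin(theta)/c later than mic 1.
    Valid only below the spatial-aliasing limit freq_hz < c / (2 * d).
    """
    tau = cmath.phase(x1 * x2.conjugate()) / (2 * math.pi * freq_hz)
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_distance))
    return math.degrees(math.asin(s))


def detect_sources(angles_deg, num_sources, bin_deg=5.0):
    """Strongest peaks of a DOA histogram (stand-in for model fitting)."""
    nb = int(180 / bin_deg)
    hist = [0] * nb
    for a in angles_deg:
        hist[min(int((a + 90.0) // bin_deg), nb - 1)] += 1
    peaks = [i for i in range(nb)
             if hist[i] > 0
             and (i == 0 or hist[i] >= hist[i - 1])
             and (i == nb - 1 or hist[i] > hist[i + 1])]
    peaks.sort(key=lambda i: hist[i], reverse=True)
    # Report each peak as the center angle of its histogram cell.
    return [-90.0 + (i + 0.5) * bin_deg for i in peaks[:num_sources]]


def binary_mask(bins, mic_distance, target_deg, source_degs):
    """Keep mic-1 bins whose DOA is closest to the target; zero all others."""
    out = []
    for f, x1, x2 in bins:
        a = bin_angle(f, x1, x2, mic_distance)
        nearest = min(source_degs, key=lambda s: abs(s - a))
        out.append(x1 if nearest == target_deg else 0j)
    return out
```

A binary mask is the simplest choice. Real recordings with reverberation spread the per-bin angles, so soft masks and a proper model fit over the histogram are usually more robust than the hard nearest-peak rule used here.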
P10-4 Directional Audio Coding with Stereo Microphone Input—Jukka Ahonen, Ville Pulkki, TKK - Finland; Fabian Kuech, Giovanni Del Galdo, Markus Kallinger, Richard Schultz-Amling, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
This paper presents the use of a stereo microphone configuration as input to the teleconference application of Directional Audio Coding (DirAC). DirAC is a method for spatial sound processing in which the direction of arrival of sound and the diffuseness are analyzed and used for different purposes in reproduction. So far, omnidirectional microphones arranged in an array have been used to generate the input signals for one- and two-dimensional sound field analysis in DirAC processing. This study investigates whether consumer stereo microphones can be used with DirAC analysis. Different methods for deriving omnidirectional and dipole signals from stereo microphones for directional analysis are presented, and their applicability is discussed.
Convention Paper 7708
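One textbook way to obtain omnidirectional and dipole signals from a stereo pair is to take the sum and difference of two coincident cardioids. The sketch below illustrates that idea; it is not necessarily one of the paper's methods. It assumes ideal, coincident first-order cardioids aimed at ±`half_angle`. In that case the difference is always a lateral (left/right) figure-of-eight scaled by sin(`half_angle`), while the sum is truly omnidirectional only for back-to-back cardioids (`half_angle` = π/2). The function names are hypothetical.

```python
import math


def cardioid(theta, axis):
    """Ideal first-order cardioid gain for a plane wave from azimuth theta (rad)."""
    return 0.5 * (1.0 + math.cos(theta - axis))


def sum_and_difference(theta, half_angle):
    """Sum and difference gains of a coincident cardioid pair aimed at +/-half_angle.

    Analytically: sum = 1 + cos(half_angle) * cos(theta),
                  difference = sin(half_angle) * sin(theta).
    """
    left = cardioid(theta, +half_angle)
    right = cardioid(theta, -half_angle)
    return left + right, left - right


def omni_dipole(left, right, half_angle):
    """Omni estimate (L + R) and lateral dipole ((L - R) / sin(half_angle)).

    The sum is exactly omnidirectional only when half_angle == pi / 2.
    """
    w = [a + b for a, b in zip(left, right)]
    y = [(a - b) / math.sin(half_angle) for a, b in zip(left, right)]
    return w, y


def lateral_sine(w, y):
    """Plane-wave estimate of sin(azimuth): dipole/omni cross-correlation over omni energy."""
    return sum(a * b for a, b in zip(w, y)) / sum(a * a for a in w)
```

For a single plane wave arriving from 30° at a back-to-back pair, `lateral_sine` returns 0.5 = sin 30°. The dipole lies on the left/right axis, so this estimate cannot separate front from back, and spaced (non-coincident) stereo microphones would also need phase and frequency equalization that this sketch omits.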
P10-5 Robust Noise Reduction Based on Stochastic Spatial Features—Mitsunori Mizumachi, Kyushu Institute of Technology - Fukuoka, Japan
This paper proposes a robust noise reduction method relying on stochastic spatial features. Almost all noise reduction methods have both strengths and weaknesses in real-world conditions. Here, the time evolution of the direction of arrival (DOA) and its stochastic reliability serve as cues for selecting a suitable noise reduction approach in time-variant noisy environments; the DOA is an important spatial feature for beamforming-based noise reduction. Single-channel approaches, on the other hand, may be reasonable when DOA estimates are unreliable. Accordingly, either spectral subtraction or beamforming is selected, depending on the DOA estimate and its reliability, to achieve robust noise reduction. The proposed method showed an advantage in noise reduction over a conventional approach.
Convention Paper 7709
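The switching logic described above can be sketched per STFT frame. This is an illustration under stated assumptions, not the paper's algorithm. It uses the sample standard deviation of recent DOA estimates as a simple stand-in for the paper's stochastic reliability measure, with a hypothetical threshold. The beamformer is a two-microphone delay-and-sum under a far-field model, and the spectral subtraction assumes a noise magnitude estimate `noise_mag` is already available, for example from non-speech frames. All function names and parameter values are hypothetical.

```python
import cmath
import math
import statistics


def doa_is_reliable(recent_doas_deg, max_std_deg=5.0):
    """Treat the DOA as reliable when its spread over recent frames is small.

    max_std_deg is a hypothetical threshold, not a value from the paper.
    """
    return len(recent_doas_deg) >= 2 and statistics.stdev(recent_doas_deg) <= max_std_deg


def spectral_subtraction(frame, noise_mag, over_sub=1.0, floor=0.05):
    """Magnitude spectral subtraction on one channel, keeping the noisy phase."""
    out = []
    for x, n in zip(frame, noise_mag):
        mag = max(abs(x) - over_sub * n, floor * abs(x))  # spectral floor limits musical noise
        out.append(cmath.rect(mag, cmath.phase(x)))
    return out


def delay_and_sum(frame1, frame2, freqs_hz, doa_deg, mic_distance, c=343.0):
    """Two-microphone delay-and-sum beamformer steered to doa_deg (far-field model).

    Assumes mic 2 lags mic 1 by tau = d * sin(doa) / c, so mic 2 is advanced by tau.
    """
    tau = mic_distance * math.sin(math.radians(doa_deg)) / c
    return [0.5 * (x1 + x2 * cmath.exp(2j * math.pi * f * tau))
            for x1, x2, f in zip(frame1, frame2, freqs_hz)]


def enhance_frame(frame1, frame2, freqs_hz, recent_doas_deg, noise_mag, mic_distance):
    """Pick beamforming for a stable DOA, otherwise single-channel subtraction."""
    if doa_is_reliable(recent_doas_deg):
        steer = recent_doas_deg[-1]
        return "beamforming", delay_and_sum(frame1, frame2, freqs_hz, steer, mic_distance)
    return "spectral_subtraction", spectral_subtraction(frame1, noise_mag)
```

A hard switch between the two methods can cause audible discontinuities when the decision changes from frame to frame. Smoothing the decision over time, or cross-fading between the two outputs, is a common refinement that this sketch leaves out.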