Events of the AES: AES 113th Convention: SESSION B: SIGNAL PROCESSING, PART 1

Saturday, October 5, 9:00 am – 11:00 am
SESSION B: SIGNAL PROCESSING, PART 1

Chair: Rob Maher, Montana State University, Bozeman, MT, USA

B-1 A Talker-Tracking Microphone Array for Teleconferencing Systems—Kazunori Kobayashi, Ken’ichi Furuya, Akitoshi Kataoka, NTT, Musashino-shi, Tokyo, Japan

We propose a beamforming method for near sound fields in which a filter-and-sum microphone array maintains better quality for the target sound than the conventional delay-and-sum array. We also describe a real-time implementation that steers the beam to detected talker locations. By using a microphone array, our system also reduces noise levels to achieve high-quality sound acquisition, and it allows the talker to be in any position. Computer simulation and experiments show that our method is effective in teleconferencing systems.
Convention Paper 5642
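
For orientation, the conventional delay-and-sum baseline that the abstract compares against can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' filter-and-sum method: the function names, the integer-sample delays, and the speed of sound value are all hypothetical choices made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the paper


def near_field_delays(mic_positions, source, sample_rate):
    """Integer-sample steering delays that align a point source across microphones.

    Near-field assumption: the delay depends on each microphone's actual
    distance to the source point, not just on an arrival angle.
    """
    distances = [math.dist(mic, source) for mic in mic_positions]
    farthest = max(distances)
    # Delay the closer microphones so every channel lines up with the farthest one.
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate) for d in distances]


def delay_and_sum(channels, delays):
    """Average the channels after delaying each one by its steering delay."""
    length = len(channels[0])
    out = [0.0] * length
    for signal, delay in zip(channels, delays):
        for n in range(delay, length):
            out[n] += signal[n - delay]
    return [value / len(channels) for value in out]
```

Calling `near_field_delays` with the microphone coordinates and a candidate talker position, then passing the result to `delay_and_sum`, focuses the array on that point. A filter-and-sum array would replace each pure delay with a per-channel filter.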

B-2 An Alternative Model for Sound Signals Encountered in Reverberant Environments; Robust Maximum Likelihood Localization and Parameter Estimation Based on a Sub-Gaussian Model—Panayiotis G. Georgiou, Chris Kyriakakis, University of Southern California, Los Angeles, CA, USA

In this paper we investigate an alternative to the Gaussian density for modeling signals encountered in audio environments. The observation that sound signals are impulsive in nature, combined with the reverberation effects commonly encountered in audio, motivates the use of the sub-Gaussian density. The new sub-Gaussian statistical model and the separable solution of its maximum likelihood estimator are derived. These are used in an array scenario to demonstrate the achievable performance gains, both in simulations and with two different microphone arrays. The simulations exhibit the robustness of the sub-Gaussian-based method, while the real-world experiments reveal a significant performance gain, supporting the claim that the sub-Gaussian model is better suited for sound signals.
Convention Paper 5643

B-3 Imperceptible Echo for Robust Audio Watermarking—Hyen-O Oh¹, Jong Won Seok², Jin Woo Hong², Dae-Hee Youn¹ - ¹Yonsei University, Seoul, Korea; ²Electronics & Telecommunications Research Institute (ETRI), Daejon, Korea

In echo watermarking, the effort to improve robustness often conflicts with the requirement of imperceptibility. There have been inherent trade-offs in general audio watermarking techniques. In this paper we take up the challenge of developing imperceptible but detectable echo kernels that are embedded directly into the high-quality audio signal. Mathematical and perceptual characteristics of echo kernels are analyzed in the frequency domain. Finally, we obtain a flatter frequency response in perceptually significant bands by combining closely located positive and negative echoes. The proposed echo makes it possible to improve the robustness of an echo watermark without breaking the imperceptibility.
Convention Paper 5644
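
The effect of pairing a positive echo with a nearby negative one can be illustrated numerically. The sketch below is an assumption-laden toy, not the kernel proposed in the paper: the delay, gain, and offset values are hypothetical, and the low-frequency band is used only as a stand-in for "perceptually significant bands". It compares the magnitude ripple of a single-echo kernel with that of a positive/negative pair.

```python
import cmath


def echo_kernel(delay, gain, negative_offset=0):
    """Impulse response: direct path, a positive echo, and optionally a
    negative echo placed `negative_offset` samples after the positive one."""
    kernel = [0.0] * (delay + negative_offset + 1)
    kernel[0] = 1.0
    kernel[delay] += gain
    if negative_offset > 0:
        kernel[delay + negative_offset] -= gain
    return kernel


def magnitude_response(kernel, omega):
    """|H(e^{j omega})| of the kernel, omega in radians per sample."""
    return abs(sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(kernel)))


def max_ripple(kernel, omegas):
    """Largest deviation of the magnitude response from unity over `omegas`."""
    return max(abs(magnitude_response(kernel, w) - 1.0) for w in omegas)
```

For a single echo the ripple reaches the echo gain at every comb peak, whereas for the pair the deviation is bounded by `2 * gain * sin(omega * negative_offset / 2)`, which is small at low frequencies. That is one way to see why closely spaced echoes of opposite sign can flatten the response in a chosen band.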

B-4 New High Data Rate Audio Watermarking Based on SCS (Scalar Costa Scheme)—Frank Siebenhaar¹, Christian Neubauer¹, Robert Bäuml², Jürgen Herre¹ - ¹Fraunhofer Institute for Integrated Circuits, Erlangen, Germany; ²Friedrich-Alexander University, Erlangen, Germany

Presently, distribution of audio material is no longer limited to physical media. Instead, distribution via the Internet is of increasing importance. In order to attach additional information to the audio content, either for forensic or digital rights management purposes or for annotation purposes, watermarking is a promising technique since it is independent of the audio format and transmission technology.
State-of-the-art spread spectrum watermarking systems can offer high robustness against unintentional and intentional signal modifications. However, their data rate is typically comparatively low, often below 100 bit/s. This paper describes the adaptation of a new watermarking scheme called Scalar Costa Scheme (SCS), which is based on dithered quantization of audio signals. In order to fulfill the demands of high quality audio signal processing, modifications of the basic SCS, such as the introduction of a psychoacoustic model and new algorithms to determine quantization intervals, are required. Simulation figures and results of a sample implementation, which show the potential of this new watermarking scheme, are presented in this paper along with a short theoretical introduction to the SCS watermarking scheme.
Convention Paper 5645
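
The dithered-quantization idea behind SCS can be sketched for a single sample and a single bit. This follows the basic binary SCS form as it is commonly presented in the watermarking literature, and it is only an illustration: the psychoacoustic model and the quantization-interval algorithms mentioned in the abstract are omitted, and the parameter names (`delta`, `alpha`, `key`) are assumptions made for the example.

```python
import math


def quantize(value, delta):
    """Uniform scalar quantizer with step size delta."""
    return delta * math.floor(value / delta + 0.5)


def scs_embed(sample, bit, delta, alpha, key):
    """Embed one bit (0 or 1) into one sample.

    The quantizer lattice is shifted by delta * (bit / 2 + key); `key` in
    [0, 1) is a secret dither value. `alpha` in (0, 1] scales the
    quantization error that is added back to the sample.
    """
    offset = delta * (bit / 2 + key)
    shifted = sample - offset
    return sample + alpha * (quantize(shifted, delta) - shifted)


def scs_decode(received, delta, key):
    """Hard-decision decoding: near a lattice point means 0, near a midpoint means 1."""
    shifted = received - key * delta
    distance = abs(quantize(shifted, delta) - shifted)
    return 0 if distance < delta / 4 else 1
```

In this simplified form the embedding distortion per sample is at most `alpha * delta / 2`, and noise-free decoding is correct whenever `alpha` exceeds 0.5. A larger `delta` trades audibility for robustness, which is where a psychoacoustic model would come in.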