AES 116th Convention: PAPERS

v3.1, 20040329, ME

Session N Monday, May 10, 13:30 h–16:30 h
SIGNAL PROCESSING—PART 2
(focus on analysis and reproduction)
Chair: John Vanderkooy, University of Waterloo, Waterloo, Ontario, Canada

N-1 Effects of Jitter on AD/DA Conversion; Specification of Clock Jitter Performance—Bruno Putzeys, Renaud de Saint Moulin, Philips, Digital System Labs, Heverlee, Belgium
The impact of clock jitter on AD/DA conversion performance is analyzed in detail for several conversion methods. The analysis takes into account the spectral distribution of both the jitter and the converted waveform. The inadequacy of a single "picosecond" performance figure is demonstrated, and a specification in dBc/√Hz is proposed instead.
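As a rough illustration of why a single picosecond figure says little on its own, the sketch below uses the textbook approximation for a full-scale sine of frequency f sampled with white Gaussian timing jitter of RMS value σ_t: SNR ≈ −20·log10(2π·f·σ_t). The white-jitter model is an assumption made here for simplicity; it is not the authors' analysis, and it deliberately ignores the jitter spectrum that the paper argues must be specified. The function names are hypothetical, and a short Monte Carlo loop checks the approximation.

```python
import math
import random

def jitter_snr_db(f_hz, sigma_t):
    """Jitter-limited SNR (dB) of a full-scale sine at f_hz, assuming white
    Gaussian sampling-time jitter with RMS value sigma_t (seconds)."""
    return -20.0 * math.log10(2.0 * math.pi * f_hz * sigma_t)

def simulated_snr_db(f_hz, sigma_t, fs=48_000.0, n=20_000, seed=1):
    """Monte Carlo check: sample a unit sine at randomly jittered instants
    and treat the difference from the ideal samples as noise."""
    rng = random.Random(seed)
    sig_pow = err_pow = 0.0
    for k in range(n):
        t = k / fs
        ideal = math.sin(2.0 * math.pi * f_hz * t)
        jittered = math.sin(2.0 * math.pi * f_hz * (t + rng.gauss(0.0, sigma_t)))
        sig_pow += ideal * ideal
        err_pow += (jittered - ideal) ** 2
    return 10.0 * math.log10(sig_pow / err_pow)

# Usage: the same 1 ns RMS jitter applied to two different tones.
snr_20k = jitter_snr_db(20e3, 1e-9)
snr_1k = jitter_snr_db(1e3, 1e-9)
print(f"20 kHz: {snr_20k:.1f} dB, 1 kHz: {snr_1k:.1f} dB")
```

Under this model, 1 ns of jitter limits a 20 kHz tone to about 78 dB but a 1 kHz tone to about 104 dB, so the signal spectrum matters as much as the picosecond number itself.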
N-2 Nonuniform Sampling Theory in Audio Signal Processing—Patrick Wolfe¹, Jamie Howarth²; ¹University of Cambridge, Cambridge, UK; ²Plangent Processes, Nantucket, MA, USA
The goal of most sampling schemes is to sample the analog signal of interest at a regular rate high enough to guarantee, in theory, perfect reconstruction. Indeed, analysis and subsequent signal processing are almost always predicated on this requirement. In practice, however, the assumption of uniformly spaced samples is often violated. Here we describe nonuniform sampling theory, which provides a framework for investigating and analyzing such cases. We review aspects of the theory and describe how it may be applied to practical problems in audio signal processing, including wow and flutter in the analog domain and jitter in the digital domain.
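One simple way to recover a bandlimited signal from irregularly spaced samples is a least-squares fit in a known signal space. The sketch below assumes a 1-periodic signal on [0, 1) containing at most K harmonics and solves the normal equations directly; it is a generic illustration of the idea, not the framework described by the authors, and all names are hypothetical. In a wow-and-flutter setting, `times` would hold the true instants at which each stored sample was actually taken, which this sketch assumes are known.

```python
import math
import random

def basis(t, K):
    """Trigonometric basis of a 1-periodic signal limited to K harmonics."""
    row = [1.0]
    for k in range(1, K + 1):
        row += [math.cos(2 * math.pi * k * t), math.sin(2 * math.pi * k * t)]
    return row

def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def fit_nonuniform(times, values, K):
    """Least-squares coefficients from samples taken at arbitrary instants."""
    rows = [basis(t, K) for t in times]
    n = 2 * K + 1
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Atb = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(n)]
    return solve(AtA, Atb)

def evaluate(coeffs, t, K):
    """Value of the modeled signal at time t."""
    return sum(c * b for c, b in zip(coeffs, basis(t, K)))

# Usage: 48 irregular samples of a random 5-harmonic signal, then
# resampling the fitted model on a regular 64-point grid.
rng = random.Random(0)
K = 5
true_coeffs = [rng.uniform(-1.0, 1.0) for _ in range(2 * K + 1)]
times = sorted(rng.random() for _ in range(48))
values = [evaluate(true_coeffs, t, K) for t in times]
est = fit_nonuniform(times, values, K)
uniform = [evaluate(est, n / 64, K) for n in range(64)]
```

Because the sampled signal lies exactly in the assumed space and there are more samples than unknowns, the fit recovers the coefficients up to rounding error; real recordings would need a model order and regularization chosen for the material.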
N-3 Acoustic Positioning and Head Tracking Based on Binaural Signals—Miikka Tikander, Aki Härmä, Matti Karjalainen, Helsinki University of Technology, Espoo, Finland
Tracking a user's movement and orientation is essential for providing realistic mobile augmented reality audio (MARA) services. For mobile use, the tracking system needs to be lightweight, wearable, and wireless. Binaural microphones offer a convenient and practical solution for tracking user movement and orientation, and they can easily be integrated into portable headphones. Beyond tracking, the microphones also offer several ways to control the user's acoustic environment. This paper reviews the latest results in binaural head tracking with known anchor sources and also discusses the case where no known anchor (reference) sources are available. Some transducer issues are also discussed.
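For a source picked up by both binaural microphones, the interaural time difference can be estimated by cross-correlating the two channels and taking the lag of the peak. The sketch below shows plain time-domain cross-correlation and a far-field conversion to azimuth. It is a generic technique assumed here for illustration, not the tracking method of the paper, and the 0.18 m microphone spacing and 343 m/s speed of sound are assumed values.

```python
import math
import random

def estimate_delay(left, right, max_lag):
    """Lag (in samples) maximizing sum(left[i] * right[i + lag]).
    A positive lag means the right channel lags the left one."""
    n = len(left)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(left[i] * right[i + lag]
                  for i in range(max(0, -lag), min(n, n - lag)))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def lag_to_azimuth_deg(lag, fs, ear_spacing=0.18, c=343.0):
    """Far-field model: lag / fs = (ear_spacing / c) * sin(azimuth).
    ear_spacing and c are assumed values; head shadowing is ignored.
    Positive azimuth means the source is nearer the left microphone."""
    s = c * lag / (fs * ear_spacing)
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))

# Usage: white noise reaching the right microphone 12 samples after the left.
rng = random.Random(42)
fs, d, n = 48_000, 12, 2000
src = [rng.gauss(0.0, 1.0) for _ in range(n + d)]
left = src[d:d + n]
right = src[:n]          # right[i] == left[i - d]
lag = estimate_delay(left, right, max_lag=30)
azimuth = lag_to_azimuth_deg(lag, fs)
```

A search range of 30 samples covers the largest delay the assumed 0.18 m spacing can produce at 48 kHz (about 25 samples); practical systems often prefer frequency-weighted variants of the correlation for robustness in reverberant rooms.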
N-4 Feature Extractors for Music Information Retrieval: Noise Robustness—Adebunmi Paul-Taiwo, Mark Sandler, Mike Davies, Queen Mary University of London, London, UK
The challenge in music information retrieval is to discover a set of features that has minimal dimensionality and is also highly robust to channel and environmental variations. This paper provides an overview of several feature extraction algorithms that have been used for music information retrieval: Mel-frequency cepstral coefficients (MFCC), linear prediction coefficients (LPC), perceptual linear prediction (PLP) coefficients, and delta coefficients. It also emphasizes a biologically inspired feature extractor, the human factor cepstral coefficient, which was originally introduced in speech recognition and whose performance compares favorably with the other modeling algorithms. Finally, the paper reports experiments that compare the effectiveness of these feature extractors in the presence of noise, within a simple but complete music information retrieval system.
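Delta coefficients are commonly computed by regression over neighboring frames, d_t = Σ_{n=1}^{N} n·(c_{t+n} − c_{t−n}) / (2·Σ_{n=1}^{N} n²), with the first and last frames repeated at the edges (the convention used, for example, by the HTK toolkit). The abstract does not say which variant the authors used, so the sketch below assumes this one; it applies to any per-frame feature vectors, such as MFCCs, and the function name is hypothetical.

```python
def delta(frames, N=2):
    """Regression deltas of a list of equal-length feature vectors.
    Frames beyond either end are replaced by the nearest edge frame."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, T - 1)][k]
                prv = frames[max(t - n, 0)][k]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out

# Usage: two features rising linearly with slopes 1 and 2 per frame.
ramp = [[float(t), 2.0 * t] for t in range(10)]
d = delta(ramp)
```

Interior frames recover the slopes exactly ([1.0, 2.0]); the first and last frames are damped to half the slope because of the replicated-edge padding.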
N-5 A System for Multitask Noisy Speech Enhancement—Andrzej Czyzewski¹, Andrzej Kaczmarek¹, Jozef Kotus¹, Arkadiusz Pawlik², Andrzej Rypulak², Pawel Zwan¹; ¹Gdansk University of Technology, Gdansk, Poland; ²Air Force Academy, Deblin, Poland
This paper presents a general description of an engineered system for registering and restoring speech signals, including a concise description of its specific components. The system is a set of advanced software tools for the registration, analysis, and reconstruction of speech. The tools allow rapid search for desired fragments of recordings and improvement of their quality through reduction of noise, distortion, and interference. Brief information is also given on selected speech reconstruction algorithms whose use yielded an especially significant increase in the intelligibility of the processed speech.
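The abstract does not name the noise-reduction algorithms used. As one widely known example of this class of processing, the sketch below shows the per-bin gain rule of power spectral subtraction with a spectral floor. The over-subtraction factor `alpha`, the floor `beta`, and the function names are assumptions for illustration, and obtaining the power spectra (framing, windowing, FFT, overlap-add) is left out.

```python
import math

def noise_estimate(noise_frames):
    """Average power spectrum of frames assumed to contain only noise."""
    n = len(noise_frames)
    return [sum(f[k] for f in noise_frames) / n for k in range(len(noise_frames[0]))]

def subtraction_gains(power, noise_power, alpha=1.0, beta=0.01):
    """Per-bin magnitude gains sqrt(max(1 - alpha * N / P, beta)).
    The floor beta keeps gains from reaching zero, which helps reduce
    the 'musical noise' caused by over-subtraction."""
    gains = []
    for p, nz in zip(power, noise_power):
        ratio = nz / p if p > 0.0 else float("inf")
        gains.append(math.sqrt(max(1.0 - alpha * ratio, beta)))
    return gains

# Usage: bin 0 sits at the noise level, bin 1 is 20 dB above it.
noise = noise_estimate([[1.5, 0.5], [0.5, 1.5]])   # -> [1.0, 1.0]
g = subtraction_gains([1.0, 100.0], noise)
```

The bin at the noise level is attenuated to the floor (gain 0.1, i.e. −20 dB), while the bin well above the noise passes almost unchanged; the gains would then multiply the complex spectrum of each frame before resynthesis.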
N-6 Automatic Extraction of High-Level Music Descriptors from Acoustic Signals Using EDS—Aymeric Zils, Francois Pachet, Sony CSL, Paris, France
High-level music descriptors are key ingredients of music information retrieval systems. Although there is a long tradition of extracting information from acoustic signals, the field of music information extraction is largely heuristic in nature. We present a generic, heuristic-based approach for automatically extracting high-level music descriptors from acoustic signals. The approach is based on genetic programming, which is used to build relevant features as functions of mathematical and signal processing operators. The search for relevant features is guided by specialized heuristics that embody knowledge about the signal processing functions built by the system, and signal processing patterns are used to control the general processing methods. In addition, rewriting rules simplify overly complex expressions, and a caching system further reduces the computing cost of each cycle. Finally, the features built by the system are combined into an optimized machine learning descriptor model, and an executable program is generated to compute the model on any audio signal. In this paper we describe the overall system and compare its results with those of traditional, MPEG-7-style approaches to musical feature extraction.
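The abstract describes features built as compositions of signal processing operators, simplified by rewriting rules and evaluated with a cache. The sketch below illustrates those three ideas on a toy scale: expressions are nested tuples, a small hypothetical rule set collapses redundant operators, and a dictionary keyed by subexpression avoids recomputing shared parts. The operator names and rules are assumptions, not EDS's actual operator library or rule base, and the genetic search itself is not shown.

```python
import math

# Hypothetical operator set; each operator maps a list of floats to a list.
OPS = {
    "Abs": lambda v: [abs(x) for x in v],
    "Diff": lambda v: [b - a for a, b in zip(v, v[1:])],
    "Square": lambda v: [x * x for x in v],
    "Mean": lambda v: [sum(v) / len(v)],
    "Sqrt": lambda v: [math.sqrt(x) for x in v],
}
IDEMPOTENT = {"Abs"}   # Op(Op(x)) == Op(x) for these operators

def simplify(expr):
    """Apply hypothetical rewriting rules bottom-up."""
    if expr == ("Signal",):
        return expr
    op, arg = expr
    arg = simplify(arg)
    if op in IDEMPOTENT and arg[0] == op:
        return arg                  # Abs(Abs(x)) -> Abs(x)
    if op == "Abs" and arg[0] == "Square":
        return arg                  # Abs(Square(x)) -> Square(x)
    return (op, arg)

def evaluate(expr, signal, cache):
    """Evaluate an expression; cache maps subexpressions to results and
    must be used for one signal only."""
    if expr in cache:
        return cache[expr]
    if expr == ("Signal",):
        result = list(signal)
    else:
        op, arg = expr
        result = OPS[op](evaluate(arg, signal, cache))
    cache[expr] = result
    return result

# Usage: RMS of a short signal, written as Sqrt(Mean(Square(Signal))).
rms = ("Sqrt", ("Mean", ("Square", ("Signal",))))
cache = {}
value = evaluate(simplify(rms), [3.0, -4.0], cache)[0]
```

After the call, the cache holds every intermediate result, so any other candidate feature sharing the subexpression Square(Signal) would reuse it rather than recompute it.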