AES 110th Convention: Session G

Session G Sunday, May 13 13:30 - 18:00 hr Room B

Analysis and Synthesis of Sound

Chair: Matti Karjalainen, Helsinki University of Technology, Espoo, Finland

13:30 hr G-1
Signal-Adapted Wavelet for Pitch Detection of Musical Signals
Dong-Yan Huang
Institute of Microelectronics, Singapore, Singapore

This paper presents a novel approach to pitch detection of musical sound signals using the signal-adapted wavelet transform (WT). Because the effectiveness of the wavelet transform for a particular application depends on the choice of the wavelet function, a wavelet function derived from the input power spectral density (PSD) is designed to concentrate the signal energy in the low-frequency region. Based on the corresponding wavelet transform, a time-based event detection method is proposed to extract pitch-period information from the wavelet coefficients. Because the wavelet is signal-adapted and reflects the characteristics of the signal, the pitch detector is suitable for sound signals with fundamental frequencies ranging from 50 to 4000 Hz. Simulation results and experiments on real music demonstrate that the main features of this method are better accuracy than other pitch-period estimation methods and robustness to noise.
Paper 5326

14:00 hr G-2
Means of Integrating Audio Content Analysis Algorithms
Anssi Klapuri
Tampere University of Technology, Tampere, Finland

Two generic mechanisms are proposed that facilitate the efficient integration of audio content analysis algorithms. The first mechanism, priority-rule based interleaving of algorithms, allows the simultaneous interoperation of several bottom-up analysis modules by interleaving their atomic steps. It aims at increased accuracy through controlled manipulation of common data. The second mechanism, top-down routing of requests for data, allows high-level predictions to direct the bottom-up analysis towards verifying the predicted hypotheses by observations. Examples from automatic music transcription are presented to clarify the use of the proposed methods.
Paper 5327

14:30 hr G-3
An Audio-Driven, Spectral Analysis-Based, Perceptual Synthesis Engine
Tristan Jehan & Bernd Schoner
MIT Media Laboratory, Cambridge, MA, USA

A real-time synthesis engine is presented which models and predicts the "timbre" of different acoustic instruments based on perceptual features. The paper describes the modeling sequence including the analysis of natural sounds, the inference step that finds the mapping between control and output parameters, the timbre prediction step, and the sound synthesis. Demonstrations include the timbre synthesis of stringed instruments and the singing voice, as well as the cross-synthesis and timbre morphing between these instruments.
Paper 5328

15:00 hr G-4
A Speech-Based Frequency Scale
Tim Brookes
University of Surrey, Guildford, UK

There are several different audio frequency scales in common use, each having its own particular merits. The speech-based frequency scale derived here, from vowel formant frequency difference limens, has a markedly different shape from the others and attaches more relative weight to the range of frequencies associated with vowel perception, making it potentially well suited to speech analysis applications.
Paper 5329
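
For context, three of the frequency scales in common use can be computed as in the sketch below. The speech-based scale of the paper is not reproduced here, since the abstract does not give its formula. The function names are illustrative, and the formulas are the widely cited closed-form approximations for the mel scale, the Bark scale (Traunmüller's approximation), and the ERB-rate scale (Glasberg and Moore).

```python
import math

def hz_to_mel(f_hz):
    """Mel scale, common closed-form approximation."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def hz_to_bark(f_hz):
    """Bark scale, Traunmueller's approximation."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def hz_to_erb_rate(f_hz):
    """ERB-rate scale, Glasberg and Moore approximation."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

# Print the three warped values for a few frequencies in the vowel formant range.
for f in (250.0, 500.0, 1000.0, 2000.0, 4000.0):
    print(f, round(hz_to_mel(f), 1), round(hz_to_bark(f), 2), round(hz_to_erb_rate(f), 2))
```

Comparing the slopes of these curves over a given band shows how much relative weight each scale assigns to it, which is the property the abstract says differs for the speech-based scale.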

15:30 hr G-5
A Novel Portamento Embedded Model For Analysis and Synthesis of Musical Sound
Alvin Su (1) & Rei-Wen Wang (2)
(1) National Cheng-Kung University, Tainan, Taiwan
(2) Chung-Hwa University, Hsinchu, Taiwan

A novel music analysis/synthesis method is proposed. The basic structure consists of a delay line, a feedback filter, and a short wavetable as the excitation signal. Because most musical tones are quasi-periodic, the feedback filter predicts the next input data to the delay line based on the signal in the delay buffer. The filter coefficients are obtained in the analysis process performed by using source signals as the teaching vector and a recurrent neural network learning procedure. Because the basic architecture is identical to Digital Waveguide Filters (DWF), most efficient supplemental processing and implementation techniques for DWF can be applied. Instead of a fixed-length delay line, a variable-length delay line and a control method are embedded when a wide range portamento is required. The proposed method is currently applied to synthesize plucked-string instruments.
Paper 5330
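
The basic structure named in the abstract (a delay line, a feedback filter, and a short excitation wavetable) can be sketched as follows. This is only an illustration under stated assumptions: the feedback filter here is a fixed two-tap average with a loss gain, as in a Karplus-Strong style plucked string, whereas the paper obtains its filter coefficients through a recurrent neural network learning procedure. The variable-length delay line for portamento is not modeled. The names `pluck`, `coeffs`, and `gain` are hypothetical.

```python
import random
from collections import deque

def pluck(delay_len, n_samples, excitation, coeffs=(0.5, 0.5), gain=0.996):
    """Run a short excitation through a delay line closed by an FIR feedback filter.

    Assumption: fixed averaging coefficients stand in for learned ones.
    """
    # Delay line, initially silent; index 0 always holds the oldest sample.
    buf = deque([0.0] * delay_len, maxlen=delay_len)
    out = []
    for n in range(n_samples):
        # Feedback filter: predict the next delay-line input from the oldest samples.
        fb = gain * sum(c * buf[k] for k, c in enumerate(coeffs))
        # The excitation wavetable is injected only during the first samples.
        x = excitation[n] if n < len(excitation) else 0.0
        y = x + fb
        out.append(y)
        # Appending to a full deque discards the oldest sample automatically.
        buf.append(y)
    return out

random.seed(0)
DELAY = 100  # delay length in samples; sets the approximate pitch period
excitation = [random.uniform(-1.0, 1.0) for _ in range(DELAY)]
tone = pluck(DELAY, 2000, excitation)
```

With a sampling rate of 44100 Hz, a delay of 100 samples would give a tone near 441 Hz; the averaging filter adds about half a sample of delay, so the actual pitch is slightly lower.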

16:00 hr G-6
Restoration and Enhancement of Instrumental Recordings Based on Sound Source Modeling
Paulo Esquef, Matti Karjalainen & Vesa Välimäki
Helsinki University of Technology, Espoo, Finland

This paper presents new proposals for audio restoration and enhancement based on Sound Source Modeling. We describe a case based on the commuted waveguide synthesis algorithm for plucked string tones. The main motivation is to take advantage of prior information from generative models of sound sources when restoring or enhancing musical signals.
Paper 5331

16:30 hr G-7
Synthesising Prosody with Variable Resolution
Eduardo Miranda
Sony CSL, Paris, France

This paper presents a technique for synthesizing prosody based upon information extracted from spoken utterances. We are interested in designing systems that learn how to speak autonomously, by interacting with humans. Our motivation for an in-depth investigation on prosody is prompted by the fact that infants seem to have acute prosodic listening during the first months of life. We presume that any system aimed at learning some form of speaking skills should display this fundamental capacity. This paper addresses two fundamental components for the development of such systems: prosody listening and prosody production. It begins with a brief introduction to the problem within the context of our research objectives. Then it introduces the system and presents some commented examples. The paper concludes with final remarks and a brief discussion on future developments.
Paper 5332

17:00 hr G-8
Accurate Sinusoidal Model Analysis and Parameter Reduction by Fusion of Components
Tuomas Virtanen
Tampere University of Technology, Tampere, Finland

A method is described by which two stable sinusoids can be represented as a single sinusoid with time-varying parameters and, under some conditions, approximated by a stable sinusoid. The method is used in an iterative sinusoidal analysis algorithm, which combines the components obtained in different iteration steps using the described method. The proposed algorithm improves the quality of the analysis at the expense of an increased number of components.
Paper 5333
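
The representation of two stable sinusoids as one sinusoid with time-varying parameters follows from a standard trigonometric identity, worked out below. The symbols (amplitudes A_1, A_2, angular frequencies \omega_1, \omega_2, phases \varphi_1, \varphi_2) are notation assumed here for illustration and are not taken from the paper.

```latex
\begin{align*}
\theta_i(t) &= \omega_i t + \varphi_i, \qquad i = 1, 2 \\
A_1 \cos\theta_1(t) + A_2 \cos\theta_2(t) &= A(t)\,\cos\theta(t) \\
A(t) &= \sqrt{A_1^2 + A_2^2 + 2 A_1 A_2 \cos\bigl(\theta_1(t) - \theta_2(t)\bigr)} \\
\theta(t) &= \operatorname{atan2}\bigl(A_1 \sin\theta_1(t) + A_2 \sin\theta_2(t),\;
                                       A_1 \cos\theta_1(t) + A_2 \cos\theta_2(t)\bigr)
\end{align*}
```

The amplitude A(t) varies at the difference frequency |\omega_1 - \omega_2|. One plausible reading of the "some conditions" in the abstract, offered here as an assumption, is that when the two frequencies are close and the analysis frame is short compared with the beat period, A(t) and the instantaneous frequency change little within the frame, so a single stable sinusoid is a reasonable approximation.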

17:30 hr G-9
Automatic Recognition of Musical Instrument Sounds - Further Developments
Bozena Kostek & Andrzej Czyzewski
Technical University of Gdansk, Gdansk, Poland

Although the retrieval of musical data from the Internet and multimedia databases has been discussed for some time, it has not yet reached the stage of practical application. Many problems in the automatic recognition of music and musical instrument sounds still cannot be solved easily. It is especially important to find adequate parameters of the musical signal based on time, frequency, and/or wavelet analyses. The proposed feature vectors were derived from constructed databases of recorded musical sounds. The study presents methods for the automatic identification of musical instruments based on both classical statistical and soft computing approaches, which were then used to classify musical instruments. The results of these investigations are presented and analyzed, leading to some specific and some more general conclusions.
Paper 5334
