Monday, October 13, 9:00 to 12:00 noon
Session N Analysis and Synthesis of Sound
N-1 Object-Based 3-D Audio Scene Representation
Dae-young Jang, Jeongil Seo, Kyeongok Kang, ETRI, Daejeon, Korea; Hoe-Kyung Jung, Paichai University, Daejeon, Korea
In this paper we introduce a new object-based 3-D audio scene representation scheme. Typically, four kinds of 3-D sound source objects are defined: point sound source, line sound source, plane sound source, and volume sound source. These are used for representation of an object-based 3-D audio scene. Each 3-D sound source object includes a sound source and related 3-D scene information. Users can interact with 3-D sound source objects and control them by modifying 3-D scene information. We implemented a prototype of an object-based 3-D audio player and produced several content items for demonstration. This object-based 3-D audio representation scheme can be used in a very wide range of applications, such as interactive games, home shopping, and broadcasting realistic ambient sounds.
N-2 A Flexible Resynthesis Approach for Quasi-Harmonic Sounds
Harvey Thornburg, Randal Leistikow, Stanford University, Stanford, CA, USA
We propose a flexible state-space resynthesis approach that extends the sinusoidal model into the domain of source-filter modeling. Our approach is further specialized to a class of quasi-harmonic sounds, representing a variety of acoustic sources in which multiple, closely spaced modes cluster about principal harmonics loosely following a harmonic structure. We detail a variety of sound modification possibilities: time and pitch scaling modifications, cross-synthesis, and other, potentially novel possibilities.
N-3 Objective Prediction of Sound Synthesis Quality
Brahim Hamadicharef, Emmanuel Ifeachor, University of Plymouth, Plymouth, Devon, UK
This paper is concerned with objective prediction of perceived audio quality for an intelligent audio system for modeling musical instruments. The study is part of a project to develop an automated tool for sound design. Objective prediction of subjective audio quality ratings by audio experts is an important part of the system. Sound quality is assessed using PEAQ (perceptual evaluation of audio quality), and this greatly reduces the time-consuming efforts involved in listening tests. Tests carried out using a large database of pipe organ sounds show that the method can be used to quantify the quality of synthesized sounds. This approach provides a basis for development of a new index for benchmarking sound synthesis techniques.
N-4 Automatic Classification of Large Musical Instrument Databases Using Hierarchical Classifiers with Inertia Ratio Maximization
Geoffroy Peeters, IRCAM, Paris, France
This paper addresses the problem of classifying large databases of musical instrument sounds. An efficient algorithm is proposed for selecting the most appropriate signal features for a given classification task. This algorithm, called IRMFSP, is based on the maximization of the ratio of the between-class inertia to the total inertia combined with a step-wise feature space orthogonalization. Several classifiers (flat Gaussian, flat KNN, hierarchical Gaussian, hierarchical KNN, and decision tree classifiers) are compared for the task of large database classification. Especially considered is the case in which our classification system is trained on a given database and used for the classification of another database, possibly recorded in completely different conditions. The highest recognition rates are obtained when the hierarchical Gaussian and KNN classifiers are used. Organization of the instrument classes is studied through an MDS analysis derived from the acoustic features of the sounds.
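As a rough illustration of the selection idea summarized in this abstract, the following Python sketch greedily picks the feature with the largest between-class to total inertia ratio and then removes its contribution from the remaining features with a Gram-Schmidt style projection. It is not taken from the paper: the function names, the dictionary input format, the mean-centering step, and the numerical tolerance are all assumptions made for the example.

```python
def inertia_ratio(values, labels):
    """Ratio of between-class inertia to total inertia for one feature."""
    n = len(values)
    mean = sum(values) / n
    total = sum((v - mean) ** 2 for v in values) / n
    if total < 1e-12:  # assumed tolerance: treat near-constant features as useless
        return 0.0
    by_class = {}
    for v, c in zip(values, labels):
        by_class.setdefault(c, []).append(v)
    between = sum(
        (len(vs) / n) * (sum(vs) / len(vs) - mean) ** 2
        for vs in by_class.values()
    )
    return between / total


def select_features(columns, labels, count):
    """Greedy selection with step-wise orthogonalization (illustrative sketch).

    columns: dict mapping a hypothetical feature name to its list of values.
    """
    # Center every feature so that the projection below removes correlation.
    cols = {}
    for name, vals in columns.items():
        m = sum(vals) / len(vals)
        cols[name] = [v - m for v in vals]

    chosen = []
    while cols and len(chosen) < count:
        best = max(cols, key=lambda name: inertia_ratio(cols[name], labels))
        chosen.append(best)
        ref = cols.pop(best)
        norm_sq = sum(v * v for v in ref)
        if norm_sq == 0.0:
            continue
        # Remove the component of each remaining feature along the chosen one.
        for name, vals in cols.items():
            proj = sum(a * b for a, b in zip(vals, ref)) / norm_sq
            cols[name] = [a - proj * b for a, b in zip(vals, ref)]
    return chosen
```

With this sketch, a feature that is only a scaled copy of an already selected one collapses to zero after the projection, so it is not picked again ahead of a feature that carries new class information.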
N-5 Virtual Analog Synthesis with a Time-Varying Comb Filter
David Lowenfels, Stanford University, Stanford, CA, USA
The bandlimited digital synthesis model of Stilson and Smith is extended with a single feed-forward comb filter. Time-varying comb filter techniques are shown to produce a variety of classic analog waveform effects, including waveform morphing, pulse-width modulation, harmonization, and frequency modulation. Unlike previous techniques for hard-sync, the computational load of this method does not increase with frequency. The techniques discussed are not guaranteed to maintain perfect bandlimiting; however, they are generally applicable to other synthesis models.
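For readers unfamiliar with the building block named here, a feed-forward comb filter adds a scaled, delayed copy of a signal to itself, y[n] = x[n] + g x[n - d[n]]. The Python sketch below is a generic illustration of that structure with a per-sample delay, not the paper's implementation; the function name, the linear interpolation used for fractional delays, and the zero padding before the start of the signal are assumptions.

```python
import math


def feedforward_comb(x, delays, gain):
    """Feed-forward comb filter with a time-varying delay (illustrative sketch).

    x:      list of input samples
    delays: per-sample delay in samples (non-negative, may be fractional)
    gain:   scale factor g applied to the delayed copy
    """
    y = []
    for n, (sample, d) in enumerate(zip(x, delays)):
        pos = n - d                 # read position in the input
        i = math.floor(pos)
        frac = pos - i
        # Samples outside the signal are treated as zero (assumed).
        a = x[i] if 0 <= i < len(x) else 0.0
        b = x[i + 1] if 0 <= i + 1 < len(x) else 0.0
        delayed = (1.0 - frac) * a + frac * b   # linear interpolation
        y.append(sample + gain * delayed)
    return y
```

As a usage note, feeding a sawtooth through this filter with a gain of -1 subtracts a delayed sawtooth from the original, which yields a pulse-like wave whose width follows the delay; sweeping the delay over time then gives a simple form of pulse-width modulation. The sketch makes no attempt at bandlimiting.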
N-6 Using MPEG-7 Audio Fingerprinting in Real-World Applications
Oliver Hellmuth, Eric Allamanche, Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany; Markus Cremer, Holger Grossmann, Fraunhofer Institute for Integrated Circuits IIS, AEMT, Ilmenau, Germany; Jürgen Herre, Thorsten Kastner, Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany
Finalized in 2001, the MPEG-7 audio standard provides a universal toolbox for the content-based description of audio material. While the descriptive elements defined in this standard may be used for many purposes, audio fingerprinting (i.e., automatic identification of audio content) was already among the initial set of target applications that were conceived during the design of the standard. This paper reviews the basics of MPEG-7-based audio fingerprinting and explains how the technology has been used in a number of real-world applications, including metadata search engines, database maintenance, broadcast monitoring, and audio identification on embedded systems. The appropriate selection of fingerprinting parameters is discussed, along with performance figures.