AES Show: Make the Right Connections Audio Engineering Society

AES San Francisco 2008
Paper Session P2

P2 - Analysis and Synthesis of Sound

Thursday, October 2, 9:00 am — 12:30 pm
Chair: Hiroko Terasawa, Stanford University - Stanford, CA, USA

P2-1 Spatialized Additive Synthesis of Environmental Sounds
Charles Verron, Orange Labs - Lannion, France, and Laboratoire de Mécanique et d’Acoustique, Marseille, France; Mitsuko Aramaki, Institut de Neurosciences Cognitives de la Méditerranée - Marseille, France; Richard Kronland-Martinet, Laboratoire de Mécanique et d’Acoustique - Marseille, France; Grégory Pallone, Orange Labs - Lannion, France
In virtual auditory environments, sound sources are typically created in two stages: the “dry” monophonic signal is synthesized, and then the spatial attributes (such as source directivity, width, and position) are applied by specific signal processing algorithms. In this paper we present an architecture that combines additive sound synthesis and 3-D positional audio at the same level of sound generation. Our algorithm is based on inverse fast Fourier transform synthesis and amplitude-based sound positioning. It allows sinusoids and colored noise to be synthesized and spatialized efficiently, in order to simulate point-like and extended sound sources. The audio rendering can be adapted to any reproduction system (headphones, stereo, 5.1, etc.). Possibilities offered by the algorithm are illustrated with environmental sounds.
Convention Paper 7509
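
As a rough illustration of positioning each partial at synthesis time, the sketch below sums sinusoids directly into a stereo pair using constant-power amplitude panning. It is an assumption-laden simplification: it works in the time domain rather than with the inverse FFT synthesis the abstract describes, and the function names, the stereo-only layout, and the example partials are all hypothetical.

```python
import math

def pan_gains(position):
    """Constant-power stereo gains for a position in [-1 (left), +1 (right)]."""
    angle = (position + 1.0) * math.pi / 4.0  # map position to [0, pi/2]
    return math.cos(angle), math.sin(angle)

def synthesize(partials, duration=0.01, sample_rate=44100):
    """Sum panned sinusoids; partials is a list of (freq_hz, amplitude, position)."""
    n = round(duration * sample_rate)
    left = [0.0] * n
    right = [0.0] * n
    for freq, amp, pos in partials:
        g_left, g_right = pan_gains(pos)
        for i in range(n):
            s = amp * math.sin(2.0 * math.pi * freq * i / sample_rate)
            # Each partial is positioned as it is generated, not afterwards.
            left[i] += g_left * s
            right[i] += g_right * s
    return left, right

# Hypothetical partials: one hard left, one centered, one hard right.
left, right = synthesize([(440.0, 0.5, -1.0), (660.0, 0.3, 0.0), (880.0, 0.2, 1.0)])
```

Spreading the partials of one source over several positions, as in the example call, is one simple way to suggest an extended rather than point-like source.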

P2-2 Harmonic Sinusoidal + Noise Modeling of Audio Based on Multiple F0 Estimation
Maciej Bartkowiak, Tomasz Zernicki, Poznan University of Technology - Poznan, Poland
This paper deals with the detection and tracking of multiple harmonic series. We consider a bootstrap approach based on prior estimation of F0 candidates and subsequent iterative adjustment of a harmonic sieve with simultaneous refinement of the F0 and inharmonicity factor. Experiments show that this simple approach is an interesting alternative to popular strategies in which partials are detected without harmonic constraints and harmonic series are resolved from mixed sets afterwards. The most important advantage is that the tonal/noise energy confusion common with unconstrained peak detection is avoided. Moreover, we employ a popular LP-based tracking method, generalized to deal with harmonically related groups of partials by using a vector inner product as the prediction error measure. Two alternative extensions of the harmonic model that result in greater naturalness of the reconstructed audio are also proposed: an individual frequency deviation component and a complex narrowband individual amplitude envelope.
Convention Paper 7510
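
The harmonic sieve idea can be sketched as a scoring function: for each F0 candidate, count how many predicted harmonic positions have a detected spectral peak nearby. This is only a minimal sketch of the scoring step, not the authors' iterative refinement. The stiff-string formula for inharmonic partials, the tolerance, the function names, and the peak list are assumptions made for illustration.

```python
import math

def harmonic_frequency(f0, k, inharmonicity=0.0):
    """Frequency of the k-th partial, assuming a stiff-string inharmonicity model."""
    return k * f0 * math.sqrt(1.0 + inharmonicity * k * k)

def sieve_score(peaks, f0, inharmonicity=0.0, n_harmonics=8, tolerance=0.03):
    """Count harmonics that have a spectral peak within a relative tolerance."""
    hits = 0
    for k in range(1, n_harmonics + 1):
        target = harmonic_frequency(f0, k, inharmonicity)
        if any(abs(p - target) <= tolerance * target for p in peaks):
            hits += 1
    return hits

def best_candidate(peaks, candidates, **kwargs):
    """Pick the F0 candidate whose sieve catches the most peaks."""
    return max(candidates, key=lambda f0: sieve_score(peaks, f0, **kwargs))

# Hypothetical peak frequencies (Hz) from one analysis frame.
peaks = [220.0, 440.5, 659.0, 881.0, 1101.0, 1322.0]
f0 = best_candidate(peaks, [200.0, 220.0, 330.0])
```

A refinement loop in the spirit of the abstract would then adjust the chosen F0 and the inharmonicity value to tighten the match between sieve slots and peaks.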

P2-3 Sound Extraction of Delackered Records
Ottar Johnsen, Frédéric Bapst, Ecole d'ingenieurs et d'architectes de Fribourg - Fribourg, Switzerland; Lionel Seydoux, Connectis AG - Berne, Switzerland
Most direct-cut records consist of an aluminum or glass plate coated with an acetate lacquer. Such records are often cracked because the coating shrinks, and they can then no longer be read mechanically. We present a technique that reconstructs the sound from such a record by scanning an image of the record and combining the sound from the different parts of the "puzzle." The system has been tested by extracting sounds from sound archives in Switzerland and Austria. The concepts will be presented, as well as the main challenges. Extracted sound samples will be played.
Convention Paper 7511

P2-4 Parametric Interpolation of Gaps in Audio Signals
Alexey Lukin, Moscow State University - Moscow, Russia; Jeremy Todd, iZotope, Inc. - Cambridge, MA, USA
The problem of interpolation of gaps in audio signals is important for the restoration of degraded recordings. Following the parametric approach over a sinusoidal model recently suggested in JAES by Lagrange et al., this paper extends that interpolation algorithm to cover the noise component of a “sinusoidal + noise” signal model. Additionally, a new interpolator for sinusoidal components is presented and evaluated. The resulting algorithm is suitable for a wider range of audio recordings than interpolation of the sinusoidal component alone.
Convention Paper 7512

P2-5 Classification of Musical Genres Using Audio Waveform Descriptors in MPEG-7
Nermin Osmanovic, Microsoft Corporation - Seattle, WA, USA
Automated genre classification makes it possible to determine the musical genre of an incoming audio waveform. One application of this is to help listeners find music they like more quickly among millions of tracks in an online music store. By using numerical thresholds and the MPEG-7 descriptors, a computer can analyze the audio stream for occurrences of specific sound events such as a kick drum, snare hit, or guitar strum. The knowledge about sound events provides a basis for the implementation of a digital music genre classifier. The classifier takes a new audio file as input, extracts salient features, and decides on the musical genre according to a decision rule. The final classification results show a recognition rate in the range of 75% to 94% for five genres of music.
Convention Paper 7513

P2-6 Loudness Descriptors to Characterize Programs and Music Tracks
Esben Skovenborg, TC Group Research - Risskov, Denmark; Thomas Lund, TC Electronic - Risskov, Denmark
We present a set of key numbers to summarize loudness properties of an audio segment, broadcast program, or music track: the loudness descriptors. The computation of these descriptors is based on a measurement of loudness level, such as that specified by ITU-R BS.1770. Two fundamental loudness descriptors are introduced: Center of Gravity and Consistency. These two descriptors were computed for a collection of audio segments from various sources, media, and formats. This evaluation demonstrates that the descriptors can robustly characterize essential properties of the segments. We propose three different applications of the descriptors: diagnosing potential loudness problems in ingest material; performing a quality check after processing or editing; and use in a delivery specification.
Convention Paper 7514

P2-7 Methods for Identification of Tuning System in Audio Musical Signals
Peyman Heydarian, Lewis Jones, Allan Seago, London Metropolitan University - London, UK
The tuning system is an important aspect of a piece. It specifies the scale intervals and is an indicator of the emotions of a musical file. For modal musical traditions, there is a direct relationship between the musical mode and the tuning of a piece. The tuning system therefore carries valuable information that is worth incorporating into the metadata of a file. In this paper different algorithms for automatic identification of the tuning system are presented and compared. In the training process, spectral and chroma averages and pitch histograms are used to construct reference patterns for each class. The same features are computed for the test samples, and a similarity measure such as the Manhattan distance assigns each piece to a tuning class.
Convention Paper 7515
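
The train-then-match procedure described in this abstract can be sketched as nearest-reference classification under the Manhattan (L1) distance. This is a minimal sketch under stated assumptions: the four-bin histograms, the class names `tuning_a` and `tuning_b`, and all function names are hypothetical, and real features would be pitch or chroma histograms extracted from audio.

```python
def normalize(hist):
    """Scale a histogram so its bins sum to 1 (unchanged if it is all zeros)."""
    total = sum(hist)
    return [h / total for h in hist] if total else list(hist)

def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def average_pattern(histograms):
    """Training step: average the normalized histograms of one class."""
    norm = [normalize(h) for h in histograms]
    return [sum(col) / len(norm) for col in zip(*norm)]

def classify(hist, references):
    """Testing step: return the class whose reference pattern is closest."""
    probe = normalize(hist)
    return min(references, key=lambda name: manhattan(probe, references[name]))

# Hypothetical training histograms for two made-up tuning classes.
training = {
    "tuning_a": [[8, 0, 4, 0], [6, 1, 5, 0]],
    "tuning_b": [[2, 6, 0, 4], [1, 7, 1, 3]],
}
references = {name: average_pattern(h) for name, h in training.items()}
label = classify([7, 1, 4, 0], references)
```

Normalizing before comparison keeps the distance from depending on the length of the recording, which would otherwise inflate the raw bin counts.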

P2-8 “Roughometer”: Realtime Roughness Calculation and Profiling
Julian Villegas, Michael Cohen, University of Aizu - Aizu-Wakamatsu, Fukushima-ken, Japan
A software tool capable of determining auditory roughness in real time is presented. This application, based on Pure Data (Pd), calculates the roughness of audio streams using a spectral method originally proposed by Vassilakis. The processing speed is adequate for many real-time applications, and results indicate limited but significant agreement with an Internet application of the chosen model. Finally, the use of this tool is illustrated by computing a roughness profile of a musical composition, which can be compared with its perceived patterns of “tension” and “relaxation.”
Convention Paper 7516
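
A spectral roughness estimate of this kind can be sketched by summing a pairwise roughness term over all pairs of spectral peaks. The abstract gives no formula, so the expression and constants below are an assumption: they follow the form commonly quoted for the Vassilakis model and should be checked against the original publication before any serious use. The function names and the example partials are hypothetical, and the sketch is plain Python rather than a Pd patch.

```python
import math
from itertools import combinations

def pair_roughness(f1, a1, f2, a2):
    """Roughness contribution of two sinusoids (assumed Vassilakis-style form)."""
    a_min, a_max = min(a1, a2), max(a1, a2)
    if a_max <= 0.0:
        return 0.0  # silent pair contributes nothing
    f_min, f_max = min(f1, f2), max(f1, f2)
    x = a_min * a_max                         # overall level term
    y = 2.0 * a_min / (a_min + a_max)         # amplitude-fluctuation degree
    s = 0.24 / (0.0207 * f_min + 18.96)       # register-dependent scaling
    df = f_max - f_min
    z = math.exp(-3.5 * s * df) - math.exp(-5.75 * s * df)
    return (x ** 0.1) * 0.5 * (y ** 3.11) * z

def total_roughness(partials):
    """Sum pairwise roughness over all pairs of (freq_hz, amplitude) partials."""
    return sum(
        pair_roughness(f1, a1, f2, a2)
        for (f1, a1), (f2, a2) in combinations(partials, 2)
    )

# Hypothetical spectral peaks from one analysis frame.
frame_roughness = total_roughness([(440.0, 1.0), (470.0, 1.0), (880.0, 0.5)])
```

Evaluating `total_roughness` frame by frame over a recording would yield a roughness profile over time of the sort the abstract mentions.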