AES London 2011
Paper Session P10

P10 - Audio Content Management

Saturday, May 14, 11:00 — 13:00 (Room 1)

Jamie A. S. Angus

P10-1 A Comprehensive and Modular Framework for Audio Content Extraction, Aimed at Research, Pedagogy, and Digital Library Management
Olivier Lartillot, University of Jyväskylä - Jyväskylä, Finland
We present a framework for audio analysis and the extraction of low-level features, mid-level structures, and high-level concepts, all studied together as a fully interwoven complex system. Composite operations are constructed via an intuitive programming language on top of Matlab. Datasets of any size can be processed thanks to implicit memory management mechanisms. The data structure enables a tight articulation between signal and symbolic layers in a unified framework. The resulting technology can be used as a pedagogical tool for the understanding of audio, speech, and musical processes and concepts, and for content-based discovery in digital libraries. Other applications include intelligent browsing and structuring of digital libraries, information retrieval, and the design of content-based audio interfaces.
P10-2 Selected Playback Problems of Historical Grooved Media
Nadja Wallaszkovits, Franz Lechleitner, Phonogrammarchiv Austrian Academy of Sciences - Vienna, Austria; Heinrich Pichler, Audio Consultant - Vienna, Austria
The paper discusses selected problems in the replay and high-quality archival transfer of historical grooved media, such as cylinders, instantaneous discs, and early coarse-groove records. The topics cover noise reduction and the compensation of the horizontal tracking angle by means of stereo playback and modification of the sum and difference signals. Existing noise reduction methods are compared with analog and digital phase and group delay compensation methods, and the comparison is discussed. Finally, possible compensation methods are outlined for the change in noise spectrum caused by the decrease in groove velocity at the inner diameters of early discs. The authors propose a radius equalization in the digital domain, using a digital high-pass filter without group delay distortion whose cut-off frequency changes with diameter.
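As a rough illustration of two quantities the abstract refers to, the sketch below forms sum and difference signals from a stereo transfer and computes the linear groove velocity at a given radius. The function names and the 0.5 scaling are assumptions made for this example; the abstract does not specify how the authors modify the signals or how the filter cut-off follows the diameter, so neither is modeled here.

```python
import math


def sum_difference(left, right):
    """Return (sum, difference) signals from two stereo channels.

    The 0.5 scaling is an assumed convention that keeps levels comparable
    to the input channels; it is not taken from the paper.
    """
    total = [(l + r) / 2.0 for l, r in zip(left, right)]
    diff = [(l - r) / 2.0 for l, r in zip(left, right)]
    return total, diff


def groove_velocity(radius_m, rpm):
    """Linear groove velocity in m/s at radius_m metres and rpm revolutions per minute."""
    return 2.0 * math.pi * radius_m * rpm / 60.0
```

At constant rotation speed, groove_velocity falls in proportion to the radius, which is why the noise spectrum shifts toward the inner grooves of a disc.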
P10-3 Automatic Recognition of Events in Audio Data Using Supercomputer Cluster
Kuba Lopatka, Andrzej Czyzewski, Henryk Krawczyk, Gdansk University of Technology - Gdansk, Poland
The paper describes the automatic recognition of dangerous events by audio analysis, using parallel processing on a supercomputer cluster. Sound files recorded by microphones operating in a security surveillance system are processed by a sound event detection and classification algorithm. Because of the large amount of data, parallel computation is employed to speed up the analysis. Each sound file recorded by the surveillance system is divided into chunks, which are processed by separate threads or processes. Several strategies for such parallel computation are introduced and discussed. Results obtained in tests using a supercomputer cluster are presented.
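The chunk-and-dispatch idea can be sketched in a few lines. This is a minimal illustration on a single machine using a thread pool, not the authors' cluster implementation; the RMS threshold stands in for their detection and classification algorithm, and all names and parameters are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(samples, chunk_size):
    """Divide a list of samples into consecutive chunks of at most chunk_size."""
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


def chunk_rms(chunk):
    """Root-mean-square level of one chunk (placeholder for a real detector)."""
    return math.sqrt(sum(x * x for x in chunk) / len(chunk))


def detect_loud_chunks(samples, chunk_size, threshold, workers=4):
    """Return the indices of chunks whose RMS level exceeds threshold.

    Chunks are analysed concurrently; pool.map preserves chunk order.
    """
    chunks = split_into_chunks(samples, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        levels = list(pool.map(chunk_rms, chunks))
    return [i for i, level in enumerate(levels) if level > threshold]
```

One design consideration this sketch ignores is that an event straddling a chunk boundary may be missed; overlapping the chunks is one possible remedy.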
P10-4 Using Support Vector Machines for Automatic Mood Tracking In Audio Music
Renato Panda, Rui Pedro Paiva, University of Coimbra - Coimbra, Portugal
In this paper we propose a solution for automatic mood tracking in audio music, based on supervised learning and classification. To this end, various music clips with a duration of 25 seconds, previously annotated with arousal and valence (AV) values, were used to train several models. These models were used to predict the quadrants of Thayer's taxonomy and the AV values of small segments from full songs, revealing the mood changes over time. The system accuracy was measured by calculating the matching ratio between predicted results and full song annotations performed by volunteers. Different combinations of audio features, frameworks, and other parameters were tested, resulting in an accuracy of 56.3% and showing there is still much room for improvement.
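The evaluation step can be illustrated with a short sketch that maps AV values to a quadrant and computes a matching ratio over segments. It assumes AV values centred on zero, a counter-clockwise quadrant numbering, and ties at zero counted as positive; these conventions and the function names are assumptions for this example, and the classifier itself is not modeled.

```python
def thayer_quadrant(arousal, valence):
    """Map an (arousal, valence) pair to a quadrant number 1-4.

    Assumed numbering: 1 = high arousal, positive valence;
    2 = high arousal, negative valence; 3 = low arousal, negative valence;
    4 = low arousal, positive valence. Zero counts as positive.
    """
    if arousal >= 0:
        return 1 if valence >= 0 else 2
    return 4 if valence >= 0 else 3


def matching_ratio(predicted, annotated):
    """Fraction of segments whose predicted label equals the annotated one."""
    if len(predicted) != len(annotated):
        raise ValueError("label sequences must have the same length")
    if not predicted:
        return 0.0
    matches = sum(p == a for p, a in zip(predicted, annotated))
    return matches / len(predicted)
```

For example, predicted quadrants [1, 2, 3, 4] against annotations [1, 2, 4, 4] agree on three of four segments.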
