AES London 2010
Monday, May 24, 16:30 — 18:00
Poster Session P20
P20 - Audio Content Management—Audio Information Retrieval
P20-1 Complexity Scalable Perceptual Tempo Estimation From HE-AAC Encoded Music—Danilo Hollosi, Ilmenau University of Technology - Ilmenau, Germany; Arijit Biswas, Dolby Germany GmbH - Nürnberg, Germany
A modulation frequency-based method for perceptual tempo estimation from HE-AAC encoded music is proposed. The method is designed to work in three domains: the fully decoded PCM domain; the intermediate HE-AAC transform domain after partial decoding; and directly in the HE-AAC compressed domain, using the Spectral Band Replication (SBR) payload. This offers complexity-scalable solutions. We demonstrate that the SBR payload is an ideal proxy for tempo estimation directly from HE-AAC bit-streams without even decoding them. A perceptual tempo correction stage based on rhythmic features is proposed to correct for octave errors in every domain. Experimental results show that the proposed method significantly outperforms two commercially available systems, both in terms of accuracy and computational speed.
Convention Paper 8109
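As a rough illustration of the modulation-frequency idea behind P20-1, the sketch below picks the tempo whose modulation frequency carries the most energy in a loudness envelope. It is a minimal PCM-domain sketch, not the authors' method: the function name `estimate_tempo_bpm`, the 40 to 240 BPM search range, and the synthetic envelope are all assumptions made for illustration, and neither the HE-AAC transform-domain and SBR-payload variants nor the perceptual octave correction stage is modeled.

```python
import cmath
import math


def estimate_tempo_bpm(envelope, frame_rate_hz, bpm_min=40, bpm_max=240):
    """Return the candidate tempo (BPM) with the strongest modulation energy."""
    mean = sum(envelope) / len(envelope)
    centered = [x - mean for x in envelope]  # remove DC so it cannot dominate

    best_bpm, best_mag = None, -1.0
    for bpm in range(bpm_min, bpm_max + 1):
        freq_hz = bpm / 60.0
        # Single-bin DFT of the envelope at the candidate modulation frequency.
        acc = sum(
            x * cmath.exp(-2j * math.pi * freq_hz * n / frame_rate_hz)
            for n, x in enumerate(centered)
        )
        if abs(acc) > best_mag:
            best_bpm, best_mag = bpm, abs(acc)
    return best_bpm


# Hypothetical test input: a loudness envelope pulsing twice per second
# (120 BPM), sampled at 100 envelope frames per second for 10 seconds.
frame_rate_hz = 100
envelope = [
    1.0 + math.cos(2 * math.pi * 2.0 * n / frame_rate_hz) for n in range(1000)
]
print(estimate_tempo_bpm(envelope, frame_rate_hz))
```

The synthetic envelope is deliberately smooth. A sharper, impulse-like envelope would put comparable energy at 240 BPM, which is the kind of octave ambiguity the abstract's correction stage is meant to resolve.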
P20-2 On the Effect of Reverberation on Musical Instrument Automatic Recognition—Mathieu Barthet, Mark Sandler, Queen Mary University of London - London, UK
This paper investigates the effect of reverberation on the accuracy of a musical instrument recognition model based on Line Spectral Frequencies and K-means clustering. One hundred eighty experiments were conducted by varying the type of music database (isolated notes, solo performances), the stage at which the reverberation is added (learning and/or testing), and the type of reverberation (3 different reverberation times, 10 different dry-wet levels). The performance of the model systematically decreased when reverberation was added at the testing stage (by up to 40%). Conversely, when reverberation was added at the training stage, a 3% increase in performance was observed for the solo performances database. The results suggest that pre-processing the signals with a dereverberation algorithm before classification may be a means to improve musical instrument recognition systems.
Convention Paper 8110
P20-3 Harmonic Components Extraction in Recorded Piano Tones—Carmine Emanuele Cella, Università di Bologna - Bologna, Italy
When analyzing recorded piano tones, it is sometimes desirable to remove from the original signal the noisy components generated by the hammer strike and by other elements of the piano action. In this paper we propose an efficient method to achieve this, based on adaptive filtering and automatic estimation of fundamental frequency and inharmonicity; applied to a recorded piano tone, the final method produces two separate signals containing, respectively, the hammer knock and the harmonic components. Sound examples for evaluation are available on the web, as specified in the paper.
Convention Paper 8111
P20-4 Browsing Sound and Music Libraries by Similarity—Stéphane Dupont, Université de Mons - Mons, Belgium; Christian Frisson, Université Catholique de Louvain - Louvain-la-Neuve, Belgium; Xavier Siebert, Damien Tardieu, Université de Mons - Mons, Belgium
This paper presents a prototype tool for browsing through multimedia libraries using content-based multimedia information retrieval techniques. It is composed of several groups of components for multimedia analysis, data mining, and interactive visualization, as well as for connection with external hardware controllers. The musical application of this tool uses descriptors of timbre, harmony, and rhythm, and offers two different approaches for exploring and browsing content. In the first mode, dynamic data mining allows the user to group sounds into clusters according to those criteria, whose importance can be weighted interactively. In the second mode, sounds that are similar to a query are returned to the user and can be used to continue the search. This approach also borrows concepts from multi-criteria optimization to return a relevant list of similar sounds.
Convention Paper 8112
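The query-by-similarity mode described in P20-4, with interactively weighted criteria, can be sketched roughly as follows. This is only an assumed stand-in: it ranks sounds by a weighted Euclidean distance over timbre, harmony, and rhythm descriptor groups, which is simpler than the multi-criteria optimization the abstract mentions. The function names, the descriptor values, and the tiny `library` are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance across descriptor groups."""
    total = 0.0
    for group, weight in weights.items():
        squared = sum((x - y) ** 2 for x, y in zip(a[group], b[group]))
        total += weight * squared
    return math.sqrt(total)


def query_similar(library, query_name, weights, k=2):
    """Return the names of the k sounds closest to the query sound."""
    query = library[query_name]
    scored = [
        (weighted_distance(query, descriptors, weights), name)
        for name, descriptors in library.items()
        if name != query_name
    ]
    return [name for _, name in sorted(scored)[:k]]


# Hypothetical descriptor values, invented for this example only.
library = {
    "kick": {"timbre": [0.1, 0.2], "harmony": [0.0], "rhythm": [0.9]},
    "snare": {"timbre": [0.6, 0.7], "harmony": [0.1], "rhythm": [0.8]},
    "pad": {"timbre": [0.2, 0.2], "harmony": [0.9], "rhythm": [0.1]},
}

timbre_only = {"timbre": 1.0, "harmony": 0.0, "rhythm": 0.0}
rhythm_only = {"timbre": 0.0, "harmony": 0.0, "rhythm": 1.0}
print(query_similar(library, "kick", timbre_only))
print(query_similar(library, "kick", rhythm_only))
```

Changing the weights reorders the results: with only timbre weighted, "pad" ranks closest to "kick", whereas with only rhythm weighted, "snare" does.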
P20-5 On the Development and Use of Sound Maps for Environmental Monitoring—Maria Rangoussi, Stelios M. Potirakis, Ioannis Paraskevas, Technological Education Institute of Piraeus - Aigaleo-Athens, Greece; Nicolas-Alexander Tatlas, University of Patras - Patras, Greece
This paper addresses the development, update, and use of sound maps for monitoring areas of environmental interest. Sound maps constitute a valuable tool for environmental monitoring. They rely on networks of microphones distributed over the area of interest to record and process signals, extract and characterize sound events, and finally form the map; time constraints are imposed by the need for timely information representation. A stepwise methodology is proposed and a series of practical considerations are discussed, with the aim of obtaining a multi-layer sound map that is periodically updated and visualizes the sound content of a “scene.” Alternative time-frequency-based features are investigated as to their efficiency within the framework of a hierarchical classification structure.
Convention Paper 8113
P20-6 The Effects of Reverberation on Onset Detection Tasks—Thomas Wilmering, György Fazekas, Mark Sandler, Queen Mary University of London - London, UK
The task of onset detection is relevant in various contexts such as music information retrieval and music production, while reverberation has always been an important part of the production process. The effect may be the product of the recording space or it may be artificially added and, in our context, it is destructive. In this paper we evaluate the effect of reverberation on onset detection tasks. We compare state-of-the-art techniques and show that the algorithms have varying degrees of robustness in the presence of reverberation, depending on the content of the analyzed audio material.
Convention Paper 8114
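To make the destructive effect described in P20-6 concrete, the sketch below runs a very simple energy-rise onset detector on a dry signal and on the same signal after a feedback comb filter, used here as a crude stand-in for reverberation. It is not one of the state-of-the-art techniques the paper compares. The detector, the comb filter, the threshold, and the synthetic two-burst signal are all assumptions chosen for illustration.

```python
def frame_energies(signal, frame_len):
    """Energy of consecutive non-overlapping frames."""
    return [
        sum(x * x for x in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def detect_onsets(signal, frame_len, threshold):
    """Report frame indices where energy rises by more than the threshold."""
    energies = frame_energies(signal, frame_len)
    return [
        i for i in range(1, len(energies))
        if energies[i] - energies[i - 1] > threshold
    ]


def add_reverb(signal, delay, feedback):
    """Feedback comb filter: each sample gains a decaying train of echoes."""
    out = list(signal)
    for n in range(delay, len(out)):
        out[n] += feedback * out[n - delay]
    return out


# Hypothetical input: a loud burst followed shortly by a quiet one.
signal = [0.0] * 400
for n in range(100, 110):
    signal[n] = 1.0
for n in range(150, 160):
    signal[n] = 0.3

dry_onsets = detect_onsets(signal, frame_len=20, threshold=0.5)
wet_onsets = detect_onsets(
    add_reverb(signal, delay=20, feedback=0.8), frame_len=20, threshold=0.5
)
print(dry_onsets, wet_onsets)
```

On the dry signal the detector reports both bursts (frames 5 and 7). After the comb filter, the decaying tail of the first burst masks the energy rise of the quiet second burst, so only frame 5 remains.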
P20-7 Segmentation and Discovery of Podcast Content—Steven Hargreaves, Chris Landone, Mark Sandler, Panos Kudumakis, Queen Mary University of London - London, UK
With ever-increasing amounts of radio broadcast material being made available as podcasts, sophisticated methods that enable listeners to quickly locate material matching their personal tastes become essential. Given the ability to segment a podcast, which may be on the order of one or two hours in duration, into individual song previews, the time the listener spends searching for material of interest is minimized. This paper investigates the effectiveness of applying multiple feature extraction techniques to podcast segmentation and describes how such techniques could be exploited by a vast number of digital media delivery platforms in a commercial cloud-based radio recommendation and summarization service.
Convention Paper 8115