AES Show: Make the Right Connections Audio Engineering Society

AES San Francisco 2008
Poster Session P7

Friday, October 3, 9:00 am — 10:30 am

P7 - Audio Content Management

P7-1 A Piano Sound Database for Testing Automatic Transcription Methods
Luis Ortiz-Berenguer, Elena Blanco-Martin, Alberto Alvarez-Fernandez, Jose A. Blas-Moncalvillo, Francisco J. Casajus-Quiros, Universidad Politecnica de Madrid - Madrid, Spain
A piano sound database, called PianoUPM, is presented. It is intended to help the research community develop and test transcription methods. A practical database needs to contain notes and chords played across the full piano range, and it needs to be recorded from acoustic pianos rather than synthesized ones. The database includes recordings of thirteen pianos from different manufacturers, both upright and grand. The recordings cover the eighty-eight notes and eight different chords, played in both legato and staccato styles. They also include some notes from every octave played with four different forces, to allow analysis of the nonlinear behavior. This work has been supported by the Spanish National Project TEC2006-13067-C03-01/TCM.
Convention Paper 7538

P7-2 Measurements of Spaciousness for Stereophonic Music
Andy Sarroff, Juan P. Bello, New York University - New York, NY, USA
The spaciousness of pre-recorded stereophonic music, or how large and immersive its virtual space is perceived to be, is an important feature of a produced recording. Quantitative models of spaciousness are proposed as functions of (1) the width of a recording's source panning and (2) its overall amount of reverberation. The models are independently evaluated in two controlled experiments. In one, the panning widths of a distribution of sources with varying degrees of panning are estimated; in the other, the extent of reverberation is estimated for controlled mixtures of sources with varying degrees of reverberation. The models are shown to be valid in a controlled experimental framework.
Convention Paper 7539

P7-3 Music Annotation and Retrieval System Using Anti-Models
Zhi-Sheng Chen, Jia-Min Zen, Jyh-Shing Roger Jang, National Tsing Hua University - Taiwan
Query-by-semantic-description (QBSD) is a natural way to search and annotate music in a large database. We propose such a system, based on the concept of supervised multi-class labeling (SML), that considers anti-words for each annotation word. Words that are highly correlated with the anti-semantic meaning of a word constitute its anti-word set. By modeling both a word and its anti-word set, our system achieves gains of +8.21% and +1.6% in average precision and recall, respectively, over SML when both use the same average number of annotation words (10). By incorporating anti-models, we also allow queries with anti-semantic words, which is not an option in previous systems.
Convention Paper 7540

P7-4 The Effects of Lossy Audio Encoding on Onset Detection Tasks
Kurt Jacobson, Matthew Davies, Mark Sandler, Queen Mary University of London - London, UK
In large audio collections, it is common to store audio content with perceptual encoding. However, encoding parameters such as bit rate, sample rate, and codec may vary from collection to collection or even within a collection. We evaluate the effect of various audio encodings on the onset detection task and show that audio-based onset detection methods are surprisingly robust in the presence of MP3-encoded audio. Statistically significant changes in onset detection accuracy occur only at bit rates lower than 32 kbps.
Convention Paper 7541

P7-5 An Evaluation of Pre-Processing Algorithms for Rhythmic Pattern Analysis
Matthias Gruhne, Christian Dittmar, Daniel Gaertner, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany; Gerald Schuller, Ilmenau Technical University - Ilmenau, Germany
For the semantic analysis of polyphonic music, such as genre recognition, rhythmic pattern features (also called Beat Histogram) can be used. Feature extraction is based on the correlation of rhythmic information from drum instruments in the audio signal. In addition to drum instruments, the music signal under analysis usually also contains the sounds of pitched instruments, which can have a significant influence on the correlation patterns. This paper describes the influence of pitched instruments on the extraction of rhythmic features and evaluates two different pre-processing methods. The first method computes a sinusoidal and noise model and uses its residual signal for feature extraction. The second method performs a drum transcription based on spectral characteristics of drum sounds and derives the rhythm pattern feature directly from the occurrences of the drum events. Finally, the results are explained and compared in detail.
Convention Paper 7542
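
The abstract above says only that the rhythmic pattern feature is based on correlation of rhythmic information. As a rough illustration, and not the authors' implementation, one common way to build a beat histogram is to autocorrelate an onset-strength envelope and read periodicities from the peaks. The function name `beat_histogram`, the frame-based envelope, and the toy input are all assumptions made for this sketch.

```python
def beat_histogram(onset_envelope, max_lag):
    """Normalized autocorrelation of an onset envelope for lags 1..max_lag.

    Assumption: the envelope is a list of per-frame onset strengths,
    e.g. derived from drum events after some pre-processing step.
    """
    n = len(onset_envelope)
    mean = sum(onset_envelope) / n
    centered = [v - mean for v in onset_envelope]
    energy = sum(v * v for v in centered)
    histogram = []
    for lag in range(1, max_lag + 1):
        acc = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        histogram.append(acc / energy if energy > 0 else 0.0)
    return histogram


# Hypothetical envelope: one drum hit every 4 frames over 32 frames.
envelope = [1.0 if i % 4 == 0 else 0.0 for i in range(32)]
hist = beat_histogram(envelope, max_lag=8)

# hist[k] holds lag k + 1, so shift the index of the maximum by one.
strongest_lag = 1 + max(range(len(hist)), key=lambda k: hist[k])
print(strongest_lag)
```

In this toy case the strongest lag equals the hit spacing. Energy from pitched instruments in the envelope would add spurious correlation peaks, which is the effect the two pre-processing methods are meant to reduce.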

P7-6 A Framework for Producing Rich Musical Metadata in Creative Music Production
Gyorgy Fazekas, Yves Raimond, Mark Sandler, Queen Mary University of London - London, UK
Musical metadata may include references to individuals, equipment, procedures, parameters, or audio features extracted from signals. There are countless possibilities for using this data during the production process. An intelligent audio editor, besides internally relying on it, can be both producer and consumer of information about specific aspects of music production. In this paper we propose a framework for producing and managing meta information about a recording session, a single take, or a subsection of a take. As the basis for the necessary knowledge representation we use the Music Ontology with domain-specific extensions. We provide examples of how metadata can be used creatively, and demonstrate the implementation of an extended metadata editor in a multitrack audio editor application.
Convention Paper 7543

P7-7 SoundTorch: Quick Browsing in Large Audio Collections
Sebastian Heise, Michael Hlatky, Jörn Loviscach, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany
Musicians, sound engineers, and foley artists face the challenge of finding appropriate sounds in vast collections containing thousands of audio files. Imprecise naming and tagging force users to review dozens of files in order to pick the right sound. Acoustic matching is not necessarily helpful here, as it needs a sound exemplar to match with and may miss relevant files. Hence, we propose to combine acoustic content analysis with accelerated auditioning: Audio files are automatically arranged in 2-D by psychoacoustic similarity. A user can shine a virtual flashlight onto this representation; all sounds in the light cone are played back simultaneously, their position indicated through surround sound. User tests show that this method can leverage the human brain's capability to single out sounds from a spatial mixture and enhance browsing in large collections of audio content.
Convention Paper 7544
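
The flashlight idea can be pictured as a geometric selection over the 2-D layout. The sketch below is an assumption-laden illustration rather than the SoundTorch implementation: it treats the light cone as a circular sector and returns, for each selected sound, its angle relative to the cone axis, which a surround panner could use. The function `sounds_in_cone`, its parameters, and the sample layout are hypothetical.

```python
import math


def sounds_in_cone(sounds, apex, direction_deg, half_angle_deg, reach):
    """Return (name, offset_deg) for sounds inside a sector-shaped cone.

    offset_deg is the angle between the sound and the cone axis,
    positive counter-clockwise, wrapped into [-180, 180).
    """
    selected = []
    for name, (x, y) in sounds.items():
        dx, dy = x - apex[0], y - apex[1]
        distance = math.hypot(dx, dy)
        # Skip sounds beyond the reach; a sound exactly at the apex has
        # no defined bearing, so this sketch skips it as well.
        if distance == 0 or distance > reach:
            continue
        bearing = math.degrees(math.atan2(dy, dx))
        offset = (bearing - direction_deg + 180.0) % 360.0 - 180.0
        if abs(offset) <= half_angle_deg:
            selected.append((name, offset))
    return selected


# Hypothetical 2-D layout, as if arranged by similarity.
layout = {
    "door_slam": (1.0, 0.0),
    "glass_break": (1.0, 1.0),
    "rain": (-1.0, 0.0),
    "thunder": (5.0, 0.0),
}
hits = sounds_in_cone(layout, apex=(0.0, 0.0), direction_deg=0.0,
                      half_angle_deg=30.0, reach=2.0)
print(hits)
```

Here only `door_slam` falls inside the cone: `glass_break` lies 45 degrees off the axis, `rain` is behind the apex, and `thunder` is out of reach.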

P7-8 File System Tricks for Audio Production
Michael Hlatky, Sebastian Heise, Jörn Loviscach, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany
Not every file presented by a computer operating system needs to be an actual stream of independent bits. We demonstrate that different types of virtual files and folders including so-called "Filesystems in Userspace" (FUSE) allow streamlining audio content management with relatively little additional complexity. For instance, an off-the-shelf database system may present a distributed sound library through (seemingly) standard files in a project-specific hierarchy with no physical copying of the data involved. Regions of audio files may be represented as separate files; audio effect plug-ins may be displayed as collections of folders for on-demand processing while files are read. We address differences between operating systems, available implementations, and lessons learned when applying such techniques.
Convention Paper 7545
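
To make the notion of a region represented as a separate file more concrete, the sketch below shows the core idea in user code only: a read-only, file-like view over a byte range of an existing file, with no data copied to disk. It does not use FUSE and is not the authors' implementation. A real virtual file system would expose such a view through the operating system, and a usable audio region would also need a valid file header. The class name `RegionView` and the sample bytes are assumptions.

```python
import io
import os
import tempfile


class RegionView(io.RawIOBase):
    """Read-only view of bytes [start, start + length) of a file."""

    def __init__(self, path, start, length):
        super().__init__()
        self._file = open(path, "rb")
        self._start = start
        self._length = length
        self._pos = 0  # position relative to the start of the region

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos,
                io.SEEK_END: self._length}.get(whence)
        if base is None:
            raise ValueError("invalid whence")
        if base + offset < 0:
            raise ValueError("negative seek position")
        self._pos = base + offset
        return self._pos

    def readinto(self, buffer):
        # Never read past the end of the region, whatever the file holds.
        count = min(len(buffer), max(0, self._length - self._pos))
        if count == 0:
            return 0
        self._file.seek(self._start + self._pos)
        data = self._file.read(count)
        buffer[:len(data)] = data
        self._pos += len(data)
        return len(data)

    def close(self):
        if not self.closed:
            self._file.close()
        super().close()


# Hypothetical container file; the region skips 6 leading bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADERaudio-region-bytesTRAILER")
    path = tmp.name

view = RegionView(path, start=6, length=18)
region = view.read()
view.seek(6)
tail = view.read(6)
view.close()
os.remove(path)
print(region, tail)
```

Because the view only translates offsets, several overlapping regions can share one underlying file, in line with the abstract's point that no physical copying is involved.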