Authors: Lafay, Grégoire; Misdariis, Nicolas; Lagrange, Mathieu; Rossignol, Mathias
Affiliation: IRCCyN, Ecole Centrale de Nantes, France; STMS Ircam-CNRS-UPMC, Paris, France
With the growing capability of recording and storage devices, the problem of indexing large audio databases has been the object of much attention. Most of this effort is dedicated to automatic inference from indexed metadata. In contrast, browsing audio databases in an effective manner has received less consideration. This report studies the relevance of a semantic organization of sounds to ease the browsing of a sound database. For such a task, semantic access to data is traditionally implemented by a keyword selection process. However, various limitations of written language, such as word polysemy, ambiguity, or translation issues, may bias the browsing process. Two sound presentation strategies organized sounds spatially to reflect an underlying semantic hierarchy. For the sake of comparison, the authors also considered a display whose spatial organization was based only on acoustic cues. These three displays were evaluated in terms of search speed in a crowdsourcing experiment using two different corpora: environmental sounds from urban environments and sounds produced by musical instruments. Coherent results demonstrate the usefulness of an implicit semantic organization for representing sounds in terms of both search speed and learning efficiency.
Download: PDF (HIGH Res) (2.0MB)
Download: PDF (LOW Res) (309KB)
Authors: Zacharakis, Asterios; Pastiadis, Konstantinos
Affiliation: Aristotle University of Thessaloniki, Department of Music Studies, Thessaloniki, Greece
This study describes a listening experiment designed to further examine the previously proposed luminance-texture-mass (LTM) model for timbral semantics. Thirty-two musically trained listeners rated twenty-four instrument tones on six predefined semantic scales: brilliance, depth, roundness, warmth, fullness, and richness. These six scales were analyzed with Principal Component Analysis (PCA) and Multidimensional Scaling (MDS) to produce two different timbre spaces. These timbre spaces were subsequently compared for their configurational and dimensional similarity with the LTM semantic space and the direct MDS perceptual space obtained from the same stimuli. The results showed that the selected semantic scales adequately represent the LTM model and are fair predictors of the sound configurations that result from pairwise dissimilarity ratings.
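As an aside, the PCA step described above can be sketched in a few lines. This is not the authors' code; the ratings matrix is randomly generated here as a stand-in for the 24 tones rated on 6 semantic scales.

```python
# Minimal sketch: deriving a 2-D "timbre space" from semantic ratings via PCA.
# The ratings matrix is hypothetical (random), shaped (n_tones, n_scales).
import numpy as np

rng = np.random.default_rng(0)
ratings = rng.random((24, 6))  # 24 tones rated on 6 semantic scales

# Centre each scale, then project onto the top two principal axes,
# which the SVD of the centered data matrix yields directly.
centered = ratings - ratings.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
timbre_space = centered @ Vt[:2].T  # 2-D configuration of the 24 tones

print(timbre_space.shape)
```

Each row of `timbre_space` places one tone in a plane whose axes are the directions of greatest rating variance; comparing such a configuration with an MDS solution is what procedures like Procrustes analysis are for.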
Download: PDF (HIGH Res) (1.9MB)
Download: PDF (LOW Res) (247KB)
Authors: Fan, Jianyu; Thorogood, Miles; Pasquier, Philippe
Affiliation: Simon Fraser University, SIAT, Canada
Soundscape studies have demonstrated a variety of approaches for investigating how soundscape affect contributes to immersive experiences. This research aims to develop an automatic affect recognition system that soundscape composers can use to create emotional compositions that evoke a target mood. In addition, this system can offer sound designers a more streamlined workflow for creating suitable sound effects for films and can offer engineers a way to design mood-enabled recommendation systems for retrieval of soundscape recordings. This research uses ground truth data collected from an online survey, and an analysis of the corpus shows that participants have a high level of agreement on the valence and arousal of soundscapes. The authors then generated a gold standard by averaging user responses. The proposed system obtained better results than an expert-user model.
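The gold-standard step mentioned above amounts to averaging each recording's per-participant ratings. A minimal sketch follows; the recording names, rating scales, and values are invented for illustration.

```python
# Illustrative sketch only: forming a "gold standard" by averaging
# per-recording valence and arousal ratings from multiple participants.
from statistics import mean

# ratings[recording_id] -> list of (valence, arousal) pairs, one per participant
ratings = {
    "park_birds": [(0.8, 0.3), (0.7, 0.2), (0.9, 0.4)],
    "traffic_jam": [(-0.5, 0.7), (-0.6, 0.8), (-0.4, 0.6)],
}

gold_standard = {
    rec: (mean(v for v, _ in rs), mean(a for _, a in rs))
    for rec, rs in ratings.items()
}

for rec, (valence, arousal) in gold_standard.items():
    print(f"{rec}: valence={valence:.2f}, arousal={arousal:.2f}")
```

The abstract's reported high inter-rater agreement is what justifies collapsing the response distribution to its mean; with low agreement, a single averaged point would misrepresent the ratings.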
Download: PDF (HIGH Res) (834KB)
Download: PDF (LOW Res) (158KB)
Affiliation: Interdisciplinary Centre for Computer Music Research, Plymouth University, Devon, UK
Real-time control of the emotional content of sound has utility in video game soundtracking, where the player controls the narrative trajectory and the affective attributes of the sound should ideally match this trajectory. Perceived emotions can be represented in a 2-dimensional space composed of valence (positivity, e.g., happy, sad, fearful) and arousal (intensity, e.g., mild vs. strong). This report is a speculative exploration of measuring and manipulating sound effects to achieve emotional congruence. An initial study suggests that timbral features can influence the perceived emotional response of a listener. A panel of listeners responded to a set of stimuli with varying timbres but constant pitch, loudness, and other musical and acoustic features (key, melodic contour, rhythm, meter, reverberant environment, etc.). The long-term goal is to create an automated system that uses timbre morphing in real time to manipulate perceived affect in soundtrack generation.
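To make the valence-arousal representation above concrete, here is a small sketch (not from the report) that places emotion labels at invented coordinates in the 2-D space and finds the label nearest a target point, such as one set by a game's narrative state.

```python
# Hedged illustration: emotion labels as points in valence-arousal space.
# All coordinates are made up for the example.
import math

emotion_space = {
    "happy":   (0.8, 0.6),    # positive valence, moderately high arousal
    "sad":     (-0.7, -0.4),  # negative valence, low arousal
    "fearful": (-0.6, 0.8),   # negative valence, high arousal
    "calm":    (0.5, -0.6),   # positive valence, low arousal
}

def nearest_emotion(valence: float, arousal: float) -> str:
    """Return the label whose coordinates are closest to the target point."""
    return min(emotion_space,
               key=lambda e: math.dist(emotion_space[e], (valence, arousal)))

print(nearest_emotion(-0.5, 0.7))  # fearful
```

A real-time system would move the target point continuously and crossfade or morph timbres as the nearest region changes, rather than snapping between discrete labels.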
Download: PDF (HIGH Res) (2.4MB)
Download: PDF (LOW Res) (311KB)
Authors: Bohak, Ciril; Marolt, Matija
Affiliation: University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
Automatic music transcription transforms an acoustic music signal into a symbolic notation, which typically involves detecting multiple concurrent pitches, detecting note onsets and offsets, and recognizing the instruments. This paper presents a novel method for transcribing folk music. In contrast to most commercial music recordings, folk music recordings may contain various inaccuracies because they are usually performed by amateur musicians and recorded in the field. The proposed method fuses three sources of information: frame-based multiple F0 estimates, song structure, and pitch drift estimates. Using song structure can improve transcription accuracy. The method uses two strategies: exploiting repetitions aligned in the time and pitch domains to improve F0 estimates, and incorporating a probabilistic model based on explicit duration hidden Markov models (EDHMM) to estimate notes from F0. A representative segment of the analyzed song is used to align the other segments. Information from these segments is summarized and used in a two-layer probabilistic EDHMM to segment frame-based information into notes.
Download: PDF (HIGH Res) (1.3MB)
Download: PDF (LOW Res) (839KB)
Authors: Barthet, Mathieu; Fazekas, György; Allik, Alo; Thalmann, Florian; Sandler, Mark B.
Affiliation: Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK
Audio listeners are increasingly shifting to a participatory culture in which technology allows them to modify and control the listening experience. This report describes the development of a mood-driven music player, Moodplay, which incorporates semantic computing technologies for musical mood based on social tags, along with informative and aesthetic browsing visualizations. The prototype runs with a dataset of over 10,000 songs covering various genres, arousal, and valence levels. Changes in the design of the system were made in response to user evaluations from over 120 participants in 15 different sectors of work or education. The proposed client/server architecture integrates modular components powered by semantic web technologies and audio content feature extraction. This enables recorded music content to be controlled in flexible and nonlinear ways. Dynamic music objects can be used to create on-the-fly mashups of two or more simultaneous songs, allowing the selection of multiple moods. The authors also consider nonlinear audio techniques that could transform the player into a creative tool, for instance, by temporally reorganizing, compressing, or expanding prerecorded content.
Download: PDF (HIGH Res) (750KB)
Download: PDF (LOW Res) (484KB)
Authors: Seetharaman, Prem; Pardo, Bryan
Affiliation: Northwestern University, Evanston, IL, USA
While professional audio production tools such as reverberators and equalizers are widely available to musicians who are not expert audio engineers, their interfaces can be frustrating. Professional interfaces are parameterized in terms of low-level signal manipulations, which are not intuitive to nonexperts. This report describes Audealize, an interface that bridges the gap between the low-level parameters of existing audio production tools and descriptive goals, such as "make my guitar sound underwater." Users modify the audio by selecting descriptive terms in a word map built from a crowdsourced vocabulary of word labels for audio effects. A study with 432 nonexperts found that they favored the crowdsourced word map over traditional interfaces. Absolute performance measures showed that those who used the word-map interface produced results equal to or better than those of traditional interfaces. Participants also preferred the word-map interface for the word-matching task. The effectiveness of the interface on this task was surprising because the interface was not designed for it: one would expect the fine control afforded by the signal-parameter interface to let users match the effect more closely than the word map would. This indicates that a crowdsourced language is an effective interaction paradigm for novice users of audio production tools.
Download: PDF (HIGH Res) (4.6MB)
Download: PDF (LOW Res) (424KB)
The design of car audio systems involves an understanding of the challenging acoustics of the cabin. Systems can be “tuned” for specific listener positions. Compromises may need to be made depending on aspects of car design and production management. It is also possible to engineer enhanced spatial listening experiences in cars, and upmixing makes it possible to generate suitable signals for surround and vertical loudspeakers.
Download: PDF (275KB)