AES E-Library Search Results

Search Results (Displaying 1-10 of 38 matches)

Informed Audio Source Separation

Audio source separation remains very challenging in many situations, especially in the underdetermined case, where there are fewer observations than sources. This is, in particular, the case for polyphonic music (multiple individual sources) recorded in stereo (two channels). To improve audio source separation performance, many recent works have followed a so-called informed audio source separation approach, in which the separation algorithm relies on some kind of additional information about the sources. The goal of this keynote is to give a comprehensive review of three major trends in informed audio source separation and to illustrate these trends with a number of demonstrative examples.
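
As a toy numerical illustration of the underdetermined setting (hypothetical values, not taken from the keynote): with a two-channel recording of four sources, the mixture alone does not pin down the sources, which is why informed methods supply side information to constrain the inversion.

    import numpy as np

    rng = np.random.default_rng(0)
    n_sources, n_samples = 4, 16000                  # e.g. four instrument stems, 1 s at 16 kHz
    S = rng.standard_normal((n_sources, n_samples))  # placeholder source signals
    A = rng.uniform(0.1, 1.0, (2, n_sources))        # stereo mixing matrix: 2 rows, 4 columns
    X = A @ S                                        # observed stereo mixture, shape (2, n_samples)

    # A has more columns than rows, so it cannot be inverted: infinitely many
    # source estimates explain X. Informed separation adds prior knowledge
    # (scores, source models, metadata, ...) to choose among them.
    print(np.linalg.matrix_rank(A))                  # 2, versus 4 sources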


Creating Research Corpora for the Computational Study of Music: the case of the CompMusic Project

A fundamental concern in music information research is the use of appropriate data sets, or research corpora, on which to perform the needed data processing tasks. These corpora have to be suited to the specific research problems being addressed, and the design criteria with which to create them are a research task that has received little attention. In the CompMusic project we are studying several non-Western art music traditions, and a major effort has been the creation of appropriate data collections with which to study and characterise the melodic and rhythmic aspects of these traditions. In this article we review the criteria used to create these collections and describe the specificities of each of the collections gathered.


Evaluation and Improvement of the Mood Conductor Interactive System

In traditional music performances, audience members have a passive role in the music creation process and cannot express what they would like to hear. We proposed an interactive system, Mood Conductor, to allow interaction between the audience and performers in improvised performance situations. The system consists of three parts: a smartphone-friendly web application, a server component that aggregates and clusters the messages sent from the application, and a visualisation client that shows the emotional intentions of the audience. With this system, audience members can express emotional directions via the application. The collected data are processed and then fed back visually to the performers to indicate which emotions to express. A first user survey, conducted to assess the initial system after two public performances involving different ensembles, uncovered several issues. This paper describes the changes made to the web application user interface and the visualisation system following a user-centred design approach. A second series of performances and a further user survey were then conducted, validating the benefits of the changes.
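
The paper does not give implementation details; as a minimal sketch of the server-side aggregation step, assuming each audience message is a hypothetical (valence, arousal) pair on a mood space, the messages could be clustered and the centroid of the largest cluster fed to the visualisation client:

    import numpy as np
    from sklearn.cluster import KMeans

    def dominant_emotion(messages, n_clusters=3):
        """Cluster (valence, arousal) messages and return the centroid of the
        largest cluster, i.e. the emotional direction to show to the performers."""
        X = np.asarray(messages, dtype=float)            # shape (n_messages, 2)
        k = min(n_clusters, len(X))
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        sizes = np.bincount(km.labels_, minlength=k)     # messages per cluster
        return km.cluster_centers_[int(np.argmax(sizes))]

    # e.g. three audience members asking for a calm, positive mood, plus one outlier
    print(dominant_emotion([(0.6, -0.2), (0.7, -0.1), (0.5, -0.3), (-0.8, 0.9)]))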


Modeling Emotions in Music: Advances in Conceptual, Contextual and Validity Issues

Modelling emotion recognition and induction by music has garnered increased attention in recent years. The present work brings together observations on the issues that need attention in order to make advances in music emotion recognition. These are divided into conceptual, contextual, and data validity issues. For each, the central dilemmas are discussed and promising avenues for further work are presented. Among the conceptual issues, significant discrepancies exist in the terminologies and in the choices of emotion focus and models. Among the contextual issues, the primary area for improvement is incorporating music and user contexts into emotion recognition. Regarding the validity of data, reliable estimation of mid-level musical concepts requires significant attention, and an optimal combination of studies with high validity and high stimulus quantity is the key to robust music emotion recognition models.


Semiotic Description of Music Structure: An Introduction to the Quaero/Metiss Structural Annotations

Interest in the description of music structure, i.e., the global organization of music pieces in terms of large-scale structural units, has been growing steadily in semantic audio and music information retrieval. This article presents a detailed methodology for the semiotic description of music structure, based on concepts and criteria that are formulated as generically as possible. We sum up the essential principles and practices developed during an annotation effort deployed by our research group (Metiss) on audio data, in the context of the Quaero project, which has led to the public release of over 380 annotations of pop songs from three different data sets. The paper also includes a few case studies and a concise statistical overview of the annotated data.


Interactive Music Applications by MPEG-A Support in Sonic Visualiser

New interactive music services have emerged, though they currently use proprietary file formats. A standardized file format would benefit interoperability between these services. In this regard, the ISO/IEC Moving Picture Experts Group (MPEG) issued a new standard, the so-called MPEG-A: Interactive Music Application Format (IM AF). The purpose of this paper is to describe the design and implementation of an IM AF codec and its integration into Sonic Visualiser. This integration enables the visualisation of the chords or the pitch of the main melody, aligned in time with the song's lyrics. Furthermore, it provides the semantic audio research community with a test bed for further development and comparison of new Sonic Visualiser VAMP plug-ins, e.g., for the conversion of singing voice to text and/or automatic highlighting of lyrics for karaoke applications.
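
IM AF files are packaged using the ISO Base Media File Format box structure (the same container family as MP4). As a rough sketch of the first parsing step such a codec performs, and assuming a hypothetical local file named song.ima, the top-level boxes can be listed as follows:

    import struct

    def list_top_level_boxes(path):
        """Walk the top-level ISO BMFF boxes: 4-byte big-endian size + 4-byte type."""
        boxes = []
        with open(path, "rb") as f:
            while True:
                header = f.read(8)
                if len(header) < 8:
                    break
                size, box_type = struct.unpack(">I4s", header)
                if size == 0:                       # box extends to end of file
                    boxes.append((box_type.decode("ascii", "replace"), None))
                    break
                if size == 1:                       # 64-bit "largesize" variant
                    size = struct.unpack(">Q", f.read(8))[0]
                    payload = size - 16
                else:
                    payload = size - 8
                boxes.append((box_type.decode("ascii", "replace"), size))
                f.seek(payload, 1)                  # skip the box payload
        return boxes

    print(list_top_level_boxes("song.ima"))         # hypothetical IM AF file name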


Harmonic Cues for Number of Simultaneous Speakers Estimation

Overlapped speech, where several speakers are speaking simultaneously, is a common occurrence in multiparty discussions such as meetings. This kind of speech presents a great challenge to automatic speech processing systems such as speech recognition and speaker diarisation systems. In recent speaker diarisation systems, a large portion of the remaining error comes from overlapped speech. So far, little work has been done on detecting overlapped speech and the number of speakers present in it. In this paper we first describe a model-based approach to estimating the number of simultaneous speakers. We then propose a new approach called Spectral Peak Clustering in which, instead of training statistical models, we extract spectral peaks from the input data and cluster them into components using a similarity measure between peaks; each component represents a speaker present in the input data.
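
The paper does not publish its implementation. As a minimal sketch of the peak-extraction stage, assuming a single analysis frame of mono audio, spectral peaks can be picked from the magnitude spectrum and later compared with a harmonicity-style similarity measure such as the toy one below:

    import numpy as np
    from scipy.signal import find_peaks

    def spectral_peaks(frame, sr, n_fft=2048, top_k=20):
        """Return (frequency, magnitude) pairs of the strongest spectral peaks in one frame."""
        windowed = frame * np.hanning(len(frame))
        mag = np.abs(np.fft.rfft(windowed, n=n_fft))
        freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
        idx, _ = find_peaks(mag, height=mag.max() * 0.05)        # ignore very weak peaks
        strongest = idx[np.argsort(mag[idx])[::-1][:top_k]]
        return list(zip(freqs[strongest], mag[strongest]))

    def harmonic_similarity(f_a, f_b, tol=0.03):
        """Toy similarity: 1 if one peak frequency is close to an integer multiple of the other."""
        lo, hi = sorted((f_a, f_b))
        if lo == 0.0:
            return 0.0
        ratio = hi / lo
        return float(abs(ratio - round(ratio)) < tol)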


A GMM Approach to Singing Language Identification

Automatic language identification for singing is a topic that has received little attention in the past years. Possible application scenarios include searching for musical pieces in a certain language, improving similarity search algorithms for music, and improving regional music classification and genre classification. It could also serve to mitigate the "glass ceiling" effect. Most existing approaches employ PPRLM (Parallel Phone Recognition followed by Language Modelling) processing. Recent publications show that GMM-based (Gaussian Mixture Model) approaches are now able to produce results comparable to PPRLM systems when certain audio features are used. Their advantages lie in their simplicity of implementation and reduced training data requirements. So far, however, this has only been tested on speech data. In this paper we therefore apply such a GMM-based approach to singing language identification. We test our system on speech data and a cappella singing, using MFCC (Mel-Frequency Cepstral Coefficient), TRAP (Temporal Pattern), and SDC (Shifted Delta Cepstrum) features. The results are comparable to the state of the art for singing language identification, but the approach is much simpler to implement, as no phoneme-wise annotations are required. We obtain accuracies of 75% on speech data and 67.5% on a cappella data. To our knowledge, neither the GMM-based approach nor this feature combination has been used for singing language identification before.
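
As a minimal sketch of the GMM classification scheme described above (hypothetical file lists and model sizes; the paper's exact feature set, including TRAP and SDC, is not reproduced here), one Gaussian mixture is trained per language on MFCC frames, and a test recording is assigned to the language whose model yields the highest total log-likelihood:

    import librosa
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def mfcc_frames(path, n_mfcc=20):
        y, sr = librosa.load(path, sr=16000)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T     # (frames, n_mfcc)

    def train_language_models(files_by_language, n_components=64):
        """One GMM per language, fitted on all MFCC frames of that language's recordings."""
        models = {}
        for lang, files in files_by_language.items():
            X = np.vstack([mfcc_frames(f) for f in files])
            models[lang] = GaussianMixture(n_components=n_components,
                                           covariance_type="diag").fit(X)
        return models

    def identify_language(models, path):
        """Pick the language whose GMM gives the highest summed frame log-likelihood."""
        X = mfcc_frames(path)
        return max(models, key=lambda lang: models[lang].score_samples(X).sum())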


A Mid-Level Approach to Local Tonality Analysis: Extracting Key Signatures from Audio

We propose a new method to automatically determine key signature changes. In automatic music transcription, sections in distantly related keys may lead to music scores that are hard to read due to a high number of notated accidentals. The problem of key change is commonly addressed by finding the correct local key out of the 24 major and minor keys. However, to provide the best matching key signature, choosing the right mode (major or minor) is not necessary, so we only estimate the underlying local diatonic scale. After extracting chroma features and a beat grid from the audio data, we calculate local probabilities for the different diatonic scales. For this purpose, we present a multiplicative procedure that shows promising results for visualizing complex tonal structures. From the obtained diatonic scale estimates, we identify candidates for key signature changes. By clustering similar segments and applying minimum segment length constraints, we obtain the tonal segmentation. We test our method on a dataset containing 30 hand-annotated pop songs. To evaluate our results, we calculate scores based on the number of frames correctly annotated, as well as segment border F-measures, and perform a cross-validation study. Our rule-based method yields up to 90% class accuracy and up to 70% F-measure for segment borders. These results are promising and qualify the approach for use in automatic music transcription.
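
The front end of such a pipeline can be illustrated with a small sketch (hypothetical parameter choices, not the authors' implementation): beat-synchronous chroma is matched against the 12 rotations of a binary diatonic-scale template, giving a probability-like score per beat for each candidate key signature:

    import librosa
    import numpy as np

    DIATONIC = np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1], dtype=float)  # C major scale mask

    def diatonic_scores(path):
        y, sr = librosa.load(path)
        tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
        chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
        chroma = librosa.util.sync(chroma, beats)                    # beat-synchronous chroma
        templates = np.stack([np.roll(DIATONIC, k) for k in range(12)])  # 12 key signatures
        scores = templates @ chroma                                  # (12, n_beats) match strengths
        return scores / scores.sum(axis=0, keepdims=True)            # normalise per beat

    # argmax over axis 0 gives the most likely diatonic scale (key signature) per beat;
    # smoothing and minimum-length constraints would then yield the tonal segmentation.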


Bridging the Audio-Symbolic Gap: The Discovery of Repeated Note Content Directly from Polyphonic Music Audio

Algorithms for the discovery of musical repetition have been developed in audio and symbolic domains more or less independently for over a decade. In this paper we combine algorithms for multiple F0 estimation, beat tracking, quantisation, and pattern discovery, so that for the first time, the note content of motifs, themes, and repeated sections can be discovered directly from polyphonic music audio. Testing on deadpan and expressive piano renditions of pieces, we compared pattern discovery performance against runs on symbolic representations of the same pieces. Comparing deadpan audio with deadpan-symbolic representations, establishment precision and recall fell by ~25%, and by ~50% when comparing expressive audio with deadpan-symbolic representations. The music data and evaluation results establish a benchmark for future work that attempts to bridge the audio-symbolic gap.
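
As a small sketch of the quantisation step in such a pipeline (illustrative only, not the authors' code), estimated note onsets can be snapped to the nearest subdivision of a tracked beat grid before symbolic pattern discovery is run:

    import numpy as np

    def quantise_onsets(onsets, beat_times, subdivisions=4):
        """Snap onset times (seconds) to the nearest 1/subdivisions of the beat grid."""
        beat_times = np.asarray(beat_times, dtype=float)
        grid = []
        for t0, t1 in zip(beat_times[:-1], beat_times[1:]):
            grid.extend(np.linspace(t0, t1, subdivisions, endpoint=False))
        grid = np.append(np.asarray(grid), beat_times[-1])
        return np.array([grid[np.argmin(np.abs(grid - t))] for t in onsets])

    print(quantise_onsets([0.02, 0.27, 0.61], beat_times=[0.0, 0.5, 1.0]))
    # -> [0.    0.25  0.625]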

