AES E-Library Search Results

Search Results (Displaying 1-10 of 27 matches)

Enhancing LSTM RNN-Based Speech Overlap Detection by Artificially Mixed Data


This paper presents a new method for Long Short-Term Memory (LSTM) recurrent neural network based speech overlap detection. To this end, speech overlap data is created artificially by mixing large numbers of speech utterances. Our elaborate training strategies and the presented network structures surpass the considered state-of-the-art overlap detectors on the full ternary task of non-speech, speech, and overlap detection. Furthermore, the speakers' gender is recognised, the first successful combination of this kind within a single model.
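The artificial-mixing idea can be sketched as follows. This is a minimal, hypothetical helper (not the authors' code): it overlays two utterances at a chosen energy ratio to produce an "overlapped speech" training example.

```python
import numpy as np

def mix_overlap(utt_a, utt_b, ratio_db=0.0):
    """Overlay two speech utterances so that utterance A is ratio_db
    decibels stronger than utterance B, yielding an artificial
    'overlapped speech' training example (label: both speakers active)."""
    n = min(len(utt_a), len(utt_b))
    a = np.asarray(utt_a[:n], dtype=float)
    b = np.asarray(utt_b[:n], dtype=float)
    # Scale B so the A-to-B energy ratio matches ratio_db.
    ea, eb = np.sum(a ** 2), np.sum(b ** 2) + 1e-12
    b = b * np.sqrt(ea / eb * 10.0 ** (-ratio_db / 10.0))
    mix = a + b
    return mix / (np.max(np.abs(mix)) + 1e-12)  # peak-normalise
```

Repeating this over many utterance pairs and energy ratios yields arbitrarily large amounts of labelled overlap data without manual annotation.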

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.


Gaussian Framework for Interference Reduction in Live Recordings


In live multitrack recordings, each voice is usually captured by a dedicated close microphone. In practice, however, it is also captured by microphones intended for other sources, leading to so-called “interference”. Reducing this leakage is desirable because it opens new perspectives for the engineering of live recordings; hence, it has been the topic of recent research in audio processing. In this paper, we show how a Gaussian probabilistic framework may be set up to obtain good isolation of the target sources. In doing so, we extend several state-of-the-art methods by fixing some heuristic parts of their algorithms. As we show in a perceptual evaluation on real-world multitrack live recordings, the resulting principled techniques yield improved quality.
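Gaussian source models of this kind typically lead to Wiener-style filtering of each close mic: every time-frequency bin is scaled by the estimated share of the target's power in that bin. A toy single-channel sketch (illustrative only, not the paper's algorithm; the PSD estimates are assumed given):

```python
import numpy as np

def wiener_mask(psd_target, psd_interf, mic_tf):
    """Scale each time-frequency bin of a close-mic STFT by the ratio
    of the target source's estimated power to the total power there."""
    gain = psd_target / (psd_target + psd_interf + 1e-12)
    return gain * mic_tf
```

Bins dominated by the target pass almost unchanged, while bins dominated by leakage from other sources are attenuated.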



An Unsupervised Hybrid Approach for Online Detection of Sound Scene Changes in Broadcast Content


In this paper we describe an online system for broadcast content that can detect sound scene changes with high accuracy. The system is unsupervised and does not require prior information on the segment classes. A scene-change probability score is computed for each frame of the signal using a hybrid approach combining a model-based segmentation method (Gaussian Mixture Model) with a distance-based one (Hotelling’s T²-statistic). The mixture model parameters are adapted online using the previous frames of the signal. Experiments on real recordings show that we can achieve more than 85% correct segment change detection with only 16% false detections.
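The distance-based half of such a hybrid can be illustrated with the standard two-sample Hotelling T² statistic, computed between the feature windows on either side of a candidate boundary (a generic sketch, not the paper's implementation):

```python
import numpy as np

def t2_change_score(left, right):
    """Two-sample Hotelling T^2 statistic between feature windows
    left (n1 x d) and right (n2 x d); large values suggest that the
    two windows come from different sound scenes."""
    n1, n2 = len(left), len(right)
    m1, m2 = left.mean(axis=0), right.mean(axis=0)
    s1 = np.cov(left, rowvar=False)
    s2 = np.cov(right, rowvar=False)
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    diff = m1 - m2
    # Regularise the pooled covariance slightly for numerical safety.
    return (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(
        pooled + 1e-9 * np.eye(len(diff)), diff)
```

Sliding this score along the signal and thresholding its peaks gives boundary candidates that the model-based (GMM) side can then confirm.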



On the Importance of Temporal Context in Proximity Kernels: A Vocal Separation Case Study


Musical source separation methods exploit source-specific spectral characteristics to facilitate the decomposition process. Kernel Additive Modelling (KAM) models a source by applying robust statistics to time-frequency bins as specified by a source-specific kernel, a function defining similarity between bins. Kernels in existing approaches are typically defined using metrics between single time frames. In the presence of noise and other sound sources, however, information from a single frame turns out to be unreliable, and incorrect frames are often selected as similar. In this paper, we incorporate temporal context into the kernel to provide additional information, stabilizing the similarity search. Evaluated in the context of vocal separation, our simple extension led to a considerable improvement in separation quality compared to previous kernels.
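The temporal-context idea amounts to comparing stacked neighbourhoods of frames rather than single frames. A hypothetical sketch (the metric and stacking scheme are illustrative, not the paper's exact kernel):

```python
import numpy as np

def context_similarity(spec, t1, t2, context=3):
    """Compare two time frames of a magnitude spectrogram using a
    stack of +/- `context` neighbouring frames instead of the single
    frames themselves; higher values mean more similar."""
    def stack(t):
        # Clamp indices at the spectrogram edges.
        idx = np.clip(np.arange(t - context, t + context + 1),
                      0, spec.shape[1] - 1)
        return spec[:, idx].ravel()
    return -np.linalg.norm(stack(t1) - stack(t2))
```

Because a transient disturbance rarely corrupts a whole neighbourhood of frames the same way, the stacked comparison is less likely to declare two unrelated frames similar.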



Intelligent Multitrack Reverberation Based on Hinge-Loss Markov Random Fields


We propose a machine learning approach based on hinge-loss Markov random fields to solve the problem of automatically applying reverb to a multitrack session. With the objective of obtaining perceptually meaningful results, a set of Probabilistic Soft Logic (PSL) rules has been defined based on best practices recommended by experts. These rules have been weighted according to the level of confidence associated with each practice, based on existing evidence. The resulting model has been used to extract parameters for a series of reverb units applied to the different tracks to obtain a reverberated mix of the session.
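In PSL, a weighted rule body → head over soft truth values in [0, 1] incurs a hinge-loss equal to its "distance to satisfaction" under Łukasiewicz logic. A minimal sketch of that loss (illustrative; the actual rule set and weights are the paper's):

```python
def rule_loss(weight, body, head, squared=True):
    """Hinge-loss of a weighted PSL rule body -> head: the distance to
    satisfaction is max(0, body - head), i.e. the rule is violated only
    to the extent the body is 'more true' than the head."""
    d = max(0.0, body - head)
    return weight * (d * d if squared else d)
```

Inference then picks the continuous truth assignment (here, the reverb parameters) minimising the weighted sum of these losses, so strongly weighted expert practices dominate weakly supported ones.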



The Phase Retrieval Toolbox


A Matlab/GNU Octave toolbox for phase (signal) reconstruction from the short-time Fourier transform (STFT) magnitude is presented. The toolbox provides an implementation of various conceptually different algorithms, ranging from the well-known Griffin-Lim algorithm and its derivatives to very recent ones. The list includes real-time-capable algorithms, which are also implemented in real-time audio demos running directly in Matlab/GNU Octave. The toolbox is well documented, open source, and available under the GPLv3 license. In this paper, we give an overview of the algorithms contained in the toolbox and discuss their properties.
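The baseline Griffin-Lim iteration that such toolboxes generalise is short enough to sketch with SciPy (a generic sketch, unrelated to the toolbox's own code): alternate between enforcing STFT consistency and restoring the target magnitude.

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=30, nperseg=1024, noverlap=768, seed=0):
    """Recover a time signal whose STFT magnitude approximates `mag`
    by iterating consistency projections from a random initial phase."""
    rng = np.random.default_rng(seed)
    spec = mag * np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        # Project onto consistent STFTs: go to time domain and back.
        _, x = istft(spec, nperseg=nperseg, noverlap=noverlap)
        _, _, est = stft(x, nperseg=nperseg, noverlap=noverlap)
        # Keep the new phase, restore the target magnitude
        # (guarding against off-by-one frame counts from padding).
        m = min(est.shape[1], mag.shape[1])
        spec = mag[:, :m] * np.exp(1j * np.angle(est[:, :m]))
    _, x = istft(spec, nperseg=nperseg, noverlap=noverlap)
    return x
```

Each iteration cannot increase the inconsistency between the spectrogram and its target magnitude, which is why the basic scheme converges, if slowly; the faster variants in the toolbox improve on exactly this step.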



A Framework to Provide Fine-Grained Time-Dependent Context for Active Listening Experiences


This work presents a system that provides fine-grained time-dependent context while listening to recorded music. Using acoustic fingerprinting techniques, the system recognizes which music is playing in the environment and also determines the exact playback position. This makes it possible to provide context at exactly the right time. The design of the system can be used to augment listening experiences with lyrics, scores, tablature, or even music videos. To test the concept, a prototype has been built that gives feedback coinciding with the beat of music playing in the user's environment. The system is evaluated with respect to timing and is able to respond to beats within 16 ms on average.
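Once a fingerprint match reports an exact playback position at a known wall-clock time, delivering time-dependent context reduces to simple extrapolation. A hypothetical sketch (function names and the beat-annotation format are assumptions, not the paper's API):

```python
def current_position(match_offset_s, match_time, now):
    """A fingerprint match says: at wall-clock `match_time` the stream
    was `match_offset_s` seconds into the track. Extrapolate the
    playback position at wall-clock time `now`."""
    return match_offset_s + (now - match_time)

def next_beat_delay(position_s, beat_times):
    """Seconds until the next annotated beat at or after position_s,
    or None if no beat remains (beat_times sorted ascending)."""
    for b in beat_times:
        if b >= position_s:
            return b - position_s
    return None
```

Scheduling feedback `next_beat_delay` seconds ahead is what lets a prototype coincide with the beat rather than react to it.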



Fast Music and Audio Processing Using the Julia Language


Julia is a high-level dynamic programming language for technical computing characterized by its concise syntax and high performance. This paper reviews Julia's features that are useful for audio signal processing, and introduces JuliaAudio and MusicProcessing.jl, which provide a set of Julia packages for basic I/O and transformations of audio data as well as various feature extraction methods for music information retrieval tasks. We quantitatively evaluate the package in terms of its performance relative to existing audio feature extraction libraries. We argue that using Julia for music and audio processing brings a number of benefits, including its high performance in numerical computations, the ease of development coming from Julia's conciseness and versatility, and its scalability for distributed computing.



Investigating Music Production Using a Semantically Powered Digital Audio Workstation in the Browser


In this study, we present an online music production tool that facilitates the capture of time-series audio and session data, including action history. This allows us to analyse sessions and infer production decisions based on actions made to the user interface. We conduct an experiment in which mix engineers were asked to use the system to perform a balance mix, then we provide observations made using the system. We show that participants often exhibit commonalities in mixing styles when applying gain and panning to specific instruments in a mix, and demonstrate common temporal characteristics relating to the magnitude of parameter adjustments.



Singing Voice Detection across Different Music Genres


Most recent studies on vocal detection in audio recordings focus either on the development of new features or on classification methods. The impact of training and test data is largely neglected, leading to weaknesses in the design of databases that do not cover differences in vocal techniques across music genres. In this paper, we compare approaches for singing voice detection on individual genres. For the two best-performing methods, we further investigate the impact of genre-disjoint distributions of training and test tracks. In particular, tracks of electronic genres, which are barely represented in public databases for vocal recognition, contribute to better classification performance when identifying vocals in tracks of other genres.
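The genre-disjoint evaluation described here corresponds to leave-one-genre-out splitting: train on every genre except one, test on the held-out genre. A generic sketch with a hypothetical data layout (track id mapped to genre label):

```python
def genre_disjoint_splits(track_genres):
    """Yield (held_out_genre, train_ids, test_ids) where the test set
    contains only tracks of the held-out genre and the training set
    contains none of them. track_genres: dict track_id -> genre."""
    for held_out in sorted(set(track_genres.values())):
        train = [t for t, g in track_genres.items() if g != held_out]
        test = [t for t, g in track_genres.items() if g == held_out]
        yield held_out, train, test
```

Comparing per-genre scores under this protocol against a genre-mixed random split is what exposes whether a detector has merely learned genre-specific cues.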


