Singing Voice Separation from Stereo Recordings Using Spatial Clues and Robust F0 Estimation

Cabañas-Molero, Pablo; Martínez Muñoz, Damián; Cobos, Maximo; López, José J.

AES E-Library

Separating the singing voice from the musical accompaniment is useful in many Music Information Retrieval applications. For stereophonic music mixtures, many algorithms address this problem by exploiting the spatial diversity of the sound sources to localize and isolate the singing voice. Although these spatial approaches can obtain acceptable results, the separated signal is usually affected by a high level of distortion and artifacts. In this paper, we propose a method for improving the isolation of the singing voice in stereo recordings by incorporating fundamental frequency (F0) information into the separation process. First, the singing voice is pre-separated from the input mixture using a state-of-the-art stereo source separation method, the MuLeTs algorithm. Then, the F0 of this pre-separated signal is obtained using a robust pitch estimator based on the difference function and Hidden Markov Models, which yields a smooth pitch contour with voiced/unvoiced decisions. Finally, a binary mask is constructed from the F0 contour to isolate the singing voice from the original mix. The method has been tested on studio music recordings, obtaining good separation results.
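The F0-related stages described above can be illustrated with a short Python sketch, written under stated assumptions rather than as the authors' implementation. It computes a YIN-style cumulative mean normalized difference function per frame and uses a simple threshold as the voiced/unvoiced decision; this threshold is a swapped-in stand-in for the paper's HMM-based smoothing, which is not reproduced here. The MuLeTs pre-separation is also not shown, so each frame is assumed to come from an already pre-separated voice signal. The function names and the parameters `fmin`, `fmax`, `threshold`, `tol_hz` and `max_harmonics` are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np


def difference_function(frame, max_lag):
    """Squared difference d(tau) = sum_j (x[j] - x[j + tau])^2 for tau = 0..max_lag-1."""
    n = len(frame) - max_lag  # number of samples compared at every lag
    d = np.zeros(max_lag)
    for tau in range(1, max_lag):
        diff = frame[:n] - frame[tau:tau + n]
        d[tau] = np.dot(diff, diff)
    return d


def cumulative_mean_normalized(d):
    """YIN-style normalization: d'(tau) = d(tau) * tau / sum_{j=1..tau} d(j), d'(0) = 1."""
    out = np.ones_like(d)
    taus = np.arange(1, len(d))
    out[1:] = d[1:] * taus / np.maximum(np.cumsum(d[1:]), 1e-12)
    return out


def estimate_f0(frame, sr, fmin=80.0, fmax=1000.0, threshold=0.15):
    """Return the F0 of one frame in Hz, or 0.0 if the frame is judged unvoiced.

    Hypothetical parameters; the threshold test stands in for HMM smoothing.
    """
    if np.dot(frame, frame) < 1e-10:  # silent frame: treat as unvoiced
        return 0.0
    min_lag = int(sr / fmax)
    max_lag = int(sr / fmin) + 1
    dn = cumulative_mean_normalized(difference_function(frame, max_lag))
    below = np.nonzero(dn[min_lag:] < threshold)[0]
    if below.size == 0:  # no clear periodicity: unvoiced
        return 0.0
    tau = int(below[0]) + min_lag
    # Move from the first lag under the threshold to the bottom of that dip.
    while tau + 1 < max_lag and dn[tau + 1] < dn[tau]:
        tau += 1
    return sr / tau


def harmonic_binary_mask(f0_track, n_bins, sr, n_fft, tol_hz=20.0, max_harmonics=30):
    """mask[t, k] is True when STFT bin k lies within tol_hz of a harmonic of f0_track[t]."""
    freqs = np.arange(n_bins) * sr / n_fft
    mask = np.zeros((len(f0_track), n_bins), dtype=bool)
    for t, f0 in enumerate(f0_track):
        if f0 <= 0:  # unvoiced frame: voice assumed absent, mask stays False
            continue
        harmonics = f0 * np.arange(1, max_harmonics + 1)
        harmonics = harmonics[harmonics < sr / 2]
        # Distance from every bin centre to its nearest harmonic.
        dist = np.abs(freqs[:, None] - harmonics[None, :]).min(axis=1)
        mask[t] = dist <= tol_hz
    return mask


if __name__ == "__main__":
    sr, n_fft = 16000, 1024
    t = np.arange(n_fft) / sr
    f0 = estimate_f0(np.sin(2 * np.pi * 220.0 * t), sr)
    print(round(f0, 1))
    mask = harmonic_binary_mask(np.array([0.0, f0]), n_fft // 2 + 1, sr, n_fft)
    print(mask.shape, int(mask[1].sum()))
```

In this sketch, the resulting boolean mask would be multiplied with the STFT of each channel of the original stereo mix and inverted to obtain the voice estimate. Replacing the per-frame threshold with a Viterbi pass over F0 candidates and a voiced/unvoiced state would bring it closer to the smooth HMM contour the abstract describes.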

Affiliations: Institute for Telecommunications and Multimedia Applications (iTEAM); University of Jaén, Linares, Jaén, Spain; Technical University of Valencia, Valencia, Spain (See document for exact affiliation information.)
AES Conference: 42nd International Conference: Semantic Audio (July 2011)
Paper Number: 5-1
Publication Date: July 22, 2011
Subject: Audio Source Separation
Permalink: https://www.aes.org/e-lib/browse.cfm?elib=15965

E-Library Location: (CD 42ndPapers) /conf/42/aes42-000051.pdf