Separating the singing voice from the musical accompaniment is useful in many applications of Music Information Retrieval. For stereophonic music mixtures, many algorithms address this problem by exploiting the spatial diversity of the sound sources to localize and isolate the singing voice. Although these spatial approaches can obtain acceptable results, the separated signal is usually affected by a high level of distortion and artifacts. In this paper, we propose a method for improving the isolation of the singing voice in stereo recordings by incorporating fundamental frequency (F0) information into the separation process. First, the singing voice is pre-separated from the input mixture using a state-of-the-art stereo source separation method, the MuLeTs algorithm. Then, the F0 of this pre-separated signal is estimated using a robust pitch estimator based on the computation of the difference function and Hidden Markov Models, which yields a smooth pitch contour with voiced/unvoiced decisions. Finally, a binary mask is constructed from the F0 to isolate the singing voice from the original mix. The method has been tested on studio music recordings, obtaining good separation results.
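The abstract does not say how the binary mask is derived from the F0 contour. As an illustration only, the sketch below assumes a simple harmonic comb: in each voiced frame, the time-frequency bins lying within a fixed tolerance of an integer multiple of F0 are kept, and unvoiced frames (marked here by F0 = 0) are masked out entirely. The function name `harmonic_binary_mask`, the tolerance `tol_hz`, and the unvoiced convention are hypothetical choices, not details taken from the paper.

```python
import numpy as np

def harmonic_binary_mask(f0_hz, sr, n_fft, tol_hz=20.0):
    """Build a binary time-frequency mask from a frame-wise F0 contour.

    Assumed convention (not from the paper): f0_hz[t] <= 0 marks an
    unvoiced frame. Returns a boolean array of shape
    (n_fft // 2 + 1, n_frames).
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    # Centre frequency of each one-sided STFT bin, in Hz.
    bin_freqs = np.arange(n_fft // 2 + 1) * sr / n_fft
    mask = np.zeros((bin_freqs.size, f0_hz.size), dtype=bool)
    for t, f0 in enumerate(f0_hz):
        if f0 <= 0:
            continue  # unvoiced frame: keep nothing
        # Index of the harmonic nearest to each bin (at least the 1st).
        k = np.maximum(np.round(bin_freqs / f0), 1)
        # Keep bins close enough to that harmonic.
        mask[:, t] = np.abs(bin_freqs - k * f0) <= tol_hz
    return mask

# Example: one unvoiced frame followed by one frame voiced at 200 Hz.
mask = harmonic_binary_mask([0.0, 200.0], sr=8000, n_fft=512)
```

Under these assumptions, the mask would be multiplied element-wise with the short-time Fourier transform of the original mix (using the same frame and FFT sizes as the F0 analysis) before resynthesis.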