Singing Voice Separation by Low-Rank and Sparse Spectrogram Decomposition with Prelearned Dictionaries

Yu, Shiwei; Zhang, Hongjuan; Duan, Zhiyao

AES E-Library

Although the human auditory system can easily distinguish the singing voice from the background music in a music recording, it is extremely difficult for computer systems to replicate this ability, especially when the music mixture is a single channel. The challenge arises from the variety of simultaneous sound sources as well as from the rich pitch and timbre variations of a singing voice. Unsupervised spectrogram decomposition separates the mixture spectrogram into a sparse spectrogram for the singing voice and a low-rank spectrogram for the background music. This approach has two limitations: its unsupervised nature prevents the prelearning of voice and background music dictionaries, and some components of the singing voice and background music may not show the preferred sparse and low-rank properties. In contrast, the authors propose to decompose the mixture spectrogram into three parts: a sparse spectrogram representing the singing voice, a low-rank spectrogram representing the background music, and a residual spectrogram for the components that are not captured by either the sparse or the low-rank spectrogram. Universal dictionaries for the singing voice and background music are prelearned from isolated singing voice and background music training data, and these dictionaries introduce prior knowledge of the voice and background music into the separation process. Evaluations on two datasets show that the proposed method is effective and efficient for both the separated singing voice and the music accompaniment at various voice-to-music ratios.
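The three-part split described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it omits the prelearned dictionaries entirely and uses a generic alternating scheme (singular value thresholding for the low-rank part, elementwise soft thresholding for the sparse part) on a magnitude spectrogram. The function names and the parameters `tau`, `lam`, and `n_iter` are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    # Elementwise shrinkage: proximal operator of t * ||x||_1.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def singular_value_threshold(x, t):
    # Shrink singular values: proximal operator of t * nuclear norm.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, t)) @ vt

def decompose(m, tau=1.0, lam=0.1, n_iter=50):
    """Split matrix m into low-rank + sparse + residual parts.

    Alternately minimizes
        0.5 * ||m - low - sparse||_F^2 + tau * ||low||_* + lam * ||sparse||_1
    over low and sparse (hypothetical parameter names, not from the paper).
    """
    m = np.asarray(m, dtype=float)
    low = np.zeros_like(m)
    sparse = np.zeros_like(m)
    for _ in range(n_iter):
        low = singular_value_threshold(m - sparse, tau)   # background-like part
        sparse = soft_threshold(m - low, lam)             # voice-like part
    residual = m - low - sparse                           # whatever neither part explains
    return low, sparse, residual

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "spectrogram": a rank-1 pattern plus a few isolated large entries.
    mix = np.outer(rng.random(20), rng.random(30))
    mix[3, 4] += 5.0
    mix[10, 17] += 4.0
    low, sparse, residual = decompose(mix, tau=0.5, lam=0.1)
    print("nonzeros in sparse part:", int(np.count_nonzero(sparse)))
    print("max |residual|:", float(np.max(np.abs(residual))))
```

In this simplified form, the residual is bounded elementwise by `lam`, so it only absorbs small leftover energy. The method described in the abstract instead gives the residual its own role and constrains the sparse and low-rank parts with dictionaries learned from isolated voice and accompaniment recordings.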

Authors: Yu, Shiwei; Zhang, Hongjuan; Duan, Zhiyao
Affiliations: Department of Mathematics, Shanghai University, Shanghai, P. R. China; Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA (See document for exact affiliation information.)
JAES Volume 65 Issue 5 pp. 377-388; May 2017
Publication Date: May 26, 2017
Permalink: https://www.aes.org/e-lib/browse.cfm?elib=18731

E-Library Location: (CD JAES65) /jaes65/5/pg377.pdf

DOI: https://doi.org/10.17743/jaes.2017.0009
