Conditioned Source Separation by Attentively Aggregating Frequency Transformations With Self-Conditioning

Choi, Woosung; Jeong, Yeong-Seok; Kim, Jinsung; Chung, Jaehwa; Jung, Soonyoung; Reiss, Joshua D.

AES E-Library

This work is licensed under a Creative Commons Attribution 4.0 International License.

Label-conditioned source separation extracts the target source, specified by an input symbol, from an input mixture track. A recently proposed label-conditioned source separation model, Latent Source Attentive Frequency Transformation (LaSAFT)-Gated Point-Wise Convolutional Modulation (GPoCM)-Net, introduced a block for latent source analysis called LaSAFT. Employing LaSAFT blocks, it established state-of-the-art performance on several tasks of the MUSDB18 benchmark. This paper enhances the LaSAFT block by exploiting a self-conditioning method. Whereas the existing method considers only the symbolic relationships between the target source symbol and latent sources, ignoring audio content, the new approach also takes the audio content into account: the enhanced block computes the attention mask conditioned on both the label and the input audio feature map. It is shown that the conditioned U-Net employing the enhanced LaSAFT blocks outperforms the previous model. It is also shown that, with a slight modification, the present model can perform audio-query-based separation.

Authors: Choi, Woosung; Jeong, Yeong-Seok; Kim, Jinsung; Chung, Jaehwa; Jung, Soonyoung; Reiss, Joshua D.
Affiliations: Department of Computer Science and Engineering, Korea University, Republic of Korea; Department of Computer Science and Engineering, Korea University, Republic of Korea; Department of Computer Science and Engineering, Korea University, Republic of Korea; Department of Computer Science, Korea National Open University, Republic of Korea; Department of Computer Science and Engineering, Korea University, Republic of Korea; Centre for Digital Music, Queen Mary University of London, London, UK (See document for exact affiliation information.)
JAES Volume 70 Issue 9 pp. 661-673; September 2022
Publication Date: September 12, 2022
Permalink: https://www.aes.org/e-lib/browse.cfm?elib=21880

This paper is Open Access, which means you can download it for free.

E-Library Location: (CD JAES70) /jaes70/9/pg661.pdf

DOI: https://doi.org/10.17743/jaes.2022.0030
