Deep Learning Based Voice Extraction and Primary-Ambience Decomposition for Stereo to Surround Upmixing

Paez Amaro, Ricardo Thaddeus; Tejeda Ocampo, Carlos; Souza Blanes, Ema; Bharitkar, Sunil; Madrid Herrera, Luis

AES E-Library

Surround systems have gained popularity in home entertainment, even though most cinematic content is delivered in two-channel stereo. Although several upmixing options exist, it has proven challenging to produce an upmixed signal that approximates the directionality and timbre originally intended by the mixing artist. The aim of this work is to design a two-to-five-channel upmixer using a novel upmixing strategy that combines voice extraction with primary-ambience decomposition. Results from a modified MUSHRA test show that our proposed upmixer outperforms established alternatives for cinematic upmixing in perceived spatial and timbral quality.
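The abstract does not describe the network or the channel routing. As a rough illustration of the primary-ambience decomposition idea only, the sketch below uses the classic PCA-based decomposition, a standard non-learned technique swapped in here and not the authors' deep learning method, followed by a hypothetical routing of voice, primary, and ambience to five channels. The function names `pca_primary_ambient` and `upmix_block`, the block-wise processing, the assumption that a voice estimate is supplied by a separate extractor, and the routing itself are all assumptions made for illustration.

```python
import numpy as np


def pca_primary_ambient(left, right):
    """Split one block of a stereo signal into primary and ambient parts.

    Classic PCA-based decomposition (NOT the paper's learned method): the
    primary part is the projection of each (left, right) sample pair onto the
    principal eigenvector of the 2x2 inter-channel covariance matrix, and the
    ambient part is the orthogonal residual.
    """
    x = np.vstack([left, right])            # shape (2, N)
    cov = x @ x.T / x.shape[1]              # 2x2 inter-channel covariance
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues come back ascending
    u = eigvecs[:, -1:]                     # principal direction, shape (2, 1)
    primary = u @ (u.T @ x)                 # projection onto that direction
    ambient = x - primary                   # orthogonal residual = ambience
    return primary, ambient


def upmix_block(stereo, voice_stereo):
    """Hypothetical 2-to-5 routing of one block (assumed, not from the paper).

    stereo:       array of shape (2, N), the input mix.
    voice_stereo: array of shape (2, N), a voice estimate ASSUMED to come from
                  a separate extractor (e.g. a neural network; not shown here).
    """
    rest = stereo - voice_stereo                    # mix with voice removed
    primary, ambient = pca_primary_ambient(rest[0], rest[1])
    return {
        "L": primary[0],                            # front left: primary
        "R": primary[1],                            # front right: primary
        "C": voice_stereo.mean(axis=0),             # center: mono voice
        "Ls": ambient[0],                           # surround left: ambience
        "Rs": ambient[1],                           # surround right: ambience
    }
```

Practical upmixers typically apply this kind of split per time-frequency tile of an STFT with smoothed covariance estimates rather than on whole time-domain blocks; the block-wise version above only shows the geometry of separating a correlated (primary) part from an uncorrelated (ambient) residual.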

Open Access

Affiliations: Samsung Research Tijuana, Mexico; Samsung Research Tijuana, Mexico; Samsung Research America, Mountain View, CA, USA; Samsung Research America, Mountain View, CA, USA; Samsung Research Tijuana, Mexico (see document for exact affiliation information)
Express Paper 62; AES Convention 154; May 2023
Publication Date: May 13, 2023
Subject: Neural Networks
Permalink: https://www.aes.org/e-lib/browse.cfm?elib=22087