AES E-Library

Deep Learning Based Voice Extraction and Primary-Ambience Decomposition for Stereo to Surround Upmixing

Surround systems have gained popularity in home entertainment even though most cinematic content is delivered in two-channel stereo format. Although several upmixing options exist, it has proven challenging to deliver an upmixed signal that approximates the directionality and timbre originally intended by the mixing artist. The aim of this work is to design a two-to-five-channel upmixer using a novel strategy that combines voice extraction and primary-ambience decomposition. Results from a modified MUSHRA test show that the proposed upmixer outperforms established alternatives for cinematic upmixing in perceived spatial and timbral quality.
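
To make the idea of primary-ambience decomposition concrete, the following is a minimal sketch and not the paper's method. The abstract does not say how the decomposition is performed, so the sketch assumes a classical PCA-based split of a stereo frame; the deep-learning voice extractor is replaced by a caller-supplied estimate. The names `pca_primary_ambience` and `naive_upmix`, and the routing of components to channels, are hypothetical choices made for illustration only.

```python
import numpy as np


def pca_primary_ambience(stereo):
    """Split a (n_samples, 2) stereo frame into primary and ambience parts.

    Assumption: the primary (directional) content lies along the principal
    axis of the inter-channel covariance; everything else is ambience.
    """
    x = np.asarray(stereo, dtype=float)
    cov = x.T @ x / len(x)                  # 2x2 covariance, zero-mean signal assumed
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues returned in ascending order
    v = eigvecs[:, -1]                      # unit vector of the principal direction
    primary = np.outer(x @ v, v)            # projection onto the principal direction
    ambience = x - primary                  # orthogonal remainder
    return primary, ambience


def naive_upmix(stereo, voice):
    """Hypothetical 2-to-5 routing with columns L, R, C, Ls, Rs.

    `voice` is a (n_samples, 2) voice estimate from some external separator
    (a placeholder for the voice-extraction stage).
    """
    stereo = np.asarray(stereo, dtype=float)
    voice = np.asarray(voice, dtype=float)
    primary, ambience = pca_primary_ambience(stereo - voice)
    center = voice.mean(axis=1, keepdims=True)   # voice goes to the center channel
    return np.hstack([primary, center, ambience])


if __name__ == "__main__":
    t = np.arange(4800) / 48000.0
    source = np.sin(2 * np.pi * 440.0 * t)
    rng = np.random.default_rng(0)
    # Amplitude-panned source plus uncorrelated noise standing in for ambience.
    frame = np.outer(source, [0.8, 0.6]) + 0.05 * rng.standard_normal((len(t), 2))
    out = naive_upmix(frame, np.zeros_like(frame))
    print(out.shape)
```

In this sketch the two components always sum back to the input, so nothing is lost in the split. A practical upmixer would more likely work per time-frequency band and decorrelate the surround feeds.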

Open Access


Express Paper 62; AES Convention 154; May 2023
This paper is Open Access which means you can download it for free.
