In the field of intelligent audio production, neural networks have been trained to automatically mix a multitrack to a stereo mixdown. Although these algorithms contain latent models of mix engineering, there is still a lack of approaches that explicitly model the decisions a mix engineer makes while mixing. In this work, a method to retrieve the parameters used to create a multitrack mix using only raw tracks and the stereo mixdown is presented. This method is able to model a multitrack mix using gain, panning, equalization, dynamic range compression, distortion, delay, and reverb with the aid of greybox differentiable digital signal processing modules. This method allows for a fully interpretable representation of the mixing signal chain by explicitly modeling the audio effects one may expect in a typical engineer's mixing chain. The modeling capacities of several different mixing chains are measured using both objective and subjective measures on a dataset of student mixes. Results show that the full signal chain performs best on objective measures and that there is no statistically significant difference between the participants' perception of the full mixing chain and reference mixes.
Click to purchase paper as a non-member or login as an AES member. If your company or school subscribes to the E-Library then switch to the institutional version. If you are not an AES member and would like to subscribe to the E-Library then Join the AES!
This paper costs $33 for non-members and is free for AES members and E-Library subscribers.