Processing individual stems from raw recordings is one of the first steps of multitrack audio mixing. In this work we explore which set of low-level audio features is sufficient to design a prediction model for this transformation. We extract a large set of audio features from bass, guitar, vocal, and keys raw recordings and their corresponding stems. We show that a procedure based on random forest classifiers can significantly reduce the number of features, and we use the selected audio features to train various multi-output regression models. Thus, we investigate stem processing as a content-based transformation, where the inherent content of raw recordings allows us to predict the change in feature values that occurs within the transformation.
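The pipeline the abstract describes can be sketched roughly as follows with scikit-learn. This is a minimal illustrative sketch, not the paper's actual implementation: the synthetic data, the feature count, the median importance threshold, and the choice of random forest regressors as the multi-output model are all assumptions made here for demonstration.

```python
# Hedged sketch: random-forest-based feature selection followed by
# multi-output regression predicting stem-minus-raw feature changes.
# All data below is synthetic and purely illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                  # 50 low-level audio features per raw recording (assumed)
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)   # proxy classification target used only for selection
Y_delta = np.stack([2.0 * X[:, 0],              # change in feature values after stem processing
                    X[:, 1] - X[:, 2]], axis=1)

# 1) Rank features with a random forest classifier and keep the
#    ones whose importance exceeds the median importance.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median",
)
X_sel = selector.fit_transform(X, y_class)

# 2) Train a multi-output regression model on the reduced feature
#    set to predict the per-feature change introduced by processing.
reg = MultiOutputRegressor(RandomForestRegressor(n_estimators=100, random_state=0))
reg.fit(X_sel, Y_delta)
pred = reg.predict(X_sel[:5])

print(X.shape[1], X_sel.shape[1], pred.shape)
```

With the median threshold, roughly half of the 50 synthetic features survive selection, and the regressor then maps that reduced set to the two target feature changes.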