AES E-Library

AES E-Library

Modal Representations for Audio Deep Learning

Deep learning models for both discriminative and generative tasks have a choice of domain representation. For audio, candidates are often raw waveform data, spectral data, transformed spectral data, or perceptual features. For deep learning tasks related to modal synthesizers or processors, we propose new, modal representations for data. We experiment with representations such as an N-hot binary vector of frequencies, or learning a set of modal filterbank coefficients directly. We use these representations discriminatively–classifying cymbal model based on samples–as well as generatively. An intentionally naive application of a basic modal representation to a CVAE designed for MNIST digit images quickly yielded results, which we found surprising given less prior success when using traditional representations like a spectrogram image. We discuss applications for Generative Adversarial Networks, towards creating a modal reverberator generator.

AES Convention: Paper Number:
Publication Date:

Click to purchase paper as a non-member or login as an AES member. If your company or school subscribes to the E-Library then switch to the institutional version. If you are not an AES member and would like to subscribe to the E-Library then Join the AES!

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.

Learn more about the AES E-Library

E-Library Location:

Start a discussion about this paper!

AES - Audio Engineering Society