Modal Representations for Audio Deep Learning

Skare, Travis; Abel, Jonathan S.; Smith, III, Julius O.

Deep learning models for both discriminative and generative tasks must choose a domain representation for their data. For audio, common candidates are raw waveform data, spectral data, transformed spectral data, or perceptual features. For deep learning tasks related to modal synthesizers or processors, we propose new modal representations for data. We experiment with representations such as an N-hot binary vector of frequencies, and with learning a set of modal filterbank coefficients directly. We use these representations discriminatively, classifying cymbal models based on samples, as well as generatively. An intentionally naive application of a basic modal representation to a CVAE designed for MNIST digit images quickly yielded results, which we found surprising given our more limited prior success with traditional representations such as a spectrogram image. We discuss applications for Generative Adversarial Networks, toward creating a modal reverberator generator.
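To make the N-hot idea concrete, the following is a minimal sketch of one way a set of mode frequencies could be encoded as a binary vector and decoded back. The abstract does not specify the bin layout, so the log-spaced bins, the 256-bin size, the 20 Hz to 20 kHz range, and the function names `n_hot_encode` and `n_hot_decode` are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def n_hot_encode(mode_freqs_hz, num_bins=256, f_min=20.0, f_max=20000.0):
    """Return a binary vector with a 1 in every bin that contains a mode.

    Assumption: bins are log-spaced between f_min and f_max; the source
    does not state which frequency grid is used.
    """
    freqs = np.asarray(mode_freqs_hz, dtype=float)
    # Discard modes that fall outside the represented range.
    freqs = freqs[(freqs >= f_min) & (freqs < f_max)]
    # num_bins + 1 edges define num_bins log-spaced bins.
    edges = np.geomspace(f_min, f_max, num_bins + 1)
    # Index of the bin whose [lower, upper) interval holds each frequency.
    idx = np.searchsorted(edges, freqs, side="right") - 1
    vec = np.zeros(num_bins, dtype=np.uint8)
    vec[idx] = 1
    return vec

def n_hot_decode(vec, f_min=20.0, f_max=20000.0):
    """Map each active bin back to its geometric center frequency in Hz."""
    vec = np.asarray(vec)
    edges = np.geomspace(f_min, f_max, len(vec) + 1)
    centers = np.sqrt(edges[:-1] * edges[1:])
    return centers[vec.astype(bool)]

# Hypothetical mode frequencies, chosen only for illustration.
modes = [440.0, 1234.5, 5000.0]
vec = n_hot_encode(modes)
recovered = n_hot_decode(vec)
```

The decoded frequencies are quantized to bin centers, so this encoding trades frequency resolution for a fixed-length binary vector that an image-style network can consume directly. Two modes that fall in the same bin collapse to a single active entry.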

Authors: Skare, Travis; Abel, Jonathan S.; Smith, III, Julius O.
Affiliation: CCRMA, Stanford University, Stanford, CA, USA
AES Convention: 147 (October 2019)
Paper Number: 10248
Publication Date: October 8, 2019
Subject: Posters: Audio Signal Processing
Permalink: https://www.aes.org/e-lib/browse.cfm?elib=20621

E-Library Location: /conv/147/10248.pdf
