AES E-Library

1D Convolutional Layers to Create Frequency-Based Spectral Features for Audio Networks

Time-frequency transformations and spectral representations of audio signals are commonly used in machine learning applications. Training networks on frequency features such as the Mel-Spectrogram or Chromagram has proven more effective and convenient than training on raw time samples. In practical realizations, these features are created on a different processor and/or pre-computed and stored on disk, which requires additional effort and makes it difficult to experiment with various combinations. In this paper, we provide a PyTorch framework for creating spectral features and time-frequency transformations using the built-in trainable conv1d() layer. This allows computing them on the fly as part of a larger network and enables easier experimentation with various parameters. Our work extends prior work in the literature developed for this purpose: first by adding more of these features, and also by allowing either training from initialized kernels or training from random values that converge to the desired solution. The code is written as a template of classes and scripts that users may integrate into their own PyTorch classes for various applications.
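The idea the abstract describes can be sketched with a small example: a `torch.nn.Conv1d` whose kernels are initialized to the cosine and sine components of the DFT basis, so that at initialization the layer reproduces a rectangular-window STFT magnitude, while `requires_grad` optionally makes the kernels trainable. This is a minimal illustration under assumed names (`DFTConv1d`, `n_fft`, `hop`), not the paper's actual API.

```python
import torch
import torch.nn as nn


class DFTConv1d(nn.Module):
    """STFT-like magnitude spectrogram via a (optionally trainable) Conv1d.

    Kernels are initialized to the real (cosine) and imaginary (sine) parts
    of the DFT basis; illustrative sketch, not the paper's implementation.
    """

    def __init__(self, n_fft=512, hop=256, trainable=False):
        super().__init__()
        self.n_bins = n_fft // 2 + 1
        # One output channel per (frequency bin, real/imag) pair.
        self.conv = nn.Conv1d(1, 2 * self.n_bins, kernel_size=n_fft,
                              stride=hop, bias=False)
        n = torch.arange(n_fft, dtype=torch.float32)
        k = torch.arange(self.n_bins, dtype=torch.float32).unsqueeze(1)
        # Stack cosine kernels (real part) over sine kernels (imag part).
        basis = torch.cat([torch.cos(2 * torch.pi * k * n / n_fft),
                           -torch.sin(2 * torch.pi * k * n / n_fft)], dim=0)
        self.conv.weight.data.copy_(basis.unsqueeze(1))
        self.conv.weight.requires_grad = trainable

    def forward(self, x):          # x: (batch, 1, samples)
        y = self.conv(x)           # (batch, 2 * n_bins, frames)
        real, imag = y[:, :self.n_bins], y[:, self.n_bins:]
        return torch.sqrt(real ** 2 + imag ** 2 + 1e-12)  # magnitude
```

With `trainable=False` this reproduces the fixed transform; with `trainable=True` the same layer becomes a set of learnable frequency kernels that can be fine-tuned as part of a larger network, which is the on-the-fly usage the paper advocates.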

Open Access


Express Paper 6; AES Convention 153
Publication Date: October 2022