AES E-Library

1D Convolutional Layers to Create Frequency-Based Spectral Features for Audio Networks

Time-frequency transformations and spectral representations of audio signals are commonly used in machine learning applications. Training networks on frequency features such as the Mel-Spectrogram or Chromagram has proven more effective and convenient than training on raw time samples. In practical realizations, these features are created on a different processor and/or pre-computed and stored on disk, which requires additional effort and makes it difficult to experiment with various combinations. In this paper, we provide a PyTorch framework for creating spectral features and time-frequency transformations using the built-in trainable conv1d() layer. This allows computing them on the fly as part of a larger network and enables easier experimentation with various parameters. Our work extends prior work in the literature developed for this purpose: first by adding more of these features, and also by allowing either training from initialized kernels or training from random values that converge to the desired solution. The code is written as a template of classes and scripts that users may integrate into their own PyTorch classes for various applications.
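The idea the abstract describes can be sketched with a small example: a `torch.nn.Conv1d` whose kernels are initialized to the cosine and sine components of the DFT basis, so that at initialization the layer reproduces a rectangular-window STFT magnitude, while `requires_grad` optionally makes the kernels trainable. This is a minimal illustration under assumed names (`DFTConv1d`, `n_fft`, `hop`), not the paper's actual API.

```python
import torch
import torch.nn as nn


class DFTConv1d(nn.Module):
    """STFT-like magnitude spectrogram via a (optionally trainable) Conv1d.

    Kernels are initialized to the real (cosine) and imaginary (sine) parts
    of the DFT basis; illustrative sketch, not the paper's implementation.
    """

    def __init__(self, n_fft=512, hop=256, trainable=False):
        super().__init__()
        self.n_bins = n_fft // 2 + 1
        # One output channel per (frequency bin, real/imag) pair.
        self.conv = nn.Conv1d(1, 2 * self.n_bins, kernel_size=n_fft,
                              stride=hop, bias=False)
        n = torch.arange(n_fft, dtype=torch.float32)
        k = torch.arange(self.n_bins, dtype=torch.float32).unsqueeze(1)
        # Stack cosine kernels (real part) over sine kernels (imag part).
        basis = torch.cat([torch.cos(2 * torch.pi * k * n / n_fft),
                           -torch.sin(2 * torch.pi * k * n / n_fft)], dim=0)
        self.conv.weight.data.copy_(basis.unsqueeze(1))
        self.conv.weight.requires_grad = trainable

    def forward(self, x):          # x: (batch, 1, samples)
        y = self.conv(x)           # (batch, 2 * n_bins, frames)
        real, imag = y[:, :self.n_bins], y[:, self.n_bins:]
        return torch.sqrt(real ** 2 + imag ** 2 + 1e-12)  # magnitude
```

With `trainable=False` this reproduces the fixed transform; with `trainable=True` the same layer becomes a set of learnable frequency kernels that can be fine-tuned as part of a larger network, which is the on-the-fly usage the paper advocates.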

Open Access


Express Paper 6; AES Convention 153
Publication Date: October 2022