Low Latency Timbre Interpolation and Warping using Autoencoding Neural Networks

Colonel, Joseph; Keene, Sam

AES E-Library

A lightweight algorithm for low latency timbre interpolation of two input audio streams using an autoencoding neural network is presented. Short-time Fourier transform magnitude frames of each audio stream are encoded, and a new interpolated representation is created within the autoencoder’s latent space. This new representation is passed to the decoder, which outputs a spectrogram. An initial phase estimation for the new spectrogram is calculated using the original phase of the two audio streams. Inversion to the time domain is done using a Griffin-Lim iteration. A method for avoiding pops between processed batches is discussed. An open source implementation in Python is made available.
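The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' released implementation: the frame size, hop size, latent dimension, and the random-weight `encode`/`decode` stand-ins are all assumptions made here so that the example runs without a trained model. The way the two input phases are combined into an initial estimate is also an assumption, since the abstract does not specify it.

```python
import numpy as np

N_FFT, HOP = 512, 128                  # assumed frame and hop sizes
WINDOW = np.hanning(N_FFT + 1)[:-1]    # periodic Hann window
N_BINS, LATENT = N_FFT // 2 + 1, 8     # assumed latent dimension

def stft(x):
    """Complex STFT frames, shape (n_frames, N_BINS)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WINDOW
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec):
    """Windowed overlap-add inverse of stft()."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WINDOW
    length = N_FFT + HOP * (len(spec) - 1)
    out, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WINDOW ** 2
    return out / np.maximum(norm, 1e-8)

# Hypothetical stand-in for a trained autoencoder: fixed random weights.
rng = np.random.default_rng(0)
W_ENC = rng.normal(scale=0.1, size=(N_BINS, LATENT))
W_DEC = rng.normal(scale=0.1, size=(LATENT, N_BINS))

def encode(mag):
    return np.tanh(mag @ W_ENC)

def decode(z):
    return np.maximum(z @ W_DEC, 0.0)  # keep magnitudes non-negative

def interpolate_timbre(x_a, x_b, alpha=0.5):
    """Blend two equal-length signals in latent space and resynthesize."""
    spec_a, spec_b = stft(x_a), stft(x_b)
    # Encode each magnitude frame and interpolate between the latent codes.
    z = (1 - alpha) * encode(np.abs(spec_a)) + alpha * encode(np.abs(spec_b))
    mag = decode(z)
    # Initial phase estimate from the two input streams (assumed here to be
    # the phase of the same weighted mix of the complex spectra).
    phase = np.angle((1 - alpha) * spec_a + alpha * spec_b)
    # One Griffin-Lim iteration: invert, re-analyze, keep the new phase,
    # re-impose the decoded magnitude, and invert again.
    y = istft(mag * np.exp(1j * phase))
    phase = np.angle(stft(y))
    return istft(mag * np.exp(1j * phase))
```

Calling `interpolate_timbre(x_a, x_b, alpha)` on two equal-length mono arrays returns a time-domain signal; `alpha = 0` weights the latent code toward the first stream and `alpha = 1` toward the second. With untrained random weights the output is not musically meaningful, so the sketch only shows the data flow.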

Authors: Colonel, Joseph; Keene, Sam
Affiliations: Queen Mary University of London, UK; The Cooper Union for the Advancement of Science and Art, New York, NY, USA (See document for exact affiliation information.)
AES Convention: 149 (October 2020)
Paper Number: 10406
Publication Date: October 22, 2020
Subject: Audio Processing
Permalink: https://www.aes.org/e-lib/browse.cfm?elib=20943

E-Library Location: /conv/149/10406.pdf