Phase-Aware Transformations in Variational Autoencoders for Audio Effects
M. Cámara and J. L. Blanco, "Phase-Aware Transformations in Variational Autoencoders for Audio Effects," J. Audio Eng. Soc., vol. 70, no. 9, pp. 731-741 (September 2022). doi: https://doi.org/10.17743/jaes.2022.0042
Abstract: This paper analyzes the impact of signal phase handling in one of the most popular architectures for the generative synthesis of audio effects: variational autoencoders (VAEs). Until quite recently, autoencoders based on the Fast Fourier Transform routinely avoided the phase of the signal. They store the phase information and retrieve it at the output or rely on signal phase regenerators such as Griffin-Lim. We evaluate different VAE networks capable of generating a latent space with intrinsic information from signal amplitude and phase. The Modulated Complex Lapped Transform (MCLT) has been evaluated as an alternative to the Short-Time Fourier Transform (STFT). A novel database on beats has been designed for testing the architectures. Results were objectively assessed (reconstruction errors and objective metrics approximating opinion scores) with autoencoders on STFT and MCLT representations, using Griffin-Lim phase regeneration, multichannel networks, as well as the Complex VAE. The autoencoders successfully learned to represent the phase information and handle it in a holistic approach. State-of-the-art quality standards were reached for audio effects. The autoencoders show a remarkable ability to generalize and deliver new sounds, while overall quality depends on the reconstruction of phase and amplitude.
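The abstract's remark that magnitude-only autoencoders "store the phase information and retrieve it at the output" can be illustrated with a minimal sketch. This is not the authors' code: it uses a single naive DFT frame in place of an STFT or MCLT, involves no neural network, and all names (`dft`, `idft`, `max_abs_error`, the test signal) are hypothetical choices made for illustration. It only shows why phase matters: a frame rebuilt from magnitude plus stored phase matches the input, while a frame rebuilt from magnitude alone does not.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def max_abs_error(a, b):
    """Largest sample-wise absolute difference between two sequences."""
    return max(abs(u - v) for u, v in zip(a, b))


# Assumed toy input: one 16-sample frame of a sinusoid with a phase offset.
N = 16
frame = [math.sin(2 * math.pi * 2 * t / N + 0.7) for t in range(N)]

# Split the spectrum into the two parts the abstract talks about.
spectrum = dft(frame)
magnitude = [abs(c) for c in spectrum]
phase = [cmath.phase(c) for c in spectrum]

# Strategy 1: keep the original phase aside and reattach it at the output.
with_stored_phase = idft([cmath.rect(m, p) for m, p in zip(magnitude, phase)])

# Strategy 2: discard the phase entirely (zero phase at every bin).
with_zero_phase = idft([complex(m, 0.0) for m in magnitude])

err_with_phase = max_abs_error(frame, with_stored_phase)
err_zero_phase = max_abs_error(frame, with_zero_phase)

print("error with stored phase:", err_with_phase)
print("error with zero phase:  ", err_zero_phase)
```

A phase regenerator such as Griffin-Lim sits between these two extremes: it estimates a plausible phase from the magnitudes instead of copying the original one.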
@article{cámara2022phase-aware,
author={Cámara, Mateo and Blanco, José Luis},
journal={Journal of the Audio Engineering Society},
title={Phase-Aware Transformations in Variational Autoencoders for Audio Effects},
year={2022},
volume={70},
number={9},
pages={731-741},
doi={https://doi.org/10.17743/jaes.2022.0042},
month={september},
}
TY - paper
TI - Phase-Aware Transformations in Variational Autoencoders for Audio Effects
SP - 731
EP - 741
AU - Cámara, Mateo
AU - Blanco, José Luis
PY - 2022
JO - Journal of the Audio Engineering Society
IS - 9
VO - 70
VL - 70
Y1 - September 2022
Authors:
Cámara, Mateo; Blanco, José Luis
Affiliations:
Information Processing and Telecommunication Center, Universidad Politécnica de Madrid, Madrid, Spain; Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain (See document for exact affiliation information.)
JAES Volume 70 Issue 9 pp. 731-741; September 2022
Publication Date:
September 12, 2022
Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=21885