Style Transfer of Audio Effects with Differentiable Signal Processing
Citation & Abstract
C. J. Steinmetz, N. J. Bryan, and J. D. Reiss, "Style Transfer of Audio Effects with Differentiable Signal Processing," J. Audio Eng. Soc., vol. 70, no. 9, pp. 708-721, September 2022. doi: https://doi.org/10.17743/jaes.2022.0025
Abstract: This work presents a framework to impose the audio effects and production style from one recording to another by example with the goal of simplifying the audio production process. A deep neural network was trained to analyze an input recording and a style reference recording and predict the control parameters of audio effects used to render the output. In contrast to past work, this approach integrates audio effects as differentiable operators, enabling backpropagation through audio effects and end-to-end optimization with an audio-domain loss. Pairing this framework with a self-supervised training strategy enables automatic control of audio effects without the use of any labeled or paired training data. A survey of existing and new approaches for differentiable signal processing is presented, demonstrating how each can be integrated into the proposed framework along with a discussion of their trade-offs. The proposed approach is evaluated on both speech and music tasks, demonstrating generalization both to unseen recordings and even sample rates different than those during training. Convincing production style transfer results are demonstrated with the ability to transform input recordings to produced recordings, yielding audio effect control parameters that enable interpretability and user interaction.
@article{steinmetz2022style,
  author={Steinmetz, Christian J. and Bryan, Nicholas J. and Reiss, Joshua D.},
  journal={Journal of the Audio Engineering Society},
  title={Style Transfer of Audio Effects with Differentiable Signal Processing},
  year={2022},
  volume={70},
  number={9},
  pages={708--721},
  doi={10.17743/jaes.2022.0025},
  month={September},
}
TY - JOUR
TI - Style Transfer of Audio Effects with Differentiable Signal Processing
SP - 708
EP - 721
AU - Steinmetz, Christian J.
AU - Bryan, Nicholas J.
AU - Reiss, Joshua D.
PY - 2022
JO - Journal of the Audio Engineering Society
IS - 9
VL - 70
Y1 - 2022/09
DO - 10.17743/jaes.2022.0025
Open Access
Authors:
Steinmetz, Christian J.; Bryan, Nicholas J.; Reiss, Joshua D.
Affiliations:
Centre for Digital Music, Queen Mary University of London, London, UK; Adobe Research, San Francisco, CA; Centre for Digital Music, Queen Mary University of London, London, UK. JAES Volume 70 Issue 9 pp. 708-721; September 2022
Publication Date:
September 12, 2022
Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=21883