WaveBeat: End-to-end beat and downbeat tracking in the time domain
Citation & Abstract
Christian J. Steinmetz and Joshua D. Reiss, "WaveBeat: End-to-end beat and downbeat tracking in the time domain," Engineering Brief 655, (2021 October).
Abstract: Deep learning approaches for beat and downbeat tracking have brought advancements. However, these approaches continue to rely on hand-crafted, subsampled spectral features as input, restricting the information available to the model. In this work, we propose WaveBeat, an end-to-end approach for joint beat and downbeat tracking operating directly on waveforms. This method forgoes engineered spectral features, and instead, produces beat and downbeat predictions directly from the waveform, the first of its kind for this task. Our model utilizes temporal convolutional networks (TCNs) operating on waveforms that achieve a very large receptive field (≥ 30 s) at audio sample rates in a memory efficient manner by employing rapidly growing dilation factors with fewer layers. With a straightforward data augmentation strategy, our method outperforms previous state-of-the-art methods on some datasets, while producing comparable results on others, demonstrating the potential for time domain approaches.
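The abstract's claim about "rapidly growing dilation factors with fewer layers" can be illustrated with a short receptive-field calculation. The sketch below is not WaveBeat's actual architecture; the kernel size (15), sample rate (22.05 kHz), and dilation patterns are assumed values chosen only to show why a fast-growing dilation schedule reaches a 30 s receptive field with far fewer stacked convolution layers than the usual doubling schedule.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated 1-D convolutions."""
    return 1 + (kernel_size - 1) * sum(dilations)

def layers_needed(base, kernel_size=15, sample_rate=22050, target_s=30):
    """Smallest layer count whose dilation pattern base**i covers target_s seconds."""
    target = target_s * sample_rate
    layers = 1
    while receptive_field(kernel_size, [base**i for i in range(layers)]) < target:
        layers += 1
    return layers

# Doubling dilations (the common TCN choice) vs. a rapidly growing factor of 8:
print(layers_needed(2))  # → 16 layers to cover 30 s of audio
print(layers_needed(8))  # → 7 layers for the same coverage
```

Under these assumptions, multiplying the dilation by 8 per layer instead of 2 cuts the depth needed for a 30 s receptive field from 16 layers to 7, which is the memory-efficiency argument the abstract makes.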
@article{steinmetz2021wavebeat,
  author={Steinmetz, Christian J. and Reiss, Joshua D.},
  journal={Journal of the Audio Engineering Society},
  title={WaveBeat: End-to-end beat and downbeat tracking in the time domain},
  year={2021},
  volume={},
  number={},
  pages={},
  doi={},
  month={October},
  abstract={Deep learning approaches for beat and downbeat tracking have brought advancements. However, these approaches continue to rely on hand-crafted, subsampled spectral features as input, restricting the information available to the model. In this work, we propose WaveBeat, an end-to-end approach for joint beat and downbeat tracking operating directly on waveforms. This method forgoes engineered spectral features, and instead, produces beat and downbeat predictions directly from the waveform, the first of its kind for this task. Our model utilizes temporal convolutional networks (TCNs) operating on waveforms that achieve a very large receptive field (≥ 30 s) at audio sample rates in a memory efficient manner by employing rapidly growing dilation factors with fewer layers. With a straightforward data augmentation strategy, our method outperforms previous state-of-the-art methods on some datasets, while producing comparable results on others, demonstrating the potential for time domain approaches.},
}
TY - paper
TI - WaveBeat: End-to-end beat and downbeat tracking in the time domain
SP -
EP -
AU - Steinmetz, Christian J.
AU - Reiss, Joshua D.
PY - 2021
JO - Journal of the Audio Engineering Society
IS -
VO -
VL -
Y1 - October 2021
AB - Deep learning approaches for beat and downbeat tracking have brought advancements. However, these approaches continue to rely on hand-crafted, subsampled spectral features as input, restricting the information available to the model. In this work, we propose WaveBeat, an end-to-end approach for joint beat and downbeat tracking operating directly on waveforms. This method forgoes engineered spectral features, and instead, produces beat and downbeat predictions directly from the waveform, the first of its kind for this task. Our model utilizes temporal convolutional networks (TCNs) operating on waveforms that achieve a very large receptive field (≥ 30 s) at audio sample rates in a memory efficient manner by employing rapidly growing dilation factors with fewer layers. With a straightforward data augmentation strategy, our method outperforms previous state-of-the-art methods on some datasets, while producing comparable results on others, demonstrating the potential for time domain approaches.
Authors:
Steinmetz, Christian J.; Reiss, Joshua D.
Affiliation:
Queen Mary University of London, UK
AES Convention:
151 (October 2021), eBrief: 655
Publication Date:
October 13, 2021
Subject:
Applications in audio
Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=21518
The Engineering Briefs at this Convention were selected on the basis of a submitted synopsis, ensuring that they are of interest to AES members and are not overly commercial. These briefs have been reproduced from the authors' advance manuscripts, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for their contents. Paper copies are not available, but any member can freely access these briefs. Members are encouraged to provide comments that enhance their usefulness.