Music Structure Boundaries Estimation Using Multiple Self-Similarity Matrices as Input Depth of Convolutional Neural Networks
Citation & Abstract
A. Cohen-Hadria and G. Peeters, "Music Structure Boundaries Estimation Using Multiple Self-Similarity Matrices as Input Depth of Convolutional Neural Networks," presented at the 2017 AES International Conference on Semantic Audio, Paper 5-3 (2017 June).
Abstract: In this paper we propose a new representation as input to a Convolutional Neural Network (CNN) for detecting music structure boundaries. For this task, previous works used a late fusion of two networks, one trained on Mel-scaled Log-magnitude Spectrograms (MLS) and one trained on lag matrices. We propose instead to use several self-similarity matrices, each computed from a different audio descriptor, and to combine them along the depth (channel) dimension of the input layer. We show that this representation improves boundary-detection results compared with using the lag matrix. We also show that the depth of the input layer provides a convenient means for early fusion of representations.
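The early-fusion idea described in the abstract can be sketched as follows: compute one self-similarity matrix (SSM) per audio descriptor on a shared frame grid, then stack the SSMs as the channels of a single input tensor so that the first convolutional layer sees all views at once. This is a minimal NumPy sketch under stated assumptions: cosine similarity, the function names `cosine_ssm` and `stack_ssms`, and the random "MFCC-like" and "chroma-like" placeholder features are hypothetical and are not taken from the paper, whose exact descriptors and similarity measure are not given on this page.

```python
import numpy as np


def cosine_ssm(features: np.ndarray) -> np.ndarray:
    """Cosine self-similarity matrix S[i, j] for a (frames, dims) feature array.

    Cosine similarity is an assumption made for this sketch.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # avoid division by zero on silent frames
    return unit @ unit.T


def stack_ssms(feature_sets: list) -> np.ndarray:
    """Early fusion: one SSM per descriptor, stacked on a depth axis -> (C, N, N)."""
    frame_counts = {f.shape[0] for f in feature_sets}
    if len(frame_counts) != 1:
        raise ValueError("all descriptors must share the same frame grid")
    return np.stack([cosine_ssm(f) for f in feature_sets], axis=0)


rng = np.random.default_rng(0)
n_frames = 64
# Hypothetical stand-ins for real descriptors (random data, not audio features).
mfcc_like = rng.standard_normal((n_frames, 13))
chroma_like = rng.random((n_frames, 12))
x = stack_ssms([mfcc_like, chroma_like])
print(x.shape)  # (2, 64, 64): C = 2 descriptors used as input depth
```

With late fusion, each representation would feed its own network and only the outputs would be merged; here the fusion happens before the first convolution, and the number of input channels simply equals the number of descriptors. For a channels-first CNN, `x[np.newaxis]` adds the batch axis, giving shape `(1, C, N, N)`.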
@inproceedings{cohen-hadria2017music,
author={Cohen-Hadria, Alice and Peeters, Geoffroy},
booktitle={2017 AES International Conference on Semantic Audio},
title={Music Structure Boundaries Estimation Using Multiple Self-Similarity Matrices as Input Depth of Convolutional Neural Networks},
year={2017},
month={June},
note={Paper 5-3}
}
TY  - CPAPER
TI  - Music Structure Boundaries Estimation Using Multiple Self-Similarity Matrices as Input Depth of Convolutional Neural Networks
AU  - Cohen-Hadria, Alice
AU  - Peeters, Geoffroy
PY  - 2017
Y1  - 2017/06/13/
T2  - 2017 AES International Conference on Semantic Audio
ER  - 
Authors:
Cohen-Hadria, Alice; Peeters, Geoffroy
Affiliation:
IRCAM, Paris, France
AES Conference:
2017 AES International Conference on Semantic Audio (June 2017)
Paper Number:
5-3
Publication Date:
June 13, 2017
Subject:
Deep Learning
Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=18763