Generative Modeling of Metadata for Machine Learning Based Audio Content Classification
Citation & Abstract
S. G. Bharitkar, "Generative Modeling of Metadata for Machine Learning Based Audio Content Classification," Engineering Brief 564, (2019 October). doi:
Abstract: Automatic content classification is an essential tool in multimedia applications. Current research on audio-based classifiers looks at short- and long-term analysis of signals, using both temporal and spectral features. In this paper we present a neural network that classifies content as movie (cinematic, TV shows), music, or voice using metadata contained in the audio/video stream. Toward this end, statistical models of the various metadata are created, since a large metadata dataset is not available. Synthetic metadata are then generated from these statistical models and fed to the ML classifier as feature vectors. The resulting classifier is able to classify real-world content (e.g., YouTube) with an accuracy of approximately 90% and very low latency (approximately 7 ms on average) based on real-world metadata.
@article{bharitkar2019generative,
  author={Bharitkar, Sunil G.},
  journal={Journal of the Audio Engineering Society},
  title={Generative Modeling of Metadata for Machine Learning Based Audio Content Classification},
  year={2019},
  volume={},
  number={},
  pages={},
  doi={},
  month={October},
}
TY - paper
TI - Generative Modeling of Metadata for Machine Learning Based Audio Content Classification
SP -
EP -
AU - Bharitkar, Sunil G.
PY - 2019
JO - Journal of the Audio Engineering Society
IS -
VO -
VL -
Y1 - October 2019
Author:
Bharitkar, Sunil G.
Affiliation:
HP Labs., Inc., San Francisco, CA, USA
AES Convention:
147 (October 2019), eBrief: 564
Publication Date:
October 8, 2019
Subject:
Applications in Audio
Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=20587
The Engineering Briefs at this Convention were selected on the basis of a submitted synopsis, ensuring that they are of interest to AES members and are not overly commercial. These briefs have been reproduced from the authors' advance manuscripts, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for their contents. Paper copies are not available, but any member can freely access these briefs. Members are encouraged to provide comments that enhance their usefulness.