InSE-NET: A Perceptually Coded Audio Quality Model based on CNN
Citation & Abstract
G. Jiang, A. Biswas, C. Bergler, and A. Maier, "InSE-NET: A Perceptually Coded Audio Quality Model based on CNN," AES 151st Convention, Paper 10514 (2021 October).
Abstract: Automatic quality assessment of coded audio is an important task whose progress is hampered by the scarcity of human annotations, by poor generalization to unseen codecs, bitrates, and content types, and by the inflexibility of existing approaches. One typical human-perception-related metric, ViSQOL v3 (ViV3), has been shown to correlate highly with quality scores rated by humans. In this study, we take steps to tackle the problem of predicting coded audio quality by relying entirely on programmatically generated data that is informed by expert domain knowledge. We propose a learnable neural network, InSE-NET, with a backbone of Inception and Squeeze-and-Excitation modules, to assess the perceived quality of coded audio at a 48 kHz sample rate. We demonstrate that synthetic data augmentation can improve the prediction. Our proposed method is intrusive, i.e., it requires Gammatone spectrograms of the unencoded reference signals. Besides performing comparably to ViV3, our approach provides more robust predictions at higher bitrates.
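For readers unfamiliar with the Squeeze-and-Excitation (SE) modules named in the abstract, here is a minimal pure-Python sketch of a generic SE block. It assumes the usual recipe: global average pooling per channel ("squeeze"), a two-layer bottleneck with ReLU and sigmoid ("excitation"), and channel-wise rescaling of the input. This is not the InSE-NET implementation. The function names, the bias-free weight matrices, and the toy sizes are hypothetical, and the paper's actual layer sizes, reduction ratio, and module placement are not given on this page.

```python
import math
from typing import List

# One channel's feature map: H rows by W columns (a toy stand-in for a
# time-frequency input such as a Gammatone spectrogram; hypothetical sizes).
Matrix = List[List[float]]


def squeeze(feature_maps: List[Matrix]) -> List[float]:
    """Squeeze: global average pooling, one descriptor per channel."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0])) for fm in feature_maps]


def matvec(weights: Matrix, vec: List[float]) -> List[float]:
    """Fully connected layer; the bias term is omitted here for brevity."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def excite(z: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Excitation: C -> C/r with ReLU, then C/r -> C with sigmoid gates in (0, 1)."""
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    return [1.0 / (1.0 + math.exp(-a)) for a in matvec(w2, hidden)]


def se_block(feature_maps: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Scale: multiply every value in a channel by that channel's gate."""
    gates = excite(squeeze(feature_maps), w1, w2)
    return [[[x * g for x in row] for row in fm] for fm, g in zip(feature_maps, gates)]


if __name__ == "__main__":
    # Hypothetical toy input: C = 2 channels of 2x2 values, reduction ratio r = 2.
    maps = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 0.0]]]
    w1 = [[1.0, 1.0]]        # shape (C/r, C) = (1, 2)
    w2 = [[1.0], [-1.0]]     # shape (C, C/r) = (2, 1)
    print(squeeze(maps))     # [4.0, 0.0]
    print(se_block(maps, w1, w2))
```

With these toy weights the first channel receives a gate of about 0.982 and is largely kept, while the second would be scaled by about 0.018. In a trained network the weights are learned, so the block can emphasize informative channels and suppress less useful ones at little extra cost.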
@inproceedings{jiang2021inse-net:,
author={Jiang, Guanxin and Biswas, Arijit and Bergler, Christian and Maier, Andreas},
booktitle={Audio Engineering Society Convention 151},
title={{InSE-NET}: A Perceptually Coded Audio Quality Model based on {CNN}},
year={2021},
month={October},
note={Paper 10514},
}
TY  - CPAPER
TI  - InSE-NET: A Perceptually Coded Audio Quality Model based on CNN
AU  - Jiang, Guanxin
AU  - Biswas, Arijit
AU  - Bergler, Christian
AU  - Maier, Andreas
PY  - 2021
Y1  - 2021/10/13/
T2  - Audio Engineering Society Convention 151
ER  - 
Authors:
Jiang, Guanxin; Biswas, Arijit; Bergler, Christian; Maier, Andreas
Affiliations:
Dolby Germany GmbH; Pattern Recognition Lab, FAU Erlangen-Nuremberg, Erlangen, Germany (See document for exact affiliation information.)
AES Convention:
151 (October 2021)
Paper Number:
10514
Publication Date:
October 13, 2021
Subject:
Audio quality
Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=21478