Stereo InSE-NET: Stereo Audio Quality Predictor Transfer Learned from Mono InSE-NET
Citation & Abstract
A. Biswas and G. Jiang, "Stereo InSE-NET: Stereo Audio Quality Predictor Transfer Learned from Mono InSE-NET," Express Paper 21, (October 2022).
Abstract: Automatic coded audio quality predictors are typically designed to evaluate single channels without considering any spatial aspects. With InSE-NET [1], we demonstrated mimicking a state-of-the-art coded audio quality metric (ViSQOL-v3 [2]) with deep neural networks (DNNs) and subsequently improving it, entirely with programmatically generated data. In this study, we take steps towards building a DNN-based coded stereo audio quality predictor and propose an extension of InSE-NET for handling stereo signals. The design accounts for stereo/spatial aspects by conditioning the model on the left, right, mid, and side channels, and we name the model Stereo InSE-NET. By transferring selected weights from the pre-trained mono InSE-NET and retraining with both real and synthetically augmented listening tests, we demonstrate a significant improvement of 12% and 6% in Pearson's and Spearman's rank correlation coefficients, respectively, over the latest ViSQOL-v3 [3].
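The abstract mentions conditioning the model on left, right, mid, and side channels. As a rough illustration only, the sketch below derives mid and side from a left/right pair in plain Python. The function name `stereo_to_lrms` is hypothetical, and the scaling (mid = (L + R) / 2, side = (L - R) / 2) is an assumed common convention; the abstract does not state which scaling or input representation Stereo InSE-NET actually uses.

```python
from typing import List, Sequence


def stereo_to_lrms(left: Sequence[float], right: Sequence[float]) -> List[List[float]]:
    """Return [left, right, mid, side] as four equal-length channels.

    Assumption: mid = (L + R) / 2 and side = (L - R) / 2. This scaling is a
    common convention, not something confirmed by the abstract.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same number of samples")
    # Mid carries what the two channels share; side carries their difference.
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return [list(left), list(right), mid, side]


# Three illustrative samples per channel (made-up values).
channels = stereo_to_lrms([0.5, -0.25, 0.0], [0.5, 0.25, -1.0])
```

With this convention the original channels can be recovered as L = mid + side and R = mid - side, so the four-channel stack is redundant by construction; it simply exposes the spatial difference signal explicitly.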
@article{biswas2022stereo,
author={Biswas, Arijit and Jiang, Guanxin},
journal={Journal of the Audio Engineering Society},
title={Stereo {InSE-NET}: Stereo Audio Quality Predictor Transfer Learned from Mono {InSE-NET}},
year={2022},
volume={},
number={},
pages={},
doi={},
month={October},
}
TY - Applications in Audi
TI - Stereo InSE-NET: Stereo Audio Quality Predictor Transfer Learned from Mono InSE-NET
SP -
EP -
AU - Biswas, Arijit
AU - Jiang, Guanxin
PY - 2022
JO - Journal of the Audio Engineering Society
IS -
VO -
VL -
Y1 - October 2022
Authors:
Biswas, Arijit; Jiang, Guanxin
Affiliations:
Dolby Germany GmbH; Dolby Germany GmbH (See document for exact affiliation information.)
Express Paper 21; AES Convention 153; October 2022
Publication Date:
October 19, 2022
Subject:
Applications in Audio
Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=21902