Single-Ended Speech Quality Prediction Based on Automatic Speech Recognition
Citation & Abstract
R. Huber, J. Ooster, and B. T. Meyer, "Single-Ended Speech Quality Prediction Based on Automatic Speech Recognition," J. Audio Eng. Soc., vol. 66, no. 10, pp. 759-769 (2018 October). doi: https://doi.org/10.17743/jaes.2018.0041
Abstract: Quality evaluation of digitally-transmitted speech is an important prerequisite to ensure the required quality of telecommunication service. Although formal subjective listening tests still represent the gold standard, they are time-consuming and costly. A new single-ended speech quality measure is proposed that uses a deep neural network (DNN)-based automatic speech recognition system. A quality measure is used to quantify the degradation of the DNN output caused by speech distortions. The new method was evaluated using five databases containing nine subsets of data covering several conditions of narrowband and broadband speech that was degraded by speech codecs, telecommunication networks, clipping, chopped speech, echoes, competing speakers, and additional background noises. Other than the training data set, evaluation results with the remaining eight data subsets showed good average correlations with subjective speech quality ratings achieved without any task-specific training or optimizations. These average results are close to those achieved with the American National Standard ANIQUE+ and clearly better than those obtained with the ITU-T standard P.563.
@article{huber2018single-ended,
  author={Huber, Rainer and Ooster, Jasper and Meyer, Bernd T.},
  journal={Journal of the Audio Engineering Society},
  title={Single-Ended Speech Quality Prediction Based on Automatic Speech Recognition},
  year={2018},
  volume={66},
  number={10},
  pages={759--769},
  doi={10.17743/jaes.2018.0041},
  month={October},
}
TY - JOUR
TI - Single-Ended Speech Quality Prediction Based on Automatic Speech Recognition
SP - 759
EP - 769
AU - Huber, Rainer
AU - Ooster, Jasper
AU - Meyer, Bernd T.
PY - 2018
JO - Journal of the Audio Engineering Society
IS - 10
VL - 66
Y1 - 2018/10
DO - 10.17743/jaes.2018.0041
ER -
Authors:
Huber, Rainer; Ooster, Jasper; Meyer, Bernd T.
Affiliation:
Medizinische Physik and Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
JAES Volume 66 Issue 10 pp. 759-769; October 2018
Publication Date:
October 16, 2018
Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19859