A general method for predicting the subjective quality of speech codecs has been developed. The method is based on the concept of an internal sound representation: a model of the human auditory system is used to calculate the internal representation of the input and output signals of a speech codec. The transformation from the physical domain to the psychophysical (internal) domain is performed by three operations: frequency warping, time-frequency smearing, and level compression. These operations allow modeling of the masking behavior of the human auditory system both at and above the masked threshold. It is shown that, for the determination of speech codec quality, no time-frequency smearing has to be applied. This is in contrast with the results found for music codecs, for which the applied model parameters were in line with psychoacoustic data. Nevertheless, the resulting perceptual speech-quality measure (PSQM) can be used to predict the quality of speech codecs. The PSQM was optimized using the ETSI GSM speech codec test and validated with the subjective results of the CCITT LD-CELP (G.728) speech codec test. The correlation between the objective mean opinion scores (MOS) predicted with the PSQM and the subjective MOS results of the CCITT LD-CELP database was very high (0.99), with a low standard deviation (0.14). The predictions made with the PSQM are compared with those of four other speech-quality measures. The results show that the PSQM has the highest correlation and the lowest standard deviation.
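As a rough illustration of how figures such as the correlation and standard deviation quoted above can be computed, the following minimal sketch compares a list of predicted MOS values with the corresponding subjective MOS values. The function names are hypothetical, and the sketch assumes that the standard deviation is the sample standard deviation of the prediction errors; the abstract does not state the exact definition used.

```python
import math


def pearson_correlation(predicted, subjective):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(subjective) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_s = sum(subjective) / n
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(predicted, subjective))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    if var_p == 0 or var_s == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_s)


def prediction_error_std(predicted, subjective):
    """Sample standard deviation of the errors (predicted - subjective).

    Assumption: this is one plausible reading of "standard deviation";
    the source does not define the quantity.
    """
    n = len(predicted)
    if n != len(subjective) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    errors = [p - s for p, s in zip(predicted, subjective)]
    mean_e = sum(errors) / n
    return math.sqrt(sum((e - mean_e) ** 2 for e in errors) / (n - 1))
```

Both functions take one value per test condition, for example one predicted and one subjective MOS per codec configuration. A high correlation combined with a small error spread indicates that the objective measure tracks the listening-test results closely.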