The Influence of Cross-Modal Interaction on Audio-Visual Speech Quality Perception
Combined audio-visual services are expected to become a feature of the next generation of telecommunication systems. Current systems apply perceptually motivated data reduction to the audio and video streams individually. A system that applies perceptually motivated data reduction to the combined audio-visual content may achieve greater data reduction while maintaining a high-quality service over low bit-rate transmission media. To design such bimodal codecs and optimize overall performance through the use of low bit-rate codecs, it is important to understand the trade-offs between audio quality and video quality. This paper compares the results of a complementary pair of subjective experiments designed to investigate the differences between visual-speech and non-visual-speech quality perception and quality mismatch. Conclusions are drawn about the variation in cross-modal interaction, particularly speech quality perception, for different audio-visual content. The results obtained will be used in the design of a bimodal perceptual model.