Unified Speech and Audio Coding is the newest MPEG audio standard, published in late 2011. It achieves consistently state-of-the-art compression performance for any mix of speech and music content. MPEG-1 and MPEG-2 Layer III and MPEG-4 Advanced Audio Coding (AAC) use perceptually shaped quantization noise as the primary tool for achieving compression; MPEG-4 High-Efficiency AAC adds parametric coding of the upper spectrum region (using the Spectral Band Replication tool); and MPEG-D MPEG Surround adds parametric coding of the sound stage (using level, time and coherence parameters in the time/frequency domain). The common thread in all of these MPEG standards is that they model and exploit how humans perceive sound. MPEG-D Unified Speech and Audio Coding incorporates all of these models of sound perception and adds a model of sound production, specifically that of human speech. This paper gives an overview of the architecture of the Unified Speech and Audio Coding algorithm and of how its compression tools operate in response to the instantaneous statistics of arbitrary mixed-content signals. It briefly describes the tools that contribute the greatest compression performance and presents results of subjective listening tests showing the performance of the standard relative to state-of-the-art benchmark coders.