Perceptually Optimized Cascaded Long Term Prediction of Polyphonic Signals for Enhanced MPEG-AAC
MPEG-4 Advanced Audio Coding (AAC) uses the long-term prediction (LTP) tool to exploit inter-frame correlations by providing a segment of previously reconstructed samples as the prediction for the current frame, which is naturally useful for encoding signals with a single periodic component. However, most audio signals are polyphonic in nature, containing a mixture of several periodic components. While such polyphonic signals are themselves periodic, with an overall period equal to the least common multiple of the individual component periods, the signal rarely remains sufficiently stationary over this extended period, rendering the LTP tool ineffective. Further hindering the LTP tool is the typical practice of selecting parameters by minimizing the mean squared error rather than the perceptual distortion criteria defined for audio coding. We thus propose a technique that exploits the correlation of each periodic component with its immediate past while taking the perceptual distortion criteria into account. Specifically, we propose cascading LTP filters corresponding to the individual periodic components, designed in a two-stage method: an initial set of parameters is estimated backward-adaptively to minimize the mean squared prediction error, followed by a refinement stage in which the parameters are adjusted to minimize the perceptual distortion. Objective and subjective results validate the effectiveness of the proposal on a variety of polyphonic signals.
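The cascading idea can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes each stage is a single-tap error filter of the form e[n] = x[n] - g * x[n - N], and that the lags and gains are already known. The backward-adaptive estimation and the perceptual refinement stage described above are not shown. The function names (`ltp_stage`, `cascaded_ltp_residual`, `energy`) and the toy periods of 50 and 73 samples are hypothetical choices made for illustration only.

```python
import math


def ltp_stage(x, lag, gain):
    """One single-tap LTP error filter: e[n] = x[n] - gain * x[n - lag].

    Samples before the start of the signal are treated as zero.
    """
    return [x[n] - (gain * x[n - lag] if n >= lag else 0.0)
            for n in range(len(x))]


def cascaded_ltp_residual(x, stages):
    """Pass x through each (lag, gain) stage in turn and return the residual."""
    e = list(x)
    for lag, gain in stages:
        e = ltp_stage(e, lag, gain)
    return e


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


# Toy "polyphonic" signal: two periodic components with periods 50 and 73.
# Their least common multiple (3650) exceeds the signal length, so one
# filter at the overall period has no past period to predict from.
N1, N2 = 50, 73
x = [math.sin(2 * math.pi * n / N1) + 0.8 * math.sin(2 * math.pi * n / N2)
     for n in range(2000)]

# Skip the start-up transient, where the filters see zeros instead of history.
start = N1 + N2

single = ltp_stage(x, N1, 1.0)                               # one period only
cascade = cascaded_ltp_residual(x, [(N1, 1.0), (N2, 1.0)])   # both periods

signal_energy = energy(x[start:])
single_energy = energy(single[start:])
cascade_energy = energy(cascade[start:])

print("signal energy:          ", signal_energy)
print("single-stage residual:  ", single_energy)
print("cascaded residual:      ", cascade_energy)
```

In this idealized, perfectly stationary case, the single filter removes only the component it is tuned to, while the cascade removes both, using only the most recent period of each component. Real signals would call for non-unit gains and parameters estimated from previously reconstructed samples, as the abstract describes.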