In speech/music coders and analysis/synthesis systems, spectral modeling is generally performed on a short-term (ST), frame-by-frame basis, which is justified by the fact that the signal is only locally (quasi-)stationary. The vocal tract configuration evolves slowly and smoothly, resulting in a high correlation between the spectral parameters of successive frames. Long-term modeling of the ST parameters exploits this correlation, at the cost of longer modeling/coding delays. The short-delay constraint can be relaxed in many applications, such as text-to-speech modification/synthesis, telephony surveillance data, digital answering machines, electronic voicemail, digital voice logging, electronic toys, and video games. The long-term harmonic plus noise model (LT-HNM) for speech offers additional data compression because it exploits the smooth evolution of the time trajectories of the short-term harmonic plus noise model parameters by applying a discrete cosine model (DCM) to them. In this paper, the authors extend the LT-HNM to a complete low bit-rate speech coder based on a long-term approach (ca. 200 ms). The proposed LT-HNM coder reaches a bit rate of 2.7 kbps for wideband speech.
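To make the idea of modeling a parameter trajectory with a few cosine coefficients concrete, the following is a minimal sketch and not the coder described in the paper. It uses a truncated DCT-II as a stand-in for the DCM, whose exact form the abstract does not give. The function names `dcm_fit` and `dcm_eval`, the 10 ms frame hop (20 frames for a segment of about 200 ms), the model order, and the synthetic pitch-like trajectory are all assumptions chosen for illustration.

```python
import math


def dcm_fit(trajectory, order):
    """Fit a low-order cosine model to a parameter trajectory.

    Assumption: a truncated DCT-II stands in for the DCM. Because the
    DCT-II basis is orthogonal on the frame grid, each coefficient is a
    scaled inner product, and the result is the least-squares fit.
    """
    n = len(trajectory)
    coeffs = []
    for p in range(order + 1):
        scale = 1.0 / n if p == 0 else 2.0 / n
        acc = sum(
            x * math.cos(math.pi * p * (k + 0.5) / n)
            for k, x in enumerate(trajectory)
        )
        coeffs.append(scale * acc)
    return coeffs


def dcm_eval(coeffs, n):
    """Rebuild an n-frame trajectory from the cosine coefficients."""
    return [
        sum(c * math.cos(math.pi * p * (k + 0.5) / n)
            for p, c in enumerate(coeffs))
        for k in range(n)
    ]


# Hypothetical example: 20 frames (about 200 ms at an assumed 10 ms hop)
# of a smooth, pitch-like trajectory in Hz.
num_frames = 20
trajectory = [
    120.0 + 10.0 * math.cos(math.pi * (k + 0.5) / num_frames)
    for k in range(num_frames)
]
coeffs = dcm_fit(trajectory, order=4)        # 5 values instead of 20
rebuilt = dcm_eval(coeffs, num_frames)
max_error = max(abs(a - b) for a, b in zip(rebuilt, trajectory))
print(len(coeffs), max_error)
```

In this sketch, 20 frame-wise values are replaced by 5 coefficients, which illustrates where the compression potential of a long-term model comes from when trajectories are smooth. A real coder would also need to quantize the coefficients and choose the model order per segment, which this example does not attempt.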