A coding methodology aimed at rate-distortion optimal sinusoid + noise coding of audio and speech signals is presented. The coder divides the input signal into variable-length time segments and distributes sinusoidal components over the segments such that the resulting distortion, as measured by a perceptual distortion measure, is minimized subject to a prespecified rate constraint. The coder is bit-rate scalable: for a given target bit budget it automatically adapts the segmentation and the distribution of sinusoids in a rate-distortion optimal manner. To quantize and encode the sinusoidal model parameters efficiently, the coder uses frequency-differential coding, which exploits intrasegment correlations. Compared with the time-differential coding techniques commonly used in audio and speech coders, this makes the coder more robust to packet losses on lossy packet channels. In a subjective listening experiment the coder performed as well as or better than a set of four MPEG-4 coders operating at bit rates of 16, 24, 32, and 48 kbit/s, each of which was state of the art for its target bit rate.
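The constrained minimization described above can be illustrated with a generic Lagrangian bit-allocation sketch. This is not the paper's algorithm: the segmentation is held fixed here (the coder also optimizes it), the per-segment distortion tables are assumed to be precomputed by some perceptual measure, and the flat cost `bits_per_sinusoid` and the function name `allocate_sinusoids` are hypothetical names introduced only for illustration.

```python
from typing import List, Sequence


def allocate_sinusoids(
    distortions: Sequence[Sequence[float]],
    bits_per_sinusoid: float,
    budget: float,
    iters: int = 50,
) -> List[int]:
    """Pick how many sinusoids each segment keeps under a total bit budget.

    distortions[s][k] is the (assumed, precomputed) distortion of segment s
    when its k strongest sinusoids are kept; k = 0 means none are kept.
    """

    def pick(lam: float) -> List[int]:
        # For a fixed multiplier, each segment independently minimizes
        # D + lam * R. Ties resolve to the smaller count (lower rate).
        counts = []
        for d in distortions:
            costs = [d[k] + lam * k * bits_per_sinusoid for k in range(len(d))]
            counts.append(min(range(len(d)), key=costs.__getitem__))
        return counts

    def rate(counts: List[int]) -> float:
        return sum(counts) * bits_per_sinusoid

    # Find an upper multiplier whose allocation fits the budget.
    lo, hi = 0.0, 1.0
    while rate(pick(hi)) > budget:
        hi *= 2.0

    # Bisect on the multiplier; hi always corresponds to a feasible allocation.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate(pick(mid)) > budget:
            lo = mid
        else:
            hi = mid
    return pick(hi)
```

A larger multiplier penalizes rate more heavily, so bisecting on it trades distortion against bits until the budget is just met. This approach only reaches allocations on the convex hull of the operational rate-distortion points, so it may leave part of the budget unused.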