Transmitting speech at the best possible quality over a weak narrowband network requires audio codecs that are robust to packet loss, operate at low latency, offer a very low bit rate, and preserve the original sound of the coded signal. Advanced speech codecs for real-time communication based on code-excited linear prediction achieve bit rates as low as 2 kbit/s. We propose a new coding approach that promises even lower bit rates through a synthesis method that is not based on the source-filter model, but solely on a lookup table of audio waveform snippets and their corresponding Mel-Frequency Cepstral Coefficients (MFCC). The encoder performs a nearest-neighbor search for the MFCC features of each incoming audio frame against the lookup table. This search is sped up considerably by building a multi-dimensional search tree over the MFCC features. In a speech coding application, only the index of the nearest neighbor in the lookup table would need to be transmitted for each audio frame. The decoder synthesizes the audio signal from the waveform snippets corresponding to the transmitted indices.
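
The lookup, transmission, and synthesis steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the tree type, so a k-d tree is assumed here as one common multi-dimensional search tree; the 13-dimensional feature vectors are random stand-ins for real MFCCs; and all names (`build_kdtree`, `nearest`, `encode`, `decode`, `table_features`, `table_snippets`) are hypothetical. Synthesis is reduced to plain concatenation of snippets, with no overlap or smoothing.

```python
import random

def build_kdtree(points, indices=None, depth=0):
    """Build a k-d tree over feature vectors; each node stores a table index."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])  # cycle through the feature dimensions
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2        # median entry becomes the splitting node
    return {
        "index": indices[mid],
        "axis": axis,
        "left": build_kdtree(points, indices[:mid], depth + 1),
        "right": build_kdtree(points, indices[mid + 1:], depth + 1),
    }

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, points, query, best=None):
    """Return (squared distance, table index) of the entry closest to query."""
    if node is None:
        return best
    cand = (sq_dist(points[node["index"]], query), node["index"])
    if best is None or cand < best:
        best = cand
    diff = query[node["axis"]] - points[node["index"]][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, points, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, points, query, best)
    return best

def encode(frame_features, tree, table_features):
    """Encoder: one lookup-table index per incoming frame."""
    return [nearest(tree, table_features, f)[1] for f in frame_features]

def decode(indices, table_snippets):
    """Decoder: concatenate the waveform snippets for the received indices."""
    signal = []
    for i in indices:
        signal.extend(table_snippets[i])
    return signal

# Hypothetical lookup table: 256 entries, 13 stand-in features, 8-sample snippets.
rng = random.Random(0)
table_features = [[rng.uniform(-1, 1) for _ in range(13)] for _ in range(256)]
table_snippets = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(256)]
tree = build_kdtree(table_features)

# Encode three synthetic frames and resynthesize them from the indices alone.
frames = [[rng.uniform(-1, 1) for _ in range(13)] for _ in range(3)]
indices = encode(frames, tree, table_features)
signal = decode(indices, table_snippets)
```

In this sketch only `indices` would cross the channel, so the cost per frame is roughly the base-2 logarithm of the table size in bits (8 bits for the assumed 256 entries). The table size and frame length used in the paper are not stated in the abstract.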