AES E-Library

AES E-Library

Deep Neural Network Based Guided Speech Bandwidth Extension

Up to today telephone speech is still limited to the range of 200 to 3400 Hz since the predominant codecs in public switched telephone networks are AMR-NB, G.711, and G.722 [1, 2, 3]. Blind bandwidth extension (blind BWE, BBWE) can improve the perceived quality as well as the intelligibility of coded speech without changing the transmission network or the speech codec. The BBWE used in this work is based on deep neural networks (DNNs) and has already shown good performance [4]. Although this BBWE enhances the speech without producing too many artifacts it sometimes fails to enhance prominent fricatives that can result in muffled speech. In order to better synthesize prominent fricatives the BBWE is extended by sending a single bit of side information—here referred to as guided BWE. This bit may be transmitted, e.g., by watermarking so that no changes to the transmission network or the speech codec have to be done. Different DNN con?gurations (including convolutional (Conv.) layers as well as long short-term memory layers (LSTM)) making use of this bit have been evaluated. The BBWE has a low computational complexity and an algorithmic delay of 12 ms only and can be applied in state-of-the-art speech and audio codecs.

AES Convention: Paper Number:
Publication Date:

Click to purchase paper as a non-member or login as an AES member. If your company or school subscribes to the E-Library then switch to the institutional version. If you are not an AES member and would like to subscribe to the E-Library then Join the AES!

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.

Learn more about the AES E-Library

E-Library Location:

Start a discussion about this paper!

AES - Audio Engineering Society