Speech de-noising algorithms often introduce artifacts, either by removing parts of the speech signal or by reducing noise imperfectly, so that the remaining noise sounds unnatural and disturbing. This contribution proposes to spatially distribute monaural noisy speech signals based on single-channel source separation in order to improve the perceived speech quality. Instead of simply suppressing the noise, stereo up-mixing is applied to the estimated speech and noise sources. The paper investigates the case of non-negative matrix factorization (NMF) speech enhancement applied to high levels of non-stationary noise. NMF-based and spectral subtraction speech enhancement algorithms are evaluated in a listening test in terms of speech intelligibility, presence of interfering noises, and overall quality with respect to the unprocessed signal. The listening test provides evidence for superior noise reduction by NMF, but also for a drop in perceived speech quality that is not captured by the employed set of common objective metrics. However, stereo up-mixing of NMF-separated speech and noise delivers high subjective noise reduction while preserving the perceived speech quality.
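The idea of separating first and then up-mixing, rather than suppressing, can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the abstract does not specify the NMF cost function, the dictionaries, or the up-mixing rule, so the following are assumptions made for illustration only. The sketch assumes pre-trained speech and noise dictionaries (`W_speech`, `W_noise`, hypothetical names), Euclidean multiplicative updates for the activations, ratio masks for source estimation, and constant-power panning with speech in the centre and noise to one side. It operates on a magnitude spectrogram `V` (frequency bins by frames); a full system would also need an STFT and its inverse.

```python
import numpy as np

def activations(V, W, n_iter=200, eps=1e-9, seed=0):
    """Estimate non-negative activations H so that W @ H approximates V.

    W is kept fixed (assumed pre-trained); only H is updated with the
    Euclidean multiplicative update rule.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def separate(V, W_speech, W_noise, eps=1e-9):
    """Split the mixture magnitude V into speech and noise estimates."""
    W = np.hstack([W_speech, W_noise])
    H = activations(V, W)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]          # speech model
    N = W_noise @ H[k:]           # noise model
    total = S + N + eps
    # Ratio masks applied to the mixture, so the estimates sum back to V.
    return (S / total) * V, (N / total) * V

def upmix(speech, noise, speech_pan=0.0, noise_pan=1.0):
    """Constant-power panning; pan in [-1, 1], -1 = left, +1 = right.

    The pan positions are illustrative assumptions, not values from the paper.
    """
    def gains(p):
        theta = (p + 1.0) * np.pi / 4.0
        return np.cos(theta), np.sin(theta)
    s_l, s_r = gains(speech_pan)
    n_l, n_r = gains(noise_pan)
    left = s_l * speech + n_l * noise
    right = s_r * speech + n_r * noise
    return np.stack([left, right])
```

Because the two ratio masks sum to one, no part of the mixture is discarded in this sketch: the noise estimate is moved to a different spatial position rather than removed, which is the contrast with plain suppression that the abstract draws.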