Single-channel speech source separation (SCSSS) is a research field with applications that include hearing aids and security. This research presents a hybrid method for SCSSS that combines two different approaches selected by voicing state; the algorithm can be used for both speech source separation and speech enhancement. The hybrid method applies subspace decomposition to unvoiced speech and Soft-CASA (Computational Auditory Scene Analysis) to voiced speech. The voiced-speech separation stage is an improved version of the conventional CASA system, optimized through the use of a soft mask, while the unvoiced-speech separation stage relies on an optimized approximation of the speech signal by subspace decomposition in the spectral domain. The new system is evaluated both for separation quality and for voicing-decision accuracy. Despite the challenging acoustic environments used for testing, the proposed speech separation approach yields average improvements of 58.91% in signal-to-interference ratio, 12.67% in signal-to-artifact ratio, 38.91% in signal-to-distortion ratio, and 45% in perceived speech quality.
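To make the soft-mask idea concrete, the following is a minimal sketch of applying a Wiener-style soft ratio mask to one frame of a mixture spectrum. This is an illustration of the general soft-mask principle only, not the authors' Soft-CASA implementation; the function names and the ratio-mask formula are assumptions for the example.

```python
import numpy as np

def soft_mask(target_mag, interferer_mag, eps=1e-12):
    # Wiener-style ratio mask: values in [0, 1] per time-frequency bin.
    # (Hypothetical helper; the paper's actual mask may differ.)
    t2 = target_mag ** 2
    i2 = interferer_mag ** 2
    return t2 / (t2 + i2 + eps)

def separate_frame(mixture_spec, target_mag, interferer_mag):
    # Unlike a binary CASA mask, the soft mask scales each bin
    # continuously instead of keeping or discarding it outright.
    return soft_mask(target_mag, interferer_mag) * mixture_spec

# Toy example: two frequency bins; the target dominates bin 0,
# the interferer dominates bin 1.
mix = np.array([1.0 + 0j, 1.0 + 0j])
tgt = np.array([2.0, 0.1])
intf = np.array([0.1, 2.0])
est = separate_frame(mix, tgt, intf)
```

In this toy case the mask passes nearly all of bin 0 and suppresses nearly all of bin 1, which is the behavior a binary mask would approximate with a hard 1/0 decision.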