Binary masking is a common technique for separating target audio from an interferer. Its use is often justified by the high signal-to-noise ratio achieved. The mask can introduce musical noise artifacts, limiting its perceptual performance and that of techniques that use it. Three mask-processing techniques, involving adding noise or cepstral smoothing, are tested and the processed masks are compared to the ideal binary mask using the perceptual evaluation for audio source separation (PEASS) toolkit. Each processing technique's parameters are optimized before the comparison is made. Each technique is found to improve the overall perceptual score of the separation. Results show a trade-off between interferer suppression and artifact reduction.
This paper costs $33 for non-members and is free for AES members and E-Library subscribers.