Microphone arrays provide spatial resolution that is useful for speech source separation, because sources located at different positions produce different time and level differences across the elements of the array. This feature can be combined with time-frequency masking to separate speech mixtures by means of clustering techniques, as in the so-called DUET algorithm, which uses only two microphones. However, in some applications larger arrays are available, and the separation can be performed using all of the microphones. A speech separation algorithm based on the mean shift clustering technique has recently been proposed, also using only two microphones. In this work, that algorithm is generalized to arrays with any number of microphones, and its performance is tested with echoic speech mixtures. The results show that the generalized mean shift algorithm notably outperforms the original DUET algorithm.
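The clustering step can be illustrated with a minimal sketch. It assumes that each time-frequency bin has already been reduced to a feature vector of level and delay differences measured against a reference microphone. The function names, the Gaussian kernel, the bandwidth value, and the synthetic two-source data are all assumptions made for illustration; this is not the implementation evaluated in the paper.

```python
import numpy as np


def mean_shift_modes(points, bandwidth, n_iter=50):
    """Gaussian-kernel mean shift (illustrative, hypothetical helper).

    Every point is moved iteratively towards the nearest density mode;
    converged estimates that lie close together are merged into one mode.
    Returns the modes and, for each input point, the index of its mode.
    """
    points = np.asarray(points, dtype=float)
    shifted = points.copy()
    for _ in range(n_iter):
        # Squared distances between current estimates and the original data.
        d2 = ((shifted[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        weights = np.exp(-0.5 * d2 / bandwidth ** 2)
        # Kernel-weighted mean of the data around each estimate.
        shifted = (weights @ points) / weights.sum(axis=1, keepdims=True)

    merge_tol = bandwidth / 2.0  # assumed merging threshold
    modes = []
    labels = np.empty(len(points), dtype=int)
    for i, p in enumerate(shifted):
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < merge_tol:
                labels[i] = k
                break
        else:
            modes.append(p)
            labels[i] = len(modes) - 1
    return np.array(modes), labels


def binary_masks(labels, n_modes):
    """One boolean mask per mode: True where a bin is assigned to that mode."""
    return [labels == k for k in range(n_modes)]


# Synthetic features standing in for (level difference, delay difference)
# of time-frequency bins dominated by one of two sources (assumed values).
rng = np.random.default_rng(0)
source_a = rng.normal(loc=[0.5, -1.0], scale=0.05, size=(200, 2))
source_b = rng.normal(loc=[-0.5, 1.0], scale=0.05, size=(200, 2))
features = np.vstack([source_a, source_b])

modes, labels = mean_shift_modes(features, bandwidth=0.3)
masks = binary_masks(labels, len(modes))
print("estimated number of sources:", len(modes))
```

In a real system the features would come from the short-time Fourier transforms of the microphone signals, with one level and delay difference per additional microphone, and each mask would be applied to a mixture spectrogram before inverting the transform. Unlike histogram peak picking, mean shift does not require the number of sources to be fixed in advance, although the result depends on the chosen bandwidth.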