AES E-Library

Harmonic Cues for Number of Simultaneous Speakers Estimation

Overlapped speech, in which several speakers talk simultaneously, is a common occurrence in multiparty discussions such as meetings. It poses a major challenge to automatic speech processing systems, including speech recognition and speaker diarisation systems. In recent speaker diarisation systems, a large portion of the remaining error comes from overlapped speech. So far, little work has been done on detecting overlapped speech or on estimating the number of speakers present in it. In this paper we first describe a model-based approach for estimating the number of simultaneous speakers. We then propose a new approach called Spectral Peak Clustering: instead of training statistical models, we extract spectral peaks from the input data and cluster them into components using a similarity measure between peaks, where each component represents a speaker present in the input data.
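
The abstract does not specify the peak picker or the similarity measure, so the following is only a minimal sketch of the general idea (extract spectral peaks, link similar peaks, count the resulting components). It should not be read as the authors' method. Everything named here is an assumption made for illustration: the functions `spectral_peaks`, `harmonically_related`, and `cluster_peaks`, the near-integer frequency-ratio test used as the similarity, the tolerance and threshold values, and the synthetic two-source test signal.

```python
import numpy as np

def spectral_peaks(frame, sample_rate, rel_threshold=0.1):
    """Return frequencies (Hz) of local maxima in the magnitude spectrum.

    Assumption: a peak is any local maximum above rel_threshold * global max.
    """
    windowed = frame * np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    inner = mag[1:-1]
    is_peak = (inner > mag[:-2]) & (inner >= mag[2:]) & (inner > rel_threshold * mag.max())
    return freqs[1:-1][is_peak]

def harmonically_related(f1, f2, tolerance=0.04):
    """Hypothetical similarity: the frequency ratio is close to an integer >= 2."""
    lo, hi = min(float(f1), float(f2)), max(float(f1), float(f2))
    ratio = hi / lo
    nearest = round(ratio)
    return nearest >= 2 and abs(ratio - nearest) <= tolerance

def cluster_peaks(peak_freqs, tolerance=0.04):
    """Label peaks by connected component under the similarity above."""
    n = len(peak_freqs)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first walk over linked peaks
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and harmonically_related(peak_freqs[i], peak_freqs[j], tolerance):
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Synthetic stand-in for two overlapping voiced sources (not real speech):
# fundamentals of 120 Hz and 190 Hz, three harmonics each.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # one second, so bins are 1 Hz apart
signal = sum(np.sin(2 * np.pi * f0 * h * t) / h
             for f0 in (120, 190) for h in (1, 2, 3))

peaks = spectral_peaks(signal, sample_rate)
labels = cluster_peaks(peaks)
estimated = len(set(labels))  # number of components = estimated speaker count
print(peaks, labels, estimated)
```

This toy case is deliberately favourable: the two fundamentals share no harmonics in the analysed range and every fundamental is itself a detected peak. Real overlapped speech has unvoiced segments, noise, and coinciding harmonics, which a simple ratio test would not handle.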


AES - Audio Engineering Society