This paper addresses the real-time frequency-domain analysis of speech signals and the extraction of perceptually meaningful features related to the glottal source, features that may pave the way for robust speaker identification and voice register classification. We use an analysis-synthesis framework derived from an audio coding algorithm to estimate and model the relative delays between the harmonics of the signal; these delays reflect both the contribution of the glottal source and the group delay of the vocal tract filter. We show that this approach effectively captures the shape invariance of a periodic signal and may be suited to monitoring and extracting, in real time, perceptually important features that correlate well with specific voice registers or with a speaker's unique sound signature. A first validation study confirms the competitive performance of the proposed approach in the automatic classification of the breathy, normal, and pressed phonation types.
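The idea of a relative delay between harmonics that is invariant to a time shift of the waveform can be illustrated with a minimal sketch. This is not the authors' codec-based framework: it assumes the fundamental frequency is already known, estimates each harmonic phase by a direct projection over an integer number of periods, and uses one possible sign convention for the delay, expressed as a fraction of each harmonic's period. The function names (`synth`, `harmonic_phases`, `relative_delays`) and all parameter values are hypothetical.

```python
import numpy as np

def synth(f0, fs, n_samples, amps, phases, shift=0):
    """Sum of cosine harmonics of f0, optionally shifted by `shift` samples."""
    n = np.arange(n_samples) + shift
    return sum(a * np.cos(2 * np.pi * (k + 1) * f0 * n / fs + p)
               for k, (a, p) in enumerate(zip(amps, phases)))

def harmonic_phases(x, f0, fs, n_harmonics):
    """Phase of each harmonic, by projecting x onto complex exponentials.

    Assumption: len(x) spans an integer number of periods of f0, so that
    the harmonics are mutually orthogonal over the frame.
    """
    n = np.arange(len(x))
    k = np.arange(1, n_harmonics + 1)
    basis = np.exp(-2j * np.pi * np.outer(k, n) * f0 / fs)
    return np.angle(basis @ x)

def relative_delays(phases):
    """Delay of harmonic k relative to the fundamental, in units of the
    period of harmonic k, wrapped to [0, 1). Assumed convention."""
    k = np.arange(1, len(phases) + 1)
    return np.mod((k * phases[0] - phases) / (2 * np.pi), 1.0)

# Illustrative values only: 200 Hz fundamental, 16 kHz sampling, 10 periods.
fs, f0, n_samples = 16000, 200, 800
amps, phis = [1.0, 0.5, 0.25], [0.3, 1.1, -0.7]

d_ref = relative_delays(harmonic_phases(synth(f0, fs, n_samples, amps, phis), f0, fs, 3))
d_shifted = relative_delays(
    harmonic_phases(synth(f0, fs, n_samples, amps, phis, shift=13), f0, fs, 3))
```

Under these assumptions, `d_ref` and `d_shifted` agree up to numerical error even though the individual harmonic phases differ between the two frames, which is the sense in which such a feature describes the shape of a periodic waveform rather than its position in time.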