This paper reports on refinements to a method of speaker identification based on the automated comparison of spectral, timbral, and temporal features unique to an individual’s speech production. The method was first described in Convention Paper 7274, presented by the co-author of this paper, Richard Sanders, at the 123rd Convention of the Audio Engineering Society. Since that first publication, the system (now referred to as SIDNI, or Speaker Identification by Numerical Imprint) has improved from 79% correct identifications in 78 comparisons from the speech of 26 males to 100% correct identifications in 150 comparisons from the speech of 50 males. This paper provides more detail on these results and on the results of several other tests, and elaborates on the specific speech characteristics exploited by the system and their potential for identification. These characteristics include average fundamental speaking frequency, the ratio of spectral densities below 1 kHz to those above 1 kHz (Alpha ratio), average rate of vowels, jitter, and shimmer.
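As a rough illustration of four of the characteristics named above, the following Python sketch computes average fundamental frequency, jitter, shimmer, and an Alpha ratio from pre-extracted measurements. The abstract does not give the paper's exact formulas, so the definitions here are assumptions: jitter and shimmer use the common "local" definition (mean absolute cycle-to-cycle difference divided by the mean), and the Alpha ratio is taken as a linear ratio of summed power below and above 1 kHz. All function names are hypothetical and are not taken from SIDNI.

```python
from statistics import mean


def relative_perturbation(values):
    """Mean absolute difference between consecutive values, divided by their mean.

    Assumed "local" perturbation measure; not necessarily the paper's definition.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter(periods_s):
    """Cycle-to-cycle variation of glottal period lengths (in seconds)."""
    return relative_perturbation(periods_s)


def shimmer(peak_amplitudes):
    """Cycle-to-cycle variation of per-cycle peak amplitudes."""
    return relative_perturbation(peak_amplitudes)


def average_f0(periods_s):
    """Average fundamental frequency in Hz, as the mean of per-cycle 1/period."""
    return mean(1.0 / t for t in periods_s)


def alpha_ratio(freqs_hz, power, split_hz=1000.0):
    """Summed spectral power below split_hz divided by summed power at or above it."""
    low = sum(p for f, p in zip(freqs_hz, power) if f < split_hz)
    high = sum(p for f, p in zip(freqs_hz, power) if f >= split_hz)
    if high == 0:
        raise ValueError("no spectral power at or above the split frequency")
    return low / high


# Illustrative values only; these are not data from the paper.
periods = [0.010, 0.011, 0.010, 0.009]   # seconds per glottal cycle
amplitudes = [1.0, 0.9, 1.1, 1.0]        # arbitrary linear units
print(jitter(periods), shimmer(amplitudes), average_f0(periods))
print(alpha_ratio([500.0, 1500.0], [4.0, 2.0]))
```

The sketch assumes that period lengths, peak amplitudes, and a power spectrum have already been extracted from the recording; pitch tracking and spectral estimation are outside its scope.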