Deep Neural Network Based Forensic Automatic Speaker Recognition in VOCALISE using x-Vectors

Kelly, Finnian; Forth, Oscar; Kent, Samuel; Gerlach, Linda; Alexander, Anil

AES E-Library


In this article we present a Deep Neural Network (DNN)-based version of the VOCALISE (Voice Comparison and Analysis of the Likelihood of Speech Evidence) forensic automatic speaker recognition system. DNNs mark a new phase in the evolution of automatic speaker recognition technology, providing a powerful framework for extracting highly discriminative speaker-specific features from a speech recording. The latest version of VOCALISE aims to preserve the ‘open-box’ philosophy of its predecessors, giving the forensic practitioner flexibility in configuring and training every part of the automatic speaker recognition pipeline. VOCALISE continues to support both legacy and state-of-the-art speaker modelling algorithms, the latest of which is the ‘x-vector’ framework, in which a DNN extracts a compact, fixed-length speaker representation from a recording. Here, we introduce the x-vector framework and its implementation in VOCALISE, and demonstrate its strong performance on forensically relevant data.
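To make the idea of a DNN producing a compact speaker representation more concrete, here is a minimal sketch of the general x-vector recipe from the literature. It is not VOCALISE's implementation, which this abstract does not describe. Frame-level layers see a short window of neighbouring feature frames, a statistics-pooling step collapses a recording of any length into one vector (per-dimension mean and standard deviation), and a segment-level affine layer maps that vector to the embedding. Everything here is an illustrative assumption: the names (`ToyXVector`, `splice`, `stats_pool`, `cosine`), the layer sizes, and the random, untrained weights are hypothetical. Frame splicing is a simplified stand-in for a TDNN layer, and cosine similarity replaces the PLDA backend and likelihood-ratio calibration that practical systems typically use.

```python
import math
import random

def splice(frames, context):
    """Concatenate each frame with its neighbours t-context..t+context
    (a simplified stand-in for a TDNN layer's temporal context)."""
    if len(frames) < 2 * context + 1:
        raise ValueError("recording too short for the splicing context")
    out = []
    for t in range(context, len(frames) - context):
        window = []
        for k in range(-context, context + 1):
            window.extend(frames[t + k])
        out.append(window)
    return out

def affine(x, weights, bias):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def stats_pool(frames):
    """Map a variable-length sequence to one vector: mean followed by std."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n) for d in range(dim)]
    return mean + std

def random_layer(rng, in_dim, out_dim):
    # Hypothetical untrained weights; a real system learns these by training
    # the network to classify many speakers.
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim

class ToyXVector:
    """Hypothetical toy model: frame layer -> stats pooling -> embedding layer."""

    def __init__(self, feat_dim=20, hidden=32, emb_dim=16, context=2, seed=0):
        rng = random.Random(seed)
        self.context = context
        self.frame_layer = random_layer(rng, feat_dim * (2 * context + 1), hidden)
        self.segment_layer = random_layer(rng, 2 * hidden, emb_dim)

    def embed(self, features):
        spliced = splice(features, self.context)
        hidden = [relu(affine(x, *self.frame_layer)) for x in spliced]
        pooled = stats_pool(hidden)
        # The embedding is the affine output of the first segment-level layer.
        return affine(pooled, *self.segment_layer)

def cosine(a, b):
    """Cosine similarity, used here in place of a PLDA backend."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)
```

Feeding the toy model a 50-frame and a 300-frame recording with 20-dimensional features yields two 16-dimensional embeddings of identical size. This is why the pooling step matters: recordings of very different durations become directly comparable vectors.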

Open Access

Authors: Kelly, Finnian; Forth, Oscar; Kent, Samuel; Gerlach, Linda; Alexander, Anil
Affiliations: Oxford Wave Research Ltd., Oxford, UK; Oxford Wave Research Ltd., Oxford, UK; Oxford Wave Research Ltd., Oxford, UK; Philipps-Universität Marburg, Germany; Oxford Wave Research Ltd., Oxford, UK (See document for exact affiliation information.)
AES Conference: 2019 AES International Conference on Audio Forensics (June 2019)
Paper Number: 27
Publication Date: June 8, 2019
Permalink: https://www.aes.org/e-lib/browse.cfm?elib=20477