A GMM Approach to Singing Language Identification

Kruspe, Anna M.; Abesser, Jakob; Dittmar, Christian

AES E-Library

Automatic language identification for singing has received little attention in recent years. Possible applications include searching for musical pieces in a particular language, improving similarity search algorithms for music, and improving regional music classification and genre classification. It could also help to mitigate the "glass ceiling" effect. Most existing approaches employ PPRLM (Parallel Phone Recognition followed by Language Modelling) processing. Recent publications show that GMM-based (Gaussian Mixture Model) approaches can now produce results comparable to those of PPRLM systems when certain audio features are used. Their advantages lie in their simplicity of implementation and their reduced training data requirements. So far, however, this has only been tested on speech data. In this paper, we therefore evaluate such a GMM-based approach for singing language identification. We test our system on speech data and a cappella singing, using MFCC (Mel-Frequency Cepstral Coefficient), TRAP (Temporal Pattern), and SDC (Shifted Delta Cepstrum) features. The results are comparable to the state of the art for singing language identification, but the approach is considerably simpler to implement because no phoneme-wise annotations are required. We obtain accuracies of 75% on speech data and 67.5% on a cappella data. To our knowledge, neither the GMM-based approach nor this feature combination has been used for singing language identification before.
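
As a rough illustration of the kind of pipeline the abstract describes, the sketch below stacks shifted delta cepstra from cepstral frames and then picks the language whose GMM gives the highest total log-likelihood. It is not the authors' implementation: the function names, the SDC parameters `d`, `P`, and `k`, the diagonal-covariance assumption, and the decision rule are all assumptions made for illustration, and the models are expected to have been trained elsewhere.

```python
import math


def shifted_delta_cepstrum(frames, d=1, P=3, k=7):
    """Stack k delta vectors, spaced P frames apart, for each usable frame.

    frames: list of equal-length cepstral vectors (e.g. MFCCs), one per time step.
    Delta block i at time t is frames[t + i*P + d] - frames[t + i*P - d].
    The parameter defaults are illustrative assumptions, not taken from the paper.
    """
    first = d                                    # earliest t with t - d >= 0
    last = len(frames) - 1 - d - (k - 1) * P     # latest t with every index in range
    sdc = []
    for t in range(first, last + 1):
        stacked = []
        for i in range(k):
            ahead = frames[t + i * P + d]
            behind = frames[t + i * P - d]
            stacked.extend(a - b for a, b in zip(ahead, behind))
        sdc.append(stacked)
    return sdc


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    gmm: list of (weight, means, variances) tuples, one per mixture component.
    """
    component_logs = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        component_logs.append(log_p)
    # Log-sum-exp over components, shifted by the maximum for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def identify_language(features, models):
    """Return (best_language, scores) for a sequence of feature vectors.

    models: dict mapping a language label to a GMM in the format above.
    Frames are treated as independent, so per-frame log-likelihoods are summed.
    """
    scores = {
        language: sum(gmm_log_likelihood(x, gmm) for x in features)
        for language, gmm in models.items()
    }
    return max(scores, key=scores.get), scores
```

In such a setup, one GMM would be fitted per language on training features, and `identify_language(shifted_delta_cepstrum(mfcc_frames), models)` would label an unseen recording, where `mfcc_frames` is a hypothetical list of MFCC vectors from a feature extractor.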

Authors: Kruspe, Anna M.; Abesser, Jakob; Dittmar, Christian
Affiliations: Fraunhofer IDMT, Ilmenau, Germany; Johns Hopkins University, Baltimore, MD, USA (See document for exact affiliation information.)
AES Conference: 53rd International Conference: Semantic Audio (January 2014)
Paper Number: P1-13
Publication Date: January 27, 2014
Subject: Speech Processing and Analysis
Permalink: https://www.aes.org/e-lib/browse.cfm?elib=17119

E-Library Location: (CD 53rdPapers) /conf/53/aes53-000101.pdf
