Automatic language identification for singing has received little attention in recent years. Possible application scenarios include searching for musical pieces in a certain language, improving similarity search algorithms for music, and improving regional music classification and genre classification. It could also serve to mitigate the "glass ceiling" effect. Most existing approaches employ PPRLM (Parallel Phone Recognition followed by Language Modelling) processing. Recent publications show that approaches based on Gaussian Mixture Models (GMMs) can now produce results comparable to PPRLM systems when using certain audio features. Their advantages lie in their simplicity of implementation and their reduced training data requirements. So far, however, this has only been tested on speech data. In this paper, we therefore evaluate such a GMM-based approach for singing language identification. We test our system on speech data and on a cappella singing. We use MFCC (Mel-Frequency Cepstral Coefficients), TRAP (Temporal Patterns), and SDC (Shifted Delta Cepstrum) features. The results are comparable to the state of the art for singing language identification, but the approach is considerably simpler to implement because no phoneme-wise annotations are required. We obtain an accuracy of 75% on speech data and 67.5% on a cappella data. To our knowledge, neither the GMM-based approach nor this feature combination has been used for singing language identification before.
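As an illustration of the general idea behind GMM-based language identification, the following minimal sketch scores a sequence of feature frames against one GMM per language and picks the language with the highest average log-likelihood. It is not the system described in this paper: the diagonal covariances, the average log-likelihood decision rule, the function names, and the toy model parameters are all assumptions made for the example, and training the models (for instance with EM on MFCC, TRAP, or SDC frames) is left out.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM.

    gmm is a list of (weight, mean, var) tuples; this layout is a
    hypothetical choice for the sketch, not taken from the paper.
    """
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify_language(frames, models):
    """Return (best_language, scores).

    scores maps each language to the average per-frame log-likelihood
    of the frames under that language's GMM.
    """
    scores = {
        lang: sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)
        for lang, gmm in models.items()
    }
    return max(scores, key=scores.get), scores


# Toy two-dimensional models with made-up parameters (assumption):
# real models would be trained on many frames per language.
TOY_MODELS = {
    "lang_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "lang_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
TOY_FRAMES = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]

if __name__ == "__main__":
    best, scores = identify_language(TOY_FRAMES, TOY_MODELS)
    print(best, scores)
```

Because each language only needs a set of feature frames labelled with the language, a scheme of this kind requires no phoneme-level annotation, which is the simplification the abstract refers to.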