AES E-Library

Transcription of Polyphonic Vocal Music with a Repetitive Melodic Structure

Automatic music transcription converts an acoustic music signal into symbolic notation, a task that typically involves detecting multiple concurrent pitches, detecting note onsets and offsets, and recognizing the instruments. This paper presents a novel method for transcribing folk music. In contrast to most commercial music, folk music recordings may contain various inaccuracies because they are usually performed by amateur musicians and recorded in the field. The proposed method fuses three sources of information: frame-based multiple-F0 estimates, song structure, and pitch drift estimates. Using song structure can improve transcription accuracy. The method uses two strategies: it exploits repetitions, aligned in the time and pitch domains, to improve the F0 estimates, and it incorporates a probabilistic model based on explicit-duration hidden Markov models (EDHMM) to estimate notes from F0. A representative segment of the analyzed song is used to align the other segments. Information from these segments is summarized and used in a two-layer probabilistic EDHMM to segment frame-based information into notes.
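
The idea of using aligned repetitions to improve F0 estimates can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the repeated segments have already been time-aligned frame by frame, that a single pitch drift value per segment (in semitones, relative to the representative segment) is known, and it fuses the segments with a per-frame median. The note segmentation step is a simple run-length grouping with a minimum duration, used here only as a stand-in for the EDHMM described in the abstract. The function names `fuse_repetitions` and `frames_to_notes`, and the parameter `min_frames`, are hypothetical.

```python
from statistics import median


def fuse_repetitions(segments, drifts):
    """Fuse time-aligned repetitions into one frame-level pitch track.

    segments: list of equal-length lists of pitch values in MIDI semitones,
              one list per repetition; None marks an unvoiced frame.
    drifts:   assumed pitch drift of each repetition, in semitones,
              relative to the representative segment.
    """
    n_frames = len(segments[0])
    if len(segments) != len(drifts) or any(len(s) != n_frames for s in segments):
        raise ValueError("segments must be time-aligned and have one drift value each")
    fused = []
    for t in range(n_frames):
        # Undo each repetition's drift, then let the repetitions vote.
        votes = [seg[t] - d for seg, d in zip(segments, drifts) if seg[t] is not None]
        fused.append(median(votes) if votes else None)
    return fused


def frames_to_notes(fused, min_frames=2):
    """Group runs of frames with the same rounded pitch into notes.

    Returns (midi_pitch, start_frame, length_in_frames) tuples. Runs shorter
    than min_frames are dropped. This is a crude duration constraint, not an
    explicit-duration HMM.
    """
    quantized = [None if p is None else round(p) for p in fused]
    notes = []
    start = 0
    for t in range(1, len(quantized) + 1):
        if t == len(quantized) or quantized[t] != quantized[start]:
            if quantized[start] is not None and t - start >= min_frames:
                notes.append((quantized[start], start, t - start))
            start = t
    return notes
```

With three repetitions in which one singer slips to a wrong pitch in a single frame, the median across the drift-compensated repetitions suppresses the outlier before the frames are grouped into notes. A probabilistic duration model such as an EDHMM would replace the hard `min_frames` threshold with learned note-length distributions.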

JAES Volume 64 Issue 9 pp. 664-672
Publication Date: September 2016
