Automatic music transcription (AMT) is the process of inferring score notation from audio recordings, and it depends on subtasks such as multipitch estimation, onset detection, and tempo estimation. Dynamics are one of the main elements that characterize a performance, yet they have not been thoroughly investigated in the context of automatic music transcription. This report proposes a system for estimating the intensity of individual notes from piano recordings. The algorithm is based on score-informed nonnegative matrix factorization (NMF): it takes the spectrogram of an audio recording and a corresponding MIDI score as inputs and factorizes the spectrogram into a set of spectral templates and their activations. The intensity of each note is obtained from the maximum activation of the corresponding pitch template around the onset of the note. The authors improved their system by employing an NMF model that can learn the temporal evolution of the timbre of piano notes. While the previous research was evaluated only with perfectly aligned scores, this paper also presents an evaluation with coarsely aligned scores. The results show that the approach is robust to alignment errors within 100 ms.
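The factorization and the intensity read-out described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Euclidean cost with standard multiplicative updates, one spectral template per pitch, and a binary score mask that marks the frames in which each pitch may be active. The names `score_informed_nmf`, `note_intensities`, `mask`, and `window` are hypothetical, and the temporal timbre model mentioned in the abstract is not included.

```python
import numpy as np


def score_informed_nmf(V, mask, n_iter=200, eps=1e-9, seed=0):
    """Factorize a magnitude spectrogram V (bins x frames) as W @ H.

    mask (pitches x frames) is 1 where the score allows a pitch to sound
    and 0 elsewhere. Activations that start at zero stay at zero under
    multiplicative updates, which is how the score constrains the model.
    """
    rng = np.random.default_rng(seed)
    n_bins, _ = V.shape
    n_pitches = mask.shape[0]
    W = rng.random((n_bins, n_pitches)) + eps   # one template per pitch
    H = rng.random(mask.shape) * mask           # score-constrained activations
    for _ in range(n_iter):
        # Multiplicative updates for the Euclidean cost (an assumption here).
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Normalize each template to unit sum and move the scale into H,
        # so that activations are comparable across iterations.
        scale = W.sum(axis=0, keepdims=True) + eps
        W /= scale
        H *= scale.T
    return W, H


def note_intensities(H, notes, window=2):
    """Return the maximum activation near each note onset.

    notes is a list of (pitch_index, onset_frame) pairs; window is the
    number of frames searched on each side of the onset.
    """
    values = []
    for pitch, onset in notes:
        lo = max(0, onset - window)
        hi = min(H.shape[1], onset + window + 1)
        values.append(H[pitch, lo:hi].max())
    return np.array(values)
```

In this sketch, widening `window` (or the regions marked in `mask`) is one simple way to tolerate onsets that are only coarsely aligned with the audio. The returned activations are relative values; mapping them to MIDI velocities would need a calibration step that is not shown here.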