We address the problem of distinguishing solo plucked string sound from speech. Because both types of signal contain harmonic components, low-complexity music/speech classifiers often misclassify them. To capture the sustained harmonic structures observed in solo plucked string sound, we propose a new feature, the Energy-to-Spectral Flux Ratio (ESFR). When calculated over windows of 20 to 50 ms, the values and statistics of the ESFR for solo plucked string sound were distinct from those for speech. By building a low-complexity detector based on the ESFR, we demonstrate the discriminating performance of the feature for this problem.
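The abstract does not state the exact ESFR formula or the detector design, so the following is only a minimal sketch under stated assumptions. It assumes the ESFR of a frame is its energy (sum of squared samples) divided by its spectral flux (sum of squared differences between the DFT magnitudes of that frame and the previous one), computed over non-overlapping frames. The detector `is_plucked_string` and its `threshold` parameter are hypothetical: they simply compare the median ESFR to a fixed value. The intuition is that a sustained harmonic spectrum changes little from frame to frame, so the flux stays small and the ratio stays large.

```python
import cmath
import math
import statistics


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N//2 via a naive O(N^2) DFT (adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def esfr(signal, frame_len, eps=1e-12):
    """ASSUMED ESFR definition: frame energy / spectral flux w.r.t. the previous frame.

    Frames are non-overlapping; one value is returned per frame from the second frame on.
    `eps` avoids division by zero when the spectrum does not change at all.
    """
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    spectra = [magnitude_spectrum(f) for f in frames]
    values = []
    for t in range(1, len(frames)):
        energy = sum(x * x for x in frames[t])
        flux = sum((a - b) ** 2 for a, b in zip(spectra[t], spectra[t - 1]))
        values.append(energy / (flux + eps))
    return values


def is_plucked_string(signal, frame_len, threshold):
    """HYPOTHETICAL low-complexity detector: median ESFR above a fixed threshold."""
    return statistics.median(esfr(signal, frame_len)) > threshold
```

As a usage example, at an assumed sampling rate of 8 kHz a 20 ms frame is 160 samples. A steady 500 Hz tone (a stand-in for a sustained harmonic, not a real plucked string recording) yields a far larger ESFR than uniform random noise, whose spectrum fluctuates from frame to frame:

```python
import random

fs, frame_len = 8000, 160  # 20 ms frames
tone = [math.sin(2 * math.pi * 500 * n / fs) for n in range(fs // 2)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(fs // 2)]
print(is_plucked_string(tone, frame_len, threshold=1.0))   # True
print(is_plucked_string(noise, frame_len, threshold=1.0))  # False
```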