In audio signal processing, many techniques rely on a time-frequency representation (TFR) of the signal, particularly in music information retrieval applications such as automatic music transcription, sound source separation, and classification of the instruments playing in a musical piece. This paper presents a novel method for obtaining a sparse TFR by combining different instances of the Fan-Chirp Transform (FChT). The method comprises two main steps: computing multiple FChTs by means of the structure tensor, and combining them, along with spectrograms, using the smoothed local sparsity method. Experiments with synthetic and real-world audio signals suggest that the proposed method yields markedly better TFRs than the standard short-time Fourier transform, especially in the presence of fast frequency variations, which makes the FChT usable for polyphonic audio signals. As a result, the proposed method allows more precise information to be extracted from audio signals with multiple sources.