The current paper focuses on audio content management by means of joint audio segmentation and classification. We concentrate on the separation of typical audio classes, such as silence / background noise, speech, music and their combinations. A compact feature-vector subset is selected by a Correlation feature selection subset evaluation algorithm after the use of EM clustering algorithm on an initial audio data set. Time and spectral parameters are extracted using filter-banks and wavelets in combination with sliding windows and exponential moving averaging techniques. Features are extracted on a point-to-point basis, using the finest possible time resolution, so that each sample can be individually classified to one of the available groups. Clustering algorithms like EM or Simple K-means are tested to evaluate the final point-to-point classification result, therefore the joint audio detection-classification indexes. The extracted audio detection, segmentation and classification results can be incorporated into appropriate description schemes that would annotate audio events / segments for content description and management purposes.
Click to purchase paper as a non-member or login as an AES member. If your company or school subscribes to the E-Library then switch to the institutional version. If you are not an AES member and would like to subscribe to the E-Library then Join the AES!
This paper costs $33 for non-members and is free for AES members and E-Library subscribers.