[Feature] The growth in computing power over the past decade has opened up remarkable possibilities for the automatic interpretation of audio signals. As human listeners we make all sorts of conscious and unconscious interpretations of what we hear. These range from recognizing instruments and voices within a complex texture, through extracting melodic and chordal progressions, to inferring emotional mood or cultural associations. All of this is based on listening to a single mixed stream of sound that is, physically, just a messy waveform. If we are lucky, there may be some spatial information from related streams arriving from different directions, but at best we have only two ears, no matter how many sources there are. Enabling machines to make sense of mixed audio streams was close to the realm of science fiction not so long ago, but the latest research in semantic audio analysis brings it within our grasp.