The primary stages of auditory stream separation are modelled here as a bottom-up organisation of primitive spectro-temporal regions into composites, which provide a succinct description of the sound environment in terms of a limited number of salient sound events. Many years of research on auditory streaming have identified a number of qualitative comparisons between physical attributes of sounds, or cues, that correlate well with the extent to which stream segregation or fusion of simple test stimuli occurs in listening tests. However, the relative importance of these cues is difficult to determine, especially in natural sound environments. This work presents some exploratory stages in using an artificial neural network to learn how to integrate multiple cues in a nonlinear manner for sound object formation. As a precursor to a more complex auditory front-end, a sinusoidal tracking algorithm was used to obtain the initial set of "spectro-temporal regions", or partial trajectories.
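To illustrate what "partial trajectories" from a sinusoidal tracker might look like, the following is a minimal sketch, not the paper's algorithm. It assumes spectral peak frequencies have already been picked for each analysis frame and links them across frames with a generic greedy nearest-frequency rule. The function name `track_partials`, the `max_jump_hz` threshold, and its default value are hypothetical choices made for this example.

```python
from typing import List, Tuple

# One partial trajectory: a list of (frame index, frequency in Hz) points.
Track = List[Tuple[int, float]]


def track_partials(frames: List[List[float]], max_jump_hz: float = 50.0) -> List[Track]:
    """Link per-frame peak frequencies into partial trajectories.

    Assumption: `frames[t]` holds the peak frequencies (Hz) found in frame t.
    A track continues if an unclaimed peak lies within `max_jump_hz` of its
    last frequency; otherwise it ends. Unmatched peaks start new tracks.
    """
    finished: List[Track] = []
    active: List[Track] = []
    for t, peaks in enumerate(frames):
        unclaimed = list(peaks)
        still_active: List[Track] = []
        for track in active:
            last_freq = track[-1][1]
            # Nearest peak not yet claimed by another track (None if no peaks remain).
            best = min(unclaimed, key=lambda f: abs(f - last_freq), default=None)
            if best is not None and abs(best - last_freq) <= max_jump_hz:
                track.append((t, best))
                unclaimed.remove(best)
                still_active.append(track)
            else:
                finished.append(track)  # the track ends at the previous frame
        # Every peak left over starts a new trajectory.
        for f in unclaimed:
            still_active.append([(t, f)])
        active = still_active
    return finished + active
```

Each returned trajectory would be one primitive region; cues computed between pairs of trajectories (for example onset proximity or harmonic relation) are the kind of input a cue-integrating network could then receive.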