The correspondence of various spectral difference error metrics to human discrimination data was investigated. Time-varying harmonic amplitude data were obtained from the spectral analysis of eight musical instrument sounds (bassoon, clarinet, flute, horn, oboe, saxophone, trumpet, and violin). Sounds were resynthesized with various levels of random spectral alteration, ranging from 1% to 50%. Listeners were asked to discriminate the randomly altered sounds from reference sounds resynthesized from the original data. Several formulas designed to predict discrimination performance were then evaluated by calculating the correspondence between the discrimination data and the associated spectral difference measurements. Averaged over the eight instruments, the best correspondence was achieved using a spectral error metric based on linear harmonic amplitude differences normalized by rms amplitude and raised to a power a. While an optimum correspondence of 91% was achieved for a = 0.64, good correspondence occurred over a wide range of a. For linear harmonic amplitudes without rms normalization, good correspondence occurred within a narrower range, with a maximum correspondence of 88%. Correspondence was approximately 80% for decibel-amplitude differences over an even narrower range. Other error metrics, such as those based on critical-band grouping of components, worked well but did not improve on the method based on harmonic amplitudes, and in some cases yielded worse results. Spectral differences computed from a small number of representative frames emphasizing attack and decay transients yielded slightly better results than using all frames.
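The best-performing metric described above can be sketched as follows. This is a minimal illustration, not the paper's exact formula: the precise normalization and frame-averaging details are assumptions, as is the function name. It computes, per analysis frame, the Euclidean (linear) difference between reference and altered harmonic amplitudes, normalizes by the reference frame's rms amplitude, averages over frames, and raises the result to a power a:

```python
import numpy as np

def spectral_error(A_ref, A_test, a=0.64):
    """Sketch of an rms-normalized linear-amplitude error metric.

    A_ref, A_test: arrays of shape (frames, harmonics) holding
    time-varying harmonic amplitudes from spectral analysis.
    a: exponent applied to the averaged normalized error
       (0.64 was the optimum reported in the abstract).
    """
    # Per-frame Euclidean difference between harmonic amplitude vectors
    diff = np.sqrt(np.sum((A_ref - A_test) ** 2, axis=1))
    # Per-frame rms amplitude of the reference sound (normalizer)
    rms = np.sqrt(np.sum(A_ref ** 2, axis=1))
    # Average the normalized error over frames, then apply the power a
    return np.mean(diff / rms) ** a
```

Under this assumed form, identical spectra give zero error, and uniformly scaling all harmonics by a factor (1 + r) gives an error of r**a, so the exponent compresses large spectral differences relative to small ones.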