Toward a Statistically Well-Grounded Evaluation of Listening Tests—Avoiding Pitfalls, Misuse, and Misconceptions
Many recent publications in audio research present subjective evaluations of audio quality based on the Recommendation ITU-R BS.1534-1 (MUSHRA, MUltiple Stimuli with Hidden Reference and Anchor). This is a very welcome trend because it enables researchers to assess the implications of their developments. The evaluation of listening tests, however, sometimes suffers from an incomplete understanding of the underlying statistics. The present paper aims at identifying the causes of the pitfalls and misconceptions in MUSHRA evaluations. It exemplifies the impact of incorrectly used or even misused statistics. Subsequently, it proposes schemes for evaluating the listeners' judgments that are well grounded in statistical considerations, including an understanding of the concepts of statistical power and effect size.