AES Munich 2009
Paper Session P24
Assessment and Evaluation
Sunday, May 10, 09:00 — 12:30
Chair: Gaëtan Lorho
P24-1 Influence of Level Setting on Loudspeaker Preference Ratings—Vincent Koehl, Mathieu Paquier, Université de Brest - Plouzané, France
The perceived audio quality of a sound-reproduction device such as a loudspeaker is hard to evaluate. Industrial and academic researchers are still working on the design of reliable assessment procedures to measure this subjective attribute. One of the main issues with listening tests is their validity with regard to real comparison situations (Hi-Fi magazine evaluations, audiophiles, sound engineers, customers, etc.). Are the conclusions of laboratory tests consistent with these almost informal comparisons? For example, one of the main differences between listening tests and real-life comparisons is loudness matching. This paper compares paired-comparison tests, as commonly conducted under laboratory conditions, with a procedure assumed to be closer to real-life conditions. It shows that differences in the test procedures led to differences in the subjective assessments.
Convention Paper 7782
P24-2 Comparing Three Methods for Sound Quality Evaluation with Respect to Speed and Accuracy—Florian Wickelmaier, Nora Umbach, Konstantin Sering, University of Tübingen - Tübingen, Germany; Sylvain Choisel, Bang & Olufsen A/S - Struer, Denmark, now at Philips Consumer Lifestyle, Leuven, Belgium
The goal of the present study was to compare three response-collection methods that may be used in sound quality evaluation. To this end, 52 listeners took part in an experiment where they assessed the audio quality of musical excerpts and six processed versions thereof. For different types of program material, participants performed (a) a direct ranking of the seven sound samples, (b) pairwise comparisons, and (c) a novel procedure, called ranking by elimination. The latter requires subjects on each trial to eliminate the least preferred sound; the elimination continues until only the sample with the highest audio quality is left. The methods are compared with respect to the resulting ranking/scaling and the time required to obtain the results.
Convention Paper 7783
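The ranking-by-elimination procedure described in P24-2 can be illustrated with a minimal sketch. This is not the authors' implementation: the function `rank_by_elimination`, the callback `pick_worst`, the sample labels, and the quality scores are all hypothetical and stand in for a listener's judgments. The sketch also compares the number of trials with a full pairwise design, assuming every pair is presented once, which the abstract does not state.

```python
from typing import Callable, List, Sequence


def rank_by_elimination(
    samples: Sequence[str],
    pick_worst: Callable[[List[str]], str],
) -> List[str]:
    """Return the samples ordered from most to least preferred.

    On each trial the listener (modeled by the hypothetical callback
    `pick_worst`) removes the least preferred of the remaining samples.
    The procedure stops when a single sample is left.
    """
    remaining = list(samples)
    eliminated: List[str] = []  # filled in worst-first order
    while len(remaining) > 1:
        worst = pick_worst(remaining)
        if worst not in remaining:
            raise ValueError(f"{worst!r} is not among the remaining samples")
        remaining.remove(worst)
        eliminated.append(worst)
    # The survivor is ranked first; eliminated samples follow in reverse order.
    return remaining + eliminated[::-1]


def elimination_trials(n: int) -> int:
    """Trials needed by ranking by elimination for n samples."""
    return max(n - 1, 0)


def full_pairwise_trials(n: int) -> int:
    """Trials needed if every pair of n samples is compared once (assumption)."""
    return n * (n - 1) // 2


# Hypothetical quality scores for one excerpt and six processed versions.
scores = {
    "original": 0.95, "proc1": 0.80, "proc2": 0.65, "proc3": 0.50,
    "proc4": 0.40, "proc5": 0.25, "proc6": 0.10,
}

# Simulated listener: always eliminates the lowest-scoring remaining sample.
ranking = rank_by_elimination(
    list(scores), lambda rest: min(rest, key=scores.__getitem__)
)
print(ranking)
print(elimination_trials(len(scores)), full_pairwise_trials(len(scores)))
```

With seven samples, elimination needs six trials, whereas a complete pairwise design would need 21. Note that later elimination trials present fewer samples, so trial count alone does not determine the total time required.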
P24-3 Reference Units for the Comparison of Speech Quality Test Results—Nicolas Côté, Ecole Nationale d’Ingénieurs de Brest - Plouzané, France, Deutsche Telekom Laboratories, Berlin, Germany; Vincent Koehl, Université de Brest - Plouzané, France; Valérie Gautier-Turbin, France Telecom R&D - Lannion, France; Alexander Raake, Sebastian Möller, Deutsche Telekom Laboratories - Berlin, Germany
Subjective tests are carried out to assess the quality of an entity as perceived by a user. However, several characteristics inherent to the subject or to the test methodology might influence the users’ judgments. As a result, reference conditions are usually included in subjective tests. In the field of quality of transmitted speech, reference conditions correspond to a speech sample impaired by a known amount of degradation. In this paper several kinds of reference conditions and the process used for their production are presented. Examples of the corresponding normalization procedure of each kind of reference are given.
Convention Paper 7784
P24-4 The Influence of Sound Processing on Listeners’ Program Choice in Radio Broadcasting—Hans-Joachim Maempel, Fabian Gawlik, Technische Universität Berlin - Berlin, Germany
Many opinions on broadcast sound processing are founded on tacit assumptions about certain effects on listeners. So far, however, these assumptions have lacked support from internally and ecologically valid empirical data. This study therefore investigated experimentally, under largely realistic conditions, the extent to which broadcast sound processing influences listeners’ program choice. Technical features of the stimuli, socio-demographic data on the test persons, and data on the listening conditions were also collected. In the main experiment, subjects were asked to choose one of six audio stimuli that varied in content and sound processing. The variation in sound processing caused only marginal, statistically nonsignificant differences in the frequencies of program choice. By contrast, a subsequent experiment enabling a direct comparison of different sound processings of the same audio content yielded distinct preferences for certain sound processings.
Convention Paper 7785
P24-5 Free Choice Profiling and Natural Grouping as Methods for the Assessment of Emotions in Musical Audio Signals—Sebastian Schneider, Florian Raschke, Ilmenau University of Technology - Ilmenau, Germany; Gabriel Gatzsche, Fraunhofer Institute for Digital Media Technology, IDMT - Ilmenau, Germany; Dominik Strohmeier, Ilmenau University of Technology - Ilmenau, Germany
To measure the perceived emotions caused by musical audio signals, we propose to use “Free Choice Profiling” (FCP) combined with “Natural Grouping” (NG). FCP/NG, originally derived from food research and new to research on music perception, allows participants to evaluate stimuli using their own vocabulary. To evaluate the proposed methods, we conducted an experiment in which 16 participants assessed major-major and minor-minor chord pairs. Contrary to what one might expect, allowing participants to express themselves freely does not degrade the quality of the data. Instead, clearly interpretable results consistent with music theory and emotional psychology were obtained. These results encourage further investigations, which could lead to a general method for assessing emotions in music.
Convention Paper 7786
P24-6 Subjective Quality Evaluation of Audio Streaming Applications on Absolute and Paired Rating Scales—Bernhard Feiten, Alexander Raake, Marie-Neige Garcia, Ulf Wüstenhagen, Jens Kroll, Deutsche Telekom Laboratories - Berlin, Germany
Audio tests have been carried out as part of the development of a parametric model for the quality assessment of audiovisual IP-based multimedia applications. The test method used for the subjective audio tests was aligned with the method used for video tests. Hence, the Absolute Category Rating (ACR) method was applied. To demonstrate the usability of ACR tests for this purpose, MUSHRA and ACR were applied in parallel listening tests. The MPEG audio codecs AAC, HE-AAC, MP2, and MP3 were evaluated at different bitrates and under different packet loss conditions. The test results show that the ACR method also reveals the quality differences at higher quality levels, even though MUSHRA has superior resolution.
Convention Paper 7787
P24-7 Assessor Selection Process for Multisensory Applications—Søren Vase Legarth, Nick Zacharov, DELTA SenseLab - Hørsholm, Denmark
Assessor panels are used to perform perceptual evaluation tasks in the form of listening and viewing tests. To ensure the quality of the collected data, it is vital that the selected assessors have the desired qualities in terms of discrimination aptitude as well as consistent rating ability. This work extends existing procedures in this field to provide a statistically robust and efficient means of assessing and evaluating the performance of assessors for listening and viewing tasks.
Convention Paper 7788