144th AES CONVENTION Paper Session P02: Audio Quality Part 1

AES Milan 2018

Wednesday, May 23, 10:30 — 12:30 (Scala 2)

Chair: Thomas Sporer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany

P02-1 An Auditory Model-Inspired Objective Speech Intelligibility Estimate for Audio Systems
Jayant Datta, Audio Precision - Beaverton, OR, USA; Xinhui Zhou, Audio Precision - Beaverton, OR, USA; Joe Begin, Audio Precision - Beaverton, OR, USA; Mark Martin, Audio Precision - Beaverton, OR, USA
Compared with subjective tests, objective measures save time and money. This paper presents the implementation of a new algorithm for objective speech intelligibility, based on the modified rhyme test using real speech. An auditory-model-inspired signal processing framework gathers word selection evidence in auditory filter bank correlations and then uses an auditory attention model to perform word selection. It has been shown to outperform popular measures in terms of Pearson correlation with human intelligibility scores. A real-time version of this approach has been integrated into a versatile audio test and measurement system supporting a number of interfaces (different combinations of devices/channels/systems). Examples and measurement results will be presented to show the advantages of this approach.
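The abstract compares objective measures by their Pearson correlation with human intelligibility scores. As a minimal sketch of that comparison metric only (not the authors' algorithm), the coefficient can be computed as below. The function name `pearson_r` and the score lists are hypothetical placeholders, not data from the paper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations from the mean for each sequence.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical example values (NOT from the paper): objective estimates
# and listener intelligibility scores for the same test conditions.
objective_scores = [0.42, 0.55, 0.61, 0.78, 0.90]
human_scores = [0.40, 0.50, 0.66, 0.75, 0.93]
r = pearson_r(objective_scores, human_scores)
```

A value of `r` closer to 1 indicates that the objective estimate tracks the listener scores more closely across conditions.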
P02-2 A Statistical Model that Predicts Listeners’ Preference Ratings of Around-Ear and On-Ear Headphones
Sean Olive, Harman International - Northridge, CA, USA; Todd Welti, Harman International Inc. - Northridge, CA, USA; Omid Khonsaripour, Harman International - Northridge, CA, USA
A controlled listening test was conducted on 31 different models of around-ear (AE) and on-ear (OE) headphones to determine listeners’ sound quality preferences. One hundred thirty listeners, both trained and untrained, rated the headphones based on preference using a virtual headphone method that used a single replicator headphone equalized to match the magnitude and minimum phase responses of the different headphones. Listeners rated seven different headphones in each trial, which included high (the new Harman AE-OE target curve) and low anchors. On average, both trained and untrained listeners preferred the high anchor to the 31 other choices. Using machine learning, a model was developed that predicts listeners’ preference ratings of the headphones based on deviation in magnitude response from the Harman target curve. [Paper will be presented by Todd Welti]
P02-3 Comparing the Effect of HRTF Processing Techniques on Perceptual Quality Ratings
Areti Andreopoulou, Laboratory of Music Acoustics and Technology (LabMAT) National and Kapodistrian University of Athens - Athens, Greece; Brian F. G. Katz, Sorbonne Université, CNRS, Institut Jean Le Rond d'Alembert - Paris, France
The use of Head-Related Transfer Functions for binaural rendering of spatial audio is quickly emerging in today’s audio market. Benefits of individual HRTFs, or personalized HRTF selection, have been demonstrated in numerous previous studies. A number of recent works have examined assisted or automated selection of HRTFs for optimized personalization. Such techniques attempt to rank HRTFs according to expected spatial quality for a given user based on signal, morphological, and/or perceptual studies. In parallel, there exist several HRTF processing methods that are often used to compact and/or smooth HRTFs in order to facilitate real-time treatments. Nevertheless, the potential impact of such processes on HRTF spatial quality is not always considered. This study examines the effects of three commonly used HRTF processing techniques (spectral smoothing in constant absolute bandwidths, minimum-phase decomposition, and infinite impulse response modeling) on perceptual quality ratings of selected HRTFs. Results showed that the frequency and phase-spectra variations introduced in the data by the three processing methods can lead to significant changes in HRTF evaluations. In addition, they highlight the challenging nature of non-individualized HRTF rating tasks and establish the need for systematic participant screening and sufficient task repetitions in perceptual HRTF evaluation studies.
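Of the three processing techniques named in the abstract, spectral smoothing in constant absolute bandwidths is the simplest to illustrate: every frequency bin is averaged over a window of the same width in Hz, in contrast to fractional-octave smoothing, where the window widens with frequency. The sketch below is an assumption-laden illustration, not the authors' procedure: the function name, the choice to average dB values directly, and the example numbers are all hypothetical.

```python
def smooth_constant_bandwidth(magnitudes_db, bin_spacing_hz, bandwidth_hz):
    """Moving-average smoothing with a fixed window width in Hz.

    magnitudes_db  -- magnitude spectrum on a uniform frequency grid
    bin_spacing_hz -- spacing between adjacent frequency bins
    bandwidth_hz   -- total smoothing bandwidth, identical at every bin
    """
    if bin_spacing_hz <= 0 or bandwidth_hz < 0:
        raise ValueError("bin spacing must be positive, bandwidth non-negative")
    # Number of bins on each side of the centre bin.
    half = int(round(bandwidth_hz / bin_spacing_hz / 2))
    n = len(magnitudes_db)
    smoothed = []
    for k in range(n):
        # The window is truncated at the edges of the spectrum.
        lo = max(0, k - half)
        hi = min(n, k + half + 1)
        window = magnitudes_db[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical spectrum with a single 6 dB peak, bins spaced 100 Hz apart,
# smoothed over a 200 Hz bandwidth (one bin on each side).
example = smooth_constant_bandwidth([0.0, 0.0, 6.0, 0.0, 0.0], 100.0, 200.0)
```

Because the window has the same width at every frequency, narrow high-frequency features of an HRTF are smoothed no more strongly than low-frequency ones, which is the defining property of this variant.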
P02-4 The Effect of Visual Cues and Binaural Rendering Method on Plausibility in Virtual Environments
Will Bailey, University of Salford - Salford, UK; Bruno Fazenda, University of Salford - Salford, Greater Manchester, UK
Immersive virtual reality is by its nature a multimodal medium, and the use of spatial audio renderers for VR development is widespread. The aim of this study was to assess the performance of two common rendering methods and the effect of the presence of visual cues on the plausibility of rendering. While it was found that the plausibility of the rendered audio was low, the results suggest that the use of measured responses performed comparatively better. In addition, the absence of virtual sources reduced the number of simulated stimuli identified as real sources, and the complete absence of visual stimuli increased the rate at which simulated audio was identified as being emitted from the loudspeakers.
