AES Milan 2018
Paper Session P22
P22 - Perception – Part 3
Friday, May 25, 16:00 — 18:00 (Scala 4)
Chair: Jürgen Peissig, Leibniz Universität Hannover - Hannover, Germany
P22-1 Audibility of Loudspeaker Group-Delay Characteristics—Juho Liski, Aalto University - Espoo, Finland; Aki Mäkivirta, Genelec Oy - Iisalmi, Finland; Vesa Välimäki, Aalto University - Espoo, Finland
Loudspeaker impulse responses were studied using a paired-comparison listening test to learn about the audibility of the loudspeaker group-delay characteristics. Several modeled and six measured loudspeakers were included in this study. The impulse responses and their time-reversed versions were used in order to maximize the change in the temporal structure and group delay without affecting the magnitude spectrum, and the subjects were asked whether they could hear a difference. Additionally, the same impulse responses were compared after convolving them with a pink impulse, defined in this paper, which causes a low-frequency emphasis. The results give an idea of how much the group delay of a loudspeaker system can vary so that it is unlikely to cause audible effects in sound reproduction. Our results suggest that when the group delay in the frequency range from 300 Hz to 1 kHz is below 1.0 ms, it is inaudible. With low-frequency emphasis the group delay variations can be heard more easily.
Convention Paper 10008
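The time-reversal idea in the P22-1 abstract can be illustrated numerically. The sketch below is not the authors' test procedure: it uses a hypothetical impulse response (a truncated decaying exponential standing in for a measured loudspeaker) and a hypothetical helper `group_delay`, and only checks that reversing a real impulse response leaves the magnitude spectrum unchanged while mirroring the group delay about a constant.

```python
import numpy as np

def group_delay(h, n_fft=1024):
    """Group delay in samples, using tau(w) = Re{ DFT(n*h[n]) / DFT(h[n]) }."""
    n = np.arange(len(h))
    H = np.fft.rfft(h, n_fft)
    G = np.fft.rfft(n * h, n_fft)
    return np.real(G / H)

# Assumption: a truncated decaying exponential stands in for a measured response.
N = 64
h = 0.8 ** np.arange(N)
h_rev = h[::-1]  # time-reversed version

mag = np.abs(np.fft.rfft(h, 1024))
mag_rev = np.abs(np.fft.rfft(h_rev, 1024))
tau = group_delay(h)
tau_rev = group_delay(h_rev)

# Reversal keeps |H| and maps tau(w) to (N - 1) - tau(w).
same_magnitude = np.allclose(mag, mag_rev)
mirrored_delay = np.allclose(tau + tau_rev, N - 1)
print(same_magnitude, mirrored_delay)
```

The constant `N - 1` is a pure delay introduced by reversing a length-`N` sequence in place; what changes perceptually relevant structure is the frequency-dependent part of the group delay, which flips sign.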
P22-2 The Influence of Hearing and Sight on the Perceptual Evaluation of Home Speaker Systems—Hans-Joachim Maempel, Federal Institute for Music Research - Berlin, Germany; TU Berlin; Michael Horn, Federal Institute for Music Research - Berlin, Germany
Home speaker systems are not only functional but also aesthetic objects with both acoustic and optical properties. We investigated the perceptual evaluation of four home speaker systems under acoustic, optical, and opto-acoustic conditions (factor Domain). By varying the speakers' acoustic and optical properties under the opto-acoustic condition in a mutually independent manner (factors Acoustic loudspeaker, Optical loudspeaker), we also investigated their proportional influence on perception. To this end, 40 non-expert participants rated 10 auditory, 2 visual, and 4 audiovisual features. The acoustic stimuli were generated by means of data-based dynamic binaural synthesis. Notably, participants did not realize that the speakers were acoustically simulated. Results indicated that only the mean ratings of two auditory and one audiovisual feature were significantly influenced by the factor Domain. There were speaker-dependent effects on three further auditory features. Small crossmodal effects from Optical loudspeaker on six auditory features were observed. Remarkably, the audiovisual features, particularly monetary value, were dominated by the optical properties instead of the acoustic. This is due to a low acoustic and a high optical variance of the speakers. The results support the hypothesis that the optical properties imply an overall quality that in turn may influence the rating of auditory features.
Convention Paper 10009
P22-3 A VR-Based Mobile Platform for Training to Non-Individualized Binaural 3D Audio—Chungeun Kim, University of Surrey - Guildford, Surrey, UK; Mark Steadman, Imperial College London - London, UK; Jean-Hugues Lestang, Imperial College London - London, UK; Dan F. M. Goodman, Imperial College London - London, UK; Lorenzo Picinali, Imperial College London - London, UK
Delivery of immersive 3D audio with arbitrarily-positioned sound sources over headphones often requires processing of individual source signals through a set of Head-Related Transfer Functions (HRTFs), the direction-dependent filters that describe the propagation of sound in an anechoic environment from the source to the listener's ears. The individual morphological differences and the impracticality of HRTF measurement make it difficult to deliver completely individualized 3D audio in this manner, and instead lead to the use of previously-measured non-individual sets of HRTFs. In this study a VR-based mobile sound localization training prototype system is introduced that uses HRTF sets for audio. It consists of a mobile phone as a head-mounted device, a hand-held Bluetooth controller, and a network-enabled PC with a USB audio interface and a pair of headphones. The virtual environment was developed on the mobile phone such that the user can listen to and navigate in an acoustically neutral scene and locate invisible target sound sources presented at random directions using non-individualized HRTFs in repetitive sessions. Various training paradigms can be designed with this system, with performance-related feedback provided according to the user's localization accuracy, including visual indication of the target location, and some aspects of a typical first-person shooting game, such as enemies, scoring, and level advancement. An experiment was conducted using this system in which 11 subjects went through multiple training sessions, using non-individualized HRTF sets. The localization performance evaluations showed a reduction of overall localization angle error over repeated training sessions, reflecting lower front-back confusion rates.
Convention Paper 10010
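The P22-3 abstract reports a localization angle error and a front-back confusion rate without defining either. As a rough illustration only, the sketch below assumes the error is the great-circle angle between target and response directions, and assumes a front-back confusion is a response that lands in the opposite front/back hemisphere from the target. The function names, the azimuth/elevation convention, and both definitions are assumptions, not the paper's method.

```python
import math

def to_unit_vector(az_deg, el_deg):
    """Assumed convention: x ahead, y left, z up; azimuth counter-clockwise from front."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def angle_error_deg(target, response):
    """Great-circle angle in degrees between two (azimuth, elevation) directions."""
    a = to_unit_vector(*target)
    b = to_unit_vector(*response)
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(dot))

def is_front_back_confusion(target, response):
    """True when target and response lie in opposite front/back hemispheres."""
    return to_unit_vector(*target)[0] * to_unit_vector(*response)[0] < 0

# A target 30 degrees left of front, answered at its mirror image behind the listener.
err = angle_error_deg((30, 0), (150, 0))
confused = is_front_back_confusion((30, 0), (150, 0))
print(round(err, 1), confused)
```

Under these assumed definitions, a single front-back confusion produces a large angle error even when the perceived left/right position is right, which is one way a falling confusion rate could drive down the overall error.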
P22-4 Speech-To-Screen: Spatial Separation of Dialogue from Noise towards Improved Speech Intelligibility for the Small Screen—Philippa Demonte, Acoustics Research Centre, University of Salford - Salford, UK; Yan Tang, University of Salford - Salford, UK; Richard J. Hughes, University of Salford - Salford, Greater Manchester, UK; Trevor Cox, University of Salford - Salford, UK; Bruno Fazenda, University of Salford - Salford, Greater Manchester, UK; Ben Shirley, University of Salford - Salford, Greater Manchester, UK
Can externalizing dialogue in the presence of stereo background noise improve speech intelligibility? This has been investigated for audio over headphones using head-tracking in order to explore potential future developments for small-screen devices. A quantitative listening experiment tasked participants with identifying target words in spoken sentences played in the presence of background noise via headphones. Sixteen different combinations of 3 independent variables were tested: speech and noise locations (internalized/externalized), video (on/off), and masking noise (stationary/fluctuating noise). The results revealed that the best improvements to speech intelligibility were generated by both the video-on condition and externalizing speech at the screen while retaining masking noise in the stereo mix.
Convention Paper 10011
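The count of sixteen conditions in the P22-4 abstract can be reproduced with a small enumeration. This is a sketch under one assumption the abstract does not spell out: that the location variable covers all four pairings of speech and noise being internalized or externalized, giving 4 x 2 x 2 combinations. The variable names are illustrative.

```python
from itertools import product

placements = ("internalized", "externalized")

# Assumption: the location variable has four levels, one per (speech, noise) pairing.
locations = [(speech, noise) for speech in placements for noise in placements]
video = ("on", "off")
masker = ("stationary", "fluctuating")

conditions = list(product(locations, video, masker))
print(len(conditions))
```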