AES New York 2019
Paper Session P7
P7 - Perception
Thursday, October 17, 9:00 am — 12:00 pm
Chair: Elisabeth McMullin, Samsung Research America - Valencia, CA, USA
P7-1 A Binaural Model to Estimate Room Impulse Responses from Running Signals and Recordings—Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA; David Dahlbom, Rensselaer Polytechnic Institute - Troy, NY, USA; Nate Keil, Rensselaer Polytechnic Institute - Troy, NY, USA
A binaural model is described that can use a multichannel signal to robustly localize a sound source in the presence of multiple reflections. The model also estimates a room impulse response from a running multichannel signal, e.g., from a recording, and determines the spatial locations and delays of early reflections, without any prior or additional knowledge of the source. A dual-layer cross-correlation/autocorrelation algorithm is used to determine the interaural time difference (ITD) of the direct sound source component and to estimate a binaural activity pattern. The model is able to accurately localize broadband signals in the presence of real room reflections.
Convention Paper 10257
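As a rough illustration of the cross-correlation stage that such binaural models build on (a minimal sketch, not the paper's dual-layer cross-correlation/autocorrelation algorithm; the function name and parameters are hypothetical), the lag of the interaural cross-correlation peak yields an ITD estimate:

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the interaural time difference (ITD) in seconds from
    two ear signals, as the lag of the cross-correlation peak.
    A positive result means the left-ear signal is delayed."""
    corr = np.correlate(left, right, mode="full")
    # Lags in 'full' mode run from -(len(right)-1) to len(left)-1.
    lag = np.argmax(corr) - (len(right) - 1)
    return lag / fs

# Usage: a broadband signal arriving 12 samples later at the left ear.
fs = 48000
rng = np.random.default_rng(0)
right = rng.standard_normal(4800)
left = np.concatenate([np.zeros(12), right[:-12]])
itd = estimate_itd(left, right, fs)  # about 12/48000 s, i.e., 0.25 ms
```

The paper's model goes well beyond this single peak pick: it runs on continuous signals and separates the direct-sound ITD from reflection-induced correlation peaks, which a plain cross-correlation cannot do on its own.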
P7-2 Describing the Audible Effects of Nonlinear Loudspeaker Distortion—Elisabeth McMullin, Samsung Research America - Valencia, CA, USA; Pascal Brunet, Samsung Research America, Audio Group - Digital Media Solutions - Valencia, CA, USA; Zhongran Wang, Samsung Research America, Audio Lab - Valencia, CA, USA
In order to evaluate how and when listeners hear distortion in a nonlinear loudspeaker model, a three-part study was designed. A variety of audio files were processed through both a linear and a nonlinear loudspeaker model, and the input signals were calibrated to produce a prescribed level of distortion in the nonlinear model. Listeners completed subjective experiments in which they heard both versions of the clips, selected the audible attributes they believed had changed, and described the differences in their own words. In later tests, listeners marked the points in time at which they heard changes corresponding to the most commonly used descriptors. A full analysis of listener comments and time-based relationships is presented, with theoretical explanations of the results obtained.
Convention Paper 10258
P7-3 Spatial Auditory Masking for Three-Dimensional Audio Coding—Masayuki Nishiguchi, Akita Prefectural University - Yurihonjo Akita, Japan; Kodai Kato, Akita Prefectural University - Yurihonjo Akita, Japan; Kanji Watanabe, Akita Prefectural University - Yurihonjo Akita, Japan; Koji Abe, Akita Prefectural University - Yurihonjo Akita, Japan; Shouichi Takane, Akita Prefectural University - Yurihonjo, Akita, Japan
Spatial auditory masking effects have been examined for developing highly efficient audio coding algorithms for signals in three-dimensional (3D) sound fields. Generally, the masking threshold level decreases as the directional difference between masker and maskee signals increases. However, we found that when a maskee signal is located at the position symmetrical to the masker signal with respect to the frontal plane of a listener, the masking threshold level is not lowered, contrary to expectation. A mathematical model is proposed to estimate the masking threshold caused by multiple masker signals in the 3D sound field. Using the model, the perceptual entropy of a tune from a two-channel stereo CD was reduced by approximately 5.5%.
Convention Paper 10259
P7-4 Investigation of Masking Thresholds for Spatially Distributed Sound Sources—Sascha Dick, International Audio Laboratories Erlangen, a joint institution of Universität Erlangen-Nürnberg and Fraunhofer IIS - Erlangen, Germany; Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Rami Sweidan, University of Stuttgart - Stuttgart, Germany; Jürgen Herre, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany
For perceptual audio coding of immersive content, the investigation of masking effects between spatially distributed sound sources is of interest. We conducted subjective listening experiments to determine the masking thresholds for “tone-masking-noise” conditions when masker (1 kHz sine tone) and probe (1 kHz narrow-band noise) are spatially distributed using an immersive 22.2 loudspeaker setup. Our results show masking thresholds in the range of –35 dB to –26 dB probe-to-masker ratio. As expected, the least masking was found between left/right opposed sources, with thresholds up to 5 dB lower than for coincident sources. Other noteworthy observations included increased masking at certain elevations and cases of selectively decreased masking due to interaural phase difference phenomena.
Convention Paper 10260
P7-5 An Attempt to Elicit Horizontal and Vertical Auditory Precedence Percepts without Pinnae Cues—Wesley Bulla, Belmont University - Nashville, TN, USA; Paul Mayo, University of Maryland - College Park, MD, USA
This investigation was a continuation of AES-143 paper #9832 and AES-145 paper #10066, in which reliable auditory precedence in the elevated, ear-level, and lowered horizontal planes was examined. This experiment altered and eliminated the spectral influences that govern the detection of elevation and presented two different horizontal and vertical inter-channel time delays during a precedence-suppression task. A robust precedence effect was elicited via ear-level horizontal-plane loudspeakers. In contrast, leading-signal identification was minimal in the vertical condition, and no systematic influence of the leading elevated and lowered median-plane loudspeakers was observed, suggesting that precedence was not active in the vertical condition. Influences that might have been generated by the lead-lag signal in the vertical plane were not consistent with any known precedence paradigms.
Convention Paper 10261
P7-6 Perceptual Weighting to Improve Coding of Harmonic Signals—Elias Nemer, XPERI/DTS - Calabasas, CA, USA; Zoran Fejzo, DTS/Xperi Corp. - Calabasas, CA, USA; Jeff Thompson, XPERI/DTS - Calabasas, CA, USA
This paper describes a new approach to improving the coding of harmonic signals in transform-based audio codecs that employ pulse vector quantization. The problem arises when coding, at low bit rates, signals with varying levels of harmonics: as a result of vector quantization (VQ), some lower-level harmonics may be missed or may fluctuate, causing perceptual artifacts. The proposed solution applies perceptual weighting to the computed synthesis error in the search loop of the VQ, the objective being to de-emphasize the error at high tonal peaks, where signal energy partially masks the quantization noise. Simulation results over mixed musical content showed a noticeable improvement in perceptual scores, particularly for highly harmonic signals.
Convention Paper 10262
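The general idea of a perceptually weighted codebook search can be sketched as follows; this is an illustrative stand-in, not the paper's codec, and the weighting rule (inverse spectral magnitude) is an assumption chosen only to show how error at strong tonal peaks can be de-emphasized:

```python
import numpy as np

def masking_weights(spectrum, exponent=0.5):
    """Illustrative weighting: larger spectral magnitudes (tonal peaks)
    receive smaller error weights, since signal energy there partially
    masks quantization noise. The exponent is a free parameter."""
    mag = np.abs(spectrum) + 1e-12
    w = 1.0 / mag**exponent
    return w / w.max()

def weighted_vq_search(target, codebook, weights):
    """Select the codebook vector minimizing the perceptually weighted
    squared synthesis error, rather than the plain squared error."""
    errors = ((codebook - target) ** 2 * weights).sum(axis=1)
    return int(np.argmin(errors))

# Usage: target spectrum with a strong peak (bin 0) and a weak bin (bin 1).
# Both candidates have the same unweighted error, but the weighted search
# prefers the one whose error sits under the masking peak.
target = np.array([10.0, 1.0])
codebook = np.array([[9.0, 1.0],    # error at the peak (masked)
                     [10.0, 0.0]])  # error in the weak bin (audible)
best = weighted_vq_search(target, codebook, masking_weights(target))
```

With plain (unweighted) squared error the two candidates would tie; the weighting breaks the tie in favor of hiding the error where it is least audible, which is the effect the abstract describes.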