145th AES CONVENTION Paper Session P01: Perception — Part 1

AES New York 2018

Wednesday, October 17, 9:00 am — 12:00 pm (1E11)

Chair: Elisabeth McMullin, Samsung Research America - Valencia, CA, USA

P01-1 Improved Psychoacoustic Model for Efficient Perceptual Audio Codecs
Sascha Disch, Fraunhofer IIS - Erlangen, Germany; Steven van de Par, Fraunhofer Hör-Sprach- und Audio Technologie - Oldenburg, Germany; Andreas Niedermeier, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Elena Burdiel Pérez, Fraunhofer IIS - Erlangen, Germany; Ane Berasategui Ceberio, Fraunhofer IIS - Erlangen, Germany; Bernd Edler, International Audio Laboratories Erlangen - Erlangen, Germany
Since early perceptual audio coders such as mp3, the underlying psychoacoustic model that controls the encoding process has not undergone many dramatic changes. Meanwhile, modern audio coders have been equipped with semi-parametric or parametric coding tools such as audio bandwidth extension. As a result, the initial psychoacoustic model used in a perceptual coder, which considers only added quantization noise, has become partly unsuitable. We propose the use of an improved psychoacoustic excitation model based on an existing model proposed by Dau et al. in 1997. Because this modulation-based model calculates an internal auditory representation, it is essentially independent of the input waveform. Using the example of MPEG-H 3D Audio and its semi-parametric Intelligent Gap Filling (IGF) tool, we demonstrate that we can successfully control the IGF parameter selection process to achieve overall improved perceptual quality.
Convention Paper 10029

P01-2 On the Influence of Cultural Differences on the Perception of Audio Coding Artifacts in Music
Sascha Dick, International Audio Laboratories Erlangen, a joint institution of Universität Erlangen-Nürnberg and Fraunhofer IIS - Erlangen, Germany; Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jiandong Zhang, Academy of Broadcasting Planning, SAPPRFT - Beijing, China; Yili Qin, Academy of Broadcasting Planning, SAPPRFT - Beijing, China; Nadja Schinkel-Bielefeld, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Sivantos GmbH - Erlangen, Germany; Anna Katharina Leschanowsky, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Frederik Nagel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; International Audio Laboratories - Erlangen, Germany
Modern audio codecs are used all over the world, reaching listeners with many different cultures and languages. This study investigates if and how cultural background influences the perception of, and preference for, different audio coding artifacts, focusing on musical content. A subjective listening test was designed to directly compare different types of audio coding and was performed with Mandarin Chinese-speaking and German-speaking listeners. Overall comparison showed largely consistent results, affirming the validity of the proposed test method. Differential comparison indicates preferences for certain artifacts in different listener groups: for example, Chinese listeners tended to grade tonality mismatch higher and pre-echoes worse than German listeners did, and musicians preferred bandwidth limitation over tonality mismatch when compared to non-musicians.
Convention Paper 10030

P01-3 Perception of Phase Changes in the Context of Musical Audio Source Separation
Chungeun Kim, University of Surrey - Guildford, Surrey, UK; Sony Interactive Entertainment Europe - London, UK; Emad M. Grais, University of Surrey - Guildford, Surrey, UK; Russell Mason, University of Surrey - Guildford, Surrey, UK; Mark D. Plumbley, University of Surrey - Guildford, Surrey, UK
This study investigates the perceptual consequences of phase change in conventional magnitude-based source separation. A listening test was conducted in which the participants compared three different source separation scenarios, each with two phase retrieval cases: phase from the original mix or from the target source. The participants' ratings of similarity to the reference showed that (1) the difference between the mix phase and the perfect target phase was perceivable in the majority of cases, with some song-dependent exceptions, and (2) use of the mix phase degraded the perceived quality even in the case of perfect magnitude separation. The findings imply that there is room for perceptual improvement by attempting correct phase reconstruction in addition to achieving better magnitude-based separation.
Convention Paper 10031
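The two phase retrieval cases named in the abstract above can be illustrated with a minimal sketch. It is not the authors' test procedure: it assumes a single FFT over the whole signal instead of a short-time transform, and the two sinusoids, the sample rate, and the helper names `recombine` and `rms_error` are hypothetical choices made only for illustration. It pairs the perfect target magnitude with either the target phase or the mix phase and measures the waveform error against the target.

```python
import numpy as np

def recombine(magnitude, phase_source):
    """Pair a magnitude spectrum with the phase of another spectrum."""
    return magnitude * np.exp(1j * np.angle(phase_source))

def rms_error(estimate, reference):
    """Root-mean-square difference between two waveforms."""
    return float(np.sqrt(np.mean((estimate - reference) ** 2)))

# Hypothetical toy signals: a target and an interferer close in frequency.
sr = 8000
t = np.arange(1024) / sr
target = np.sin(2 * np.pi * 440.0 * t)
other = 0.8 * np.sin(2 * np.pi * 450.0 * t + 1.0)
mix = target + other

# Assumption: one FFT over the whole signal stands in for an STFT.
T = np.fft.rfft(target)
M = np.fft.rfft(mix)

# Perfect magnitude separation, with two choices of phase.
est_target_phase = np.fft.irfft(recombine(np.abs(T), T), n=len(target))
est_mix_phase = np.fft.irfft(recombine(np.abs(T), M), n=len(target))

err_target_phase = rms_error(est_target_phase, target)
err_mix_phase = rms_error(est_mix_phase, target)
print("RMS error with target phase:", err_target_phase)
print("RMS error with mix phase:   ", err_mix_phase)
```

In this toy case the target-phase estimate reproduces the target to numerical precision, while the mix-phase estimate does not, even though its magnitude is perfect. Waveform error is only a signal-level proxy and says nothing by itself about audibility, which is what the listening test addresses.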

P01-4 Method for Quantitative Evaluation of Auditory Perception of Nonlinear Distortion: Part II – Metric for Music Signal Tonality and its Impact on Subjective Perception of Distortions
Mikhail Pakhomov, SPb Audio R&D Lab - St. Petersburg, Russia; Victor Rozhnov, SPb Audio R&D Lab - St. Petersburg, Russia
In the first part of the paper we observed that the impact of audible nonlinear distortions on subjective listener preference is strongly dependent on the spectral structure of a test signal. In the second part we propose a method for considering the spectral characteristics of a test signal in the evaluation of the subjective perception of audible nonlinear distortions. To describe the tonal structure of a music signal, a qualitative characteristic, tonality, is taken as a metric, and a tonality coefficient is proposed as a measure of this characteristic. Subjective listening tests were performed to estimate how the auditory perception of nonlinear distortions depends on the tonal structure of a signal and the spectral distribution of the noise-to-mask ratio (NMR).
Convention Paper 10032
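The abstract above does not specify how its tonality coefficient is computed. As a rough illustration of what such a measure can look like, the sketch below uses one widely used definition based on the spectral flatness measure (the ratio of the geometric to the arithmetic mean of the power spectrum), mapped to the range 0 to 1 with a -60 dB reference. This is an assumption for illustration and may differ from the authors' metric; the function name, frame length, sample rate, and test signals are all hypothetical.

```python
import numpy as np

def tonality_coefficient(frame, sfm_db_max=-60.0, eps=1e-12):
    """Spectral-flatness-based tonality: near 0 for noise, 1 for strongly tonal frames.

    Assumption: this is a common textbook definition, not necessarily
    the measure proposed in the paper.
    """
    windowed = frame * np.hanning(len(frame))
    # eps keeps the logarithm finite for bins with no energy.
    power = np.abs(np.fft.rfft(windowed)) ** 2 + eps
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    sfm_db = 10.0 * np.log10(geometric_mean / arithmetic_mean)
    return float(min(sfm_db / sfm_db_max, 1.0))

# Hypothetical test frames: a pure tone and white noise.
n = 2048
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 1000.0 * np.arange(n) / 48000.0)
noise = rng.standard_normal(n)

tone_alpha = tonality_coefficient(tone)
noise_alpha = tonality_coefficient(noise)
print("tone:", tone_alpha, "noise:", noise_alpha)
```

A per-frame value like this could then be averaged over a music excerpt to characterize how tonal the test signal is before relating it to distortion ratings.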

P01-5 Developing a Method for the Subjective Evaluation of Smartphone Music Playback
Elisabeth McMullin, Samsung Research America - Valencia, CA, USA; Victoria Suha, Samsung Electronics - Valencia, CA, USA; Yuan Li, Samsung Research America - Valencia, CA, USA; Will Saba, Samsung Electronics - Valencia, CA, USA; Pascal Brunet, Samsung Research America - Valencia, CA, USA; Audio Group - Digital Media Solutions
To determine the preferred audio characteristics for media playback over smartphones, a series of controlled double-blind listening experiments was run to evaluate the subjective playback quality of six high-end smartphones. Listeners rated products based on their audio quality preference and left comments categorized by attribute. The devices were tested in different orientations in level-matched and maximum-volume scenarios. Positional variation and biases were accounted for using a motorized turntable, and audio playback was controlled remotely with remote-access software. Test results were compared to spatially averaged measurements made using a multitone stimulus and demonstrate that the smoothness of the frequency response is the most important aspect in smartphone preference. Low frequency extension, decreased levels of nonlinear distortion, and higher maximum playback level did not correlate with higher phone ratings.
Convention Paper 10033

P01-6 Investigation into the Effects of Subjective Test Interface Choice on the Validity of Results
Nicholas Jillings, Birmingham City University - Birmingham, UK; Brecht De Man, Birmingham City University - Birmingham, UK; Ryan Stables, Birmingham City University - Birmingham, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
Subjective experiments are a cornerstone of modern research, with a variety of tasks being undertaken by subjects. In the field of audio, subjective listening tests provide validation for research and aid fair comparison between techniques or devices such as coding performance, speakers, mixes, and source separation systems. Several interfaces have been designed to mitigate biases and to standardize procedures, enabling indirect comparisons. The number of different combinations of interface and test design makes it extremely difficult to conduct a truly unbiased listening test. This paper resolves the largest of these variables by identifying the impact the interface itself has on a purely auditory test. This information is used to make recommendations for specific categories of listening tests.
Convention Paper 10034
