AES E-Library

AES E-Library Search Results

Improved Psychoacoustic Model for Efficient Perceptual Audio Codecs

Since early perceptual audio coders such as mp3, the underlying psychoacoustic model that controls the encoding process has not undergone many dramatic changes. Meanwhile, modern audio coders have been equipped with semi-parametric or parametric coding tools such as audio bandwidth extension. Thereby, the initial psychoacoustic model used in a perceptual coder, just considering added quantization noise, became partly unsuitable. We propose the use of an improved psychoacoustic excitation model based on an existing model proposed by Dau et al. in 1997. This modulation-based model is essentially independent from the input waveform by calculating an internal auditory representation. Using the example of MPEG-H 3D Audio and its semi-parametric Intelligent Gap Filling (IGF) tool, we demonstrate that we can successfully control the IGF parameter selection process to achieve overall improved perceptual quality.

Authors: Disch, Sascha; van de Par, Steven; Niedermeier, Andreas; Burdiel Pérez, Elena; Berasategui Ceberio, Ane; Edler, Bernd
Affiliations: University of Oldenburg, Oldenburg, Germany; Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany; Friedrich Alexander University, International Audio Laboratories Erlangen, Erlangen, Germany (See document for exact affiliation information.)
AES Convention: 145 (October 2018) Paper Number: 10029
Publication Date: October 7, 2018
Subject: Perception – Part 1

On the Influence of Cultural Differences on the Perception of Audio Coding Artifacts in Music

Modern audio codecs are used all over the world, reaching listeners with many different cultures and languages. This study investigates if and how cultural background influences the perception and preference of different audio coding artifacts, focusing on musical content. A subjective listening test was designed to directly compare different types of audio coding and was performed with Mandarin Chinese and German speaking listeners. Overall comparison showed largely consistent results, affirming the validity of the proposed test method. Differential comparison indicates preferences for certain artifacts in different listener groups, e.g., Chinese listeners tended to grade tonality mismatch higher and pre-echoes worse compared to German listeners, and musicians preferred bandwidth limitation over tonality mismatch when compared to non-musicians.

Authors: Dick, Sascha; Zhang, Jiandong; Qin, Yili; Schinkel-Bielefeld, Nadja; Leschanowsky, Anna Katharina; Nagel, Frederik
Affiliations: International Audio Laboratories Erlangen, a joint institution of Universität Erlangen-Nürnberg and Fraunhofer IIS, Erlangen, Germany; Academy of Broadcasting Planning, SAPPRFT, Beijing, China (See document for exact affiliation information.)
AES Convention: 145 (October 2018) Paper Number: 10030
Publication Date: October 7, 2018
Subject: Perception – Part 1

Perception of Phase Changes in the Context of Musical Audio Source Separation

This study investigates the perceptual consequences of phase changes in conventional magnitude-based source separation. A listening test was conducted, where the participants compared three different source separation scenarios, each with two phase retrieval cases: phase from the original mix or from the target source. The participants’ responses regarding their similarity to the reference showed that (1) the difference between the mix phase and the perfect target phase was perceivable in the majority of cases with some song-dependent exceptions, and (2) use of the mix phase degraded the perceived quality even in the case of perfect magnitude separation. The findings imply that there is room for perceptual improvement by attempting correct phase reconstruction in addition to achieving better magnitude-based separation.

Authors: Kim, Chungeun; Grais, Emad M.; Mason, Russell; Plumbley, Mark D.
Affiliation: University of Surrey, Guildford, Surrey, UK
AES Convention: 145 (October 2018) Paper Number: 10031
Publication Date: October 7, 2018
Subject: Perception – Part 1

Method for Quantitative Evaluation of Auditory Perception of Nonlinear Distortion: Part II – Metric for Music Signal Tonality and its Impact on Subjective Perception of Distortions

In the first part of the paper we have noticed that the impact of audible nonlinear distortions on subjective listener preference is strongly dependent on the spectral structure of a test signal. In the second part we propose a method for considering the spectral characteristics of a test signal in the evaluation of the subjective perception of audible nonlinear distortions. To describe the tonal structure of a music signal, a qualitative characteristic, tonality, is taken as a metric, and a tonality coefficient is proposed as a measure of this characteristic. Subjective listening tests were performed to estimate how the auditory perception of nonlinear distortions depends on the tonal structure of a signal and the spectral distribution of the noise-to-mask ratio (NMR).

Authors: Pakhomov, Mikhail; Rozhnov, Victor
Affiliation: SPb Audio R&D Lab, St. Petersburg, Russia
AES Convention: 145 (October 2018) Paper Number: 10032
Publication Date: October 7, 2018
Subject: Perception – Part 1

Developing a Method for the Subjective Evaluation of Smartphone Music Playback

To determine the preferred audio characteristics for media playback over smartphones, a series of controlled double-blind listening experiments were run to evaluate the subjective playback quality of six high-end smartphones. Listeners rated products based on their audio quality preference and left comments categorized by attribute. The devices were tested in different orientations in level-matched and maximum-volume scenarios. Positional variation and biases were accounted for using a motorized turntable and audio playback was controlled remotely with remote-access software. Test results were compared to spatially-averaged measurements made using a multitone stimulus and demonstrate that the smoothness of the frequency response is the most important aspect in smartphone preference. Low frequency extension, decreased levels of nonlinear distortion, and higher maximum playback level did not correlate with higher phone ratings.

Authors: McMullin, Elisabeth; Suha, Victoria; Li, Yuan; Saba, Will; Brunet, Pascal
Affiliation: Samsung Research America, Valencia, CA USA
AES Convention: 145 (October 2018) Paper Number: 10033
Publication Date: October 7, 2018
Subject: Perception – Part 1

This paper is Open Access, which means you can download it for free.

Investigation into the Effects of Subjective Test Interface Choice on the Validity of Results

Subjective experiments are a cornerstone of modern research with a variety of tasks being undertaken by subjects. In the field of audio, subjective listening tests provide validation for research and aid fair comparison between techniques or devices such as coding performance, speakers, mixes, and source separation systems. Several interfaces have been designed to mitigate biases and to standardize procedures, enabling indirect comparisons. The number of different combinations of interface and test design makes it extremely difficult to conduct a truly unbiased listening test. This paper resolves the largest of these variables by identifying the impact the interface itself has on a purely auditory test. This information is used to make recommendations for specific categories of listening tests.

Authors: Jillings, Nicholas; De Man, Brecht; Stables, Ryan; Reiss, Joshua D.
Affiliations: Birmingham City University, Birmingham, UK; Queen Mary University of London, London, UK (See document for exact affiliation information.)
AES Convention: 145 (October 2018) Paper Number: 10034
Publication Date: October 7, 2018
Subject: Perception – Part 1

Musical Instrument Synthesis and Morphing in Multidimensional Latent Space Using Variational, Convolutional Recurrent Autoencoders

In this work we propose a deep learning based method—namely, variational, convolutional recurrent autoencoders (VCRAE)—for musical instrument synthesis. This method utilizes the higher level time-frequency representations extracted by the convolutional and recurrent layers to learn a Gaussian distribution in the training stage, which will be later used to infer unique samples through interpolation of multiple instruments in the usage stage. The reconstruction performance of VCRAE is evaluated by proxy through an instrument classifier and provides significantly better accuracy than two other baseline autoencoder methods. The synthesized samples for the combinations of 15 different instruments are available on the companion website.

Authors: Çakir, Emre; Virtanen, Tuomas
Affiliation: Tampere University of Technology, Tampere, Finland
AES Convention: 145 (October 2018) Paper Number: 10035
Publication Date: October 7, 2018
Subject: Signal Processing—Part 1

Music Enhancement by a Novel CNN Architecture

This paper is concerned with music enhancement by removal of coding artifacts and recovery of acoustic characteristics that preserve the sound quality of the original music content. In order to achieve this, we propose a novel convolutional neural network (CNN) architecture called FTD (Frequency-Time Dependent) CNN, which utilizes correlation and context information across spectral and temporal dependency for music signals. Experimental results show that both subjective and objective sound quality metrics are significantly improved. This unique way of applying a CNN to exploit global dependency across frequency bins may effectively restore information that is corrupted by coding artifacts in compressed music content.

Authors: Porov, Anton; Oh, Eunmi; Choo, Kihyun; Sung, Hosang; Jeong, Jonghoon; Osipov, Konstantin; Francois, Holly
Affiliations: PDMI RAS, St. Petersburg, Russia; Samsung Electronics Co., Ltd., Seoul, Korea; Samsung Electronics R&D Institute UK, Staines-Upon Thames, Surrey, UK (See document for exact affiliation information.)
AES Convention: 145 (October 2018) Paper Number: 10036
Publication Date: October 7, 2018
Subject: Signal Processing—Part 1

The New Dynamics Processing Effect in Android Open Source Project

The Android “P” Audio Framework’s new Dynamics Processing Effect (DPE) in Android Open Source Project (AOSP) provides developers with controls to fine-tune the audio experience using several stages of equalization, multi-band compressors, and linked limiters. The API allows developers to configure the DPE’s multichannel architecture to exercise real-time control over thousands of audio parameters. This talk additionally discusses the design and use of DPE in the recently announced Sound Amplifier accessibility service for Android and outlines other uses for acoustic compensation and hearing applications.

Author: Garcia, Ricardo
Affiliation: Google, Mountain View, CA, USA
AES Convention: 145 (October 2018) Paper Number: 10037
Publication Date: October 7, 2018
Subject: Signal Processing—Part 1

This paper is Open Access, which means you can download it for free.

On the Physiological Validity of the Group Delay Response of All-Pole Vocal Tract Modeling

Magnitude-oriented approaches dominate the voice analysis front-ends of most current technologies addressing, e.g., speaker identification, speech coding/compression, and voice reconstruction and re-synthesis. A popular technique is all-pole vocal tract modeling. The phase response of all-pole models is known to be non-linear and highly dependent on the magnitude frequency response. In this paper we use a shift-invariant phase-related feature that is estimated from signal harmonics in order to study the impact of all-pole models on the phase structure of voiced sounds. We relate that impact to the phase structure that is found in natural voiced sounds to conclude on the physiological validity of the group delay of all-pole vocal tract modeling. Our findings emphasize that harmonic phase models are idiosyncratic, and this is important in speaker identification and in fostering the quality and naturalness of synthetic and reconstructed speech.

Author: Ferreira, Aníbal
Affiliation: University of Porto, Porto, Portugal
AES Convention: 145 (October 2018) Paper Number: 10038
Publication Date: October 7, 2018
Subject: Signal Processing—Part 1

Search Results (Displaying 1-10 of 103 matches)