Since early perceptual audio coders such as mp3, the underlying psychoacoustic model that controls the encoding process has not undergone many dramatic changes. Meanwhile, modern audio coders have been equipped with semi-parametric or parametric coding tools such as audio bandwidth extension. Thereby, the initial psychoacoustic model used in a perceptual coder, just considering added quantization noise, became partly unsuitable. We propose the use of an improved psychoacoustic excitation model based on an existing model proposed by Dau et al. in 1997. This modulation-based model calculates an internal auditory representation and is therefore essentially independent of the input waveform. Using the example of MPEG-H 3D Audio and its semi-parametric Intelligent Gap Filling (IGF) tool, we demonstrate that we can successfully control the IGF parameter selection process to achieve overall improved perceptual quality.
Click to purchase paper as a non-member or login as an AES member. If your company or school subscribes to the E-Library then switch to the institutional version. If you are not an AES member and would like to subscribe to the E-Library then Join the AES!
This paper costs $33 for non-members and is free for AES members and E-Library subscribers.
Start a discussion about this paper!
Modern audio codecs are used all over the world, reaching listeners with many different cultures and languages. This study investigates if and how cultural background influences the perception and preference of different audio coding artifacts, focusing on musical content. A subjective listening test was designed to directly compare different types of audio coding and was performed with Mandarin Chinese and German speaking listeners. Overall comparison showed largely consistent results, affirming the validity of the proposed test method. Differential comparison indicates preferences for certain artifacts in different listener groups, e.g., Chinese listeners tended to grade tonality mismatch higher and pre-echoes worse compared to German listeners, and musicians preferred bandwidth limitation over tonality mismatch when compared to non-musicians.
This study investigates the perceptual consequences of phase change in conventional magnitude-based source separation. A listening test was conducted in which the participants compared three different source separation scenarios, each with two phase retrieval cases: phase from the original mix or from the target source. The participants’ responses regarding their similarity to the reference showed that (1) the difference between the mix phase and the perfect target phase was perceivable in the majority of cases with some song-dependent exceptions, and (2) use of the mix phase degraded the perceived quality even in the case of perfect magnitude separation. The findings imply that there is room for perceptual improvement by attempting correct phase reconstruction in addition to achieving better magnitude-based separation.
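The two phase retrieval cases can be illustrated with a minimal sketch. It is not the study's test setup: it assumes a single FFT frame instead of a full STFT, a synthetic sine as the target, and white noise as the interfering source. The names `reconstruct` and `rms_error` are illustrative helpers.

```python
import numpy as np

def reconstruct(magnitude, phase, n_samples):
    """Rebuild a time signal from a one-sided magnitude and phase spectrum."""
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=n_samples)

def rms_error(reference, estimate):
    """Root-mean-square difference between two signals."""
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))

rng = np.random.default_rng(0)
n_samples = 1024
t = np.arange(n_samples) / 16000.0            # assumed 16 kHz sample rate
target = np.sin(2 * np.pi * 440.0 * t)        # assumed target source
interferer = 0.5 * rng.standard_normal(n_samples)
mix = target + interferer

target_spec = np.fft.rfft(target)
mix_spec = np.fft.rfft(mix)

# "Perfect magnitude separation": the true target magnitude is known.
ideal_magnitude = np.abs(target_spec)

# Case 1: phase taken from the target source itself.
est_target_phase = reconstruct(ideal_magnitude, np.angle(target_spec), n_samples)
# Case 2: phase taken from the original mix.
est_mix_phase = reconstruct(ideal_magnitude, np.angle(mix_spec), n_samples)

err_target_phase = rms_error(target, est_target_phase)
err_mix_phase = rms_error(target, est_mix_phase)
print(err_target_phase, err_mix_phase)
```

In this sketch the target-phase case reproduces the target up to rounding error, while the mix-phase case leaves a residual even though the magnitude is exact. Whether that residual is audible is the question the listening test addresses.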
In the first part of the paper we noted that the impact of audible nonlinear distortions on subjective listener preference depends strongly on the spectral structure of the test signal. In the second part we propose a method for considering the spectral characteristics of a test signal in the evaluation of the subjective perception of audible nonlinear distortions. To describe the tonal structure of a music signal, a qualitative characteristic, tonality, is taken as a metric, and a tonality coefficient is proposed as a measure of this characteristic. Subjective listening tests were performed to estimate how the auditory perception of nonlinear distortions depends on the tonal structure of a signal and the spectral distribution of the noise-to-mask ratio (NMR).
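The abstract does not define its tonality coefficient. As a hedged illustration of what such a measure can look like, the sketch below uses the well-known spectral-flatness-based tonality coefficient from perceptual coding (Johnston's formulation, with a reference flatness of -60 dB). This is an assumption for illustration and may differ from the measure proposed in the paper. The function name `tonality_coefficient` is hypothetical.

```python
import numpy as np

def tonality_coefficient(frame):
    """Spectral-flatness-based tonality: about 0 for noise, 1 for a pure tone.

    Assumed definition (not taken from the paper):
    alpha = min(SFM_dB / -60 dB, 1), where SFM is the ratio of the geometric
    mean to the arithmetic mean of the power spectrum.
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2 + 1e-12   # floor avoids log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    sfm_db = 10.0 * np.log10(geometric_mean / arithmetic_mean)
    return float(min(sfm_db / -60.0, 1.0))

rng = np.random.default_rng(0)
n_samples = 1024
t = np.arange(n_samples) / 16000.0             # assumed 16 kHz sample rate
tone = np.sin(2 * np.pi * 440.0 * t)           # strongly tonal test signal
noise = rng.standard_normal(n_samples)         # noise-like test signal

alpha_tone = tonality_coefficient(tone)
alpha_noise = tonality_coefficient(noise)
print(alpha_tone, alpha_noise)
```

A coefficient of this kind gives a single number per frame, so distortion ratings can be compared across test signals with different tonal structure.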
To determine the preferred audio characteristics for media playback over smartphones, a series of controlled double-blind listening experiments was run to evaluate the subjective playback quality of six high-end smartphones. Listeners rated products based on their audio quality preference and left comments categorized by attribute. The devices were tested in different orientations in level-matched and maximum-volume scenarios. Positional variation and biases were accounted for using a motorized turntable, and audio playback was controlled with remote-access software. Test results were compared to spatially averaged measurements made using a multitone stimulus and demonstrate that the smoothness of the frequency response is the most important aspect in smartphone preference. Low-frequency extension, decreased levels of nonlinear distortion, and higher maximum playback level did not correlate with higher phone ratings.
Download Now (1.4 MB)
This paper is Open Access which means you can download it for free.
Subjective experiments are a cornerstone of modern research, with a variety of tasks being undertaken by subjects. In the field of audio, subjective listening tests provide validation for research and aid fair comparison between techniques or devices, covering areas such as coding performance, speakers, mixes, and source separation systems. Several interfaces have been designed to mitigate biases and to standardize procedures, enabling indirect comparisons. The number of different combinations of interface and test design makes it extremely difficult to conduct a truly unbiased listening test. This paper resolves the largest of these variables by identifying the impact the interface itself has on a purely auditory test. This information is used to make recommendations for specific categories of listening tests.
In this work we propose a deep learning based method—namely, variational, convolutional recurrent autoencoders (VCRAE)—for musical instrument synthesis. This method utilizes the higher level time-frequency representations extracted by the convolutional and recurrent layers to learn a Gaussian distribution in the training stage, which will be later used to infer unique samples through interpolation of multiple instruments in the usage stage. The reconstruction performance of VCRAE is evaluated by proxy through an instrument classifier and provides significantly better accuracy than two other baseline autoencoder methods. The synthesized samples for the combinations of 15 different instruments are available on the companion website.
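The interpolation step described above can be sketched in a few lines. This is only an illustration of the general variational-autoencoder idea, not the VCRAE architecture: the encoder and decoder are omitted, the latent size is arbitrary, and the per-instrument Gaussian parameters (`mean_a`, `log_var_a`, and so on) are random placeholders rather than learned values.

```python
import numpy as np

def sample_latent(mean, log_var, rng):
    """Draw z = mean + sigma * eps (the usual reparameterization)."""
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * log_var) * eps

def interpolate(z_a, z_b, weight):
    """Linear blend of two latent vectors; weight=0 gives z_a, weight=1 gives z_b."""
    return (1.0 - weight) * z_a + weight * z_b

rng = np.random.default_rng(0)
latent_dim = 8                                   # hypothetical latent size

# Placeholder Gaussian parameters that a trained encoder would produce
# for two different instruments (hypothetical values).
mean_a, log_var_a = rng.standard_normal(latent_dim), np.full(latent_dim, -2.0)
mean_b, log_var_b = rng.standard_normal(latent_dim), np.full(latent_dim, -2.0)

z_a = sample_latent(mean_a, log_var_a, rng)
z_b = sample_latent(mean_b, log_var_b, rng)

# A latent point halfway between the two instruments; a trained decoder
# (not shown) would map it back to a time-frequency representation.
z_blend = interpolate(z_a, z_b, 0.5)
print(z_blend)
```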
This paper is concerned with music enhancement by removal of coding artifacts and recovery of acoustic characteristics that preserve the sound quality of the original music content. To achieve this, we propose a novel convolutional neural network (CNN) architecture called FTD (Frequency-Time Dependent) CNN, which utilizes correlation and context information across the spectral and temporal dependencies of music signals. Experimental results show that both subjective and objective sound quality metrics are significantly improved. This unique way of applying a CNN to exploit global dependency across frequency bins may effectively restore information that is corrupted by coding artifacts in compressed music content.
The Android “P” Audio Framework’s new Dynamics Processing Effect (DPE) in the Android Open Source Project (AOSP) provides developers with controls to fine-tune the audio experience using several stages of equalization, multi-band compressors, and linked limiters. The API allows developers to configure the DPE’s multichannel architecture to exercise real-time control over thousands of audio parameters. This talk additionally discusses the design and use of DPE in the recently announced Sound Amplifier accessibility service for Android and outlines other uses for acoustic compensation and hearing applications.
Download Now (3.4 MB)
This paper is Open Access which means you can download it for free.
Magnitude-oriented approaches dominate the voice analysis front-ends of most current technologies addressing, e.g., speaker identification, speech coding/compression, and voice reconstruction and re-synthesis. A popular technique is all-pole vocal tract modeling. The phase response of all-pole models is known to be non-linear and highly dependent on the magnitude frequency response. In this paper we use a shift-invariant phase-related feature that is estimated from signal harmonics in order to study the impact of all-pole models on the phase structure of voiced sounds. We relate that impact to the phase structure that is found in natural voiced sounds to draw conclusions about the physiological validity of the group delay of all-pole vocal tract modeling. Our findings emphasize that harmonic phase models are idiosyncratic, and this is important in speaker identification and in fostering the quality and naturalness of synthetic and reconstructed speech.
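The coupling between magnitude and phase in an all-pole model can be seen in a small numerical sketch. It assumes a single two-pole resonator H(z) = 1 / A(z) as a stand-in for one vocal tract formant. The resonance angle and pole radii are arbitrary illustrative values, and the phase feature used in the paper is not reproduced here.

```python
import numpy as np

def all_pole_response(a, n_freqs=512):
    """Frequency response of H(z) = 1 / A(z) on [0, pi), a = [1, a1, a2, ...]."""
    w = np.linspace(0.0, np.pi, n_freqs, endpoint=False)
    k = np.arange(len(a))
    A = np.exp(-1j * np.outer(w, k)) @ np.asarray(a, dtype=float)
    return w, 1.0 / A

def group_delay(w, H):
    """Group delay in samples: negative derivative of the unwrapped phase."""
    phase = np.unwrap(np.angle(H))
    return -np.gradient(phase, w)

def resonator(radius, angle):
    """Denominator coefficients for a complex-conjugate pole pair."""
    return [1.0, -2.0 * radius * np.cos(angle), radius ** 2]

angle = np.pi / 4                      # assumed resonance frequency (rad/sample)
w, H_broad = all_pole_response(resonator(0.90, angle))
_, H_sharp = all_pole_response(resonator(0.98, angle))

gd_broad = group_delay(w, H_broad)
gd_sharp = group_delay(w, H_sharp)

# Moving the poles toward the unit circle sharpens the magnitude peak and,
# at the same time, increases the group delay around the resonance.
print(np.abs(H_broad).max(), np.abs(H_sharp).max())
print(gd_broad.max(), gd_sharp.max())
```

Because the poles fix both quantities at once, fitting the model to a magnitude spectrum also dictates its phase, which need not match the phase structure of natural voiced sounds.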