AES E-Library

AES E-Library Search Results


A Perceptual Evaluation of Spatial Room Impulse Responses Captured Using Spaced Higher-Order Ambisonic Microphone Arrays

Ambisonic spatial room impulse responses (SRIRs) can be used in digital reverberators to create three-dimensional artificial reverberation. Previous research has shown that mixing the spatially filtered outputs from two spaced higher-order ambisonic (HOA) microphone arrays can improve listeners' perception of ambient sound compared to a single HOA array in music recording applications. Perceptual attributes used to describe SRIRs may be similarly influenced by this technique. In the present study, a method for deriving a composite HOA SRIR from two spaced HOA microphone arrays is described. Several microphone spacing distances commonly used in stereophonic recording were measured (17 cm, 34 cm, and 68 cm) and compared to a single-array control. Subjective evaluations were conducted over a 7.0.4 loudspeaker array and headphones (binaural decoding). The stimuli were rated based on perceived clarity of background and early spatial impressions, environment / room width, sensation of physical presence, and overall listening experience. Results show the technique may improve the clarity of early spatial impressions and perceived environment width under certain conditions. Generally, participants had difficulty discriminating between array spacings. The study suggests that there are potential perceptual benefits to this approach, but further studies involving additional room and source types are needed.

Authors: Kelly, Jack; Grond, Florian; Woszczyk, Wieslaw; King, Richard
Affiliations: McGill University and CIRMMT, Montreal, Canada; McGill University, CIRMMT, Montreal, Canada and Zylia, Poznan, Poland; McGill University and CIRMMT, Montreal, Canada; McGill University and CIRMMT, Montreal, Canada (See document for exact affiliation information.)
Express Paper 1; AES Convention 153; October 2022
Publication Date: October 19, 2022
Subject: Spatial Audio


This paper costs $33 for non-members and is free for AES members and E-Library subscribers.


A Perceptual Evaluation of Spatial Room Impulse Responses Convolved with Multichannel Direct Sound

In a channel-based paradigm, artificial reverberation can be produced by spatial room impulse response (SRIR) convolution. Many receiver positions are measured using multichannel microphone arrays, typically optimized for specific loudspeaker configurations. Each SRIR channel is then convolved with a single monophonic direct sound input to auralize a virtual acoustic environment (VAE). In the following study, an experimental direct sound input treatment is presented where each SRIR channel is convolved with a bespoke direct sound file recorded at a specific angle of incidence with reference to the performer. The result is an SRIR auralization which incorporates the instruments’ time-variant radiation characteristics via multichannel direct sound. A subjective evaluation was conducted to assess the perceptual differences between SRIRs with and without the multichannel direct sound treatment. Participants’ sense of physical presence, sensory immersion, environment / room width, and naturalness of the sound source were evaluated. Results of three-way within-subject ANOVAs show that the multichannel direct sound convolution treatment significantly improved participants’ ratings of all attributes under test, and was preferred over the monophonic direct sound treatment. A significant interaction between reverberation treatment and instrument type was observed for environment / room width. The study suggests that this technique can be effective in practice, and more broadly supports the notion that representing a sound source’s time-variant directivity in artificial reverberation could be a beneficial feature in the design of 3D audio production tools.

Authors: Kelly, Jack; Woszczyk, Wieslaw; King, Richard
Affiliations: McGill University and CIRMMT, Montreal, Canada; McGill University and CIRMMT, Montreal, Canada; McGill University and CIRMMT, Montreal, Canada (See document for exact affiliation information.)
Express Paper 2; AES Convention 153; October 2022
Publication Date: October 19, 2022
Subject: Spatial Audio

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.
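
The difference between the two direct sound treatments described in the abstract of Express Paper 2 can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function names (`convolve`, `auralize_mono`, `auralize_multichannel`) and the toy two-channel data are hypothetical, and a practical system would use FFT-based convolution on real measured SRIRs.

```python
from typing import List, Sequence


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def auralize_mono(direct: Sequence[float],
                  srir: Sequence[Sequence[float]]) -> List[List[float]]:
    """Conventional treatment: one mono direct signal feeds every SRIR channel."""
    return [convolve(direct, h) for h in srir]


def auralize_multichannel(directs: Sequence[Sequence[float]],
                          srir: Sequence[Sequence[float]]) -> List[List[float]]:
    """Experimental treatment: SRIR channel k is convolved with direct recording k."""
    if len(directs) != len(srir):
        raise ValueError("need one direct-sound recording per SRIR channel")
    return [convolve(d, h) for d, h in zip(directs, srir)]


# Toy data (hypothetical): a 2-channel SRIR and very short signals.
srir = [[1.0, 0.5], [0.0, 1.0]]
mono = [1.0, 2.0]
directs = [[1.0, 2.0], [2.0, 1.0]]  # one recording per angle of incidence

mono_out = auralize_mono(mono, srir)
multi_out = auralize_multichannel(directs, srir)
```

In the multichannel case the second output channel differs from the mono result because it is driven by its own direct-sound recording, which is how the source's radiation pattern enters the reverberant output.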


Design Optimization of Acoustic Structures Using Geometry Parameterization and Particle Swarm Optimization Algorithm

The geometric design of acoustic components, such as horns and surroundings, is time-consuming and complicated. A design process is introduced in this study to shorten the design-test-optimization cycle. The design process consists of three parts. The first part is the geometry parameterization. Formulas to build the shape of the geometry are provided, from a 2D profile to a 3D geometry. In the second part, the acoustic properties of the acoustic components are simulated; for example, the sound pressure level (SPL), which is considered in most acoustic component designs. Optimization is performed in the third part. Using the particle swarm optimization (PSO) algorithm, the geometric parameters are modified to improve the acoustic properties. We consider the horn as an example in this study. However, the parametric geometry and optimization method above are highly flexible and can be applied to diverse structures in which the derivative of the surface is expected to be continuous. The entire process is assembled in a single software package, which makes the design program easy to handle and highly efficient.

Authors: Li, Chaopen; Cao, Huixain
Affiliations: vivo Mobile Communication Co, Ltd, China; vivo Mobile Communication Co, Ltd, China (See document for exact affiliation information.)
Express Paper 3; AES Convention 153; October 2022
Publication Date: October 19, 2022
Subject: Transducers

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.
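
The optimization step named in the abstract of Express Paper 3 can be illustrated with a generic particle swarm optimizer. The sketch below is a textbook global-best PSO in plain Python, not the authors' software: the function `pso`, its default coefficients, and the quadratic `toy_cost` (standing in for an acoustic simulation of SPL error over two geometry parameters) are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(objective: Callable[[Sequence[float]], float],
        bounds: Sequence[Tuple[float, float]],
        n_particles: int = 20, n_iters: int = 100,
        w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `objective` inside box `bounds` with a global-best particle swarm."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_cost(params: Sequence[float]) -> float:
    """Hypothetical stand-in for a simulated SPL error; minimum at (0.3, 1.2)."""
    a, b = params
    return (a - 0.3) ** 2 + (b - 1.2) ** 2


best, best_cost = pso(toy_cost, bounds=[(0.0, 1.0), (0.0, 2.0)])
```

In a real design loop, `toy_cost` would be replaced by a call that builds the parameterized geometry, runs the acoustic simulation, and returns a scalar error against the target response.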


Immersive Personal Sound Using a Surface Nearfield Source

This paper discusses sound reproduction using a surface nearfield source (SNS), which falls between headphones and loudspeakers and also provides a natural audio-tactile augmentation to the listening experience. The SNS can be embedded, for example, in a headrest as a personal sound system. In this sense it is similar to headphones, but there is no need to wear a device. The SNS also has several advantages over loudspeakers, such as a suppressed room effect and enhanced bass perception. Differences and similarities of the SNS approach with open and closed headphones, mobile device speakers, and regular loudspeakers are itemized. The SNS implementation is applicable to, e.g., movie theater couches and car seats.

Open Access

Authors: Linjama, Jukka; Välimäki, Vesa
Affiliations: Flexound Systems Oy, Espoo, Finland; Flexound Systems Oy, Espoo, Finland and Aalto University, Espoo, Finland (See document for exact affiliation information.)
Express Paper 4; AES Convention 153; October 2022
Publication Date: October 19, 2022
Subject: Spatial Audio


This paper is Open Access which means you can download it for free.


Improving Domain Generalization Via Event-Based Acoustic Scene Classification

Acoustic Scene Classification (ASC) has been typically addressed by feeding raw audio features to deep neural networks. However, such an audio-based approach has consistently proved to result in poor model generalization across different recording devices. In fact, device-specific transfer functions and nonlinear dynamic range compression highly affect spectro-temporal features, resulting in a deviation from the learned data distribution known as domain shift. In this paper, we present an alternative ASC paradigm that involves ditching the classic end-to-end audio-based training in favor of gathering an intermediate event-based representation of the acoustic scenes using large-scale pretrained models. Performance evaluation on the TAU Urban Acoustic Scenes 2020 Mobile Development dataset shows that the proposed event-based approach is up to 160% more robust than corresponding audio-based methods in the face of mismatched recording devices.

Authors: Mezza, Alessandro Ilic; Sarti, Augusto
Affiliations: Politecnico di Milano, Italy; Politecnico di Milano, Italy (See document for exact affiliation information.)
Express Paper 5; AES Convention 153; October 2022
Publication Date: October 19, 2022
Subject: Semantic Audio

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.


1D Convolutional Layers to Create Frequency-Based Spectral Features for Audio Networks

Time-frequency transformations and spectral representations of audio signals are commonly used in various machine learning applications. Training networks on frequency features such as the Mel-Spectrogram or Chromagram has been proven more effective and convenient than training on time samples. In practical realizations, these features are created on a different processor and/or pre-computed and stored on disk, requiring additional effort and making it difficult to experiment with various combinations. In this paper, we provide a PyTorch framework for creating spectral features and time-frequency transformations using the built-in trainable conv1d() layer. This allows these features to be computed on-the-fly as part of a larger network and enables easier experimentation with various parameters. Our work extends the work in the literature developed for that end: first by adding more of these features, and also by allowing either training from initialized kernels or training from random values and converging to the desired solution. The code is written as a template of classes and scripts that users may integrate into their own PyTorch classes for various applications.

Open Access

Authors: Nemer, Elias; Vines, Greg
Affiliations: Irvine, CA, USA; Irvine, CA, USA (See document for exact affiliation information.)
Express Paper 6; AES Convention 153; October 2022
Publication Date: October 19, 2022
Subject: Applications in Audio


This paper is Open Access which means you can download it for free.
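
The idea behind Express Paper 6, computing a spectral transform as a 1D convolution, can be shown without any framework. The sketch below is plain Python, not the authors' PyTorch code: it builds cosine and sine kernels for each DFT bin and slides them over the signal with a hop, which is the same arithmetic a conv1d layer performs when its kernel size is the FFT length and its stride is the hop (PyTorch's conv1d computes a cross-correlation). The names `dft_kernels`, `strided_correlate`, and `magnitude_spectrogram` are hypothetical, and no analysis window is applied, to keep the example short.

```python
import math
from typing import List, Sequence, Tuple


def dft_kernels(n_fft: int) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (cos, sin) kernel banks: one length-n_fft kernel per frequency bin."""
    n_bins = n_fft // 2 + 1
    cos_k = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(n_bins)]
    sin_k = [[-math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(n_bins)]
    return cos_k, sin_k


def strided_correlate(x: Sequence[float], kernel: Sequence[float],
                      hop: int) -> List[float]:
    """One output channel of a 1D conv layer (stride=hop, no padding, no bias)."""
    n = len(kernel)
    return [sum(x[s + i] * kernel[i] for i in range(n))
            for s in range(0, len(x) - n + 1, hop)]


def magnitude_spectrogram(x: Sequence[float], n_fft: int = 16,
                          hop: int = 8) -> List[List[float]]:
    """Magnitude STFT laid out as [n_bins][n_frames], built from two kernel banks."""
    cos_k, sin_k = dft_kernels(n_fft)
    spec = []
    for ck, sk in zip(cos_k, sin_k):
        re = strided_correlate(x, ck, hop)  # real part of each frame's DFT bin
        im = strided_correlate(x, sk, hop)  # imaginary part
        spec.append([math.hypot(r, i) for r, i in zip(re, im)])
    return spec


# Test tone placed exactly on bin 2 of a 16-point DFT.
n_fft = 16
signal = [math.cos(2 * math.pi * 2 * n / n_fft) for n in range(32)]
spec = magnitude_spectrogram(signal, n_fft=n_fft, hop=8)
peak_bin = max(range(len(spec)), key=lambda k: spec[k][0])
```

Because the kernels are ordinary weight arrays, a framework layer initialized with them could either be frozen to reproduce the transform or left trainable, which is the experimentation the abstract describes.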


The new digital cinema at Fraunhofer IIS as a versatile immersive reference listening room

This paper presents the new cinema room at Fraunhofer IIS, which has been designed as a laboratory for state-of-the-art and future audio and video research. It provides vast flexibility in regard to playback options to support the researchers’ and engineers’ work in the field of Next Generation Audio production, delivery, transmission and reproduction. At the same time it fulfills all relevant requirements for theatrical playback. To achieve this, the Digital Cinema of Fraunhofer IIS underwent a major upgrade, providing up-to-date audio and video playback to serve as a reference reproduction facility for quality assessment. In order to reproduce all substantial speaker layout formats or channel configurations, a powerful and highly flexible audio matrix has been implemented as a core element. It allows for processing, altering and monitoring of all audio signals before they are fed to 67 speakers. Measurements have been conducted to confirm that the acoustic characteristics satisfy the relevant standards. This paper focuses on audio features and only briefly mentions video projector capabilities, the screen, media control, and biosignal acquisition.

Authors: Scuda, Ulli; Mayenfels, Thomas; Eibl, Philipp; Hörlbacher, Wolfgang
Affiliations: Fraunhofer IIS; Fraunhofer IIS; Fraunhofer IIS; Fraunhofer IIS (See document for exact affiliation information.)
Express Paper 7; AES Convention 153; October 2022
Publication Date: October 19, 2022
Subject: Spatial Audio

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.


The Psychoacoustic Perception of Distance in Monophonic and Binaural Music

Music plays an important role in immersive environments. With the rise in immersive technology, scientists in the field have been researching how music is perceived in extended reality (XR), in an attempt to improve the authenticity of audiovisual experiences. The auditory perception of distance is an important issue in 3D virtual environments but has been considered an obscure area. The study aims to compare monophonic and binaural signals for more accurate localization, over different types of music audio sources.

Authors: Riaz, Maham; Roginska, Agnieszka
Affiliations: New York University, NY, USA; New York University, NY, USA (See document for exact affiliation information.)
Express Paper 8; AES Convention 153; October 2022
Publication Date: October 19, 2022
Subject: Spatial Audio

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.


Higher order ambisonics compression method based on autoencoder

The compression of three-dimensional sound field signals has always been a very important issue. Recently, an Independent Component Analysis (ICA) based Higher Order Ambisonics (HOA) compression method introduced blind source separation to address the discontinuity between frames in existing Singular Value Decomposition (SVD) based methods. However, ICA models reverberant environments poorly, and its objective is not to recover the original signal. In this work, we replace ICA with an autoencoder to improve the method’s ability to cope with reverberant conditions and to ensure, through a reconstruction loss, that separation and recovery are optimized jointly. We constructed a dataset with simulated and recorded signals, and verified the effectiveness of our method through objective and subjective experiments.

Authors: Qu, Tianshu; Xu, Jiahao; Yuan, Zeyu; Wu, Xihong
Affiliations: Peking University, Beijing, China; Peking University, Beijing, China; Peking University, Beijing, China; Peking University, Beijing, China (See document for exact affiliation information.)
Express Paper 9; AES Convention 153; October 2022
Publication Date: October 19, 2022
Subject: Applications in Audio

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.


Everything Plus the Kitchen Sink: An Introduction to Noise in Contemporary Art and Music

Disruptive, disturbing, and dangerous are all adjectives that are commonly attributed to noise. This may be because the experience of noise is likely to trigger the auditory startle response, which in turn propels one out of harm’s way. For those who are unable to consider noise beyond its negative connotations, it remains a threat. However, there is a growing number of artists and composers who choose to consider noise differently and use it as an aesthetic material. With unconventional methods, instruments, and applications, these creatives liberate noise from its habitually perceived confines and transduce it into aesthetic material that can be used to challenge power structures, call attention to injustices, encourage collaboration, and even serve as a means of spiritual and artistic expansion. In this writing, I will highlight artists and composers such as Luigi Russolo, John Cage, Pauline Oliveros, and others who use noise aesthetically.

Open Access

Author: Mazurek, Mary
Affiliation: University of Lethbridge, Canada
Express Paper 10; AES Convention 153; October 2022
Publication Date: October 19, 2022
Subject: Psychoacoustics and Perception


This paper is Open Access which means you can download it for free.


Search Results (Displaying 1-10 of 54 matches)