AES E-Library Search Results

Search Results (Displaying 1-10 of 21 matches)

 

Investigating the Influence of Environmental Acoustics and Playback Device for Audio Augmented Reality Applications


Presenting plausible virtual sounds to a user is an important challenge within audio augmented reality (AAR), where virtual sounds must appear to be a real part of the audio environment. Reproducing an environment’s acoustics is one step towards this; however, there is limited understanding of how the spatial resolution and spectral bandwidth of such reproductions contribute to plausibility, and therefore which approaches an AAR developer should target. We present two studies comparing room impulse responses (varying in spatial resolution and spectral bandwidth) and playback devices (headphones and audio glasses) to investigate their influence on the plausibility and user perception of virtual sounds. We do so using both a listening test in a controlled environment and an AAR game played in two real-world locations. Our results suggest that, particularly in a real-world AAR application context, users have low sensitivity to differences between reverberation models, but that reproducing an environment’s acoustics positively influences the plausibility and externalisation of a virtual sound. These benefits are most pronounced over headphones, yet users were positive about using audio glasses for an AAR application despite their lower perceptual fidelity. Overall, our findings suggest that both lower-fidelity environmental acoustics and audio glasses are appropriate for future AAR applications, allowing developers to use fewer computing resources and maintain real-world awareness without compromising user experience.
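The core operation behind reproducing an environment's acoustics for a virtual sound is convolving the dry (anechoic) source with a room impulse response (RIR). As a minimal sketch (the helper name and the toy RIR values below are ours, not from the paper):

```python
import numpy as np

def auralize(dry, rir, wet_gain=1.0):
    """Convolve a dry (anechoic) signal with a room impulse response.

    Hypothetical helper: the paper compares RIRs of varying spatial
    resolution and spectral bandwidth; this sketch shows only the
    basic convolution step common to all such reproductions.
    """
    wet = np.convolve(dry, rir)
    return wet * wet_gain

# Toy example: a unit impulse through a two-tap "room"
# (direct path plus one delayed reflection).
dry = np.array([1.0, 0.0, 0.0])
rir = np.array([0.8, 0.0, 0.3])
out = auralize(dry, rir)
```

The spatial resolution and bandwidth questions the paper studies concern how the RIR itself is measured or modelled; the rendering step stays the same.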

Open Access: available for free download (4.5 MB).


A Machine learning method to evaluate and improve sound effects synthesis model design


Procedural audio models have great potential in sound effects production and design: they can be of very high quality and offer rich interactivity to users. However, they often have many free parameters that cannot be specified simply from an understanding of the phenomenon, making it very difficult for users to create the desired sound. Moreover, their potential and generalization ability are rarely explored fully due to their complexity. To address these problems, this work introduces a hybrid machine learning method to evaluate a model's overall sound-matching performance on a real sound dataset. First, we train a parameter estimation network using synthesized sound samples. Through a differentiable implementation of the sound synthesis model, we use both parameter and spectral losses in this self-supervised stage. Then, we perform adversarial training with a spectral loss plus an adversarial loss using real sound samples. We evaluate our approach on an explosion sound synthesis model, experiment with different model designs, and conduct a subjective listening test. We demonstrate that this is an effective method to evaluate the overall performance of a sound synthesis model, and that it can speed up the sound model design process.
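The self-supervised stage combines a parameter loss with a spectral loss, which differentiable synthesis makes possible. A minimal NumPy sketch of such a combined objective (function names, the single-frame spectral loss, and the weighting are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def spectral_loss(pred, target, n_fft=256):
    """L1 distance between magnitude spectra of one frame (for brevity)."""
    P = np.abs(np.fft.rfft(pred, n_fft))
    T = np.abs(np.fft.rfft(target, n_fft))
    return float(np.mean(np.abs(P - T)))

def combined_loss(est_params, true_params, pred_audio, target_audio, w=0.5):
    """Weighted sum of a parameter loss (MSE) and a spectral loss,
    as in a self-supervised stage where true parameters are known."""
    param_loss = float(np.mean((est_params - true_params) ** 2))
    return w * param_loss + (1.0 - w) * spectral_loss(pred_audio, target_audio)

# Sanity check: identical parameters and audio give zero loss.
p = np.array([0.2, 0.7])
a = np.sin(np.linspace(0.0, 6.28, 256))
zero = combined_loss(p, p, a, a)
```

In the adversarial stage described in the abstract, the parameter term is dropped (real sounds have no ground-truth parameters) and an adversarial term is added instead.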

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.


Distance perception using open-back headphones and multi-channel speakers


In game audio and interactive productions, conveying source distance is important but considered difficult to implement. In this paper, a simple system combining open-back headphones and multi-channel loudspeakers is used to represent distance. A single Ambisonics signal is sent to the headphones and loudspeakers simultaneously, and distance is represented by changing the relative volume of the two sources. Two experiments and one measurement were conducted under varying conditions. The results show that distance representation can be improved by selecting headphones with low sound obstruction and by applying head-tracking.
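One way to realise "changing the volume of both sources" as a function of distance is an equal-power crossfade between the headphone and loudspeaker feeds. The mapping below is a hypothetical sketch, not the paper's gain law:

```python
import math

def distance_gains(distance, max_distance=10.0):
    """Equal-power crossfade between headphone and loudspeaker feeds.

    Hypothetical mapping (not taken from the paper): nearby sources
    favour the open-back headphones, distant ones the room
    loudspeakers, while total energy stays roughly constant.
    """
    x = min(max(distance / max_distance, 0.0), 1.0)
    hp = math.cos(x * math.pi / 2.0)    # 1.0 when near, 0.0 when far
    spk = math.sin(x * math.pi / 2.0)   # 0.0 when near, 1.0 when far
    return hp, spk
```

An equal-power (sine/cosine) law is chosen here so the summed acoustic energy does not dip mid-crossfade, a common convention in game audio panning.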



CGI Scenes for Interactive Audio Research and Development: Cave, Cinema, and Mansion


Audio rendering engines are a cornerstone of plausible and immersive experiences in interactive virtual environments (IVEs). For virtual reality IVEs, a combination of visual, audio, interactive, and behavioral cues blends to form the user’s perception and cognition. However, implementing such IVEs incurs costs and resources beyond the scope of many labs. This contribution describes a set of three open-source computer-generated-imagery interactive audiovisual scenes, including the geometric, material, lighting, and post-processing implementation of the relevant audio and visual cues. In addition, each IVE poses an audio-relevant task for users to perform throughout the environment, invoking cognitive processes for further psychological and behavioral research. The results of a small-scale case study are presented, demonstrating the impact of IVE design on user behavior, along with scene profiling of selected acoustic attributes. The profiling highlights that the acoustic auralization attributes needed for an IVE may depend on a combination of the IVE’s physical design and the user task.

Open Access: available for free download (35.6 MB).


Dynamic late reverberation rendering using the common-slope model


Late reverberation rendering in video games and virtual reality applications can be challenging due to limited computational resources. Typical scenes feature complex geometries with multiple coupled rooms or non-uniform absorption. Additionally, the audio engine must continuously adapt to the player’s movements and the sound sources in the scene. This paper proposes a dynamic rendering system for anisotropic and inhomogeneous late reverberation. It is based on the common-slope model and uses a set of exponentially decaying reverberators that are weighted with position-, direction-, and frequency-dependent gains. We evaluate the system in a scene consisting of three coupled rooms, where we illustrate the reverberator gains for multiple octave bands. The proposed method allows real-time rendering of the spatial late reverberation while using a small number of artificial reverberators.
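The common-slope idea of "exponentially decaying reverberators weighted with position-, direction-, and frequency-dependent gains" can be sketched with its amplitude envelopes alone. The decay times and gains below are illustrative values, not the paper's:

```python
import numpy as np

def common_slope_envelope(t60s, gains, sr=48000, duration=1.0):
    """Weighted sum of exponentially decaying envelopes (common slopes).

    The gains stand in for the position-, direction-, and
    frequency-dependent weights described in the paper; in a full
    renderer each slope would drive one artificial reverberator.
    """
    t = np.arange(int(sr * duration)) / sr
    env = np.zeros_like(t)
    for t60, g in zip(t60s, gains):
        k = 3.0 * np.log(10.0) / t60   # 60 dB amplitude decay over t60 seconds
        env += g * np.exp(-k * t)
    return env

# Two coupled decays: a short slope dominating early, a longer one late,
# as in coupled rooms with different reverberation times.
env = common_slope_envelope([0.4, 1.5], [0.7, 0.3])
```

Because the slopes are fixed per scene and only the gains vary with listener position, direction, and frequency band, the renderer needs just a small, constant number of reverberators, which is the efficiency the abstract claims.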



Perceptual comparison of efficient real-time geometrical acoustics engines in Virtual Reality

Interactive immersive experiences and games require dynamic modelling of acoustical phenomena over large and complex geometrical environments. However, the emergence of mobile Virtual Reality (VR) platforms and the ever-limited computational budget for audio processing impose severe constraints on the simulation process. With this in mind, efficient real-time geometrical acoustics (GA) engines are an attractive alternative. In this work we present the results of a perceptual comparison between three geometrical acoustics engines suitable for VR environments: an engine based on an Image Source Model (ISM) of a shoebox room of variable dimensions, a path tracing (PT) engine with arbitrary geometry and frequency-dependent materials, and a bi-directional path tracing (BDPT) engine with perceptual optimization of the Head-Related Transfer Function. The tests were conducted using Meta Quest and Quest 2 headsets, and 26 listeners provided perceptual ratings of six attributes (preference, realism/naturalness, reverb quality, localization, distance, spatial impression) for three different sources in six scenes. The results reveal that the BDPT engine is consistently rated higher than the other two on four of the perceptual attributes (preference, realism/naturalness, reverberation quality, and spatial impression), particularly in large reverberant spaces. In small spaces, trends are less clear and ratings are more subject-dependent. A Principal Component Analysis (PCA) revealed that only two perceptual dimensions account for more than 80% of the explained variance of the ratings.
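The PCA finding ("two dimensions account for more than 80% of the explained variance") corresponds to the standard explained-variance-ratio computation on a listeners-by-attributes ratings matrix. A sketch with synthetic data (the matrix below is randomly generated for illustration, not the paper's ratings):

```python
import numpy as np

def explained_variance_ratio(ratings):
    """PCA via SVD on mean-centred data (rows: listeners, cols: attributes)."""
    X = ratings - ratings.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False)
    var = s ** 2                      # singular values squared = component variances
    return var / var.sum()

# Synthetic ratings with two strong latent factors (illustrative only):
# 26 listeners x 6 attributes, matching the study's dimensions.
rng = np.random.default_rng(0)
latent = rng.standard_normal((26, 2))
loadings = rng.standard_normal((2, 6))
ratings = latent @ loadings + 0.05 * rng.standard_normal((26, 6))
ratio = explained_variance_ratio(ratings)
```

When two latent factors dominate, as in this toy data, the first two entries of `ratio` capture most of the variance, which is the structure the paper reports for its six attributes.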

Open Access: available for free download (3.6 MB).


A Survey of High-Level Descriptors in Sound Design: Understanding the Key Audio Aesthetics Towards Developing Generative Game Audio Engines


Sound design plays a critical role in enhancing the impact of audio content for video games and immersive environments. However, the subjective nature of sound perception and aesthetics makes it challenging to define the key features for advancing the development of techniques and tools for analyzing, searching, and organizing sound design signals. To address this issue, we conducted a survey of sound design practitioners to identify the most relevant high-level descriptors (HLDs) that define audio aesthetics. The results of this study provide valuable insights into the most important HLDs for various sound design analysis and modeling tasks toward developing computational assistive technologies for game audio.



Using texture maps to procedurally generate sound in virtual environments


Audiovisual occurrences in virtual environments are governed by data streams that are often shared but processed separately by the graphics and audio engines. In a typical video game scenario, virtual physics interactions among objects in the scene project their visual effect through animated graphics rendering. Independently of this process, the same interactions trigger and control the corresponding sonic output. In the natural world, however, this group of events is a unified causal phenomenon. In an attempt to model audiovisual phenomena within virtual worlds more thoroughly, the use of texture maps for sound effects generation is investigated. Wavetable synthesis is employed for this purpose, as it features certain characteristics that facilitate an intuitive image-to-sound translation. This approach aims to exploit the cross-modal affordances of sonification, the realism of physically inspired sound synthesis, and the dynamism of generative audio.
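One intuitive image-to-sound translation of the kind the abstract describes is to read a row of a greyscale texture as a single-cycle wavetable and play it back with a phase accumulator. This is a hypothetical sketch of the general technique, not the author's specific pipeline:

```python
import numpy as np

def row_to_wavetable(texture, row):
    """Read one row of a greyscale texture as a single-cycle wavetable."""
    cycle = texture[row].astype(np.float64)
    cycle -= cycle.mean()                 # remove DC offset
    peak = np.max(np.abs(cycle))
    return cycle / peak if peak > 0 else cycle

def play_wavetable(table, freq, sr=48000, duration=0.1):
    """Phase-accumulator playback with nearest-neighbour table lookup."""
    n = int(sr * duration)
    phase = (np.arange(n) * freq * len(table) / sr) % len(table)
    return table[phase.astype(int)]

# Toy texture: each row is a brightness ramp, so the extracted cycle
# plays back as a sawtooth-like tone.
texture = np.tile(np.arange(256), (4, 1))
table = row_to_wavetable(texture, 0)
out = play_wavetable(table, freq=220.0)
```

Because the same texture already drives the graphics engine, sampling it for audio couples the visual and sonic renderings of an interaction, which is the unification the paper argues for.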



The digital Foley: what Foley artists say about using audio synthesis.


Foley is a sound production technique in which organicity and authenticity in sound creation are key to fostering creativity. Audio synthesis, Artificial Intelligence (AI) and Interaction Design (IXD) have been explored by the community to investigate their efficiency and versatility. This paper investigates the current and potential use of audio synthesis in Foley practice. We ran an online survey answered by 56 Foley artists, with a median of 10 years of experience, from 13 different industries. A thematic analysis showed that artists want controllers for synthesising Foley with a focus on organic control, performance, and innovative ideas. Deterring factors included traditional Foley practices, the complexity of sound synthesis, and the authenticity of physical objects. The strengths of sound synthesis tools included creativity, speed, cost-effectiveness, and customisability. Suggestions for improving current tools encompassed increased interactivity, teamwork, and continued exploration. Participants held diverse views on potential synthesis tools for Foley, emphasising physical modelling, IXD, and preserving the underlying craftsmanship.



Discerning real from synthetic: analysis and perceptual evaluation of sound effects


In audio post-production, sound synthesis offers a viable alternative to searching for and recording samples when creating soundscapes. However, a central concern is whether synthetic sounds can match the perceived authenticity of library samples. This paper introduces an analytical approach that examines authentic and synthetic samples in five categories (burning embers, pouring water, explosions, popping bubbles and church bells) through audio descriptors that distinguish the two types. We focus on machine learning classification models and a perceptual evaluation experiment. The perceptual evaluation, comparing five distinct synthesis techniques (granular, additive, subtractive, physically informed, and modal synthesis), revealed that subtractive synthesis is perceived as more realistic for explosion sounds, while additive synthesis works better for pouring water sounds. This study provides valuable insights into the audio descriptors that may require modification in specific synthetic models, paving the way for a deeper understanding of sound synthesis methods and facilitating their integration into the sound design process.
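The abstract does not list its descriptor set; as one standard example of an audio descriptor commonly used to separate sound classes, the spectral centroid (the magnitude-weighted mean frequency) can be computed as follows (illustrative sketch, not the paper's feature pipeline):

```python
import numpy as np

def spectral_centroid(frame, sr=48000.0):
    """Magnitude-weighted mean frequency of one analysis frame, in Hz."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))

# A pure tone aligned to an FFT bin: the centroid lands on its frequency.
sr, n = 48000.0, 1024
f0 = 32 * sr / n                      # exactly 32 cycles per frame: 1500 Hz
t = np.arange(n) / sr
c = spectral_centroid(np.sin(2 * np.pi * f0 * t), sr=sr)
```

Descriptors like this, averaged over a sample, give the classifier a compact feature vector in which systematic differences between real and synthetic versions of a sound can show up.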

Open Access: available for free download (749 KB).


AES - Audio Engineering Society