Presenting plausible virtual sounds to a user is an important challenge within audio augmented reality (AAR), where virtual sounds must appear as a real part of the audio environment. Reproducing an environment’s acoustics is one step towards this; however, there is limited understanding of how the spatial resolution and spectral bandwidth of such reproductions contribute to plausibility, and therefore of which approaches an AAR developer should target. We present two studies comparing room impulse responses (varying in spatial resolution and spectral bandwidth) and playback devices (headphones and audio glasses) to investigate their influence on the plausibility and user perception of virtual sounds. We do so using first a listening test in a controlled environment and then an AAR game played in two real-world locations. Our results suggest that, particularly in a real-world AAR application context, users have low sensitivity to differences between reverberation models, but that the reproduction of an environment’s acoustics positively influences the plausibility and externalisation of a virtual sound. These benefits are most pronounced when sounds are played over headphones, but users were positive about the use of audio glasses for an AAR application, despite their lower perceptual fidelity. Overall, our findings suggest that both lower-fidelity environmental acoustics and audio glasses are appropriate for future AAR applications, allowing developers to use fewer computing resources and maintain real-world awareness without compromising user experience.
This paper is Open Access which means you can download it for free.
Procedural audio models have great potential in sound effects production and design: they can be of very high quality and highly interactive. However, they often have many free parameters that cannot be specified from an understanding of the phenomenon alone, making it very difficult for users to create the desired sound. Moreover, their potential and generalization ability are rarely explored fully due to their complexity. To address these problems, this work introduces a hybrid machine learning method to evaluate the overall sound matching performance of a synthesis model on a real sound dataset. First, we train a parameter estimation network using synthesized sound samples. Through a differentiable implementation of the sound synthesis model, we use both a parameter loss and a spectral loss in this self-supervised stage. Then, we perform adversarial training with a spectral loss plus an adversarial loss using real sound samples. We evaluate our approach on an example explosion sound synthesis model. We experiment with different model designs and conduct a subjective listening test. We demonstrate that this is an effective method to evaluate the overall performance of a sound synthesis model, and that it can speed up the sound model design process.
This paper costs $33 for non-members and is free for AES members and E-Library subscribers.
In game audio and interactive productions, the representation of distance perception is important but is considered difficult to implement. In this paper, a simple system combining open-back headphones and multi-channel loudspeakers was tested as a way to represent distance perception. A single ambisonics signal is sent to the headphones and the loudspeakers at the same time, and distance is represented by changing the volume of both sources. Two experiments and one measurement were conducted under varying conditions. The results show that distance representation can be improved by selecting headphones with low sound obstruction and by applying head-tracking.
This paper costs $33 for non-members and is free for AES members and E-Library subscribers.
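The idea of representing distance by changing the volume of the headphone and loudspeaker feeds can be illustrated with a minimal sketch. The abstract does not give the gain law, so everything here is an assumption: the function name `crossfade_gains`, the `near_m` and `far_m` limits, the equal-power curve, and the choice to weight near sources toward the headphones are all hypothetical.

```python
import math


def crossfade_gains(distance_m, near_m=0.5, far_m=5.0):
    """Return (headphone_gain, loudspeaker_gain) for a source distance.

    Assumption (not from the paper): near sources are weighted toward the
    headphones and far sources toward the loudspeakers, using an
    equal-power law. near_m and far_m are hypothetical limits in metres.
    """
    if far_m <= near_m:
        raise ValueError("far_m must be greater than near_m")
    # Normalise the distance to [0, 1], clamping outside the range.
    t = (distance_m - near_m) / (far_m - near_m)
    t = min(1.0, max(0.0, t))
    # Equal-power law: the squared gains always sum to 1.
    headphone_gain = math.cos(t * math.pi / 2)
    loudspeaker_gain = math.sin(t * math.pi / 2)
    return headphone_gain, loudspeaker_gain
```

Both feeds would carry the same decoded ambisonics signal, scaled by the two gains. A real system would need to calibrate the curve by listening tests rather than rely on this fixed law.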
Audio rendering engines are a cornerstone in offering a plausible and immersive experience for interactive virtual environments (IVEs). For virtual reality IVEs, a culmination of visuals, audio, interactive, and behavioral cues blend to form a user’s perception and cognition. However, implementing such IVEs incurs additional costs and resources beyond the scope of many labs. This contribution describes a set of three open-source computer-generated imagery interactive audiovisual scenes, including geometric, material, lighting, and post-processing implementation for relevant audio and visual cues. In addition, each IVE poses an audio-relevant task for users to perform throughout the environment, invoking cognitive processes for further psychological and behavioral research. The results of a small-scale case study are presented, which demonstrate the IVE design’s impact on user behavior along with scene profiling of selected acoustic attributes. The scene profiling highlights that different acoustic auralization attributes for IVEs may be needed as a combination of both the IVE’s physical design and the user task.
This paper is Open Access which means you can download it for free.
Late reverberation rendering in video games and virtual reality applications can be challenging due to limited computational resources. Typical scenes feature complex geometries with multiple coupled rooms or non-uniform absorption. Additionally, the audio engine must continuously adapt to the player’s movements and the sound sources in the scene. This paper proposes a dynamic rendering system for anisotropic and inhomogeneous late reverberation. It is based on the common-slope model and uses a set of exponentially decaying reverberators that are weighted with position-, direction-, and frequency-dependent gains. We evaluate the system in a scene consisting of three coupled rooms, where we illustrate the reverberator gains for multiple octave bands. The proposed method allows real-time rendering of the spatial late reverberation while using a small number of artificial reverberators.
This paper costs $33 for non-members and is free for AES members and E-Library subscribers.
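As a rough illustration of the common-slope idea in the abstract above, the energy envelope at a listener position can be written as a sum of a few exponential decays whose decay times are shared across the scene, with only the gains changing from place to place. The sketch below is not the authors' system: the decay times in `COMMON_T60`, the function name `energy_decay`, and the example gains are hypothetical, and direction and frequency dependence are left out.

```python
import math

# Hypothetical decay times (seconds) shared by the whole scene,
# for example one per coupled room.
COMMON_T60 = [0.4, 1.2, 2.5]


def energy_decay(t, gains, t60s=COMMON_T60):
    """Energy envelope at time t as a gain-weighted sum of exponentials.

    Each term falls by 60 dB (a factor of 1e-6 in energy) after its T60.
    Only the gains vary with listener position; the decay times do not.
    """
    if len(gains) != len(t60s):
        raise ValueError("need one gain per decay time")
    rate = math.log(1e6)  # about 13.8
    return sum(g * math.exp(-rate * t / t60) for g, t60 in zip(gains, t60s))


# Hypothetical gains for two listener positions in the same scene.
gains_in_small_room = [0.9, 0.1, 0.0]
gains_near_doorway = [0.5, 0.3, 0.2]
```

Moving the listener then amounts to interpolating between gain sets while the reverberators themselves stay fixed, which is what keeps the number of artificial reverberators small.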
Interactive immersive experiences and games require the dynamic modelling of acoustical phenomena over large and complex geometrical environments. However, the emergence of mobile Virtual Reality (VR) platforms and the ever-limited computational budget for audio processing impose severe constraints on the simulation process. With this in mind, efficient real-time geometrical acoustics (GA) engines are an attractive alternative. In this work we present the results of a perceptual comparison between three geometrical acoustics engines suitable for VR environments: an engine based on an Image Source Model (ISM) of a shoebox of variable dimensions, a path tracing (PT) engine with arbitrary geometry and frequency-dependent materials, and a bi-directional path tracing (BDPT) engine with perceptual optimization of the Head-Related Transfer Function. The tests were conducted using Meta Quest and Quest 2 headsets, and 26 listeners provided perceptual ratings of six attributes (preference, realism/naturalness, reverb quality, localization, distance, spatial impression) for three different sources in six scenes. The results reveal that the BDPT engine is consistently rated higher than the other two on four of the perceptual attributes (preference, realism/naturalness, reverberation quality, and spatial impression), particularly in large reverberant spaces. In small spaces, trends are less clear and ratings are more subject-dependent. A Principal Component Analysis (PCA) revealed that only two perceptual dimensions account for more than 80% of the explained variance of the ratings.
This paper is Open Access which means you can download it for free.
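For readers unfamiliar with the Image Source Model mentioned in the abstract above, the following is a minimal first-order sketch for a shoebox room. It is a textbook illustration, not the engine evaluated in the paper: the function names, the room layout (walls at 0 and at the room dimension on each axis), and the fixed speed of sound are assumptions, and wall absorption and higher reflection orders are omitted.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed constant


def first_order_images(source, room):
    """Mirror the source across each of the six walls of a shoebox room.

    The room spans [0, room[axis]] on each axis, so a wall sits at 0 and
    at room[axis]; mirroring across a wall at w maps s to 2*w - s.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]
            images.append(tuple(image))
    return images


def arrival_delays(source, listener, room):
    """Delays (seconds) of the direct path and six first-order reflections."""
    paths = [tuple(source)] + first_order_images(source, room)
    return [math.dist(p, listener) / SPEED_OF_SOUND for p in paths]
```

Each image source stands in for one wall reflection, so the list of delays gives the timing of the early part of the room response. Changing `room` corresponds to the "variable dimensions" of the shoebox.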
Sound design plays a critical role in enhancing the impact of audio content for video games and immersive environments. However, the subjective nature of sound perception and aesthetics makes it challenging to define the key features for advancing the development of techniques and tools for analyzing, searching, and organizing sound design signals. To address this issue, we conducted a survey of sound design practitioners to identify the most relevant high-level descriptors (HLDs) that define audio aesthetics. The results of this study provide valuable insights into the most important HLDs for various sound design analysis and modeling tasks toward developing computational assistive technologies for game audio.
This paper costs $33 for non-members and is free for AES members and E-Library subscribers.
Audiovisual occurrences in virtual environments are governed by data streams that are often shared, but processed separately by the graphics and audio engines. In a common video game scenario, virtual physics interactions among objects in the scene project their visual effect through animated graphics rendering. Independently of this process, the same interactions trigger and control the corresponding sonic output. However, in the natural world, this group of events is a unified causal phenomenon. In an attempt to model audiovisual phenomena within virtual worlds more thoroughly, the use of texture maps for sound effects generation is investigated. Wavetable synthesis is employed for this purpose, as it features certain characteristics that facilitate intuitive image-to-sound translation. This approach aims to take advantage of the cross-modal affordances of sonification, the realism of physically inspired sound synthesis, and the dynamism of generative audio.
This paper costs $33 for non-members and is free for AES members and E-Library subscribers.
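One simple way to picture image-to-sound translation with wavetable synthesis is to treat a single row of greyscale texture values as one cycle of a waveform and read it out at the desired pitch. The sketch below is only an illustration under that assumption; the abstract does not describe the actual mapping, and the function names `row_to_wavetable` and `render` are hypothetical.

```python
def row_to_wavetable(pixels):
    """Map 8-bit greyscale values (0..255) to samples in [-1, 1]."""
    return [p / 127.5 - 1.0 for p in pixels]


def render(wavetable, freq_hz, sample_rate, num_samples):
    """Read the table cyclically at freq_hz with linear interpolation."""
    n = len(wavetable)
    step = freq_hz * n / sample_rate  # table positions advanced per sample
    phase = 0.0
    out = []
    for _ in range(num_samples):
        i = int(phase)
        frac = phase - i
        a = wavetable[i]
        b = wavetable[(i + 1) % n]  # wrap around at the end of the table
        out.append(a + frac * (b - a))
        phase = (phase + step) % n
    return out
```

For example, `render(row_to_wavetable(row), 220.0, 48000, 48000)` would produce one second of a 220 Hz tone whose timbre follows the brightness profile of `row`, where `row` is any list of pixel values taken from a texture.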
Foley is a sound production technique where organicity and authenticity in sound creation are key to fostering creativity. Audio synthesis, Artificial Intelligence (AI) and Interaction Design (IXD) have been explored by the community to investigate their efficiency and versatility. This paper investigates audio synthesis's current and potential use in Foley practice. We opened an online survey answered by 56 Foley artists with a median of 10 years of experience from 13 different industries. Results from a thematic analysis reported that artists desired controllers for synthesising Foley with a focus on organic control, performance, and innovative ideas. Deterring factors included traditional Foley practices, sound synthesis complexity, and physical object authenticity. The strengths of sound synthesis tools included creativity, speed, cost-effectiveness, and customisability. Suggestions to improve current tools encompassed increased interactivity, teamwork, and continued exploration. Participants had diverse views on potential synthesis tools for Foley, emphasising physical modelling, IXD, and preserving the underlying craftsmanship.
This paper costs $33 for non-members and is free for AES members and E-Library subscribers.
In audio post-production, the adoption of sound synthesis offers a viable alternative to searching for and recording samples when creating soundscapes. However, a central concern arises regarding the ability of synthetic sounds to match the perceived authenticity of library samples. This paper introduces an analytical approach, examining authentic and synthetic samples in five categories (burning embers, pouring water, explosions, popping bubbles and church bells) by delving into the audio descriptors that distinguish the two types. We focus on the use of machine learning classification models and a perceptual evaluation experiment. The perceptual evaluation, which compared five distinct synthesis techniques (granular, additive, subtractive, physically informed, and modal synthesis), revealed that subtractive synthesis is perceived as more realistic for explosion sounds, while additive synthesis works better for pouring water sounds. This study provides valuable insights into the audio descriptors that may require modification in specific synthetic models, paving the way for a deeper understanding of sound synthesis methods and facilitating their integration into the sound design process.
This paper is Open Access which means you can download it for free.