HoloBand is a clinical tool for audiovisual music perception training in augmented reality (AR) using the HoloLens 2 for hard-of-hearing (HoH) users. It consists of two interfaces: a spatial AR environment for the HoH user and a computer interface through which speech and language pathologists (SLPs) control the training. We adopt a participatory, exploratory research approach to investigate novel ways of conducting timbre discrimination, audio localization, and pitch recognition training. HoloBand is evaluated in a focus group with SLPs (n=6) and interpreted through inductive thematic analysis. The HoloLens 2 shows relevant contextual and visual capabilities for enhancing music perception training. However, closer integration into the clinical workflow and a task-oriented approach are necessary to bring real-life scenarios into the clinical setting.
Auditory localization potential describes a set of metrics that quantify the estimated angular distance and diffusion error of the impulse response for a rendered direction through a spatial audio system compared to a reference impulse response. These metrics indicate the auditory angular error and spatial blur that can be expected when users perform an auditory localization task. Such a tool can inform the possible effects on a listener's localization ability when designing optimization strategies and wearable loudspeakers, especially in the context of extended reality (XR) devices. These devices commonly have limited compute resources, with loudspeakers located at a distance from the ear where anthropometric effects are unavoidable. This paper presents a review of how auditory localization potential can be evaluated on XR spatial audio systems through digital output and acoustic measurements, and how this localization metric can be further validated through critical listening measurement strategies. This review of evaluation techniques provides product designers and engineers with a set of tools that can deepen their understanding of the expected performance of spatial audio subsystems and improve their design process.
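As an illustration of the angular component of such a metric family, the sketch below computes the great-circle error between the direction estimated from a rendered impulse response and the reference direction. How the direction estimate itself is obtained, and how diffusion error is quantified, is system-specific and not shown; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def angular_error_deg(rendered_dir: np.ndarray, reference_dir: np.ndarray) -> float:
    """Great-circle angle in degrees between two direction vectors.

    `rendered_dir` would come from a direction estimate on the rendered
    impulse response; `reference_dir` from the reference measurement.
    """
    u = rendered_dir / np.linalg.norm(rendered_dir)
    v = reference_dir / np.linalg.norm(reference_dir)
    # Clip guards against arccos domain errors from floating-point round-off.
    return float(np.degrees(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))))
```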
Acoustic room responses usually comprise components that propagate in non-horizontal directions. Because audio capture and reproduction systems are often unable to preserve such elevation information reliably, it is important to understand its perceptual significance when auralizing rooms. This work investigates the ability of the human hearing system to distinguish between early reflections with different elevation angles through loudspeaker- and headphone-based listening experiments using manipulated spatial room impulse responses. The results show that changing the elevation of a strong early reflection can lead to clearly perceivable differences, and factors that influence detectability are identified. Projecting all elevated reflections of a spatial room impulse response with no very prominent ceiling reflection onto the horizontal plane produced no perceivable differences.
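A minimal sketch of the kind of manipulation described, assuming each reflection is represented by a unit direction vector: elevation is removed by projecting the direction onto the horizontal plane. The paper's actual SRIR analysis and manipulation pipeline is not specified here.

```python
import numpy as np

def to_horizontal(direction: np.ndarray) -> np.ndarray:
    """Project a unit direction vector (x, y, z) onto the horizontal plane."""
    horiz = np.array([direction[0], direction[1], 0.0])
    n = np.linalg.norm(horiz)
    if n < 1e-9:
        # Reflection from straight above/below has no defined azimuth;
        # fall back to an arbitrary horizontal direction.
        return np.array([1.0, 0.0, 0.0])
    return horiz / n
```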
Complementary multi-device audio experiences are becoming increasingly common through the proliferation of mobile computing and the creation of bespoke multi-device audio production tools. However, the best use cases and design practices for these experiences remain poorly understood. A review of multi-device audio experiences is therefore necessary to capture and consolidate the knowledge in this research area and to move toward a set of design guidelines for creating more effective and engaging experiences. In this study, the range of applications of co-located multi-device audio experiences is explored and documented through a literature review and a survey, resulting in a dataset of 31 individual experiences and 11 enabling tools or platforms. An initial analysis of the survey data revealed the frequency of device types and forms of interaction across the experiences and platforms in the dataset. The full dataset is available at https://doi.org/10.5281/zenodo.6839250.
Binaural rendering algorithms must balance simplicity and accuracy, due to the high degree of spectral complexity contained in a head-related transfer function (HRTF) dataset. A novel binaural rendering algorithm, termed principal component-based amplitude panning (PCBAP), has been developed which provides both accurate and efficient binaural rendering. The algorithm is based on a time-domain principal component analysis of the HRTF, where the resulting principal components serve as binaural rendering filters and the PC weights serve as panning gains. PCBAP is well suited to accurately reproducing both total level (TL) and interaural level difference (ILD) cues: when compared to time-aligned HRTFs fit to spherical harmonic functions and to the magnitude least squares (MagLS) algorithm, PCBAP showed better performance for both cues at all reconstruction orders. Specifically, PCBAP can provide with 16-36 filters an accuracy that other methods only achieve with more than 650 filters: a 95-98% reduction in computational requirements. Across frequency, PCBAP performs worse below 3 kHz at lower orders, but its performance is superior at third-order processing and above. PCBAP is also well suited to accurately rendering ILD cues above 5 kHz in lateral directions; other algorithms cannot render these fine details without using high processing orders (order >= 9).
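A minimal sketch of the PCA decomposition underlying this kind of approach, assuming an `hrirs` array of time-domain HRIRs for one ear; the variable names, array shapes, and choice of `k` are illustrative, and this is not the authors' implementation.

```python
import numpy as np

def fit_pc_filters(hrirs: np.ndarray, k: int):
    """Decompose an (n_directions, n_taps) HRIR set into k fixed
    principal-component filters plus per-direction panning weights."""
    mean = hrirs.mean(axis=0)
    centered = hrirs - mean
    # Rows of vt are the time-domain principal components (rendering filters).
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    filters = vt[:k]              # (k, n_taps) shared rendering filters
    weights = u[:, :k] * s[:k]    # (n_directions, k) panning gains
    return mean, filters, weights

def render(source, mean, filters, weights, direction_idx):
    """Pan a mono source to one measured direction for one ear.

    In a multi-source renderer the k filter convolutions are shared:
    each additional source contributes only k panning gains, which is
    where the computational saving comes from.
    """
    w = weights[direction_idx]
    out = np.convolve(source, mean)
    for g, f in zip(w, filters):
        out += g * np.convolve(source, f)
    return out
```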
To achieve perceptual fusion between auralized virtual audio objects and the room acoustics of a real listening room, the virtual acoustics must be adequately adapted to the real room acoustics. The challenges are to describe the acoustics of different rooms with suitable parameters, to classify different rooms, and to evoke a similar auditory perception between acoustically similar rooms. An approach is presented that classifies rooms based on measured binaural room impulse responses (BRIRs) using statistical methods and selects best-match BRIRs from the dataset to auralize audio objects in a new room. The results show that rooms can be separated based on their room acoustic properties, that this separation largely corresponds to the perceptual distance between rooms, and that a selection of best-match BRIRs is possible.
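To make the selection step concrete, here is a hedged sketch that picks a best-match impulse response by nearest neighbour in a simple feature space, using T30 from the Schroeder backward-integrated decay and the direct-to-reverberant ratio. The paper's actual room acoustic parameters and statistical classification method may well differ.

```python
import numpy as np

def t30(ir: np.ndarray, fs: float) -> float:
    """Reverberation time from the -5 to -35 dB span of the Schroeder
    backward-integrated decay curve (assumes the IR decays by > 35 dB)."""
    edc = np.cumsum(ir[::-1] ** 2)[::-1]
    edc_db = 10 * np.log10(edc / edc.max())
    t = np.arange(len(ir)) / fs
    i5, i35 = np.argmax(edc_db <= -5), np.argmax(edc_db <= -35)
    slope = (edc_db[i35] - edc_db[i5]) / (t[i35] - t[i5])  # dB per second
    return -60.0 / slope

def drr(ir: np.ndarray, fs: float, direct_ms: float = 2.5) -> float:
    """Direct-to-reverberant energy ratio (dB) around the strongest peak."""
    n0 = int(np.argmax(np.abs(ir)))
    n = int(direct_ms * 1e-3 * fs)
    direct = np.sum(ir[max(0, n0 - n):n0 + n] ** 2)
    reverb = np.sum(ir[n0 + n:] ** 2)
    return 10 * np.log10(direct / reverb)

def best_match(new_ir, dataset_irs, fs) -> int:
    """Index of the dataset IR closest to the new room in (T30, DRR) space."""
    f_new = np.array([t30(new_ir, fs), drr(new_ir, fs)])
    feats = np.array([[t30(x, fs), drr(x, fs)] for x in dataset_irs])
    return int(np.argmin(np.linalg.norm(feats - f_new, axis=1)))
```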
The XMA, a higher-order Ambisonic microphone array with a non-spherical scattering body, was recently presented. The approach is compatible with the also recently presented equatorial microphone array, so that XMAs, too, can be designed with the microphones distributed solely along a circumferential contour around the scattering body. This greatly reduces the required number of microphones compared to classical spherical microphone arrays, which require microphones distributed over the entire surface of the scatterer. The equatorial XMA has so far only been evaluated as a head-mounted array, i.e., with a human head as the baffle. Other form factors over a range of sizes are also of practical relevance, particularly those of 360° cameras, which, combined with an equatorial XMA, can capture a complete panoramic audio-visual experience from a first-person perspective. We present a set of simulations based on which we identify which spherical harmonic orders can be obtained, and with what accuracy, for a set of convex scattering body geometries that are relevant in this context. We demonstrate that the shape of the body is not very critical, and even bodies with corners are possible. The main limitation is that small bodies do not allow higher orders to be extracted at low frequencies.
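The size/frequency limitation can be illustrated with the common rule of thumb for spherical arrays that order n becomes well-conditioned roughly where ka >= n, with a the effective body radius. This is only a back-of-the-envelope illustration, not the boundary-condition simulation used in the paper.

```python
from math import pi

def min_frequency_for_order(n: int, radius_m: float, c: float = 343.0) -> float:
    """Rule-of-thumb lower usable frequency for spherical-harmonic order n
    on a scatterer of effective radius `radius_m`, from ka >= n."""
    return n * c / (2 * pi * radius_m)

# e.g. a small 360 camera body vs. a head-sized body (radii are illustrative)
for a in (0.03, 0.09):
    print(a, [round(min_frequency_for_order(n, a)) for n in range(1, 5)])
```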
A framework for actively controlling radiated, incident, or local personal sound fields is presented. It relies on loudspeakers and microphones either worn by the user or surrounding the user. The framework aims to address tasks such as speech privacy, personal active noise cancellation, and immersive audio presentation with limited amplification/injection of noise or leakage of private speech into the environment. The formulation relies on modeling and simulation of the sound field using a fast multipole accelerated boundary element method, spectral or point measurements of the sound field, and regularized optimization of the field created by the actively controlled loudspeakers. The use of acoustic simulation makes available transfer functions associated with a large number of points distributed in space, resulting in effective regularization. Radiation cancellation of up to 20 dB was observed at low frequencies below 1 kHz in a numerical experiment using real-world impulse responses of a wearable loudspeaker setup.
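A hedged sketch of the regularized-optimization step described above: solve for loudspeaker weights that reproduce (or cancel) a target field at a dense set of control points via Tikhonov-regularized least squares. The transfer matrix `G`, target `d`, and regularization weight `lam` are illustrative placeholders; in the paper, the transfer functions come from a fast-multipole-accelerated BEM simulation, which is not reproduced here.

```python
import numpy as np

def control_weights(G: np.ndarray, d: np.ndarray, lam: float) -> np.ndarray:
    """Tikhonov-regularized least-squares loudspeaker weights at one frequency.

    G: (n_points, n_speakers) complex transfer matrix to the control points.
    d: (n_points,) desired pressure, e.g. -p_incident for cancellation.
    Solves (G^H G + lam * I) w = G^H d.
    """
    n = G.shape[1]
    return np.linalg.solve(G.conj().T @ G + lam * np.eye(n), G.conj().T @ d)
```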
Recent developments in AR/VR applications have brought a renewed focus on efficient and scalable real-time HRTF renderers that alleviate compute constraints when spatializing many sound sources at once. To efficiently achieve a reasonable approximation of the full sphere, the HRTF dataset is often linearly decomposed into a predetermined number of basis filters via methods such as Ambisonics, VBAP, or PCA. This paper proposes a novel HRTF renderer and decomposition technique that, compared to previous methods, achieves greater accuracy of the HRTF approximation at an equivalent compute cost. This is accomplished through a multi-layered optimization network architecture that minimizes a perceptually motivated error function to derive the basis filters. We demonstrate the numerical accuracy of our technique and provide listening test results comparing our method to other linear decomposition methods of comparable computational cost, using both our internal and the publicly available SADIE HRTF datasets.
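The paper's network architecture and perceptual loss are not spelled out in the abstract, so the following is only a speculative sketch of the general idea: jointly learning k basis filters and per-direction gains by gradient descent on a spectral error, rather than fixing the basis analytically. All shapes, the crude log-magnitude loss, and the hyper-parameters are assumed stand-ins.

```python
import torch

# Placeholder HRTF data: (directions, ears, taps); a real dataset would be loaded.
hrtf = torch.randn(512, 2, 256)
k = 24
filters = torch.randn(k, 2, 256, requires_grad=True)   # learnable basis filters
gains = torch.randn(512, k, requires_grad=True)        # learnable panning gains
opt = torch.optim.Adam([filters, gains], lr=1e-2)

target_mag = torch.log(torch.abs(torch.fft.rfft(hrtf)) + 1e-6)
for _ in range(2000):
    # Reconstruct every direction as a gain-weighted sum of the basis filters.
    recon = torch.einsum('dk,ket->det', gains, filters)
    mag = torch.log(torch.abs(torch.fft.rfft(recon)) + 1e-6)
    loss = torch.mean((mag - target_mag) ** 2)  # crude stand-in for a perceptual loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```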
Following recent trends toward fully immersive virtual reality (VR) and augmented reality (AR) applications, ISO/IEC JTC1 SC29 WG6 (MPEG Audio Coding) decided to create the MPEG-I Audio work item to standardize a solution for audio rendering in such applications, in which the user can navigate and interact with the environment with six degrees of freedom (6DoF). One of the main capabilities of MPEG-I Audio will be support for real-time modeling of acoustic occlusion and diffraction effects in geometrically complex VR/AR scenes with a high degree of user interactivity. This is achieved by employing a voxel-based representation of sound-occluding scene elements in combination with computationally efficient rendering algorithms operating on uniform 3D voxel grids and their 2D projections. This paper describes the chosen reference model architecture for voxel-based acoustic occlusion and diffraction modeling, its operating modes, and envisioned applications. In addition, it summarizes the current status of the MPEG-I Audio standardization process.
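For intuition only, the sketch below tests direct-path occlusion against a uniform boolean voxel grid by sampling points along the source-listener ray. This is an illustrative toy, not the MPEG-I reference implementation, which also models diffraction and uses more efficient grid traversal.

```python
import numpy as np

def occluded(grid: np.ndarray, src, lst, voxel_size: float) -> bool:
    """True if any occluding voxel lies on the straight source-listener path.

    grid: boolean (nx, ny, nz) array, True where a voxel blocks sound;
    src, lst: 3D positions in the same coordinate frame as the grid origin.
    """
    src, lst = np.asarray(src, float), np.asarray(lst, float)
    dist = np.linalg.norm(lst - src)
    # Sample at half-voxel spacing so no voxel along the ray is skipped.
    n_steps = int(np.ceil(dist / (0.5 * voxel_size))) + 1
    for t in np.linspace(0.0, 1.0, n_steps):
        p = src + t * (lst - src)
        idx = tuple((p // voxel_size).astype(int))
        if all(0 <= i < s for i, s in zip(idx, grid.shape)) and grid[idx]:
            return True
    return False
```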