An experiment has been conducted to assess the perceptual effect of vertical interchannel decorrelation between pairs of vertically-spaced loudspeakers. The study focuses on band-limiting the vertical decorrelation of natural sound sources in groups of octave-bands, while reproducing the unprocessed octave-bands monophonically through the main-layer loudspeaker. The upper limit of the vertical decorrelation is fixed at the 16 kHz octave-band, with the lower limit varied across eight octave-bands (centre frequencies 63 Hz to 8 kHz). A monophonic unprocessed condition was also included in a multiple comparison test alongside the eight decorrelated conditions. The results demonstrate that vertical decorrelation of the 500 Hz octave-band and above can significantly increase the vertical spread of an auditory image, similar to that of broadband decorrelation.
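The band-limited vertical decorrelation described above can be illustrated with a minimal sketch. This is an assumption for illustration only (the abstract does not specify the decorrelation filters used): a simple FFT-domain decorrelator that randomizes phase above a cutoff frequency while preserving magnitude, leaving lower bands untouched so they remain correlated with the main-layer signal. The function name `decorrelate_above` is hypothetical.

```python
import numpy as np

def decorrelate_above(x, fs, f_low, seed=0):
    """Return a copy of x whose spectral components at or above f_low have
    randomized phase (magnitudes preserved); content below f_low is untouched.
    DC and Nyquist bins are left alone so the result stays exactly real."""
    rng = np.random.default_rng(seed)
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    hi = (freqs >= f_low) & (freqs < fs / 2)
    X[hi] *= np.exp(1j * rng.uniform(-np.pi, np.pi, hi.sum()))
    return np.fft.irfft(X, n=len(x))
```

Feeding the original signal to the main-layer loudspeaker and this decorrelated copy to the height loudspeaker reproduces the band-limited interchannel decorrelation condition in spirit.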
The main purpose of this paper is to observe and analyze how listeners' localization performance changes as the HRTFs used transition from an individualized set to a non-individualized (generic) set. Two common HRTF interpolation techniques, one in the time domain and the other in the frequency domain, are used to create averaged sets of HRTFs that fit between the two extremes and help to examine the trend of localization behavior across this continuum.
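The two interpolation families mentioned above can be sketched minimally. This is an illustrative assumption, not the paper's exact procedure: time-domain interpolation as a sample-wise crossfade of two impulse responses, and frequency-domain interpolation as separate blending of magnitude and unwrapped phase.

```python
import numpy as np

def interpolate_time_domain(hrir_a, hrir_b, alpha):
    """Linear crossfade between two HRIRs in the time domain.
    alpha=0 gives hrir_a, alpha=1 gives hrir_b."""
    return (1 - alpha) * hrir_a + alpha * hrir_b

def interpolate_freq_domain(hrir_a, hrir_b, alpha):
    """Interpolate magnitude spectra and unwrapped phase separately,
    then transform back to a time-domain impulse response."""
    A, B = np.fft.rfft(hrir_a), np.fft.rfft(hrir_b)
    mag = (1 - alpha) * np.abs(A) + alpha * np.abs(B)
    phase = (1 - alpha) * np.unwrap(np.angle(A)) + alpha * np.unwrap(np.angle(B))
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(hrir_a))
```

Sweeping `alpha` from 0 to 1 yields the continuum between the individualized and generic sets described in the abstract.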
In this paper, the topic of auditory immersion and the issues with contradictory definitions of the terms immersion and presence are discussed in relation to content in Virtual Reality (VR). The issues of emotional variance in experimentation on immersion are also reviewed, and potential solutions regarding experimental methodology are suggested, such as the use of an electroencephalogram (EEG). A survey designed to gather audio professionals' and consumers' opinions of how important perceptual and technical auditory factors are for providing immersion is presented, and the results clearly show that, on average, all factors questioned are perceived to be important for immersion. However, vertical perception of sound was not perceived to be as important as horizontal sound perception.
With the increasing use of mobile phones and other handheld electronic devices for surfing the internet and streaming audio and video clips, it was inevitable that technologies such as 3D Audio would eventually be implemented in such devices. It is important to know whether 3D Audio offers any improvement in perceived audio quality over existing stereo and mono formats. It is also important to know which factors have a larger effect on the rated perceived audio quality of such formats. This paper explores the results of an ANOVA analysis of two ITU-standardized tests for the subjective assessment of 3D Audio. The results show that 3D Audio performs better than stereo and mono formats in terms of basic audio quality.
This paper presents a new test library for use in the spatial and timbral evaluation of immersive audio systems. Presented are synthesised test signals, anechoic speech and music recordings, and Ambisonic conversational speech recordings in both anechoic and reverberant environments. The rationale for the included stimuli is described, along with the synthesis and recording processes involved.
A novel method for upmixing mono recordings into stereo is presented. This approach uses a source separation strategy to extract note events from within the original mono mixture, which are clustered into individual sources via an interactive user interface. The isolated sources can then be panned to different parts of the stereo image to create a wider spatial experience in the final version. In this work, the quality of the stereo sound is evaluated through a listening test, and the results are compared with those of a similar process based on a different separation strategy. The proposed system is shown to deliver stereo versions with higher audio quality and naturalness, suitable for music containing harmonic instruments or singing voices.
A head-related impulse response (HRIR) captures the total filtering effect induced by the reflection and diffraction of the head, torso and pinna. Quaternions are a number system that extends the complex numbers. In this paper, quaternion algebra is applied to exploit the similarities among HRIRs, constructing quaternion impulse responses using seven strategies. A novel quaternion-based two-dimensional common factor decomposition (Q-2D-CFD) is developed to decompose the quaternion HRIRs into azimuth and elevation factors. Two datasets are used for the experiments. Results show that Q-2D-CFD achieves better performance than 2D-CFD; moreover, the quaternion HRTF formation strategy that exploits front-back similarity and interaural similarity outperforms the other strategies.
Crosstalk cancellation can be used to reproduce binaural audio over loudspeakers without headphones. This is desirable for use in a cinema, where current surround sound systems do not produce a consistent spatial experience for the majority of listeners across the auditorium. A crosstalk cancellation system for three listeners using loudspeaker arrays is proposed. The system’s ability to provide crosstalk cancellation is assessed through numerical simulations. Identical systems are placed either side of the central system, mimicking the row of a cinema, and the issue of acoustic leakage from the neighbouring systems is negated by including control points at neighbouring listener positions. Finally, an optimal control point assignment, allowing for the placement of crosstalk cancellation systems side by side, is presented.
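The core of any crosstalk cancellation system is the inversion of the loudspeakers-to-ears transfer matrix. As a minimal sketch (assuming a generic single-listener, two-channel plant and simple Tikhonov regularization, not the multi-listener array design proposed in the paper), the cancellation filters can be computed per frequency bin as a regularized pseudoinverse:

```python
import numpy as np

def xtc_filters(H, beta=1e-3):
    """Regularized per-frequency inversion of the plant matrix H.
    H has shape (n_freq, 2, 2): ears x loudspeakers transfer functions.
    Returns filters C such that H @ C approximates the identity,
    i.e. each binaural channel reaches only its intended ear."""
    Hh = np.conj(np.swapaxes(H, -1, -2))          # Hermitian transpose per bin
    return np.linalg.inv(Hh @ H + beta * np.eye(2)) @ Hh
```

Raising `beta` trades cancellation depth for robustness and lower filter gains; the paper's extension to three listeners adds control points at the neighbouring listener positions to the same kind of inversion problem.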
This work presents the implementation and experimental validation of an interactive binaural renderer that uses spherical microphone array recordings. The plane wave density function is used to represent the sound field. One implementation using a complete head-related transfer function dataset and one using a spatially re-sampled set are considered. The system's performance is measured in terms of interaural time and level differences. Static performance is validated by comparison to an established database; for the dynamic case, a real-time implementation using a head tracker is evaluated. Good agreement is seen for interaural time differences, while significant errors in interaural level differences are found above the spatial aliasing frequency. The spatially re-sampled implementation improves high-frequency content without affecting interaural time and level differences.
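The interaural metrics used for validation can be estimated straightforwardly from a binaural signal pair. A minimal sketch (assuming a broadband cross-correlation ITD estimate and an RMS-based ILD, which are common conventions but not necessarily the exact estimators used in the paper):

```python
import numpy as np

def itd_ild(left, right, fs):
    """Estimate interaural time difference (seconds) from the lag of the
    cross-correlation peak, and interaural level difference (dB) from the
    RMS energy ratio. Positive ITD means the left signal is delayed."""
    corr = np.correlate(left, right, mode='full')
    lag = np.argmax(corr) - (len(right) - 1)      # lag in samples
    itd = lag / fs
    ild = 20 * np.log10(np.sqrt(np.mean(left**2)) / np.sqrt(np.mean(right**2)))
    return itd, ild
```

Comparing these values between the renderer output and a reference HRTF database, per direction, gives the static validation described above.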
We present a virtual vector base amplitude panning (VBAP) implementation for 3D head-tracked binaural rendering on an embedded Linux system. Three degrees of freedom head-tracking is implemented within acceptable levels of latency and at 1° angular resolution. The technical performance of virtual VBAP is evaluated alongside a First Order Ambisonics (FOA) approach on the same platform, using analysis of localisation cue error against a human-measured head-related transfer function set. Our findings illustrate that, in scenarios utilising embedded or other portable, low-resource computing platforms, the nature and requirements of the immersive or interactive audio application at hand may determine whether virtual VBAP is a viable (or even preferable) approach compared to virtual FOA.
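For a single loudspeaker triplet, standard VBAP (Pulkki's formulation) reduces to solving a 3x3 linear system: the source direction is expressed as a weighted sum of the three loudspeaker direction vectors, and the weights become the panning gains. A minimal sketch, with the normalization choice (unit-energy gains) as an assumption:

```python
import numpy as np

def vbap_gains(source_dir, speaker_dirs):
    """Gains for one loudspeaker triplet: solve g @ L = p, where the rows
    of L are the three loudspeaker unit vectors and p is the source
    direction, then normalize to unit energy."""
    L = np.asarray(speaker_dirs, dtype=float)     # shape (3, 3)
    p = np.asarray(source_dir, dtype=float)
    g = p @ np.linalg.inv(L)
    return g / np.linalg.norm(g)
```

A full implementation triangulates the loudspeaker layout, picks the triplet enclosing the source direction (all gains non-negative), and applies these gains; in the virtual case each loudspeaker feed is then convolved with its HRTF pair.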
This engineering brief presents a new object collection that enables the Spatially Oriented Format for Acoustics (SOFA) file format to be used within Cycling '74's Max. The SOFA file format allows for easy distribution of and access to impulse response databases. This collection allows SOFA files to be easily opened and created from within Max, so that they can be used in patches that utilise spatial audio reproduction. First, the aims, motivation and criteria of the project are discussed, followed by an outline of the objects themselves, along with their design considerations and applications. A finalised version of the package will be made freely available online at: https://research.hud.ac.uk/institutes-centres/apl/resources/
This document describes important new features added to ScanIR, an impulse response measurement tool for MATLAB. ScanIR streamlines the process of recording different types of impulse responses for scientific purposes. The main changes to the software are the ability to store and read measurements in SOFA format, support for a rotating Arduino-controlled motor platform, and a BRIR measurement preset mode.
This paper summarises recent work from the Acoustics Lab, Aalto University, Finland, on real-time implementations of fundamental and advanced methods for the spatialisation, production, visualisation, and manipulation of spatial audio sound scenes. The implementations can be roughly categorised into panning tools, implementing binaural panners and panners for arbitrary loudspeaker setups, and linear processing tools based on the Ambisonics framework; the latter include ambisonic decoders for loudspeakers and headphones, and tools for visualising the directional activity of ambisonic sound scenes based either on non-parametric beamforming or on parametric high-resolution methods. Finally, recent advanced reproduction tools based on the parametric processing of the COMPASS framework are detailed.
This document covers the release of the open-source HMDiR dataset (Head-Mounted-Display acoustic impulse responses) of HRTFs, useful for studying the occlusion effect of wearing XR devices on auditory perception. The data were collected for a previous publication describing the effect of wearing HMD gear on the HRTFs of a mannequin. This document covers in detail the measurement procedure, equipment, and specifications, including instructions on how to download the data files. The measurement library includes a free-head case (no HMD), two mixed reality headsets, and three virtual reality headsets, chosen from among those commercially available.
Enabling the public to experience historical pieces of music in the spaces for which they were initially written, or within which they were conceivably performed, by blending VR visual recreations with spatialised audio has proven very popular. This research examines the effectiveness of the standard game engine plugins Google Resonance and Steam Audio for the spatialisation of audio in immersive virtual reality (VR) environments. Objective and subjective tests comparing these plugins against commercial room acoustics simulation software find that Steam Audio's flexibility in assigning custom material properties increases its statistical accuracy, but that it does not replicate the acoustics of the historical space to the same extent as the commercial software.
This paper studies a novel personal listening device, the HUMU Augmented Audio Cushion™, in the context of spatial audio. The case study explored ways to reproduce binaural audio with this device, which in normal use is located behind the listener. Several techniques were applied to binaural signals to render spatial sound, but none of them succeeded fully in bringing the sound image in front of the listener. The device, which also provides tactile feedback to the user, lies very close to the listener's head, so traditional far-field techniques for cross-talk cancellation failed. However, two listening tests revealed that the implemented techniques worked to some extent, though many open questions remain for future research.
This paper introduces Plugsonic Soundscape and Plugsonic Sample, two web-based applications for the creation and experience of binaural interactive audio narratives and soundscapes. The apps are being developed as part of the PLUGGY EU project (Pluggable Social Platform for Heritage Awareness and Participation). The apps' audio processing is based on the Web Audio API and the 3D Tune-In toolkit. Within the paper, we report on the implementation, evaluation and future developments. We believe that the idea of a web-based application for 3D sonic narratives represents a novel contribution to the cultural heritage, digital storytelling and 3D audio technology domains.
This paper describes a system that renders audio over multichannel loudspeakers for virtual reality (VR) applications. Real-time tracking data, such as the distances between the user and the loudspeakers and the head rotation angle, are used to modify the output of a multichannel loudspeaker configuration in terms of panning, delay and energy compensation, achieving stationary music and a dynamic sweet spot. This system was adapted for a simple first-person shooter VR game, and pilot tests were conducted to assess its impact on the user experience.
Auralization in the domain of automotive audio has primarily relied on dummy-head recordings in the past. Recently, spatial reproduction has allowed the auralization of cabin acoustics over large loudspeaker arrays, yet no direct comparisons between these methods exist. In this study, the efficacy of headphone presentation is explored in this context. Six acoustical conditions were presented over headphones to experienced assessors (n=23), who were asked to compare them across six elicited perceptual attributes. In 24 out of 36 cases, the results indicate agreement between headphone- and loudspeaker-based auralisation of the identical stimulus set. It is concluded that, compared to loudspeaker-based rendering, headphone-based rendering yields similar judgments of timbral attributes, while certain spatial attributes should be assessed with caution.
The emerging production of 3D Audio content creates new challenges regarding the optimization of the production process, for VR/AR applications as well as for 3D music and film productions. This contribution presents a first approach to a virtual audiovisual environment for 3D audio production based on a real listening room with different loudspeaker arrangements. Wearing a Head-Mounted Display, the producer can move sound sources in the listening room by hand gestures to create an immersive audio experience. To verify future listening experiments using the implemented virtual environment, a semi-automatic measurement setup is presented that guarantees a controllable listening environment in terms of reverberation time, background noise, delay compensation and room response equalization.
This paper presents an architecture for the creation of emotionally congruent music using machine-learning-aided sound synthesis. We analyse participants' galvanic skin response (GSR) while they listen to AI-generated music pieces and evaluate the emotions they describe in a questionnaire conducted after listening. These analyses reveal a direct correlation between the calmness/scariness of a musical piece, the users' GSR readings and the emotions they describe feeling. From these findings, an emotional state can be estimated from biofeedback and used as a control signal for a machine-learning algorithm that generates new musical structures according to a perceptually informed musical feature similarity model. Our case study suggests various applications, including gaming, automated soundtrack generation, and mindfulness.
Spatial audio perception and envelopment are key areas of investigation for understanding audience attention and engagement when experiencing works through loudspeaker sound diffusion systems. This paper reports the findings of a practice-based interdisciplinary study on the perception of movement through sound, consisting of a joint choreography of sound and body movement designed and performed on the 192-loudspeaker Wave Field Synthesis system of The Game of Life Foundation (NL). The ideas and examples discussed focus on multimodality and audiovisual perception, on the modalities involved in movement perception, and on how these could be integrated into a spatial audio composition for dance, informing the context of auditory engagement and attention from the perspectives of a composer and a choreographer.
Capturing musical performances for Virtual Reality (VR) is of growing interest to engineers, cultural organisations and the public. The application of ambisonic workflows in conjunction with binauralisation through head-related transfer functions enables the perception and localisation of sound sources in three-dimensional space, crucially enabling height perception. While there are many excellent examples of orchestral recordings in VR, few make use of height, favouring 'on-stage' horizontal positioning. This brief presents a contemporary symphony orchestral performance captured and produced in second-order ambisonics, in which 51 performers were individually split and positioned across five levels of the performance space. This case study critically discusses the methods employed, addressing the workflow through pre-production, capture and post-production.
Ever-improving immersive sound reproduction techniques have created an entire branch of novel signal processing techniques and 3D audio playback systems. The Pressure Matching Method (PMM) has been developed to efficiently recreate 3D sound with a loudspeaker array. A new mathematical approach that enhances the effectiveness of PMM was presented in a previous publication. In this paper, the results of physical measurements conducted to assess the method are presented. The outcome shows that, in comparison with traditional PMM with Tikhonov regularization, the eigendecomposition pseudoinverse technique improves PMM efficiency by providing greater separation of the received sound at the listener's ears and a more immersive sound field around the listener's head.
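The Tikhonov-regularized baseline against which the new method is compared can be sketched compactly. As an illustrative assumption (a generic single-frequency pressure matching formulation, not the paper's eigendecomposition variant), the loudspeaker weights minimize the regularized least-squares error between reproduced and target pressures at the control points:

```python
import numpy as np

def pmm_tikhonov(G, p_target, beta=1e-2):
    """Pressure matching at one frequency: find loudspeaker weights q
    minimizing ||G q - p_target||^2 + beta ||q||^2 (Tikhonov).
    G: (n_control_points, n_speakers) complex transfer matrix."""
    Gh = G.conj().T
    return np.linalg.solve(Gh @ G + beta * np.eye(G.shape[1]), Gh @ p_target)
```

The eigendecomposition pseudoinverse technique evaluated in the paper replaces this inversion step; the measurements reported compare the resulting separation at the ears against this conventional solution.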