A novel method for upmixing mono recordings into stereo is presented. This approach uses a source separation strategy to extract note events from the original mono mixture, which are then clustered into individual sources through a user-interactive interface. The isolated sources can then be panned to different parts of the stereo image to create a wider spatial experience in the final version. In this work, the quality of the stereo sound is evaluated by conducting a listening test, and the results are compared with those of a similar process based on a different separation strategy. The proposed system is shown to deliver stereo versions with higher audio quality and naturalness, and is suitable for music containing harmonic instruments or singing voices.
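The panning stage described in the abstract above can be illustrated with a minimal sketch. It assumes a constant-power (sine/cosine) pan law and already-separated sources given as lists of samples; the abstract does not state which pan law the authors use, and the function names `constant_power_pan` and `mix_to_stereo` are hypothetical.

```python
import math


def constant_power_pan(sample, position):
    """Pan one sample; position is -1 (hard left), 0 (centre) or up to +1 (hard right).

    Assumption: a sine/cosine constant-power law, so left^2 + right^2 == sample^2.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie in [-1, 1]")
    angle = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] to [0, pi/2]
    return sample * math.cos(angle), sample * math.sin(angle)


def mix_to_stereo(sources, positions):
    """Sum equal-length separated sources into a left and a right channel."""
    length = len(sources[0])
    left = [0.0] * length
    right = [0.0] * length
    for source, position in zip(sources, positions):
        for i, sample in enumerate(source):
            l, r = constant_power_pan(sample, position)
            left[i] += l
            right[i] += r
    return left, right
```

For example, `mix_to_stereo([voice, guitar], [-0.5, 0.5])` would place two separated sources on opposite sides of the stereo image while keeping the perceived level of each roughly independent of its position.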
The head-related impulse response (HRIR) is the total filtering effect induced by the reflection and diffraction of the head, torso and pinna. Quaternions are a number system that extends the complex numbers. In this paper, quaternion algebra is applied to exploit the similarities among HRIRs and to construct the quaternion impulse response using 7 strategies. A novel quaternion-based two-dimensional common factor decomposition (Q-2D-CFD) is developed to decompose the quaternion-HRIRs into azimuth and elevation factors. Two datasets are used for the experiments. Results show that Q-2D-CFD can achieve better performance than 2D-CFD, and that the quaternion HRTF formation strategy which exploits the front-back similarity and interaural similarity outperforms the other strategies.
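As background for the quaternion construction mentioned above, the following sketch shows the Hamilton product and one way of packing four real impulse responses into a quaternion-valued sequence. Which four HRIRs are grouped together is exactly what the 7 strategies in the abstract vary, so the grouping here is left to the caller; the names `hamilton_product` and `pack_quaternion_ir` are hypothetical.

```python
def hamilton_product(p, q):
    """Multiply two quaternions given as (w, x, y, z) tuples (non-commutative)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def pack_quaternion_ir(h_w, h_x, h_y, h_z):
    """Combine four equal-length real impulse responses into one quaternion sequence.

    Assumption: sample n becomes h_w[n] + h_x[n] i + h_y[n] j + h_z[n] k.
    """
    if not len(h_w) == len(h_x) == len(h_y) == len(h_z):
        raise ValueError("all four impulse responses must have the same length")
    return list(zip(h_w, h_x, h_y, h_z))
```

The product is not commutative (i times j gives k, while j times i gives minus k), which is why left and right factors generally have to be distinguished in a quaternion decomposition.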
Crosstalk cancellation can be used to reproduce binaural audio over loudspeakers without headphones. This is desirable for use in a cinema, where current surround sound systems do not produce a consistent spatial experience for the majority of listeners across the auditorium. A crosstalk cancellation system for three listeners using loudspeaker arrays is proposed. The system’s ability to provide crosstalk cancellation is assessed through numerical simulations. Identical systems are placed on either side of the central system, mimicking a row of seats in a cinema, and the issue of acoustic leakage from the neighbouring systems is negated by including control points at neighbouring listener positions. Finally, an optimal control point assignment, allowing for the placement of crosstalk cancellation systems side by side, is presented.
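The principle behind crosstalk cancellation can be sketched for the simplest case: two loudspeakers, one listener, one frequency bin. This is a deliberate simplification of the multi-listener array system in the abstract. It assumes a Tikhonov-regularised least-squares inverse of the 2x2 plant matrix, which is a common choice but is not confirmed by the abstract; `crosstalk_canceller` and the helper names are hypothetical.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def conj_transpose2(m):
    """Conjugate (Hermitian) transpose of a 2x2 matrix."""
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]


def inverse2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) < 1e-15:
        raise ValueError("matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def crosstalk_canceller(plant, beta=1e-3):
    """Return H = (C^H C + beta I)^-1 C^H for plant C[ear][loudspeaker].

    beta > 0 limits filter gain at the cost of less exact cancellation.
    """
    plant_h = conj_transpose2(plant)
    gram = matmul2(plant_h, plant)
    gram[0][0] += beta
    gram[1][1] += beta
    return matmul2(inverse2(gram), plant_h)
```

With `beta=0` and a well-conditioned plant, `matmul2(plant, crosstalk_canceller(plant, 0.0))` is the identity matrix, meaning each binaural channel reaches only its intended ear. Adding control points for further listeners, as the abstract describes, would enlarge the plant matrix beyond this 2x2 case.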
This work presents the implementation and experimental validation of an interactive binaural renderer that uses spherical microphone array recordings. The plane wave density function is used to represent the sound field. Two implementations are considered: one using a complete head-related transfer function dataset and one using a spatially re-sampled set. The system’s performance is measured in terms of interaural time and level differences. Static performance is validated by comparison with an established database. For the dynamic case, a real-time implementation using a head tracker is evaluated. Good agreement is seen for interaural time differences. Significant errors in interaural level differences are found above the spatial aliasing frequency. The spatially re-sampled set implementation improves high-frequency content without affecting interaural time and level differences.
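The two metrics named above can be estimated from a pair of binaural impulse responses roughly as follows. This is a minimal sketch that assumes a broadband cross-correlation estimate for the interaural time difference and a broadband energy ratio for the interaural level difference; the abstract does not specify the estimators used, and a frequency-dependent analysis would be needed to study behaviour above the spatial aliasing frequency. The names `itd_seconds` and `ild_db` are hypothetical.

```python
import math


def itd_seconds(left, right, sample_rate, max_lag):
    """Lag (in seconds) that maximises the left/right cross-correlation.

    A positive value means the right-ear response is delayed relative to the left.
    """
    best_lag = 0
    best_value = -math.inf
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n, l in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                acc += l * right[m]
        if acc > best_value:
            best_value = acc
            best_lag = lag
    return best_lag / sample_rate


def ild_db(left, right):
    """Broadband level difference in dB; positive when the left ear is louder."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    if energy_left == 0.0 or energy_right == 0.0:
        raise ValueError("both responses must contain energy")
    return 10.0 * math.log10(energy_left / energy_right)
```

Comparing these values between a rendered response and a reference database entry for the same direction gives one error figure per metric and direction.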
We present a virtual vector base amplitude panning (VBAP) implementation for 3D head-tracked binaural rendering on an embedded Linux system. Three-degrees-of-freedom head tracking is implemented within acceptable levels of latency and at 1° angular resolution. The technical performance of virtual VBAP is evaluated alongside a First Order Ambisonics (FOA) approach on the same platform, by analysing localisation cue error against a human-measured head-related transfer function set. Our findings illustrate that, in scenarios using embedded or other portable, low-resource computing platforms, the nature and requirements of the immersive or interactive audio application at hand may determine whether virtual VBAP is a viable (or even preferable) approach compared to virtual FOA.
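The gain calculation at the core of 3D VBAP can be sketched for a single loudspeaker triplet. The sketch assumes the standard formulation in which the source direction is written as a weighted sum of the three loudspeaker unit vectors and the weights are then normalised to unit power; triplet selection, head-tracking rotation and the HRTF convolution of the virtual loudspeakers are omitted. The names `unit_vector` and `vbap_gains` and the example layout are hypothetical.

```python
import math


def unit_vector(azimuth_deg, elevation_deg):
    """Direction as (x, y, z); azimuth is counter-clockwise from the front."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def vbap_gains(source, triplet):
    """Solve source = g1*l1 + g2*l2 + g3*l3 (Cramer's rule), then power-normalise."""
    l1, l2, l3 = triplet
    det = dot(l1, cross(l2, l3))
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions are coplanar with the origin")
    gains = (dot(source, cross(l2, l3)) / det,
             dot(source, cross(l3, l1)) / det,
             dot(source, cross(l1, l2)) / det)
    if min(gains) < -1e-9:
        raise ValueError("source lies outside this loudspeaker triplet")
    norm = math.sqrt(sum(g * g for g in gains))
    return tuple(g / norm for g in gains)
```

In a virtual VBAP renderer each of the three gains would scale the source signal before it is filtered with the HRTF pair of the corresponding virtual loudspeaker, so only a few fixed HRTF filters are needed regardless of head orientation.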
This engineering brief presents a new object collection that enables the Spatially Oriented Format for Acoustics (SOFA) file format to be used within Cycling '74’s Max. The SOFA file format allows for easy distribution of and access to impulse response databases. This collection will allow SOFA files to be easily opened and created from within Max, so that they can be used in patches that utilise spatial audio reproduction. First, the aims, motivation and criteria of this project are discussed. This is followed by an outline of the objects themselves, along with their design considerations and applications. A finalised version of the package will be made freely available online at: https://research.hud.ac.uk/institutes-centres/apl/resources/
This document describes important new features added to the ScanIR impulse response measurement tool for MATLAB. ScanIR is a software tool that streamlines the process of recording different types of impulse responses for scientific purposes. The main changes to the software are the ability to store and read measurements in the SOFA format, support for a rotating Arduino-controlled motor platform, and a preset mode for BRIR measurements.
This paper summarises recent work from the Acoustics Lab, Aalto University, Finland, on real-time implementations of fundamental and advanced methods for the spatialisation, production, visualisation, and manipulation of spatial audio sound scenes. The implementations can be roughly categorised into panning tools, comprising binaural panners and panners for arbitrary loudspeaker setups, and linear processing tools based on the Ambisonics framework. The latter include ambisonic decoders for loudspeakers or headphones, and tools for visualising the directional activity of ambisonic sound scenes, based on either non-parametric beamforming or parametric high-resolution methods. Finally, recent advanced reproduction tools based on the parametric processing of the COMPASS framework are detailed.
This document covers the release of the open-source HMDiR dataset (Head-Mounted-Display acoustic impulse responses) of HRTFs, useful for studying the occlusion effect of wearing XR devices on auditory perception. The data was collected for a previous publication in which the effect of wearing HMD gear on the HRTFs of a mannequin was described. This document covers in detail the measurement procedure, equipment, and specifications, and includes instructions on how to download the data files. The measurement library includes a free-head case (no HMD), two mixed reality headsets, and three virtual reality headsets, chosen from among those commercially available.
Giving the public the ability to experience historical pieces of music in the spaces for which they were initially written or within which they were conceivably performed, blending VR visual recreations with spatialised audio, has been shown to be very popular. This research examines the effectiveness of the standard game engine plugins Google Resonance and Steam Audio for the spatialisation of audio in immersive virtual reality (VR) environments. Objective and subjective tests comparing the plugins with commercial room acoustics simulation software have been carried out. They find that the flexibility of Steam Audio to assign custom properties increases its statistical accuracy, but that it does not replicate the acoustic difference in the historical space to the same extent as commercial acoustic simulation software.
This paper studies a novel personal listening device, HUMU Augmented Audio Cushion™, in the context of spatial audio. The case study explored ways to reproduce binaural audio with this device, which in normal use is located behind the listener. Several techniques were applied to binaural signals to render spatial sound, but none of them worked perfectly or fully brought the sound image in front of the listener. The device, which also provides tactile information to the user, sits very close to the listener’s head, so traditional far-field techniques failed to perform cross-talk cancellation. Nevertheless, two listening tests revealed that the implemented techniques worked to some extent, although many open questions were left for future research.