Authors: Serafin, Stefania; Alon, David Lou; Gamper, Hannes; Samarasinghe, Prasanga
Authors:Herre, Jürgen; Disch, Sascha
Affiliation:International Audio Laboratories Erlangen, Erlangen, Germany - A Joint Institution of Friedrich-Alexander-Universität Erlangen-Nürnberg and Fraunhofer IIS; Fraunhofer IIS, Erlangen, Germany
MPEG-I Immersive Audio is a forthcoming standard under development within the MPEG Audio group (ISO/IEC JTC1/SC29/WG6) to provide a compressed representation and rendering of audio for Virtual and Augmented Reality (VR/AR) applications with six degrees of freedom (6DoF). MPEG-I Immersive Audio supports bitrate-efficient and high-quality storage/transmission of complex virtual scenes, including sources with spatial extent and distinct radiation characteristics (like musical instruments) as well as geometric descriptions of acoustically relevant elements (e.g., walls, doors, occluders). The rendering process includes detailed modeling of room acoustics and of complex acoustic phenomena such as occlusion, diffraction due to acoustic obstacles, and Doppler effects, as well as interactivity with the user. Based on many contributions, this paper reports on the state of the MPEG-I Immersive Audio standardization process and its first technical Reference Model architecture. MPEG-I Immersive Audio establishes the first long-term stable audio format specification in the field of VR/AR and can be used for many consumer applications such as broadcasting, streaming, social VR/AR, or Metaverse technology.
Authors:Engel, Isaac; Daugintis, Rapolas; Vicente, Thibault; Hogg, Aidan O. T.; Pauwels, Johan; Tournier, Arnaud J.; Picinali, Lorenzo
Affiliation:Audio Experience Design (www.axdesign.co.uk), Imperial College London, London, United Kingdom
Immersive audio technologies, ranging from rendering spatialized sounds accurately to efficient room simulations, are vital to the success of augmented and virtual realities. To produce realistic sounds through headphones, the human body and head must both be taken into account. However, the measurement of the influence of the external human morphology on the sounds incoming to the ears, which is often referred to as the head-related transfer function (HRTF), is expensive and time-consuming. Several datasets have been created over the years to help researchers work on immersive audio; nevertheless, the number of individuals involved and the amount of data collected are often insufficient for modern machine-learning approaches. Here, the SONICOM HRTF dataset is introduced to facilitate reproducible research in immersive audio. This dataset contains the HRTFs of 120 subjects, as well as headphone transfer functions; 3D scans of ears, heads, and torsos; and depth pictures at different angles around subjects' heads.
Authors:Klein, Florian; Surdu, Tatiana; Treybig, Lukas; Werner, Stephan
Affiliation:Technische Universität Ilmenau, Electronic Media Technology Group, Ilmenau, Germany
How humans perceive, recognize, and remember room acoustics is of particular interest in the domain of spatial audio. For the creation of virtual or augmented acoustic environments, the room acoustic impression should match the expectations associated with certain room classes or with a specific room. These expectations are based on the auditory memory of the acoustic room impression. In this paper, the authors present an exploratory study to evaluate the ability of listeners to recognize room acoustic features. The task of the listeners was to detect the reference room in a modified ABX double-blind stimulus test that featured a pre-defined playback order and a fixed time schedule. Furthermore, the authors explored distraction effects by employing additional nonacoustic interferences. The results show a significant decrease of auditory memory capacity within 10 s, which is more pronounced when the listeners were distracted. However, the results suggest that auditory memory depends on what auditory cues are available.
Authors:McCormack, Leo; Meyer-Kahlen, Nils; Politis, Archontis
Affiliation:Department of Information and Communications Engineering, Aalto University, Espoo, Finland; Department of Information and Communications Engineering, Aalto University, Espoo, Finland; Faculty of Information Technology and Communication Sciences, Tampere University, Finland
A reconstruction-based rendering approach is explored for the task of imposing the spatial characteristics of a measured space onto a monophonic signal while also reproducing it over a target playback setup. The foundation of this study is a parametric rendering framework, which can operate either on arbitrary microphone array room impulse responses (RIRs) or Ambisonic RIRs. Spatial filtering techniques are used to decompose the input RIR into individual reflections and anisotropic diffuse reverberation, which are reproduced using dedicated rendering strategies. The proposed approach operates by considering several hypotheses involving different rendering configurations and thereafter determining which hypothesis reconstructs the input RIR most faithfully. In the present study, these hypotheses considered different numbers of reflections. Once the optimal number of reflections to render has been determined over time and frequency, the array directional responses used to reconstruct the input RIR are substituted with spatialization gains for the target playback setup. The results of formal listening experiments suggest that the proposed approach produces renderings that are perceptually more similar to reference responses than those obtained with an established subspace-based detection algorithm. The proposed approach also demonstrates similar or better performance than existing state-of-the-art methods.
Authors:Anemüller, Carlotta; Adami, Alexander; Herre, Jürgen
Affiliation:International Audio Laboratories Erlangen, Germany. A joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer IIS
In virtual/augmented reality or 3D applications with binaural audio, it is often desired to render sound sources with a certain spatial extent in a realistic way. A common approach is to distribute multiple correlated or decorrelated point sources over the desired spatial extent range, possibly derived from the original source signal by applying suitable decorrelation filters. Based on this basic model, a novel method for efficient and realistic binaural rendering of spatially extended sound sources is proposed. Instead of rendering each point source individually, the target auditory cues are synthesized directly from just two decorrelated input signals. This procedure comes with the advantage of low computational complexity and relaxed requirements for decorrelation filters. An objective evaluation shows that the proposed method matches the basic rendering model well in terms of perceptually relevant objective metrics. A subjective listening test shows, furthermore, that the output of the proposed method is perceptually almost identical to the output of the basic rendering model. The technique is part of the Reference Model architecture of the upcoming MPEG-I Immersive Audio standard.
Authors:Corcuera, Andrea; Chatziioannou, Vasileios; Ahrens, Jens
Affiliation:University of Music and Performing Arts, Vienna, Austria; University of Music and Performing Arts, Vienna, Austria; Chalmers University of Technology, Gothenburg, Sweden
Musical instruments are complex sound sources that exhibit directivity patterns that not only vary with frequency but can also change as a function of the played tone. It is as yet unclear whether the directivity variation as a function of the played tone leads to a perceptible difference compared with an auralization that uses an averaged directivity pattern. This paper examines the directivity of 38 musical instruments from a publicly available database and then selects three representative instruments among those with similar radiation characteristics (oboe, violin, and trumpet). To evaluate the listeners' ability to perceive a difference between auralizations of virtual environments using tone-dependent and averaged directivities, a listening test was conducted using the directivity patterns of the three selected instruments in both anechoic and reverberant conditions. The results show that, in anechoic conditions, listeners can reliably detect differences between the tone-dependent and averaged directivities for the oboe but not for the violin or the trumpet. In reverberant conditions, however, listeners can distinguish tone-dependent directivity from averaged directivity for all instruments under study.