Authors: Yang, Jing; Barde, Amit; Billinghurst, Mark
Affiliation: Department of Computer Science, ETH Zurich, Switzerland; The Empathic Computing Laboratory, Auckland Bioengineering Institute, The University of Auckland, New Zealand
Audio Augmented Reality (AAR) aims to augment people's auditory perception of the real world by synthesizing virtual spatialized sounds. AAR has begun to attract more research interest in recent years, especially because Augmented Reality (AR) applications are becoming more commonly available on mobile and wearable devices. However, because audio augmentation is relatively under-studied in the wider AR community, AAR needs further investigation before it can be widely used in different applications. This paper systematically reports on the technologies used in past studies to realize AAR and provides an overview of AAR applications. A total of 563 publications indexed on Scopus and Google Scholar were reviewed, and from these, 117 of the most impactful papers were identified and summarized in more detail. As one of the first systematic reviews of AAR, this paper presents an overall landscape of AAR, discusses the development trends in techniques and applications, and indicates challenges and opportunities for future research. For researchers and practitioners in related fields, this review aims to provide inspiration and guidance for conducting AAR research in the future.
Authors: Agrawal, Sarvesh; Bech, Søren; De Moor, Katrien; Forchhammer, Søren
Affiliation: Bang & Olufsen a/s, Struer, Denmark; Department of Photonics Engineering, Technical University of Denmark, Lyngby, Denmark; Department of Electronic Systems, Aalborg University, Aalborg, Denmark; Department of Information Security and Communication Technology, Norwegian University of Science and Technology, Trondheim, Norway
Understanding the influence of technical system parameters on audiovisual experiences is important for technologists to optimize experiences. The focus in this study was on the influence of changes in audio spatialization (varying the loudspeaker configuration for audio rendering from 2.1 to 5.1 to 7.1.4) on the experience of immersion. First, a magnitude estimation experiment was performed to perceptually evaluate envelopment for verifying the initial condition that there is a perceptual difference between the audio spatialization levels. It was found that envelopment increased from 2.1 to 5.1 reproduction, but there was no significant benefit of extending from 5.1 to 7.1.4. An absolute-rating experimental paradigm was used to assess immersion in four audiovisual experiences by 24 participants. Evident differences between immersion scores could not be established, signaling that a change in audio spatialization and subsequent change in envelopment does not guarantee a psychologically immersive experience.
Authors: Fela, Randy Frans; Zacharov, Nick; Forchhammer, Søren
Affiliation: SenseLab, FORCE Technology, Hørsholm, Denmark; Meta Reality Labs, Paris, France; Department of Electrical and Photonics Engineering, Technical University of Denmark, Kgs. Lyngby, Denmark
For accurate and detailed perceptual evaluation of compressed omnidirectional multimedia content, it is imperative that assessor panels be qualified to produce consistent and high-quality data. This work extends existing procedures for assessor selection in terms of scope (360° videos with high-order ambisonics), time efficiency, and analytical approach, as described in detail. The main selection procedures consisted of a basic audiovisual screening and three successive discrimination experiments for audio (listening), video (viewing), and audiovisual content using a triangle test. Additionally, four factors influencing quality of experience, including the simulator sickness questionnaire, were evaluated and are discussed. After the selection process, a confirmatory study was conducted using three experiments (audio, video, and audiovisual) based on a rating scale methodology to compare performance between rejected and selected assessors. The studies showed that (i) perceptual discriminations are influenced by the samples, the encoding parameters, and some quality of experience factors; (ii) the probability of symptom occurrence is quite low, indicating that the proposed procedure is feasible; and (iii) the selected assessors performed better in discrimination than the rejected assessors, indicating the effectiveness of the proposed procedure.
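As a rough illustration of how a triangle test result might be scored, the sketch below computes the probability of an assessor reaching a given number of correct answers by guessing alone. It is the standard one-sided binomial calculation with a chance rate of 1/3, not the selection criterion reported by the authors; the function name and parameters are hypothetical.

```python
from math import comb


def triangle_test_p_value(correct, trials, guess_rate=1 / 3):
    """Probability of at least `correct` right answers in `trials` triangle
    tests when the assessor only guesses (one odd sample out of three).

    Hypothetical helper for illustration; not the authors' analysis.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    # Upper tail of the binomial distribution with success probability 1/3.
    return sum(
        comb(trials, k) * guess_rate**k * (1 - guess_rate) ** (trials - k)
        for k in range(correct, trials + 1)
    )
```

A small value, for example for 8 correct answers out of 10 trials, suggests the assessor is discriminating rather than guessing; the threshold to apply is a study design decision.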
Authors: Johansson, Jaan; Mäkivirta, Aki; Malinen, Matti; Saari, Ville
Affiliation: Genelec Oy, Iisalmi, Finland; Kuava Oy, Kuopio, Finland
This paper studies the feasibility of predicting the interaural time difference (ITD) in azimuth and elevation once the personal anthropometric interaural distance is known, proposing an enhancement for spherical head ITD models to increase their accuracy. The method and enhancement are developed using a Head-Related Impulse Response (HRIR) data set comprising photogrammetrically obtained personal 3D geometries for 170 persons and then evaluated using three acoustically measured HRIR data sets containing 119 persons in total. The directions include 360° in azimuth and –15° to 60° in elevation. The prediction error for each data set is described, the proportion of persons under a given error in all studied directions is shown, and the directions in which large errors occur are analyzed. The enhanced spherical head model can predict the ITD such that the first and 99th percentile levels of the ITD prediction error for all persons and in all directions remain below 122 µs. The anthropometric interaural distance could potentially be measured directly on a person, enabling personalized ITD without measuring the HRIR. The enhanced model can personalize ITD in binaural rendering for headphone reproduction in games and immersive audio applications.
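For orientation, a baseline spherical head ITD model of the kind this abstract builds on can be sketched as follows. This is the classical Woodworth far-field formula, ITD = (a/c)(θ + sin θ), not the enhancement proposed by the authors. The sketch assumes the head radius is half the interaural distance, a speed of sound of 343 m/s, and a lateral angle θ with sin θ = sin(azimuth)·cos(elevation); the function and constant names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 °C


def woodworth_itd(interaural_distance_m, azimuth_deg, elevation_deg=0.0):
    """Classical Woodworth spherical-head ITD in seconds.

    Assumptions (not taken from the paper): head radius equals half the
    interaural distance, the source is in the far field, and azimuth 0
    is straight ahead with positive azimuth toward one ear.
    """
    radius = interaural_distance_m / 2.0
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Sine of the lateral angle between the source and the median plane.
    lateral_sin = max(-1.0, min(1.0, math.sin(az) * math.cos(el)))
    lateral = math.asin(lateral_sin)
    return (radius / SPEED_OF_SOUND_M_S) * (lateral + math.sin(lateral))
```

With an interaural distance of 0.15 m, a source directly to the side yields an ITD of a few hundred microseconds, and the value shrinks as the source is raised in elevation.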
Authors: Gareis, Michael; Maas, Jürgen
Affiliation: Mechatronic Systems Lab, Technical University of Berlin, Berlin, Germany
In recent decades, dielectric elastomers (DE) have emerged as a promising transducing principle for various applications. They promise to be lightweight, efficient, and affordable alternatives to conventional electrodynamic or piezoelectric transducers and show large deformations at fast rates. In this work a loudspeaker concept is proposed that relies on the elastic instability of a DE membrane. A multilayered DE membrane is clamped in a circular ring. Upon applying a DC voltage, its area increases, and the membrane buckles up. A superimposed signal voltage induces vibration and generates sound. To model the device mechanically, a system of partial differential equations is derived from Hamilton's principle. The mechanical model is then coupled to the electrical and acoustical domains, which are assumed to be linear. Static, dynamic, and acoustic experiments on buckling DE transducers of three different diameters (10, 15, and 20 mm) and different thicknesses (0.4 mm to 0.6 mm) as multilayer configurations are conducted to validate the model. Sound pressure levels of about 70 dB above 1 kHz are reached. Small loudspeakers like this may find application in mobile or array systems.
Authors: Duan, Zhikui; Gao, Guozhi; Chen, Jiawei; Li, Shiren; Ruan, Jinbiao; Yang, Guangguang; Yu, Xinmei
Affiliation: Foshan University, Foshan, China
The Transformer, an attention-based encoder-decoder network, has recently become the prevailing model for automatic speech recognition because of its high recognition accuracy. However, the convergence speed of the Transformer is suboptimal. To address this problem, a structure called Dual-Residual Transformer Network (DRTNet), which converges quickly, is proposed. In DRTNet, a direct path inspired by the structure proposed in ResNet is added in the encoder and decoder layers to propagate features. Moreover, this architecture can also fuse features, which tends to improve the model performance. Specifically, the input of the current layer is the integration of the input and output of the previous layer. Empirical evaluation of the proposed DRTNet has been conducted on two public datasets, AISHELL-1 and HKUST. Experimental results on these two datasets show that DRTNet has faster convergence speed and better performance.
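The layer-to-layer wiring described in this abstract can be sketched in a few lines. The sketch assumes that "integration" means an element-wise sum and that the stack returns the last layer's output; the abstract specifies neither, and the toy layers stand in for real Transformer encoder or decoder layers. All names are hypothetical.

```python
def run_dual_residual_stack(features, layers):
    """Feed `features` through `layers`, where each layer after the first
    receives the previous layer's input plus the previous layer's output.

    Illustrative only: element-wise addition is an assumed form of the
    "integration" mentioned in the abstract.
    """
    if not layers:
        raise ValueError("at least one layer is required")
    prev_input = list(features)
    prev_output = layers[0](prev_input)
    for layer in layers[1:]:
        # Direct path: fuse the previous layer's input with its output.
        current_input = [a + b for a, b in zip(prev_input, prev_output)]
        prev_output = layer(current_input)
        prev_input = current_input
    return prev_output


# Usage with two toy layers standing in for Transformer layers.
double = lambda xs: [2.0 * x for x in xs]
add_one = lambda xs: [x + 1.0 for x in xs]
result = run_dual_residual_stack([1.0, 2.0], [double, add_one])
```

The direct path means later layers see earlier features without them passing through every intermediate transformation, which is the ResNet-style idea the abstract refers to.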
Authors: Franco Hernández, Juan Carlos; Bacila, Bogdan; Brookes, Tim; De Sena, Enzo
Affiliation: Institute of Sound Recording, University of Surrey, Guildford, United Kingdom; Applied Psychoacoustics Laboratory, University of Huddersfield, Huddersfield, United Kingdom
A new publicly available dataset of microphone impulse responses (IRs) has been generated. The dataset covers 25 microphones, including a Class-1 measurement microphone and polar pattern variations for seven of the microphones. Microphones that were included had omnidirectional, cardioid, supercardioid, and bidirectional polar patterns; condenser, moving-coil, and ribbon transduction types; single and dual diaphragms; multiple body and head basket shapes; small and large diaphragms; and end-address and side-address designs. Using a custom-developed computer-controlled precision turntable, IRs were captured quasi-anechoically at incident angles from 0° to 355° in steps of 5° and at source-to-microphone distances of 0.5, 1.25, and 5 m. The resulting dataset is suitable for perceptual and objective studies related to the incident-angle-dependent response of microphones and for the development of tools for predicting and emulating on-axis and off-axis microphone characteristics. The captured IRs allow generation of frequency response plots with a degree of detail not commonly available in manufacturer-supplied data sheets and are also particularly well-suited to harmonic distortion analysis.
Authors: Zamir, Aviad; Seiden, Gabriel; Kupershmidt, Haim
Affiliation: Synaptics Inc., San Jose, CA; Moriah Scientific Consulting, 76248 Rehovot, Israel
With the evolving market of true wireless stereo earphones and improvements in Bluetooth technology, wireless earphones have become a platform for innovation. Performance of such earphones is measured based on two main criteria: sound quality, which includes total harmonic distortion, and power consumption. Power efficiency in such devices is critical for sustaining good battery life. In this study, the design and fabrication of a novel aluminum-based push-pull electrostatic microelectromechanical systems transducer for earphones are presented. This device is designed to consume two orders of magnitude less power than a common earphone voice coil speaker and to deliver substantially higher sound quality. In particular, the authors elaborate on the underlying theoretical aspects pertaining to the design and on the unique fabrication challenges originating from the microscale nature of the device.