Authors: Hill, Adam J.; Mulder, Johannes; Burton, Jon; Kok, Marcel; Lawrence, Michael
Affiliation: College of Science and Engineering, University of Derby, Derby, DE22 1GB, UK; College of Arts and Social Sciences, The Australian National University, Canberra, Australia; dBcontrol, Zwaag, The Netherlands; Rational Acoustics, Woodstock, CT, USA
Musical dynamics are often central to pieces of music and are therefore likely to be fundamental to the live event listening experience. While metrics exist in broadcasting and recording to quantify dynamics, such measures work on high-resolution data. Live event sound level monitoring data is typically low-resolution (logged at a resolution of one sample per second or lower), which necessitates a bespoke quantification of musical dynamics. Live dynamic range (LDR) is presented and validated here to serve this purpose, where measurement data is conditioned to remove song breaks and adjustments imposed by sound level regulation, extracting the true musical dynamics of a live performance. Results show consistent objective performance of the algorithm, as tested on synthetic data as well as datasets from previous performances.
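The condition-then-quantify idea can be sketched in a few lines. The gap threshold and percentile pair below are illustrative placeholders, not the published LDR definition:

```python
import numpy as np

def live_dynamic_range(laeq_1s, gap_threshold_db=15.0):
    """Illustrative dynamic-range estimate for 1-s sound level logs.

    Drops samples far below the overall median (song breaks), then
    reports the spread between the 95th and 10th percentile levels.
    Both the threshold and the percentile pair are illustrative only.
    """
    levels = np.asarray(laeq_1s, dtype=float)
    median = np.median(levels)
    # Condition the data: discard near-silence between songs.
    active = levels[levels > median - gap_threshold_db]
    return np.percentile(active, 95) - np.percentile(active, 10)

# Synthetic show: loud choruses, quieter verses, and a silent song break.
rng = np.random.default_rng(0)
show = np.concatenate([
    rng.normal(100, 1.5, 600),   # chorus-level passages, dB
    rng.normal(92, 1.5, 600),    # verse-level passages, dB
    rng.normal(60, 1.0, 60),     # song break (should be discarded)
])
print(round(live_dynamic_range(show), 1))
```

Without the conditioning step, the song break would inflate the apparent dynamic range by tens of decibels; with it, the estimate reflects only the musical loud/quiet contrast.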
Download: PDF (HIGH Res) (3.7MB)
Download: PDF (LOW Res) (1.2MB)
Authors: Patole, Rashmika; Rege, Priti P.
Affiliation: Department of Electronics and Telecommunication, College of Engineering, Pune, India; Department of Electronics and Telecommunication, College of Engineering, Pune, India
Authentication of audio recordings is an important task in the field of audio forensics. Splicing is the practice of manipulating recorded audio to replace or insert an external sound into the original audio track. Because digital audio recordings are so easily spliced, forgery and tampering of recordings, whether with criminal intent or simply to undermine their integrity, are common. This paper describes a methodology for splicing detection in digital audio recordings, with a comparative analysis of the effectiveness of different feature sets and classifiers. Feature sets including conventional, chroma, and reverberation-based features are evaluated, compared, and combined to improve classification accuracy. Extensive experiments account for factors such as the duration of the attack, the effect of noise, and the effect of compression. The Analytic Hierarchy Process is used to evaluate different performance parameters and, based on the priority weights assigned to them, identify the most suitable machine learning classifier for splicing detection. Results indicate that Long Short-Term Memory with a feature set containing Mel-Frequency Cepstral Coefficients and Decay Rate Distribution features outperforms the other classifiers and feature sets.
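The paper's full pipeline (MFCC and decay-rate features feeding an LSTM) is beyond a short sketch, but the framing-and-feature stage common to such detectors can be illustrated. The features (log-energy and spectral centroid) and the z-score threshold below are illustrative stand-ins, not the paper's method:

```python
import numpy as np

def frame_features(x, sr, frame_len=1024, hop=512):
    """Per-frame log-energy and spectral centroid: simple stand-ins for
    the MFCC/decay-rate features a real detector would use."""
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * np.hanning(frame_len)
        mag = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
        energy = np.log10(np.sum(mag ** 2) + 1e-12)
        centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
        feats.append((energy, centroid))
    return np.array(feats)

def splice_candidates(x, sr, z_threshold=3.0):
    """Flag frame transitions where the feature trajectory jumps abruptly."""
    f = frame_features(x, sr)
    d = np.linalg.norm(np.diff(f, axis=0), axis=1)
    z = (d - d.mean()) / (d.std() + 1e-12)
    return np.flatnonzero(z > z_threshold)

# Two recordings with different spectra, crudely spliced together.
sr = 16000
t = np.arange(sr) / sr
a = 0.5 * np.sin(2 * np.pi * 220 * t[: sr // 2])
b = 0.5 * np.sin(2 * np.pi * 2000 * t[: sr // 2])
print(splice_candidates(np.concatenate([a, b]), sr))
```

A learned classifier replaces the fixed threshold precisely because real splices rarely produce such clean discontinuities once noise and compression are involved.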
Download: PDF (HIGH Res) (6.5MB)
Download: PDF (LOW Res) (746KB)
Authors: Du, Bokai; Behler, Gottfried; Kohnen, Michael; Zeng, Xiangyang; Vorländer, Michael
Affiliation: School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an, 710072, China; Institute of Technical Acoustics, RWTH-Aachen University, Aachen, 52072, Germany; Institute of Technical Acoustics, RWTH-Aachen University, Aachen, 52072, Germany; School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an, 710072, China; Institute of Technical Acoustics, RWTH-Aachen University, Aachen, 52072, Germany
In this paper, a first-order loudspeaker composed of monopole and dipole units is designed and manufactured, and its structure and dimensions are described. It enables first-order control of the radiated sound energy. The directivity of the loudspeaker is measured with a turntable and microphone arm, and its directivity control is examined through the synthesis of a cardioid pattern. A circular first-order loudspeaker array is then constructed to investigate its performance in a sound field reproduction system with exterior cancellation. The reproduction and energy radiation control performance of this first-order array is compared with that of a monopole array in free-field experiments. Finally, to reduce the effort of measuring the array's acoustic transfer functions, a sparse equivalent source method is proposed; its performance is compared with the conventional pressure-matching method and a previous equivalent source method.
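The first-order beam control that such a monopole-plus-dipole unit provides reduces to a single weighting parameter. A minimal sketch of the underlying pattern family (not the paper's hardware or processing):

```python
import numpy as np

# First-order beam synthesis: weight a monopole (omnidirectional) term
# against a dipole (cos-theta) term to shape the radiation pattern:
#   D(theta) = a + (1 - a) * cos(theta)
# a = 1 gives an omni, a = 0 a dipole, and a = 0.5 the cardioid used
# in the directivity-synthesis test described above.
theta = np.linspace(0.0, 2.0 * np.pi, 361)

def first_order_pattern(a, theta):
    return a + (1.0 - a) * np.cos(theta)

cardioid = first_order_pattern(0.5, theta)
print(cardioid[0], cardioid[180])  # on-axis maximum, rear null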
Download: PDF (HIGH Res) (20.6MB)
Download: PDF (LOW Res) (1.2MB)
Authors: Yeoward, Christopher; Shukla, Rishi; Stewart, Rebecca; Sandler, Mark; Reiss, Joshua D.
Affiliation: Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK; Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK; Dyson School of Design Engineering, Faculty of Engineering, Imperial College London, London SW7 2AZ, UK; Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK; Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK
This paper proposes and evaluates an integrated method for real-time, head-tracked, 3D binaural audio with synthetic reverberation. Virtual vector base amplitude panning is used to position the sound source and spatialize outputs from a scattering delay network reverb algorithm running in parallel. A unique feature of this approach is its realization of interactive auralization using vector base amplitude panning and a scattering delay network, within acceptable levels of latency, at low computational cost. The rendering model also allows direct parameterization of room geometry and absorption characteristics. Varying levels of reverb complexity can be implemented, and these were evaluated against two distinct aspects of perceived sonic immersion. Outcomes from the evaluation provide benchmarks for how the approach could be deployed adaptively, to balance three real-time spatial audio objectives of envelopment, naturalness, and efficiency, within contrasting physical spaces.
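The amplitude-panning core of such a renderer is compact enough to sketch. This is a 2D illustration of the VBAP principle only, with an assumed fixed speaker pair; real renderers select the active pair or triplet and work in 3D:

```python
import numpy as np

def vbap_2d(source_deg, spk_deg_pair):
    """Pairwise 2D amplitude panning: solve L g = p for the gain pair,
    then power-normalize. A sketch of the VBAP principle only."""
    p = np.array([np.cos(np.radians(source_deg)),
                  np.sin(np.radians(source_deg))])
    # Columns of L are the unit vectors toward the two loudspeakers.
    L = np.column_stack([
        [np.cos(np.radians(d)), np.sin(np.radians(d))] for d in spk_deg_pair
    ])
    g = np.linalg.solve(L, p)
    return g / np.linalg.norm(g)

# A source midway between speakers at +/-30 degrees gets equal gains.
print(vbap_2d(0.0, (30.0, -30.0)))
```

Because the panning step is just a small linear solve per source, running it alongside a scattering delay network reverb stays cheap, which is the efficiency argument the paper builds on.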
Download: PDF (HIGH Res) (3.0MB)
Download: PDF (LOW Res) (859KB)
Authors: Kim, Sungyoung; Howie, Will
Affiliation: Electrical, Computer and Telecommunication Engineering Technology, Rochester Institute of Technology, Rochester, NY; Electrical, Computer and Telecommunication Engineering Technology, Rochester Institute of Technology, Rochester, NY
This study investigates how a listening environment (the combination of a room's acoustics and its reproduction loudspeakers) influences a listener's perception of reproduced sound fields. Three distinct listening environments with different reverberation times and clarity indices were compared for their perceptual characteristics. Binaural recordings were made of orchestral music, mixed for 22.2-channel and 2-channel reproduction, within each of the three listening rooms. In a subjective listening test, 48 listeners evaluated these binaural recordings in terms of overall preference and five auditory attributes: perceived width, perceived depth, spatial clarity, impression of being enveloped, and spectral fidelity. Factor analyses of the attribute ratings show that listener perception of the reproduced sound fields centered on two salient factors, spatial and spectral fidelity, yet the attributes' weightings within those factors differ depending on a listener's previous experience with audio production and 3D immersive audio listening. For the experienced group, the impression of being enveloped was the most salient attribute, whereas spectral fidelity was most important for the non-experienced group.
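The factor-extraction step can be illustrated with principal components on a correlation matrix. The ratings matrix below is synthetic, with two latent factors planted by construction; it is not the study's data, and the study's factor analysis may use rotation and estimation details omitted here:

```python
import numpy as np

# Synthetic listeners-by-attributes ratings: four attributes follow one
# latent factor (loosely "spatial"), one follows another ("spectral").
rng = np.random.default_rng(1)
n_listeners = 48
spatial = rng.normal(size=n_listeners)
spectral = rng.normal(size=n_listeners)
noise = 0.3 * rng.normal(size=(n_listeners, 5))
# Columns: width, depth, spatial clarity, envelopment, spectral fidelity.
ratings = np.column_stack([
    spatial, spatial, spatial, spatial, spectral
]) + noise

# Eigen-decompose the attribute correlation matrix; the eigenvalue
# spectrum shows how many factors carry the variance.
corr = np.corrcoef(ratings, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)           # ascending order
explained = eigvals[::-1] / eigvals.sum()
print(np.round(explained[:2], 2))            # top two factors dominate
```

With two planted factors, the first two components absorb nearly all the variance, mirroring the two-factor (spatial/spectral) structure the study reports.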
Download: PDF (HIGH Res) (3.4MB)
Download: PDF (LOW Res) (525KB)
Authors: Villegas, Julián; Fukasawa, Naoki; Arevalo, Camilo
Affiliation: University of Aizu, Aizu-Wakamatsu, Japan; Yamaha, Hamamatsu, Japan; University of Aizu, Aizu-Wakamatsu, Japan
We report the effect of the presence of a floor on elevation estimation of audio spatialized with non-individualized Head-Related Impulse Responses (HRIRs). The results of two experiments (n = 21 and n = 39) suggest that using HRIRs captured when a floor simulator was present improved assessors' accuracy when judging elevation in the sagittal and coronal planes, especially at high elevations. Such improvements were not observed when signals delayed according to their computed first reflection were mixed with signals convolved with anechoic HRIRs. These findings suggest that capturing non-individualized HRIRs in hemi-anechoic rooms could improve accuracy of audio spatialization in virtual environments.
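The second condition, mixing an anechoic-HRIR rendering with a delayed copy standing in for the first floor reflection, can be sketched directly. The toy HRIR, delay, and reflection gain below are illustrative assumptions:

```python
import numpy as np

def render_with_floor(x, hrir, delay_samples, reflection_gain=0.5):
    """Anechoic HRIR rendering plus a single delayed, attenuated copy
    standing in for the computed first floor reflection (the condition
    that did NOT improve elevation judgments). Parameters illustrative."""
    direct = np.convolve(x, hrir)
    out = np.zeros(len(direct) + delay_samples)
    out[: len(direct)] += direct
    out[delay_samples: delay_samples + len(direct)] += reflection_gain * direct
    return out

x = np.array([1.0, 0.0, 0.0, 0.0])  # unit impulse as the source signal
hrir = np.array([0.9, 0.3])         # toy anechoic HRIR
y = render_with_floor(x, hrir, delay_samples=3)
print(y)
```

Capturing the HRIR with the floor simulator in place instead bakes the reflection's direction-dependent filtering into the measured response, which the simple delayed copy above cannot reproduce; that difference is the study's point.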
Download: PDF (HIGH Res) (5.1MB)
Download: PDF (LOW Res) (680KB)
Authors: El Baba, Youssef; Walther, Andreas; Habets, Emanuël A. P.
Affiliation: International Audio Laboratories, Erlangen, Germany; Fraunhofer Institute for Integrated Circuits, Erlangen, Germany; International Audio Laboratories, Erlangen, Germany
Acoustic reciprocity is a well-known and established concept, first formalized by Helmholtz and Rayleigh in the late 19th century. Acoustic path reciprocity has been extensively studied in the context of impulse response measurements, as it allows the locations of sources and receivers to be interchanged without affecting the measurement. Electro-acoustic transducer reciprocity (also referred to as transducer reversibility) has been studied less. This work presents a literature overview of the science behind acoustic transducer reciprocity, namely Schottky's law of low-frequency reception, and proposes a variant of this law that accounts for modern loudspeaker sensitivity conventions. While the proposed variant applies to all reversible transducer designs, a concrete specification is given for electro-dynamic moving-coil transducers. Furthermore, a joint empirical validation of the original and proposed variants of the law is presented. Finally, a hybrid empirical-theoretical scheme is proposed that uses the measured frequency response of the transducer used non-reciprocally, along with Schottky's law, to predict the frequency response of the transducer used reciprocally.
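The shape of such a hybrid prediction can be sketched using the classical free-field reciprocity relation, in which the ratio of receiving sensitivity to transmitting current response equals the reciprocity parameter J = 2d/(ρ₀f). This is an assumption standing in for the paper's variant, whose normalization for modern loudspeaker sensitivity conventions differs; the response values below are toy numbers:

```python
import numpy as np

# Given a measured transmitting response S(f) (pressure at distance d
# per unit drive current), predict the receiving sensitivity M(f) of
# the same reversible transducer via M = J * S with J = 2d / (rho0 * f).
rho0 = 1.204          # air density, kg/m^3
d = 1.0               # measurement distance, m

f = np.array([50.0, 100.0, 200.0, 400.0])      # Hz
S = np.array([0.2, 0.4, 0.8, 1.6])             # toy transmitting response

J = 2.0 * d / (rho0 * f)                       # reciprocity parameter
M = J * S                                      # predicted receiving sensitivity
print(np.round(M, 5))
```

The 1/f factor in J is the "low-frequency reception" character: a transmitting response that rises proportionally with frequency maps to a flat receiving sensitivity, as the toy numbers above show.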
Download: PDF (HIGH Res) (4.7MB)
Download: PDF (LOW Res) (490KB)
Authors: Lee, Hyunkook; Johnson, Dale
Affiliation: Applied Psychoacoustics Laboratory (APL), University of Huddersfield, Huddersfield, United Kingdom; Applied Psychoacoustics Laboratory (APL), University of Huddersfield, Huddersfield, United Kingdom
This paper describes a set of objective measurements carried out to compare various types of 3D microphone arrays, comprising OCT-3D, PCMA-3D, 2L-Cube, Decca Cuboid, Eigenmike EM32 (a spherical microphone system), and Hamasaki Square with 0-m and 1-m vertical spacings of the height layer. The measured objective parameters comprised interchannel and spectral differences caused by interchannel crosstalk (ICXT), fluctuations of interaural level and time differences (ILD and ITD), interchannel correlation coefficient (ICC), interaural cross-correlation coefficient (IACC), and direct-to-reverberant energy ratio (DRR). These were chosen as potential predictors for perceived differences among the arrays. The measurements of the properties of ICXT and the time-varying ILD and ITD suggest that the arrays would produce substantial perceived differences in tonal quality as well as locatedness. The analyses of ICCs and IACCs indicate that perceived differences among the arrays in spatial impression would be larger horizontally than vertically. It is also predicted that the addition of the height channel signals to the base channel signals in reproduction would produce little effect on either source-image spread or listener envelopment, regardless of the array type. Finally, differences between the ear-input signals in DRR were substantially smaller than those observed among microphone signals.
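The correlation measures (ICC and IACC) share one core computation: the maximum of the normalized cross-correlation over a lag window. A minimal sketch, noting that for IACC the window is conventionally about ±1 ms on the ear signals and the exact windowing here is illustrative:

```python
import numpy as np

def max_interchannel_correlation(x, y, max_lag):
    """Maximum absolute normalized cross-correlation over a lag window,
    the core of both ICC (microphone pairs) and IACC (ear signals)."""
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            num = np.sum(x[lag:] * y[: len(y) - lag])
        else:
            num = np.sum(x[: len(x) + lag] * y[-lag:])
        best = max(best, abs(num) / denom)
    return best

# A signal against a delayed copy of itself correlates near 1.
rng = np.random.default_rng(2)
sig = rng.normal(size=1000)
shifted = np.roll(sig, 5)
print(round(max_interchannel_correlation(sig, shifted, 8), 2))
```

Low values of this measure between channel pairs are what predict a wider, more diffuse spatial impression, which is why it serves as a predictor of perceived differences among the arrays.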
Download: PDF (HIGH Res) (1.9MB)
Download: PDF (LOW Res) (1.2MB)
As research into the features of audio quality continues, the emphasis is increasingly on understanding the relationship with human emotions and how machines can be taught to do human-like analysis or synthesis. Separating the effects of audio content from those of its quality is a persistent challenge in this type of work.
Download: PDF (401KB)