AES Journal

2021 November - Volume 69 Number 11


Papers

Sound Level Monitoring at Live Events, Part 1--Live Dynamic Range

Open Access

Musical dynamics are often central within pieces of music and are therefore likely to be fundamental to the live event listening experience. While metrics exist in broadcasting and recording to quantify dynamics, such measures work on high-resolution data. Live event sound level monitoring data is typically low-resolution (logged at one-second intervals or less), which necessitates bespoke musical dynamics quantification. Live dynamic range (LDR) is presented and validated here to serve this purpose, where measurement data is conditioned to remove song breaks and sound level regulation-imposed adjustments to extract the true musical dynamics from a live performance. Results show consistent objective performance of the algorithm, as tested on synthetic data as well as datasets from previous performances.

  Download: PDF (HIGH Res) (3.7MB)

  Download: PDF (LOW Res) (1.2MB)

Machine Learning--Based Splicing Detection in Digital Audio Recordings for Audio Forensics

Authentication of audio recordings is an important task in the field of audio forensics. Splicing is the practice of manipulating recorded audio to replace or insert an external sound into the original audio track. Due to the ease with which digital audio recordings can be spliced, forgery and tampering of audio recordings with a criminal intent or intent to destroy their integrity are common practices. This paper describes a methodology for splicing detection in digital audio recordings with a comparative analysis of the effectiveness of different feature sets and classifiers. Different feature sets including conventional, chroma, and reverberation-based features are evaluated, compared, and combined to produce better classification accuracy. Exhaustive experimentation has been done to take into account factors such as the duration of the attack, effect of noise, and effect of compression. The Analytic Hierarchy Process is used to evaluate different performance parameters and identify the most suitable machine learning classifier for splicing detection based on priority weights assigned to the different performance parameters. Results indicate that Long Short-Term Memory with a feature set containing Mel-Frequency Cepstral Coefficients and Decay Rate Distribution features has the best performance compared with other classifiers and feature sets.

  Download: PDF (HIGH Res) (6.5MB)

  Download: PDF (LOW Res) (746KB)

First-Order Loudspeaker Design and an Experimental Application on Sound Field Reproduction With Sparse Equivalent Source Method

In this paper, a first-order loudspeaker, which is composed of monopole and dipole units, is designed and manufactured. The structure and size of the loudspeaker are shown. It is able to control the sound energy radiation with first-order beam control. The directivity of the loudspeaker is measured with a turntable and microphone arm. The directivity control ability is examined by the synthesis of a cardioid directivity. After that, a circular first-order loudspeaker array is constructed in order to investigate the array's performance in a sound field reproduction system with exterior cancellation. The reproduction and energy radiation control performance of this first-order loudspeaker array is compared with a monopole array by experiment in the free field. Finally, in order to reduce the effort of measuring the loudspeaker array's acoustic transfer function, a sparse equivalent source method is proposed. The performance of the proposed method is compared with the conventional pressure-matching method and a previous equivalent source method.

  Download: PDF (HIGH Res) (20.6MB)

  Download: PDF (LOW Res) (1.2MB)

Real-Time Binaural Room Modelling for Augmented Reality Applications

Open Access

This paper proposes and evaluates an integrated method for real-time, head-tracked, 3D binaural audio with synthetic reverberation. Virtual vector base amplitude panning is used to position the sound source and spatialize outputs from a scattering delay network reverb algorithm running in parallel. A unique feature of this approach is its realization of interactive auralization using vector base amplitude panning and a scattering delay network, within acceptable levels of latency, at low computational cost. The rendering model also allows direct parameterization of room geometry and absorption characteristics. Varying levels of reverb complexity can be implemented, and these were evaluated against two distinct aspects of perceived sonic immersion. Outcomes from the evaluation provide benchmarks for how the approach could be deployed adaptively, to balance three real-time spatial audio objectives of envelopment, naturalness, and efficiency, within contrasting physical spaces.

  Download: PDF (HIGH Res) (3.0MB)

  Download: PDF (LOW Res) (859KB)

Influence of the Listening Environment on Recognition of Immersive Reproduction of Orchestral Music Sound Scenes

Open Access

This study investigates how a listening environment (the combination of a room's acoustics and reproduction loudspeaker) influences a listener's perception of reproduced sound fields. Three distinct listening environments with different reverberation times and clarity indices were compared for their perceptual characteristics. Binaural recordings were made of orchestral music, mixed for 22.2 and 2-channel audio reproduction, within each of the three listening rooms. In a subjective listening test, 48 listeners evaluated these binaural recordings in terms of overall preference and five auditory attributes: perceived width, perceived depth, spatial clarity, impression of being enveloped, and spectral fidelity. Factor analyses of these five attribute ratings show that listener perception of the reproduced sound fields focused on two salient factors, spatial and spectral fidelity, yet the attributes' weightings in those two factors differ depending on a listener's previous experience with audio production and 3D immersive audio listening. For the experienced group, the impression of being enveloped was the most salient attribute, with spectral fidelity being the most important for the non-experienced group.

  Download: PDF (HIGH Res) (3.4MB)

  Download: PDF (LOW Res) (525KB)

The Presence of a Floor Improves Subjective Elevation Accuracy of Binaural Stimuli Created With Non-Individualized Head-Related Impulse Responses

We report the effect of the presence of a floor on elevation estimation of audio spatialized with non-individualized Head-Related Impulse Responses (HRIRs). The results of two experiments (n = 21 and n = 39) suggest that using HRIRs captured when a floor simulator was present improved assessors' accuracy when judging elevation in the sagittal and coronal planes, especially at high elevations. Such improvements were not observed when signals delayed according to their computed first reflection were mixed with signals convolved with anechoic HRIRs. These findings suggest that capturing non-individualized HRIRs in hemi-anechoic rooms could improve accuracy of audio spatialization in virtual environments.

  Download: PDF (HIGH Res) (5.1MB)

  Download: PDF (LOW Res) (680KB)

Schottky's Law of Low-Frequency Reception for Modern Sensitivity Conventions

Acoustic reciprocity is a well-known and established concept, first proposed by Helmholtz and Rayleigh in the late 19th century. Acoustic path reciprocity has been extensively studied in the context of impulse response measurements as it allows us to interchange the locations of sensors and receivers without affecting the measurement. Electro-acoustic transducer reciprocity (also referred to as transducer reversibility) has been less studied. This work presents a literature overview of the science behind acoustic transducer reciprocity, namely Schottky's law of low-frequency reception, and proposes a variant of this law that accounts for modern loudspeaker sensitivity conventions. While the proposed variant applies to all reversible transducer designs, a concrete specification is given for electro-dynamic moving-coil transducers. Furthermore, a joint empirical validation of the original and proposed variants of the law is presented. Finally, a hybrid empirical-theoretical scheme is proposed that uses the measured frequency response of the transducer used non-reciprocally along with Schottky's law to predict the frequency response of the transducer used reciprocally.

  Download: PDF (HIGH Res) (4.7MB)

  Download: PDF (LOW Res) (490KB)

Engineering Reports

3D Microphone Array Comparison: Objective Measurements

Open Access

This paper describes a set of objective measurements carried out to compare various types of 3D microphone arrays, comprising OCT-3D, PCMA-3D, 2L-Cube, Decca Cuboid, Eigenmike EM32 (i.e., spherical microphone system), and Hamasaki Square with 0-m and 1-m vertical spacings of the height layer. Objective parameters that were measured comprised interchannel and spectral differences caused by interchannel crosstalk (ICXT), fluctuations of interaural level and time differences (ILD and ITD), interchannel correlation coefficient (ICC), interaural cross-correlation coefficient (IACC), and direct-to-reverberant energy ratio (DRR). These were chosen as potential predictors for perceived differences among the arrays. The measurements of the properties of ICXT and the time-varying ILD and ITD suggest that the arrays would produce substantial perceived differences in tonal quality as well as locatedness. The analyses of ICCs and IACCs indicate that perceived differences among the arrays in spatial impression would be larger horizontally than vertically. It is also predicted that the addition of the height channel signals to the base channel ones in reproduction would produce little effect on either source-image spread or listener envelopment, regardless of the array type. Finally, differences between the ear-input signals in DRR were substantially smaller than those observed among microphone signals.

  Download: PDF (HIGH Res) (1.9MB)

  Download: PDF (LOW Res) (1.2MB)

Standards and Information Documents

AES Standards Committee News

Page: 888

Download: PDF (79KB)

Features

Quality, Emotion, and Machines

As research into the features of audio quality continues, the emphasis is increasingly on understanding the relationship with human emotions and how machines can be taught to do human-like analysis or synthesis. Separating the effects of audio content from those of its quality is a persistent challenge in this type of work.

  Download: PDF (401KB)

Call for Papers: Special Issue on Expanding Frontiers of Web Audio

Page: 895

Download: PDF (74KB)

Departments

Book Reviews

Page: 896

Download: PDF (260KB)

Sustaining Members

Page: 898

Download: PDF (141KB)

Conv&Conf

Page: 906

Download: PDF (12.4MB)

Extras

Table of Contents

Download: PDF (66KB)

Cover & Sustaining Members List

Download: PDF (36KB)

AES Officers, Committees, Offices & Journal Staff

Download: PDF (68KB)

AES - Audio Engineering Society