Journal of the AES

2023 December - Volume 71 Number 12

Editor's Note and 2023 Reviewers

Authors: Välimäki, Vesa

Page: 824

Download: PDF (77KB)

Review Papers

The State of the Art in Procedural Audio

Open Access



Procedural audio may be defined as real-time sound generation according to programmatic rules and live input. It is often considered a subset of sound synthesis and is especially applicable to nonlinear media, such as video games, virtual reality experiences, and interactive audiovisual installations. However, there is resistance to widespread adoption of procedural audio because there is little awareness of the state of the art, including the diversity of sounds that may be generated, the controllability of procedural audio models, and the quality of the sounds that it produces. The authors address all of these aspects in this review paper, while attempting a large-scale categorization of sounds that have been approached through procedural audio techniques. The role of recent advancements in neural audio synthesis, its current implementations, and potential future applications in the field are also discussed. Review materials are available*.
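The core idea of procedural audio — sound computed at run time from rules plus a control input, rather than played back from recordings — can be illustrated with a toy "wind" generator: white noise shaped by a one-pole low-pass filter whose coefficient follows a slowly varying gust envelope. All names and parameters here are illustrative, not from the paper.

```python
import numpy as np

def wind_sound(duration_s=1.0, sr=16000, gustiness=0.5, seed=0):
    """Toy procedural 'wind': filtered noise whose filter coefficient
    tracks a random gust envelope (the 'live input' in a real system)."""
    rng = np.random.default_rng(seed)
    n = int(duration_s * sr)
    noise = rng.standard_normal(n)
    # Slowly varying control envelope, interpolated to audio rate
    gust = np.interp(np.arange(n), np.linspace(0, n, 20),
                     rng.random(20)) * gustiness + 0.05
    out = np.zeros(n)
    y = 0.0
    for i in range(n):
        a = gust[i]                # cutoff follows the gust envelope
        y += a * (noise[i] - y)    # one-pole low-pass filter
        out[i] = y
    return out

samples = wind_sound()
```

In an interactive application, the gust envelope would be replaced by a live parameter (e.g., player speed), which is precisely what distinguishes procedural audio from sample playback.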

  Download: PDF (HIGH Res) (3.5MB)

  Download: PDF (LOW Res) (690KB)




The Effects of Individualized Binaural Room Transfer Functions for Personal Sound Zones

Open Access



The extent to which the performance of personal sound zone (PSZ) reproduction systems is impacted by the individualization of Binaural Room Transfer Functions (BRTFs) and the coupling between the listeners' BRTFs was investigated experimentally. Such knowledge can be valuable for deriving rules for the design of high-performance, robust PSZ systems. The performance of a PSZ system consisting of eight frontal mid-range loudspeakers was objectively evaluated with PSZ filters designed using individualized BRTFs of a human listener and generic ones measured from a mannequin head, in terms of Inter-Zone Isolation, Inter-Program Isolation, and robustness against slight head misalignments. It was found that when no misalignments were introduced, Inter-Zone Isolation and Inter-Program Isolation are improved by an average of around 4 dB at all frequencies between 200 and 7,000 Hz by the individualized filters, compared to the generic ones. With constrained head misalignments, the robustness of both filters decreases as the frequency increases, and although the individualized filters maintain higher performance, their robustness above 2 kHz is lower than that of the generic ones. The evaluation also reveals an inter-listener BRTF coupling effect and a detrimental impact on the performance for both listeners when a single listener's BRTF is mismatched.
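Isolation metrics of this kind are commonly computed as an energy ratio, in dB, between the target ("bright") zone and the quiet ("dark") zone. The sketch below shows that generic form; the paper's exact definitions of Inter-Zone and Inter-Program Isolation may differ in detail.

```python
import numpy as np

def inter_zone_isolation(p_bright, p_dark):
    """Energy ratio (dB) between sound-pressure signals in the target
    ('bright') and quiet ('dark') zones. Signal names are illustrative."""
    e_bright = np.mean(np.abs(p_bright) ** 2)
    e_dark = np.mean(np.abs(p_dark) ** 2)
    return 10.0 * np.log10(e_bright / e_dark)

# Dark-zone pressure 10x smaller in amplitude -> about 20 dB isolation
iso = inter_zone_isolation(np.ones(100), 0.1 * np.ones(100))
```

A 4 dB improvement in such a metric, as reported for the individualized filters, corresponds to roughly a factor of 2.5 in energy ratio.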

  Download: PDF (HIGH Res) (6.9MB)

  Download: PDF (LOW Res) (1.0MB)


Human and Machine Performance in Counting Sound Classes in Single-Channel Soundscapes

Open Access



Individual sounds are difficult to detect in complex soundscapes because they strongly overlap. This article explores the task of estimating sound polyphony, which is defined here as the number of audible sound classes. Sound polyphony measures the complexity of a soundscape and can be used to inform sound classification algorithms. First, a listening test is performed to assess the difficulty of the task. The results show that humans are only able to reliably count up to three simultaneous sound sources and that they underestimate the degree of polyphony for more complex soundscapes. Human performance depends mainly on the spectral characteristics of the sounds and, in particular, on the number of overlapping noise-like and transient sounds. In a second step, four deep neural network architectures, including an object detection approach for natural images, are compared to contrast human performance with machine learning-based approaches. The results show that machine listening systems can outperform human listeners for the task at hand. Based on these results, an implicit modeling of the sound polyphony based on the number of previously detected sound classes seems less promising than the explicit modeling strategy.
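The "implicit" strategy mentioned in the abstract — deriving polyphony from the output of a sound classifier rather than predicting it directly — can be sketched as thresholding per-class detection probabilities and counting the survivors. The function name and threshold are illustrative assumptions, not from the paper.

```python
import numpy as np

def implicit_polyphony(class_probs, threshold=0.5):
    """Implicit polyphony estimate: count the sound classes whose
    detection probability exceeds a threshold (illustrative sketch)."""
    return int(np.sum(np.asarray(class_probs) > threshold))

# Hypothetical detector output for five sound classes
count = implicit_polyphony([0.9, 0.2, 0.7, 0.05, 0.6])  # -> 3
```

The explicit strategy favored by the paper would instead train a model to regress the polyphony count directly from the audio.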

  Download: PDF (HIGH Res) (6.3MB)

  Download: PDF (LOW Res) (786KB)


Assessment of Recovery Journal-Based Packet Loss Concealment Techniques for Low-Latency MIDI Streaming


In networked music performances, real-time Packet Loss Concealment is a task of pivotal importance to compensate for the detrimental impact of the loss or late delivery of audio portions that often occurs in low-latency audio-streaming scenarios. This paper proposes an open-loop Packet Loss Concealment method tailored for MIDI data and compares it to a closed-loop state-of-the-art benchmark in terms of effectiveness of audio recovery and communication overhead. Moreover, implementations aimed at reducing the computational overhead are proposed and compared for both approaches. Results show that the proposed open-loop policy achieves performance similar to that of the closed-loop one, while reducing the number of operations executed at the transmitter side.
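The recovery-journal idea behind such open-loop schemes (as standardized for RTP-MIDI in RFC 4695) is that every packet carries a compact summary of the current stream state — for example, the set of still-sounding notes — so a receiver can resynchronize after a gap without requesting retransmission. The classes below are a minimal illustrative sketch under that assumption, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Packet:
    seq: int
    events: list                               # e.g. ("on", 60), ("off", 60)
    journal: set = field(default_factory=set)  # pitches still sounding

class Transmitter:
    """Open-loop sender: each packet carries a journal of active notes,
    so no receiver feedback channel is needed."""
    def __init__(self):
        self.seq = 0
        self.active = set()
    def send(self, events):
        for kind, pitch in events:
            (self.active.add if kind == "on" else self.active.discard)(pitch)
        self.seq += 1
        return Packet(self.seq, list(events), set(self.active))

class Receiver:
    """On a sequence-number gap, conceal the loss by resyncing the set
    of active notes from the journal instead of waiting for a resend."""
    def __init__(self):
        self.expected = 1
        self.active = set()
    def receive(self, pkt):
        if pkt.seq != self.expected:        # loss detected
            self.active = set(pkt.journal)  # recover state from the journal
        else:
            for kind, pitch in pkt.events:
                (self.active.add if kind == "on" else self.active.discard)(pitch)
        self.expected = pkt.seq + 1

tx, rx = Transmitter(), Receiver()
p1 = tx.send([("on", 60)])
p2 = tx.send([("off", 60), ("on", 64)])  # suppose this packet is lost
p3 = tx.send([("on", 67)])
rx.receive(p1)
rx.receive(p3)  # gap: journal restores {64, 67} and clears the stale 60
```

Because the journal rides along with every packet, the transmitter never has to buffer or retransmit history on request, which is where the reduction in transmitter-side operations comes from.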

  Download: PDF (HIGH Res) (2.9MB)

  Download: PDF (LOW Res) (928KB)


Speech Intelligibility and Quality Evaluation of Automotive Microphones Using Different Test Metrics and Their Correlation


Speech intelligibility and speech quality (SI&SQ) of voice microphones used in automotive hands-free communication systems are affected not only by microphone acoustic characteristics but also by interactions between the microphone and the background acoustic field. Due to the complex acoustic environment inside a vehicle cabin, which constantly changes with driving modes, it is not a trivial task to choose proper acoustic characteristics of voice microphones for different vehicle and cabin designs. To establish a relationship between microphone characteristics and SI&SQ performance in automotive applications, a study is conducted using three common types of automotive voice microphones. Their performance is evaluated using both subjective and objective metrics described in ANSI standards S3.2-2009 and S3.5-1997 and ITU-T Recommendation P.862.2. It is found that the objective SI index (SII) and the subjective SI results correlate nearly linearly. Furthermore, when the original SII scores are weighted by the speech-to-noise ratio, the weighted SII (wSII) data also exhibit a linear correlation with the objectively calculated subjective mean opinion score. Because the SII/wSII calculation is significantly less complex than the SI or mean opinion score evaluation process, results from this study demonstrate that the SII/wSII may be conveniently used as a tool to guide automotive voice microphone designs and evaluations.
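A near-linear relationship between an objective index and subjective scores, as reported here, is typically quantified with a Pearson correlation coefficient. The paired values below are purely illustrative placeholders, not data from the study.

```python
import numpy as np

# Hypothetical paired scores: objective SII vs. subjective intelligibility
# (values are illustrative only, not measurements from the paper)
sii = np.array([0.35, 0.50, 0.62, 0.71, 0.84])
subjective = np.array([0.30, 0.48, 0.65, 0.74, 0.88])

# Pearson r close to 1 indicates a nearly linear relationship
r = np.corrcoef(sii, subjective)[0, 1]
```

A high r on real data is what justifies substituting the cheap SII/wSII computation for full subjective testing during microphone design iterations.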

  Download: PDF (HIGH Res) (7.0MB)

  Download: PDF (LOW Res) (1.1MB)


Standards and Information Documents

AES Standards Committee News

Page: 900

Download: PDF (69KB)


Call for Nominations

Page: 903

Download: PDF (36KB)


Page: 904

Download: PDF (365KB)

AES Bylaws

Page: 918

Download: PDF (78KB)

Financial Statement

Page: 922

Download: PDF (119KB)


Book Review

Page: 902

Download: PDF (96KB)


Page: 924

Download: PDF (20.5MB)


Table of Contents

Download: PDF (45KB)

Cover & Sustaining Members List

Download: PDF (34KB)

AES Officers, Committees, Offices & Journal Staff

Download: PDF (132KB)
