The Journal of the Audio Engineering Society — the official publication of the AES — is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers.
The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; pre- and post-event reports on AES conventions and other society activities; news from AES sections around the world; Standards and Education Committee work; and membership news, new products, and newsworthy developments in the field of audio.
Authors: Albertini, Davide; Bernardini, Alberto; Sarti, Augusto
Affiliation: Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Politecnico di Milano, Milano, Italy
The Wave Digital Filter (WDF) formalism is becoming a popular approach for the digital emulation of audio circuits. Nonlinear WDFs, like other kinds of discrete-time nonlinear filters used in Virtual Analog modeling applications, are often affected by aliasing distortion. Recently formalized Antiderivative Antialiasing (ADAA) methods are capable of significant aliasing reduction even with low oversampling factors. This paper discusses different strategies to integrate pth-order ADAA methods into stateful WDFs with a single one-port or multiport nonlinearity while preserving the modularity property typical of traditional WDFs. The effectiveness of the proposed approach is verified by applying the discussed ADAA techniques to three nonlinear audio circuits containing diode-based nonlinearities and a bipolar junction transistor (BJT).
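As a rough illustration of the first-order case only, the sketch below applies ADAA to a memoryless tanh nonlinearity. It is not the WDF integration the paper discusses; the function names, the choice of tanh, and the `eps` threshold are assumptions made for this example.

```python
import math


def tanh_antiderivative(x):
    """Antiderivative of tanh, log(cosh(x)), written to avoid overflow for large |x|."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)


def adaa1_tanh(signal, eps=1e-6):
    """First-order ADAA for y = tanh(x): difference quotient of the antiderivative."""
    out = []
    x_prev = 0.0
    f_prev = tanh_antiderivative(x_prev)
    for x in signal:
        f_curr = tanh_antiderivative(x)
        dx = x - x_prev
        if abs(dx) > eps:
            y = (f_curr - f_prev) / dx
        else:
            # The quotient is ill-conditioned for tiny steps (assumed threshold eps),
            # so fall back to the nonlinearity evaluated at the midpoint.
            y = math.tanh(0.5 * (x + x_prev))
        out.append(y)
        x_prev, f_prev = x, f_curr
    return out


# Hypothetical test signal: a constant input settles to tanh of that constant.
output = adaa1_tanh([0.5, 0.5, 0.5, 0.5])
```

Each output sample is the average of the nonlinearity over the segment between two consecutive input samples, which is what attenuates the high-frequency content responsible for aliasing.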
Authors: Werner, Kurt James; Germain, Francois G.; Goldsmith, Cory S.
Affiliation: iZotope, Inc., Cambridge, Massachusetts
We propose time-varying Schroeder allpass filters and Gerzon allpass reverberators that remain energy preserving irrespective of arbitrary variation of their allpass gains or feedback matrices over time. We propose various ways of realizing the unitary matrix involved in the Schroeder structure, based on classic ladder and lattice filters and their generalizations. We show how to construct more elaborate structures, including nestings and cascades, giving various strategies for reducing their implementation cost. Extending these algorithms to the multi-input, multi-output case yields time-varying, energy-preserving generalizations of Gerzon’s reverberator, providing a link between Schroeder allpass filters and Schlecht’s recently proposed “Allpass Feedback Delay Networks.” Stability proofs are given for common uses of Schroeder allpass filters, such as inside of Feedback Delay Network reference structures. Finally, we give a substantial review of the properties of time-invariant Schroeder allpass filters.
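For orientation, here is a minimal sketch of the classic time-invariant Schroeder allpass filter, with a numerical check that its impulse response carries unit energy. It does not implement the time-varying, energy-preserving structures proposed in the paper; the delay length and gain are arbitrary example values.

```python
def schroeder_allpass(signal, delay, gain):
    """Time-invariant Schroeder allpass: y[n] = -g*x[n] + x[n-M] + g*y[n-M]."""
    x_buf = [0.0] * delay  # circular buffer of past inputs
    y_buf = [0.0] * delay  # circular buffer of past outputs
    out = []
    idx = 0
    for x in signal:
        # Read the samples written `delay` steps ago, then overwrite them.
        y = -gain * x + x_buf[idx] + gain * y_buf[idx]
        x_buf[idx] = x
        y_buf[idx] = y
        idx = (idx + 1) % delay
        out.append(y)
    return out


# Example values (assumptions): 7-sample delay, allpass gain 0.6.
impulse = [1.0] + [0.0] * 4999
response = schroeder_allpass(impulse, delay=7, gain=0.6)
energy = sum(v * v for v in response)  # should be very close to 1.0
```

The unit-energy check holds only for a fixed gain. Changing `gain` from sample to sample in this direct form is exactly the situation in which energy preservation can fail, which is the problem the abstract addresses.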
Authors: Das, Orchisama; Abel, Jonathan S.
Affiliation: Center for Computer Research in Music and Acoustics, Stanford University, USA
Delay Network reverberators are an efficient tool for synthesizing reverberation. We propose a novel architecture, called the Grouped Feedback Delay Network (GFDN) reverberator, with groups of delay lines sharing different target decay rates, and use it to simulate coupled room acoustics. Coupled spaces are common in apartments, concert halls, and churches where two or more volumes with different reverberation characteristics are linked via an aperture. The difference in reverberation times (T60s) of the coupled spaces leads to unique phenomena, such as multi-stage decay. Here the GFDN is used to simulate coupled spaces with groups of delay line filters representing the T60s of the coupled rooms. A parameterized, orthonormal mixing matrix is presented that provides control over the mixing times of the rooms and the amount of coupling between the rooms. As an example application, we measure a coupled bedroom and bathroom system separated by a door in an apartment and use the GFDN to synthesize the late field for different openings of the door separating the two rooms, thereby varying the coupling between the rooms.
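The two ingredients named above, per-group decay and an orthonormal coupling matrix, can be sketched as follows. This is a simplified illustration and not the authors' parameterization: it uses broadband gains instead of delay line filters, and it builds the coupling matrix as a Kronecker product of a 2x2 rotation with a 2x2 Hadamard matrix, which is one assumed way to keep the result orthonormal. All names and numeric values are hypothetical.

```python
import math


def decay_gain(delay_samples, t60_seconds, sample_rate):
    """Per-pass gain so that a delay line decays by 60 dB in t60_seconds."""
    return 10.0 ** (-3.0 * delay_samples / (t60_seconds * sample_rate))


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def coupled_mixing_matrix(theta):
    """Orthonormal 4x4 mixing matrix for two groups of two delay lines.

    theta = 0 leaves the groups decoupled; a larger theta moves more
    energy between the two groups on every pass through the matrix.
    """
    rotation = [[math.cos(theta), math.sin(theta)],
                [-math.sin(theta), math.cos(theta)]]
    s = 1.0 / math.sqrt(2.0)
    within_group = [[s, s], [s, -s]]  # orthonormal 2x2 Hadamard matrix
    return kron(rotation, within_group)


# Hypothetical rooms: group 1 with T60 = 0.4 s, group 2 with T60 = 1.5 s.
gains = [decay_gain(1009, 0.4, 48000), decay_gain(1301, 0.4, 48000),
         decay_gain(1103, 1.5, 48000), decay_gain(1499, 1.5, 48000)]
mixing = coupled_mixing_matrix(0.2)
```

Because the Kronecker product of two orthonormal matrices is orthonormal, the coupling angle can be changed without affecting the losslessness of the mixing stage; all decay then comes from the per-line gains.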
Authors: Meaux, Eric; Marchand, Sylvain
Affiliation: L3i, University of La Rochelle, France
The synthetic transaural audio rendering (STAR) method aims at canceling the cross-talk signals between two loudspeakers and the ears of the listener (in a transaural way), with acoustic paths not measured but computed by some model (thus synthetic). Our model is based on perceptual cues used by the human auditory system for sound localization. The aim is to give the listener the sense of the position of each source rather than reconstruct the corresponding acoustic wave or field. Although the method currently focuses on the azimuth dimension, extensions to elevation and distance are now being considered for full 3D sound, along with a discussion of the further work needed to improve overall quality and validate such extensions.
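The general idea of cross-talk cancellation with modeled rather than measured paths can be sketched at a single frequency as the inversion of a 2x2 matrix of speaker-to-ear transfer functions. The pure gain-and-delay path model and all numeric values below are assumptions chosen for illustration; they are not the perceptual model used by STAR.

```python
import cmath
import math


def path(gain, delay_seconds, freq_hz):
    """Hypothetical acoustic path: a frequency-independent gain plus a pure delay."""
    return gain * cmath.exp(-2j * math.pi * freq_hz * delay_seconds)


def crosstalk_canceller(h):
    """Invert the 2x2 matrix h[ear][speaker] of complex transfer functions."""
    (a, b), (c, d) = h
    det = a * d - b * c
    return [[d / det, -b / det],
            [-c / det, a / det]]


# Assumed symmetric setup at 1 kHz: the cross path is weaker and 0.25 ms later.
freq = 1000.0
direct = path(1.0, 0.0, freq)
cross = path(0.7, 0.00025, freq)
h = [[direct, cross], [cross, direct]]
c = crosstalk_canceller(h)
```

Filtering the two binaural signals with `c` before playback makes the cascade of canceller and acoustic paths equal to the identity at this frequency, so each ear receives only its intended signal. A practical system would repeat this per frequency bin and regularize the inversion where the determinant becomes small.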
Authors: Najnudel, Judy; Hélie, Thomas; Roze, David; Müller, Rémy
Affiliation: S3AM team, STMS laboratory IRCAM - CNRS - SU Paris, France
This paper is concerned with the modeling of ferromagnetic coils with audio applications in mind. The proposed approach derives a macroscopic, energy-based formulation from statistical physics. This choice allows thermodynamic variables to be explicitly taken into account. As a consequence, macroscopic features such as saturation and hysteresis arise directly. As the proposed model is expressed through a port-Hamiltonian formulation, power balance and passivity are guaranteed. Moreover, the model may be straightforwardly connected to other multi-physical components and included in more complex systems. The proposed model is compared to measurements on a real ferromagnetic coil. Simulations of a passive band-pass filter and a transformer built around the model are presented as an illustration.
Authors: Wright, Alec; Välimäki, Vesa
Affiliation: Acoustics Lab, Dept. of Signal Processing and Acoustics, Aalto University, FI-02150 Espoo, Finland
This article further explores a previously proposed gray-box neural network approach to modeling LFO (low-frequency oscillator) modulated time-varying audio effects. The network takes both the unprocessed audio and the LFO signal as inputs, which allows the LFO to be freely controlled after model training. This paper introduces an improved process for accurately measuring the frequency response of a time-varying system over time, which is used to annotate the neural network training data with the LFO of the effect being modeled. Accuracy is improved by using a frequency-domain synthesized chirp signal and by using shorter and more closely spaced chirps. A digital flanger effect is used to test the accuracy of the method, and neural network models of two guitar effects pedals, a phaser and a flanger, were created. The improvement in the system measurement method is reflected in the accuracy of the resulting models, which significantly outperform previously reported results. When modeling a phaser and flanger pedal, error-to-signal ratios of 0.2% and 0.3% were achieved, respectively. Previous work suggests errors of this size are often inaudible. The model architecture can run in real time on a modern computer while using relatively little processing power.
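The error-to-signal ratio quoted above is commonly defined as the energy of the modeling error divided by the energy of the target signal. The sketch below assumes that plain definition, with no pre-emphasis filtering, which may differ from the exact variant used in the paper; the sample values are made up.

```python
def error_to_signal_ratio(target, prediction):
    """Energy of (target - prediction) divided by the energy of target."""
    if len(target) != len(prediction):
        raise ValueError("target and prediction must have the same length")
    error_energy = sum((t - p) ** 2 for t, p in zip(target, prediction))
    signal_energy = sum(t * t for t in target)
    return error_energy / signal_energy


# Hypothetical target (pedal output) and model output.
target = [1.0, -1.0, 1.0, -1.0]
prediction = [0.9, -1.0, 1.0, -1.0]
esr_percent = 100.0 * error_to_signal_ratio(target, prediction)  # roughly 0.25
```

A value of 0 means a perfect match, and a value of 1 (100%) is what a model that outputs silence would score.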
Author: Wells, Jeremy J.
Affiliation: Department of Music, University of York, UK
A modeling system for the impulse responses (IRs) of reverberators is presented. The overarching purpose of this system is to offer similar levels of control over captured IRs to that of algorithmic reverberators while retaining their acoustic plausibility and, where desired, realism. Specifically, an approach to estimating the parameters of the model is presented that offers a significant reduction in the computational requirements of the matrix decomposition method ESPRIT, while offering vastly better quality than is possible by using a single Fourier analysis. These methods are compared, first on large sets of short-duration synthetic signals and then on a wide range of typical IRs, some many seconds in duration. Finally, systems that employ the model described and the analysis method it uses are discussed.
Authors: Tan, Yiyu; Imamura, Toshiyuki; Kondo, Masaaki
Affiliation: RIKEN Center for Computational Science
Sound field rendering is computation-intensive and memory-intensive. This research investigates an FPGA-based accelerator for sound field rendering with an FDTD scheme, in which wave equations are directly implemented by reconfigurable hardware, and spatial blocking is applied to alleviate the memory bandwidth requirement. Compared to software simulation performed on a desktop machine with 128 GB of DDR4 RAM and an Intel i7-7820X processor running at 3.6 GHz, the proposed FPGA-based accelerator achieves up to 2.98 times higher computing performance across different layer sizes and different numbers of nodes computed in parallel, even though the FPGA system runs at about 267 MHz.
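To show the kind of update an FDTD scheme performs at every grid node, here is a minimal one-dimensional sketch of the standard second-order leapfrog update for the wave equation. The paper targets sound field rendering in hardware with spatial blocking; this example is a plain software loop in 1D without blocking, and the grid size, step count, Courant number, and initial pulse are all assumed values.

```python
import math


def fdtd_1d(num_nodes=101, num_steps=200, courant=0.5):
    """Leapfrog FDTD for the 1D wave equation with fixed (zero) boundaries."""
    center = num_nodes // 2
    # Initial pressure: a Gaussian pulse centered on the grid (assumed width).
    curr = [math.exp(-0.5 * ((i - center) / 3.0) ** 2) for i in range(num_nodes)]
    curr[0] = curr[-1] = 0.0
    prev = list(curr)  # identical previous step, i.e. zero initial velocity
    lam2 = courant * courant  # squared Courant number, must not exceed 1 in 1D
    for _ in range(num_steps):
        nxt = [0.0] * num_nodes  # end nodes stay at zero
        for i in range(1, num_nodes - 1):
            laplacian = curr[i + 1] - 2.0 * curr[i] + curr[i - 1]
            nxt[i] = 2.0 * curr[i] - prev[i] + lam2 * laplacian
        prev, curr = curr, nxt
    return curr


field = fdtd_1d()
```

Each node needs only its immediate neighbors and its own two previous values, which is why the scheme maps well onto parallel hardware and why memory bandwidth, rather than arithmetic, tends to be the bottleneck.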
Authors: Arend, Johannes M.; Garí, Sebastià V. Amengual; Schissler, Carl; Klein, Florian; Robinson, Philip W.
Affiliation: Institute of Communications Engineering, TH Köln - University of Applied Sciences, Cologne, D-50679, Germany; Audio Communication Group, Technical University of Berlin, Berlin, D-10587, Germany; Facebook Reality Labs Research, Redmond, WA 98052, USA; Facebook Reality Labs Research, Redmond, WA 98052, USA; Electronic Media Technology Lab, Technical University of Ilmenau, Ilmenau, D-98693, Germany; Facebook Reality Labs Research, Redmond, WA 98052, USA
Parametric spatial audio rendering is a popular approach for low computing capacity applications, such as augmented reality systems. However, most methods rely on spatial room impulse responses (SRIRs) for sound field rendering with 3 degrees of freedom (DoF), i.e., for arbitrary head orientations of the listener, and often require multiple SRIRs for 6-DoF rendering, i.e., when additionally considering listener translations. This paper presents a method for parametric spatial audio rendering with 6 DoF based on one monaural room impulse response (RIR). The scalable and perceptually motivated encoding results in a parametric description of the spatial sound field for any listener’s head orientation or position in space. These parameters form the basis for the binaural room impulse response (BRIR) synthesis algorithm presented in this paper. The physical evaluation revealed good performance, with differences from reference measurements at most tested positions in a room below the just-noticeable differences of various acoustic parameters. The paper further describes the implementation of a 6-DoF real-time virtual acoustic environment (VAE) using the synthesized BRIRs. A pilot study assessing the plausibility of the 6-DoF VAE showed that the system can provide a plausible binaural reproduction, but it also revealed challenges of 6-DoF rendering requiring further research.
Authors: Singh, Shubhr; Bromham, Gary; Sheng, Di; Fazekas, György
Affiliation: Centre for Digital Music (C4DM), Queen Mary University of London, London, UK
Music producers and casual users often seek to replicate the dynamic range compression used in a particular recording or production context for their own track. However, not knowing the parameter settings that were used to produce the audio can be an impediment, especially for beginners or untrained users who may lack critical listening skills. We address this issue by presenting an automatic compressor plugin relying on a neural network to extract relevant features from a reference signal and estimate compression parameters. The plugin automatically adjusts its parameters to match the input signal with a reference audio recording as closely as possible. Quantitative and qualitative usability evaluation of the plugin was conducted with amateur, pro-amateur, and professional music producers. The results established acceptance of the core idea behind the proposed control method across these user groups.
Authors: Rund, František; Vencovský, Václav; Semanský, Marek
Affiliation: Czech Technical University in Prague, Department of Radioelectronics, Czech Republic
This paper evaluates the ability of several algorithms to detect impulse distortions (clicks) in audio signals. The algorithms are evaluated against data from a listening test conducted using real audio signals provided by a vinyl manufacturer. Some of the signals contained clicks due to damage during the manufacturing process. The evaluation against the listening test results focuses on the ability of the click-detection algorithms to detect perceptible clicks. The results presented in this paper show that an algorithm that employs a hearing model detected audible clicks with a lower false detection rate than the other algorithms in the test and that the wavelet transform–based algorithm with a dynamic threshold outperformed the other algorithms.
Authors: Comunità, Marco; Stowell, Dan; Reiss, Joshua D.
Affiliation: Centre for Digital Music, Queen Mary University of London, UK
Despite the popularity of guitar effects, there is very little existing research on classification and parameter estimation of specific plugins or effect units from guitar recordings. In this paper, convolutional neural networks were used for classification and parameter estimation for 13 overdrive, distortion, and fuzz guitar effects. A novel dataset of processed electric guitar samples was assembled, with four sub-datasets consisting of monophonic or polyphonic samples and discrete or continuous settings values, for a total of about 250 hours of processed samples. Results were compared for networks trained and tested on the same or a different sub-dataset. We found that discrete datasets could lead to equally high performance as continuous ones while being easier to design, analyze, and modify. Classification accuracy was above 80%, with confusion matrices reflecting similarities in the effects' timbre and circuit design. With parameter values between 0.0 and 1.0, the mean absolute error is in most cases below 0.05, while the root mean square error is below 0.1 in all cases but one.
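The two error measures reported for parameter estimation can be computed as in the sketch below. The knob settings and estimates are made-up values on the same 0.0 to 1.0 scale, not data from the paper.

```python
import math


def mean_absolute_error(true_values, estimates):
    """Average absolute difference between true and estimated settings."""
    return sum(abs(t - e) for t, e in zip(true_values, estimates)) / len(true_values)


def root_mean_square_error(true_values, estimates):
    """Square root of the average squared difference."""
    squared = sum((t - e) ** 2 for t, e in zip(true_values, estimates))
    return math.sqrt(squared / len(true_values))


# Hypothetical knob settings and network estimates.
true_settings = [0.0, 0.5, 1.0, 0.25]
estimated = [0.05, 0.45, 0.9, 0.25]
mae = mean_absolute_error(true_settings, estimated)
rmse = root_mean_square_error(true_settings, estimated)
```

The root mean square error is never smaller than the mean absolute error and penalizes occasional large misses more heavily, which is consistent with the abstract quoting a looser bound for it.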
Workshops on audio archiving, presented at the Spring Show 2021, gave a taste of the broad range of tasks and challenges in the day-to-day life of those working in the archiving and restoration field. Despite the work undertaken to date, archives still hold a very large amount of undigitized material of varying quality, some of which is in a critical state. Relatively narrow windows of opportunity exist to save this valuable cultural heritage.