Journal of the AES

2024 March - Volume 72 Number 3


Papers

Diffusion-Based Audio Inpainting

Open Access

Audio inpainting aims to reconstruct missing segments in corrupted recordings. Most existing methods produce plausible reconstructions when the gap lengths are short, but struggle to reconstruct gaps larger than about 100 ms. This paper explores diffusion models, a recent class of deep learning models, for the task of audio inpainting. The proposed method uses an unconditionally trained generative model, which can be conditioned in a zero-shot fashion for audio inpainting, and is able to regenerate gaps of any size. An improved deep neural network architecture based on the constant-Q transform that allows the model to exploit pitch-equivariant symmetries in audio is also presented. The performance of the proposed algorithm is evaluated through objective and subjective metrics for the task of reconstructing short to mid-sized gaps, up to 300 ms. The results of a formal listening test indicate that, for short gaps in the range of 50 ms, the proposed method delivers performance comparable to the baselines. For wider gaps up to 300 ms long, our method outperforms the baselines and retains good or fair audio quality. The method presented in this paper can be applied to restoring sound recordings that suffer from severe local disturbances or dropouts.

Evaluation of Real-Time Aliasing Reduction Methods in Neural Networks for Nonlinear Audio Effects Modelling

Neural networks have seen increased popularity in recent years for nonlinear audio effects modelling. Such a task requires sampling and creates high frequency harmonics that can quickly surpass the Nyquist rate, creating aliasing in the baseband. In this work, we study the impact of processing audio with neural networks and the potential aliasing these highly nonlinear algorithms can incur or aggravate. Namely, we evaluate the performance of a number of anti-aliasing methods for use in real-time. Notably, one method of anti-aliasing capable of real-time performance was identified: forced sparsity through network pruning.
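
The aliasing mechanism described in this abstract can be illustrated with a short sketch. It is not taken from the paper: the function name `aliased_frequency`, the 44.1 kHz sample rate, and the 5 kHz fundamental are assumptions chosen only for illustration. The sketch folds each harmonic of a tone back into the baseband to show where it would land after sampling.

```python
def aliased_frequency(f_hz: float, fs_hz: float) -> float:
    """Return the baseband frequency at which a component at f_hz
    appears after sampling at fs_hz (folding around fs_hz / 2)."""
    f_mod = f_hz % fs_hz
    # Components in the upper half of the sampling band mirror downward.
    return fs_hz - f_mod if f_mod > fs_hz / 2 else f_mod


fs = 44100.0  # assumed sample rate in Hz
f0 = 5000.0   # assumed fundamental in Hz

# Harmonics 1..7 of f0 and the frequencies they fold to.
harmonics = [(k, k * f0, aliased_frequency(k * f0, fs)) for k in range(1, 8)]

for k, f, fa in harmonics:
    status = "aliased" if f > fs / 2 else "in band"
    print(f"harmonic {k}: {f:.0f} Hz -> {fa:.0f} Hz ({status})")
```

With these assumed values, harmonics 5 and above exceed fs/2 = 22.05 kHz and fold back to inharmonic positions below it, which is the baseband aliasing the abstract refers to.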

Properties of Nonlinear Distortions and Related Measures in Audio Amplifiers

The classical characterization of nonlinear distortions of electronic devices such as audio amplifiers involves the calculation of some indicators, such as Total Harmonic Distortion, Total Harmonic Distortion + Noise, and Intermodulation Distortion, obtained by measuring the additional spectral components generated by the device against conventional input signals. This paper will explore the relationships that link these components and therefore how they affect the calculation of indicators. In particular, it will be seen how the current measures, leaving out some components, are not representative of the overall entity of nonlinear distortions suffered by the signal. The topic will be developed using black-box-type models, untethered from the particular circuit type of the physical device. Thorough knowledge of spectral relationships can be a guide in tuning amplifiers; measurements, recalculated by integrating missing components, can be used both to more accurately frame the distorting effects of amplifiers and to enable more appropriate classification.
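
For reference, the conventional Total Harmonic Distortion indicator named in this abstract is commonly defined as the ratio of the RMS harmonic content to the fundamental. The notation is assumed here and is not taken from the paper: $V_1$ is the RMS amplitude of the fundamental, $V_n$ the RMS amplitude of the $n$-th harmonic, and $N$ the highest harmonic included in the measurement.

```latex
\mathrm{THD} = \frac{\sqrt{\sum_{n=2}^{N} V_n^{2}}}{V_1}
```

Only components at integer multiples of the test-tone frequency enter this sum; any other spectral components generated by the device do not contribute to the indicator.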

Evaluation of Active Occlusion Effect Cancellation in Earphones by Subjective, Real-Ear and Coupler Measurements

Users of earphones, hearing aids or other ear-worn devices frequently experience an unnatural or "boomy" sound of their own voice. This is caused by the occlusion effect, i.e., an amplification of body-conducted components at low frequencies and an attenuation of air-conducted high-frequency components of the voice. Although the classic method to reduce the occlusion effect is to partly open the ear canal, active control of the ear canal sound pressure to improve own-voice perception, referred to as Occlusion Effect Cancellation (OEC), is now provided in the transparency mode of many commercial active noise control earphones. In this work, the OEC functionality of four earphones has been evaluated by subjective ratings, probe tube measurements, and measurements in a prototype coupler that features a simulation of body- and air-conducted own-voice components. Results show substantial benefits of OEC that differ between devices and that the various effects of ear canal occlusion across the whole frequency range have to be compensated for satisfactory own-voice quality. Measurements in the prototype coupler approximate the occlusion effects in real ears and may be a useful complement to tedious and potentially unreliable real-ear measurements in human subjects.

Long-Term Fundamental Frequency Modeling Based on Wavelet Packet Transform for Voice Conversion

Prosody conversion is an important part of voice conversion, where fundamental frequency (F0), which carries important speaker individuality information (e.g., tone, intonation, etc.), is regarded as one of the key prosodic features in the excitation model for speech synthesis. In a conventional approach based on continuous wavelet transform for modeling F0, analysis is carried out on a frame level and is prone to losing high-frequency information in the process of decomposition and reconstruction. In order to address this problem, the paper shows a representation of long-term fundamental frequency based on Wavelet Packet Transform (WPT). Specifically, the long-term F0 is decomposed using WPT, and a joint vector is formed by combining the resulting average power spectrum. Furthermore, the method is applied in a voice conversion system. Voice conversion experiments are conducted on Chinese and English speech data to evaluate the performance of the proposed method. The results show that the proposed method is obviously better than the method based on wavelet transform in all conversion scenarios but performs a little worse than the method based on mean and variance in the same-gender conversion scenario.

Engineering Reports

A Database with Directivities of Musical Instruments

Open Access

This article presents a database of recordings and radiation patterns of individual notes for 41 modern and historical musical instruments, measured with a 32-channel spherical microphone array in anechoic conditions. In addition, directivities averaged in 1/3-octave bands have been calculated for each instrument, which are suitable for use in acoustic simulation and auralization. The data are provided in Spatially Oriented Format for Acoustics. The directivities were spatially upsampled using spherical spline interpolation and converted to OpenDAFF and Generic Loudspeaker Library formats for use in room acoustic and electro-acoustic simulation software. For this purpose, a method is presented for how these directivities can be referenced to a specific microphone position in order to achieve a physically correct auralization without coloration. The data are available under the CC BY-NC 4.0 license.

Standards and Information Documents

AES Standards Committee News

Page: 180

Departments

Conv&Conf

Page: 184

Extras

Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff

AES - Audio Engineering Society