144th AES CONVENTION Paper Session P20: Audio Processing and Effects — Part 1

AES Milan 2018

P20 - Audio Processing and Effects – Part 1

Friday, May 25, 13:15 — 15:45 (Scala 4)

Chair: Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK

P20-1 Deep Neural Networks for Cross-Modal Estimations of Acoustic Reverberation Characteristics from Two-Dimensional Images
Homare Kon, Tokyo Institute of Technology - Meguro-ku, Tokyo, Japan; Hideki Koike, Tokyo Institute of Technology - Meguro-ku, Tokyo, Japan
In augmented reality (AR) applications, reproduction of acoustic reverberation is essential for creating an immersive audio experience. The audio component of an AR experience should simulate the acoustics of the environment that users are experiencing. Traditionally, sound engineers could program all the reverberation parameters in advance when the scene was known or the audience was in a fixed position. In AR applications, however, such conventional methods are difficult to apply because the parameters cannot all be programmed in advance. Considering that skilled acoustic engineers can estimate reverberation parameters from an image of a room, we trained a deep neural network (DNN) to estimate reverberation parameters from two-dimensional images. The results suggest that a DNN can estimate the acoustic reverberation parameters from a single image.
Convention Paper 9995

P20-2 Deep Learning for Timbre Modification and Transfer: An Evaluation Study
Leonardo Gabrielli, Università Politecnica delle Marche - Ancona, Italy; Carmine Emanuel Cella, IRCAM - Paris, France; Fabio Vesperini, Università Politecnica delle Marche - Ancona, Italy; Diego Droghini, Università Politecnica delle Marche - Ancona, Italy; Emanuele Principi, Università Politecnica delle Marche - Ancona, Italy; Stefano Squartini, Università Politecnica delle Marche - Ancona, Italy
In recent years, several hybridization techniques have been proposed to synthesize novel audio content that derives its properties from two audio sources. These algorithms, however, usually provide no feature learning, leaving the user, often intentionally, to explore parameters by trial and error. The introduction of machine learning algorithms in the music processing field calls for an investigation into how their properties, such as the ability to learn semantically meaningful features, can be exploited. In this first work we adopt a neural network autoencoder architecture and enhance it to exploit temporal dependencies. In our experiments the architecture was able to modify the original timbre so that it resembled what was learned during the training phase, while preserving the pitch envelope of the input.
Convention Paper 9996

P20-3 Feature Selection for Dynamic Range Compressor Parameter Estimation
Di Sheng, Queen Mary University of London - London, UK; György Fazekas, Queen Mary University of London - London, UK
Casual users of audio effects may lack practical experience or knowledge of their low-level signal processing parameters. An intelligent control tool that allows using sound examples to control effects would strongly benefit these users. In a previous work we proposed a control method for the dynamic range compressor (DRC) using a random forest regression model. It maps audio features extracted from a reference sound to DRC parameter values, such that the processed signal resembles the reference. The key to good performance in this system is the relevance and effectiveness of audio features. This paper focusses on a thorough exposition and assessment of the features, as well as the comparison of different strategies to find the optimal feature set for DRC parameter estimation, using automatic feature selection methods. This enables us to draw conclusions about which features are relevant to core DRC parameters. Our results show that conventional time and frequency domain features well known from the literature are sufficient to estimate the DRC's threshold and ratio parameters, while more specialized features are needed for attack and release time, which induce more subtle changes to the signal.
Convention Paper 9997
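For readers unfamiliar with the four DRC parameters named in the P20-3 abstract, the following is a minimal sketch of a generic feed-forward compressor. It is not the authors' implementation or their estimation method; the function name `compress`, the one-pole smoothing, and all numeric values are assumptions chosen for illustration. Threshold and ratio set the static gain curve, while attack and release only control how quickly the gain moves toward that curve.

```python
import math


def compress(samples, sample_rate, threshold_db, ratio, attack_s, release_s):
    """Generic feed-forward compressor (illustrative, not the paper's code)."""
    # One-pole smoothing coefficients derived from the time constants.
    attack_coeff = math.exp(-1.0 / (attack_s * sample_rate))
    release_coeff = math.exp(-1.0 / (release_s * sample_rate))
    smoothed_reduction_db = 0.0
    out = []
    for x in samples:
        # Static curve: level above threshold is scaled down by the ratio.
        level_db = 20.0 * math.log10(max(abs(x), 1e-9))
        over_db = max(level_db - threshold_db, 0.0)
        target_reduction_db = over_db * (1.0 - 1.0 / ratio)
        # Attack applies while reduction is increasing, release otherwise.
        if target_reduction_db > smoothed_reduction_db:
            coeff = attack_coeff
        else:
            coeff = release_coeff
        smoothed_reduction_db = (coeff * smoothed_reduction_db
                                 + (1.0 - coeff) * target_reduction_db)
        out.append(x * 10.0 ** (-smoothed_reduction_db / 20.0))
    return out


# Hypothetical settings: -20 dB threshold, 4:1 ratio, 1 ms attack, 100 ms release.
loud = compress([1.0] * 4800, 48000, -20.0, 4.0, 0.001, 0.1)
quiet = compress([0.05] * 100, 48000, -20.0, 4.0, 0.001, 0.1)
```

With these settings a full-scale input sits 20 dB above the threshold, so the gain reduction settles at 15 dB, whereas the quiet input stays below the threshold and passes unchanged. Changing the attack or release time alters only the transition, not the settled level.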

P20-4 Effect of Delay Equalization on Loudspeaker Responses
Aki Mäkivirta, Genelec Oy - Iisalmi, Finland; Juho Liski, Aalto University - Espoo, Finland; Vesa Välimäki, Aalto University - Espoo, Finland
The impulse response of a generalized two-way loudspeaker is modeled and delay equalized using digital filters. The dominant features of a loudspeaker are the low and high corner roll-off characteristics and the behavior at the crossover points. The proposed model also characterizes the main effects of the mass-compliance resonant system. The impulse response, its logarithm and spectrogram, and the magnitude and group delay responses are visualized and compared with those measured from a two-way loudspeaker. The model explains the typical group-delay variations and magnitude-response deviations from a flat response in the passband. The group-delay equalization of the loudspeaker is demonstrated using two different methods. The first method, the time-alignment of the tweeter and woofer elements using a bulk delay, is shown to cause ripple in the magnitude response. The second method, which flattens the group delay of the speaker model over the whole audio range, leads to pre-ringing in the impulse response.
Convention Paper 9998
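The P20-4 abstract notes that a bulk delay between tweeter and woofer affects the magnitude response. The toy sketch below illustrates only the general mechanism, namely that the summed magnitude of two crossover branches depends on their relative delay. It does not reproduce the paper's loudspeaker model: the first-order crossover, the function name `summed_magnitude_db`, and the numeric values are all assumptions.

```python
import cmath
import math


def summed_magnitude_db(freq_hz, crossover_hz, tweeter_delay_s):
    """Magnitude (dB) of woofer + delayed tweeter for a first-order crossover."""
    s = 1j * freq_hz / crossover_hz          # normalized j*omega/omega_c
    woofer = 1.0 / (1.0 + s)                 # first-order lowpass branch
    tweeter = s / (1.0 + s)                  # first-order highpass branch
    delay = cmath.exp(-2j * math.pi * freq_hz * tweeter_delay_s)
    total = woofer + tweeter * delay
    return 20.0 * math.log10(max(abs(total), 1e-12))  # floor avoids log(0)


CROSSOVER_HZ = 2000.0   # hypothetical crossover frequency
BULK_DELAY_S = 125e-6   # hypothetical delay: a quarter period at 2 kHz

for f in (500.0, 1000.0, 2000.0, 4000.0, 8000.0):
    no_delay = summed_magnitude_db(f, CROSSOVER_HZ, 0.0)
    delayed = summed_magnitude_db(f, CROSSOVER_HZ, BULK_DELAY_S)
    print(f"{f:7.0f} Hz  no delay {no_delay:+6.2f} dB  with delay {delayed:+6.2f} dB")
```

Without the delay, the two first-order branches sum to exactly unity at every frequency. With the quarter-period delay the sum is about 3 dB high at the crossover frequency and deviates by different amounts elsewhere, which is the kind of magnitude ripple a relative delay can introduce.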

P20-5 An Allpass Chirp for Constant Signal-to-Noise Ratio Impulse Response Measurement
Elliot K. Canfield-Dafilou, Center for Computer Research in Music and Acoustics (CCRMA), Stanford University - Stanford, CA, USA; Jonathan S. Abel, Stanford University - Stanford, CA, USA
A method for designing an allpass chirp for impulse response measurement that ensures a constant signal-to-noise ratio (SNR) in the measurement is presented. By using the background noise and measurement system's frequency responses, a measurement signal can be designed by specifying the group delay trajectory. This signal will have a small crest factor and will be optimally short such that the measured impulse response will have a desired and constant SNR.
Convention Paper 10014
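The P20-5 abstract describes designing a measurement signal by specifying its group delay trajectory. As a rough illustration of that idea, the sketch below builds a chirp with a flat magnitude spectrum whose phase is minus the running integral of a given group delay. How the paper derives the trajectory from the background noise and system responses is not reproduced here; the linear trajectory, the function name `allpass_chirp`, and the buffer length are assumptions.

```python
import cmath
import math


def allpass_chirp(n_fft, group_delay):
    """Build a real chirp with a flat magnitude spectrum from a group delay.

    n_fft must be even; group_delay[k] is the delay in samples at bin k,
    for k = 0 .. n_fft // 2.
    """
    half = n_fft // 2
    d_omega = 2.0 * math.pi / n_fft  # bin spacing in radians per sample

    # Phase is minus the running integral of group delay (trapezoidal rule).
    phase = [0.0]
    for k in range(1, half + 1):
        mean_delay = 0.5 * (group_delay[k - 1] + group_delay[k])
        phase.append(phase[-1] - mean_delay * d_omega)

    # Nudge the phase linearly so it is a multiple of pi at Nyquist. This
    # shifts the whole trajectory by at most half a sample and keeps the
    # time signal real with a flat magnitude spectrum.
    correction = math.pi * round(phase[half] / math.pi) - phase[half]
    phase = [p + correction * k / half for k, p in enumerate(phase)]

    # Unit-magnitude bins 0..Nyquist, then the conjugate mirror image.
    spectrum = [cmath.exp(1j * p) for p in phase]
    spectrum += [spectrum[n_fft - k].conjugate() for k in range(half + 1, n_fft)]

    # Direct inverse DFT (O(N^2), fine for a short illustration).
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_fft)
            for k in range(n_fft)).real / n_fft
        for n in range(n_fft)
    ]


# Hypothetical trajectory: delay rises linearly from 0 to 48 samples.
N_FFT = 64
trajectory = [48.0 * k / (N_FFT // 2) for k in range(N_FFT // 2 + 1)]
chirp = allpass_chirp(N_FFT, trajectory)
```

Because every bin has unit magnitude, the signal is allpass in the sense that only its phase, and therefore the time at which each frequency arrives, is shaped by the trajectory. A practical design would use a much longer buffer and an FFT instead of the direct sum.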
