AES Dublin 2019
Paper Session P11
P11 - Machine Learning: Part 2
Thursday, March 21, 16:00 — 18:00 (Meeting Room 2)
Chair: Bezal Benny, University of Victoria - Victoria, Canada
P11-1 Audio Inpainting of Music by Means of Neural Networks—Andrés Marafioti, Austrian Academy of Sciences - Vienna, Austria; Nicki Holighaus, Austrian Academy of Sciences - Vienna, Austria; Piotr Majdak, Austrian Academy of Sciences - Vienna, Austria; Nathanaël Perraudin, Swiss Data Science Center - Switzerland
We studied the ability of deep neural networks (DNNs) to restore missing audio content based on its context, a process usually referred to as audio inpainting. We focused on gaps in the range of tens of milliseconds. The proposed DNN structure was trained separately on audio signals containing music and on signals containing musical instruments, with 64-ms gaps, and the signals were represented by time-frequency (TF) coefficients. For music, our DNN significantly outperformed the reference method based on linear predictive coding (LPC), demonstrating that the proposed DNN structure is generally well suited to inpainting complex audio signals such as music.
Convention Paper 10170
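As a rough illustration of the kind of LPC-based reference method the P11-1 abstract mentions, the sketch below fills a 64-ms gap by fitting a linear predictor to the preceding context and extrapolating sample by sample. It is not the authors' baseline: the least-squares fit, the 16 kHz sampling rate, the 440 Hz test tone, the predictor order, and the names `fit_lpc` and `extrapolate` are all assumptions made for this example.

```python
import numpy as np

def fit_lpc(context, order):
    """Least-squares linear predictor: x[n] ~ sum_k a[k] * x[n-1-k]."""
    # Each row holds the `order` samples before x[n], most recent first.
    rows = [context[n - order:n][::-1] for n in range(order, len(context))]
    targets = context[order:]
    a, *_ = np.linalg.lstsq(np.array(rows), targets, rcond=None)
    return a

def extrapolate(context, a, n_missing):
    """Predict n_missing samples after the context, one sample at a time."""
    order = len(a)
    buf = list(context[-order:])
    for _ in range(n_missing):
        recent = np.array(buf[-order:][::-1])  # most recent sample first
        buf.append(float(a @ recent))
    return np.array(buf[order:])

# Hypothetical setup: a 440 Hz tone sampled at 16 kHz (both values assumed).
fs = 16000
gap = round(0.064 * fs)          # a 64-ms gap is 1024 samples at 16 kHz
n_context = 4096
t = np.arange(n_context + gap) / fs
signal = np.sin(2 * np.pi * 440 * t)
context, truth = signal[:n_context], signal[n_context:]

a = fit_lpc(context, order=2)    # a pure sinusoid obeys a 2nd-order recursion
filled = extrapolate(context, a, gap)
max_error = np.max(np.abs(filled - truth))
```

A stationary tone is the easy case for linear prediction; real music changes within the gap, which is presumably where a learned model has room to do better.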
P11-2 A Literature Review of WaveNet: Theory, Application, and Optimization—Jonathan Boilard, Université de Sherbrooke - Sherbrooke, QC, Canada; Philippe Gournay, Université de Sherbrooke - Sherbrooke, QC, Canada; Roch Lefebvre, Université de Sherbrooke - Sherbrooke, QC, Canada
WaveNet is a deep convolutional artificial neural network. It is also an autoregressive and probabilistic generative model; it is therefore by nature perfectly suited to solving various complex problems in speech processing. It already achieves state-of-the-art performance in text-to-speech synthesis. It also constitutes a radically new and remarkably efficient tool to perform voice transformation, speech enhancement, and speech compression. This paper presents a comprehensive review of the literature on WaveNet since its introduction in 2016. It identifies and discusses references related to its theoretical foundation, its application scope, and the possible optimization of its subjective quality and computational efficiency.
Convention Paper 10171
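The description of WaveNet in the P11-2 abstract as "autoregressive and probabilistic" corresponds to the usual autoregressive factorization of the joint probability of a waveform, in which each sample is conditioned on all earlier samples. The symbols below ($x_t$ for the sample at time step $t$, $T$ for the number of samples) are notation chosen for this note:

```latex
p(\mathbf{x}) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```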
P11-3 Sparse Autoencoder Based Multiple Audio Objects Coding Method—Shuang Zhang, Peking University - Beijing, China; Xihong Wu, Peking University - Beijing, China; Tianshu Qu, Peking University - Beijing, China
Traditional multiple-audio-object codecs extract the parameters of each object in the frequency domain, which causes serious confusion between objects because their subbands overlap heavily. This paper works in a sparse domain instead of the frequency domain and reconstructs each audio object from the down-mixed signal using a binary mask, relying on the sparsity of each object. To overcome the high degree of subband overlap among different audio objects, a sparse autoencoder neural network is established. On this basis, a multiple-audio-object codec system is built. To evaluate the proposed system, objective and subjective evaluations were carried out, and the results show that it performs better than SAOC.
Convention Paper 10172
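The binary-mask idea in the P11-3 abstract can be sketched with a toy example: if two objects occupy different coefficients in some sparse domain, one bit per coefficient is enough to recover each object from the downmix. This is only an illustration under stated assumptions. It does not include the authors' sparse autoencoder, and the coefficient vectors, their supports, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_coeffs = 16

# Two hypothetical objects, each active on its own sparse-domain coefficients.
obj_a = np.zeros(n_coeffs)
obj_b = np.zeros(n_coeffs)
obj_a[[1, 4, 9]] = rng.normal(size=3)
obj_b[[2, 7, 12]] = rng.normal(size=3)

# The downmix is the only signal the decoder receives.
downmix = obj_a + obj_b

# Encoder side: one bit per coefficient marking which object dominates it.
mask_a = np.abs(obj_a) > np.abs(obj_b)
mask_b = ~mask_a

# Decoder side: apply each binary mask to the downmix.
rec_a = np.where(mask_a, downmix, 0.0)
rec_b = np.where(mask_b, downmix, 0.0)
```

With disjoint supports the reconstruction is exact. When two objects share a coefficient, the mask hands the whole coefficient to one of them, which is the kind of confusion between overlapping objects that the abstract attributes to frequency-domain processing.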
P11-4 Poster Introductions 6—N/A
The purpose of Poster Introductions at the end of certain paper sessions is to give the poster authors a chance to briefly outline what is in their paper and encourage people to come to their poster session and ask questions.
• jReporter: A Smart Voice-Recording Mobile Application—Lazaros Vrysis; Nikolaos Vryzas; Efstathios Sidiropoulos; Evangelia Avraam; Charalampos Dimoulas
• Two-Channel Sine Sweep Stimuli: A Case Study Evaluating 2-n Channel Upmixers—Laurence J. Hobden; Christopher Gribben
• A Rendering Method for Diffuse Sound—Akio Ando