Italian - November 25, 2020

The Intelligent Signal Processing and Multi-Media (ISPAMM) research team of the Department of Information Engineering, Electronics and Telecommunications of Sapienza University of Rome, together with the Italian Section of the Audio Engineering Society, organized a series of meetings on the theme: Artificial Intelligence for Sound Synthesis and Analysis.

Deep learning is increasingly important in research and technology for audio, acoustics, speech, and music. This session introduces its fundamental ideas: what it is, how to use it, and how to develop an audio signal processing system based on deep learning methods.
The talk starts with examples of off-the-shelf functions that leverage published and well-known deep learning models.
Then, the entire development cycle of a simple speech classification system is discussed as an example, from the audio dataset to the real-time implementation on a Raspberry Pi board.
Topics covered include: the use of pre-trained networks such as YAMNet or VGGish; the design and training of original deep learning models; feature extraction with conventional algorithms (MFCC, GTCC) and neural techniques (VGGish, Wavelet Scattering); the acceleration of network training and feature extraction with GPU cards; and the generation of C++ source code for embedded implementations.

