
AES 156th Convention Heyser Lecture

 

 

AES 156th Convention
June 15-17, 2024

Xavier Serra will present the Richard C. Heyser Memorial Lecture during the 156th Convention:

 

From Audio Processing to Music Understanding – a Research Journey

My PhD research, carried out in the 1980s, focused on modeling complex sounds. Using spectral analysis and synthesis techniques, we developed a deterministic plus stochastic model able to obtain sonically and musically meaningful audio parameterizations. That research found practical applications in synthesizing and transforming a wide variety of sounds, including the human singing voice.

As a natural progression of that research, in the 1990s, it became interesting and relevant to analyze collections of sounds, thus aiming to describe and model the relationships between sound entities. To accomplish this, we incorporated machine learning methodologies to complement the signal processing approaches used until then. This research was the beginning of the Music Information Retrieval (MIR) field, within which the aim is to analyze and describe music collections.

In the 2000s, with the growth of the Web, scaling these analysis technologies gained importance. Our research group embarked on curating and leveraging large audio collections to conduct research in this direction and to develop efficient software tools supporting music search, retrieval, and recommendation systems, many of which gained relevance for the music industry.

As web-based music applications became globalized, it became clear that the existing research approaches and systems had important cultural biases. Thus, in the 2010s, we started to work on refining music description methodologies, integrating domain knowledge from diverse music traditions. This research led to the development of culture-specific audio signal processing and machine learning approaches to analyze music signals. These methodologies are of major relevance in the field of Computational Musicology, putting the emphasis on the music understanding perspective.

In recent years, the emergence of deep learning techniques and large AI models based on self-supervised approaches has reshaped the research landscape. Presently, we are working on the development of large AI models trained on huge amounts of diverse multimodal music data that can capture the complex relationships that make up music. From those models, we can then develop smaller task-specific models to support applications related to the creation, production, distribution, access, analysis, or enjoyment of music. The challenge here is how to drive our research from an ethical perspective, putting the musician at the center while supporting all the stakeholders of the music sector.

In this talk we will go through this long research journey, highlighting some of the most relevant developments and giving our view on past and current trends in this area of research.

 

Biography 

Xavier Serra is a Professor at the Universitat Pompeu Fabra in Barcelona, where he leads the Music Technology Group within the Department of Information and Communication Technologies. He earned his PhD in Computer Music from Stanford University in 1989, focusing on spectral processing of musical sounds, a foundational work in the field. His research spans computational analysis, description, and synthesis of sound and music signals, blending scientific and artistic disciplines. Dr. Serra is very active in the fields of Audio Signal Processing, Sound and Music Computing, Music Information Retrieval, and Computational Musicology at the local and international levels, serving on the editorial boards of several journals and conferences and lecturing on the current and future challenges of these fields. He received an Advanced Grant from the European Research Council for the CompMusic project, promoting multicultural approaches in music information research. Currently, he directs the UPF-BMAT Chair on AI and Music, dedicated to fostering Ethical AI initiatives that can empower the music sector.
 
