AES Warsaw 2015
Paper Session P18

P18 - (Lecture) Semantic Audio

Sunday, May 10, 14:00 — 17:00 (Room: Belweder)

Chair:
Pedro Duarte Pestana, Catholic University of Oporto - CITAR - Oporto, Portugal; Universidade Lusíada de Lisboa - Lisbon, Portugal

P18-1 Music Onset Detection Using a Bidirectional Mismatch Procedure Based on Smoothly Varying-Q Transform—Li Luo, University of Duisburg-Essen - Duisburg, Germany; Guido H. Bruck, University of Duisburg-Essen - Duisburg, Germany; Peter Jung, University of Duisburg-Essen - Duisburg, Germany
This paper describes a novel onset detector for music signals based on the smoothly varying-Q transform, in which the Q-factors vary as a linear function of the center frequencies. The smoothly varying Q-factors allow the time-frequency representation to coincide with the auditory critical-band scale. The time-frequency image generated by the smoothly varying-Q transform shows how the frequency content evolves and serves as the basis for analyzing the input signal. In the detection stage, a bidirectional mismatch procedure estimates the discrepancies in frequency partials between the currently processed frame and its neighboring frames in both directions. An onset strength signal is obtained by measuring the mismatch error between the neighboring frames. The proposed algorithm is evaluated on a fully onset-annotated music database, and the results show that it achieves high detection accuracy and satisfactory results.
Convention Paper 9349
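
As a rough illustration of the two ideas in the P18-1 abstract, the sketch below models a Q-factor that grows linearly with center frequency and a toy bidirectional mismatch between neighboring frames. It is not the authors' algorithm: the function names, the slope and intercept parameters, and the rule that combines the backward and forward mismatch are all assumptions made for this example.

```python
def q_factor(center_hz, slope, intercept):
    """Q-factor as a linear function of center frequency.

    The slope and intercept are hypothetical; the abstract gives no values.
    """
    return slope * center_hz + intercept


def bandwidth_hz(center_hz, slope, intercept):
    """Bandwidth of one analysis band, from Q = center frequency / bandwidth."""
    return center_hz / q_factor(center_hz, slope, intercept)


def onset_strength(frames):
    """Toy bidirectional mismatch over a list of magnitude frames.

    backward: energy that appears between the previous and the current frame.
    forward:  energy that disappears between the current and the next frame.
    A sustained onset gives a large backward and a small forward mismatch.
    This combination rule is an assumption, not the published procedure.
    """
    strength = [0.0] * len(frames)
    for t in range(1, len(frames) - 1):
        prev, cur, nxt = frames[t - 1], frames[t], frames[t + 1]
        backward = sum(max(0.0, c - p) for p, c in zip(prev, cur))
        forward = sum(max(0.0, c - n) for c, n in zip(cur, nxt))
        strength[t] = max(0.0, backward - forward)
    return strength
```

With this rule, a note that starts and is sustained produces a peak at its first frame, while a single-frame click is suppressed because its energy vanishes again in the following frame.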

P18-2 A Real-Time System for Measuring Sound Goodness in Instrumental Sounds—Oriol Romani Picas, Universitat Pompeu Fabra - Barcelona, Spain; Hector Parra Rodriguez, Universitat Pompeu Fabra - Barcelona, Spain; Dara Dabiri, Universitat Pompeu Fabra - Barcelona, Spain; Hiroshi Tokuda, KORG Inc. - Tokyo, Japan; Wataru Hariya, KORG Inc. - Tokyo, Japan; Koji Oishi, KORG Inc. - Tokyo, Japan; Xavier Serra, Universitat Pompeu Fabra - Barcelona, Spain
This paper presents a system that complements the functionality of a tuner by evaluating the sound quality of a music performer in real time. It consists of a software tool that computes a score of how well single notes are played with respect to a collection of reference sounds. To develop such a tool, we first record a collection of single notes played by professional performers. The collection is then annotated by music teachers in terms of the performance quality of each individual sample. Several audio features are extracted from the recorded samples, and a machine learning method is used to find the features that best describe performance quality according to the musicians’ annotations. An evaluation is carried out to assess the correlation between the system’s predictions and the musicians’ criteria. Results show that the system can reasonably predict the musicians’ annotations of performance quality.
Convention Paper 9350
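
The P18-2 abstract evaluates the system by the correlation between its predictions and the musicians' judgments. The abstract does not say which correlation measure was used; the sketch below assumes the Pearson coefficient, and the function name and argument names are hypothetical.

```python
import math


def pearson(predicted, annotated):
    """Pearson correlation between predicted scores and human annotations."""
    if len(predicted) != len(annotated) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_p = sum(predicted) / len(predicted)
    mean_a = sum(annotated) / len(annotated)
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, annotated))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in annotated)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_a)
```

A value near 1 would mean the predicted scores rise and fall with the teachers' ratings; a value near 0 would mean they are unrelated.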

P18-3 Timbre Solfege: Development of Auditory Cues for the Identification of Spectral Characteristics of Sound—Teresa Rosciszewska, Fryderyk Chopin University of Music - Warsaw, Poland; Andrzej Miskiewicz, Fryderyk Chopin University of Music - Warsaw, Poland
This paper is concerned with listening exercises conducted during a technical ear training course called Timbre Solfege, taught to students of sound engineering at the Fryderyk Chopin University of Music in Warsaw. It discusses the auditory cues used to identify the characteristics of timbre produced by varying the frequency bandwidth of a sound and by boosting selected frequency bands with a spectrum equalizer. The students’ ability to identify those modifications of the spectrum envelope has been assessed in a variety of progress tests. The test results show that systematic training during the Timbre Solfege course considerably improves memory for timbre and develops the ability to associate the perceived characteristics of timbre with the spectral properties of sounds.
Convention Paper 9351

P18-4 Automatic Vocal Percussion Transcription Aimed at Mobile Music Production—Héctor A. Sánchez-Hevia, University of Alcala - Alcalá de Henares, Madrid, Spain; Cosme Llerena-Aguilar, Sr., University of Alcalá - Alcala de Henares (Madrid), Spain; Guillermo Ramos-Auñón, University of Alcalá - Alcalá de Henares, Madrid, Spain; Roberto Gil-Pita, University of Alcalá - Alcalá de Henares, Madrid, Spain
In this paper we present an automatic vocal percussion transcription system intended as an alternative to touchscreen input for drum and percussion programming. The objective of the system is to simplify the user's workflow by letting users create percussive tracks made up of different samples triggered by their own voice, without requiring any demanding skill, through a system tailored to their specific needs. The system consists of four stages, including event detection, feature extraction, and classification. We employ small user-generated databases to adapt to particular vocalizations while avoiding overfitting and keeping computational complexity as low as possible.
Convention Paper 9352
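
The stages named in the P18-4 abstract (event detection, feature extraction, classification) can be sketched as a very small pipeline. Everything specific here is an assumption for illustration: the energy threshold, the two features (zero-crossing rate and RMS), and the nearest-centroid classifier trained on a few user recordings are not taken from the paper.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def detect_events(frames, threshold):
    """Return indices of frames where energy first rises above the threshold."""
    events = []
    previous_active = False
    for i, frame in enumerate(frames):
        active = frame_energy(frame) > threshold
        if active and not previous_active:
            events.append(i)
        previous_active = active
    return events


def extract_features(frame):
    """Two simple features: zero-crossing rate and RMS level."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(frame) - 1)
    rms = math.sqrt(frame_energy(frame))
    return (zcr, rms)


def train_centroids(labelled_frames):
    """Average the feature vectors per label in a small user database."""
    sums = {}
    for label, frame in labelled_frames:
        zcr, rms = extract_features(frame)
        total_zcr, total_rms, count = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (total_zcr + zcr, total_rms + rms, count + 1)
    return {label: (z / n, r / n) for label, (z, r, n) in sums.items()}


def classify(frame, centroids):
    """Assign the label whose centroid is closest in feature space."""
    features = extract_features(frame)
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))
```

A nearest-centroid classifier is used here only because it has few parameters and is cheap to evaluate, which matches the abstract's stated concern with small user databases and low computational cost.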

P18-5 Training-Based Semantic Descriptors Modeling for Violin Quality Sound Characterization—Massimiliano Zanoni, Politecnico di Milano - Milan, Italy; Francesco Setragno, Politecnico di Milano - Milan, Italy; Fabio Antonacci, Politecnico di Milano - Milan, Italy; Augusto Sarti, Politecnico di Milano - Milan, Italy; György Fazekas, Queen Mary University of London - London, UK; Mark B. Sandler, Queen Mary University of London - London, UK
Violin makers and musicians describe the timbral qualities of violins using semantic terms drawn from natural language. In this study we use machine intelligence regression techniques and audio features to model, in a training-based fashion, a set of high-level (semantic) descriptors for the automatic annotation of musical instruments. The most relevant semantic descriptors are collected through interviews with violin makers. These descriptors are then correlated with objective features extracted from a set of violins from the historical and contemporary collections of the Museo del Violino and of the International School of Luthiery, both in Cremona. As sound description can vary throughout a performance, our approach also enables the modeling of time-varying (evolutive) semantic annotations.
Convention Paper 9353

P18-6 Audibility of Lossy Compressed Musical Instrument Tones—Agata Rogowska, Warsaw University of Technology - Warsaw, Poland
The aim of this study was to evaluate how the audibility of compression by three commonly used lossy codecs differs between instruments. Seven instrument tones were compressed using MP3-LAME, Vorbis, and Opus to determine how the detection of compressed sounds varies with bit rate, instrument, and compression format. Audibility of lossy compression was examined with six naïve subjects during 60 hours of listening. At a bit rate of 32 kbps the compressed signals were easily discriminable, with significant differences between subjects. As the bit rate increased, audibility decreased, and the compression became inaudible at 64–96 kbps. Discrimination varied significantly from instrument to instrument.
Convention Paper 9232


EXHIBITION HOURS May 7th 10:00 – 18:00 May 8th 09:00 – 18:00 May 9th 09:00 – 18:00

REGISTRATION DESK May 6th 15:00 – 18:00 May 7th 09:30 – 18:30 May 8th 08:30 – 18:30 May 9th 08:30 – 18:30 May 10th 08:30 – 16:30

TECHNICAL PROGRAM May 7th 10:00 – 18:00 May 8th 09:00 – 18:00 May 9th 09:00 – 18:00 May 10th 09:00 – 17:00

Audio Engineering Society
