AES 123rd Convention - Where Audio Comes Alive

AES New York 2007
Paper Session P14

P14 - Signal Processing Applied To Music

Sunday, October 7, 9:00 am — 12:00 pm
Chair: John Strawn, S Systems - Larkspur, CA, USA

P14-1 Interactive Beat Tracking for Assisted Annotation of Percussive Music
Michael Evans, British Broadcasting Corporation - Tadworth, Surrey, UK
A practical, interactive beat-tracking algorithm for percussive music is described. Regularly spaced note onsets are determined by energy-based analysis, and users can then explore candidate beat periods and phases as the overall rhythm pattern develops throughout the track. This assisted approach can allow more flexible rhythmic analysis than purely automatic algorithms. An open-source software package based on the algorithm has been developed, along with several practical applications to allow more effective annotation, segmentation, and analysis of music.
Convention Paper 7247
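
The two stages named in the abstract above, energy-based onset detection followed by exploration of candidate beat periods and phases, can be illustrated with a minimal sketch. This is not the paper's algorithm or its open-source package: the function names, the energy-ratio threshold, and the tie-break toward longer periods are all assumptions made for illustration.

```python
def frame_energies(samples, frame_len):
    """Sum of squared samples in each non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def detect_onsets(energies, ratio=2.0, floor=1e-9):
    """Frame indices whose energy exceeds `ratio` times the previous frame.

    `ratio` and `floor` are illustrative values, not taken from the paper.
    """
    onsets = []
    for i in range(1, len(energies)):
        if energies[i] > floor and energies[i] > ratio * max(energies[i - 1], floor):
            onsets.append(i)
    return onsets


def score_beat_grid(onsets, period, phase, tolerance=0):
    """Count onsets within `tolerance` frames of the grid phase + k * period."""
    hits = 0
    for onset in onsets:
        offset = (onset - phase) % period
        if min(offset, period - offset) <= tolerance:
            hits += 1
    return hits


def rank_candidates(onsets, periods, tolerance=0):
    """Return (score, period, phase) tuples, best first.

    Ties go to the longer period (an assumed heuristic), so a grid is not
    preferred merely because it is denser than the onsets it explains.
    """
    candidates = [(score_beat_grid(onsets, p, ph, tolerance), p, ph)
                  for p in periods for ph in range(p)]
    return sorted(candidates, key=lambda c: (-c[0], -c[1], c[2]))
```

A ranked list like this is what an interactive tool could present to the user, who then picks or corrects the period and phase as the rhythm develops, rather than accepting the top candidate automatically.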

P14-2 Identification of Partials in Polyphonic Mixtures Based on Temporal Envelope Similarity
David Gunawan, D. Sen, The University of New South Wales - Sydney, NSW, Australia
In musical instrument sound source separation, the temporal envelopes of the partials are correlated due to the physical constraints of the instruments. With this assumption, separation algorithms then exploit the similarities between the partial envelopes in order to group partials into sources. In this paper we quantitatively investigate the partial envelope similarities of a large database of instrument samples and develop weighting functions in order to model the similarities. These model partials then provide a reference to identify similar partials of the same source. The partial identification algorithm is evaluated in the separation of polyphonic mixtures and is shown to successfully discriminate between partials from different sources.
Convention Paper 7248
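
The idea of grouping partials by the similarity of their temporal envelopes can be sketched as follows. The abstract does not specify its weighting functions or similarity measure, so this example swaps in a plain Pearson correlation against one reference envelope per source; the function names and the dictionary layout are hypothetical.

```python
import math


def envelope_correlation(a, b):
    """Pearson correlation between two amplitude envelopes.

    Used here as a stand-in similarity measure; returns 0.0 for a flat envelope.
    """
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def assign_partials(partials, references):
    """Map each partial name to the source whose reference envelope it matches best.

    partials:   dict of partial name -> envelope (list of floats)
    references: dict of source name  -> reference envelope (list of floats)
    """
    return {
        name: max(references,
                  key=lambda src: envelope_correlation(env, references[src]))
        for name, env in partials.items()
    }
```

Correlation ignores overall level, so a quiet upper partial and a loud fundamental with the same decay shape still match the same reference.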

P14-3 Structural Decomposition of Recorded Vocal Performances and its Application to Intelligent Audio Editing
György Fazekas, Mark Sandler, Queen Mary University of London - London, UK
In an intelligent editing environment, the semantic music structure can provide useful assistance during the postproduction process. In this paper we propose a new approach to extract both low- and high-level hierarchical structure from vocal tracks of multitrack master recordings. Contrary to most segmentation methods for polyphonic audio, we utilize extra information available when analyzing a single audio track. A sequence of symbols is derived using a hierarchical decomposition method involving onset detection, pitch tracking, and timbre modeling to capture phonetic similarity. Results show that the applied model captures the similarity of short voice segments well.
Convention Paper 7249

P14-4 Vibrato Experiments with Bassoon Sounds by Means of the Digital Pulse Forming Synthesis and Analysis Framework
Michael Oehler, Institute for Music and Drama - Hanover, Germany; Christoph Reuter, University of Cologne - Cologne, Germany
The perceived naturalness of real and synthesized bassoon vibrato sounds is investigated in a listening test. The stimuli were generated by means of a synthesis and analysis framework for wind instrument sounds that is currently under development and is based on the pulse forming theory. The framework allows amplitude and frequency parameters to be controlled at many different stages of the sound production process. An ANOVA and a Tukey HSD test showed that timbre modulation (a combined pulse width and cycle duration modulation) is an important factor for the perceived naturalness of bassoon vibrato sounds. The results may be useful for sound synthesis as well as in the field of timbre research.
Convention Paper 7250

P14-5 A High Level Musical Score Alignment Technique Based on Fuzzy Logic and DTW
Bruno Gagnon, Roch Lefebvre, Charles-Antoine Brunet, University of Sherbrooke - Sherbrooke, Quebec, Canada
This paper presents a method to align musical notes extracted from an audio signal with the notes of the musical score being played. Building on conventional alignment systems using Dynamic Time Warping (DTW), the proposed method uses fuzzy logic to create the similarity matrix used by DTW. Like a musician following a score, the fuzzy logic system uses high-level information as its inputs, such as note identity, note duration, and local rhythm. Using high-level information instead of frame-by-frame information substantially reduces the size of the DTW similarity matrix and thus significantly reduces the complexity of finding the best alignment path. Finally, the proposed method can automatically track where a musician starts and stops playing in a musical score.
Convention Paper 7251
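
Note-level DTW, as opposed to frame-level DTW, can be sketched in a few lines. The paper builds its similarity matrix with a fuzzy logic system; this example swaps in a simple hand-written cost over pitch and duration instead, so `note_cost`, its 0.5 weighting, and the `(midi_pitch, duration_in_beats)` note representation are assumptions made for illustration only.

```python
def note_cost(played, scored):
    """Dissimilarity of two notes given as (midi_pitch, duration_in_beats).

    A crude stand-in for the paper's fuzzy-logic similarity: 1.0 for a wrong
    pitch plus up to 0.5 for a relative duration mismatch.
    """
    pitch_cost = 0.0 if played[0] == scored[0] else 1.0
    dur_cost = min(1.0, abs(played[1] - scored[1]) / max(scored[1], 1e-9))
    return pitch_cost + 0.5 * dur_cost


def dtw_align(played, score):
    """Align played notes to score notes with classic DTW.

    Returns (total_cost, path), where path is a list of
    (played_index, score_index) pairs from first to last note.
    """
    n, m = len(played), len(score)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = note_cost(played[i - 1], score[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(acc[i - 1][j - 1], i - 1, j - 1),
                 (acc[i - 1][j], i - 1, j),
                 (acc[i][j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return acc[n][m], path
```

The accumulated-cost matrix has one cell per pair of notes, so a passage of a few hundred notes needs a matrix of a few hundred rows and columns, whereas frame-by-frame alignment of the same passage would need one row per analysis frame.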

P14-6 Audio Synthesis and Visualization with Flash CS3 and ActionScript 3.0
Jordan Kolasinski, New York University - New York, NY, USA
This paper explains the methods and techniques used to build a fully functional audio synthesizer and FFT-based audio visualizer within Flash CS3, the newest version of Flash. Audio synthesis and visualization were not possible in previous versions of Flash, but two new elements of ActionScript 3.0, the Byte Array and the Compute Spectrum function, make them possible even though neither is a built-in feature of Flash. Since Flash is present on 99 percent of the world’s computers, this opens many new opportunities for audio on the web.
Convention Paper 7252
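
The general pattern described in the abstract, writing synthesized samples into a raw byte buffer and then computing a spectrum for display, can be sketched in a language-neutral way. This is Python, not ActionScript, and it does not reproduce the Flash API or the paper's code: the function names are hypothetical, and a plain DFT is used in place of an FFT to keep the example short.

```python
import math
import struct


def synth_sine_bytes(freq_hz, sample_rate, n_samples):
    """Synthesize a sine wave and pack it as little-endian 32-bit floats."""
    buf = bytearray()
    for n in range(n_samples):
        sample = math.sin(2 * math.pi * freq_hz * n / sample_rate)
        buf += struct.pack("<f", sample)
    return bytes(buf)


def read_samples(raw):
    """Unpack little-endian 32-bit floats back into a list of samples."""
    return [value[0] for value in struct.iter_unpack("<f", raw)]


def magnitude_spectrum(samples):
    """Magnitudes of the first n/2 DFT bins (direct O(n^2) evaluation)."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(samples))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(samples))
        mags.append(math.hypot(re, im))
    return mags
```

A visualizer would draw one bar per entry of `magnitude_spectrum(read_samples(raw))`; bin k corresponds to a frequency of k times the sample rate divided by the number of samples.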

Last Updated: 20070828, mei