AES Barcelona 2005
Poster Session Z5 - Analysis and Synthesis of Sound

Last Updated: 20050406, mei

Sunday, May 29, 15:00 — 16:30

Z5-1 The Intelligent Artificial Vocal Vibrato Effecter Using Pitch Detection and Delay-Line
Chulwoong Jeon, Peter F. Driessen, University of Victoria - Victoria, British Columbia, Canada
Karaoke is one of the largest commercial industries using audio and video in Asia today. One of the most popular features incorporated into the vocals produced by Karaoke singers has been Echo/Reverb. If vibrato is added to vocal signals in addition to Echo/Reverb, the result has the potential to make singers feel more comfortable, confident, and professional about their singing. In this paper we present a real-time Vocal Vibrato Effecter running on a Windows PC, which automatically adds a vibrato effect to the vocal input. The proposed technique exploits the vocal energy level and the temporal consistency of the pitch variation. The key novelty in this paper is the combination of a pitch detector and a pitch shifter. This effecter can be applied to consumer/commercial Karaoke systems to enhance a vocal signal.
Convention Paper 6407
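
As an illustration of the delay-line part of the Z5-1 abstract, the following is a minimal sketch of a generic vibrato effect built from a modulated delay line. It is not the authors' implementation: the function name `vibrato`, the parameters `rate_hz` and `depth_ms`, and the use of linear interpolation are all assumptions, and the pitch detection and energy-level logic that decides when to apply the effect is not shown.

```python
import math


def vibrato(samples, sample_rate, rate_hz=5.0, depth_ms=2.0):
    """Apply vibrato by reading the input through a sinusoidally varying delay.

    Hypothetical parameters: rate_hz is the vibrato rate, depth_ms is the
    amplitude of the delay modulation in milliseconds.
    """
    depth = depth_ms * 1e-3 * sample_rate  # modulation amplitude in samples
    out = []
    for n in range(len(samples)):
        lfo = math.sin(2.0 * math.pi * rate_hz * n / sample_rate)
        delay = depth * (1.0 + lfo)  # varies between 0 and 2 * depth
        pos = n - delay              # fractional read position in the input
        if pos < 0.0:
            out.append(0.0)          # delay line not yet filled
            continue
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        # Linear interpolation between the two neighboring input samples.
        out.append((1.0 - frac) * samples[i] + frac * nxt)
    return out
```

Because the delay changes over time, the read position moves faster or slower than real time, which raises or lowers the perceived pitch periodically. With `depth_ms=0.0` the function returns the input unchanged.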

Z5-2 Beat Tracking Toward Automatic Musical Accompaniment
Matthew E. P. Davies, Paul M. Brossier, Mark D. Plumbley, Queen Mary University of London - London, UK
In this paper we address the issue of causal rhythmic analysis, primarily toward predicting the locations of musical beats such that they are consistent with a musical audio input. This will be a key component required for a system capable of automatic accompaniment with a live musician. We implement our approach as part of a real-time audio library. While performance for this causal system is reduced in comparison to our previous noncausal system, it is still suitable for our intended purpose.
Convention Paper 6408

Z5-3 Exploiting the Semantics Linked to a Low-Level Music Descriptor: The Detrended Variance Fluctuation Exponent
Sebastian Streich, Perfecto Herrera, Universitat Pompeu Fabra - Barcelona, Spain
Detrended fluctuation analysis (DFA) was proposed by Peng et al. for use on biomedical data. It originates from fractal analysis and reveals correlations within data series across different time scales. Jennings et al. used a DFA-derived feature, the detrended variance fluctuation exponent, for musical genre classification, introducing the method to the music analysis field. In this paper we further exploit the relation of this low-level feature to semantic music descriptions. It was computed on 7750 tracks with manually annotated semantic labels like "energetic" or "melancholic." We found statistically strong associations between some of these labels and this feature, supporting the hypothesis that it can be linked to a musical attribute which might be described as "danceability."
Convention Paper 6409
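
To make the DFA idea mentioned in the Z5-3 abstract concrete, here is a minimal sketch of standard first-order detrended fluctuation analysis. It is a textbook-style formulation, not the detrended variance fluctuation exponent computed in the paper; the function names and the default window sizes are assumptions chosen for illustration.

```python
import math


def detrended_sq_error(segment):
    """Sum of squared residuals after removing a least-squares line."""
    n = len(segment)
    x_mean = (n - 1) / 2.0
    y_mean = sum(segment) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(segment))
    slope = sxy / sxx
    return sum((y - y_mean - slope * (i - x_mean)) ** 2
               for i, y in enumerate(segment))


def dfa_exponent(series, window_sizes=(8, 16, 32, 64, 128)):
    """Estimate the scaling exponent of the fluctuation F(n) versus n."""
    # Step 1: integrate the mean-removed series to obtain the profile.
    mean = sum(series) / len(series)
    profile, running = [], 0.0
    for value in series:
        running += value - mean
        profile.append(running)
    # Step 2: for each window size n, detrend non-overlapping windows
    # and record log(n) and log(F(n)), where F(n) is the RMS residual.
    points = []
    for n in window_sizes:
        count = len(profile) // n
        if n < 3 or count == 0:
            continue
        total = sum(detrended_sq_error(profile[k * n:(k + 1) * n])
                    for k in range(count))
        if total > 0:
            points.append((math.log(n), 0.5 * math.log(total / (count * n))))
    if len(points) < 2:
        raise ValueError("need at least two usable window sizes")
    # Step 3: the exponent is the slope of log F(n) against log n.
    x_mean = sum(x for x, _ in points) / len(points)
    y_mean = sum(y for _, y in points) / len(points)
    num = sum((x - x_mean) * (y - y_mean) for x, y in points)
    den = sum((x - x_mean) ** 2 for x, _ in points)
    return num / den
```

For uncorrelated noise the estimate should come out near 0.5, and for a random walk near 1.5; values in between indicate long-range correlations across time scales.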

Z5-4 Transcribing Piano Recordings Using Signal Novelty
Xue Wen, Mark Sandler, Queen Mary, University of London - London, UK
This paper proposes a method for transcribing a piano note that overlaps previous notes. It is suggested that by referring to a short context before the note being transcribed, it is possible to improve the performance of a note transcriber by removing the contribution of previous notes. This removal can be performed either explicitly to produce a novelty signal, or implicitly inside the note transcriber, with the latter leaving a space for further optimization. Experiments show that the method dramatically improves the performance of a simple transcriber.
Convention Paper 6410

Z5-5 BillaBoop: Real-Time Voice-Driven Drum Generator
Amaury Hazan, Pompeu Fabra University - Barcelona, Spain
A real-time application for generating expressive drum rhythms controlled by voice is presented. By expressive drum rhythms we refer to a sequence of drum sounds that can be a subset of a sample bank or that can be generated by different drum synthesizers. The system consists of: (a) a descriptor generation component that computes a set of temporal and spectral features from each incoming frame, (b) a multiband onset detection component based on spectral variations of the incoming stream, (c) a machine learning component that assigns a label to each vocal hit in the input stream, for which both supervised and unsupervised approaches are considered, and (d) a beat generator that produces an output rhythmic stream, taking into account continuous expressive features of the vocal performance. This work can be seen as a preliminary step toward building a robust interface able to process a wide range of real-world signals. Indeed, several vocal onomatopoeias can correspond to the same drum label, depending on the playing style of a given performer. We therefore considered a wide range of oral percussive signals from different performers, with a view to building a model that can be used immediately, without prior learning steps. All these components are integrated into a low-latency application suitable for live performance.
Convention Paper 6411

Z5-6 Automatic Chord Identification Using a Quantized Chromagram
Christopher Harte, Mark Sandler, Queen Mary University of London - London, UK
This paper presents an approach to the problem of identifying musical chords from audio recordings. In our approach, a tuning algorithm is applied to a 36-bin chromagram to accurately locate the boundaries between semitones. This allows the calculation of a 12-bin semitone-quantized chromagram, which can then be compared with a set of predefined chord templates in order to generate a sequence of chord estimates. The performance of our method is evaluated by comparing the results with a test database of hand-labeled pieces, and the initial results are encouraging. The paper concludes with a discussion of some possible improvements to the algorithms presented.
Convention Paper 6412
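
The pipeline outlined in the Z5-6 abstract (a 36-bin chromagram folded to 12 semitone bins, then compared with chord templates) can be sketched roughly as follows. This is an illustrative simplification, not the authors' algorithm: the integer `offset` stands in for the result of the tuning step, the templates are plain binary major and minor triads, and the names `fold_to_semitones`, `chord_templates`, and `identify_chord` are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def fold_to_semitones(chroma36, offset=0):
    """Sum three adjacent bins per semitone to get a 12-bin chroma frame.

    offset (assumed, in bins) shifts the semitone boundaries, standing in
    for the output of a tuning algorithm.
    """
    if len(chroma36) != 36:
        raise ValueError("expected a 36-bin chroma frame")
    return [sum(chroma36[(3 * k + j + offset) % 36] for j in range(3))
            for k in range(12)]


def chord_templates():
    """Binary templates for the 12 major and 12 minor triads."""
    templates = {}
    for root in range(12):
        for suffix, intervals in (("", (0, 4, 7)), ("m", (0, 3, 7))):
            template = [0.0] * 12
            for interval in intervals:
                template[(root + interval) % 12] = 1.0
            templates[NOTE_NAMES[root] + suffix] = template
    return templates


def identify_chord(chroma12):
    """Return the name of the template with the largest inner product."""
    best_name, best_score = None, float("-inf")
    for name, template in chord_templates().items():
        score = sum(c * t for c, t in zip(chroma12, template))
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

Applying `identify_chord` to each folded frame in turn would give a sequence of chord estimates; a practical system would likely smooth that sequence over time.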

Z5-7 Musical Instrument Identification Using LSF and K-Means
Nicolas Chétry, Mike Davies, Mark Sandler, Queen Mary University of London - London, UK
In this paper we address the problem of automatically recognizing and identifying an instrument from a set of solo recordings. A system is described that uses Line Spectrum Frequencies (LSF) as its sole features, with their statistical properties learned by the k-means algorithm. During the training phase, models are built by determining an optimized code book of LSF vectors for each class of instruments. During the identification phase, one code book is similarly extracted from the unknown audio sample. A distortion measure between two code books, defined in the paper, is used to retrieve the identity of the presented excerpt. System performance is evaluated for the classification of isolated note recordings among 11 classes of instruments and compared to a GMM-based classifier on the one hand and to a minimum Mahalanobis distance classifier on the other.
Convention Paper 6413

Z5-8 Feature Extraction for Voice-Driven Synthesis
Jordi Janer, Universitat Pompeu Fabra - Barcelona, Spain
This paper explores the singing voice from an unusual perspective, not as a musical instrument but as a musical controller. A set of spectral processing algorithms extracts features from the input voice. These features are categorized into four groups: excitation, vocal tract, voice quality, and context. The extracted values are then transmitted as Open Sound Control (OSC) messages to be used in an external synthesis engine. In this paper we first provide a technical description of the algorithms and then detail the components of the system. A practical example of voice-driven synthesis using PureData (Pd) is also presented.
Convention Paper 6414

Z5-9 On the Usefulness of Differentiated Transient/Steady-State Processing in Machine Recognition of Musical Instruments
Slim Essid, Pierre Leveau, Ecole Nationale Supérieure des Télécommunications (ENST), and Laboratoire d'Acoustique Musicale - Paris, France; Gaël Richard, Ecole Nationale Supérieure des Télécommunications (ENST) - Paris, France; Laurent Daudet, Laboratoire d'Acoustique Musicale - Paris, France; Bertrand David, Ecole Nationale Supérieure des Télécommunications (ENST) - Paris, France
This paper addresses the usefulness of segmenting musical sounds into transient and nontransient parts for the task of machine recognition of musical instruments. We highlight the discriminative power of the attack-transient segments on the basis of objective criteria, consistent with well-known psychoacoustic findings. The sound database used is composed of real-world mono-instrument phrases. Moreover, we show that, paradoxically, it is not always optimal to use such a segmentation of the audio signal in a machine recognition system for a given decision window. Our evaluation exploits efficient automatic segmentation techniques, a wide variety of signal processing features, feature selection algorithms, and support vector machine classification.
Convention Paper 6415

Z5-10 A Method for Characterizing and Identifying Audio Based on Auditory Scene Analysis
Brett Crockett, Michael Smithers, Dolby Laboratories - San Francisco, CA, USA
A method for characterizing and identifying audio material using reduced-information audio characterizations based on auditory scene analysis is presented. In the method described, a single-channel or multichannel audio signal is analyzed and the location and duration of the individual auditory events are identified. The auditory events are used to create a reduced-information audio signature that can be used to determine whether one audio signal is derived from another audio signal. The audio signal comparison removes or minimizes the effect of temporal shift or delay on the audio signals, calculates a measure of similarity, and compares the measure of similarity against a threshold, providing a fast, highly accurate, and automatic method of signal identification.
Convention Paper 6416

©2005 Audio Engineering Society, Inc.