Last Updated: 20060424, mei
P6 - Posters: Analysis and Synthesis of Sound; Mobile Phone Audio; Automobile Audio
Saturday, May 20, 16:00 — 17:30
P6-1 Application of Segmentation and Thumbnailing to Music Browsing and Searching—Mark Levy, Mark Sandler, Queen Mary, University of London - London, UK
We present a method for segmenting musical audio into structural sections and a set of rules for choosing a representative “thumbnail” segment. We demonstrate how audio thumbnails are an effective and natural way of returning results in music search applications. We investigate the use of segment-based models for music similarity searching and recommendation. We report experimental results on the performance and efficiency of these approaches in the context of SoundBite, a demonstration music thumbnailing and search engine.
[Poster Presentation Associated with Paper Presentation P2-1]
Convention Paper 6642
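The abstract does not spell out the segmentation algorithm. As a rough, hedged illustration of how structural boundaries can be located in a sequence of per-frame feature vectors, the Python sketch below swaps in a different, well-known technique: checkerboard-kernel novelty on a cosine self-similarity matrix (after Foote). It is not the method of P6-1, and the feature vectors, half_width, and rel_threshold are hypothetical choices.

```python
import math
from typing import List, Sequence

# Illustrative sketch only: Foote-style novelty segmentation, NOT the P6-1 method.


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def novelty_curve(features: Sequence[Sequence[float]], half_width: int) -> List[float]:
    """Slide a checkerboard kernel along the diagonal of the self-similarity
    matrix; frames too close to either end keep a novelty of 0."""
    n = len(features)
    sim = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]
    nov = [0.0] * n
    for t in range(half_width, n - half_width + 1):
        total = 0.0
        for i in range(-half_width, half_width):
            for j in range(-half_width, half_width):
                # +1 where both frames lie on the same side of t, -1 across t
                sign = 1.0 if (i < 0) == (j < 0) else -1.0
                total += sign * sim[t + i][t + j]
        nov[t] = total
    return nov


def segment_boundaries(features: Sequence[Sequence[float]], half_width: int = 4,
                       rel_threshold: float = 0.5) -> List[int]:
    """Frame indices taken as section starts: local novelty peaks that reach
    rel_threshold times the global maximum (hypothetical peak-picking rule)."""
    nov = novelty_curve(features, half_width)
    peak = max(nov, default=0.0)
    if peak <= 0.0:
        return []
    return [t for t in range(1, len(nov) - 1)
            if nov[t] > nov[t - 1] and nov[t] >= nov[t + 1]
            and nov[t] >= rel_threshold * peak]
```

For a toy input of 20 identical frames followed by 20 frames of a different kind, segment_boundaries returns [20], the index of the first frame of the second section. Choosing which segment becomes the thumbnail is a separate rule set that this sketch does not attempt.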
P6-2 Multiple F0 Tracking in Solo Recordings of Monodic Instruments—Chunghsin Yeh, Axel Röbel, Xavier Rodet, IRCAM - Paris, France
This paper is concerned with F0 tracking in solo recordings of monodic instruments. Because of reverberation, the observed signal is effectively polyphonic, and single-F0 tracking techniques often give unsatisfactory results. The proposed method is based on multiple-F0 estimation and makes use of the a priori knowledge that the observed spectrum is generated by a single monodic instrument. The predominant F0 is tracked first, and the secondary F0 tracks are then established. The proposed method is tested on reverberant recordings and shows significant improvements compared to single-F0 estimators.
[Poster Presentation Associated with Paper Presentation P2-2]
Convention Paper 6643
P6-3 Harmonic Plus Noise Decomposition: Time-Frequency Reassignment Versus a Subspace-Based Method—Bertrand David, Valentin Emiya, Roland Badeau, Yves Grenier, Ecole Nationale Supérieure de Télécommunications - Paris Cedex, France
This paper deals with Harmonic + Noise decomposition, with the targeted application of extracting transient background noise embedded in a signal with strong harmonic content (speech, for instance). With this aim, a method based on the reassigned spectrum and a high-resolution subspace tracker are compared, both on simulated signals and in a more realistic setting. The reassignment relocalizes the time-frequency energy around a given pair (analysis time index, analysis frequency bin), while the high-resolution method benefits from a characterization of the signal in terms of a space spanned by the harmonic content and a space spanned by the stochastic content. Both methods are adaptive, and the estimates are updated from one sample to the next.
[Poster Presentation Associated with Paper Presentation P2-3]
Convention Paper 6644
P6-4 Signal Analysis Using the Complex Spectral Phase Evolution (CSPE) Method—Kevin Short, Ricardo Garcia, Chaoticom Technologies - Andover, MA, USA
The Complex Spectral Phase Evolution (CSPE) method is introduced as a tool to analyze and detect the presence of short-term stable sinusoidal components in an audio signal. The method provides super-resolution of frequencies by examining the evolution of the phase of the complex signal spectrum over time-shifted windows. It is shown that this analysis, when applied to a sinusoidal signal component, resolves the true signal frequency with orders of magnitude greater accuracy than the DFT. Further, this frequency estimate is independent of the frequency bin and can be obtained from “leakage” bins far from spectral peaks. The method is robust in the presence of noise or nearby signal components, and is a fundamental tool in the front-end processing for the KOZ compression technology.
[Poster Presentation Associated with Paper Presentation P2-4]
Convention Paper 6645
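The bin-independence claim rests on a one-line identity, which the short Python sketch below illustrates. For a single complex exponential x[m] = A e^{jωm}, the window shifted by one sample satisfies X1[k] = e^{jω} X0[k] for every bin k, so the phase of conj(X0[k]) X1[k] equals ω. The sketch assumes that idealized complex input and a rectangular window; it is not the authors' implementation, and for real-valued sinusoids the negative-frequency image makes the relation hold only approximately. The helper names are hypothetical.

```python
import cmath
import math
from typing import Sequence

# Illustrative sketch of the phase-evolution idea, assuming one complex exponential.


def dft_bin(x: Sequence[complex], k: int) -> complex:
    """Single DFT coefficient X[k] of x with a rectangular window."""
    n = len(x)
    return sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))


def cspe_frequency(signal: Sequence[complex], k: int, n: int, fs: float) -> float:
    """Estimate the frequency (Hz) of a single complex exponential from bin k,
    using two length-n windows offset by one sample."""
    x0 = signal[0:n]
    x1 = signal[1:n + 1]
    # X1[k] = exp(j*w) * X0[k] for every k, so the phase of conj(X0[k]) * X1[k]
    # is w in radians per sample, regardless of which bin k is used.
    product = dft_bin(x0, k).conjugate() * dft_bin(x1, k)
    omega = cmath.phase(product)  # in (-pi, pi], i.e. (-fs/2, fs/2] in Hz
    return omega * fs / (2 * math.pi)
```

With fs = 8000 Hz and a 1234.567 Hz tone in a 256-sample window, the estimates taken at the peak bin (40) and at a leakage bin far from the peak (100) both match the true frequency to within 1e-6 Hz, whereas the DFT bin spacing alone is 31.25 Hz.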
P6-5 Contextual Effects on Sound Quality Judgments: Listening Room and Automotive Environments—Kathryn Beresford, University of Surrey - Guildford, Surrey, UK; Natanya Ford, Harman Becker Automotive Systems - Bridgend, UK; Francis Rumsey, Slawomir Zielinski, University of Surrey - Guildford, Surrey, UK
This study was designed to assess the effect of the listening context on basic audio quality for stimuli with varied mid-range timbral degradations. An assessment of basic audio quality was carried out in two different listening environments: an ITU-R BS.1116 conformant listening room and a stationary vehicle. A group of untrained listeners graded basic audio quality using a novel single stimulus method. The listener population was divided into two subsets—one made evaluations in a listening room and the other in a vehicle. The single stimulus method was investigated as a possible subjective evaluation method for use in automotive environments.
[Poster Presentation Associated with Paper Presentation P2-7]
Convention Paper 6648
P6-6 A Hybrid Concealment Algorithm for Nonpredictive Wideband Audio Coders—Vilayphone Vilaysouk, Roch Lefebvre, Université de Sherbrooke - Sherbrooke, Quebec, Canada
This paper proposes a hybrid packet loss concealment (PLC) algorithm for memoryless encoders such as PCM. The concealment algorithm integrates two modes, one in the time domain and the other in the frequency domain. The mode is selected using the correctly received samples that precede an erased packet. This hybrid approach provides a concealment mechanism that adapts to the signal characteristics and is not restricted to pure speech signals. Subjective evaluations show that the proposed algorithm performs significantly better than single-mode concealment algorithms.
Convention Paper 6670
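The abstract does not state which features drive the mode decision or how each mode synthesizes the missing samples. The Python sketch below is therefore a hypothetical stand-in: a normalized-autocorrelation periodicity test on the samples received before the erasure selects a time-domain mode (pitch-period repetition) for strongly periodic input and otherwise defers to a frequency-domain mode, which is not implemented here. The function names, lag range, and 0.7 threshold are assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical mode selector and time-domain mode; NOT the P6-6 algorithm.


def normalized_autocorr(x: Sequence[float], lag: int, win: int) -> float:
    """Normalized correlation between the last `win` samples of x and the
    same-length span `lag` samples earlier."""
    a = x[-win:]
    b = x[-win - lag:-lag]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def select_mode(history: Sequence[float], min_lag: int = 20, max_lag: int = 200,
                threshold: float = 0.7) -> Tuple[str, int]:
    """Strongly periodic history -> ("time", best lag); else ("frequency", lag)."""
    if len(history) <= max_lag:
        raise ValueError("history must be longer than max_lag")
    win = len(history) - max_lag
    best_lag, best_r = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        r = normalized_autocorr(history, lag, win)
        if r > best_r:
            best_lag, best_r = lag, r
    return ("time" if best_r >= threshold else "frequency"), best_lag


def conceal_time_domain(history: Sequence[float], period: int, length: int) -> List[float]:
    """Fill `length` lost samples by cyclically repeating the last pitch period."""
    last = list(history[-period:])
    return [last[i % period] for i in range(length)]
```

For a steady tone with a 40-sample period, select_mode reports "time" with a lag that is a multiple of 40, and repeating that span continues the waveform across an 80-sample erasure; for white noise it falls back to "frequency". A practical PLC would also cross-fade at the packet edges and attenuate long repetitions.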
P6-7 Toward an Inverse Constant Q Transform—Derry FitzGerald, Matt Cranitch, Marcin T. Cychowski, Cork Institute of Technology - Bishopstown, Cork, Ireland
The Constant Q transform has found use in the analysis of musical signals due to its logarithmic frequency resolution. Unfortunately, a considerable drawback of the Constant Q transform is that it has no inverse transform. Here we show that it is possible to obtain a good-quality approximate inverse of the Constant Q transform, provided that the signal to be inverted has a sparse representation in the Discrete Fourier Transform domain. This inverse is obtained by using l0 and l1 minimization approaches to project the signal from the Constant Q domain back to the Discrete Fourier Transform domain. Once the signal has been projected back to the Discrete Fourier Transform domain, it can be recovered by performing an inverse Discrete Fourier Transform.
Convention Paper 6671
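For readers unfamiliar with the transform, the Python sketch below computes the geometrically spaced center frequencies and frequency-dependent window lengths of a Constant Q analysis, using the commonly cited definitions Q = 1/(2^{1/b} - 1), f_k = f_min 2^{k/b} and N_k = ceil(Q fs / f_k) for b bins per octave. It does not implement the l0/l1 inversion described in the paper, and the parameter values are arbitrary. It does show why the inversion needs extra assumptions: with 12 bins per octave at 44.1 kHz, 49 bins cover 55 Hz to 880 Hz, while the lowest bin's window alone spans more than 13,000 samples, so recovering a full DFT spectrum from so few coefficients is underdetermined unless that spectrum is sparse.

```python
import math
from typing import List, Tuple

# Illustrative sketch of Constant Q bin layout only; no inverse is computed here.


def constant_q_bins(f_min: float, f_max: float, bins_per_octave: int,
                    fs: float) -> Tuple[float, List[Tuple[float, int]]]:
    """Return Q and a (center frequency in Hz, window length in samples) pair per bin."""
    # Constant ratio of center frequency to bandwidth for b bins per octave
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    n_bins = int(math.floor(bins_per_octave * math.log2(f_max / f_min))) + 1
    bins = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)  # geometric spacing
        n_k = int(math.ceil(q * fs / f_k))          # longer windows for lower bins
        bins.append((f_k, n_k))
    return q, bins
```

Each octave up doubles the center frequency and roughly halves the window length, which is the logarithmic resolution the abstract refers to.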
P6-8 History and Design of Russian Electro-Musical Instrument “Theremin”—Yurii Vasilyev, Saint-Petersburg State University of Telecommunications - St. Petersburg, Russia
The Theremin, an electro-musical instrument developed by the Russian physicist L. S. Theremin, has come a long way in its evolution. It attracts steadily growing interest from audio engineers and performers. The Theremin is used both for performing musical compositions of different genres and for creating special effects in theatrical performances, multimedia, and the film industry. This paper analyzes the circuit design solutions of the last 80 years, based on both analog circuitry and digital microprocessor techniques, as well as realizations of the Theremin as real and virtual musical instruments. The advantages and disadvantages of the different circuit solutions are also analyzed, and the most interesting realizations of the virtual Theremin are presented.
Convention Paper 6672
P6-9 A Fast- and High-Convergence Method for ICA-Based Noise Reduction in Mobile Phone Speech Communication—Zhang Zhipeng, Etoh Minoru, NTT DoCoMo Labs - Yokosuka, Kanagawa, Japan
This paper proposes a noise reduction technique that applies a priori information to the estimation of the unmixing matrix in independent component analysis (ICA), offering fast and accurate convergence. We formulate the parameter estimation, stabilized by the a priori information, as a Bayesian maximum a posteriori (MAP) estimation problem, and show its robustness in mobile phone environments, where the position of the microphone relative to the mouth is almost constant. We use the mouth-to-microphone transfer function for one row of the unmixing matrix. Using these estimated parameters as initial values, the unmixing matrix can be updated with high efficiency in the MAP estimation framework. Experimental results confirm that the proposed method achieves high performance, especially under high-SNR noise conditions.
Convention Paper 6673
P6-10 A Comparison of Time-Domain Time-Scale Modification Algorithms—David Dorran, Dublin Institute of Technology - Dublin, Ireland; Robert Lawlor, National University of Ireland - Maynooth, Ireland; Eugene Coyle, Dublin Institute of Technology - Dublin, Ireland
Time-domain approaches to time-scale modification are popular due to their ability to produce high-quality results at relatively low computational cost. Within the category of time-domain implementations, quite a number of alternatives exist, each with its own computational requirements and associated output quality. This paper provides a computational and objective output-quality assessment of a number of popular time-domain time-scaling implementations, thus providing a means for developers to identify a suitable algorithm for their application of interest. In addition, the issues that should be considered in developing time-domain algorithms are outlined, purely in the context of a waveform editing procedure.
Convention Paper 6674
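As a baseline for the time-domain family compared in the paper, the Python sketch below implements plain overlap-add (OLA) time-scale modification: Hann-windowed frames are read from the input every hop_out/alpha samples and written to the output every hop_out samples. It is only an illustration and does not reproduce any of the specific algorithms assessed in P6-10; plain OLA ignores waveform alignment and therefore produces phasing artifacts on tonal material, which refinements such as SOLA and WSOLA address by searching for a best-matching frame position. The frame and hop sizes are arbitrary defaults.

```python
import math
from typing import List, Sequence

# Illustrative plain-OLA time stretch; no waveform-similarity search (unlike SOLA/WSOLA).


def hann(n: int) -> List[float]:
    """Periodic Hann window; copies hopped by n/2 sum to a constant."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def ola_time_stretch(x: Sequence[float], alpha: float, frame: int = 256,
                     hop_out: int = 128) -> List[float]:
    """Stretch x by factor alpha (>1 longer/slower, <1 shorter/faster)."""
    if len(x) < frame:
        raise ValueError("input shorter than one frame")
    hop_in = hop_out / alpha  # analysis hop in the input
    win = hann(frame)
    n_frames = int((len(x) - frame) / hop_in) + 1
    out_len = (n_frames - 1) * hop_out + frame
    y = [0.0] * out_len
    norm = [0.0] * out_len
    for m in range(n_frames):
        a = int(round(m * hop_in))  # analysis position in the input
        s = m * hop_out             # synthesis position in the output
        for i in range(frame):
            y[s + i] += x[a + i] * win[i]
            norm[s + i] += win[i]
    # Divide out the summed window so overlapping regions keep unit gain
    return [v / w if w > 1e-8 else 0.0 for v, w in zip(y, norm)]
```

Stretching 8000 samples with alpha = 2.0 yields about 15,700 output samples, and a constant input stays constant after the first sample, which confirms the gain normalization.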
P6-11 The Importance of the Nonharmonic Residual for Automatic Musical Instrument Recognition of Pitched Instruments—Arie Livshin, Xavier Rodet, IRCAM Centre Pompidou - Paris, France
In various papers dealing with automatic recognition of pitched musical instruments, the features used for classification are based solely on the fundamental frequencies and the harmonic series, ignoring the nonharmonic residual. In this paper we explore whether the recognition rate for pitched instruments decreases when the nonharmonic information present in the sound signal is removed.
Convention Paper 6675
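The abstract does not say how the nonharmonic information is removed. Purely as a hypothetical illustration of splitting a pitched sound into a harmonic part and a nonharmonic residual, the Python sketch below projects one frame onto sinusoids at integer multiples of a known fundamental and subtracts the result. It assumes the period (in samples) is known exactly and that the frame spans a whole number of periods, so that the harmonic sinusoids are exactly orthogonal over the frame; real instrument recordings rarely satisfy these assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical harmonic/residual split for one frame; NOT the P6-11 procedure.


def harmonic_residual_split(x: Sequence[float], period: int) -> Tuple[List[float], List[float]]:
    """Split a frame into its harmonic part (multiples of f0 = fs/period) and the
    nonharmonic residual. Assumes len(x) is a whole number of periods."""
    n = len(x)
    if n % period != 0:
        raise ValueError("frame length must be a multiple of the period")
    step = n // period  # DFT bin index of the fundamental
    harmonic = [0.0] * n
    k = step
    while k < n / 2:  # harmonics strictly below Nyquist
        cos_k = [math.cos(2 * math.pi * k * m / n) for m in range(n)]
        sin_k = [math.sin(2 * math.pi * k * m / n) for m in range(n)]
        # Least-squares amplitudes; exact here because the basis is orthogonal
        c = 2.0 / n * sum(v * b for v, b in zip(x, cos_k))
        s = 2.0 / n * sum(v * b for v, b in zip(x, sin_k))
        for m in range(n):
            harmonic[m] += c * cos_k[m] + s * sin_k[m]
        k += step
    residual = [a - b for a, b in zip(x, harmonic)]
    return harmonic, residual
```

Feature extraction on the harmonic part alone would then correspond to classifying with the nonharmonic information removed, which is the comparison the abstract describes.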
P6-12 A Fuzzy Rules-Based Speech/Music Discrimination Approach for Intelligent Audio Coding over the Internet—Jose Enrique Muñoz-Exposito, S. García Galán, N. Ruiz Reyes, P. Vera Candeas, F. Rivas Peña, Universidad de Jaen - Linares, Spain
Our paper presents a speech/music discrimination approach based on fuzzy rules for selecting the appropriate coder in an intelligent audio coding system. When the same coder is used for both speech and music, it is difficult to achieve good audio quality and low bit rates for both types of signals. We propose using a simple feature called Warped LPC-based Spectral Centroid (WLPC-SC) for speech/music discrimination. An expert system is proposed to select the appropriate audio coder for each audio frame. The main advantage of the proposed approach is its low computational cost in both the speech/music discrimination and coder selection stages, which makes it suitable for real-time applications such as Internet audio streaming.
Convention Paper 6676
P6-13 Analysis and Transsynthesis of Solo Erhu Recordings Using Adaptive Additive/Subtractive Synthesis—Yi-Song Siao, Wei-Lun Chang, Alvin Su, National Cheng-Kung University - Tainan, Taiwan
The erhu is the main bowed-string instrument in traditional Chinese music, much like the violin in Western music. It has two strings, and its top plate is made of snake skin. Numerous solo works have been written for the erhu. In this paper, erhu resynthesis/transsynthesis software is presented. We use frame-based methods to analyze the pitch and volume information of a solo erhu recording. The recording can then be resynthesized using the erhu timbre extracted from the original recording, other erhu timbres, or even other timbres such as violin and trumpet. Additive and subtractive synthesis methods are used to synthesize the overall sound. Because the expression and playing style of the original recording are preserved, the result is realistic and musical.
Convention Paper 6677
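The abstract describes additive synthesis driven by frame-based pitch and volume analysis. The Python sketch below shows the additive half in its simplest form, assuming the analysis has already produced one pitch value (Hz) and one volume value per frame, and that a timbre is summarized as a fixed list of relative harmonic amplitudes (harmonic_amps, a hypothetical representation). Substituting a different list, for example one measured from a violin, corresponds loosely to the transsynthesis step; the subtractive (noise) component and any time-varying spectral envelope are omitted.

```python
import math
from typing import List, Sequence

# Illustrative additive synthesizer; the timbre model (fixed harmonic amplitudes)
# is a hypothetical simplification, not the P6-13 software.


def additive_synth(f0_frames: Sequence[float], amp_frames: Sequence[float],
                   harmonic_amps: Sequence[float], fs: float, hop: int) -> List[float]:
    """Render a harmonic tone from frame-rate pitch and volume tracks.
    Parameters are linearly interpolated between frames; each harmonic's phase
    is accumulated sample by sample so pitch glides stay continuous."""
    if len(f0_frames) != len(amp_frames) or len(f0_frames) < 2:
        raise ValueError("need matching pitch/volume tracks with at least two frames")
    out: List[float] = []
    phases = [0.0] * len(harmonic_amps)
    for j in range(len(f0_frames) - 1):
        for i in range(hop):
            t = i / hop
            f0 = (1 - t) * f0_frames[j] + t * f0_frames[j + 1]
            amp = (1 - t) * amp_frames[j] + t * amp_frames[j + 1]
            sample = 0.0
            for h, weight in enumerate(harmonic_amps, start=1):
                freq = h * f0
                if freq < fs / 2:  # skip partials at or above Nyquist
                    sample += amp * weight * math.sin(phases[h - 1])
                phases[h - 1] += 2 * math.pi * freq / fs
            out.append(sample)
    return out
```

With a constant 440 Hz pitch, unit volume and a single harmonic, the output is a plain 440 Hz sine; each frame-to-frame interval contributes hop samples.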
P6-14 Application of Fisher Linear Discriminant Analysis to Speech/Music Classification—Enrique Alexandre, Manuel Rosa, Lucas Cuadra, Roberto Gil-Pita, Universidad de Alcalá - Alcalá de Henares, Madrid, Spain
This paper proposes the application of Fisher linear discriminants to the problem of speech/music classification. A Fisher linear discriminant separates two classes using the centroid (mean) of the training data for each class together with the within-class scatter. From this information a linear boundary is established, which is then used for classification. Results are given demonstrating the superior performance of this classifier compared with the well-known k-nearest neighbor algorithm. It is also shown that a very low probability of error can be obtained using only one feature extracted from the audio signal, which reduces the complexity of such systems enough to implement them in real time.
Convention Paper 6678
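The Python sketch below shows the textbook two-class Fisher discriminant on two-dimensional feature vectors: the projection direction is w = S_w^{-1}(m_1 - m_0), where m_0 and m_1 are the class means and S_w is the pooled within-class scatter, and a point is assigned by comparing its projection with the midpoint of the projected means. The midpoint threshold, the two-feature setup, and the toy data are assumptions; the paper's features and exact decision rule are not given in the abstract.

```python
from typing import Sequence, Tuple

# Textbook two-class Fisher LDA in 2-D; threshold rule and data are illustrative.
Point = Tuple[float, float]


def class_mean(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points: Sequence[Point], m: Point) -> Tuple[float, float, float]:
    """Entries (sxx, sxy, syy) of the sum over points of (p - m)(p - m)^T."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def project(w: Point, p: Point) -> float:
    return w[0] * p[0] + w[1] * p[1]


def fit_fisher(class0: Sequence[Point], class1: Sequence[Point]) -> Tuple[Point, float]:
    """Return the Fisher direction w = S_w^-1 (m1 - m0) and a midpoint threshold."""
    m0, m1 = class_mean(class0), class_mean(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    sxx, sxy, syy = a0 + a1, b0 + b1, c0 + c1  # pooled within-class scatter S_w
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Closed-form inverse of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]]
    w = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    threshold = 0.5 * (project(w, m0) + project(w, m1))
    return w, threshold


def classify(p: Point, w: Point, threshold: float) -> int:
    """1 if p projects past the threshold (the class-1 side), else 0."""
    return 1 if project(w, p) > threshold else 0
```

With a single feature, the projection is just a scaled copy of that feature and the rule reduces to one threshold comparison per frame, which is about as cheap as a classifier can be.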
(C) 2006, Audio Engineering Society, Inc.