AES London 2011
Poster Session P21
P21 - Processing and Analysis
Monday, May 16, 14:00 — 15:30 (Room: Foyer)
P21-1 Automatic Classification of Musical Audio Signals Employing Machine Learning Approach—Pawel Zwan, Bozena Kostek, Adam Kupryjanow, Gdansk University of Technology - Gdansk, Poland
This paper presents a thorough analysis of automatic classification applied to musical audio signals. The classification is based on a chosen set of machine learning algorithms. A database of 60 music composers/performers was prepared for the purpose of the described research. For each of the musicians, 15 to 20 music pieces were collected. All the pieces were partitioned into 20 segments and then parameterized. The feature vector consisted of 171 parameters, including MPEG-7 low-level descriptors and mel-frequency cepstral coefficients (MFCC) complemented with time-related dedicated parameters. The task of the classifier was to recognize the composer/performer and to properly categorize a selected piece of music. The paper also presents and discusses the results of classification.
Convention Paper 8449
P21-2 Evaluation of Onset Detection Algorithms in Popular Polyphonic Music on a Large Scale Database—Stephan Hübler, Rüdiger Hoffmann, Technische Universität Dresden - Dresden, Germany
This paper introduces a large database of popular polyphonic music containing drums (10,238 onsets) for the evaluation of onset detection algorithms. The database has been manually annotated by expert listeners. The inter-rater variability gives insight into the variation among human annotators. Four common detection functions are investigated: spectral difference, high frequency content, phase deviation, and Klapuri's psychoacoustic function. We present an additional detection function based on the MPEG-7 audio spectrum envelope feature. An adaptive peak picker determines the onsets, which are then compared with the manual labels. Results show that detection functions based on spectral difference obtain observably better results. The study provides a thorough investigation of onset detection algorithms in popular polyphonic music.
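As a rough illustration of the pipeline described above (a detection function followed by an adaptive peak picker), the following sketch computes a half-wave rectified spectral difference and picks peaks against a local-median threshold. The frame length, hop size, threshold rule, and function names are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def spectral_difference(signal, frame_len=1024, hop=512):
    """Half-wave rectified spectral difference between consecutive frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    mags = np.array([
        np.abs(np.fft.rfft(window * signal[i * hop:i * hop + frame_len]))
        for i in range(n_frames)
    ])
    # Keep only increases in magnitude, which indicate energy arriving.
    rise = np.maximum(np.diff(mags, axis=0), 0.0)
    # Prepend a zero so the output has one value per frame.
    return np.concatenate(([0.0], np.sum(rise ** 2, axis=1)))

def pick_peaks(detection, half_width=3, delta=0.1):
    """Local maxima above (local median + delta * global maximum)."""
    offset = delta * detection.max()
    peaks = []
    for n in range(1, len(detection) - 1):
        lo = max(0, n - half_width)
        hi = min(len(detection), n + half_width + 1)
        threshold = np.median(detection[lo:hi]) + offset
        is_local_max = detection[n] >= detection[n - 1] and detection[n] > detection[n + 1]
        if detection[n] > threshold and is_local_max:
            peaks.append(n)
    return peaks

# Hypothetical test signal: silence, then a 440 Hz tone starting at sample 8192.
sample_rate = 44100
signal = np.zeros(16384)
signal[8192:] = np.sin(2 * np.pi * 440.0 * np.arange(8192) / sample_rate)

detection = spectral_difference(signal)
onset_frames = pick_peaks(detection)
print([frame * 512 for frame in onset_frames])  # onset positions in samples
```

Detected positions are quantized to the hop size, so they can only be compared with manual labels within a tolerance window.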
Convention Paper 8450
P21-3 Using the Viterbi Algorithm for Error Correction in an Autocorrelation-Based Pitch Detector—Bob Coover, Gracenote, Inc. - Emeryville, CA, USA
An autocorrelation-based method for detecting the fundamental pitch of an audio signal is presented in which the Viterbi algorithm is used in place of the error correction portion of the detector. The Viterbi algorithm is used to locate the most likely pitch path through the audio file. This method is compared to a typical error correction approach, based on heuristics and median filtering, that has historically been used in this type of algorithm. The Viterbi algorithm performs significantly better than the typical error correction methods at choosing the most plausible path through the pitch estimates.
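The idea of choosing a pitch path with the Viterbi algorithm can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each frame supplies a fixed number of pitch candidates with a strength score (for example, autocorrelation peak heights), and it uses a hypothetical transition cost proportional to the pitch jump in octaves.

```python
import numpy as np

def viterbi_pitch_path(candidates, scores, jump_penalty=2.0):
    """Return one pitch per frame, minimizing total cost over the whole path.

    candidates, scores: arrays of shape (n_frames, n_candidates).
    Cost = -score of each chosen candidate + jump_penalty * |octave jump|.
    """
    candidates = np.asarray(candidates, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n_frames, n_cands = candidates.shape
    log_f = np.log2(candidates)

    cost = np.zeros((n_frames, n_cands))
    back = np.zeros((n_frames, n_cands), dtype=int)
    cost[0] = -scores[0]
    for t in range(1, n_frames):
        # trans[i, j]: cost of moving from candidate i (frame t-1) to j (frame t)
        trans = jump_penalty * np.abs(log_f[t - 1][:, None] - log_f[t][None, :])
        total = cost[t - 1][:, None] + trans
        back[t] = np.argmin(total, axis=0)
        cost[t] = total[back[t], np.arange(n_cands)] - scores[t]

    # Trace the cheapest path backwards from the last frame.
    path = np.zeros(n_frames, dtype=int)
    path[-1] = np.argmin(cost[-1])
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return candidates[np.arange(n_frames), path]

# Hypothetical data: frame 2 slightly favors the octave above (440 Hz).
candidates = [[220.0, 440.0]] * 5
scores = [[1.0, 0.5], [1.0, 0.5], [0.6, 0.7], [1.0, 0.5], [1.0, 0.5]]
pitch_track = viterbi_pitch_path(candidates, scores)
print(pitch_track)
```

Picking the best-scoring candidate frame by frame would jump an octave at frame 2. The path cost penalizes that jump in both directions, so the smoother path is preferred.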
Convention Paper 8451
P21-4 Spectral Equalization for GHA-Applied Restoration to Damaged Historical 78 rpm Records—Teruo Muraoka, Takahiro Miura, Tohru Ifukube, The University of Tokyo - Tokyo, Japan
The authors have been engaged in research on In-harmonic Frequency Analysis “GHA,” which enables the separation of desired signal components from noise. Its primary purpose has been noise reduction. Recently, the authors succeeded in running GHA in a practical length of time and carried out many sound restorations of historical 78 rpm records. Thanks to GHA’s sufficient separation of the target signal components from the noise, the restored signal is noise-free; however, its tone quality is unnatural when it is reproduced using current audio equipment. This is due to the fact that the recorded sounds were tuned to match the audio equipment of that age, so spectral equalization is necessary. In practice, extreme frequency emphases are required, but these had been impossible because of scratch noise. GHA-applied restoration removed these difficulties, and an equalization curve was obtained by comparing the long-term spectrum of the restored music with that of the same music recorded by current musicians. In general the equalizations are very complicated, and they were carried out using a parametric equalizer.
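The comparison of long-term spectra mentioned above can be sketched as a per-frequency gain in decibels. This is only an illustration under stated assumptions: the frame length, the averaging method, and the function names are hypothetical, and the abstract does not say how the authors computed or smoothed their curves.

```python
import numpy as np

def long_term_spectrum(signal, frame_len=2048, hop=1024):
    """Average power spectrum over all Hann-windowed frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    power = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + frame_len] * window
        power += np.abs(np.fft.rfft(frame)) ** 2
    return power / n_frames

def equalization_curve_db(restored, reference, eps=1e-12):
    """Gain in dB per frequency bin that maps the restored spectrum onto the reference."""
    ratio = (long_term_spectrum(reference) + eps) / (long_term_spectrum(restored) + eps)
    return 10.0 * np.log10(ratio)

# Hypothetical check: a reference that is the restored signal at twice the amplitude.
rng = np.random.default_rng(0)
restored = rng.standard_normal(16384)
reference = 2.0 * restored
curve = equalization_curve_db(restored, reference)
print(curve.min(), curve.max())
```

A curve of this kind would still need smoothing and fitting before it could be set on a parametric equalizer.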
Convention Paper 8452
P21-5 Selection of Approximated Activation Functions in Neural Network-Based Sound Classifiers for Digital Hearing Aids—Lorena Álvarez, Cosme Llerena, Enrique Alexandre, Roberto Gil-Pita, Manuel Rosa-Zurera, University of Alcalá - Alcalá de Henares, Spain
The implementation of signal processing techniques on hearing aids is constrained by the limited number of instructions per second available on the digital signal processor on which the hearing aid is based. This limits the design of a neural network-based classifier embedded in the hearing aid. Aiming to help the processor achieve sufficiently accurate results while reducing the number of instructions per second, this paper explores the most adequate approximations for the activation function. The experimental work shows that the approximated neural network-based classifier achieves the same efficiency as that reached by the “exact” networks (without these approximations) but, crucially, with the added advantage of greatly reducing the computational cost on the digital signal processor.
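To illustrate what an approximated activation function can look like, the sketch below compares the logistic sigmoid with a piecewise-linear stand-in that needs only a multiply, an add, and a clip. The specific approximation is an assumption chosen for illustration; the abstract does not state which approximations the paper evaluates.

```python
import numpy as np

def sigmoid(x):
    """Exact logistic activation; requires an exponential."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_piecewise_linear(x):
    """Piecewise-linear approximation: a line through (0, 0.5), clipped to [0, 1]."""
    return np.clip(0.25 * x + 0.5, 0.0, 1.0)

x = np.linspace(-8.0, 8.0, 1601)
exact = sigmoid(x)
approx = sigmoid_piecewise_linear(x)

max_error = np.max(np.abs(exact - approx))
same_decision = np.array_equal(exact > 0.5, approx > 0.5)
print(max_error, same_decision)
```

The approximation is not numerically exact, but a classifier depends on which output is largest rather than on the precise activation values, which is one reason such shortcuts may leave the decisions unchanged.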
Convention Paper 8453
P21-6 Development of Multiband Dynamic Range Compressor Regarding Noise Characteristics—Hoon Heo, Mingu Lee, Seokjin Lee, Koeng-Mo Sung, Seoul National University - Seoul, Korea
It is hard to hear sounds from digital TVs or mobile phones in noisy environments because of the masking effect. This could be solved by simple amplification; however, special processing of the masking bands offers a further solution in some restricted situations. We propose an algorithm named “perceptual irrelevant component elimination” using a modified multiband dynamic range compressor, which does not increase the signal level and enhances the perceptual signal-to-noise ratio by about 1 dB for speech signals and about 3 dB for music signals.
Convention Paper 8454
P21-7 Designing Sets of N Doubly Complementary IIR Filters—Alexis Favrot, Christof Faller, Illusonic LLC - Lausanne, Switzerland
A filter design procedure is described for obtaining sets of N doubly complementary IIR filters for any N and any bandpass frequencies. The N doubly complementary IIR filters are built following a tree-like structure based on pairs of doubly complementary IIR filters and additional all-pass filters. The sum of all band signals is an all-pass filtered version of the original audio signal. The complementary IIR filters can be used instead of a full-rate analysis filterbank. The corresponding synthesis filterbank is simply the sum of all band signals. The proposed filters enable high-quality, delay-critical audio processing.
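The basic building block named above, a single pair of doubly complementary IIR filters, can be illustrated with its simplest textbook case: a lowpass and a highpass formed as half the sum and half the difference of a unit gain and a first-order all-pass. This sketch is not the paper's design procedure; the first-order all-pass, the coefficient formula, and the function names are assumptions chosen for a compact example.

```python
import numpy as np

def allpass_coefficient(cutoff):
    """Coefficient of a first-order all-pass whose phase is -90 degrees at cutoff (rad/sample)."""
    t = np.tan(cutoff / 2.0)
    return (t - 1.0) / (t + 1.0)

def complementary_pair(omega, cutoff):
    """Frequency responses of the lowpass/highpass pair at frequencies omega (rad/sample)."""
    a = allpass_coefficient(cutoff)
    z_inv = np.exp(-1j * omega)
    allpass = (a + z_inv) / (1.0 + a * z_inv)   # |allpass| = 1 at every frequency
    lowpass = 0.5 * (1.0 + allpass)
    highpass = 0.5 * (1.0 - allpass)
    return lowpass, highpass

omega = np.linspace(0.0, np.pi, 512)
cutoff = 0.25 * np.pi
lowpass, highpass = complementary_pair(omega, cutoff)

# All-pass complementary: the two bands sum to a response of unit magnitude.
sum_magnitude = np.abs(lowpass + highpass)
# Power complementary: the squared magnitudes sum to one.
power_sum = np.abs(lowpass) ** 2 + np.abs(highpass) ** 2
print(sum_magnitude.min(), sum_magnitude.max(), power_sum.min(), power_sum.max())
```

In this simplest case the band sum is the identity. With higher-order all-pass sections in both branches the sum becomes a genuine all-pass response, which matches the property the abstract states for the sum of all band signals.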
Convention Paper 8455