AES Rome 2013
Poster Session P8

P8 - Audio Processing and Semantics

Sunday, May 5, 09:30 — 11:00 (Foyer)

P8-1 Combination of Growing and Pruning Algorithms for Multilayer Perceptrons for Speech/Music/Noise Classification in Digital Hearing Aids—Lorena Álvarez, University of Alcalá - Alcalá de Henares, Spain; Enrique Alexandre, University of Alcalá - Alcalá de Henares (Madrid), Spain; Cosme Llerena, University of Alcalá - Alcalá de Henares (Madrid), Spain; Roberto Gil-Pita, University of Alcalá - Alcalá de Henares, Spain; Manuel Rosa-Zurera, University of Alcalá - Alcalá de Henares, Spain
This paper explores the feasibility of combining growing and pruning algorithms so that the overall approach finds a smaller multilayer perceptron (MLP) that improves speech/music/noise classification performance in digital hearing aids while requiring fewer hidden neurons and, consequently, a lower computational cost. To this end, the paper focuses on the design of an approach that adds neurons to an initially small MLP until the stopping criterion for the growing stage is reached. The MLP size is then reduced by successively pruning the least significant hidden neurons while maintaining a continuously decreasing function. The results obtained with the proposed approach are compared with those obtained using the growing and pruning algorithms separately.
Convention Paper 8850

P8-2 Automatic Sample Recognition in Hip-Hop Music Based on Non-Negative Matrix Factorization—Jordan L. Whitney, University of Miami - Coral Gables, FL, USA; Colby N. Leider, University of Miami - Coral Gables, FL, USA
We present a method for automatic detection of samples in hip-hop music. A sample is defined as a short excerpt from a source audio corpus that may have been embedded into another audio mixture. A series of non-negative matrix factorizations is applied to spectrograms of hip-hop music and of the source material from a master corpus. The factorizations yield matrices of basis spectra and amplitude envelopes for the original and the mixed audio. Each window of the mixed audio is compared to the original audio clip by examining the extracted amplitude envelopes, and several image-similarity metrics are used to determine how closely the amplitude envelopes of the sample and the mixture match. Preliminary testing indicates that, unlike existing audio fingerprinting algorithms, the described algorithm can confirm instances of sampling in a hip-hop mixture that untrained listeners frequently fail to detect.
Convention Paper 8851
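The factorization step described in P8-2 can be illustrated with the classic Lee-Seung multiplicative updates for the Euclidean cost. This is a minimal pure-Python sketch, assuming a small magnitude spectrogram stored as a list of rows (frequency x time): the columns of `W` play the role of the basis spectra and the rows of `H` the amplitude envelopes. The abstract does not state which NMF cost, rank, or iteration count the authors use, so `rank`, `iters`, `seed`, and all helper names are illustrative assumptions.

```python
import random


def transpose(A):
    # Rows become columns.
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    # Plain-list matrix product A @ B.
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def frob_error(V, W, H):
    # Frobenius norm of the reconstruction error V - W H.
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (F x T) into W (F x rank) and H (rank x T).

    Lee-Seung multiplicative updates for the Euclidean cost. Assumption:
    rank, iters and seed are illustrative, not values from the paper.
    """
    rng = random.Random(seed)
    F, T = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(F)]
    H = [[rng.random() + eps for _ in range(T)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num_h, den_h = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][t] * num_h[k][t] / (den_h[k][t] + eps) for t in range(T)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num_w, den_w = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[f][k] * num_w[f][k] / (den_w[f][k] + eps) for k in range(rank)]
             for f in range(F)]
    return W, H  # columns of W: basis spectra; rows of H: amplitude envelopes


# Toy example: a 4-bin "spectrogram" built from two spectra and two envelopes.
V = matmul([[1.0, 0.0], [0.5, 0.2], [0.0, 1.0], [0.1, 0.8]],
           [[1.0, 0.0, 2.0, 0.0, 1.0], [0.0, 1.0, 0.0, 3.0, 1.0]])
W, H = nmf(V, rank=2, iters=300)
print(frob_error(V, W, H))
```

The rows of `H` obtained for the source clip and for each window of the mixture are what the abstract's image-similarity metrics would then compare; that comparison step is not sketched here.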

P8-3 Performance Optimization of GCC-PHAT for Delay and Polarity Correction under Real World Conditions—Nicholas Jillings, Queen Mary University of London - London, UK; Alice Clifford, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
When coherent audio streams are summed, delays can cause comb filtering and polarity inversion can cause cancellation. The Generalized Cross Correlation with Phase Transform (GCC-PHAT) is a popular method for detecting, and hence correcting, the delay. This paper explores the performance of GCC-PHAT for delay and polarity correction under a variety of conditions and parameter settings, and offers optimizations for those conditions. In particular, we investigate performance with moving sources, background noise, and reverberation, and consider the effect of varying the Fourier transform size used when performing GCC-PHAT. In addition to accuracy, computational efficiency and latency are used as performance metrics.
Convention Paper 8852
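As a rough illustration of the GCC-PHAT principle in P8-3, the sketch below whitens the cross-spectrum so that only phase remains, transforms back, and reads the delay from the position of the largest-magnitude peak and the polarity from that peak's sign. It is a minimal pure-Python sketch under stated assumptions: a naive O(N^2) DFT stands in for an FFT, the signals are zero-padded to a power of two at least as long as both inputs combined, and the function names are hypothetical rather than taken from the paper.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    # Naive O(N^2) DFT standing in for an FFT; the inverse includes the 1/N factor.
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def gcc_phat(ref, sig, eps=1e-12):
    """Estimate (lag, polarity) such that sig ~ polarity * ref delayed by lag samples."""
    n = 1
    while n < len(ref) + len(sig):  # zero-pad to limit circular wrap-around
        n *= 2
    R = dft(list(ref) + [0.0] * (n - len(ref)))
    S = dft(list(sig) + [0.0] * (n - len(sig)))
    cross = [s * r.conjugate() for s, r in zip(S, R)]
    phat = [c / (abs(c) + eps) for c in cross]  # PHAT weighting: keep phase only
    cc = [v.real for v in dft(phat, inverse=True)]
    peak = max(range(n), key=lambda k: abs(cc[k]))  # largest magnitude, either sign
    lag = peak if peak <= n // 2 else peak - n      # circular index -> signed lag
    polarity = 1 if cc[peak] > 0 else -1            # negative peak = inverted polarity
    return lag, polarity


# Example: sig is ref delayed by 5 samples, attenuated and polarity-inverted.
rng = random.Random(1)
src = [rng.uniform(-1.0, 1.0) for _ in range(80)]
ref = src[10:74]
sig = [-0.5 * v for v in src[5:69]]
print(gcc_phat(ref, sig))
```

In practice the naive DFT would be replaced by an FFT; the FFT size is one of the parameters whose effect the paper studies.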

P8-4 Reducing Binary Masking Artifacts in Blind Audio Source Separation—Toby Stokes, University of Surrey - Guildford, Surrey, UK; Christopher Hummersone, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK
Binary masking is a common technique for separating target audio from an interferer, and its use is often justified by the high signal-to-noise ratio it achieves. However, the mask can introduce musical-noise artifacts, limiting its perceptual performance and that of techniques that use it. Three mask-processing techniques, involving adding noise or cepstral smoothing, are tested, and the processed masks are compared to the ideal binary mask using the Perceptual Evaluation methods for Audio Source Separation (PEASS) toolkit. The parameters of each processing technique are optimized before the comparison is made. Each technique is found to improve the overall perceptual score of the separation, and the results show a trade-off between interferer suppression and artifact reduction.
Convention Paper 8853
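For reference, the ideal binary mask that the processed masks in P8-4 are compared against is usually defined per time-frequency cell: keep the cell when the local target-to-interferer ratio exceeds a local criterion, discard it otherwise. The sketch below assumes magnitude spectrograms given as lists of rows and a 0 dB criterion by default; the function names and the `lc_db` parameter are illustrative assumptions, and the noise-adding and cepstral-smoothing steps studied in the paper are not shown.

```python
def ideal_binary_mask(target_mag, interf_mag, lc_db=0.0):
    """1 where the local target-to-interferer power ratio exceeds lc_db, else 0.

    Inputs are magnitude spectrograms as lists of rows (frequency x time).
    Assumption: the 0 dB default local criterion is a common choice, not the paper's.
    """
    thresh = 10.0 ** (lc_db / 10.0)
    # Compare t^2 > thresh * i^2 instead of dividing, so silent cells are safe.
    return [[1 if t * t > thresh * i * i else 0 for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(target_mag, interf_mag)]


def apply_mask(mix_mag, mask):
    # Zero out the time-frequency cells the mask rejects.
    return [[m * g for m, g in zip(m_row, g_row)] for m_row, g_row in zip(mix_mag, mask)]


target = [[1.0, 0.1], [0.5, 0.5]]
interferer = [[0.2, 1.0], [0.4, 0.6]]
mask = ideal_binary_mask(target, interferer)
print(mask)                                        # [[1, 0], [1, 0]]
print(apply_mask([[2.0, 3.0], [4.0, 5.0]], mask))  # [[2.0, 0.0], [4.0, 0.0]]
```

Isolated cells that switch on and off between neighbouring frames are a usual source of the musical-noise artifacts mentioned in the abstract, which is what processing the mask aims to reduce.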

P8-5 Detection of Sinusoids Using Statistical Goodness-of-Fit Test—Pushkar P. Patwardhan, Nokia India Pvt. Ltd. - Bangalore, India; Ravi R. Shenoy, Nokia India Pvt. Ltd. - Bangalore, India
Detection of tonal components from the magnitude spectrum is an important initial step in several speech and audio processing applications. In this paper we present an approach for detecting sinusoidal components in the magnitude spectrum using a “goodness-of-fit” test. The key idea is to test the null hypothesis that the region of the spectrum under observation is drawn from the magnitude spectrum of an ideal windowed sinusoid. This hypothesis is tested with a chi-square “goodness-of-fit” test, whose outcome is a decision about the presence of a sinusoid in the observed region of the magnitude spectrum. We evaluated the performance of the proposed approach using synthetically generated samples containing steady and modulated harmonics in clean and noisy conditions.
Convention Paper 8854
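The hypothesis test in P8-5 can be sketched as a Pearson chi-square statistic comparing a few magnitude bins around a spectral peak with the expected main-lobe shape of a Hann-windowed sinusoid. This is a minimal sketch with several assumptions not taken from the abstract: a Hann window, a large-N approximation of its spectrum, a three-bin region, a known fractional frequency offset `frac`, and a 5% significance level. It also treats magnitudes as if they were counts, so the statistic depends on how the spectrum is scaled; the function names are hypothetical.

```python
import math

# Upper 5% points of the chi-square distribution, indexed by degrees of freedom.
CHI2_CRIT_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def hann_kernel(delta):
    """Approximate spectral magnitude of a Hann-windowed sinusoid at `delta` bins
    from its true frequency (large-N approximation, peak normalised to 0.5)."""
    d = abs(delta)
    if d < 1e-9:
        return 0.5
    if abs(d - 1.0) < 1e-9:
        return 0.25  # limit of the expression below as d -> 1
    return abs(0.5 * math.sin(math.pi * d) / (math.pi * d * (1.0 - d * d)))


def chi_square_stat(observed, expected):
    # Pearson statistic; the template is rescaled to the observed total.
    scale = sum(observed) / sum(expected)
    return sum((o - scale * e) ** 2 / (scale * e) for o, e in zip(observed, expected))


def looks_sinusoidal(region, frac=0.0):
    """Goodness-of-fit test of a 3-bin region centred on a spectral peak.

    Assumption: `frac` (offset of the true frequency from the centre bin, in
    bins) is known. Returns (decision, statistic); True means the null
    hypothesis 'this is a windowed sinusoid' is not rejected at the 5% level.
    """
    template = [hann_kernel(k - frac) for k in (-1, 0, 1)]
    stat = chi_square_stat(region, template)
    return stat <= CHI2_CRIT_95[len(template) - 1], stat


print(looks_sinusoidal([25.0, 50.0, 25.0]))     # matches the on-bin Hann shape
print(looks_sinusoidal([100.0, 100.0, 100.0]))  # flat, noise-like region
```

Failing to reject the null hypothesis (statistic at or below the critical value) is read as "sinusoid present", which mirrors the decision rule described in the abstract.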

P8-6 Novel Designs for the Parametric Peaking EQ User Interface for Single Channel Corrective EQ Tasks—Christopher Dewey, University of Huddersfield - Huddersfield, UK; Jonathan Wakefield, University of Huddersfield - Huddersfield, West Yorkshire, UK
This paper evaluates the suitability of the parametric peaking EQ interfaces found on existing analog and digital mixing desks and audio plugins for single-channel corrective EQ tasks. It proposes several novel alternatives: displaying the FFT bin maximums over the full audio duration behind the EQ curve; automatically detecting and displaying the five largest FFT bin maximum peaks to assist the engineer; a numerical list display of those five peaks; and an interface that allows direct manipulation of the displayed FFT bin maximums. All interfaces were evaluated on the time taken to perform a corrective EQ task, preference ranking, and qualitative comments. Results indicate that the proposed interfaces show potential advantages over existing EQ interfaces.
Convention Paper 8855
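One reading of the "FFT bin maximums for the full audio duration" display in P8-6 is the maximum magnitude reached in each frequency bin over all short-time frames, from which the five largest local peaks are picked. The sketch below follows that reading; the frame length, hop size, Hann window, peak rule (a bin larger than its lower neighbour and no smaller than its upper one), and function names are assumptions, and a naive DFT stands in for an FFT to keep it self-contained.

```python
import cmath
import math


def magnitude_spectrum(frame):
    # Hann-windowed magnitude spectrum via a naive DFT (an FFT would be used in practice).
    n = len(frame)
    w = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
    return [abs(sum(frame[t] * w[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def bin_maxima(signal, frame_len=256, hop=128):
    """Largest magnitude each bin reaches over the whole signal (assumed framing)."""
    maxima = [0.0] * (frame_len // 2 + 1)
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = magnitude_spectrum(signal[start:start + frame_len])
        maxima = [max(m, s) for m, s in zip(maxima, spec)]
    return maxima


def top_peaks(maxima, count=5):
    """Bin indices of the `count` largest local maxima of the bin-maximum curve."""
    peaks = [k for k in range(1, len(maxima) - 1)
             if maxima[k] > maxima[k - 1] and maxima[k] >= maxima[k + 1]]
    return sorted(peaks, key=lambda k: maxima[k], reverse=True)[:count]


# Example: 64-sample frames; tones at bins 5 (throughout), 12 (first half), 20 (second half).
N = 64
signal = [math.sin(2 * math.pi * 5 * t / N)
          + (0.5 * math.sin(2 * math.pi * 12 * t / N) if t < 128 else 0.0)
          + (0.8 * math.sin(2 * math.pi * 20 * t / N) if t >= 128 else 0.0)
          for t in range(256)]
maxima = bin_maxima(signal, frame_len=N, hop=N // 2)
print(top_peaks(maxima))
```

Taking the maximum over time, rather than an average, keeps short-lived resonances such as the bin-20 tone visible even though they are absent from half of the signal.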

P8-7 Drum Replacement Using Wavelet Filtering—Robert Barański, AGH University of Science and Technology - Kraków, Poland; Szymon Piotrowski, AGH University of Science and Technology - Kraków, Poland; Magdalena Plewa, Gdansk University of Technology - Gdansk, Poland
The paper presents a solution that can be used to unify the snare drum sound within a chosen fragment. The algorithm is based on the wavelet transform and allows the replacement of those sub-bands of particular sounds that fall outside a certain range. Five experienced sound engineers tested the algorithm using samples of five different snare drums. Wavelet filtering appears to be useful for drum replacement, and the sound engineers' response was positive in most cases.
Convention Paper 8856
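The sub-band replacement idea in P8-7 can be sketched with a multi-level Haar wavelet transform: decompose a snare hit and a reference hit, replace any sub-band of the hit whose energy ratio to the reference falls outside a tolerance range, and resynthesize. The Haar wavelet, the energy-ratio criterion, the `lo`/`hi` bounds, and all function names are assumptions made for illustration; the abstract does not specify the wavelet or the range test the authors used.

```python
import math

S = 1.0 / math.sqrt(2.0)


def haar_dwt(x, levels):
    """Orthonormal multi-level Haar DWT: returns [detail_1, ..., detail_L, approx_L].
    len(x) must be divisible by 2**levels."""
    bands, a = [], list(x)
    for _ in range(levels):
        bands.append([(a[2 * i] - a[2 * i + 1]) * S for i in range(len(a) // 2)])
        a = [(a[2 * i] + a[2 * i + 1]) * S for i in range(len(a) // 2)]
    return bands + [a]


def haar_idwt(bands):
    # Invert haar_dwt, working from the coarsest level back to full resolution.
    a = list(bands[-1])
    for d in reversed(bands[:-1]):
        out = []
        for ai, di in zip(a, d):
            out += [(ai + di) * S, (ai - di) * S]
        a = out
    return a


def energy(band):
    return sum(c * c for c in band)


def unify_hit(hit, reference, levels=3, lo=0.5, hi=2.0):
    """Assumed rule: replace each sub-band of `hit` whose energy ratio to the matching
    sub-band of `reference` lies outside [lo, hi]. Returns (new_hit, replaced_indices)."""
    out, replaced = [], []
    for idx, (h, r) in enumerate(zip(haar_dwt(hit, levels), haar_dwt(reference, levels))):
        eh, er = energy(h), energy(r)
        ratio = eh / er if er > 0 else (1.0 if eh == 0 else math.inf)
        if lo <= ratio <= hi:
            out.append(h)
        else:
            out.append(list(r))  # swap in the reference hit's sub-band
            replaced.append(idx)
    return haar_idwt(out), replaced


# Example: a hit whose finest detail band is 3x too strong relative to the reference.
reference = [1.0, 3.0, 2.0, 5.0, 4.0, 4.0, 0.0, 1.0]
bands = haar_dwt(reference, 3)
bands[0] = [3.0 * c for c in bands[0]]
hit = haar_idwt(bands)
new_hit, replaced = unify_hit(hit, reference)
print(replaced)
```

Because the Haar transform is orthonormal and perfectly invertible, sub-bands that pass the range test are resynthesized unchanged; only the out-of-range bands are altered.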

P8-8 Collaborative Annotation Platform for Audio Semantics—Nikolaos Tsipas, Aristotle University of Thessaloniki - Thessaloniki, Greece; Charalampos A. Dimoulas, Aristotle University of Thessaloniki - Thessaloniki, Greece; George M. Kalliris, Aristotle University of Thessaloniki - Thessaloniki, Greece; George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
In most audio classification tasks that involve supervised machine learning, ground-truth samples are required as training inputs. Most researchers in this field annotate audio content by hand and for their individual requirements. This practice has resulted in a lack of solid datasets, and consequently research conducted by different researchers on the same topic cannot be effectively pooled and built upon. A collaborative audio annotation platform is proposed for both scientific and application-oriented audio-semantic tasks. Its innovations include easy operation and interoperability, on-the-fly annotation while audio content is played online, efficient collaboration with feature engines and machine learning algorithms, enhanced interaction, and personalization via state-of-the-art Web 2.0/3.0 services.
Convention Paper 8857

P8-9 Investigation of Wavelet Approaches for Joint Temporal, Spectral and Cepstral Features in Audio Semantics—Charalampos A. Dimoulas, Aristotle University of Thessaloniki - Thessaloniki, Greece; George M. Kalliris, Aristotle University of Thessaloniki - Thessaloniki, Greece
This paper investigates wavelet approaches for joint time, frequency, and cepstral audio feature extraction. Wavelets have been thoroughly studied over the last decades as an alternative signal analysis approach, and wavelet features have been successfully applied in a variety of pattern recognition applications, including audio semantics. Recently, wavelet-adapted mel-frequency cepstral coefficients that incorporate perceptual attributes have been proposed as features for speech recognition and general audio classification. In this context, various wavelet configuration schemes are examined for wavelet-cepstral audio feature extraction. Additional wavelet parameters are used to form wavelet feature vectors and are evaluated in terms of salient-feature ranking. Comparisons with classical time-frequency and cepstral audio features are conducted in typical audio-semantics scenarios.
Convention Paper 8858
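A common recipe for wavelet-based cepstral features, in the spirit of P8-9, replaces the mel filter bank of MFCCs with wavelet sub-bands: compute the energy in each sub-band, take logarithms, and decorrelate with a DCT. The sketch below follows that recipe with a Haar wavelet and mean sub-band energies; these choices, the `levels` parameter, and the function names are assumptions, not the configuration schemes evaluated in the paper.

```python
import math


def haar_subband_energies(x, levels):
    """Mean energy per Haar sub-band, ordered low to high frequency:
    [approx_L, detail_L, ..., detail_1]. len(x) must be divisible by 2**levels."""
    s = 1.0 / math.sqrt(2.0)
    a, details = list(x), []
    for _ in range(levels):
        details.append([(a[2 * i] - a[2 * i + 1]) * s for i in range(len(a) // 2)])
        a = [(a[2 * i] + a[2 * i + 1]) * s for i in range(len(a) // 2)]
    bands = [a] + details[::-1]
    return [sum(c * c for c in b) / len(b) for b in bands]


def dct2(v):
    # Unnormalised DCT-II.
    n = len(v)
    return [sum(v[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n)]


def wavelet_cepstrum(frame, levels=4, n_coeffs=None, floor=1e-12):
    """Log sub-band energies decorrelated by a DCT (floor avoids log(0))."""
    log_e = [math.log(e + floor) for e in haar_subband_energies(frame, levels)]
    coeffs = dct2(log_e)
    return coeffs[:n_coeffs] if n_coeffs else coeffs


frame = [math.sin(0.7 * t) + 0.3 * math.cos(2.1 * t) for t in range(16)]
print(wavelet_cepstrum(frame))
```

The dyadic Haar sub-bands are octave-spaced, which only loosely approximates a perceptual frequency scale; the wavelet family and decomposition depth are the kind of configuration choices the paper compares.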

REGISTRATION DESK: May 4th 09:30–18:30; May 5th 08:30–18:30; May 6th 08:30–18:30; May 7th 08:30–16:30

TECHNICAL PROGRAM: May 4th 10:30–19:00; May 5th 09:00–19:00; May 6th 09:00–19:00; May 7th 09:00–17:00

Audio Engineering Society
