Wednesday, October 18, 9:00 am — 11:30 am
P01-1 Generation and Evaluation of Isolated Audio Coding Artifacts—Sascha Dick, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Nadja Schinkel-Bielefeld, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Sivantos GmbH - Erlangen, Germany; Sascha Disch, Fraunhofer IIS, Erlangen - Erlangen, Germany
Many existing perceptual audio codec standards define only the bit stream syntax and associated decoder algorithms, but leave many degrees of freedom to the encoder design. For a systematic optimization of encoder parameters as well as for education and training of experienced test listeners, it is instrumental to provoke and subsequently assess individual coding artifact types in an isolated fashion with controllable strength. The approach presented in this paper consists of a pre-selection of suitable test audio content in combination with forcing a specially modified encoder into uncommon operation modes to deliberately generate controlled coding artifacts. Finally, subjective listening tests were conducted to assess the subjective quality for different parameters and test content.
Convention Paper 9809
P01-2 Enhancement of Voice Intelligibility for Mobile Speech Communication in Noisy Environments—Kihyun Choo, Samsung Electronics Co., Ltd. - Suwon, Korea; Anton Porov, Samsung R&D Institute Russia - Moscow, Russia; Maria Koutsogiannaki, Samsung R&D Institute UK; Holly Francois, Samsung Electronics R&D Institute UK - Staines-Upon Thames, Surrey, UK; Jonghoon Jeong, Samsung Electronics Co. Ltd. - Seoul, Korea; Hosang Sung, Samsung Electronics - Korea; Eunmi Oh, Samsung Electronics Co., Ltd. - Seoul, Korea
One of the biggest challenges still encountered in speech communication via a mobile phone is that it is sometimes very difficult to understand what is said when listening in a noisy place. In this paper a novel approach based on two models is introduced to increase speech intelligibility for a listener surrounded by environmental noise. One model perceptually optimizes the speech in the presence of simultaneous background noise; the other modifies the speech towards a more intelligible, naturally elicited speaking style. The two models are combined to provide more understandable speech even in a loud, noisy environment and even when the speech volume cannot be increased. The improvements in perceptual quality and intelligibility are shown by Perceptual Objective Listening Quality Assessment and Listening Effort Mean Opinion Score evaluation.
Convention Paper 9810
P01-3 Application of Spectral-Domain Matching and Pseudo Non-Linear Convolution to Down-Sample-Rate-Conversion (DSRC)—Malcolm O. J. Hawksford, University of Essex - Colchester, Essex, UK
A method of down-sample-rate conversion is discussed that exploits processes of spectral-domain matching and pseudo non-linear convolution applied to discrete data frames as an alternative to conventional convolutional filter and sub-sampling techniques. Spectral-domain matching yields a complex sample sequence that can subsequently be converted into a real sequence using the Discrete Hilbert Transform. The method is shown to result in substantially reduced time dispersion compared to the standard convolutional approach, and it circumvents the choice of filter symmetry, such as linear phase or minimum phase. The formal analytic process is presented and validated through simulation, then adapted to digital-audio sample-rate conversion by using a multi-frame overlap-and-add process. It has been tested in both LPCM-to-LPCM and DSD-to-LPCM applications, where the latter can be simplified using a look-up code table.
Convention Paper 9811
P01-4 Detection of Piano Pedaling Techniques on the Sustain Pedal—Beici Liang, Queen Mary University of London - London, UK; György Fazekas, Queen Mary University of London - London, UK; Mark B. Sandler, Queen Mary University of London - London, UK
Automatic detection of piano pedaling techniques is challenging because it depends on subtle nuances of piano timbre. In this paper we address this problem on single notes using decision-tree-based support vector machines. Features are extracted from harmonics and residuals based on physical acoustics considerations and signal observations. We consider four distinct pedaling techniques on the sustain pedal (anticipatory full, anticipatory half, legato full, and legato half pedaling) and create a new isolated-note dataset consisting of different pitches and velocities for each pedaling technique plus notes played without pedal. Experiments show the effectiveness of the designed features and the learned classifiers for discriminating pedaling techniques in the cross-validation trials.
Convention Paper 9812