AES New York 2009
Paper Session P1
P1 - Audio Perception
Friday, October 9, 9:00 am — 12:30 pm
Chair: Poppy Crum, Johns Hopkins School of Medicine - Baltimore, MD, USA
P1-1 Effect of Whole-Body Vibration on Speech. Part I: Stimuli Recording and Speech Analysis—Durand Begault, NASA Ames Research Center - Moffett Field, CA, USA
In space launch operations, speech intelligibility for radio communications between flight deck and ground control is of critical concern, particularly during launch phases of flight having a predicted 12 Hz thrust oscillation. The extreme acceleration and vibration experienced during launch may affect vocal production. In this study, the effect of 0.5 and 0.7 g whole-body vibration on the speech production of words (Diagnostic Rhyme Test word list) was evaluated. Six subjects were recorded in a supine position using a specially designed chair and vibration platform. Vocal warbling, pitch modulation, and other effects were observed in the spectrographic and fundamental frequency analyses.
Convention Paper 7820
P1-2 Comparison of Objective Assessment Methods for Intelligibility and Quality of Speech—Juan-Pablo Ramirez, Alexander Raake, Deutsche Telekom Laboratories, TU Berlin - Berlin, Germany
Subjective rating of the quality of speech in narrow-band telecommunication is parametrically assessed by the so-called E-model. Intelligibility of the transmitted speech signal has a significant impact on the opinion of users. The Speech Intelligibility Index quantifies the amount of perceptual speech features available to the listener in conditions with background noise and linear frequency distortion. It has been shown to be highly correlated with subjective speech recognition performance. This paper compares the two models. It refers to, and details, improvements toward the modeling of quality in wide-band transmission.
Convention Paper 7821
P1-3 A Novel Listening Test-Based Measure of Intelligibility Enhancement—Markus Kallinger, Henning Ochsenfeld, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Anne Schlüter, University of Applied Science Oldenburg - Oldenburg, Germany
One of the main aims of speech signal processing is to increase the intelligibility of speech. Furthermore, in environments with low ambient noise, listening effort can be reduced by appropriate algorithms. Objective and subjective measures are available to evaluate these algorithms’ performance. However, most of these measures are not specifically designed to evaluate the performance of speech enhancement approaches in terms of intelligibility improvement. This paper proposes a novel listening test-based measure, which makes use of a speech intelligibility test, the Oldenburg Sentence Test (German: Oldenburger Satztest, OLSA). Recent research results indicate a correlation between listening effort and speech intelligibility. Therefore, we propose using our measure to assess both intelligibility enhancement by algorithms operating at low signal-to-noise ratios (SNRs) and listening effort improvement at high SNRs. We compare the novel measure to results obtained from listening test-based as well as instrumental evaluation procedures. Good correlation and more plausible results in specific situations illustrate the potential of the proposed method.
Convention Paper 7822
P1-4 Which Wideband Speech Codec? Quality Impact Due to Room-Acoustics at Send Side and Presentation Method—Alexander Raake, Marcel Wältermann, Sascha Spors, Deutsche Telekom Laboratories, Technische Universität Berlin - Berlin, Germany
We report on two listening tests to determine the speech quality of different wideband (WB) speech codecs. In the first test, we studied various network conditions, including WB–WB and WB–narrowband (WB–NB) tandeming, packet loss, and background noise. In addition to other findings, this test showed some codec quality rank-order changes when compared to the literature. To evaluate the hypothesis that secondary test factors lead to this rank-order effect, we conducted another speech quality listening test. Here we simulated different source material recording conditions (room acoustics and microphone positions), processed the material with different WB speech coders, and presented the resulting files monotically in one test and diotically in another. The paper discusses why and how these factors impact speech quality.
Convention Paper 7823
P1-5 Evaluating Physical Measures for Predicting the Perceived Quality of Blindly Separated Audio Source Signals—Thorsten Kastner, University of Erlangen-Nuremberg - Erlangen, Germany, Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany
For blind source separation (BSS) based applications where the aim is the reproduction of the separated signals, the perceived quality of the produced audio signals is a key factor in rating these systems. In this paper several signal-derived features are compared to assess their relevance in reflecting the perceived audio quality of BSS signals. The most relevant features are then combined in a multiple linear regression model to predict the perceptual quality. In order to cover a large variety of source signals and different algorithms, the reference ratings are obtained from extensive listening tests rating the BSS algorithms that participated in the Stereo Source Separation Campaigns 2007 (SASSEC) and 2008 (SiSEC). Results are presented for predicting the perceived quality of SiSEC items based on a model that was calibrated using SASSEC material.
Convention Paper 7824
P1-6 Statistics of MUSHRA Revisited—Thomas Sporer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany, TU Ilmenau, Ilmenau, Germany; Judith Liebetrau, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Sebastian Schneider, TU Ilmenau - Ilmenau, Germany
Listening tests are the final instance when judging perceived audio quality. To achieve reliable and repeatable results, the experimental design and the statistical analysis of results are of great importance. The “triple stimulus with hidden reference” test (Rec. ITU-R BS.1116) and the MUSHRA test (multi-stimulus with hidden reference and anchors, Rec. ITU-R BS.1534) are well-established standardized listening tests. Traditionally, the statistical analysis of both is based on simple parametric statistics. This paper reanalyzes the results from MUSHRA tests with alternative statistical approaches, mainly considering the fact that in MUSHRA every listener is not only assigning a score to each item, but is also performing an inherent ranking test and a paired comparison test (“better-worse”) between pairs of stimuli. Thus, more statistical information is made visible.
Convention Paper 7825
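As an illustration of the idea in the P1-6 abstract that MUSHRA scores implicitly contain a ranking and a set of "better-worse" paired comparisons, the following minimal Python sketch derives both from per-listener scores. It is not taken from the paper: the function names `ranks` and `pairwise_wins`, the condition labels, and the example scores are all hypothetical, and the paper's actual statistical procedures may differ.

```python
from itertools import combinations


def ranks(scores):
    """Rank one listener's conditions (1 = best), averaging ranks over ties."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    result = {}
    i = 0
    while i < len(ordered):
        # Extend j over every condition tied with position i.
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for name in ordered[i:j + 1]:
            result[name] = average_rank
        i = j + 1
    return result


def pairwise_wins(all_scores):
    """For each ordered pair (a, b), count listeners who scored a above b."""
    wins = {}
    for scores in all_scores:
        for a, b in combinations(sorted(scores), 2):
            wins.setdefault((a, b), 0)
            wins.setdefault((b, a), 0)
            if scores[a] > scores[b]:
                wins[(a, b)] += 1
            elif scores[b] > scores[a]:
                wins[(b, a)] += 1
            # Equal scores count as a tie and add to neither side.
    return wins


# Hypothetical scores (0-100) from two listeners for three conditions.
listeners = [
    {"ref": 100, "codec_a": 80, "anchor": 20},
    {"ref": 95, "codec_a": 95, "anchor": 30},
]
listener_ranks = [ranks(s) for s in listeners]
win_counts = pairwise_wins(listeners)
```

The resulting ranks and win counts could then be passed to rank-based or paired-comparison statistics instead of, or alongside, the usual mean scores and confidence intervals.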
P1-7 Statistical Analysis of ABX Results Using Signal Detection Theory—Jon Boley, LSB Audio - Lafayette, IN, USA; Michael Lester, LSB Audio - Lafayette, IN, USA, Shure Incorporated, Niles, IL, USA
ABX tests have been around for decades and provide a simple, intuitive means to determine if there is an audible difference between two audio signals. Unfortunately, however, the results of proper statistical analyses are rarely published along with the results of the ABX test. The interpretation of the results may critically depend on a proper statistical analysis. In this paper a very successful analysis method known as signal detection theory is presented in a way that is easy to apply to ABX tests. This method is contrasted with other statistical techniques to demonstrate the benefits of this approach.
Convention Paper 7826
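For context on the P1-7 abstract, one conventional way to analyze an ABX result is an exact one-sided binomial test against the guessing rate of 0.5. The sketch below shows that baseline only; it is not the signal detection theory method the paper presents, and the function name `abx_p_value` and the example counts are hypothetical.

```python
from math import comb


def abx_p_value(correct, trials):
    """Exact one-sided binomial p-value for an ABX run.

    Returns P(X >= correct) for X ~ Binomial(trials, 0.5), i.e. the chance
    of doing at least this well by pure guessing.
    """
    if trials <= 0 or not 0 <= correct <= trials:
        raise ValueError("need trials > 0 and 0 <= correct <= trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Hypothetical run: 12 correct identifications out of 16 trials.
p = abx_p_value(12, 16)
```

For 12 correct out of 16, the p-value is 2517/65536, or about 0.038. A test like this only addresses whether the listener performed above chance; it does not estimate how discriminable the two signals are.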