AES London 2011
Paper Session P1

P1 - Speech and Hearing

Friday, May 13, 09:30 — 12:30 (Room 4)

Robert Schulein

P1-1 The Evolution of the Speech Transmission Index
Herman J. M. Steeneken, TNO Human Factors (retired) - Soesterberg, The Netherlands, Embedded Acoustics, Delft, The Netherlands; Sander J. van Wijngaarden, Jan A. Verhave, Embedded Acoustics - Delft, The Netherlands
This year, the Speech Transmission Index celebrates its 40th anniversary. While the first measuring device built in the 1970s could barely fit inside a car, inexpensive pocket-size STI measuring solutions are now available to the world. Meanwhile, the STI method has continually evolved in order to deal with an increasing array of measuring challenges. This paper investigates how the STI kept up with these challenges and analyzes possible room for further improvement. Also, a roadmap for further development of the STI is proposed.
P1-2 Prosody Generation Module for Macedonian Text-to-Speech Synthesis
Branislav Gerazov, Zoran Ivanovski, Faculty of Electrical Engineering and Information Technologies - Skopje, Macedonia
The paper presents a fully functional prosody generation module developed for Macedonian text-to-speech (TTS) synthesis. The module is based on research on prosody generation modules in high-end TTS synthesis systems, previous experience with prosody in Macedonian TTS, and original prosody research carried out by the authors. The paper starts with an overview of the basic tasks, problems, and solutions in prosody generation modules. It then gives a detailed account of the workings of the developed module. The module first segments the input speech into intonation phrases and determines their intonation type. Next, it generates durations for each of the units that will be used to synthesize the speech output. Then it determines the positions of the lexical stresses and modifies these units’ durations. After determining the intonation phrase’s pitch accent location, it generates an adequate pitch contour and calculates the pitch targets needed for unit modification. The synthesis module uses this data to generate prosody in the output speech. Generated prosody patterns in the output speech are of satisfactory quality for arbitrary text input. The presented results are of significant value for Macedonian TTS and can be used for other underrepresented languages.
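The first stage described in this abstract (splitting the input into intonation phrases and assigning each an intonation type) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `segment_phrases`, the use of punctuation as the only phrase boundary cue, and the four intonation labels are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical mapping from closing punctuation to an intonation type.
# The labels are illustrative assumptions, not the authors' categories.
INTONATION_TYPES = {
    ".": "declarative",
    "?": "interrogative",
    "!": "exclamatory",
    ",": "continuation",
}


def segment_phrases(text):
    """Split text into (phrase, intonation_type) pairs at punctuation marks."""
    phrases = []
    # Each match is a run of non-punctuation characters followed by an
    # optional single punctuation mark that closes the phrase.
    for match in re.finditer(r"([^.,?!]+)([.,?!]?)", text):
        words = match.group(1).strip()
        if words:
            # Text without closing punctuation defaults to "declarative".
            kind = INTONATION_TYPES.get(match.group(2), "declarative")
            phrases.append((words, kind))
    return phrases


result = segment_phrases("Hello, how are you? Fine.")
```

A real module would presumably use richer cues than punctuation, and the later stages (durations, lexical stress, pitch targets) are not covered by this sketch.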
P1-3 The Influence of Transmission Channel on the Admissibility of Speech Sample for Forensic Speaker Identification
Andrey Barinov, Speech Technology Center Ltd. - St. Petersburg, Russia
This paper extends papers previously published and presented at the AES 39th Conference and the AES 129th Convention on voice sample quality requirements and on compensation for the influence of transmission channels in forensic or automatic speaker identification. We analyze different types of transmission channels, such as land line, GSM, radio, and VoIP. We examine the important parameters of voice samples obtained from these channels and compare the influence of the different channels on the voice biometric features that matter for speaker identification. The paper concludes with the circumstances under which a particular recording can be accepted or rejected for forensic speaker identification.
P1-4 Optimizing the Acoustic and Intelligibility Performance of Assistive Audio Systems and Program Relay for the Hard of Hearing and Other Users
Peter Mapp, Peter Mapp Associates - Colchester, Essex, UK
Around 10 % of the general population suffer from a noticeable degree of hearing loss and would benefit from some form of hearing assistance or deaf aid. DDA legislation and requirements mean that many more hearing assistive systems are being installed – yet there is continuing evidence to suggest that many of these systems fail to perform adequately and provide the benefit expected [1]. This paper reports on the results of acoustic performance testing of a number of trial ALS systems. The use of STIPA, as a practical measure for assessing the potential intelligibility of ALS systems, is discussed. The ALS microphone type, distance and angular location from the target acoustic source are shown to have a significant effect on the resultant potential intelligibility performance. The effects of typical ALS signal processing have been investigated and are shown to have a small but significant effect on the STIPA result. The requirements for a suitable acoustic test source to mimic a human talker are discussed as is the need to adequately assess the effects of both reverberation and noise. The findings of this paper are also relevant to the installation and testing of educational sound field systems as well as boardroom and conference room systems.
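As background to the STIPA measure discussed in this abstract, the sketch below shows the core step shared by STI-type methods: converting a measured modulation transfer value m into a transmission index via an apparent signal-to-noise ratio limited to ±15 dB. This is a simplified illustration only. The function name is hypothetical, and the full method (band and modulation-frequency averaging, weighting, and level-dependent corrections) is omitted.

```python
import math


def transmission_index(m):
    """Map a modulation transfer value m (0..1) to a transmission index (0..1).

    Simplified sketch: apparent SNR = 10*log10(m / (1 - m)), limited to
    the range -15 dB to +15 dB, then scaled linearly to 0..1.
    """
    if m <= 0.0:
        return 0.0  # no modulation preserved: SNR limited at -15 dB
    if m >= 1.0:
        return 1.0  # modulation fully preserved: SNR limited at +15 dB
    snr_db = 10.0 * math.log10(m / (1.0 - m))
    snr_db = max(-15.0, min(15.0, snr_db))
    return (snr_db + 15.0) / 30.0


half = transmission_index(0.5)  # m = 0.5 gives an apparent SNR of 0 dB
```

With m = 0.5 the apparent SNR is 0 dB, which lands at the midpoint of the index range.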
P1-5 Sound Reproduction within a Closed Ear Canal: Acoustical and Physiological Effects
Samuel Gido, Asius Technologies LLC - Longmont, CO, USA, University of Massachusetts, Amherst, MA, USA; Robert Schulein, Stephen Ambrose, Asius Technologies LLC - Longmont, CO, USA
When a sound-producing device such as an insert earphone or a hearing aid is sealed in the ear canal, only a tiny segment of the sound wave can exist in this small volume at any given instant, which produces an oscillation of the static pressure in the ear canal. This effect can greatly boost the SPL in the ear canal, especially at low frequencies, a phenomenon that we call Trapped Volume Insertion Gain (TVIG). In this study the TVIG has been found, by numerical modeling as well as by direct measurements using a Zwislocki coupler and the ear of a human subject, to be as much as 50 dB greater than sound pressures typically generated while listening to sounds in an open environment. Even at moderate listening volumes, the TVIG can increase the low-frequency SPL in the ear canal to levels that produce excursions of the tympanic membrane 100 to 1000 times greater than in normal open-ear hearing. Additionally, the high SPL at low frequencies in the trapped volume of the ear canal can easily exceed the threshold necessary to trigger the stapedius reflex, a stiffening response of the middle ear, which reduces its sensitivity and may lead to audio fatigue. The addition of a compliant, membrane-covered vent in the sound tube of an insert ear tip was found to reduce the TVIG by up to 20 dB, such that the stapedius reflex would likely not be triggered.
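To put the decibel figures in this abstract on a linear scale, the short sketch below converts between level differences in dB and amplitude (pressure) ratios using the standard 20·log10 relation. The function names are illustrative. Treating membrane excursion as proportional to sound pressure is an assumption made here for the comparison, not a statement from the paper.

```python
import math


def db_to_amplitude_ratio(level_db):
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10.0 ** (level_db / 20.0)


def amplitude_ratio_to_db(ratio):
    """Convert a linear amplitude ratio to a level difference in dB."""
    return 20.0 * math.log10(ratio)


# A 50 dB gain corresponds to a pressure ratio of roughly 316.
gain_50_db = db_to_amplitude_ratio(50.0)

# Assuming excursion scales linearly with pressure, excursions 100 to 1000
# times greater correspond to level differences of 40 to 60 dB.
low_db = amplitude_ratio_to_db(100.0)
high_db = amplitude_ratio_to_db(1000.0)

# A 20 dB reduction corresponds to a tenfold drop in pressure amplitude.
vent_factor = db_to_amplitude_ratio(-20.0)
```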
P1-6 Can We Compare the Sound Quality of Noise Reduction between Hearing Aids? A Method to Level the Ground between Devices
Rolph Houben, Inge Brons, Wouter A. Dreschler, Academic Medical Center, Amsterdam, The Netherlands
This paper proposes the application of an equalization filter to remove unwanted differences in frequency response between recordings from different hearing aids. The filter makes it possible to compare the perceptual effects (such as user preference) of a specific signal processing feature (e.g., noise reduction) between different hearing aids, without the dominant influence of differences in their frequency responses. Both an objective quality measure (PESQ) and a listening experiment have shown that the filter was able to “level the ground” between the devices included. The potential application of the inverse filter is to use it on recordings from hearing aids so that we can directly compare the noise reduction between devices. This allows one to determine if users prefer a certain noise reduction over another, which could lead to improved rehabilitation of hearing impaired listeners.
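The idea of equalizing away frequency-response differences can be sketched at the level of per-band gains. The example below is a hedged illustration, not the authors' filter design: the function names, the four-band resolution, and the band levels are all hypothetical, and a practical implementation would design an actual filter from measured responses rather than add gains to a list.

```python
def equalization_gains_db(device_db, reference_db):
    """Per-band gain in dB that maps the device response onto the reference."""
    if len(device_db) != len(reference_db):
        raise ValueError("responses must cover the same bands")
    return [ref - dev for dev, ref in zip(device_db, reference_db)]


def apply_gains_db(response_db, gains_db):
    """Apply per-band gains (dB) to a magnitude response (dB)."""
    return [level + gain for level, gain in zip(response_db, gains_db)]


# Hypothetical band levels in dB for two hearing-aid recordings.
aid_a = [60.0, 65.0, 70.0, 62.0]
aid_b = [58.0, 66.0, 64.0, 60.0]

# Equalize aid B toward aid A so that the remaining differences between the
# recordings are not dominated by the long-term frequency response.
gains = equalization_gains_db(aid_b, aid_a)
equalized_b = apply_gains_db(aid_b, gains)
```

After equalization the two band profiles match, so any remaining perceptual difference would come from other processing such as noise reduction.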
