AES New York 2011
Poster Session P6

P6 - Speech

Thursday, October 20, 3:00 pm — 4:30 pm (Room: 1E Foyer)

P6-1 A Systematic Study on Formant Transition in Speech Using Sinusoidal Glide
Wen-Jie Wang, City University of New York - New York, NY, USA; Benjamin Guo, Chin-Tuan Tan, New York University, School of Medicine - New York, NY, USA
The goal of this study was to use sinusoidal glides to investigate systematically the perceivable acoustic cues in a consonant-vowel /CV/ syllable. The sinusoidal glide is designed to mimic the formant trajectory in a /CV/ syllable and has two parts: a frequency glide followed by a constant frequency. The experiment varied the frequency step (with rising and falling glides) and the duration of the initial part, and the center frequency and duration of the final part. We asked six normal-hearing subjects to discriminate sinusoidal glides from sinusoids of constant frequency. To discriminate the two stimuli, subjects required a larger frequency step when the duration of the glide was shortened but a smaller frequency step when the center frequency of the final part was lowered. The outcome of this experiment is compared to the outcomes of previous studies using synthesized formants and sinusoidal replicas.
Convention Paper 8479
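
The two-part stimulus described in P6-1 can be sketched as follows. This is only an illustration, not the authors' stimulus code: the linear glide shape, the 16 kHz sample rate, and the names `sinusoidal_glide`, `f_final`, `f_step`, `glide_dur`, and `steady_dur` are all assumptions, since the abstract does not specify them.

```python
import math

def sinusoidal_glide(f_final, f_step, glide_dur, steady_dur, sample_rate=16000):
    """Tone that glides from (f_final - f_step) to f_final, then holds f_final.

    Assumption: the glide is linear in frequency. A positive f_step gives a
    rising glide, a negative f_step a falling glide, and f_step = 0 gives the
    constant-frequency reference tone.
    """
    n_glide = int(round(glide_dur * sample_rate))
    n_steady = int(round(steady_dur * sample_rate))
    f_start = f_final - f_step
    samples = []
    phase = 0.0
    for n in range(n_glide + n_steady):
        if n < n_glide:
            # Initial part: instantaneous frequency moves linearly toward f_final.
            freq = f_start + f_step * n / n_glide
        else:
            # Final part: constant center frequency.
            freq = f_final
        samples.append(math.sin(phase))
        # Accumulating phase keeps the waveform continuous across the two parts.
        phase += 2.0 * math.pi * freq / sample_rate
    return samples

# Hypothetical parameter values, chosen only for illustration.
rising = sinusoidal_glide(f_final=1000.0, f_step=200.0, glide_dur=0.05, steady_dur=0.2)
reference = sinusoidal_glide(f_final=1000.0, f_step=0.0, glide_dur=0.05, steady_dur=0.2)
```

A discrimination trial in this sketch would pair `rising` (or a falling glide with a negative `f_step`) with `reference`, which has the same total duration.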

P6-2 Perceived Quality of Resonance-Based Decomposed Vowels and Consonants
Chin-Tuan Tan, Benjamin Guo, New York University School of Medicine - New York, NY, USA; Ivan Selesnick, Polytechnic Institute of New York University - Brooklyn, NY, USA
The ultimate objective of this study is to employ a resonance-based decomposition method for the manipulation of acoustic cues in speech. Resonance-based decomposition (Selesnick, 2010) is a newly proposed nonlinear signal analysis method based not on frequency or scale but on resonance; the method decomposes a complex non-stationary signal into a “high-resonance” component and a “low-resonance” component using a combination of low and high Q-factors. In this study we conducted a subjective listening experiment with five normal-hearing listeners to assess the perceived quality of the decomposed components, with the intention of deriving the perceptually relevant combinations of low and high Q-factors. Our results show that normal-hearing listeners generally rank high-resonance components of speech stimuli higher than low-resonance components. This may be due to a greater salience of perceptually significant formant cues in high-resonance stimuli.
Convention Paper 8480

P6-3 Relationship between Subjective and Objective Evaluation of Distorted Speech Signals
Mitsunori Mizumachi, Kyushu Institute of Technology - Fukuoka, Japan
Designing a noise reduction algorithm requires evaluating the quality of noise-reduced speech signals accurately and efficiently. Subjective evaluation is accurate but requires listening tests with many subjects and considerable time. Objective distortion measures are therefore employed as an efficient alternative. However, almost none of these distortion measures consider the temporal variation of speech distortion. In this paper the temporal aspect of segmental speech distortion is investigated based on higher-order statistics, that is, variance, skewness, and kurtosis. Notably, the skewness of the objective evaluation gives a good explanation for the discrepancy between subjective and objective evaluation.
Convention Paper 8481
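
The idea in P6-3 of summarizing how a segmental distortion measure varies over time can be sketched as below. The abstract does not name the distortion measure, so per-frame segmental SNR is used here purely as a stand-in; the function names, the frame length of 256 samples, and the use of population (biased) moment estimators are likewise assumptions.

```python
import math

def segmental_snr(clean, processed, frame_len=256):
    """Per-frame SNR in dB between a clean signal and its processed version."""
    values = []
    usable = min(len(clean), len(processed))
    for start in range(0, usable - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        signal_energy = sum(x * x for x in c)
        error_energy = sum((x - y) ** 2 for x, y in zip(c, p))
        # Skip silent or perfectly reconstructed frames, where the ratio is undefined.
        if signal_energy > 0.0 and error_energy > 0.0:
            values.append(10.0 * math.log10(signal_energy / error_energy))
    return values

def moment_summary(values):
    """Mean, variance, skewness, and (non-excess) kurtosis of frame-wise values."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0.0:
        raise ValueError("skewness and kurtosis are undefined for constant input")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }
```

Applying `moment_summary` to the output of `segmental_snr` gives one set of statistics per utterance. A strongly negative skewness, for example, would indicate that a few frames are distorted far more than the rest, which an average alone would hide.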

P6-4 Multiple Microphones Speech Enhancement Using Minimum Redundant Array
Kwang-Cheol Oh, Samsung Electronics Co., Ltd. - Suwon City, Gyeong-Gi Do, Korea
A speech enhancement method using a non-uniformly spaced multiple-microphone array with a small aperture is proposed and analyzed. The technique uses a minimum redundant array structure, as used in antenna arrays, to prevent spatial aliasing at high frequencies, and uses phase-difference-based dual-microphone speech enhancement techniques to implement a small microphone array. It is highly directive, evenly from low to high frequencies, and its performance is measured with the directivity index (DI). The DI of the proposed approach is about 3 dB higher than that of a multiple-microphone approach with the phase-based filter.
Convention Paper 8482
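
The minimum redundant array structure mentioned in P6-4 can be illustrated with a small sketch. The microphone layout below (positions 0, 1, 4, 6 in multiples of a unit spacing) is a standard textbook example, not the geometry used in the paper, and the function names, the 2 cm unit spacing, and the speed of sound of 343 m/s are assumptions. The half-wavelength rule applied to the unit spacing is used only as a rough guide to where spatial aliasing begins.

```python
from collections import Counter
from itertools import combinations

def lag_counts(positions):
    """Count how often each pairwise spacing (lag) occurs in the array."""
    return Counter(abs(a - b) for a, b in combinations(positions, 2))

def covers_all_lags(positions):
    """True if every integer lag from 1 up to the aperture is available."""
    counts = lag_counts(positions)
    aperture = max(positions) - min(positions)
    return all(counts[lag] > 0 for lag in range(1, aperture + 1))

def half_wavelength_limit_hz(unit_spacing_m, speed_of_sound=343.0):
    """Frequency at which the unit spacing equals half a wavelength."""
    return speed_of_sound / (2.0 * unit_spacing_m)

# Four microphones at 0, 1, 4, and 6 unit spacings: every lag from 1 to 6
# appears exactly once, so no spacing is duplicated.
mra = [0, 1, 4, 6]
print(sorted(lag_counts(mra).items()))
print(covers_all_lags(mra))
print(half_wavelength_limit_hz(0.02))
```

By comparison, four microphones spaced uniformly over the same aperture (positions 0, 2, 4, 6) provide only lags 2, 4, and 6, so the smallest available spacing is twice as large.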

Information Last Updated: 20111005, mei
