Journal of the AES

2017 April - Volume 65 Number 4

Papers

Single-Channel Speech Enhancement Based on Psychoacoustic Masking

Authors: Zhou, Tingting; Zeng, Yumin; Wang, Rongrong
Affiliation: School of Physics and Technology, Nanjing Normal University, Nanjing, China
Page: 272

Speech enhancement processing can improve the performance of speech communication systems in noisy environments, in applications such as mobile communication, speech recognition, and hearing aids. Single-channel speech enhancement is more difficult than multichannel enhancement because no independent source of information is available to help separate the speech and noise signals. This paper addresses single-channel speech enhancement based on the masking properties of the human auditory system and presents a complete implementation of speech enhancement using psychoacoustic masking. Incorporating temporal masking along with simultaneous masking, rather than using simultaneous masking alone, produces results that are more consistent with human auditory characteristics. The combined masking is then used to adapt the subtraction parameters to obtain the best trade-off among noise reduction, speech distortion, and the level of residual perceptual noise. Objective measures and subjective listening tests demonstrate that the proposed algorithm outperforms comparable speech enhancement algorithms.
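The abstract does not state the exact rule for adapting the subtraction parameters. One common pattern lets a per-bin masking threshold steer power spectral subtraction. The over-subtraction factor alpha and the spectral floor beta are interpolated between assumed minimum and maximum values. Where the threshold is low, residual noise is easily heard, so the rule subtracts hard. Where the threshold is high, the noise is masked, so the rule subtracts gently and preserves speech.

This is a minimal sketch under stated assumptions, not the authors' algorithm. Three things are hypothetical:

- the function names;
- the parameter ranges;
- the premise that a combined (simultaneous plus temporal) masking threshold per bin has already been computed.

```python
def adapt_parameters(threshold, alpha_min=1.0, alpha_max=6.0,
                     beta_min=0.0, beta_max=0.02):
    """Map each bin's masking threshold to an (alpha, beta) pair.

    Lowest threshold  -> (alpha_max, beta_max): residual noise audible, subtract hard.
    Highest threshold -> (alpha_min, beta_min): residual noise masked, subtract gently.
    The default ranges are illustrative assumptions, not published values.
    """
    t_min, t_max = min(threshold), max(threshold)
    span = t_max - t_min
    params = []
    for t in threshold:
        # r is 0 at the lowest threshold in the frame and 1 at the highest.
        r = (t - t_min) / span if span > 0 else 0.0
        alpha = alpha_max - r * (alpha_max - alpha_min)
        beta = beta_max - r * (beta_max - beta_min)
        params.append((alpha, beta))
    return params


def masked_spectral_subtraction(noisy_power, noise_power, threshold):
    """Power spectral subtraction for one frame, with per-bin alpha/beta from the masking threshold."""
    enhanced = []
    for y, n, (alpha, beta) in zip(noisy_power, noise_power, adapt_parameters(threshold)):
        # Over-subtract the noise estimate by alpha, but never go below the floor beta * noise.
        enhanced.append(max(y - alpha * n, beta * n))
    return enhanced
```

The sketch operates on a single frame of power spectra. The STFT analysis, noise estimation, masking-threshold computation, and overlap-add resynthesis that a complete system needs are deliberately left out.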

Download: PDF (HIGH Res) (614KB)

Download: PDF (LOW Res) (399KB)

Binaural Speech Intelligibility Prediction in the Presence of Multiple Babble Interferers Based on Mutual Information

Authors: Geravanchizadeh, Masoud; Avanaki, Hadi Jamshidi; Dadvar, Paria
Affiliation: Faculty of Electrical & Computer Engineering, University of Tabriz, Tabriz, Iran
Page: 285

This paper describes a predictor of binaural speech intelligibility that computes speech reception thresholds (SRTs) without the need to perform subjective listening tests. Although listening tests are considered the most reliable indicators of performance, they are time-consuming and costly. The proposed model computes SRTs in two stages: first, it calculates the binaural advantage; then, it derives the SRTs from the mutual information computed between the speech and mixture envelopes. Listening tests were conducted with 13 normal-hearing listeners in 15 spatial configurations covering one, two, and three babble interferers. The model is evaluated in anechoic conditions and compared with the subjective data as well as with the predictions of a baseline binaural speech intelligibility model. The proposed predictor performs as well as the baseline model in predicting the intelligibility of binaural vowel-consonant-vowel signals contaminated by multiple nonstationary babble noise sources.
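The abstract says that SRTs are derived from the mutual information between the speech and mixture envelopes, but it does not name the estimator. A simple stand-in is a plug-in (histogram) estimate: quantize both envelopes, count joint and marginal occurrences, and sum p(a,b) log2(p(a,b) / (p(a) p(b))).

This is a hedged sketch, not the paper's model. The following are all assumptions:

- the function names;
- the equal-width binning over each envelope's own range;
- the default bin count.

The binaural-advantage stage and the mapping from mutual information to an SRT are omitted.

```python
import math
from collections import Counter


def _quantize(values, bins):
    """Map each value to an integer bin index in [0, bins - 1] over the sequence's own range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values]


def envelope_mutual_information(speech_env, mixture_env, bins=8):
    """Plug-in (histogram) estimate of I(S; M) in bits between two equal-length envelopes."""
    if not speech_env or len(speech_env) != len(mixture_env):
        raise ValueError("envelopes must be non-empty and of equal length")
    s = _quantize(speech_env, bins)
    m = _quantize(mixture_env, bins)
    n = len(s)
    joint = Counter(zip(s, m))
    ps, pm = Counter(s), Counter(m)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Contribution p(a,b) * log2(p(a,b) / (p(a) * p(b))); empty cells contribute nothing.
        mi += p_ab * math.log2(p_ab / ((ps[a] / n) * (pm[b] / n)))
    return mi
```

Two limiting cases show how the estimate behaves:

- If the mixture envelope equals the speech envelope, the estimate equals the entropy of the quantized speech envelope.
- If the two envelopes are statistically independent, the estimate tends toward zero.

Interferers that obscure the speech envelope should push the value between these extremes.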

Download: PDF (HIGH Res) (3.7MB)

Download: PDF (LOW Res) (291KB)

Personalized Object-Based Audio for Hearing Impaired TV Viewers

Open Access

Authors: Shirley, Ben Guy; Meadows, Melissa; Malak, Fadi; Woodcock, James Stephen; Tidball, Ash
Affiliation: University of Salford, Salford, UK
Page: 293

Changing age demographics have led to an increase in the proportion of the population with some form of hearing loss. The introduction of object-based audio to television broadcasting has the potential to improve the viewing experience for millions of hearing-impaired people. Personalization of object-based audio can help overcome difficulties in understanding speech and other narrative audio. The research presented here describes a Multi-Dimensional Audio (MDA) implementation of object-based clean audio that presents independent object streams based on object-category elicitation. Evaluations were carried out with hearing-impaired participants. Using an on-screen menu, they personalized audio levels independently for four object categories: speech, music, background effects, and foreground effects related to on-screen events. Results show considerable variation in preference across subjects. Nevertheless, expanding object-category personalization beyond a binary speech/nonspeech categorization can substantially improve the viewing experience for some hearing-impaired people.
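The personalization described above (independent level control for speech, music, background effects, and foreground effects) can be pictured as a receiver-side remix. Each object category arrives as its own stream, and the listener's menu setting becomes a gain applied before summation.

This is a minimal sketch under stated assumptions, not the MDA implementation evaluated in the paper:

- The function name and the dB-per-category interface are hypothetical.
- The sketch ignores spatial positioning and any loudness or limiting stage that a production renderer would likely need.

```python
def personalized_mix(objects, gains_db):
    """Sum object streams after applying a listener-chosen gain per category.

    objects:  dict mapping category name -> list of samples (all the same length)
    gains_db: dict mapping category name -> gain in dB; missing categories stay at 0 dB
    """
    lengths = {len(samples) for samples in objects.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one stream, and all streams must have the same length")
    (length,) = lengths
    mix = [0.0] * length
    for category, samples in objects.items():
        gain = 10.0 ** (gains_db.get(category, 0.0) / 20.0)  # dB -> linear amplitude factor
        for i, x in enumerate(samples):
            mix[i] += gain * x
    return mix


streams = {
    "speech": [1.0, -1.0],
    "music": [1.0, 1.0],
    "background effects": [0.5, 0.5],
    "foreground effects": [0.0, 0.2],
}
# A listener who wants clearer dialogue might lower music and background effects.
print(personalized_mix(streams, {"music": -20.0, "background effects": -40.0}))
```

Keeping the categories separate until playback is what makes this kind of per-listener choice possible. A single premixed track offers no equivalent control.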

Download: PDF (HIGH Res) (3.5MB)

Download: PDF (LOW Res) (216KB)

An Analysis of Low-Arousal Piano Music Ratings to Uncover What Makes Calm and Sad Music So Difficult to Distinguish in Music Emotion Recognition

Open Access

Authors: Hong, Yu; Chau, Chuck-Jee; Horner, Andrew
Affiliation: Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
Page: 304

Systems that recognize the emotional content of music and systems that provide music recommendations often use a simplified four-quadrant model with categories such as Happy, Sad, Angry, and Calm. Previous research has shown that both listeners and automated systems often have difficulty distinguishing low-arousal categories such as Calm and Sad. This paper explores what makes these categories difficult to distinguish. Three hundred low-arousal excerpts from the classical piano repertoire were used to measure three properties of the Calm and Sad categories: their coverage of the low-arousal space, their overlap, and their balance relative to each other. Results show that Calm's coverage was 40% larger than Sad's. However, on average, Sad excerpts were significantly more negative in mood than Calm excerpts were positive. Calm and Sad overlapped in nearly 20% of the excerpts, meaning that these excerpts were about equally Calm and Sad. Together, Calm and Sad covered about 92% of the low-arousal space; the largest holes were for excerpts considered Mysterious and Doubtful. These holes in coverage, together with the overlap and the imbalance, cost accuracy. The Calm-Sad model produces about 6% more errors than asking users directly whether the mood of the music is positive or negative. Nevertheless, the Calm-Sad model is still useful and appropriate for many applications.
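The abstract reports coverage, overlap, and holes but not the criteria behind them. To make the three quantities concrete, here is a sketch that assumes each excerpt has a Calm vote share and a Sad vote share, and applies three rules:

- An excerpt counts toward a category when that category's share reaches `min_share`.
- It is an overlap when both categories qualify and their shares differ by at most `tol`.
- It falls in a hole when neither category qualifies.

The input format, the function name, and both thresholds are hypothetical. The paper's actual procedure may differ.

```python
def summarize_calm_sad(votes, min_share=0.3, tol=0.1):
    """Summarize Calm/Sad coverage over a set of low-arousal excerpts.

    votes: list of (calm_share, sad_share) pairs, each the fraction of listeners
           choosing that category for one excerpt (hypothetical input format).
    Returns the fraction of excerpts that are Calm, Sad, overlapping, and covered.
    """
    if not votes:
        raise ValueError("need at least one excerpt")
    n = len(votes)
    calm = sum(1 for c, _ in votes if c >= min_share)
    sad = sum(1 for _, s in votes if s >= min_share)
    # Overlap: both categories qualify and neither clearly dominates.
    overlap = sum(1 for c, s in votes
                  if c >= min_share and s >= min_share and abs(c - s) <= tol)
    # Hole: neither category qualifies (e.g., an excerpt heard mainly as Mysterious).
    holes = sum(1 for c, s in votes if c < min_share and s < min_share)
    return {
        "calm": calm / n,
        "sad": sad / n,
        "overlap": overlap / n,
        "covered": 1 - holes / n,
    }
```

Under these assumptions, the reported figures map onto the output keys as follows:

- "Calm was 40% bigger than Sad" corresponds to `calm / sad` being about 1.4.
- "Nearly 20% overlap" corresponds to `overlap` being about 0.2.
- "About 92% coverage" corresponds to `covered` being about 0.92.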

Download: PDF (HIGH Res) (2.1MB)

Download: PDF (LOW Res) (619KB)

Engineering Reports

Electrostatic Polarization Process Control: A Case Study on Electret Condenser Microphone Production Line

Authors: Pawar, S. J.; Jong, Yang-Dane; Her, Hong Ching; Huang, Jin H.
Affiliation: Department of Applied Mechanics, Motilal Nehru National Institute of Technology Allahabad, Allahabad, India; Electroacoustic Graduate Program and Department of Mechanical and Computer-Aided Engineering, Feng Chia University, Taichung, Taiwan, Republic of China; Merry Electronics Co., Ltd. Taichung, Taiwan, Republic of China
Page: 321

With the expanding market for low-cost microphones, raising manufacturing yield (that is, lowering the rejection rate) has become a central technical issue. This report presents a case study of possible causes of electret microphone rejection. Because the polarization voltage developed on the microphone diaphragm directly affects microphone sensitivity, variations in this voltage were investigated. Initial studies used a fixture plate (with holes for holding the microphones) placed on two base (support) plates. The investigations considered 23 microphone positions across 11 readings. Acceptable and unacceptable polarization voltages were designated, and the corresponding failure percentage was determined. Next, one base plate was removed to increase the distance between the top electrode and the diaphragm, and the polarization voltage was measured again. The results showed an appreciable reduction in polarization voltage, which indicated a promising reduction in the microphone failure rate. For a representative electret condenser microphone, a nonlinear relationship between sensitivity and polarization voltage was established. Statistical analysis showed that the measurement data are symmetric and normally distributed. When implemented on the shop floor, the proposed modification reduced rejection from 33% to 16%.
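The report's bookkeeping involves two calculations: classifying polarization-voltage readings against an acceptable window to get a failure percentage, and checking the data's symmetry. A minimal sketch of both follows.

The following elements are illustrative assumptions, since the abstract does not give them:

- the window limits;
- the function names;
- the use of sample skewness as the symmetry check.

Skewness near zero suggests symmetry but does not by itself establish normality. A formal normality test would be needed for that.

```python
def failure_percentage(voltages, low, high):
    """Percentage of readings outside the acceptable window [low, high] (limits are hypothetical)."""
    if not voltages:
        raise ValueError("no readings")
    rejected = sum(1 for v in voltages if not low <= v <= high)
    return 100.0 * rejected / len(voltages)


def sample_skewness(data):
    """Biased (population) sample skewness m3 / m2**1.5; values near 0 suggest symmetry."""
    n = len(data)
    if n == 0:
        raise ValueError("no data")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5
```

For scale, the reported change from 33% to 16% rejection is an absolute drop of 17 percentage points, a relative reduction of roughly 52% (17 / 33).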

Download: PDF (HIGH Res) (8.3MB)

Download: PDF (LOW Res) (510KB)

Features

2017 Audio Forensics Conference Preview, Arlington

Page: 334

Download: PDF (823KB)

2017 Semantic Audio Conference Preview, Erlangen

Page: 336

Download: PDF (599KB)

Broadcast and Streaming: Immersive Audio, Objects, and OTT TV

Author: Rumsey, Francis
Page: 338

"Broadcasting" is now done as much on the internet as it is over the airwaves. We summarize two workshops from the 141st Convention that covered the latest developments in this field, including delivery of immersive audio to the home and audio for over-the-top (OTT) TV.

Download: PDF (116KB)

Departments

Section News

Page: 342

Download: PDF (244KB)

Book Reviews

Page: 346

Download: PDF (206KB)

Products

Page: 349

Download: PDF (130KB)

AES Conventions and Conferences

Page: 352

Download: PDF (180KB)

Extras

Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff