AES E-Library Search Results

Displaying 6 matches, all from JAES Volume 65 Issue 4 (April 2017).

An Analysis of Low-Arousal Piano Music Ratings to Uncover What Makes Calm and Sad Music So Difficult to Distinguish in Music Emotion Recognition

Systems that recognize the emotional content of music and systems that provide music recommendations often use a simplified 4-quadrant model with categories such as Happy, Sad, Angry, and Calm. Previous research has shown that both listeners and automated systems often have difficulty distinguishing the low-arousal categories Calm and Sad. This paper explores what makes these categories difficult to distinguish. Ratings of 300 low-arousal excerpts from the classical piano repertoire were used to determine the coverage of the categories Calm and Sad in the low-arousal space, their overlap, and their balance relative to one another. Results show that the coverage of Calm was 40% larger than that of Sad, but on average, Sad excerpts were more strongly negative in mood than Calm excerpts were positive. Calm and Sad overlapped in nearly 20% of the excerpts, meaning those excerpts were rated about equally Calm and Sad. Together, Calm and Sad covered about 92% of the low-arousal space; the largest holes were excerpts considered Mysterious and Doubtful. Due to these holes in coverage, the overlap, and the imbalance, the Calm-Sad model adds about 6% more errors compared to asking listeners directly whether the mood of the music is positive or negative. Nevertheless, the Calm-Sad model is still useful and appropriate for many applications.
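
To make the 4-quadrant model concrete, here is a minimal sketch (not taken from the paper) that maps a (valence, arousal) rating pair to one of the four mood labels. The [-1, 1] rating scale, the zero-valence boundary, and the quadrant() function are assumptions for illustration, not the paper's values.

    # Hypothetical sketch of the 4-quadrant valence-arousal model.
    # Ratings are assumed to lie in [-1, 1]; the paper's actual
    # scale and category boundaries may differ.
    def quadrant(valence: float, arousal: float) -> str:
        """Map a (valence, arousal) rating to a mood quadrant."""
        if arousal >= 0:
            return "Happy" if valence >= 0 else "Angry"
        return "Calm" if valence >= 0 else "Sad"

    # Low-arousal excerpts near zero valence are the ambiguous cases the
    # paper studies: a tiny rating difference flips Calm to Sad.
    print(quadrant(0.05, -0.6))   # -> Calm
    print(quadrant(-0.05, -0.6))  # -> Sad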

Open Access

JAES Volume 65 Issue 4 pp. 304-320; April 2017


Download Now (618 KB)

This paper is Open Access, which means you can download it for free.



Binaural Speech Intelligibility Prediction in the Presence of Multiple Babble Interferers Based on Mutual Information

This paper describes a predictor of binaural speech intelligibility that computes speech reception thresholds (SRTs) without the need for subjective listening tests. Although listening tests are considered the most reliable indicators of performance, they are time consuming and costly. The proposed model computes SRTs in two stages: it first calculates the binaural advantage and then derives the SRTs from the mutual information between the speech and mixture envelopes. The model is evaluated in anechoic conditions and compared with subjective data as well as with the predictions of a baseline binaural speech intelligibility model. Listening tests were conducted with 13 normal-hearing listeners in 15 spatial configurations covering one, two, and three babble interferers. The proposed predictor performs as well as the baseline model in predicting the intelligibility of binaural vowel-consonant-vowel signals contaminated by multiple nonstationary babble noise sources.
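
As a rough sketch of the mutual-information idea, the snippet below estimates MI between the speech and mixture envelopes under a Gaussian assumption, I = -0.5 * log2(1 - rho^2), where rho is the envelope correlation. The paper's filterbank front-end, binaural-advantage stage, and mapping from MI to SRT are not reproduced, and the function names here are hypothetical.

    import numpy as np
    from scipy.signal import hilbert

    def envelope(x):
        """Temporal envelope: magnitude of the analytic signal."""
        return np.abs(hilbert(x))

    def envelope_mutual_information(speech, mixture):
        """Gaussian-approximation MI (bits) between two envelopes:
        I = -0.5 * log2(1 - rho^2), rho = envelope correlation."""
        rho = np.corrcoef(envelope(speech), envelope(mixture))[0, 1]
        rho2 = min(rho * rho, 0.999999)  # guard against log2(0)
        return -0.5 * np.log2(1.0 - rho2)

    # Synthetic check: MI should drop as the interferer level rises.
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)
    for noise_gain in (0.5, 1.0, 2.0):
        mixture = speech + noise_gain * rng.standard_normal(16000)
        print(noise_gain, envelope_mutual_information(speech, mixture))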

JAES Volume 65 Issue 4 pp. 285-292; April 2017


This paper costs $33 for non-members and is free for AES members and E-Library subscribers.



Broadcast and Streaming: Immersive Audio, Objects, and OTT TV

[Feature] “Broadcasting” is now done as much over the internet as over the airwaves. We summarize two workshops presented at the 141st Convention on the latest developments in this field, including delivery of immersive audio to the home and audio for over-the-top (OTT) TV.

JAES Volume 65 Issue 4 pp. 338-341; April 2017


This paper costs $33 for non-members and is free for AES members and E-Library subscribers.



Electrostatic Polarization Process Control: A Case Study on Electret Condenser Microphone Production Line

With the expanding market for low-cost microphones, raising manufacturing yields (lowering rejection rates) has become a central technical issue. This report presents a case study of possible causes of electret microphone rejection. Because the polarization voltage developed on the microphone diaphragm directly affects microphone sensitivity, variations in this voltage were investigated. Initial studies used a fixture plate (with holes for holding the microphones) placed on two base (support) plates, with measurements taken at 23 microphone positions across 11 readings. Acceptable and unacceptable polarization voltages were designated, and the corresponding failure percentage was determined. One base plate was then removed to increase the distance between the top electrode and the diaphragm, and the polarization voltage was measured again. The results showed an appreciable reduction in the polarization voltage, indicating a promising reduction in the microphone failure rate. For a representative electret condenser microphone, a nonlinear relationship between sensitivity and polarization voltage was established. Statistical analysis showed that the measurement data are symmetric and normally distributed. The proposed modification, when implemented on the shop floor, reduced rejection from 33% to 16%.
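
A hypothetical sketch of the bookkeeping such a study involves: designate an acceptable polarization-voltage window, compute the failure percentage, and test the readings for normality. The acceptance limits and simulated voltages below are illustrative stand-ins, not the paper's measurements.

    import numpy as np
    from scipy import stats

    def failure_percentage(voltages, v_lo, v_hi):
        """Percent of microphones whose polarization voltage falls
        outside the acceptable [v_lo, v_hi] window."""
        rejected = (voltages < v_lo) | (voltages > v_hi)
        return 100.0 * rejected.mean()

    # Simulated stand-in for the study's 23 positions x 11 readings.
    rng = np.random.default_rng(1)
    voltages = rng.normal(loc=150.0, scale=20.0, size=(11, 23))

    print(f"failure rate: {failure_percentage(voltages, 120.0, 180.0):.1f}%")

    # One way to check that the data are symmetric and normally
    # distributed, as the abstract reports: a Shapiro-Wilk test.
    stat, p = stats.shapiro(voltages.ravel())
    print(f"Shapiro-Wilk W={stat:.3f}, p={p:.3f}")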

JAES Volume 65 Issue 4 pp. 321-332; April 2017


This paper costs $33 for non-members and is free for AES members and E-Library subscribers.



Personalized Object-Based Audio for Hearing Impaired TV Viewers

Changing age demographics have increased the proportion of the population suffering from some form of hearing loss. The introduction of object-based audio to television broadcasting has the potential to improve the viewing experience for millions of hearing impaired people, since personalization of object-based audio can help viewers overcome difficulties in understanding speech and following the narrative. The research presented here describes a Multi-Dimensional Audio (MDA) implementation of object-based clean audio that presents independent object streams based on object-category elicitation. In evaluations with hearing impaired participants, each participant could independently personalize the audio levels of four object categories using an on-screen menu: speech, music, background effects, and foreground effects related to on-screen events. Results show considerable variation in preferences across subjects; nevertheless, expanding object-category personalization beyond a binary speech/nonspeech categorization can substantially improve the viewing experience for some hearing impaired people.
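
A minimal sketch of what the personalization step can look like: each object-category stream receives an independent user-set gain before summation. The four category names follow the paper; the gain values, signal content, and render_personalized_mix() function are hypothetical.

    import numpy as np

    # The four object categories named in the paper.
    CATEGORIES = ("speech", "music", "background_fx", "foreground_fx")

    def render_personalized_mix(streams, gains_db):
        """Scale each object-category stream by its user gain (dB),
        then sum the streams into one output signal."""
        gains = {c: 10.0 ** (gains_db[c] / 20.0) for c in CATEGORIES}
        return sum(streams[c] * gains[c] for c in CATEGORIES)

    # Example: boost speech, pull music and background effects down.
    rng = np.random.default_rng(2)
    streams = {c: 0.1 * rng.standard_normal(48000) for c in CATEGORIES}
    gains_db = {"speech": +6.0, "music": -9.0,
                "background_fx": -6.0, "foreground_fx": 0.0}
    mix = render_personalized_mix(streams, gains_db)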

Open Access

JAES Volume 65 Issue 4 pp. 293-303; April 2017


Download Now (216 KB)

This paper is Open Access, which means you can download it for free.



Single-Channel Speech Enhancement Based on Psychoacoustic Masking

Speech enhancement processing can improve the performance of speech communication systems in noisy environments, such as mobile communication systems, speech recognition, and hearing aids. Single-channel speech enhancement is more difficult than multichannel enhancement because there is no independent source of information to help separate the speech and noise signals. This paper addresses single-channel speech enhancement based on the masking properties of the human auditory system and presents a complete implementation of speech enhancement using psychoacoustic masking. Incorporating temporal masking alongside simultaneous masking (rather than simultaneous masking alone) produces results that are more consistent with human auditory characteristics. The combined masking is then used to adapt the subtraction parameters to obtain the best trade-off among noise reduction, speech distortion, and the level of residual perceptual noise. Objective measures and subjective listening tests demonstrate that the proposed algorithm outperforms comparable speech enhancement algorithms.
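
A minimal sketch of masking-adapted spectral subtraction, the general technique the abstract names: where the signal is expected to mask residual noise, the over-subtraction factor is relaxed. The masking proxy below is a crude stand-in for the paper's combined simultaneous/temporal masking model, and the parameter values are assumptions.

    import numpy as np
    from scipy.signal import stft, istft

    def enhance(noisy, noise_psd, fs, nperseg=512):
        """Spectral subtraction with a masking-adapted over-subtraction
        factor. noise_psd: per-bin noise power estimate of length
        nperseg // 2 + 1, e.g. averaged over speech-free frames."""
        f, t, X = stft(noisy, fs=fs, nperseg=nperseg)
        power = np.abs(X) ** 2
        npsd = noise_psd[:, None]

        # Crude masking proxy in [0, 1): bins with power well above the
        # noise estimate are assumed to mask residual noise themselves.
        masking = power / (power + npsd)

        # Strong masking -> gentle subtraction; weak masking -> aggressive.
        alpha_min, alpha_max = 1.0, 4.0
        alpha = alpha_max - (alpha_max - alpha_min) * masking

        # Subtract and apply a spectral floor to limit musical noise.
        clean_power = np.maximum(power - alpha * npsd, 0.01 * npsd)
        X_hat = np.sqrt(clean_power) * np.exp(1j * np.angle(X))
        _, enhanced = istft(X_hat, fs=fs, nperseg=nperseg)
        return enhanced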

JAES Volume 65 Issue 4 pp. 272-284; April 2017


This paper costs $33 for non-members and is free for AES members and E-Library subscribers.


