2017 July/August - Volume 65 Number 7/8


Listener Preferences for Alternative Dynamic-Range-Compressed Audio Configurations


Some audio experts have suggested that using Dynamic Range Compression (DRC) to increase the loudness of music compromises audio quality. Conversely, other researchers sometimes find that audio subjected to DRC is preferred over uncompressed audio. This research tests the hypothesis that it is the DRC configuration, rather than the use of DRC itself, that determines listener preferences. In this study, 130 listeners completed 13 A/B preference trials using pairs of RMS loudness-equalized stimuli with different DRC configurations. By manipulating the point in the mix chain at which DRC was applied, this study supports the hypothesis that listeners prefer music with DRC applied to fewer signals simultaneously (tracks prior to grouping and summation). Findings also suggest that listeners prefer compression over limiting and moderate DRC over none.
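The RMS loudness equalization mentioned above can be illustrated with a minimal sketch. It assumes a plain linear gain chosen so that two stimuli share the same RMS level; the abstract does not describe the authors' actual procedure, and the function names and test tones are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_rms(samples, target_rms):
    """Scale samples by one linear gain so that their RMS equals target_rms."""
    current = rms(samples)
    if current == 0.0:
        raise ValueError("cannot scale a silent signal")
    gain = target_rms / current
    return [s * gain for s in samples]


# Hypothetical stimuli: the same tone at two different levels.
stimulus_a = [0.5 * math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
stimulus_b = [0.1 * math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]

# After matching, both stimuli have the same RMS level.
stimulus_b_matched = match_rms(stimulus_b, rms(stimulus_a))
```

Equal RMS removes overall level as a cue in an A/B comparison, although it does not guarantee equal perceived loudness.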

Database Matching of Sparsely Measured Head-Related Transfer Functions


The effectiveness of binaural reproductions depends on the accuracy of those spatialization cues that are unique to an individual’s personal physiology. These cues are embedded in the Head-Related Transfer Functions (HRTFs), which guide sound from a given source direction to a listener’s ears. This report discusses the design and evaluation of an HRTF database matching system that pairs users with premeasured HRTF sets on the basis of a sparse set of individualized acoustic measurements. The utilized spatial grid, which was derived from an LDA classifier, consisted of 68 filters nonuniformly distributed across 5 elevations from -30° to +30°. A binaural localization study confirmed the original hypothesis that the similarity between subsets of binaural filters can be generalized to represent the relationship of the HRTFs of origin. Analysis of the participant responses provided strong evidence that HRTF database matching is useful. The designed implementation was successful in providing users with alternative HRTF datasets when the similarity of the matched data to the search query was above a certain similarity threshold. It was also shown that low-similarity HRTFs can lead to decreased spatial localization accuracy.

Supervised Vocal-Based Emotion Recognition Using Multiclass Support Vector Machine, Random Forests, and Adaboost


Since people regularly use computers for listening, emotion classification is an important part of human-computer interaction, which has various applications in industrial and commercial sectors. This research investigates and compares recognizing vocal emotions by three different classifiers: multiclass support vector machine, Adaboost, and random forests. The decisions of these classifiers are then combined using majority voting. The proposed method has been applied to two different emotional databases: the Surrey Audio-Visual Expressed Emotion (SAVEE) Database and the Polish Emotional Speech Database. A vector of 14 features was used in order to recognize seven basic emotions from the SAVEE database and six emotions from the Polish database. Features extracted from these databases include pitch, intensity, first through fourth formants and their bandwidths, mean autocorrelation, mean noise-to-harmonic ratio, and standard deviation. Recognition rates ranged from 71 to 87%.
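The majority-voting step can be sketched as follows. This is an illustration only: the classifier outputs are hypothetical labels, and the tie-breaking rule (fall back to the first-listed classifier when all three disagree) is an assumption, since the abstract does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most classifiers.

    Assumption: on a tie, the label that appears first in `predictions`
    wins, because Counter.most_common keeps first-seen order for equal counts.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical decisions from the SVM, AdaBoost, and random forest classifiers.
agreed = majority_vote(["angry", "sad", "angry"])
all_differ = majority_vote(["happy", "sad", "fear"])
```

Here `agreed` is `"angry"` (two of three votes), while `all_differ` falls back to `"happy"`, the first classifier's label.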

The Emotional Characteristics of Bowed String Instruments with Different Pitch and Dynamics

Open Access



This paper investigates how emotional characteristics vary with pitch and dynamics within the bowed string instrument family. Listening tests compared the effects of pitch and dynamics on emotional characteristics of the violin, viola, cello, and double bass. Listeners compared the sounds pairwise over ten emotional categories. Results showed that the emotional characteristics Happy, Heroic, Romantic, Comic, and Calm generally increased with pitch, but decreased at the highest pitches. Angry and Sad generally decreased with pitch. Scary was strong in the extreme low and high registers, while Shy and Mysterious were unaffected by pitch. For dynamics, the results showed that Heroic, Comic, and Angry were stronger for loud notes, while Romantic, Calm, Shy, Sad, and the high register for Happy were stronger for soft notes. Scary and Mysterious were unaffected by dynamics. The results also showed significant differences between different bowed string instruments on notes of the same pitch and dynamic level. The results provide audio engineers and musicians with suggestions for emphasizing emotional characteristics of bowed strings in sound recordings and performances.

Improvement of Externalization by Listener and Source Movement Using a “Binauralized” Microphone Array


Several studies have reported a collapse of externalization (source location is perceived as being inside the head) when listening to binaural content with nonindividualized HRTFs. A previous experiment conducted with experienced subjects revealed that large head movements coupled with a head-tracking device could substantially improve the externalization of a speech stimulus. In the present study, a similar experiment was conducted with subjects having no previous experience with binaural audio. Similar improvements were found. In an additional condition, the roles were reversed: the subjects’ heads remained stationary while the sound sources were automatically moved around subjects. Results showed that source movements without tracking can also enhance externalization, but to a lesser extent than head-tracked movements. The speech stimulus was a male voice recorded in slightly reverberant conditions with a six-channel microphone array and then “binauralized” over headphones by simulating six virtual loudspeakers around the subject using several nonindividualized HRTFs. It was presented with two different orientations: 0° and 180°.

Engineering Reports

Measurement of Loudspeakers with a Laser Doppler Vibrometer and the Exponential Sine Sweep Excitation Technique


In order to explore the vibrations of a loudspeaker cone, this research uses a Laser Doppler Vibrometer to measure the axial acceleration of hundreds of points on the cone surface while the loudspeaker is excited by an exponential swept sine signal. The recorded signal was then transformed into an impulse response by convolution with the matched inverse sweep signal. From the knowledge of the acceleration at each point of the radiating surface, the free field sound pressure on axis at 1-m distance was computed. This research obtained results comparable to Finite Element Method simulations based only on the linear mechanical behavior of the loudspeaker cone without any acoustic interference. Moreover, by comparing the laser measurements of many samples, the researchers evaluated how known variations in the loudspeaker components or production process influenced the final performance of the device. Postprocessing of experimental results was performed using Matlab scripts, which also computed the deflection shapes of the loudspeaker cone.
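The sweep deconvolution step can be sketched as below. This is a minimal illustration, not the authors' Matlab processing: it assumes the common form of the technique in which the inverse filter is the time-reversed sweep with an exponentially decaying amplitude envelope, and it uses an ideal system (the "recording" is the sweep itself). All function names and parameter values are hypothetical.

```python
import math


def exponential_sweep(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 Hz, lasting `duration` seconds."""
    n = round(duration * fs)
    rate = math.log(f2 / f1)
    k = 2 * math.pi * f1 * duration / rate
    return [math.sin(k * (math.exp(i / (fs * duration) * rate) - 1.0))
            for i in range(n)]


def inverse_sweep(sweep, f1, f2):
    """Time-reversed sweep with a decaying envelope (assumed inverse filter)."""
    n = len(sweep)
    rate = math.log(f2 / f1)
    return [sweep[n - 1 - m] * math.exp(-m * rate / n) for m in range(n)]


def convolve(a, b):
    """Direct linear convolution; adequate for short illustrative signals."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


# Ideal system: the recorded signal equals the excitation sweep.
sweep = exponential_sweep(100.0, 2000.0, 0.1, 8000)
inverse = inverse_sweep(sweep, 100.0, 2000.0)
response = convolve(sweep, inverse)

# The result is an unnormalized, delayed approximation of an impulse.
peak_index = max(range(len(response)), key=lambda i: abs(response[i]))
```

For the ideal system the energy concentrates at a delay of one sweep length. With a measured vibrometer signal in place of `sweep` in the convolution, the same position would mark the start of the linear impulse response.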

The Perception of Hyper-Compression by Mastering Engineers


Hyper-compressed popular music is associated with the overuse of dynamic range processing in an effort to gain a competitive advantage in music production. This behavior should be unnecessary given the availability of loudness normalization algorithms across the industry; the practice has been denounced by mastering engineers as generating audible artefacts. However, the audibility of these artefacts to mastering engineers has not been examined. This study probes this question using an ABX listening experiment with 20 mastering engineers. On average, mastering engineers correctly discriminated 17 out of 24 conditions, suggesting that the sound quality artefacts generated by hyper-compression are difficult to perceive. The findings in the study suggest that audibility depends on the crest factor (CF) of the music rather than the amount of CF reduction, thus proposing the existence of a threshold of audibility.
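For reference, crest factor is conventionally the ratio of a signal's peak amplitude to its RMS level, usually expressed in decibels. The sketch below assumes that definition; the abstract does not state how CF was measured in the study, and hard clipping is used only as a crude stand-in for dynamic range processing.

```python
import math


def crest_factor_db(samples):
    """Peak-to-RMS ratio of a signal in decibels."""
    peak = max(abs(s) for s in samples)
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        raise ValueError("crest factor is undefined for silence")
    return 20.0 * math.log10(peak / math.sqrt(mean_square))


# Hypothetical test signal: five full cycles of a unit-amplitude sine.
sine = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]

# Crude stand-in for heavy limiting: hard-clip the sine at half amplitude.
clipped = [max(-0.5, min(0.5, s)) for s in sine]

cf_sine = crest_factor_db(sine)
cf_clipped = crest_factor_db(clipped)
```

The sine has a crest factor of about 3 dB, and clipping lowers it, which is the sense in which heavy dynamic range processing reduces CF.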

Standards and Information Documents

AES Standards Committee News

142nd Convention Report, Berlin

142nd Convention Exhibitors and Sponsors

143rd Convention Preview, New York

143rd Convention Exhibitor and Sponsor Preview

AES@NAMM, Call for Contributions, Anaheim

144th Convention, Call for Contributions, Milan

Sonification, Assistive Listening, and Soundscapes


Spatialization and sonification offer a number of exciting possibilities, both for those with impaired senses and those with normal vision and hearing. Soundscapes can take many forms, and there are a number of projects dedicated to determining how they may best be recorded, categorized, and described. Crowd noise and applause are made up of distinctive “grains” of sound that can create a noise-like background with identifiable foreground elements. Research has concentrated on how to analyze and resynthesize these in a way that is perceptually convincing.

Call for Papers for JAES Special Issue on Augmented and Participatory Sound and Music Interaction Using Semantic Audio

Call for Papers for JAES Special Issue on High-Resolution Audio

142nd Convention Papers and Ebriefs Abstracts, Berlin

Section News

AES Conventions and Conferences

Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff

