The Journal of the Audio Engineering Society — the official publication of the AES — is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers.
The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; pre- and post-reports of AES conventions and other society activities; news from AES sections around the world; Standards and Education Committee work; and membership news, patents, new products, and newsworthy developments in the field of audio.
Authors: Campbell, William; Paterson, Justin; van der Linde, Ian
Affiliation: Department of Computing and Technology, Anglia Ruskin University, Cambridge, UK; London College of Music, University of West London, London, UK
Some audio experts have suggested that using Dynamic Range Compression (DRC) to increase the loudness of music compromises audio quality. Conversely, other researchers sometimes find that audio subjected to DRC is preferred over uncompressed audio. This research tests the hypothesis that it is DRC configuration, rather than the use of DRC itself, that determines listener preferences. In this study, 130 listeners completed 13 A/B preference trials using pairs of RMS loudness-equalized stimuli with different DRC configurations. By manipulating the point in the mix chain at which DRC was applied, this study supports the hypothesis that listeners prefer music with DRC applied to fewer signals simultaneously (i.e., to individual tracks prior to grouping and summation). Findings also suggest that listeners prefer compression over limiting and that they prefer moderate DRC over none.
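The RMS loudness equalization applied to each stimulus pair can be sketched as follows. This is a minimal illustration with hypothetical NumPy arrays; the study's actual procedure may use a perceptual loudness measure (e.g., ITU-R BS.1770) rather than plain RMS.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal array."""
    return np.sqrt(np.mean(np.square(x)))

def match_rms(stimulus, reference):
    """Scale `stimulus` so its RMS level equals that of `reference`.

    Sketch of loudness equalization for an A/B pair so that any
    preference reflects the DRC configuration, not a level difference.
    """
    gain = rms(reference) / rms(stimulus)
    return stimulus * gain

# Example: the same noise at two levels, equalized back to the same RMS
ref = np.random.default_rng(0).normal(0.0, 0.1, 48000)
loud = ref * 4.0
matched = match_rms(loud, ref)
```

Equalizing RMS (or a perceptual loudness equivalent) is essential in such tests, since listeners reliably prefer the louder of two otherwise identical stimuli.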
Authors: Andreopoulou, Areti; Roginska, Agnieszka
Affiliation: Laboratory of Music Acoustics and Technology (LabMAT), University of Athens, Athens, Greece; Music and Audio Research Lab (MARL), New York University, New York, NY, USA
The effectiveness of binaural reproductions depends on the accuracy of those spatialization cues that are unique to an individual’s personal physiology. These cues are embedded in the Head-Related Transfer Functions (HRTFs), which guide sound from a given source direction to a listener’s ears. This report discusses the design and evaluation of an HRTF database matching system that pairs users to premeasured HRTF sets created from a sparse set of individualized acoustic measurements. The utilized spatial grid, which was derived from an LDA classifier, consisted of 68 filters nonuniformly distributed across 5 elevations from -30° to +30°. A binaural localization study confirmed the original hypothesis that the similarity between subsets of binaural filters can be generalized to represent the relationship of the HRTFs of origin. Analysis of the participant responses provided strong evidence that HRTF database matching is useful. The implementation successfully provided users with alternative HRTF datasets when the similarity of the matched data to the search query exceeded a certain threshold. It was also shown that low-similarity HRTFs can lead to decreased spatial localization accuracy.
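The core of such a database matching system, returning a match only above a similarity threshold, can be sketched as follows. The feature representation, the cosine similarity measure, and all names here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def match_hrtf(query, database, threshold=0.9):
    """Return the index of the best-matching HRTF set, or None.

    `query` and each row of `database` are feature vectors summarizing
    an HRTF set (e.g., magnitude responses sampled on a sparse grid).
    A match is returned only when its cosine similarity exceeds
    `threshold`, mirroring the finding that low-similarity matches
    degrade localization accuracy.
    """
    def cos_sim(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = [cos_sim(query, row) for row in database]
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else None

# Toy database of two "HRTF sets" as 3-D feature vectors
database = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

With this sketch, a query close to the first row is matched to it, while an ambiguous query falls below the threshold and yields no match.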
Authors: Noroozi, Fatemeh; Kaminska, Dorota; Sapinski, Tomasz; Anbarjafari, Gholamreza
Affiliation: Institute of Technology, University of Tartu, Estonia; Institute of Mechatronics and Information Systems, Lodz University of Technology, Poland; iCV Research Group, Institute of Technology, University of Tartu, Estonia; Department of Electrical and Electronic Engineering, Hasan Kalyoncu University, Gaziantep, Turkey
Since people regularly use computers for listening, emotion classification is an important part of human-computer interaction, with various applications in industrial and commercial sectors. This research investigates and compares vocal emotion recognition using three different classifiers: multiclass support vector machine, Adaboost, and random forests. The decisions of these classifiers are then combined using majority voting. The proposed method has been applied to two different emotional databases: the Surrey Audio-Visual Expressed Emotion (SAVEE) Database and the Polish Emotional Speech Database. A vector of 14 features was used to recognize seven basic emotions from the SAVEE database and six emotions from the Polish database. Features extracted from these databases include pitch, intensity, first through fourth formants and their bandwidths, mean autocorrelation, mean noise-to-harmonic ratio, and standard deviation. Recognition rates ranged from 71% to 87%.
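The majority-voting fusion step described above can be sketched as follows. The per-classifier outputs are hypothetical; in the study they would come from the trained SVM, Adaboost, and random-forest models.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label predictions by majority voting.

    `predictions` is a list of equal-length label sequences, one per
    classifier. For each sample, the most frequent label wins; ties
    are broken by the first label encountered.
    """
    fused = []
    for labels in zip(*predictions):
        fused.append(Counter(labels).most_common(1)[0][0])
    return fused

# Hypothetical per-classifier outputs for four utterances
svm = ["happy", "sad", "angry", "neutral"]
ada = ["happy", "sad", "fear", "neutral"]
rf  = ["sad",   "sad", "angry", "happy"]
print(majority_vote([svm, ada, rf]))  # → ['happy', 'sad', 'angry', 'neutral']
```

Fusing several weak-to-moderate classifiers this way typically outperforms any single one, provided their errors are not strongly correlated.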
Authors: Chau, Chuck-jee; Gilburt, Samuel J. M.; Mo, Ronald; Horner, Andrew
Affiliation: The Hong Kong University of Science and Technology, Hong Kong; Newcastle University, Newcastle, UK
This paper investigates how emotional characteristics vary with pitch and dynamics within the bowed string instrument family. Listening tests compared the effects of pitch and dynamics on emotional characteristics of the violin, viola, cello, and double bass. Listeners compared the sounds pairwise over ten emotional categories. Results showed that the emotional characteristics Happy, Heroic, Romantic, Comic, and Calm generally increased with pitch, but decreased at the highest pitches. Angry and Sad generally decreased with pitch. Scary was strong in the extreme low and high registers, while Shy and Mysterious were unaffected by pitch. For dynamics, the results showed that Heroic, Comic, and Angry were stronger for loud notes, while Romantic, Calm, Shy, Sad, and the high register for Happy were stronger for soft notes. Scary and Mysterious were unaffected by dynamics. The results also showed significant differences between different bowed string instruments on notes of the same pitch and dynamic level. The results provide audio engineers and musicians with suggestions for emphasizing emotional characteristics of bowed strings in sound recordings and performances.
Authors: Hendrickx, Etienne; Stitt, Peter; Messonnier, Jean-Christophe; Lyzwa, Jean-Marc; Katz, Brian F.G.; de Boishéraud, Catherine
Affiliation: University of Brest, CNRS, Brest Cedex 3, France; Audio Acoustics Group, LIMSI, CNRS, Université Paris-Saclay, Orsay, France; Conservatoire National Supérieur de Musique et de Danse de Paris, Paris, France; Sorbonne Universités, CNRS, Institut d'Alembert, Paris, France
Several studies have reported a collapse of externalization (the source location is perceived as being inside the head) when listening to binaural content with nonindividualized HRTFs. A previous experiment conducted with experienced subjects revealed that large head movements coupled with a head-tracking device could substantially improve the externalization of a speech stimulus. In the present study, a similar experiment was conducted with subjects having no previous experience with binaural audio. Similar improvements were found. In an additional condition, the roles were reversed: the subjects’ heads remained stationary while the sound sources were automatically moved around the subjects. Results showed that source movements without tracking can also enhance externalization, but to a lesser extent than head-tracked movements. The speech stimulus was a male voice recorded in slightly reverberant conditions with a six-channel microphone array and then “binauralized” over headphones by simulating six virtual loudspeakers around the subject using several nonindividualized HRTFs. It was presented with two different orientations: 0° and 180°.
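Rendering virtual loudspeakers to headphones, as in the "binauralization" described above, amounts to convolving each loudspeaker feed with a left/right pair of head-related impulse responses (HRIRs) and summing. The sketch below is illustrative only; the array shapes, names, and unit-impulse test HRIRs are assumptions, not details from the paper.

```python
import numpy as np

def binauralize(feeds, hrirs):
    """Render multichannel loudspeaker feeds to binaural stereo.

    feeds:  (n_speakers, n_samples) signal for each virtual loudspeaker
    hrirs:  (n_speakers, 2, ir_len) HRIRs, one left/right pair per
            virtual loudspeaker direction
    Returns (2, n_samples + ir_len - 1) headphone signals.
    """
    n_spk, n_samp = feeds.shape
    ir_len = hrirs.shape[2]
    out = np.zeros((2, n_samp + ir_len - 1))
    for s in range(n_spk):
        for ear in (0, 1):
            out[ear] += np.convolve(feeds[s], hrirs[s, ear])
    return out

# Six virtual loudspeakers with unit-impulse "HRIRs" as a sanity check
feeds = np.ones((6, 100))
hrirs = np.zeros((6, 2, 32))
hrirs[:, :, 0] = 1.0
out = binauralize(feeds, hrirs)
```

With head tracking, the HRIR pair for each virtual loudspeaker is updated according to the listener's head orientation before this convolution step.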
Authors: Bellini, Maria Costanza; Collini, Luca; Farina, Angelo; Pinardi, Daniel; Riabova, Kseniia
Affiliation: University of Parma, Dept. of Engineering and Architecture, Parma, Italy
To explore the vibrations of a loudspeaker cone, this research measures, with a Laser Doppler Vibrometer, the axial acceleration at hundreds of points on the cone surface while the loudspeaker is excited by an exponential swept-sine signal. The recorded signal was then transformed into an impulse response by convolution with the matched inverse sweep signal. From the acceleration at each point of the radiating surface, the free-field sound pressure on axis at 1-m distance was computed. The results were comparable to Finite Element Method simulations based only on the linear mechanical behavior of the loudspeaker cone, without any acoustic interference. Moreover, by comparing the laser measurements of many samples, the researchers evaluated how known variations in the loudspeaker components or production process influenced the final performance of the device. Postprocessing of the experimental results was performed using Matlab scripts, which also computed the deflection shapes of the loudspeaker cone.
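The exponential-sweep-plus-inverse-filter deconvolution described above (Farina's method: the inverse is the time-reversed sweep with an exponentially decaying amplitude envelope that compensates the sweep's excess low-frequency energy) can be sketched in NumPy. The parameter values here are illustrative, not those used in the measurements.

```python
import numpy as np

def ess_and_inverse(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 Hz and its inverse filter.

    Convolving a recorded response with `inverse` collapses the sweep
    into an impulse response located near the end of the sweep length.
    """
    t = np.arange(int(duration * fs)) / fs
    R = np.log(f2 / f1)  # log frequency ratio
    sweep = np.sin(2 * np.pi * f1 * duration / R * (np.exp(t * R / duration) - 1))
    # Time-reverse and apply a decaying envelope (-6 dB/octave overall)
    inverse = sweep[::-1] * np.exp(-t * R / duration)
    return sweep, inverse

fs = 48000
sweep, inverse = ess_and_inverse(20.0, 20000.0, 2.0, fs)
# Sanity check: "deconvolving" the sweep itself yields an impulse-like
# response peaking near the center of the full convolution.
ir = np.convolve(sweep, inverse)
peak = int(np.argmax(np.abs(ir)))
```

A key advantage of the exponential sweep is that harmonic distortion products separate out in time ahead of the linear impulse response, so the linear part can be windowed cleanly.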
Authors: Ronan, Malachy; Ward, Nicholas; Sazdov, Robert; Lee, Hyunkook
Affiliation: Digital Media and Arts Research Centre (DMARC), Department of Computer Science and Information Systems, University of Limerick, Ireland; University of Technology Sydney (UTS), Australia; Applied Psychoacoustics Laboratory (APL), University of Huddersfield, Huddersfield, UK
Hyper-compressed popular music is associated with the overuse of dynamic range processing in an effort to gain a competitive advantage in music production. This behavior should be unnecessary given the industry-wide availability of loudness normalization algorithms, and the practice has been denounced by mastering engineers as generating audible artefacts. However, the audibility of these artefacts to mastering engineers has not been examined. This study probes this question using an ABX listening experiment with 20 mastering engineers. On average, mastering engineers correctly discriminated 17 out of 24 conditions, suggesting that the sound quality artefacts generated by hyper-compression are difficult to perceive. The findings suggest that audibility depends on the crest factor (CF) of the music rather than the amount of CF reduction, pointing to the existence of a threshold of audibility.
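The crest factor on which the audibility finding hinges is simply the peak-to-RMS ratio of the signal, usually expressed in dB. A minimal sketch:

```python
import numpy as np

def crest_factor_db(x):
    """Crest factor: ratio of peak to RMS level, in dB.

    Hyper-compression reduces this value; the study suggests
    audibility tracks the absolute CF of the music rather than
    the amount by which CF was reduced.
    """
    peak = np.max(np.abs(x))
    rms = np.sqrt(np.mean(np.square(x)))
    return 20 * np.log10(peak / rms)

# A full-scale sine wave has a crest factor of sqrt(2), i.e., ~3.01 dB
t = np.arange(48000) / 48000
sine = np.sin(2 * np.pi * 440 * t)
print(round(crest_factor_db(sine), 2))  # ≈ 3.01
```

Uncompressed popular-music masters typically sit well above this, while heavily limited masters can approach single-digit crest factors.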
Spatialization and sonification offer a number of exciting possibilities, both for those with impaired senses and those with normal vision and hearing. Soundscapes can take many forms, and there are a number of projects dedicated to determining how they may best be recorded, categorized, and described. Crowd noise and applause are made up of distinctive “grains” of sound that can create a noise-like background with identifiable foreground elements. Research has concentrated on how to analyze and resynthesize these in a way that is perceptually convincing.