The Journal of the Audio Engineering Society — the official publication of the AES — is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers.
The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; pre- and post-event reports on AES conventions and other society activities; news from AES sections around the world; Standards and Education Committee work; membership news, patents, new products, and newsworthy developments in the field of audio.
Affiliation: Ruhr-University Bochum, Institute of Communication Acoustics, Bochum, Germany
When this author studied acoustics in the 1960s, architectural acoustics, including room acoustics, was dealt with almost exclusively from the point of view of physics. In Germany, for instance, we were taught from the seminal works of Cremer and Kuttruff. Psychoacoustics was not considered hard science but "subjective," and was therefore not taken seriously.
Affiliation: Applied Psychoacoustics Laboratory (APL), University of Huddersfield, Huddersfield, UK
Microphone array techniques for surround sound recording can be broadly classified into two groups: those that attempt to produce continuous phantom imaging around 360° in the horizontal plane and those that treat the front and rear channels separately. The equal segment microphone array (ESMA) is a multichannel microphone technique that attempts to capture a sound field in 360° without any overlap between the stereophonic recording angles of adjacent microphone pairs. This study investigated the optimal microphone spacing for a quadraphonic ESMA using cardioid microphones. Recordings of a speech source were made with ESMAs using four different microphone spacings (0 cm, 24 cm, 30 cm, and 50 cm), based on different psychoacoustic models for microphone array design. Multichannel and binaural stimuli were created with the reproduced sound field rotated in 45° steps. Listening tests were conducted to examine the accuracy of phantom image localization for each microphone spacing, in both loudspeaker and binaural headphone reproduction. The results generally indicated that the 50-cm spacing, derived from an interchannel time and level trade-off model perceptually optimized for a 90° loudspeaker base angle, produced more accurate localization than the 24-cm and 30-cm spacings, which were based on conventional models derived from the standard 60° loudspeaker setup. The 0-cm spacing produced the worst accuracy, with the most frequent bimodal distributions of responses between the front and back regions. These findings are expected to be useful for acoustic recording for virtual reality applications as well as for multichannel surround sound.
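To make the spacing question concrete, the sketch below computes the far-field interchannel time difference (ICTD) and the cardioid-derived interchannel level difference (ICLD) that one adjacent pair of a quadraphonic ESMA would produce for a source at a given angle. It assumes a square array whose adjacent microphones are spaced d apart and aimed ±45° from the pair's center axis, ideal first-order cardioids, and a speed of sound of 343 m/s; the function names are hypothetical. It does not reproduce the psychoacoustic trade-off models used in the study, only the physical differences that such models take as input.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value (roughly room temperature)


def cardioid_gain(source_deg, mic_deg):
    """Ideal first-order cardioid sensitivity: 0.5 * (1 + cos(off-axis angle))."""
    return 0.5 * (1.0 + math.cos(math.radians(source_deg - mic_deg)))


def pair_differences(spacing_m, source_deg, half_angle_deg=45.0):
    """Far-field ICTD (ms) and ICLD (dB) for one adjacent pair of an assumed
    square quadraphonic array. The mics are aimed at +/-half_angle_deg from
    the pair's center axis; source_deg is measured from that axis
    (positive = toward the right-hand mic)."""
    # Path-length difference for a distant source is d * sin(angle).
    ictd_ms = 1000.0 * spacing_m * math.sin(math.radians(source_deg)) / SPEED_OF_SOUND
    right = cardioid_gain(source_deg, half_angle_deg)
    left = cardioid_gain(source_deg, -half_angle_deg)
    icld_db = 20.0 * math.log10(right / left)
    return ictd_ms, icld_db


if __name__ == "__main__":
    # Differences at the edge of a 90-degree segment (source on one mic's axis).
    for spacing in (0.0, 0.24, 0.30, 0.50):
        ictd, icld = pair_differences(spacing, 45.0)
        print(f"{spacing * 100:4.0f} cm: ICTD {ictd:.2f} ms, ICLD {icld:.2f} dB")
```

At the segment edge (45°) every spacing yields the same ICLD of about 6 dB, so the spacing alone determines how much ICTD is added: roughly 0.49 ms, 0.62 ms, and 1.03 ms for 24 cm, 30 cm, and 50 cm, and none for 0 cm.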
Authors: Mitilineos, Stelios A.; Tatlas, Nicolas-Alexander; Potirakis, Stelios M.; Rangoussi, Maria
Affiliation: Department of Electrical and Electronics Engineering, University of West Attica, Athens, Greece
An efficient means of classifying potentially hazardous events using wireless acoustic sensor networks could contribute significantly to the preservation of cultural heritage, artifacts, and architectural sites. However, classifying field-collected sound samples is demanding because omnipresent ambient noise severely degrades the quality of the recorded samples and of the features extracted from them. Building on previous work, the authors present a series of fusion (ensemble learning) techniques that poll a number of artificial neural network classifiers to produce class estimates that are significantly more accurate than those of each individual classifier or their average. Furthermore, the effect of ambient noise is simulated by artificially adding white and pink noise to the available sound samples, producing a wide range of signal-to-noise ratio (SNR) values. Numerical results show that the proposed fusion techniques maintain satisfactory accuracy even at negative SNR values, demonstrating the applicability of the proposed classification platform to real-world applications.
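Two ingredients of this abstract can be sketched under stated assumptions: adding noise at a chosen SNR, and fusing the labels produced by several classifiers. The sketch below scales white or pink noise so that 10·log10(P_signal / P_noise) equals the target SNR; pink noise is approximated here by shaping white noise with a 1/√f amplitude spectrum, which is an assumption rather than the authors' generator. For fusion it uses plurality voting, one common ensemble rule chosen as a stand-in; the abstract does not state which polling schemes the paper uses, and both function names are hypothetical.

```python
from collections import Counter

import numpy as np


def add_noise_at_snr(signal, snr_db, rng, color="white"):
    """Return signal plus white or (approximately) pink noise, scaled so that
    10*log10(P_signal / P_noise) equals snr_db."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    noise = rng.standard_normal(n)
    if color == "pink":
        # Shape white noise to a 1/f power spectrum (1/sqrt(f) amplitude).
        spectrum = np.fft.rfft(noise)
        freqs = np.fft.rfftfreq(n)
        freqs[0] = freqs[1]  # avoid division by zero at DC
        spectrum /= np.sqrt(freqs)
        noise = np.fft.irfft(spectrum, n)
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Rescale the noise power to p_signal / 10^(snr_db / 10).
    noise *= np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + noise


def majority_vote(predictions):
    """Fuse class labels from several classifiers by plurality vote.
    Ties go to the label encountered first (Counter.most_common ordering)."""
    return Counter(predictions).most_common(1)[0][0]
```

Negative SNRs, as in the abstract, simply mean the scaled noise carries more power than the signal, e.g. `add_noise_at_snr(x, -5.0, np.random.default_rng(0), "pink")`.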
Authors: Brandt, Matthias; Doclo, Simon; Bitzer, Joerg
Affiliation: University of Oldenburg, Dept. of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Oldenburg, Germany; Jade University of Applied Sciences, Oldenburg, Germany
The quality of audio recordings is often degraded by various types of disturbances, such as broadband noise, hum, clicks, and crackles. Of these, broadband noise is one of the most frequently occurring, especially in old recordings. Disturbances can be classified as having either a technical or an acoustic origin. This research presents a novel algorithm to estimate the power spectral density (PSD) of stationary broadband noise disturbances in audio recordings. The proposed algorithm estimates the noise PSD as the mean value of an exponential distribution that corresponds to the truncated periodogram coefficients of the disturbed audio signal. A confidence value is computed to reflect the reliability of the noise PSD estimate. Noise PSD estimates with low confidence are rejected to avoid degrading the desired signal when the estimate is used in a noise-reduction algorithm. Experiments with a large database of clean speech and music signals and various artificial and real-world broadband noise disturbances show that the proposed algorithm yields lower PSD estimation errors than the state-of-the-art minimum statistics algorithm over a large range of SNRs. The algorithm allows unsupervised operation and thus constitutes an important part of a fully automatic broadband noise restoration system for audio archives.
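The approach rests on a standard property: for stationary Gaussian noise, the periodogram coefficients in a given frequency bin are (approximately) exponentially distributed with a mean equal to the noise PSD. Discarding large coefficients, which are more likely to contain the desired signal, leaves a right-truncated exponential sample. The sketch below estimates the exponential mean from such a sample by maximum likelihood, which for a truncated exponential reduces to matching the sample mean to E[X | X < T] = lambda - T / (exp(T / lambda) - 1) and solving for lambda by bisection. The truncation point T, the function names, and the choice of this particular estimator are assumptions for illustration; the authors' exact estimator and their confidence measure are not reproduced here.

```python
import math

import numpy as np


def truncated_conditional_mean(lam, t):
    """E[X | X < t] for X ~ Exponential with mean lam."""
    r = t / lam
    if r > 700.0:  # truncation is negligible; also avoids overflow in expm1
        return lam
    return lam - t / math.expm1(r)


def estimate_exponential_mean(samples, t, tol=1e-10):
    """Maximum-likelihood estimate of an exponential mean using only the
    samples below the truncation point t (e.g. periodogram values of one
    frequency bin across frames). Returns math.inf if the kept samples are
    too large on average to be explained by any finite mean."""
    x = np.asarray(samples, dtype=float)
    kept = x[x < t]
    if kept.size == 0:
        raise ValueError("no samples below the truncation point")
    target = float(kept.mean())
    if target >= t / 2:  # E[X | X < t] approaches t/2 only as lam -> infinity
        return math.inf
    lo, hi = 1e-12 * t, t
    # Grow the upper bracket until it covers the target (capped for safety).
    while truncated_conditional_mean(hi, t) < target and hi < 1e12 * t:
        hi *= 2.0
    # The conditional mean increases monotonically with lam, so bisect.
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if truncated_conditional_mean(mid, t) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Simply averaging the kept samples would underestimate the PSD, because truncation removes the upper tail; correcting for that bias is what the conditional-mean equation does.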
Authors: Hrabina, Martin; Sigmund, Milan
Affiliation: Brno University of Technology, Brno, Czech Republic
This paper introduces a specific database of audio events related to the poaching of wild elephants in open nature. Collecting appropriate data with ground truth is generally very time-consuming, and few gunshot databases are available for research. This database contains gunshots, other sounds representing the local acoustic diversity, and mixtures of the two. The variability of the gunshot signals, which is relatively small, and the variability of the extracted features were evaluated statistically. Gunshot detection performance was estimated separately for each of four basic feature sets. The database is suitable for developing methods of automatic gunshot detection in continuous audio signals that can be implemented in low-power remote-monitoring systems. Selected recordings from the database can be downloaded free of charge, and the remaining recordings are available from the authors.
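The abstract does not describe the four feature sets, so the sketch below is only a generic energy-threshold baseline, not the authors' method, showing the kind of lightweight processing a low-power detector might run on continuous audio: it flags frames whose short-time energy rises more than a fixed margin above a running median noise floor. The frame length, margin, and history length are arbitrary assumptions, and the function name is hypothetical.

```python
import numpy as np


def detect_impulsive_events(x, fs, frame_ms=10.0, margin_db=15.0, history=50):
    """Return onset times (s) of frames whose short-time energy exceeds the
    median energy of the preceding `history` frames by more than margin_db."""
    x = np.asarray(x, dtype=float)
    frame = int(fs * frame_ms / 1000)
    n_frames = len(x) // frame
    frames = x[: n_frames * frame].reshape(n_frames, frame)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)

    onsets, active = [], False
    for i in range(1, n_frames):
        # Running noise floor from recent frames; the median resists short bursts.
        floor = np.median(energy_db[max(0, i - history):i])
        loud = energy_db[i] - floor > margin_db
        if loud and not active:  # report only the rising edge of each event
            onsets.append(i * frame / fs)
        active = loud
    return onsets
```

A detector like this yields candidate events only; deciding whether a candidate is actually a gunshot is the classification step for which feature sets such as those evaluated in the paper would be needed.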
Authors: Pearce, Andy; Isabelle, Scott; Francois, Holly; Oh, Eunmi
Affiliation: University of Surrey, Guildford, UK; Amazon, Santa Clara, CA, USA; Samsung Electronics, Staines, UK; Samsung Electronics, Seoul, Republic of Korea
Although predictive models are widely used to predict the results of listening tests, there are currently no standardized statistical metrics for assessing how well they reproduce the rank order of those results. Commonly used rank-order metrics do not take the variance of the listening test data into account. This paper proposes two novel metrics for assessing rank order with respect to variance, adapted from Spearman's rho and Kendall's tau, and evaluates their performance against actual listening test data using standardized prediction models.
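For reference, the two classical statistics that the proposed metrics adapt can be computed as below. The sketch assumes no tied values, so Spearman's rho reduces to 1 - 6·Σd² / (n(n² - 1)) and Kendall's tau needs no tie correction, and it compares model predictions with listening-test mean scores only. That is exactly the limitation the abstract points out: the per-item variance or confidence intervals of the listening-test data play no part in either statistic. The paper's variance-aware adaptations are not reproduced, and the example scores are made up.

```python
from itertools import combinations


def ranks(values):
    """Rank positions (1 = smallest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(x, y):
    """Spearman's rank correlation via the no-ties formula."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    s = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    n = len(x)
    return s / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Hypothetical listening-test means and model predictions for four items.
    listening_means = [20.0, 45.0, 60.0, 80.0]
    predictions = [25.0, 40.0, 70.0, 65.0]
    print(spearman_rho(listening_means, predictions))  # 0.8
    print(kendall_tau(listening_means, predictions))   # 0.666...
```

Swapping the top two items costs the prediction 0.2 in rho and one discordant pair in tau, regardless of whether the listening-test data could actually distinguish those two items.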
We are in a time when machines are increasingly being taught to do work that was previously the domain of humans. That may mean picking apart human perceptual and creative processes so that they can be taught to, or built into, forms of display or control that machines can mediate. Selected papers on recording and production from the 145th Convention are summarized in that light.