AES Journal

Journal of the AES

2017 October - Volume 65 Number 10

Papers

Listener Discrimination Between Common Speaker-Based 3D Audio Reproduction Formats

Authors: Howie, Will; King, Richard; Martin, Denis
Affiliation: The Graduate Program in Sound Recording, McGill University, Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology, Montreal, Quebec, Canada
Page: 796

Over the last decade, numerous three-dimensional audio playback formats have been introduced and standardized for cinema, broadcast, and home theater environments. They differ in the number of speakers, speaker positions in the horizontal and vertical planes, and workflow strategies: channel-based, object-based, or some hybrid of the two. Each system has inherent strengths and weaknesses. This research investigates whether listeners can discriminate among four currently standardized three-dimensional audio formats for reproduction of acoustic music. Double-blind listening tests showed that listeners could discriminate between NHK 22.2 Multichannel Sound (22.2) and several lower-channel-count 3D reproduction formats with a high degree of success, regardless of the musical stimulus. Listeners were also able to discriminate between three relatively similar 3D audio formats: ATSC 11.1, KBS 10.2, and Auro 9.1, although with significantly less success than with 22.2. This suggests that each of these formats delivers a perceptually different listening experience, with 22.2 being particularly different from the other formats under investigation.

Improved Local Mean Decomposition Based on the T-distribution for Feature Extraction of Abnormal Sounds in Public Places

Authors: Li, Weihong; Zhao, Bingxin; Peng, Shuyong; Gong, Weiguo
Affiliation: Key Lab of Optoelectronic Technology and Systems Ministry of Education at Chongqing University, Chongqing, China
Page: 806

Audio-based surveillance systems can be used in public places to detect abnormal events because such events are usually accompanied by abnormal sounds, such as screaming, explosions, gunfire, and crashing sounds. Audio-surveillance systems can supplement video surveillance. This paper shows that a T-distribution model is highly suitable for describing a wide range of typical background noise distributions encountered in public places. Background noise in public places can affect feature extraction for abnormal sounds when Local Mean Decomposition (LMD) is used as a signal-processing tool. The authors first confirm that the background noise obeys a T-distribution using Kolmogorov-Smirnov hypothesis testing. They then propose an improved LMD method based on the T-distribution to enhance feature extraction. They add particular production function components of inhomogeneous random noise obeying a T-distribution to the abnormal sound in a nested manner and then take the ensemble means of the obtained production functions as the decomposition results. This alleviates the mode mixing inherent in LMD. Additionally, the algorithm replaces the moving average operation with a linear spline, reducing the iteration required in LMD from a triple loop to a double loop. Experimental results demonstrate that the proposed method outperforms commonly used methods in terms of classification rate and computational cost.
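
The Kolmogorov-Smirnov check mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the sample size, and the choice of a Student's t-distribution with 2 degrees of freedom (picked only because its CDF has a simple closed form) are all assumptions made for illustration. The abstract does not state which parameters the paper fits.

```python
import math
import random

def t2_cdf(x):
    """CDF of Student's t with 2 degrees of freedom (closed form)."""
    return 0.5 + x / (2.0 * math.sqrt(2.0 + x * x))

def t2_sample(rng):
    """Draw one t(2) sample by inverting the CDF above."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep away from 0 and 1
    a = 2.0 * u - 1.0
    return a * math.sqrt(2.0 / (1.0 - a * a))

def ks_statistic(samples, cdf):
    """One-sample KS statistic: largest gap between empirical and model CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

rng = random.Random(0)
n = 2000
heavy_tailed = [t2_sample(rng) for _ in range(n)]        # matches the model
bounded = [rng.uniform(-0.5, 0.5) for _ in range(n)]     # does not match

d_match = ks_statistic(heavy_tailed, t2_cdf)
d_mismatch = ks_statistic(bounded, t2_cdf)

# Asymptotic 5% critical value for the one-sample test (an approximation).
critical = 1.36 / math.sqrt(n)
print(d_match, d_mismatch, critical)
```

A statistic below the critical value means the hypothesis that the samples follow the model distribution is not rejected at that level; the bounded samples produce a much larger statistic and are rejected.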

Efficient Design of a Parallel Graphic Equalizer

Authors: Bank, Balázs; Belloch, Jose A.; Välimäki, Vesa
Affiliation: Dept. of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary; Dept. of Computer Science and Engineering, Universitat Jaume, Castellón de la Plana, Spain; Acoustics Lab, Dept. of Signal Processing and Acoustics, Aalto University, Espoo, Finland
Page: 817

Accurate design of a parallel graphic equalizer involves constructing a complex target frequency response, obtained by smoothly interpolating between the defined gains using minimum-phase characteristics, followed by a least-squares filter design. This work proposes two methods to simplify the design computations. First, the magnitude and phase response of the target is computed as a combination of minimum-phase basis functions, which makes the total frequency response easier to evaluate. Second, the matrix in the least-squares design is decomposed into the product of an orthogonal matrix Q and an upper triangular matrix R, which simplifies the required matrix inversion. A comparison with the previous method shows that the accuracy of the proposed design method is not significantly compromised, while the computational cost is radically reduced, making the new algorithm highly attractive for interactive audio applications. The method has been tested on an ARM-based system-on-chip (Cortex-A7), which is currently used in many mobile devices. For the weighted parallel equalizer design, the total speedup is a factor of 7. For the more efficient nonweighted designs, the computation of the filter coefficients takes 0.87 ms on the ARM Cortex-A7 processor (a speedup factor of 300 compared to the original method).
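
The role of the QR decomposition in a least-squares design can be sketched generically. The example below is an illustration under stated assumptions, not the paper's algorithm: the matrix `A` and vector `b` are a toy line-fitting problem rather than an equalizer design, the function names are hypothetical, and modified Gram-Schmidt is just one of several ways to compute Q and R. The point is that once A = QR is available, the solution follows from back substitution on R x = Qᵀb, with no explicit matrix inverse.

```python
import math

def qr_gram_schmidt(A):
    """Thin QR of an m x n matrix (list of rows) via modified Gram-Schmidt."""
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(m)]            # j-th column of A
        for k in range(j):
            qk = [Q[i][k] for i in range(m)]
            R[k][j] = sum(qk[i] * v[i] for i in range(m))
            v = [v[i] - R[k][j] * qk[i] for i in range(m)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        for i in range(m):
            Q[i][j] = v[i] / R[j][j]
    return Q, R

def solve_least_squares_qr(A, b):
    """Minimize ||A x - b|| by solving R x = Q^T b with back substitution."""
    Q, R = qr_gram_schmidt(A)
    m, n = len(A), len(A[0])
    y = [sum(Q[i][j] * b[i] for i in range(m)) for j in range(n)]  # Q^T b
    x = [0.0] * n
    for j in reversed(range(n)):
        s = sum(R[j][k] * x[k] for k in range(j + 1, n))
        x[j] = (y[j] - s) / R[j][j]
    return x

# Toy problem: fit y = c0 + c1 * t to four points lying on y = 1 + 2 t.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 3.0, 5.0, 7.0]
coeffs = solve_least_squares_qr(A, b)
print(coeffs)  # approximately [1.0, 2.0]
```

When the matrix stays fixed and only the right-hand side changes, Q and R can be computed once and reused, so each new solve costs only a matrix-vector product and a back substitution.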

Impulsive Disturbances in Audio Archives: Signal Classification for Automatic Restoration

Authors: Brandt, Matthias; Doclo, Simon; Gerkmann, Timo; Bitzer, Joerg
Affiliation: University of Oldenburg, Dept. of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Oldenburg, Germany; University of Hamburg, Dept. of Informatics, Signal Processing Group, Hamburg, Germany; Jade University of Applied Sciences, Oldenburg, Germany
Page: 826

Historic recordings usually have degraded audio quality because of their age, improper storage, and the shortcomings of the original media. One typical problem is the presence of impulsive disturbances. Recordings that suffer from clicks and crackles can be processed by impulse-restoration algorithms to improve their audio quality. This report presents a new algorithm that classifies one-second frames of an audio recording based on the existence of impulsive disturbances. The algorithm uses supervised learning. It is shown that existing impulse-restoration algorithms suffer from degradation of the desired signal if the input SNR is high and if no manual parameter adjustment is possible. This would make automatic restoration of large amounts of diverse archive audio material unfeasible. The proposed classification algorithm can be used as a supplement to an existing impulse-restoration algorithm to alleviate this drawback. An evaluation using a large number of test signals shows that high classification accuracy can be achieved, making automatic impulse restoration possible. Results show that prewhitening of the input signal by means of a phase-only transform serves to increase the detectability of disturbance impulses, which can also be used as a detection enhancement method for impulse-restoration algorithms.
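
The effect of prewhitening on impulse detectability can be illustrated with a small sketch. It assumes that "phase-only transform" means setting every DFT bin to unit magnitude while keeping its phase; the abstract does not spell out the exact procedure, so this is an interpretation rather than the paper's method. The test signal, the frame length of 64 samples, and all function names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def phase_only_transform(x, eps=1e-12):
    """Keep each bin's phase, force its magnitude to 1 (0 for empty bins)."""
    X = dft(x)
    W = [v / abs(v) if abs(v) > eps else 0j for v in X]
    return [v.real for v in idft(W)]

def peak_index(x):
    """Index of the sample with the largest absolute value."""
    return max(range(len(x)), key=lambda i: abs(x[i]))

# A tone of amplitude 1 with a click of amplitude 0.5 at sample 32,
# placed where the tone crosses zero so the click stays below the tone's peaks.
N = 64
signal = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
signal[32] += 0.5

whitened = phase_only_transform(signal)
print(peak_index(signal), peak_index(whitened))
```

In the raw signal the click is hidden below the tone's peaks, so a simple amplitude threshold would miss it. After whitening, the tone's energy is confined to two unit-magnitude bins while the click contributes to all bins, so the largest sample sits at the click position.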

Engineering Reports

A High Resolution and Full-Spherical Head-Related Transfer Function Database for Different Head-Above-Torso Orientations

Open Access

Authors: Brinkmann, Fabian; Lindau, Alexander; Weinzierl, Stefan; Par, Steven van de; Müller-Trapet, Markus; Opdam, Rob; Vorländer, Michael
Affiliation: Audio Communication Group, Technical University Berlin, Germany; Acoustics Group, Cluster of Excellence
Page: 841

Head-related transfer functions (HRTFs) capture the free-field sound transmission from a sound source to the listener's ears, incorporating all the cues for sound localization, such as interaural time and level differences as well as the spectral cues that originate from scattering, diffraction, and reflection on the human pinnae, head, and body. In this study, HRTFs were acoustically measured and numerically simulated for the FABIAN head-and-torso simulator on a full-spherical and high-resolution sampling grid. HRTFs were acquired for 11 horizontal head-above-torso orientations, covering the typical range of motion of ±50°. This made it possible to account for head movements in dynamic binaural auralizations. Because no external reference for the HRTFs was available, measured and simulated data sets were cross-validated by applying auditory models for localization performance and spectral coloration. The results indicate a high degree of similarity between the two data sets regarding all tested aspects, thus suggesting that they are free of systematic errors.

Features

2018 Spatial Reproduction Conference, Call for Contributions, Tokyo

Page: 849

2018 Audio for Virtual and Augmented Reality Conference, Call for Contributions, Redmond

Page: 849

Sound Reinforcement Conference Report, Struer

Page: 850

AES@NAMM Preview, Anaheim

Page: 860

Forensic Audio: ENF and Audio File Authentication

Author: Rumsey, Francis
Page: 863

Audio forensics benefits greatly from the research reported at a recent conference, and work continues on improving the reliability and ease of use of ENF data gathering and analysis. There is also considerable effort going into the authentication of recordings made on mobile devices such as iOS systems. A novel approach to recording authentication based on reverberation analysis was also reported.

2018 Audio Archiving, Preservation, and Restoration Conference, Call for Contributions, Culpeper

Page: 867

Review of Society’s Sustaining Members

Page: 872

New Officers 2017/2018

Page: 888

Departments

Section News

Page: 868

Book Reviews

Page: 869

Obituaries

Page: 871

Products

Page: 886

AES Conventions and Conferences

Page: 892
