The Journal of the Audio Engineering Society — the official publication of the AES — is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers.
The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; pre- and post-event reports of AES conventions and other society activities; news from AES sections around the world; Standards and Education Committee work; membership news, patents, new products, and newsworthy developments in the field of audio.
Author: Bozena Kostek
Authors:Patole, Rashmika; Rege, Priti
Affiliation:College of Engineering, Pune, India
When an audio recording is used as evidence in litigation and forensic investigations, it needs to be checked thoroughly for authenticity and integrity in order to be admissible, compelling, and decisive evidence in a court of law. An audio recording can be subject to tampering attacks with easy-to-use editing and signal processing tools, thereby undermining its legal value. Artifacts embedded in an audio recording can provide valuable clues about the acoustic environment in which the audio was recorded and allow for the detection of tampering. This paper presents findings of two parallel methodologies: (1) where the features are extracted from the room impulse response and (2) where features are extracted directly from the reverberated recordings. These methods focus on extracting parameters from audio recordings that help distinguish different auditory scenes. Experiments employing an exhaustive set of machine learning classifiers along with different acoustic features were conducted for the classification of auditory environments. A comparative analysis has been carried out to assess the performance of each classifier and the relative performance impact of each feature set in terms of the accuracy of classification. A two-layer Artificial Neural Network (ANN) provided an accuracy of 98.7% using room impulse responses and an accuracy of 99.5% when trained on the reverberated audio recordings.
Download: PDF (HIGH Res) (6.3MB)
Download: PDF (LOW Res) (394KB)
Author:Davis, Andrew R.
Affiliation:Preservation Research and Testing Division, Library of Congress, Washington, DC 20540-4560, USA
The playability and degradation of polyester magnetic media have been an ongoing concern for decades for audio curators, technicians, and hobbyists. As these collections continue to age, users increasingly desire to transfer their contents. However, such a task can be daunting. This report presents a new, rapid, nontechnical tool for evaluating the playability and physical surface of polyester magnetic tapes without needing to place them on playback equipment or use expensive technical instrumentation. Water contact angle, using a microliter-sized droplet, was found to accurately predict the physical playback condition of the vast majority of tapes from a sampling of test tapes from the Library of Congress testing labs. This tool provides an appealingly simple and powerful method to directly probe a tape's physical surface. Results could frequently be interpreted by eye, without needing technical processing equipment or software.
Download: PDF (HIGH Res) (9.0MB)
Download: PDF (LOW Res) (384KB)
Authors:Najnudel, Judy; Hélie, Thomas; Roze, David
Affiliation:Conservation Recherche Team, CNRS-Musée de la Musique and S3AM Team, STMS Laboratory, IRCAM-CNRS-SU; CNRS, S3AM Team, STMS Laboratory, IRCAM-CNRS-SU
Even though digital technology now dominates the audio industry, there is still the need to preserve historic analog machines and instruments. The Onde Martenot, invented in 1928, is an example of a classic electronic musical instrument based on heterodyne processing. This paper describes a simulation of that instrument. In the Onde Martenot, two oscillators generate high-frequency quasi-sinusoidal signals, one of which is fixed and the other is controlled by a player using a sliding ribbon. The sum of these two oscillators is an amplitude-modulated signal whose envelope is detected using a triode vacuum tube. That produces an audible sound with a frequency that is the difference of the two oscillators. The triode vacuum tube in the detector is a nonlinear component that adds harmonics to the signal. This paper focuses on using a power-balanced simulation of its ribbon-controlled oscillator, composed of linear, nonlinear, as well as time-varying components. Numerical experiments on the nonlinear time-varying circuit lead to expected observations: (1) the combination of the triode amplification and the LC-resonator produces a quasi-sinusoidal oscillation with a stable amplitude for a static configuration; (2) the mechanical force produced by the variable capacitor due to the ribbon displacement is undetectable by the musician for over-speed movement; and (3) the latency between the instantaneous frequency and the ribbon position is also undetectable. This corroborates that the Martenot's ribbon-controlled circuit is close to an ideal oscillator.
Download: PDF (HIGH Res) (4.9MB)
Download: PDF (LOW Res) (490KB)
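The heterodyne principle summarized in the Onde Martenot abstract above can be illustrated with a minimal sketch. It assumes two ideal, equal-amplitude cosine oscillators, which is a simplification and not the power-balanced circuit model from the paper; the function names and the example frequencies are hypothetical and chosen only for illustration. Under that assumption, the sum of the two oscillators equals a fast carrier multiplied by a slow envelope, and the envelope repeats at the difference frequency.

```python
import math

def beat_frequency(f_fixed_hz, f_variable_hz):
    """Audible difference frequency of two heterodyned oscillators."""
    return abs(f_fixed_hz - f_variable_hz)

def summed_oscillators(t, f_fixed_hz, f_variable_hz):
    """Sum of two ideal unit-amplitude cosine oscillators at time t (seconds)."""
    return (math.cos(2 * math.pi * f_fixed_hz * t)
            + math.cos(2 * math.pi * f_variable_hz * t))

def envelope(t, f_fixed_hz, f_variable_hz):
    """Envelope of the sum, from cos(a) + cos(b) = 2 cos((a-b)/2) cos((a+b)/2).

    The absolute value makes the envelope periodic at the difference frequency.
    """
    return abs(2 * math.cos(math.pi * (f_fixed_hz - f_variable_hz) * t))

# Illustrative (assumed) values: a fixed 80 kHz oscillator and a
# ribbon-controlled oscillator detuned by 440 Hz.
F_FIXED = 80_000.0
F_VARIABLE = 79_560.0
print(beat_frequency(F_FIXED, F_VARIABLE))
```

In this idealized picture, an envelope detector followed by filtering would recover a tone at the difference frequency; the nonlinear triode detector described in the abstract additionally adds harmonics, which this sketch does not model.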
Authors:McGinnity, Siobhan; Mulder, Johannes; Beach, Elizabeth Francis; Cowan, Robert
Affiliation:The University of Melbourne, Melbourne, Australia; The HEARing CRC, Melbourne, Australia; Murdoch University, Perth, Australia; National Acoustic Laboratories, Sydney, Australia
With the increased recognition of potential damage to listeners when subjected to excessively loud sound, software-based sound level management systems can be viewed as a component of a strategy for reducing sound exposure to patrons and staff in live music venues. However, the use of level management tools in small indoor music venues, which represent a unique environment, has not been systematically explored. In an experimental approach for sound level management, a software system was tried in six indoor live-music venues in Melbourne. Comparing a control (without sound level management software) and the experimental condition (using the software), there was no reduction in mean LAeq,T, although there was a reduction in the number of events with extreme volume levels. Subjective questionnaires indicated that one-fifth of the patrons preferred lower sound levels than they experienced. The findings suggest that modifications to the software system may be necessary if the aim of the system is to reduce patron and staff sound exposure rather than simply to avoid exceeding legislative sound level limits. Recommended alterations could include greater flexibility in choice of target, matching with context of the performance, or changes to the system's visual display so that staying below, not at target, is positively reinforced.
Download: PDF (HIGH Res) (3.4MB)
Download: PDF (LOW Res) (415KB)
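The distinction drawn in the live-music venue abstract above between mean LAeq,T and the number of extreme events is easier to follow with the standard definition of an equivalent continuous level: an energy average, not an arithmetic average, of the level over the measurement period. The sketch below assumes equally spaced short-term A-weighted levels in dB as input; the function name and the sample values are hypothetical and are not data from the study.

```python
import math

def equivalent_level_db(levels_db):
    """Energy-average a sequence of equally spaced sound levels (dB).

    Leq = 10 * log10( mean( 10 ** (L_i / 10) ) )
    """
    if not levels_db:
        raise ValueError("at least one level is required")
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)

# Hypothetical readings: three intervals at 90 dB and one at 100 dB.
readings = [90.0, 90.0, 90.0, 100.0]
print(round(equivalent_level_db(readings), 2))
```

Because the average is taken over energy, a single loud interval raises the result well above the arithmetic mean of the readings (92.5 dB here), which is one reason the count of extreme events and the overall LAeq,T are reported separately.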
Authors:Huang, Qingbo; Liu, Tiejun; Wu, Xihong; Qu, Tianshu
Affiliation:Key Laboratory on Machine Perception (Ministry of Education), Speech and Hearing Research Center, Peking University, Beijing, China; State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Beijing, China
To reduce the burden of storing and transmitting audio signals, they are often compressed with a lossy single-channel codec. Because the high-frequency components are effectively truncated when using a low bitrate encoder, listeners may experience the sound as being uncomfortable, muffled, or dull. To compensate for the perceived degradation, bandwidth extension technology can be used to regenerate the missing high frequencies from the low-frequency components during the decoding process. In this paper the authors propose a bandwidth extension method based on Generative Adversarial Networks (GAN), which is used to estimate the relationship between the MDCT spectrum in the high-frequency part and the low-frequency part. It is evaluated by a discriminant network in the GAN to get a more natural result. A complete audio coding system was built by using AAC Low Complexity as the single-channel core encoder with the proposed bandwidth extension method. To evaluate the audio quality decoded by the new system, a subjective evaluation experiment was carried out using the HE-AAC as the baseline system with the MUSHRA experimental method.
Download: PDF (HIGH Res) (8.3MB)
Download: PDF (LOW Res) (437KB)
Authors:Nowak, Johannes; Fischer, Georg
Affiliation:Electronic Media Technology Laboratory, Technische Universität Ilmenau, Ilmenau, Germany; Fraunhofer Institute for Digital Media Technology IDMT, Ilmenau, Germany
A prominent trend in spatial audio research is the realization of virtual acoustic environments based on binaural technology. This study estimates the perceptual influence of system errors on the binaural reproduction of spherical microphone array data for room simulation applications. Specifically, the impact of spatial aliasing, system noise, and microphone positioning errors is perceptually analyzed in a listening experiment using an auditory model. Perceptual and technical data are related by various predictive modeling techniques, which enable estimating the perceptual strength of system errors. The experimental data comprises spherical array simulations under free-field conditions and in two reflective environments, a dry and a reverberant shoebox-shaped room, using five different audio signals for auralization. Results show that error prediction is possible with high accuracy and low errors using nonlinear modeling techniques such as artificial neural networks.
Download: PDF (HIGH Res) (358KB)
Download: PDF (LOW Res) (167KB)
Authors:Torcoli, Matteo; Freke-Morin, Alex; Paulus, Jouni; Simon, Christian; Shirley, Ben
Affiliation:Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany; Acoustics Research Centre, University of Salford, UK; International Audio Laboratories Erlangen, Germany, a joint institution of Universität Erlangen-Nürnberg and Fraunhofer IIS
In audio production, background ducking facilitates speech intelligibility while allowing the background to fulfill its purpose, e.g., to create ambiance, set the mood, or convey semantic cues. Technical details for recommended ducking practices are not currently documented in the literature. This report first analyzes the common practices found in TV documentaries, and it then describes a listening test that investigated the preferences of 22 normal-hearing participants on the Loudness Difference (LD) between commentary and background during ducking. Highly personal preferences were observed, highlighting the importance of object-based personalization. A statistically significant difference was found between nonexpert and expert listeners. On average, nonexperts preferred LDs that were 4 LU higher than the ones preferred by experts. A statistically significant difference was also found between Commentary over Music (CoM) and Commentary over Ambiance (CoA). Based on the test results, the authors recommend at least 10 LU difference for CoM and at least 15 LU for CoA. Moreover, a computational method based on the Binaural Distortion-Weighted Glimpse Proportion (BiDWGP) was found to match the median preferred LD for each item with good accuracy.
Download: PDF (HIGH Res) (2.0MB)
Download: PDF (LOW Res) (491KB)
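The minimum Loudness Differences recommended in the ducking abstract above (at least 10 LU for CoM and at least 15 LU for CoA) can be turned into a background gain with simple arithmetic. The following is a minimal sketch, not the authors' method: it assumes that the loudness of commentary and background has already been measured (for example in LUFS), and that applying a broadband gain of x dB to the background shifts its loudness by approximately x LU. The function names, dictionary, and example values are hypothetical.

```python
# Minimum loudness differences recommended in the abstract, in LU.
RECOMMENDED_MIN_LD_LU = {"CoM": 10.0, "CoA": 15.0}

def ducking_gain_db(commentary_lufs, background_lufs, kind):
    """Gain (dB, never positive) to apply to the background during commentary.

    kind is "CoM" (Commentary over Music) or "CoA" (Commentary over Ambiance).
    Returns 0.0 when the existing difference already meets the minimum.
    """
    target_ld = RECOMMENDED_MIN_LD_LU[kind]
    current_ld = commentary_lufs - background_lufs
    return min(0.0, current_ld - target_ld)

def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (gain_db / 20)

# Hypothetical mix: commentary at -23 LUFS over music at -27 LUFS (LD = 4 LU).
gain = ducking_gain_db(-23.0, -27.0, "CoM")
print(gain, db_to_linear(gain))
```

Since the abstract reports highly personal preferences and a 4 LU average gap between expert and nonexpert listeners, a fixed table like this would at most serve as a default that object-based personalization could override.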
Room acoustics affect many of the things that audio engineers make or do. Scale models for simulating building acoustics have been given a new look in a potential role as echo chambers for live performance or recording. Modal decay at low frequencies can be evaluated using a clever wavelet transform technique. Finite element models may be employed to support measurements of reverberation time in small rooms. Music performances may vary in tempo when they’re done in different reverberation conditions, but the effects are not entirely predictable. Finally, it may be possible to make a desk screen that is both visually transparent and performs well acoustically.
Download: PDF (436KB)