Journal of the AES

2015 May - Volume 63 Number 5


Autonomous Multitrack Equalization Based on Masking Reduction


In multitrack music production, some sounds are masked by others, leaving the listener less able to hear and distinguish the individual sound sources in the mix. The authors designed a simplified measure of masking based on best practices, and then implemented both an offline and a real-time autonomous multitrack equalization system that reduces masking in multitrack audio. The system used objective measures of spectral masking in the resultant mixes. Listening tests provided a subjective comparison between the mix results of different implementations of the system, a raw mix, and manual mixes made by an amateur and a professional mix engineer. The results showed that the autonomous systems reduce both perceived and objective masking. The offline semi-autonomous system improves on the raw mix more than the amateur engineer did and comes close to the professional mix, with the user controlling only a single parameter. The results also suggest that existing objective measures of masking are ill-suited for quantifying perceived masking in multitrack musical audio.

  Download: PDF (HIGH Res) (2.6MB)

  Download: PDF (LOW Res) (317KB)

Nearfield Crosstalk Increases Listener Preferences for Headphone-Reproduced Stereophonic Imagery


Although final mixing and mastering are monitored over loudspeakers, the majority of music listeners use headphones on mobile devices. Preferences for spatial processing depend on the method of reproduction. For a variety of program material reproduced over headphones, listeners often prefer a stereophonic image created by simulating nearfield crosstalk to the biphonic spatial image. This novel approach, called Nearfield Crosstalk Simulation, simulates the crosstalk produced by closely located loudspeakers. Previous work used farfield crosstalk simulation in an effort to produce an enhanced stereophonic effect, but such results were less preferred. The primary difference between the more conventional farfield crosstalk and the novel nearfield crosstalk developed for this study was the introduction of a level and a time difference at low frequencies, consistent with what actually occurs for sound sources very close to a listener’s head.

  Download: PDF (HIGH Res) (3.7MB)

  Download: PDF (LOW Res) (315KB)

Numerical Modeling and Sound Synthesis for Articulated String/Fretboard Interactions


Sound synthesis for fretted instruments, such as the guitar, has a long history and poses numerous challenges: the interaction between the finger, string, and fretboard under variable playing conditions is a delicate one, leading to many subtle features that are difficult to emulate using standard synthesis methods. A physical modeling approach thus becomes an attractive option. A vibrating string is subject to intermittent contact/recontact phenomena along the length of the fretboard; it is driven by a plucking interaction and stopped by a finger, whose position and applied force are gestural parameters. In this research, a finite-difference time-domain method is developed with a penalty potential allowing for a convenient model of distributed collision. Implementation details are discussed, and simulation results and visualizations are presented illustrating a variety of typical playing gestures. Finally, given that such methods for highly nonlinear systems are prone to numerical instability, a brief description of an energy-balanced or Hamiltonian framework is provided, allowing for convenient numerical stability conditions.

  Download: PDF (HIGH Res) (3.5MB)

  Download: PDF (LOW Res) (425KB)

Noise Reduction Performance of Various Signals for Impulse Response Measurement

Open Access



When measuring the acoustic impulse response in a real environment, acoustic and electrical noise corrupts the result. To improve the quality of measurements, researchers often use excitation signals such as a linearly swept sine (SS) wave or a maximum length sequence (MLS) signal because they have high energy at all frequencies. But when the noise is dominated by low-frequency components, the signal-to-noise ratio (S/N) in this region is reduced. In this study, the noise reduction performance (NRP) of different excitation signals was theoretically examined to derive equations that can determine the NRP from the spectra of the measurement signal and noise. From the theoretical and experimental examinations, the following results were obtained. The NRP for white signals and noise-whitening signals is actually the same. Only the minimum noise (MN) signal that minimizes the noise component showed a significant improvement in NRP. A pink spectrum measurement signal showed good NRP in the presence of 1/k^2 spectrum noise, where k is the discrete frequency number, but worse performance with other types of noise. This supports the conclusion that using the MN signal, which has a power spectrum proportional to the square root of the power spectrum of the noise, is the best method of reducing the effect of noise on the measured impulse response.
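
The comparison reported in this abstract can be illustrated with a small numerical sketch. It rests on assumptions that are not taken from the paper: the noise remaining in the deconvolved impulse response at frequency bin k is modeled as N(k)/S(k), where N is the noise power spectrum and S is the power spectrum of the measurement signal; every signal is scaled to the same total energy; and the sum of N(k)/S(k) over all bins stands in for the paper's performance figure. The function names are hypothetical.

```python
import math

def normalize(shape, energy=1.0):
    """Scale a spectral shape so that its bins sum to `energy`."""
    total = sum(shape)
    return [energy * value / total for value in shape]

def residual_noise(signal_power, noise_power):
    """Assumed model: noise left after deconvolution is sum_k N(k) / S(k)."""
    return sum(n / s for s, n in zip(signal_power, noise_power))

K = 512
bins = range(1, K + 1)                  # discrete frequency numbers k = 1..K
noise = [1.0 / k**2 for k in bins]      # 1/k^2 noise power spectrum

# Candidate measurement-signal power spectra, all with the same total energy.
signals = {
    "white": normalize([1.0] * K),                              # flat
    "noise-whitening": normalize(noise),                        # S(k) ~ N(k)
    "pink": normalize([1.0 / k for k in bins]),                 # S(k) ~ 1/k
    "minimum noise": normalize([math.sqrt(n) for n in noise]),  # S(k) ~ sqrt(N(k))
}

residual = {name: residual_noise(s, noise) for name, s in signals.items()}
for name, value in residual.items():
    print(f"{name:16s} residual noise = {value:.2f}")
```

Under this assumed model, the white and noise-whitening signals leave the same residual noise, the minimum noise signal leaves less, and for 1/k^2 noise the pink spectrum coincides with the minimum noise spectrum because 1/k is the square root of 1/k^2.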

  Download: PDF (HIGH Res) (5.2MB)

  Download: PDF (LOW Res) (497KB)

Audio Pattern Recognition of Baby Crying Sound Events


Infants can communicate their internal state (such as pain, hunger, fear, fatigue, or stress) by the nature of their crying. Experts in linguistics suggest that the cry constitutes the first manifestation of speech. This article describes the design methodology for classifying baby crying sound events according to the pathological status of the infant. Such an automated system can be an aid to an attending physician performing a diagnosis. In order to address this challenge, a great variety of audio parameters (Perceptual Linear Prediction, Mel Frequency Cepstral Coefficients, Perceptual Wavelet Packets, Teager Energy Operator, Temporal Modulation) were considered. Classification techniques, including Multilayer Perceptron, Support Vector Machine, Random Forest, Reservoir Network, Gaussian Mixture Model, and Hidden Markov Model, were customized. The goal is to provide an automatic and noninvasive framework for monitoring infants and helping inexperienced/trainee pediatricians, parents, and baby caregivers to identify the baby’s pathological status.
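
Of the audio parameters listed, the Teager Energy Operator has a particularly compact discrete form, psi[n] = x[n]^2 - x[n-1] * x[n+1]. The sketch below is a generic illustration of that standard definition, not the feature extraction used in the article; the function name and the test signal are hypothetical.

```python
import math

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    One value is produced per interior sample, so the output is two
    samples shorter than the input.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Hypothetical test signal: a pure tone with amplitude A and digital
# frequency omega (radians per sample).
A, omega = 0.5, 0.3
tone = [A * math.cos(omega * n) for n in range(200)]
psi = teager_energy(tone)

# For a pure tone the operator is constant and equals (A * sin(omega))^2.
expected = (A * math.sin(omega)) ** 2
print(max(abs(value - expected) for value in psi))
```

Because the output tracks both amplitude and frequency, the operator responds to rapid changes in either, which is one reason it is often used as a feature for non-stationary sounds.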

  Download: PDF (HIGH Res) (4.6MB)

  Download: PDF (LOW Res) (414KB)


56th Conference Report, London

Page: 370

Download: PDF (691KB)

57th Conference Report, Hollywood

Page: 376

Download: PDF (1.3MB)

58th Conference Preview, Aalborg

Page: 386

Download: PDF (948KB)

58th Conference Preliminary Program

Page: 388

Download: PDF (98KB)

59th Conference Preview, Montreal

Page: 392

Download: PDF (416KB)

Immersive Audio, Objects, and Coding


Immersive audio is an increasingly important topic for the audio industry, and object-based treatment is related to it. The perceptual advantages of immersive audio depend strongly on the program material, and various novel coding schemes have emerged to deliver it at low bit rates.

  Download: PDF (429KB)

Special Issue on Music Induced Hearing Disorders, Call for Papers

Page: 407

Download: PDF (57KB)


Section News

Page: 399

Download: PDF (155KB)

President’s Activities Report

Page: 401

Download: PDF (71KB)

Technology Trends in Audio Engineering

Page: 402

Download: PDF (39KB)

Book Review

Page: 403

Download: PDF (60KB)

Products and Developments

Page: 405

Download: PDF (131KB)

AES Conventions and Conferences

Page: 408

Download: PDF (84KB)


Table of Contents

Download: PDF (58KB)

Cover & Sustaining Members List

Download: PDF (41KB)

AES Officers, Committees, Offices & Journal Staff

Download: PDF (58KB)

AES - Audio Engineering Society