In This Section
Journal of the AES
2014 July/August - Volume 62 Number 7/8
Auditory displays, driven by nonauditory data, are often used to present a sound scene to a listener. Typically, the sound field places sound objects at different locations, but the scene becomes aurally richer if the perceived sonic objects have a spatial extent (size), an approach called volumetric virtual coding. Previous research in virtual-world Directional Audio Coding has shown that spatial extent can be synthesized from monophonic sources by applying a time-frequency-space decomposition, i.e., randomly distributing the time-frequency bins of the source signal among spatial directions. That technique, however, guarantees neither a stable perceived size nor an undegraded timbre. This study explores how to optimize volumetric coding in terms of timbral and spatial perception. For most types of audio, the suggested approach uses an STFT window size of 1024 samples and distributes the frequency bands, from lowest to highest, using the Halton sequence. The results of two formal listening experiments are presented.
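As a concrete illustration of the band-distribution idea, a low-discrepancy Halton sequence can be used to spread STFT frequency bands evenly across a target angular extent. The sketch below is a minimal assumption-laden example, not the paper's algorithm: the extent angle, centering, and band-to-direction mapping are arbitrary choices; only the 1024-sample window comes from the abstract.

```python
import numpy as np

def halton(n, base=2):
    """First n terms of the Halton (van der Corput) sequence for one base."""
    seq = np.zeros(n)
    for i in range(n):
        f, k = 1.0, i + 1
        while k > 0:
            f /= base
            seq[i] += f * (k % base)
            k //= base
    return seq

# Hypothetical mapping: assign each STFT band, lowest to highest, an
# azimuth inside a desired perceived source width (values are assumed).
N_FFT = 1024                                      # window size from the abstract
n_bands = N_FFT // 2 + 1                          # one-sided spectrum
extent_deg = 90.0                                 # assumed target extent
azimuths = (halton(n_bands) - 0.5) * extent_deg   # degrees, centered at 0
```

Because the Halton sequence fills the unit interval quasi-uniformly, neighboring frequency bands land far apart in angle while the overall distribution stays even, which is the property that motivates its use over purely random placement.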
There are two common approaches to the design of IIR filters: (a) match the complex frequency response (magnitude and phase) or, equivalently, the impulse response, or (b) minimize only the magnitude error while ignoring the phase response. This research describes a third approach, called magnitude-priority filter design, which merges the two. Where possible, the algorithm matches the complex response; where it cannot, it transitions to magnitude-only criteria. This is especially useful with quasi-logarithmic frequency-resolution filters, because the frequency-dependent windowing of the impulse response requires the high-frequency attenuation to be corrected. Since the magnitude-priority method makes no assumptions about the core design algorithm, it can be combined with any technique that attempts to match the complex transfer function or impulse response.
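One illustrative guess at how a design target could blend the two criteria: keep the measured complex response at low frequencies and substitute a minimum-phase reconstruction of the magnitude at high frequencies, crossfading between them. This is a sketch of the general idea only, not the paper's algorithm; the split frequency, crossfade width, and function names are assumptions.

```python
import numpy as np

def minimum_phase(mag):
    """Minimum-phase spectrum from a magnitude response via the real cepstrum."""
    n = 2 * (len(mag) - 1)
    logmag = np.log(np.maximum(mag, 1e-12))
    cep = np.fft.irfft(logmag, n)
    cep[1:n // 2] *= 2.0          # fold the cepstrum to make it causal
    cep[n // 2 + 1:] = 0.0
    return np.exp(np.fft.rfft(cep, n))

def magnitude_priority_target(H, f, f_split=2000.0, width=1000.0):
    """Blend the complex response H (low f) with a magnitude-only,
    minimum-phase version (high f). All parameter values are illustrative."""
    Hmp = minimum_phase(np.abs(H))
    w = 0.5 * (1 + np.tanh((f - f_split) / width))   # 0 below split, 1 above
    return (1 - w) * H + w * Hmp
```

The minimum-phase reconstruction is one common way to obtain *a* phase consistent with a given magnitude; any core design method could then be fitted to the blended target.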
Speech Transmission Index for the English Language Verified Under Reverberant Conditions with Two Binaural Listening Methods: Real-Life and Headphones
The speech transmission index (STI) is one of the most widely used standardized methods for the objective prediction of speech intelligibility. This research verifies the current STI for the English language, for male speech, under reverberant conditions using two binaural listening methods: real-life and headphones. For most scenarios, the intelligibility of phonetically balanced word lists was lower for both headphone and real-life listening than indicated by the standard PB/STI curve included in IEC 60268-16. Headphone listening yielded statistically significantly lower intelligibility than real-life listening. The results suggest that under reverberant conditions, for STI values below 0.60, speech intelligibility at a given STI value is more degraded than the standard PB/STI curve suggests.
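For context, the core of the STI calculation maps each measured modulation index m to a transmission index via an effective signal-to-noise ratio, SNR = 10·log10(m/(1−m)), clipped to ±15 dB and scaled to [0, 1]. The sketch below shows that per-band mapping; it deliberately replaces the standard's octave-band weighting and redundancy terms with a plain average, so it is a simplified illustration, not an IEC 60268-16 implementation, and the modulation indices are made up.

```python
import numpy as np

def ti_from_m(m):
    """Transmission index from a modulation index m:
    SNR_eff = 10*log10(m / (1 - m)), clipped to +/-15 dB, mapped to [0, 1]."""
    m = np.asarray(m, dtype=float)
    snr = 10.0 * np.log10(m / (1.0 - m))
    snr = np.clip(snr, -15.0, 15.0)
    return (snr + 15.0) / 30.0

# Simplified sketch: plain average over 7 octave bands x 14 modulation
# frequencies, omitting the standard's band weights (illustrative data).
m = np.full((7, 14), 0.5)
sti = ti_from_m(m).mean()
```

A modulation index of 0.5 corresponds to 0 dB effective SNR and thus a transmission index of exactly 0.5, a convenient sanity check on the mapping.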
When small loudspeakers reproduce audio at high volumes, the resulting degradation requires equalization and linearization to maintain quality, which in turn requires accurate modeling and identification of the loudspeaker. As described previously, a technique based on a polynomial state-space representation can be used for loudspeaker identification because it encompasses both the linear and nonlinear characteristics in one compact model. The nonlinear part, which is added to the linear part, is described by a combination of cross-products of state and input variables. To reduce the complexity of this task, the loudspeaker model was split into two cascaded parts: an electromechanical motor that transforms the input audio into displacement, and a mechano-acoustic component that transforms the displacement into an acoustic wave. In this paper, the authors show that the electrical impedance of the motor can be properly modeled and identified as a Fractional Order (FO) system. Experimental results demonstrate that the FO approach yields both a lower fitting error and a smaller model order than the traditional integer-order approach.
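To make the fractional-order idea concrete: a classic FO description of voice-coil blocked impedance uses a "semi-inductance" term, Z(jω) = Re + K·(jω)^β with 0 < β < 1, which captures the shallower-than-6-dB/octave impedance rise caused by eddy currents. The sketch below evaluates such a model; the parameter values are invented for illustration and are not taken from the paper.

```python
import numpy as np

def fo_impedance(f, Re=6.0, K=0.02, beta=0.7):
    """Fractional-order voice-coil impedance model Z = Re + K*(jw)**beta.
    Re (ohms), K, and beta are illustrative values, not identified data."""
    s = 2j * np.pi * np.asarray(f, dtype=float)
    return Re + K * s**beta       # complex power gives the (jw)^beta term

f = np.logspace(1, 4, 50)         # 10 Hz .. 10 kHz
Z = fo_impedance(f)
```

With a single exponent β, the model interpolates between a resistor (β = 0) and an ideal inductor (β = 1), which is why an FO fit can reach a given error with fewer parameters than stacking integer-order L-R sections.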
When the number of available loudspeakers or transmission channels is smaller than the number of channels in an audio format, downmixing is required. If the channels contain nonaligned but interdependent sounds, the downmixed signal may exhibit perceptible spectral biases, such as comb-filter coloration. A time-frequency-domain, phase-adaptive downmixing technique is proposed to reduce such spectral effects. The technique aligns the phases of input channel pairs or groups with a high measured normalized interchannel coherence prior to the downmix. The target for the phase processing is weighted by the input channel amplitudes, and the phase coefficients are regularized over time and frequency to avoid processing artifacts. Simulations and listening tests show the conditions under which the proposed method provides a benefit over legacy methods, and computational evaluations show that it can run in real time for a large number of channels on reasonable hardware.
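A minimal two-channel sketch of the underlying mechanism, under stated assumptions: coherence is estimated by smoothing the cross-spectrum over frames, and where it exceeds a threshold one channel is rotated onto the other's phase before summing. The smoothing length, threshold, and function name are illustrative choices, and the paper's amplitude weighting and time-frequency regularization are omitted here.

```python
import numpy as np

def phase_adaptive_downmix(L, R, coh_thresh=0.7, n_smooth=8):
    """Phase-aligned stereo-to-mono downmix in the STFT domain (sketch).
    L, R: complex STFTs of shape (bins, frames). Parameters are assumed."""
    def smooth(x):
        k = np.ones(n_smooth) / n_smooth
        return np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, x)

    cross = smooth(L * np.conj(R))                 # smoothed cross-spectrum
    coh = np.abs(cross) / np.maximum(
        np.sqrt(smooth(np.abs(L) ** 2) * smooth(np.abs(R) ** 2)), 1e-12)
    # Where the channels are coherent, rotate R onto L's phase before
    # summing; elsewhere leave R untouched to avoid processing artifacts.
    align = np.where(coh > coh_thresh, np.exp(1j * np.angle(cross)), 1.0)
    return 0.5 * (L + R * align)
```

For two identical signals offset by a constant phase, the naive sum 0.5·(L+R) loses level through comb-filter-like cancellation, while the aligned sum recovers the full amplitude, which is the spectral bias the method targets.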
Standards and Information Documents
AES Standards Committee News
AESSC Roadmap; Serial Multichannel Audio Digital Interface (MADI); XLR connector polarity; audio connectors; audio metadata
136th Convention Report, Berlin
136th Convention Exhibitors and Sponsors
137th Convention Preview, Los Angeles
137th Convention Exhibitor and Sponsor Previews
There has been a resurgence of interest in vinyl record production, with sales continuing to rise year on year. During the 135th Convention a panel of vinyl mastering engineers discussed the dos and don’ts of lacquer cutting and vinyl quality control.
As humans, we have to accept a life with limits: how faint the sounds we can hear, how fast we can run, how much negative criticism we will accept, how fast we are permitted to drive. Sometimes the limits are set by physical or biological laws; other times they are determined by moral, cultural, or behavioral factors. By nature, we like to exceed limits when possible. Exceeding speed limits on the road may cause a traffic accident, or just a ticket. In audio there are certain limits we must obey: we should not exceed 0 dBFS in digital recording level, for example. (Despite this, it would seem that some put all their effort into doing so anyway...) In live concerts, we may meet legislative limits on maximum SPL intended to minimize the risk of hearing damage to concertgoers and employed personnel.
136th Convention papers order form