Journal of the AES
2013 December - Volume 61 Number 12
The ISO/MPEG Unified Speech and Audio Coding Standard—Consistent High Quality for All Content Types and at All Bit Rates
With the advent of devices that unite a multitude of functionalities, the industry has an increased demand for an audio codec that can deal equally well with all types of audio content. In early 2012 the ISO/IEC JTC1/SC29/WG11 (MPEG) finalized the new MPEG-D Unified Speech and Audio Coding standard, bringing together the previously separated worlds of general audio and speech coding. It does so by integrating elements from audio coding and speech coding into a unified system over a wide range of bit rates. The present publication outlines all aspects of this standardization effort, starting with the history and motivation of the MPEG work. Technical features of the final system are described. Listening test results and performance numbers show the advantages of the new system over current state-of-the-art codecs.
The localization behavior of panning based on interchannel level difference (ICLD) and interchannel time difference (ICTD) at different target image positions was investigated using musical sources with different spectral and temporal characteristics as well as a wideband speech source. The results indicate that level panning performs robustly regardless of the spectral and temporal characteristics of the source signal, whereas time panning is not suitable for a continuous source with a high fundamental frequency. Differences between the data obtained for different sources were found to be statistically insignificant, and a unified set of ICLD and ICTD values for 10°, 20°, and 30° image positions was derived. Linear level and time panning functions for the two separate panning regions of 0°–20° and 21°–30° are further proposed, and their applicability to arbitrary loudspeaker base angles is also considered. These perceptual panning functions are expected to match predicted and actually perceived image positions more accurately than the theoretical sine or tangent law.
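For reference, the theoretical sine and tangent laws mentioned above can be sketched as follows. This is the textbook form for a symmetric two-loudspeaker setup, not the perceptual panning functions proposed in the paper, whose derived values are not given in this abstract. The function name `icld_db` and its parameters are illustrative assumptions.

```python
import math


def icld_db(target_deg, half_base_deg=30.0, law="tangent"):
    """ICLD in dB predicted by the theoretical sine or tangent law.

    target_deg:    desired image angle from the centre line (0 = centre).
    half_base_deg: half of the loudspeaker base angle (30 for a 60-degree setup).
    Both laws set (g1 - g2) / (g1 + g2) = f(target) / f(half_base),
    with f = tan or f = sin, where g1 is the gain of the nearer loudspeaker.
    """
    if not 0.0 <= target_deg < half_base_deg:
        raise ValueError("target angle must lie in [0, half_base_deg)")
    if law == "tangent":
        f = math.tan
    elif law == "sine":
        f = math.sin
    else:
        raise ValueError("law must be 'tangent' or 'sine'")
    ratio = f(math.radians(target_deg)) / f(math.radians(half_base_deg))
    gain_ratio = (1.0 + ratio) / (1.0 - ratio)  # g1 / g2
    return 20.0 * math.log10(gain_ratio)


if __name__ == "__main__":
    for angle in (0.0, 10.0, 20.0):
        print(angle, round(icld_db(angle), 2), round(icld_db(angle, law="sine"), 2))
```

For any target angle strictly between the centre and the loudspeaker, the sine law predicts a larger level difference than the tangent law. At the loudspeaker position itself both laws require an infinite level difference, which is why the sketch excludes that angle.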
Non-individualized head-related transfer functions (HRTFs) limit the spatial accuracy of conventional side-projection headphones. This research explores the use of a frontal-projection headphone, which customizes the HRTF by introducing idiosyncratic pinna cues. In addition, a robust headphone equalization technique is recommended for frontal-projection headphone playback to preserve the embedded personal pinna cues. Perceptual experiments showed that frontal-projection playback is more effective than conventional headphones, with fewer front-back confusions and improved frontal localization. It was also observed that the individual spectral cues created by the frontal projection are sufficient for front-back discrimination even with the high-frequency pinna cues removed from the non-individual HRTF.
An experimental study investigated the performance of an innovative Wave Field Synthesis (WFS) system in terms of a listener’s ability to localize sound sources in the median plane. Performance was measured by localization accuracy, precision, and response time under a variety of conditions that included two seating positions, five elevation levels, and two spatial precision settings. Localization precision was 6°–9° with only 24 loudspeakers in the WFS system covering the frontal quarter of the upper half sphere of the listening space. Localization performance was good in comparison to other studies with denser loudspeaker arrays or with other reproduction techniques. The implemented 3-D WFS technique is a serious alternative to other state-of-the-art spatialization methods.
An adaptive equalizer can be used to reduce the possibility of acoustic feedback in audio applications involving amplified closed-loop systems, such as public address configurations. The equalizer decreases the gain at those frequencies where feedback is likely to occur. Using a computationally efficient algorithm based on a short-term Fourier transform, the equalization curve is determined from an adaptively estimated feedback path. With the proposed scheme, feedback gain is reduced before ringing and howling artifacts become noticeable, since the estimated feedback path rapidly indicates the frequencies that are vulnerable to excessive feedback. A real-time test illustrates the functionality of the algorithm.
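The idea of lowering gain only at feedback-prone frequencies can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the feedback path estimate is already available as a short impulse response, uses a plain DFT rather than a running short-term transform, and applies a fixed stability margin. The names `dft_magnitudes`, `equalizer_gains`, `forward_gain`, and `margin_db` are hypothetical.

```python
import cmath
import math


def dft_magnitudes(impulse_response, n_bins):
    """Magnitudes of the n_bins-point DFT of impulse_response, bins 0..n_bins//2."""
    mags = []
    for k in range(n_bins // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * n / n_bins)
            for n, x in enumerate(impulse_response)
        )
        mags.append(abs(acc))
    return mags


def equalizer_gains(feedback_ir, forward_gain, n_bins=64, margin_db=6.0):
    """Per-bin linear gains that keep the loop gain below the chosen margin.

    feedback_ir:  estimated loudspeaker-to-microphone impulse response
                  (assumed to be no longer than n_bins samples).
    forward_gain: broadband amplifier gain (linear).
    A bin is left at unity gain when its loop gain is already below the
    limit; otherwise it is attenuated just enough to reach the limit.
    """
    limit = 10.0 ** (-margin_db / 20.0)
    gains = []
    for mag in dft_magnitudes(feedback_ir, n_bins):
        loop_gain = forward_gain * mag
        gains.append(1.0 if loop_gain <= limit else limit / loop_gain)
    return gains


if __name__ == "__main__":
    # Hypothetical feedback path: a 3-sample delay with gain 0.5.
    print(equalizer_gains([0.0, 0.0, 0.0, 0.5], forward_gain=4.0)[:4])
```

In an adaptive system the impulse response estimate would be refreshed continuously, so the gains would be recomputed each time the estimate changes.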
The author describes a general class of polynomial allpole lowpass filters that includes the more common allpole filters as special cases. A method is provided to design lowpass filter prototypes that exactly match the frequency-domain specification with a continuous range of selectivity and passband ripple. For many applications Butterworth filters can be replaced with filters of at least one order lower. This approach may allow the standard 4th-order Butterworth high- and low-pass filters in woofer-tweeter crossover circuits to be replaced by 3rd-order filters.
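As a point of comparison for the Butterworth baseline, the standard Butterworth magnitude response can be evaluated directly. The sketch below shows only the well-known formula |H|² = 1 / (1 + (f/fc)^(2n)); it does not implement the author's filter class, and the function name `butterworth_attenuation_db` is an illustrative assumption.

```python
import math


def butterworth_attenuation_db(freq_ratio, order):
    """Attenuation in dB of an nth-order Butterworth lowpass filter.

    freq_ratio: frequency divided by the -3 dB cutoff frequency.
    order:      filter order n.
    """
    if freq_ratio < 0 or order < 1:
        raise ValueError("freq_ratio must be >= 0 and order >= 1")
    return 10.0 * math.log10(1.0 + freq_ratio ** (2 * order))


if __name__ == "__main__":
    for order in (3, 4):
        # Attenuation at cutoff and one octave above cutoff.
        print(order,
              round(butterworth_attenuation_db(1.0, order), 2),
              round(butterworth_attenuation_db(2.0, order), 2))
```

One octave above cutoff, a 3rd-order Butterworth filter attenuates about 6 dB less than a 4th-order one. That gap indicates the additional selectivity a lower-order replacement would have to provide by other means, such as permitting passband ripple.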
Standards and Information Documents
AES Standards Committee News
Loudspeaker modeling and measurement; microphone measurement and characterization
Call for Awards Nominations
135th Convention Report, New York
135th Convention Exhibitors
The engineering and perceptual aspects of sound field control were discussed in detail by the world’s experts at a recent international conference on the topic. We summarize some of the key challenges and solutions highlighted in selected papers presented at the event, concluding that there is still some way to go before a satisfactory trade-off is achieved between mathematical accuracy and perceptual adequacy.
53rd Conference Program, London
Call for Nominations for Board of Governors
Index to Volume 61