AES New York 2009
Paper Session P13

P13 - Spatial Audio


Sunday, October 11, 9:00 am — 1:00 pm
Chair: Jean-Marc Jot

P13-1 Microphone Array Optimization for a Hearing Restoration Headset
Marty Johnson, Philip Gillett, Efrain Perini, Alessandro Toso, Virginia Tech - Blacksburg, VA, USA; Daniel Harris, Sennheiser Research Laboratory - Palo Alto, CA, USA
Subjects wearing communications or hearing protection headsets lose the ability to localize sound accurately. Here we describe a hearing restoration headset designed to restore a user’s natural hearing by processing signals from an array of microphones using a filter-and-sum technique and presenting the result to the user via the headset’s speakers. The filters are designed using a phase compensation technique for mapping the microphone array manifolds (or directional transfer functions) onto the target HRTFs. To optimize the performance of the system, a 3-D numerical model of a KEMAR mannequin with headset was built and verified experimentally up to 12 kHz. The numerical model was used to optimize a three-microphone array that demonstrated low reconstruction error up to 12 kHz.
Convention Paper 7910

P13-2 Optimized Parameter Estimation in Directional Audio Coding Using Nested Microphone Arrays
Giovanni Del Galdo, Oliver Thiergart, Fabian Kuech, Maja Taseskma, Divya Sishtla V.N., Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Directional Audio Coding (DirAC) is an efficient technique to capture and reproduce spatial sound on the basis of a downmix audio signal, direction of arrival, and diffuseness of sound. In practice, these parameters are determined using arrays of omnidirectional microphones. The main drawback of such configurations is that the estimates are reliable only in a certain frequency range, which depends on the array size. To overcome this problem and cover large bandwidths, we propose concentric arrays of different sizes. We derive optimal joint estimators of the DirAC parameters with respect to the mean squared error. We address the problem of choosing the optimal array sizes for specific applications such as teleconferencing and we verify our findings with measurements.
Convention Paper 7911
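
As a rough illustration of why the usable frequency range in P13-2 depends on array size, the sketch below computes the half-wavelength spatial aliasing limit for a given microphone spacing and picks, for a frequency of interest, the largest array that does not yet alias. The speed of sound, the spacings, and the hard switch between arrays are assumptions made for this example; the paper derives joint estimators rather than a simple selection rule.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def spatial_aliasing_limit_hz(spacing_m, speed_of_sound=SPEED_OF_SOUND_M_S):
    """Frequency at which the spacing equals half a wavelength."""
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return speed_of_sound / (2.0 * spacing_m)


def pick_array(freq_hz, spacings_m):
    """Return the largest spacing whose aliasing limit is above freq_hz.

    Larger arrays are preferred here because they are less sensitive to
    noise at low frequencies. Returns None if every array would alias.
    """
    usable = [d for d in spacings_m if spatial_aliasing_limit_hz(d) > freq_hz]
    return max(usable) if usable else None


# Hypothetical nested array with 1 cm and 4 cm spacings.
SPACINGS_M = [0.01, 0.04]
for freq in (1000.0, 10000.0, 20000.0):
    print(freq, pick_array(freq, SPACINGS_M))
```

With these assumed spacings the larger array is chosen at 1 kHz, the smaller one at 10 kHz, and neither is usable at 20 kHz.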

P13-3 Modification of HRTF Filters to Reduce Timbral Effects in Binaural Synthesis
Juha Merimaa, Sennheiser Research Laboratory - Palo Alto, CA, USA
Using head-related transfer functions (HRTFs) in binaural synthesis often produces undesired timbral coloration. In this paper a method for designing modified HRTF filters with reduced timbral effects is proposed. The method is based on reducing the variation in the root-mean-square spectral sum of a pair of HRTFs while preserving the interaural time difference and interaural level difference. In formal listening tests it is shown that the coloration due to the tested non-individualized HRTFs can be significantly reduced without altering the resulting localization.
Convention Paper 7912
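
One way to picture the idea in P13-3 is to apply the same real, positive gain to both ears in each frequency bin. That changes the root-mean-square spectral sum of the pair while leaving the level ratio between the ears (ILD) and both phase responses (and hence the ITD) untouched. The sketch below assumes this common-gain approach, a geometric-mean target, and made-up three-bin responses; it is not the filter design procedure from the paper.

```python
import cmath
import math


def rms_spectral_sum(h_left, h_right):
    """Per-bin RMS of the left and right magnitude responses."""
    return [
        math.sqrt((abs(a) ** 2 + abs(b) ** 2) / 2.0)
        for a, b in zip(h_left, h_right)
    ]


def flatten_pair(h_left, h_right, strength=1.0):
    """Scale both ears by a common per-bin gain (assumes no bin is zero).

    strength=1.0 makes the RMS sum flat; smaller values only reduce
    its variation.
    """
    rms = rms_spectral_sum(h_left, h_right)
    # Geometric mean of the RMS sum, used here as the flat target level.
    target = math.exp(sum(math.log(r) for r in rms) / len(rms))
    gains = [(target / r) ** strength for r in rms]
    new_left = [g * a for g, a in zip(gains, h_left)]
    new_right = [g * b for g, b in zip(gains, h_right)]
    return new_left, new_right


def ild_db(h_left, h_right):
    """Interaural level difference per bin, in dB."""
    return [20.0 * math.log10(abs(a) / abs(b)) for a, b in zip(h_left, h_right)]


# Hypothetical three-bin HRTF pair, for illustration only.
H_LEFT = [1.0 + 0j, 2.0 * cmath.exp(0.3j), 0.5 * cmath.exp(-1.0j)]
H_RIGHT = [0.5 + 0j, 1.0 * cmath.exp(0.8j), 0.4 * cmath.exp(-0.2j)]

flat_left, flat_right = flatten_pair(H_LEFT, H_RIGHT)
print(rms_spectral_sum(flat_left, flat_right))  # equal in every bin
print(ild_db(H_LEFT, H_RIGHT))
print(ild_db(flat_left, flat_right))            # same as the line above
```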

P13-4 An Active Multichannel Downmix Enhancement for Minimizing Spatial and Spectral Distortions
Jeffrey Thompson, Aaron Warner, Brandon Smith, DTS, Inc. - Agora Hills, CA, USA
With the continuing growth of multichannel audio formats, the issue of downmixing to legacy formats such as stereo or mono remains an important problem. Traditional downmix methods use fixed downmix coefficients and mixing equations to blindly combine N input channels into M output channels, where N is greater than M. This commonly produces unpredictable and unsatisfactory results due to the dependence of these passive methods on input signal characteristics. In this paper an active downmix enhancement employing frequency domain analysis of key inter-channel spatial cues is described that minimizes various distortions commonly observed in downmixed audio such as spatial inaccuracy, timbre change, signal coloration, and reduced intelligibility.
Convention Paper 7913
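
The dependence of passive downmixing on the input signal, as described in P13-4, can be seen in a small sketch. It mixes five channels to stereo with fixed coefficients and compares the output level when a surround channel carries the same tone as the front channel in phase and with inverted polarity. The -3 dB coefficients and the test tone are assumptions chosen for illustration, not values from the paper.

```python
import math

# Assumed fixed coefficients: a common -3 dB convention, not taken from the paper.
CENTER_GAIN = 1.0 / math.sqrt(2.0)
SURROUND_GAIN = 1.0 / math.sqrt(2.0)


def passive_downmix_5_to_2(left, right, center, left_surround, right_surround):
    """Blindly combine five channels (lists of samples) into two."""
    left_out = [
        l + CENTER_GAIN * c + SURROUND_GAIN * ls
        for l, c, ls in zip(left, center, left_surround)
    ]
    right_out = [
        r + CENTER_GAIN * c + SURROUND_GAIN * rs
        for r, c, rs in zip(right, center, right_surround)
    ]
    return left_out, right_out


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Hypothetical test signal: a 440 Hz tone sampled at 48 kHz.
tone = [math.sin(2.0 * math.pi * 440.0 * n / 48000.0) for n in range(4800)]
silence = [0.0] * len(tone)
inverted = [-s for s in tone]

in_phase, _ = passive_downmix_5_to_2(tone, silence, silence, tone, silence)
out_of_phase, _ = passive_downmix_5_to_2(tone, silence, silence, inverted, silence)

print("in-phase gain:    ", rms(in_phase) / rms(tone))
print("out-of-phase gain:", rms(out_of_phase) / rms(tone))
```

With these coefficients the same tone leaves the downmix at a gain of about 1.71 in the in-phase case and about 0.29 in the out-of-phase case, although the input channel levels are identical.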

P13-5 Physical and Perceptual Properties of Focused Virtual Sources in Wave Field Synthesis
Sascha Spors, Hagen Wierstorf, Matthias Geier, Jens Ahrens, Deutsche Telekom Laboratories, Technische Universität Berlin - Berlin, Germany
Wave field synthesis is a well-established high-resolution spatial sound reproduction technique. Its physical basis allows reproduction of almost any desired wave field, even virtual sources that are positioned in the area between the loudspeakers and the listener. These are known as focused sources. A previous paper has revealed that focused sources have a number of remarkable physical properties, especially in the context of spatial sampling. This paper will further investigate these and other physical artifacts. Additionally, results of perceptual experiments will be discussed in order to offer a conclusion on the perceptual relevance of the derived artifacts in practical implementations.
Convention Paper 7914

P13-6 Localization Curves for a Regularly-Spaced Octagon Loudspeaker Array
Laurent S. R. Simon, Russell Mason, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
Multichannel microphone array designs often use the localization curves that have been derived for 2-0 stereophony. Previous studies showed that side and rear perception of phantom image locations requires somewhat different curves. This paper describes an experiment conducted to evaluate localization curves using an octagon loudspeaker setup. Interchannel level differences were produced between the loudspeaker pairs forming each of the segments of the loudspeaker array, one at a time, and subjects were asked to evaluate the perceived sound event's direction and its locatedness. The results showed that the localization curves derived for 2-0 stereophony are not directly applicable, and that different localization curves are required for each loudspeaker pair.
Convention Paper 7915

P13-7 Fixing the Phantom Center: Diffusing Acoustical Crosstalk
Earl Vickers, STMicroelectronics - Santa Clara, CA, USA
When two loudspeakers play the same signal, a "phantom center" image is produced between the speakers. However, this image differs from one produced by a real center speaker. In particular, acoustical crosstalk produces a comb-filtering effect, with cancellations that may be in the frequency range needed for the intelligibility of speech. We present a method for using phase decorrelation to fill in these gaps and produce a flatter magnitude response, reducing coloration and potentially enhancing dialog clarity. This method also improves headphone compatibility and reduces the tendency of the phantom image to move toward the nearest speaker.
Convention Paper 7916
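
The comb-filtering effect mentioned in P13-7 can be sketched with a simple two-path model: each ear receives the signal from the nearer loudspeaker plus a delayed copy from the farther one, and the two cancel wherever the delay equals an odd number of half periods. The 0.25 ms delay and the equal path gains below are assumptions for illustration; in practice head shadowing attenuates the far path, so the notches are less deep.

```python
import cmath
import math


def crosstalk_magnitude_db(freq_hz, delay_s, far_gain=1.0):
    """Level (dB) at one ear when a direct and a delayed copy of a signal sum."""
    response = 1.0 + far_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    # Clamp to avoid log10(0) at an exact cancellation.
    return 20.0 * math.log10(max(abs(response), 1e-12))


def notch_frequencies(delay_s, max_hz):
    """Cancellation frequencies f = (2k + 1) / (2 * delay) up to max_hz."""
    notches = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= max_hz:
        notches.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return notches


# Assumed extra path delay to the far ear (illustrative, not from the paper).
DELAY_S = 0.00025

print(notch_frequencies(DELAY_S, 8000.0))
for freq in (1000.0, 2000.0, 4000.0):
    print(freq, round(crosstalk_magnitude_db(freq, DELAY_S), 1))
```

Under these assumptions the first cancellation falls near 2 kHz, inside the range that matters for speech intelligibility.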

P13-8 Frequency-Domain Two- to Three-Channel Upmix for Center Channel Derivation and Speech Enhancement
Earl Vickers, STMicroelectronics - Santa Clara, CA, USA
Two- to three-channel audio upmix can be useful in a number of contexts. Adding a front center loudspeaker provides a more stable center image and an increase in dialog clarity. Even in the absence of a physical center loudspeaker, the ability to derive a center channel can facilitate speech enhancement by making it possible to boost or filter the dialog, which is usually panned to the center. Two- to three-channel upmix can also be a first step in upmixing from two to five channels. We propose a frequency-domain upmix process using a vector-based signal decomposition, including methods for improving the selectivity of the center channel extraction. A geometric interpretation of the algorithm is provided. Unlike most existing frequency-domain upmix methods, the current algorithm does not perform an explicit primary/ambient decomposition. This reduces the complexity and improves the quality of the center channel derivation.
Convention Paper 7917