AES San Francisco 2012
Poster Session P11
P11 - Spatial Audio
Saturday, October 27, 2:00 pm — 3:30 pm (Foyer)
P11-1 Blind Upmixing for Height and Wide Channels Based on an Image Source Method—Sunwoong Choi, Yonsei University - Seoul, Korea; Dong-il Hyun, Yonsei University - Seoul, Korea; Young-cheol Park, Yonsei University - Wonju, Kwangwon-do, Korea; Seokpil Lee, Korea Electronics Technology Institute (KETI) - Seoul, Korea; Dae Hee Youn, Yonsei University - Seoul, Korea
In this paper we present a method of synthesizing the height and wide channel signals for stereo upmix to multichannel formats beyond 5.1. To provide improved envelopment, reflections from the ceiling and side walls are considered for the height and wide channel synthesis. Early reflections (ERs) corresponding to the spatial sections covered by the height and wide channel speakers are synthesized separately using the image source method, and the parameters for the ER generation are determined from the primary-to-ambient ratio (PAR) estimated from the stereo signal. The synthesized ERs are then mixed with decorrelated ambient signals and sent to the respective channels. Subjective listening tests verify that listener envelopment can be improved by using the proposed method.
Convention Paper 8752
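The image source method named in the P11-1 abstract can be illustrated with a minimal sketch. It assumes a rectangular (shoebox) room with walls at 0 and L on each axis, first-order reflections only, a speed of sound of 343 m/s, and a single broadband absorption coefficient. The function names and parameters are hypothetical, and the PAR-driven parameter selection described in the abstract is not reproduced here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def first_order_images(source, room):
    """Return the six first-order image sources of `source` in a shoebox room.

    `source` is (x, y, z); `room` is (Lx, Ly, Lz) with walls at 0 and L.
    """
    names = (("left", "right"), ("front", "back"), ("floor", "ceiling"))
    images = {}
    for axis, (low_wall, high_wall) in enumerate(names):
        low = list(source)
        low[axis] = -source[axis]  # mirror across the wall at 0
        high = list(source)
        high[axis] = 2.0 * room[axis] - source[axis]  # mirror across the wall at L
        images[low_wall] = tuple(low)
        images[high_wall] = tuple(high)
    return images


def reflection(image, listener, absorption=0.3):
    """Delay in seconds and gain of one reflection.

    The gain combines 1/r spreading with the pressure reflection factor
    sqrt(1 - absorption); the absorption value is an illustrative assumption.
    """
    r = math.dist(image, listener)
    return r / SPEED_OF_SOUND, math.sqrt(1.0 - absorption) / r
```

In this sketch the ceiling and side-wall images would feed the height and wide channels respectively; for a source at (1.0, 2.0, 1.5) in a room 3 m high, the ceiling image lies at (1.0, 2.0, 4.5).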
P11-2 Spatial Sound Design Tool for 22.2 Channel 3-D Audio Productions, with Height—Wieslaw Woszczyk, McGill University - Montreal, QC, Canada; Brett Leonard, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; David Benson, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada
Advanced television and cinema systems utilize multiple loudspeakers distributed in three dimensions, potentially allowing sound sources and ambiances to appear anywhere in the 3-D space enveloping the viewers, as is the case in the 22.2 channel audio format for Ultra High Definition Television (UHDTV). The paper describes a comprehensive tool developed specifically for designing auditory spaces in 22.2 audio but adaptable to any advanced multi-speaker 3-D sound rendering system. The key design goals are the ease of generating and manipulating ambient environments in 3-D and time code automation for creating dynamic spatial narration. The system uses low-latency convolution of high-resolution room impulse responses contained in its library. User testing and evaluation show that the system’s features and architecture enable fast and effective spatial design in 3-D audio.
Convention Paper 8753
P11-3 Efficient Primary-Ambient Decomposition Algorithm for Audio Upmix—Yong-Hyun Baek, Yonsei University - Wonju, Kwangwon-do, Korea; Se-Woon Jeon, Yonsei University - Seoul, Korea; Young-cheol Park, Yonsei University - Wonju, Kwangwon-do, Korea; Seokpil Lee, Korea Electronics Technology Institute (KETI) - Seoul, Korea
Decomposition of a stereo signal into primary and ambient components is a key step in stereo upmixing, and it is often based on principal component analysis (PCA). However, a major shortcoming of the PCA-based method is that the accuracy of the decomposed components depends on both the primary-to-ambient power ratio (PAR) and the panning angle. Previously, a modified PCA was suggested to solve the PAR-dependent problem, but its performance still depends on the panning angle of the primary signal. In this paper we propose a new PCA-based primary-ambient decomposition algorithm whose performance is affected by neither the PAR nor the panning angle. The proposed algorithm finds scale factors based on a criterion that is set to preserve the powers of the mixed components, so that the original primary and ambient powers are correctly retrieved. Simulation results are presented to show the effectiveness of the proposed algorithm.
Convention Paper 8754
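For orientation, the conventional PCA-based decomposition that the P11-3 abstract takes as its starting point can be sketched as follows. This is the baseline technique only, not the proposed algorithm with power-preserving scale factors. The sketch assumes zero-mean stereo frames given as plain lists; the function name is hypothetical.

```python
import math


def pca_primary_ambient(left, right):
    """Baseline PCA split of a stereo frame into primary and ambient parts.

    Assumes zero-mean signals, so raw second moments stand in for covariances.
    Returns the principal direction and per-sample (L, R) pairs for each part.
    """
    n = len(left)
    r_ll = sum(l * l for l in left) / n
    r_rr = sum(r * r for r in right) / n
    r_lr = sum(l * r for l, r in zip(left, right)) / n
    # Angle of the major axis of the 2x2 covariance matrix [[r_ll, r_lr], [r_lr, r_rr]].
    theta = 0.5 * math.atan2(2.0 * r_lr, r_ll - r_rr)
    ux, uy = math.cos(theta), math.sin(theta)
    primary, ambient = [], []
    for l, r in zip(left, right):
        p = l * ux + r * uy  # projection onto the principal direction
        primary.append((p * ux, p * uy))
        ambient.append((l - p * ux, r - p * uy))  # orthogonal residual
    return (ux, uy), primary, ambient
```

For a source that is purely amplitude-panned with no ambience, the principal direction coincides with the panning gains and the ambient residual vanishes. Once uncorrelated ambience is added, the estimated component powers become biased, which is the dependence on PAR and panning angle that the abstract describes.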
P11-4 On the Use of Dynamically Varied Loudspeaker Spacing in Wave Field Synthesis—Rishabh Ranjan, Nanyang Technological University - Singapore, Singapore; Woon-Seng Gan, Nanyang Technological University - Singapore, Singapore
Wave field synthesis (WFS) has evolved as a promising spatial audio rendering technique in recent years and has been widely accepted as the optimal sound reproduction technique. Suppressing spatial aliasing artifacts and accurately reproducing the sound field have remained the focal points of WFS research in recent years. An optimum loudspeaker configuration is necessary to achieve a perceptually correct sound field in the listening space. In this paper we analyze the performance of dynamically spaced loudspeaker arrays whose spacing varies with the frequency content of the audio signal. The proposed technique optimizes the usage of a prearranged set of loudspeaker arrays to avoid spatial aliasing at relatively low frequencies, as compared to the uniformly fixed array spacing in conventional WFS setups.
Convention Paper 8755
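The link between loudspeaker spacing and spatial aliasing that motivates P11-4 can be made concrete with the commonly quoted worst-case bound f_alias = c / (2 Δx) for a linear array. The sketch below assumes that bound and a speed of sound of 343 m/s. The function names are hypothetical, and the paper's actual spacing-selection rule is not reproduced.

```python
def aliasing_frequency(spacing_m, c=343.0):
    """Worst-case spatial aliasing frequency in Hz for a given spacing in meters."""
    return c / (2.0 * spacing_m)


def max_spacing(f_max_hz, c=343.0):
    """Largest spacing in meters that keeps content up to f_max_hz alias-free."""
    return c / (2.0 * f_max_hz)
```

Under this bound, halving the spacing doubles the alias-free bandwidth, so a signal with mostly low-frequency content could in principle be rendered by a sparser subset of the array.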
P11-5 A Simple and Efficient Method for Real-Time Computation and Transformation of Spherical Harmonic-Based Sound Fields—Robert E. Davis, University of the West of Scotland - Paisley, Scotland, UK; D. Fraser Clark, University of the West of Scotland - Paisley, Scotland, UK
The potential for higher order Ambisonics to be applied to audio applications such as virtual reality, live music, and computer games relies entirely on the real-time performance characteristics of the system, as the computational overhead determines factors of latency and, consequently, user experience. Spherical harmonic functions are used to describe the directional information in an Ambisonic sound field, and as the order of the system is increased, so too is the computational expense, due to the added number of spherical harmonic functions to be calculated. The present paper describes a method for simplified implementation and efficient computation of the spherical harmonic functions and applies the technique to the transformation of encoded sound fields. Comparisons between the new method and typical direct calculation methods are presented.
Convention Paper 8756
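As a point of reference for P11-5, the sketch below shows the direct calculation for the simplest case: first-order encoding of a mono sample and a rotation of the encoded field about the vertical axis. It assumes the traditional W, X, Y, Z channel ordering with unit gain on W (normalization conventions differ between Ambisonic formats) and angles in radians. It illustrates the kind of direct computation the paper compares against, not the paper's efficient method; all names are hypothetical.

```python
import math


def num_channels(order):
    """Number of spherical harmonic channels for a full 3-D system of this order."""
    return (order + 1) ** 2


def encode_first_order(sample, azimuth, elevation):
    """Encode one sample into (W, X, Y, Z); unit W gain is an assumed convention."""
    cos_el = math.cos(elevation)
    return (
        sample,
        sample * math.cos(azimuth) * cos_el,
        sample * math.sin(azimuth) * cos_el,
        sample * math.sin(elevation),
    )


def rotate_yaw(bformat, angle):
    """Rotate an encoded first-order field counterclockwise about the vertical axis."""
    w, x, y, z = bformat
    c, s = math.cos(angle), math.sin(angle)
    return (w, x * c - y * s, x * s + y * c, z)
```

Rotating an encoded source by an angle gives the same channels as encoding it at the shifted azimuth. The channel count grows as (N + 1)^2 with order N, which is the computational expense the abstract refers to.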
P11-6 Headphone Virtualization: Improved Localization and Externalization of Non-Individualized HRTFs by Cluster Analysis—Robert P. Tame, DTS, Inc. - Bangor, County Down, UK; QMUL - London, UK; Daniele Barchiese, Queen Mary University of London - London, UK; Anssi Klapuri, Queen Mary University of London - London, UK
Research and experimentation are described that aim to prove the hypothesis that improved localization and externalization can be achieved by allowing a listener to choose a single non-individualized HRTF profile from a subset of maximally different, best representative profiles extracted from a database. k-means cluster analysis of entire impulse responses is used to identify the subset of profiles. Experimentation in a controlled environment shows that test subjects who were offered a choice of a preferred HRTF profile were able to consistently discriminate between a front center and a rear center virtualized sound source 78.6% of the time, compared with 64.3% in a second group given an arbitrary HRTF profile. Similar results were obtained from virtualizations in uncontrolled environments.
Convention Paper 8757
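The clustering step mentioned in the P11-6 abstract can be sketched with a plain k-means routine that groups profiles and then picks, for each cluster, the actual profile nearest its centroid as the representative. The sketch assumes each profile has been flattened into one equal-length vector of impulse-response samples. It uses a naive deterministic initialization (the first k profiles) and Euclidean distance; both are illustrative assumptions rather than the authors' choices, and the function names are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means; returns (centroids, labels). Initialization is the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def representatives(points, centroids):
    """Index of the real profile closest to each centroid."""
    return [
        min(range(len(points)), key=lambda i: math.dist(points[i], c))
        for c in centroids
    ]
```

The indices returned by `representatives` would form the small set of profiles offered to the listener.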
P11-7 Searching Impulse Response Libraries Using Room Acoustic Descriptors—David Benson, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Wieslaw Woszczyk, McGill University - Montreal, QC, Canada
The ease with which Impulse Response (IR) libraries can be searched is a principal determinant of the usability of a convolution reverberation system. Popular software packages for convolution reverb typically permit searching over metadata that describe how and where an IR was measured, but this “how and where” information often fails to adequately characterize the perceptual properties of the reverberation associated with the IR. This paper explores an alternative approach to IR searching based not on “how and where” descriptors but instead on room acoustics descriptors that are thought to be more perceptually relevant. This alternative approach was compared with more traditional approaches on the basis of a simple IR search task. Results are discussed.
Convention Paper 8758
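To illustrate what searching by a room acoustics descriptor might look like for P11-7, the sketch below estimates reverberation time from an impulse response using Schroeder backward integration and then returns the library entry whose estimate is closest to a target. The abstract does not name the descriptors used, so reverberation time is an assumed example. The estimate uses simple threshold crossings at -5 dB and -35 dB instead of a least-squares line fit, and it assumes a nonzero IR whose decay actually reaches -35 dB. All names are hypothetical.

```python
import math


def energy_decay_db(ir):
    """Schroeder backward-integrated energy decay curve, 0 dB at the first sample."""
    total = 0.0
    tail = []
    for sample in reversed(ir):
        total += sample * sample
        tail.append(total)
    tail.reverse()
    return [10.0 * math.log10(e / tail[0]) if e > 0.0 else -math.inf for e in tail]


def reverberation_time(ir, fs):
    """T30-style estimate in seconds: the -5 dB to -35 dB span, extrapolated to 60 dB."""
    edc = energy_decay_db(ir)
    n5 = next(i for i, v in enumerate(edc) if v <= -5.0)
    n35 = next(i for i, v in enumerate(edc) if v <= -35.0)
    return 2.0 * (n35 - n5) / fs


def closest_ir(library, target_rt, fs):
    """Name of the IR in `library` (a dict of name -> samples) nearest the target."""
    return min(
        library,
        key=lambda name: abs(reverberation_time(library[name], fs) - target_rt),
    )
```

A practical search would likely combine several descriptors and weight them, but the nearest-match structure would stay the same.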
P11-8 HRIR Database with Measured Actual Source Direction Data—Javier Gómez Bolaños, Aalto University - Espoo, Finland; Ville Pulkki, Aalto University - Espoo, Finland
A database is presented consisting of head-related impulse responses (HRIRs) of 21 subjects measured in an anechoic chamber with simultaneous measurement of head position and orientation. The HRIR data for sound sources at 1.35 m and 68 cm in 240 directions, with elevations between ±45 degrees and the full azimuth range, were measured using the blocked ear canal method. The measured responses are flat (+0.1 dB / –0.5 dB) from 100 Hz up to 20 kHz. The data are accompanied by the measured azimuth and elevation of the source with respect to the position and orientation of the subject's head, obtained with a tracking system based on infrared cameras. The HRIR data are accessible from the Internet.
Convention Paper 8759
P11-9 On the Study of Frontal-Emitter Headphone to Improve 3-D Audio Playback—Kaushik Sunder, Nanyang Technological University - Singapore, Singapore; Ee-Leng Tan, Nanyang Technological University - Singapore, Singapore; Woon-Seng Gan, Nanyang Technological University - Singapore, Singapore
Virtual audio synthesis and playback through headphones have several inherent limitations, such as front-back confusion and in-head localization of the sound presented to the listener. The use of non-individual head-related transfer functions (HRTFs) further increases front-back confusion and degrades the virtual auditory image. In this paper we present a method for customizing non-individual HRTFs by embedding personal cues using the distinctive morphology of the individual’s ear. We study the frontal projection of sound using headphones to reduce front-back confusion in 3-D audio playback. Additional processing blocks, such as decorrelation and front-back biasing, are implemented to externalize and control the auditory depth of the frontal image. Subjective tests are conducted using these processing blocks, and their impact on localization is reported.
Convention Paper 8760
P11-10 Kinect Application for a Wave Field Synthesis-Based Reproduction System—Michele Gasparini, Università Politecnica delle Marche - Ancona, Italy; Stefania Cecchi, Università Politecnica delle Marche - Ancona, Italy; Laura Romoli, Università Politecnica delle Marche - Ancona, Italy; Andrea Primavera, Università Politecnica delle Marche - Ancona, Italy; Paolo Peretti, Università Politecnica delle Marche - Ancona, Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy
Wave field synthesis is a reproduction technique capable of reproducing a realistic acoustic image by taking advantage of a large number of loudspeakers. In particular, it is possible to reproduce moving sound sources with good performance in terms of sound quality and accuracy. In this context, an efficient application of a wave field synthesis reproduction system is proposed, introducing Kinect-based control in the transmitting room that is capable of accurately tracking the source movement and thus preserving the spatial representation of the acoustic scene. The proposed architecture is implemented using a real-time framework with a network connection between the receiving and transmitting rooms. Several tests have been performed to evaluate the realism of the achieved performance.
Convention Paper 8761