AES Rome 2013
Spatial Sound Track Event Details

Saturday, May 4, 11:00 — 12:30 (Sala Manzoni)

Workshop: W2 - Applications of 3-D Audio in Automotive

Alan Trevena, Jaguar Land Rover - Gaydon, UK
Oliver Hellmuth, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Michael Kelly, DTS, Inc. - London, UK
Adam Sulowski, Audi AG - Ingolstadt, Germany
Bert Van Daele, Auro Technologies NV - Mol, Belgium

While a number of technologies aim to improve the spatial rendering of recorded sound in automobiles, few offer the advantages, and pose the challenges, of 3-D surround. This workshop will explore theoretical applications and system configurations, as well as the limitations of 3-D surround in automotive applications. Questions such as what the reference experience is, and how a system is evaluated, will be addressed.


Saturday, May 4, 14:15 — 17:15 (Sala Carducci)

Paper Session: P3 - Perception

Frank Melchior, BBC Research and Development - Salford, UK

P3-1 The Relation between Preferred TV Program Loudness, Screen Size, and Display Format
Ian Dash, Consultant - Marrickville, NSW, Australia; Todd Power, University of Sydney - Sydney, NSW, Australia; Densil A. Cabrera, University of Sydney - Sydney, NSW, Australia
The effect of television screen size and display format on preferred TV program loudness was investigated by listening tests using typical program material. While no significant influence on preferred program loudness was observed from screen size or color level, other preference patterns related to soundtrack content type were observed that are of interest.
Convention Paper 8817 (Purchase now)

P3-2 Vibration in Music Perception
Sebastian Merchel, Dresden University of Technology - Dresden, Germany; M. Ercan Altinsoy, Dresden University of Technology - Dresden, Germany
The coupled perception of sound and vibration is a well-known phenomenon during live pop or organ concerts. However, even during a symphonic concert in a classical hall, sound can excite perceivable vibrations on the surface of the body. This study analyzes the influence of audio-induced vibrations on the perceived quality of the concert experience. To this end, sound and seat vibrations are controlled separately in an audio reproduction scenario. Because the correlation between sound and vibration is naturally strong, vibrations are generated from audio recordings using various approaches. Different parameters in this process (frequency and intensity modifications) are examined in relation to their perceptual consequences using psychophysical experiments. It can be concluded that vibrations play a significant role in the perception of music.
Convention Paper 8818 (Purchase now)

P3-3 An Assessment of Virtual Surround Sound Systems for Headphone Listening of 5.1 Multichannel Audio
Chris Pike, BBC Research and Development - Salford, York, UK; Frank Melchior, BBC Research and Development - Salford, UK
It is now common for broadcast signals to feature 5.1 surround sound. It is also increasingly common that audiences access broadcast content on portable devices using headphones. Binaural techniques can be applied to create a spatially enhanced headphone experience from surround sound content. This paper presents a subjective assessment of the sound quality of 12 state-of-the-art systems for converting 5.1 surround sound to a 2-channel signal for headphone listening. A multiple stimulus test was used with hidden reference and anchors; the reference stimulus was an ITU stereo down-mix. Dynamic binaural synthesis, based on individualized binaural room impulse response measurements and head orientation tracking, was also incorporated into the test. The experimental design and detailed analysis of the results are presented in this paper.
Convention Paper 8819 (Purchase now)
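The ITU stereo down-mix used as the reference stimulus above follows the well-known ITU-R BS.775 coefficient convention. A minimal sketch in Python (illustrative only, not the authors' test code; `itu_downmix` is a hypothetical helper name):

```python
import numpy as np

def itu_downmix(L, R, C, Ls, Rs, c=0.7071):
    """ITU-R BS.775-style stereo down-mix of a 5.1 signal (LFE omitted).

    Each argument is a 1-D array of samples; c is the -3 dB
    down-mix coefficient for the center and surround channels.
    Returns the (Lo, Ro) stereo pair.
    """
    Lo = L + c * C + c * Ls
    Ro = R + c * C + c * Rs
    return Lo, Ro
```

Systems under test in such an experiment replace this static matrix with binaural rendering, while the down-mix serves as the unprocessed reference.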

P3-4 Effect of Target Signal Envelope on Direction Discrimination in Spatially Complex Sound Scenarios
Olli Santala, Aalto University School of Electrical Engineering - Aalto, Finland; Marko Takanen, Aalto University School of Electrical Engineering - Aalto, Finland; Ville Pulkki, Aalto University - Aalto, Finland
The temporal envelope of a sound signal has been found to have an effect on localization. Whether this is valid for spatially complex scenarios was addressed by conducting a listening experiment in which a spatially distributed sound source consisted of a target between two interfering noise-like sound sources, all emitting sound simultaneously. All the signals were harmonic complex tones with components within 2 kHz–8.2 kHz and were presented using loudspeaker reproduction in an anechoic chamber. The phases of the harmonic tones of the target signal were altered, causing the envelope to change. The results indicated that prominent peaks in the envelope of the target signal aided in the discrimination of its direction inside the widely distributed sound source.
Convention Paper 8820 (Purchase now)

P3-5 A Framework for Adaptive Real-Time Loudness Control
Andrea Alemanno, Sapienza University of Rome - Rome, Italy; Alessandro Travaglini, Fox International Channels Italy - Guidonia Montecelio (RM), Italy; Simone Scardapane, Sapienza University of Rome - Rome, Italy; Danilo Comminiello, Sapienza University of Rome - Rome, Italy; Aurelio Uncini, Sapienza University of Rome - Rome, Italy
Over the last few years, loudness control has been one of the most frequently investigated topics in audio signal processing. In this paper we describe a framework designed to provide adaptive real-time loudness measurement and processing of audio files and streamed content reproduced by mobile players hosted in laptops, tablets, and mobile phones. The proposed method aims to improve the user's listening experience by normalizing the loudness level of the content in real time, while preserving the creative intent of the original soundtrack. The loudness measurement and adaptation are based on a customization of the High Efficiency Loudness Model algorithm described in AES Convention Paper #8612 ("HELM: High Efficiency Loudness Model for Broadcast Content," presented at the 132nd Convention, April 2012). Technical and subjective tests were performed to evaluate the performance of the proposed method. In addition, the way the subjective test was arranged offered the opportunity to gather information on the preferred Target Level of streamed and media files reproduced on portable devices.
Convention Paper 8821 (Purchase now)
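At its core, loudness normalization of the kind described above reduces to applying a gain that moves the measured program loudness to a target level; a minimal static sketch (the HELM-based adaptive processing in the paper is far more involved, and `normalization_gain` is a hypothetical helper name):

```python
def normalization_gain(measured_lufs, target_lufs=-23.0):
    """Linear gain that moves a program measured at `measured_lufs`
    to `target_lufs` (e.g., the EBU R 128 target of -23 LUFS).
    Loudness is a level in dB, so the linear gain is 10^(dB/20)."""
    return 10.0 ** ((target_lufs - measured_lufs) / 20.0)
```

An adaptive system recomputes the measurement over time and smooths the gain, rather than applying one static correction per file.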

P3-6 The Perception of Masked Sounds and Reverberation in 3-D vs. 2-D Playback Systems
Giulio Cengarle, Imm Sound S.A., a Dolby company - Barcelona, Spain; Alexandre Pereda, Fundació Barcelona Media - Barcelona, Spain
This paper presents studies on perceptual aspects of spatial audio and their dependency on the playback format. The first study concerns the perception of sound in the presence of a masker in stereo, 5.1, and 3-D. Psychoacoustic tests show that the detection threshold improves with the spread of the masker, which justifies the claim that individual elements of dense soundtracks are more audible when they are distributed in a wider panorama. The second study indicates that the perception of the reverberation level does not depend on the playback format. The joint interpretation of these results justifies the claim that, in 3-D, sound engineers can use higher levels of reverberation without compromising the intelligibility of the sound sources.
Convention Paper 8822 (Purchase now)


Sunday, May 5, 09:00 — 13:00 (Sala Carducci)

Paper Session: P6 - Recording and Production

Alex Case, University of Massachusetts—Lowell - Lowell, MA, USA

P6-1 Automated Tonal Balance Enhancement for Audio Mastering Applications
Stylianos-Ioannis Mimilakis, Technological Educational Institute of Ionian Island - Lixouri, Greece; Konstantinos Drossos, Ionian University - Corfu, Greece; Andreas Floros, Ionian University - Corfu, Greece; Dionysios Katerelos, Technological Educational Institute of Ionian Island - Lixouri, Greece
Modern audio mastering procedures involve the selective enhancement or attenuation of specific frequency bands, the main aim being the tonal enhancement of the original, unmastered audio material. This process is mostly based on the musical information and mode of the audio material, which can be retrieved by listening to the original stimuli or from the corresponding musical key notes. The current work presents an adaptive, automated equalization system that performs this mastering procedure based on a novel method of fundamental frequency tracking. In addition, the overall system is evaluated with objective PEAQ analysis and subjective listening tests in real mastering conditions.
Convention Paper 8836 (Purchase now)

P6-2 A Pairwise and Multiple Stimuli Approach to Perceptual Evaluation of Microphone Types
Brecht De Man, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
An essential but complicated task in the audio production process is the selection of microphones that are suitable for a particular source. A microphone is often chosen based on price or common practices, rather than whether the microphone actually sounds best in that particular situation. In this paper we perceptually assess six microphone types for recording a female singer. Listening tests using a pairwise and multiple stimuli approach are conducted to identify the order of preference of these microphone types. The results of this comparison are discussed, and the performance of each approach is assessed.
Convention Paper 8837 (Purchase now)

P6-3 Comparison of Power Supply Pumping of Switch-Mode Audio Power Amplifiers with Resistive Loads and Loudspeakers as Loads
Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark; Lars Press Petersen, Technical University of Denmark - Kgs. Lyngby, Denmark
Power supply pumping is generated by switch-mode audio power amplifiers in half-bridge configuration when they drive energy back into their source. In most designs this leads to a rising rail voltage, which can destroy the decoupling capacitors, the rectifier diodes in the power supply, or the power stage of the amplifier. Amplifier and power supply designers therefore take precautions to avoid these effects. Existing power supply pumping models are based on an ohmic load attached to the amplifier. This paper shows the analytical derivation of the resulting waveforms and extends the model to loudspeaker loads. Measurements verify that the amount of supply pumping is reduced by a factor of four when comparing the nominal resistive load to a loudspeaker. A simplified and more accurate model is proposed, and the influence of supply pumping on the audio performance is shown to be marginal.
Convention Paper 8838 (Purchase now)

P6-4 The Psychoacoustic Testing of the 3-D Multiformat Microphone Array Design and the Basic Isosceles Triangle Structure of the Array and the Loudspeaker Reproduction Configuration
Michael Williams, Sounds of Scotland - Le Perreux sur Marne, France
Optimizing the loudspeaker configuration for 3-D microphone array design can only be achieved with clear knowledge of the psychoacoustic parameters governing the reproduction of height localization. Unfortunately, HRTF characteristics do not seem to give useful information when applied to loudspeaker reproduction. A set of psychoacoustic parameters has to be measured for different configurations in order to design an efficient microphone array recording system, all the more so if a minimalistic approach to array design is a prime objective. In particular, the position of a second layer of loudspeakers with respect to the primary horizontal layer is of fundamental importance and can only be based on the psychoacoustics of height perception. What are the localization characteristics between two loudspeakers situated in the two different layers? Is time difference, as opposed to level difference, a better approach to interlayer localization? This paper will try to answer these questions and also justify the basic isosceles triangle loudspeaker structure, which helps optimize the reproduction of height information.
Convention Paper 8839 (Purchase now)

P6-5 A Perceptual Audio Mixing Device
Michael J. Terrell, Queen Mary University of London - London, UK; Andrew J. R. Simpson, Queen Mary University of London - London, UK; Mark B. Sandler, Queen Mary University of London - London, UK
A method and device is presented that allows novice and expert audio engineers to perform mixing using perceptual controls. In this paper we use Auditory Scene Analysis [Bregman, 1990, MIT Press, Cambridge] to relate the multitrack component signals of a mix to the perception of that mix. We define the multitrack components of a mix as a group of audio streams, which are transformed into sound streams by the act of reproduction, and which are ultimately perceived as auditory streams by the listener. The perceptual controls provide direct manipulation of loudness balance within a mixture of sound streams, as well as the overall mix loudness. The system employs a computational optimization strategy to perform automatic signal gain adjustments to component audio-streams, such that the intended loudness balance of the associated sound-streams is produced. Perceptual mixing is performed using a complete auditory model, based on a model of loudness for time-varying sound streams [Glasberg and Moore, J. Audio Eng. Soc., vol. 50, 331-342 (2002 May)]. The use of the auditory model enables the loudness balance to be automatically maintained regardless of the listening level. Thus, a perceptual definition of the mix is presented that is listening-level independent, and a method of realizing the mix practically is given.
Convention Paper 8840 (Purchase now)

P6-6 On the Use of a Haptic Feedback Device for Sound Source Control in Spatial Audio Systems
Frank Melchior, BBC Research and Development - Salford, UK; Chris Pike, BBC Research and Development - Salford, York, UK; Matthew Brooks, BBC Research and Development - Salford, UK; Stuart Grace, BBC Research and Development - Salford, UK
Next generation spatial audio systems are likely to be capable of 3-D sound reproduction. Systems currently under discussion require the sound designer to position and manipulate sound sources in three dimensions. New intuitive tools, designed to meet the requirements of audio production environments, are needed to make efficient use of this new technology. This paper investigates a haptic feedback controller as a user interface for spatial audio systems. The paper will give an overview of conventional tools and controllers. A prototype has been developed based on the requirements of different tasks and reproduction methods. The implementation will be described in detail and the results of a user evaluation will be given.
Convention Paper 8841 (Purchase now)

P6-7 Audio Level Alignment—Evaluation Method and Performance of EBU R 128 by Analyzing Fader Movements
Jon Allan, Luleå University of Technology - Piteå, Sweden; Jan Berg, Luleå University of Technology - Piteå, Sweden
A method is proposed for evaluating audio meters in terms of how well the engineer conforms to a level alignment recommendation and succeeds in achieving evenly perceived audio levels. The proposed method is used to evaluate different meter implementations, three conforming to the recommendation EBU R 128 and one conforming to EBU Tech 3205-E. In an experiment, engineers participated in a simulated live broadcast show and the resulting fader movements were recorded. The movements were analyzed in terms of different characteristics: fader mean level, fader variability, and fader movement. Significant effects were found, showing that engineers do act differently depending on the meter and recommendation at hand.
Convention Paper 8842 (Purchase now)
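The three fader characteristics named above can be computed from a logged fader trajectory in a few lines; an illustrative reconstruction (the authors' exact definitions may differ, and `fader_stats` is a hypothetical helper name):

```python
import numpy as np

def fader_stats(levels_db):
    """Summarize a logged fader trajectory (one dB value per frame):
    mean level, variability (standard deviation of the level), and
    total movement (sum of absolute frame-to-frame changes)."""
    levels = np.asarray(levels_db, dtype=float)
    return {
        "mean_level": levels.mean(),
        "variability": levels.std(),
        "movement": np.abs(np.diff(levels)).sum(),
    }
```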

P6-8 Balance Preference Testing Utilizing a System of Variable Acoustic Condition
Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Brett Leonard, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Scott Levine, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Grzegorz Sikora, Bang & Olufsen Deutschland GmbH - Pullach, Germany
In the modern world of audio production, there exists a significant disconnect between the music mixing control room of the audio professional and the listening space of the consumer or end user. The goal of this research is to evaluate a series of varying acoustic conditions commonly used in such listening environments. Expert listeners are asked to perform basic balancing tasks, under varying acoustic conditions. The listener can remain in position while motorized panels rotate behind a screen, exposing a different acoustic condition for each trial. Results show that listener fatigue as a variable is thereby eliminated, and the subject’s aural memory is quickly cleared, so that the subject is unable to adapt to the newly presented condition for each trial.
Convention Paper 8843 (Purchase now)


Sunday, May 5, 10:00 — 11:00 (Sala Manzoni)


Tutorial: T5 - 3-D Audio—Produce the New Dimension

Tom Ammermann, New Audio Technology GmbH - Hamburg, Germany

This tutorial will have two parts. The first will be a presentation of new experiences and different ways to handle approaches such as film, music, and radio plays. Complete sessions will be opened, and production experiences, editing, workflows, and options for delivering 3-D/spatial content to the common market will be shown and discussed.

The second part of this tutorial will be a Q&A session in which a group of up to 10 participants can listen to and work with 3-D content via a sophisticated 3-D speaker virtualization on headphones.


Sunday, May 5, 14:00 — 15:00 (Sala Saba)

TC Meeting: Spatial Audio

Technical Committee Meeting on Spatial Audio


Monday, May 6, 09:00 — 12:00 (Sala Manzoni)

Workshop: W7 - Multichannel Immersive Audio Formats for 3-D Cinema and Home Theater

Christof Faller, Illusonic GmbH - Uster, Switzerland
Brian Claypool, Barco
Kimio Hamasaki, NHK Science & Technology Research Laboratories - Setagaya, Tokyo, Japan
Wilfried Van Baelen, Auro Technologies - Mol, Belgium

Several new immersive sound formats are under active consideration for cinema soundtrack production. Each was developed to create realistic sound "motion" and to "immerse" the audience in a more realistic soundfield. This workshop is a repeat of the program presented at AES 133 in San Francisco, bringing together the proponents of four of the leading immersive sound systems to discuss their specific technologies.


Monday, May 6, 09:00 — 13:00 (Sala Carducci)

Paper Session: P12 - Spatial Audio—Part 1: Binaural, HRTF

Michele Geronazzo, University of Padova - Padova, Italy

P12-1 Binaural Ambisonic Decoding with Enhanced Lateral Localization
Tim Collins, University of Birmingham - Birmingham, UK
When rendering an ambisonic recording, a uniform speaker array is often preferred with the number of speakers chosen to suit the ambisonic order. Using this arrangement, localization in the lateral regions can be poor but can be improved by increasing the number of speakers. However, in practice this can lead to undesirable spectral impairment. In this paper a time-domain analysis of the ambisonic decoding problem is presented that highlights how a non-uniform speaker distribution can be used to improve localization without incurring perceptual spectral impairment. This is especially relevant to binaural decoders, where the locations of the virtual speakers are fixed with respect to the head, meaning that the interaction between speakers can be reliably predicted.
Convention Paper 8878 (Purchase now)
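Binaural decoders of the kind discussed above typically render each head-fixed virtual loudspeaker feed through that speaker's pair of head-related impulse responses and sum the results at each ear. A generic sketch of that summation (assumed array shapes and hypothetical function name, not the paper's implementation):

```python
import numpy as np

def binaural_render(feeds, hrirs_left, hrirs_right):
    """Sum virtual-loudspeaker feeds convolved with per-speaker HRIRs.

    feeds: (n_speakers, n_samples) virtual speaker signals.
    hrirs_left, hrirs_right: (n_speakers, hrir_len) impulse responses
    from each virtual speaker position to the left and right ear.
    Returns the (left, right) ear signals.
    """
    n_out = feeds.shape[1] + hrirs_left.shape[1] - 1
    left = np.zeros(n_out)
    right = np.zeros(n_out)
    for feed, h_l, h_r in zip(feeds, hrirs_left, hrirs_right):
        left += np.convolve(feed, h_l)
        right += np.convolve(feed, h_r)
    return left, right
```

Because the virtual speaker positions are fixed relative to the head, the speaker-to-ear responses never change, which is what makes the inter-speaker interaction predictable in the binaural case.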

P12-2 A Cluster and Subjective Selection-Based HRTF Customization Scheme for Improving Binaural Reproduction of 5.1 Channel Surround Sound
Bosun Xie, South China University of Technology - Guangzhou, China; Chengyun Zhang, South China University of Technology - Guangzhou, China; Xiaoli Zhong, South China University of Technology - Guangzhou, China
This work proposes a cluster and subjective selection-based HRTF customization scheme for improving binaural reproduction of 5.1 channel surround sound. Based on similarity of HRTFs from an HRTF database with 52 subjects, a cluster analysis on HRTF magnitudes is applied. Results indicate that HRTFs of most subjects can be classified into seven clusters and represented by the corresponding cluster centers. Subsequently, HRTFs used in binaural 5.1 channel reproduction are customized from the seven cluster centers by means of subjective selection, i.e., a subjective selection-based customization scheme. Psychoacoustic experiments indicate that the proposed scheme partly improves the localization performance in the binaural 5.1 channel surround sound.
Convention Paper 8879 (Purchase now)

P12-3 Spatially Oriented Format for Acoustics: A Data Exchange Format Representing Head-Related Transfer Functions
Piotr Majdak, Austrian Academy of Sciences - Vienna, Austria; Yukio Iwaya, Tohoku Gakuin University - Tagajo, Japan; Thibaut Carpentier, UMR STMS IRCAM-CNRS-UPMC - Paris, France; Rozenn Nicol, Orange Labs, France Telecom - Lannion, France; Matthieu Parmentier, France Television - Paris, France; Agnieszka Roginska, New York University - New York, NY, USA; Yoiti Suzuki, Tohoku University - Sendai, Japan; Kanji Watanabe, Akita Prefectural University - Yuri-Honjo, Japan; Hagen Wierstorf, Technische Universität Berlin - Berlin, Germany; Harald Ziegelwanger, Austrian Academy of Sciences - Vienna, Austria; Markus Noisternig, UMR STMS IRCAM-CNRS-UPMC - Paris, France
Head-related transfer functions (HRTFs) describe the spatial filtering of incoming sound. Currently available HRTFs are stored in various formats, making exchange difficult because of incompatibilities between the formats. We propose a format for storing HRTFs with a focus on interchangeability and extensibility. The spatially oriented format for acoustics (SOFA) aims at representing HRTFs in a general way, allowing the storage of data such as directional room impulse responses (DRIRs) measured with a microphone array excited by a loudspeaker array. The SOFA specifications consider data compression, network transfer, and a link to complex room geometries, and aim at simplifying the development of programming interfaces for Matlab, Octave, and C++. SOFA conventions for a consistent description of measurement setups are provided for future HRTF and DRIR databases.
Convention Paper 8880 (Purchase now)

P12-4 Head Movements in Three-Dimensional Localization
Tommy Ashby, University of Surrey - Guildford, Surrey, UK; Russell Mason, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK
Previous studies give contradictory evidence as to the importance of head movements in localization. In this study head movements were shown to increase localization response accuracy in elevation and azimuth. For elevation, it was found that head movement improved localization accuracy in some cases and that when pinna cues were impeded the significance of head movement cues was increased. For azimuth localization, head movement reduced front-back confusions. There was also evidence that head movement can be used to enhance static cues for azimuth localization. Finally, it appears that head movement can increase the accuracy of listeners' responses by enabling an interaction between auditory and visual cues.
Convention Paper 8881 (Purchase now)

P12-5 A Modular Framework for the Analysis and Synthesis of Head-Related Transfer Functions
Michele Geronazzo, University of Padova - Padova, Italy; Simone Spagnol, University of Padova - Padova, Italy; Federico Avanzini, University of Padova - Padova, Italy
The paper gives an overview of a number of tools for the analysis and synthesis of head-related transfer functions (HRTFs) that we have developed in the past four years at the Department of Information Engineering, University of Padova, Italy. The main objective of our study in this context is the progressive development of a collection of algorithms for the construction of a totally synthetic personal HRTF set replacing both cumbersome and tedious individual HRTF measurements and the exploitation of inaccurate non-individual HRTF sets. Our research methodology is highlighted, along with the multiple possibilities of present and future research offered by such tools.
Convention Paper 8882 (Purchase now)

P12-6 Measuring Directional Characteristics of In-Ear Recording Devices
Flemming Christensen, Aalborg University - Aalborg, Denmark; Pablo F. Hoffmann, Aalborg University - Aalborg, Denmark; Dorte Hammershøi, Aalborg University - Aalborg, Denmark
With the availability of small in-ear headphones and miniature microphones it is possible to construct combined in-ear devices for binaural recording and playback. When mounting a microphone on the outside of an insert earphone, the microphone position deviates from ideal positions in the ear canal. The pinna, and thereby also the natural sound transmission, is altered by the inserted device. This paper presents a methodology for accurately measuring the direction-dependent transfer functions of such in-ear devices. Pilot measurements on a commercially available device are presented, and possibilities for electronic compensation of the non-ideal characteristics are considered.
Convention Paper 8883 (Purchase now)

P12-7 Modeling the Broadband Time-of-Arrival of the Head-Related Transfer Functions for Binaural Audio
Harald Ziegelwanger, Austrian Academy of Sciences - Vienna, Austria; Piotr Majdak, Austrian Academy of Sciences - Vienna, Austria
Binaural audio is based on the head-related transfer functions (HRTFs) that provide directional cues for the localization of virtual sound sources. HRTFs incorporate the time-of-arrival (TOA), the monaural timing information yielding the interaural time difference, essential for the rendering of multiple virtual sound sources. In this study we propose a method to robustly estimate spatially continuous TOA from an HRTF set. The method is based on a directional outlier remover and a geometrical model of the HRTF measurement setup. We show results for HRTFs of human listeners from three HRTF databases. The benefits of calculating the TOA in the light of the HRTF analysis, modifications, and synthesis are discussed.
Convention Paper 8884 (Purchase now)
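As a point of comparison for the robust model above, the simplest per-direction TOA estimate is a plain onset threshold on each impulse response. The sketch below shows that baseline (an illustrative assumption, not the paper's outlier-robust geometric model; `estimate_toa` is a hypothetical name):

```python
import numpy as np

def estimate_toa(hrir, fs, threshold=0.1):
    """Estimate the time-of-arrival of an HRIR in seconds as the
    first sample whose magnitude exceeds `threshold` times the peak
    magnitude. A simple onset detector; per-direction outliers are
    exactly what motivates fitting a spatial model across the set.
    """
    mag = np.abs(np.asarray(hrir, dtype=float))
    onset = int(np.argmax(mag >= threshold * mag.max()))
    return onset / fs
```

The interaural time difference for one direction then follows as the difference between the left-ear and right-ear TOA estimates.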

P12-8 Multichannel Ring Upmix
Christof Faller, Illusonic GmbH - Uster, Switzerland; Lutz Altmann, IOSONO GmbH - Erfurt, Germany; Jeff Levison, IOSONO GmbH - Erfurt, Germany; Markus Schmidt, Illusonic GmbH - Uster, Switzerland
Multichannel spatial decompositions and upmixes have been proposed, but these are usually based on an unrealistically simple direct/ambient sound model that does not capture the full diversity offered by N discrete audio channels, where in an extreme case each channel can contain an independent sound source. While it has been argued that a simple direct/ambient model is sufficient, in practice it limits the achievable audio quality. To circumvent the problem of capturing multichannel signals with a single model, the proposed "ring upmix" applies a cascade of 2-channel upmixes to surround signals to generate channels for setups with more loudspeakers, featuring full support for 360-degree panning with high channel separation.
Convention Paper 8908 (Purchase now)


Tuesday, May 7, 09:00 — 12:30 (Sala Carducci)

Paper Session: P16 - Spatial Audio—Part 2: 3-D Microphone and Loudspeaker Systems

Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK

P16-1 Recording and Playback Techniques Employed for the “Urban Sounds” Project
Angelo Farina, Università di Parma - Parma, Italy; Andrea Capra, University of Parma - Parma, Italy; Alberto Amendola, University of Parma - Parma, Italy; Simone Campanini, University of Parma - Parma, Italy
The “Urban Sounds” project, born from a cooperation between the Industrial Engineering Department of the University of Parma and the municipal institution La Casa della Musica, aims to record characteristic soundscapes in the town of Parma with a dual purpose: delivering to posterity an archive of recorded sound fields documenting Parma in 2012, employing advanced 3-D surround recording techniques, and creating a “musical” Ambisonics composition that leads the audience through a virtual tour of the town. The archive includes recordings of various soundscapes, such as streets, squares, schools, churches, meeting places, public parks, the train station, and the airport, all considered characteristic of the town. This paper details the advanced digital sound processing techniques employed for recording, processing, and playback.
Convention Paper 8903 (Purchase now)

P16-2 Robust 3-D Sound Source Localization Using Spherical Microphone Arrays
Carl-Inge Colombo Nilsen, University of Oslo - Oslo, Norway; Squarehead Technology AS - Norway; Ines Hafizovic, SquareHead Technology AS - Oslo, Norway; Sverre Holm, University of Oslo - Oslo, Norway
Spherical arrays are gaining increased interest in spatial audio reproduction, especially in Higher Order Ambisonics. In many audio applications the detection and localization of sound sources is of crucial importance, calling for robust localization methods suitable for spherical arrays. The well-known direction-of-arrival estimator, the ESPRIT algorithm, is not directly applicable to spherical arrays for 3-D applications. The eigenbeam ESPRIT (EB-ESPRIT) is based on the spherical harmonics framework and is derived especially for spherical arrays. However, the ESPRIT method is in general susceptible to errors in the presence of correlated sources, and spatial decorrelation is not possible for spherical arrays. In our new implementation of spherical harmonics-based ESPRIT (SA2ULA-ESPRIT), robustness against correlation is achieved by spatial decorrelation incorporated directly in the algorithm formulation. The simulated performance of the new algorithm is compared to EB-ESPRIT.
Convention Paper 8904 (Purchase now)

P16-3 Parametric Spatial Audio Coding for Spaced Microphone Array Recordings
Archontis Politis, Aalto University - Espoo, Finland; Mikko-Ville Laitinen, Aalto University - Espoo, Finland; Jukka Ahonen, Akukon Ltd. - Helsinki, Finland; Ville Pulkki, Aalto University - Aalto, Finland
Spaced microphone arrays for multichannel recording of music performances, when reproduced over a multichannel system, exhibit reduced inter-channel coherence that translates perceptually into a pleasant ‘enveloping’ quality, at the expense of accurate localization of sound sources. We present a method to process spaced microphone recordings using the principles of Directional Audio Coding (DirAC), based on knowledge of the array configuration and the frequency-dependent microphone patterns. The method achieves quality equal to or better than the standard high-quality version of DirAC, and it improves on the common one-to-one channel playback of spaced multichannel recordings by offering improved and stable localization cues.
Convention Paper 8905 (Purchase now)

P16-4 Optimal Directional Pattern Design Utilizing Arbitrary Microphone Arrays: A Continuous-Wave Approach
Symeon Delikaris-Manias, Aalto University - Helsinki, Finland; Constantinos A. Valagiannopoulos, Aalto University - Espoo, Finland; Ville Pulkki, Aalto University - Aalto, Finland
A frequency-domain method is proposed for designing directional patterns from arbitrary microphone arrays employing the complex Fourier series. A target directional pattern is defined and an optimal set of sensor weights is determined in a least-squares sense, adopting a continuous-wave approach. It is based on discrete measurements with a high spatial sampling density, which mitigates potential aliasing effects. Fourier analysis is a common method for audio signal decomposition; however, in this approach a set of criteria is employed to define the optimal number of Fourier coefficients and microphones for the decomposition of the microphone array signals in each frequency band. Furthermore, the low-frequency robustness is increased by smoothing the target patterns in those bands. The performance of the algorithm is assessed by calculating the directivity index and the sensitivity. Applications such as synthesizing virtual microphones, beamforming, and binaural and loudspeaker rendering are presented.
Convention Paper 8906 (Purchase now)
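The least-squares weight design described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration only (the array geometry, the single test frequency, and the cardioid target are not taken from the paper, and the paper's Fourier-coefficient selection criteria are omitted): given a steering matrix of microphone responses over densely sampled directions, the sensor weights are the least-squares fit to the target pattern.

```python
import numpy as np

# Hypothetical setup (not from the paper): uniform circular array of
# M omnidirectional microphones, radius r, evaluated at one frequency f.
M, r, c, f = 8, 0.05, 343.0, 2000.0
k = 2 * np.pi * f / c
mic_ang = 2 * np.pi * np.arange(M) / M
mic_pos = r * np.stack([np.cos(mic_ang), np.sin(mic_ang)], axis=1)

# Dense angular sampling of look directions; high spatial sampling
# density mitigates aliasing in the fit, as the abstract notes.
theta = np.linspace(0.0, 2 * np.pi, 360, endpoint=False)
doa = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# Plane-wave steering matrix: response of each microphone to each direction.
A = np.exp(1j * k * doa @ mic_pos.T)          # shape (360, M)

# Target directional pattern: a cardioid aimed at 0 degrees.
d = 0.5 * (1.0 + np.cos(theta))

# Least-squares sensor weights for this frequency band.
w, *_ = np.linalg.lstsq(A, d.astype(complex), rcond=None)

# Achieved pattern magnitude, which approximates the cardioid target.
achieved = np.abs(A @ w)
```

In a full design this solve would be repeated per frequency band, with the number of retained Fourier coefficients and regularization chosen per band.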

P16-5 Layout Remapping Tool for Multichannel Audio Productions
Tim Schmele, Fundació Barcelona Media - Barcelona, Spain; David García-Garzón, Universitat Pompeu Fabra - Barcelona, Spain; Umut Sayin, Fundació Barcelona Media - Barcelona, Spain; Davide Scaini, Fundació Barcelona Media - Barcelona, Spain; Universitat Pompeu Fabra - Barcelona, Spain; Daniel Arteaga, Fundació Barcelona Media - Barcelona, Spain; Universitat Pompeu Fabra - Barcelona, Spain
Several multichannel audio formats exist in the recording industry, with little interoperability among them. This diversity of formats limits end users' access to content and producers' access to audiences. In addition, the preservation of recordings made for a particular format comes under threat should that format become obsolete. To tackle such issues, we present a layout-to-layout conversion tool that allows converting recordings designated for one particular layout to any other layout. This is done by decoding the existing recording to a layout-independent equivalent and then encoding it to the desired layout through different rendering methods. The tool has proven useful according to expert opinions. Simulations show that several consecutive conversions produce a decrease in spatial accuracy and an increase in overall gain, which suggests that consecutive conversions should be avoided and only a single conversion from the originally rendered material should be made.
Convention Paper 8907 (Purchase now)
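The decode-then-encode idea behind such a tool can be sketched in a toy form. The intermediate used here, horizontal-only first-order Ambisonics with a pseudoinverse re-decode, is an assumption for illustration; the paper's actual layout-independent representation and rendering methods are not specified here.

```python
import numpy as np

def encode(gains, azimuths):
    """Encode loudspeaker-feed gains at the given azimuths into a
    horizontal first-order Ambisonics vector [W, X, Y]."""
    return np.array([np.sum(gains),
                     np.sum(gains * np.cos(azimuths)),
                     np.sum(gains * np.sin(azimuths))])

def decode(b, azimuths):
    """Re-render the intermediate to a target layout via a
    mode-matching (pseudoinverse) decode."""
    Y = np.stack([np.ones_like(azimuths),
                  np.cos(azimuths), np.sin(azimuths)], axis=1)
    return np.linalg.pinv(Y.T) @ b

# Convert a feed made for a quad layout to a 5-channel layout.
quad = np.radians([45.0, 135.0, -135.0, -45.0])
five = np.radians([0.0, 30.0, 110.0, -110.0, -30.0])
src = np.array([1.0, 0.0, 0.0, 0.0])          # signal only in front-left
b = encode(src, quad)                          # layout-independent intermediate
g5 = decode(b, five)                           # gains for the target layout
```

With first-order harmonics as intermediate, re-encoding the converted feed reproduces the same intermediate exactly; the spatial-accuracy and gain drift the abstract reports would accumulate in repeated conversions through practical renderers.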

P16-6 Discrimination of Changing Loudspeaker Positions of 22.2 Multichannel Sound System Based on Spatial Impressions
Ikuko Sawaya, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Kensuke Irie, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Takehiro Sugimoto, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Tokyo Institute of Technology - Midori-ku, Yokohama, Japan; Akio Ando, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Kimio Hamasaki, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan
In 22.2 multichannel reproduction, sound is sometimes played back over a loudspeaker arrangement that differs from the one used in production, and the difference in spatial impression is not always clearly noticeable. In this paper we discuss the effects of moving some of the loudspeakers away from their reference positions on the spatial impression of a 22.2 multichannel sound system. Subjective evaluation tests were carried out in which the azimuth and elevation angles were altered from the reference in each condition. Experimental results showed that for some loudspeaker arrangements listeners did not perceive a difference in spatial impression from the reference. On the basis of these results, we discuss the ranges of loudspeaker displacement over which the spatial impression remains equivalent to the reference when the reproduced material carries signals on all channels of the 22.2 multichannel sound system.
Convention Paper 8909 (Purchase now)

P16-7 Modeling Sound Localization of Amplitude-Panned Phantom Sources in Sagittal Planes
Robert Baumgartner, Austrian Academy of Sciences - Vienna, Austria; Piotr Majdak, Austrian Academy of Sciences - Vienna, Austria
Localization of sound sources in sagittal planes (front/back and up/down) relies on listener-specific monaural spectral cues. A functional model approximating human processing of spectro-spatial information is applied to assess the localization of phantom sources created by amplitude panning of coherent loudspeaker signals. Based on model predictions, we investigated the localization of phantom sources created by loudspeakers positioned in the front and in the back, the effect of loudspeaker span and panning ratio on localization performance in the median plane, and the amount of spatial coverage provided by common surround sound systems. Our findings are discussed in the light of previous results from psychoacoustic experiments.
Convention Paper 8910 (Purchase now)
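For contrast with the sagittal-plane case above, where localization follows spectral cues rather than a simple gain law, the classical horizontal-plane prediction for an amplitude-panned phantom source is the stereophonic tangent law. A minimal sketch (the loudspeaker base angle is an assumed example, and this law is a textbook baseline, not the paper's model):

```python
import numpy as np

def tangent_law_direction(g_left, g_right, base_half_angle_deg=30.0):
    """Phantom-source azimuth predicted by the stereophonic tangent law
    for a symmetric loudspeaker pair at +/- base_half_angle_deg:
    tan(phi) / tan(phi0) = (gL - gR) / (gL + gR)."""
    t0 = np.tan(np.radians(base_half_angle_deg))
    t = (g_left - g_right) / (g_left + g_right) * t0
    return np.degrees(np.arctan(t))

# Equal gains place the image at center; a single active loudspeaker
# places it at that loudspeaker.
center = tangent_law_direction(1.0, 1.0)      # 0 degrees
hard_left = tangent_law_direction(1.0, 0.0)   # +30 degrees
```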


Tuesday, May 7, 14:30 — 16:30 (Sala Carducci)

Paper Session: P18 - Spatial Audio—Part 3: Ambisonics, WFS

Symeon Delikaris-Manias, Aalto University - Helsinki, Finland

P18-1 An Ambisonics Decoder for Irregular 3-D Loudspeaker Arrays
Daniel Arteaga, Fundació Barcelona Media - Barcelona, Spain; Universitat Pompeu Fabra - Barcelona, Spain
We report on the practical implementation of an Ambisonics decoder for irregular 3-D loudspeaker layouts. The developed decoder, which uses a non-linear search algorithm to look for the optimal Ambisonics coefficients for each loudspeaker, has a number of features specially tailored for reproduction in real-world 3-D audio venues (for example, special 3-D audio installations, concert halls, audiovisual installations in museums, etc.). In particular, it performs well even for highly irregular speaker arrays, giving an acceptable listening experience over a large audience area.
Convention Paper 8918 (Purchase now)
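A common baseline that the paper's non-linear search improves upon is the mode-matching (pseudoinverse) decoder, sketched here in a horizontal-only, first-order simplification with an assumed irregular layout (the paper itself targets full 3-D layouts and optimizes the coefficients numerically):

```python
import numpy as np

def sh2d(order, azimuths):
    """Real 2-D circular-harmonic matrix up to the given order,
    one row per loudspeaker (horizontal-only simplification)."""
    cols = [np.ones_like(azimuths)]
    for n in range(1, order + 1):
        cols.append(np.cos(n * azimuths))
        cols.append(np.sin(n * azimuths))
    return np.stack(cols, axis=1)             # shape (L, 2 * order + 1)

# Hypothetical irregular horizontal layout (degrees).
spk = np.radians([-110.0, -30.0, 0.0, 30.0, 110.0])
Y = sh2d(1, spk)                              # first order, 5 loudspeakers

# Mode-matching decode matrix: loudspeaker gains = D @ bformat, chosen
# so that re-encoding the decoded gains recovers the input harmonics.
D = np.linalg.pinv(Y.T)                       # shape (5, 3)

# Example: decode a source encoded at 20 degrees azimuth.
az = np.radians(20.0)
b = np.array([1.0, np.cos(az), np.sin(az)])
g = D @ b
```

For highly irregular arrays this pseudoinverse can produce large or negative gains, which is one motivation for searching the coefficient space with perceptually informed constraints instead.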

P18-2 The Effect of Low Frequency Reflections on Stereo Images
Jamie A. S. Angus, University of Salford - Salford, Greater Manchester, UK
This paper examines the amount of absorption required to adequately control early reflections in reflection-controlled environments at low frequencies (< 700 Hz), the range in which the inter-aural time delay (ITD) cue is important. Most previous work has focused on wideband energy-time performance. In particular, the paper derives the effect that a reflection of a given angle and strength has on the phantom image location using the Blumlein equations, which allow the effect of reflections to be quantified as a function of frequency. It shows that the effect of reflections is comparatively small for floor and ceiling reflections, but that the effect of lateral reflections depends on the phantom image location and becomes worse the more off-center the phantom image is.
Convention Paper 8919 (Purchase now)

P18-3 Parametric Spatial Audio Reproduction with Higher-Order B-Format Microphone Input
Ville Pulkki, Aalto University - Aalto, Finland; Archontis Politis, Aalto University - Espoo, Finland; Giovanni Del Galdo, Ilmenau University of Technology - Ilmenau, Germany; Achim Kuntz, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
A time-frequency-domain non-linear parametric method for spatial audio processing is presented that can utilize microphone input having directional patterns of any order. The method is based on dividing the sound field into overlapping or non-overlapping sectors. Local pressure and velocity signals are measured within each sector, and individual Directional Audio Coding (DirAC) processing is performed for each sector. It is shown that in certain acoustically complex conditions the sector-based processing enhances quality compared to traditional first-order DirAC processing.
Convention Paper 8920 (Purchase now)

P18-4 Wave Field Synthesis of Virtual Sound Sources with Axisymmetric Radiation Pattern Using a Planar Loudspeaker Array
Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK; Ferdinando Olivieri, University of Southampton - Southampton, Hampshire, UK; Thibaut Carpentier, UMR STMS IRCAM-CNRS-UPMC - Paris, France; Markus Noisternig, UMR STMS IRCAM-CNRS-UPMC - Paris, France
A number of methods have been proposed for the application of Wave Field Synthesis to the reproduction of sound fields generated by point sources that exhibit a directional radiation pattern. However, a straightforward implementation of these solutions involves a large number of real-time operations that may lead to very high computational load. This paper proposes a simplified method to synthesize virtual sources with axisymmetric radiation patterns using a planar loudspeaker array. The proposed simplification relies on the symmetry of the virtual source radiation pattern and on the far-field approximation, although a near-field formula is also derived. The mathematical derivation of the method is presented and numerical simulations validate the theoretical results.
Convention Paper 8921 (Purchase now)
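The geometric core of a Wave Field Synthesis driving function, per-loudspeaker delay and distance attenuation, can be sketched for an omnidirectional virtual source; this is a textbook simplification, not the paper's directional-source method. The array geometry and source position are assumed, and the frequency-dependent WFS prefilter and the obliquity/window factors are omitted.

```python
import numpy as np

c = 343.0                                     # speed of sound (m/s)
fs = 48000                                    # sample rate (Hz)

# Hypothetical linear slice of a planar array along the x-axis at y = 0.
x_spk = np.stack([np.linspace(-2.0, 2.0, 17), np.zeros(17)], axis=1)
x_src = np.array([0.0, -1.5])                 # virtual source behind the array

# Geometric part of the driving functions: each loudspeaker is delayed
# by the source-to-loudspeaker propagation time and attenuated with
# distance, so the array emits a wavefront curved as if radiated by x_src.
r = np.linalg.norm(x_spk - x_src, axis=1)     # distances (m)
delays = np.round(r / c * fs).astype(int)     # per-loudspeaker delays (samples)
gains = 1.0 / r                               # simple distance attenuation
```

For a directional (axisymmetric) virtual source as in the paper, the gains would additionally be weighted by the source's radiation pattern evaluated toward each loudspeaker.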



REGISTRATION DESK: May 4th 09:30 – 18:30; May 5th 08:30 – 18:30; May 6th 08:30 – 18:30; May 7th 08:30 – 16:30
TECHNICAL PROGRAM: May 4th 10:30 – 19:00; May 5th 09:00 – 19:00; May 6th 09:00 – 19:00; May 7th 09:00 – 17:00