Authors: McCormack, Leo; Delikaris-Manias, Symeon; Politis, Archontis; Pavlidi, Despoina; Farina, Angelo; Pinardi, Daniel; Pulkki, Ville
Affiliation: Aalto University; Aalto University; Aalto University; University of Crete, Heraklion, Greece; University of Parma, Parma, Italy; University of Parma, Parma, Italy; Aalto University
This article details and evaluates three alternative approaches to sound-field visualization, all of which employ spatially-localized active-intensity (SLAI) vectors. SLAI vectors are particularly interesting as they allow direction-of-arrival (DoA) estimates to be extracted in multiple spatially-localized sectors, such that sound sources and/or noise present in one sector have reduced influence on the DoA estimates made in the other sectors. These DoA estimates may then be used to visualize the sound field by either: i) directly depicting the estimates as icons, with their relative size dictated by the corresponding energy of each sector; ii) generating traditional activity-maps via histogram analysis of the DoA estimates; or iii) using the DoA estimates to re-assign energy and thereby sharpen traditional beamformer-based activity-maps. Since SLAI-based DoA estimates are continuous, these approaches are inherently computationally efficient, as they forgo the dense scanning grids otherwise needed for high-resolution imaging.
Download: PDF (HIGH Res) (13.0MB)
Download: PDF (LOW Res) (669KB)
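The idea underlying the abstract above can be illustrated with the plain (global, non-sectorized) active-intensity DoA estimator and an energy-weighted azimuth histogram. This is a minimal sketch, not the paper's spatially-localized sector method; the function names and the simple broadband formulation are illustrative assumptions.

```python
import numpy as np

def intensity_doa(p, ux, uy, uz):
    """Active-intensity DoA estimate per time-frequency bin.

    p          : complex pressure STFT bins (array)
    ux, uy, uz : complex particle-velocity STFT bins
    Returns unit DoA vectors (one per bin), pointing toward the source.
    """
    # Active intensity: real part of conj(pressure) * velocity.
    I = np.stack([np.real(np.conj(p) * ux),
                  np.real(np.conj(p) * uy),
                  np.real(np.conj(p) * uz)], axis=-1)
    norm = np.linalg.norm(I, axis=-1, keepdims=True)
    # Intensity points in the direction of propagation, i.e. away
    # from the source, so the DoA is its negation.
    return -I / np.maximum(norm, 1e-12)

def azimuth_histogram(doas, energies, n_bins=72):
    """Energy-weighted azimuth histogram: a crude 1-D activity map."""
    az = np.degrees(np.arctan2(doas[:, 1], doas[:, 0]))
    hist, _ = np.histogram(az, bins=n_bins, range=(-180.0, 180.0),
                           weights=energies)
    return hist
```

Because each bin yields a continuous DoA directly, no scanning grid is needed; the histogram resolution is just a binning choice, which is the efficiency argument the abstract makes.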
Authors: Pulkki, Ville; Pöntynen, Henri; Santala, Olli
Affiliation: Acoustics Lab, Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland; Acoustics Lab, Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland; Acoustics Lab, Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland
Modern spatial audio reproduction techniques with headphones or loudspeakers seek to control the perceived spatial image as accurately as possible in three dimensions. The mechanisms of spatial perception have been studied mainly in the horizontal plane, and this article attempts to shed some light on the corresponding phenomena in the median plane. Spatial perception of concurrently active sound sources was investigated in an exploratory listening experiment. Incoherent noise source distributions of varying spatial characteristics were presented from loudspeaker arrays in anechoic conditions. The arrays coincided with the ±45° angular sectors in the frontal median and horizontal planes. The task for immobile subjects was to report the directions of the loudspeakers they perceived to be emitting sound. The results from median plane distributions suggest that two concurrent sources located along the vertical midline can be perceived individually without resorting to head movements when they are separated in elevation by 60° or more. With source pairs separated by less than 60°, and with more complex physical distributions, the perceived distributions were inaccurate, biased, and spatially compressed, but nevertheless not point-like auditory images.
Download: PDF (HIGH Res) (6.7MB)
Download: PDF (LOW Res) (1.4MB)
Authors: Ma, Xiaohui; Hohnerlein, Christoph; Ahrens, Jens
Affiliation: Dynaudio A/S, Skanderborg, Denmark; Berlin Institute of Technology, Berlin, Germany; Chalmers University of Technology, Sweden
This paper presents a multiband approach for crosstalk cancellation based on superdirective near-field beamforming (SDB) that adapts dynamically to a change in the listener position. SDB requires the computation of a separate set of beamformer weights for each listener position. The weights exhibit a smooth evolution over listening positions along a linear trajectory parallel to the array, and can therefore be parameterized by only a few parameters per frequency. Upon real-time execution, the beamformer weights are determined efficiently for any position from the parameters with negligible error. Simulations and measurements show that the proposed method provides high channel separation and is robust with respect to small uncertainties in the listener position. A user study with 20 subjects and binaural signals shows auditory localization accuracy that is consistent across the tested listening positions and comparable to that of headphone rendering. The study also confirms the previously informal observation that fewer front-back confusions occur when listeners face away from the loudspeaker array than when they face toward it.
Download: PDF (HIGH Res) (4.3MB)
Download: PDF (LOW Res) (1.0MB)
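The parameterization step described in the abstract above can be sketched as a per-frequency polynomial fit to the precomputed weights, evaluated at run time for the tracked position. This is a minimal illustration under assumed names and a simple `polyfit` parameterization, not necessarily the exact scheme used in the paper.

```python
import numpy as np

def fit_weight_trajectory(positions, weights, degree=3):
    """Fit per-loudspeaker polynomials to beamformer weights sampled
    at several listener positions along a line parallel to the array.

    positions : (P,) listener positions along the trajectory (metres)
    weights   : (P, M) complex weights for M loudspeakers, one frequency
    Returns a list of (real_coeffs, imag_coeffs) pairs, one per speaker.
    """
    return [(np.polyfit(positions, weights[:, m].real, degree),
             np.polyfit(positions, weights[:, m].imag, degree))
            for m in range(weights.shape[1])]

def eval_weights(polys, position):
    """Reconstruct the full weight vector for an arbitrary position."""
    return np.array([np.polyval(cr, position) + 1j * np.polyval(ci, position)
                     for cr, ci in polys])
```

The storage cost drops from one weight vector per grid position to `degree + 1` coefficients per loudspeaker and frequency, and evaluation at an unseen position is just a polynomial evaluation, which is what makes the real-time adaptation cheap.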
Authors: Tylka, Joseph G.; Choueiri, Edgar Y.
Affiliation: Princeton University, Princeton, NJ, USA; Princeton University, Princeton, NJ, USA
Given an ambisonics-encoded sound field (i.e., a sound field that has been decomposed into spherical harmonics), virtual navigation enables a listener to explore the recorded space and, ideally, experience a spatially- and tonally-accurate perception of the sound field. Although several navigational methods have been developed, existing studies rarely compare them, and practical assessments of such methods have been limited. The authors conducted numerical simulations to characterize and compare the performance of the time-frequency analysis interpolation method of Thiergart et al. against the recently-proposed parametric valid microphone interpolation method, and thereby established suitable domains for the practical application of these two state-of-the-art methods. The simulations used a two-microphone array capturing simple incident sound fields produced by a single point source, with varied source distance and azimuth, microphone spacing, and listener position. The errors introduced by the two methods were objectively evaluated in terms of metrics for sound level, spectral coloration, source localization, and diffuseness. Practical domains were subsequently identified, and guidelines were established for choosing between the methods based on the intended application.
Download: PDF (HIGH Res) (1.6MB)
Download: PDF (LOW Res) (359KB)
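Two of the error metrics named in the abstract above, sound level and spectral coloration, have simple generic definitions that can be sketched as follows. These are textbook formulations under assumed names, not necessarily the exact metrics used in the paper.

```python
import numpy as np

def level_error_db(reference, estimate):
    """Overall sound-level error of an estimated signal, in dB
    relative to the reference (0 dB means equal energy)."""
    return 10.0 * np.log10(np.sum(np.abs(estimate) ** 2) /
                           np.sum(np.abs(reference) ** 2))

def coloration_db(reference_spectrum, estimate_spectrum, eps=1e-12):
    """Per-band spectral deviation (coloration) in dB; a flat zero
    vector means the estimate is tonally uncoloured."""
    return 20.0 * np.log10((np.abs(estimate_spectrum) + eps) /
                           (np.abs(reference_spectrum) + eps))
```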
Authors: Costa, Maurício V. M.; Apolinário, Isabela F.; Biscainho, Luiz W. P.
Affiliation: Signals, Multimedia, and Telecommunications Lab (SMT) – DEL/Poli & PEE/COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil; Signals, Multimedia, and Telecommunications Lab (SMT) – DEL/Poli & PEE/COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil; Signals, Multimedia, and Telecommunications Lab (SMT) – DEL/Poli & PEE/COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
In audio signal processing, many techniques rely on a Time-Frequency Representation (TFR) of the signal, particularly in music information retrieval applications. Examples include automatic music transcription, sound source separation, and classification of the instruments playing in a musical piece. This paper presents a novel method for obtaining a sparse time-frequency representation by combining different instances of the Fan-Chirp Transform (FChT). The method comprises two main steps: computing the multiple FChTs by means of the structure tensor, and combining them, along with spectrograms, using the smoothed local sparsity method. Experiments with synthetic and real-world audio signals suggest that the proposed method yields markedly better TFRs than the standard short-time Fourier transform, especially in the presence of fast frequency variations, which allows the FChT to be used for polyphonic audio signals. As a result, the proposed method allows more precise information to be extracted from audio signals with multiple sources.
Download: PDF (HIGH Res) (5.5MB)
Download: PDF (LOW Res) (459KB)
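The combination step in the abstract above can be illustrated with a simplified bin-wise weighting in the spirit of local-sparsity combination: each candidate TFR is weighted, bin by bin, according to how concentrated its energy is in a local smoothing window. This is a rough sketch using an L2/L1 concentration proxy, not the paper's actual smoothed local sparsity method, and all names are illustrative.

```python
import numpy as np

def _box_smooth(a, k):
    """Separable k-by-k moving average (the local smoothing window)."""
    kern = np.ones(k) / k
    a = np.apply_along_axis(np.convolve, 0, a, kern, mode='same')
    a = np.apply_along_axis(np.convolve, 1, a, kern, mode='same')
    return a

def combine_tfrs(tfrs, k=5, beta=10.0):
    """Combine candidate TFR magnitudes bin by bin, favouring whichever
    representation has the most locally concentrated (sparsest) energy.
    beta controls how sharply the sparsest candidate is preferred."""
    tfrs = np.stack([np.abs(t) for t in tfrs])       # (K, F, T)
    eps = 1e-12
    weights = []
    for X in tfrs:
        l1 = _box_smooth(X, k) + eps                 # local L1 energy
        l2 = np.sqrt(_box_smooth(X ** 2, k)) + eps   # local L2 energy
        weights.append((l2 / l1) ** beta)            # concentration proxy
    w = np.stack(weights)
    w /= w.sum(axis=0)                               # normalize per bin
    return (w * tfrs).sum(axis=0)
```

With a well-matched FChT analysis chirp rate, a fast-varying partial concentrates into few bins and so dominates the combination there, while regions where no candidate is sparse fall back to an average, which is the intuition behind sparsity-driven TFR combination.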
For augmented or assistive listening situations, there is a compromise to be struck between hearing natural sounds from the environment and hearing reproduced sounds. Ideally, the hear-through sound quality would be the same as if one were not wearing headphones. Bone conduction is another way of getting sound into the head, and one that might be usable for spatializing information as part of a hybrid information display. It may be possible to adapt measurement methods intended for active noise-cancelling ear defenders to consumer ANC applications. One may also be able to predict the degree of listening effort needed to hear speech through such headphones when noise is present.
Download: PDF (1023KB)