AES Journal

2013 July/August - Volume 61 Number 7/8

President’s Message

Author: Frank Wells

Page: 488

Papers

Acoustic Zooming by Multi-Microphone Sound Scene Manipulation

Authors: van Waterschoot, Toon; Tirry, Wouter Joos; Moonen, Marc
Affiliation: KU Leuven, Department of Electrical Engineering, ESAT-SCD, Leuven, Belgium; NXP Software, Leuven, Belgium
Page: 489

Camera zooming would be more compelling if the audio were subjected to a corresponding zoom that matched the video. Psychophysical and neuroimaging results suggest that a cross-modal approach to zooming facilitates multisensory integration. Because auditory distance perception is primarily determined by sound intensity, an audiovisual zoom effect can be obtained by matching the levels of different sources in a sound scene with their visually perceived distance. The authors propose a general theory for independent sound source level control that can be used to attain an acoustic zoom effect. The theory does not require sound source separation, which reduces computational load. An efficient implementation using fixed and adaptive spatial and spectral noise-reduction algorithms is proposed and evaluated. Experimental results using a small array of low-cost microphones confirm that the proposed approach is particularly suited for consumer audiovisual capture applications.
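
The link between level and perceived distance in the abstract above can be illustrated with a small calculation. This is a minimal sketch, not the authors' method: it assumes a free-field point source obeying the inverse-distance law (sound pressure proportional to 1/r), and the function names, scene, and zoom factor are hypothetical. It only computes the target level offset for a source; the multi-microphone processing that would apply such an offset to one source and not another is not shown.

```python
import math


def zoom_gain_db(actual_distance_m, apparent_distance_m):
    """Level change in dB that makes a source at actual_distance_m sound as
    if it were at apparent_distance_m, assuming the free-field
    inverse-distance law (an assumption of this sketch)."""
    if actual_distance_m <= 0 or apparent_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(actual_distance_m / apparent_distance_m)


def db_to_amplitude(gain_db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


# Hypothetical example: a talker 4 m from the camera, with a 2x zoom that
# halves the visually perceived distance to 2 m.
gain = zoom_gain_db(4.0, 2.0)
print(f"target level change: {gain:.2f} dB, "
      f"amplitude factor: {db_to_amplitude(gain):.2f}")
```

Under this assumption, halving the apparent distance corresponds to a level increase of roughly 6 dB for the zoomed source, while sources outside the zoomed region would be left unchanged or attenuated.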

A Comparison of Computational Precedence Models for Source Separation in Reverberant Environments

Open Access

Authors: Hummersone, Christopher; Mason, Russell; Brookes, Tim
Affiliation: University of Surrey, Guildford, UK
Page: 508

Reverberation is a problem for source separation algorithms. Because the precedence effect allows human listeners to suppress the perception of reflections arising from room boundaries, numerous computational models have incorporated the precedence effect. However, relatively little work has been done on using the precedence effect in source separation algorithms. This paper compares several precedence models and their influence on the performance of a baseline separation algorithm. The models were tested in a variety of reverberant rooms and with a range of mixing parameters. Although there was a large difference in performance among the models, the one that was based on interaural coherence and onset-based inhibition produced the greatest performance improvement. There is a trade-off between selecting reliable cues that correspond closely to free-field conditions and maximizing the proportion of the input signals that contributes to localization. For optimal source separation performance, it is necessary to adapt the dynamic component of the precedence model to the acoustic conditions of the room.

Real-Time Speech Signal Segmentation Methods

Authors: Kupryjanow, Adam; Czyzewski, Andrzej
Affiliation: Multimedia Systems Department, Gdansk University of Technology, Gdansk, Poland
Page: 521

Many researchers have developed algorithms for speech-signal segmentation in such applications as automatic speech recognition, speech coding, echo cancellation, automatic noise reduction, signal-to-noise ratio estimation, nonuniform time-scale modification, speech rate estimation, automatic language identification, and speaker emotion recognition. Unlike algorithms that work offline, the two algorithms developed by the authors perform real-time speech analysis, applied here to the detection of vowel regions and the estimation of speech rate. The accuracy, reliability, and real-time performance of these algorithms were evaluated on samples of Polish speech; experimental results showed that they performed as well as or better than existing offline approaches.

Perceptual Objective Quality Evaluation Method for High Quality Multichannel Audio Codecs

Authors: Seo, Jeong-Hun; Chon, Sang Bae; Sung, Keong-Mo; Choi, Inyong
Affiliation: Institute of New Media and Communications, School of Electrical Engineering and Computer Science, Seoul National University, Republic of Korea; Multimedia R & D Team, DMC R & D Center, Samsung Electronics, Suwon, Republic of Korea; Center for Computational Neuroscience and Neural Technology, Boston University, Boston, MA, USA
Page: 535

To avoid the high cost of subjective listening tests for evaluating sound quality, objective assessment methods based on psychoacoustics have been routinely used. This research explores an assessment method for evaluating high-quality, multichannel audio codecs with a model that incorporates five monaural Model Output Variables (MOVs) combined with four novel MOVs for predicting degradation of spatial attributes. When trained and verified with a listening-test database of high-quality audio codecs, the model was able to predict small perceptual differences between test and reference signals in both spatial and timbral quality.

Headphone-Based Virtual Spatialization of Sound with a GPU Accelerator

Authors: Belloch, Jose A.; Ferrer, Miguel; Gonzalez, Alberto; Martinez-Zaldivar, F.J.; Vidal, Antonio M.
Affiliation: Institute of Telecommunications and Multimedia Applications, Universitat Politecnica de Valencia, Valencia, Spain; Dept. of Information Systems and Computation, Universitat Politecnica de Valencia, Valencia, Spain
Page: 546

This paper describes the design of a binaural headphone-based multisource spatial-audio application using a Graphics Processing Unit (GPU) as the compute engine. The GPU is a highly parallel programmable coprocessor that provides massive computation power when the algorithm is properly parallelized. To render a sound source at a specific location, audio samples must be convolved with the Head-Related Impulse Response (HRIR) filters for that location. A database of HRIRs at fixed spatial positions is used. Solutions have been developed to handle two problems: synthesizing sound source positions that are not in the HRIR database, and virtualizing the movement of sound sources between different positions. The GPU is particularly appropriate for simultaneously executing multiple convolutions without overloading the main CPU. The results show that the proposed application is able to handle up to 240 sources simultaneously when all sources are moving.
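
The rendering step described in this abstract, convolving a mono source with a left and a right HRIR, can be sketched on the CPU in a few lines. This is an illustrative sketch only: the abstract does not say how positions missing from the database are synthesized, so the linear blend of two neighbouring HRIRs below is an assumption, and all function names and sample values are hypothetical. A GPU implementation would run many such convolutions in parallel, which is not shown here.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution.
    Output length is len(signal) + len(impulse_response) - 1."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def interpolate_hrir(hrir_a, hrir_b, weight_b):
    """Linear blend of two equal-length HRIRs (assumed interpolation scheme).
    weight_b = 0 returns hrir_a, weight_b = 1 returns hrir_b."""
    if len(hrir_a) != len(hrir_b):
        raise ValueError("HRIRs must have the same length")
    return [(1.0 - weight_b) * a + weight_b * b
            for a, b in zip(hrir_a, hrir_b)]


def render_binaural(mono, hrir_left, hrir_right):
    """Return the (left, right) headphone signals for one source position."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Hypothetical three-tap HRIRs for two stored positions (left ear only).
stored_a = [1.0, 0.5, 0.0]
stored_b = [0.0, 0.5, 1.0]
in_between = interpolate_hrir(stored_a, stored_b, 0.5)
left, right = render_binaural([1.0, 0.0, -1.0], in_between, stored_a)
print(left, right)
```

Each source needs two convolutions per output block, so the cost grows linearly with the number of sources, which is why the workload suits a massively parallel device.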

Audio Pitch Shifting Using the Constant-Q Transform

Authors: Schörkhuber, Christian; Klapuri, Anssi; Sontacchi, Alois
Affiliation: Institute of Electronic Music and Acoustics, Graz, Austria; Ovelin, Helsinki
Page: 562

Pitch shifting of polyphonic music is usually performed by manipulating the time–frequency representation of the input signal such that frequency is scaled by a constant and time duration remains unchanged. A method for pitch shifting is proposed that exploits the logarithmic frequency-bin spacing of the Constant-Q Transform (CQT). Pitch-scaling of monophonic and dense polyphonic music signals is achieved by a simple linear translation of the CQT representation followed by a phase update stage. This approach provides a natural solution to the problems of transients because the CQT has good time resolution at high frequencies while interference between tonal components at low frequencies is reduced. Performing pitch shifting directly in the frequency domain allows the algorithm to process only parts of the signal while leaving other parts unchanged. Audio examples demonstrate the quality of the proposed algorithm for scaling factors up to an octave.
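
The reason a linear translation suffices follows from the logarithmic bin spacing: with B bins per octave, bin k is centred at f_min · 2^(k/B), so scaling every frequency by a factor α moves each component by B · log2(α) bins. The sketch below illustrates only that relationship on a single frame of magnitudes. The function names and values are hypothetical, shifts are restricted to whole bins, and the phase update stage mentioned in the abstract is omitted.

```python
import math


def cqt_center_frequency(k, f_min, bins_per_octave):
    """Centre frequency of CQT bin k for geometrically spaced bins."""
    return f_min * 2.0 ** (k / bins_per_octave)


def bin_shift_for_scale(scale, bins_per_octave):
    """Number of bins (possibly fractional) that corresponds to scaling
    every frequency by `scale`."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return bins_per_octave * math.log2(scale)


def translate_bins(magnitudes, shift):
    """Shift one frame of CQT magnitudes by an integer number of bins.
    Vacated bins are zero-filled; content shifted past either edge is lost."""
    n = len(magnitudes)
    out = [0.0] * n
    for k, m in enumerate(magnitudes):
        j = k + shift
        if 0 <= j < n:
            out[j] = m
    return out


# Hypothetical frame with 12 bins per octave: shifting up one semitone
# (scale 2**(1/12)) is a translation by one bin.
shift = round(bin_shift_for_scale(2.0 ** (1.0 / 12.0), 12))
print(shift, translate_bins([0.0, 1.0, 0.5, 0.0], shift))
```

Because the translation acts on individual time-frequency bins, it can be restricted to a subset of them, which is consistent with the abstract's remark about processing only parts of the signal.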

Wiener Filtering for Anechoic Transfer Function Measurement in Acoustics

Author: Dadic, Martin
Affiliation: University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Unska, Croatia
Page: 573

Deriving the transfer function of electroacoustical systems in a normal reverberant environment is usually based on frequency-domain methods that compute the anechoic spectra of direct and early sounds. As an alternative to time-delay spectrometry and time-gating frequency analysis, this paper proposes a Wiener-Hopf solution for time-gating in anechoic transfer-function measurements. The procedure eliminates scattered or reflected rays. Compared to methods based on the discrete Fourier transform, this approach allows for nonuniform frequency-domain sampling when the system’s magnitude and/or phase responses exhibit large variations. Since the approach relies only on frequency-domain measurements and simple time-gating, it does not require complicated and expensive equipment such as pulse generators and fast sampling oscilloscopes.

Standards and Information Documents

AES Standards Committee News

Page: 579

Features

134th Convention Report, Rome

Page: 582

135th Convention Preview, New York

Page: 592

135th Exhibitor Previews

Page: 594

Game Audio: Transforming Simulation and Interactivity

Author: Rumsey, Francis
Page: 609

Game audio has reached a state of maturity that implies parity of status with graphics. The most recent research in the field is concerned with connecting sound design to other game creation processes in a more integrated fashion and with devising adaptive sound design tools to increase emotional involvement and interactivity.