AES London 2010
Paper Session Details
P1 - High Performance Audio Processing
Saturday, May 22, 09:30 — 12:30 (Room C3)
Chair: Neil Harris, New Transducers Ltd. (NXT) - Cambridge, UK
P1-1 Model-Driven Development of Audio Processing Applications for Multi-Core Processors—Tiziano Leidi, ICIMSI-SUPSI - Manno, Switzerland; Thierry Heeb, Digimath - Sainte-Croix, Switzerland; Marco Colla, ICIMSI-SUPSI - Manno, Switzerland; Jean-Philippe Thiran, EPFL - Lausanne, Switzerland
Chip-level multiprocessors are still very young and available forecasts anticipate a strong evolution for the forthcoming decade. To exploit them, efficient and robust applications have to be built with the appropriate algorithms and software architectures. Model-driven development is able to lower some barriers toward applications that process audio in parallel on multi-cores. It allows using abstractions to simplify and mask complex aspects of the development process and helps avoid inefficiencies and subtle bugs. This paper presents some evolutions of Audio n-Genie, an open-source environment for model-driven development of audio processing applications, which has been recently enhanced with support for parallel processing on multi-cores.
Convention Paper 7961
P1-2 Real-Time Additive Synthesis with One Million Sinusoids Using a GPU—Lauri Savioja, NVIDIA Research - Helsinki, Finland, Aalto University School of Science and Technology, Espoo, Finland; Vesa Välimäki, Aalto University School of Science and Technology - Espoo, Finland; Julius O. Smith III, Stanford University - Palo Alto, CA, USA
Additive synthesis is one of the fundamental sound synthesis techniques. It is based on the principle that each sound can be represented as a superposition of sine waves of different frequencies. This computation can be carried out fully in parallel and is thus well suited to GPU (graphics processing unit) implementation. In this paper we show that it is possible to compute over one million unique sine waves in real time using a current GPU. The performance depends on the buffer size used, but a result close to the maximum is already reachable with a buffer of 500 samples.
Convention Paper 7962
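As an illustration of the superposition principle described in this abstract, the following is a minimal sequential sketch in Python. It is not the authors' GPU implementation: the function name `additive_synth`, the tuple layout of each partial, and the example values are assumptions made for illustration. Each partial contributes independently to the output, which is the property that makes the task easy to parallelize.

```python
import math

def additive_synth(partials, sample_rate, num_samples):
    """Sum sinusoids given as (frequency_hz, amplitude, phase_rad) tuples."""
    out = [0.0] * num_samples
    for freq, amp, phase in partials:
        # Phase increment per sample for this partial.
        step = 2.0 * math.pi * freq / sample_rate
        for n in range(num_samples):
            # Every (partial, sample) term is independent of all the others.
            out[n] += amp * math.sin(step * n + phase)
    return out

# Hypothetical example: three harmonics of 100 Hz, one 500-sample buffer at 48 kHz.
partials = [(100.0 * k, 1.0 / k, 0.0) for k in (1, 2, 3)]
buffer = additive_synth(partials, 48000, 500)
```

The cost of this reference loop grows with the number of partials times the buffer length, which is why a parallel device is attractive once the number of partials becomes very large.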
P1-3 A GPGPU Approach to Improved Acoustic Finite Difference Time Domain Calculations—Jamie A. S. Angus, Andrew Caunce, University of Salford - Salford, Greater Manchester, UK
This paper shows how to improve the efficiency and accuracy of Finite Difference Time Domain acoustic simulation by both calculating the differences using spectral methods and performing these calculations on a Graphics Processing Unit (GPU) rather than a CPU. These changes to the calculation method result in an increase in accuracy as well as a reduction in computational expense. The recent advances in the way that GPUs are programmed (for example, using CUDA on NVIDIA GPUs) now make them an ideal platform on which to perform scientific computations at very high speeds and very low power consumption.
Convention Paper 7963
P1-4 Digital Equalization Filter: New Solution to the Frequency Response Near Nyquist and Evaluation by Listening Tests—Thorsten Schmidt, Cube-Tec International - Bremen, Germany; Joerg Bitzer, Jade-University of Applied Sciences - Oldenburg, Germany
Current design methods for digital equalization filters face the problem of a frequency response that increasingly deviates from that of the analog equivalent close to the Nyquist frequency. This paper deals with a new way to design equalization filters that improves this behavior over the entire frequency range between 0 Hz (DC) and Nyquist. The theoretical approach is shown and examples of low-pass, peak, and shelving filters are compared to state-of-the-art techniques. Listening tests were made to verify the audible differences and rate the quality of the different design methods.
Convention Paper 7964
P1-5 Audio Equalization with Fixed-Pole Parallel Filters: An Efficient Alternative to Complex Smoothing—Balázs Bank, Budapest University of Technology and Economics - Budapest, Hungary
Recently, the fixed-pole design of parallel second-order filters has been proposed to accomplish arbitrary frequency resolution similarly to Kautz filters, at 2/3 of their computational cost. This paper relates the parallel filter to the complex smoothing of transfer functions. Complex smoothing is a well-established method for limiting the frequency resolution of audio transfer functions for analysis, modeling, and equalization purposes. It is shown that the parallel filter response is similar to the one obtained by complex smoothing the target response using a Hanning window: a 1/β octave resolution is achieved by using β/2 pole pairs per octave in the parallel filter. Accordingly, the parallel filter can either be used as an efficient implementation of smoothed responses, or it can be designed from the unsmoothed responses directly, eliminating the need for frequency-domain processing. In addition, the theoretical equivalence of parallel filters and Kautz filters is developed, and the formulas for converting between the parameters of the two structures are given. Examples of loudspeaker-room equalization are provided.
Convention Paper 7965
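The rule stated in this abstract (β/2 pole pairs per octave for a 1/β octave resolution) can be turned into a small worked calculation. The sketch below is only an illustration: the function name `pole_frequencies`, the strictly logarithmic spacing, and the example band limits are assumptions, not details taken from the paper.

```python
import math

def pole_frequencies(f_lo, f_hi, beta):
    """Log-spaced pole-pair frequencies for a 1/beta octave resolution."""
    # Rule from the abstract: beta/2 pole pairs per octave.
    pairs_per_octave = beta / 2.0
    octaves = math.log2(f_hi / f_lo)
    # Number of equal logarithmic intervals between the band edges (assumed layout).
    intervals = max(1, round(octaves * pairs_per_octave))
    return [f_lo * (f_hi / f_lo) ** (i / intervals) for i in range(intervals + 1)]

# Hypothetical example: 1/6 octave resolution over ten octaves starting at 20 Hz.
freqs = pole_frequencies(20.0, 20480.0, 6)
```

With these example values the poles fall three per octave, so the list holds 31 frequencies including both band edges.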
P1-6 Rapid and Automated Development of Audio Digital Signal Processing Algorithms for Mobile Devices—David Trainor, APTX - Belfast, N. Ireland, UK
Software applications and programming languages are available to assist audio DSP algorithm developers and mobile device designers, including Matlab/Simulink, C/C++, and assembly languages. These tools provide some assistance for algorithmic experimentation and subsequent refinement to highly efficient embedded software. However, a typical design flow is still highly iterative, with manual software recoding, translation, and optimization. This paper introduces software libraries and design techniques that integrate existing commercial audio algorithm design tools and permit intuitive algorithmic experimentation and automated translation of audio algorithms to efficient embedded software. These techniques have been incorporated into a new software framework, and the operation of this framework is described using the example of a custom audio coding algorithm targeted to a mobile audio device.
Convention Paper 7966
P2 - Sound Reinforcement and Room Acoustics
Saturday, May 22, 09:30 — 12:30 (Room C5)
Chair: Glenn Leembruggen, Acoustic Directions Pty Ltd. - ICE Design, Sydney, Australia, and University of Sydney, Sydney, Australia
P2-1 Simultaneous Soundfield Reproduction at Multiple Spatial Regions—Yan Jennifer Wu, Thushara D. Abhayapala, The Australian National University - Canberra, Australia
Reproduction of different sound fields simultaneously at multiple spatial regions is a complex problem in acoustic signal processing. In this paper we present a framework to recreate 2-D sound fields simultaneously at multiple spatial regions using a single loudspeaker array. We propose a novel method of finding an equivalent global sound field that consists of each individual desired sound field by translating the spatial harmonic coefficients between the coordinate systems. This method makes full use of the available dimensionality of the sound field. Some fundamental limits are also revealed. Specifically, the number of spatial regions that can be reproduced inside the single loudspeaker array is determined by the total dimensionality of the individual sound fields.
Convention Paper 7967
P2-2 Verification of Geometric Acoustics-Based Auralization Using Room Acoustics Measurement Techniques—Aglaia Foteinou, Damian Murphy, University of York - Heslington, UK; Anthony Masinton, University of York - King’s Manor, York, UK
Geometric acoustics techniques such as ray-tracing, image-source, and variants are commonly used in the simulation and auralization of the acoustics of an enclosed space. The shortcomings of such techniques are well documented and yet they are still the methods most commonly used and accepted in architectural acoustic design. This work compares impulse response-based objective acoustic measures obtained from a 3-D model of a medieval English church using a combination of image-source and ray-tracing geometric acoustic methods, with measurements obtained within the actual space. The results are presented in both objective and subjective terms, and include an exploration of optimized boundary materials and source directivity characteristics with problems and limitations clarified.
Convention Paper 7968
P2-3 Analysis Tool Development for the Investigation of Low Frequency Room Acoustics by Means of Finite Element Method—Christos Sevastiadis, George Kalliris, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
The sound wave behavior at low frequencies in small sound recording, control, and reproduction rooms results in resonant sound fields observed in both frequency and spatial domains. Finite Element Method (FEM) software applications provide potential analysis procedures for acoustics problems, but there is a lack of tools focusing on room acoustics investigation. The present paper is the result of an attempt to develop a room acoustics modeling tool integrated with a FEM. Simple room construction, common acoustical treatments, and multiple source excitations are involved in modal and steady-state analysis procedures in order to simplify the solution of problematic sound fields. The key parameters and features, as well as experimental validation results, are presented.
Convention Paper 7969
P2-4 Subwoofers in Rooms: Experimental Modal Analysis—Juha Backman, Nokia Corporation - Espoo, Finland
The behavior of a loudspeaker in a room depends fully on the coupling of the loudspeaker to the room modes, which are in the subwoofer frequency range individually identifiable. The modes can be determined through computational methods if the surface properties and room geometry are simple enough, but systems with complex properties have to be analyzed experimentally. The paper describes the use of modal analysis techniques, usually applied only to two-dimensional structures, for three-dimensional spaces to determine experimentally the actual modes and their properties.
Convention Paper 7970
P2-5 Subwoofer Positioning, Orientation, and Calibration for Large-Scale Sound Reinforcement—Adam Hill, Malcolm Hawksford, University of Essex - Colchester, UK; Adam P. Rosenthal, Gary Gand, Gand Concert Sound - Glenview, IL, USA
It is often difficult to achieve even coverage at low frequencies across a large audience area. To complicate matters, it is desirable to have tight control of the low-frequency levels on the stage. This is generally dealt with by using cardioid subwoofers. While this helps control the stage area, the audience area receives no clear benefit. This paper investigates how careful positioning, orientation, and calibration of a multiple subwoofer system can provide enhanced low-frequency coverage, both in the audience area and on the stage. The effects of placement underneath, on top of, and in front of the stage are investigated as well as the performance of systems consisting of both flown and ground-based subwoofers.
Convention Paper 7971
P2-6 Live Measurements of Ground-Stacked Subwoofers' Performance—Elena Shabalina, Mathias Kaiser, RWTH Aachen University - Aachen, Germany; Janko Ramuscak, d&b audiotechnik GmbH - Backnang, Germany
Various software-based measurement systems can be used for live sound measurements with music or speech signals in occupied concert halls with sufficient accuracy. However, for some special applications, such as measuring the performance of a ground-stacked bass array that is one component of a multiple-component sound reinforcement system in an occupied hall, these systems cannot be used directly, since it is not possible to run a bass array alone during a concert. A simple method to obtain the subwoofers' impulse response using program signal measurements of a full-range PA system is proposed. The desired impulse response results from subtraction of full system impulse responses with and without subwoofers running. Required measurement conditions and limitations are discussed.
Convention Paper 7972
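The subtraction step described in this abstract can be sketched as follows. This is a simplified illustration that assumes a linear system and two impulse responses that are already time-aligned and measured under the same conditions; the function name `subwoofer_response` and the toy impulse responses are hypothetical, and the paper's measurement conditions and limitations are not modeled here.

```python
def subwoofer_response(h_with_subs, h_without_subs):
    """Estimate the subwoofer impulse response as the difference of two
    full-system impulse responses (with and without subwoofers running)."""
    n = max(len(h_with_subs), len(h_without_subs))
    # Zero-pad the shorter response so both have the same length.
    a = list(h_with_subs) + [0.0] * (n - len(h_with_subs))
    b = list(h_without_subs) + [0.0] * (n - len(h_without_subs))
    return [p - q for p, q in zip(a, b)]

# Hypothetical toy data: by linearity the full system is the sum of its parts.
h_tops = [1.0, 0.5, 0.0, 0.0]
h_subs = [0.0, 0.2, 0.4, 0.1]
h_full = [t + s for t, s in zip(h_tops, h_subs)]

estimate = subwoofer_response(h_full, h_tops)
```

In this idealized case the estimate matches `h_subs` up to floating-point rounding. With real measurements, any change between the two runs ends up in the difference as error.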
P3 - Recording, Production, and Reproduction—Multichannel and Spatial Audio
Saturday, May 22, 10:30 — 12:00 (Room C4-Foyer)
P3-1 21-Channel Surround System Based on Physical Reconstruction of a Three Dimensional Target Sound Field—Jeongil Seo, Jae-hyoun Yoo, Kyeongok Kang, ETRI - Daejeon, Korea; Filippo M. Fazi, University of Southampton - Southampton, UK
This paper presents a 21-channel sound field reconstruction system based on the physical reconstruction of a three-dimensional target sound field over a predefined control volume. According to the virtual sound source position and intensity, each loudspeaker signal is obtained by convolving the source signal with an appropriate FIR filter to reconstruct the target sound field. In addition, the FIR filter gain is applied only to the mid-frequency band of the source signal, to prevent aliasing effects and to reduce computational complexity in the high-frequency bands. The whole filter processing is carried out in the frequency domain to enable real-time operation. In subjective listening tests the proposed system showed better localization performance in the horizontal plane than a conventional panning method.
Convention Paper 7973
P3-2 Real-Time Implementation of Wave Field Synthesis on NU-Tech Framework Using CUDA Technology—Ariano Lattanzi, Emanuele Ciavattini, Leaff Engineering - Ancona, Italy; Stefania Cecchi, Laura Romoli, Università Politecnica delle Marche - Ancona, Italy; Fabrizio Ferrandi, Politecnico di Milano - Milan, Italy
In this paper we present a novel implementation of a Wave Field Synthesis application based on the emerging NVIDIA Compute Unified Device Architecture (CUDA) technology using the NU-Tech Framework. CUDA technology unlocks the processing power of Graphics Processing Units (GPUs), which are characterized by a highly parallel architecture. A wide range of complex algorithms are being rewritten in order to benefit from this new approach. Wave Field Synthesis (WFS) is a relatively new spatial audio rendering technique that is highly demanding in terms of computational power. We present here results and comparisons between a NU-Tech Plug-In (NUTS) implementing real-time WFS using CUDA libraries and the same algorithm implemented using the Intel Integrated Performance Primitives (IPP) library.
Convention Paper 7974
P3-3 Investigation of 3-D Audio Rendering with Parametric Array Loudspeakers—Reuben Johannes, Jia-Wei Beh, Woon-Seng Gan, Ee-Leng Tan, Nanyang Technological University - Singapore
This paper investigates the applicability of parametric array loudspeakers to render 3-D audio. Unlike conventional loudspeakers, parametric array loudspeakers are able to produce sound in a highly directional manner, therefore reducing inter-aural crosstalk and room reflections. The investigation is carried out by performing objective evaluation and comparison between parametric array loudspeakers, conventional loudspeakers, and headphones. The objective evaluation includes crosstalk measurement and binaural cue analysis using a binaural hearing model. Additionally, the paper also investigates how the positioning of the parametric array loudspeakers affects 3-D audio rendering.
Convention Paper 7975
P3-4 Robust Representation of Spatial Sound in Stereo-to-Multichannel Upmix—Se-Woon Jeon, Yonsei University - Seoul, Korea; Young-Cheol Park, Yonsei University - Gangwon, Korea; Seok-Pil Lee, Korea Electronics Technology Institute (KETI) - Seoul, Korea; Dae-Hee Youn, Yonsei University - Seoul, Korea
This paper presents a stereo-to-multichannel upmix algorithm based on a source separation method. In conventional upmix algorithms, panned source and ambient components are decomposed or separated by an adaptive algorithm, i.e., least-squares (LS) or least-mean-square (LMS). The separation performance of those algorithms is easily influenced by the primary-to-ambient energy ratio (PAR). Since the PAR is time-varying, it causes energy fluctuation in the separated sound sources. To prevent this problem, we propose a robust separation algorithm using a pseudo-inverse matrix. We also propose a novel post-scaling algorithm that compensates for the influence of interference while taking the desired multichannel format into account. The performance of the proposed upmix algorithm is confirmed by a subjective listening test in ITU 3/2 format.
Convention Paper 7976
P3-5 Ambisonic Decoders; Is Historical Hardware the Future?—Andrew J. Horsburgh, D. Fraser Clark, University of the West of Scotland - Paisley, UK
Ambisonic recordings aim to create full-sphere audio fields by using a multi-capsule microphone and algorithms based on a “metatheory” as proposed by Gerzon. Until recently, Ambisonic decoding was solely implemented using hardware. Recent advances in computing power now allow software decoders to supersede hardware units. It is therefore of interest to determine which of the hardware or software decoders provides the most accurate decoding of Ambisonic B-format signals. In this paper we present a comparison between hardware and software decoders with respect to their frequency and phase relationships to determine the most accurate reproduction. Results show that software is able to decode the files with little coloration compared to hardware circuits. It is possible to see which implementation, analog or digital, matches the behavioral characteristics of an “ideal” decoder.
Convention Paper 7977
P3-6 Localization Curves in Stereo Microphone Techniques—Comparison of Calculations and Listening Tests Results—Magdalena Plewa, Grzegorz Pyda, AGH University of Science and Technology - Kraków, Poland
Stereo microphone techniques are a simple and usable solution for reproducing music scenes while preserving sound direction. To choose a proper microphone setup one needs to know the “recording angle,” which can be easily determined from localization curves. Localization curves for different stereo microphone techniques can be determined by calculations that take interchannel level and time differences into consideration. Subjective listening tests were carried out to verify these calculations. Here we present and discuss the comparison between the calculations and the listening tests.
Convention Paper 7978
P3-7 Investigation of Robust Panning Functions for 3-D Loudspeaker Setups—Johann-Markus Batke, Florian Keiler, Technicolor Research and Innovation - Hannover, Germany
An accurate localization is a key goal for a spatial audio reproduction system. This paper discusses different approaches for audio playback with full spatial information in three dimensions (3-D). Problems of established methods for 3-D audio playback, like the Ambisonics mode matching approach or Vector Base Amplitude Panning (VBAP), are discussed. A new approach is presented with special attention to the treatment of irregular loudspeaker setups, as they are to be expected in real world scenarios like living rooms. This new approach leads to better localization of virtual acoustic sources. Listening tests comparing the new approach with standard mode matching and VBAP will be described in a companion paper.
Convention Paper 7979
P4 - Spatial Signal Processing
Saturday, May 22, 14:00 — 18:00 (Room C3)
Chair: Francis Rumsey
P4-1 Classification of Time-Frequency Regions in Stereo Audio—Aki Härmä, Philips Research Europe - Eindhoven, The Netherlands
The paper is about classification of time-frequency (TF) regions in stereo audio data by the type of mixture the region represents. The detection of the type of mixing is necessary, for example, in source separation, upmixing, and audio manipulation applications. We propose a generic signal model and a method to classify the TF regions into six classes that are different combinations of central, panned, and uncorrelated sources. We give an overview of traditional techniques for comparing frequency-domain data and propose a new approach for classification that is based on measures specially trained for the six classes. The performance of the new measures is studied and demonstrated using synthetic and real audio data.
Convention Paper 7980
P4-2 A Comparison of Computational Precedence Models for Source Separation in Reverberant Environments—Christopher Hummersone, Russell Mason, Tim Brookes, University of Surrey - Guildford, UK
Reverberation continues to be problematic in many areas of audio and speech processing, including source separation. The precedence effect is an important psychoacoustic tool utilized by humans to assist in localization by suppressing reflections arising from room boundaries. Numerous computational precedence models have been developed over the years and all suggest quite different strategies for handling reverberation. However, relatively little work has been done on incorporating precedence into source separation. This paper details a study comparing several computational precedence models and their impact on the performance of a baseline separation algorithm. The models are tested in a range of reverberant rooms and with a range of other mixture parameters. Large differences in the performance of the models are observed. The results show that a model based on interaural coherence produces the greatest performance gain over the baseline algorithm.
Convention Paper 7981
P4-3 Converting Stereo Microphone Signals Directly to MPEG-Surround—Christophe Tournery, Christof Faller, Illusonic LLC - Lausanne, Switzerland; Fabian Kuech, Jürgen Herre, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
We have previously proposed a way to use stereo microphones with spatial audio coding to record and code surround sound. In this paper we are describing further considerations and improvements needed to convert stereo microphone signals directly to MPEG Surround, i.e., a downmix signal plus a bit stream. It is described in detail how to obtain from the microphone channels the information needed for computing MPEG Surround spatial parameters and how to process the microphone signals to transform them to an MPEG Surround compatible downmix.
Convention Paper 7982
P4-4 Modification of Spatial Information in Coincident-Pair Recordings—Jeremy Wells, University of York - York, UK
A novel method is presented for modifying the spatial information contained in the output from a stereo coincident pair of microphones. The purpose of this method is to provide additional decorrelation of the audio at the left and right replay channels for sound arriving at the sides of a coincident pair but to retain the imaging accuracy for sounds arriving to the front or rear or where the entire sound field is highly correlated. Details of how this is achieved are given and results for different types of sound field are presented.
Convention Paper 7983
P4-5 Unitary Matrix Design for Diffuse Jot Reverberators—Fritz Menzer, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland
This paper presents different methods for designing unitary mixing matrices for Jot reverberators with a particular emphasis on cases where no early reflections are to be modeled. Possible applications include diffuse sound reverberators and decorrelators. The trade-off between effective mixing between channels and the number of multiply operations per channel and output sample is investigated as well as the relationship between the sparseness of powers of the mixing matrix and the sparseness of the impulse response.
Convention Paper 7984
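To make the trade-off between mixing and multiply count concrete, here is a minimal sketch of one well-known real unitary (orthogonal) mixing matrix, the Householder reflection I - (2/N)·11ᵀ. It is shown as a generic example and is not claimed to be one of the designs proposed in the paper; all function names are assumptions.

```python
def householder_matrix(n):
    """Dense n x n Householder reflection I - (2/n) * ones * ones^T."""
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)]
            for i in range(n)]

def apply_householder(x):
    """Apply the same matrix with one scaling of the channel sum
    instead of n * n multiplies for a dense matrix product."""
    s = 2.0 * sum(x) / len(x)
    return [v - s for v in x]

def is_unitary(a, tol=1e-12):
    """Check that the rows of a real square matrix are orthonormal."""
    n = len(a)
    for i in range(n):
        for j in range(n):
            dot = sum(a[i][k] * a[j][k] for k in range(n))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True

# A unit impulse on one channel is spread to all four channels.
mixed = apply_householder([1.0, 0.0, 0.0, 0.0])
energy = sum(v * v for v in mixed)
```

Because the matrix is unitary, the energy of the mixed signal equals that of the input, and the structured form needs far fewer multiplies per output sample than a dense matrix of the same size.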
P4-6 Sound Field Indicators for Hearing Activity and Reverberation Time Estimation in Hearing Instruments—Andreas P. Streich, ETH Zurich - Zurich, Switzerland; Manuela Feilner, Alfred Stirnemann, Phonak AG - Stäfa, Switzerland; Joachim M. Buhmann, ETH Zurich - Zurich, Switzerland
Sound field indicators (SFI) are proposed as a new feature set to estimate the hearing activity and reverberation time in hearing instruments. SFIs are based on physical measurements of the sound field. A variant thereof, called SFI short-time statistics SFIst2, is obtained by computing mean and standard deviations of SFIs on 10 subframes. To show the utility of these feature sets for the mentioned prediction tasks, experiments are carried out on artificially reverberated recordings of a large variety of sounds encountered in daily life. In a classification scenario where the hearing activity is to be predicted, both SFI and SFIst2 yield clearly superior accuracy even compared to hand-tailored features used in state-of-the-art hearing instruments. For regression on the reverberation time, the SFI-based features yield a lower residual error than standard feature sets and reach the performance of specially designed features. The hearing activity classification is mainly based on the average of the SFIs, while the standard deviation over the subframes is used heavily to predict the reverberation time.
Convention Paper 7985
P4-7 Stereo-to-Binaural Conversion Using Interaural Coherence Matching—Fritz Menzer, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland
In this paper a method of converting stereo recordings to simulated binaural recordings is presented. The stereo signal is separated into coherent and diffuse sound based on the assumption that the signal comes from a coincident symmetric microphone setup. The coherent part is reproduced using HRTFs and the diffuse part is reproduced using filters adapting the interaural coherence to the interaural coherence a binaural recording of diffuse sound would have.
Convention Paper 7986
P4-8 Linear Simulation of Spaced Microphone Arrays Using B-Format Recordings—Andreas Walther, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland
A novel approach for linear post-processing of B-Format recordings is presented. The goal is to simulate spaced microphone arrays by approximating and virtually recording the sound field at the position of each single microphone. The delays occurring in non-coincident recordings are simulated by translating an approximative plane wave representation of the sound field to the positions of the microphones. The directional responses of the spaced microphones are approximated by linear combination of the corresponding translated B-format channels.
Convention Paper 7987
P5 - Loudspeakers and Headphones: Part 1
Saturday, May 22, 14:00 — 18:00 (Room C5)
Chair: Mark Dodd, GP Acoustics - UK
P5-1 Micro Loudspeaker Behavior versus 6½-Inch Driver, Micro Loudspeaker Parameter Drift—Bo Rohde Pedersen, Aalborg University Esbjerg - Esbjerg, Denmark
This study tested micro loudspeaker behavior from the perspective of loudspeaker parameter drift. The main difference between traditional transducers and micro loudspeakers, apart from their size, is their suspension construction. The suspension is generally a loudspeaker's most unstable parameter, and the study investigated temperature drift and signal dependency. Three different micro loudspeakers were investigated and their behavior was compared to that of a typical bass/mid-range loudspeaker unit. All linear loudspeaker parameters were measured at different temperatures.
Convention Paper 7988
P5-2 Modeling a Loudspeaker as a Flexible Spherical Cap on a Rigid Sphere—Ronald Aarts, Philips Research Europe - Eindhoven, The Netherlands, Technical University Eindhoven, Eindhoven, The Netherlands; Augustus J. E. M. Janssen, Technical University Eindhoven - Eindhoven, The Netherlands
It has been argued that the sound radiation of a loudspeaker is modeled realistically by assuming the loudspeaker cabinet to be a rigid sphere with a moving rigid spherical cap. Series expansions, valid in the whole space on and outside the sphere, for the pressure due to a harmonically excited, flexible cap with an axially symmetric velocity distribution are presented. The velocity profile is expanded in functions orthogonal on the cap rather than on the whole sphere. This has the advantage that only a few expansion coefficients are sufficient to accurately describe the velocity profile. An adaptation of the standard solution of the Helmholtz equation to this particular parametrization is required. This is achieved by using recent results on argument scaling in orthogonal Zernike polynomials. The efficacy of the approach is exemplified by calculating various acoustical quantities with particular attention to certain velocity profiles that vanish at the rim of the cap to a desired degree. These quantities are: the sound pressure, polar response, baffle-step response, sound power, directivity, and acoustic center of the radiator. The associated inverse problem, in which the velocity profile is estimated from pressure measurements around the sphere, is feasible as well since the number of expansion coefficients to be estimated is limited. This is demonstrated with a simulation.
Convention Paper 7989
P5-3 Closed and Vented Loudspeaker Alignments that Compensate for Nearby Room Boundaries—Jamie A. S. Angus, University of Salford - Salford, Greater Manchester, UK
The purpose of this paper is to present a method of designing loudspeakers in which the presence of the nearest three boundaries is taken into account in the design of the low-frequency speaker. The paper will first review the effects of the three boundaries. It will then discuss how these effects might be compensated for. The paper will then examine the low-frequency behavior of loudspeaker drive units and make suggestions for new alignments that take account of the boundaries. It will conclude with some simulation examples.
Convention Paper 7990
P5-4 Point-Source Loudspeaker Reversely-Attached Acoustic Horn: Its Architecture, Acoustic Characteristics and Application to HRTF Measurements—Takahiro Miura, Teruo Muraoka, Tohru Ifukube, The University of Tokyo - Tokyo, Japan
Ideally, acoustic characteristics are measured using a point sound source. When a single source is recorded simultaneously by multiple microphones located less than 1 m from the source, it is difficult to regard the loudspeaker as a point source. In this paper we propose a point-source loudspeaker whose radiation diameter is smaller than 2 cm. The loudspeaker is constructed by attaching the mouth of a hyperbolic horn to the diaphragm of a loudspeaker unit. Directional patterns of the proposed loudspeaker were measured at a distance of 50 cm from the radiation point in an anechoic chamber. As a result, the differences in directional intensity in the frequency range of 20 - 700 Hz were within 3 dB for any combination of azimuth and elevation. In the frequency range over 700 Hz, the differences in azimuthal directional intensity were within 10 dB, while those in elevation were within 20 dB.
Convention Paper 7991
P5-5 The Low-Frequency Acoustic Center: Measurement, Theory, and Application—John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada
At low frequencies the acoustic effect of a loudspeaker becomes simpler as the wavelength of the sound becomes large relative to the cabinet dimensions. One point acoustically acts as the center of the speaker at the lower frequencies. Measurements and acoustic boundary-element simulations verify the concept. Source radiation can be expressed as a multipole expansion, consisting of a spherical monopolar portion and a significant dipolar part, which becomes zero when the acoustic center is chosen as the origin. Theory shows a strong connection between diffraction and the position of the acoustic center. General criteria are presented to give the position of the acoustic center for different geometrical cabinet shapes. Polar plots benefit when the pivot point is chosen to be the acoustic center. For the first of several applications we consider a subwoofer, whose radiation into a room is strongly influenced by the position of the acoustic center. A second application that we consider is the accurate reciprocity calibration of microphones, for which it is necessary to know the position of the acoustic center. A final application is the effective position of the ears on the head at lower frequencies. Calculations and measurements show that the acoustic centers of the ears are well away from the head.
Convention Paper 7992
P5-6 Prediction of Harmonic Distortion Generated by Electro-Dynamic Loudspeakers Using Cascade of Hammerstein Models—Marc Rebillat, LIMSI-CNRS - Orsay, France, LMS (CNRS, École Polytechnique), Palaiseau, France; Romain Hennequin, Institut TELECOM, TELECOM ParisTech - Paris, France; Etienne Corteel, sonic emotion - Oberglatt (Zurich), Switzerland; Brian F. G. Katz, LIMSI-CNRS - Orsay, France
Audio rendering systems are always slightly nonlinear. Their nonlinearities must be modeled and measured for quality evaluation and control purposes. Cascades of Hammerstein models describe a large class of nonlinearities. To identify the elements of such a model, a method based on a phase property of exponential sine sweeps is proposed. A complete model of the nonlinearities is identified from a single measurement. Cascades of Hammerstein models corresponding to an electro-dynamic loudspeaker are identified this way. Harmonic distortion is afterward predicted using the identified models. Comparisons with classical measurement techniques show that harmonic distortion is accurately predicted by the identified models over the entire audio frequency range for any desired input amplitude.
Convention Paper 7993 (Purchase now)
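As an illustration of the model class named in the P5-6 abstract above, the following is a minimal sketch that assumes the common parallel form, in which the input raised to the n-th power is passed through its own linear filter and the branch outputs are summed. The function name `hammerstein_cascade` and the example kernels are hypothetical and are not taken from the paper, and the sweep-based identification step is not shown.

```python
import numpy as np

def hammerstein_cascade(x, kernels):
    """Sum of branches; branch n filters x**n with kernels[n-1].

    x       : 1-D input signal
    kernels : list of non-empty 1-D impulse responses, one per power n = 1, 2, ...
              (illustrative values only, not identified from a real loudspeaker)
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros(len(x))
    for order, h in enumerate(kernels, start=1):
        # Static power nonlinearity followed by a linear FIR filter,
        # truncated to the input length.
        y += np.convolve(x ** order, np.asarray(h, dtype=float))[: len(x)]
    return y

# Example: a unit sine through a weak cubic branch.
t = np.arange(480) / 48000.0
x = np.sin(2 * np.pi * 1000.0 * t)
y = hammerstein_cascade(x, [[1.0], [0.0], [0.1]])
```

With single-tap kernels as in this example the model reduces to a memoryless polynomial; longer kernels give each distortion order its own frequency response.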
P5-7 Characterizing Studio Monitor Loudspeakers for Auralization—Ben Supper, Focusrite Audio Engineering Ltd. - High Wycombe, UK
A method is presented for obtaining the frequency and phase response, directivity pattern, and some of the nonlinear distortion characteristics of studio monitor loudspeakers. Using a specially designed test signal, the impulse response and directivity pattern are measured in a small recording room. A near-field measurement is also taken. An algorithm is presented for combining the near- and far-field responses in order to remove the early reflections of the room. Doppler distortion can be calculated using recorded and measured properties of the loudspeaker. The result is a set of loudspeaker impulse and directional responses that are detailed enough for convincing auralization.
Convention Paper 7994 (Purchase now)
P5-8 Automated Design of Loudspeaker Diaphragm Profile by Optimizing the Simulated Radiated Sound Field with Experimental Validation—Patrick Macey, PACSYS Limited - Nottingham, UK; Kelvin Griffiths, Harman International Automotive Division - Bridgend, UK
Loudspeaker designers often aim to reduce the acoustic effects of diaphragm resonance on the frequency and power response. However, even in the domain of simulation models, this often requires considerable trial and error to produce a satisfactory outcome. A modeling platform to simulate loudspeakers is presented, which can automatically cycle through vast permutations guided by a carefully considered objective function until convergence is achieved, and importantly, the entire iterative effort is accommodated by the computer. Experience and knowledge are still needed to evaluate a priori which factors are important in achieving a desired result. Once prepared, the basis of a rapid loudspeaker development tool is formed that can return a tangible resource profit when compared to successive manual efforts to optimize loudspeaker components.
Convention Paper 7995 (Purchase now)
P6 - Audio Equipment and Emerging Technologies
Saturday, May 22, 14:00 — 15:30 (Room C4-Foyer)
P6-1 Study and Evaluation of MOSFET Rds(ON) Impedance Efficiency Losses in High Power Multilevel DCI-NPC Amplifiers—Vicent Sala, G. Ruiz, Luis Romeral, UPC-Universitat Politecnica de Catalunya - Terrassa, Spain
This paper justifies the usefulness of multilevel power amplifiers with DCI-NPC (Diode Clamped Inverter – Neutral Point Converter) topology in applications where size and weight must be optimized. These amplifiers can work at high frequencies, thereby reducing the size and weight of the filter elements. However, it is necessary to study, analyze, and evaluate the efficiency losses because this amplifier has double the number of switching elements. This paper models the behavior of the MOSFET Rds(ON) in a DCI-NPC topology for different conditions.
Convention Paper 7996 (Purchase now)
P6-2 Modeling Distortion Effects in Class-D Amplifier Filter Inductors—Arnold Knott, Tore Stegenborg-Andersen, Ole C. Thomsen, Technical University of Denmark - Lyngby, Denmark; Dominik Bortis, Johann W. Kolar, Swiss Federal Institute of Technology in Zurich - Zurich, Switzerland; Gerhard Pfaffinger, Harman/Becker Automotive Systems GmbH - Straubing, Germany; Michael A. E. Andersen, Technical University of Denmark - Lyngby, Denmark
Distortion is generally accepted as a quantifier to judge the quality of audio power amplifiers. In switch-mode power amplifiers various mechanisms influence this performance measure. After giving an overview of those, this paper focuses on the particular effect of the nonlinearity of the output filter components on the audio performance. While the physical reasons for both the capacitor-induced and the inductor-induced distortion are given, the practical in-depth demonstration is done for the inductor only. This includes measuring the inductor's performance, modeling it through fitting, and deriving simulation models. The fitted models achieve distortion values between 0.03% and 0.20% as a basis to enable the design of a 200 W amplifier.
Convention Paper 7997 (Purchase now)
P6-3 Multilevel DCI-NPC Power Amplifier High-Frequency Distortion Analysis through Parasitic Inductance Dynamic Model—Vicent Sala, G. Ruiz, E. López, Luis Romeral, UPC-Universitat Politecnica de Catalunya - Terrassa, Spain
The high-frequency distortion sources in the DCI-NPC (Diode Clamped Inverter – Neutral Point Converter) amplifier topology are studied and analyzed. This justifies the need for a model that contains the different parasitic inductive circuits that this kind of amplifier presents dynamically, as a function of the combination of its active transistors. By means of a proposed layout pattern we present a dynamic model of the parasitic inductances of the full-bridge DCI-NPC amplifier, which is used to propose some simple rules for the optimal design of layouts for these types of amplifiers. Simulation and experimental results are presented to justify the proposed model and the statements and recommendations given in this paper.
Convention Paper 7998 (Purchase now)
P6-4 How Much Gain Should a Professional Microphone Preamplifier Have?—Douglas McKinnie, Middle Tennessee State University - Murfreesboro, TN, USA
Many tradeoffs are required in the design of microphone preamplifier circuits. Characteristics such as noise figure, stability, bandwidth, and complexity may be dependent upon the gain of the design. Three factors determine the gain required from a microphone preamp: sound-pressure level of the sound source, distance of the microphone from that sound source (within the critical distance), and sensitivity of the microphone. This paper is an effort to find a probability distribution of the gain settings used with professional microphones. This is done by finding the distribution of max SPL in real use and by finding the sensitivity of the most commonly used current and classic microphones.
Convention Paper 7999 (Purchase now)
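The three factors listed in the P6-4 abstract above can be combined into a rough gain estimate. The sketch below is only an illustration and assumes free-field inverse-distance decay referenced to 1 m, the standard calibration point of 94 dB SPL for 1 Pa, and a hypothetical target output level in dBV. The function name `required_gain_db` and all example values are assumptions, not figures from the paper.

```python
import math

def required_gain_db(spl_at_1m_db, distance_m, sensitivity_mv_per_pa, target_dbv=0.0):
    """Estimate preamp gain (dB) needed to reach target_dbv at the preamp output.

    spl_at_1m_db          : source level at 1 m, dB SPL (assumed reference distance)
    distance_m            : microphone distance in meters (direct field assumed)
    sensitivity_mv_per_pa : microphone sensitivity in mV/Pa
    target_dbv            : desired output level in dBV (hypothetical choice)
    """
    # Inverse-distance law: level drops about 6 dB per doubling of distance.
    spl_at_mic = spl_at_1m_db - 20.0 * math.log10(distance_m)
    # 94 dB SPL corresponds to approximately 1 Pa, where the mic delivers its rated sensitivity.
    mic_out_dbv = 20.0 * math.log10(sensitivity_mv_per_pa / 1000.0) + (spl_at_mic - 94.0)
    return target_dbv - mic_out_dbv

# Example with made-up values: a quiet source and a low-sensitivity microphone.
gain = required_gain_db(spl_at_1m_db=74.0, distance_m=0.5, sensitivity_mv_per_pa=2.0)
```

Lower source levels, greater distances, and lower sensitivities all push the required gain up, which is why the distribution of these quantities in real use determines the gain range a preamplifier needs.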
P6-5 Equalizing Force Contributions in Transducers with Partitioned Electrode—Libor Husník, Czech Technical University in Prague - Prague, Czech Republic
A partitioned electrode in an electrostatic transducer offers, among other things, the possibility of building a transducer with direct D/A conversion. Nevertheless, partitioned electrodes whose sizes are proportional to powers of 2 or to terms of another convenient series do not exert the corresponding force on the membrane. The reason is that the membrane does not vibrate in a piston-like mode, and electrode parts close to the membrane periphery do not excite membrane vibrations in the same way as the elements near the center. The aim of this paper is to suggest equalization of force contributions from different partitioned electrodes by varying their sizes. Principles presented here can also be used for other membrane-electrode arrangements.
Convention Paper 8000 (Purchase now)
P6-6 Low-End Device to Convert EEG Waves to MIDI—Adrian Attard Trevisan, St. Martins Institute of Information Technology - Hamrun, Malta; Lewis Jones, London Metropolitan University - London, UK
This research provides a simple and portable system that is able to generate MIDI output based on data collected through an EEG device. The context is beneficial in many ways: the therapeutic effects of listening to music created from brain waves have been documented in many cases of treating health problems. The approach is influenced by the interface described in the article “Brain-Computer Music Interface for Composition and Performance” by Eduardo Reck Miranda, where different frequency bands trigger corresponding piano notes and the complexity of the signal represents the tempo of the sound. The correspondence between the sound and the notes has been established through experimental work, where data from participants of a test group were gathered and analyzed, assigning intervals of brain frequencies to different notes. The study is an active contribution to the field of neurofeedback, by providing criteria tools for assessment.
Convention Paper 8001 (Purchase now)
P6-7 Implementation and Development of Interfaces for Music Performance through Analysis of Improvised Dance Movements—Richard Hoadley, Anglia Ruskin University - Cambridge, UK
Electronic music, even when designed to be interactive, can lack performance interest and is frequently musically unsophisticated. This is unfortunate because there are many aspects of electronic music that can be interesting, elegant, demonstrative, and musically informative. The use of dancers to interact with prototypical interfaces comprising clusters of sensors generating music algorithmically provides a method of investigating human actions in this environment. This is achieved through collaborative work involving software and hardware designers, composers, sculptors, and choreographers who examine aesthetically and practically the interstices of these disciplines. This paper investigates these interstices.
Convention Paper 8002 (Purchase now)
P6-8 Violence Prediction through Emotional Speech—José Higueras-Soler, Roberto Gil-Pita, Enrique Alexandre, Manuel Rosa-Zurera, Universidad de Alcalá - Alcalá de Henares, Madrid, Spain
Preventing violence is an absolute necessity in our society. Whether in homes with a particular risk of domestic violence or in prisons or schools, there is a need for systems capable of detecting risk situations for preventive purposes. One of the most important factors that precede a violent situation is an emotional state of anger. In this paper we discuss the features that must be provided to decision makers dedicated to the detection of emotional states of anger from speech signals. For this purpose, we present a set of experiments and results with the aim of studying the combination of features extracted from the literature and their effects on the detection performance (the relationship between the probability of detection of anger and the probability of false alarm) of a neural network and a least-squares linear detector.
Convention Paper 8003 (Purchase now)
P6-9 FoleySonic: Placing Sounds on a Timeline through Gestures—David Black, Kristian Gohlke, University of Applied Sciences, Bremen - Bremen, Germany; Jörn Loviscach, University of Applied Sciences, Bielefeld - Bielefeld, Germany
The task of sound placement on video timelines is usually a time-consuming process that requires the sound designer or foley artist to carefully calibrate the position and length of each sound sample. For novice and home video producers, friendlier and more entertaining input methods are needed. We demonstrate a novel approach that harnesses the motion-sensing capabilities of readily available input devices, such as the Nintendo Wii Remote or modern smart phones, to provide intuitive and fluid arrangement of samples on a timeline. Users can watch a video while simultaneously adding sound effects, providing a near real-time workflow. The system leverages the user’s motor skills for enhanced expressiveness and provides a satisfying experience while accelerating the process.
Convention Paper 8004 (Purchase now)
P6-10 A Computer-Aided Audio Effect Setup Procedure for Untrained Users—Sebastian Heise, Michael Hlatky, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany; Jörn Loviscach, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany
The number of parameters of modern audio effects easily runs into the dozens. Expert knowledge is required to understand which parameter change results in a desired effect. Yet, such sound processors are also making their way into consumer products, where they overburden most users. Hence, we propose a procedure to achieve a desired effect without technical expertise based on a black-box genetic optimization strategy: Users are only confronted with a series of comparisons of two processed examples. Learning from the users’ choices, our software optimizes the parameter settings. We conducted a study on hearing-impaired persons without expert knowledge, who used the system to adjust a third-octave equalizer and a multiband compressor to improve the intelligibility of a TV set.
Convention Paper 8005 (Purchase now)
P7 - Loudspeakers and Headphones
Saturday, May 22, 16:30 — 18:00 (Room C4-Foyer)
P7-1 Theoretical and Experimental Comparison of Amplitude Modulation Techniques for Parametric Loudspeakers—Wei Ji, Woon-Seng Gan, Peifeng Ji, Nanyang Technological University - Singapore
Due to the self-demodulation property of finite amplitude ultrasonic waves propagating in air, a highly focused sound beam can be reproduced. This phenomenon is known as the parametric array in air. However, because of the nonlinearity effect of ultrasonic waves, the reproduced sound wave suffers from high distortion. Several amplitude modulation techniques have been previously proposed to mitigate this problem. To date, no comprehensive study has been carried out to evaluate the performance of these amplitude modulation techniques. This paper attempts to provide theoretical and experimental studies of the effectiveness of different amplitude modulation techniques in mitigating the distortion. Objective measurements, such as sound pressure level and total harmonic distortion, are used to compare the performance of these techniques.
Convention Paper 8006 (Purchase now)
P7-2 Improvement of Sound Quality by Means of Ultra-Soft Elastomer for the Gel Type Inertia Driven DML Type Transducer—Minsung Cho, Edinburgh Napier University - Edinburgh, UK, SFX Technologies Ltd., Edinburgh, UK; Elena Prokofieva, Edinburgh Napier University - Edinburgh, UK; Jordi Munoz, SFX Technologies Ltd. - Edinburgh, UK; Mike Barker, Edinburgh Napier University - Edinburgh, UK
Unlike standard DML transducers, the gel-type inertia driven transducer (referred to as the gel transducer in this paper), designed as a mini woofer DML-type transducer, transfers its pistonic movement to a panel through the gel surround, thereby generating a transverse wave. This mechanism induces maximum movement of a magnet assembly, which boosts the force of the moving voice coil so that sound pressure level increases as the acceleration of the panel increases. This effect is proportional to sound pressure level at low and medium frequencies in the range 50 Hz to 1000 Hz. Furthermore, it is found that the gel surround prevents transverse waves reflected from the panel from interfering with the pistonic waves generated by the transducer. Results of the stiffness testing of the gel surround are presented together with data for the displacement and acceleration of the panel with the gel transducer attached. These data are compared to acoustical outputs.
Convention Paper 8007 (Purchase now)
P7-3 Electrical Circuit Model for a Loudspeaker with an Additional Fixed Coil in the Gap—Daniele Ponteggia, Studio Ing. Ponteggia - Terni, Italy; Marco Carlisi, Ing. Carlisi Marco - Como, Italy; Andrea Manzini, 18 Sound - Division of A.E.B. S.r.l. - Cavriago, Italy
A previous paper by some of the authors investigated a solution to minimize the loudspeaker inductance based on an additional fixed coil positioned in the gap with two additional terminals. This device is referred to as A.I.C. (Active Impedance Control). In this paper a suitable electrical circuit model for such a loudspeaker with four terminals and two coils is proposed.
Convention Paper 8008 (Purchase now)
P7-4 Measurement of the Nonlinear Distortions in Loudspeakers with a Broadband Noise—Rafal Siczek, Andrzej B. Dobrucki, Wroclaw University of Technology - Wroclaw, Poland
The paper presents an application of digital filters to the measurement of nonlinear distortion in loudspeakers using Wolf’s method and broadband noise as the exciting signal. The results of simulating the measurement process for a loudspeaker with various nonlinearities of Bl factor and suspension stiffness are presented. The influence of various exciting signals, e.g., white and pink noise, as well as of the parameters of the digital filters, has been tested. The results of measuring an actual loudspeaker are also presented.
Convention Paper 8009 (Purchase now)
P7-5 The Fundamentals of Loudspeaker Radiation and Acoustic Quality—Grzegorz P. Matusiak, New Smart System (AC Systems) - Poznan, Poland and Smørum, Denmark
The loudspeaker designer faces many challenges when designing a new loudspeaker system. The proposed solution is to replace the mathematically complicated series circuit of the radiation impedance by a new one that can easily be introduced into a loudspeaker circuit over the whole frequency range. This facilitates the design process and makes it possible to learn more about the loudspeaker. The fundamental resonance in air is lower than in vacuum and depends on the radiating surface, as does the acoustic quality (QA), which also depends on frequency: 1/QT = 1/QE + 1/QM + 1/QA. The parameters affect the SPL and efficiency response. Investigated structures: pulsating sphere; baffled circular, square, and rectangular piston; infinite strip; unflanged pipe-piston; and finally horn (0-infinity Hz).
Convention Paper 8010 (Purchase now)
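The relation 1/QT = 1/QE + 1/QM + 1/QA quoted in the P7-5 abstract above can be evaluated directly. The following minimal sketch assumes that QE, QM, and QA denote the electrical, mechanical, and acoustic quality factors at a single frequency; the function name `total_q` and the numerical values are hypothetical and do not come from the paper.

```python
def total_q(qe, qm, qa):
    """Combine quality factors as 1/QT = 1/QE + 1/QM + 1/QA.

    All three inputs must be positive. Because the abstract states that the
    acoustic quality depends on frequency, qa would in practice be evaluated
    separately at each frequency of interest.
    """
    if min(qe, qm, qa) <= 0:
        raise ValueError("quality factors must be positive")
    return 1.0 / (1.0 / qe + 1.0 / qm + 1.0 / qa)

# Illustrative values only: the smallest Q dominates the total.
qt = total_q(qe=0.4, qm=4.0, qa=50.0)
```

The reciprocal sum means the total is always below the smallest individual quality factor, so a large QA changes QT only slightly.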
P7-6 Analysis of Low Frequency Audio Reproduction via Multiple Low-Bl Loudspeakers—Theodore Altanis, John Mourjopoulos, University of Patras - Patras, Greece
Small loudspeakers with low force factors (Bl) behave as narrow bandwidth filters with high quality factors and can be tuned to low resonance frequencies. It is possible to optimize low frequency reproduction (in the sub-woofer range) using a number of such drivers. This paper employs a set of such low Bl loudspeakers with different eigenfrequencies to achieve efficient reproduction of low frequency signals and examines aspects of such an implementation, comparing the results to an ideal sub-woofer system and evaluating these systems perceptually.
Convention Paper 8011 (Purchase now)
P7-7 Circular Loudspeaker Array with Controllable Directivity—Martin Møller, Martin Olsen, Finn T. Agerkvist, Technical University of Denmark - Lyngby, Denmark; Jakob Dyreby, Gert Kudahl Munch, Bang & Olufsen A/S - Struer, Denmark
Specific directivity patterns for circular arrays of loudspeakers can be achieved by utilizing the concept of phase-modes, which expands the directivity pattern into a series of circular harmonics. This paper investigates the applicability of this concept to a loudspeaker array on a cylindrical baffle, with a desired directivity pattern that is specified in the frequency interval of 400–5000 Hz. From the specified frequency-independent directivity pattern, filter transfer functions for each loudspeaker are determined. The sensitivity to various parameters related to practical implementation is also investigated by introducing filter errors for each array element. Measurements on a small scale model using 2 inch drivers are compared with simulations, showing good agreement between experimental and predicted results.
Convention Paper 8012 (Purchase now)
P8 - Music Analysis and Processing
Sunday, May 23, 09:00 — 13:00 (Room C3)
Chair: David Malham, University of York - York, UK
P8-1 Automatic Detection of Audio Effects in Guitar and Bass Recordings—Michael Stein, Jakob Abeßer, Christian Dittmar, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Gerald Schuller, Ilmenau University of Technology - Ilmenau, Germany
This paper presents a novel method to detect and distinguish 10 frequently used audio effects in recordings of electric guitar and bass. It is based on spectral analysis of audio segments located in the sustain part of previously detected guitar tones. Overall, 541 spectral, cepstral and harmonic features are extracted from short time spectra of the audio segments. Support Vector Machines are used in combination with feature selection and transform techniques for automatic classification based on the extracted feature vectors. With correct classification rates up to 100% for the detection of single effects and 98% for the simultaneous distinction of 10 different effects, the method has successfully proven its capability—performing on isolated sounds as well as on multitimbral, stereophonic musical recordings.
Convention Paper 8013 (Purchase now)
P8-2 Time Domain Emulation of the Clavinet—Stefan Bilbao, University of Edinburgh - Edinburgh, UK; Matthias Rath, Technische Universität Berlin - Berlin, Germany
The simulation of classic electromechanical musical instruments and audio effects has seen a great deal of activity in recent years, due in part to great recent increases in computing power. It is now possible to perform full emulations of relatively complex musical instruments in real time, or near real time. In this paper time domain finite difference schemes are applied to the emulation of the Hohner Clavinet, an electromechanical stringed instrument exhibiting special features such as sustained hammer/string contact, pinning of the string to a metal stop, and a distributed damping mechanism. Various issues, including numerical stability, implementation details, and computational cost will be discussed. Simulation results and sound examples will be presented.
Convention Paper 8014 (Purchase now)
P8-3 Polyphony Number Estimator for Piano Recordings Using Different Spectral Patterns—Ana M. Barbancho, Isabel Barbancho, Javier Fernandez, Lorenzo J. Tardón, Universidad de Málaga - Málaga, Spain
One of the main tasks of a polyphonic transcription system is the estimation of the number of voices, i.e., the polyphony number. Although the correct estimation of this parameter is very important for polyphonic transcription systems, the task has not been discussed in depth in the known transcription systems. The aim of this paper is to propose a novel estimation method of the polyphony number for piano recordings. This new method is based on the use of two different types of spectral patterns: single-note patterns and composed-note patterns. The usage of composed-note patterns in the estimation of the polyphony number and in the polyphonic detection process has not been previously reported in the literature.
Convention Paper 8015 (Purchase now)
P8-4 String Ensemble Vibrato: A Spectroscopic Study—Stijn Mattheij, AVANS University - Breda, The Netherlands
A systematic observation of the presence of ensemble vibrato on early twentieth century recordings of orchestral works has been carried out by studying spectral line shapes of individual musical notes. Broadening of line shapes was detected in recordings of Beethoven’s Fifth Symphony and Brahms’s Hungarian Dance no. 5; this effect was attributed to ensemble vibrato. From these observations it may be concluded that string ensemble vibrato was common practice in orchestras from the continent throughout the twentieth century. British orchestras did not use much vibrato before 1940.
Convention Paper 8016 (Purchase now)
P8-5 Influence of Psychoacoustic Roughness on Musical Intonation Preference—Julián Villegas, Michael Cohen, Ian Wilson, University of Aizu - Aizu, Japan; William Martens, University of Sydney - Sydney, NSW, Australia
An experiment to compare the acceptability of three different music fragments rendered with three different intonations is presented. These preference results were contrasted with those of isolated chords also rendered with the same three intonations. The least rough renditions were found to be those using Twelve-Tone Equal-Temperament (12-tet). Just Intonation (ji) renditions were the roughest. A negative correlation between preference and psychoacoustic roughness was also found.
Convention Paper 8017 (Purchase now)
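To make the tuning difference behind the P8-5 abstract above concrete, the sketch below compares a few intervals in twelve-tone equal temperament with common five-limit just ratios, expressed in cents. The ratio table is an assumption: the abstract does not say which just intonation the authors used, and the names `JUST_RATIOS`, `TET_SEMITONES`, and `deviation_cents` are hypothetical.

```python
import math

# Commonly cited five-limit just ratios (an assumption, not taken from the paper).
JUST_RATIOS = {"major third": 5 / 4, "perfect fourth": 4 / 3, "perfect fifth": 3 / 2}
# Size of the same intervals in 12-tet, in semitones of 100 cents each.
TET_SEMITONES = {"major third": 4, "perfect fourth": 5, "perfect fifth": 7}

def cents(ratio):
    """Convert a frequency ratio to cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

def deviation_cents(interval):
    """Signed difference, 12-tet minus just, for the named interval."""
    return 100.0 * TET_SEMITONES[interval] - cents(JUST_RATIOS[interval])

for name in JUST_RATIOS:
    print(f"{name}: 12-tet differs from just by {deviation_cents(name):+.2f} cents")
```

The sketch only quantifies how far the two tunings diverge; it says nothing about perceived roughness, which the paper measures experimentally.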
P8-6 Music Emotion and Genre Recognition Toward New Affective Music Taxonomy—Jonghwa Kim, Lars Larsen, University of Augsburg - Augsburg, Germany
Exponentially increasing electronic music distribution creates a natural pressure for fine-grained musical metadata. On the basis of the fact that a primary motive for listening to music is its emotional effect, diversion, and the memories it awakens, we propose a novel affective music taxonomy that combines the global music genre taxonomy, e.g., classical, jazz, rock/pop, and rap, with emotion categories such as joy, sadness, anger, and pleasure, in a complementary way. In this paper we deal with all essential stages of an automatic genre/emotion recognition system, i.e., from reasonable music data collection up to performance evaluation of various machine learning algorithms. In particular, a novel classification scheme called consecutive dichotomous decomposition tree (CDDT) is presented, which is specifically parameterized for multi-class classification problems with an extremely high number of classes, e.g., sixteen music categories in our case. The average recognition accuracy of 75% for the 16 music categories demonstrates the feasibility of the proposed affective music taxonomy.
Convention Paper 8018 (Purchase now)
P8-7 Perceptually-Motivated Audio Morphing: Warmth—Duncan Williams, Tim Brookes, University of Surrey - Guildford, UK
A system for morphing the warmth of a sound independently from its other timbral attributes was coded, building on previous work morphing brightness only, and morphing brightness and softness. The new warmth-softness-brightness morpher was perceptually validated using a series of listening tests. A multidimensional scaling analysis of listener responses to paired-comparisons showed perceptually orthogonal movement in two dimensions within a warmth-morphed and everything-else-morphed stimulus set. A verbal elicitation experiment showed that listeners’ descriptive labeling of these dimensions was as intended. A further “quality control” experiment provided evidence that no “hidden” timbral attributes were altered in parallel with the intended ones. A complete timbre morpher can now be considered for further work and evaluated using the tri-stage procedure documented here.
Convention Paper 8019 (Purchase now)
P8-8 A Novel Envelope-Based Generic Dynamic Range Compression Model—Adam Weisser, Oticon A/S - Smørum, Denmark
A mathematical model is presented that reproduces typical dynamic range compression when given the nominal input envelope of the signal and the compression constants. The model is derived geometrically in a qualitative approach and the governing differential equation for an arbitrary input and an arbitrary compressor is found. Step responses compare well to commercial compressors tested. The compression effect on speech using the general equation in its discrete version is also demonstrated. The model's applicability is especially appealing for hearing aids, where the input-output curve and time constants of the nonlinear instrument are frequently consulted and the qualitative theoretical effect of compression may be crucial for speech perception.
Convention Paper 8020 (Purchase now)
P9 - Room and Architectural Acoustics
Sunday, May 23, 09:00 — 13:00 (Room C5)
Chair: Ben Kok, Consultant - The Netherlands
P9-1 On the Air Absorption Effects in a Finite Difference Implementation of the Acoustic Diffusion Equation Model—Juan M. Navarro, San Antonio's Catholic University of Murcia - Guadalupe, Spain; José Escolano, University of Jaén - Linares, Spain; José J. Lopez, Universidad Politecnica de Valencia - Valencia, Spain
In room-acoustics modeling, the atmospheric attenuation of sound is a critical phenomenon that has to be taken into account to obtain correct predictions, especially when high frequencies and large enclosures are simulated. A finite difference scheme implementation of the diffusion equation model with a mixed boundary condition is evaluated, including the air absorption within the room. This paper focuses on investigating the performance of this implementation for room-acoustics simulation. In particular, the stability condition is developed to compare the features of the solution both with and without the air absorption term. Moreover, the correct behavior of the numerical implementation has been studied by comparing predicted results using different surfaces and air absorption coefficients.
Convention Paper 8021 (Purchase now)
P9-2 A Comparison of Two Diffuse Boundary Models Based on Finite Differences Schemes—José Escolano, University of Jaén - Linares, Spain; Juan M. Navarro, San Antonio's Catholic University of Murcia - Guadalupe, Spain; Damian T. Murphy, Jeremy J. Wells, University of York - York, UK; José J. López, Universidad Politécnica de Valencia - Valencia, Spain
In room acoustics, the reflection and scattering of sound waves at the boundaries strongly determine the behavior within the enclosed space itself. Therefore, the accurate simulation of the acoustic characteristics of a boundary is an important part of any room acoustics prediction model, and in particular diffuse reflection at a boundary is one of the most important properties to model correctly for an accurate and perceptually natural result. This paper presents a comparison of two simulation models for the phenomenon of diffuse reflection at a boundary: one based on introducing physical variations at the boundary itself, and another based on the use of a diffusion equation model.
Convention Paper 8022 (Purchase now)
P9-3 Considerations for the Optimal Location and Boundary Effects for Loudspeakers in an Automotive Interior—Roger Shively, Jeff Bailey, Harman International - Novi, MI, USA; Jerôme Halley, Lars Kurandt, Harman International - Karlsbad, Germany; François Malbos, Harman International - Chateau du Loir, France; Gabriel Ruiz, Harman International - Bridgend, Wales, UK; Alfred Svobodnik, Harman International - Vienna, Austria
Building on earlier work by the authors on boundary effects in an automotive vehicle interior (AES Convention Paper 4245, May 1996), which addressed the mid-to-high frequency timbral changes in the sound field caused by the proximity of reflective, semi-rigid surfaces to loudspeakers, this paper reports on the modeling of midsize loudspeakers in the interior of an automobile and gives modeled results for a specific case.
Convention Paper 8023 (Purchase now)
P9-4 Acoustics of the Restored Petruzzelli Theater—Marco Facondini, TanAcoustics Studio - Pesaro, Italy; Daniele Ponteggia, Studio Ing. Ponteggia - Terni, Italy
The Petruzzelli theater in Bari was recently restored after the disastrous fire of 1991 seriously damaged the building. The restoration focused on the aesthetics and functionality of the room, giving particular attention to improving the acoustics of the theater. This paper reviews the acoustical design process that was carried out using a computer model of the hall and measurements during the restoration process. Objective indexes from measurements of the renewed theater are compared with values suggested in the literature and with similar halls.
Convention Paper 8024 (Purchase now)
P9-5 Identification of a Room Impulse Response Using a Close-Microphone Reference Signal—Elias Kokkinis, John Mourjopoulos, University of Patras - Patras, Greece
The identification of room impulse responses is very important in many audio signal processing applications. Typical system identification methods require access to the original source signal that is seldom available in practice, while blind system identification methods require prior knowledge of the statistical properties of the source signal that, for audio applications, are not available. In this paper a new system identification method is proposed using a close-microphone reference signal, and it is shown that it accurately identifies real room impulse responses.
Convention Paper 8025 (Purchase now)
P9-6 Acoustic Impulse Response Measurement Using Speech and Music Signals—John Usher, Barcelona Media - Barcelona, Spain
Continuous measurement of room impulse responses (RIRs) in the presence of an audience has many applications for room acoustics: in-situ loudspeaker/room equalization; teleconferencing; and for architectural acoustic diagnostics. A continuous analysis of the RIR is often preferable to a single measurement, especially with non-stationary room characteristics such as from changing atmospheric or audience conditions. This paper discusses the use of adaptive filters updated according to the NLMS algorithm for fast, continuous in-situ RIR acquisition; particularly when the input signal is music or speech. We show that the dual-channel FFT (DCFFT) method has slower convergence and is less robust to colored signals such as music and speech. Data is presented comparing the NLMS and the DCFFT methods and we show that the adaptive filter approach provides RIRs with high accuracy and high robustness to background noise using music or speech signals.
Convention Paper 8026 (Purchase now)
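As an illustration of the NLMS-based acquisition described in P9-6, the adaptive update can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the room is modeled as a short FIR filter, the test input is noise-free white noise rather than music or speech, and the names `nlms_identify` and `fir_filter`, the step size `mu`, and the regularizer `eps` are illustrative choices.

```python
import random


def nlms_identify(x, d, num_taps, mu=0.5, eps=1e-8):
    """Estimate FIR taps h so that d[n] is approximated by sum_k h[k] * x[n-k].

    x is the reference (source) signal, d the microphone signal.
    mu and eps are assumed values chosen for this sketch.
    """
    h = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input sample first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(hk * bk for hk, bk in zip(h, buf))   # current prediction
        e = dn - y                                   # prediction error
        norm = sum(bk * bk for bk in buf) + eps      # input power normalization
        step = mu * e / norm
        h = [hk + step * bk for hk, bk in zip(h, buf)]
    return h


def fir_filter(h, x):
    """Direct-form FIR convolution, used here to simulate a known 'room'."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    true_rir = [1.0, 0.0, 0.5, -0.25]       # hypothetical 4-tap response
    x = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    d = fir_filter(true_rir, x)
    print([round(v, 3) for v in nlms_identify(x, d, num_taps=4)])
```

Dividing the update by the instantaneous input power is what distinguishes NLMS from plain LMS; it keeps the effective step size stable when the level of the excitation varies, which is the situation with program material.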
P9-7 Evaluation of Late Reverberant Fields in Loudspeaker Rendered Virtual Rooms—Wieslaw Woszczyk, Brett Leonard, Doyuen Ko, McGill University - Montreal, Quebec, Canada
Late reverberant decay was isolated from a high-resolution impulse response measured in a church that has excellent acoustics suitable for music recording. The measured multichannel impulse response (IR) was compared to two synthetic decays (SIR) built from Gaussian noise and modeled on the original IR. The synthetic decays were created to be similar to the original response in the rate of decay and spectral weighting in order to make listening comparisons between them. Four different transition points (100 ms, 200 ms, 300 ms, 400 ms) were chosen to crossfade between the original early response and the synthetic decays. Listeners auditioned the synthetic and measured rooms convolved with three anechoic monophonic sources. The results of tests conducted within an immersive surround sound environment with height help to verify whether shaped random noise is a suitable substitute for late reverberation in high-resolution simulations of room acoustics.
Convention Paper 8027 (Purchase now)
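The construction described in P9-7 (a Gaussian-noise tail shaped to a decay rate, then crossfaded onto the measured early response) can be sketched roughly as below. The sketch assumes a single broadband decay time `rt60` and a linear crossfade; the spectral weighting and level matching mentioned in the abstract are omitted, and all function names are hypothetical.

```python
import random


def decay_envelope(n, sample_rate, rt60):
    """Amplitude envelope that falls by 60 dB after rt60 seconds."""
    return 10.0 ** (-3.0 * n / (sample_rate * rt60))


def synthetic_decay(num_samples, sample_rate, rt60, seed=0):
    """Gaussian noise multiplied by an exponential decay envelope."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) * decay_envelope(n, sample_rate, rt60)
            for n in range(num_samples)]


def crossfade(measured, synthetic, start, fade_len):
    """Linear crossfade from `measured` to `synthetic` beginning at `start`."""
    out = []
    for n in range(min(len(measured), len(synthetic))):
        if n < start:
            w = 0.0                       # still entirely the measured response
        elif n >= start + fade_len:
            w = 1.0                       # entirely the synthetic tail
        else:
            w = (n - start) / fade_len
        out.append((1.0 - w) * measured[n] + w * synthetic[n])
    return out
```

With a 48 kHz response, for example, a 100 ms transition point would correspond to `start=4800`; the fade length is a free parameter in this sketch.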
P9-8 Finite Difference Room Acoustic Modeling on a General Purpose Graphics Processing Unit—Alexander Southern, University of York - York, UK, Aalto University School of Science and Technology, Aalto, Finland; Damian Murphy, University of York - York, UK; Guilherme Campos, Paulo Dias, University of Aveiro - Aveiro, Portugal
Detailed and convincing walkthrough auralizations of virtual rooms require much processing capability. One method of reducing this requirement is to pre-calculate a dataset of room impulse responses (RIR) at locations throughout the space. Processing resources may then focus on RIR interpolation and convolution using the dataset as the virtual listening position changes in real-time. Recent work identified the suitability of wave-based models over traditional ray-based approaches for walkthrough auralization. Despite the computational saving of wave-based methods to generate the RIR dataset, processing times are still long. This paper presents a wave-based implementation for execution on a general purpose graphics processing unit. Results validate the approach and show that parallelization provides a notable acceleration.
Convention Paper 8028 (Purchase now)
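To show why finite-difference schemes such as the one in P9-8 parallelize well, here is a deliberately reduced sketch: a one-dimensional leapfrog update in plain Python rather than a 3-D room model on a GPU. The function name, the zero-pressure end nodes, and the impulse excitation are assumptions made for the example.

```python
def simulate_1d(num_nodes, source, receiver, num_steps, courant=1.0):
    """Leapfrog finite-difference update of the 1-D wave equation.

    Returns the pressure recorded at `receiver` after each time step.
    End nodes are held at zero pressure (a simplifying assumption).
    """
    lam2 = courant * courant      # (c * dt / dx) ** 2, must be <= 1 for stability
    prev = [0.0] * num_nodes
    curr = [0.0] * num_nodes
    curr[source] = 1.0            # unit impulse excitation
    recorded = []
    for _ in range(num_steps):
        nxt = [0.0] * num_nodes
        # Each interior node depends only on values from the two previous
        # time levels, so every node could be updated by its own thread.
        for i in range(1, num_nodes - 1):
            nxt[i] = (2.0 * curr[i] - prev[i]
                      + lam2 * (curr[i + 1] - 2.0 * curr[i] + curr[i - 1]))
        prev, curr = curr, nxt
        recorded.append(curr[receiver])
    return recorded
```

The recorded receiver signal plays the role of an impulse response for that source and receiver pair; in this sketch the wavefront advances one node per step when `courant` is 1.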
P10 - Audio Processing—Analysis and Synthesis of Sound
Sunday, May 23, 09:00 — 10:30 (Room C4-Foyer)
P10-1 Cellular Automata Sound Synthesis with an Extended Version of the Multitype Voter Model—Jaime Serquera, Eduardo R. Miranda, University of Plymouth - Plymouth, UK
In this paper we report on the synthesis of sounds with cellular automata (CA), specifically with an extended version of the multitype voter model (MVM). Our mapping process is based on DSP analysis of automata evolutions and consists of mapping histograms onto sound spectrograms. This mapping allows a flexible sound design process, but due to the non-deterministic nature of the MVM such a process acquires its maximum potential only after the CA run is finished. Our extended model presents a high degree of predictability and controllability, making the system suitable for an in-advance sound design process with all the advantages that this entails, such as real-time possibilities and performance applications. This research focuses on the synthesis of damped sounds.
Convention Paper 8029 (Purchase now)
P10-2 Stereophonic Rendering of Source Distance Using DWM-FDN Artificial Reverberators—Saul Maté-Cid, Hüseyin Hacihabiboglu, Zoran Cvetkovic, King's College London - London, UK
Artificial reverberators are used in audio recording and production to enhance the perception of spaciousness. It is well known that reverberation is a key factor in the perception of the distance of a sound source. The ratio of direct and reverberant energies is one of the most important distance cues. A stereophonic artificial reverberator is proposed that allows panning the perceived distance of a sound source. The proposed reverberator is based on feedback delay network (FDN) reverberators and uses a perceptual model of direct-to-reverberant (D/R) energy ratio to pan the source distance. The equivalence of FDNs and digital waveguide mesh (DWM) scattering matrices is exploited in order to devise a reverberator relevant in the room acoustics context.
Convention Paper 8030 (Purchase now)
P10-3 Separation of Music+Effects Sound Track from Several International Versions of the Same Movie—Antoine Liutkus, Télécom ParisTech - Paris, France; Pierre Leveau, Audionamix - Paris, France
This paper concerns the separation of the music+effects (ME) track from a movie soundtrack, given the observation of several international versions of the same movie. The approach chosen is strongly inspired by existing stereo audio source separation and especially by spatial filtering algorithms such as DUET that can extract a constant panned source from a mixture very efficiently. The problem is indeed similar, since we aim here to separate the ME track, which is the common background of all international versions of the movie soundtrack. The algorithm has been adapted to a number of channels greater than 2. Preprocessing techniques have also been proposed to adapt the algorithm to realistic cases. The performance of the algorithm has been evaluated on realistic and synthetic cases.
Convention Paper 8031 (Purchase now)
P10-4 A Differential Approach for the Implementation of Superdirective Loudspeaker Array—Jung-Woo Choi, Youngtae Kim, Sangchul Ko, Jungho Kim, SAIT, Samsung Electronics Co. Ltd. - Gyeonggi-do, Korea
A loudspeaker arrangement and corresponding analysis method to obtain a robust superdirective beam are proposed. The superdirectivity technique requires precise matching of the sound sources modeled to calculate excitation patterns and those used for the loudspeaker array. To resolve the robustness issue arising from the modeling mismatch error, we show that the overall sensitivity to the model-mismatch error can be reduced by rearranging loudspeaker positions. Specifically, a beam pattern obtained by a conventional optimization technique is represented as a product of robust delay-and-sum patterns and error-sensitive differential patterns. The excitation pattern driving the loudspeaker array is then reformulated such that the error-sensitive pattern is only applied to the outermost loudspeaker elements, and the array design that fits to the new excitation pattern is discussed.
Convention Paper 8032 (Purchase now)
P10-5 Improving the Performance of Pitch Estimators—Stephen J. Welburn, Mark D. Plumbley, Queen Mary University of London - London, UK
We are looking to use pitch estimators to provide an accurate high-resolution pitch track for resynthesis of musical audio. We found that current evaluation measures such as gross error rate (GER) are not suitable for algorithm selection. In this paper we examine the issues relating to evaluating pitch estimators and use these insights to improve performance of existing algorithms such as the well-known YIN pitch estimation algorithm.
Convention Paper 8033 (Purchase now)
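For readers unfamiliar with the YIN estimator mentioned in P10-5, its core steps can be sketched as follows. This is a simplified sketch of the published YIN idea (difference function, cumulative-mean normalization, absolute threshold) without parabolic interpolation or any of the improvements the paper proposes; the function name and default threshold are assumptions.

```python
import math


def yin_pitch(x, sample_rate, window, max_lag, threshold=0.1):
    """Return a pitch estimate in Hz, or None if no lag passes the threshold.

    Requires len(x) >= window + max_lag.
    """
    # Step 1: squared-difference function d(tau).
    diff = [0.0] * (max_lag + 1)
    for tau in range(1, max_lag + 1):
        diff[tau] = sum((x[j] - x[j + tau]) ** 2 for j in range(window))
    # Step 2: cumulative-mean-normalized difference d'(tau).
    cmnd = [1.0] * (max_lag + 1)
    running = 0.0
    for tau in range(1, max_lag + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0.0 else 1.0
    # Step 3: first lag under the threshold, then slide to the local minimum.
    for tau in range(2, max_lag + 1):
        if cmnd[tau] < threshold:
            while tau + 1 <= max_lag and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
    return None


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2.0 * math.pi * 100.0 * n / sr) for n in range(800)]
    print(yin_pitch(tone, sr, window=400, max_lag=200))
```

Because the lag is an integer number of samples here, the resolution of the estimate is coarse at high pitches, which is one reason a high-resolution pitch track needs refinement beyond this basic form.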
P10-6 Reverberation Analysis via Response and Signal Statistics—Eleftheria Georganti, Thomas Zarouchas, John Mourjopoulos, University of Patras - Patras, Greece
This paper examines statistical quantities (i.e., kurtosis, skewness) of room transfer functions and audio signals (anechoic, reverberant, speech, music). Measurements are taken under various reverberation conditions in different real enclosures ranging from small office to a large auditorium and for varying source–receiver positions. Here, the statistical properties of the room responses and signals are examined in the frequency domain. From these properties, the relationship between the spectral statistics of the room transfer function and the corresponding reverberant signal are derived.
Convention Paper 8034 (Purchase now)
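The statistical quantities named in P10-6 can be computed from any list of values, for example the magnitudes of a transfer function sampled across frequency. The sketch below uses population moments and non-excess kurtosis; these conventions and the function names are assumptions, since the abstract does not state which definitions the authors use.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment; zero for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth standardized moment (non-excess); a Gaussian gives about 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2
```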
P10-7 An Investigation of Low-Level Signal Descriptors Characterizing the Noise-Like Nature of an Audio Signal—Christian Uhle, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
This paper presents an overview and an evaluation of low-level features characterizing the noise-like or tone-like nature of an audio signal. Such features are widely used for content classification, segmentation, identification, coding of audio signals, blind source separation, speech enhancement, and voice activity detection. Besides the very prominent Spectral Flatness Measure, various alternative descriptors exist. These features are reviewed and the requirements for these features are discussed. The features in scope are evaluated using synthetic signals and exemplary real-world applications related to audio content classification, namely voiced-unvoiced discrimination for speech signals and speech detection.
Convention Paper 8035 (Purchase now)
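The Spectral Flatness Measure named in P10-7 is commonly defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum. A minimal sketch, assuming the power spectrum has already been computed and using a small floor (an assumed value) to avoid taking the logarithm of zero:

```python
import math


def spectral_flatness(power_spectrum, floor=1e-12):
    """Ratio of geometric to arithmetic mean of a power spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky, tone-like spectrum.
    """
    bins = [max(p, floor) for p in power_spectrum]
    log_mean = sum(math.log(p) for p in bins) / len(bins)
    geometric = math.exp(log_mean)
    arithmetic = sum(bins) / len(bins)
    return geometric / arithmetic
```

For instance, a spectrum with one dominant bin and a low noise floor yields a value close to zero, while equal energy in every bin yields one.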
P10-8 Algorithms for Digital Subharmonic Distortion—Zlatko Baracskai, Ryan Stables, Birmingham City University - Birmingham, UK
This paper presents a comparison between existing digital subharmonic generators and a new algorithm developed with the intention of having a more pronounced subharmonic frequency and reduced harmonic, intermodulation and aliasing distortions. The paper demonstrates that by introducing inversions of a waveform at the minima and maxima instead of the zero crossings, the discontinuities are mitigated and various types of distortion are significantly attenuated.
Convention Paper 8036 (Purchase now)
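P10-8 contrasts its new algorithm with generators that invert the waveform at zero crossings. The baseline idea can be sketched as below; this shows only a conventional zero-crossing approach under simple assumptions (a clean, single-pitch input) and is not the paper's minima/maxima algorithm. The function name is hypothetical.

```python
import math


def octave_down_zero_crossing(x):
    """Flip polarity at every positive-going zero crossing.

    The polarity pattern repeats every two input cycles, so the output
    period is twice the input period (a component one octave below).
    """
    sign = 1.0
    prev = 0.0
    out = []
    for n, sample in enumerate(x):
        if n > 0 and prev < 0.0 <= sample:
            sign = -sign              # toggle once per input cycle
        out.append(sign * sample)
        prev = sample
    return out


if __name__ == "__main__":
    tone = [math.sin(2.0 * math.pi * n / 8.0 + 0.3) for n in range(32)]
    print([round(v, 2) for v in octave_down_zero_crossing(tone)])
```

The abrupt sign change at each toggle point is the kind of discontinuity the abstract says its alternative inversion points are meant to mitigate.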
P11 - Network, Internet, and Broadcast Audio
Sunday, May 23, 13:30 — 18:30 (Room C3)
Chair: Bob Walker, Consultant - UK
P11-1 A New Technology for the Assisted Mixing of Sport Events: Application to Live Football Broadcasting—Giulio Cengarle, Toni Mateos, Natanael Olaiz, Pau Arumí, Barcelona Media Innovation Centre - Barcelona, Spain
This paper presents a novel application for capturing the sound of the action during a football match by automatically mixing the signals of several microphones placed around the pitch and selecting only those microphones that are close to, or aiming at, the action. The sound engineer is presented with a user interface where he or she can define and move dynamically the point of interest on a screen representing the pitch, while the application controls the faders of the broadcast console. The technology has been applied in the context of a three-dimensional surround sound playback of a Spanish first-division match.
Convention Paper 8037 (Purchase now)
P11-2 Recovery Time of Redundant Ethernet-Based Networked Audio Systems—Maciej Janiszewski, Piotr Z. Kozlowski, Wroclaw University of Technology - Lower Silesia, Poland
Ethernet-based networked audio systems have become more popular among audio system designers. One of the most important features available from a networked audio system is redundancy. The system can recover after different types of failures: cable or even device failure. Redundancy protocols implemented by audio developers are different from protocols known from computer networks, but both may be used in an Ethernet-based audio system. The most important attribute of redundancy in an audio system is recovery time. This paper is a summary of research that was done at Wroclaw University of Technology. It shows the recovery time after different types of failures, with different network protocols implemented, for all possible network topologies in CobraNet and EtherSound systems.
Convention Paper 8038 (Purchase now)
P11-3 Upping the Auntie: A Broadcaster’s Take on Ambisonics—Chris Baume, Anthony Churnside, British Broadcasting Corporation Research & Development - UK
This paper considers Ambisonics from a broadcaster’s point of view: to identify barriers preventing its adoption within the broadcast industry and explore the potential advantages were it to be adopted. This paper considers Ambisonics as a potential production and broadcast technology and attempts to assess the impact that the adoption of Ambisonics might have on both production workflows and the audience experience. This is done using two case studies: a large-scale music production of “The Last Night of the Proms” and a smaller scale radio drama production of “The Wonderful Wizard of Oz.” These examples are then used for two subjective listening tests: the first to assess the benefit of representing height allowed by Ambisonics and the second to compare the audience’s enjoyment of first order Ambisonics to stereo and 5.0 mixes.
Convention Paper 8039 (Purchase now)
P11-4 Audio-Video Synchronization for Post-Production over Managed Wide-Area Networks—Nathan Brock, Michelle Daniels, University of California San Diego - La Jolla, CA, USA; Steve Morris, Skywalker Sound - Marin County, CA, USA; Peter Otto, University of California San Diego - La Jolla, CA, USA
A persistent challenge with enabling remote collaboration for cinema post-production is synchronizing audio and video assets. This paper details efforts to guarantee that the sound quality and audio-video synchronization over networked collaborative systems will be measurably the same as that experienced in a traditional facility. This includes establishing a common word-clock source for all digital audio devices on the network, extending transport control and time code to all audio and video assets, adjusting latencies to ensure sample-accurate mixing between remote audio sources, and locking audio and video playback to within quarter-frame accuracy. We will detail our instantiation of these techniques at a demonstration given in December 2009 involving collaboration between a film editor in San Diego and a sound designer in Marin County, California.
Convention Paper 8040 (Purchase now)
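To give a sense of scale for the quarter-frame accuracy mentioned in P11-4, the tolerance can be converted to time and to audio samples. The frame rate and sample rate in the usage note are example values chosen for illustration; the abstract does not state which rates were used.

```python
def quarter_frame_tolerance(sample_rate, frame_rate):
    """Return (seconds, samples) spanned by one quarter of a video frame."""
    seconds = 1.0 / (4.0 * frame_rate)
    return seconds, sample_rate * seconds
```

At an assumed 24 frames per second and 48 kHz audio, a quarter frame is roughly 10.4 ms, or about 500 samples, which is far coarser than the sample-accurate alignment required between the remote audio sources themselves.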
P11-5 A Proxy Approach for Interoperability and Common Control of Networked Digital Audio Devices—Osedum P. Igumbor, Richard J. Foss, Rhodes University - Grahamstown, South Africa
This paper highlights the challenge that results from the availability of a large number of control protocols within the context of digital audio networks. Devices that conform to different protocols are unable to communicate with one another, even though they might be utilizing the same networking technology (Ethernet, IEEE 1394 serial bus, USB). This paper describes the use of a proxy that allows for high-level device interaction (by sending protocol messages) between networked devices. Furthermore, the proxy allows for a common controller to control the disparate networked devices.
Convention Paper 8041 (Purchase now)
P11-6 Network Neutral Control over Quality of Service Networks—Philip Foulkes, Richard Foss, Rhodes University - Grahamstown, South Africa
IEEE 1394 (FireWire) and Ethernet Audio/Video Bridging are two networking technologies that allow for the transportation of synchronized, low-latency, real-time audio and video data. Each networking technology has its own methods and techniques for establishing stream connections between the devices that reside on the networks. This paper discusses the interoperability of these two networking technologies via an audio gateway and the use of a common control protocol, AES-X170, to allow for the control of the parameters of these disparate networks. This control is provided by a software patchbay application.
Convention Paper 8042 (Purchase now)
P11-7 Relative Importance of Speech and Non-Speech Components in Program Loudness Assessment—Ian Dash, Australian Broadcasting Corporation - Sydney, NSW, Australia; Mark Bassett, Densil Cabrera, The University of Sydney - Sydney, NSW, Australia
It is commonly assumed in broadcasting and film production that audiences determine soundtrack loudness mainly from the speech component. While intelligibility considerations support this idea indirectly, the literature is very short on direct supporting evidence. A listening test was therefore conducted to test this hypothesis. Results suggest that listeners judge loudness from overall levels rather than speech levels. A secondary trend is that listeners tend to compare like with like. Thus, listeners will compare speech loudness with other speech content rather than with non-speech content and will compare loudness of non-speech content with other non-speech content more than with speech content. A recommendation is made on applying this result for informed program loudness control.
Convention Paper 8043 (Purchase now)
P11-8 Loudness Normalization in the Age of Portable Media Players—Martin Wolters, Harald Mundt, Dolby Germany GmbH - Nuremberg, Germany; Jeffrey Riedmiller, Dolby Laboratories Inc. - San Francisco, CA, USA
In recent years, the increasing popularity of portable media devices among consumers has created new and unique audio challenges for content creators, distributors, and device manufacturers. Many of the latest devices are capable of supporting a broad range of content types and media formats including those often associated with high quality (wider dynamic-range) experiences such as HDTV, Blu-ray or DVD. However, portable media devices are generally challenged in terms of maintaining consistent loudness and intelligibility across varying media and content types on their internal speaker(s) and/or headphone outputs. This paper proposes a nondestructive method to control playback loudness and dynamic range on portable devices based on a worldwide standard for loudness measurement as defined by the ITU. In addition, the proposed method is compatible with existing playback software and audio content following the Replay Gain (www.replaygain.org) proposal. In the course of the paper the current landscape of loudness levels across varying media and content types is described and new and nondestructive concepts targeted at addressing consistent loudness and intelligibility for portable media players are introduced.
Convention Paper 8044 (Purchase now)
P11-9 Determining an Optimal Gated Loudness Measurement for TV Sound Normalization—Eelco Grimm, Grimm-Audio; Esben Skovenborg, tc-electronic; Gerhard Spikofski, Institute of Broadcast Technology - Berlin, Germany
Undesirable loudness jumps are a notorious problem in television broadcast. The solution consists in switching to loudness-based metering and program normalization. In Europe this development has been led by the EBU P/LOUD group, working toward a single target level for loudness normalization applying to all genres of programs. P/LOUD found that loudness normalization as specified by ITU-R BS.1770-1 works fairly well for the majority of broadcast programs. However, it was realized that wide loudness-range programs were not well-aligned with other programs when using ITU-R BS.1770-1 directly, but that adding a measurement gate provided a simple yet effective solution. P/LOUD therefore conducted a formal listening experiment to perform a subjective evaluation of different gate parameters. This paper specifies the method of the subjective evaluation and presents the results in terms of preferred gating parameters.
Convention Paper 8154 (Purchase now)
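The measurement gate discussed in P11-9 can be illustrated with a relative-threshold scheme: compute the ungated level, discard blocks that fall more than a given amount below it, and average what remains. The sketch assumes per-block levels in dB have already been measured (frequency weighting and block segmentation are omitted), and the gate offset is left as a required argument because the preferred parameters are the subject of the paper, not something assumed here.

```python
import math


def energy_mean_db(levels_db):
    """Average a list of dB levels in the power domain."""
    powers = [10.0 ** (level / 10.0) for level in levels_db]
    return 10.0 * math.log10(sum(powers) / len(powers))


def gated_loudness(block_levels_db, relative_gate_db):
    """Mean level of blocks at or above (ungated level + relative_gate_db).

    relative_gate_db is expected to be zero or negative.
    """
    ungated = energy_mean_db(block_levels_db)
    threshold = ungated + relative_gate_db
    kept = [level for level in block_levels_db if level >= threshold]
    return energy_mean_db(kept)
```

In a program with long quiet passages, the ungated average is pulled down by the quiet blocks, whereas the gated value tracks the louder foreground material.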
P11-10 Analog or Digital? A Case-Study to Examine Pedagogical Approaches to Recording Studio Practice—Andrew King, University of Hull - Scarborough, North Yorkshire, UK
This paper explores the use of digital and analog mixing consoles in the recording studio over a single drum kit recording session. Previous research has examined contingent learning, problem-solving, and collaborative learning within this environment. However, while there have been empirical investigations into the use of computer-based software and interaction around and within computers, this has not taken into consideration the use of complex recording apparatus. A qualitative case study approach was used in this investigation. Thirty hours of video data was captured and transcribed. A preliminary analysis of the data shows that there are differences between the types of problems encountered by learners when using either an analog or digital mixing console.
Convention Paper 8045 (Purchase now)
P12 - Noise Reduction and Speech Intelligibility
Sunday, May 23, 14:00 — 17:30 (Room C5)
Chair: Rhonda Wilson, Meridian Audio - Huntingdon, UK
P12-1 Monolateral and Bilateral Fitting with Different Hearing Aids Directional Configurations—Lorenzo Picinali, De Montfort University - Leicester, UK; Silvano Prosser, Università di Ferrara - Ferrara, Italy
Hearing aid bilateral fitting in hearing impaired subjects raises some problems concerning the interaction of the perceptual binaural properties with the directional characteristics of the device. This experiment aims to establish whether and to what extent, in a sample of 20 normally hearing subjects, the binaural changes in the speech-to-noise level ratio (s/n), caused by different symmetrical and asymmetrical microphone configurations and different positions of the speech signal (frontal or lateral), could alter speech recognition performance in noise. Speech Reception Thresholds (SRT) in noise (simulated through an Ambisonic virtual sound field) have been measured monolaterally and bilaterally in order to properly investigate the role of the binaural interaction in the perception of reproduced signals.
Convention Paper 8046 (Purchase now)
P12-2 Acoustic Echo Cancellation for Wideband Audio and Beyond—Shreyas Paranjpe, Scott Pennock, Phil Hetherington, QNX Software Systems Inc. (Wavemakers) - Vancouver, BC, Canada
Speech processing is finally starting the transition to wider bandwidths. The benefits include increased intelligibility and comprehension and a more pleasing communication experience. High quality full-duplex Acoustic Echo Cancellation (AEC) is an integral component of a hands-free speakerphone communication system because it allows participants to converse in a natural manner (and without a headset!) as they would in person. Some high-end Telepresence systems already achieve life-like communication but are computationally demanding and prohibitively expensive. The challenge is to develop a robust Acoustic Echo Canceler (AEC) that processes full-band audio signals with low computational complexity and reasonable memory consumption for an affordable Telepresence experience.
Convention Paper 8047 (Purchase now)
P12-3 Adaptive Noise Reduction for Real-Time Applications—Constantin Wiesener, TU Berlin - Berlin, Germany; Tim Flohrer, Alexander Lerch, zplane.development - Berlin, Germany; Stefan Weinzierl, TU Berlin - Berlin, Germany
We present a new algorithm for real-time noise reduction of audio signals. In order to derive the noise reduction function, the proposed method adaptively estimates the instantaneous noise spectrum from an autoregressive signal model as opposed to the widely-used approach of using a constant noise spectrum fingerprint. In conjunction with the Ephraim and Malah suppression rule a significant reduction of both stationary and non-stationary noise can be obtained. The adaptive algorithm is able to work without user interaction and is capable of real-time processing. Furthermore, quality improvements are easily possible by integration of additional processing blocks such as transient preservation.
Convention Paper 8048 (Purchase now)
P12-4 Active Noise Reduction in Personal Audio Delivery Systems; Assessment Using Loudness Balance Methods—Paul Darlington, Pierre Guiu, Phitek Systems (Europe) Sarl - Lausanne, Switzerland
Subjective methods developed for rating passive hearing protectors are inappropriate to measure active noise reduction of earphones or headphones. Additionally, assessment using objective means may produce misleading results, which do not correlate well with wearer experience and do not encompass human variability. The present paper describes application of “loudness balance” methods to the estimation of active attenuation of consumer headphones and earphones. Early results suggest that objective measures of the active reduction of pressure report greater attenuations than those implied by estimates of perceived loudness, particularly at low frequency.
Convention Paper 8049 (Purchase now)
P12-5 Mapping Speech Intelligibility in Noisy Rooms—John F. Culling, Sam Jelfs, Cardiff University - Cardiff, UK; Mathieu Lavandier, Université de Lyon - Vaulx-en-Velin Cedex, France
We have developed an algorithm for accurately predicting the intelligibility of speech in noise in a reverberant environment. The algorithm is based on a development of the equalization-cancellation theory of binaural unmasking, combined with established prediction methods for monaural speech perception in noise. It has been validated against a wide range of empirical data. Acoustic measurements of rooms, known as binaural room impulse responses (BRIRs) are analyzed to predict intelligibility of a nearby voice masked by any number of steady-state noise maskers in any spatial configuration within the room. This computationally efficient method can be used to generate intelligibility maps of rooms based on the design of the room.
Convention Paper 8050 (Purchase now)
P12-6 Further Investigations into Improving STI’s Recognition of the Effects of Poor Frequency Response on Subjective Intelligibility—Glenn Leembruggen, Acoustic Directions Pty Ltd. - ICE Design, Sydney, Australia, and University of Sydney, Sydney, Australia; Marco Hippler, University of Applied Sciences Cologne - Cologne, Germany; Peter Mapp, Peter Mapp and Associates - Colchester, UK
Previous work has highlighted deficiencies in the ability of the STI metric to satisfactorily recognize the subjective loss of intelligibility that occurs with sound systems having poor frequency responses, particularly in the presence of reverberation. In a recent paper we explored the changes to STI values resulting from a range of dynamic speech spectra taken over differing time lengths with different filter responses. That work included determining the effects on STI values of three alternative spreading functions simulating the ear’s upward masking mechanism. This paper extends that work and explores the effects on STI values of two masking methods used in MPEG-1 audio coding.
Convention Paper 8051 (Purchase now)
P12-7 Real-Time Speech-Rate Modification Experiments—Adam Kupryjanow, Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
An algorithm designed for real-time speech time-scale modification (stretching) is proposed. It combines a typical synchronous overlap-and-add based time-scale modification algorithm with signal redundancy detection algorithms that allow parts of the speech signal to be removed and replaced with stretched speech signal fragments. The effectiveness of the signal processing algorithms is examined experimentally together with the resulting sound quality.
Convention Paper 8052 (Purchase now)
P13 - Room Acoustics, Sound Reinforcement, and Instrumentation
Sunday, May 23, 16:30 — 18:00 (Room C4-Foyer)
P13-1 Adaptive Equalizer for Acoustic Feedback Control—Alexis Favrot, Christof Faller, Illusonic LLC - Lausanne, Switzerland
Acoustic feedback is a recurrent problem in audio applications involving amplified closed loop systems. The goal of the proposed acoustic feedback control algorithm is to adapt an equalizer, applied to the microphone signal, to prevent feedback before the effect of it is noticeable. This is achieved by automatically decreasing the gain of an equalizer at frequencies where feedback likely will occur. The equalization curve is determined using information contained in an adaptively estimated feedback path. A computationally efficient implementation of the proposed algorithm, using short-time Fourier transform, is described.
Convention Paper 8053 (Purchase now)
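One way to picture the gain reduction described in P13-1 is to limit the loop gain in each frequency bin using the estimated feedback path. The sketch below is an assumption about how such a curve might be derived, not the authors' rule: `feedback_magnitude` stands for the per-bin magnitude of the estimated feedback path, `forward_gain` for the system gain, and `margin` for the maximum loop gain allowed, all hypothetical names and values.

```python
def feedback_eq_gains(feedback_magnitude, forward_gain, margin=0.5):
    """Per-bin equalizer gains that keep the loop gain at or below `margin`.

    Bins whose loop gain is already below the margin are left at unity.
    """
    gains = []
    for mag in feedback_magnitude:
        loop_gain = forward_gain * mag
        gains.append(1.0 if loop_gain <= margin else margin / loop_gain)
    return gains
```

Only the bins at risk are attenuated, which is consistent with the stated goal of acting before feedback becomes noticeable while leaving the rest of the spectrum untouched.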
P13-2 The Meshotron: A Network of Specialized Hardware Units for 3-D Digital Waveguide Mesh Acoustic Model Parallelization—Guilherme Campos, Sara Barros, University of Aveiro - Aveiro, Portugal
This paper presents the project of a computing network—the Meshotron—specifically designed for large-scale parallelization of three-dimensional Digital Waveguide Mesh (3-D DWM) room acoustic models. It discusses the motivation of the project, its advantages, and the architecture envisaged for the application-specific hardware (ASH) units to form the network. The initial stages involve the development of a software prototype based on the rectangular mesh topology, using appropriate hardware simulation tools, and the design and test of FPGA-based scattering units for air and boundary nodes.
Convention Paper 8054 (Purchase now)
P13-3 A Hybrid Approach for Real-Time Room Acoustic Response Simulation—Andrea Primavera, Lorenzo Palestini, Stefania Cecchi, Francesco Piazza, Università Politecnica delle Marche - Ancona, Italy; Marco Moschetti, Korg Italy - Osimo, Italy
Reverberation is a well-known effect that is particularly important for listening to both recorded and live music. Generally, there are two approaches for artificial reverberation: the desired signal can be obtained by convolving the input signal with a measured impulse response (IR) or a synthetic one. Taking into account the advantages of both approaches, a hybrid artificial reverberation algorithm is presented. The early reflections are derived from a real IR, truncated considering the calculated mixing time, and the reverberation tail is obtained using Moorer's structure. The parameters defining this structure are derived from the analyzed IR, using a minimization criterion based on Simultaneous Perturbation Stochastic Approximation (SPSA). The obtained results showed a high-quality reverberator with a low computational load.
Convention Paper 8055 (Purchase now)
P13-4 Sensor Networks for Measurement of Acoustic Parameters of Classrooms—Suthikshn Kumar, PESIT - Bangalore, India
Measurement of acoustic parameters such as signal to noise ratio, reverberation time, and background noise is important in order to optimize classroom configuration and architecture. The classroom learning environment can thus be improved. The lecturers and students find comfort and assurance in knowing that the acoustic parameters of the classroom have been measured and indicate good acoustics. We propose sensor networks for acoustic measurement applications. The sensor network consists of an array of inexpensive sensor nodes which communicate and aggregate the acoustic parameters. The paper presents the sensor node architecture and illustrates the typical usage of the sensor networks for the measurement of acoustic parameters in classrooms. This paper will not be presented at the Convention.
Convention Paper 8056 (Purchase now)
P13-5 In Situ Directivity Measurement of Flush-Mounted Loudspeakers in a Non-Environment Listening Room—Daniel Fernández-Comesaña, Paúl Rodriguez-Garcia, Institute of Sound and Vibration Research (ISVR) - Southampton, UK; Soledad Torres-Guijarro, Laboratorio Oficial de Metroloxía de Galicia (LOMG) - Tecnópole Ourense, Spain; Antonio Pena, ETSE Telecomunicación, Universidade de Vigo - Vigo, Spain
Directivity is an important parameter in defining the behavior of a loudspeaker. There are many techniques and standards for directivity measurements in anechoic chambers, but in situ measurements of flush-mounted loudspeakers present some specific problems. This paper develops a procedure to measure directivity under the special conditions of a non-environment listening room, introducing the techniques used, the problems found together with the proposed solutions, and the limitations of the process. The existence of reflections, baffling effects due to adjacent walls, and a comparison to theoretical models of the radiation of a piston are discussed.
Convention Paper 8057 (Purchase now)
P13-6 Measurement of Loudspeakers without Using an Anechoic Chamber Utilizing Pulse-Train Measurement Method—Teruo Muraoka, Takahiro Miura, The University of Tokyo - Tokyo, Japan; Haruhito Shimura, Hiroshi Akino, Audio-Technica Corporation - Machida, Japan; Tohru Ifukube, The University of Tokyo - Tokyo, Japan
Generally, loudspeakers are measured in an anechoic chamber. However, such chambers are very expensive, which puts simple acoustical measurements almost out of reach for the sound engineer. The authors therefore devised a new measurement method that does not require an anechoic chamber. Traditionally, a loudspeaker is measured in an anechoic chamber by installing it in a suitable enclosure and driving it with a swept sinusoidal wave. The sound radiated by the loudspeaker is detected with an omnidirectional reference microphone, and the output of the microphone is displayed graphically. If a shotgun microphone is employed instead of an omnidirectional microphone, the anechoic chamber becomes unnecessary. Furthermore, if a pulse train is used as the test signal, ambient noise can be reduced by applying a synchronous-averaging algorithm. Based on this idea, the authors measured loudspeakers in arbitrary acoustical surroundings and obtained data very similar to those obtained by the traditional method.
Convention Paper 8058 (Purchase now)
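The synchronous averaging mentioned in the abstract of P13-6 can be illustrated with a minimal sketch. It is not the authors' implementation: the pulse-train period, the decaying-sinusoid "response," and the Gaussian noise level are all hypothetical values chosen for illustration. The sketch relies only on the general property that averaging N time-aligned repetitions leaves the repeating part unchanged while reducing uncorrelated noise by roughly a factor of sqrt(N).

```python
import math
import random


def synchronous_average(recording, period, n_periods):
    """Average n_periods consecutive frames of `period` samples, sample by sample."""
    if len(recording) < period * n_periods:
        raise ValueError("recording is shorter than period * n_periods")
    averaged = [0.0] * period
    for k in range(n_periods):
        start = k * period
        for i in range(period):
            averaged[i] += recording[start + i]
    return [value / n_periods for value in averaged]


def rms(values):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical test data: a decaying sinusoid repeated once per pulse,
# buried in uncorrelated Gaussian noise (all values are assumptions).
random.seed(0)
period = 64
n_periods = 100
response = [math.exp(-i / 8.0) * math.sin(2.0 * math.pi * i / 16.0)
            for i in range(period)]
recording = [response[i % period] + random.gauss(0.0, 0.5)
             for i in range(period * n_periods)]

averaged = synchronous_average(recording, period, n_periods)

# Residual noise in a single frame versus in the averaged frame.
error_single = rms([recording[i] - response[i] for i in range(period)])
error_averaged = rms([averaged[i] - response[i] for i in range(period)])
print(f"single-frame error: {error_single:.3f}, averaged error: {error_averaged:.3f}")
```

The frames must be aligned exactly with the pulse period; any drift between the playback and recording clocks would smear the averaged response, which is why the averaging is called synchronous.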
P13-7 Acoustic Space Sampling and the Grand Piano in a Non-Anechoic Environment: A Recordist-Centric Approach to the Musical Acoustic Study—Grzegorz Sikora, Brett Leonard, Martha de Francisco, McGill University - Montreal, Quebec, Canada, Centre for Interdisciplinary Research in Music Media and Technology, Montreal, Quebec, Canada; Douglas Eck, Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada, Université de Montréal, Montreal, Quebec, Canada, International Laboratory for Brain, Music, and Sound Research, Montreal, Quebec, Canada
A novel approach to instrument acoustics research is presented in which the instrument is coupled with a room and both are measured as a single acoustical system, in juxtaposition to many anechoic and computer simulated studies of musical acoustics. The technique is applied to the ubiquitous concert grand piano, where spectral information is gathered through the process of “acoustic space sampling” (AcSS), using more than 1,330 microphones. The physical data is then combined with psychoacoustic predictors to generate a map of timbre. This map is compared to the preference of expert listeners, thereby correlating the physical measures obtained through acoustic space sampling to the application of the recording engineer.
Convention Paper 8059 (Purchase now)
P14 - Spatial Audio Perception
Monday, May 24, 09:00 — 12:00 (Room C3)
Chair: Russell Mason, University of Surrey - Guildford, Surrey, UK
P14-1 Perceptual Assessment of Delay Accuracy and Loudspeaker Misplacement in Wave Field Synthesis—Jens Ahrens, Matthias Geier, Sascha Spors, Technische Universität Berlin - Berlin, Germany
The implementation of simple virtual source models like plane and spherical waves in wave field synthesis (WFS) employs delays that are applied to the input signals. We present a formal experiment evaluating the perceptual consequences of different accuracies of these delays. Closely related to the question of delay accuracy is the accuracy of the loudspeaker positioning. The second part of the presented experiment investigates the perceptual consequences of improperly placed loudspeakers. Dynamic binaural room impulse response-based simulations of a real loudspeaker array are employed and a static audio scene setup is considered.
Convention Paper 8068 (Purchase now)
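As a rough illustration of the delay accuracy question raised in P14-1, the following sketch computes per-loudspeaker delays for a virtual point source and rounds them to whole samples. It is not the authors' method: it assumes a simple geometric delay (distance divided by the speed of sound), ignores WFS amplitude weights and prefiltering, and uses a hypothetical three-loudspeaker layout, sampling rate, and speed of sound.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def point_source_delays(source, speakers, c=SPEED_OF_SOUND):
    """Delay in seconds from a virtual source to each loudspeaker,
    relative to the loudspeaker reached first."""
    delays = [math.dist(source, speaker) / c for speaker in speakers]
    earliest = min(delays)
    return [d - earliest for d in delays]


def quantize_to_samples(delays, sample_rate):
    """Round each delay to the nearest whole sample (integer-delay accuracy)."""
    return [round(d * sample_rate) for d in delays]


# Hypothetical layout: three loudspeakers on the x-axis, source 1 m behind them.
sample_rate = 48000
source = (0.0, -1.0)
speakers = [(-1.5, 0.0), (0.0, 0.0), (1.5, 0.0)]

delays = point_source_delays(source, speakers)
delay_samples = quantize_to_samples(delays, sample_rate)

# Timing error introduced by rounding, in seconds, per loudspeaker.
rounding_errors = [n / sample_rate - d for n, d in zip(delay_samples, delays)]
for d, n, e in zip(delays, delay_samples, rounding_errors):
    print(f"delay {d * 1e3:.4f} ms -> {n} samples (error {e * 1e6:.2f} us)")
```

Rounding to whole samples bounds the timing error at half a sample period; fractional-delay interpolation would reduce it further at a higher computational cost.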
P14-2 Perceptual Evaluation of Focused Sources in Wave Field Synthesis—Matthias Geier, Hagen Wierstorf, Jens Ahrens, Ina Wechsung, Alexander Raake, Sascha Spors, Technische Universität Berlin - Berlin, Germany
Wave Field Synthesis provides the possibility to reproduce virtual sound sources located between the loudspeaker array and the listener. Such sources are known as focused sources. A previously published study including an informal listening test has shown that the reproduction of focused sources is subject to audible artifacts, especially for large loudspeaker arrays. The combination of the time-reversal nature of focused sources and spatial sampling leads to pre-echoes. The perception of these artifacts differs considerably depending on the relative listener position. This paper describes a formal test that was conducted to verify the perceptual relevance of the physical properties found in previous papers.
Convention Paper 8069 (Purchase now)
P14-3 Experiments on the Perception of Elevated Sources in Wave-Field Synthesis Using HRTF Cues—Jose J. Lopez, Maximo Cobos, Technical University of Valencia - Valencia, Spain; Basilio Pueo, University of Alicante - Alicante, Spain
Wave-Field Synthesis (WFS) is a spatial sound reproduction technique that has attracted the interest of many researchers in recent decades. Unfortunately, although WFS has been shown to provide excellent localization accuracy, this property is restricted to sources located on the horizontal plane. Recently, the authors proposed a hybrid system that combines HRTF-based spectral filtering with WFS. This system makes use of the conventional WFS approach to achieve localization in the horizontal plane, whereas elevation effects are simulated by means of spectral elevation cues. This paper provides a review of the proposed method together with a compilation of the latest experiments carried out to evaluate the perception of elevated sources in this novel system.
Convention Paper 8070 (Purchase now)
P14-4 Comparison of the Width of Sound Sources in 2-Channel and 3-Channel Sound Reproduction—Munhum Park, Aki Härmä, Steven van de Par, Georgia Tryfou, Philips Research Laboratories - Eindhoven, The Netherlands
In this paper we present the result of listening tests where the width of the sound stage was compared between conventional 2-channel stereophony and 3-channel reproduction with an additional center loudspeaker. When listeners were seated at the axis of symmetry, there was no significant difference between the two cases. In off axis positions there was a clear trend that the perceived image width of the noise is influenced by the way an additional interfering stimulus is reproduced, which was found to depend on the correlation of the noise. The results suggest that the spatial attribute of one sound image may be affected by another, for which possible explanations based on the principles of binaural hearing will be discussed.
Convention Paper 8071 (Purchase now)
P14-5 Study of the Effect of Source Directivity on the Perception of Sound in a Virtual Free-Field—Sorrel Hoare, Alex Southern, Damian Murphy, University of York - York, UK
In the context of soundscape evaluation, an area that has yet to be explored satisfactorily concerns faithful modeling and auralization of outdoor spaces. A common limitation of popular acoustic modeling techniques relates to source characterization, one aspect of which, source directivity, is considered here. In terms of auralization, source directivity may be linked to perceived spatial sound quality. In a preliminary listening test a selection of audio samples is auralized in a virtual free-field where the directivity pattern of the source is varied. The results confirm the significance of this characteristic for acoustic perception, thus validating the extension of this research to more complex models.
Convention Paper 8072 (Purchase now)
P14-6 Audio Spatialization for The Morning Line—David G. Malham, Tony Myatt, Oliver Larkin, Peter Worth, Matthew Paradis, University of York - York, UK
The Morning Line is a large-scale outdoor sculpture with a multi-dimensional sound system consisting of 41 small weatherproof main loudspeakers and 12 subs, configured in 6 surround arrays, or "rooms," distributed throughout the sculpture. The irregular nature of these arrays necessitated the use of VBAP panning on the sculpture but when preparing works to be played on it, Ambisonics is more often used. This paper describes the hardware and software developed for the sculpture, theories of spatial audio perception that prompted the design approach, as well as the production facilities provided for the composers whose works are performed on The Morning Line.
Convention Paper 8073 (Purchase now)
P15 - Loudspeakers and Headphones: Part 2
Monday, May 24, 09:00 — 12:00 (Room C5)
Chair: John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada
P15-1 Chameleon Subwoofer Arrays—Generalized Theory of Vectored Sources in a Closed Acoustic Space—Adam J. Hill, Malcolm O. J. Hawksford, University of Essex - Colchester, UK
An equalization model is presented that seeks optimal solutions to wide area low-frequency sound reproduction in closed acoustic spaces. The methodology improves upon conventional wisdom by incorporating a generalized subwoofer array where individual frequency dependent loudspeaker polar responses are described by complex spherical harmonic frequency dependent functions. Multi-point system identification is performed using three-dimensional finite-difference time-domain simulation with optimization applied to seek global equalization represented by a set of orthogonal transfer functions applied to each spherical harmonic of each subwoofer within the array. The system is evaluated within a three-dimensional virtual acoustic space using both time and frequency domain metrics.
Convention Paper 8074 (Purchase now)
P15-2 Dynamical Measurement of the Effective Radiation Area SD—Wolfgang Klippel, University of Dresden - Dresden, Germany; Joachim Schlechter, Klippel GmbH - Dresden, Germany
The effective radiation area SD is one of the most important loudspeaker parameters because it determines the acoustical output (SPL, sound power) and efficiency of the transducer. This parameter is usually derived from the geometrical size of the radiator considering the diameter of half the surround area. This conventional technique fails for microspeakers and headphone transducers, where the surround geometry is more complicated and the excursion does not vary linearly with radius. The paper discusses new methods for measuring SD more precisely. The first method uses a laser sensor and a microphone to measure the voice coil displacement and the sound pressure generated by the transducer while mounted in a sealed enclosure. The second method uses only the mechanical vibration and geometry of the radiator, measured with a laser triangulation scanner. The paper checks the reliability and reproducibility of the conventional and the new methods and discusses the propagation of the measurement error to the T/S parameters obtained with the test box perturbation technique and to other derived parameters (sensitivity).
Convention Paper 8075 (Purchase now)
P15-3 Modeling Acoustic Horns with FEA—David J. Murphy, Krix Loudspeakers - Hackham, South Australia, Australia; Rick Morgans, Cyclopic Energy - Adelaide, South Australia, Australia
Simulations for acoustic horns have been developed using Finite Element Analysis (FEA) in both ANSYS and COMSOL modeling software. The FEA method solves the ideal, linear wave equation, so the effects of nonlinearity and viscosity are not inherently simulated. Quarter models were used in each case as the acoustic horns were not axi-symmetric. The size and shape of the horns have been informed by longstanding design parameters. The simulation results have been compared with acoustic measurements, and discrepancies investigated. While it was found that beam angle (dispersion) characteristics were relatively robust, it was useful in both cases to improve the accuracy of the beam angle, SPL, and impedance simulations by the application of frequency dependent damping to models of the compression driver.
Convention Paper 8076 (Purchase now)
P15-4 Electroacoustic Measurements for High Noise Environment Intercom Headsets—Stelios Potirakis, Technological Education Institute of Piraeus - Aigaleo-Athens, Greece; Nicolas-Alexander Tatlas, University of Patras - Patras, Greece; Maria Rangoussi, Technological Education Institute of Piraeus - Aigaleo-Athens, Greece
Intercom headsets are mandatory communication apparatus in high noise environment (HNE) conditions. Although military intercom headsets are typically used under extreme environmental conditions, a standard performance evaluation method exists only for the earphone elements. A systematic methodology for the measurement and performance evaluation of HNE headsets has recently been proposed based on the use of Head and Torso Simulator (HATS), addressing both signal reproduction and noise reduction issues. In this paper an improvement of the specific method is proposed concerning the headset electroacoustic reproduction measurements. The proposed enhancement refers to the use of impulse response measurements for the extraction of the specific characteristics as a function of frequency.
Convention Paper 8077 (Purchase now)
P15-5 Phase, Polarity, and Delay or Why a Loudspeaker Crossover Is Not a Time Machine—Ian Dash, Australian Broadcasting Corporation - Sydney, NSW, Australia; Fergus Fricke, University of Sydney - Sydney, NSW, Australia
Phase delay and group delay functions are well known in lumped system theory but they are not well understood, particularly in the context of non-minimum phase lumped systems. This paper presents four paradoxes that arise from an overly-literal interpretation of the phase delay function. The paradoxes are very simply illustrated using the example of a first order loudspeaker crossover. A new system response model is presented to resolve these paradoxes. The model is extended to higher order systems and to non-minimum phase systems. Applications and implications for system analysis are discussed.
Convention Paper 8078 (Purchase now)
P15-6 Time and Level Localization Curves for a Regularly-Spaced Octagon Loudspeaker Array—Laurent S. R. Simon, Russell Mason, University of Surrey - Guildford, Surrey, UK
Multichannel microphone array designs often use the localization curves that have been derived for 2-0 stereophony. Previous studies showed that side and rear perception of phantom image locations require somewhat different curves. This paper describes an experiment conducted to determine localization curves using an octagonal loudspeaker setup. Various signals with a range of interchannel time and level differences were produced between pairs of adjacent loudspeakers, and subjects were asked to evaluate the perceived sound event's direction and its locatedness. The results showed that the curves for the side pairs of adjacent loudspeakers are significantly different to the front and rear pairs. The resulting curves can be used to derive suitable microphone techniques for this loudspeaker setup.
Convention Paper 8079 (Purchase now)
P16 - Audio Processing—Audio Coding and Machine Interface
Monday, May 24, 10:30 — 12:00 (Room C4-Foyer)
P16-1 Trajectory Sampling for Computationally Efficient Reproduction of Moving Sound Sources—Nara Hahn, Keunwoo Choi, Hyunjoo Chung, Koeng-Mo Sung, Seoul National University - Seoul, Korea
Reproducing moving virtual sound sources has been addressed in a number of spatial audio studies. The trajectories of moving sources are relatively oversampled if they are sampled in the temporal domain, which leads to inefficiency in computing the time delay and amplitude attenuation due to sound propagation. In this paper methods for trajectory sampling are proposed that use the spatial property of the movement and the spectral property of the received signal. These methods reduce the number of sampled positions and the computational complexity. Listening tests were performed to determine the appropriate downsampling rate, which depends not only on the trajectory but also on the frequency content of the source signal.
Convention Paper 8080 (Purchase now)
P16-2 Audio Latency Measurement for Desktop Operating Systems with Onboard Soundcards—Yonghao Wang, Queen Mary University of London - London, UK, Birmingham City University, Birmingham, UK; Ryan Stables, Birmingham City University - Birmingham, UK; Joshua Reiss, Queen Mary University of London - London, UK
Using commodity computers in conjunction with live music digital audio workstations (DAW) has become increasingly popular in recent years. The latency of these DAW audio processing chains for some applications, such as live audio monitoring, has always been perceived as a problem when DSP audio effects are needed. With “High Definition Audio” being standardized as the onboard soundcard’s hardware architecture for personal computers, and with advances in audio APIs, low-latency and multichannel capability has made its way into home studios. This paper will discuss the results of latency measurements of current popular operating systems and host applications with different audio APIs and audio processing loads.
Convention Paper 8081 (Purchase now)
P16-3 Error Control Techniques within Multidimensional-Adaptive Audio Coding Algorithms—Neil Smyth, APTX - Belfast, N. Ireland, UK
Multidimensional-adaptive audio coding algorithms can adapt multiple performance measures to the demands of different audio applications in real-time. Depending on the transmission or storage environment, audio processing applications require forms of error control to maintain acceptable audio quality. By definition, multidimensional-adaptive audio coding utilizes numerous error detection, correction, and concealment techniques. However, such techniques also have implications for other relevant performance measurements, such as coded bit-rate and computational complexity. This paper discusses the signal-processing tools used by a multidimensional-adaptive audio coding algorithm to achieve varying levels of error control while the fundamental structure of the algorithm is also varying. The effects and trade-offs on other coding performance measures will also be discussed.
Convention Paper 8082 (Purchase now)
P16-4 Object-Based Audio Coding Using Non-Negative Matrix Factorization for the Spectrogram Representation—Joonas Nikunen, Tuomas Virtanen, Tampere University of Technology - Tampere, Finland
This paper proposes a new object-based audio coding algorithm that uses non-negative matrix factorization (NMF) for the magnitude spectrogram representation, with the phase information coded separately. The magnitude model is obtained using a perceptually weighted NMF algorithm, which minimizes the noise-to-mask ratio (NMR) of the decomposition and is able to exploit long-term redundancy through an object-based representation. Methods for the quantization and entropy coding of the NMF representation parameters are proposed, and the quality loss is evaluated using the NMR measure. The quantization of the phase information is also studied. Additionally, we propose a sparseness criterion for the NMF algorithm, which is set to favor the gain values having the highest probability and thus the shortest entropy coding word length, resulting in a reduced bit rate.
Convention Paper 8083 (Purchase now)
P16-5 Cross-Layer Rate-Distortion Optimization for Scalable Advanced Audio Coding—Emmanuel Ravelli, Vinay Melkote, Tejaswi Nanjundaswamy, Kenneth Rose, University of California, Santa Barbara - Santa Barbara, CA, USA
Current scalable audio codecs optimize each layer of the bit-stream successively and independently with a straightforward application of the same rate-distortion optimization techniques employed in the non-scalable case. The main drawback of this approach is that the performance of the enhancement layers is significantly worse than that of the non-scalable codec at the same cumulative bit-rate. We propose in this paper a novel optimization technique in the Advanced Audio Coding (AAC) framework wherein a cross-layer iterative optimization is performed to select the encoding parameters for each layer with a conscious accounting of rate and distortion costs in all layers, which allows for a trade-off between performance at different layers. Subjective and objective results demonstrate the effectiveness of the proposed approach and provide insights for bridging the gap with the non-scalable codec.
Convention Paper 8084 (Purchase now)
P16-6 Issues and Solutions Related to Real-Time TD-PSOLA Implementation—Sylvain Le Beux, LIMSI-CNRS, Université Paris-Sud XI - Orsay, France; Boris Doval, LAM-IJLRA, Université Paris - Paris, France; Christophe d'Alessandro, LIMSI-CNRS, Université Paris-Sud XI - Orsay, France
This paper presents an adaptation of the procedure for computing the TD-PSOLA algorithm when the pitch-shifting and time-stretching coefficients need to be processed in real time at every new synthesis pitch mark. In the standard TD-PSOLA algorithm, modification coefficients are defined from the analysis time axis, whereas for real-time applications, pitch and duration control parameters need to be sampled at synthesis time. This paper establishes the theoretical correspondence between the two approaches. Another issue related to the real-time context concerns the trade-off between the latency required for processing and the type of analysis window used.
Convention Paper 8085 (Purchase now)
P16-7 Integrating Musicological Knowledge into a Probabilistic Framework for Chord and Key Extraction—Johan Pauwels, Jean-Pierre Martens, Ghent University - Ghent, Belgium
In this paper a previously developed probabilistic framework for the simultaneous detection of chords and keys in polyphonic audio is further extended and validated. The system behavior is controlled by a small set of carefully defined free parameters. This has permitted us to conduct an experimental study that sheds new light on the relative importance of musicological knowledge in the context of chord extraction. Some of the obtained results are at least surprising and, to our knowledge, never reported as such before.
Convention Paper 8086 (Purchase now)
P16-8 A Doubly Sparse Greedy Adaptive Dictionary Learning Algorithm for Music and Large-Scale Data—Maria G. Jafari, Mark D. Plumbley, Queen Mary University of London - London, UK
We consider the extension of the greedy adaptive dictionary learning algorithm that we introduced previously to applications other than speech signals. The algorithm learns a dictionary of sparse atoms, while yielding a sparse representation for the speech signals. We investigate its behavior in the analysis of music signals and propose a different dictionary learning approach that can be applied to large data sets. This facilitates the application of the algorithm to problems that generate large amounts of data, such as multimedia and multichannel application areas.
Convention Paper 8087 (Purchase now)
P17 - Multichannel and Spatial Audio: Part 1
Monday, May 24, 14:00 — 17:30 (Room C3)
Chair: Wieslaw Woszczyk, McGill University - Montreal, Quebec, Canada
P17-1 Individualization of Dynamic Binaural Synthesis by Real Time Manipulation of ITD—Alexander Lindau, Jorgos Estrella, Stefan Weinzierl, Technical University of Berlin - Berlin, Germany
The dynamic binaural synthesis of acoustic environments is usually constrained to the use of non-individual impulse response datasets, measured with dummy heads or head and torso simulators. Thus, fundamental cues for localization such as interaural level differences (ILD) and interaural time differences (ITD) are necessarily corrupted to a certain degree. For ILDs, this is a minor problem as listeners may swiftly adapt to spectral coloration, at least as long as an external reference is not provided. In contrast, ITD errors can be expected to lead to a constant degradation of localization. Hence, a method for the individual customization of dynamic binaural reproduction by means of real time manipulation of the ITD is proposed. As a prerequisite, subjectively artifact-free techniques for the decomposition of binaural impulse responses into ILD and ITD cues are discussed. Finally, based on listening test results, an anthropometry-based prediction model for individual ITD correction factors is presented. The proposed approach entails further improvements of auditory quality of real time binaural synthesis.
Convention Paper 8088 (Purchase now)
P17-2 Perceptual Evaluation of Physical Predictors of the Mixing Time in Binaural Room Impulse Responses—Alexander Lindau, Linda Kosanke, Stefan Weinzierl, Technical University of Berlin - Berlin, Germany
The mixing time of room impulse responses denotes the moment when the diffuse reverberation tail begins. A diffuse sound field can physically be defined by (1) equi-distribution of acoustical energy and (2) a uniform acoustical energy flux over the complete solid angle. Accordingly, the perceptual mixing time is the moment when the diffuse tail cannot be distinguished from that of any other position in the room. This provides an opportunity for reducing the length of binaural impulse responses that are dynamically exchanged in virtual acoustic environments (VAEs). Numerous model parameters and empirical features for the prediction of perceptual mixing time in rooms have been proposed. This paper aims at a perceptual evaluation of all potential estimators. Therefore, binaural impulse response data sets were collected with an adjustable head and torso simulator for a representative sample of rectangular-shaped rooms. Prediction performance was evaluated by linear regression using results of a listening test where mixing times could be adaptively altered in real time to determine a just audible transition time into a homogeneous diffuse tail. Regression formulae for the perceptual mixing time are presented, conveniently predicting perceptive mixing times to be used in the context of VAEs.
Convention Paper 8089 (Purchase now)
P17-3 HRTF Measurements with a Continuously Moving Loudspeaker and Swept Sines—Ville Pulkki, Mikko-Ville Laitinen, Ville Pekka Sivonen, Aalto University School of Science and Technology - Aalto, Finland
An apparatus is described, which is designed to measure head-related transfer functions (HRTFs) for audio applications. A broadband, two-driver loudspeaker is rotated around the subject with continuous movement, and responses are measured with a swept-sine technique. Potential error sources are discussed and quantified, and it is shown that the responses are almost identical to responses measured with a static, small single-driver loudspeaker. It is also shown that the method can be used to measure a large number of HRTFs in a relatively short time period.
Convention Paper 8090 (Purchase now)
P17-4 In Situ Microphone Array Calibration for Parameter Estimation in Directional Audio Coding—Oliver Thiergart, Giovanni Del Galdo, Maja Taseska, Jose Angel Pineda Pardo, Fabian Kuech, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Directional audio coding (DirAC) provides an efficient representation of spatial sound using a downmix audio signal and parametric information, namely direction-of-arrival (DOA) and diffuseness of sound. The input to the DirAC analysis consists of B-format signals, usually obtained via microphone arrays. The DirAC parameter estimation is impaired when phase mismatch between the array sensors occurs. We present an approach for in situ microphone array calibration based solely on the DirAC parameters. The algorithm aims at providing consistent parameter estimates rather than matching the sensors explicitly. It neither requires removing the sensors from the array nor depends on a priori knowledge such as the array size. We further propose a suitable excitation signal to ensure robust calibration in reverberant environments.
Convention Paper 8093 (Purchase now)
P17-5 Sound Field Recording by Measuring Gradients—Mihailo Kolundzija, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland; Martin Vetterli, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland, University of California at Berkeley, Berkeley, CA, USA
Gradient-based microphone arrays, the horizontal sound field's plane wave decomposition, and the corresponding circular harmonics decomposition are reviewed. Further, a general relation between directivity patterns of the horizontal sound field gradients and the circular harmonics of any order is derived. A number of example differential microphone arrays are analyzed, including arrays capable of approximating the sound pressure gradients necessary for obtaining the circular harmonics up to order three.
Convention Paper 8092 (Purchase now)
P17-6 Evaluation of a Binaural Reproduction System Using Multiple Stereo-Dipoles—Yesenia Lacouture Parodi, Per Rubak, Aalborg University - Aalborg, Denmark
The sweet spot size of different loudspeaker configurations was investigated in a previous study carried out by the authors. Closely spaced loudspeakers showed a wider control area than the standard stereo setup. The sweet spot with respect to head rotations was shown to be especially large when the loudspeakers are placed at elevated positions. In this paper we describe a system that uses the characteristics of the loudspeakers placed above the listener. The proposed system comprises three pairs of closely spaced loudspeakers: one pair placed in front, one placed behind, and one placed above the listener. The system is based on the idea of dividing the sound reproduction into regions to reduce front-back confusions and enhance the virtual experience without the aid of a head tracker. A set of subjective experiments intended to evaluate and compare the performance of the proposed system is also discussed.
Convention Paper 8091 (Purchase now)
P17-7 Conditioning of the Problem of a Source Array Design with Inverse Approach—Jeong-Guon Ih, KAIST - Daejeon, Korea; Wan-Ho Cho, KAIST - Daejeon, Korea, Chuo University, Bunkyoku, Tokyo, Japan
An inverse approach based on the acoustical holography concept can be effectively applied to the acoustic field rendered for achieving a target sound field given as a relative response distribution of sound pressure. To implement this method, the source configuration should be determined a priori, and a meaningful inverse solution of an ill-conditioned transfer matrix should be obtained. To choose efficient source positions that are almost mutually independent, a redundancy detection algorithm such as the effective independence method was employed to decide the proper positions for a given number of sources. In this way, an efficient and stable filter set for a source array controlling the sound field can be obtained. An interior domain with irregularly shaped boundaries was adopted as the target field for testing the suggested inverse method.
Convention Paper 8094 (Purchase now)
P18 - Audio Coding and Compression
Monday, May 24, 14:00 — 17:30 (Room C5)
Chair: Jamie A. S. Angus, University of Salford - Salford, Greater Manchester, UK
P18-1 High-Level Sound Coding with Parametric Blocks—Daniel Möhlmann, Otthein Herzog, Universität Bremen - Bremen, Germany
This paper proposes a new parametric encoding model for sound blocks that is specifically designed for manipulation, block-based comparison, and morphing operations. Unlike other spectral models, only the temporal evolution of the dominant tone and its time-varying spectral envelope are encoded, thus greatly reducing perceptual redundancy. All sounds are synthesized from the same set of model parameters, regardless of their length. Therefore, new instances can be created with greater variability than through simple interpolation. A method for creating the parametric blocks from an audio stream through partitioning is also presented. An example of sound morphing is shown and applications of the model are discussed.
Convention Paper 8096 (Purchase now)
P18-2 Exploiting High-Level Music Structure for Lossless Audio Compression—Florin Ghido, Tampere University of Technology - Tampere, Finland
We present a novel concept of "noncontiguous" audio segmentation by exploiting the high-level music structure. The existing lossless audio compressors working in asymmetrical mode divide the audio into quasi-stationary segments of variable length by recursive splitting (MPEG-4 ALS) or by dynamic programming (asymmetrical OptimFROG) before computing a set of linear prediction coefficients for each segment. Instead, we combine several variable length segments into a group and use a single set of linear prediction coefficients for each group. The optimal algorithm for combining has exponential complexity and we propose a quadratic time approximation algorithm. Integrated into asymmetrical OptimFROG, the proposed algorithm obtains up to 1.20% (on average 0.23%) compression improvements with no increase in decoder complexity.
Convention Paper 8097 (Purchase now)
P18-3 Interactive Teleconferencing Combining Spatial Audio Object Coding and DirAC Technology—Jürgen Herre, Cornelia Falch, Dirk Mahne, Giovanni del Galdo, Markus Kallinger, Oliver Thiergart, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The importance of telecommunication continues to grow in our everyday lives. An ambitious goal for developers is to provide the most natural way of audio communication by giving users the impression of being located next to each other. MPEG Spatial Audio Object Coding (SAOC) is a technology for coding, transmitting, and interactively reproducing spatial sound scenes on any conventional multi-loudspeaker setup (e.g., ITU 5.1). This paper describes how Directional Audio Coding (DirAC) can be used as recording front-end for SAOC-based teleconference systems to capture acoustic scenes and to extract the individual objects (talkers). By introducing a novel DirAC to SAOC parameter transcoder, a highly efficient way of combining both technologies is presented that enables interactive, object-based spatial teleconferencing.
Convention Paper 8098 (Purchase now)
P18-4 A New Parametric Stereo- and Multichannel Extension for MPEG-4 Enhanced Low Delay AAC (AAC-ELD)—María Luis Valero, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Andreas Hölzer, DSP Solutions GmbH & Co. - Regensburg, Germany; Markus Schnell, Johannes Hilpert, Manfred Lutzky, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jonas Engdegård, Heiko Purnhagen, Per Ekstrand, Kristofer Kjörling, Dolby Sweden AB - Stockholm, Sweden
ISO/MPEG standardizes two communication codecs with low delay: AAC-LD is a well established low delay codec for high quality communication applications such as video conferencing, tele-presence, and Voice over IP. Its successor AAC-ELD offers enhanced bit rate efficiency, making it an ideal solution for broadcast audio gateway codecs. Many existing and upcoming communication applications benefit from the transmission of stereo or multichannel signals at low bitrates. With low delay MPEG Surround, ISO has recently standardized a low delay parametric extension for AAC-LD and AAC-ELD. It is based on MPEG Surround technology with specific adaptation for low delay operation. This extension comes along with a significantly improved coding efficiency for transmission of stereo and multichannel signals.
Convention Paper 8099 (Purchase now)
P18-5 Efficient Combination of Acoustic Echo Control and Parametric Spatial Audio Coding—Fabian Kuech, Markus Schmidt, Meray Zourub, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
High-quality teleconferencing systems utilize surround sound to provide a natural communication experience. Directional Audio Coding (DirAC) is an efficient parametric approach to capture and reproduce spatial sound. It uses a monophonic audio signal together with parametric spatial cue information. For reproduction, multiple loudspeaker signals are determined based on the DirAC stream. To allow for hands-free operation, multichannel acoustic echo control (AEC) has to be employed. Standard approaches apply multichannel adaptive filtering to address this problem. However, computational complexity constraints and convergence issues inhibit practical applications. This paper proposes an efficient combination of AEC and DirAC by explicitly exploiting its parametric sound field representation. The approach suppresses the echo components in the microphone signals solely based on the single channel audio signal used for the DirAC synthesis of the loudspeaker signals.
Convention Paper 8100 (Purchase now)
P18-6 Sampling Rate Discrimination: 44.1 kHz vs. 88.2 kHz—Amandine Pras, Catherine Guastavino, McGill University - Montreal, Quebec, Canada
It is currently common practice for sound engineers to record digital music using high-resolution formats, and then down-sample the files to 44.1 kHz for commercial release. This study investigates whether listeners can perceive differences between musical files recorded at 44.1 kHz and 88.2 kHz with the same analog chain and type of AD-converter. Sixteen expert listeners were asked to compare 3 versions (44.1 kHz, 88.2 kHz, and the 88.2 kHz version down-sampled to 44.1 kHz) of 5 musical excerpts in a blind ABX task. Overall, participants were able to discriminate between files recorded at 88.2 kHz and their 44.1 kHz down-sampled version. Furthermore, for the orchestral excerpt, they were able to discriminate between files recorded at 88.2 kHz and files recorded at 44.1 kHz.
Convention Paper 8101 (Purchase now)
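The abstract of P18-6 does not say how the ABX responses were analyzed. A common way to decide whether listeners discriminate above chance, assumed here purely for illustration, is a one-sided binomial test against the 50% guessing rate. The function name and the trial counts below are hypothetical and are not taken from the paper.

```python
from math import comb


def abx_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided binomial p-value for an ABX run.

    Returns the probability of getting at least `correct` right answers
    out of `trials` if the listener were guessing with success rate `chance`.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    return sum(
        comb(trials, k) * chance**k * (1.0 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical example: 8 correct answers in 10 trials.
p = abx_p_value(8, 10)
print(f"p = {p:.4f}")
```

A small p-value suggests the responses are unlikely to come from guessing; the threshold used to call a result significant is a choice of the experimenter.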
P18-7 Comparison of Multichannel Audio Decoders for Use in Mobile and Handheld Devices—Manish Nema, Ashish Malot, Nokia India Pvt. Ltd. - Bangalore, Karnataka, India
Multichannel audio provides an immersive experience to listeners. Consumer demand coupled with technological improvements will drive consumption of high-definition content in mobile and handheld devices. Several multichannel audio coding algorithms are available in the market, both proprietary ones like Dolby Digital, Dolby Digital Plus, Windows Media Audio Professional (WMA Pro), and Digital Theater Surround High Definition (DTS-HD), and standard ones like Advanced Audio Coding (AAC) and MPEG Surround. This paper presents salient features/coding techniques of important multichannel audio decoders and a comparison of these decoders on key parameters like processor complexity, memory requirements, complexity/features for stereo playback, and quality/coding efficiency. The paper also presents a ranking of these multichannel audio decoders on the key parameters in a single table for easy comparison.
Convention Paper 8102 (Purchase now)
P19 - Psychoacoustics and Listening Tests
Monday, May 24, 14:00 — 15:30 (Room C4-Foyer)
P19-1 Auditory Perception of Dynamic Range in the Nonlinear System—Andrew J. R. Simpson, Simpson Microphones - West Midlands, UK
This paper is concerned with the perception of dynamic range in the nonlinear system. The work is differentiated from the generic investigation of “sound quality,” which is usually associated with studies of nonlinear distortion. The proposed hypothesis suggests that distortion products generated within the compressive type nonlinear system are able to act as a loudness compensator for the associated amplitude compression in the perceived loudness function. The hypothesis is tested using the Time-Varying Loudness model of Glasberg and Moore (2002) and further tested using AB scale hidden reference type listening test methods. The Loudness Overflow Effect (LOE) is introduced and bandwidth is shown to be a significant limiting factor. Results and immediate implications are briefly discussed.
Convention Paper 8103 (Purchase now)
P19-2 A Subjective Evaluation of the Minimum Audible Channel Separation in Binaural Reproduction Systems through Loudspeakers—Yesenia Lacouture Parodi, Per Rubak, Aalborg University - Aalborg, Denmark
To evaluate the performance of crosstalk cancellation systems the channel separation is usually used as a parameter. However, no systematic evaluation of the minimum audible channel separation has been found in the literature known by the authors. This paper describes a set of subjective experiments carried out to evaluate the minimum channel separation needed such that the binaural signals with crosstalk are perceived to be equal to the binaural signals reproduced without crosstalk. A three-alternative forced-choice discrimination experiment with a simple adaptive algorithm based on the weighted up-down method was used. The minimum audible channel separation was evaluated for the listeners placed at symmetric and asymmetric positions with respect to the loudspeakers. Eight different stimuli placed at two different locations were evaluated. Span angles of 12 and 60 degrees were also simulated. Results indicate that in order to avoid lateralization the channel separation should be below –15 dB for most of the stimuli and around –20 dB for broad-band noise.
Convention Paper 8104 (Purchase now)
P19-3 Evaluation of Speech Intelligibility in Digital Hearing Aids—Lorena Álvarez, Leticia Vaquero, Enrique Alexandre, Lucas Cuadra, Roberto Gil-Pita, University of Alcalá - Alcalá de Henares – Madrid, Spain
This paper explores the feasibility of using the Speech Intelligibility Index (SII) to evaluate the performance of a digital hearing aid. This standardized measure returns a number, between zero and unity, which can be interpreted as the proportion of the total speech information available to the listener, and correlates with the intelligibility of the speech signal. The paper will focus on the use of the SII as a metric with which to compare the performance of two different hearing aids, in terms of speech intelligibility. For the purpose of this work, experiments will be carried out employing data from four real subjects with mild-to-profound hearing losses using these different hearing aids. Results will show how the use of the SII can lead to a better selection of one hearing aid over others, while avoiding the need for extensive subjective listening tests.
Convention Paper 8105 (Purchase now)
P19-4 Method to Improve Speech Intelligibility in Different Noise Conditions—Rogerio G. Alves, Kuan-Chieh Yen, Michael C. Vartanian, Sameer A. Gadre, CSR - Cambridge Silicon Radio - Auburn Hills, MI, USA
Mobile communication applications have to address various environmental noise situations. In order to improve the quality of voice communication, not only is an effective noise reduction algorithm for the far-end user wanted, but also an algorithm that helps improve intelligibility for the near-end user in different environmental noise situations. Accordingly, the goal of this paper is to improve the overall voice communication experience of mobile device users by introducing a method to improve intelligibility by increasing perceptual loudness of the received speech signal in accordance with the environmental noise. Note that, given the characteristics of mobile device applications, a method capable of working in real time with low computational complexity is highly desired.
Convention Paper 8106 (Purchase now)
P19-5 Contemporary Theories of Tinnitus Generation, Diagnosis, and Management Practices—Stamatia Staikoudi, Queen Margaret University - Edinburgh, Scotland, UK
Tinnitus can be defined as the perception of a sound in the head or the ears, in the absence of external acoustic stimulation. It is a symptom experienced by more than seven million people across the UK and many more worldwide, including children. Its pitch may vary from individual to individual, and it can be described as ringing, whistling, humming, or buzzing, among others. We will be looking at contemporary theories for its generation, current methods of diagnosis, and management practices.
Convention Paper 8107 (Purchase now)
P19-6 Analytical and Perceptual Evaluation of Nonlinear Devices for Virtual Bass System—Nay Oo, Woon-Seng Gan, Nanyang Technological University - Singapore
Nonlinear devices (NLDs) are generally used in virtual bass systems (VBS). The prime objective is to extend the low frequency bandwidth psychoacoustically by generating a series of harmonics in the upper-bass and/or mid-frequency range, which loudspeakers can reproduce well. However, these artificially added harmonics introduce intermodulation distortion and may change the timbre of the sound tracks. In this paper nine memoryless NLDs are studied based on objective analysis and subjective listening tests. The objectives of this paper are (1) to quantify the spectral contents of NLDs when fed by a single tone; (2) to find out which type of NLD is best for psychoacoustic bass enhancement through subjective listening tests and the objective GedLee nonlinear distortion metric; and (3) to investigate whether there is any correlation between subjective listening test results and objective performance scores.
Convention Paper 8108 (Purchase now)
P20 - Audio Content Management—Audio Information Retrieval
Monday, May 24, 16:30 — 18:00 (Room C4-Foyer)
P20-1 Complexity Scalable Perceptual Tempo Estimation From HE-AAC Encoded Music—Danilo Hollosi, Ilmenau University of Technology - Ilmenau, Germany; Arijit Biswas, Dolby Germany GmbH - Nürnberg, Germany
A modulation frequency-based method for perceptual tempo estimation from HE-AAC encoded music is proposed. The method is designed to work in three domains: the fully decoded PCM domain; the intermediate HE-AAC transform domain after partial decoding; and directly in the HE-AAC compressed domain using the Spectral Band Replication (SBR) payload. This offers complexity scalable solutions. We demonstrate that the SBR payload is an ideal proxy for tempo estimation directly from HE-AAC bit-streams without even decoding them. A perceptual tempo correction stage is proposed based on rhythmic features to correct for octave errors in every domain. Experimental results show that the proposed method significantly outperforms two commercially available systems, both in terms of accuracy and computational speed.
Convention Paper 8109 (Purchase now)
P20-2 On the Effect of Reverberation on Musical Instrument Automatic Recognition—Mathieu Barthet, Mark Sandler, Queen Mary University of London - London, UK
This paper investigates the effect of reverberation on the accuracy of a musical instrument recognition model based on Line Spectral Frequencies and K-means clustering. One-hundred-eighty experiments were conducted by varying the type of music databases (isolated notes, solo performances), the stage in which the reverberation is added (learning, and/or testing), and the type of reverberation (3 different reverberation times, 10 different dry-wet levels). The performances of the model systematically decreased when reverberation was added at the testing stage (by up to 40%). Conversely, when reverberation was added at the training stage, a 3% increase of performance was observed for the solo performances database. The results suggest that pre-processing the signals with a dereverberation algorithm before classification may be a means to improve musical instrument recognition systems.
Convention Paper 8110 (Purchase now)
P20-3 Harmonic Components Extraction in Recorded Piano Tones—Carmine Emanuele Cella, Università di Bologna - Bologna, Italy
It is sometimes desirable, for the purpose of analyzing recorded piano tones, to remove from the original signal the noisy components generated by the hammer strike and by other elements involved in the piano action. In this paper we propose an efficient method to achieve such a result, based on adaptive filtering and automatic estimation of fundamental frequency and inharmonicity; the final method, applied to a recorded piano tone, produces two separate signals containing, respectively, the hammer knock and the harmonic components. Some sound examples for evaluation by listening are available on the web as specified in the paper.
Convention Paper 8111 (Purchase now)
P20-4 Browsing Sound and Music Libraries by Similarity—Stéphane Dupont, Université de Mons - Mons, Belgium; Christian Frisson, Université Catholique de Louvain - Louvain-la-Neuve, Belgium; Xavier Siebert, Damien Tardieu, Université de Mons - Mons, Belgium
This paper presents a prototype tool for browsing through multimedia libraries using content-based multimedia information retrieval techniques. It is composed of several groups of components for multimedia analysis, data mining, interactive visualization, as well as connection with external hardware controllers. The musical application of this tool uses descriptors of timbre, harmony, and rhythm, and two different approaches for exploring/browsing content. In the first mode, dynamic data mining allows the user to group sounds into clusters according to those different criteria, whose importance can be weighted interactively. In the second mode, sounds that are similar to a query are returned to the user, and can be used to proceed further with the search. This approach also borrows from multi-criteria optimization concepts to return a relevant list of similar sounds.
Convention Paper 8112 (Purchase now)
P20-5 On the Development and Use of Sound Maps for Environmental Monitoring—Maria Rangoussi, Stelios M. Potirakis, Ioannis Paraskevas, Technological Education Institute of Piraeus - Aigaleo-Athens, Greece; Nicolas–Alexander Tatlas, University of Patras - Patras, Greece
The development, update, and use of sound maps for the monitoring of environmental interest areas is addressed in this paper. Sound maps constitute a valuable tool for environmental monitoring. They rely on networks of microphones distributed over the area of interest to record and process signals, extract and characterize sound events and finally form the map; time constraints are imposed by the need for timely information representation. A stepwise methodology is proposed and a series of practical considerations are discussed to the end of obtaining a multi-layer sound map that is periodically updated and visualizes the sound content of a “scene.” Alternative time-frequency-based features are investigated as to their efficiency within the framework of a hierarchical classification structure.
Convention Paper 8113 (Purchase now)
P20-6 The Effects of Reverberation on Onset Detection Tasks—Thomas Wilmering, György Fazekas, Mark Sandler, Queen Mary University of London - London, UK
The task of onset detection is relevant in various contexts such as music information retrieval and music production, while reverberation has always been an important part of the production process. The effect may be the product of the recording space or it may be artificially added, and, in our context, destructive. In this paper we evaluate the effect of reverberation on onset detection tasks. We compare state-of-the-art techniques and show that the algorithms have varying degrees of robustness in the presence of reverberation depending on the content of the analyzed audio material.
Convention Paper 8114 (Purchase now)
P20-7 Segmentation and Discovery of Podcast Content—Steven Hargreaves, Chris Landone, Mark Sandler, Panos Kudumakis, Queen Mary University of London - London, UK
With ever increasing amounts of radio broadcast material being made available as podcasts, sophisticated methods of enabling the listener to quickly locate material matching their own personal tastes become essential. Given the ability to segment a podcast that may be in the order of one or two hours duration into individual song previews, the time the listener spends searching for material of interest is minimized. This paper investigates the effectiveness of applying multiple feature extraction techniques to podcast segmentation and describes how such techniques could be exploited by a vast number of digital media delivery platforms in a commercial cloud-based radio recommendation and summarization service.
Convention Paper 8115 (Purchase now)
P21 - Multichannel and Spatial Audio: Part 2
Tuesday, May 25, 09:00 — 13:00 (Room C3)
Chair: Ronald Aarts
P21-1 Center-Channel Processing in Virtual 3-D Audio Reproduction Over Headphones or Loudspeakers—Jean-Marc Jot, Martin Walsh, DTS Inc. - Scotts Valley, CA, USA
Virtual 3-D audio processing systems for the spatial enhancement of recordings reproduced over headphones or frontal loudspeakers generally provide a less compelling effect on center-panned sound components. This paper examines this deficiency and presents virtual 3-D audio processing algorithm modifications that provide a compelling spatial enhancement effect over headphones or loudspeakers even for sound components localized in the center of the stereo image, ensure the preservation of the timbre and balance in the original recording, and produce a more stable “phantom center” image over loudspeakers. The proposed improvements are applicable, in particular, in laptop and TV set audio systems, mobile internet devices, and home theater “soundbar” loudspeakers.
Convention Paper 8116 (Purchase now)
P21-2 Parametric Representation of Complex Sources in Reflective Environments—Dylan Menzies, De Montfort University - Leicester, UK
Aspects of source directivity in reflective environments are considered, including the audible effects of directivity and how these can be reproduced. Different methods of encoding and production are presented, leading to a new approach to extend parametric encoding of reverberation, as described in the DirAC and MPEG formats, to include the response to source directivity.
Convention Paper 8118 (Purchase now)
P21-3 Analysis and Improvement of Pre-Equalization in 2.5-Dimensional Wave Field Synthesis—Sascha Spors, Jens Ahrens, Technische Universität Berlin - Berlin, Germany
Wave field synthesis (WFS) is a well established high-resolution spatial sound reproduction technique. Typical WFS systems aim at the reproduction in a plane using loudspeakers enclosing the plane. This constitutes a so-called 2.5-dimensional reproduction scenario. It has been shown that a spectral correction of the reproduced wave field is required in this context. For WFS this correction is known as the pre-equalization filter. The derivation of WFS is based on a series of approximations of the physical foundations. This paper investigates the consequences of these approximations for the reproduced sound field and in particular for the pre-equalization filter. An exact solution is provided by the recently presented spectral division method and is employed in order to derive an improved WFS driving function. Furthermore, the effects of spatial sampling and truncation on the pre-equalization are discussed.
Convention Paper 8121 (Purchase now)
P21-4 Discrete Wave Field Synthesis Using Fractional Order Filters and Fractional Delays—César D. Salvador, Universidad de San Martin de Porres - Lima, Peru
A discretization of the generalized 2.5D Wave Field Synthesis driving functions is proposed in this paper. Time discretization is applied with special attention to the prefiltering that involves half-order systems and to the delaying that involves fractional-sample delays. Space discretization uses uniformly distributed loudspeakers along arbitrarily shaped contours: visual and numerical comparisons between lines and convex arcs, and between squares and circles, are shown. An immersive soundscape composed of nature sounds is reported as an example. Modeling uses MATLAB and real-time reproduction uses Pure Data. Simulations of synthesized plane and spherical wave fields, in the whole listening area, report a discretization percentage error of less than 1%, using 16 loudspeakers and 5th order IIR prefilters.
Convention Paper 8122 (Purchase now)
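The abstract of P21-4 mentions fractional-sample delays but does not name the interpolator used. One widely used option, chosen here only as an illustrative assumption and not as the author's method, is a Lagrange-interpolation FIR filter. The sketch below computes its taps and applies them to a signal; the function names, filter order, and delay value are hypothetical.

```python
def lagrange_fractional_delay(delay: float, order: int = 3) -> list[float]:
    """Return order+1 FIR taps approximating a delay of `delay` samples.

    Tap n is the Lagrange basis polynomial for node n evaluated at `delay`.
    Accuracy is best when `delay` lies near order/2.
    """
    taps = []
    for n in range(order + 1):
        h = 1.0
        for k in range(order + 1):
            if k != n:
                h *= (delay - k) / (n - k)
        taps.append(h)
    return taps


def apply_fir(signal: list[float], taps: list[float]) -> list[float]:
    """Direct-form FIR filtering with zero initial state."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for n, h in enumerate(taps):
            if i - n >= 0:
                acc += h * signal[i - n]
        out.append(acc)
    return out


# Hypothetical example: delay a ramp by 1.5 samples with a 3rd-order filter.
ramp = [float(i) for i in range(10)]
delayed = apply_fir(ramp, lagrange_fractional_delay(1.5, order=3))
print(delayed)
```

Because Lagrange interpolation reproduces low-order polynomials exactly, the delayed ramp equals the input index minus the delay once the filter state has filled.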
P21-5 Immersive Virtual Sound Beyond 5.1 Channel Audio—Kangeun Lee, Changyong Son, Dohyung Kim, Samsung Advanced Institute of Technology - Suwon, Korea
In this paper a virtual sound system is introduced for next-generation multichannel audio. The sound system provides 9.1 channel surround sound via a conventional 5.1 loudspeaker layout and contents. In order to deliver 9.1 sound, the system includes channel upmixing and vertical sound localization that can create virtually localized sound anywhere on a spherical surface around the listener's head. An amplitude panning coefficient is used for channel upmixing, which includes a smoothing technique to reduce musical noise caused by upmixing. The proposed vertical rendering is based on VBAP (vector based amplitude panning) using three loudspeakers among the 5.1. For the quality test, our upmixing and virtual rendering methods are evaluated on real 9.1 and 5.1 loudspeaker setups, respectively, and compared with Dolby Pro Logic IIz. The demonstrated performance is superior to the references.
Convention Paper 8117 (Purchase now)
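The vertical rendering in P21-5 is described as based on VBAP with three loudspeakers. As a minimal sketch of textbook three-loudspeaker VBAP, not the authors' implementation, the gains for a source direction p solve p = g1·l1 + g2·l2 + g3·l3 over the loudspeaker unit vectors and are then power-normalized. The function names and the loudspeaker directions in the example are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def unit_vector(azimuth_deg: float, elevation_deg: float) -> Vec3:
    """Unit vector for a direction given as azimuth and elevation in degrees."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def det3(a: Vec3, b: Vec3, c: Vec3) -> float:
    """Determinant of the 3x3 matrix whose rows are a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def vbap_gains(source: Vec3, speakers: tuple[Vec3, Vec3, Vec3]) -> list[float]:
    """Power-normalized gains for one loudspeaker triplet.

    Solves source = g1*l1 + g2*l2 + g3*l3 by Cramer's rule.
    A negative gain means the source lies outside the triplet.
    """
    l1, l2, l3 = speakers
    d = det3(l1, l2, l3)
    if abs(d) < 1e-12:
        raise ValueError("loudspeaker directions are coplanar with the origin")
    g = [det3(source, l2, l3) / d,
         det3(l1, source, l3) / d,
         det3(l1, l2, source) / d]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


# Hypothetical triplet: front left, front right, and an elevated center.
triplet = (unit_vector(30, 0), unit_vector(-30, 0), unit_vector(0, 45))
print(vbap_gains(unit_vector(0, 0), triplet))
```

In a full renderer, the triplet enclosing the source direction would be selected first, for example by picking the one whose gains are all non-negative.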
P21-6 Acoustical Zooming Based on a Parametric Sound Field Representation—Richard Schultz-Amling, Fabian Kuech, Oliver Thiergart, Markus Kallinger, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Directional audio coding (DirAC) is a parametric approach to the analysis and reproduction of spatial sound. The DirAC parameters, namely direction-of-arrival and diffuseness of sound, can be further exploited in modern teleconferencing systems. Based on the directional parameters, we can control a video camera to automatically steer toward the active talker. In order to provide consistency between the visual and acoustical cues, the virtual recording position should match the visual movement. In this paper we present an approach for an acoustical zoom, which provides audio rendering that follows the movement of the visual scene. The algorithm does not rely on a priori information regarding the sound reproduction system as it operates directly in the DirAC parameter domain.
Convention Paper 8120 (Purchase now)
P21-7 SoundDelta: A Study of Audio Augmented Reality Using WiFi-Distributed Ambisonic Cell Rendering—Nicholas Mariette, Brian F. G. Katz, LIMSI-CNRS - Orsay, France; Khaled Boussetta, Université Paris 13 - Paris, France; Olivier Guillerminet, REMU - Paris, France
SoundDelta is an art/research project that produced several public audio augmented reality art-works. These spatial soundscapes were comprised of virtual sound sources located in a designated terrain such as a town square. Pedestrian users experienced the result as interactive binaural audio by walking through the augmented terrain, using headphones and the SoundDelta mobile device. SoundDelta uses a distributed "Ambisonic cell" architecture that scales efficiently for many users. A server renders Ambisonic audio for fixed user positions, which is streamed wirelessly to mobile users that render a custom, individualized binaural mix for their present position. A spatial cognition mapping experiment was conducted to validate the soundscape perception and compare with an individual rendering system.
Convention Paper 8123 (Purchase now)
P21-8 Surround Sound Panning Technique Based on a Virtual Microphone Array—Filippo M. Fazi, University of Southampton - Southampton, UK; Toshiro Yamada, Suketu Kamdar, University of California, San Diego - La Jolla, CA, USA; Philip A. Nelson, University of Southampton - Southampton, UK; Peter Otto, University of California, San Diego - La Jolla, CA, USA
A multichannel panning technique is presented, which aims at reproducing a plane wave with an array of loudspeakers. The loudspeaker gains are computed by solving an acoustical inverse problem. The latter involves the inversion of a matrix of transfer functions between the loudspeakers and the elements of a virtual microphone array, the center of which corresponds to the location of the listener. The radius of the virtual microphone array is varied consistently with the frequency, in such a way that the transfer function matrix is independent of the frequency. As a consequence, the inverse problem is solved for one frequency only, and the loudspeaker coefficients obtained can be implemented using simple gains.
Convention Paper 8119 (Purchase now)
P22 - Microphones, Converters, and Amplifiers
Tuesday, May 25, 09:00 — 12:00 (Room C5)
Chair: Mark Sandler, Queen Mary University of London - London, UK
P22-1 A Comparison of Phase-Shift Self-Oscillating and Carrier-Based PWM Modulation for Embedded Audio Amplifiers—Alexandre Huffenus, Gaël Pillonnet, Nacer Abouchi, Lyon Institute of Nanotechnology - Villeurbanne, France; Frédéric Goutti, STMicroelectronics - Grenoble, France
This paper compares two modulation schemes for Class-D amplifiers: phase-shift self-oscillating (PSSO) and carrier-based pulse width modulation (PWM). Theoretical analysis (modulation, frequency of oscillation, bandwidth, etc.), design procedure, and IC silicon evaluation will be shown for mono and stereo operation (on the same silicon die) on both structures. The design of both architectures will use as many identical building blocks as possible, to provide a fair, “all else being equal,” comparison. THD+N performance and idle consumption went from 0.02% and 5.6 mA in PWM to 0.007% and 5.2 mA in self-oscillating. Other advantages and drawbacks of the self-oscillating structure will be explained and compared to the classical carrier-based PWM one, with a focus on battery-powered applications.
Convention Paper 8127 (Purchase now)
P22-2 Digital-Input Class-D Audio Amplifier—Hassan Ihs, Christian Dufaza, Primachip SAS - Marseille, France
Not only does a digital-input class-D audio amplifier directly convert a digital PCM-coded audio signal into power, it also exhibits superior performance compared with traditional PWM analog class-D amplifiers. In the latter, several analog techniques have to be deployed to combat the many side-effects inherent to analog blocks, resulting in complex circuitry. The digital domain provides many degrees of freedom that allow combating signal non-idealities at little or no extra cost. As the output to the real world is analog, a digital class-D amplifier requires analog-to-digital conversion (ADC) in the feedback path to compensate for jitter, signal distortions, and power supply noise. With careful sigma-delta modulation design and a few digital techniques, the class-D loop is stabilized and achieves superior audio performance. The audio signal chain is tremendously simplified, resulting in a significant reduction in area cost and power consumption. For applications that require analog input processing, the digital class-D amplifier still accepts analog inputs at no extra cost.
Convention Paper 8128 (Purchase now)
P22-3 Digital PWM Amplifier Using Nonlinear Feedback and Predistortion—Peter Craven, Algol Applications Ltd. - Steyning, West Sussex, UK; Larry Hand, Intersil/D2Audio - Austin, TX, USA; Brian Attwood, PWM Systems - Crawley, Sussex, UK; Jack Andersen, D2Audio - Austin, TX, USA (deceased)
A nonlinear feedback topology is used to reduce the deviations of a practical PWM output stage from ideal theoretical behavior, the theoretical nonlinearity of the PWM process being corrected using predistortion. As the final output is analog, an ADC is needed if the feedback is to be digital, and several problems arise from the practical limitations of commercially-available ADCs, including delay and addition of ultrasonic noise. We show how these problems can be minimized and illustrate the performance of a digital PWM amplifier in which feedback results in a significant reduction of distortion throughout the audio range.
Convention Paper 8129
P22-4 Microphone Choice: Large or Small, Single or Double?—Martin Schneider, Georg Neumann GmbH - Berlin, Germany
How do large and small diaphragm condenser microphones differ? A common misapprehension is that large capsules necessarily become less directional at low frequencies. It is shown that this is not a question of large or small, but rather of single or double diaphragm design. The different behaviors have a direct impact on the sound engineer’s choice and placement of microphones. Likewise, the much debated question of proximity effect with multi-pattern microphones and omnidirectional directivity is discussed.
Convention Paper 8124
P22-5 Improvements on a Low-Cost Experimental Tetrahedral Ambisonic Microphone—Dan T. Hemingson, Mark J. Sarisky, University of Texas at Austin - Austin, TX, USA
An earlier paper [Hemingson, Dan & Sarisky; A Practical Comparison of Three Tetrahedral Ambisonic Microphones, 126th AES Convention Munich May09] compared two low-cost tetrahedral ambisonic microphones, an experimental microphone and a Core Sound TetraMic, using a Soundfield MKV or SPS422B as a standard for comparison. This paper examines improvements to the experimental device, including those suggested in the “future work” section of the original paper. Modifications to the capsules and a redesign of the electronics package significantly improved the experimental microphone. Recordings were made in natural environments and of live performances, some simultaneously with the Soundfield standard. Of interest is the use of the low-cost surround microphone for student and experimental education.
Convention Paper 8125
P22-6 Analysis of the Interaction between Ribbon Motor, Transformer, and Preamplifier and its Application in Ribbon Microphone Design—Julian David, Audio Engineering Associates - Pasadena, CA, USA, University of Applied Sciences Düsseldorf, Düsseldorf, Germany
The transformer in ribbon microphones interacts with the complex impedances of both the ribbon motor and the subsequent preamplifier. This paper presents a test setup that takes the influences of source and load impedances into account for predicting the amplitude and phase response of a specific ribbon microphone design in combination with different transformers. As a result, the effect of the transformer on the amplitude and phase response of the system can be simulated in good approximation by means of a generic low-impedance source. This allows for optimizing the ribbon/transformer/load circuit under laboratory conditions in order to achieve the desired microphone performance.
Convention Paper 8126
P23 - Audio Processing—Music and Speech Signal Processing
Tuesday, May 25, 10:30 — 12:00 (Room C4-Foyer)
P23-1 Beta Divergence for Clustering in Monaural Blind Source Separation—Martin Spiertz, Volker Gnann, RWTH Aachen University - Aachen, Germany
General purpose audio blind source separation algorithms have to deal with a large dynamic range among the sources to be separated. In the algorithm used here, the mixture is separated into single notes. These notes are clustered to reconstruct the melodies played by the active sources. Non-negative matrix factorization (NMF) gives good results in clustering the notes according to spectral features. The cost function for the NMF is controlled by the parameter beta, which should be adjusted according to the dynamic difference of the sources. The novelty of this paper is a simple unsupervised decision scheme that estimates the optimal parameter beta, increasing the separation quality over a large range of dynamic differences.
Convention Paper 8130
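For readers unfamiliar with the cost function named in the P23-1 abstract, the following is a minimal sketch of the standard beta-divergence family commonly used as an NMF cost. It is the textbook definition, not the authors' code or their beta-selection scheme; the function names `beta_divergence` and `nmf_cost` are illustrative.

```python
import math


def beta_divergence(x, y, beta):
    """Beta-divergence d_beta(x | y) between two positive scalars."""
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be positive")
    if beta == 0:
        # Itakura-Saito divergence (invariant to a common scaling of x and y)
        ratio = x / y
        return ratio - math.log(ratio) - 1.0
    if beta == 1:
        # Generalized Kullback-Leibler divergence
        return x * math.log(x / y) - x + y
    # General case; beta == 2 gives half the squared Euclidean distance
    return (x ** beta + (beta - 1) * y ** beta
            - beta * x * y ** (beta - 1)) / (beta * (beta - 1))


def nmf_cost(V, W, H, beta):
    """Total beta-divergence between V and the product W @ H.

    V is F x T, W is F x K, H is K x T, all given as lists of lists
    with strictly positive entries.
    """
    cost = 0.0
    for f, v_row in enumerate(V):
        for t, v in enumerate(v_row):
            approx = sum(W[f][k] * H[k][t] for k in range(len(H)))
            cost += beta_divergence(v, approx, beta)
    return cost
```

Because the beta = 0 case depends only on the ratio x / y, quiet and loud components contribute on an equal footing, whereas beta = 2 is dominated by high-energy entries. This is one way to see why the choice of beta interacts with the dynamic difference between sources.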
P23-2 On the Effects of Room Reverberation in 3-D DOA Estimation Using a Tetrahedral Microphone Array—Maximo Cobos, Jose J. Lopez, Amparo Marti, Universidad Politécnica de Valencia - Valencia, Spain
This paper studies the accuracy of estimating the Direction-Of-Arrival (DOA) of multiple sound sources using a small microphone array. Like other sparsity-based algorithms, the proposed method is able to work in underdetermined scenarios, where the number of sound sources exceeds the number of microphones. Moreover, the tetrahedral shape of the array allows DOAs to be estimated easily in three-dimensional space, which is an advantage over other existing approaches. However, since the proposed processing is based on an anechoic signal model, the estimated DOA vectors are severely affected by room reflections. Experiments to analyze the resultant DOA distribution under different room conditions and source arrangements are discussed using both simulations and real recordings.
Convention Paper 8131
P23-3 Long Term Cepstral Coefficients for Violin Identification—Ewa Lukasik, Poznan University of Technology - Poznan, Poland
Cepstral coefficients on the mel scale have proved to be efficient features for speaker and musical instrument recognition. In this paper Long Term Cepstral Coefficients (LTCCs) of solo musical phrases are used as features for identification of individual violins. LTCCs represent the envelope of the Long Term Average Spectrum (LTAS) on a linear scale, which is useful for characterizing the subtleties of violin sound in the frequency domain. Results of the classification of 60 instruments are presented and discussed. It is shown that if the experts’ knowledge is applied to analyze violin sound, the results may be promising.
Convention Paper 8132
P23-4 Adaptive Source Separation Based on Reliability of Spatial Feature Using Multichannel Acoustic Observations—Mitsunori Mizumachi, Kyushu Institute of Technology - Kitakyushu, Fukuoka, Japan
Separation of sound sources can be achieved by spatial filtering with multichannel acoustic observations. However, the appropriate algorithm has to be chosen for each acoustic scene, and it is difficult to provide a suitable algorithm in real acoustic environments. In this paper an adaptive source separation scheme is proposed based on the reliability of a spatial feature, which gives an estimate of the direction of arrival (DOA). As confidence measures for DOA estimates, the third and fourth moments of the spatial features are employed to measure how sharp their main lobes are. Depending on the reliability of each DOA estimate, the proposed scheme selectively uses either spatial filters or frequency-selective filters without spatial filtering.
Convention Paper 8133
P23-5 A Heuristic Text-Driven Approach for Applied Phoneme Alignment—Konstantinos Avdelidis, Charalampos Dimoulas, George Kalliris, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
The paper introduces a phoneme matching algorithm built on a novel functional strategy. In contrast to classic methodologies that focus on convergence to a fixed expected phonemic sequence (EPS), the presented method follows a more realistic approach. Based on text input, a soft EPS is populated, taking into consideration the structural and linguistic deviations that may appear in a naturally spoken sequence. The result of the matching process is evaluated using fuzzy inference and consists of both the phoneme transition positions and the actual phonemic content of the utterance. An overview of convergence quality over a series of runs for the Greek language is presented.
Convention Paper 8134
P23-6 Speech Enhancement with Hybrid Gain Functions—Xuejing Sun, Kuan-Chieh Yen, Cambridge Silicon Radio - Auburn Hills, MI, USA
This paper describes a hybrid gain function for single-channel acoustic noise suppression systems. The proposed gain function combines the Wiener filter and Minimum Mean Square Error – Log Spectral Amplitude estimator (MMSE-LSA) gain functions and selects the respective gain values accordingly. Objective evaluation using a composite measure shows that the hybrid gain function yields better results than either of the two functions alone.
Convention Paper 8135
P23-7 Human Voice Modification Using Instantaneous Complex Frequency—Magdalena Kaniewska, Gdansk University of Technology - Gdansk, Poland
The paper presents the possibilities of changing the human voice by modifying the instantaneous complex frequency (ICF) of the speech signal. The proposed method provides a flexible way of altering the voice without the need to find the fundamental frequency and formant positions or to detect voiced and unvoiced fragments of speech. The algorithm is simple and fast. Apart from the ICF, it uses signal factorization into two factors: one fully characterized by its envelope and the other with positive instantaneous frequency. The ICFs of the factors are modified individually for different sound effects.
Convention Paper 8136
P23-8 Designing Optimal Phoneme-Wise Fuzzy Cluster Analysis—Konstantinos Avdelidis, Charalampos Dimoulas, George Kalliris, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
A large number of pattern classification algorithms and methodologies have been proposed for the phoneme recognition task over recent decades. The current paper presents a prototype distance-based fuzzy classifier optimized for the needs of phoneme recognition. This is accomplished by a specially designed objective function and a corresponding training strategy. In particular, each phonemic class is represented by a number of arbitrarily shaped clusters that adaptively match the corresponding feature-space distribution. The formulation of the approach is capable of delivering a variety of related conclusions based on fuzzy logic arithmetic. An overview of the inference capability is presented in combination with performance results for the Greek language.
Convention Paper 8137
P24 - Innovative Applications
Tuesday, May 25, 14:00 — 17:30 (Room C3)
Chair: John Dawson
P24-1 The Serendiptichord: A Wearable Instrument for Contemporary Dance Performance—Tim Murray-Browne, Di Mainstone, Nick Bryan-Kinns, Mark D. Plumbley, Queen Mary University of London - London, UK
We describe a novel musical instrument designed for use in contemporary dance performance. This instrument, the Serendiptichord, takes the form of a headpiece plus associated pods that sense movements of the dancer, together with associated audio processing software driven by the sensors. Movements such as translating the pods or shaking the trunk of the headpiece cause selection and modification of sampled sounds. We discuss how we have closely integrated physical form, sensor choice and positioning, and software to avoid issues that otherwise arise with disconnection of the innate physical link between action and sound, leading to an instrument that non-musicians (in this case, dancers) are able to enjoy using immediately.
Convention Paper 8139
P24-2 A Novel User Interface for Musical Timbre Design—Allan Seago, London Metropolitan University - London, UK; Simon Holland, Paul Mulholland, Open University - UK
The complex and multidimensional nature of musical timbre is a problem for the design of intuitive interfaces for sound synthesis. A useful approach to the manipulation of timbre involves the creation and subsequent navigation or search of n-dimensional coordinate spaces or timbre spaces. A novel timbre space search strategy is proposed based on weighted centroid localization (WCL). The methodology and results of user testing of two versions of this strategy in three distinctly different timbre spaces are presented and discussed. The paper concludes that this search strategy offers a useful means of locating a desired sound within a suitably configured timbre space.
Convention Paper 8140
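Weighted centroid localization, the technique named in the P24-2 abstract, estimates a target position as the weighted mean of known reference points. The sketch below shows only that generic computation. Treating the weights as non-negative scores for probe sounds in a timbre space (for example, user similarity ratings) is an assumption made for illustration, and the function name `weighted_centroid` is hypothetical.

```python
def weighted_centroid(points, weights):
    """Weighted mean of n-dimensional points.

    points  -- list of equal-length coordinate lists (reference positions)
    weights -- one non-negative weight per point (assumed: higher = closer
               to the target, e.g. a similarity rating)
    """
    if len(points) != len(weights) or not points:
        raise ValueError("need one weight per point and at least one point")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    dims = len(points[0])
    return [
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(dims)
    ]


# Three probe sounds in a 2-D space; the third is rated most similar.
estimate = weighted_centroid([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]], [1, 1, 2])
```

In an iterative search, the estimate would typically seed the next set of probe points, so that the candidate region shrinks around the desired sound.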
P24-3 Bi-Directional Audio-Tactile Interface for Portable Electronic Devices—Neil Harris, New Transducers Ltd. (NXT) - Cambridge, UK
When an audio system uses the screen or casework vibrating as the loudspeaker, it can also provide haptic feedback. Just as a loudspeaker may be used reciprocally as a microphone, the haptic feedback aspect of the design may be operated as a touch sensor. This paper considers how to model a basic system embodying these aspects, including the electrical part, with a finite element package. For a piezoelectric exciter, full reciprocal modeling is possible, but for electromagnetic exciters it is not, unless multi-physics simulation is supported. For the latter, a model using only lumped parameter mechanical elements is developed.
Convention Paper 8141
P24-4 Tactile Music Instrument Recognition for Audio Mixers—Sebastian Merchel, Ercan Altinsoy, Maik Stamm, Dresden University of Technology - Dresden, Germany
Touch screens are not yet widely used for digital audio workstations, particularly audio mixing consoles. One reason is the ease of use and the intuitive tactile feedback that hardware faders, knobs, and buttons provide. Adding tactile feedback to touch screens will greatly improve usability. In addition, touch screens can reproduce innovative extra tactile information. This paper investigates several design parameters for the generation of the tactile feedback. The results indicate that musical instruments can be distinguished if tactile feedback is rendered from the audio signal. This helps to improve recognition of an audio signal source that is assigned, e.g., to a specific mixing channel. Applying this knowledge, the use of touch screens in audio applications becomes more intuitive.
Convention Paper 8142
P24-5 Augmented Reality Audio Editing—Jacques Lemordant, Yohan Lasorsa, INRIA - Rhône-Alpes, France
The concept of augmented reality audio (ARA) characterizes techniques where a physically real sound and voice environment is extended with virtual, geolocalized sound objects. We show that the authoring of an ARA scene can be done through an iterative process composed of two stages: in the first the author moves through the rendering zone to apprehend the audio spatialization and the chronology of the audio events, and in the second the sequencing of the sound sources and the DSP acoustic parameters are edited textually. This authoring process is based on the joint use of two XML languages, OpenStreetMap for maps and A2ML for interactive 3-D audio. Because A2ML is a format for a cue-oriented interactive audio system, requests for interactive audio services are made through TCDL, a tag-based cue dispatching language. This separation of modeling and audio rendering is similar to what is done for the web of documents with HTML and CSS style sheets.
Convention Paper 8143
P24-6 Evaluation of a Haptic/Audio System for 3-D Targeting Tasks—Lorenzo Picinali, De Montfort University - Leicester, UK; Bob Menelas, Brian F. G. Katz, Patrick Bourdot, LIMSI-CNRS - Orsay, France
While common user interface designs tend to focus on visual feedback, other sensory channels may be used in order to reduce the cognitive load of the visual one. In this paper non-visual environments are presented in order to investigate how users exploit information delivered through haptic and audio channels. A first experiment is designed to explore the effectiveness of a haptic/audio system in a single-target localization task; a virtual magnet metaphor is exploited for the haptic rendering, while a parameter mapping sonification of the distance to the source, combined with 3-D audio spatialization, is used for the audio rendering. An evaluation is carried out in terms of the effectiveness of separate haptic and auditory feedback versus the combined multimodal feedback.
Convention Paper 8144
P24-7 Track Displays in DAW Software: Beyond Waveform Views—Kristian Gohlke, Michael Hlatky, Sebastian Heise, David Black, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany; Jörn Loviscach, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany
For decades, digital audio workstation software has displayed the content of audio tracks through bare waveforms. We argue that the same real estate on the computer screen can be used for far more expressive and goal-oriented visualizations. Starting from a range of requirements and use cases, this paper discusses existing techniques from such fields as music visualization and music notation. It presents a number of novel techniques, aimed at better fulfilling the needs of the human operator. To this end, the paper draws upon methods from signal processing and music information retrieval as well as computer graphics.
Convention Paper 8145
P25 - Listening Tests and Evaluation Psychoacoustics
Tuesday, May 25, 14:00 — 18:00 (Room C5)
Chair: Natanya Ford, Buckinghamshire New University - UK
P25-1 Toward a Statistically Well-Grounded Evaluation of Listening Tests—Avoiding Pitfalls, Misuse, and Misconceptions—Frederik Nagel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Thomas Sporer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Peter Sedlmeier, Technical University of Dresden - Dresden, Germany
Many recent publications in audio research present subjective evaluations of audio quality based on the Recommendation ITU-R BS.1534-1 (MUSHRA, MUltiple Stimuli with Hidden Reference and Anchor). This is a very welcome trend because it enables researchers to assess the implications of their developments. The evaluation of listening tests, however, sometimes suffers from an incomplete understanding of the underlying statistics. The present paper aims at identifying the causes of the pitfalls and misconceptions in MUSHRA evaluations. It exemplifies the impact of incorrectly used or even misused statistics. Subsequently, schemes for evaluating the listeners’ judgments are proposed that are well grounded in statistical considerations, including an understanding of the concepts of statistical power and effect size.
Convention Paper 8146
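As a concrete instance of the effect-size concept mentioned in the P25-1 abstract, the sketch below computes a standardized paired effect size (often written d_z) for two conditions rated by the same listeners. It is one common measure for within-subject designs and is offered only as an illustration; it is not claimed to be the evaluation scheme the authors propose, and the function name and sample scores are made up.

```python
import statistics


def paired_effect_size(scores_a, scores_b):
    """Mean of the per-listener differences divided by their sample
    standard deviation (a standardized effect size for paired data)."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if spread == 0:
        raise ValueError("all differences are identical; effect size undefined")
    return statistics.mean(diffs) / spread


# Hypothetical MUSHRA-style scores (0-100) from four listeners.
codec_a = [80, 70, 90, 60]
codec_b = [70, 65, 80, 50]
d_z = paired_effect_size(codec_a, codec_b)
```

Reporting such a value alongside a significance test indicates how large a difference is, not just whether it is detectable with the given number of listeners.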
P25-2 Audibility of Headphone Positioning Variability—Mathieu Paquier, Vincent Koehl, Université de Brest - Plouzané, France
This paper aims at evaluating the audibility of spectral modifications induced by slight but realistic changes in the headphone position over a listener’s ears. Recordings have been performed on a dummy head on which four different headphone models were placed eight times each. Musical excerpts and pink noise were played over the headphones and recorded with microphones located at the entrance of the blocked ear canal. These recordings were then presented to listeners over a single test headphone. The subjects had to assess the recordings in a 3I3AFC task to discriminate between the different headphone positions. The results indicate that, whatever the headphone model or the excerpt, the modifications caused by different positions were always perceived.
Convention Paper 8147
P25-3 Objectivization of Audio-Video Correlation Assessment Experiments—Bartosz Kunka, Bozena Kostek, Gdansk University of Technology - Gdansk, Poland
The purpose of this paper is to present a new method of conducting audio-visual correlation analysis employing a head-motion-free gaze-tracking system. First, a review of related works in the domain of sound and vision correlation is presented. Then assumptions concerning audio-visual scene creation are briefly described. The objectivization process of carrying out correlation tests employing the gaze-tracking system is outlined. The gaze-tracking system developed at the Multimedia Systems Department is described, and its use for carrying out subjective tests is explained. The results of subjective tests examining the relationship between video and audio associated with the video material are presented. Conclusions concerning the new methodology, as well as directions for future work, are provided.
Convention Paper 8148
P25-4 A New Time and Intensity Trade-Off Function for Localization of Natural Sound Sources—Hyunkook Lee, LG Electronics - Seoul, Korea
This paper introduces a new set of psychoacoustic values of interchannel time difference (ICTD) and interchannel intensity difference (ICID) required for 10°, 20°, and 30° localization in conventional stereophonic reproduction, which were obtained using natural sound sources (musical instruments and wideband speech) representing different characteristics. It then discusses the new concept of an ICID and ICTD trade-off function developed from the relationship between the psychoacoustic values. The result of a listening test conducted to verify the performance of the proposed method is also presented.
Convention Paper 8149
P25-5 Effect of Signal-to-Noise Ratio and Visual Context on Environmental Sound Identification—Tifanie Bouchara, LIMSI-CNRS - Orsay, France; Bruno L. Giordano, Ilja Frissen, McGill University - Montreal, Quebec, Canada; Brian F. G. Katz, LIMSI-CNRS - Orsay, France; Catherine Guastavino, McGill University - Montreal, Quebec, Canada
The recognition of environmental sounds is of major importance for the perception of our environment. This paper investigates whether visual context can counterbalance the impairing effect of signal degradation (signal-to-noise ratio, SNR) on the identification of environmental sounds. SNRs and semantic congruency between sensory modalities, i.e., auditory and visual information, were manipulated. Two categories of sound sources, living and nonliving, were used. The participants’ task was to indicate the category of the sound as fast as possible. Increasing SNRs and congruent audiovisual contexts enhanced identification accuracy and shortened reaction times. The results further indicated that living sound sources were recognized more accurately and faster than nonliving sound sources. A preliminary analysis of the acoustical factors mediating participants’ responses revealed that the harmonic-to-noise ratio (HNR) of the sound signals was significantly associated with the probability of identifying a sound as living. Further, the extent to which participants’ identifications were sensitive to the HNR appeared to be modulated by both SNR and audiovisual congruence.
Convention Paper 8150
P25-6 The Influence of Individual Audio Impairments on Perceived Video Quality—Leslie Gaston, University of Colorado, Denver - Denver, CO, USA; Jon Boley, LSB Audio LLC - Lafayette, IN, USA; Scott Selter, Jeffrey Ratterman, University of Colorado, Denver - Denver, CO, USA
As the audio, video, and related industries work toward establishing standards for subjective measures of audio/video quality, more information is needed to understand subjective audio/video interactions. This paper reports a contribution to this effort that aims to extend previous studies, which show that audio and video quality influence each other and that some audio artifacts affect overall quality more than others. In the current study, these findings are combined in a new experiment designed to reveal how individual impairments of audio affect perceived video quality. Our results show that some audio artifacts enhance the ability to identify video artifacts, while others make discrimination more difficult.
Convention Paper 8151
P25-7 Vertical Localization of Sounds with Frequencies Changing over Three Octaves—Eiichi Miyasaka, Tokyo City University - Yokohama, Kanagawa, Japan
Vertical localization was investigated for sounds consisting of 22 tone-bursts ascending/descending along the whole-tone scale between C4 (262 Hz) and C7 (2093 Hz). The sounds used were pure tones, one-third octave band noises, and piano tones. The sounds were presented through a fixed loudspeaker (SP-A) set up just in front of listeners with numbered cards set perpendicularly (case-1) or with seven dummy loudspeakers attached to the numbered cards (case-2). The results show that, in both cases, most observers perceived the sound images as moving upward from a location around SP-A for the ascending sounds, or downward for the descending sounds, although the sounds were radiated through the fixed loudspeaker (SP-A).
Convention Paper 8152
P25-8 Variability in Perceptual Evaluation of HRTFs—David Schönstein, Arkamys - Paris, France; Brian F. G. Katz, Université Paris XI - Orsay Cedex, France
The implementation of the head-related transfer function (HRTF) is key to binaural rendering applications. An HRTF evaluation and selection is often required when individual HRTFs are not available. This paper examines the variability in perceptual evaluations of HRTFs using a listening test. A set of six different HRTFs was selected and was then used in a listening test based on the standardized MUSHRA method for evaluating audio quality. A total of six subjects participated, each having their own recorded HRTFs available. Subjects performed five repetitions of the listening test. While conclusive HRTF judgments were evident, a significantly large degree of variance was found. The effect of listener expertise on variability in perceptual judgments was also analyzed.
Convention Paper 8153
P26 - Recording, Production, and Reproduction—Multichannel and Spatial Audio
Tuesday, May 25, 14:00 — 15:30 (Room C4-Foyer)
P26-1 Evaluation of Virtual Source Localization Using 3-D Loudspeaker Setups—Florian Keiler, Johann-Markus Batke, Technicolor Research and Innovation - Hannover, Germany
This paper evaluates the localization accuracy of different playback methods for 3-D spatial sound using a listening test. The playback methods are characterized by their panning functions, which define the gain for each loudspeaker to play back a sound source positioned at a distinct pair of azimuth and elevation angles. The tested methods are Ambisonics decoding using the mode matching approach, vector base amplitude panning (VBAP), and a newly proposed 3-D robust panning approach. For irregular 3-D loudspeaker setups, as found in home environments, mode matching shows poor localization. The new 3-D robust panning leads to better localization and can also outperform VBAP, depending on the source position and the loudspeaker setup used.
Convention Paper 8060
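To make the notion of a panning function in the P26-1 abstract concrete, here is a minimal sketch of textbook 3-D VBAP for a single loudspeaker triplet: the source direction is written as a linear combination of the three loudspeaker direction vectors, and the resulting gains are power-normalized. The angle convention (azimuth counter-clockwise from front, elevation upward), the function names, and the example layout are assumptions; triplet selection and the paper's robust panning method are not covered.

```python
import math


def unit_vector(az_deg, el_deg):
    """Cartesian unit vector for azimuth/elevation in degrees (assumed convention)."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def det3(a, b, c):
    """Determinant of the 3x3 matrix whose rows are a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def vbap_gains(source, speakers):
    """Power-normalized gains for one source and three loudspeakers.

    source   -- (azimuth, elevation) in degrees
    speakers -- three (azimuth, elevation) tuples forming the active triplet
    """
    p = unit_vector(*source)
    l1, l2, l3 = (unit_vector(*s) for s in speakers)
    d = det3(l1, l2, l3)
    if abs(d) < 1e-12:
        raise ValueError("loudspeaker directions are coplanar with the origin")
    # Solve g1*l1 + g2*l2 + g3*l3 = p with Cramer's rule.
    g = [det3(p, l2, l3) / d, det3(l1, p, l3) / d, det3(l1, l2, p) / d]
    if min(g) < -1e-9:
        raise ValueError("source lies outside this loudspeaker triplet")
    g = [max(x, 0.0) for x in g]  # clear tiny negative rounding errors
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


# Source straight ahead, between two front loudspeakers and one elevated one.
gains = vbap_gains((0, 0), [(30, 0), (-30, 0), (0, 45)])
```

For this symmetric example the two front loudspeakers share the signal equally and the elevated one stays silent, since the source lies on the edge of the triplet.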
P26-2 Optimization of the Localization Performance of Irregular Ambisonic Decoders for Multiple Off-Center Listeners—David Moore, Glasgow Caledonian University - Glasgow, Lanarkshire, UK; Jonathan Wakefield, University of Huddersfield - Huddersfield, West Yorkshire, UK
This paper presents a method for optimizing the performance of irregular Ambisonic decoders for multiple off-center listeners. New off-center evaluation criteria are added to a multi-objective fitness function, based on auditory localization theory, which guides a heuristic search algorithm to derive decoder parameter sets for the ITU 5-speaker layout. The new evaluation criteria are based upon Gerzon’s Metatheory of Auditory Localization and have been modified to take into account off-center listening positions. The derived decoders exhibit improved theoretical localization performance for off-center listeners. The theoretical results are supported by initial listening test results.
Convention Paper 8061
P26-3 Vibrational Behavior of High Aspect Ratio Multiactuator Panels—Basilio Pueo, Jorge A. López, Javier Moralejo, University of Alicante - Alicante, Spain; José Javier López, Technical University of Valencia - Valencia, Spain
Multiactuator Panels (MAPs) consist of a flat panel of a light and stiff material to which a number of mechanical exciters are attached, creating bending waves that are then radiated as sound fields. MAPs can substitute the traditional dynamic loudspeaker arrays for Wave Field Synthesis (WFS) with added benefits, such as the low visual profile or omnidirectional radiation. However, the exciter interaction with the panel, the panel material, and the panel edge boundary conditions are some of the critical points that need to be evaluated and improved. In this paper the structural acoustic behavior of a high aspect ratio MAP is analyzed for two classical edge boundary conditions: free and clamped. For that purpose, the surface velocity over the whole area of the MAP prototype has been measured with a Laser Doppler Vibrometer (LDV), which helped in understanding the sound-generating behavior of the panel.
Convention Paper 8062
P26-4 Design of a Circular Microphone Array for Panoramic Audio Recording and Reproduction: Microphone Directivity—Hüseyin Hacihabiboglu, Enzo De Sena, Zoran Cvetkovic, King's College London - London, UK
Design of a circularly symmetric multichannel recording and reproduction system is discussed in this paper. The system consists of an array of directional microphones evenly distributed on a circle and a matching array of loudspeakers. The relation between the microphone directivity and the radius of the circular array is established within the context of time-intensity stereophony. The microphone directivity design is identified as a constrained linear least squares optimization problem. Results of a subjective evaluation are presented that indicate the usefulness of the proposed microphone array design technique.
Convention Paper 8063
P26-5 Design of a Circular Microphone Array for Panoramic Audio Recording and Reproduction: Array Radius—Enzo De Sena, Hüseyin Hacihabiboglu, Zoran Cvetkovic, King's College London - London, UK
A multichannel audio system proposed by Johnston and Lam aims at perceptual reconstruction of the sound field of an acoustic performance in its original venue. The system employs a circular microphone array, 31 cm in diameter, to capture relevant spatial cues. This design proved to be effective in the rendition of the auditory perspective; however, other studies showed that there is still substantial room for improvement. This paper investigates the impact of the array diameter on the width and naturalness of the auditory images. To this end we propose a method for quantification and prediction of the perceived naturalness. Simulation results support array diameters close to that proposed by Johnston and Lam in the sense that they achieve optimal naturalness in the center of the listening area, but also suggest that larger arrays might provide a more graceful degradation of the naturalness for listening positions away from the center.
Convention Paper 8064
P26-6 MIAUDIO—Audio Mixture Digital Matrix—David Pedrosa Branco, José Neto Vieira, Iouliia Skliarova, Universidade de Aveiro - Aveiro, Portugal
Electroacoustic music is turning more and more to sound diffusion techniques. Multichannel sound systems like BEAST and SARC are built so that the musician can independently control the intensity of several audio channels. This feature makes it possible to create several sound diffusion scenarios, e.g., immersion and movement around the audience. The developed system (MIAUDIO) is a real-time sound diffusion system currently able to mix up to 8 audio input channels to 32 output channels. A hardware solution was adopted using a Field Programmable Gate Array (FPGA) to perform the mixing. The analog audio signals are conditioned, converted to digital format by several analog-to-digital converters, and then sent to the FPGA, which is responsible for performing the mixing algorithm. The host computer connects to the FPGA via USB and is responsible for supplying the parameters that define the audio mixture. The user thus has independent control over the level of each input in each output channel. MIAUDIO was successfully implemented as a low-cost solution compared with similar systems. All the channels were tested using a Precision One system with very good results.
Convention Paper 8065
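The mixing operation described in the P26-6 abstract amounts to a matrix multiplication per sample frame: each output is a weighted sum of all inputs, with one gain per input/output pair. The sketch below shows that arithmetic in software, assuming floating-point samples and a gain matrix stored as one row per output; the real system performs this on an FPGA, and the function name `mix_frame` is hypothetical.

```python
def mix_frame(inputs, gains):
    """Mix one frame of input samples to the outputs.

    inputs -- list of N input samples (e.g. N = 8)
    gains  -- list of M rows, each with N gains (e.g. M = 32);
              gains[j][i] is the level of input i in output j
    Returns a list of M output samples.
    """
    for row in gains:
        if len(row) != len(inputs):
            raise ValueError("each gain row needs one gain per input")
    return [sum(g * x for g, x in zip(row, inputs)) for row in gains]


# Two inputs routed to three outputs: direct, direct, and an equal blend.
frame = mix_frame([1.0, 0.5], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

Moving a sound around the audience then corresponds to updating the gain matrix over time while the per-frame mixing stays unchanged.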
P26-7 The Perception of Focused Sources in Wave Field Synthesis as a Function of Listener Angle—Robert Oldfield, Ian Drumm, Jos Hirst, University of Salford - Salford, UK
Wave field synthesis (WFS) is a volumetric sound reproduction technique that allows virtual sources to be positioned anywhere in space. The reproduction of the wave field of these sources means they can be accurately localized even when placed in front of the secondary sources/loudspeakers. Such “focused sources” are very important in WFS as they greatly add to the realism of an auditory scene; however, their perception and localizability change with listener and virtual source position. In this paper we present subjective tests to determine the localization accuracy as a function of angle, defining the subjective “view angle.” We also show how improvements can be made through the addition of the first-order image sources.
Convention Paper 8066
P26-8 Gram-Schmidt-Based Downmixer and Decorrelator in the MPEG Surround Coding—Der-Pei Chen, Hsu-Feng Hsiao, Han-Wen Hsu, Chi-Min Liu, National Chiao Tung University - Hsinchu, Taiwan
MPEG Surround (MPS) coding is an efficient method for multichannel audio coding. In MPS coding, downmixing multichannel signals into fewer channels is an efficient way to achieve a high compression rate in the encoder. In the decoder, an upmixing module combined with the decorrelator is the key module for reconstructing the multichannel signals. This paper considers the design of the downmixer and the decorrelator through the Gram-Schmidt orthogonalization process. The individual and joint effects of the downmixer and decorrelator are verified through intensive subjective and objective quality measures.
Convention Paper 8067
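The Gram-Schmidt process referred to in the P26-8 abstract turns a set of vectors into mutually orthogonal, unit-length ones by repeatedly subtracting projections. The sketch below is the generic (modified) Gram-Schmidt procedure applied to short signal blocks treated as vectors. How the authors apply it inside the MPS downmixer and decorrelator is not described in the abstract, so this is background only, and the names are illustrative.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, eps=1e-12):
    """Return an orthonormal basis spanning the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            # Remove the component of w that lies along q.
            proj = dot(w, q)
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:  # skip vectors that add no new direction
            basis.append([wi / norm for wi in w])
    return basis


# Two correlated "channels" become two uncorrelated, unit-energy signals.
q1, q2 = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
```

After the process, the second output contains only the part of the second channel that cannot be predicted from the first, which is the property that makes orthogonalization attractive for separating a downmix from a residual or decorrelated component.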