AES London 2010
Game Audio Event Details
Saturday, May 22, 09:30 — 12:30
(Room C3)
P1 - High Performance Audio Processing
Chair: Neil Harris, New Transducers Ltd. (NXT) - Cambridge, UK
P1-1 Model-Driven Development of Audio Processing Applications for Multi-Core Processors—Tiziano Leidi, ICIMSI-SUPSI - Manno, Switzerland; Thierry Heeb, Digimath - Sainte-Croix, Switzerland; Marco Colla, ICIMSI-SUPSI - Manno, Switzerland; Jean-Philippe Thiran, EPFL - Lausanne, Switzerland
Chip-level multiprocessors are still very young and available forecasts anticipate a strong evolution for the forthcoming decade. To exploit them, efficient and robust applications have to be built with the appropriate algorithms and software architectures. Model-driven development is able to lower some barriers toward applications that process audio in parallel on multi-cores. It allows using abstractions to simplify and mask complex aspects of the development process and helps avoid inefficiencies and subtle bugs. This paper presents some evolutions of Audio n-Genie, an open-source environment for model-driven development of audio processing applications, which has been recently enhanced with support for parallel processing on multi-cores.
Convention Paper 7961
P1-2 Real-Time Additive Synthesis with One Million Sinusoids Using a GPU—Lauri Savioja, NVIDIA Research - Helsinki, Finland, Aalto University School of Science and Technology, Espoo, Finland; Vesa Välimäki, Aalto University School of Science and Technology - Espoo, Finland; Julius O. Smith III, Stanford University - Palo Alto, CA, USA
Additive synthesis is one of the fundamental sound synthesis techniques. It is based on the principle that each sound can be represented as a superposition of sine waves of different frequencies. That task can be done fully in parallel and is thus suitable for GPU (graphics processing unit) implementation. In this paper we show that it is possible to compute over one million unique sine waves in real time using a current GPU. The performance depends on the applied buffer sizes, but close to the maximum result is already reachable with a buffer of 500 samples.
Convention Paper 7962
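As an illustration of the principle in the P1-2 abstract, the following is a minimal CPU sketch of additive synthesis in plain Python. It is not the authors' GPU implementation; the function name `additive_synth` and the partial format (frequency, amplitude, phase) are assumptions made for this example. Each partial is independent of the others, which is what makes the task easy to parallelize.

```python
import math

def additive_synth(partials, sample_rate, num_samples):
    """Sum sinusoidal partials given as (frequency_hz, amplitude, phase_rad) tuples."""
    out = [0.0] * num_samples
    for freq, amp, phase in partials:
        # Phase increment per sample for this partial.
        step = 2.0 * math.pi * freq / sample_rate
        for n in range(num_samples):
            out[n] += amp * math.sin(step * n + phase)
    return out

# Hypothetical usage: two partials rendered into one 500-sample buffer.
buffer = additive_synth([(440.0, 0.5, 0.0), (880.0, 0.25, 0.0)], 48000, 500)
```

On a GPU the two loops would typically be distributed across threads; how the paper does this is not stated in the abstract.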
P1-3 A GPGPU Approach to Improved Acoustic Finite Difference Time Domain Calculations—Jamie A. S. Angus, Andrew Caunce, University of Salford - Salford, Greater Manchester, UK
This paper shows how to improve the efficiency and accuracy of Finite Difference Time Domain acoustic simulation by both calculating the differences using spectral methods and performing these calculations on a Graphics Processing Unit (GPU) rather than a CPU. These changes to the calculation method result in an increase in accuracy as well as a reduction in computational expense. The recent advances in the way that GPUs are programmed (for example using CUDA on Nvidia's GPUs) now make them an ideal platform on which to perform scientific computations at very high speeds and with very low power consumption.
Convention Paper 7963
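For readers unfamiliar with the baseline that P1-3 improves on, here is a minimal sketch of a conventional one-dimensional FDTD update for the wave equation. It uses ordinary second-order finite differences on the CPU, not the spectral differences or GPU execution described in the abstract; the function name `fdtd_1d` and its parameters are assumptions for this example.

```python
def fdtd_1d(num_points, num_steps, courant=1.0):
    """Leapfrog update of the 1-D wave equation with fixed (zero) boundaries."""
    p_prev = [0.0] * num_points          # pressure at time step n-1
    p = [0.0] * num_points               # pressure at time step n
    p[num_points // 2] = 1.0             # initial impulse at the grid center
    c2 = courant * courant
    for _ in range(num_steps):
        p_next = [0.0] * num_points      # boundary points stay at zero
        for i in range(1, num_points - 1):
            # Second-order central difference in space.
            laplacian = p[i + 1] - 2.0 * p[i] + p[i - 1]
            p_next[i] = 2.0 * p[i] - p_prev[i] + c2 * laplacian
        p_prev, p = p, p_next
    return p

# Hypothetical usage: after 3 steps the wavefront has moved 3 grid points.
field = fdtd_1d(21, 3)
```

Each grid point depends only on its neighbors from earlier time steps, so all points of one time step can be updated in parallel.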
P1-4 Digital Equalization Filter: New Solution to the Frequency Response Near Nyquist and Evaluation by Listening Tests—Thorsten Schmidt, Cube-Tec International - Bremen, Germany; Joerg Bitzer, Jade-University of Applied Sciences - Oldenburg, Germany
Current design methods for digital equalization filters face the problem of a frequency response that increasingly deviates from that of the analog equivalent close to the Nyquist frequency. This paper deals with a new way to design equalization filters, which improves this behavior over the entire frequency range between 0 Hz (DC) and Nyquist. The theoretical approach is shown and examples of low-pass, peak, and shelving filters are compared to state-of-the-art techniques. Listening tests were made to verify the audible differences and rate the quality of the different design methods.
Convention Paper 7964
P1-5 Audio Equalization with Fixed-Pole Parallel Filters: An Efficient Alternative to Complex Smoothing—Balázs Bank, Budapest University of Technology and Economics - Budapest, Hungary
Recently, the fixed-pole design of parallel second-order filters has been proposed to accomplish arbitrary frequency resolution similarly to Kautz filters, at 2/3 of their computational cost. This paper relates the parallel filter to the complex smoothing of transfer functions. Complex smoothing is a well-established method for limiting the frequency resolution of audio transfer functions for analysis, modeling, and equalization purposes. It is shown that the parallel filter response is similar to the one obtained by complex smoothing the target response using a Hann window: a 1/β octave resolution is achieved by using β/2 pole pairs per octave in the parallel filter. Accordingly, the parallel filter can either be used as an efficient implementation of smoothed responses, or it can be designed from the unsmoothed responses directly, eliminating the need for frequency-domain processing. In addition, the theoretical equivalence of parallel filters and Kautz filters is developed, and the formulas for converting between the parameters of the two structures are given. Examples of loudspeaker-room equalization are provided.
Convention Paper 7965
P1-6 Rapid and Automated Development of Audio Digital Signal Processing Algorithms for Mobile Devices—David Trainor, APTX - Belfast, N. Ireland, UK
Software applications and programming languages are available to assist audio DSP algorithm developers and mobile device designers, including Matlab/Simulink, C/C++, and assembly languages. These tools provide some assistance for algorithmic experimentation and subsequent refinement to highly efficient embedded software. However, a typical design flow is still highly iterative, with manual software recoding, translation, and optimization. This paper introduces software libraries and design techniques that integrate existing commercial audio algorithm design tools and permit intuitive algorithmic experimentation and automated translation of audio algorithms to efficient embedded software. These techniques have been incorporated into a new software framework, and the operation of this framework is described using the example of a custom audio coding algorithm targeted to a mobile audio device.
Convention Paper 7966
Saturday, May 22, 14:00 — 18:00 (Room C3)
P4 - Spatial Signal Processing
Chair: Francis Rumsey
P4-1 Classification of Time-Frequency Regions in Stereo Audio—Aki Härmä, Philips Research Europe - Eindhoven, The Netherlands
The paper is about classification of time-frequency (TF) regions in stereo audio data by the type of mixture the region represents. The detection of the type of mixing is necessary, for example, in source separation, upmixing, and audio manipulation applications. We propose a generic signal model and a method to classify the TF regions into six classes that are different combinations of central, panned, and uncorrelated sources. We give an overview of traditional techniques for comparing frequency-domain data and propose a new approach for classification that is based on measures specially trained for the six classes. The performance of the new measures is studied and demonstrated using synthetic and real audio data.
Convention Paper 7980
P4-2 A Comparison of Computational Precedence Models for Source Separation in Reverberant Environments—Christopher Hummersone, Russell Mason, Tim Brookes, University of Surrey - Guildford, UK
Reverberation continues to be problematic in many areas of audio and speech processing, including source separation. The precedence effect is an important psychoacoustic tool utilized by humans to assist in localization by suppressing reflections arising from room boundaries. Numerous computational precedence models have been developed over the years and all suggest quite different strategies for handling reverberation. However, relatively little work has been done on incorporating precedence into source separation. This paper details a study comparing several computational precedence models and their impact on the performance of a baseline separation algorithm. The models are tested in a range of reverberant rooms and with a range of other mixture parameters. Large differences in the performance of the models are observed. The results show that a model based on interaural coherence produces the greatest performance gain over the baseline algorithm.
Convention Paper 7981
P4-3 Converting Stereo Microphone Signals Directly to MPEG-Surround—Christophe Tournery, Christof Faller, Illusonic LLC - Lausanne, Switzerland; Fabian Kuech, Jürgen Herre, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
We have previously proposed a way to use stereo microphones with spatial audio coding to record and code surround sound. In this paper we describe further considerations and improvements needed to convert stereo microphone signals directly to MPEG Surround, i.e., a downmix signal plus a bit stream. We describe in detail how to obtain from the microphone channels the information needed for computing MPEG Surround spatial parameters and how to process the microphone signals to transform them to an MPEG Surround compatible downmix.
Convention Paper 7982
P4-4 Modification of Spatial Information in Coincident-Pair Recordings—Jeremy Wells, University of York - York, UK
A novel method is presented for modifying the spatial information contained in the output from a stereo coincident pair of microphones. The purpose of this method is to provide additional decorrelation of the audio at the left and right replay channels for sound arriving at the sides of a coincident pair but to retain the imaging accuracy for sounds arriving to the front or rear or where the entire sound field is highly correlated. Details of how this is achieved are given and results for different types of sound field are presented.
Convention Paper 7983
P4-5 Unitary Matrix Design for Diffuse Jot Reverberators—Fritz Menzer, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland
This paper presents different methods for designing unitary mixing matrices for Jot reverberators with a particular emphasis on cases where no early reflections are to be modeled. Possible applications include diffuse sound reverberators and decorrelators. The trade-off between effective mixing between channels and the number of multiply operations per channel and output sample is investigated as well as the relationship between the sparseness of powers of the mixing matrix and the sparseness of the impulse response.
Convention Paper 7984
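To make the trade-off mentioned in the P4-5 abstract concrete, here is a small sketch of one commonly used unitary mixing matrix in feedback delay networks, the Householder reflection A = I - (2/N)·1·1ᵀ. It is offered only as an illustration; the abstract does not say which matrices the paper designs, and the function names below are assumptions. The matrix mixes every channel into every other channel, yet it can be applied with a single sum and one scaling instead of N×N multiplies.

```python
def householder_matrix(n):
    """Return the n x n matrix A = I - (2/n) * ones, which is orthogonal."""
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)]
            for i in range(n)]

def mat_vec(a, x):
    """Plain matrix-vector product, n*n multiplies."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def mix_fast(x):
    """Apply the same matrix using one shared term per output sample."""
    shared = 2.0 * sum(x) / len(x)
    return [v - shared for v in x]

# Hypothetical usage with four delay-line outputs.
A = householder_matrix(4)
mixed = mix_fast([1.0, 2.0, 3.0, 4.0])
```

Because the matrix is orthogonal, the energy of the channel vector is preserved by the mixing step.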
P4-6 Sound Field Indicators for Hearing Activity and Reverberation Time Estimation in Hearing Instruments—Andreas P. Streich, ETH Zurich - Zurich, Switzerland; Manuela Feilner, Alfred Stirnemann, Phonak AG - Stäfa, Switzerland; Joachim M. Buhmann, ETH Zurich - Zurich, Switzerland
Sound field indicators (SFI) are proposed as a new feature set to estimate the hearing activity and reverberation time in hearing instruments. SFIs are based on physical measurements of the sound field. A variant thereof, called SFI short-time statistics (SFIst2), is obtained by computing the mean and standard deviation of the SFIs over 10 subframes. To show the utility of these feature sets for the mentioned prediction tasks, experiments are carried out on artificially reverberated recordings of a large variety of sounds encountered in daily life. In a classification scenario where the hearing activity is to be predicted, both SFI and SFIst2 yield clearly superior accuracy even compared to hand-tailored features used in state-of-the-art hearing instruments. For regression on the reverberation time, the SFI-based features yield a lower residual error than standard feature sets and reach the performance of specially designed features. The hearing activity classification is mainly based on the average of the SFIs, while the standard deviation over the subframes is used heavily to predict the reverberation time.
Convention Paper 7985
P4-7 Stereo-to-Binaural Conversion Using Interaural Coherence Matching—Fritz Menzer, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland
In this paper a method of converting stereo recordings to simulated binaural recordings is presented. The stereo signal is separated into coherent and diffuse sound based on the assumption that the signal comes from a coincident symmetric microphone setup. The coherent part is reproduced using HRTFs and the diffuse part is reproduced using filters adapting the interaural coherence to the interaural coherence a binaural recording of diffuse sound would have.
Convention Paper 7986
P4-8 Linear Simulation of Spaced Microphone Arrays Using B-Format Recordings—Andreas Walther, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland
A novel approach for linear post-processing of B-Format recordings is presented. The goal is to simulate spaced microphone arrays by approximating and virtually recording the sound field at the position of each single microphone. The delays occurring in non-coincident recordings are simulated by translating an approximative plane wave representation of the sound field to the positions of the microphones. The directional responses of the spaced microphones are approximated by linear combination of the corresponding translated B-format channels.
Convention Paper 7987
Saturday, May 22, 14:00 — 15:30 (Room C4-Foyer)
P6 - Audio Equipment and Emerging Technologies
P6-1 Study and Evaluation of MOSFET Rds(ON) Impedance Efficiency Losses in High Power Multilevel DCI-NPC Amplifiers—Vicent Sala, G. Ruiz, Luis Romeral, UPC-Universitat Politecnica de Catalunya - Terrassa, Spain
This paper justifies the usefulness of multilevel power amplifiers with DCI-NPC (Diode Clamped Inverter – Neutral Point Converter) topology in applications where size and weight must be optimized. These amplifiers can work at high frequencies, thereby reducing the size and weight of the filter elements. However, it is necessary to study, analyze, and evaluate the efficiency losses because this amplifier has double the number of switching elements. This paper models the behavior of the MOSFET Rds(ON) in a DCI-NPC topology for different conditions.
Convention Paper 7996
P6-2 Modeling Distortion Effects in Class-D Amplifier Filter Inductors—Arnold Knott, Tore Stegenborg-Andersen, Ole C. Thomsen, Technical University of Denmark - Lyngby, Denmark; Dominik Bortis, Johann W. Kolar, Swiss Federal Institute of Technology in Zurich - Zurich, Switzerland; Gerhard Pfaffinger, Harman/Becker Automotive Systems GmbH - Straubing, Germany; Michael A. E. Andersen, Technical University of Denmark - Lyngby, Denmark
Distortion is generally accepted as a quantifier to judge the quality of audio power amplifiers. In switch-mode power amplifiers various mechanisms influence this performance measure. After giving an overview of those, this paper focuses on the particular effect of the nonlinearity of the output filter components on the audio performance. While the physical reasons for both the capacitor-induced and the inductor-induced distortion are given, the practical in-depth demonstration is done for the inductor only. This includes measuring the inductor's performance, modeling it through fitting, and deriving simulation models from the result. The fitted models achieve distortion values between 0.03 % and 0.20 % as a basis to enable the design of a 200 W amplifier.
Convention Paper 7997
P6-3 Multilevel DCI-NPC Power Amplifier High-Frequency Distortion Analysis through Parasitic Inductance Dynamic Model—Vicent Sala, G. Ruiz, E. López, Luis Romeral, UPC-Universitat Politecnica de Catalunya - Terrassa, Spain
The high-frequency distortion sources in the DCI-NPC (Diode Clamped Inverter – Neutral Point Converter) amplifier topology are studied and analyzed. The need is justified for a model that contains the different parasitic inductive circuits that this kind of amplifier presents dynamically, as a function of the combination of its active transistors. By means of a proposed pattern layout we present a dynamic model of the parasitic inductances of the Full-Bridge DCI-NPC amplifier, and this is used to propose some simple rules for the optimal design of layouts for these types of amplifiers. Simulation and experimental results are presented to justify the proposed model and the statements and recommendations given in this paper.
Convention Paper 7998
P6-4 How Much Gain Should a Professional Microphone Preamplifier Have?—Douglas McKinnie, Middle Tennessee State University - Murfreesboro, TN, USA
Many tradeoffs are required in the design of microphone preamplifier circuits. Characteristics such as noise figure, stability, bandwidth, and complexity may be dependent upon the gain of the design. Three factors determine the gain required from a microphone preamp: sound-pressure level of the sound source, distance of the microphone from that sound source (within the critical distance), and sensitivity of the microphone. This paper is an effort to find a probability distribution of the gain settings used with professional microphones. This is done by finding the distribution of max SPL in real use and by finding the sensitivity of the most commonly used current and classic microphones.
Convention Paper 7999
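The three factors named in the P6-4 abstract (source level, distance, and microphone sensitivity) can be combined into a back-of-the-envelope gain estimate. The sketch below is not taken from the paper: the function name, the +4 dBu target level, and the use of the inverse-distance law (valid only in the direct field, within the critical distance) are assumptions for illustration.

```python
import math

P_REF = 20e-6            # reference sound pressure for 0 dB SPL, in pascals
V_0DBU = math.sqrt(0.6)  # 0 dBu expressed in volts RMS (about 0.775 V)

def required_gain_db(spl_at_1m_db, distance_m, sensitivity_mv_per_pa,
                     target_dbu=4.0):
    """Estimate the preamp gain needed to reach an assumed target line level."""
    # Inverse-distance law: level drops about 6 dB per doubling of distance.
    spl_at_mic = spl_at_1m_db - 20.0 * math.log10(distance_m)
    pressure_pa = P_REF * 10.0 ** (spl_at_mic / 20.0)
    mic_volts = sensitivity_mv_per_pa * 1e-3 * pressure_pa
    mic_dbu = 20.0 * math.log10(mic_volts / V_0DBU)
    return target_dbu - mic_dbu

# Hypothetical usage: 94 dB SPL source at 1 m, microphone rated 10 mV/Pa.
gain = required_gain_db(94.0, 1.0, 10.0)
```

With these example numbers the estimate comes out at roughly 42 dB; louder sources, closer placement, or more sensitive microphones reduce it accordingly.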
P6-5 Equalizing Force Contributions in Transducers with Partitioned Electrode—Libor Husník, Czech Technical University in Prague - Prague, Czech Republic
A partitioned electrode in an electrostatic transducer offers, among other things, the possibility of building a transducer with direct D/A conversion. Nevertheless, partitioned electrodes whose sizes are proportional to powers of 2, or to terms of another convenient series, do not exert the corresponding force on the membrane. The reason is that the membrane does not vibrate in a piston-like mode, and electrode parts close to the membrane periphery do not excite membrane vibrations in the same way as the elements near the center. The aim of this paper is to suggest equalization of force contributions from different partitioned electrodes by varying their sizes. Principles presented here can also be used for other membrane-electrode arrangements.
Convention Paper 8000
P6-6 Low-End Device to Convert EEG Waves to MIDI—Adrian Attard Trevisan, St. Martins Institute of Information Technology - Hamrun, Malta; Lewis Jones, London Metropolitan University - London, UK
This research provides a simple and portable system that is able to generate MIDI output based on data collected through an EEG device. The context is beneficial in many ways: the therapeutic effects of listening to music created from brain waves have been documented in many cases of treating health problems. The approach is influenced by the interface described in the article “Brain-Computer Music Interface for Composition and Performance” by Eduardo Reck Miranda, where different frequency bands trigger corresponding piano notes and the complexity of the signal represents the tempo of the sound. The correspondence between the sound and the notes has been established through experimental work, in which data from the participants of a test group were gathered and analyzed to assign intervals of brain frequencies to different notes. The study is an active contribution to the field of neurofeedback, providing criteria and tools for assessment.
Convention Paper 8001
P6-7 Implementation and Development of Interfaces for Music Performance through Analysis of Improvised Dance Movements—Richard Hoadley, Anglia Ruskin University - Cambridge, UK
Electronic music, even when designed to be interactive, can lack performance interest and is frequently musically unsophisticated. This is unfortunate because there are many aspects of electronic music that can be interesting, elegant, demonstrative, and musically informative. The use of dancers to interact with prototypical interfaces comprising clusters of sensors generating music algorithmically provides a method of investigating human actions in this environment. This is achieved through collaborative work involving software and hardware designers, composers, sculptors, and choreographers who examine aesthetically and practically the interstices of these disciplines. This paper investigates these interstices.
Convention Paper 8002
P6-8 Violence Prediction through Emotional Speech—José Higueras-Soler, Roberto Gil-Pita, Enrique Alexandre, Manuel Rosa-Zurera, Universidad de Alcalá - Alcalá de Henares, Madrid, Spain
Preventing violence is an absolute necessity in our society. Whether in homes with a particular risk of domestic violence or in prisons or schools, there is a need for systems capable of detecting risk situations for preventive purposes. One of the most important factors that precede a violent situation is an emotional state of anger. In this paper we discuss the features required by decision makers dedicated to the detection of emotional states of anger from speech signals. For this purpose, we present a set of experiments and results with the aim of studying the combination of features extracted from the literature and their effects on the detection performance (relationship between probability of detection of anger and probability of false alarm) of a neural network and a least-squares linear detector.
Convention Paper 8003
P6-9 FoleySonic: Placing Sounds on a Timeline through Gestures—David Black, Kristian Gohlke, University of Applied Sciences, Bremen - Bremen, Germany; Jörn Loviscach, University of Applied Sciences, Bielefeld - Bielefeld, Germany
The task of sound placement on video timelines is usually a time-consuming process that requires the sound designer or foley artist to carefully calibrate the position and length of each sound sample. For novice and home video producers, friendlier and more entertaining input methods are needed. We demonstrate a novel approach that harnesses the motion-sensing capabilities of readily available input devices, such as the Nintendo Wii Remote or modern smart phones, to provide intuitive and fluid arrangement of samples on a timeline. Users can watch a video while simultaneously adding sound effects, providing a near real-time workflow. The system leverages the user’s motor skills for enhanced expressiveness and provides a satisfying experience while accelerating the process.
Convention Paper 8004
P6-10 A Computer-Aided Audio Effect Setup Procedure for Untrained Users—Sebastian Heise, Michael Hlatky, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany; Jörn Loviscach, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany
The number of parameters of modern audio effects easily runs into the dozens. Expert knowledge is required to understand which parameter change results in a desired effect. Yet, such sound processors are also making their way into consumer products, where they overburden most users. Hence, we propose a procedure to achieve a desired effect without technical expertise based on a black-box genetic optimization strategy: Users are only confronted with a series of comparisons of two processed examples. Learning from the users’ choices, our software optimizes the parameter settings. We conducted a study on hearing-impaired persons without expert knowledge, who used the system to adjust a third-octave equalizer and a multiband compressor to improve the intelligibility of a TV set.
Convention Paper 8005
Sunday, May 23, 09:00 — 10:30 (Room C4-Foyer)
P10 - Audio Processing—Analysis and Synthesis of Sound
P10-1 Cellular Automata Sound Synthesis with an Extended Version of the Multitype Voter Model—Jaime Serquera, Eduardo R. Miranda, University of Plymouth - Plymouth, UK
In this paper we report on the synthesis of sounds with cellular automata (CA), specifically with an extended version of the multitype voter model (MVM). Our mapping process is based on DSP analysis of automata evolutions and consists of mapping histograms onto sound spectrograms. This mapping allows a flexible sound design process, but due to the non-deterministic nature of the MVM such a process reaches its maximum potential only after the CA run is finished. Our extended version of the model presents a high degree of predictability and controllability, making the system suitable for an in-advance sound design process with all the advantages that this entails, such as real-time possibilities and performance applications. This research focuses on the synthesis of damped sounds.
Convention Paper 8029
P10-2 Stereophonic Rendering of Source Distance Using DWM-FDN Artificial Reverberators—Saul Maté-Cid, Hüseyin Hacihabiboglu, Zoran Cvetkovic, King's College London - London, UK
Artificial reverberators are used in audio recording and production to enhance the perception of spaciousness. It is well known that reverberation is a key factor in the perception of the distance of a sound source. The ratio of direct and reverberant energies is one of the most important distance cues. A stereophonic artificial reverberator is proposed that allows panning the perceived distance of a sound source. The proposed reverberator is based on feedback delay network (FDN) reverberators and uses a perceptual model of direct-to-reverberant (D/R) energy ratio to pan the source distance. The equivalence of FDNs and digital waveguide mesh (DWM) scattering matrices is exploited in order to devise a reverberator relevant in the room acoustics context.
Convention Paper 8030
P10-3 Separation of Music+Effects Sound Track from Several International Versions of the Same Movie—Antoine Liutkus, Télécom ParisTech - Paris, France; Pierre Leveau, Audionamix - Paris, France
This paper concerns the separation of the music+effects (ME) track from a movie soundtrack, given the observation of several international versions of the same movie. The approach chosen is strongly inspired by existing stereo audio source separation and especially by spatial filtering algorithms such as DUET that can extract a constant panned source from a mixture very efficiently. The problem is indeed similar, for we aim here at separating the ME track, which is the common background of all international versions of the movie soundtrack. The algorithm has been adapted to a number of channels greater than 2. Preprocessing techniques have also been proposed to adapt the algorithm to realistic cases. The performance of the algorithm has been evaluated on realistic and synthetic cases.
Convention Paper 8031
P10-4 A Differential Approach for the Implementation of Superdirective Loudspeaker Array—Jung-Woo Choi, Youngtae Kim, Sangchul Ko, Jungho Kim, SAIT, Samsung Electronics Co. Ltd. - Gyeonggi-do, Korea
A loudspeaker arrangement and corresponding analysis method to obtain a robust superdirective beam are proposed. The superdirectivity technique requires precise matching of the sound sources modeled to calculate excitation patterns and those used for the loudspeaker array. To resolve the robustness issue arising from the modeling mismatch error, we show that the overall sensitivity to the model-mismatch error can be reduced by rearranging loudspeaker positions. Specifically, a beam pattern obtained by a conventional optimization technique is represented as a product of robust delay-and-sum patterns and error-sensitive differential patterns. The excitation pattern driving the loudspeaker array is then reformulated such that the error-sensitive pattern is only applied to the outermost loudspeaker elements, and the array design that fits to the new excitation pattern is discussed.
Convention Paper 8032
P10-5 Improving the Performance of Pitch Estimators—Stephen J. Welburn, Mark D. Plumbley, Queen Mary University of London - London, UK
We are looking to use pitch estimators to provide an accurate high-resolution pitch track for resynthesis of musical audio. We found that current evaluation measures such as gross error rate (GER) are not suitable for algorithm selection. In this paper we examine the issues relating to evaluating pitch estimators and use these insights to improve performance of existing algorithms such as the well-known YIN pitch estimation algorithm.
Convention Paper 8033
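The YIN algorithm mentioned in the P10-5 abstract can be sketched in a few lines. The version below is a simplified illustration, not the authors' improved estimator: it covers the difference function, the cumulative mean normalized difference, and the absolute threshold, and it omits YIN's parabolic interpolation and best-local-estimate steps. The function name and parameter names are assumptions, and the input must hold at least `window + max_lag` samples.

```python
import math

def yin_pitch(x, sample_rate, window, max_lag, threshold=0.1):
    """Return an f0 estimate in Hz, or None if no lag passes the threshold."""
    # Step 1: squared difference between the signal and its lagged copy.
    d = [0.0] * max_lag
    for tau in range(1, max_lag):
        d[tau] = sum((x[j] - x[j + tau]) ** 2 for j in range(window))
    # Step 2: cumulative mean normalized difference (equals 1 at lag 0).
    cmnd = [1.0] * max_lag
    running = 0.0
    for tau in range(1, max_lag):
        running += d[tau]
        cmnd[tau] = d[tau] * tau / running if running > 0.0 else 1.0
    # Step 3: first lag under the threshold, then descend to its local minimum.
    for tau in range(2, max_lag):
        if cmnd[tau] < threshold:
            while tau + 1 < max_lag and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
    return None

# Hypothetical usage: a 100 Hz sine sampled at 8 kHz (period of 80 samples).
tone = [math.sin(2.0 * math.pi * 100.0 * n / 8000.0) for n in range(600)]
f0 = yin_pitch(tone, 8000, window=400, max_lag=200)
```

Because this sketch only returns integer lags, its resolution is limited to whole-sample periods, which is one reason a high-resolution pitch track needs the refinements the paper discusses.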
P10-6 Reverberation Analysis via Response and Signal Statistics—Eleftheria Georganti, Thomas Zarouchas, John Mourjopoulos, University of Patras - Patras, Greece
This paper examines statistical quantities (i.e., kurtosis, skewness) of room transfer functions and audio signals (anechoic, reverberant, speech, music). Measurements are taken under various reverberation conditions in different real enclosures ranging from a small office to a large auditorium and for varying source–receiver positions. Here, the statistical properties of the room responses and signals are examined in the frequency domain. From these properties, the relationship between the spectral statistics of the room transfer function and those of the corresponding reverberant signal is derived.
Convention Paper 8034
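For reference, the two statistics named in the P10-6 abstract can be computed from central moments as sketched below. The abstract does not state which estimator the authors use, so the population (biased) forms and the function names here are assumptions; kurtosis is given in its non-excess form, for which a Gaussian distribution yields 3.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness(values):
    """Third standardized moment: zero for symmetric data."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Fourth standardized moment (non-excess form)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

# Hypothetical usage on a short list of spectral magnitudes.
magnitudes = [0.0, 0.0, 0.0, 4.0]
skew_value = skewness(magnitudes)
kurt_value = kurtosis(magnitudes)
```

In the setting of the abstract, `values` would be frequency-domain samples of a room transfer function or of a signal spectrum.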
P10-7 An Investigation of Low-Level Signal Descriptors Characterizing the Noise-Like Nature of an Audio Signal—Christian Uhle, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
This paper presents an overview and an evaluation of low-level features characterizing the noise-like or tone-like nature of an audio signal. Such features are widely used for content classification, segmentation, identification, coding of audio signals, blind source separation, speech enhancement, and voice activity detection. Besides the very prominent Spectral Flatness Measure, various alternative descriptors exist. These features are reviewed and the requirements for them are discussed. The features in scope are evaluated using synthetic signals and, by way of example, real-world applications related to audio content classification, namely voiced-unvoiced discrimination for speech signals and speech detection.
Convention Paper 8035
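The Spectral Flatness Measure named in the P10-7 abstract is the ratio of the geometric mean to the arithmetic mean of the power spectrum. The sketch below shows that definition in plain Python; the function name and the small `floor` that guards against taking the logarithm of zero are assumptions for this example, and the alternative descriptors the paper reviews are not shown.

```python
import math

def spectral_flatness(power_spectrum, floor=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum."""
    values = [max(p, floor) for p in power_spectrum]
    # Geometric mean computed in the log domain to avoid overflow.
    log_mean = sum(math.log(v) for v in values) / len(values)
    arithmetic_mean = sum(values) / len(values)
    return math.exp(log_mean) / arithmetic_mean

# Hypothetical usage: a flat (noise-like) and a peaky (tone-like) spectrum.
noise_like = spectral_flatness([2.0] * 8)
tone_like = spectral_flatness([1e-6] * 7 + [1.0])
```

Values near 1 indicate a noise-like spectrum, and values near 0 indicate energy concentrated in a few bins.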
P10-8 Algorithms for Digital Subharmonic Distortion—Zlatko Baracskai, Ryan Stables, Birmingham City University - Birmingham, UK
This paper presents a comparison between existing digital subharmonic generators and a new algorithm developed with the intention of having a more pronounced subharmonic frequency and reduced harmonic, intermodulation and aliasing distortions. The paper demonstrates that by introducing inversions of a waveform at the minima and maxima instead of the zero crossings, the discontinuities are mitigated and various types of distortion are significantly attenuated.
Convention Paper 8036
Sunday, May 23, 11:30 — 13:00 (Room C1)
T4 - CANCELLED
Sunday, May 23, 16:00 — 18:00 (Room C2)
T5 - Spatial Audio Reproduction: From Theory to Production
Presenters:
Frank Melchior, IOSONO GmbH - Erfurt, Germany
Sascha Spors, Deutsche Telekom AG Laboratories - Berlin, Germany
Abstract:
Advanced high-resolution spatial sound reproduction systems like Wave Field Synthesis (WFS) and Higher-Order Ambisonics (HOA) are being used increasingly. Consequently more and more material is being produced for such systems. Established channel-based production processes from stereophony can only be applied to a certain extent. In the future, a paradigm shift toward object-based audio production will have to take place in order to cope with the needs of systems like WFS. This tutorial builds a bridge from the physical foundations of such systems, through their practical implementation, to efficient production processes. The focus lies on WFS; however, the findings will also be applicable to other systems. The tutorial is accompanied by practical examples of object-based productions for WFS.
Monday, May 24, 10:30 — 12:00 (Room C4-Foyer)
P16 - Audio Processing—Audio Coding and Machine Interface
P16-1 Trajectory Sampling for Computationally Efficient Reproduction of Moving Sound Sources—Nara Hahn, Keunwoo Choi, Hyunjoo Chung, Koeng-Mo Sung, Seoul National University - Seoul, Korea
Reproducing moving virtual sound sources has been addressed in a number of spatial audio studies. The trajectories of moving sources are relatively oversampled if they are sampled in the temporal domain, which leads to inefficiency in computing the time delay and amplitude attenuation due to sound propagation. In this paper, methods for trajectory sampling are proposed that use the spatial property of the movement and the spectral property of the received signal. These methods reduce the number of sampled positions and the computational complexity. Listening tests were performed to determine the appropriate downsampling rate, which depends not only on the trajectory but also on the frequency content of the source signal.
Convention Paper 8080 (Purchase now)
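The per-position cost mentioned in the abstract above (a time delay and an amplitude attenuation for each sampled source position) can be sketched as follows. The free-field 1/r gain law, the speed-of-sound constant, and the uniform decimation are assumptions chosen for illustration; the paper's own sampling criteria, based on spatial and spectral properties, are not reproduced here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def propagation(source, listener):
    """Return (delay in seconds, gain) for a point source in free field."""
    r = max(math.dist(source, listener), 1e-3)  # clamp to avoid division by zero
    return r / SPEED_OF_SOUND, 1.0 / r


def decimate_trajectory(positions, factor):
    """Keep every `factor`-th position, always retaining the final one."""
    kept = list(positions[::factor])
    if positions and (len(positions) - 1) % factor != 0:
        kept.append(positions[-1])
    return kept


def interpolate(p0, p1, frac):
    """Linear interpolation between two kept positions (0 <= frac <= 1)."""
    return tuple(a + frac * (b - a) for a, b in zip(p0, p1))
```

With fewer kept positions, `propagation` is evaluated less often, and intermediate positions are approximated with `interpolate`.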
P16-2 Audio Latency Measurement for Desktop Operating Systems with Onboard Soundcards—Yonghao Wang, Queen Mary University of London - London, UK, Birmingham City University, Birmingham, UK; Ryan Stables, Birmingham City University - Birmingham, UK; Joshua Reiss, Queen Mary University of London - London, UK
Using commodity computers in conjunction with live music digital audio workstations (DAW) has become increasingly popular in recent years. The latency of these DAW audio processing chains for some applications, such as live audio monitoring, has always been perceived as a problem when DSP audio effects are needed. With “High Definition Audio” being standardized as the onboard soundcard’s hardware architecture for personal computers, and with advances in audio APIs, low-latency and multichannel capability has made its way into home studios. This paper will discuss the results of latency measurements of current popular operating systems and host applications with different audio APIs and audio processing loads.
Convention Paper 8081 (Purchase now)
P16-3 Error Control Techniques within Multidimensional-Adaptive Audio Coding Algorithms—Neil Smyth, APTX - Belfast, N. Ireland, UK
Multidimensional-adaptive audio coding algorithms can adapt multiple performance measures to the demands of different audio applications in real-time. Depending on the transmission or storage environment, audio processing applications require forms of error control to maintain acceptable audio quality. By definition, multidimensional-adaptive audio coding utilizes numerous error detection, correction, and concealment techniques. However, such techniques also have implications for other relevant performance measurements, such as coded bit-rate and computational complexity. This paper discusses the signal-processing tools used by a multidimensional-adaptive audio coding algorithm to achieve varying levels of error control while the fundamental structure of the algorithm is also varying. The effects and trade-offs on other coding performance measures will also be discussed.
Convention Paper 8082 (Purchase now)
P16-4 Object-Based Audio Coding Using Non-Negative Matrix Factorization for the Spectrogram Representation—Joonas Nikunen, Tuomas Virtanen, Tampere University of Technology - Tampere, Finland
This paper proposes a new object-based audio coding algorithm, which uses non-negative matrix factorization (NMF) for the magnitude spectrogram representation, while the phase information is coded separately. The magnitude model is obtained using a perceptually weighted NMF algorithm, which minimizes the noise-to-mask ratio (NMR) of the decomposition, and is able to utilize long term redundancy by an object-based representation. Methods for the quantization and entropy coding of the NMF representation parameters are proposed and the quality loss is evaluated using the NMR measure. The quantization of the phase information is also studied. Additionally, we propose a sparseness criterion for the NMF algorithm, which is set to favor the gain values having the highest probability and thus the shortest entropy coding word length, resulting in a reduced bit rate.
Convention Paper 8083 (Purchase now)
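To make the factorization in the abstract above concrete, here is a minimal sketch of plain NMF with the widely used multiplicative updates for the Euclidean cost, factoring a non-negative matrix V into W times H. It is an unweighted stand-in: the perceptual weighting, NMR criterion, and sparseness term described in the paper are not implemented, and all names and defaults are assumptions.

```python
import random


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Squared Frobenius distance between V and W @ H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (rows x cols) into W (rows x rank), H (rank x cols)."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(rows)]
    h = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(rows)]
    return w, h
```

In a spectrogram setting, the columns of W would play the role of spectral templates and the rows of H their time-varying gains.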
P16-5 Cross-Layer Rate-Distortion Optimization for Scalable Advanced Audio Coding—Emmanuel Ravelli, Vinay Melkote, Tejaswi Nanjundaswamy, Kenneth Rose, University of California, Santa Barbara - Santa Barbara, CA, USA
Current scalable audio codecs optimize each layer of the bit-stream successively and independently with a straightforward application of the same rate-distortion optimization techniques employed in the non-scalable case. The main drawback of this approach is that the performance of the enhancement layers is significantly worse than that of the non-scalable codec at the same cumulative bit-rate. We propose in this paper a novel optimization technique in the Advanced Audio Coding (AAC) framework wherein a cross-layer iterative optimization is performed to select the encoding parameters for each layer with a conscious accounting of rate and distortion costs in all layers, which allows for a trade-off between performance at different layers. Subjective and objective results demonstrate the effectiveness of the proposed approach and provide insights for bridging the gap with the non-scalable codec.
Convention Paper 8084 (Purchase now)
P16-6 Issues and Solutions Related to Real-Time TD-PSOLA Implementation—Sylvain Le Beux, LIMSI-CNRS, Université Paris-Sud XI - Orsay, France; Boris Doval, LAM-IJLRA, Université Paris - Paris, France; Christophe d'Alessandro, LIMSI-CNRS, Université Paris-Sud XI - Orsay, France
This paper presents an adaptation of the TD-PSOLA calculation procedure for cases where the pitch-shifting and time-stretching coefficients need to be processed in real time at every new synthesis pitch mark. In the standard TD-PSOLA algorithm, modification coefficients are defined on the analysis time axis, whereas for real-time applications, pitch and duration control parameters need to be sampled at synthesis time. This paper establishes the theoretical correspondence between both approaches. Another issue related to the real-time context concerns the trade-off between the latency required for processing and the type of analysis window used.
Convention Paper 8085 (Purchase now)
P16-7 Integrating Musicological Knowledge into a Probabilistic Framework for Chord and Key Extraction—Johan Pauwels, Jean-Pierre Martens, Ghent University - Ghent, Belgium
In this paper a previously developed probabilistic framework for the simultaneous detection of chords and keys in polyphonic audio is further extended and validated. The system behavior is controlled by a small set of carefully defined free parameters. This has permitted us to conduct an experimental study that sheds new light on the relative importance of musicological knowledge in the context of chord extraction. Some of the obtained results are surprising and, to our knowledge, have never been reported as such before.
Convention Paper 8086 (Purchase now)
P16-8 A Doubly Sparse Greedy Adaptive Dictionary Learning Algorithm for Music and Large-Scale Data—Maria G. Jafari, Mark D. Plumbley, Queen Mary University of London - London, UK
We consider the extension of the greedy adaptive dictionary learning algorithm that we introduced previously to applications other than speech signals. The algorithm learns a dictionary of sparse atoms, while yielding a sparse representation for the speech signals. We investigate its behavior in the analysis of music signals and propose a different dictionary learning approach that can be applied to large data sets. This facilitates the application of the algorithm to problems that generate large amounts of data, such as multimedia and multichannel application areas.
Convention Paper 8087 (Purchase now)
Monday, May 24, 11:00 — 13:00 (Room C6)
W10 - A Curriculum for Game Audio
Chair:
Richard Stevens, Leeds Metropolitan University
Panelists:
Dan Bardino, Creative Services Manager, Sony Computer Entertainment Europe Limited
Andy Farnell, Author of Designing Sound
David Mollerstedt, DICE - Sweden
Dave Raybould, Leeds Metropolitan University - UK
Nia Wearn, Staffordshire University - Staffordshire, UK
Abstract:
How do I get work in the games industry? Anyone involved in the discussions that follow this question in forums, conferences, and workshops worldwide will realize that many students in Higher Education who are aiming to enter the sector are not equipped with the knowledge and skills that the industry requires. In this workshop a range of speakers will discuss, and attempt to define, the various roles and related skillsets for audio within the games industry and will outline their personal route into this field. The panel will also examine the related work of the IASIG Game Audio Education Working Group in light of the recent publication of its Game Audio Curriculum Guidelines draft. This will be a fully interactive workshop inviting debate from the floor alongside discussion from panel members in order to share a range of views on this important topic.
Monday, May 24, 16:00 — 17:00 (Saint Julien)
Audio for Games
Abstract:
Technical Committee Meeting on Audio for Games
Tuesday, May 25, 09:00 — 13:00 (Room C3)
P21 - Multichannel and Spatial Audio: Part 2
Chair: Ronald Aarts
P21-1 Center-Channel Processing in Virtual 3-D Audio Reproduction Over Headphones or Loudspeakers—Jean-Marc Jot, Martin Walsh, DTS Inc. - Scotts Valley, CA, USA
Virtual 3-D audio processing systems for the spatial enhancement of recordings reproduced over headphones or frontal loudspeakers generally provide a less compelling effect on center-panned sound components. This paper examines this deficiency and presents virtual 3-D audio processing algorithm modifications that provide a compelling spatial enhancement effect over headphones or loudspeakers even for sound components localized in the center of the stereo image, ensure the preservation of the timbre and balance in the original recording, and produce a more stable “phantom center” image over loudspeakers. The proposed improvements are applicable, in particular, in laptop and TV set audio systems, mobile internet devices, and home theater “soundbar” loudspeakers.
Convention Paper 8116 (Purchase now)
P21-2 Parametric Representation of Complex Sources in Reflective Environments—Dylan Menzies, De Montfort University - Leicester, UK
Aspects of source directivity in reflective environments are considered, including the audible effects of directivity and how these can be reproduced. Different methods of encoding and production are presented, leading to a new approach to extend parametric encoding of reverberation, as described in the DirAC and MPEG formats, to include the response to source directivity.
Convention Paper 8118 (Purchase now)
P21-3 Analysis and Improvement of Pre-Equalization in 2.5-Dimensional Wave Field Synthesis—Sascha Spors, Jens Ahrens, Technische Universität Berlin - Berlin, Germany
Wave field synthesis (WFS) is a well-established high-resolution spatial sound reproduction technique. Typical WFS systems aim at reproduction in a plane using loudspeakers enclosing the plane. This constitutes a so-called 2.5-dimensional reproduction scenario. It has been shown that a spectral correction of the reproduced wave field is required in this context. For WFS this correction is known as the pre-equalization filter. The derivation of WFS is based on a series of approximations of the physical foundations. This paper investigates the consequences of these approximations for the reproduced sound field and in particular for the pre-equalization filter. An exact solution is provided by the recently presented spectral division method and is employed in order to derive an improved WFS driving function. Furthermore, the effects of spatial sampling and truncation on the pre-equalization are discussed.
Convention Paper 8121 (Purchase now)
P21-4 Discrete Wave Field Synthesis Using Fractional Order Filters and Fractional Delays—César D. Salvador, Universidad de San Martin de Porres - Lima, Peru
A discretization of the generalized 2.5D Wave Field Synthesis driving functions is proposed in this paper. Time discretization is applied with special attention to the prefiltering that involves half-order systems and to the delaying that involves fractional-sample delays. Space discretization uses uniformly distributed loudspeakers along arbitrarily shaped contours: visual and numerical comparisons between lines and convex arcs, and between squares and circles, are shown. An immersive soundscape composed of nature sounds is reported as an example. Modeling uses MATLAB and real-time reproduction uses Pure Data. Simulations of synthesized plane and spherical wave fields, in the whole listening area, report a discretization percentage error of less than 1%, using 16 loudspeakers and 5th order IIR prefilters.
Convention Paper 8122 (Purchase now)
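The fractional-sample delays mentioned in the abstract above can be realized in several ways. The following minimal sketch uses a Lagrange-interpolation FIR filter, a common textbook design; the choice of this design, the function names, and the direct-form filter helper are assumptions and are not taken from the paper, which does not state which design it uses.

```python
def lagrange_fractional_delay(delay, order):
    """FIR coefficients h[0..order] approximating a delay of `delay` samples.

    h[k] = product over m != k of (delay - m) / (k - m).
    Accuracy is best when `delay` lies near order / 2.
    """
    coeffs = []
    for k in range(order + 1):
        c = 1.0
        for m in range(order + 1):
            if m != k:
                c *= (delay - m) / (k - m)
        coeffs.append(c)
    return coeffs


def fir_filter(signal, coeffs):
    """Direct-form FIR filtering with zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out


# Usage: delay a ramp by 1.5 samples with a third-order interpolator.
ramp = [float(n) for n in range(10)]
delayed = fir_filter(ramp, lagrange_fractional_delay(1.5, 3))
```

Because Lagrange interpolation is exact for polynomials up to the filter order, the delayed ramp equals n - 1.5 once the filter state has filled.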
P21-5 Immersive Virtual Sound Beyond 5.1 Channel Audio—Kangeun Lee, Changyong Son, Dohyung Kim, Samsung Advanced Institute of Technology - Suwon, Korea
In this paper a virtual sound system is introduced for next-generation multichannel audio. The sound system provides 9.1-channel surround sound via a conventional 5.1 loudspeaker layout and content. In order to deliver 9.1 sound, the system includes channel upmixing and vertical sound localization that can create a virtually localized sound at any point on a spherical surface around the human head. An amplitude panning coefficient is used for channel upmixing, which includes a smoothing technique to reduce the musical noise caused by upmixing. The proposed vertical rendering is based on VBAP (vector base amplitude panning) using three loudspeakers among the 5.1. For the quality test, our upmixing and virtual rendering methods are evaluated on real 9.1 and 5.1 loudspeaker setups, respectively, and compared with Dolby Pro Logic IIz. The demonstrated performance is superior to the references.
Convention Paper 8117 (Purchase now)
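The three-loudspeaker VBAP step named in the abstract above can be sketched as solving p = g1·l1 + g2·l2 + g3·l3 for the gains, where p is the source direction and l1..l3 are the loudspeaker directions, followed by power normalization. This is the generic formulation of VBAP; how the paper selects the loudspeaker triplet within the 5.1 layout is not shown, and the function names are assumptions.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def vbap_gains(source, speakers):
    """Gains for three loudspeakers so that sum(g_i * l_i) points along `source`.

    `source` is a nonzero direction vector; `speakers` is a list of three
    direction vectors. Uses Cramer's rule, then normalizes to unit power.
    A negative gain means the source lies outside the loudspeaker triangle.
    """
    base = det3(speakers)
    if abs(base) < 1e-12:
        raise ValueError("loudspeaker directions must be linearly independent")
    gains = []
    for idx in range(3):
        rows = [list(v) for v in speakers]
        rows[idx] = list(source)  # replace one loudspeaker vector by the source
        gains.append(det3(rows) / base)
    norm = math.sqrt(sum(g * g for g in gains))
    return [g / norm for g in gains]
```

A source that coincides with one loudspeaker direction receives all of its energy from that loudspeaker.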
P21-6 Acoustical Zooming Based on a Parametric Sound Field Representation—Richard Schultz-Amling, Fabian Kuech, Oliver Thiergart, Markus Kallinger, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Directional audio coding (DirAC) is a parametric approach to the analysis and reproduction of spatial sound. The DirAC parameters, namely direction-of-arrival and diffuseness of sound, can be further exploited in modern teleconferencing systems. Based on the directional parameters, we can control a video camera to automatically steer toward the active talker. In order to provide consistency between the visual and acoustical cues, the virtual recording position should match the visual movement. In this paper we present an approach for an acoustical zoom, which provides audio rendering that follows the movement of the visual scene. The algorithm does not rely on a priori information regarding the sound reproduction system as it operates directly in the DirAC parameter domain.
Convention Paper 8120 (Purchase now)
P21-7 SoundDelta: A Study of Audio Augmented Reality Using WiFi-Distributed Ambisonic Cell Rendering—Nicholas Mariette, Brian F. G. Katz, LIMSI-CNRS - Orsay, France; Khaled Boussetta, Université Paris 13 - Paris, France; Olivier Guillerminet, REMU - Paris, France
SoundDelta is an art/research project that produced several public audio augmented reality artworks. These spatial soundscapes comprised virtual sound sources located in a designated terrain such as a town square. Pedestrian users experienced the result as interactive binaural audio by walking through the augmented terrain, using headphones and the SoundDelta mobile device. SoundDelta uses a distributed "Ambisonic cell" architecture that scales efficiently for many users. A server renders Ambisonic audio for fixed user positions, which is streamed wirelessly to mobile users who render a custom, individualized binaural mix for their present position. A spatial cognition mapping experiment was conducted to validate the soundscape perception and compare with an individual rendering system.
Convention Paper 8123 (Purchase now)
P21-8 Surround Sound Panning Technique Based on a Virtual Microphone Array—Filippo M. Fazi, University of Southampton - Southampton, UK; Toshiro Yamada, Suketu Kamdar, University of California, San Diego - La Jolla, CA, USA; Philip A. Nelson, University of Southampton - Southampton, UK; Peter Otto, University of California, San Diego - La Jolla, CA, USA
A multichannel panning technique is presented, which aims at reproducing a plane wave with an array of loudspeakers. The loudspeaker gains are computed by solving an acoustical inverse problem. The latter involves the inversion of a matrix of transfer functions between the loudspeakers and the elements of a virtual microphone array, the center of which corresponds to the location of the listener. The radius of the virtual microphone array is varied consistently with the frequency, in such a way that the transfer function matrix is independent of the frequency. As a consequence, the inverse problem is solved for one frequency only, and the loudspeaker coefficients obtained can be implemented using simple gains.
Convention Paper 8119 (Purchase now)
Tuesday, May 25, 14:00 — 15:45 (Room C2)
AES/APRS—Life in the Old Dogs Yet—Part Three: After the Ball—Protecting the Crown Jewels
Moderator:
John Spencer, BMS CHACE
Panelists:
Chris Clark, British Library Sound Archive
Tommy D, Producer
Tony Dunne, A&R Coordinator, DECCA Records and UMTV/UMR - UK
Simon Hutchinson, PPL
Paul Jessop, Consultant IFI/RIAA
George Massenburg, P&E Wing, NARAS
Abstract:
A fascinating peek into the unspoken worlds of archiving and asset protection. It examines the issues surrounding retrievable formats that promise to future-proof recorded assets and the increasing importance of accurate recordings information (metadata). A unique group of experts from archiving and royalty distribution communities will hear a presentation from John Spencer, from BMS CHACE in Nashville, explaining his work with NARAS and the U.S. Library of Congress to establish an information schema for sound recording and Film and TV audio and then engage in a group discussion. The discussion then moves on to probably the most important topic to impact on the future of the sound and music economies: how to keep what we’ve got and reward those who made it.
Sir George Martin CBE was also awarded an AES Honorary Membership just before this session started. The award was introduced by AES Past President Jim Anderson and presented to Sir George by AES President Diemer de Vries.
Tuesday, May 25, 14:00 — 15:30 (Room C1)
W17 - 5.1 into 2 Won't Go—The Perils of Fold-Down in Game Audio
Chair:
Michael Kelly, Sony Computer Entertainment Europe
Panelists:
Richard Furse, Blue Ripple Sound Limited - UK
Simon Goodwin, Codemasters - UK
Jean-Marc Jot, DTS Inc. - CA, USA
Dave Malham, University of York - York, UK
Abstract:
One mixing solution cannot suit mono, stereo, headphone, and various surround configurations. However, games mix and position dozens of sounds on the fly, so they can readily make a custom mix, rather than rely upon downmixing or upmixing that penalizes listeners who do not use the default configuration. This workshop explains practical solutions to problems of stereo speaker, headphone, and mono compatibility (including Dolby ProLogic and 2.1 set-ups) without detriment to surround. It notes differences between the demands of games and cinema for surround and the challenges of reconciling de facto (quad+2) and theoretical (ITU 5.1) standard loudspeaker layouts and playing 5.1 channel content on a 7.1 loudspeaker system.
Tuesday, May 25, 14:00 — 17:30 (Room C3)
P24 - Innovative Applications
Chair: John Dawson
P24-1 The Serendiptichord: A Wearable Instrument for Contemporary Dance Performance—Tim Murray-Browne, Di Mainstone, Nick Bryan-Kinns, Mark D. Plumbley, Queen Mary University of London - London, UK
We describe a novel musical instrument designed for use in contemporary dance performance. This instrument, the Serendiptichord, takes the form of a headpiece plus associated pods that sense movements of the dancer, together with associated audio processing software driven by the sensors. Movements such as translating the pods or shaking the trunk of the headpiece cause selection and modification of sampled sounds. We discuss how we have closely integrated physical form, sensor choice, and positioning and software to avoid issues that otherwise arise with disconnection of the innate physical link between action and sound, leading to an instrument that non-musicians (in this case, dancers) are able to enjoy using immediately.
Convention Paper 8139 (Purchase now)
P24-2 A Novel User Interface for Musical Timbre Design—Allan Seago, London Metropolitan University - London, UK; Simon Holland, Paul Mulholland, Open University - UK
The complex and multidimensional nature of musical timbre is a problem for the design of intuitive interfaces for sound synthesis. A useful approach to the manipulation of timbre involves the creation and subsequent navigation or search of n-dimensional coordinate spaces or timbre spaces. A novel timbre space search strategy is proposed based on weighted centroid localization (WCL). The methodology and results of user testing of two versions of this strategy in three distinctly different timbre spaces are presented and discussed. The paper concludes that this search strategy offers a useful means of locating a desired sound within a suitably configured timbre space.
Convention Paper 8140 (Purchase now)
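The weighted centroid localization (WCL) idea named in the abstract above can be illustrated with its basic computation: an estimate formed as the weighted mean of known points in an n-dimensional space. In this sketch the weights are assumed to be user similarity ratings for probe sounds at those points; that interpretation, and the function name, are assumptions for illustration rather than details confirmed by the abstract.

```python
def weighted_centroid(points, weights):
    """Weighted mean of n-dimensional points.

    points  : list of equal-length coordinate tuples (probe positions)
    weights : non-negative scores, one per point (assumed user ratings)
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dims = len(points[0])
    return tuple(
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(dims)
    )


# Usage: the estimate is pulled toward the probe with the highest rating.
estimate = weighted_centroid([(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)], [1.0, 1.0, 2.0])
```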
P24-3 Bi-Directional Audio-Tactile Interface for Portable Electronic Devices—Neil Harris, New Transducers Ltd. (NXT) - Cambridge, UK
When an audio system uses the screen or casework vibrating as the loudspeaker, it can also provide haptic feedback. Just as a loudspeaker may be used reciprocally as a microphone, the haptic feedback aspect of the design may be operated as a touch sensor. This paper considers how to model a basic system embodying these aspects, including the electrical part, with a finite element package. For a piezoelectric exciter, full reciprocal modeling is possible, but for electromagnetic exciters it is not, unless multi-physics simulation is supported. For the latter, a model using only lumped parameter mechanical elements is developed.
Convention Paper 8141 (Purchase now)
P24-4 Tactile Music Instrument Recognition for Audio Mixers—Sebastian Merchel, Ercan Altinsoy, Maik Stamm, Dresden University of Technology - Dresden, Germany
Using touch screens for digital audio workstations, particularly audio mixing consoles, is not very common today. One reason is the ease of use and the intuitive tactile feedback that hardware faders, knobs, and buttons provide. Adding tactile feedback to touch screens will greatly improve usability. In addition, touch screens can reproduce innovative extra tactile information. This paper investigates several design parameters for the generation of the tactile feedback. The results indicate that musical instruments can be distinguished if tactile feedback is rendered from the audio signal. This helps to improve recognition of an audio signal source that is assigned, e.g., to a specific mixing channel. Applying this knowledge, the use of touch screens in audio applications becomes more intuitive.
Convention Paper 8142 (Purchase now)
P24-5 Augmented Reality Audio Editing—Jacques Lemordant, Yohan Lasorsa, INRIA - Rhône-Alpes, France
The concept of augmented reality audio (ARA) characterizes techniques where a physically real sound and voice environment is extended with virtual, geolocalized sound objects. We show that the authoring of an ARA scene can be done through an iterative process composed of two stages: in the first one the author has to move in the rendering zone to apprehend the audio spatialization and the chronology of the audio events, and in the second one a textual editing of the sequencing of the sound sources and DSP acoustics parameters is done. This authoring process is based on the joint use of two XML languages, OpenStreetMap for maps and A2ML for interactive 3-D audio. Because A2ML is a format for a cue-oriented interactive audio system, requests for interactive audio services are made through TCDL, a tag-based cue dispatching language. This separation of modeling and audio rendering is similar to what is done for the web of documents with HTML and CSS style sheets.
Convention Paper 8143 (Purchase now)
P24-6 Evaluation of a Haptic/Audio System for 3-D Targeting Tasks—Lorenzo Picinali, De Montfort University - Leicester, UK; Bob Menelas, Brian F. G. Katz, Patrick Bourdot, LIMSI-CNRS - Orsay, France
While common user interface designs tend to focus on visual feedback, other sensory channels may be used in order to reduce the cognitive load of the visual one. In this paper non-visual environments are presented in order to investigate how users exploit information delivered through haptic and audio channels. A first experiment is designed to explore the effectiveness of a haptic audio system evaluated in a single target localization task; a virtual magnet metaphor is exploited for the haptic rendering, while a parameter mapping sonification of the distance to the source, combined with 3-D audio spatialization, is used for the audio one. An evaluation is carried out in terms of the effectiveness of separate haptic and auditory feedback versus the combined multimodal feedback.
Convention Paper 8144 (Purchase now)
P24-7 Track Displays in DAW Software: Beyond Waveform Views—Kristian Gohlke, Michael Hlatky, Sebastian Heise, David Black, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany; Jörn Loviscach, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany
For decades, digital audio workstation software has displayed the content of audio tracks through bare waveforms. We argue that the same real estate on the computer screen can be used for far more expressive and goal-oriented visualizations. Starting from a range of requirements and use cases, this paper discusses existing techniques from such fields as music visualization and music notation. It presents a number of novel techniques, aimed at better fulfilling the needs of the human operator. To this end, the paper draws upon methods from signal processing and music information retrieval as well as computer graphics.
Convention Paper 8145 (Purchase now)