Audio Engineering Society Preprints

AES 120th Convention

Paris, France
May 20-23, 2006

Preprints Listing

6634
The Effect of the Singer's Head on Vocalist Microphones
Schneider, Martin
Vocalist microphones are often optimised for theoretically perfect polar patterns, e.g. cardioid, supercardioid or hypercardioid. The polar pattern can be maintained very well if the microphone is placed in the free-field, with no obstacle around it. When the singer approaches the microphone, the head serves as a reflective and diffractive obstacle. Consequently the far-field polar patterns and frequency responses are distorted, making the microphones more prone to feedback in live amplification situations, and altering the sound of the “spill” in pure recording situations.

6635
Wind Generated Noise in Microphones - An Overview - Part 1
Brixen, Eddy B.; Hensen, Ruben
When microphones are exposed to wind, noise is generated. The amount of noise generated depends on many factors: the speed and the direction of the wind being of course two of the important factors. However, the size, shape and design principles of the microphones are also very important factors. At higher wind speeds, not only is noise generated but distortion is also introduced, normally as a result of clipping. This paper presents comparative measurements that provide an overview of the parameters influencing wind-noise generated in pressure and pressure gradient condenser microphones.

6636
P-MOS FET Application for Silicon Condenser Microphones
Arimura, Norihiro; Kimura, Norio; Ohga, Juro; Yasuno, Yoshinobu
The Electret Condenser Microphone (ECM) is widely used as a general-purpose microphone device. Each year, cellular phones are further miniaturized and their operating voltages lowered to reduce power consumption. Although current ECMs have become smaller and thinner, the FET has not been designed for low-voltage operation despite its small package. This paper focuses on a P-MOS FET with low current consumption, fabricated in a CMOS process, for miniaturization and improved performance. The authors designed and tested prototype microphone units and compared their basic performance with that of a conventional ECM.

6637
Development of a Super-Wide-Range Microphone
Ando, Akio; Imanaga, Keishi; Iwaki, Masakazu; Ono, Kazuho; Tanabe, Hayao
This paper describes the development of a low-noise, high-sensitivity microphone with a wide frequency range. Microphones of this kind are needed to provide high quality sound sources for use in studies on the perceptual discrimination between musical sounds with and without very high frequency components. Conventional electrostatic microphones cannot be used for such recordings because conventional methods for expanding the frequency range use a small diaphragm that degrades the S/N ratio. The proposed microphone has a new design in which the frequency range is expanded in two ways, using both the diffraction and the resonance due to the microphone's diaphragm. These effects are generally thought to define the upper limit of the frequency range, but the authors have made active use of them to achieve both a wide frequency range and high sensitivity. The body shape was designed with the help of a scale model study. An omnidirectional, electrostatic microphone that picks up sounds of up to 100kHz with low noise has been developed.

6638
Listening Broadband Physical Model for Microphones: A First Step
Elliq, Mohammed; Lambert, Dominique; Lopes, Manuel; Millot, Laurent; Pelé, Gérard; Valette, Antoine
We will present a first step in the design of a broadband physical model for microphones. Within the proposed model, classical directivity patterns (omnidirectional, bidirectional and the cardioid family) are recovered as limit cases: monochromatic excitation, low frequency and far-field approximation. Monophonic pieces of music are used as sources for the model, so the simulation of the associated recorded sound field can be listened to in real time thanks to a Max/MSP application. Listening and subband analysis show that the directivity is a function of frequency subband and source location. This model also exhibits an interesting proximity effect. Audio demonstrations will be given.

6639
Measuring the Perceived Differences between Similar High-quality Microphones
McKinnie, Douglas
Microphones of similar construction and polar-pattern that can be equalized to have nearly identical on-axis frequency response still are reported to have different sonic character. To help develop a model of how other physical measurements could predict the subjective sonic character, perceptual data were collected from a panel of listeners. The listeners individually made dissimilarity ratings of pair-wise comparisons of 9 versions of a single piano performance. Each version was recorded with a different model of small-diaphragm cardioid condenser microphone. The data are used to derive a stimulus space showing the most salient dimensions upon which the perceived timbre of the microphones differed.

6640
The Native B-Format Microphone: Part II
Benjamin, Eric; Chen, Thomas
Part I of this paper described the objective performance of tetrahedral cardioid arrays versus arrays comprised of discrete pressure and pressure gradient microphone capsules. In the present paper the results of direct listening comparisons between the two types of arrays are given. Simultaneous recordings were made using pairings of the arrays for subsequent comparisons. The sources include both speech and music, and the environments include a range from very dry to very reverberant. The recordings were compared in both horizontal-only and in periphonic reproduction systems.

6641
Influence of Components Precision on Characteristics of Dual Microphone Arrays
Goldin, Alexander; Valitov, Alexander
Microphone arrays have great potential in practical applications due to their ability to significantly improve speech quality and signal-to-noise ratio in noisy environments. A large number of scientific papers and patents have been devoted to algorithmic techniques for producing the optimal output of a microphone array under different optimization criteria. In practice, however, the performance of microphone arrays depends to a large extent on the quality of their components, including amplitude matching, phase matching and errors in the distance between microphones. This paper analyses the dependence of the characteristics of a dual microphone array on these factors.

6642
Application of Segmentation and Thumbnailing to Music Browsing and Searching
Levy, Mark; Sandler, Mark
We present a method for segmenting musical audio into structural sections, and some rules for choosing a representative 'thumbnail' segment. We demonstrate how audio thumbnails are an effective and natural way of returning results in music search applications. We investigate the use of segment-based models for music similarity searching and recommendation. We report experimental results of the performance and efficiency of these approaches in the context of SoundBite, a demonstration music thumbnailing and search engine.

6643
Multiple F0 Tracking in Solo Recordings of Monodic Instruments
Röbel, Axel; Rodet, Xavier; Yeh, Chunghsin
This article is concerned with F0 tracking in solo recordings of monodic instruments. Due to reverberation, the observed signal is rather polyphonic and single-F0 tracking techniques often give unsatisfactory results. The proposed method is based on multiple-F0 estimation and makes use of the a priori knowledge that the observed spectrum is generated by a single monodic instrument. The predominant F0 is tracked first and the secondary F0 tracks are then established. The proposed method is tested on reverberant recordings and shows significant improvements compared to single-F0 estimators.

6644
Harmonic Plus Noise Decomposition: Time-frequency Reassignment Versus a Subspace Based Method
Badeau, Roland; David, Bertrand; Emiya, Valentin; Grenier, Yves
This work deals with Harmonic+Noise decomposition and, as a targeted application, the extraction of transient background noise surrounded by a signal having strong harmonic content (speech, for instance). In that perspective, a method based on the reassigned spectrum and a High Resolution subspace tracker are compared, both on simulations and in a more realistic manner. The reassignment re-localizes the time-frequency energy around a given pair (analysis time index, analysis frequency bin), while the High Resolution method benefits from a characterization of the signal in terms of a space spanned by the harmonic content and a space spanned by the stochastic content. Both methods are adaptive and the estimations are updated from one sample to the next.

6645
Signal Analysis Using the Complex Spectral Phase Evolution (CSPE) Method
Garcia, Ricardo A.; Short, Kevin M.
The Complex Spectral Phase Evolution (CSPE) method is introduced as a tool to analyze and detect the presence of short-term stable sinusoidal components in an audio signal. The method provides for super-resolution of frequencies by evaluating the evolution of the phase of the complex signal spectrum over time-shifted windows. It is shown that this analysis, when applied to a sinusoidal signal component, allows for the resolution of the true signal frequency with orders of magnitude greater accuracy than the DFT. Further, this frequency estimate is independent of the frequency bin. The method is robust in the presence of noise or nearby signal components, and is a fundamental tool in the front-end processing for the KOZ compression technology.
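The phase-evolution idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-sample shift, the Hann window, the frame length and the function name `cspe_frequency` are all assumptions made for illustration. For a stationary sinusoid, the spectra of two frames offset by one sample differ by a phase rotation of 2*pi*f/fs at the peak bin, so the frequency can be read from that rotation rather than from the bin index.

```python
import numpy as np

def cspe_frequency(x, fs, n_fft=1024):
    """Hypothetical helper: estimate the dominant frequency of x from the
    phase advance between two analysis frames offset by one sample."""
    w = np.hanning(n_fft)                      # window choice is an assumption
    f0 = np.fft.rfft(w * x[:n_fft])            # frame starting at sample 0
    f1 = np.fft.rfft(w * x[1:n_fft + 1])       # frame starting at sample 1
    k = int(np.argmax(np.abs(f0)))             # peak bin of the first frame
    # Phase rotation per sample at the peak bin, in radians.
    phase_advance = np.angle(f1[k] * np.conj(f0[k]))
    cspe_est = phase_advance * fs / (2 * np.pi)
    dft_est = k * fs / n_fft                   # plain bin-centre estimate
    return cspe_est, dft_est

fs = 8000.0
t = np.arange(2048) / fs
true_f = 440.3
x = np.cos(2 * np.pi * true_f * t + 0.7)
cspe_est, dft_est = cspe_frequency(x, fs)
print(f"true {true_f} Hz, bin centre {dft_est} Hz, phase-based {cspe_est:.4f} Hz")
```

With a one-sample shift the estimate is unambiguous only below fs/2, and the sketch handles a single clean sinusoid; it makes no claim about the noise robustness reported in the paper.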

6646
Upwind Leapfrog Schemes in Physical Models with Mixed Modeling Strategies
Escolano, José; López, José Javier
Block-based physical modeling with mixed modeling strategies is one of the most promising methods for digital sound synthesis. This technique models and discretizes each element individually, with their interaction topology implemented separately. In this paper the use of the Linear Bicharacteristic Scheme (LBS), or upwind leapfrog, is proposed for digital sound synthesis. It provides an efficient and accurate alternative stencil to the classical leapfrog scheme of the Finite Difference Time Domain (FDTD) method. Furthermore, this work proposes converting the dependent wave equation variables into characteristic variables to obtain a method suitable for interaction with Wave Digital Filter models. This technique is extensively presented and finally justified with some examples.

6647
Simple Modeling of Soundboard Effect for Piano Transcription
Beracoechea, Jon Ander; Casajús-Quirós, Francisco Javier; Ortiz-Berenguer, Luis; Perez-Aranda, J.; Torres-Guijarro, Soledad
Partials of piano sounds are inharmonic. This inharmonicity is due both to string stiffness and to soundboard impedance. The latter has not been widely documented. Two problems arise: knowing the value of the impedance and evaluating the frequency deviation the partial suffers. In this work, that deviation has been calculated both by using Morse’s equations and by using the authors’ proposed method. To validate the results, deviations for some piano notes have been measured. The soundboard impedance has also been measured to verify the relationship between deviation and impedance. Moreover, a method to evaluate impedance using measured deviations is also proposed. This last method can be useful during training stages in transcription systems and for parameter extraction schemes.

6648
Contextual Effects on Sound Quality Judgements: Listening Room and Automotive Environments
Beresford, Kathryn; Ford, Natanya; Rumsey, Francis; Zielinski, Slawomir K.
This study was designed to assess the effect of the listening context on basic audio quality for stimuli with varied mid-range timbral degradations. An assessment of basic audio quality was carried out in two different listening environments: an ITU-R BS.1116 conformant listening room and a stationary vehicle. A group of untrained listeners graded basic audio quality using a novel single stimulus method. The listener population was divided into two subsets – one made evaluations in a listening room and the other in a vehicle. The single stimulus method was investigated as a possible subjective evaluation method for use in automotive environments.

6649
Next Generation Automotive Sound Research and Technologies
Benjamin, Eric; Crockett, Brett; Smithers, Michael
The automobile is quickly becoming a prominent environment for listening to multi-channel audio content. As a listening space, the automobile is both interesting and challenging due to its interior structure and materials, its predominant off-axis listening positions and the amount and variability of background noise. This paper discusses these challenges, describes a number of existing multi-channel sound technologies and their applicability to the automotive environment, and presents several novel sound technologies that provide new solutions to some of these challenges. Ongoing challenges and associated automotive sound research being investigated are also presented.

6650
Spatial Sound Localization Model Using Neural Network
Correa, Rafael; Floody, Sergio; Lara, Marcelo; Venegas, Rodolfo
This work presents the design, implementation and training of a spatial sound localization system for broadband sound in an anechoic environment, inspired by the human auditory system and implemented using neural networks. The data acquisition was made experimentally. The model consists of a nonlinear transformer with one module for ITD and ILD extraction and a second module constituted by a neural network that estimates the sound source position in elevation and azimuth angle. A comparison between the model performance using three different filter banks and a sensitivity analysis of the neural network input are also presented. The average error is 2.4°. This project has been supported by the FONDEI fund of Universidad Tecnológica de Chile.

6651
Aurally Motivated Analysis for Scattered Sound in Auditoria
Jaramillo, Ana M.; Norris, Molly K.; Xiang, Ning
The goal of the first part of this work was to implement an aurally-adequate time-frequency analysis technique as a motivated first effort that takes into account binaural hearing with the implementation goal of the analysis of sound scattering data. The second part of this work aimed to use the developed model in the analysis of different scattering surfaces implemented in a room acoustics modeling program. This was an attempt to start to gain an understanding of what kind of visual changes could be expected when one alters the coefficients used to model scattering in conjunction with the Lambert scattering model. This research is the pursuit of a method for visually representing scattering effects that directly correlates with human perception.

6652
Audibility of Spectral Differences in Head-Related Transfer Functions
Faundez Hoffmann, Pablo; Møller, Henrik
The spatial resolution at which head-related transfer functions (HRTFs) are available is an important aspect in the implementation of three-dimensional sound. Specifically, synthesis of moving sound requires that HRTFs are sufficiently close so the simulated sound is perceived as moving smoothly. How close they must be, depends directly on how much the characteristics of neighboring HRTFs differ, and, most important, when these differences become audible. Differences between HRTFs exist in the interaural delay (ITD) and in the spectral characteristics, i.e. the magnitude spectrum of the HRTFs. The present study investigates the audibility of the spectral characteristics. To this purpose, binaural and monaural audibility thresholds of differences between minimum-phase representations of HRTFs are measured and evaluated.

6653
Looking for a Relevant Similarity Criterion for HRTF Clustering: A Comparative Study
Bondu, Alexis; Busson, Sylvain; Lemaire, Vincent; Nicol, Rozenn
For high-fidelity Virtual Auditory Space (VAS), binaural synthesis requires individualized Head-Related Transfer Functions (HRTF). An alternative to exhaustive measurement of HRTF consists in measuring a set of representative HRTF in a few directions. These selected HRTF are considered as representative because they summarize all the necessary spatial and individual information. The goal is to deduce the HRTF in non-measured directions from the measured ones by appropriate modeling. Clustering is applied in order to identify the representative directions, but the first issue relies on the definition of a relevant distance criterion. The paper presents a comparative study of several criteria taken from literature. A new insight in HRTF (dis)similarity is proposed.

6654
Evaluation of a 3D-Audio System with Head Tracking
Minnaar, Pauli; Pedersen, Jan Abildgaard
A 3D-audio system was evaluated in an experiment where listeners had to “shoot down” real and virtual sound sources appearing from different directions around them. The 3D-audio was presented through headphones and head tracking was used. In order to investigate the influence of head movements both long and short stimuli were used. Twenty-six people participated, of which half were students and half were pilots. The results were analyzed by calculating a localization offset and a localization uncertainty. For azimuth no significant offset was found, whereas for elevation an offset was found that is strongly correlated with the stimulus elevation. The uncertainty for real and virtual sound sources was 10° and 14° in azimuth, and 12° and 24° in elevation.

6655
Design and Verification of HeadZap, a Semi-automated HRIR Measurement System
Anderson, Mark; Begault, Durand; Godfroy, Martine; Miller, Joel D.; Roginska, Agnieszka; Wenzel, Elizabeth
This paper describes the design, development and acoustic verification of HEADZAP, a semi-automated system for measuring head-related impulse responses (HRIR) designed by AuSIM Inc. and modified by the NASA Ames Research Center Spatial Auditory Display Laboratory. HEADZAP utilizes an array of twelve loudspeakers in order to measure 432 HRIRs at 10° intervals in both azimuth and elevation, in a non-anechoic environment. Application to real-time rendering using SLAB software in an audio-visual localization experiment is discussed.

6656
Visualization of Perceptual Parameters in Interactive User Interfaces: Application to the Control of Sound Spatialization
Delerue, Olivier
This work addresses the general problem of designing graphical user interfaces for non-expert users. The key idea is to help users anticipate the effect of their actions by displaying, in the interaction area, the expected evolution of a quality criterion according to the degrees of freedom which are being monitored. This concept is first applied to the control of sound spatialization: various perceptually based criteria such as “spatial homogeneity” or “spatial masking” are represented as a grey shaded map superimposed on the background of a bird’s eye view interface. After selecting a given sound source, the user is thus informed how these criteria will behave if the source is moved to any other location of the virtual sound scene.

6657
A New Approach for Direct Interaction with Graphical Representations of Room Impulse Responses for the Use in Wave Field Synthesis Reproduction
de Vries, Diemer; Langhammer, Jan; Melchior, Frank
Room simulation based on convolution is state of the art in modern audio processing environments. Most of the systems currently available provide only a few controllers to modify the underlying room impulse responses. The sound designer can manipulate one set of numeric parameters even in spatial reproduction systems. This paper describes a new approach for the interactive control of room impulse responses based on visualization and parameterization. The new principle is originally developed for the use in Wave Field Synthesis systems and based on Augmented Reality user interfaces. An adaptation to conventional user interfaces and other spatial sound reproduction systems is possible. The modification of the room impulse responses is performed by direct interaction with 3D graphical representations of multi-trace room impulse responses.

6658
Directional Audio Coding: Filterbank and STFT-based Design
Faller, Christof; Pulkki, Ville
Directional audio coding (DirAC) is a method for spatial sound representation, applicable to arbitrary audio reproduction methods. In the analysis part, properties of the sound field in time and frequency in a single point are measured and transmitted as side information together with one or more audio waveforms. In the synthesis part, the properties of the sound field are reproduced using separate techniques for point-like virtual sources and diffuse sound. Different implementations of DirAC are described and differences between them are discussed. A modification of DirAC is presented, which provides a link to Binaural Cue Coding and parametric multi-channel audio coding in general (e.g. MPEG Surround).

6659
Newly Established IEC Standard on Audio Quality Measurement of Personal Computers
Furukawa, Masamichi; Kurakata, Kenji
A new IEC standard on audio quality measurement of personal computers (PCs) was published in December 2005, entitled IEC 61606-4 "Audio and audiovisual equipment - Digital audio parts - Basic measurement methods of audio characteristics - Part 4: Personal computer." That standard prescribes methods for measuring PC audio quality, taking into account the requirements of measuring conditions of PCs. Furthermore, a new measure of audio signal quality, short-term distortion, was introduced to describe PC-specific noise problems. This paper presents an outline of that standard.

6660
Scene Description Model and Rendering Engine for Interactive Virtual Acoustics
Jot, Jean-Marc; Trivi, Jean-Michel
Interactive environmental audio spatialization technology has become commonplace in personal computers, where its primary current application is video game sound track rendering. The most advanced PC audio platforms available can spatialize 100 or more sound sources simultaneously over headphones or multi-channel home theater systems, and employ multiple reverberation engines to simulate complex acoustical environments. This paper reviews the main features of the EAX environmental audio programming interface and its relation to the I3DL2 and MPEG-4 standards. A statistical reverberation model is introduced to account for per-source distance and directivity effects. An efficient spatial reverberation and mixing architecture is described for the spatialization of multiple sound sources around a virtual listener navigating across multiple connected virtual rooms including acoustic obstacles.

6661
Intelligent Audio for Games
Walder, Col
Providing interactive audio for computer games has traditionally been seen as a challenge, particularly given the technological limitations of games consoles. With current advances in technology, however, there is the potential to take advantage of the benefits of interactivity. This paper proposes the use of Artificial Intelligence (AI) routines to control in-game audio with a focus on implementing techniques used in film sound for drama-based games. The Soar architecture is presented as a good candidate for developing audio AI for games.

6662
A Frame Loss Concealment Technique for MPEG-AAC
Rose, Kenneth; Ryu, Sang-Uk
An efficient method is proposed for frame loss concealment within the advanced audio coding (AAC) decoder, which can effectively mitigate the adverse impact of frame loss on reconstruction quality. The spectral information of the lost frame is first estimated in the modified discrete cosine transform (MDCT) domain via the known frame interpolation approach. The interpolated MDCT coefficients are then further refined by magnitude scaling and sign correction, which are designed differently for tonal and noise components of the source signal. In noise-like spectral bins, a shaped-noise insertion technique is employed to adjust the interpolated coefficients, while coefficients in tone-dominant bins are refined by magnitude scaling and novel sign correction techniques so as to optimize the fit of the corresponding time reconstruction with available partial signal information from neighboring frames. Subjective quality evaluations demonstrate that the proposed method achieves significant quality improvement over the shaped-noise insertion method adopted in commercial AAC decoders.

6663
Multiple Description Error Mitigation Techniques for Streaming Compressed Audio Over a 802.11 Wireless Network
Cheng, Corey I.; Jiang, Wenyu
This paper presents several multiple description (MD) coding techniques for error mitigation of compressed audio streamed over an 802.11b/g wireless network. Loosely speaking, an MD encoder generates several descriptions of the same source, and an MD decoder recreates the best estimate of the source from the set of descriptions it successfully receives. We propose a design for an MD architecture and simulate its integration into the AAC codec. We use packet loss traces gathered from an actual 802.11b/g network to simulate the proposed codec’s error mitigation properties for various network traffic conditions. We examine how tuning several of the proposed codec’s parameters would affect the sound quality and overall bitrate of the proposed codec. Specifically, we show how interleaving, renormalization, and low-frequency variance estimation techniques can be used in conjunction with hierarchical correlating transforms to improve the sound quality of multiple description codecs.

6664
Single Frequency Networks for FM Radio
Soelberg, Pierre
Single Frequency Networks (SFN) and Near Single Frequency Networks (NSFN) are usually not considered suitable for FM radio. Some countries are now re-planning their FM bands for the use of (N)SFN in order to make space for more stations. Although some stations already use it, such as a station covering a highway, re-planning the FM band of a whole country around SFN is a different matter. The first country to do this was the Netherlands, and the first experiences with it are not as good as expected. The requirements for synchronization of FM transmitters used for (N)SFN are explained, and SFN networks are tested from real transmitter sites. The result is a proposed correction for the Dutch norm.

6665
A Paradigm for Wireless Digital Audio Home Entertainment
Floros, Andreas; Kokkos, Nikos; Mourjopoulos, John; Tatlas, Nicolas-Alexander
Despite recent advances in wireless networking technology, real-time streaming of CD-quality digital audio remains a challenging topic. In this work, a set of applications following the server-client model was developed, facilitating the transmission and playback of PCM-coded audio over wireless links. The implementation is based on typical Personal Computer (PC) platforms interconnected with off-the-shelf wireless networking hardware. Performance evaluation tests are presented under different networking parameters and link conditions, leading to an optimal set of parameters for high-quality wireless digital audio delivery.

6666
Online Acoustic Measurements in a Networked Audio System
Härmä, Aki
A networked audio system consists of audio devices that are in the same physical environment and are connected by a network. The network connection makes it possible to perform continuous acoustic measurements between the devices. Such measurement data can be used, for example, to control the playback by the properties of the actual sound field produced. Continuous acoustic measurement involves transmission of audio data over the network. The bit-rate of the audio data should be low because the measurement is not a primary function of the networked system. In this paper we introduce a robust system for networked audio measurements where the bit-rate sent over the network is small.

6667
Design and Installation of Recording Studios for Vocational training
Bradley, Chris; Law, Billy
This paper describes the design and installation of new recording studios for training in music and sound production, allowing unparalleled direct hands-on tuition for students. The design allows simultaneous recording from the live rooms to all twelve control rooms via digital distribution, enabling individual set-up for a recording session, multi-track recording and subsequent mixdown. All recording sessions are saved to a centralised server which will allow back-up and uploading to and from any other control room. Students can therefore import their work into any of the other control rooms at any time. Networking will be through Gigabit Ethernet, so transfer of work is fast, and students have their own password-protected space, through which they learn the importance of file management.

6668
Flexible, High Speed Audio Networking for Hotels and Convention Centres
Chigwamba, Nyasha; Foss, Richard; Fujimori, Jun-ichi; Klinkradt, Bradley; Okai-Tettey, Harold
This paper describes the use of mLAN (music Local Area Network) to solve the problem of audio routing within hotels and convention centers. mLAN is a FireWire-based digital network interface technology that allows professional audio equipment, PCs and electronic instruments to be easily and efficiently interconnected using a single cable. In order to solve this problem, an existing mLAN Connection Management Server, augmented with additional functionality, has been utilized. A graphical client application has been created that displays the various locations within a hotel/convention center, and sends out appropriate routing messages in Extensible Mark-up Language (XML) to an mLAN connection management server. The connection management server, in turn, controls a number of mLAN audio distribution boxes on the FireWire network.

6669
Sound Quality Differences between Electret Film (EMFIT) and Piezoelectric Under-saddle Guitar Pickups
Penttinen, Henri; Tikander, Miikka
Two different types of under-saddle guitar pickups, piezoelectric and electret film (EMFIT) were measured and compared. The measurements included comparisons of magnitude, time, and phase responses, and distortion characteristics. The measurements were conducted with a custom rig that allowed accurate control of the environment. For excitation both frequency sweeps and impulsive stimuli were used. As for the magnitude response, the piezoelectric pickup has a boosted bass response and a slightly pronounced high frequency response. The results imply nonlinear behaviour as a function of both the excitation type (sweep vs. impulsive) and the amount of excitation force (small vs. large). In addition, the piezoelectric microphone is fairly immune to tension changes, whereas EMFIT microphone's sensitivity increases as the tension decreases. For time responses excited impulsively the only differences were found at the beginning of the responses. No significant difference in the distortion behaviour was found. A linear filter model is also proposed for making either microphone sound like the other.

6670
A Hybrid Concealment Algorithm for Non-predictive Wideband Audio Coders
Lefebvre, Roch; Vilaysouk, Vilayphone
This paper proposes a hybrid Packet Loss Concealment (PLC) algorithm for memoryless encoders such as PCM. The concealment algorithm integrates two modes, one in the time domain and the other in the frequency domain. Mode selection is performed using the correctly received samples prior to an erased packet. This hybrid approach provides a packet loss concealment mechanism which can adapt to the signal characteristics and is not restricted to pure speech signals. Subjective evaluations have demonstrated that the proposed algorithm performs significantly better than single-mode concealment algorithms.

6671
Towards an Inverse Constant Q Transform
Cranitch, Matt; Cychowski, Marcin T.; FitzGerald, Derry
The Constant Q transform has found use in the analysis of musical signals due to its logarithmic frequency resolution. Unfortunately, a considerable drawback of the Constant Q transform is that there is no inverse transform. Here we show it is possible to obtain a good quality approximate inverse to the Constant Q transform provided that the signal to be inverted has a sparse representation in the Discrete Fourier Transform domain. This inverse is obtained through the use of ℓ0 and ℓ1 minimisation approaches to project the signal from the constant Q domain back to the Discrete Fourier Transform domain. Once the signal has been projected back to the Discrete Fourier Transform domain, the signal can be recovered by performing an inverse Discrete Fourier Transform.
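
As a minimal illustration of the logarithmic frequency resolution mentioned in this abstract (it is not the authors' inversion method), the sketch below computes the bin centre frequencies of a constant Q analysis, f_k = f_min * 2^(k/b), together with the constant ratio Q = 1 / (2^(1/b) - 1) of centre frequency to bandwidth. The function name and the values of f_min and b are assumptions chosen only for the example.

```python
def constant_q_bins(f_min, bins_per_octave, n_bins):
    """Return (Q, centre frequencies, bandwidths) for a constant Q analysis.

    f_min and bins_per_octave are illustrative parameters, not values
    taken from the preprint.
    """
    # Q is the same for every bin: centre frequency divided by bandwidth.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    # Centre frequencies are geometrically spaced (logarithmic resolution).
    centres = [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
    # Bandwidth therefore grows in proportion to centre frequency.
    bandwidths = [f / q for f in centres]
    return q, centres, bandwidths


if __name__ == "__main__":
    q, centres, bandwidths = constant_q_bins(55.0, 12, 25)
    print(f"Q = {q:.3f}")
    for f, bw in zip(centres[::12], bandwidths[::12]):
        print(f"centre {f:7.2f} Hz, bandwidth {bw:6.2f} Hz")
```

With twelve bins per octave, each bin corresponds to one semitone, which is why this spacing suits musical signals.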

6672
History and Design of Russian Electro-musical Instrument "Theremin"
Vasilyev, Yurii
The Theremin, an electro-musical instrument developed by the Russian physicist L. Theremin, has come a long way in its evolution. It attracts constantly growing interest from audio engineers and performers. The Theremin is used both for performing musical compositions of different genres and for creating special effects in theatrical performances, multimedia, and film. This work analyzes the circuit designs created over a period of more than 80 years, based on both analogue circuitry and digital microprocessor techniques, as well as realizations of the Theremin as real and virtual musical instruments. The advantages and disadvantages of the different circuit designs are also analyzed, and the most interesting realizations of the virtual Theremin are presented.

6673
A Fast- and High-convergence Method for ICA-based Noise Reduction in Mobile Phone Speech Communication
Etoh, Minoru; Zhipeng, Zhang
This paper proposes a noise reduction technique that applies a priori information to unmixing matrix estimation in ICA; it offers fast and accurate convergence. We formulate the parameter estimation stabilized by the a priori information as a Bayesian framework of maximum a posteriori (MAP) estimation, and show its robustness in mobile phone environments, where the position of the microphone relative to the mouth is almost constant. We use the transfer function of mouth to microphone for one row of the unmixing matrix. Using these estimated parameters as initial values, the unmixing matrix can be updated with high efficiency in the framework of MAP estimation. Experimental results confirm that the proposed method achieves high performance, especially in high SNR noise conditions.

6674
A Comparison of Time-Domain Time-Scale Modification Algorithms
Coyle, Eugene; Dorran, David; Lawlor, Robert
Time-domain approaches to time-scale modification are popular due to their ability to produce high quality results at a relatively low computational cost. Within the category of time-domain implementations quite a number of alternatives exist, each with their own computational requirements and associated output quality. This paper provides a computational and objective output quality assessment of a number of popular time-domain time-scaling implementations; thus providing a means for developers to identify a suitable algorithm for their application of interest. In addition, the issues that should be considered in developing time-domain algorithms are outlined, purely in the context of a waveform editing procedure.

6675
The Importance of the Non-harmonic Residual for Automatic Musical Instrument Recognition of Pitched Instruments
Livshin, Arie; Rodet, Xavier
In various papers dealing with automatic musical instrument recognition of pitched instruments, the features used for classification are based solely on the fundamental frequencies and the harmonic series, ignoring the non-harmonic residual. In this paper we explore whether the recognition rate for pitched instruments is decreased by removing the non-harmonic information present in the sound signal.

6676
A Fuzzy Rules-based Speech/Music Discrimination Approach for Intelligent Audio Coding Over the Internet
Garcia Gálan, Sebastian; Muñoz-Exposito, Jose Enrique; Rivas Peña, Fernando; Ruiz-Reyes, Nicolas; Vera-Candeas, Pedro
Our work presents a speech/music discrimination approach based on fuzzy rules for selecting the suitable coder required in an intelligent audio coding system. When the same coder is used for both speech and music, it is difficult to achieve good audio quality and low bit rates for both types of signals. We propose using a simple feature, called Warped LPC-based Spectral Centroid (WLPC-SC), for speech/music discrimination. In order to select the suitable audio coder for each audio frame, an expert system is proposed. The main advantage of the proposed approach is the low computational cost in both the speech/music discrimination and coder selection stages. This allows its use in real-time applications such as Internet audio streaming.

6677
Analysis and Transynthesis of Solo Erhu Recordings using Additive/Subtractive Synthesis
Chang, Wei-Lun; Siao, Yi-Song; Su, Alvin W.Y.
The erhu is the main bowed-string instrument in traditional Chinese music, like the violin in Western music. It has two strings and its top plate is made of snake skin. Numerous solo works have been written for erhu. In this paper, erhu resynthesis/transynthesis software is presented. We use frame-based methods to analyze pitch and volume information of a solo erhu recording. Then, one can re-synthesize it using the erhu timbre extracted from the original recording, other erhu timbres, or even timbres like violin and trumpet. Additive synthesis and subtractive synthesis methods are used to synthesize the overall sound. Because the expression and playing style of the original recording are preserved, the result is realistic and musical.

6678
Application of Fisher Linear Discriminant Analysis to Speech/Music Classification
Alexandre, Enrique; Cuadra-Rodríguez, Lucas; Gil-Pita, Roberto; Rosa-Zurera, Manuel
This paper proposes the application of Fisher linear discriminants to the problem of speech/music classification. Fisher linear discriminants can classify between two different classes, and are based on the calculation of some kind of centroid for the training data corresponding to each of these classes. Based on that information a linear boundary is established, which is then used for the classification process. Some results will be given demonstrating the superior behavior of this classification algorithm compared with the well-known K-nearest neighbor algorithm. It will also be demonstrated that it is possible to obtain very good results in terms of probability of error using only one feature extracted from the audio signal, which makes it possible to reduce the complexity of such systems so that they can be implemented in real time.
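
The centroid-and-boundary idea described in this abstract can be sketched as follows. This is a generic two-class Fisher linear discriminant for two-dimensional features, not the authors' implementation; the function names, the class labels "a" and "b", and the feature values in the usage example are all invented for illustration.

```python
def fisher_lda_2d(class_a, class_b):
    """Return (w, threshold) of a Fisher discriminant for 2-D features."""
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        # Entries of the symmetric 2x2 within-class scatter matrix.
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    ma, mb = mean(class_a), mean(class_b)      # class centroids
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sxx, sxy, syy = sa[0] + sb[0], sa[1] + sb[1], sa[2] + sb[2]
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # w = Sw^-1 (ma - mb), using the closed-form inverse of a 2x2 matrix.
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    # Linear boundary: midpoint between the projected centroids.
    threshold = 0.5 * ((w[0] * ma[0] + w[1] * ma[1]) +
                       (w[0] * mb[0] + w[1] * mb[1]))
    return w, threshold


def classify(x, w, threshold):
    """Label a feature vector by which side of the boundary it falls on."""
    return "a" if w[0] * x[0] + w[1] * x[1] > threshold else "b"


if __name__ == "__main__":
    # Made-up feature vectors, for illustration only.
    speech = [(0.20, 1.0), (0.30, 1.2), (0.25, 0.9), (0.35, 1.1)]
    music = [(0.70, 2.0), (0.80, 2.3), (0.75, 1.9), (0.90, 2.2)]
    w, t = fisher_lda_2d(speech, music)
    print(classify((0.28, 1.05), w, t), classify((0.80, 2.1), w, t))
```

Once trained, classification costs one dot product and one comparison per frame, which is consistent with the abstract's interest in low-complexity, real-time use.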

6679
Effectiveness of Height Information for Reproducing the Presence and Reality in Multichannel Audio System
Hamasaki, Kimio; Hiyama, Koichiro; Nishiguchi, Toshiyuki; Okumura, Reiko
A 22.2 multichannel sound system was developed that adapts to an ultrahigh-definition video system with 4000 scanning lines. The sound system consisted of loudspeakers with three layers: an upper layer with nine channels, a middle layer with ten channels, and a lower layer with three channels and two channels for low frequency effects. This system has new features of three-dimensional sound reproduction. Subjective evaluations by the semantic differential (SD) method are presented to assess the importance of height information for a sound system using several stimuli in a 22.2 multichannel audio system with Super Hi-Vision and a high-definition television. Furthermore, the actual effectiveness of height information and some practical suggestions for aesthetic mixing of three-dimensional audio are also presented.

6680
Multichannel Processing for Microphones Arrays
Martignon, Paolo
Microphone arrays are employed to make measurements or recordings that take into account the spatial properties of sound. Here the attention is focused on planar arrays oriented to acoustic mapping, which are of particular interest in industrial and environmental acoustics, although musical and audio applications are directly involved. An overview of beamforming theory is given first, with a brief study of array spatial resolution, a theory resting on a physical basis that no algorithm can circumvent. Then a new algorithm, based on Kirkeby multi-channel inversion, is proposed. Comparisons between multi-channel inversion and beamforming are made through simulations, with favourable results for the new method.

6681
Miniature Microphone Arrays for Multi-channel Recording
Backman, Juha
The paper describes a method of using a dense array of miniature microphones (e.g. MEMS or miniature electret) to yield precise one-point multi-channel gradient microphones. The signals obtained from individual microphones in the array are used to obtain a minimum-noise estimate for the zeroth-, first-, and second-order components of the gradient of the sound field at the center of the array. (Higher orders of the gradient tend to be too noisy for actual sound recording purposes.) These can be used to form stereo or multi-channel signals with adjustable polar patterns for recording purposes.

6682
Benefits of Distance Correction for Multichannel Microphones
Goerne, Thomas
Subjective assessment of different stereophonic or multichannel techniques usually suffers from the different diffuse field sensitivity of the tested microphone setups. Although distance correction factors for gradient transducers in the diffuse sound field are well known, they are not sufficient to model multichannel microphone arrays. Thus correction factors for gradient transducers at an angle as well as for MS pairs are proposed, and the benefits of corrected microphone setups are investigated.

6683
Virtual Source Location Information Based Matrix Decoding System
Arora, Manish; Moon, Hangil
In this paper, a new matrix decoding system using vector-based Virtual Source Location Information (VSLI) is proposed as an alternative to the conventional Dolby Pro Logic II/IIx system for reconstructing a multi-channel output signal from matrix-encoded two-channel signals, Lt/Rt. This new matrix decoding system is composed of a passive decoding part and an active part. The passive part produces crude multi-channel signals using linear combinations of the two encoded signals (Lt/Rt), and the active part enhances each channel according to the virtual source that emerges between each pair of channels. The virtual sources between channels are estimated by an inverse constant-power panning law.
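
To make the last sentence of this abstract concrete, the sketch below shows a textbook constant-power panning law between two channels and its inverse, which recovers a pan angle from the two channel powers. It is a generic illustration under assumed conventions (angle 0 maps entirely to the first channel, pi/2 entirely to the second) and is not the VSLI decoder itself; both function names are hypothetical.

```python
import math


def constant_power_pan(source, angle):
    """Pan a sample between two channels with gains cos(angle), sin(angle).

    The squared gains always sum to one, so total power is preserved.
    """
    return source * math.cos(angle), source * math.sin(angle)


def estimate_pan_angle(power_a, power_b):
    """Invert the panning law from the two channel powers (mean squares)."""
    return math.atan2(math.sqrt(power_b), math.sqrt(power_a))


if __name__ == "__main__":
    a, b = constant_power_pan(1.0, math.pi / 6)
    angle = estimate_pan_angle(a * a, b * b)
    print(f"recovered angle: {math.degrees(angle):.1f} degrees")
```

The estimate depends only on the power ratio of the two channels, so it is unaffected by the overall level of the source.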

6684
Relating Auditory Attributes of Multichannel Reproduced Sound to Preference and to Physical Parameters
Choisel, Sylvain; Wickelmaier, Florian
Sound reproduced by multichannel systems is affected by many factors giving rise to various sensations, or auditory attributes. Relating specific attributes to overall preference and to physical measures of the sound field provides valuable information for a better understanding of the parameters playing a role in sound quality evaluation. Eight selected attributes are quantified by a panel of 39 listeners using paired-comparison judgments and probabilistic choice models, and related to overall preference. A multiple-regression model predicts preference well, and some similarities are observed within and between musical program materials, allowing for a careful generalization regarding the perception of spatial audio reproduction. Finally, a set of objective measures is derived from analysis of the sound field at the listening position in an attempt to predict the auditory attributes.

6685
Quality Degradation Effects Caused by Limiting the Bandwidth of Standard Surround Sound Channels and Hierarchically Encoded MSBTF Channels: A Comparative Study
Jiao, Yu; Rumsey, Francis; Zielinski, Slawomir K.
Limiting the bandwidth of multichannel audio can be used as an effective method of trading off audio quality against broadcasting costs. In this paper, the subjective effects of two controlled high-frequency limitation methods on multichannel audio quality were studied with formal listening tests. The first method was based on limiting the bandwidth of standard surround sound channels (Rec. ITU-R BS.775-1); the second involved limiting the bandwidth of the hierarchically encoded MSBTF channels. The results are compared and discussed. In this experiment, the Low Frequency Effect (LFE) channel was omitted.

6686
Initial Developments of an Objective Method for the Prediction of Basic Audio Quality for Surround Audio Recordings
George, Sunish; Rumsey, Francis; Zielinski, Slawomir K.
This paper describes the development of an objective method for the prediction of the Basic Audio Quality (BAQ) of bandlimited or down-mixed surround audio recordings. A number of physical parameters, including interaural cross-correlation coefficient and spectral descriptors, were extracted from the recordings and used in a linear regression model to predict BAQ scores obtained from listening tests. The results showed a high correlation between the predicted scores and those obtained from the listening test, with the average error of prediction being smaller than 10%. Although the method was originally developed for 5-channel surround recordings, after some modifications it can be extended to any number of audio channels.

6687
Listener Opinions of Novel Spatial Audio Scenes
Beresford, Kathryn; Rumsey, Francis; Zielinski, Slawomir K.
Listener opinions for alternative approaches to recording multichannel classical music were investigated, particularly considering alternatives to the traditional approach. Recordings were made with pre-existing microphone arrays but alternative arrangements of musicians. These were used in a listening test to assess different attributes (timbral balance, envelopment, locatedness etc.). From the results it was noted that naïve and trained listeners assessed the recordings in different ways. Through factor analysis, two components were identified to represent these assessments – creativity and conventionality. The naïve listeners indicated that purchasability was closely related to creativity whereas for the trained listeners, conventionality was an indicator of purchasability. A method for predicting purchasability was developed which may aid future work in the area.

6688
Low Frequency Sound Field Enhancement System for Rectangular Rooms using Multiple Low Frequency Loudspeakers
Birkedal Nielsen, Sofus; Celestinos, Adrian
Rectangular rooms have a strong influence on the low frequency performance of loudspeakers. Simulations of three different room sizes have been carried out using the finite-difference time-domain (FDTD) method in order to predict the behaviour of the sound field at low frequencies. By using an enhancement system with extra loudspeakers, the sound pressure level distribution across the listening area shows a significant improvement in the subwoofer frequency range. The system is simulated and implemented in the three different rooms and finally verified by measurements in the real rooms.

6689
Tactile Strategies and Resources for Teaching Multichannel Sound Concepts
Gaston, Leslie
Several university audio programs now incorporate multichannel, or “surround” sound into their curricula. In order to supplement these courses and lectures, many opportunities exist to incorporate hands-on demonstrations of concepts used for microphone techniques, mixing, monitoring, and mastering. This paper will give suggestions for different tactile strategies which can be used to illustrate concepts in multichannel audio, as well as other resources which may be utilized when doing preparation and research for teaching classes. Suggestions for homework and research topics for students will also be provided, along with recommended equipment needs.

6690
All Amplifiers are Analogue, but Some Amplifiers are More Analogue than Others
Groenenberg, René; Putzeys, Bruno; van der Hulst, Paul; Veltman, André
This paper intends to clarify the terms "digital" and "analogue" as applied to class-D audio power amplifiers. Since loudspeaker terminals require an analogue voltage, an audio power amplifier must have an analogue output. If its input is digital, digital-to-analogue conversion is executed at some point. Once a designer acknowledges the analogue output properties of a class-D power stage, amplifier quality can improve. The incorrect assumption that some amplifiers are digital causes many designers to come up with twisted digital patches to ordinary analogue phenomena such as timing distortion or supply rejection. This irrational approach blocks the way to a rich world of well-established analogue techniques to avoid many of these problems and realize otherwise unattainable characteristics such as excellent THD+N and extremely low output impedance throughout the audio band.

6691
Towards an Ideal Switching (Class-D) Power Amplifier: How to Control the Flow of Power in a Switching Power Circuit
Esslinger, Rolf; Jurzitza, Dieter
The design of a switching (class-D) audio power amplifier suitable for high-end audio applications is still a very challenging task for circuit design and signal processing engineers. Classical power stage topologies using Pulse-Width Modulation (PWM) in combination with voltage-controlled MOSFET H-bridges are already available on the market, but their performance in terms of signal bandwidth and linearity is still far below that of traditional class-A and A/B power stages. Moreover, EMC is an issue that is very hard to control. Class-D output stages are considered from a totally different point of view in this paper: the flow of power in the output stage, comprising the switching power stage as a “power control element”, the output filter as an “energy store” and the load as both a “power sink” and a “power source” in case the load is not a resistor but a real-world loudspeaker. It is shown where in a typical power stage the power loss, which is dissipated as heat, occurs. To improve the quality and efficiency of high-frequency switched power stages, it is necessary to investigate how to control the flow of power into the storage elements and how to charge them most precisely and most efficiently. Some fundamental approaches for this will be shown in this paper.

6692
Second Generation Intelligent Class D Amplifier Controller Integrated Circuit Enables both Low Cost and High Performance Amplifier Designs
Andersen, Jack; Chieng, Daniel; Harris, Steven; Klaas, Jeff; Kost, Michael; Taylor, Skip
This paper describes a digital input Class D amplifier controller integrated circuit which performs many of the functions needed to build a high performance Class D audio amplifier. Sophisticated digital Pulse Width Modulation, combined with digital feed-forward and feedback paths, yields both low cost and high performance amplifier designs. A powerful DSP is included to support amplifier control and allows comprehensive audio signal processing, including loudspeaker load compensation, EQ, time alignment, room acoustics compensation, bass enhancement, loudspeaker driver protection, virtual surround and other audio signal processing tasks. Power supply feed-forward and closed-loop feedback technology correct for power supply variations, non-linearity and other distortion-inducing mechanisms.

6693
PWM Amplifier Control Loops with Minimum Aliasing Distortion
Neesgaard, Claus; Risbo, Lars
PWM class-D audio power amplifiers typically contain a control loop filter network and a comparator producing the PWM signal. The comparator performs a sampling operation whenever it changes state. A previous paper by the author analyzed this sampling behavior from a small signal point of view. The present paper attempts to formulate a large-signal model that accounts for the non-linear effects of the sampling due to aliasing of high frequency carrier components. The model is validated using simulations and a class of loop filters is presented that obtains minimum aliasing distortion thanks to the use of quadrature sampling. Finally, measurement data are presented for real applications using the principles described.

6694
Simple, Ultralow Distortion Digital Pulse Width Modulator
Putzeys, Bruno
A core problem with digital Pulse Width Modulators is that effective sampling occurs at signal-dependent intervals, falsifying the z-transform on which the input signal and the noise shaping process are based. In a first step the noise shaper is reformulated to operate at the timer clock rate instead of the pulse repetition frequency. This solves the uniform/natural sampling problem, but gives rise to new non-linearities akin to ripple feedback in analogue modulators. By modifying the feedback signal such that it reflects only the modulated edge of the pulse train this effect is practically eliminated, yielding vastly reduced distortion without increasing complexity.

6695
A High Performance Open Loop All-digital Class-D Audio Power Amplifier using Zero Positioning Coding (ZePoC)
Mathis, Wolfgang; Schnick, Olaf
Open loop all-digital Class-D amplifiers are uncommon because the lack of a correcting feedback path leads to several problems, resulting in high distortion compared to analog-controlled class-D amplifiers. This paper shows that SB-ZePoC lowers the switching frequency to 100 kHz. These problems can therefore be solved, so that it is possible to design an open loop all-digital class-D audio amplifier with total distortion below 0.01% in the whole listening band (20 Hz-20 kHz) and an efficiency that reaches 90%. Results of a test setup will be presented. The sonic performance will be demonstrated during the session.

6696
A Three Level Trellis Noise Shaping Converter for Class D Amplifiers
Ausiello, Ludovico; Rovatti, Riccardo; Setti, Gianluca
Class D amplifiers can represent signals with three different output levels, +Vcc, 0, -Vcc, with no distortion. Exploiting this in order to achieve a better performance with no switching frequency increase, an extension to the classic Pulse Width Modulation two level A/D conversion is proposed. Coding is achieved by extending output waveforms of a Trellis based Sigma Delta Modulation to three levels. Simulation results have shown that, using the same symbol rate, a three level pattern achieved from 3.7 to 8.2 dB of SINAD improvement and a power consumption up to 5 times smaller.

6697
Using SIP Techniques to Verify the Trade-off between SNR and Information Capacity of a Sigma Delta Modulator
Ho, Charlotte; Ling, Bingo Wing-Kuen; Reiss, Joshua D.
The Gerzon-Craven noise shaping theorem states that the ideal information capacity of a sigma delta modulator design is achieved if and only if the noise transfer function (NTF) is minimal phase. In this paper, it is found that there is a trade-off between the signal-to-noise ratio (SNR) and the information capacity of the noise shaped channel. In order to verify this result, loop filters satisfying and not satisfying the minimal phase condition of the NTF are designed via semi-infinite programming (SIP) techniques and solved using dual parameterization. Numerical simulation results show that the design with a minimal phase NTF achieves near the ideal information capacity of the noise shaped channel, but the SNR is low. On the other hand, the design with a non-minimal phase NTF achieves a positive value of the information capacity of the noise shaped channel, but the SNR is high. Results are also provided which compare the SIP design technique with Butterworth and Chebyshev structures and ideal theoretical SDMs, and evaluate the performance in terms of SNR and a variety of information theoretic measures which capture noise shaping qualities.

6698
Estimation of Initial States of Sigma-delta Modulators
Ho, Charlotte; Ling, Bingo Wing-Kuen; Reiss, Joshua D.
In this paper, an initial condition of a sigma-delta modulator is estimated based on quantizer output bit streams and an input signal. The set of initial conditions that generate a stable trajectory is characterized. It is found that this set, as well as the set of initial conditions corresponding to the quantizer output bit streams, are convex. Also, it is found that the mapping from the set of initial conditions to the stable admissible set of quantizer output bit streams is invertible if the loop filter is unstable. Hence, the initial condition corresponding to given stable admissible quantizer output streams and an input signal is uniquely defined when the loop filter is unstable, and a projection onto convex set approach is employed for approximating the initial condition.

6699
High Performance Real-time Software Asynchronous Sample Rate Converter Kernel
Heeb, Thierry
A scalable real-time asynchronous sample rate converter software kernel is presented that offers a flexible alternative to the usual hardware implementations. The kernel is dynamically configurable at run-time and supports almost arbitrary upsampling or downsampling ratios and any number of channels. Due to its scalability this sample rate converter kernel may be used both for low complexity, cost-sensitive implementations as well as for top-performance applications. In a typical high performance application, sample rates of 384 kHz are easily achieved on a low cost DSP and DSD input data streams are also supported for compatibility with SACD.

6700
Clean Clocks, Once and for All?
Frandsen, Christian G.; Travis, Chris
Network-based digital audio interfaces are becoming increasingly popular. But they do pose a significant jitter problem wherever high-quality conversion to/from analog is required. This is true even with networks such as 1394 that provide dedicated support for isochronous flows. Conventional PLL solutions have too-little jitter attenuation, too-much intrinsic jitter, and/or too-narrow a frequency range. More-advanced solutions tend to have too-high a cost. We present a new clocking technology that boasts high performance and low cost. It has been implemented in a recent audio-over-1394 chip. We show comparative performance results and explore system-level implications, including for systems that use point-to-point links such as AES3, SPDIF and ADAT.

6701
Comprehensive Analysis of Loudspeaker Span Effects on Crosstalk Cancellation in Spatial Sound Reproduction
Bai, Mingsian R.; Lee, Chih-Chung
This paper seeks to pinpoint the optimal loudspeaker span that best reconciles the robustness and performance of the crosstalk cancellation system (CCS). Two sweet spot definitions are employed for assessment of robustness. Besides the point source model, head related transfer functions are employed in the simulation to capture more design aspects in practical situations. Three span angles, 10 degrees, 60 degrees, and 120 degrees, are compared via objective and subjective experiments. Analysis of Variance is applied for analysis. The results indicate that not only the CCS performance but also the panning effect and head shadowing will dictate the overall performance and robustness. The 120-degree arrangement performs comparably to the 60-degree arrangement, and is preferred over the 10-degree arrangement.

6702
A Perceptual Measure for Assessing and Removing Reverberation from Audio Signals
Buchholz, Jörg; Hatziantoniou, Panagiotis; Mourjopoulos, John; Zarouchas, Thomas
A novel signal-dependent approach is followed here for modeling perceived distortions due to reverberation in audio signals. The method attempts to describe perceived monaural time-frequency and level distortions due to reverberation. A Computational Auditory Masking Model (CAMM) is employed, using as inputs the reverberant and reference (anechoic) signal, generating time-frequency maps of perceived distortions. From these maps and in a number of sub-bands, gain vs time functions are derived allowing suppression of reverberation in the processed signal.

6703
Investigating Spatial Audio Coding Cues for Meeting Audio Segmentation
Burnett, Ian; Cheng, Eva; Ritz, Christian
As multiparty meetings involve participants that are generally stationary when actively speaking, participant location information can be used to segment the recorded meeting audio into speaker ‘turns.’ In this paper, speaker location information derived from ‘spatial cues’ generated by spatial audio coding techniques is investigated. The validity of using spatial cues for meeting audio segmentation is explored through investigating multiple microphone meeting audio recording techniques and extracting and comparing spatial cues used by different spatial audio coders. Experimental results show the statistical relationship between speaker location and interchannel level and phase-based spatial cues strongly depends on the microphone pattern. Results also indicate that interchannel correlation-based spatial cues represent location information that is ambiguous for meeting audio segmentation.

6704
The Effect of Audio Compression Techniques on Binaural Audio Rendering
Katz, Brian F.G.; Prezat, Fabien
The use of “lossy” audio compression is becoming increasingly common. Many studies have concentrated on the audio quality of such compression techniques, predominantly in a monaural context. This study investigates the effects of audio compression techniques on spatialized audio, specifically binaural audio. Various compression techniques (AAC, ATRAC, MP2, and MP3) using various bitrates when possible have been applied to several test signals. This work presents numerical and perceptive comparisons of the variations in inter-aural time difference (ITD) due to audio compression techniques. Some investigations were also made concerning the effect on spectral peaks and notches, as these spectral cues (contained in the Head-Related Transfer Function, HRTF) are necessary for more precise localization including front-back discrimination and elevation.

6705
Sound Source Obstruction in an Interactive 3Dimensional MPEG-4 Environment
Reiter, Ulrich; Steglich, Beatrix
This paper describes the continuation of research concerning sound source obstruction in virtual scenes. An algorithm for the determination of sound source obstruction was implemented in the described 3D MPEG-4 Environment. With the help of the MPEG-4 Advanced AudioBIFS node AcousticMaterial acoustic properties are assigned to potential obstructors in a virtual scene. Various implementations of acoustic obstruction are explained. Furthermore, a bimodal subjective assessment was performed in order to identify the best implementation of obstruction. The results of the assessment are presented in-depth. Additionally we demonstrate a concept for a second intended bimodal assessment for the comparison of gain and frequency filtering and give an outlook for further research and development in the area of immersive acoustics.

6706
JavaOL - A Structured Audio Orchestra Language: Tools, Player and Streaming Engine
Siao, Yi-Song; Su, Alvin W.Y.; Wang, Tien-Ming
MPEG-4 Structured Audio (SA) [1] defined a set of tools to provide high quality low bit rate audio. In MPEG-4 SA, SAOL (Orchestra Language) is the most important part because it is used to implement algorithms to generate sounds. However, SAOL currently must be translated into another programming language before it can be executed. This requires a great deal of computing power to achieve real-time decoding. Based on MPEG-4 SAOL, we propose JavaOL because it eliminates the translation process and is more efficient. In fact, it is Java equipped with a SAOL opcode library. Therefore, one can achieve the same functions provided by SAOL. We also provide an RTP streaming engine and the associated player. Software tools are provided to combine other audio sources.

6707
Using Remote Recording over the Internet in Education
Baillie, Lynne; Dewar, Martin; Harrison, David; Knox, Don; Quinn, Patrick
Remote recording across the Internet now appears to have come of age with the recent development of appropriate software and infrastructure. Within the educational sector the Internet has taken a central role as a means to deliver educational materials. In this innovative pilot project involving Glasgow Caledonian University and its partner Coatbridge College, the use of the Internet to teach audio technology and production techniques will be explored and evaluated. It is anticipated that the knowledge and experience gained will better prepare the audio professionals of the future.

6708
A Community Hierarchic Based Approach for Scalable Parametric Audio Multicasting Over the Internet
Cuevas-Martinez, Juan Carlos; Garrido-Rivera, P.J.; Ruiz-Perez, J.; Ruiz-Reyes, Nicolas; Vera-Candeas, Pedro
One of the main features of a low bit rate audio coder is its suitability for broadcast over media, mainly the Internet and mobile networks. It is well known that this is not a trivial problem; many difficulties can appear in a multicasting system, mainly due to the Internet's lack of QoS. This kind of audio traffic has to coexist with TCP connections, has to avoid congestion and should require as few changes to network equipment as possible. In this paper we therefore propose some features to be taken into consideration in the development of a multicast and peer-to-peer communication protocol for scalable parametric audio broadcasting over the Internet with low bit rate and good quality.

6709
Distant Teaching of Chamber Music via Local Area Networks
Bitzer, Joerg; Kurtisi, Zefir; Loesch, Thomas; May, Tobias
In this paper we present a study on teaching chamber music via the Internet. The intended application is for a highly reputed teacher to teach professional musicians at a very high level. Usually, all participants would have to fly in from all over the world in order to work together. Therefore, it would be of great value if these lessons could be held via the Internet. Several audio and video devices and different audio setups have been tested. The results indicate that MPEG 2 broadcast devices with two microphones are suitable for this task.

6710
Implementation of Immersive Audio Applications using Robust Adaptive Beamforming and Wave Field Synthesis
Beracoechea, Jon Ander; Casajus, Javier; García, Lino; Ortiz, Luis; Torres-Guijarro, Soledad
An immersive audio system oriented to future communication applications is presented. The aim is to build a system where the acoustic field of a chamber is recorded using a microphone array and then reconstructed or rendered again in a different chamber using loudspeaker array based techniques. Our proposal relies on recent robust adaptive beamforming techniques and joint audio-video source localization for effectively estimating the original sources in the emitting room. The estimated source and the source localization information drive a Wave Field Synthesis engine that renders the acoustic field again at the receiving chamber. The overall system performance is tested using a MUSHRA-based subjective test in a real situation.

6711
Spatial Aliasing Artifacts Produced by Linear and Circular Loudspeaker Arrays used for Wave Field Synthesis
Rabenstein, Rudolf; Spors, Sascha
Wave field synthesis allows the exact reproduction of sound fields if the requirements of its physical foundation are met. However, the practical realization imposes certain technical constraints. One of these is the application of loudspeaker arrays as an approximation to a spatially continuous source distribution. The effect of a finite spacing of the loudspeakers can be described as spatial sampling artifacts. This contribution derives a description of the spatial sampling process for planar linear and circular arrays, analyzes the sampling artifacts and discusses the conditions for preventing spatial aliasing. It furthermore introduces the reproduced aliasing-to-signal ratio as a measure for the energy of aliasing contributions.

6712
Characterization of the Reverberant Sound Field Emitted by a Wave Field Synthesis Driven Loudspeaker Array
Caulkins, Terence; Warusfel, Olivier
Realistic sound reproduction using Wave Field Synthesis in concert halls involves ensuring that both the direct and reverberated sound fields are accurate at all listening positions. Though methods for controlling the direct sound field have been described in the past, the control of the reverberated sound field associated with WFS sources remains a topic of interest. This article describes the characterization of the reverberated sound field associated with a WFS array as it synthesizes a virtual point source. Variations in the directivity and positioning of the virtual source are shown to have an effect on the associated room effect. A solution for controlling the reverberated sound field in a concert hall equipped with a WFS system is proposed, based on this characterization.

6713
Conjugate Gradient Techniques for Multichannel Acoustic Echo Cancellation in Frequency Domain
Beracoechea, Jon Ander; Casajús-Quirós, Francisco Javier; García Morales, Lino; Torres-Guijarro, Soledad
The multichannel acoustic echo cancellation problem requires working with extremely large impulse responses. Multirate adaptive schemes such as the partitioned block frequency-domain adaptive filter (PBFDAF) are good alternatives and are widely used in commercial echo cancellation systems nowadays. However, because PBFDAF is a Least Mean Square (LMS) derived algorithm, its convergence may not be fast enough under some circumstances. In this paper we propose a new scheme which combines frequency-domain adaptive filtering with conjugate gradient techniques in order to speed up convergence. The new algorithm (PBFDAF-CG) is developed and its behavior is compared against previous PBFDAF schemes.

6714
SigmaStudio. A User Friendly, Intuitive and Expandable, Graphical Development Environment for Audio/DSP Applications.
Chavez, Miguel A.; Huin, Camille
Graphical development environments have been used in the audio industry for a number of years. Those with fewer limitations have persisted and found a well-established pool of users that is reluctant to modify their design patterns and adopt different embedded processors and design environments. This article will provide a short history of the evolution of integrated development environments (IDEs). It will then describe and explain the software architecture decisions and design challenges involved in developing SigmaStudio. It will also show the advantages that those decisions have brought to the SigmaDSP family of audio-centric embedded processors.

6715
Filter Update Techniques for Adaptive Virtual Acoustic Imaging
Kim, Youngtae; Mannerheim, Pal; Nelson, Philip A.
This paper deals with filter updates for adaptive virtual acoustic imaging systems using binaural technology and loudspeakers. The problem is to update the inverse filters without creating any audible changes for the listener. The problem can be overcome either by using a very fine mesh for the inverse filters or by using commutation techniques.

6716
Adaptive Filters in Wavelet Transform Domain
Bajic, Vladan
The paper presents a performance comparison between two methods of implementing adaptive filtering algorithms for noise reduction, namely the Normalized time domain Least Mean Squares (NLMS) algorithm and the Wavelet transform domain LMS (WLMS). A brief theoretical development of both methods is given, and then both algorithms are implemented on a real-time Digital Signal Processing (DSP) system used for audio signal processing. Results are presented showing the performance of each algorithm in both the time and frequency domains. Noise reduction effects produced by the different algorithms are shown across the spectrum, and distorting effects are analyzed. Trade-offs of convergence speed versus added noise are analyzed. Overall results show a convergence speed improvement when using WLMS algorithms over the NLMS algorithm.
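As an illustration of the time-domain half of this comparison, the following is a minimal sketch of a generic NLMS noise canceller. It is not the authors' implementation: the function name `nlms`, the tap count, the step size `mu`, and the synthetic three-tap noise path are all assumptions chosen for the example.

```python
import math
import random


def nlms(reference, desired, num_taps=8, mu=0.2, eps=1e-8):
    """Generic NLMS adaptive FIR filter (illustrative sketch).

    reference: noise reference samples
    desired:   primary input (wanted signal plus filtered noise)
    Returns (error_signal, final_weights); the error signal is the
    noise-reduced output.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent reference samples, newest first
    errors = []
    for x, d in zip(reference, desired):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # noise estimate
        e = d - y                                    # cleaned sample
        norm = eps + sum(bi * bi for bi in buf)      # input power normalisation
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return errors, w


# Synthetic test data (all values are assumptions for the example).
random.seed(0)
n = 4000
noise = [random.uniform(-1.0, 1.0) for _ in range(n)]
path = [0.6, -0.3, 0.1]  # hypothetical acoustic path from noise source to primary input
filtered = [
    sum(path[k] * noise[i - k] for k in range(len(path)) if i - k >= 0)
    for i in range(n)
]
tone = [0.1 * math.sin(2.0 * math.pi * 0.01 * i) for i in range(n)]
primary = [s + v for s, v in zip(tone, filtered)]

cleaned, weights = nlms(noise, primary)
```

After convergence the first three weights should approximate the assumed noise path, and `cleaned` should be close to the tone. A wavelet-domain variant would apply the same kind of update to transformed input vectors; that part is not sketched here.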

6717
Adaptive Time-Frequency Resolution for Analysis and Processing of Audio
Lukin, Alexey; Todd, Jeremy
Filter banks with fixed time-frequency resolution, such as the STFT, are a common tool for many audio analysis and processing applications, allowing effective implementation via the FFT. The major problem with the STFT approach is its fixed time-frequency resolution. In this paper, we suggest adaptively varying the STFT time-frequency resolution in order to reduce filter bank specific artifacts such as pre-echo while retaining adequate frequency resolution. Several strategies for systematic adaptation of time-frequency resolution are proposed. The introduced approach is demonstrated for the problems of spectrogram display, noise reduction, and spectral effects processing.

6718
Advanced Methods for Shaping Time-Frequency Areas for the Selective Mixing of Sounds
Kleczkowski, Adam; Kleczkowski, Piotr
The Selective Mixing of Sounds (presented in AES paper 6552) contains a large and conceptually challenging part which had not been developed previously: a method of determining the areas in the time-frequency plane. It has a major effect on the quality of the sound. In this paper we propose and compare a range of appropriate algorithms. We begin with a simple two-dimensional running mean combined with a rule selecting the track characterised by the maximum energy, followed by a low-pass filter based on the 2-dimensional Fourier transform. We also propose several novel methods based on a Monte Carlo approach, in which local probabilistic rules are iterated many times to produce a required level of smoothing.

6719
Demixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
Bonada, Jordi; Loscos, Alex; Vinyes, Marc
Blind audio separation in real commercial music recordings is still an open problem. In the last few years some techniques have provided interesting results. This article presents a human-assisted clustering of the DFT coefficients for the Time-Frequency Masking demixing technique. The DFT coefficients are grouped by adjacent panning and phase difference, and an interactive graphical interface allows the user to select panning and phase difference ranges for each song in real time. Results show that an implementation of such a technique can be used to demix tracks from today's commercial songs. Sample sounds can be found here: http://www.iua.upf.es/~mvinyes/abs/demos

6720
A Multichannel Speech Dereverberation Technique Based Upon the Wiener Filter
Boland, Frank; McCarthy, Denis
We present a new method for dereverberating speech, based upon a multichannel Wiener Filter and a microphone array. We demonstrate the effectiveness of this method under real, reverberant conditions and show that the method may be described as a self-steering beamformer. Furthermore, we investigate the performance of the method under simulated conditions designed to closely match the acoustic characteristics of the real room environment. These simulations yield significantly inferior results to those obtained using real recordings, and we show that this results from the failure of simulated impulse responses to accurately model real impulse responses in certain critical respects.

6721
Effective Room Equalization Based on Warped Common Acoustical Poles and Zeros
Jeong, Jae-woong; Jeong, Seh-Woong; Lee, Junho; Park, Young-cheol; Youn, Dae-hee
This paper presents a new method of designing room equalization filters using a warped common acoustical pole and zero (WCAPZ) modeling. The proposed method is capable of significantly reducing the order of the equalization filters without sacrificing the filter performance. Thus, the associated input-output delay is much smaller than the conventional room equalization method while its computational complexity is comparable to it. This paper also presents an adaptive IIR filter structure to overcome computational problems associated with calculation of CAPZ coefficients. Simulation results confirm that the use of the proposed algorithm significantly improves the room equalization over a range of low frequencies.

6722
Parametric Recursive Higher-Order Shelving Filters
Holters, Martin; Zölzer, Udo
The main characteristic of shelving filters, as commonly used in audio equalization, is to amplify or attenuate a certain frequency band by a given gain. For parametric equalizers, a filter structure is desirable that allows independent adjustment of the width and center frequency of the band, and the gain. In this paper, we present a design for arbitrary-order shelving filters and a suitable parametric structure. A low shelving filter design based on Butterworth filters is decomposed such that the gain can be easily adjusted. Transformation to the digital domain is performed, keeping gain and denormalized cut-off frequency independently controllable. Finally, we obtain mid and high shelving filters using a simple manipulation, providing the desired parametric filter structure.

6723
Enhanced Control of Sound Field Radiated by Co-axial Loudspeaker Systems using Digital Signal Processing Techniques
Boucher, Jean Marc; Debail, Bernard G.A.; Diquelou, Pierre Yves; Kerneis, Yvon; Shaiek, Hmaied
In multi-way loudspeakers, DSP techniques have been used so far mainly to correct frequency response, time alignment and off-axis lobing. In this paper, a dedicated signal processing technique is described in order to also control the sound field radiation of a new 4-way co-axial source in the overlap frequency band of the drivers. Trade-offs and practical constraints (crossover, time shift, gain ...) are discussed and an optimization algorithm is proposed to provide the best achievable result. A real-time implementation of this technique dedicated to the new 4-way co-axial source is presented and leads to a nearly ideal point source.

6724
Network Music Performance (NMP) in Narrow Band Networks
Carôt, Alexander; Krämer, Ulrich; Schuller, Gerald
Playing live music on the Internet is one of the hardest disciplines in terms of low delay audio capture and transmission, time synchronization and bandwidth requirements. This has already been successfully evaluated with the Soundjack software which can be described as a low latency UDP streaming application. In combination with the new Fraunhofer ULD Codec this technology could now be used in narrow band DSL networks without a significant increase of latency. This paper first describes the essential basics of network music performances in terms of soundcard and network issues and finally reviews the context under DSL narrow band network restrictions and the usage of the ULD Codec.

6725
Intensive Noise Reduction utilizing Inharmonic Frequency Analysis of GHA
Kanda, Yoshihiro; Muraoka, Teruo; Ohta, Takumi; Takamizawa, Ryuji
Noise removal in SP record reproduction was attempted using GHA (Generalized Harmonic Analysis), a form of inharmonic frequency analysis. Spectral subtraction is the most common conventional noise reduction technique; however, it has the side effect of generating musical noise. This is caused by the limited frequency resolution inherent to conventional harmonic frequency analysis. GHA, as an inharmonic frequency analysis, offers excellent frequency resolution and has recently been put into practical use. The authors applied GHA to noise reduction and obtained better results than with conventional spectral subtraction. However, musical noise problems still remained, the major reason being the spectral mismatch between the pre-sampled reference noise and the residual noise actually remaining. The authors tried several countermeasures, such as pre-spectral shaping of the object signal and spectral similarity calculation of the residual noise. By combining these countermeasures, the authors achieved satisfactory noise reduction.

6726
Advanced Cataloging and Search Techniques in Audio Archiving
Blohmer, Helge
Ever since the late 1980s, when the processing capabilities of computers reached the point where audio indexing and searching became possible using techniques beyond simple, manually entered textual annotation, researchers have been developing such methods with varying degrees of success. Yet even today, the actual workflow in audio archives is dominated by text entry for cataloging and keywords for searching, with few or none of the new methods having achieved any practical relevance. This paper evaluates a number of techniques, both those that enhance textual retrieval and those that seek to supplant it, for their suitability for real-world audio archiving tasks, with special focus on short-term implementation and seamless integration into existing archive workflows.

6727
Evaluation of Query-by-Humming Systems using a Random Melody Database
Batke, Jan-Mark; Eisenberg, Gunnar
The performance of melody retrieval using a query-by-humming (QBH) system depends on different parameters. For the query, parameters like the length of the query and any errors it contains influence the success of the retrieval. The size of the melody database inside the QBH system also has a certain impact on the query. This paper describes how the statistical parameters of a random melody database are modelled to obtain the same behaviour as a database containing authentic melodies. Databases containing random melodies serve as a testing facility for QBH systems.

6728
MP3 Window-switching Pattern Analysis for General Purposes Beat Tracking on Music with Drums
D'Aguanno, Antonello; Haus, Goffredo; Vercellesi, Giancarlo
This work analyzes the dependency of the window-switching pattern on different encoders, bit rates and encoder quality features. We propose a simple template-matching algorithm to solve the beat-tracking problem in music with drums. This algorithm uses window-switching pattern information only. Commonly, in a beat-tracking system, the window-switching pattern is used to refine the results of a frequency evaluation. Furthermore, this paper aims to demonstrate the reliability of the window-switching pattern for solving the beat-tracking problem in music with drums independently of encoders, bit rates, encoder quality features and frequency analysis. This paper confirms that the window-switching pattern is adequate information in a beat-tracking context at every bit rate and for every encoder.

6729
Application of MPEG-4 SLS in MMDBMSs – Requirements for and Evaluation of the Format
Meyer-Wegener, Klaus; Penzkofer, Florian; Suchomski, Maciej
This paper describes specific requirements for audio storage in multimedia database management systems, where data independence of continuous data plays a key role. The characteristics of the internal format for natural audio are defined with particular regard to long-term storage, where the lossless property of the format is a must because, among other things, it allows easy upgrade of the system. Based on these characteristics, the new MPEG-4 scalable lossless audio coding (SLS) is briefly explained. SLS is then evaluated with respect to the discussed requirements in terms of the characteristics and processing complexity of the algorithm. Some suggestions for possible modifications are given at the end.

6730
Applying EAI Technologies to Bimedial Broadcast Environments. Challenges, Chances and Risks.
Zimmermann, Michael
More and more broadcast companies try to optimize their production environments by enforcing bimedial workflows. Recent applications and tools, on the other hand, have only poor integration interfaces for achieving this goal. EAI, originally focused on the integration of legacy systems, has become a mature toolset for integrating various systems and offers tools and applications to ease integration. This lecture shows the possibilities and limits of EAI in bimedial broadcast environments.

6731
Virtual Concert: Spatial Sound in DVD Technology
Gordon, David M. H.
A comprehensive research paper documenting the use of spatial sound in DVD technology. Sets out to evaluate the communicative abilities of spatial sound and the implications of combining spatial sound along with selective multiple camera angles. Increases the rationale by investigating the use of a non-linear structure in the presentation of audio-visual DVD products. Asserts that no product currently integrates these deconstructed components into a singular framework, and therefore reports on the development of a concept titled Virtual Concert. Discusses the underlying concept of Virtual Concert, in relation to the combination of surround sound music mixes with the corresponding camera angle, presented in a non-linear structure. The emphasis is on practical subjective evaluation through a screening of Virtual Concert, and subsequent distribution of comprehensive questionnaires.

6732
The Adaptation of Concert Hall Measures of Spatial Impression to Reproduced Sound
Davies, W. J.; Hirst, Jonathan; Philipson, Peter
A method of objectively measuring the spatial capabilities of multichannel sound systems has been investigated. The method involves the comparison of interaural cross correlation (IACC) measurements taken in a concert hall to IACC measurements taken in reproduced versions of the same concert hall. The type of reproduction system was varied and an indication of the spatial capabilities of each system was gained from the comparison of original and reproduced IACC measurements. The comparisons revealed that none of the reproduction systems was able to match the lowest IACC readings taken in the concert hall, and that the measurement method was capable of discriminating between the spatial performances of the reproduction systems and of ranking the systems' performances in an expected order.
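For readers unfamiliar with the measure, a common definition of IACC is the maximum absolute value of the normalised cross-correlation between the left-ear and right-ear signals over lags of about ±1 ms. The sketch below follows that general definition only; the function name `iacc`, the lag window and the test signals are assumptions, and the paper's exact measurement procedure (for example any octave-band filtering or time windowing) is not reproduced.

```python
import math
import random


def iacc(left, right, sample_rate, max_lag_ms=1.0):
    """Maximum absolute normalised interaural cross-correlation
    within +/- max_lag_ms (illustrative sketch)."""
    n = min(len(left), len(right))
    left, right = left[:n], right[:n]
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    if energy == 0.0:
        return 0.0
    max_lag = int(round(sample_rate * max_lag_ms / 1000.0))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        best = max(best, abs(acc) / energy)
    return best


# Synthetic ear signals (assumed values, for illustration only).
random.seed(1)
fs = 48000
sig = [random.gauss(0.0, 1.0) for _ in range(2000)]
other = [random.gauss(0.0, 1.0) for _ in range(2000)]
pad = [0.0] * 10

identical = iacc(sig, sig, fs)               # same signal at both ears
delayed = iacc(sig + pad, pad + sig, fs)     # right ear delayed by 10 samples
independent = iacc(sig, other, fs)           # unrelated signals at the two ears
```

Identical or purely delayed ear signals give values near 1, while independent signals give values near 0; lower IACC is generally associated with a greater sense of spaciousness, which is why the lowest concert hall readings are the hardest for a reproduction system to match.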

6733
Analysis of Spatial Resolution of Multiactuator Panels
Bleda, Sergio; Escolano, José; López, José Javier; Pueo, Basilio
A study of the aliasing frequency of Multiactuator Panels (MAPs) for Wave Field Synthesis (WFS) is presented. It is based on the periodicity of the spatial frequency in a wavenumber domain analysis. The success of these loudspeakers for WFS lies in the absence of exciter cross interference, so that the exciters act as single sources. However, the distance between exciters may not be indicative of the spatial resolution capability of the array. A set of four MAPs comprising 32 exciters was measured using this multidimensional analysis. An additional dynamic loudspeaker array having the same loudspeaker spacing was also measured. Results show a good correlation with the expected figures given by the distance between exciters.

6734
New CLD Quantization Method for Spatial Audio Coding
Choi, Seung Jong; Jung, Yang-Won; Kim, Hyo Jin; Oh, Hyen-O
In spatial audio coding, spatial parameters such as CLDs, CPCs and ICCs are utilized for down-mix and up-mix of multi-channel audio signals. In the current version of MPEG Surround, a universal quantization table for CLD is applied independently of the channel combination. As the intervals between adjacent channels differ in the conventional 5.1ch configuration, this universal quantization scheme causes redundancies in some combinations and insufficiencies in others. In this paper, we propose a new CLD quantization method based on the well-known amplitude panning law and the spatial resolution of human perception. With the proposed quantization method, CLD can be represented more efficiently, and therefore bit reduction and quality enhancement can be achieved.

6735
Koch’s Snowflake: A Case Study of Sound Scattering of Fractal Surfaces
Cabrera, Densil; Degos, David; Edson, Steven
Diffusion and scattering are becoming increasingly relevant in room acoustics design. The scattering performance of current passive diffusers is often restricted to a certain bandwidth due to physical constraints. One possible approach to this is to use fractal surface profiles, which have similar geometric features over a wide range of scales, and so should achieve an extended bandwidth for effective scattering. A range of acoustic panels of varying complexity, based around Koch’s Snowflake pattern, was constructed and tested using a two-dimensional pseudo-anechoic method adapted from the AES-4id-2001. This paper reports on these results, and also on issues encountered in implementing the measurements.

6736
Large Scale FEM Analysis of a Studio Room
Ahnert, Wolfgang; Bansal, Mahesh; Feistel, Stefan
In room acoustics, particle models like ray tracing and the image source method are not sufficient to explain wave behaviour, especially at low frequencies. For detailed acoustic investigation, many wave-based approaches like FEM, BEM and finite difference methods have been proposed. We present an application of large scale FEM analysis in order to obtain eigenmodes and transfer functions of a real-world studio with general impedance boundary conditions. The results thus obtained are compared with the measured data and are in fair agreement with them. Since FEM requires discretization of the domain into small elements such as tetrahedra and hexahedra, we also propose a novel all-hexahedral mesh generator for arbitrarily shaped rooms and show its application in room acoustics.

6737
Influence of Ray Angle of Incidence and Complex Reflection Factor on Acoustical Simulation Results (Part II)
El-Saghir, Emad; Feistel, Stefan
In a previous paper [1], it was shown that the influence of neglecting the incidence-angle dependence of absorption coefficients in a single-source, shoebox room model was insignificant as far as simulation results are concerned. Neglecting phase shift at each reflection led, however, to a significant difference in the predicted pressure level in the same model. This paper investigates the same two questions in a complicated model with several sources and a diversity of surface materials. It attempts to analytically estimate the error associated with the disregard of these two issues.

6738
Adaptive Audio Equalization of Rooms Based on a Technique of Transparent Insertion of Acoustic Probe Signals
Ferreira, Aníbal J. S.; Leite, António; Pinto, Francisco; Rocha, Ariel F.
This paper presents an enhanced method performing real time adaptive equalization of room acoustics in the frequency domain. The method obtains the frequency response of the room by means of transparent insertion of a certain number of acoustic probe signals into the main audio spectrum. The opportunities for the insertion of tones are identified by means of a spectral analysis of the audio signal based on a psychoacoustic model of frequency masking. This enhanced version of the adaptive equalizer will be explained as well as its real time implementation on a TMS320C6713 DSP based platform. Results of the acoustic tests will be discussed and conclusions about global performance will be presented.

6739
An Amphitheatric Hall Modal Analysis using the Finite Element Method Compared to in situ Measurements.
Kalliris, George; Papanikolaou, George; Papastefanou, Anastasia; Sevastiadis, Christos
The distribution of the low frequency room modes is important in room acoustics. The Finite Element Method (FEM) is a powerful numerical technique for analyzing the behavior of sound waves in enclosures, especially irregular ones. It also produces reliable results in the low frequency range, where other methods like ray tracing and image source methods fail. A modal analysis of a non-rectangular, medium-sized amphitheatric hall using the FEM is presented, and the calculated results are compared with those obtained by on-site measurements.

6740
A Computer Aided Design Method for the Dimensions of a Rectangular Enclosure to Avoid Degeneracy of Standing Waves
Liu, Zhi; Wu, Fan
A method for designing the dimensions of a rectangular enclosure to avoid degeneracy of standing waves, and the corresponding computer aided design software, are presented in this paper. A mathematical model is created to calculate sets of dimensions that avoid degeneracy of standing waves. Normal frequencies are regarded as degenerate when their similarity falls within a specified limit. Based on the relationship between the normal frequencies and the dimensions of a rectangular enclosure, dimensions that avoid degeneracy can be chosen. A Computer Aided Design program is also developed to identify such dimensions, which can conveniently be applied to the dimensional design of a loudspeaker’s cabinet or a room to get the best acoustic effect.
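The relationship between normal frequencies and dimensions mentioned above can be illustrated with the standard formula for a rigid-walled rectangular enclosure, f = (c/2)·sqrt((nx/Lx)² + (ny/Ly)² + (nz/Lz)²). The sketch below is not the authors' software: the function names, the speed of sound of 343 m/s, the mode index limit and the 1% closeness criterion for degeneracy are assumptions chosen for the example.

```python
import math
from itertools import product


def mode_frequencies(lx, ly, lz, c=343.0, max_index=3):
    """Normal frequencies of a rigid-walled rectangular enclosure.

    Returns a sorted list of (frequency_hz, (nx, ny, nz)) for all mode
    indices from 0 to max_index, excluding (0, 0, 0).
    """
    modes = []
    for nx, ny, nz in product(range(max_index + 1), repeat=3):
        if (nx, ny, nz) == (0, 0, 0):
            continue
        f = (c / 2.0) * math.sqrt(
            (nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2
        )
        modes.append((f, (nx, ny, nz)))
    return sorted(modes)


def degenerate_pairs(modes, rel_tol=0.01):
    """Pairs of frequency-adjacent modes closer than rel_tol (relative).

    The 1% default is an assumed criterion, not the paper's.
    """
    pairs = []
    for (f1, m1), (f2, m2) in zip(modes, modes[1:]):
        if (f2 - f1) <= rel_tol * f2:
            pairs.append((m1, m2, f1, f2))
    return pairs


# A cube is the classic worst case: its axial modes coincide exactly.
cube_modes = mode_frequencies(3.0, 3.0, 3.0)
cube_pairs = degenerate_pairs(cube_modes)

# An example room with unequal dimensions, for comparison.
room_modes = mode_frequencies(5.0, 4.0, 3.0)
room_pairs = degenerate_pairs(room_modes)
```

A design tool along these lines would scan candidate dimension sets and prefer those with the fewest or most widely spaced near-coincident modes; how the paper scores and constrains candidates is not stated in the abstract.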

6741
A 3D Acoustic Simulation Program with Graphical Frontend for Scene Input
Kuntz, Achim; Rabenstein, Rudolf
A program for full three-dimensional simulation of sound propagation in enclosures is presented which interfaces to a graphical interface for intuitive setup of complex simulation scenes. The simulation algorithm is based on the wave digital filtering principle, allowing for arbitrary reflection coefficients at object boundaries and walls for realistic results. Simulation scenes are defined in an object oriented way. As a graphical user interface to the simulation program a modeler front-end for a raytracing program is used. Simulation setups can thus be built by graphically placing objects in the scene. Being open source, the proposed modeler can easily be customized if required. Simulation results are shown for several example setups demonstrating the possibilities of our approach.

6742
Absorptive Material Arrangement Method for Global Interior Noise Control in Wide Frequency Range
Cho, Sung-Ho; Kim, Yang-Hann
A simple method is proposed to arrange absorptive material for global interior noise reduction over a wide frequency range. When an enclosure’s typical dimensions are of the order of several wavelengths or less, and the sources and enclosure are geometrically complex, it is not easy to decide how to control its noise effectively by attaching absorptive materials to its walls. The proposed method, however, will lead the designer to better understand which treatments are most effective and how a better design can be achieved. The beauty of the proposed method is that one can easily find an absorptive material arrangement for global noise reduction without having to calculate the sound field using a perturbation method or boundary element method. This means that one can effectively find the absorbent’s arrangement, because the method needs only the eigenstructures (eigenvalues and eigenfunctions) of an enclosure.

6743
Real Time Acoustic Rendering of Complex Environments Including Diffraction and Curved Surfaces
Bouatouch, Kadi; Deille, Olivier; Maillard, Julien; Martin, Jacques; Noé, Nicolas
A solution to produce virtual sound environments based on the physical characteristics of a modeled complex volume is described. The goal is to reproduce, in real time, the sound field depending on the position of the listener and to allow some interactivity (change in material characteristics for instance). First an adaptive beam tracing algorithm is used to compute a geometrical solution between the sources and several positions inside that volume. This algorithm is not limited to polygonal faces and handles diffraction. Then, the precomputed paths, once ordered and selected, are auralized and an adaptive artificial reverberation is used. New techniques to allow fast and accurate rendering are detailed. The proposed approach provides accurate audio rendering on headphones or within advanced multi-user immersive environments.

6744
Comparison between In-situ Recordings and Auralizations
Nijs, Lau; Rindel, Jens Holger; Saher, Konca
The doctoral research ‘Prediction and Assessment of Acoustical Quality in Living-rooms for People with Intellectual Disabilities’ at Delft University of Technology investigates, among other issues, the applicability and verification of auralization as a quality assessment tool in acoustical-architectural design. This paper deals with the comparison between binaural in-situ recordings and auralizations obtained from computer simulations. Listening tests and questionnaires were prepared from auralizations to compare with the reference binaural recordings. The difficulties in evaluating auralization quality are discussed. The results indicate that, although auralizations and binaural recordings evoke different aural perceptions, auralization is a strong tool for assessing the acoustical environment before the space is built. Two commercial programs (ODEON and CATT-Acoustics) are used for the auralizations.

6745
The Relationship between Selected Artifacts and Basic Audio Quality in Perceptual Audio Codecs
Marins, Paulo; Rumsey, Francis; Zielinski, Slawomir K.
Up to this point, perceptual audio codecs have been evaluated according to ITU-R standards such as BS.1116-1 and BS.1534-1. The majority of these tests tend to measure the performance of audio codecs using only one perceptual attribute, namely the basic audio quality. This approach, although effective in terms of the assessment of the overall performance of codecs, does not provide any further information about the perceptual importance of different artefacts inherent to low-bit rate audio coding. Therefore in this study an alternative style of listening test was performed, investigating not only basic audio quality but also the perceptual significance of selected audio artefacts. The choice of the artefacts included in this investigation was inspired by the CD-ROM published by the AES Technical Committee on Audio Coding entitled “Perceptual Audio Coders: What to Listen For”.

6746
Improved Noise Weighting in CELP Coding of Speech - Applying the Vorbis Psychoacoustic Model To Speex
Montgomery, Christopher; Valin, Jean-Marc
One key aspect of the CELP algorithm is that it shapes the coding noise using a simple, yet effective, weighting filter. In this paper, we improve the noise shaping of CELP using a more modern psychoacoustic model. This has the significant advantage of improving the quality of an existing codec without the need to change the bit-stream. More specifically, we improve the Speex CELP codec by using the psychoacoustic model used in the Vorbis audio codec. The results show a significant increase in quality, especially at high bit-rates, where the improvement is equivalent to a 20% reduction in bit-rate. The technique itself is not specific to Speex and could be applied to other CELP codecs.

6747
Reduced Bit Rate Ultra Low Delay Audio Coding
Hirschfeld, Jens; Krämer, Ulrich; Schuller, Gerald; Wabnik, Stefan
An audio coder with a very low delay (6-8 ms) for reduced bit rates is presented. Previous coder versions were based on backward adaptive coding, which has suboptimal noise shaping capabilities for reduced bit rate coding. We propose to use a different noise shaping method instead, resulting in an approach which uses forward adaptive predictive coding. We will show that, in comparison, the forward adaptive method has the following advantages: it is more robust against high quantization errors, has additional noise shaping capabilities, has a better ability to obtain a constant bit rate, and shows improved error resilience.

6748
Real-Time Subband-ADPCM Low-Delay Audio Coding Approach
Keiler, Florian
A low-delay audio codec using the ADPCM structure (ADPCM = adaptive differential pulse code modulation) in subbands is presented. With the use of 8 subbands a coarse spectral shaping of the coding noise is obtained and the signal delay is approximately 3 ms. The targeted bit rate is in the range from 128 to 176 kbit/s per channel for near transparent audio quality. The codec uses a cosine-modulated filterbank and backward adaptive calculation of the prediction coefficients and quantization scaling factors. The computations are optimized for a real-time implementation on a fixed-point DSP with an almost constant workload over time. A comparison with the Philips Subband Coder (SBC) and the Fraunhofer Ultra Low Delay Codec (ULD) is performed.

6749
Scalable Bitplane Runlength Coding
Dunn, Chris
Low-complexity audio compression offering fine-grain bitrate scalability can be realised with bitplane runlength coding. Adaptive Golomb codes are computationally simple runlength codes that allow bitplane runlength coding to achieve notable coding efficiency. For multi-block audio frames, coefficient interleaving prior to bitplane runlength coding also results in a substantial increase in coding efficiency. It is shown that bitplane runlength coding is more compact than the best known SPIHT arrangement for audio bitplane coding, and achieves coding efficiency that is competitive with fixed-rate quantisation.
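
As an illustration of the general idea behind bitplane runlength coding, the following is a minimal sketch and not the author's scheme. It assumes integer transform coefficients, a fixed Rice parameter k (the preprint uses adaptive Golomb codes, which are not reproduced here), and a decoder that knows the block length so trailing zeros need not be sent. Sign bits and coefficient interleaving are omitted. All function names are hypothetical.

```python
def bitplane(coeffs, plane):
    """Return bit number `plane` of each coefficient magnitude."""
    return [(abs(c) >> plane) & 1 for c in coeffs]


def zero_runs(bits):
    """Split a bit sequence into zero-run lengths, each terminated by a 1.

    Returns (runs, trailing), where `trailing` counts zeros after the last 1.
    """
    runs, run = [], 0
    for b in bits:
        if b:
            runs.append(run)
            run = 0
        else:
            run += 1
    return runs, run


def rice_encode(n, k):
    """Golomb-Rice code of a non-negative integer n with parameter k.

    The quotient n >> k is sent in unary (ones closed by a zero),
    followed by the k low-order bits of n.
    """
    prefix = "1" * (n >> k) + "0"
    if k == 0:
        return prefix
    return prefix + format(n & ((1 << k) - 1), "0{}b".format(k))


def encode_plane(coeffs, plane, k):
    """Encode one bitplane as a string of Rice-coded zero runs."""
    runs, _trailing = zero_runs(bitplane(coeffs, plane))
    return "".join(rice_encode(r, k) for r in runs)


if __name__ == "__main__":
    coeffs = [5, 0, 2, 8, 1, 0]
    # Sending planes from most to least significant lets the stream be
    # truncated after any plane, which is where the scalability comes from.
    for plane in (3, 2, 1, 0):
        print(plane, bitplane(coeffs, plane), encode_plane(coeffs, plane, 1))
```

Long zero runs, which dominate the upper bitplanes, cost few bits under this kind of code, which is the property the abstract relies on.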

6750
Scalable Audio Coder with Iterative Auditory Masking
Philippe, Pierrick; Veaux, Christophe
In this paper, reducing the cost of scalability is investigated. A coding scheme based on cascaded MDCT-transform is presented, for which masking thresholds are iteratively calculated from the transform coefficients quantized at previous layers. As a result, the masking thresholds are updated at the decoder in the same way as at the encoder without the need to transmit explicit information such as scale factors. By eliminating this overhead, this approach significantly improves the coding efficiency. It is also shown that further improvements are made possible by allowing the transmission of some side information depending on the frame or on the layer.

6751
A Frequency-domain Framework for Spatial Audio Coding Based on Universal Spatial Cues
Goodwin, Michael M.; Jot, Jean-Marc
Spatial audio coding (SAC) addresses the emerging need to efficiently represent high-fidelity multichannel audio. The SAC methods previously described involve analyzing the input audio for inter-channel relationships, encoding a downmix signal with these relationships as side information, and using the side data at the decoder for spatial rendering. These approaches are channel-centric in that they are generally designed to reproduce the input channel content over the same output channel configuration. In this paper, we propose a frequency-domain SAC framework based on the perceived spatial audio scene rather than on the channel content. We propose time-frequency spatial direction vectors as cues to describe the input audio scene, present an analysis method for robust estimation of these cues from arbitrary multichannel content, and discuss the use of the cues to achieve accurate spatial decoding and rendering for arbitrary output systems.

6752
Parametric Joint-Coding of Audio Sources
Faller, Christof
The following coding scenario is addressed: A number of audio source signals need to be transmitted or stored for the purpose of mixing stereo, multi-channel surround, wavefield synthesis, or binaural signals after decoding the source signals. The proposed technique offers significant coding gain when jointly coding the source signals, compared to separately coding them, even when no redundancy is present between the source signals. This is possible by considering statistical properties of the source signals, properties of mixing techniques, and spatial perception. The sum of the source signals is transmitted plus the statistical properties which determine the spatial cues at the mixer output. Subjective evaluation indicates that the proposed scheme achieves high audio quality.

6753
Improved Time Delay Analysis/Synthesis for Parametric Stereo Audio Coding
Faller, Christof; Tournery, Christophe
For parametric stereo and multi-channel audio coding, it has been proposed to use level difference, time difference, and coherence cues between audio channels to represent the perceptual spatial features of stereo and multi-channel audio signals. In practice, it has turned out that by merely considering level difference and coherence cues a high audio quality can already be achieved. Time difference cue analysis/synthesis did not contribute much to a higher audio quality, or even decreased audio quality when not done properly. However, for binaural audio signals, e.g. binaural recordings or signals mixed with HRTFs, time differences play an important role. We investigate problems of time difference analysis/synthesis with such critical signals and propose algorithms for improving it. A subjective evaluation indicates significant improvement over our previous time difference analysis/synthesis.

6754
Closing the Gap between the Multi-Channel and the Stereo Audio World: Recent MP3 Surround Extensions
Grill, Bernhard; Hellmuth, Oliver; Herre, Jürgen; Hilpert, Johannes; Plogsties, Jan
After more than 10 years of commercial availability, MP3 continues to be the most utilized format for compressed audio. New technologies extend the use from stereo to multichannel. Presented in 2004, the MP3 Surround format allows high-quality 5.1 surround sound to be represented at bitrates so far used for stereo signals while remaining compatible with any MP3 playback device. Recently, add-on technologies have complemented the usability of MP3 Surround: the capability of spatializing stereo content into MP3 Surround files provides listener envelopment also for the reproduction of legacy stereo content. Improved spatial reproduction is offered by the auralized reproduction of MP3 Surround via regular stereo headphones. This paper describes the underlying concepts and the interworking of the technology components.

6755
Design for High Frequency Adjustment Module in MPEG-4 HEAAC Encoder Based on Linear Prediction Method
Hsu, Han-Wen; Lee, Wen-Chieh; Liu, Chi-Min; Yang, Yung-Cheng
The high frequency adjustment module is the kernel module of spectral band replication (SBR) in MPEG-4 HE-AAC. The objective of high frequency adjustment is to recover the tonality of the reconstructed high frequencies. There are two crucial issues: the accurate measurement of tonality and the decision of shared control parameters. Control parameters, which are extracted according to signal tonalities, are used to determine the gain control and the energy level of additional components in the decoder. In other words, the quality of the reconstructed signal is directly related to the high frequency adjustment module. In this paper, an efficient method based on the Levinson-Durbin algorithm is proposed to measure the tonality by a linear prediction approach with adaptive orders to fit the different subband contents. Furthermore, the artifact due to the sharing of control parameters is investigated and an efficient decision criterion for the control parameters is proposed.
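
For readers unfamiliar with the Levinson-Durbin algorithm named in this abstract, the following is a minimal textbook sketch of the recursion. It is not the authors' method. Using the prediction gain r[0] / err as a tonality indicator is an assumption made here for illustration only, and the function names are hypothetical.

```python
def levinson_durbin(r, order):
    """Solve the linear prediction normal equations by Levinson-Durbin.

    r     : autocorrelation values r[0] .. r[order]
    order : predictor order
    Returns (a, err): a[0] = 1 and a[1:] are the prediction error filter
    coefficients; err is the residual prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # perfectly predictable signal, nothing left to model
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def prediction_gain(r, order):
    """Hypothetical tonality proxy: energy before / after prediction."""
    _, err = levinson_durbin(r, order)
    return float("inf") if err <= 0.0 else r[0] / err


if __name__ == "__main__":
    # Autocorrelation of a first-order autoregressive process with pole 0.5.
    r = [1.0, 0.5, 0.25]
    print(levinson_durbin(r, 2))
    print(prediction_gain(r, 2))
```

A strongly tonal subband is highly predictable and gives a large gain, while a noise-like subband gives a gain close to 1. Because the recursion yields the error energy at every stage, different orders can be compared at little extra cost.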

6756
Multi-Channel Noise-Reduction-Systems for Speaker Identification in an Automotive Acoustic Environment
Goetze, Stefan; Kammeyer, Karl-Dirk; Mildner, Volker
Devices for communication and information utilised by car drivers face two essential requirements: hands-free operation via distant microphones and robustness against different noises depending on car speed etc. Automatic speaker identification can be utilized within such devices either to supply speech recognition systems with so-called a priori information to achieve higher recognition rates or even to enable applications such as heating systems to adjust to the preferences of the driver. Thus identifying the driver from a predefined group of possible system users may be a task for future applications. The aim of this work is to investigate to what extent multi-channel noise reduction systems are suitable for improving the performance of speaker identification algorithms under different acoustic conditions in an automotive environment.

6757
Optimal Quantized Linear Prediction Coefficients for Lossless Audio Compression - Scalar Quantization Revisited
Ghido, Florin
Uniform scalar quantization of linear prediction coefficients with B bits is traditionally done by multiplying each coefficient by Q = 2^B and rounding it to the nearest integer. We propose an improved, optimal quantization method by replacing the rounding with a more elaborate procedure. It uses on average 2 bits less per quantized prediction coefficient for a similar sum of squared errors and allows an accurate estimate of the mean squared error misadjustment as a function of Q for a given subframe and predictor order M. We introduce several efficient time-constrained probabilistic search methods for obtaining near optimal solutions. There are no required changes at the decoder and the method is applicable to a wider range of cases (mono, stereo, and multichannel prediction) than quantization of reflection coefficients. Moreover, the method enables near optimal compression for 24 bit audio using only 32 bit arithmetic operations.
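
The traditional baseline described in the first sentence of this abstract can be sketched as follows. This shows only the conventional scale-and-round step, not the improved search procedure proposed in the preprint. The function names are hypothetical and the coefficient values are made up for illustration.

```python
import math


def quantize_coefficients(coeffs, bits):
    """Multiply each coefficient by Q = 2**bits and round to nearest integer."""
    q = 1 << bits
    # floor(x + 0.5) rounds half up; Python's round() rounds half to even.
    return [math.floor(c * q + 0.5) for c in coeffs]


def dequantize_coefficients(ints, bits):
    """Map the integers back to coefficient values by dividing by Q."""
    q = 1 << bits
    return [i / q for i in ints]


def max_rounding_error(coeffs, bits):
    """Largest absolute difference between original and reconstructed values."""
    recon = dequantize_coefficients(quantize_coefficients(coeffs, bits), bits)
    return max(abs(c - r) for c, r in zip(coeffs, recon))


if __name__ == "__main__":
    coeffs = [1.8125, -0.9, 0.3]  # illustrative values only
    print(quantize_coefficients(coeffs, 4))
    print(max_rounding_error(coeffs, 4))
```

With plain rounding, each coefficient is off by at most 1/(2Q), and every coefficient is rounded independently of the others.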

6758
Efficient Out of Head Localization System for Mobile Applications
Choi, Tacksung; Park, Young-cheol; Youn, Dae-hee
Headphone reproduction of stereo sources often gives in-the-head localization. One possible solution to this problem is to add directional filtering and a room response to the headphone reproduction system. Conventional out of head localization (OHL) schemes usually consist of a tapped delay line to simulate the direct signal path and early room reflections. Each of the taps must be filtered by a pair of HRTFs, which leads to a very high processing cost. Our study is based on the fact that spatial impression (SI) can increase the effects of OHL. Our aim is to generate the maximum SI at minimum cost. Through subjective listening tests, the degree of SI was found to be greatest for reflections within a 15 to 30 ms time frame from the direct sound, and greatest for those in opposite direction to the listener’s ears. Based on the test results, we propose an efficient OHL system. In the proposed system, multiple reflections are replaced by a pair of reflections, and the HRTF filtering required to simulate the directivity of the reflections is implemented using a set of first order IIR shelving filters. According to the subjective tests, the proposed system efficiently creates OHL at a small computational cost, and its performance is comparable to the conventional scheme of high complexity.

6759
A Psychoacoustic Noise Reduction Approach for Stereo Hands-Free Systems
Goetze, Stefan; Kammeyer, Karl-Dirk; Mildner, Volker
One demand for comfortable high quality hands-free video conferencing systems is the transmission of a spatial acoustical impression. Therefore a major task is the transmission of stereo speech signals from a noisy environment. The suppression of the noise components must not corrupt the stereo effect. In this context different single channel, multi-channel and hybrid speech enhancement systems will be evaluated in this contribution. The problem of musical noise in Post-Filter-algorithms is addressed. Therefore a psychoacoustic masking threshold for the noise reduction algorithms is considered.

6760
Estimation of Talker’s Head Orientation Based on Oriented Global Coherence Field
Brutti, Alessio; Omologo, Maurizio; Svaizer, Piergiorgio
This work describes a new method for estimating the orientation of an active sound source given a distributed microphone network. The technique requires that a set of microphone pairs be distributed in a room, and then it exploits the coherence computed from each sensor pair in order to derive an estimation of the head orientation. A database consisting of an audio sequence reproduced by a loudspeaker with different orientations and different positions was collected in order to evaluate the algorithm behavior. Experiments conducted on that database show that our approach can provide an efficient estimation of the sound source orientation, with an RMS error of about 10 degrees.

6761
High Quality Blind Bandwidth Extension of Audio for Portable Player Applications
Arora, Manish; Lee, Joonhyun; Park, Sangil
Bandwidth limitation in lossy audio coding schemes significantly reduces the perceived quality. Blind high frequency bandwidth extension schemes have been proposed but they don’t provide sufficiently good quality in applications where they are needed most, in portable audio devices with severe complexity constraints. The following work describes a high quality blind bandwidth extension method proposing efficient initial audio bandwidth detection and regenerated spectral envelope shaping enhancements. Objective measurements on processed signals show significant quality improvements with very low complexity requirements, which allows easy implementation on a wide variety of platforms.

6762
Coherence Enhanced Minimum Statistics Spectral Subtraction in Bi-microphone Systems
Fillion-Deneault, Jonathan; Lefebvre, Roch
A novel system for 2 channel spectral subtraction is presented. The objective is to improve the intelligibility of speech in noisy environments by enhancing the noise reduction of single microphone techniques as well as to greatly reduce the amount of musical noise that they introduce. The system consists of two different blocks: a generalized spectral subtraction block on the primary channel, using minimum statistics for noise estimation, followed by a perceptual domain coherence based post-filter for additional noise suppression. Subjective and objective testing of both simulated and real-world recordings shows that listeners prefer the proposed system to other state of the art speech enhancement techniques.

6763
Sound Field Analysis Based on Generalized Prolate Spheroidal Wave Sequences
Grenier, Yves; Guillaume, Mathieu
In this article, an array processing method is described to improve the quality of sound field analysis, which aims to extract spatial properties of a sound field. In this domain, spatial aliasing inevitably occurs due to the finite number of microphones used in the array. It is linked to the Fourier transform of the discrete analysis window, which consists of a mainlobe, fixing the resolution achievable by the spatial analysis, and of sidelobes, which degrade the quality of spatial analysis by introducing artifacts not present in the original sound field. A method to design an optimal analysis window with respect to a particular wave vector is presented, aiming to realize the best localization possible in the wave vector domain. Then the efficiency of the approach is demonstrated for several geometrical configurations of the microphone array, over the whole bandwidth of sound fields.

6764
Optimisation of Co-centred Rigid and Open Spherical Microphone Arrays
Jin, Craig; Parthy, Abhaya; van Schaik, Andre
We present a novel microphone array that consists of an open spherical array with a smaller rigid spherical array at its centre. The distribution of microphones, which results in the array having the largest frequency range, for a given beamforming order, was obtained by analysing microphone errors. For a fixed number of microphones, the results for several examples indicate that the maximum frequency range is obtained when the microphones are relatively evenly distributed between the open and rigid spheres.

6765
Review and Discussion on Classical STFT-based Frequency Estimators
Betser, Michaël; Collen, Patrice; David, Bertrand; Richard, Gaël
Sinusoidal modeling is based on the decomposition of audio signals into a sum of sinusoidal components plus a noise residual part. It involves accurate sinusoid parameter estimation and, in particular, accurate frequency estimation. A broad category of methods uses the Fast Fourier Transform (FFT) as a starting point to compute frequency. All these methods present very similar forms of estimators, but the relations between them are not yet fully understood. This work proposes to take a deeper look into these relations. The first goal of this work is to present a clear review and description of the classical FFT-based frequency estimators. A new estimator similar to the phase vocoder is presented. The second goal of this work is to identify the common hypotheses and the common steps of the process for this category of estimator. Lastly, experimental comparisons are given.

6766
Accurate Phase Estimation for Chirp-like Signals
Betser, Michaël; Collen, Patrice; Rault, Jean-Bernard
Sinusoidal modeling relies on the decomposition of a given signal (continuous or discrete) into a set of sinusoidal components plus a residual signal. The sinusoidal parameters, namely the amplitude, frequency and phase, may vary over time. Generally, the tracking of these parameters is performed via Short-Time Fourier Transform (STFT) analysis, which ultimately provides, for each sinusoidal component, estimates of the amplitude, frequency and phase for a considered time slot. The duration of the analysis time slots is chosen in order to guarantee that the signal under analysis is stationary enough to deliver useful data. If this requirement is not met, in particular if the frequency varies within the analysis slot, the phase estimation is biased. This paper introduces a method to estimate and to correct this bias as a function of the analysis parameters (window type and size) and of the frequency slope.

6767
Equalization of Audio Systems using Kautz Filters with Log-like Frequency Resolution
Karjalainen, Matti; Paatero, Tuomas
This paper presents a new digital filtering approach to the equalization of audio systems such as loudspeaker and room responses. The equalization scheme utilizes a particular infinite impulse response (IIR) filter configuration called Kautz filters, which can be seen as generalizations of finite impulse response (FIR) filters and their warped counterparts. The desired frequency resolution allocation is attained by appropriately choosing a set of fixed Kautz filter poles. The frequency resolution mapping is characterized by the allpass part of the Kautz filter. The second step in the actual equalizer design consists of assigning the Kautz filter tap-output weights, which is more or less a standard least-squares configuration. The proposed method is demonstrated using measured loudspeaker and room responses.

6768
Personal Audio Headrest
Chung, Chiho; Elliott, Steve J.
Active noise control was implemented using loudspeakers embedded in the headrests of two adjacent seats. The goal of this study was to create a quiet zone surrounded by headrest 1 which was free from noise caused by the loudspeaker mounted in the adjacent headrest 2. While headrest 1 was generating noise, headrest 2 was designed to cancel out noise that was being generated by headrest 1, by driving the anti-noise signal through the loudspeaker, without using earphones/headphones. The control source, the speaker of headrest 1 generating anti-noise, was derived by FIR convolution of the electrical signal going to the primary source, the speaker of headrest 2 generating target-noise. Implementing both primary and control sources results in a 20-30 dB noise reduction throughout the targeted frequency range (2 kHz and below) in terms of squared acoustic pressure.

6769
Accidental Wow Evaluation Based on Sinusoidal Modeling and Neural Nets Prediction
Czyzewski, Andrzej; Litwic, Lukasz; Maziewski, Przemyslaw
In this paper an algorithmic approach to the evaluation of the wow defect characteristic is presented. The approach is based on a sinusoidal analysis comprising both amplitude and phase spectra processing. The frequency trajectories depicting the distortion are built on the basis of amplitude, frequency and phase dependencies and are further used for wow characteristic evaluation. Additionally, experiments concerning neural-network-based prediction applied to the characteristic are performed. The results obtained are compared to linear prediction.

6770
An Ontology-based Approach to Information Management for Music Analysis Systems
Abdallah, Samer; Raimond, Yves; Sandler, Mark
We describe an information management system which addresses the needs of music analysis projects, providing a logic-based knowledge representation scheme for the many types of object in the domains of music and signal processing, including musical works and scores, performance events, human agents, signals, analysis functions, and analysis results. The system is implemented using logic-programming and semantic web technologies, and provides a shareable resource for use in a laboratory environment. The whole is driven from a Prolog command line, where the use of Matlab as a computational engine enables experiments to be designed and run with the results being automatically stored and indexed into the information structure. We present as a case-study an experiment in automatic music segmentation.

6771
Pyramidal Algorithm for the Restoration of Audio Signal Corrupted by Wideband Noise
Cohen, Azaria; Neoran, Itai
Restoration of noisy audio recordings seeks minimum degradation of sound and maximum suppression of noise. Spectral suppression methods perform best with high frequency resolution but the latter results in poor performance with transients. While Wavelet based algorithms attempt to optimize the time-frequency tradeoff, they suffer from frequency aliasing. The suggested pyramid algorithm is a good candidate to optimize the time-frequency resolution trade-off while avoiding aliasing. In this study an algorithm for removal of wide-band noise from old audio recordings is evaluated. The algorithm is based on the pyramid algorithm and on a spectral method for noise suppression. Results show enhanced conservation of onsets with efficient reduction of noise. The algorithm is implemented in real-time.

6772
Digital Music Notation Transformation using XML
Kosch, Harald; Teppan, Erich Christian
The basic problem this paper deals with is how to convert western music notation, written for chromatic instruments, into special tablatures for diatonic instruments. There are just a few software programs addressing this problem, and they lack fully automatic operation and flexibility. This was the main reason for the development of new data formats and a new transformation algorithm, which are more suitable for the problem. Combined in an appropriate software architecture, the newly developed algorithm performs the transformation from a chromatic piece of music into a data format which represents a diatonic tablature.

6773
A Service-oriented High-performance Architecture for Large Scale Audio Archives
Schneider, Stephan
The contribution describes a solution for large audio archives that has been developed using a Service-oriented architecture (SOA). The audio archiving system is designed as a framework of web services that are controlled centrally by a workflow engine. The audio archiving system offers hierar