Audio Engineering Society Preprints

AES 120th Convention

Paris, France
May 20-23, 2006

AES Preprint Ordering

Single Convention Preprints are available through the AES Preprint Search and Shop facility.

Preprints Listing

6634
The Effect of the Singer's Head on Vocalist Microphones
Schneider, Martin
Vocalist microphones are often optimised for theoretically perfect polar patterns, e.g. cardioid, supercardioid or hypercardioid. The polar pattern is maintained very well when the microphone is placed in the free field, with no obstacle around it. When the singer approaches the microphone, however, the head acts as a reflecting and diffracting obstacle. Consequently, the far-field polar patterns and frequency responses are distorted, making the microphones more prone to feedback in live amplification and altering the sound of the “spill” in pure recording situations.

6635
Wind Generated Noise in Microphones - An Overview - Part 1
Brixen, Eddy B.; Hensen, Ruben
When microphones are exposed to wind, noise is generated. The amount of noise depends on many factors, two of the most important being the speed and direction of the wind. However, the size, shape and design principles of the microphone are also very important. At higher wind speeds, not only is noise generated but distortion is also introduced, normally as a result of clipping. This paper presents comparative measurements that provide an overview of the parameters influencing the wind noise generated in pressure and pressure-gradient condenser microphones.

6636
P-MOS FET Application for Silicon Condenser Microphones
Arimura, Norihiro; Kimura, Norio; Ohga, Juro; Yasuno, Yoshinobu
The Electret Condenser Microphone (ECM) is widely used as a general-purpose microphone device. Each year, the miniaturization of cellular phones and the lowering of their supply voltage to reduce power consumption accelerate. Although current ECMs have become smaller and thinner, their FETs have not been designed for low-voltage operation despite the small packages. This paper focuses on a low-current P-MOS FET, fabricated in a CMOS process, for miniaturization and improved performance. The authors designed and tested prototype microphone units and compared their basic performance with that of a conventional ECM.

6637
Development of a Super-Wide-Range Microphone
Ando, Akio; Imanaga, Keishi; Iwaki, Masakazu; Ono, Kazuho; Tanabe, Hayao
This paper describes the development of a low-noise, high-sensitivity microphone with a wide frequency range. Microphones of this kind are needed to provide high quality sound sources for use in studies on the perceptual discrimination between musical sounds with and without very high frequency components. Conventional electrostatic microphones cannot be used for such recordings because conventional methods for expanding the frequency range use a small diaphragm that degrades the S/N ratio. The proposed microphone has a new design in which the frequency range is expanded in two ways, using both the diffraction and the resonance due to the microphone's diaphragm. These effects are generally thought to define the upper limit of the frequency range, but the authors have made active use of them to achieve both a wide frequency range and high sensitivity. The body shape was designed with the help of a scale model study. An omnidirectional, electrostatic microphone that picks up sounds of up to 100 kHz with low noise has been developed.

6638
Listening Broadband Physical Model for Microphones: A First Step
Elliq, Mohammed; Lambert, Dominique; Lopes, Manuel; Millot, Laurent; Pelé, Gérard; Valette, Antoine
We present a first step in the design of a broadband physical model for microphones. Within the proposed model, the classical directivity patterns (omnidirectional, bidirectional and the cardioid family) are recovered as limiting cases under monochromatic excitation and the low-frequency and far-field approximations. Monophonic pieces of music are used as sources for the model, so the simulated recording of the associated sound field can be heard in real time using a Max/MSP application. Listening and subband analysis show that the directivity is a function of frequency subband and source location. The model also exhibits an interesting proximity effect. Audio demonstrations will be given.

6639
Measuring the Perceived Differences between Similar High-quality Microphones
McKinnie, Douglas
Microphones of similar construction and polar pattern, which can be equalized to have nearly identical on-axis frequency responses, are still reported to have different sonic characters. To help develop a model of how other physical measurements could predict the subjective sonic character, perceptual data were collected from a panel of listeners. The listeners individually made dissimilarity ratings in pairwise comparisons of 9 versions of a single piano performance. Each version was recorded with a different model of small-diaphragm cardioid condenser microphone. The data are used to derive a stimulus space showing the most salient dimensions along which the perceived timbre of the microphones differed.
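The abstract does not name the scaling technique behind the stimulus space. One common way to turn pairwise dissimilarity ratings into such a space is classical (Torgerson) multidimensional scaling; the Python sketch below assumes a symmetric matrix of averaged ratings with one row per recorded version. The function `classical_mds` and the synthetic nine-point configuration are illustrative assumptions, not the author's data or procedure.

```python
import numpy as np

def classical_mds(dissim, n_dims=2):
    """Classical (Torgerson) MDS: place items so that their Euclidean
    distances approximate a symmetric dissimilarity matrix."""
    n = dissim.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centred squared dissimilarities give a Gram (inner-product) matrix.
    gram = -0.5 * centering @ (dissim ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)              # ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]             # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))    # ignore negative noise
    return eigvecs[:, top] * scale                       # one row per stimulus

# Synthetic check (hypothetical data): 9 "versions" in a 2-D perceptual space.
rng = np.random.default_rng(1)
true_points = rng.normal(size=(9, 2))
dissim = np.linalg.norm(true_points[:, None, :] - true_points[None, :, :], axis=-1)
space = classical_mds(dissim, n_dims=2)
recovered = np.linalg.norm(space[:, None, :] - space[None, :, :], axis=-1)
```

With real listener ratings the matrix is not exactly Euclidean, so the size of the leading eigenvalues also hints at how many perceptual dimensions are worth interpreting.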

6640
The Native B-Format Microphone: Part II
Benjamin, Eric; Chen, Thomas
Part I of this paper described the objective performance of tetrahedral cardioid arrays versus arrays composed of discrete pressure and pressure-gradient microphone capsules. The present paper gives the results of direct listening comparisons between the two types of arrays. Simultaneous recordings were made with pairs of arrays for subsequent comparison. The sources include both speech and music, and the environments range from very dry to very reverberant. The recordings were compared in both horizontal-only and periphonic reproduction systems.

6641
Influence of Components Precision on Characteristics of Dual Microphone Arrays
Goldin, Alexander; Valitov, Alexander
Microphone arrays have great potential in practical applications because they can significantly improve speech quality and signal-to-noise ratio in noisy environments. A large number of scientific papers and patents have been devoted to algorithmic techniques for producing the optimal output of a microphone array under various optimization criteria. In practice, however, the performance of microphone arrays depends to a large extent on the quality of their components: amplitude matching, phase matching, errors in the distance between microphones, etc. This paper analyses how the characteristics of a dual-microphone array depend on these factors.
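To make the dependence on component precision concrete, here is a small sketch for one possible dual-microphone configuration. It assumes a first-order delay-and-subtract (cardioid) arrangement, a 2 cm spacing and a speed of sound of 343 m/s; the abstract does not say which array type or parameters the authors analysed, and the function names are hypothetical. A gain or phase error on the rear capsule fills in the rear null, which shows up as a finite front-to-back ratio.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def differential_response(theta_deg, freq_hz, spacing=0.02, gain=1.0, phase_err=0.0):
    """Complex response of a hypothetical two-microphone delay-and-subtract
    (first-order cardioid) array to a plane wave from theta_deg (0 deg = front).
    `gain` and `phase_err` (radians) model mismatch of the rear capsule."""
    omega = 2 * np.pi * freq_hz
    internal_delay = spacing / SPEED_OF_SOUND            # places the null at 180 deg
    acoustic_delay = spacing / SPEED_OF_SOUND * np.cos(np.radians(theta_deg))
    rear = gain * np.exp(-1j * (omega * (acoustic_delay + internal_delay) + phase_err))
    return 1.0 - rear

def front_to_back_db(freq_hz, **mismatch):
    """Ratio of the 0 deg and 180 deg response magnitudes, in dB."""
    front = abs(differential_response(0.0, freq_hz, **mismatch))
    back = abs(differential_response(180.0, freq_hz, **mismatch))
    return 20 * np.log10(front / back)

print(front_to_back_db(1000.0, gain=1.05))   # about 23.4 dB with a 5 % gain error
```

With perfectly matched capsules the rear null is ideal; a 0.05 rad phase error produces a rear leakage of about 0.05, roughly the same as the 5 % gain error.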

6642
Application of Segmentation and Thumbnailing to Music Browsing and Searching
Levy, Mark; Sandler, Mark
We present a method for segmenting musical audio into structural sections, and some rules for choosing a representative 'thumbnail' segment. We demonstrate how audio thumbnails are an effective and natural way of returning results in music search applications. We investigate the use of segment-based models for music similarity searching and recommendation. We report experimental results of the performance and efficiency of these approaches in the context of SoundBite, a demonstration music thumbnailing and search engine.

6643
Multiple F0 Tracking in Solo Recordings of Monodic Instruments
Röbel, Axel; Rodet, Xavier; Yeh, Chunghsin
This article is concerned with F0 tracking in solo recordings of monodic instruments. Due to reverberation, the observed signal is effectively polyphonic, and single-F0 tracking techniques often give unsatisfactory results. The proposed method is based on multiple-F0 estimation and makes use of the a priori knowledge that the observed spectrum is generated by a single monodic instrument. The predominant F0 is tracked first, and the secondary F0 tracks are then established. The proposed method is tested on reverberant recordings and shows significant improvements compared to single-F0 estimators.

6644
Harmonic Plus Noise Decomposition: Time-frequency Reassignment Versus a Subspace Based Method
Badeau, Roland; David, Bertrand; Emiya, Valentin; Grenier, Yves
This work deals with Harmonic+Noise decomposition, with the targeted application of extracting transient background noise embedded in a signal with strong harmonic content (speech, for instance). To this end, a method based on the reassigned spectrum and a High Resolution subspace tracker are compared, both in simulations and in a more realistic setting. Reassignment re-localizes the time-frequency energy around a given pair (analysis time index, analysis frequency bin), while the High Resolution method benefits from characterizing the signal in terms of a subspace spanned by the harmonic content and a subspace spanned by the stochastic content. Both methods are adaptive, and the estimates are updated from one sample to the next.

6645
Signal Analysis Using the Complex Spectral Phase Evolution (CSPE) Method
Garcia, Ricardo A.; Short, Kevin M.
The Complex Spectral Phase Evolution (CSPE) method is introduced as a tool to analyze and detect the presence of short-term stable sinusoidal components in an audio signal. The method provides for super-resolution of frequencies by evaluating the evolution of the phase of the complex signal spectrum over time-shifted windows. It is shown that this analysis, when applied to a sinusoidal signal component, allows for the resolution of the true signal frequency with orders of magnitude greater accuracy than the DFT. Further, this frequency estimate is independent of the frequency bin. The method is robust in the presence of noise or nearby signal components, and is a fundamental tool in the front-end processing for the KOZ compression technology.
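The phase-evolution idea can be sketched in a few lines of Python. This minimal version assumes the CSPE is formed as the conjugate product of the DFTs of a windowed frame and of the same frame advanced by one sample, so that the angle at a sinusoid's peak bin equals its phase advance per sample. The Hann window, frame length and function name `cspe_frequency` are assumptions, and the refinements of the published method are not reproduced.

```python
import numpy as np

def cspe_frequency(x, fs, n_fft=1024):
    """Estimate the frequency of the strongest sinusoid in x from the phase
    evolution between a windowed frame and the same frame shifted by one sample."""
    window = np.hanning(n_fft)
    spec0 = np.fft.rfft(window * x[:n_fft])
    spec1 = np.fft.rfft(window * x[1:n_fft + 1])
    cspe = np.conj(spec0) * spec1        # angle = phase advance per sample
    peak = np.argmax(np.abs(spec0))      # strongest DFT bin
    return np.angle(cspe[peak]) * fs / (2 * np.pi)

fs = 8000.0
t = np.arange(2048) / fs
x = np.cos(2 * np.pi * 1000.3 * t + 0.4)
print(cspe_frequency(x, fs))   # close to 1000.3 Hz; the DFT bin spacing is 7.8125 Hz
```

The estimate lands far closer to the true 1000.3 Hz than the bin spacing alone would allow, which is the super-resolution behaviour the abstract describes.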

6646
Upwind Leapfrog Schemes in Physical Models with Mixed Modeling Strategies
Escolano, José; López, José Javier
Block-based physical modeling with mixed modeling strategies is one of the most promising methods for digital sound synthesis. In this technique, each element is modeled and discretized individually, and the interaction topology between elements is implemented separately. This paper proposes the use of the Linear Bicharacteristic Scheme (LBS), or upwind leapfrog, for digital sound synthesis. It provides an efficient and accurate alternative stencil to the classical leapfrog scheme of the Finite Difference Time Domain (FDTD) method. Furthermore, this work proposes converting the dependent variables of the wave equation into characteristic variables, yielding a method suitable for interaction with Wave Digital Filter models. The technique is presented in detail and justified with some examples.
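The conversion to characteristic variables can be illustrated for the 1-D acoustic wave equation with pressure p and particle velocity v. The sketch below splits (p, v) into right- and left-going wave variables using a characteristic impedance Z, in the style of wave digital filters, and advances each with a first-order upwind update on a periodic grid. This first-order scheme is a deliberately simple stand-in, not the Linear Bicharacteristic Scheme of the paper; the value of Z and all function names are assumptions.

```python
import numpy as np

def to_characteristic(p, v, impedance):
    """Split pressure and particle velocity into right- and left-going waves."""
    w_right = 0.5 * (p + impedance * v)
    w_left = 0.5 * (p - impedance * v)
    return w_right, w_left

def from_characteristic(w_right, w_left, impedance):
    """Recover pressure and particle velocity from the two wave variables."""
    return w_right + w_left, (w_right - w_left) / impedance

def upwind_step(w_right, w_left, courant=1.0):
    """First-order upwind update of each characteristic variable on a periodic
    grid (np.roll wraps at the edges). At courant == 1 this is an exact shift."""
    w_right = w_right - courant * (w_right - np.roll(w_right, 1))   # moves right
    w_left = w_left - courant * (w_left - np.roll(w_left, -1))      # moves left
    return w_right, w_left

impedance = 415.0                  # assumed characteristic impedance of air, Pa*s/m
p = np.zeros(64)
p[20] = 1.0                        # pressure pulse
v = p / impedance                  # velocity of a purely right-going wave
w_r, w_l = to_characteristic(p, v, impedance)
for _ in range(5):
    w_r, w_l = upwind_step(w_r, w_l)
p_out, v_out = from_characteristic(w_r, w_l, impedance)
print(np.argmax(p_out))            # 25: the pulse moved five cells to the right
```

At a Courant number of 1 the upwind update reduces to an exact one-cell shift, so the pulse arrives undistorted; at smaller Courant numbers this first-order scheme smears the pulse, one reason to prefer more accurate stencils.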

6647
Simple Modeling of Soundboard Effect for Piano Transcription
Beracoechea, Jon Ander; Casajús-Quirós, Francisco Javier; Ortiz-Berenguer, Luis; Perez-Aranda, J.; Torres-Guijarro, Soledad
Partials of piano sounds are inharmonic. This inharmonicity is due both to string stiffness and to soundboard impedance; the latter effect has not been widely documented. Two problems arise: determining the value of the impedance and evaluating the frequency deviation the partial undergoes. In this work, that deviation has been calculated both with Morse’s equations and with the authors’ proposed method. To validate the results, deviations for some piano notes have been measured. The soundboard impedance has also been measured to verify the relationship between deviation and impedance. Moreover, a method to estimate the impedance from measured deviations is proposed. This last method can be useful during the training stages of transcription systems and in parameter extraction schemes.

6648
Contextual Effects on Sound Quality Judgements: Listening Room and Automotive Environments
Beresford, Kathryn; Ford, Natanya; Rumsey, Francis; Zielinski, Slawomir K.
This study was designed to assess the effect of the listening context on basic audio quality for stimuli with varied mid-range timbral degradations. An assessment of basic audio quality was carried out in two different listening environments: an ITU-R BS.1116-conformant listening room and a stationary vehicle. A group of untrained listeners graded basic audio quality using a novel single-stimulus method. The listener population was divided into two subsets: one made evaluations in a listening room and the other in a vehicle. The single-stimulus method was investigated as a possible subjective evaluation method for use in automotive environments.

6649
Next Generation Automotive Sound Research and Technologies
Benjamin, Eric; Crockett, Brett; Smithers, Michael
The automobile is quickly becoming a prominent environment for listening to multi-channel audio content. As a listening space, the automobile is both interesting and challenging due to its interior structure and materials, its predominantly off-axis listening positions, and the amount and variability of background noise. This paper discusses these challenges, describes a number of existing multi-channel sound technologies and their applicability to the automotive environment, and presents several novel sound technologies that provide new solutions to some of these challenges. Ongoing challenges and the associated automotive sound research under way are also presented.

6650
Spatial Sound Localization Model Using Neural Network
Correa, Rafael; Floody, Sergio; Lara, Marcelo; Venegas, Rodolfo
This work presents the design, implementation and training of a spatial sound localization system for broadband sound in an anechoic environment, inspired by the human auditory system and implemented using neural networks. The data were acquired experimentally. The model consists of a nonlinear transformer with one module for ITD and ILD extraction and a second module, a neural network, that estimates the sound source position in elevation and azimuth. A comparison of the model's performance using three different filter banks and a sensitivity analysis of the neural network inputs are also presented. The average error is 2.4°. This project has been supported by the FONDEI fund of Universidad Tecnológica de Chile.
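The ITD and ILD extraction module can be illustrated with a broadband sketch: ITD from the peak of the interaural cross-correlation, restricted to a physiologically plausible range of ±1 ms, and ILD from the interaural energy ratio. The paper extracts these cues per filter-bank band and feeds them to a neural network; neither the filter banks nor the network are shown here, and the function name and the ±1 ms limit are assumptions.

```python
import numpy as np

def itd_ild(left, right, fs, max_itd_s=1e-3):
    """Broadband ITD in seconds (positive when the right ear lags) from the
    cross-correlation peak, and ILD in dB (positive when the left ear is louder)."""
    n = len(left)
    xcorr = np.correlate(right, left, mode="full")       # lags -(n-1) .. n-1
    lags = np.arange(-(n - 1), n)
    valid = np.abs(lags) <= int(round(max_itd_s * fs))   # plausible ITD range only
    best_lag = lags[valid][np.argmax(xcorr[valid])]
    ild_db = 10 * np.log10(np.sum(left ** 2) / np.sum(right ** 2))
    return best_lag / fs, ild_db

# Synthetic broadband source: right ear delayed by 12 samples and 6 dB quieter.
fs = 48000
rng = np.random.default_rng(0)
left = rng.standard_normal(4096)
delay = 12
right = 0.5 * np.concatenate([np.zeros(delay), left[:-delay]])
itd, ild = itd_ild(left, right, fs)
print(itd, ild)   # 0.00025 s and roughly 6 dB
```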

6651
Aurally Motivated Analysis for Scattered Sound in Auditoria
Jaramillo, Ana M.; Norris, Molly K.; Xiang, Ning
The goal of the first part of this work was to implement an aurally adequate time-frequency analysis technique that takes binaural hearing into account, as a first effort toward analyzing sound scattering data. The second part of this work used the developed model to analyze different scattering surfaces implemented in a room acoustics modeling program. This was an attempt to begin understanding what kind of visual changes can be expected when the coefficients used to model scattering with the Lambert scattering model are altered. This research pursues a method for visually representing scattering effects that correlates directly with human perception.

6652
Audibility of Spectral Differences in Head-Related Transfer Functions
Faundez Hoffmann, Pablo; Møller, Henrik
The spatial resolution at which head-related transfer functions (HRTFs) are available is an important aspect of the implementation of three-dimensional sound. Specifically, synthesis of moving sound requires that HRTFs be sufficiently close for the simulated sound to be perceived as moving smoothly. How close they must be depends directly on how much the characteristics of neighboring HRTFs differ and, most importantly, on when these differences become audible. Differences between HRTFs exist in the interaural delay (ITD) and in the spectral characteristics, i.e. the magnitude spectrum of the HRTFs. The present study investigates the audibility of the spectral characteristics. To this end, binaural and monaural audibility thresholds for differences between minimum-phase representations of HRTFs are measured and evaluated.
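The abstract does not state how the minimum-phase representations were obtained. A standard method is the folded real cepstrum, sketched below in Python; it keeps the magnitude spectrum (the spectral characteristics under study) and discards the excess phase, so the interaural delay would have to be reinserted separately as a pure delay. The FFT size, the small offset guarding the logarithm and the function name are assumptions.

```python
import numpy as np

def minimum_phase(h, n_fft=1024):
    """Minimum-phase impulse response with the same magnitude response as h,
    via the folded real cepstrum (homomorphic method). Returns n_fft samples."""
    spectrum = np.fft.fft(h, n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)        # guard against log(0)
    cepstrum = np.fft.ifft(log_mag).real
    fold = np.zeros(n_fft)
    fold[0] = 1.0
    fold[1:n_fft // 2] = 2.0                           # double the causal part
    fold[n_fft // 2] = 1.0                             # keep Nyquist term once
    return np.fft.ifft(np.exp(np.fft.fft(cepstrum * fold))).real

# Hypothetical delayed, decaying response standing in for a measured HRIR.
rng = np.random.default_rng(0)
h = np.zeros(64)
h[10:42] = rng.standard_normal(32) * np.exp(-np.arange(32) / 8.0)
h_min = minimum_phase(h, n_fft=1024)
```

The function returns all n_fft samples; in practice the result is usually truncated to the original filter length once most of its energy has been confirmed to lie at the start.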

6653
Looking for a Relevant Similarity Criterion for HRTF Clustering: A Comparative Study
Bondu, Alexis; Busson, Sylvain; Lemaire, Vincent; Nicol, Rozenn
For high-fidelity Virtual Auditory Space (VAS), binaural synthesis requires individualized Head-Related Transfer Functions (HRTFs). An alternative to exhaustive measurement of HRTFs consists of measuring a set of representative HRTFs in a few directions. These selected HRTFs are considered representative because they summarize all the necessary spatial and individual information. The goal is to deduce the HRTFs in non-measured directions from the measured ones by appropriate modeling. Clustering is applied to identify the representative directions, but the first issue lies in the definition of a relevant distance criterion. The paper presents a comparative study of several criteria taken from the literature. A new insight into HRTF (dis)similarity is proposed.

6654
Evaluation of a 3D-Audio System with Head Tracking
Minnaar, Pauli; Pedersen, Jan Abildgaard
A 3D-audio system was evaluated in an experiment in which listeners had to “shoot down” real and virtual sound sources appearing from different directions around them. The 3D audio was presented through headphones, and head tracking was used. To investigate the influence of head movements, both long and short stimuli were used. Twenty-six people participated, half of them students and half pilots. The results were analyzed by calculating a localization offset and a localization uncertainty. For azimuth no significant offset was found, whereas for elevation an offset was found that is strongly correlated with the stimulus elevation. The uncertainty for real and virtual sound sources was 10° and 14°, respectively, in azimuth, and 12° and 24° in elevation.

6655
Design and Verification of HeadZap, a Semi-automated HRIR Measurement System
Anderson, Mark; Begault, Durand; Godfroy, Martine; Miller, Joel D.; Roginska, Agnieszka; Wenzel, Elizabeth
This paper describes the design, development and acoustic verification of HEADZAP, a semi-automated system for measuring head-related impulse responses (HRIRs), designed by AuSIM Inc. and modified by the NASA Ames Research Center Spatial Auditory Display Laboratory. HEADZAP uses an array of twelve loudspeakers to measure 432 HRIRs at 10° intervals in both azimuth and elevation, in a non-anechoic environment. Application to real-time rendering using the SLAB software and to an audio-visual localization experiment is discussed.

6656
Visualization of Perceptual Parameters in Interactive User Interfaces: Application to the Control of Sound Spatialization
Delerue, Olivier
This work addresses the general problem of designing graphical user interfaces for non-expert users. The key idea is to help users anticipate their actions by displaying, in the interaction area, the expected evolution of a quality criterion as a function of the degrees of freedom being monitored. This concept is first applied to the control of sound spatialization: various perceptually based criteria such as “spatial homogeneity” or “spatial masking” are represented as a grey-shaded map superimposed on the background of a bird’s-eye-view interface. After selecting a sound source, the user is thus informed how these criteria will behave if the source is moved to any other location in the virtual sound scene.

6657
A New Approach for Direct Interaction with Graphical Representations of Room Impulse Responses for the Use in Wave Field Synthesis Reproduction
de Vries, Diemer; Langhammer, Jan; Melchior, Frank
Room simulation based on convolution is state of the art in modern audio processing environments. Most currently available systems provide only a few controllers for modifying the underlying room impulse responses; even in spatial reproduction systems, the sound designer can manipulate only a single set of numeric parameters. This paper describes a new approach for the interactive control of room impulse responses based on visualization and parameterization. The new principle was originally developed for use in Wave Field Synthesis systems and is based on Augmented Reality user interfaces. Adaptation to conventional user interfaces and other spatial sound reproduction systems is possible. The room impulse responses are modified by direct interaction with 3D graphical representations of multi-trace room impulse responses.

6658
Directional Audio Coding: Filterbank and STFT-based Design
Faller, Christof; Pulkki, Ville
Directional audio coding (DirAC) is a method for spatial sound representation, applicable to arbitrary audio reproduction methods. In the analysis part, properties of the sound field in time and frequency at a single point are measured and transmitted as side information together with one or more audio waveforms. In the synthesis part, the properties of the sound field are reproduced using separate techniques for point-like virtual sources and diffuse sound. Different implementations of DirAC are described and the differences between them are discussed. A modification of DirAC is presented, which provides a link to Binaural Cue Coding and parametric multi-channel audio coding in general (e.g. MPEG Surround).
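The per-band analysis shared by DirAC implementations can be sketched from horizontal B-format STFT coefficients. This minimal version assumes an idealised normalisation in which a single plane wave from azimuth θ gives X = W cos θ and Y = W sin θ; real B-format carries an extra 1/√2 gain on W, and a physical intensity would include the air density and speed of sound. The direction comes from the time-averaged product of W with the dipole signals, and diffuseness compares its length with the averaged energy density. The function name `dirac_parameters` and the block averaging are assumptions.

```python
import numpy as np

def dirac_parameters(w, x, y):
    """Return (azimuth in degrees, diffuseness) for one frequency band from
    complex STFT coefficients of idealised horizontal B-format signals,
    averaged over a block of frames."""
    # Time-averaged direction vector; in this normalisation it points
    # towards the source (an active-intensity analogue).
    dx = np.mean(np.real(np.conj(w) * x))
    dy = np.mean(np.real(np.conj(w) * y))
    # Time-averaged energy density in the same normalisation.
    energy = np.mean(0.5 * (np.abs(w) ** 2 + np.abs(x) ** 2 + np.abs(y) ** 2))
    azimuth = np.degrees(np.arctan2(dy, dx))
    diffuseness = 1.0 - np.hypot(dx, dy) / energy
    return azimuth, diffuseness

rng = np.random.default_rng(0)
n_frames = 2000
s = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)

# One plane wave from 60 degrees: azimuth near 60, diffuseness near 0.
az = np.radians(60.0)
plane_az, plane_psi = dirac_parameters(s, s * np.cos(az), s * np.sin(az))

# Every frame from a random direction (a crude diffuse field): diffuseness near 1.
theta = rng.uniform(0.0, 2 * np.pi, n_frames)
_, diffuse_psi = dirac_parameters(s, s * np.cos(theta), s * np.sin(theta))
```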

6659
Newly Established IEC Standard on Audio Quality Measurement of Personal Computers
Furukawa, Masamichi; Kurakata, Kenji
A new IEC standard on audio quality measurement of personal computers (PCs) was published in December 2005, entitled IEC 61606-4 "Audio and audiovisual equipment - Digital audio parts - Basic measurement methods of audio characteristics - Part 4: Personal computer." That standard prescribes methods for measuring PC audio quality that take into account the measuring conditions specific to PCs. Furthermore, a new measure of audio signal quality, short-term distortion, was introduced to describe PC-specific noise problems. This paper presents an outline of that standard.

6660
Scene Description Model and Rendering Engine for Interactive Virtual Acoustics
Jot, Jean-Marc; Trivi, Jean-Michel
Interactive environmental audio spatialization technology has become commonplace in personal computers, where its primary current application is video game sound track rendering. The most advanced PC audio platforms available can spatialize 100 or more sound sources simultaneously over headphones or multi-channel home theater systems, and employ multiple reverberation engines to simulate complex acoustical environments. This paper reviews the main features of the EAX environmental audio programming interface and its relation to the I3DL2 and MPEG-4 standards. A statistical reverberation model is introduced to account for per-source distance and directivity effects. An efficient spatial reverberation and mixing architecture is described for the spatialization of multiple sound sources around a virtual listener navigating across multiple connected virtual rooms including acoustic obstacles.

6661
Intelligent Audio for Games
Walder, Col
Providing interactive audio for computer games has traditionally been seen as a challenge, particularly given the technological limitations of games consoles. With current advances in technology, however, there is the potential to take advantage of the benefits of interactivity. This paper proposes the use of Artificial Intelligence (AI) routines to control in-game audio, with a focus on applying film sound techniques to drama-based games. The Soar architecture is presented as a good candidate for developing audio AI for games.

6662
A Frame Loss Concealment Technique for MPEG-AAC
Rose, Kenneth; Ryu, Sang-Uk
An efficient method is proposed for frame loss concealment within the advanced audio coding (AAC) decoder, which can effectively mitigate the adverse impact of frame loss on reconstruction quality. The spectral information of the lost frame is first estimated in the modified discrete cosine transform (MDCT) domain via the known frame interpolation approach. The interpolated MDCT coefficients are then further refined by magnitude scaling and sign correction, which are differently designed for tonal and noise components of the source signal: In noise-like spectral bins, shaped-noise insertion technique is employed to adjust the interpolated coefficients, while coefficients in tone-dominant bins are refined by magnitude scaling and novel sign correction techniques so as to optimize the fit of the corresponding time reconstruction with available partial signal information from neighboring frames. Subjective quality evaluations demonstrate that the proposed method achieves significant quality improvement over the shaped-noise insertion method adopted in commercial AAC decoders.
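The method builds on two known ingredients, frame interpolation in the MDCT domain and shaped-noise insertion. A minimal sketch of that baseline is shown below, assuming the lost frame's magnitudes are interpolated from the energies of the previous and next frames and given random signs; the authors' tonal magnitude scaling and sign correction are not attempted, and the function name is hypothetical.

```python
import numpy as np

def conceal_lost_frame(prev_coeffs, next_coeffs, rng=None):
    """Baseline MDCT-domain concealment: interpolate each bin's energy from the
    neighbouring frames, then attach random signs (shaped-noise insertion)."""
    rng = np.random.default_rng() if rng is None else rng
    magnitude = np.sqrt(0.5 * (prev_coeffs ** 2 + next_coeffs ** 2))
    signs = rng.choice([-1.0, 1.0], size=magnitude.shape)
    return signs * magnitude

prev_frame = np.array([1.0, -2.0, 0.0, 3.0])
next_frame = np.array([1.0, 2.0, 0.0, -1.0])
estimate = conceal_lost_frame(prev_frame, next_frame, rng=np.random.default_rng(0))
```

Random signs suit noise-like bins, but in tone-dominant bins they break the phase continuity of the sinusoid across frames, which is the gap the paper's sign-correction step targets.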

6663
Multiple Description Error Mitigation Techniques for Streaming Compressed Audio Over a 802.11 Wireless Network
Cheng, Corey I.; Jiang, Wenyu
This paper presents several multiple description (MD) coding techniques for error mitigation of compressed audio streamed over an 802.11b/g wireless network. Loosely speaking, an MD encoder generates several descriptions of the same source, and an MD decoder recreates the best estimate of the source from the set of descriptions it successfully receives. We propose a design for an MD architecture and simulate its integration into the AAC codec. We use packet loss traces gathered from an actual 802.11 b/g network to simulate the proposed codec’s error mitigation properties for various network traffic conditions. We examine how tuning several of the proposed codec’s parameters would affect the sound quality and overall bitrate of the proposed codec. Specifically, we show how interleaving, renormalization, and low-frequency variance estimation techniques can be used in conjunction with hierarchical correlating transforms to improve the sound quality of multiple description codecs.
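The multiple-description principle can be illustrated with the simplest two-description scheme, a polyphase split of the signal into even- and odd-indexed samples. This is not the authors' architecture, which uses hierarchical correlating transforms inside AAC; the sketch only shows how a decoder returns the exact signal when both descriptions arrive and a degraded estimate, here by linear interpolation, when one is lost. Function names are hypothetical.

```python
import numpy as np

def md_encode(samples):
    """Two descriptions by polyphase split: even- and odd-indexed samples."""
    return samples[0::2], samples[1::2]

def md_decode(even=None, odd=None, length=None):
    """Rebuild the signal from whichever descriptions arrived; a missing one
    is estimated by linear interpolation from the other."""
    out = np.zeros(length)
    idx = np.arange(length)
    if even is not None:
        out[0::2] = even
    if odd is not None:
        out[1::2] = odd
    if even is None and odd is not None:
        out[0::2] = np.interp(idx[0::2], idx[1::2], odd)
    if odd is None and even is not None:
        out[1::2] = np.interp(idx[1::2], idx[0::2], even)
    return out

x = np.sin(2 * np.pi * 50 * np.arange(1000) / 8000)
even, odd = md_encode(x)
both = md_decode(even, odd, length=len(x))       # exact reconstruction
one = md_decode(even=even, length=len(x))        # odd description lost
```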

6664
Single Frequency Networks for FM Radio
Soelberg, Pierre
Single Frequency Networks (SFN) and Near Single Frequency Networks (NSFN) are usually not considered suitable for FM radio. Some countries are now re-planning their FM bands for the use of (N)SFN, in order to make space for more stations. Although some stations already use SFN, for example one covering a highway, re-planning the FM band with SFN for a whole country is a different matter. The first country to do this was the Netherlands, and the first experiences with it have not been as good as expected. The requirements for synchronizing FM transmitters used in (N)SFNs are explained, and SFNs are tested from real transmitter sites. The result is a proposed correction to the Dutch norm.

6665
A Paradigm for Wireless Digital Audio Home Entertainment
Floros, Andreas; Kokkos, Nikos; Mourjopoulos, John; Tatlas, Nicolas - Alexander
Despite recent advances in wireless networking technology, real-time streaming of CD-quality digital audio remains a challenging topic. In this work, a set of applications following the server-client model was developed, facilitating the transmission and playback of PCM-coded audio over wireless links. The implementation is based on typical Personal Computer (PC) platforms interconnected with off-the-shelf wireless networking hardware. Performance evaluation tests are presented under different networking parameters and link conditions, leading to an optimal set of parameters for high-quality wireless digital audio delivery.

6666
Online Acoustic Measurements in a Networked Audio System
Härmä, Aki
A networked audio system consists of audio devices that share the same physical environment and are connected by a network. The network connection makes it possible to perform continuous acoustic measurements between the devices. Such measurement data can be used, for example, to control playback according to the properties of the sound field actually produced. Continuous acoustic measurement involves transmitting audio data over the network. The bit rate of this data should be low because measurement is not a primary function of the networked system. In this paper we introduce a robust system for networked audio measurements in which the bit rate sent over the network is small.

6667
Design and Installation of Recording Studios for Vocational training
Bradley, Chris; Law, Billy
This paper describes the design and installation of new recording studios for training in music and sound production, allowing unparalleled direct hands-on tuition for students. The design allows simultaneous recording from the live rooms to all twelve control rooms via digital distribution, enabling individual setup for a recording session, multitrack recording and subsequent mixdown. All recording sessions are saved to a centralised server, which will allow backup and uploading to and from any other control room. Students can therefore import their work into any of the other control rooms at any time. Networking will be through Gigabit Ethernet, so transfer of work is fast, and students have their own password-protected space, learning the importance of file management.

6668
Flexible, High Speed Audio Networking for Hotels and Convention Centres
Bradley, Klinkradt; Chigwamba, Nyasha; Foss, Richard; Fujimori, Jun-ichi; Harold, Okai-Tettey; Klinkradt, Brad; Okai-Tettey, Harold
This paper describes the use of mLAN (music Local Area Network) to solve the problem of audio routing within hotels and convention centers. mLAN is a FireWire-based digital network interface technology that allows professional audio equipment, PCs and electronic instruments to be easily and efficiently interconnected using a single cable. To solve the routing problem, an existing mLAN Connection Management Server, augmented with additional functionality, has been used. A graphical client application has been created that displays the various locations within a hotel or convention center and sends the appropriate routing messages, in Extensible Markup Language (XML), to an mLAN connection management server. The connection management server in turn controls a number of mLAN audio distribution boxes on the FireWire network.

6669
Sound Quality Differences between Electret Film (EMFIT) and Piezoelectric Under-saddle Guitar Pickups
Penttinen, Henri; Tikander, Miikka
Two different types of under-saddle guitar pickups, piezoelectric and electret film (EMFIT), were measured and compared. The measurements included comparisons of magnitude, time and phase responses, and of distortion characteristics. The measurements were conducted with a custom rig that allowed accurate control of the environment. Both frequency sweeps and impulsive stimuli were used for excitation. In terms of magnitude response, the piezoelectric pickup has a boosted bass response and a slightly pronounced high-frequency response. The results imply nonlinear behaviour as a function of both the excitation type (sweep vs. impulsive) and the excitation force (small vs. large). In addition, the piezoelectric pickup is fairly immune to tension changes, whereas the EMFIT pickup's sensitivity increases as the tension decreases. For impulsively excited time responses, differences were found only at the beginning of the responses. No significant difference in distortion behaviour was found. A linear filter model is also proposed for making either pickup sound like the other.

6670
A Hybrid Concealment Algorithm for Non-predictive Wideband Audio Coders
Lefebvre, Roch; Vilaysouk, Vilayphone
This paper proposes a hybrid Packet Loss Concealment (PLC) algorithm for memoryless encoders such as PCM. The concealment algorithm integrates two modes, one in the time domain and the other in the frequency domain. Mode selection is based on the correctly received samples preceding an erased packet. This hybrid approach provides a packet loss concealment mechanism that adapts to the signal characteristics and is not restricted to pure speech signals. Subjective evaluations have demonstrated that the proposed algorithm performs significantly better than single-mode concealment algorithms.
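The abstract does not describe the mode-selection criterion. One plausible sketch, assuming a normalized autocorrelation test on the last correctly received samples, chooses time-domain pitch repetition for strongly periodic history and hands anything else to a frequency-domain mode, which is not implemented here. The lag range (20 to 320 samples at 16 kHz), the 0.7 threshold and the function names are assumptions.

```python
import numpy as np

def select_mode(history, min_lag=20, max_lag=320, threshold=0.7):
    """Pick a concealment mode from previously received samples: 'time'
    (pitch repetition) for strongly periodic signals, otherwise 'frequency'.
    Also returns the lag with the highest normalized autocorrelation."""
    best_corr, best_lag = -1.0, min_lag
    for lag in range(min_lag, max_lag + 1):
        a, b = history[lag:], history[:-lag]
        corr = np.dot(a, b) / (np.sqrt(np.dot(a, a) * np.dot(b, b)) + 1e-12)
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return ("time" if best_corr >= threshold else "frequency"), best_lag

def pitch_repeat(history, lag, n_missing):
    """Time-domain mode: fill the gap by repeating the last pitch period."""
    period = history[-lag:]
    reps = int(np.ceil(n_missing / lag))
    return np.tile(period, reps)[:n_missing]

fs = 16000
voiced = np.sin(2 * np.pi * 200 * np.arange(960) / fs)   # 80-sample period
mode, lag = select_mode(voiced)
filled = pitch_repeat(voiced, lag, 160)
noise_mode, _ = select_mode(np.random.default_rng(0).standard_normal(960))
```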

6671
Towards an Inverse Constant Q Transform
Cranitch, Matt; Cychowski, Marcin T.; FitzGerald, Derry
The Constant Q transform has found use in the analysis of musical signals due to its logarithmic frequency resolution. Unfortunately, a considerable drawback of the Constant Q transform is that there is no inverse transform. Here we show it is possible to obtain a good quality approximate inverse to the Constant Q transform provided that the signal to be inverted has a sparse representation in the Discrete Fourier Transform domain. This inverse is obtained through the use of ℓ0 and ℓ1 minimisation approaches to project the signal from the constant Q domain back to the Discrete Fourier Transform domain. Once the signal has been projected back to the Discrete Fourier Transform domain, the signal can be recovered by performing an inverse Discrete Fourier Transform.

6672
History and Design of Russian Electro-musical Instrument "Theremin"
Vasilyev, Yurii
The theremin, an electronic musical instrument developed by the Russian physicist L. Theremin, has undergone a long evolution. It attracts steadily growing interest from audio engineers and performers. The theremin is used both to perform musical compositions of different genres and to create special effects in theatre, multimedia and the film industry. This work analyzes the circuit solutions created over a period of more than 80 years, based both on analog circuitry and on digital microprocessor technology, as well as realizations of the theremin as real and virtual musical instruments. The advantages and disadvantages of the different circuit solutions are also analyzed, and the most interesting realizations of the virtual theremin are presented.

6673
A Fast- and High-convergence Method for ICA-based Noise Reduction in Mobile Phone Speech Communication
Etoh, Minoru; Zhipeng, Zhang
This paper proposes a noise reduction technique that applies a priori information to unmixing matrix estimation in ICA, offering fast and accurate convergence. We formulate the parameter estimation, stabilized by the a priori information, within a Bayesian maximum a posteriori (MAP) estimation framework, and show its robustness in mobile phone environments, where the position of the microphone relative to the mouth is almost constant. We use the mouth-to-microphone transfer function for one row of the unmixing matrix. Using these estimated parameters as initial values, the unmixing matrix can be updated with high efficiency in the framework of MAP estimation. Experimental results confirm that the proposed method achieves high performance, especially in high-SNR noise conditions.

6674
A Comparison of Time-Domain Time-Scale Modification Algorithms
Coyle, Eugene; Dorran, David; Lawlor, Robert
Time-domain approaches to time-scale modification are popular due to their ability to produce high-quality results at a relatively low computational cost. Within the category of time-domain implementations quite a number of alternatives exist, each with its own computational requirements and associated output quality. This paper provides a computational and objective output quality assessment of a number of popular time-domain time-scaling implementations, thus providing a means for developers to identify a suitable algorithm for their application of interest. In addition, the issues that should be considered in developing time-domain algorithms are outlined, purely in the context of a waveform editing procedure.

6675
The Importance of the Non-harmonic Residual for Automatic Musical Instrument Recognition of Pitched Instruments
Livshin, Arie; Rodet, Xavier
In many papers dealing with automatic musical instrument recognition of pitched instruments, the features used for classification are based solely on the fundamental frequencies and the harmonic series, ignoring the non-harmonic residual. In this paper we explore whether the recognition rate for pitched instruments decreases when the non-harmonic information present in the sound signal is removed.

6676
A Fuzzy Rules-based Speech/Music Discrimination Approach for Intelligent Audio Coding Over the Internet
Garcia Gálan, Sebastian; Muñoz-Exposito, Jose Enrique; Rivas Peña, Fernando; Ruiz-Reyes, Nicolas; Vera-Candeas, Pedro
Our work presents a speech/music discrimination approach based on fuzzy rules for selecting the suitable coder in an intelligent audio coding system. When the same coder is used for both speech and music, it is difficult to achieve good audio quality and low bit rates for both types of signals. We propose using a simple feature, called the Warped LPC-based Spectral Centroid (WLPC-SC), for speech/music discrimination. To select the suitable audio coder for each audio frame, an expert system is proposed. The main advantage of the proposed approach is its low computational cost in both the speech/music discrimination and coder selection stages, which makes it suitable for real-time applications such as Internet audio streaming.

6677
Analysis and Transynthesis of Solo Erhu Recordings using Additive/Subtractive Synthesis
Chang, Wei-Lun; Siao, Yi-Song; Su, Alvin W.Y.
The erhu is the main bowed-string instrument in traditional Chinese music, much like the violin in Western music. It has two strings, and its top plate is made of snake skin. Numerous solo works have been written for the erhu. In this paper, erhu resynthesis/transynthesis software is presented. We use frame-based methods to analyze the pitch and volume information of a solo erhu recording. The recording can then be resynthesized using the erhu timbre extracted from the original recording, other erhu timbres, or even timbres such as violin and trumpet. Additive and subtractive synthesis methods are used to synthesize the overall sound. Because the expression and playing style of the original recording are preserved, the result is realistic and musical.
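
As an illustration of the additive half of such a resynthesis, the following minimal Python sketch renders frame-wise pitch and volume tracks as a sum of harmonics. The function name, the `harmonic_weights` timbre description, the hop size and the step-wise (non-interpolated) per-frame amplitude are assumptions for illustration; the subtractive (noise) part and the paper's own analysis stage are not shown.

```python
import math

def additive_resynthesis(f0_track, amp_track, harmonic_weights, fs=44100, hop=256):
    """Hypothetical helper: render per-frame pitch (Hz) and volume as a sum of harmonics.

    harmonic_weights holds assumed relative amplitudes of harmonics 1, 2, 3, ...
    (a crude stand-in for a timbre extracted from a recording).
    """
    phases = [0.0] * len(harmonic_weights)
    out = []
    for f0, amp in zip(f0_track, amp_track):
        for _ in range(hop):  # pitch and amplitude are held constant within a frame
            sample = 0.0
            for h, weight in enumerate(harmonic_weights, start=1):
                freq = h * f0
                if freq < fs / 2:  # skip partials above Nyquist to avoid aliasing
                    sample += weight * math.sin(phases[h - 1])
                # accumulate phase so the waveform stays continuous when f0 changes
                phases[h - 1] = (phases[h - 1] + 2 * math.pi * freq / fs) % (2 * math.pi)
            out.append(amp * sample)
    return out

# Two frames of a rising pitch with a three-harmonic timbre.
y = additive_resynthesis([220.0, 230.0], [0.5, 0.8], [1.0, 0.5, 0.25], fs=8000, hop=100)
```

Swapping `harmonic_weights` for another set of weights is the simplest form of "transynthesis": the pitch and volume tracks (the expression) stay the same while the timbre changes.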

6678
Application of Fisher Linear Discriminant Analysis to Speech/Music Classification
Alexandre, Enrique; Cuadra-Rodríguez, Lucas; Gil-Pita, Roberto; Rosa-Zurera, Manuel
This paper proposes the application of Fisher linear discriminants to the problem of speech/music classification. Fisher linear discriminants classify between two classes and are based on computing a centroid of the training data for each class. From this information a linear decision boundary is established, which is then used for classification. Results are given demonstrating the superior performance of this classifier compared with the well-known k-nearest neighbor algorithm. It is also demonstrated that very good results, in terms of probability of error, can be obtained using only one feature extracted from the audio signal, which makes it possible to reduce the complexity of such systems for real-time implementation.
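
A minimal pure-Python sketch of a two-class Fisher linear discriminant, for illustration only: the projection direction is w = S_W^{-1}(m_music - m_speech), where S_W is the within-class scatter matrix, and the decision threshold is assumed here to be the midpoint of the projected class means. The feature vectors are invented; the paper's actual feature and threshold rule may differ.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def within_class_scatter(classes, means):
    """S_W: sum over classes and samples of (x - m)(x - m)^T."""
    d = len(means[0])
    sw = [[0.0] * d for _ in range(d)]
    for vectors, m in zip(classes, means):
        for v in vectors:
            diff = [v[i] - m[i] for i in range(d)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    return sw

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def project(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_fisher(speech, music):
    """Return (w, threshold); music frames project above the threshold."""
    m_s, m_m = mean(speech), mean(music)
    sw = within_class_scatter([speech, music], [m_s, m_m])
    w = solve(sw, [a - b for a, b in zip(m_m, m_s)])
    threshold = 0.5 * (project(w, m_s) + project(w, m_m))
    return w, threshold

def classify(w, threshold, x):
    return "music" if project(w, x) > threshold else "speech"

speech = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
music = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]]
w, threshold = train_fisher(speech, music)
```

With a single feature (d = 1), S_W is a scalar and the rule reduces to comparing that one feature with a threshold, which is why a one-feature classifier of this kind is so cheap to run.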

6679
Effectiveness of Height Information for Reproducing the Presence and Reality in Multichannel Audio System
Hamasaki, Kimio; Hiyama, Koichiro; Nishiguchi, Toshiyuki; Okumura, Reiko
A 22.2 multichannel sound system was developed for use with an ultrahigh-definition video system with 4000 scanning lines. The sound system consisted of loudspeakers in three layers: an upper layer with nine channels, a middle layer with ten channels, and a lower layer with three channels plus two low-frequency effects channels. The system provides new features for three-dimensional sound reproduction. Subjective evaluations using the semantic differential (SD) method are presented to assess the importance of height information, using several stimuli in a 22.2 multichannel audio system with Super Hi-Vision and with a high-definition television. Furthermore, the actual effectiveness of height information and some practical suggestions for the aesthetic mixing of three-dimensional audio are also presented.

6680
Multichannel Processing for Microphones Arrays
Martignon, Paolo
Microphone arrays are employed to make measurements or recordings that take into account the spatial properties of sound. Here the attention is focused on planar arrays oriented to acoustic mapping, which are of particular interest in industrial and environmental acoustics, although musical and audio applications are also directly involved. An overview of beamforming theory is given first, with a brief study of array spatial resolution, a theory resting on a physical basis that no algorithm can circumvent. Then a new algorithm, based on Kirkeby multichannel inversion, is proposed. Multichannel inversion and beamforming are compared through simulations, with results in favour of the new method.
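
To make the beamforming background concrete, a minimal Python sketch of the narrowband delay-and-sum response of a uniform linear array (a simpler geometry than the planar arrays discussed) shows how spatial resolution depends on the number of microphones, their spacing and the frequency. All parameter values are illustrative assumptions; the Kirkeby-based inversion algorithm is not reproduced.

```python
import cmath
import math

def ula_response(theta_deg, steer_deg, num_mics=8, spacing=0.05, freq=2000.0, c=343.0):
    """Normalized narrowband delay-and-sum response of a uniform linear array.

    theta_deg: arrival angle of a plane wave (0 = broadside);
    steer_deg: look direction for which the delays are set.
    """
    k = 2 * math.pi * freq / c  # wavenumber in rad/m
    theta, steer = math.radians(theta_deg), math.radians(steer_deg)
    total = 0j
    for m in range(num_mics):
        # relative phase at mic m after compensating the steering delay
        total += cmath.exp(1j * k * m * spacing * (math.sin(theta) - math.sin(steer)))
    return abs(total) / num_mics
```

Sweeping `theta_deg` shows the main lobe narrowing as the aperture (`num_mics * spacing`) or `freq` increases, and grating lobes appearing once `spacing` exceeds half a wavelength: the physical limits on resolution referred to above.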

6681
Miniature Microphone Arrays for Multi-channel Recording
Backman, Juha
The paper describes a method of using a dense array of miniature microphones (e.g. MEMS or miniature electret) to yield precise one-point multi-channel gradient microphones. The signals from the individual microphones in the array are used to obtain minimum-noise estimates of the zeroth-, first-, and second-order components of the gradient of the sound field at the center of the array. (Higher orders of the gradient tend to be too noisy for actual sound recording purposes.) These components can be used to form stereo or multi-channel signals with adjustable polar patterns for recording purposes.
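
Once zeroth- and first-order components are available, a first-order virtual microphone with pattern a + (1 - a) cos(angle) can be steered in software. This Python sketch assumes idealised components `w` (pressure) and `x`, `y` (pressure gradient along two horizontal axes, scaled to the same on-axis sensitivity as `w`, i.e. without B-format weighting) and that azimuth increases towards the left; second-order terms and the paper's noise-minimising estimation are not shown.

```python
import math

def virtual_microphone(w, x, y, azimuth_deg, a=0.5):
    """First-order virtual microphone a + (1 - a) cos(angle) aimed at azimuth_deg.

    a = 1 gives an omni, 0.5 a cardioid, 0.25 a hypercardioid, 0 a figure-eight.
    """
    phi = math.radians(azimuth_deg)
    return [a * wi + (1 - a) * (math.cos(phi) * xi + math.sin(phi) * yi)
            for wi, xi, yi in zip(w, x, y)]

def plane_wave_components(source_deg, signal):
    """Idealised zeroth/first-order components for a plane wave from source_deg."""
    th = math.radians(source_deg)
    return (list(signal),
            [s * math.cos(th) for s in signal],
            [s * math.sin(th) for s in signal])

# A source at 30 degrees picked up by a stereo pair of virtual cardioids at +/-45 degrees.
w, x, y = plane_wave_components(30.0, [1.0, -0.5])
left = virtual_microphone(w, x, y, 45.0)
right = virtual_microphone(w, x, y, -45.0)
```

Changing `a` and `azimuth_deg` after the recording is what makes the polar patterns "adjustable": the same three component signals feed every virtual microphone.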

6682
Benefits of Distance Correction for Multichannel Microphones
Goerne, Thomas
Subjective assessment of different stereophonic or multichannel techniques usually suffers from the differing diffuse-field sensitivities of the microphone setups under test. Although distance correction factors for gradient transducers in the diffuse sound field are well known, they are not sufficient to model multichannel microphone arrays. Thus, correction factors for gradient transducers at an angle as well as for MS pairs are proposed, and the benefits of corrected microphone setups are investigated.
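
The well-known diffuse-field correction for first-order patterns can be computed directly. A minimal Python sketch, assuming the pattern a + (1 - a) cos(theta): its random energy efficiency (the mean of the squared pattern over the sphere) is a^2 + (1 - a)^2 / 3, and the distance factor is the square root of the reciprocal. The angled-transducer and MS-pair factors proposed in the paper are not reproduced here.

```python
import math

def random_energy_efficiency(a):
    """Mean of r(theta)^2 over the sphere for r = a + (1 - a) cos(theta)."""
    return a * a + (1 - a) ** 2 / 3

def distance_factor(a):
    """Diffuse-field distance factor: sqrt of the directivity factor 1 / REE."""
    return math.sqrt(1 / random_energy_efficiency(a))
```

For example, `distance_factor(0.5)` is about 1.73 for a cardioid and `distance_factor(0.25)` is 2 for a hypercardioid, i.e. a hypercardioid can be placed twice as far from the source as an omnidirectional microphone for the same direct-to-diffuse ratio.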

6683
Virtual Source Location Information Based Matrix Decoding System
Arora, Manish; Moon, Hangil
In this paper, a new matrix decoding system using vector-based Virtual Source Location Information (VSLI) is proposed as an alternative to the conventional Dolby Pro Logic II/IIx system for reconstructing multichannel output signals from matrix-encoded two-channel signals (Lt/Rt). The new matrix decoding system consists of a passive decoding part and an active part. The passive part produces crude multichannel signals from linear combinations of the two encoded signals (Lt/Rt), and the active part enhances each channel according to the virtual source that emerges between each pair of channels. The virtual sources between channels are estimated by inverting the constant-power panning law.
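
Inverting the constant-power panning law can be sketched in a few lines of Python: if a source is panned with gains cos(theta) and sin(theta), the ratio of the two channel RMS levels gives theta back through an arctangent. The function names are hypothetical, and the sketch works on whole signals for simplicity; a practical decoder would presumably apply it per time frame and frequency band.

```python
import math

def pan(signal, theta):
    """Constant-power panning between two channels, theta in [0, pi/2]."""
    return ([s * math.cos(theta) for s in signal],
            [s * math.sin(theta) for s in signal])

def estimate_pan_angle(ch1, ch2):
    """Invert the constant-power law: tan(theta) = RMS(ch2) / RMS(ch1)."""
    e1 = sum(v * v for v in ch1)
    e2 = sum(v * v for v in ch2)
    return math.atan2(math.sqrt(e2), math.sqrt(e1))

source = [math.sin(0.05 * n) for n in range(2000)]
ch_a, ch_b = pan(source, 0.3)
```

Because only energies are used, the estimate is confined to [0, pi/2] and does not distinguish in-phase from anti-phase content; handling that is part of what a full decoder adds.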

6684
Relating Auditory Attributes of Multichannel Reproduced Sound to Preference and to Physical Parameters
Choisel, Sylvain; Wickelmaier, Florian
Sound reproduced by multichannel systems is affected by many factors giving rise to various sensations, or auditory attributes. Relating specific attributes to overall preference and to physical measures of the sound field provides valuable information for a better understanding of the parameters playing a role in sound quality evaluation. Eight selected attributes are quantified by a panel of 39 listeners using paired-comparison judgments and probabilistic choice models, and related to overall preference. A multiple-regression model predicts preference well, and some similarities are observed within and between musical program materials, allowing for a careful generalization regarding the perception of spatial audio reproduction. Finally, a set of objective measures is derived from analysis of the sound field at the listening position in an attempt to predict the auditory attributes.
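
As an example of a probabilistic choice model for paired-comparison data, a minimal Python sketch fits a plain Bradley-Terry-Luce model with the standard minorization-maximization (MM) iteration. Whether the paper used exactly this model is not stated in the abstract, and the win counts below are invented for illustration.

```python
def bradley_terry(wins, iterations=500):
    """Fit Bradley-Terry-Luce strengths by the MM iteration.

    wins[i][j] is the number of times item i was preferred over item j.
    Assumes every item wins at least once and all items are connected
    by comparisons; otherwise the maximum-likelihood estimate does not exist.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iterations):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(total_wins / denom)
        norm = sum(new_p)
        p = [v / norm for v in new_p]  # strengths are only defined up to scale
    return p

def preference_probability(p, i, j):
    """BTL probability that item i is preferred over item j."""
    return p[i] / (p[i] + p[j])

# Invented counts for three reproduction conditions, 10 judgments per pair.
wins = [[0, 8, 9],
        [2, 0, 7],
        [1, 3, 0]]
strengths = bradley_terry(wins)
```

The fitted strengths place each stimulus on a ratio scale, which can then serve as the dependent or independent variable in a regression such as the one described above.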

6685
Quality Degradation Effects Caused by Limiting the Bandwidth of Standard Surround Sound Channels and Hierarchically Encoded MSBTF Channels: A Comparative Study
Jiao, Yu; Rumsey, Francis; Zielinski, Slawomir K.
Limiting the bandwidth of multichannel audio can be used as an effective method of trading off audio quality against broadcasting costs. In this paper, the subjective effects of two controlled high-frequency limitation methods on multichannel audio quality were studied with formal listening tests. The first method was based on limiting the bandwidth of standard surround sound channels (Rec. ITU-R BS.775-1); the second involved limiting the bandwidth of the hierarchically encoded MSBTF channels. The results are compared and discussed. In this experiment, the Low Frequency Effects (LFE) channel was omitted.

6686
Initial Developments of an Objective Method for the Prediction of Basic Audio Quality for Surround Audio Recordings
George, Sunish; Rumsey, Francis; Zielinski, Slawomir K.
This paper describes the development of an objective method for predicting the Basic Audio Quality (BAQ) of bandlimited or down-mixed surround audio recordings. A number of physical parameters, including the interaural cross-correlation coefficient and spectral descriptors, were extracted from the recordings and used in a linear regression model to predict BAQ scores obtained from listening tests. The results showed a high correlation between the predicted scores and those obtained from the listening test, with an average prediction error of less than 10%. Although the method was originally developed for 5-channel surround recordings, with some modifications it can be extended to any number of audio channels.
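
The interaural cross-correlation coefficient (IACC) is commonly computed as the maximum absolute value of the normalized cross-correlation between the left- and right-ear signals over lags of up to ±1 ms. A minimal pure-Python sketch under that common definition follows; the paper's exact windowing and any frequency weighting are not given in the abstract and are not modelled.

```python
import math
import random

def iacc(left, right, fs, max_lag_ms=1.0):
    """Return (IACC, lag in samples); a positive lag means the right signal lags the left."""
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    energy = math.sqrt(sum(v * v for v in left) * sum(v * v for v in right))
    if energy == 0:
        raise ValueError("silent input")
    best_value, best_lag = 0.0, 0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i, l_sample in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                acc += l_sample * right[j]
        value = acc / energy  # normalized cross-correlation at this lag
        if abs(value) > abs(best_value):
            best_value, best_lag = value, lag
    return abs(best_value), best_lag

# Example: white noise with the right channel delayed by 10 samples.
rng = random.Random(0)
left = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
right = [0.0] * 10 + left[:-10]
value, lag = iacc(left, right, fs=48000)
```

The lag at the maximum is also a simple estimate of the interaural time difference, which makes the same routine useful for binaural cue comparisons.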

6687
Listener Opinions of Novel Spatial Audio Scenes
Beresford, Kathryn; Rumsey, Francis; Zielinski, Slawomir K.
Listener opinions for alternative approaches to recording multichannel classical music were investigated, particularly considering alternatives to the traditional approach. Recordings were made with pre-existing microphone arrays but alternative arrangements of musicians. These were used in a listening test to assess different attributes (timbral balance, envelopment, locatedness etc.). From the results it was noted that naïve and trained listeners assessed the recordings in different ways. Through factor analysis, two components were identified to represent these assessments – creativity and conventionality. The naïve listeners indicated that purchasability was closely related to creativity whereas for the trained listeners, conventionality was an indicator of purchasability. A method for predicting purchasability was developed which may aid future work in the area.

6688
Low Frequency Sound Field Enhancement System for Rectangular Rooms using Multiple Low Frequency Loudspeakers
Birkedal Nielsen, Sofus; Celestinos, Adrian
Rectangular rooms have a strong influence on the low-frequency performance of loudspeakers. Simulations of three different room sizes have been carried out using the finite-difference time-domain (FDTD) method in order to predict the behaviour of the sound field at low frequencies. With an enhancement system using extra loudspeakers, the sound pressure level distribution across the listening area shows a significant improvement in the subwoofer frequency range. The system is simulated and implemented in the three rooms and finally verified by measurements in the real rooms.
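
The low-frequency behaviour of a rectangular room is dominated by its modes, whose frequencies for rigid walls follow f = (c/2) sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2). A short Python sketch listing them is a useful sanity check alongside simulations such as FDTD; it is an analytic illustration, not the method of the paper, and the room dimensions are arbitrary.

```python
import math
from itertools import product

def room_modes(lx, ly, lz, f_max, c=343.0):
    """Return sorted (frequency_hz, (nx, ny, nz)) for rigid-wall modes up to f_max."""
    # the largest index per axis follows from the axial-mode formula c * n / (2 * L)
    limits = [int(2 * length * f_max / c) for length in (lx, ly, lz)]
    modes = []
    for nx, ny, nz in product(*(range(n + 1) for n in limits)):
        if nx == ny == nz == 0:
            continue  # skip the trivial (0, 0, 0) solution
        f = (c / 2) * math.sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)
        if f <= f_max:
            modes.append((f, (nx, ny, nz)))
    return sorted(modes)

modes = room_modes(5.0, 4.0, 2.5, f_max=100.0)
```

The sparse spacing of these modes at the lowest frequencies is what produces the uneven pressure distribution that multiple low-frequency loudspeakers aim to smooth out.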

6689
Tactile Strategies and Resources for Teaching Multichannel Sound Concepts
Gaston, Leslie
Several university audio programs now incorporate multichannel, or “surround” sound into their curricula. In order to supplement these courses and lectures, many opportunities exist to incorporate hands-on demonstrations of concepts used for microphone techniques, mixing, monitoring, and mastering. This paper will give suggestions for different tactile strategies which can be used to illustrate concepts in multichannel audio, as well as other resources which may be utilized when doing preparation and research for teaching classes. Suggestions for homework and research topics for students will also be provided, along with recommended equipment needs.

6690
All Amplifiers are Analogue, but Some Amplifiers are More Analogue than Others
Groenenberg, René; Putzeys, Bruno; van der Hulst, Paul; Veltman, André
This paper intends to clarify the terms "digital" and "analogue" as applied to class-D audio power amplifiers. Since loudspeaker terminals require an analogue voltage, an audio power amplifier must have an analogue output. If its input is digital, digital-to-analogue conversion is executed at some point. Once a designer acknowledges the analogue output properties of a class-D power stage, amplifier quality can improve. The incorrect assumption that some amplifiers are digital leads many designers to come up with convoluted digital patches for ordinary analogue phenomena such as timing distortion or supply rejection. This irrational approach blocks the way to a rich world of well-established analogue techniques that avoid many of these problems and realize otherwise unattainable characteristics such as excellent THD+N and extremely low output impedance throughout the audio band.

6691
Towards an Ideal Switching (Class-D) Power Amplifier: How to Control the Flow of Power in a Switching Power Circuit
Esslinger, Rolf; Jurzitza, Dieter
The design of a switching (class-D) audio power amplifier suitable for high-end audio applications is still a very challenging task for circuit design and signal processing engineers. Classical power stage topologies using Pulse-Width Modulation (PWM) in combination with voltage-controlled MOSFET H-bridges are already available on the market, but their performance in terms of signal bandwidth and linearity is still far below that of traditional class-A and A/B power stages. Moreover, EMC is an issue that is very hard to control. In this paper, class-D output stages are considered from a totally different point of view: the flow of power in the output stage, with the switching power stage as a “power control element”, the output filter as an “energy store”, and the load as both a “power sink” and a “power source” when the load is not a resistor but a real-world loudspeaker. It is shown where in a typical power stage the power loss that is dissipated as heat occurs. To improve the quality and efficiency of high-frequency switched power stages, it is necessary to investigate how to control the flow of power into the storage elements and how to charge them as precisely and efficiently as possible. Some fundamental approaches to this are shown in this paper.

6692
Second Generation Intelligent Class D Amplifier Controller Integrated Circuit Enables both Low Cost and High Performance Amplifier Designs
Andersen, Jack; Chieng, Daniel; Harris, Steven; Klaas, Jeff; Kost, Michael; Taylor, Skip
This paper describes a digital input Class D amplifier controller integrated circuit which performs many of the functions needed to build a high performance Class D audio amplifier. Sophisticated digital Pulse Width Modulation, combined with digital feed-forward and feedback paths, yields both low cost and high performance amplifier designs. A powerful DSP is included to support amplifier control and allows comprehensive audio signal processing, including loudspeaker load compensation, EQ, time alignment, room acoustics compensation, bass enhancement, loudspeaker driver protection, virtual surround and other audio signal processing tasks. Power supply feed-forward and closed-loop feedback technology correct for power supply variations, non-linearity and other distortion-inducing mechanisms.

6693
PWM Amplifier Control Loops with Minimum Aliasing Distortion
Neesgaard, Claus; Risbo, Lars
PWM class-D audio power amplifiers typically contain a control loop filter network and a comparator producing the PWM signal. The comparator performs a sampling operation whenever it changes state. A previous paper by the author analyzed this sampling behavior from a small-signal point of view. The present paper attempts to formulate a large-signal model that accounts for the non-linear effects of the sampling due to aliasing of high-frequency carrier components. The model is validated using simulations, and a class of loop filters is presented that achieves minimum aliasing distortion thanks to the use of quadrature sampling. Finally, measurement data are presented for real applications using the principles described.
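
The comparator stage can be illustrated with a minimal Python sketch of naturally sampled PWM: the modulating signal is compared with a triangular carrier, and the output switches whenever the two cross. This is an open-loop illustration only; the loop filter, the sampling effects and the quadrature sampling discussed in the paper are not modelled, and all names are hypothetical.

```python
def triangle(t, fc):
    """Unit triangular carrier in [-1, 1] with frequency fc (Hz)."""
    phase = (t * fc) % 1.0
    return 4.0 * abs(phase - 0.5) - 1.0

def naturally_sampled_pwm(signal, fs, fc):
    """Compare an (oversampled) signal in [-1, 1] with the carrier; output +/-1."""
    return [1 if x > triangle(n / fs, fc) else -1 for n, x in enumerate(signal)]

# A constant input of 0.5 should give a 75 % duty cycle and an average output of 0.5.
fs, fc = 1_000_000, 10_000  # simulation rate much higher than the carrier
pulses = naturally_sampled_pwm([0.5] * 1000, fs, fc)
duty = pulses.count(1) / len(pulses)
```

In a closed-loop amplifier the comparator input is the loop filter output rather than the audio signal itself, and that input contains carrier ripple; this is where the comparator's implicit sampling and the resulting aliasing come into play.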

6694
Simple, Ultralow Distortion Digital Pulse Width Modulator
Putzeys, Bruno
A core problem with digital Pulse Width Modulators is that effective sampling occurs at signal-dependent intervals, falsifying the z-transform on which the input signal and the noise shaping process are based. In a first step the noise shaper is reformulated to operate at the timer clock rate instead of the pulse repetition frequency. This solves the uniform/natural sampling problem, but gives rise to new non-linearities akin to ripple feedback in analogue modulators. By modifying the feedback signal such that it reflects only the modulated edge of the pulse train this effect is practically eliminated, yielding vastly reduced distortion without increasing complexity.

6695
A High Performance Open Loop All-digital Class-D Audio Power Amplifier using Zero Positioning Coding (ZePoC)
Mathis, Wolfgang; Schnick, Olaf
Open-loop all-digital class-D amplifiers are uncommon because the lack of a correcting feedback path leads to several problems, resulting in high distortion compared to analog-controlled class-D amplifiers. This paper shows that SB-ZePoC lowers the switching frequency to 100 kHz. As a result, these problems can be solved, making it possible to design an open-loop all-digital class-D audio amplifier with total distortion below 0.01% across the whole audio band (20 Hz-20 kHz) and an efficiency reaching 90%. Results from a test setup will be presented. The sonic performance will be demonstrated during the session.

6696
A Three Level Trellis Noise Shaping Converter for Class D Amplifiers.
Ausiello, Ludovico; Rovatti, Riccardo; Setti, Gianluca
Class D amplifiers can represent signals with three different output levels, +Vcc, 0 and -Vcc, with no distortion. To exploit this and achieve better performance with no increase in switching frequency, an extension of the classic two-level Pulse Width Modulation A/D conversion is proposed. Coding is achieved by extending the output waveforms of a trellis-based Sigma Delta Modulation to three levels. Simulation results have shown that, at the same symbol rate, a three-level pattern achieves from 3.7 to 8.2 dB of SINAD improvement and up to 5 times lower power consumption.
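
To illustrate what a third output level changes, here is a minimal Python sketch of a plain first-order sigma-delta loop with a selectable two-level or three-level quantizer. It is not the trellis-based search proposed in the paper, only the basic feedback structure such coders build on; the function name and test signal are assumptions.

```python
def sigma_delta(signal, levels):
    """First-order sigma-delta modulator with a nearest-level quantizer."""
    integrator = 0.0
    feedback = 0.0
    out = []
    for x in signal:
        integrator += x - feedback  # accumulate the error between input and output
        y = min(levels, key=lambda q: abs(integrator - q))  # quantize to the nearest level
        out.append(y)
        feedback = y
    return out

two_level = sigma_delta([0.3] * 1000, (-1, 1))
three_level = sigma_delta([0.3] * 1000, (-1, 0, 1))
```

In both cases the average output tracks the input; with the three-level quantizer and this small input, the output stays at 0 for most samples instead of toggling between the two supply rails.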

6697
Using SIP Techniques to Verify the Trade-off between SNR and Information Capacity of a Sigma Delta Modulator
Ho, Charlotte; Ling, Bingo Wing-Kuen; Reiss, Joshua D.
The Gerzon-Craven noise shaping theorem states that the ideal information capacity of a sigma delta modulator design is achieved if and only if the noise transfer function (NTF) is minimal phase. In this paper, it is found that there is a trade-off between the signal-to-noise ratio (SNR) and the information capacity of the noise shaped channel. In order to verify this result, loop filters satisfying and not satisfying the minimal phase condition of the NTF are designed via semi-infinite programming (SIP) techniques and solved using dual parameterization. Numerical simulation results show that the design with a minimal phase NTF achieves near the ideal information capacity of the noise shaped channel, but the SNR is low. On the other hand, the design with a non-minimal phase NTF achieves a positive value of the information capacity of the noise shaped channel, but the SNR is high. Results are also provided which compare the SIP design technique with Butterworth and Chebyshev structures and ideal theoretical SDMs, and evaluate the performance in terms of SNR and a variety of information theoretic measures which capture noise shaping qualities.

6698
Estimation of Initial States of Sigma-delta Modulators
Ho, Charlotte; Ling, Bingo Wing-Kuen; Reiss, Joshua D.
In this paper, an initial condition of a sigma-delta modulator is estimated based on quantizer output bit streams and an input signal. The set of initial conditions that generate a stable trajectory is characterized. It is found that this set, as well as the set of initial conditions corresponding to the quantizer output bit streams, are convex. Also, it is found that the mapping from the set of initial conditions to the stable admissible set of quantizer output bit streams is invertible if the loop filter is unstable. Hence, the initial condition corresponding to given stable admissible quantizer output streams and an input signal is uniquely defined when the loop filter is unstable, and a projection onto convex set approach is employed for approximating the initial condition.

6699
High Performance Real-time Software Asynchronous Sample Rate Converter Kernel
Heeb, Thierry
A scalable real-time asynchronous sample rate converter software kernel is presented that offers a flexible alternative to the usual hardware implementations. The kernel is dynamically configurable at run-time and supports almost arbitrary upsampling or downsampling ratios and any number of channels. Due to its scalability, this sample rate converter kernel may be used both for low-complexity, cost-sensitive implementations and for top-performance applications. In a typical high-performance application, sample rates of 384 kHz are easily achieved on a low-cost DSP, and DSD input data streams are also supported for compatibility with SACD.

6700
Clean Clocks, Once and for All?
Frandsen, Christian G.; Travis, Chris
Network-based digital audio interfaces are becoming increasingly popular, but they pose a significant jitter problem wherever high-quality conversion to or from analog is required. This is true even with networks such as 1394 that provide dedicated support for isochronous flows. Conventional PLL solutions have too little jitter attenuation, too much intrinsic jitter, and/or too narrow a frequency range. More advanced solutions tend to cost too much. We present a new clocking technology that offers high performance at low cost. It has been implemented in a recent audio-over-1394 chip. We show comparative performance results and explore system-level implications, including for systems that use point-to-point links such as AES3, S/PDIF and ADAT.

6701
Comprehensive Analysis of Loudspeaker Span Effects on Crosstalk Cancellation in Spatial Sound Reproduction
Bai, Mingsian R.; Lee, Chih-Chung
This paper seeks to pinpoint the optimal loudspeaker span that best reconciles the robustness and performance of a crosstalk cancellation system (CCS). Two sweet-spot definitions are employed to assess robustness. In addition to the point source model, head-related transfer functions are employed in the simulation to capture more design aspects of practical situations. Three span angles, 10 degrees, 60 degrees and 120 degrees, are compared via objective and subjective experiments, and Analysis of Variance is applied to the results. The results indicate that not only the CCS performance but also the panning effect and head shadowing determine the overall performance and robustness. The 120-degree arrangement performs comparably to the 60-degree arrangement and is preferred over the 10-degree arrangement.
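
A crosstalk canceller is often designed, frequency by frequency, as a regularized inverse of the 2x2 plant matrix H from the two loudspeakers to the two ears. The minimal Python sketch below uses the free-field point-source model mentioned in the abstract (HRTFs are not modelled) and a Tikhonov-regularized inverse C = (H^H H + beta I)^(-1) H^H; the geometry (2 m listening distance, 17.5 cm ear spacing) and the values of beta are assumptions for illustration.

```python
import cmath
import math

def point_source_plant(span_deg, freq, distance=2.0, ear_half_spacing=0.0875, c=343.0):
    """H[i][j]: free-field transfer from loudspeaker j to ear i (point sources)."""
    k = 2 * math.pi * freq / c
    half = math.radians(span_deg) / 2
    speakers = [(-distance * math.sin(half), distance * math.cos(half)),
                (distance * math.sin(half), distance * math.cos(half))]
    ears = [(-ear_half_spacing, 0.0), (ear_half_spacing, 0.0)]
    return [[cmath.exp(-1j * k * math.dist(ear, spk)) / math.dist(ear, spk)
             for spk in speakers] for ear in ears]

def conj_transpose(m):
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def crosstalk_canceller(H, beta):
    """Tikhonov-regularized least-squares inverse (H^H H + beta I)^-1 H^H."""
    Hh = conj_transpose(H)
    A = matmul(Hh, H)
    A = [[A[i][j] + (beta if i == j else 0) for j in range(2)] for i in range(2)]
    return matmul(inv2(A), Hh)

def effort(C):
    """Sum of squared filter magnitudes, a simple proxy for loudspeaker drive."""
    return sum(abs(v) ** 2 for row in C for v in row)

H = point_source_plant(60.0, 1000.0)
C = crosstalk_canceller(H, beta=0.0)
P = matmul(H, C)  # ideally the identity: each ear receives only its own channel
```

Evaluating `effort` for `span_deg=10.0` at low frequencies shows how the inversion becomes ill-conditioned when the span is narrow, which is why regularization (`beta > 0`) trades some cancellation for a much smaller filter effort.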

6702
A Perceptual Measure for Assessing and Removing Reverberation from Audio Signals
Buchholz, Jörg; Hatziantoniou, Panagiotis; Mourjopoulos, John; Zarouchas, Thomas
A novel signal-dependent approach is followed here for modeling perceived distortions due to reverberation in audio signals. The method attempts to describe perceived monaural time-frequency and level distortions due to reverberation. A Computational Auditory Masking Model (CAMM) is employed, using the reverberant and reference (anechoic) signals as inputs and generating time-frequency maps of perceived distortions. From these maps, and in a number of sub-bands, gain-versus-time functions are derived that allow reverberation to be suppressed in the processed signal.

6703
Investigating Spatial Audio Coding Cues for Meeting Audio Segmentation
Burnett, Ian; Cheng, Eva; Ritz, Christian
As participants in multiparty meetings are generally stationary when actively speaking, participant location information can be used to segment the recorded meeting audio into speaker ‘turns.’ In this paper, speaker location information derived from ‘spatial cues’ generated by spatial audio coding techniques is investigated. The validity of using spatial cues for meeting audio segmentation is explored by investigating multiple microphone meeting audio recording techniques and by extracting and comparing the spatial cues used by different spatial audio coders. Experimental results show that the statistical relationship between speaker location and interchannel level- and phase-based spatial cues depends strongly on the microphone pattern. Results also indicate that interchannel correlation-based spatial cues represent location information that is ambiguous for meeting audio segmentation.
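
Interchannel level difference (ICLD) cues of the kind discussed can be computed per frame and, as a crude illustration of segmentation, each frame can be assigned to the speaker whose reference ICLD is nearest. In this Python sketch the per-speaker reference values are hypothetical inputs (in practice they would have to be learned), and no phase- or correlation-based cues are used.

```python
import math

def icld_per_frame(ch1, ch2, frame_len=1024, eps=1e-12):
    """Interchannel level difference in dB for consecutive non-overlapping frames."""
    cues = []
    for start in range(0, min(len(ch1), len(ch2)) - frame_len + 1, frame_len):
        e1 = sum(v * v for v in ch1[start:start + frame_len])
        e2 = sum(v * v for v in ch2[start:start + frame_len])
        cues.append(10 * math.log10((e1 + eps) / (e2 + eps)))
    return cues

def label_turns(cues, reference_icld):
    """Label each frame with the speaker of nearest (hypothetical) reference ICLD,
    then merge runs of equal labels into (speaker, first_frame, last_frame) turns."""
    labels = [min(reference_icld, key=lambda spk: abs(reference_icld[spk] - c)) for c in cues]
    turns = []
    for idx, lab in enumerate(labels):
        if turns and turns[-1][0] == lab:
            turns[-1] = (lab, turns[-1][1], idx)
        else:
            turns.append((lab, idx, idx))
    return turns

ch1 = [1.0] * 2048 + [0.25] * 2048  # speaker A near microphone 1, then speaker B
ch2 = [0.25] * 2048 + [1.0] * 2048
cues = icld_per_frame(ch1, ch2)
turns = label_turns(cues, {"A": 12.0, "B": -12.0})
```

How well such level cues separate speakers depends on the microphone pattern used for the recording, which is one of the dependencies the experiments above examine.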

6704
The Effect of Audio Compression Techniques on Binaural Audio Rendering
Katz, Brian F.G.; Prezat, Fabien
The use of “lossy” audio compression is becoming increasingly common. Many studies have concentrated on the audio quality of such compression techniques, predominantly in a monaural context. This study investigates the effects of audio compression techniques on spatialized audio, specifically binaural audio. Various compression techniques (AAC, ATRAC, MP2 and MP3), at various bitrates where possible, have been applied to several test signals. This work presents numerical and perceptual comparisons of the variations in interaural time difference (ITD) due to audio compression techniques. Some investigations were also made concerning the effect on spectral peaks and notches, as these spectral cues (contained in the Head-Related Transfer Function, HRTF) are necessary for more precise localization, including front-back discrimination and elevation.

6705
Sound Source Obstruction in an Interactive 3Dimensional MPEG-4 Environment
Reiter, Ulrich; Steglich, Beatrix
This paper describes the continuation of research concerning sound source obstruction in virtual scenes. An algorithm for determining sound source obstruction was implemented in the described 3D MPEG-4 environment. Using the MPEG-4 Advanced AudioBIFS node AcousticMaterial, acoustic properties are assigned to potential obstructors in a virtual scene. Various implementations of acoustic obstruction are explained. Furthermore, a bimodal subjective assessment was performed in order to identify the best implementation of obstruction. The results of the assessment are presented in depth. Additionally, we present a concept for a second planned bimodal assessment comparing gain and frequency filtering, and give an outlook on further research and development in the area of immersive acoustics.

6706
JavaOL - A Structured Audio Orchestra Language: Tools, Player and Streaming Engine
Siao, Yi-Song; Su, Alvin W.Y.; Wang, Tien-Ming
MPEG-4 Structured Audio (SA) [1] defines a set of tools to provide high-quality, low-bit-rate audio. In MPEG-4 SA, SAOL (Structured Audio Orchestra Language) is the most important part, because it is used to implement the algorithms that generate sounds. However, SAOL currently has to be translated into another programming language before it can be executed, and this requires considerable computing power for real-time decoding. Based on MPEG-4 SAOL, we propose JavaOL, which eliminates the translation process and is more efficient. In fact, it is Java equipped with a SAOL opcode library, so it can provide the same functions as SAOL. We also provide an RTP streaming engine and the associated player. Software tools are provided to combine other audio sources.

6707
Using Remote Recording over the Internet in Education
Baillie, Lynne; Dewar, Martin; Harrison, David; Knox, Don; Quinn, Patrick
Remote recording across the Internet now appears to have come of age with the recent development of appropriate software and infrastructure. Within the educational sector the Internet has taken a central role as a means to deliver educational materials. In this innovative pilot project involving Glasgow Caledonian University and its partner Coatbridge College, the use of the Internet to teach audio technology and production techniques will be explored and evaluated. It is anticipated that the knowledge and experience gained will better prepare the audio professionals of the future.

6708
A Community Hierarchic Based Approach for Scalable Parametric Audio Multicasting Over the Internet
Cuevas-Martinez, Juan Carlos; Garrido-Rivera, P.J.; Ruiz-Perez, J.; Ruiz-Reyes, Nicolas; Vera-Candeas, Pedro
One of the main features of a low-bit-rate audio coder is its suitability for broadcast over various media, mainly the Internet and mobile networks. This is well known to be a non-trivial problem: many issues can arise in a multicasting system, mainly due to the Internet's lack of QoS. This kind of audio traffic has to coexist with TCP connections, has to avoid congestion, and should require as few changes to network equipment as possible. In this paper we therefore propose some features to be taken into consideration when developing a multicast and peer-to-peer communication protocol for scalable parametric audio broadcasting over the Internet at low bit rates and with good quality.

6709
Distant Teaching of Chamber Music via Local Area Networks
Bitzer, Joerg; Kurtisi, Zefir; Loesch, Thomas; May, Tobias
In this paper we present a study on teaching chamber music via the Internet. The intended application is for a highly reputed teacher to teach professional musicians at a very high level. Usually, all participants would have to fly in from all over the world in order to work together, so it would be of great value if these lessons could be held via the Internet. Several audio and video devices and different audio setups have been tested. The results indicate that MPEG-2 broadcast devices with two microphones are suitable for this task.

6710
Implementation of Immersive Audio Applications using Robust Adaptive Beamforming and Wave Field Synthesis
Beracoechea, Jon Ander; Casajus, Javier; García, Lino; Ortiz, Luis; Torres-Guijarro, Soledad
An immersive audio system oriented to future communication applications is presented. The aim is to build a system in which the acoustic field of a chamber is recorded using a microphone array and then reconstructed, or rendered, in a different chamber using loudspeaker-array-based techniques. Our proposal relies on recent robust adaptive beamforming techniques and joint audio-video source localization to effectively estimate the original sources in the emitting room. The estimated sources and the source localization information drive a Wave Field Synthesis engine that renders the acoustic field again in the receiving chamber. The overall system performance is tested using a MUSHRA-based subjective test in a real situation.

6711
Spatial Aliasing Artifacts Produced by Linear and Circular Loudspeaker Arrays used for Wave Field Synthesis
Rabenstein, Rudolf; Spors, Sascha
Wave field synthesis allows the exact reproduction of sound fields if the requirements of its physical foundation are met. However, the practical realization imposes certain technical constraints. One of these is the application of loudspeaker arrays as an approximation to a spatially continuous source distribution. The effect of a finite spacing of the loudspeakers can be described as spatial sampling artifacts. This contribution derives a description of the spatial sampling process for planar linear and circular arrays, analyzes the sampling artifacts and discusses the conditions for preventing spatial aliasing. It furthermore introduces the reproduced aliasing-to-signal ratio as a measure for the energy of aliasing contributions.
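
As a rough illustration of why finite loudspeaker spacing matters, a commonly used conservative rule of thumb places the spatial aliasing frequency of a linear array at f_al = c / (2·Δx), where Δx is the loudspeaker spacing. The sketch below is this worst-case estimate only, not the derivation in the paper (which also covers circular arrays and the aliasing-to-signal ratio); the speed of sound of 343 m/s is an assumed default.

```python
def aliasing_frequency(spacing_m: float, c: float = 343.0) -> float:
    """Conservative spatial aliasing frequency (Hz) of a linear array.

    Rule-of-thumb assumption: plane-wave components may arrive at grazing
    incidence, which gives the worst case f_al = c / (2 * spacing).
    """
    if spacing_m <= 0:
        raise ValueError("loudspeaker spacing must be positive")
    return c / (2.0 * spacing_m)


# Denser arrays push the onset of spatial aliasing to higher frequencies.
for dx in (0.10, 0.15, 0.20):
    print(f"spacing {dx:.2f} m -> aliasing above ~{aliasing_frequency(dx):.0f} Hz")
```

Halving the spacing doubles the estimated aliasing frequency, which is why practical arrays trade loudspeaker count against usable bandwidth.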

6712
Characterization of the Reverberant Sound Field Emitted by a Wave Field Synthesis Driven Loudspeaker Array
Caulkins, Terence; Warusfel, Olivier
Realistic sound reproduction using Wave Field Synthesis in concert halls involves ensuring that both the direct and reverberated sound fields are accurate at all listening positions. Though methods for controlling the direct sound field have been described in the past, the control of the reverberated sound field associated with WFS sources remains a topic of interest. This article describes the characterization of the reverberated sound field associated with a WFS array as it synthesizes a virtual point source. Variations in the directivity and positioning of the virtual source are shown to have an effect on the associated room effect. Based on this characterization, a solution is proposed for controlling the reverberated sound field in a concert hall equipped with a WFS system.

6713
Conjugate Gradient Techniques for Multichannel Acoustic Echo Cancellation in Frequency Domain
Beracoechea, Jon Ander; Casajús-Quirós, Francisco Javier; García Morales, Lino; Torres-Guijarro, Soledad
The multichannel acoustic echo cancellation problem requires working with extremely long impulse responses. Multirate adaptive schemes such as the partitioned block frequency-domain adaptive filter (PBFDAF) are good alternatives and are widely used in commercial echo cancellation systems today. However, because PBFDAF is derived from the Least Mean Square (LMS) algorithm, its convergence may not be fast enough under some circumstances. In this paper we propose a new scheme that combines frequency-domain adaptive filtering with conjugate gradient techniques in order to speed up convergence. The new algorithm (PBFDAF-CG) is developed and its behavior is compared against previous PBFDAF schemes.

6714
SigmaStudio. A User Friendly, Intuitive and Expandable, Graphical Development Environment for Audio/DSP Applications.
Chavez, Miguel A.; Huin, Camille
Graphical development environments have been used in the audio industry for a number of years. Those with fewer limitations have persisted and found a well-established pool of users that is reluctant to modify their design patterns and adopt different embedded processors and design environments. This article provides a brief history of the evolution of integrated development environments (IDEs). It then describes the software architecture decisions and design challenges involved in developing SigmaStudio, and shows the advantages those decisions have brought to the SigmaDSP family of audio-centric embedded processors.

6715
Filter Update Techniques for Adaptive Virtual Acoustic Imaging
Kim, Youngtae; Mannerheim, Pal; Nelson, Philip A.
This paper deals with filter updates for adaptive virtual acoustic imaging systems using binaural technology and loudspeakers. The problem is to update the inverse filters without creating any audible changes for the listener. This can be achieved either by using a very fine mesh for the inverse filters or by using commutation techniques.

6716
Adaptive Filters in Wavelet Transform Domain
Bajic, Vladan
The paper presents a performance comparison between two methods of implementing adaptive filtering algorithms for noise reduction, namely the Normalized time domain Least Mean Squares (NLMS) algorithm and the Wavelet transform domain LMS (WLMS). A brief theoretical development of both methods is given, and both algorithms are then implemented on a real-time Digital Signal Processing (DSP) system for audio signal processing. Results are presented showing the performance of each algorithm in both the time and frequency domains. The noise reduction produced by each algorithm is shown across the spectrum, and distortion effects are analyzed, as is the trade-off between convergence speed and added noise. Overall results show improved convergence speed with the WLMS algorithm over the NLMS algorithm.
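
For readers unfamiliar with the time-domain baseline, the following is a minimal NLMS sketch in a system-identification setting, the core of adaptive noise cancellation. The step size `mu`, regularization `eps`, filter length and the "unknown" path `h` are illustrative assumptions; the wavelet-domain (WLMS) variant compared in the paper is not shown.

```python
import numpy as np

def nlms(x, d, num_taps=8, mu=0.5, eps=1e-8):
    """Normalized LMS adaptive FIR filter.

    x: reference input (e.g. noise picked up by a reference sensor)
    d: desired signal (e.g. primary input containing correlated noise)
    Returns (y, e, w): filter output, error signal and final weights.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(num_taps)
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    buf = np.zeros(num_taps)  # most recent input samples, newest first
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        y[n] = w @ buf
        e[n] = d[n] - y[n]
        # The step size is normalized by the instantaneous input power.
        w += (mu / (eps + buf @ buf)) * e[n] * buf
    return y, e, w


# Usage: identify a hypothetical 4-tap noise path from white-noise input.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h = np.array([0.5, -0.3, 0.2, 0.1])
d = np.convolve(x, h)[: len(x)]
_, e, w = nlms(x, d, num_taps=4)
print(np.round(w, 3))  # should approach h
```

Normalizing by the input power makes the convergence rate largely independent of signal level, which is the main reason NLMS is the usual time-domain reference point.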

6717
Adaptive Time-Frequency Resolution for Analysis and Processing of Audio
Lukin, Alexey; Todd, Jeremy
Filter banks with fixed time-frequency resolution, such as the STFT, are a common tool for many audio analysis and processing applications, allowing efficient implementation via the FFT. The major problem with the STFT approach is precisely this fixed time-frequency resolution. In this paper, we suggest adaptively varying the STFT time-frequency resolution in order to reduce filter-bank-specific artifacts such as pre-echo while retaining adequate frequency resolution. Several strategies for systematic adaptation of the time-frequency resolution are proposed. The approach is demonstrated for spectrogram display, noise reduction, and spectral effects processing.

6718
Advanced Methods for Shaping Time-Frequency Areas for the Selective Mixing of Sounds
Kleczkowski, Adam; Kleczkowski, Piotr
The Selective Mixing of Sounds (presented in AES paper 6552) contains a large and conceptually challenging component that had not been developed previously: the method of determining the areas in the time-frequency plane. This component has a major effect on the quality of the sound. In this paper we propose and compare a range of appropriate algorithms. We begin with a simple two-dimensional running mean combined with a rule selecting the track characterised by the maximum energy, followed by a low-pass filter based on the 2-dimensional Fourier transform. We also propose several novel methods based on a Monte Carlo approach, in which local probabilistic rules are iterated many times to produce a required level of smoothing.
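
The simplest approach named above, a two-dimensional running mean followed by selection of the track with maximum energy, can be sketched as follows. The window sizes `kf`/`kt` and the edge-padding choice are assumptions; the subsequent 2-D Fourier low-pass and the Monte Carlo methods are not reproduced.

```python
import numpy as np

def running_mean(a, k, axis):
    """Centered moving average of length k along one axis (edge-padded)."""
    pad = [(0, 0)] * a.ndim
    pad[axis] = (k // 2, k - 1 - k // 2)
    ap = np.pad(a, pad, mode="edge")
    c = np.cumsum(ap, axis=axis)
    c = np.insert(c, 0, 0.0, axis=axis)  # prefix sums with a leading zero
    n = a.shape[axis]
    upper = np.take(c, np.arange(k, k + n), axis=axis)
    lower = np.take(c, np.arange(0, n), axis=axis)
    return (upper - lower) / k

def smooth_2d(power, kf=5, kt=5):
    """Separable 2-D running mean over (frequency, time)."""
    return running_mean(running_mean(power, kf, axis=0), kt, axis=1)

def select_tracks(powers, kf=5, kt=5):
    """Index of the track with maximum smoothed energy at each TF point."""
    smoothed = np.stack([smooth_2d(np.asarray(p, dtype=float), kf, kt) for p in powers])
    return np.argmax(smoothed, axis=0)


# Usage: two toy power spectrograms (frequency x time) occupying
# the lower and upper halves of the spectrum respectively.
P1 = np.zeros((40, 10))
P1[:20] = 1.0
P2 = 1.0 - P1
labels = select_tracks([P1, P2], kf=3, kt=3)
print(labels[:, 0])
```

The running mean suppresses isolated TF points switching between tracks, which is the kind of area shaping the abstract describes; larger windows give smoother, coarser areas.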

6719
Demixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
Bonada, Jordi; Loscos, Alex; Vinyes, Marc
Blind audio separation of real commercial music recordings is still an open problem, although in the last few years some techniques have provided interesting results. This article presents a human-assisted clustering of the DFT coefficients for the Time-Frequency Masking demixing technique. The DFT coefficients are grouped by adjacent panning and phase difference, and an interactive graphical interface allows the user to select, in real time, panning and phase-difference ranges for each song. Results show that an implementation of this technique can be used to demix tracks from current commercial songs. Sample sounds can be found here: http://www.iua.upf.es/~mvinyes/abs/demos

6720
A Multichannel Speech Dereverberation Technique Based Upon the Wiener Filter
Boland, Frank; McCarthy, Denis
We present a new method for dereverberating speech, based upon a multichannel Wiener Filter and a microphone array. We demonstrate the effectiveness of this method under real, reverberant conditions and show that the method may be described as a self-steering beamformer. Furthermore, we investigate the performance of the method under simulated conditions designed to closely match the acoustic characteristics of the real room environment. These simulations yield significantly inferior results to those obtained using real recordings, and we show that this is a result of the failure of simulated impulse responses to model real impulse responses accurately in certain critical respects.

6721
Effective Room Equalization Based on Warped Common Acoustical Poles and Zeros
Jeong, Jae-woong; Jeong, Seh-Woong; Lee, Junho; Park, Young-cheol; Youn, Dae-hee
This paper presents a new method of designing room equalization filters using warped common acoustical pole and zero (WCAPZ) modeling. The proposed method is capable of significantly reducing the order of the equalization filters without sacrificing filter performance. Thus, the associated input-output delay is much smaller than that of the conventional room equalization method, while the computational complexity is comparable. This paper also presents an adaptive IIR filter structure to overcome computational problems associated with the calculation of CAPZ coefficients. Simulation results confirm that the proposed algorithm significantly improves room equalization over a range of low frequencies.

6722
Parametric Recursive Higher-Order Shelving Filters
Holters, Martin; Zölzer, Udo
The main characteristic of shelving filters, as commonly used in audio equalization, is to amplify or attenuate a certain frequency band by a given gain. For parametric equalizers, a filter structure is desirable that allows independent adjustment of the width and center frequency of the band, and the gain. In this paper, we present a design for arbitrary-order shelving filters and a suitable parametric structure. A low shelving filter design based on Butterworth filters is decomposed such that the gain can be easily adjusted. Transformation to the digital domain is performed, keeping gain and denormalized cut-off frequency independently controllable. Finally, we obtain mid and high shelving filters using a simple manipulation, providing the desired parametric filter structure.
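
As a point of reference for the higher-order designs, the classic first-order low-shelving filter can be obtained from the analog prototype H(s) = (s + G·ωc)/(s + ωc) via the bilinear transform with frequency prewarping. This is the textbook first-order case only, not the authors' Butterworth-based arbitrary-order structure. In this naive form the gain √G (the midpoint in dB) is reached at ωc·√G rather than at ωc, so gain and transition frequency are not fully independent.

```python
import math

def low_shelf_first_order(gain_db, fc, fs):
    """First-order low-shelving filter via prewarped bilinear transform.

    Normalized analog prototype: H(s) = (s + G) / (s + 1), i.e. gain G at
    DC and unity gain at high frequencies, with s = 1 mapped to fc.
    Returns (b0, b1, a1) for y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1].
    """
    G = 10.0 ** (gain_db / 20.0)
    K = 1.0 / math.tan(math.pi * fc / fs)  # prewarping constant
    b0 = (K + G) / (K + 1.0)
    b1 = (G - K) / (K + 1.0)
    a1 = (1.0 - K) / (K + 1.0)
    return b0, b1, a1

def magnitude(coeffs, f, fs):
    """|H(e^{jw})| of the first-order section at frequency f (Hz)."""
    b0, b1, a1 = coeffs
    w = 2.0 * math.pi * f / fs
    z1 = complex(math.cos(w), -math.sin(w))  # z^-1 on the unit circle
    return abs((b0 + b1 * z1) / (1.0 + a1 * z1))


# Usage: +6 dB low shelf at 200 Hz, 48 kHz sampling rate.
coeffs = low_shelf_first_order(6.0, 200.0, 48000.0)
for f in (0.0, 200.0, 2000.0, 24000.0):
    print(f"{f:7.0f} Hz: {20 * math.log10(magnitude(coeffs, f, 48000.0)):5.2f} dB")
```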

6723
Enhanced Control of Sound Field Radiated by Co-axial Loudspeaker Systems using Digital Signal Processing Techniques
Boucher, Jean Marc; Debail, Bernard G.A.; Diquelou, Pierre Yves; Kerneis, Yvon; Shaiek, Hmaied
In multi-way loudspeakers, DSP techniques have so far been used mainly to correct frequency response, time alignment and off-axis lobing. In this paper, a dedicated signal processing technique is described that also controls the sound field radiation of a new 4-way co-axial source in the overlap frequency bands of the drivers. Trade-offs and practical constraints (crossover, time shift, gain ...) are discussed, and an optimization algorithm is proposed to provide the best achievable result. A real-time implementation of this technique dedicated to the new 4-way co-axial source is presented; it leads to a nearly ideal point source.

6724
Network Music Performance (NMP) in Narrow Band Networks
Carôt, Alexander; Krämer, Ulrich; Schuller, Gerald
Playing live music over the Internet is one of the hardest disciplines in terms of low-delay audio capture and transmission, time synchronization and bandwidth requirements. This has already been successfully evaluated with the Soundjack software, which can be described as a low-latency UDP streaming application. In combination with the new Fraunhofer ULD Codec, this technology could now be used in narrowband DSL networks without a significant increase in latency. This paper first describes the essential basics of network music performance in terms of soundcard and network issues, and then reviews them under narrowband DSL network restrictions and with the use of the ULD Codec.

6725
Intensive Noise Reduction utilizing Inharmonic Frequency Analysis of GHA
Kanda, Yoshihiro; Muraoka, Teruo; Ohta, Takumi; Takamizawa, Ryuji
Removal of noise in SP record reproduction was attempted using GHA (Generalized Harmonic Analysis) as an inharmonic frequency analysis. Spectral subtraction is the most common conventional noise reduction technique; however, it has the side effect of generating musical noise. This is caused by the inaccurate frequency resolution inherent in conventional harmonic frequency analysis. GHA, one of the inharmonic frequency analysis methods, offers excellent frequency resolution and has recently been put into practical use. The authors applied GHA to noise reduction and obtained better results than with conventional spectral subtraction. However, musical noise problems still remained, mainly because of the spectral mismatch between the pre-sampled reference noise and the actual residual noise. The authors tried several countermeasures, such as pre-spectral shaping of the object signal and calculation of the spectral similarity of the residual noise. By combining these countermeasures, the authors achieved satisfactory noise reduction.
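
For context, the conventional power spectral subtraction that the GHA approach is compared against can be sketched per frame as follows. The over-subtraction factor `alpha` and spectral floor `beta` are illustrative assumptions, and neither GHA itself nor the authors' countermeasures are shown. Isolated bins that survive the subtraction differently from frame to frame are what is heard as musical noise.

```python
import numpy as np

def spectral_subtract(frame, noise_psd, alpha=1.0, beta=0.01):
    """Power spectral subtraction on one (already windowed) frame.

    frame: real time-domain samples
    noise_psd: estimated noise power per rfft bin, e.g. averaged from a
               noise-only reference segment
    alpha: over-subtraction factor; beta: spectral floor relative to noise
    Returns the enhanced time-domain frame (the noisy phase is reused).
    """
    X = np.fft.rfft(frame)
    power = np.abs(X) ** 2
    clean_power = np.maximum(power - alpha * noise_psd, beta * noise_psd)
    gain = np.sqrt(clean_power / np.maximum(power, 1e-12))
    return np.fft.irfft(gain * X, n=len(frame))


# Usage: a frame of noise whose spectrum equals the noise estimate is
# reduced to the spectral floor (here sqrt(beta) = 0.1 in amplitude).
rng = np.random.default_rng(1)
frame = rng.standard_normal(256)
noise_psd = np.abs(np.fft.rfft(frame)) ** 2
out = spectral_subtract(frame, noise_psd)
```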

6726
Advanced Cataloging and Search Techniques in Audio Archiving
Blohmer, Helge
Ever since the processing capabilities of computers reached the point, in the late 1980s, where audio indexing and searching became possible using techniques beyond simple, manually entered textual annotation, researchers have been developing such methods with varying degrees of success. Yet even today, the actual workflow in audio archives is dominated by text entry for cataloging and keywords for searching, with few or none of the new methods having achieved any practical relevance. This paper evaluates a number of techniques, both those that enhance textual retrieval and those that seek to supplant it, for their suitability for real-world audio archiving tasks, with special focus on short-term implementation and seamless integration into existing archive workflows.

6727
Evaluation of Query-by-Humming Systems using a Random Melody Database
Batke, Jan-Mark; Eisenberg, Gunnar
The performance of melody retrieval using a query-by-humming (QBH) system depends on different parameters. For the query, parameters such as the length of the query and any errors it contains influence the success of the retrieval. The size of the melody database inside the QBH system also has an impact on the query. This paper describes how the statistical parameters of a random melody database are modelled so that it behaves like a database containing authentic melodies. Databases containing random melodies thus provide a testing facility for QBH systems.

6728
MP3 Window-switching Pattern Analysis for General Purposes Beat Tracking on Music with Drums
D'Aguanno, Antonello; Haus, Goffredo; Vercellesi, Giancarlo
This work analyzes how the window-switching pattern depends on different encoders, bit rates and encoder quality features. We propose a simple template-matching algorithm for beat tracking in music with drums that uses window-switching pattern information only, whereas beat-tracking systems commonly use the window-switching pattern just to refine the results of a frequency evaluation. Furthermore, this paper aims to demonstrate the reliability of the window-switching pattern for solving the beat-tracking problem in music with drums, independently of encoders, bit rates, encoder quality features and frequency analysis. It confirms that the window-switching pattern provides adequate information in a beat-tracking context at every bit rate and for every encoder.

6729
Application of MPEG-4 SLS in MMDBMSs – Requirements for and Evaluation of the Format
Meyer-Wegener, Klaus; Penzkofer, Florian; Suchomski, Maciej
This paper describes specific requirements for audio storage in multimedia database management systems, where data independence of continuous data plays a key role. Based on the characteristics defined for the internal format for natural audio, with particular regard to long-term storage, where a lossless format is a must (allowing, among other things, easy upgrade of the system), the new MPEG-4 scalable lossless audio coding (SLS) is briefly explained. SLS is then evaluated against the discussed requirements in terms of its characteristics and the processing complexity of the algorithm. Some suggestions for possible modifications are given at the end.

6730
Applying EAI Technologies to Bimedial Broadcast Environments. Challenges, Chances and Risks.
Zimmermann, Michael
More and more broadcast companies are trying to optimize their production environments by enforcing bimedial workflows. Current applications and tools, on the other hand, offer only poor integration interfaces for achieving this goal. EAI (Enterprise Application Integration), originally focused on the integration of legacy systems, has become a mature toolset for integrating various systems, offering tools and applications that ease integration. This lecture shows the possibilities and limits of EAI in bimedial broadcast environments.

6731
Virtual Concert: Spatial Sound in DVD Technology
Gordon, David M. H.
A comprehensive research paper documenting the use of spatial sound in DVD technology. Sets out to evaluate the communicative abilities of spatial sound and the implications of combining spatial sound with selective multiple camera angles. Extends the rationale by investigating the use of a non-linear structure in the presentation of audio-visual DVD products. Asserts that no product currently integrates these deconstructed components into a single framework, and therefore reports on the development of a concept titled Virtual Concert. Discusses the underlying concept of Virtual Concert in relation to the combination of surround sound music mixes with the corresponding camera angle, presented in a non-linear structure. The emphasis is on practical subjective evaluation through a screening of Virtual Concert and the subsequent distribution of comprehensive questionnaires.

6732
The Adaptation of Concert Hall Measures of Spatial Impression to Reproduced Sound
Davies, W. J.; Hirst, Jonathan; Philipson, Peter
A method of objectively measuring the spatial capabilities of multichannel sound systems has been investigated. The method involves the comparison of interaural cross correlation (IACC) measurements taken in a concert hall with IACC measurements taken in reproduced versions of the same concert hall. The type of reproduction system was varied, and an indication of the spatial capabilities of each system was gained from the comparison of original and reproduced IACC measurements. The comparisons revealed that none of the reproduction systems could match the lowest IACC readings taken in the concert hall, and that the measurement method was capable of discriminating between the spatial performances of the reproduction systems and of ranking them in the expected order.
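
The IACC used in such comparisons is conventionally the maximum absolute value of the normalized interaural cross-correlation over lags of ±1 ms. A minimal broadband sketch follows; the ±1 ms lag range follows common practice (e.g. ISO 3382), while any octave-band filtering or early/late time windowing the authors applied is not specified here and is left out.

```python
import numpy as np

def iacc(left, right, fs, max_lag_ms=1.0):
    """Interaural cross-correlation coefficient of two ear signals.

    Returns max |IACF(tau)| for |tau| <= max_lag_ms, where IACF is the
    cross-correlation normalized by the energies of both signals.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if len(left) != len(right):
        raise ValueError("ear signals must have equal length")
    norm = np.sqrt(np.dot(left, left) * np.dot(right, right))
    if norm == 0:
        return 0.0
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.dot(left[lag:], right[: len(right) - lag])
        else:
            c = np.dot(left[:lag], right[-lag:])
        best = max(best, abs(c))
    return best / norm


# Usage: identical ear signals give IACC = 1; independent noise gives ~0.
rng = np.random.default_rng(0)
x = rng.standard_normal(48000)
y = rng.standard_normal(48000)
print(iacc(x, x, 48000), iacc(x, y, 48000))
```

A low IACC indicates dissimilar ear signals, which is associated with greater spatial impression; this is why the lowest hall readings are the hardest for a reproduction system to match.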

6733
Analysis of Spatial Resolution of Multiactuator Panels
Bleda, Sergio; Escolano, José; López, José Javier; Pueo, Basilio
A study of the aliasing frequency of Multiactuator Panels (MAPs) for Wave Field Synthesis (WFS) is presented. It is based on the periodicity of the spatial frequency in a wavenumber-domain analysis. The success of these loudspeakers for WFS lies in the absence of cross-interference between exciters, which therefore act as individual sources. However, the distance between exciters may not be indicative of the spatial resolution capability of the array. A set of four MAPs comprising 32 exciters was measured using this multidimensional analysis. An additional dynamic loudspeaker array with the same loudspeaker spacing was also measured. Results show a good correlation with the figures expected from the distance between exciters.

6734
New CLD Quantization Method for Spatial Audio Coding
Choi, Seung Jong; Jung, Yang-Won; Kim, Hyo Jin; Oh, Hyen-O
In spatial audio coding, spatial parameters such as channel level differences (CLDs), CPCs and ICCs are used for down-mix and up-mix of multi-channel audio signals. In the current version of MPEG Surround, a universal quantization table for CLD is applied regardless of the channel combination. As the intervals between adjacent channels differ in the conventional 5.1-channel configuration, this universal quantization scheme causes redundancy for some combinations and insufficiency for others. In this paper, we propose a new CLD quantization method based on the well-known amplitude panning law and the spatial resolution of human perception. With the proposed quantization method, CLD can be represented more efficiently, so that bit reduction and quality enhancement can be achieved.
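
To illustrate why a single CLD table fits some channel pairs better than others, the sketch below maps phantom-source angles to CLD values via the tangent panning law for a symmetric loudspeaker pair with half-aperture θ0, then builds a table by stepping the angle uniformly. The choice of the tangent law, the uniform 5° step and the two example apertures are assumptions for illustration, not the quantizer proposed in the paper.

```python
import math

def cld_from_angle(theta_deg, half_aperture_deg):
    """Channel level difference (dB) that pans a phantom source to theta.

    Tangent law: tan(theta)/tan(theta0) = (g1 - g2)/(g1 + g2) for a pair
    at +/-theta0, hence g1/g2 = (1 + r)/(1 - r) and CLD = 20*log10(g1/g2).
    """
    r = math.tan(math.radians(theta_deg)) / math.tan(math.radians(half_aperture_deg))
    if not -1.0 < r < 1.0:
        raise ValueError("angle must lie strictly between the loudspeakers")
    return 20.0 * math.log10((1.0 + r) / (1.0 - r))

def cld_table(half_aperture_deg, step_deg):
    """CLD values for angles on a uniform grid strictly inside the pair."""
    n = int(half_aperture_deg // step_deg)
    angles = [k * step_deg for k in range(-n, n + 1)
              if abs(k * step_deg) < half_aperture_deg]
    return [cld_from_angle(a, half_aperture_deg) for a in angles]


# Same angular step, two hypothetical apertures: the CLD grids differ,
# so one shared CLD table cannot be uniformly fine in angle for both.
print([round(v, 1) for v in cld_table(30.0, 5.0)])
print([round(v, 1) for v in cld_table(40.0, 5.0)])
```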

6735
Koch’s Snowflake: A Case Study of Sound Scattering of Fractal Surfaces
Cabrera, Densil; Degos, David; Edson, Steven
Diffusion and scattering are becoming increasingly relevant in room acoustics design. The scattering performance of current passive diffusers is often restricted to a certain bandwidth by physical constraints. One possible way to overcome this is to use fractal surface profiles, which have similar geometric features over a wide range of scales and so should achieve an extended bandwidth for effective scattering. A range of acoustic panels of varying complexity, based on Koch’s Snowflake pattern, was constructed and tested using a two-dimensional pseudo-anechoic method adapted from AES-4id-2001. This paper reports on these results and on issues encountered in implementing the measurements.

6736
Large Scale FEM Analysis of a Studio Room
Ahnert, Wolfgang; Bansal, Mahesh; Feistel, Stefan
In room acoustics, particle models such as ray tracing and the image source method are not sufficient to explain the wave nature of sound, especially at low frequencies. For detailed acoustic investigation, many wave-based approaches such as FEM, BEM and finite difference methods have been proposed. We present an application of large-scale FEM analysis to obtain the eigenmodes and transfer functions of a real-world studio with general impedance boundary conditions. The results are compared with measured data and show fair agreement. Since FEM requires discretization of the domain into small elements such as tetrahedra and hexahedra, we also propose a novel all-hexahedral mesh generator for arbitrarily shaped rooms and show its application in room acoustics.

6737
Influence of Ray Angle of Incidence and Complex Reflection Factor on Acoustical Simulation Results (Part II)
El-Saghir, Emad; Feistel, Stefan
In a previous paper [1], it was shown that the influence of neglecting the incidence-angle dependence of absorption coefficients in a single-source, shoebox room model was insignificant as far as simulation results are concerned. Neglecting phase shift at each reflection led, however, to a significant difference in the predicted pressure level in the same model. This paper investigates the same two questions in a complicated model with several sources and a diversity of surface materials. It attempts to analytically estimate the error associated with the disregard of these two issues.
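
Both effects studied here, angle-dependent absorption and the phase shift at each reflection, follow from the complex plane-wave reflection factor. The sketch below assumes a locally reacting surface with wall impedance normalized to that of air, ζ = Z/(ρc), for which R(θ) = (ζ·cosθ − 1)/(ζ·cosθ + 1) and α(θ) = 1 − |R|². The example impedance is hypothetical, and the paper's own material data and error analysis are not reproduced.

```python
import cmath
import math

def reflection_factor(zeta, theta_deg):
    """Plane-wave reflection factor of a locally reacting surface.

    zeta: wall impedance normalized by the characteristic impedance of air
    theta_deg: angle of incidence measured from the surface normal
    """
    zc = zeta * math.cos(math.radians(theta_deg))
    return (zc - 1) / (zc + 1)

def absorption(zeta, theta_deg):
    """Angle-dependent absorption coefficient alpha = 1 - |R|^2."""
    return 1.0 - abs(reflection_factor(zeta, theta_deg)) ** 2


zeta = 2.0 + 1.5j  # hypothetical normalized impedance
for theta in (0, 30, 60, 80):
    R = reflection_factor(zeta, theta)
    print(f"{theta:2d} deg: |R|={abs(R):.3f}, "
          f"phase={math.degrees(cmath.phase(R)):6.1f} deg, "
          f"alpha={absorption(zeta, theta):.3f}")
```

Using only a real, angle-independent α discards both the variation with θ and the phase of R, which are the two simplifications whose effect the paper quantifies.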

6738
Adaptive Audio Equalization of Rooms Based on a Technique of Transparent Insertion of Acoustic Probe Signals
Ferreira, Aníbal J. S.; Leite, António; Pinto, Francisco; Rocha, Ariel F.
This paper presents an enhanced method for performing real-time adaptive equalization of room acoustics in the frequency domain. The method obtains the frequency response of the room by transparently inserting a number of acoustic probe signals into the main audio spectrum. Opportunities for inserting the tones are identified by a spectral analysis of the audio signal based on a psychoacoustic model of frequency masking. The enhanced adaptive equalizer is explained, together with its real-time implementation on a TMS320C6713 DSP-based platform. Results of the acoustic tests are discussed and conclusions about global performance are presented.

6739
An Amphitheatric Hall Modal Analysis using the Finite Element Method Compared to in situ Measurements.
Kalliris, George; Papanikolaou, George; Papastefanou, Anastasia; Sevastiadis, Christos
The distribution of low-frequency room modes is important in room acoustics. The Finite Element Method (FEM) is a powerful numerical technique for analyzing the behavior of sound waves in enclosures, especially irregular ones. It also produces reliable results in the low-frequency range, where other methods such as ray tracing and the image source method fail. A modal analysis using FEM is presented for a non-rectangular, medium-sized amphitheatric hall, and the calculated results are compared with those obtained from on-site measurements.

6740
A Computer Aided Design Method for the Dimensions of a Rectangular Enclosure to Avoid Degeneracy of Standing Waves
Liu, Zhi; Wu, Fan
A method for designing the dimensions of a rectangular enclosure to avoid degeneracy of standing waves, and the corresponding computer-aided design software, are presented in this paper. A mathematical model is created to calculate dimensions that avoid degeneracy of standing waves, where normal frequencies are regarded as degenerate when their similarity falls within a specified limit. Based on the relationship between the normal frequencies and the dimensions of a rectangular enclosure, dimensions that avoid degeneracy can be chosen. A computer-aided design program is also developed to identify such dimensions, which can conveniently be applied to the design of loudspeaker cabinets or rooms to obtain the best acoustic result.
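
The relationship between dimensions and normal frequencies that such a method relies on is the standard rigid-wall rectangular-room formula f = (c/2)·√((nx/Lx)² + (ny/Ly)² + (nz/Lz)²). The sketch below lists modes and flags adjacent pairs closer than a relative tolerance; the rigid walls, the 1 % tolerance and c = 343 m/s are assumptions, not the degeneracy criterion used in the paper.

```python
import itertools
import math

def room_modes(lx, ly, lz, f_max, c=343.0):
    """Normal frequencies of a rigid-walled rectangular room up to f_max.

    Returns a sorted list of (frequency_hz, (nx, ny, nz)).
    """
    nx_max = int(2 * f_max * lx / c)
    ny_max = int(2 * f_max * ly / c)
    nz_max = int(2 * f_max * lz / c)
    modes = []
    for n in itertools.product(range(nx_max + 1), range(ny_max + 1), range(nz_max + 1)):
        if n == (0, 0, 0):
            continue
        f = (c / 2.0) * math.sqrt((n[0] / lx) ** 2 + (n[1] / ly) ** 2 + (n[2] / lz) ** 2)
        if f <= f_max:
            modes.append((f, n))
    return sorted(modes)

def near_degenerate(modes, rel_tol=0.01):
    """Adjacent mode pairs whose frequencies differ by less than rel_tol."""
    return [(a, b) for a, b in zip(modes, modes[1:]) if (b[0] - a[0]) / a[0] < rel_tol]


# Usage: a cube is the classic worst case, since its axial modes coincide.
for room in ((3.0, 3.0, 3.0), (5.0, 4.0, 3.0)):
    modes = room_modes(*room, f_max=150.0)
    print(room, len(modes), "modes,", len(near_degenerate(modes)), "near-degenerate pairs")
```

A dimension search would evaluate candidate (Lx, Ly, Lz) triples with a criterion like `near_degenerate` and keep those with the fewest coincidences.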

6741
A 3D Acoustic Simulation Program with Graphical Frontend for Scene Input
Kuntz, Achim; Rabenstein, Rudolf
A program for full three-dimensional simulation of sound propagation in enclosures is presented that interfaces with a graphical front-end for intuitive setup of complex simulation scenes. The simulation algorithm is based on the wave digital filtering principle, allowing arbitrary reflection coefficients at object boundaries and walls for realistic results. Simulation scenes are defined in an object-oriented way. The modeler front-end of a raytracing program is used as the graphical user interface to the simulation program, so simulation setups can be built by graphically placing objects in the scene. Being open source, the proposed modeler can easily be customized if required. Simulation results are shown for several example setups, demonstrating the possibilities of our approach.

6742
Absorptive Material Arrangement Method for Global Interior Noise Control in Wide Frequency Range
Cho, Sung-Ho; Kim, Yang-Hann
A simple method is proposed for arranging absorptive material to achieve global interior noise reduction over a wide frequency range. When an enclosure’s typical dimensions are of the order of several wavelengths or less, and the sources and enclosure are geometrically complex, it is not easy to determine how to control its noise effectively by attaching absorptive materials to its walls. The proposed method, however, helps the designer understand which treatments are most effective and how a better design can be achieved. Its main advantage is that an absorptive material arrangement for global noise reduction can be found easily, without having to calculate the sound field using the perturbation method or the boundary element method, because the method needs only the eigenstructures (eigenvalues and eigenfunctions) of the enclosure.

6743
Real Time Acoustic Rendering of Complex Environments Including Diffraction and Curved Surfaces
Bouatouch, Kadi; Deille, Olivier; Maillard, Julien; Martin, Jacques; Noé, Nicolas
A solution to produce virtual sound environments based on the physical characteristics of a modeled complex volume is described. The goal is to reproduce, in real time, the sound field depending on the position of the listener and to allow some interactivity (change in material characteristics for instance). First an adaptive beam tracing algorithm is used to compute a geometrical solution between the sources and several positions inside that volume. This algorithm is not limited to polygonal faces and handles diffraction. Then, the precomputed paths, once ordered and selected, are auralized and an adaptive artificial reverberation is used. New techniques to allow fast and accurate rendering are detailed. The proposed approach provides accurate audio rendering on headphones or within advanced multi-user immersive environments.

6744
Comparison between In-situ Recordings and Auralizations
Nijs, Lau; Rindel, Jens Holger; Saher, Konca
The doctoral research project ‘Prediction and Assessment of Acoustical Quality in Living-rooms for People with Intellectual Disabilities’ at Delft University of Technology investigates, among other issues, the applicability and verification of auralization as a quality assessment tool in acoustical-architectural design. This paper compares binaural in-situ recordings with auralizations obtained from computer simulations. Listening tests and questionnaires were prepared from the auralizations for comparison with the reference binaural recordings. The difficulties in evaluating auralization quality are discussed. The results indicate that, although auralizations and binaural recordings evoke different aural perceptions, auralization is a strong tool for assessing the acoustical environment before the space is built. Two commercial programs (ODEON and CATT-Acoustic) are used for the auralizations.

6745
The Relationship between Selected Artifacts and Basic Audio Quality in Perceptual Audio Codecs
Marins, Paulo; Rumsey, Francis; Zielinski, Slawomir K.
Up to this point, perceptual audio codecs have been evaluated according to ITU-R standards such as BS.1116-1 and BS.1534-1. The majority of these tests measure the performance of audio codecs using only one perceptual attribute, namely basic audio quality. This approach, although effective for assessing the overall performance of codecs, does not provide any further information about the perceptual importance of the different artefacts inherent to low-bit-rate audio coding. Therefore, in this study, an alternative style of listening test was performed, investigating not only basic audio quality but also the perceptual significance of selected audio artefacts. The choice of artefacts included in this investigation was inspired by the CD-ROM “Perceptual Audio Coders: What to Listen For”, published by the AES Technical Committee on Audio Coding.

6746
Improved Noise Weighting in CELP Coding of Speech - Applying the Vorbis Psychoacoustic Model To Speex
Montgomery, Christopher; Valin, Jean-Marc
One key aspect of the CELP algorithm is that it shapes the coding noise using a simple, yet effective, weighting filter. In this paper, we improve the noise shaping of CELP using a more modern psychoacoustic model. This has the significant advantage of improving the quality of an existing codec without the need to change the bit-stream. More specifically, we improve the Speex CELP codec by using the psychoacoustic model used in the Vorbis audio codec. The results show a significant increase in quality, especially at high bit-rates, where the improvement is equivalent to a 20% reduction in bit-rate. The technique itself is not specific to Speex and could be applied to other CELP codecs.
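For context, the conventional CELP weighting filter that this work improves upon is commonly written as W(z) = A(z/γ1)/A(z/γ2), where A(z) is the LPC analysis filter. The Python sketch below shows only that classical filter, not the Vorbis-based model proposed by the authors; the function names are hypothetical and the values γ1 = 0.9, γ2 = 0.6 are illustrative assumptions rather than settings taken from Speex.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): each a[k] is scaled by gamma**k (a[0] is 1)."""
    return [ak * gamma ** k for k, ak in enumerate(a)]


def perceptual_weighting(x, a, gamma1=0.9, gamma2=0.6):
    """Filter x through W(z) = A(z/gamma1) / A(z/gamma2) in direct form.

    a: LPC polynomial [1, a1, ..., ap] of A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    gamma1/gamma2 are illustrative values (assumption, not Speex's settings).
    """
    num = bandwidth_expand(a, gamma1)  # zeros of W(z)
    den = bandwidth_expand(a, gamma2)  # poles of W(z); den[0] == 1
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)
    return y
```

Because γ2 < γ1, the weighting de-emphasizes the formant regions, so the coder tolerates more noise there, where it is masked by the speech.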

6747
Reduced Bit Rate Ultra Low Delay Audio Coding
Hirschfeld, Jens; Krämer, Ulrich; Schuller, Gerald; Wabnik, Stefan
An audio coder with a very low delay (6-8 ms) for reduced bit rates is presented. Previous coder versions were based on backward adaptive coding, which has suboptimal noise shaping capabilities for reduced bit rate coding. We propose to use a different noise shaping method instead, resulting in an approach which uses forward adaptive predictive coding. We will show that, in comparison, the forward adaptive method has the following advantages: it is more robust against high quantization errors, has additional noise shaping capabilities, has a better ability to obtain a constant bit rate, and shows improved error resilience.

6748
Real-Time Subband-ADPCM Low-Delay Audio Coding Approach
Keiler, Florian
A low-delay audio codec using the ADPCM structure (ADPCM = adaptive differential pulse code modulation) in subbands is presented. With the use of 8 subbands, a coarse spectral shaping of the coding noise is obtained, and the signal delay is approximately 3 ms. The targeted bit rate is in the range from 128 to 176 kbit/s per channel for near-transparent audio quality. The codec uses a cosine-modulated filterbank and backward adaptive calculation of the prediction coefficients and quantization scaling factors. The computations are optimized for a real-time implementation on a fixed-point DSP with an almost constant workload over time. A comparison with the Philips Subband Coder (SBC) and the Fraunhofer Ultra Low Delay Codec (ULD) is performed.

6749
Scalable Bitplane Runlength Coding
Dunn, Chris
Low-complexity audio compression offering fine-grain bitrate scalability can be realised with bitplane runlength coding. Adaptive Golomb codes are computationally simple runlength codes that allow bitplane runlength coding to achieve notable coding efficiency. For multi-block audio frames, coefficient interleaving prior to bitplane runlength coding also results in a substantial increase in coding efficiency. It is shown that bitplane runlength coding is more compact than the best known SPIHT arrangement for audio bitplane coding, and achieves coding efficiency that is competitive with fixed-rate quantisation.
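To illustrate runlength coding of a bitplane with Golomb codes, here is a minimal Python sketch. It assumes a Golomb code with a power-of-two divisor M = 2^k (a Rice code) and a fixed parameter k, and it codes each run of zeros terminated by a one; the adaptation rule and the exact run convention used in the paper are not reproduced, and all function names are hypothetical.

```python
def rice_encode(n, k):
    """Rice code of n >= 0 (Golomb with M = 2**k): unary quotient, then k-bit remainder."""
    q, r = n >> k, n & ((1 << k) - 1)
    return "1" * q + "0" + (format(r, f"0{k}b") if k > 0 else "")


def rice_decode_stream(code, k):
    """Decode a concatenation of Rice codewords back into integers."""
    values, i = [], 0
    while i < len(code):
        q = 0
        while code[i] == "1":  # unary part
            q, i = q + 1, i + 1
        i += 1  # skip the terminating zero
        r = int(code[i:i + k], 2) if k > 0 else 0
        i += k
        values.append((q << k) | r)
    return values


def encode_bitplane(bits, k):
    """Code zero-runs (each ended by a 1); trailing zeros are returned separately."""
    runs, count = [], 0
    for b in bits:
        if b:
            runs.append(count)
            count = 0
        else:
            count += 1
    return "".join(rice_encode(r, k) for r in runs), count
```

In an adaptive scheme, k would be updated from recently coded run lengths so that the code tracks the sparsity of each bitplane.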

6750
Scalable Audio Coder with Iterative Auditory Masking
Philippe, Pierrick; Veaux, Christophe
In this paper, reducing the cost of scalability is investigated. A coding scheme based on cascaded MDCT-transform is presented, for which masking thresholds are iteratively calculated from the transform coefficients quantized at previous layers. As a result, the masking thresholds are updated at the decoder in the same way as at the encoder without the need to transmit explicit information such as scale factors. By eliminating this overhead, this approach significantly improves the coding efficiency. It is also shown that further improvements are made possible by allowing the transmission of some side information depending on the frame or on the layer.

6751
A Frequency-domain Framework for Spatial Audio Coding Based on Universal Spatial Cues
Goodwin, Michael M.; Jot, Jean-Marc
Spatial audio coding (SAC) addresses the emerging need to efficiently represent high-fidelity multichannel audio. The SAC methods previously described involve analyzing the input audio for inter-channel relationships, encoding a downmix signal with these relationships as side information, and using the side data at the decoder for spatial rendering. These approaches are channel-centric in that they are generally designed to reproduce the input channel content over the same output channel configuration. In this paper, we propose a frequency-domain SAC framework based on the perceived spatial audio scene rather than on the channel content. We propose time-frequency spatial direction vectors as cues to describe the input audio scene, present an analysis method for robust estimation of these cues from arbitrary multichannel content, and discuss the use of the cues to achieve accurate spatial decoding and rendering for arbitrary output systems.

6752
Parametric Joint-Coding of Audio Sources
Faller, Christof
The following coding scenario is addressed: a number of audio source signals need to be transmitted or stored so that, after decoding, they can be mixed into stereo, multi-channel surround, wavefield synthesis, or binaural signals. The proposed technique offers significant coding gain when jointly coding the source signals, compared to coding them separately, even when no redundancy is present between the source signals. This is possible by considering statistical properties of the source signals, properties of mixing techniques, and spatial perception. The sum of the source signals is transmitted together with the statistical properties that determine the spatial cues at the mixer output. Subjective evaluation indicates that the proposed scheme achieves high audio quality.

6753
Improved Time Delay Analysis/Synthesis for Parametric Stereo Audio Coding
Faller, Christof; Tournery, Christophe
For parametric stereo and multi-channel audio coding, it has been proposed to use level difference, time difference, and coherence cues between audio channels to represent the perceptual spatial features of stereo and multi-channel audio signals. In practice, it has turned out that high audio quality can already be achieved by considering only level difference and coherence cues. Time difference cue analysis/synthesis contributed little to audio quality, or even decreased it when not done properly. However, for binaural audio signals, e.g. binaural recordings or signals mixed with HRTFs, time differences play an important role. We investigate the problems of time difference analysis/synthesis with such critical signals and propose algorithms for improving it. A subjective evaluation indicates significant improvement over our previous time difference analysis/synthesis.
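The three cues named above can be illustrated with a broadband Python sketch. Parametric coders estimate them per subband and per time slot; this simplified version works on a whole time-domain stereo pair, searches time differences only within an assumed ±1 ms range, and uses a hypothetical function name. It is not the improved analysis proposed in the paper.

```python
import numpy as np


def stereo_cues(left, right, fs, max_lag_ms=1.0):
    """Broadband inter-channel cues of an equal-length stereo pair (sketch).

    Returns (icld_db, icc, ictd_samples):
      icld_db      level difference 10*log10(P_left / P_right)
      icc          maximum normalized cross-correlation within +/- max_lag_ms
      ictd_samples lag of that maximum; positive means the left channel lags
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    p_l, p_r = np.dot(left, left), np.dot(right, right)
    icld_db = 10.0 * np.log10(p_l / p_r)
    norm = np.sqrt(p_l * p_r)
    max_lag = int(round(max_lag_ms * 1e-3 * fs))
    n = len(left)
    best_lag, icc = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.dot(left[lag:], right[:n - lag])
        else:
            c = np.dot(left[:n + lag], right[-lag:])
        c /= norm
        if c > icc:
            best_lag, icc = lag, c
    return icld_db, icc, best_lag
```

For a binaural signal, the lag found here corresponds to an interaural time difference; a coder that discards it keeps only the level and coherence cues.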

6754
Closing the Gap between the Multi-Channel and the Stereo Audio World: Recent MP3 Surround Extensions
Grill, Bernhard; Hellmuth, Oliver; Herre, Jürgen; Hilpert, Johannes; Plogsties, Jan
After more than 10 years of commercial availability, MP3 continues to be the most widely used format for compressed audio. New technologies extend its use from stereo to multichannel. Presented in 2004, the MP3 Surround format can represent high-quality 5.1 surround sound at bitrates previously used for stereo signals while remaining compatible with any MP3 playback device. Recently, add-on technologies have extended the usability of MP3 Surround: the capability of spatializing stereo content into MP3 Surround files also provides listener envelopment when reproducing legacy stereo content, and improved spatial reproduction is offered by auralized reproduction of MP3 Surround over regular stereo headphones. This paper describes the underlying concepts and the interworking of the technology components.

6755
Design for High Frequency Adjustment Module in MPEG-4 HE-AAC Encoder Based on Linear Prediction Method
Hsu, Han-Wen; Lee, Wen-Chieh; Liu, Chi-Min; Yang, Yung-Cheng
The high frequency adjustment module is the kernel module of spectral band replication (SBR) in MPEG-4 HE-AAC. The objective of high frequency adjustment is to recover the tonality of the reconstructed high-frequency band. There are two crucial issues: the accurate measurement of tonality, and the decision on shared control parameters. Control parameters, which are extracted according to signal tonalities, are used to determine the gain control and the energy level of additional components in the decoder. In other words, the quality of the reconstructed signal is directly related to the high frequency adjustment module. In this paper, an efficient method based on the Levinson-Durbin algorithm is proposed to measure tonality by a linear prediction approach, with adaptive orders to fit the different subband contents. Furthermore, the artifact due to the sharing of control parameters is investigated, and an efficient decision criterion for the control parameters is proposed.
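As a sketch of how linear prediction can indicate tonality, the Python code below implements the standard Levinson-Durbin recursion and uses the prediction gain r[0]/E as a tonality indicator: tonal signals are highly predictable, noise is not. Using the prediction gain directly, the fixed order 4, and the function names are assumptions for illustration; the paper's actual measure and its order adaptation are not given here.

```python
import numpy as np


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a real signal."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) for k in range(max_lag + 1)])


def levinson_durbin(r, order):
    """Predictor a[1..p] with x[n] ~ sum_k a[k] x[n-k], plus final error power."""
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient from the part of r[i] not yet predicted
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])
        k = acc / err
        prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def prediction_gain_db(x, order=4):
    """Hypothetical tonality indicator: LPC prediction gain r[0]/E in dB."""
    r = autocorr(x, order)
    _, err = levinson_durbin(r, order)
    return 10.0 * np.log10(r[0] / err)
```

A near-sinusoidal subband yields a prediction gain of tens of dB, while white noise stays close to 0 dB.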

6756
Multi-Channel Noise-Reduction-Systems for Speaker Identification in an Automotive Acoustic Environment
Goetze, Stefan; Kammeyer, Karl-Dirk; Mildner, Volker
Communication and information devices used by car drivers face two essential requirements: hands-free operation via distant microphones, and robustness against noise that varies with car speed and other conditions. Automatic speaker identification can be used within such devices either to supply speech recognition systems with so-called a priori information to achieve higher recognition rates, or to enable applications such as heating systems to adjust to the driver's preferences. Identifying the driver from a predefined group of possible system users may therefore be a task for future applications. This work investigates to what extent multi-channel noise reduction systems are suitable for improving the performance of speaker identification algorithms under different acoustic conditions in an automotive environment.

6757
Optimal Quantized Linear Prediction Coefficients for Lossless Audio Compression - Scalar Quantization Revisited
Ghido, Florin
Uniform scalar quantization of linear prediction coefficients with B bits is traditionally done by multiplying each coefficient by Q = 2^B and rounding it to the nearest integer. We propose an improved, optimal quantization method that replaces the rounding with a more elaborate procedure. It uses on average 2 fewer bits per quantized prediction coefficient for a similar sum of squared errors, and allows an accurate estimate of the mean squared error misadjustment as a function of Q for a given subframe and predictor order M. We introduce several efficient time-constrained probabilistic search methods for obtaining near-optimal solutions. No changes are required at the decoder, and the method is applicable to a wider range of cases (mono, stereo, and multichannel prediction) than quantization of reflection coefficients. Moreover, the method enables near-optimal compression of 24-bit audio using only 32-bit arithmetic operations.
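The traditional baseline described above (scale by Q = 2^B, round) can be sketched in Python as follows, together with the integer prediction residual that a lossless coder typically computes with the quantized coefficients. The right shift by B replaces division by Q, so decoder and encoder use identical integer arithmetic. This is the conventional scheme, not the authors' optimized quantizer, and the function names are hypothetical.

```python
import numpy as np


def quantize_coeffs(coeffs, bits):
    """Conventional quantization: multiply by Q = 2**bits and round to nearest."""
    q = 2 ** bits
    return np.rint(np.asarray(coeffs, dtype=float) * q).astype(np.int64)


def _predict(history, qcoeffs, bits):
    # history holds x[n-1], x[n-2], ..., x[n-p]; the shift divides by Q (floor)
    return int(np.dot(qcoeffs, history)) >> bits


def encode_residual(x, qcoeffs, bits):
    """Integer residual e[n] = x[n] - prediction; first p samples kept verbatim."""
    x = np.asarray(x, dtype=np.int64)
    p = len(qcoeffs)
    e = x.copy()
    for n in range(p, len(x)):
        e[n] = x[n] - _predict(x[n - 1::-1][:p], qcoeffs, bits)
    return e


def decode_residual(e, qcoeffs, bits):
    """Exact inverse of encode_residual (lossless reconstruction)."""
    p = len(qcoeffs)
    x = np.array(e, dtype=np.int64)
    for n in range(p, len(x)):
        x[n] = e[n] + _predict(x[n - 1::-1][:p], qcoeffs, bits)
    return x
```

Because only the rounded integers are used on both sides, any choice of integers (rounded or optimized) decodes exactly; the choice only affects how small the residual is.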

6758
Efficient Out of Head Localization System for Mobile Applications
Choi, Tacksung; Park, Young-cheol; Youn, Dae-hee
Headphone reproduction of stereo sources often produces in-head localization. One possible solution to this problem is to add directional filtering and a room response to the headphone reproduction system. Conventional out-of-head localization (OHL) schemes usually consist of a tapped delay line that simulates the direct signal path and early room reflections. Each tap must be filtered by a pair of HRTFs, which leads to a very high processing cost. Our study is based on the fact that spatial impression (SI) can increase the effect of OHL, and our aim is to generate the maximum SI at minimum cost. Subjective listening tests showed that the degree of SI was greatest for reflections arriving 15 to 30 ms after the direct sound, and for those arriving from directions opposite to the listener's ears. Based on these results, we propose an efficient OHL system in which multiple reflections are replaced by a pair of reflections, and the HRTF filtering required to simulate the directivity of the reflections is implemented with a set of first-order IIR shelving filters. Subjective tests show that the proposed system efficiently creates OHL at low computational cost, with performance comparable to that of the conventional high-complexity scheme.

6759
A Psychoacoustic Noise Reduction Approach for Stereo Hands-Free Systems
Goetze, Stefan; Kammeyer, Karl-Dirk; Mildner, Volker
One requirement for comfortable, high-quality hands-free video conferencing systems is the transmission of a spatial acoustic impression. A major task is therefore the transmission of stereo speech signals from a noisy environment, where the suppression of the noise components must not corrupt the stereo effect. In this context, different single-channel, multi-channel and hybrid speech enhancement systems are evaluated. The problem of musical noise in post-filter algorithms is addressed; to this end, a psychoacoustic masking threshold is incorporated into the noise reduction algorithms.

6760
Estimation of Talker’s Head Orientation Based on Oriented Global Coherence Field
Brutti, Alessio; Omologo, Maurizio; Svaizer, Piergiorgio
This work describes a new method for estimating the orientation of an active sound source using a distributed microphone network. The technique requires a set of microphone pairs distributed in a room, and exploits the coherence computed for each sensor pair to derive an estimate of the head orientation. To evaluate the algorithm, a database was collected consisting of an audio sequence reproduced by a loudspeaker at different orientations and positions. Experiments on that database show that our approach can provide an efficient estimate of the sound source orientation, with an RMS error of about 10 degrees.
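A common building block for coherence-based localization with microphone pairs is the PHAT-weighted generalized cross-correlation (GCC-PHAT) of a single pair. The Python sketch below computes it; the oriented global coherence field of the paper combines such per-pair measures over the room and over orientations, which is not reproduced here. The function name is hypothetical.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """PHAT-weighted generalized cross-correlation for lags -max_lag..max_lag.

    A peak at a positive lag means y is a delayed copy of x.
    """
    if max_lag < 1:
        raise ValueError("max_lag must be at least 1")
    n = len(x) + len(y)  # zero-padding avoids circular wrap-around
    cross = np.fft.rfft(y, n) * np.conj(np.fft.rfft(x, n))
    cross /= np.abs(cross) + 1e-12  # PHAT: keep only the phase
    cc = np.fft.irfft(cross, n)
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return np.arange(-max_lag, max_lag + 1), cc
```

The phase-only weighting sharpens the correlation peak, which makes the measure less sensitive to the source spectrum and to reverberation than plain cross-correlation.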

6761
High Quality Blind Bandwidth Extension of Audio for Portable Player Applications
Arora, Manish; Lee, Joonhyun; Park, Sangil
Bandwidth limitation in lossy audio coding schemes significantly reduces the perceived quality. Blind high-frequency bandwidth extension schemes have been proposed, but they do not provide sufficiently good quality in the applications where they are needed most: portable audio devices with severe complexity constraints. This work describes a high-quality blind bandwidth extension method with efficient detection of the initial audio bandwidth and enhanced shaping of the regenerated spectral envelope. Objective measurements on processed signals show significant quality improvements with very low complexity requirements, which allows easy implementation on a wide variety of platforms.

6762
Coherence Enhanced Minimum Statistics Spectral Subtraction in Bi-microphone Systems
Fillion-Deneault, Jonathan; Lefebvre, Roch
A novel system for two-channel spectral subtraction is presented. The objective is to improve the intelligibility of speech in noisy environments by enhancing the noise reduction of single-microphone techniques, as well as to greatly reduce the amount of musical noise they introduce. The system consists of two blocks: a generalized spectral subtraction block on the primary channel, using minimum statistics for noise estimation, followed by a perceptual-domain, coherence-based post-filter for additional noise suppression. Subjective and objective tests on both simulated and real-world recordings show that listeners prefer the proposed system to other state-of-the-art speech enhancement techniques.
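The first block (generalized spectral subtraction with a minimum-statistics noise estimate) can be sketched in Python as below. The noise estimator is deliberately crude: recursive smoothing followed by a sliding minimum scaled by a fixed bias factor, whereas practical minimum-statistics estimators compute the bias compensation adaptively. The parameter values and function names are illustrative assumptions, and the coherence-based post-filter is not shown.

```python
import numpy as np


def min_statistics_noise(power, window=50, smooth=0.85, bias=1.5):
    """Crude minimum-statistics noise PSD estimate.

    power: array (frames, bins) of noisy periodogram values.
    The fixed bias factor is an assumption (real estimators adapt it).
    """
    power = np.asarray(power, dtype=float)
    smoothed = np.empty_like(power)
    smoothed[0] = power[0]
    for t in range(1, len(power)):
        smoothed[t] = smooth * smoothed[t - 1] + (1 - smooth) * power[t]
    noise = np.empty_like(power)
    for t in range(len(power)):
        noise[t] = bias * smoothed[max(0, t - window + 1):t + 1].min(axis=0)
    return noise


def generalized_subtraction(noisy_mag, noise_mag, alpha=2.0, gamma=2.0, beta=0.01):
    """|S|^gamma = max(|Y|^gamma - alpha*|N|^gamma, beta*|Y|^gamma), per bin."""
    y = np.asarray(noisy_mag, dtype=float) ** gamma
    n = np.asarray(noise_mag, dtype=float) ** gamma
    return np.maximum(y - alpha * n, beta * y) ** (1.0 / gamma)
```

In use, the noise magnitude passed to the subtraction is the square root of the estimated noise power. The spectral floor beta limits the isolated spectral peaks that are heard as musical noise.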

6763
Sound Field Analysis Based on Generalized Prolate Spheroidal Wave Sequences
Grenier, Yves; Guillaume, Mathieu
In this article, an array processing method is described that improves the quality of sound field analysis, which aims to extract the spatial properties of a sound field. In this domain, spatial aliasing inevitably occurs because of the finite number of microphones in the array. It is linked to the Fourier transform of the discrete analysis window, which consists of a main lobe, fixing the resolution achievable by the spatial analysis, and of sidelobes, which degrade the quality of the spatial analysis by introducing artifacts not present in the original sound field. A method is presented for designing an analysis window that is optimal with respect to a particular wave vector, aiming at the best possible localization in the wave vector domain. The efficiency of the approach is then demonstrated for several geometrical configurations of the microphone array, over the whole bandwidth of the sound fields.

6764
Optimisation of Co-centred Rigid and Open Spherical Microphone Arrays
Jin, Craig; Parthy, Abhaya; van Schaik, Andre
We present a novel microphone array that consists of an open spherical array with a smaller rigid spherical array at its centre. The distribution of microphones that gives the array the largest frequency range for a given beamforming order was obtained by analysing microphone errors. For a fixed number of microphones, the results for several examples indicate that the maximum frequency range is obtained when the microphones are distributed relatively evenly between the open and rigid spheres.

6765
Review and Discussion on Classical STFT-based Frequency Estimators
Betser, Michaël; Collen, Patrice; David, Bertrand; Richard, Gaël
Sinusoidal modeling is based on the decomposition of audio signals into a sum of sinusoidal components plus a noise residual. It requires accurate estimation of the sinusoidal parameters and, in particular, of frequency. A broad category of methods uses the Fast Fourier Transform (FFT) as a starting point to compute frequency. All these methods present very similar forms of estimators, but the relations between them are not yet fully understood. This work takes a deeper look into these relations. Its first goal is to present a clear review and description of the classical FFT-based frequency estimators. A new estimator similar to the phase vocoder is presented. The second goal is to identify the common hypotheses and the common processing steps of this category of estimators. Finally, experimental comparisons are given.
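One classical FFT-based estimator of this kind is quadratic (parabolic) interpolation of the log-magnitude spectrum around the peak bin. The Python sketch below shows it with a Hann window; it is offered as a representative example of the category, not as the new phase-vocoder-like estimator of the paper, and the function name is hypothetical.

```python
import numpy as np


def estimate_frequency(x, fs):
    """Frequency of the strongest sinusoid via quadratic interpolation of the
    log-magnitude spectrum around the FFT peak (Hann window)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.abs(np.fft.rfft(x * np.hanning(n)))
    k = int(np.argmax(spectrum[1:-1])) + 1  # skip DC/Nyquist so k-1, k+1 exist
    a, b, c = np.log(spectrum[k - 1:k + 2])
    delta = 0.5 * (a - c) / (a - 2.0 * b + c)  # fractional bin offset
    return (k + delta) * fs / n
```

With fs = 8000 Hz and n = 1024, the bin spacing is 7.8125 Hz; for a 1238 Hz tone, lying about halfway between bins, picking the peak bin alone is off by several hertz, while the interpolated estimate is within a fraction of a hertz.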

6766
Accurate Phase Estimation for Chirp-like Signals
Betser, Michaël; Collen, Patrice; Rault, Jean-Bernard
Sinusoidal modeling relies on the decomposition of a given signal (continuous or discrete) into a set of sinusoidal components plus a residual signal. The sinusoidal parameters, namely amplitude, frequency and phase, may vary over time. The tracking of these parameters is generally performed via Short-Time Fourier Transform (STFT) analysis, which ultimately provides, for each sinusoidal component, estimates of the amplitude, frequency and phase for a given time slot. The duration of the analysis slots is chosen so that the signal under analysis is stationary enough to deliver useful data. If this requirement is not met, in particular if the frequency varies within the analysis slot, the phase estimate is biased. This paper introduces a method to estimate and correct this bias as a function of the analysis parameters (window type and size) and of the frequency slope.

6767
Equalization of Audio Systems using Kautz Filters with Log-like Frequency Resolution
Karjalainen, Matti; Paatero, Tuomas
This paper presents a new digital filtering approach to the equalization of audio systems such as loudspeaker and room responses. The equalization scheme uses a particular infinite impulse response (IIR) filter configuration called Kautz filters, which can be seen as generalizations of finite impulse response (FIR) filters and their warped counterparts. The desired allocation of frequency resolution is attained by appropriately choosing a set of fixed Kautz filter poles, and the frequency resolution mapping is characterized by the allpass part of the Kautz filter. The second step in the actual equalizer design consists of assigning the Kautz filter tap-output weights, which is essentially a standard least-squares problem. The proposed method is demonstrated using measured loudspeaker and room responses.

6768
Personal Audio Headrest
Chung, Chiho; Elliott, Steve J.
Active noise control was implemented using loudspeakers embedded in the headrests of two adjacent seats. The goal of this study was to create a quiet zone around headrest 1 that was free from noise radiated by the loudspeaker mounted in the adjacent headrest 2. Headrest 1 was designed to cancel the noise generated by headrest 2 by driving an anti-noise signal through its loudspeaker, without using earphones or headphones. The control source, the loudspeaker of headrest 1 generating anti-noise, was obtained by FIR convolution of the electrical signal going to the primary source, the loudspeaker of headrest 2 generating the target noise. Operating both primary and control sources results in a 20-30 dB reduction in squared acoustic pressure throughout the targeted frequency range (2 kHz and below).

6769
Accidental Wow Evaluation Based on Sinusoidal Modeling and Neural Nets Prediction
Czyzewski, Andrzej; Litwic, Lukasz; Maziewski, Przemyslaw
In this paper, an algorithmic approach to evaluating the characteristic of the wow defect is presented. The approach is based on sinusoidal analysis comprising both amplitude and phase spectrum processing. The frequency trajectories depicting the distortion are built on the basis of amplitude, frequency and phase dependencies, and are then used to evaluate the wow characteristic. Additionally, experiments applying neural-network-based prediction to the characteristic are performed, and the results are compared with linear prediction.

6770
An Ontology-based Approach to Information Management for Music Analysis Systems
Abdallah, Samer; Raimond, Yves; Sandler, Mark
We describe an information management system which addresses the needs of music analysis projects, providing a logic-based knowledge representation scheme for the many types of object in the domains of music and signal processing, including musical works and scores, performance events, human agents, signals, analysis functions, and analysis results. The system is implemented using logic-programming and semantic web technologies, and provides a shareable resource for use in a laboratory environment. The whole is driven from a Prolog command line, where the use of Matlab as a computational engine enables experiments to be designed and run with the results being automatically stored and indexed into the information structure. We present as a case-study an experiment in automatic music segmentation.

6771
Pyramidal Algorithm for the Restoration of Audio Signal Corrupted by Wideband Noise
Cohen, Azaria; Neoran, Itai
Restoration of noisy audio recordings seeks minimum degradation of the sound and maximum suppression of noise. Spectral suppression methods perform best with high frequency resolution, but high frequency resolution results in poor performance on transients. While wavelet-based algorithms attempt to optimize the time-frequency trade-off, they suffer from frequency aliasing. The pyramid algorithm is a good candidate for optimizing the time-frequency resolution trade-off while avoiding aliasing. In this study, an algorithm for the removal of wideband noise from old audio recordings is evaluated. The algorithm is based on the pyramid algorithm and on a spectral method for noise suppression. Results show better preservation of onsets together with efficient noise reduction. The algorithm is implemented in real time.

6772
Digital Music Notation Transformation using XML
Kosch, Harald; Teppan, Erich Christian
This paper addresses the problem of converting Western music notation, written for chromatic instruments, into special tablatures for diatonic instruments. Only a few software programs address this problem, and they lack fully automatic operation and flexibility. This was the main reason for developing new data formats and a new transformation algorithm better suited to the problem. Combined in an accurate software architecture, the newly developed algorithm transforms a chromatic piece of music into a data format that represents a diatonic tablature.

6773
A Service-oriented High-performance Architecture for Large Scale Audio Archives
Schneider, Stephan
The contribution describes a solution for large audio archives that has been developed using a service-oriented architecture (SOA). The audio archiving system is designed as a framework of web services that are controlled centrally by a workflow engine. It offers hierarchical storage, as well as import, export and conversion of audio files in various formats. Search and retrieval are based on textual metadata entered into an entity-relationship model (ERM). The archive offers a web-based interface that can be used with a standard web browser. Current installations serve several hundred users, with more than 700,000 metadata entries and 16 TB of audio files.

6774
A Robust Music Retrieval System
Eom, Ki-wan; Kim, Hyoung-gook; Shi, Yuan-yuan; Zhu, Xuan
A robust audio fingerprinting system for automatic music retrieval is proposed in this paper. The fingerprint feature is extracted from a long-term dynamic modulation spectrum estimated in the perceptually compressed domain. Modulation frequency analysis, smoothing with a low-pass filter and low-resolution quantization significantly improve the robustness of the feature. Fast searching is achieved by hash-table lookup with 32-bit hash values, whose bits are quantized from the logarithmically scaled modulation frequency coefficients. The system obtains 50.6%, 92.6%, 99.4%, or 100% search precision, with an approximately zero false-positive rate, when the signal-to-noise ratio of the query clips is <0 dB, 0~5 dB, 5~15 dB, or >15 dB, respectively.

6775
On the Influence of the Geometry on Radiation Electrodynamic Loudspeakers
Chaigne, Antoine; Quaegebeur, Nicolas
The basic design of loudspeakers has remained unchanged for decades. In particular, the diaphragm is nowadays designed as the combination of a spherical cap and a truncated cap. The present work focuses on the influence of the diaphragm shape on sound radiation. A temporal model based on the spatial impulse response has been developed to predict the sound radiation of an axisymmetric source subjected to an impulse. It is shown that non-planar sources are less subject to off-axis amplitude and phase variations than planar sources. Convex and concave geometries are also compared, and it is shown that transients are reproduced more accurately by convex structures.

6776
Methods to Improve the Horizontal Pattern of a Line Array Module in the Midrange Band
Mores, Robert; Schröder, Nils Benjamin; Schwalbe, Tobias
This paper reviews methods for modeling the vertical directivity in the frequency range from 200 Hz to 1 kHz in line array configurations. It describes the advantages and disadvantages of the following concepts: the horn, the “V-alignment”, the flat alignment and the partial coverage of the speakers. We shed light on the relationship between the angle of two cone speakers and the resulting directivity. Symmetrical and asymmetrical configurations of midrange drivers and horns are compared, and we outline a procedure for combining these solutions for superior results. One main result is the desired match between the directivity of the midrange section and that of the HF waveguide section. A concept for building systems with variable directivity over the whole frequency range is drafted.

6777
The Performance and Restrictions of High Frequency Waveguides in Line Arrays
Mores, Robert; Schröder, Nils Benjamin; Schwalbe, Tobias
It is necessary to form a plane, coherent wavefront in the HF section of line arrays, and several different concepts have been applied to reach this goal. We discuss these existing solutions: the different ideas on how to create a cylindrical wavefront are explained and evaluated, and waveguides whose weak point lies in their theoretical design are criticized in particular. Basically, five principles for forming a flat wavefront are in use. We explain how we developed a new waveguide and, finally, give some ideas on how the next generation of waveguides could be designed.

6778
Efficient Non-Linear Loudspeakers
Agerkvist, Finn T.; Pedersen, Bo Rohde
Loudspeakers have traditionally been designed to be as linear as possible. However, as techniques for compensating nonlinearities emerge, it becomes possible to use other design criteria. This paper presents and examines a new idea for improving the efficiency of loudspeakers at high levels by changing the voice coil layout. This deliberately non-linear design allows a smaller amplifier to be used, which reduces both system cost and power consumption.

6779
Advantages of FIR Filters in Digital Loudspeaker Controllers
Krauss, Guenter J.
Finite Impulse Response (FIR) filters for real-time audio applications can today be realized comparatively easily and cost-effectively with state-of-the-art DSP technology. In loudspeaker controllers, FIR filters have real advantages over conventional Infinite Impulse Response (IIR) filters: straightforward linear-phase equalization of the individual components, and significant improvements in the radiation pattern of cabinets with non-coincident drivers.
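The linear-phase property mentioned above follows from coefficient symmetry: an FIR filter with h[m] = h[N-1-m] has a constant group delay of (N-1)/2 samples at all frequencies. The Python sketch below designs a windowed-sinc lowpass as an example of such a filter; the tap count, cutoff and function name are illustrative assumptions, not a controller design from the paper.

```python
import numpy as np


def linear_phase_lowpass(num_taps, cutoff_hz, fs):
    """Windowed-sinc lowpass FIR; symmetric taps give exactly linear phase
    with a constant group delay of (num_taps - 1) / 2 samples."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = np.sinc(2.0 * cutoff_hz / fs * n) * np.hamming(num_taps)
    return h / h.sum()  # unity gain at DC
```

The price of linear phase is latency: a 255-tap filter at 48 kHz delays the signal by 127 samples, about 2.65 ms, which is usually acceptable for a loudspeaker controller but must be matched across all driver paths.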

6780
Efficient Resonant Loudspeakers with Large Form-Factor Design Freedom
Aarts, Ronald M.; Nieuwendijk, Joris A.; Ouweltjes, Okke
Small-cabinet loudspeakers with a flat response are quite inefficient. Assuming that the frequency response can be manipulated electronically, systems with a non-flat SPL response can provide greater usable efficiency. Such a non-flat design can accommodate a very compact housing but, for small drivers, it would require a relatively large cone excursion to obtain a high SPL. However, by mounting the driver in a pipe, the air column can be made to resonate, which enables small drivers with a small cone excursion to obtain a high SPL. For these special loudspeakers, a practically relevant optimality criterion involving the driver and pipe parameters is defined. This can be especially valuable in designing very compact loudspeaker systems. An experimental example of such a design is described and a working prototype is presented.

6781
A Dipole Multimedia Loudspeaker
Filevski, Vladimir E.
A multimedia/computer loudspeaker usually stands on a desk, so sound reflected from the desk interferes with the direct sound from the loudspeaker. This results in a comb-like frequency response, with a first notch at least 8 dB deep, followed at a higher frequency by a peak of about +4 dB, and so on. This paper describes the design of a dipole multimedia/computer loudspeaker whose resultant frequency response (including sound reflected from the desk) deviates from its anechoic response by less than +2 dB/-2.4 dB.
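The comb filtering described above can be illustrated with a simple two-path model: the direct sound plus one desk reflection that travels an extra distance and is scaled by a gain g, which lumps together absorption and the extra spreading loss. This is a generic sketch, not the paper's model; the path difference and gain values are hypothetical.

```python
import cmath
import math

def comb_response_db(freq_hz, path_difference_m, reflection_gain, c=343.0):
    """Level (dB) of direct sound plus one delayed, scaled reflection, relative to the direct sound alone."""
    tau = path_difference_m / c  # extra travel time of the reflected path
    h = 1 + reflection_gain * cmath.exp(-2j * math.pi * freq_hz * tau)
    return 20 * math.log10(abs(h))

def first_notch_hz(path_difference_m, c=343.0):
    """Lowest frequency at which the reflection arrives half a period late."""
    return c / (2.0 * path_difference_m)
```

With a hypothetical 0.2 m path difference the first notch falls at 857.5 Hz; a reflection gain of 0.6 gives a notch depth of about -8 dB and peaks of about +4 dB at multiples of 1715 Hz.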

6782
Spatial Distribution of Distortion and Spectrally-shaped Quantization Noise in Digital Micro-array Loudspeakers
Hawksford, Malcolm J.
A concept is studied for a digital loudspeaker array composed of clusters of micro-radiating elements that form individual digital-to-acoustic converters. In this scheme a large-scale array is composed of subgroups of micro clusters. To accommodate the finite resolution of each cluster, noise shaping is proposed, and parallels are drawn with the processes used in digital-to-analogue converters. Various elemental array geometries for each micro cluster are investigated by mapping the transduction output into 3-D space, revealing the spatial distribution of both the noise and the distortion that result from non-coincident, quantized digital-to-acoustic elements.
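The noise shaping proposed above parallels the error-feedback loops used in converters. As a minimal sketch (a generic first-order shaper, not the paper's design), each sample is quantized to a step size standing in for the finite resolution of a micro cluster, and the previous quantization error is subtracted from the next input. This gives a noise transfer function of 1 - z^-1, which pushes quantization noise towards high frequencies. The function name and step parameter are hypothetical.

```python
def noise_shaped_quantize(samples, step):
    """First-order error-feedback quantizer: y[n] = x[n] + e[n] - e[n-1]."""
    out = []
    prev_err = 0.0
    for x in samples:
        v = x - prev_err               # feed back the previous quantization error
        y = step * round(v / step)     # uniform quantizer with the given step size
        prev_err = y - v               # error introduced at this sample
        out.append(y)
    return out
```

Because the output equals the input plus e[n] - e[n-1], the running sum of the output tracks that of the input to within half a step, which is one way to see that the error has been moved away from low frequencies.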

6783
A Compact 120 Independent Element Spherical Loudspeaker Array with Programmable Radiation Patterns
Avizienis, Rimas; Freed, Adrian; Kassakian, Peter; Wessel, David
We describe the geometric and engineering design challenges that were overcome to create a new compact, 10-inch-diameter spherical loudspeaker array with integrated class-D amplifiers and a 120-independent-channel digital audio interface using Gigabit Ethernet. A special hybrid geometry combines the maximal symmetry of a triangular-faceted icosahedron with the compact planar packing of 6 circles on an equilateral triangle ("billiard ball packing"). Six custom 1.25-inch drivers developed by Meyer Sound Labs are mounted on each of 20 aluminum triangular circuit boards, and the class-D amplifiers for the six speakers are mounted on the other side of each board. Two pentagonal circuit boards in the icosahedron use Xilinx Spartan 3E FPGAs to demultiplex digital audio signals from incoming Gigabit Ethernet packets and process them before feeding the class-D modulators. Processing includes scaling, delaying, filtering and limiting.

6784
Polar Plots for Low Frequencies: The Acoustic Centre
Henwood, David; Vanderkooy, John
This paper studies some aspects of how polar plots should be made when measuring loudspeakers. At low frequencies the effect of the cabinet becomes simpler as the wavelength of the sound becomes large relative to the cabinet dimensions. This allows a particular point to be identified that acts acoustically as the centre of the speaker at the lower frequencies. This concept is verified by acoustic simulation, and also theoretically by expressing the source radiation as a multipole expansion. Some general criteria are presented to give estimates of the acoustic centre for different geometrical aspects of the cabinet. Polar plots pivoted about the acoustic centre display very consistent low-frequency characteristics. The discussion includes a number of other considerations regarding the acoustic centre.

6785
Constant Directivity End-fire Arrays for Public Address Systems
Verbinnen, Filip
The directivity of current public address systems can be controlled at mid and high audio frequencies using arrays and horns. Low frequencies, however, are often radiated omnidirectionally. The cardioid subwoofer is gaining ground but has drawbacks that limit the maximum achievable sound pressure level. As an alternative, the end-fire line array is considered as a directive bass system. Previous research on end-fire arrays could not draw on the potential of current digital signal processing techniques, which did not exist at the time. Using a digitally processed end-fire array, the possibilities and limitations of tapered end-fire arrays were examined in order to create a constant directivity end-fire array with a usable frequency range from 20 Hz to 200 Hz.
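The basic delay rule for an end-fire array can be sketched as follows, assuming identical point sources on a line, indexed from the rear element: each element is delayed by the time sound takes to travel from the rear element to it, so contributions add coherently in the forward direction and partly cancel towards the rear. The optional weights argument stands in for the tapering mentioned above; the spacing and frequencies in the example are illustrative, not the paper's.

```python
import cmath
import math

def endfire_delays(num_elements, spacing_m, c=343.0):
    """Delay (s) for each element, element 0 being the rearmost."""
    return [i * spacing_m / c for i in range(num_elements)]

def endfire_response(freq_hz, angle_rad, num_elements, spacing_m, weights=None, c=343.0):
    """Normalized far-field magnitude; angle 0 points forward along the array axis."""
    weights = weights or [1.0] * num_elements   # hypothetical taper; uniform by default
    delays = endfire_delays(num_elements, spacing_m, c)
    total = 0j
    for i, (w, d) in enumerate(zip(weights, delays)):
        # Element i sits i*spacing in front of the rear element, so it is closer to a forward listener
        arrival = d - i * spacing_m * math.cos(angle_rad) / c
        total += w * cmath.exp(-2j * math.pi * freq_hz * arrival)
    return abs(total) / sum(weights)
```

With four elements spaced 0.85 m apart, the forward response is 1.0 (fully coherent), while the rear response is cancelled completely at c/(8d), about 50 Hz, where the four rear-arriving contributions are a quarter period apart.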

6786
DGRC Arrays: A Synthesis of Geometrical and Electronic Loudspeaker Arrays
Meynial, Xavier
Loudspeaker arrays offer an efficient way of achieving both uniform SPL coverage and high sound clarity over a large audience area. Two types of arrays have been proposed over the last 15 years: geometrically steered J-shaped arrays, mainly for high-power sound reinforcement, and electronically steered vertical arrays, mainly for speech diffusion in public spaces. This paper introduces the “Digital and Geometric Radiation Control” (DGRC) principle, which combines the advantages of geometrical and electronic arrays: the array is vertical, so it can be mounted on a wall; it is controlled with great flexibility by its DSP; and the power is evenly distributed among the loudspeakers.

6787
Universal System for Spatial Sound Reinforcement in Theatres and Large Venues - System Design and User Interface
Dausel, Martin; Deguara, Joachim; Gatzsche, Gabriel; Melchior, Frank; Reichelt, Katrin; Strauss, Michael
Sound reinforcement for large venues is a challenging task. To date, most systems and concepts have focused on more or less stereophonic reproduction. Besides these concepts, a promising technology exists that enables spatial sound reinforcement for larger audiences. Spatial sound reinforcement is especially important in high-quality applications such as opera houses and venues for classical music. This paper presents an innovative system and multi-user interface concept for dynamic automation and interactive control of sound source position and other properties for variable sound reproduction systems in live sound reinforcement applications. The system has been designed in close cooperation with experts in sound reinforcement for opera houses. In addition to a detailed view of the practical realisation and audio processing in such a system, the developed user interfaces are described.

6788
A Novel Integrated Audio Bandwidth Extension Toolkit (ABET)
E.V., Harinarayanan; Ferreira, Aníbal J. S.; Sinha, Deepen
Bandwidth extension has emerged as an important tool for the satisfactory performance of low-bit-rate audio and speech codecs. In this paper we describe in detail the components of a novel integrated audio bandwidth extension toolkit (ABET). ABET combines two bandwidth extension tools: (i) the Fractal Self-Similarity Model (FSSM) for the signal spectrum, and (ii) Accurate Spectral Replacement (ASR). Both tools are applied directly to a high-frequency-resolution representation of the signal, such as the Modified Discrete Cosine Transform (MDCT), and combining them offers several benefits in the accuracy and efficiency with which the high-frequency signal components are represented. At the same time, the combination of the two tools entails a number of important algorithmic and perceptual considerations. Algorithmic details, audio demonstrations, and a comparison with other audio coding schemes will be presented. Additional information and audio samples will be available at http://www.atc-labs.com/abet/ .

6789
Evaluation of Real-time Transport Protocol Configurations using aacPlus
Ehret, Andreas; Krauss, Kurt; Schneider, Andreas
aacPlus is a highly efficient audio codec used in a growing number of applications where the compressed audio data is encapsulated in a real-time transport protocol and transmitted over error-prone channels. In this paper the implications of packet losses during transmission, and techniques to mitigate their impact on the resulting audio quality, are discussed. Example transmission channel characteristics are used to show how typical protocol configuration parameters are derived. The benefits of the described techniques are evaluated and verified by setting up a complete simulation chain and performing listening tests.

6790
Audio Communication Coder
Ferreira, Aníbal J. S.; Sinha, Deepen
3G mobile and wireless communication networks give rise to new forms of multimedia human interaction and communication, notably two-way high-quality audio communication. This is in line both with consumer expectations of new audio experiences and functionalities and with the motivation of telecom operators to offer consumers new services and communication modalities. In this paper we describe the design and optimization of a monophonic audio coder, the Audio Communication Coder (ACC), that features low-delay coding (< 50 ms) and intrinsic error robustness, while minimizing complexity and achieving competitive coding gains and audio quality at bit rates of around 32 kbit/s and higher. The ACC source, perceptual and bandwidth extension tools are described, with emphasis on the structural and operational features that make ACC suitable for real-time, two-way audio communication. Audio demos are available at http://www.atc-labs.com/acc/ .

6791
ISO/IEC MPEG-4 High-Definition Scalable Advanced Audio Coding
Geiger, Ralf; Herre, Jürgen; Kim, Sang-Wook; Lin, Xiao; Rahardja, Susanto; Schmidt, Markus; Yu, Rongshan
Recently, the MPEG Audio standardization group successfully concluded the standardization of technology for lossless coding of audio signals. This paper summarizes the Scalable Lossless Coding (SLS) technology, one of the results of this standardization work. MPEG-4 Scalable Lossless Coding provides a fine-grain scalable lossless extension of the well-known MPEG-4 AAC perceptual audio coder, up to fully lossless reconstruction at the word lengths and sampling rates typically used for high-resolution audio. The underlying innovative technology is described in detail, and its performance is characterized for lossless and near-lossless representation, both in conjunction with an AAC coder and as a stand-alone compression engine. Finally, a number of application scenarios for the new technology are discussed.

6792
Performance Analysis of Wave Field Simulation with the Functional Transformation Method
Petrausch, Stefan; Rabenstein, Rudolf
The application of the Functional Transformation Method (FTM) to the simulation of acoustical wave fields has recently been extended to complex room geometries through so-called "block-based" modeling techniques. The complete model is split into several elementary blocks with simple geometry, which are solved separately with the FTM and reconnected in the discrete system using Wave Digital Filter (WDF) principles. Two questions arise concerning the performance of this algorithm: how much additional error is introduced compared with an all-in-one FTM solution, and how accurate are the FTM simulations compared with classical methods (e.g. digital waveguide meshes)? This paper answers both questions by proving that the worst case of the proposed procedure, i.e. minimum-size block models, is identical to a waveguide mesh. The derivation is first performed for 1D models and then extended to 2D wave fields, demonstrating the equivalence of minimum-size block-based FTM modeling and digital waveguide meshes.

6793
A Review of NFPA 72 Requirements for Emergency Communications
Pincus, Michael S.
The National Fire Protection Association publication 72, "The National Fire Alarm Code", is the basis for most fire codes in the United States. The latest edition, published in 2006, will have updated requirements for both sound pressure level and intelligibility of messages used for emergency communications. This presentation will describe the changes between the new edition and the previous version, published in 2002, and summarize proposed changes that were not accepted. A case study will show the impact of these requirements on sound system design for a series of light rail stations in Seattle, Washington, and contrast them with subway stations in Boston, Massachusetts.

6794
Classroom Acoustics: Current and Future Criteria for the Assessment of Acoustics for Learning
Campbell, Dick; Guerra, Line; San Souci, Sooch; Teichner, Nicolas
Ensuring that a student can hear the teacher and classmates clearly, without having to filter out excessive noise, has long been a common goal, but current standards fall short of the optimum acoustics for learning. Several important factors have been overlooked by the current acoustic criteria for listening while learning. The first is the set of actions involved in receiving new information while listening in a learning environment, and their relationship to the multiple levels of perception and concentration during the “discovery phase” in which new ideas are integrated. This presentation describes an approach to defining distinct acoustic criteria for learning environments. Data will be presented from several prototype classrooms built specifically to assess criteria significance, renovation cost/value and measurement reproducibility, with acoustic criteria determined on a seat-by-seat basis.

6795
Spatial Sound in Auditory Vision Substitution Systems
Väljamäe, Aleksander; Kleiner, Mendel
Current auditory vision sensory substitution (AVSS) systems might be improved by directly mapping an image onto a matrix of concurrently active sound sources in a virtual acoustic space. This mapping might be similar to existing techniques for tactile substitution of vision, where point arrays are used successfully. This paper gives an overview of current auditory displays used to sonify 2D visual information and discusses the feasibility of new perceptually motivated AVSS methods encompassing spatial sound.

6796
Acoustic Rendering for Color Information
Ausiello, Ludovico; Caramelli, Nicoletta; Cecchetelli, Emanuele; Ferri, Massimo
The Espacio Acustico Virtual (EAV) [Gonzalez Mora, Rodriguez Ramos] is a portable device that acoustically represents visual environmental scenes by rendering objects with the sound of virtual raindrops. Here an improvement of this device is presented that adds color to the information it conveys. Two different mappings of color into sound were implemented: Georama, a geometric coding based on Red Green Blue vectors, and Colorama, an associative coding [Hunter]. An experiment assessing which of these codings was more user-friendly was run with both sighted and blind participants. The results showed that participants learnt to discriminate colors through sounds better when trained with Georama than with Colorama.

6797
Auditory Display of Audio
Cabrera, Densil; Ferguson, Sam
In this paper, we consider applications of auditory display for representing audio system and audio signal characteristics. Conventional analytic representations of system characteristics, such as impulse response or non-linear distortion, rely on numeric and graphic communication. Alternatively, simply listening to the system under test can also reveal important aspects of its performance. Given that auditioning systems is so effective, it seems useful to develop higher-level auditory representations (auditory displays) of system performance parameters to exploit these listening abilities. For this purpose, we consider ways in which audio signals can be further transformed for auditory display, beyond the simple act of playing the sound.

6798
Non Vocal Auditory Signals in the Operating Room for Each Phase of the Anaesthesia Procedure
Bourgeon, Léonore; Cazalaà, Jean-Bernard; Guillaume, Anne; Jacob, Elisa; Rivenez, Marie; Valot, Claude
Auditory warning signals are considered by anaesthesia teams to be a major source of annoyance and confusion in the operating room. An ergonomic approach was used to analyze the pertinence of the auditory warning signals emitted during anaesthesia in three French hospitals, taking into account each phase of the anaesthesia procedure. The results showed a significantly higher frequency of warning signals during the induction and emergence phases. However, the alarms were often ignored during these two phases, as they occurred as a result of deliberate actions by the anaesthetist. Most of them were therefore considered nuisance alarms.

6799
Frequency Bandwidth and Multi-talker Environments
Carlile, Simon; Schonstein, David
Understanding a talker of interest against a complex background is a common and difficult listening task that is not restricted to cocktail parties. Recent work demonstrates that high frequencies in speech are important for accurately localizing the talker, and that perceived differences in the locations of talkers are important in solving the cocktail party problem. This paper describes experiments demonstrating that high frequencies contribute to spatial release from masking by other talkers. In addition, low-frequency energy at the talker's fundamental frequency, over and above the perception of the fundamental frequency itself, also plays a role in spatial release from masking.

6800
Usability of 3D-Sound for Navigation in a Constrained Virtual Environment
Château, Noël; Emerit, Marc; Gonot, Antoine
This paper presents a global evaluation of spatial auditory displays in a constrained virtual environment. Forty subjects had to find nine sound sources in a virtual town, navigating with spatialized auditory cues that were delivered in four different conditions: binaural versus stereophonic rendering (through headphones), combined with contextualized versus decontextualized presentation of information. Behavioral data, self-evaluation of cognitive load, and subjective-impression data collected via a questionnaire were recorded. The analysis shows that the binaural-contextualized presentation of auditory cues leads to the best results in terms of usability, cognitive load and subjective evaluation. However, these advantages are observable only after a certain learning period.

6801
Psychoacoustic Evaluation of a New Method for Simulating Near-field Virtual Auditory Space
Jin, Craig; Kan, Alan; van Schaik, Andre
A new method for generating near-field virtual auditory space (VAS) is presented. This method synthesizes near-field head-related transfer functions (HRTFs) based on a distance variation function (DVF). In a sound localization experiment, the fidelity of the near-field VAS generated with this technique is compared with that obtained using near-field HRTFs synthesized from a multipole expansion of a set of HRTFs interpolated with a spherical thin-plate spline. Individualized HRTFs for varying distances in the near field were synthesized from the subjects’ HRTFs measured at a radius of 1 m for a limited number of locations around the listener’s head. Both methods yielded similar localization performance, showing no major directional localization errors and a reasonable correlation between perceived and target distances for sounds up to 50 cm from the centre of the subject’s head. Subjects also tended to overestimate the target distance with both methods.

6802
A Scalable CELP/Transform Coder for Low Bit Rate Speech and Audio Coding
Fuchs, Guillaume; Lefebvre, Roch
With the increase in channel capacity of communication systems, several emerging applications require acceptable reproduction quality for speech signals at low bit rates and superior quality for any kind of audio input when more bandwidth is available. To meet this requirement, we propose a new scalable audio coding algorithm. The proposed coder consists of a wideband speech coder embedded in a multi-layer transform coding algorithm. The transform coefficients are quantized using scalable lattice vector quantization. The overall system has low computational complexity and memory requirements and offers very fine-grained scalability. The new coding algorithm is suitable for communication over heterogeneous networks with no, or uneven, guarantee of quality of service (QoS) for packet delivery.

6803
A New Low Bit Rate Speech Coding Scheme for Mixed Content
Annadana, Raghuram; Ferreira, Aníbal J. S.; Sinha, Deepen
Speech coding is a very mature research area, and many coding schemes are available that provide speech quality ranging from highly intelligible synthetic speech at about 2 kbit/s to wideband natural speech at about 16 kbit/s. However, emerging application scenarios, such as information services on broadcast radio, pose additional concurrent challenges not easily addressed by current speech coding technology: the need to code mixed audio material, to permit flexible bit-rate coding configurations, to scale effectively in quality in the 2-8 kbit/s range, and to offer pleasant, natural sound. In this paper we present a new very-low-rate speech/audio coding technology that addresses these concurrent challenges through innovative approaches to the accurate reconstruction of harmonic complexes, optimal coding of the excitation, efficient side-information coding, and a suitable combination of new bandwidth extension techniques. The structure of the speech/audio coder is detailed, and its performance in the 2.4-12 kbit/s range is illustrated and compared with that of reference coders.

6804
On Improving Parametric Stereo Audio Coding
Lapierre, Jimmy; Lefebvre, Roch
Existing schemes for stereo and spatial audio coding rely on psychoacoustically relevant parametric models. These systems generally encode and transmit inter-channel intensity, coherence and phase parameters extracted from a time-frequency plane. Building on this framework, we discuss a number of potential refinements that can improve the quality or reduce the bit rate of these existing schemes using information already transmitted to the decoder. We also assess the performance of these enhancements through a distortion analysis of the relevant parameters.
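For reference, the three kinds of parameter named above are commonly estimated per time-frequency tile from complex subband samples l[k] and r[k] as a level difference, a normalized cross-correlation magnitude, and the phase of the cross-correlation. The sketch below uses one common set of definitions; the exact definitions and quantization used by the schemes the paper builds on may differ, and the function name is hypothetical.

```python
import cmath
import math

def stereo_parameters(left, right):
    """Return (ICLD in dB, ICC, IPD in radians) for one tile of complex subband samples."""
    p_l = sum(abs(x) ** 2 for x in left)                        # left-channel power
    p_r = sum(abs(x) ** 2 for x in right)                       # right-channel power
    cross = sum(a * b.conjugate() for a, b in zip(left, right))  # cross-correlation
    icld_db = 10 * math.log10(p_l / p_r)                        # inter-channel level difference
    icc = abs(cross) / math.sqrt(p_l * p_r)                     # coherence, 0..1
    ipd = cmath.phase(cross)                                    # inter-channel phase difference
    return icld_db, icc, ipd
```

If the right channel is a scaled, phase-rotated copy of the left, for example r[k] = 0.5·exp(-j0.4)·l[k], the sketch returns an ICLD of about 6.02 dB, an ICC of 1 and an IPD of 0.4 rad.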

6805
Stack-Run Audio Coding
Antonini, Marc; Bensa, Julien; Oger, Marie; Ragot, Stéphane
In this paper we present an application of stack-run entropy coding to audio compression. Stack-run coding represents signed integers and zero run lengths by adaptive arithmetic coding using a quaternary alphabet (0, 1, +, -). We use this method to encode the scalar quantization indices representing the MDCT spectrum of perceptually weighted wideband audio signals (sampled at 16 kHz). Noise injection and pre-echo reduction are also used to improve quality. The average quality of the proposed technique is similar to that of ITU-T G.722.1. In addition, we compare the performance of scalar quantization with stack-run coding to that of the multirate lattice vector quantization of 3GPP AMR-WB+.

6806
A Codebook-based Cascade Coder for Embedded Lossless Audio Coding
Adistambha, Kevin; Burnett, Ian; Lukasiak, Jason; Ritz, Christian
Embedded lossless audio coding embeds a perceptual audio coding bitstream within a lossless audio coding bitstream. Such an approach provides access to both a lossy and a lossless version of the audio signal within a single coding scheme. Previously, a lossless embedded audio coder based on the Advanced Audio Coding (AAC) approach and utilizing both backward Linear Predictive Coding (LPC) and cascade coding was proposed. This paper further investigates the adaptation of cascade coding to lossless audio compression using a novel codebook-based approach. The codebook is trained using LPC residual signals obtained from the decorrelation stage of the embedded coder. Results show that the overall lossless compression performance of cascade coding closely follows that of Rice coding.
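Rice coding, the baseline named above, is a standard way to entropy-code LPC residuals. As a minimal sketch: each signed residual is first mapped to a non-negative integer (the zigzag mapping here is one common convention, assumed rather than taken from the paper), then split into a quotient sent in unary and a k-bit remainder. Choosing k per block, typically from the mean residual magnitude, is not shown, and bits are kept as a character string for readability.

```python
def zigzag(n):
    """Map signed integers to non-negative ones: 0, -1, 1, -2, 2 ... -> 0, 1, 2, 3, 4 ..."""
    return 2 * n if n >= 0 else -2 * n - 1

def unzigzag(u):
    """Inverse of zigzag()."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

def rice_encode(values, k):
    """Rice-code signed integers with parameter k; returns a string of '0'/'1' characters."""
    bits = []
    for v in values:
        u = zigzag(v)
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0")                     # unary quotient, terminated by 0
        if k:
            bits.append(format(r, "0{}b".format(k)))   # k-bit binary remainder
    return "".join(bits)

def rice_decode(bitstring, k, count):
    """Decode `count` values produced by rice_encode with the same k."""
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bitstring[pos] == "1":                   # read the unary quotient
            q += 1
            pos += 1
        pos += 1                                       # skip the terminating 0
        r = int(bitstring[pos:pos + k], 2) if k else 0
        pos += k
        values.append(unzigzag((q << k) | r))
    return values
```

For example, rice_encode([0, -1, 3], 1) maps the values to 0, 1 and 6 and returns "000111100".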

6807
A Unified Transient Detector for Enhanced aacPlus Encoder
Boon Poh, Ng; George, Sapna; Kurniawati, Evelyn; Samsudin,; Sattar, Farook
The Enhanced aacPlus audio codec combines MPEG-4 Advanced Audio Coding (AAC), Spectral Band Replication (SBR) and Parametric Stereo (PS). To deal with transient signals, SBR and AAC employ separate transient detectors, although both detectors essentially operate on the same signal. This paper presents a low-complexity transient detector that operates in the PS encoder. It performs online detection at the same time as PS spatial parameter extraction and takes advantage of some of the computations performed for subband grouping. Tests on a few percussive solo instrument signals and a percussive mixture show that the detected transient information matches well with that generated by the original SBR and AAC detectors, at a much lower computational cost. This implies that the complexity of the encoder can be reduced by replacing both detectors with the proposed unified low-complexity detector.

6808
New Results in Rate-Distortion Optimized Parametric Audio Coding
Christensen, Mads G.; Holdt Jensen, Søren
In this paper, we summarize some recently published methods and results in parametric audio coding. These are all based on rate-distortion optimized coding using a perceptual distortion measure. We show how a number of well-known computationally efficient methods for incorporating perception in sinusoidal parameter estimation relate to minimizing this perceptual distortion measure. A number of methods for parametric coding of transients are compared and results of listening tests are presented. Finally, we show how the complexity of rate-distortion optimized audio coding can be reduced by rate-distortion estimation.

6809
Harmonic Structure Reconstruction in Audio Compression Method Based on Spectral Oriented Trees
Chang, Wei-Chen; Su, Alvin W.; Wang, Jing-Xin
A novel audio compression method called Harmonic Structure Quad Tree is presented. The method employs a bit-plane-based quantization and encoding method called Concurrent Encoding in Hierarchical Tree to encode the MDCT coefficients of overlapping audio frames. Scalability is easily achieved by discarding trailing bits at any position in the bitstream, as long as the header information is preserved. An embedded harmonic structure reconstruction method is proposed to predict and restore coefficients dropped during the encoding process. The proposed method compares favorably with the popular MP3 coder for both pop and classical audio programs. No psychoacoustic model is used. The computational complexity and the coding table size are much smaller than those of the MP3 coder.

6810
An Experimental Audio Coder using Rate-distortion Controlled Temporal Block Switching
Boehm, Johannes; Jax, Peter; Kordon, Sven
To address the requirement of piecewise stationarity within the analyzed signal segments, today’s state-of-the-art audio codecs use two filter bank resolutions. Short temporal resolutions are used to adapt to transient, jump-like signals, while long temporal resolutions are used to code the more steady or slowly drifting waveforms efficiently. With increasing computational capacity, a better adaptation of the filter bank to the signal becomes feasible. The paper presents an experimental MDCT-based transform coder that can switch between four filter bank resolutions. A distortion measure is used that is driven by a simple psychoacoustic model incorporating masking effects for both stationary and transient signals. A rate-distortion control is proposed to partition the signal so that the signal contour optimally matches the temporal resolutions of the filter bank. Performance results are presented and compared with the conventional two-resolution approach. Proposals for further developments, such as pre-segmentation, are evaluated.
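The rate-distortion control described above can be illustrated, in its simplest form, by a Lagrangian decision: for each segment, pick the filter bank resolution that minimizes J = D + λR, where D is the perceptual distortion and R the bit count. This sketch assumes the distortion and rate of every candidate have already been measured; the resolution names, numbers and λ values are hypothetical, and the paper's partitioning procedure may differ.

```python
def choose_resolution(candidates, lam):
    """candidates maps a resolution name to (distortion, rate_in_bits); lam is the Lagrange multiplier."""
    # Lagrangian cost J = D + lam * R; the cheapest candidate wins
    return min(candidates, key=lambda name: candidates[name][0] + lam * candidates[name][1])

def choose_per_segment(segments, lam):
    """Apply the Lagrangian choice independently to each segment's candidate table."""
    return [choose_resolution(c, lam) for c in segments]
```

With hypothetical measurements {"long": (4.0, 100), "short": (1.0, 160)}, a small λ of 0.01 favours the lower-distortion "short" resolution (J = 2.6 versus 5.0), while λ = 0.1 favours the cheaper "long" one (J = 14 versus 17). Raising λ therefore trades quality for bit rate.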

6811
Detection and Extraction of Transients for Audio Coding
Edler, Bernd; Niemeyer, Oliver
An algorithm for the detection and extraction of transient signal components is presented. It is based on detecting sharp onsets of the signal power along the time direction of the complex time-frequency domain. The detected transients are then extracted in the corresponding MDCT spectrum, and the audio signal containing only the extracted transients is synthesized using the inverse MDCT. In an audio coding application, this transient signal and the resulting residual signal can be coded separately using specifically optimized coders. One such audio coding scheme, using an MDCT-based coder for the transient signal, is also presented.
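As a simplified illustration of onset detection based on signal power (a time-domain block-energy version, not the complex time-frequency analysis the paper uses), a block can be flagged as transient when its energy exceeds that of the previous block by a fixed ratio. Block length, ratio and energy floor are hypothetical tuning values.

```python
def detect_onsets(samples, block=256, ratio=4.0, floor=1e-9):
    """Return indices of blocks whose energy exceeds `ratio` times the previous block's energy."""
    energies = []
    for start in range(0, len(samples) - block + 1, block):
        energies.append(sum(x * x for x in samples[start:start + block]))
    onsets = []
    for i in range(1, len(energies)):
        # The floor avoids flagging noise after digital silence as an infinite energy jump
        if energies[i] > ratio * max(energies[i - 1], floor):
            onsets.append(i)
    return onsets
```

For a quiet signal that suddenly becomes loud at sample 1024, detect_onsets flags only block 4 with the default 256-sample blocks.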

6812
Audio Coding using a Genetic Algorithm
Marston, David
Current MPEG-1 Layer II encoders use a feedforward technique in which a psychoacoustic model derived from the input signal drives the bit allocation. This paper describes a novel approach in which a psychoacoustic metric compares the output signal with the input signal to drive the bit allocation, using a genetic algorithm in the feedback process. The audio quality is compared with that of leading conventional audio coders. The aim of the work was to assess how far Layer II coding can be improved, and whether any further progress can be made with conventional coding.

6813
Parametric Representation of Multichannel Audio Based on Principal Component Analysis
Briand, Manuel; Martin, Nadine; Virette, David
Low-bit-rate parametric coding of multichannel audio is mainly based on Binaural Cue Coding (BCC). In this paper we show that the recently introduced Unified Domain Representation (UDR) of multichannel audio is equivalent to the BCC scheme. We also discuss another method, multichannel audio upmix, which conventionally converts existing two-channel stereo to five-channel audio; more precisely, we focus on an existing PCA-based upmix method. Starting from the PCA approach, we propose a general model that may be applied both to the parametric representation of multichannel audio signals and to upmix methods. Moreover, we apply the analysis results to propose a new low-bit-rate parametric audio coding method based on PCA processing in frequency subbands.
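The core of the PCA-based upmix mentioned above is a rotation of the two stereo channels onto their principal axes. As a minimal broadband sketch (the paper works in frequency subbands, and its exact model may differ), the rotation angle follows from the 2x2 covariance matrix; the first rotated signal carries the dominant, correlated component and the second carries the remainder, often treated as ambience. Function and variable names are hypothetical, and the signals are assumed zero-mean so raw second moments are used.

```python
import math

def pca_rotation(left, right):
    """Return (theta, primary, ambient) for a two-channel PCA rotation of real signals."""
    c_ll = sum(x * x for x in left)                   # second moments (zero-mean assumed)
    c_rr = sum(x * x for x in right)
    c_lr = sum(a * b for a, b in zip(left, right))
    theta = 0.5 * math.atan2(2 * c_lr, c_ll - c_rr)   # angle of the principal eigenvector
    primary = [a * math.cos(theta) + b * math.sin(theta) for a, b in zip(left, right)]
    ambient = [-a * math.sin(theta) + b * math.cos(theta) for a, b in zip(left, right)]
    return theta, primary, ambient
```

For a single source panned with gains 0.8 and 0.6, the angle equals atan2(0.6, 0.8), about 0.64 rad, and the second output is zero: the whole signal is captured by the principal component.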

6814
A Dual Audio Transcoding Algorithm for Digital Multimedia Broadcasting Services
Bang, Kyoung Ho; Park, Young-cheol; Youn, Dae-hee
In this paper, we propose a dual audio transcoding algorithm for delivering high-quality audio streams over a broadcasting network comprising heterogeneous audio formats. As two typical cases, audio transcoding from TDTV to the T-DMB and S-DMB services is considered. While the Korean DTV audio standard employs Dolby AC-3, the Korean T-DMB and S-DMB services use the MPEG-4 BSAC and MPEG-4 HE-AAC audio coding technologies, respectively. In the proposed algorithm, the bit allocation information of AC-3 is reused in the BSAC and HE-AAC encoding processes, and the nested loops are restructured as two independent loops, which saves a significant amount of computation. Overall, the transcoding algorithm saves about 65% of the computational cost of BSAC encoding and 31% of that of HE-AAC encoding. Subjective quality evaluations show that the proposed algorithm has mean diffgrades of -0.02 and -0.01 relative to the tandem method. Owing to its computational simplicity and effective performance, the proposed algorithm is suitable for mobile multimedia services.

6815
A Subband Domain Downmixing Scheme for Parametric Stereo Encoder
Kurniawati, Evelyn; Ng, Boon Poh; Samsudin,; Sapna, George; Sattar, Farook
Parametric Stereo (PS) coding represents a stereo audio signal by a monaural signal and a set of spatial parameters. This paper describes a signal-dependent, subband-domain downmixing scheme for a PS encoder to obtain the monaural signal. The downmixing is performed on the subband signals output by the PS analysis filtering, so no extra signal decomposition is required. The scheme minimizes phase cancellation by aligning the phases of the stereo signals prior to mixing. In addition, power equalization ensures that the overall power of the original stereo signal is preserved in the monaural downmixed signal. The additional computational requirements can be kept low by using the available PS spatial parameter data for the phase alignment. Tests on synthetic and real-life audio recordings show good performance, especially for recordings with a significant out-of-phase side signal component.
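The phase alignment and power equalization described above can be sketched for one subband of complex samples. The right channel is rotated by the inter-channel phase difference (the phase of the cross-correlation) before the two channels are averaged, and the result is rescaled to a target power. Here the target is assumed to be the mean of the two channel powers; that choice, the function name and the treatment of the whole list as one subband are assumptions, not details taken from the paper.

```python
import cmath
import math

def aligned_downmix(left, right):
    """Phase-aligned, power-equalized mono downmix of one subband of complex samples."""
    p_l = sum(abs(x) ** 2 for x in left)
    p_r = sum(abs(x) ** 2 for x in right)
    cross = sum(a * b.conjugate() for a, b in zip(left, right))
    rot = cmath.exp(1j * cmath.phase(cross))          # rotates R onto L's phase
    mix = [(a + b * rot) / 2 for a, b in zip(left, right)]
    p_m = sum(abs(x) ** 2 for x in mix)
    target = (p_l + p_r) / 2                          # assumed target: mean channel power
    gain = math.sqrt(target / p_m) if p_m > 0 else 0.0
    return [gain * x for x in mix]
```

For two exactly out-of-phase channels, a plain (L + R)/2 downmix is silent, whereas this sketch returns the left channel unchanged.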

6816
Auditory Scene Synthesis for Distributed Audiences in E-Learning Applications
Furlong, Dermot; Kearney, Gavin
The enhancement of learning processes through electronic presentation has led to the development of e-Learning environments which merge traditional classroom instruction with teleconference capabilities. One important aspect of such presentations is that all audience members correctly localise both stationary and moving sound sources in agreement with the video shown on the teleconference screen. Research into conventional sound reinforcement solutions shows that systems based on stereophonic principles fail in this regard. Furthermore, sound systems such as Delta Stereophony and Wave Front Synthesis, which provide the accurate wavefronts required for correct localisation, are found to be uneconomic for e-Learning classroom environments. A solution to this localisation problem is presented in the form of a five-loudspeaker frontal line array, and its effectiveness is verified through accurate simulation and localisation tests. A physical implementation of the system is also presented for subjective evaluation.

6817
The Effect of Speaker Frequency Bandwidth Limitation and Stereo Base Width on Perceived Quality
Lorho, Gaetan
The effect of frequency bandwidth limitation and stereo base width on listeners’ preference for loudspeaker reproduction of music and movie sound was studied. Various combinations of high and low frequency band limitation were considered for near-field monophonic and stereophonic loudspeaker reproduction. Two speaker configurations with a reduced stereo base width representative of mobile multimedia systems were also included in this experiment to investigate the perceived effect of a stereo enhancement algorithm. The results of this study indicate that naïve listeners consider low-frequency content to be more important than high-frequency content and stereophony in their preference judgments. For cases where the optimal stereo reproduction was preferred to the monophonic and reduced stereo base setups, a significant improvement in preference was found with the stereo enhancement systems.

6818
Spatial Character and Quality Assessment of Selected Stereophonic Image Enhancements for Headphone Playback of Popular Music
Martens, William L.; Marui, Atsushi
The effects of selected stereophonic image enhancement algorithms on the perceived spatial character and quality of headphone playback of popular music were investigated, using a sample of program material typical of conventional multi-track mixes. Preference ratings were made for the auditory images resulting from three enhancement algorithms in comparison with the original PCM recordings of nine short musical programs. A perceptual coding (MP3) of the original recordings was also presented, making a total of five versions to be compared for each musical program. In addition, ratings were collected on a perceptual attribute identified herein as Ensemble Stage Width (ESW). The applied algorithms had significant effects on both preference and ESW ratings, regardless of whether expensive or inexpensive headphones were used in the listening tests.

6819
Designing a Spatial Audio Attribute Listener Training System for Optimal Transfer
Brookes, Tim; Kassier, Rafael; Rumsey, Francis
Interest in spatial audio has increased due to the availability of multichannel reproduction systems for the home and car. Various timbral ear training systems have been presented, but relatively little work has been carried out on training in the spatial attributes of reproduced sound. To demonstrate that such a training system is truly useful, it is necessary to show that learned skills transfer to different settings. Issues relating to the transfer of training are examined; a recent study conducted by the authors is discussed in relation to the level of transfer shown by participants, and a new study is proposed that aims to optimise the transfer of training to different environments.

6820
Evaluation of Loudness in a Room Acoustic Model
Angelakis, Konstantinos; Shen, Yi
Equal-loudness contours were measured using an analytical model constructed for rectangular rooms on the basis of Kuttruff’s room acoustic model; stimuli are passed through the model and presented to the subjects via headphones. The results agree very closely with measurements in a real room. Two additional experiments were conducted to study loudness as a function of reverberation time for different types of stimuli. The results showed that the effect of reverberation on loudness is negligible for a stationary stimulus, whereas for an impulse train, loudness depends on both the reverberation time of the test room and the repetition frequency of the stimulus.

6821
Effect of Direction on Loudness for Wideband and Reverberant Sounds
Ellermeier, Wolfgang; Sivonen, Ville Pekka
The effect of incidence angle on loudness was investigated for wideband and reverberant sounds. In an adaptive procedure, five listeners matched the loudness of a sound coming from five incidence angles in the horizontal plane to that of the same sound with frontal incidence. The stimuli were presented to the listeners via individual binaural synthesis. The results confirm that loudness depends on sound incidence angle, as it does for narrow-band, anechoic sounds. The directional effects, however, were attenuated with the wideband and reverberant stimuli used in the present investigation.

6822
Investigations in Real-time Loudness Metering
Lavoie, Michel; Soulodre, Gilbert A.
There has been much research in the past few years on loudness perception and metering. Recently, the authors developed an objective loudness algorithm that accurately measures the perceived loudness of mono, stereo, and multichannel audio sequences. The algorithm provides a single loudness reading for the overall audio sequence. In broadcast, film, and music applications it is desirable to have a real-time loudness meter that can track the loudness of the audio signal over time. The new meter would be used in conjunction with existing metering methods to provide additional information about the audio signal. In the present study, the requirements for such a meter are examined and new subjective testing methods are devised to help in the development and evaluation of a new meter.

6823
Measuring the Threshold of Audibility of Temporal Decays
Goldberg, Andrew
A listening test system designed to measure the threshold of audibility of the decay time of low-frequency resonances is described. The system employs the Parameter Estimation by Sequential Testing (PEST) technique, and the listening test is conducted over calibrated headphones to remove environmental factors. Program signal, replay level and resonance frequency are believed to influence the decay time threshold. A trial listening test shows that the system yields realistic results, but the temporal resonance modelling filter requires some adjustment to remove audible non-modal cues. Transducer limitations still affect the test at low frequencies and high replay levels. Factors for a future large-scale listening test are refined. Early indications are that temporal decay thresholds rise as frequency and SPL decrease.
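A resonance with a prescribed decay time can be modelled by a two-pole recursive filter whose pole radius sets the decay rate. The sketch below assumes that "decay time" means T60 (a 60 dB fall of the amplitude envelope) and uses a crude (1 - r) gain scaling; it illustrates the principle and is not the modelling filter used in the paper.

```python
import math

def resonator(f0, t60, fs):
    """(b, a) for a two-pole resonator at f0 Hz whose envelope falls 60 dB in t60 s.

    Assumption: decay time is interpreted as T60; gain scaling is illustrative.
    """
    # The impulse-response envelope decays as r**n, so r**(t60*fs) = 10**(-60/20).
    r = 10.0 ** (-3.0 / (t60 * fs))
    theta = 2.0 * math.pi * f0 / fs
    b = [1.0 - r]
    a = [1.0, -2.0 * r * math.cos(theta), r * r]
    return b, a

def impulse_response(b, a, n):
    """Run y[k] = b0*x[k] - a1*y[k-1] - a2*y[k-2] for a unit impulse x."""
    y = [0.0] * n
    for k in range(n):
        acc = b[0] if k == 0 else 0.0
        if k >= 1:
            acc -= a[1] * y[k - 1]
        if k >= 2:
            acc -= a[2] * y[k - 2]
        y[k] = acc
    return y
```

Shorter or longer decays are obtained simply by changing t60, which moves the pole radius closer to or further from the unit circle.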

6824
The Influence of Impulse Response Length and Transition Bandwidth of Magnitude Complementary Crossover on Perceived Sound Quality
Djukic, Iva M.; Milic, Ljiljana D.; Todorovic, Dejan Z.
In this paper, a special type of magnitude-complementary IIR filter pair with variable transition bandwidth and impulse response length was used to examine the effects of these two characteristics on the subjective perception of reproduced sound. Two types of listening tests were performed. In the first, the sum of the crossover outputs was compared with the original signal. In the second, the IIR filter pairs were compared with one another, as well as with linear-phase magnitude-complementary FIR filter pairs as a reference. The results of the tests show that the overall differences are not significant. The filters considered were found to be suitable for loudspeaker crossover applications.
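A linear-phase magnitude-complementary FIR pair, of the kind used as the reference above, can be built by subtracting a linear-phase lowpass from a pure delay, so that the two outputs always sum to a delayed copy of the input. The windowed-sinc design and Hamming window are assumptions chosen for illustration; the paper's IIR pairs are not reproduced here.

```python
import numpy as np

def fir_crossover(num_taps, cutoff):
    """Linear-phase complementary FIR lowpass/highpass pair.

    cutoff: crossover frequency as a fraction of Nyquist (0 < cutoff < 1).
    num_taps must be odd so the group delay is a whole number of samples.
    The windowed-sinc/Hamming design is an illustrative assumption.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    centre = (num_taps - 1) // 2
    n = np.arange(num_taps) - centre
    # Windowed ideal lowpass: sin(pi*cutoff*n) / (pi*n) == cutoff * sinc(cutoff*n)
    lowpass = cutoff * np.sinc(cutoff * n) * np.hamming(num_taps)
    lowpass /= lowpass.sum()  # unity gain at DC
    # Highpass = delay minus lowpass, so lowpass + highpass = delayed impulse.
    highpass = -lowpass
    highpass[centre] += 1.0
    return lowpass, highpass
```

Because the sum of the two branches is an exact delay, a test of the first type (sum of outputs against the original) can only reveal the delay, not any magnitude or phase error.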

6825
Perception of Simultaneity and Detection of Asynchrony between Audio and Structural Vibration in Multimodal Music Reproduction
Kim, Sungyoung; Martens, William L.; Walker, Kent
In music reproduction incorporating haptic display there is a need to know the human observer’s tolerance for asynchrony between presentations of audio (airborne vibration) and haptic content (in this case whole-body vibration). Two methods for measuring the human tolerance for such audio-haptic asynchrony were employed in experiments using recorded musical instrument sound as stimuli: judgments of the time-order of arrival of airborne versus structure-borne vibration, and judgments of subjective simultaneity that required no report of which component arrived first. Optimal intermodal delay values derived from the time-order judgments were related to the direct judgments of simultaneity for the same set of stimuli.

6826
Computational Two-channel ITD Model
Pulkki, Ville
Recently, the Jeffress model of localization has been questioned by neurophysiological studies, and a two-channel ITD model has been proposed. In this paper, a simple computational implementation of the two-channel ITD model is presented, which models ITD decoding on the basis of neurophysiological data, although in a computationally very simple way. The model fits the neurophysiological data recorded from a guinea pig almost perfectly, and also matches psychoacoustic data, at least qualitatively.

6827
Automatic Recognition of Urban Sound Sources
Defreville, Boris; Pachet, François; Rosin, Christophe; Roy, Pierre
The goal of the FDAI project is to create a general system that computes an efficient representation of the acoustic environment. More precisely, FDAI has to compute a noise disturbance indicator based on the identification of six categories of sound sources. This paper describes experiments carried out to identify the acoustic features and recognition models implemented in FDAI. The framework is based on EDS (Extractor Discovery System), an innovative system for acoustic feature extraction. The design and development of FDAI raised two critical issues. The first is completeness: it is very difficult to design descriptors that identify every sound source in urban environments. The second is consistency: some sound sources are not acoustically consistent. We solved the first issue with a conditional evaluation of a family of acoustic descriptors, rather than the evaluation of a single general-purpose extractor. Indeed, a first hierarchical separation between vehicles (moped, bus, motorcycle and car) and non-vehicles (bird and voice) significantly raised the identification accuracy for buses. The second issue turned out to be more complex and is still under study; preliminary results are given.

6828
A New Integrated System for Laboratory Environments of Speech/Voice Examination
Papanikolaou, George; Pastiadis, Costas; Psyllidou, Georgia
The paper presents a new computer-based system for the examination and analysis of speech/voice function in laboratory environments. Although the system is mainly designed for clinical applications, its features allow general use as a speech/voice acquisition, analysis and evaluation tool. The system offers an integrated, interactive modular structure for conducting various speech/voice examination procedures, and provides the data management capabilities needed for further use in diagnostic expert systems and knowledge-based speech/voice applications.

6829
Directivity Measurements on a Highly Directive Hearing Aid: The Hearing Glasses
Boone, Marinus M.
A highly directional hearing aid has been developed with the aim of providing much higher speech intelligibility than conventional hearing aids. The high directivity is obtained by mounting four microphones in each temple of a pair of glasses and performing optimized beamforming. This leads to an average directivity index of 9 dB under free-field conditions, without head disturbance. In a recent research program, the directivity of this device was measured with different directivity settings, under free-field and diffuse-field conditions, with and without head diffraction. Results of this research are presented, including a comparison with the directivity of a conventional hearing aid. The influence of the superdirective beamforming setting on noise sensitivity is also shown, indicating that for practical use the directivity should be limited.

6830
Accurate Non Linear Models of Valve Amplifiers Including Output Transformers
Touzelet, Pierre
Commercially available network analysis programs are now powerful enough to handle sophisticated models of complete valve amplifiers, including nonlinear components such as valves and output transformers. The objectives of such accurate nonlinear models are evident: they allow the overall performance and distortion of an amplifier to be evaluated with a high degree of realism, thereby reducing major risks at the development stage of any amplifier project. This paper shows how such sophisticated models can be developed and what kinds of results can be extracted from them, by applying them to a real amplifier as an illustrative example.

6831
The Self-compensated Audio Transformers for Tube and Solid State Single Ended Amplifiers
Mariani, Giovanni; Polisois, Aristide
The self-compensated output transformer for single-ended audio amplifiers, presented at the AES Convention held in Barcelona in May 2005, is based on the principle that an auxiliary winding (the tertiary), carrying the same current as the primary winding, can generate an opposing magnetic flux that reduces the overall core flux produced by the direct current to almost zero. At the same time, however, this antagonist winding also opposes the induced alternating current. A capacitor is therefore connected across its terminals, short-circuiting the alternating current. Under these circumstances, the alternating potential difference across the tertiary is close to zero and the primary is no longer affected. But this short-circuit has a drawback: it considerably reduces the inductance of the primary.

6832
Some Neglected Audio Distortion Mechanisms
Black, Richard
In addition to the familiar harmonic and intermodulation distortions, there exist various other mechanisms by which electronic equipment can degrade sound quality. Some of these are closely related to the familiar types, others are the result of direct acoustical interaction of the equipment, while yet others rely on the existence of two (or more) unrelated distortions in a system to produce an audible result. This paper examines some of these.

6833
Comparison of Four Subwoofer Measurement Techniques
Herzog, Philippe; Langrenne, Christophe; Melon, Manuel; Rousseau, David; Roux, Bruno
Acoustic measurements at very low frequencies are difficult to perform, and the results can often be tricky to interpret. In this paper, four subwoofer measurement techniques are compared in terms of frequency response and directivity: semi-anechoic room, anechoic room, isobaric room and pseudo free field. Two subwoofers are tested: a closed-box system and an active/passive system. For the semi-anechoic technique, double-layer pressure measurements are performed on a half-sphere surrounding the source. A BEM integral formulation is then used to separate the outgoing and incoming pressure fields and so recover free-field conditions (i.e. remove wall reflections below the room cut-off frequency). Discrepancies between the results are discussed and, where possible, explained.

6834
Room Impulse Responses Measurement using a Moving Microphone
Ajdler, Thibaut; Sbaiz, Luciano; Vetterli, Martin
In this paper, we present a technique for recording a large set of room impulse responses using a microphone moving along a trajectory. The signal recorded by the moving microphone is processed to reconstruct the signals that would have been recorded at every possible spatial position along the trajectory. The speed of the microphone is shown to be the key factor for the reconstruction. This fast method of recording spatial impulse responses can also be applied to the recording of head-related transfer functions.

6835
Sound Field Characterisation in Audio Reproduction With The Bit-Grouped Digital Transducer Array
Mendoza-López, Jorge; Busbridge, Simon C.; Fryer, Peter A.
A bit-grouped digital transducer array loudspeaker with different numbers of nominally identical transducers for each bit has been developed. The direct digital-to-acoustic conversion process produces a sound field whose quality is shown to be spatially dependent and highly influenced by real effects including non-uniform transducer frequency responses, transducer mismatching, baffle size and room acoustics. Spatial sound pressure maps show that reducing the array size leads to improved reconstruction due to reduced phase distortion. For a given sampling rate and signal frequency, total harmonic distortion decreases as the listening distance is increased. A new criterion for the sweet-spot location in digital arrays is proposed based on the difference between the distortion introduced by path-length differences and the inherent quantisation distortion.
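In a bit-grouped array each bit of the PCM word drives its own group of nominally identical transducers. Assuming the common binary-weighted grouping, in which bit k drives 2**k transducers (an assumption; the group sizes in the paper may differ), the number of active transducers is proportional to the code value, as this sketch for unsigned samples shows.

```python
def group_sizes(num_bits):
    """Transducers per bit group under (assumed) binary weighting: bit k drives 2**k units."""
    return [2 ** k for k in range(num_bits)]

def active_transducers(code, num_bits):
    """Number of transducers switched on for an unsigned num_bits-bit PCM code."""
    if not 0 <= code < 2 ** num_bits:
        raise ValueError("code out of range")
    sizes = group_sizes(num_bits)
    # Sum the group sizes of every bit that is set in the code word.
    return sum(size for k, size in enumerate(sizes) if (code >> k) & 1)
```

An 8-bit array built this way needs 255 transducers in total, which is one reason practical designs trade group size against array dimensions.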

6836
Radiation Impedance of Transducer Field Driven by Binary Signals
Husník, Libor; Kadlec, Frantisek
This article addresses another aspect of transducers with direct digital-to-analog conversion, sometimes called digital loudspeakers: their radiation impedance. In a transducer array of this kind, driven by a PCM signal, each value of the acoustic pressure within the dynamic range is radiated by a different number of elementary transducers, i.e. by a different total membrane area. Since the critical frequency depends mainly on the total membrane area, an interesting phenomenon appears: every sound pressure level is radiated with a different radiation impedance. As a result, different levels may be radiated differently.
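To see why the radiation impedance changes with the code value, one can treat the active membranes as a single equivalent circular piston of the same total area and look at the frequency where ka = 1, roughly where the radiation resistance of a baffled piston stops rising steeply. The equivalent-piston simplification, the ka = 1 criterion standing in for the paper's critical frequency, and the 2 cm membrane diameter are all assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at about 20 degrees C

def ka_one_frequency(num_active, membrane_diameter=0.02):
    """Frequency (Hz) where ka = 1 for an equivalent piston of the total active area.

    membrane_diameter (m) of each elementary transducer is a hypothetical value,
    and the equivalent-piston model is a simplifying assumption.
    """
    if num_active <= 0:
        raise ValueError("need at least one active transducer")
    area = num_active * math.pi * (membrane_diameter / 2) ** 2
    radius = math.sqrt(area / math.pi)  # radius of the equivalent circular piston
    return SPEED_OF_SOUND / (2 * math.pi * radius)
```

Under this simplification the equivalent radius grows with the square root of the number of active transducers, so quadrupling the active count halves the ka = 1 frequency.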

6837
An Introductory Review for U-fa (USM Driven Woofer) Development
Negishi, Hirokazu
The concept of “U-fa” was born twelve years ago, and many people have carried the torch since then. This paper was originally intended for the phantom AES/NY convention in 2001, but it was shelved. However, a second revival of the activity has already produced several Japanese convention papers, and the international debut was made at AES/NY last October. Since the audio world and ultrasonic motors share little common ground, it may be difficult to appreciate what difference introducing a USM into a woofer actually makes. To bridge the gap, the original paper has been completely revised, with the emphasis on introductory background rather than theories and equations.

6838
Improved Model of Loudspeaker using Continuous Revolution of Ultrasonic Motor
Iwaki, Yusuke; Maeda, Kazuaki; Negishi, Hirokazu; Ohga, Juro; Ohnuma, Yuta
The loudspeaker using the continuous revolution of an ultrasonic motor (USM), proposed by the authors, is suitable for radiating sound at very low frequencies. This paper describes an improved model of the USM loudspeaker. The functions of rotor and stator are reversed relative to the first model to simplify the electrical connections. The mass of the rotating ring is increased to make the inertial force larger. A silicone rubber joint connects the USM to the cone radiator to avoid frictional noise.

6839
Ring Element Model: Program Results
Prokofieva, Elena
The new modelling method was described in a series of papers presented at AES conventions in 2004-2005 ([1]-[3]). In the proposed model, the general approach is to represent a cone driver as a set of rings loaded by a concentric force applied around the edge of the lower element. The ring element model is preferred to the finite element model because it gives a simple yet precise driver simulation. The standard theoretical model of the radiating piston was initially considered. The problems inherent to this approach were highlighted, and the model was improved by removing the standard assumptions one by one and replacing them with more complex calculation procedures. The first group of assumptions has been programmed, and the results are compared against real measurements and analytical calculations. The advantages of each model are studied, and an explanation is given of how each standard theoretical assumption affects the final result. The possibility of using the different models in preliminary speaker design is also discussed.

6840
Analysis and Optimal Design of Miniature Loudspeakers
Bai, Mingsian R.; Chen, Rong-Liang
Miniature loudspeakers are key components in many 3C products, especially portable devices such as mobile phones, PDAs and MP3 players. Because of their size, miniature loudspeakers suffer from low output level. To gain higher output, there is a tendency to drive the miniature loudspeaker beyond its excursion limit, which introduces another problem: nonlinear distortion. This paper presents a systematic analysis and design procedure for miniature loudspeakers that best reconciles the tradeoff between total harmonic distortion (THD) and sound pressure level (SPL). The paper begins with the identification of the electro-acoustic parameters of the loudspeaker via the test-box method. Performance indices including voice-coil impedance, frequency response and harmonic distortion are evaluated. On the basis of this model, optimization procedures are carried out to pinpoint an optimal design. The results reveal that both the output performance and the excursion limit of the miniature loudspeaker can be improved by the optimal design.
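The test-box (closed-box) method identifies the equivalent compliance volume Vas by comparing the driver's free-air resonance with its resonance when mounted in a sealed box of known volume. A sketch of the commonly quoted relation Vas = Vb (fc Qec / (fs Qes) - 1), assuming the electrical Q factors are measured in both conditions; the example values are hypothetical, not data from the paper.

```python
def vas_closed_box(v_box, fs, qes, fc, qec):
    """Equivalent compliance volume from the closed (test) box method.

    v_box: test-box volume; fs, qes: free-air resonance (Hz) and electrical Q;
    fc, qec: resonance (Hz) and electrical Q measured in the box.
    Vas is returned in the same unit as v_box.
    """
    return v_box * (fc * qec / (fs * qes) - 1.0)

# Hypothetical miniature-driver values, volume in cm^3
print(vas_closed_box(v_box=1.0, fs=800.0, qes=1.0, fc=1000.0, qec=1.25))  # prints 0.5625
```

Using the Q ratio rather than the frequency ratio alone accounts for the change in mass loading when the driver is moved from free air into the box.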

6841
Positions Effect of Multi Exciters and the Optimization on Sound Pressure Responses of Distributed Mode Loudspeaker
Shen, Xiaoxiang; Shen, Yong; Zhang, Suzhen
The exciters of a distributed mode loudspeaker (DML) play two main roles, as driving forces and as attached masses, both of which affect the sound pressure response of the panel. To obtain a smoother sound pressure response, the positions of the exciters should therefore be chosen carefully. In this paper, a model of a panel with driving forces and attached masses is developed with partial differential equations (PDEs) in FEMLAB. The exciter positions are optimized using a genetic algorithm (GA) under two different criteria: sound pressure response and mode distribution. Optimal results are derived for both cases and show that different optimization criteria lead to different sound decays and sound pressure sensitivities.

6842
Simulation of Reconstruction of Oversampled Signals in Digital Loudspeakers
Busbridge, Simon C.; Fryer, Peter A.; Garrett, Chris; Zhang, Haihua
The technique of oversampling and noise shaping has the potential to improve the resolution of digital loudspeaker systems, at the expense of increasing the signal bandwidth. Previous work has shown that the acoustic radiator in a digital loudspeaker system can act as a reconstruction filter if the oversampled signal bandwidth exceeds the transducer bandwidth. If the oversampled signal is within the transducer bandwidth, the use of reconstruction filters has to be considered in the system. This paper presents an investigation of reconstruction with both pre-acoustic and post-acoustic filtering. Mathematical modelling suggests that the reconstruction in a direct digital-to-analogue loudspeaker should take place before the summation of the digital bitstreams to avoid intermodulation distortion. This is counter-intuitive because the electronic driving signals are no longer digital in the digital loudspeaker system.
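The noise shaping mentioned above can be illustrated with a first-order error-feedback requantizer: the quantization error of each sample is subtracted from the next input, so the output is y = x + (1 - z^-1)e and the error spectrum is pushed towards high frequencies. The integer step size and the first-order loop are assumptions chosen for brevity; a real digital loudspeaker system may use a higher-order shaper.

```python
import numpy as np

def error_feedback_quantize(x, step=1.0):
    """First-order noise-shaping requantizer (illustrative): Y = X + (1 - z^-1) E."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    prev_err = 0.0
    for n, sample in enumerate(x):
        u = sample - prev_err              # feed back the previous quantization error
        y[n] = step * np.round(u / step)   # uniform quantizer with the given step
        prev_err = y[n] - u                # error committed at this sample
    return y
```

For a constant input of 0.3 with integer steps, plain rounding always outputs 0, while the shaped output toggles between 0 and 1 so that its average tracks 0.3; the error has moved to high frequencies where, with oversampling, it lies outside the audio band.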

6843
Digital Measurement for Dynamic Distortion of Loudspeakers
Imaoka, Keiichi; Ohga, Juro
Most recently developed measuring methods based on digital signal processing are limited to measurements in the linear range. There is still no suitable digital method for measuring the nonlinear distortion of acoustical devices, so an accurate and convenient nonlinear distortion measurement system is needed. The authors have previously outlined a new digital distortion measuring method for acoustical devices. This method applies a Pink-TSP signal (Time Stretched Pulse, i.e. a quickly swept sinusoidal signal), with part of its frequency band eliminated, to the acoustical system under test. The components produced in the rejected band are measured as distortion. This paper analyzes the experimental results theoretically using a single resonant system.
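The band-elimination principle can be sketched in a few lines: excite the system with a broadband signal from which one band has been removed, then measure the energy that appears in that band at the output. For brevity, this sketch substitutes a periodic random-phase pink multitone for the Pink-TSP (periodicity makes the FFT analysis exact), and a memoryless cubic term stands in for the device nonlinearity; both are assumptions, not the authors' set-up.

```python
import numpy as np

def notched_pink_multitone(n, notch, seed=0):
    """Periodic pink (1/sqrt(f) amplitude) multitone with FFT bins notch[0]:notch[1] empty.

    Substitute for the Pink-TSP used in the paper (assumption).
    """
    rng = np.random.default_rng(seed)
    spectrum = np.zeros(n // 2 + 1, dtype=complex)
    k = np.arange(1, n // 2)
    spectrum[k] = np.exp(2j * np.pi * rng.random(k.size)) / np.sqrt(k)
    spectrum[notch[0]:notch[1]] = 0.0          # the eliminated band
    x = np.fft.irfft(spectrum, n)
    return x / np.max(np.abs(x))

def notch_distortion_db(y, notch):
    """Energy found in the eliminated band relative to the total output energy (dB)."""
    power = np.abs(np.fft.rfft(y)) ** 2
    in_notch = power[notch[0]:notch[1]].sum()
    return 10 * np.log10(in_notch / power.sum() + 1e-300)
```

A purely linear system leaves the notch empty (apart from rounding error), whereas a nonlinear one fills it with intermodulation products, which are read directly as distortion.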

6844
Effect of Membrane Damage on Loudspeaker Performance
Boleiko, Romuald
This paper deals with the effect of membrane damage on loudspeaker parameters. Tweeters with small dents in their metal membranes and with a perforation in their fabric membranes were tested. The acoustic parameters of the loudspeakers were investigated by measuring the sound pressure level frequency response and the vibration of the loudspeaker membrane; a scanning laser Doppler vibrometer was used for the latter measurements. It was found that dents and a perforation in the tweeter’s dome may change its frequency response by more than 6 dB. As a rule, dents in the dome affect the tweeter frequency response at high frequencies, while a perforation in the soft fabric dome affects the response at medium frequencies.

6845
Loudspeaker Testing at the Production Line
Irrgang, Stefan; Klippel, Wolfgang; Seidel, Ulf
Quality control in the mass production of transducers and electro-acoustical systems requires an objective technique for reliable selection of defective units. A new technique is presented for detecting defects which produce almost inaudible symptoms during testing but may degrade sound quality in the final application (e.g. loose particles in the gap). Here the regular distortion which is characteristic of good units is modeled and actively compensated in the measured signal of a device under test to reveal symptoms of irregular defects (meta-hearing technology). The paper shows ways to perform high-speed measurements close to the physical limits and how to cope with ambient noise in a production environment. Traditional and more advanced techniques for separating passed and failed units are compared and their integration into process control is discussed. Finally, the paper addresses cost-effective implementation in robust hardware, flexibility to meet customers’ needs, simple handling and other practical requirements.

6846
New Structure of Loudspeaker
Lemarquand, Guy
We present a new loudspeaker structure: the motor is ironless, the suspension is ferrofluidic, and the moving part is piston-like, with a concave dome. The absence of iron guarantees a small, constant moving-coil inductance as well as the absence of eddy currents. The motor includes two circular joints, one on each side of the moving coil. These joints are ferrofluidic and provide both the guidance and centring function and the air-tightness function. This structure is quite rigid. As there is no traditional suspension in this structure, the related nonlinearities and hysteresis disappear.

6847
Multi-Channel High Performance Analog Volume Control with a New Serial I2C/SPI Compatible Control Port
Gaboriau, Johann; Hardy, Chad; Saraf, Vivek; Tucker, John
A new scheme for digitally controlling an 8-channel high-performance analog volume control is proposed. Besides being I2C/SPI compatible, this scheme is faster and less complex than existing solutions because of its inherent advanced support for group and individual addressing. The volume control varies monotonically all the way from -96 dB to +22 dB in 0.25 dB steps. The chip achieves 110 dB THD+N, 1.8 µVrms total integrated noise and interchannel isolation greater than 120 dB with both ±5 V and ±9 V power supplies.
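The stated range and resolution fix the number of gain settings: (22 - (-96)) / 0.25 + 1 = 473. The sketch maps a step index to gain, assuming a hypothetical encoding in which index 0 is -96 dB and each increment adds 0.25 dB; the chip's actual register format is not described in the abstract.

```python
MIN_DB, MAX_DB, STEP_DB = -96.0, 22.0, 0.25
# Number of distinct settings from -96 dB to +22 dB inclusive: 473
NUM_SETTINGS = int(round((MAX_DB - MIN_DB) / STEP_DB)) + 1

def index_to_gain(index):
    """Return (gain in dB, linear amplitude gain) for a hypothetical step index."""
    if not 0 <= index < NUM_SETTINGS:
        raise ValueError("index out of range")
    gain_db = MIN_DB + STEP_DB * index
    return gain_db, 10.0 ** (gain_db / 20.0)
```

With this encoding, unity gain falls at index 384 and the gain is strictly increasing in the index, which is the monotonic behaviour the abstract describes.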

6848
Evaluation of Ambience Microphone Arrangements Utilizing Frequency Dependent Spatial Cross Correlation (FSCC)
Muraoka, Teruo
Ambience in stereophonic recording is mostly determined by the main microphone system. The authors examined this effect using the Frequency-dependent Spatial Cross Correlation (FSCC), defined as the cross-correlation between the two outputs of the main microphone system. If the recorded sound field is diffuse, the FSCC should ideally be zero. In practice, however, the FSCC varies between -1 and 1 as a function of frequency, owing to the microphones’ directivity and placement. The authors theoretically analyzed the FSCCs of typical main microphone systems such as the AB, ORTF, WF and MS systems, and found that the FSCC of the MS system becomes uniformly zero when its directional azimuth angle is set to 132 degrees. This was confirmed experimentally, and excellent ambient recordings were achieved.
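For two standard cases the diffuse-field correlation has a closed form, which shows where the frequency dependence comes from. The sketch uses textbook results for an ideal diffuse field: sin(kd)/(kd) for two spaced omnidirectional microphones, and a frequency-independent value for a coincident pair of first-order microphones with patterns a + (1 - a)cos(theta) whose axes are separated by an angle phi. It does not reproduce the author's analysis of specific arrays.

```python
import math

def spaced_omni_correlation(freq_hz, spacing_m, c=343.0):
    """Diffuse-field correlation of two omnis spacing_m apart: sin(kd)/(kd)."""
    kd = 2 * math.pi * freq_hz * spacing_m / c
    return 1.0 if kd == 0 else math.sin(kd) / kd

def coincident_correlation(a1, a2, phi_deg):
    """Diffuse-field correlation of coincident first-order mics a + (1 - a)cos(theta).

    Averaged over a uniform sphere, <cos th1> = 0 and <cos th1 cos th2> = cos(phi)/3.
    """
    b1, b2 = 1 - a1, 1 - a2
    cos_phi = math.cos(math.radians(phi_deg))
    num = a1 * a2 + b1 * b2 * cos_phi / 3
    den = math.sqrt((a1 ** 2 + b1 ** 2 / 3) * (a2 ** 2 + b2 ** 2 / 3))
    return num / den
```

For example, the first zero of the spaced-omni correlation falls at f = c/(2d), about 343 Hz for d = 0.5 m, while a coincident pair gives the same correlation at every frequency.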

6849
Parameter Estimation of Dynamic Range Compressors: Models, Procedures and Test Signals
Bitzer, Joerg; Schmidt, Denny; Simmer, Uwe
An analysis of digital dynamic range compression algorithms is presented. They are studied by employing a single-band feed-forward compressor model allowing the use of independent attack and release times for both RMS detection and gain smoothing. Artificial test signals for measuring the static and dynamic compressor characteristics are discussed. The parameters of the compressor model are estimated by fitting the model output to the output of the compressor under test by using a simplex method. The results are verified by comparing the output levels of the actual and the fitted compressor for real world audio samples.
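A single-band feed-forward compressor with separate attack and release constants for the RMS detector and for gain smoothing, as described above, might look like the following. The hard-knee static curve, the one-pole smoothers and the default parameter values are assumptions; the sketch only illustrates the model structure and omits the simplex fitting step.

```python
import math
import numpy as np

def one_pole(time_s, fs):
    """Coefficient of a one-pole smoother with time constant time_s seconds."""
    return math.exp(-1.0 / (time_s * fs))

def compress(x, fs, threshold_db=-20.0, ratio=4.0,
             rms_attack=0.005, rms_release=0.05,
             gain_attack=0.002, gain_release=0.1):
    """Feed-forward compressor model; hard knee and defaults are assumptions."""
    x = np.asarray(x, dtype=float)
    ra, rr = one_pole(rms_attack, fs), one_pole(rms_release, fs)
    ga, gr = one_pole(gain_attack, fs), one_pole(gain_release, fs)
    mean_sq = 0.0   # RMS detector state (mean square of the input)
    gain_db = 0.0   # smoothed gain state in dB
    y = np.empty_like(x)
    for n, s in enumerate(x):
        p = s * s
        c = ra if p > mean_sq else rr                 # detector attack/release
        mean_sq = c * mean_sq + (1.0 - c) * p
        level_db = 10.0 * math.log10(mean_sq + 1e-12)
        over = level_db - threshold_db
        # Hard-knee static curve: reduce the overshoot by the factor (1 - 1/ratio).
        target_db = -over * (1.0 - 1.0 / ratio) if over > 0 else 0.0
        c = ga if target_db < gain_db else gr          # more reduction uses attack
        gain_db = c * gain_db + (1.0 - c) * target_db
        y[n] = s * 10.0 ** (gain_db / 20.0)
    return y
```

In a parameter-estimation setting, threshold_db, ratio and the four time constants would be the quantities adjusted until this model's output matches that of the compressor under test.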

6850
Redefining the Directivity Index for Adaptive Microphone Arrays
Schobben, Daniel W.
The Directivity Index (DI) quantifies the directivity of a microphone or of a non-adaptive microphone array. The DI indicates how well a target sound source can be extracted in the presence of diffuse background noise. Adaptive microphone arrays have the potential to improve the suppression of distracters that are spatially separated from the target sound source. In the absence of performance measures for adaptive microphone arrays, the DI has been used to evaluate them in both anechoic and diffuse noise conditions. For adaptive microphone arrays, however, the DI may not reflect real-life performance: theoretical directivity values can be driven to infinity for spatially separated target and distracter sources in anechoic conditions, which will hardly ever be encountered in daily life. Conversely, adaptive microphone arrays inherently offer no improvement in the diffuse sound fields for which the DI was originally defined. Starting from the definition of the DI, a new measure for adaptive arrays is introduced in this paper as an objective measure for quantifying directivity-based improvements in signal-to-noise ratio.
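The conventional DI that the paper starts from is the ratio, in dB, of the on-axis power response to the power response averaged over all directions. A numerical sketch for axisymmetric patterns, using trapezoidal integration (an implementation choice, not taken from the paper), reproduces the familiar first-order values.

```python
import numpy as np

def directivity_index(pattern, n=20001):
    """DI in dB of an axisymmetric pressure pattern p(theta), theta measured from the axis."""
    theta = np.linspace(0.0, np.pi, n)
    integrand = pattern(theta) ** 2 * np.sin(theta)
    # Trapezoidal integral of |p|^2 sin(theta) over 0..pi
    area = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(theta))
    # Spherical mean of |p|^2 for an axisymmetric pattern is half that integral.
    mean_power = 0.5 * area
    on_axis_power = pattern(np.array([0.0]))[0] ** 2
    return 10.0 * np.log10(on_axis_power / mean_power)

cardioid = lambda th: 0.5 + 0.5 * np.cos(th)        # DI = 10*log10(3), about 4.8 dB
hypercardioid = lambda th: 0.25 + 0.75 * np.cos(th)  # DI = 10*log10(4), about 6.0 dB
```

By the same definition, the 9 dB average DI quoted for the hearing glasses corresponds to a directivity factor of 10**0.9, roughly 7.9.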

6851
A System for Rapid Measurement and Direct Customization of Head Related Impulse Responses
Farina, Angelo; Fontana, Simone; Grenier, Yves
Measurement systems for Head-Related Impulse Responses (HRIRs) are quite complex and require long acquisition times to sample the full 3D space accurately. HRIR customization has therefore become an important research topic. In HRIR customization, some parameters (generally anthropometric measurements) are obtained from new listeners, and individualized HRIRs are derived from them. Another way to obtain a new listener’s parameters is to measure a subset of the full-3D-space HRIRs and extrapolate them to obtain a full 3D database. Such a partial acquisition system must, of course, be rapid and accurate. In this paper, we present a system which allows rapid acquisition and equalization of HRIRs for a subset of the 3D grid. A technique for HRIR customization based on the measured HRIRs is then described.

6852
Observations on Bimodal Audio Visual Subjective Assessments of a Virtual 3D Scene
Exner, Markus; Großmann, Sebastian; Reiter, Ulrich; Strohmeier, Dominik
This article deals with observations made during audio-visual subjective assessments of the perceived overall quality of a virtual 3D scene. More than 30 test subjects were individually presented with a virtual living room. For each subject, a pre-defined sequence of self-movement was displayed on a 2.7 m wide projection screen. The visual impression was complemented by different versions of real-time room acoustic simulations, reproduced over a circular eight-channel loudspeaker setup, which were contrasted in a paired-comparison test. Interestingly, the amount of reverberation that test subjects judged “most realistic” depended strongly on the acoustic stimulus itself. We present a number of further observations related to test subjects’ expectations and gender, together with an interpretation from which we derive several suggestions for subsequent bimodal assessments.

6853
Measurement of Reverberation Discrimination Threshold for Chinese Subjects with Chinese Music Motifs
Meng, Zihou; Zhao, Fengjie
The just noticeable difference (JND) of reverberation time for Chinese subjects was studied using Chinese music motifs as test material. Three groups of subjects were tested: audio technicians, students from an audio engineering department, and students without professional training in audio engineering or listening. The test was carried out over headphones in mono to obtain the intrinsic perception of reverberation, using the method of constant stimuli. The measured JND of reverberation time is higher than that reported in the study with Western subjects and Western music. The differences caused by the different music motifs are insignificant, but the differences possibly attributable to the professional training and experience of the subject groups are noticeable.

6854
An Auditory Process Model for the Evaluation of Virtual Acoustic Imaging Systems
Kim, Youngtae; Nelson, Philip A.; Park, Munhum
This paper describes the initial application of an auditory process model to the evaluation of various virtual acoustic imaging systems. The model simulates human binaural hearing using linear and non-linear filters for the peripheral process, an equalization-cancellation process for the binaural process, and template matching with frequency weighting for the central process. The model’s predictions have been shown to be consistent with human spatial hearing performance for the localization of white Gaussian noise and the lateralization of low-frequency pure tones. In this study, virtual acoustic images presented by conventional stereophony, the Stereo Dipole and the Optimal Source Distribution were tested at the optimal listening positions, following a discussion of the model’s template-matching process. The simulation results suggest that the current model, within certain limitations, can be a good predictor of the fidelity of such systems in providing a virtual sound image.

6855
Evaluation of Packet Loss Distortion in Audio Signals
Hellerud, Erik; Svensson, Peter; Voldhaug, Jan Erik
Audio streamed over best-effort packet-switched networks under real-time requirements may be distorted by lost or delayed packets. In this work, triple-stimulus hidden-reference subjective tests are used to evaluate the perceptual quality of audio signals exposed to packet loss. The effects of packet loss combined with both very simple and very complex error concealment schemes are evaluated, with four different packet loss rates and five audio clips. Results show statistically significant differences between packet loss rates, error concealment schemes and audio clips. The results are also compared with the output of an objective audio quality evaluation tool (PEAQ).

6856
Dithering Strategy Applied to Tinnitus Masking
Czyzewski, Andrzej; Kochanek, Krzysztof; Kostek, Bozena; Skarzynski, Henryk
This work formulates the hypothesis that hearing loss is accompanied by a parasitic quantization, and relates it to other existing theories on the causes of tinnitus. Preliminary experiments were carried out to verify the proposed interpretation of applied maskers in terms of dither theory. For these experiments, an effective method of delivering a masking signal via bone conduction was derived. The results provide initial confirmation of the analogy between threshold phenomena occurring in digital audio circuits and the origin of ear noises. They may lead to the development of more effective tinnitus therapies based on high-frequency dither with specially shaped spectral characteristics.
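The digital-audio side of the analogy can be illustrated with a small experiment: a tone smaller than half a quantization step vanishes entirely when rounded, but survives (buried in noise) when dither is added first. The Python sketch below assumes a mid-tread quantizer in LSB units and uses high-pass triangular (TPDF) dither as one common example of dither with a high-frequency spectral tilt. It is not the authors' masking signal, whose spectral shaping the abstract does not specify, and all names and parameters are illustrative.

```python
import math
import random

def quantize(x):
    """Mid-tread quantizer with a step of 1 LSB (rounds to the nearest integer)."""
    return math.floor(x + 0.5)

def hp_tpdf_dither(n, rng):
    """High-pass TPDF dither in LSB units.

    The difference of two independent uniform samples on [-0.5, 0.5] has a
    triangular PDF spanning +/-1 LSB; taking the first difference of a single
    uniform sequence also tilts the dither spectrum toward high frequencies.
    """
    u = [rng.uniform(-0.5, 0.5) for _ in range(n + 1)]
    return [u[k + 1] - u[k] for k in range(n)]

def sine_amplitude(y, freq, fs):
    """Least-squares amplitude of sin(2*pi*freq*k/fs) in y (whole number of cycles assumed)."""
    w = 2.0 * math.pi * freq / fs
    return 2.0 / len(y) * sum(v * math.sin(w * k) for k, v in enumerate(y))

fs, f0, amp, n = 48000, 1000.0, 0.3, 48000   # 0.3 LSB tone: below the rounding threshold
x = [amp * math.sin(2.0 * math.pi * f0 * k / fs) for k in range(n)]

plain = [quantize(v) for v in x]             # every sample rounds to 0
d = hp_tpdf_dither(n, random.Random(1))
dithered = [quantize(v + e) for v, e in zip(x, d)]

amp_plain = sine_amplitude(plain, f0, fs)    # 0.0: the tone is lost
amp_dith = sine_amplitude(dithered, f0, fs)  # close to 0.3: the tone survives in the noise
print(f"undithered: {amp_plain:.3f} LSB, dithered: {amp_dith:.3f} LSB")
```

TPDF dither is chosen here because it makes the mean of the quantized output equal to the input, removing the threshold (dead zone) below which the undithered quantizer outputs nothing.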

6857
3D Sound Field Recording with Higher Order Ambisonics - Objective Measurements and Validation of Spherical Microphone
Bertet, Stéphanie; Daniel, Jérôme; Moreau, Sébastien
Higher Order Ambisonics (HOA) is a flexible approach for representing and rendering 3D sound fields. Nevertheless, the lack of effective microphone systems limited its use until recently. Building on the authors’ previous work on the theory and design of spherical microphone arrays, a 4th order HOA microphone has been built, measured and used for natural recording. The present paper first discusses theoretical aspects and physical limitations specific to discrete, relatively small arrays (spatial aliasing, low-frequency estimation). It then focuses on the objective validation of such microphones. HOA directivities reconstructed from simulated and measured 3D responses are compared with the expected spherical harmonics. Criteria such as spatial correlation help characterize the encoding artifacts due to model limitations and prototype imperfections. The impact on localisation criteria is also evaluated.
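Two back-of-the-envelope numbers frame the limitations mentioned above: a full-3D HOA signal of order N has (N+1)^2 spherical-harmonic components, so a 4th order array must estimate 25 components and generally needs at least that many capsules; and a widely used rule of thumb places the onset of spatial aliasing near kr = N. The Python sketch below computes both. The 5 cm radius is a hypothetical value, not the dimension of the prototype described in the paper.

```python
import math

def hoa_channel_count(order):
    """Number of spherical-harmonic components for full-3D HOA of the given order."""
    return (order + 1) ** 2

def aliasing_frequency_hz(order, radius_m, c=343.0):
    """Rule-of-thumb aliasing limit from kr = N, i.e. f = N * c / (2 * pi * r)."""
    return order * c / (2.0 * math.pi * radius_m)

order = 4
radius = 0.05  # hypothetical array radius in metres (not taken from the paper)
print(hoa_channel_count(order))
print(round(aliasing_frequency_hz(order, radius)))
```

For these assumed values the components number 25 and the aliasing estimate is roughly 4.4 kHz. A smaller radius raises that limit, but it worsens the low-frequency estimation problem the abstract also mentions, which is the trade-off inherent to relatively small arrays.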

6858
Audio Cable Distortion is Not a Myth!
Black, Richard
Specialist audio cables are often sold to the consumer on the basis of eyebrow-raising claims for technical performance, though to date no repeatable test has shown any effect more surprising than mild frequency-selective attenuation. However, because the loudspeaker load is typically nonlinear and causes harmonic currents to flow, finite impedance in an audio cable does indeed cause harmonic voltages to appear across the loudspeaker. This distortion term is similar to, or even greater than, that produced by the amplifier’s intrinsic nonlinearity.
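The mechanism described above amounts to Ohm's law applied to the harmonic part of the load current. The Python sketch below gives a first-order estimate under stated assumptions: the amplifier is an ideal voltage source with optional output impedance, the cable is a pure round-trip resistance (inductance and capacitance ignored), and the harmonic current drawn by the loudspeaker is known. The numbers used are hypothetical and are not measurements from the paper.

```python
import math

def cable_harmonic_distortion(r_cable_ohm, i_harmonic_a, v_fundamental_v, r_amp_ohm=0.0):
    """First-order estimate of the harmonic voltage a nonlinear load sees through a cable.

    Model (an assumption): the amplifier is an ideal voltage source with output
    impedance r_amp_ohm, the cable is a pure round-trip resistance r_cable_ohm,
    and the loudspeaker draws harmonic current i_harmonic_a (RMS).
    That current drops across the series resistance, so the same voltage
    appears as distortion at the loudspeaker terminals.
    Returns (harmonic voltage in V RMS, distortion in %, distortion in dB).
    """
    v_h = i_harmonic_a * (r_cable_ohm + r_amp_ohm)
    ratio = v_h / v_fundamental_v
    return v_h, 100.0 * ratio, 20.0 * math.log10(ratio)

# Hypothetical values: 0.1 ohm round-trip cable, 10 mA of harmonic current, 10 V fundamental
v_h, pct, db = cable_harmonic_distortion(0.1, 0.01, 10.0)
print(f"{v_h * 1e3:.2f} mV, {pct:.3f} %, {db:.1f} dB")
```

With these values the sketch prints 1.00 mV, 0.010 %, -80.0 dB. Because the harmonic voltage scales linearly with series resistance, a thinner or longer cable raises this term proportionally, which is how it can approach or exceed an amplifier's own distortion.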



(C) 2007, Audio Engineering Society, Inc.