Audio Engineering Society Preprints

AES 115th Convention

New York, New York, USA
October 10-13, 2003

AES Preprint Ordering

Single Convention Preprints are available through the AES Preprint Search and Shop facility.

Preprints Listing

5854
Matti Hamalainen
This paper discusses interoperable synthetic audio applications and content formats for mobile devices. For these devices it is important that technologies are compact and efficient and can be applied to many different types of applications. Some of these applications use network connectivity and must be interoperable between different devices. These requirements introduce several technical challenges, which are discussed in more detail with a focus on MIDI-based technologies. A new interoperability solution is proposed for synthetic audio content for hybrid synthesizer architectures.
Interoperable Synthetic Audio Formats for Mobile Applications and Games

5855
Christian Grigg
A working group in the Interactive Audio SIG of the MIDI Manufacturers Association has produced a draft specification for a public standard file format supporting the interchange of advanced interactive audio soundtracks. It uses a cue-oriented model, is not tied to any particular authoring or playback platform, is programming-language neutral, and can be used without license agreements or royalty payments. It is technically extensible in several dimensions. A model for the underlying soundtrack engine is articulated, as is a plan for an open-source software project to speed implementation on any given platform by leveraging the platform's existing media playback APIs.
Preview: Interactive XMF - A Standardized Interchange File Format for Advanced Interactive Audio Content

5856
Rafael Kassier, Slawomir K. Zielinski, Francis Rumsey
The effect of division of attention between the evaluation of multichannel audio quality degradations and involvement in a visual task (playing a computer game) was investigated. Time-variant impairments (drop-outs) were used to provide degradations in audio quality. It was observed that involvement in a visual task may significantly change the results obtained during the evaluation of audio impairments for some experimental conditions.
Computer Games and Multichannel Audio Quality Part 2 – Evaluation of Time-Variant Audio Degradations Under Divided and Undivided Attention

5857
Brian E. Schmidt
As interactive audio soundtracks mature, they become more and more complex. It is not uncommon for games to support tens of thousands of lines of dialog, hundreds of music cues, dozens of ambiences and thousands of individual sound effects. Mixing all these audio elements presents a unique challenge. Unlike linear media, 'mixing' of interactive audio happens while the game is being played, not ahead of time, so traditional post-production techniques do not necessarily apply. This paper will discuss some of the unique challenges associated with mixing interactive audio content, including determining what exactly is meant by 'mixing' game audio in the first place.
Interactive Mixing of Game Audio

5858
Jae-Jin Jeon, Lae-Hoon Kim, Koeng-Mo Sung
A multiple-point equalization scheme based on the least-squares (LS) method for wavelet-filtered signals is proposed. Because the variation of the room transfer function (RTF) differs from one frequency bin to another depending on wavelength, equalization with frequency-dependent resolution is desirable. By decomposing the received signal with a discrete wavelet transform, a different kind of filter can be assigned to each bandpass signal. Moreover, RTF measurements at various receiver positions should be used to make the inverse filter insensitive to source/receiver position changes. These two methods are combined to provide a wider sweet region in a listening room. Real measurement data are used to construct an inverse filter.
Wavelet Based Multiple Point Equalization of Room Transfer Function

5859
Kittiphong Meesawat, Dorte Hammershoi
This study investigates whether the reverberation tails of binaural room impulse responses (BRIRs) for different locations and directions in a given room can be arbitrarily interchanged, e.g. in virtual environment generation. BRIRs were measured in a quadratic lecture room and post-processed to minimize any possible noise and concatenation artifacts. Four subsets of combinations of BRIR heads and tails were selected to test for dependence on location and head direction. A 3-AFC test has so far been carried out with 4 listeners, and the results suggest that, for the room in question, the BRIRs may be cut at around 40-60 ms and arbitrarily combined with little or no perceptual consequence.
The Time When the Reverberation Tail in a Binaural Room Impulse Response Begins
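The head/tail interchange tested in this study can be pictured as splicing the early part of one BRIR onto the late part of another at a cut time such as 50 ms. The sketch below is only an illustration under stated assumptions: both responses are time-aligned lists of samples at the same sample rate, and a short linear crossfade (an assumption; the abstract does not say how the splice was made) covers the join. The names splice_brir, cut_ms and fade_ms are hypothetical.

```python
def splice_brir(head, tail, fs, cut_ms=50.0, fade_ms=2.0):
    """Join the early part of `head` to the late part of `tail` (hypothetical helper).

    Both inputs are lists of samples at sample rate `fs` (Hz), assumed to be
    time-aligned. Samples before the cut come from `head`, samples after the
    crossfade come from `tail`, and a linear crossfade of `fade_ms` starting
    at the cut blends the two (the crossfade is an assumption).
    """
    n = min(len(head), len(tail))
    cut = int(round(cut_ms * fs / 1000.0))
    fade = max(1, int(round(fade_ms * fs / 1000.0)))
    out = []
    for i in range(n):
        if i < cut:
            w = 1.0                                  # pure head (early part)
        elif i >= cut + fade:
            w = 0.0                                  # pure tail (reverberant part)
        else:
            w = 1.0 - (i - cut + 1) / (fade + 1)     # linear ramp, strictly between 0 and 1
        out.append(w * head[i] + (1.0 - w) * tail[i])
    return out
```

For example, splice_brir(brir_a, brir_b, fs=48000, cut_ms=50.0) would keep the first 50 ms of a hypothetical response brir_a and continue with the tail of brir_b.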

5860
Joel Preto Paulo, Carlos Rodrigues Martins, Jose L. Bento Coelho
Room impulse response measurements are often made in the presence of non-stationary noise whose RMS value and power spectral density vary significantly with time. Under these conditions, the mean square value (MS) of the sequence must be minimized to improve the overall SNR. This suggests that the analysis should consider the energy of the noise in both the time domain and the frequency domain. A modified MLS measurement method working in the time and frequency domains is presented for room acoustics applications. Experimental results obtained under real conditions are described and shown in the paper. The new approach, the Hybrid Sequences technique, proved to yield a significant increase in SNR compared with the classical MLS technique.
Hybrid M Sequences for Room Impulse Response Estimation
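For orientation, the classical MLS technique that the Hybrid Sequences method is compared against can be sketched in pure Python: a linear feedback shift register generates a maximum-length sequence, the system is driven with it periodically, and one period of the response is circularly cross-correlated with the sequence. This is a generic illustration, not the authors' hybrid method; the order-4 recurrence a[n] = a[n-3] XOR a[n-4] (primitive polynomial x^4 + x + 1) is chosen only to keep the example small, whereas real measurements use much longer sequences. Function names are hypothetical.

```python
def mls(order=4, taps=(3, 4)):
    """Return a +/-1 maximum-length sequence of length 2**order - 1.

    Bits follow a[n] = XOR of a[n - t] for t in taps. The taps must come from
    a primitive polynomial; the default corresponds to x^4 + x + 1.
    """
    length = 2 ** order - 1
    bits = [1] + [0] * (order - 1)            # any non-zero seed works
    while len(bits) < length:
        b = 0
        for t in taps:
            b ^= bits[len(bits) - t]
        bits.append(b)
    return [1 - 2 * b for b in bits]          # map 0 -> +1, 1 -> -1

def periodic_response(s, h):
    """Steady-state output of FIR system h driven by the periodic sequence s."""
    L = len(s)
    return [sum(h[j] * s[(n - j) % L] for j in range(len(h))) for n in range(L)]

def recover_impulse_response(s, y):
    """Recover h (zero-padded to the period) from one period of excitation s and response y."""
    L = len(s)
    corr = [sum(y[n] * s[(n - k) % L] for n in range(L)) for k in range(L)]
    # The MLS circular autocorrelation is L at lag 0 and -1 at every other lag,
    # so corr[k] = (L + 1) * h[k] - sum(h), and sum(corr) equals sum(h).
    total = sum(corr)
    return [(c + total) / (L + 1) for c in corr]
```

With h = [1.0, 0.5, 0.25], recover_impulse_response(s, periodic_response(s, h)) returns those three taps followed by zeros; the estimate is exact here because the system is noise-free, time-invariant and shorter than the sequence period.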

5861
Hideo Miyazaki, Takayuki Watanabe, Shinji Kishinaga, Fukushi Kawakami
Technology for controlling the sound field by electro-acoustic means is often called Active Field Control (AFC); it is used to improve auditory impressions such as liveness, loudness, and spaciousness in auditoria. The AFC system developed at Yamaha uses feedback control techniques to re-create natural reverberation based on the existing acoustics of the room. Time-varying control, including EMR (Electric Microphone Rotator) and fluc-FIR (fluctuating FIR), is implemented in the AFC system to improve stability by preventing the coloration caused by the feedback loop in the system. In this paper, these technologies are summarized, together with an introduction to recent representative venues using AFC. A system plan using the core devices named AFC1, recently developed by Yamaha and released in the US, is also presented.
Active Field Control (AFC) - Reverberation Enhancement System Using Acoustical Feedback Control

5862
Bradford N. Gover, James G. Ryan, Michael R. Stinson
Spherical microphone array designs were investigated for their suitability for directional analysis of reverberant sound fields. Four array geometries (tetrahedron, cube, dodecahedron, geodesic sphere) were considered. Beamforming filters were designed using a constrained gain maximization process, and the theoretical performance of each array was then predicted. A room acoustic simulator was used to help determine how much directionality is sufficient and to evaluate the suitability of each design. A 32-element geodesic sphere array was constructed and used to make directional measurements in real sound fields.
Designing a Spherical Microphone Array for the Directional Analysis of Reflections and Reverberation

5863
D. B. (Don) Keele, Jr.
To maintain constant-beamwidth behavior, CBT circular-arc loudspeaker line arrays require that the individual transducer drive levels be set according to a continuous Legendre shading function. This shading gradually tapers the drive levels from maximum at the center of the array to zero at the outside edges. This paper considers approximations to the Legendre shading that both discretize the levels and truncate the extent of the shading so that practical CBT arrays can be implemented. Simulation showed that a 3-dB stepped approximation to the shading, maintained out to –12 dB, did not significantly alter the excellent vertical pattern control of the CBT line array. Experimental measurements on a pair of passively shaded prototype CBT arrays using miniature wide-band transducers were very encouraging.
Practical Implementation of Constant Beamwidth Transducer (CBT) Loudspeaker Circular-Arc Line Arrays
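The discretize-and-truncate idea can be illustrated numerically. The sketch assumes the third-order polynomial approximation of the Legendre shading commonly quoted in Keele's CBT work, U(x) = 1 + 0.066x - 1.8x^2 + 0.743x^3 for a normalized position 0 <= x < 1 (x = 0 at the array center, x = 1 at the nominal edge); treat those coefficients as an assumption. Levels are rounded to the nearest 3-dB step, and transducers whose stepped level would fall below -12 dB get zero drive, which is one reading of "maintained out to -12 dB". Function names are hypothetical.

```python
import math

def legendre_shading(x):
    """Assumed polynomial approximation of the CBT Legendre shading (0 <= x <= 1)."""
    if x >= 1.0:
        return 0.0
    return 1.0 + 0.066 * x - 1.8 * x ** 2 + 0.743 * x ** 3

def stepped_shading(positions, step_db=3.0, floor_db=-12.0):
    """Quantize the shading to `step_db` steps and truncate below `floor_db`.

    `positions` are normalized distances from the array center (0..1).
    Returns linear drive gains; 0.0 means the transducer is omitted.
    """
    gains = []
    for x in positions:
        u = legendre_shading(abs(x))
        if u <= 0.0:
            gains.append(0.0)
            continue
        level_db = 20.0 * math.log10(u)
        stepped_db = step_db * round(level_db / step_db)   # nearest 3-dB step
        gains.append(0.0 if stepped_db < floor_db else 10.0 ** (stepped_db / 20.0))
    return gains
```

With these assumptions, a transducer halfway to the edge (x = 0.5, continuous level about -3.4 dB) lands on the -3 dB step, while one at x = 0.95 (about -22 dB) is dropped.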

5864
Wolfgang Hess, Jonas Braasch, Jens Blauert
The output of computational models of human auditory localization (e.g., [Lindemann, 1986], [Gaik, 1993]) is often given as a three-dimensional plot, the so-called binaural activity pattern, which takes into account the effect of binaural arrival-time and level differences. This pattern, which is a function of time, can be used among other things for detecting, identifying and separating incoherent sound sources, determining their azimuths, detecting echoes, and estimating the amount of auditory spaciousness. This paper investigates how the binaural activity patterns of head-related room impulse responses can be used for judgements on the auditory quality of different virtual rooms. To this end, the aim is to present the binaural activity patterns in a visual form that gives experts a general overview of the perceived room acoustics and also characterizes the attributes of the auditory events that correlate with lateral reflections, reverberation, and the distribution of energy in time and space.
Acoustical Evaluation of Virtual Rooms by Means of Binaural Activity Patterns

5865
Brian E. Anderson, Timothy W. Leishman
Because loudspeaker drivers are electro-mechano-acoustical transducers, their parameters may be measured from physical domains other than the electrical domain. A method has been developed by the authors to determine moving-coil loudspeaker parameters through the use of acoustical measurements. The technique utilizes a plane wave tube and the two-microphone transfer function technique to measure acoustical properties of a baffled driver under test (DUT). Quantities such as the reflection and transmission coefficients of the DUT are first measured. Driver parameters are then extracted from the measurements using curve-fitting techniques and theoretical solutions to equivalent circuits of the composite system. This paper discusses the acoustical measurement apparatus, system modeling, and a comparison of acoustically measured parameters to those measured using common electrical techniques. Parameters derived from the various methods are also compared to reference parameters to establish bias errors.
An Acoustical Measurement Method for the Derivation of Loudspeaker Parameters

5866
Karsten Nielsen, Lars Michael Fenger
A novel audio power conversion system architecture is presented in an attempt to take a step forward in overall system efficiency and performance. The Active pulse modulated Transducer system (“Active Transducer”) converts power directly from AC mains or from a DC power supply to the acoustic output in one simplified topological stage. This opens new perspectives in audio system design.
The Active Pulse Modulated Transducer (AT) - A Novel Audio Power Conversion System Architecture

5867
Juha Backman
A loudspeaker arrangement consisting of two individual drivers and delay units, with an omnidirectional polar pattern at low frequencies and a first-order gradient pattern at middle frequencies, is presented. This system combines the low-frequency dynamic range of conventional speakers with the ability of gradient speakers to reduce room-induced coloration. The benefits and limitations of mid-frequency directivity are analyzed through simulations of room-speaker interaction for omnidirectional, cardioid, and dipole loudspeakers.
Low-frequency Polar Pattern Control for Improved In-room Response

5868
Ralf Geiger, Gerald Schuller, Juergen Herre, Ralph Sperschneider, Thomas Sporer
This paper presents a scalable lossless enhancement of MPEG-4 Advanced Audio Coding (AAC). Scalability is achieved in the frequency domain using the Integer Modified Discrete Cosine Transform (IntMDCT), which is an integer approximation of the MDCT providing perfect reconstruction. With this transform, and only a minor extension of the bitstream syntax, the MPEG-4 AAC Scalable codec can be extended to lossless operation. The system provides bit-exact reconstruction of the input signal independent of the implementation accuracy of the AAC core coder. Furthermore, scalability in sampling rate and reconstruction word length is supported.
Scalable Perceptual and Lossless Audio Coding Based on MPEG-4 AAC
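IntMDCT is generally built by decomposing the Givens rotations inside the MDCT into lifting steps with rounding, which is what makes an integer-to-integer transform exactly invertible. The sketch below shows only that building block for a single rotation; the windowing, time-domain aliasing and DCT-IV stages of a full IntMDCT are omitted, and the function names are hypothetical.

```python
import math

def int_rotate(x, y, angle):
    """Integer approximation of a rotation by `angle` (angle != 0) via three lifting steps.

    Uses the factorization
      [[c, -s], [s, c]] = [[1, p], [0, 1]] @ [[1, 0], [s, 1]] @ [[1, p], [0, 1]]
    with c = cos(angle), s = sin(angle), p = (c - 1) / s.
    Each step adds a rounded function of the *other* variable, so it can be undone.
    """
    c, s = math.cos(angle), math.sin(angle)
    p = (c - 1.0) / s
    x = x + round(p * y)
    y = y + round(s * x)
    x = x + round(p * y)
    return x, y

def int_rotate_inverse(x, y, angle):
    """Undo int_rotate exactly by running the lifting steps backwards with subtraction."""
    c, s = math.cos(angle), math.sin(angle)
    p = (c - 1.0) / s
    x = x - round(p * y)
    y = y - round(s * x)
    x = x - round(p * y)
    return x, y
```

The forward result stays within a couple of units of the real-valued rotation, while the inverse returns the original integers bit-exactly, independent of how accurately the rotation itself approximates the ideal one.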

5869
Theg Hong Yeo, Wai Choon Wong, Dong Yan Huang
In this paper, a concatenated system combining turbo product codes and convolutional codes with a soft-decision Viterbi decoding algorithm and diversity is proposed to enhance the robustness of the Unequal Error Protection (UEP) scheme for wireless transmission of MPEG-4 Advanced Audio Coding (AAC). The proposed scheme has been tested over Gaussian and Rician fading channels. Under severe channel conditions with random, burst and mixed errors at bit error rates (BER) of 6.00×10^-2 and above, the proposed scheme provides a 90% improvement in residual BER performance (approximately 3 dB), with a 19% increase in bandwidth over the original UEP scheme. At a high channel BER of 6.00×10^-2, the proposed scheme gives an error-free header frame. Compared with concatenated convolutional codes, the proposed scheme provides a 0.5 dB BER performance improvement at the same bandwidth. With diversity, the performance can be further improved at low SNR by 3 dB for the proposed scheme and by 2.5 dB for the original UEP scheme.
Robust MPEG Advanced Audio Coding over Wireless Channels

5870
Manfred Lutzky, Martin Weishart, Johannes Hilpert, Bernhard Grill, Harald Gernhardt, Michael Haertl
MP3 is the most commonly used audio coding scheme worldwide. Today it is present in virtually every computer, CD and DVD player, many car radios and, of course, the many MP3 audio player devices. This paper describes a new MPEG-4 working draft, currently under development by ISO/MPEG, which will allow MP3 to be fully integrated into MPEG-4 Audio-Visual systems. Among other new features, such as enhanced editability, MP3 is scheduled to get full multi-channel audio capability, which can be implemented with a few additional lines of code on top of a standard stereo MP3 codec.
MP3 in MPEG-4

5871
Martin Wolters, Kristofer Kjorling, Daniel Homm, Heiko Purnhagen
MPEG Spectral Band Replication (SBR) is the newest compression technology available as part of the MPEG standards. It is combined with MPEG Advanced Audio Coding (AAC) and improves coding efficiency by more than 30%. The resulting scheme is called High-Efficiency AAC (HE-AAC). This presentation will explain MPEG-SBR and its integration into the existing MPEG-4 bitstream format. The SBR technology itself as well as the implications on systems based on MPEG-4 technology are described. The signaling through MPEG-4 Systems and other transport formats is introduced and typical applications and usage scenarios will be listed.
A Closer Look into MPEG-4 High Efficiency AAC

5872
Tilman Liebchen
Lossless coding will become the latest extension of the MPEG-4 audio standard. In response to a call for proposals, many companies have submitted lossless audio codecs for evaluation. The codec of the Technical University of Berlin was chosen as reference model for MPEG-4 Audio Lossless Coding, attaining working draft status in July 2003. The encoder is based on linear prediction, which enables high compression even with moderate complexity, while the corresponding decoder is straightforward. The paper describes the basic elements of the codec, points out envisaged applications, and gives an outline of the standardization process.
MPEG-4 Lossless Coding for High-Definition Audio
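The principle behind prediction-based lossless coding can be shown with the simplest possible case: a fixed second-order integer predictor whose residual the decoder inverts exactly. This illustrates only the idea; a practical codec such as the one described here would use higher-order, adaptive linear prediction and entropy-code the residual, none of which is reproduced. The function names and the fixed predictor are assumptions for illustration.

```python
def encode(samples):
    """Return the prediction residual for a list of integer PCM samples.

    Prediction: x_hat[n] = 2*x[n-1] - x[n-2] (fixed second-order, integer-valued).
    The first two samples are predicted from zeros.
    """
    residual = []
    for n, x in enumerate(samples):
        x1 = samples[n - 1] if n >= 1 else 0
        x2 = samples[n - 2] if n >= 2 else 0
        residual.append(x - (2 * x1 - x2))
    return residual

def decode(residual):
    """Reconstruct the samples bit-exactly by adding the same prediction back."""
    samples = []
    for n, e in enumerate(residual):
        x1 = samples[n - 1] if n >= 1 else 0
        x2 = samples[n - 2] if n >= 2 else 0
        samples.append(e + 2 * x1 - x2)
    return samples
```

Because prediction uses only integers and already-decoded samples, decode(encode(x)) == x holds exactly; compression comes from the residual of a smooth signal being much smaller than the signal itself.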

5873
Richard Foss, Jun-ichi Fujimori, Brad Klinkradt, Shaun Bangay
A connection management server has been developed that enables connections to be made between mLAN-compatible audio devices, via a client web browser on any web-enabled device, such as a laptop or PDA. The connections can also be made across IEEE 1394 bridges, and will allow for the transport of audio and music data between mLAN devices on the same or separate IEEE 1394 buses. Multiple users will be able to make and break connections via the server.
An mLAN Connection Management Server for Web-Based, Multi-User, Audio Device Patching

5874
Shigeru Aoki, Hirokazu Nakashima
The Japan FM Network, comprising 38 affiliated FM broadcasting companies throughout Japan, has installed a digital audio program file distribution network system to replace conventional analog distribution. This system reduces distribution cost and time compared with the traditional broadcast-relay technique, in which one station records another station's program off-air for later use. This paper also describes the digital audio file format currently used for distribution and presents solutions to some existing problems.
The Audio File Format for Digital Distribution

5875
Masahiro Ikeda, Shinjiro Yamashita, Shinji Kishinaga, Fukushi Kawakami
For sound systems installed in auditoriums, it is difficult to design comprehensive networking systems because of limitations in available channels, latency and cost. However, with the rapid progress of networking devices, digital audio networks are expected to become more valuable in the performing arts within several years. From the system-design viewpoint, this report discusses the functions required of network audio systems for auditoriums, taking a 2300-seat multipurpose hall as an example. Comparing the system against currently proposed networking technologies, the report also discusses the feasibility of the example and points out problems in realizing such systems.
Design method of Digital Audio Network System for Auditoriums

5876
Toshiyuki Nishiguchi, Masakazu Iwaki, Kimio Hamasaki, Akio Ando
We conducted subjective evaluation tests to study perceptual discrimination between musical sounds with and without very high frequency components (above 21 kHz). In order to conduct strict evaluation tests, the sound reproduction system used for these tests was designed to exclude any leakage or influence of very high frequency components in the audible frequency range. Most of the sound stimuli used for the evaluation tests were newly recorded by the authors to maintain the highest quality for proper sound reproduction. The subjects were selected mainly from professional audio experts and musicians. The results showed that we can still neither confirm nor deny the possibility that some subjects could discriminate between musical sounds with and without very high frequency components.
Perceptual Discrimination between Musical Sounds with and without Very High Frequency Components

5877
Malcolm O. J. Hawksford
Progress is reported in the design of parametrically controlled noise-shaping Sigma Delta Modulators (SDMs). Because this SDM structure (introduced by the author at the AES 112th Convention) can obtain a higher SNR than normal SDM structures, Philips Research Laboratories have questioned whether further improvement could be obtained using techniques inspired by the Trellis SDM. Simulations are used here to illustrate the performance of a parametrically controlled pseudo-Trellis SDM, which is believed to be the first disclosure of its type. The technique uses a variable state step-back approach to moderate loop behaviour, which is shown to achieve robust stability in the presence of aggressive noise shaping and high-level signals. Comparisons are made with traditional SDM structures and LPCM systems.
Parametrically Controlled Noise Shaping in Variable State-Step-Back Pseudo-Trellis SDM
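The pseudo-Trellis structure itself is not described in enough detail here to reproduce, but the "traditional SDM structures" it is compared against can be illustrated with the simplest case: a first-order sigma-delta modulator with a one-bit quantizer. This is a baseline for orientation only; it contains none of the parametric noise shaping or state step-back logic discussed in the paper, and sdm_first_order is a hypothetical name.

```python
def sdm_first_order(x):
    """One-bit, first-order sigma-delta modulation of samples in [-1, 1].

    The quantizer outputs the sign of the integrator state; the integrator then
    accumulates the error between the input and that output bit, which pushes
    the quantization noise toward high frequencies.
    """
    integrator = 0.0
    out = []
    for sample in x:
        y = 1.0 if integrator >= 0.0 else -1.0   # one-bit quantizer
        out.append(y)
        integrator += sample - y                 # error feedback
    return out
```

For inputs in [-1, 1] the integrator stays within [-2, 2), so over N samples the mean of the one-bit output differs from the mean of the input by less than 2/N; higher-order and trellis-type modulators trade this simple stability for more aggressive noise shaping.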

5878
Michael Page, Peter Eastty, Gary Cook, Mike Smith, Eamon Hughes
“Super Multi-channel Audio Connection” (SuperMAC) is an enhancement of the existing “Multi-channel Audio Connection for Direct Stream Digital” (MAC-DSD) to support PCM audio formats, extending the unique benefits of the technology to a universal range of studio audio applications. The link provides full-duplex multi-channel audio connection for Direct Stream Digital (DSD) or 24-bit PCM, at sample rates up to 384 kHz, plus high-quality clock signals and auxiliary data. It features error correction and deterministic latency as low as 45 microseconds, and the connection medium is a standard structured wiring cable. The specification is being submitted to the AES Standards Committee with a view to open standardisation.
A Universal Interface on Cat-5 Cable for High-resolution Multi-channel Audio Interconnection

5879
Jon D. Paulo
High-resolution digital audio systems are susceptible to various sources of electromagnetic noise from the environment, especially crosstalk from adjacent cables. The noise can induce errors and increase jitter in the recovered clock signal. We discuss the most important noise sources and their characteristics. Next, we analyze the noise susceptibility of typical transmitter and receiver circuits. Test results are provided for a system with induced common mode noise. The paper concludes with design and application considerations.
The Effects and Reduction of Common-Mode Noise and Electromagnetic Interference in High-Resolution Digital Audio Transmission Systems

5880
Jan Abildgaard Pedersen
This paper presents a system for adapting a loudspeaker to its position and to the acoustic properties of the listening room: the ABC room adaptation system. Adaptive Bass Control (ABC) measures the acoustic radiation resistance seen by the bass drive unit and calculates a digital filter, which is inserted in the signal path before the power amplifier. The radiation resistance is calculated from sound pressure measurements at two different positions in front of the bass drive unit. The measured radiation resistance is compared with sound pressure measurements at different listening positions. The ABC system has been found to provide room adaptation that is globally valid throughout the listening room, i.e. all listening positions benefit from it.
Adjusting a Loudspeaker to Its Acoustic Environment - The ABC System

5881
Scott G. Dorsey
Incandescent lamps have been used for over fifty years as loudspeaker protection devices, but much misinformation about them exists. The author measures static and dynamic parameters of over thirty types of auto lamps and tests some types for consistency between manufacturers and across production. The results contradict much of the common wisdom about using lamps for protection and show serious linearity problems even at low operating levels.
Lamps for Speaker Protection

5882
Steven L. Garrett, John F. Heake
Penn State recently instituted a First Year Seminar (FYS) requirement for every student. This paper describes a “hands-on” FYS on audio engineering that has freshmen construct and test a two-way loudspeaker system during eight 2-hour class meetings. Time and resource constraints dictated that the speaker system must be assembled using only hand tools and characterized using only an oscillator and digital multimeter. The cost of the entire system could not exceed $60/side. The speaker enclosure is made primarily from PVC plumbing parts. The four laboratory exercises that the students perform and write up are designed to introduce basic engineering concepts including graphing, least-squares fitting and error estimation, electrical impedance, resonance, transfer functions, mechanical and gas stiffness, and non-destructive parameter measurement.
Hey Kid! Wanna Build a Loudspeaker? The First One's Free

5883
Steven L. Temme, Pascal Brunet, Evan Chakroff
During the loudspeaker manufacturing process, particles may become trapped inside the loudspeaker, resulting in a distinctive defect that is easily heard but difficult to measure. To give a clearer view of the problem, Time-Frequency maps are shown for some defective loudspeakers. Based on this analysis, a reliable testing procedure using a swept-sine stimulus, high-pass filter, and RMS-envelope analysis is presented. Further possible enhancements and applications of the method are listed.
Loose Particle Detection in Loudspeakers
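The three processing stages named in the abstract (swept-sine stimulus, high-pass filter, RMS envelope) can be sketched as follows. Everything specific here is an assumption for illustration, not taken from the paper: the 20-200 Hz log sweep, the cascade of first-order high-pass sections used as the filter, the 4 ms envelope window and the fixed detection limit. Function names are hypothetical.

```python
import math

def log_sweep(f0, f1, duration, fs):
    """Exponential sine sweep from f0 to f1 Hz lasting `duration` seconds."""
    n = int(duration * fs)
    k = math.log(f1 / f0)
    return [math.sin(2 * math.pi * f0 * duration / k * (math.exp(k * i / n) - 1.0))
            for i in range(n)]

def highpass(x, fc, fs, order=4):
    """Cascade of `order` first-order (RC-style) high-pass sections."""
    rc = 1.0 / (2 * math.pi * fc)
    a = rc / (rc + 1.0 / fs)
    for _ in range(order):
        y, prev_x, prev_y = [], 0.0, 0.0
        for v in x:
            prev_y = a * (prev_y + v - prev_x)
            prev_x = v
            y.append(prev_y)
        x = y
    return x

def rms_envelope(x, window):
    """RMS value of consecutive, non-overlapping blocks of `window` samples."""
    return [math.sqrt(sum(v * v for v in x[i:i + window]) / window)
            for i in range(0, len(x) - window + 1, window)]

def has_loose_particle(response, fs, fc=2000.0, window_ms=4.0, limit=0.01):
    """Flag a unit whose high-passed RMS envelope exceeds a (hypothetical) fixed limit."""
    residue = highpass(response, fc, fs)          # suppress the sweep fundamental
    env = rms_envelope(residue, int(fs * window_ms / 1000))
    return max(env) > limit
```

To exercise it, a clean response can be simulated as the sweep itself and a defective one as the sweep plus a short high-frequency burst standing in for a particle impact. In practice the limit would be derived from measurements of known-good units rather than fixed by hand.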

5884
Elena Prokofieva
A theoretical analysis has been conducted of an elastic rectangular vibrating panel mounted in a baffle in the vicinity of a porous layer. Numerical results were obtained with a purpose-written computer program in Matlab 6.0. The presence of the porous layer was found to alter the sound emission of the panel considerably. The effects of the porous layer's characteristics and of the air-gap width between the vibrating panel and the porous layer on the acoustic pressure and surface velocity were investigated.
Radiation of Sound by a Baffled DML-Panel Near a Porous Layer

5885
Justin Baird, David McGrath
Conventional crossover design methods use traditional frequency-selective networks to combine multiple transducers into a single full-bandwidth system. These traditional networks, whether implemented in analogue or digital form, exhibit wide transition bands and suffer from phase distortion. These characteristics result in poor frequency, impulse and polar responses. A practical crossover implementation is presented that removes the detrimental effects of transition bands and phase distortion. The method implements linear-phase crossovers whose transition bands approach a theoretical ideal brick-wall response. Comparisons with conventional crossovers are presented, and applications to large-scale array optimization are also discussed.
Practical Application of Linear Phase Crossovers with Transition Bands Approaching a Brick Wall Response for Optimal Loudspeaker Frequency, Impulse and Polar Response
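One standard way to obtain a linear-phase crossover whose outputs sum to a pure delay is to design a symmetric (linear-phase) FIR low-pass and derive the high-pass by subtracting it from a delayed unit impulse. The sketch below shows that generic construction with a windowed-sinc low-pass; it is not presented as the authors' design method, the Blackman window and tap count are arbitrary choices, and transition bands approaching a brick wall would require many more taps. Function names are hypothetical.

```python
import math

def lowpass_fir(num_taps, fc, fs):
    """Windowed-sinc linear-phase low-pass FIR; num_taps must be odd (type-I filter)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    m = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - m
        # ideal low-pass impulse response, centred on tap m
        ideal = 2 * fc / fs if k == 0 else math.sin(2 * math.pi * fc * k / fs) / (math.pi * k)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (num_taps - 1)))   # Blackman
        taps.append(ideal * window)
    gain = sum(taps)                      # normalize to unity gain at DC
    return [t / gain for t in taps]

def complementary_highpass(lp):
    """High-pass = delayed impulse minus low-pass, so lp + hp is a pure delay."""
    m = (len(lp) - 1) // 2
    return [(1.0 if n == m else 0.0) - t for n, t in enumerate(lp)]
```

Because both filters are symmetric about the same centre tap, the two bands share the same constant group delay of (num_taps - 1) / 2 samples and their acoustic sum (for ideal, aligned drivers) is a delayed impulse.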

5886
Markus Dodd
A compression driver with an annular two-slot phase-plug coupled to the convex side of a hemispherical diaphragm is introduced. Magnetic and thermal domains are modelled using transient and static Finite Element Methods (FEM). Structural and acoustic domains are modelled as finite elements with boundary elements used to model free space. Structural and acoustic elements are fully coupled to both each other and the boundary elements. The application of these FEM techniques to the optimisation of compression driver performance is discussed and illustrated with results. The limitations of plane-wave tube measurements are also mentioned and illustrated with FEM and measured results.
The Development of a Forward Radiating Compression Driver by the Application of Acoustic, Magnetic and Thermal Finite Element Methods

5887
D. B. (Don) Keele, Jr.
Recently Vanderkooy et al. [1, 2] considered the effect on amplifier loading of dramatically increasing the Bl force factor of a loudspeaker driver mounted in a sealed-box enclosure. They concluded that high Bl was a decided advantage in raising the overall efficiency of the amplifier-speaker combination, particularly when a class-D switching-mode amplifier was used. When the Bl factor of a driver is raised dramatically, the input impedance magnitude also rises dramatically, while the impedance phase approaches a purely reactive ±90° over a wide bandwidth centered at resonance. This, they note, is an optimum load for a class-D amplifier, which can not only supply power but also efficiently absorb, store, and return power to the speaker. Unfortunately, a system designed with a high-Bl driver requires significant low-frequency equalization and increased voltage swing from the amplifier compared with systems using typical, much lower values of Bl. This paper uses a series of Spice simulations to examine how raising the driver’s Bl factor affects its efficiency. The nominal power transfer efficiency defined in traditional loudspeaker design methods is contrasted with true efficiency, i.e. true acoustic power output divided by true electrical power input. Increasing Bl dramatically increases the driver’s true efficiency at all frequencies but radically decreases nominal power efficiency in the bass range. Traditional design methods based on nominal power transfer efficiency disguise the very beneficial effects of dramatically raising the driver’s Bl product.
Comparison of Direct-Radiator Loudspeaker System Nominal Power Efficiency vs. True Efficiency with High-Bl Drivers

5888
John J. Neumann, Jr.
MEMS (microelectromechanical systems) technology is more than a scientific curiosity. Commercial MEMS products are being produced using semiconductor manufacturing techniques. What kind of audio devices can be made using this technology? Surveillance, hearing aids, and directional microphones spring to mind. Less obvious are ultrasonics, in-ear translators and surround-sound wallpaper. The small size of MEMS devices brings up issues of physical limits and appropriate size scales for acoustic applications. MEMS microphone/speaker design involves many of the same issues as conventional microphones/speakers, but the scale difference changes their relative importance. Over the past four years, the MEMS lab at Carnegie Mellon University has developed both microphones and speakers using CMOS-MEMS micromachining, and the technology is being commercialized by Pittsburgh startup Akustica.
MEMS (Microelectromechanical Systems) Audio Devices - Dreams and Realities

5889
Gary W. Elko, Flavio Pardo, Daniel Lopez, David Bishop, Peter Gammel
The term MEMS is an acronym for MicroElectroMechanical Systems. During the past decade numerous novel devices based on MEMS technologies have been made: accelerometers for air-bag deployment detection, MEMS mirror arrays for Digital Light Processing (DLP) projectors, and inkjet printer heads, to name a few well-known devices. MEMS devices have been developed for data storage, wireless communication, displays, optical switching, as well as microfluidics, aerospace and biomedical applications. The application of MEMS technology to audio has focused primarily on microphones. Two major application areas are driving the interest in MEMS microphones: hearing aids, where size and integration with signal processing are important, and consumer devices, where there is interest in reducing costs by integrating a complete system solution on an integrated circuit and by packaging devices to allow standard robotic pick-and-place manufacturing. This paper describes a MEMS microphone built at Bell Labs, which was the first all-surface-micromachined MEMS microphone. We also describe some fundamental issues in the design of MEMS microphones.
Surface-Micromachined MEMS Microphone

5890
Earl R. Geddes, Lidia W. Lee
Historically, distortion has been measured using specific signals that were sent through a system and quantified by the degree to which the signal is modified by the system. The human auditory system has not been taken into account in these metrics. Combining nonlinear system theory with the theory of hearing, a new paradigm for quantifying distortion is proposed.
Auditory Perception of Nonlinear Distortion - Theory

5891
Earl R. Geddes, Lidia W. Lee
A new metric for the perception of distortion was recently proposed by Geddes and Lee (2003). Psychoacoustic data were measured, and correlation and regression analyses were applied to examine the relationship between this new metric and the subjective assessment of the sound quality of nonlinear distortion, and its predictive value. Conventional metrics such as total harmonic distortion (THD) and intermodulation distortion (IMD) were also compared. Thirty-four listeners participated in a listening task, rating twenty-one stimuli on a 7-point scale. No significant relationships were observed between the subjective ratings and the THD and IMD metrics. A significant correlation (r=0.95, p<.001) was observed between the subjective ratings and the newly proposed GedLee (Gm) metric, and its robust predictive power was verified. The GedLee metric has demonstrated remarkable potential to quantify sound quality ratings of nonlinear distortion.
Auditory Perception of Nonlinear Distortion

5892
Scott G. Norcross,Gilbert A. Soulodre,Michel C. Lavoie,
In many applications it is desirable to measure and control the subjective loudness of audio signals. However, there is a lack of data regarding loudness perception for typical program material, and an ITU-R study is underway to examine this matter. In the present study, a series of formal subjective tests were conducted to evaluate the perceived loudness of a broad variety of typical program materials, including music and speech. Subjects adjusted the level of various audio materials until their perceived loudness was equal to that of a reference signal. Tests were designed to examine the relative subjective loudness for a variety of playback conditions. The just noticeable difference (JND) in perceived loudness was also examined.
The Subjective Loudness of Typical Program Material

5893
Kalle Koivuniemi,Nick Zacharov,
This paper describes the design of a Calibrated Source (CalSo) loudspeaker created for audio prototyping of mobile phones. CalSo is used in conjunction with a PC application that enables the use of virtual audio prototypes to enhance audio content creation for mobile phones. CalSo provides a means to calibrate the output of a PC through the loudspeaker itself or through a pair of headphones. CalSo was primarily developed to ensure that the auralized audio output of a virtual audio prototype on the user's PC accurately reproduces the audio output of the real device.
A Calibrated Source for Virtual Audio Prototyping

5894
David Isherwood,Gaetan Lorho,Nick Zacharov,Ville-Veikko Mattila,
The generalized listener selection (GLS) procedure defines criteria for the efficient creation of a permanent listening panel. This paper describes the application of this procedure to a large group of candidate listeners (>300), from which a permanent panel of 30 listeners was to be formed. The criteria presented in the original publication are augmented to make the procedure faster and more sensitive to the acuity of specific auditory percepts. The benefits of the procedure are verified by comparing the listening test results of the final GLS panel with those of a panel of randomly chosen listeners.
Augmentation, Application and Verification of the Generalized Listener Selection Procedure

5895
William L. Martens,Charith N. W. Giragama,Susantha Herath,Dishna R. Wanasinghe,Alam M. Sabbir,
A single, common timbre space for a small set of processed guitar sounds was derived for four groups of listeners. The groups consisted of native speakers of English, Japanese, Bengali (a language of Bangladesh) and Sinhala (a language of Sri Lanka), respectively. Members of these four groups also rated the same set of sounds on 10 bipolar adjective scales, each group using anchoring adjectives taken from its native language. The two primary dimensions underlying perception of the guitar timbres were common to the four groups. However, the way directly translated adjectives were used to describe the sounds generally differed between the groups, and these differences were quantified via principal components analysis. Nonetheless, the two most closely related languages, Bengali and Sinhala (both Indo-Aryan languages), showed much more semantic similarity to each other than Japanese did to any of the three other languages examined.
Relating Multilingual Semantic Scales to a Common Timbre Space - Part II

5896
Scott G. Norcross,Gilbert A. Soulodre,
There are numerous applications where it is desirable to have an objective measure of the perceived loudness of typical audio signals. For example, in broadcast applications an objective measure would allow the perceived loudness of the various program materials to be equalized. In the present study several potential objective loudness measures are examined. The objective measures are evaluated in their ability to predict the results of a database derived from a series of formal subjective tests. Possible metrics for rating the performance of the objective loudness measures are considered.
Objective Measures of Loudness

5897
Jim Brown,
It has been shown that a primary cause of VHF and UHF interference to professional condenser microphones is inadequate termination within the microphone of the shield of the microphone's output wiring, a fault commonly known as the pin 1 problem. Tests using only audio frequency test signals generally fail to expose susceptibility to radio frequency (RF) interference. Simple RF tests for pin 1 problems in microphones and other audio equipment are described that correlate well with EMI observed in the field.
Testing for Radio-Frequency Common Impedance Coupling (the "Pin 1 Problem") in Microphones and Other Audio Equipment

5898
Jim Brown,
It has been shown [1, 2] that radio frequency (RF) current flowing on the shield of balanced audio wiring will be converted to a differential signal on the balanced pair by a cable-related mechanism commonly known as Shield-Current-Induced Noise. This paper investigates the susceptibility of audio input and output circuits to differential signals in the 200 kHz - 2 MHz range, with some work extending to 300 MHz. Simple laboratory test methods are described, equipment is tested, and results are presented. Laboratory data are correlated with EMI observed in the field.
A Novel Method of Testing for Susceptibility of Audio Equipment to Interference from Medium and High Frequency Radio Transmitters

5899
Paul D. Henderson,
To fully describe the sound field at a listening position in an acoustical environment, the distribution of energy arriving from varying spatial directions is required, which this research accomplishes through the use of microphone array beamforming. Using the proposed technique, a multi-microphone impulse response measurement may be used to directionally locate any direct, reflected, or scattered energy arriving at the measurement location. This data may be graphically projected onto a sphere surrounding the measurement location or onto a virtual model of the measured environment, revealing the origin of the arriving energy. Preliminary measurements conducted at a prominent concert hall are presented, illustrating the analysis capability of the technique.
Directional Room Acoustics Measurement Using Large-Scale Microphone Arrays

5900
Charles Robinson,Steve Lyman,Jeffrey C. Riedmiller,
The broadcast, satellite and cable television industries have been plagued for years by the inability of personnel to accurately interpret, and thus consistently control, program loudness using traditional measurement devices and methods. As a result, most listeners feel compelled to adjust the volume controls of their televisions at home. A recent survey of channel-to-channel and/or program-to-program level discrepancies, together with subjective listening tests, confirms that the current practice is unacceptable to listeners. In this paper we describe loudness measurement techniques that improve accuracy, usability and consistency relative to previous techniques. Accuracy in this application is determined by correlation with listener opinion, with the particular goal of minimizing annoyance resulting from level mismatch. Usability is improved by minimizing the interaction required from the user. Consistency is achieved by minimizing the amount of meter interpretation required. The method has two keys. The first is providing a single numeric indication of loudness for a given program or segment. The second is isolating and measuring the portions of the program that are primarily speech, and using speech loudness as the basis for the overall program level, thereby improving listener satisfaction.
Intelligent Program Loudness Measurement and Control: What Satisfies Listeners?

5901
John Vanderkooy,Robert D. Stevens,
A method is presented, for measuring acoustical properties of materials, which the authors believe to be novel and unique. The method relies on an MLS (maximum length sequence) excitation signal to measure the acoustical impedance of a specimen of material placed at the termination of a long tube. Whereas the traditional methods require that measurements be made at multiple locations within the tube, either using a multi-channel data acquisition system, or by physically moving a single microphone from one location to the next, the novel method requires only a single measurement at one location in the tube, using a single microphone. The necessity to conduct only a single measurement makes this method two to four times faster than traditional methods, depending on the desired frequency range of the measurement. Other benefits of this method include the fact that it requires only a single channel data acquisition system and single microphone, and that it has the unique ability to measure the impedance of the source end of the tube (i.e., the loudspeaker) as well as the material specimen at the termination end of the tube.
A Novel Single-Microphone Method of Measuring Acoustical Impedance in a Tube
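The single-microphone method above relies on MLS excitation. The Python sketch below illustrates only that building block and does not reproduce the authors' impedance calculation. It generates a short MLS with a linear feedback shift register and checks the characteristic two-valued circular autocorrelation. It then recovers a short periodic impulse response by cross-correlation. The register order, tap positions and toy impulse response are illustrative assumptions.

```python
def mls(order=4, taps=(4, 3)):
    """Maximum length sequence of length 2**order - 1, mapped to +/-1.
    Fibonacci LFSR; the default taps give s[n] = s[n-3] XOR s[n-4],
    a primitive recurrence for order 4 (assumed tap choice)."""
    state = [1] * order                      # any nonzero seed works
    seq = []
    for _ in range(2 ** order - 1):
        seq.append(1.0 if state[-1] else -1.0)   # bit {0, 1} -> {-1, +1}
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return seq

def circular_xcorr(x, y):
    """r[k] = sum_i y[(i + k) mod N] * x[i]."""
    n = len(x)
    return [sum(y[(i + k) % n] * x[i] for i in range(n)) for k in range(n)]

def circular_convolve(x, h):
    """Steady-state (periodic) response of a short FIR system h to x."""
    n = len(x)
    return [sum(h[m] * x[(i - m) % n] for m in range(len(h))) for i in range(n)]

def recover_ir(x, y):
    """MLS deconvolution. The autocorrelation is N at lag 0 and -1 elsewhere,
    so r[k] = (N + 1) h[k] - sum(h). Summing r over k gives sum(h) exactly,
    which removes the usual small DC error."""
    n = len(x)
    r = circular_xcorr(x, y)
    s = sum(r)
    return [(rk + s) / (n + 1) for rk in r]

x = mls()
y = circular_convolve(x, [0.5, 0.25, -0.1])   # hypothetical system response
print([round(v, 6) for v in recover_ir(x, y)][:4])
```

Practical MLS systems use much longer sequences and compute the cross-correlation with a fast Hadamard transform. The O(N²) loops here are only for clarity.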

5902
Gyorgy Wersenyi,
Listening tests were carried out to investigate the localization judgments of 40 untrained subjects listening through equalized headphones with HRTF synthesis. The investigation was based on the earlier GUIB (Graphical User Interface for Blind Persons) project, in order to determine the possibilities of a 2D virtual sound screen with headphone playback. Results are presented on the capabilities of headphone playback and the values of typical headphone playback errors, as well as on the minimum, maximum and average values of discrimination skills. Special localization events, such as left-right and up-down symmetries and missing locations in vertical localization, are also discussed. The measurement method includes a special three-category forced-choice MAA report on a screen-like virtual auditory surface in front of the listeners. Test signals were presented with different spectra and movement. Conclusions are drawn, both for a GUIB application and for binaural synthesis, about the role of the fine structure of the applied HRTFs.
Localization in a HRTF-based Minimum Audible Angle Listening Test on a 2D Sound Screen for GUIB Applications

5903
Arvindh Krishnaswamy,
We discuss various tuning possibilities for the twelve basic musical intervals used in South Indian classical (Carnatic) music. Theoretical values proposed in certain well-known tuning systems are examined. Issues related to the intonation or tuning of the notes in Carnatic music are raised and discussed as well.
On the Twelve Basic Intervals in South Indian Classical Music
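Comparisons between tuning systems like the one above are usually expressed in cents, where 1200 cents make one octave. The short sketch below converts frequency ratios to cents and compares them with 12-tone equal temperament. The ratios listed are common 5-limit just-intonation values, chosen as an illustrative assumption. They are not the table of Carnatic intervals examined in the paper.

```python
import math

def cents(ratio: float) -> float:
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

# Illustrative just-intonation ratios (assumed, not taken from the paper)
JUST = {"minor third": 6 / 5, "major third": 5 / 4,
        "perfect fourth": 4 / 3, "perfect fifth": 3 / 2}
# The same intervals in 12-tone equal temperament, in semitones
EQUAL_SEMITONES = {"minor third": 3, "major third": 4,
                   "perfect fourth": 5, "perfect fifth": 7}

for name, ratio in JUST.items():
    tempered = 100 * EQUAL_SEMITONES[name]
    print(f"{name:15s} just {cents(ratio):8.2f} c   12-TET {tempered:4d} c   "
          f"difference {cents(ratio) - tempered:+6.2f} c")
```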

5904
Sylvain Choisel,Karin Zimmer,
A new laser-pointing technique providing visual feedback is presented and compared to a more traditional method, making a mark on a line. Broadband noise, speech and a musical instrument served as sound stimuli. The task was to localize frontal sources (±30 degrees), both real and amplitude-panned, in a standard listening room. For this task, the new method is shown to be more intuitive and precise, allowing a higher consistency of responses both within and across subjects. Furthermore, the lateral displacement of the sources is overestimated with both response techniques, and this inaccuracy is significantly smaller when the laser pointer is used. In a second experiment, the influence of head orientation on pointing performance towards sounds varying in frequency content is investigated. The results show that responses are not affected by moving the head towards a physical sound source, but are highly sensitive to head movements when sources are panned.
A Pointing Technique with Visual Feedback for Sound Source Localization Experiments

5905
Bruno M. Fazenda,William J. Davies,Mark R. Avis,
A subjective test study was carried out to identify the perceptibility of changes in the Q-factor of room modes. The experimental technique concentrates on identifying difference limens for three levels of Q-factor, corresponding to modes in rooms used for critical listening. Trends show that changes in higher Q values are more perceptible than changes in lower Q values. The results may be applied in decisions on the treatment of modes in common listening and control rooms.
Difference Limen for the Q factor of Room Modes
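A common single-mode approximation relates modal Q to measurable quantities. The -3 dB bandwidth is f0/Q. The amplitude decays at the rate δ = πf0/Q, so the 60 dB decay time is T60 = 3 ln(10) Q/(πf0), which is about 2.2 Q/f0. The sketch below assumes an isolated mode with purely exponential decay. The 50 Hz example and the Q values are hypothetical and are not the Q levels tested in the paper.

```python
import math

def mode_bandwidth(f0_hz: float, q: float) -> float:
    """-3 dB bandwidth of a resonance, from Q = f0 / bandwidth."""
    return f0_hz / q

def mode_t60(f0_hz: float, q: float) -> float:
    """60 dB decay time of a single, exponentially decaying mode.
    Amplitude decays as exp(-delta * t) with delta = pi * f0 / Q, so
    T60 = 3 ln(10) / delta, which is roughly 2.2 * Q / f0."""
    return 3.0 * math.log(10.0) * q / (math.pi * f0_hz)

for q in (4, 8, 16):
    print(f"f0 = 50 Hz, Q = {q:2d}: bandwidth {mode_bandwidth(50.0, q):5.2f} Hz, "
          f"T60 {mode_t60(50.0, q):.2f} s")
```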

5906
Han-gil Moon,Jungmin Park,Koeng-mo Sung,Dae-young Jang,
To provide distance information about a sound source in a 3-D audio environment, we need to know which distance cues are effective and how to handle them properly. Through our experiments and research, we found that C80 and EDT (early decay time) change systematically with source-receiver distance. This result needs to be validated both physically and psychologically. This paper therefore gives a physical explanation of the systematic changes in the two parameters, and presents psychological tests with artificially controlled curves that imitate different distances. With these validations, we aim to show the effect of early decay time on auditory depth.
The Effects of Early Decay Time on Auditory Depth in the Virtual Audio Environment
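Both cues mentioned above can be computed from a squared impulse response. C80 is the ratio, in dB, of the energy arriving in the first 80 ms to the energy arriving later. EDT is six times the time taken by the Schroeder backward-integrated decay curve to fall by 10 dB. The sketch below applies both to a hypothetical, ideally exponential decay. One simplification is assumed: it takes the -10 dB crossing point, whereas ISO 3382 uses a regression over the 0 to -10 dB range.

```python
import math

def exponential_energy_decay(t60_s=1.0, fs=8000, length_s=2.0):
    """Hypothetical test signal: squared impulse response of an ideal
    exponential decay, with t = 0 at the direct sound."""
    delta = 3.0 * math.log(10.0) / t60_s            # amplitude decay rate, 1/s
    return [math.exp(-2.0 * delta * n / fs) for n in range(int(length_s * fs))]

def clarity_c80(energy, fs):
    """C80 in dB: energy before 80 ms divided by energy after 80 ms."""
    split = round(0.080 * fs)
    return 10.0 * math.log10(sum(energy[:split]) / sum(energy[split:]))

def early_decay_time(energy, fs):
    """EDT in seconds: 6 x the time for the Schroeder integral to drop 10 dB
    (crossing-point simplification, not an ISO 3382 regression)."""
    total = sum(energy)
    remaining = total                                # energy from sample n onward
    for n, e in enumerate(energy):
        if 10.0 * math.log10(remaining / total) <= -10.0:
            return 6.0 * n / fs
        remaining -= e
    raise ValueError("decay range is shorter than 10 dB")

energy = exponential_energy_decay()
print(f"C80 = {clarity_c80(energy, 8000):.2f} dB, "
      f"EDT = {early_decay_time(energy, 8000):.3f} s")
```

For this ideal 1 s decay, C80 is about 3.05 dB and EDT is about 1 s. Shortening the synthetic T60, as a crude stand-in for a closer source, raises C80 and lowers EDT.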

5907
Natanya Ford,Francis Rumsey,Tim Nind,
A Graphical Assessment Language (GAL) appears to provide the listener with a medium for describing the perceived spatial attributes of a reproduced audio event. Previous language development investigations have concluded that these spatial characteristics may be represented consistently by listeners using their own graphical descriptors. However, a subsequent study highlighted how easily a researcher could misinterpret these individual descriptors. This is a notable problem, since a primary aim of the GAL is to maintain the validity of the listener's original experience. To reduce potential ambiguities in interpretation, this investigation considers the development of a common descriptive language. It consolidates listeners' individual descriptors into a universal set of graphical terms identified as effective for describing the experiences of all participants in the investigation. The process and outcome of creating a "universal" GAL are described.
Creating a Universal Graphical Assessment Language for Describing and Evaluating Spatial Attributes of Reproduced Audio Events

5908
Brandon Cochenour,David A. Rich,
Higher-order notched networks more consistently retain a desired all-pass response of loudspeakers. However, their non-coincident drivers cause deep spectral notches to occur intermittently in the crossover region. Large phase shifts are also introduced in the loudspeaker's transfer response. We propose a real-time listening test to evaluate the sonic impact of the deep spectral nulls and phase shifts on the overall listening experience. The test does not involve the design of real loudspeakers or the modification of a loudspeaker's sound in a listening environment. A speaker-system simulation program has been developed using Matlab. The program processes WAV files of music clips with a virtual loudspeaker model. The model incorporates real crossover networks, offset delays, any compensation network and raw driver frequency response characteristics. The ABX double-blind testing methodology is applied to determine the audibility of the virtual loudspeaker model under test. This approach isolates audible effects and makes them more apparent to the listener, because it eliminates other effects that might mask the changes caused by the features under study. We expect that the software can serve as a generalized template to examine other phenomena.
A Virtual Loudspeaker Model to Enable Real-Time Listening Tests to Examine the Audibility of High-Order Crossover Networks

5909
Andrew Bright,
This paper explains how, if the total moving mass of a loudspeaker is known in advance, all of the remaining basic linear loudspeaker parameters can be determined using only current feedback. This is explained at both a theoretical and a practical level. Frequency- and time-domain algorithms for tracking the parameters are presented. Examples are shown of the tracking performance of an adaptive algorithm operating with a real music signal on an actual loudspeaker.
Tracking Changes In Linear Loudspeaker Parameters with Current Feedback

5910
Rosalfonso Bortoni,Sidnei Noceti Filho,Rui Seara,
The Thiele-Small method for speaker design considers the linear loudspeaker model driven by voltage sources and operating in a small-signal environment. Subsequent studies have introduced into the model some nonlinear characteristics that arise in large-signal operation. This paper presents a comparative analysis of the sound pressure level and cone displacement of loudspeaker systems in an infinite baffle, a closed box, a vented box and a band-pass enclosure. The systems are driven by voltage and current sources, under small and large signals. The nonlinearities of the voice coil, force factor and compliance of the loudspeaker are taken into account.
Comparative Analysis of Moving-Coil Loudspeakers Driven by Voltage and Current Sources

5911
Rosalfonso Bortoni,Homero Sette Silva,
This work presents electrical models for loudspeakers installed on baffles and in enclosures. The enclosures include closed-box, bass-reflex, and 4th- and 6th-order band-pass types. Passive crossovers from two-way to three-way are used, and their impedance curves were derived from MATLAB® simulations. The impedance curves, in magnitude and phase, are presented for each of the models cited above. The transfer functions are also presented. The considerations needed to obtain these results from the loudspeaker specifications, dimensions and box tuning are described as well. Examples of the resulting efforts (stresses) on the output stages of audio power amplifiers are presented and discussed.
Loudspeakers’ Electric Models for Study of the Efforts in Audio Power Amplifiers

5912
Alexander Voishvillo,
The compression driver has always been and remains an essential component of sound reinforcement systems despite its unavoidable nonlinear distortion. This distortion is caused by various effects, including adiabatic compression of air in the front chamber, modulation of the chamber’s air stiffness, and modulation of the chamber’s viscous losses. Each of these sources of distortion is inherent in the compression driver’s operation, each adversely affects the compression driver’s performance in its own specific way, and each is characterized by a different nonlinear “signature.” Comparative analysis of these distortion sources is undertaken. Nonlinear and parametric effects are explicitly expressed. The influence of diaphragm displacement, compression ratio, and chamber sound pressure on the generation of intermodulation and harmonic distortion is explored. Some design recommendations are given.
Nonlinear Versus Parametric Effects in Compression Drivers

5913
Wolfgang Klippel,
A new technique for measuring nonlinear distortion in transducers is presented which considers a priori information from transducer modeling. Transducers are single input-multiple output systems (SIMO) where the dominant nonlinearities can be concentrated in a single source adding nonlinear distortion to the input signal. The equivalent input distortion at this source can easily be derived from the measured sound pressure signal by performing a filtering with the inverse transfer response prior to the spectral analysis. This technique reduces the influence of the acoustical environment (room), removes redundant information and simplifies the interpretation. It is also the basis for speeding up distortion measurements, for the prediction of distortion in the sound field and for the detection of noise and other disturbances not generated by the transducer.
Measurement of Equivalent Input Distortion

5914
Jonathan S. Abel,David P. Berners,
Differential equations governing the behavior of first-order peak-detecting and RMS feedback and feedforward analog compressors are presented. Based on these equations, the relationship between feedback and feedforward compressor behavior is explored, and simple, accurate digital emulations are provided. Feedback and feedforward gain reduction trajectories are shown to be equivalent by transforming the feedback gain reduction into a feedforward gain reduction having a level-dependent time constant. This time constant has the effect of slowing down the transition into and out of compression, and accounts for much of the difference in compression character between the two architectures.
On Peak-Detecting and RMS Feedback and Feedforward Compressors
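For orientation, the sketch below shows a generic digital feedforward compressor. It uses a first-order peak detector with separate attack and release coefficients and a static threshold/ratio gain computer in dB. This is a common textbook structure assumed for illustration. It is not the emulation derived in the paper, and the feedback and RMS variants analysed there are not shown. The default parameter values are arbitrary.

```python
import math

def feedforward_compressor(x, fs, threshold_db=-20.0, ratio=4.0,
                           attack_s=0.005, release_s=0.100):
    """Feedforward compressor with a first-order peak-detecting envelope.
    The detector tracks |x| with a one-pole smoother whose coefficient
    switches between attack (level rising) and release (level falling)."""
    a_att = math.exp(-1.0 / (attack_s * fs))
    a_rel = math.exp(-1.0 / (release_s * fs))
    env = 0.0
    y = []
    for s in x:
        level = abs(s)
        coeff = a_att if level > env else a_rel
        env = coeff * env + (1.0 - coeff) * level          # peak detector
        env_db = 20.0 * math.log10(max(env, 1e-12))
        over_db = env_db - threshold_db
        # Above threshold, the output rises only 1/ratio dB per input dB
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0.0 else 0.0
        y.append(s * 10.0 ** (gain_db / 20.0))
    return y
```

Because the gain is computed from the input, the time constants act directly on the detected level. In a feedback design, the detector sees the already-compressed output instead.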

5915
Stephen H. Lampen,
Digital audio requires cables made to a specific impedance: 110 ohms for twisted pairs and 75 ohms for coaxial cable. But what happens when cables have the wrong impedance, or are damaged or otherwise have their impedance altered? Changes in impedance can affect the signal traveling down a cable and cause a portion of the signal to be reflected back toward the source. Return loss is the ratio of incident to reflected power, expressed in dB. This paper shows how and when such reflections occur, how return loss is measured, and how it affects digital audio systems. A return loss specification is suggested as a possible addition to the AES specifications for both equipment and cable.
Return Loss and Digital Audio
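The basic relation behind return loss is the reflection coefficient at an impedance discontinuity, Γ = (Z_L - Z_0)/(Z_L + Z_0), with return loss = -20 log10|Γ| dB. The sketch below assumes purely resistive impedances. Real cables have complex, frequency-dependent impedances, which is one reason the paper treats measurement in detail. The two example mismatches are hypothetical.

```python
import math

def reflection_coefficient(z_load: float, z0: float) -> float:
    """Voltage reflection coefficient for a purely resistive mismatch."""
    return (z_load - z0) / (z_load + z0)

def return_loss_db(z_load: float, z0: float) -> float:
    """Return loss in dB (higher is better); infinite for a perfect match."""
    gamma = abs(reflection_coefficient(z_load, z0))
    return math.inf if gamma == 0.0 else -20.0 * math.log10(gamma)

# Hypothetical cases: a 100-ohm section in a 110-ohm line,
# and a 75-ohm coaxial cable terminated in a 110-ohm input
for z_load, z0 in ((100.0, 110.0), (110.0, 75.0)):
    print(f"Z0 = {z0:5.1f} ohm, ZL = {z_load:5.1f} ohm: "
          f"|Gamma| = {abs(reflection_coefficient(z_load, z0)):.3f}, "
          f"return loss = {return_loss_db(z_load, z0):.1f} dB")
```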

5916
Knud Bank Christensen,
An efficient implementation of parametric equalizers is a cascade of biquadratic filter sections. Traditionally, a section operates in one of 3 operating modes, offering Low Shelf, Bell Shaped or High Shelf families of responses. In any mode, each section offers 3 user parameters: boost/cut gain G, corner frequency fc and bandwidth/slope Q. This paper describes the derivation and implementation of a new 4th user parameter, Symmetry. Symmetry produces a smooth transition between the 3 operating modes above while retaining the meaning of the 3 conventional user parameters. All degrees of freedom in the biquadratic filter blocks are utilized. Therefore an inverse mapping exists that converts arbitrary (e.g., computer-generated) biquadratic coefficient sets back into meaningful EQ parameters, allowing further human interpretation and adjustment.
A Generalization of the Biquadratic Parametric Equalizer
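The sketch below shows the conventional three-parameter bell section that the paper generalizes. It uses the widely published RBJ "Audio EQ Cookbook" peaking-filter formulas, which are swapped in as a standard design. It is not Christensen's Symmetry parameterization. It also shows why a cascade is convenient: the magnitude responses of the sections multiply, so their dB values add. The sample rate and settings are arbitrary.

```python
import cmath
import math

def peaking_biquad(gain_db, fc_hz, q, fs_hz):
    """Bell section (RBJ cookbook form). Returns (b, a), normalized so a[0] == 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * fc_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * A, -2.0 * math.cos(w0), 1.0 - alpha * A]
    a = [1.0 + alpha / A, -2.0 * math.cos(w0), 1.0 - alpha / A]
    return [bi / a[0] for bi in b], [ai / a[0] for ai in a]

def cascade_response_db(sections, f_hz, fs_hz):
    """Magnitude of a cascade of biquads at f_hz, in dB (section dB values add)."""
    z_inv = cmath.exp(-2j * math.pi * f_hz / fs_hz)
    total_db = 0.0
    for b, a in sections:
        num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
        den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
        total_db += 20.0 * math.log10(abs(num / den))
    return total_db

fs = 48000.0
eq = [peaking_biquad(6.0, 1000.0, 1.4, fs), peaking_biquad(-3.0, 5000.0, 2.0, fs)]
for f in (100.0, 1000.0, 5000.0):
    print(f"{f:7.0f} Hz: {cascade_response_db(eq, f, fs):+6.2f} dB")
```

At its centre frequency a cookbook bell section has a gain of exactly G dB, and at DC its gain is exactly 0 dB.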

5917
Suthikshn Kumar,
This paper reviews various schemes for Smart Acoustic Volume Controllers (SVCs) for consumer electronics such as televisions, car stereo systems, CD players, AM/FM radios, telephones and mobile phones. Smart volume controllers are already applied in several consumer electronic devices. In car stereo systems, the volume level automatically dips when the system detects the telephone ringing. A mobile phone with a smart volume controller provides improved speech quality even in the presence of high background noise levels. The smart volume controller for television evens out variations in volume level from channel to channel and from program to program, delivering an improvement in perceived quality to the consumer. Smart volume controllers can be personalized to suit the individual's hearing requirements. This paper also reviews several soft computing techniques for smart volume control, such as fuzzy logic control, neural control and neuro-fuzzy control.
A Review of Smart Acoustic Volume Controllers for Consumer Electronics

5918
Sin-lyul Lee,Lae-Hoon Kim,Koeng-Mo Sung,
The non-individualized head-related transfer function (HRTF) is known to have a few problems, referred to as the ‘hole in the middle’ phenomenon and ‘front-back reversals’. An HRTF refinement technique was introduced to overcome these problems. Unfortunately, it exaggerates the notch frequency, which causes sudden degradation in sound quality and makes cross-talk cancellation difficult. In this paper, an HRTF refinement using a directional weighting function is proposed to solve these problems. The proposed technique weights the ordinary HRTF according to its direction to amplify frontal sound intensity. As a result, spectral differences in the ‘cone-of-confusion’ region become more pronounced across the audible frequency range without exaggerating the notch frequency. This function also makes the cross-talk cancellation filter easier to design. Listening tests verified that the proposed technique is superior to the previous HRTF refinement in terms of both sound localization and sound quality. HRTF refinement with the directional weighting function can therefore be applied in virtual reality, 3D entertainment and auralization programs that require high-quality sound.
Head Related Transfer Function Refinement Using Directional Weighting Function

5919
Bruce Duewer,Heling Yi,John Melanson,
A robust new architecture for multi-bit delta-sigma data converters is presented. The second order mismatch shaping function is moved inside the feedback loop of a high order modulator, replacing the need for dynamic element matching (DEM) after the modulator. The mismatch shaper makes a trade-off between ensemble quantization error and mismatch induced error. Mismatch error in the frequencies of interest is decreased, and the resulting additional quantization error is dealt with by the delta-sigma feedback loop. This approach allows good non-tonal noise shaping performance even in the face of severe element mismatch. The modulator reliably collapses to second order to maintain stability if faced with particularly high noise energy. The modulator also has integrated SACD processing.
A Multi-bit Delta-Sigma DAC with Mismatch Shaping in the Feedback Loop

5920
Robert Peruzzi,Marvin White,David A. Rich,John A. Nestor,Erik Geissenhainer,Matthew Johnston,
A low-power audio amplifier with pulse width modulated power supply rails that track the output signal is presented. Because of the tracking power supply rails, the voltage drop over the power transistors is kept as low as possible and nearly constant, so that power efficiency remains high for low as well as high output level signals. A very simple digital input pulse width modulation scheme provides four power rails to a fully differential class-AB power amplifier. The simplicity of the circuit makes it an attractive solution for low cost portable audio applications, instead of using a more complex pulse width modulated class-D audio amplifier. An efficiency increase of about 10% has been simulated over the same class-AB output stage using fixed DC rails of 3 Volts and 0 Volts, with very little sacrifice in THD. Also presented are results from a 12-Volt, single-ended hardware prototype of the system.
An Efficient Low-Power Audio Amplifier with Power Supply Rails Tracking the Output by Means of Pulse Width Modulation

5921
Ronald M. Aarts,Erik Larsen,Okke Ouweltjes,
Extending the bandwidth of an audio signal may be useful at the low or high end of the frequency spectrum, depending on the application. Also, the actual bandwidth extension algorithm may rely entirely on psychoacoustic effects or may create a physical extension of the signal spectrum. We have developed a common framework for all these problems, and from this framework derived algorithms that address diverse applications in audio signal processing for bandwidth extension. Specifically, we describe algorithms for bandwidth extension applied to enhancing reproduction of bandlimited signals (at the low or high end of the frequency spectrum), and for enhancing reproduction over small loudspeakers.
A Unified Approach to Low- and High-frequency Bandwidth Extension

5922
Alberto Bellini,
Thanks to their intrinsically high efficiency, switching audio power amplifiers are becoming widespread in applications where heating and cost are major concerns. Several topologies exist to reduce their distortion, and many of them rely on digital signal processors (DSPs). The market now offers several products with integrated peripherals for analog signal sampling and PWM generation. However, PWM generation is still a time-consuming task, performed with a number of different approaches. The paper presents a method for efficiently computing the PWM signal corresponding to the input audio signal, suitable to feed a switching output power stage. The method exploits DSP capabilities and is oriented to DSPs that integrate PWM generators. Its distinctive characteristic is the ability to perform an N-tap FIR operation on the audio signal together with the computation of a suitable PWM signal, using N MAC operations per sample.
Embedded Digital Filters for PWM Generators

5923
Scott G. Norcross,Gilbert A. Soulodre,Michel C. Lavoie,
Previous work has shown that inverse filtering can degrade the subjective quality of audio signals in certain conditions. Minimum phase inversion and regularization applied separately have also been studied and can be effective in some cases, but neither technique has proven to be robust. In this paper, further methods involving various regularization methods applied to the full and minimum phase part of the impulse response (IR) are studied. Subjective tests were conducted in accordance with the MUSHRA method to evaluate the performance of the various inversion methods. The results of the subjective tests were also used to determine the effectiveness of the ITU-R PEAQ objective test model as a potential tool in the development and evaluation of inverse filtering techniques.
Further Investigations of Inverse Filtering
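One widely used form of regularization is Tikhonov-style regularization in the frequency domain. The sketch below illustrates it, as an assumption, without claiming it is one of the specific methods compared in the paper. The inverse spectrum is conj(H)/(|H|² + β). It approaches 1/H where |H| is large and stays bounded, at most 1/(2√β), near spectral nulls. The naive DFT keeps the example dependency-free. A practical implementation would use an FFT and usually a modelling delay. With β = 0 the inverse is unregularized and fails if H has an exact zero.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse includes the 1/N scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def regularized_inverse_spectrum(h, n_fft, beta):
    """G[k] = conj(H[k]) / (|H[k]|^2 + beta)."""
    H = dft(list(h) + [0.0] * (n_fft - len(h)))
    return [Hk.conjugate() / (abs(Hk) ** 2 + beta) for Hk in H]

def regularized_inverse(h, n_fft, beta):
    """Real, circular (length n_fft) inverse filter for a real impulse response h."""
    return [g.real for g in dft(regularized_inverse_spectrum(h, n_fft, beta), inverse=True)]

def circular_convolve(x, y):
    n = len(x)
    return [sum(x[m] * y[(k - m) % n] for m in range(n)) for k in range(n)]

h = [1.0, 0.5]                                  # hypothetical impulse response
g = regularized_inverse(h, 16, beta=0.0)
print([round(v, 6) for v in circular_convolve(h + [0.0] * 14, g)][:4])
```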

5924
Albertus C. den Brinker,Felip Riera-Palou,
Linear prediction (LP) has traditionally been used in speech coding. Recently, variants of LP have also been shown to be appropriate for audio coding. In this paper we introduce a new prediction scheme, called pure linear prediction (PLP), which combines important features of previous approaches. We show that the modelling capability of PLP can be tuned in a psychoacoustically relevant way, making it suitable for speech and audio coding. Moreover, under certain restrictions, this new scheme is directly realizable and stable, and it retains the whitening property of conventional linear prediction. The processing of the prediction coefficients to perform operations such as quantisation, interpolation and spectral broadening is also addressed. As an example of the application of PLP, we describe its use in the context of the sinusoidal coder proposed by Philips, which is being standardised in "MPEG-4 Extension 2".
Pure Linear Prediction
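For reference, the conventional linear prediction that PLP is compared against can be computed with the autocorrelation method and the Levinson-Durbin recursion. The sketch below shows only that standard baseline and does not implement PLP. The predictor convention x[n] ≈ Σ a[k] x[n-k] and the test signal are assumptions made for illustration.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] (autocorrelation method)."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve sum_j a[j] r[|i - j|] = r[i] for i = 1..order.
    Returns (predictor coefficients a[1..order], final prediction error)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k                      # error shrinks at every stage
    return a[1:], err

x = [1.0, 2.0, 3.0, 2.0, 1.0, -1.0, 0.5]        # hypothetical signal frame
coeffs, error = levinson_durbin(autocorrelation(x, 3), 3)
print([round(c, 4) for c in coeffs], round(error, 4))
```

For an exactly first-order autocorrelation such as r = [1, 0.9, 0.81, 0.729], the recursion returns the coefficients [0.9, 0, 0] and a prediction error of 0.19.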

5925
Peter Kassakian,David A. Wessel,
A fundamental goal of sound synthesis is to reproduce, and to control, as many facets of the sound as possible. By numerically solving a carefully constructed optimization problem, we are able to design low-order filters for use with a dodecahedral loudspeaker array to synthesize low order spherical harmonics over specified frequency ranges. The method, a variant of least-squares, is general, allowing for the inclusion of side constraints, arbitrary array geometry, and incorporation of measured loudspeaker characteristics. We compare the predicted loudspeaker array performance with high-resolution measurements of the physical system.
Design of Low-Order Filters for Radiation Synthesis

5926
Sidnei Noceti Filho,Homero Sette Silva,Andre Luis Dalcastagne,
This paper describes the design of three filters used in the filtering of pink noise which are described in the new Brazilian standard proposed to replace the current NBR 10303. The NBR 10303 filter does not permit the correct test of subwoofers because its low cut-off frequency is too high and for this reason we changed its component values in order to obtain three new filters with lower low cut-off frequencies. The design of these filters was carried out through a numerical method based on a modification of the NBR 10303 filter frequency response magnitude. The final component values were specified according to commercial values.
A Numerical Method to Modify the NBR 10303 Filter Frequency Response

5927
Wolfgang Ahnert,Stefan Feistel,Waldemar Richert,Steven McManus,
The processing power now available on portable computer platforms is so advanced that dedicated digital signal processing platforms are no longer necessary for the intensive analysis required in Time Delay Spectrometry (TDS). Implementing TDS in measurement and post-processing software makes it possible to move the processing from a dedicated platform onto a standard personal computer. It also makes it possible to extract more information from a measurement after it has been made. Care must still be taken in choosing the data acquisition system, as the timing between input and output data samples is critical.
Time Delay Spectrometry Processing Using Standard Hardware Platforms

5928
James A. Angus,Tim Jackson,
This paper shows the design and implementation of lossless audio signal processing using a Finite Field Transform; in particular, Complex Mersenne Transforms are developed. Finite field signal processing techniques are described, and the effects of filter length and coefficient accuracy are discussed. Finite field transform algorithms suitable for lossless signal processing are presented. The paper concludes with examples of lossless processing.
Lossless Signal Processing with Complex Mersenne Transforms
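
The abstract does not define the transform, so the following is only a hypothetical sketch of the general idea behind a Complex Mersenne Transform, not the authors' algorithm. It assumes arithmetic on Gaussian integers modulo the Mersenne prime P = 2^31 - 1. Because P ≡ 3 (mod 4), these form the finite field GF(P^2), whose multiplicative group of order (P - 1)(P + 1) = 2^32 (2^30 - 1) contains power-of-two roots of unity that GF(P) alone (order 2(2^30 - 1)) lacks. Every operation is exact integer arithmetic, so a convolution computed through the transform is bit-exact, i.e. lossless. All function names are illustrative, and a direct O(N^2) transform is used for clarity instead of a fast algorithm.

```python
# Hypothetical sketch: exact circular convolution through a transform over GF(P^2),
# with P = 2^31 - 1 (a Mersenne prime). Not the paper's implementation.
P = (1 << 31) - 1  # P % 4 == 3, so Gaussian integers modulo P form the field GF(P^2)


def cmul(a, b):
    """Multiply two Gaussian integers (re, im) modulo P."""
    return ((a[0] * b[0] - a[1] * b[1]) % P, (a[0] * b[1] + a[1] * b[0]) % P)


def cadd(a, b):
    """Add two Gaussian integers modulo P."""
    return ((a[0] + b[0]) % P, (a[1] + b[1]) % P)


def cpow(a, e):
    """Square-and-multiply exponentiation in GF(P^2)."""
    result = (1, 0)
    while e:
        if e & 1:
            result = cmul(result, a)
        a = cmul(a, a)
        e >>= 1
    return result


def root_of_unity(n):
    """Return an element of order exactly n (n a power of two, 2 <= n <= 2**32)."""
    group_order = P * P - 1  # = 2**32 * (2**30 - 1)
    for re in range(1, 64):
        for im in range(1, 64):
            w = cpow((re, im), group_order // n)
            if cpow(w, n // 2) != (1, 0):  # order divides n but not n/2, so it is exactly n
                return w
    raise ValueError("no root of unity found")


def transform(x, w):
    """Direct transform X[k] = sum_j x[j] * w^(j*k), computed in GF(P^2)."""
    out = []
    for k in range(len(x)):
        wk = cpow(w, k)
        acc, term = (0, 0), (1, 0)
        for xj in x:
            acc = cadd(acc, cmul(xj, term))
            term = cmul(term, wk)
        out.append(acc)
    return out


def circular_convolve(x, h):
    """Exact circular convolution of equal-length integer lists (length a power of two >= 2)."""
    n = len(x)
    w = root_of_unity(n)
    w_inv = cpow(w, n - 1)    # w^(-1), because w^n = 1
    n_inv = pow(n, P - 2, P)  # n^(-1) mod P by Fermat's little theorem
    X = transform([(v % P, 0) for v in x], w)
    H = transform([(v % P, 0) for v in h], w)
    y = transform([cmul(a, b) for a, b in zip(X, H)], w_inv)
    # Scale by 1/n and map residues back to signed integers (exact while |y| < P/2).
    result = []
    for re, _ in y:
        v = (re * n_inv) % P
        result.append(v - P if v > P // 2 else v)
    return result


print(circular_convolve([1, 2, 3, 0], [1, -1, 0, 0]))  # [1, 1, 1, -3]
```

The result is exact only while the true convolution values lie within (-P/2, P/2); longer filters or wider samples would need a larger modulus.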

5929
Clemens Kuhn,Renato Pellegrini,Etienne Corteel,Dieter Leckschat,
The reproduction of sound using Wave Field Synthesis (WFS) offers greater possibilities for rendering sonic space than standard 5.1 set-ups (panorama, acoustic holography, envelopment…) [Berkhout, 1988], [Verheijen, 1997]. Different microphone set-ups have been developed for this reproduction system, including multi-microphone set-ups and microphone arrays [Hulsebos et al., 2002]. With multi-microphone techniques, the aesthetics of the sonic image are limited with regard to the localization of sound sources, spectrum and blending, especially in so-called “classical” music recordings. The sources appear rather focused and dominant because the microphones capture the sound in the near field. In theory, reproducing the sources as point sources convolved with the room impulse responses yields a correct room reproduction, but in practice the spectrum of the instruments and the impression of spatial depth require improvement. Microphone arrays are suitable for impulse response measurements but not flexible enough for direct music recording. The authors therefore propose an approach to miking and mixing music recordings that combines WFS techniques with phantom sources from a main microphone, adapting this stereophonic technique to the holographic properties of WFS. The approach was evaluated in an interactive mixing and listening test session in which a panel of sound engineers was invited to mix an orchestral recording. Several mixing tasks were specified (stable localization, blending, homogeneity, envelopment…). The results of this test allow analysis of both the aesthetic advantages and the limits of the proposed mixing approach.
An Approach to Miking and Mixing of Music Ensembles Using Wave Field Synthesis

5930
Magali Deschamps,Olivier Warusfel,Alexis Baskind,
Subjective listening tests on 5.1 multichannel reproduction were conducted using various recording and mixing configurations. Two ambience microphone arrays (“Hamasaki squares”) of different sizes were used to record the hall reverberation, in addition to direct sound microphones, providing a separation between direct and reverberant sound. To study how the size of the reverberation recording system could be optimized, the two systems were compared using a set of spatial subjective attributes. Post-processing parameters (time delay between direct sound and reverberation, front/back distribution of reverberation) were investigated using similar attributes. The results showed significant differences between the two “Hamasaki square” systems. The time delay parameter had little influence on listener envelopment and apparent source width, whereas the front/back distribution of reverberation had a significant effect on these attributes.
Investigation of Interactions between Recording/Mixing Parameters and Spatial Subjective Attributes in the Frame of 5.1 Multichannel

5931
Wieslaw Woszczyk,
It has been argued that high-resolution audio means an ultra-wide frequency range and that, given the limited sensitivity of human hearing at high frequencies, little is gained perceptually from high resolution. Little laboratory evidence exists to counter this assertion because psychoacoustic research has largely restricted itself to studying the effects of frequency range within 20 Hz-20 kHz rather than outside it. The paper reviews some of the available findings in this area and focuses on the remarkable complexity of auditory signals by examining the precise distinctions the auditory system must extract when analyzing the time/space attributes of auditory scenes. It is shown that high resolution in the temporal, spatial, spectral, and dynamic domains together determines the quality value of perceived music and sound, and that temporal resolution may be the most important domain perceptually.
Physical and Perceptual Considerations for High-resolution Audio

5932
Ingyu Chun,P. A. Nelson,
The performance of current virtual acoustic systems is highly sensitive to the geometry of the individual ear at high frequencies. The objective of this paper is to study a virtual acoustic system that may be insensitive to individual ear shape. The incident sound field around the ear is reproduced using a multichannel headphone. Computer simulations show that the desired sound pressure at the eardrum can be successfully replicated in a virtual acoustic environment by using a multichannel headphone.
Virtual Acoustic System with a Multichannel Headphone

5933
Malcolm O. J. Hawksford,Christoph P. A. Reller,
A preliminary study is presented that investigates processing of microphone array signals for multi-channel recording applications. Generalised, perceptually and acoustically based approaches are taken, where a spaced array of M microphones is mapped to an array of L loudspeakers. The principal objective is to establish transformation matrices that integrate both microphone and loudspeaker array geometry in order to reproduce a subjectively accurate illusion of the original soundfield. Techniques of acoustical vector synthesis and plane wave reconstruction are incorporated at low frequencies migrating to an approach based upon head-related transfer functions (HRTFs) at higher frequencies. Error surfaces based on the HRTF reconstruction error are used to assess perceptually relevant solutions. Simulation results presented in a 5-channel format are calculated for processed audio material with and without acoustical boundary reflections.
Perceptually Motivated Processing for Spatial Audio Microphone Arrays

5934
Robert E. (Robin) III Miller,
Objectives: Take the next step toward reproducing human hearing AND make better recordings in 5.1. In life, we hear the sources we see – but also reflections and reverberation we don't see. Each sonic arrival is individually direction-stamped, including height, by our unique HRTF and tonally colored by our pinnae. Preserving 3D directionality is key to life-like hearing. A practical, scalable approach (patent pending) is presented: a way to "transform" 3D (full-sphere) recordings for uncompromised 2D reproduction in stereo or 5.1/6.1 without any decoding. By adding a decoder and speakers, full 3D is losslessly “reconstituted” from 6-channel media. Experimental "tri-play" 6-channel "PerAmbio 3D/2D" recordings have been made, and the demonstrations presented (AES 114th, Amsterdam, 3/2003 and AES 24th, Banff, 6/2003) were well received.
Scalable Tri-play Recording for Stereo, ITU 5.1/6.1 2D, and Periphonic 3D (with Height) Compatible Surround Sound Reproduction

5935
Joshua D. Reiss,Mark Sandler,
In recent years there has been considerable debate over the suitability of 1-bit Sigma-Delta modulation (SDM) for high-quality applications. Much of the debate has centered on whether it is possible to properly dither such a system. It has been shown that dither with a triangular probability distribution should be applied to the quantizer input in a pulse code modulation system, but this is not the case for all A/D converters. We show that the dependence of the error moments on the input is inherently different in sigma-delta modulators, and that the effect of dither depends on whether the quantizer is one-bit or multibit. These statements are proven for simple SDMs and verified by simulation.
Dither and Noise Modulation in Sigma Delta Modulators
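
As a hypothetical illustration of the setting in which these questions arise (not the authors' analysis), the sketch below implements a first-order, 1-bit sigma-delta modulator with optional triangular-PDF (TPDF) dither added at the quantizer input. The function name, the loop structure, and the dither scaling (a sum of two uniform variables on [-0.5, 0.5], multiplied by `dither_amplitude`) are assumptions made for the example.

```python
import random


def first_order_sdm(x, dither_amplitude=0.0, seed=0):
    """1-bit first-order sigma-delta modulator with optional TPDF dither.

    dither_amplitude scales a triangular-PDF dither (sum of two uniform
    variables on [-0.5, 0.5]) added at the quantizer input; 0.0 disables it.
    """
    rng = random.Random(seed)
    integrator = 0.0
    previous_output = 0.0
    output = []
    for sample in x:
        # Integrate the difference between the input and the fed-back output bit.
        integrator += sample - previous_output
        dither = dither_amplitude * (rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5))
        bit = 1.0 if integrator + dither >= 0.0 else -1.0  # 1-bit quantizer
        output.append(bit)
        previous_output = bit
    return output


dc = [0.25] * 10000
undithered = first_order_sdm(dc)
dithered = first_order_sdm(dc, dither_amplitude=1.0, seed=1)
print(sum(undithered) / len(undithered), sum(dithered) / len(dithered))
```

With a constant input of 0.25 and no dither, the output settles into a repeating 8-sample pattern (five +1s and three -1s), a limit cycle whose average equals the input. Adding dither tends to break up this periodic pattern while the long-term average still tracks the input.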

5936
Joshua D. Reiss,Mark Sandler,Derk Reefman,Erwin Janssen,
We present a mathematical framework, based on state-space modelling, for the description of limit cycles of Sigma Delta Modulators (SDMs). Using a dynamical systems approach, we treat sigma delta modulators as piecewise linear maps. This enables us to find all possible limit cycles that might exist in an arbitrary sigma delta modulator with a predefined input. We then focus on DC inputs, analyse the stability of the resulting limit cycles, and show exactly how much dither is necessary to remove any given limit cycle. Using several different SDM designs, we locate and analyse the limit cycles and verify the results by simulation.
Stability Analysis of Limit Cycles in High Order Sigma Delta Modulators

5937
Robert C. Maher,
Table look-up (or wavetable) synthesis methods have been widely used for many years in music synthesizers. Recently wavetable music synthesis has received new interest in the context of mobile applications such as personalized ring tones and pager notification signals. The limited amount of storage and processing available in such mobile devices makes the use of compressed wavetable data desirable. In this paper several issues related to wavetable compression are described and an efficient compression/decompression method is proposed for reducing the size of wavetable data while maintaining loop continuity and timbral quality.
Compression and Decompression of Wavetable Synthesis Data

5938
Helen H. Tarn,Chris H. Dick,
As audio applications become more complex, it is increasingly difficult to support the required signal processing functions with traditional DSP microprocessors because of their limited arithmetic capacity, datapath precision, and input-output (I/O) bandwidth. An alternative approach based on Field Programmable Gate Array (FPGA) technology is proposed, which preserves the versatility and flexibility of DSP microprocessors while providing the advantages of a customizable datapath, including high degrees of computational and data parallelism, arbitrary-precision arithmetic, and enormous I/O capacity. This paper presents a case study in which reconfigurable logic technology is employed to implement infinite impulse response (IIR) filter banks. The study’s results show that thousands of second-order filters can be implemented on a single FPGA and demonstrate the potential of reconfigurable logic technology for audio signal processing.
Reconfigurable Logic for Audio Signal Processing

5939
Jonathan S. Abel,David P. Berners,
A method for the design of discrete-time second-order shelf filters is developed which allows the response of an analog resonant shelf filter to be approximated in the digital domain. For filters whose features approach the Nyquist limit, the proposed method provides a closer approximation to the analog response than direct application of the bilinear transform. Three types of resonant shelf filter are discussed, and design examples are presented.
Discrete-Time Shelf Filter Design for Analog Modeling

5940
Srikanth Gurrapu,Doug Roberson,
Recent advances in CMOS VLSI digital technology now make it practical to pack extensive high-performance audio processing into an ASIC. To fully reap the benefits of these semiconductor advances, choosing the right architecture for a given application is crucial. The key considerations are an architecture that provides high audio performance, rich features, flexibility to adapt to market demand, and highly efficient processing at low cost. The architecture is also designed to let OEM developers easily create custom filters and functional configuration settings for specific needs without any knowledge of DSP programming.
High-performance Configurable Fixed-point Audio Processor Design Considerations

5941
Sunil Bharitkar,Philip Hilmes,Chris Kyriakakis,
Traditionally, room response equalization is performed to improve sound quality at a given listener in applications ranging from automobiles and home theaters to movie theaters and multimedia education in classrooms. However, room responses vary with source and listener positions. Hence, in a multiple-listener environment, equalization may be performed by spatially averaging the magnitude responses at locations of interest (e.g., in movie theater equalization). However, the performance of averaging-based equalization at the listeners may be affected when listener positions change, or by a mismatch between microphone and listener positions (i.e., displacement effects). In this paper, we present a statistical approach that maps displacement effects to a performance metric for magnitude-response-averaging equalization. The results indicate that, for the analyzed listener configurations, the zone of equalization depends on (i) the distance of the microphones/listeners from the source, (ii) the listener arrangement, and (iii) the spectral composition of the source signal. We also provide an experimental validation of the theoretical results, which indicates the usefulness of the proposed closed-form expression for measuring equalization performance under displacement effects.
Sensitivity of Multichannel Room Equalization to Listener Position
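
A minimal sketch of magnitude-response averaging equalization, under stated assumptions: magnitudes are linear values on a shared frequency grid, the spatial average is an RMS (power) average across measurement positions, and the equalizer is simply the inverse of that average. The paper's statistical displacement model is not reproduced, and the function names are hypothetical. The residual at any listener shows how far that position deviates from flat after equalization, which is the quantity displacement effects degrade.

```python
import math


def spatial_average_eq(magnitudes):
    """Equalizer from the RMS spatial average of magnitude responses.

    magnitudes: one list of linear magnitudes per measurement position, all
    sampled at the same frequency bins. Returns the inverse of the average.
    """
    n_positions = len(magnitudes)
    n_bins = len(magnitudes[0])
    average = [
        math.sqrt(sum(m[k] ** 2 for m in magnitudes) / n_positions) for k in range(n_bins)
    ]
    return [1.0 / a for a in average]


def residual_db(magnitude, eq):
    """Equalized response at one listener, in dB relative to flat (0 dB)."""
    return [20.0 * math.log10(m * g) for m, g in zip(magnitude, eq)]


eq = spatial_average_eq([[1.0, 2.0], [1.0, 0.5]])
print(residual_db([1.0, 2.0], eq), residual_db([1.0, 0.5], eq))
```

In this toy case the first bin is flat at both positions, while in the second bin one position ends up above flat and the other below, since a single averaged equalizer cannot correct both.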

5942
Allan Devantier,Todd S. Welti,
At low frequencies the listening environment has a significant impact on the sound quality of the audio system. Standing waves within the room cause large frequency response variations at the listening location. Furthermore, the frequency response changes significantly from one listening location to another; therefore, the system cannot be effectively equalized. A novel method to reduce seat-to-seat frequency response variation is described that allows the system to be equalized over a large listening area using relatively simple signal processing.
In-Room Low Frequency Optimization

5943
Lae-Hoon Kim,Jae-Jin Jeon,Sin-lyul Lee,Koeng-Mo Sung,
For home theatre systems, it is necessary to equalize the room impulse responses. It is well known that perfect inverse filtering over the entire audio frequency range is difficult to achieve because of perturbations such as the listener’s head movement. To secure a larger ‘sweet region’, we measured impulse responses at 18 points spaced horizontally at regular 3 cm intervals around the positions of both ears. From these 18 impulse responses we synthesize one representative impulse response in a position-weighted manner using principal component analysis, and we use this representative response as the target of inverse filtering. Two different inverse filtering methods are applied in two frequency ranges: in the low-frequency range we realize a perfect inverse filter using the least-squares method, and in the high-frequency range we realize a linear-phase finite impulse response inverse filter using 1/3-octave-band smoothing of the frequency magnitude response. Experimental results confirm that this inverse filter yields well-equalized, high-quality sound over the entire sweet region.
Hybrid Equalization of Room for Home Theatre System
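
The abstract does not specify the weighting or the exact PCA formulation. One common reading, assumed here, is to weight each measured response, form the position-by-position correlation matrix, and take the combination of responses given by its dominant eigenvector (the first principal component), found below by power iteration. The `weights` argument and all names are hypothetical, and the overall scale (and, in general, the sign) of the result is arbitrary, which does not matter for an inverse-filtering target.

```python
import math


def representative_response(responses, weights=None, iterations=200):
    """Combine several impulse responses into one via the first principal component.

    responses: equal-length impulse responses (one per measurement point).
    weights:   optional per-position weights (hypothetical weighting scheme).
    Returns sum_i v_i * w_i * h_i, where v is the dominant eigenvector of the
    position-by-position correlation matrix of the weighted responses.
    """
    m = len(responses)
    weights = weights or [1.0] * m
    scaled = [[w * s for s in h] for w, h in zip(weights, responses)]
    # Correlation between the weighted responses at positions i and j.
    corr = [[sum(a * b for a, b in zip(scaled[i], scaled[j])) for j in range(m)]
            for i in range(m)]
    # Power iteration for the dominant eigenvector of the symmetric matrix.
    v = [1.0] * m
    for _ in range(iterations):
        nv = [sum(corr[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nv))
        v = [x / norm for x in nv]
    return [sum(v[i] * scaled[i][n] for i in range(m)) for n in range(len(responses[0]))]
```

If all measured responses are identical, the result is simply a scaled copy of that response; differing responses are blended according to how strongly each contributes to the dominant component.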

5944
Dirk Noy,Gabriel Hauser,John Storyk,
One of the major issues in small room acoustics is low-frequency response and behavior. The goal of this paper is to show how, and to what degree, different commercially available low-frequency absorbing devices control low frequencies, and how off-the-shelf computer-based simulation programs can be used to predict the low-frequency behavior of small rooms. Reproducible acoustical measurements were taken in a completely untreated rectangular concrete room, sequentially with and without each of eight different absorbing devices provided by eight different international manufacturers. Results are compared and conclusions are presented.
Low Frequency Absorbers - Applications and Comparisons

5945
Eddy B. Brixen,
In an increasing number of radio facilities, large open-plan office environments are used for audio production. Program material is prepared, edited, and finished ready for broadcast on DAWs without studio acoustics, studio microphones, level or loudness metering, loudspeaker monitoring, or even audio engineers. In many respects this affects the sound quality. In this paper the drawbacks are discussed and suggestions for technical solutions are presented.
Audio Production in Large Office Environments

5946
Andrew Eloff,Gary Kendall,Michael R. Honig,
Artificial reverberation continues to be a source of much research and development. Currently, there is a heavy emphasis on “auralization,” or the ability to simulate physical structures' sound characteristics using computer modeling. The diffuse component of early-order reflections has been acknowledged as an important component of reverb for several decades and has been implemented since the 1970s in varying forms. The current paper details a method of simulating diffuse reflections by use of fading models commonly used in wireless communications. The method is then considered as a stereo field enhancement effect and as a component of a reverberation system. Both are implemented in MATLAB and the results are discussed.
Diffuse Field Reverberation Modeled as a Flat Fading Channel

5947
Steven L. Harris,Jack Andersen,Daniel Chieng,
Digital input class D audio amplifiers will replace traditional analog types over the next 5 years. This paper describes a digital input class D amplifier controller integrated circuit (IC) which performs many of the functions needed to build a high performance class D audio amplifier module. A powerful Digital Signal Processor (DSP) is included allowing sophisticated modulation schemes, as well as additional audio signal processing. Possible processing functions include loudspeaker load compensation, equalization (EQ), time alignment, room acoustics compensation, howl prevention and other audio signal processing tasks. A novel clocking scheme decouples the input clock from the output switching clock, creating a highly jitter-tolerant design.
Intelligent Class D Amplifier Controller Integrated Circuit as an Ingredient Technology for Multi-Channel Amplifier Modules of Greater than 50 Watts/Channel

5948
Brett G. Crockett,
A method of using auditory scene analysis of audio signals in conjunction with time and pitch scaling is presented. In the method described, a multi-channel audio signal is analyzed and the location and duration of the individual audio signal components that correspond to distinct auditory scene elements are identified. The audio data is then time and/or pitch scaled in such a way that the separate audio signal components are processed individually, thereby greatly reducing audible artifacts inherent in time and pitch scaling processing.
High Quality Multi-channel Time-Scaling and Pitch-Shifting Using Auditory Scene Analysis

5949
Thomas Holm Hansen,Lars Risbo,
A novel digital-domain adaptive calibration technique is proposed, which enables mitigation of analog related errors in over-sampled data converter systems. The technique is suited for all types of over-sampled A/D and D/A converters, e.g. multi-bit, 1-bit, PWM, etc. The calibration is done by adaptive fitting of a digital error model to the physical errors, due to component mismatch, etc.
Adaptive Digital Calibration of Over-Sampled Data Converter Systems

5950
James A. Angus,
Trellis Noise-Shaping Sigma-Delta modulators look ahead k samples of the signal before deciding to output a “one” or a “zero”. The Viterbi algorithm is then used to search the trellis of the exponential number of possibilities that such a procedure generates, and means of making this search more computationally efficient have been proposed. This paper describes alternative tree-based algorithms that can also be used to search the exponential number of possibilities generated by look-ahead noise-shaping Sigma-Delta modulators. Tree-based algorithms are simpler to implement because they do not require backtracking through an array of scores to determine the correct output value. They can also be made more efficient by using the “Fano” or “Stack” algorithms, which are described.
Efficient Algorithms for Look-Ahead Sigma-Delta Modulators
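
The sketch below illustrates the tree-search idea in its simplest form, under assumptions that go beyond the abstract: a first-order (single-integrator) loop filter, a cost equal to the sum of squared integrator states over the look-ahead window, and an exhaustive depth-first search of the binary tree of future output bits. Only the first bit of the best path is emitted before the window slides on, so no backtracking array is needed. The Fano and Stack algorithms described in the paper would prune this exhaustive search; they are not implemented here, and all names are hypothetical.

```python
def _path_cost(state, window, bit):
    """Cost of choosing `bit` now, then continuing optimally over the rest of `window`."""
    new_state = state + window[0] - bit  # first-order (integrator) loop filter
    return new_state ** 2 + _best_cost(new_state, window[1:])


def _best_cost(state, window):
    """Exhaustive depth-first search of the binary tree of future output bits."""
    if not window:
        return 0.0
    return min(_path_cost(state, window, b) for b in (1.0, -1.0))


def lookahead_sdm(x, depth):
    """1-bit look-ahead SDM: emit the first bit of the best path over `depth` samples."""
    state = 0.0
    out = []
    for n in range(len(x)):
        window = x[n:n + depth]
        bit = min((1.0, -1.0), key=lambda b: _path_cost(state, window, b))
        out.append(bit)
        state += x[n] - bit
    return out


bits = lookahead_sdm([0.25] * 2000, depth=4)
print(sum(bits) / len(bits))
```

With `depth=1` this reduces to an ordinary first-order modulator that outputs the sign of the updated integrator. The cost of the exhaustive search grows as 2^depth per output sample, which is what pruning strategies such as the Fano and Stack algorithms aim to reduce.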

5951
Sang-Myeong Kim,
Authentic binaural reproduction over loudspeakers using crosstalk cancellation is considered in this paper. A systematic time-domain deconvolution method is presented using both stochastic and deterministic Wiener filters. This approach is advantageous over its frequency-domain counterpart in that no additional stabilization process is required, since the Wiener filter is inherently causal and stable. A series of reproduction tests was conducted in an anechoic chamber with a PC-based reproduction system using a soundcard, varying the length of the Wiener filters. Excellent performance was achieved with a filter length of only 500: reproduction errors were less than 1% and 2% for the monaural and binaural reproduction tests, respectively.
Authentic Reproduction of Stereo Sound - A Wiener Filter Approach
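
The paper designs a multichannel crosstalk canceller; as a much simpler hypothetical illustration of time-domain deterministic Wiener (least-squares) deconvolution, the sketch below computes a single-channel FIR inverse h of a response c by solving the normal equations R h = p. Here R is the Toeplitz autocorrelation matrix of c and p is the cross-correlation of c with a delayed unit impulse; the modelling delay lets non-minimum-phase responses be inverted approximately with a causal filter. Function names are assumptions, and a dense Gaussian-elimination solver is used for clarity rather than a Levinson recursion.

```python
def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(rhs)
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def ls_inverse(c, length, delay):
    """FIR h of the given length minimizing ||c * h - delta[n - delay]||^2."""
    # Autocorrelation of c gives the Toeplitz normal-equation matrix R.
    r = [sum(c[n] * c[n + k] for n in range(len(c) - k)) if k < len(c) else 0.0
         for k in range(length)]
    R = [[r[abs(i - j)] for j in range(length)] for i in range(length)]
    # Cross-correlation of c with the delayed target impulse.
    p = [c[delay - i] if 0 <= delay - i < len(c) else 0.0 for i in range(length)]
    return solve(R, p)


h = ls_inverse([0.5, 1.0], length=20, delay=10)  # non-minimum-phase response
print([round(v, 4) for v in convolve([0.5, 1.0], h)])
```

For the non-minimum-phase example, the equalized response is close to a unit impulse at sample 10; without the modelling delay, a causal filter could not approximate the required anticausal inverse.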

5952
Vijay Parsa,Karthikeyan Umapathy,
We objectively evaluated the comparative performance of five noise reduction algorithms. These algorithms were based on Short-Time Spectral Amplitude (STSA) estimation, subspace projections, wavelet packets with auditory masking, and time-frequency decompositions using matching pursuits. Speech stimuli corrupted by speech-shaped noise and multi-talker babble at 5 different Signal-to-Noise Ratios (SNRs) were used to test the performance of the noise reduction algorithms. Noise reduction performance was quantified using two different methods. In the first method, the Perceptual Evaluation of Speech Quality (PESQ) measure was computed twice: once between the original and noisy speech, and once between the original and enhanced speech. The difference between these two PESQ values indicated the performance of the noise reduction algorithm. The second method was based on the "phase reversed noise" technique, in which the noise reduction algorithm was tested twice, once with speech + noise and once with speech + phase-reversed noise. The PESQ and SNR gain measures were then computed on the combined output. The results indicate that the STSA-based algorithm performs better in terms of the amount of noise reduction, while the wavelet-packet-based algorithm performs better in terms of minimizing the speech distortion introduced by the noise reduction process.
Objective Evaluation of Noise Reduction Algorithms in Speech Applications
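
The phase-reversed noise technique can be sketched as follows, assuming the noise reduction process behaves approximately linearly over the two test inputs: averaging the two outputs recovers the processed speech, and half their difference recovers the processed noise, so an output SNR can be computed and compared with the input SNR. PESQ is not reproduced here, and the function name and callable interface are hypothetical.

```python
import math


def phase_inversion_snr_gain(process, speech, noise):
    """SNR gain (dB) of a noise reduction `process` via the phase-reversed-noise method.

    process: callable mapping an input signal (list of floats) to an output
    signal of the same length. The separation assumes the process behaves
    (nearly) linearly for these two inputs.
    """
    out_plus = process([s + n for s, n in zip(speech, noise)])
    out_minus = process([s - n for s, n in zip(speech, noise)])
    speech_out = [(a + b) / 2.0 for a, b in zip(out_plus, out_minus)]  # processed speech
    noise_out = [(a - b) / 2.0 for a, b in zip(out_plus, out_minus)]   # processed noise

    def energy(x):
        return sum(v * v for v in x)

    snr_in = 10.0 * math.log10(energy(speech) / energy(noise))
    snr_out = 10.0 * math.log10(energy(speech_out) / energy(noise_out))
    return snr_out - snr_in
```

For an identity or fixed-gain process the SNR gain is 0 dB; a process that attenuates the noise more than the speech, such as a low-pass filter applied to low-frequency speech in high-frequency noise, yields a positive gain.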

5953
Fabio Bozzoli,Angelo Farina,
One of the most widely used intelligibility parameters is the Speech Transmission Index (STI), and the techniques for determining it employ an artificial talker and listener. In many cases (e.g., large rooms or telecommunication systems) the precise directivity of the artificial mouth does not influence the result much; inside cars, however, and in other cases, the shape of the whole directivity balloon is important for determining correct and comparable values, and different mouths give very different results in the same situation. Moreover, no current standard specifies the whole directivity balloon of the artificial mouth; the standard defines limits only for some frontal positions. For these reasons we measured, in an anechoic room, the directivities of a real talker and several artificial mouths, and compared them to underline the need for a more precise standard in this field.
Directivity Balloons of Real and Artificial Mouth Simulators for Measurement of the Speech Transmission Index

5954
Jan Holub,Radislav Smid,Michael D. Street,
The article describes an algorithm for intrusive speech transmission quality measurements in networks with low bit-rate coding schemes (600 – 2400 bits/second), as used in satellite and military communications. The proposed algorithm is based on the PESQ (ITU-T P.862) perceptual model, enhanced with resistance to input noise. The new algorithm also reflects the noise cancellation capabilities of modern audio coders. The correlation between Absolute Category Rating (ACR) Mean Opinion Score (MOS) listening test results and the output of the developed algorithm reaches 0.89 without knowledge of the original noise-free speech sample.
Intrusive Speech Transmission Quality Measurements for Low Bit-Rate Coded Signals

5955
Se-Ung Kim,Sin-Lyul Lee,Lae-Hoon Kim,Koeng-Mo Sung,
The correct level alignment of a multichannel reproduction system is critical for the quality of the reproduction. However, in conventional products the level alignment is set manually by the user, and a user who is not an expert cannot easily align the level of each loudspeaker correctly. This paper shows how binaural energy summation over all available loudspeaker positions can be applied to level alignment for an arbitrary multichannel reproduction system. If the distance and angle of each loudspeaker can be measured and an omnidirectional microphone is available, the correct levels can be aligned automatically by applying binaural energy summation to the previously obtained loudspeaker positions.
Automatic Level Alignment for the Arbitrary Multi-channel Reproduction System

5956
Dae-young Jang,Jeongil Seo,Kyeongok Kang,Hoe-Kyung Jung,
In this paper, we introduce a new object-based 3D audio scene representation scheme. Four kinds of 3D sound source objects are defined for representing an object-based 3D audio scene: point, line, plane, and volume sound sources. Each 3D sound source object includes sound source samples and related 3D audio scene information. Users can interact with a 3D sound source object and control it by modifying the 3D audio scene information. We implemented a prototype object-based 3D audio player and several content items to demonstrate object-based 3D audio features. This object-based 3D audio representation scheme can be used in a very wide range of applications, such as interactive games, home shopping, and broadcasting of realistic ambient sounds.
Object-based 3D Audio Scene Representation

5957
Harvey D. Thornburg,Randal J. Leistikow,
We propose a flexible state-space resynthesis approach that extends the sinusoidal model into the domain of source-filter modeling. Our approach is further specialized to a class of quasi-harmonic sounds, representing a variety of acoustic sources in which multiple, closely spaced modes cluster about principal harmonics loosely following a harmonic structure. We detail a variety of sound modification possibilities: time and pitch scaling modifications, cross-synthesis, and other potentially novel possibilities.
A Flexible Resynthesis Approach for Quasi-harmonic Sounds

5958
Brahim Hamadicharef,Emmanuel Ifeachor,
This study is concerned with the objective prediction of perceived audio quality for an intelligent audio system for modeling musical instruments. The study is part of a project to develop an automated tool for sound design. Objective prediction of the subjective audio quality ratings given by audio experts is an important part of the system. Sound quality is assessed using the PEAQ (Perceptual Evaluation of Audio Quality) algorithm, which greatly reduces the time-consuming effort involved in listening tests. Tests carried out using a large database of pipe organ sounds show that the method can be used to quantify the quality of synthesized sounds. This approach provides a basis for developing a new index for benchmarking sound synthesis techniques.
Objective Prediction of Sound Synthesis Quality

5959
Geoffroy Peeters,
This paper addresses the problem of classifying large databases of musical instrument sounds. An efficient algorithm is proposed for selecting the most appropriate signal features for a given classification task. This algorithm, called IRMFSP, is based on maximizing the ratio of the between-class inertia to the total inertia, combined with a step-wise orthogonalization of the feature space. Several classifiers (flat Gaussian, flat KNN, hierarchical Gaussian, hierarchical KNN and decision tree classifiers) are compared for the task of large database classification. Particular attention is given to the case in which the classification system is trained on one database and used to classify another database, possibly recorded under completely different conditions. The highest recognition rates are obtained with the hierarchical Gaussian and KNN classifiers. The organization of the instrument classes is studied through an MDS analysis derived from the acoustic features of the sounds.
Automatic Classification of Large Musical Instrument Databases Using Hierarchical Classifiers with Inertia Ratio Maximization
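
The following is a minimal sketch of the IRMFSP idea as the abstract summarizes it: at each step, select the feature with the highest ratio of between-class inertia to total inertia, then orthogonalize the remaining features against the selected one (Gram-Schmidt) so that redundant features lose their discriminative power. Details such as feature centring and the stopping rule are assumptions here, and the function names are hypothetical.

```python
import math


def inertia_ratio(values, labels):
    """Between-class inertia divided by total inertia for one feature."""
    mean = sum(values) / len(values)
    total = sum((v - mean) ** 2 for v in values)
    if total == 0.0:
        return 0.0
    between = 0.0
    for c in set(labels):
        members = [v for v, l in zip(values, labels) if l == c]
        class_mean = sum(members) / len(members)
        between += len(members) * (class_mean - mean) ** 2
    return between / total


def irmfsp(features, labels, n_select):
    """Greedy feature selection by inertia ratio with Gram-Schmidt orthogonalization.

    features: list of feature columns (one list of values per feature, one value per sample).
    Returns the indices of the selected features, in selection order.
    """
    remaining = {}
    for j, col in enumerate(features):
        mean = sum(col) / len(col)
        remaining[j] = [v - mean for v in col]  # centre each feature
    selected = []
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda j: inertia_ratio(remaining[j], labels))
        selected.append(best)
        chosen = remaining.pop(best)
        norm = math.sqrt(sum(v * v for v in chosen))
        if norm == 0.0:
            break
        unit = [v / norm for v in chosen]
        # Remove from every remaining feature its component along the chosen one.
        for j, col in remaining.items():
            proj = sum(a * b for a, b in zip(col, unit))
            remaining[j] = [a - proj * b for a, b in zip(col, unit)]
    return selected
```

A feature that merely rescales an already selected one is reduced to zero by the orthogonalization step and is therefore not picked again, even though its initial inertia ratio was maximal.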

5960
David Lowenfels,
The bandlimited digital synthesis model of Stilson and Smith is extended with a single feed-forward comb filter. Time-varying comb filter techniques are shown to produce a variety of classic analog waveform effects, including waveform morphing, pulse-width modulation, harmonization, and frequency modulation. The techniques discussed are not guaranteed to maintain perfect bandlimiting; however, they are generally applicable to other synthesis models.
Virtual Analog Synthesis with a Time-Varying Comb Filter
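
The comb-filter idea can be illustrated with a naive (non-bandlimited) sawtooth; the paper instead builds on the bandlimited model of Stilson and Smith, which is not reproduced here. Subtracting a delayed copy of a sawtooth, y[n] = x[n] - x[n - d], produces a zero-mean pulse wave whose pulse occupies d samples of each period, so varying d over time yields pulse-width modulation. Integer delays and all function names are assumptions made for the sketch.

```python
def naive_sawtooth(n_samples, period):
    """Non-bandlimited sawtooth in [-1, 1) with an integer period in samples."""
    return [2.0 * (n % period) / period - 1.0 for n in range(n_samples)]


def time_varying_comb(x, delay_at, gain=-1.0):
    """Feed-forward comb y[n] = x[n] + gain * x[n - d(n)] with integer delay d(n)."""
    y = []
    for n in range(len(x)):
        d = delay_at(n)
        past = x[n - d] if n >= d else 0.0
        y.append(x[n] + gain * past)
    return y


saw = naive_sawtooth(64, period=8)
# Fixed delay of 2 samples: a pulse 2 samples wide in every 8-sample period.
pulse = time_varying_comb(saw, lambda n: 2)
# Delay grows by one sample every 16 samples: the pulse widens over time (PWM).
pwm = time_varying_comb(saw, lambda n: 1 + n // 16)
print(pulse[8:16])
```

Because the naive sawtooth aliases, this sketch only demonstrates the waveform shapes; a bandlimited source would be needed for audio-quality output.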

5961
Oliver Hellmuth,Eric Allamance,Markus Cremer,Holger Grossmann,Juergen Herre,Thorsten Kastner,
Finalized in 2001, the MPEG-7 Audio standard provides a universal toolbox for the content-based description of audio material. While the descriptive elements defined in this standard may be used for many purposes, audio fingerprinting (i.e., automatic identification of audio content) was already among the initial target applications conceived during the design of the standard. This paper reviews the basics of MPEG-7-based audio fingerprinting and explains how the technology has been used in a number of real-world applications, including metadata search engines, database maintenance, broadcast monitoring, and audio identification on embedded systems. The appropriate selection of fingerprinting parameters and performance figures are also discussed.
Using MPEG-7 Audio Fingerprinting in Real-World Applications

5962
Christian Varani,Enrico Armelloni,Angelo Farina,
The car cockpit is a critical environment for music: sound reproduction is strongly affected by reflections, echoes, engine noise, and the loudspeaker set-up. An important technique for improving sound comfort is spatial equalization, in which both the magnitude and the phase of the signal are controlled. This technique is implemented with a stereodipole system, in which two closely spaced loudspeakers are placed in front of the listener and the digital processing is performed in real time on a DSP board. Cross-talk cancellation is achieved using FIR filters whose coefficients are obtained by inversion of the measured cockpit impulse response. In this paper a double stereodipole system, one for the driver and one for the passenger, is validated experimentally through subjective evaluations inside a commercial car.
Implementation of a Double StereoDipole System on a DSP Board – Experimental Validation and Subjective Evaluation Inside a Car Cockpit

5963
Kenichi Taura,Masayuki Tsuji,Masayuki Ishida,Tsuyoshi Nakada,
We have developed a digital amplifier for car use. The major problems of conventional digital amplifiers are their inability to reject ripple and noise on the car power line, and EMI (electromagnetic interference). We have developed a novel feedback system for digital amplifiers that mainly addresses the supply voltage ripple problem. Using a prototype with this feedback system, we confirmed that it provides sufficient supply voltage ripple rejection for car audio systems. We also confirmed that the feedback system can help reduce EMI by relaxing the requirements on the switching stage operation, while keeping the distortion of the audio output low.
Development of a Digital Amplifier for Car Use

5964
Maja Sliskovic,Hans-Juergen Nitzpon,
As new sound and video broadcasting systems are being deployed, the need for a multistandard multiband receiver increases. The compatibility of the software radio receiver with any defined broadcasting service is guaranteed by its re-programmability, i.e. by the fact that its functionality is determined by software. A receiver for a new standard can be added by a simple software download. In this paper a software implementation of the signal path from the ADC to the audio/video output will be discussed. The possibility of software reuse in different receiver functional blocks and for different broadcasting standards will be investigated.
Software Radio Receiver for Audio and Video Broadcasting Systems

5965
Hesu Huang,Chris Kyriakakis,
Additive background noise and convolutive noise in the form of reverberation are two major types of noise in audio conferencing and hands-free telecommunication environments. To reduce these types of acoustic noise, we propose a two-step approach based on subband adaptive filtering techniques. In particular, we first reduce the additive noise using a Delayless Subband Adaptive Noise Cancellation (DSANC) technique, and then suppress the convolutive noise through Subband Adaptive Blind Deconvolution. The experiments show that our method has lower computational complexity and better performance than previously proposed methods.
Subband Adaptive Filtering for Acoustic Noise Reduction
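
The first stage above removes additive noise adaptively, using a noise reference. As a simplified illustration of adaptive noise cancellation in general, the sketch below uses a single fullband NLMS filter, not the paper's delayless subband (DSANC) structure. The signal model, step size and tap count are assumptions chosen for the demonstration.

```python
import math
import random


def nlms_cancel(primary, reference, taps=8, mu=0.1, eps=1e-8):
    """Fullband NLMS canceller: returns e[n] = primary[n] - w . x[n], the enhanced signal."""
    w = [0.0] * taps
    x = [0.0] * taps                      # most recent reference samples, newest first
    out = []
    for d, r in zip(primary, reference):
        x = [r] + x[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x))   # estimate of the noise in the primary input
        e = d - y                                    # error signal doubles as the cleaned output
        g = mu * e / (eps + sum(xi * xi for xi in x))
        w = [wi + g * xi for wi, xi in zip(w, x)]    # normalised LMS weight update
        out.append(e)
    return out


random.seed(0)
N = 4000
clean = [0.5 * math.sin(2 * math.pi * 0.01 * n) for n in range(N)]
ref = [random.gauss(0.0, 1.0) for _ in range(N)]
# Hypothetical acoustic path from the noise source to the primary microphone.
leak = [0.8 * ref[n] + 0.3 * (ref[n - 1] if n else 0.0) for n in range(N)]
primary = [c + v for c, v in zip(clean, leak)]
enhanced = nlms_cancel(primary, ref)
```

A subband version runs one such filter per band at a reduced rate, which is where the complexity savings claimed in the abstract would come from.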

5966
Ching-Shun Lin,Chris Kyriakakis,
In this paper, a neuro-fuzzy system is proposed to remove multifrequency noise from audio signals. There are two major elements in our method. The first is a fuzzy cerebellar model articulation controller (FCMAC) that is used to quantize the signals. The second is based on the theory of stochastic real values (SRV) and is used to search for the optimal frequencies for the overall trained system. We present a DSP implementation of the SRV algorithm and results on its performance in removing spectral noise that is buried in audio signals.
Multi-Frequency Noise Removal Based on Reinforcement Learning

5967
Holger Crysandt,
Real-time music identification has become increasingly important over the past few years. Possible applications include monitoring a radio station to create a playlist, or scanning network traffic in search of copyright-protected material. This paper presents a client-server application that performs this task in real time with the help of MPEG-7. It explains how to define the similarity between two segments of music and determines its robustness to perceptual audio coding and filtering. It also introduces an indexing system that reduces the number of segments that have to be compared with the query.
Music Identification with MPEG-7

5968
Han-Wen Hsu,Wen-Chieh Lee,Chi-Min Liu,
Existing digital audio signals are always restricted in sampling rate and bandwidth to suit the various storage media and communication channels. Take, for example, the widespread MP3 tracks encoded with the MPEG-1 Layer 3 standard: their audio bandwidth is restricted to 16 kHz by the constraints of the protocol. This paper presents a method to reconstruct the lost high-frequency components from the band-limited signals. Both subjective and objective measurements were conducted and show better quality. In particular, the objective measurement with the Perceptual Evaluation of Audio Quality (PEAQ) system, the method recommended by ITU-R Task Group 10/4, shows a significant quality improvement.
High Frequency Reconstruction by Linear Extrapolation

5969
Doug Perkins,Amnon Sarig,
There are many ways to store audio files but traditional methods are not conducive to file sharing, which is the goal of the modern networked facility. While still evolving, the world of digital audio storage offers the technology to quickly and safely share files between users, and even allows simultaneous users on different platforms to access audio. Learn how your facility can simplify the transition to the digital world, what products are on the leading edge, and what to look for when you are ready to make the leap.
Audio Storage and Networking in the Digital Age

5970
Nicolas Sincaglia,
In a networked audio environment, metadata not only provides a human interface but is used for the identification, organization, tracking, reporting and selling of digitized sound recordings. Establishing an open industry standard for this data will enable the entire industry to streamline its ability to make content available. The result will be a more efficient and uniform exchange of data, ultimately enabling a more versatile and profitable music industry.
The Requirement for Standards In Metadata Exchange for Networked Audio Environments

5971
Peter Mapp,
A new technique for improving the spaciousness of reproduced sound has been investigated. The technique uses a combination of conventional pistonic loudspeakers and Distributed Mode Loudspeaker (DML) devices. Objective measurements have been made in a range of rooms and show that ‘Layered Sound’ affects parameters such as the Interaural Cross Correlation (IACC) and Lateral Energy Fraction as well as Centre Time and Early Decay Time. A number of conditions were investigated, including listening room configuration and the relative sound levels of the loudspeakers. Limited subjective testing was carried out to ascertain the preferred relative intensities of the conventional and DML loudspeakers. The optimum was confirmed to lie with the DMLs set within the range –5 dB ± 3 dB relative to the conventional stereo loudspeakers. The configuration of the listening room and the type of programme material (and recording technique) were also found to be significant factors. It is shown that over a range of conditions, Layered Sound enhances the perceived spaciousness, envelopment and clarity of reproduced sound, though some changes to the original stereo image were noted.
An Investigation of Layered Sound
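
The preferred DML level above is quoted in decibels relative to the stereo pair. As a small worked illustration of what that window means in linear terms, the sketch below converts the quoted –5 dB ± 3 dB range to amplitude (pressure) ratios using the standard 10^(L/20) relation. The function name is ours, not from the paper.

```python
def db_to_amplitude(level_db):
    """Convert a relative level in dB to a linear amplitude (pressure) ratio."""
    return 10 ** (level_db / 20.0)


# Preferred DML level window from the abstract: -5 dB +/- 3 dB re. the stereo loudspeakers.
window = {db: db_to_amplitude(db) for db in (-8.0, -5.0, -2.0)}
```

In other words, the DML drive level sits at roughly 0.40 to 0.79 of the conventional loudspeakers' amplitude, with about 0.56 at the centre of the range.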

5972
Sandra Brixen,Frank Melchior,Thomas Roder,Stefan Wabnik,Christian Riegel,
Wave Field Synthesis (WFS) permits the reproduction of a sound field that fills nearly the whole reproduction room with correct localization and spatial impression. Because of these properties, WFS technology has enormous potential for creating audio in combination with motion pictures. For this application, a special authoring system is being developed that gives audio engineers the ability to automate the positioning of sound sources and takes several ergonomic as well as technical aspects into account. This paper presents first experiences of the mixing process for content production using WFS. It also demonstrates the capabilities of the authoring system for WFS and outlines future developments.
Authoring Systems for Wave Field Synthesis Content Production

5973
Jose Vieira,Luis Almeida,
This paper proposes an intelligent acoustic sensor able to localize sound sources in strongly reverberant acoustic environments. The proposed system is inspired by the precedence effect exploited by the human auditory system and uses only two acoustic sensors. It implements a modified version of the algorithm proposed by Huang, which uses the precedence effect to achieve robust sound localization even in reverberant environments. The localization system was implemented on a C31 DSP for real-time demonstration, and several experiments were performed that show the validity of our solution. Finally, the paper also proposes a method to estimate the reverberation decay online, using only the received sound signals.
A Sound Localizer Robust to Reverberation
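
With two sensors, localization reduces to estimating the time difference of arrival and converting it to an angle. The sketch below shows only the plain cross-correlation baseline for that step. It is not Huang's precedence-based algorithm, which additionally suppresses the echo-dominated parts of the signal. The sample rate, microphone spacing and test signal are assumptions.

```python
import math
import random


def estimate_delay(left, right, max_lag):
    """Lag (samples) maximising the cross-correlation sum_i left[i] * right[i + lag]."""
    n = len(left)
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        acc = sum(left[i] * right[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag


def delay_to_azimuth(lag, fs, mic_spacing, c=343.0):
    """Far-field angle from broadside (degrees) for a two-microphone pair."""
    s = max(-1.0, min(1.0, lag * c / (fs * mic_spacing)))
    return math.degrees(math.asin(s))


random.seed(1)
src = [random.gauss(0.0, 1.0) for _ in range(2000)]
left = src
right = [0.0] * 5 + src[:-5]          # the right microphone hears the source 5 samples later
lag = estimate_delay(left, right, max_lag=20)
angle = delay_to_azimuth(lag, fs=48000, mic_spacing=0.2)
```

In a reverberant room, reflections add competing correlation peaks. Weighting the signal towards onsets, in the spirit of the precedence effect, is what makes the estimate robust.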

5974
Banu Gunel,
Fidelity of spatial hearing effects generated by loudspeakers degrades outside the sweet spot. This paper proposes a method to analyze a sound scene created by stereo loudspeakers and modify it with the help of assistant headphones. Loudspeaker positions are found by analyzing B-format recordings at the listener position. This information is used to generate signals for the assistant headphones. Headphone signals are filtered with head-related transfer functions for amplitude panning to change the positions of the loudspeakers. Then, further headphone signals are added that lead the original sound of the farther loudspeaker and lag the original sound of the nearer loudspeaker, creating direct-echo pairs. The system improves listening conditions outside the sweet spot and is extendable to multiple listeners.
Modification of Loudspeaker Generated Position Cues through Assistant Headphones

5975
Jens Meyer,Tony Agnello,
This paper describes a beamforming spherical microphone array consisting of many acoustic pressure sensors mounted on the surface of a rigid sphere. The beamformer is based on a spherical harmonic decomposition of the sound field. We show that this design allows a simple, computationally efficient, yet very flexible beamformer structure. The spherical shape of the array, in combination with the beamformer, allows the look direction to be steered to any angle in 3-D space. Measurements from an array with a 37.5 mm radius consisting of 24 sensors are presented. The paper focuses on the applications of directional sound pick-up and of sound field analysis and reconstruction. Other suitable applications include room acoustic measurements and 'forensic beamforming'.
Spherical Microphone Array for Spatial Sound Recording

5976
Huseyin Hacihabiboglu,
The precedence effect refers to the property of the human auditory system that enables accurate localization of sound sources when many interfering echoes of the original signal are also present. Perception of the elevation, azimuth, and distance of sound sources is affected by the presence of an echo. The multivariate Gaussian mixture model proposed in this paper combines azimuth, elevation and distance perception, and provides a general framework for modeling sound source localization under the precedence effect. The model interprets the precedence effect as a spatial property rather than a temporal one.
Modeling Sound Source Localization under the Precedence Effect Using Multivariate Gaussian Mixtures

5977
Cheng-Han Yang,Hsueh-Ming Hang,
A low-complexity, high-performance scheme for choosing MPEG-4 Advanced Audio Coding (AAC) parameters is proposed. One key element in producing good-quality compressed audio at low rates is selecting proper coding parameter values, a job the MPEG committee's AAC reference model does not do well. A joint trellis-based optimization approach has therefore been proposed previously. It leads to a near-optimal selection of parameters, but at the cost of extremely high computational complexity. It is therefore very desirable to achieve similar coding performance (audio quality) at much lower complexity. Simulation results indicate that the proposed cascaded trellis-based optimization scheme has a coding performance close to that of the joint trellis-based scheme, but requires only 1/70 of the computations.
Cascaded Trellis-Based Optimization for MPEG-4 Advanced Audio Coding

5978
Markus Lohwasser,Marc Gayer,Manfred Lutzky,
Encoder implementations of MPEG Advanced Audio Coding and Layer-3 (mp3) on 32-bit or 16-bit fixed-point processors are challenging due to the fact that the usable word length is restricted to 32 bits if low processing power is required. This paper describes the modifications and optimizations that had to be applied to the algorithms of these audio encoders to make a true fixed-point implementation on a 32-bit or 16-bit device possible. The implementation had to be done without using floating-point emulations or even 64-bit values for the signal energies and thresholds in the psychoacoustic model of the encoder. At the same time high encoding quality and speed were required. Memory and processing power requirements on various platforms as well as results from a subjective listening test will be presented.
Implementing MPEG Advanced Audio Coding and Layer-3 Encoders on 32-bit and 16-bit Fixed-point Processors

5979
Dai Yang,Takehiro Moriya,Akio Jin,Kazunaga Ikeda,
An extended-run-length coding tool called zero compaction is proposed in this paper. When a certain type of data appears in the middle stage of our lossless audio compression systems, the proposed tool is far more efficient and gives significantly better results than traditional run-length coding. The zero-compaction technique has been integrated into our lossless audio coding scheme. When tested on seven sets of standard audio files from the ISO, the inclusion of zero compaction improved the average compression rates of the systems by more than 1% in all cases. In addition, thanks to its simplicity and good performance, zero compaction shaves up to 4% off the total encoding time.
An Extended-Run-Length Coding Tool for Audio Compression
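
The abstract does not specify the zero-compaction format, so the sketch below shows only the traditional baseline it is compared against. Each run of zeros is replaced by a single (0, run_length) token, and non-zero values pass through unchanged. The token representation is an assumption made for readability, not a bitstream format.

```python
def encode_zero_runs(values):
    """Replace each run of zeros by a (0, run_length) token; non-zero values pass through."""
    out, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            if run:
                out.append((0, run))
                run = 0
            out.append(v)
    if run:                      # flush a trailing run of zeros
        out.append((0, run))
    return out


def decode_zero_runs(tokens):
    """Inverse of encode_zero_runs: expand (0, n) tokens back into n zeros."""
    out = []
    for t in tokens:
        if isinstance(t, tuple):
            out.extend([0] * t[1])
        else:
            out.append(t)
    return out


residual = [3, 0, 0, 0, -1, 0, 0, 2]
tokens = encode_zero_runs(residual)   # [3, (0, 3), -1, (0, 2), 2]
```

Plain run-length coding pays off only when zeros arrive in long runs. Data that is mostly zero but broken up by isolated small values is the case an extended scheme would target.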

5980
Jeongil Seo,Dae-Young Jang,Gi Yoon Park,Kyeongok Kang,
We present the implementation of an interactive 3D audio player using MPEG-4 standards. MPEG-4 standards allow users to interact with important objects in a scene and also provide high-efficiency coding tools for audiovisual objects. We apply the AudioBIFS system (version 1), part of BIFS in the MPEG-4 Systems standard, to provide object-based interactivity for audio objects. AudioBIFS also provides geometric information for spatializing an audio object in the 3D audio scene. Because the 3D spatialization method is not normative in MPEG-4 AudioBIFS, we adopt a novel 3D spatialization method based on HRTF processing. To prevent an excessive increase in the number of audio objects, we define a background audio object that presents the atmosphere of the 3D audio scene: the important audio objects are kept as individual audio objects, while the other audio objects are merged into the background audio object. Although our interactive 3D audio player was developed as a terminal player for the Digital Multimedia Broadcasting (DMB) system, it can also be applied to 3DTV, virtual reality games, interactive home shopping applications, etc.
Implementation of Interactive 3D Audio Using MPEG-4 Multimedia Standards

5981
Schuyler Quackenbush,Peter F. Driessen,
This paper investigates techniques for mitigating the effect of missing packets of MPEG-4 Advanced Audio Coding (AAC) data so as to minimize perceived audio degradation. Applications include streaming of AAC music files over the Internet and wireless packet data channels. A range of techniques is presented, and statistical interpolation in the time/frequency domain is found to be the most effective. The novelty of the work is to apply statistical interpolation techniques intended for time-domain samples to the frequency-domain coefficients. A means of complexity reduction is presented, after which the error mitigation is found to require on average 17% additional computation for a channel with 5% errors, compared with a clear channel. In an informal listening test, all subjects preferred this technique over the simpler technique of signal repetition, and for one signal item statistical interpolation was preferred to the original.
Error Mitigation in MPEG-4 Audio Packet Communication Systems

5982
Anibal J. S. Ferreira,Andre C. Rocha,
An advanced Audio Spectral Coder (ASC) is described that implements a new approach to efficient audio compression by combining source coding with perceptual coding techniques. This approach involves segmenting the audio into three main components: transient events in the time domain, and harmonic structures of sinusoids and stationary noise in the MDCT frequency domain. It is shown that this segmentation permits the independent parametrization and coding of the components according to appropriate representation models and applicable psychoacoustic rules. The coding performance of ASC is characterized, and it is shown that, owing to its underlying structure, functionalities other than compression are also possible, namely bitstream semantic scalability, access and classification.
Combined Source and Perceptual Audio Coding

5983
Albertus C. den Brinker,Andy J. Gerrits,Rob J. Sluijter,
In sinusoidal coding, frequency tracks are formed in the encoder and the amplitude and frequency information of a track is transmitted. Phase is usually not transmitted, but reconstructed at the decoder by assuming that the phase is the integral of the frequency. Such a reconstructed phase accumulates inaccuracies, leading to audible artifacts. To prevent this, a mechanism is proposed to transmit phase information of sinusoidal tracks. The unwrapped phase is quantised and transmitted. The frequencies are not transmitted, but restored from the phase information by differentiation.
Phase Transmission in Sinusoidal Audio and Speech Coding
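
The sketch below illustrates the idea in the abstract for a single track. The encoder quantises the unwrapped phase sampled at frame boundaries, and the decoder recovers the per-frame frequency by a first difference (discrete differentiation). Because phase itself is transmitted, quantisation errors stay bounded instead of accumulating as they do when frequency is integrated. The sample rate, hop size, quantiser step and the uniform quantiser are all assumptions, not the paper's parameters.

```python
import math

FS = 44100          # sample rate in Hz (assumed)
HOP = 256           # samples between transmitted phase values (assumed)


def encode_phase(unwrapped, step):
    """Uniformly quantise the unwrapped phase track; the integer indices are 'transmitted'."""
    return [round(p / step) for p in unwrapped]


def decode_track(indices, step):
    """Rebuild the phase track and recover frequency (Hz) by first differences."""
    phase = [i * step for i in indices]
    freqs = [(b - a) * FS / (2 * math.pi * HOP) for a, b in zip(phase, phase[1:])]
    return phase, freqs


f0 = 440.0
step = 2 * math.pi / 64
track = [2 * math.pi * f0 * k * HOP / FS for k in range(20)]   # unwrapped phase at frame starts
phase, freqs = decode_track(encode_phase(track, step), step)
```

Each phase value is off by at most step/2, so each recovered frequency is off by at most step*FS/(2*pi*HOP), here about 2.7 Hz. Finer steps or shorter hops trade bit rate for frequency accuracy.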

5984
David Isherwood,Ville-Veikko Mattila,
Listening tests were performed to define the perceived loudness threshold above which a number of mobile terminal alert tones became only partially masked by environmental noise. The relative intensity of the noise and alert tone at the partial masking threshold was then used to create masker+maskee samples to be objectively measured with various loudness models. The objective estimates were then compared to the subjective results to derive recommendations as to how best to objectively estimate the partial masking thresholds.
Objective Estimates of Partial Masking Thresholds for Mobile Terminal Alert Tones

5985
Arkady Gloukhov,
In accordance with the Huygens-Fresnel principle, the radiating device is simulated as an array of point sources in the aperture. The complex pressure in the aperture can be calculated either with a model of wave propagation along the waveguide based on the Huygens-Fresnel principle, or with an approximating waveguide procedure that uses experimental polar data in a traditional form. The method predicts the dispersion pattern at any distance with any required resolution. Phase patterns as well as the relationship between dispersion and distance are demonstrated on examples. The proposed algorithms can be used in high-resolution simulators of loudspeakers and arrays. The results also point to another approach to measuring and presenting dispersion data.
A Method of Loudspeaker Directivity Prediction Based on Huygens-Fresnel Principle
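
The core of the point-source approach is a sum of spherical wavelets over the aperture, p = sum(A_i * exp(-j*k*r_i) / r_i). The sketch below evaluates that sum on a 2-D arc to produce a polar pattern. The aperture is a hypothetical 0.3 m line of equal-amplitude, in-phase points. The paper's own estimate of the aperture pressure, from waveguide propagation or from measured polar data, is not reproduced.

```python
import cmath
import math


def field_pressure(points, field_pt, freq, c=343.0):
    """Sum of spherical wavelets from aperture points: p = sum(A * exp(-j k r) / r)."""
    k = 2 * math.pi * freq / c
    p = 0j
    for (x, y), amp in points:
        r = math.hypot(field_pt[0] - x, field_pt[1] - y)
        p += amp * cmath.exp(-1j * k * r) / r
    return p


def polar_pattern(points, distance, freq, angles_deg):
    """Level in dB relative to on-axis at the given distance (2-D, angle from the +x axis)."""
    ref = abs(field_pressure(points, (distance, 0.0), freq))
    out = []
    for a in angles_deg:
        t = math.radians(a)
        p = field_pressure(points, (distance * math.cos(t), distance * math.sin(t)), freq)
        out.append(20 * math.log10(abs(p) / ref))
    return out


# Hypothetical 0.3 m line aperture along y: 31 in-phase points of equal complex amplitude.
N = 31
aperture = [((0.0, -0.15 + 0.3 * i / (N - 1)), 1.0 / N) for i in range(N)]
pattern = polar_pattern(aperture, distance=10.0, freq=2000.0, angles_deg=[0, 30, 60, 90])
```

Because the distance to the field point is an explicit parameter, the same routine also gives near-field patterns. Re-running it at several distances shows how dispersion changes with distance.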

5986
Peter Mapp,
Although it is well known that equalising a sound system can significantly affect the perceived sound quality and intelligibility, surprisingly there is little or no information relating to the degree of improvement in intelligibility that can be achieved. Measurement data relating to over 30 sound systems has been reviewed and a number of factors relating to typical response anomalies identified. Large discrepancies were often noted to occur between the measured in-room sound system or loudspeaker response and published anechoic frequency response data. The underlying causes and implications relating to speech intelligibility are discussed. The results of a pilot study illustrating the improvements that appropriate equalisation can produce are presented. It is shown that under some conditions, improvements of over 20 % in speech intelligibility can be achieved. However, it is also noted that none of the current electro-acoustically based measurement metrics, including STI, % Alcons, STIPa or RaSTI correctly indicate the intelligibility improvements that system equalisation produces.
Some Effects of Equalisation on Sound System Intelligibility & Measurement

5987
Dai Yang,Takehiro Moriya,
Most lossless audio-coding algorithms are designed for PCM input sound formats, and little work has been done on the lossless compression of IEEE floating-point audio files. This paper describes an efficient lossless-coding algorithm that handles IEEE floating-point data as well as PCM data. In the worst-case scenario, where the algorithm was applied to artificially generated 32-bit floating-point sound files with a 48-kHz sampling frequency, an average compression ratio of 65% was still achieved. Moreover, the proposed algorithm is easily extensible to lossless/variable-lossy operation, which will provide scalability to accommodate the requirements of a wider range of applications and platforms.
Lossless Compression for Audio Data in the IEEE Floating-Point Format

5988
Anibal J. S. Ferreira,
The performance of perceptual audio coders depends on how efficiently the quantization operation masks the quantization noise under the audio signal. This objective is better addressed by separately coding different signal components such as sinusoids, transients and stationary noise. In this paper we use an audio coder that normalizes the MDCT spectrum by a smooth spectral envelope and by periodicities due to sinusoids. The resulting flattened MDCT coefficients are shown to exhibit a probability density function with small uncertainty, which allows the design of an optimum non-uniform scalar quantizer. Its distortion-rate function is derived and compared with that of known quantizers and with that obtained under real audio coding conditions.
Optimum Quantization of Flattened MDCT Coefficients

5989
Graeme Huon,Zeljko Velican,
The future delivery of sound must create or render a realistic sound event image. A model for a 3D sound capture and rendering format, based on Depth Perception, has been proposed earlier. This paper considers the requirements for optimised low-frequency reproduction and the non-bass masking effects related to this model. Theoretical modeling, practical verification in real listening environments and subjective assessment with skilled and unskilled listeners are presented, and conclusions are drawn. It is shown that room mode influence, low-frequency spatial energy distribution and main system integration for low-frequency reproduction in rooms can be effectively managed. New apparatus is described that enables the loudspeaker to be placed optimally with respect to the listener so as to minimise room mode colouration of low frequencies. Its use to quantify frequency-dependent loss is defined. Non-bass spatial masking effects are also reported in the context of the Depth Perception model.
Low Frequency Optimisation and Non-bass Masking Effects for Sound Field Re-creation

5990
Ruihua Ma,
According to recent information-hiding theory, information hiding may be viewed as a game between the hider (embedder/decoder) and the attacker, and optimal information-embedding and attack strategies may be developed in this context. Much work has applied this theory to image watermarking, but little research has applied it to audio watermarking. This paper aims to fill this gap. An information-theoretic model for audio watermarking is presented. We use wavelet statistical models for audio signals and compute the data-hiding capacity for compressed and uncompressed host-audio sources. Simulation results show that the proposed system achieves near-optimal performance compared with the theoretical upper bounds.
An Information-Theoretic Model for Audio Watermarking

5991
Seung-Jin Yang,Do-Hyoung Kim,Jae-Ho Chung,
We propose a watermarking technique that inserts additional information into quantized integer coefficients whose values exceed 15, which are coded with linbits during Huffman coding in the MP3 encoding procedure. The linbits are written into the bitstream as plain binary codes. We inserted watermarks by modifying the linbits and evaluated the audible distortion with an MOS test. In our experiment, 20 untrained listeners were asked to rate 20 samples of about 15 seconds each, encoded at 128 kbps with the watermark inserted, according to perceived quality on a scale from 1 (very annoying) to 5 (imperceptible). As a result, we confirmed that about 60 bytes/s of additional information or watermark data could be inserted with an average sound quality of MOS 4.6.
Watermark Insertion Into MP3 Bitstream Using the Linbits Characteristics
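
In MP3 Huffman coding, a quantized magnitude of 15 or more is sent as the escape value 15 followed by (magnitude - 15) in a fixed number of linbits. The abstract does not say exactly how the linbits are modified. The sketch below therefore assumes a simple least-significant-bit scheme, operating on a list of quantized coefficients rather than on a real bitstream. The function names are hypothetical.

```python
def embed_in_linbits(coeffs, bits):
    """Hypothetical LSB scheme: for each |x| >= 15, set the LSB of (|x| - 15) to the next bit.

    Returns the marked coefficients and the number of bits actually embedded."""
    out, k = [], 0
    for x in coeffs:
        mag = abs(x)
        if mag >= 15 and k < len(bits):
            esc = mag - 15                  # value carried in the linbits field
            esc = (esc & ~1) | bits[k]      # overwrite its least significant bit
            mag = 15 + esc                  # magnitude changes by at most 1
            k += 1
        out.append(-mag if x < 0 else mag)
    return out, k


def extract_from_linbits(coeffs, n_bits):
    """Read the LSB of (|x| - 15) from each escape-coded coefficient, in order."""
    bits = []
    for x in coeffs:
        if abs(x) >= 15 and len(bits) < n_bits:
            bits.append((abs(x) - 15) & 1)
    return bits


marked, n = embed_in_linbits([3, 17, -20, 0, 15, 40], [1, 0, 1, 1])
```

Setting or clearing the LSB never pushes a value outside the range the linbits field can hold, and magnitudes stay at 15 or above. The decoder therefore finds the marked coefficients at the same positions.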

5992
Chung-Han Yang,Jiun-In Guo,Wen-Chieh Lee,Chi-Min Liu,
FIR-based reverberators, which convolve the input sequence with an impulse response modelling the concert hall, offer better quality than the IIR-based approach. However, the high computational complexity of FIR-based reverberators limits their applicability in most cost-oriented systems. This paper proposes a method that uses a perceptual criterion to reduce the complexity of convolution methods. An objective measurement criterion is also introduced to check the perceptual difference caused by the reduction. The results show that the impulse response can be shortened by 60% without affecting the perceived reverberation quality. The method integrates well into the existing FFT-based approach, giving a speed-up of around 30%. It also adapts flexibly to various computational budgets, with graceful degradation of the reverberation quality.
Perceptual Convolution for Reverberation
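
The sketch below shows the two ingredients in their simplest form: direct FIR convolution of the input with a room impulse response, and truncation of that response. The paper's perceptual criterion for choosing the cut point is not given in the abstract. Here a fixed 40% of the response is kept, matching the 60% reduction quoted above, and the decaying impulse response is synthetic.

```python
import math


def convolve(x, h):
    """Direct FIR convolution y[n] = sum_k h[k] * x[n - k], full output length."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def truncate_ir(h, keep_fraction):
    """Keep the first keep_fraction of the impulse response (a fixed cut, not a perceptual one)."""
    return h[: max(1, int(len(h) * keep_fraction))]


# Hypothetical exponentially decaying IR standing in for a measured hall response.
h = [math.exp(-k / 200.0) * (-1) ** k for k in range(1000)]
h_short = truncate_ir(h, keep_fraction=0.4)
retained_energy = sum(v * v for v in h_short) / sum(v * v for v in h)
```

In direct form, cost scales with the impulse-response length, so a 60% cut removes 60% of the multiply-adds. In an FFT-based block implementation the saving depends on the block partitioning and is generally smaller.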

5993
Erwin Janssen,Derk Reefman,
Recently, a new type of 1-bit Sigma Delta Modulator (SDM), called the Trellis Noise-Shaping Converter, has been introduced. It offers several advantages over a standard SDM, including better stability, signal-to-noise ratio (SNR) and linearity. The major drawback of the Trellis architecture is its large computational requirement. This paper refines the concept of Trellis noise shaping and introduces a new algorithm that offers better performance at drastically reduced cost. A comparative performance analysis of this new algorithm has been performed. Cost savings of multiple orders of magnitude have been achieved while maintaining all the benefits of Trellis noise shaping. Finally, special attention has been paid to critical implementation details.
Advances in Trellis Based SDM Structures
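
For reference, the sketch below shows the kind of standard SDM that Trellis noise shaping is compared against. It is a first-order 1-bit modulator in which each output bit is chosen greedily from the sign of an integrator. A trellis converter replaces this per-sample decision with a look-ahead search over candidate bit sequences, which is not shown here. The input values are arbitrary.

```python
def sdm_first_order(x):
    """Standard first-order 1-bit sigma-delta modulator (greedy bit decision, no look-ahead)."""
    integ, prev = 0.0, 0.0
    out = []
    for s in x:
        integ += s - prev                     # integrate input minus fed-back previous bit
        prev = 1.0 if integ >= 0.0 else -1.0  # 1-bit quantiser
        out.append(prev)
    return out


bits = sdm_first_order([0.25] * 1000)
```

The integrator holds the running difference between the input sum and the output sum. That difference stays bounded for inputs within (-1, 1), so the local average of the bit stream tracks the input.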



(C) 2003, Audio Engineering Society, Inc.