Audio Engineering Society Preprints

AES 112th Convention

Munich, Germany
May 10-13, 2002

AES Preprint Ordering

Single Convention Preprints are available through the AES Preprint Search and Shop facility.

Preprints Listing

Emil Nikolov Milanov,Elena Blagoeva Milanova,
This article presents a practical relationship between the shape of the directional characteristic (cardioid, supercardioid, or hypercardioid) and the parameters of the microphone. An electrodynamic microphone with two acoustical entrances is examined in the field of a spherical sound wave. The relationship between the spatial characteristics of the microphone in spherical and in plane sound waves is shown. The basic ratios between the responses at θ = 0 and θ = 180 degrees (front-to-back ratio) and at θ = 0 and θ = 90 degrees are given, together with their dependence on the microphone parameters in a spherical sound wave. It is proven that at low frequencies the shape of the directional characteristic is a function only of the ratio of the phase angles and of the ratio between the spacing of the two acoustical entrances and the distance from the microphone to the sound source.
Space Characteristics of the Microphones in Spherical Sound Wave Field
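
For orientation, the front-to-back and front-to-side ratios mentioned in the abstract can be illustrated with the textbook plane-wave case. The sketch below assumes the standard first-order pattern r(angle) = a + (1 - a) cos(angle) with commonly quoted coefficients. It is not the spherical-wave relationship derived in the paper, and the names `PATTERNS`, `response`, and `ratio_db` are illustrative only.

```python
import math

# Textbook first-order patterns, plane-wave (far-field) assumption.
# Coefficients are standard reference values, not taken from the paper.
PATTERNS = {
    "cardioid": 0.5,
    "supercardioid": (math.sqrt(3) - 1) / 2,
    "hypercardioid": 0.25,
}


def response(a, angle_deg):
    """First-order directional response a + (1 - a) * cos(angle)."""
    return a + (1 - a) * math.cos(math.radians(angle_deg))


def ratio_db(a, angle_deg):
    """On-axis level relative to the level at angle_deg, in dB."""
    off_axis = abs(response(a, angle_deg))
    if off_axis < 1e-12:
        return math.inf  # ideal null, e.g. cardioid at 180 degrees
    return 20 * math.log10(abs(response(a, 0)) / off_axis)


for name, a in PATTERNS.items():
    print(f"{name}: front/back {ratio_db(a, 180):.1f} dB, "
          f"front/side {ratio_db(a, 90):.1f} dB")
```

In this idealised case the ratios depend only on the coefficient a. The abstract's point is that in a spherical field, at low frequencies, they also depend on the entrance spacing relative to the source distance.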

Juha Backman,
The paper presents a series of measurements on the behaviour of parabolic reflector microphones. The measurements include gain, sound pressure distribution, and directional properties. These results indicate that although it is easy to obtain considerable high-frequency acoustical gain, the usable angle is extremely narrow, and having the source at a non-ideal position (near-field or off-axis) leads to a geometrical distortion of the focal point, reducing the high-frequency gain. The potential benefit of directional microphones and the theoretical gain properties are discussed.
Parabolic Reflector Microphones

Ville Pulkki,
Different spatial sound reproduction techniques are evaluated using a binaural auditory model. Ear canal signals for different microphone techniques and different loudspeaker reproduction are simulated. Directional auditory cues are calculated and directional quality is discussed. The results of recording techniques for stereophonic listening explain the subjective opinions presented in literature: With coincident microphone techniques directionally fairly stable and consistent virtual sources can be produced, and with spaced microphones more spread and ambiguous virtual sources are achieved. In multichannel reproduction, none of the existing microphone techniques are found to produce good directional quality. Both coincident and spaced microphone techniques produce spread virtual sources.
Microphone Techniques and Directional Quality of Sound Reproduction

Juha Merimaa,
The sound field inside an enclosed space depends in a complex way upon interactions between emitted sound waves and different reflecting, diffracting, and scattering surfaces. 3-D microphone arrays provide tools for investigating and recording these interactions. This paper discusses several existing array techniques, introducing a variety of application targets for the HUT microphone probe. Applications include directional measurement, analysis, and visualization of room responses, estimation of room parameters, and analysis of source and surface positions. In a dynamic case the probe can be utilized in source tracking and beam steering, as well as in tracking its own position. Furthermore, the probe can be used to simulate some microphone arrays commonly used in surround sound recording. In each application case both the general theory and its relation to the HUT probe are discussed.
Applications of a 3-D Microphone Array

Marco Berkhout,
In this paper an integrated stereo single-ended class D amplifier is presented. The amplifier is capable of delivering 2 × 100 W into two 4 Ω loads at a supply voltage of 60 V. The amplifier has been realized in an SOI-based BCD technology. A second-order feedback loop is used to suppress supply ripple and non-idealities in the output stage. To make the design suitable for mass production, practical issues such as robustness and EMC have been addressed.
Integrated Class-D Amplifier
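
As a rough plausibility check of the figures quoted in the abstract, the sketch below computes the ideal sine-wave power ceiling of a single-ended output stage. It assumes a lossless stage whose output can swing up to half the supply voltage around mid-rail. That assumption and the function name `max_sine_power_single_ended` are illustrative and not taken from the paper.

```python
def max_sine_power_single_ended(supply_v, load_ohm):
    """Ideal maximum sine power of a single-ended stage.

    Assumes a lossless stage with a peak output swing of supply_v / 2
    (hypothetical idealisation, not the paper's measured behaviour).
    """
    peak_v = supply_v / 2.0
    return peak_v ** 2 / (2.0 * load_ohm)


# 60 V supply, 4 ohm load: (30 V)^2 / (2 * 4 ohm)
print(max_sine_power_single_ended(60.0, 4.0))  # 112.5
```

Under these assumptions the ceiling is 112.5 W per channel, so the stated 100 W per channel is consistent with a 60 V supply once real losses are allowed for.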

Paul van der Hulst,Andre Veltman,Rene Groenenberg,
Switching power amplifiers need an analogue low-pass filter between the power transistors and the speaker to prevent HF switching noise from entering the speaker. Cancellation of the non-linearities, as well as of the (load-dependent) frequency transfer of this filter, is desired, but traditionally very difficult because of stability problems with filter-output feedback. A switching power amplification topology is presented which offers total control over the output filter dynamics and cancellation of the filter characteristics. The proposed amplifier uses a hysteresis circuit with a high-bandwidth capacitor-current feedback as PWM modulator. The output filter is thus an integral part of the modulator, which dramatically improves control of the output. The result is a very high fidelity amplifier which is unconditionally stable, regardless of the load.
An Asynchronous Switching High-end Power Amplifier

Darren Martin Rose,
Modern commercially available, compact, low power audio power amplifiers are mostly designed around one of three main technologies. These are integrated circuit class AB, thick film hybrid class AB, and switch mode power amplifier modules. The decision to use a particular technology is not only based on idealised performance specifications, but also on the performance under realistic operating conditions, and cost-to-performance considerations. In this study, the performance of each amplifier technology is studied in ideal and realistic operating conditions with two amplifier designs for each technology category. Regulated and unregulated power supplies are used, in combination with ideal resistive and real-life complex impedance loudspeaker loads. For a fixed nominal supply voltage, the value of the different technologies with regard to noise, distortion and continuous output power is discussed. This results in an analysis of the cost effectiveness, or value, of currently competing technologies for high quality, low power, compact audio power amplifiers.
A Comparison of Modular State-of-the-Art Switch Mode and Linear Audio Power Amplifiers

Tim John Mellow,
A new vented-box loudspeaker system is introduced that can be tuned to provide a pre-determined frequency response shape over a fairly wide and continuous range of box volumes. A conventional high-pass filter only allows the system to be tuned to give a particular frequency-response shape if the box volume is correct. The conventional filter can be either isolated (i.e. buffered by the amplifier) or non-isolated (i.e. between the amplifier and loudspeaker). The latter could be a passive filter that interacts directly with the complex load-impedance of the loudspeaker. Consequently, the two cases require different box volumes. A new current-feedback filter is introduced that can provide a continuous range of alignments from isolated to non-isolated.
A New Set of Fifth and Sixth-Order Vented-Box Loudspeaker System Alignments using a Loudspeaker Enclosure Matching Filter: Part I

Tim John Mellow,
In Part I, a new vented-box loudspeaker system was introduced that uses a Loudspeaker Enclosure Matching Filter to provide a pre-determined frequency response shape over a fairly wide and continuous range of box volumes. The Butterworth shape was used as an example as this is fairly well known. In this part, alternative frequency response shapes are discussed. Also, some other remaining topics are addressed such as box losses and diaphragm displacement.
A New Set of Fifth and Sixth-Order Vented-Box Loudspeaker System Alignments using a Loudspeaker Enclosure Matching Filter: Part II

Steven W. Hutt,
Loudspeakers used in OEM Automotive sound systems are expected to endure and even perform throughout a wide ambient temperature range. These extreme temperatures can have significant effects on suspension linearity. Loudspeakers used in automotive audio systems are analyzed at various temperatures to study how ambient thermal conditions affect performance.
Ambient Temperature Influences on OEM Automotive Loudspeakers

Wolfgang Klippel,
The voice coil peak displacement Xmax is an important driver parameter for assessing the maximal acoustic output at low frequencies. The method defined in standard AES 2-1984 is based on a harmonic distortion measurement, which does not give a definite and meaningful value of Xmax. After a critical review of this performance-based technique, an amendment of this method is suggested by measuring both harmonic and modulation distortion in the near field sound pressure using a two tone excitation signal. Alternatively, a parameter-based method is developed giving more detailed information about the cause of the distortion, limiting and defects. The relationship between performance-based and parameter-based methods is discussed, and both techniques are tested with real drivers.
Assessment of Voice Coil Peak Displacement Xmax

Jason M. Weida,
Audio test signals have long been used in reliability testing for loudspeakers and other parts of the audio reproduction chain. Several characteristics may be easily obtained from an audio signal in the digital domain, including frequency content, word code histograms, and statistical values such as mean, skewness, kurtosis, and crest factor. The crest factor will be examined as a function of time for non-periodic signals. This collection of metrics will be examined to determine whether certain groupings occur, and the test signals will be examined to see whether they encompass as many qualities of real-world signals as possible. Optimal metrics for a loudspeaker test signal are suggested.
Classification and Comparison of Real vs. Test Audio Signals
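
The statistical values named in the abstract (mean, skewness, kurtosis, crest factor) can be computed from a block of samples as in the minimal sketch below. It uses the usual population-moment definitions and non-overlapping frames for the crest factor over time. The function names and the framing scheme are assumptions for illustration, not the paper's procedure.

```python
import math


def signal_stats(x):
    """Mean, skewness, kurtosis and crest factor of a non-constant signal."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    m2 = sum(c ** 2 for c in centered) / n  # variance
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    return {
        "mean": mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # non-excess kurtosis (Gaussian = 3)
        "crest_factor": peak / rms,
    }


def crest_factor_over_time(x, frame_len):
    """Crest factor per non-overlapping frame; NaN for silent frames."""
    result = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        rms = math.sqrt(sum(v * v for v in frame) / frame_len)
        peak = max(abs(v) for v in frame)
        result.append(peak / rms if rms > 0 else math.nan)
    return result


# Ten full cycles of a sine: crest factor sqrt(2), kurtosis 1.5
sine = [math.sin(2 * math.pi * 10 * k / 1000) for k in range(1000)]
stats = signal_stats(sine)
print(round(stats["crest_factor"], 4), round(stats["kurtosis"], 4))
```

A pure sine sits at a crest factor of about 1.414, whereas music and speech typically show much higher and time-varying values, which is why the abstract tracks the crest factor over time.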

Juha Backman,
An improvement of loudspeaker baffle measurements is presented. The proposed method uses impedance-based measurement for low frequencies, and this response is used to correct an acoustically measured impulse response before the diffraction from the baffle edges is gated out. The low-frequency response can then be used to process the gated response, yielding an improvement in low-frequency responses over previous methods.
Improved Baffle Measurements for Loudspeakers

Thorsten Kastner,Eric Allamanche,Jurgen Herre,Oliver Hellmuth,Markus Cremer,Holger Grossmann,
Much interest has recently been received by systems for audio fingerprinting, which enable automatic identification of audio content by extracting unique signatures from the signal. Requirements for such systems include robustness to a wide range of signal distortions and availability of fast search methods, even for large fingerprint databases. This paper describes the provisions of the MPEG-7 standard for audio fingerprinting which allow for interoperability of fingerprint information generated according to the open standardized specification for extraction. In particular, it discusses the ability to generate scalable fingerprints providing different trade-offs between fingerprint compactness, temporal coverage and robustness of recognition, and gives experimental results for various system configurations.
MPEG-7 Scalable Robust Audio Fingerprinting

Frank Kurth,Andreas Ribbrock,Michael Clausen,
In this paper we present a new method for robust audio identification. Based on our existing audio indexing technology, we developed new methods to query large audio databases with highly distorted versions of an audio signal or parts of them. For instance, the database could be queried by transmitting a piece of music using a cellular phone. In contrast to recent approaches, arbitrary segments of a piece of music are allowed as a query. We demonstrate that, for any audio fragment longer than approximately five seconds, our method is able to identify the corresponding piece of audio along with the exact position of the fragment within the original signal. Our approach relies only on features extracted from the audio signals, making the embedding of, e.g., watermarks obsolete. We furthermore give an overview of our extensive tests using a database of several thousand items of audio (approximately one month of audio), demonstrating the capability of our new method.
Identification of Highly Distorted Audio Material for Querying Large Scale Data Bases

Giorgio Zoia,Aleksandar Simeonov,Ruohua Zhou,Stefano Battista,
Natural and structured audio coding are based on two fundamentally different ways of representing the sound information; combination of the two approaches can lead to efficient and improved storage and transmission of both speech and music. Integration of natural audio tracks with synthetic sound and digital post processing is a challenging effort, especially when the audio rendering requires good quality and precise synchronization with video and graphic information, as it is the case in standardized multimedia frameworks. In this paper those challenges will be analyzed and a new player will be presented, which integrates all the mentioned features in a normative context.
Mixing Natural and Structured Audio Coding in Multimedia Frameworks

Bernhard Grill,Thomas Hahn,Daniel Homm,Kurt Krauss,Georg Ohler,Wolfgang Soergel,Christian Spitzner,
MPEG-4 is designed as a transport-independent, universal coding standard for multimedia content. ISMA, a consortium of 30 companies, has created an open, non-proprietary transport protocol for streaming of MPEG-4 over IP networks. For audio data, the ISMA specification 1.0 contains several modes which include optional interleaved transmission. At Fraunhofer IIS ISMA-compliant server and client applications have been implemented and are used for various performance and robustness tests. In the paper we investigate the characteristics of MPEG-4 Audio over the ISMA protocol. In particular we show how packet loss affects the subjective quality of the audio signal and quantify how well interleaving and concealment can restore a damaged signal.
Characteristics of Audio Streaming Over IP Networks Within the ISMA Standard

Sang-Jo Lee,Sang-Wook Kim,Eunmi Oh,
This paper proposes a real-time audio streaming method that dynamically adjusts to time-varying network loads. Most current Internet services are best-effort services in that the Quality of Service (QoS) is not guaranteed. The proposed streaming method is based on RTP/RTCP/UDP protocols in order to handle transient changes of network loads. To cope with time-varying conditions in networks, initial network bandwidth and available bandwidth are measured. Bit-Sliced Arithmetic Coding (BSAC), which provides efficient Fine Grain Scalability (FGS), is used to adapt to network load fluctuation. The results show that our proposed scheme with the BSAC tool allows real-time continuous play over networks by maximizing QoS of transport data.
A Real-time Audio Streaming Method for Time-varying Network Loads

Andreas Floros,Marios Koutroubas,Nicolas-Alexander Tatlas,John Mourjopoulos,
The paper examines the transmission of audio coded according to the MPEG-1 Layer III standard over the Bluetooth wireless digital protocol. The study presents the effect of the transmission parameters (such as distance, packet type and length) on the achieved wireless bit rate. The paper also analyzes several aspects concerning the real-time implementation of complete Bluetooth-based audio playback setups and addresses the effects of using single-channel, stereo or multichannel audio streams.
A Study of Wireless Compressed Digital-Audio Transmission

Michael Feerick,
Modern real-time digital audio communication systems rely heavily on discrete-time processing at every stage of the journey from source to destination. More often than not implemented in the ubiquitous DSP, processes such as A/D and D/A conversion, audio compression for data rate reduction, sample rate conversion, reverse or inverse multiplexing, transfer of data through the layers of a telecomm protocol stack, etc., all contribute to the end-to-end delay of the audio. Knowledge of what causes these delays can aid system designers and integrators in setting up audio links that minimise such delays. This paper examines the practical details of why and where these delays occur, under the following headings: Physiology of Delay; Conversion; Audio Bit Rate Reduction; Data Transmission Circuits; System View.
How Much Delay Is Too Much Delay ?
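
The kind of end-to-end delay budget the abstract describes can be sketched as a simple sum of per-stage contributions. In the minimal example below, the only derived figure is the buffering delay of a block of samples (block length divided by sample rate). All other stage values are made-up placeholders rather than numbers from the paper, and the names `block_delay_ms` and `stages_ms` are hypothetical.

```python
def block_delay_ms(num_samples, sample_rate_hz):
    """Time needed to collect one block of samples, in milliseconds."""
    return 1000.0 * num_samples / sample_rate_hz


# Illustrative budget; every fixed figure here is a placeholder assumption.
stages_ms = {
    "A/D conversion": 1.0,
    "codec framing (1152 samples at 48 kHz)": block_delay_ms(1152, 48000),
    "transmission": 10.0,
    "D/A conversion": 1.0,
}

total_ms = sum(stages_ms.values())
for stage, delay in stages_ms.items():
    print(f"{stage}: {delay:.1f} ms")
print(f"total: {total_ms:.1f} ms")
```

Listing the stages separately makes it easy to see which one dominates; with these placeholder numbers the frame buffering alone accounts for 24 ms of the 36 ms total.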

Simon C. Busbridge,Peter A. Fryer,Yaxiong Huang,
A digital loudspeaker produces a quantised sound field directly from a digital stimulus without a digital to analogue converter. Reproduction accuracy improves if the conversion is done as late as possible in the audio chain. This paper assesses the relative merits of the two currently competing technologies, digital transducer arrays and multiple voice coils, as well as possible alternatives, by considering the underlying principles. The requirements for crossovers and drive electronics are discussed, and the performance of prototype systems is reported. Methods to extend the present manufacturability to 16 bits include the use of semiconductor mass fabrication techniques in the case of arrays and applications of oversampling and noise shaping. In the latter case the psychoacoustic implications must be understood.
Digital Loudspeaker Technology: Current State and Future Developments

D. B. (Don) Keele Jr.,Ryan J. Mihelich,
The stiffness of a progressive suspension is fairly constant for small excursions and then gets progressively stiffer for larger excursions. When the moving assembly enters the region of increasing stiffness, forces are generated that rapidly reverse its motion much the same as when a bouncing ball hits the ground. Contrary to the common wisdom that predicts a squared-off displacement waveform, the bouncing-ball analogy predicts that the displacement waveform will be turned into a triangle wave. Under some conditions, the moving assembly will repetitively bounce at a frequency tens of times higher than the excitation frequency with acoustic output that exhibits high-level harmonics several times higher in amplitude and frequency than the fundamental. Time-domain simulations and experiments are presented to illustrate the effects.
Suspension Bounce as a Distortion Mechanism in Loudspeakers with a Progressive Stiffness

Joerg W. Panzer,
The cavity of concave diaphragms causes a typical variation of the radiation impedance, which has an effect on the performance of the speaker system in the frequency range where the wavelength is comparable with the dimension of the diaphragm. This paper investigates the radiation impedance and its effects on the sound pressure response of the whole driver with the help of a circularly symmetric boundary element method (BEM) for the infinite baffle. The electrical and mechanical properties of the electrodynamic driver are modelled with lumped elements. An approximation for the wide-band radiation impedance of cone-type diaphragms is given and compared to the exact results from the BEM.
Radiation Impedance of Cones at High Frequencies

Mark Alexander Dodd,
A simple 2D Finite element method (FEM) may be applied within the thermal domain to analyse the behaviour of systems where fluid flow is not a significant factor. FEM’s ability to analyze problems with arbitrary geometric form makes it a powerful alternative to the lumped element approach for modeling loudspeakers. Applying suitable boundary conditions, and using modified material properties, it is possible to model the thermal behavior of a loudspeaker motor with FEM. This approach minimizes errors due to fluid flow and includes heat radiation to the environment. The FEM technique is applied to a new driver topology with external frame and magnet. Static thermal FEM results are compared to those obtained from the driver by measurement. The material properties for air were derived from experimental results. A separate model, including the full mechanical structure of the coil, is used to derive its bulk thermal properties thus allowing a more efficient solution.
The Application of FEM to the Analysis of Loudspeaker Motor Thermal Behaviour

Jean-Pierre Morkerken,Benjamin Parzy,Guillaume Pellerin,Jean-Dominique Polack,
Today, low-frequency reproduction with loudspeakers in vented boxes is limited by two factors: the volume of the box and non-linear airflow through the vent. We propose a novel approach that takes into account aerodynamical parameters, leading to original profiles and improved functioning of the vented box. For example, under certain alignment conditions, there exists a second cut-off frequency below the first one, localised on the lower impedance hump. Using this lower cut-off frequency and an adapted vent profile makes it possible to radiate frequencies around 40 Hz with box volumes smaller than 1 liter and small drivers. A prototype will be demonstrated.
Vented-box Geometry and Low Frequency Reproduction: The Aerodynamical Approach

Bozena Kostek,Pawel Zwan,Marek Dziubinski,
This study aims to extract parameters from musical sounds that can be useful in the musical sound recognition process. For this purpose, time-frequency transform analysis employing various filters is performed on musical sounds representing twelve instrument classes. Three groups of instruments are taken into account: wind, string, and percussive. Examples of wavelet analyses of various musical instrument sounds are presented. On this basis a number of parameters were extracted and statistically analyzed. Parameters that are correlated are removed from the feature vector. In this way the number of parameters in the feature vector can be reduced from dozens to the few most important ones. The feature vectors were then fed to the inputs of an artificial neural network, and classification experiments were performed. Furthermore, an originally developed Frequency Envelope Distribution method was applied to divide the musical signal into harmonic and inharmonic content. Those signals were also parameterized and used in recognition experiments. Some experimental results are presented, together with the conclusions derived from them.
Statistical Analysis of Musical Sound Features Derived from Wavelet Representation

Luis Gustavo Martins,Aníbal J. S. Ferreira,
The extraction of semantic features from audio signals is a research area of great interest for the automatic classification, indexing and retrieval of audio material. Furthermore, real-time interactive systems that use sound as an input may also benefit from the development of such technologies in order to achieve a better interaction with “real-world” events. This paper addresses the analysis of acoustical music signals and, in particular, their transcription to MIDI (Musical Instrument Digital Interface). The development and implementation of an automatic transcription system are presented, and the results obtained are evaluated and discussed.
PCM to MIDI Transposition

Luis I. Ortiz-Berenguer,Francisco J. Casajus-Quiros,
A method to identify notes and chords of piano is presented. A simplified piano model based on physical properties has been developed for generating spectral patterns used for identification of notes and chords. The patterns generated are used to measure correlation with the chord to be identified. The results using typical values for the model parameters are good enough, but they are improved if the model is trained. Successful recognition of three-note piano chords has been carried out.
Pattern Recognition of Piano Chords Based on Physical Model

Balazs Bank,Giovanni De Poli,Laszlo Sujbert,
Real-time physical modeling may have important applications in audio coding and in the music industry. Nevertheless, no efficient solutions have been proposed for modeling the radiation of a string instrument body when commuted synthesis cannot be applied. In this novel multi-rate approach, the string signal is split into two frequency bands. The lower band is filtered by a long FIR filter running at a considerably lower sampling rate, precisely synthesizing the body impulse response up to 2 kHz. In the high-frequency band only the overall magnitude response of the body is modeled, using a low-order filter. The filters are calculated from measurements of real instruments. This enables the physical modeling of string instrument tones in real time with high sound quality.
A Multi-rate Approach to Instrument Body Modeling for Real-time Sound Synthesis Applications

Giuliano Monti,Mark B. Sandler,
This paper illustrates a signal-adaptive analysis technique to transcribe monophonic sounds. Unlike other models, which segment audio relying on time-domain onset analysis, this model principally exploits pitch information. Pitch is locked after detection. The structure of a musical note, i.e. harmonic frequency structure and time-envelope model, is exploited to segment and transcribe the signal. The system is inspired by the Integrated Processing and Understanding of Signals (IPUS) system, where abstract explanation and best front-end configuration are iteratively searched. Onsets and pitch are searched in two different domains and integrated with the system knowledge to give a coherent interpretation of the signal. The system successfully transcribes material ranging from fast trumpet riffs to long sustained violin vibrato.
Pitch Locking Monophonic Music Analysis

Julie Rosier,Yves Grenier,
We present a pitch estimation method for the separation of musical sounds. From a sinusoid plus autoregressive noise model of the composite signal, we successively estimate the fundamental frequencies using the Minimum Description Length principle or Rissanen Criterion. Equivalent to a penalized log-likelihood, this criterion also allows detection of the onset and offset of each sound. The detection is based upon the likelihood gap between the estimated model and a simple autoregressive model. Several simulations for single or multiple musical sounds are performed to illustrate the effectiveness of the method.
Pitch Estimation for the Separation of Musical Sounds
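
The "penalized log-likelihood" reading of the Minimum Description Length principle can be sketched in its generic two-part form: the negative log-likelihood plus half the number of free parameters times the logarithm of the number of samples. The example below uses that generic form with invented log-likelihood values. The paper's exact penalty term and model parameter counts are not given in the abstract, so the candidate figures and the names `mdl_score` and `select_model` are assumptions.

```python
import math


def mdl_score(log_likelihood, num_params, num_samples):
    """Generic two-part MDL: -log L + (k / 2) * log N (lower is better)."""
    return -log_likelihood + 0.5 * num_params * math.log(num_samples)


def select_model(candidates, num_samples):
    """Pick the label of the candidate (label, log_likelihood, num_params)
    with the smallest MDL score."""
    best = min(candidates, key=lambda c: mdl_score(c[1], c[2], num_samples))
    return best[0]


# Invented example values: a noise-only model versus models with sinusoids.
candidates = [
    ("autoregressive only", -520.0, 4),
    ("one sinusoid + AR", -480.0, 7),
    ("two sinusoids + AR", -478.0, 10),
]
print(select_model(candidates, num_samples=1000))
```

With these made-up numbers the second sinusoid improves the fit too little to pay for its extra parameters, so the one-sinusoid model is selected. The same comparison against the autoregressive-only model is the kind of likelihood gap the abstract uses for onset and offset detection.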

Nikolaos Mitianoudis,Mike Davies,
The authors introduce the idea of performing Intelligent ICA to focus on and separate a specific instrument, voice or sound source of interest. This is achieved by incorporating high-level probabilistic priors in the ICA model that characterise each instrument or voice. For instrument modelling, we experimented with various feature sets previously used for instrument or speaker recognition. Prior training of a Gaussian Mixture Model for each instrument was performed. The order of the feature vector, the number of Gaussian mixtures and the training audio data length were kept to reasonably minimum levels.
Intelligent Audio Source Separation using Independent Component Analysis

Chris Duxbury,Mike Davies,Mark B. Sandler,
We present a method using temporal information of musical audio to considerably improve time-scaling with pitch-preservation. This work builds on a signal model where the original signal is split into transient and steady state components, based on an adaptive multiresolution analysis. By considering only the transient signal content, temporal segmentation is achieved with a much higher degree of accuracy than standard onset detection algorithms. Only the segmented steady state regions are then stretched, whilst phase is locked in the temporally masked regions at transients. Despite local variances in stretching factor, rhythm is maintained globally, yielding perceptually very high quality results for a range of complex polyphonic musical signals, at a low computational cost.
Improved Time-Scaling of Musical Audio Using Phase Locking at Transients

Pedro Cano,Eloi Batlle,Harald Mayer,Helmut Neuschmied,
This paper describes the development of an audio fingerprint called AudioDNA designed to be robust against several distortions including those related to radio broadcasting. A complete system, covering also a fast and efficient method for comparing observed fingerprints against a huge database with reference fingerprints is described. The promising results achieved with the first prototype system observing music titles as well as commercials are presented.
Robust Sound Modelling for Song Identification in Broadcast Audio

Sigmund Rothschild,
The aesthetics and techniques used in the creation of early analog tape-based electronic music were influenced by music compositional practices from the 1880s through the first half of the 20th century. Examining these earlier acoustic pieces and their notable compositional techniques has helped provide an aesthetic context for students studying early tape-based electronic music and their analog recording projects at the University of Colorado at Denver.
The Aesthetics and Pedagogy of Analog Tape Techniques in Classic Tape-Based Electronic Music

George Brock-Nannestad,
The paper is a systematic approach to the various factors that go into a sound recording and its reproduction and links them in a novel way with the expressed views of professionals as regards their philosophies for performance, recording, dissemination, etc. Applying the systematic approach to well-known and frequently touted publications as well as more obscure sources, it becomes possible to structure the desires of a producer and performer and perform a comparison with the obtained result. Producers and Tonmeisters in the classical field (such as Culshaw, Burkowitz, Grubb) discussed their philosophies about the same time that the performance practice movement gathered momentum, and producers in popular music exploited the new possibilities (such as George Martin). Recently perception psychology has been introduced while the technical possibilities including surround have expanded and (hopefully) standardised.
The Influence of Recording Technology on Performers and Listeners - A Review

Sean William Davies,
Most analogue recordings have deliberate departures from a level frequency response, and these were made with the assumption that they would be equalised on playback, either through published data specifying the appropriate curve or through the provision of specialised hardware. This paper examines the possible choices and their implications for performing these equalisations before or after the analogue-to-digital conversion. We consider two main categories: 1) stationary curves such as RIAA for disc, IEC and NAB for tape and 2) non-stationary curves such as noise reduction systems by Dolby, EMT, dbx. In addition to the purely technical trade-offs, consideration is also given to the demands on the knowledge of the transfer engineer and the ability of the chosen method(s) to compensate either instantaneously or subsequently for erroneous decisions made at the time of transfer.
Equalisation for Archival Transfer: In the Analogue or the Digital Domain?

Paulo Antonio Andrade Esquef,Luiz Wagner Pereira Biscainho,Vesa Valimaki,Matti Karjalainen,
This paper addresses the restoration of audio signals proceeding from old recordings, and focuses on long-pulse removal. We propose a new two-stage method to estimate the waveform of each long pulse from the observed noisy signal. First, an initial estimate for the pulse shape is obtained via a non-linear filtering scheme called two-pass splitting-window (TPSW) filtering. Then, this estimate is further smoothed through a piecewise polynomial fitting. The degree of smoothness of the estimate can be controlled by adjusting either the TPSW parameters or the length of the segments to be fitted. The proposed method has low computational complexity, it is not constrained by the assumption of shape similarity among pulse waveforms, and can be successfully applied for removing overlapping pulses.
Removal of Long Pulses from Audio Signals Using Two-pass Split-window Filtering
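
As an illustration of the first stage named in the abstract above, the following is a minimal sketch of textbook two-pass split-window filtering, not the authors' implementation. The function names, the parameter values (`n`, `m`, `alpha`) and the restriction to non-negative input sequences are assumptions made for this example; the paper's adaptation to signed pulse waveforms may differ.

```python
import numpy as np

def split_window_mean(x, n, m):
    """Local mean using a split window of half-width n whose central
    taps (|k| < m) are zero, so a sample does not influence its own mean."""
    k = np.arange(-n, n + 1)
    h = (np.abs(k) >= m).astype(float)
    # Divide by the number of active taps overlapping the signal, so the
    # estimate stays unbiased near the edges. Requires len(x) >= 2*n + 1.
    num = np.convolve(x, h, mode="same")
    den = np.convolve(np.ones_like(x), h, mode="same")
    return num / den

def tpsw(x, n=8, m=3, alpha=2.0):
    """Two-pass split-window estimate of the slowly varying part of x
    (assumed non-negative in this sketch)."""
    x = np.asarray(x, dtype=float)
    first = split_window_mean(x, n, m)               # pass 1: rough local mean
    clipped = np.where(x > alpha * first, first, x)  # replace outlying samples
    return split_window_mean(clipped, n, m)          # pass 2: refined mean
```

Widening the window (`n`) or lowering the threshold (`alpha`) gives a smoother estimate, which corresponds to the abstract's remark that smoothness can be controlled through the TPSW parameters.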

Frank Kurth,Andreas Ribbrock,Michael Clausen,
In this paper we present several novel techniques for incorporating fault tolerance in content-based audio search. Our algorithms extend a recently proposed framework for fast index-based search in score-like audio material. Considering queries given as a sequence of notes and the task of matching those queries to a database of musical tunes or melodies, we investigate possible deviations such as wrong notes, missing notes, or differences in the underlying tempo curves. It turns out that our fast index-based search methods may be quite naturally adapted to tolerate the former kinds of deviations, while the case of tempo changes requires a more careful treatment. Here, we propose a new technique for incorporating a tempo tracking mechanism into our fast search algorithms. Our methods have been successfully implemented and tested within a query-by-whistling application presented at the 2001 Internationale Funkausstellung (IFA) in Berlin, Germany. We describe this application and give an overview of our extensive tests.
Efficient Fault Tolerant Search Techniques for Full-Text Audio Retrieval

Nuno Fonseca,
Due to wiring questions, computer integration, or simply as a way to increase the features of audio systems, the need for an audio networking solution increases every day. It is possible, however, to draw on decades of computer networking know-how and use proven technologies such as Ethernet. With a few adjustments, we can have an audio and MIDI networking solution that achieves our goals and scales for the future without excessive cost.
Foundations for Sound over Ethernet (SoE)

Gerhard Spikofski,Siegfried Klar,
The problem of sometimes extreme loudness differences in radio and television programs has been well known for a long time. With the introduction of new digital techniques combined with parallel transmission of digital and analogue signals, the problem of loudness differences is again especially significant. Based upon relevant levelling recommendations and a newly developed loudness algorithm, solutions for avoiding loudness differences in radio and television are presented.
On Levelling and Loudness Problems at Television and Radio Broadcast Studios

Massimo Navarri,Francesco Tordini,Emanuele Bellati,Romolo Toppi,
As the automotive audio market calls for higher loudspeaker performance and shorter time-to-market, there is a need for improved loudspeaker design techniques. Here, an innovative synthesis approach is proposed. First, the non-linear characteristic of each loudspeaker component (spider, cone, and magnetic circuit) is individually either measured or simulated using FEA software. Then, given technical specifications, the data pertinent to each component is used to simulate the performance of the loudspeaker. This way, any solution can be explored at synthesis level before prototype assembly, and components can be chosen to guarantee the best fit to customer specifications. A comparison between simulated and measured data is presented to validate the model.
A Novel Synthesis Approach to Loudspeaker Design

Fabio Bozzoli,Enrico Armelloni,Angelo Farina,Emanuele Ugolotti,
The paper describes the recording/reproduction technique and the subjective listening experiment aimed at assessing the effect of background noise on the perceived quality of sound reproduced inside a car. The noise inside 4 different cars was recorded at various speeds both with a binaural microphone and with a Soundfield microphone. These background noise recordings are reproduced inside a special listening room by means of a sophisticated reproduction chain, designed so that the sound pressure presented at the ears of the listeners is the same as inside the original car. A computer-based system is finally employed for collecting subjective responses to sound stimuli, consisting of music or speech reproduced on an automotive sound system in the presence of the background noise.
Effects of the Background Noise on the Perceived Quality of Car Audio Systems

Angelo Farina,Alberto Bellini,Marco Romagnoli,
The paper discusses various DSP algorithms that can be employed on low-cost platforms suitable for mass production of automotive sound systems. The analysis takes into account traditional IIR and FIR filtering schemes, dual-rate and hybrid approaches, and new algorithms such as Warped FIR (WFIR) or frequency-domain partitioned convolution (BruteFIR). The comparison is not limited to the implementation, cost and performance of the different processing schemes, but also covers the pressing problem of computing the optimal filter coefficients for each of these schemes starting from measurements taken inside the car cockpit, making use of real-life examples. The results show that both IIR and FIR structures are capable of good results on low-cost DSP systems; the more advanced algorithms will probably become competitive as soon as a new generation of floating-point DSP processors becomes available for low-cost applications.
Different Approaches for the Equalization of Automotive Sound Systems

Alberto Bellini,Monica Tesauri,Emanuele Ugolotti,Eraldo Carpanoni,Giacomo Frassi,
The project deals with the design and realization of a dedicated board for the processing of audio signals for automotive applications, named DIGIcar. It features two stereo inputs (24 bit resolution) and two stereo outputs. The prototype was realized on a 4 layer PCB with SMD components. The DIGIcar board was tested stand-alone with an Audio Precision System 2022. Experiments were also performed inside a car, where the DIGIcar board was interfaced with a four channel power amplifier and connected between the audio source and the car loudspeakers. A development tool in MATLAB was used to synthesize suitable equalizing filters from standard acoustic car measurements. The filters are then stored in the board's FLASH EEPROM. A few options are available to tailor the equalizer to different cars. Listening tests and acoustic measurements show the effectiveness and the functionality of the designed board.
Design of a Digital Audio Equalizer for Automotive Applications

Stefan Schmitt,Malte Sandrock,Jochen Cronemeyer,
The objective of this work is to establish a single channel noise reduction algorithm for speech enhancement integrated in DSP systems. The main emphasis is on spectral subtraction. The chosen algorithm is based on a Minimum Mean Square Error Log Spectral Amplitude (MMSE-LSA) approach. One of the crucial tasks for good results, i.e. natural and intelligible speech in combination with well attenuated noise and low spectral distortion, is a balanced estimation and weighting of the noise magnitude spectrum.
Single Channel Noise Reduction for Hands Free Operation in Automotive Environments

Rolf Schirmacher,
For many years, active noise control (ANC) has been a widely discussed concept for the reduction of unwanted sounds. In addition to minimizing noise, this technique can be extended to achieve a target sound which is different from silence. This concept is called Active Sound Design (ASD). It does not only reproduce a target sound, but monitors the sound and uses a closed control loop. This results in high quality reproduction and allows the target sound level to be lower or higher than the original sound level without the system. One major application is automotive engine noise. ASD allows the engine sound to be defined almost freely, independent of the physical engine used in the car. The paper presents the concept of ASD and reports on applications of this technique for automotive interior and exterior sound.
Active Design of Automotive Engine Sound

Suthikshn Kumar,
The quality of speech (QoS) deteriorates in a noisy environment due to the fixed volume setting available in present-day mobile phones. This paper proposes the use of a smart acoustic volume controller for mobile phones, based on fuzzy logic, for improving the QoS in stationary or non-stationary noise. The fuzzy volume controller makes use of the noise level and class information generated by a system for fuzzy pattern classification of background noise. The smart acoustic volume controller is extended to be useful for the hearing impaired by using the audiogram to design the fuzzy rule base. The design and simulation of the fuzzy volume controller are discussed along with the implementation details. We report on the use of two tools, ECANSE and FuzzyControl++, for the simulation of the fuzzy volume controller for mobile phones.
Smart Acoustic Volume Controller for Mobile Phones

Ville-Veikko Mattila,
Multidimensional scaling and preference mapping were used for the perceptual analysis of speech quality in mobile communications. 41 processing chains, representing, e.g., transmission of speech over mobile networks, were studied. 30 screened subjects were used in the quality test and 15 screened and trained subjects in the MDS test. Based on an external profiling of the auditory characteristics, the dimensions appeared to relate to the naturalness of speech, limitation of the frequency band, smoothness of speech, a bubbling sound alternating with the signal and noisiness of speech. The Phase I ideal point model was used to predict the quality with an average error of about 6 %, to study the interaction between the attributes and the linearity of the attributes.
Ideal Point Modelling of Speech Quality in Mobile Communications Based on Multidimensional Scaling (MDS)

Natanya Ford,Francis J. Rumsey,Tim Nind,
Results from preliminary investigations studying graphical elicitation techniques suggest that a graphical assessment language, whereby listeners use their own non-verbal descriptors to depict spatial attributes of a reproduced sound, may be effective for demonstrating differences in perceived image skew and scene width. This paper investigates the use of a graphical assessment language for evaluating subjective differences in car audio systems with respect to their distortion of stereo images from sub-optimal listening locations. The study compares the image obtained from a surround processing system and conventional two channel stereo reproduction, analysing the graphical depictions obtained using conventional statistical methods. Source material for the investigation employs both time and amplitude variation to position instruments within the reproduced stereo scene.
Subjective Evaluation of Perceived Spatial Differences in Car Audio Systems Using a Graphical Assessment Language

Michael C. Kelly,Anthony I. Tew,
The cocktail party effect describes the human ability to direct attention to a single sound source amongst a mixture of competing sources. Under certain conditions signal fragments from these sources can be removed without creating a perceptible effect (often referred to as the continuity illusion). In this study we evaluate the impact of removing fine spectral detail in the regions of spectro-temporal overlap between a pure tone and a white noise source using Fourier spectral gating. We go on to discuss the use of spectral gating in the treatment of natural signals in relation to the discontinuities that are introduced and their effect on the continuity illusion.
The Continuity Illusion in Virtual Auditory Space

Virginia Best,Simon Carlile,Andre van Schaik,
The effect of spatial separation on the ability of subjects to hear both sounds in a pair of concurrent broadband sounds presented in virtual auditory space was examined. Results suggested that this ability relied strongly on differences in the binaural cues delivered by the sounds, and stimulus pairs could not be separated if they delivered the same binaural cues. A relatively simple model was developed to explain these data using a combination of computational tools relevant to auditory processing: (a) a 128-channel cochlea model, (b) spike generation representing the temporal structure of the energy in each channel, (c) within-channel cross-correlation of left and right ear spike patterns, (d) exclusion of low-energy channels and (e) aggregation of cross-correlation results over remaining channels.
The Perception of Multiple Broadband Noise Sources Presented Concurrently in Virtual Auditory Space

Kazuho Ono,Ville Pulkki,Matti Karjalainen,
Binaural modeling of the coloration of sound perceived due to multiple coherent sources is studied under the condition in which sounds arrive at a listener successively within a certain time delay. Our former work showed that a binaural model of timbre perception describes the coloration almost independently of the directional perception, based on listening tests using 1-Bark-width bandlimited noise and pulse trains. The present work extends that study to verify the timbre model, including listening experiments with broadband noise and pulse trains. The results show that our model still agrees well with the listening test results for broadband signals in general, but also show clear deviation from the bandlimited noise case, especially at low frequencies.
Binaural Modeling of Multiple Sound Source Perception: Coloration of Wideband Sound

Koray Ozcan,Simon C. Busbridge,Peter A. Fryer,
The reproduction of sounds in rooms generates multiple auditory cues whose relative importance is still not clearly understood. This paper presents an experimental investigation to determine the relative hierarchy of conflicting cues. Subjects were asked to auralise different stimuli when two cues were put into conflict. The relative importance of interaural time, phase and intensity differences and of the pinnae, motion and reverberation cues has been determined. The use of windowed tone bursts allowed multiple variables to be controlled while simulating real signals. The results show the change in auralisation as one cue is varied in opposition to another. The aim of this work is to identify the major and minor roles of different auditory cues in acoustic virtual reality.
Determination of the Relative Hierarchy of Audible Cues in Conflict

William L. Martens,Atsushi Marui,
Multidimensional perceptual scaling analyses were performed for a set of stimuli that were generated by submitting two pre-recorded guitar performances to a popular effects processor designed to model a variety of guitar amplifiers. Within three characteristic types of amplifier distortion (British Crunch, Combo 335, and Twin Drive), the tone color of the output was varied using three nominal output character settings (Normal, Edge, and Punch). As it was only the variation in timbre and tone coloration that was of interest, the loudness of the processor outputs was equalized prior to listening sessions in order to determine the most salient perceptual attributes of these amplifier models as their output character was varied. This analysis separated out two salient tone-coloration dimensions from a third dimension of timbral variation. This third dimension corresponded to a timbral characteristic particular to the three modeled amplifier types. Interpretation of the meanings of the three dimensions was aided by the results of a semantic differential analysis for the same sounds using bipolar adjective scales. The timbral quality distinguishing the three modeled amplifiers was well described by the verbal attributes "wildness" and "hardness." The tone coloration variation introduced particularly by the Punch output character settings was most highly correlated with ratings on "thickness" and "heaviness" scales. A straightforward relation was also found for the use of the "sharpness" and "muddiness" scales in describing tone coloration variation introduced by the Edge output character setting, though interpretation was complicated somewhat by the correlation of these ratings with the timbral quality that differed between the amplifiers. The results of this study provided the basis for a graphical user interface to computer-controlled musical effects processing that is more immediately accessible to a wide range of users.
Multidimensional Perceptual Scaling of Tone Color Variation in Three Modeled Guitar Amplifiers

Martin Dietz,Lars Liljeryd,Kristofer Kjorling,Oliver Kunz,
Spectral Band Replication (SBR) is a novel technology which significantly improves the compression efficiency of perceptual audio codecs. SBR reconstructs the high frequency components of an audio signal on the receiver side. Thus, it takes the burden of encoding and transmitting high frequency components off the encoder, allowing for a much higher audio quality at low data rates. In December 2001, SBR was chosen as the Reference Model for the MPEG standardization process for bandwidth extension. The paper will highlight the underlying technical ideas and the achievable efficiency improvements. A second focus will be a description of current and future applications of SBR.
Spectral Band Replication, a Novel Approach in Audio Coding

A.C. den Brinker,Erik Schuijers,A.W.J. Oomen,
Compared to the traditional waveform coding standards employing subband or transform coding, parametric coding is regarded as a technique that allows even further reduction of bit-rates. An example of this is the MPEG-4 V2 HILN parametric coder. This coder targets medium quality at very low bit-rates. This level of quality was expected to be typical for parametric coders. In response to a call for proposals by MPEG, Philips has submitted a parametric coder targeting a significantly higher quality level. Currently, this coder is being further improved as a collaborative development in the course of the "MPEG-4 Extension I" standardization process.
Parametric Coding for High-Quality Audio

Ralph Sperschneider,Daniel Homm,Louis-Henry Chambat,
This paper describes an error resilient source coding approach for variable length codes. The scheme to be presented has been designed to make the Huffman coded scalefactor data of MPEG-2/4 Advanced Audio Coding (AAC) more resilient against transmission errors and has been adopted by the ISO/IEC 14496-3 (MPEG-4 Audio) standard. The scalefactor coding scheme enables bidirectional decoding and enhanced error detection, taking into account the differential coding approach used for scalefactor data. Along with the basic ideas associated with the coding scheme, the paper presents appropriate error concealment strategies, which allow for sophisticated scalefactor reconstruction. Subjective test results are presented in order to illustrate the performance of the discussed approach in the presence of adverse channel conditions.
Error Resilient Source Coding with Differential Variable Length Codes and its Application to MPEG Advanced Audio Coding

Alberto Duenas,Rafael Perez,Begona Rivas,Enrique Alexandre,Antonio Pena,
This paper discusses a new DSP based real-time implementation of the MPEG-2/4 AAC standards using a set of non-optimal solutions to reduce the computational load. These solutions comprise a novel implementation of an MDCT-based psychoacoustic model, managing a birth/death scheme which aims to overcome false power spectrum estimates due to the cosine transform, a smoothing of the scalefactors guided by an estimate of the increase in the quantization noise, and a loop-free bit allocation method based on an overestimation of the noise level. These techniques provide both robustness and efficiency to the scheme. Other fixed-point DSP programming considerations required to achieve the real-time implementation, including the choice of a fast MDCT algorithm, are presented.
A Robust and Efficient Implementation of MPEG-2/4 AAC Natural Audio Coders

Ashish Aggarwal,Kenneth Rose,
In this paper, we present quantization techniques to improve the low rate performance of a scalable audio coder. We show that a conditional enhancement-layer quantizer is effective in exploiting the statistical dependence of the enhancement-layer signal on the base-layer quantization parameters. It fundamentally extends our prior work on compander domain scalability, which was shown to be asymptotically optimal in the context of entropy coded uniform scalar quantization, to systems with non-uniform base-layer quantization. Moreover, in the important case that the source is well modeled by the Laplacian density, we show that the optimal conditional quantizer is implementable with only two distinct switchable quantizers. Hence, major savings in bit rate are recouped at virtually no additional computational cost. Further improvement in performance is achieved at the expense of computational complexity when the proposed quantization scheme is incorporated within an efficient "trellis-based" search for the quantization parameters. For example, we implemented the proposed scalable coder with four 16 kbps layers within the MPEG-AAC framework and achieved performance approximately that of a 56 kbps non-scalable coder on the standard test database of 44.1 kHz audio.
Approaches to Improve Quantization Performance Over the Scalable Advanced Audio Coder

Jon McClintock,
This document seeks to determine the cost of Audio Coding in terms of its potential impact on the listener base, to identify the attributes and effects of competing digital audio compression algorithms and in conclusion to propose methodologies for maintaining the integrity of audio program material through the DAB delivery chain.
Maintaining Audio Quality in the IBOC Era

Stefan Meltzer,Reinhold Bohm,Fredrik Henn,
Combining the novel SBR technology (Spectral Band Replication) with MPEG AAC audio coding results in the most powerful audio codec available today. Its unprecedented compression efficiency is a key success factor for newly emerging digital broadcasting systems such as Digital Radio Mondiale (DRM). DRM is an industry consortium that was founded in 1998 to create a new digital broadcasting standard for the frequencies below 30 MHz (long, medium and short wave). Finalised in January 2001 and recommended by ITU, the DRM standard offers significantly improved audio and reception quality in combination with a large coverage area, thus offering a wide variety of new opportunities for broadcasters. In spite of being limited in bitrate due to the requirement to keep the existing 9/10 kHz channel spacing in those frequency bands, DRM is able to offer very high audio quality by using SBR-enhanced MPEG AAC audio coding. The paper will focus on the advantages of AAC+SBR for use in digital broadcasting, describe the DRM system as a whole and discuss in more detail the audio coding tools available in the DRM specification.
SBR Enhanced Audio Codecs for Digital Broadcasting Such as "Digital Radio Mondiale" (DRM)

Thomas Ziegler,Andreas Ehret,Per Ekstrand,Manfred Lutzky,
mp3PRO combines the advantages of a well-known worldwide standard with the potential of Spectral Band Replication (SBR). Using SBR as the key technology avoids the usual bandwidth limitation for low bitrate coding, which can be observed with traditional audio codecs like mp3. Enhancing mp3 with SBR results in full audio bandwidth without annoying artifacts, even at bitrates below 64 kbps. In addition, the specific features of SBR allow mp3PRO to be forward and backward compatible with mp3. mp3PRO single chip solutions from two major chip manufacturers will be available in the first quarter of 2002. The paper discusses the key features of mp3PRO and its compatibility with mp3. Furthermore, it will outline the requirements for implementation in software and hardware.
Enhancing MP3 with SBR: Features and Capabilities of the New MP3PRO Algorithm

Gerhard Steinke,
The more the quality increases on the part of studio engineering – i.e. microphones, pick-up technology, delivery, storage – the more weaknesses become evident at the beginning and the end of the transmission chain: (a) Recording studios for classical music mostly do not have the quality parameters required for the artists and the information to be recorded. As one commendable example, the large recording hall in Berlin and its characteristics are discussed. Experiences with multichannel sound productions in this hall are covered. (b) On the other side, standard listening conditions of control rooms – see AES document TD 1001.0.01-05 „Multichannel surround sound systems“ – have to be considered from time to time to show open questions and the uncertain situation at home.
Room-acoustical and Technological Aspects for Multichannel Recordings of Classic Music

Slawomir K. Zielinski,Francis J. Rumsey,Soren Bech,
The subjective effects of controlled high frequency limitation of the audio bandwidth on the assessment of audio quality were studied. The investigation was focused on the standard 5.1 multichannel audio set-up (Rec. ITU-R BS.775-1) and limited to the optimum listening position. The effect of video presence on audio quality assessment was also investigated. The results of the formal subjective test indicate that it is possible to limit the high frequency content of the centre or of the rear channels without significant deterioration of the audio quality for some of the investigated programme material types. Video presence has a small effect on audio quality assessment.
Subjective Audio Quality Trade-offs in Consumer Multichannel Audio-visual Delivery Systems. Part I: Effects of High Frequency Limitation

Francis J. Rumsey,Wyn Lewis,
The effect of rear microphone spacing in a five-channel omni-directional array was evaluated in respect of the subjective attributes ‘envelopment’, ‘spaciousness’ and ‘naturalness’. Preference results were also obtained and a range of different programme material types was evaluated. Results suggest that, although large differences were not noticed, spacing has a statistically significant effect on envelopment and spaciousness, with spacings larger than the critical distance of the room giving rise to higher levels of these attributes. Distinct preference was shown for spacings of three and four metres as opposed to the extremes of two and five metres, in the particular acoustic environment tested, suggesting that beyond a certain microphone spacing the reproduced spatial impression becomes less pleasing or natural.
Effect of Rear Microphone Spacing on Spatial Impression for Omnidirectional Surround Sound Microphone Arrays

David Griesinger,
The apparent position of speech and music sound sources was investigated using both a two channel loudspeaker array and a three channel loudspeaker array. The results showed that a sine-cosine pan law was reasonably accurate for the three channel array, but consistently produced images that were wider than expected with a two channel array. The discrepancy was investigated using a headphone model. We found the apparent position depends strongly on the spectrum of the source, with speech frequencies tending to dominate the overall impression.
Stereo and Surround Panning in Practice
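
For readers unfamiliar with the sine-cosine pan law mentioned in the abstract above, here is a minimal sketch for a two channel pair. The function name and the 0-to-1 position convention are assumptions made for this example and are not taken from the paper.

```python
import math

def sine_cosine_pan(position):
    """Return (left_gain, right_gain) for a pan position in [0, 1],
    where 0 is hard left and 1 is hard right (hypothetical convention)."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie in [0, 1]")
    angle = position * math.pi / 2  # map the position onto a quarter circle
    return math.cos(angle), math.sin(angle)

left, right = sine_cosine_pan(0.5)  # centre position: equal gains
```

Because cos² + sin² = 1, the summed power of the two channels stays constant across the pan range. This says nothing about where the image is actually heard, which is the question the paper investigates.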

Edwin Pfanzagl-Cardone,
The reappearance of surround sound in the form of the DVD's 5.1 format has led sound engineers to re-evaluate current microphone techniques used for stereo recording. The author has measured various 2-channel main-microphone signals in order to prove that their correlation is strongly frequency dependent. Due to properties of the human hearing mechanism as well as standard loudspeaker playback arrangements, frequencies below approximately 800 Hz are particularly critical for faithful spatial reproduction in a stereo as well as a 5.1 surround environment, and therefore deserve special attention as early as the recording process. Conclusions from the measurements are drawn, and a microphone system ("AB-Polycardioid Centerfill") well suited for 5.1 surround is proposed.
In the Light of 5.1 Surround: Why "AB-Polycardioid Centerfill" (AB-PC) is Superior for Symphony-Orchestra Recording

Charles Fox,Wade McGregor,
The paper describes a multiple microphone array method and mounting system of a modular design. The Modular Microphone Array facilitates accurate and repeatable transducer configurations suited to a variety of studio and field recording situations and standards, including surround sound recording. The design supports a wide variety of transducer types (microphones) and is independent of any specific manufacturer's model or type. The novel overall shape, with attendant capability to be modified, is a further attribute. The light weight of this design ensures portability and ease of deployment in diverse field recording situations: musical performance recording, film and video production, broadcast, sound effects recording, ambient soundscape recording, and other professional audio applications.
A Modular Microphone Array for Surround Sound Recording

Michael Williams,
Individual Segment Coverage for a given Multichannel Microphone System is normally considered (if at all) only in the horizontal reference plane. However, any microphone system has sound pick-up throughout the spherical zone surrounding the microphone system. One must therefore know with reasonable precision the operational characteristics of any microphone system in use, at all angles of elevation to the reference plane, before being able to predict the localisation of sound sources outside the reference plane, the distribution of the reverberant field, or the localisation of early reflections within the reproduced sound image. Knowledge of these characteristics can also considerably influence the mechanical design of a variable microphone support system for multichannel microphone arrays.
Multichannel Microphone Array Design: Segment Coverage Analysis Above and Below the Horizontal Reference Plane

Helmut Wittek,Gunther Theile,
The useful pick-up sector of a stereophonic microphone is indicated by the recording angle. It is based on phantom source shift data due to resulting inter-channel level and/or time differences. However, in the related literature the recording angles given for known microphone arrangements such as XY, Blumlein, AB, ORTF, OCT, etc. differ because they are determined from different interpretations of these data. It is proposed to rely on the so-called localisation curve, which describes the directional translation of sound sources within the loudspeaker basis and corresponds with directional balancing results in practical recording situations. The newly defined "Recording Angle_75%" is proposed as the suitable key value in Tonmeister practice.
The Recording Angle – Based on Localisation Curves

Aníbal J. S. Ferreira,
MDCT based perceptual audio coders shape the quantization noise according to simple psychoacoustic rules and general behavioral aspects of the audio signal such as stationarity and tonality. As a consequence, the resulting compressed audio representation has little semantic value, which makes MPEG-7 oriented operations such as feature extraction and audio modification difficult to perform directly in the compressed domain. First results in this direction are reported using an enhanced version of an MDCT based perceptual coder that implements sinusoidal modeling and subtraction directly in the MDCT frequency domain, as well as spectral envelope modeling and normalization. The implications for the coding efficiency are also addressed.
Perceptual Coding using Sinusoidal Modeling in the MDCT Domain

Manuel Rosa-Zurera,Pedro Vera-Candeas,Nicolas Ruiz-Reyes,Francisco Lopez-Ferreras,Damian Martinez-Munoz,
The application of the matching pursuit algorithm for extracting sinusoidal components and transients from audio signals is proposed. The resulting residue is perceptually modelled as a noise-like signal. This multi-part model (Sines + Transients + Noise) is used for audio coding purposes. First of all, an accurate detection of transients in audio signals is required. When a transient is detected, energy-adapted matching pursuits are performed using a wavelet-packet based dictionary and a dictionary of sinusoidal functions. Otherwise, the matching pursuit algorithm is only applied with the harmonic dictionary. In both cases, the resulting residue is then modelled as a noise-like signal using the Equivalent Rectangular Bandwidth (ERB) model. The parameters of this multi-part model are efficiently quantized, taking into account psycho-acoustical information, so as to assure high perceptual quality at low bit rates. The combination of all these ideas results in nearly transparent audio coding at bit rates lower than 32 kbps for most of the CD-quality one channel audio signals considered for testing.
Energy-adapted Matching Pursuits in Multi-parts Models for Audio Coding Purposes
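
As a rough illustration of the greedy decomposition that the abstract above builds on, the following is a minimal sketch of plain matching pursuit over a small dictionary of unit-norm cosine atoms. It is not the authors' energy-adapted variant: the dictionary, the helper names (`dot`, `cosine_atom`, `matching_pursuit`) and the test signal are all assumptions chosen for illustration.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def cosine_atom(k, n):
    """Unit-norm cosine atom with k cycles over n samples (hypothetical dictionary)."""
    raw = [math.cos(2.0 * math.pi * k * i / n) for i in range(n)]
    norm = math.sqrt(dot(raw, raw))
    return [v / norm for v in raw]

def matching_pursuit(signal, dictionary, n_iter):
    """Greedy matching pursuit; returns ([(atom_index, coefficient)], residual)."""
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        # Correlate the residual with every atom and keep the strongest match.
        corr = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(corr[i]))
        coef = corr[best]
        if abs(coef) < 1e-12:
            break
        picks.append((best, coef))
        # Subtract the selected atom's contribution from the residual.
        residual = [r - coef * a for r, a in zip(residual, dictionary[best])]
    return picks, residual

N = 64
atoms = [cosine_atom(k, N) for k in range(1, 9)]
x = [3.0 * a + 1.5 * b for a, b in zip(atoms[2], atoms[5])]
picks, residual = matching_pursuit(x, atoms, n_iter=2)
```

In a Sines + Transients + Noise scheme of the kind described, whatever remains in `residual` after the pursuit would be handed to the noise model; here the two atoms are orthogonal, so the residual is essentially zero after two iterations.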

Kris Hermus,Werner Verhelst,Patrick Wambacq,
Total Least Squares (TLS) algorithms automatically decompose (audio) frames into a number of exponentially damped sinusoids. This can provide for more efficient modeling than plain sinusoidal modeling, especially in the case of transitional frames. Straightforward implementations of TLS optimize a SNR criterion. In our implementation we apply TLS in a subband scheme in which the number of damped sinusoids is both frame and subband dependent. This is made possible through the use of perceptual information provided by the MPEG-I psycho-acoustic model I. Experiments on different audio tracks provide proof of concept for our perceptual ESM, and illustrate the significant reduction in modeling components compared to a non-perceptual ESM.
Perceptual Audio Modeling Based on Total Least Squares Algorithms

Kelvin H. C. Eng,Dong Yan Huang,Say Wei Foo,
High order linear predictive coding (LPC) analysis, as a pre-processing stage in an audio codec designed for wideband arbitrary audio signals, is found to be particularly beneficial for audio samples of an instrumental nature compared to that of a vocal nature. With increasing LPC orders, it is imperative to keep the bits consumed by the pre-processing stage constant as a proportion of the total bit rate. To achieve this, the properties of the Line Spectrum Pairs (LSP) parameter are exploited in a proposed multistage vector quantization scheme for high order LPC. Notably, incorporating LSP differences in the design of the quantizer was the most efficient, with no perceptible differences at an average of 1.645 bits/sample, compared to the case of scalar quantization, which is used as a benchmark at 2 bits/sample. Particularly, using LSP differences as a bit allocation mechanism proves to be especially effective in dealing with clips of a percussive nature.
High Order LPC and Line Spectrum Pairs

Kelvin H. C. Eng,Dong Yan Huang,Say Wei Foo,
A low delay variable bit rate audio codec, implemented for wideband arbitrary audio signals, combines inter-frame and intra-frame bit allocation in an adaptive scheme. An outer-loop uses a moving average noise-to-mask ratio (NMR) indicator and a bit reserve to adaptively allocate bits from frames of a lesser perceptual significance to frames of a greater perceptual significance. An inner loop allocates the available bits to each line of the spectrum via an adaptive algorithm based on a weighting function derived from the masking thresholds. Through informal listening tests, the proposed new bit allocation method resulted in an improvement in audio quality over most samples, as opposed to one using a single adaptive intra-frame loop. Particularly, these improvements were more perceptible at the lower bit rates of about 36 kbps as opposed to the higher bit rates of about 64 kbps. Numerical results also indicate a savings of 8 – 10% of the total bit rate.
A New Bit Allocation Method for Low Delay Audio Coding at Low Bit Rates

Christof Faller,Frank Baumgarte,
Binaural Cue Coding (BCC) is an efficient representation for spatial audio that can be applied to stereo and multi-channel audio compression. Conventional mono audio coders are enhanced with BCC for coding of stereo and multi-channel audio signals. There is only a relatively small overhead in bitrate for encoding stereo and multi-channel audio signals compared to the bitrate of the mono audio coder alone. The presented implementations have low complexity and are suitable for real-time applications. Results from subjective tests suggest that the proposed scheme provides better audio quality for encoding of stereo audio signals than conventional perceptual transform audio coders for a wide range of bitrates.
Binaural Cue Coding Applied to Stereo and Multi-Channel Audio Compression

Frank Baumgarte,Christof Faller,
Intensity Stereo Coding (ISC) is a joint-channel audio coding tool that is part of the ISO/MPEG standards. ISC can introduce severe distortions if applied to full bandwidth or to audio signals with a dynamic or wide spatial image. In contrast, Binaural Cue Coding (BCC) is a systematic approach for representing auditory spatial cues which includes ISC as a subset. BCC is independent of the time/frequency resolution used by the coder, thus it can be optimized for spatial image reproduction. Subjective listening tests confirm that ISC is significantly compromised by an inappropriate time/frequency resolution and that BCC has superior quality and robustness.
Why Binaural Cue Coding is Better than Intensity Stereo Coding

Sascha Moehrs,Jurgen Herre,Ralf Geiger,
Perceptual audio coding of high quality audio signals is nowadays widely used. To reproduce the audio data, the bitstream is expanded into an uncompressed audio format by the decoding algorithm. As shown previously, it is feasible to recover the encoding compression parameters from the decoded audio signal and even translate a decoded audio signal back into its original bitstream representation. This technique is referred to as "inverse decoding" and has several interesting applications, including tandem-resistant re-encoding of audio signals. The paper illustrates practical results obtained by the first working implementation of an "Inverse Decoder" based on the popular MP3 coder. The performance of the algorithm is evaluated in terms of reconstruction precision and computational complexity. Finally, algorithmic issues are discussed.
Analysing Decompressed Audio with the "Inverse Decoder" - Towards an Operative Algorithm

Robert E. (Robin) Miller,
Two multi-channel microphone techniques for natural music and sound effects reproduction are experimentally compared. Simultaneous surround sound recordings of several genres of music and ambience are made in concert hall, studio, and outdoors. Trained listeners subjectively evaluate the abilities and tradeoffs of each system to recreate accurate panoramic localization and spatial impression of opera, bluegrass with audience participating, flute quartet, brass quintet, marching bands with surrounding crowd and building echoes, and 360° “Walkabout” azimuth test. Differing speaker layouts for 5.1 and “Panor-Ambiophonic” Surround are shown to satisfy two distinct listening audiences, which are further divided into home, automotive, and PC markets. An approach to recording level-setting, compatible production, and delivery formats are introduced to satisfy these diverse end uses.
Contrasting ITU 5.1 and Panor-ambiophonic 4.1 Surround Sound Recording Using OCT and Sphere Microphones

Culann James mac Caba,
The evolution of surround audio presents new challenges to recording engineers. While there may be some consensus on best practice for the 5.1 playback format, content compatibility for future formats is increasingly important. This paper describes practical experiences of these issues in a Music Recording context, using Ambisonic technology to achieve excellent spatial realism without changing classic workflow practices. Furthermore, the paper shows how this approach allows for simultaneous mixes to be made for playback scenarios as yet undefined, including possible "with-height" formats.
Surround Audio That Lasts: Future-proof Ambisonic Recording and Processing Technique for the Real World

Edo Maria Hulsebos,Diemer de Vries,
In order to correctly reproduce the acoustic wave field in a hall over a large listening area through a Wave Field Synthesis reproduction system, impulse responses are nowadays measured along arrays of microphone positions. These measurements could be used directly for reproduction if the positions of the microphones in the hall correspond to the positions of the loudspeakers in the reproduction array (holophony). However, this approach is not very flexible and the amount of data and real-time processing required is extremely large. Therefore a relatively small circular microphone array is used instead and the measured data are spatially and temporally parameterized to obtain more playback flexibility and to reduce the amount of data and real-time processing without sacrificing perceptual quality and listening area size. In this paper these parameterization techniques are discussed and applied to circular array measurements done in the Amsterdam Concertgebouw. The main application of all this is future high quality audio with realistic room acoustic reproduction over a large listening area.
Parameterization and Reproduction of Concert Hall Acoustics Measured with a Circular Microphone Array

Renato Pellegrini,Ulrich Horbach,
A perception-based parametric model to design auditory virtual environments using a small number of reflections and a model of diffusion was used to simulate virtual environments in a binaural (2 channels) and a wave field synthesis (more than 28 channels) reproduction system. Different perceptual parameters such as the room size, the distance to the source(s), the positioning of the source(s), the apparent source width, and more can be adjusted by simple control parameters. Rather than recreating a real room as accurately as possible, the main goal was to design nice-sounding environments for speech and music reproduction. The achieved quality of the resulting virtual rooms is compared to the quality of measured real rooms.
Perception-based Design of Virtual Rooms for Sound Reproduction

Nick Zacharov,Matti Fredriksson,
This study considers the spatial sound reproduction requirements to reproduce environmental sounds in a subjectively natural manner. Recordings for this work were performed in six different sound environments, including different styles of musical performances and also other environmental sounds. Both Soundfield microphone and binaural techniques were used for these recordings. The recordings were processed for reproduction over seven different multichannel sound systems in a standard listening room, ranging from stereo to periphonic reproduction. A formal listening test was arranged employing 14 listeners to evaluate the naturalness of reproduction through these systems in a standard listening room. The results indicate that a large number of reproduction channels is not always necessary to reproduce subjectively natural sound. This paper presents the results of the study.
Natural Reproduction of Music and Environmental Sounds

Werner P.J. de Bruijn,Marinus M. Boone,
When in an audio-visual system spatialized audio reproduction is combined with 2D video projection this has effects on the resulting audio-visual experience of observers. Specifically, a mismatch between perceived auditory and visual source directions may occur for observers that are not in the ideal viewpoint of the video projection. Subjective experiments were carried out to investigate the effects in the context of a life-size video conferencing system. Results show that the non-identical perspectives of the audio and video reproductions indeed have a significant influence on subjects' evaluation of the total system. Solutions are proposed to improve the matching of the audio and video scenes for a large listening area.
Subjective Experiments on the Effects of Combining Spatialized Audio and 2D Video Projection in Audio-visual Systems

Andrzej Czyzewski,Bozena Kostek,Piotr Odya,
The problem of influencing surround sound perception by video content was addressed employing subjective testing procedures in which experts listened to the sound with and without the video image present and provided their answers. Results of experiments demonstrated in which cases and how video may affect the localization of virtual sound sources. The obtained data were then analyzed by means of modern techniques of intelligent data exploration and knowledge discovery, allowing some hidden relations between semantic descriptors of subjective impressions to be found. Finally, based on the results of data analysis, a set of rules concerning mastering of multichannel audio to accompany various types of video content was derived. Some results of this study will be presented and discussed in the paper.
Making Surround Audio Considering Image Proximity Effect

Tobias Neher,Francis J. Rumsey,Tim Brookes,
This paper presents some preliminary results from an ongoing study into methods for the training of listeners in subjective evaluation of spatial sound reproduction. Exemplary stimuli were created illustrating two spatial attributes: individual source width and source distance. Changes in each of the two attributes were highly controlled in an attempt to allow uni-dimensional variation of their perceptual effects. The stimuli were validated with the help of an experienced listening panel and then used to instruct naïve listeners. By comparing the listeners' performances at ranking a number of stimuli before and after the training sessions the effectiveness of the adopted method was quantified.
Training of Listeners for the Evaluation of Spatial Sound Reproduction

Amber Naqvi,Francis J. Rumsey,
This paper presents the results of computer simulation and the actual room measurements of reflection patterns from the active reflector arrangement in a reference listening room which will be used to create artificial reflections in a five speaker, surround listening configuration. This forms the third and final phase of design experiments relating to the panel arrangement to create a perceptually reflection free zone involving the analysis of computer modelling and room measurement results. Results of a pilot listening test using well-defined stimuli with artificial reflections generated by the DML loudspeaker arrangements are also presented.
The Active Listening Room: Part 3, A Subjective Analysis

Akira Nishimura,Nobuo Koizumi,
A method of sampling jitter measurement based on time-domain analytic signals is proposed. Computer simulations and measurements were performed in order to compare the proposed method to the conventional method, in which jitter is evaluated based on amplitudes of sideband spectra for observed signals in the frequency-domain. The results show that the proposed method is effective in that this method: 1) provides good temporal resolution as a result of the direct derivation of jitter waveform, 2) achieves higher precision in measurement of jitter amplitude, and 3) can separate sidebands that originated in sampling jitter from sidebands caused by amplitude fluctuations, while observing the power spectra, because both amplitude fluctuation waveform and jitter waveform can be derived from analytic signals.
Measurement of Sampling Jitter in Analog-to-Digital and Digital-to-Analog Converters Using Analytic Signals
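
The idea in the abstract above, deriving both the jitter waveform and the amplitude fluctuation from an analytic signal, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' measurement procedure: the test tone, the jitter and amplitude-modulation parameters, the naive DFT and all function names are hypothetical, and the tone frequency is assumed to be known exactly.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for a short illustrative frame."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    twiddle = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    out = [sum(x[m] * twiddle[(k * m) % n] for m in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out

def analytic_signal(x):
    """Zero the negative-frequency bins and double the positive ones."""
    n = len(x)
    spectrum = dft(x)
    gain = [0.0] * n
    gain[0] = 1.0
    for k in range(1, (n + 1) // 2):
        gain[k] = 2.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    return dft([s * g for s, g in zip(spectrum, gain)], inverse=True)

def jitter_and_envelope(x, f0, fs):
    """Return (jitter in seconds, amplitude envelope) for a sine test tone at f0."""
    jitter, envelope = [], []
    for n, zn in enumerate(analytic_signal(x)):
        cycles = (f0 * n / fs) % 1.0
        ref = cmath.exp(-2j * math.pi * cycles)
        # For a(t)*sin(phase) the analytic signal is -1j*a*exp(1j*phase);
        # multiplying by 1j and the reference leaves only the phase deviation.
        deviation = cmath.phase(1j * zn * ref)
        jitter.append(deviation / (2.0 * math.pi * f0))
        envelope.append(abs(zn))
    return jitter, envelope

# Assumed test case: 12 kHz tone, 1 ns peak jitter at 1 kHz, 1 % AM at 2 kHz.
FS, F0, N = 48000.0, 12000.0, 480
JIT_PEAK, JIT_FREQ = 1e-9, 1000.0
AM_DEPTH, AM_FREQ = 0.01, 2000.0
signal = []
for n in range(N):
    t = n / FS
    tj = t + JIT_PEAK * math.sin(2.0 * math.pi * JIT_FREQ * t)
    amp = 1.0 + AM_DEPTH * math.sin(2.0 * math.pi * AM_FREQ * t)
    signal.append(amp * math.sin(2.0 * math.pi * F0 * tj))
jitter, envelope = jitter_and_envelope(signal, F0, FS)
```

Because the phase deviation and the magnitude of the analytic signal are read out separately, the 1 kHz jitter and the 2 kHz amplitude modulation end up in different output waveforms, whereas both would appear as sidebands around the tone in a power spectrum.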

Pierre Touzelet,Menno van der Veen,
This work is an addition to a previous AES preprint (4643), where a new generalised push-pull tube amplifier topology was introduced, allowing an easy evaluation of maximum output power, optimal primary impedance of the output transformer and the damping factor, by a direct simulation of equations on a computer. But direct simulation is not always the right tool to use when accurate information has to be derived for small input signals. In that case the appropriate tool is a systematic use of Taylor expansions. The aim of this paper is to show how to determine these expansions and which new results and insights can be derived from them, like new stability criteria, output impedance variations, harmonic distortions and effective voltage gains.
Small Signal Analysis for Generalised Push-pull Tube Amplifier Topology

Stefan Brachmanski,
The paper describes a new measurement system and method of speech transmission quality evaluation called "modified intelligibility test with forced choice" (MIT-FC). The MIT-FC method provides fully automated measurement of speech intelligibility in rooms. The listener's task is to select on a computer monitor which of the alternative utterances presented visually was spoken. The computer program automatically calculates speech intelligibility and the factor of speech quality. The results are compared with RASTI measurements. The measurement system is based on a local computer network and allows measurements to be carried out at several places in the room (depending on the size of the local computer network).
The Automation of Subjective Measurements of Speech Intelligibility in Rooms

Ron Raangs,Erik Druyvesteyn,
With the use of a three-dimensional particle velocity and pressure sensor the active sound intensity can be measured. A simple three-dimensional calibration method for such a sensor is described along with different methods for measuring direct particle velocity and active sound intensity in the far field in a reverberating room. The mean sound intensity can be calculated from the product of the instantaneous pressure and particle velocity, but also from the calculated direct pressure and direct particle velocity. In an ideal situation the sound intensities are equal in magnitude and direction; differences yield information about the error and can help increase the accuracy of measurement.
Sound Source Localization Using Sound Intensity Measured by a Three Dimensional Pu-probe
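
The first calculation mentioned in the abstract above, the mean intensity as the time average of instantaneous pressure times particle velocity, can be sketched as below. This is a minimal illustration rather than the authors' processing chain: the plane-wave test signal, the air constants and the function names are assumptions.

```python
import math

RHO = 1.21   # assumed air density in kg/m^3
C = 343.0    # assumed speed of sound in m/s

def active_intensity(pressure, velocity):
    """Time-averaged p*u per axis; velocity is a list of (ux, uy, uz) samples."""
    n = len(pressure)
    return tuple(
        sum(p * u[axis] for p, u in zip(pressure, velocity)) / n
        for axis in range(3)
    )

def magnitude_and_direction(intensity):
    """Split an intensity vector into its magnitude and a unit direction vector."""
    mag = math.sqrt(sum(c * c for c in intensity))
    return mag, tuple(c / mag for c in intensity)

# Assumed test case: 1 Pa peak, 1 kHz plane wave travelling along (0.6, 0.8, 0).
FS, F, N = 48000.0, 1000.0, 480
travel = (0.6, 0.8, 0.0)
p = [math.cos(2.0 * math.pi * F * n / FS) for n in range(N)]
u = [tuple(pn / (RHO * C) * d for d in travel) for pn in p]
mag, direction = magnitude_and_direction(active_intensity(p, u))
```

The intensity vector points along the direction of propagation, so in this idealised free-field case the source lies in the opposite direction, `(-0.6, -0.8, 0)` as seen from the probe.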

Matti Karjalainen,Paulo Antonio Andrade Esquef,Poju Antsalo,Aki Makivirta,Vesa Valimaki,
Discrete-time analysis and modeling of reverberant and resonating systems has many applications in audio and acoustics. In a recent paper (AES110, Preprint 5290) we formulated techniques for the estimation of modal decay parameters from noisy response measurements, targeting systems such as room reverberation and modal decay as well as musical instrument modeling. In this paper we extend the methodology to AR and ARMA modeling of measured responses by all-pole and pole-zero filters. In addition to an overview of standard techniques we propose a spectral zooming technique that is useful for resolving very closely positioned modes and high-density modal clusters. Sensitivity to background noise is also studied. Application cases are taken from analysis and modeling of room responses, loudspeaker-room equalization, and estimation of parameters for musical instrument modeling.
AR/ARMA Analysis and Modeling of Modes in Resonant and Reverberant Systems

Russell Mason,Francis J. Rumsey,
A controlled subjective experiment was undertaken to evaluate the relative merits of objective measurement techniques for predicting selected perceived spatial attributes of reproduced sound. The stimuli consisted of a number of anechoic recordings of single sound sources that were reproduced in a simulated concert hall and captured using a number of simulated multichannel microphone techniques. These were reproduced in a listening room and the subjects were asked to judge the perceived source width and perceived environment width of each stimulus. A number of objective measurements were made at the listening position and these were then compared with the subjective judgements. The results showed that a perceptually-grouped measurement of the experimental stimuli using a technique based on the interaural cross-correlation coefficient matched the subjective judgements most accurately, though the difference between this measurement and a number of other types was small.
A Comparison of Objective Measurements for Predicting Selected Subjective Spatial Attributes
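
For orientation, the interaural cross-correlation coefficient on which the best-matching measurement above is based can be sketched as the maximum of the normalised cross-correlation between the two ear signals within a lag range of about ±1 ms. The sketch below is a plain broadband version and does not include the perceptual grouping used in the paper; the function name, the lag window default and the test signals are assumptions.

```python
import math
import random

def iacc(left, right, fs, max_lag_ms=1.0):
    """Maximum absolute normalised cross-correlation within +/- max_lag_ms."""
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    norm = math.sqrt(sum(v * v for v in left) * sum(v * v for v in right))
    n = len(left)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        best = max(best, abs(acc) / norm)
    return best

# Assumed test signals: a noise burst, a delayed copy, and independent noise.
FS = 48000.0
rng = random.Random(0)
burst = [rng.gauss(0.0, 1.0) for _ in range(200)]
left = [0.0] * 100 + burst + [0.0] * 100
delayed = [0.0] * 110 + burst + [0.0] * 90   # same burst, 10 samples later
other = [rng.gauss(0.0, 1.0) for _ in range(400)]

iacc_delayed = iacc(left, delayed, FS)
iacc_other = iacc(left, other, FS)
```

A pure interaural delay leaves the coefficient at 1, while independent signals at the two ears give a much lower value, which is why measures of this kind are commonly related to perceived width.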

Craig Jin,Virginia Best,Simon Carlile,Thomas Baer,Brian Moore,
Human sound localization of speech stimuli was tested using an accurate head-pointing task in three sound conditions: (i) broadband (22 Hz to 16 kHz); (ii) low-pass (22 Hz to 8 kHz); (iii) spectrally-smeared broadband. The experiments were conducted in virtual auditory space (VAS) so that reduced frequency selectivity (a consequence of cochlear hearing loss that has effects similar to spectral smearing) could be simulated in normally-hearing listeners. Broadband noise localization provided a control. Results show that broadband speech is not localized as accurately as broadband noise and that there is a significant reduction in localization accuracy for both the low-pass and spectrally-smeared sound conditions. The data show that accurate high-frequency spectral information is important for speech localization.
Speech Localization

Jan Berg,Francis J. Rumsey,
Assessment of the spatial quality of reproduced sound is becoming more important as the number of techniques and systems affecting such quality increases. The presence of dimensions forming spatial quality has been indicated in earlier experiments by using attributes as descriptors for the dimensions. These attributes have been found relevant for describing the spatial quality of stimuli subjected to different modes of reproduction. In this paper, new attributes are elicited and the applicability of these and previously encountered attributes for assessment of spatial quality is tested in the context of new stimuli, recorded by means of 5-channel microphone techniques and reproduced through a 5.0 system.
Validity of Selected Spatial Attributes in the Evaluation of 5-channel Microphone Techniques

Yufei Tao,Anthony I. Tew,Stuart J. Porter,
Differential pressure synthesis (DPS) estimates the free field acoustic pressure on the boundary of an object from its geometry. The DPS method pre-calculates a database of pressure changes caused by introducing orthogonal shape deformations to a template shape. Pressures on the object are synthesised by summing weighted pressure components from the database. Given an appropriate template, the accuracy of DPS approaches that of the proven boundary element method (BEM). Yet computationally its performance is much closer to that of simple spherical head approximations. Pressures are synthesised for a 2D shape and a 3D KEMAR head model and are compared with those from direct application of the BEM. Applications and constraints of DPS are discussed, including its potential for the rapid estimation of head-related transfer functions.
The Differential Pressure Synthesis Method for Estimating Acoustic Pressures on Human Heads

Andreas Silzle,
The aim of the present work is the reproduction of a five-channel signal over headphones. Informal listening tests show that a large number of different HRTFs do not have the desired level of quality. The frontal localisation was either elevated or completely undefined. The coloration in all directions - even with correct IPTFs - was far too strong for a high quality reproduction. In order to overcome this problem two HRTF sets together with IPTFs were selected from a large database. These transfer functions were subsequently tuned by a tuning expert. The main methods used for tuning were smoothing and parametric equalizing of amplitude and phase with individual settings for every direction and for the left and right ear. Listening experiments that confirm the tuning results for a panel of listeners are presented and discussed. The resulting transfer functions have clearly reduced coloration and improved global localisation although with modest improvements in the frontal position.
Selection and Tuning of HRTFs

Takashi Takeuchi,Philip A. Nelson,Martin Teschl,
In order to examine the characteristics of various elevation positions of the control transducers for binaural reproduction over loudspeakers, an analysis is performed of both the spectral and dynamic cues that relate to localisation. The frequency response of the plant that relates transducer outputs to ear pressure signals suggests that control transducer positions will be promising at positions in the frontal plane above the listener's head. The analysis of the dynamic cues induced by unwanted head rotation also strongly supports the use of transducer locations in the frontal plane. A subjective experiment is performed and the control transducer location above the head clearly shows an advantage with respect to the robustness against the transmission of false dynamic information.
Elevated Control Transducers for Virtual Acoustic Imaging

Peter John Philipson,Jonathan Hirst,Simon Woollard,
The assessment of multichannel audio systems is often based on the ability of a particular loudspeaker configuration to generate the appropriate interaural cues for a listener to localise a stable virtual image between the loudspeakers. A common method used to judge the performance of a system is a subjective listening test where listeners are asked to indicate the direction of the virtual image. This work investigates an alternative method where a dummy head can be used to capture the cues and analysis of these signals allows a third octave band prediction of where the image is most likely to be perceived. The MATLAB implementation is tested using two common multichannel systems and the results are compared with subjective results.
Investigation into a Method for Predicting the Perceived Azimuth Position of a Virtual Image

Michael Lannie,Vadim Sukhov,
Most of the Russian cinema halls were designed and built in the past for one channel sound reproduction. Now they are old fashioned and not popular with the public. It is well understood by the owners of the movie theatres that only modern rooms with multi-channel sound reproduction may be profitable. Acoustic design of 11 existing movie theatres (19 halls) was done during the last 2 years. The main parts of this design are: architectural acoustics of the rooms, sound insulation of the room's surfaces, noise control of ventilation systems, acoustic computer modelling of the loudspeaker systems for the screen and surround sound channels. The results of this design and of the acoustic measurements made in the renovated halls are presented. Some general tendencies for the renovation of the existing movie theatres have been identified.
Acoustic Renovation of the Cinema Halls in Russia

Ernst-Joachim Voelker,Wolfgang Teuber,
The very famous old Weimar hall was replaced by a new building with a concert hall in 1999. The new hall is also used as a multi-purpose hall for various activities such as concerts, musicals, pop concerts and conferences. The media design includes sound and video with computer based connections. The hall contains a sound system with hidden speakers above the stage and in the back part of the hall although the ceiling is about 12 meters high. The reverberation time of the hall is approximately 1.7 s in a closed smaller room compared to around 2.2 s in a more open space created by moving walls. The sound system was designed for the long reverberation time. In addition, the Weimar Hall houses a conference center for which the sound system also had to be designed. The different systems are connected through a digital routing system based on fiber optics, including the control rooms. This hall is now used for many events. The paper will describe results and experiences.
Quality Criteria for the Sound System in the New Weimar Hall in Weimar, Germany

Ben H.M. Kok,Erdo Groot,
High quality monitoring is hard to achieve in any confined space. This is particularly true in mobile recording trucks, due to the extremely limited space. This paper describes the design of such a truck where the requirement was high quality monitoring in 5.1 format, suited for classical recording. The design shows that the limited space in a truck does not imply that there is no room for acoustic treatment. Measurements in the completed truck show that an accurate monitoring environment has been realized and the acoustic qualities are not hindered by the limited dimensions. Considerations on the choice of recording equipment as well as the experiences in its first year of use will be discussed.
A Mobile for Classical Recording

Wolfgang Teuber,Ernst-Joachim Voelker,
In the European Central Bank (ECB) in Frankfurt, new conference and press conference rooms with audio and media-technical equipment will be outlined and described. Conference systems, with more than 150 delegate stations, spread over 3 rooms and including two separate audio buses can be used for e.g. presidential or participant positions. Using separate controls at the mixing desk, various frequency responses and separate limiters, each station is capable of individual settings. The operation procedure is carried out PC-controlled through a touch screen with the possibility of storing system configurations. In each conference room compact loudspeaker boxes with good frequency response are mounted above the perforated surface of the ceiling. Before installation, tests and verifications were carried out through the program Modeler to check the acoustical transmission of ceiling material and to calculate sound fields. Media technical installations were integrated with video projection, cable and glass fiber connections to ensure a link to O.B. Vans as well as to radio/TV stations over existing broadband fiber networks. Concepts, design, measurement results and experiences will be presented.
2-Bus Conference System for the European Central Bank (ECB), Frankfurt

Todd Welti,
From an intuitive standpoint, putting a large number of subwoofers at different locations in a room might seem likely to excite room modes in a more “balanced” manner, as compared to a single subwoofer. This idea has particular appeal where there is not a single listening location, but rather a listening area. In this case one looks for consistency of acoustical response within this area. Typical approaches to this problem involve exciting all modes evenly, or trying not to excite modes at all. With this in mind, an investigation was made to determine if using a large number of subwoofers is advantageous, and in particular what configurations give the best results. Several interesting and surprising results were uncovered along the way.
How Many Subwoofers are Enough?

Ville Pulkki,Tapio Lokki,Lauri Savioja,
The traditional image-source method for room acoustic modeling neglects both diffraction and diffusion. In this study the image-source method is extended to include edge diffraction in the form of diffractive image sources. Methods to find diffracting edges from a geometric model of a room, and to visualize diffracted image sources are presented. Issues to limit computational complexity and memory usage in the image-source method are reviewed and implemented. Diffractive image sources are visualized for two test cases, and impulse responses are calculated and compared with measured ones.
Implementation and Visualization of Edge Diffraction with Image-source Method
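
As background to the abstract above, the traditional (specular only) image-source method can be sketched for the simplest case: first-order image sources in a rectangular room with one corner at the origin. This sketch deliberately leaves out the edge diffraction that the paper adds, and the room dimensions, positions and function names are assumptions for illustration.

```python
import math

def first_order_images(src, room):
    """Mirror the source in each of the six walls of a box room (corner at origin)."""
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = list(src)
            img[axis] = 2.0 * wall - src[axis]
            images.append(tuple(img))
    return images

def arrival_times(src, rcv, room, c=343.0):
    """Sorted arrival times in seconds of the direct sound and first-order reflections."""
    sources = [tuple(src)] + first_order_images(src, room)
    return sorted(math.dist(s, rcv) / c for s in sources)

# Assumed geometry in metres.
room = (5.0, 4.0, 3.0)
source = (1.0, 1.0, 1.0)
receiver = (4.0, 3.0, 1.5)
images = first_order_images(source, room)
times = arrival_times(source, receiver, room)
```

Each image source stands for one specular wall reflection, so `times` gives the direct sound followed by six early reflections; higher orders are obtained by mirroring the images again, which is where the complexity and memory issues mentioned in the abstract arise.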

Peter Mapp,
A number of metrics exist to measure the potential speech intelligibility performance of a sound system. Most of these are derived from techniques developed for assessing natural speech intelligibility in auditoria rather than directly for sound systems whose acoustic characteristics and performance are highly frequency dependent. The paper compares measurements made on 81 sound systems that covered a wide variety of applications and acoustic environments. It was found that intelligibility metrics such as STI, RaSTI, C50 and C80 showed a high degree of intercorrelation whereas, surprisingly, other measures such as RT60, EDT and % Alcons showed little or no correlation.
Relationships between Speech Intelligibility Measures for Sound Systems

Manfred Krause,
After the invention of paper tapes coated with magnetic powder in 1928, AEG and BASF launched a joint venture in order to develop a tape recording system. After stepwise improvements between 1931 and 1934, AEG presented the model K2, named "Magnetophon", in 1935, together with the acetyl cellulose tape type "C" from BASF. In 1940, AC biasing in the model K4 improved the signal-to-noise ratio to 60 dB, surpassing all known recording systems at that time. The steps of development up to the 1950s will be presented in detail.
The Legendary "Magnetophon" of AEG

Ernst-Joachim Voelker,Sabine Fischer,
On October 26, 1861, Phillip Reis presented a lecture to the members of the 'Physikalischer Verein' in Frankfurt, Germany. The subject was entitled 'The transmission of tones via galvanic current over wide distances'. His telephone consisted of a thin membrane which produced, with the frequency of sound, a change of the current by just switching a contact. This was easily transmitted to the receiver at a distance of around 200 m. The body of a violin was shaken by a small electromagnetic device which was known from telegraphy. The violin radiated the sound in such a way that a melody could be heard. Even some words and short sentences were transmitted. For his work and his many improvements he received the title 'Meister des Freien Deutschen Hochstifts'. This 'Hochstift' and its scientific work was the forerunner of the later-founded Frankfurt University. The work of Phillip Reis was published in technical periodicals at that time. Some years later, his telephone was produced by a small company in Frankfurt which sold the sender and the receiver all over the world. The small village of Friedrichsdorf, located in the Taunus hills near Frankfurt, was, at that time, not interested in having wire connections from village to village. In the 1860s many small German states existed before the unification in 1871. Ten to fifteen years later the need for communication lines using the new telephone of Graham Bell was extreme. Now Phillip Reis was remembered as the first inventor. For him, however, it was too late. He died in 1874. The idea of a so-called contact microphone was overtaken by the electro-magnetic system of Bell, changing the amplitude of transmitted current. The next step was the invention of Edison's carbon microphone. The paper will describe the work of Phillip Reis at the beginning of a new century of telecommunication technology.
Phillip Reis - From the First Telephone to the First Microphone

Erhard E. Werner,
When the microphone celebrated its 100th anniversary, so many famous products and corresponding literature had been accumulated that only premium authors succeeded in giving short but also comprehensive surveys. Meanwhile microphone technology has been refined in various properties making a "complete" survey even less possible. Therefore some highlights of the principle and of main properties have been selected to show what happened from the first realization to the present stage.
Selected Highlights of Microphone History

Roman Beigelbeck,Heinrich Pichler,Fritz Paschke,
The sound pressure and the sound velocity in the near field of a linear loudspeaker array are investigated from a mathematical point of view. Four different piston models, employing circular, elliptical and (two different models of) rectangular pistons, are explicitly calculated and closed-form solutions using higher transcendental functions are provided. As a representative overview of results, near field directional diagrams for different frequencies, different distances of the field point, as well as for varying phases of signal, are shown. Based on these results some optimization methods for phases of signal, loudspeaker geometry and array geometry are developed.
Near Field Analysis and Optimization of Linear Loudspeaker Arrays

Elena Prokofieva,Kirill V. Horoshenkov,Neil Harris,
An experimental investigation is reported into the effects of a porous layer on the radiated sound intensity from distributed mode loudspeakers (DMLs). For an unbaffled panel, the results suggest that attaching an absorbent layer behind the panel leads to a smoothing of the spectra of sound intensity. When a specially developed enclosure was used, an improvement in the low frequency response of the DML panel was observed. The inclusion of a porous layer in this enclosure further reduced the fluctuations of the emitted sound spectra, and smoothed the resonance peaks in the low frequency range.
Intensity Measurements of the Acoustic Emission from a DML Panel

Lee David Copley,Trevor J. Cox,Mark R. Avis,
The usefulness of Distributed Mode Loudspeakers (DMLs) in arrays has been investigated. The design goal is an array that evenly distributes energy over a hemi-disc. A model has been developed to predict trends of DML array radiation and compared with measurements. This model enables the performance of established array technologies to be tested. When several DML panels are positioned in an array, spatial aliasing results, as would be expected. Conventional array techniques, such as number theory modulation, can improve the radiation characteristics. Complete omnidirectionality is not achieved.
Distributed Mode Loudspeaker Arrays

Etienne Corteel,Ulrich Horbach,Renato Pellegrini,
A new filter design method to synthesize the wave field of a given virtual source in a horizontal plane is presented. The reproduction of the wave field is performed by a novel “multi-exciter” distributed mode loudspeaker, which contains a special array of transducers, attached on a single panel. Below the spatial aliasing frequency, the diffuse behaviour of the panel as well as the individual directional characteristics of the transducers are taken into account accurately. Above that corner frequency, a modified design procedure is proposed, which employs energy considerations and spatial averaging. A number of such modules can be cascaded to a system, which allows true spatial audio reproduction over a large listening area.
Multichannel Inverse Filtering of Multiexciter Distributed Mode Loudspeakers for Wave Field Synthesis

Malcolm John Hawksford,Neil Harris,
Active diffuse loudspeaker arrays can be implemented by incorporating a distribution of small radiating elements each characterized by processes that embed a unique diffuse impulse response. The properties of synthetic diffuse impulses are investigated both individually and collectively and algorithmic generation techniques presented. The diffuse properties of such synthetic arrays are confirmed both by polar response descriptors and spatial correlation techniques.
Diffuse Signal Processing and Acoustic Source Characterization for Applications in Synthetic Loudspeaker Arrays

David Zaucha,
Digital audio systems are unlike conventional analog systems, in which signals can take any value between a minimum and a maximum and occur continuously in time. Digital audio systems use finite precision in representing signals and coefficients and in performing arithmetic operations. Consequently, system performance is determined by the precision that is used throughout the system. This paper discusses the factors that influence the performance of Infinite Impulse Response filters in high performance audio applications using fixed point arithmetic.
Importance of Precision on Performance for Digital Audio Filters
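
As a rough illustration of why coefficient precision matters for fixed-point IIR filters, the following minimal sketch quantizes the two denominator coefficients of a second-order section and recovers the resulting pole position. It is not taken from the paper: the fixed-point format (1 sign bit, 1 integer bit, the rest fractional), the function names, and the example values are all assumptions chosen for illustration.

```python
import math


def quantize(value, bits):
    """Round value to a signed fixed-point grid (assumed format: 1 sign bit,
    1 integer bit, bits - 2 fractional bits, i.e. range [-2, 2))."""
    step = 2.0 ** -(bits - 2)
    return round(value / step) * step


def pole_after_quantization(freq_hz, radius, fs_hz, bits):
    """Quantize a1, a2 of the denominator 1 + a1*z^-1 + a2*z^-2 and return
    (pole radius, pole frequency in Hz) of the quantized filter.
    Returns (None, None) if the quantized poles are no longer complex."""
    theta = 2.0 * math.pi * freq_hz / fs_hz
    a1 = quantize(-2.0 * radius * math.cos(theta), bits)
    a2 = quantize(radius * radius, bits)
    # The poles of z^2 + a1*z + a2 are complex only if a1^2 < 4*a2.
    if a2 <= 0.0 or a1 * a1 >= 4.0 * a2:
        return None, None
    r_q = math.sqrt(a2)
    theta_q = math.acos(-a1 / (2.0 * r_q))
    return r_q, theta_q * fs_hz / (2.0 * math.pi)


if __name__ == "__main__":
    for bits in (16, 24, 32):
        print(bits, pole_after_quantization(20.0, 0.999, 48000.0, bits))
```

With these assumed values (a 20 Hz pole pair of radius 0.999 at a 48 kHz sample rate), the 16-bit coefficient grid is too coarse to keep the poles complex, whereas a pole pair at 1 kHz survives 16-bit quantization with only a small frequency shift. Low-frequency, high-Q sections are where coefficient word length tends to matter most.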

Prabindh Sundareson,
High performance DSPs have been used extensively for the implementation of signal transforms in transform based audio coders. While the first generation of DSPs featured single stage MAC (Multiply-Accumulate) blocks, the current generation of DSPs features dual-MAC hardware blocks. Though fast algorithms are available for implementation of transforms, a re-look at the algorithms from this semi-parallel architecture point of view is beneficial as it leads to more efficient implementations. This paper looks specifically at the family of lapped transforms, quantifies the implementation efficiency of traditional fast optimizations on architectures with this type of semi-parallel computing capability, and derives algorithmic methods of increasing this efficiency.
A Re-evaluation of Fundamental Transform Structures for Efficient Implementation on Semi-parallel DSP Architectures

Hiroshi Kato,
Just recently, a new type of delta-sigma converter, the Trellis Noise Shaping Converter, has been introduced. When it is used to generate a bit stream in 1-bit digital audio format, its trellis structure with the Viterbi algorithm enables more efficient use of data bits, which yields better performance in stability, signal-to-noise ratio and tonal behavior. It solves most of the performance problems caused by the harsh non-linearity inherent in 1-bit quantization. This paper briefly examines its architecture and performance. Emphasis is put on its simple and more predictable quantization noise spectrum in comparison with that of conventional modulators.
Trellis Noise-Shaping Converters and 1-bit Digital Audio

Derk Reefman,Erwin Janssen,
New Sigma Delta modulator (SDM) topologies for use in Super Audio CD (SACD) applications, called Sigma Delta pre-correction (SDPC), are introduced, which allow the generation of ultra-high quality DSD. Spurious peaks, which are known theoretically to exist in SDMs, are present at levels well below -165 dB, even if undithered. Even a slight amount of dither will further reduce these signals to levels which are undetectable with standard numerical precision.
Enhanced Sigma Delta Structures for Super Audio CD Applications

Derk Reefman,Erwin Janssen,Joshua D. Reiss,Mark B. Sandler,
A method is presented for improving current coding efficiency in DSD signals. The goal of this work is to explore new compression techniques which are tailored to the DSD format and which are meant to complement the current lossless DST compression practice used for SACD. The new technique builds on principles illustrated in previous papers. The method makes use of the highly oversampled character of DSD. Example implementations and results have been obtained. Losses to stability and signal-to-noise ratio have been measured and their audio effects have been minimised and quantified. Lower bounds are established on the compression ratio of these methods. This is viewed as a first step for a potentially constant bitrate compression scheme.
Improved Compression of DSD for Super Audio CD

Malcolm John Hawksford,
Time quantization and noise shaping applied to linear frequency modulation (LFM) can form an alternative although unconventional means of generating 1-bit uniformly sampled code that is similar in structure to a feedback sigma-delta modulator (SDM). Fundamental insight into the SDM process and base-line coding spectrum emerges, where specifically linearity of signal conversion is studied and compared to that of linear pulse code modulation (LPCM). Time dispersive limiters both within and outside the noise shaper are investigated and their consequence on linearity explored. A noise averaging simulation reveals intrinsic distortion and noise modulation to be low when appropriate dither is used.
Time-quantized Frequency Modulation with Time Dispersive Codes for the Generation of Sigma-delta Modulation

James Angus,
This paper clarifies some of the confusion that has arisen over the efficacy of dither in PCM and Sigma-Delta Modulation (SDM) systems. It presents a means of analysing "in-band" idle tone structure and describes a fair means of comparison between PCM and SDM. It presents results which show that dither can be effective in S-D systems.
The Effect of Idle Tone Structure on Effective Dither in Delta-Sigma Modulation Systems
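
To make the notion of an idle tone concrete, here is a minimal sketch of a textbook first-order, undithered 1-bit sigma-delta loop driven by a DC input. It is only an illustration under stated assumptions: the loop structure, the function name `first_order_sdm` and the input value are not taken from the paper, and the modulators used in practice are of higher order.

```python
def first_order_sdm(samples):
    """Textbook first-order 1-bit sigma-delta loop (assumed structure):
    the quantizer outputs +1 or -1 according to the sign of the integrator,
    and the integrator accumulates the error between input and output."""
    integrator = 0.0
    bits = []
    for x in samples:
        y = 1.0 if integrator >= 0.0 else -1.0
        bits.append(y)
        integrator += x - y
    return bits


if __name__ == "__main__":
    dc_input = 0.5  # assumed DC level, full scale is +/-1
    stream = first_order_sdm([dc_input] * 4000)
    print(stream[:8])                 # the pattern repeats every 4 samples
    print(sum(stream) / len(stream))  # the average tracks the DC input
```

For this DC level the output settles into a repeating four-sample pattern, so the bitstream carries a strong periodic component in addition to the wanted average value. Such input-dependent periodic patterns are what is usually meant by idle tones, and dither is one way to try to break them up.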

John Vanderkooy,Stanley Paul Lipshitz,
This is Part 3 of our ongoing investigation into the behaviour of 1-bit sigma-delta modulators. It addresses the following topics: (a) Angus has claimed that an undithered 1-bit sigma-delta modulator is effectively self-dithered by its internal noise. We show that his conclusion is essentially a quasi-dc one, and that his pictures of self-dither, although suggestive, do not in fact signify the elimination of all correlated artefacts in the way achieved by a properly-dithered multi-bit modulator. (b) The "average-gain-plus-additive-noise" model of the 1-bit quantizer can be misleading as regards its nonlinearities, even though it may be useful for predicting its stability. We briefly discuss this issue. (c) Some undithered modulators clearly show the production of subharmonic tones. We study the conditions under which this occurs. (d) We also have some observations about why the integrated spectrum of many undithered 1-bit modulators is precisely conjugate-even about fS/4. (e) The idle tone in a 3-level (i.e., 1½-bit) sigma-delta modulator manifests itself in a different way from that in a 2-level modulator. We study the effect of dither in suppressing the idle tone in both these topologies.
Towards a Better Understanding of 1-Bit Sigma-Delta Modulators - Part 3

Edward Valeryewith Semyonov,
A noise-shaping method is considered that minimizes quantization noise in the neighborhood of the harmonics of a digital sinusoidal test signal. At the harmonic frequencies, this method yields considerably lower quantization noise power than noise-shaping methods with uniform power spectral density. For example, for a 16-bit, 1378 Hz test signal at a level of -60 dB and a sampling frequency of 44100 Hz, the method gives a total harmonic distortion (THD) of 0.00067% (without noise shaping, THD = 1.1%; with dither in the frequency band 20000-22050 Hz and a standard deviation of 0.46 quantum, THD = 0.043%). The method can be applied to measuring the nonlinearity of various systems; it is especially relevant for DACs, digital filters and similar devices.
Noise Shaping for Measuring Digital Sinusoidal Signal with Low Total Harmonic Distortion

Remi Payan,
Professional Audio systems target a wide variety of applications for recording, mixing, musical instruments, studios, and broadcasting. The challenge for Pro Audio systems designers is to find solutions not only to satisfy some of the more common requirements (including high sample rates, high performance and high precision) but also to meet system-specific design goals such as minimizing latency in effects processors and bit error tracking in digital mixing consoles. This paper will first summarize these requirements; then discuss hardware architecture and software methodology trade-offs (e.g. sample-by-sample vs. block processing) and present a flexible DSP architecture for these systems.
DSP Software and Hardware Trade-offs in Professional Audio Applications

Juan Pablo-Bello,Laurent Daudet,Mark B. Sandler,
We describe a new method for the estimation of multiple pitch information in recorded piano music. The method works in the time-domain and makes use of a self-generating database of all possible notes. First, we show how accurate polyphonic pitch detection can be achieved given an adequate database. Then an algorithm is proposed that generates the database from the music, using estimation of predominant pitches in the frequency-domain and pitch-shifting techniques. Both systems generate a MIDI representation of the original signal. This method, which can be generalized to any solo instrument, overcomes the usual constraints of the traditional frequency-domain approach regarding intervals and quantity of notes.
Time-domain Polyphonic Transcription using Self-generating Databases

Patrick J. Wolfe,Simon J. Godsill,Wee Jing Ng,Monika Doerfler,
We present an investigation into signal processing models appropriate for audio, and especially high quality musical signals, by means of Bayesian atomic decompositions. At present, many models rely on short-term stationarity of the audio, or highly limiting forms of non-stationarity. Moreover, they are well-suited only to low-level inference tasks. We seek to formulate a new generation of audio models that will address the main limitations of the existing ones and permit high-level inference. As we show, such models result from the marriage of an overcomplete dictionary of time-frequency atoms with structured hierarchical prior probability distributions developed specifically for audio signals, in order to model coefficient correlation in time and frequency.
Audio Signal Modelling Using Bayesian Atomic Decompositions

Kim Hang Lau,
This paper describes an experimental prototype system developed for vocal modification. This system aims to modify a source vocal sample to match the time evolution, pitch contour and amplitude envelope of a similarly sung target vocal sample, simulating a non-parametric transfer of singing techniques from the target vocalist to the source vocalist. The system is comprised of a time-varying time/pitch/amplitude modification engine, a pitch-detector and a subsystem for the generation of modification parameters. Although the system has yet to attain a desirable level of robustness, it has successfully generated interesting synthesized samples that demonstrate the idea of a hybridizing vocal performance.
A System for Hybridizing Vocal Performance

George M. Kalliris,Charalampos A. Dimoulas,George V. Papanikolaou,Kostas Avdelidis,Taxiarchis N. Passias,John S. Stoitsis,
The current paper focuses on the design and implementation of a phoneme recognition algorithm that is used to extract the appropriate parameters in order to drive a 3D graphics facial expression and animation procedure. This is used to emulate speech generation for 3D-modeled digital characters. At the first development step, LPC, STFT analysis, wavelets, cepstrum and pattern recognition techniques were tested for phoneme recognition and speaker classification. Then, 3D graphics facial expressions and phonemes were related in a library. Finally, a client-server application was designed that processes speech, combines library data via morphing techniques and generates a digital character that appears to speak the given speech. Possible applications include cartoon dubbing and web-based virtual teleconferencing.
Phoneme Recognition for 3D Modeled Digital Character Talking Emulation

Erik Larsen,Ronald M. Aarts,Michael Danessis,
The use of perceptually based (lossy) audio codecs, like MPEG 1 - layer 3 ('mp3'), has become very popular in the last few years. However, at very high compression rates the perceptual quality of the signal is degraded, which is mainly exhibited as a loss of high frequencies. We propose an efficient algorithm for extending the bandwidth of an audio signal, with the goal to create a more natural sound. This is done by adding an extra octave at the high frequency part of the spectrum. The algorithm uses a non-linearity to generate the extended octave, and can be applied to music as well as speech. This also enables application to fixed or mobile communication systems.
Efficient High-frequency Bandwidth Extension of Music and Speech
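
The idea of using a non-linearity to create new high-frequency content can be sketched in a few lines. The abstract does not say which non-linearity is used, so the full-wave rectifier below is an assumption, as are the function name, the sample rate and the test frequency. The sketch only shows that rectifying a sine at 1 kHz produces a component one octave higher, at 2 kHz, that was absent from the input.

```python
import cmath
import math


def tone_amplitude(signal, freq_hz, fs_hz):
    """Amplitude of the component at freq_hz, measured with a single-bin DFT.
    Assumes the signal contains an integer number of periods of freq_hz."""
    n = len(signal)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * i / fs_hz)
        for i, x in enumerate(signal)
    )
    return 2.0 * abs(acc) / n


FS = 48000.0  # assumed sample rate in Hz
F0 = 1000.0   # assumed input frequency in Hz
N = 480       # ten full periods of F0

tone = [math.sin(2.0 * math.pi * F0 * i / FS) for i in range(N)]
rectified = [abs(x) for x in tone]  # full-wave rectifier as the non-linearity

if __name__ == "__main__":
    print("input at 2 kHz:    ", tone_amplitude(tone, 2.0 * F0, FS))
    print("rectified at 2 kHz:", tone_amplitude(rectified, 2.0 * F0, FS))
```

A practical extender would presumably band-limit the input before the non-linearity and filter and scale the generated octave before adding it back, since the rectifier also produces a DC term and further even harmonics. Those stages are omitted here.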

Matthew A. Watson,Michael M. Truman,
This paper introduces two signaling techniques that may be useful in broadcast applications for combining supplemental data with program material. Such data may be used to monitor audience participation, advertising placement or to convey additional program information (including lyrics and URLs). The first technique involves replacing unused bits, or fill bits, in fixed data-rate compression systems with information-carrying data in a way that does not affect the quality of the program material and does not necessitate modification of encoders or decoders. A second technique involves modulating the bandwidth of a signal and adding perceptually shaped noise to create an inaudible, constant-rate supplemental signal. Both techniques may be integrated with the Dolby Digital perceptual coding system.
Signaling Techniques for Broadcast Applications

Seyed Ali Azizi,
The overall frequency response of a graphic or parametric equalizer bank, consisting of a number of serially connected cut or boost equalizers, may show serious deviations from the user-defined gain setting. The deviations are caused by mutual interference between the equalizers, and depend on the gains, quality factors and center frequencies of the individual equalizers. This paper discusses the known interference compensation strategies and then introduces a new approach to efficiently counteract the undesired interference effects. It is based on the "Opposite Filter Concept": based on the user-defined parameter setting of an equalizer bank, a set of simple filters counteracting the interference is suitably parameterized and inserted serially into the equalizer bank. This substantially reduces the interference effects and consequently yields an overall frequency response close to the desired one.
A New Concept of Interference Compensation for Parametric and Graphic Equalizer Banks

Igor Nikolic,
This paper proposes the improvement of artificial reverberation through decomposition of the input signal into subbands and the use of different subband feedback delay networks (FDNs). By this means, nonuniform modal density is achieved and the orders of the FDNs are lowered. The evaluation of filter banks and the corresponding subband FDNs is described. Comparison results, obtained through objective (simulation) and subjective (listening test) analysis, are presented.
Improvements of Artificial Reverberation by Use of Subband Feedback Delay Networks

Bruno Johan Putzeys,Renaud de Saint Moulin,
A method is presented to convert 1-bit digital audio signals into an analogue signal with sufficient current and voltage to drive loudspeakers. For this goal a novel non-PWM class D power stage is constructed that performs this function with very low distortion and very high efficiency, without the use of feedback or other analogue processing. Results of the prototype development are detailed.
A True One-Bit Power D/A Converter

Martin Streitenberger,Helmut Bresch,Jens Kaszubiak,Thomas Schindler,
In this paper, a new concept for a high speed, high resolution digital pulse-former is presented. A digital pulse-former basically maps each input data word (sample) into a binary pulse of corresponding width. Such binary pulse-length modulated signals are used in digital class-D amplifiers and PWM applications. In our approach, a synchronous digital counter converting the "rough" part of the input sample is combined with a ring oscillator serving as the "fine" counter. This hybrid configuration yields a drastically increased resolution while maintaining moderate clock rates. Results of a first discrete implementation of this concept are discussed.
A Novel Approach to High Speed Digital Pulse-formers Based on Ring Oscillators for PWM and Class-D Applications
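
The benefit of splitting a sample into a "rough" and a "fine" part can be seen with some simple arithmetic. The sketch below is an illustration under assumptions, not the authors' design: it assumes an unsigned pulse-width word, a ring oscillator whose taps divide one counter clock period into equal steps, and hypothetical values for the word length, the bit split and the pulse rate.

```python
def split_sample(sample, total_bits, fine_bits):
    """Split an unsigned pulse-width word into a coarse count (counter clock
    periods) and a fine count (ring-oscillator tap delays)."""
    if not 0 <= sample < (1 << total_bits):
        raise ValueError("sample out of range")
    coarse = sample >> fine_bits
    fine = sample & ((1 << fine_bits) - 1)
    return coarse, fine


def required_clock_hz(pulse_rate_hz, counter_bits):
    """Clock needed for a counter to resolve 2**counter_bits steps per pulse."""
    return pulse_rate_hz * (1 << counter_bits)


def pulse_width_s(sample, total_bits, fine_bits, pulse_rate_hz):
    """Pulse width produced by the coarse counter plus the fine delay taps
    (assumes the taps divide one clock period into equal steps)."""
    coarse, fine = split_sample(sample, total_bits, fine_bits)
    clock_period = 1.0 / required_clock_hz(pulse_rate_hz, total_bits - fine_bits)
    tap_delay = clock_period / (1 << fine_bits)
    return coarse * clock_period + fine * tap_delay


if __name__ == "__main__":
    rate = 384000.0  # assumed pulse repetition rate in Hz
    print(required_clock_hz(rate, 16))  # counter alone, 16-bit resolution
    print(required_clock_hz(rate, 10))  # 10 coarse bits + 6 fine bits
    print(split_sample(0xABCD, 16, 6))
```

With these assumed numbers, a counter resolving all 16 bits on its own would need a clock of roughly 25 GHz, while a 10-bit coarse counter combined with 6 fine bits needs about 393 MHz for the same nominal resolution. In a real circuit the tap delays would not be perfectly uniform, which this sketch ignores.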

Rolf Esslinger,Gerhard Gruhler,Robert W. Stewart,
The development of a fully digital audio power amplifier based on PWM or Sigma Delta technologies still faces many unsolved practical problems. The most problematic part of the amplifier system is the switching (class-D) power stage. It is extremely difficult to switch the high-power voltage pulses on and off as precisely as required for high signal quality (e.g. 16-bit CD quality). In this paper the typical switching errors introduced by the power transistors, the power supply voltage and the load are analysed formally and with computer simulations. The results show that these signal errors in the output stage degrade the signal-to-noise ratio through harmonic distortion and unwanted modulation products.
Distortions by Switching Errors in Digital Power Amplifiers Using Sigma Delta Coded Signals

Rolf Esslinger,Gerhard Gruhler,Robert W. Stewart,
To use one-bit Sigma-Delta Modulation (SDM) in digital class-D power amplifiers, the effective pulse frequency has to be reduced. This paper reviews this problem. It is shown how the efficiency of the amplifier is degraded by excessively high pulse frequencies. Fundamental approaches to creating high-resolution pulse signals with lower pulse rates of around 300 to 500 kHz are presented. One of these is controlled generation of the pulse pattern, as is done in "Bit-Flipping". Alternative approaches can be found when the dependency of the generated pulse patterns on the loop filter structure is considered.
Sigma-Delta Modulation in Digital Class-D Power Amplifiers: Methods for Reducing the Effective Pulse Transition Rate

Kelvin Chee-Mun Lee,Jun Yang,Woon Seng Gan,Meng-Hwa Er,
The use of a parametric array in air as a directional audio loudspeaker has been reported in previous literature and the self-demodulation phenomenon is well-understood to seriously distort the generated audible sound as a result of inter-modulation distortion. We propose to model the nonlinear interaction in air using a second-order Volterra kernel at the Rayleigh distance within which sound intensity and parametric conversion efficiency are assumed to be high. Results first obtained from Burgers’ equation-based numerical simulations and actual measurements are then used in the nonlinear system identification process.
Modeling Nonlinearity of Air with Volterra Kernels for Use in a Parametric Array Loudspeaker

Furi Andi Karnapi,Woon Seng Gan,Meng-Hwa Er,
The human auditory system does not perceive sound of all frequencies with equal loudness. It is known that a low-frequency signal needs to be produced at a higher power level to have the same loudness as the mid-frequency part. There are two ways to overcome this problem: boosting the power of the low-frequency part, or utilizing the psychoacoustic effect called the ‘missing fundamental’. The use of parametric arrays to generate highly directional audible signals has been reported for a few decades. However, the reproduced signal lacks low-frequency content. One reason is the relatively low power level produced by existing parametric arrays. Utilizing the non-linearity of air, it is proposed to psychoacoustically enhance the low-frequency perception of a parametric array loudspeaker.
Method to Enhance Low Frequency Perception from a Parametric Array Loudspeaker

(C) 2003, Audio Engineering Society, Inc.