AES 114th Convention
March 22-25, 2003
The paper describes Autodirective Dual Microphone (ADM) technology and its applications. ADM digital signal processing technology, developed by Alango Ltd., is a far-field adaptive beamforming technology that uses only two closely spaced omnidirectional sound pressure sensors. ADM technology provides optimal, variable directivity in every frequency region. Adaptation is very fast, giving a substantial improvement in signal-to-noise ratio in rapidly changing noisy environments. Unlike conventional directional microphones, an ADM-based microphone is much less sensitive to wind noise and does not exhibit a proximity effect. Its DSP implementation is simple and requires very modest computational resources.
Autodirective Dual Microphone
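The abstract does not disclose Alango's algorithm. For orientation only, the sketch below implements a different, openly published scheme: the adaptive first-order differential microphone of Elko and Pong. It also uses two closely spaced omnidirectional capsules. Two back-to-back cardioids are formed by delay-and-subtract, and a single weight `beta` is adapted by normalized LMS to minimize output power. This steers the rear null toward the dominant interferer. The function name `adaptive_differential`, the integer one-sample capsule delay (spacing d = c/fs), the step size `mu` and the power-smoothing constant `alpha` are all illustrative assumptions.

```python
import numpy as np

def adaptive_differential(x1, x2, delay=1, mu=0.01, alpha=0.99, eps=1e-12):
    """Adaptive first-order differential array (Elko/Pong-style sketch, not ADM).

    x1: front capsule signal, x2: rear capsule signal.
    delay: acoustic travel time between capsules in samples (assumed integer).
    Returns the output signal and the trajectory of the null-steering weight beta.
    """
    x1 = np.asarray(x1, float)
    x2 = np.asarray(x2, float)
    n = len(x1)
    y = np.zeros(n)
    betas = np.zeros(n)
    beta, power = 0.5, 0.0
    for t in range(delay, n):
        # Back-to-back cardioids by delay-and-subtract.
        cf = x1[t] - x2[t - delay]   # forward-facing cardioid (null toward the rear)
        cb = x2[t] - x1[t - delay]   # backward-facing cardioid (null toward the front)
        y[t] = cf - beta * cb
        # Normalized LMS step that reduces the output power E[y^2].
        power = alpha * power + (1 - alpha) * cb * cb
        beta += mu * y[t] * cb / (power + eps)
        # Constrain the null to the rear half-plane: beta = 0 is a cardioid, 1 a dipole.
        beta = min(max(beta, 0.0), 1.0)
        betas[t] = beta
    return y, betas
```

A front source makes the backward cardioid vanish, so `beta` does not move and the target passes unchanged. Noise from the side drives `beta` toward 1 (dipole null at 90 degrees), and noise from the rear drives it toward 0 (cardioid null at 180 degrees). A practical implementation would also equalize the roughly 6 dB/octave high-pass response that delay-and-subtract imposes.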
Edo Hulsebos, Thomas Schuurmans, Diemer de Vries, Rinus Boone
Traditional stereo microphone pair techniques for natural recording are well suited to two-channel stereo reproduction. However, for multichannel reproduction systems such as 5.1, 7.1 and Ambisonics, compromises in coverage, source localization and channel separation are unavoidable. The main reason is that the microphones currently used have only low-order directivity patterns (omni, figure-of-eight, cardioid or hypercardioid), which cannot provide sufficient angular resolution to avoid unwanted crosstalk between the recording channels. In this paper a discrete coincident 12-channel microphone is proposed to solve these problems. It consists of a circular array with a radius of 1 meter using 288 microphone capsules whose output signals are combined into 24 channels using simple analog electronics. These 24 channels are captured with a multitrack computer interface and post-processed into 12 discrete reproduction audio channels.
Circular Microphone Array for Discrete Multichannel Audio Recording
Arnaud Laborie, Remy Bruno, Sebastien Montoya
Many techniques have already been developed for surround sound recording, yet the problem remains challenging in the absence of a comprehensive theory. This paper presents an approach based on a full 3D acoustic field theory and a 3D microphone array. Our research has led to a new spatial digital processing technique that allows the use of freely positioned capsules of any type, such as omnidirectional, bidirectional or cardioid ones. This technique can be seen as an extension of full-sphere generalized Ambisonics that provides unprecedented spatial resolution, yet requires neither high-order directivity capsules nor coincident capsules. The theory has been validated with a full-sphere 3rd-order prototype using 24 omnidirectional capsules. A 5th-order directivity multichannel 5.0 microphone is also presented.
A New Comprehensive Approach of Surround Sound Recording
Stuart Bradley, Juha Backman, Sabine von Hunerbein, Tao Wu
The study identifies how wind noise is generated in microphones and how important the different mechanisms are at different frequencies. Studies in a quiet wind tunnel have been performed on the noise spectrum of microphones embedded in a turbulent air stream of known characteristics, including the effect of the microphone's housing in creating a turbulent boundary layer. Scaling relationships are found that give insight into noise-generating mechanisms and into the effects of microphone geometry and placement. The noise spectrum consists of three regimes: a band of constant noise level at low frequencies, and two distinct higher-frequency bands that both show 1/f behaviour but differ in generating mechanism and in how they scale with wind speed and other variables. Appropriate noise reduction schemes are explored.
The Mechanisms Creating Wind Noise in Microphones
Arie van Rhijn
Electret condenser microphones (ECMs) are used in almost every consumer and communication audio application, with a total yearly volume well over 1 billion units. Over the years, innovation in ECMs has concentrated mainly on lower production cost and smaller size, while improvements in sensitivity, signal-to-noise ratio (SNR), linearity and supply current have not been addressed. This paper presents several IC designs that replace the JFET inside the microphone canister. In a 2-wire ECM the IC yields a 20 dB increase in sensitivity and THD below 0.5% over the entire output range. In a 3-wire ECM, a PSRR of 60 dB and a low output impedance of 200 ohms are achieved.
Integrated Circuits for High Performance Electret Microphones
Jim Brown, David Josephson
Neil Muncy has shown that improper termination of shield wiring, commonly called the pin 1 problem, couples noise currents flowing on a cable shield into audio circuitry through common impedance coupling. This paper examines the susceptibility of modern microphones, describes a simple test to find problems, and offers simple solutions.
Radio Frequency Susceptibility of Capacitor Microphones
A beamforming technique is presented for rendering multichannel audio from recordings made with a closely spaced group of microphones (cardioids in free space, or omnis mounted on a sphere). The method allows the polar characteristics of the group to be optimised to match the loudspeaker arrangement used and to obtain the desired panning characteristics.
Microphone Array Beam Forming for Multichannel Recording
Karl A. Sagren, Bjorn Hedquist
A comparison of the speech intelligibility of the reflected signal from a vs. a human voice in a highly reverberant environment. The study used both subjective judgement methods and the commonly used standardized measures, i.e. RASTI, STI, Alcons, C60 and C80. The conclusion points out the possibility of distributing information in a reverberant room by exciting the reflective field in the same manner as a purely acoustic source does, i.e. a human voice or a musical instrument.
Intelligibility of Reflected Sound Sources Part I
A procedure for managing listening tests of automotive sound systems is presented. Techniques are selected to reduce listener bias and simplify the listening task. These include comparison to a fixed reference sound, consistent source material, breakdown of the listening task into many independent absolute judgments and adequate listener training. The overall rating is derived from a weighted sum of all of the individual judgments, not listener opinion.
Listening Technology for Automotive Sound Systems
Antony W. Rix, Jens Berger, John G. Beerends
Perceptual quality measurement algorithms such as PESQ (ITU-T Recommendation P.862) are now in common use for evaluating the speech quality of communications networks and systems. However, these algorithms are designed mainly for use with electrical or digital (not acoustic) interfaces to the systems under test. This limits their applicability to terminals, in particular where the effects of transducers, acoustics and signal processing in the terminals may be combined with network properties such as low-bit-rate coding and channel errors. This paper describes work under way in ITU-T SG12 to develop a new algorithm for evaluating both networks and terminals using acoustic interfaces, and reports the latest results in the development of a new ITU-T Recommendation for this application.
Perceptual Quality Assessment of Telecommunications Systems Including Terminals
Michael C. Kelly, Anthony I. Tew
The mechanisms of human localisation for a single sound source are well understood, but less is known about how we localise multiple, simultaneous sound sources. In rendering a complex virtual auditory space (VAS), localisation cues are applied separately to each sound object and the results are summed to create a multiple-source environment. In this paper we investigate the relevance of the inter-source spectral overlap that arises in such a VAS by adjusting the spatial cues in these regions and comparing a listener's localisation ability for the modified and unmodified cases. We show that even total removal of the weaker spectral components in regions of overlap has no effect on localisation ability. Finally, we discuss how redundancies in the regions of spectral overlap can be exploited for multiple-source localisation.
The Significance of Spectral Overlap in Multiple-source Localization
Gordana Kovacic, Hrvoje Domitrovic
The aim of this study was to investigate how accurately listeners estimate physical characteristics such as body weight and height from voice signals alone. In a series of listening tasks, 20 adult male listeners judged the body weight and height of 20 adult male speakers from their voice samples. Additionally, listeners' perception of the speakers' voice pitch was compared with actual voice pitch measured as fundamental frequency (F0) in Hz. Pearson and Spearman correlation coefficients indicated that listeners' estimates of body weight and height were negatively correlated with the speakers' actual physical parameters. In fact, listeners held a vocal stereotype about physical characteristics in which a lower-pitched voice was taken as an indicator of greater body weight and height, and vice versa.
Accuracy of the Listeners' Estimation of the Speakers' Body Weight and Height Based Solely on the Voice Signal
Beate Klehs, Thomas Sporer
In anechoic rooms the concept of Wave Field Synthesis (WFS) has already been shown to provide superior spatial sound over a large part of the room. Progress in microelectronics is making WFS available in commercial products at a reasonable price, and in the near future it will be installed in a variety of acoustical environments. In anechoic space WFS needs a very large number of loudspeakers. In "normal" listening conditions, simulated and real acoustics interfere with each other, making the generated wave field less exact. This paper describes listening tests conducted to evaluate WFS in common living-room conditions. The parameters under test are the number of loudspeakers, the distance between loudspeakers, the position of the simulated source and the position of the listeners relative to the loudspeakers.
Wave Field Synthesis in the Real World: Part 1 - In the Living Room
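The loudspeaker spacing tested here matters because of spatial aliasing. A commonly quoted worst-case rule of thumb places the WFS aliasing frequency near c / (2 * spacing). Above that frequency, the discrete array no longer reconstructs the intended wave front. The sketch below evaluates this simplified estimate under two assumptions: a speed of sound of 343 m/s and the worst-case geometry. The abstract does not give this formula, and the actual limit also depends on source and listener angles.

```python
def wfs_aliasing_frequency(spacing_m, c=343.0):
    """Worst-case estimate of the WFS spatial aliasing frequency in Hz.

    spacing_m: distance between adjacent loudspeakers in meters.
    c: speed of sound in m/s (343 m/s assumed, roughly air at 20 degrees C).
    """
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return c / (2.0 * spacing_m)

for dx in (0.10, 0.17, 0.25):
    print(f"spacing {dx:.2f} m -> aliasing above ~{wfs_aliasing_frequency(dx):.0f} Hz")
```

Halving the spacing doubles the estimated aliasing frequency, which is why dense arrays need so many loudspeakers.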
Sean E. Olive
A series of 28 listening tests was conducted over the course of 11 months, involving 199 different listeners and 4 different loudspeakers, to determine the repeatability, accuracy and preferences of untrained versus trained listeners. The results indicate remarkable repeatability and similarity in preference among both trained and untrained listeners. This suggests that training does not bias listeners' preferences, and that the results of trained listeners can be extrapolated to a larger population of untrained listeners. Compared with untrained listeners, the trained listeners tended to use the lower half of the preference scale and produced more reliable and discriminating preferences among the different loudspeakers. Other factors that produce variance in preference ratings include training, program material, the context and number of loudspeakers compared, audiometric performance and biases within the preference scale itself. A comparison of the loudspeakers' acoustical measurements with their mean preference ratings shows clear correlations.
Differences in Performance and Preference of Trained versus Untrained Listeners In Loudspeaker Tests: A Case Study
Brandon Cochenour, Carlos Chai, David A. Rich
The sensitivity of high-order filter networks to component-matching tolerances increases with filter order. For a loudspeaker crossover network designed to sum to an all-pass response, we demonstrate that the sensitivity to component-matching tolerances may be dwarfed by sensitivities to other effects. We examine second- to eighth-order Linkwitz-Riley crossovers. The analysis also covers networks with transmission zeros and optimized networks in which the frequency-response errors introduced by the drivers' respective transfer functions are minimized. We remark on which crossover networks are least sensitive to the combined effects of component tolerances, path-delay effects, the interaction of filter sections in loudspeakers that divide the incoming signal into three or more sub-bands, and driver transfer functions.
Sensitivity of High Order Loudspeaker Crossover Networks with All Pass Response
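The study starts from the property that an ideal Linkwitz-Riley crossover sums to an all-pass response. The sketch below checks this numerically for the second- to eighth-order cases. It assumes ideal, exactly matched components and ideal drivers, and it normalizes the crossover frequency to w = 1. An LR crossover of order 2n is a Butterworth filter of order n squared. For odd n the high-pass output must be polarity-inverted. Tolerances, path delays and driver transfer functions, which the paper analyses, are deliberately left out. The function names are illustrative.

```python
import cmath
import math

def butterworth_denominator(n):
    """Return B_n(s), the normalized Butterworth denominator of order n (B_n(0) = 1)."""
    poles = [cmath.exp(1j * math.pi * (2 * k + n - 1) / (2 * n)) for k in range(1, n + 1)]

    def B(s):
        out = 1.0 + 0j
        for p in poles:
            out *= (s - p)
        return out

    return B

def lr_sum(order, w):
    """Summed output of an ideal Linkwitz-Riley crossover of even `order`
    at normalized angular frequency w (crossover at w = 1)."""
    if order < 2 or order % 2:
        raise ValueError("Linkwitz-Riley order must be even and >= 2")
    n = order // 2                      # LR(2n) = Butterworth(n) squared
    B = butterworth_denominator(n)
    s = 1j * w
    low = 1.0 / B(s) ** 2
    high = s ** order / B(s) ** 2
    polarity = (-1) ** n                # invert the high-pass section for LR2, LR6, ...
    return low + polarity * high
```

With that polarity rule the numerator becomes 1 + w^(2n) on the jw axis, the same as |B_n(jw)|^2. The summed magnitude is therefore exactly 1 at every frequency, and only the phase varies.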
Andrew Goldberg, Aki Makivirta
This paper presents a novel method for automatically selecting the optimal in-situ acoustical frequency response of active loudspeakers from the discrete set of responses offered by their room response controls. An overview of optimisation techniques is given, and the resulting optimisation algorithm is described, as is the rationale behind the room response controls of the active loudspeakers. The frequency response, calculated from the measured impulse response, is used as the input to the optimisation algorithm, which selects the most favourable combination of room response control settings. Examples are given, and the performance of the algorithm is analysed and discussed. The system has been implemented and is in active use by specialists who set up and tune studios and listening rooms.
Automated In-Situ Frequency Response Optimisation of Active Loudspeakers
Richard H. Small
Nonlinearities in full-range loudspeakers can cause amplitude modulation of mid- and high-frequency content by strong low-frequency components. A measurement based on an established two-tone modulation distortion test can assess the amount of distortion produced and indicate the dominant nonlinearities causing it. The paper discusses measurement and signal processing techniques as well as methods of data display for interpretation.
Measurement of Loudspeaker Amplitude Modulation Distortion
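The abstract does not specify the test signals or the processing, so the following is only a generic sketch of the two-tone idea. The assumed stimulus is a strong low tone `f_low` plus a weaker high tone `f_high`, both placed exactly on FFT bins to avoid leakage. The modulation depth of the high tone is read from its first-order sidebands at f_high +/- f_low. For pure AM with depth m, each sideband is m/2 of the carrier. The toy "loudspeaker" in the usage lines is a gain that varies with the low-frequency signal. It is loosely inspired by displacement-dependent nonlinearities, but it is hypothetical and not a model from the paper.

```python
import numpy as np

def am_depth_from_sidebands(y, fs, f_low, f_high):
    """Estimate the AM depth of the f_high tone caused by the f_low tone.

    Assumes both tones fall exactly on FFT bins of the analysed block.
    """
    n = len(y)
    spec = np.abs(np.fft.rfft(y)) / n

    def amp(f):
        return spec[int(round(f * n / fs))]

    carrier = amp(f_high)
    sidebands = amp(f_high - f_low) + amp(f_high + f_low)
    return sidebands / carrier          # equals m for pure amplitude modulation

# Hypothetical two-tone test on a toy nonlinear system (1 s block, 1 Hz bins).
fs = 8000
t = np.arange(fs) / fs
x_low = np.cos(2 * np.pi * 50 * t)
x_high = 0.25 * np.cos(2 * np.pi * 1000 * t)
y = x_low + (1 + 0.1 * x_low) * x_high   # gain of the high tone follows the low tone
print(am_depth_from_sidebands(y, fs, 50, 1000))
```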
Neil Harris, Malcolm O.J. Hawksford
AES paper 5215 presented both 2-D finite element analysis (FEA) and measurements to examine the effects of a single dominant reflection on the radiation of a loudspeaker. This earlier research is extended here by exploiting an analytic 3-D solution to the problem of an acoustic source located in a non-anechoic room. Unlike the earlier FEA solution, this method is meshless and provides an output at any point in space and at any frequency. Applying the inverse Fourier transform yields temporal data, so that a complete time- and frequency-domain description can be formed.
Modelling Room Interaction for Pistonic and Distributed-mode Loudspeakers in both Frequency and Time Domains
Traditional modeling describes the heat flow in loudspeakers by an equivalent circuit using integrators with different time constants, and the parameters of the lumped elements are assumed to be independent of the signal amplitude. This simple model fails to describe air-convection cooling, which becomes an effective cooling mechanism when the velocity of the coil and/or of the forced air in the gap becomes high. This paper presents a large-signal model that considers the nonlinear interactions between the electro-mechanical and thermal mechanisms. The model and its parameters are verified by practical measurements on drivers. The dominant heat-flow paths are identified, and means for increasing power handling capacity are discussed.
Nonlinear Modeling of the Heat Transfer in Loudspeakers
Wolfgang Klippel, Ulf Seidel
Most traditional techniques transform the distorted time signal into the frequency domain to measure harmonic and intermodulation distortion as additional spectral components. Their amplitude shows the mean energy of the distortion averaged over the analysis interval, usually one period of the fundamental tone. A novel identification technique is presented to measure the instantaneous distortion (ID) versus time, frequency, displacement or other state variables. The ID shows the full fine structure of the distortion and its dependence on the cause of the distortion. Statistical analysis of the ID yields the peak value (PHD) and crest factor (CHD), which appear to be important characteristics comparable with the total harmonic distortion (THD). The paper discusses the practical application, the interpretation of the results, and the diagnosis of speaker defects caused by design or manufacturing.
Measurement of Impulsive Distortion, Rub and Buzz and other Disturbances
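The abstract does not define how PHD and CHD are computed from the instantaneous distortion, so the following sketch shows only the underlying statistic. It takes the peak value and the crest factor (peak divided by RMS) of a distortion residual, which here is a hypothetical time signal standing in for the ID. Two synthetic residuals are scaled to the same RMS. A smooth, harmonic-like residual and a train of short clicks resembling rub and buzz then have nearly identical energy, which is all that a THD-like measure sees. Their crest factors, however, differ by more than an order of magnitude.

```python
import numpy as np

def peak_and_crest(residual):
    """Peak value and crest factor (peak / RMS) of a distortion residual signal."""
    r = np.asarray(residual, float)
    rms = np.sqrt(np.mean(r ** 2))
    peak = np.max(np.abs(r))
    return peak, peak / rms

fs = 48000
t = np.arange(fs) / fs
smooth = np.sin(2 * np.pi * 300 * t)      # hypothetical harmonic-like residual
impulsive = np.zeros(fs)
impulsive[::480] = 1.0                     # hypothetical click train, 100 clicks per second
# Scale the clicks to the same RMS as the smooth residual.
impulsive *= np.sqrt(np.mean(smooth ** 2) / np.mean(impulsive ** 2))

print(peak_and_crest(smooth)[1], peak_and_crest(impulsive)[1])
```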
The radiation of a standard DML panel placed in an acoustically rigid enclosure was investigated. Measurements show that absorbing material attached to the bottom of the enclosure influences the sound emission from the panel. The effect of the air gap was also investigated.
Effect of Porous Material on the Diffusivity of an Unbaffled DML Panel
Toni Hirvonen, Markus Vaalgamaa, Juha Backman, Matti Karjalainen
Two listening tests with six different headphones were conducted to investigate listeners' sound colour preferences using 1) the actual headphones and 2) dummy-head recordings made with the same devices. The recordings were intended to simulate the timbres of the actual devices as closely as possible when played back through a pair of compensated headphones. The results of the two tests were compared and, despite some similarity, the analysis showed significant differences between the two cases. Additionally, the diffuse-field responses of the headphones were calculated from frequency response measurements. The resulting headphone preference order cannot be fully explained by the flatness of the diffuse-field response.
Listening Test Methodology for Headphone Evaluation
Multidimensional scaling (MDS) and preference mapping were used for the perceptual analysis of the quality of speech corrupted by car noise in mobile communications. Forty-one processing chains, representing, e.g., transmission of speech over mobile networks, were studied. Thirty screened subjects took part in the quality test and 15 screened and trained subjects in the MDS test. Based on an external profiling of the auditory characteristics, the dimensions appeared to relate to the general naturalness, bandwidth limitation, smoothness and noisiness of speech. The Phase IV ideal point model was used to predict quality with an average error of about 6%, and to study the interaction between the attributes and their linearity.
Ideal Point Modelling of the Quality of Noisy Speech in Mobile Communications Based on Multidimensional Scaling
Laetitia Gros, Noel Chateau, Sylvain Busson
The validity of laboratory listening tests is examined for the quality of speech transmitted by mobile phone. First, tests are run outdoors in two places with different sound environments. Degradations are introduced into the transmitted speech signal heard by the subjects, and both the sound environments and the transmitted speech signals are recorded. Second, tests are carried out in the laboratory, reproducing the recorded speech signals through handsets and the recorded sound environments with a Dolby Surround system. The results show that the sound environment has only a weak impact on quality judgments, which validates the use of laboratory listening tests.
A Comparison of Speech Quality Judgments in Laboratory and in Real Environment
William L. Martens, Atsushi Marui
Previous studies attempting multidimensional perceptual control of distortion effects have shown that the perceived sharpness is a salient timbral attribute that varies with changes in distortion effects processing parameters. In order to provide perceptually-based (psychophysically-calibrated) control over multiparameter distortion effects processing, a prediction equation for sharpness was derived from the results of listening tests. Though sharpness may be predicted as a weighted first moment of the critical band rate distribution of specific loudness, this first approximation is not sensitive to variation in the low-frequency spectral envelope of the sound stimulus. A model was developed that extends the standard sharpness prediction equation to include dependence upon additional parameters that were shown to modulate perceived sharpness of the guitar sound stimuli employed in this study.
Psychophysical Calibration of Sharpness for Multiparameter Distortion Effects Processing
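The "standard sharpness prediction equation" mentioned above computes sharpness in acum as a weighted first moment of specific loudness N'(z) over critical-band rate z: S = 0.11 * sum(N'(z) g(z) z dz) / sum(N'(z) dz). The sketch below implements that baseline on a discrete Bark grid. It uses one commonly quoted approximation of Zwicker's weighting, g(z) = 1 up to 16 Bark and 0.066 exp(0.171 z) above. Other standards, such as DIN 45692, and other authors use slightly different weightings. The paper's extended model adds low-frequency envelope terms that the abstract does not give, so that extension is not reproduced here.

```python
import numpy as np

def zwicker_weighting(z):
    """One commonly quoted approximation of Zwicker's sharpness weighting g(z)."""
    z = np.asarray(z, float)
    return np.where(z <= 16.0, 1.0, 0.066 * np.exp(0.171 * z))

def sharpness_acum(specific_loudness, dz=0.1):
    """Sharpness in acum as a weighted first moment of specific loudness N'(z).

    specific_loudness[i] is N' (sone/Bark) at critical-band rate z = (i + 0.5) * dz.
    """
    n_spec = np.asarray(specific_loudness, float)
    z = (np.arange(len(n_spec)) + 0.5) * dz
    total = np.sum(n_spec) * dz             # total loudness N in sone
    if total <= 0:
        raise ValueError("total loudness must be positive")
    return 0.11 * np.sum(n_spec * zwicker_weighting(z) * z) * dz / total

# Usage: narrow-band specific loudness centred near 8 Bark on a 0-24 Bark grid.
spec = np.zeros(240)
spec[80] = 1.0
print(sharpness_acum(spec))
```

Because the moment is normalized by total loudness, this first approximation responds mainly to where loudness sits along the Bark scale. That behaviour is consistent with the abstract's point that it is insensitive to changes in the low-frequency spectral envelope.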
Koray Ozcan, Simon C. Busbridge, Peter A. Fryer, Gary P. Geaves, Jon P. Moore
This paper extends auralisation results previously presented for interaural time and intensity conflict-cue experiments by including phase for multiple-frequency tone bursts and wideband signals. A method based on the Hilbert transform is presented that manipulates the phase of all component frequencies in a wideband signal while leaving the amplitude structure unchanged, so that phase and time become distinguishable for such signals. The results indicate that localisation remains strong in the presence of large phase shifts. Furthermore, the central diffuse field that is characteristic of intensity versus interaural time conflict experiments is absent when the intensity and phase of wideband signals are placed in conflict.
The Significance of Phase as an Auditory Cue
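The abstract names the Hilbert transform but not the exact procedure. One standard way to give every component the same phase shift phi is y = Re{(x + jH{x}) e^(j phi)}, where x + jH{x} is the analytic signal. A time delay, by contrast, adds a phase that grows with frequency. The sketch below computes the analytic signal via the FFT and assumes a real input block with no energy at DC or Nyquist. Those two bins can only be scaled by cos(phi), not rotated. Function names are illustrative, and this is not necessarily the authors' implementation.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + j*H{x}, computed by zeroing negative FFT frequencies."""
    x = np.asarray(x, float)
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def constant_phase_shift(x, phi):
    """Rotate the phase of every frequency component by phi radians.

    The magnitude spectrum is unchanged (apart from DC and Nyquist, which can
    only be scaled by cos(phi)), so phase is altered independently of timing.
    """
    return np.real(analytic_signal(x) * np.exp(1j * phi))

# Usage: a -90 degree shift turns each cosine component into a sine.
n = 1024
k = np.arange(n)
x = np.cos(2 * np.pi * 3 * k / n) + 0.5 * np.cos(2 * np.pi * 40 * k / n + 0.3)
y = constant_phase_shift(x, -np.pi / 2)
```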
An audible anomaly was heard while evaluating a woofer excited with sinusoidal signals. Standard testing did not reveal the nature or cause of this unusual sound. Further investigation showed that the loudspeaker was producing a component at half the input frequency, i.e. a subharmonic. Reported cases of subharmonic production are rare for woofers; most reports concern compression drivers. This paper presents the techniques used to confirm that the woofer generated a subharmonic and shows the mechanisms that combined to produce it. Further examples from other loudspeakers that occasionally exhibit this anomaly are also discussed.
Detection and Diagnoses of Subharmonic Tones Generated in Woofers
John Vanderkooy, Paul M. Boers, Ronald M. Aarts
In an earlier paper (AES 113th #5651) we showed the basic consequences of a dramatic increase in the motor strength Bl of a driver, as it relates to the efficiency of the loudspeaker and the effect on amplifier dissipation. Here we study the effect of reducing the cabinet size, increasing the mass of the cone, and the use of vented systems or those with passive radiators. A high Bl-value has a positive influence on many aspects of loudspeaker systems that have traditionally been relegated to standard recipes. A glance at some products available today suggests that at least some of the loudspeaker industry is aware of these ideas.
Direct-Radiator Loudspeaker Systems with High Bl
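One standard Thiele-Small relation shows why a dramatic increase in Bl upsets the usual design recipes: the driver's electrical Q is Qes = 2*pi*fs*Mms*Re / (Bl)^2. Doubling Bl therefore reduces Qes by a factor of four. The driver becomes heavily overdamped, and its response no longer suits a conventional box alignment. The sketch below evaluates this formula for hypothetical parameter values chosen only for illustration. It does not reproduce the paper's analysis of efficiency, cabinet size or vented systems.

```python
import math

def electrical_q(fs_hz, mms_kg, re_ohm, bl_tm):
    """Electrical Q of a driver: Qes = 2*pi*fs*Mms*Re / (Bl)^2.

    fs_hz: resonance frequency (Hz), mms_kg: moving mass (kg),
    re_ohm: voice-coil DC resistance (ohm), bl_tm: force factor (T*m).
    """
    return 2 * math.pi * fs_hz * mms_kg * re_ohm / bl_tm ** 2

# Hypothetical woofer: fs = 40 Hz, Mms = 30 g, Re = 6 ohm; only Bl is varied.
for bl in (5.0, 10.0, 20.0):
    print(f"Bl = {bl:4.1f} T*m -> Qes = {electrical_q(40.0, 0.03, 6.0, bl):.3f}")
```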
Guillaume Pellerin, Jean-Dominique Polack, Jean-Pierre Morkerken
Current research in electroacoustics aims to determine the global transfer function between an initial electrical signal and the acoustical signal delivered at the ear. Because electrodynamic transducers radiate over a wide frequency bandwidth, lumped-parameter models such as Thiele and Small's are not sufficient for a realistic simulation of the vibro-acoustic behaviour of the system. This study proposes using the Finite Element and Boundary Element Methods to compute the complex 3D response of a loudspeaker for each mechanical mode, and then synthesizing an equivalent electrical model that takes into account the acoustical coupling between all modes.
Finite Element Methods and Equivalent Electrical Models for Loudspeaker Characterization
Erhard E. Werner
Protection against hearing impairment is both an individual and a community matter. Regulations for offices and factories have existed for a long time, but proposals for similar protective measures in the leisure domain have provoked strongly conflicting arguments. This contribution deals with a European attempt to find a compromise between the natural wish for unlimited individual acoustic fun and the fundamental consequences of living in a community that offers social health care. Details of EN 50332 and their consequences for the technical design of portable audio equipment are presented with reference to the product safety directive.
Product Safety - End of Audio Fun?
A boundary element model is used to analyse a folded horn. Results from the boundary element model are compared to measurements of the throat radiation impedance and the far-field acoustic response. Further analysis shows how one-dimensional and lumped parameter models can be derived from the boundary element results, and used to gain insight into the behaviour of the folded horn in a loudspeaker.
Analysis of the Folded Horn
D. B. (Don) Keele, Jr.
The full-sphere sound radiation pattern of the CBT circular-wedge curved-line loudspeaker array exhibits a 3D petal-shaped sound radiation pattern that stays surprisingly uniform with frequency. Oriented vertically, it not only exhibits the expected uniform control of vertical coverage but also provides significant coverage control horizontally. The horizontal control is provided by a vertical coverage that smoothly decreases as a function of the horizontal off-axis angle and reaches a minimum at right angles to the primary listening axis. This is in contrast to a straight-line array that exhibits a 3D sound field that is axially symmetric about its vertical axis and exhibits only minimal directivity in the horizontal plane due to the inherent directional characteristics of each of the sources that make up the array.
The Full-Sphere Sound Field of Constant Beamwidth Transducer (CBT) Loudspeaker Line Arrays
Jim Brown, Bill Whitlock
Neil Muncy has shown that audio-frequency current flowing on the shield of balanced audio wiring is converted to differential-mode voltage by any imbalance in the transfer impedance of cables, and hypothesized that the effect increases linearly with frequency. Whitlock has shown that conversion also occurs with capacitive imbalance. This paper confirms Muncy's hypothesis and shows that shield-current-induced noise can be significant in the MHz range.
Common-Mode to Differential-Mode Conversion in Shielded Twisted-pair Cables (Shield-Current-Induced Noise)
Pierre Touzelet, Menno van der Veen
In earlier preprints new vacuum-tube and output transformer models were proposed. They are now applied to the famous Quad II valve amplifier. Model results and measurements are compared in the frequency, time and amplitude domains. It is shown that transformers, tubes and complete amplifiers can be modelled with great precision.
New Vacuum-Tube and Output Transformer Models Applied to the Quad II Valve Amplifier
Malcolm O.J. Hawksford
System measurement employing pseudo-random filtered noise and music sequences is investigated. An efficient single-pass technique is used to evaluate simultaneously the transfer function and the spectral-domain signal-to-distortion ratio; it is applicable to amplifiers, signal processors, digital-to-analogue converters, perceptual coders and loudspeakers. The technique is extended to determine a power-series model from which nonlinear distortion can be estimated for an arbitrary excitation without the need for re-measurement.
System Measurement and Modeling using Pseudo-random Filtered Noise and Music Sequences
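The abstract does not detail the single-pass technique. For orientation, the sketch below shows a textbook cross-spectral approach to the same two quantities, and it should not be read as the paper's method. The H1 estimate is H = Sxy / Sxx. The coherence gamma^2 splits the output into a coherent, linear part and a non-coherent part, and their ratio gamma^2 / (1 - gamma^2) serves as a per-bin signal-to-distortion ratio. The sketch assumes that each block is one period of a periodic noise excitation, so that the system acts by circular convolution and there is no spectral leakage. It does not build the power-series model.

```python
import numpy as np

def h1_and_sdr(x_blocks, y_blocks, eps=1e-15):
    """H1 transfer-function estimate and coherence-based SDR per frequency bin.

    x_blocks, y_blocks: arrays of shape (n_blocks, block_len) holding the
    excitation and the measured response, one independent noise block per row.
    """
    X = np.fft.rfft(x_blocks, axis=1)
    Y = np.fft.rfft(y_blocks, axis=1)
    sxx = np.mean(np.abs(X) ** 2, axis=0)
    syy = np.mean(np.abs(Y) ** 2, axis=0)
    sxy = np.mean(np.conj(X) * Y, axis=0)
    H = sxy / sxx
    coherence = np.abs(sxy) ** 2 / (sxx * syy)
    # Coherent (linear) versus non-coherent (distortion plus noise) output power.
    sdr_db = 10 * np.log10(coherence / np.maximum(1 - coherence, eps))
    return H, coherence, sdr_db

# Usage with a hypothetical 3-tap system applied by circular convolution per block.
rng = np.random.default_rng(1)
n_blocks, L = 16, 512
h = np.array([1.0, 0.5, -0.25])
Hh = np.fft.rfft(h, L)
x = rng.standard_normal((n_blocks, L))
y_lin = np.fft.irfft(np.fft.rfft(x, axis=1) * Hh, L, axis=1)
y_nl = y_lin + 0.05 * y_lin ** 3           # add a mild cubic distortion
```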
Christos Goussios, George Kalliris, Charalampos Dimoulas, George Papanikolaou, Stylianos-Marinos Charalampidis
New techniques for improving the frequency response, radiation patterns and maximum SPL of a horn-loaded omnidirectional point source are presented. A ported system is added as a stand for the existing source to improve response and level at lower frequencies. Cone-shaped reflectors are placed inside the horns to improve the polar patterns at higher frequencies.
Improvements of a Horn-loaded Omnidirectional Sound Source
Proper restoration of historic disc recordings is assisted by accurate knowledge of the characteristics of the equipment used in the recording process. This paper traces the evolution of electro-mechanical disc cutting heads from the early 1920s to the last models. Electro-mechanical equivalent circuits are used to analyse the performance of moving-iron, moving-coil and motional-feedback types.
The Development of Disc Cutting Heads
Roger Shively, Josh King
The results of a study of the quality of automotive doors as loudspeaker enclosures are presented, together with a measurement method for quantifying the performance of the doors as enclosures and relating it to sound quality.
Automotive Doors as Loudspeaker Enclosures
Carl Hetherington, Anthony I. Tew
In their recent paper, Tao et al. describe differential pressure synthesis (DPS), an efficient method of estimating the pressure response of human head shapes. In its present form, DPS is unsuitable for parameterising the complex shape of the human pinna. We propose a novel method of parameterising human pinna shape using a compact set of elliptic Fourier coefficients. We present the results of acoustic simulations investigating the trade-off between acoustic accuracy and data compression for a pinna mounted on an infinite baffle. An important application of the method, once integrated into DPS, is the estimation of personalised head-related transfer functions.
Parameterizing Human Pinna Shape for the Estimation of Head-related Transfer Functions
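Elliptic Fourier coefficients describe a closed 2-D outline as a sum of harmonics. Each harmonic n has four coefficients (a_n, b_n, c_n, d_n), and keeping fewer harmonics trades shape accuracy for compression. The sketch below uses the standard Kuhl-Giardina formulation for a closed polygon, such as a traced pinna contour, parameterized by arc length. The abstract does not say which variant the authors use, how the pinna is sampled, or how the coefficients feed into DPS, so none of that is reproduced. The function name is illustrative.

```python
import numpy as np

def elliptic_fourier_coeffs(contour, order):
    """Elliptic Fourier coefficients (a_n, b_n, c_n, d_n) of a closed polygon.

    contour: (N, 2) array of x, y points; the last point connects back to the
    first. Consecutive points must be distinct. Returns an (order, 4) array
    whose row n-1 holds harmonic n.
    """
    pts = np.asarray(contour, float)
    d = np.diff(np.vstack([pts, pts[:1]]), axis=0)   # segment vectors incl. closing one
    dt = np.hypot(d[:, 0], d[:, 1])                   # segment lengths
    t = np.concatenate([[0.0], np.cumsum(dt)])        # arc-length parameter
    T = t[-1]                                         # perimeter
    coeffs = np.zeros((order, 4))
    for n in range(1, order + 1):
        phi = 2 * n * np.pi * t / T
        dcos = np.cos(phi[1:]) - np.cos(phi[:-1])
        dsin = np.sin(phi[1:]) - np.sin(phi[:-1])
        scale = T / (2 * n ** 2 * np.pi ** 2)
        coeffs[n - 1] = scale * np.array([
            np.sum(d[:, 0] / dt * dcos),   # a_n: cosine term of x(t)
            np.sum(d[:, 0] / dt * dsin),   # b_n: sine term of x(t)
            np.sum(d[:, 1] / dt * dcos),   # c_n: cosine term of y(t)
            np.sum(d[:, 1] / dt * dsin),   # d_n: sine term of y(t)
        ])
    return coeffs

# Usage: a finely sampled unit circle is captured almost entirely by harmonic 1.
theta = 2 * np.pi * np.arange(256) / 256
circle = np.column_stack([np.cos(theta), np.sin(theta)])
print(np.round(elliptic_fourier_coeffs(circle, 3), 4))
```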
Mark D. Plumbley, Samer A. Abdallah
We used Independent Component Analysis (ICA) with sparse coding to analyze music spectral sequences. We modelled an audio spectrum as an approximate mixture of the spectra of individual notes, using our ICA approach to “unmix” this to find the individual notes and note spectra. Notes are assumed to be approximately independent, and sparse (mostly off). Results on synthesized harpsichord music are encouraging, producing an approximate piano-roll transcription, and a passable rendition of the original music when resynthesized. We are currently working to extend and improve this through the use of temporal information of note activities and to handle more complex timbral behaviour.
An Independent Component Analysis Approach to Automatic Music Transcription
Enrique Alexandre, Antonio Pena
This paper presents an efficient model that implements a multilevel structure of auditory information. An exhaustive analysis of the input signal in both subjective and objective terms is performed, which allows the coding to be tailored to the particular characteristics of the input signal. The model comprises not only the calculation of the masking threshold but also several techniques and tools designed to reduce the amount of audible artifacts in the coded signal.
Efficient Model Performing a Multilevel Structure of Auditory Information Applied to Audio Coding
This paper describes the BlockCompiler, an experimental software environment developed for flexible yet efficient simulation of acoustic and audio systems. It is based on computational block objects and their interconnection networks, and it supports several different modeling paradigms. It is particularly powerful for creating physical models in which bidirectional interaction between physical elements has to be represented. High-level model specifications are compiled to efficient code, supporting real-time simulation and sound synthesis of relatively complex systems. Simulation examples to be described in the full paper will include modeling of musical instruments, speech synthesis, simulation of 2-D and 3-D acoustical structures, and loudspeaker simulation.
BlockCompiler: Efficient Simulation of Acoustic and Audio Systems
Pierrick Lotton,Bertrand Lihoreau,Michel Bruneau,Vitali Gusev,
Flexural-mode piezoelectric transducers are extensively used as acoustic actuators. The usefulness of equivalent circuit modelling for characterising this kind of piezoelectric source has long been recognised, even though conventional models are empirical or rely on drastic approximations. The aim of the study is to improve this kind of equivalent network so that it links the basic physical parameters directly, avoiding an overly intricate formulation while dispensing with the usual approximations. The resulting equivalent network has a classical structure, but its circuit parameters are known analytically as simple functions of the system parameters. To validate the model, an application is presented in which the piezoelectric source is loaded with a resonator.
An Analytical Modelling to Describe the Coupling Between a Piezoelectric Actuator and a Loading Medium. Validation of the Method for Engineering Problems
Juan Jose Gomez-Alfageme,Beatriz Sanchez-Alonso,
Teaching electroacoustics in audio engineering curricula, and especially the characterization of electroacoustic transducers, has always been difficult with traditional teaching methods. Simulation tools have allowed us to develop applications for studying these transducers based on equivalent circuits and their analysis in both the time and frequency domains. This paper describes some of these applications for the design of low-frequency band-pass radiation systems with variable geometry, using Mathcad as the simulation tool.
Simulation Tools in Electroacoustic Transducers. A Case Study: Different Order Band-pass System Design
Fangli Ning,Juan Wei,
This paper presents a new method, based on an acoustic finite element method (FEM) model, for calculating the reverberation time in a car compartment. Unlike Sabine's equation, the new method does not require a diffuse sound field. As a result, it can give correct reverberation times for car compartments under design, whatever their shape and boundary conditions. First, the FEM model of the car compartment is described. Second, the new method is presented in detail and implemented as a computer program. Finally, the reverberation time of a car compartment model is calculated with the new method and compared with experimental results; the comparison shows that the method is effective and feasible.
One New Method for Calculating Reverberation Time in Car Compartment
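For reference, the Sabine baseline that the abstract contrasts with the new method can be sketched in Python using the metric form T60 = 0.161 V / A. This sketch does not reproduce the FEM-based method. The function name and the compartment data are hypothetical.

```python
def sabine_rt60(volume_m3, surfaces):
    """Sabine reverberation time T60 = 0.161 * V / A, in seconds.

    volume_m3 -- enclosed volume in cubic metres
    surfaces  -- iterable of (area_m2, absorption_coefficient) pairs
    """
    absorption = sum(area * alpha for area, alpha in surfaces)  # A in m^2 sabins
    if absorption <= 0.0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / absorption


# Hypothetical compartment data (glass, trim, seats), for illustration only.
cabin = [(4.0, 0.05), (3.0, 0.30), (2.5, 0.60)]
print(sabine_rt60(3.0, cabin))
```

Sabine's formula presumes a diffuse field, which a small, heavily damped and irregular car cabin rarely provides. That limitation is what motivates the FEM approach described above.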
Evan Reiley,Anthony Grimani,
This report documents the research and prototyping of a new form of acoustical bass absorber. The absorber reduces the peak and dip frequency response errors caused by interference from naturally occurring standing waves in rooms. The design uses two forms of simple harmonic resonance: pistonic diaphragm resonance and Helmholtz cavity resonance. The pistonic diaphragm resonance is achieved by attaching a rigid planar membrane to metal springs. The Helmholtz cavity resonance is achieved by constructing an enclosed chamber attached to an open tube. Coupling these two dissipation devices led to a several-fold improvement in total room mode attenuation.
Room Mode Bass Absorption Through Combined Diaphragmatic & Helmholtz Resonance Techniques: “The Springzorber”
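The two resonance mechanisms named in the abstract follow textbook lumped-element formulas. The Python sketch below assumes their simplest forms:

- Helmholtz frequency f = (c / (2*pi)) * sqrt(S / (V * L)), with end corrections ignored.
- A rigid diaphragm on springs, f = sqrt(k / m) / (2*pi), with the air-spring stiffness of the enclosure ignored.

The dimensions used are hypothetical, not the Springzorber's.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def helmholtz_frequency(neck_area, cavity_volume, neck_length, c=SPEED_OF_SOUND):
    """Helmholtz resonance in Hz: f = c/(2*pi) * sqrt(S / (V * L)); end corrections ignored."""
    return c / (2.0 * math.pi) * math.sqrt(neck_area / (cavity_volume * neck_length))


def spring_mass_frequency(stiffness, moving_mass):
    """Resonance in Hz of a rigid diaphragm on springs: f = sqrt(k / m) / (2*pi)."""
    return math.sqrt(stiffness / moving_mass) / (2.0 * math.pi)


# Hypothetical tube: radius 5 cm, length 15 cm, on a 0.1 m^3 chamber.
print(round(helmholtz_frequency(math.pi * 0.05 ** 2, 0.1, 0.15), 1))  # about 39.5 Hz
```

Tuning both resonances near a problem room mode is the general idea. How the prototype actually couples them is described in the paper, not here.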
Ernst-Joachim Voelker,Wolfgang Teuber,
The acoustics of concert halls is still something of a mystery. The conductor, the orchestra and the hall are linked together in reaching an optimal performance. The sound created must balance many important influences, such as sound in the energy and time domains, the "distance of privacy" for musicians, the "sound field of balance" for the audience, noise disturbances, and the acoustical properties of the surfaces. The conductor's position is an optimal place for listening to the "natural multichannel performance" of the orchestra. The 100-year-old Festhalle in Landau has been completely reconstructed, including its old Jugendstil ceilings and walls. Distinct decisions had to be made to create adequate acoustics for concerts, theater and rehearsals.
Room Acoustics for Rehearsals and Concerts - The New Festhalle in Landau, Germany
Johan van der Werff,Dick de Leeuw,
The Peutz prediction algorithms published in 1971 in J.A.E.S. 19 are still valid and, given their simplicity, remarkably accurate. However, some revision is needed to adapt them to contemporary room simulation and cluster design programs. This paper deals with predicting the Articulation Loss of consonants (ALcons) from the data usually available at the drawing stage of an acoustical project. Special attention is given to handling multiple sources, both close together and far apart. An Excel® spreadsheet for quick calculation according to the proposed method will be available to attendees of the presentation of this paper.
What You Specify Is What You Get (part 1)
Johan van der Werff,Rob.A. Metkemijer,
The Peutz prediction algorithms for the Articulation Loss of consonants (ALcons), published in 1988 (85th Convention, Los Angeles), did not seem to get the attention they deserved in the acoustics community. Perhaps this is due to confusion caused by the totally different set of algorithms compared with the 1971 set, or to the more complicated calculations. Most likely, though, it is due above all to uncertainty about how and where to obtain the physical quantities needed as input. This paper deals with the underlying principles, how to extract the data from an impulse response, and how to calculate ALcons from them. This is expected to be a valuable addition to the well-known STI measurements, because the data can be narrow-band (one octave wide). Their acquisition is also insensitive to signal processors in the signal chain and to the type of filters used in post-processing. A computer program will be available to attendees of the presentation; it reads a set of measured or calculated impulse responses, extracts the data, calculates ALcons and presents the results.
What You Specify Is What You Get (part 2)
The Carrouso project combines technologies for recording, transmission and rendering of 3D sound scenes. The rendered virtual scene includes both the sound content and the spatial and room acoustic description of the performance space. MPEG-4 tools are used to encode this data: general audio coding compresses the sound streams, and the scene description tools create virtual 3D audio scenes. We describe the creation of the virtual acoustic space, carried out with room acoustics analysis software and an authoring tool. A visual representation of the virtual sound scene is also transmitted to the renderer, where it acts as a user interface allowing renderer-side scene modification via the interaction mechanisms provided in MPEG-4.
User Interaction and Authoring of 3D Sound Scenes in the Carrouso EU project
Although the electronic performance of inductive loop and other hard-of-hearing (HoH) assistive audio systems is well covered by appropriate criteria and codes of practice, little or no attention appears to have been given to the acoustic performance and requirements of such systems. The paper reports the results of acoustic performance testing carried out on a number of HoH systems and components. In particular, the effects on the clarity and intelligibility of speech picked up by a system's microphones are discussed. Parameters such as microphone type, orientation, location and distance from the talker are all shown to have a significant effect on the resulting performance. The paper concludes with recommendations for test procedures and design criteria targets for the acoustic and intelligibility performance of such systems.
The Acoustic and Intelligibility Performance of Assistive Listening & Deaf Aid Loop (AFILS) Systems
Guillaume Potard,Jens Spille,
MPEG-4 AudioBIFS can currently describe and present point sound sources (e.g. a flying insect or a distant sound source). However, it cannot describe sound sources with a certain spatial extent, such as a choir, an orchestra, a seafront or rain. Spatial wideness, or tonal volume, is nevertheless a very important perceptual aspect of real sound sources. In July 2002 we proposed enhancing the spatial illusion of virtual sound scenes by adding sound source wideness and shape to AudioBIFS. A core experiment then followed to study the usefulness of this proposal. The details of the experiments and the conclusions are described in this paper.
Study of Sound Source Shape and Wideness in Virtual and Real Auditory Displays
Ben Supper,Tim Brookes,Francis Rumsey,
A number of problems have recently come to light whilst attempting to perform perceptually relevant computational analysis of binaural recordings made within enclosed spaces. In particular, it is not possible to extract reliable information for auditory source width or listener envelopment without accounting for the time-domain properties of the stimulus. A new method for performing computational spatial analysis entails computing the running interaural cross-correlation of the binaural signal whilst employing an adaptive filter to perform basic dereverberation, hence gaining an amplitude characteristic of the source stream. Early experimental results indicate that this new technique yields an indication of auditory spatial attributes which is more reliable than that attainable previously.
A New Approach to Detecting Auditory Onsets within a Binaural Stream
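The running interaural cross-correlation mentioned in the abstract can be sketched in Python as below. The sketch computes the normalised cross-correlation frame by frame and takes its peak over a limited lag range; about ±1 ms is customary, i.e. ±48 samples at 48 kHz. The adaptive dereverberation filter is not shown. The function names and the frame, hop and lag values are illustrative assumptions.

```python
import math


def iacc(left, right, max_lag):
    """Peak of the normalised interaural cross-correlation over |lag| <= max_lag samples.

    Assumes left and right have equal length.
    """
    norm = math.sqrt(sum(v * v for v in left) * sum(v * v for v in right))
    if norm == 0.0:
        return 0.0
    n = len(left)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag], using only indices inside the frame.
        acc = sum(left[i] * right[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        best = max(best, abs(acc) / norm)
    return best


def running_iacc(left, right, frame, hop, max_lag):
    """IACC evaluated frame by frame, giving a time track instead of one global value."""
    return [iacc(left[s:s + frame], right[s:s + frame], max_lag)
            for s in range(0, len(left) - frame + 1, hop)]


# Hypothetical settings at 48 kHz: 10 ms frames, 5 ms hop, +/-1 ms lag range.
# track = running_iacc(left_ear, right_ear, frame=480, hop=240, max_lag=48)
```

A frame-wise track matters because, as the abstract notes, spatial attributes depend on the time-domain properties of the stimulus, which a single global IACC value averages away.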
Aki Harma,Julia Jakka,Miikka Tikander,Matti Karjalainen,Tapio Lokki,Heli Nironen,Sampo Vesa,
The concept of augmented reality audio covers techniques in which the real sound environment is extended with virtual auditory environments and communication scenarios. This article introduces a framework for Wearable Augmented Reality Audio (WARA) based on a specific headset configuration and a real-time audio software system. We review the relevant literature and aim to identify the most promising application scenarios for WARA. Listening test results with a prototype system are presented.
Techniques and Applications of Wearable Augmented Reality Audio
Tobias Neher,Francis Rumsey,Tim Brookes,Peter Craven,
This paper reports recent progress towards the development of a spatial ear trainer. A study into the perceptual construct of ‘ensemble width’ (i.e. the lateral spacing of the outer sources contained within an auditory scene) was conducted. With the help of a novel surround panner, exemplary stimuli were created. Changes were highly controlled to enable unidimensional variation of the intended qualitative effect. To assess the success of the simulation, a subjective experiment was designed based on Multidimensional Scaling (MDS) techniques and completed by an experienced listening panel. Additional verbal and non-verbal data were collected so as to facilitate analysis of the perceptual (MDS) space. Results show that unidimensionality was achieved, thus suggesting the stimuli to be suitable for training purposes.
Unidimensional Simulation of the Spatial Attribute ‘Ensemble Width’ for Training Purposes
Mitsuo Matsumoto,Mikio Tohyama,
A previously introduced algorithm for simulating a moving sound image was evaluated objectively and subjectively. The algorithm uses time-variant convolution together with a method for interpolating binaural impulse responses that takes the arrival times into account. Three moving sound images were used: an actual one recorded with a rotating dummy head, one simulated with the conventional cross-fading method, and one simulated with the algorithm. The objective evaluation, by spectrograms of the three images, showed that the one simulated with the algorithm was quite close to the actual one. The subjective evaluation, using d’, showed that it was perceived to move “smoothly.”
Algorithms for Moving Sound Images
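The idea behind arrival-time-aware interpolation can be sketched in Python in three steps:

1. Align the two responses by their onsets.
2. Interpolate their shapes.
3. Delay the result by the interpolated arrival time.

The sketch assumes integer-sample onsets found by a simple threshold. That rule and the function names are hypothetical; the published algorithm may use a different onset estimate and fractional delays.

```python
def onset_index(h, fraction=0.1):
    """First sample whose magnitude reaches `fraction` of the peak (hypothetical onset rule)."""
    peak = max(abs(v) for v in h)
    for i, v in enumerate(h):
        if abs(v) >= fraction * peak:
            return i
    return 0


def shift(h, d):
    """Delay h by d samples (negative d advances it), zero-filling and keeping the length."""
    out = [0.0] * len(h)
    for i, v in enumerate(h):
        if 0 <= i + d < len(h):
            out[i + d] = v
    return out


def interpolate_hrir(h_a, h_b, w):
    """Interpolate between h_a (w = 0) and h_b (w = 1) after aligning their arrival times."""
    t_a, t_b = onset_index(h_a), onset_index(h_b)
    a, b = shift(h_a, -t_a), shift(h_b, -t_b)           # both responses now start at sample 0
    shape = [(1.0 - w) * x + w * y for x, y in zip(a, b)]
    arrival = round((1.0 - w) * t_a + w * t_b)            # interpolated arrival, whole samples
    return shift(shape, arrival)
```

Consider impulses arriving at samples 2 and 6 with a weight of 0.5. The sketch returns a single unit impulse at sample 4, whereas plain cross-fading would give two half-height arrivals at samples 2 and 6.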
Russell Mason,Tim Brookes,Francis Rumsey,
In order to undertake controlled investigations into perceptual effects that relate to the interaural cross-correlation coefficient, experiment stimuli that meet a tight set of criteria are required. The requirements of each stimulus are that it is narrow band, normally has a constant cross-correlation coefficient over time, and can be altered to cover the full range of values of cross-correlation coefficient, including specified variations over time if required. Stimuli created using a technique based on amplitude modulation are found to meet these criteria, and their use in a number of subjective experiments is described.
Creation and Verification of a Controlled Experimental Stimulus for Investigating Selected Perceived Spatial Attributes
Laurent Bonnet,Roch Lefebvre,
An algorithm capable of extracting multipitch information from a guitar sound is described. The method uses a two-stage approach. First, the sound signal is segmented in time using the derivative of the signal envelope, which identifies the transients between successive chords. In the second stage, a high-resolution FFT is applied to a downsampled version of the signal, yielding a frequency resolution of about 1 Hz with a 1-second time support. An iterative procedure employing frequency-bin interpolation is then applied to the amplitude spectrum to estimate the possible fundamental frequencies or harmonics. The system has been tested with simulated signals and achieves reliable fundamental frequency detection. With real guitar chords, the performance of the algorithm depends on the harmonic complexity of the sound.
High-Resolution Robust Multipitch Analysis of Guitar Chords
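The abstract does not say which frequency-bin interpolation is used. As one common option, the Python sketch below refines an FFT peak with parabolic interpolation on the log magnitude of a Hann-windowed frame. With a 1 s frame the bin spacing is 1 Hz, matching the resolution quoted above. The envelope-based segmentation, downsampling and iterative multipitch search are not reproduced. The function names are hypothetical, and the small radix-2 FFT is included only to keep the example self-contained.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def peak_frequency(x, fs):
    """Strongest spectral peak in Hz, refined by parabolic interpolation on the log magnitude."""
    n = len(x)
    hann = [0.5 * (1.0 - math.cos(2.0 * math.pi * i / n)) for i in range(n)]
    mag = [abs(c) for c in fft([v * w for v, w in zip(x, hann)])[: n // 2]]
    k = max(range(1, n // 2 - 1), key=lambda i: mag[i])   # skip DC and the last bin
    a, b, c = (math.log(mag[i] + 1e-300) for i in (k - 1, k, k + 1))
    denom = a - 2.0 * b + c
    delta = 0.0 if denom == 0.0 else 0.5 * (a - c) / denom  # fractional-bin offset
    return (k + delta) * fs / n


fs = 1024.0                                  # so a 1024-sample (1 s) frame gives 1 Hz bins
tone = [math.sin(2 * math.pi * 100.3 * i / fs) for i in range(1024)]
print(peak_frequency(tone, fs))
```

Interpolating between bins lets the estimate resolve frequencies well below the 1 Hz bin spacing. That finer resolution matters when closely spaced partials from several strings overlap.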
This paper describes the MATRIX (Multipurpose Array of Tactile Rods for Interactive eXpression) interface and its use as a controller for several signal processing algorithms (e.g., delay, reverb, EQ, and chorus). Although these algorithms are conventional effects, the multi-parameter control provided by the MATRIX allows users to manipulate audio in a manner that was not previously feasible. For more information about the MATRIX, please see http://www.create.ucsb.edu/~dano/matrix/
Control of Signal Processing Algorithms using the MATRIX Interface
Yves Grenier,Bertrand David,
Extracting a weak audio signal buried beneath a stronger one is a difficult task that may be encountered in forensic applications. Extraction of the background signal is made difficult by the clearly negative signal-to-noise ratio. Apart from this difficulty, the problem resembles that of separating the different components of an audio signal. We investigate the possibility of using separation techniques for this extraction, based on harmonic models of the stronger signal. We compare two approaches, both theoretically and practically. In the first, estimation of harmonic models is followed by subtraction of the harmonic signal. In the second, high-resolution tracking of damped sinusoids allows each component to be synthesized after sorting the individual patterns.
Extraction of Weak Background Transients from Audio Signals
Dynamics processing is performed by amplifying devices whose gain is automatically controlled by the level of the input signal. Non-linear components simulating tube amplifiers can be used in these devices to give musical signals an audibly denser sound. This paper deals with the simulation of tube amplifiers using a power polynomial approximation of the transfer characteristic. It also presents a method for computing the power polynomial coefficients from the required ratios of the higher harmonics.
Non-linear Dynamics Processing
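One standard way to obtain power-polynomial coefficients from required harmonic amplitudes uses Chebyshev polynomials, since T_k(cos θ) = cos(kθ). Whether the paper uses this construction is not stated. The Python sketch below is an illustration under that assumption, and its function names are hypothetical. The requested harmonic amplitudes hold exactly only for a unit-amplitude sinusoidal input.

```python
def chebyshev_polys(count):
    """Power-series coefficients of T_0 .. T_{count-1}, via T_{k+1} = 2x*T_k - T_{k-1}."""
    polys = [[1.0], [0.0, 1.0]]
    while len(polys) < count:
        prev, curr = polys[-2], polys[-1]
        nxt = [0.0] + [2.0 * c for c in curr]   # 2x * T_k
        for i, c in enumerate(prev):            # minus T_{k-1}
            nxt[i] -= c
        polys.append(nxt)
    return polys[:count]


def harmonics_to_power_coeffs(harmonic_amps):
    """harmonic_amps[k]: wanted amplitude of harmonic k (k = 0 is DC) for a unit cosine input.

    Returns c such that y = sum(c[i] * x**i) produces exactly those harmonics.
    """
    coeffs = [0.0] * len(harmonic_amps)
    for amp, poly in zip(harmonic_amps, chebyshev_polys(len(harmonic_amps))):
        for i, c in enumerate(poly):
            coeffs[i] += amp * c
    return coeffs


def waveshape(x, coeffs):
    """Evaluate the power polynomial at sample x with Horner's rule."""
    y = 0.0
    for c in reversed(coeffs):
        y = y * x + c
    return y


# Hypothetical target: 3 % second and 1 % third harmonic at full scale.
print(harmonics_to_power_coeffs([0.0, 1.0, 0.03, 0.01]))
```

At lower input levels the harmonic balance changes, and the even-order terms can leave a DC offset. This level dependence is part of what distinguishes such a shaper from a fixed equaliser.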
Dimitri Danyuk,Michael J. Renardson,
An audio power amplifier design is presented that can linearize the transfer characteristics of conventional class AB output stages in the crossover region. The objective is to offer practical error-correction circuits that overcome the nonlinearities inherent in the crossover region of class AB output stages.
Error Correction in Class AB Power Amplifiers
Preeti Rao,Saurabh Shandilya,
This paper explores the extraction of melodic pitch contour from the polyphonic soundtrack of a song. The motivation for this work lies in the need for automatic tools for the melody-based indexing of the database in a music retrieval system. The melody is assumed to be carried by the singer’s voice which is accompanied by a mainly percussive instrumental music background. This scenario is typical of a large class of Indian movie songs. The challenges raised by this application are presented. A pitch detection method based on a perception model is shown to be a promising approach to the tracking of voice pitch in the presence of strong percussive background.
Pitch Detection of the Singing Voice in Musical Audio
Alexey Petrovsky,Detlef Krahe,A.A. Petrovsky,
A real-time implementation of a psychoacoustically motivated, wavelet packet (WP)-based monophonic full-duplex audio coder is proposed, using a dynamic algorithm transforms approach developed by the authors. The principle behind the approach is to define parameters of the input audio signals (subband entropy) and of the output encoded sequences (subband rate) for the given embedded processor architecture. Adaptive wavelet analysis for audio coding is particularly interesting if psychoacoustic information is taken into account in the WP decomposition scale. The advantages of this approach are best seen by considering wavelet packet pruning as a splitting process: the temporal construction of the WP tree for each signal frame provides an ideal decision for real-time processing implemented in reconfigurable hardware.
Real-Time Wavelet Packet-based Low Bit Rate Audio Coding on a Dynamic Reconfiguration System
Panagiotis D. Hatziantoniou,John N. Mourjopoulos,
Digital equalisation of room acoustics based on inverse filtering of measured response functions introduces a number of theoretical and practical challenges. To overcome them, inverse filtering based on modified measured responses is proposed. These responses are derived via complex smoothing of their transfer functions, so that they are more perceptually compliant, of lower order and less position-sensitive than the original functions. The aim of this study is to evaluate such smoothed room responses for deriving room equalisation filters, using objective and subjective tests in rooms of different sizes with real-time reproduction. Such filters can improve the perceived and measured quality of audio reproduction in any reverberant environment.
Results for Room Acoustics Equalisation Based on Smoothed Responses
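Complex smoothing differs from magnitude-only smoothing in that the real and imaginary parts are averaged together. The averaging uses a window whose width grows with frequency, so the phase response is smoothed as well. The Python sketch below shows this idea. The rectangular window, the bin-edge rounding and the function name are simplifying assumptions; the published scheme may weight the window differently. The subsequent inverse-filter design is not shown.

```python
import math


def complex_smooth(spectrum, fraction=3):
    """1/fraction-octave smoothing of a complex spectrum given as bins 0..K.

    Each output bin is the complex mean of the input bins within a
    rectangular window spanning about 1/fraction octave around it.
    """
    n = len(spectrum)
    half_width = 2.0 ** (1.0 / (2 * fraction))  # frequency ratio from centre to window edge
    out = [spectrum[0]]                          # DC is passed through unchanged
    for k in range(1, n):
        lo = int(math.floor(k / half_width))
        hi = min(n - 1, int(math.ceil(k * half_width)))
        window = spectrum[lo:hi + 1]
        out.append(sum(window) / len(window))    # averages real and imaginary parts
    return out
```

Because the window widens with frequency, fine high-frequency detail that is both perceptually less important and highly position-dependent is averaged out. An inverse filter derived from the result is therefore of lower order and less position-sensitive.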
Mark Gordon,William Hsu,
Musicians in a recording session are used to collaborating in real time. Distance makes this difficult, and current technologies for distance collaboration are either prohibitively expensive or rely on non-real-time, store-and-forward strategies. The Network Audio Recording Environment (NARE) is an object-oriented, client-server environment that enables real-time, collaborative audio production. The system provides full-duplex streaming of data over TCP/IP networks. It also establishes a custom messaging protocol to handle the communication of audio and control information between the client and server. The success of this project demonstrates that real-time, collaborative audio on the Internet is within reach. NARE provides a prototype of an inexpensive solution that requires neither proprietary hardware nor accounts on centralized servers.
Network Audio Recording Environment
Nuno Fonseca,Edmundo Monteiro,
At a time when several audio-over-Ethernet networking solutions are appearing, analysing the latency of such audio networks plays a fundamental role. It helps to identify the factors that could optimize these networks, and also to decide whether or not to include an in-band synchronization signal.
Latency in Audio Ethernet Networks
Men Muheim,Philipp Blum,
Today’s commodity infrastructure (computer hardware, operating system and networking) does not provide the accurate synchronization needed for playback and recording of highly correlated audio, such as multi-channel sound. The presented work explores the performance of two synchronization algorithms that are applicable for the commodity infrastructure. We have implemented the algorithms and compared the achievable accuracy, scalability, processing requirements and communication bandwidth requirements in the context of 100baseT switched Ethernet and 802.11b Wireless Local Area Network (WLAN).
On the Performance of Clock Synchronization Algorithms for a Distributed Commodity Audio System
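The abstract does not name the two algorithms compared. For orientation only, the Python sketch below shows the widely used NTP-style two-way exchange. It estimates clock offset and round-trip delay from four timestamps, assuming symmetric network delay, and keeps the exchange with the smallest delay as the one least disturbed by queuing. The function names are hypothetical.

```python
def offset_and_delay(t1, t2, t3, t4):
    """NTP-style estimate from one request/response exchange.

    t1: client send time (client clock)    t2: server receive time (server clock)
    t3: server send time (server clock)    t4: client receive time (client clock)
    Returns (server clock minus client clock, round-trip network delay),
    assuming the forward and return paths take equally long.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def best_offset(exchanges):
    """Offset from the exchange with the smallest round-trip delay (least queuing)."""
    return min((offset_and_delay(*e) for e in exchanges), key=lambda od: od[1])[0]
```

Consider the illustrative exchange (0, 6, 6.5, 2.5), in which the server clock runs 5 time units ahead and each network leg takes 1 unit. The estimate is an offset of 5 with a round-trip delay of 2. Any asymmetry between the forward and return paths, common on a shared WLAN, appears directly as offset error.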
In hands-free two-way communications systems, the loudspeaker signal feeds back to the microphone, resulting in an undesired echo signal component in the microphone signal. Acoustic echo cancelers (AEC) model the echo path and subtract an estimate of the echo signal from the microphone signal to remove the undesired echo. We present a novel scheme which estimates the echo signal in terms of its spectral envelope. The time and frequency resolution with which this estimation is carried out is chosen according to perceptual criteria. Given the estimated spectral envelope of the echo signal component, speech enhancement and noise suppression algorithms are used to suppress the echo. The presented scheme has low complexity and a high degree of robustness.
Perceptually Motivated Low Complexity Acoustic Echo Control
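Echo suppression driven by a spectral-envelope echo estimate can be sketched in Python as follows:

1. Model the echo power in each band as a per-band gain times the delayed loudspeaker power.
2. Estimate that gain during far-end-only activity.
3. Apply a spectral-subtraction-style gain, with a floor, to the microphone bands.

The band layout, the gain rule, beta, the floor value and the function names are hypothetical choices, not the paper's perceptual parameters.

```python
def estimate_echo_gain_sq(mic_frames, far_frames, eps=1e-12):
    """Per-band echo power gain |G|^2 ~ sum|Y|^2 / sum|X|^2.

    Assumes the frames contain far-end speech only (no near-end talker)
    and that far_frames are already delayed to match the echo path.
    """
    bands = len(mic_frames[0])
    mic = [sum(frame[b] for frame in mic_frames) for b in range(bands)]
    far = [sum(frame[b] for frame in far_frames) for b in range(bands)]
    return [m / max(x, eps) for m, x in zip(mic, far)]


def echo_suppression_gains(mic_power, far_power, echo_gain_sq, beta=1.0, floor=0.1):
    """Per-band suppression gains (power-subtraction rule) for the microphone signal."""
    gains = []
    for y2, x2, g2 in zip(mic_power, far_power, echo_gain_sq):
        echo2 = g2 * x2                                    # estimated echo power in this band
        gain = (y2 - beta * echo2) / y2 if y2 > 0.0 else 0.0
        gains.append(max(floor, gain))                     # floor limits musical noise
    return gains
```

Only band powers are estimated, not the echo waveform. That is why such a scheme needs far less computation than a full adaptive echo canceler and tolerates echo-path changes more gracefully.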
Richard Foss,Jun-ichi Fujimori,
mLAN is a networking technology based on the IEEE 1394 standard that allows audio and music control data to be transported between audio devices. In the original implementation of mLAN, each mLAN node hosted by an audio device contained high-level plug abstraction and connection management software. mLAN-B is the next-generation mLAN architecture, which splits the connection management function between workstation and device. The high-level connection management and plug abstraction capability resides on the workstation, while a thin, low-level connection management capability is left on the device. This approach reduces cost and complexity on the device side and ensures that mLAN systems can be easily upgraded.
A New Connection Management Architecture for the Next Generation of mLAN
Bradley Klinkradt,Richard Foss,
This paper highlights the two interconnection technologies of CobraNet and mLAN, and provides a comparative study of these technologies and their applicability to the sound installation industry, through a discussion of constraints inherent within such an installation. Issues such as the adherence to standards, costs, latency, speed, connection management, and the control and monitoring of devices are explored.
A Comparative Study of mLAN and CobraNet Technologies and their use in the Sound Installation Industry
Michael C. Kelly,Anthony I. Tew,
In this paper we present a novel sound source localisation method that requires the spatial location of two sound sources to be matched by a listener. The method is aided by auditory feedback and effectively provides a measure of the minimum audible angle of the system under test. The number of front-back reversals and the time taken to localise each source are provided as additional indicators of performance. We demonstrate the application of our method by comparing localisation accuracy for a particular source with and without a secondary source present. The results demonstrate a small, but significant angular increase in the localisation error for the dual-source condition and also an increase in the time taken.
A Novel Method for the Efficient Comparison of Spatialization Conditions
Yufei Tao,Anthony I. Tew,Stuart J. Porter,
Simplified head shapes, such as spheres and ellipsoids, have been applied in research on head-related transfer functions (HRTFs). However, the effects of the head shape features missing from the simplified models have not been thoroughly examined. In this paper, head shapes are represented using spherical harmonics, which allows shape simplifications to be carried out in a controlled and systematic way. The KEMAR head shape was lowpass filtered to different degrees. The errors introduced by the lowpass filters in both the head shape and the acoustic pressures were then studied. The influence on the HRTFs of head shape features remote from the ears, with particular reference to the nose, is discussed.
A Study on Head Shape Simplification Using Spherical Harmonics for HRTF Computation at Low Frequencies
Jerome Daniel,Sebastien Moreau,Rozenn Nicol,
Ambisonics and Wavefield Synthesis are two ways of rendering 3D audio, and both aim at physically reconstructing the sound field. Although they derive from distinct theoretical foundations, they have already been shown to be equivalent under given assumptions. This paper generalizes this equivalence by introducing new results on the coding and rendering of finite-distance and enclosed sources. An updated view of current knowledge is given first. A unified analysis of sound pickup and reproduction by means of concentric transducer arrays then provides insight into the spatial encoding and decoding properties. Merging the analysis tools of both techniques and investigating them on common ground highlights general compromises in terms of spatial aliasing, error and noise amplification.
Further Investigations of High-Order Ambisonics and Wavefield Synthesis for Holophonic Sound Imaging
Jean-Marie Pernaux,Marc Emerit,Rozenn Nicol,
Methods for reporting perceived sound source directions were compared. To estimate the localization bias introduced by the reporting method from the location percept to the judgment given by the subject, a simple non-audio based approach was proposed. Training strategies were used to simplify the reporting task. The first evaluated method was a graphical interface showing three schematic 2D views of a spherical head on which the subject reported his localization judgment with the mouse. An enhanced version of this 2D interface was designed, with feedback on the reported direction, using an individual 3D visualization of the subject and indicated direction. Sound localization tests with this interface are compared to tests using the “finger pointing” method.
Perceptual Evaluation of Binaural Sound Synthesis: the Problem of Reporting Localization Judgments
Jean-Michel Raczinski,Georges-Claude Vieilledent,Jerome Monceaux,
This paper presents a new audio process named Arkamys Sound Process. The process is based on the diffusion of sounds in an acoustic environment and the capture of these sounds with a specific device. The digital translation of this model results in a specific database of transfer functions that include the acoustic characteristics of the model and some optimizations. Implementing the corresponding filters on a standard general-purpose processor (1 GHz Pentium III) allows a stereo stream at 48 kHz to be processed. The paper focuses on the main properties of the process, namely sound field separation, envelopment and depth, which bring realism and naturalness to binaural and transaural systems.
Sound Field Management with the Arkamys Sound Process
This paper offers details of the recording techniques, editing processes, and subsequent analyses of the wave samples of traditional Malaysian musical instruments. The samples are used for a multi-media project. One of the project’s outcomes includes a bank of digital samples of the indigenous instruments, for musicians and composers to use with conventional triggering devices such as keyboards, computers or drum triggers. The paper also includes a discussion of Malaysian instrument classification and brief analyses of contemporary Malaysian musical culture.
Digital Stereo Recording of Traditional Malaysian Musical Instruments
George Papanikolaou,George Kalliris,Christos Goussios,Charalampos Dimoulas,
Beethoven’s opera Fidelio was performed inside the walls of Thessaloniki’s Byzantine castle. The need to raise the reverberation time of this open space was met with an ambiophonic system that used electroacoustical devices to generate and amplify the already existing reflections. The whole system was designed to provide a reverberant and highly diffuse character while taking the needs of speech intelligibility into account.
An Application of Ambiophony for the Enhancement of the Reverberant Environment inside the Walls of a Byzantine Castle
Yasushige Nakayama,Kaoru Watanabe,Setsu Komiyama,Fumio Okano,Yoshinori Izumi,
We are studying 3-D sound image reproduction systems associated with 3-D video images, with the goal of creating a highly realistic form of 3-D TV broadcasting in the future. This paper describes a method of 3-D sound image control using loudspeaker arrays, which can position the sound image arbitrarily and continuously. Two kinds of subjective listening tests were conducted, one for distance perception and one for direction perception. We confirmed that the position of the sound image can be controlled in 3-D space using this method.
A Method of 3-D Sound Image Localization using Loudspeaker Arrays
Irina Aldoshina,Stanislav Pychkov,Igor Matcievski,Alexandr Nicanorov,Peter Tovstik,Stepan Chernyiav,
The peculiarities of the tuning and acoustical characteristics of Russian bells have been investigated by various scientists for a long time. The results were reviewed in our previous report (preprint 5117, 108th AES Convention). This paper presents the results of new research, including:
- digital recording of the sound of 16th- to 20th-century bells from various monasteries and churches in Russia;
- computer processing and restoration of the recordings;
- spectral and statistical analysis of the sounds, and comparison of the bells' tuning with that of the conventional Dutch system;
- development of mathematical models of bell vibration;
- creation of software for analysing spectral frequencies and vibration modes;
- synthesis of the bells' geometric form to optimize the spectral structure.
The Analysis of Peculiarities of Russian Bells Acoustic Parameters
Michael S. Pincus,
This paper is a follow-up to “Distributed Sound Reinforcement for Multiple Talker Locations”, presented at the AES 21st International Conference. That paper described a delay matrix used to help listeners localize several talker locations within a space. This paper will review that technique and describe its implementation in a recently completed project. Digital audio files will be presented allowing a subjective comparison between the simulated model and recordings made in the actual space.
Implementation of a Delay Matrix
Marek Szczerba,Frans de Bont,Werner Oomen,Leon van de Kerkhof,
With the introduction of new standards such as DVD and SACD, multi-channel audio systems are growing in popularity. To maintain compatibility for customers who have not migrated to a multi-channel environment, there is strong demand for stereo compatibility. Such compatibility was already introduced in the MPEG-2 BC audio standard by means of a matrixed multi-channel extension. The encoded data consist of the stereo downmix stream and the multi-channel extension information. The bitstream syntax allows legacy stereo decoders to decode the downmix, whereas multi-channel decoders can reconstruct all multi-channel signals. In this paper, a matrixed multi-channel extension coding method based on the MPEG-2 AAC standard is introduced. Several new solutions were introduced to obtain the highest possible audio quality for both the compatible downmix and the multi-channel signal in all circumstances. These include dominant center and dominant surround processing, as well as common window switching. Results of listening tests conducted using both stereo and multi-channel decoded streams are presented.
Matrixed Multi-channel Extension for AAC codec
Akira Nishimura,Nobuo Koizumi,
A method to measure sampling jitter that might be generated by a digital-to-analog converter (DAC) or an analog-to-digital converter (ADC) while reproducing or recording musical signals is discussed. We propose a method to estimate the waveforms of sampling jitter during reproduction or recording of musical signals by modifying a jitter measurement method that uses analytic signals. Computer simulations of the measurement showed that the minimum detectable jitter amplitude was about 3 ns. Actual measurements of sampling jitter in the ADC and DACs were also conducted using the musical signal and a pure tone. So far, we have found no jitter components that are specific to the musical signal.
Measurement of Sampling Jitter using a Musical Signal
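The abstract builds on a jitter measurement that uses analytic signals. The sketch below shows only the generic idea, assuming a pure tone centred on a DFT bin: form the analytic signal, remove the carrier, and read the residual phase as a timing error. The helper names (`analytic_signal`, `estimate_jitter`) and the test parameters are hypothetical, and this is not the authors' modified method for musical signals.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT; adequate for a short illustrative block."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def analytic_signal(x):
    """Zero negative-frequency bins and double positive ones (len(x) must be even)."""
    n = len(x)
    gains = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    return idft([X * g for X, g in zip(dft(x), gains)])


def estimate_jitter(x, carrier_bin):
    """Per-sample timing error, in samples, of a tone centred on carrier_bin."""
    w = 2 * math.pi * carrier_bin / len(x)  # carrier frequency in rad/sample
    z = analytic_signal(x)
    # Removing the carrier leaves the phase error w * j[t] caused by jitter j[t].
    return [cmath.phase(z[t] * cmath.exp(-1j * w * t)) / w for t in range(len(x))]


# Hypothetical test signal: tone at bin 32 of 256, sampled with sinusoidal jitter
# of 0.002 samples peak.
N, K0 = 256, 32
true_jitter = [0.002 * math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
w0 = 2 * math.pi * K0 / N
x = [math.cos(w0 * (t + true_jitter[t])) for t in range(N)]
est = estimate_jitter(x, K0)
```

Because every spectral component of this test tone lies on a positive-frequency bin, the estimate matches the injected jitter to within rounding error; real recordings would need windowing and a way to handle musical content, which is what the paper addresses.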
Alberto Bellini,Antonio De Benedetti,Giovanni Franceschini,
Switching power amplifiers are becoming quite common in audio applications because of their intrinsically high efficiency, made practical by advances in semiconductor technology. However, they are usually used in low- to medium-quality applications and in low-frequency loudspeaker systems. In this paper the design of a wide-bandwidth switching audio amplifier is presented. The amplifier is specifically aimed at automotive applications, where supply voltage, power consumption, and size are particular constraints. A prototype featuring reduced distortion and suitable for wide-bandwidth loudspeakers was built and successfully tested.
Design of Digital Audio Amplifier for Automotive Applications
Robert E. (Robin) Miller III,
ITU 6.1, with six discrete full-range audio channels as implemented in DVD-A, SACD, and DTS-ES Discrete, provides the means to deliver full-sphere periphonic 3D surround sound. For compatible distribution, the channels are converted to plausible 5.1/6.1 reproduction, but can still be fully recovered for “PerAmbio” reproduction, an Ambisonic + Ambiophonic hybrid approach described in prior papers that maximizes 3D envelopment along with front-stage imaging and spaciousness while economizing on the number of channels and speakers. To make clear that fewer media channels "r" are required than speakers "s", MCN (multichannel numbering) in the form "r.lfe.s" is proposed. Experimental “PerAmbio 6.1.10” (10 speakers minimum + subwoofer) recordings test three encoding variations applicable in cinema, broadcast, and music-only production.
Transforming Ambiophonic + Ambisonic 3D Surround Sound to & from ITU 5.1/6.1
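The proposed "r.lfe.s" label packs three counts into one string. A minimal parser, assuming the three fields are plain non-negative integers, makes the reading explicit; `MCN` and `parse_mcn` are hypothetical names, not part of any published tool.

```python
from typing import NamedTuple


class MCN(NamedTuple):
    media_channels: int  # r: full-range channels carried on the medium
    lfe: int             # number of LFE channels
    speakers: int        # s: loudspeakers used for reproduction


def parse_mcn(label: str) -> MCN:
    """Parse an 'r.lfe.s' label such as '6.1.10' into its three counts."""
    parts = label.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected r.lfe.s, got {label!r}")
    r, lfe, s = (int(p) for p in parts)
    return MCN(r, lfe, s)


layout = parse_mcn("6.1.10")  # 6 media channels, 1 LFE, 10 speakers
```

A conventional two-field label such as "5.1" is rejected here, since it does not say how many speakers the channels feed.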
Marinus M. Boone,Werner P.J. de Bruijn,
Teleconferencing can be enhanced considerably by applying spatial sound recording, transmission and reproduction. True spatial sound reproduction can be obtained with Wave Field Synthesis (WFS), which gives a sound reproduction that is independent of the listener position. Our research has shown that a significant improvement in speech intelligibility can be obtained with WFS compared with single-loudspeaker reproduction when there are several interfering speech signals. The improvement in Speech Reception Threshold (SRT) can be more than 2 dB, corresponding to an increase in speech intelligibility from 50% to 85%.
Improving Speech Intelligibility in Teleconferencing by using Wave Field Synthesis
Werner P.J. de Bruijn,Marinus M. Boone,
Spatial reproduction of the voices of conference participants can greatly enhance the performance of a life-size videoconferencing system in terms of qualities such as speech intelligibility, speaker identification and more generally the naturalness of a conference. A very suitable technique to implement accurate spatial sound reproduction including depth is Wave Field Synthesis (WFS). This paper presents results of research that has been carried out to investigate the combination of WFS with 2D video projection, including subjective experiments on sound localization, correspondence of perceived auditory and visual source directions and speaker identification in situations with multiple speakers, as well as speech intelligibility tests and investigations on the applicability of Distributed Mode Loudspeakers in WFS.
Application of Wave Field Synthesis in Life-size Videoconferencing
Slawomir Zielinski,Francis Rumsey,Soren Bech,
The subjective effects of controlled multichannel audio bandwidth limitation and of selected down-mix algorithms were compared. The investigation focused on the standard 5.1 multichannel audio set-up (Rec. ITU-R BS.775-1) and was limited to the optimum listening position. The results of the formal listening test show that, in general, listeners prefer limiting the number of channels to limiting the bandwidth for a given ‘information rate’. However, for some programme material containing foreground content (direct sound) in the rear channels, limiting either parameter has a similar effect.
Comparison of Quality Degradation Effects Caused by Limitation of Bandwidth and by Down-mix Algorithms in Consumer Multichannel Audio Delivery Systems
Scott G. Norcross,Gilbert A. Soulodre,Michel C. Lavoie,
It has been shown that listener envelopment (LEV) can be systematically controlled in a multichannel surround system by varying the level and angular distribution of the late-arriving sound. Although the perceptual transition point between early and late energy has traditionally been set to 80 ms when predicting LEV, this choice has not been rigorously investigated. In the present study a series of formal subjective tests was conducted to investigate the perceptual point where the early energy ends and the late energy begins. Listeners were asked to rate the amount of LEV in sound fields where the temporal and spatial distributions of the late energy were varied. The results of the subjective tests were used to investigate suitable objective measures for predicting LEV.
Temporal Aspects of Listener Envelopment in Multichannel Surround Systems
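The question studied here is where to place the early/late boundary. As a minimal sketch, assuming the impulse response starts at the arrival of the direct sound, the energy split at an adjustable boundary can be computed as follows. The function name, the synthetic decay and the boundary values are illustrative assumptions, and the paper's actual LEV measures (which also involve spatial distribution) are not reproduced.

```python
import math


def early_late_energy(ir, fs, boundary_ms=80.0):
    """Split the energy of impulse response `ir` (sampled at fs Hz) at boundary_ms."""
    split = round(boundary_ms * fs / 1000.0)
    early = sum(v * v for v in ir[:split])
    late = sum(v * v for v in ir[split:])
    return early, late


# Hypothetical 1 s exponential decay whose amplitude falls by about 60 dB.
fs = 48000
ir = [math.exp(-6.9 * n / fs) for n in range(fs)]
late_fraction = {}
for ms in (50, 80, 120):
    early, late = early_late_energy(ir, fs, ms)
    late_fraction[ms] = late / (early + late)
```

Moving the boundary from 50 ms to 120 ms shifts energy from the "late" to the "early" bin, which is why the choice of 80 ms matters for any LEV predictor built on such a split.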
Current multichannel audio receivers use a separate digital signal processor (DSP) and microcontroller (MCU) to manipulate audio stream data and control the audio flow. The biggest challenge in these systems is coordinating the interaction between the DSP and the MCU to create a high-quality audio product. By using a single high-performance DSP and complementary software, it is possible to create a multichannel audio receiver that offers higher audio quality, a more cost-effective system price and a shorter time to market. As a result, developing an artifact-free system is quicker, easier and more robust. System cost is also reduced, because a fairly major component (the microcontroller) is eliminated.
Single DSP for Audio Stream Control and Manipulation in Multichannel Receiver
Spectral analysis of a physically modeled audio sample can be related by harmonic spectrum to pitch intervals. Pitch class sets, derived from pitch intervals, may be selected that have a significant interval class vector relationship to the harmonic spectrum which a physically modeled sound displays. A software based granular synthesizer using cellular automata computer modeling techniques has been used to evolve selected pitch class sets over time. Macro-architectural musical structures are created by combining multiple pitch class set evolutions. Audio and video examples of pitch class sets being evolved by cellular automata will be presented. This paper focuses on the application, integration and implications of these processes, as the mathematical procedures are well documented in other sources.
Applying Physical Modeling, Set Theory, and Cellular Automata to Create Computer Synthesized Musical Compositions
Perfecto Herrera,Amaury Dehamel,Fabien Gouyon,
We report a series of studies on automatically labeling sounds from unpitched percussion instruments. Different databases were set up to study the factors relevant to labeling different sets of classes (up to 32 instruments, drum kit sounds, electronic sounds, manufacturer “acoustic” signatures…). Common spectral features (e.g. centroid, skewness), Bark-band relative energies and Mel-Frequency Cepstral Coefficients, together with some additional original descriptors, were evaluated alongside different feature selection strategies. Classification techniques with different flavors and tradeoffs (k-NN, Kernel Density, Canonical Discriminant Analysis, Binary trees, etc.) were also evaluated. It is shown that the feature set can be reduced by a factor of 2 to 3 without affecting performance, and that performance differs by about 10% between the best (usually Kernel Density estimation) and the worst tested technique. For the most complex problem (classification of 32 sound categories) the best hit rates reached 80%, whereas for drum kit problems with 9 or fewer classes, performance increased to 88%, even with independent-sample cross-validation. Very high success rates were also achieved when labeling manufacturer and model, and when classifying drum machine sounds.
Automatic Labeling of Unpitched Percussion Sounds
Thorsten Heinz,Andreas Brueckmann,
Recent trends in musical audio signal analysis increasingly promote perceptually motivated modifications of conventional signal processing algorithms. A natural further step is to include knowledge of the structure of the mammalian auditory periphery. The presented work uses physiological models to mimic the active functionality of the inner ear, including the transduction of mechanical vibrations into neural impulses. The main part of the paper describes automatic transcription of melodies from real-world musical inputs. Bottom-up extraction and segmentation of pitch trajectories based on the outputs of the models used, i.e., the concentration of transmitter substance inside the inner hair cell clefts, are demonstrated. As an example of the wide range of possible further applications, a sound source recognition approach using woodwind instruments is proposed. Results indicate that the algorithm performs very well compared to traditional methods.
Using a Physiological Ear Model for Automatic Melody Transcription and Sound Source Recognition
Derry FitzGerald,Robert Lawlor,Eugene Coyle,
This paper introduces the technique of Prior Subspace Analysis (PSA) as an alternative to Independent Subspace Analysis (ISA) in cases where prior knowledge about the sources to be separated is available. The use of prior knowledge overcomes some of the problems associated with ISA, in particular the problem of estimating the amount of information required for separation. This results in improved robustness for transcription purposes. Prior knowledge is incorporated through a set of prior frequency subspaces that characterise features of the sources to be extracted. The effectiveness and robustness of PSA are demonstrated by its use in a simple drum transcription algorithm.
Prior Subspace Analysis for Drum Transcription
Paul M. Brossier,Mark B. Sandler,Mark D. Plumbley,
This paper describes the design of a real-time MPEG-4 (MP4) Structured Audio codec for monophonic signals. Coding of the live input consists of a pitch detection system, which returns MIDI-like data, and an additive synthesis scheme, which creates and modifies the current instrument. Both parts are designed to be fast and scalable: the analysis parameters let the user choose the computational cost of both the analysis and the resynthesis. The extracted objects can then be used in live environments for encoding and/or creation.
Real Time Object Based Coding
Emilia Gomez,Fabien Gouyon,Perfecto Herrera,Xavier Amatriain,
The aim of this paper is to discuss possible ways of describing some musical constructs in a dual context. The first is that of a specific software application, the Sound Palette, a tool for content-based management, content editing and content-based audio transformation. The second is that of the current standard for multimedia content description, MPEG-7. Different musical layers (melodic, rhythmic and instrumental) are examined in terms of usable descriptors and description schemes. After discussing some limitations of MPEG-7 regarding those layers, given the needs of our application, some proposals for overcoming them are presented.
Using and Enhancing the Current MPEG-7 Standard for a Music Content Processing Tool
Fabien Gouyon,Perfecto Herrera,
We address the problem of classifying polyphonic musical audio signals by their meter, as ‘binary’ or ‘ternary’. Experiments were conducted on a database of 70 instances (20 s excerpts from commercial songs without particular genre or timbre restrictions). The meter is the number of beats between regularly recurring accents (or downbeats). Our approach tests the hypothesis that acoustic evidence for downbeats can be measured on the signal, with special focus on its temporal recurrence. We experimented with several approaches to feature selection and report some interesting results: measurements of a very small set of beat descriptors (4) and subsequent processing (based on the descriptors’ autocorrelation functions) yield around 95% correct classification. Using only the temporal centroid, almost 90% correct classification can be achieved.
Determination of the Meter of Musical Audio Signals: Seeking Recurrences in Beat Segment Descriptors
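To illustrate the idea of seeking recurrences in beat-level descriptors, the toy rule below compares the autocorrelation of a per-beat descriptor at lags of 2 and 4 beats against a lag of 3 beats. This is a hypothetical sketch: the descriptor values, the lag choice and the decision rule are assumptions, not the authors' classifier or their four descriptors.

```python
def autocorr(seq, lag):
    """Mean-removed autocorrelation at `lag`, averaged over the valid pairs."""
    mean = sum(seq) / len(seq)
    d = [v - mean for v in seq]
    pairs = len(d) - lag
    return sum(d[t] * d[t + lag] for t in range(pairs)) / pairs


def guess_meter(beat_descriptor):
    """Toy rule: recurrence every 2 or 4 beats -> binary, every 3 beats -> ternary."""
    binary = max(autocorr(beat_descriptor, 2), autocorr(beat_descriptor, 4))
    ternary = autocorr(beat_descriptor, 3)
    return "binary" if binary > ternary else "ternary"


# Hypothetical per-beat accent strengths (1.0 marks an accented beat).
waltz = [1.0, 0.0, 0.0] * 8
march = [1.0, 0.0] * 12
```

Lag 4 is included in the binary score so that a 4/4 accent pattern, which does not recur at lag 2, is still scored as binary.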
Chris Duxbury,Mike Davies,Mark B. Sandler,
We present a method for achieving good segmentation of note events for use with non-linear time-scaling algorithms, greatly reducing artefacts caused both by rhythmic distortions and by soft note transitions being treated as percussive transients. The proposed algorithm isolates percussive transients as a subset of note onsets, leading to a more meaningful segmentation. A subband-based hybrid onset detection algorithm forms the basis of this segmentation scheme. A new frequency content distance measure, automatic threshold setting and subband result validation are all key elements of this scheme. At the subband recombination stage, the algorithm distinguishes note onsets, which may appear in one or more subbands, from percussive transients, which appear in multiple subbands.
Temporal Segmentation and Pre-analysis for Non-linear Time-scaling of Audio
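The recombination rule stated in the abstract (percussive transients appear in several subbands, note onsets possibly in only one) can be sketched as a grouping step over per-subband onset times. The tolerance and the `min_bands` count are hypothetical parameters, and the subband onset detection itself, including the paper's frequency content distance measure, is assumed to have run already.

```python
def classify_onsets(band_onsets, tolerance, min_bands):
    """Group per-subband onset times and label each group.

    band_onsets: one list of onset times (seconds) per subband.
    A group seen in at least `min_bands` subbands is labelled percussive.
    """
    events = sorted((t, band) for band, times in enumerate(band_onsets) for t in times)
    groups = []
    for t, band in events:
        if groups and t - groups[-1]["start"] <= tolerance:
            groups[-1]["bands"].add(band)  # same event seen in another subband
        else:
            groups.append({"start": t, "bands": {band}})
    return [(g["start"], "percussive" if len(g["bands"]) >= min_bands else "note")
            for g in groups]


# Hypothetical detections in four subbands: a drum hit near 0.10 s, two soft notes.
detections = [[0.10, 0.50], [0.101, 0.80], [0.102], [0.103]]
labels = classify_onsets(detections, tolerance=0.01, min_bands=3)
```

Only the group at 0.10 s spans enough subbands to be treated as a percussive transient; a time-scaling algorithm could then preserve it while stretching the soft note transitions.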
Juergen K. Lang,
Promising perfect security, a new media protection system called “m-sec” challenges the audio world. The concept is straightforward for the customer but complex from a technical point of view. Digital music protected with m-sec never leaves the encrypted domain and can be published over the Internet. However, the music can be played only after “personalization”, for a one-time fee.
Protecting Digital Media with End-to-End Encryption
Wayne Jones,Michael Wolfe,Theodore C. Jr. Tanner,Daniel Dinu,
The personal computer audio environment has evolved over the years into a tier-one platform for the acquisition and rendering of audio information. The personal computer is a highly stochastic, interactive environment that is much more complex than a traditional dedicated capture or rendering device, which gives rise to new problem areas. These include, but are not limited to, stochastic interrupts, network accesses, disc I/O and hardware of disparate quality. Although a highly matrixed, multitasking, concurrent operating system offers many opportunities to overcome quality issues, the PC, thanks to its media-rich tools and feature sets, is becoming the capture and rendering device of choice for future generations. Many of the quality issues studied so far have been hardware-focused, such as converter quality, power supply quality and component metrics. We focus on software performance metrics, which are by nature much more difficult to ascertain. The tests include "glitch verification", throughput latency and MIDI latency. We also address traditional audio measurements such as distortion, frequency response and signal-to-noise ratio, but extend them further.
Testing Challenges in Personal Computer Audio Systems
Edmundo Monteiro,Nuno Fonseca,
Although many audio devices can be remotely controlled (by MIDI, RS-232 or proprietary systems), this control is usually oriented toward automation and computer control and relies on hundreds of specific parameters. Such control often does not support all of the device's features and works poorly as a human interface, forcing manual intervention at the equipment. To solve this problem, the “Remote Interface for Audio Devices” (RIAD) provides remote access to the device's own interface (LEDs, buttons, displays, knobs, ...), letting the user perform any kind of operation.
Remote Interface for Audio Devices (RIAD)
After several years of emulating real musical instruments, it is time to emulate a real choir, with a new challenge: text. This paper presents a software solution (based on sampling, not synthesis) that makes the computer sing text. You write the text, play the keyboard (or any MIDI device or sequencer), and hear the choir sing in real time.
VOTA Utility: Making the Computer Sing
Aleksandar Simeonov,Giorgio Zoia,Robert-Lluis Garcia,
The geometrical and perceptual room models for 3D audio production and rendering, proposed by notable tools and more recently supported by the MPEG-4 standard, are compact and satisfactory description schemes for a wide range of audio and multimedia applications. Supporting both approaches at the same time is challenging, especially when high quality, precise synchronization with other media and acceptable latency for user interaction are required, as often happens in standardized contexts for media integration. In this paper we first present results from several experiments with different Application Programming Interfaces and hardware platforms; suitable extensions to support physical and perceptual models are then described, and their integration into an MPEG-4 compliant player is presented.
Rendering of Advanced 3D Room Models by Enhanced Application Programming Interfaces
Michiel van der Veen,Arno van Leest,Fons Bruekers,
In this paper, we present a fragile, high-capacity watermarking technique for digital audio signals. The watermark is reversible, which in this context means that the original input signal can be restored in the watermark detector. In summary, the approach works as follows. In the encoder, the dynamic range of the input signal is limited (i.e. the signal is compressed), and some of the resulting unused bits are used to encode the watermark. Other unused bits convey the information needed for bit-exact reconstruction of the signal. The watermark decoder extracts the watermark and reconstructs the input signal by restoring the original dynamic range. In this study we extensively tested the new algorithm with a variety of settings, using audio items with different characteristics. These experiments showed that for 16-bit PCM audio sampled at 44.1 kHz, capacities close to 44,000 bits per second can be achieved while the perceptual degradation of the watermarked signal remains acceptable.
Reversible Audio Watermarking
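The reversibility principle (free up bits, use them for the watermark, and undo the operation exactly in the detector) can be shown with a deliberately simplified toy. The sketch assumes every sample already fits in 15 bits, so doubling it leaves one free least significant bit per sample in a 16-bit word. It does not model the paper's dynamic-range compression or the reconstruction data it embeds, and `embed`/`extract` are hypothetical names.

```python
def embed(samples, bits):
    """Toy reversible embedding: one watermark bit per 16-bit sample.

    ASSUMPTION: every sample satisfies -2**14 <= s < 2**14, so 2*s + bit still
    fits a signed 16-bit word. The paper's compression step is not modelled.
    """
    if len(bits) != len(samples):
        raise ValueError("this toy carries exactly one bit per sample")
    if any(not -2**14 <= s < 2**14 for s in samples):
        raise ValueError("sample exceeds the assumed 15-bit headroom")
    return [2 * s + b for s, b in zip(samples, bits)]


def extract(marked):
    """Recover (original samples, watermark bits) bit-exactly."""
    # Python's >> is an arithmetic (floor) shift, so negative samples round-trip too.
    return [m >> 1 for m in marked], [m & 1 for m in marked]


samples = [0, 1, -1, 16383, -16384, 1234]
bits = [1, 0, 1, 1, 0, 1]
marked = embed(samples, bits)
```

At 44.1 kHz this toy carries 44,100 bits per second per channel, which is the same order as the capacity reported above; the real scheme must also spend some freed bits on reconstruction data and choose the compression to keep degradation acceptable.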
Anticipating some emerging audio devices and features, this paper surveys trends in mobile telephony (especially regarding mobile internet in Japan), wearable/intimate multimedia computing, handheld/nomadic/portable interfaces, and embedded systems like multimedia furniture, especially including research by the author's own group, which has built interfaces using Java3D, MPEG-4, and QTVR. Such extended and enriched audio interfaces encourage multipresence, the inhabiting by sources and sinks of multiple spaces simultaneously, allowing, for instance, a user to monitor several aligned spaces at once (conferences, entertainment, navigation, warnings, etc.). Keywords: audio interaction, CVEs (collaborative virtual environments), handheld/mobile/portable interfaces, integration of mobile devices and telecommunication, mobile internet, multimodal interaction, telerobotics, ubicomp (ubiquitous computing) technology, wearable/intimate multimedia computing.
Emerging and Exotic Auditory Interfaces
Andreas Dantele,Michael Schuldt,Ulrich Reiter,Oliver Baum,Helge Drumm,
We show how to use MPEG-4 audio nodes in an interactive virtual 3D scene to improve realism in the auditory domain. For this purpose we extend the Virtual Reality Modeling Language (VRML), because of its similarity to the MPEG-4 scene description. In addition to the implementation of localized sound sources in the scene, the effect of acoustic obstruction is discussed, and several possibilities for detecting obstruction are presented. The results demonstrate the capabilities of MPEG-4 audio scene description and thus point out the need for a fully compliant MPEG-4 player to use the complete functionality of this standard.
Implementation of MPEG-4 Audio Nodes in an Interactive Virtual 3D Environment
Frank Kurth,Roman Scherzer,
We propose a framework for matching a single PCM audio stream against multiple candidate audio streams. This framework allows real-time identification of a single audio stream with respect to the candidate streams. The identification is robust to signal delays of up to several seconds, as well as to signal distortions due to lossy coding, a noisy environment or analog transmission. One area of application is the query-by-mobile-phone scenario, in which a user transmits an audio stream recorded from the radio using a mobile phone as the recording device. The transmitted audio stream can then be identified with the proposed framework by matching it in real time against all possible radio programmes.
Robust Real-Time Identification of PCM Audio Sources
The design of antialias and reconstruction filters has traditionally presented a dilemma. Minimum-phase filters suffer from increased group delay at high frequencies, which is considered undesirable. Linear-phase filters suffer from pre-responses, which are also sometimes considered undesirable. At high sampling rates, these effects can be ameliorated by suitably tailoring the response above 20 kHz. The paper presents some designs intended for use at 96 kHz and 192 kHz that are simultaneously optimised in the frequency domain over the range 0-20 kHz and in the time domain over the full bandwidth. Pre-responses can be penalised more heavily than post-responses, resulting in an asymmetrical impulse response somewhat similar to a minimum-phase analogue response. The paper also briefly considers the use of a filter of this type at the end of a recording and reproducing chain that may contain an unknown combination of conventional bandlimiting filters.
Controlled Pre-response Antialias Filters for Use at 96kHz and 192kHz
Discrete construction of a simple converter architecture delivers performance bettering that of much more involved integrated designs. A straightforward architecture for a noise shaping ADC is outlined. Techniques for optimal discrete-circuit implementation of the building blocks are presented. A practical circuit is built and the measurements shown.
Design Techniques for High-performance Discrete A/D Converters
David Dorran,Robert Lawlor,Eugene Coyle,
The synchronised overlap-add (SOLA) algorithm is a commercially popular and extensively researched audio time-scale modification technique. It operates in the time domain and uses a correlation technique to ensure that synthesis frames overlap synchronously. We present a modification to SOLA that allows the analysis step size to adapt to the desired time-scale factor. The synchronised and adaptive overlap-add (SAOLA) algorithm improves on the output quality of SOLA for high time-scale factors and reduces the computational requirements for low time-scale factors. However, the computational requirements for high time-scale factors are increased.
Time-Scale Modification of Speech using a Synchronised and Adaptive Overlap-Add (SAOLA) Algorithm
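For reference, the baseline SOLA procedure can be sketched in a few lines. This is plain SOLA with a fixed analysis hop, not SAOLA, whose adaptive step size is not specified in the abstract. The frame length, hops and search range are illustrative assumptions. Each analysis frame is placed near its nominal synthesis position at the offset that maximises normalised cross-correlation with the output so far, then cross-faded in.

```python
import math


def sola(x, alpha, frame_len=400, analysis_hop=100, search=50):
    """Time-scale x by roughly alpha (>1 lengthens) with plain SOLA."""
    synthesis_hop = int(round(alpha * analysis_hop))
    # These two conditions keep every candidate overlap between 1 and frame_len samples.
    if not (search <= synthesis_hop and frame_len > synthesis_hop + search):
        raise ValueError("parameters do not guarantee a valid overlap")
    y = list(x[:frame_len])
    m = 1
    while m * analysis_hop + frame_len <= len(x):
        frame = x[m * analysis_hop:m * analysis_hop + frame_len]
        nominal = m * synthesis_hop
        best_k, best_score = 0, -math.inf
        for k in range(search + 1):
            a = y[nominal + k:]  # output tail that the new frame would overlap
            b = frame[:len(a)]
            num = sum(p * q for p, q in zip(a, b))
            den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
            score = num / den if den > 0 else 0.0
            if score > best_score:
                best_k, best_score = k, score
        start = nominal + best_k
        overlap = len(y) - start
        for i in range(overlap):  # linear cross-fade over the overlap region
            w = i / overlap
            y[start + i] = (1 - w) * y[start + i] + w * frame[i]
        y.extend(frame[overlap:])
        m += 1
    return y


# Hypothetical input: 0.5 s of a 220 Hz tone at 8 kHz, stretched by 1.5.
tone = [math.sin(2 * math.pi * 220 * n / 8000) for n in range(4000)]
stretched = sola(tone, 1.5)
```

The correlation search over `search + 1` offsets is where most of the cost lies. The abstract's observation that SAOLA trades computation against time-scale factor concerns exactly how often this search runs.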
James A.S. Angus,
This paper describes lookahead Sigma-Delta Modulation (SDM) systems based on tree searching. It presents several methods of implementing tree searched SDM. In particular, it looks at ways of making the search more efficient so that significant amounts of lookahead can be achieved. Finally, it compares tree searching to the Viterbi algorithm and demonstrates the computational and implementation advantages tree searching can offer.
Tree Based Lookahead Sigma Delta Modulators
John C. Sarris,George E. Cambourakis,
Rooms are modeled as linear time-invariant systems, in which the room impulse response (RIR) describes the transmission characteristics for a specific source-receiver pair. Concepts of time-frequency analysis are used to decompose the RIR in the time-frequency domain, where parametric models are employed to model the different subband signals. The evaluated models model the early reflections exactly, while the decay rate of the reverberant part is adequately approximated. Equalization is performed in the time-frequency domain, which allows individual frequency subbands to be equalized selectively. Two different rooms are studied as example cases.
Time Frequency Analysis, Modeling and Equalization of Room Impulse Response Functions
In this paper, a general class of sampling rate converters for conversion between arbitrary sampling rates is presented. The performance of these converters can be described by simple formulas that show how to trade off memory consumption against computational complexity. In particular, converters that are optimal with respect to computational complexity are presented. Possible applications of these converters include pitch shifting or correction, as well as sampling rate conversion in digital systems with limited memory.
A Class of Sampling Rate Converters with Interesting Properties
Rolf Esslinger,Gerhard Gruhler,R.W. Stewart,
A fundamental problem in digitally controlled class-D power amplifiers is the distortion of the amplified pulse signal by imperfections of the power transistors and the analogue output circuitry. The only way to overcome this is to provide error correction by feedback. With a pulse-modulated signal (either PWM or Sigma-Delta Modulation), this correction can be performed by changing the pulse edge timing, as is done in some existing solutions. If error correction is to take place during pulse generation inside the digital system, an analogue-to-digital converter is needed. In this paper the most important solutions proposed so far are reviewed, and further ideas based on feedback into the digital system are introduced.
Feedback Strategies in Digitally Controlled Class-D Amplifiers
Patrick J. Wolfe,Simon J. Godsill,
In this paper we present an overview of complex wavelets, and discuss their application to audio signal processing. While traditional audio processing has involved time-frequency rather than time-scale representations, we demonstrate some of the advantages to be gained through the use of complex wavelets. We focus on two main applications of interest to the audio community: noise reduction and signal compression.
Audio Signal Processing Using Complex Wavelets
Ernst F. Schroeder,Johannes Boehm,
When audio signals are encoded and decoded with typical "lossy" data compression codecs, the decoded audio signals are, as expected, almost never identical to the original signals. What is often overlooked is that the decoded audio signals are also typically no longer time-aligned with the original signals. It is shown how these small timing errors arise from the block structure of the data processing and from the look-ahead needed for psychoacoustic models. For audio codecs based on the ISO/IEC MPEG standards, a solution is introduced that restores the original timing. This "OFL" feature is available in the latest implementations of the mp3PRO audio codec.
Original File Length (OFL) for mp3, mp3PRO and Other Audio Codecs
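The timing problem described above amounts to extra samples at both ends of the decoded stream: a leading codec delay and trailing padding up to a whole number of frames. The sketch below assumes the decoder knows both the delay and the original length (for example, because they are signalled in the file). The function name and the delay and frame values are hypothetical, not those of mp3 or mp3PRO, and this does not describe how OFL actually signals them.

```python
def restore_original_length(decoded, codec_delay, original_length):
    """Drop the leading codec delay and trailing padding from a decoded stream."""
    if codec_delay + original_length > len(decoded):
        raise ValueError("decoded stream is shorter than delay + original length")
    return decoded[codec_delay:codec_delay + original_length]


# Hypothetical figures: 576 samples of delay, output padded to two 1152-sample frames.
original = [float(i) for i in range(1, 11)]
DELAY, FRAME = 576, 1152
decoded = [0.0] * DELAY + original + [0.0] * (2 * FRAME - DELAY - len(original))
restored = restore_original_length(decoded, DELAY, len(original))
```

Without the trim, the decoded file is longer than the original and every sample is shifted by the codec delay, which is the misalignment the abstract describes.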
Laurent Daudet,Mark B. Sandler,
One of the most noticeable (and most difficult to remove) artifacts in low bit-rate audio codecs is high-frequency amplitude modulation, the so-called "birdies" (or "warbling" artifacts). In this paper, we investigate the theoretical reasons why such artifacts occur, and in particular we show that they cannot be avoided with complete representations, such as critically sampled subband schemes, under a simple thresholding/quantization operation. For the Modified Discrete Cosine Transform (MDCT), which forms the basis of many current coders, the time dependency of the coefficients of a pure sinusoid can be computed explicitly. From this, we derive a simple pre-filtering algorithm that suppresses most of the frequency lines prone to warbling.
MDCT Analysis of Sinusoids and Applications to Coding Artifacts Reduction
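The frame-to-frame behaviour that the abstract analyses is easy to observe numerically. The sketch below computes a sine-windowed MDCT directly and tracks one bin for a stationary sinusoid centred on that bin. The frame size, bin and start phase are illustrative choices, and the paper's analytic derivation and pre-filter are not reproduced.

```python
import math


def mdct(frame):
    """Sine-windowed MDCT of one 2N-sample frame (direct O(N^2) evaluation)."""
    two_n = len(frame)
    n_half = two_n // 2
    n0 = 0.5 + n_half / 2
    return [sum(math.sin(math.pi * (n + 0.5) / two_n) * frame[n]
                * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
                for n in range(two_n))
            for k in range(n_half)]


def track_bin(x, n_half, k):
    """|X_m[k]| over successive frames with hop n_half (50% overlap)."""
    mags = []
    m = 0
    while (m + 2) * n_half <= len(x):
        mags.append(abs(mdct(x[m * n_half:(m + 2) * n_half])[k]))
        m += 1
    return mags


N, K0 = 32, 8
omega = math.pi * (K0 + 0.5) / N  # sinusoid centred exactly on bin K0
phase = omega * (0.5 + N / 2)     # start phase chosen so that frame 0 peaks
x = [math.cos(omega * n + phase) for n in range(12 * N)]
mags = track_bin(x, N, K0)
```

Although the input is a steady tone, `mags` alternates between a large value and nearly zero from one frame to the next, because the phase advances by an odd multiple of π/2 per hop. If a coarse quantiser zeroes the small values, the reconstructed tone is amplitude modulated, which is the mechanism behind "birdies".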
Joshua D. Reiss,Mark B. Sandler,
A compact form can be used to describe an arbitrary high order sigma delta modulator. Such a format is beneficial because it provides insight into the structure of limit cycles in sigma delta modulators. We consider modulators of any order with periodic output. We make no assumptions regarding the input and are thus able to prove necessary conditions for limit cycles in the output. We show that the input must be periodic, but may have a different period from both integrator output and quantised output. We derive what this implies regarding limit cycles for sinusoidal inputs. Finally, we give examples where sinusoidal input to a third order modulator results in a limit cycle of a different frequency.
They Exist: Limit Cycles in High Order Sigma Delta Modulators
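The claim that a periodic input can produce an output with a different period can be seen even in the simplest case. The sketch uses a first-order modulator with a constant input (period 1), a deliberately simpler setting than the high-order modulators and sinusoidal inputs treated in the paper. The helper names and the period search are illustrative assumptions.

```python
def first_order_sdm(inputs):
    """First-order sigma-delta modulator with a 1-bit quantiser (outputs +/-1)."""
    state, prev_out, out = 0.0, 0.0, []
    for x in inputs:
        state += x - prev_out  # integrate the error between input and fed-back output
        prev_out = 1.0 if state >= 0 else -1.0
        out.append(prev_out)
    return out


def output_period(seq, max_period=64):
    """Smallest period of the second half of seq (transient discarded), or None."""
    tail = seq[len(seq) // 2:]
    for p in range(1, max_period + 1):
        if all(tail[i] == tail[i + p] for i in range(len(tail) - p)):
            return p
    return None


dc_half = first_order_sdm([0.5] * 200)
dc_zero = first_order_sdm([0.0] * 200)
```

A constant input of 0.5 settles into the limit cycle 1, 1, 1, -1 (period 4, mean 0.5), while an input of 0 gives 1, -1 (period 2). In both cases the output period differs from the input period.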
Daniel Homm,Thomas Ziegler,Robert Weidner,Reinhold Bohm,
Founded in 1998, DRM (Digital Radio Mondiale) set out to create a new digital broadcast standard for frequencies below 30 MHz (long, medium and short wave). The DRM standard was finalized in January 2001 and offers significantly improved audio and reception quality. The chosen audio codec combines Spectral Band Replication (SBR) with MPEG AAC (Advanced Audio Coding). This new technique, also known as aacPlus, offers unprecedented compression efficiency and excellent audio quality at the chosen bitrates. As the launch date of the first DRM services approaches, various chip sets are under development for receiver manufacturers. This paper focuses on the implementation of aacPlus on an ARM (Advanced RISC Machine) platform within the DIAM (Digital AM) project.
Implementation of a DRM Audio Decoder (aacPlus) on ARM Architecture
Andreas Ehret,Martin Dietz,Kristofer Kjorling,
This paper discusses the combination of the new Spectral Band Replication (SBR) technology with the leading conventional waveform audio coder standardized in MPEG, Advanced Audio Coding (AAC). With this enhanced audio coding scheme, named aacPlus, high-quality stereo audio can be achieved at bit rates as low as 40 kbit/s. It is therefore especially interesting for applications where the highest compression efficiency is desired for reasons of cost or physical limitations, such as digital broadcasting or mobile applications. An overview of the latest developments in the standardization of aacPlus within MPEG-4, together with subjective verification results, is also given.
State-of-the-Art Audio Coding for Broadcasting and Mobile Applications
Alberto Bellini,Andrea Azzali,Angelo Farina,Marco Romagnoli,Eraldo Carpanoni,
Steady-state characterization of an acoustic environment is not sufficient for efficient compensation; therefore a new concept of equalization is defined. It relies on the dynamic frequency response and on articulation to design the equalizer shape. Specifically, the inverse filter shape is based on the dynamic frequency response instead of the steady-state frequency response. This is obtained using AQT methods, which use a variable-frequency burst as the stimulus. Moreover, the inversion is based on a target frequency response shape corresponding to maximum pleasantness.
AQTtool an Automatic Tool for Design and Synthesis of Psychoacoustic Equalizers
Mohan D. Rao,Tomasz Letowski,
The specific objective of this project is to assess, using both subjective and objective methods, the speech intelligibility of the Callsign Acquisition Test (CAT), one of the new speech test methods developed at the U.S. Army Research Laboratory. This study is limited to determining the speech intelligibility of the CAT in the presence of various background noises, such as pink noise, white noise and multitalker babble.
Speech Intelligibility of the Call Sign Acquisition Test (CAT) for Army Communication Systems
Every mobile phone user has different hearing requirements. An individual's hearing requirements change over time owing to prolonged exposure to background noise, age-related factors, hearing-related illness, etc. This paper proposes personalizing mobile phones to the hearing requirements of the individual in order to improve speech quality. Several techniques exist to accurately assess the hearing requirements of cellular phone users. We present the use of fuzzy logic to improve speech quality based on the assessed hearing data. The mobile phone is personalized by designing the fuzzy rule base to suit the user's requirements.
Acoustic Personalization of Mobile Phones
Simon Tucker,Guy J. Brown,
A psychophysical experiment was undertaken to investigate whether human listeners are able to perceive the material properties of struck plates when they are suspended in air, and also when they are artificially damped by suspension in water. Listeners' judgements of the size, shape and material of the plates were found to be less reliable in the damped case. A computational model was developed that estimates the material properties of an impulsive sound by measuring the decay rate of significant acoustic components at the output of an auditory filterbank. The model provides a good overall match to the pattern of human responses in the psychophysical study.
Modelling the Auditory Perception of Size, Shape and Material: Applications to the Classification of Transient Sonar Sounds
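The decay-rate measurement at the core of the model described in the abstract above can be illustrated with a least-squares line fit to a log-magnitude envelope. This is a minimal sketch, not the authors' model. It assumes that the amplitude envelope of one filterbank channel is already available; the auditory filterbank and the selection of "significant" components are omitted. The function name `decay_rate_db_per_s` and the `floor_db` cutoff are hypothetical.

```python
import math

def decay_rate_db_per_s(envelope, fs, floor_db=-60.0):
    """Estimate the decay rate (dB/s) of a non-negative amplitude envelope.

    Fits a straight line (least squares) to the envelope level in dB relative
    to its peak, ignoring samples below floor_db (an assumed noise floor).
    """
    peak = max(envelope)
    points = []
    for n, v in enumerate(envelope):
        if v <= 0.0:
            continue  # log of zero is undefined; skip silent samples
        level_db = 20.0 * math.log10(v / peak)
        if level_db >= floor_db:
            points.append((n / fs, level_db))
    if len(points) < 2:
        raise ValueError("not enough samples above the floor to fit a slope")
    t_mean = sum(t for t, _ in points) / len(points)
    l_mean = sum(lv for _, lv in points) / len(points)
    num = sum((t - t_mean) * (lv - l_mean) for t, lv in points)
    den = sum((t - t_mean) ** 2 for t, _ in points)
    return num / den  # negative for a decaying envelope

# Synthetic envelope exp(-5 t): its level falls by 100 / ln(10), about 43.4 dB per second.
fs = 1000
env = [math.exp(-5.0 * n / fs) for n in range(2000)]
print(round(decay_rate_db_per_s(env, fs), 2))  # prints -43.43
```

A faster decay (a larger negative slope) would then point toward a more heavily damped material. How the rates of several channels are combined into a material judgement is specific to the paper and is not reproduced here.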
Todd Welti,Rene E. Jensen,
A room impulse response constitutes a unique signature of an acoustical space; however, much of its detail is masked by the direct sound and the highest-level reflections. Reflections that are substantially masked may be simplified in some way without audible effects, thus reducing computational requirements. Audibility thresholds of individual reflections were measured and then used as a template to remove low-level reflections from the binaural room impulse response (BRIR) and replace them with a simplified signal. This was done using a simple binaural loudness model. Listening tests showed that for voice signals, the altered BRIRs were virtually indistinguishable from the original versions, even when 93% of the BRIR between 15 ms and 200 ms was replaced.
The Importance of Reflections in a Binaural Room Impulse Response
Hossein Najaf-Zadeh,Hassan Lahdili,Louis Thibault,Michel C. Lavoie,
This paper presents a model of auditory temporal masking for perceptual audio coders, developed to incorporate temporal masking effects into MPEG psychoacoustic model 2. The enhanced psychoacoustic model yields more accurate masking thresholds. Since the masking thresholds are used in adaptive bit allocation, a better psychoacoustic model leads to higher audio quality at a fixed bit rate. Informal listening tests have shown that incorporating temporal masking into the MPEG-1 Layer 2 encoder reduces the average bit rate for transparent coding by 5-14%.
Use of Auditory Temporal Masking in the MPEG Psychoacoustic Model 2
Yoshiki Ohta,Takashi Mitsuhashi,Shinji Koyano,
A method to control spatial impression in sound reproduction in small spaces has been developed. First, psychological scales of spatial impression are obtained by subjective evaluation of sounds convolved with the impulse responses of various rooms that differ in volume. Next, the relationship between the psychological scale of spatial impression and the physical features of the corresponding room impulse response is examined. It turns out that a psychological scale can be represented by a linear combination of the energy distribution on the time-frequency plane, calculated from an impulse response. Finally, a sound field control method based on this objective measure is proposed, and its validity is confirmed by experiments in some real sound fields.
A Sound Field Control Method Based on an Objective Measure of Spatial Impression
Hania Farag,Jens Blauert,Onsy Abdel Alim,
Efficient simulation of sound-source occlusion is needed in auditory virtual environments and remains an unsolved problem. In order to achieve this, the changes in psychoacoustical parameters accompanying the perception of sound-source occlusion have to be identified and understood. The impact of occlusion on localization of auditory events is investigated with the aid of listening tests. Binaural impulse responses are recorded for this purpose in an anechoic chamber, in which rectangular wood plates of different dimensions are used to represent the occluders.
Psychoacoustic Investigations on Sound-source Occlusion
Kevin McLaughlin,Kumaresh Bathey,Huaijin Chen,
A single-chip AES3 receiver and transmitter with an asynchronous sample rate converter is presented that supports sample rates from 32 kHz to 192 kHz. The chip has a highly flexible architecture that allows the receiver, transmitter and sample rate converter to be independently interconnected with one another and with two serial input ports and two serial output ports. The AES3 receiver recovers the audio data, channel status, user bits and clock with jitter of less than 150 ps from an incoming AES3 stream with sample rates up to 192 kHz, while complying with the AES3, AES11 and SMPTE 337M standards. The asynchronous sample rate converter is capable of upsampling by factors of 1 to 8 and downsampling by 7.75:1 while maintaining a minimum of 120 dB THD+N. The AES3 transmitter can transmit data at sample rates up to 192 kHz and supports the AES3 and AES11 standards. The received and transmitted channel status and user bits are buffered without the use of arbitration logic, allowing fast access through a control port.
A Single Chip AES3 Receiver and Transmitter and Asynchronous Sample Rate Converter Supporting Sample Rates from 32 kHz to 192 kHz
Poju Antsalo,Matti Karjalainen,Aki Makivirta,Vesa Valimaki,
Modal equalization of low-frequency room modes has recently been proposed as a method to improve sound reproduction in spaces where modal decay time is too long. This is achieved by signal processing techniques that reduce the pole radii of problematic modes in the overall transfer function. In this paper, we will compare the performance of different methods proposed for modal equalization.
Comparison of Modal Equalizer Design Methods
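To illustrate the abstract's description of modal equalization as reducing the pole radii of problematic modes, the sketch below treats a single mode. A second-order correction filter places zeros on the mode's existing pole pair and new poles at the same angle but with a smaller radius. This is one simple way to realize the idea and is not taken from the paper, which compares several design methods. The helper names, the 60 dB decay-time convention and the example values (a 50 Hz mode, fs = 1 kHz) are assumptions.

```python
import math

def t60_to_radius(t60_s, fs):
    """Pole radius r with r ** (t60_s * fs) == 1e-3, i.e. a 60 dB decay in t60_s seconds."""
    return 10.0 ** (-3.0 / (t60_s * fs))

def modal_correction(f0, t60_old, t60_new, fs):
    """Biquad (b, a) that replaces a mode at f0 Hz with a faster-decaying one."""
    theta = 2.0 * math.pi * f0 / fs
    r_old = t60_to_radius(t60_old, fs)
    r_new = t60_to_radius(t60_new, fs)
    # Zeros sit on the existing pole pair and cancel the long-decaying mode.
    b = [1.0, -2.0 * r_old * math.cos(theta), r_old ** 2]
    # Poles at the same angle but a smaller radius give the shorter target decay.
    a = [1.0, -2.0 * r_new * math.cos(theta), r_new ** 2]
    return b, a

def biquad(b, a, x):
    """Direct-form I second-order IIR filter (a[0] is assumed to be 1)."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x1, x2 = xn, x1
        y1, y2 = yn, y1
        y.append(yn)
    return y

# Example: a 50 Hz mode with T60 = 1 s, shortened to T60 = 0.2 s.
fs = 1000
b, a = modal_correction(50.0, 1.0, 0.2, fs)
impulse = [1.0] + [0.0] * 1999
mode = biquad([1.0, 0.0, 0.0], b, impulse)  # all-pole model of the original mode, 1/B(z)
corrected = biquad(b, a, mode)              # overall response becomes 1/A(z)
```

Because the zeros cancel the original poles exactly in this idealized case, the corrected response equals the impulse response of the new, shorter-decaying pole pair. In a real room the mode parameters must be estimated from measurements, so the cancellation is only approximate.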
Pieter Harpe,Derk Reefman,Erwin Janssen,
Recently, a new type of Sigma Delta Modulator (SDM), the Trellis Noise-Shaping Converter, was introduced in [1] as a 1-bit digital audio stream generator. It has several advantages over standard SDMs, including better stability, signal-to-noise ratio and linearity, and the prevention of oscillation. A major drawback of the trellis architecture is its large number of computations and memory usage compared to the standard architecture. An efficient trellis implementation is introduced that significantly reduces the number of computations and the memory usage, with hardly any change in the converter's performance.
[1] Hiroshi Kato, “Trellis Noise Shaping Converters, Architecture and Evaluation”, Technical Report of IEICE, Vol. 101, No. 381, pp. 65-72, EA2001-59 (2001).
Efficient Trellis-type Sigma Delta Modulator
Derk Reefman,John van den Homberg,Ed van Tuijl,Corne Bastiaansen,Leon van der Dussen,
A new digital-to-analogue converter (DAC) in a 0.18 micron process will be presented that exhibits exceptionally high linearity (total harmonic distortion < -118 dB at full-scale input) and very good signal-to-noise ratio (SNR) performance (> 120 dB SNR, unweighted). Targeted to meet the Super Audio CD specifications, the DAC is designed as an n-bit (n = 4, 5 or 6) switched-current Sigma Delta modulator running at a sample rate of 2.8 or 5.6 MHz. While the SNR is obtained by standard but carefully applied design techniques, the linearity of the DAC is obtained by a new Dynamic Element Matching (DEM) technique. Unlike existing DEM approaches, the new technique fully removes any error due to mismatch in the switched current sources, instead of spectrally shaping the error. The new DEM technique will be explained in detail, and measurement results will be given.
A New Digital-to-Analogue Converter Design Technique for HiFi Applications
Discrete-time models of loudspeaker dynamics are presented. These have been developed to simplify digital processing for active loudspeaker control. The discrete-time representations are in contrast to classical loudspeaker models, which are all represented in continuous time. This simplifies implementation of such aspects of active control as parameter identification, equalisation, and nonlinear distortion compensation.
Discrete-time Loudspeaker Modelling
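As an example of carrying a classical continuous-time loudspeaker model into discrete time, the sketch below applies the bilinear transform with frequency prewarping to the standard second-order high-pass small-signal model of a closed-box loudspeaker, H(s) = s^2 / (s^2 + (wc/Qtc) s + wc^2). This is an assumed illustration and not the authors' discrete-time models, which the abstract does not specify. The function names and the parameter values (60 Hz, Qtc = 0.707, 48 kHz) are hypothetical.

```python
import cmath
import math

def closed_box_biquad(fc_hz, q_tc, fs):
    """Discretize H(s) = s^2 / (s^2 + (wc/Q) s + wc^2) with the bilinear transform.

    The transform is prewarped so that the digital response matches the analog
    one exactly at fc_hz. Returns normalized (b, a) with a[0] == 1.
    """
    wc = 2.0 * math.pi * fc_hz
    k = wc / math.tan(wc / (2.0 * fs))  # prewarped bilinear constant: s = k (1 - z^-1) / (1 + z^-1)
    a0 = k * k + (wc / q_tc) * k + wc * wc
    b = [k * k / a0, -2.0 * k * k / a0, k * k / a0]
    a = [1.0,
         (2.0 * wc * wc - 2.0 * k * k) / a0,
         (k * k - (wc / q_tc) * k + wc * wc) / a0]
    return b, a

def magnitude(b, a, f_hz, fs):
    """|H(e^jw)| of a biquad evaluated at frequency f_hz."""
    z1 = cmath.exp(-2j * math.pi * f_hz / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)

# Hypothetical closed box: 60 Hz system resonance, Qtc = 0.707, fs = 48 kHz.
b, a = closed_box_biquad(60.0, 0.707, 48000.0)
print(magnitude(b, a, 60.0, 48000.0))  # equals Qtc at the prewarped frequency
```

The resulting difference equation can be run sample by sample. This is what makes discrete-time forms convenient for the identification, equalisation and compensation tasks the abstract lists. Prewarping is used here so that the resonance frequency is preserved exactly despite the frequency compression of the bilinear transform.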
Scott G. Norcross,Gilbert A. Soulodre,Michel C. Lavoie,
Previous work has shown that inverse filtering to correct the impulse response (IR) of an audio system can degrade subjective quality under certain conditions. The severity of the degradation depends on both the response of the system being inverted and the filter inversion method used to correct this response. Regularization has been proposed as a means to improve the performance of inverse filtering by limiting how much “work” the inverse filter does to correct the system response. In this paper, formal subjective tests were conducted to examine the subjective effects of regularization on inverse filtering. The regularization techniques implemented include frequency-independent and frequency-dependent methods as well as a perceptually motivated method. The subjective tests were based on the ITU-R MUSHRA method.
Subjective Effects of Regularization on Inverse Filtering
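Frequency-independent regularization of the kind described in the abstract is commonly written as Hinv(k) = conj(H(k)) / (|H(k)|^2 + beta), where beta limits the gain the inverse filter applies at deep notches. The sketch below assumes that textbook form. It also accepts a per-bin list of beta values as a simple frequency-dependent variant. It does not reproduce the specific methods tested in the paper, and the helper names are hypothetical. A naive DFT keeps the example dependency-free.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for short example responses)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]

def regularized_inverse(h, beta, n_fft):
    """Frequency response of a regularized inverse of impulse response h.

    beta is either one float (frequency-independent regularization) or a list
    of n_fft per-bin values (a frequency-dependent variant).
    """
    H = dft(list(h) + [0.0] * (n_fft - len(h)))
    betas = beta if isinstance(beta, list) else [beta] * n_fft
    return [Hk.conjugate() / (abs(Hk) ** 2 + bk) for Hk, bk in zip(H, betas)]

# Example: h = [1, 0.9] has a deep notch at Nyquist, where |H| = 0.1.
h = [1.0, 0.9]
plain = regularized_inverse(h, 0.0, 8)  # exact inverse: gain 10 at Nyquist (bin 4)
reg = regularized_inverse(h, 0.01, 8)   # gain capped at 1 / (2 * sqrt(0.01)) = 5
```

The cap follows from |H| / (|H|^2 + beta) <= 1 / (2 sqrt(beta)). Regularization trades exact correction at the notch for a bounded filter gain, which is the "work" the abstract refers to limiting.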
David Darlington,Mark B. Sandler,
Audio signals are often stored or transmitted in a compressed, transform domain representation. This can pose a problem when there is a requirement to perform signal processing, in that it may be necessary to convert the signal back to a time domain representation prior to processing, and then re-transform. This is time-consuming and computationally intensive, and may degrade the signal. It is thus potentially more effective and efficient for the processing to be applied while the signal remains in the transform domain. We have implemented a scheme whereby processing may be applied to signals stored in a wavelet domain representation without this implicit constraint.
Audio Processing in the Wavelet Domain
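As a minimal illustration of the abstract's idea of processing a signal while it stays in a wavelet-domain representation, the sketch below uses a single-level orthonormal Haar transform. It applies a gain per subband directly to the coefficients, with no intermediate return to the time domain. The Haar wavelet is an assumed choice, since the abstract does not name the wavelet used, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return x

def scale_bands(approx, detail, g_low, g_high):
    """Apply a gain to each subband directly on the wavelet coefficients."""
    return [g_low * a for a in approx], [g_high * d for d in detail]

x = [1.0, 3.0, -2.0, 4.0, 0.5, 0.5]
a, d = haar_forward(x)
a2, d2 = scale_bands(a, d, 0.5, 0.5)  # a uniform gain applied in the wavelet domain...
y = haar_inverse(a2, d2)              # ...matches halving the time-domain signal
```

Unequal gains (for example `g_high = 0`) act as a crude subband equaliser. Because the transform is linear, any such gain can be applied to stored coefficients and the result decoded only once, at playback.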
Michael Schug,Alexander Groschel,Michael Beer,Fredrik Henn,
Layer2 + SBR is an audio coding scheme that significantly exceeds the coding efficiency of MPEG-1/2 Layer2, especially for broadcasting applications such as DAB (Digital Audio Broadcasting). Spectral Band Replication (SBR) is a technique to enhance the efficiency of perceptual audio codecs. High-frequency parts of an audio signal are reconstructed on the decoder side, so the audio encoder can focus on coding the low-frequency part. Thus, the bit rate can be reduced while subjective audio quality is maintained. Besides increasing coding efficiency, the use of MPEG Layer2 + SBR within Eureka DAB would maintain backwards compatibility: existing DAB receivers are capable of decoding the (bandwidth-limited) Layer2 part of the bitstream. This paper describes the functionality of SBR and Layer2 + SBR and the achievable increase in coding efficiency. Furthermore, possible application and introduction scenarios are addressed.
Enhancing Audio Coding Efficiency of MPEG Layer-2 with Spectral Band Replication (SBR) for Digital Radio (EUREKA 147/DAB) in a Backwards Compatible Way
Miikka Vilermo,Sebastian Streich,Mauri Vaananen,Karsten Linzmeier,Bernhard Grill,Ye Wang,
In a simple scalable audio coding scheme, there are usually two layers – a base layer and an enhancement layer. This paper describes a frequency selective switch (FSS) control algorithm, which is employed to optimally code the signal in the enhancement layer. For each frequency band, the FSS determines whether the original signal or the residual of the original and base layer signals is sent in the enhancement layer. The proposed method introduces advanced mechanisms into the FSS and the quantization process to achieve a perceptually optimal result in the encoding process, particularly at low bit rates. These changes require no modifications to the decoder.
Perceptual Optimization of the Frequency Selective Switch in Scalable Audio Coding
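The basic per-band decision of a frequency selective switch, as described in the abstract, can be sketched as sending whichever of the original or the residual signal is cheaper to code. The sketch below assumes that band energy is the cost measure. The paper's perceptual criteria and quantization changes are not reproduced, and all names and example values are hypothetical.

```python
def fss_decisions(original, base, band_edges):
    """Per-band frequency selective switch decision.

    original, base: spectral coefficients (same length) of the input signal and
    of the decoded base layer. band_edges: list of (start, stop) index pairs.
    Returns (flag, coefficients) per band; flag is "residual" when
    original - base has less energy than original in that band.
    """
    decisions = []
    for start, stop in band_edges:
        orig_band = original[start:stop]
        resid_band = [o - b for o, b in zip(orig_band, base[start:stop])]
        e_orig = sum(v * v for v in orig_band)
        e_resid = sum(v * v for v in resid_band)
        if e_resid < e_orig:
            decisions.append(("residual", resid_band))
        else:
            decisions.append(("original", orig_band))
    return decisions

# Example: the base layer models the low band well but carries nothing in the high band.
original = [1.0, 0.8, 0.5, 0.3, 0.2, 0.1]
base = [0.95, 0.82, 0.45, 0.0, 0.0, 0.0]
bands = [(0, 3), (3, 6)]
flags = [flag for flag, _ in fss_decisions(original, base, bands)]
print(flags)  # ['residual', 'original']
```

The decoder only needs the per-band flag. It adds the enhancement coefficients to the base layer in "residual" bands and uses them directly in "original" bands. This is consistent with the abstract's point that encoder-side improvements need no decoder changes.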
Werner Oomen,Erik Schuijers,Bert den Brinker,Jeroen Breebaart,
In the course of the “MPEG-4 Extension 2” standardisation process, a parametric coding scheme is being developed. This coding scheme is based on the notion that any audio signal can be decomposed into three objects: transients, sinusoids and noise. Each of these objects allows for an efficient parametric representation. The parametric coder targets medium to high quality for CD-quality material, at bit rates ranging from 24 to 40 kbit/s stereo. Recently, improvements have been made to increase the overall performance of the coder. One of these improvements is an improved model for the noise object. Furthermore, a very efficient representation of the stereo image has been defined. In this paper, all recent improvements are discussed.
Advances in Parametric Coding for High-Quality Audio
Michael J. Smithers,Brett G. Crockett,Louis D. Fielder,
Two methods of coding and delivering ultra-high-quality audio are presented. Both methods are video-frame synchronous and editable at common video frame rates (23.98, 24, 25, 29.97 and 30 frames per second) without the use of sample rate converters. The first is an ultra-high-quality audio coder that scores above 4.8 on the ITU-R 5-point audio impairment scale at a data rate of 256 kbps per channel and at up to three generations of encoding/decoding. The second is an enhanced method of video-frame-synchronous PCM packing. Specifically, the problem of transmitting 48 kHz audio in 29.97 Hz frames is examined.
Ultra High Quality, Video Frame Synchronous Audio Coding
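The abstract singles out 48 kHz audio in 29.97 Hz frames, and the arithmetic shows why. A 30000/1001 Hz frame holds 48000 × 1001 / 30000 = 1601.6 samples, so no single integer frame length works. The count must instead vary over a repeating five-frame sequence of 8008 samples. The sketch below computes this with exact fractions. The particular distribution of 1601- and 1602-sample frames it produces (cumulative flooring) is one simple choice, not necessarily the packing used by the authors, and the function names are hypothetical.

```python
import math
from fractions import Fraction

# Nominal video frame rates from the abstract, as exact values.
FRAME_RATES = {
    "23.98": Fraction(24000, 1001),
    "24": Fraction(24),
    "25": Fraction(25),
    "29.97": Fraction(30000, 1001),
    "30": Fraction(30),
}

def samples_per_frame(fs, frame_rate):
    """Exact (possibly fractional) number of audio samples per video frame."""
    return Fraction(fs) / frame_rate

def frame_sequence(fs, frame_rate):
    """Integer sample counts for one repeating sequence of frames.

    Frame k carries floor((k + 1) * spf) - floor(k * spf) samples, so the counts
    sum exactly over the sequence, whose length is the denominator of spf.
    """
    spf = samples_per_frame(fs, frame_rate)
    return [math.floor((k + 1) * spf) - math.floor(k * spf)
            for k in range(spf.denominator)]

for name, rate in FRAME_RATES.items():
    print(name, samples_per_frame(48000, rate), frame_sequence(48000, rate))
# The 29.97 line prints: 29.97 8008/5 [1601, 1602, 1601, 1602, 1602]
```

Of the listed rates, only 29.97 Hz yields a fractional count. The 23.98 Hz rate (24000/1001) happens to give exactly 2002 samples per frame, and 24, 25 and 30 Hz give 2000, 1920 and 1600 samples respectively.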