Audio Engineering Society Preprints

AES 118th Convention

Barcelona, Spain
May 28-31, 2005

AES Preprint Ordering

Single Convention Preprints are available through the AES Preprint Search and Shop facility.

Preprints Listing

Jeppesen, Jakob; Moeller, Henrik
Spatial localization of sound is often described as unconscious evaluation of cues given by the interaural time difference (ITD) and the spectral information of the sound that reaches the two ears. Our present knowledge suggests the hypothesis that the ITD roughly determines the cone of the perceived position (i.e. the azimuth in a polar coordinate system with left-right poles), whereas the spectral information determines the position on the cone (i.e. the elevation in the same coordinate system). This hypothesis was evaluated in a series of listening tests, where the two cues were manipulated independently in HRTFs used for binaural synthesis. The ITD seems to be dominant for localization in the horizontal plane even when the spectral information is severely degraded.
Cues for Localization in the Horizontal Plane

Busson, Sylvain; Katz, Brian; Nicol, Rozenn
The Interaural Time Difference (ITD) is the primary cue for sound localisation. Although the ITD has been studied intensively, many issues remain open. This paper focuses on the ITD in relation to minimum phase filter modelling for binaural synthesis. A comparison of ITD extraction methods highlights discrepancies for locations near the interaural axis. A subjective experiment was carried out to determine which ITD to use with a given pair of minimum phase filters. Subjects had to adjust the ITD of a test sound in order to move it as close as possible to a target sound. Results are analysed in terms of psychoacoustic ITD estimates compared with methods that extract the ITD from the Head Related Transfer Function (HRTF).
Subjective Investigations of the Interaural Time Difference in the Horizontal Plane

Cabrera, Densil; Ferguson, Sam; Subkey, Alan
Using four subwoofers, this study investigates auditory image characteristics for one-third octave bands of pink noise in the frequency range 25 Hz to 100 Hz. The subwoofers were located at 90 degree intervals: 45 degrees to the left and right, and in front of and behind the subject. Single noise bands, coherent pairs, incoherent pairs and different frequency band pairs were subjectively assessed. Subjects drew the auditory image as an ellipse on a response sheet. Results indicate that left-right discrimination occurs even at the lowest frequencies of human hearing – a finding consistent with other recent research. ASD and ASW are correlated, increasing at low frequencies for the stimuli tested, and for simultaneous presentation of coherent or incoherent signals.
Localization and Image Size Effects for Low Frequency Sound

Hoffmann, Pablo Faundez; Moeller, Henrik
In binaural synthesis, signals are convolved with head-related transfer functions (HRTFs). In dynamic systems, the update is often done by cross-fading between signals that have been filtered in parallel with two HRTFs. An alternative to cross-fading that is attractive in terms of computing power is direct switching between HRTFs that are close enough in space to provide an adequate auralization of moving sound. However, direct switching between HRTFs not only moves the sound but may also generate artifacts such as audible clicks. HRTF switching involves switching of spectral characteristics and of timing characteristics (ITD), and the audibility of each was studied separately. The first results, data on the minimum audible time switch (MATS), are presented.
Audibility of Time Switching in Dynamic Binaural Synthesis

Ajdler, Thibaut; Faller, Christof; Sbaiz, Luciano; Vetterli, Martin
We propose an interpolation technique for head related transfer functions (HRTFs). To derive the algorithm we study the dual problem where sound is emitted from the listener's ear and the generated soundfield is recorded along a circular array of microphones around the listener. The proposed interpolation algorithm is based on the observation that the spatial bandwidth of the measured sound along the circular array is limited (for all practical purposes). Further, we observe that this spatial bandwidth increases linearly with the frequency of the emitted sound. The analysis leads to the conclusion that the necessary angle between consecutive HRTFs is about 5 degrees in order to be able to reconstruct all HRTFs in the horizontal plane at a temporal sampling frequency of 44.1 kHz.
Interpolation of Head Related Transfer Functions Considering Acoustics
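
The 5 degree figure in the abstract above can be roughly reproduced with a back-of-envelope calculation. The sketch below is not the authors' derivation: it assumes the common rule of thumb that a field sampled on a circle has an angular bandwidth of about k·r (wavenumber times radius), and the radius of 0.09 m (roughly a head radius) and the speed of sound of 343 m/s are illustrative values chosen here, not taken from the paper.

```python
import math

def max_angular_spacing_deg(f_max_hz, radius_m=0.09, c=343.0):
    """Largest angular step (degrees) that still samples the field adequately.

    Assumption: angular harmonics above order ceil(k*r) are negligible,
    so 2*order + 1 equally spaced directions are enough (angular Nyquist).
    radius_m and c are hypothetical illustrative values.
    """
    kr = 2.0 * math.pi * f_max_hz * radius_m / c   # grows linearly with frequency
    order = math.ceil(kr)                          # highest significant harmonic
    n_directions = 2 * order + 1                   # samples needed around the circle
    return 360.0 / n_directions

# Audio bandwidth for 44.1 kHz sampling is 22.05 kHz.
print(max_angular_spacing_deg(22050.0))
```

Because k·r is proportional to frequency, halving the bandwidth roughly doubles the admissible spacing, which matches the linear growth of spatial bandwidth described in the abstract.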

Kvist, Preben; Poulsen, Torben; Rasmussen, Karsten Bo
This paper investigates the subjective effects of the use of dither in digital audio systems. A short introduction to dithered and undithered quantization is given, and the subjective experiment is described. The results of listening tests on undithered, subtractive dithered and non-subtractive dithered quantization are presented. The listening tests were made with 4 to 12 bits/sample. The subjective tests show that subtractive dithering is preferred to undithered quantization up to 8 bits/sample. With music stimuli, non-subtractive dithering was preferred to undithered quantization only at 4 bits/sample. With speech, non-subtractive dither was preferred to undithered quantization up to 8 bits/sample. Triangular probability density function dither was preferred to rectangular probability density function dither only at very low bit rates.
A Listening Test of Dither in Digital Audio Systems
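
The three quantization schemes compared in the abstract above can be sketched in a few lines. This is a minimal illustration, not the test setup of the paper: the mid-tread quantizer, the full-scale range of [-1, 1), the 1 LSB (rectangular) and 2 LSB (triangular) dither widths, and all function names are assumptions made here, and clipping is ignored.

```python
import random

def quantize(x, bits):
    """Undithered mid-tread uniform quantizer (full scale assumed to be [-1, 1))."""
    step = 2.0 / (2 ** bits)
    return step * round(x / step)

def dither_sample(step, rng, triangular):
    """One dither value: rectangular PDF (1 LSB wide) or triangular PDF (2 LSB wide)."""
    if triangular:
        return (rng.random() - rng.random()) * step
    return (rng.random() - 0.5) * step

def quantize_dithered(x, bits, rng, subtractive=False, triangular=False):
    """Add dither before quantizing; subtract it again afterwards if subtractive."""
    step = 2.0 / (2 ** bits)
    d = dither_sample(step, rng, triangular)
    y = quantize(x + d, bits)
    return y - d if subtractive else y

rng = random.Random(0)
step = 2.0 / 2 ** 8
x = 0.3 * step                      # a level well below one quantization step
print(quantize(x, 8))               # undithered: the signal is lost entirely
avg = sum(quantize_dithered(x, 8, rng) for _ in range(20000)) / 20000
print(avg / step)                   # dithered: the average recovers the level
```

With subtractive dither the total error stays within half a step, whereas non-subtractive dither trades a higher noise floor for the removal of signal-dependent distortion.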

Gonzalez-Rodriguez, Joaquin; Ortega-Garcia, Javier; Sanchez-Bote, Jose-Luis
In this proposed paper, a novel microphone linear array is presented and implemented for real-time processing. The prototype works on a DSP board and operates in the frequency domain. The array, which is composed of 15 microphones in nested configuration, combines two multichannel techniques for speech improvement: SuperDirective beamforming (SD) and Audible Noise Suppression (ANS). The SD beamforming technique is an alternative to conventional or Delay and Sum beamforming (DS) which has worse low frequency spatial selectivity. ANS processing is based on the masking properties of the human auditory system and can benefit the perceived and objective quality of the processed signal. Several on-line experiments will be described, assessing the real-time prototype.
DSP-Implemented Broadband Superdirective Microphone Array with Audible Noise Suppression

Coyle, Eugene; Fitzgerald, Derry; Gainza, Mikel; Kelleher, Aileen; Lawlor, Bob
By combining techniques used in previous onset detectors, a system that detects note onsets in traditional Irish fiddle tunes has been implemented. The notes detected also include the most common types of ornamentation played by the fiddle. Ornaments are notes of extremely short duration, at most a fifth the length of a regular note. An STFT based sub-band technique, which previously gave good results for the Irish tin whistle, was modified to include a threshold approximation more suitable for the fiddle and good results have been achieved.
Onset Detection, Music Transcription and Ornament Detection for the Traditional Irish Fiddle

Borowicz, Adam; Petrovsky, Alexander
In this paper we present a novel method for enhancing speech corrupted by colored noise. A recent extension of the signal subspace approach to colored-noise processes is employed. Enhancement is performed using an optimal linear estimator which minimizes the average signal distortion power for a given set of constraints on the residual noise power spectrum. Perceptual criteria give lower speech distortion than SNR-based solutions. We therefore propose to use constraints defined in the DFT domain that are consistent with the masking properties of the human ear. The optimal filter is found by solving the constraint equations for a given masking threshold. The proposed method draws on the most advanced current ideas in signal subspace speech enhancement and is optimal in the general case of a colored-noise process.
Perceptually Constrained Subspace Method for Enhancing Speech Degraded by Colored Noise

Hikichi, Takafumi; Miyoshi, Masato; Okuno, Hiroshi G.; Yoshioka, Takuya
This paper describes a method for estimating the amplitude characteristics of common poles in a room soundfield from a musical audio signal received by multiple microphones. It has been proven that an estimate can be calculated precisely for a white source signal. However, if a source signal is colored as in the case of a musical audio signal, the estimates are degraded by the frequency characteristics originally contained in the source signal. In this paper, we propose an extension of the conventional method so that it can estimate the amplitude characteristics of common acoustical poles. Simulation results for popular, classical, and jazz musical pieces showed the effectiveness of the proposed method.
Blind Estimation of Room Resonances using Popular, Classical, and Jazz Music

Barry, Dan; Coyle, Eugene; Lawlor, Bob
The Azimuth Discrimination and Resynthesis algorithm (ADRess) has been shown to produce high quality sound source separation results for intensity panned stereo recordings. There are, however, artefacts such as phasiness which become apparent in the separated signals under certain conditions. This is largely due to the fact that only the magnitude spectra for each of the separated sources are estimated. Each source is then resynthesised using the phase information obtained from the original mixture signal. This paper describes the nature and origin of the associated artefacts and proposes alternative techniques for resynthesising the separated signals. A comparison of each technique is then presented.
Comparison of Signal Reconstruction Methods for the Azimuth Discrimination and Resynthesis Algorithm

Malah, David; Ofir, Hadas
In this work we present a novel approach for audio packet loss concealment, designed for MPEG-Audio streaming, based only on the data available at the receiver. The proposed method is based on the GAPES (Gapped Amplitude and Phase Estimation) algorithm for replacing the missing data, using interpolation in the spectral domain. Since the MPEG standard uses MDCT for compression, we need to convert the data to the DFT domain. This conversion is done directly using an efficient procedure that we developed for this purpose. This technique is found to provide better performance than previously reported works, and can be successfully used even with a loss rate of 30%.
Packet Loss Concealment for Audio Streaming Based on the GAPES Algorithm

Ferreira, Anibal J. S.; Rocha, Ariel
This paper presents a new method for the adaptive cancellation of acoustic feedback. The method uses high resolution frequency analysis and high-Q notch filters to detect feedback accurately and cancel it without noticeably disturbing the main audio spectrum. The method will be described, its implementation on a TMS320C6711 DSP platform for real time operation will be explained, and results for the adaptive cancellation of two simultaneous acoustic feedbacks will be presented.
An Accurate Method of Detection and Cancellation of Multiple Acoustic Feedbacks

Chen, Liang-Gee; Huang, Shih-Way; Tsai, Tsung-Han
The paper derives a fast decomposition for the Quadrature Mirror Filter (QMF) banks of the Low Power Spectral Band Replication (SBR) tools in the MPEG High Efficiency Advanced Audio Coding (HE AAC) decoder. The matrix operations in the filterbanks are decomposed into elementary Discrete Cosine Transform (DCT) types and simple permutations. The computational complexity can be effectively reduced by using fast algorithms for the DCT.
Fast Filterbanks for the Low Power MPEG High Efficiency Advanced Audio Coding Decoder

Hebert, Gary
A new IC for implementing companding noise reduction in professional wireless microphone applications is described. Unlike existing devices designed primarily for the cordless telephone market, the new design allows straightforward, repeatable implementation of companding schemes incorporating ratios greater or less than 2 to 1, level-dependent ratios, limiters, and noise gates. The overall device architecture and the design and performance of individual functional blocks are described. Several examples of encoder and decoder implementations are presented. The design techniques used to maintain wide dynamic range while minimizing power consumption are described.
A Flexible Compander IC for Wireless Microphone Applications

Brown, Jim; Whitlock, Bill
Operational considerations dictate the use of passive splitting of microphones in most sound reinforcement applications. Modern microphones generally require a minimum load impedance of 1,000 ohms. Since mix desk input impedances rarely exceed 1,500 ohms, passive splitting utilizing 1:1 turns ratio transformers can seriously degrade microphone performance when used to drive two or more mix desks. Transformers designed to operate in stepdown mode solve this problem and offer other benefits.
A Better Approach to Passive Microphone Splitting
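
The loading problem described in the abstract above is simple arithmetic and can be checked directly. The sketch below assumes ideal transformers, for which the impedance seen at the primary is the load multiplied by the square of the turns ratio; the 2:1 step-down ratio and the function names are hypothetical choices for illustration and are not taken from the paper.

```python
def parallel(impedances):
    """Combined impedance (ohms) of loads connected in parallel."""
    return 1.0 / sum(1.0 / z for z in impedances)

def reflected(load_ohms, turns_ratio):
    """Impedance seen at the primary of an ideal transformer.

    turns_ratio is primary turns divided by secondary turns,
    so a 2:1 step-down transformer has turns_ratio = 2.
    """
    return load_ohms * turns_ratio ** 2

desk = 1500.0  # upper end of typical mix desk input impedance, per the abstract

# Two desks fed through 1:1 splits: the microphone sees both in parallel.
print(parallel([reflected(desk, 1), reflected(desk, 1)]))

# The same two desks, each behind a hypothetical 2:1 step-down transformer.
print(parallel([reflected(desk, 2), reflected(desk, 2)]))
```

In the 1:1 case the combined load falls below the 1,000 ohm minimum quoted in the abstract, while the step-down case stays well above it. The trade-off in this idealized model is that a 2:1 step-down also halves the signal voltage delivered to each desk.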

Schneider, Martin
All electronic devices in the audio chain are to some extent susceptible to external electromagnetic interference. Interfering signals are moving higher and higher in the radio frequency spectrum, and mobile phones are currently the most important source of disturbance. Standardized measurement systems are available. Although mentioned in the relevant microphone standards, data is seldom published. Actual measurements on different microphones, cables and wiring topologies will be presented and discussed.
Electromagnetic Interference, Microphones and Cables

Elliq, Mohammed; Millot, Laurent; Pele, Gerard
Audio scene cartography for real or simulated stereo recordings is presented. The audio scene analysis proceeds in three successive steps: a perceptive 10-subband analysis; calculation of the temporal laws for the relative delays and gains between the two channels of each subband, using a short-time constant-scene assumption and inter-channel correlation, which makes it possible to follow a mobile source as it moves; and calculation of global and subband histograms whose peaks give the incidence information for fixed sources. Audio scenes composed of 2 to 4 fixed sources, or of one fixed source and one mobile source, have already been tested successfully. Further extensions and applications will be discussed. Audio illustrations of audio scenes, subband analysis and a demonstration of real-time stereo recording simulations will be given.
Using Perceptive Subbands Analysis to Perform Audio Scenes Cartography

Amadu, Frederic; Monceaux, Jerome; Pachet, Francois; Roy, Pierre; Zils, Aymeric
Translation of monophonic soundtracks to new audio formats is in growing demand, particularly from DVD producers. The mono to 5.1 upmix is a time-consuming task for the sound engineer, who must choose and adapt different spatialization tools. To simplify the upmix, we introduce a new spatialization approach based on descriptors. This approach consists of automatic detection of audio source qualities, based on perceptive aspects. This paper presents an experiment conducted by Arkamys and Sony CSL Paris to evaluate the value of descriptors with regard to the needs of sound engineers when upmixing old movies. EDS (Zils & Pachet, 2004), a generic audio information extractor, was used to control the Arkamys spatializer (Raczinski, Monceaux, Vieilledent, 2003). We describe the aim and value of this descriptor-based approach and discuss its performance and limitations.
Descriptor-Based Spatialization

Kessler, Ralph
As a result of long-term research work, the author presents a new method for creating, improving and applying highest quality “acoustical fingerprints”. The further “ingredients” of the presented room sampling method are known to most audio experts: a loudspeaker, regular sound recording equipment and know-how. A basic method was optimised and extended in order to create a new generation of simulation tools for use in music and film post-production studios. New software was developed to facilitate the process even further. Besides practical hints for sampling rooms, the presentation will include a poetic “virtual acoustic tour” through famous German spaces in multichannel format as well as a preview of the coming generation of complex sound field simulation.
An Optimized Method for Capturing Multidimensional “Acoustic Fingerprints”

Jin, Craig; Kan, Alan; Lin, Dennis; McGinity, Matthew; Smith, Keir; Tan, Teewoon; van Schaik, Andre
We present a newly developed 3D Audio Playback Engine that has the capabilities to: (i) simultaneously spatialise over 40 audio sources based on sound trajectories; (ii) smoothly playback a fixed 3D soundtrack for 385 possible head-orientations (based on head-tracking data); (iii) switch smoothly between different fixed 3D soundtracks (i.e., different auditory worlds); (iv) playback up to eight simultaneous and instantaneous 3D sounds on command; (v) use a head-tracker interface via the virtual reality peripheral network (VRPN); (vi) employ a 3D audio communication interface using voice over IP between three participants; (vii) create a two-way communication protocol with the Virtools graphical software engine controlling the visual display.
3DApe: A Real-Time 3D Audio Playback Engine

Izquierdo-Fuente, Alberto; Jimenez-Gomez, Maria-Isabel; Martinez-Arribas, Alberto; Raboso-Mateos, Mariano; Val-Puente, Lara; Villacorta-Calvo, Juan-Jose
A simple "ad hoc" method is presented for calibrating sensor arrays, in both transmission and reception, for narrow-band acoustical radar systems that use beamforming techniques, when the phase and gain characteristics of the individual channels are unknown. Three calibration methodologies are proposed and tested on a real system, and the degradation introduced in the radiation patterns is analyzed as a function of the type of calibration and the pointing angle.
A Simple Methodology of Calibration for Sensors Arrays for Acoustical Radar System

Bove Jr., V. Michael; Dalton, Ben
An active self-localisation algorithm is proposed which is effective for even only coarsely synchronised sensor-emitter nodes. Spatial conformation is derived from the differences in audio measurements of pseudo-noise 'chirps' emitted and detected at each of the nodes. By removing dependence on fine grained temporal synchronisation it is hoped that this technique can be used concurrently across a wide range of devices to better leverage the existing audio sensing resources that surround us. An implementation of this method is described and quantitatively compared with a more typical synchronised source location approach, both using the Smart Architectural Surfaces development platform. The viability of the method is further demonstrated in a mixed-device ad hoc sensor-network case using existing off-the-shelf technology.
Audio-Based Self-Localization for Ubiquitous Sensor Networks

Polisois, Aristide; Touzelet, Pierre
Conventional OPTs (CO-OPT), when used in single-ended tube amplifiers, have a major drawback: the anode quiescent current reduces the core magnetic flux capability available to the AC magnetic flux before core saturation. The usual solutions to this problem are based on air-gapped magnetic cores with a large core cross section. The resulting performance is generally satisfactory, but the OPTs are large, heavy and expensive. This paper introduces a new concept of OPT: the Self Compensated OPT (SC-OPT). Its aim is to remove this drawback. The magnetic flux due to the anode quiescent current is self compensated, leaving the whole core magnetic flux capability to the AC magnetic flux. As a result, for single-ended layouts, the limitations in size and power of traditional air-gapped transformers are overcome, and SC-OPTs can be built for much higher power outputs without becoming too bulky. The SC-OPT can deal with very large quiescent currents (2 A and above), making it suitable not only for high-power triodes but also for solid-state devices such as MOSFETs. The same principle also applies to interstage transformers as well as filtering and plate chokes. Experiments have also revealed an improved clarity in the sound.
The Self Compensated Output Power Transformer (SC-OPT), Theory and Properties

Van der Veen, Menno
Vacuum tube amplifiers need an output transformer as impedance converter between the high impedance vacuum tubes and the low impedance loudspeaker. A new universal output transformer is proposed. All possible variations of topologies in tube amplifiers are brought into one general system. The new transformer is discussed and results are shown with over twenty different vacuum tube amplifiers from the general system.
Universal System and Output Transformer for Valve Amplifiers

Riedmiller, Jeff; Robinson, Charles; Seefeldt, Alan; Vinton, Mark
The broadcast, satellite and cable television industries have been plagued for years by the inability of personnel to reliably measure and thus consistently control program loudness utilizing traditional measurement devices and methods. As a result, most listeners feel compelled to make adjustments to their television volume controls (in the home). A recent survey of channel-to-channel and program-to-program level discrepancies confirms that the current practice is unacceptable to listeners. In this paper we propose and describe several measurement techniques designed to reliably measure program loudness and enable effective loudness control.
Practical Program Loudness Measurement for Effective Loudness Control

Escolano, Jose; Pueo, Basilio; Romá, Miguel
Perhaps the most difficult aspects of Electroacoustics training are those concerning the modelling of acoustical and mechanical systems by means of their analogous circuits. An adaptation of the "problem-based learning" model to these topics is proposed, with the corresponding sequence of activities. A logical, progressive approach is achieved, giving students a gentler path towards analogous components, models and circuits.
An Approach to Analogous Circuits of Acoustical and Mechanical Systems by Means of Problem Based Learning

Charles, Jane; Coyle, Eugene; Fitzgerald, Derry
This paper considers the development of a violin teaching aid, called ViTool. It is a computer based teaching aid and will ultimately consist of at least four task dependent tools, one of which is an intonation tool. Typical beginner faults have been identified, and features that best describe them for classification purposes are being considered. The ViTool is not intended as a replacement or electronic teacher, but as a teaching aid.
Development of a Computer Based Violin Teaching Aid, ViTool

Lynn, John; Quinn, Patrick
Given the complexity of audio equipment and software, the lack of published research into the design and evaluation of the interfaces for such technology is surprising. In recognition of the importance of this area of design, students on the BSc(Hons) Audio Technology programmes at Glasgow Caledonian University study the theory and practice of interface design as part of their degree studies. This paper looks at the aims of the syllabus covered, its main influences and issues surrounding its implementation.
Interface Design as Part of an Audio Technology Degree

Harada, Noboru; Moriya, Takehiro; Sekigawa, Hiroshi; Shirayanagi, Kiyoshi
In the proposed lossless compression scheme, the main idea is to use Approximate-Common-Factor Coding as a pre-processing step for floating-point data compression. This scheme can reduce bit rates significantly, especially when the input values in a frame are constructed by multiplying a sequence of integer values by a floating-point constant. This scheme has been proposed for the MPEG-4 ALS (Audio Lossless Coding) core experiment and accepted as part of the reference model.
Lossless Compression of IEEE Floating-point Audio using Approximate Common Factor Coding and Masked-LZ Compression

Ciarkowski, Andrzej; Czyzewski, Andrzej; Dziubinski, Marek; Kaczmarek, Andrzej; Kulesza, Maciej; Maziewski, Przemyslaw
New algorithms were developed based on formant structure analysis as a method for discriminating wow from natural musical effects such as vibrato. Other algorithms for wow tracking were also proposed, namely an algorithm employing an AR model for power line hum frequency detection and an algorithm for estimating the pitch variation curve employing noise bandwidth analysis. A setup was proposed for efficient wow tracking based on recording bias detection in magnetic recordings. Moreover, a time-varying resampling routine was implemented and applied to wow compensation. The developed algorithms were studied using real audio examples, allowing a comparison of their effectiveness.
New Algorithms for Wow and Flutter Detection and Compensation in Audio

Dolan, Janice; McKay, John
Advancements in nonlinear editing technology have enabled directors to modify their film project at any point during the post process. This freedom provides significant creative flexibility. However, the technologies for sound and film editing are not fully integrated and pose a challenge for sound editors keeping sync with film edits and changes. This paper introduces new workflows and technologies that enable sound editors to work in tandem with the changing film and automate manual processes in a collaborative non-linear environment. These new workflows and changing technologies will be described using a real-world motion picture case study – Lord of The Rings.
Sound Editing Workflows and Technologies for Digital Film: The Non Linear Soundtrack

Duffner, Orla; Marlow, Sean; Murphy, Noel; O'Connor, Noel; Smeaton, Alan
A passive traffic monitoring technique is devised which uses two omnidirectional microphones to detect and localize the acoustic pattern of road vehicles. This method uses the cross-power spectrum phase version of the generalized cross correlation algorithm for time delay estimation to emphasise passing concentrated wideband sound sources. Key traffic flow data such as vehicle location, speed and density are extracted by means of creative pattern recognition including directional filtering, that exploits the acoustical correlation signature of a passing vehicle. Experiments are based on real traffic data in typical conditions. Compared to existing traffic sensors this technique is economically advantageous and non-intrusive.
Road Traffic Monitoring using a Two-Microphone Array
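
The cross-power spectrum phase (often called GCC-PHAT) delay estimator mentioned in the abstract above can be sketched as follows. This is a generic textbook form, not the implementation from the paper: it uses a naive O(N²) DFT to stay dependency-free, assumes a noise-free circular delay between the two microphone signals, and all function names are chosen here for illustration.

```python
import cmath
import random

def dft(x, inverse=False):
    """Naive discrete Fourier transform (adequate for short illustrative frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def gcc_phat_delay(a, b):
    """Estimate by how many samples b lags a (negative means b leads a)."""
    n = len(a)
    spec_a, spec_b = dft(a), dft(b)
    cross = [spec_b[k] * spec_a[k].conjugate() for k in range(n)]
    # PHAT weighting: discard the magnitude and keep only the phase.
    phat = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    corr = dft(phat, inverse=True)
    peak = max(range(n), key=lambda i: corr[i].real)
    return peak if peak <= n // 2 else peak - n   # map the upper half to negative lags

rng = random.Random(0)
mic1 = [rng.gauss(0.0, 1.0) for _ in range(64)]
mic2 = [mic1[(t - 5) % 64] for t in range(64)]    # mic2 hears the source 5 samples later
print(gcc_phat_delay(mic1, mic2))
```

Because the weighting whitens the cross spectrum, a wideband source produces a sharp correlation peak, which is why the approach emphasises concentrated wideband sources such as a passing vehicle.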

Azzali, Andrea; Bilzi, Paolo; Carpanoni, Eraldo; Farina, Angelo
A fast and reliable evaluation of speech intelligibility inside cars is of the utmost importance during the interior design phase. Performing subjective tests inside car compartments is a time-consuming task and therefore not suitable for industrial processes. Moreover, comparison of different car fittings and database implementation cannot be performed quickly and easily. The effectiveness of virtual listening systems was therefore investigated in the context of a research project that involves both UNIPR and an important carmaker. A preliminary subjective evaluation session in real cars was carried out. Two different virtual listening systems were compared to investigate the best configuration for the intelligibility test. In this paper a comparison of the systems is presented, and the experimental results show that the tests performed in the listening room are consistent with the ones carried out in car compartments.
Comparison of Different Listening Systems for Speech Intelligibility Tests

King, Josh; Shively, Roger
A study of automotive doors as loudspeaker enclosures was previously presented. Considerations for modeling the mechanical and acoustical behavior of automotive doors are now presented.
Automotive Doors as Loudspeaker Enclosures Modeling Considerations

Christensen, Flemming; Lydolf, Morten; Martin, Geoff; Minnaar, Pauli; Pedersen, Benjamin; Song, Woo-Keun
This paper describes a system for simulating automotive audio through headphones for the purposes of conducting listening experiments in the laboratory. The system is based on binaural technology and consists of a component for reproducing the sound of the audio system itself and a component for reproducing the background noise in the cabin. The former is implemented by measuring binaural room impulse responses (with a head-and-torso-simulator equipped with a computer-controlled rotating head) and employing them in binaural synthesis. The latter is implemented by recording the cabin noise (with the artificial head rotated to various positions) and reproducing the recordings through headphones. During playback the listener’s head is tracked and both the binaural synthesis and binaural recordings are updated accordingly in real time.
A Listening Test System for Automotive Audio - Part 1: System Description

Bech, Søren; Ellermeier, Wolfgang; Ghani, Jody; Gulbol, Mehmet-Ali; Martin, Geoff
This paper describes two listening tests that were performed to provide initial validation of an auralisation system (see Part 1) to mimic the acoustics of a car interior. The validation is based on a comparison of results from an in-car listening test and another test using the auralisation system and recordings of the stimuli used for the in-car test. The music samples for the test were chosen from a database of various CODEC examples from a previous extensive ITU test to validate the ITU-R BS.1387-1 standard.
A Listening Test System for Automotive Audio - Part 2: Initial Verification

Bech, Søren; Martin, Geoff
Although many commonly-accepted paradigms exist for perceptual testing of automotive audio systems, it has yet to be reliably determined which auditory attributes should be tested. This paper gives an introduction to the Descriptive Analysis technique for determining and quantifying salient attributes of signals produced by a sound playback system in an automotive environment.
Attribute Identification and Quantification in Automotive Audio - Part 1: Introduction to the Descriptive Analysis Technique

Beltran, Jose R.; Ponce de Leon, Jesus
In this paper a new algorithm to compute an additive synthesis model of a signal is presented. An analysis based on the Complex Continuous Wavelet Transform has been used to extract the time-varying amplitudes and phases of every component of the additive model. The mathematical relationships between the CCWT, the Hilbert Transform and complex filter banks are presented in order to obtain useful filter bank design parameters. The mathematical analysis of five different signals is presented: a pure cosine, a sum of cosines, a signal with frequency variations and two finite duration signals with Gaussian and exponential envelopes. The obtained theoretical results are finally compared with those computed with the developed algorithm.
Sound Analysis and Synthesis through Complex Bandpass Filter Banks

Bonada Sanjaume, Jordi
In this paper we present a transformation which aims to convert a voice solo into a big unison choir. The basic idea behind the presented algorithm is to morph the input voice solo (dry recording) with a recorded sustained vowel of a unison choir. The processing algorithm is based on the rigid phase-locked vocoder adapted to harmonic sounds. Pitch and timbre are taken from the voice solo, and the local spectrum comes from the analysis of the unison choir sample.
Voice Solo to Unison Choir Transformation

Collins, Nick
A recent review paper by Bello and colleagues [1] compared the performance of a selection of onset detection algorithms, but omitted the psychoacoustically motivated log difference function introduced by Klapuri [4]. This paper addresses that with respect to a number of variants of the Klapuri model, further considering psychoacoustic loudness measures and some other recently published and novel detection functions and peak pickers. The evaluation procedure utilises an extended superset of the database from [1], contrasting non-pitched percussive and pitched non-percussive sound sources. Sixteen detection functions take part in the main trial. Applications are considered in real-time causal systems which react to percussive transients or enable perceptually motivated segmentation. Keywords: Onset Detection, Detection Functions, Peak Picking, Audio Analysis
A Comparison of Sound Onset Detection Algorithms with Emphasis on Psychoacoustically Motivated Detection Functions

Gomez, Emilia; Maestre, Esteban
We describe a method to automatically extract a set of features from the audio signal that are related to musical expressivity, more specifically to dynamics and articulation. We define a description scheme based on intra-note segmentation into attack, sustain, release and transition segments, and a subsequent amplitude and pitch contour characterization. Performance recordings of saxophone jazz standards are analyzed in order to obtain an expressivity model in terms of amplitude and pitch. The description scheme is presented, as well as the algorithms that automatically perform intra-note segmentation and feature extraction. We evaluate the performance of the algorithms for both intra-note segmentation and feature extraction and propose some future work and applications.
Automatic Characterization of Dynamics and Articulation of Expressive Monophonic Recordings

Petrausch, Stefan; Rabenstein, Rudolf
Block-based physical modeling is a recently introduced method for digital sound synthesis via the simulation of physical models. The method follows a "divide-and-conquer" approach, where the elements are individually modeled and discretized, while their interaction topology is separately implemented. In this paper the application of this approach to string instruments is presented. The string is modeled with the functional transformation method, preserving all varieties of strings by a direct link from the physical string parameters to the algorithm. The excitation of the string is modeled separately with wave digital filters. Thanks to the block-based approach it is possible to simulate all kinds of string instruments, e.g. piano, guitar, and violin, with the same piece of code for the strings.
Application of Block-Based Physical Modelling for Digital Sound Synthesis of String Instruments

Mitchell, Thomas; Sullivan, Charlie
Parameter estimation for Frequency Modulation synthesis has provided a continual challenge to researchers since its introduction over thirty years ago. Previous research has made use of basic evolutionary optimisation algorithms to evolve sounds produced by non-standard Frequency Modulation arrangements. In contrast, this paper utilises recent advancements in multi-modal evolutionary optimisation to perform dynamic-sound matching with traditional Frequency Modulation arrangements. In doing so, a technique is developed that is not synthesiser dependent, and provides the potential for alternative methods of synthesis control.
Frequency Modulation Tone Matching Using a Fuzzy Clustering Evolution Strategy

Eronen, Antti; Hämäläinen, Matti
This paper describes an automated wavetable synthesizer testing system. Due to the nature of MIDI standards, testing of desired synthesizer behaviour or MIDI standard conformance is not feasible with bit-exact test vectors. Automated testing involves measuring the frequency, timing, and amplitude characteristics of the synthesizer output. Novel signal-analysis based testing methods were developed for measuring the rendering precision of MIDI and instrument articulation parameters. The automated test system consists of a PC host, which controls the test execution, stores the test case parameters, runs the signal analysis, and generates the test report. A modular interface connects the test system to the MIDI synthesizer, allowing testing to be done on different hardware platforms using a single set of test cases. Currently, 62 different test groups for SP-MIDI, Mobile DLS and Mobile XMF standard conformance testing have been implemented.
Automated Wavetable Synthesizer Testing

Hazan, Amaury; Ramirez, Rafael
We describe a tool for generating expressive music performances of monophonic Jazz melodies. The system consists of three components: (a) a melodic transcription component which extracts a set of acoustic features from monophonic recordings, (b) a machine learning component which induces an expressive transformation model from the set of extracted acoustic features, and (c) a melody synthesis component which generates expressive monophonic output (MIDI or audio) from inexpressive melody descriptions using the induced expressive transformation model.
An Approach to Expressive Music Performance Modeling

Choisel, Sylvain; Wickelmaier, Florian
The identification of relevant auditory attributes is pivotal in sound quality evaluation. Two fundamentally different psychometric methods were employed to uncover perceptually relevant auditory features of multichannel reproduced sound. In the first method, called Repertory Grid Technique (RGT), subjects were asked to directly assign verbal labels to the features when encountering them, and to subsequently rate the sounds on the scales thus obtained. The second method requires the subjects to consistently identify the perceptually relevant features before assigning them a verbal label. Under sufficient consistency, a lattice representation -- as frequently used in Formal Concept Analysis (FCA) -- can be derived to depict the structure of auditory features.
Extraction of Auditory Features and Elicitation of Attributes for the Assessment of Multichannel Reproduced Sound

Usher, John; Woszczyk, Wieslaw
Sound imagery is discussed in many contexts in subjective loudspeaker audio evaluation. In this paper we define imagery in terms of the spatial properties of an auditory object and present a theoretical framework for the study of images in audio. A general categorization of images into either source or reverberance images has been suggested in previous works; here we discuss perceptual organization principles and physical factors which affect this distinction. The degree to which existing theories for controlling a source image can be applied to controlling a reverberance image was investigated with a simple experiment. Using a conventionally arranged 3/2 loudspeaker system and a graphical mapping system developed in previous work, we investigated the spatial imagery associated with independent pair-wise panning of a single dry source audio channel and a number of artificial reverberation channels. From these results, we suggest a new sound processing system for use with conventional 3/2 surround sound loudspeaker audio wherein source and reverberant sound image radiation are separately controlled, enabling more consistent homogeneity of reverberant sound images.
Interaction of Source and Reverberance Spatial Imagery in Multichannel Loudspeaker Audio

Merimaa, Juha; Pulkki, Ville
Spatial Impulse Response Rendering (SIRR) is a recent technique for reproduction of room acoustics with a multichannel loudspeaker system. SIRR analyzes the direction of arrival and diffuseness of measured room responses within frequency bands. The direction is estimated using sound intensity, and diffuseness from the ratio of active sound intensity and energy density. Based on the analysis data, a multichannel response suitable for reproduction with any chosen surround loudspeaker setup is synthesized. When loaded into a convolving reverberator, the synthesized responses create a very natural perception of space corresponding to the measured room. In this paper, new listening test data and applications to processing of continuous sound are presented.
Spatial Impulse Response Rendering: Listening Tests and Applications to Continuous Sound

Kendrick, Paul; Shirley, Ben
Surround sound television is being taken up by broadcasters around the world and it is important to assess the impact of this for viewers, particularly those who struggle to understand speech on TV soundtracks. This research assesses the effect on intelligibility of a central dialogue loudspeaker, such as that found in 5.1 surround systems, when compared to a phantom central stereo image. Listening tests were carried out that showed up to 9.4% average improvement in intelligibility and which indicate possible benefits of surround sound systems on dialogue clarity.
Measurement of Speech Intelligibility in Noise: A Comparison of a Stereo Image Source and a Central Loudspeaker Source

Williams, Michael
This paper examines several aspects of crosstalk: the difference in the effect of crosstalk between coincident (or near-coincident) and spaced multichannel array systems; the crosstalk introduced by adjacent microphones to a specific segment; the crosstalk introduced by microphones on opposing sides of an array; the effect of crosstalk in the transitory and quasi-steady-state regions of a natural signal; crosstalk reduction in the quasi-steady-state region; and image folding into “empty” areas. Each of these aspects of crosstalk has a different and definable influence on a specific segment of the multichannel microphone array system. Microphone arrays must therefore be designed to minimise their effect on the final front coverage or surround sound multichannel image.
The Whys and Wherefores of Microphone Array Crosstalk in Multichannel Microphone Array Design

Lee, Hyun-Kook; Rumsey, Francis
A series of subjective listening tests was performed in order to elicit and grade the effect of interchannel crosstalk in three-channel microphone techniques. The effect was investigated with various independent variables, including different types of microphone array (different combinations of time and intensity differences), sound source and acoustic condition. This paper describes the test method and presents the results.
Investigation Into the Effect of Interchannel Crosstalk in Multichannel Microphone Technique

Bruno, Remy; Laborie, Arnaud; Montoya, Sebastien
Consumers are increasingly interested in multichannel sound. However, installing a surround system is still a headache for the average user. The ITU recommendations are generally incompatible with home arrangements, and people install their systems however they can, which generally results in large spatial sound distortions. This paper presents a system that overcomes this problem by adapting multichannel sound to the actual loudspeaker layout. The system consists of a small calibration microphone that measures loudspeaker characteristics (3D position, frequency response) and a process that remaps multichannel sound over the calibrated layout so as to compensate for the measured loudspeaker misconfiguration, including full 3D position.
Reproducing Multichannel Sound on any Speaker Layout

Holzinger, Axel; Jonsson, Lars
Existing networked broadcasting systems are stereo oriented. Multichannel sound demands an upgrade of the existing stereo-based gear in the chain from production to playout and for archiving. Existing systems rely on audio file formats that lack some fundamental requirements, such as the possible integration of a stereo downmix, which is essential in broadcasting. They suffer from restrictions such as the 4GB barrier that make them unsuitable for use in professional systems. On the other hand, there is a need to find solutions that align with existing formats to build an easy upgrade path for potential implementors. The solutions discussed in this paper try to fulfill both goals.
New File Format and Methods for Multichannel Sound in Broadcasting

Ruiz Reyes, Nicolas; Vera Candeas, Pedro
In this paper we propose a new sinusoidal modeling method based on perceptual matching pursuits with application to parametric audio coding. Complex exponentials compose the overcomplete dictionary for matching pursuits. The main contribution is the minimization of a perceptual distortion measure defined in the Bark scale to select the optimum atom at each iteration of the pursuits. Furthermore, a psychoacoustic stopping criterion for the pursuits is presented. The proposed sinusoidal modeling method is suitable for integration into a parametric audio coder based on the three-part model of Sines, Transients and Noise (STN model), as shown by the experimental results. Our method provides significant advantages over previous work, mainly because it operates in the Bark scale instead of the frequency scale.
A Sinusoidal Modeling Approach Based on Perceptual Matching Pursuits for Parametric Audio Coding
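
As a rough illustration of the pursuit loop mentioned in this abstract, the following Python sketch runs a plain matching pursuit over an overcomplete dictionary of complex exponentials. It is only a sketch under stated assumptions: atoms are selected by the largest inner-product magnitude, not by the perceptual Bark-scale distortion measure the authors propose, and the iteration count is fixed instead of using a psychoacoustic stopping criterion. The names `make_dictionary`, `matching_pursuit` and the `oversample` parameter are hypothetical.

```python
import cmath
import math


def make_dictionary(n, oversample=4):
    """Unit-norm complex exponentials on a frequency grid finer than the DFT (hypothetical helper)."""
    atoms = []
    for k in range(n * oversample):
        freq = k / (n * oversample)  # normalized frequency in cycles per sample
        atoms.append([cmath.exp(2j * math.pi * freq * t) / math.sqrt(n) for t in range(n)])
    return atoms


def matching_pursuit(signal, atoms, iterations):
    """Plain matching pursuit: pick the atom with the largest |<residual, atom>| each round.

    Assumption: energy-based selection stands in for the perceptual measure of the abstract.
    """
    residual = list(signal)
    picks = []
    for _ in range(iterations):
        best_k, best_c = 0, 0j
        for k, atom in enumerate(atoms):
            # Inner product of the residual with the (conjugated) atom.
            c = sum(r * a.conjugate() for r, a in zip(residual, atom))
            if abs(c) > abs(best_c):
                best_k, best_c = k, c
        # Subtract the projection onto the chosen atom from the residual.
        residual = [r - best_c * a for r, a in zip(residual, atoms[best_k])]
        picks.append((best_k, best_c))
    return picks, residual


# Two complex sinusoids that lie exactly on the dictionary grid (indices 20 and 60).
n = 32
atoms = make_dictionary(n)
signal = [2.0 * cmath.exp(2j * math.pi * (20 / 128) * t)
          + 0.5 * cmath.exp(2j * math.pi * (60 / 128) * t) for t in range(n)]
picks, residual = matching_pursuit(signal, atoms, iterations=2)
picked_indices = [k for k, _ in picks]
residual_energy = sum(abs(r) ** 2 for r in residual)
```

In this toy case the stronger sinusoid is extracted first and the residual energy is essentially zero after two iterations; a perceptual variant would replace the `abs(c)` comparison with a weighted distortion measure.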

Buchholz, Jörg; Mourjopoulos, John; Zarouchas, Thomas
A novel, non-uniform PCM audio quantizer is described, employing a time-domain computational auditory masking model. The model utilizes the concept of Signal Dependent Compression to produce an internal representation of the input signal so that, via a decision device, a time-domain masking threshold can be derived. Based on this model, and taking the reference audio and its quantized versions as inputs, the proposed quantizer derives masked and unmasked regions of the signal, so that the desired variable bit allocation can be achieved on the audio samples by an iterative process. Results indicate high-quality quantization at an average rate of 6.5 bits/sample, with the quantizer having low computational complexity and very low latency.
An Audio Quantizer Based on a Time Domain Auditory Masking Model

Bouchard, Martin; Shao, Cathy
In this paper, a novel post-filtering method is proposed to improve the perceptual quality of wideband speech and audio Algebraic Code Excited Linear Prediction (ACELP) codecs (such as the ITU-T G.722.2 AMR-WB). This perceptual processing is derived from the characteristics of the human hearing system. Based on these characteristics, and more specifically on the analysis of the perceptual loudness difference between the original and the coded signal, it is proposed to add a post filter to reduce the perception of the coding noise. Simulation results show that the objective assessment scores obtained using wideband Perceptual Evaluation of Speech Quality (w-PESQ) can be improved significantly using the proposed method, especially for wideband female speech.
A Perceptual Post Filter for Wideband Speech and Audio ACELP Codecs

Chang, Wei-Chen; Su, Alvin W.Y.; Wang, Jing-Xin
A new audio compression method based on a Spectral Oriented Tree is presented. After frequency transformation is performed over a frame of audio samples, the transform coefficients are arranged to form one or several quad trees depending on the harmonic structures of the signal. In each quad tree, the coefficients with larger magnitudes, regarded as more important, are placed closer to the root of the tree. A method called Concurrent Encoding In Descendant Tree (CEIDT) is employed to encode the tree coefficients such that the important coefficients are encoded before less important ones. Scalability is therefore easily achieved by discarding the trailing bits at any position of the bitstream. The quality is comparable to that of MP3. The proposed method does not use a psychoacoustic model, and its computational complexity is lower than that of MP3 and AAC. Only one small coding table is used for the CEIDT method, whereas the tables used in MP3 or AAC require a large amount of memory. Thus, the proposed method also provides a lower-cost alternative.
A New Audio Compression Method Based on Spectral Oriented Trees

Bang, Kyoung Ho; Lee, Keup Sup; Park, Young Cheol; Youn, Dae Hee
In MP3/AAC encoders, the quantization parameter called scalefactor controls the quantization noise and the bitrate. Tuning these encoders would require a characterization of the rate-distortion function per subband, which seems to be available only in a parametric manner. In this paper, a fast bit allocation method for MP3/AAC encoders is presented. The resulting encoder is able to produce an ISO/MPEG compliant bitstream which can guarantee better audio quality. More importantly, the number of computational steps is greatly reduced compared to the method recommended by the ISO/MPEG committee, because the efficient bit allocation algorithm significantly reduces the number of iterations required. It was found that the efficient bit allocation algorithm works best when the bit rate demanded by the psychoacoustic model in order to keep the quantization noise below the masking threshold is almost equal to the operational bit rate.
Fast Bit Allocation Method for MP3/AAC Encoders

Chen, Li-Wei; Hsu, Han-Wen; Lee, Wen-Chieh; Liu, Chi-Min
High-Efficiency AAC (HE AAC) has achieved good quality at low bit rates by reconstructing the high frequency signals through replicating the low frequency parts. The bit reservoir that decides the bit budget between the AAC for the low frequency part and the side information used to reconstruct the high frequency part will be an important module. This paper will propose a bit reservoir design for the HE AAC.
Bit Reservoir Design for HE AAC

Ferreira, Anibal J. S.; Sinha, Deepen
Recent advances in perceptual audio coding are strongly based on the concept of bandwidth extension. Most techniques implementing bandwidth extension require an analysis/synthesis filter bank in addition to that used by the associated perceptual audio coder, with a clear penalty in system complexity and coding delay. In this paper we present Accurate Spectral Replacement (ASR) as one of a new class of bandwidth extension techniques applied directly to the high frequency representation of the signal. ASR is based on a suitable decomposition of the MDCT filter bank, and implements synthesis of sinusoidal components with an accuracy much higher than the natural frequency resolution of the MDCT. The ASR technique is described, its performance is assessed with both synthetic and natural audio signals, and its main areas of application are addressed.
Accurate Spectral Replacement

Jang, Inseon; Kang, Kyeongok; Lee, Taejin; Park, Gi Yoon
As digital broadcasting technologies have advanced, users' expectations for a realistic and interactive broadcasting service have also increased. In this paper, we present an object-based 3D audio broadcasting system. This system consists of an authoring tool, a streaming server, and a client. The authoring tool generates an MPEG-4 file, made of multiple audio objects, after applying several kinds of acoustical effects to the audio objects. The streaming server generates transmission packets and sends them to the client through the Internet. The client reconstructs the bitstream and plays the 3D audio with user interaction. We present the design and implementation method of an object-based 3D audio system and describe the simulation results and applications.
An Object-based 3D Audio Broadcasting System for Interactive Services

Cuevas-Martinez, Juan Carlos
For Internet audio streaming purposes, a good trade-off between bit rate reduction and audio quality is achieved by using parametric audio coding. A scalable parametric coder has been optimized for this requirement, avoiding differential encoding and using a layered scheme for changing the bit rate straightforwardly. The results reveal our coder to be a good candidate for massive distributed audio applications, like music on demand, radio broadcasting or real-time streaming audio. Nevertheless, real-time scalable parametric audio streaming requires a complete communication protocol to achieve the goal of network transparency, owing to the best-effort QoS of the Internet. In this article some ideas for the design of a new embedded variable audio control protocol are shown.
Scalable Parametric Audio Coder for Internet Audio Streaming

Floros, Andreas; Mourjopoulos, John; Tatlas, Nicolas - Alexander
Despite recent advances in wireless networking technology, wireless multichannel digital audio delivery is not yet efficiently realized, because of the additional implementation issues raised in managing the distortions that may be introduced. In this work, a novel, open-architecture software platform for evaluating wireless digital audio distribution is presented. This tool facilitates the assessment of real-time playback distortions induced by variable packet reception delays and packet losses, typically encountered in WLAN transmissions. Moreover, this platform can also be employed for producing audio streams corresponding to the wirelessly delivered digital audio, in order to investigate the audibility of such distortions.
An Evaluation Tool for Wireless Digital Audio Applications

Holladay, Aaron; Holladay, Bryan
Audio Dementia is the next generation software application for mixing audio. The objective of Audio Dementia is to make audio mixing more intuitive, thus making audio mixing more user friendly by reducing the learning curve. The provisionally patented layout of Audio Dementia is straightforward to use and understand. The basic concept is that every track in a song has an icon on a stage area that represents its volume and pan with respect to a central icon on the stage. To change the pan or volume just click and drag the track icon to the point on the stage that the user wishes to place it. This position is also relative to the timeline so that each track can have multiple points consecutively connected by a corresponding path color.
Audio Dementia: A Next Generation Audio Mixing Software Application

Ellis-Geiger, Robert
Historically film composers have always complained about how much of their music is frequently lost during the dubbing process, as a result of it being blended or overridden by sound effects and dialogue. There is also a lot of contention over the way temporary music is selected and used within film production in general. This paper will discuss the following positions: 1. The notion that Hollywood film making still largely resembles the Henry Ford “Moving Assembly Line” model. 2. Within the last ten years Hollywood film making has been significantly influenced by the integrated use of music and sound technology. 3. The genre is clearly in a new phase/period where films are being more progressively produced as integrated art forms. Never before has there been such a strong movement by film makers (and agents of the film team) towards supporting an integrative approach in their use of music, sound, voice, text and image.
Film Music Scoring using a Digital Audio Workstation

Bettarelli, Ferruccio; Ciavattini, Emanuele; Lattanzi, Ariano; Piazza, Francesco; Squartini, Stefano; Zallocco, Diego
This work presents a novel software platform called NU-Tech to implement real-time DSP algorithms in multi-channel scenarios. Running on a common PC, the overall framework is based on a plug-in architecture, allowing the user to connect specific blocks, operating as DSP algorithms, within the available graphical design environment. These blocks, namely NUTs (NU-Tech Satellites), have to be previously written in C++. Strict control over latency times is ensured by a proper interface to the hardware layer of the PC sound card. It turns out that NU-Tech is well suited for development, real-time debugging and fine tuning of DSP algorithms. As a further application, it fulfils the role of DSP operating core of new stand-alone programs, sparing the user from developing them from scratch. Some examples are provided to show the effectiveness of the idea.
NU-Tech: Implementing DSP Algorithms in a Plug-in Based Software Platform for Real time Audio Applications

Camurri, Antonio; Coletta, Paolo; Drioli, Carlo; Massari, Alberto; Volpe, Gualtiero
The EyesWeb system is an open platform for real-time multimodal processing. It has now reached a mature stage, and is being used both for research in multimodal interfaces and for applications such as naturally interacting systems for museum exhibits, performing arts, therapy and rehabilitation. This paper presents the latest development concerning audio processing in EyesWeb with a special focus on multimodal processing. The integrated audio processing support includes modules for the analysis and synthesis of audio streams, for musical processing through the MIDI protocol, and for interoperability with other audio processing platforms, technologies and standards. Advanced real-time audio support for the mapping of multimodal input into multimedia output is described in the full paper.
Audio Processing in a Multimodal Framework

Bradter, Cornelius; Hobohm, Klaus
Based on the H. K. Thiele A-V technology archive, a multimedia approach for the conversion of the material into electronic documents was developed. This online archive uses the Greenstone digital library system (GDL) because it provides a wide variety of input formats and full search functions through extensive use of standard or custom-built metadata. The archive system is complemented by an interactive encyclopaedia and specialized relational database functions. The system forms an open platform for the management of information concerning historical A-V media technology.
The Thiele-Krause Archive for Audio-Visual Media Technology: An Online Historical Platform

Ahnert, Wolfgang; Bansal, Mahesh; Feistel, Stefan
Geometrical approaches such as the ray tracing and image source methods are not sufficient in small and complex-shaped rooms to take proper account of the wave nature of the sound field or to obtain high-quality room acoustic characteristics in the low frequency bands. We present an overview of using the Finite Element Method (FEM) to perform modal analysis in closed environments and to show diffraction effects in EASE, the acoustic simulation software, which is not possible with existing particle models, especially below the Schroeder frequency. For this purpose we consider the solution of the general quadratic eigenvalue problem arising from finite element analysis in enclosures with complex shapes and general impedance boundary conditions. We are mainly concerned with showing the practical feasibility of FEM in room acoustics and with combining it with the particle model in order to obtain the broadband response of the room. Fundamental points regarding the Finite Element Method, iterative methods and required mesh quality are also discussed.
First Approach to Combine Particle Model Algorithms with Modal Analysis using FEM

Goggans, Paul M.; Kleiner, Mendel; Xiang, Ning
In recent years, the acoustics of coupled spaces has received considerable attention in architectural acoustics. One of the tasks in analyzing the acoustics of coupled spaces is to evaluate different decay times from double-slope decay characteristics of Schroeder decay functions using measured room impulse responses. Traditionally, however, identification of double- or multiple-sloped decay in room impulse response measurements has been considered very challenging. We apply Bayesian probabilistic analysis methods to cope with the demanding task of estimating multiple decay times from Schroeder decay functions. Using experimentally measured data from scaled-down models and real coupled spaces, this paper discusses determination of the number of decay slopes and estimation of decay times within a Bayesian framework. Implemented routines of the Bayesian approaches will be demonstrated.
Bayesian Probabilistic Analysis of Sound Energy Decay Characteristics in Acoustically Coupled Rooms
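
The Schroeder decay function referred to in this abstract is conventionally obtained by backward integration of the squared impulse response. The Python sketch below shows only that preprocessing step on a synthetic double-slope response; it does not attempt the Bayesian estimation of the paper. The function name `schroeder_decay_db`, the sample rate and the two decay times are assumptions chosen for illustration.

```python
import math


def schroeder_decay_db(impulse_response):
    """Backward-integrate the squared response; return dB relative to the total energy.

    Points where the remaining energy is exactly zero are dropped to avoid log(0).
    """
    energy = [x * x for x in impulse_response]
    remaining = 0.0
    curve = []
    for e in reversed(energy):
        remaining += e          # energy from this sample to the end of the response
        curve.append(remaining)
    curve.reverse()
    total = curve[0]
    return [10.0 * math.log10(c / total) for c in curve if c > 0.0]


# Synthetic, noise-free double-slope envelope (assumed values, not measured data):
# a strong fast decay (T60 = 0.2 s) plus a weak slow decay (T60 = 1.0 s).
fs = 1000  # samples per second
t60_fast, t60_slow = 0.2, 1.0
h = [math.exp(-6.91 * (n / fs) / t60_fast) + 0.1 * math.exp(-6.91 * (n / fs) / t60_slow)
     for n in range(2 * fs)]

decay = schroeder_decay_db(h)
early_drop = decay[0] - decay[100]      # dB lost in the first 0.1 s
late_drop = decay[1000] - decay[1100]   # dB lost between 1.0 s and 1.1 s
```

The early part of the curve falls much faster than the late part, which is the double-slope shape whose decay times the abstract sets out to estimate.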

Huang, Patty; Karjalainen, Matti; Smith, Julius O.
Digital waveguide networks (DWNs) are known as a methodology to simulate spatially distributed systems, such as reverberators (room simulation) and resonators of musical instruments. This paper is a study on the application of DWNs to simulate acoustic spaces for room rendering, including binaural auralization. The methods discussed combine the principles of digital waveguide meshes, image source models, reverberation algorithms, and HRTF-based rendering. Examples are given of synthesizing binaural room responses for simple room geometries, and the possibility of fitting the models to given real room responses is discussed. System performance is analyzed from the point of view of real-time virtual acoustics.
Digital Waveguide Networks for Room Response Modeling and Synthesis

Rubak, Per
Three different methods to evaluate artificial reverberation decay quality are investigated (monaural attributes). The first method is based on evaluation of the autocorrelation function F(t) in octave bands. The temporal diffusion, defined by Kuttruff as TD = F(0) / max{F(t) : t > 0}, is proposed as a simple engineering metric. The second method employs evaluation of the short-time energy in the impulse response filtered in octave bands. A sliding 30 ms window was employed to calculate the short-time energy in the filtered room impulse responses (exponential decay was compensated by an exponentially increasing factor). The last method was a modified version of the Kohlrausch spectral stationarity test called Reverb-prints.
Evaluation of Artificial Reverberation Decay Quality
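
The temporal diffusion metric quoted in this abstract, TD = F(0) / max{F(t) : t > 0}, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the autocorrelation is computed on a broadband toy response, without the octave-band filtering the abstract describes, and the function names are hypothetical.

```python
def autocorrelation(h):
    """Raw (unnormalized) autocorrelation of h for lags 0 .. len(h) - 1."""
    n = len(h)
    return [sum(h[i] * h[i + lag] for i in range(n - lag)) for lag in range(n)]


def temporal_diffusion(h):
    """TD = F(0) / max F(t > 0), following the definition quoted in the abstract."""
    acf = autocorrelation(h)
    side_peak = max(acf[1:])
    if side_peak <= 0.0:
        raise ValueError("no positive autocorrelation peak at non-zero lag")
    return acf[0] / side_peak


# Toy response: a direct sound followed by a single echo at half amplitude.
h = [1.0, 0.0, 0.0, 0.5]
td = temporal_diffusion(h)
```

For this toy response F(0) = 1.25 and the largest side peak is 0.5, so TD is 2.5; a response with a stronger discrete echo gives a lower value.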

Ahnert, Wolfgang; Orfali, Wasim
Compared with churches, little work has been done on evaluating the acoustical parameters of mosques. The present paper deals with acoustic computer simulations, in particular the comparison between measured room acoustical parameters and those calculated in a computer model. Models of mosques that no longer exist have then been created, and the restoration of their sound using an auralization tool is the main aim of this paper. Two mosques that no longer exist but are well known from the literature will be modeled and auralized to demonstrate the acoustic conditions inside them and how the two mosques sounded at that time.
Acoustical Simulation and Auralization in Mosques

Avis, Mark; Cox, Trevor; Xiao, Lejun
This paper presents some further results concerning active diffusers; in particular it addresses the issues of stability and the use of multiple active elements within a surface. Active diffusers have to precisely achieve a desired target impedance, which is more complicated than the targets used for active absorption or control. Consequently, active diffusers are more prone to instability. Furthermore, processes developed to reduce the computational burden for active absorbers are not directly applicable to active diffusers, because it is not possible to trade off the final error achieved against the computational burden of the adaptation. These findings are demonstrated through numerical simulation and measurement.
Active Diffusers: Stability Analysis and Multiple Active Elements

Goerne, Thomas; Schumann, Bernd
Rooms used for remote recordings or live stages are sometimes acoustically inappropriate for the purpose, e.g. when working in churches or industrial venues. Some simple and cost-effective methods for designing the reverberation response of a room are investigated. The aim of this work is to show simple and cost-effective solutions for acoustic optimization of highly reverberant rooms with mobile materials. Absorption coefficients of materials like blankets or tarpaulins are measured in several typical situations.
Acoustic Design with Textile Absorbers and Foils

Mapp, Peter
STIPa (STI for PA systems) is rapidly becoming a popular method of assessing speech intelligibility. It was conceived in order to overcome the problems associated with RaSTI when measuring the potential intelligibility performance of a sound system. The paper reports the results of extensive testing carried out on a variety of sound systems and controlled test environments using measurement equipment from four different manufacturers. It is concluded that, whilst STIPa should theoretically be able to predict STI accurately, in practice realising this goal may not be quite so straightforward. A number of error mechanisms were found and are reported together with estimations of the typical measurement accuracy and repeatability that can be expected when undertaking STI performance testing.
Is STIPa a Robust Measure of Speech Intelligibility Performance?

Rabenstein, Rudolf; Renk, Marcus; Spors, Sascha
Wave field synthesis (WFS) is an auralization technique which allows control of the wave field within the entire listening area. However, reflections in the listening room interfere with the auralized wave field and may impair the spatial reproduction. Active listening room compensation aims at reducing these impairments by using the WFS system. Current realizations of WFS systems are limited to reproduction in a plane only. This reduction in dimensionality leads to effects that limit the performance of active room compensation. This paper analyzes these limiting effects on a theoretical and practical basis.
Limiting Effects of Active Room Compensation using Wave Field Synthesis

Fuster, Laura; González, Alberto; Lopez, Jose Javier; Zuccarello, Pedro
Wave Field Synthesis is a 3D audio reproduction system, which allows synthesizing a realistic sound field in an extended area by using arrays of loudspeakers. However, the listening room introduces new echoes not present in the signal to be reproduced, reducing the spatial effect. This paper proposes a new correction approach based on obtaining direct solutions for the multichannel inverse filter bank in order to reduce the error in the listener area. Time and frequency domain algorithms will be proposed to calculate the bank of inverse filters. Different laboratory experiments have been carried out for validating the method.
Room Compensation using Multichannel Inverse Filters for Wave Field Synthesis Systems

Bleda, Sergio; Escolano, Jose; Lopez, Jose Javier; Pueo, Basilio
Wave Field Synthesis systems achieve realistic 3D sound rendering over a wide listening area. However, because the technique is new, there is a lack of tools for mixing and producing in the WFS authoring process. In this work, a set of software tools is presented. The proposed solution is based on a hybrid tool consisting of two modules: a standalone WFS rendering server and a source positioning client in the form of a VST plug-in. The VST plug-in communicates with the WFS server, allowing transparency and flexibility from the software editing tool.
Design and Implementation of a Compatible Wave Field Synthesis Authoring Tool

Bleda, Sergio; Escolano, Jose; Lopez, Jose; Pueo, Basilio
This paper presents a sub-band modification of the Wave-Field Synthesis 3-D audio algorithm for reducing the degradations in the reproduction above the spatial aliasing frequency. With the method presented, the spectrum above the spatial aliasing frequency for each source is reproduced using only one loudspeaker of the array. Several advantages are obtained: there is no comb-filtering effect, the dependence between the spatial aliasing frequency and the cross-over frequency of the transducers is avoided, and some computational cost is saved. The characteristics of the localization cues of the human auditory system at high frequencies help to validate this method psychoacoustically.
A Sub-band Approach to Wave-Field Synthesis Rendering

Ferekidis, Lampos; Kempe, Uwe
Even excitation of room modes is a prerequisite for a balanced low frequency transfer characteristic. The spatial distribution of low frequency sources in multichannel systems yields complex mode excitation patterns. The positioning and adjustment of the relative phase of the main speakers to a “single” LFE is addressed in order to achieve a good summation. Finally, large listening areas combined with a well-balanced transfer characteristic are expected in modern multichannel applications. This study presents solutions that use multiple low frequency cardioids to improve the room transfer characteristic. Of particular interest are possible consequences on the decay times of individual modes as well as the signal summation of LFE and main speakers. Further investigations relate to the number of low frequency cardioids and how they relate to the spatial variation of the transfer characteristic in the room.
Controlling the Mode Excitation of Rooms by using Multiple Low Frequency Cardioids in Multichannel Systems

Rafaely, Boaz
A second-order soundfield microphone is desirable for spatial audio recording and reproduction which uses zero, first and second order harmonics. Recent results in the design and analysis of spherical microphone arrays are employed in this paper to analyze the performance and requirements of a second-order soundfield microphone configured as a spherical microphone array. The effect of measurement noise, spatial aliasing, microphone positioning accuracy and microphone response mismatch on overall performance is evaluated both analytically and using simulation. Specifications for a high quality second-order microphone are presented, and directions for improving performance are proposed.
Design of a Second-Order Soundfield Microphone

Hamasaki, Kimio; Hiyama, Koichiro; Okumura, Reiko
The 22.2 multichannel sound system was developed for an ultrahigh-definition video system with 4000 scanning lines. It consists of three layers of loudspeakers: the upper layer with 9 channels, the middle layer with 10 channels, and the lower layer with 3 channels and 2 channels for LFEs. It can reproduce a higher sensation of presence over a wider listening area than the conventional multichannel audio system. This system will first be introduced to the public at World Exposition 2005 in Japan. This paper describes the newest 22.2 multichannel system in the Super Hi-Vision Theater of World Exposition, and will discuss advantages of 22.2 multichannel sound. It will also describe the sound recordings and productions by 22.2 multichannel sound.
The 22.2 Multichannel Sound System and Its Application

Driessen, Peter F.; Jeon, Chulwoong
Karaoke is one of the large audio and video industries in Asia. A popular vocal feature among karaoke users has been echo/reverb. If vibrato is added to the vocal signal in addition to echo/reverb, it can give karaoke singers more confidence and make them feel like professional singers. In this paper, we present a real-time ‘Vocal Vibrato Effecter’ running on a Windows PC which automatically adds a vibrato effect to the vocal input. The proposed technique exploits the vocal energy level and the temporal consistency of the pitch variation. The key novelty in this work is the combination of a pitch detector and a pitch shifter (vibrato). This effecter can be applied to consumer and commercial karaoke systems to enhance a vocal signal.
The Intelligent Artificial Vocal Vibrato Effecter using Pitch Detection and Delay-Line
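The delay-line part of such an effect can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `vibrato`, its parameters and their default values are assumptions made for illustration, and the pitch detection and energy-based control described in the abstract are left out. The sketch only shows how reading a delay line at a sinusoidally varying delay, with linear interpolation between samples, modulates the pitch of the input.

```python
import math

def vibrato(samples, sample_rate, rate_hz=5.5, depth_ms=2.0, base_delay_ms=5.0):
    """Apply vibrato by reading a delay line at a sinusoidally varying delay.

    All parameter names and defaults are illustrative assumptions.
    """
    if depth_ms > base_delay_ms:
        raise ValueError("depth_ms must not exceed base_delay_ms")
    base = base_delay_ms * 1e-3 * sample_rate   # mean delay in samples
    depth = depth_ms * 1e-3 * sample_rate       # modulation depth in samples
    out = []
    for n in range(len(samples)):
        # Instantaneous delay oscillates around the base delay.
        delay = base + depth * math.sin(2.0 * math.pi * rate_hz * n / sample_rate)
        pos = n - delay                          # fractional read position
        i = math.floor(pos)
        frac = pos - i
        # Samples outside the input are treated as silence.
        a = samples[i] if 0 <= i < len(samples) else 0.0
        b = samples[i + 1] if 0 <= i + 1 < len(samples) else 0.0
        # Linear interpolation between the two neighbouring samples.
        out.append((1.0 - frac) * a + frac * b)
    return out
```

With a modulation depth of zero the function reduces to a plain delay, which is a convenient sanity check before enabling the modulation.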

Davies, Matthew E. P.; Brossier, Paul M.; Plumbley, Mark D.
In this paper we address the issue of real-time rhythmic analysis, primarily towards predicting the locations of musical beats such that they are consistent with a live audio input. This will be a key component required for a system capable of automatic accompaniment with a live musician. We implement our approach as part of a real-time audio library. Due to the removal of "future" audio information for this causal system, performance is reduced in comparison to our previous non-causal system, although still acceptable for our intended purpose.
Beat Tracking Towards Automatic Musical Accompaniment

Herrera, Perfecto; Streich, Sebastian
Detrended fluctuation analysis (DFA) was proposed by Peng et al. and was first applied in biomedical analysis. Jennings et al. introduced the method to the field of music analysis by using the DFA exponent for musical genre classification. In this paper we further exploit the relation of this low-level feature to semantic music descriptions. The feature has been computed on a large-scale collection of 7750 tracks for which manually annotated semantic labels like "energetic" or "melancholic" were available. Associations with high statistical significance could be found between some of these labels and the DFA exponent. The findings sustain the hypothesis that this feature can be linked to a musical attribute which might be described as "danceability".
Detrended Fluctuation Analysis of Music Signals: Danceability Estimation and Further Semantic Characterization
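For readers unfamiliar with the DFA exponent, the following Python sketch shows the textbook first-order procedure: integrate the mean-removed signal, remove a linear trend in windows of several sizes, and fit the slope of log fluctuation against log window size. It is a generic illustration, not the variant used in the paper; the function names and the default window sizes are assumptions.

```python
import math
import random

def _linear_residual_ms(y):
    """Mean squared residual of y after removing its least-squares line."""
    n = len(y)
    x_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    slope = sxy / sxx
    return sum((v - y_mean - slope * (i - x_mean)) ** 2
               for i, v in enumerate(y)) / n

def dfa_exponent(signal, scales=(16, 32, 64, 128, 256)):
    """Estimate the DFA scaling exponent (window sizes are an assumption)."""
    mean = sum(signal) / len(signal)
    # Step 1: integrate the mean-removed signal to obtain the profile.
    profile, total = [], 0.0
    for v in signal:
        total += v - mean
        profile.append(total)
    log_s, log_f = [], []
    for s in scales:
        n_win = len(profile) // s
        if n_win < 1:
            continue
        # Step 2: detrend each non-overlapping window and average the residuals.
        ms = [_linear_residual_ms(profile[k * s:(k + 1) * s]) for k in range(n_win)]
        f = math.sqrt(sum(ms) / n_win)
        if f > 0.0:
            log_s.append(math.log(s))
            log_f.append(math.log(f))
    if len(log_s) < 2:
        raise ValueError("need at least two usable window sizes")
    # Step 3: the exponent is the slope of log F(s) against log s.
    ms_mean = sum(log_s) / len(log_s)
    mf_mean = sum(log_f) / len(log_f)
    num = sum((a - ms_mean) * (b - mf_mean) for a, b in zip(log_s, log_f))
    den = sum((a - ms_mean) ** 2 for a in log_s)
    return num / den

if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    print("white noise exponent:", dfa_exponent(noise))
```

Uncorrelated noise is expected to give an exponent near 0.5 and a random walk an exponent near 1.5, which gives a rough way to check an implementation before applying it to audio features.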

Sandler, Mark; Wen, Xue
This article proposes a method for transcribing a piano note that is overlapped by previous notes. It is suggested that by referring to a short context before the note being transcribed, it is possible to improve the performance of a note transcriber by removing the contribution of previous notes. This removal can be performed either explicitly to produce a novelty signal, or implicitly inside the note transcriber, with the latter leaving a space for further optimization. Experiments show that the method dramatically improves the performance of a simple transcriber.
Transcribing Piano Recordings using Signal Novelty

Hazan, Amaury
We describe a tool for transcribing voice-generated percussive rhythms. The system consists of: (a) a segmentation component which separates the monophonic input stream into percussive events, (b) a descriptor generation component that computes a set of acoustic features from each of the extracted segments, (c) a machine learning component which assigns a symbolic class to each of the segmented sounds of the input stream, and (d) a beat generator that generates an output rhythmic stream. All these components are integrated into a VST plugin that allows its use for live applications.
BillaBoop: Real-Time Voice-Driven Drum Generator

Harte, Christopher; Sandler, Mark
This paper presents an approach to the problem of identifying musical chords from audio recordings. In our approach, a tuning algorithm is applied to a 36-bin chromagram to accurately locate the boundaries between semitones. This allows the calculation of a 12-bin semitone-quantised chromagram, which can then be compared with a set of predefined chord templates in order to generate a sequence of chord estimates. The performance of our method is evaluated by comparing the results with a test database of hand-labelled pieces, from which the initial results are encouraging. The paper concludes with a discussion of some possible improvements to the algorithms presented.
Automatic Chord Identification using a Quantised Chromagram
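The template-comparison step described in this abstract can be sketched in Python as follows. This is an illustrative assumption rather than the authors' method: the template set is limited to binary major and minor triads, the score is a plain dot product, the names `chord_templates` and `estimate_chord` are hypothetical, and the 36-bin tuning stage is omitted.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Binary 12-bin templates for major and minor triads (assumed set)."""
    templates = {}
    for root in range(12):
        for quality, intervals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
            template = [0.0] * 12
            for interval in intervals:
                template[(root + interval) % 12] = 1.0
            templates[NOTE_NAMES[root] + ":" + quality] = template
    return templates

def estimate_chord(chroma, templates=None):
    """Return the label of the template that best matches one chroma frame."""
    if len(chroma) != 12:
        raise ValueError("expected a 12-bin chroma vector")
    if templates is None:
        templates = chord_templates()
    if not any(chroma):
        return None  # silent frame: no estimate
    # All templates contain three ones, so the dot product ranks them fairly.
    def score(name):
        return sum(c * t for c, t in zip(chroma, templates[name]))
    return max(templates, key=score)

def estimate_sequence(frames):
    """Estimate one chord label per chroma frame."""
    templates = chord_templates()
    return [estimate_chord(frame, templates) for frame in frames]

if __name__ == "__main__":
    c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    print(estimate_chord(c_major))
```

A practical system would usually smooth the per-frame estimates over time, since single frames are easily confused by passing notes and transients.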

Chetry, Nicolas; Davies, Mike; Sandler, Mark
In this paper, we address the problem of automatically recognising and identifying an instrument from a set of solo recordings. A system is described that uses the LSF as features, whose statistical properties are learnt using the k-means algorithm. During the training phase, models are built by determining an optimised codebook of LSF vectors for each class of instruments. During the identification phase, one codebook is similarly extracted from the unknown audio sample. A distortion measure between two codebooks is then used to retrieve the identity of the presented excerpt. System performance is evaluated using a database of 11 instruments.
Musical Instrument Identification using LSF and K-means

Janer, Jordi
This paper explores the singing voice from an unusual perspective, not as a musical instrument but as a musical controller. A set of spectral processing algorithms extracts features from the input voice. These features are categorized in four groups: excitation, vocal tract, voice quality and context. The extracted values are then transmitted as Open Sound Control (OSC) messages to be used in an external synthesis engine. In this document, we first provide a technical description of the algorithms, and in a second part we detail the components of the system. A practical example of voice-driven synthesis using PureData (Pd) is also presented.
Feature Extraction for Voice-driven Synthesis

Daudet, Laurent; David, Bertrand; Essid, Slim; Leveau, Pierre; Richard, Gael
This paper addresses the usefulness of the segmentation of musical sounds into transient/non-transient parts for the task of machine recognition of musical instruments. We highlight the discriminative power of the attack-transient segments on the basis of objective criteria, consistent with well-known psychoacoustic findings. Moreover, we show that, paradoxically, it is not always optimal to consider such a segmentation of the audio in a machine recognition system given decision length constraints. Our evaluation exploits efficient automatic segmentation techniques, a wide variety of signal processing features as well as feature selection algorithms and Support Vector Machine classification. The sound database used is composed of real-world mono-instrument phrases.
On the Usefulness of Differentiated Transient/Steady-state Processing in Machine Recognition of Musical Instruments

Crockett, Brett; Smithers, Michael
A method for characterizing and identifying audio material using reduced-information audio characterizations based on auditory scene analysis is presented. In the method described, a single or multi-channel audio signal is analyzed and the location and duration of the individual auditory events are identified. The auditory events are used to create a reduced-information audio signature that can be used to determine whether one audio signal is derived from another audio signal. The audio signal comparison removes or minimizes the effect of temporal shift or delay on the audio signals, calculates a measure of similarity, and compares the measure of similarity against a threshold, providing a fast, highly accurate and automatic method of signal identification.
A Method for Characterizing and Identifying Audio Based on Auditory Scene Analysis

Mendoza-Lopez, Jorge; Busbridge, Simon C.; Fryer, Peter A.
A digital transducer array (DTA) loudspeaker which produces a quantised sound field directly from binary-weighted bit streams has been developed. Theoretical investigation shows that it is possible to generate an acoustic wavefront from an assemblage of air quanta provided that the basis functions are orthogonal. Other acoustic effects e.g. filtering are presented. Moving coil transducer frequency response measurements highlight its significance to the behaviour of the array by the use of current driving. Results obtained with newly developed stiff diamond diaphragms demonstrate the importance of the transient response to the conversion process. Further results and simulations are presented giving the DTA sound field distortion as functions of the bit stream processing algorithm as well as the array geometry.
Direct Acoustic Digital to Analogue Conversion with Digital Transducer Array Loudspeakers

Beer, Daniel; Brix, Sandra; de Vries, Diemer; Kuster, Martin
Multi-Actuator Panels (MAPs) are a possible solution to satisfy the requirement of a large number of loudspeaker channels inherent in wave field synthesis. The structural acoustic behaviour of MAPs has been measured with a Laser Doppler Vibrometer and acoustic radiation simulation has been performed using a discretized Rayleigh I integral. The analysis shows that, due to the large structural damping, the acoustic radiation is almost entirely produced by the near-field around the excitation point on the panel. It is concluded that the model of an infinite panel is more appropriate than that of the widely known Distributed Mode Loudspeaker.
Structural and Acoustic Analysis of Multi-Actuator Panels

Kärkkäinen, Leo; Mellow, Tim
The sound radiation characteristics of a fluid-loaded membrane are calculated analytically and using a finite element model. It is shown that good correlation between the two calculation methods can be achieved, providing the elements are small enough. The model can be applied to electrostatic or other membrane type loudspeakers with a very large back cavity. Although this is seldom possible in practice, this model provides an indication of what the theoretical limit of the bass performance actually is as well as an analytical benchmark for the finite element modeling of fluid-structure coupled problems.
On the Sound Field of a Membrane in an Infinite Baffle

Chaigne, Antoine; Quaegebeur, Nicolas
Current research in electroacoustics tends to determine the global transfer function between an initial electrical signal and the acoustic pressure transmitted to the ear. Because electrodynamic transducers radiate in a large frequency range, lumped parameter models, such as Thiele & Small, are not sufficient to provide a realistic simulation of the vibroacoustical behavior of the system. This study proposes the use of modal equations to predict, through the use of Rayleigh's integral, the sound field in free space radiated by a diaphragm, modeled by a thin shallow spherical shell.
Influence of Material and Shape on Sound Reproduction by an Electrodynamic Loudspeaker

Carlisi, Marco; Di Cola, Mario; Manzini, Andrea
Magneto-dynamic loudspeakers are affected by a wide variety of problems due to the voice coil inductance. This inductance is not a constant parameter but is dependent on the frequency, the displacement and the actual current flowing in the coil. Moreover, these last two dependencies are also non-linear. Several causes of distortion, affecting mainly the vocal range, are derived from these phenomena. A practical solution to minimize the inductance is investigated. This solution is based on an additional fixed coil positioned in the gap and provided with 2 additional terminals offering various connection possibilities. Advantages of this approach will be discussed and measurement results will be shown as well.
An Alternative Approach to Minimize Inductance and Related Distortions in Loudspeakers

Klippel, Wolfgang
Loudspeakers dedicated to high-frequency signals may also produce significant distortion in the acoustical output. The dynamical measurement of the nonlinear parameters reveals the physical causes directly. The stiffness Kms(x) of the mechanical suspension is mostly the dominant nonlinearity, followed by the force factor characteristic Bl(x). The paper presents a new measurement technique developed for those transducers and discusses the relationship with large signal performance (distortion, compression) and the impact on the listening impression.
Large Signal Performance of Tweeters, Micro Speakers and Horn Drivers

Thiele, Neville
Design procedures and worked examples are described for asymmetrical linear phase, or "constant voltage", crossover systems whose summed outputs maintain a constant amplitude and phase response across the whole audio spectrum without making impractical demands on the power handling capability or precision of their components. Alternative procedures are described using either passive or active electrical or acoustic filters that interact with the driver and its parameters.
Linear-Phase Analogue Crossover Systems Revisited

Bai, Mingsian; Lee, Chih-chung
Cone velocity of loudspeakers has long been recognized as an important parameter for loudspeaker compensation. In the paper, a cone velocity observer that requires no sensor is developed on the basis of state-space estimation. Linear quadratic Gaussian (LQG) theory in conjunction with multirate processing is employed in the design of the observer. The experimental results show close agreement between the velocities obtained by using the proposed technique and the measurement by a laser vibrometer. In addition, the system allows for self-identification and automated filter synthesis. The compensation filters are designed using the quantitative feedback technique (QFT) and implemented on a digital signal processor (DSP). The system is applied to two audio problems in loudspeaker compensation: bass enhancement and room response equalization. Experimental results are discussed.
DSP-based Sensorless Velocity Observer with Audio Applications in Loudspeaker Compensation

Bleda, Sergio; Escolano, Jose; Lopez, Jose Javier; Pueo, Basilio
In virtual room acoustics, the goal is to create the illusion of natural and realistic three-dimensional sound scenes. Accurate numerical methods in the discrete-time domain provide an explicit and practical way to synthesize impulse responses of virtual rooms, especially in low frequency applications. In addition, advanced multichannel sound systems such as Wave Field Synthesis (WFS) make it possible to recreate spatially wider sound scenes from sources convolved with extrapolated impulse responses. In this paper, the use of impulse responses synthesized by means of the finite-difference time-domain method for auralization with a WFS reproduction system is proposed and discussed. This method provides a complete solution of the sound field variables, which are necessary to obtain a proper set of impulse responses. Some examples are shown and evaluated to illustrate its applicability for WFS requirements.
An Approach to Discrete-Time Modelling Auralization for Wave Field Synthesis Applications

Kamekawa, Toru
An optimum microphone array for multi-channel stereophonic recording is presented, based on a comparison of impression differences. In the experiments, the array using omni-directional microphones for the left and right channels and a bi-directional microphone for the center channel was rated better for localization and spaciousness than the array using uni-directional microphones for the left, right and center channels, especially at off-center positions. An optimum position for the surround microphones is derived by calculation from the room size, considering the time difference between the direct sound of the front channels and the early reflections of the surround channels.
Impression Differences by Placement of Front and Rear Microphones for Multi-channel Stereophonic Recording

Martin, Geoff
Contrary to widely-held belief, there are instances where multichannel sound reproduction results in a reduced sweet spot size when compared to stereo. This problem occurs due to interference in the listening room when signals in two or more channels are coherent. In contradistinction, a high degree of interchannel coherence is required to ensure localisation of sources and early reflections between (rather than in) the loudspeakers. This paper describes a new multichannel microphone configuration that provides high interchannel coherence for direct sounds, and low coherence for reverberation. Thus it considers not only the usual sound stage characteristics such as imaging and spaciousness but also unwanted artifacts caused by interchannel interference in the listening environment.
A New Microphone Technique For Five-Channel Recording

Bleda, Sergio; Escolano, Jose; Lopez, Jose Javier; Pueo, Basilio
Wave Field Synthesis (WFS) is a technique that enables true spatial reproduction over a large audience area with no dependence on the listener position. During the last decade, some major advances have been achieved in extending its use to everyday applications. WFS can be applied in high-power applications, such as sound reinforcement events, in which true spatial audio would be needed. At present, the dynamic midrange loudspeakers that form the arrays needed to recreate the secondary sound sources cannot deliver such power because of their size. In this paper, arrays of large drivers are proposed to achieve high pressure fields that fulfill the requirements of high-power applications. Size and distance between loudspeakers are chosen not to elevate the typical aliasing frequencies and directivity of the standard arrays. To prove the benefits of these arrays, some simulations of the reproduced sound field are presented, together with acoustic measurements of an array of loudspeakers.
An Approach for Wave Field Synthesis High Power Applications

Kassier, Rafael; Lee, Hyun-Kook; Brookes, Tim; Rumsey, Francis
There is currently a lack of recorded test materials in five-channel surround format. Particularly lacking are recordings made simultaneously using different microphone arrays that would allow comparative switching between different recorded versions of the same acoustical event. An ambitious pilot experiment was conducted involving the recording of various different programme items using eight different recording techniques simultaneously. This was undertaken to determine the practicality of making such recordings, to allow informal comparisons between microphone techniques, and to create a set of simultaneous multichannel recordings for subsequent perceptual evaluation. This paper details experimental design considerations and practical limitations, as well as reporting initial observations regarding the resulting recordings.
An Informal Comparison Between Surround-Sound Microphone Techniques

Braasch, Jonas
In auditory virtual environments it is often required to position an anechoic point source in three-dimensional space. When sources in such applications are to be displayed using multichannel loudspeaker reproduction systems, the processing is typically based upon simple amplitude-panning laws. This paper describes an alternative approach based on an array of virtual microphones. In the newly designed environment, the microphones, with adjustable directivity patterns and axis orientations, can be spatially placed as desired. The system architecture was designed to comply with the expectations of audio engineers and to create sound imagery similar to those associated with standard sound recording practice.
A Loudspeaker-based 3D Sound Projection using Virtual Microphone Control (ViMiC)

Ahonen, Jukka; Kelloniemi, Antti; Paajanen, Olli; Pulkki, Ville
Since the direction of sound is not perceived at very low frequencies, it is feasible to use only one subwoofer for low frequency reproduction in commercial multi-channel audio setups. A listening test was conducted to find the crossover frequency where the listeners begin to detect the subwoofer presence. The test was arranged in a symmetrical listening room using four pairs of Genelec 1037C speakers, arranged symmetrically at four angles to the front of the listener to equalize the timbres as well as possible in reverberant conditions. The detection judgement was made using a version of the two-alternative forced choice (TAFC) adaptive method, with which the 75 % point of the psychometric function was found.
Detection of Subwoofer Depending on Crossover Frequency and Spatial Angle between Subwoofer and Main Speaker

Kanda, Yoshihiro; Katayama, Kenji; Kiriu, Shinya; Muraoka, Teruo
Generalized harmonic analysis (GHA) is an inharmonic frequency analysis proposed by N. Wiener and characterized by its excellent frequency resolution. However, because it requires a huge amount of calculation, it could not be examined in practice until quite powerful computers became available. Recently, an elegant algorithm that reduces the amount of calculation was announced by Dr. Hirata, and the authors have improved the algorithm to make it suitable for practical applications. Consequently, both processing speed and accuracy have reached a practical stage. In order to utilize GHA’s advantages of precise frequency detection and separation, the authors have been trying to apply GHA to signal processing in audio engineering. Scratch noise reduction of damaged SP records will be reported as an example.
Generalized Harmonic Analysis and Its Application to Intensive Noise Reduction

de Diego, Maria; Ferrer, Miguel; González, Alberto; Piñero, Gema
It is well-known that the affine projection (AP) algorithm shows a good trade-off between computational effort and convergence speed. Nevertheless, the computational complexity of the AP algorithm increases with the projection order and low-computational versions, fast AP (FAP) algorithms, have been previously discussed. In this paper, a new FAP algorithm focused on the fast error vector calculation is proposed for active noise control. The new version uses the classical filtered-x structure instead of the commonly applied modified filtered-x. A comparative practical study of the different strategies has been also carried out and validates the use of the proposed method in ANC systems as an alternative to the previous versions since it provides lower computational cost.
Efficient Error Vector Calculation in Affine Projection Algorithms for Active Noise Control

Alexey, Budkin; Goldin, Alexander
The paper presents the challenges of performing effective acoustic echo cancellation in time-delay sensitive applications where the quality of the acoustic components used is low, the acoustic design may be poor and the cost of the digital components used must be kept as low as possible. The situation is typical for mass-market applications such as mobile and regular phones, office speakerphones and low cost conferencing systems. Using low cost components in compact enclosures introduces a large amount of non-linear distortion into the loudspeaker signal, causing poor performance of classical acoustic echo cancellation algorithms. Unfortunately, more sophisticated algorithms require more processing power and larger processing delays. The paper discusses possible trade-offs as well as solutions implemented in Alango Voice Communication Package.
Challenges of Acoustic Echo Cancellation in Low Cost Applications

Mertins, Alfred; Strahl, Stefan; Zhou, Huan
To address the fine-grain scalable audio compression issue, a novel combined significance tree technique is proposed for high compression efficiency. The core idea is to dynamically adopt a set of locally optimal significance trees, instead of following the common approach of using a single type of tree. Two different encoding strategies are proposed: the spectral coefficients can be encoded either in a threshold-by-threshold manner or in a segment-by-segment manner. The former yields rate and fidelity scalability, and the latter additionally yields bandwidth scalability. Experimental results show that our proposed scheme significantly outperforms the existing schemes using single-type trees and performs comparably with the MPEG AAC coder while achieving fine-grain scalability.
An Efficient, Fine-Grain Scalable Audio Compression Scheme

Moerman, Jean Paul
Program loudness is still one of the topics of discussion between broadcasters. This paper is not one more attempt to explain the causes. Instead, it is a practical guide to how the problem for the viewer has been solved at VRT, Belgium’s National Broadcaster. Program makers have to be convinced of the importance of sound in getting their message to the viewer. The technical management has to realize the necessity of processing units all over the production chain. Settings have to be carefully integrated throughout every department. How the settings are achieved and how the management is convinced are among the many topics described in this practical paper.
Program Loudness: Nuts & Bolts

Robinson, Charlie; Vinton, Mark
Inter-program audio level discrepancies continue to plague the content creation and broadcast industries. In particular, end users often are compelled to make adjustments to audio playback levels. It has previously been established that leveling programs based on dialogue loudness can improve listener satisfaction; however, it is difficult to measure as it requires a human to continuously monitor an audio stream and measure only the loudness during speech portions of the content. This paper presents an automated speech discrimination system that provides a means to detect portions of audio content that contain primarily speech. The speech/other discrimination system takes advantage of well known speech characteristics to achieve a total error of 3% despite delay and computational limitations.
Automated Speech/Other Discrimination for Loudness Monitoring

Ramirez, Miguel A.; Rodriguez, Sergio G.
The HRTF characterizes the scattering of sound waves on the human body, especially on pinnae, head and torso; hence, it presents high variability between individuals. Given the success of recent geometrical models for the head-torso contribution to the HRTF, this work proposes a method for modeling the pinna contribution. The model takes specific pinna dimensions as input data and is able to estimate the pinna-related transfer function (PRTF) with a mean accuracy of 65% with respect to the inter-subject PRTF variance. Since each spatial position must be modeled individually, here we present a detailed example of the method used for modeling one of them. We use HRTF and anthropometric data from the CIPIC Database.
HRTF Individualization by Solving the Least Squares Problem

Brookes, Tim; Treble, Chris
It is known that headphone playback, even of binaurally-recorded material, often gives rise to in-the-head locatedness of reproduced sound sources. Head-tracking systems, artificial reverberation, and decorrelation of the left/right signals, have all been investigated previously as possible means by which the incidence of in-the-head locatedness may be reduced. It is proposed that the left/right symmetry of dummy-head pinnae, and of the head-related transfer functions used in binaural convolution, may exacerbate the in-the-head problem, and it is shown experimentally that, by recording using asymmetrical pinnae, perceived externalisation can be increased significantly.
The Effect of Non-Symmetrical Left/Right Recording Pinnae on the Perceived Externalisation of Binaural Recordings.

Jiao, Yu; Zhang, Zhiping; Qu, Tianshu; Wu, Xihong
A novel objective method to evaluate the spatial quality of reproduced sound is presented. A binaural model of spatial hearing is used to measure many aspects of the spatial attributes of real and virtual sound sources. This method can produce a measurement of the whole reproduced sound field rather than only the quality at the sweet spot. Virtual sources synthesized with different reproduction systems are analyzed using this method. The results show good agreement with subjective evaluation of the corresponding situations.
A Novel Objective Localization Quality Evaluation Method for Reproduced Sound

Dewhirst, Martin; Jackson, Philip; Rumsey, Francis; Zielinski, Slawomir K.
A mathematical model for objective assessment of perceived spatial quality was developed for comparison across the listening area of various sound reproduction systems: mono, two-channel stereo (TCS), 3/2 stereo (5.0 surround sound), Wave Field Synthesis (WFS) and Higher Order Ambisonics (HOA). Models for mono, TCS and 3/2 stereo are based on conventional microphone techniques and loudspeaker configurations for each system. WFS and HOA models use circular arrays of thirty-two loudspeakers, driven by signals derived from a virtual microphone array and the Fourier-Bessel spatial decomposition of the sound field respectively. Directional localisation, ensemble width and ensemble envelopment of tones, extracted from binaural signals, are analysed under a range of test conditions.
Objective Assessment of Spatial Localisation Attributes of Surround-Sound Reproduction Systems

Higuchi, Hiroshi; Hokari, Haruhide; Kudo, Akihiro; Shimada, Shoji
When implementing virtual sound systems with headphones, it is well known that using transfer functions other than those of the listener yields front-back confusion in sound image localization. Some studies have concluded that moving the sound image clarifies the perceived location of the sound image. We focus on moving sound images to achieve highly accurate localization, and propose a swing sound image method that makes the sound image swing between two locations on the horizontal plane. Listening tests verify that the proposed method greatly reduces the front-back confusion.
An Improved Method for Accurate Sound Localization

Bradter, Cornelius; Hobohm, Klaus
We asked 25 test persons to locate real and virtual sound sources within a 360 degree environment. During the tasks, head movements were recorded by a head tracker with a time resolution of 20 ms. We categorized the success of locating the sound sources and related the outcome to criteria deduced from the head movement data. Contrary to the assumption that stronger head movements support localisation ability, we could not establish a simple relationship between head movements and good localisation.
Head Movements: An Approach to Their Significance for Localisation Tasks.

Armelloni, Enrico; Farina, Angelo; Martignon, Paolo
Audio reproduction of a movie in a non-dedicated room is critical; setting up an ITU 5.1 system at home, for example, requires placing a large number of speakers around the room, but speaker positions are often constrained by the furniture. Bad alignment reduces the spatial performance of the system dramatically. To circumvent these problems, most stereo TV sets are nowadays equipped with some form of “virtual surround” reproduction, essentially employing the Stereo Dipole method. This provides a very good frontal sound stage, but performs poorly at emulating virtual surround loudspeakers. An alternative reproduction technique is PanAmbio 4.1, based on a double stereo dipole system (frontal and rear). In this work the authors propose a comparison between the standard ITU 5.1 surround system and the new one. Validation is performed by subjective tests inside a domestic room.
Comparison between Different Surround Reproduction Systems: ITU 5.1 vs PanAmbio 4.1

Fejzo, Zoran; Kramer, Lorr; McDowell, Keith; Yee, Delbert
DTS-HD is a multi-channel audio codec comprised of a backward-compatible DTS core, DTS ES and DTS 96/24, plus new extensions for additional channels, higher constant bit rates (>1.5 Mbps) and a lossless mode of operation. In addition, DTS-HD supports non-backward-compatible modes of operation: LBR (low bit-rate coding for streaming applications); lossless without the core, for music archiving or cinema applications; and LBR + lossless, for scalable server-based storage and simulcast transmissions. In this paper we present a technical overview of DTS-HD in a lossless mode of operation. We outline the organization and main features of the stream. In addition, we describe the codec architecture and discuss its performance.
DTS-HD: Technical Overview of Lossless Mode of Operation

Curley, John; Daniels, Michelle; Garcia, Ricardo; Glover, Mike; Short, Kevin
An overview of the high-quality, scalable, low-bitrate KOZ audio codec technology is presented. This new compression method grew out of developments in the control of chaotic systems that allow for the creation of broad spectral components with only a few bits of information. These elements are combined with a high resolution analysis of the audio signal that allows for the signal to be decomposed into peak-like or tonal objects, noise-like objects, transients, and modulations. Psychoacoustic models have been adapted to prioritize and quantize these objects, and the reconstructed signal is built up in layers from the prioritized objects. Metadata and built-in DRM are present in the digital filestream. The decoder is a very low-complexity algorithm that is implemented on a wide variety of portable devices such as cell phones in a software-only solution running on integer processors without DSP support. A demonstration of the quality, scalability and other features of the KOZ format will be given.
An Introduction to the KOZ Scalable Audio Compression Technology

Breebaart, Jeroen; Disch, Sascha; Faller, Christof; Herre, Jürgen; Hilpert, Johannes; Kjörling, Kristofer; Myburg, Francois; Purnhagen, Heiko; Schuijers, Erik
Recently, technologies for parametric coding of multi-channel audio signals have received wide attention under the name of “Spatial Audio Coding.” In contrast to a fully discrete representation of multi-channel sound, these techniques allow for a backward-compatible transmission at bitrates only slightly higher than common rates currently used for mono / stereo sound. Motivated by these prospects, the MPEG Audio standardization group started a new work item on Spatial Audio Coding. The paper reports on the reference model zero architecture, as it emerges from the MPEG Call for Proposals (CfP) and the subsequent evaluation of the submissions. The architecture combines the strong features of the two submissions to the CfP that were found best in this process.
The Reference Model Architecture for MPEG Spatial Audio Coding

Maillard, Rainer; Schick, Bastian; Spenger, Claus-Christian
MP3Surround belongs to a new generation of coding technologies based on the “binaural cue coding” technique, which, unlike the discrete transmission of 5.1 material, transmits only a stereo downmix of the multichannel signal together with compact side information. From these signals the decoder generates a multichannel signal with a spatial image similar to that of the original input signal. The stereo downmix is created from the multichannel signal using a dynamic downmixing algorithm. MP3Surround also makes it possible to use a manually created downmix as the sum signal for the decoder, together with the original spatial cues. This paper discusses the effects of differences between the manual downmix and the automatic downmix on the decoded surround version.
First Investigations on the Use of Manually and Automatically Generated Stereo Downmixes for Spatial Audio Coding

Liebchen, Tilman; Reznik, Yuriy A.
MPEG-4 Audio Lossless Coding (ALS) is a new extension of the MPEG-4 audio coding family. The MPEG-4 ALS codec is based on forward-adaptive linear prediction, which enables remarkable compression even with low predictor orders. However, still better compression can be achieved when high orders are considered as well. This requires efficient quantization of the predictor coefficients together with adaptive block length switching. The paper describes the basic elements of the ALS codec with a focus on high-order prediction. It also presents the latest developments during the standardization process and points out the most important applications of this new lossless audio format.
Improved Forward-Adaptive Prediction for MPEG-4 Audio Lossless Coding
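
As a rough illustration of the forward-adaptive idea mentioned in the abstract above, the sketch below lets an encoder pick, per block, the predictor that gives the smallest residual and signal that choice to the decoder, which then reconstructs the samples exactly. It is not the ALS algorithm: it assumes a small table of fixed integer predictors (a simplification in the style of earlier lossless coders) instead of quantized per-block coefficients, and all names (`PREDICTORS`, `encode_block`, `decode_block`) are hypothetical.

```python
# Hypothetical table of fixed integer predictors, indexed by order.
PREDICTORS = {
    0: (),        # no prediction: residual equals the sample
    1: (1,),      # predict x[n-1]
    2: (2, -1),   # predict 2*x[n-1] - x[n-2]
}

def predict(history, coeffs):
    """Predict the next sample; missing past samples count as zero."""
    return sum(c * history[-1 - k] for k, c in enumerate(coeffs) if k < len(history))

def encode_block(samples):
    """Return (order, residuals) for the predictor with the smallest total |residual|."""
    best = None
    for order, coeffs in PREDICTORS.items():
        residuals = [x - predict(samples[:n], coeffs) for n, x in enumerate(samples)]
        cost = sum(abs(r) for r in residuals)
        if best is None or cost < best[0]:
            best = (cost, order, residuals)
    return best[1], best[2]

def decode_block(order, residuals):
    """Rebuild the block exactly from the signalled order and the residuals."""
    coeffs = PREDICTORS[order]
    samples = []
    for r in residuals:
        samples.append(r + predict(samples, coeffs))
    return samples

block = [10, 12, 14, 16, 18, 20]           # made-up integer PCM samples
order, residuals = encode_block(block)
restored = decode_block(order, residuals)
```

Because the predictor works on integers and the decoder repeats the same prediction, the reconstruction is bit-exact; the compression gain would come from entropy coding the small residuals, which this sketch leaves out.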

Angus, James
Work is presented that shows that the behaviour of idle tones in Sigma-Delta modulators depends on whether the noise transfer function has zeros at dc or not. Simulation results for a variety of transfer functions of the same order but different shapes are presented; in particular, shapes with no zeros at dc, equiripple zeros, and all zeros at dc are considered. It is shown that, without dither, the effect of different filter transfer function shapes on idle tones is minimal for a given order. However, when dither is applied, noise transfer functions with all the zeros at dc (Butterworth shapes) are better, because lower levels of dither are required to eliminate idle tones.
The Effect of Noise Transfer Function Shape on Idle Tones in Sigma-Delta Modulators
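
For readers unfamiliar with idle tones, the following minimal sketch shows the effect in the simplest possible case. It assumes a textbook first-order, 1-bit modulator without dither (not any of the higher-order designs studied in the paper); the function name `first_order_sdm` and the input value are illustrative choices only.

```python
def first_order_sdm(samples):
    """First-order 1-bit sigma-delta modulator (integrator plus sign quantizer)."""
    integrator = 0.0
    feedback = 0.0          # previous quantizer output, fed back to the input
    bits = []
    for x in samples:
        integrator += x - feedback
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(feedback)
    return bits

dc_input = [0.25] * 4000    # constant (dc) input, chosen for illustration
bits = first_order_sdm(dc_input)
mean_output = sum(bits) / len(bits)
```

With this dc input the undithered bitstream settles into a pattern that repeats every 8 samples while its average tracks the input level; such periodic patterns are what show up as idle tones in the output spectrum.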

Reiss, Joshua; Ling, Bingo W.; Ho, Charlotte
Limit cycles and divergent behavior may be observed in sigma delta modulators (SDMs), especially when the inputs are overloaded. In this paper, we consider high order SDMs as used in A/D and D/A audio applications. Consideration is given to how the initial states of the SDM affect its stability. The conditions which may give rise to different nonlinear behaviors are derived. We then develop a novel control algorithm based on fuzzy impulsive control, which involves a decision mechanism for stabilizing different possible instabilities. Examples featuring lowpass and bandpass SDMs of various orders are given to illustrate the effectiveness of the proposed control strategy.
Fuzzy Impulsive Control of High Order Sigma Delta Modulators

Adams, Robert; Gaalaas, Eric; Liu, Bill Yang; Morajkar, Rajeev; Nishimura, Naoaki; Sweetland, Karl
A 2x40 W integrated stereo sigma-delta class D amplifier with 100 dB SNR is realized in 0.6 µm BCDMOS technology. Feedback from power stage outputs gives 0.001% THD and 65 dB PSRR. Modulator clock rate is 6 MHz, but dynamically adjusted quantizer hysteresis reduces output data rate to 450 kHz, helping achieve 88% efficiency. At AM radio frequencies, the modulator output spectrum contains a single peak related to the hysteresis, but is otherwise tone-free, unlike PWM modulators which contain many harmonics of the PWM clock frequency.
Integrated Stereo Sigma-Delta Class D Amplifier

Putzeys, Bruno
A stable and load-invariant self-oscillation condition is developed for a class D amplifier employing only one single voltage feedback loop taking off after the output filter. The resulting control method is shown to effectively remove the output filter from the closed loop response. Practical discrete implementations of a comparator and gate-drive circuit are presented. A high-performance class D amplifier employing only 16 discrete transistors is constructed. Higher-order extensions of the control circuit are demonstrated which produce extremely low levels of distortion.
Simple Self-Oscillating Class D Amplifier with Full Output Filter Control

Lopez, Jose; Ramos, German
This paper presents an efficient loudspeaker equalization algorithm combining a novel filter optimization method with a psychoacoustic model. The equalizer topology is based on a chain of second-order sections, each of which is a conventional parametric IIR audio filter. A psychoacoustic model based on the detection of peaks and holes in the frequency response has been employed to determine which ones need to be equalized. Using this psychoacoustic model, the order of the filter could be reduced without noticeable subjective effect. The first computed filter sections are the ones that provide the most correction to the response, allowing scalable solutions when hardware limitations apply or different degrees of correction are needed. The method has been validated in the laboratory with subjective testing.
Low Order IIR Parametric Loudspeaker Equalization, A Psychoacoustic Approach

Floru, Fred
The past few years have seen the continuation of the shift from analog processing to digital domain processing in professional audio products. Several sections of the box still remain purely analog. Over time the performance of mixed-signal components has improved significantly, to the point that, once again, the weakest link in the chain could be the analog interface. The purpose of this paper is to look at a few popular analog circuits that have a direct impact on the performance of professional audio applications. The circuits are explained with mathematical demonstrations. The impact of real-life implementations on the performance specifications is explored for each circuit.
Demystifying Analog Circuits in Professional Audio Applications

Blanco-Martin, Elena; Casajus-Quiros, Francisco Javier; Gomez-Alfageme, Juan Jose; Torres-Guijarro, Soledad
In videoconference systems composed of microphone and loudspeaker arrays, a cue of subjective sound quality is the ability to locate the sound source. A ten-loudspeaker array and a dummy head are set up in an anechoic chamber. The field sampled by the microphone array is simulated by a narrow-band source located at different azimuths. That simulation is emitted by the loudspeaker array and the binaural signal from the head is processed. Some improvements have been made to previous methods of azimuth estimation. The ellipsoidal head model used to calculate the interaural time difference (ITD) gives better results. The estimation given by the interaural level difference (ILD) cue is used to avoid the ITD ambiguity at high frequencies.
Spatial Sound Localization Measures From a Dummy Head with a Loudspeaker Array in Anechoic Chamber

Kim, Jongbae; Lee, Joon-Hyun; Park, Sangil; Sung, Ho-Young
Generally, evaluation of the sound quality of audio systems or AV products is accomplished through listening tests with groups of trained listeners, each having subjective views. But these kinds of methods often tend to give results that are distorted by listening environments and the personal preferences of the listeners. For these reasons, even though they are deeply related to the way listeners actually hear things, subjective listening tests suffer from problems such as inaccuracy and time/place dependency. By examining the correlation between objectively measurable evaluation factors and subjective listening test items, this paper proposes an objective method for sound quality evaluation that is based on former subjective methods for audio systems and AV products.
Development of Objective Sound Quality Evaluation Method Based on Subjective Sound Quality Evaluation

Brachmanski, Stefan
The methods for assessing speech quality fall into two classes: subjective and objective methods. This paper includes an overview of selected methods of subjective listening measurement (ACR, DCR, speech intelligibility) recommended by ITU-T, ISO and the Polish Standard, and of the method of speech transmission quality evaluation called the “modified intelligibility test with forced choice” (MIT-FC). The MIT-FC method provides fully automated measurement of speech intelligibility. The listener’s task is to select, on a computer monitor, which of the visually presented alternative utterances was spoken. The computer program automatically calculates speech intelligibility and the speech quality factor. Two objective methods are described in the paper: QE-ARM and the Houtgast and Steeneken method (STI, RASTI). The former is based on automatic speech recognition. The latter is recommended by the IEC and the Polish Standard for assessment of speech quality in rooms. Experiments in which the methods were used are described. The results are compared with RASTI measurements.
Speech Quality Prediction for Voice in Rooms

Camacho, Andres; De Diego, Maria; Fuster, Laura; Gonzalez, Alberto; Pinero, Gema
A new approach to the evaluation of subjective perception of car engine noise is presented in this paper. It makes use of different Time Frequency (TF) methods that are applied to the calculation of some TF parameters of the noise signals, mainly related to energy variations. Once these parameters have been computed, they are linearly combined through a multiple regression model to describe the most important psychoacoustic features of the noises: loudness, sharpness and roughness. As a result, loudness and sharpness can be well defined by TF energies, whereas roughness needs a deeper study to elaborate new parameters that might fit it. A discussion about the ability of the different TF techniques to correlate with psychoacoustic metrics is also given.
Application of Different Time-Frequency Analysis to Psychoacoustic Description of Car Engine Noise

Martellotta, Francesco
The paper describes the main steps of a study investigating the relation between subjective assessment of listening conditions in churches and the main objective acoustic parameters. Measurements of binaural and B-format impulse responses were made in several churches and at different positions inside the same church. A listening room was designed and built in order to carry out listening tests using a stereo-dipole configuration. Measured binaural IRs were cross-talk cancelled and then auralized with two anechoic motifs. Paired comparisons were finally performed, asking a trained panel of subjects for their preference.
A Preliminary Investigation on the Subjective Evaluation of Church Acoustics using Listening Tests

Hatziantoniou, Panagiotis; Mourjopoulos, John; Worley, John
Formal subjective tests of real-time room dereverberation using Complex Smoothing were conducted to assess the robustness and validity of the method under real sound reproduction conditions. For comparison, anechoic inverse loudspeaker real-time filtering was also assessed, in order to separate the improvements due to complex smoothing inverse room filtering from those due to mere equalization of the loudspeakers. Results derived from a multivariate analysis of variance (MANOVA) of the test data confirm the conclusions previously drawn from the objective evaluation of the method, e.g. significant improvement in sound quality and immunity to real-time processing artifacts independently of room size and listener position.
Subjective Assessments of Real-Time Room Dereverberation and Loudspeaker Equalization

Dalka, Piotr; Kostek, Bozena
The aim of the research work presented is to show a system that facilitates speech training for hearing impaired people. The system combines both visual and acoustic speech data acquisition and analysis modules. The Active Shape Model method is used for extracting visual speech features from the shape and movement of the lips. The acoustic feature extraction involves mel-cepstral analysis. Artificial Neural Networks are utilized as the classifier; the extracted feature vectors combine both modalities of human speech. Additional experiments with degraded acoustic and/or visual information are carried out in order to test the system's robustness against various distortions affecting the signals.
Combining Visual and Acoustic Modalities to Ease Speech Recognition by Hearing Impaired People

Laskaris, Konstantinos; Orinos, Chris; Tsakiris, Vassilis
In this paper we examine the correlation between subjective and objective tests of multichannel systems that use one or more subwoofers along with digital equalization. We approached this subject in a preceding paper with literature-standard metrics of the frequency response as measured at various places in the room. However, more advanced metrics were still lacking to offer better insight into how the integration of bass and mid frequencies is perceived by listeners at various places in the listening room. We now present comparisons between measurements of more metrics and listeners’ comments, as gathered from blind tests. The comparisons show good correlation of the recommended metrics with the listening tests when using various digital equalization systems.
Objective and Subjective Evaluation of Digital Equalization Systems - Measurements of Resonances and Colorations

Dobrucki, Andrzej; Kozlowski, Piotr
This document presents a nearly final view of research on objective methods that use psychoacoustic knowledge to estimate the quality of audio signals. The software written especially for this research is presented. This program allows the implementation of different published methods for evaluating the quality of perceptually coded audio signals. The PAQM, PSQM, NMR, PEAQ and PESQ protocols have been implemented so far. All of these algorithms simulate the auditory system. The software is open to the addition of further protocols as plug-ins, and earlier published protocols can be changed and improved. In earlier works the authors proposed ways to improve objective protocols, e.g. by changing the pitch scale or FFT parameters. A suggested tuning of the internal signal-processing parameters, which improves the results of objective evaluation, is presented. The optimization criterion is the difference between the results of subjective and objective evaluation.
Tuning of the Objective, Perceptual Based Evaluation Methods of Compressed Speech and Audio Signals

Kot, Valery; van de Par, Steven; van Schijndel, Nicolle
In many state-of-the-art parametric audio codecs, the signal is composed of sinusoids combined with synthetic noise. The sinusoids represent the perceptually most relevant signal part, while the remaining part is represented with noise. This structure poses a problem for developing a scalable sinusoidal coder where certain layers of the bit stream can be dropped to lower bit rate. When dropping a layer containing a significant portion of sinusoids, the accompanying synthetic noise should be adapted to fit the remaining sinusoidal components. A scheme is proposed here that can adapt the noise signal depending on the number of sinusoidal layers that are dropped without the need to send any information about the adaptation of the noise coder in the bit stream.
Scalable Noise Coder for Parametric Sound Coding

Edler, Bernd; Niemeyer, Oliver
An efficient encoding of excitation patterns designed for bit rates between 4 and 10 kbit/s is presented. It is based on a two-dimensional transform of the excitation patterns in the frequency and time directions, followed by a bit plane encoding of the resulting transform coefficients. The bit plane encoding ensures that all significant coefficients are captured, both for excitation patterns resulting from more tonal audio signals and for those resulting from transient-like audio signals. Excitation patterns coded in this way can be used in a scalable noise coder, but can also substitute for the scale factors which usually control the quantisation in subband/transform audio coders. A subband/transform audio coding scheme is presented which combines these two applications in one system.
Efficient Coding of Excitation Patterns Combined with a Transform Audio Coder

Ferreira, Anibal J. S.; Sen, Deep; Sinha, Deepen
In the application of conventional audio compression algorithms to low bit rate audio coding, one is faced with an unsatisfactory tradeoff between coarser quantization and audio bandwidth reduction. Bandwidth Extension has therefore emerged as an important tool for the satisfactory performance of low bit rate audio codecs. In this paper we describe one of a newer class of Frequency Extension techniques which are applied directly to the high frequency resolution representation of the signal (e.g., MDCT). This particular technique is based on a Fractal Self-Similarity Model (FSSM) for the short-term frequency representation of the signal and takes advantage of the high frequency resolution of the MDCT, namely in terms of parameter estimation. The FSSM model, which may include multiple dilation and translation terms, has been found to be effective for a wide variety of speech and music signals and provides a compact description of long-term correlation that may exist in the frequency domain. The structure of the FSSM model is presented, and issues related to parameter estimation and its application to audio coding at bit rates of 8-48 kbps are discussed.
A Fractal Self-Similarity Model for the Spectral Representation of Audio Signals

Edler, Bernd; Meine, Nikolaus
A source coding algorithm based on the classic Markov model is presented, which uses Vector Quantization and arithmetic coding in conjunction with a dynamically adapted context of previously coded vector indices. The core of this algorithm is the numerically optimized mapping from a large number of source states to a small number of different code tables. This enables its application to audio coding, where it provides higher efficiency than the quantization and lossless coding used in MPEG-AAC.
Improved Quantization and Lossless Coding for Subband Audio Coding

Hamasaki, Kimio; Nishiguchi, Toshiyuki
To study the difference in hearing impression among several high sampling digital recording formats, we conducted subjective evaluation tests of perceptual discrimination among the following digital recording formats: 48 kHz 24-bit PCM, 192 kHz 24-bit PCM and DSD. Sound stimuli for the evaluation were originally recorded so as to maintain exactly the same quality of analog input feeds to the three A/D conversion systems. The sound reproduction system for the subjective evaluation tests was also carefully designed to reproduce the original quality of each digital recording format. Listening panels were selected from students and faculty staff of a university of music, recording engineers, and musicians. A pair test method was applied to the subjective evaluation. Across the several evaluation tests, no significant difference was observed.
Differences of Hearing Impressions Among Several High Sampling Digital Recording Formats

Janssen, Erwin; Reefman, Derk
This paper gives a high level introduction to the lossless DSD compression algorithm, used within the context of Super Audio CD, and currently in the process of MPEG standardization. Measurement results on various commercially available discs are shown and the achievable disc playing time is illustrated. Furthermore, the flexibility of DSD is demonstrated by the introduction of a new Trellis based Sigma Delta Modulator (SDM) which is at least four times less CPU demanding than previously published Trellis SDMs. With this new Trellis coder ultra high quality SA-CD recordings, with dynamic range in excess of 115 dB and bandwidth of about 80 kHz, are achieved. Typical SA-CD playing times of over 80 minutes, even for difficult, 2+6 channel, pop material are realized.
DSD Compression for Recent Ultra High Quality 1-bit Coders

Hawksford, Malcolm J.
Sigma-delta modulation (SDM) and pulse-width modulation (PWM) are compared as a means of structuring power digital-to-analogue converters (PDAC) designed specifically for wide-band audio, low power loss and direct loudspeaker drive. Recent innovations in SDM coding and output-stage topologies using pulse-shaping techniques are discussed, with emphasis on achieving stable and low-distortion operation, especially under high-level signal excitation having a modulation index of circa 0.7, as necessary to process peak transients. A simplified variant of predictive SDM with step back is introduced that offers both low latency and a low probability of instability, and structures for both analogue and digital input data are reviewed.
SDM Versus PWM Power Digital-to-Analogue Converters (PDAC) in High-resolution Digital Audio Applications

Karlsson, Erlendur; Paavola, Matti; Page, Jonathan
The mobile terminal (telephone) is rapidly evolving from its origins as a basic device for voice communication into being an advanced multimedia computer able to handle demanding signal processing tasks in real time. Meanwhile, Java programming interfaces are gaining momentum as the preferred approach for third party application development on these same devices. This paper provides an introduction to a new standard interface known as "Advanced Multimedia Supplements" for accessing these features of new mobile devices from the Java programming language. The new interface augments the existing mobile media specification with mechanisms to control audio effect processing in real-time, including 3D positional audio and reverberation, all of which can be synthesized using standard stereo headphones or stereo microspeakers.
3D Audio for Mobile Devices via Java

Jordan, Frank
We present an add-on to cinema projectors that allows projecting subtitles or playing sound clips. These functions require precise timing information to keep both the video content and the added features synchronized. Although some digital projectors offer this timing information, our solution is compatible with all projectors worldwide. This allows cinema operators to offer additional features to their audience without moving to a new and expensive projector system. Our system is simply connected to the analog output of the projector. Based on a pre-computed image file and the live analog input, it calculates the timing through cross-correlation. The system is basically a standard PC and requires no dedicated DSP; instead, it is based on “native” signal processing techniques on the CPU.
Generating Time Code Information from Analog Sources
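
The cross-correlation step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the pre-computed reference and the live input are both available as short lists of samples, uses a plain (unnormalized) correlation score, and makes no claim about the authors' actual implementation; `best_lag` and the sample values are hypothetical.

```python
def best_lag(reference, segment):
    """Return the offset into `reference` where `segment` correlates best."""
    n = len(segment)
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(reference) - n + 1):
        # Raw cross-correlation of the segment with the reference window.
        score = sum(reference[offset + i] * segment[i] for i in range(n))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset

# Made-up reference track and a "live" excerpt taken from sample 3 onward.
reference = [0, 0, 1, 3, -2, 4, 0, -1, 2, 0, 0, 1]
segment = reference[3:8]
offset = best_lag(reference, segment)
```

Dividing the returned offset by the sample rate would give the playback position in seconds. A practical system would probably normalize the score and use an FFT-based correlation for speed; both are omitted here for brevity.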

Fonseca, Nuno; Monteiro, Edmundo
In the digital world, latency can be the source of many problems affecting sound, from simple psychoacoustic discomfort to changes in audio quality. Due to its complexity (and in some cases non-deterministic behaviour), audio networking is even more susceptible to this type of problem. This paper presents the major problems created by different types of latency in an audio networking solution. To achieve a better analysis, a separation is made between problems affecting audio sample transportation and those affecting clocking.
Latency Issues in Audio Networking

Foss, Richard; Fujimori, Jun-ichi; Kounosu, Ken; Laubscher, Rob
The Plural Node architecture is an implementation architecture for professional audio devices that adhere to the “Audio and Music (A/M)” protocol. The A/M protocol determines how audio and MIDI data are transported over IEEE 1394 (FireWire). The Plural Node implementation architecture comprises two components on separate IEEE 1394 nodes: a “Transporter” component dedicated to A/M protocol handling, and an “Enabler” component that controls the Transporter and provides high level plug abstractions. Low level control of individual Transporters occurs within the “Hardware Abstraction Layer” (HAL) of the Enabler. Device manufacturers write their own plug-ins for the HAL to interact with their Transporters. The Open Generic Transporter specification provides an open interface between the HAL and Transporter for the convenience of device manufacturers.
An Open Generic Transporter Specification for the Plural Node Architecture of Professional Audio Devices.

Kummer, Jean Christophe; Stocker, Daniel
Narrow-bandwidth audio is known to be a difficult subject for watermarking when both data density and transparency are required. Audio archives possessing a rich selection of vintage material tend to license their assets without considering protection of any kind. In this paper we show how R2O watermarking technology succeeds in providing an optimal solution in such environments.
Watermarking of Archive Recordings

Faller, Christof
Conventional stereophonic processes allow playback of mono audio signals with a stereo effect. The stereo effect is limited to mimicking ambience or signal independent left/right separation and thus no realistic sound stage is reproduced. This paper proposes two techniques for converting old mono recordings to two or more channel stereo signals with a realistic sound stage and ambience. One technique is fully automatic and imposes the auditory spatial image of a given modern stereo recording onto a corresponding old mono recording. The other technique is based on manual input of a sound engineer to generate a desired sound stage and ambience. The underlying mono-to-stereo synthesis process is the same as has been recently proposed for use in low bitrate audio coding.
Pseudostereophony Revisited

Brock-Nannestad, George
The sound record when sold as a physical entity, be it as a short item (a shellac disc or a single), a longer item (LP records or tapes), or a very long item (CD or DVD), has always been accompanied by items of information that relate to it. This information at least functions as a commercial enhancement. In other words, the commercial record is a cultural phenomenon that accepts its buyers into a community that obeys particular rules. The paper makes an overview of the physical embodiment of this information and how it presents the content. However, it is also necessary to look at the inherent, perhaps non-intended information in the sound signal and on the carrier itself.
More Than Sound

Pizzi, Skip
In this presentation, the speaker will offer an industry update on digital audio and radio standards and technologies, with a focus on the digital rights management issue.
Content Protection for Digital Radio

Berg, Jan
The evaluation of different aspects of audio quality can be realised by means of attribute scales. Studies have shown that the attributes selected are of great importance for the evaluation result. Consequently, the process whereby these attributes are generated has to be given careful consideration. It was previously shown that elements from the repertory grid technique facilitated the elicitation and grading of quality attributes, which resulted in a new audio quality evaluation method. The result from this work has now been implemented as a software prototype aimed to support listening tests. This paper reports on the results from a pilot experiment involving the OPAQUE software.
OPAQUE – A Tool for the Elicitation and Grading of Audio Quality Attributes

Ford, Natanya; Nind, Tim; Rumsey, Francis
A method is presented which details how a descriptive language can be developed for effectively communicating listeners’ individual auditory spatial experiences during subjective evaluations. The language-development method focuses on identifying and minimising ambiguities which could prevent the representation of listeners’ experiences or the researcher’s comprehension of these experiences when communicated. The development of a specific descriptive graphical language provides an example of the method in practice. Details of this particular language’s evolution are summarised; from the elicitation and clarification of listeners’ individual graphical descriptors, to the development and evaluation of a communal language. Ambiguities encountered at the various stages in this language’s development are illustrated in a descriptive process model.
Communicating Listeners’ Auditory Spatial Experiences: A Method for Developing a Descriptive Language.

Budzynska, Luiza; Jelonek, Jacek; Lukasik, Ewa; Slowinski, Roman; Susmaga, Robert
The paper compares the process and results of two different methods of ranking the quality of musical instrument voices. A dedicated software tool presents recorded sounds to the expert, who makes his/her assessment according to particular criteria using a multistimulus test on a scale from 1 to n and a pairwise comparison followed by the Net Flow Scoring (NFS) method. Several aspects of the process have been analyzed, e.g. consistency of the results for both methods, stability of rankings over time (using Kendall’s coefficient values), as well as an assessment of the cognitive effort required from the expert in each ranking method. The multistimulus test appeared to be easier to perform but less distinctive than pairwise comparison.
Multistimulus Ranking versus Pairwise Comparison in Assessing Quality of Musical Instruments Sounds
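
The pairwise-comparison step can be illustrated with a minimal sketch of net flow scoring, assuming the common definition in which each item's score is the number of comparisons it wins minus the number it loses. The abstract does not give the exact NFS variant or the data format used by the tool, so the function names and the `(winner, loser)` input format below are hypothetical.

```python
from collections import defaultdict

def net_flow_scores(comparisons):
    """Return {item: wins - losses} from (winner, loser) pairs.

    Assumption: plain win/loss counting; graded preferences are not modeled.
    """
    scores = defaultdict(int)
    for winner, loser in comparisons:
        scores[winner] += 1
        scores[loser] -= 1
    return dict(scores)

def ranking(scores):
    """Order items by descending score, breaking ties by name."""
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical expert judgments on three instrument recordings.
judgments = [("A", "B"), ("A", "C"), ("B", "C")]
scores = net_flow_scores(judgments)
order = ranking(scores)
```

With these three judgments, item A wins twice, B wins once and loses once, and C loses twice, so the ranking comes out as A, B, C.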

Wickelmaier, Florian; Choisel, Sylvain
A selection procedure was devised in order to select listeners for experiments in which their main task will be to judge multi-channel reproduced sound. 91 participants filled in a web-based questionnaire. 78 of them took part in an assessment of their hearing thresholds, their spatial hearing, and their verbal production abilities. The listeners displayed large individual differences in their performance. 40 subjects were selected based on the test results. The self-assessed listening habits and experience in the web-questionnaire could not predict the results of the selection procedure. Further, the hearing thresholds did not correlate with the spatial-hearing test. This leads to the conclusion that task-specific performance tests might be the preferable means of selecting a listening panel.
Selecting Participants for Listening Tests of Multichannel Reproduced Sound

Ozcan, Koray
Multiple-frequency wideband signals were processed using individual HRTFs to adjust the sound source direction. The main localisation cues were tested in conflict with direction. The previously presented method based on the Hilbert transform enabled phase and time to become independent of each other even for wideband signals. It was therefore possible to put phase in conflict with the HRTF whilst leaving the amplitude characteristics of the stereo signals unchanged. The results show that phase is less significant than time against direction. Furthermore, the central diffuse sound field that occurs when intensity and time are in conflict, and that reduces localisation performance, is not present when direction and either intensity, time or phase are placed in conflict.
The Importance of Phase in the Presence of Sound Source Direction

Azzali, Andrea; Cabrera, Densil; Capra, Andrea; Farina, Angelo; Martignon, Paolo
Binaural room impulse responses convolved with anechoic recordings are commonly used in auditorium acoustics design and research. Binaural and stereophonic (ORTF) room impulse responses, which had been recorded in 5 concert auditoria, were used in this study to test the spatial audio quality of four reproduction systems: conventional stereophony, binaural headphones, stereo dipole, and double stereo dipole. Anechoic music, convolved with the impulse responses, was reproduced over these systems. The systems were matched as closely as possible to each other, and to the sound levels that would occur in the auditoria for the musical source. In a subjective test, subjects rated the room size, sound source distance and realism of the reproduction. Results indicate best spatial reproduction from the stereo dipole systems.
Reproduction of Auditorium Spatial Impression with Binaural and Stereophonic Sound System

Martens, William L.; Marui, Atsushi
To develop a model for predicting the timbral variation of guitar tones resulting from multiparameter distortion-based effects processing, physical measures on a set of guitar signals were related to both perceptual and semantic data collected from a group of young adults. Stimuli were generated using 3 types of distortion processing, the outputs of which were adjusted to yield 3 values of Zwicker Sharpness (ZS). Presented with all pairwise comparisons of 9 versions of a single guitar performance, 60 listeners made perceptual dissimilarity ratings in order to derive a stimulus space comprising the 3 most salient dimensions upon which the guitar timbres differed. Also, 49 listeners made direct ratings on 11 bipolar adjective scales for the same 9 stimuli to aid in the interpretation of the stimulus space. Coordinates on the 1st stimulus space dimension could be related to the ZS values computed for the physical signals, while the 2nd and 3rd dimensions distinguished between the 3 types of distortion effects processors employed in stimulus generation. These 2 dimensions could be predicted from measures of spectral features that remained after removing spectral tilt related to the ZS of the stimuli. The dark-bright adjective ratings were more highly correlated with ZS (r=.795), while the sharp-dull adjective ratings were more highly correlated with Spectral Skewness (r=.804).
Predicting Timbral Variation for Sharpness-Matched Guitar Tones Resulting from Distortion-Based Effects Processing

Ghido, Florin
We propose a near optimal, asymmetrical, low complexity, block-based arithmetic coding algorithm for the prediction residuals produced by lossless or lossy audio compression algorithms. The analysis of real-world prediction residuals motivates the use of a modified two-sided continuous generalized Gaussian distribution, pdf(x)=c(p,s)*exp[-(|x|/s)^p], which is mapped to a discrete distribution. Closed form formulas and fast numerical estimates for the block parameters (p,s) are derived. Using a distribution property, precomputed probability tables are obtained for different values of p, independent of s. The compression performance is within 0.2% of a high complexity, symmetrical coder. For very low complexity codes, an alternate coding algorithm is also proposed, employing only bit output, with comparable performance.
Near Optimal, Low Complexity Arithmetic Coding for Generalized Gaussian Sources
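
The density quoted in the abstract can be made concrete by writing out its normalizing constant. For the standard two-sided generalized Gaussian, c(p,s) = p / (2 s Γ(1/p)). The sketch below covers only this textbook continuous case: the abstract refers to a "modified" distribution and a mapping to a discrete distribution, neither of which is specified here, and the function names are illustrative assumptions.

```python
import math

def gg_norm_const(p, s):
    """c(p, s) for the standard generalized Gaussian: p / (2 s Gamma(1/p))."""
    return p / (2.0 * s * math.gamma(1.0 / p))

def gg_pdf(x, p, s):
    """pdf(x) = c(p, s) * exp(-(|x| / s) ** p)."""
    return gg_norm_const(p, s) * math.exp(-((abs(x) / s) ** p))

def integrate_pdf(p, s, half_width=60.0, steps=200_000):
    """Midpoint-rule check that the density integrates to about 1."""
    h = 2.0 * half_width / steps
    return h * sum(gg_pdf(-half_width + (i + 0.5) * h, p, s)
                   for i in range(steps))
```

Setting p = 1 gives a Laplacian density and p = 2 a Gaussian one; values of p below 1 give the sharper peak and heavier tails often seen in prediction residuals.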

Bertotti, Claudio; Brayda, Luca-Giulio; Cristoforetti, Luca; Omologo, Maurizio; Svaizer, Piergiorgio
This work describes an activity that led to the realization of a modified NIST MarkIII array. This system is able to acquire 64 synchronous audio signals at 44.1 kHz and is primarily conceived for far-field automatic speech recognition and speaker localization. Preliminary experiments conducted on the original array had shown that coherence between a generic pair of signals was affected by a bias due to common mode electrical noise, which turned out to be detrimental to time delay estimation techniques applied to co-phase signals or to localize speakers. A hardware intervention was carried out to remove each internal noise source from the analog modules of the device. The modified array provides input signals whose quality matches the results expected from theory.
Modifications on NIST MarkIII Array to Improve Coherence Properties Among Input Signals

Ignatov, Pavel
The first Russian film studio, created in Petrograd-Saint-Petersburg in November 1918, played a significant role in the development of Russian cinematography (later it was called “Lenfilm”). Since its foundation “Lenfilm” has created more than 1500 sound films with such outstanding tonmeisters as I. Dmitriev, I. Volk, E. Nesterov, L. Obolensky, A. Shargorodsky, A. Bekker, B. Khutoriansky, etc. In 2001 “Lenfilm” carried out a project of technical re-equipment of its sound recording facilities (Dolby SR, Dolby Digital, Dolby Digital Surround EX). The analysis of the development of the tonmeister’s art at “Lenfilm” over a period of more than 90 years is the main aim of this paper.
The History of Tonmeister’s Technique on “Lenfilm” Studio in Saint-Petersburg

Jeong, Jae-woong; Kim, Jeong-Tae; Lee, Junho; Park, Young-cheol; Youn, Dae-hee
This paper presents a new method of designing crosstalk cancellers using low-order IIR filters. The design is accomplished by approximating FIR filters optimized in the warped frequency domain with IIR filters realized in the linear frequency domain. For practical implementations, the dewarped poles and zeros are properly paired and ordered to avoid instability. Also, considering finite wordlength effects, we determine a suitable number of data bits for the implementation systems. Our method provides an efficient tool to design a crosstalk canceller that has excellent performance over a wide range of frequencies and structural complexity comparable to conventional crosstalk cancellers. Simulation results confirm that the use of the proposed method significantly improves the channel separation over a wide range of frequencies.
Design and Implementation of IIR Crosstalk Cancellation Filters Approximating Frequency Warping

Andersen, Michael; Ljusev, Petar
This paper presents a new class of switching-mode audio power amplifiers, which are capable of direct energy conversion from the AC mains to the audio output. They represent an ultimate integration of a switching-mode power supply and a Class-D audio power amplifier, where the intermediate DC bus has been replaced with a high frequency AC link. When compared to conventional Class-D amplifiers with a separate DC power supply, the proposed single conversion stage amplifier provides a simple and compact solution with better efficiency and a higher level of integration, leading to reduced component count, volume and cost.
Switching-mode Audio Power Amplifiers with Direct Energy Conversion

Bozzoli, Fabio; Farina, Angelo; Viktorovitch, Michel
In room acoustics, one of the most used parameters for evaluating speech intelligibility is the Speech Transmission Index (STI). The experimental evaluation of the STI generally employs an artificial speaker (artificial mouth) and listener (binaural head). In this study, the influence of the emission directivity of the artificial mouth on the measurements was investigated for different acoustic environments. We found that in many cases (i.e. big rooms or telecommunication systems) the results are not sensitive to modifications of the directivity; on the contrary, inside cars, the shape of the whole balloon of directivity is important for determining correct and comparable values, and the different mouths studied give very different results in the same situation. Afterwards we measured, in an anechoic room, the 3D directivities of a statistical population of speakers. The post-processing of the results enabled us to determine the average and the standard deviation of human speech directivity. These results constitute a valuable source of information for assessing the compliance of artificial mouths with reality.
Balloons of Directivity of Real and Artificial Mouth Used in Determining Speech Transmission Index

Smirnoff, Serge
This paper describes a new objective parameter, called “Difference Level”, that could be used for preliminary technical estimation of signal degradation in various audio circuits – digital and analogue, hardware and software based. This parameter could be considered either as an extension of THD for non-periodic signals, or as one of the estimations of the widely used difference signal.
Difference Level: An Objective Audio Parameter

Ferekidis, Charalampos; Panzer, Joerg
This paper describes the intention, concept and definition of a new specification for a data transfer protocol. The transfer typically originates from a measurement device or simulation package and is collected at a processing and documentation tool. In order to make this exchange as comfortable as possible it is necessary to supply a range of control settings, which describe the meaning and arrangement of the data values within the data set. The layout of the specification strongly supports 3D measurement situations, such as directivity measurements, and generally parameterised sets of data. The format of the protocol is easy to read, flexible and straightforward to implement.
An Easy to Use Import-Export Data Format Specification for Response Type Data

Kim, Jungho; Kim, Sunmin; Kim, Youngtae; Lee, Joonhyun; Park, Sang-il
An extensive data base of HRTFs (Head Related Transfer Functions) has been established for use in high-quality 3D acoustic appliances. In comparison to previously measured HRTF data bases from other researchers, the suggested data base is more suitable for Dolby referenced DVD titles because it sets the distance between a sound source and the ear to 2 m. A better sound quality was obtained when the data base was applied to a 3D sound algorithm intended to produce virtual sound using only two front loudspeakers.
New HRTFs (Head Related Transfer Functions) for 3D Audio Applications

Nijs, Lau; Saher, Konca; van der Voorden, Marinus
There are now many commercially available computer programs for predicting the acoustics of rooms. All of these computer programs work on the basis of a definition of the room’s geometry, the material properties (mainly the absorption and diffusion characteristics) and the sound source. The results of the calculation are then strongly dependent upon the approximation of these parameters. In this paper, difficulties in the process are indicated and then a methodology for the calibration procedure for ray-tracing software is suggested. The proposed methodology consists of three main items: ‘Absorption-calculator charts’, ‘Diffusion Estimation’ and ‘G-RT graphs’. These elements of the calibration methodology are briefly discussed and examined in a case study.
Definition of Material Properties in the Acoustical Model Calibration

Jenkin, Michael; Kapralos, Bill; Milios, Evangelos
A problem with ray-based acoustical modeling approaches is handling the potentially large number of interactions between a sound ray and any objects/surfaces it encounters. Typical solutions to modeling these interactions include emitting several "new" rays at each interaction point. Such solutions are computationally expensive except for simple environments. Rather than using such deterministic strategies, probabilistic techniques such as Russian Roulette can be applied instead. Russian Roulette ensures the path length of each acoustic ray is kept at a manageable size yet allows for paths of arbitrary size to be explored. Here we describe the application of Russian Roulette to acoustic modeling. Experimental results demonstrate the ability of Russian Roulette to provide a computationally reasonable solution to room acoustical modeling.
Acoustical Modeling Using a Russian Roulette Strategy
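
The Russian Roulette idea described above can be sketched in a few lines. This is a generic illustration and not the authors' implementation: a single ray bounces between surfaces of uniform `reflectance`, survives each bounce with probability `survive_prob`, and has its weight divided by that probability so the estimate stays unbiased. All names and the single-reflectance setup are assumptions made for the example.

```python
import random

def trace_ray(reflectance, survive_prob, rng):
    """Sum the weight a ray carries over all surface hits of one random path.

    At each hit the ray is terminated with probability 1 - survive_prob;
    survivors are reweighted by reflectance / survive_prob. The expected
    return value equals the deterministic series 1 / (1 - reflectance).
    """
    weight = 1.0
    total = 0.0
    while True:
        total += weight                      # contribution of this hit
        if rng.random() >= survive_prob:     # roulette: terminate the path
            break
        weight *= reflectance / survive_prob # compensate the survivors
    return total

def estimate(reflectance, survive_prob, n_rays, seed=0):
    """Average trace_ray over n_rays independent paths."""
    rng = random.Random(seed)
    return sum(trace_ray(reflectance, survive_prob, rng)
               for _ in range(n_rays)) / n_rays
```

Every path terminates after a finite number of bounces, yet arbitrarily long paths remain possible, which matches the property the abstract highlights. Choosing `survive_prob` equal to the reflectance keeps the weight constant at 1.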

Embrechts, Jean-Jacques; Lesoinne, Stephane; Werner, Nicolas
Auralization in room acoustics is created by the convolution of an anechoic audio signal with the RIR (room impulse response), either computed or measured at the receiver location. When headphones are used for reproduction, the same auralized signal is often sent to both ears, for simplicity. This paper addresses the computation of directional impulse responses by a sound ray program. These responses are used in the convolution process in combination with HRTFs to simulate not only the room reverberation, but also the angular location of all sound contributions to the receiver. Computational problems include the separate treatment of specular and diffuse reflections and the compromise between computing time, the number and the accuracy of directional RIRs.
Computation of Directional Impulse Responses in Rooms for Better Auralization

Vieira, Jose
The performance of some audio systems, such as sound localizers and hearing aids, can be affected by strong reverberation. These systems can, however, adapt their algorithms if an estimate of the reverberation time is available. Usually, those systems are not able to measure the reverberation time of the room using test signals. The only available way is to use the captured signals to estimate the reverberation time. This article presents a method to estimate the reverberation time of a room by detecting the “tails” of the received sounds.
Estimation of Reverberation Time without Test Signals
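
Once a free-decay tail has been found, one common way to turn it into a reverberation-time estimate is to fit a straight line to the level in dB and extrapolate to a 60 dB drop. The sketch below shows only that last step, under the assumption of an exponential decay. The abstract does not say how tails are detected or fitted, so this is not the author's method, and `estimate_rt60` is a hypothetical name.

```python
def estimate_rt60(times_s, levels_db):
    """Least-squares line through a decay tail; RT60 = -60 / slope (in s)."""
    n = len(times_s)
    if n < 2 or n != len(levels_db):
        raise ValueError("need at least two (time, level) pairs")
    mean_t = sum(times_s) / n
    mean_l = sum(levels_db) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (l - mean_l)
              for t, l in zip(times_s, levels_db))
    slope = sxy / sxx                 # dB per second, negative for a decay
    if slope >= 0:
        raise ValueError("levels do not decay")
    return -60.0 / slope

# Synthetic tail that drops 60 dB in 0.8 s, sampled every 10 ms.
times = [i * 0.01 for i in range(50)]
levels = [-60.0 * t / 0.8 for t in times]
rt60 = estimate_rt60(times, levels)
```

In practice the fit would be restricted to the part of the tail that lies above the noise floor.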

Lokki, Tapio
A novel auralization method, in which the propagation speed of sound is slowed down, is presented. Auralization in slow motion makes it possible to study the details of simulated impulse responses, such as the spatial distribution of early reflections and the spectrum of each reflection. It gives more information than conventional binaural or multichannel auralization of impulse responses. The proposed method can be applied in concert hall design as well as in the teaching of room acoustics.
Auralization of Simulated Impulse Responses in Slow Motion

Ferreira, Anibal J. S.; Leite, Antonio
This paper presents several improvements that have been introduced on the design and operation of an adaptive 20-band room equalizer. The equalizer is implemented on a TMS320C6711 DSP platform and performs fast FIR filtering in the frequency domain. In order to reach fast adaptation to time-varying acoustic conditions, several adaptation rules operating in the frequency domain have been evaluated and the impact of a frequency-varying step-size parameter on the convergence rate has been studied. These results will be presented along with ideas and plans for future developments.
An Improved Adaptive Room Equalization in the Frequency Domain

Molero Milán, Jose Luis
Sound recording and listening rely mainly on the acoustic characteristics of the environment. Places built for a specific application (music recording, conferences or concert rooms) must have the right acoustic quality. Listening to an acoustic signal in a room is generally affected by different elements inherent to the room, such as reverberation, echoes and resonance (or the lack of them), the sound coverage of the sources, etc. All these elements accompany the main signal and affect stereophony and intelligibility.
Requirements to Acoustically Prepare a Recording Studio

Kalliris, George; Papanikolaou, George; Sevastiadis, Christos; Zarras, Christos
An interesting group among the remaining ancient Roman buildings is that of the odea. One of them is the odeion at the ancient market in the center of Thessaloniki, in northern Greece. It is a small, amphitheatric hall whose exact shape and acoustical characteristics are not known, since its roof is not preserved. Acoustical measurements and an accurate model of the hall as it is today are presented. Moreover, the acoustical characteristics of the roofed hall are estimated, using certain types of roofs.
Acoustic Modeling of Ancient Odeion of Thessaloniki

Pincus, Michael
While the theory and practical applications of line array loudspeaker systems are well known, cost-effective digital signal processors have enabled new systems to feature beams that can be steered and adjusted. This in turn has allowed the use of these systems in multipurpose spaces. This presentation will demonstrate an example of how an active line array system can be used in an auditorium featuring a moveable partition.
An Application for a Digitally Steerable Array

Vila, Carlos
In the field of electroacoustic installations, several more or less sophisticated methods may be used to avoid annoying feedback “howl” and boost the potential acoustic gain (PAG) by a few dB. This paper discusses the use of a digital frequency shifter instead of more popular narrow-band filtering techniques. The device has been implemented using a readily available programmable DSP platform and its efficiency has been proven by measurements in various environments.
Digital Frequency Shifting For Electroacoustic Feedback Suppression

Filevski, Vladimir
The proposed amplifier has a direct three-phase 400 V power supply which consists only of diodes and uses neither filtering capacitors nor a transformer. The amplifier consists of 3 modules. The first module is a conventional Class-D digital audio amplifier, which can work on a high-voltage supply. The second module is a simple 3-phase diode rectifier which rectifies the alternating voltage of a 400 V three-phase system. The third module is a negative feedback system that nearly eliminates the ripple voltage in the output signal. With the help of the third module (negative feedback for the power supply ripple) the first module (conventional Class-D amplifier) actually behaves as if it had a power supply with a voltage equal to the DC component Vdc of the rectified three-phase voltage.
High-Power Amplifier With Direct Three-Phase 400 V Power Supply
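
The DC component Vdc and the ripple that the feedback loop has to remove can be estimated from the textbook ideal six-pulse rectifier, for which Vdc = (3√2/π)·V_LL. The sketch below assumes that 400 V is the line-to-line RMS voltage, that the diodes are ideal and that there is no source impedance; none of these details are stated in the abstract, and the function name is illustrative.

```python
import math

def rectified_three_phase(v_line_rms=400.0, samples=6000):
    """Sample one mains period of an ideal six-pulse diode bridge output.

    The unfiltered output equals the largest absolute line-to-line voltage.
    """
    v_peak = math.sqrt(2.0) * v_line_rms
    out = []
    for i in range(samples):
        theta = 2.0 * math.pi * i / samples
        v_ab = v_peak * math.sin(theta)
        v_bc = v_peak * math.sin(theta - 2.0 * math.pi / 3.0)
        v_ca = v_peak * math.sin(theta + 2.0 * math.pi / 3.0)
        out.append(max(abs(v_ab), abs(v_bc), abs(v_ca)))
    return out

wave = rectified_three_phase()
v_dc = sum(wave) / len(wave)       # close to 3*sqrt(2)/pi * 400, about 540 V
ripple_pp = max(wave) - min(wave)  # peak-to-peak ripple without any filtering
```

Under these assumptions the unfiltered output swings between roughly 490 V and 566 V around a mean of about 540 V.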

Erne, Markus; Faller, Christof
We are addressing the following scenario: A concert is recorded with a stereo microphone technique. Additionally, several instruments/groups of instruments are recorded with spot microphones. The proposed technique estimates, adaptively in time, the acoustic transfer functions (ATFs) between the spot microphones and the left and right stereo microphones. The spot microphone signals, filtered with the estimated ATFs, are scaled and subtracted from or added to the stereo microphone signals to attenuate or amplify the corresponding instruments. No amplitude panning and reverberators are needed, while the auditory spatial image attributes of the stereo recording are not altered. Attributes other than the level of instruments can be modified by adding/subtracting spot microphone signals filtered only with the early or late part of the estimated ATFs.
Modifying Stereo Recordings using Acoustic Information Obtained with Spot Recordings

Bork, Ingolf; Goerne, Thomas; Potratz, Udo
The early reflection patterns of recording rooms strongly influence the subjective quality of stereo or multichannel recordings, especially when the microphones are situated at a distance from the source. The present paper investigates possibilities for the balance engineer to optimize the early reflection pattern at the microphone positions in a chamber music hall utilizing a computer based ray tracing model.
Designing Early Reflection Patterns Suitable for Audio Recordings by Means of Acoustic Modeling

Slotte, Benedict
3-channel recording (of front channels in 5.1 surround) presents a few problems and hard choices that are easier to handle in 2-channel recording. One of these is achieving a good enough channel separation, with or without also achieving a sharp imaging. The present paper will especially emphasize the latter, and discuss a couple of methods and present a new hybrid setup that can achieve both good channel separation and sharp imaging across the L-C-R stereo stage at the same time. Microphone setups for hall ambience recording will also be briefly discussed.
Sharpening the Image in 5.1 Surround Recording

Dobrucki, Andrzej; Plaskota, Przemyslaw
In this paper a numerical model of a human head is presented. This model is used for calculations of the Head-Related Transfer Function (HRTF). Morphological features of the human head as well as the structure of the outer ear are described. Some of these features are used in the numerical model; others are omitted. The choice is justified in detail. Possibilities for parameterizing the model according to individual anthropological features are presented.
Numerical Head Model to HRTF Simulation

Adelstein, Bernard; Anderson, Mark; Begault, Durand; McClain, Byran
Realistic simulation and perceived “immersion” within a multimodal display for telecommunication and entertainment can be enhanced, or completely mitigated, as a function of inter-modal timing asynchronies between the multimodal rendering systems (system-system asynchrony). Tolerable asynchronies have been extensively examined for auditory-visual stimuli but less so for auditory-haptic cues. Data are presented from two experiments where auditory stimuli are varied in time of arrival (lead-lag) relative to a tactile pulse. Results indicate variability between participants in terms of overall sensitivity. Generally, sensitivity is greater for auditory stimuli leading haptic stimuli, compared to the opposite condition.
Thresholds for Auditory-Tactile Asynchrony

Ellermeier, Wolfgang; Minnaar, Pauli; Sivonen, Ville
The effect of sound incidence angle on loudness is investigated in this study using binaural synthesis. Individual head-related transfer functions (HRTFs) and headphone equalization are used to present narrow-band noises from different directions to listeners. Their task is to match the loudness of these stimuli in an adaptive procedure to a reference noise in front of the listeners. The results are compared to an earlier investigation with the same experimental design in a real sound field. Based on the results the role of the individual HRTFs in loudness judgments is inspected, and finally, binaural loudness summation of signals presented to the two ears is examined.
Effect of Direction on Loudness in Individual Binaural Synthesis

Ekman, Håkan; Berg, Jan
Research on perceived depth in a sound image mainly concerns the distance to the source. Observations on the three-dimensional environment are generally of an anecdotal nature. In order to enhance the perceived spatial quality of reproduced sound, it is important to know more about how depth in recordings is perceived and generated. The aim of this study is to put focus on the concept of depth, i.e. define what sound engineers mean by depth in recordings. This has been studied through interviews with sound engineers. The most common thought is that depth is equal to the distance to the sound sources. This definition is still not enough to encompass the whole concept of depth. The experienced environment in front of the listener may also affect the perceived depth.
The Three-dimensional Acoustic Environment as Depth Cue in Sound Recordings

Lorho, Gaetan
The quality of spatial enhancement systems for stereo headphone reproduction is investigated with three different methods. Five musical programs were processed with eighteen algorithms representing different approaches to spatial enhancement for headphones. A preference test was designed to assess the comparative performance of these systems with respect to an unprocessed stereo reference. A sensory profiling experiment was also performed to explore the perceptual characteristics of sound reproduced over headphones. A panel of twelve listeners developed sixteen attributes and used these descriptive scales to evaluate a subset of eight algorithms for three of the musical programs. Finally, several objective measures were applied to all test stimuli in order to study differences between the spatial enhancement systems. Results of the three studies are analysed and compared in this paper.
Evaluation of Spatial Enhancement Systems for Stereo Headphone Reproduction by Preference and Attribute Rating

Büchner, Andreas; Edler, Bernd; Nogueira, Waldo
Current speech processing strategies for cochlear implants are based on decomposing the audio signals into multiple frequency bands, each one associated with one electrode. However, these bands are too wide to accurately encode tonal components of audio signals. To improve the encoding of tonal components and performance in cochlear implants, a new signal processing strategy has been developed. The technique is based on the principle of a so-called NofM strategy. These strategies stimulate fewer channels (N) per cycle than active electrodes (M) (NofM; N < M). Furthermore, it incorporates a fundamental frequency estimator which is used to emphasize the periodic structure of tonal components. The new technique was acutely tested on cochlear implant recipients. First intelligibility tests showed similar performance in speech perception between the new strategy and a standard NofM strategy.
Fundamental Frequency Coding in NofM Strategies for Cochlear Implants
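
The channel-selection step of an NofM strategy can be sketched as follows, assuming that in each stimulation cycle the N bands with the largest envelope values are chosen out of the M available. The abstract states neither the selection rule nor how the fundamental frequency estimator is incorporated, so both the rule and the function name are assumptions made for illustration.

```python
def select_n_of_m(envelopes, n):
    """Return the indices of the n largest band envelopes, in channel order."""
    m = len(envelopes)
    if not 0 < n <= m:
        raise ValueError("need 0 < n <= number of channels")
    ranked = sorted(range(m), key=lambda ch: envelopes[ch], reverse=True)
    return sorted(ranked[:n])

# One stimulation cycle with M = 6 band envelopes and N = 3 selected channels.
cycle = [0.10, 0.70, 0.30, 0.90, 0.05, 0.40]
active = select_n_of_m(cycle, 3)   # -> [1, 3, 5]
```

Only the selected channels would be stimulated in that cycle; the remaining M - N electrodes stay silent until a later cycle selects them.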

Gomez-Alfageme, Juan; Sanchez-Alonso, Beatriz
This paper continues the work presented at the AES 114th Convention in Amsterdam in 2003 and carries out the future work proposed there. First, a prototype of the band-pass system has been built. It is a variable-geometry system, both in the volume of the internal cavities and in the number and dimensions of the tuning tubes. Next, different measurements on the band-pass system have been made in an anechoic room. Finally, measurements and simulations were compared, as described in the previous paper. In conclusion, the results of the comparison are very positive, which validates the simulations developed and allows them to continue to be used as training tools both in teaching and in prototype development.
Simulation Tools in Electroacoustics: Comparison with Experimental Measurements

Bolaños, Fernando
The formation, synchronization and lock-in of the subharmonic frequencies, when working with the analytic form of the pressure signal, have been observed. A parametric or autoparametric mechanism seems to be the main cause of these subharmonic excitations and responses. The modal shape of the moving assembly, including the coil, the former and the suspension, plays an important role in the overall dynamics of these transducers. One of the four analyzed samples presented a bilinear response, because of spontaneous sideband formation when it was excited in specific spectral regions. Another sample showed a combined behavior, in between the bilinear character and the locked-in subharmonic. This sample exhibited regions in which the spontaneous sidebanding had a fractal structure.
Measurement and Analysis of Subharmonics and Other Distortions in Compression Drivers

Prokofieva, Elena
A theoretical investigation of conventional speaker radiation was presented at the AES 116th and 117th Conventions in 2004. The concentric ring element was introduced to simulate the speaker’s diaphragm. The technical characteristics of this element were fully described in a previous paper (preprint #6246). Four approximations of simulation models were developed to predict the behaviour of a real cone speaker. Most of the modern tweeters used in multi-way loudspeakers are rigid or soft domes. To describe the geometry of a spherical segment, an approximation with truncated cones can be used. Each cone can be represented as a number of ring elements. Two approximations of a standard dome with ring elements are provided in the present paper. The same method can be used for concave woofer drivers.
Dome Tweeter and Concave Woofer: Simulation Model Using Ring

Husnik, Libor
This article deals with the problem of signals used for driving transducers with direct digital-to-analog conversion, which are sometimes called “Digital Loudspeakers”. Many claim that the elementary transducers, which transmit bit signals, should have an ideally flat impulse response, which imposes a very strict demand on them. This article analyzes transducers in field arrangements acting as signal filters. Various types of transducer transfer functions are modeled and their influence on a digital (rectangular) signal is shown. As a result, types of transducer transfer functions for use in digital loudspeakers are discussed.
Influence of Transfer Functions of Transducers Constituting the Loudspeaker with the Direct D/A Conversion on the Performance of the System

Swarte, Peter
A single transducer that is part of a positive or negative feedback circuit can achieve active acoustic absorption and reflection. It is shown that the mobility of the diaphragm depends on the load of the voice coil. Using four-pole mathematics, we show a direct relationship between the electrical and acoustical terminals of the transducer. The test setup and the results obtained by subtraction of impulse responses are discussed. Sound transmission through the diaphragm is also part of this investigation. Possible applications are discussed.
Active Acoustic Absorption and Reflection

Nakashima, Yusuke; Ohya, Tomoyuki; Yoshimura, Takeshi
A listening approach that uses a parametric array loudspeaker system on a mobile phone is proposed, and the acoustic characteristics of a mobile-phone-size super-directional loudspeaker prototype are reported. The prototype has two sets of parametric loudspeakers that emit narrow ultrasonic beams, which are demodulated into audible frequencies by the nonlinearity of air. Using an ultrasonic carrier wave amplitude-modulated by an audible-frequency signal, we achieve audible sound pressures above 70 dB SPL at a distance of 50 cm, and also confirm the high directivity of the sound field. With the small-size super-directional loudspeaker, it is possible to provide a personal sound field for hands-free playback in mobile environments.
Prototype of Parametric Array Loudspeaker on Mobile Phone and its Acoustical Characteristics
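
The amplitude modulation described in the abstract above can be sketched as follows. This is a minimal illustration of conventional double-sideband AM, not the authors' drive scheme: the 40 kHz carrier, 192 kHz sample rate, modulation depth of 0.8 and the function name are all assumptions, since the abstract gives none of these values.

```python
import math


def am_modulate(audio, carrier_hz, sample_rate, depth=0.8):
    """Amplitude-modulate an ultrasonic carrier with an audio signal.

    audio: samples normalized to the range [-1, 1].
    depth: modulation depth (assumed value, not taken from the abstract).
    """
    drive = []
    for i, sample in enumerate(audio):
        t = i / sample_rate
        carrier = math.sin(2.0 * math.pi * carrier_hz * t)
        # Envelope (1 + depth * audio) stays positive while depth < 1.
        drive.append((1.0 + depth * sample) * carrier)
    return drive


# Hypothetical example: 1 ms of a 1 kHz tone on a 40 kHz carrier.
fs = 192000
tone = [math.sin(2.0 * math.pi * 1000.0 * i / fs) for i in range(192)]
drive = am_modulate(tone, 40000.0, fs)
```

The audible signal is carried in the envelope of the ultrasonic drive signal; in a parametric array, it is the nonlinearity of air that recovers it.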

Shen, Xiaoxiang; Shen, Yong
A model of a panel with attached mass is developed, and the modes of a distributed mode loudspeaker (DML) are analyzed with the Rayleigh-Ritz method. The distribution of bending wave modes is determined not only by the geometrical and elastic parameters of the panel but also by the mass attached to the panel, and it can be predicted by our calculation. Therefore, the mass and position of the exciters should be considered carefully in the panel optimization program. An optimal result obtained with a genetic algorithm (GA) is given.
Modal Optimization of Distributed Mode Loudspeaker

Adriano Ribeiro, Ricardo; Joaquim Serralheiro, Antonio; Simões Piedade, Moises
The nonlinear behavior of a loudspeaker is responsible for distorting the sound it reproduces. In order to decrease that distortion, nonlinear controllers are used. However, good knowledge of the loudspeaker parameters is required for the controllers to be effective. Based on both a simplified nonlinear model of the loudspeaker and a modification to adaptive filters, systems for the estimation of the mechanical and electrical parameters of the loudspeaker were developed. The application of the Kalman and RLS adaptive algorithms showed that both converge, although the convergence time for the electrical part estimation system was about 15 times slower than for the mechanical one.
Nonlinear Loudspeaker Adaptive Controllers using Kalman and RLS Adaptive Algorithms

Rousseau, Martial; Vanderkooy, John
Damping materials used in drivers often give rise to changes in resonance characteristics that can affect the acoustic properties. We use several techniques to study the properties of the viscoelastic materials commonly applied to the surround or spider. Measurements of the resonance characteristics with increasing temperature show a significant reduction of the suspension stiffness. Exercising the driver reduces the stiffness as well, and this is partly due to a combination of mechanical and thermal effects. An attempt is made to characterize the changes in terms of temperature and the history of the mechanical motion. Differential scanning calorimetry (DSC) measurements are made of some samples in an attempt to understand the underlying physical behaviour.
Visco-elastic Aspects of Loudspeaker Drivers

(C) 2005, Audio Engineering Society, Inc.