120th AES Convention - Paris, France - Dates: Saturday May 20 - Tuesday May 23, 2006 - Porte de Versailles


 

Last Updated: 20060425, mei

P17 - Posters: Signal Processing and High-Resolution Audio

Monday, May 22, 09:00 — 10:30

P17-1 All Amplifiers Are Analog, but Some Amplifiers Are More Analog than Others
Bruno Putzeys, Hypex Electronics B.V. - Groningen, The Netherlands; André Veltman, Paul van der Hulst, Piak Electronic Design b.v. - Culemborg, The Netherlands; René Groenenberg, Mueta b.v. - Wijk en Aalburg, The Netherlands
This paper intends to clarify the terms “digital” and “analog” as applied to class-D audio power amplifiers. Since loudspeaker terminals require an analog voltage, an audio power amplifier must have an analog output. If its input is digital, digital-to-analog conversion is necessarily executed at some point. Once a designer acknowledges the analog output properties of a class-D power stage, amplifier quality can improve. The incorrect assumption that some amplifiers are digital causes many designers to come up with complicated patches for ordinary analog phenomena such as timing distortion or supply rejection. This irrational approach blocks the way to a rich world of well-established analog techniques that avoid and correct many of these problems and realize otherwise unattainable characteristics such as excellent THD+N and extremely low output impedance throughout the audio band.

[Poster Presentation Associated with Paper Presentation P8-1]
Convention Paper 6690

P17-2 Toward an Ideal Switching (Class-D) Power Amplifier: How to Control the Flow of Power in a Switching Power Circuit
Rolf Esslinger, Dieter Jurzitza, Harman/Becker Automotive Systems - Karlsbad, Germany
The design of a switching (class-D) audio power amplifier suitable for high-end audio applications is still a very challenging task for circuit design and signal processing engineers. Classical power stage topologies using Pulse-Width Modulation (PWM) in combination with voltage-controlled MOSFET H-bridges are already available on the market, but their performance in terms of signal bandwidth and linearity is still far below that of traditional class-A and A/B power stages. Moreover, EMC is an issue that is very hard to control. Class-D output stages are considered from a totally different point of view in this paper. The paper examines the flow of power in the output stage, which comprises the switching power stage as a power control element, the output filter as an energy store, and the load, which acts as a power sink and, when it is a real-world loudspeaker rather than a resistor, also as a power source. It is shown where in a typical power stage the power loss, which is dissipated as heat, occurs. To improve the quality and efficiency of high-frequency switched power stages, it is necessary to investigate how to control the flow of power into the storage elements and how to charge them most precisely and most efficiently. Some fundamental approaches for this will be shown in this paper.

[Poster Presentation Associated with Paper Presentation P8-2]
Convention Paper 6691

P17-3 PWM Amplifier Control Loops with Minimum Aliasing Distortion
Lars Risbo, Texas Instruments Denmark A/S - Lyngby, Denmark; Claus Neesgaard, Texas Instruments Inc. - Dallas, TX, USA
PWM class-D audio power amplifiers typically contain a control loop filter network and a comparator producing the PWM signal. The comparator performs a sampling operation whenever it changes state. A previous paper by the author analyzed this sampling behavior from a small-signal point of view. The present paper attempts to formulate a large-signal model that accounts for the nonlinear effects of the sampling due to aliasing of high-frequency carrier components. Closed-form expressions for the intrinsic THD of the traditional first- and second-order loops are derived. The model is validated using simulations, and a class of Minimum Aliasing Error (MAE) loop filters is presented that obtains minimum aliasing distortion thanks to the use of quadrature sampling. Finally, measurement data are presented for real applications using the principles described.

[Poster Presentation Associated with Paper Presentation P8-4]
Convention Paper 6693

P17-4 Simple, Ultralow Distortion Digital Pulse Width Modulator
Bruno Putzeys, Hypex Electronics B.V. - Groningen, The Netherlands
A core problem with digital pulse width modulators is that effective sampling occurs at signal-dependent intervals, falsifying the z-transform on which the input signal and the noise shaping process are based. As a first step, the noise shaper is reformulated to operate at the timer clock rate instead of the pulse repetition frequency. This solves the uniform/natural sampling problem, but gives rise to new nonlinearities akin to ripple feedback in analog modulators. By modifying the feedback signal such that it reflects only the modulated edge of the pulse train, this effect is practically eliminated, yielding vastly reduced distortion without increasing complexity.

[Poster Presentation Associated with Paper Presentation P8-5]
Convention Paper 6694

P17-5 A High Performance Open Loop All-Digital Class-D Audio Power Amplifier Using Zero Positioning Coding (ZePoC)
Olaf Schnick, Wolfgang Mathis, University of Hannover - Hannover, Germany
Open-loop all-digital class-D amplifiers are uncommon because the lack of a correcting feedback path leads to several problems that result in high distortion compared to analog-controlled class-D amplifiers. This paper shows that SB-ZePoC lowers the switching frequency to 100 kHz, which allows these problems to be solved, so that it is possible to design an open-loop all-digital class-D audio amplifier with low total distortion over the whole audio band (20 Hz to 20 kHz) and an efficiency that reaches 90 percent. Results from a test setup will be presented. The sonic performance will be demonstrated during the session.

[Poster Presentation Associated with Paper Presentation P8-6]
Convention Paper 6695

P17-6 A Three-Level Trellis Noise Shaping Converter for Class D Amplifiers
Ludovico Ausiello, Riccardo Rovatti, University of Bologna - Bologna, Italy; Gianluca Setti, University of Ferrara - Ferrara, Italy
Class D amplifiers can represent signals with three different output levels, +Vcc, 0, -Vcc, with no distortion. To exploit this for better performance without increasing the switching frequency, an extension of the classic two-level pulse width modulation A/D conversion is proposed. Coding is achieved by extending the output waveforms of a trellis-based sigma delta modulation to three levels. Simulation results show that, at the same symbol rate, the three-level pattern achieves 3.7 to 8.2 dB of SINAD improvement and up to 5 times lower power consumption.

[Poster Presentation Associated with Paper Presentation P8-7]
Convention Paper 6696

P17-7 Using SIP Techniques to Verify the Trade-Off between SNR and Information Capacity of a Sigma Delta Modulator
Charlotte Yuk-Fan Ho, Joshua Reiss, Queen Mary, University of London - London, UK; Bingo Wing-Kuen Ling, King’s College London - London, UK
The Gerzon-Craven noise shaping theorem states that the ideal information capacity of a sigma delta modulator design is achieved if and only if the noise transfer function (NTF) is minimum phase. In this paper it is found that there is a trade-off between the signal-to-noise ratio (SNR) and the information capacity of the noise shaped channel. In order to verify this result, loop filters satisfying and not satisfying the minimum phase condition of the NTF are designed via semi-infinite programming (SIP) techniques and solved using dual parameterization. Numerical simulation results show that the design with a minimum phase NTF comes close to the ideal information capacity of the noise shaped channel, but its SNR is low. On the other hand, the design with a nonminimum phase NTF achieves a positive value of the information capacity of the noise shaped channel, but its SNR is high. Results are also provided that compare the SIP design technique with Butterworth and Chebyshev structures and ideal theoretical SDMs, and evaluate the performance in terms of SNR and a variety of information theoretic measures that capture noise shaping qualities.

[Poster Presentation Associated with Paper Presentation P8-8]
Convention Paper 6697

P17-8 Estimation of Initial States of Sigma-Delta Modulators
Charlotte Yuk-Fan Ho, Queen Mary, University of London - London, UK; Bingo Wing-Kuen Ling, King’s College London - London, UK; Joshua Reiss, Queen Mary, University of London - London, UK
In this paper an initial condition of a sigma-delta modulator is estimated based on quantizer output bit streams and an input signal. The set of initial conditions that generate a stable trajectory is characterized. It is found that this set, as well as the set of initial conditions corresponding to the quantizer output bit streams, is convex. Also, it is found that the mapping from the set of initial conditions to the stable admissible set of quantizer output bit streams is invertible if the loop filter is unstable. Hence, the initial condition corresponding to given stable admissible quantizer output streams and an input signal is uniquely defined when the loop filter is unstable, and a projection onto convex sets approach is employed for approximating the initial condition.

[Poster Presentation Associated with Paper Presentation P8-9]
Convention Paper 6698

P17-9 Clean Clocks, Once and for All?
Christian G. Frandsen, TC Electronic A/S - Risskov, Denmark; Chris Travis, Sonopsis Ltd. - Wotton-under-Edge, Gloucestershire, UK
Network-based digital audio interfaces are becoming increasingly popular, but they pose a significant jitter problem wherever high-quality conversion to/from analog is required. This is true even with networks such as 1394 that provide dedicated support for isochronous flows. Conventional PLL solutions have too little jitter attenuation, too much intrinsic jitter, and/or too narrow a frequency range. More advanced solutions tend to cost too much. A new clocking technology that boasts high performance and low cost is presented. It has been implemented in a recent audio-over-1394 chip. We show comparative performance results and explore system-level implications, including for systems that use point-to-point links such as AES3, SPDIF, and ADAT.

[Poster Presentation Associated with Paper Presentation P8-11]
Convention Paper 6700

P17-10 SigmaStudio. A User-Friendly, Intuitive and Expandable, Graphical Development Environment for Audio/DSP Applications
Miguel Chavez, Camille Huin, Analog Devices, Inc. - Wilmington, MA, USA
Graphical development environments have been used in the audio industry for a number of years. Those with fewer limitations have persisted and built a well-established pool of users who are reluctant to modify their design patterns and adopt different embedded processors and design environments. This paper provides a brief history of the evolution of integrated development environments (IDEs). It then describes and explains the software architecture decisions made, and the design challenges faced, in developing SigmaStudio. It also shows the advantages that those decisions have brought to the SigmaDSP family of audio-centric embedded processors.

[Poster Presentation Associated with Paper Presentation P12-1]
Convention Paper 6714

P17-11 Adaptive Filters in Wavelet Transform Domain
Vladan Bajic, Audio-Technica US - Stow, OH, USA
This paper presents a performance comparison between two methods of implementing adaptive filtering algorithms for noise reduction, namely the Normalized time domain Least Mean Squares (NLMS) algorithm and the Wavelet transform domain LMS (WLMS). A brief theoretical development of both methods is given, and then both algorithms are implemented on a real-time Digital Signal Processing (DSP) system used for audio signal processing. Results are presented showing the performance of each algorithm in both the time and frequency domains. Noise reduction effects produced by the different algorithms are shown across the spectrum, and distorting effects are analyzed. Trade-offs of convergence speed versus added noise are analyzed. Overall results show an improvement in convergence speed when using WLMS algorithms over the NLMS algorithm.

[Poster Presentation Associated with Paper Presentation P12-3]
Convention Paper 6716
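
For readers unfamiliar with the time-domain baseline named in the abstract above, the following is a minimal sketch of a textbook NLMS update. It is not taken from the paper: the function name `nlms`, the step size `mu`, the regularizer `eps`, and the two-tap noise path used in the demonstration are all assumptions chosen for illustration, and the wavelet-domain variant (WLMS) is not shown.

```python
import random


def nlms(reference, desired, num_taps, mu=0.5, eps=1e-8):
    """Textbook NLMS adaptive filter (illustrative sketch, not the paper's code).

    reference: samples correlated with the noise to be removed
    desired:   primary signal containing that noise
    Returns (error signal, final filter weights).
    """
    weights = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent reference samples, newest first
    errors = []
    for x, d in zip(reference, desired):
        buf = [x] + buf[:-1]
        y = sum(w * b for w, b in zip(weights, buf))  # filter output
        e = d - y                                      # residual after cancellation
        energy = eps + sum(b * b for b in buf)         # normalization term
        step = mu * e / energy
        weights = [w + step * b for w, b in zip(weights, buf)]
        errors.append(e)
    return errors, weights


# Hypothetical test case: the primary signal is the reference noise passed
# through an assumed two-tap path [0.6, -0.3], with no other signal present.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
primary = [0.6 * noise[n] - 0.3 * (noise[n - 1] if n > 0 else 0.0)
           for n in range(len(noise))]
errors, weights = nlms(noise, primary, num_taps=2)
print(weights)
```

Dividing the step by the energy of the buffered reference samples is what distinguishes NLMS from plain LMS; it makes the effective step size independent of the input level.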

P17-12 Adaptive Time-Frequency Resolution for Analysis and Processing of Audio
Alexey Lukin, Moscow State University - Moscow, Russia; Jeremy Todd, iZotope, Inc. - Cambridge, MA, USA
Filter banks with fixed time-frequency resolution, such as the Short-Time Fourier Transform (STFT), are a common tool for many audio analysis and processing applications and allow efficient implementation via the Fast Fourier Transform (FFT). The fixed time-frequency resolution of the STFT can lead to undesirable smearing of events in both time and frequency. In this paper we suggest adaptively varying the STFT time-frequency resolution in order to reduce filter bank-specific artifacts while retaining adequate frequency resolution. Several strategies for systematic adaptation of time-frequency resolution are proposed. The introduced approach is demonstrated as applied to spectrogram displays, noise reduction, and spectral effects processing.

[Poster Presentation Associated with Paper Presentation P12-4]
Convention Paper 6717

P17-13 Advanced Methods for Shaping Time-Frequency Areas for the Selective Mixing of Sounds
Piotr Kleczkowski, AGH University of Science and Technology - Krakow, Poland; Adam Kleczkowski, University of Cambridge - Cambridge, UK
The “Selective Mixing of Sounds” (AES 119th Convention Paper 6552) contains a large and conceptually challenging part that had not been developed previously: a method of determining the areas of dominance of different tracks in the time-frequency plane. It has a major effect on the overall quality of the sound. In this paper we propose and compare a range of appropriate algorithms. We begin with a simple two-dimensional running mean combined with a rule selecting the track characterized by the maximum energy, followed by a low-pass filter based on the two-dimensional Fourier transform. We also propose two novel methods based on the Monte Carlo approach, in which local probabilistic rules are iterated many times to produce a required level of smoothing.

[Poster Presentation Associated with Paper Presentation P12-5]
Convention Paper 6718
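
The first algorithm mentioned in the abstract above, a two-dimensional running mean followed by a maximum-energy rule, can be sketched roughly as follows. This is only one plausible reading of that single sentence: the window shape, the `radius` parameter, the edge handling, and the toy energy grids are assumptions, and the Fourier-based low-pass filter and the Monte Carlo methods are not shown.

```python
def running_mean_2d(grid, radius=1):
    """Average each time-frequency cell with its neighbours (window clipped at edges)."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for t in range(rows):
        for f in range(cols):
            total, count = 0.0, 0
            for tt in range(max(0, t - radius), min(rows, t + radius + 1)):
                for ff in range(max(0, f - radius), min(cols, f + radius + 1)):
                    total += grid[tt][ff]
                    count += 1
            out[t][f] = total / count
    return out


def dominance_map(track_energies, radius=1):
    """For every cell, return the index of the track with the largest smoothed energy."""
    smoothed = [running_mean_2d(g, radius) for g in track_energies]
    rows, cols = len(smoothed[0]), len(smoothed[0][0])
    result = []
    for t in range(rows):
        row = []
        for f in range(cols):
            energies = [s[t][f] for s in smoothed]
            row.append(energies.index(max(energies)))
        result.append(row)
    return result


# Hypothetical energy grids indexed as [time][frequency] for two tracks.
# Track B has one isolated loud cell at (1, 0) inside track A's region.
track_a = [[4, 4, 0], [4, 4, 0], [4, 4, 0]]
track_b = [[0, 0, 5], [5, 0, 5], [0, 0, 5]]

print(dominance_map([track_a, track_b], radius=0))  # no smoothing
print(dominance_map([track_a, track_b], radius=1))  # 3x3 running mean
```

With `radius=0` the isolated cell is assigned to track B; with the 3x3 running mean it is absorbed into track A's region, which is the kind of smoothing of dominance areas the abstract refers to.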

P17-14 Demixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
Marc Vinyes, Jordi Bonada, Alex Loscos, Pompeu Fabra University - Barcelona, Spain
Blind audio separation in real commercial music recordings is still an open problem. In the last few years some techniques have provided interesting results. This paper presents a human-assisted clustering of the DFT coefficients for the time-frequency masking demixing technique. The DFT coefficients are grouped by adjacent pan, interchannel phase difference, magnitude, and magnitude variance with a real-time interactive graphical interface. Results show that an implementation of this technique can be used to demix tracks from present-day commercial songs. Sample sounds can be found at http://www.iua.upf.es/~mvinyes/abs/demos.

[Poster Presentation Associated with Paper Presentation P12-6]
Convention Paper 6719

P17-15 Enhanced Control of Sound Field Radiated by Co-Axial Loudspeaker Systems Using Digital Signal Processing Techniques
Hmaied Shaiek, ENST de Bretagne - Brest Cedex, France; Bernard Debail, Cabasse Acoustic Center - Plouzané, France; Jean Marc Boucher, ENST de Bretagne - Brest Cedex, France; Yvon Kerneis, Pierre Yves Diquelou, Cabasse Acoustic Center - Plouzané, France
In multiway loudspeaker systems, digital signal processing techniques have been used so far mainly to correct frequency response, time alignment, and off-axis lobing. In this paper a dedicated signal processing technique is described that also controls the sound field radiated by co-axial loudspeaker systems in the overlap frequency band of the drivers. Trade-offs and practical constraints (crossover, time shift, gain, etc.) are discussed, and an optimization algorithm is proposed to provide the best achievable result. A real-time implementation of this technique is presented and leads to a nearly ideal point source.

[Poster Presentation Associated with Paper Presentation P12-10]
Convention Paper 6723

P17-16 Network Music Performance (NMP) in Narrow Band Networks
Alexander Carôt, International School of New Media (ISNM) - Lübeck, Germany; Ulrich Krämer, Gerald Schuller, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany
Playing live music on the Internet is one of the hardest disciplines in terms of low-delay audio capture and transmission, time synchronization, and bandwidth requirements. This has already been successfully evaluated with the Soundjack software, which can be described as a low-latency UDP streaming application. In combination with the new Fraunhofer ULD Codec, this technology could now be used in narrow band DSL networks without a significant increase in latency. This paper first describes the essential basics of network music performances in terms of soundcard and network issues and finally reviews the context under DSL narrow band network restrictions and the usage of the ULD Codec.

[Poster Presentation Associated with Paper Presentation P12-11]
Convention Paper 6724

P17-17 Intensive Noise Reduction Utilizing Inharmonic Frequency Analysis of GHA
Teruo Muraoka, University of Tokyo - Komaba Meguro-ku, Tokyo, Japan; Ryuji Takamizawa, Matsushita Electric Industrial Co., Ltd. - Kadoma City, Osaka, Japan; Yoshihiro Kanda, Musashi Institute of Technology - Tamadutumi Setagaya, Tokyo, Japan; Takumi Ohta, Kenwood Corporation - Hachiouji City, Tokyo, Japan
Removal of noise in SP record reproduction was attempted utilizing GHA (Generalized Harmonic Analysis) as inharmonic frequency analysis. Spectrum subtraction is the most common conventional noise reduction technique; however, it has the side effect of generating musical noise, which is caused by the inaccurate frequency resolution inherent in conventional harmonic frequency analysis. GHA, one method of inharmonic frequency analysis, offers excellent frequency resolution and has recently been put to practical use. The authors applied GHA to noise reduction and obtained better results than those of conventional spectrum subtraction. However, musical noise problems still remained, chiefly because of a spectral mismatch between the pre-sampled reference noise and the actual residual noise. The authors tried several countermeasures, such as pre-spectral shaping of the object signal and spectral similarity calculation of the residual noise. By combining these countermeasures, the authors achieved satisfactory noise reduction.

[Poster Presentation Associated with Paper Presentation P12-12]
Convention Paper 6725

P17-18 Multichannel Noise-Reduction-Systems for Speaker Identification in an Automotive Environment
Volker Mildner, Stefan Goetze, Karl-Dirk Kammeyer, University of Bremen - Bremen, Germany
Devices for communication and information used by car drivers face two essential requirements: hands-free operation via distant microphones and robustness against different noises depending on car speed, etc. Automatic speaker identification can be utilized within such devices either to supply speech recognition systems with so-called a priori information to achieve higher recognition rates or even to enable applications such as heating systems to adjust to the preferences of the driver. Thus identifying the driver from a predefined group of possible system users may be a task for future applications. The aim of this paper is to investigate to what extent multichannel noise reduction systems are suitable for improving the performance of speaker identification algorithms under different acoustic conditions in an automotive environment.
Convention Paper 6756

P17-19 Optimal Quantized Linear Prediction Coefficients for Lossless Audio Compression: Scalar Quantization Revisited
Florin Ghido, Tampere University of Technology - Tampere, Finland
Uniform scalar quantization of linear prediction coefficients is traditionally done by multiplying each coefficient by Q = 2^B and rounding it to the nearest integer. We propose an improved, optimal quantization method that replaces the rounding with a more elaborate procedure. The method uses 2 fewer bits per quantized prediction coefficient for a similar misadjustment and allows an accurate estimate of the misadjustment as a function of Q. We introduce several efficient time-constrained probabilistic search methods for obtaining near-optimal solutions. No changes are required at the decoder, and the method is applicable to a wider range of cases (mono, stereo, and multichannel prediction) than quantization of reflection coefficients. Moreover, it enables near-optimal compression for 24-bit audio using only 32-bit arithmetic operations.
Convention Paper 6757
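
The traditional scheme described in the first sentence of the abstract above (scale by Q = 2^B, then round) can be illustrated with a short sketch. It shows only that baseline; the improved search procedure proposed in the paper is not specified in the abstract and is not reproduced here. The function names and the example coefficients are assumptions made for illustration.

```python
import math


def quantize_coefficients(coeffs, bits):
    """Baseline quantizer: multiply by Q = 2**bits and round to the nearest integer."""
    q = 1 << bits
    # floor(x + 0.5) rounds halves upward, avoiding Python's round-half-to-even.
    return [math.floor(c * q + 0.5) for c in coeffs]


def dequantize_coefficients(ints, bits):
    """Map the integers back to coefficient values, as a decoder would."""
    q = 1 << bits
    return [i / q for i in ints]


# Hypothetical prediction coefficients, quantized with B = 5 (Q = 32).
coeffs = [1.8125, -0.90625, 0.3]
bits = 5
ints = quantize_coefficients(coeffs, bits)
restored = dequantize_coefficients(ints, bits)
max_error = max(abs(a - b) for a, b in zip(coeffs, restored))

print(ints)       # [58, -29, 10]
print(restored)   # [1.8125, -0.90625, 0.3125]
print(max_error <= 1 / (2 << bits))  # True: error is at most 1/(2Q)
```

Because each coefficient is rounded independently, the per-coefficient error is bounded by 1/(2Q), but nothing in this baseline accounts for how the rounding errors interact in the predictor.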

P17-20 Efficient Out of Head Localization System for Mobile Applications
Tacksung Choi, Yonsei University - Seoul, Korea; Young Cheol Park, Yonsei University - Wonju, Korea; Dae Hee Youn, Yonsei University - Seoul, Korea
Headphone reproduction of stereo sources often results in in-the-head localization. One possible solution to this problem is to add directional filtering and room response to the headphone reproduction system. Conventional out of head localization (OHL) schemes usually consist of a tapped delay line to simulate the direct signal path and early room reflections. Each of the taps must be filtered by a pair of HRTFs, which leads to a very high processing cost. Our study is based on the fact that spatial impression (SI) can increase the effects of OHL. Our aim is to generate the maximum SI at minimum cost. Through subjective listening tests, the degree of SI was found to be greatest for reflections within a 15 to 30 ms time frame after the direct sound, and for those in the opposite direction to the listener’s ears. Based on the test results, we propose an efficient OHL system. In the proposed system, multiple reflections are replaced by a pair of reflections, and the HRTF filtering required to simulate the directivity of the reflections is implemented using a set of first-order IIR shelving filters. Subjective tests show that the proposed system creates OHL efficiently at a low computational cost, and that its performance is comparable to that of the conventional high-complexity scheme.
Convention Paper 6758

P17-21 A Psychoacoustic Noise Reduction Approach for Stereo Hands-Free Systems
Stefan Goetze, Volker Mildner, Karl-Dirk Kammeyer, University of Bremen - Bremen, Germany
One demand on comfortable, high-quality hands-free video conferencing systems is the transmission of a spatial acoustical impression. A major task is therefore the transmission of stereo speech signals from a noisy environment. The suppression of the noise components must not corrupt the stereo effect. In this context, different single channel, multichannel, and hybrid speech enhancement systems are evaluated in this paper. The problem of musical noise in post-filter algorithms is addressed by considering a psychoacoustic masking threshold for the noise reduction algorithms.
Convention Paper 6759

P17-22 Estimation of Talker’s Head Orientation Based on Oriented Global Coherence Field
Alessio Brutti, ITC-irst - Trento, Italy, and Università di Trento - Trento, Italy; Maurizio Omologo, Piergiorgio Svaizer, ITC-irst - Trento, Italy
This work describes a new method for estimating the orientation of a non-omnidirectional sound source given a distributed microphone network. The technique requires that a set of microphone pairs be distributed over a room, and it exploits the coherence computed from each sensor pair in order to derive an estimate of the head orientation. A database consisting of an audio sequence reproduced by a loudspeaker with different orientations and different positions was collected in order to evaluate the algorithm behavior. Experiments conducted on that database show that our approach can provide an efficient estimate of the sound source orientation, with an RMS error of about 10 degrees. Satisfactory performance was confirmed by tests with real human speakers.
Convention Paper 6760

P17-23 High-Quality Blind Bandwidth Extension of Audio for Portable Player Applications
Manish Arora, Joonhyun Lee, Sangil Park, Samsung Electronics Co. Ltd. - Suwon, Korea
Bandwidth limitation in lossy audio coding schemes significantly reduces the perceived quality. High frequency bandwidth extension schemes have been proposed but are difficult to implement where they are needed most: in portable audio devices with severe complexity constraints. This paper describes a high-quality blind bandwidth extension method that proposes efficient initial audio bandwidth detection, band-based nonlinear processing, and simple regenerated spectral envelope shaping enhancements. Objective and subjective measurements of the processed signal have yielded significant quality improvements with very low complexity requirements, allowing easy implementation on a wide variety of portable player platforms.
Convention Paper 6761

P17-24 Coherence Enhanced Minimum Statistics Spectral Subtraction in Bimicrophone Systems
Jonathan Fillion-Deneault, Roch Lefebvre, Sherbrooke University - Sherbrooke, Quebec, Canada
A novel system for 2-channel spectral subtraction is presented. The objective is to improve the intelligibility of speech in noisy environments by enhancing the noise reduction of single microphone techniques as well as to greatly reduce the amount of musical noise that they introduce. The system consists of two blocks: a generalized spectral subtraction block on the primary channel, using minimum statistics for noise estimation, followed by a coherence-based post-filter for additional noise suppression. Subjective and objective testing of both simulated and real-world recordings shows that listeners prefer the proposed system to other state-of-the-art speech enhancement techniques.
Convention Paper 6762

P17-25 Sound Field Analysis Based on Generalized Prolate Spheroidal Wave
Mathieu Guillaume, Yves Grenier, Télécom Paris - Paris, France
In this paper an array process to improve the quality of sound field analysis, which aims to extract spatial properties of a sound field, is described. In this domain, spatial aliasing inevitably occurs due to the finite number of microphones used in the array. It is linked to the Fourier transform of the discrete analysis window, which consists of a mainlobe, fixing the resolution achievable by the spatial analysis, and of sidelobes, which degrade the quality of spatial analysis by introducing artifacts not present in the original sound field. A method to design an optimal analysis window with respect to a particular wave vector is presented, aiming to realize the best localization possible in the wave vector domain. The efficiency of the approach is then demonstrated for several geometrical configurations of the microphone array, over the whole bandwidth of sound fields.
Convention Paper 6763

P17-26 Optimization of Co-centered Rigid and Open Spherical Microphone Arrays
Abhaya Parthy, Craig Jin, André van Schaik, University of Sydney - Sydney, New South Wales, Australia
We present a novel microphone array that consists of an open spherical array with a smaller rigid spherical array at its center. The distribution of microphones that gives the array the largest frequency range for a given beamforming order was obtained by analyzing microphone errors. For a fixed number of microphones, the results for several examples indicate that the maximum frequency range is obtained when the microphones are relatively evenly distributed between the open and rigid spheres.
Convention Paper 6764

P17-27 Review and Discussion on Classical STFT-Based Frequency Estimators
Michaël Betser, Patrice Collen, France Télécom R&D - Cesson-Sévigné, France; Gaël Richard, Bertrand David, Télécom Paris - Paris, France
Sinusoidal modeling is based on the decomposition of audio signals into a sum of sinusoidal components plus a noise residual part. It involves accurate estimation of sinusoid parameters and, in particular, accurate frequency estimation. A broad category of methods uses the Fast Fourier Transform (FFT) as a starting point to compute frequency. All these methods present very similar forms of estimators, but the relations between them are not yet fully understood. This paper proposes to take a deeper look into these relations. The first goal of this paper is to present a clear review and description of the classical FFT-based frequency estimators. A new estimator similar to the phase vocoder is presented. The second goal of this paper is to identify the common hypotheses and the common steps of the processes for this category of estimators. Last, experimental comparisons are given.
Convention Paper 6765

P17-28 Accurate Phase Estimation for Chirp-Like Signals
Michaël Betser, Patrice Collen, Jean-Bernard Rault, France Télécom R&D - Cesson-Sévigné, France
Sinusoidal modeling relies on the decomposition of a given signal (continuous or discrete) into a set of sinusoidal components plus a residual signal. The sinusoidal parameters, namely the amplitude, frequency, and phase, may vary over time. Generally, the tracking of these parameters is performed via Short-Time Fourier Transform (STFT) analysis, which ultimately provides, for each sinusoidal component, estimates of the amplitude, frequency, and phase for a considered time slot. The duration of the analysis time slots is chosen in order to guarantee that the signal under analysis is stationary enough to deliver useful data. If this requirement is not met, in particular if the frequency varies within the analysis slot, the phase estimation is biased. This paper introduces a method to estimate and to correct this bias as a function of the analysis parameters (window type and size) and of the frequency slope.
Convention Paper 6766

P17-29 Equalization of Audio Systems Using Kautz Filters with Log-Like Frequency Resolution
Tuomas Paatero, Matti Karjalainen, Helsinki University of Technology - Espoo, Finland
This paper presents a new digital filtering approach to the equalization of audio systems such as loudspeaker and room responses. The equalization scheme utilizes a particular infinite impulse response (IIR) filter configuration called Kautz filters, which can be seen as generalizations of finite impulse response (FIR) filters and their warped counterparts. The desired frequency resolution allocation, in this case the logarithmic one, is attained by a chosen set of fixed pole positions that define the particular Kautz filter. The frequency resolution mapping is characterized by the all-pass part of the Kautz filter, which is interpreted as a formal generalization of the warping concept. The second step in the actual equalizer design consists of assigning the Kautz filter tap-output weights, which is, in turn, more or less a standard least-squares configuration. The proposed method is demonstrated using measured loudspeaker and room responses.
Convention Paper 6767


   
  (C) 2006, Audio Engineering Society, Inc.