120th AES Convention - Paris, France - Dates: Saturday May 20 - Tuesday May 23, 2006 - Porte de Versailles

AES Paris 2006
Paper Session Details

P1 - Microphones

Saturday, May 20, 09:00 — 11:40

Chair: Helmut Wittek, SCHOEPS Mikrofone GmbH - Karlsruhe, Germany

P1-1 The Effect of the Singer’s Head on Vocalist Microphones
Martin Schneider, Georg Neumann GmbH - Berlin, Germany
Vocalist microphones are often optimized for theoretically perfect polar patterns, e.g., cardioid, supercardioid or hypercardioid. The polar pattern can be maintained very well if the microphone is placed in the free-field, with no obstacle around it. When the singer approaches the microphone, the head serves as a reflective and diffractive obstacle. Consequently the far-field polar patterns and frequency responses are distorted, making the microphones more prone to feedback in live amplification situations, and altering the sound of the “spill” in pure recording situations.

Presentation is scheduled to begin at 09:00
Convention Paper 6634 (Purchase now)

P1-2 Wind Generated Noise in Microphones—An Overview: Part 1
Eddy B. Brixen, DPA Microphones A/S - Allerød, Denmark, EBB-consult, Smorum, Denmark; Ruben Hensen, DPA Microphones - Allerød, Denmark
When microphones are exposed to wind, noise is generated. The amount of noise depends on many factors, the speed and direction of the wind being, of course, two of the most important. However, the size, shape, and design principles of the microphones are also very important. At higher wind speeds, not only is noise generated but distortion is also introduced, normally as a result of clipping. This paper presents comparative measurements that provide an overview of the parameters influencing wind noise generated in pressure and pressure-gradient condenser microphones.

Presentation is scheduled to begin at 09:20
Convention Paper 6635 (Purchase now)

P1-3 P-MOS FET Application for Silicon Condenser Microphones
Norihiro Arimura, Juro Ohga, Shibaura Institute of Technology - Minato-ku, Tokyo, Japan; Norio Kimura, Yoshinobu Yasuno, Panasonic Semiconductor Device Solutions Co., Ltd. - Tsuzuki-ku, Yokohama, Japan
Electret condenser microphones (ECMs) are widely used as general-purpose microphone devices. Each year they are further miniaturized and their operating voltage lowered to reduce cellular phone power consumption. Although current ECMs have become small and thin, the built-in FET has not been designed for low-voltage operation despite its small package. This paper focuses on a low-current-consumption P-MOS FET, realized in a CMOS process, to support further miniaturization and improved performance. The authors designed and tested prototype microphone units and compared their basic performance with that of conventional ECMs.

[Associated Poster Presentation in Session P5, Saturday, May 20, at 14:00]

Presentation is scheduled to begin at 09:40
Convention Paper 6636 (Purchase now)

P1-4 Development of a Super-Wide-Range Microphone
Kazuho Ono, Hayao Tanabe, Masakazu Iwaki, Akio Ando, NHK Science and Technical Research Laboratories - Kinuta Setagaya-ku, Tokyo, Japan; Keishi Imanaga, Sanken Microphone Co. Ltd. - Suginami-ku, Tokyo, Japan
This paper describes the development of a low-noise, high-sensitivity microphone with a wide frequency range. Microphones of this kind are needed to provide high quality sound sources for use in studies on the perceptual discrimination between musical sounds with and without very high frequency components. Conventional electrostatic microphones cannot be used for such recordings because conventional methods for expanding the frequency range use a small diaphragm that degrades the S/N ratio. The proposed microphone has a new design in which the frequency range is expanded in two ways, using both the diffraction and the resonance due to the microphone’s diaphragm. These effects are generally thought to define the upper limit of the frequency range, but the authors have made active use of them to achieve both a wide frequency range and high sensitivity. The body shape was designed with the help of a scale model study. An omnidirectional, electrostatic microphone that picks up sounds of up to 100-kHz with low noise has been developed.

[Associated Poster Presentation in Session P5, Saturday, May 20, at 14:00]

Presentation is scheduled to begin at 10:00
Convention Paper 6637 (Purchase now)

P1-5 Listening Broadband Physical Model for Microphones: A First Step
Laurent Millot, Université Paris - Paris, France, ENS Louis-Lumiere, Noisy Le Grand, France; Antoine Valette, Manuel Lopes, ENS Louis-Lumiere - Noisy Le Grand, France; Gérard Pelé, Université Paris - Paris, France, ENS Louis-Lumiere, Noisy Le Grand, France; Mohammed Elliq, ENS Louis-Lumiere - Noisy Le Grand, France; Dominique Lambert, Université Paris - Paris, France, ENS Louis-Lumiere, Noisy Le Grand, France
We will present the first step in the design of a broadband physical model for microphones. Within the proposed model, classical directivity patterns (omnidirectional, bidirectional, and the cardioid family) are found as limit cases: monochromatic excitation, low frequency, and far-field approximation. Monophonic pieces of music are used as sources for the model so we can listen to the simulation of the associated recorded sound field in real time thanks to a Max/MSP application. Listening and subband analysis show that the directivity is a function of frequency subband and source location. This model also exhibits an interesting proximity effect. Audio demonstrations will be given.

[Associated Poster Presentation in Session P5, Saturday, May 20, at 14:00]

Presentation is scheduled to begin at 10:20
Convention Paper 6638 (Purchase now)

P1-6 Measuring the Perceived Differences between Similar High-Quality Microphones
Douglas McKinnie, Consultant - Guildford, Surrey, UK
Microphones of similar construction and polar-pattern that can be equalized to have nearly identical on-axis frequency response still are reported to have different sonic character. To help develop a model of how other physical measurements could predict the subjective sonic character, perceptual data was collected from a panel of listeners. The listeners individually made dissimilarity ratings of pair-wise comparisons of nine versions of a single piano performance. Each version was recorded with a different model of small-diaphragm cardioid condenser microphone. The data was collected in order to derive a stimulus space showing the most salient dimensions upon which the perceived timbre of the microphones differed.

[Associated Poster Presentation in Session P5, Saturday, May 20, at 14:00]

Presentation is scheduled to begin at 10:40
Convention Paper 6639 (Purchase now)

P1-7 The Native B-Format Microphone: Part II
Eric Benjamin, Dolby Laboratories - San Francisco, CA, USA; Thomas Chen, Studio C - Stockton, CA, USA
Part I of this paper (119th AES Convention Paper 6621) described the objective performance of tetrahedral cardioid arrays versus arrays comprised of discrete pressure and pressure gradient microphone capsules. In the present paper the results of direct listening comparisons between the two types of arrays are given. Simultaneous recordings were made using pairings of the arrays for subsequent comparisons. The sources include both speech and music, and the environments include a range from very dry to very reverberant. The recordings were compared in both horizontal-only and in periphonic reproduction systems.

Presentation is scheduled to begin at 11:00
Convention Paper 6640 (Purchase now)
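
For readers unfamiliar with the tetrahedral arrays compared in P1-7, the usual sum/difference conversion from the four capsule (A-format) signals to B-format can be sketched as below. The FLU/FRD/BLD/BRU capsule naming and orientation are the common convention assumed here, the per-capsule equalization a practical array needs is omitted, and nothing in the sketch is specific to the arrays measured in the paper.

```python
import numpy as np

def a_to_b_format(flu, frd, bld, bru):
    """Sum/difference conversion from tetrahedral A-format to B-format.

    flu, frd, bld, bru: capsule signals (front-left-up, front-right-down,
    back-left-down, back-right-up). Capsule calibration/equalization
    filters, which a real array requires, are omitted here.
    """
    w = flu + frd + bld + bru          # omnidirectional component
    x = flu + frd - bld - bru          # front minus back
    y = flu - frd + bld - bru          # left minus right
    z = flu - frd - bld + bru          # up minus down
    return np.stack([w, x, y, z])
```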

P1-8 Influence of Components Precision on Characteristics of Dual Microphone Arrays
Alexander Valitov, Alango Ltd. - St. Petersburg, Russia; Alexander Goldin, Alango Ltd. - Haifa, Israel
Microphone arrays have great potential in practical applications due to their ability to significantly improve speech quality and signal-to-noise ratio in noisy environments. A large number of scientific papers and patents have been devoted to algorithmic techniques for producing the optimal output of microphone arrays under different optimization criteria. In practice, however, the performance of microphone arrays depends to a large extent on the quality of their components: amplitude matching, phase matching, errors in the distance between microphones, etc. This paper analyzes the dependence of dual microphone array characteristics on these factors.

[Associated Poster Presentation in Session P5, Saturday, May 20, at 14:00]

Presentation is scheduled to begin at 11:20
Convention Paper 6641 (Purchase now)
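
One way to see why component tolerances matter for a dual-microphone array (P1-8) is to evaluate how a small gain mismatch between the two capsules fills in the rear null of a first-order differential pair in a free-field model. The spacing, frequency, and mismatch values below are illustrative assumptions, not figures from the paper.

```python
import numpy as np

c = 343.0        # speed of sound (m/s)
d = 0.01         # capsule spacing (m), an assumed value
f = 1000.0       # evaluation frequency (Hz)
w = 2 * np.pi * f
tau = d / c      # internal delay that would give an ideal cardioid

theta = np.linspace(0, np.pi, 181)

def response_db(gain_mismatch_db):
    g = 10.0 ** (gain_mismatch_db / 20.0)
    # y = mic1 - g * delay(mic2, tau) for a plane wave arriving from theta
    h = 1.0 - g * np.exp(-1j * w * (d * np.cos(theta) / c + tau))
    return 20 * np.log10((np.abs(h) + 1e-12) / (np.abs(h[0]) + 1e-12))

for mismatch in (0.0, 0.25, 0.5, 1.0):
    rear = response_db(mismatch)[-1]
    print(f"{mismatch:4.2f} dB amplitude mismatch -> rear rejection {rear:6.1f} dB")
```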


P2 - Analysis and Synthesis of Sound; Mobile Phone Audio; Automotive Audio

Saturday, May 20, 09:00 — 11:40

Chair: Tim Brooks, Institute of Sound Recording - Surrey, UK

P2-1 Application of Segmentation and Thumbnailing to Music Browsing and Searching
Mark Levy, Mark Sandler, Queen Mary, University of London - London, UK
We present a method for segmenting musical audio into structural sections and some rules for choosing a representative “thumbnail” segment. We demonstrate how audio thumbnails are an effective and natural way of returning results in music search applications. We investigate the use of segment-based models for music similarity searching and recommendation. We report experimental results of the performance and efficiency of these approaches in the context of SoundBite, a demonstration music thumbnailing and search engine.

[Associated Poster Presentation in Session P6, Saturday, May 20, at 16:00]

Presentation is scheduled to begin at 09:00
Convention Paper 6642 (Purchase now)

P2-2 Multiple F0 Tracking in Solo Recordings of Monodic Instruments
Chunghsin Yeh, Axel Röbel, Xavier Rodet, IRCAM - Paris, France
This paper is concerned with F0 tracking in monodic instrument solo recordings. Due to reverberation, the observed signal is rather polyphonic, and single-F0 tracking techniques often give unsatisfying results. The proposed method is based on multiple-F0 estimation and makes use of the a priori knowledge that the observed spectrum is generated by a single monodic instrument. The predominant F0 is tracked first and the secondary F0 tracks are then established. The proposed method is tested on reverberant recordings and shows significant improvements compared to single-F0 estimators.

[Associated Poster Presentation in Session P6, Saturday, May 20, at 16:00]

Presentation is scheduled to begin at 09:20
Convention Paper 6643 (Purchase now)

P2-3 Harmonic Plus Noise Decomposition: Time-Frequency Reassignment Versus a Subspace-Based Method
Bertrand David, Valentin Emiya, Roland Badeau, Yves Grenier, Ecole Nationale Supérieure de Télécommunications - Paris Cedex, France
This paper deals with harmonic + noise decomposition and, as a targeted application, with the extraction of transient background noise embedded in a signal having strong harmonic content (speech, for instance). From that perspective, a method based on the reassigned spectrum and a high-resolution subspace tracker are compared, both on simulations and in more realistic conditions. The reassignment relocalizes the time-frequency energy around a given pair (analysis time index, analysis frequency bin), while the high-resolution method benefits from a characterization of the signal in terms of a subspace spanned by the harmonic content and a subspace spanned by the stochastic content. Both methods are adaptive, and the estimates are updated from one sample to the next.

[Associated Poster Presentation in Session P6, Saturday, May 20, at 16:00]

Presentation is scheduled to begin at 09:40
Convention Paper 6644 (Purchase now)

P2-4 Signal Analysis Using the Complex Spectral Phase Evolution (CSPE) Method
Kevin Short, Ricardo Garcia, Chaoticom Technologies - Andover, MA, USA
The Complex Spectral Phase Evolution (CSPE) method is introduced as a tool to analyze and detect the presence of short-term stable sinusoidal components in an audio signal. The method provides for super-resolution of frequencies by examining the evolution of the phase of the complex signal spectrum over time-shifted windows. It is shown that this analysis, when applied to a sinusoidal signal component, allows for the resolution of the true signal frequency with orders of magnitude greater accuracy than the DFT. Further, this frequency estimate is independent of the frequency bin and can be estimated from “leakage” bins far from spectral peaks. The method is robust in the presence of noise or nearby signal components, and is a fundamental tool in the front-end processing for the KOZ compression technology.

[Associated Poster Presentation in Session P6, Saturday, May 20, at 16:00]

Presentation is scheduled to begin at 10:00
Convention Paper 6645 (Purchase now)
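
The phase-evolution idea summarized in P2-4 can be illustrated with a short sketch: a frame of the signal and the same frame advanced by one sample are transformed, and the per-bin phase increment yields a frequency estimate that is not quantized to the bin spacing. This is a minimal illustration of the general principle, not the authors' CSPE implementation; the single-sinusoid test and all parameter values are my own.

```python
import numpy as np

fs = 48000.0
f_true = 1001.3                      # falls between DFT bins on purpose
n = np.arange(2049)
x = np.sin(2 * np.pi * f_true * n / fs)

N = 2048
win = np.hanning(N)
X0 = np.fft.rfft(win * x[:N])        # frame starting at sample 0
X1 = np.fft.rfft(win * x[1:N + 1])   # same frame advanced by one sample

k = np.argmax(np.abs(X0))            # DFT bin nearest the sinusoid
# For a stable sinusoid, X1 ~= X0 * exp(j*2*pi*f/fs), so the phase of
# X1 * conj(X0) gives the frequency directly, independent of the bin.
f_est = np.angle(X1[k] * np.conj(X0[k])) * fs / (2 * np.pi)

print(f"bin centre: {k * fs / N:.1f} Hz, phase-evolution estimate: {f_est:.2f} Hz")
```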

P2-5 Upwind Leapfrog Schemes in Physical Models with Mixed Modeling Strategies
José Escolano, José J. López, Technical University of Valencia - Valencia, Spain
Block-based physical modeling with mixed modeling strategies is one of the most promising methods for digital sound synthesis. This technique models and discretizes each element individually, and their interaction topology is implemented separately. In this paper the use of the Upwind Leapfrog Scheme, or Linear Bicharacteristic Scheme (LBS), is proposed for digital sound synthesis of membranes (2-D) within a block-based physical modeling context. It provides an efficient and accurate alternative stencil to the classical leapfrog scheme of the Finite Difference Time Domain (FDTD) method. Moreover, the conversion of the dependent wave equation variables into characteristic variables makes this method suitable for interaction with wave digital filter models and, in turn, with other paradigms. The technique is presented in detail and justified with examples.

Presentation is scheduled to begin at 10:20
Convention Paper 6646 (Purchase now)
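
The abstract of P2-5 contrasts the upwind/LBS stencil with the classical leapfrog FDTD scheme for a 2-D membrane; that classical reference scheme is easy to sketch, as below. Grid size, excitation, and the fixed-edge boundary handling are illustrative choices of mine, and the upwind scheme itself is not reproduced here.

```python
import numpy as np

# Classical leapfrog FDTD update for an ideal 2-D membrane (fixed edges).
nx, ny, steps = 64, 64, 500
c, dx = 340.0, 0.01
dt = dx / (c * np.sqrt(2.0))        # Courant limit for the 2-D scheme
lam2 = (c * dt / dx) ** 2

u_prev = np.zeros((nx, ny))
u = np.zeros((nx, ny))
u[nx // 3, ny // 3] = 1.0           # impulsive excitation

out = []
for _ in range(steps):
    lap = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
           - 4.0 * u[1:-1, 1:-1])
    u_next = np.zeros_like(u)
    u_next[1:-1, 1:-1] = 2.0 * u[1:-1, 1:-1] - u_prev[1:-1, 1:-1] + lam2 * lap
    u_prev, u = u, u_next
    out.append(u[nx // 2, ny // 2])  # "pickup" at the centre of the membrane

print(np.max(np.abs(out)))
```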

P2-6 Simple Modeling of Piano Inharmonicity Due to Soundboard Impedance
Luis Ortiz-Berenguer, Javier Casajus-Quiros, Marisol Torres-Guijarro, Jon Beracoechea, J. Perez-Aranda, Polytechnic University of Madrid - Madrid, Spain
Partials of piano sounds are inharmonic. This inharmonicity is due both to string stiffness and to soundboard impedance; the latter has not been widely documented. Two problems arise: knowing the value of the impedance and evaluating the frequency deviation each partial suffers. In this paper that deviation is calculated both with Morse's theoretical equations and with the authors' proposed method. To validate the results, deviations for several piano notes have been measured. In addition, the soundboard impedance has been measured to verify the relationship between deviation and impedance. A method to estimate the impedance from measured deviations is also proposed; it could be useful during the training stages of transcription systems and in parameter extraction schemes.

Presentation is scheduled to begin at 10:40
Convention Paper 6647 (Purchase now)
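
For reference, the widely documented string-stiffness contribution that P2-6 contrasts with the soundboard effect follows the familiar relation f_n = n*f0*sqrt(1 + B*n^2); the snippet below simply evaluates it. The coefficient value is an arbitrary example, not measured data, and the soundboard-impedance deviation studied in the paper is not modeled here.

```python
import numpy as np

def stiff_string_partials(f0, B, n_partials=16):
    """Partial frequencies of a stiff string: f_n = n*f0*sqrt(1 + B*n^2)."""
    n = np.arange(1, n_partials + 1)
    return n * f0 * np.sqrt(1.0 + B * n ** 2)

f0 = 261.6               # C4 fundamental (Hz)
B = 4e-4                 # example inharmonicity coefficient (assumed)
harmonic = np.arange(1, 17) * f0
stretched = stiff_string_partials(f0, B)
for n, (fh, fstr) in enumerate(zip(harmonic, stretched), start=1):
    print(f"partial {n:2d}: harmonic {fh:8.1f} Hz, stretched {fstr:8.1f} Hz")
```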

P2-7 Contextual Effects on Sound Quality Judgments: Listening Room and Automotive Environments
Kathryn Beresford, University of Surrey - Guildford, Surrey, UK; Natanya Ford, Harman Becker Automotive Systems - Bridgend, UK; Francis Rumsey, Slawomir Zielinski, University of Surrey - Guildford, Surrey, UK
This study was designed to assess the effect of the listening context on basic audio quality for stimuli with varied mid-range timbral degradations. An assessment of basic audio quality was carried out in two different listening environments: an ITU-R BS.1116 conformant listening room and a stationary vehicle. A group of untrained listeners graded basic audio quality using a novel single stimulus method. The listener population was divided into two subsets—one made evaluations in a listening room and the other in a vehicle. The single stimulus method was investigated as a possible subjective evaluation method for use in automotive environments.

[Associated Poster Presentation in Session P6, Saturday, May 20, at 16:00]

Presentation is scheduled to begin at 11:00
Convention Paper 6648 (Purchase now)

P2-8 Next Generation Automotive Research and Technologies
Brett Crockett, Michael Smithers, Eric Benjamin, Dolby Laboratories - San Francisco, CA, USA
The automobile is quickly becoming a prominent environment for listening to multichannel audio content. As a listening space, the automobile is both interesting and challenging due to its interior structure and materials, its predominant off-axis listening positions, and the amount and variability of background noise. This paper discusses these challenges, describes a number of existing multichannel sound technologies and their applicability to the automotive environment, and presents several novel sound technologies that provide new solutions to some of these challenges. Ongoing challenges and associated automotive sound research being investigated are also presented.

Presentation is scheduled to begin at 11:20
Convention Paper 6649 (Purchase now)


P3 - Spatial Perception and Processing

Saturday, May 20, 14:00 — 17:00

Chair: Natanya Ford, Harman Automotive - Bridgend, UK

P3-1 Spatial Sound Localization Model Using Neural Network
Rodolfo Venegas, Marcelo Lara, Universidad Tecnológica de Chile - Santiago, Chile; Rafael Correa, Universidad Tecnológica Metropolitana - Santiago, Chile; Sergio Floody, Universidad Tecnológica de Chile - Santiago, Chile
This paper presents the design, implementation, and training of a spatial sound localization model for broadband sound in an anechoic environment, inspired by the human auditory system and implemented using artificial neural networks. The data were acquired experimentally. The model consists of a nonlinear transformer with a first module that extracts ITD, ILD, and ISLD cues and a second module, a neural network, that estimates the sound source position in azimuth and elevation. A comparative study of the model's performance with three different filter banks and a sensitivity analysis of the neural network inputs are also presented. The average error is 2.3º. This work was supported by the FONDEI fund of Universidad Tecnológica de Chile.

Presentation is scheduled to begin at 14:00
Convention Paper 6650 (Purchase now)

P3-2 Aurally Motivated Analysis for Scattered Sound in Auditoria
Molly K. Norris, Rensselaer Polytechnic Institute - Troy, NY, USA, Kirkegaard Associates, Chicago, IL, USA; Ning Xiang, Ana M. Jaramillo, Rensselaer Polytechnic Institute - Troy, NY, USA
The goal of the first part of this work was to implement an aurally adequate time-frequency analysis technique, a first effort that takes binaural hearing into account, for the analysis of sound-scattering data. The second part of this work used the developed model to analyze different scattering surfaces implemented in a room acoustics modeling program, an attempt to begin to understand what visual changes can be expected when the coefficients used to model scattering in conjunction with the Lambert scattering model are altered. The paper thus pursues a method for visually representing scattering effects that correlates directly with human perception.

Presentation is scheduled to begin at 14:20
Convention Paper 6651 (Purchase now)

P3-3 Audibility of Spectral Differences in Head-Related Transfer Functions
Pablo Faundez Hoffmann, Henrik Møller, Aalborg University - Aalborg, Denmark
The spatial resolution at which head-related transfer functions (HRTFs) are available is an important aspect in the implementation of three-dimensional sound. Specifically, synthesis of moving sound requires that HRTFs are sufficiently close so the simulated sound is perceived as moving smoothly. How close they must be depends directly on how much the characteristics of neighboring HRTFs differ and, most importantly, on when these differences become audible. Differences between HRTFs exist in the interaural delay (ITD) and in the spectral characteristics, i.e., the magnitude spectrum of the HRTFs. The present study investigates the audibility of the spectral characteristics. To this purpose, binaural audibility thresholds of differences between minimum-phase representations of HRTFs are measured and evaluated.

Presentation is scheduled to begin at 14:40
Convention Paper 6652 (Purchase now)

P3-4 Looking for a Relevant Similarity Criterion for HRTF Clustering: A Comparative Study
Rozenn Nicol, Vincent Lemaire, Alexis Bondu, Sylvain Busson, France Telecom R&D - Lannion, France
For high-fidelity Virtual Auditory Space (VAS), binaural synthesis requires individualized head-related transfer functions (HRTF). An alternative to exhaustive measurement of HRTF consists of measuring a set of representative HRTF in a few directions. These selected HRTF are considered representative because they summarize all the necessary spatial and individual information. The goal is to deduce the HRTF in nonmeasured directions from the measured ones by appropriate modeling. Clustering is applied in order to identify the representative directions, but the first issue is the definition of a relevant distance criterion. This paper presents a comparative study of several criteria taken from the literature. A new insight into HRTF (dis)similarity is proposed.

Presentation is scheduled to begin at 15:00
Convention Paper 6653 (Purchase now)
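
One of the simplest similarity criteria of the kind P3-4 compares is a log-magnitude spectral distance between HRTF pairs, which any standard clustering routine can then consume as a precomputed distance matrix. The sketch below is a generic baseline of that sort, not the criterion the paper ends up recommending; the FFT length is an arbitrary choice.

```python
import numpy as np

def log_spectral_distance(h1, h2, n_fft=512):
    """RMS log-magnitude distance (dB) between two HRIRs."""
    H1 = np.abs(np.fft.rfft(h1, n_fft)) + 1e-12
    H2 = np.abs(np.fft.rfft(h2, n_fft)) + 1e-12
    d = 20.0 * np.log10(H1 / H2)
    return np.sqrt(np.mean(d ** 2))

def pairwise_distances(hrirs):
    """hrirs: (n_directions, n_taps) array of (e.g., minimum-phase) HRIRs."""
    n = len(hrirs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = log_spectral_distance(hrirs[i], hrirs[j])
    return D
```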

P3-5 Evaluation of a 3-D Audio System with Head Tracking
Jan Abildgaard Pedersen, Pauli Minnaar, AM3D A/S - Aalborg, Denmark
A 3-D audio system was evaluated in an experiment where listeners had to “shoot down” real and virtual sound sources appearing from different directions around them. The 3-D audio was presented through headphones, and head tracking was used. In order to investigate the influence of head movements, both long and short stimuli were used. Twenty-six people participated, half of them students and half pilots. The results were analyzed by calculating a localization offset and a localization uncertainty. For azimuth no significant offset was found, whereas for elevation an offset was found that is strongly correlated with the stimulus elevation. The uncertainty for real and virtual sound sources was 10 and 14 degrees, respectively, in azimuth and 12 and 24 degrees in elevation.

[Associated Poster Presentation in Session P9, Sunday, May 21, at 09:00]

Presentation is scheduled to begin at 15:20
Convention Paper 6654 (Purchase now)

P3-6 Design and Verification of HeadZap, a Semi-Automated HRIR Measurement System
Durand Begault, Martine Godfroy, NASA Ames Research Center - Moffett Field, CA, USA; Joel Miller, VRSonic, Inc. - San Francisco, CA, USA; Agnieszka Roginska, AuSIM Incorporated - Palo Alto, CA, USA; Mark Anderson, Elizabeth Wenzel, NASA Ames Research Center - Moffett Field, CA, USA
This paper describes the design, development, and acoustic verification of HeadZap, a semi-automated system for measuring head-related impulse responses (HRIR) designed by AuSIM Incorporated and modified by the NASA Ames Research Center Spatial Auditory Display Laboratory. HeadZap utilizes an array of twelve loudspeakers in order to measure 432 HRIRs at 10° intervals in both azimuth and elevation, in a nonanechoic environment. Application to real-time rendering using SLAB software, an audio-visual localization experiment, is discussed.

[Associated Poster Presentation in Session P9, Sunday, May 21, at 09:00]

Presentation is scheduled to begin at 15:40
Convention Paper 6655 (Purchase now)

P3-7 Visualization of Perceptual Parameters in Interactive User Interfaces: Application to the Control of Sound Spatialization
Olivier Delerue, IRCAM - Paris, France
This paper addresses the general problem of designing graphical user interfaces for nonexpert users. The key idea is to help the user anticipate the effect of his or her actions by displaying, in the interaction area, the expected evolution of a quality criterion according to the degrees of freedom that are being monitored. This concept is first applied to the control of sound spatialization: various perceptually based criteria such as spatial homogeneity or spatial masking are represented as a grey-shaded map superimposed on the background of a bird’s-eye view interface. After selecting a given sound source, the user is thus informed of how these criteria will behave if the source is moved to any other location of the virtual sound scene.

Presentation is scheduled to begin at 16:00
Convention Paper 6656 (Purchase now)

P3-8 A New Approach for Direct Interaction with Graphical Representations of Room Impulse Responses for Use in Wave Field Synthesis Reproduction
Frank Melchior, Jan Langhammer, Fraunhofer IDMT - Ilmenau, Germany; Diemer de Vries, Technical University of Delft - Delft, The Netherlands
Room simulation based on convolution is state-of-the-art in modern audio processing environments. Most of the systems currently available provide only a few controllers to modify the underlying room impulse responses; even in spatial reproduction systems, the sound designer can manipulate only a set of numeric parameters. This paper describes a new approach for the interactive control of room impulse responses based on visualization and parameterization. The new principle was originally developed for use in wave field synthesis systems and is based on augmented reality user interfaces. An adaptation to conventional user interfaces and other spatial sound reproduction systems is possible. The modification of the room impulse responses is performed by direct interaction with 3-D graphical representations of multitrace room impulse responses.

Presentation is scheduled to begin at 16:20
Convention Paper 6657 (Purchase now)

P3-9 Directional Audio Coding: Filterbank and STFT-Based Design
Ville Pulkki, Helsinki University of Technology - Espoo, Finland; Christof Faller, EPFL - Lausanne, Switzerland
Directional audio coding (DirAC) is a method for spatial sound representation, applicable to arbitrary audio reproduction methods. In the analysis part, properties of the sound field in time and frequency in a single point are measured and transmitted as side information together with one or more audio waveforms. In the synthesis part, the properties of the sound field are reproduced using separate techniques for point-like virtual sources and diffuse sound. Different implementations of DirAC are described and differences between them are discussed. A modification of DirAC is presented, which provides a link to Binaural Cue Coding and parametric multichannel audio coding in general (e.g., MPEG Surround).

[Associated Poster Presentation in Session P9, Sunday, May 21, at 09:00]

Presentation is scheduled to begin at 16:40
Convention Paper 6658 (Purchase now)
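
The analysis step of DirAC, as summarized in P3-9, estimates a direction and a diffuseness value per time-frequency tile from sound-field properties measured at a single point. The sketch below is one rough way to do this on STFT frames of B-format signals; the B-format scaling convention and the diffuseness normalization differ between published formulations, so treat this as an assumption-laden illustration rather than the implementation described in the paper.

```python
import numpy as np

def dirac_style_analysis(W, X, Y, Z, eps=1e-12):
    """Rough DirAC-style analysis of one STFT frame of B-format spectra.

    W, X, Y, Z: complex spectra (1-D arrays over frequency bins).
    Returns per-bin azimuth/elevation (radians) and a 0..1 diffuseness value.
    Scaling constants are simplified; practical implementations also average
    the intensity and energy estimates over time before forming the ratio.
    """
    XYZ = np.stack([X, Y, Z])                       # shape (3, n_bins)
    # Active-intensity direction (up to constants) per bin
    intensity = np.real(np.conj(W)[None, :] * XYZ)  # (3, n_bins)
    azimuth = np.arctan2(intensity[1], intensity[0])
    elevation = np.arctan2(intensity[2],
                           np.sqrt(intensity[0] ** 2 + intensity[1] ** 2))
    # Diffuseness approaches 1 when net intensity is small relative to energy
    energy = 0.5 * (np.abs(W) ** 2 + np.sum(np.abs(XYZ) ** 2, axis=0) / 2.0)
    diffuseness = 1.0 - np.linalg.norm(intensity, axis=0) / (energy + eps)
    return azimuth, elevation, np.clip(diffuseness, 0.0, 1.0)
```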


P4 - Audio in Computers & Audio Networking

Saturday, May 20, 14:00 — 17:20

Chair: Christophe Musialik, Algorithmix - Waldshut-Tiengen, Germany

P4-1 Newly Established IEC Standard on Audio Quality Measurement of Personal Computers
Kenji Kurakata, National Institute of Advanced Industrial Science and Technology (AIST) - Tsukuba, Ibaraki, Japan; Masamichi Furukawa, Kenwood Corporation - Hachioji, Tokyo, Japan; JEITA Project Group - Chiyoda, Tokyo, Japan
A new IEC standard on audio quality measurement of personal computers (PCs) was published in December 2005, entitled IEC 61606-4 “Audio and audiovisual equipment—Digital audio parts—Basic measurement methods of audio characteristics —Part 4: Personal computer.” This standard prescribes methods for measuring PC audio quality, taking into account requirements of measuring conditions of PCs. Furthermore, a new measure of audio signal quality, short-term distortion, was introduced to describe PC-specific noise problems. This paper presents an outline of this standard.

Presentation is scheduled to begin at 14:00
Convention Paper 6659 (Purchase now)

P4-2 Scene Description Model and Rendering Engine for Interactive Virtual Acoustics
Jean-Marc Jot, Jean-Michel Trivi, Creative Advanced Technology Center - Scotts Valley, CA, USA
Interactive environmental audio spatialization technology has become commonplace in personal computers, where its primary current application is video game soundtrack rendering. The most advanced PC audio platforms available can spatialize 100 or more sound sources simultaneously over headphones or multichannel home theater systems, and employ multiple reverberation engines to simulate complex acoustical environments. This paper reviews the main features of the EAX environmental audio programming interface and its relation to the I3DL2 and MPEG-4 standards. A statistical reverberation model is introduced for simulating per-source distance and directivity effects. An efficient spatial reverberation and mixing architecture is described for the spatialization of multiple sound sources around a virtual listener navigating across multiple connected virtual rooms including acoustic obstacles.

[Associated Poster Presentation in Session P10, Sunday, May 21, at 11:00]

Presentation is scheduled to begin at 14:20
Convention Paper 6660 (Purchase now)

P4-3 Intelligent Audio for Games
Col Walder, Revolution Recording - Sheffield, UK
Providing interactive audio for computer games has traditionally been seen as a challenge, particularly given the technological limitations of games consoles. With current advances in technology, however, there is the potential to take advantage of the benefits of interactivity. This paper proposes the use of Artificial Intelligence (AI) routines to control in-game audio with a focus on implementing techniques used in film sound for drama-based games. Soar architecture is presented as a good candidate for developing audio AI for games.

[Associated Poster Presentation in Session P10, Sunday, May 21, at 11:00]

Presentation is scheduled to begin at 14:40
Convention Paper 6661 (Purchase now)

P4-4 A Frame Loss Concealment Technique for MPEG-AAC
Sang-Uk Ryu, Kenneth Rose, University of California at Santa Barbara - Santa Barbara, CA, USA
An efficient method is proposed for frame loss concealment within the advanced audio coding (AAC) decoder, which can effectively mitigate the adverse impact of frame loss on reconstruction quality. The spectral information of the lost frame is first estimated in the modified discrete cosine transform (MDCT) domain via the known frame interpolation approach. The interpolated MDCT coefficients are then further refined by magnitude scaling and sign correction, which are differently designed for tonal and noise components of the source signal. In noise-like spectral bins, a shaped-noise insertion technique is employed to adjust the interpolated coefficients, while coefficients in tone-dominant bins are refined by magnitude scaling and novel sign correction techniques so as to optimize the fit of the corresponding time reconstruction with available partial signal information from neighboring frames. Subjective quality evaluations demonstrate that the proposed method achieves significant quality improvement over the shaped-noise insertion method adopted in commercial AAC decoders.

Presentation is scheduled to begin at 15:00
Convention Paper 6662 (Purchase now)
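
The starting point of the scheme in P4-4, interpolating the lost frame's MDCT coefficients from its neighbors before any refinement, can be sketched as below. The random sign handling shown is a deliberately naive placeholder for the tonal/noise-dependent scaling and sign-correction stages the paper actually proposes.

```python
import numpy as np

def conceal_lost_frame(mdct_prev, mdct_next, rng=None):
    """Naive MDCT-domain concealment of one lost frame.

    mdct_prev, mdct_next: MDCT coefficient arrays of the surrounding frames.
    Magnitudes are interpolated; signs are chosen at random, which is exactly
    the weakness that a sign-correction stage is designed to fix.
    """
    rng = rng or np.random.default_rng(0)
    mag = 0.5 * (np.abs(mdct_prev) + np.abs(mdct_next))
    signs = rng.choice([-1.0, 1.0], size=mag.shape)
    return mag * signs
```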

P4-5 Multiple Description Error Mitigation Techniques for Streaming Compressed Audio over a 802.11 Wireless Network
Corey Cheng, Wenyu Jiang, Dolby Laboratories - San Francisco, CA, USA
This paper presents several multiple description (MD) coding techniques for error mitigation of compressed audio streamed over an 802.11b/g wireless network. Loosely speaking, an MD encoder generates several descriptions of the same source, and an MD decoder recreates the best estimate of the source from the descriptions it successfully receives. We propose a design for an MD architecture and simulate its integration into the AAC codec. We use packet loss traces gathered from an actual 802.11 b/g network to simulate the proposed codec’s error mitigation properties for various network traffic conditions. We examine how tuning several of the proposed codec’s parameters would affect the sound quality and overall bit rate of the proposed codec. Specifically, we show how interleaving, renormalization, and low-frequency variance estimation techniques can be used in conjunction with hierarchical correlating transforms to improve the sound quality of multiple description codecs.

Presentation is scheduled to begin at 15:20
Convention Paper 6663 (Purchase now)

P4-6 Single Frequency Networks for FM Radio
Pierre Soelberg, Selberg Broadcast & IT Consult - København S, Denmark
Single Frequency Networks (SFN) and Near Single Frequency Networks (NSFN) are usually not considered suitable for FM radio. Some countries are now replanning their FM bands for the use of (N)SFN in order to make space for more stations. Even though some stations already use it, for example a station covering a highway, replanning the FM band with SFN for a whole country is a different matter. The first country to do this was the Netherlands, and the first experiences are not as good as expected. The requirements for synchronizing FM transmitters used for (N)SFN are explained, and SFN networks are tested from real transmitter sites. The result is a proposed correction to the Dutch norm.

[Associated Poster Presentation in Session P10, Sunday, May 21, at 11:00]

Presentation is scheduled to begin at 15:40
Convention Paper 6664 (Purchase now)

P4-7 A Paradigm for Wireless Digital Audio Home Entertainment
Nikos Kokkos, University of Patras - Patras, Greece; Andreas Floros, Ionian University - Corfu, Greece; Nicolas-Alexander Tatlas, John Mourjopoulos, University of Patras - Patras, Greece
Despite recent advances in wireless networking technology, real-time streaming of CD-quality digital audio remains a challenging topic. In this paper a set of applications following the server-client model was developed, facilitating the transmission and playback of PCM-coded audio over wireless links. The implementation is based on typical personal computer (PC) platforms interconnected with off-the-shelf wireless networking hardware. Performance evaluation tests are presented under different networking parameters and link conditions, leading to an optimal set of parameters for high-quality wireless digital audio delivery.

[Associated Poster Presentation in Session P10, Sunday, May 21, at 11:00]

Presentation is scheduled to begin at 16:00
Convention Paper 6665 (Purchase now)

P4-8 Online Acoustic Measurements in a Networked Audio System
Aki Härmä, Philips Research - Eindhoven, The Netherlands
A networked audio system consists of audio devices that are in the same physical environment and are connected by a network. The network connection makes it possible to perform continuous acoustic measurements between the devices. Such measurement data can be used, for example, to control the playback by the properties of the actual sound field produced. Continuous acoustic measurement involves transmission of audio data over the network. The bit-rate of the audio data should be low because the measurement is not a primary function of the networked system. In this paper we introduce a robust system for the networked audio measurements where the bit rate sent over the network is small.

Presentation is scheduled to begin at 16:20
Convention Paper 6666 (Purchase now)

P4-9 Design and Installation of Recording Studios for Vocational Training
Chris Bradley, James Watt College of Further & Higher Education - Greenock, Inverclyde, Scotland, UK; Billy Law, Mediaspec - Glasgow, Scotland, UK
This paper describes the design and installation of new recording studios for training in music and sound production, allowing unparalleled direct hands-on tuition for students. The design allows simultaneous recording from the live rooms to all twelve control rooms via digital distribution, enabling individual setup for a recording session, multitrack recording, and subsequent mixdown. All recording sessions are saved to a centralized server that allows back-up and uploading to and from any other control room. Students can therefore import their work into any of the other control rooms at any time. Networking is through Gigabit Ethernet so transfer of work is fast, and students have their own password-protected space, learning the importance of file management.

[Associated Poster Presentation in Session P10, Sunday, May 21, at 11:00]

Presentation is scheduled to begin at 16:40
Convention Paper 6667 (Purchase now)

P4-10 Flexible, High Speed Audio Networking for Hotels and Convention Centers
Richard Foss, Rhodes University - Grahamstown, South Africa; Jun-ichi Fujimori, Yamaha Corporation - Hamamatsu, Japan; Nyasha Chigwamba, Rhodes University - Grahamstown, South Africa; Brad Klinkradt, Harold Okai-Tettey, Networked Audio Solutions - Grahamstown, South Africa
This paper describes the use of mLAN (music Local Area Network) to solve the problem of audio routing within hotels and convention centers. mLAN is a FireWire-based digital network interface technology that allows professional audio equipment, PCs, and electronic instruments to be easily and efficiently interconnected using a single cable. In order to solve this problem, an existing mLAN Connection Management Server, augmented with additional functionality, has been utilized. A graphical client application has been created that displays the various locations within a hotel/convention center and sends out appropriate routing messages in Extensible Mark-up Language (XML) to an mLAN connection management server. The connection management server, in turn, controls a number of mLAN audio distribution boxes on the FireWire network.

[Associated Poster Presentation in Session P10, Sunday, May 21, at 11:00]

Presentation is scheduled to begin at 17:00
Convention Paper 6668 (Purchase now)


P5 - Posters: Microphones

Saturday, May 20, 14:00 — 15:30

P5-1 P-MOS FET Application for Silicon Condenser Microphones
Norihiro Arimura, Juro Ohga, Shibaura Institute of Technology - Minato-ku, Tokyo, Japan; Norio Kimura, Yoshinobu Yasuno, Panasonic Semiconductor Device Solutions Co., Ltd. - Tsuzuki-ku, Yokohama, Japan
Electret condenser microphones (ECMs) are widely used as general-purpose microphone devices. Each year they are further miniaturized and their operating voltage lowered to reduce cellular phone power consumption. Although current ECMs have become small and thin, the built-in FET has not been designed for low-voltage operation despite its small package. This paper focuses on a low-current-consumption P-MOS FET, realized in a CMOS process, to support further miniaturization and improved performance. The authors designed and tested prototype microphone units and compared their basic performance with that of conventional ECMs.

[Poster Presentation Associated with Paper Presentation P1-3]
Convention Paper 6636 (Purchase now)

P5-2 Development of a Super-Wide-Range Microphone
Kazuho Ono, Hayao Tanabe, Masakazu Iwaki, Akio Ando, NHK Science and Technical Research Laboratories - Kinuta Setagaya-ku, Tokyo, Japan; Keishi Imanaga, Sanken Microphone Co. Ltd. - Suginami-ku, Tokyo, Japan
This paper describes the development of a low-noise, high-sensitivity microphone with a wide frequency range. Microphones of this kind are needed to provide high quality sound sources for use in studies on the perceptual discrimination between musical sounds with and without very high frequency components. Conventional electrostatic microphones cannot be used for such recordings because conventional methods for expanding the frequency range use a small diaphragm that degrades the S/N ratio. The proposed microphone has a new design in which the frequency range is expanded in two ways, using both the diffraction and the resonance due to the microphone’s diaphragm. These effects are generally thought to define the upper limit of the frequency range, but the authors have made active use of them to achieve both a wide frequency range and high sensitivity. The body shape was designed with the help of a scale model study. An omnidirectional, electrostatic microphone that picks up sounds of up to 100-kHz with low noise has been developed.

[Poster Presentation Associated with Paper Presentation P1-4]
Convention Paper 6637 (Purchase now)

P5-3 Listening Broadband Physical Model for Microphones: A First Step
Laurent Millot, Université Paris - Paris, France, ENS Louis-Lumiere, Noisy Le Grand, France; Antoine Valette, Manuel Lopes, ENS Louis-Lumiere - Noisy Le Grand, France; Gérard Pelé, Université Paris - Paris, France, ENS Louis-Lumiere, Noisy Le Grand, France; Mohammed Elliq, ENS Louis-Lumiere - Noisy Le Grand, France; Dominique Lambert, Université Paris - Paris, France, ENS Louis-Lumiere, Noisy Le Grand, France
We will present the first step in the design of a broadband physical model for microphones. Within the proposed model, classical directivity patterns (omnidirectional, bidirectional, and the cardioid family) are found as limit cases: monochromatic excitation, low frequency, and far-field approximation. Monophonic pieces of music are used as sources for the model so we can listen to the simulation of the associated recorded sound field in real time thanks to a Max/MSP application. Listening and subband analysis show that the directivity is a function of frequency subband and source location. This model also exhibits an interesting proximity effect. Audio demonstrations will be given.

[Poster Presentation Associated with Paper Presentation P1-5]
Convention Paper 6638 (Purchase now)

P5-4 Measuring the Perceived Differences between Similar High-Quality Microphones
Douglas McKinnie, Consultant - Guildford, Surrey, UK
Microphones of similar construction and polar-pattern that can be equalized to have nearly identical on-axis frequency response still are reported to have different sonic character. To help develop a model of how other physical measurements could predict the subjective sonic character, perceptual data was collected from a panel of listeners. The listeners individually made dissimilarity ratings of pair-wise comparisons of nine versions of a single piano performance. Each version was recorded with a different model of small-diaphragm cardioid condenser microphone. The data was collected in order to derive a stimulus space showing the most salient dimensions upon which the perceived timbre of the microphones differed.

[Poster Presentation Associated with Paper Presentation P1-6]
Convention Paper 6639 (Purchase now)

P5-5 Influence of Components Precision on Characteristics of Dual Microphone Arrays
Alexander Valitov, Alango Ltd. - St. Petersburg, Russia; Alexander Goldin, Alango Ltd. - Haifa, Israel
Microphone arrays have great potential in practical applications due to their ability to significantly improve speech quality and signal-to-noise ratio in noisy environments. A large number of scientific papers and patents have been devoted to algorithmic techniques for producing the optimal output of microphone arrays under different optimization criteria. In practice, however, the performance of microphone arrays depends to a large extent on the quality of their components: amplitude matching, phase matching, errors in the distance between microphones, etc. This paper analyzes the dependence of dual microphone array characteristics on these factors.

[Poster Presentation Associated with Paper Presentation P1-8]
Convention Paper 6641 (Purchase now)

P5-6 Sound Quality Differences between Electret Film (EMFIT) and Piezoelectric Under-Saddle Guitar Pickups
Miikka Tikander, Henri Penttinen, Helsinki University of Technology - Espoo, Finland
Two different types of under-saddle guitar pickups, piezoelectric and electret film (EMFIT), were measured and compared. The measurements included comparisons of magnitude, time, and phase responses, as well as distortion and noise characteristics. The measurements were conducted with a custom rig that allowed accurate control of the environment. For excitation, both frequency sweeps and impulsive stimuli were used. As for the magnitude response, the piezoelectric pickup has a boosted bass response and a slightly pronounced high-frequency response. The results also imply nonlinear behavior as a function of both the excitation type (sweep vs. impulsive) and the amount of excitation force (small vs. large). In addition, the piezoelectric microphone is fairly immune to tension changes, whereas the sensitivity of the EMFIT microphone increases as the tension decreases. For impulsively excited time responses, the only differences were found at the beginning of the responses. The distortion and noise characteristics imply that the EMFIT microphone has slightly more distortion and a slightly higher noise floor. A linear filter model is also proposed for making either microphone sound like the other.
Convention Paper 6669 (Purchase now)


P6 - Posters: Analysis and Synthesis of Sound; Mobile Phone Audio; Automotive Audio

Saturday, May 20, 16:00 — 17:30

P6-1 Application of Segmentation and Thumbnailing to Music Browsing and Searching
Mark Levy, Mark Sandler, Queen Mary, University of London - London, UK
We present a method for segmenting musical audio into structural sections and some rules for choosing a representative “thumbnail” segment. We demonstrate how audio thumbnails are an effective and natural way of returning results in music search applications. We investigate the use of segment-based models for music similarity searching and recommendation. We report experimental results of the performance and efficiency of these approaches in the context of SoundBite, a demonstration music thumbnailing and search engine.

[Poster Presentation Associated with Paper Presentation P2-1]
Convention Paper 6642 (Purchase now)

P6-2 Multiple F0 Tracking in Solo Recordings of Monodic Instruments
Chunghsin Yeh, Axel Röbel, Xavier Rodet, IRCAM - Paris, France
This paper is concerned with F0 tracking in monodic instrument solo recordings. Due to reverberation, the observed signal is rather polyphonic, and single-F0 tracking techniques often give unsatisfying results. The proposed method is based on multiple-F0 estimation and makes use of the a priori knowledge that the observed spectrum is generated by a single monodic instrument. The predominant F0 is tracked first and the secondary F0 tracks are then established. The proposed method is tested on reverberant recordings and shows significant improvements compared to single-F0 estimators.

[Poster Presentation Associated with Paper Presentation P2-2]
Convention Paper 6643 (Purchase now)

P6-3 Harmonic Plus Noise Decomposition: Time-Frequency Reassignment Versus a Subspace-Based Method
Bertrand David, Valentin Emiya, Roland Badeau, Yves Grenier, Ecole Nationale Supérieure de Télécommunications - Paris Cedex, France
This paper deals with harmonic + noise decomposition and, as a targeted application, with the extraction of transient background noise embedded in a signal having strong harmonic content (speech, for instance). From that perspective, a method based on the reassigned spectrum and a high-resolution subspace tracker are compared, both on simulations and in more realistic conditions. The reassignment relocalizes the time-frequency energy around a given pair (analysis time index, analysis frequency bin), while the high-resolution method benefits from a characterization of the signal in terms of a subspace spanned by the harmonic content and a subspace spanned by the stochastic content. Both methods are adaptive, and the estimates are updated from one sample to the next.

[Poster Presentation Associated with Paper Presentation P2-3]
Convention Paper 6644 (Purchase now)

P6-4 Signal Analysis Using the Complex Spectral Phase Evolution (CSPE) Method
Kevin Short, Ricardo Garcia, Chaoticom Technologies - Andover, MA, USA
The Complex Spectral Phase Evolution (CSPE) method is introduced as a tool to analyze and detect the presence of short-term stable sinusoidal components in an audio signal. The method provides for super-resolution of frequencies by examining the evolution of the phase of the complex signal spectrum over time-shifted windows. It is shown that this analysis, when applied to a sinusoidal signal component, allows for the resolution of the true signal frequency with orders of magnitude greater accuracy than the DFT. Further, this frequency estimate is independent of the frequency bin and can be estimated from “leakage” bins far from spectral peaks. The method is robust in the presence of noise or nearby signal components, and is a fundamental tool in the front-end processing for the KOZ compression technology.

[Poster Presentation Associated with Paper Presentation P2-4]
Convention Paper 6645 (Purchase now)

P6-5 Contextual Effects on Sound Quality Judgments: Listening Room and Automotive Environments
Kathryn Beresford, University of Surrey - Guildford, Surrey, UK; Natanya Ford, Harman Becker Automotive Systems - Bridgend, UK; Francis Rumsey, Slawomir Zielinski, University of Surrey - Guildford, Surrey, UK
This study was designed to assess the effect of the listening context on basic audio quality for stimuli with varied mid-range timbral degradations. An assessment of basic audio quality was carried out in two different listening environments: an ITU-R BS.1116 conformant listening room and a stationary vehicle. A group of untrained listeners graded basic audio quality using a novel single stimulus method. The listener population was divided into two subsets—one made evaluations in a listening room and the other in a vehicle. The single stimulus method was investigated as a possible subjective evaluation method for use in automotive environments.

[Poster Presentation Associated with Paper Presentation P2-7]
Convention Paper 6648 (Purchase now)

P6-6 A Hybrid Concealment Algorithm for Nonpredictive Wideband Audio Coders
Vilayphone Vilaysouk, Roch Lefebvre, Université de Sherbrooke - Sherbrooke, Quebec, Canada
This paper proposes a hybrid packet loss concealment (PLC) algorithm for memoryless encoders such as PCM. The concealment algorithm integrates two modes, one in the time domain and the other in the frequency domain. Mode selection is performed using the previous, correctly received samples prior to an erased packet. This hybrid approach provides a packet loss concealment mechanism that can adapt to the signal characteristics and is not restricted to pure speech signals. Subjective evaluations have demonstrated that the proposed algorithm performs significantly better than single mode concealment algorithms.
Convention Paper 6670 (Purchase now)

P6-7 Toward an Inverse Constant Q Transform
Derry FitzGerald, Matt Cranitch, Marcin T. Cychowski, Cork Institute of Technology - Bishopstown, Cork, Ireland
The Constant Q transform has found use in the analysis of musical signals due to its logarithmic frequency resolution. Unfortunately, a considerable drawback of the Constant Q transform is that there is no inverse transform. Here we show it is possible to obtain a good quality approximate inverse to the Constant Q transform provided that the signal to be inverted has a sparse representation in the Discrete Fourier Transform domain. This inverse is obtained through the use of l0 and l1 minimization approaches to project the signal from the constant Q domain back to the Discrete Fourier Transform domain. Once the signal has been projected back to the Discrete Fourier Transform domain, the signal can be recovered by performing an inverse Discrete Fourier Transform.
Convention Paper 6671 (Purchase now)

P6-8 History and Design of Russian Electro-Musical Instrument “Theremin”
Yurii Vasilyev, Saint-Petersburg State University of Telecommunications - St. Petersburg, Russia
The electro-musical instrument theremin, developed by the Russian physicist L. S. Theremin, has come a long way in its evolution and attracts constantly growing interest from audio engineers and performers. The theremin is used both for performing musical compositions of different genres and for creating special effects in theatrical performances, multimedia, and the film industry. This paper analyzes the circuit solutions of the last 80 years, based both on analog circuitry and on digital microprocessor techniques, as well as realizations of the theremin as real and virtual musical instruments. Advantages and disadvantages of the different circuit solutions are also analyzed, and the most interesting realizations of the virtual theremin are presented.
Convention Paper 6672 (Purchase now)

P6-9 A Fast- and High-Convergence Method for ICA-Based Noise Reduction in Mobile Phone Speech Communication
Zhang Zhipeng, Etoh Minoru, NTT DoCoMo Labs - Yokosuka, Kanagawa, Japan
This paper proposes a noise reduction technique that applies a priori information to unmixing matrix estimation in ICA; it offers fast and accurate convergence. We formulate the parameter estimation stabilized by the a priori information as a Bayesian framework of maximum a posteriori (MAP) estimation, and show its robustness in mobile phone environments, where the position of the microphone relative to the mouth is almost constant. We use the transfer function of mouth to microphone for one row of the unmixing matrix. Using these estimated parameters as initial values, the unmixing matrix can be updated with high efficiency in the framework of MAP estimation. Experimental results confirm that the proposed method achieves high performance, especially in high SNR noise conditions.
Convention Paper 6673 (Purchase now)

P6-10 A Comparison of Time-Domain Time-Scale Modification Algorithms
David Dorran, Dublin Institute of Technology - Dublin, Ireland; Robert Lawlor, National University of Ireland - Maynooth, Ireland; Eugene Coyle, Dublin Institute of Technology - Dublin, Ireland
Time-domain approaches to time-scale modification are popular due to their ability to produce high quality results at a relatively low computational cost. Within the category of time-domain implementations quite a number of alternatives exist, each with their own computational requirements and associated output quality. This paper provides a computational and objective output quality assessment of a number of popular time-domain time-scaling implementations; thus providing a means for developers to identify a suitable algorithm for their application of interest. In addition, the issues that should be considered in developing time-domain algorithms are outlined, purely in the context of a waveform editing procedure.
Convention Paper 6674 (Purchase now)
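
As an illustration of the class of algorithms P6-10 compares, here is a compact SOLA-style time stretcher: analysis frames are taken at a stretched hop, aligned by a simple waveform-similarity search within a small tolerance, and overlap-added. Frame length, tolerance, and windowing are arbitrary choices for the sketch and not parameters from the paper.

```python
import numpy as np

def sola_stretch(x, ratio, frame=2048, syn_hop=512, tol=256):
    """Time-domain time-scale modification (SOLA-style sketch).

    ratio > 1 lengthens the signal, ratio < 1 shortens it.
    """
    ana_hop = int(round(syn_hop / ratio))
    win = np.hanning(frame)
    n_frames = max(1, (len(x) - frame - tol) // ana_hop)
    out = np.zeros(n_frames * syn_hop + frame)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        pos = i * ana_hop
        target = i * syn_hop
        # find a small offset that best aligns the new frame with what has
        # already been written (cross-correlation as similarity criterion)
        best, best_score = 0, -np.inf
        if i > 0:
            written = out[target:target + tol]
            for off in range(tol):
                score = np.dot(written, x[pos + off:pos + off + tol])
                if score > best_score:
                    best, best_score = off, score
        seg = x[pos + best:pos + best + frame]
        out[target:target + len(seg)] += win[:len(seg)] * seg
        norm[target:target + len(seg)] += win[:len(seg)]
    return out / np.maximum(norm, 1e-8)
```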

P6-11 The Importance of the Nonharmonic Residual for Automatic Musical Instrument Recognition of Pitched Instruments
Arie Livshin, Xavier Rodet, IRCAM Centre Pompidou - Paris, France
In different papers dealing with automatic musical instrument recognition of pitched instruments, the features used for classification are based solely on the fundamental frequencies and the harmonic series, ignoring the nonharmonic residual. In this paper we explore whether the instrument recognition rate of pitched instruments is decreased by removing the nonharmonic information present in the sound signal.
Convention Paper 6675 (Purchase now)

P6-12 A Fuzzy Rules-Based Speech/Music Discrimination Approach for Intelligent Audio Coding over the Internet
Jose Enrique Muñoz-Exposito, S. García Galán, N. Ruiz Reyes, P. Vera Candeas, F. Rivas Peña, Universidad de Jaen - Linares, Spain
Our paper presents a speech/music discrimination approach based on fuzzy rules for selecting the suitable coder required in an intelligent audio coding system. When the same coder is used for both speech and music, it is difficult to achieve good audio quality and low bit rates for both types of signals. We propose using a simple feature called the Warped LPC-based Spectral Centroid (WLPC-SC) for speech/music discrimination. In order to select the suitable audio coder for each audio frame, an expert system is proposed. The main advantage of the proposed approach is the low computational cost of both the speech/music discrimination and coder selection stages, which allows its use in real-time applications such as Internet audio streaming.
Convention Paper 6676 (Purchase now)
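
The feature family that P6-12 builds on, a spectral-centroid measure (computed in the paper from warped LPC), can be illustrated in its plainest unwarped form. The frame size and the hard threshold rule below are illustrative stand-ins for the warped-LPC front end and the fuzzy rule base described in the abstract.

```python
import numpy as np

def spectral_centroid(frame, fs):
    """Plain FFT-based spectral centroid of one analysis frame (Hz)."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    return np.sum(freqs * spec) / (np.sum(spec) + 1e-12)

def centroid_track(x, fs, frame=1024, hop=512):
    return np.array([spectral_centroid(x[i:i + frame], fs)
                     for i in range(0, len(x) - frame, hop)])

# Toy decision rule: speech tends to show a lower centroid than broadband
# music; a fuzzy inference system would replace this single hard threshold.
def crude_decision(centroids, threshold_hz=2500.0):
    return "music" if np.median(centroids) > threshold_hz else "speech"
```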

P6-13 Analysis and Transsynthesis of Solo Erhu Recordings Using Adaptive Additive/Subtractive Synthesis
Yi-Song Siao, Wei-Lun Chang, Alvin Su, National Cheng-Kung University - Tainan, Taiwan
Erhu is the main bowed-string instrument in traditional Chinese music, much like the violin in western music. It has two strings and its top plate is made of snake skin. Numerous solo works were written for erhu. In this paper erhu resynthesis/transsynthesis software is presented. We use frame-based methods to analyze pitch and volume information of a solo erhu recording. Then, one can resynthesize it using the erhu timbre extracted from the original recording, other erhu timbres, or even timbres like violin and trumpet. Additive synthesis and subtractive synthesis methods are used to synthesize the overall sound. Because the expression and playing style of the original recording are preserved, the result is realistic and musical.
Convention Paper 6677 (Purchase now)

P6-14 Application of Fisher Linear Discriminant Analysis to Speech/Music ClassificationEnrique Alexandre, Manuel Rosa, Lucas Cuadra, Roberto Gil-Pita, Universidad de Alcalá - Alcalá de Henares, Madrid, Spain
This paper proposes the application of Fisher linear discriminants to the problem of speech/music classification. Fisher linear discriminants classify between two classes based on the calculation of a centroid for the training data corresponding to each class. From this information a linear boundary is established and used for the classification process. Results are given demonstrating the superior behavior of this classification algorithm compared with the well-known K-nearest neighbor algorithm. It is also demonstrated that very good results, in terms of probability of error, can be obtained using only one feature extracted from the audio signal, making it possible to reduce the complexity of such systems so that they can be implemented in real time.
Convention Paper 6678 (Purchase now)
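
As a sketch of the classifier the abstract describes, the following Python/NumPy fragment trains a two-class Fisher linear discriminant from the class centroids and within-class scatter and applies the resulting linear boundary; the feature extraction and the training data are outside its scope, and the midpoint threshold is an assumption.

    # Two-class Fisher linear discriminant: projection direction plus linear boundary.
    import numpy as np

    def fisher_lda_train(X0, X1):
        """X0, X1: (n_samples, n_features) training arrays for the two classes."""
        m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
        # Within-class scatter matrix (sum of per-class scatter)
        Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
              + np.cov(X1, rowvar=False) * (len(X1) - 1))
        w = np.linalg.solve(Sw, m1 - m0)            # projection direction
        threshold = 0.5 * (w @ m0 + w @ m1)         # midpoint between projected means
        return w, threshold

    def fisher_lda_classify(X, w, threshold):
        """Return 1 (e.g., 'music') where the projection exceeds the threshold, else 0."""
        return (X @ w > threshold).astype(int)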


P7 - Multichannel Sound, Part 1

Sunday, May 21, 08:40 — 12:20

Chair: Jan Berg, Luleå University of Technology - Luleå, Sweden

P7-1 Effectiveness of Height Information for Reproducing the Presence and Reality in the Multichannel Audio SystemKimio Hamasaki, Toshiyuki Nishiguchi, NHK Science & Technical Research Laboratories - Tokyo, Japan; Koichiro Hiyama, NHK Kumamoto Station - Kumamoto, Japan; Reiko Okumura, NHK Science & Technical Research Laboratories - Tokyo, Japan
A 22.2 multichannel sound system was developed that adapts to an ultrahigh-definition video system with 4000 scanning lines. The sound system consists of loudspeakers in three layers: an upper layer with nine channels, a middle layer with ten channels, and a lower layer with three channels, plus two channels for low frequency effects. This system has new features for three-dimensional sound reproduction. A subjective evaluation by the semantic differential (SD) method is presented to assess the importance of height information, using several stimuli in a 22.2 multichannel audio system with Super Hi-Vision and a high-definition television. Furthermore, the actual effectiveness of height information and some practical suggestions for the aesthetic mixing of three-dimensional audio are also presented.

[Associated Poster Presentation in Session P14, Sunday, May 21, at 16:00]

Presentation is scheduled to begin at 08:40
Convention Paper 6679 (Purchase now)

P7-2 Multichannel Signal Processing for Microphone ArraysPaolo Martignon, University of Parma - Parma, Italy
Microphone arrays are employed to make measurements or recordings that take into account the spatial properties of sound. Here the attention is focused on planar arrays oriented toward acoustic mapping, which are of particular interest in industrial and environmental acoustics, although musical and audio applications are directly involved as well. An overview of beam-forming theory is given, together with a brief study of array spatial resolution, whose physical limits no algorithm can overcome. A new algorithm, based on Kirkeby multichannel inversion, is then proposed. Multichannel inversion and beam forming are compared through simulations, with encouraging results for the new method.

Presentation is scheduled to begin at 09:00
Convention Paper 6680 (Purchase now)
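
As a baseline for the beam-forming overview mentioned in the abstract, here is a minimal Python/NumPy delay-and-sum beamformer for a uniform linear array; the Kirkeby multichannel-inversion method the paper proposes is not reproduced, and the far-field geometry, integer-sample delays, and steering convention are simplifying assumptions.

    # Delay-and-sum beamformer for a uniform linear array (sketch-level accuracy).
    import numpy as np

    def delay_and_sum(signals, fs, spacing, angle_deg, c=343.0):
        """signals: (n_mics, n_samples) array from a uniform linear array.
        Steer toward angle_deg measured from broadside; returns the beamformed signal."""
        n_mics, n_samples = signals.shape
        out = np.zeros(n_samples)
        for m in range(n_mics):
            # Far-field arrival-time difference of mic m relative to mic 0
            tau = m * spacing * np.sin(np.deg2rad(angle_deg)) / c
            shift = int(round(tau * fs))          # integer-sample delay only
            out += np.roll(signals[m], -shift)    # shift to align the wavefronts
            # np.roll wraps around at the edges; acceptable for a sketch, not production
        return out / n_mics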

P7-3 Miniature Microphone Arrays for Multichannel RecordingJuha Backman, Nokia Corporation - Espoo, Finland
This paper describes a method of using a dense array of miniature microphones (e.g., MEMS or miniature electret) to yield precise one-point multichannel gradient microphones. The signals obtained from the individual microphones in the array are used to obtain an estimate of the zero-, first-, and second-order components of the gradient of the sound field at the center of the array. (Higher orders of the gradient tend to be too noisy for actual sound recording purposes.) These can be used to form stereo or multichannel signals with adjustable polar patterns for recording purposes.

[Associated Poster Presentation in Session P14, Sunday, May 21, at 16:00]

Presentation is scheduled to begin at 09:20
Convention Paper 6681 (Purchase now)

P7-4 Benefits of Distance Correction for Multichannel MicrophonesThomas Görne, Detmold University of Music - Detmold, Germany
Subjective assessment of stereophonic or multichannel microphone techniques often suffers from differences in the diffuse field sensitivities of various arrays. Diffuse field behavior of single rotation-symmetrical microphones at lateral direct sound incidence can be derived from the polar equations of ideal first-order gradient transducers. This simple model is used to estimate distance correction factors for symmetrical two-dimensional arrays as well as for MS pairs. The benefits of corrected stereo setups are also investigated.

[Associated Poster Presentation in Session P14, Sunday, May 21, at 16:00]

Presentation is scheduled to begin at 09:40
Convention Paper 6682 (Purchase now)

P7-5 Virtual Source Location Information-Based Matrix Decoding SystemHan-gil Moon, Manish Arora, Samsung Electronics Co. Ltd. - Suwon, Gyeonggi-Do, Korea
In this paper a new matrix decoding system using vector-based Virtual Source Location Information (VSLI) is proposed as an alternative to the conventional Dolby Pro Logic II/IIx system for reconstructing a multichannel output signal from matrix-encoded 2-channel signals, Lt/Rt. This new matrix decoding system is composed of a passive decoding part and an active part. The passive part creates crude multichannel signals as linear combinations of the two encoded signals (Lt/Rt), and the active part enhances each channel with respect to the virtual source that emerges between each channel pair. The virtual sources between channels are estimated by the inverse constant-power panning law.

[Associated Poster Presentation in Session P14, Sunday, May 21, at 16:00]

Presentation is scheduled to begin at 10:00
Convention Paper 6683 (Purchase now)
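
To illustrate the passive decoding stage described above, the following Python/NumPy sketch forms crude channels as fixed linear combinations of Lt/Rt; the 3 dB-down sum/difference coefficients are a common generic choice assumed here for illustration, and the paper's VSLI-based active enhancement stage, which is its actual contribution, is not shown.

    # Generic passive matrix decoding of Lt/Rt into crude L, R, C, S channels.
    import numpy as np

    def passive_decode(lt, rt):
        """lt, rt: matrix-encoded stereo signals (NumPy arrays of equal length)."""
        g = 1.0 / np.sqrt(2.0)          # assumed 3 dB-down combining gain
        left = lt
        right = rt
        center = g * (lt + rt)          # in-phase content
        surround = g * (lt - rt)        # anti-phase content
        return left, right, center, surround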

P7-6 Relating Auditory Attributes of Multichannel Sound to Preference and to Physical ParametersSylvain Choisel, Aalborg University - Aalborg, Denmark, Bang & Olufsen A/S, Struer, Denmark; Florian Wickelmaier, Aalborg University - Aalborg, Denmark
Sound reproduced by multichannel systems is affected by many factors giving rise to various sensations, or auditory attributes. Relating specific attributes to overall preference and to physical measures of the sound field provides valuable information for a better understanding of the parameters playing a role in sound quality evaluation. Eight selected attributes are quantified by a panel of 39 listeners using paired-comparison judgments and probabilistic choice models, and related to overall preference. A multiple-regression model predicts preference well, and some similarities are observed within and between musical program materials, allowing for a careful generalization regarding the perception of spatial audio reproduction. Finally, a set of objective measures is derived from analysis of the sound field at the listening position in an attempt to predict the auditory attributes.

Presentation is scheduled to begin at 10:20
Convention Paper 6684 (Purchase now)

P7-7 Quality Degradation Effects Caused by Limiting the Bandwidth of Standard Surround Sound Channels and Hierarchically Encoded MSBTF Channels: A Comparative StudyYu Jiao, Slawomir Zielinski, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
Limiting the bandwidth of multichannel audio can be used as an effective method of trading off audio quality against broadcasting costs. In this paper the subjective effects of two controlled high-frequency limitation methods on multichannel audio quality were studied with formal listening tests. The first method was based on limiting the bandwidth of standard surround sound channels (Rec. ITU-R BS.775-1); the second involved limiting the bandwidth of the hierarchically encoded MSBTF channels. The results are compared and discussed. In this experiment the low frequency effects (LFE) channel was omitted.

[Associated Poster Presentation in Session P14, Sunday, May 21, at 16:00]

Presentation is scheduled to begin at 10:40
Convention Paper 6685 (Purchase now)

P7-8 Initial Developments of an Objective Method for the Prediction of Basic Audio Quality for Surround Audio RecordingsSunish George, Slawomir Zielinski, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
This paper describes the development of an objective method for the prediction of the Basic Audio Quality (BAQ) of band-limited or down-mixed surround audio recordings. A number of physical parameters, including interaural cross-correlation coefficients and spectral descriptors, were extracted from the recordings and used in a linear regression model to predict BAQ scores obtained from listening tests. The results showed a high correlation between the predicted scores and those obtained from the listening test, with the average prediction error being smaller than 10 percent. Although the method was originally developed for 5-channel surround recordings, with some modifications it can be extended to any number of audio channels.

[Associated Poster Presentation in Session P14, Sunday, May 21, at 16:00]

Presentation is scheduled to begin at 11:00
Convention Paper 6686 (Purchase now)

P7-9 Listener Opinions of Novel Spatial Audio ScenesKathryn Beresford, Francis Rumsey, Slawomir Zielinski, University of Surrey - Guildford, Surrey, UK
Listener opinions for alternative approaches to recording multichannel classical music were investigated, particularly considering alternatives to the traditional approach. Recordings were made with pre-existing microphone arrays but alternative arrangements of musicians. These were used in a listening test to assess different attributes (timbral balance, envelopment, locatedness, etc.). From the results it was noted that naïve and trained listeners assessed the recordings in different ways. Through factor analysis, two components were identified to represent these assessments—creativity and conventionality. The naïve listeners indicated that purchasability was closely related to creativity whereas for the trained listeners, conventionality was an indicator of purchasability. A method for predicting purchasability was developed, which may aid future work in the area.

Presentation is scheduled to begin at 11:20
Convention Paper 6687 (Purchase now)

P7-10 Low Frequency Sound Field Enhancement System for Rectangular Rooms Using Multiple Low Frequency LoudspeakersAdrian Celestinos, Sofus Birkedal Nielsen, Aalborg University - Aalborg, Denmark
Rectangular rooms have a strong influence on the low frequency performance of loudspeakers. Simulations of three different room sizes have been carried out using the finite-difference time-domain (FDTD) method in order to predict the behavior of the sound field at low frequencies. By using an enhancement system with extra loudspeakers, the sound pressure level distribution across the listening area shows a significant improvement in the subwoofer frequency range. The system is simulated and implemented in the three different rooms and finally verified by measurements in the real rooms.

Presentation is scheduled to begin at 11:40
Convention Paper 6688 (Purchase now)

P7-11 Tactile Strategies and Resources for Teaching Multichannel Sound ConceptsLeslie Gaston, University of Colorado at Denver - Denver, CO, USA
Several university audio programs now incorporate multichannel, or surround sound, into their curricula. In order to supplement these courses and lectures many opportunities exist to incorporate hands-on demonstrations of concepts used for microphone techniques, mixing, monitoring, and delivery. This paper will give suggestions for different tactile strategies that can be used to illustrate concepts in multichannel audio, as well as other resources that may be utilized when doing preparation and research for teaching classes. Suggestions for homework and research topics for students will also be provided, along with recommended equipment needs.

[Associated Poster Presentation in Session P14, Sunday, May 21, at 16:00]

Presentation is scheduled to begin at 12:00
Convention Paper 6689 (Purchase now)


P8 - Signal Processing and High Resolution Audio, Part 1

Sunday, May 21, 08:40 — 12:20

Chair: Pauli Minnaar, AM3D A/S - Aalborg, Denmark

P8-1 All Amplifiers Are Analog, but Some Amplifiers Are More Analog than OthersBruno Putzeys, Hypex Electronics B.V. - Groningen, The Netherlands; André Veltman, Paul van der Hulst, Piak Electronic Design b.v. - Culemborg, The Netherlands; René Groenenberg, Mueta b.v. - Wijk en Aalburg, The Netherlands
This paper intends to clarify the terms “digital” and “analog” as applied to class-D audio power amplifiers. Since loudspeaker terminals require an analog voltage, an audio power amplifier must have an analog output. If its input is digital, digital-to-analog conversion is necessarily executed at some point. Once a designer acknowledges the analog output properties of a class-D power stage, amplifier quality can improve. The incorrect assumption that some amplifiers are digital causes many designers to come up with complicated patches for ordinary analog phenomena such as timing distortion or supply rejection. This irrational approach blocks the way to a rich world of well-established analog techniques that avoid and correct many of these problems and realize otherwise unattainable characteristics such as excellent THD+N and extremely low output impedance throughout the audio band.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 08:40
Convention Paper 6690 (Purchase now)

P8-2 Toward an Ideal Switching (Class-D) Power Amplifier: How to Control the Flow of Power in a Switching Power CircuitRolf Esslinger, Dieter Jurzitza, Harman/Becker Automotive Systems - Karlsbad, Germany
The design of a switching (class-D) audio power amplifier suitable for high-end audio applications is still a very challenging task for circuit design and signal processing engineers. Classical power stage topologies using pulse-width modulation (PWM) in combination with voltage-controlled MOSFET H-bridges are already available on the market, but their performance in terms of signal bandwidth and linearity is still far below that of traditional class-A and class-A/B power stages. Moreover, EMC is an issue that is very hard to control. This paper considers class-D output stages from a totally different point of view: the flow of power in the output stage, with the switching power stage acting as a power control element, the output filter as an energy store, and the load as both a power sink and a power source when the load is not a resistor but a real-world loudspeaker. It is shown where in a typical power stage the power loss, dissipated as heat, occurs. To improve the quality and efficiency of high-frequency switched power stages, the way the flow of power into the storage elements is controlled, and how they are charged most precisely and most efficiently, has to be investigated. Some fundamental approaches to this are shown in this paper.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 09:00
Convention Paper 6691 (Purchase now)

P8-3 Second Generation Intelligent Class D Amplifier Controller Integrated Circuit Enables both Low Cost and High Performance Amplifier DesignsJack Andersen, Daniel Chieng, Steven Harris, Jeff Klaas, Michael Kost, Skip Taylor, D2Audio Corporation - Austin, TX, USA
This paper describes a digital input Class D amplifier controller integrated circuit that performs many of the functions needed to build a high-performance Class D audio amplifier. Sophisticated digital pulse width modulation, combined with digital feed-forward and feedback paths, yields both low cost and high performance amplifier designs. A powerful DSP is included to support amplifier control and allows comprehensive audio signal processing, including loudspeaker load compensation, EQ, time alignment, room acoustics compensation, bass enhancement, loudspeaker driver protection, virtual surround, and other audio signal processing tasks. Power supply feed-forward and closed-loop feedback technology correct for power supply variations, nonlinearity, and other distortion-inducing mechanisms.

Presentation is scheduled to begin at 09:20
Convention Paper 6692 (Purchase now)

P8-4 PWM Amplifier Control Loops with Minimum Aliasing DistortionLars Risbo, Texas Instruments Denmark A/S - Lyngby, Denmark; Claus Neesgaard, Texas Instruments Inc. - Dallas, TX, USA
PWM class-D audio power amplifiers typically contain a control loop filter network and a comparator producing the PWM signal. The comparator performs a sampling operation whenever it changes state. A previous paper by the author analyzed this sampling behavior from a small signal point of view. The present paper attempts to formulate a large-signal model that accounts for the nonlinear effects of the sampling due to aliasing of high frequency carrier components. Closed-form expressions for the intrinsic THD of the traditional first- and second-order loops are derived. The model is validated using simulations, and a class of Minimum Aliasing Error (MAE) loop filters is presented that obtains minimum aliasing distortion thanks to the use of quadrature sampling. Finally, measurement data are presented for real applications using the principles described.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 09:40
Convention Paper 6693 (Purchase now)

P8-5 Simple, Ultralow Distortion Digital Pulse Width ModulatorBruno Putzeys, Hypex Electronics BV - Groningen, The Netherlands
A core problem with digital pulse width modulators is that effective sampling occurs at signal-dependent intervals, falsifying the z-transform on which the input signal and the noise shaping process are based. In a first step the noise shaper is reformulated to operate at the timer clock rate instead of the pulse repetition frequency. This solves the uniform/natural sampling problem, but gives rise to new nonlinearities akin to ripple feedback in analog modulators. By modifying the feedback signal such that it reflects only the modulated edge of the pulse train this effect is practically eliminated, yielding vastly reduced distortion without increasing complexity.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 10:00
Convention Paper 6694 (Purchase now)

P8-6 A High Performance Open Loop All-Digital Class-D Audio Power Amplifier Using Zero Positioning Coding (ZePoC)Olaf Schnick, Wolfgang Mathis, University of Hannover - Hannover, Germany
Open-loop all-digital class-D amplifiers are uncommon because the lack of a correcting feedback path leads to several problems that result in high distortion compared to analog-controlled class-D amplifiers. This paper shows that SB-ZePoC lowers the switching frequency to 100 kHz; these problems can therefore be solved, making it possible to design an open-loop all-digital class-D audio amplifier with low total distortion over the whole audio band (20 Hz to 20 kHz) and an efficiency that reaches 90 percent. Results of a test setup will be presented. The sonic performance will be demonstrated during the session.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 10:20
Convention Paper 6695 (Purchase now)

P8-7 A Three-Level Trellis Noise Shaping Converter for Class D AmplifiersLudovico Ausiello, Riccardo Rovatti, University of Bologna - Bologna, Italy; Gianluca Setti, University of Ferrara - Ferrara, Italy
Class-D amplifiers can represent signals with three different output levels, +Vcc, 0, and -Vcc, with no distortion. Exploiting this to achieve better performance without increasing the switching frequency, an extension of the classic two-level pulse-width-modulation A/D conversion is proposed. Coding is achieved by extending the output waveforms of a trellis-based sigma-delta modulation to three levels. Simulation results show that, at the same symbol rate, the three-level pattern achieves a SINAD improvement of 3.7 to 8.2 dB and up to five times lower power consumption.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 10:40
Convention Paper 6696 (Purchase now)

P8-8 Using SIP Techniques to Verify the Trade-Off between SNR and Information Capacity of a Sigma Delta ModulatorCharlotte Yuk-Fan Ho, Joshua Reiss, Queen Mary, University of London - London, UK; Bingo Wing-Kuen Ling, King’s College London - London, UK
The Gerzon-Craven noise shaping theorem states that the ideal information capacity of a sigma delta modulator design is achieved if and only if the noise transfer function (NTF) is minimal phase. In this paper it is found that there is a trade-off between the signal-to-noise ratio (SNR) and the information capacity of the noise shaped channel. In order to verify this result, loop filters satisfying and not satisfying the minimal phase condition of the NTF are designed via semi-infinite programming (SIP) techniques and solved using dual parameterization. Numerical simulation results show that the design with a minimal phase NTF achieves near the ideal information capacity of the noise shaped channel, but the SNR is low. On the other hand, the design with a nonminimal phase NTF achieves a positive value of the information capacity of the noise shaped channel, but the SNR is high. Results are also provided that compare the SIP design technique with Butterworth and Chebyshev structures and ideal theoretical SDMs, and evaluate the performance in terms of SNR and a variety of information theoretic measures which capture noise shaping qualities.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 11:00
Convention Paper 6697 (Purchase now)

P8-9 Estimation of Initial States of Sigma-Delta ModulatorsCharlotte Yuk-Fan Ho, Queen Mary, University of London - London, UK; Bingo Wing-Kuen Ling, King’s College London - London, UK; Joshua Reiss, Queen Mary, University of London - London, UK
In this paper an initial condition of a sigma-delta modulator is estimated based on quantizer output bit streams and an input signal. The set of initial conditions that generate a stable trajectory is characterized. It is found that this set, as well as the set of initial conditions corresponding to the quantizer output bit streams, are convex. Also, it is found that the mapping from the set of initial conditions to the stable admissible set of quantizer output bit streams is invertible if the loop filter is unstable. Hence, the initial condition corresponding to given stable admissible quantizer output streams and an input signal is uniquely defined when the loop filter is unstable, and a projection onto convex set approach is employed for approximating the initial condition.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 11:20
Convention Paper 6698 (Purchase now)

P8-10 High Performance Real-Time Software Asynchronous Sample Rate Converter KernelThierry Heeb, Anagram Technologies - Préverenges, Switzerland
A scalable real-time asynchronous sample rate converter software kernel is presented that offers a flexible alternative to the usual hardware implementations. The kernel is dynamically configurable at run-time and supports almost arbitrary upsampling or downsampling ratios and any number of channels. Due to its scalability this sample rate converter kernel may be used both for low complexity, cost-sensitive implementations as well as for top-performance applications. In a typical high performance application, sample rates of 384 kHz are easily achieved on a low-cost DSP, and DSD input data streams are also supported for compatibility with SACD.

Presentation is scheduled to begin at 11:40
Convention Paper 6699 (Purchase now)

P8-11 Clean Clocks, Once and for All?Christian G. Frandsen, TC Electronic A/S - Risskov, Denmark; Chris Travis, Sonopsis Ltd. - Wotton-under-Edge, Gloucestershire, UK
Network-based digital audio interfaces are becoming increasingly popular. But they do pose a significant jitter problem wherever high-quality conversion to/from analog is required. This is true even with networks such as 1394 that provide dedicated support for isochronous flows. Conventional PLL solutions have too little jitter attenuation, too much intrinsic jitter, and/or too narrow a frequency range. More advanced solutions tend to have too high a cost. A new clocking technology that boasts high performance and low cost is presented. It has been implemented in a recent audio-over-1394 chip. We show comparative performance results and explore system-level implications, including for systems that use point-to-point links such as AES3, SPDIF, and ADAT.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 12:00
Convention Paper 6700 (Purchase now)


P9 - Posters: Spatial Perception and Processing

Sunday, May 21, 09:00 — 10:30

P9-1 Evaluation of a 3-D Audio System with Head TrackingJan Abildgaard Pedersen, AM3D A/S - Aalborg, Denmark, Lyngdorf Audio, Skive, Denmark; Pauli Minnaar, AM3D A/S - Aalborg, Denmark
A 3-D audio system was evaluated in an experiment where listeners had to “shoot down” real and virtual sound sources appearing from different directions around them. The 3-D audio was presented through headphones, and head tracking was used. In order to investigate the influence of head movements, both long and short stimuli were used. Twenty-six people participated, half of whom were students and half pilots. The results were analyzed by calculating a localization offset and a localization uncertainty. For azimuth no significant offset was found, whereas for elevation an offset was found that is strongly correlated with the stimulus elevation. The uncertainty for real and virtual sound sources was 10 degrees and 14 degrees in azimuth and 12 degrees and 24 degrees in elevation, respectively.

[Poster Presentation Associated with Paper Presentation 3-5]
Convention Paper 6654 (Purchase now)

P9-2 Design and Verification of HeadZap, a Semi-Automated HRIR Measurement SystemDurand Begault, Martine Godfroy, NASA Ames Research Center - Moffett Field, CA, USA; Joel Miller, VRSonic, Inc. - San Francisco, CA, USA; Agnieszka Roginska, AuSIM Incorporated - Palo Alto, CA, USA; Mark Anderson, Elizabeth Wenzel, NASA Ames Research Center - Moffett Field, CA, USA
This paper describes the design, development, and acoustic verification of HeadZap, a semi-automated system for measuring head-related impulse responses (HRIR) designed by AuSIM Incorporated and modified by the NASA Ames Research Center Spatial Auditory Display Laboratory. HeadZap utilizes an array of twelve loudspeakers in order to measure 432 HRIRs at 10° intervals in both azimuth and elevation, in a nonanechoic environment. Application to real-time rendering using SLAB software in an audio-visual localization experiment is discussed.

[Poster Presentation Associated with Paper Presentation 3-6]
Convention Paper 6655 (Purchase now)

P9-3 Directional Audio Coding: Filterbank and STFT-Based DesignVille Pulkki, Helsinki University of Technology - Espoo, Finland; Christof Faller, EPFL - Lausanne, Switzerland
Directional audio coding (DirAC) is a method for spatial sound representation, applicable to arbitrary audio reproduction methods. In the analysis part, properties of the sound field in time and frequency in a single point are measured and transmitted as side information together with one or more audio waveforms. In the synthesis part, the properties of the sound field are reproduced using separate techniques for point-like virtual sources and diffuse sound. Different implementations of DirAC are described and differences between them are discussed. A modification of DirAC is presented, which provides a link to Binaural Cue Coding and parametric multichannel audio coding in general (e.g., MPEG Surround).

[Poster Presentation Associated with Paper Presentation 3-9]
Convention Paper 6658 (Purchase now)

P9-4 Comprehensive Analysis of Loudspeaker Span Effects on Crosstalk Cancellation in Spatial Sound ReproductionMingsian R. Bai, Chih-Chung Lee, National Chiao-Tung University - Hsin-Chu, Taiwan
This paper seeks to pinpoint the optimal loudspeaker span that best reconciles the robustness and performance of the crosstalk cancellation system (CCS). Two sweet-spot definitions are employed for the assessment of robustness. Besides the point-source model, head-related transfer functions are employed in the simulation to capture more design aspects of practical situations. Three span angles, 10 degrees, 60 degrees, and 120 degrees, are compared via subjective experiments, and analysis of variance is applied to the results. The results indicate that not only the CCS performance but also the panning effect and head shadowing dictate the overall performance and robustness. The 120-degree arrangement performs comparably to the 60-degree arrangement and is preferred over the 10-degree arrangement.
Convention Paper 6701 (Purchase now)
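
For orientation, a crosstalk cancellation system of the kind studied here can be sketched as a regularized inversion of the 2x2 loudspeaker-to-ear plant at each frequency; the Python/NumPy fragment below shows that step only, with the plant measurement (point-source or HRTF-based) and the regularization constant left as assumptions, and none of the paper's sweet-spot or span-angle analysis reproduced.

    # Regularized 2x2 crosstalk-cancellation filter design, frequency by frequency.
    import numpy as np

    def ccs_filters(H, beta=0.001):
        """H: (n_freqs, 2, 2) complex plant matrices, H[f, ear, speaker].
        Returns C: (n_freqs, 2, 2) filters with H @ C approximately the identity."""
        C = np.zeros_like(H)
        eye = np.eye(2)
        for f in range(H.shape[0]):
            Hf = H[f]
            # Tikhonov-regularized inverse: C = H^H (H H^H + beta I)^-1
            C[f] = Hf.conj().T @ np.linalg.inv(Hf @ Hf.conj().T + beta * eye)
        return C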

P9-5 A Perceptual Measure for Assessing and Removing Reverberation from Audio SignalsThomas Zarouchas, John Mourjopoulos, Panagiotis Hatziantoniou, University of Patras - Patras, Greece; Joerg Buchholz, University of Western Sydney - Penrith South DC, New South Wales, Australia
A novel signal-dependent approach is followed here for modeling perceived distortions due to reverberation in audio signals. The method describes perceived monaural time-frequency and level distortions due to reverberation, which depend on the reproduced signal’s evolution. A Computational Auditory Masking Model (CAMM) is employed, using as inputs the reverberant and reference (anechoic) signal, generating time-frequency maps of perceived distortions. From these maps and in a number of sub-bands, suitable functions can be derived allowing suppression of reverberation in the processed signal.
Convention Paper 6702 (Purchase now)

P9-6 Investigating Spatial Audio Coding Cues for Meeting Audio SegmentationEva Cheng, Ian Burnett, Christian Ritz, University of Wollongong - Wollongong, New South Wales, Australia
As multiparty meetings involve participants that are generally stationary when actively speaking, participant location information can be used to segment the recorded meeting audio into speaker “turns.” In this paper speaker location information derived from spatial cues generated by spatial audio coding techniques is investigated. The validity of using spatial cues for meeting audio segmentation is explored by investigating multiple-microphone meeting audio recording techniques and by extracting and comparing the spatial cues used by different spatial audio coders. Experimental results show that the statistical relationship between speaker location and interchannel level- and phase-based spatial cues strongly depends on the microphone pattern. Results also indicate that interchannel correlation-based spatial cues represent location information that is ambiguous for meeting audio segmentation.
Convention Paper 6703 (Purchase now)

P9-7 The Effect of Audio Compression Techniques on Binaural Audio RenderingFabien Prezat, Brian Katz, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI-CNRS) - Orsay, France
The use of lossy audio compression is becoming increasingly common. Many studies have concentrated on the audio quality of such compression techniques, predominantly in a monaural context. This paper investigates the effects of audio compression techniques on spatialized audio, specifically binaural audio. Various compression techniques (AAC, ATRAC, MP2, and MP3) at various bit rates where possible have been applied to several test signals. This paper presents numerical and perceptual comparisons of the variations in interaural time difference (ITD) due to audio compression techniques. Some investigations were also made concerning the effect on spectral peaks and notches, as these spectral cues (contained in the Head-Related Transfer Function, HRTF) are necessary for more precise localization, including front-back discrimination and elevation.
Convention Paper 6704 (Purchase now)
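
The ITD variations compared in the paper can be measured with a standard interaural cross-correlation estimate; the following minimal Python/NumPy sketch returns the lag of maximum cross-correlation within a +/-1 ms window, where the window size and the sign convention are assumptions and the authors' actual analysis may differ.

    # ITD estimate as the lag of maximum interaural cross-correlation.
    import numpy as np

    def estimate_itd(left, right, fs, max_itd=0.001):
        """Return the ITD in seconds between binaural channels left and right."""
        max_lag = int(max_itd * fs)
        lags = np.arange(-max_lag, max_lag + 1)
        # Trim the edges so the circular shift from np.roll never contaminates the sum
        xcorr = [np.sum(left[max_lag:-max_lag] *
                        np.roll(right, lag)[max_lag:-max_lag]) for lag in lags]
        return lags[int(np.argmax(xcorr))] / fs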

P9-8 Sound Source Obstruction in an Interactive 3-Dimensional MPEG-4 EnvironmentBeatrix Steglich, Ulrich Reiter, Technische Universität Ilmenau - Ilmenau, Germany
This paper describes the continuation of research concerning sound source obstruction in virtual scenes. An algorithm for the determination of sound source obstruction was implemented in the described 3-D MPEG-4 environment. With the help of the MPEG-4 Advanced AudioBIFS node AcousticMaterial acoustic properties are assigned to potential obstructors in a virtual scene. Various implementations of acoustic obstruction are explained. Furthermore, a bimodal subjective assessment was performed in order to identify the best implementation of obstruction. The results of the assessment are presented in-depth. Additionally we demonstrate a concept for a second intended bimodal assessment for the comparison of gain and frequency filtering and give an outlook for further research and development in the area of immersive acoustics.
Convention Paper 6705 (Purchase now)


P10 - Posters: Audio in Computers and Audio Networking

Sunday, May 21, 11:00 — 12:30

P10-1 Scene Description Model and Rendering Engine for Interactive Virtual AcousticsJean-Marc Jot, Jean-Michel Trivi, Creative Advanced Technology Center - Scotts Valley, CA, USA
Interactive environmental audio spatialization technology has become commonplace in personal computers, where its primary current application is video game soundtrack rendering. The most advanced PC audio platforms available can spatialize 100 or more sound sources simultaneously over headphones or multichannel home theater systems, and employ multiple reverberation engines to simulate complex acoustical environments. This paper reviews the main features of the EAX environmental audio programming interface and its relation to the I3DL2 and MPEG-4 standards. A statistical reverberation model is introduced for simulating per-source distance and directivity effects. An efficient spatial reverberation and mixing architecture is described for the spatialization of multiple sound sources around a virtual listener navigating across multiple connected virtual rooms including acoustic obstacles.

[Poster Presentation Associated with Paper Presentation 4-2]
Convention Paper 6660 (Purchase now)

P10-2 Intelligent Audio for GamesCol Walder, Revolution Recording - Sheffield, UK
Providing interactive audio for computer games has traditionally been seen as a challenge, particularly given the technological limitations of games consoles. With current advances in technology, however, there is the potential to take advantage of the benefits of interactivity. This paper proposes the use of Artificial Intelligence (AI) routines to control in-game audio with a focus on implementing techniques used in film sound for drama-based games. Soar architecture is presented as a good candidate for developing audio AI for games.

[Poster Presentation Associated with Paper Presentation 4-3]
Convention Paper 6661 (Purchase now)

P10-3 Single Frequency Networks for FM RadioPierre Soelberg, Selberg Broadcast & IT Consult - København S, Denmark
Single Frequency Networks (SFN) and Near Single Frequency Networks (NSFN) are usually not considered suitable for FM radio. Some countries are now replanning their FM bands for the use of (N)SFN in order to make space for more stations. Even though some stations already use the technique, for example a station covering a highway, replanning the FM band with SFN for a whole country is a different matter. The first country to do this was the Netherlands, and the first experiences are not as good as expected. The requirements for synchronization of FM transmitters used for (N)SFN are explained, and SFN networks are tested from real transmitter sites. The result is a proposed correction to the Dutch norm.

[Poster Presentation Associated with Paper Presentation P4-6]
Convention Paper 6664 (Purchase now)

P10-4 A Paradigm for Wireless Digital Audio Home EntertainmentNikos Kokkos, University of Patras - Patras, Greece; Andreas Floros, Ionian University - Corfu, Greece; Nicolas-Alexander Tatlas, John Mourjopoulos, University of Patras - Patras, Greece
Despite recent advances in wireless networking technology, real-time streaming of CD-quality digital audio remains a challenging topic. In this paper a set of applications following the server-client model was developed, facilitating the transmission and playback of PCM-coded audio over wireless links. The implementation is based on typical personal computer (PC) platforms interconnected with off-the-shelf wireless networking hardware. Performance evaluation tests are presented under different networking parameters and link conditions, leading to an optimal set of parameters for high-quality wireless digital audio delivery.

[Poster Presentation Associated with Paper Presentation P4-7]
Convention Paper 6665 (Purchase now)

P10-5 Design and Installation of Recording Studios for Vocational TrainingChris Bradley, James Watt College of Further & Higher Education - Greenock, Inverclyde, Scotland, UK; Billy Law, Mediaspec - Glasgow, Scotland, UK
This paper describes the design and installation of new recording studios for vocational training in music and sound production, allowing unparalleled direct, hands-on tuition for students. The design allows simultaneous recording from the live rooms to all twelve control rooms via digital distribution, enabling individual setup for a recording session, multitrack recording, and subsequent mixdown. All recording sessions are saved to a centralized server that allows back-up and uploading to and from any other control room. Students can therefore import their work into any of the other control rooms at any time. Networking is through Gigabit Ethernet, so transfer of work is fast, and students have their own password-protected space, learning the importance of file management.

[Poster Presentation Associated with Paper Presentation P4-6]
Convention Paper 6667 (Purchase now)

P10-6 Flexible, High Speed Audio Networking for Hotels and Convention CentersRichard Foss, Rhodes University - Grahamstown, South Africa; Jun-ichi Fujimori, Yamaha Corporation - Hamamatsu, Japan; Nyasha Chigwamba, Rhodes University - Grahamstown, South Africa; Brad Klinkradt, Harold Okai-Tettey, Networked Audio Solutions - Grahamstown, South Africa
This paper describes the use of mLAN (music Local Area Network) to solve the problem of audio routing within hotels and convention centers. mLAN is a FireWire-based digital network interface technology that allows professional audio equipment, PCs, and electronic instruments to be easily and efficiently interconnected using a single cable. In order to solve this problem, an existing mLAN Connection Management Server, augmented with additional functionality, has been utilized. A graphical client application has been created that displays the various locations within a hotel/convention center and sends out appropriate routing messages in Extensible Mark-up Language (XML) to an mLAN connection management server. The connection management server, in turn, controls a number of mLAN audio distribution boxes on the FireWire network.

[Poster Presentation Associated with Paper Presentation P4-10]
Convention Paper 6668 (Purchase now)

P10-7 JavaOL – A Structured Audio Orchestra Language: Tools, Player, and Streaming EngineTien-Ming Wang, Yi-Song Siao, Alvin Su, National Cheng-Kung University - Tainan, Taiwan
MPEG-4 Structured Audio (SA) defines a set of tools to provide high-quality, low bit-rate audio. In MPEG-4 SA, SAOL (Structured Audio Orchestra Language) is the most important part because it is used to implement the algorithms that generate sounds. However, SAOL currently must be translated into other programming languages before it can be executed, which requires more computing power to achieve real-time decoding. Based on MPEG-4 SAOL, we propose JavaOL, which eliminates the translation process and is more efficient. In fact, it is Java equipped with the SAOL opcode library, so one can achieve the same functions provided by SAOL. We also provide an RTP streaming engine and the associated player. Software tools are provided to combine other audio sources.
Convention Paper 6706 (Purchase now)

P10-8 Using Remote Recording Over the Internet in EducationPatrick Quinn, Don Knox, Lynne Baillie, David Harrison, Glasgow Caledonian University - Glasgow, UK; Martin Dewar, Coatbridge College - Coatbridge, UK
Remote recording across the Internet now appears to have come of age with the recent development of appropriate software and infrastructure. Within the educational sector the Internet has taken a central role as a means to deliver educational materials. In this innovative pilot project involving Glasgow Caledonian University and its partner Coatbridge College, the use of the Internet to teach audio technology and production techniques will be explored and evaluated. It is anticipated that the knowledge and experience gained will better prepare the audio professionals of the future.
Convention Paper 6707 (Purchase now)

P10-9 A Community Hierarchic-Based Approach for Scalable Parametric Audio Multicasting Over the InternetJuan Carlos Cuevas-Martinez, P. Vera-Candeas, N. Ruiz-Reyes, P. J. Garrido-Rivera, J. Ruiz-Pérez, University of Jaén - Jaen, Spain
One of the main features of a low bit-rate audio coder is its suitability for broadcasting, mainly over the Internet and mobile networks. This is not a trivial problem; many troubles can appear in a multicasting system, mainly due to the Internet's lack of QoS. This kind of audio traffic has to coexist with TCP connections, has to avoid congestion, and should require as few changes to network equipment as possible. In this paper we propose some features to be taken into consideration in the development of an end-system multicast and peer-to-peer communication protocol for scalable parametric audio broadcasting over the Internet with low bit rate and good quality.
Convention Paper 6708 (Purchase now)

P10-10 Distant Teaching of Chamber Music via Local Area NetworksJoerg Bitzer, Tobias May, University of Applied Science Oldenburg - Oldenburg, Germany; Zefir Kurtisi, University of Braunschweig - Braunschweig, Germany; Thomas Loesch, L3S Research Center - Hannover, Germany
In this paper we present a study on teaching chamber music via the Internet. The intended application is for a highly reputed teacher to teach professional musicians at a very high level. Usually, all participants would have to fly in from all over the world in order to work together, so it would be of great value if these lessons could be given via the Internet. Several audio and video devices and different audio setups have been tested. The results indicate that MPEG-2 broadcast devices with two microphones are suitable for this task.
Convention Paper 6709 (Purchase now)


P11 - Multichannel Sound, Part 2

Sunday, May 21, 14:00 — 15:20

Chair: Jason Corey, University of Michigan - Ann Arbor, MI, USA

P11-1 Implementation of Immersive Audio Applications Using Robust Adaptive Beamforming and Wave Field SynthesisJon Ander Beracoechea-Alava, Politechnical University of Madrid - Madrid, Spain; Soledad Torres-Guijarro, University of Vigo - Vigo, Spain; Lino García, European University of Madrid - Madrid, Spain; Javier Casajús, Luis Ortiz, Politechnical University of Madrid - Madrid, Spain
An immersive audio system oriented to future communication applications is presented. The aim is to build a system where the acoustic field in a chamber is recorded using a microphone array and then is reconstructed or rendered again, in a different chamber using loudspeaker array-based techniques. In order to reduce the enormous bandwidth necessary to deal with this setup, our proposal relies on recent robust adaptive beamforming techniques and joint audio-video source localization for effectively estimating the original sources of the emitting room. The estimated source and the source localization information drive a Wave Field Synthesis engine that renders the acoustic field again at the receiving chamber. The overall system performance is tested using a MUSHRA-based subjective test in a real situation.

Presentation is scheduled to begin at 14:00
Convention Paper 6710 (Purchase now)

P11-2 Spatial Aliasing Artifacts Produced by Linear and Circular Loudspeaker Arrays Used for Wave Field SynthesisSascha Spors, Deutsche Telekom Laboratories - Berlin, Germany; Rudolf Rabenstein, University of Erlangen-Nuremberg - Erlangen, Germany
Wave field synthesis allows the exact reproduction of sound fields if the requirements of its physical foundation are met. However, the practical realization imposes certain technical constraints. One of these is the application of loudspeaker arrays as an approximation to a spatially continuous source distribution. The effect of a finite spacing of the loudspeakers can be described as spatial sampling artifacts. This contribution derives a description of the spatial sampling process for planar linear and circular arrays, analyzes the sampling artifacts, and discusses the conditions for preventing spatial aliasing. It furthermore introduces the reproduced aliasing-to-signal ratio as a measure for the energy of aliasing contributions.

[Associated Poster Presentation in Session P14, Sunday, May 21, at 16:00]

Presentation is scheduled to begin at 14:20
Convention Paper 6711 (Purchase now)

P11-3 Characterization of the Reverberant Sound Field Emitted by a Wave Field Synthesis Driven Loudspeaker ArrayTerence Caulkins, Olivier Warusfel, IRCAM - Paris, France
Realistic sound reproduction using wave field synthesis (WFS) in concert halls involves ensuring that both the direct and reverberated sound fields are accurate at all listening positions. Though methods for controlling the direct sound field have been described in the past, the control of the reverberated sound field associated with WFS sources remains a topic of interest. This paper describes the characteristics of the reverberated sound field associated with a WFS array as it synthesizes a virtual point source. Variations in the directivity and positioning of the virtual source are shown to have an effect on the associated room effect. A solution for controlling the reverberated sound field in a concert hall equipped with a WFS system is proposed based on this characterization.

[Associated Poster Presentation in Session P14, Sunday, May 21, at 16:00]

Presentation is scheduled to begin at 14:40
Convention Paper 6712 (Purchase now)

P11-4 Conjugate Gradient Techniques for Multichannel Acoustic Echo Cancellation in Frequency DomainLino García Morales, Universidad Europea de Madrid - Madrid, Spain; Jon Ander Beracoechea, Universidad Politécnica de Madrid - Madrid, Spain; Soledad Torres-Guijarro, Universidad de Vigo - Vigo, Spain; Francisco Javier Casajús-Quirós, Universidad Politécnica de Madrid - Madrid, Spain
The multichannel acoustic echo cancellation problem requires working with extremely large impulse responses. Multirate adaptive schemes such as the partitioned-block frequency-domain adaptive filter (PBFDAF) are good alternatives and are widely used in commercial echo cancellation systems today. However, being a Least Mean Squares (LMS)-derived algorithm, its convergence speed may not be fast enough under some circumstances. In this paper we propose a new scheme that combines frequency-domain adaptive filtering with conjugate gradient techniques in order to speed up convergence. The new algorithm (PBFDAF-CG) is developed and its behavior is compared against previous PBFDAF schemes.

Presentation is scheduled to begin at 15:00
Convention Paper 6713 (Purchase now)


P12 - Signal Processing and High Resolution Audio, Part 2

Sunday, May 21, 14:00 — 18:00

Chair: Jan Abildgaard Pedersen, Lyngdorf Audio - Denmark

P12-1 SigmaStudio: A User-Friendly, Intuitive, and Expandable Graphical Development Environment for Audio/DSP ApplicationsMiguel Chavez, Camille Huin, Analog Devices, Inc. - Wilmington, MA, USA
Graphical development environments have been used in the audio industry for a number of years. Those with fewer limitations have persisted and built a well-established pool of users that is reluctant to modify its design patterns and adopt different embedded processors and design environments. This paper provides a short history of the evolution of integrated development environments (IDEs). It then describes and explains the software architecture decisions and design challenges behind the development of SigmaStudio. It also shows the advantages that those decisions bring to the SigmaDSP family of audio-centric embedded processors.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 14:00
Convention Paper 6714 (Purchase now)

P12-2 Filter Update Techniques for Adaptive Virtual Acoustic ImagingPal Mannerheim, Philip Nelson, University of Southampton - Southampton, UK; Youngtae Kim, Samsung Advanced Institute of Technology (SAIT) - Gyeonggi-do, Korea
This paper deals with filter updates for adaptive virtual acoustic imaging systems using binaural technology and loudspeakers. The problem is to update the inverse filters without creating any audible changes for the listener. It can be overcome by using either a very fine mesh for the inverse filters or commutation techniques.

Presentation is scheduled to begin at 14:20
Convention Paper 6715 (Purchase now)

P12-3 Adaptive Filters in Wavelet Transform DomainVladan Bajic, Audio-Technica US - Stow, OH, USA
This paper presents a performance comparison between two methods of implementing adaptive filtering algorithms for noise reduction, namely the normalized time-domain least mean squares (NLMS) algorithm and the wavelet-transform-domain LMS (WLMS). A brief theoretical development of both methods is given, and both algorithms are implemented on a real-time digital signal processing (DSP) system used for audio signal processing. Results are presented showing the performance of each algorithm in both the time and frequency domains. Noise reduction effects produced by the different algorithms are shown across the spectrum, distorting effects are analyzed, and the trade-off between convergence speed and added noise is examined. Overall, the results show a convergence speed improvement when using the WLMS algorithm over the NLMS algorithm.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 14:40
Convention Paper 6716 (Purchase now)
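
For reference, the time-domain baseline of this comparison, the NLMS algorithm, can be sketched in a few lines of Python/NumPy as below; the filter length, step size, and regularization constant are assumptions, and the wavelet-domain WLMS variant studied in the paper is not reproduced.

    # Normalized LMS adaptive filter: identify/track the path from x to d.
    import numpy as np

    def nlms(x, d, num_taps=64, mu=0.5, eps=1e-6):
        """x: reference input, d: desired signal. Returns (output, error, weights)."""
        w = np.zeros(num_taps)
        y = np.zeros(len(x))
        e = np.zeros(len(x))
        buf = np.zeros(num_taps)                           # recent input samples, newest first
        for n in range(len(x)):
            buf = np.concatenate(([x[n]], buf[:-1]))
            y[n] = w @ buf                                 # filter output
            e[n] = d[n] - y[n]                             # error signal
            w += (mu / (eps + buf @ buf)) * e[n] * buf     # normalized LMS update
        return y, e, w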

P12-4 Adaptive Time-Frequency Resolution for Analysis and Processing of AudioAlexey Lukin, Moscow State University - Moscow, Russia; Jeremy Todd, iZotope, Inc. - Cambridge, MA, USA
Filter banks with fixed time-frequency resolution, such as the Short-Time Fourier Transform (STFT), are a common tool for many audio analysis and processing applications allowing effective implementation via the Fast Fourier Transform (FFT). The fixed time-frequency resolution of the STFT can lead to the undesirable smearing of events in both time and frequency. In this paper we suggest adaptively varying STFT time-frequency resolution in order to reduce filter bank-specific artifacts while retaining adequate frequency resolution. Several strategies for systematic adaptation of time-frequency resolution are proposed. The introduced approach is demonstrated as applied to spectrogram displays, noise reduction, and spectral effects processing.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]


Presentation is scheduled to begin at 15:00
Convention Paper 6717 (Purchase now)

P12-5 Advanced Methods for Shaping Time-Frequency Areas for the Selective Mixing of SoundsPiotr Kleczkowski, AGH University of Science and Technology - Krakow, Poland; Adam Kleczkowski, University of Cambridge - Cambridge, UK
The “Selective Mixing of Sounds” approach (AES 119th Convention Paper 6552) contains a large and conceptually challenging part that had not been developed previously: a method of determining the areas of dominance of different tracks in the time-frequency plane, which has a major effect on the overall quality of the sound. In this paper we propose and compare a range of appropriate algorithms. We begin with a simple two-dimensional running mean combined with a rule selecting the track with the maximum energy, followed by a low-pass filter based on the two-dimensional Fourier transform. We also propose two novel methods based on the Monte Carlo approach, in which local probabilistic rules are iterated many times to produce the required level of smoothing.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 15:20
Convention Paper 6718 (Purchase now)

P12-6 Demixing Commercial Music Productions via Human-Assisted Time-Frequency MaskingMarc Vinyes, Jordi Bonada, Alex Loscos, Pompeu Fabra University - Barcelona, Spain
Blind audio separation in real commercial music recordings is still an open problem. In the last few years some techniques have provided interesting results. This paper presents a human-assisted clustering of DFT coefficients for the time-frequency masking demixing technique. The DFT coefficients are grouped by adjacent pan, interchannel phase difference, magnitude, and magnitude variance with a real-time interactive graphical interface. Results show that an implementation of this technique can be used to demix tracks from today's commercial songs. Sample sounds can be found at http://www.iua.upf.es/~mvinyes/abs/demos.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 15:40
Convention Paper 6719 (Purchase now)
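
A stripped-down version of the time-frequency masking idea can be sketched with a binary mask driven by a per-bin level panning estimate, as below (Python with SciPy); the pan range to keep and the STFT settings are assumptions, and the paper's interactive interface and its additional phase- and magnitude-variance cues are not shown.

    # Binary time-frequency masking by interchannel level panning (illustrative only).
    import numpy as np
    from scipy.signal import stft, istft

    def demix_by_pan(left, right, fs, pan_lo=0.4, pan_hi=0.6, nperseg=2048):
        """Keep only time-frequency bins whose level-based pan lies in [pan_lo, pan_hi]."""
        _, _, L = stft(left, fs=fs, nperseg=nperseg)
        _, _, R = stft(right, fs=fs, nperseg=nperseg)
        pan = np.abs(R) / (np.abs(L) + np.abs(R) + 1e-12)   # 0 = hard left, 1 = hard right
        mask = ((pan >= pan_lo) & (pan <= pan_hi)).astype(float)
        _, out_l = istft(L * mask, fs=fs, nperseg=nperseg)
        _, out_r = istft(R * mask, fs=fs, nperseg=nperseg)
        return out_l, out_r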

P12-7 A Multichannel Speech Dereverberation Technique Based Upon the Wiener FilterDenis McCarthy, Frank Boland, Trinity College - Dublin, Ireland
We present a new method for dereverberating speech based upon a multichannel Wiener Filter and a microphone array. We demonstrate the effectiveness of this method under real, reverberant conditions and show that the method may be described as a self-steering beamformer. Furthermore, we investigate the performance of the method under simulated conditions, designed to closely match the acoustic characteristics of the real room environment. These simulations yield significantly inferior results to those obtained using real recordings and we show that this is a result of the failure of simulated impulse responses to accurately model real impulse responses in certain critical respects.

Presentation is scheduled to begin at 16:00
Convention Paper 6720 (Purchase now)

P12-8 Effective Room Equalization Based on Warped Common Acoustical Poles and ZerosJunho Lee, Jae-woong Jeong, Yonsei University - Seoul, Korea; Young-cheol Park, Yonsei University - Wonju-City, Gangwon-Do, Korea; Seh-Woong Jeong, Samsung Electronics Co., Ltd. - Yongin-City, Gyeonggi-Do, Korea; Dae-hee Youn, Yonsei University - Seoul, Korea
This paper presents a new method of designing room equalization filters using warped common acoustical pole and zero (WCAPZ) modeling. The proposed method significantly reduces the order of the equalization filters without sacrificing filter performance, especially at low frequencies. Thus, the associated input-output delay is much smaller than that of the conventional block-transform method, while the computational complexity is comparable. The computational complexity also remains comparable to that of the conventional room equalization method, since the filter is implemented in the linear frequency domain after pole-zero dewarping. Simulation results confirm that the proposed algorithm significantly improves room equalization over a range of low frequencies.

Presentation is scheduled to begin at 16:20
Convention Paper 6721 (Purchase now)

P12-9 Parametric Recursive Higher-Order Shelving FiltersMartin Holters, Udo Zölzer, Helmut-Schmidt-University - Hamburg, Germany
The main characteristic of shelving filters, as commonly used in audio equalization, is to amplify or attenuate a certain frequency band by a given gain. For parametric equalizers, a filter structure is desirable that allows independent adjustment of the width and center frequency of the band, and the gain. In this paper we present a design for arbitrary-order shelving filters and a suitable parametric structure. A low shelving filter design based on Butterworth filters is decomposed such that the gain can be easily adjusted. Transformation to the digital domain is performed, keeping gain and denormalized cut-off frequency independently controllable. Finally, we obtain mid- and high-shelving filters using a simple manipulation, providing the desired parametric filter structure.
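
The paper's contribution is an arbitrary-order, Butterworth-based design; purely as a hedged reminder of the underlying first-order idea (a shelving prototype mapped to the digital domain with the bilinear transform and a freely adjustable gain), a minimal Python sketch follows.

    import numpy as np

    def first_order_low_shelf(gain_db, f0, fs):
        """First-order digital low-shelving filter (illustrative, not the paper's design).

        Analog prototype H(s) = (s + G*wc) / (s + wc): gain G below wc, unity above.
        Returns (b, a) coefficients for a direct-form difference equation.
        """
        G = 10.0 ** (gain_db / 20.0)
        k = np.tan(np.pi * f0 / fs)          # bilinear transform with prewarping
        b = np.array([1 + G * k, G * k - 1]) / (1 + k)
        a = np.array([1.0, (k - 1) / (1 + k)])
        return b, a

    # Example: +6 dB low shelf at 200 Hz, 48 kHz sampling rate
    b, a = first_order_low_shelf(6.0, 200.0, 48000)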

Presentation is scheduled to begin at 16:40
Convention Paper 6722 (Purchase now)

P12-10 Enhanced Control of Sound Field Radiated by Co-Axial Loudspeaker Systems Using Digital Signal Processing TechniquesHmaied Shaiek, ENST de Bretagne - Brest Cedex, France; Bernard Debail, Cabasse Acoustic Center - Plouzané, France; Jean Marc Boucher, ENST de Bretagne - Brest Cedex, France; Yvon Kerneis, Pierre Yves Diquelou, Cabasse Acoustic Center - Plouzané, France
In multiway loudspeaker systems, digital signal processing techniques have so far been used mainly to correct frequency response, time alignment, and off-axis lobing. In this paper a dedicated signal processing technique is described that also controls the sound field radiated by co-axial loudspeaker systems in the overlap frequency band of the drivers. Trade-offs and practical constraints (crossover, time shift, gain, etc.) are discussed, and an optimization algorithm is proposed to provide the best achievable result. A real-time implementation of this technique is presented and leads to a nearly ideal point source.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 17:00
Convention Paper 6723 (Purchase now)

P12-11 Network Music Performance (NMP) in Narrow Band NetworksAlexander Carôt, International School of New Media (ISNM) - Lübeck, Germany; Ulrich Krämer, Gerald Schuller, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany
Playing live music on the Internet is one of the hardest disciplines in terms of low delay audio capture and transmission, time synchronization, and bandwidth requirements. This has already been successfully evaluated with the Soundjack software, which can be described as a low-latency UDP streaming application. In combination with the new Fraunhofer ULD codec, this technology can now be used in narrowband DSL networks without a significant increase in latency. This paper first describes the essential basics of network music performance in terms of sound card and network issues and then examines this context under narrowband DSL network restrictions and the usage of the ULD codec.

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 17:20
Convention Paper 6724 (Purchase now)

P12-12 Intensive Noise Reduction Utilizing Inharmonic Frequency Analysis of GHATeruo Muraoka, University of Tokyo - Komaba Meguro-ku, Tokyo, Japan; Ryuji Takamizawa, Matsushita Electric Industrial Co., Ltd. - Kadoma City, Osaka, Japan; Yoshihiro Kanda, Musashi Institute of Technology - Tamadutumi Setagaya, Tokyo, Japan; Takumi Ohta, Kenwood Corporation - Hachiouji City, Tokyo, Japan
Removal of noise in SP record reproduction was attempted utilizing GHA (Generalized Harmonic Analysis) as an inharmonic frequency analysis. Spectrum subtraction is the most common of the conventional noise reduction techniques; however, it has the side effect of generating musical noise, caused by the inaccurate frequency resolution inherent in conventional harmonic frequency analysis. GHA, a method of inharmonic frequency analysis with excellent frequency resolution, has recently been put to practical use. The authors applied GHA to noise reduction and obtained better results than those of conventional spectrum subtraction. However, musical noise problems still remained, mainly because of the spectral mismatch between the pre-sampled reference noise and the residual noise actually remaining. The authors tried several countermeasures, such as pre-spectral shaping of the target signal and spectral similarity calculation of the residual noise. By combining these countermeasures, the authors achieved satisfactory noise reduction.
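
For context, the conventional spectrum-subtraction baseline that the abstract compares against can be sketched in a few lines of Python/NumPy; the window, hop, and spectral floor below are assumed values, and none of the GHA-specific processing is shown.

    import numpy as np

    def spectral_subtraction(x, noise, nfft=1024, hop=256, floor=0.05):
        """Conventional magnitude spectral subtraction (the baseline discussed above).

        x     : noisy signal
        noise : noise-only excerpt used to estimate the reference noise spectrum
        floor : spectral floor that limits musical-noise 'holes'
        """
        win = np.hanning(nfft)
        # Average noise magnitude spectrum from the reference excerpt
        frames = [noise[i:i+nfft] * win for i in range(0, len(noise) - nfft, hop)]
        N_mag = np.mean([np.abs(np.fft.rfft(f)) for f in frames], axis=0)
        y = np.zeros(len(x))
        for s in range(0, len(x) - nfft, hop):
            X = np.fft.rfft(x[s:s+nfft] * win)
            mag = np.maximum(np.abs(X) - N_mag, floor * np.abs(X))
            y[s:s+nfft] += np.fft.irfft(mag * np.exp(1j * np.angle(X))) * win
        return y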

[Associated Poster Presentation in Session P17, Monday, May 22, at 09:00]

Presentation is scheduled to begin at 17:40
Convention Paper 6725 (Purchase now)


P13 - Audio Archiving, Storage, Restoration, and Content Management

Sunday, May 21, 16:00 — 17:40

Chair: Antonio Oliart, WGBH Educational Foundation - Boston, MA, USA

P13-1 Advanced Cataloging and Search Techniques in Audio ArchivingHelge Blohmer, VCS Aktiengesellschaft - Bochum, Germany
Ever since the processing capabilities of computers reached the point, in the late 1980s, where audio indexing and searching became possible using techniques beyond simple, manually entered textual annotation, researchers have been developing such methods with varying degrees of success. Yet even today the actual workflow in audio archives is dominated by text entry for cataloging and keywords for searching, with few or none of the new methods having achieved any practical relevance. This paper evaluates a number of techniques, both those that enhance textual retrieval and those that seek to supplant it, with respect to their suitability for real-world audio archiving tasks, with special focus on short-term implementation and seamless integration into existing archive workflows.

[Associated Poster Presentation in Session P18, Monday, May 22, at 11:00]

Presentation is scheduled to begin at 16:00
Convention Paper 6726 (Purchase now)

P13-2 Evaluation of Query-by-Humming Systems Using a Random Melody DatabaseJan-Mark Batke, Thomson, Corporate Research - Hannover, Germany; Gunnar Eisenberg, Technical University of Berlin - Berlin, Germany
The performance of melody retrieval using a query-by-humming (QBH) system depends on different parameters. For the query, parameters like the length of the query and any errors it contains influence the success of the retrieval. The size of the melody database (MDB) inside a QBH system also has a certain impact on the query. This paper describes how the statistical parameters of a random melody database are modeled to obtain the same behavior as a database containing authentic melodies. Databases containing random melodies thus provide a testing facility for QBH systems.

Presentation is scheduled to begin at 16:20
Convention Paper 6727 (Purchase now)

P13-3 MP3 Window-Switching Pattern Preliminary Analysis for General Purposes BeatAntonello D’Aguanno, Goffredo Haus, Giancarlo Vercellesi, Università degli Studi di Milano - Milan, Italy
This paper analyzes the dependence of the window-switching pattern on different encoders, bit rates, and encoder quality settings. We propose a simple template-matching algorithm that performs beat tracking in music with drums using window-switching pattern information only. Commonly, in a beat-tracking system the window-switching pattern is used to refine the results of a frequency-domain analysis. This paper aims to demonstrate the reliability of the window-switching pattern for solving beat-tracking problems in music with drums independently of encoders, bit rates, encoder quality settings, and frequency analysis, and it confirms that the window-switching pattern provides adequate information for beat tracking at every bit rate and for every encoder.
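
As a loose, hedged illustration of matching a periodic pulse template against a binary window-switching sequence (not the authors' algorithm or scoring), consider the following toy Python sketch; the period range and the four-pulse template length are arbitrary assumptions.

    import numpy as np

    def estimate_beat_period(switch_pattern, min_period=10, max_period=120):
        """Toy periodic-template matching over a binary window-switching sequence.

        switch_pattern : 1 where the encoder chose short windows (likely an attack), else 0.
        Returns the candidate period (in granules) whose four-pulse template matches best.
        """
        p = np.asarray(switch_pattern, dtype=float)
        best_period, best_score = None, -np.inf
        for period in range(min_period, max_period + 1):
            if 4 * period > len(p):
                break
            template = np.zeros(4 * period)
            template[::period] = 1.0                       # four equally spaced pulses
            score = np.max(np.correlate(p, template, mode='valid'))
            if score > best_score:
                best_period, best_score = period, score
        return best_period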

[Associated Poster Presentation in Session P18, Monday, May 22, at 11:00]

Presentation is scheduled to begin at 16:40
Convention Paper 6728 (Purchase now)

P13-4 Application of MPEG-4 SLS in MMDBMSs—Requirements for and Evaluation of the FormatMaciej Suchomski, Klaus Meyer-Wegener, Florian Penzkofer, Friedrich-Alexander University Erlangen-Nuremberg - Erlangen, Germany
Specific requirements for audio storage in multimedia database management systems, where data independence of continuous data plays a key role, are described in this paper. Based on the desired characteristics of an internal format for natural audio, in particular long-term storage, which must be lossless and must allow, among other things, easy upgrading of the system, the new MPEG-4 scalable lossless audio coding (SLS) is briefly explained. It is then evaluated with respect to the discussed requirements, looking at the characteristics and processing complexity of the algorithm. Some suggestions for possible modifications are given at the end.

[Associated Poster Presentation in Session P18, Monday, May 22, at 11:00]

Presentation is scheduled to begin at 17:00
Convention Paper 6729 (Purchase now)

P13-5 Applying EAI Technologies to Bimedial Broadcast Environments: Challenges, Chances, and RisksMichael Zimmermann, VCS Aktiengesellschaft - Bochum, Germany
More and more broadcast companies try to optimize their production environments by enforcing bimedial workflows. Current applications and tools, on the other hand, offer only poor integration interfaces for achieving this goal. EAI, which originally focused on the integration of legacy systems, has become a mature toolset for integrating diverse systems and offers tools and applications that ease integration. This paper shows the possibilities and limits of EAI in bimedial broadcast environments.

[Associated Poster Presentation in Session P18, Monday, May 22, at 11:00]

Presentation is scheduled to begin at 17:20
Convention Paper 6730 (Purchase now)


P14 - Posters: Multichannel Sound

Sunday, May 21, 16:00 — 17:30

P14-1 Effectiveness of Height Information for Reproducing the Presence and Reality in the Multichannel Audio SystemKimio Hamasaki, Toshiyuki Nishiguchi, NHK Science & Technical Research Laboratories - Tokyo, Japan; Koichiro Hiyama, NHK Kumamoto Station - Kumamoto, Japan; Reiko Okumura, NHK Science & Technical Research Laboratories - Tokyo, Japan
A 22.2 multichannel sound system was developed that is adapted to an ultrahigh-definition video system with 4000 scanning lines. The sound system consists of loudspeakers arranged in three layers: an upper layer with nine channels, a middle layer with ten channels, and a lower layer with three channels plus two channels for low frequency effects. This system offers new features of three-dimensional sound reproduction. Subjective evaluations by the semantic differential (SD) method are presented to assess the importance of height information for a sound system, using several stimuli in a 22.2 multichannel audio system with Super Hi-Vision and a high-definition television. Furthermore, the actual effectiveness of height information and some practical suggestions for aesthetic mixing of three-dimensional audio are also presented.

[Poster Presentation Associated with Paper Presentation P7-1]
Convention Paper 6679 (Purchase now)

P14-2 Miniature Microphone Arrays for Multichannel RecordingJuha Backman, Nokia Corporation - Espoo, Finland
This paper describes a method of using a dense array of miniature microphones (e.g., MEMS or miniature electret) to yield precise one-point multichannel gradient microphones. The signals obtained from individual microphones in the array are used to obtain an estimate for the zero, first-, and second-order components of the gradient of the sound field at the center of the array. (Higher orders of the gradient tend to be too noisy for actual sound recording purposes.) These can be used to form stereo or multichannel signals with adjustable polar patterns for recording purposes.

[Poster Presentation Associated with Paper Presentation P7-3]
Convention Paper 6681 (Purchase now)

P14-3 Benefits of Distance Correction for Multichannel MicrophonesThomas Görne, Detmold University of Music - Detmold, Germany
Subjective assessment of stereophonic or multichannel microphone techniques often suffers from differences in the diffuse field sensitivities of various arrays. Diffuse field behavior of single rotation-symmetrical microphones at lateral direct sound incidence can be derived from the polar equations of ideal first-order gradient transducers. This simple model is used to estimate distance correction factors for symmetrical two-dimensional arrays as well as for MS pairs. The benefits of corrected stereo setups are also investigated.

[Poster Presentation Associated with Paper Presentation P7-4]
Convention Paper 6682 (Purchase now)

P14-4 Virtual Source Location Information-Based Matrix Decoding SystemHan-gil Moon, Manish Arora, Samsung Electronics Co. Ltd. - Suwon, Gyeonggi-Do, Korea
In this paper a new matrix decoding system using vector-based Virtual Source Location Information (VSLI) is proposed as an alternative to the conventional Dolby Pro Logic II/IIx system for reconstructing a multichannel output signal from matrix-encoded two-channel signals (Lt/Rt). The new system is composed of a passive decoding part and an active part. The passive part produces crude multichannel signals using a linear combination of the two encoded signals (Lt/Rt), and the active part enhances each channel with respect to the virtual source that emerges between each pair of channels. The virtual sources between channels are estimated by the inverse constant power panning law.
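
For reference, the conventional passive 2:4 decode that the "passive part" resembles can be sketched as below; this is a generic textbook matrix, not the proposed VSLI system, and the 1/sqrt(2) scaling is the usual convention rather than a value taken from the paper.

    import numpy as np

    def passive_decode(Lt, Rt):
        """Conventional passive 2:4 matrix decode (illustrative of a 'passive part').

        Lt, Rt : matrix-encoded two-channel signals
        Returns crude left, right, center, and surround feeds before active enhancement.
        """
        L = Lt
        R = Rt
        C = (Lt + Rt) / np.sqrt(2.0)   # in-phase content steered toward the center
        S = (Lt - Rt) / np.sqrt(2.0)   # out-of-phase content steered toward the surround
        return L, R, C, S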

[Poster Presentation Associated with Paper Presentation P7-5]
Convention Paper 6683 (Purchase now)

P14-5 Quality Degradation Effects Caused by Limiting the Bandwidth of Standard Surround Sound Channels and Hierarchically Encoded MSBTF Channels: A Comparative StudyYu Jiao, Slawomir Zielinski, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
Limiting the bandwidth of multichannel audio can be used as an effective method of trading off audio quality against broadcasting costs. In this paper the subjective effects of two controlled high-frequency limitation methods on multichannel audio quality were studied with formal listening tests. The first method was based on limiting the bandwidth of standard surround sound channels (Rec. ITU-R BS.775-1); the second involved limiting the bandwidth of hierarchically encoded MSBTF channels. The results are compared and discussed. In this experiment the low frequency effects (LFE) channel was omitted.

[Poster Presentation Associated with Paper Presentation P7-7]
Convention Paper 6685 (Purchase now)

P14-6 Initial Developments of an Objective Method for the Prediction of Basic Audio Quality for Surround Audio RecordingsSunish George, Slawomir Zielinski, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
This paper describes the development of an objective method for the prediction of the Basic Audio Quality (BAQ) of band-limited or down-mixed surround audio recordings. A number of physical parameters, including interaural cross-correlation coefficients and spectral descriptors, were extracted from the recordings and used in a linear regression model to predict BAQ scores obtained from listening tests. The results showed a high correlation between the predicted scores and those obtained in the listening test, with the average prediction error smaller than 10 percent. Although the method was originally developed for 5-channel surround recordings, with some modifications it can be extended to any number of audio channels.
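
A minimal sketch of the regression step, assuming a feature matrix has already been extracted (the feature layout and scores below are placeholders, not the paper's data):

    import numpy as np

    def fit_baq_model(features, baq_scores):
        """Least-squares linear regression from objective features to BAQ scores (sketch).

        features   : (n_items, n_features) matrix, e.g., IACC values and spectral descriptors
        baq_scores : (n_items,) mean listening-test scores
        """
        X = np.hstack([features, np.ones((len(features), 1))])   # add intercept term
        coeffs, *_ = np.linalg.lstsq(X, baq_scores, rcond=None)
        predicted = X @ coeffs
        rmse = np.sqrt(np.mean((predicted - baq_scores) ** 2))
        return coeffs, predicted, rmse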

[Poster Presentation Associated with Paper Presentation P7-8]
Convention Paper 6686 (Purchase now)

P14-7 Tactile Strategies and Resources for Teaching Multichannel Sound ConceptsLeslie Gaston, University of Colorado at Denver - Denver, CO, USA
Several university audio programs now incorporate multichannel, or surround sound, into their curricula. In order to supplement these courses and lectures many opportunities exist to incorporate hands-on demonstrations of concepts used for microphone techniques, mixing, monitoring, and delivery. This paper will give suggestions for different tactile strategies that can be used to illustrate concepts in multichannel audio, as well as other resources that may be utilized when doing preparation and research for teaching classes. Suggestions for homework and research topics for students will also be provided, along with recommended equipment needs.

[Poster Presentation Associated with Paper Presentation P7-11]
Convention Paper 6689 (Purchase now)

P14-8 Spatial Aliasing Artifacts Produced by Linear and Circular Loudspeaker Arrays Used for Wave Field SynthesisSascha Spors, Deutsche Telekom Laboratories - Berlin, Germany; Rudolf Rabenstein, University of Erlangen-Nuremberg - Erlangen, Germany
Wave field synthesis allows the exact reproduction of sound fields if the requirements of its physical foundation are met. However, the practical realization imposes certain technical constraints. One of these is the application of loudspeaker arrays as an approximation to a spatially continuous source distribution. The effect of a finite spacing of the loudspeakers can be described as spatial sampling artifacts. This contribution derives a description of the spatial sampling process for planar linear and circular arrays, analyzes the sampling artifacts, and discusses the conditions for preventing spatial aliasing. It furthermore introduces the reproduced aliasing-to-signal ratio as a measure for the energy of aliasing contributions.
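
As a hedged rule-of-thumb companion to the full derivation in the paper, an often-quoted aliasing-frequency estimate for a linear array reproducing a plane wave can be computed as follows; the spacing and angle in the example are illustrative.

    import numpy as np

    def wfs_aliasing_frequency(spacing_m, max_angle_deg, c=343.0):
        """Rule-of-thumb spatial aliasing frequency for a linear WFS array (sketch).

        spacing_m     : loudspeaker spacing in meters
        max_angle_deg : largest plane-wave angle relative to the array normal
        """
        theta = np.radians(max_angle_deg)
        return c / (2.0 * spacing_m * np.sin(theta))

    # Example: 0.15 m spacing, plane waves up to 90 degrees -> roughly 1.1 kHz
    print(wfs_aliasing_frequency(0.15, 90.0))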

[Poster Presentation Associated with Paper Presentation P11-2]
Convention Paper 6711 (Purchase now)

P14-9 Characterization of the Reverberant Sound Field Emitted by a Wave Field Synthesis Driven Loudspeaker ArrayTerence Caulkins, Olivier Warusfel, IRCAM - Paris, France
Realistic sound reproduction using wave field synthesis (WFS) in concert halls involves ensuring that both the direct and reverberated sound fields are accurate at all listening positions. Though methods for controlling the direct sound field have been described in the past, the control of the reverberated sound field associated with WFS sources remains a topic of interest. This paper describes the characteristics of the reverberated sound field associated with a WFS array as it synthesizes a virtual point source. Variations in the directivity and positioning of the virtual source are shown to affect the associated room effect. Based on this characterization, a solution for controlling the reverberated sound field in a concert hall equipped with a WFS system is proposed.

[Poster Presentation Associated with Paper Presentation P11-3]
Convention Paper 6712 (Purchase now)

P14-10 Virtual Concert: Spatial Sound in DVD TechnologyDavid Gordon, Glasgow Caledonian University - Glasgow, UK
This paper documents the use of spatial sound in DVD technology. It sets out to evaluate the communicative abilities of spatial sound and the implications of combining spatial sound with selective multiple camera angles. We also strengthen the rationale by investigating the use of a nonlinear structure in the presentation of audio-visual DVD products. We assert that no current product integrates these deconstructed components into a single framework and therefore report on the development of a concept titled Virtual Concert. The paper also discusses the underlying concept of Virtual Concert in relation to the combination of surround sound music mixes with the corresponding camera angle, presented in a nonlinear structure. The emphasis is on practical subjective evaluation through a screening of Virtual Concert and the subsequent distribution of comprehensive questionnaires.
Convention Paper 6731 (Purchase now)

P14-11 The Adaptation of Concert Hall Measures of Spatial Impression to Reproduced SoundJonathan Hirst, William J. Davies, University of Salford - Salford, UK; Peter Philipson, Liverpool Institute for Performing Arts - Liverpool, UK
A method of objectively measuring the spatial capabilities of multichannel sound systems has been investigated. The method involved comparing interaural cross correlation (IACC) measurements taken in a concert hall with IACC measurements taken in reproduced versions of the same concert hall. The type of reproduction system was varied, and an indication of the spatial capabilities of each system was gained from the comparison of original and reproduced IACC measurements. The comparisons revealed that none of the reproduction systems was able to match the lowest IACC readings taken in the concert hall, and that the measurement method was capable of discriminating between the spatial performance of the reproduction systems and of ranking the systems' performances in the expected order.
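
A minimal sketch of the broadband IACC computation assumed here (peak of the normalized interaural cross-correlation within +/-1 ms, computed from measured binaural impulse responses); integration limits and windowing choices from standard measurement practice are simplified.

    import numpy as np

    def iacc(h_left, h_right, fs, max_lag_ms=1.0):
        """IACC: peak of the normalized interaural cross-correlation over +/-1 ms (sketch).

        h_left, h_right : binaural impulse responses (same length), fs : sample rate in Hz
        """
        max_lag = int(fs * max_lag_ms / 1000.0)
        norm = np.sqrt(np.sum(h_left ** 2) * np.sum(h_right ** 2))
        best = 0.0
        for lag in range(-max_lag, max_lag + 1):
            if lag >= 0:
                c = np.sum(h_left[lag:] * h_right[:len(h_right) - lag])
            else:
                c = np.sum(h_left[:lag] * h_right[-lag:])
            best = max(best, abs(c) / norm)
        return best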
Convention Paper 6732 (Purchase now)

P14-12 Analysis of Spatial Resolution of Multiactuator PanelsBasilio Pueo, Sergio Bleda, University of Alicante - Alicante, Spain; José Escolano, José Javier Lopez, Technical University of Valencia - Valencia, Spain
A study of the aliasing frequency of Multiactuator Panels (MAPs) with application to Wave Field Synthesis (WFS) is presented. It is based on the periodicity of the spatial frequency in the wave number domain. The success of these loudspeakers for WFS lies in the absence of exciter cross-interference, so that the exciters act as single sources. However, the distance between exciters may not be indicative of the spatial resolution capability of the array. A set of four MAPs comprising 15 exciters was measured using this multidimensional analysis. An additional dynamic loudspeaker array with the same loudspeaker spacing was also measured. Results confirm the diffuse radiation of MAPs and their superior performance when generating axial plane waves compared with dynamic loudspeakers. Finally, the measured aliasing frequency differs little from that expected from the distance between exciters.
Convention Paper 6733 (Purchase now)

P14-13 New CLD Quantization Method for Spatial Audio CodingYang-Won Jung, Hyen-O Oh, Hyo Jin Kim, Seung Jong Choi, LG Electronics - Seoul, Korea
In spatial audio coding, spatial parameters such as CLDs, CPCs, and ICCs are utilized for downmix and upmix of the multichannel audio signals. In the current version of MPEG Surround, a universal quantization table for CLD is applied independently of the channel combination. As the intervals between adjacent channels differ in the conventional 5.1-channel configuration, this universal quantization scheme causes redundancy in some combinations and insufficient resolution in others. In this paper we propose a new CLD quantization method based on the well-known amplitude panning law and the spatial resolution of human perception. With the proposed quantization method, CLD can be represented more efficiently, and therefore bit reduction and quality enhancement can be achieved.
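
To make the quantization step concrete, the following hedged sketch computes a per-band channel level difference and maps it to the nearest entry of a quantization grid; the grid shown is purely illustrative and is not the table proposed in the paper.

    import numpy as np

    # Hypothetical, illustrative quantization grid (dB); not the table proposed in the paper
    CLD_TABLE = np.array([-45, -25, -15, -8, -4, -2, 0, 2, 4, 8, 15, 25, 45], dtype=float)

    def quantize_cld(ch1_band, ch2_band, table=CLD_TABLE):
        """Compute and quantize the channel level difference for one parameter band (sketch)."""
        e1 = np.sum(np.abs(ch1_band) ** 2) + 1e-12
        e2 = np.sum(np.abs(ch2_band) ** 2) + 1e-12
        cld = 10.0 * np.log10(e1 / e2)               # level difference in dB
        index = int(np.argmin(np.abs(table - cld)))  # nearest grid entry
        return index, table[index]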
Convention Paper 6734 (Purchase now)


P15 - Room and Architectural Acoustics

Monday, May 22, 08:40 — 12:00

Chair: Jan Voetmann, DELTA Acoustics - Hoersholm, Denmark

P15-1 Koch’s Snowflake: A Case Study of Sound Scattering of Fractal SurfacesDavid Degos, Steven Edson, Densil Cabrera, University of Sydney - Sydney, New South Wales, Australia
Diffusion and scattering are becoming increasingly relevant in room acoustics design. The scattering performance of current passive diffusers is often restricted to a certain bandwidth due to physical constraints. One possible approach to this is to use fractal surface profiles, which have similar geometric features over a wide range of scales, and so should achieve an extended bandwidth for effective scattering. A range of acoustic panels of varying complexity, based around Koch’s Snowflake pattern, was constructed and tested using a two-dimensional pseudo-anechoic method adapted from the AES-4id-2001. This paper reports on these results and also on issues encountered in implementing the measurements.
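
For readers unfamiliar with the geometry, a short sketch of the recursive construction behind a Koch profile is given below; panel dimensions, materials, and the measurement procedure are of course outside its scope.

    import numpy as np

    def koch_curve(p0, p1, depth):
        """Recursively generate the points of a Koch curve between p0 and p1 (sketch)."""
        p0, p1 = np.asarray(p0, dtype=float), np.asarray(p1, dtype=float)
        if depth == 0:
            return [p0, p1]
        d = (p1 - p0) / 3.0
        a = p0 + d                      # one third along the segment
        b = p0 + 2.0 * d                # two thirds along the segment
        rot = np.array([[np.cos(np.pi / 3), -np.sin(np.pi / 3)],
                        [np.sin(np.pi / 3),  np.cos(np.pi / 3)]])
        peak = a + rot @ d              # apex of the equilateral bump
        pts = []
        for q0, q1 in [(p0, a), (a, peak), (peak, b), (b, p1)]:
            pts.extend(koch_curve(q0, q1, depth - 1)[:-1])
        pts.append(p1)
        return pts

    profile = koch_curve([0.0, 0.0], [1.0, 0.0], depth=3)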

[Associated Poster Presentation in Session P21, Monday, May 22, at 14:00]

Presentation is scheduled to begin at 08:40
Convention Paper 6735 (Purchase now)

P15-2 Large Scale FEM Analysis of a Studio RoomMahesh Bansal, Technical University of Berlin - Berlin, Germany; Stefan Feistel, SDA, Software Design Ahnert - Berlin, Germany; Wolfgang Ahnert, ADA, Acoustic Design Ahnert - Berlin, Germany
In room acoustics, particle models like ray tracing and the image source method are not sufficient to explain the wave nature of sound, especially at low frequencies. For detailed acoustic investigation, many wave-based approaches like FEM, BEM, and finite difference methods have been proposed. We present an application of large-scale FEM analysis to obtain eigenmodes and transfer functions of a real-world studio with general impedance boundary conditions. Since FEM requires discretization of the geometry into small elements such as tetrahedra and hexahedra, we also propose a novel all-hexahedral mesh generator for arbitrarily shaped rooms and show its application in room acoustics.

Presentation is scheduled to begin at 09:00
Convention Paper 6736 (Purchase now)

P15-3 Influence of Ray Angle of Incidence and Complex Reflection Factor on Acoustical Simulation Results (Part II)Emad El-Saghir, Acoustic Design Ahnert Limited - Cairo, Egypt; Stefan Feistel, SDA Software Design Ahnert GmbH - Berlin, Germany
In a previous paper (Convention Paper 6171, 116th AES Convention, Berlin, Germany), it was shown that the influence of neglecting the incidence-angle dependence of absorption coefficients in a simple single-source shoebox room model was insignificant as far as simulation results are concerned. Neglecting phase shift at each reflection led, however, to a significant difference in the predicted pressure in the same model. This paper investigates the same two questions in a complicated model with several sources and a diversity of surface materials. It attempts to analytically estimate the error associated with disregarding these two issues.

[Associated Poster Presentation in Session P21, Monday, May 22, at 14:00]

Presentation is scheduled to begin at 09:20
Convention Paper 6737 (Purchase now)

P15-4 Adaptive Audio Equalization of Rooms Based on a Technique of Transparent Insertion of Acoustic Probe SignalsAriel Rocha, António Leite, Francisco Pinto, Aníbal Ferreira, University of Porto - Porto, Portugal
This paper presents a new method of performing real time adaptive equalization of room acoustics in the frequency domain. The developed method obtains the frequency response of the room by means of transparent insertion of a certain number of acoustic probe signals into the main audio spectrum. The opportunities for the insertion of tones are identified by means of a spectral analysis of the audio signal and using a psychoacoustic model of frequency masking. This enhanced version of the adaptive equalizer will be explained as well as its real-time implementation on a TMS320C6713 DSP-based platform. Results of the acoustic tests and conclusions about its performance will be presented.

[Associated Poster Presentation in Session P21, Monday, May 22, at 14:00]

Presentation is scheduled to begin at 09:40
Convention Paper 6738 (Purchase now)

P15-5 An Amphitheatric Hall Modal Analysis Using the Finite Element Method Compared to In Situ MeasurementsAnastasia Papastefanou, Christos Sevastiadis, George Kalliris, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
The distribution of the low frequency room modes is important in room acoustics. The Finite Element Method (FEM) is a powerful numerical technique for analyzing the behavior of sound waves in enclosures, especially irregular ones. It is also a method that produces reliable results in the low frequency range, where other methods like ray tracing and image source methods fail. A modal analysis using the FEM is presented for a nonrectangular, medium-sized amphitheatric hall, and the calculated results are compared with those obtained by in situ measurements.

[Associated Poster Presentation in Session P21, Monday, May 22, at 14:00]

Presentation is scheduled to begin at 10:00
Convention Paper 6739 (Purchase now)

P15-6 A Computer-Aided Design Method for Dimensions of a Rectangular Enclosure to Avoid Degeneracy of Standing WavesZhi Liu, Fan Wu, Beijing Union University - Beijing, China
A method for designing the dimensions of a rectangular enclosure to avoid degeneracy of standing waves, together with the corresponding computer-aided design software, is presented in this paper. A mathematical model is created to calculate sets of dimensions that favor avoiding degeneracy of standing waves. The similarity between normal frequencies that is regarded as degeneracy is limited under a specific condition. Based on the relationship between the normal frequencies and the dimensions of a rectangular enclosure, dimensions that avoid degeneracy can be chosen. A computer-aided design program is also developed to identify dimensions that can be applied to the design of a loudspeaker cabinet or room to obtain the best acoustic result.
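
A minimal sketch of the underlying calculation, assuming the usual rectangular-room mode formula and an arbitrary 1 Hz coincidence tolerance (the paper's actual similarity condition is not reproduced):

    import numpy as np
    from itertools import product

    def degenerate_modes(Lx, Ly, Lz, fmax=200.0, tol_hz=1.0, c=343.0):
        """List rectangular-room mode pairs whose normal frequencies nearly coincide (sketch).

        f(nx, ny, nz) = (c / 2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2)
        """
        modes = []
        nmax = int(2 * fmax * max(Lx, Ly, Lz) / c) + 1
        for nx, ny, nz in product(range(nmax + 1), repeat=3):
            if nx == ny == nz == 0:
                continue
            f = 0.5 * c * np.sqrt((nx / Lx) ** 2 + (ny / Ly) ** 2 + (nz / Lz) ** 2)
            if f <= fmax:
                modes.append(((nx, ny, nz), f))
        modes.sort(key=lambda m: m[1])
        return [(m1, m2) for m1, m2 in zip(modes, modes[1:]) if m2[1] - m1[1] < tol_hz]

    # Example: a cubic room is heavily degenerate; stretching one dimension helps
    print(len(degenerate_modes(4.0, 4.0, 4.0)), len(degenerate_modes(4.0, 3.4, 2.8)))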

[Associated Poster Presentation in Session P21, Monday, May 22, at 14:00]

Presentation is scheduled to begin at 10:20
Convention Paper 6740 (Purchase now)

P15-7 A 3-D Acoustic Simulation Program with Graphical Front-End for Scene InputAchim Kuntz, Rudolf Rabenstein, University of Erlangen-Nuremberg - Erlangen, Germany
A program for full three-dimensional simulation of sound propagation in enclosures is presented that interfaces to a graphical interface for intuitive setup of complex simulation scenes. The simulation algorithm is based on the wave digital filtering principle, allowing for arbitrary reflection coefficients at object boundaries and walls for realistic results. Simulation scenes are defined in an object oriented way. As a graphical user interface to the simulation program, a modeler front-end for a ray-tracing program is used. Simulation setups can thus be built by graphically placing objects in the scene. Being open source, the proposed modeler can easily be customized if required. Simulation results are shown for several example setups demonstrating the possibilities of our approach.

Presentation is scheduled to begin at 10:40
Convention Paper 6741 (Purchase now)

P15-8 Absorptive Material Arrangement Method for Global Interior Noise Control in a Wide Frequency RangeSung-Ho Cho, DM R&D Center, Digital Media Business - Suwon, Gyeong-gi, Korea; Yang-Hann Kim, Korean Advanced Institute of Science and Technology - Daejeon, Korea
A simple method is proposed to arrange absorptive material for global interior noise reduction in a wide frequency range. When an enclosure's typical dimensions are of the order of several wavelengths or less, and the sources and enclosure are geometrically complex, it is not easy to determine how to control its noise effectively by attaching absorptive materials to its walls. The proposed method, however, leads the designer to a better understanding of which treatments are most effective and how a better design can be achieved. The advantage of the proposed method is that one can easily find an absorptive material arrangement for global noise reduction without having to calculate the sound field using a perturbation method or a boundary element method; the method needs only the eigenstructures (eigenvalues and eigenfunctions) of the enclosure.

Presentation is scheduled to begin at 11:00
Convention Paper 6742 (Purchase now)

P15-9 Real Time Acoustic Rendering of Complex Environments Including Diffraction and Curved SurfacesOlivier Deille, Julien Maillard, Nicolas Noé, Centre Scientifique et Technique de Bâtiment - Saint Martin d'Hères, France; Kadi Bouatouch, Institut de Recherche en Informatique et Systèmes Aléatoires - Rennes Cedex, France; Jacques Martin, Centre Scientifique et Technique de Bâtiment - Saint Martin d'Hères, France
A solution to produce virtual sound environments based on the physical characteristics of a modeled complex volume is described. The goal is to reproduce, in real time, the sound field depending on the position of the listener and to allow some interactivity (change in material characteristics for instance). First an adaptive beam tracing algorithm is used to compute a geometrical solution between the sources and several positions inside that volume. This algorithm is not limited to polygonal faces and handles diffraction. Then, the precomputed paths, once ordered and selected, are auralized, and an adaptive artificial reverberation is used. New techniques to allow for fast and accurate rendering are detailed. The proposed approach provides accurate audio rendering on headphones or within advanced multi-user immersive environments.

[Associated Poster Presentation in Session P21, Monday, May 22, at 14:00]

Presentation is scheduled to begin at 11:20
Convention Paper 6743 (Purchase now)

P15-10 Comparisons between Binaural In-Situ Recordings and AuralizationsKonca Saher, Delft University of Technology - Delft, The Netherlands; Jens Holger Rindel, Technical University Denmark - Kgs. Lyngby, Denmark; Lau Nijs, Delft University of Technology - Delft, The Netherlands
The doctoral research “Prediction and Assessment of Acoustical Quality in Living-rooms for People with Intellectual Disabilities” at Delft University of Technology investigates, among other issues, the applicability and verification of auralization as a quality assessment tool in acoustical-architectural design. This paper deals with the comparison between binaural in-situ recordings and auralizations obtained from computer simulations. Listening tests and questionnaires were prepared from the auralizations for comparison with the reference binaural recordings. The difficulties in evaluating auralization quality are discussed. The results indicate that although auralizations and binaural recordings evoke different aural perceptions, auralization is a strong tool for assessing the acoustical environment before the space is built. Two commercial programs were used for the auralizations: ODEON and CATT-Acoustic.

[Associated Poster Presentation in Session P21, Monday, May 22, at 14:00]

Presentation is scheduled to begin at 11:40
Convention Paper 6744 (Purchase now)


P16 - Low Bit-Rate Audio Coding, Part 1

Monday, May 22, 08:40 — 12:20

Chair: Mark Vinton, Dolby Laboratories - San Francisco, CA, USA

P16-1 The Relationship between Selected Artifacts and Basic Audio Quality in Perceptual Audio CodecsPaulo Marins, Francis Rumsey, Slawomir Zielinski, University of Surrey - Guildford, Surrey, UK
Up to this point, perceptual audio codecs have been evaluated according to ITU-R standards such as BS.1116-1 and BS.1534-1. The majority of these tests tend to measure the performance of audio codecs using only one perceptual attribute, namely the basic audio quality. This approach, although effective in terms of the assessment of the overall performance of codecs, does not provide any further information about the perceptual importance of different artifacts inherent to low bit-rate audio coding. Therefore in this study an alternative style of listening test was performed, investigating not only basic audio quality but also the perceptual significance of selected audio artifacts. The choice of the artifacts included in this investigation was inspired by the CD-ROM published by the AES Technical Committee on Audio Coding entitled “Perceptual Audio Coders: What to Listen For.”

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 08:40
Convention Paper 6745 (Purchase now)

P16-2 Improved Noise Weighting in CELP Coding of Speech—Applying the Vorbis Psychoacoustic Model to SpeexJean-Marc Valin, CSIRO ICT Centre - Epping, New South Wales, Australia; Xiph.Org Foundation; Christopher Montgomery, Red Hat - Westford, MA, USA; Xiph.Org Foundation
One key aspect of the CELP algorithm is that it shapes the coding noise using a simple, yet effective, weighting filter. In this paper we improve the noise shaping of CELP using a more modern psychoacoustic model. This has the significant advantage of improving the quality of an existing codec without the need to change the bit-stream. More specifically, we improve the Speex CELP codec by using the psychoacoustic model used in the Vorbis audio codec. The results show a significant increase in quality, especially at high bit-rates, where the improvement is equivalent to a 20 percent reduction in bit-rate. The technique itself is not specific to Speex and could be applied to other CELP codecs.

Presentation is scheduled to begin at 09:00
Convention Paper 6746 (Purchase now)

P16-3 Reduced Bit Rate Ultra Low Delay Audio CodingStefan Wabnik, Gerald Schuller, Jens Hirschfeld, Ulrich Krämer, Fraunhofer Institute for Digital Media Technology (IDMT) - Ilmenau, Germany
An audio coder with a very low delay (6 to 8 ms) for reduced bit rates is presented. Previous coder versions were based on backward adaptive coding, which has suboptimal noise shaping capabilities for reduced rate coding. We propose to use a different noise shaping method instead, resulting in an approach that uses forward adaptive predictive coding. We will show that, in comparison, the forward adaptive method has the following advantages: it is more robust against high quantization errors, has additional noise shaping capabilities, has a better ability to obtain a constant bit rate, and shows improved error resilience.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 09:20
Convention Paper 6747 (Purchase now)

P16-4 Real-Time Subband-ADPCM Low-Delay Audio Coding ApproachFlorian Keiler, Thomson Corporate Research - Hannover, Germany
A low-delay audio codec using the ADPCM structure (ADPCM = adaptive differential pulse code modulation) in subbands is presented. With the use of eight subbands a coarse spectral shaping of the coding noise is obtained and the signal delay is approximately 3 ms. The targeted bit rate is in the range of 128 to 176 kbit/s per channel for near transparent audio quality. The codec uses a cosine-modulated filter bank and backward adaptive calculation of the prediction coefficients and quantization scaling factors. The computations are optimized for a real-time implementation on a fixed-point DSP with an almost constant workload over time. A comparison with the Philips Subband Coder (SBC) and the Fraunhofer Ultra Low Delay Codec (ULD) is performed.

Presentation is scheduled to begin at 09:40
Convention Paper 6748 (Purchase now)

P16-5 Scalable Bitplane Runlength CodingChris Dunn, Scala Technology Ltd. - London, UK
Low-complexity audio compression offering fine-grain bit rate scalability can be realized with bitplane runlength coding. Adaptive Golomb codes are computationally simple runlength codes that allow bitplane runlength coding to achieve notable coding efficiency. For multiblock audio frames, coefficient interleaving prior to bitplane runlength coding results in a substantial increase in coding efficiency. It is shown that bitplane runlength coding is more compact than the best known SPIHT arrangement for audio bitplane coding and achieves coding efficiency that is competitive with fixed-rate quantization.
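
As a hedged illustration of the kind of runlength coding involved, here is a sketch of Rice coding (the power-of-two special case of Golomb codes) applied to a list of zero-run lengths; the adaptive parameter selection described in the paper is not shown.

    def rice_encode(run_lengths, k):
        """Rice-code a list of zero-run lengths (Golomb code with divisor 2**k) — sketch.

        Each value n is coded as a unary quotient (n >> k) followed by k remainder bits.
        """
        bits = []
        for n in run_lengths:
            q, r = n >> k, n & ((1 << k) - 1)
            bits.extend([1] * q + [0])                             # unary part, 0-terminated
            bits.extend((r >> i) & 1 for i in reversed(range(k)))  # k-bit remainder
        return bits

    # Example: runs of zeros between set bits in one bitplane
    print(rice_encode([0, 3, 1, 7, 2], k=1))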

Presentation is scheduled to begin at 10:00
Convention Paper 6749 (Purchase now)

P16-6 Scalable Audio Coding with Iterative Auditory MaskingChristophe Veaux, Pierrick Philippe, France Telecom R&D - Cesson-Sévigné, France
In this paper reducing the cost of scalability is investigated. A coding scheme based on cascaded MDCT transforms is presented, in which masking thresholds are iteratively calculated from the transform coefficients quantized at previous layers. As a result, the masking thresholds are updated at the decoder in the same way as at the encoder, without the need to transmit explicit information such as scale factors. By eliminating this overhead, the approach significantly improves coding efficiency. It is also shown that further improvements are possible by allowing the transmission of some side information depending on the frame or on the layer.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 10:20
Convention Paper 6750 (Purchase now)

P16-7 A Frequency-Domain Framework for Spatial Audio Coding Based on Universal Spatial CuesMichael Goodwin, Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA
Spatial audio coding (SAC) addresses the emerging need to efficiently represent high-fidelity multichannel audio. The SAC methods previously described involve analyzing the input audio for interchannel relationships, encoding a downmix signal with these relationships as side information, and using the side data at the decoder for spatial rendering. These approaches are channel-centric in that they are generally designed to reproduce the input channel content over the same output channel configuration. In this paper we propose a frequency-domain SAC framework based on the perceived spatial audio scene rather than on the channel content. We propose time-frequency spatial direction vectors as cues to describe the input audio scene, present an analysis method for robust estimation of these cues from arbitrary multichannel content, and discuss the use of the cues to achieve accurate spatial decoding and rendering for arbitrary output systems.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 10:40
Convention Paper 6751 (Purchase now)

P16-8 Parametric Joint-Coding of Audio SourcesChristof Faller, EPFL - Lausanne, Switzerland
The following coding scenario is addressed. A number of audio source signals need to be transmitted or stored for the purpose of mixing stereo, multichannel surround, wave field synthesis, or binaural signals after decoding the source signals. The proposed technique offers significant coding gain when jointly coding the source signals, compared to separately coding them, even when no redundancy is present between the source signals. This is possible by considering statistical properties of the source signals, properties of mixing techniques, and spatial perception. The sum of the source signals is transmitted plus the statistical properties that determine the spatial cues at the mixer output. Subjective evaluation indicates that the proposed scheme achieves high audio quality.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 11:00
Convention Paper 6752 (Purchase now)

P16-9 Improved Time Delay Analysis/Synthesis for Parametric Stereo Audio CodingChristophe Tournery, Christof Faller, EPFL - Lausanne, Switzerland
For parametric stereo and multichannel audio coding, it has been proposed to use level difference, time difference, and coherence cues between audio channels to represent the perceptual spatial features of stereo and multichannel audio signals. In practice, it has turned out that by merely considering level difference and coherence cues a high audio quality can already be achieved. Time difference cue analysis/synthesis did not contribute much to a higher audio quality, or, even decreases audio quality when not done properly. However, for binaural audio signals, e.g., binaural recordings or signals mixed with HRTFs, time differences play an important role. We investigate problems of time difference analysis/synthesis with such critical signals and propose an algorithm for improving it. A subjective evaluation indicates significant improvement over our previous time difference analysis/synthesis.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 11:20
Convention Paper 6753 (Purchase now)

P16-10 Closing the Gap between the Multichannel and the Stereo Audio World: Recent MP3 Surround ExtensionsBernhard Grill, Oliver Hellmuth, Johannes Hilpert, Jürgen Herre, Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
After more than 10 years of commercial availability, MP3 continues to be the most utilized format for compressed audio. New technologies extend its use from stereo to multichannel applications. Presented in 2004, the MP3 Surround format allows representation of high-quality 5.1 surround sound at bit rates so far used for stereo signals while remaining compatible with any MP3 playback device. Recently, add-on technologies have complemented the usability of MP3 Surround. The capability of spatializing stereo content into MP3 Surround files provides listener envelopment also for the reproduction of legacy stereo content. Improved spatial reproduction is offered by the auralized reproduction of MP3 Surround via regular stereo headphones. This paper describes the underlying concepts and the interworking of these technology components.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 11:40
Convention Paper 6754 (Purchase now)

P16-11 Design for High Frequency Adjustment Module in MPEG-4 HEAAC Encoder Based on Linear Prediction MethodHan-Wen Hsu, Yung-Cheng Yang, Chi-Min Liu, Wen-Chieh Lee, National Chiao Tung University - Hsinchu, Taiwan
The high frequency adjustment module is the kernel module of spectral band replication (SBR) in MPEG-4 HE-AAC. The objective of high frequency adjustment is to recover the tonality of the reconstructed high frequency band. There are two crucial issues: the accurate measurement of tonality and the decision of shared control parameters. Control parameters, which are extracted according to signal tonalities, are used to determine the gain control and the energy level of additional components in the decoder. In other words, the quality of the reconstructed signal is directly related to the high-frequency adjustment module. In this paper an efficient method based on the Levinson-Durbin algorithm is proposed to measure tonality through a linear prediction approach with adaptive orders to fit different subband contents. Furthermore, the artifact due to the sharing of control parameters is investigated, and an efficient decision criterion for the control parameters is proposed.
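
A minimal sketch of using Levinson-Durbin linear prediction gain as a tonality indicator for one subband follows; the prediction order is an assumed value, and the mapping from this indicator to SBR control parameters is not shown.

    import numpy as np

    def prediction_gain(x, order=2):
        """Linear-prediction gain via Levinson-Durbin; high gain suggests tonal content (sketch)."""
        x = np.asarray(x, dtype=float)
        r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])  # autocorrelation
        a = np.zeros(order + 1)        # prediction coefficients (a[0] implied as 1)
        err = r[0]
        for i in range(1, order + 1):
            acc = r[i] - np.dot(a[1:i], r[1:i][::-1])
            k_refl = acc / (err + 1e-12)               # reflection coefficient
            a_new = a.copy()
            a_new[i] = k_refl
            a_new[1:i] = a[1:i] - k_refl * a[i-1:0:-1]
            a = a_new
            err *= (1.0 - k_refl ** 2)                 # residual energy update
        return r[0] / (err + 1e-12)    # ratio of signal energy to residual energy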

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 12:00
Convention Paper 6755 (Purchase now)


P17 - Posters: Signal Processing and High-Resolution Audio

Monday, May 22, 09:00 — 10:30

P17-1 All Amplifiers Are Analog, but Some Amplifiers Are More Analog than OthersBruno Putzeys, Hypex Electronics B.V. - Groningen, The Netherlands; André Veltman, Paul van der Hulst, Piak Electronic Design b.v. - Culemborg, The Netherlands; René Groenenberg, Mueta b.v. - Wijk en Aalburg, The Netherlands
This paper intends to clarify the terms “digital” and “analog” as applied to class-D audio power amplifiers. Since loudspeaker terminals require an analog voltage, an audio power amplifier must have an analog output. If its input is digital, digital-to-analog conversion is necessarily executed at some point. Once a designer acknowledges the analog output properties of a class-D power stage, amplifier quality can improve. The incorrect assumption that some amplifiers are digital causes many designers to come up with complicated patches to ordinary analog phenomena such as timing distortion or supply rejection. This irrational approach blocks the way to a rich world of well-established analog techniques that avoid and correct many of these problems and realize otherwise unattainable characteristics such as excellent THD+N and extremely low output impedance throughout the audio band.

[Poster Presentation Associated with Paper Presentation P8-1]
Convention Paper 6690 (Purchase now)

P17-2 Toward an Ideal Switching (Class-D) Power Amplifier: How to Control the Flow of Power in a Switching Power CircuitRolf Esslinger, Dieter Jurzitza, Harman/Becker Automotive Systems - Karlsbad, Germany
The design of a switching (class-D) audio power amplifier suitable for high-end audio applications is still a very challenging task for circuit design and signal processing engineers. Classical power stage topologies using pulse-width modulation (PWM) in combination with voltage-controlled MOSFET H-bridges are already available on the market, but their performance in terms of signal bandwidth and linearity is still far below that of traditional class-A and class-A/B power stages. Moreover, EMC is an issue that is very hard to control. In this paper class-D output stages are considered from a totally different point of view: the flow of power in the output stage, with the switching power stage as a power control element, the output filter as an energy store, and the load as both a power sink and a power source when the load is not a resistor but a real-world loudspeaker. It is shown where in a typical power stage the power loss that is dissipated as heat occurs. To improve the quality and efficiency of high-frequency switched power stages, the control of the flow of power into the storage elements, and the question of how to charge them most precisely and most efficiently, have to be investigated. Some fundamental approaches to this are presented in this paper.

[Poster Presentation Associated with Paper Presentation P8-2]
Convention Paper 6691 (Purchase now)

P17-3 PWM Amplifier Control Loops with Minimum Aliasing DistortionLars Risbo, Texas Instruments Denmark A/S - Lyngby, Denmark; Claus Neesgaard, Texas Instruments Inc. - Dallas, TX, USA
PWM class-D audio power amplifiers typically contain a control loop filter network and a comparator producing the PWM signal. The comparator performs a sampling operation whenever it changes state. A previous paper by the author analyzed this sampling behavior from a small signal point of view. The present paper attempts to formulate a large-signal model that accounts for the nonlinear effects of the sampling due to aliasing of high frequency carrier components. Closed-form expressions for the intrinsic THD of the traditional first- and second-order loops are derived. The model is validated using simulations, and a class of Minimum Aliasing Error (MAE) loop filters is presented that obtains minimum aliasing distortion thanks to the use of quadrature sampling. Finally, measurement data are presented for real applications using the principles described.

[Poster Presentation Associated with Paper Presentation P8-4]
Convention Paper 6693 (Purchase now)

P17-4 Simple, Ultralow Distortion Digital Pulse Width ModulatorBruno Putzeys, Hypex Electronics B.V. - Groningen, The Netherlands
A core problem with digital pulse width modulators is that effective sampling occurs at signal-dependent intervals, falsifying the z-transform on which the input signal and the noise shaping process are based. In a first step the noise shaper is reformulated to operate at the timer clock rate instead of the pulse repetition frequency. This solves the uniform/natural sampling problem, but gives rise to new nonlinearities akin to ripple feedback in analog modulators. By modifying the feedback signal such that it reflects only the modulated edge of the pulse train this effect is practically eliminated, yielding vastly reduced distortion without increasing complexity.

[Poster Presentation Associated with Paper Presentation P8-5]
Convention Paper 6694 (Purchase now)

P17-5 A High Performance Open Loop All-Digital Class-D Audio Power Amplifier Using Zero Positioning Coding (ZePoC)Olaf Schnick, Wolfgang Mathis, University of Hannover - Hannover, Germany
Open loop all-digital class-D amplifiers are uncommon due to the lack of a correcting feedback path, which leads to several problems resulting in high distortion compared to analog-controlled class-D amplifiers. This paper shows that SB-ZePoC lowers the switching frequency to 100 kHz. These problems can therefore be solved, making it possible to design an open loop all-digital class-D audio amplifier with low total distortion in the whole audio band (20 Hz to 20 kHz) and an efficiency that reaches 90 percent. Results of a test setup will be presented. The sonic performance will be demonstrated during the session.

[Poster Presentation Associated with Paper Presentation P8-6]
Convention Paper 6695 (Purchase now)

P17-6 A Three-Level Trellis Noise Shaping Converter for Class D AmplifiersLudovico Ausiello, Riccardo Rovatti, University of Bologna - Bologna, Italy; Gianluca Setti, University of Ferrara - Ferrara, Italy
Class D amplifiers can represent signals with three different output levels, +Vcc, 0, and -Vcc, with no distortion. Exploiting this in order to achieve better performance with no increase in switching frequency, an extension of the classic two-level pulse width modulation A/D conversion is proposed. Coding is achieved by extending the output waveforms of a trellis-based sigma-delta modulation to three levels. Simulation results have shown that, using the same symbol rate, the three-level pattern achieves a SINAD improvement of 3.7 to 8.2 dB and a power consumption up to 5 times smaller.

[Poster Presentation Associated with Paper Presentation P8-7]
Convention Paper 6696 (Purchase now)

P17-7 Using SIP Techniques to Verify the Trade-Off between SNR and Information Capacity of a Sigma Delta ModulatorCharlotte Yuk-Fan Ho, Joshua Reiss, Queen Mary, University of London - London, UK; Bingo Wing-Kuen Ling, King’s College London - London, UK
The Gerzon-Craven noise shaping theorem states that the ideal information capacity of a sigma delta modulator design is achieved if and only if the noise transfer function (NTF) is minimal phase. In this paper it is found that there is a trade-off between the signal-to-noise ratio (SNR) and the information capacity of the noise shaped channel. In order to verify this result, loop filters satisfying and not satisfying the minimal phase condition of the NTF are designed via semi-infinite programming (SIP) techniques and solved using dual parameterization. Numerical simulation results show that the design with a minimal phase NTF achieves near the ideal information capacity of the noise shaped channel, but the SNR is low. On the other hand, the design with a nonminimal phase NTF achieves a positive value of the information capacity of the noise shaped channel, but the SNR is high. Results are also provided that compare the SIP design technique with Butterworth and Chebyshev structures and ideal theoretical SDMs, and evaluate the performance in terms of SNR and a variety of information theoretic measures which capture noise shaping qualities.

[Poster Presentation Associated with Paper Presentation P8-8]
Convention Paper 6697 (Purchase now)

P17-8 Estimation of Initial States of Sigma-Delta ModulatorsCharlotte Yuk-Fan Ho, Queen Mary, University of London - London, UK; Bingo Wing-Kuen Ling, King’s College London - London, UK; Joshua Reiss, Queen Mary, University of London - London, UK
In this paper an initial condition of a sigma-delta modulator is estimated based on quantizer output bit streams and an input signal. The set of initial conditions that generate a stable trajectory is characterized. It is found that this set, as well as the set of initial conditions corresponding to the quantizer output bit streams, are convex. Also, it is found that the mapping from the set of initial conditions to the stable admissible set of quantizer output bit streams is invertible if the loop filter is unstable. Hence, the initial condition corresponding to given stable admissible quantizer output streams and an input signal is uniquely defined when the loop filter is unstable, and a projection onto convex set approach is employed for approximating the initial condition.

[Poster Presentation Associated with Paper Presentation P8-9]
Convention Paper 6698 (Purchase now)

P17-9 Clean Clocks, Once and for All?Christian G. Frandsen, TC Electronic A/S - Risskov, Denmark; Chris Travis, Sonopsis Ltd. - Wotton-under-Edge, Gloucestershire, UK
Network-based digital audio interfaces are becoming increasingly popular. But they do pose a significant jitter problem wherever high-quality conversion to/from analog is required. This is true even with networks such as 1394 that provide dedicated support for isochronous flows. Conventional PLL solutions have too-little jitter attenuation, too-much intrinsic jitter, and/or too-narrow a frequency range. More advanced solutions tend to have too-high a cost. A new clocking technology that boasts high performance and low cost is presented. It has been implemented in a recent audio-over-1394 chip. We show comparative performance results and explore system-level implications, including for systems that use point-to-point links such as AES3, SPDIF, and ADAT.

[Poster Presentation Associated with Paper Presentation P8-11]
Convention Paper 6700 (Purchase now)

P17-10 SigmaStudio. A User-Friendly, Intuitive and Expandable, Graphical Development Environment for Audio/DSP ApplicationsMiguel Chavez, Camille Huin, Analog Devices, Inc. - Wilmington, MA, USA
Graphical development environments have been used in the audio industry for a number of years. Those with fewer limitations have persisted and built a well-established pool of users who are reluctant to modify their design patterns and adopt different embedded processors and design environments. This paper provides a brief history of the evolution of integrated development environments (IDEs). It then describes and explains the software architecture decisions and design challenges behind the development of SigmaStudio. It also shows the advantages that those decisions bring to the SigmaDSP family of audio-centric embedded processors.

[Poster Presentation Associated with Paper Presentation P12-1]
Convention Paper 6714 (Purchase now)

P17-11 Adaptive Filters in Wavelet Transform DomainVladan Bajic, Audio-Technica US - Stow, OH, USA
This paper presents a performance comparison between two methods of implementing adaptive filtering algorithms for noise reduction: the time-domain Normalized Least Mean Squares (NLMS) algorithm and the wavelet-transform-domain LMS (WLMS). A brief theoretical development of both methods is given, and both algorithms are implemented on a real-time Digital Signal Processing (DSP) system used for audio signal processing. Results showing the performance of each algorithm in both the time and frequency domains are presented. Noise reduction effects produced by the different algorithms are shown across the spectrum, distorting effects are analyzed, and the trade-off between convergence speed and added noise is examined. Overall, the results show an improvement in convergence speed when using the WLMS algorithm over the NLMS algorithm.

[Poster Presentation Associated with Paper Presentation P12-3]
Convention Paper 6716 (Purchase now)
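
For a concrete reference point, here is a minimal time-domain NLMS noise canceller of the kind the paper uses as its baseline; the wavelet-domain (WLMS) variant and the real-time DSP implementation are not reproduced, and the toy signals, filter order, and step size are assumptions made for the illustration.

    import numpy as np

    def nlms(x, d, order=32, mu=0.5, eps=1e-8):
        """Normalized LMS adaptive filter.
        x: reference (noise) input; d: primary input (signal + correlated noise).
        Returns the error signal e = d - y, i.e., the noise-reduced output."""
        w = np.zeros(order)
        e = np.zeros(len(d))
        for n in range(order - 1, len(d)):
            u = x[n - order + 1:n + 1][::-1]    # x[n], x[n-1], ..., x[n-order+1]
            e[n] = d[n] - w @ u
            w += (mu / (eps + u @ u)) * e[n] * u
        return e, w

    # Toy example: a tone corrupted by a filtered copy of the reference noise
    rng = np.random.default_rng(0)
    fs = 48000
    n = np.arange(fs)
    clean = 0.5 * np.sin(2 * np.pi * 440 * n / fs)
    ref = rng.standard_normal(len(n))                        # reference noise pickup
    d = clean + np.convolve(ref, [0.6, -0.3, 0.1])[:len(n)]  # noisy primary channel
    e, w = nlms(ref, d)
    print("noise power before/after: %.3f / %.3f"
          % (np.mean((d - clean) ** 2), np.mean((e[1000:] - clean[1000:]) ** 2)))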

P17-12 Adaptive Time-Frequency Resolution for Analysis and Processing of AudioAlexey Lukin, Moscow State University - Moscow, Russia; Jeremy Todd, iZotope, Inc. - Cambridge, MA, USA
Filter banks with fixed time-frequency resolution, such as the Short-Time Fourier Transform (STFT), are a common tool for many audio analysis and processing applications allowing effective implementation via the Fast Fourier Transform (FFT). The fixed time-frequency resolution of the STFT can lead to the undesirable smearing of events in both time and frequency. In this paper we suggest adaptively varying STFT time-frequency resolution in order to reduce filter bank-specific artifacts while retaining adequate frequency resolution. Several strategies for systematic adaptation of time-frequency resolution are proposed. The introduced approach is demonstrated as applied to spectrogram displays, noise reduction, and spectral effects processing.

[Poster Presentation Associated with Paper Presentation P12-4]
Convention Paper 6717 (Purchase now)
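
The sketch below shows one simple way such adaptation can be organized (a toy strategy assumed for illustration, not necessarily the authors'): short analysis frames where an energy-flux onset measure fires, long frames elsewhere; the frame sizes, hop, and threshold are arbitrary.

    import numpy as np

    def adaptive_stft_plan(x, long_n=4096, short_n=1024, hop=512, thresh=4.0):
        """Return, for each hop, the FFT length to use: short frames around
        transients (detected by a positive energy-flux measure), long elsewhere."""
        n_frames = (len(x) - long_n) // hop
        energy = np.array([np.sum(x[i * hop:i * hop + short_n] ** 2)
                           for i in range(n_frames)])
        flux = np.concatenate([[0.0], np.maximum(np.diff(energy), 0.0)])
        med = np.median(flux) + 1e-12
        return np.where(flux > thresh * med, short_n, long_n)

    def adaptive_stft(x, plan, hop=512):
        """Analyze each frame with its planned window/FFT length; the result is a
        ragged list because the frequency resolution varies frame by frame."""
        frames = []
        for i, n_fft in enumerate(plan):
            seg = x[i * hop:i * hop + n_fft]
            frames.append(np.fft.rfft(seg * np.hanning(n_fft)))
        return frames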

P17-13 Advanced Methods for Shaping Time-Frequency Areas for the Selective Mixing of SoundsPiotr Kleczkowski, AGH University of Science and Technology - Krakow, Poland; Adam Kleczkowski, University of Cambridge - Cambridge, UK
The “Selective Mixing of Sounds” (AES 119th Convention Paper 6552) contains a large and conceptually challenging component that had not been developed previously: a method of determining the areas of dominance of different tracks in the time-frequency plane. This choice has a major effect on the overall quality of the sound. In this paper we propose and compare a range of appropriate algorithms. We begin with a simple two-dimensional running mean combined with a rule selecting the track with the maximum energy, followed by a low-pass filter based on the two-dimensional Fourier transform. We also propose two novel methods based on the Monte-Carlo approach, in which local probabilistic rules are iterated many times to produce the required level of smoothing.

[Poster Presentation Associated with Paper Presentation P12-5]
Convention Paper 6718 (Purchase now)

P17-14 Demixing Commercial Music Productions via Human-Assisted Time-Frequency MaskingMarc Vinyes, Jordi Bonada, Alex Loscos, Pompeu Fabra University - Barcelona, Spain
Audio blind separation in real commercial music recordings is still an open problem, although in the last few years some techniques have provided interesting results. This paper presents a human-assisted clustering of the DFT coefficients for the time-frequency masking demixing technique. The DFT coefficients are grouped by adjacent pan, interchannel phase difference, magnitude, and magnitude variance with a real-time interactive graphical interface. Results show that an implementation of this technique can be used to demix tracks from present-day commercial songs. Sample sounds can be found at http://www.iua.upf.es/~mvinyes/abs/demos.

[Poster Presentation Associated with Paper Presentation P12-6]
Convention Paper 6719 (Purchase now)
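
A minimal, non-interactive sketch of the underlying idea (a binary time-frequency mask keyed to a pan index) is given below; the interchannel-phase and magnitude-variance features and the interactive GUI described in the abstract are not reproduced, and the pan window used here is an arbitrary assumption.

    import numpy as np
    from scipy.signal import stft, istft

    def demix_by_pan(left, right, fs, pan_lo=0.45, pan_hi=0.55, n_fft=2048):
        """Keep only the DFT bins whose pan index falls inside [pan_lo, pan_hi].
        pan = |R| / (|L| + |R|): 0 = hard left, 0.5 = center, 1 = hard right."""
        _, _, L = stft(left, fs, nperseg=n_fft)
        _, _, R = stft(right, fs, nperseg=n_fft)
        pan = np.abs(R) / (np.abs(L) + np.abs(R) + 1e-12)
        mask = (pan >= pan_lo) & (pan <= pan_hi)        # binary time-frequency mask
        _, out_l = istft(L * mask, fs, nperseg=n_fft)
        _, out_r = istft(R * mask, fs, nperseg=n_fft)
        return out_l, out_r                             # e.g., the centered source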

P17-15 Enhanced Control of Sound Field Radiated by Co-Axial Loudspeaker Systems Using Digital Signal Processing TechniquesHmaied Shaiek, ENST de Bretagne - Brest Cedex, France; Bernard Debail, Cabasse Acoustic Center - Plouzané, France; Jean Marc Boucher, ENST de Bretagne - Brest Cedex, France; Yvon Kerneis, Pierre Yves Diquelou, Cabasse Acoustic Center - Plouzané, France
In multiway loudspeaker systems, digital signal processing techniques have so far been used mainly to correct frequency response, time alignment, and off-axis lobing. In this paper a dedicated signal processing technique is described to also control the sound field radiated by co-axial loudspeaker systems in the overlap frequency band of the drivers. Trade-offs and practical constraints (crossover, time shift, gain, etc.) are discussed, and an optimization algorithm is proposed to provide the best achievable result. A real-time implementation of this technique is presented and leads to a nearly ideal point source.

[Poster Presentation Associated with Paper Presentation P12-10]
Convention Paper 6723 (Purchase now)

P17-16 Network Music Performance (NMP) in Narrow Band NetworksAlexander Carôt, International School of New Media (ISNM) - Lübeck, Germany; Ulrich Krämer, Gerald Schuller, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany
Playing live music over the Internet is one of the hardest disciplines in terms of low-delay audio capture and transmission, time synchronization, and bandwidth requirements. This has already been successfully evaluated with the Soundjack software, which can be described as a low-latency UDP streaming application. In combination with the new Fraunhofer ULD codec, this technology can now be used in narrow band DSL networks without a significant increase in latency. This paper first describes the essential basics of network music performance in terms of soundcard and network issues and then reviews them in the context of narrow band DSL network restrictions and the use of the ULD codec.

[Poster Presentation Associated with Paper Presentation P12-11]
Convention Paper 6724 (Purchase now)

P17-17 Intensive Noise Reduction Utilizing Inharmonic Frequency Analysis of GHATeruo Muraoka, University of Tokyo - Komaba Meguro-ku, Tokyo, Japan; Ryuji Takamizawa, Matsushita Electric Industrial Co., Ltd. - Kadoma City, Osaka, Japan; Yoshihiro Kanda, Musashi Institute of Technology - Tamadutumi Setagaya, Tokyo, Japan; Takumi Ohta, Kenwood Corporation - Hachiouji City, Tokyo, Japan
Removal of noise in SP record reproduction was attempted utilizing GHA (Generalized Harmonic Analysis) as an inharmonic frequency analysis. Spectral subtraction is the most common of the conventional noise reduction techniques; however, it has the side effect of generating musical noise, caused by the inaccurate frequency resolution inherent in conventional harmonic frequency analysis. GHA, an inharmonic frequency analysis method, offers excellent frequency resolution and has recently been put to practical use. The authors applied GHA to noise reduction and obtained better results than with conventional spectral subtraction. However, musical noise problems remained, chiefly because the spectrum of the pre-sampled reference noise does not coincide with that of the residual noise actually remaining. The authors tried several countermeasures, such as pre-spectral shaping of the object signal and spectral similarity calculation of the residual noise. By combining these countermeasures, the authors achieved satisfactory noise reduction.

[Poster Presentation Associated with Paper Presentation P12-12]
Convention Paper 6725 (Purchase now)

P17-18 Multichannel Noise-Reduction-Systems for Speaker Identification in an Automotive EnvironmentVolker Mildner, Stefan Goetze, Karl-Dirk Kammeyer, University of Bremen - Bremen, Germany
Devices for communication and information used by car drivers face two essential requirements: hands-free operation via distant microphones, and robustness against the different noises that depend on car speed and other conditions. Automatic speaker identification can be utilized within such devices either to supply speech recognition systems with so-called a priori information to achieve higher recognition rates, or to enable applications such as heating systems to adjust to the preferences of the driver. Thus, identifying the driver from a predefined group of possible system users may be a task for future applications. The aim of this paper is to investigate to what extent multichannel noise reduction systems are suitable for improving the performance of speaker identification algorithms under different acoustic conditions in an automotive environment.
Convention Paper 6756 (Purchase now)

P17-19 Optimal Quantized Linear Prediction Coefficients for Lossless Audio Compression—Scalar Quantization RevisitedFlorin Ghido, Tampere University of Technology - Tampere, Finland
Uniform scalar quantization of linear prediction coefficients is traditionally done by multiplying each coefficient by Q = 2^B and rounding it to the nearest integer. We propose an improved, optimal quantization method that replaces the rounding with a more elaborate procedure. The method uses 2 bits less per quantized prediction coefficient for a similar misadjustment and allows an accurate estimate of the misadjustment as a function of Q. We introduce several efficient time-constrained probabilistic search methods for obtaining near-optimal solutions. No changes are required at the decoder, and the method is applicable to a wider range of cases (mono, stereo, and multichannel prediction) than quantization of reflection coefficients. Moreover, it enables near-optimal compression for 24-bit audio using only 32-bit arithmetic operations.
Convention Paper 6757 (Purchase now)
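
The sketch below contrasts the conventional baseline described in the abstract (multiply by Q = 2^B and round) with a naive randomized refinement that merely stands in for the paper's optimal, time-constrained probabilistic search; the least-squares predictor fit, the AR test signal, and all parameter values are assumptions for the illustration.

    import numpy as np

    def fit_lpc(x, order=8):
        """Least-squares forward predictor: x[n] ~ sum_k a[k] * x[n-1-k]."""
        N = len(x)
        X = np.column_stack([x[order - 1 - k:N - 1 - k] for k in range(order)])
        a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
        return a

    def residual_energy(x, a_q):
        order = len(a_q)
        N = len(x)
        X = np.column_stack([x[order - 1 - k:N - 1 - k] for k in range(order)])
        return np.sum((x[order:] - X @ a_q) ** 2)

    def quantize_round(a, B):
        return np.round(a * 2 ** B) / 2 ** B     # conventional uniform quantization

    def quantize_search(a, x, B, trials=200, seed=0):
        """Perturb each rounded level by -1/0/+1 steps and keep the candidate with
        the smallest residual (a toy stand-in for the paper's optimal method)."""
        rng = np.random.default_rng(seed)
        Q = 2 ** B
        base = np.round(a * Q)
        best, best_e = base / Q, residual_energy(x, base / Q)
        for _ in range(trials):
            cand = (base + rng.integers(-1, 2, size=len(a))) / Q
            e = residual_energy(x, cand)
            if e < best_e:
                best, best_e = cand, e
        return best

    rng = np.random.default_rng(1)
    x = np.zeros(20000)
    for n in range(2, len(x)):                   # synthetic AR(2) test signal
        x[n] = 1.6 * x[n - 1] - 0.8 * x[n - 2] + 0.1 * rng.standard_normal()
    a = fit_lpc(x)
    for name, aq in (("rounded", quantize_round(a, 6)),
                     ("searched", quantize_search(a, x, 6))):
        print(name, residual_energy(x, aq))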

P17-20 Efficient Out of Head Localization System for Mobile ApplicationsTacksung Choi, Yonsei University - Seoul, Korea; Young Cheol Park, Yonsei University - Wonju, Korea; Dae Hee Youn, Yonsei University - Seoul, Korea
Headphone reproduction of stereo sources often results in in-the-head localization. One possible solution to this problem is to add directional filtering and a room response to the headphone reproduction system. Conventional out-of-head localization (OHL) schemes usually consist of a tapped delay line that simulates the direct signal path and early room reflections; each of the taps must be filtered by a pair of HRTFs, which leads to a very high processing cost. Our study is based on the fact that spatial impression (SI) can increase the effect of OHL, and our aim is to generate the maximum SI at minimum cost. Through subjective listening tests, the degree of SI was found to be greatest for reflections arriving within 15 to 30 ms of the direct sound and for those arriving from the direction opposite to the listener's ears. Based on the test results, we propose an efficient OHL system in which the multiple reflections are replaced by a single pair of reflections, and the HRTF filtering required to simulate the directivity of the reflections is implemented with a set of first-order IIR shelving filters. Subjective tests show that the proposed system efficiently creates OHL at a small computational cost, with performance comparable to the conventional high-complexity scheme.
Convention Paper 6758 (Purchase now)
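
As a concrete example of the kind of cheap filter the abstract refers to, here is a standard bilinear-transform design of a first-order IIR shelving filter applied to one simulated reflection arriving 20 ms after the direct sound; the corner frequency, gain, delay, and reflection level are generic assumptions, not the authors' values.

    import numpy as np
    from scipy.signal import lfilter

    def first_order_low_shelf(fc, gain_db, fs):
        """First-order low shelf: gain_db at DC, unity gain at Nyquist, transition
        around fc. Bilinear transform of H(s) = (s + g*w0) / (s + w0)."""
        g = 10.0 ** (gain_db / 20.0)
        K = np.tan(np.pi * fc / fs)
        b = np.array([1.0 + g * K, g * K - 1.0])
        a = np.array([1.0 + K, K - 1.0])
        return b / a[0], a / a[0]

    fs = 48000
    b, a = first_order_low_shelf(fc=800.0, gain_db=-6.0, fs=fs)   # "duller" reflection
    rng = np.random.default_rng(0)
    direct = rng.standard_normal(fs)                              # 1 s of test signal
    delayed = np.concatenate([np.zeros(int(0.02 * fs)), direct])[:len(direct)]
    reflection = 0.5 * lfilter(b, a, delayed)    # one shelved reflection, 20 ms later
    out = direct + reflection                    # cheap spatial-impression cue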

P17-21 A Psychoacoustic Noise Reduction Approach for Stereo Hands-Free SystemsStefan Goetze, Volker Mildner, Karl-Dirk Kammeyer, University of Bremen - Bremen, Germany
One demand for comfortable, high-quality hands-free video conferencing systems is the transmission of a spatial acoustical impression; a major task is therefore the transmission of stereo speech signals from a noisy environment, where the suppression of the noise components must not corrupt the stereo effect. In this context, different single-channel, multichannel, and hybrid speech enhancement systems are evaluated in this paper. The problem of musical noise in post-filter algorithms is addressed; for this purpose, a psychoacoustic masking threshold is considered for the noise reduction algorithms.
Convention Paper 6759 (Purchase now)

P17-22 Estimation of Talker’s Head Orientation Based on Oriented Global Coherence FieldAlessio Brutti, ITC-irst - Trento, Italy, Università di Trento, Trento, Italy; Maurizio Omologo, Piergiorgio Svaizer, ITC irst - Trento, Italy
This work describes a new method for estimating the orientation of a non-omnidirectional sound source given a distributed microphone network. The technique requires that a set of microphone pairs be distributed over a room, and it exploits the coherence computed from each sensor pair to derive an estimate of the head orientation. A database consisting of an audio sequence reproduced by a loudspeaker at different orientations and positions was collected in order to evaluate the algorithm's behavior. Experiments conducted on that database show that our approach can provide an efficient estimate of the sound source orientation, with an RMS error of about 10 degrees. Satisfactory performance was confirmed by tests with real human speakers.
Convention Paper 6760 (Purchase now)

P17-23 High-Quality Blind Bandwidth Extension of Audio for Portable Player ApplicationsManish Arora, Joonhyun Lee, Sangil Park, Samsung Electronics Co. Ltd. - Suwon, Korea
Bandwidth limitation in lossy audio coding schemes significantly reduces the perceived quality. High-frequency bandwidth extension schemes have been proposed but are difficult to implement in the applications where they are needed most: portable audio devices with severe complexity constraints. This paper describes a high-quality blind bandwidth extension method, proposing efficient initial audio bandwidth detection, band-based nonlinear processing, and simple shaping of the regenerated spectral envelope. Objective and subjective measurements of the processed signal show significant quality improvements at very low complexity, allowing easy implementation on a wide variety of portable player platforms.
Convention Paper 6761 (Purchase now)

P17-24 Coherence Enhanced Minimum Statistics Spectral Subtraction in Bimicrophone SystemsJonathan Fillion-Deneault, Roch Lefebvre, Sherbrooke University - Sherbrooke, Quebec, Canada
A novel system for two-channel spectral subtraction is presented. The objective is to improve the intelligibility of speech in noisy environments by enhancing the noise reduction of single-microphone techniques and by greatly reducing the musical noise they introduce. The system consists of two blocks: a generalized spectral subtraction block on the primary channel, using minimum statistics for noise estimation, followed by a coherence-based post-filter for additional noise suppression. Subjective and objective tests on both simulated and real-world recordings show that listeners prefer the proposed system to other state-of-the-art speech enhancement techniques.
Convention Paper 6762 (Purchase now)
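
The sketch below shows single-channel spectral subtraction with a crude running-minimum noise tracker standing in for a true minimum-statistics estimator; the coherence-based post-filter of the second block is not reproduced, and the smoothing constant, oversubtraction factor, and spectral floor are assumed values.

    import numpy as np
    from scipy.signal import stft, istft

    def spectral_subtract(x, fs, n_fft=512, win_frames=50, over=2.0, floor=0.05):
        """Single-channel spectral subtraction. The noise PSD is tracked as the
        running minimum of the time-smoothed power spectrum over win_frames frames
        (a rough stand-in for minimum statistics)."""
        _, _, X = stft(x, fs, nperseg=n_fft)
        P = np.abs(X) ** 2
        smooth = np.copy(P)
        for m in range(1, P.shape[1]):
            smooth[:, m] = 0.8 * smooth[:, m - 1] + 0.2 * P[:, m]   # time smoothing
        noise = np.empty_like(P)
        for m in range(P.shape[1]):
            noise[:, m] = smooth[:, max(0, m - win_frames):m + 1].min(axis=1)
        gain = np.maximum(1.0 - over * noise / (P + 1e-12), floor)  # power subtraction
        _, y = istft(X * np.sqrt(gain), fs, nperseg=n_fft)
        return y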

P17-25 Sound Field Analysis Based on Generalized Prolate Spheroidal WaveMathieu Guillaume, Yves Grenier, Télécom Paris - Paris, France
In this paper an array processing method that improves the quality of sound field analysis, which aims to extract the spatial properties of a sound field, is described. In this domain, spatial aliasing inevitably occurs due to the finite number of microphones in the array. It is linked to the Fourier transform of the discrete analysis window, which consists of a mainlobe, fixing the resolution achievable by the spatial analysis, and sidelobes, which degrade the quality of the analysis by introducing artifacts not present in the original sound field. A method to design an optimal analysis window with respect to a particular wave vector is presented, aiming to achieve the best possible localization in the wave vector domain. The efficiency of the approach is then demonstrated for several geometrical configurations of the microphone array over the whole bandwidth of the sound fields.
Convention Paper 6763 (Purchase now)

P17-26 Optimization of Co-centered Rigid and Open Spherical Microphone ArraysAbhaya Parthy, Craig Jin, André van Schaik, University of Sydney - Sydney, New South Wales, Australia
We present a novel microphone array that consists of an open spherical array with a smaller rigid spherical array at its center. The distribution of microphones, which results in the array having the largest frequency range, for a given beamforming order, was obtained by analyzing microphone errors. For a fixed number of microphones, the results for several examples indicate that the maximum frequency range is obtained when the microphones are relatively evenly distributed between the open and rigid spheres.
Convention Paper 6764 (Purchase now)

P17-27 Review and Discussion on Classical STFT-Based Frequency EstimatorsMichaël Betser, Patrice Collen, France Télécom R&D - Cesson-Sévigné, France; Gaël Richard, Bertrand David, Telecom Paris - Paris, France
Sinusoidal modeling is based on the decomposition of audio signals into a sum of sinusoidal components plus a residual noise part. It involves accurate estimation of the sinusoid parameters and, in particular, accurate frequency estimation. A broad category of methods uses the Fast Fourier Transform (FFT) as a starting point to compute frequency. All these methods present very similar forms of estimators, but the relations between them are not yet fully understood. This paper takes a deeper look into these relations. The first goal of this paper is to present a clear review and description of the classical FFT-based frequency estimators, and a new estimator similar to the phase vocoder is presented. The second goal is to identify the common hypotheses and the common processing steps shared by this category of estimators. Finally, experimental comparisons are given.
Convention Paper 6765 (Purchase now)
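
One classical member of the estimator family reviewed here is parabolic (quadratic) interpolation of the log-magnitude spectrum around the FFT peak; the short sketch below implements only that generic estimator, not the paper's phase-vocoder-like proposal, and the test signal is an arbitrary assumption.

    import numpy as np

    def parabolic_peak_freq(x, fs):
        """Estimate a sinusoid's frequency by fitting a parabola to the
        log-magnitude FFT around its peak bin (classical QIFFT estimator)."""
        N = len(x)
        X = np.abs(np.fft.rfft(x * np.hanning(N)))
        k = int(np.argmax(X))
        a, b, c = np.log(X[k - 1]), np.log(X[k]), np.log(X[k + 1])
        delta = 0.5 * (a - c) / (a - 2 * b + c)   # peak offset in bins, |delta| < 0.5
        return (k + delta) * fs / N

    fs, f_true = 48000, 1234.56
    x = np.sin(2 * np.pi * f_true * np.arange(4096) / fs)
    print("estimated %.2f Hz vs. true %.2f Hz" % (parabolic_peak_freq(x, fs), f_true))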

P17-28 Accurate Phase Estimation for Chirp-Like SignalsMichaël Betser, Patrice Collen, Jean-Bernard Rault, France Télécom R&D - Cesson-Sévigné, France
Sinusoidal modeling relies on the decomposition of a given signal (continuous or discrete) into a set of sinusoidal components plus a residual signal. The sinusoidal parameters, namely the amplitude, frequency, and phase, may vary over time. Generally, the tracking of these parameters is performed via Short-Time Fourier Transform (STFT) analysis, ultimately providing, for each sinusoidal component, estimates of the amplitude, frequency, and phase for a considered time slot. The duration of the analysis time slots is chosen so as to guarantee that the signal under analysis is stationary enough to deliver useful data. If this requirement is not met, in particular if the frequency varies within the analysis slot, the phase estimation is biased. This paper introduces a method to estimate and correct this bias as a function of the analysis parameters (window type and size) and of the frequency slope.
Convention Paper 6766 (Purchase now)

P17-29 Equalization of Audio Systems Using Kautz Filters with Log-Like Frequency ResolutionTuomas Paatero, Matti Karjalainen, Helsinki University of Technology - Espoo, Finland
This paper presents a new digital filtering approach to the equalization of audio systems such as loudspeaker and room responses. The equalization scheme utilizes a particular infinite impulse response (IIR) filter configuration called the Kautz filter, which can be seen as a generalization of finite impulse response (FIR) filters and their warped counterparts. The desired frequency resolution allocation, in this case a logarithmic one, is attained by a chosen set of fixed pole positions that define the particular Kautz filter. The frequency resolution mapping is characterized by the all-pass part of the Kautz filter, which is interpreted as a formal generalization of the warping concept. The second step in the actual equalizer design consists of assigning the Kautz filter tap-output weights, which is, in turn, more or less a standard least-squares problem. The proposed method is demonstrated using measured loudspeaker and room responses.
Convention Paper 6767 (Purchase now)


P18 - Posters: Signal Processing and High-Resolution Audio

Monday, May 22, 11:00 — 12:30

P18-1 Advanced Cataloging and Search Techniques in Audio ArchivingHelge Blohmer, VCS Aktiengesellschaft - Bochum, Germany
Ever since the processing capabilities of computers reached the point, in the late 1980s, where audio indexing and searching became possible using techniques beyond simple, manually entered textual annotation, researchers have been developing such methods with varying degrees of success. Yet even today, the actual workflow in audio archives is dominated by text entry for cataloging and keywords for searching, with few or none of the new methods having achieved any practical relevance. This paper evaluates a number of techniques, both those that enhance textual retrieval and those that seek to supplant it, with regard to their suitability for real-world audio archiving tasks, with special focus on short-term implementation and seamless integration into existing archive workflows.

[Poster Presentation Associated with Paper Presentation P13-1]
Convention Paper 6726 (Purchase now)

P18-2 MP3 Window-Switching Pattern Preliminary Analysis for General Purposes BeatAntonello D’Aguanno, Goffredo Haus, Giancarlo Vercellesi, Università degli Studi di Milano - Milan, Italy
This paper analyses the dependency of the window-switching pattern on different encoders, bit rates, and encoder quality features. We propose a simple template-matching algorithm for beat tracking in music with drums that uses the window-switching pattern information only. Commonly, a beat-tracking system uses the window-switching pattern to refine the results of a frequency-domain evaluation. Furthermore, this paper aims to demonstrate the reliability of the window-switching pattern for beat tracking in music with drums, independently of encoders, bit rates, encoder quality features, and frequency analysis. The paper confirms that the window-switching pattern is adequate information for beat tracking at every bit rate and for every encoder.

[Poster Presentation Associated with Paper Presentation P13-3]
Convention Paper 6728 (Purchase now)

P18-3 Application of MPEG-4 SLS in MMDBMSs—Requirements for and Evaluation of the FormatMaciej Suchomski, Klaus Meyer-Wegener, Florian Penzkofer, Friedrich-Alexander University Erlangen-Nuremberg - Erlangen, Germany
Specific requirements for audio storage in multimedia database management systems, where data independence of continuous data plays a key role, are described in this paper. Based on the desired characteristics of an internal format for natural audio, especially for long-term storage, where coding must be lossless and must, among other things, allow easy upgrades of the system, the new MPEG-4 Scalable Lossless audio coding (SLS) is briefly explained. It is then evaluated with respect to the discussed requirements, looking at the characteristics and processing complexity of the algorithm. Some suggestions for possible modifications are given at the end.

[Poster Presentation Associated with Paper Presentation P13-4]
Convention Paper 6729 (Purchase now)

P18-4 Applying EAI Technologies to Bimedial Broadcast Environments: Challenges, Chances, and RisksMichael Zimmermann, VCS Aktiengesellschaft - Bochum, Germany
More and more broadcast companies try to optimize their production environments by enforcing bimedial workflows, yet current applications and tools offer only poor integration interfaces for achieving this goal. EAI, originally focused on the integration of legacy systems, has become a mature toolset for integrating diverse systems and offers tools and applications that ease integration. This paper shows the possibilities and limits of EAI in bimedial broadcast environments.

[Poster Presentation Associated with Paper Presentation P13-5]
Convention Paper 6730 (Purchase now)

P18-5 Personal Audio HeadrestChiho Chung, Samsung Electronics Co. Ltd. - Suwon, Korea; Steve Elliott, ISVR, University of Southampton - Southampton UK
Active noise control was implemented using loudspeakers embedded in the headrests of two adjacent seats. The goal of this project was to create a quiet zone around headrest 1, free from the noise caused by the loudspeaker mounted in the adjacent headrest 2. While headrest 2 was generating noise, headrest 1 was designed to cancel it by driving an anti-noise signal through its loudspeaker, without using earphones or headphones. The control source, the loudspeaker of headrest 1 generating the anti-noise, was driven by an FIR convolution of the electrical signal feeding the primary source, the loudspeaker of headrest 2 generating the target noise. Implementing both primary and control sources results in a 20 to 30 dB noise reduction throughout the targeted frequency range (2 kHz and below) in terms of squared acoustic pressure.
Convention Paper 6768 (Purchase now)

P18-6 Accidental Wow Evaluation Based on Sinusoidal Modeling and Neural Nets PredictionPrzemyslaw Maziewski, Lukasz Litwic, Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
In this paper an algorithmic approach to evaluating the characteristic of the wow defect is presented. The approach is based on a sinusoidal analysis comprising both amplitude and phase spectra processing. The frequency trajectories depicting the distortion are built on the basis of amplitude, frequency, and phase dependencies and are further used to evaluate the wow characteristic. Additionally, experiments on neural-network-based prediction applied to this characteristic are performed, and the obtained results are compared to linear prediction.
Convention Paper 6769 (Purchase now)

P18-7 An Ontology-Based Approach to Information Management for Music Analysis SystemsSamer Abdallah, Yves Raimond, Mark Sandler, Queen Mary, University of London - London, UK
We describe an information management system that addresses the needs of music analysis projects, providing a logic-based knowledge representation scheme for the many types of object in the domains of music and signal processing, including musical works and scores, performance events, human agents, signals, analysis functions, and analysis results. The system is implemented using logic-programming and semantic web technologies and provides a shareable resource for use in a laboratory environment. The whole is driven from a Prolog command line, where the use of Matlab as a computational engine enables experiments to be designed and run with the results being automatically stored and indexed into the information structure. We present as a case-study an experiment in automatic music segmentation.
Convention Paper 6770 (Purchase now)

P18-8 Pyramid Algorithm for the Restoration of Audio Signal Corrupted by Wideband NoiseAzaria Cohen, Itai Neoran, Waves - Tel Aviv, Israel
Restoration of noisy audio recordings necessitates maximum suppression of noise with minimum degradation of program material. Spectral suppression methods perform best with high frequency resolution but deliver poor performance with transients. While wavelet-based algorithms attempt to mitigate the time-frequency trade-off, they suffer from frequency aliasing. The suggested pyramidal algorithm is a good candidate for resolving the time-frequency resolution trade-off while avoiding aliasing. In this paper an algorithm for removal of wideband noise from old audio recordings is evaluated. The algorithm is based on the pyramidal algorithm and on a spectral method for noise suppression. Results show enhanced conservation of onsets with efficient reduction of noise. The algorithm is implemented in real-time.
Convention Paper 6771 (Purchase now)

P18-9 Digital Music Notation Transformation Using XMLErich Christian Teppan, Harald Kosch, University Klagenfurt - Carinthia, Austria
The basic problem this paper deals with is how to convert Western music notation, written for chromatic instruments, into special tablatures for diatonic instruments. There are only a few software programs addressing this problem, and they lack fully automatic operation and flexibility. This was the main reason for developing new data formats and a new transformation algorithm that are more suitable for the above-mentioned problem. Combined in a suitable software architecture, the newly developed algorithm performs the transformation from a chromatic piece of music into a data format that represents a diatonic tablature.
Convention Paper 6772 (Purchase now)

P18-10 A Service-Oriented High-Performance Architecture for Large Scale Audio ArchivesStephan Schneider, Blue Order AG - Kaiserslautern, Germany
This paper describes a solution for large audio archives that has been developed using a service-oriented architecture (SOA). The audio archiving system is designed as a framework of Web services that are controlled centrally by a workflow engine. It offers hierarchical storage, import, export, and conversion of audio files in various formats. Search and retrieval are based on textual metadata entered into an entity-relationship model (ERM). The archive offers a Web-based interface that can be used with a standard Web browser. Current installations cope with several hundred users, more than 700,000 metadata entries, and 16 TB of audio files.
Convention Paper 6773 (Purchase now)

P18-11 A Robust Music Retrieval SystemYuan-Yuan Shi, Xuan Zhu, Hyoung-Gook Kim, Ki-Wan Eom, Samsung Advanced Institute of Technology - HaiDian District, Beijing, China
A robust music audio fingerprinting system for automatic music retrieval is proposed in this paper. The fingerprinting feature is extracted from a long-term dynamic modulation spectrum estimated in the perceptual compressed domain. The modulation frequency analysis, smoothing with a low-pass filter, and low-resolution quantization significantly improve the robustness of the feature. Furthermore, the fast searching problem is solved by looking up a hash table of 32-bit hash values, where the hash bits are quantized from the logarithmic-scale modulation frequency coefficients. The system obtains 42.62 percent, 92.52 percent, 97.00 percent, and 99.67 percent search precision, with an approximately 3.0 percent false positive rate, when the query clips’ signal-to-noise ratio is <0 dB, 0~5 dB, 5~15 dB, and >15 dB, respectively.
Convention Paper 6774 (Purchase now)
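
The retrieval side of such a system can be pictured with the toy sketch below, which packs 32 binarized coefficients into a 32-bit key and votes over hash-table hits; the binarization rule, the voting scheme, and all helper names are placeholders, not the paper's actual feature extraction or search.

    import numpy as np
    from collections import defaultdict

    def pack_hash(coeffs):
        """Binarize up to 32 coefficients (1 if above their median, else 0) and
        pack them into one 32-bit integer key. The rule is a placeholder only."""
        bits = (np.asarray(coeffs)[:32] > np.median(coeffs)).astype(int)
        key = 0
        for b in bits:
            key = ((key << 1) | int(b)) & 0xFFFFFFFF
        return key

    index = defaultdict(list)                 # hash key -> [(track_id, frame), ...]

    def add_track(track_id, frame_features):
        for i, f in enumerate(frame_features):
            index[pack_hash(f)].append((track_id, i))

    def query(frame_features):
        """Vote for the track whose frames share the most hash keys with the query."""
        votes = defaultdict(int)
        for f in frame_features:
            for track_id, _ in index.get(pack_hash(f), []):
                votes[track_id] += 1
        return max(votes, key=votes.get) if votes else None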


P19 - Loudspeakers and Sound Reinforcement

Monday, May 22, 14:00 — 18:20

Chair: Andrew Goldberg, Genelec Oy - Iisalmi, Finland

P19-1 On the Influence of the Geometry on Radiation for Electrodynamic LoudspeakersNicolas Quáegebeur, Antoine Chaigne, ENSTA - Palaiseau Cedex, France
The basic design of loudspeakers has remained unchanged for decades. In particular, the diaphragm is nowadays typically shaped as the combination of a spherical cap and a truncated cone. The present paper focuses on the influence of the shape of the diaphragm on the sound radiation. A temporal model based on spatial impulse responses has been developed to predict the sound radiation of an axisymmetric source subjected to an impulse. It is shown that nonplanar sources are less subject to off-axis amplitude and phase variations than planar sources. The comparison between convex and concave geometries is also studied, and it is shown that transients are more accurately reproduced by convex structures.

Presentation is scheduled to begin at 14:00
Convention Paper 6775 (Purchase now)

P19-2 Methods to Improve the Horizontal Pattern of a Line Array Module in the MidrangeNils Benjamin Schröder, Tobias Schwalbe, Robert Mores, Hamburg University of Applied Sciences - Hamburg, Germany
This paper reviews methods for modeling the vertical directivity of line array configurations in the frequency range from 200 Hz to 1 kHz. It describes the advantages and disadvantages of the following concepts: the horn, the “V-alignment,” the flat alignment, and the partial coverage of the loudspeakers. We shed light on the interrelationship between the angle between two cone loudspeakers and the resulting directivity. Symmetrical and asymmetrical configurations of mid-range drivers and horns are compared, and we outline a procedure for combining these solutions for superior results. One main result is the desired match of the midsection’s directivity with the directivity of the hf-waveguide section. A concept for building systems with variable directivity over the whole frequency range is drafted.

[Associated Poster Presentation in Session P26, Tuesday, May 23, at 09:00]

Presentation is scheduled to begin at 14:20
Convention Paper 6776 (Purchase now)

P19-3 The Performance and Restrictions of High Frequency Waveguides in Line ArraysNils Benjamin Schröder, Tobias Schwalbe, Robert Mores, Hamburg University of Applied Sciences - Hamburg, Germany
In the hf-section of line arrays it is necessary to form a plane, coherent wavefront, and several different concepts have been applied to reach this goal. We discuss these existing solutions: the different ideas on how to create a cylindrical wavefront are explained and evaluated, with particular criticism of those waveguides whose weak point lies in their theoretical design. We then explain how we developed a new waveguide and, finally, offer some ideas on how the next generation of waveguides could be designed.

[Associated Poster Presentation in Session P26, Tuesday, May 23, at 09:00]

Presentation is scheduled to begin at 14:40
Convention Paper 6777 (Purchase now)

P19-4 Efficient Nonlinear LoudspeakersBo Rohde Pedersen, University of Aalborg - Aalborg, Denmark; Finn T. Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark
Loudspeakers have traditionally been designed to be as linear as possible. However, as techniques for compensating nonlinearities are emerging, it becomes possible to use other design criteria. This paper presents and examines a new idea for improving the efficiency of loudspeakers at high levels by changing the voice coil layout. This deliberately nonlinear design allows a smaller amplifier to be used, which in turn reduces system cost as well as power consumption.

[Associated Poster Presentation in Session P26, Tuesday, May 23, at 09:00]

Presentation is scheduled to begin at 15:00
Convention Paper 6778 (Purchase now)

P19-5 Advantages of FIR Filters in Digital Loudspeaker ControllersGuenter J. Krauss, Telex Communications Inc., EVI Audio GmbH - Straubing, Germany
Finite Impulse Response (FIR) filters for real-time audio applications can today be realized comparatively easily and cost-effectively with state-of-the-art DSP technology. FIR filters have real advantages over regular Infinite Impulse Response (IIR) filters in loudspeaker controllers with regard to straightforward linear-phase component equalization and significant improvements in the radiation pattern of cabinets with noncoincident drivers.

Presentation is scheduled to begin at 15:20
Convention Paper 6779 (Purchase now)
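
A minimal illustration of the linear-phase equalization mentioned above can be obtained with frequency-sampling FIR design; the target response below is an arbitrary example (taming an assumed 3 dB bump around 2 kHz), and the design routine is a generic SciPy call rather than anything specific to this paper.

    import numpy as np
    from scipy.signal import firwin2, freqz

    fs = 48000
    freqs = [0.0, 1000.0, 2000.0, 4000.0, fs / 2]          # breakpoints in Hz
    gains_db = np.array([0.0, 0.0, -3.0, 0.0, 0.0])        # assumed correction curve
    taps = firwin2(numtaps=511, freq=freqs, gain=10 ** (gains_db / 20), fs=fs)

    # Symmetric taps => exactly linear phase (constant group delay of 255 samples)
    assert np.allclose(taps, taps[::-1])
    w, h = freqz(taps, worN=2048, fs=fs)
    k = np.argmin(np.abs(w - 2000.0))
    print("gain at 2 kHz: %.2f dB" % (20 * np.log10(np.abs(h[k]))))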

P19-6 Efficient Resonant Loudspeakers with Large Form-Factor Design FreedomRonald M. Aarts, Joris A. M. Nieuwendijk; Okke Ouweltjes, Philips Research - Eindhoven, The Netherlands
Small-cabinet loudspeakers with a flat response are quite inefficient. Assuming that the frequency response can be manipulated electronically, systems that have a nonflat SPL response can provide greater usable efficiency. Such a nonflat design can cope with a very compact housing, but, for small drivers, it would require a relatively large cone excursion to obtain a high SPL. However, by mounting the driver in a pipe, the air column can be made to resonate, which enables the use of small drivers with a small cone excursion to obtain a high SPL. For these special loudspeakers, a practically relevant optimality criterion, involving the driver and pipe parameters, is defined. This can be especially valuable in designing very compact loudspeaker systems. An experimental example of such a design is described and a working prototype is presented.

Presentation is scheduled to begin at 15:40
Convention Paper 6780 (Purchase now)

P19-7 A Dipole Multimedia LoudspeakerVladimir Filevski, Broadcasting Council of Macedonia - Skopje, Macedonia
A multimedia/computer loudspeaker usually stands on a desk, so sound reflected from the desk interferes with the direct sound from the loudspeaker. This results in a comb-like frequency response, with a first notch at least 8 dB deep, followed higher in frequency by a peak of about +4 dB, and so on. This paper describes the design of a dipole multimedia/computer loudspeaker whose resultant frequency response (including the sound reflected from the desk) differs from the anechoic response by less than +2 dB/-2.4 dB.

[Associated Poster Presentation in Session P26, Tuesday, May 23, at 09:00]

Presentation is scheduled to begin at 16:00
Convention Paper 6781 (Purchase now)
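
A worked example of the comb filtering described above, for an assumed desk geometry; the path lengths are placeholders, and the reflection magnitude is set to 0.6 simply so that the notch depth and peak roughly match the figures quoted in the abstract, not because it comes from the paper.

    import numpy as np

    c = 343.0                      # speed of sound, m/s
    direct = 0.60                  # loudspeaker-to-ear path, m (assumed)
    via_desk = 0.78                # desk-reflection path, m (assumed)
    r = 0.6                        # reflection magnitude (assumed)

    dt = (via_desk - direct) / c   # extra delay of the reflected path, s
    f = np.linspace(100.0, 20000.0, 2000)
    # Magnitude of direct + delayed reflection: |1 + r * exp(-j*2*pi*f*dt)|
    mag_db = 20 * np.log10(np.abs(1 + r * np.exp(-2j * np.pi * f * dt)))

    print("first notch near %.0f Hz, depth %.1f dB, peaks about %+.1f dB"
          % (1 / (2 * dt), 20 * np.log10(1 - r), 20 * np.log10(1 + r)))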

P19-8 Spatial Distribution of Distortion and Spectrally-Shaped Quantization Noise in Digital Micro-Array LoudspeakersMalcolm Hawksford, University of Essex - Essex, UK
A concept for a digital loudspeaker array is studied composed of clusters of micro-radiating elements that form individual digital-to-acoustic converters. In this scheme a large-scale array is composed of subgroups of micro clusters. To accommodate the finite resolution of each cluster, noise shaping is proposed and parallels are drawn with the processes used in digital-to-analog converters. Various elemental array geometries for each micro cluster are investigated by mapping transduction output into 3-D space to reveal the spatial distribution of both noise and distortion that result from noncoincident and quantized digital-to-acoustic elements.

Presentation is scheduled to begin at 16:20
Convention Paper 6782 (Purchase now)

P19-9 A Compact 120 Independent Element Spherical Loudspeaker Array with Programmable Radiation PatternsRimas Avizienis, Adrian Freed, Peter Kassakian, David Wessel, University of California at Berkeley - Berkeley, CA, USA
We describe the geometric and engineering design challenges that were overcome to create a new compact, 10-inch diameter spherical loudspeaker array with integrated class-D amplifiers and a 120-independent-channel digital audio interface using Gigabit Ethernet. A special hybrid geometry is used that combines the maximal symmetry of a triangular-faceted icosahedron with the compact planar packing of six circles on an equilateral triangle ("billiard ball packing"). Six custom 1.25-inch drivers developed by Meyer Sound Labs are mounted on each of 20 aluminum triangular circuit boards, with class-D amplifiers for the six loudspeakers mounted on the other side of each board. Two pentagonal circuit boards in the icosahedron employ Xilinx Spartan 3E FPGAs to demultiplex digital audio signals from incoming Gigabit Ethernet packets and process them before feeding the class-D modulators. Processing includes scaling, delaying, filtering, and limiting.

[Associated Poster Presentation in Session P26, Tuesday, May 23, at 09:00]

Presentation is scheduled to begin at 16:40
Convention Paper 6783 (Purchase now)

P19-10 Polar Plots at Low Frequencies: The Acoustic CenterJohn Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada; David J. Henwood, B & W Group Limited - Steyning, West Sussex, UK
This paper studies some aspects of how polar plots should be carried out when measuring loudspeakers. At low frequencies the effect of the cabinet becomes simpler as the wavelength of the sound becomes large relative to the cabinet dimensions. This allows a particular point to be identified that acoustically acts as the center of the loudspeaker at the lower frequencies. This concept is verified by acoustic simulation, and also theoretically by expressing the source radiation as a multipole expansion. Some general criteria are presented to estimate the acoustic center for different geometrical aspects of the cabinet. Polar plots pivoted about the acoustic center display very consistent low-frequency characteristics. The discussion includes a number of other considerations regarding the acoustic center.

Presentation is scheduled to begin at 17:00
Convention Paper 6784 (Purchase now)

P19-11 Constant Directivity End-Fire Arrays for Public Address SystemsFilip Verbinnen, University of Southampton - Southampton, UK
The directivity of current public address systems is controlled very well at mid and high audio frequencies using arrays or horns; at low frequencies, however, these systems are mostly still omnidirectional. The cardioid subwoofer is making its introduction but has some drawbacks that limit the maximum sound pressure level achievable by this type of system. As a possibly better alternative, the end-fire line array is considered here as a directive bass system. Some research has already been done on end-fire arrays, but none has exploited the current potential of digital signal processing techniques. Using a linear end-fire array of loudspeakers, each with its own digitally processed input, the possibilities and limitations of tapered end-fire linear arrays were examined, with the main goal of creating a constant-directivity end-fire array with a usable frequency range from 20 Hz to 200 Hz.

[Associated Poster Presentation in Session P26, Tuesday, May 23, at 09:00]

Presentation is scheduled to begin at 17:20
Convention Paper 6785 (Purchase now)
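
The basic delay-and-sum end-fire idea can be illustrated as follows: each element is delayed by its spacing divided by the speed of sound, so the radiation adds in phase toward the front of the array; the element count, spacing, and evaluation frequency are assumptions, and the tapering and constant-directivity optimization studied in the paper are not reproduced.

    import numpy as np

    c = 343.0
    n_src, d, f = 4, 0.85, 63.0     # elements, spacing (m), frequency (Hz) - assumed

    # End-fire steering: delay element m by m*d/c so outputs sum in phase at theta = 0.
    delays = np.arange(n_src) * d / c

    theta = np.radians(np.linspace(0.0, 180.0, 361))
    k = 2 * np.pi * f / c
    # Far-field sum: geometric phase of each element minus its electronic delay.
    phase = (k * d * np.arange(n_src)[:, None] * np.cos(theta)
             - 2 * np.pi * f * delays[:, None])
    level = 20 * np.log10(np.abs(np.exp(1j * phase).sum(axis=0)) / n_src)
    print("front (0 deg): %.1f dB, rear (180 deg): %.1f dB" % (level[0], level[-1]))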

P19-12 DGRC Arrays: A Synthesis of Geometrical and Electronic Loudspeaker ArraysXavier Meynial, Active Audio - St. Herblain, France
Loudspeaker arrays offer an efficient way of achieving both uniform SPL coverage and high sound clarity over a large audience area. Two types of arrays have been proposed over the last 15 years: geometrically steered J-shaped arrays, mainly for high-power sound reinforcement; and electronically steered vertical arrays, mainly for speech diffusion in public spaces. This paper introduces the Digital and Geometric Radiation Control (DGRC) principle, which combines the advantages of geometrical and electronic arrays: an array that is vertical, so that it can be mounted on a wall; that is controlled with great flexibility using its DSP; and whose power is evenly distributed across the loudspeakers.

[Associated Poster Presentation in Session P26, Tuesday, May 23, at 09:00]

Presentation is scheduled to begin at 17:40
Convention Paper 6786 (Purchase now)

P19-13 Universal System for Spatial Sound Reinforcement in Theaters and Large Venues—System Design and User InterfaceFrank Melchior, Gabriel Gatzsche, Michael Strauss, Katrin Reichelt, Martin Dausel, Joachim Deguara, Fraunhofer IDMT - Ilmenau, Germany
Sound reinforcement for large venues is a challenging task. Up to now, most systems and concepts have focused on a more or less stereophonic reproduction. Besides these concepts, a promising technology exists that enables spatial sound reinforcement for a larger audience. Spatial sound reinforcement is an important aspect especially in high-quality applications such as opera houses and venues for classical music. This paper presents an innovative system and multi-user interface concept for dynamic automation and interactive control of sound source positions and other properties for variable reproduction systems in live sound reinforcement applications. The system has been designed in close cooperation with experts in sound reinforcement for opera houses. The developed user interfaces are described, in addition to a detailed view of the practical realization and audio processing in such a system.

[Associated Poster Presentation in Session P28, Tuesday, May 23, at 09:00]

Presentation is scheduled to begin at 18:00
Convention Paper 6787 (Purchase now)


P20 - Low Bit-Rate Audio Coding, Part 2

Monday, May 22, 14:00 — 15:20

Chair: Mark Vinton, Dolby Laboratories - San Francisco, CA, USA

P20-1 A Novel Integrated Audio Bandwidth Extension Toolkit (ABET)Deepen Sinha, ATC Labs - Chatham, NJ, USA; Anibal Ferreira, ATC Labs - Chatham, , NJ, USA, University of Porto, Porto, Portugal; Harinarayanan E. V., ATC Labs - Chatham, NJ, USA
Bandwidth extension has emerged as an important tool for the satisfactory performance of low bit-rate audio and speech codecs. In this paper we describe the components of a novel integrated audio bandwidth extension toolkit (ABET). ABET is a combination of two bandwidth extension tools: the Fractal Self-Similarity Model (FSSM) for the signal spectrum and Accurate Spectral Replacement (ASR). The combination of these two tools, which are applied directly to a high-frequency-resolution representation of the signal such as the Modified Discrete Cosine Transform (MDCT), has several benefits for the accuracy and coding efficiency of the high-frequency signal components. At the same time, the combination of the two tools entails a number of important algorithmic and perceptual considerations. In this paper we describe the components of the ABET bandwidth extension toolkit in detail. Algorithmic details, audio demonstrations, and comparisons to other audio coding schemes will be presented. Additional information and audio samples are available at http://www.atc-labs.com/abet/.

Presentation is scheduled to begin at 14:00
Convention Paper 6788 (Purchase now)

P20-2 Evaluation of Real-Time Transport Protocol Configurations Using aacPlusAndreas Schneider, Kurt Krauss, Andreas Ehret, Coding Technologies - Nuremberg, Germany
aacPlus is a highly efficient audio codec that is being used in a growing number of applications where the compressed audio data is encapsulated in a real-time transport protocol and transmitted over error-prone channels. In this paper the implications of packet losses during transmission and techniques to mitigate their impact on the resulting audio quality are discussed. Example transmission channel characteristics are used to show how typical protocol configuration parameters are derived. The benefits of the described techniques are evaluated and verified by setting up a complete simulation chain and performing listening tests.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 14:20
Convention Paper 6789 (Purchase now)

P20-3 Audio Communication CoderAnibal Ferreira, University of Porto - Porto, Portugal, ATC Labs, Chatham, NJ, USA; Deepen Sinha, ATC Labs - Chatham, NJ, USA
3G mobile and wireless communication networks elicit new ways of multimedia human interaction and communication, notably two-way high-quality audio communication. This is in line with both the consumer expectation of new audio experiences and functionalities and the motivation of telecom operators to offer consumers new services and communication modalities. In this paper we describe the design and optimization of a monophonic audio coder, the Audio Communication Coder (ACC), that features low-delay coding (< 50 ms) and intrinsic error robustness while minimizing complexity and achieving competitive coding gains and audio quality at bit rates around 32 kbit/s and higher. The ACC source, perceptual, and bandwidth extension tools are described, and emphasis is placed on the structural and operational features that make ACC suitable for real-time, two-way audio communication. A few performance results are also presented. Audio demonstrations are available at http://www.atc-labs.com/acc/.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 14:40
Convention Paper 6790 (Purchase now)

P20-4 ISO/IEC MPEG-4 High-Definition Scalable Advanced Audio CodingRalf Geiger, Fraunhofer IIS - Erlangen, Germany; Rongshan Yu, Institute for Infocomm Research - Singapore; Jürgen Herre, Fraunhofer IIS - Erlangen, Germany; Susanto Rahardja, Institute for Infocomm Research - Singapore; Sang-Wook Kim, Samsung Electronics - Suwon, Korea; Xiao Lin, Institute for Infocomm Research - Singapore; Markus Schmidt, Fraunhofer IIS - Erlangen, Germany
Recently, the MPEG Audio standardization group has successfully concluded the standardization process on technology for lossless coding of audio signals. This paper provides a summary of the Scalable Lossless Coding (SLS) technology as one of the results of this standardization work. MPEG-4 Scalable Lossless Coding provides a fine-grain scalable lossless extension of the well-known MPEG-4 AAC perceptual audio coder up to fully lossless reconstruction at word lengths and sampling rates typically used for high-resolution audio. The underlying innovative technology is described in detail and its performance is characterized for lossless and near-lossless representation, both in conjunction with an AAC coder and as a stand-alone compression engine. A number of application scenarios for the new technology are also discussed.

[Associated Poster Presentation in Session P23, Monday, May 22, at 16:00]

Presentation is scheduled to begin at 15:00
Convention Paper 6791 (Purchase now)


P21 - Posters: Room and Architectural Acoustics

Monday, May 22, 14:00 — 15:30

P21-1 Koch’s Snowflake: A Case Study of Sound Scattering of Fractal SurfacesDavid Degos, Steven Edson, Densil Cabrera, University of Sydney - Sydney, New South Wales, Australia
Diffusion and scattering are becoming increasingly relevant in room acoustics design. The scattering performance of current passive diffusers is often restricted to a certain bandwidth due to physical constraints. One possible approach to this is to use fractal surface profiles, which have similar geometric features over a wide range of scales, and so should achieve an extended bandwidth for effective scattering. A range of acoustic panels of varying complexity, based around Koch’s Snowflake pattern, was constructed and tested using a two-dimensional pseudo-anechoic method adapted from the AES-4id-2001. This paper reports on these results and also on issues encountered in implementing the measurements.

[Poster Presentation Associated with Paper Presentation P15-1]
Convention Paper 6735 (Purchase now)

P21-2 Influence of Ray Angle of Incidence and Complex Reflection Factor on Acoustical Simulation Results (Part II)Emad El-Saghir, Acoustic Design Ahnert Limited - Cairo, Egypt; Stefan Feistel, SDA Software Design Ahnert GmbH - Berlin, Germany
In a previous paper (Convention Paper 6171, 116th AES Convention, Berlin, Germany), it was shown that the influence of neglecting the incidence-angle dependence of absorption coefficients in a simple single-source shoebox room model was insignificant as far as simulation results are concerned. Neglecting phase shift at each reflection led, however, to a significant difference in the predicted pressure in the same model. This paper investigates the same two questions in a complicated model with several sources and a diversity of surface materials. It attempts to analytically estimate the error associated with disregarding these two issues.

[Poster Presentation Associated with Paper Presentation P15-3]
Convention Paper 6737 (Purchase now)

P21-3 Adaptive Audio Equalization of Rooms Based on a Technique of Transparent Insertion of Acoustic Probe SignalsAriel Rocha, António Leite, Francisco Pinto, Aníbal Ferreira, University of Porto - Porto, Portugal, ATC Labs, Chatham, NJ, USA
This paper presents a new method of performing real time adaptive equalization of room acoustics in the frequency domain. The developed method obtains the frequency response of the room by means of transparent insertion of a certain number of acoustic probe signals into the main audio spectrum. The opportunities for the insertion of tones are identified by means of a spectral analysis of the audio signal and using a psychoacoustic model of frequency masking. This enhanced version of the adaptive equalizer will be explained as well as its real-time implementation on a TMS320C6713 DSP-based platform. Results of the acoustic tests and conclusions about its performance will be presented.

[Poster Presentation Associated with Paper Presentation P15-4]
Convention Paper 6738 (Purchase now)

P21-4 An Amphitheatric Hall Modal Analysis Using the Finite Element Method Compared to In Situ MeasurementsAnastasia Papastefanou, Christos Sevastiadis, George Kalliris, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
The distribution of the low-frequency room modes is important in room acoustics. The Finite Element Method (FEM) is a powerful numerical technique for analyzing the behavior of sound waves in enclosures, especially irregular ones, and it produces reliable results in the low-frequency range where methods such as ray tracing and image sources fail. A modal analysis using the FEM is presented for a nonrectangular, medium-sized amphitheatric hall, and the calculated results are compared with those obtained by in situ measurements.

[Poster Presentation Associated with Paper Presentation P15-5]
Convention Paper 6739 (Purchase now)

P21-5 A Computer-Aided Design Method for Dimensions of a Rectangular Enclosure to Avoid Degeneracy of Standing WavesZhi Liu, Fan Wu, Beijing Union University - Beijing, China
A method for designing the dimensions of a rectangular enclosure to avoid degeneracy of standing waves, together with the corresponding computer-aided design software, is presented in this paper. A mathematical model is created to calculate the many dimension sets that favor avoiding degeneracy of standing waves; normal frequencies whose similarity falls within a specified limit are regarded as degenerate. Based on the relationship between the normal frequencies and the dimensions of a rectangular enclosure, dimensions that avoid degeneracy can be chosen. A computer-aided design program is also developed to identify dimensions that can be applied in the design of a loudspeaker cabinet or room to obtain the best acoustic result.

[Poster Presentation Associated with Paper Presentation P15-6]
Convention Paper 6740 (Purchase now)
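
The design rests on the rigid-walled rectangular-room mode formula f = (c/2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2); the sketch below evaluates it and flags nearly coincident modes using a simple 2 percent criterion, which is an illustrative stand-in for the paper's similarity condition, with arbitrary example dimensions.

    import numpy as np
    from itertools import product

    def room_modes(Lx, Ly, Lz, c=343.0, n_max=4):
        """Mode frequencies of a rigid-walled rectangular room:
        f = (c/2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2)."""
        modes = []
        for nx, ny, nz in product(range(n_max + 1), repeat=3):
            if nx == ny == nz == 0:
                continue
            f = 0.5 * c * np.sqrt((nx / Lx) ** 2 + (ny / Ly) ** 2 + (nz / Lz) ** 2)
            modes.append((f, (nx, ny, nz)))
        return sorted(modes)

    def near_degenerate(modes, rel_tol=0.02):
        """Adjacent modes whose frequencies differ by less than rel_tol (relative)."""
        return [(f1, m1, f2, m2) for (f1, m1), (f2, m2) in zip(modes, modes[1:])
                if (f2 - f1) / f1 < rel_tol]

    # A cube is the worst case: all permutations of (nx, ny, nz) coincide exactly.
    print(len(near_degenerate(room_modes(3.0, 3.0, 3.0))))   # many near-degeneracies
    print(len(near_degenerate(room_modes(4.1, 3.3, 2.6))))   # far fewer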

P21-6 Performance Analysis of Wave Field Simulation with the Functional Transformation MethodStefan Petrausch, Rudolf Rabenstein, University of Erlangen-Nuremberg - Erlangen, Germany
The application of the Functional Transformation Method (FTM) for the simulation of acoustical wave fields has been recently extended to complex room geometries by the usage of so-called “block-based” modeling techniques. The complete model is split into several elementary blocks with simple geometry, which are solved separately with the FTM and reconnected in the discrete system with Wave Digital Filter (WDF) principles. Concerning the performance of this algorithm, two questions arise: how much additional error is introduced compared with an all-in-one FTM solution, and how accurate are the FTM simulations compared with classical methods (e.g., digital waveguide meshes). This paper offers an answer for both questions by proving that the worst-case scenario of the proposed procedure, i.e., minimum size block models, is identical to a waveguide mesh. The complete derivation is first performed for 1-D models and then extended to 2-D wave fields, demonstrating the equivalence of minimum-size block-based FTM modeling and digital waveguide meshes.
Convention Paper 6792 (Purchase now)

P21-7 Real Time Acoustic Rendering of Complex Environments Including Diffraction and Curved SurfacesOlivier Deille, Julien Maillard, Nicolas Noé, Centre Scientifique et Technique de Bâtiment - Saint Martin d'Hères, France; Kadi Bouatouch, Institut de Recherche en Informatique et Systèmes Aléatoires - Rennes Cedex, France; Jacques Martin, Centre Scientifique et Technique de Bâtiment - Saint Martin d'Hères, France
A solution to produce virtual sound environments based on the physical characteristics of a modeled complex volume is described. The goal is to reproduce, in real time, the sound field depending on the position of the listener and to allow some interactivity (change in material characteristics for instance). First an adaptive beam tracing algorithm is used to compute a geometrical solution between the sources and several positions inside that volume. This algorithm is not limited to polygonal faces and handles diffraction. Then, the precomputed paths, once ordered and selected, are auralized, and an adaptive artificial reverberation is used. New techniques to allow for fast and accurate rendering are detailed. The proposed approach provides accurate audio rendering on headphones or within advanced multi-user immersive environments.

[Poster Presentation Associated with Paper Presentation P15-9]
Convention Paper 6743 (Purchase now)

P21-8 Comparisons between Binaural In-Situ Recordings and AuralizationsKonca Saher, Delft University of Technology - Delft, The Netherlands; Jens Holger Rindel, Technical University Denmark - Kgs. Lyngby, Denmark; Lau Nijs, Delft University of Technology - Delft, The Netherlands
The doctoral research project “Prediction and Assessment of Acoustical Quality in Living-rooms for People with Intellectual Disabilities” at Delft University of Technology investigates, among other issues, the applicability and verification of auralization as a quality assessment tool in acoustical-architectural design. This paper deals with the comparison between binaural in-situ recordings and auralizations obtained from computer simulations. Listening tests and questionnaires were prepared from the auralizations for comparison with the reference binaural recordings. The difficulties in evaluating auralization quality are discussed. The results indicate that although auralizations and binaural recordings evoke different aural perceptions, auralization is a strong tool for assessing the acoustical environment before the space is built. Two commercial programs are used for the auralizations: ODEON and CATT-Acoustics.

[Poster Presentation Associated with Paper Presentation P15-10]
Convention Paper 6744 (Purchase now)

P21-9 A Review of NFPA 72 Requirements for Emergency CommunicationsMichael S. Pincus, Acentech Incorporated - Cambridge, MA, USA
The National Fire Protection Association publication 72, “The National Fire Alarm Code,” is the basis for most fire codes in the United States. The latest edition, published in 2006, will have updated requirements for both sound pressure level and intelligibility relating to messages used for emergency communications. This paper describes the changes between the new edition and the previous version, published in 2002, as well as a summary of proposed changes that were not accepted. A case study will show the impact of these requirements on the design of sound systems for a series of light rail stations in Seattle, Washington, and contrast them with subway stations in Boston, Massachusetts.
Convention Paper 6793 (Purchase now)

P21-10 Classroom Acoustics: Current and Future Criteria for the Assessment of Acoustics for LearningSooch San Souci, Line Guerra, Nicolas Teichner, AiA - Audition, Intelligibility, Acoustics - Boulogne, France; Dick Campbell, Bang-Campbell Associates - East Falmouth, MA, USA
Assuring that a student can hear the teacher and classmates clearly, without having to filter out excessive noise, has been a common goal in the past, but current standards fall short of the optimum acoustic for the act of learning. Several important factors have been overlooked by the current acoustic criteria for listening while learning; one example is the set of actions involved in receiving new information while listening in a learning environment and their relationship with multiple levels of perception and concentration during the “discovery phase” of the integration of new ideas. This paper describes an approach to defining distinct acoustic criteria for learning environments. Data collected from several prototype classrooms specifically built to assess criteria significance, renovation cost/value, and measurement reproducibility, with acoustic criteria determined on a seat-by-seat basis, will be presented.
Convention Paper 6794 (Purchase now)


Introduction to Paper Session P22

Monday, May 22, 15:30 — 16:00

-1 William Martens, McGill University - Montreal, Quebec, Canada


P22 - Design and Engineering of Auditory Displays

Monday, May 22, 16:00 — 18:20

Chair: Densil Cabrera, University of Sydney - Sydney, New South Wales, Australia

William Martens, McGill University - Montreal, Quebec, Canada

P22-1 Spatial Sound in Auditory Vision Substitution SystemsAleksander Väljamäe, Mendel Kleiner, Chalmers University of Technology - Göteborg, Sweden
Current auditory vision sensory substitution (AVSS) systems might be improved by the direct mapping of an image into a matrix of concurrently active sound sources in a virtual acoustic space. This mapping might be similar to the existing techniques for tactile substitution of vision, where point arrays are successfully used. This paper gives an overview of the current auditory displays used to sonify 2-D visual information and discusses the feasibility of new perceptually motivated AVSS methods encompassing spatial sound.

Presentation is scheduled to begin at 16:00
Convention Paper 6795 (Purchase now)

P22-2 Acoustic Rendering for Color InformationLudovico Ausiello, Emanuele Cecchetelli, Massimo Ferri, Nicoletta Caramelli, University of Bologna - Bologna, Italy
The Espacio Acustico Virtual (EAV) is a portable device that acoustically represents visual environmental scenes by rendering objects with the sound of virtual rain drops. Here, an improvement of this device is presented, which adds color to the information conveyed. Two different mappings of color into sound were implemented. Georama is a geometric coding based on red-green-blue vectors, while Colorama is an associative coding based on the hue and saturation model of color space. An experiment was run on both sighted and blind participants in order to assess which of the two codings is the more user-friendly. The results showed that participants learned to discriminate colors through sounds better when trained with Georama than with Colorama.

[Associated Poster Presentation in Session P27, Tuesday, May 23, at 11:00]

Presentation is scheduled to begin at 16:20
Convention Paper 6796 (Purchase now)

P22-3 Auditory Display of AudioDensil Cabrera, Sam Ferguson, University of Sydney - Sydney, New South Wales, Australia
In this paper we consider applications of auditory display for representing audio systems and audio signal characteristics. Conventional analytic representations of system characteristics, such as impulse response or nonlinear distortion, rely on numeric and graphic communication. Alternatively, simply listening to the system under test should reveal important aspects of its performance. Given that auditioning systems is so effective, it seems useful to develop higher-level auditory representations (auditory displays) of system performance parameters to exploit these listening abilities. For this purpose, we consider ways in which audio signals can be further transformed for auditory display, beyond the simple act of playing the sound.

[Associated Poster Presentation in Session P27, Tuesday, May 23, at 11:00]

Presentation is scheduled to begin at 16:40
Convention Paper 6797 (Purchase now)

P22-4 Nonvocal Auditory Signals in the Operating Room for Each Phase of the Anesthesia ProcedureAnne Guillaume, Léonore Bourgeon, Elisa Jacob, Marie Rivenez, Claude Valot, IMASSA - Brétigny sur Orge, France; Jean-Bernard Cazalà, Hôpital Necker - Paris, France
Auditory warning signals are considered by the anesthetist team as a major source of annoyance and confusion in the operating room. An ergonomic approach was carried out in order to propose a functional classification of the auditory alarms and to allocate a correct level of urgency to them. It allows the team to analyze the pertinence of the auditory warning signals emitted as anesthesia progresses, taking into account each phase of the anesthesia procedure. The results showed that the design of auditory warning signals could be improved by taking into account the activity of the anesthetist team. They also showed significantly higher frequencies of warning signals during the induction and emergence phases. However, the alarms were often ignored during these two phases as they occurred as a result of deliberate anesthetist actions. Most of them were then considered nuisance alarms.

[Associated Poster Presentation in Session P27, Tuesday, May 23, at 11:00]
Convention Paper 6798 (Purchase now)

P22-5 Frequency Bandwidth and Multitalker EnvironmentsSimon Carlile, David Schonstein, University of Sydney - Sydney, New South Wales, Australia
Understanding a talker of interest from a complex background is a common and difficult listening task not just restricted to cocktail parties. Recent work demonstrates that high frequencies in speech are important for accurately localizing the talker and that perceived differences in the locations of talkers are important in solving the cocktail party problem. This paper describes experiments demonstrating that high frequencies contribute to the spatial release from masking by other talkers. In addition, low frequency energy at the fundamental frequency of the talker, over and above the perception of the fundamental frequency, also plays a role in spatial release from masking.

Presentation is scheduled to begin at 17:20
Convention Paper 6799 (Purchase now)

P22-6 Usability of 3-D Sound for Navigation in a Constrained Virtual EnvironmentAntoine Gonot, France Telecom R&D - Lannion, France, CNAM, Paris, France; Noël Château, Marc Emerit, France Telecom R&D - Lannion, France
This paper presents a study on a global evaluation of spatial auditory displays in a constrained virtual environment. Forty subjects had to find nine sound sources in a virtual town, navigating by using spatialized auditory cues that were delivered in four different conditions: a binaural versus a stereophonic rendering (through headphones), combined with a contextualized versus a decontextualized presentation of information. Behavioral data, self-evaluations of cognitive load, and subjective-impression data collected via a questionnaire were recorded. The analysis shows that the binaural-contextualized presentation of auditory cues leads to the best results in terms of usability, cognitive load, and subjective evaluation. However, these advantages are only observable after a certain period of acquisition.

[Associated Poster Presentation in Session P27, Tuesday, May 23, at 11:00]

Presentation is scheduled to begin at 17:40
Convention Paper 6800 (Purchase now)

P22-7 Psychoacoustic Evaluation of a New Method for Simulating Near-Field Virtual Auditory SpaceAlan Kan, Craig Jin, André van Schaik, University of Sydney - Sydney, New South Wales, Australia
A new method for generating near-field virtual auditory space (VAS) is presented. This method synthesizes near-field head-related transfer functions (HRTFs) based on a distance variation function (DVF). Using a sound localization experiment, the fidelity of the near-field VAS generated using this technique is compared to that obtained using near-field HRTFs synthesized using a multipole expansion of a set of HRTFs interpolated using a spherical thin-plate spline. Individualized HRTFs for varying distances in the near-field were synthesized using the subjects’ HRTFs measured at a radius of 1 m for a limited number of locations around the listener’s head. Both methods yielded similar localization performance, showing no major directional localization errors and reasonable correlation between perceived and target distances of sounds up to 50 cm from the center of the subject’s head. Also, subjects tended to overestimate the target distance for both methods.

[Associated Poster Presentation in Session P27, Tuesday, May 23, at 11:00]

Presentation is scheduled to begin at 18:00
Convention Paper 6801 (Purchase now)


P23 - Posters: Low Bit-Rate Audio Coding

Monday, May 22, 16:00 — 17:30

P23-1 The Relationship between Selected Artifacts and Basic Audio Quality in Perceptual Audio CodecsPaulo Marins, Francis Rumsey, Slawomir Zielinsky, University of Surrey - Guildford, Surrey, UK
Up to this point, perceptual audio codecs have been evaluated according to ITU-R standards such as BS.1116-1 and BS.1534-1. The majority of these tests tend to measure the performance of audio codecs using only one perceptual attribute, namely the basic audio quality. This approach, although effective in terms of the assessment of the overall performance of codecs, does not provide any further information about the perceptual importance of different artifacts inherent to low bit-rate audio coding. Therefore in this study an alternative style of listening test was performed, investigating not only basic audio quality but also the perceptual significance of selected audio artifacts. The choice of the artifacts included in this investigation was inspired by the CD-ROM published by the AES Technical Committee on Audio Coding entitled “Perceptual Audio Coders: What to Listen For.”

[Poster Presentation Associated with Paper Presentation P16-1]
Convention Paper 6745 (Purchase now)

P23-2 Reduced Bit Rate Ultra Low Delay Audio CodingStefan Wabnik, Gerald Schuller, Jens Hirschfeld, Ulrich Krämer, Fraunhofer Institute for Digital Media Technology (IDMT) - Ilmenau, Germany
An audio coder with a very low delay (6 to 8 ms) for reduced bit rates is presented. Previous coder versions were based on backward adaptive coding, which has suboptimal noise shaping capabilities for reduced rate coding. We propose to use a different noise shaping method instead, resulting in an approach that uses forward adaptive predictive coding. We will show that, in comparison, the forward adaptive method has the following advantages: it is more robust against high quantization errors, has additional noise shaping capabilities, has a better ability to obtain a constant bit rate, and shows improved error resilience.

[Poster Presentation Associated with Paper Presentation P16-3]
Convention Paper 6747 (Purchase now)

P23-3 Scalable Audio Coding with Iterative Auditory MaskingChristophe Veaux, Pierrick Philippe, France Telecom R&D - Cesson-Sévigné, France
In this paper reducing the cost of scalability is investigated. A coding scheme based on cascaded MDCT-transform is presented, for which masking thresholds are iteratively calculated from the transform coefficients quantized at previous layers. As a result, the masking thresholds are updated at the decoder in the same way as at the encoder without the need to transmit explicit information such as scale factors. By eliminating this overhead, this approach significantly improves the coding efficiency. It is also shown that further improvements are made possible by allowing the transmission of some side information depending on the frame or on the layer.

[Poster Presentation Associated with Paper Presentation P16-6]
Convention Paper 6750 (Purchase now)

P23-4 A Frequency-Domain Framework for Spatial Audio Coding Based on Universal Spatial CuesMichael Goodwin, Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA
Spatial audio coding (SAC) addresses the emerging need to efficiently represent high-fidelity multichannel audio. The SAC methods previously described involve analyzing the input audio for interchannel relationships, encoding a downmix signal with these relationships as side information, and using the side data at the decoder for spatial rendering. These approaches are channel-centric in that they are generally designed to reproduce the input channel content over the same output channel configuration. In this paper we propose a frequency-domain SAC framework based on the perceived spatial audio scene rather than on the channel content. We propose time-frequency spatial direction vectors as cues to describe the input audio scene, present an analysis method for robust estimation of these cues from arbitrary multichannel content, and discuss the use of the cues to achieve accurate spatial decoding and rendering for arbitrary output systems.

[Poster Presentation Associated with Paper Presentation P16-7]
Convention Paper 6751 (Purchase now)
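
To picture the time-frequency spatial direction vector cue of P23-4, one generic formulation (an energy-weighted sum of loudspeaker unit vectors per tile, not necessarily the authors' estimator) is sketched below in Python.

```python
import numpy as np

def direction_cues(stft, speaker_azimuths_deg):
    """Energy-weighted 2-D direction vector per time-frequency tile
    for a horizontal loudspeaker layout (illustrative sketch).

    stft: complex array of shape (n_channels, n_bins, n_frames).
    speaker_azimuths_deg: one azimuth per channel, in degrees.
    """
    az = np.deg2rad(np.asarray(speaker_azimuths_deg, dtype=float))
    units = np.stack([np.cos(az), np.sin(az)], axis=1)      # (n_ch, 2)
    energy = np.abs(stft) ** 2                               # (n_ch, F, T)
    total = energy.sum(axis=0) + 1e-12
    # Sum each channel's unit vector weighted by its energy in the tile.
    return np.einsum("cft,cd->dft", energy, units) / total   # (2, F, T)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake 5-channel STFT just to exercise the function.
    stft = rng.standard_normal((5, 129, 12)) + 1j * rng.standard_normal((5, 129, 12))
    cues = direction_cues(stft, [30, -30, 0, 110, -110])
    print(cues.shape)  # (2, 129, 12)
```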

P23-5 Parametric Joint-Coding of Audio SourcesChristof Faller, EPFL - Lausanne, Switzerland
The following coding scenario is addressed. A number of audio source signals need to be transmitted or stored for the purpose of mixing stereo, multichannel surround, wave field synthesis, or binaural signals after decoding the source signals. The proposed technique offers significant coding gain when jointly coding the source signals, compared to separately coding them, even when no redundancy is present between the source signals. This is possible by considering statistical properties of the source signals, properties of mixing techniques, and spatial perception. The sum of the source signals is transmitted plus the statistical properties that determine the spatial cues at the mixer output. Subjective evaluation indicates that the proposed scheme achieves high audio quality.

[Poster Presentation Associated with Paper Presentation P16-8]
Convention Paper 6752 (Purchase now)
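
A toy version of the joint-coding principle in P23-5 (transmit only the sum plus per-band source powers, then recover approximations by per-band weighting) might look like the sketch below; the frame length, hop size, and Wiener-like weighting are illustrative assumptions rather than the paper's actual scheme.

```python
import numpy as np

def encode(sources, n_fft=512, hop=256):
    """Transmit only the sum signal plus per-band short-time source powers."""
    downmix = sources.sum(axis=0)
    win = np.hanning(n_fft)
    powers = []
    for s in sources:
        frames = np.lib.stride_tricks.sliding_window_view(s, n_fft)[::hop]
        spec = np.fft.rfft(frames * win, axis=-1)
        powers.append(np.abs(spec) ** 2)           # (n_frames, n_bins)
    return downmix, np.stack(powers)               # side information

def decode_frame(downmix_frame, frame_powers, n_fft=512):
    """Recover per-source spectra for one frame by power-based weighting
    of the transmitted sum (assumes roughly uncorrelated sources)."""
    spec = np.fft.rfft(downmix_frame * np.hanning(n_fft))
    weights = frame_powers / (frame_powers.sum(axis=0) + 1e-12)
    return weights * spec                           # (n_sources, n_bins)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    srcs = rng.standard_normal((3, 48000))
    mix, side = encode(srcs)
    est = decode_frame(mix[:512], side[:, 0, :])
    print(est.shape)  # (3, 257)
```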

P23-6 Improved Time Delay Analysis/Synthesis for Parametric Stereo Audio CodingChristophe Tournery, Christof Faller, EPFL - Lausanne, Switzerland
For parametric stereo and multichannel audio coding, it has been proposed to use level difference, time difference, and coherence cues between audio channels to represent the perceptual spatial features of stereo and multichannel audio signals. In practice, it has turned out that by merely considering level difference and coherence cues a high audio quality can already be achieved. Time difference cue analysis/synthesis did not contribute much to a higher audio quality, or even decreased audio quality when not done properly. However, for binaural audio signals, e.g., binaural recordings or signals mixed with HRTFs, time differences play an important role. We investigate problems of time difference analysis/synthesis with such critical signals and propose an algorithm for improving it. A subjective evaluation indicates significant improvement over our previous time difference analysis/synthesis.

[Poster Presentation Associated with Paper Presentation P16-9]
Convention Paper 6753 (Purchase now)
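
P23-6 concerns time-difference cues between channels; one standard way to estimate such a delay (a broadband GCC-PHAT estimator, used here purely as an illustration and not claimed to be the paper's method) is:

```python
import numpy as np

def inter_channel_time_diff(left, right, fs, max_lag_ms=1.0):
    """Broadband inter-channel time difference (seconds) via GCC-PHAT."""
    n = len(left) + len(right)
    L = np.fft.rfft(left, n)
    R = np.fft.rfft(right, n)
    cross = L * np.conj(R)
    cross /= np.abs(cross) + 1e-12                 # phase transform weighting
    cc = np.fft.irfft(cross, n)
    max_lag = int(max_lag_ms * 1e-3 * fs)
    # Re-center the circular correlation so negative lags sit left of zero.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return (np.argmax(cc) - max_lag) / fs

if __name__ == "__main__":
    fs = 48000
    rng = np.random.default_rng(2)
    x = rng.standard_normal(fs)
    d = 12                                          # samples of delay
    delayed = np.concatenate((np.zeros(d), x[:-d]))
    # Expect roughly +250 microseconds (12 samples at 48 kHz).
    print(inter_channel_time_diff(delayed, x, fs) * 1e6, "us")
```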

P23-7 Closing the Gap between the Multichannel and the Stereo Audio World: Recent MP3 Surround ExtensionsBernhard Grill, Oliver Hellmuth, Johannes Hilpert, Jürgen Herre, Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
After more than 10 years of commercial availability, MP3 continues to be the most utilized format for compressed audio. New technologies extend its use from stereo to multichannel applications. Presented in 2004, the MP3 Surround format allows representation of high-quality 5.1 surround sound at bit rates so far used for stereo signals while remaining compatible with any MP3 playback device. Recently, add-on technologies complemented the usability of MP3 Surround. The capability of spatializing stereo content into MP3 Surround files provides listener envelopment also for the reproduction of legacy stereo content. Improved spatial reproduction is offered by the auralized reproduction of MP3 Surround via regular stereo headphones. This paper describes the underlying concepts and the interworking of the technology components.

[Poster Presentation Associated with Paper Presentation P16-10]
Convention Paper 6754 (Purchase now)

P23-8 Design for High Frequency Adjustment Module in MPEG-4 HE-AAC Encoder Based on Linear Prediction MethodHan-Wen Hsu, Yung-Cheng Yang, Chi-Min Liu, Wen-Chieh Lee, National Chiao Tung University - Hsin-Chu, Taiwan
The high frequency adjustment module is the kernel module of spectral band replication (SBR) in MPEG-4 HE-AAC. The objective of high frequency adjustment is to recover the tonality of the reconstructed high frequencies. There are two crucial issues: the accurate measurement of tonality and the decision of shared control parameters. The control parameters, which are extracted according to signal tonalities, are used in the decoder to determine the gain control and the energy level of additional components. In other words, the quality of the reconstructed signal is directly related to the high-frequency adjustment module. In this paper an efficient method based on the Levinson-Durbin algorithm is proposed to measure tonality with a linear prediction approach using adaptive orders to fit different subband contents. Furthermore, the artifact due to the sharing of control parameters is also investigated, and an efficient decision criterion for the control parameters is proposed.

[Poster Presentation Associated with Paper Presentation P16-11]
Convention Paper 6755 (Purchase now)
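
The tonality-by-linear-prediction idea in P23-8 can be illustrated with a plain Levinson-Durbin recursion whose prediction gain serves as the tonality indicator; the fixed order 8 and the full-band analysis below are simplifications of the paper's adaptive-order, per-subband scheme.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations; return LPC coefficients and the
    final prediction-error energy."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        if err <= 1e-12:               # signal already (near-)perfectly predicted
            break
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err

def tonality_db(x, order=8):
    """Prediction gain in dB: large for tonal content, near 0 dB for noise."""
    x = x - np.mean(x)
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    _, err = levinson_durbin(r, order)
    return 10.0 * np.log10(r[0] / (err + 1e-12))

if __name__ == "__main__":
    fs = 48000
    t = np.arange(4096) / fs
    tone = np.sin(2.0 * np.pi * 1000.0 * t)
    noise = np.random.default_rng(3).standard_normal(4096)
    print("tone: %.1f dB, noise: %.1f dB" % (tonality_db(tone), tonality_db(noise)))
```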

P23-9 Evaluation of Real-Time Transport Protocol Configurations Using aacPlusAndreas Schneider, Kurt Krauss, Andreas Ehret, Coding Technologies - Nuremberg, Germany
aacPlus is a highly efficient audio codec that is being used in a growing number of applications where the compressed audio data is encapsulated in a real-time transport protocol and transmitted over error-prone channels. In this paper the implication of packet losses during transmission and techniques to mitigate the impact on the resulting audio quality are discussed. Example transmission channel characteristics are used to show how typical protocol configuration parameters are derived. The benefits of the described techniques are evaluated and verified by setting up a complete simulation chain and performing listening tests.

[Poster Presentation Associated with Paper Presentation P20-2]
Convention Paper 6789 (Purchase now)

P23-10 Audio Communication CoderAnibal Ferreira, University of Porto - Porto, Portugal, ATC Labs, Chatham, NJ, USA; Deepen Sinha, ATC Labs - Chatham, NJ, USA
3G mobile and wireless communication networks elicit new ways of multimedia human interaction and communication, notably two-way high-quality audio communication. This is in line with both the consumer expectation of new audio experiences and functionalities, and with the motivation of telecom operators to offer consumers new services and communication modalities. In this paper we describe the design and optimization of a monophonic audio coder (Audio Communication Coder - ACC) that features low-delay coding (< 50 ms) and intrinsic error robustness, while minimizing complexity and achieving competitive coding gains and audio quality at bit rates around 32 kbit/s and higher. ACC source, perceptual, and bandwidth extension tools are described, and an emphasis is placed on ACC structural and operational features making it suitable for real-time, two-way audio communication. A few performance results are also presented. Audio demonstrations are available at http://www.atc-labs.com/acc/.

[Poster Presentation Associated with Paper Presentation P20-3]
Convention Paper 6790 (Purchase now)

P23-11 ISO/IEC MPEG-4 High-Definition Scalable Advanced Audio CodingRalf Geiger, Fraunhofer IIS - Erlangen, Germany; Rongshan Yu, Institute for Infocomm Research - Singapore; Jürgen Herre, Fraunhofer IIS - Erlangen, Germany; Susanto Rahardja, Institute for Infocomm Research - Singapore; Sang-Wook Kim, Samsung Electronics - Suwon, Korea; Xiao Lin, Institute for Infocomm Research - Singapore; Markus Schmidt, Fraunhofer IIS, Erlangen - Germany
Recently, the MPEG Audio standardization group has successfully concluded the standardization process on technology for lossless coding of audio signals. This paper provides a summary of the Scalable Lossless Coding (SLS) technology as one of the results of this standardization work. MPEG-4 Scalable Lossless Coding provides a fine-grain scalable lossless extension of the well-known MPEG-4 AAC perceptual audio coder up to fully lossless reconstruction at word lengths and sampling rates typically used for high-resolution audio. The underlying innovative technology is described in detail and its performance is characterized for lossless and near-lossless representation, both in conjunction with an AAC coder and as a stand-alone compression engine. A number of application scenarios for the new technology are also discussed.

[Poster Presentation Associated with Paper Presentation P20-4]
Convention Paper 6791 (Purchase now)

P23-12 A Scalable CELP/Transform Coder for Low Bit Rate Speech and Audio CodingGuillaume Fuchs, Roch Lefebvre, University of Sherbrooke - Sherbrooke, Quebec, Canada
With the increase of channel capacity in communication systems, several emerging applications require an acceptable reproduction quality for speech signals at low bit rates and a superior quality for any kind of audio inputs when more bandwidth is available. To meet this requirement, we propose a new scalable audio coding algorithm. The proposed coder consists of a wideband speech coder embedded in a multilayer transform coding algorithm. The transform coefficients are quantized using scalable lattice vector quantization. The global system exhibits low computational complexity and memory requirements and leads to a very fine-grained scalability. The new coding algorithm is suitable for communications over heterogeneous networks with no or uneven guarantee on the quality of service (QoS) for packet delivery.
Convention Paper 6802 (Purchase now)

P23-13 A New Low Bit-Rate Speech Coding Scheme for Mixed ContentRaghuram A., ATC Labs - Chatham, NJ, USA; Anibal Ferreira, ATC Labs - Chatham, NJ, USA, University of Porto, Porto, Portugal; Deepen Sinha, ATC Labs - Chatham, NJ, USA
Speech coding is a very mature research area, and many coding schemes are available that provide speech qualities ranging from highly intelligible synthetic speech at about 2 kbit/s to wideband natural speech at about 16 kbit/s. However, emerging application scenarios such as information services on broadcast radio are eliciting additional concurrent challenges not easily addressed by current speech coding technology, namely the need to code mixed audio material, the need to permit flexible bit-rate coding configurations, the need to scale effectively in quality in the range 2 to 8 kbit/s, and the need to offer pleasant natural sound. In this paper we present a new very low rate speech/audio coding technology addressing those concurrent challenges thanks to the use of innovative approaches regarding accurate reconstruction of harmonic complexes, optimal coding of the excitation, efficient side information coding, and a suitable combination of new bandwidth extension techniques. The structure of the speech/audio coder is detailed, and its performance in the range 2.4 to 12 kbit/s is illustrated and compared to that of reference coders.
Convention Paper 6803 (Purchase now)

P23-14 On Improving Parametric Stereo Audio CodingJimmy Lapierre, Roch Lefebvre, Sherbrooke University - Sherbrooke, Quebec, Canada
Existing schemes for stereo and spatial audio coding rely on psychoacoustically-relevant parametric models. These systems generally encode and transmit interchannel intensity, coherence, and phase parameters extracted from a time-frequency plane. Building on this framework, we discuss a number of potential refinements that can improve the quality or reduce the bit-rate of these existing schemes using information already transmitted to the decoder. We also evaluate and assert the performance of these enhancements with a distortion analysis of the relevant parameters.
Convention Paper 6804 (Purchase now)

P23-15 Stack-Run Audio CodingMarie Oger, Julien Bensa, Stéphane Ragot, France Télécom R&D - Lannion Cedex, France; Marc Antonini, Lab. I3S-UMR 6070 CNRS and University of Nice Sophia Antipolis - Sophia Antipolis, France
In this paper we present an application of stack-run entropy coding to audio compression. Stack-run coding represents signed integers and zero run lengths by adaptive arithmetic coding using a quaternary alphabet (0, 1, +, -). We use this method to encode scalar quantization indices representing the MDCT spectrum of perceptually-weighted wideband audio signals (sampled at 16000 Hz). Noise injection and pre-echo reduction are also used to improve quality. The average quality of the proposed technique is similar to ITU-T G.722.1. In addition, we compare the performance of scalar quantization with stack-run coding to the multirate lattice vector quantization of 3GPP AMR-WB+.
Convention Paper 6805 (Purchase now)

P23-16 A Codebook-Based Cascade Coder for Embedded Lossless Audio CodingChristian Ritz, Kevin Adistambha, Jason Lukasiak, Ian Burnett, University of Wollongong - Wollongong, New South Wales, Australia
Embedded lossless audio coding embeds a perceptual audio coding bit stream within a lossless audio coding bit stream. Such an approach provides access to both a lossy and lossless version of the audio signal within the one coding scheme. Previously, a lossless embedded audio coder based on the Advanced Audio Coding (AAC) approach and utilizing both backward Linear Predictive Coding (LPC) and cascade coding was proposed. This paper further investigates the adaptation of cascade coding to lossless audio compression using a novel codebook-based approach. The codebook is trained using LPC residual signals obtained from the decorrelation stage of the embedded coder. Results show that the overall lossless compression performance of cascade coding closely follows Rice coding.
Convention Paper 6806 (Purchase now)

P23-17 A Unified Transient Detector for Enhanced aacPlus EncoderSamsudin, Boon Poh Ng, Nanyang Technological University - Singapore; Evelyn Kurniawati, STMicroelectronics Asia Pacific Pte. Ltd. - Singapore; Farook Sattar, Nanyang Technological University - Singapore; Sapna George, STMicroelectronics Asia Pacific Pte.Ltd. - Singapore
An enhanced aacPlus audio codec is a combination of MPEG-4 Advanced Audio Coding (AAC), Spectral Band Replication (SBR), and Parametric Stereo (PS). To deal with transient signals, SBR and AAC employ separate transient detectors, although both detectors basically perform detection on the same signal. This paper presents a low-complexity transient detector that operates in a PS encoder. It performs online detection at the same time as PS spatial parameter extraction and takes advantage of some computations performed for subband grouping. Testing on a few percussive solo instrument signals and percussive mixtures shows good matching with the transient information generated by the original SBR and AAC detectors, with much lower computational requirements. This implies that the complexity of the encoder can be reduced by replacing both detectors with the proposed unified low-complexity detector.
Convention Paper 6807 (Purchase now)

P23-18 New Results in Rate-Distortion Optimized Parametric Audio CodingMads Græsbøll Christensen, Søren Holdt Jensen, Aalborg University - Aalborg, Denmark
In this paper we summarize some recently published methods and results in parametric audio coding. These are all based on rate-distortion optimized coding using a perceptual distortion measure. We summarize how a number of well-known computationally efficient methods for incorporating perception in sinusoidal parameter estimation relate to minimizing this perceptual distortion measure. Then a number of methods for parametric coding of transients are compared, and results of listening tests are presented. Finally, we show how the complexity of rate-distortion optimized audio coding can be reduced by rate-distortion estimation.
Convention Paper 6808 (Purchase now)

P23-19 Harmonic Structure Reconstruction in Audio Compression Method Based on Spectral-Oriented TreesWei-Chen Chang, Jing-Xin Wang, Alvin Su, National Cheng-Kung University - Tainan, Taiwan
A novel audio compression method called Harmonic Structure Quad Tree is presented. The method employs a bit-plane based quantization-encoding method called Concurrent Encoding in Hierarchical Tree to encode the MDCT coefficients of overlapping audio frames. Scalability is easily achieved by discarding the trailing bits at any position of the bit stream as long as the head information is preserved. In this paper an embedded harmonic structure reconstruction method is proposed to predict and restore the coefficients missed during the encoding process. The proposed method compares favorably to the popular MP3 coder for both pop and classical audio programs. No psychoacoustic model is used. The computational complexity and the coding table size are much smaller than those of an MP3 coder.
Convention Paper 6809 (Purchase now)

P23-20 An Experimental Audio Coder Using Rate-Distortion Controlled Temporal Block SwitchingJohannes Boehm, Sven Kordon, Peter Jax, Thomson Corporate Research - Hannover, Germany
To address the requirement of piecewise stationarity within the analyzed signal segments, today’s state-of-the-art audio codecs make use of two filter bank resolutions. Short temporal resolution sequences are used to adapt to transient-like jump signals; long temporal resolutions are used to effectively code the more steady or slowly drifting waveforms. With increasing computational capacity, a better adaptation of the filter bank to the signal becomes feasible. This paper presents an experimental MDCT-based transform coder that is capable of switching between four filter bank resolutions. A distortion measure is deployed that is driven by a simple psychoacoustic model that incorporates masking effects both for stationary and transient signals. A rate-distortion control is proposed to partition the signal to optimally match the signal contour with the temporal resolutions of the filter bank. Performance results are presented and compared to the conventional two-resolution approach. Proposals for further developments such as pre-segmentation are evaluated.
Convention Paper 6810 (Purchase now)

P23-21 Detection and Extraction of Transients for Audio CodingOliver Niemeyer, Thomson Corporate Research - Hannover, Germany; Bernd Edler, University of Hannover - Hannover, Germany
An algorithm for the detection and extraction of transient signal components is presented. It is based on the detection of sharp onsets of the signal power in the time direction of the complex time-frequency domain. Afterward, the detected transients are extracted in the corresponding MDCT spectrum. The audio signal containing only the extracted transients is synthesized using the inverse MDCT. In an audio coding application this transient signal and the resulting residual signal can be coded separately using specifically optimized coders. One approach for such an audio coding scheme, using an MDCT-based coder for the transient signal, is also presented.
Convention Paper 6811 (Purchase now)

P23-22 Audio Coding Using a Genetic AlgorithmDavid Marston, BBC R&D - Tadworth, Surrey, UK
Currently MPEG 1 Layer II encoders incorporate a feedforward technique where a psychoacoustic model derived from the input signal drives the bit allocation. This paper describes a novel approach where a psychoacoustic metric compares the output signal with the input signal to drive the bit allocation and uses a genetic algorithm in the feedback process. The audio quality is compared with that of leading conventional audio coders. The aim of the work was to assess how far Layer II coding can be improved and whether any further progress can be made with conventional coding.
Convention Paper 6812 (Purchase now)

P23-23 Parametric Representation of Multichannel Audio Based on Principal Component AnalysisManuel Briand, David Virette, France Telecom R&D - Lannion Cedex, France; Nadine Martin, Laboratoire des Images et des Signaux - St. Martin D’Hères Cedex, France
Low-bit-rate parametric audio coding for multichannel audio is mainly based on Binaural Cue Coding (BCC). In this paper we show that the recently introduced Unified Domain Representation (UDR) of multichannel audio is equivalent to the BCC scheme in a parametric stereo coding context. Based on the fact that spatial parameters can be represented by rotation angles, we propose a general model based on the Principal Component Analysis (PCA) approach. This model may be applied both to the parametric representation of multichannel audio signals and to upmix methods. Moreover, we apply the analysis results to propose a new parametric audio coding method based on frequency subband PCA processing.
Convention Paper 6813 (Purchase now)
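
The rotation-angle view of spatial parameters in P23-23 is easy to demonstrate on a single full-band stereo frame: a 2x2 PCA yields an angle, a dominant component, and an ambient residual. The sketch below is that generic full-band case, not the paper's frequency-subband processing.

```python
import numpy as np

def pca_rotation(left, right):
    """Return the PCA rotation angle (radians) plus the rotated pair:
    a dominant 'downmix' component and an 'ambient' residual."""
    x = np.stack([left, right])                 # (2, n), assumed zero-mean
    cov = x @ x.T / x.shape[1]                  # 2x2 covariance estimate
    # Angle that diagonalizes a symmetric 2x2 matrix.
    theta = 0.5 * np.arctan2(2.0 * cov[0, 1], cov[0, 0] - cov[1, 1])
    c, s = np.cos(theta), np.sin(theta)
    principal, residual = np.array([[c, s], [-s, c]]) @ x
    return theta, principal, residual

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    src = rng.standard_normal(48000)
    left = 0.9 * src + 0.05 * rng.standard_normal(48000)
    right = 0.4 * src + 0.05 * rng.standard_normal(48000)
    theta, p, r = pca_rotation(left, right)
    print("rotation angle: %.1f deg, energy ratio: %.1f"
          % (np.degrees(theta), p.var() / (r.var() + 1e-12)))
```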

P23-24 A Dual Audio Transcoding Algorithm for Digital Multimedia Broadcasting ServicesKyoung Ho Bang, Yonsei University - Seoul, Korea; Young Cheol Park, Yonsei University - Wonju, Korea; Dae Hee Youn, Yonsei University - Seoul, Korea
In this paper we propose a dual audio transcoding algorithm to serve high quality audio streams over a broadcasting network comprising heterogeneous audio formats. As two typical cases, audio transcoding from DTV to T-DMB and S-DMB services is considered. While the Korean DTV audio standard employs Dolby AC-3, the Korean T-DMB and S-DMB services use the MPEG-4 BSAC and MPEG-4 HE-AAC audio coding technologies, respectively. In the proposed algorithm, the bit allocation information of AC-3 is reused in the BSAC and HE-AAC encoding processes, and the nested loops are reestablished as two independent loops, which saves a significant amount of computational cost. Overall, the transcoding algorithm can save about 65 percent of the computational cost for the BSAC encoding and 31 percent for the HE-AAC encoding. Subjective quality evaluations show that the proposed algorithm has mean diffgrades of -0.02 and -0.01 relative to the tandem method. Due to its computational simplicity and effective performance, the proposed algorithm is suitable for mobile multimedia services.
Convention Paper 6814 (Purchase now)

P23-25 A Subband Domain Downmixing Scheme for Parametric Stereo EncoderSamsudin, Nanyang Technological University - Singapore; Evelyn Kurniawati, STMicroelectronics Asia Pacific Pte. Ltd. - Singapore; Farook Sattar, Ng Boon Poh, Nanyang Technological University - Singapore; Sapna George, STMicroelectronics Asia Pacific Pte. Ltd. - Singapore
Parametric Stereo (PS) coding describes a stereo audio signal with a monaural signal and a set of spatial parameters. This paper describes a signal-dependent, subband-domain downmixing scheme for a PS encoder to obtain the monaural signal. The downmixing is performed on the subband signals output from the PS analysis filtering, hence no extra signal decomposition is required. The scheme is able to minimize phase cancellation by performing phase alignment of the stereo signals prior to the mixing. In addition, power equalization ensures preservation of the overall power of the original stereo signal in the monaural downmixed signal. Additional computational requirements can be kept low by making use of the available PS spatial parameter data for the phase alignment. Testing on synthetic and real-life audio recordings shows good performance, especially for recordings with a significant out-of-phase side signal component.
Convention Paper 6815 (Purchase now)
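
A minimal frequency-domain sketch of the two ingredients described in P23-25, phase alignment before mixing and power equalization afterwards, is given below; it works per FFT bin rather than in the paper's QMF subbands and does not reuse the PS parameter data, so treat it as a loose illustration only.

```python
import numpy as np

def aligned_downmix(left, right, n_fft=1024):
    """Downmix one stereo frame with phase alignment and power equalization."""
    win = np.hanning(n_fft)
    L = np.fft.rfft(left * win)
    R = np.fft.rfft(right * win)
    # Rotate the right channel onto the left channel's phase before summing,
    # so anti-phase content does not cancel.
    ipd = np.angle(L * np.conj(R))
    mono = 0.5 * (L + R * np.exp(1j * ipd))
    # Equalize each bin so the mono power equals the mean stereo power.
    target = 0.5 * (np.abs(L) ** 2 + np.abs(R) ** 2)
    mono *= np.sqrt(target / (np.abs(mono) ** 2 + 1e-12))
    return np.fft.irfft(mono, n_fft)

if __name__ == "__main__":
    fs = 48000
    t = np.arange(1024) / fs
    sig = np.sin(2.0 * np.pi * 500.0 * t)
    naive = 0.5 * (sig + (-sig))                # plain sum of anti-phase channels
    mono = aligned_downmix(sig, -sig)
    print("naive rms:", np.sqrt(np.mean(naive ** 2)),
          "aligned rms:", np.sqrt(np.mean(mono ** 2)))
```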


P24 - Psychoacoustics, Perception, and Listening Tests

Tuesday, May 23, 09:00 — 12:40

Chair: Gaetan Lorho, Nokia Corporation - Helsinki, Finland

P24-1 Auditory Scene Synthesis for Distributed Audiences in E-Learning ApplicationsGavin Kearney, Dermot Furlong, Trinity College - Dublin, Ireland
The enhancement of learning processes through electronic presentation has led to the development of e-Learning environments, which merge traditional classroom instruction with teleconference capabilities. One aspect of such presentations is the correct localization of both stationary and mobile sound sources, for all audience members, in alignment with the video data on a teleconference screen. Research into conventional sound reinforcement solutions shows how systems based on stereophonic principles fail in this regard. Furthermore, sound systems such as Delta Stereophony and Wave Front Synthesis, which provide the accurate wavefronts required for correct localization, are found to be uneconomic for application to e-Learning classroom environments. A solution to this localization problem is presented in the form of a five-speaker frontal line array, and its effectiveness is verified through simulator and localization tests. A physical implementation of the system is also presented for subjective evaluation.

Presentation is scheduled to begin at 09:00
Convention Paper 6816 (Purchase now)

P24-2 The Effect of Loudspeaker Frequency Bandwidth Limitation and Stereo Base Width on Perceived QualityGaetan Lorho, Nokia Corporation - Helsinki, Finland
The effect of frequency bandwidth limitation and stereo base width on listeners’ preference for loudspeaker reproduction of music and movie sound was studied. Various combinations of high and low frequency band limitation were considered for near-field monophonic and stereophonic loudspeaker reproduction. Two loudspeaker configurations with a reduced stereo base width representative of mobile multimedia systems were also included in this experiment to investigate the perceived effect of a stereo enhancement algorithm. The results of this study indicate that untrained listeners consider low-frequency content to be more important than high-frequency content or stereophony in their preference judgments. For cases where the optimal stereo reproduction was preferred to the monophonic and reduced stereo base setups, a significant improvement in preference was found with the stereo enhancement systems.

Presentation is scheduled to begin at 09:20
Convention Paper 6817 (Purchase now)

P24-3 Spatial Character and Quality Assessment of Selected Stereophonic Image Enhancements for Headphone Playback of Popular MusicAtsushi Marui, William Martens, McGill University - Montreal, Quebec, Canada
The effects of selected stereophonic image enhancement algorithms on the perceived spatial character and quality of headphone playback for popular music were investigated for a sampling of program material typical of conventional multitrack mixes. Preference ratings were made for the auditory images resulting from three enhancement algorithms in comparison with the original PCM recordings of nine short musical programs. A perceptual coding (MP3) of the original recordings was also presented, making a total of five versions to be compared for each musical program. In addition, ratings were collected on a perceptual attribute identified herein as Ensemble Stage Width (ESW). The applied algorithms had significant effects on both preference and ESW ratings, regardless of whether expensive or inexpensive headphones were used in the listening tests.

Presentation is scheduled to begin at 09:40
Convention Paper 6818 (Purchase now)

P24-4 Designing a Spatial Audio Attribute Listener Training System for Optimal TransferRafael Kassier, Tim Brookes, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
Interest in spatial audio has increased due to the availability of multichannel reproduction systems for the home and car. Various timbral ear training systems have been presented, but relatively little work has been carried out into training in spatial audio attributes of reproduced sound. To demonstrate that such a training system is truly useful, it is necessary to show that learned skills are transferable to different settings. Issues relating to the transfer of training are examined; a recent study conducted by the authors is discussed in relation to the level of transfer shown by participants, and a new study is proposed that is aimed to optimize the transfer of training to different environments.

[Associated Poster Presentation in Session P29, Tuesday, May 23, at 14:00]

Presentation is scheduled to begin at 10:00
Convention Paper 6819 (Purchase now)

P24-5 Evaluation of Loudness in a Room Acoustic ModelYi Shen, Konstantinos Angelakis, Technical University of Denmark - Kgs. Lyngby, Denmark
The equal loudness contours were measured using an analytical model, which was constructed for rectangular rooms based on Kuttruff’s room acoustic model. A stimulus can be presented through the room acoustic model to the subjects via headphones. The results are in very close agreement with measurements in a real room. Two additional experiments were conducted to study loudness as a function of reverberation time for different types of stimuli. The results showed that the effect of reverberation on loudness is negligible for a stationary stimulus. On the other hand, for an impulse train, loudness depends on both the reverberation time of the test room and the repetition frequency of the stimulus.

[Associated Poster Presentation in Session P29, Tuesday, May 23, at 14:00]

Presentation is scheduled to begin at 10:20
Convention Paper 6820 (Purchase now)

P24-6 Effect of Direction on Loudness for Wideband and Reverberant SoundsVille Pekka Sivonen, Aalborg University - Aalborg, Denmark, and Brüel & Kjær Sound & Vibration Measurement A/S, Nærum, Denmark; Wolfgang Ellermeier, Aalborg University - Aalborg, Denmark
The effect of incidence angle on loudness was investigated for wideband and reverberant sounds. In an adaptive procedure, five listeners matched the loudness of a sound coming from five incidence angles in the horizontal plane to that of the same sound with frontal incidence. The stimuli were presented to the listeners via individual binaural synthesis. The results confirm that loudness depends on the sound incidence angle, as it does for narrow-band, anechoic sounds. The directional effects, however, were attenuated with the wideband and reverberant stimuli used in the present investigation.

Presentation is scheduled to begin at 10:40
Convention Paper 6821 (Purchase now)

P24-7 Investigations in Real-time Loudness MeteringGilbert Soulodre, Michel Lavoie, Communications Research Centre - Ottawa, Ontario, Canada
There has been much research in the past few years on loudness perception and metering. Recently, the authors developed an objective loudness algorithm that accurately measures the perceived loudness of mono, stereo, and multichannel audio sequences. The algorithm provides a single loudness reading for the overall audio sequence. In broadcast, film, and music applications it is desirable to have a real-time loudness meter that can track the loudness of the audio signal over time. The new meter would be used in conjunction with existing metering methods to provide additional information about the audio signal. In the present paper the requirements for such a meter are examined and new subjective testing methods are devised to help in the development and evaluation of a new meter.

[Associated Poster Presentation in Session P29, Tuesday, May 23, at 14:00]

Presentation is scheduled to begin at 11:00
Convention Paper 6822 (Purchase now)
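
As background to what a time-tracking meter of the kind discussed in P24-7 must produce, the sketch below computes a running short-term level over overlapping blocks; it is a generic RMS-level meter, not the authors' loudness algorithm, which would add frequency weighting, channel summation, and listening-test-derived calibration.

```python
import numpy as np

def running_level_db(x, fs, block_s=0.4, hop_s=0.1):
    """Short-term level in dB re full scale over overlapping blocks."""
    block, hop = int(block_s * fs), int(hop_s * fs)
    levels = []
    for start in range(0, len(x) - block + 1, hop):
        seg = x[start:start + block]
        levels.append(10.0 * np.log10(np.mean(seg ** 2) + 1e-20))
    return np.array(levels)

if __name__ == "__main__":
    fs = 48000
    t = np.arange(5 * fs) / fs
    # A 1 kHz tone whose level steps up and down every second.
    x = 0.25 * np.sin(2.0 * np.pi * 1000.0 * t) * (1.0 + 0.5 * np.sign(np.sin(2.0 * np.pi * 0.5 * t)))
    print(np.round(running_level_db(x, fs)[::10], 1))
```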

P24-8 Measuring the Threshold of Audibility of Temporal DecaysAndrew Goldberg, Genelec Oy - Iisalmi, Finland; Helsinki University of Technology, Espoo, Finland
A listening test system designed to measure the threshold of audibility of the decay time of low frequency resonances is described. The system employs the Parameter Estimation by Sequential Testing (PEST) technique, and the listening test is conducted on calibrated headphones to remove factors associated with the listening environment. Program signal, replay level, and resonance frequency are believed to influence the decay time threshold. A trial of the listening test shows that the system yields realistic results, but the temporal resonance modeling filter requires some adjustment to remove audible nonmodal cues. Transducer limitations still affect the test at low frequencies and high replay levels. Factors for future large-scale listening tests are refined. Early indications are that temporal decay thresholds rise with reduced frequency and SPL.

[Associated Poster Presentation in Session P29, Tuesday, May 23, at 14:00]

Presentation is scheduled to begin at 11:20
Convention Paper 6823 (Purchase now)

P24-9 The Influence of Impulse Response Length and Transition Bandwidth of Magnitude Complementary Crossover on Perceived Sound QualityIva Djukic, Institute Mihailo Pupin - Belgrade, Serbia and Montenegro; Dejan Todorovic, RTS-Radio Beograd - Belgrade, Serbia and Montenegro; Ljiljana D. Milic, Institute Mihailo Pupin - Belgrade, Serbia and Montenegro
In this paper a special type of magnitude complementary IIR filter pair with variable transition bandwidth and impulse response length was used in order to examine the effects of these two characteristics on the subjective perception of the reproduced sound. Two types of listening tests were performed. In the first type, the sum of the crossover outputs was compared to the original signal. In the second type, the IIR filter pairs were compared among themselves, as well as with linear-phase magnitude complementary FIR filter pairs as a reference. The results of the tests show that the overall differences are not significant. It was found that the considered filters are suitable for loudspeaker crossover applications.

[Associated Poster Presentation in Session P29, Tuesday, May 23, at 14:00]

Presentation is scheduled to begin at 11:40
Convention Paper 6824 (Purchase now)

P24-10 Perception of Simultaneity and Detection of Asynchrony between Audio and Structural Vibration in Multimodal Music ReproductionKent Walker, William L. Martens, Sungyoung Kim, McGill University - Montreal, Quebec, Canada
In music reproduction incorporating haptic display there is a need to know the human observer’s tolerance for asynchrony between the presentation of audio (airborne vibration) and haptic content (in this case whole-body vibration). Two methods for measuring the human tolerance for such audio-haptic asynchrony were employed in experiments using recorded musical instrument sound as stimuli: judgments of the time order of arrival of airborne versus structure-borne vibration, and judgments of subjective simultaneity that required no report of which component arrived first. Optimal intermodal delay values derived from the time order judgments were related to the direct judgments of simultaneity for the same set of stimuli.

Presentation is scheduled to begin at 12:00
Convention Paper 6825 (Purchase now)

P24-11 Computational Two-Channel ITD ModelVille Pulkki, Helsinki University of Technology - Helsinki, Finland
Recently, the Jeffress model of localization has been questioned in neurophysiological studies, and a two-channel ITD model has been proposed. In this paper a simple computational implementation of the two-channel ITD model is presented, which models the ITD decoding based on neurophysiological data, although computationally in a very simple way. The model fits almost perfectly to the neurophysiological data recorded from a guinea pig and matches at least qualitatively with psychoacoustic data.

Presentation is scheduled to begin at 12:20
Convention Paper 6826 (Purchase now)


P25 - Instrumentation and Measurement

Tuesday, May 23, 09:00 — 11:40

Chair: Michel Keyhl, OPTICOM - Erlangen, Germany

P25-1 Automatic Recognition of Urban Sound SourcesBoris DeFreville, LASA - Paris, France, University of Cergy-Pontoise, Cergy-Pontoise, France; Pierre Roy, Sony CSL Paris - Paris, France; Christophe Rosin, LASA - Paris, France; François Pachet, SONY CSL Paris - Paris, France
The goal of the FDAI project is to create a general system that computes an efficient representation of the acoustic environment. More precisely, FDAI has to compute a noise disturbance indicator based on the identification of six categories of sound sources. This paper describes experiments carried out to identify acoustic features and recognition models that were implemented in FDAI. The framework is based on EDS (Extractor Discovery System), an innovative system for acoustic feature extraction. The design and development of FDAI raised two critical issues: completeness, since it is very difficult to design descriptors that identify every sound source in urban environments; and consistency, since some sound sources are not acoustically consistent. We solved the first issue with a conditional evaluation of a family of acoustic descriptors, rather than the evaluation of a single general-purpose extractor. Indeed, a first hierarchical separation between vehicles (moped, bus, motorcycle, and car) and non-vehicles (bird and voice) significantly raised the accuracy of identification of the buses. The second issue turned out to be more complex and is still under study. We give preliminary results here.

Presentation is scheduled to begin at 09:00
Convention Paper 6827 (Purchase now)
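
The "vehicles first, then sub-classes" strategy mentioned in P25-1 can be prototyped with any off-the-shelf classifier pair. The Python sketch below uses scikit-learn random forests on made-up feature vectors; the class list comes from the abstract, but the features, model choice, and hyperparameters are assumptions, not the EDS/FDAI system.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

VEHICLE = {"moped", "bus", "motorcycle", "car"}

class HierarchicalUrbanClassifier:
    """Two-stage classifier: a vehicle/non-vehicle gate, then fine labels."""
    def __init__(self):
        self.gate = RandomForestClassifier(n_estimators=50, random_state=0)
        self.vehicle_clf = RandomForestClassifier(n_estimators=50, random_state=0)
        self.other_clf = RandomForestClassifier(n_estimators=50, random_state=0)

    def fit(self, X, labels):
        labels = np.asarray(labels)
        is_vehicle = np.array([lab in VEHICLE for lab in labels])
        self.gate.fit(X, is_vehicle)
        self.vehicle_clf.fit(X[is_vehicle], labels[is_vehicle])
        self.other_clf.fit(X[~is_vehicle], labels[~is_vehicle])
        return self

    def predict(self, X):
        gate = self.gate.predict(X)
        out = np.empty(len(X), dtype=object)
        if gate.any():
            out[gate] = self.vehicle_clf.predict(X[gate])
        if (~gate).any():
            out[~gate] = self.other_clf.predict(X[~gate])
        return out

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    classes = ["moped", "bus", "motorcycle", "car", "bird", "voice"]
    X = rng.standard_normal((300, 12))           # stand-in acoustic features
    y = rng.choice(classes, size=300)
    clf = HierarchicalUrbanClassifier().fit(X, y)
    print(clf.predict(X[:5]))
```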

P25-2 A New Integrated System for Laboratory Speech/Voice ExaminationCostas Pastiadis, Aristotle University of Thessaloniki - Thessaloniki, Greece; Georgia Psyllidou, Paris Telecom - Paris, France; George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
The paper presents a new computer-based system for the examination and analysis of speech/voice functionality in laboratory environments. Although the system is mainly designed for clinical applications, it employs features that afford its generalized use as a speech/voice acquisition, analysis, and evaluation tool. The system offers an integrated and interactive modular structure for conducting various speech/voice examination procedures, and provides the data management capabilities necessary for further exploitation in diagnostic expert systems and knowledge-based speech/voice applications.

[Associated Poster Presentation in Session P30, Tuesday, May 23, at 14:00]

Presentation is scheduled to begin at 09:20
Convention Paper 6828 (Purchase now)

P25-3 Directivity Measurements on a Highly Directive Hearing Aid: The Hearing GlassesMarinus M. Boone, Technical University of Delft - Delft, The Netherlands
A highly directional hearing aid has been developed with the aim of giving much higher speech intelligibility than conventional hearing aids. The high directivity is obtained by mounting four microphones in each temple of a pair of glasses and performing optimized beamforming. This leads to an averaged directivity index of 9 dB under free field conditions, without head disturbance. In a recent research program the directivity of this device was measured with different directivity settings under free field and diffuse field conditions, with and without head diffraction. Results of this research are presented, and a comparison is also made with the directivity of a conventional hearing aid. The influence of the superdirective beamforming setting on the noise sensitivity is also shown, indicating that for practical use the directivity should be limited.

[Associated Poster Presentation in Session P30, Tuesday, May 23, at 14:00]

Presentation is scheduled to begin at 09:40
Convention Paper 6829 (Purchase now)

P25-4 Accurate Nonlinear Models of Valve Amplifiers Including Output TransformersPierre Touzelet, Technical Director - Vélizy, France
Available commercial network analysis programs are now powerful enough to handle sophisticated models of complete valve amplifiers, including nonlinear components such as valves and output transformers. The objectives of such accurate nonlinear models are evident: they allow for the evaluation, with a high degree of realism, of global amplifier performance and distortion, reducing, as a result, major risks at the development stage of any amplifier project. It is the intention of this paper to show how such sophisticated models can be developed and what kind of results and information can be extracted from them, by applying these models to a real amplifier as an illustrative example.

Presentation is scheduled to begin at 10:00
Convention Paper 6830 (Purchase now)

P25-5 The Self-Compensated Audio Transformers for Tube and Solid State Single-Ended AmplifiersAristide Polisois, A2B Electronic - La Houssaye en Brie, France; Giovanni Mariani, GRAAF srl - Modena, Italy
The self-compensated output transformer presented at the AES Convention held in Barcelona in May 2005 (Convention Paper 6346), intended for single-ended audio amplifiers, is based on the principle that an auxiliary winding (named the tertiary), carrying the same current as the primary winding, can set up an opposing magnetic flux that reduces the overall flux produced in the core by the direct current to almost zero. At the same time, however, this antagonist winding also opposes the induced alternating current. A capacitor is therefore connected to its terminals, short-circuiting the alternating current. Under these circumstances, the alternating potential difference is close to zero and the primary is no longer affected. But this short-circuit has a drawback: it considerably reduces the inductance of the primary. Novel solutions have been found to remove this obstacle to a satisfactory performance of the self-compensated output transformer.

[Associated Poster Presentation in Session P30, Tuesday, May 23, at 14:00]

Presentation is scheduled to begin at 10:20
Convention Paper 6831 (Purchase now)

P25-6 Some Neglected Audio Distortion MechanismsRichard Black, Richard Black Associates - London, UK
In addition to the familiar harmonic and intermodulation distortions, there exist various other mechanisms by which electronic equipment can degrade sound quality. Some of these are closely related to the familiar types, others are the result of direct acoustical interaction of the equipment, while yet others rely on the existence of two (or more) unrelated distortions in a system to produce an audible result. This paper examines some of these distortion mechanisms.

Presentation is scheduled to begin at 10:40
Convention Paper 6832 (Purchase now)

P25-7 Comparison of Four Subwoofer Measurement TechniquesManuel Melon, Christophe Langrenne, CNAM, Laboratoire d’Acoustique - Paris Cedex, France; David Rousseau, Bruno Roux, BC Acoustique - Alfortville, France; Philippe Herzog, Laboratoire de Mécanique et d’Acoustique - Marseille, France
Acoustic measurements at very low frequencies are difficult to perform, and interpretation of the results is tricky. In this paper four subwoofer measurement techniques are compared in terms of frequency response and directivity. The methods used are: anechoic room, pseudo free-field, isobaric room, and semi-anechoic room. Three subwoofers are tested: two closed-box systems and an active/passive system. For the semi-anechoic technique, double-layer pressure measurements on a half-sphere surrounding the source are performed. Then, using spherical harmonic decomposition, outgoing and ingoing pressure fields are separated to recover free-field conditions (i.e., removal of reflections from the walls below the room cut-off frequency). Discrepancies between the results are discussed and explained where possible.

Presentation is scheduled to begin at 11:00
Convention Paper 6833 (Purchase now)

P25-8 Room Impulse Response Measurement Using a Moving MicrophoneThibaut Ajdler, Luciano Sbaiz, Ecole Polytechnique Fédérale de Lausanne (EPFL) - Lausanne, Switzerland; Martin Vetterli, Ecole Polytechnique Fédérale de Lausanne (EPFL) - Lausanne, Switzerland, and University of California at Berkeley, Berkeley, CA, USA
In this paper we present a technique to record a large set of room impulse responses using a microphone moving along a trajectory. The technique processes the signal recorded by the microphone to reconstruct the signals that would have been recorded at all the possible spatial positions along the array. The speed of movement of the microphone is shown to be the key factor for the reconstruction. This fast method of recording spatial impulse responses can also be applied for the recording of head-related transfer functions.

[Associated Poster Presentation in Session P30, Tuesday, May 23, at 14:00]

Presentation is scheduled to begin at 11:20
Convention Paper 6834 (Purchase now)


P26 - Posters: Loudspeakers and Sound Reinforcement

Tuesday, May 23, 09:00 — 10:30

P26-1 Methods to Improve the Horizontal Pattern of a Line Array Module in the MidrangeNils Benjamin Schröder, Tobias Schwalbe, Robert Mores, Hamburg University of Applied Sciences - Hamburg, Germany
This paper reviews methods for modeling the vertical directivity in the frequency range from 200 Hz to 1 kHz in line array configurations. It describes the advantages and disadvantages of the following concepts: the horn, the “V-Alignment,” the flat alignment, and the partial coverage of the loudspeakers. We shed light on the interrelationship between the angle between two cone loudspeakers and the resulting directivity. Symmetrical and asymmetrical configurations of mid-range drivers and horns are compared. We outline a procedure to combine these solutions for superior results. One main result is the desired match of the midsection’s directivity with the directivity of the hf-waveguide section. A concept for building systems with variable directivity over the whole frequency range is drafted.

[Poster Presentation Associated with Paper Presentation P19-2]
Convention Paper 6776 (Purchase now)

P26-2 The Performance and Restrictions of High Frequency Waveguides in Line ArraysNils Benjamin Schröder, Tobias Schwalbe, Robert Mores, Hamburg University of Applied Sciences - Hamburg, Germany
It is necessary to form a plane, coherent wavefront in the hf-section of line arrays. Several different concepts have been applied to reach this goal, and we discuss these existing solutions. The different ideas on how to create a cylindrical wavefront are explained and evaluated; waveguides whose weak point lies in their theoretical design are examined particularly critically. We explain how we developed a new waveguide and, finally, offer some ideas on how the next generation of waveguides could be designed.

[Poster Presentation Associated with Paper Presentation P19-3]
Convention Paper 6777 (Purchase now)

P26-3 Efficient Nonlinear LoudspeakersBo Rohde Pedersen, University of Aalborg - Aalborg, Denmark; Finn T. Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark
Loudspeakers have traditionally been designed to be as linear as possible. However, as techniques for compensating nonlinearities are emerging, it becomes possible to use other design criteria. This paper presents and examines a new idea for improving the efficiency of loudspeakers at high levels by changing the voice coil layout. This deliberate nonlinear design has the benefit that a smaller amplifier can be used, which, in turn, has the benefit of reducing system cost as well as reducing power consumption.

[Poster Presentation Associated with Paper Presentation P19-4]
Convention Paper 6778 (Purchase now)

P26-4 A Dipole Multimedia LoudspeakerVladimir Filevski, Broadcasting Council of Macedonia - Skopje, Macedonia
A multimedia/computer loudspeaker usually stands on a desk, so the sound reflected from the desk interferes with the direct sound from the loudspeaker. This results in a comb-like frequency response, with a first minimum at least 8 dB deep, followed higher in frequency by a peak of about +4 dB, and so on. This paper describes the design of a dipole multimedia/computer loudspeaker with less than +2 dB/-2.4 dB of difference between the resulting frequency response (including the sound reflected from the desk) and the anechoic response.

[Poster Presentation Associated with Paper Presentation P19-7]
Convention Paper 6781 (Purchase now)

P26-5 A Compact 120 Independent Element Spherical Loudspeaker Array with Programmable Radiation PatternsRimas Avizienis, Adrian Freed, Peter Kassakian, David Wessel, University of California at Berkeley - Berkeley, CA, USA
We describe the geometric and engineering design challenges that were overcome to create a new compact, 10-inch diameter spherical loudspeaker array with integrated class-D amplifiers and a 120-independent-channel digital audio interface using Gigabit Ethernet. A special hybrid geometry is used that combines the maximal symmetry of a triangular-faceted icosahedron with the compact planar packing of six circles on an equilateral triangle ("billiard ball packing"). Six custom 1.25-inch drivers developed by Meyer Sound Labs are mounted on each of 20 aluminum triangular circuit boards. Class-D amplifiers for the six loudspeakers are mounted on the other side of each board. Two pentagonal circuit boards in the icosahedron employ Xilinx Spartan 3E FPGAs to demultiplex digital audio signals from incoming Gigabit Ethernet packets and process them before feeding the class-D modulators. Processing includes scaling, delaying, filtering, and limiting.

[Poster Presentation Associated with Paper Presentation P19-9]
Convention Paper 6783 (Purchase now)

P26-6 Constant Directivity End-Fire Arrays for Public Address SystemsFilip Verbinnen, University of Southampton - Southampton, UK
The directivity of current public address systems is controlled very well at mid and high audio frequencies using arrays or horns. Low frequencies, though, are mostly still omnidirectional. The cardioid subwoofer is making its introduction but has some drawbacks limiting the maximum sound pressure level achievable by this type of system. As a possibly better alternative, the end-fire line array is considered as a directive bass system. Some research has already been done on end-fire arrays, but none exploited the current potential of digital signal processing techniques. Using a linear end-fire array of loudspeakers, each with its own digitally processed input, the possibilities and limitations of these tapered end-fire linear arrays were examined, with the main goal of creating a constant directivity end-fire array with a usable frequency range from 20 Hz to 200 Hz.

[Poster Presentation Associated with Paper Presentation P19-11]
Convention Paper 6785 (Purchase now)
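
A hedged far-field sketch of the basic delay-and-sum end-fire array idea underlying the abstract above: each source is delayed in proportion to its position so that radiation adds coherently along the array axis. The element count, spacing, and test frequencies are illustrative assumptions, not the paper's design.

```python
import numpy as np

c = 343.0            # speed of sound, m/s
N = 8                # number of loudspeakers (assumption)
d = 0.5              # element spacing, m (assumption)

def endfire_response(f, theta):
    """Normalized far-field magnitude at frequency f (Hz) and angle theta (rad) from the axis."""
    k = 2 * np.pi * f / c
    n = np.arange(N)
    # delay-and-sum steering toward theta = 0: the delays cancel the propagation phase on-axis
    phases = k * n * d * (np.cos(theta) - 1.0)
    return np.abs(np.sum(np.exp(1j * phases))) / N

angles = np.linspace(0.0, np.pi, 181)
for f in (31.5, 63.0, 125.0):
    pattern_db = 20 * np.log10(np.array([endfire_response(f, a) for a in angles]) + 1e-12)
    print(f"{f:6.1f} Hz   rear rejection: {pattern_db[-1]:6.1f} dB")
```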

P26-7 DGRC Arrays: A Synthesis of Geometrical and Electronic Loudspeaker ArraysXavier Meynial, Active Audio - St. Herblain, France
Loudspeaker arrays offer an efficient way of achieving both uniform SPL coverage and high sound clarity over a large audience area. Two types of arrays have been proposed over the last 15 years: geometrically steered J-shape arrays, mainly for high-power sound reinforcement; and electronically steered vertical arrays, mainly for speech diffusion in public spaces. This paper introduces the Digital and Geometric Radiation Control (DGRC) principle, which combines the advantages of geometrical and electronic arrays: an array that is vertical, so that it can be mounted on a wall; that is controlled with great flexibility by its DSP; and whose power is evenly distributed across its loudspeakers.

[Poster Presentation Associated with Paper Presentation P19-12]
Convention Paper 6786 (Purchase now)

P26-8 Universal System for Spatial Sound Reinforcement in Theaters and Large Venues—System Design and User InterfaceFrank Melchior, Gabriel Gatzsche, Michael Strauss, Katrin Reichelt, Martin Dausel, Joachim Deguara, Fraunhofer IDMT - Ilmenau, Germany
Sound reinforcement for large venues is a challenging task. Up to now, most systems and concepts have focused on more or less stereophonic reproduction. Besides these concepts, a promising technology exists that enables spatial sound reinforcement for a larger audience. Spatial sound reinforcement is an important aspect especially in high-quality applications like opera houses and venues for classical music. This paper presents an innovative system and multi-user interface concept for dynamic automation and interactive control of sound source positions and other properties for variable reproduction systems in live sound reinforcement applications. The system has been designed in close cooperation with experts in sound reinforcement for opera houses. The developed user interfaces are described, along with a detailed view of the practical realization and audio processing in such a system.

[Poster Presentation Associated with Paper Presentation P19-13]
Convention Paper 6787 (Purchase now)

P26-9 Sound Field Characterization in Audio Reproduction with the Bit-Grouped Digital Transducer ArrayJorge Mendoza-López, Simon C. Busbridge, University of Brighton - Brighton, East Sussex, UK; Peter A. Fryer, B&W Group Ltd. - Steyning, West Sussex, UK
A bit-grouped digital transducer array loudspeaker with different numbers of nominally identical transducers for each bit has been developed. The direct digital-to-acoustic conversion process produces a sound field whose quality is shown to be spatially dependent and highly influenced by real effects including nonuniform transducer frequency responses, transducer mismatching, and baffle size. Spatial sound pressure and total harmonic distortion maps show that reducing the array size leads to improved reconstruction due to reduced phase distortion. For a given sampling rate and signal frequency, total harmonic distortion decreases as the listening distance is increased. A new criterion for the sweet-spot location in digital arrays is proposed based on the difference between the distortion introduced by path-length differences and the inherent quantization distortion.
Convention Paper 6835 (Purchase now)

P26-10 Radiation Impedance of Transducer Field Driven by Binary SignalsLibor Husník, František Kadlec, Czech Technical University in Prague - Prague, Czech Republic
This paper addresses another aspect of transducers with direct digital-to-analog conversion, sometimes called digital loudspeakers: the radiation impedance. In a transducer array embodying such a system, every value of the acoustic pressure within its dynamic range is radiated by a different number of elementary transducers, i.e., by a different total membrane surface, driven by the PCM signal. Since the critical frequency depends mainly on the total membrane surface, an interesting phenomenon appears: every sound pressure level is radiated with a different radiation impedance. As a result, different levels may be radiated differently.
Convention Paper 6836 (Purchase now)

P26-11 An Introductory Review for U-fa (USM Driven Woofer) DevelopmentHirokazu Negishi, University of Essex - Yokosuka, Japan
The concept of “U-fa” was born twelve years ago, and since that time there has been a great deal of development in the field. This paper was originally intended for presentation at the AES New York Convention in 2001 but was cancelled for various reasons. However, a second revival of the activity brought about several Japanese convention papers, which debuted at the AES New York Convention last October. Since the audio world and the ultrasonic motor have little common ground, it can be difficult to appreciate what difference introducing the USM to the woofer actually makes. In order to bridge that gap, the original paper has been completely rewritten. Emphasis is now placed on an introductory account of the early days and background of U-fa, rather than on theories and equations.
Convention Paper 6837 (Purchase now)

P26-12 Improved Model of Loudspeaker Using Continuous Revolution of Ultrasonic MotorYusuke Iwaki, Yuta Ohnuma, Juro Ohga, Shibaura Institute of Technology - Minato-ku, Tokyo, Japan; Hirokazu Negishi, DiMagic Co. Ltd. - Tokyo, Japan; Kazuaki Maeda, TOA Co. - Kobe, Japan
The loudspeaker using continuous revolution of an ultrasonic motor (USM), proposed by the authors, is suitable for radiating sound at very low frequencies. This paper describes an improved model of the USM loudspeaker. The functions of the rotor and stator are interchanged compared with the first model to simplify the electrical connection. The mass of the rotating ring is increased to make the inertial force larger. A silicone rubber joint is used to connect the USM and the cone radiator to avoid frictional noise.
Convention Paper 6838 (Purchase now)

P26-13 Ring Element Model: Program ResultsElena Prokofieva, Linn Products - Waterfoot, UK
The new modeling method was described in a series of papers presented at AES conventions in 2004 and 2005. In the proposed model the general approach is to represent a cone driver as a set of rings, loaded by a concentric force applied around the lower element’s edge. The ring element is preferred to the finite element model due to its simple yet precise driver simulation. The standard theoretical model of the radiating piston was initially considered. The problems inherent to this approach were highlighted, and the model was improved by removing the standard assumptions one by one and replacing them with more complex calculation procedures for improved simulation. The first group of assumptions has been programmed and compared against real measurements and analytical calculations. The advantages of each model are studied, and an explanation of how each of the standard theoretical assumptions affects the final result is provided. The possibilities of using different models in preliminary loudspeaker design are also discussed in this paper.
Convention Paper 6839 (Purchase now)

P26-14 Analysis and Optimal Design of Miniature LoudspeakersMingsian Bai, Rong-Liang Chen, National Chiao-Tung University - Hsin-Chu, Taiwan
Miniature loudspeakers are key components to many 3C products especially for portable devices such as mobile phones, PDAs, MP3 players, etc. Due to size limitation, miniature loudspeakers suffer from the problem of low output level. To gain higher output, one tends to drive the miniature loudspeaker over the excursion limit and induce nonlinear distortion. Thus, how to best reconcile the conflicting requirements of nonlinear distortion and acoustic output is extremely crucial in the design of such loudspeakers. To address the issue, this paper presents a systematic procedure to pinpoint the optimal designs appropriate for miniature dynamic moving-coil loudspeakers. The optimization procedure is based on an electro-acoustic model established by using the test-box method. Characteristics including voice-coil impedance, frequency response, and harmonic distortion are evaluated. The results show that significant improvement in output performance and excursion limitation has been gained by using the optimal design.
Convention Paper 6840 (Purchase now)

P26-15 Positions Effect of Multi Exciters and the Optimization on Sound Pressure Responses of Distributed Mode LoudspeakerSuzhen Zhang, Yong Shen, Nanjing University - Nanjing, China; Xiaoxiang Shen, Creative Technology (China) Co., Ltd. - Shanghai, China
The exciters of a distributed mode loudspeaker (DML) mainly play two roles, as activating forces and as attached masses, both of which affect the sound pressure response of the panel. Therefore, to obtain a smoother sound pressure response, the positions of the exciters should be considered carefully. In this paper the model of a panel with activating forces and attached masses is developed with partial differential equations (PDEs) in FEMLAB. The optimized positions of the exciters are obtained using a genetic algorithm (GA) based on two different optimization criteria, sound pressure response and mode distribution. Optimal results in both cases are derived and show that various optimization criteria lead to different sound pressure responses and sound pressure sensitivities.
Convention Paper 6841 (Purchase now)

P26-16 Simulation of Reconstruction of Oversampled Signals in Digital LoudspeakersHaihua Zhang, Simon C. Busbridge, Chris Garrett, University of Brighton - Brighton, UK; Peter A. Fryer, B&W Group Ltd. - Worthing, West Sussex, UK
The technique of oversampling and noise shaping has the potential to improve the resolution of digital loudspeaker systems, at the expense of increasing the signal bandwidth. Previous work has shown that the acoustic radiator in a digital loudspeaker system can act as a reconstruction filter if the oversampled signal bandwidth exceeds the transducer bandwidth. If the oversampled signal is within the transducer bandwidth, the use of reconstruction filters has to be considered in the system. This paper presents an investigation of reconstruction with both pre-acoustic and post-acoustic filtering. Mathematical modeling suggests that the reconstruction in a direct digital-to-analog loudspeaker should take place before the summation of the digital bit streams to avoid intermodulation distortion. This is counter-intuitive, because the electronic driving signals are no longer digital in the digital loudspeaker system.
Convention Paper 6842 (Purchase now)
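
For readers unfamiliar with the oversampling/noise-shaping idea the paper builds on, the sketch below shows a generic first-order error-feedback requantizer: the quantization error is fed back so that its spectrum is pushed toward high frequencies, where a reconstruction filter (or, as discussed above, the radiator itself) can attenuate it. This is a textbook illustration, not the authors' system, and the word length is an arbitrary assumption.

```python
import numpy as np

def noise_shape(x, n_bits=4):
    """Requantize x (floats in [-1, 1]) to n_bits with first-order error feedback."""
    levels = 2 ** (n_bits - 1)
    y = np.empty_like(x)
    error = 0.0
    for i, sample in enumerate(x):
        v = sample - error                                     # subtract previous quantization error
        q = np.clip(np.round(v * levels), -levels, levels - 1) / levels
        error = q - v                                          # error to be shaped (first-order high-pass)
        y[i] = q
    return y
```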

P26-17 Digital Measurement for Dynamic Distortion of LoudspeakersKeiichi Imaoka, Juro Ohga, Shibaura Institute of Technology - Minato-ku, Tokyo, Japan
Most measuring methods using digital signal processing techniques developed recently are intended only for measurements in the linear range. There is still no suitable digital measuring method for the nonlinear distortion of acoustical devices, so an accurate and convenient nonlinear distortion measurement system should be developed. The authors have already proposed a new digital distortion measuring method for acoustical devices. This method applies a Pink-TSP signal (Time Stretched Pulse, i.e., a quickly swept sinusoidal signal), whose frequency band is partially eliminated, to the acoustical system to be measured. The component detected in the rejected band is measured as distortion. This paper analyzes experimental results obtained with a single resonant system.
Convention Paper 6843 (Purchase now)
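
The band-rejection principle described above can be illustrated with a short, hedged sketch: excite a device with a swept sine whose spectrum has been notched in one band, then take the energy the device returns inside that notch as a distortion estimate. The logarithmic sweep stands in for the Pink-TSP, and the notch band and soft-clipping "device" are arbitrary assumptions.

```python
import numpy as np

fs = 48000
T = 2.0
t = np.arange(int(fs * T)) / fs
f1, f2 = 20.0, 20000.0
# logarithmic sweep (a stand-in for the Pink-TSP used in the paper)
sweep = np.sin(2 * np.pi * f1 * T / np.log(f2 / f1) * (np.exp(t / T * np.log(f2 / f1)) - 1))

# eliminate one band (here 900-1100 Hz) from the excitation spectrum
S = np.fft.rfft(sweep)
freqs = np.fft.rfftfreq(len(sweep), 1 / fs)
notch = (freqs > 900) & (freqs < 1100)
S[notch] = 0
excitation = np.fft.irfft(S, n=len(sweep))

# toy device under test: a memoryless soft-clipping nonlinearity
response = np.tanh(3.0 * excitation)

# energy appearing inside the rejected band is attributed to distortion
R = np.fft.rfft(response)
ratio_db = 10 * np.log10(np.sum(np.abs(R[notch]) ** 2) / np.sum(np.abs(R) ** 2))
print(f"in-notch energy relative to total: {ratio_db:.1f} dB")
```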

P26-18 Effect of Membrane Damages on Loudspeaker PerformanceRomuald Boleiko, Wroclaw University of Technology - Wroclaw, Poland
This paper deals with the effect of membrane damage on loudspeaker parameters. Tweeters with small dents in their metal membranes and with a perforation in their fabric membranes were tested. Acoustic parameters of the loudspeakers were investigated by measuring the sound pressure level frequency response and the vibrations of the loudspeaker's membrane. A scanning laser Doppler vibrometer was used for the latter measurements. It was found that dents and a perforation in the tweeter's dome may change its frequency response by more than 6 dB. As a rule, dents in the dome affect the tweeter frequency response at high frequencies, while a perforation in the soft fabric dome affects the tweeter response at medium frequencies.
Convention Paper 6844 (Purchase now)

P26-19 Loudspeaker Testing at the Production LineWolfgang Klippel, Stefan Irrgang, Ulf Seidel, Klippel GmbH - Dresden, Germany
Quality control in the mass production of transducers and electro-acoustical systems requires an objective technique for the reliable selection of defective units. A new technique is presented for detecting defects that produce almost inaudible symptoms during testing but may degrade sound quality in the final application (e.g., loose particles in the gap). Here the regular distortion that is characteristic of good units is modeled and actively compensated in the measured signal of a device under test to reveal symptoms of irregular defects (meta-hearing technology). This paper shows ways to perform high-speed measurements close to the physical limits and how to cope with ambient noise in a production environment. Traditional and more advanced techniques for separating passed and failed units are compared, and their integration into process control is discussed. Finally, the paper addresses cost-effective implementation in robust hardware, flexibility to customers' needs, simple handling, and other practical requirements.
Convention Paper 6845 (Purchase now)

P26-20 New Structure of LoudspeakerGuy Lemarquand, Université du Maine - Le Mans, France
We present a new structure of loudspeaker: the motor is ironless, the suspension is ferrofluidic, and the moving part is piston-like, with a concave dome. The absence of iron guarantees a small and constant inductance of the moving coil, as well as the absence of eddy currents. The motor includes two circular joints, one on each side of the moving coil. These joints are ferrofluidic. They fulfill the guidance and centering function and the air tightness function. This structure is quite rigid. As there is no traditional suspension in this structure, the related nonlinearities and hysteresis disappear.
Convention Paper 6846 (Purchase now)


P27 - Posters: Design and Engineering of Auditory Displays

Tuesday, May 23, 11:00 — 12:30

P27-1 Acoustic Rendering for Color InformationLudovico Ausiello, Emanuele Cecchetelli, Massimo Ferri, Nicoletta Caramelli, University of Bologna - Bologna, Italy
The Espacio Acustico Virtual (EAV) is a portable device that acoustically represents visual environmental scenes by rendering objects with the sound of virtual rain drops. Here, an improvement of this device is presented, which adds color to the information conveyed. Two different mappings of color into sound were implemented. Georama is a geometric coding based on red-green-blue vectors, while Colorama is an associative coding based on the hue and saturation model of color space. An experiment was run on both sighted and blind participants in order to assess which of these codings is the most user-friendly. The results showed that participants learned to discriminate colors through sounds better when trained with Georama than with Colorama.

[Poster Presentation Associated with Paper Presentation P22-2]
Convention Paper 6796 (Purchase now)

P27-2 Auditory Display of AudioDensil Cabrera, Sam Ferguson, University of Sydney - Sydney, New South Wales, Australia
In this paper we consider applications of auditory display for representing audio systems and audio signal characteristics. Conventional analytic representations of system characteristics, such as impulse response or nonlinear distortion, rely on numeric and graphic communication. Alternatively, simply listening to the system under test should reveal important aspects of its performance. Given that auditioning systems is so effective, it seems useful to develop higher-level auditory representations (auditory displays) of system performance parameters to exploit these listening abilities. For this purpose, we consider ways in which audio signals can be further transformed for auditory display, beyond the simple act of playing the sound.

[Poster Presentation Associated with Paper Presentation P22-3]
Convention Paper 6797 (Purchase now)

P27-3 Nonvocal Auditory Signals in the Operating Room for Each Phase of the Anesthesia ProcedureAnne Guillaume, Léonore Bourgeon, Elisa Jacob, Marie Rivenez, Claude Valot, IMASSA - Brétigny sur Orge, France; Jean-Bernard Cazalà, Hôpital Necker - Paris, France
Auditory warning signals are considered by the anesthetist team to be a major source of annoyance and confusion in the operating room. An ergonomic approach was carried out in order to propose a functional classification of the auditory alarms and to allocate a correct level of urgency to them. It allows the team to analyze the pertinence of the auditory warning signals emitted as anesthesia progresses, taking into account each phase of the anesthesia procedure. The results showed that the design of auditory warning signals could be improved by taking the activity of the anesthetist team into account. They also showed significantly higher frequencies of warning signals during the induction and emergence phases. However, the alarms were often ignored during these two phases, as they occurred as a result of deliberate anesthetist actions. Most of them were therefore considered nuisance alarms.

[Poster Presentation Associated with Paper Presentation P22-4]
Convention Paper 6798 (Purchase now)

P27-4 Usability of 3-D Sound for Navigation in a Constrained Virtual EnvironmentAntoine Gonot, France Telecom R&D - Lannion, France, CNAM, Paris, France; Noël Château, Marc Emerit, France Telecom R&D - Lannion, France
This paper presents a study on the global evaluation of spatial auditory displays in a constrained virtual environment. Forty subjects had to find nine sound sources in a virtual town, navigating by using spatialized auditory cues that were delivered in four different conditions: binaural versus stereophonic rendering (through headphones), combined with a contextualized versus decontextualized presentation of information. Behavioral data, auto-evaluations of cognitive load, and subjective-impression data collected via a questionnaire were recorded. The analysis shows that the binaural, contextualized presentation of auditory cues leads to the best results in terms of usability, cognitive load, and subjective evaluation. However, these advantages are only observable after a certain period of acquisition.

[Poster Presentation Associated with Paper Presentation P22-6]
Convention Paper 6800 (Purchase now)

P27-5 Psychoacoustic Evaluation of a New Method for Simulating Near-Field Virtual Auditory SpaceAlan Kan, Craig Jin, André van Schaik, University of Sydney - Sydney, New South Wales, Australia
A new method for generating near-field virtual auditory space (VAS) is presented. This method synthesizes near-field head-related transfer functions (HRTFs) based on a distance variation function (DVF). Using a sound localization experiment, the fidelity of the near-field VAS generated with this technique is compared to that obtained using near-field HRTFs synthesized with a multipole expansion of a set of HRTFs interpolated using a spherical thin-plate spline. Individualized HRTFs for varying distances in the near-field were synthesized using the subjects' HRTFs measured at a radius of 1 m for a limited number of locations around the listener's head. Both methods yielded similar localization performance, showing no major directional localization errors and reasonable correlation between perceived and target distances for sounds up to 50 cm from the center of the subject's head. Also, subjects tended to overestimate the target distance for both methods.

[Poster Presentation Associated with Paper Presentation P22-7]
Convention Paper 6801 (Purchase now)


P28 - Audio Recording and Reproduction

Tuesday, May 23, 12:00 — 13:20

Chair: Geoff Martin, Bang & Olufsen A/S - Struer, Denmark

P28-1 Multichannel High Performance Analog Volume Control with a New Serial I2C/SPI Compatible Control PortVivek Saraf, Chad Hardy, John Tucker, Johann Gaboriau, Cirrus Logic, Inc. - Austin, TX, USA
A new scheme for digitally operating an 8-channel high-performance analog volume control is proposed. Along with being I2C/SPI compatible, this scheme is faster and less complex than existing solutions due to its inherent advanced support for group/individual addressing. The volume control varies monotonically all the way from -96 dB to +22 dB in 0.25 dB steps. The chip achieves 110 dB THD+N, 1.8 µVrms total integrated noise and an interchannel isolation of greater than 120 dB.

[Associated Poster Presentation in Session P31, Tuesday, May 23, at 14:00]

Presentation is scheduled to begin at 12:00
Convention Paper 6847 (Purchase now)

P28-2 Evaluation of Ambience Microphone Arrangements Utilizing Frequency Dependent Spatial Cross Correlation (FSCC)Teruo Muraoka, University of Tokyo - Tokyo, Japan
Ambience in stereophonic recording is mostly determined by the main microphone system. The authors examined this effect using the Frequency-dependent Spatial Cross Correlation (FSCC), which is defined as the cross correlation of the two outputs of the main microphone system. If the recorded sound field is diffuse, the FSCC should ideally be zero. In practice, however, FSCC varies between -1 and 1 depending on frequency; this is caused by the microphones' directivity and placement. The authors theoretically analyzed the FSCCs of typical main microphone systems such as the AB, ORTF, WF, and MS systems and discovered that the FSCC of the MS system becomes uniformly zero when its directional azimuth angle is set to 132 degrees. This was proven experimentally, and excellent ambient recording was achieved.

[Associated Poster Presentation in Session P31, Tuesday, May 23, at 14:00]

Presentation is scheduled to begin at 12:20
Convention Paper 6848 (Purchase now)
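
As a hedged illustration of the quantity discussed above, the sketch below computes one plausible FSCC estimator: the real part of the averaged cross-spectrum of the two microphone channels, normalized by their auto-spectra, which is bounded by -1 and +1 at each frequency. The FFT length and windowing are assumptions; this is not necessarily the authors' exact formulation.

```python
import numpy as np

def fscc(x, y, fs, nfft=4096):
    """Frequency-dependent spatial cross correlation of two microphone channels."""
    hop = nfft // 2
    win = np.hanning(nfft)
    Sxx = Syy = 0.0
    Sxy = 0.0 + 0.0j
    for start in range(0, len(x) - nfft, hop):
        X = np.fft.rfft(win * x[start:start + nfft])
        Y = np.fft.rfft(win * y[start:start + nfft])
        Sxx = Sxx + np.abs(X) ** 2          # averaged auto-spectrum, channel 1
        Syy = Syy + np.abs(Y) ** 2          # averaged auto-spectrum, channel 2
        Sxy = Sxy + X * np.conj(Y)          # averaged cross-spectrum
    freqs = np.fft.rfftfreq(nfft, 1 / fs)
    return freqs, np.real(Sxy) / np.sqrt(Sxx * Syy + 1e-20)
```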

P28-3 Parameter Estimation of Dynamic Range Compressors: Models, Procedures, and Test SignalsUwe Simmer, University of Applied Sciences Oldenburg/Ostfriesland/Wilhelmshaven - Oldenburg, Germany; Denny Schmidt, University of Bremen - Bremen, Germany; Joerg Bitzer, University of Applied Sciences Oldenburg/Ostfriesland/Wilhelmshaven - Oldenburg, Germany
An analysis of digital dynamic range compression algorithms is presented. They are studied by employing a single-band feed-forward compressor model allowing the use of independent attack and release times for both RMS detection and gain smoothing. Artificial test signals for measuring the static and dynamic compressor characteristics are discussed. The parameters of the compressor model are estimated by fitting the model output to the output of the compressor under test by using the simplex method. The results are verified by comparing the output levels of the actual and the fitted compressor for real world audio samples.

[Associated Poster Presentation in Session P31, Tuesday, May 23, at 14:00]

Presentation is scheduled to begin at 12:40
Convention Paper 6849 (Purchase now)
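
A minimal sketch of a single-band feed-forward compressor of the kind the model above parameterizes: an RMS detector and a gain smoother, each with separate attack and release time constants, around a static threshold/ratio characteristic. The parameter values are placeholders of the sort that would be estimated, not results from the paper.

```python
import numpy as np

def compress(x, fs, threshold_db=-20.0, ratio=4.0,
             att_rms=0.005, rel_rms=0.050, att_gain=0.005, rel_gain=0.100):
    """Feed-forward dynamic range compressor with independent attack/release
    times for RMS detection and for gain smoothing (times in seconds)."""
    coef = lambda tau: np.exp(-1.0 / (fs * tau))   # one-pole smoothing coefficient
    rms2, gain = 1e-12, 1.0
    y = np.empty_like(x)
    for n, sample in enumerate(x):
        # RMS detection with separate attack and release
        c = coef(att_rms) if sample * sample > rms2 else coef(rel_rms)
        rms2 = c * rms2 + (1 - c) * sample * sample
        level_db = 10 * np.log10(rms2 + 1e-12)
        # static characteristic: gain reduction above threshold
        over = max(0.0, level_db - threshold_db)
        target_gain = 10 ** (-over * (1 - 1 / ratio) / 20)
        # gain smoothing with separate attack and release
        c = coef(att_gain) if target_gain < gain else coef(rel_gain)
        gain = c * gain + (1 - c) * target_gain
        y[n] = gain * sample
    return y
```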

P28-4 Redefining the Directivity Index for Adaptive Microphone ArraysDaniel Schobben, Philips Research Laboratories - Eindhoven, The Netherlands
The Directivity Index (DI) allows for quantifying the directivity of a microphone or a nonadaptive array of microphones. The DI indicates how well a target sound source can be extracted in the presence of diffuse background noise. Adaptive microphone arrays have the potential to provide improved performance for suppressing distracters that are spatially separated from the target sound source. In the absence of performance measures for adaptive microphone arrays, the DI has been used to evaluate them in both anechoic and diffuse noise conditions. For adaptive microphone arrays, however, the DI may not reflect real-life performance: theoretical directivity values can be driven to infinity for spatially separated target and distracter sound sources in anechoic conditions, while such conditions are hardly encountered in daily life. On the other hand, adaptive microphone arrays inherently provide no improved performance in the diffuse sound fields for which the DI was originally defined. Starting from the definition of the DI, a new measure for adaptive arrays is introduced in this paper as an objective measure for quantifying directivity-based improvements in signal-to-noise ratio.

Presentation is scheduled to begin at 13:00
Convention Paper 6850 (Purchase now)
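
For reference, the sketch below evaluates the conventional Directivity Index that the paper takes as its starting point: the on-axis power response relative to the power averaged over all directions. The numerical grid and the cardioid example pattern are assumptions used only for illustration.

```python
import numpy as np

def directivity_index(pattern, n_theta=181, n_phi=360):
    """DI in dB for pattern(theta, phi) -> linear magnitude, theta measured from the look axis."""
    theta = np.linspace(0.0, np.pi, n_theta)
    phi = np.linspace(0.0, 2 * np.pi, n_phi, endpoint=False)
    TH, PH = np.meshgrid(theta, phi, indexing="ij")
    power = np.abs(pattern(TH, PH)) ** 2
    # power averaged over the sphere, using the sin(theta) solid-angle weight
    d_theta = np.pi / (n_theta - 1)
    d_phi = 2 * np.pi / n_phi
    mean_power = np.sum(power * np.sin(TH)) * d_theta * d_phi / (4 * np.pi)
    return 10 * np.log10(np.abs(pattern(0.0, 0.0)) ** 2 / mean_power)

cardioid = lambda th, ph: 0.5 * (1.0 + np.cos(th))
print(directivity_index(cardioid))   # close to the textbook 4.8 dB for an ideal cardioid
```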


P29 - Posters: Psychoacoustics, Perception, and Listening Tests

Tuesday, May 23, 14:00 — 15:30

P29-1 Designing a Spatial Audio Attribute Listener Training System for Optimal TransferRafael Kassier, Tim Brookes, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
Interest in spatial audio has increased due to the availability of multichannel reproduction systems for the home and car. Various timbral ear training systems have been presented, but relatively little work has been carried out into training in spatial audio attributes of reproduced sound. To demonstrate that such a training system is truly useful, it is necessary to show that learned skills are transferable to different settings. Issues relating to the transfer of training are examined; a recent study conducted by the authors is discussed in relation to the level of transfer shown by participants, and a new study is proposed that is aimed to optimize the transfer of training to different environments.

[Poster Presentation Associated with Paper Presentation P24-4]
Convention Paper 6819 (Purchase now)

P29-2 Evaluation of Loudness in a Room Acoustic ModelYi Shen, Konstantinos Angelakis, Technical University of Denmark - Kgs. Lyngby, Denmark
The equal loudness contours were measured using an analytical model, which was constructed for rectangular rooms based on Kuttruff's room acoustic model. A stimulus can be presented through the room acoustic model to the subjects via headphones. The results are in very close agreement with measurements in a real room. Two additional experiments were conducted to study loudness as a function of reverberation time for different types of stimuli. The results showed that the effect of reverberation on loudness is negligible for a stationary stimulus. On the other hand, for an impulse train, loudness depends on both the reverberation time of the test room and the repetition frequency of the stimulus.

[Poster Presentation Associated with Paper Presentation P24-5]
Convention Paper 6820 (Purchase now)

P29-3 Investigations in Real-time Loudness MeteringGilbert Soulodre, Michel Lavoie, Communications Research Centre - Ottawa, Ontario, Canada
There has been much research in the past few years on loudness perception and metering. Recently, the authors developed an objective loudness algorithm that accurately measures the perceived loudness of mono, stereo, and multichannel audio sequences. The algorithm provides a single loudness reading for the overall audio sequence. In broadcast, film, and music applications it is desirable to have a real-time loudness meter that can track the loudness of the audio signal over time. The new meter would be used in conjunction with existing metering methods to provide additional information about the audio signal. In the present paper the requirements for such a meter are examined and new subjective testing methods are devised to help in the development and evaluation of a new meter.

[Poster Presentation Associated with Paper Presentation P24-7]
Convention Paper 6822 (Purchase now)

P29-4 Measuring the Threshold of Audibility of Temporal DecaysAndrew Goldberg, Genelec Oy - Iisalmi, Finland; Helsinki University of Technology, Espoo, Finland
A listening test system designed to measure the threshold of audibility of the decay time of low-frequency resonances is described. The system employs the Parameter Estimation by Sequential Testing (PEST) technique, and the listening test is conducted on calibrated headphones to remove factors associated with the listening environment. Program signal, replay level, and resonance frequency are believed to influence the decay time threshold. A trial run of the listening test shows that the system yields realistic results, but the temporal resonance modeling filter requires some adjustment to remove audible nonmodal cues. Transducer limitations still affect the test at low frequencies and high replay levels. Factors for future large-scale listening tests are refined. Early indications are that temporal decay thresholds rise with reduced frequency and SPL.

[Poster Presentation Associated with Paper Presentation P24-8]
Convention Paper 6823 (Purchase now)

P29-5 The Influence of Impulse Response Length and Transition Bandwidth of Magnitude Complementary Crossover on Perceived Sound QualityIva Djukic, Institute Mihailo Pupin - Belgrade, Serbia and Montenegro; Dejan Todorovic, RTS-Radio Beograd - Belgrade, Serbia and Montenegro; Ljiljana D. Milic, Institute Mihailo Pupin - Belgrade, Serbia and Montenegro
In this paper a special type of magnitude complementary IIR filter pair with variable transition bandwidth and impulse response length was used in order to examine the effects of these two characteristics on the subjective perception of the reproduced sound. Two types of listening tests were performed. In the first type of test, the sum of the crossover outputs was compared to the original signal. In the second type of test, the IIR filter pairs were compared among themselves, as well as with linear-phase magnitude complementary FIR filter pairs as a reference. The results of the tests show that the overall differences are not significant. It was found that the considered filters are suitable for loudspeaker crossover applications.

[Poster Presentation Associated with Paper Presentation P24-9]
Convention Paper 6824 (Purchase now)

P29-6 A System for Rapid Measurement and Direct Customization of Head-Related Impulse ResponsesSimone Fontana, Ecole Nationale Supérieure des Télécommunications - Paris, France; Angelo Farina, Università di Parma - Parma, Italy; Yves Grenier, Ecole Nationale Supérieure des Télécommunications - Paris, France
Head-Related Impulse Response (HRIR) measurement systems are quite complex and require long acquisition times for an accurate sampling of the full 3-D space. HRIR customization has therefore become an important research topic. In HRIR customization, some parameters (generally anthropometric measurements) are obtained from new listeners, and ad-hoc HRIRs can be retrieved from them. Another way to obtain new listeners' parameters is to measure a subset of the full 3-D space of HRIRs and extrapolate them in order to obtain a full 3-D database. This partial acquisition system should, of course, be rapid and accurate. In this paper we present a system that allows rapid acquisition and equalization of HRIRs for a subset of the 3-D grid. A technique to carry out HRIR customization based on the measured HRIRs is then described.
Convention Paper 6851 (Purchase now)

P29-7 Observations on Bimodal Audio Visual Subjective Assessments of a Virtual 3-D SceneUlrich Reiter, Sebastian Großmann, Dominik Strohmeier, Markus Exner, Technische Universität Ilmenau - Ilmenau, Germany
This paper deals with observations made during audio visual subjective assessments of perceived overall quality of a virtual 3-D scene. Over 30 test subjects were individually presented with a virtual living room. For each of them a predefined sequence of self-movement was displayed on a 2.7-m wide projection screen. The visual impression was complemented with different versions of room acoustic real time simulations rendered audible via a circular 8-channel loudspeaker setup. These were contrasted in a pair-comparison test. Interestingly, the amount of reverberation judged by test subjects to be “most realistic” was highly dependent on the acoustic stimulus itself. We will present a number of interesting observations related to expectations of test subjects and gender, as well as an interpretation from which we can derive a number of suggestions for subsequent bimodal assessments.
Convention Paper 6852 (Purchase now)

P29-8 Measurement of Reverberation Discrimination Threshold for Chinese Subjects with Chinese Music MotifsZihou Meng, Fengjie Zhao, Communication University of China - Beijing, China
The just-noticeable difference of reverberation time for Chinese subjects was studied using Chinese music motifs as the test material. Three subject groups were tested: an audio technician group, students from the audio engineering department, and a group of students without professional training in audio engineering and listening. The test was carried out over headphones in mono to capture the intrinsic reverberation perception. The psychometric method used was the constant-stimulus method. The measured just-noticeable difference of reverberation is higher than that reported in studies with western subjects and western music. The difference caused by different music motifs is insignificant, but the difference possibly caused by the professional training and experience of the different subject groups is noticeable.
Convention Paper 6853 (Purchase now)

P29-9 An Auditory Process Model for the Evaluation of Virtual Acoustic Imaging SystemsMunhum Park, Philip A. Nelson, University of Southampton - Southampton, UK; Youngtae Kim, Samsung Advanced Institute of Technology (SAIT) - Yongin-Si Gyeonggi-do, Korea
This paper describes the initial application of an auditory process model to the evaluation of various virtual acoustic imaging systems. The model has been designed to simulate human binaural hearing by means of an equalization-cancellation process for the binaural process and a template matching with frequency weighting for the central process, while linear and nonlinear filters have been employed for the peripheral process. The model's predictions have been shown to be consistent with the performance of human spatial hearing for the localization of white Gaussian noise and the lateralization of low-frequency pure tones. In this paper virtual acoustic images presented by conventional stereophony, the stereo dipole, and the optimal source distribution have been tested at the optimal listening positions, following a discussion of the template-matching process of the model. The simulation results suggest that the current model, with certain limitations, can be a good predictor of the fidelity of such systems in providing a virtual sound image.
Convention Paper 6854 (Purchase now)

P29-10 Evaluation of Packet Loss Distortion in Audio SignalsJan Erik Voldhaug, Erik Hellerud, U. Peter Svensson, Norwegian University of Science and Technology - Trondheim, Norway
Audio streamed over best effort packet switched networks under real-time requirements may be distorted by lost or delayed packets. In this work triple stimulus hidden reference subjective tests are used to evaluate the perceptual quality of audio signals exposed to packet loss. The effects of packet loss combined with both very simple and very complex error concealment schemes are evaluated, together with four different packet loss rates and five audio clips. Results show statistically significant differences between different packet loss rates, error concealment schemes, and audio clips. Results are also compared with output from an objective audio quality evaluation tool (PEAQ).
Convention Paper 6855 (Purchase now)

P29-11 Dithering Strategy Applied to Tinnitus MaskingAndrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland; Bozena Kostek, Gdansk University of Technology - Gdansk, Poland, International Center of Hearing and Speech, Warsaw, Poland; Krzysztof Kochanek, Henryk Skarzynski, International Center of Hearing and Speech - Warsaw, Poland
The hypothesis that a parasitic quantization accompanies hearing loss is formulated in this paper and then related to other existing theories on the causes of tinnitus. Some preliminary experiments were carried out to verify the correctness of the proposed interpretation of the applied maskers employing dither theory. An effective method of providing a masking signal that uses bone conduction was derived for the purpose of these experiments. The results of the experiments initially confirm the analogy between threshold phenomena occurring in digital audio circuits and the origin of ear noises. The presented results may prompt the elaboration of more effective ear therapies based on high-frequency dither having specially formed spectral characteristics.
Convention Paper 6856 (Purchase now)


P30 - Posters: Instrumentation and Measurement

Tuesday, May 23, 14:00 — 15:30

P30-1 A New Integrated System for Laboratory Speech/Voice ExaminationCostas Pastiadis, Aristotle University of Thessaloniki - Thessaloniki, Greece; Georgia Psyllidou, Paris Telecom - Paris, France; George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
The paper presents a new computer-based system for the examination and analysis of speech/voice functionality in laboratory environments. Although the system is mainly designed for clinical applications, it employs features that afford its generalized use as a speech/voice acquisition, analysis, and evaluation tool. The system offers an integrated and interactive modular structure for the conduction of various speech/voice examination procedures, and provides necessary data management capabilities for further exploitation in diagnostic expert systems and knowledge-based speech/voice applications.

[Poster Presentation Associated with Paper Presentation P25-2]
Convention Paper 6828 (Purchase now)

P30-2 Directivity Measurements on a Highly Directive Hearing Aid: The Hearing GlassesMarinus Boone, Technical University of Delft - Delft, The Netherlands
A highly directional hearing aid has been developed with the aim of providing much higher speech intelligibility than conventional hearing aids. The high directivity is obtained by mounting four microphones in each temple of a pair of glasses and performing optimized beamforming. This leads to an averaged directivity index of 9 dB under free field conditions, without head disturbance. In a recent research program the directivity of this device has been measured with different directivity settings under free field and diffuse field conditions, with and without head diffraction. Results of this research are presented, and a comparison is also made with the directivity of a conventional hearing aid. In addition, the influence of the superdirective beamforming setting on the noise sensitivity is shown, indicating that for practical use the directivity should be limited.

[Poster Presentation Associated with Paper Presentation P25-3]
Convention Paper 6829 (Purchase now)

P30-3 The Self-Compensated Audio Transformers for Tube and Solid State Single-Ended AmplifiersAristide Polisois, A2B Electronic - La Houssaye en Brie, France; Giovanni Mariani, GRAAF srl - Modena, Italy
The self-compensated output transformer presented at the AES Convention held in Barcelona in May 2005 (Convention Paper 6346), intended for single-ended audio amplifiers, is based on the principle that an auxiliary winding (named the tertiary), carrying the same current as the primary winding, can set up an opposing magnetic flux that reduces the overall flux produced in the core by the direct current to almost zero. At the same time, however, this antagonist winding also opposes the induced alternating current. A capacitor is therefore connected to its terminals, short-circuiting the alternating current. Under these circumstances, the alternating potential difference is close to zero and the primary is no longer affected. But this short-circuit has a drawback: it considerably reduces the inductance of the primary. Novel solutions have been found to remove this obstacle to a satisfactory performance of the self-compensated output transformer.

[Poster Presentation Associated with Paper Presentation P25-5]
Convention Paper 6831 (Purchase now)

P30-4 Room Impulse Response Measurement Using a Moving MicrophoneThibaut Ajdler, Luciano Sbaiz, Ecole Polytechnique Fédérale de Lausanne (EPFL) - Lausanne, Switzerland; Martin Vetterli, Ecole Polytechnique Fédérale de Lausanne (EPFL) - Lausanne, Switzerland, and University of California at Berkeley, Berkeley, CA, USA
In this paper we present a technique to record a large set of room impulse responses using a microphone moving along a trajectory. The technique processes the signal recorded by the microphone to reconstruct the signals that would have been recorded at all the possible spatial positions along the array. The speed of movement of the microphone is shown to be the key factor for the reconstruction. This fast method of recording spatial impulse responses can also be applied for the recording of head-related transfer functions.

[Poster Presentation Associated with Paper Presentation P25-8]
Convention Paper 6834 (Purchase now)


P31 - Posters: Audio Recording and Reproduction

Tuesday, May 23, 14:00 — 15:30

P31-1 Multichannel High Performance Analog Volume Control with a New Serial I2C/SPI Compatible Control PortVivek Saraf, Chad Hardy, John Tucker, Johann Gaboriau, Cirrus Logic, Inc. - Austin, TX, USA
A new scheme for digitally operating an 8-channel high-performance analog volume control is proposed. Along with being I2C/SPI compatible, this scheme is faster and less complex than existing solutions due to its inherent advanced support for group/individual addressing. The volume control varies monotonically all the way from -96 dB to +22 dB in 0.25 dB steps. The chip achieves 110 dB THD+N, 1.8 µVrms total integrated noise and an interchannel isolation of greater than 120 dB.

[Poster Presentation Associated with Paper Presentation P30-1]
Convention Paper 6847 (Purchase now)

P31-2 Evaluation of Ambience Microphone Arrangements Utilizing Frequency Dependent Spatial Cross Correlation (FSCC)Teruo Muraoka, University of Tokyo - Tokyo, Japan
Ambience in stereophonic recording is mostly determined by the main microphone system. The authors examined this effect using the Frequency-dependent Spatial Cross Correlation (FSCC), which is defined as the cross correlation of the two outputs of the main microphone system. If the recorded sound field is diffuse, the FSCC should ideally be zero. In practice, however, FSCC varies between -1 and 1 depending on frequency; this is caused by the microphones' directivity and placement. The authors theoretically analyzed the FSCCs of typical main microphone systems such as the AB, ORTF, WF, and MS systems and discovered that the FSCC of the MS system becomes uniformly zero when its directional azimuth angle is set to 132 degrees. This was proven experimentally, and excellent ambient recording was achieved.

[Poster Presentation Associated with Paper Presentation P30-2]
Convention Paper 6848 (Purchase now)

P31-3 Parameter Estimation of Dynamic Range Compressors: Models, Procedures, and Test SignalsUwe Simmer, University of Applied Sciences Oldenburg/Ostfriesland/Wilhelmshaven - Oldenburg, Germany; Denny Schmidt, University of Bremen - Bremen, Germany; Joerg Bitzer, University of Applied Sciences Oldenburg/Ostfriesland/Wilhelmshaven - Oldenburg, Germany
An analysis of digital dynamic range compression algorithms is presented. They are studied by employing a single-band feed-forward compressor model allowing the use of independent attack and release times for both RMS detection and gain smoothing. Artificial test signals for measuring the static and dynamic compressor characteristics are discussed. The parameters of the compressor model are estimated by fitting the model output to the output of the compressor under test by using the simplex method. The results are verified by comparing the output levels of the actual and the fitted compressor for real world audio samples.

[Poster Presentation Associated with Paper Presentation P30-3]
Convention Paper 6849 (Purchase now)

P31-4 3-D Sound Field Recording with Higher Order Ambisonics—Objective Measurements and Validation of a 4th Order Spherical MicrophoneSébastien Moreau, Consultant - Colombes, France; Jérôme Daniel, Stéphanie Bertet, France Telecom R&D - Lannion, France
Higher Order Ambisonics (HOA) is a flexible approach to representing and rendering 3-D sound fields. Nevertheless, the lack of effective microphone systems limited its use until recently. As a result of the authors' previous work on the theory and design of spherical microphone arrays, a 4th-order HOA microphone has been built, measured, and used for natural recording. The present paper first discusses theoretical aspects and physical limitations specific to discrete, relatively small arrays (spatial aliasing, low-frequency estimation). It then focuses on the objective validation of such microphones. HOA directivities reconstructed from simulated and measured 3-D responses are compared to the expected spherical harmonics. Criteria like spatial correlation help characterize the encoding artifacts due to model limitations and prototype imperfections. Impacts on localization criteria are evaluated.
Convention Paper 6857 (Purchase now)

P31-5 Audio Cable Distortion Is Not a Myth!Richard Black, Richard Black Associates - London, UK
Specialist audio cables are often sold to the consumer on the basis of eyebrow-raising claims for technical performance, though to date no repeatable test has shown any effect more surprising than mild frequency-selective attenuation. However, because the loudspeaker load is typically nonlinear and causes harmonic currents to flow, finite impedance in an audio cable does indeed cause harmonic voltages to appear across the loudspeaker. This distortion term is similar to, or even greater than, that produced by the amplifier’s intrinsic nonlinearity.
Convention Paper 6858 (Purchase now)
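
A back-of-envelope sketch of the mechanism described above: the harmonic current drawn by a nonlinear loudspeaker load, flowing through the finite cable resistance, produces a harmonic voltage at the loudspeaker terminals. The numbers are arbitrary assumptions chosen only to show the order of magnitude, not measurements from the paper.

```python
import math

R_cable = 0.1          # total cable resistance (both conductors), ohms (assumed)
V_fundamental = 10.0   # drive voltage at the fundamental, volts (assumed)
I_harmonic = 0.02      # harmonic current drawn by the nonlinear load, amps (assumed)

# harmonic voltage dropped across the cable appears across the loudspeaker terminals
V_harmonic = I_harmonic * R_cable
print(f"cable-induced harmonic component: "
      f"{20 * math.log10(V_harmonic / V_fundamental):.1f} dB re fundamental")
```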


   
  (C) 2006, Audio Engineering Society, Inc.