Audio Engineering Society Preprints

AES 116th Convention

Berlin, Germany
May 8-11, 2004

AES Preprint Ordering

Single Convention Preprints are available through the AES Preprint Search and Shop facility.

Preprints Listing

5994
Foss, Richard; Laubscher, Rob; Fujimori, Jun-ichi
An application, called the mLAN Installation Designer, has been developed that enables the user to graphically design and validate an mLAN sound installation. This application is built upon a model of mLAN systems that is defined by an Extensible Markup Language (XML) Schema, ensuring cross-platform portability and future scalability. The XML Schema provides sufficient flexibility to form the basis for a standardization effort to describe the configuration of IEEE 1394 based sound installation environments. The output of the mLAN Installation Designer application is an XML document, consistent with the defined schema, which allows a configuration tool to configure the mLAN devices for automatic operation during deployment of the system.
An XML-Based Approach to the Generation and Testing of mLAN Sound Installation Configurations

5995
Frandsen, Christian G.; Lave, Morten
It is a challenge to predict the fault tolerance of the total system when point-to-point digital audio interfaces are used to build complex routing structures. In real life, digital interfacing is therefore still considered less robust than analog. This paper provides a systematic investigation of factors determining reliability in a number of widely used professional audio and synchronisation interfaces such as AES3, SPDIF, ADAT, TDIF and Word Clock. Electrical characteristics, phase offset and tolerance to offset, intrinsic jitter and tolerance to jitter, and sample rate precision have been tested. Additionally, compliance with standards has been evaluated. Finally, a discussion of how these problems can be dealt with, followed by specific thoughts about the next generation of interfaces, will be presented with examples.
Plug and Play? An Investigation into Problems and Solutions of Digital Audio Networks

5996
Floros, Andreas; Karoubalis, Theodore
Based on the current version of the forthcoming IEEE 802.11e standard, the paper examines the wireless, real-time transmission of high-quality audio streams. The required procedures that provide the necessary Quality of Service (QoS) support are presented and optimized for digital audio applications, and their effect on the achieved playback quality is estimated through a sequence of tests, in terms of the achieved wireless bit rate and the end-to-end packet delay. Both two-channel and multi-channel audio playback setups are considered, in order to accurately simulate typical stereo and home theater wireless applications.
Delivering High-quality Audio over Wireless LANs

5997
Ryu, Sang-Uk; Rose, Kenneth
This paper investigates error concealment based on sinusoidal analysis and synthesis. Major shortcomings are identified with focus on the extraction of sinusoidal frequency evolution and sinusoid matching, and a new approach to frame loss concealment is proposed. It involves parallel Fourier transformation with long and short windows to accurately extract model parameters, and is complemented with two sinusoid matching techniques: sinusoidal pair alignment by dynamic programming and harmonics-based matching. Moreover, due to the incompatibility of sinusoidal representation with broadband, noise-like signals, an alternative, "sinusoids plus residual" model is incorporated. The new algorithm was applied to CD quality audio of various genres and was demonstrated to improve the perceptual quality with considerable gains for non-transient frames.
Advances in Sinusoidal Analysis/Synthesis-based Error Concealment in Audio Networking

5998
Rumsey, Francis; Neher, Tobias; Brookes, Tim
In the context of devising a spatial ear-training system, a study into the perceptual construct "ensemble depth" was executed. Based on the findings of a pilot study into the auditory effects of early reflection (ER) pattern characteristics, exemplary stimuli were created. Changes were highly controlled to allow unidimensional variation of the intended quality. To measure the psychological structure of the stimuli and hence to evaluate the success of the simulation, Multidimensional Scaling (MDS) techniques were employed. Supplementary qualitative data were collected to assist with the analyses of the perceptual (MDS) spaces. Results show (1) that syllabicity of source material (rather than ER design) is crucial to depth hearing and (2) that unidimensionality was achieved, thus suggesting the stimuli to be suitable for training purposes.
Unidimensional Simulation of the Spatial Attribute "Ensemble Depth" for Training Purposes - Part 2: Creation and Validation of Reference Stimuli

5999
Boone, Marinus M.; Helleman, Hiske W.
When recording impulse responses of a concert hall for later processing in a spatial audio reproduction system such as Wave Field Synthesis (WFS), the question arises to what extent these impulse responses can be used for different source positions without a loss in spatial perception. A preliminary study has been carried out to find the threshold of audibility of spatial variations in the position of a single reflection. It was found that the minimum audible spatial variation of a single reflection is 1-2 m, or 5-10 degrees, depending on the spatial configuration, whichever is larger. From that result preliminary conclusions can be drawn about the necessary resolution in recording and synthesis of reflection patterns for WFS rendering or other spatial reproduction systems.
Audibility Thresholds of Spatial Variations in a Single Acoustic Reflection

6000
Wittek, Helmut; Kerber, Stefan; Rumsey, Francis; Theile, Gunther
This paper describes listening tests conducted to evaluate WFS in a movie theatre with about 100 seats. Parameters under test are the number of loudspeakers, the distance between loudspeakers, the position of the simulated source and the position of listeners relative to the loudspeakers. In addition, audio-visual coherence was investigated.
Spatial Perception in Wave Field Synthesis Rendered Sound Fields: Distance of Real and Virtual Nearby Sources

6001
Bellini, Alberto; De Benedetti, Antonio; Franceschini, Giovanni; Burlenghi, Michele; Violi, Francesco
State-of-the-art audio amplifiers can be classified into two major classes: Linear Amplifiers and Switching Amplifiers. The former class features low distortion but poor efficiency, while the latter features high efficiency coupled with high distortion and low bandwidth. In this paper a hybrid architecture is presented that combines linear and switching topologies in order to obtain an audio amplifier featuring high efficiency, low distortion and high bandwidth. The intrinsic structure of the switching stage allows an automatic spreading of the switching frequency, reducing EMI issues. A prototype amplifier was realized, tailored for automotive applications. The proposed architecture is patent pending.
TANDEM Digital Audio Amplifier

6002
Shively, Roger; King, Josh
This is an update to a previous study, which used mechanical dynamic behavior data, impedance and distortion measurements of several automotive doors to compare low frequency performance and low frequency sound quality. The updated information further investigates a methodology for quantifying door enclosures and refines the criteria for qualifying automotive doors as loudspeaker enclosures.
Update to Automotive Doors as Loudspeaker Enclosures

6003
Squartini, Stefano; Piazza, Francesco; Toppi, Romolo; Lattanzi, Ariano; Ciavattini, Emanuele; Bettarelli, Ferruccio; Navarri, Massimo; Lori, Walter
An original software-based system, featuring two different tools, is proposed here for vehicle audio quality assessment. The first one performs the acquisition of relevant data for system modelling and for cancelling the undesired effects of the acquisition chain. The second offers a user-friendly interface for real-time simulation of different car audio systems and for both objective and subjective evaluation, where the listening procedure is directly experienced at a PC workstation. The validity of this approach has been examined through a subjective listening test set (more than 50 participants and three cars involved), developed by means of a dedicated software environment and based on appropriate ITU recommendations. Experimental results have shown that the quality rating delivered by the conventional in-car procedure is confirmed when the software-based approach is used.
Evaluating Different Vehicle Audio Environments Through a Novel Software-based System

6004
Bozzoli, Fabio; Farina, Angelo
One of the most widely used intelligibility parameters is the Speech Transmission Index (STI); the techniques for determining it employ an artificial speaker and listener. When the signal-to-noise ratio is particularly low, for example inside cars, the value of STI is mainly influenced by this ratio, and measuring the emission level of real speakers is the only way to drive the artificial mouth correctly. We have implemented a technique, based on a throat-activated microphone, that is able to find the effective level of a real speaker's voice inside a noisy space in realistic conditions. We have studied in particular the speech level inside cars and found that the value defined by IEC/ITU standards may differ greatly from the real one. In this way, we were able to produce test signals at a more appropriate emission level.
Measurement of Active Speech Level Inside Cars using Throat-activated Microphone

6005
Panzer, Joerg; Ferekidis, Lampos
The direct application of interpolation, smoothing or mean-value algorithms to complex valued frequency response data may cause interference patterns and, due to this, not yield the expected result. This paper demonstrates the effect of the use of continuous phase in a variety of example-applications, such as interpolation between two frequency response curves, complex smoothing with down-sampling using a logarithmic grid and forming mean values of a set of complex frequency response curves. The continuous phase-approach takes into account the multi-valued property of the exponential function of the phase term.
The Use of Continuous Phase for Interpolation, Smoothing and Forming Mean Values of Complex Frequency Response Curves
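
The interference effect mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes two idealised pure-delay responses and compares a direct complex mean with a mean formed from magnitude and unwrapped (continuous) phase. The function names, the delays `tau1` and `tau2`, and the frequency grid are all hypothetical choices made for illustration.

```python
import cmath
import math


def unwrap(phases):
    """Remove 2*pi jumps so the phase is continuous along the frequency axis."""
    out = [phases[0]]
    for p in phases[1:]:
        step = p - out[-1]
        # Fold the step into the interval around zero before accumulating.
        step -= 2.0 * math.pi * round(step / (2.0 * math.pi))
        out.append(out[-1] + step)
    return out


def mean_naive(h1, h2):
    """Direct complex average; can cancel where the two phases oppose."""
    return [(a + b) / 2.0 for a, b in zip(h1, h2)]


def mean_continuous_phase(h1, h2):
    """Average magnitude and unwrapped phase separately, then recombine."""
    p1 = unwrap([cmath.phase(z) for z in h1])
    p2 = unwrap([cmath.phase(z) for z in h2])
    return [
        cmath.rect((abs(a) + abs(b)) / 2.0, (x + y) / 2.0)
        for a, b, x, y in zip(h1, h2, p1, p2)
    ]


# Assumed test case: two unit-gain responses that differ only in delay.
freqs = [10.0 * k for k in range(1, 201)]  # 10 Hz to 2 kHz in 10 Hz steps
tau1, tau2 = 1.0e-3, 2.0e-3                # delays in seconds (hypothetical)
h1 = [cmath.exp(-2j * math.pi * f * tau1) for f in freqs]
h2 = [cmath.exp(-2j * math.pi * f * tau2) for f in freqs]

naive = mean_naive(h1, h2)
cont = mean_continuous_phase(h1, h2)
```

In this idealised case the naive mean shows comb-filter nulls wherever the two phases differ by an odd multiple of pi, while the continuous-phase mean keeps unit magnitude and corresponds to the intermediate delay. The sketch relies on the frequency grid being dense enough that the true phase step between neighbouring bins stays below pi.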

6006
Czyzewski, Andrzej; Kotus, Jozef
A concept and an implementation of a multimedia computer system for the monitoring of environmental noise threats are presented. The principal aim of the project is to improve the effectiveness of prophylaxis of hearing diseases. The system makes it possible to receive, store, analyze and visualize noise data coming from noise measurement equipment and from electronic questionnaires accessible through the Internet. A new concept of a USB noise meter with GPS is also presented.
Web Based Acoustic Noise Measurement System

6007
Parlantzas, Evaggelos G.; Sevastiadis, Christos V.; Dimoulas, Charalambos A.; Kalliris, George M.; Papanikolaou, George V.
The current paper presents a software application that conducts electroacoustic measurements using a digital approach to Time Delay Spectrometry. Development was focused on simplified hardware requirements such as a personal desktop or laptop computer. A friendly and flexible user interface has been designed. Linear and logarithmic sweep test signals are generated and reproduced. The response of the system under test (e.g. a room) is recorded and stored on the hard disk. Energy Time Curve (ETC) and frequency domain analysis procedures are guided efficiently. Reverberation time in the case of a room is estimated quickly. All task data may be restored later for further analysis. Finally, the results of measurements made with our application are compared with measurements made using a widely accepted TDS analyzer.
Software Application for Electroacoustic Measurements Using the Time Delay Spectrometry (TDS) Method
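
The two sweep types named in this abstract can be sketched as follows. This is a generic textbook formulation, not the application's own code: the function names, the 20 Hz to 20 kHz range, the one-second duration and the 48 kHz sample rate are assumptions chosen for illustration.

```python
import math


def linear_sweep_phase(t, f0, f1, duration):
    """Phase (rad) of a sweep whose frequency rises linearly from f0 to f1."""
    rate = (f1 - f0) / duration
    return 2.0 * math.pi * (f0 * t + 0.5 * rate * t * t)


def log_sweep_phase(t, f0, f1, duration):
    """Phase (rad) of a sweep whose frequency rises exponentially from f0 to f1."""
    k = math.log(f1 / f0)
    return 2.0 * math.pi * f0 * duration / k * (math.exp(k * t / duration) - 1.0)


def render(phase_fn, f0, f1, duration, sample_rate):
    """Sample sin(phase) at the given rate to obtain the test signal."""
    n = int(duration * sample_rate)
    return [math.sin(phase_fn(i / sample_rate, f0, f1, duration)) for i in range(n)]


def instantaneous_frequency(phase_fn, t, f0, f1, duration, h=1.0e-6):
    """Central-difference estimate of the sweep frequency (Hz) at time t."""
    dphi = phase_fn(t + h, f0, f1, duration) - phase_fn(t - h, f0, f1, duration)
    return dphi / (2.0 * h) / (2.0 * math.pi)


# Assumed parameters: 20 Hz to 20 kHz over one second at 48 kHz.
linear = render(linear_sweep_phase, 20.0, 20000.0, 1.0, 48000)
log = render(log_sweep_phase, 20.0, 20000.0, 1.0, 48000)
```

Halfway through, the linear sweep sits at the arithmetic mean of the two end frequencies and the logarithmic sweep at their geometric mean, which is why the logarithmic sweep spends more time at low frequencies.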

6008
Danyuk, Dimitri
The design for a low-noise amplifier is presented. The amplifier has a triode-like transfer characteristic and produces harmonic distortion components that are similar to those of a triode preamplifier. It can be used as a building block for microphone preamplifiers, active front-end electronics for various pick-ups and aural stimulator units.
Triode Emulator

6009
Hans, Nicolas; de Koster, Johan
An increasing number of broadcasters and organizations are considering the digitization of their media archives. Implementing digital media libraries so as to ensure the proper preservation of legacy archives has been recognized as a priority. Yet, many organizations are faced with a paradox: although strategic, these digitization projects are postponed because of budgetary constraints. This paper discusses several case studies and suggests a new approach to implementing a pragmatic archiving strategy, one that will get approval and support from management.
Taking Care of Tomorrow Before it is Too Late: A Pragmatic Archiving Strategy

6010
Blohmer, Helge; Loeffler, Jobst; Koehler, Joachim; Kaup, Kai Uwe
This paper describes methods for automatic extraction of descriptive metadata for audio material and the workflow of archiving. These new algorithms and archiving tools developed at Fraunhofer IMK are to be directly integrated into MediaFabric, a commercially available radio broadcasting framework. Processing steps are based on pattern recognition algorithms and include speech/non-speech detection, speaker change detection and classification, jingle and advertising recognition. The extracted audio structure is described as a hierarchical representation of segment nodes annotated with suitable metadata. An extended retrieval application allows interactive display and navigation of the audio structure. A novel approach to keyword search based on a syllable representation of audio material is used for effective retrieval within the digital radio archive.
Archiving of Radio Broadcast Data using Automatic Metadata Generation Methods within MediaFabric Framework

6011
Mason, Andrew J.
Audio watermarking has recently seen a resurgence of interest, spurred on by the desire for copyright protection of digital audio recordings. Several audio watermarking techniques, some dating back more than 30 years, are described briefly here. The uses to which watermarking might be put are also summarised. Attention is then focussed on the requirements identified by the EBU applicable to distribution over the Eurovision and Euroradio networks. The EBU issued a call for systems to meet its requirements. Subjective and objective tests were done on the systems supplied for testing. Audibility and robustness of the watermarks were measured. The results are encouraging for those considering using audio watermarking in broadcast applications.
EBU Tests of Commercial Audio Watermarking Systems

6012
Ricard, Julien; Herrera, Perfecto
Sound sample metadata are usually limited to a source label and several related textual labels. In the context of sound retrieval this makes the retrieval of sounds having no identifiable source ("abstract sounds") a hard task. We propose a description framework focusing on intrinsic perceptual sound qualities, based on Schaeffer's research on sound objects, which could be used to represent and retrieve abstract sounds and to refine traditional search by source for non-abstract sounds. We show that some perceptual labels can be automatically extracted with good performance, avoiding the time-consuming manual labelling task, and that the resulting representation is evaluated as useful and usable by a pool of users.
Morphological Sound Description: Computational Model and Usability Evaluation

6013
Batlle, Eloi; Guaus, Enric
Speech-Music discriminators are usually designed under rigid constraints. This paper deals with a more general Speech-Music Discriminator successfully used in the AIDA project. The system is based on a Hidden Markov Model style classification process in which the styles are grouped into two major categories: Speech or Music. The goals of this sub-system are (1) extensibility through the addition of new styles (like "phone female voice"), (2) the use of new rhythmical descriptors in combination with other typical ones and (3) robustness of the speech/music discriminator in many different environments by using Mathematical Morphology and non-linear post-processing techniques. The techniques used in our system allow fast tracking of changes between styles and, thus, typical confusions in commercials can be easily cleaned. The accuracy of this system can be up to 94.3% in a broadcast radio environment.
A Non-linear Rhythm-Based Style Classification for Broadcast Speech-Music Discrimination

6014
Hsu, Han-Wen; Liu, Chi-Min; Lee, Wen-Chieh
Current audio encoders like MP3 or AAC lead to some artifacts due to the bit rate constraint. This paper considers two artifacts. The first artifact is the unusual spectral valley, which is perceptually heard as fishy noise. The second one is spectrum clipping, which leads to muffled audio. This paper proposes the spectrum patch method to handle the two artifacts in the decoders. The technique can be included in MPEG-1 Layer 3 and MPEG-4 AAC (Advanced Audio Coding) decoders to conceal the artifacts without prior information on the original audio tracks. Intensive experiments have been conducted on various encoders and audio tracks to check the quality improvement and the possible risks of degrading the quality. The objective test measure used is the system recommended by ITU-R Task Group 10/4.
Audio Patch Method in Audio Decoders -- MP3 and AAC

6015
Algazi, V. Ralph; Duda, Richard O.; Thompson, Dennis M.
A new method is presented for capturing, recording, and reproducing spatial sound that provides a vivid sense of realism. The method generalizes binaural recording, preserving the information needed for dynamic head-motion cues. These dynamic cues greatly reduce the need for customization to the listener. During either capture or recording, the sound field in the vicinity of the head is sampled with a microphone array. During reproduction, a head tracker is used to determine the microphones that are closest to the positions of the listener's ears. Interpolation procedures are used to produce the headphone signals. The properties of different methods for interpolating the microphone signals are presented and analyzed.
Motion-Tracked Binaural Sound
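
The reproduction step described in this abstract can be sketched under strong simplifying assumptions. The sketch assumes an evenly spaced circular microphone array, ears located at plus and minus 90 degrees from the tracked head yaw, and a plain linear crossfade between the two nearest microphones. The abstract says several interpolation methods are analyzed but does not specify them, so the crossfade and every name here are hypothetical.

```python
import math


def mic_weights(ear_azimuth_deg, num_mics):
    """Return (mic_index, weight) pairs for the two microphones adjacent to the ear.

    Assumes microphone i sits at azimuth i * 360 / num_mics degrees.
    """
    spacing = 360.0 / num_mics
    pos = (ear_azimuth_deg % 360.0) / spacing
    lower = int(math.floor(pos)) % num_mics
    upper = (lower + 1) % num_mics
    frac = pos - math.floor(pos)
    return [(lower, 1.0 - frac), (upper, frac)]


def ear_signal(mic_signals, ear_azimuth_deg):
    """Linearly crossfade the two nearest microphone signals, sample by sample."""
    weights = mic_weights(ear_azimuth_deg, len(mic_signals))
    length = len(mic_signals[0])
    return [sum(w * mic_signals[i][n] for i, w in weights) for n in range(length)]


def headphone_signals(mic_signals, head_yaw_deg):
    """Produce (left, right) signals for the current head orientation.

    Assumes azimuth increases counter-clockwise, so the left ear is at yaw + 90.
    """
    left = ear_signal(mic_signals, head_yaw_deg + 90.0)
    right = ear_signal(mic_signals, head_yaw_deg - 90.0)
    return left, right


# Toy input: eight microphones, each carrying a constant equal to its index.
mic_signals = [[float(i)] * 4 for i in range(8)]
left, right = headphone_signals(mic_signals, head_yaw_deg=10.0)
```

A plain time-domain crossfade like this is only meaningful while the two microphone signals stay well correlated, which in practice limits it to wavelengths long compared with the microphone spacing.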

6016
Silzle, Andreas; Strauss, Holger; Novo, Pedro
The basic requirements for an Auditory Virtual Environment (AVE) are presented and a system based on a physical approach (IKA-SIM), employing the mirror-image model to generate the early reflections, is described. The static and dynamic structure of the IKA-SIM software (written in C++) is shown in diagrams and the computational requirements for real-time performance are delineated. IKA-SIM is able to render rooms of arbitrary shape, to account for frequency dependent absorption factors, and to calculate high order reflections in real-time on a standard PC. The different interfaces for real-time interaction are presented. IKA-SIM supports headphone and loudspeaker reproduction. A new elevation panning algorithm for loudspeaker reproduction is introduced. Design aspects relevant to a real-time AVE system are presented.
IKA-SIM: A System to Generate Auditory Virtual Environments

6017
Daniel, Jerome; Moreau, Sebastien
Higher Order Ambisonics (HOA) provides a rational and flexible way for spatial encoding, conveying and rendering of 3D sound fields. For this reason it has attracted growing interest in recent years. Nevertheless, representing near field sources and recording natural sound fields has been addressed only quite recently. This raises the problem of "infinite bass-boost", which a recent approach (NFC-HOA) solves while being fully equivalent with spherical harmonics representation. To better handle problematic cases where bass-boost remains excessive, the present study discusses the actual usefulness of some spatial components depending on the area targeted for sound field reconstruction. It therefore suggests frequency-dependent restriction of spatial resolution by high-passing spatial components. As a particular result, it shows that a much more moderate amplification is sufficient to efficiently model sound sources at any distance, and derives a safe and fine solution to simulate sources inside the listening area.
Further Study of Sound Field Coding with Higher Order Ambisonics

6018
Warusfel, Olivier; Misdariis, Nicolas
A diffusion device based on a digitally-controlled 3D array of loudspeakers, La Timée, was developed in order to synthesize a given radiation pattern from the combination of a set of elementary directivities. This radiation synthesis method, designed for musical and performance constraints (real-time control, musical vocabulary associated with different directivity patterns, ...), has been used for stage performances and sound installations. In order to translate the sound experience to domestic setups, the paper also addresses the post-production step where the spatial image associated with the radiation synthesis is transcoded to conventional formats such as transaural, ambisonic or 5.1. The method is based on the characterization of the performance room through the different elementary directivities, and then on their superimposition.
Sound Source Radiation Syntheses: From Performance to Domestic Rendering

6019
Steinke, Gerhard
The growing penetration of DVD also stimulates a closer association of sophisticated multichannel sound with larger, high-quality images in the "ideal" 16:9 (1.78:1) TV format. Nevertheless, different geometrical relationships may exist between image size and loudspeaker basis width in production studios, multimedia rooms and home living rooms, in addition to varying room-acoustical and qualitative conditions. For the best possible presentation of program content, the locations of sound and picture sources should be matched as closely as possible, i.e., with corresponding horizontal listening and viewing angles, to avoid disturbing discrepancies between acoustical and optical perspective. Essential relationships are considered, and a recommendation is derived to adjust the optimum viewing distance 2H with regard to an appropriately large loudspeaker basis width and image size for a high-quality home theater experience.
Surround-Sound: Relations of Listening and Viewing Configurations

6020
Ellis-Geiger, Robert J.
This poster represents a new approach to recording acoustic music for film and has the potential to dramatically improve the performance of an orchestra, small ensemble or solo performer for highly emotional scenes. Additionally, this approach to film music production will allow for sudden changes to be made during the scoring session, such as last minute film edits that will result in changes to the final score. This poster will also reveal some of the processes in film music composition and the use of technology.
Film Music Recording Using Technology

6021
Pape, Daniel; Kalkbrenner, Gerrit; Maihorn, Jan
An electronic learning module covering the field "Perceptual Audio Coding" (famous representative: MP3) was specified, designed and implemented by means of the multimedia software Macromedia Director. The presented program is split into different modules. These include (1) an auralization of the filterbank implemented in MP3 and (2) simulations of various classic psychoacoustic experiments (mainly masking) for three different music styles. Other audio examples exhibit (1) a comparison of the sound quality of a Fraunhofer MP3 codec at different bitrates, (2) a comparison of today's most important audio and speech codecs (like WindowsMediaEncoder and Real9) at different bitrates and (3) audio examples and explanations of typical error signals introduced by perceptual audio coding. Finally, a structured explanation of the mode of operation of an MP3 encoder and technical papers with further references to publications on perceptual audio coding were included in the presented software.
Development of a Multimedia Learning Module Covering the Field Perceptual Audio Coding

6022
Feiten, Bernhard; Graffunder, Andreas; Wolf, Ingo
Future services in the Internet have to support heterogeneous networks and end-devices. Audio and video services have to support a flexible adaptation of bitrate. MPEG-21 provides a multimedia framework that supports the "digital item adaptation" in various ways. Adaptation of the quality of a service is supported by the AQoS description scheme and the bitstream syntax description language (BSDL). Utilities exist to describe the relation between the scaling of the bitstream and the related perceived quality. The brightness, the cleanness and the wideness are proposed as dimensions to assess the quality and to derive parameters for controlling the audio transmission. A mapping of these features on the model output values (MOVs) of the ITU assessment method PEAQ is proposed.
Controlling the Quality of Audio Services in the Internet

6023
Muheim, Men
This paper introduces a Ph.D. thesis that has been recently presented at the Swiss Federal Institute of Technology (http://www.ife.ee.ethz.ch/~men/phd.shtml). The thesis envisions a distributed audio system based on commodity computer components. It examines to what extent the "real-time" attributes of mainstream operating systems lead to audio dropouts and therefore to quality loss. It studies extrapolation methods to prevent loss of quality and shows that quality improvement to a non-annoying level is possible. A synchronization mechanism is implemented at the application layer in order to facilitate the use of Ethernet as the only communication network. The thesis thereby shows that a synchronization accuracy of 11 µs between separated loudspeakers is feasible. Furthermore, the thesis proposes a novel software framework, which makes the development of distributed audio services easier.
Design and Implementation of a Commodity Audio System

6024
Simeonov, Aleksandar; Zoia, Giorgio; Lluis-Garcia, Robert; Mlynek, Daniel
The constantly increasing demand for a better quality in sound and video for multimedia content and virtual reality compels the implementation of more and more sophisticated 3D audio models in authoring and playback tools. A very careful and systematic analysis of the best available development libraries in this area was carried out, considering different Application Programming Interfaces, their features, extensibility, and portability among each other. The results show that it is often difficult to find a tradeoff between flexibility, efficiency, quality and speed. In this paper we propose a low level, modular DSP library, which can be used to implement advanced 3D audio models; it is based on reconfigurable primitive methods required by most 3D algorithms and it provides fast development and good flexibility.
Advanced 3D Audio Algorithms by a Flexible, Low Level Application Programming Interface

6025
Su, Alvin W.Y.; Xiao, Yi-Song; Yeh, Jia-Lin; Wu, Jien-Lung
MPEG-4 Structured Audio is an algorithmic coding standard designed for low bit-rate, high-quality audio. With this standard, the desired sound can be identical on both the encoder side and the decoder side by using the Structured Audio Orchestra Language (SAOL) to generate sound samples. It requires a player and a streaming engine when real-time interactive internet presentations are necessary. In this paper, we present such a system implemented and applied on IBM PC-based computers. The proposed streaming engine follows the ISMA specification and its implementation is closely related to Apple's Darwin Server. After the streaming SA player receives the bitstream from the server, it converts the SAOL data stream to Java code and links it to a proposed scheduler program generated from the SASL data stream for direct execution, such that one can hear the sound in real time. Unlike sfront, no intermediate C code or C compiler is necessary. In order to improve the performance, optimized software modules such as the core opcodes and the core wavetable engine have been embedded. Significant speedup is achieved compared to the reference SAOLC decoder. A real-time demonstration of the system will be given during the presentation. A discussion of a possible future algorithmic coding method using Java is also given.
Real-Time Internet MPEG-4 SA Player and the Streaming Engine

6026
Karjalainen, Matti; Lokki, Tapio; Nironen, Heli; Harma, Aki; Savioja, Lauri; Vesa, Sampo
Several applications for wearable and mobile reality audio are presented. All applications exploit a headset where microphones are integrated into small headphone elements. The proposed system allows us to implement applications where virtual sound events are superimposed to the user's auditory environment to produce an augmented audio display. In addition, binaural audio-over-IP connections, wired or wireless, are discussed. Finally, some future application scenarios are sketched.
Application Scenarios of Wearable and Mobile Augmented Reality Audio

6027
Shirley, Ben G.; Kendrick, Paul
ITC Clean Audio Project

6028
Schwark, Mathias; Reiter, Ulrich; Dantele, Andreas
In an audiovisual virtual 3D environment the conformance of visual and auditory impression is important to provide a high level of immersion. Restrictions on processing power for the auralization (including early and late reverberation) are usually severe due to the demanding visual rendering. For the audio part a trade-off between high accuracy and speeding up the rendering process has to be found, especially for real-time user interaction. We show how the rendering process of early reflections can be done in real-time by reducing the scene representation to auditorily relevant elements. A suitable scene simplification algorithm and corresponding audio rendering issues are discussed.
Audiovisual Virtual Environments: Enabling Realtime Rendering of Early Reflections by Scene Graph Simplification

6029
Paiva, Rui Pedro; Mendes, Teresa; Cardoso, Amilcar
We present a bottom-up method for melody detection in polyphonic musical signals. Our approach is based on the assumption that the melodic line is often salient in terms of note intensity (energy). First, trajectories of the most intense harmonic groups are constructed. Next, note candidates are obtained by trajectory segmentation (in terms of frequency and energy variations). Too short, low-energy and octave-related notes are then eliminated. Finally, the melody is extracted by selecting the most important notes at each time, based on their intensity. We tested our method with excerpts from 12 songs encompassing several genres. In the songs where the solo stands out clearly, most of the melody notes were successfully detected. However, for songs where the melody is not that salient, the algorithm performed poorly. Nevertheless, we could say that the results are encouraging.
A Methodology for Detection of Melody in Polyphonic Musical Signals

6030
Wroblewski, Jakub; Wieczorkowska, Alicja
Estimation of fundamental frequency (so-called pitch tracking) can be performed using various methods. However, all these algorithms are susceptible to errors, especially octave errors. In order to avoid these errors, pitch trackers are usually adjusted to particular musical instruments. A problem therefore arises when one wants to extract the fundamental frequency independently of timbre. Our goal was to develop a method of fundamental frequency extraction that works correctly for any timbre. We propose a multi-algorithm approach, where fundamental frequency estimation is based on results coming both from a range of frequency tracking methods and from additional parameters of the sound. We also propose frequency tracking based on direct analysis of the signal and its spectrum. We explain the structure of our estimator and the obtained results for various musical instruments and sound articulation techniques.
Octave-Error Proof Timbre-Independent Estimation of Fundamental Frequency of Musical Sounds

6031
Dittmar, Christian; Uhle, Christian
This publication presents a new method for the detection and classification of un-pitched percussive instruments in real world musical signals. The derived information is an important pre-requisite for the creation of a musical score, i.e. automatic transcription, and for the automatic extraction of semantically meaningful meta-data, e.g. tempo and musical meter. The proposed method applies Independent Subspace Analysis using Non-Negative Independent Component Analysis and principles of Prior Subspace Analysis. An important extension of Prior Subspace Analysis is the identification of frequency subspaces of percussive instruments from the signal itself. The frequency subspaces serve as information for the detection of the percussive events and the subsequent classification of the occurring instruments. Results are reported on 40 manually transcribed test items.
Further Steps towards Drum Transcription of Polyphonic Music

6032
Uhle, Christian; Dittmar, Christian
This publication addresses the generation of a musical score of percussive un-pitched instruments. A musical event is defined as the occurrence of a sound of a musical instrument. The presented method is restricted to events of percussive instruments without determinate pitch. Events are detected in the audio signal and classified into instrument classes; the temporal positions of the events are quantized on a tatum grid; musical meter is estimated; and preparatory beats are identified. The identification of rhythmic patterns on the basis of the frequency of their occurrence enables a robust identification of the tempo and gives valuable cues for the positioning of the bar lines using musical knowledge.
Generation of Musical Scores of Percussive Un-Pitched Instruments from Automatically Detected Events

6033
Azizi, Seyed-Ali
Modern asynchronous sample rate converters (ASRCs) are composed of an interpolation filter to increase the sample rate by an integer factor, followed by a polynomial interpolator which produces the desired output samples at arbitrary output sampling time instants. A crucial feature determining the precision of the ASRCs is the phase linearity of the interpolation filter in use. That is the main reason why easily realizable linear phase FIR filters have traditionally been employed as interpolation filters, rather than IIR filters, which suffer from inherent phase non-linearity, although IIR filters are more economical. This paper introduces a novel ASRC design approach which uses the zero phase IIR filtering concept to produce highly efficient, linear phase IIR interpolation filters to be used in ASRCs. The basic concept is explained and the functions of the involved units are investigated.
Efficient Arbitrary Sample Rate Conversion Using Zero Phase IIR Filters
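
Illustrative note (not taken from the preprint): the zero phase IIR filtering concept mentioned in the abstract is commonly realized by filtering a block forward in time and then filtering the time-reversed result. The sketch below assumes a simple one-pole low-pass filter and a hypothetical coefficient `a`; it does not reproduce the authors' interpolation filter or their block processing scheme.

```python
def one_pole_lowpass(x, a):
    """Causal one-pole IIR low-pass: y[n] = (1 - a) * x[n] + a * y[n-1]."""
    y, prev = [], 0.0
    for sample in x:
        prev = (1.0 - a) * sample + a * prev
        y.append(prev)
    return y


def zero_phase_filter(x, a):
    """Forward-backward filtering: the phase shifts of the two passes cancel."""
    forward = one_pole_lowpass(x, a)
    backward = one_pole_lowpass(forward[::-1], a)
    return backward[::-1]


# An impulse in the middle of a block (hypothetical test signal).
impulse = [0.0] * 65
impulse[32] = 1.0

causal = one_pole_lowpass(impulse, 0.5)          # one-sided, asymmetric response
zero_phase = zero_phase_filter(impulse, 0.5)     # response centred on the impulse
```

A single causal pass produces output only after the impulse, whereas the forward-backward result is (up to block-edge truncation) symmetric around the impulse position, which is the time-domain signature of zero phase.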

6034
Kudo, Akihiro; Hokari, Haruhide; Shimada, Shoji
Many papers have described moving sound image localization schemes that use loudspeakers or headphones. Most of these schemes are based on switching spatial transfer functions, so wave discontinuity occurs at the moment of switching, which degrades the sound quality. While the characteristics of the wave discontinuity depend on the moving sound image localization schemes, no paper appears to have considered the relationship between the wave discontinuity and the scheme used. To rectify this omission, this paper examines three approaches: simple switching approach, overlap-add approach, and fade-in--fade-out approach. We assess the sound degradation caused by wave discontinuity, and use the objective measure of spectrum distortion width to quantify the wave discontinuity. We also carry out paired comparison tests as subjective assessments. Both assessments verify that the third approach is the best of the three.
A Study on Implementing Switching Transfer Functions Focusing on Wave Discontinuity
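
Illustrative note (not taken from the preprint): the difference between a simple switch and a faded transition can be sketched as follows. The sketch assumes that two already-filtered blocks of equal length are available (one per transfer function) and that the fade is a linear crossfade; the authors' exact fade-in/fade-out scheme and their spectrum distortion width measure are not reproduced here.

```python
def hard_switch(old_block, new_block, switch_index):
    """Simple switching: jump from the old output to the new one at one sample."""
    return list(old_block[:switch_index]) + list(new_block[switch_index:])


def crossfade_switch(old_block, new_block):
    """Fade the old output out while fading the new output in (linear gains)."""
    n = len(old_block)
    if n != len(new_block):
        raise ValueError("blocks must have the same length")
    if n < 2:
        return list(new_block)
    out = []
    for i in range(n):
        g = i / (n - 1)  # fade-in gain rises from 0 to 1 across the block
        out.append((1.0 - g) * old_block[i] + g * new_block[i])
    return out


def max_step(x):
    """Largest sample-to-sample jump, a crude indicator of wave discontinuity."""
    return max(abs(b - a) for a, b in zip(x, x[1:]))


# Hypothetical block outputs of two different transfer functions.
old = [1.0] * 5
new = [-1.0] * 5
switched = hard_switch(old, new, 2)
faded = crossfade_switch(old, new)
```

With these toy blocks the hard switch contains a jump of 2.0 at the switching instant, while the crossfade spreads the change into steps of 0.5.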

6035
Petrovsky, Alexander A.; Parfieniuk, Marek; Borowicz, Adam
This paper considers a novel application of the Warped Discrete Fourier Transform (WDFT) in a single-channel noise reduction system. Namely, the WDFT is simultaneously the basis for spectral weighting and for the psychoacoustic model, thus allowing the overall system to operate strictly in the critical band domain. The warped transform allows non-uniform allocation of the z-transform frequency samples in good accordance with the Bark scale. Thus the psychoacoustic modeling is more accurate than in DFT-based solutions, and the subjective quality of the enhanced speech increases. The noise suppression algorithm utilizes the majority of the currently most advanced ideas in perceptually motivated spectral weighting. Its particular advantage is that the masking threshold is directly involved in the weighting rule.
Warped DFT Based Perceptual Noise Reduction System

6036
Tatlas, Nicolas-Alexander; Mourjopoulos, John N.
Loudspeaker arrays driven by digital bitstreams are direct digital-signal-to-acoustic transducers, usually comprising a digital signal processing module driving actuators. Current research efforts are focusing on topologies directly driven by multi-bit digital bitstreams. In this work, the above investigations are extended to the case of using 1-bit signals such as Sigma-Delta for driving such topologies, using time and frequency domain analysis. Simulation results will be presented for idealized actuators. Finally, an optimized architecture for such a loudspeaker will be proposed, based on this analysis.
Digital Loudspeaker Arrays Driven by 1-bit Signals

6037
Grenier, Yves; Rosier, Julie
In this paper, we present a fast and efficient technique for multipitch estimation of musical signals. We deal with mixtures where several instruments are present in a monophonic recording. The approach consists in clustering the spectral peaks of the mixture to obtain a spectral representation of each musical note. These spectra are then used to estimate the fundamental frequencies. We compare two techniques for the classification of the spectral peaks: a K-means procedure and a simpler aggregation technique associated with a criterion that represents the closeness to harmonicity for any pair of frequency peaks. This comparison is made on complex mixtures of various musical instruments and on piano chord mixtures. The effectiveness of the two estimation methods is presented using pitch recognition rates and the mean estimated number of sources.
Unsupervised Classification Techniques for Multipitch Estimation

6038
Wilson, R. Scott; Walters, Jeffrey H.; Abel, Jonathan S.
Given an array of speakers and a set of noisy inter-speaker range estimates, we consider the problem of estimating the relative positions of the array elements. A closed-form position estimator which minimizes an equation error norm is presented and shown to be related to a multidimensional scaling analysis. The information inequality is used to bound position estimate mean square error and to gauge the accuracy of the closed-form estimator. A geometric interpretation of the bound variance is given and used in examining our simulation results.
Speaker Array Calibration Using Inter-Speaker Range Measurements

6039
Moerman, Jean Paul
Nowadays, in a world of super-audio formats, the 'loudness' problem is one of the most important obstacles to an informative and relaxed experience for the audience. When zapping through the channels, loudness differences are quite the usual thing. But even within one broadcaster, levels are not consistent from one program switch to another. Viewers are extremely annoyed and complaints are to be expected, but no major improvement has been undertaken in the broadcast world. Surprisingly enough, the transition from analogue to digital did not improve matters; on the contrary, it became much worse! The trap of being the loudest is very tempting. The use of heavy compression techniques and the development of new signal processors have fed a culture of rivaling loudness. Louder attracts attention, but in the end the viewer will turn down the volume and discover a beaten, compressed and uninteresting sound. A common, single solution to the loudness problem is to 'try' to correct the level at the end of the production chain. Inserting just one piece of equipment right before transmission, a processor that is supposed to solve all of the problems, can't solve this. It results in a sound which even causes listening fatigue. It should be clear that a more extensive solution is advised. The solution was the installation of a broadcast processor in every facility unit within the VRT. The program will also be processed just before transmission and per format: mono, NICAM stereo and, recently, audio for DVB-T. Most important was not to forget the training of all technicians from every unit, such as post-production, studio, OB facility, continuity and transmission. Even the (non-sound-minded) editors who fill in all the production aspects in an off-line video facility need some facts on how to judge loudness. The external production units of advertising trailers and programs should also be given the necessary information.
Loudness in TV Sound

6040
Foti, Frank
Over the past few years, as development, testing, and rollout progressed regarding the HD-Radio (IBOC), DAB and DRM transmission systems, audio processing has been one of the key components to augment this new technology. It became apparent that dynamics processing would figure in both the aural and technical performance aspects of these new systems. It has been successfully proven that dynamics signal processing improved other bitrate-reduced audio services like internet audio streaming, especially at low bitrates. This discussion will offer examples of proven methods that demonstrate the benefits of audio processing with these new digital broadcast systems. There are important issues that must be considered, and thought out, or digital radio's benefits will not be fully realized.
Audio Processing for Digital Broadcast Mediums

6041
Millot, Laurent
Analysis tools used in research laboratories, for sound synthesis, by musicians or by sound engineers can be rather different. Discussion of the assumptions and limitations of these tools makes it possible to propose a first tool that is as relevant and versatile as possible for all sound practitioners, with one major aim: one must be able to listen to each element of the analysis, because hearing is the final reference tool. This tool should also be used, in the future, to reinvestigate the definition of sound (or Acoustics) on the basis of some recent works on musical instrument modelling, speech production and loudspeaker design. Audio illustrations will be given.
Some Clues to Build a Sound Analysis Relevant to Hearing

6042
Chang, Wei-Chen; Su, Alvin W.Y.
Struck string instruments such as pianos usually have groups of strings terminated at some common bridges, respectively. Because of the strong coupling phenomenon, the produced tones exhibit highly complex amplitude modulation patterns. Therefore, it is difficult to determine the synthesis model parameters such that the synthesized tones can match the recorded tones. In this paper, a multi-channel recurrent network is proposed based on three previous works: the coupled-string model, the commuted piano synthesis method and the IIR synthesis method. This work attempts to automatically extract the synthesis parameters by using a neural-network training algorithm without the knowledge of physical properties of the instruments. Encouraging results are shown in the computer simulations.
Synthesizing Coupled-String Musical Instruments with a Multi-Channel Recurrent Network

6043
Ortiz-Berenguer, Luis; Casajus-Quiros, F. Javier
The non-linear behavior of piano strings is a very important issue when chords have to be recognized using spectral patterns. In order to calculate the patterns and masks used in the recognition algorithm, it is necessary to model the effects of non-linearity. A model using intermodulation products has proved to give good results. For validation of the model we have recorded 11 pianos and analyzed the "A" note of octaves 1 to 7, using 4 different forces. The basis of this model is presented in this contribution.
Non Linearity Modeling for Spectral Pattern Recognition in Piano Chords

6044
Lang, Bob
Bezier curves are frequently used in graphical applications and drawing packages. In this paper, the author presents a technique of direct sound wave synthesis using Bezier curves. The technique is further expanded by modulating the position of the Bezier control points as synthesis takes place to create waveforms with complex harmonic structures. The paper also outlines how the technique can be used to create a musical instrument (synthesizer).
Waveform Synthesis using Bezier Curves with Control Point Modulation
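
Illustrative note (not taken from the preprint): one way to picture the technique is to treat one waveform cycle as a cubic Bezier curve in amplitude and to move a control point from cycle to cycle. The sketch below makes several assumptions that the abstract does not confirm: a single cubic segment per cycle, control values only on the amplitude axis, and a sinusoidal per-cycle modulation of one control value with a hypothetical `depth` parameter.

```python
import math


def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a one-dimensional cubic Bezier curve at t in [0, 1]."""
    u = 1.0 - t
    return u**3 * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t**3 * p3


def bezier_cycle(control_values, num_samples):
    """Render one waveform cycle; t covers [0, 1) so cycles can be concatenated."""
    p0, p1, p2, p3 = control_values
    return [cubic_bezier(p0, p1, p2, p3, n / num_samples)
            for n in range(num_samples)]


def modulated_wave(num_cycles, samples_per_cycle, depth):
    """Concatenate cycles while moving the second control value each cycle."""
    out = []
    for c in range(num_cycles):
        # Hypothetical modulation law: slow sinusoidal drift of one control value.
        p1 = 1.0 + depth * math.sin(2.0 * math.pi * c / num_cycles)
        out.extend(bezier_cycle((0.0, p1, -1.0, 0.0), samples_per_cycle))
    return out


static_cycle = bezier_cycle((0.0, 1.0, -1.0, 0.0), 8)
wave = modulated_wave(num_cycles=4, samples_per_cycle=8, depth=0.5)
```

Keeping the first and last control values equal (here both 0.0) makes consecutive cycles join without a jump; changing the inner control values reshapes the cycle and therefore its harmonic content.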

6045
D'haes, Wim
In the field of sinusoidal modelling, two types of least squares amplitude estimation methods are distinguished. A first group of methods estimates the complex amplitude of each sinusoid in an iterative manner. Although their main disadvantage is that they are unable to resolve overlapping frequency responses, they are used frequently because their computational complexity is O(N log (N)). By contrast, methods that compute all amplitudes simultaneously can resolve overlapping frequency responses, but their computational complexity scales with the third power of the number of sinusoidal components. In this work a method is proposed which allows all amplitudes to be computed simultaneously and still has an O(N log (N)) complexity. This is realized by explicitly including a window with a bandlimited frequency response in the least squares derivation, resulting in a band diagonal system of equations which can be solved in linear time. Since overlapping frequency responses are allowed, an iterative method must be used to optimize the frequencies, resulting in a nonlinear least squares technique. A commonly used technique is Newton optimization, which requires the computation of the gradient and the Hessian matrix. Here too, the same computational gain is realized by applying the same methodology.
A Highly Optimized Nonlinear Least Squares Technique for Sinusoidal Analysis: From O(K^2 N) to O(N log (N))

6046
Rault, Jean-Bernard; Marchand, Sylvain; Lagrange, Mathieu
This paper introduces a partial-tracking algorithm suitable for the sinusoidal modelling of polyphonic sounds. A new method, based on the backward exploration of possible extensions of the partials in future frames, is proposed to cope with the lack or corruption of spectral data. The allocation of spectral peaks to a partial is done by considering possible trajectories in future frames where frame hopping is allowed. A suitable transition probability that takes into account missing or rejected peaks is proposed. The trajectory that exhibits the highest probability is searched for and the corresponding peak for the current frame is chosen to extend the partial.
Partial Tracking Based on Future Trajectories Exploration

6047
Liebchen, Tilman; Reznik, Yuriy; Moriya, Takehiro; Yang, Dai Tracy
Lossless coding will become the latest extension of the MPEG-4 audio standard. The lossless audio codec of the Technical University of Berlin was chosen as reference model for MPEG-4 Audio Lossless Coding (ALS). The MPEG-4 ALS encoder is based on linear prediction, which enables high compression even with moderate complexity, while the corresponding decoder is straightforward. The paper describes the basic elements of the codec as well as some additional features, gives compression results, and points out envisaged applications.
MPEG-4 Audio Lossless Coding
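
Illustrative note (not taken from the preprint): the principle behind prediction-based lossless coding is that the encoder transmits only the integer prediction error, and the decoder, running the same predictor on already-decoded samples, restores the input exactly. The sketch below assumes a fixed second-order predictor chosen for brevity; it is not the MPEG-4 ALS predictor and omits entropy coding of the residual.

```python
def predict(history):
    """Fixed integer predictor (an assumption): linear extrapolation of the last two samples."""
    if not history:
        return 0
    if len(history) == 1:
        return history[-1]
    return 2 * history[-1] - history[-2]


def encode(samples):
    """Return the integer prediction residual for a list of integer samples."""
    residual = []
    for n, x in enumerate(samples):
        residual.append(x - predict(samples[:n]))
    return residual


def decode(residual):
    """Rebuild the samples by adding each residual to the same prediction."""
    samples = []
    for e in residual:
        samples.append(e + predict(samples))
    return samples


pcm = [0, 10, 20, 30, 40, 38, 35, 31]   # hypothetical integer PCM samples
res = encode(pcm)
restored = decode(res)
```

For smooth signals the residual values are much smaller than the samples themselves, so they can be entropy coded with fewer bits, while the integer arithmetic keeps the round trip exact.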

6048
Shimada, Osamu; Tanaka, Naoya; Tsushima, Mineo; Norimatsu, Takeshi; Kok Seng, Chong; Kim Hann, Kuah; Sua Hong, Neo; Nomura, Toshiyuki; Takamizawa, Yuichiro; Serizawa, Masahiro
This paper proposes a Low Power Spectral Band Replication algorithm (LP-SBR) adopted in the MPEG-4 Audio standard. It operates with low computational complexity compared to the conventional SBR algorithm called the High Quality SBR algorithm (HQ-SBR). LP-SBR utilizes real-valued processing instead of complex-valued processing used in HQ-SBR for complexity reduction. To minimize the sound quality degradation caused by this reduction, LP-SBR employs aliasing reduction techniques and a gain compensation technique. Subjective quality test results show that there is no statistical difference between LP-SBR and HQ-SBR when they are incorporated into AAC decoders. A complexity comparison of both SBR decoders implemented on 16-bit fixed-point DSPs shows that an AAC decoder with LP-SBR requires 30% less computational complexity than that with HQ-SBR.
A Low Power SBR Algorithm for the MPEG-4 Audio Standard and Its DSP Implementation

6049
Herre, Jürgen; Faller, Christof; Ertel, Christian; Hilpert, Johannes; Hoelzer, Andreas; Spenger, Claus
Finalized in 1992, the MP3 compression format has become a synonym for personalized music enjoyment for millions of users. The paper presents a novel extension of this popular format which adds support for the coding of multi-channel signals, including the widely used 5.1 surround sound. As a prominent feature of the extended format, complete backward compatibility with existing stereo MP3 decoders is retained, i.e. standard decoders reproduce a full stereo downmix of the multi-channel sound image. The paper discusses the underlying advanced technology enabling the representation of multi-channel sound at bitrates that are comparable to what is currently used to encode stereo material. Results for subjective sound quality are presented and related activities of the MPEG standardization group are reported.
MP3 Surround: Efficient and Compatible Coding of Multi-Channel Audio

6050
Derrien, Olivier; Daudet, Laurent
In the context of lossy audio coding, the power spectral density of stationary tones can be over- or underestimated in some windows due to the time-shift sensitivity of the Modified Discrete Cosine Transform (MDCT), which leads to potentially audible coding artefacts. This paper discusses the advantages of using a nearly time-shift invariant regularized MDCT spectrum for the bit allocation in an MPEG-AAC coder. We show how this modification applies to the standard iterative algorithm, as well as to a more efficient model-based framework. Objective and subjective results indicate that the overall quality is significantly improved when rich stationary sounds are encoded at low bit-rates, or when the coder operates in a variable bit-rate mode.
Reduction of Artifacts in MPEG-AAC with MDCT Spectrum Regularization

6051
Chang, Tzu-Wen; Liu, Chi-Min; Lee, Wen-Chieh
Temporal noise shaping (TNS) has been defined in MPEG-4 AAC to control the pre-echo noise in attack signals. The module, which is especially important for MPEG-4 Low Delay AAC due to the absence of a window switching mechanism, can shape and control the spread of quantization noise to improve the quality under a bit rate constraint. However, this paper illustrates that TNS will introduce three artifacts. The first artifact is similar to the Gibbs phenomenon, with a high noise level occurring at the edge of the attack signal. The second is time-domain aliasing noise, which appears as unusual noise at a distance from the attack time frame. The third is noise spreading that depends on the TNS filter order. This paper proposes an efficient TNS method that shapes noise while taking the above three artifacts into account. We also provide an efficient computing method to activate the TNS. Both subjective and objective tests are conducted to illustrate the improvement over existing TNS methods.
The Efficient Temporal Noise Shaping Method

6052
Corteel, Etienne; Warusfel, Olivier; van Zon, Rik; de Vries, Diemer
Wave Field Synthesis (WFS) allows reproducing the spatial and temporal properties of a target sound field over a large listening area. Thanks to their screen shape, Multi-Actuator Panels (MAP) represent a good alternative for WFS reproduction in multimedia installations. However, MAP speakers act as reflectors for acoustic waves which disturb the perception of the target soundfield. A general listening room compensation technique is proposed, based on multichannel inversion, that allows attenuating early reflections caused by a reflector using loudspeakers integrated into this reflector (e.g. MAP loudspeakers). After an analysis of the geometrical arrangement of the panels, the method processes separately the free field equalization of the loudspeaker array and the reflection compensation. Simulation and measurements show that the attenuation is effective over the entire listening area.
Multi-Actuator Panel (MAP) Loudspeakers: How to Compensate for Their Mutual Reflections?

6053
Hamasaki, Kimio; Nishiguchi, Toshiyuki; Hiyama, Koichiro; Ono, Kazuho
Various sound systems have been studied at NHK with the objective of developing the next-generation broadcasting system. This paper introduces an ultimate 22.2 multichannel audio system for ultrahigh-definition video with 4000 scanning lines, and an advanced multichannel sound system with frontal loudspeakers placed in several rows for reproducing a live sound field. The former system has three vertical layers of loudspeakers with 2 LFEs, namely 3 loudspeakers at the bottom layer, 10 loudspeakers at the middle layer, 9 loudspeakers at the upper layer and 2 LFEs. The latter system consists of frontal loudspeaker-ranks and rear loudspeaker-arrays for reproducing a natural impression of depth and ambience. This paper describes the principal advantages of the newly proposed multichannel audio system over ordinary multichannel sound systems such as the 5.1.
Advanced Multichannel Audio Systems with Superior Impression of Presence and Reality

6054
Usher, John; Woszczyk, Wieslaw
To describe a multichannel audio experience in terms of its spatial features requires us to consider sound imagery in terms of precedent sound. We mean precedent sound to be that part of a phantom sound image that contains spatial information about the virtual sound source. We have developed and tested a Graphical User Interface (GUI) to allow a listener to describe where they hear both precedent and environment-related sound in an audio scene. The GUI has previously been used as a tool for describing where we hear the precedent sound in two-channel sound reproduction, and we now extend the experimental paradigm to investigate phantom imagery for a multichannel loudspeaker arrangement. We present a category system for describing the spatial sound attribute "definition", and have tested the GUI using 5 loudspeakers arranged according to BS-775 to replay multi-channel sound recordings of three different musical pieces (of which two were duets and one solo). Graduate Tonmeister students used the GUI to describe these sound scenes, and a variety of statistical analyses are used to visualize auditory spatial imagery.
Visualizing Auditory Spatial Imagery of Multi-channel Audio

6055
Sporer, Thomas; Klehs, Beate
In anechoic rooms the concept of Wave Field Synthesis (WFS) has already proven to provide superior spatial sound over a large part of the room. In anechoic space WFS needs a large number of loudspeakers. In "normal" listening conditions simulated and real acoustics interfere with each other, making the generated wave field less exact. This paper describes listening tests conducted to evaluate WFS in a movie theatre with about 100 seats. Parameters under test are the number of loudspeakers, the distance between loudspeakers, the position of the simulated source and the position of listeners relative to the loudspeakers. In addition, the audio-visual coherence was investigated.
Wave Field Synthesis in the Real World: Part 2 - In the Movie Theatre

6056
Pueo, Basilio; Bleda, Sergio; Escolano, Jose; Lopez, Jose Javier
The Finite-Difference Time-Domain (FDTD) method was successfully developed to model electromagnetic systems. This technique has also been used in several other disciplines, such as optics and acoustics. A new approach for Wave Field Synthesis (WFS) simulation using FDTD instead of the classic finite difference method is presented. This software makes it possible to evaluate the precision and behaviour of different WFS configurations in the time domain and thus in a particular frequency band. Moreover, simulations can be analyzed inside a room or in free space.
Wave Field Synthesis 3D Simulator Based on Finite-Difference Time-Domain Method

6057
Pulkki, Ville; Merimaa, Juha; Lokki, Tapio
A technique for spatial reproduction of room acoustics, Spatial Impulse Response Rendering (SIRR), has been recently proposed. In the method, a multi-channel impulse response of a room is measured, and responses for loudspeakers in an arbitrary multi-channel listening setup are computed. When the responses are loaded to a convolving reverberator, they will create a perception of space corresponding to the measured room. The method is based on measuring with a SoundField microphone or a comparable system, and on analyzing direction-of-arrival and diffuseness at frequency bands. An omnidirectional response is then positioned to a loudspeaker system according to analyzed directions and diffuseness. In this paper, the SIRR method is reviewed and refined. The reproduction quality of SIRR and some other systems is evaluated with listening tests, and it is found that SIRR yields a natural spatial reproduction of the acoustics of a measured room.
Reproduction of Reverberation with Spatial Impulse Response Rendering

6058
Schmidt, Juergen; Schroeder, Ernst F.
Since the early days of audio stereophony we have tended to think of audio transmission and audio presentation in terms of loudspeaker feeds or "channels". This seemed to be appropriate for as few channels as two and still reasonable for five, but is rapidly losing its meaning with the advent of technologies such as wave field synthesis. A key part of MPEG-4 is the introduction of object-oriented thinking for the description, generation, transport, and rendering of audio scenes. Binary Information for Scenes (BIFS) is that part of the MPEG-4 standard that enables scene descriptions to be transmitted together with the audio signals to facilitate the final rendering. The latest version of AudioBIFS (Version 3) now has a number of improvements and new concepts that are explained in detail.
New and Advanced Features for Audio Presentation in the MPEG-4 Standard

6059
Williams, Michael; Le Du, Guillaume
This paper is the second part of a paper presented at the 110th AES Convention in Amsterdam. A selection of different Multichannel Microphone Arrays are again presented but this time using Supercardioid and Hypocardioid microphones. Five channel array configurations are described with respect to their particular characteristic: microphone directivity, specific segment coverage, segment offset values where necessary, microphone coordinates and orientations. Arrays have been chosen so as to assist the sound engineer in the search for the optimum microphone array for a given recording situation.
The Quick Reference Guide to Multichannel Microphone Arrays Design - Part II : Using Supercardioid and Hypocardioid Microphones

6060
Baumgarte, Frank; Faller, Christof; Kroon, Peter
A major application for Binaural Cue Coding (BCC) is multichannel audio coding. A previously proposed system combines full-band BCC for spatial parameters with an audio coder for a downmixed representation of the multichannel input. This paper presents a scalable hybrid coder combining a partial-band BCC as preprocessor and post-processor with a subband coder. The hybrid system supports a gradual tradeoff of bitrate and spatial image ranging from transparent multichannel and stereo to full-band BCC. To avoid coloration from the required up and down-mixing within BCC, an equalized mixing scheme based on a binaural loudness model is proposed. Subjective tests and bitrate simulations confirm the expected benefits of the hybrid coder in the transition range from full-band BCC to stereo.
Audio Coder Enhancement using Scalable Binaural Cue Coding with Equalized Mixing

6061
Harma, Aki; Faller, Christof
Techniques where a stereo or a multichannel signal is decomposed into spatial source-labeled time-frequency slots by level, time-difference, and coherence metrics have become popular in recent years. Good examples are binaural cue coding and up/downmixing techniques. In the article, we will provide an overview and discuss parallel approaches in the field of array processing and blind source separation. Typically, time-frequency slots are formed from subband representations of signals. However, it is also possible to produce a similar spatial decomposition for a parametric representation (sinusoids, transients, and noise) of a stereo or multichannel audio signal. Advantages and disadvantages of the two approaches for audio coding applications are discussed in this article.
Spatial Decomposition of Time-frequency Regions: Subbands or Sinusoids

6062
Gayer, Marc; Lutzky, Manfred; Schuller, Gerald; Kraemer, Ulrich; Wabnik, Stefan
Digital audio processing has been revolutionized by perceptual audio coding in the past decade. The main parameter to benchmark different codecs is the audio quality at a certain bit-rate. For many applications, however, delay is another key parameter which varies between only a few and hundreds of milliseconds depending on the algorithmic properties of the codec. Latest research results in low delay audio coding can significantly improve the performance of applications such as communications, digital microphones, and wireless loudspeakers with lip synchronicity to a video signal. This paper describes the delay sources and magnitude of the most common audio codecs and thus provides a guideline for the choice of the most suitable codec for a given application.
A Guideline to Audio Codec Delay

6063
Oomen, Werner; Szczerba, Marek; Klein Middelink, Marc
For mobile applications memory and computational complexity requirements are very strict. Therefore, traditional wavetable/FM synthesis methods have to compromise between the number and the quality of instruments in the soundbank. This paper presents a wavetable synthesizer employing a parametric representation of the soundbank samples, sharing the advantages of both wavetable and parametric synthesis methods. The soundbank is compact and thus easy to store and transmit, and the sound quality can match that of traditional wavetable synthesis. Moreover, post-processing of samples in a parametric representation (such as pitch change, filtering and envelope) can be performed directly in the parametric domain, effectively reducing synthesizer complexity.
Parametric Audio Coding Based Wavetable Synthesis

6064
Prakash, Vinod; Kumar, Anil; Konda, Preethi; Vadapalli, Sarat Chandra
The birdie artifact is the predominant factor affecting audio quality of perceptual coders operating at very low bit rates. Conventional approaches to overcome the birdie artifact involve use of Low Pass Filters to reduce the amount of signal to quantize. This approach does not eliminate the birdie artifact if the effect is seen in the in-band components. This paper proposes a new algorithm to overcome the birdie artifact and hence improve the audio quality. The proposed algorithm modifies the bit allocation strategy such that the critical bands are preserved, while still maintaining the Perceptual distortion criteria. Results of Spectrogram analysis are presented.
Removal of Birdie Artifact in Perceptual Audio Coders

6065
Torres-Guijarro, Soledad; Gomez-Alfageme, Juan Jose; Blanco-Martin, Elena; Casajus-Quiros, F. Javier
In videoconference systems formed by microphone and loudspeaker arrays, the sound field reproduced in the receiving room must be as similar as possible to the field sampled by the microphone array (according to wave field synthesis). Different measurements of objective and subjective quality can be made. A measurement method has been developed based on spatial localization in the horizontal plane. Two different situations have been compared: first, a real source placed at different azimuth angles in front of the listener; second, the virtual source created by the loudspeaker array. Interpolated HRTFs have been calculated according to several methods, and in order to determine the azimuth angle the interaural cross-correlation function (IACC) and the interaural time difference (ITD) have been evaluated.
Objective Measurements of Sound Source Localization in a Multichannel Transmission System for Videoconferencing

6066
Stofringsdal, Bard; Svensson, Peter
Sound field simulations at low frequencies usually employ finite element or other mesh based methods. For auralization, output data from these methods need to be converted to a format compatible with auralization methods such as Wave Field Synthesis (WFS), Higher Order Ambisonics (HOA) or binaural reproduction. A method is proposed for converting the mesh data to plane wave components using a circular array of virtual sources centered around the listening position. The method is based on solving sets of linear propagation equations in the frequency domain. Results are presented for two-dimensional examples and numerical issues are discussed.
Plane Wave Decomposition of Volume Element Mesh Data Simulations

6067
Mickiewicz, Witold; Sawicki, Jerzy
This paper gives an overview of recent advances in the acoustic and electronic design of studio condenser microphones.
Headphone Processor Based on Individualized Head Related Transfer Functions Measured in Listening Room

6068
Supper, Ben; Brookes, Tim; Rumsey, Francis
Finally, the features built by the system are combined into an optimized machine learning descriptor model, and an executable program is generated to compute the model on any audio signal. In this paper, we describe the overall system and compare its results against traditional approaches in musical feature extraction in the style of MPEG-7.
A Lateral Angle Tool for Spatial Auditory Analysis

6069
Alexandre, Enrique; Pena, Antonio
This paper presents some ideas for the appropriate management of every information source present in a generic speech or audio coder. This task becomes more necessary as coding structures get more complex, and an appropriate organization and processing of this information is a key point for an efficient implementation, in terms of complexity and quality. First, a data structure will be proposed, inspired by classic comprehension theories, which sorts the information into three different hierarchical levels. Based on this structure, a global sound encoder block diagram will be described. This model is based on blackboard models, commonly applied in speech recognition applications. Finally, it will be shown how an MPEG-2/4 AAC-LC coder can be considered as a particular case of the proposed model.
A Layered Data Model for Information Management In Sound Coding Architectures

6070
Mourjopoulos, John N.; Hatziantoniou, Panagiotis D.
The aim of this study is to investigate the robustness of real-time room acoustics equalization using inverse filters derived from the Complex Smoothing of the Transfer Function using perceptual criteria. The robustness of the method is assessed by real-time tests which compare the performance of Complex Smoothing-based equalization (for different filter lengths) with traditional, ideal inverse filtering, over a range of room locations which differ from the ones where response measurements were taken. Objective measurements and audio examples will show that Complex Smoothing-based equalization performance is largely immune to position changes and does not introduce processing artifacts, problems that affect traditional ideal inversion.
Real-Time Room Equalization Based on Complex Smoothing: Robustness Results

6071
Kumar, Suthikshn
Mandelbrot equations are very popular for generating images and music. We propose to use them for generating mobile ring tones. These Mandelbrot ring tones are both entertaining and melodious. As the computations required for generating melodious Mandelbrot tones are simple iterations, the ring tone generator can be integrated with the mobile handset. Fuzzy Mandelbrot sets are proposed for extending the usefulness of the ring tone generator. The ring tone generator is personalized by using the audiogram. People with hearing impairment will benefit from the personalized ring tone generator. A PC based mobile phone ring tone generator demo is being developed based on the Nokia Series 60 SDK for Symbian OS mobile handsets. This will be used for demonstrating the concepts proposed in this paper.
Personalized Mobile Ring Tone Generator using Mandelbrot Music

6072
Breebaart, Jeroen; van de Par, Steven; Kohlrausch, Armin; Schuijers, Erik
Recently, so-called binaural cue coding schemes have been introduced. These audio coding schemes transmit two perceptually relevant sound localization cues (i.e., level and time differences between the input channels), combined with a mono audio signal. Although these schemes are able to reconstruct the locations of various sound sources quite effectively, other aspects of the spatial ambience (such as the spatial diffuseness of reverberation) cannot be captured in this way. In this paper, we will present an extension to these spatial coding schemes, which comprises a spatial sound-field parameter that is able to capture ambience properties. Experiments show that the combination of three spatial parameters enables highly efficient, high-quality stereo audio representations.
High-quality Parametric Spatial Audio Coding at Low Bitrates
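
The two localization cues named in the abstract above (level and time differences between the input channels) can be illustrated with a minimal sketch. This is not the authors' coder: it works on a whole signal rather than per time/frequency tile, and the function names, the cross-correlation lag search and the simple mono downmix are all assumptions made for illustration.

```python
import math

def level_difference_db(left, right):
    """Inter-channel level difference in dB (positive when left is louder)."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    return 10.0 * math.log10(energy_left / energy_right)

def time_difference_samples(left, right, max_lag):
    """Lag (in samples) that maximizes the cross-correlation.

    A positive result means the right channel is delayed relative to the left.
    """
    def cross_correlation(lag):
        return sum(
            left[n] * right[n + lag]
            for n in range(len(left))
            if 0 <= n + lag < len(right)
        )
    return max(range(-max_lag, max_lag + 1), key=cross_correlation)

def downmix_mono(left, right):
    """Plain average downmix (an assumption; real coders are more careful)."""
    return [0.5 * (l + r) for l, r in zip(left, right)]

# Hypothetical test signal: the right channel is the left channel,
# attenuated by a factor of 2 and delayed by 3 samples.
left = [0.0] * 8 + [1.0, -2.0, 3.0, -1.0, 0.5] + [0.0] * 8
right = [0.0] * 3 + [0.5 * x for x in left[:-3]]

ild = level_difference_db(left, right)
itd = time_difference_samples(left, right, max_lag=5)
mono = downmix_mono(left, right)
```

A decoder in such a scheme would receive `mono` plus the two parameters and re-create the channel differences from them; the ambience parameter proposed in the paper is not modelled here.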

6073
Schuijers, Erik; Breebaart, Jeroen; Purnhagen, Heiko; Engdegard, Jonas
Parametric stereo coding is a technique to efficiently code a stereo audio signal as a monaural signal plus a small amount of stereo parameters. The monaural signal can be encoded using any audio coder. The stereo parameters can be embedded in the ancillary part of the mono bit stream creating backwards mono compatibility. In the decoder, first the monaural signal is decoded after which the stereo signal is reconstructed from the stereo parameters. In this paper, a low complexity decoder solution is described based on complex-modulated filter banks. Combinations of the parametric stereo decoder with both a parametric coding scheme and with aacPlus will be elucidated.
Low Complexity Parametric Stereo Coding

6074
Purnhagen, Heiko; Engdegard, Jonas; Roden, Jonas; Liljeryd, Lars
Parametric stereo coding in combination with an efficient coder for the underlying monaural audio signal results in the most efficient coding scheme for stereo signals at very low bit rates available today. While techniques for lateral localization have been studied since early intensity stereo coding tools, synthesis of stereophonic ambience was only recently applied in parametric stereo coding systems. This paper studies different techniques for synthetic ambience generation in the context of parametric stereo coding systems and discusses their mono-compatibility. Implementations of these techniques in combination with mp3PRO and aacPlus are presented together with experimental results.
Synthetic Ambience in Parametric Stereo Coding

6075
Prakash, Vinod; Vadapalli, Sarat Chandra
Maintenance of audio quality under the resource constraints on embedded platforms is very crucial. One of the major factors affecting the quality of Stereophonic Audio Coders is the method of distribution of bits across channels of a stereo pair. Conventional approaches use Perceptual Entropy, a computationally intensive metric, to distribute bits across channels. Improper computation or absence of this metric can severely degrade the audio quality. This paper presents an efficient and robust scheme to distribute the bits across channels, without using Perceptual Entropy, while still maintaining the audio quality. In the proposed scheme the bit allocation for both channels is performed simultaneously by allocating bits from a common bit pool. A detailed example illustrating this scheme is presented.
Efficient Bit Distribution Strategy for Stereophonic Audio Coders

6076
Gournay, Philippe; Garcia, Jean-Luc; Lefebvre, Roch
Lossless audio coding aims at achieving the lowest possible bitrate for transmission or storage of audio without any loss of information. This is usually done by first removing redundancy from the audio signal, and then applying entropy coding to the residual signal. Linear prediction (LP), when applied to monophonic signals, is a very effective way to remove redundancy. It produces minimum-phase predictors that are efficiently compressed by combining vector quantization with a meaningful representation of the LP coefficients (such as the LSFs). When applied to stereo signals however, joint channel prediction often produces non-minimum-phase predictors, whose quantization requires a high bit rate and poses stability problems. In this paper, we show that backward estimation of the LP coefficients (where those are estimated on the past decoded signal) solves most of the problems associated with the use of joint channel prediction in a lossless audio coder.
Backward Linear Prediction for Lossless Coding of Stereo Audio
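
The idea of backward estimation described above (predictor coefficients derived only from already decoded samples, so that none need to be transmitted) can be sketched for a single channel. This is a minimal illustration under stated assumptions, not the authors' method: it swaps in a normalized LMS update instead of an LP analysis, handles mono rather than joint stereo prediction, and omits the entropy coder. The class and function names are hypothetical.

```python
import math

class BackwardPredictor:
    """Adaptive predictor that only ever sees past (already decoded) samples."""

    def __init__(self, order=4, step=0.5):
        self.weights = [0.0] * order
        self.history = [0] * order  # most recent sample first
        self.step = step

    def predict(self):
        # Integer prediction so that the residual is an integer as well.
        return int(round(sum(w * h for w, h in zip(self.weights, self.history))))

    def update(self, sample, prediction):
        # Normalized LMS update (assumption: stands in for a real LP analysis).
        error = sample - prediction
        norm = sum(h * h for h in self.history) + 1.0
        self.weights = [
            w + self.step * error * h / norm
            for w, h in zip(self.weights, self.history)
        ]
        self.history = [sample] + self.history[:-1]

def encode(samples):
    predictor = BackwardPredictor()
    residuals = []
    for x in samples:
        prediction = predictor.predict()
        residuals.append(x - prediction)
        predictor.update(x, prediction)
    return residuals

def decode(residuals):
    predictor = BackwardPredictor()
    samples = []
    for e in residuals:
        prediction = predictor.predict()
        x = prediction + e
        samples.append(x)
        predictor.update(x, prediction)  # same update as the encoder
    return samples

signal = [int(round(1000 * math.sin(2 * math.pi * n / 50))) for n in range(200)]
restored = decode(encode(signal))
```

Because encoder and decoder run the identical update on identical data, the decoder tracks the encoder's coefficients exactly and the reconstruction is bit-exact without any side information.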

6077
Koenig, Florian M.
Head-related sound reproduction devices vary in transducer characteristics and in their basic acoustic principle (open, closed, circum-aural or supra-aural systems). Furthermore, the centred or off-centre placement of the transducer inside the earcup influences the tone quality. These headphone techniques have by now been compared in thousands of evaluations. One design with spatial sound reproduction stood out statistically, because it received a higher number of sound quality judgements of both "too much" and "less high frequency range" at the same time. This prompted investigations into the reason for this unusual accumulation of ratings. Four different headphone types were measured on seven test persons using probe microphones in the auditory canal. The results show an electro-acoustic cause for perceived tone colorations of headphones in the transducer positioning and the filtering effect of the human pinna.
The Causals of Headphones Tone Coloration Variations Related on the Human Pinna Influence

6078
Kapralos, Bill; Zikovitz, Daniel; Jenkin, Michael R.; Harris, Laurence R.
Despite its potential importance, few studies have methodically examined the role of auditory cues to the perception of self-motion. Here we describe a series of experiments that investigate the relative roles of various combinations of physical motion and decreasing sound source intensity cues to the perception of linear self-motion. Self-motion was simulated using (i) physical motion only, (ii) moving audio cues only, (iii) decreasing intensity cues, or (iv) physical motion coupled with moving audio cues. In all conditions an over-estimation of self-motion was measured that varied systematically with the simulated acceleration. Of particular interest was that audio cues combined with physical motion cues resulted in more accurate estimates of self-motion than did either audio or physical motion cues in isolation.
Auditory Cues in the Perception of Self Motion

6079
Mackensen, Philip
The localization of a single sound source can be described mathematically by a new formalism to be presented here. Commonly, the HRTF (head related transfer function) is described as a function of variables related to the sound source position and of variables related to the spectrum. In this new approach the multivariable representation of the HRTF is replaced by introducing two independent transfer functions, one regarding only the position and the other regarding solely the source's spectral attributes. A separation can therefore be made between the three-dimensional local space and the "spectral space". This offers a localization independent of the Gestalt of the sound source.
A New Mathematical Approach to Describe Localization

6080
Garcia Arnal Barbedo, Jayme; Lopes, Amauri
The current ITU standard for objective assessment of audio quality, Perceptual Evaluation of Audio Quality (PEAQ), has some shortcomings that prevent its reliable use for a number of coding conditions and some kinds of signals. The paper aims to improve PEAQ performance through the following proposals: (1) modifications in the manner in which the signals are submitted to the assessment; (2) improvement of existing Model Output Variables (MOVs); (3) creation of new MOVs; (4) determination of a better architecture for the neural network that maps the MOVs into a single estimate of the subjective score. The results are compared to those achieved by PEAQ.
Strategies to Increase the Applicability of Methods for Objective Assessment of Audio Quality

6081
Ramos, German; Lopez, Jose Javier
In this paper a subjective evaluation of a novel method for loudspeaker equalization is presented. The equalization is performed using a direct method with random parametric optimization for the design of a bank of second order peak filters, RaPOSOS. The subjective evaluation has been carried out using a preselected jury composed of lecturers, research staff and university students involved in audio engineering. To evaluate its performance, the method has been compared with another well-known equalization method using the ABX test. In particular, our method with different levels of approximation has been compared with long FIR filters obtained by a minimum square error criterion. The results show that with relatively low order filters the perceived difference is negligible or nonexistent, at a considerably lower computational cost.
Subjective Evaluation of an Equalization Method for Loudspeakers Based on Random Parametric Optimization of IIR Filters

6082
Ivanov, Alexei V.; Petrovsky, Alexander A.
An alternative approach to psychoacoustical masking modelling is to model such phenomena as suppression, spread-of-excitation and IHC adaptation, which are among the underlying physiological phenomena for psychoacoustically observed masking. This paper proposes a physiologically grounded model for threshold estimation. It includes a reconfigurable non-uniform filterbank to simulate a "cochlear amplifier" and an associated suppression effect, and a digital reservoir IHC model to account for their adaptive responses. It allows the design of coders which retain enough information to create an excitation pattern in the auditory nerve identical to that of the original signal. As our model is based on the physiology of masking, its application to complex audio signals is justified.
A Composite Physiological Model of the Inner Ear for Audio Coding

6083
Antsalo, Poju; Karjalainen, Matti; Makivirta, Aki; Valimaki, Vesa
Modal equalization has recently been of research interest in order to improve sound reproduction in rooms that have excessively strong modes at low frequencies. Instead of acoustic treatment by expensive and space-reserving absorbing structures, modal equalization is based on DSP affecting the electric-to-acoustic reproduction chain. Several DSP-based techniques for modal equalization have been proposed recently and tested in performance. From a perceptual point of view, however, no clear picture on the importance of controlled temporal decay has been shown, although it is known that towards the lowest frequencies the human hearing becomes increasingly insensitive to temporal details. In the present study we conducted listening tests where only a single synthetic mode with increased decay time but magnitude-equalized response was used to find the JND threshold of excessive decay time. The main conclusion is that at typical listening levels and down to 100 Hz the modal decay time T60 is allowed to increase from about 0.3 seconds by 0.1 to 0.4 seconds, while at 50 Hz even decay times of up to two seconds do not make a noticeable difference.
Perception of Temporal Decay of Low-frequency Room Modes
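
The stimulus described above, a single synthetic mode with a given decay time T60, can be sketched as an exponentially decaying sinusoid whose envelope has fallen by 60 dB at t = T60. The following is an illustration only; the function names, sample rate and duration are assumptions, and the magnitude equalization used in the listening tests is not modelled.

```python
import math

def decay_envelope(t, t60):
    """Amplitude envelope that has dropped by 60 dB (factor 1000) at t = t60."""
    return 10.0 ** (-3.0 * t / t60)

def level_db(amplitude):
    """Amplitude ratio expressed in decibels."""
    return 20.0 * math.log10(amplitude)

def synthetic_mode(freq_hz, t60, sample_rate, duration):
    """Decaying sinusoid standing in for a single room mode."""
    n_samples = int(sample_rate * duration)
    return [
        decay_envelope(n / sample_rate, t60)
        * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]

# The two decay times at 100 Hz mentioned in the abstract.
reference = synthetic_mode(100.0, 0.3, sample_rate=8000, duration=1.0)
lengthened = synthetic_mode(100.0, 0.4, sample_rate=8000, duration=1.0)

# Envelope level of each mode 0.3 s after onset.
level_reference = level_db(decay_envelope(0.3, 0.3))
level_lengthened = level_db(decay_envelope(0.3, 0.4))
```

At 0.3 s the reference mode has decayed by the full 60 dB while the lengthened one has decayed by 45 dB, which gives a feel for the size of the difference reported as the threshold at 100 Hz.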

6084
Hameed, Sharaf; Pakarinen, Jyri; Valde, Kari; Pulkki, Ville
The ability of human listeners to estimate the size of a room from the acoustical response of that room is an interesting and not yet thoroughly examined phenomenon. This study uses simulated multi-channel room impulse responses convolved with speech signals as stimuli in listening tests to explore the perception of room size. The synthetic room impulse responses contained two adjustable parameters, and our goal was to study how these parameters affect the perceived size of this virtual room. Listening tests were conducted to test the effect of reverberation time and the direct to reverberant energy ratio (D/R ratio). Sound samples with different parameter settings were presented as stimuli in a paired comparison test procedure. The results reveal that reverberation time is unequivocally the most important parameter. It appears that D/R ratio is not used in room size perception.
Psychoacoustic Cues in Room Size Perception

6085
Kozlowski, Piotr; Dobrucki, Andrzej
This paper presents research on objective methods that use psychoacoustic knowledge to estimate the quality of audio signals. Software written especially for this research is presented. The program allows different methods for evaluating the quality of perceptually coded audio signals to be implemented. The protocols PAQM, PSQM, NMR, PEAQ and PESQ are ready to use. All of these algorithms simulate the auditory system. The software is open to the addition of further protocols as plug-ins, and existing protocols can be changed and improved. Suggested changes that improve the results of objective evaluation are presented. The optimization criterion is the difference between the results of subjective and objective evaluation tests.
Proposed Changes to the Methods of Objective, Perceptual Based Evaluation of Compressed Speech and Audio Signals

6086
Blech, Dominic; Yang, Min-Chi
To study perceptual discrimination between two digital audio coding formats, "Direct Stream Digital" and high-resolution (24-bit, 176.4 kHz) PCM, subjective listening comparison tests were conducted with specially recorded sound stimuli in stereo and surround. To guarantee their reliability, validity and objectivity, the double-blind ABX tests followed three main principles: The signal chain should be based on identical audio components as far as possible; these components should be able to convey very high audio frequencies; and the test population should consist of various groups of subjects with different listening expectations and perspectives. The results showed that hardly any of the subjects could make a reproducible distinction between the two encoding systems. Hence it may be concluded that no significant differences are audible.
DVD-Audio versus SACD: Perceptual Discrimination of Digital Audio Coding Formats

6087
Sarris, John C.; Stefanakis, Nick J.; Cambourakis, George E.
Multichannel equalisation is generally accomplished by designing inverse filters to remove the distortion associated with the transmission paths between a set of sources and receivers. The filters are estimated by minimising a cost function based on the least squares error criterion. However, under certain conditions this least squares error based formulation fails to provide a solution or provides a solution that lacks robustness. These conditions are investigated and modifications are introduced in the definition of the cost function so that the problem always has a solution with increased robustness. Moreover, the multiple error LMS algorithm is employed to adapt the filter coefficients to their optimum values. Issues such as convergence speed and stability are discussed and simulation results are presented.
Signal Processing Techniques for Robust Multichannel Sound Equalisation

6088
Miller, Ray
Equalizers with fixed frequency filter bands, although successful, have historically had a combined frequency response that at best only roughly matches the band amplitude settings. This situation is explored in practical terms with regard to equalization methods, filter band interference, and desirable frequency resolution. Fixed band equalizers generally use second-order discrete filters. Equalizer band interference can be better understood by analyzing the complex frequency response of these filters and the characteristics of combining topologies. Response correction methods may avoid additional audio processing by adjusting the existing filter settings in order to optimize the response. A method is described which closely approximates a linear band interaction by varying bandwidth, in order to efficiently correct the response.
Equalization Methods with True Response using Discrete Filters

6089
Ramos, German; Lopez, Jose Javier
This paper presents a novel method for audio equalization using IIR (Infinite Impulse Response) filters. The algorithm is based on a direct method with a random parametric optimization process using second order sections (RaPOSOS). Given a loudspeaker response and the definition of the desired electro-acoustical target response, an optimized filter is obtained. For full band loudspeakers, a bank of peak filters is designed to perform the equalization. For multiway systems, the process is repeated for each way with bandpass targets using lowpass, highpass and peak filters, computing the combined response and performing time-alignment correction. The final result provides the parameters that define each filter (frequency, gain, Q), ordered by their importance to the correction, starting with the filters that yield the greatest improvement, so that scalable solutions with different degrees of correction can be derived.
Direct Method with Random Optimization for Parametric IIR Audio Equalization - Applications to One Way and Multiway Systems
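
Each filter in the abstract above is defined by three parameters (frequency, gain, Q). As a minimal sketch of how such a triple maps to a second order section, the block below uses the widely circulated "Audio EQ Cookbook" peaking-filter formulas. That choice is an assumption: the abstract does not say which peak filter design is used, and the random optimization itself is not shown. All function names are hypothetical.

```python
import cmath
import math

def peak_filter_coefficients(freq_hz, gain_db, q, sample_rate):
    """Biquad coefficients (b, a) of a peaking filter, normalized so a[0] == 1."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    return [x / a[0] for x in b], [x / a[0] for x in a]

def response_db(b, a, freq_hz, sample_rate):
    """Magnitude response of one section in dB at the given frequency."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate)
    numerator = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    denominator = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20.0 * math.log10(abs(numerator / denominator))

def bank_response_db(filters, freq_hz, sample_rate):
    """Response of a cascade of (frequency, gain_db, q) peak filters in dB."""
    total = 0.0
    for center_hz, gain_db, q in filters:
        b, a = peak_filter_coefficients(center_hz, gain_db, q, sample_rate)
        total += response_db(b, a, freq_hz, sample_rate)
    return total

# Hypothetical two-filter correction: a cut at 120 Hz and a boost at 3 kHz.
bank = [(120.0, -4.0, 3.0), (3000.0, 2.5, 1.5)]
correction_at_3k = bank_response_db(bank, 3000.0, 48000.0)
```

Because the sections are cascaded, their dB responses add, which is why a list ordered by importance can be truncated to obtain a cheaper, coarser correction.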

6090
Watson, Matthew A.; Ganju, Vineet; Maur, Gaganjot
Many audio algorithms, such as room simulators and reverberators, operating on Digital Signal Processors access large delay buffers in a non-sequential fashion. Generally, these delay buffers are too large to reside in the on-chip memory of the processor, so they must be placed in external, slow memories. Furthermore, the non-sequential accesses present a problem for maintaining high performance. This paper presents a number of methods that may be employed to improve the performance of the memory accesses of such algorithms. Methods examined include the use of direct CPU memory access, hardware data cache, and dedicated Direct Memory Access (DMA) controllers. Additionally, the algorithm, sample block size, delay taps, tap spacing, and buffer size will be examined and performance results will be presented.
Performance Improvements for Audio Algorithms that Use Non-sequential Memory Accesses on Digital Signal Processors

6091
Cheng, Corey
This paper introduces a method for estimating the magnitude and phase responses in audio coders which employ the Modified Discrete Cosine Transform (MDCT). This technique computes magnitude and phase estimates at the decoder using two pieces of information: 1) MDCT coefficients transmitted by the encoder; 2) an estimate of the Modified Discrete Sine Transform (MDST) computed from the transmitted MDCT coefficients. In this manner, approximate magnitude and phase estimates suitable for use with some decoder-oriented signal processing techniques can be constructed entirely from MDCT coefficients available at the decoder. We show that these approximate methods are less computationally intensive than exact methods, and we compare the performance of the approximate methods to exact methods.
Method for Estimating Magnitude and Phase in the MDCT Domain
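
The relationship between MDCT, MDST, magnitude and phase that the abstract above relies on can be sketched as follows. Note the assumptions: the block computes the MDST exactly from the time-domain frame (the kind of exact reference the paper compares against), not the paper's estimate from MDCT coefficients; no window is applied; and the phase sign convention is a choice made for this example.

```python
import math

def mdct_mdst(frame):
    """Direct O(N^2) MDCT and MDST of a frame of 2N samples (no window)."""
    n_coeffs = len(frame) // 2
    n0 = 0.5 + n_coeffs / 2.0
    mdct, mdst = [], []
    for k in range(n_coeffs):
        c = 0.0
        s = 0.0
        for n, x in enumerate(frame):
            theta = math.pi / n_coeffs * (n + n0) * (k + 0.5)
            c += x * math.cos(theta)
            s += x * math.sin(theta)
        mdct.append(c)
        mdst.append(s)
    return mdct, mdst

def magnitude_phase(mdct, mdst):
    """Treat MDCT - j*MDST as a complex spectrum (sign convention assumed)."""
    magnitudes = [math.hypot(c, s) for c, s in zip(mdct, mdst)]
    phases = [math.atan2(-s, c) for c, s in zip(mdct, mdst)]
    return magnitudes, phases

# Hypothetical test frame: a sinusoid centred exactly on bin k0 with phase phi.
N, k0, phi = 16, 5, 0.7
frame = [
    math.cos(math.pi / N * (n + 0.5 + N / 2.0) * (k0 + 0.5) + phi)
    for n in range(2 * N)
]
mdct, mdst = mdct_mdst(frame)
magnitudes, phases = magnitude_phase(mdct, mdst)
```

For this frame the MDCT coefficient alone depends on `phi`, whereas the combined magnitude at bin `k0` does not, which is the reason a decoder-side MDST estimate is useful.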

6092
Reiss, Joshua D.; Sandler, Mark B.
This work explores the effects of limit cycles on the frequency content in the DSD bitstream. We show how any periodic bitstream can be expressed as a sum of square waves of various phases with width equal to the sampling period. A Fourier expansion may be used to exactly determine the phases and amplitudes of all spectral content. We thus determine all harmonics that appear in the output, and thus are able to distinguish limit cycles from idle tones. These results are put into the context of recent advances in the theory of limit cycles and idle tones in sigma delta modulators.
The Harmonic Content of a Limit Cycle in a DSD Bitstream
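
The claim above that the spectral content of a periodic bitstream can be determined exactly by a Fourier expansion can be illustrated with a discrete Fourier series over one period. This is a sketch under assumptions: bits are mapped to levels +1 and -1, the result describes the discrete-time sequence only (the shape of the held output pulse is ignored), and the function name is hypothetical.

```python
import cmath

def harmonic_amplitudes(period_bits):
    """Spectral lines of an endlessly repeated bit pattern.

    Returns a list of (frequency as a fraction of the sample rate, amplitude)
    for the lines from DC up to half the sample rate.
    """
    length = len(period_bits)
    levels = [1.0 if bit else -1.0 for bit in period_bits]
    lines = []
    for k in range(length // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * cmath.pi * k * n / length)
            for n, x in enumerate(levels)
        ) / length
        # DC and the half-sample-rate line have no mirrored partner.
        scale = 1.0 if k == 0 or 2 * k == length else 2.0
        lines.append((k / length, scale * abs(coefficient)))
    return lines

# Hypothetical limit cycle with a period of four samples: 1, 1, 0, 0, ...
lines = harmonic_amplitudes([1, 1, 0, 0])
```

Only frequencies that are multiples of the sample rate divided by the period length can appear, so a limit cycle produces a fixed, finite set of spectral lines.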

6093
Lipshitz, Stanley P.; Vanderkooy, John
This is Part 4 of an ongoing investigation into the behaviour of 1-bit sigma-delta modulators. In this paper we question the usual concept of the "average quantizer gain" as it applies to the quantizer transfer characteristic of a 1-bit modulator. We show that the concept is rather nebulous and does not help us to understand the operation of the 1-bit modulator. But our investigation of a number of possible alternative definitions of the gain shows that some of them do yield stable values which may have some significance.
Towards a Better Understanding of 1-Bit Sigma-Delta Modulators - Part 4

6094
Zhang, Haihua; Busbridge, Simon C.; Fryer, Peter A.
The resolution of true digital loudspeakers is currently limited by their physical construction. Transducer arrays require 2 to the Nth power minus 1 speaklets and multiple voice coil topologies require N coils (N = the number of bits). Oversampling and noise shaping have been used to maintain resolution with fewer bits. Results are presented where the oversampled signal falls both within and outside of the bandwidth of the radiator. A linear model is being developed to understand the observations. The radiator displacement shows little difference between the original and oversampled cases. It is concluded that the limited bandwidth of existing acoustical radiators is advantageous in acting as the re-integration filter. In circumstances where this is not possible the auditory system may perform this task.
Bit Expansion in Digital Loudspeakers with Oversampling and Noise Shaping
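
The hardware counts stated in the abstract above (2 to the Nth power minus 1 speaklets for a transducer array, N coils for a multiple voice coil topology) are easy to tabulate, which shows why reducing the number of bits matters. The function names are hypothetical and the bit depths chosen are only examples.

```python
def speaklets_required(bits):
    """Transducer array: 2**N - 1 speaklets for N bits."""
    return 2 ** bits - 1

def voice_coils_required(bits):
    """Multiple voice coil topology: one coil per bit."""
    return bits

# Example bit depths (assumed for illustration).
table = [
    (bits, speaklets_required(bits), voice_coils_required(bits))
    for bits in (4, 8, 16)
]
for bits, speaklets, coils in table:
    print(f"{bits:2d} bits: {speaklets:5d} speaklets or {coils:2d} coils")
```

The speaklet count doubles with every added bit, so each bit saved through oversampling and noise shaping roughly halves the size of the array.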

6095
Larsen, Peter
The frequency response of a loudspeaker cone is affected by two main factors: Material Parameters and Geometry. While the first may be generally understood, the inherent stiffness due to the basic geometry is the subject of this study. Using Finite Element Modelling (FEM), first a flat cone disk is analysed, followed by shallow and deep conical cones plus curved concave and convex cones. The results are extended to include softer and high damping cone materials. The cone break-up behaviour and frequency response are shown to be strongly dependent on the Geometrical Stiffness of the Cone, which should therefore be considered a very important design parameter.
Geometrical Stiffness of Loudspeaker Cones

6096
Fontanesi, Lorenzo; Salvini, Alessandro
Demodulation rings can offer many advantages in terms of harmonic distortion reduction for high power 18-inch woofers. This paper presents a circuit approach to evaluate the effects of properly shaped aluminum short circuit rings on woofer performance. To find the Laplacian force that acts on the voice coil, the proposed approach makes use of the partial inductance calculation method [1] to evaluate the distribution of eddy currents in the massive aluminum ring conductors. By partitioning the conductor into cells and modeling each cell by an equivalent circuit, this method gives results with a maximum error of 6% when simulations are compared with measurements.
A Circuit Approach to Short Circuit Ring Design for High Power Woofers

6097
Behler, Gottfried K.; Makarski, Michael
For the numerical simulation (BEM) of horns, the sound velocity distribution at the horn throat is required as one boundary condition. It is common to use plane wave excitation even at high frequencies since the shape of the real wave front in general is unknown. The error in the simulation result (directivity / frequency response) is difficult to predict and can only be judged by measurement of the real system. To achieve accurate simulation results the specific velocity distribution of each driver is required which must be measured at the interface between horn driver and horn. A more general approach for simulation techniques is created using modal composition. Measurements and simulations of different systems are compared to verify this method.
On the Velocity Distribution at the Interface of Horn Driver and Horn

6098
Makarski, Michael
The basic theory and a measurement procedure for the two-port description of horn drivers and horns were presented at the 111th AES Convention in New York, 2001 (Preprint 5409 "Two-port Representation of Horn Driver and Horn"). It was shown that this method is a powerful tool for the development of loudspeakers, but it suffered from the restricted frequency range of the necessary acoustical impedance measurements with the Kundt's tube. A new method of measuring the driver's two-port parameters is presented here using only electrical measurements and an acoustical reference impedance. The frequency range of the two-port parameters could be extended using this method. The theoretical approach and first results are presented.
Determining Two-Port Parameters of Horn Drivers using only Electrical Measurements

6099
Doldi, Davide; Mocellin, Marco; Antinori, Paolo; Orsoni, Remo; Santarelli, Giorgio; Di Cola, Mario; Grifoni, Rinaldo
High output loudspeaker systems, particularly horn loaded loudspeaker systems, are often severely affected by unwanted structural resonances due to the high sound pressure generated locally. This sound pressure strongly excites the cabinet's structural modes. An experimental procedure aimed at minimizing these resonances is presented. The method is based on FEM structural analysis techniques validated by accelerometer measurements.
Analysis and Minimization of Unwanted Resonances in Loudspeaker Systems via FEM Techniques

6100
Leitao, Jorge; Ferreira, Anibal J. S.; Fernandes, Gabriel
This paper addresses the implementation of a real-time 20-band adaptive digital audio equalizer for room equalization. The system has been implemented on a TMS320C6711 DSP platform and performs adaptive filtering using techniques of fast filtering in the frequency domain that include an adaptation procedure. The paper explains how the structure of a previously designed graphic equalizer has been improved to support adaptivity, describes its operation as well as its functionality based on a graphical user interface, and presents the results of tests that have been conducted to optimize its performance.
Adaptive Room Equalization in the Frequency Domain

6101
Mourjopoulos, John N.; Vassilantonopoulos, Stamatis L.
Virtual acoustics can assist the aural exploration and the study of the acoustic properties of famous buildings of antiquity. Here, examples of such reconstructions of ritual and public buildings of the ancient Greek city of Olympia are presented and findings on their acoustic behavior are introduced, especially with respect to the modes of speech communication and general functionality. Examples of these auralisations are presented and made available at an electronic address.
Acoustic Reconstruction of Buildings in the Ancient City of Olympia

6102
Goussios, Christos A.; Dimoulas, Charalampos A.; Kalliris, George M.; Papanikolaou, George V.; Fouloulis, Athanassios G.
The purpose of this work is the design and implementation of a software application for the estimation of the acoustic behaviour of a rectangular room, when a number of sound sources are activated. The room dimensions, the number and positions of the sources can be modified. Materials are chosen from a library. Sound level distribution is calculated for a desired section of the room, using the image source method. Room modes are calculated for studying the standing waves. Reverberation times are also calculated using statistical formulas. Effort has been made for the use of this software in non-rectangular rooms, based on different estimation methods.
A Software Application for Estimation of Room Acoustic Behaviour by Multi Source Excitation
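
The two simpler calculations named in the abstract above, room modes and a statistical reverberation time, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which statistical formula the authors use, so the Sabine formula is assumed here, and the function names, the speed of sound and the example room are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def room_mode_frequency(nx, ny, nz, lx, ly, lz, c=SPEED_OF_SOUND):
    """Frequency in Hz of mode (nx, ny, nz) in a rigid-walled rectangular room."""
    return (c / 2.0) * math.sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)


def sabine_rt60(volume, surfaces):
    """Sabine reverberation time in seconds.

    volume   -- room volume in cubic metres
    surfaces -- iterable of (area in square metres, absorption coefficient)
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    if total_absorption <= 0.0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume / total_absorption


# Hypothetical 5 m x 4 m x 3 m room with a uniform absorption coefficient of 0.1.
first_axial_mode = room_mode_frequency(1, 0, 0, 5.0, 4.0, 3.0)
rt60 = sabine_rt60(5.0 * 4.0 * 3.0, [(94.0, 0.1)])
```

The image source method used for the level distribution is not covered by this sketch.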

6103
Goussios, Christos A.; Kalliris, George M.; Papanikolaou, George V.; Sevastiadis, Christos V.
Apart from the world-famous ancient Greek theaters, whose acoustics have often attracted engineers, smaller closed amphitheatric halls called Odea (plural of the Greek word Odeion) were constructed and used throughout the Greek and Roman periods. The acoustical characteristics of some of them, together with information concerning their location, use, history and architectural elements, are presented. An effort was made to model and estimate their acoustics. Results of measurements that were also carried out are discussed.
The Acoustics of Ancient Greek Odea

6104
Völker, Ernst-Joachim; Teuber, Wolfgang
In 1931 the Haus des Rundfunks in Berlin was opened with a ceremony in studio 3, after a hectic 8 years of radio history that began in 1923 with the first transmission from the FOX Haus near Potsdamer Platz (now the Sony Center). In 1929 UFA decided to start producing sound films (at first only 50% of its productions) after 5 years of financial pressure and hard decisions; for instance, more than 10000 musicians in film theaters had to be dismissed to reduce costs. UFA in Babelsberg immediately built the so-called Tonkreuz, with 4 huge halls and a center for sound mixing in the middle of the cross. Quality requirements became stricter, so sound-approved film studios became necessary. In Babelsberg the existing halls of the Tonkreuz were fitted with a house-in-house or room-in-room construction with double walls, roof, doors and windows.
On the Acoustics of Old Berlin Studios for Film and Radio

6105
Szczuko, Piotr; Dalka, Piotr; Dabrowski, Marcin; Kostek, Bozena
The objective of this paper is to determine which of the MPEG-7 standard low-level sound descriptors are the most significant in the process of automatic classification of musical instrument sounds. First, pitch detection is performed. Then, the parametrization stage of musical sounds based on descriptors contained in the MPEG-7 standard is carried out. Next, a thorough statistical analysis of the feature vectors obtained is performed. For the purpose of automatic classification, two decision systems, based on artificial neural networks (ANNs) and rough sets, are used. Both decision systems are trained with feature vectors consisting mostly of parameters contained in the MPEG-7 standard, although their content is reduced after the statistical analyses. In addition, the results obtained by these decision systems are compared with those obtained from the nearest neighbor algorithm.
MPEG-7-based Low-Level Descriptor Effectiveness in the Automatic Musical Sound Classification

6106
Purwins, Hendrik; Blankertz, Benjamin; Obermayer, Klaus; Dornhege, Guido
In this paper we introduce and explore a method for extracting low-dimensional features from digitized recordings of music performance: the so-called constant Q scale degree profiles are 12-dimensional vectors that reflect the prominence of the 12 scale degrees in the respective analyzed part of the music. Here we study the type and amount of information that is captured in those profiles when calculated from whole short pieces of piano music. The analyzed data set includes pieces from Bach's Well-Tempered Clavier (WTC), part I and II, the sets of preludes that encompass a piece in every key by Chopin (op.28), Alkan (op.31), Scriabin (op.11), Shostakovich (op.34), and the fugues of Hindemith's Ludus Tonalis (one fugue for each pitch class, neither major nor minor). For the purpose of investigation we employ supervised and unsupervised machine learning techniques. In a supervised approach we investigated the ability of classifiers to recognize composers from profiles. As unsupervised methods we performed (1) a cluster analysis, which resulted in one major and one minor cluster, and (2) a visualization technique called Isomap, which reveals in its 2-dimensional representation some additional structure apart from the major-minor duality. In summary, it is astonishing how much information on a music piece is contained in the 12-dimensional profiles, which can be calculated in a straightforward manner from any digitized music recording.
Scale Degree Profiles from Audio Investigated with Machine Learning
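
The idea of a 12-dimensional scale degree profile can be illustrated with a short sketch. It assumes that a 12-bin pitch class energy vector (index 0 = C, 1 = C sharp, and so on) has already been obtained, for example from a constant Q analysis that is not shown here, and that the tonic is known. The function name and the normalisation are assumptions for illustration and not the authors' exact definition.

```python
def scale_degree_profile(pitch_class_energy, tonic):
    """Rotate a 12-bin pitch class vector so the tonic lands at index 0.

    pitch_class_energy -- 12 non-negative values, index 0 = C
    tonic              -- pitch class of the tonic, 0..11
    Returns the rotated vector normalised to sum to 1.
    """
    if len(pitch_class_energy) != 12:
        raise ValueError("expected 12 pitch classes")
    if not 0 <= tonic < 12:
        raise ValueError("tonic must be in the range 0..11")
    rotated = list(pitch_class_energy[tonic:]) + list(pitch_class_energy[:tonic])
    total = sum(rotated)
    if total <= 0:
        raise ValueError("profile has no energy")
    return [value / total for value in rotated]


# Hypothetical energies: most energy on G (7), some on B (11), tonic G.
example = [0.0] * 12
example[7] = 3.0
example[11] = 1.0
profile_in_g = scale_degree_profile(example, tonic=7)
```

After the rotation, pieces in different keys become comparable because index 0 always holds the tonic, index 4 the major third above it, and so on.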

6107
Vieira, Jose
The correct estimation of the reverberation time of a room can be an important task for several systems, such as sound localizers, hearing aids and telephony. These systems are affected by reverberation and need an estimate of this acoustic parameter in order to adapt their algorithms to different environments. This article presents a method to estimate the reverberation time of a room without using test signals. From the signals captured in the room, the system is able to estimate the reverberation time without any prior knowledge of the sound sources or room geometry. The estimates are obtained from the "tails" of the sounds, and we use a run-length energy integral followed by an algorithm that estimates the decay of the sound energy.
Automatic Estimation of Reverberation Time
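
As a rough illustration of estimating a decay rate from the tail of a sound, the sketch below applies the standard Schroeder backward energy integration to a single tail that has already been isolated, and fits a straight line to the decay curve in dB. This is a swapped-in textbook technique, not the author's algorithm: the abstract does not specify the energy integral or the decay estimator in detail, and the function names and the -5 dB to -25 dB fitting range are assumptions.

```python
import math


def schroeder_decay_db(samples):
    """Backward-integrated energy decay curve in dB, normalised to 0 dB at the start."""
    energy = 0.0
    curve = []
    for x in reversed(samples):
        energy += x * x
        curve.append(energy)
    curve.reverse()
    total = curve[0]
    if total <= 0.0:
        raise ValueError("signal has no energy")
    # Trailing zero-energy points are dropped to avoid log10(0).
    return [10.0 * math.log10(e / total) for e in curve if e > 0.0]


def estimate_rt60(samples, sample_rate, upper_db=-5.0, lower_db=-25.0):
    """Fit a line to the decay curve between upper_db and lower_db, extrapolate to -60 dB."""
    decay = schroeder_decay_db(samples)
    points = [(i / sample_rate, d) for i, d in enumerate(decay)
              if lower_db <= d <= upper_db]
    if len(points) < 2:
        raise ValueError("not enough decay range for a fit")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_d = sum(d for _, d in points) / n
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in points)
             / sum((t - mean_t) ** 2 for t, _ in points))  # dB per second
    return -60.0 / slope
```

A blind estimator of the kind described in the abstract would additionally have to locate suitable tails in running signals, which this sketch does not attempt.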

6108
Goldberg, Andrew; Makivirta, Aki
We compare the room response controls available in active loudspeakers to a third-octave graphical equaliser. The room response controls are set using an automated optimisation method presented in earlier AES publications. A third-octave ISO frequency constant-Q graphic equaliser is set to minimise the least squares deviation from linear within the passband in a smoothed acoustical response. The resulting equalisation performance of the two methods is compared using objective metrics, to show how these standard room response equalising methods perform. For all loudspeaker models pooled together, the room response controls improve the RMS deviation from a linear response from 6.1 dB to 4.7 dB (improvement 22%), whereas graphic equalisation improves the RMS deviation to 1.8 dB (improvement 70%). Both equalisation techniques achieve a similar improvement in the broadband balance, which has been shown to affect a subjective lack of colouration in sound systems. The optimisation time for a graphic equaliser is up to 48 times longer compared to that for active loudspeaker room response controls.
Performance Comparison of Graphic Equalisation and Active Loudspeaker Room Response Controls

6109
Huon, Graeme; Velican, Zeljko
Most important was not to forget the training of all technicians from every unit, such as post-production, studio, OB facility, continuity and transmission. Even the (non-sound-minded) editors who fill in all the production aspects in an off-line video facility need some facts on how to judge loudness. The external production units of advertising trailers and programs should also be given the necessary information.
Spatially Consistent Reproduction of the Reverberant Sound Field

6110
Ferekidis, Lampos; Kempe, Uwe
Most low frequency sources radiate energy in an omnidirectional manner. This often leads to unsatisfactory reproduction of low frequencies in small listening rooms. The influence of different radiation characteristics on the reproduction of low frequencies in a sparsely modal environment is investigated. In this paper the room transfer function characteristics of a monopole, a dipole, and a cardioid are compared. The different room mode excitation mechanisms are explained using comparative measurements taken in a reverberation chamber. Furthermore, the effect of a single reflective boundary on the low frequency response is simulated. The cardioid turns out to be the most preferable low frequency source of the three types investigated.
The Beneficial Coupling of Cardioid Low Frequency Sources to the Acoustics of Small Rooms
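
For orientation, the three source types compared above differ in their free-field directivity, which for ideal first-order sources can be written as a + (1 - a) cos(theta). The sketch below evaluates this textbook pattern; it does not model the room coupling studied in the paper, and the names are chosen for illustration only.

```python
import math

# Weight of the omnidirectional component for ideal first-order sources.
PATTERNS = {"monopole": 1.0, "dipole": 0.0, "cardioid": 0.5}


def directivity(pattern, angle_deg):
    """Relative free-field pressure of an ideal first-order source at angle_deg."""
    a = PATTERNS[pattern]
    return a + (1.0 - a) * math.cos(math.radians(angle_deg))


for name in PATTERNS:
    front, side, rear = (directivity(name, angle) for angle in (0, 90, 180))
    print(f"{name:9s} front={front:+.2f} side={side:+.2f} rear={rear:+.2f}")
```

The ideal cardioid radiates nothing to the rear, while the dipole has its null to the side and radiates in antiphase to the rear.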

6111
Backman, Juha
The one-cycle time offset between the high-pass and low-pass sections typical of symmetrical constant-amplitude crossover networks implies that the polar pattern is controlled by a single driver (or driver group) during the onset and end of a sharp transient. This implies that the ratio of overall radiated energy to the input energy near the crossover frequency depends on the duration of the transient, which in turn affects the sound pressure in a reverberant field.
Polar Pattern and Energy Response of Transients in Multi-way Loudspeakers

6112
Beigelbeck, Roman; Pichler, Heinrich
In security-relevant workspaces, such as air traffic control rooms, near field beam forming in small spaces is an important task. In this paper, a sound design based on a set of n linear loudspeaker arrays, each consisting of m elliptic loudspeakers, is investigated from a mathematical point of view. Based on these results, optimized array parameters are determined and useful approximations are developed. Three-dimensional near field directional diagrams of the sound pressure in front of the arrays are shown to visualize the sound field. These diagrams are plotted and evaluated for different frequencies and distances of the field point, in addition to variations in the control signal phases and amplitudes. Finally, these theoretical values are compared with practical results.
Near Field Beam Forming in Security Relevant Workspaces Using a Set of Linear Loudspeaker Arrays

6113
Olive, Sean E.
Part I of this paper presents the objective measurements and listening test results on 13 loudspeakers rated according to preference, spectral balance and distortion. In part II the data provides the framework for the development and verification of a multiple regression model that predicts listeners' preferences based on objective measurements. We review relevant predictive models and test one model currently used by Consumers Union (CU), a consumer product testing organization in the United States. There is no correlation between listeners' loudspeaker preference ratings and CU's predicted accuracy scores (r = 0.05; p = .81). As the CU model is based largely on the loudspeaker's 1/3-octave sound power response we conclude that measured sound power, alone, cannot accurately predict its perceived sound quality.
A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part I - Listening Test Results

6114
Torres-Guijarro, Soledad; Beracoechea-Alava, Jon Ander; Casajus-Quiros, F. Javier; Perez-Garcia, Isidoro
The Karhunen-Loeve Transform (KLT) has proven to be an efficient method of decorrelating multichannel signals prior to coding. Careful bit rate allocation among decorrelated channels reduces the overall bit rate. In order to explore how bits are distributed in the coding process, a new quality measure of the reconstructed sound field is proposed: the binaural signal that the listener would perceive in a real environment is synthesized and evaluated by means of the standard Perceptual Audio Quality Measure (PEAQ). Results on coding via AAC with different kinds of audio signals, bit allocations and multichannel arrangements are reported.
Coding Strategies and Quality Measure for Multichannel Audio
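
The decorrelation step can be illustrated for the simplest case of two channels, where the KLT reduces to a rotation by an angle derived from the 2 x 2 covariance matrix. This is a minimal sketch with hypothetical names; the paper deals with multichannel arrangements, and its framing and bit allocation strategy are not reproduced here.

```python
import math


def covariance(x, y):
    """Covariance of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def klt_two_channel(left, right):
    """Rotate (left, right) onto the eigenvectors of their covariance matrix.

    Returns (primary, secondary, theta); primary carries the larger variance
    and the two outputs are uncorrelated.
    """
    var_l = covariance(left, left)
    var_r = covariance(right, right)
    cov_lr = covariance(left, right)
    theta = 0.5 * math.atan2(2.0 * cov_lr, var_l - var_r)
    cs, sn = math.cos(theta), math.sin(theta)
    primary = [cs * l + sn * r for l, r in zip(left, right)]
    secondary = [-sn * l + cs * r for l, r in zip(left, right)]
    return primary, secondary, theta


# Hypothetical, strongly correlated stereo samples.
left = [1.0, -2.0, 3.0, -1.0, 0.5, 2.5]
right = [0.9, -1.8, 3.2, -1.1, 0.4, 2.0]
primary, secondary, theta = klt_two_channel(left, right)
```

Because the rotation is orthogonal, the decoder can invert it exactly once theta is known. Most of the signal energy ends up in the primary channel, which is what allows fewer bits to be spent on the secondary one.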

6115
Hosoi, Shintaro; Hamada, Hiroyuki; Kameyama, Nobuo
In this paper, we raise the issue of bass reproduction in surround music when an LFE channel is used. We show that this issue originates from the method of creating the LFE signal. We therefore propose a practicable method, "LFE phase sync", that improves the quality of the bass by applying the proper amount of delay. The optimum delay for this method is calculated for various filter cutoffs and orders. We describe how the method can be used in actual recording projects and discuss the monitoring method when an encoder is used.
An Improvement in Sound Quality of LFE by Flattening Group Delay

6116
Montoya, Sebastien; Bruno, Remy; Laborie, Arnaud
Multichannel recording is certainly one of the most important remaining issues in today's sound techniques. A good surround recording is extremely difficult to obtain, as it must fulfill a number of conditions, including a feeling of envelopment, accurate localisation and a large sweet spot, without compromising the timbres. Advanced signal processing makes it possible to obtain directivities designed from panning laws, which have been designed to optimally drive any multichannel layout. This paper presents the underlying concept of High Spatial Resolution, the spatial equivalent of High Fidelity, and points out why this is a key point in achieving high spatial quality. The actual performance of such a High Spatial Resolution 5.0 microphone, featuring a small array of 8 omnidirectional capsules, is fully simulated and measured.
High Spatial Resolution Multichannel Recording

6117
Pellegrini, Renato; Kuhn, Clemens
Wave Field Synthesis (WFS) provides holographic sound reproduction for a large listening area. Fundamentals of WFS recording and reproduction techniques have been developed in the past few years; however, there is a lack of intuitive tools for WFS mixing and mastering. In this paper the authors propose a WFS user interface compatible with available and accepted digital audio workstations. These WFS plug-ins are based on a novel audio network technology. They open new possibilities for creative audio production in WFS.
Wave Field Synthesis: Mixing and Mastering Tools for Digital Audio Workstations

6118
Strauss, Michael; Wagner, Andreas; Walther, Andreas; Melchior, Frank
Wave Field Synthesis (WFS) permits the reproduction of a sound field that fills nearly the whole reproduction room, with correct localization and spatial impression over a wide listening area. So far, this technique has been mainly used and demonstrated for music reproduction. Because of its properties, WFS is ideal for the creation of sound for motion picture or virtual reality applications. In both cases the creation of highly immersive atmospheres is important to give the audience the illusion of being a part of the auditory scene. In this paper a new approach to designing immersive atmospheres (e.g. rain) using Wave Field Synthesis reproduction is presented. New tools and techniques to control and generate these atmospheres have been developed and investigated in listening tests.
Generation of Highly Immersive Atmospheres for Wave Field Synthesis Reproduction

6119
Buchner, Herbert; Spors, Sascha; Rabenstein, Rudolf
Wave field synthesis is an auralization technique which allows control of the wave field within the entire listening area. However, reflections in the listening room interfere with the auralized wave field and may impair the spatial reproduction. Active listening room compensation aims at reducing these impairments by using the playback system. Due to the high number of playback channels used for wave field synthesis, the existing approaches to room compensation are not applicable. A novel approach to active room compensation overcomes these problems by a transformation from the space-time to the wave domain and application of wave-domain adaptive filtering.
Efficient Active Listening Room Compensation for Wave Field Synthesis

6120
Buchner, Herbert; Spors, Sascha; Kellermann, Walter
For high-quality multimedia communication systems such as telecollaboration or virtual reality applications, both multichannel sound reproduction and full-duplex capability are highly desirable. Full 3D sound spatialization over a large listening area is offered by wave field synthesis, where arrays of loudspeakers generate a prespecified sound field. However, before this new technique can be utilized for full-duplex systems with microphone arrays and loudspeaker arrays, an efficient solution to the problem of multichannel acoustic echo cancellation (MC AEC) has to be found in order to avoid acoustic feedback. This paper presents a novel approach that extends the current state of the art of MC AEC and transform-domain adaptive filtering by reconciling the flexibility of adaptive filtering and the underlying physics of acoustic waves in a systematic and efficient way. Our new framework of wave-domain adaptive filtering (WDAF) explicitly takes into account the spatial dimensions of loudspeaker arrays and microphone arrays with closely spaced transducers. Experimental results with a 48-channel AEC verify the concept for both simulated and measured room acoustics.
Full-Duplex Systems for Sound Field Recording and Auralization Based on Wave Field Synthesis

6121
Apel, Andreas; Roeder, Thomas; Brix, Sandra
Wave Field Synthesis allows the reproduction of arbitrary wave fields in a large listening area. The theoretical driving function for the loudspeakers states that a correction filter must be implemented to obtain a flat frequency response of the system. Practical implementations require an adaptation of the filter to the current source position. In this paper measurements of frequency responses for different source positions are compared. Based on those measurements a method for proper equalization of the system is proposed. Finally, results of listening tests are shown, which compare the quality of position-dependent filtering with that of position-independent filtering.
Equalization of Wave Field Synthesis Systems

6122
Putzeys, Bruno; de Saint Moulin, Renaud
The impact of clock jitter on AD/DA conversion performance is detailed for several conversion methods. Account is taken of the spectral distribution of both the jitter and the converted waveform. The inadequacy of a single "picosecond" performance figure is shown, and the use of a dBc/sqrt(Hz) specification is proposed instead.
Effects of Jitter on AD/DA Conversion - Clock and Interface Jitter Specifications
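
For context, the commonly quoted single-figure estimate relates RMS sampling jitter to the best achievable signal-to-noise ratio for a full-scale sine wave: SNR = -20 log10(2 pi f sigma). The sketch below evaluates this textbook relation. It is exactly the kind of simplification the abstract argues is insufficient, because it ignores the spectral distribution of the jitter, and it is not taken from the paper. The function name and example values are hypothetical.

```python
import math


def jitter_limited_snr_db(signal_hz, rms_jitter_s):
    """Upper bound on SNR in dB for a full-scale sine sampled with random RMS jitter."""
    return -20.0 * math.log10(2.0 * math.pi * signal_hz * rms_jitter_s)


# The same 100 ps RMS jitter costs more at higher signal frequencies.
for freq in (1e3, 10e3, 20e3):
    snr = jitter_limited_snr_db(freq, 100e-12)
    print(f"{freq / 1e3:5.0f} kHz: {snr:6.1f} dB")
```

Each doubling of the signal frequency lowers the bound by about 6 dB, which already hints at why a single jitter number without spectral information says little about audible performance.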

6123
Wolfe, Patrick J.; Howarth, Jamie
The goal of most sampling schemes is to sample the analogue signal of interest at a regular rate sufficiently high to ensure perfect reconstruction, at least in theory. Indeed, analysis and subsequent signal processing are almost always predicated on this requirement. However, the assumption of uniformly spaced samples is often invalidated in practice. Here, we describe nonuniform sampling theory, which provides a framework for the investigation and analysis of such cases. We review aspects of the theory and describe how it may be applied to practical problems of interest in audio signal processing, including those of wow and flutter in the analogue domain as well as jitter in the digital domain.
Nonuniform Sampling Theory in Audio Signal Processing

6124
Tikander, Miikka; Harma, Aki; Karjalainen, Matti
Tracking a user's movement and orientation is essential for providing realistic mobile augmented reality audio (MARA) services. For mobile use the tracking system needs to be light-weight, wearable and wireless. Binaural microphones offer a convenient and practical solution for tracking user movement and orientation. These sensors can be easily integrated with portable headphones. In addition to tracking, the microphones also offer several possibilities to control the user's acoustic environment. This paper reviews the latest results in binaural head tracking with known anchor sources and also discusses the case where there are no known anchor (reference) sources available. Some transducer issues are also discussed.
Acoustic Positioning and Head Tracking Based on Binaural Signals

6125
Paul-Taiwo, Adebunmi; Sandler, Mark B.; Davies, Mike
Most important was not to forget the training of all technicians from every unit, such as post-production, studio, OB facility, continuity and transmission. Even the (non-sound-minded) editors who fill in all the production aspects in an off-line video facility need some facts on how to judge loudness. The external production units of advertising trailers and programs should also be given the necessary information.
Feature Extractors for Music Information Retrieval: Noise Robustness

6126
Czyzewski, Andrzej; Kotus, Jozef; Rypulak, Andrzej; Pawlik, Arkadiusz; Kaczmarek, Andrzej; Zwan, Pawel
The paper presents the general characteristics of an engineered system for speech signal registration and restoration. It gives a concise description of the system's components, which form a set of advanced software tools for the registration, analysis and reconstruction of speech. The tools allow prompt searching for desired fragments of recordings and improvement of their quality through the reduction of noise, distortion and interference. Brief information is also given on selected speech reconstruction algorithms, whose use yielded an especially significant increase in the comprehension of processed speech.
A System for Multitask Noisy Speech Enhancement

6127
Zils, Aymeric; Pachet, Francois
High-level music descriptors are key ingredients for music information retrieval systems. Although there is a long tradition of extracting information from acoustic signals, the field of music information extraction is largely heuristic in nature. We present here a heuristic-based generic approach for automatically extracting high-level music descriptors from acoustic signals. This approach is based on Genetic Programming, used to build relevant features as functions of mathematical and signal processing operators. The search for relevant features is guided by specialized heuristics that embody knowledge about the signal processing functions built by the system. Signal processing patterns are used in order to control the general processing methods. In addition, rewriting rules are introduced to simplify overly complex expressions, and a caching system further reduces the computing cost of each cycle.
Automatic Extraction of Music Descriptors from Acoustic Signals Using EDS

6128
Busbridge, Simon C.; Herman, David; Haestier, Dudley
The effectiveness of conventional noise cancellation techniques is limited by tolerances between the signal path and noise path. A system is described in which the ambient noise error signal is fed back for further cancellation (Advanced ambient Noise Rejection Technology, ANRT). Two microphones slightly separated differentiate near field signals from high level ambient noise. Band limiting filters further reduce high frequency phase distortion. The effectiveness is increased such that an unintelligible signal produced by normal speech can result in an SNR improvement of 40 dB in an ambient noise field of 98 dBA. The technology can be integrated into a single, small, low power CMOS analogue integrated circuit; it is also ideally suited for MEMS (Si-Mic).
An Improved Method of Noise Rejection

6129
Goldin, Alexander A.
The paper presents the Close Talking mode of Autodirective Dual Microphone (ADM) technology developed by Alango Ltd. ADM is an adaptive beamforming technology with two operational modes. In Far Talk mode, ADM provides optimal directivity for every frequency region such that sounds coming from the back plane are cancelled. In Close Talk mode, all sounds originating outside close proximity to the microphone are (theoretically) completely cancelled. ADM's fast adaptation time leads to excellent noise cancellation in changing noisy environments. ADM technology places low demands on the placement, matching and spacing of the individual sensors, which simplifies its integration into mobile and other devices. The ADM operational mode is defined by the DSP algorithm, which is easily switched according to the situation.
Close Talking Autodirective Dual Microphone

6130
Mueller, Roland; Holstein, Peter
Digital microphones are commonly based on an LF-condenser with an ADC in the same housing. However, this concept has some disadvantages, such as the inherent problems of LF-condenser microphones with respect to the influence of humidity on sensitivity, distortion and low cut-off frequency. Therefore, another approach for digital microphones is proposed, whereby the capacity of the microphone capsule controls the frequency of an LC-type generator. The resulting non-linear distortion is of second order and similar to that of classical microphones with a vacuum tube preamplifier. A negative capacitance can be added to reduce the distortion. There are several ways to implement demodulation and digitalization; simulations show that a sufficient dynamic range can be achieved by using a special kind of sigma-delta FM discriminator.
About A Digital RF-Condenser Microphone

6131
Peus, Stephan
Condenser microphones have been used for more than 70 years in professional audio recording applications, due to their good frequency response, extended frequency range and wide dynamic range. If all parameters are properly designed, the microphone capsule will also have an excellent transient response. The basic design of studio microphone capsules today dates back several decades. Some capsules have been in production unchanged for 50 years or more. Nevertheless, the technical performance of microphones has been improved step by step by continued refinement of the associated electronic circuitry (e.g. tubes versus semiconductors, FET technology improvements, circuitry design aspects, etc.). Not until a few years ago did the quality of the electronics finally match that of the capsule in terms of self-noise level and dynamic range. However, the capsule design has also been improved by making use of technological advances and modern materials.
Modern Acoustic and Electronic Design of Studio Condenser Microphones

6132
Gorelik, Vladimir; Peissig, Juergen; Kudaev, Sergey; Schreiber, Peter
Motivated by the advantages of optical sensors, such as immunity to EMI/RFI and electrically isolated realization, today's fiber- and micro-optics technology enables the manufacturing of sensitive optical microphones. In the first part of this article a short review of applicable sensing principles is given and the pros and cons of their realization are discussed. In the second part the design, manufacturing and characterization of two fiber-coupled optical microphones employing optical sampling of a membrane are presented.
Fiber-Coupled Optical Microphones

6133
Ignatov, Pavel V.
The history of sound recording in Russia dates back to the end of the 19th century. The creation of the first sound recording studios began in the 1920s and 1930s. Although the technical facilities used then seem quite primitive, the work of such outstanding tonmeisters as Khustov M.G., Grossman A.B., Gakhlin D.G. produced outstanding recordings of classical music and live concerts. The main feature of the second period (1950s-1980s) is the fast development of TV, RB and recording studios (292 large television centers and radio studios had been built by the 1980s). Today new digital technologies and surround sound systems are used in tonmeister practice. Such masters as Shugal S.G., Vinogradov V.V., Khondrashin P.K., Dinov V.G. and many others are creating new methods of digital sound recording. The main periods of the development of tonmeister technology in Russia are investigated in this paper.
The History of the Tonmeister Recording Technique in Russia

6134
Mickiewicz, Witold
Many symphonic orchestras use a non-optimal two-microphone setup for rehearsal recordings. These recordings are used for archiving purposes and to evaluate and improve the artistic skills of the whole orchestra and its members. For these purposes, good resolution of the stereo image during reproduction is needed. The decision on the right microphone setup can be based on the geometrical parameters of the orchestra podium and the acoustical properties of the rehearsal hall. Some theoretical considerations presented in this paper are supported by real recordings made in the hall of the Philharmony of Szczecin, Poland, and by listening tests carried out by orchestra members.
Optimization of Microphone Setup for Symphonic Orchestra Recordings During Rehearsal

6135
Jeong, Daegwon; Hamada, Hareo; Jang, Daeyoung; Kang, Kyeongok; Kim, Jinwoong; Lee, Taejin
Generally, a dummy-head microphone is used for 3D audio acquisition. Because of its human-like shape, it yields good spatial images. However, its shape and size also restrict its public use. In this paper, we propose a 3D audio acquisition and reproduction method using multiple microphones on a rigid sphere. We place 5 microphones at special points on a rigid sphere and generate various audio signals for headphone, stereo, stereo dipole, 4ch and 5ch reproduction environments. Subjective reproduction experiments with 4ch and 5ch loudspeaker configurations show that front/back confusion, which is a common limitation of 3D audio reproduction systems using a dummy-head microphone, can be reduced dramatically.
3D Audio Acquisition and Reproduction System using Multiple Microphones on a Rigid Sphere

6136
Eisenberg, Gunnar; Batke, Jan-Mark; Sikora, Thomas
A Query by Tapping system is a multimedia database containing rhythmic metadata descriptions of songs. This paper presents a Query by Tapping system called BeatBank. The system allows queries to be formulated by tapping the rhythm of the requested song's melody line on a MIDI keyboard or an e-drum. The query entered is converted into an MPEG-7 compliant representation. The actual search process takes only rhythmic aspects of the melodies into account by comparing the values of the MPEG-7 Beat Description Scheme. An efficiently computable similarity measure is presented which enables the comparison of two database entries. The system works in real time and computes the search process online. It computes and presents a new search result list after every tap made by the user.
BeatBank - An MPEG-7 Compliant Query by Tapping System

6137
Batke, Jan-Mark; Eisenberg, Gunnar; Weishaupt, Philipp; Sikora, Thomas
Studio microphones developed recently for high-resolution applications are capable of sensitivity corresponding to the noise level of air particles hitting the diaphragm surface due to thermal molecular movement and at the same time have a dynamic range of 130 dB or more. This is true for both microphones using analog electronics and microphones using the most recent ADC technology.
A Query by Humming System using MPEG-7 Descriptors

6138
Kostek, Bozena; Czyzewski, Andrzej
The paper addresses the capabilities that should be expected from intelligent Web search tools in order to respond properly to users' music information retrieval needs. An advanced query algorithm was engineered employing a concept of inference rule derivation from flow graphs with regard to semantic data processing. This concept, introduced recently by Pawlak, is used for mining knowledge in databases. The database search engine created utilizes knowledge acquired in advance and stored in flow graphs in order to enable searching in musical repositories. The results show that, with the implemented method, the search matches are ranked optimally, so metadata related to recorded sound can be retrieved efficiently with this algorithm.
Music Archive Metadata Processing Based on Flow Graphs

6139
Cano, Pedro; Koppenberger, Markus; Herrera, Perfecto; Le Groux, Sylvain; Ricard, Julien; Wack, Nicolas
Audio classification methods work well when fine-tuned to reduced domains, such as musical instrument classification or simplified sound effects taxonomies. Classification methods cannot currently offer the detail needed in general sound recognition. A real-world sound recognition tool would require thousands of classifiers, each specialized in distinguishing small details, and a taxonomy that represents the real world. We describe the use of WordNet, a semantic network that organizes real-world knowledge, as the taxonomy backbone. To avoid the huge number of classifiers needed to distinguish an ever-growing number of sounds, the recognition engine uses a nearest-neighbor classifier with a database of isolated sounds unambiguously linked to WordNet concepts.
Nearest-neighbor Generic Sound Classification with a WordNet-based Taxonomy
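
The nearest-neighbor idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' engine: the two-dimensional feature vectors, the `DATABASE` contents, the WordNet-style labels and the function name `nearest_neighbor` are all hypothetical, chosen only to show how a query sound inherits the concept label of its closest reference sound.

```python
import math

# Hypothetical reference database: (feature vector, WordNet-style concept label).
# The feature values and labels are invented for illustration only.
DATABASE = [
    ((0.10, 0.80), "dog.n.01"),
    ((0.15, 0.75), "dog.n.01"),
    ((0.90, 0.20), "door.n.01"),
    ((0.85, 0.30), "door.n.01"),
]


def nearest_neighbor(query, database=DATABASE):
    """Return the concept label of the database entry closest to `query`."""
    best_label, best_dist = None, math.inf
    for features, label in database:
        dist = math.dist(query, features)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

Adding a new sound class in this scheme only means adding labelled entries to the database; no classifier has to be retrained, which is the property the abstract relies on.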

6140
Zielinski, Slawomir K.; Rumsey, Francis; Bech, Søren; Kassier, Rafael
The basic audio quality of 5.1 multichannel audio reproduction was evaluated under different technical conditions. The obtained database of subjective responses was used to develop a multichannel audio quality expert system. There are three aims of this development: (1) to predict audio quality as a function of individual channel bandwidth, (2) to predict audio quality as a function of given down-mix algorithms, (3) to predict the optimum technical trade-off between these factors for a given total bandwidth of a multichannel audio signal. Obtained results indicate a close correspondence between the predicted and actual quality ratings. It is intended that the final version of the Quality Adviser will be suitable as a decision making aid for broadcasters and codec designers.
Quality Adviser - A Multichannel Audio Quality Expert System

6141
Zacharov, Nick; Lorho, Gaetan
A subjective evaluation of Virtual Home Theatre systems (VHT) for loudspeaker and headphone reproduction is presented in this paper. Several algorithms for loudspeakers and headphones were selected and applied to six different multichannel audio programs. A subjective experiment was performed for each configuration using screened listeners to assess the performance of these VHT algorithms in terms of overall sound reproduction quality. A paired comparison method was chosen, with the discrete 5-channel reproduction (3/2) system as a reference in the loudspeaker test, and the stereo downmix of the 5-channel material in the headphone test. The stereo downmix was also compared to the 5-channel reference in the case of the loudspeaker reproduction. The experimental design and the detailed analysis of results are presented in this paper.
Subjective Evaluation of Virtual Home Theatre Sound Systems for Loudspeakers and Headphones

6142
Lee, Hyun-Kook; Rumsey, Francis
The subjective attributes of 2-channel phantom images of transient piano, continuous trumpet and male speech sources were elicited using pair-wise comparison between reference mono images and their phantom images. The attributes elicited included "image focus", "image width", "image distance", "brightness", "hardness" and "fullness". The effect of interchannel time and intensity differences on the perceived difference between the real image and its phantom image was investigated for each sound source in respect of the elicited subjective attributes. Results show that the type of panning method (pure time, pure intensity and combination of the two) had a statistically significant effect on image focus and image width attributes. It was also found that the type of sound source had a significant effect on all the attributes.
Elicitation and Grading of Subjective Attributes of 2-Channel Phantom Images

6143
Skovenborg, Esben; Quesnel, René; Nielsen, Soren H.
An experiment was performed to investigate the assessment of loudness of music and speech using a General Linear Model. Eight expert listeners participated in the experiment. The method of adjustment was used for loudness matching of stimuli. Both stimuli of each pair were selected from a collection of 147 homogeneous audio segments including representative samples of speech, jazz, rock/pop, and classical music, together with pink noise and a 1 kHz tone. For each segment, a reliable estimate of the loudness level was obtained from the model. Both the uncertainty and the subjectivity factors were shown to depend on the class of the stimuli. An alternative categorization based on four MPEG-7 Audio Descriptors was also used for the analysis.
Loudness Assessment of Music and Speech

6144
Salava, Tomas
The paper deals with some open problems of low-frequency sound reproduction, particularly in medium and small listening rooms. First, the basic facts and terms concerning sound fields and transfer functions in bounded spaces will be briefly recalled. Specifics of sound quality perception at low frequencies will then be outlined. Differences of opinion in this field will also be discussed. The strong influence of the properties of the test signals is stressed, and the use of both musical and artificial test signals for low-frequency listening tests is recommended. Several examples of different artificial low-frequency test signals will be described, demonstrated, and compared with musical signals. (Samples of such test signals are available in MP3 format free on request.)
Imperfections at Low Frequencies - How Much Are They Audible or Annoying?

6145
Salmela, Juha; Mattila, Ville-Veikko
A new intrusive method, combining several independent objective metrics, has been developed for the evaluation of the quality of acoustic noise suppression in mobile communications. Extensive subjective data, including simulations of several noise suppression solutions in various noise environments, was gathered to serve as the benchmark for the metrics. Partial least-squares regression and full cross-validation were used to establish the applicability of 26 metrics, which made use of different measurement procedures, to predict the perceived quality. A Phase IV vector-based preference model was optimized to predict quality with a correlation of 0.95, resulting in an average prediction error of 8%. The different measurement procedures appeared to contribute to a similar extent to the prediction ability of the optimized model.
New Intrusive Method for the Objective Quality Evaluation of Acoustic Noise Suppression in Mobile Communications

6146
Brock-Nannestad, George
Graphical or real-time interactive analysis of recorded sound occurred at least 20 years before the invention of reproducible sound in 1877. Scientifically reproduced sound quickly found its way into phonetics and musicology. Early commercial sound recording for entertainment retained an aura of reproduction of a real sound event and prescribed certain calibration features. After commercial success was ensured, manipulation techniques were developed and refined, and the later analogue years demonstrated imaginative thinking that came to a climax when fast digital technology enabled satisfactory signals that only contained what the ear requires, and no more. The dissociation from the total real sound was complete. The paper provides a balanced, well-documented historical overview of the techniques and their consequences.
Respecting the Sound - From Aural Event to Ear Stimulus

6147
Angus, James A. S.
This paper presents a new approach to dither in Sigma-Delta Modulation (SDM) systems. In particular, it clarifies the position of the overload point in one-bit SDM systems and presents several overload control methods with comparisons of their efficacy. It then goes on to examine the problem of applying dither to one-bit systems and describes a new approach to applying high levels of dither. It presents results, which show that such dither can be effective in SDM systems.
A New Approach to Effective Dither in Delta-Sigma Modulation Systems

6148
Hawksford, Malcolm J.
To process audio signals prior to DSD and LPCM delivery, an audio data format is required that possesses a resolution substantially greater than the final release form. A number of strategies are presented capable of enhanced resolution. Techniques using the step-back algorithm are extended to include a multi-level quantizer but where the amplitude range is finite. An earlier scheme based upon multilevel SDM and multi-stage loss-less differential coding is enhanced by incorporating more aggressive noise shaping implemented by means of parametric noise shaping previously used for binary SDM.
Ultra High-Resolution Audio Formats for Mastering Applications

6149
Wegener, Michael; Gerhard-Multhaupt, Reimund; Bergweiler, Steffen; Wirges, Werner; Pucher, Andreas
Voided space-charge electrets, such as cellular polypropylene, have recently been developed as piezoelectric materials that exhibit strong electromechanical thickness oscillations corresponding to high piezoelectric coefficients of around 500 pC/N and very good acoustical matching to air (low density of typically around 0.5 g/cm³ and low sound speed). Here, we discuss different aspects of the manufacture and the applicability of cellular polypropylene films as transducer materials at high frequencies and for ultrasound. The frequency response up to 90 kHz and the directivity patterns for several transducer geometries were investigated. Second- and third-order harmonic distortions and the power consumption of cellular polypropylene films in acoustic transducers are also described. Our results demonstrate that the relatively new ferroelectret films are very attractive for a range of device applications.
Voided Space-Charge Electrets - Piezoelectric Transducer Materials for Electro-Acoustic Applications

6150
Schneider, Martin
Microphones are used in all environments. Especially for outdoor locations but also in studio surroundings, wind and humidity characteristics of microphones and their relevant accessories are of interest. The paper presents acoustic and noise measurements plus audio examples of different types of microphones under climatically adverse circumstances, with diverse protective accessories like foam windshields, wind baskets, etc. Application guidelines for recording engineers are deduced.
Wind & Weather

6151
Bolaños, Fernando
The article proposes to analyze individually each part that makes up the loudspeaker, specifically the diaphragm-surround set. Experiments were performed in low and medium amplitude displacement ranges. The paper uses traditional experimental methods to look for the spectral signatures of the surround and diaphragm in the main eigenvalue region. The method consists of exciting the diaphragm-surround set with a reluctance transducer fed by an electric impulse, and analyzing its response in the frequency domain with an eddy current displacement transducer. The most typical experimental spectral signatures of nonlinear systems in free response are reviewed. The paper presents the results obtained after examining six samples, only one of which was found to be completely free of nonlinearities.
Frequency Domain Experiences in Loudspeaker's Suspensions

6152
Mazin, Victor; Lee, Yong-Sang
In electrodynamic loudspeakers the force factor Bl is an irregular and asymmetrical function of the voice-coil displacement. This results in diverse distortion during voice-coil oscillation. In the paper a method of artifact reduction is suggested. The method is based on the application of a non-uniform voice-coil winding, i.e. the number of layers varies along the voice-coil axis. The proposed voice coil allows a more regular and symmetrical Bl factor than a conventional voice coil. The theoretical background of the method is given. Effects of the non-uniform voice coil on loudspeaker performance have been investigated using the Klippel Distortion Analyzer.
Non-uniform Voice Coil Winding for Electrodynamic Loudspeaker

6153
Thiele, Neville
When a bridged-T network is inserted into the feedback path of a voltage follower, it can produce an inexpensive biquadratic filter whose transfer function has first-order coefficients as low as 2.5 (Q = 0.4), often approaching 2 (Q = 0.5), in the numerator when those in the denominator lie in the very useful range between 0.5 and 2. Among its applications, it is peculiarly suited to equalizing "over-damped" loudspeakers, i.e. with exceptionally low Qt's, that are typical of robust, sensitive, drivers with large magnets. The wide range of applications is possible through selection of the more suitable of the two possible configurations of a bridged-T network, described in Figs 2 to 5 as CRRC or RCCR. The work is the subject of intellectual property claims.
An Active Biquadratic Filter for Equalizing Overdamped Loudspeakers

6154
Prokofieva, Elena
A theoretical step-by-step investigation of a conventional speaker, placed in a sealed cabinet and then installed within a rigid wall, has been conducted. The speaker diaphragm was simulated by a rigid circular piston and then by a number of concentric rings inserted into a large but finite sized baffle and enclosure. The acoustic pressure and dynamic displacement expressions were formulated using a quasi-dynamic approach to loading force representation. This simulation allows for the withdrawal of some standard assumptions commonly used in the traditional theory of plates. A block-scheme of a proposed computer simulation using the developed quasi-dynamic model is also presented.
Radiation of Enclosed Loudspeaker in a Large Baffle: Speaker Simulation Model

6155
Poulsen, Soren; Andersen, Michael A. E.
An integration of electrodynamic loudspeakers and switch mode amplifiers was proposed earlier in [1]. The work presented in this paper relates to the practical aspects of integrating switch mode audio amplifiers and electrodynamic loudspeakers, using the speaker's voice coil as the output filter and the magnetic structure as the heat sink for the amplifier.
Practical Considerations for Integrating Switch Mode Audio Amplifiers and Loudspeakers for a Higher Power Efficiency

6156
Mellow, Tim J.
Radiation characteristics of a concept loudspeaker are calculated both analytically and using finite element analysis. It comprises two closely spaced stretched piezoelectric membranes pushed apart by a pressurized gas. A drive voltage applied across conductive coatings on both membranes causes their tensions to vary in opposite phase and consequently the membranes are displaced in the same direction. Driven by a class D amplifier, this transducer potentially offers higher efficiency than conventional moving coil technology but with the smooth response and light weight of electrostatic devices. However, the voltage requirement is lower and the potential SPL higher than the latter. The only remaining question is whether it can be manufactured economically.
Sound Radiation from a Dual Microfilm Piezoelectric Loudspeaker in Free Space

6157
Pellerin, Guillaume; Polack, Jean-Dominique; Morkerken, Jean-Pierre
Since aerodynamic effects play a significant part in the behavior of sound sources in the low frequency domain and for signals containing a high specific energy, new complex fluid parameters have to be introduced to take into account possible causes of sound distortion, such as the stalling phenomenon in the boundary layer around the mechanical structure. For the design of vented boxes, we show that choosing a nozzle profile for the resonator ensures better dynamical stability of the airflow and thus permits extremely low cutoff frequencies in "dipole" configurations. Some experimental and computed results, based on phase spaces and fluid FEM, on the radiated output of this kind of source below 40 Hz will be described.
Sound Source Design in the Very Low Frequency Domain

6158
Pincus, Michael S.
A closed-loop audio system can be defined as one in which the loudspeaker is in the same space as the microphone. As such, some sound from the loudspeaker will mix with the source creating an interference pattern. The interference is dependent on the path length from the loudspeaker back to the microphone, the amplitude of the interfering signal, and the latency of the forward-fed signal. This paper will investigate this interference and its effect on the output response of the microphone.
Microphone Response in a Closed-Loop System
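
As a rough illustration of the dependence described in the abstract above, the sketch below uses a deliberately simplified model that is an assumption of this example, not the paper's method: the microphone receives the direct source plus a single delayed, attenuated copy from the loudspeaker, with no recirculation around the loop. The function name, the parameters and the 343 m/s speed of sound are likewise assumed.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def interference_magnitude(freq_hz, path_m, gain, latency_s=0.0):
    """Magnitude of 1 + gain * exp(-j*2*pi*f*delay) for one interfering path.

    delay combines the acoustic travel time over `path_m` metres with any
    processing latency `latency_s` in the forward-fed signal.
    """
    delay = path_m / SPEED_OF_SOUND + latency_s
    return abs(1 + gain * cmath.exp(-2j * math.pi * freq_hz * delay))
```

With a 3.43 m path (10 ms delay) and a gain of 0.5, this model gives a peak of 1.5 wherever the frequency is a multiple of 100 Hz and a dip of 0.5 halfway between, which is the comb-like pattern that path length, amplitude and latency jointly determine.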

6159
Milanov, Emil; Milanova, Elena
In this article we examine the space characteristics of directed single gradient microphones with two acoustical entrances. A formula is defined that describes the shape of the space characteristics for any microphone regardless of its technical implementation, provided that the microphone has two acoustical entrances. The conclusions are valid for all cases in which there is no diffraction. The basic formula that describes the space characteristics is valid for spherical and plane sound waves and for the whole frequency response of the microphone.
Space Characteristics of Directed Single Gradient Microphones

6160
Phua, Kok-Soon; Chen, Jian-Feng; Shue, Louis; Sun, Han-Wu
In this paper, we propose a compact directional noise-cancelling device which consists of a differential microphone formed by two omni-directional microphones connected in an endfire orientation. By making use of adaptive beamforming for improved directionality, and echo shaping, a form of nonlinear speech enhancement, the proposed device is positioned to tackle noise found in real environments, which is typically a mixture of directional, stationary and non-stationary interferences. Performance evaluation of our real-time implementation is based on the following criteria: 1) directionality, 2) distortions, and 3) speech quality as measured by the Mean-Opinion-Score (MOS), through subjective listening tests and using the ITU-T P.862 Perceptual Evaluation of Speech Quality tool. Our experimental results indicate an average interference suppression of as much as 22dB, and consistent improvement in speech quality.
Performance Study of a Compact 2-Sensor Noise Cancelling System

6161
Soulodre, Gilbert A.
There are many applications where it is desirable to objectively measure the perceived loudness of typical audio signals. The ITU-R is investigating suitable objective measures (meters) that would allow the perceived loudness of various program materials to be equalized for broadcast applications. Ten objective loudness meters were submitted for formal evaluation by several private companies and research organizations. The loudness meters were evaluated in their ability to predict the results of an extensive database derived from a series of formal subjective tests conducted at five test sites around the world. The performance of the various loudness meters is compared and rated using several newly proposed metrics. Several basic objective loudness measures were also evaluated.
Evaluation of Objective Loudness Meters

6162
Nielsen, Lars; Schuhmacher, Andreas; Liu, Bin; Jonsson, Soren
Ear simulators are standardized devices used for calibration of e.g. earphones and telecommunication equipment. In this paper, the ear simulator B&K Type 4157 is investigated using a combined boundary/finite element model (BEM/FEM) of the air inside. Traditionally lumped parameter models have been used to create an electrical equivalent diagram for simulating acoustic impedances. However, these lumped parameter models have some built-in limitations and may not be valid for higher frequencies where the acoustic wavelength is in the range of the ear simulator dimensions. A more accurate acoustic model can be derived using well-established techniques like BEM and FEM. Here we present a combined BEM/FEM model, taking into account the thermo-viscous effects, which are shown to be required for obtaining realistic results. Comparisons between simulation and measurement are given.
Simulation of the IEC 60711 Occluded Ear Simulator

6163
Gorelik, Vladimir; Peissig, Juergen; Wiggers, Rainer
(Sennheiser electronic, Wedemark) The ultrasonic (US) transducer based on Sell's principle is well known to work reversibly as microphone and speaker with a broad-band frequency response. US transducers are used for movement and distance sensors, flow-meters and in parametric transducers where it is important to have a high US sound level in air and good directivity. Driven by these applications we developed several versions of Sell transducers with optimised backplate structures for high sound pressure levels, minimum loss due to the membrane suspension, optimal drive of the membrane surface and high directivity. Different membrane materials and vent openings result in different frequency responses. The transducer design, its acoustical performance and the applications will be discussed.
High-Performance Wideband Ultrasonic 'Sell'-Transducer

6164
Temme, Steve; Brunet, Pascal
During loudspeaker production, particles may become trapped in the loudspeaker motor and voice coil vicinity, resulting in a distinctive defect that is easily heard, but difficult to detect by traditional test and measurements. We found that a Sine Sweep Stimulus followed by a High Pass Filter and RMS Envelope Analysis efficiently detected Loose Particles and Rub & Buzz defects. The remaining problem is how to reduce the effect of background noise and obtain more reliable results. Statistical descriptors such as Crest Factor, Skewness, and Kurtosis are first investigated. Experimental results are given and the different tools are compared. New enhancements are described that effectively increase the overall immunity to background noise and the discrimination of the method.
Enhancements for Loose Particle Detection in Loudspeakers
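
The statistical descriptors named in the abstract above have standard textbook definitions, sketched below for a list of samples. This covers the definitions only and is not the authors' detection procedure; the function names are assumptions of this example, and kurtosis is computed in its non-excess form (a Gaussian signal gives about 3).

```python
import math


def crest_factor(x):
    """Peak absolute value divided by the RMS value."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms


def _central_moment(x, order):
    """Population central moment of the given order."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** order for v in x) / len(x)


def skewness(x):
    """Third central moment normalised by variance ** 1.5."""
    return _central_moment(x, 3) / _central_moment(x, 2) ** 1.5


def kurtosis(x):
    """Fourth central moment normalised by variance ** 2 (non-excess)."""
    return _central_moment(x, 4) / _central_moment(x, 2) ** 2
```

All three descriptors rise when a signal contains isolated spikes: a steady square wave gives a crest factor and kurtosis of 1, while a single impulse among zeros gives noticeably larger values.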

6165
Ahnert, Wolfgang; Feistel, Stefan; Richert, Waldemar
Today various acoustic measurement methods are used to investigate rooms or devices under test. For room-acoustic measurements MLS routines are often applied to obtain the detailed data according to ISO standard 3382. Instead of MLS, nowadays the dual-channel FFT method based on a sweep stimulus is commonly accepted, too. On the other hand, excitation by continuous noise or shot noise is used to obtain a good overview in a short time. For loudspeaker data acquisition or commissioning tests in noisy environments a TDS sweep measurement is performed to achieve results of high accuracy. Here a new measurement tool will be presented, incorporating all of these widely known methods. The advantages and disadvantages as well as the limitations will be discussed for each technique by means of specific examples and measuring applications. A detailed comparison will be provided and recommendations for the practical use under selected acoustic environmental conditions will be given.
Merging Room-Acoustic and Electro-Acoustic Measurement Methods

6166
Ferreira, Anibal J. S.
This paper describes recent developments on the design of an advanced Audio Spectral Coder (ASC) that seeks: coding efficiency by combining source and perceptual audio coding techniques; bit stream semantic scalability by segmenting the audio signal into transients, sinusoids and noise; low delay coding by using a moderate transform size and no bit stream buffer; and embedded error robustness by not using interframe coding. The operation of ASC is explained, its performance is assessed using a few test results, and potential application areas are also addressed.
Efficient Intraframe Coding of Monophonic Audio

6167
Pueo, Basilio; Escolano, Jose; Bleda, Sergio
The Finite-Difference Time-Domain (FDTD) method was successfully developed to model electromagnetic systems. The technique has also been used in several other disciplines, such as optics and acoustics. A new approach to Wave Field Synthesis (WFS) simulation using FDTD instead of the classic finite-difference method is presented. The software makes it possible to evaluate the precision and behaviour of different WFS configurations in the time domain and thus in a particular frequency band. Moreover, simulations can be analyzed inside a room or in free space.
Finite-Difference Time-Domain Acoustic Analysis of Fibrous Sound-Absorbing Materials

6168
Ono, Kazuho; Komiyama, Setsu; Hamasaki, Kimio; Sakumoto, Sumi; Ohga, Juro
Electroacoustic reverberation control systems are used mainly in multi-purpose auditoria and concert halls whose acoustical design is not ideal for music performance. The present paper discusses the use of loudspeaker arrays for electroacoustic reverberation control in an auditorium, especially the effect of using multiple loudspeakers on the listening area. Subjective experiments were conducted with various loudspeaker setups and various listening points, including off-center ones, in our new auditorium equipped with seven vertical line arrays on each sidewall. The results of this subjective evaluation were compared with measurements of the sound pressure distribution created by the corresponding loudspeaker setups and were used to infer a criterion for setting loudspeakers to provide a large listening area.
Reverberation Control in an Auditorium using Loudspeaker Array

6169
Munro, Andrew
Film mixing theatres are becoming both smaller and more flexible. The author considers the implications of acoustic and loudspeaker variations between rooms and comes to some interesting conclusions.
Room Acoustics and Equalisation of Speaker Systems for Multipurpose Theatres

6170
Collins, Tim
Most techniques for estimating the transfer function (or impulse response) of an acoustical space with a high signal-to-noise ratio operate along similar principles. A known, broadband signal is transmitted at one point in the room whilst being simultaneously recorded at another. A matched-filter is then used to compress the transmission waveform into an approximate impulse and equalisation filtering is used to remove any colouration caused by the non-uniform energy-spectrum of the transmission and/or the non-ideal response of the loudspeaker/microphone combination. In this paper, the limitations of this conventional technique will be highlighted, especially when using low-cost equipment. An alternative, non-linear deconvolution technique is proposed which will be shown to give superior performance using both synthetic waveforms and practical room measurements.
Implementation of a Non-Linear Room Impulse Response Estimation Algorithm

6171
El-Saghir, Emad; Feistel, Stefan
Many ray tracing algorithms make use of the single-valued diffuse-field absorption coefficient to simulate the sound field in a given room computer model. They consider, however, neither the effect of the angle of incidence nor the fact that the reflection factor is complex. If characteristic impedance and wave number, which are measured in an impedance tube, are known, we can expect reflectograms, which look different from those generated by current simulators, and look different for different thickness. The paper investigates how far the angle-dependent reflectograms, which consider phase shift due to complex reflection factors, look different from the angle-independent ones, and whether the statistical nature of reflectograms leads to the cancellation of such effects.
Influence of Ray Angle of Incidence and Complex Reflection Factor on Acoustical Simulation Results

6172
Konda, Preethi; Prakash, Vinod
Using the perceptual distortion metric returned by the psychoacoustic module, conventional bit allocation schemes operate iteratively to maintain equal perceptual distortion in all critical bands. For codecs employing uniform quantization schemes, this paper proposes a new approach to determine the optimal MNR (Mask to Noise Ratio) levels for the critical bands. The scheme exploits the fact that the quantizer used is uniform in nature and all critical bands are equally distorted, to arrive at a non-iterative solution. Additionally, this method is independent of the target bit-rate. The proposed scheme achieves a 2-3x reduction in the complexity of the quantization block. An example application for this scheme is given with reference to the MPEG-2 Layer 1 and 2 encoder.
Optimal Bit Allocation Strategy for Perceptual Audio Coders Employing Uniform Quantization Schemes

6173
Kamaruzzaman, Md.; Taddei, Herve
Embedded speech coding is of interest for many applications such as VoIP, multimedia broadcasting and video conferencing. We propose a CELP-based embedded speech codec that is operable for both narrowband and wideband speech signals. Our three-layered embedded codec offers three bit rates. This embedded codec is based on the Speex codec. In our embedded speech codec, innovation vectors of the higher layers are embedded in the innovation vector of the lowest layer. All speech coding parameters except the innovation vector are shared between the lowest layer and the higher layers. In our algorithm, higher bit rates are rewarded with better quality at the expense of the lowest bit rate.
Embedded Speech Codec Based on Speex

6174
Bhatt, Mahabaleswara R.
This article proposes a novel method for a memory- and computationally efficient implementation of the sub-band synthesis filter for MPEG audio decoding. In contrast to the conventional approach, the derived approach first computes 64 sets of windowing operations, each with eight input samples and four re-arranged window coefficients. These windowed sequences are subsequently used for two matrixing operations. The proposed fast algorithm exploits not only the DCT relationship for the matrixing operations but also procedure pruning for the required DCT coefficient computations. Moreover, the windowing operations make use of the symmetry that exists in the window coefficient array. Additionally, the derived approach eliminates the intermediate arrays and the explicit filtering operation by merging them into the windowing and matrixing operations themselves. This reduces both the memory requirement and the data transfers involved in the computation.
A Memory and Computationally Efficient Synthesis Sub-band Filter for MPEG Audio Decoding

6175
Suresh Babu, Venkata; Malot, Ashish Kumar; Vijayachandran, V.M.; Vinay, M.K.
State-of-the-art audio encoders are based on transform-domain coding algorithms. Due to time-frequency uncertainty, transform domain coders suffer from "pre-echo" and "diffusion" artifacts during transient portions of the signal. These artifacts occur because of the large transform lengths used to achieve higher coding gains. Audio encoders employ various tools, such as adaptive transform lengths and TNS, to code transient portions of the audio signal efficiently. Typically, audio signals have time domain transients (e.g. castanets), frequency domain transients (e.g. flute, clarinet) and transients observed in speech signals during consonant-to-vowel transitions. Identification of these transients in an audio signal is vital to achieve perceptual quality at low bit-rates. This paper discusses the various transient classes present in audio signals and describes a transient detector employed for efficient modeling of all classes of transients. The proposed transient detector has been incorporated in an MPEG-4 AAC encoder, independent of the psycho-acoustic analysis methodology used. Listening tests as well as OPERA scores indicate substantial improvement in audio quality over the baseline encoder.
Transient Detection for Transform Domain Coders

6176
Reche-Lopez, Pedro Jesus; Vera-Candeas, Pedro; Ruiz-Reyes, Nicolas; Curpian-Alonso, Jose; Rosa-Zurera, Manuel
In this paper, signal-adaptive parametric models based on overcomplete dictionaries of time-frequency atoms are considered for high quality low bit-rate parametric audio coding. There are a variety of frameworks for deriving overcomplete signal expansions, which differ in the structure of the dictionary and the manner in which dictionary atoms are selected for the expansion. Psychoacoustic-adapted matching pursuits are used to extract sinusoidal components with a harmonic dictionary, while energy-adapted matching pursuits are used for transient modelling with a wavelet-based dictionary. First, transients are detected, modelled (with wavelet functions) and removed from the original audio signal, leaving a residue. Then, sinusoids are modelled using complex exponential functions and removed from the initial residue, leaving a noise-like signal. This final residue is modelled taking advantage of the good time-frequency localization of the wavelet transform and considering psychoacoustic principles. An M-depth Wavelet Transform is first applied to the residue. The energy of each wavelet sub-band is then computed, and finally a Time Noise Shaping (TNS) process is applied to each one, which involves a parametric model for the noise-like signal. The resulting multi-part model (Sines + Transients + Noise) is efficiently applied by taking into account psycho-acoustical information for audio coding purposes. The combination of all these ideas results in nearly transparent parametric audio coding at bit rates close to 16 kbps for most of the CD-quality one-channel audio signals considered for testing. Listening tests indicate that our coder achieves better results than MPEG-4 AAC at very low bit rates (close to 16 kbps).
Signal-adaptive Parametric Modelling for High Quality Low Bit Rate Audio Coding

6177
Kurniawati, Evelyn; Lau, Chiew Tong; Premkumar, Benjamin; George, Sapna; Absar, Javed
A method to improve the PSNR of a perceptual audio coder is presented. It is based on the use of a noise estimator at the decoder side to relate the quantization parameters and the quantization error. The quartic equation established has two real roots, one of which is the desired spectral value. This value contains less quantization error than the de-quantized spectral value of a normal decoder. This leads to an improvement of up to 12 dB in SNR without a significant increase in decoder complexity.
Decoder Based Approach to Enhance Low Bit Rate Audio


(C) 2004, Audio Engineering Society, Inc.