Audio Engineering Society Preprints

AES 110th Convention

2001 May 12-15

AES Preprint Ordering

Single Convention Preprints are available through the AES Preprint Search and Shop facility.

Preprints Listing

Gino Gobbi,Angelo Farina,Emanuele Ugolotti,
The paper describes a new method for subjectively evaluating the acoustical quality produced by a sound system inside a car compartment. The method produces a single rating number, called Index of Performance Acoustic (IPA), which is defined as a weighted average of the subjective responses to a questionnaire compiled during listening tests conducted with the subject sitting inside different cars. The paper describes the details of the subjective test and focuses on the choice of questions in the questionnaire and the weights employed. The principal innovation of the new method is that the weights are changed according to the reliability of the subject, which is also inferred from the questionnaires. Thus, the evaluation is very robust and almost immune to the inclusion of completely unreliable evaluators in the panel.
IPA - A Subjective Assessment Method Of Sound Quality Of Car Sound Systems
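
The abstract does not give the IPA formula, so the following is only a minimal sketch of a reliability-weighted average in general. The function name, the reliability scale (0 to 1), and the example numbers are assumptions made for illustration, not the authors' definitions.

```python
def reliability_weighted_rating(scores, reliabilities):
    """Weighted mean of per-subject scores.

    Each subject's weight is a hypothetical reliability value in [0, 1];
    a subject with reliability 0 has no influence on the result.
    """
    if len(scores) != len(reliabilities):
        raise ValueError("scores and reliabilities must have the same length")
    total_weight = sum(reliabilities)
    if total_weight == 0:
        raise ValueError("at least one subject must have non-zero reliability")
    weighted_sum = sum(s * w for s, w in zip(scores, reliabilities))
    return weighted_sum / total_weight


# Two reliable subjects and one fully unreliable outlier (weight 0).
rating = reliability_weighted_rating([8.0, 7.0, 2.0], [1.0, 1.0, 0.0])
```

With these made-up inputs the outlier score of 2.0 is ignored, which is the kind of robustness the abstract claims for the method.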

Annabelle Edge,David Clark,
A series of planar magnetic drive units has been developed to target specific problems in automotive sound systems that are not fully addressed by conventional loudspeakers. The naturally flat form factor of planar magnetic drive units allows them to be placed in more nearly optimal locations for imaging performance. Also contributing to superior imaging is the directivity of the large planar radiating surface. Aperture restriction and use of an acoustic lens are methods of controlling this directivity.
New Planar Magnetic Loudspeakers for Automotive Sound Systems

Alberto Bellini,Gianfranco Cibelli,Angelo Farina,
The paper describes a new measurement technique of the acoustical quality produced by a sound system. The method is called acoustic quality test (AQT), and it produces a graphical representation of the dynamic response of the system to tone bursts at various frequencies. It makes it possible to visualize simultaneously the steady frequency response, the transient response, and the signal-to-noise ratio. The new method is particularly useful for describing the performance of a sound system coupled with a small, noisy reproduction space, such as car audio systems.
AQT - A New Objective Measurement Of The Acoustical Quality Of Sound Reproduction In Small Compartments

Neal House,
A study was done to subjectively compare and rank the sound field characteristics of a vehicle equipped with a high-end home surround sound processor and to assess compatibility of several discrete and matrix decoding modes in combination with both discrete and down-mixed source materials. A vehicle was fitted with a 3/4 loudspeaker arrangement (7.1) and was evaluated with optimized amplitude balance, using all loudspeakers, for all encoding and decoding modes. Comparisons were also done in a 3/4 listening room for reference. The study design details and analysis results are reported.
Subjective Evaluation of 2-Channel Vs Surround Formats in Vehicles

John Stewart,
The auxiliary loudspeaker cone, known as a whizzer, is used to gain high-frequency amplitude response in applications where cost or weight constraints prohibit a separate, dedicated high-frequency transducer. Usually attached to the front edge of a dynamic loudspeaker voice-coil former at the apex of the main cone, its target performance range is 5 kHz to 20 kHz. The functional mechanism of the whizzer is not particularly well understood. Performance prediction is difficult. This paper discusses the test results, laser vibrometer images, anecdotes, and design directions that may aid future whizzer designs and stimulate the development of better design tools.
Performance of Loudspeaker Whizzers

Timothy Nind,
The arrival of multimedia in car entertainment presents a challenge for the audio playback system. The majority case will continue to be the playback of two-channel sources, such as CD and radio; but the system also needs to reproduce various 5.1 formats from new sources, such as DVD, usually for the rear seat. This paper discusses the alternative architecture and processor options and proposes a Logic 7 system, adapted for automotive applications, as an elegant solution, providing enhanced spatial reproduction of two-channel material and the full-surround effect from 5.1 sources.
Multimedia in cars: The use of Logic 7 surround processing as the solution to the challenge of providing surround sound in cars from all 2 channel and encoded 5.1 sources.

Mark Ziemba,
Different test signals and test methodologies have been used to evaluate automotive sound systems. On the objective side are technical measurements that produce graphical plots of data. In contrast are subjective listening tests that use music as a test signal and yield psychoacoustic perceptions from the test subjects. This paper investigates many of these test methodologies and discusses the use of some new test signals for both objective measurement and subjective evaluation of vehicle sound system performance.
Test Signals for the Objective and Subjective Evaluation of Automotive Audio Systems

Shokichiro Hino,
The system combines a sound field measurement system and a convolution processing unit with a computer as the central controlling unit. The sound design tool enables equalizers to be generated by directly editing the measured frequency characteristics as seen on the graph. The resulting sound is immediately convolved for comparative listening tests. An application example of the system adopted for the acoustic design of cars is illustrated.
The Sound Design System

Juha Merimaa,Matti Karjalainen,Timo Peltonen,Benoit Gouatarbes,Tapio Lokki,
The development of a system for room acoustical measurements and analysis is described. The goal of the project has been a versatile system for multichannel and binaural investigations of room and concert hall acoustics. It comprises a portable PC-based workstation with multichannel AD/DA data acquisition, an omnidirectional sound source, a 3-D microphone grid for directional response registration, a dummy head, standard omnidirectional and cardioid microphones, and MLS-based response computation. In addition to traditional room acoustical attribute analysis, special algorithms have been developed to investigate the time-frequency behavior of responses in different directions, for example, in concert halls. Analysis cases of interesting hall measurements are discussed.
A System for Multi-Channel and Binaural Room Response Measurements

Poju Antsalo,Aki Makivirta,Vesa Valimaki,Timo Peltonen,Matti Karjalainen,
Estimation of modal decay parameters from noisy measurements of reverberant and resonating systems is a common problem in audio and acoustics, e.g., in room and concert hall measurements or musical instrument modeling. In this paper reliable methods to estimate the initial response level, decay rate, and noise floor level from noisy measurement data are studied and compared. A new method based on nonlinear optimization of a model for exponential decay plus stationary noise floor is presented. Comparison to traditional decay parameter estimation techniques using simulated measurement data shows that the proposed method outperforms them in accuracy and robustness, especially in extreme SNR conditions. Three cases of practical applications of the method are described.
Estimation of Modal Decay Parameters from Noisy Response Measurements
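
To make the "exponential decay plus stationary noise floor" model concrete, here is a simplified sketch that fits energy(t) = a * exp(-2 * alpha * t) + n. It is not the authors' nonlinear optimization: it scans a user-supplied grid of candidate decay rates and solves for a and n by ordinary linear least squares at each one. The function name, parameter names, and synthetic data are assumptions.

```python
import math


def fit_decay_with_noise_floor(times, energy, alphas):
    """Fit energy(t) ~ a * exp(-2 * alpha * t) + n.

    alpha is chosen from the candidate list `alphas`; for each candidate,
    a (initial level) and n (noise floor) come from the 2x2 normal equations.
    Returns (a, alpha, n) for the candidate with the smallest squared error.
    """
    m = len(times)
    best = None
    for alpha in alphas:
        basis = [math.exp(-2.0 * alpha * t) for t in times]
        sbb = sum(b * b for b in basis)
        sb = sum(basis)
        sbe = sum(b * e for b, e in zip(basis, energy))
        se = sum(energy)
        det = sbb * m - sb * sb
        if abs(det) < 1e-12:
            continue  # decay basis is indistinguishable from a constant
        a = (sbe * m - sb * se) / det
        n = (sbb * se - sb * sbe) / det
        err = sum((a * b + n - e) ** 2 for b, e in zip(basis, energy))
        if best is None or err < best[0]:
            best = (err, a, alpha, n)
    if best is None:
        raise ValueError("no usable candidate decay rate")
    return best[1], best[2], best[3]


# Synthetic, noise-free envelope with a constant floor (illustrative values).
times = [i * 0.01 for i in range(200)]
energy = [1.0 * math.exp(-2.0 * 5.0 * t) + 1e-3 for t in times]
a_hat, alpha_hat, n_hat = fit_decay_with_noise_floor(
    times, energy, [0.5 * k for k in range(1, 41)]
)
```

A grid scan keeps the example dependency-free. A real implementation would more likely refine alpha with a continuous optimizer and fit on a logarithmic level scale.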

Matthieu Latour,
The use of data reduction codecs and sound processors in broadcast does not allow traditional quality measurement methods, such as THD and signal-to-noise ratio. In order to find an efficient and cost-effective quality evaluation method, Radio France’s Value Department experimented with a tool called Tocade audio monitor (from CCETT research center). This device indicates the sound quality of a signal without any reference. It was tested with a CD that gives an image of Radio France's programs (music, speech, location reports, etc.). This CD test was used as a reference. All kinds of signal paths were simulated using actual sound processors and data-rate reduction codecs encountered at Radio France.
Quality evaluation in broadcasting with Tocade Audio Monitor

Hans-Elias de Bree,Erik Druyvesteyn,Ron Raangs,
Sound intensity is a useful measure in acoustics because it allows free-field measurements to be made in a reverberant environment. The proposed low-cost intensity probe combines a sound particle velocity sensor and a microphone, which can be used as separate devices or as one package. For the latter case, the velocity and the pressure are measured at almost the same location. Since the intensity is calculated from the cross-correlation of the velocity and pressure, the very accurate phase matching needed in the p-p method is not necessary; and its signal-to-noise ratio (SNR) is higher than it is for the separate sensors. The data acquisition and processing are implemented on a standard personal computer, combined with a simple calibration, creating a very powerful intensity measuring device.
A low cost Intensity probe
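
The cross-correlation mentioned in the abstract, taken at zero lag, is the time average of pressure times particle velocity, which is the standard definition of active intensity. The sketch below shows only that arithmetic on sampled signals; the function name, the air constants, and the test tone are assumptions, and the probe's calibration is not modelled.

```python
import math


def active_intensity(pressure, velocity):
    """Time-averaged product of pressure (Pa) and particle velocity (m/s), in W/m^2."""
    if len(pressure) != len(velocity):
        raise ValueError("pressure and velocity must have the same length")
    if not pressure:
        raise ValueError("signals must not be empty")
    return sum(p * u for p, u in zip(pressure, velocity)) / len(pressure)


# Plane progressive wave: velocity is in phase with pressure, u = p / (rho * c).
RHO_C = 1.2 * 343.0  # assumed characteristic impedance of air, Pa*s/m
N = 1000
p = [math.cos(2.0 * math.pi * 10 * i / N) for i in range(N)]
u_in_phase = [x / RHO_C for x in p]
# Purely reactive field: velocity in quadrature with pressure.
u_quadrature = [math.sin(2.0 * math.pi * 10 * i / N) / RHO_C for i in range(N)]

i_active = active_intensity(p, u_in_phase)
i_reactive = active_intensity(p, u_quadrature)
```

In this example the in-phase case gives half the peak pressure squared divided by the impedance, while the quadrature case averages to zero.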

Florian Koenig,
Epidemiological studies about extremely low-frequency electromagnetic fields have underlined human health impairing aspects. This work demonstrates multiple uses of headphones/headsets, not only near-field sound reinforcing devices. Measurements of about 100 different headphones/headsets using pink noise at 70-dB SPL(C) reveal that the majority of objects produced a critical magnetic flux (see TCO '95). Furthermore, a technique to reduce the head-related magnetic field emissions is illustrated.
Health Impairing Aspects by Headphones Electro-Magnetic Fields

Yuichiro Takamizawa,Toshiyuki Nomura,
This paper describes the processor-efficient implementation of a high-quality MPEG-2 AAC encoder employing fast psychoacoustic analysis, efficient encoding of side information, and SIMD instructions. A psychoacoustic analysis in the MDCT domain reduces computational costs. Smoothing of scale factors and optimized selection of Huffman tables are introduced to efficiently encode the side information. SIMD instructions are heavily used in MDCT and quantization processes to improve the encoding speed. Seven-grade comparison MOS test results show that the AAC encoder at 96 kb/s/stereo achieves sound quality equivalent to that of MP3 at 128 kb/s/stereo. The encoder works 13 times faster than real time for stereo encoding on an 800-MHz Pentium III processor.
Processor-efficient implementation of a high quality MPEG-2 AAC encoder

Leonid Yaroslavsky,Miikka Vilermo,Mauri Vaananen,Ye Wang,
This paper presents a novel lossless multichannel audio coding algorithm to remove interchannel redundancy. The authors employed an integer-to-integer discrete cosine transform (INT-DCT) to perform interchannel decorrelation after quantization of modified discrete cosine transform (MDCT) coefficients of individual channels. When compared with a Karhunen-Loeve transform (KLT)-based approach, this new method has three major advantages: 1) it avoids quantization noise spreading to other channels, 2) it is computationally simple, and 3) it uses less overhead information, while having a similar decorrelation capability. (A quantized covariance matrix or eigenvector is avoided in this algorithm.)
A Multichannel Audio Coding Algorithm for Inter-Channel Redundancy Removal

Shankar Regunathan,Kenneth Rose,Ashish Aggarwal,
The authors propose a new approach to achieve efficient scalability in audio coders and discuss its performance using the MPEG-4 Advanced Audio Coder (AAC). In conventional scalable coding, the enhancement layer performs straightforward re-quantization of the base-layer reconstruction error. This coding scheme implicitly discards useful information from the base layer and does not truly minimize a perceptually meaningful distortion criterion, such as the noise-mask ratio. The authors reformulated the problem of scalable coding within a companding framework; and results show that re-quantization in the compander’s compressed domain achieves, in the asymptotic sense, optimal scalability. Based on this observation, the authors developed a scalable AAC coder, which performs enhancement-layer quantization while exploiting all the information available at that layer. Simulation results of a two-layer scalable coder on the standard test database of 44.1-kHz sampled audio show that the proposed approach yields substantial savings in bit rate for a given reproduction quality.
Compander Domain Approach to Scalable AAC

Andreas Ribbrock,Frank Kurth,
The authors propose an MPEG-1 layer 3 conforming audio codec for multiple-generation (cascaded) compression without loss of perceptual quality. Previous research addressing this topic mainly focused on less complex coding schemes, such as MPEG-1 layer 2. In this paper the techniques proposed in those approaches are extended to comply with layer 3's advanced features, such as hybrid filtering, block switching, and bit reservoir-based processing. A prototype implementation, together with extensive listening tests, shows the feasibility of perceptually stable cascaded layer 3 compression.
An Embedding Codec for Multiple Generations Compression Based on MPEG-1 Layer III

Alexey Petrovsky,Alexander Petrovsky,
An adaptive wavelet-packet (WP) tree derived via dynamic algorithm transforms (DATs) is presented. The DAT defines parameters of the input audio signals (subband entropy) and output coded sequences (subband rate) for the given embedded system architecture. A DAT-based pipeline processor (WP tree analysis [encoder] and synthesis [decoder] algorithms) based on reconfigurable hardware, such as SRAM-FPGA plus distributed arithmetic, is described.
Audio coding with a masking threshold adapted wavelet packet based on run-time reconfigurable processor architecture

Damian Martinez Munoz,Fernando Cruz Roldan,Francisco Lopez-Ferreras,Manuel Zurera,
This paper deals with the design and implementation of a scheme for CD-quality audio coding that introduces a delay as low as 6 ms and provides near-transparent coding. It is implemented with a uniform filter bank that decomposes the audio signal into 32 bands. For such a delay, the filters in the filter bank must have a short impulse response, so the filters' nonideal frequency responses should be taken into account. Near-transparent coding is achieved at 96 kb/s, which is a very good result for such a low delay.
Low delay audio coding incorporating psycho-acoustic information

Nikolaus Meine,Bernd Edler,Heiko Purnhagen,
The Harmonic and Individual Lines plus Noise (HILN) MPEG-4 parametric audio coding tool allows efficient representation of general audio signals at very low bit rates. Therefore possible applications include transmission over IP and wireless channels, which are both characterized by specific transmission error models. On the other hand, since parametric audio coding is a relatively new technique compared to transform coding and CELP speech coding, there have been only very limited investigations of HILN's behavior in error-prone environments. In this paper an analysis of error sensitivities and approaches to error protection and concealment are presented.
Error Protection and Concealment for HILN MPEG-4 Parametric Audio Coding

Francisco Lopez-Ferreras,Nicolas Ruiz-Reyes,Pedro Vera-Candeas,Manuel Zurera,
A new algorithm for avoiding overlapping of adjacent frames in order to reduce the block effect is presented. The algorithm is based on forward and backward prediction at the border of frames and has been applied with success to an audio coder based on time-varying wavelet-packet decompositions that use symmetrical and periodic extension as the method for processing frames in isolation.
Avoiding overlapping in a time-varying wavelet-packet based audio coder

Antonio Pena,Enrique Alexandre,Begona Rivas,Rafael Perez,Alberto Duenas,
MPEG natural audio coders, such as MPEG-1/2 layer 3 and MPEG-2/4 AAC, require a great amount of calculation mainly because of the iterative bit allocation processes proposed by the ISO/IEC technical documentation. This complexity makes real-time implementation of the normative algorithms difficult. To solve this difficulty, a set of nonoptimal solutions to reduce the computing load is discussed. Based on these solutions, a real-time implementation of MPEG-1/2 layer 3 using a single fixed-point DSP is presented. In addition, techniques to achieve good audio performance and the methods for the adaptation of parameters are discussed.
Realtime implementations of MPEG-2 and MPEG-4 natural audio coders

Steven Hutt,
Studies of loudspeaker components most often concentrate on the functions that add or detract from the acoustically measurable performance aspects of loudspeakers, such as cones, suspensions or magnet and motor assemblies. On the other hand, application of litz wire in loudspeaker design is not well documented. Yet litz wire has a profound impact on reliability and has a propensity to cause problems, such as unacceptable extraneous noise. After a review of the classical theory on vibration of strings, bars and elasticity, one will surmise that understanding the vibration of litz wire is not trivial. This paper reviews a loudspeaker case study and the ramifications of using well behaved and not so well behaved litz wire.
Loudspeaker Litz Wire

Mark Ureda,
Line arrays of loudspeakers are often employed to provide increased directivity, generally in the vertical plane. For improved performance, contemporary line arrays employ specially designed loudspeaker elements to provide a nearly continuous line source. However, even these may have imperfections relative to a perfect line source. This paper provides mathematical models to evaluate the directivity response of line sources and to quantify the effects of certain imperfections.
Line Arrays: Theory and Applications
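
As background for the "perfect line source" that the abstract uses as its reference, the far-field directivity of a uniform continuous line source follows the textbook expression |sin(u)/u| with u = (k L / 2) sin(theta). The sketch below evaluates that expression only; it does not reproduce the paper's models of imperfections, and the function name and default speed of sound are assumptions.

```python
import math


def line_source_directivity(length_m, freq_hz, angle_deg, c=343.0):
    """Normalized far-field directivity of a uniform continuous line source.

    angle_deg is measured from the axis perpendicular to the line;
    c is the assumed speed of sound in m/s.
    """
    k = 2.0 * math.pi * freq_hz / c  # wavenumber
    u = 0.5 * k * length_m * math.sin(math.radians(angle_deg))
    if abs(u) < 1e-12:
        return 1.0  # limit of sin(u)/u as u -> 0 (on axis)
    return abs(math.sin(u) / u)


on_axis = line_source_directivity(2.0, 343.0, 0.0)
# For a 2 m line at 343 Hz the wavelength is 1 m, so the first null
# falls where sin(theta) = wavelength / length = 0.5, i.e. at 30 degrees.
first_null = line_source_directivity(2.0, 343.0, 30.0)
```

Doubling the line length or the frequency halves sin(theta) at the first null, which is why longer arrays give narrower vertical coverage.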

Yaxiong Huang,Peter Fryer,Simon Busbridge,
A digital loudspeaker is one that does not contain any form of embedded digital-to-analog converter. From a consideration of the mathematical operations describing the digital-to-analog conversion process in a digital loudspeaker, it has been concluded that n identical analog filters (where n is the number of bits) separately filtering each bit driver offers a practical alternative to digital signal processing. The implementation of a multiple driver, multiple voice-coil digital loudspeaker is described. The effect of component tolerances in the crossover and compatibility with the requirements of current drive are evaluated. The interactions between the motion EMF, driving current and mutual coupling EMFs are considered. It has been concluded that this method of crossover implementation is both viable and achievable.
Crossover Systems in Digital Loudspeakers

Vesa Valimaki,Jari Kataja,Marko Antila,
The sound radiation properties of a striped panel loudspeaker are modified in this study. The panel loudspeaker is based on electromechanical film (EMFi) and comprises 14 individual sound radiating areas in the form of narrow stripes. Each stripe can be driven individually so that the panel radiation pattern can be modified and used as a directivity-controlled sound source in audio applications. The performance of the panel was analyzed using an acoustical boundary-element method (BEM) model. For real-time directivity control, an algorithm was developed to process the input signals to gain appropriate radiation patterns. The measurements carried out in anechoic conditions were used to verify that the striped EMFi panel operates according to the simulations and that it produces the desired directivity pattern.
Sound directivity control using striped panel loudspeaker

Neville Thiele,
The effect of phase rotation on the reproduced quality of an audio signal is well known. However, the effect of phase rotations in the responses of individual drivers on the amplitude response of a multiway system is often misunderstood. The paper quantifies this effect, and in showing that it is similar in nature to the better understood problem of time alignment, demonstrates how similar remedies may be applied to both effects together. It also shows how lowering the low-frequency cutoff of a loudspeaker system can reduce its group delay errors in band.
Phase Considerations in Loudspeaker Systems

Ulf Seidel,Wolfgang Klippel,
A new measurement technique is presented for the estimation of linear parameters of the lumped transducer model. It is based on measurement of the electrical impedance and the voice coil displacement using a laser sensor. This technique directly identifies the electrical and mechanical parameters and dispenses with a second measurement of the driver using a test enclosure or an additional mass. Problems because of leakage of the enclosure or the attachment of the mass are avoided, giving accurate and reliable results. Measurement of the displacement also allows identification of the mechanical compliance versus frequency (explaining suspension creep), which is the basis for precisely predicting the radiated sound pressure response at low frequencies. The linear parameters measured at various amplitudes are compared with the results of large signal parameter identification, and the need for nonlinear transducer modeling is discussed.
Fast and Accurate Measurement of the Linear Transducer Parameters

John Meyer,Perrin Meyer,Justin Baird,
The far-field model of loudspeaker interaction is a widely used technique to model the sound pressure radiated from arrays of loudspeakers. The paper provides a quick but thorough introduction to the mathematical foundation of the far-field model and discusses how the model compares, with regard to accuracy and computational efficiency, to other formulations, such as series solutions and boundary-element methods. The paper presents data from a series solution of a model spherical loudspeaker and compares this mathematical model to actual high-resolution measurements. Using this model loudspeaker, methods for calculating and measuring accurate loudspeaker far-field polar (or directivity) patterns are described; and how the accuracy of the far-field polar patterns affects the accuracy of the far-field model is discussed.
Far-Field Loudspeaker Interaction: Accuracy in Theory and Practice

Wolfgang Klippel,
A new auralization technique is presented for the objective and subjective assessment of drivers in the large signal domain. Using the results of the large signal parameter identification, a digital model of the particular driver has been realized in a digital signal processor (DSP) to simulate the sound pressure output for any given input signal (test signal, music). This technique combines the objective analysis and subjective listening test to assess the linear and distortion components in real time. This valuable tool shows the impact of each distortion component on sound quality and allows driver optimization with respect to performance, size, weight, and cost.
Speaker Auralization - Subjective Evaluation of Nonlinear Distortion

Andrzej Czyzewski,Bozena Kostek,Piotr Odya,
The paper describes experiments aimed at determining the influence of visual cues on the perception of spatial sound. Earlier stages of the experiments showed that there exists a relationship between the perception of video presented on the screen and sound signals reproduced in a surround system. However, this relationship is dependent on the type of audiovisual signals. Thus a series of subjective tests has been performed on dozens of experts in order to discover these dependencies. The main issue in these experiments is the analysis of the influence of visual cues on the perception of the surround sound. Conclusions concerning the complexity of the investigated problem are included.
Determination of Influence of Visual Cues on Perception of Spatial Sound

Jens Blauert,John Mourjopoulos,Jorg Buchholz,
For human listeners, many of the reflections generated inside rooms are masked by the direct signal and other reflections. To describe this masking, a multidimensional function is introduced, which determines the reflection masking threshold (RMT). Based on this function, a perceptual model has been developed, which can evaluate the audibility of reflections, as it is described in examples derived from simulated rooms.
Room Masking: Understanding and Modelling the Masking of Reflections in Rooms

Kuniaki Honno,Michael Cohen,William Martens,
A psychophysically derived control for the perceived range of a virtual sound source was implemented for the Pioneer sound field controller (PSFC), a spatial auditory display employing a 15-loudspeaker hemispherical array. Capable of presenting two independent sound sources moving within a simulated reverberant environment, the PSFC primitives include parameters to manipulate source azimuth and elevation as well as the size and liveness of the simulated space. As accurate control of virtual source range was confounded by variations in both the liveness parameter and overall PSFC channel volume, an empirical approach was employed to derive a look-up table (LUT) inverting the average range estimates obtained from a group of human subjects who listened to a set of virtual sources (short speech samples).
Psychophysically-derived control of source range for the Pioneer Sound Field Controller

Thomas Sporer,Jan Plogsties,Sandra Brix,
This paper introduces the activities and technical steps of an interdisciplinary European project called CARROUSO. This name stands for creating, assessing and rendering in real time high-quality audiovisual environments in MPEG-4 context. The key objective of this project has been to provide a novel technology that enables the transfer of a sound field, generated at a certain real or virtual space, to another, usually remotely located, space. New modeling, recording, encoding, decoding, and rendering techniques, which support and implement this technology, are discussed.
CARROUSO - An European Approach to 3D-Audio

Gaetan Lorho,Olli Tuomi,Nick Zacharov,
This paper examines how various aspects of the physical characteristics of the human head and torso affect directional loudness characteristics. Modeled directional characteristics are presented based upon the head-related transfer functions (HRTFs) of a number of individuals in conjunction with the Moore loudness model. Data is presented for the frontal, horizontal, and median planes. Variations between individuals are explored, as are the differences between near- and far-field HRTFs. The contributions of the pinna, head, and torso are examined separately.
Auditory periphery, HRTF's and directional loudness perception

Francis Rumsey,David Murphy,
The spatial rendering of sound in virtual reality systems can quickly become a computationally expensive process. The authors propose a spatial sound rendering system, which allows the graceful degradation of spatial quality based upon scaling parameters. The parameters are a combination of both physical and perceptual attributes. The scalable spatial sound rendering system is divided into three user profiles: professional, pro-sumer and consumer, where each profile comprises a number of varying levels of quality. Typical applications for this scalable framework include mobile VR systems and personal VR systems based upon standard multimedia PCs. One of the main advantages of this scalable architecture is that the audio content is created once and is appropriately scaled for the end user (write once read many).
A Scalable Spatial Sound Rendering System

Lauri Savioja,Jarmo Hiipakka,Tapio Lokki,
A new evaluation framework for virtual acoustic environments (VAE) is introduced. The framework is based on the comparison of real-head recordings with physics-based room acoustic modeling and auralization. The real-head recording procedure and VAE creation method are discussed, and new signal processing structures for auralization are introduced. As a case study, recordings were made in a classroom, which was also modeled and auralized.
A framework for evaluating virtual acoustic environments

Manuel Recuero-Lopez,Juan Gomez-Alfageme,
The software allows the study and analysis of direct-radiation loudspeaker systems of one or multiple ways and with different configurations in the woofer unit by means of computer simulation based on electroacoustical analog circuits. This program allows simulating the electrodynamic loudspeaker behavior from the electrical, mechanical, and acoustical points of view and provides information about the different variables of frequency response.
SSAVV: A Loudspeaker Systems Simulation Software

Davide Saronni,Davide Doldi,Mario Di Cola,
The directional properties of horn devices are governed by the wave front's shape presented at the mouth. An analysis of the sound pressure distribution across the horn's mouth, called pressure distribution mapping, can certainly help to understand how the wave front is shaped there. Moreover, it can also help to understand what happens in some particular circumstances. For example, midrange beaming or high-frequency mouth diffraction phenomena are two well-known obstacles to overcome when designing a broadband constant directivity horn. This method, which was put forward by the authors in previous work, has been extended to several different cases, and its data processing has been improved. The results of such analysis are described through graphic illustrations. Results obtained from measurements on real devices, correlated with traditional directivity plots, are also presented.
Horn's directivity related to the pressure distribution at their mouth: part 2

Peter Fryer,David Henwood,Jon Moore,Gary Geaves,
An approach for simulating transient structural wave propagation in loudspeakers is described. The finite-element method is used for spatial discretization, and the Laplace transform is used for the time solution. The accuracy of the spatial discretization has been verified by simulating the acoustic frequency response of a loudspeaker and comparing the results with measured data. A damping model is introduced, which approximates standard hysteretic damping and can be used directly in both the time and frequency domains. The overall approach has been verified by comparing laser measured and simulated results of the transient structural response of a loudspeaker to an impulselike excitation. Finally, structural energy is plotted and discussed.
Verification of an approach for transient structural simulation of loudspeakers incorporating damping

Francesco Maffioli,Umberto Nicolao,
Although there is no question that the implementation of a parallel crossover network represents the most useful and flexible approach to the electroacoustical transducer matching in a loudspeaker system, there is also no doubt that series-type crossover networks can provide the designer with some interesting features. Unfortunately, the conventional formulae are restricted to the case where the impedance loads, representing the drivers, are equal to each other, which limits the application range. This work explores in more detail the series crossover network topology, presenting more general formulae and showing the advantages and disadvantages with respect to a parallel solution. In this paper first- to less-than-second-order circuit realizations are considered.
Series-Type Passive Crossover Networks. Part 1: First and Less-Than-Second Order Crossovers

Johan van der Werff,
It is possible to arrange loudspeakers in such a way that only one lobe emits from the array. This lobe can have an arbitrary beam width and, to a certain extent, an arbitrary beam shape. Because of this control over the beam, narrow beam widths can be made where wave fronts travel coherently 200 meters or more. It is now possible to cover an area below and in front of the array from almost zero to 200 meters with even direct sound distribution of ±3 dB, where the frequency response is dependent on the transducer used and the air absorption. This eliminates the coloration effects due to side or grating lobes.
Electronically controlled loudspeaker arrays without side lobes.

Per Rubak,Lars Johansen,
A new three-band loudspeaker/room correction system has been designed in order to reveal what room acoustic properties and psychoacoustic relations are necessary and sufficient to consider. Most parameters are variable, and the system is designed to enable real-time implementation. A small-scale listening test has revealed that even when operating on high-end audio equipment and employing a well-damped listening room, improvements in reproduction quality can be achieved. Also in a listening position far away from the optimal and corrected one, some improvement is observed.
Listening Test Results from a new Digital Loudspeaker/Room Correction System

Kirill Horoshenkov,Neil Harris,Elena Prokofieva,
This work investigates experimentally the effect a porous layer has on the acoustic response of distributed mode loudspeakers (DMLs), which are manufactured under license from NXT plc. The experiments were carried out in an anechoic chamber. The results suggest that a porous layer between a rigid base and a DML panel can considerably alter the acoustic emission in the near field and the far field. This is typically illustrated by a reduction in the level of fluctuations in the emitted acoustic pressure spectra. These fluctuations are normally associated with interference between the sound emitted by the front surface of the loudspeaker and by that emitted from the back. The results also suggest that the interference pattern in the air gap is altered by the porous layer so that some individual resonances in the acoustic pressure spectra, which inevitably occur between the rigid base and the vibrating plate, can appear suppressed. A numerical simulation was carried out to model this effect.
The effect of porous materials on the acoustic response of DML panels

Grzegorz Matusiak,Andrzej Dobrucki,
A denominator of a transfer function of a symmetrical bandpass loudspeaker system can be presented as a product of three or four second-order polynomials. Every polynomial can represent the denominator of the transfer function of an electrical filter. The product of remaining polynomials forms the denominator of the transfer function of a nonsymmetrical loudspeaker system. Then, various combinations of these polynomials provide various possibilities of realization of the entire system.
Design principles for symmetrical band-pass loudspeaker systems of sixth and eighth order

Dong-Yan Huang,
This paper presents a novel approach for pitch detection of musical sound signals using the signal-adapted wavelet transform (WT). Since the effectiveness of the wavelet transform for a particular application depends on the choice of the wavelet function, a wavelet function derived from the input power spectral density (PSD) has been designed to concentrate the signal energy in the low-frequency region. Based on the corresponding wavelet transform, a time-based event detection method is proposed to extract the pitch period information from the wavelet coefficients. Because the wavelet is signal adapted and presents the characteristics of the signal, the pitch detector is suitable for sound signals whose fundamental frequency lies in the range from 50 to 4000 Hz. The simulation results and real music experiments demonstrate that the main features of this method are better accuracy than other methods for pitch period estimation and robustness to noise.
Signal-Adapted Wavelet for Pitch Detection of Musical Signals

Anssi Klapuri,
Two generic mechanisms are proposed to facilitate the efficient integration of audio content analysis algorithms. The first mechanism, priority-rule-based interleaving of algorithms, allows the simultaneous interoperation of several bottom-up analysis modules by interleaving their atomic steps. It aims at increased accuracy through controlled manipulation of common data. The second mechanism, top-down routing of requests for data, allows high-level predictions to direct the bottom-up analysis toward verifying the predicted hypotheses by observations. Examples from automatic music transcription are presented to clarify the use of the proposed methods.
Means of integrating audio content analysis algorithms

Bernd Schoner,Tristan Jehan,
A real-time synthesis engine that models and predicts the timbre of different acoustic instruments based on perceptual features is presented. The paper describes the modeling sequence, including the analysis of natural sounds, the inference step that finds the mapping between control and output parameters, the timbre prediction step, and the sound synthesis. Demonstrations included the timbre synthesis of stringed instruments and the singing voice as well as the cross-synthesis and timbre morphing between these instruments.
An Audio-Driven, Spectral Analysis-Based, Perceptual Synthesis Engine

Tim Brookes,
There are several different audio frequency scales in common use, each with its particular merits. The speech-based frequency scale derived here, from vowel formant frequency difference limens, has a markedly different shape from the others and attaches more relative weight to the range of frequencies associated with vowel perception, making it potentially well suited to speech analysis applications.
A Speech-Based Frequency Scale

Rei-Wen Wang,Alvin Su,
A novel music analysis/synthesis method is proposed. The basic structure comprises a delay line, a feedback filter, and a short wavetable as the excitation signal. Because most musical tones are quasiperiodic, the feedback filter predicts the next input data to the delay line based on the signal in the delay buffer. The filter coefficients are obtained in the analysis process performed by using source signals as the teaching vector and a recurrent neural network learning procedure. Because the basic architecture is identical to digital waveguide filters (DWF), the most efficient supplemental processing and implementation techniques for DWF can be applied. Instead of a fixed-length delay line, a variable-length delay line and a control method are embedded when a wide-range portamento is required. The proposed method is currently applied to synthesize plucked-string instruments.
A Novel Portamento Embedded Model For Analysis and Synthesis of Musical Sound

Matti Karjalainen,Vesa Valimaki,Paulo Esquef,
This paper presents new propositions for audio restoration and enhancement based on sound source modeling. The paper describes a case based on the commuted waveguide synthesis algorithm for plucked-string tones. The main motivation is to take advantage of prior information of generative models of sound sources when restoring or enhancing musical signals.
Restoration and Enhancement of Instrumental Recordings Based on Sound Source Modeling

Eduardo Miranda,
This paper presents a technique for synthesizing prosody based upon information extracted from spoken utterances. The author is interested in designing systems that learn how to speak autonomously, by interacting with humans. The motivation for an in-depth investigation on prosody is prompted by the fact that infants seem to have acute prosodic listening during the first months of life. We presume that any system aimed at learning some form of speaking skills should display this fundamental capacity. This paper addresses two fundamental components for the development of such systems: prosody listening and prosody production. It begins with a brief introduction to the problem within the context of the research objectives. Then, it introduces the system and presents some commented examples. The paper concludes with final remarks and a brief discussion of future developments.
Synthesising Prosody with Variable Resolution

Tuomas Virtanen,
The paper describes a method in which two stable sinusoids can be represented with a single sinusoid with time-varying parameters and, in some conditions, approximated with a stable sinusoid. The method is utilized in an iterative sinusoidal analysis algorithm, which combines the components obtained in different iteration steps using the described method. The proposed algorithm improves the quality of the analysis at the expense of an increased number of components.
Accurate Sinusoidal Model Analysis and Parameter Reduction by Fusion of Components
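
As an illustration of the first sentence of the abstract above, the following minimal sketch uses the standard phasor identity to express the sum of two stable sinusoids as one sinusoid with a time-varying amplitude and phase. It is not taken from the paper; the function name `fused_parameters` and its argument names are hypothetical.

```python
import cmath
import math

def fused_parameters(a1, f1, p1, a2, f2, p2, t):
    """Return (A(t), theta(t)) such that
    a1*cos(2*pi*f1*t + p1) + a2*cos(2*pi*f2*t + p2) == A(t)*cos(theta(t)).

    Standard phasor identity; the paper's own formulation may differ.
    """
    # Sum the two components as complex phasors at time t
    z = (a1 * cmath.exp(1j * (2 * math.pi * f1 * t + p1))
         + a2 * cmath.exp(1j * (2 * math.pi * f2 * t + p2)))
    # The magnitude is the time-varying amplitude, the angle the phase
    return abs(z), cmath.phase(z)
```

When the two frequencies are close, A(t) and the derivative of theta(t) vary slowly, which is the situation in which approximating the pair with a single stable sinusoid becomes plausible.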

Andrzej Czyzewski,Bozena Kostek,
The paper discusses the subject of retrieval of musical data from the Internet or multimedia databases, which has been carried out for some time now but does not successfully reach its final stage of application. There are still many problems related to the subject of automatic recognition of music or musical instrument sounds that cannot be solved easily. Especially important is finding adequate parameters of the musical signal based on time and frequency and/or wavelet analysis. Proposed feature vectors were derived on the basis of the constructed databases that contain recorded musical sounds. This study shows that some methods of automatic identification of musical instruments are based on both classical-statistical and soft-computing approaches. They were then used to classify musical instruments. The results obtained in the investigations are presented and analyzed, thus leading to some specific and some more general conclusions.
Automatic Recognition of Musical Instrument Sounds - Further Developments

Frank Thomas,Gary Hebert,
The authors encountered anecdotal evidence that suggests field failures of existing line driver and microphone preamplifier integrated circuits (ICs) correlated with accidental connections between line outputs and microphone inputs with phantom power applied. Analysis showed that the most probable mechanism was large currents flowing as a result of rapid discharge of the high-valued ac-coupling capacitors. Commonly used protection schemes were measured and analyzed, and the results show they are lacking. More robust schemes that address these shortcomings are presented. It was concluded that the small additional cost of these more robust protection schemes is outweighed by the reduction in field failures and their associated repair cost.
The 48-Volt Phantom Menace

Guillaume LE DU,Michael Williams,
No one microphone array is able to fulfill the needs of the sound engineer in all the different sound recording environments he encounters. This paper presents over 220 multichannel microphone arrays using cardioid microphones and describes their particular characteristics with respect to front-triplet, lateral-pair and back-pair coverage, together with the specific segment offset values required for critical linking. Arrays have been chosen to assist the sound engineer in his search for the optimum microphone array for a given recording situation.
The Quick Reference Guide to Multichannel Microphone Arrays - Part I : using Cardioid Microphones

Emmanuelle Bourdillat,Diemer de Vries,Edo Hulsebos,
In order to correctly reproduce (auralize) the acoustic wave field in a hall through a wave field synthesis (WFS) system, impulse responses are nowadays measured along arrays of microphone positions. In this paper three array configurations are considered: the linear array, the cross-array, and the circular array. Both linear and cross-array configurations have strong limitations, most of which can be avoided by using circular arrays. For the circular array configuration, the connections between circular holophony, high-order incoming and outgoing ambisonics, and plane-wave decomposition for a sound field are established and are used as tools for auralization. Auralization techniques are explained for all three types of arrays.
Improved microphone array configurations for auralization of sound fields by Wave Field Synthesis.

Martin Schneider,
The geometry of the microphone surrounding a transducer capsule has a large influence on the acoustical behavior of the transducer as a whole. Therefore, only four microphone design principles are in common use today: with mostly free-standing capsules, with cylindrical housings, embedded in large-boundary layers or with spherical housings. Especially for omnidirectional pressure transducers, the spherical housing can be applied, yielding positive results on frequency response and polar pattern. Spherical housings have been investigated and were introduced to microphone design some 50 years ago. An overview of the historical development and its applications are presented, along with the current embodiments of this principle.
Omnis and Spheres - Revisited

Robert Schulein,Wim Soede,
This development effort focuses on the design of a small, highly directional array microphone. Such a design is valuable for applications where the directivity of first-order microphones is too low and the physical size of traditional shotgun or parabolic microphones prohibits their use. Application examples include hearing aids, automotive mobile phones, and interview situations with less visibility of the microphones.
Optimization of the diffuse field performance of a miniature highly directional array microphone.

Elena Milanova,Emil Milanov,
This paper describes the polar patterns of electrodynamic microphones with two acoustical entrances. By definition, the polar pattern is the ratio between the microphone sensitivity at an angle theta and the sensitivity at theta = 0 degrees, with the frequency and the sound pressure held constant. It is well known that the reason for the proximity effect is the change of the wave front from a plane to a sphere when the distance to the sound source decreases. This paper defines the relationship of the polar pattern in a wave front with a sphere and plane form.
Proximity effect and space characteristics of microphones

Magnus Johansson,Per Ove Almeflo,
A digital microphone has high power consumption compared with an analog one. This results in a higher working temperature, especially if traditional linear regulators are used in the microphone power supply circuitry. For a better degree of power efficiency, a switch-mode power supply can be used. A switch-mode power supply will, however, add noise. If the pulse-width modulator is synchronized to the analog-to-digital converter’s sample-rate clock, the switch-mode power supply noise decreases by the elimination of the alias frequencies.
Suppression of Switch Mode Power Supply Noise in Digital Microphones

Elena Milanova,Emil Milanov,
This paper discusses the proximity effect of electrodynamic microphones with two acoustical entrances. The proximity effect appears when a directional microphone gets closer to the sound source. It has been described as an alteration of the frequency response and, more specifically, as an increase of the microphone’s sensitivity when the distance is decreasing. The reason for this change is the change of form of the sound wave from a plane to a sphere when the distance to the sound source is decreasing. The goal of this work has been to determine how the proximity effect depends on the angle between the sound wave and the acoustical axis of the microphone.
Proximity Effect of microphone

Jin-woo Hong,Tae-jin Lee,Jose Soler Lucas,
During the last few years, a number of efforts have been made toward audio streaming. But among them, little has been done on MPEG-2 Advanced Audio Coding. This paper presents the process the authors used to build an AAC streaming service over RTP for use in the Internet. The result obtained is quite satisfactory and fulfills all the authors' expectations.
Design and Implementation of a Real-Time Audio Service using MPEG-2 AAC and Streaming Technology.

Jurgen Herre,Christian Neubauer,Frank Siebenhaar,
Perceptual audio coding has become a customary technology for storage and transmission of audio signals. Audio watermarking enables the robust and imperceptible transmission of data within audio signals, thus allowing valuable information to be attached to the content, such as song title, name of the composer, and artist or property rights-related data. This paper describes a new concept for simultaneous low bit-rate encoding and watermark embedding for audio signals. In particular, the advantages of this combined technique over separate steps of encoding and watermark embedding are discussed (i.e., encoding of watermarked PCM audio signals or watermarking of existing bit streams). Experimental results obtained from the first implementation of an extended MPEG-2/4 AAC encoder are shown.
Combined Compression / Watermarking for Audio Signals

Ton Kalker,Werner Oomen,Aweke Negash Lemma,Jaap Haitsma,Fons Bruekers,Michiel Veen,
Based on existing technology used in image and video watermarking, the authors have developed a robust, multifunctional, and high-quality audio watermarking technique. The embedding algorithm operates in the frequency domain, where the magnitudes of the Fourier coefficients are slightly modified. Watermark detection relies on cross-correlation techniques in which not only the presence of a watermark is detected but also its payload. Experiments demonstrated that for this particular watermark, objective (ITU-R BS.1387) and subjective (ITU-R BS.1116) audio quality measures correlate fairly well. Combined analysis of the perceived audio quality and robustness indicated that specific watermark parameters can be optimized for different applications. These range from copy management (limited information capacity, high robustness, and very high audio quality) to broadcast monitoring (intermediate to large information capacity, intermediate robustness, intermediate to high audio quality).
Robust, multi-functional, and high quality audio watermarking technology

Jurgen Herre,Ralph Kulessa,Christian Neubauer,
Audio watermarking enables the robust and imperceptible transmission of data within audio signals. Among the many possible applications of this technique, a number of scenarios require direct embedding of watermarks into a compressed signal representation. Such bitstream watermarking systems enable, e.g., on-the-fly embedding of transaction specific data at the time of content delivery via the Internet. This paper extends previous work on bitstream watermarking for MPEG-2/4 Advanced Audio Coding (AAC) toward a compatible family of watermarking schemes for MPEG-1/2 layer 3 (MP3), MPEG-2/4 AAC, and uncompressed audio. Regardless of the format used during data embedding, the same watermark extractor can be employed to recover the embedded message. Both the underlying concepts and relevant experimental results for these schemes are described.
A compatible family of bitstream watermarking schemes for MPEG-Audio

Michael Clausen,Frank Kurth,
The paper presents a system for indexing and index-based search in PCM-based audio material. Given a short excerpt of a waveform signal as a query, the index returns all pieces in a database containing that waveform. Additionally, the precise position of the waveform within those pieces is returned. The indexing method is robust against several signal processing operations, such as lossy compression or addition of noise. Indexing of a test database consisting of approximately 10 GB of audio data resulted in an index size of 16 MB. Response times to queries of lengths of about one or one-half second are only fractions of a second.
Full-Text Indexing of Very Large Audio Data Bases

George Kalliris,Charalampos Dimoulas,George Papanikolaou,Costas Rizakos,Christos Sevastiadis,
The current work focuses on the development of a web-based user-friendly environment for the design and management of distance-learning courses. Its objective is to promote the simple web service to a dynamically refashioning multimedia-enabled tool to the tutor, offering advanced learning facilities to the trainees. Based on the core of previous work and the incorporation of contemporary technologies--such as a database server and a web portal application server--a system capable of managing numerous users and courses, accommodating many new services, and providing new means of communication has been designed. The system is currently being evaluated through a digital audio distance-learning course which is offered via the Internet to the students of the authors’ department.
Development of a distance-learning environment, using database driven dynamic web pages. Application for digital audio internet courses

Siegbert Herla,
For an economic realization of future online archives working in a global network structure, the right course shall be set. Therefore during the digitization and capturing process of single sound carrier archives, attention is already being paid to the registration of relevant metadata, along with the re-recording of sound and video data. The description of sound and video content, their technical quality, cue sheet, and status information are as equally important as a unique material identifier for a worldwide unique identification of source material. In addition, the continuous gathering of metadata shall accompany the audio and video material on the road. A platform-independent container, such as the BWF format, is used to gather all the helpful information on the way.
Online Archives - the Pressure of Metadata

Neil Packer,
A permanent large-scale audio system for sound reinforcement and zoned announcements was installed in the 760-hectare (1900-acre) Sydney Olympic Park precinct--the site of 14 major venues for the 2000 Olympic Games events. During the Games, up to 450 000 people were present on the site at any time. Around the site, 550 distributed loudspeakers were driven from 56 adjacent dual-channel amplifiers. Networked digital audio directly accessed loudspeaker amplifiers, carried across the site on both optic-fiber and copper data links. The paper discusses the benefits of this approach, including flexible signal routing, distributed signal processing, direct-to-network digital message playout, and centralized or local control alternatives.
A Networked Sound Reinforcement and Announcement System for the 2000 Sydney Olympic Games

Wolfgang Teuber,Ernst-Joachim Voelker,
The office sound in open spaces can be consequently designed to obtain confidence and acceptance. The improvement of listening conditions includes the avoidance of interfering speech from adjacent workplaces, disturbing noise from computers or telephones, sound from air conditioning outlets or noise from photocopy machines. Sound design is defined as the build up of a pleasant and well accepted sound field within the office area. The tools implemented are absorption, distance, and desk orientation combined with artificial ambient noise for masking purposes. A field of confidence can be found within limits of acceptance. Beyond these limits, disadvantages occur with the consequence of negative judgments and rejections. The limits are sometimes so strong that little adjustments must be made to overcome other measures. Some open-planned offices need to be continuously monitored, checking the important influencing factors. The goal is privacy for every office workplace with different privacy requirements. A scale of privacy exists and must be used properly.
Sound Conditioning and Acoustical Sound Design for Office Working Places

Oliver Schmitz,Michael Vorlaender,Stefan Feistel,Wolfgang Ahnert,
The combination of a well-known electroacoustical simulation software with a state-of-the-art room acoustical simulation engine by merging the advantages of both to form a multipurpose simulation tool is introduced. The new tool is useful for consultants for application in complex situations in medium-sized and large rooms. Advantages of combining a high-quality loudspeaker database and a powerful 3-D CAD interface with a hybrid image source/ray-tracing algorithm, including diffuse scattering, are discussed. Details about the room acoustical model and its limitations as well as the extended possibilities of the system are the main topics of this paper.
Merging software for sound reinforcement systems and for room acoustics

Peter D'Antonio,Trevor Cox,
Modes in small rooms may lead to uneven frequency responses and extended sound decays at low frequencies. In critical listening environments, they often cause unwanted coloration effects, which can be detrimental to the sound quality. Choosing an appropriately proportioned room may reduce the audible effects of modes. This paper details a new methodology for determining the room dimensions for small critical listening spaces. It is based on numerical optimization of the room dimensions to achieve the flattest possible frequency response. The method is contrasted with previous techniques.
Determining Optimum Room Dimensions for Critical Listening Environments: A New Methodology
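
To make the idea of scoring candidate room dimensions concrete, here is a minimal sketch that lists the modal frequencies of a rigid-walled rectangular room using the textbook formula f = (c/2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2). The figure of merit `spacing_spread` (spread of the gaps between successive modes) is only a simple stand-in chosen for illustration; the abstract above describes optimizing the flatness of the frequency response, which this sketch does not compute. All names and the speed-of-sound value are assumptions.

```python
import itertools
import math
import statistics

C = 343.0  # assumed speed of sound in m/s

def mode_frequencies(lx, ly, lz, f_max=200.0):
    """Modal frequencies (Hz) up to f_max for a rigid rectangular room."""
    # Highest axial mode index that can still fall below f_max on each axis
    n_max = [int(2 * f_max * length / C) for length in (lx, ly, lz)]
    freqs = []
    for nx, ny, nz in itertools.product(*(range(n + 1) for n in n_max)):
        if nx == ny == nz == 0:
            continue  # skip the trivial (0, 0, 0) entry
        f = (C / 2) * math.sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)
        if f <= f_max:
            freqs.append(f)
    return sorted(freqs)

def spacing_spread(lx, ly, lz, f_max=200.0):
    """Illustrative stand-in cost: spread of gaps between successive modes."""
    f = mode_frequencies(lx, ly, lz, f_max)
    gaps = [b - a for a, b in zip(f, f[1:])]
    # With fewer than two modes there is no gap to measure
    return statistics.pstdev(gaps) if gaps else 0.0
```

A numerical optimizer could then vary (lx, ly, lz) within the allowed limits and keep the dimensions with the lowest cost, whichever cost function is chosen.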

Trevor Cox,Francis Li,
This paper presents a novel method to extract Speech Transmission Index (STI) from reverberated speech utterances using an artificial neural network. The convolutions of anechoic speech signals and simulated impulse responses of rooms of various kinds are used to train the artificial neural network. A time-to-frequency domain transformation algorithm is proposed as the preprocessor. A multilayered feedforward neural network trained by back propagation is adopted. Once trained, the neural network can accurately estimate STI from speech signals received by a microphone in rooms. This approach uses a naturalistic sound source, namely speech, and hence has the potential to facilitate measurements in occupied rooms.
Extraction of Speech Transmission Index from Speech Signals Using Artificial Neural Networks

Bayan Sharif,Oliver Hinton,Andrzej Dobrucki,Krzysztof Passella,
This paper is a proposal for an extension of the wave field synthesis (WFS) method toward a new direction in adaptive sound reinforcement systems. The positions of sound sources may be found using array signal processing techniques. Analysis of an acoustic field can be performed on a microphone array using the direction of arrival (DOA) algorithm, which can estimate angle of arrival of acoustic waves, such as tones or speech signals. This paper presents the real-time analysis of an acoustic field based on the implementation of DOA algorithms on the DSP board.
The wave field analysis and synthesis of an acoustic field in rooms using the concept of a direction of arrival algorithm

Tapio Lahti,Anssi Ruusuvuori,Henrik Moller,
The paper describes a series of acoustic measurements in Finnish concert halls. The measurements were made using the IRMA system, which is described in another paper offered for presentation at an earlier conference. Both traditional single-channel measurements and binaural measurements were taken in all halls. In some halls multidimensional measurements were also made using the special probe described in the above mentioned paper. The measurement results are used in a comparison of the acoustic conditions in Finnish concert spaces to other spaces.
The acoustic conditions in Finnish concert spaces - Preliminary results

Ryoichi Omachi,Damian Leonard,Damian Rowe,Mac Takeuchi,
The Sydney 2000 Olympic Games were held in Australia from 2000 September 15 to October 1, with more than 10 000 athletes in 300 events. The authors designed the sound systems for 34 venues, covering both indoor and outdoor events. Computer acoustic simulation software was originally developed to support designing a sound system for each venue. The paper summarizes how high-quality sound and good intelligibility were maintained for long-distance transmission, even for outdoor venues, by using digital technology. Furthermore, sound pressure level (SPL) of each venue was measured during the actual Games. This paper describes how sound systems for the Sydney 2000 Olympic Games were designed and how they performed.
Sound System Design for the Sydney 2000 Olympic Games

Sidnei Noceti Filho,Rosalfonso Bortoni,Rui Seara,
A procedure for analyzing, designing, and assessing audio power amplifier output stages operating in Class A, B, AB, G, and H with reactive loads is presented. This study considers steady-state sinusoidal analysis for BJT, IGBT, and MOSFET technologies. Electrical-mechanical-acoustical models of loudspeakers and enclosures whose parameters were obtained through the Thiele-Small model were used. An equivalent electrical-thermal model for the transistor-heatsink-ambience associated with the instantaneous and average powers was used for designing the power stage. MATLAB software has been developed, which provides considerable support to the designer for all required phases of an audio power amplifier output stage design.
Analysis, Design and Assessment of Class A, B, AB, G and H Audio Power Amplifier Output Stages Based on Matlab Software

Enrico Armelloni,Alberto Bellini,Angelo Farina,
This work defines a new method for processing audio signals, with the aim to recreate an audible simulation (auralization) of the modification imposed on the original signal by a complex system. The new method is an extension of the classic auralization process based on the linear convolution of the 'dry' original signal with the impulse response of the system. The extension allows the emulation of nonlinear systems, characterized in terms of harmonic distortion at several orders. First, the work presents the mathematical framework of the proposed implementation, then shows how a nonlinear system can be experimentally characterized by a new measurement method of multiple impulse responses at various harmonic orders. Finally, it shows how these impulse responses can be employed in a multiple convolution process: an experimental demonstration was performed for the similarity of the numerically processed sound with the live recording coming from a highly distorting device.
Non-Linear Convolution: A New Approach For The Auralization Of Distorting Systems
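
The "multiple convolution process" mentioned in the abstract above can be pictured with the following minimal sketch. It assumes, as an illustration only, that the output is the sum of the n-th power of the dry signal convolved with an n-th order kernel; the abstract does not spell out this structure, and how such kernels would be derived from the measured harmonic-order impulse responses is outside the sketch. The function names are hypothetical.

```python
def convolve(x, h):
    """Plain linear convolution of two sequences (direct form)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def nonlinear_convolve(x, kernels):
    """Sum over orders n = 1, 2, ... of (x ** n) convolved with kernels[n - 1].

    Assumed structure for illustration; kernels are taken as given.
    """
    n_out = len(x) + max(len(h) for h in kernels) - 1
    y = [0.0] * n_out
    for order, h in enumerate(kernels, start=1):
        powered = [s ** order for s in x]  # sample-wise power of the dry signal
        for n, v in enumerate(convolve(powered, h)):
            y[n] += v
    return y
```

With a single kernel the function reduces to ordinary linear convolution, which corresponds to the classic auralization process the abstract refers to.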

George Brock-Nannestad,
A number of physical limitations are inherent in analog recording media; and for this reason, compromises have to be accepted. In order to widen the frequency range of the recordings during mechanical, optical, and magnetic recording, pre-emphasis at recording was used, which was then suppressed again by a complementary de-emphasis at replay. The paper traces the parallel development in all three fields of analog recording.
Pre- and De-Emphasis - A Forgotten Necessity

Nicolaos Tatlas,John Mourjopoulos,Andreas Floros,
Using analytic PCM-to-PWM mapping, combined with a novel method for eliminating PWM-induced distortions (jithering), a distortion-free, all-digital and high-quality PWM coder was developed. High-efficiency performance was achieved at switching frequencies between 44.1 and 176.4 kHz. A field programmable gate array-based environment was used for the implementation of the PWM converter, which is suitable for any digital audio application.
A Distortion-free PWM Coder for All-digital Audio Amplifiers

Malcolm Hawksford,
A theory of smart loudspeaker arrays is described, where a modified Fourier technique yields complex filter coefficients to determine the broadband radiation characteristics of a uniform array of microdrive units. Beamwidth and direction are individually programmable over a 180-degree arc, where multiple-agile and steerable beams carrying dissimilar signals can be accommodated. A novel method for diffuse filter design is also presented, which endows the directional array with diffuse radiation properties.
Smart directional and diffuse digital loudspeaker arrays

Girish Subramaniam,Raghunath Rao,
Nonlinear quantization of the type INT(x^(M/N) + constant) is commonly used in audio compression techniques, particularly MPEG-1 and MPEG-2 layer 3 (MP3) and MPEG Advanced Audio Coding (AAC). Finding a suitable DSP implementation is a problem, since look-up table methods are prohibitive because of excessive storage requirements, conventional series approximation methods do not give sufficient precision, and not all processors have log/exp assist functions. This paper describes a method that utilizes the property of geometric periodicity of the x^(M/N) function to first normalize the problem to a small range of input x. Subsequently, one can choose to perform the x^(M/N) computation in this limited range based on look-up, interpolation or series expansion, and finally re-normalize the output to obtain the overall answer. Using a hybrid scheme based on look-up and interpolation, very good overall precision is achieved. Compared to direct application of any of the above techniques, there is very little additional computational burden, and the improvement in precision is very significant. Mathematically, this method has been shown to be a special case of log-exp-based computation where the log is quantized.
Optimized DSP Implementation of Non-Linear Quantization
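
The normalize, look up, and re-normalize idea in the abstract above can be sketched as follows. This is a minimal floating-point illustration, not the authors' DSP implementation: the exponent 3/4, the table size, the rounding constant 0.4054, and all names are assumptions chosen for the example. It relies on the identity (m * 2^(N*k))^(M/N) = m^(M/N) * 2^(M*k).

```python
M, N = 3, 4                 # exponent M/N = 3/4 (assumed, as used in MP3/AAC)
TABLE_SIZE = 256            # hypothetical table resolution
LO, HI = 1.0, 2.0 ** N      # normalized input range [1, 16)

_STEP = (HI - LO) / TABLE_SIZE
_TABLE = [(LO + i * _STEP) ** (M / N) for i in range(TABLE_SIZE + 1)]

def pow_mn(x):
    """Approximate x**(M/N) by range reduction plus table interpolation."""
    if x <= 0.0:
        return 0.0
    k = 0
    while x >= HI:          # scale down by 2**N until x is in [LO, HI)
        x /= HI
        k += 1
    while x < LO:           # scale up by 2**N for small inputs
        x *= HI
        k -= 1
    pos = (x - LO) / _STEP
    i = min(int(pos), TABLE_SIZE - 1)
    frac = pos - i
    y = _TABLE[i] + frac * (_TABLE[i + 1] - _TABLE[i])   # linear interpolation
    return y * 2.0 ** (M * k)                            # re-normalize

def quantize_mn(x, constant=0.4054):
    """INT(x**(M/N) + constant); the constant is an illustrative default."""
    return int(pow_mn(x) + constant)
```

Because the table only has to cover one "period" of the function, a small table with interpolation reaches a precision that a direct table over the full input range would need far more storage to match.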

Alexander Goldin,Meir Tzur (Zibulski),
This paper addresses the problem of equalizing an audio signal in a constantly changing noisy environment. The purpose of equalization is to provide perceptually equal loudness of sound regardless of the environmental conditions. Based on an automatic estimation of noise level and its spectral content, selective amplification of frequencies masked by noise is performed. In the case of speech signals, the result is intelligible speech regardless of the surrounding noise. For musical signals, an improved comprehension of the musical content is achieved.
Sound equalization in a noisy environment

Helmut Bresch,Wolfgang Mathis,Frank Felgenhauer,Martin Streitenberger,
Zero Position Coding (ZePoC) is introduced in this paper as a generalized concept for describing methods of generating binary signals with varying pulse lengths. This class of signals is of basic interest within concepts of Class-D power amplification. It is emphasized that from a generalized point of view, such signals are generated by coding the positions of the zero crossings (sign changes) of some auxiliary signal being uniquely determined by the audio input signal. The new ZePoC concept includes classical methods, such as NPWM and UPWM, as well as a new method, SB-ZePoC, which allows the generation of a binary signal with separated base band. The methods are compared, showing that SB-ZePoC is favored for use in the Class-D amplification concept. Results of the first full audio band implementation of SB-ZePoC are given.
Zero Position Coding (ZePoC) - A Generalised Concept of Pulse-Length Modulated Signals and its Application to Class-D Audio Power Amplifiers
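
For orientation, uniformly sampled PWM (UPWM), one of the classical methods named in the abstract above, can be sketched as below. This is a generic textbook-style mapping, not SB-ZePoC; the function name and the number of clock ticks per switching period are assumptions.

```python
def upwm_encode(samples, ticks_per_period=256):
    """Map each PCM sample in [-1, 1] to one switching period of +1/-1 ticks.

    Hypothetical helper: the pulse starts at the beginning of each period
    and its length is proportional to the sample value.
    """
    stream = []
    for s in samples:
        s = max(-1.0, min(1.0, s))                       # clip to full scale
        high = round((s + 1.0) / 2.0 * ticks_per_period)
        stream.extend([1] * high + [-1] * (ticks_per_period - high))
    return stream

# One period encoding the sample 0.5: 192 ticks high, 64 ticks low.
period = upwm_encode([0.5])
mean_value = sum(period) / len(period)
```

The mean of each period reproduces the input sample, while the position of the falling edge (the zero crossing) carries the information, which is the viewpoint the ZePoC concept generalizes.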

Peter Fels,
Multichannel sound for a teleconferencing system with better sound quality and wide-band multichannel sound production, transmission, and reproduction is described. This method has the advantages of the coincidence of visual and auditory perspective, the minimum of feedback, and several others. The first part of the paper describes a modern solution for optimizing the technology of the multichannel production and for minimizing the echo and noise. The second part discusses the complete solution in greater detail. The third part examines the applications of the principles of the delta-stereophony-systems (SOR) for multichannel conferencing systems.
Multichannel and "SOR" principles for conferencing and teleconferencing systems

Andreas Gernemann,
Until now, considerations for the arrangement of microphones using three frontal channels based on ITU-R BS 755-1 Recommendation assume that three frontal loudspeakers are set up in equal height and equal distance in front of a listener. Unfortunately in most home applications, the center loudspeaker is not set up as required by the ITU standard. In addition, most of these microphone techniques are not compatible with two-channel stereo. Stereo+C is an arrangement that allows use of normal stereo microphone techniques with an additional specially arranged center microphone. The entire arrangement is completely stereo-compatible and uncritical of a nonideal loudspeaker set up at the consumer's home.
Stereo+C: An All-Purpose Arrangement of Microphones Using Three Frontal Channels

Francis Rumsey,Paul Segar,
A number of surround sound arrays have been constructed with closely spaced microphones of cardioid and omnidirectional patterns. The spacing and angles between microphones were calculated to test two different psychoacoustics models that provide 360° imaging in a horizontal plane. A series of controlled subjective listening tests have been undertaken, and results comparing image localization accuracy and localization confidence between the arrays are presented. Results of the effect of crosstalk between opposite microphones in the arrays are also provided.
Optimisation and Subjective Assessment of Surround Sound Microphone Arrays

Chas Kennedy,John Emmett,
This paper raises a debate about the future options that are open for multichannel audio metering in television and radio production. These options are seen in the light of experience with movie production and as a part of the wider issue when metering is applied to legacy applications. In particular, the study questions the continued relevance of signal-level metering within all-digital audio chains. It also concentrates on the importance of dialogue loudness and intelligibility, when that dialogue may form an integral part of several different audio delivery packages. Finally, the visual-perceptual aspect of various types of meter display seems to be an important but neglected aspect of the overall measurement process.
Metering for Multichannel Audio

Werner de Bruijn,Wilfred van Rooijen,Marinus Boone,
Wave field synthesis (WFS) is a method to reproduce spatial sound with a correct localization over a large listening area. It enables a high-quality sound reproduction of sound objects according to, for instance, the MPEG-4 standard, but also compatible reproduction of 2/0 and 3/2 sound material. Recent developments are presented concerning true perspective acoustic reproduction, in combination with video projection, making use of different types of loudspeaker arrays, including multiexciter DML panels.
Recent developments on WFS for high quality spatial sound reproduction

Mark Sandler,Nikolaos Mitianoudis,Josh Reiss,
The paper presents a new method to extract the mutual information for data from any number of channels from either a discrete or continuous system. This generalized mutual information allows the estimation of the average number of redundant bits in a vector measurement. Thus, it provides insight into the information shared between all channels of the data. It may be used as a measure for the success of blind signal separation with multichannel audio. Several multichannel audio signals were separated using various ICA methods, and the mutual information of each signal was computed and interpreted. It was also implemented as a contrast function in ICA for a new method of blind signal separation.
Computation of Generalized Mutual Information from Multichannel Audio Data
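
One common multichannel generalization of mutual information is the redundancy (sometimes called total correlation): the sum of the per-channel entropies minus the joint entropy. The sketch below estimates it from discrete data with simple histogram counts; whether this matches the estimator used in the paper is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy in bits of a list of hashable symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def redundancy_bits(channels):
    """Sum of per-channel entropies minus the joint entropy, in bits."""
    joint = list(zip(*channels))           # one tuple per time index
    return sum(entropy_bits(list(ch)) for ch in channels) - entropy_bits(joint)

a = [0, 1, 0, 1]
b = [0, 0, 1, 1]
shared = redundancy_bits([a, a])           # identical channels share 1 bit
independent = redundancy_bits([a, b])      # independent channels share none
```

In a blind source separation setting, a value near zero after separation would suggest that the recovered channels carry little shared information.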

Philip Nelson,Takashi Takeuchi,
When binaural sound signals are presented with loudspeakers, the system inversion involved gives rise to a number of problems, such as loss of dynamic range and a lack of robustness to small errors of control performance. These problems for such systems were investigated, resulting in the proposal of a new system, the optimal source distribution (OSD) system, which overcomes these problems by means of variable transducer span. A practical solution to realize a variable transducer span by discretization is also described. Several examples of the OSD system are demonstrated, which in practice produce a very robust system over the whole audible frequency range. The relationship to the stereo dipole system is also described.
Optimal source distribution system for virtual acoustic imaging.

Mark Sandler,Christian Landone,
Multichannel sound systems have been extensively used in sound reinforcement applications and in other large installations, but the overall impact on audiences still remains unknown at the design stage. This paper highlights some of the challenges involved in predicting the spatial reproduction performance of surround sound systems serving large and acoustically live listening areas and explores the shortcomings of current objective assessment methods.
Surround Sound Impact over Large Areas

Andrzej Czyzewski,Bozena Kostek,Piotr Odya,Artur Kornacki,
The production of recordings designated for surround sound systems has become a vital problem in sound technology. Existing standards of surround systems allow reproduction of spatial sound. However, there are no consistent recommendations for which microphone and mixing technique can be used in specific situations. For the purpose of research presented in this paper, several microphone techniques were employed for recordings of a quartet playing classical music. The mixing results in two-channel excerpts and several multi-channel ones designated for the 5.1 reproduction system. In order to find the most preferable recording technique, these excerpts were used in subjective tests.
Problems Related to Surround Sound Production

Bozena Kostek,Rafal Krolikowski,Andrzej Czyzewski,
The primary aim of this paper is to show that it is possible to localize the direction of the incoming acoustical signal based on the neural network trained for that purpose. Consequently, the automatically localized acoustical signal may be attenuated if it obscures the desired target sound. A set of parameters was formulated in order to localize target source and unwanted signals. To process acoustical signals incoming from various directions at the same time, a neural network-based system was designed and implemented. The feature extraction method is thoroughly discussed; the training process is described; and the recently obtained results are discussed.
Neural Networks Applied to Sound Localization Detection

Nikolaus Rettelbach,Doris Huhn,Harald Gernhardt,Wolfgang Fiesel,
The recently introduced ISO/MPEG-2 Advanced Audio Coding (AAC) technology provides a powerful framework, which covers almost any application from simple monophonic compression to full-featured multichannel coding. This paper discusses approaches and results of an implementation effort of a 5.1 AAC multichannel encoder on a high performance floating-point catalog DSP platform. Based on a new AAC coding strategy, it leads to a very efficient encoder on a single DSP and thus also enables cost-effective, high-quality multichannel encoding for consumer-type applications.
MPEG-2 AAC Multi-Channel Real-Time Implementation on a Single Floating Point DSP

Peter Thorpe,Nathan Bentall,Gary Cook,Chris Gerard,Chris Sleight,Mike Smith,Peter Eastty,
This paper presents practical recipes for the processing of DSD-Wide [64FS 8-bit] signals, which are fully compatible with the DSD [64FS 1-bit] signals used by the SACD consumer audio format. The designs are presented in a schematic form compatible for implementation by interested engineers in either FPGA or (with some modification) traditional DSP methods. It is intended to open up the processing of such super high-fidelity signals to a wider audience.
DSD-Wide. A Practical Implementation for Professional Audio.

Matti Karjalainen,Tuomas Paatero,
Frequency-warped filters have recently been applied successfully to a number of audio applications. The idea of all-pass delay elements replacing unit delays in digital filters allows focusing of the enhanced frequency resolution on the lowest (or highest) frequencies and enables a good match to the psychoacoustic Bark scale. Kautz filters can be seen as a further generalization where each transversal element may be different, including complex conjugate poles. This enables arbitrary allocation of frequency resolution for filter design, such as modeling and equalization (inverse modeling) of linear systems. In this paper the authors formulate strategies for using Kautz filters in audio applications. Case studies of loudspeaker equalization, room response modeling, and guitar body modeling for sound synthesis are presented.
Kautz filters and generalized frequency resolution - Theory and audio applications
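
The frequency warping mentioned in the abstract above comes from replacing each unit delay with a first-order all-pass section A(z) = (z^-1 - lambda) / (1 - lambda z^-1). The sketch below only evaluates that section and its frequency mapping; it does not build a Kautz filter, and the function names and the value of lambda are assumptions for illustration.

```python
import cmath
import math

def allpass_response(lam, omega):
    """Frequency response of (z^-1 - lam) / (1 - lam z^-1) at omega rad/sample."""
    z_inv = cmath.exp(-1j * omega)
    return (z_inv - lam) / (1.0 - lam * z_inv)

def warped_frequency(lam, omega):
    """Warped frequency: the negative phase of the all-pass section."""
    return omega + 2.0 * math.atan2(lam * math.sin(omega),
                                    1.0 - lam * math.cos(omega))

lam = 0.7        # assumed warping parameter; positive values favor low frequencies
omega = 0.5      # normalized angular frequency in rad/sample
a = allpass_response(lam, omega)
nu = warped_frequency(lam, omega)
```

The magnitude of the section is 1 at every frequency, so only the phase (and hence the frequency axis) changes; with a positive lambda, low frequencies are stretched over a larger part of the warped axis, which is how the resolution is concentrated there.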

Mark Sandler,Jean-Julien Aucouturier,
This paper presents a segmentation algorithm for acoustic musical signals, using a hidden Markov model. Through unsupervised learning, the authors discovered regions in the music that present steady statistical properties, such as textures. Different front ends for the system were investigated and compared in terms of their performance. The paper then shows that the obtained segmentation often corresponds to a structure explained by musicology, such as chorus and verse as well as different instrumental sections. Finally, it discusses the necessity of the HMM and concludes that an efficient segmentation of music is more than a static clustering and should make use of the dynamics of the data.
Segmentation of Musical Signals Using Hidden Markov Models.

Markus Cremer,Bernhard Froba,Oliver Hellmuth,Jurgen Herre,Eric Allamanche,
Fueled by the digital revolution, efficient data reduction schemes and the breakthrough of the Internet, an ever-increasing amount of audio material has recently become available in digital format. Efficient handling and possibility of identification for these items are becoming extremely important to manage such amounts of content. This paper describes a prototype system for a content-based identification system of audio material based on a database of registered works. The technical approach is outlined, and the system's current performance and the range of possible applications are discussed.
AudioID: Towards Content-Based Identification of Audio Material

Shreyas Paranjpe,
The author has developed an artificial reverberation device based on a novel, time-variant orthogonal matrix feedback delay network topology. This novel topology uses multiple, time-variant output taps for each delay line and therefore simultaneously reduces the amount of delay memory required, without introducing coloration, and increases the echo density. Furthermore, the system is guaranteed to be stable provided that certain constraints on the delay line lengths and tap weights are fulfilled. The implementation on a 24-bit digital signal processor requires only 16384 words of delay line memory for a four-channel input/four-channel output reverb.
Time-variant Orthogonal Matrix Feedback Delay Network Reverberator
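
For readers unfamiliar with the underlying structure, the sketch below is a plain, time-invariant feedback delay network with four delay lines and an orthogonal (scaled Hadamard) feedback matrix. It omits the time-variant output taps that are the paper's contribution; the delay lengths, the feedback gain, and the function name are assumptions.

```python
def fdn_impulse_response(n_samples, delays=(149, 211, 263, 293), g=0.9):
    """Impulse response of a 4-line FDN with a scaled Hadamard feedback matrix.

    Hypothetical parameters: mutually prime delay lengths and a loop gain
    g < 1, which keeps the loop stable because the matrix is orthogonal.
    """
    hadamard = [[1, 1, 1, 1],
                [1, -1, 1, -1],
                [1, 1, -1, -1],
                [1, -1, -1, 1]]            # times 0.5 gives an orthogonal matrix
    lines = [[0.0] * d for d in delays]    # circular buffers
    idx = [0] * 4
    out = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0         # unit impulse input
        taps = [lines[i][idx[i]] for i in range(4)]   # oldest sample per line
        out.append(sum(taps))
        feedback = [0.5 * g * sum(hadamard[i][j] * taps[j] for j in range(4))
                    for i in range(4)]
        for i in range(4):
            lines[i][idx[i]] = x + feedback[i]
            idx[i] = (idx[i] + 1) % delays[i]
    return out

ir = fdn_impulse_response(20000)
```

The first echo appears after the shortest delay, and the orthogonal matrix mixes energy between the lines so that the echo density grows with time while the response decays.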

Charalampos Dimoulas,George Papanikolaou,George Kalliris,
Equivalent masking noise estimation could be introduced in conventional broadband acoustic noise reduction to provide a new class of modified techniques. The psychoacoustical facts exploited in this paper resulted in a frequency-dependent parametric Wiener filter. A discussion of classical spectral subtraction and proof of equivalence under certain conditions to Wiener filtering is given. The concept of parametric Wiener filter is then examined, and a frequency dependence based on the model of pure tones masked by broadband noise is introduced. Finally, filter bank, STFT, and wavelet implementations of the new approach are compared to classical spectral subtraction for background noise reduction in old 78 rpm music disc recordings and noisy speech tape recordings.
Broad-Band Acoustic Noise Reduction using a Novel Frequency Depended Parametric Wiener Filter. Implementations using Filter-bank, STFT and Wavelet Analysis/Synthesis Techniques.
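
The equivalence between spectral subtraction and Wiener filtering mentioned in the abstract above can be illustrated with per-bin gain formulas. The sketch assumes the common textbook forms: power spectral subtraction with gain sqrt((X - N) / X), and a parametric Wiener gain (S / (S + alpha N))^beta with S estimated as X - N. The parameter names alpha and beta are assumptions, and the paper's frequency-dependent, masking-based choice of parameters is not reproduced.

```python
def spectral_subtraction_gain(noisy_power, noise_power):
    """Gain of power spectral subtraction for one frequency bin."""
    if noisy_power <= 0.0:
        return 0.0
    clean = max(noisy_power - noise_power, 0.0)    # floor at zero
    return (clean / noisy_power) ** 0.5

def parametric_wiener_gain(noisy_power, noise_power, alpha=1.0, beta=1.0):
    """Parametric Wiener gain (S / (S + alpha * N)) ** beta for one bin."""
    clean = max(noisy_power - noise_power, 0.0)    # estimate of S
    denom = clean + alpha * noise_power
    return (clean / denom) ** beta if denom > 0.0 else 0.0

# With alpha = 1 and beta = 0.5 the two gains coincide.
g_ss = spectral_subtraction_gain(4.0, 1.0)
g_pw = parametric_wiener_gain(4.0, 1.0, alpha=1.0, beta=0.5)
```

Making alpha a function of frequency is one way such a filter could subtract less noise in bins where the residual would be masked anyway.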

Bart de Bruyn,Francis Rumsey,Russell Mason,
The subjective spatial effect of noise signals with sinusoidal ITD fluctuations was investigated. Both verbal and nonverbal elicitation experiments were carried out to examine the subjective effect of the ITD fluctuations with a number of fluctuation frequencies and fluctuation magnitudes. It was found that the predominant effect of increasing the fluctuation magnitude was an increase in the perceived width of the sound.
An investigation of interaural time difference fluctuations, part 1: the subjective spatial effect of fluctuations delivered over headphones

Sean Olive,
A new computer-based listener training application is described, which trains listeners how to detect, classify, and rate linear distortions added to program material. All signal processing is performed natively on the computer, eliminating the need for expensive external hardware or large sound file libraries of pre-processed signals. The software adapts to the listener's ability and performs automatic statistical analysis and storage of the results on a database server. It provides a useful tool for selecting the most discriminating and consistent listeners for listening tests and product evaluation.
A New Listener Training Software Application

Francis Rumsey,Amber Naqvi,
The main objective of the active listening room project has been to design a critical listening environment where the key acoustic features of the room can be actively modified. The aim has been to create a truly variable listening condition in a reference listening room by means of active simulation of key acoustic parameters, such as the early reflection pattern, early decay time, and reverberation time. These parameters are likely to affect the subjective assessment of reproduced sound quality in a listening environment. Aims of the project are described, together with results of the preliminary experiments.
Active Listening room simulator: Part 1

Sean Olive,
Objective and subjective measurements were conducted on five commercial software-based plug-ins intended to provide spatial enhancement of stereo reproduction at the computer workstation. Listeners rated the sound quality of each using several different scales, such as preference, timbral balance, three different spatial attributes, and audible distortion; and provided comments. Regular stereo (no enhancement) was included as a hidden reference. The listening test results reveal clear winners and losers. Stereo was preferred over three of the five plug-ins tested. The subjective results tend to correlate with their measured frequency response.
Evaluation of Five Commercial Stereo Enhancement 3D Audio Software Plug-ins

Atsushi Marui,William Martens,
Multidimensional perceptual and semantic differential analyses were performed for a set of stimuli that were generated by submitting a prerecorded guitar performance to a popular multieffects processor. Within three nominal types of distortion effect (overdrive, distortion and fuzz), the drive setting of the effect was varied between minimum and maximum levels while adjusting the volume of the resulting sounds to maintain constant loudness. As the meaning of the drive parameter varies across these effects, changing the tone color for some while changing only the loudness for others, the loudness of the processor outputs was equalized prior to subjective rating sessions to determine what perceptual attributes the drive parameter affects besides loudness.
Perceptual and semantic scaling for user-centered control over distortion-based guitar effects

Francis Rumsey,Bart de Bruyn,Natanya Ford,
Nonverbal elicitation techniques may be used in addition to verbal methods in order to obtain meaningful subjective responses about the spatial attributes of reproduced sound. By analyzing results from a preliminary graphical investigation, the provision of such responses has been appraised, and practical considerations are highlighted. Data analysis indicates that nonverbal responses uphold conventional expectations with respect to the effect of loudspeaker and listener location on perceived sound images. With this in mind, it is suggested that the technique be used to assess variables that have not been subject to such intensive study or that have not been employed in situations where a verbal language may not be appropriate. Further investigations are, therefore, proposed with respect to the findings in this paper.
Graphical elicitation techniques for subjective assessment of the spatial attributes of loudspeaker reproduction - a pilot investigation

Francis Rumsey,Bart de Bruyn,Russell Mason,
The effect of the audio frequency of narrow-band noise signals with a sinusoidal ITD fluctuation was investigated. To examine this, a subjective experiment was carried out using a match-to-sample method and stimuli delivered over headphones. It was found that the magnitude of the subjective effect is dependent on audio frequency and that the relationship between the audio frequency and a constant subjective effect appears to be based on equal maximum phase difference fluctuations.
An investigation of interaural time difference fluctuations, part 2: dependence of the subjective spatial effect on audio frequency

Kaoru Ashihara,Shogo Kiryu,
A simulator to examine detection threshold of the distortion due to time jitter is proposed. Signals with artificial time jitter were simulated on digital data using oversampling, interpolation, and decimation. With this method, quantitatively controlled distortion was added to the musical signals, and the signals were presented to human subjects through a conventional DA converter. The amount of the distortion hardly depended on the equipment. The authors are examining detection threshold of time jitter. Preliminary results show that some subjects can detect jitter of several hundred nanoseconds.
A Jitter Simulator on Digital Data
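
A minimal sketch of the oversample, interpolate, and decimate idea follows. It assumes the signal is already available at an oversampled rate, uses linear interpolation between oversampled points, and draws Gaussian timing errors; the interpolation method, the jitter distribution, and all names are assumptions rather than the authors' design.

```python
import math
import random

def jittered_decimate(oversampled, factor, jitter_std, rng=None):
    """Decimate by `factor`, reading each output sample at a jittered instant.

    jitter_std is the standard deviation of the timing error in units of
    oversampled sample periods. Requires at least two input samples.
    """
    rng = rng or random.Random(0)
    last = len(oversampled) - 1
    out = []
    for n in range(len(oversampled) // factor):
        t = float(n * factor)
        if jitter_std > 0.0:
            t += rng.gauss(0.0, jitter_std)
        t = min(max(t, 0.0), float(last))           # stay inside the buffer
        i = min(int(t), last - 1)
        frac = t - i
        out.append((1.0 - frac) * oversampled[i] + frac * oversampled[i + 1])
    return out

# 64 samples of a sine at an 8x oversampled rate.
x = [math.sin(2.0 * math.pi * k / 32.0) for k in range(64)]
clean = jittered_decimate(x, 8, 0.0)
jittered = jittered_decimate(x, 8, 0.5)
```

Because the jitter is applied numerically to the data, the amount of distortion is set by `jitter_std` and does not depend on the playback equipment, which is the property the abstract emphasizes.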

Boaz Rafaely,Takashi Takeuchi,Philip Nelson,John Rose,
3-D audio systems are effective when the listener's head location is close to the head location assumed when the system was designed. In order to accommodate head movement, it is possible to design 3-D sound systems that continuously select appropriate virtual audio filters that correspond to a listener’s varying head position. The required spatial resolution of the audio filters depends on the size of the sweet spot. In this paper the size of the sweet spot of a two loudspeaker 3-D audio system is evaluated subjectively at symmetric and asymmetric head locations.
Variance of Sweet Spot Size with Head Location for Virtual Audio

Ayataka Nishio,Hiroshi Takahashi,
The digital encoding method known as Direct Stream Digital (DSD), which is based on direct recording of the 1-bit output signal of a delta-sigma modulated analog-to-digital (A-to-D) converter, provides an analoglike sound quality for both professional audio applications and the new consumer audio delivery format, Super Audio CD (SACD). In this paper an investigation of 1-bit delta-sigma conversion in some basic DSD signal processing is provided to show the practical performance of DSD-compatible production tools.
Investigation of Practical 1-bit Delta-Sigma Conversion for Professional Audio Applications

James Angus,
This paper clarifies some of the confusion that has arisen over the efficacy of dither in PCM and sigma-delta modulation systems. It describes a fair means of comparison between them. It also presents results that show dither is effective in sigma-delta modulation systems and proposes methods for achieving optimum performance in both systems.
Achieving Effective Dither in Delta-Sigma Modulation Systems
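
As background on what dither does in the PCM case, the sketch below quantizes a constant input with and without triangular-PDF (TPDF) dither of 2 LSB peak-to-peak. It illustrates the standard multibit result only, not the sigma-delta comparison made in the paper; the function name and test values are assumptions.

```python
import math
import random

def quantize_tpdf(x, step=1.0, dither=True, rng=random):
    """Round x to the nearest multiple of `step`, optionally with TPDF dither."""
    # The difference of two uniform variables has a triangular PDF on (-step, step).
    d = (rng.random() - rng.random()) * step if dither else 0.0
    return step * math.floor((x + d) / step + 0.5)

rng = random.Random(1)
values = [quantize_tpdf(0.3, rng=rng) for _ in range(20000)]
average_dithered = sum(values) / len(values)    # close to 0.3
undithered = quantize_tpdf(0.3, dither=False)   # always 0.0
```

Without dither the 0.3 LSB input is lost entirely; with dither the average of the quantized output tracks the input, so the error behaves like signal-independent noise.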

James Angus,
This paper presents a novel method that allows a direct comparison of the benefits of multibit sigma-delta modulation versus PCM in audio systems, when physically implemented. It then examines the effect of various errors, in particular nonlinearity and jitter in the two systems, and shows that the effect of component tolerances strongly limits the maximum performance of multibit quantizers in such systems.
The Practical Performance Limits of multi-bit Sigma-Delta modulation

John Vanderkooy,Stanley Lipshitz,
Single-stage, 1-bit sigma-delta converters are in principle imperfectible. The authors prove this fact. The reason, simply stated, is that when properly dithered they are in constant overload. Prevention of overload allows only partial dithering to be performed. The consequence is that distortion, limit cycles, instability, and noise modulation can never be totally avoided. The authors demonstrated these effects, and using coherent averaging techniques were able to display the consequent profusion of nonlinear artifacts, which are usually hidden in the noise floor. Recording, editing, storage, or conversion systems using single-stage, 1-bit sigma-delta modulators are thus inimical to audio of the highest quality. In contrast, multibit sigma-delta converters, which output linear PCM code, are in principle infinitely perfectible. (Here, multibit refers to at least two bits in the converter.) They can be properly dithered to guarantee the absence of all distortion, limit cycles, and noise modulation. The audio industry is misguided if it adopts 1-bit sigma-delta conversion as the basis for any high-quality processing, archiving or distribution format to replace multibit, linear PCM.
Why 1-Bit Sigma-Delta Conversion is Unsuitable for High-Quality Applications

Peter Nuijten,Derk Reefman,
An overview of Direct Stream Digital (DSD) signal processing is given. It is shown that 1-bit DSD signals can be dithered properly, so the resulting dithered DSD stream does not contain audible artifacts in a band from 0 to 100 kHz. It is also shown that signal processing can be done best in a high-rate, multibit domain. Arguments are given that the minimal frequency span needed to comply with the human auditory system is roughly 0 to 300 kHz. Following the signal processing, final conversion to DSD was made. It is demonstrated that Super Audio CD (SACD) is a very efficient consumer format. It is the format that uses the fewest bits from the disk while maintaining all necessary psychoacoustical characteristics, such as high bandwidth, filtering with wide transition bands, etc. - hence offering the longest playing time.
Why Direct Stream Digital (DSD) is the best choice as a digital audio format.

Malcolm Hawksford,
Significant misrepresentation of both 1-bit SDM and multibit LPCM coding paradigms persists within both professional and commercial arenas, which impacts directly upon the perception of DVD-A and SACD formats. A balanced appraisal of these schemes is presented in order to expose the core differences in the technology, in both the theoretical and instrumentation domains. Some observations are made about the fallacy of performance comparisons and the consequence of misinformation that subsequently is derived.
SDM versus LPCM: the debate continues

Stanley Lipshitz,John Vanderkooy,
Although 1-bit sigma-delta modulators cannot be adequately dithered to make their performance perfect, the application of substantial noise shaping gives them modest performance while maintaining reasonable stability. Partial dithering has been shown to improve these devices, but spuriae remain. By simulation, the authors studied: a) the nature of the idle tone; b) the effect of the order of the shaping filter; c) the influence of shaper stability; and d) the coherence and incoherence of the shaper's output. Calculation of these features demands high numerical precision and is greatly aided by coherent averaging.
Towards a Better Understanding of 1-Bit Sigma-Delta Modulators
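
A simulation of the kind described above starts from a modulator model. The sketch below is the simplest possible case, a first-order, undithered 1-bit sigma-delta loop; the modulators studied in the paper use higher-order shaping filters, and the function name and test input are assumptions.

```python
def sigma_delta_1bit(samples):
    """First-order 1-bit sigma-delta modulator for inputs in (-1, 1).

    The integrator accumulates the difference between the input and the
    previous output bit; the quantizer outputs the sign of the integrator.
    """
    v = 0.0          # integrator state
    y = 0.0          # previous output (no feedback before the first sample)
    bits = []
    for x in samples:
        v += x - y
        y = 1.0 if v >= 0.0 else -1.0
        bits.append(y)
    return bits

# A constant input of 0.25 yields a repeating bit pattern (an idle pattern)
# whose long-term average equals the input.
bits = sigma_delta_1bit([0.25] * 10000)
average = sum(bits) / len(bits)
```

With a constant input the undithered loop settles into a periodic pattern, which is the origin of the idle tones the abstract refers to.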

Peter Nuijten,Derk Reefman,
This paper addresses the issues of switching 1-bit audio streams, such as those in Direct Stream Digital (DSD) in the Super Audio CD format. A theoretical description is derived, which shows how these streams can be switched, independent of the sigma-delta topologies that are used. A simplification is also discussed, which is technically much simpler while still achieving high-quality crossovers between 1-bit audio streams without any audible artifacts.
Editing and switching in 1-bit audio streams

Robert W. Stewart,Gerhard Gruhler,Rolf Esslinger,
A completely digital audio power amplifier uses binary (two-level) signals up to the power stage. Often MOSFETs are used as power devices, since they can provide a high output power together with a fast switching speed. The generation of accurate, rectangular pulses in the analog power amplifier circuit is a problem, especially for pulses with an extremely short duration, since they may come from high-resolution sigma-delta modulation. In this paper some of the typical pulse signal waveform distortions introduced by the realistic switching power stage are shown by simple circuit models. Additionally, it is explained how certain errors can affect the linearity of pulse signals from sigma-delta modulation.
Digital audio power amplifiers using Sigma Delta Modulation - Linearity problems in the class-D power stage

Shogo Kiryu,Kaoru Ashihara,
To investigate audibility of ultrasounds contained in a complex tone, psychoacoustic experiments were designed. Human subjects were required to discriminate stimuli with and without components above 22 kHz. All subjects distinguished between sounds with and without ultrasounds only when the stimulus was presented through a single loudspeaker. When the stimulus was divided into six bands of frequencies and presented through six loudspeakers in order to reduce intermodulation distortions, no subject could detect any ultrasounds. It was concluded that the addition of ultrasounds might affect sound impression by means of some nonlinear interaction that might occur in the loudspeakers.
Detection threshold for tones above 22 kHz

Ville Pulkki,
The coloration of amplitude-panned virtual sources was studied with listening tests and auditory modeling for anechoic and reverberant listening. It was found that the amplitude panning produces a comb-filter effect that is audible in anechoic listening. When the listening room is reverberant, the effect is less audible, depending on the amount of reverberation. The coloration of a virtual source is dependent on the number of loudspeakers used to generate it and on the locations of loudspeakers.
Coloration of Amplitude-Panned Virtual Sources
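
The comb-filter effect reported above can be illustrated with a two-path model: the same signal reaches one ear from two loudspeakers with different gains and a small arrival-time difference. The sketch below is a free-field simplification with no head or room model; the delay value and function name are assumptions.

```python
import cmath
import math

def two_path_magnitude(freq_hz, gain_a, gain_b, delay_diff_s):
    """Magnitude of gain_a + gain_b * exp(-j 2 pi f tau) at one frequency."""
    return abs(gain_a + gain_b * cmath.exp(-2j * math.pi * freq_hz * delay_diff_s))

tau = 0.00025                                         # assumed 0.25 ms path difference
notch = two_path_magnitude(2000.0, 0.5, 0.5, tau)     # first notch at 1 / (2 tau)
peak = two_path_magnitude(4000.0, 0.5, 0.5, tau)      # first peak at 1 / tau
```

With equal gains the notches are complete; unequal panning gains make them shallower, and added reflections in a reverberant room tend to fill them in, consistent with the reduced audibility described in the abstract.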

William Martens,
Whereas the primary motivation in spatial hearing research has been to gain greater understanding of the mechanisms of human spatial hearing, the motivation for applied research has been the verification and validation of various spatial audio rendering technologies under development. This paper outlines some of the uses and misuses of psychophysical methods typically employed in the subjective evaluation of spatial sound reproduction. The emphasis is on the essential tension between engineering goals and scientific goals, which, while often conflicting, serve to focus psychophysical research upon resolving disputes between rival theories of how best to simulate spatial sound fields for a human listener.
Uses and misuses of psychophysical methods in the evaluation of spatial sound reproduction

Mikko Parviainen,Antti Eronen,Anssi Klapuri,Vesa Peltonen,
A listening test was conducted to determine the human abilities for recognizing everyday auditory scenes based on binaural recordings. The accuracy, latency, and acoustic cues used by the subjects in the recognition process were analyzed. The average correct recognition rate for 19 subjects was 70% for 25 different scenes, and the average recognition time was 20 seconds. In most cases, the test subjects reported that the recognition was based on prominent identified sound events.
Recognition of Everyday Auditory Scenes: Potentials, Latencies and Cues

Nick Zacharov,Ville-Veikko Mattila,
The need to perform subjective evaluations of audio is forever present. Such techniques are known to be inefficient and unreliable. This problem can be partially overcome by using so-called expert as opposed to naive listeners. Expertise is addressed in some depth to clarify its meaning and to illustrate the benefits in terms of reliability and repeatability of listening tests. The generalized listener selection (GLS) procedure is presented for establishing permanent expert listening panels for a wide range of subjective tests. The method allows the rapid selection and assessment of listeners based upon a number of criteria. Correct sampling of the population is achieved by an assessment of online questionnaires, followed by an audiometric evaluation. The last stage of the GLS procedure comprises three listening tests, identical in structure, designed to evaluate the discrimination skills and reliability of subjects. Means for the assessment of both intra-rater reliability and inter-rater agreement are presented.
GLS - A generalised listener selection procedure

Armin Kohlrausch,Jeroen Breebaart,
This paper discusses the perceptual consequences of smoothing the anechoic HRTF phase and magnitude spectra. The smoothing process is based on a binaural perception model in which interaural cues in the auditory system are rendered at a limited spectral resolution. This limited resolution is the result of the filter bank present in the peripheral auditory system (i.e., the cochlea). Listening tests with single and multiple virtual sound sources revealed that both the phase and magnitude spectra of HRTFs can be smoothed, without audible artifacts, with a gamma-tone filter whose resolution equals estimates of the spectral resolution of the cochlea. The amount of smoothing was then increased by decreasing the order of the gamma-tone filters. If the filter order is reduced by a factor of 3, subjects indicate spectral and positional changes in the virtual sound sources. The binaural detection model, developed by Breebaart, van de Par and Kohlrausch, was used to predict the audibility of the smoothing process. A comparison between model predictions and experimental data shows that the threshold at which subjects start to hear smoothing artifacts can be predicted accurately. Moreover, a high correlation exists between the model output and the amount of stimulus degradation reported by subjects.
Perceptual (ir)relevance of HRTF magnitude and phase spectra

David Clark,
Measured data from over 150 automotive sound systems was compared to subjective assessment of music tonal balance for the same systems. The trained listeners assessed a set of subattributes of tonal balance, such as peakiness and frequency extension, using music source material. Listening was always completed before measurements were made. The measured data was analyzed by a corresponding set of technical subattributes. Encouraging correlation was found between aspects of measured and perceived tonal balance.
Progress in Perceptual Transfer Function Measurement - Tonal Balance

Dimitris Christidis,George Kalliris,George Papanikolaou,Christos Sevastiadis,Charalampos Dimoulas,
This work focuses on the design and implementation of a computerized psychophysiological monitoring engineering system that detects human responses to environmental noises. Classical human stimulus monitoring analysis tools were used to treat human bodily response under the presence of annoying noises. Applicable sound recording and reinforcement equipment were selected and employed to accurately simulate environmental noise conditions. A computerized data acquisition and analysis system was developed. Some first results have been obtained and are analyzed in this paper.
Development of an engineering application for subjective evaluation of human response to noise

(C) 2003, Audio Engineering Society, Inc.