AES 117th Convention
San Francisco, CA, USA
October 28-31, 2004
AES Preprint Ordering
Single Convention Preprints are available through the AES Preprint Search and Shop facility.
Dodd, Mark; Klippel, Wolfgang; Oclee-Brown, Jack
Recent work by Klippel and Voishvillo et al. has shown the significance of voice coil inductance with respect to the non-linear behaviour of loudspeakers. In such work the methods used to derive distortion require the inductance to be represented by an equivalent circuit rather than the frequency domain models of Wright and Leach. A new technique for measurement of displacement- and frequency-dependent impedance has been introduced. The complex relationship between coil impedance, frequency and displacement has been both measured and modelled, using stationary transient FEM, with exceptional agreement. Results show that the impedance model requires that its parameters vary independently with displacement to satisfactorily describe all cases. Distortion induced by the variation of impedance with coil displacement is predicted using a lumped parameter method, and this prediction is compared to measurements of the actual distortion. The possibility of using a dynamic transient FE method to predict distortion is demonstrated. The nature of the distortion is discussed.
Voice Coil Impedance as a Function of Frequency and Displacement
The nonlinear stiffness K(x) and the reciprocal compliance C(x) of suspension parts (spider, surrounds, cones) and passive radiators (drones) are measured versus displacement x over the full range of operation. A dynamic, nondestructive technique is developed which excites the suspension parts pneumatically under conditions similar to those in an operating loudspeaker. The nonlinear parameters are estimated from the measured displacement and sound pressure signals. This ensures high precision of the results as well as simple handling and short measurement time. The paper develops the theoretical basis for the new technique and also discusses the practical handling, the interpretation of the results and their reproducibility.
Dynamical Measurement of Loudspeaker Suspension Parts
AES paper 5732 presented a mesh-less, analytic 3-D solution to the problem of an acoustic source located in a nonanechoic room. By applying the inverse Fourier transform, temporal data was extracted to form a complete time and frequency domain description. This paper uses the same model to investigate the correlation properties of the acoustic radiation. Maps are produced, showing how the correlation varies with position in space. Statistical analysis suggests a possible objective classification of the diffuseness of acoustic fields by scalar quantities such as mean and standard deviation of correlation coefficient values. The distribution of correlation values is seen to follow the beta distribution quite closely.
Modelling Acoustic Room Interaction for Pistonic and Distributed-Mode Loudspeakers in the Correlation Domain
Goldberg, Andrew; Makivirta, Aki; Varla, Ari
In professional audio applications, small loudspeakers are often mounted on or near (within the loudspeaker's near field region) large solid surfaces, such as mixing consoles, desktops and work surfaces. In approximately two-thirds of loudspeakers mounted in such a fashion, the magnitude response is compromised in a predictable and systematic way. An upward deviation of peak value 5.0 dB ±1.5 dB centred on 141 Hz ±31 Hz was observable in approximately 80% of the cases studied. An additional Room Response Control in active loudspeakers is proposed to compensate for this aberration. A statistical analysis of 89 near-field loudspeakers helps define the correction filter, and quantifies the effectiveness of the fixed filter design. Use of the proposed filter in an automated response optimisation algorithm for in-situ response equalisation is demonstrated.
Compensating the Acoustical Loading of Small Loudspeakers Mounted Near Desktops
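The correction described in the abstract above can be illustrated with a single peaking-EQ biquad that cuts 5 dB at 141 Hz. This is only a sketch: the abstract does not state the filter topology, its bandwidth, or the sample rate, so the peaking-EQ form (coefficients from the widely used Audio EQ Cookbook formulas), the Q of 1.0 and the 48 kHz sample rate are all assumptions, and the function names are hypothetical.

```python
import cmath
import math

def peaking_biquad(f0, gain_db, q, fs):
    """Return normalized (b, a) coefficients of a peaking-EQ biquad."""
    a_lin = 10.0 ** (gain_db / 40.0)          # square root of the linear gain
    w0 = 2.0 * math.pi * f0 / fs              # centre frequency in rad/sample
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    return [c / a[0] for c in b], [c / a[0] for c in a]

def gain_db_at(b, a, f, fs):
    """Magnitude response of the biquad in dB at frequency f."""
    z1 = cmath.exp(-2j * math.pi * f / fs)    # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))

FS = 48000.0       # assumed sample rate
Q_ASSUMED = 1.0    # assumed bandwidth; not given in the abstract

# Cut of 5 dB centred on 141 Hz, mirroring the reported upward deviation.
b, a = peaking_biquad(141.0, -5.0, Q_ASSUMED, FS)
```

Evaluating gain_db_at(b, a, 141.0, FS) gives the gain at the centre frequency, and evaluating it well above the centre frequency shows that the rest of the response is left essentially untouched. A fixed filter of this kind can only match the average deviation; the ±1.5 dB and ±31 Hz spreads reported in the abstract are not addressed by it.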
An application of the image source model for computing the interior sound field of a loudspeaker enclosure is presented. The image source model allows the effects of individual reflections, enclosure modes, etc. on the response of the speakers to be computed efficiently for rectangular enclosures, both sealed and ported. The effect of the absorbent material can be included without adding excessively to the computational complexity. As the model makes no assumptions about the enclosure size, it can be applied equally well to modeling, e.g., in-wall loudspeakers, where a combined model of room and loudspeaker responses can be developed.
Image Source Model for Loudspeaker Enclosure
Geiger, Ralf; Herre, Juergen; Huang, Haibin; Lin, Xiao; Rahardja, Susanto; Yu, Rongshan
As the latest extension of MPEG-4 Audio coding, MPEG-4 Lossless Audio Coding includes a scalable audio coding solution (SLS) that integrates the functionalities of lossless audio coding, perceptual audio coding, and fine granular scalable audio coding into a single coder framework while providing backward compatibility to MPEG Advanced Audio Coding (AAC) at the bit-stream level. Despite its abundant functionalities, SLS still achieves a compression performance that is comparable to state-of-the-art non-scalable lossless audio coding algorithms. As a result, SLS provides a universal digital audio format for a variety of application domains including professional audio, Internet music, consumer electronics, broadcasting and others. This paper presents the structure of SLS and its latest developments during the MPEG standardization process.
MPEG-4 Scalable to Lossless Audio Coding
Crockett, Brett G.
A new audio coding tool that uses improved time scaling synthesis techniques has been developed which reduces the duration of pre-noise introduced by low-bit-rate audio coding of transient material. When transient pre-noise reduction processing is used, decoded PCM audio located prior to transient material is processed in the decoder using time scaling synthesis. The synthesized PCM audio is used to remove or reduce the duration of transient pre-noise, improving the perceived quality of low-bit-rate audio coded transient material.
Improved Transient Pre-Noise Performance of Low Bit Rate Audio Coders Using Time Scaling Synthesis
Lavoie, Michel C.; Soulodre, Gilbert A.
Spectral Band Replication (SBR) was developed as a means of enhancing the coding of audio signals. It has recently been proposed to use SBR, integrated within the MPEG Layer II codec, as a possible extension to the EUREKA 147 DAB standard. The goal is to provide an equivalent level of subjective quality at a reduced bitrate. In the present study, formal subjective tests were conducted to evaluate the performance of Layer II + SBR at typical DAB bitrates. The tests included Layer II + SBR codecs operating at 128 and 160 kbps, as well as a standard Layer II codec at 128, 160, and 192 kbps. The subjective tests were conducted using the ITU-R BS.1534 (MUSHRA) methodology.
Subjective Evaluation of MPEG Layer II with Spectral Band Replication
Disch, Sascha; Ertel, Christian; Faller, Christof; Herre, Juergen; Hilpert, Johannes; Hoelzer, Andreas; Kroon, Peter; Linzmeier, Karsten; Spenger, Claus
Recently, a new approach in low bitrate coding of stereo and multi-channel audio has emerged: Spatial audio coding permits an efficient representation of multi-channel audio signals by transmitting a downmix signal along with some compact spatial side information describing the most salient properties of the multi-channel sound image. Besides its impressive efficiency, allowing multi-channel sound at total bitrates of only 64 kbit/s and lower, the approach is also backward compatible with existing transmission systems and thus accommodates a smooth transition towards multi-channel audio in the consumer market. The paper gives an overview of the basic concepts and the options provided by spatial audio coding technology. It reports on some recent performance data, first commercial applications and related activities within the ISO/MPEG standardization group.
Spatial Audio Coding: Next-Generation Efficient and Compatible Coding of Multi-Channel Audio
Recently, various schemes were proposed for parametric coding of stereo and multi-channel audio signals. Binaural Cue Coding (BCC) is such a technique. It represents multi-channel audio signals as a single downmixed channel plus a small amount of side information. BCC can be applied to mono and stereo backwards compatible coding of multi-channel audio signals. In this paper, we propose a general paradigm for BCC with multiple transmission channels and show how this can be applied not only to bridging between mono/stereo and multi-channel surround but also to bridging between different multi-channel surround formats.
Coding of Spatial Audio Compatible with Different Playback Formats
The Boundary Element Method (BEM) is a well known tool in acoustics for the calculation of radiation from vibrating surfaces. When using BEM for the calculation of horn loudspeakers, the horn surface is described by its surface admittance; the connected driver is modeled by the velocity distribution at the common junction of driver and horn. Measurements of the velocity distribution have shown that higher order modes within the horn throat can be excited by the horn driver (presented at the 116th AES convention). On the other hand, a two-port description of the driver together with a plane-wave velocity distribution for the BEM calculation leads to good results. It is investigated to what extent higher order modes at the driver's mouth contribute to the sound radiation.
Do Higher Order Modes at the Horn Driver's Mouth Contribute to the Sound Field of a Horn Loudspeaker?
Harris, Neil; Hildyard, Alan; Taylor, Valerie
The tweeter in a two-way loudspeaker was replaced by a unit having a natural bandwidth of 300 Hz to 20 kHz. This gave a much greater degree of freedom to the choice of cross-over frequency than would normally be possible. The first part of this paper looks at the potential benefits such freedom could bring to the acoustical performance of the loudspeaker. The second part reports results of early listening tests, which were conducted to discover the most preferred cross-over frequency in the range 700 Hz to 3 kHz.
Investigating the Potential Benefits to Both the Objective and Subjective Performance of a Two-Way Loudspeaker Obtained by Using a Wide-Band “tweeter” to Place the Cross-Over at a Lower Than Usual Frequency.
Olive, Sean E.
A new model is presented that accurately predicts listener preference ratings of loudspeakers based on anechoic measurements. The model was tested using 70 different loudspeakers evaluated in 19 different listening tests. Its performance was compared to 2 models based on in-room measurements with 1/3-octave and 1/20-octave resolution, and 2 models based on sound power measurements, including the Consumers Union (CU) model, tested in Part One. The correlations between predicted and measured preference ratings were: 1.0 (our model), 0.91 (in-room, 1/20th-octave), 0.87 (sound power model), 0.75 (in-room, 1/3-octave), and -0.22 (CU model). Models based on sound power are less accurate because they ignore the qualities of the perceptually important direct and early-reflected sounds. The premise of the CU model is that the sound power response of the loudspeaker should be flat, which we show is negatively correlated with preference rating. It is also based on 1/3-octave measurements that are shown to produce less accurate predictions of sound quality.
A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part II - Development of the Model
Jabbari, Ali; Kanellakopoulos, Ioannis; Kantor, Kenneth L.
The role of compliant parts in the operation of loudspeaker drivers is discussed, and a new method of construction employing a magnetic suspension system is presented. Audio transducers require a complex interaction between moving and non-moving structures, placing conflicting demands on the soft parts typically employed to interface between them. The limitations of current materials and of manufacturing technology suggest that replacing flexible and compliant mechanical parts with a system based on magnetic forces might yield several benefits. Such a system, which utilizes a moving magnet balanced between static repulsive forces, is discussed conceptually, analytically and experimentally. Proposed advantages include increased linear excursion, convenient form-factor, reduced wear and fatigue, and the simplification of certain production processes.
Compact Magnetic Suspension Transducer
Nonlinear effects in horn drivers are an inseparable part of the principle of their operation. In addition to the distortion caused by electrodynamic and mechanical effects, distortion is generated in the compression chamber by the nonlinear adiabatic compression, by modulation of the air's mechanical stiffness, mass, and viscous losses, and by the nonlinear relationship between the particle velocity and the sound pressure. A new, more accurate nonlinear model of the compression chamber has been developed. A significant part of the distortion is generated in the phasing plug and the horn due to the nonlinear propagation of the high pressure sound waves. A quantitative comparison of nonlinear effects in the compression chamber and the horn is carried out, using criteria such as harmonic distortion and two-tone intermodulation distortion.
Comparative Analysis of Nonlinear Distortion in Compression Drivers and Horns
Keele, Jr., D.B. (Don)
Small-signal calculations show that the maximum nominal efficiency of a horn loudspeaker compression driver is 50% and the maximum true efficiency is 100%. Maximum efficiency occurs at the driver's resonance frequency. In the absence of driver mechanical losses, the maximum nominal efficiency occurs when the reflected acoustic load resistance equals the driver's voice-coil resistance and the maximum true efficiency occurs when the reflected acoustic load resistance is much higher than the driver’s voice-coil resistance. To maximize the driver's broad-band true efficiency, the Bl force factor must be increased as much as possible, while jointly reducing moving mass, voice-coil inductance, mechanical losses, and front air-chamber volume. Higher compression ratios will raise high-frequency efficiency but may decrease mid-band efficiency. This paper will explore in detail the efficiency and design implications of both the nominal and true efficiency relationships including gain-bandwidth tradeoffs.
Maximum Efficiency of Compression Drivers
Devantier, Allan; Rapoport, Zachary
Bass reflex ports are used in loudspeakers to enhance low frequency performance. At low sound levels the port extends the low frequency response by supplying one of the components of a Helmholtz resonator. At higher sound levels the turbulent intensity in the port increases, disrupting the Helmholtz resonance and causing distortion, noise and compression. Although significant work has been done to reduce these negative effects, no optimal solution has been found. To better understand the flow phenomena within the port, Computational Fluid Dynamics (CFD) was used to model the flow. The flow was simulated for six port profiles over a wide range of sound levels. In order to correlate the results of the CFD work to the real world, the same six ports were prototyped and subjected to several objective and subjective tests.
Analysis and Modeling of the Bi-Directional Fluid Flow in Loudspeaker Ports
Marques, Avelino; Freitas, Diamantino
Three time domain digital inverse filter design techniques are considered for the equalization of non-minimum phase loudspeaker systems, namely: an FIR filter obtained with adjustable modeling delay, an IIR filter followed by "excess-phase" compensation, and a warped filter also followed by "excess-phase" compensation. Off-line inverse filtering results using real measured impulse responses of loudspeaker systems are compared and discussed for each design technique on the basis of the time equalization error, similar response’s magnitude flatness, phase linearity and filter order. The requirements of real-time inverse filter implementations on a real set-up, using a digital signal processor of the Texas Instruments TMS320 family, are also compared based on computational load and memory needs. Results show that loudspeaker equalization with an inverse IIR filter followed by "excess-phase" compensation appears to be a good compromise solution.
Comparison of Inverse Filter Real-Time Equalization Methods for Non-Minimum Phase Loudspeaker Systems
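The first technique named in the abstract above, an FIR inverse with a modeling delay, can be sketched as a least-squares problem: find taps w so that the system response convolved with w approximates a delayed unit impulse. The code below is a minimal illustration, not the authors' design procedure. The two-tap system h = [1, -1.5] (a zero outside the unit circle, hence non-minimum phase), the filter length and the delay values are assumptions chosen for the demonstration, and all function names are hypothetical.

```python
def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def inverse_fir(h, n_taps, delay):
    """Least-squares FIR inverse of h targeting a unit impulse at `delay`."""
    length = len(h)

    def autocorr(k):
        k = abs(k)
        return sum(h[i] * h[i + k] for i in range(length - k)) if k < length else 0.0

    # Normal equations (H^T H) w = H^T d, with H the convolution matrix of h.
    r = [[autocorr(i - j) for j in range(n_taps)] for i in range(n_taps)]
    p = [h[delay - j] if 0 <= delay - j < length else 0.0 for j in range(n_taps)]
    return solve(r, p)

def equalization_error(h, w, delay):
    """Squared error between h convolved with w and the delayed impulse."""
    c = convolve(h, w)
    return sum((v - (1.0 if n == delay else 0.0)) ** 2 for n, v in enumerate(c))

H_SYSTEM = [1.0, -1.5]   # assumed toy system, zero at z = 1.5
N_TAPS = 32              # assumed inverse filter length

err_no_delay = equalization_error(H_SYSTEM, inverse_fir(H_SYSTEM, N_TAPS, 0), 0)
err_with_delay = equalization_error(H_SYSTEM, inverse_fir(H_SYSTEM, N_TAPS, 31), 31)
```

With no modeling delay the causal inverse cannot undo the non-minimum phase zero and the error stays large; with the delay set near the end of the filter the error becomes very small. This is why the delay is treated as an adjustable design parameter.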
Andersen, Robert L.; Crockett, Brett G.; Davidson, Grant A.; Davis, Mark F.; Fielder, Louis D.; Turner, Stephen C.; Vinton, Mark S.; Williams, Phillip A.
An extension to the existing Dolby Digital (AC-3) multichannel audio coding standard is described and its new capabilities explored. This coding system is designed with extensive compatibility with the existing standard by retaining most of its metadata and data-framing structure to preserve and extend functionality in existing applications. New features include simultaneous support for multiple program streams, carriage of multichannel signals beyond 5.1 channels, and fine-grained control and support for data rates up to 6.144 Mbps. New coding "tools" including spectral extension, enhanced channel coupling, transient pre-noise processing, and improved filterbank/quantization enhance the performance of earlier AC-3 technology.
Introduction to Dolby Digital Plus, an Enhancement to the Dolby Digital Coding System
Hirschfeld, Jens; Klier, Juliane; Kraemer, Ulrich; Schuller, Gerald; Wabnik, Stefan
The Ultra Low Delay (ULD) codec developed at the Fraunhofer IDMT is based on a versatile perceptual audio coding method that achieves very low encoding/decoding delay and is nevertheless capable of high compression ratios. Utilizing a perceptual model for irrelevance reduction, the ULD codec is in principle a variable bit rate codec. To achieve coding with constant bit rate, the use of bit reservoir techniques would result in additional coding delay. This paper presents a rate loop which ensures constant bit rate coding without increasing coding delay. It is shown that this technique does not decrease the decoded audio quality significantly.
Ultra Low Delay Audio Coding with Constant Bit Rate
Lee, Jun Wei; Lemma, Aweke; van der Veen, Michiel
In a repeated quantization scenario, apart from the nominal quantization error, an additional tandem error is introduced. The amount of the excess tandem error depends on the characteristics of the quantizers used. In this work, the effect of tandem error and its dependence on the underlying quantizer characteristics are analyzed. A prime example where tandem error leads to increased noise is in audio transcoding. The behavior of tandem-noise for typical audio coding methods such as MPEG 1 Layer 2 is investigated. A method of reducing the tandem errors is proposed. This method involves guiding the quantization process of the first quantizer, assuming prior knowledge of the second quantizer. Results show that the method is able to reduce the amount of tandem error in the repeated quantization scenario.
An Analysis of Tandem Error During Audio Transcoding
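The excess error described in the abstract above can be reproduced with two uniform scalar quantizers applied in sequence. This is a toy sketch only: the step sizes, the uniform test signal and the use of plain rounding quantizers are assumptions for illustration, and it does not model the MPEG-1 Layer 2 quantizers or the guided first-quantizer method that the paper proposes. Here the "nominal" error is taken to be the error of the second quantizer applied directly to the original signal.

```python
import math
import random

def quantize(x, step):
    """Uniform mid-tread quantizer: round x to the nearest multiple of step."""
    return math.floor(x / step + 0.5) * step

def mse(values, reference):
    """Mean squared error between two equal-length sequences."""
    return sum((v - r) ** 2 for v, r in zip(values, reference)) / len(reference)

random.seed(0)
signal = [random.uniform(-1.0, 1.0) for _ in range(20000)]

STEP_1 = 0.10   # assumed step of the first quantizer
STEP_2 = 0.13   # assumed step of the second quantizer

# Second quantizer alone versus first quantizer followed by the second.
single = [quantize(x, STEP_2) for x in signal]
tandem = [quantize(quantize(x, STEP_1), STEP_2) for x in signal]

nominal_mse = mse(single, signal)
tandem_mse = mse(tandem, signal)
excess_mse = tandem_mse - nominal_mse
```

The excess cannot be negative for nearest-level quantizers: the second quantizer applied directly always picks the level on its grid closest to the original sample, whereas in tandem it picks the level closest to an already quantized value, which may be a different and therefore more distant level.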
Ehret, Andreas; Horich, Holger; Kjorling, Kristofer; Purnhagen, Heiko; Roden, Jonas
aacPlus, the combination of the well known MPEG AAC and the Spectral Band Replication (SBR) tool, has been introduced as a highly efficient low-bitrate audio codec, representing today's state of the art by providing full bandwidth, near CD audio quality at 48 kbit/s stereo. It is thus suited for applications that demand the highest compression ratios. This paper discusses the benefits of using aacPlus at moderate compression ratios in the range of 80 to 128 kbit/s stereo, where so far AAC was the codec of choice. The technological approach for applying SBR in such a scenario is described, and subjective evaluations of the presented solution as well as an overview of system and implementation aspects are given.
aacPlus, Only a Low-Bitrate Codec?
Chen, Li-Wei; Chien, Chu-Ting; Hsiao, You-Hua; Lee, Wen-Chieh; Li, Zheng-Wen; Liu, Chi-Min; Su, Ming-Ton; Yang, Chung-Han
The bit reservoir, which controls the bit budget among music frames, is a kernel module for achieving a good tradeoff between bits and quality in current audio encoders such as MP3 and AAC. Bit reservoir approaches can be classified as demand-driven or budget-driven. A demand-driven approach determines the required bits according to the audio content, while a budget-driven approach allocates bits according to the bit budget accumulated in the bit reservoir. Existing bit reservoirs basically follow the budget-driven approach. This paper presents an efficient bit reservoir design that considers both demand and budget. The bit reservoir includes a demand estimator to adaptively predict the bits required for each frame. It also has a budget regulator to control the bits used according to the codec protocol and the preferred scenario. The new bit reservoir method is included in MP3 and AAC to verify its efficiency through extensive objective and subjective tests.
Efficient Bit Reservoir Design for MP3 and AAC
Chang, Tzu-Wen; Chien, Chu-Ting; Chiou, Ting; Hsiao, You-Hua; Hue, Hen-Wen; Lee, Wen-Chieh; Liu, Chi-Min; Peng, Kang-Yan; Yang, Chung-Han
The state-of-the-art natural audio coder, MPEG-4 AAC, provides extensive coding modules for achieving high coding efficiency. The modules, which include filter bank, window switch, psychoacoustic model, bit allocation, bit reservoir, lossless coding, temporal noise shaping, and middle/side coding, span a large design space and pose a challenge for audio coding technology. In this paper, the design of these modules, named the NCTUAAC encoder, is presented to provide admirable audio quality with low computational complexity.
Design of MPEG-4 AAC Encoder
This paper discusses a hybrid audio synthesis method employing both additive synthesis and DPCM audio playback, and the implementation of a miniature synthesizer system that accepts MIDI as an input format. Additive synthesis is performed in the frequency domain using a weighted overlap-add filterbank, providing efficiency gains compared to previously known methods. The synthesizer system is implemented on an ultra-miniature, low-power, reconfigurable application specific digital signal processing platform. This low-resource MIDI synthesizer is suitable for portable, low-power devices such as mobile telephones and other portable communication devices. Several issues related to the additive synthesis method, DPCM codec design, and system tradeoffs are discussed.
Frequency-Domain Additive Synthesis with an Oversampled Weighted Overlap-Add Filterbank for a Portable Low-Power MIDI Synthesizer
Huovilainen, Antti; Janis, Pekka; Kanerva, Aki; Karjalainen, Matti; Maki-Patola, Teemu
A combination of hand-held controllers and a guitar synthesizer is called here the "Virtual Air Guitar". The name refers to playing an "air guitar", i.e., just acting the playing with music playback, and the term virtual refers to making a playable synthetic instrument. Sensing of the left hand position is used for pitch control, the right hand movements for plucking, and the finger positions of both hands for other features of sound production. The synthetic guitar algorithm supports electric as well as acoustic sounds, augmented with sound effects and intelligent mapping from playing gestures to synthesis parameters. The realization of the virtual instrument is described and sound demos are made available.
Virtual Air Guitar
This paper describes the concepts, design, implementation, and evaluation of two new interfaces for music performance and composition and their control of various synthesis algorithms through the visual domain. Both of the interfaces were inspired by the idea of generating music through drawing, but they approach the activity in different ways; while the Graphonic Interface allows you to make music as you are drawing, the Sonic Scanner needs pre-composed graphic material in order to make music. However, both of the devices are real-time controllers that produce sound in an interactive manner, thereby allowing them to be used as performance instruments.
Visually Controlled Synthesis Using the Sonic Scanner and the Graphonic Interface
Nowadays, the extraction of metadata and audio descriptors (used in classification, for instance) is done in a rather blind, brute-force manner: as many descriptors as possible are computed, and those that work are then selected through an often long and tedious statistical analysis. Moreover, this analysis barely takes into account the intrinsic sense these descriptors may carry. Mana is a graphical user interface (GUI) based system that aims at adding a bit of human supervision to the process, combining state-of-the-art classification methods (in audio-content extraction and classification) with an ease of use that gives the user direct control over descriptors and their significance in classification.
Mana, a Tool for Human-Supervised Statistical Analysis in Audio-Content Extraction
Bharitkar, Sunil; Kyriakakis, Chris
In this paper we present an application of a multi-dimension to two-dimension projection algorithm for visualizing room equalization at multiple locations. The algorithm allows easy visualization of the formation of clusters for our previously presented pattern recognition based multiple listener room equalization filter. Furthermore, the mapping provides an interesting perspective on the formation of room response clusters. We also compare the results obtained from using the proposed map to the results obtained by using the spectral deviation measure.
Nonlinear Projection Algorithm for Evaluating Multiple Listener Equalization Performance
Lazzaro, John; Wawrzynek, John
The Real-time Transport Protocol (RTP) is an extensible transport for sending media streams over Internet Protocol packet networks. We describe a new payload format that extends RTP to transport MIDI (the Musical Instrument Digital Interface command language). The payload format encodes all commands that may legally appear on a MIDI 1.0 DIN cable. The format is suitable for interactive applications (such as the remote operation of musical instruments) and content-delivery applications (such as file streaming). The format defines tools for graceful recovery from packet loss, to support use over lossy unicast and multicast networks (including wireless networks). Stream behavior, including the MIDI rendering method, may be specified during session setup. Rendering methods are specified using the extensible Multipurpose Internet Mail Extensions (MIME) registry.
An RTP Payload Format for MIDI
Chafe, Chris; Gurevich, Michael
Pairs of musicians were placed apart in isolated rooms and asked to clap a rhythm together. Each person monitored the other's sound via headphones and a microphone pickup which was as close as possible. Time delay from source to listener was manipulated across trials. Trials were recorded and clap onset times were measured with an event detection algorithm. Longer delays produced increasingly severe tempo deceleration and shorter delays (<11.5 ms) produced a modest, but surprising acceleration. The study's goal is to characterize effects of delay on rhythmic accuracy and identify the region most conducive to ensemble playing. The results have implications for networked musical performance. Network delay is a function of transmission distance and/or internetworking (routing) delays. The findings suggest that sensitive ensemble performance can be supported over rather long paths (e.g., San Francisco to Denver at about 20 ms, one-way). The finding that moderate amounts of delay are beneficial to tempo stability seems, at first glance, counterintuitive. We discuss the observed effect.
Network Time Delay and Ensemble Accuracy: Effects of Latency, Asymmetry
Kurittu, Antti; Mattila, Ville-Veikko
The ITU-T P.862 Recommendation specifies the Perceptual Evaluation of Speech Quality (PESQ) algorithm that is the current industrial standard for the objective, intrusive assessment of the one-way speech quality of narrowband networks and speech codecs. The practical use of P.862, however, has raised several questions about the robustness, applicability, and accuracy of the algorithm. The current paper presents results from an investigation of these issues. The characteristics of test signals and the interferences of signal interfaces are shown to have a significant effect on the quality assessment with P.862. A measurement procedure is proposed to define the accuracy of P.862 in the comparison of different or unknown technologies. It is concluded that various test factors should be carefully defined so that different P.862 measurements are comparable.
Practical Issues in Objective Speech Quality Assessment with ITU-T P.862
Cook, Gary; Eastty, Peter; Marshall, Richard; Page, Michael
This paper describes a family of related technologies that provide a very high-performance audio interconnection system for professional applications. The first is an advanced point-to-point audio interconnection based on 100 Megabit Ethernet physical layer, which is currently undergoing AES standardization. The second is a complementary Gigabit-based high-capacity interconnection for applications such as backbone links. These are all linked together with a router technology that provides both low-latency audio channel routing, and packet-switched control data routing. These technologies together provide a flexible, high-bandwidth digital audio infrastructure, which is ideally suited for applications requiring low, deterministic latency and high reliability.
Integrated High-Performance Multi-Channel Audio Interconnection
Atkinson, Bob; Blank, Tom; Isard, Michael; Johnston, James D. (JJ); Olynyk, Kirk
We describe a system that applies Internet concepts and software techniques to deliver audio from source to speakers using common computing hardware. The techniques overcome clocking and jitter problems. Microphones built into each transducer allow the system to locate the loudspeakers and identify speaker placement, automatically compensate for off-center listening locations, adjust inter-channel gain and delay, and match frequency responses. A research prototype demonstrates the concepts and measures the resulting quality.
An Internet Protocol (IP) Sound System
Czyzewski, Andrzej; Dziubinski, Marek; Kaczmarek, Andrzej; Kostek, Bozena; Maziewski, Przemyslaw
Algorithms are presented for the detection of parasitic frequency modulation in audio originating from irregularities of sound carrier velocity. The algorithms were developed with special regard to non-periodic frequency modulation effects found in old movie sound tracks. The proposed algorithms consider the influence of the wow disturbance on the location of formants in the time-frequency representation. The dynamic analysis of the behavior of formant structures underlies the discrimination between parasitic frequency changes and natural frequency fluctuations. The compensation of the detected wow-related frequency modulation is accomplished using a non-uniform resampling algorithm, driven by the discerned parasitic modulation patterns. The details of the proposed wow detection and compensation techniques are presented and the results achieved are discussed.
Wow Detection and Compensation Employing Spectral Processing of Audio
Howarth, Jamie; Wolfe, Patrick J.
Here we describe a system whereby analog hardware is combined with the theory of nonuniform sampling in order to correct for wow and flutter effects in analog tape transfers. We show how in certain instances the medium itself can provide an accurate measurement of a recording's timing irregularities, in which case digital signal processing techniques permit a playback-rate correction of what is essentially an irregularly sampled audio waveform. Results using both real and synthetic data demonstrate the effectiveness of the method, both in cases of severe degradation as well as high-quality analog transfers heretofore considered normal.
Correction of Wow and Flutter Effects in Analog Tape Transfers
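To make the idea of correcting an irregularly sampled waveform concrete, here is a minimal sketch that maps samples with known, irregular time stamps back onto a uniform grid. Linear interpolation is used purely as a simple stand-in; the abstract does not state which interpolation the authors use, and the function name `resample_uniform` is hypothetical.

```python
def resample_uniform(times, values, step):
    """Linearly interpolate irregular (time, value) samples onto a uniform grid.

    `times` must be strictly increasing; the output grid starts at times[0]
    and advances by `step` up to times[-1].
    """
    out = []
    i = 0
    n_out = int((times[-1] - times[0]) / step + 1e-9) + 1
    for k in range(n_out):
        t = times[0] + k * step
        # Advance to the input interval [times[i], times[i + 1]] containing t.
        while i < len(times) - 2 and times[i + 1] < t:
            i += 1
        t0, t1 = times[i], times[i + 1]
        frac = (t - t0) / (t1 - t0)
        out.append(values[i] + frac * (values[i + 1] - values[i]))
    return out


# A ramp recorded at unevenly spaced instants (tape speed drifting).
true_times = [0.0, 0.9, 2.1, 3.0]
recorded = [2.0 * t for t in true_times]
corrected = resample_uniform(true_times, recorded, 1.0)
```

For band-limited audio a higher-order interpolator would normally be preferred; the linear version only shows the data flow.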
Huang, Hesu; Kyriakakis, Chris
Convolutive noise in the form of reverberation can significantly degrade the quality and intelligibility of real-world audio recordings. To reduce this type of acoustic noise, we propose a single-microphone dereverberation method based on the Constant Modulus Algorithm (CMA), a blind deconvolution technique. In particular, a new Non-causal Delayless Subband Filtering architecture is designed and combined with CMA to reduce the computational complexity of the overall system. Experimental results show that our method achieves performance comparable to fullband CMA at lower computational cost when dereverberating audio signals.
Computationally Efficient Blind Dereverberation of Audio Signals
Blum, Thom; Keislar, Douglas; Wold, Erling
Audio fingerprinting provides the base technology for a variety of recent applications. Aspects of underlying fingerprinting technology and some typical applications are presented. An effort is made to discuss both Audible Magic's work as well as that of its competitors.
Audio Fingerprints: Technology and Applications
Imaoka, Keiichi; Ohga, Juro
There is still no suitable digital method for measuring the nonlinear distortion of acoustical devices. This paper presents a new amplitude nonlinearity measurement using a Pink-TSP (time stretched pulse) signal. The method applies to the device a TSP signal from which part of the frequency band has been removed. The component that appears in the rejected band is measured as distortion.
A New Digital Measurement for Distortion of Acoustical Devices
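The rejected-band principle described above can be sketched in a few lines. This example is only an illustration under stated assumptions: a pink-weighted multitone with random phases stands in for the Pink-TSP signal, the device is a hypothetical memoryless function, and all names (`notched_multitone`, `band_energy`, `distortion_ratio`) are invented for the sketch.

```python
import cmath
import math
import random


def notched_multitone(n, bins, reject, seed=0):
    """Pink-weighted multitone (stand-in for a Pink-TSP) with a rejected band."""
    rng = random.Random(seed)
    x = [0.0] * n
    for k in bins:
        if reject[0] <= k <= reject[1]:
            continue  # leave the rejected band empty
        amp = 1.0 / math.sqrt(k)  # pink weighting: power falls as 1/f
        phase = rng.uniform(0.0, 2.0 * math.pi)
        for t in range(n):
            x[t] += amp * math.cos(2.0 * math.pi * k * t / n + phase)
    return x


def band_energy(x, lo, hi):
    """Sum of squared DFT magnitudes over bins lo..hi (naive DFT)."""
    n = len(x)
    total = 0.0
    for k in range(lo, hi + 1):
        s = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
        total += abs(s) ** 2
    return total


def distortion_ratio(device, n=256, bins=range(4, 41), reject=(16, 24)):
    """Energy appearing in the rejected band relative to the total output energy."""
    x = notched_multitone(n, bins, reject)
    y = [device(v) for v in x]
    return band_energy(y, reject[0], reject[1]) / band_energy(y, 1, n // 2 - 1)


linear = distortion_ratio(lambda v: 0.5 * v)
nonlinear = distortion_ratio(lambda v: v + 0.1 * v * v)
```

A linear device leaves the rejected band empty apart from rounding error, while the quadratic term creates sum and difference products that fall inside it.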
Nakano, Yusuke; Ohga, Juro
This paper describes a new method for measuring small loudspeakers using a tube load. The acoustical loads defined in IEC standards for loudspeaker measurement, both closed boxes and a baffle, are much larger than the practical acoustical loads of small loudspeakers, such as mobile telephone bodies. This paper proposes a tube load for measurement and examines practical methods that avoid any effect of tube resonance.
Measurement of Small-Size Loudspeaker Units by New Acoustical Loads
It is frequently desirable to make electro-acoustic measurements in ordinary working spaces. These measurements would normally be performed in anechoic chambers, since it is the response of the device under test that is needed, not the response of the room reflections. In the past 35 years various techniques have been employed to make what are commonly referred to as “quasi-anechoic” measurements. These techniques make use of the fact that the initial signal from a loudspeaker-microphone system is anechoic until the first reflection arrives. By analyzing only that portion of the signal which arrives before the first reflection, an anechoic measurement is achieved. However, these measurements as a class suffer from a low-frequency limitation due to the shortness of the reflection-free time window. Time-frequency tradeoffs in the transformation of the impulse response to the frequency domain make it difficult to obtain an accurate estimate of the response of the device under test. We first characterize the nature of the errors induced by the short time window, and then propose methodologies for reducing the error.
Extending Quasi-Anechoic Measurements to Low Frequencies
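The low-frequency limitation mentioned above follows from the length of the reflection-free window. The sketch below estimates that window for a single floor reflection using an image source, and takes the reciprocal of the window length as a rule-of-thumb lower frequency limit. The geometry, the function names, and the 1/T rule of thumb are assumptions for illustration, not results from the paper.

```python
import math


def reflection_free_window(src_height, mic_height, distance, c=343.0):
    """Time gap in seconds between the direct sound and the first floor reflection."""
    direct = math.hypot(distance, src_height - mic_height)
    # The floor reflection travels as if from an image source below the floor.
    reflected = math.hypot(distance, src_height + mic_height)
    return (reflected - direct) / c


def lowest_usable_frequency(window_s):
    """Rule of thumb: frequency resolution is about 1 / window length."""
    return 1.0 / window_s


# Loudspeaker and microphone both 1 m above the floor, 1 m apart.
window = reflection_free_window(1.0, 1.0, 1.0)
f_min = lowest_usable_frequency(window)
```

With this geometry the window is only a few milliseconds long, so the measurement carries little independent information below a few hundred hertz.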
Chiang, Tihao; Hang, Hsueh-Ming; Lai, Te-Hsueh; Lee, Chunyi; Yang, Cheng-Han
This paper presents a novel algorithm for transcoding MPEG-4 AAC single-layer bit-streams for the purpose of bit-rate adaptation. The delivery of multimedia over heterogeneous networks and to devices with various capabilities calls for bit-rate adaptation. A previous approach that cascades a complete decoder and encoder has very high computational complexity. Our approach can reduce the complexity drastically while the coding efficiency is close to that of the previous cascaded method. In order to achieve such simplification, three rate-distortion models/techniques are employed.
Efficient AAC Single Layer Transcoder
Lee, Keun-Sup; Park, Young-Cheol; Yeon, Kyu-Chel; Youn, Dae Hee
The goal of a perceptual audio coder is to reduce the redundancy and irrelevancy of an audio signal based on the concept of masking. Several studies on the masking effect reveal that the masking threshold varies as a function of the noise-like or tone-like nature of audio signals. Therefore, the tonality of the audio signal significantly influences the quality and efficiency of a perceptual audio coder. In this paper, we propose a new effective algorithm for measuring tonality using spectrum energy. The performance of the proposed algorithm is comparable to the MPEG audio psychoacoustic model II (PAM-II). However, since the proposed algorithm consists of simple operations plus a few transcendental functions, its computational complexity is much lower than that of PAM-II. The proposed algorithm was tested with audio signals, and a DSP implementation showed that it could be implemented with 2.88 MIPS.
Effective Tonality Detection Algorithm Based on Spectrum Energy in Perceptual Audio Coder
Hsu, Han-Wen; Lee, Wen-Chieh; Li, Zheng-Wen; Liu, Chi-Min
This paper extends the previous work on AAC to HE-AAC. The audio patch method consists of two individual parts, zero band dithering and high frequency reconstruction. Zero band dithering can conceal the fishy artifact in the low frequency part that is encoded by a conventional AAC encoder. Furthermore, high frequency reconstruction can extend the audio obtained from the SBR to a full bandwidth signal. Intensive experiments have been conducted on various audio tracks to check the quality improvement and the possible risks of degrading the quality. The objective test measure used is the system recommended by ITU-R Task Group 10/4.
Audio Patch Method in MPEG-4 HE AAC Decoder
Hang, Hsueh-Ming; Yang, Cheng-Han
An efficient algorithm for removing inter-channel redundancy in subband audio coding is presented in this paper. In our approach, bit-weighted inter-channel prediction is applied to the Modified Discrete Cosine Transform (MDCT) coefficients. As with the INT-DCT based approach, our method introduces no audio quality degradation. In addition, the bit rate reduction of our method is about 8% better than that of the INT-DCT based approach in cases where inter-channel prediction is useful.
Bit-Weighted Inter-Channel Prediction for Subband Audio Coding
Tuffy, Mark A.
Over the last two decades, there have been dramatic advances in video game technology. In this time, audio for games has moved from monophonic beeps to full 5.1 surround sound, utilizing Dolby Digital and DTS. While the games industry has embraced these technologies, there are no standards or guidelines in place to ensure that game audio exploits the potential of this delivery mechanism. As a result, there is still the push toward "louder is better." One element key to moving away from "loud" to "quality" is establishing a reference level for playback. This paper suggests such a reference level and why this would be logical for the games industry to adopt.
Establishing a Reference Playback Level for Video Games
Rumsey, Francis; Ward, Peter; Zielinski, Slawomir K.
The investigation aimed to discover the effect of involvement in an interactive task on the perception of audio-visual asynchrony in a computer game environment. An experimental game was designed to test the investigated phenomenon. The experiment tested only audio lag conditions. It was found that within the confines of the experimental method, the threshold of perception was increased in the interactive game condition by approximately 40ms (±20ms), which is a small but statistically significant value.
Can Playing a Computer Game Affect Perception of Audio-Visual Synchrony?
Algazi, V. Ralph; Duda, Richard O.; Melick, Joshua B.; Thompson, Dennis M.
Motion-tracked binaural or MTB recordings preserve the dynamic sound localization cues provided by voluntary head motion, making MTB less sensitive than other binaural methods to mismatches between the HRTF of the listener and the HRTF of the recording system. However, MTB performance can be improved by customizing the reproduction process to the listener. In this paper, the different types of mismatch and their perceptual consequences are identified. Techniques are presented for partially or completely correcting the mismatches, and properties of these techniques are described.
Customization for Personalized Rendering of Motion-Tracked Binaural Sound
Hamasaki, Kimio; Hatano, Wataru; Hiyama, Koichiro; Komiyama, Setsu; Okubo, Hiroyuki
5.1 surround sound productions and broadcasts have become popular in Japan, and NHK has developed a 22.2 multichannel sound system for ultra-high definition video. While two-dimensional or three-dimensional sound reproduction is possible with these multichannel sound systems, the production of contents is more complicated and time-consuming than in two channel stereo production. In productions using a conventional surround sound mixing tool, in particular, much time is needed for creating two-dimensional sound effects. Therefore, an integrated surround panning system was developed to enable various surround sound effects to be created easily. This paper introduces the newly developed integrated surround sound panning system, which has innovative functions such as a distance control and an integrated sound source movement control, and discusses various issues concerning multichannel sound production.
5.1 and 22.2 Multichannel Sound Productions Using an Integrated Surround Sound Panning System
Ichikawa, Masaki; Muraoka, Teruo; Nakazato, Tomoaki
Loudspeaker locations were examined using frequency dependent interaural cross correlation (FIACC) to find the optimum sound field recomposition in the multichannel recording and reproduction process. Experiments were conducted by comparing pairs of FIACC, where one of a pair was measured in an original sound field and the other was measured in the reproduced sound field. In conclusion, it became clear that the ITU's recommendation for the loudspeaker arrangement in a 5-channel system is reasonable.
Examination of Multichannel Sound Field Recomposition Utilizing Frequency Dependent Interaural Cross Correlation (FIACC)
Braasch, Jonas; Martens, William L.; Woszczyk, Wieslaw
For economic reasons, home entertainment surround sound systems are usually equipped with a single subwoofer channel. The main argument for this practice is the believed inability of the auditory system to localize low frequencies in small reverberant rooms. However, a psychoacoustic localization test conducted using a standard 5-channel set-up with subwoofers showed that listeners were able to determine the lateral displacement (left, center, or right) of the loudspeaker presenting the test stimulus (an octave-band noise burst at 31.5-Hz, 63-Hz, or 125-Hz center frequency). Using a binaural model simulating human perception, recordings of subwoofer signals at different positions were analyzed. As expected, the interaural level differences remained nearly constant for different subwoofer positions in the low frequency range. On the basis of interaural time differences, however, the model was able to predict the position of the loudspeaker in the left/right dimension, verifying the outcome of the listening test. The results indicate the importance of considering more than one subwoofer in multi-channel audio systems.
Modeling Auditory Localization of Subwoofer Signals in Multi-Channel Loudspeaker Arrays
Martens, William L.; Braasch, Jonas; Woszczyk, Wieslaw
This investigation addressed two primary questions relating to the use of subwoofers in reverberant reproduction environments, the first being whether listeners are able to discriminate between the auditory images resulting from correlated and decorrelated low-frequency signals, and the second being whether decorrelation between drivers produced identifiably greater listener envelopment. For the experiments reported in this paper, octave-band noise samples with center frequencies ranging in third-octave steps from 31.5Hz to 125Hz were presented via a multichannel loudspeaker array. These low-frequency signals could be either perfectly correlated (drivers receiving identical signals) or maximally decorrelated between selected pairs of five loudspeakers positioned according to the ITU standard configuration. Even in a small and highly reverberant listening room, decorrelated signals with center frequencies greater than 50Hz were both discriminably and identifiably different from correlated signals, but only when such low-frequency signals were reproduced via the left and right surround channel drivers.
Identification and Discrimination of Listener Envelopment Percepts Associated with Multiple Low-Frequency Signals in Multichannel Sound Reproduction
In previous papers the Multichannel Microphone Array Design (MMAD) procedure has been used mainly to determine the design of arrays giving complete 360° coverage of the sound field. Many sound recording engineers, however, use the main microphone array to cover only the front sound stage, and add early reflections and reverberation either by artificial means (electronic generation) or by using a second array in the reverberation field. This paper describes the MMAD procedure applied to front-only coverage of the main sound stage using 3, 4, or 5 channels/microphones, covering any desired angle within the front hemisphere, for the usual first-order microphone directivities. Various array alignments are described in the form of an arc of a circle with different radii. All arrays described are critically linked (seamless) within the front hemisphere.
Multichannel Sound Recording Using 3,4 and 5 Channels Arrays for Front Sound Stage Coverage
Bruno, Remy; Laborie, Arnaud; Montoya, Sebastien
Multichannel recording is one of the most important challenges of today's audio techniques. A good surround recording should provide at the same time a good feeling of envelopment, accurate localisation, a large sweet spot, and respect for tones. Fourier-Bessel theory and advanced signal processing make it possible to obtain directivities derived from panning laws, which have been designed to optimally drive any multichannel layout. This paper presents the underlying concept of High Spatial Resolution, the spatial equivalent of High Fidelity, and points out why this is key to achieving high spatial quality. A very flexible and scalable technology providing High Spatial Resolution, as well as a high-performance 5.0 microphone featuring a compact array of 16 omnidirectional capsules, are also presented.
Designing High Spatial Resolution Microphones
Ben-Hador, Ronen; Neoran, Itai
We discuss the capturing, manipulation, and reproduction of impulse responses (IRs) of acoustic spaces. While trying to maintain the accuracy of an IR, other factors such as sound quality and the musical character of the sound should also be considered. Furthermore, IRs are not limited to preserving the sound of venues but also serve as a tool in music production. Therefore, the IRs are converted to standard multi-channel reproduction formats, such as stereo and ITU 5.0. In order to obtain a flexible reverb tool, the IRs are manipulated to modify acoustic properties such as reverb time and inter-channel de-correlation. A new real-time audio plug-in was developed, for which IRs of venues and devices were recorded worldwide. The IRs are convolved with dry audio. The plug-in supports mono, stereo, and surround at sample rates up to 96 kHz.
Capturing Manipulation and Reproduction of Sampled Acoustic Impulse Responses
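The step of convolving an impulse response with dry audio can be written directly from the definition of discrete convolution. The sketch below is a plain time-domain version with a hypothetical function name; a real-time plug-in would more likely use a partitioned FFT method, which the abstract does not describe.

```python
def convolve(dry, ir):
    """Direct-form convolution of a dry signal with an impulse response."""
    out = [0.0] * (len(dry) + len(ir) - 1)
    for i, d in enumerate(dry):
        for j, h in enumerate(ir):
            out[i + j] += d * h
    return out


# A three-sample dry signal through a toy two-tap "room".
wet = convolve([1.0, 2.0, 3.0], [1.0, 0.5])
```

The output is longer than the input by the length of the impulse response minus one sample, which is the reverb tail.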
Audio reproduction systems have as their first goal the ability to reproduce the program sounds at the level desired by the user. There are a number of areas in audio system design where this is critical. The system should be able to reproduce audio at the levels needed by typical users, and presumably should also have the ability to accommodate a substantial portion of the desired reproduction levels. Beyond this simple requirement, some systems should have their behavior optimized for the level at which they will be used. The literature on listening levels has been surveyed, and new data has been gathered to determine what the preferred listening levels are for a variety of listening circumstances. Additional experiments have been done to estimate the range of listening levels which may be acceptable to the typical listener.
Preferred Listening Levels and Acceptance Windows for Dialog Reproduction in the Domestic Environment
Skovenborg, Esben; Nielsen, Soren H.
The evaluation of twelve models of loudness perception is presented. One of the loudness models is based on a novel algorithm, and another is based on a combination of two known measurement techniques. The remaining models are all implementations of common or standardized loudness algorithms. The ability of each model to predict or measure the subjective loudness of speech and music segments is evaluated. The reference loudness is derived from two listening experiments using the speech and music segments as stimuli. Different statistical measures are employed in the evaluation of the models, so that both the absolute performance of the models and the performance relative to the between-listener disagreement are measured.
Evaluation of Different Loudness Models with Music and Speech Material
Cassidy, Ryan J.
In a recent work, we reviewed the basics of level detection for dynamic range compression, and presented various tunings of level detector parameters for optimal correspondence with well-known and recently proposed facts pertaining to loudness perception. In this paper, we review key points and present several extensions. We review the mathematics behind the operation of the popular root-mean-square detector, with special attention paid to the effect of the time constant on detector performance. We compare an equal loudness filter, designed in the prior work to compensate for frequency-dependent steady-state loudness perception, to standard weightings. Updated results based on newly standardized loudness contours are also presented.
Level Detection Tunings and Techniques for the Dynamic Range Compression of Audio Signals
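To illustrate how the time constant shapes a root-mean-square detector, here is a minimal sketch of a common one-pole formulation: the squared input is smoothed by a first-order recursive filter and the square root is taken. This particular form and the name `rms_detector` are assumptions; the paper's own detector and tunings may differ.

```python
import math


def rms_detector(samples, fs, tau):
    """Running RMS level using one-pole smoothing of the squared signal.

    fs is the sample rate in Hz and tau the time constant in seconds.
    """
    a = math.exp(-1.0 / (tau * fs))  # per-sample decay factor
    mean_sq = 0.0
    levels = []
    for x in samples:
        mean_sq = a * mean_sq + (1.0 - a) * x * x
        levels.append(math.sqrt(mean_sq))
    return levels


# Step input: the smoothed mean square reaches 1 - 1/e after tau seconds.
levels = rms_detector([1.0] * 1000, fs=1000.0, tau=0.1)
```

A short time constant tracks transients closely but lets low-frequency ripple through; a long one is smoother but slower to respond.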
Crockett, Brett G.; Seefeldt, Alan; Smithers, Michael
The applications of an accurate objective measure of subjectively perceived loudness are many. Accordingly, the ITU-R has initiated a study to identify such a measure for a new recommendation. A new objective loudness measure based on modifications to a traditional psychoacoustic model of perceived loudness was developed for this study. When compared to subjective loudness matching data generated outside the ITU-R, the new measure is found to perform better than simpler weighted power measurements and the unmodified psychoacoustic model.
A New Objective Measure of Perceived Loudness
Hess, Wolfgang; Merimaa, Juha
A group of listeners was trained to evaluate auditory source width (ASW) and listener envelopment (LEV). The training consisted of discussions on the perception of spatial sound and visualization of both attributes with drawings. After each session the subjects evaluated the ASW and LEV of a set of stimuli consisting of different source signals simulated in a few chosen acoustical environments. Most subjects developed consistent criteria for their judgements and maintained them throughout the training and a subsequent control two months later. However, considerable individual differences were found. Analysis of the data revealed that a large part of the differences was due to different judgements between the chosen source signals. The training also suggested that some differences could have been caused by the translation from multi-dimensional perception to the unidimensional judgements. A further graphical evaluation of the stimuli showed that this was not the case.
Training of Listeners for Evaluation of Spatial Attributes of Sound
Agerkvist, Finn T.; Kvist, Preben; Lee, Joonhyun; Park, Sangil; Thomsen, Carsten
This paper describes the development of the first version of the Sound Quality Evaluation System. The purpose of the system is to predict the subjective sound quality of home theatre systems based on objective measurements. 16 home theatre systems were measured in an anechoic room. Several metrics expected to correlate with the subjective quality were proposed and tested. A model for the sound quality was created by mapping the subjective evaluations of the home theatre systems with the metrics calculated for each system. Correlation between subjective listening test and the prediction is presented.
Development of a Sound Quality Evaluation System
Schobben, Daniel; van de Par, Steven
The impact of using loudspeaker versus headphone playback on the subjective quality of compressed audio is investigated. It is shown that reverberation and to a lesser extent cross-talk, which both are introduced naturally in loudspeaker playback, can effectively hide coding artifacts. In double blind listening tests subjects had to rate MP3 coded excerpts at various bit-rates. The excerpts were played back over headphones. Reverberation and cross-talk were introduced artificially to simulate loudspeaker playback, so that their impact could be assessed separately. Results show that quality scores of the reverberated excerpts were significantly higher than for the corresponding 'dry' excerpts for 64 kb/s bit-rate while these differences diminished with increasing bit-rate. This indicates that coding artifacts can become less audible in reverberant listening conditions.
The Effect of Room Acoustics on MP3 Audio Quality Evaluation
Herzog, Philippe; Lavandier, Mathieu; Meunier, Sabine
This study deals with the relationships between two parallel evaluations of a panel of loudspeakers: perceptual measurements and physical ones. The sound radiated by the loudspeakers has been recorded. The recordings were submitted to both listening tests and signal analysis. Pair-comparison tests were run using headphones, so the spatial dimension of sound reproduction is not investigated. This first attempt revealed two main perceptual dimensions. They were independent of the tested recording techniques and musical excerpts. We determined a suitable method of analysis for the physical measurements, and then we looked for objective attributes correlated with the perceptual dimensions.
The Restitution of Timbre by Loudspeakers in a Listening Room: Perceptual and Physical Measurements.
Agerkvist, Finn T.; Fenger, Lars M.
This paper presents subjective tests of the active transducers concept. The tests are designed to determine whether the output filter on class D amplifiers used in an active loudspeaker can be omitted without audible error occurring. The input signal of the amplifiers was limited to 0-3 kHz, corresponding to that of a woofer unit. A listening panel of 7 persons was used in the tests. The tests showed that no errors could be detected.
Subjective Test of Class D Amplifiers Without Output Filter
Kagawa, Yukio; Kyouno, Noboru; Usagawa, Tsuyoshi; Yamabuchi, Tatsuo
Acoustic responses of an axisymmetric cone-type loudspeaker mounted in an infinite baffle have been analyzed as an electro-mechano-acoustic transducer by applying the Finite Element Method; conical shell elements to the mechanical system and triangular ring elements to the acoustic system. The outer semi-infinite space where sound is emitted from the cone speaker is treated analytically by applying the Green’s function. The mechano-acoustic system of the cone speaker is connected to the electrical system by an electro-mechanical equivalent circuit. The calculated sound pressure responses are compared with measured results, demonstrating that the calculated responses are very good predictors of measured results.
Acoustic Response Simulation of a Cone-Type Loudspeaker by the Finite Element Method
This paper presents transducer nonlinearity analysis in view of compensation of these effects. We describe an experimental method of weak nonlinearity characterization, based on a representation of the nonlinearity by Volterra series and using multitone excitations. Device linearization can be achieved by applying the inverse nonlinearity upstream of the device, on the condition that the nonlinearity law is known. To address the need to distinguish nonlinear effects from linear distortions, an ad hoc experimental method has been developed. The characterization of a weakly nonlinear electroacoustic device with usual methods of measurement (THD, intermodulation) does not illustrate the nonlinearities themselves, but only some of their effects. This is the reason why this characterization method was developed.
Zhang, Z.L.; Zong, F.D.
Japanese expert Yoshinisa studied the total harmonic distortion of a loudspeaker in the low-frequency range caused by nonlinear mechanical resilience, but did not consider the case in which both the mechanical resilience and the magnetic field are nonlinear. The authors' work focuses on finding the total harmonic distortion of the nonlinear motion of the loudspeaker by means of numerical calculation. They obtained the numerical solution using MATLAB software, and the corresponding curves of total harmonic distortion versus frequency through spectrum analysis using SPECTRA PLUS software. They also analyzed the influence of the nonlinearity of the magnetic field on the total harmonic distortion of the loudspeaker, and drew several useful conclusions.
Numerical Analysis of Total Harmonic Distortion of a Loudspeaker in a Low-Frequency Range
Fujii, Jun; Ohga, Juro; Sasida, Norikazu
This paper presents the construction and characteristics of a small loudspeaker with a multilayer piezoelectric ceramic bimorph diaphragm for mobile telephone use. The multilayer ceramic wafer is characterized by a lower operating voltage, due to its larger capacitance than that of conventional ceramics. This makes it suitable for battery-operated mobile equipment. Precise measurement of diaphragm parameters and analysis of the loudspeaker response are described in this paper.
Characteristic of Loudspeaker by a Multilayer Piezoelectric Ceramic
A theoretical investigation of the conventional speaker, placed into a rigid wall, has been presented at AES 116. The quasi-dynamic approach to the loading force and pressure was introduced there. The speaker diaphragm was regarded as a number of concentric rings. The acoustic pressure and surface velocity are predicted using the first two approximations in . These simulations show the difference between the real measurements and the analytical models, used with some standard assumptions from the classic theory of plates. The results suggest that the modelling process needs to become more specific. Hence, to simulate the behaviour of a standard suspension, the outer, clamped ring was given the characteristics of rubber, with non-linear deformations. The 4th approximation of the problem is formulated in this paper.
Radiation of Enclosed Loudspeaker in Baffle: Simulation Model and Results
Jabbari, Ali; Unruh, Andy
The resonance behavior of a driver with low damping is studied. In such a system, the existing nonlinearities can result in jump resonance, a bifurcation phenomenon with two regimes. One regime, accompanied by a sudden decrease in amplitude, is evident when the frequency of excitation is increasing. The other regime, exhibiting a sudden increase in amplitude, is present when the frequency of excitation is decreasing. Jump resonance was experimentally observed in an audio transducer with low damping and subsequently confirmed by analysis and simulation using a detailed dynamic model that includes the most significant sources of nonlinearities. The conclusion of this work is that the primary cause of jump resonance in audio transducers is the nonlinearity in the driver compliance. The importance of this phenomenon increases as the use of current amplifiers becomes more widespread, since the resulting low system damping makes jump resonance more likely.
Jump Resonance in Audio Transducers
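The two regimes of jump resonance can be illustrated with a Duffing-type model, in which the suspension stiffness has a cubic term. This is a textbook simplification chosen for the sketch, not the detailed dynamic model of the paper, and the parameter values are arbitrary. Under first-order harmonic balance, the steady-state amplitude A at drive frequency omega satisfies ((k - m*omega^2 + 0.75*k3*A^2)^2 + (c*omega)^2) * A^2 = F^2; more than one real solution signals the frequency range in which jumps can occur.

```python
def steady_state_amplitudes(omega, m=1.0, c=0.05, k=1.0, k3=1.0, force=0.1,
                            u_max=10.0, steps=10000):
    """Real steady-state amplitudes of a Duffing oscillator (harmonic balance)."""

    def f(u):  # u = A**2
        return ((k - m * omega ** 2 + 0.75 * k3 * u) ** 2
                + (c * omega) ** 2) * u - force ** 2

    roots = []
    prev_u, prev_f = 0.0, f(0.0)
    for i in range(1, steps + 1):
        u = u_max * i / steps
        cur = f(u)
        if prev_f * cur < 0.0:  # sign change brackets a root
            lo, hi = prev_u, u
            for _ in range(60):  # refine by bisection
                mid = 0.5 * (lo + hi)
                if f(lo) * f(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append((0.5 * (lo + hi)) ** 0.5)
        prev_u, prev_f = u, cur
    return roots


below = steady_state_amplitudes(0.8)  # single solution
above = steady_state_amplitudes(1.3)  # three solutions: jump region
```

In the three-solution region the middle branch is unstable, so the response sits on the upper or lower branch depending on whether the frequency is being swept up or down, which is the hysteresis described in the abstract.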
Jiang, Chao; Shen, Yong; Zou, Jian
A uniform method is presented to calculate the impulse response at an arbitrary point of the sound field radiated by a line loudspeaker array. The frequency response is then obtained by applying an FFT to the impulse response. It is shown that, at any point of the sound field, the frequency response resembles that of a low-pass filter, and the cut-off frequency varies with the position of the observation point.
Impulse Response and Frequency Response of a Line Loudspeaker Array
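A simplified picture of why the response of a line array looks low-pass is to treat the array as equally driven point sources, each contributing a delayed impulse scaled by 1/r. The sketch below uses that assumption (it is not the paper's method, and it evaluates the Fourier sum directly rather than through an FFT); the function names and array dimensions are hypothetical.

```python
import cmath
import math


def element_distances(point, n_elems=16, length_m=2.0):
    """Distances from each array element (spread along the y axis) to `point`."""
    px, py = point
    ys = [-length_m / 2 + length_m * i / (n_elems - 1) for i in range(n_elems)]
    return [math.hypot(px, py - y) for y in ys]


def impulse_arrivals(point, c=343.0, **array):
    """Sorted (delay_s, amplitude) pairs: one delayed impulse per element."""
    return sorted((r / c, 1.0 / r) for r in element_distances(point, **array))


def response_magnitude(freq_hz, point, c=343.0, **array):
    """Magnitude of the summed element contributions at one frequency."""
    total = 0j
    for delay, amp in impulse_arrivals(point, c, **array):
        total += amp * cmath.exp(-2j * math.pi * freq_hz * delay)
    return abs(total)


dc = response_magnitude(0.0, (1.0, 0.0))
hf = response_magnitude(5000.0, (1.0, 0.0))
```

At low frequencies all contributions add in phase; at high frequencies the spread of arrival times causes partial cancellation, and that spread changes with the observation point.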
Bortoni, Constancio; Bortoni, Rosalfonso; Noceti Filho, Sidnei; Seara, Rui
When loudspeakers operate in a high-power environment (common in PA systems), voice-coil overheating and excessive cone displacement are the main causes of damage and faults. These drawbacks are related to low efficiency and limited cone displacement, respectively. This paper proposes a procedure to measure and control both the voice-coil temperature and the cone displacement by using a digital signal processor (DSP). The voice-coil temperature and cone displacement are obtained indirectly from the variation of the coil DC resistance and from the cone acceleration, respectively. Because it relies on measurement, this approach takes into account some real characteristics of the loudspeaker, such as its inherent nonlinearities. Thus, we can obtain the most from the sound system, since it may now work without the usual safety margin needed for these systems.
Real-Time Voice-Coil Temperature and Cone Displacement Control of Loudspeakers
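The two indirect measurements named in the abstract rest on simple relations, sketched below. The temperature estimate uses the linear resistance-temperature law for the coil conductor, with a typical copper coefficient assumed; the displacement estimate assumes a single sinusoid, where peak displacement is peak acceleration divided by the squared angular frequency. Function names and numeric values are illustrative, not taken from the paper.

```python
import math

ALPHA_COPPER = 0.0039  # 1/K, typical temperature coefficient (assumed)


def coil_temperature(r_hot, r_cold, t_cold_c=20.0, alpha=ALPHA_COPPER):
    """Estimate coil temperature in deg C from its DC resistance.

    Uses R_hot = R_cold * (1 + alpha * (T_hot - T_cold)).
    """
    return t_cold_c + (r_hot / r_cold - 1.0) / alpha


def peak_displacement(peak_accel, freq_hz):
    """Peak displacement of a sinusoidal motion from its peak acceleration."""
    omega = 2.0 * math.pi * freq_hz
    return peak_accel / omega ** 2


# A 6.0 ohm coil (at 20 deg C) that now measures 7.17 ohm.
temperature = coil_temperature(7.17, 6.0)
```

For broadband programme material the displacement would instead be obtained by integrating the acceleration signal twice, for example with a suitably high-passed double integrator.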
True, Robert; Unruh, Andrew D.
Three different multi-diaphragm loudspeaker transducers with a tubular form factor are investigated. The transducers consist of a conventional motor structure, a tubular housing, and multiple diaphragms. In one design, sound is generated by the relative motion between a housing that is driven by a single motor and diaphragms that are attached to the housing via flexible surrounds. In a second design, a single motor drives one set of diaphragms and sound is generated by the relative motion between the diaphragms and the fixed housing. In the final design, two motors are used to drive two sets of diaphragms in opposition and sound is generated by the relative motion between them.
Loudspeaker Transducers with an Alternative Tubular Form Factor
Anthony, Jamie; Celmer, Robert; Foley, Dan; Pagliaro, Tony; Sachwald, Benjamin; Thompson, Shane
Loudspeaker assembly faults, such as a rubbing voice coil, bent frame, loose spider, etc., have traditionally been detected using experienced human listeners at the end of a production line. Previous attempts to develop production measurement systems for on-line testing typically analyze only low-order harmonics for the primary purpose of measuring total harmonic distortion (THD), and thus are not specifically designed to detect defective rub, buzz, and ticking sounds. This paper describes a new method wherein the total energy of groups of high-order harmonics, for example the 10th through the 20th or the 31st through the 40th, is measured and analyzed. By grouping high-order harmonics and resolving their respective total energies, distinct signatures can be obtained that correlate to the root cause of audible rub and buzz distortions (Temme, 2000). The paper discusses loudspeakers tested with specific defects, as well as results of a computer-based electroacoustic measurement and analysis system used for detection.
Higher Order Harmonic Signature Analysis for Loudspeaker Defect Detection
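The grouping of high-order harmonics can be sketched as follows, assuming a steady sine stimulus and an analysis window that holds a whole number of periods. Each harmonic amplitude is found by correlating with a complex exponential, and the energies of a group are summed. The function names and test signal are invented for this illustration; the measurement system in the paper is not described at this level of detail.

```python
import cmath
import math


def harmonic_amplitude(x, fs, f0, h):
    """Amplitude of the h-th harmonic of f0 (window must span whole periods)."""
    n = len(x)
    s = sum(v * cmath.exp(-2j * math.pi * h * f0 * t / fs)
            for t, v in enumerate(x))
    return 2.0 * abs(s) / n


def harmonic_group_energy(x, fs, f0, first, last):
    """Total mean-square energy of harmonics first..last inclusive."""
    return sum(harmonic_amplitude(x, fs, f0, h) ** 2 / 2.0
               for h in range(first, last + 1))


# 100 Hz tone with a small 15th harmonic, as a buzzing unit might produce.
fs, f0, n = 48000.0, 100.0, 480
signal = [math.sin(2 * math.pi * f0 * t / fs)
          + 0.01 * math.sin(2 * math.pi * 15 * f0 * t / fs) for t in range(n)]
group_10_20 = harmonic_group_energy(signal, fs, f0, 10, 20)
group_31_40 = harmonic_group_energy(signal, fs, f0, 31, 40)
```

Comparing the group energies across stimulus frequencies gives a compact signature; here only the 10th to 20th group carries energy.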
Goldin, Alexander A.
The paper presents the Long Range Noise Canceling (LRNC) microphone array technology developed at Alango Ltd. LRNC is a digital signal processing technology utilizing near field signals from two unidirectional or four omnidirectional microphones. It allows differentiation between the user's voice, originating in a closed region in front of the LRNC microphone, and other sounds, which are effectively blocked. The pick-up range of the LRNC microphone may be as large as 70 cm in front of the microphone and, if necessary, may be easily reduced by changing the corresponding software parameters. This unique property makes the LRNC microphone attractive for a variety of voice applications where distant sounds, noises, or acoustic echoes must be blocked.
Long Range Noise Canceling Microphone
Miller III, Robert (Robin)
Higher sampling rates are necessary for high spectral resolution, but it is higher angular resolution and precision that preserves source directionality, and therefore higher tonal/timbral quality of that source, termed spatial definition. In acoustic spaces that are extensions of musical instruments, voices, and sources of sound effects (for movies, virtual reality, training simulation), tonality is a major contributor to lifelike perception - but in audio reproduction, lifelike tonality is limited by the recording system. A surround microphone has been developed both for more precise 2D surround ("PanAmbio"), compatible with ITU 5.1 and stereo, and for "PerAmbio" 3D (with height) for the ultimate in tonal reality distributable using ordinary 6-channel media for either decoderless 2D replay or 3D with decoder and five additional speakers.
Spatial Definition and the PanAmbiophone Microphone Array for 2D Surround & 3D Fully Periphonic Recording
Grinnip, Roger S., III
An advanced numerical model of a pressure condenser microphone capsule is presented. The model divides the acoustic space into internal and external domains and couples the dynamic pressure in each domain to the capsule diaphragm motion. The external acoustic space is modeled using the boundary element (BE) method which allows for arbitrary geometry of the capsule/microphone external surface. The diaphragm is modeled as a circular tensioned membrane of negligible bending stiffness. The internal acoustic space (both the viscous air film and back chamber) is modeled as a cylindrical cavity with negligible axial pressure variation. Flow through the back plate is modeled as an annular array of circular pores with generalized functions locating each pore position. Although the presented model is specialized for a simple pressure condenser microphone, the numerical implementation is sufficiently generic to allow for a large variation in capsule parameters. The complete model, implemented in a software package called VC, is used to generate a simulated response curve that is compared to a response curve taken from an experimental prototype. The results show excellent agreement throughout the measured frequency range, indicating this new coupled model may be used for advanced microphone characterization and design.
Advanced Simulation of a Condenser Microphone Capsule
Avendano, Carlos; Goodwin, Michael
This paper describes a processing approach which enables perceptually compelling modification of audio signals via accentuation or suppression of transients. The transient detection uses a frequency-domain analysis which yields a spectral flux parameter. In typical detection methods, such a parameter would be compared with a threshold to derive a binary transient detection function. Here, we instead use an adjustable graded response to arrive at a continuous transient characterization function. This smooth function is used to drive a nonlinear frequency-domain signal modification. We demonstrate that binary detection is problematic for perceptual manipulation, that our approach overcomes these problems, and that our system is able to achieve substantial modification of a signal's perceptual attributes without introducing significant artifacts.
Enhancement of Audio Signals Using Transient Detection and Modification
Abe, Mototsugu; Smith, Julius O., III
Due to its simplicity and accuracy, quadratic peak interpolation in a zero-padded Fast Fourier Transform (FFT) has been widely used for sinusoidal parameter estimation in audio applications. While general criteria can guide the choice of window type, FFT length, and zero-padding factor, it is sometimes desirable in practice to know more precisely the requirements for achieving a prescribed error bound. In this paper, we theoretically predict and numerically measure the errors associated with various choices of analysis parameters, and provide precise criteria for designing the estimator. In particular, we determine 1) the minimum zero-padding factor needed for a given error bound in quadratic peak interpolation, and 2) the minimum allowable frequency separation for a given window length, for various window types.
Design Criteria for Simple Sinusoidal Parameter Estimation Based on Quadratic Interpolation of FFT Magnitude Peaks
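As an illustration of the estimator discussed in the abstract above, the following is a minimal sketch of quadratic peak interpolation on the log magnitude of a zero-padded DFT. It is not taken from the paper: the function names, the Hann window, the zero-padding factor of 4, and the test tone are all assumptions chosen for the example, and a direct DFT is used in place of an FFT only to keep the code free of external libraries.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(x, nfft):
    """Magnitudes of bins 0..nfft/2 of the DFT of x zero-padded to nfft samples."""
    mags = []
    for k in range(nfft // 2 + 1):
        step = cmath.exp(-2j * math.pi * k / nfft)
        phasor = 1.0 + 0.0j
        acc = 0.0 + 0.0j
        for sample in x:
            acc += sample * phasor
            phasor *= step
        mags.append(abs(acc))
    return mags


def estimate_peak_hz(x, fs, zero_pad_factor=4):
    """Estimate the frequency of the strongest sinusoid in x (assumed parameters)."""
    n = len(x)
    nfft = n * zero_pad_factor
    windowed = [s * w for s, w in zip(x, hann(n))]
    mags = dft_magnitudes(windowed, nfft)
    # Largest interior bin, so that both neighbours exist.
    k = max(range(1, len(mags) - 1), key=mags.__getitem__)
    # Fit a parabola through the log magnitudes of the peak bin and its neighbours.
    a, b, c = (math.log(mags[k + d] + 1e-300) for d in (-1, 0, 1))
    offset = 0.5 * (a - c) / (a - 2.0 * b + c)  # vertex offset in bins
    return (k + offset) * fs / nfft


# Example: a 1003.3 Hz cosine sampled at 8 kHz, 256 samples long.
fs = 8000.0
f0 = 1003.3
signal = [math.cos(2.0 * math.pi * f0 * i / fs) for i in range(256)]
estimate = estimate_peak_hz(signal, fs)
print("estimated frequency in Hz:", estimate)
```

The parabola vertex refines the coarse bin index to a fractional bin. Increasing `zero_pad_factor` makes the sampled main lobe closer to parabolic near its peak, which is the trade-off the abstract's design criteria address.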
Baird, Justin; Jackson, Bruce; McGrath, David
An improved method of audio equalization utilizing Raised Cosine Filters is introduced. Raised Cosine Filters offer improved selectivity in comparison to traditionally implemented equalization functions, while also maintaining beneficial attributes such as a minimum phase response. The Raised Cosine Filter also enables flat summation and asymmetrical filtering characteristics, resulting in an equalization system offering capability beyond traditional filter implementations.
Raised Cosine Equalization Utilizing Log Scale Filter Synthesis
Barry, Dan; Coyle, Eugene; Lawlor, Bob
We present a real-time sound source separation algorithm which performs the task of source separation based on the lateral displacement of a source within the stereo field. The algorithm exploits the use of the pan pot as a means to achieve image localisation within stereophonic recordings. As such, only an interaural intensity difference exists between left and right channels for a single source. Gain scaling and phase cancellation techniques are used in the frequency domain to expose frequency dependent nulls across the azimuth plane. The position of these nulls, in conjunction with magnitude estimation and grouping techniques, is then used to resynthesise separated sources. Results obtained from real recordings show that for music, this algorithm outperforms current source separation schemes.
Real-Time Sound Source Separation: Azimuth Discrimination and Resynthesis
Avendano, Carlos; Goodwin, Michael
In this paper we describe a signal processing technique for enhancing audio signals based on manipulation of their modulation spectra. The modification is achieved by filtering the time trajectories of spectral envelopes in different frequency bands. Scaling of higher modulation frequencies with shelving filters is used to modify rapidly changing acoustic events, thus effectively enhancing transient components without the need for explicit detection. The perceptual effect of such modifications is analogous to the edge processing applied to images, where acoustic details can either be smoothed or sharpened depending on the desired quality of the sound.
Enhancement of Audio Signals Based on Modulation Spectrum Processing
Candy, Bruce H.; Cox, Stephen M.
A novel analogue class-D amplifier has been developed which produces low distortion. The structure follows the well known prior-art class-D structures with negative feedback, but includes modulation of the symmetry of the carrier oscillator waveform by a derivative of the input signal. This compensates a nonlinear phase modulation effect that is intrinsic to the prior-art structures. The improvement is substantial at very low extra cost.
Improved Analogue Class-D Amplifier with Carrier Symmetry Modulation
Floru, Fred; Whitlock, Bill
Limited Common-Mode Rejection Ratio (CMRR) in balanced interfaces often limits dynamic range in real-world audio systems. Conventional differential amplifier input circuits suffer serious CMRR degradation when driven by real system signal sources instead of laboratory generators. An ideal audio transformer, because of its extremely high common-mode impedances, is virtually immune to this degradation. A new Integrated Circuit (IC) is described that uses a patented topology to achieve common-mode impedances comparable to those of an ideal transformer. As a result, the IC enables signals with very high dynamic range to be transported without contamination by system ground voltage differences or other sources of common-mode interference. Other features of the IC, relating to audio signal quality and reliability, are also detailed.
New Balanced Input Integrated Circuit Achieves Very High Dynamic Range in Real-World Systems
Artificial reverberation is used by recording engineers to help create a sense of spaciousness, depth, and envelopment in a sound recording. A system is proposed for training listeners to detect and identify various aspects of artificial reverberation reproduced in multichannel audio. The training helps increase listeners' auditory sensitivity to parameters of artificial reverberation in sound scene comparisons. The exercises progress from simple matching to identification of more subtle aspects of artificial reverberation.
An Ear Training System for Identifying Parameters of Artificial Reverberation in Multichannel Audio
Kyriakakis, Chris; Sadek, Ramy
This paper presents a novel panning algorithm called Speaker-Placement Correction Amplitude Panning (SPCAP) which guarantees conservation of loudspeaker power output. The method is appropriate for any speaker arrangement (e.g. ITU 5.1, 10.2, etc.), and scales with the number of speakers. SPCAP works by correcting initial pan values based on speaker placement to achieve constant power output. Because panning occurs over an arbitrary number of speakers (i.e. is not pair-wise), SPCAP provides two significant advantages over discrete panning schemes. First, pan values for current and future surround-sound formats (e.g. 5.1 and 10.2) are guaranteed to conserve power under any lower-resolution setup, making dynamic up/down mixing in non-standard setups feasible. Second, SPCAP provides a framework for producing wide (non point-source) sounds.
A Novel Multichannel Panning Method for Standard and Arbitrary Loudspeaker Configurations
Birkedal Nielsen, Sofus; Celestinos, Adrian
The sound level distribution generated by loudspeakers placed in a room can be simulated using numerical methods. The purpose of this paper is to present an application based on finite-difference time-domain (FDTD) approximations for the study of low frequencies in audio reproduction, from ordinary stereo to multi-channel surround setups. A rectangular room is simulated by using a discrete model in time and space. This technique has been used extensively and gives good performance at low frequencies. The impulse response can be obtained in addition to the sound level distribution. Multiple loudspeakers in a room can be simulated to evaluate and visualize their coupling with the room. A high frequency resolution can be obtained for auralization purposes.
Multi-Source Low Frequency Room Simulation Using Finite Difference Time Domain Approximations
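To make the idea of a discrete model in time and space concrete, here is a minimal one-dimensional sketch of a leapfrog finite-difference update for the wave equation with rigid (fully reflecting) ends. It is a simplified illustration only, not the paper's rectangular-room implementation; the function name, grid size, source position, and the values of `dx` and `c` are assumptions made for the example.

```python
def simulate_1d_fdtd(num_cells, source_cell, num_steps, courant=1.0):
    """Advance a 1-D pressure field by num_steps leapfrog updates.

    courant is c * dt / dx; the 1-D scheme is stable for courant <= 1.
    The field starts as a unit pressure pulse in source_cell.
    """
    lam2 = courant * courant
    prev = [0.0] * num_cells
    curr = [0.0] * num_cells
    curr[source_cell] = 1.0
    for _ in range(num_steps):
        nxt = [0.0] * num_cells
        for i in range(num_cells):
            # Rigid walls: mirror the neighbouring node across the boundary.
            left = curr[i - 1] if i > 0 else curr[i + 1]
            right = curr[i + 1] if i < num_cells - 1 else curr[i - 1]
            nxt[i] = (2.0 * curr[i] - prev[i]
                      + lam2 * (left - 2.0 * curr[i] + right))
        prev, curr = curr, nxt
    return curr


# Assumed example values: 0.1 m cells and c = 343 m/s give the time step below.
dx = 0.1
c = 343.0
dt = dx / c  # Courant number 1
field = simulate_1d_fdtd(num_cells=101, source_cell=50, num_steps=20)
print("time simulated in seconds:", 20 * dt)
```

With a Courant number of 1 the wavefront advances exactly one cell per time step, which gives a simple sanity check before extending the same update to two or three dimensions and to several sources.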
This paper presents ARIA (Application Rendering Immersive Audio). This system provides a means for the research community to easily test and integrate algorithms into a multichannel playback/recording system. ARIA uses a host-based architecture, meaning that programs can be developed and debugged in standard C++ without the need for expensive, specialized DSP programming and testing tools. ARIA allows developers to exploit the speed and low cost of modern CPUs, provides cross-platform portability, and simplifies the modification and sharing of code. This system is designed for real-time playback and processing, thus closing the gap between research testbed and delivery systems.
A Host-Based Real-Time Multichannel Immersive Sound Playback and Processing System
Orinos, Chris; Tsakiris, Vassilis "Bill"
In this paper we investigate the subwoofer concept in relation to the various benefits of digital equalization and the way it can be used together with today's small-sized multichannel loudspeaker systems. We try to systematize a reasonably objective method of comparing different subwoofer positions and crossover frequencies with regard to their optimum response in a listening area. The aim is to show that the subwoofer crossover frequency can be raised to 120 Hz, relieving the main loudspeakers of the task of reproducing frequencies down to 80 Hz. It is thus possible to create a high-end system using slim, line-array main loudspeakers, with all their known advantages, which can be correctly integrated, both aesthetically and acoustically, in any listening room.
Optimum Loudspeaker System with Subwoofer and Digital Equalization
One of the most relevant characteristics of a sound system is the maximum level at which it will function in its intended environment before the output becomes objectionably distorted. Due to design, construction, or thermal limitations, this characteristic can vary with both the frequency content and the duration of the applied stimulus at each measurement. Further complicating distortion measurement is the variation in frequency response caused by reflections in the environment. This paper describes an automated technique using shaped tone-bursts under software control to generate the stimuli, acquire the responses, process and correct the data for room response, and present a graphical representation of the peak sound level capability versus test frequency. Also described is a novel technique for separating noise and distortion energy from stimulus energy from an in-room measurement.
Determining the Peak Sound Level Capability of Loudspeakers and Sound Systems
The use of a plane wave tube (PWT) is standard practice for the testing of audio compression drivers, as the damped tube provides an acoustic impedance load for the driver that is similar to an infinitely long horn of the same throat diameter. When properly terminated, it is anechoic. Uses of plane wave tubes for compression driver testing:
Plane Wave Tubes - Uses and Limitations
The log-swept sine chirp provides a way to measure the transfer function and harmonic distortion of an audio device simultaneously. A deconvolution operation separates the linear and non-linear responses in time. Results on real audio equipment are compared to classical methods and found to agree. An extension for simultaneously measuring crosstalk is suggested.
Measurement of Audio Equipment with Log-Swept Sine Chirps
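The time separation of linear and non-linear responses mentioned in the abstract above follows from the exponential frequency law of the sweep. The sketch below uses the widely used exponential-sweep formulation, which may differ in detail from the paper's; the function names and the 20 Hz to 20 kHz, 10 s example are assumptions. It generates sweep samples and computes how far ahead of the linear impulse response each harmonic's response appears after deconvolution.

```python
import math


def rate_constant(f1, f2, duration):
    """Time L (seconds) over which the sweep frequency rises by a factor of e."""
    return duration / math.log(f2 / f1)


def instantaneous_frequency(t, f1, f2, duration):
    """Sweep frequency in Hz at time t: f1 * exp(t / L)."""
    return f1 * math.exp(t / rate_constant(f1, f2, duration))


def sweep_sample(t, f1, f2, duration):
    """Value of the log-swept sine at time t (phase is the integral of frequency)."""
    big_l = rate_constant(f1, f2, duration)
    return math.sin(2.0 * math.pi * f1 * big_l * (math.exp(t / big_l) - 1.0))


def harmonic_advance(order, f1, f2, duration):
    """Seconds by which the order-th harmonic response precedes the linear one."""
    return rate_constant(f1, f2, duration) * math.log(order)


# Assumed example: 20 Hz to 20 kHz in 10 s at a 48 kHz sample rate.
f1, f2, duration, fs = 20.0, 20000.0, 10.0, 48000
sweep = [sweep_sample(i / fs, f1, f2, duration) for i in range(int(duration * fs))]
for order in (2, 3, 4):
    print("harmonic", order, "leads by", harmonic_advance(order, f1, f2, duration), "s")
```

Because the order-N harmonic of the frequency swept at time t equals the fundamental swept at time t plus `harmonic_advance(N, ...)`, each distortion order lands in its own time window and can be gated out separately.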
After the inauguration of the expansion of the Danish National Gallery in 1998, a serious acoustic mishap was experienced in the new large exhibition rooms. These interconnected rooms of approx. 33,000 cubic metres (approx. 1.2 million cubic feet) were also intended to offer a multipurpose acoustical environment for a variety of cultural events. A record-breaking reverberation time of approx. 11 seconds was measured. The acoustic redesign process required not only finding acoustically effective solutions; these solutions also had to be invisible or near-invisible due to the architectural requirements. This paper describes how the requirements were met, resulting in a highly acceptable reverberation time of a little more than 2 seconds.
Acoustic Re-Design of the Danish National Gallery
STI or its derivatives (RaSTI and STIPa) have become the internationally accepted methods for acoustically measuring the potential intelligibility performance of a sound system. However, in practice, many of the measurements carried out in the field, to either verify or ascertain sound system and Voice Alarm intelligibility performance, are often based on flawed techniques. The paper examines a number of common problems found to affect measurement accuracy. The paper also highlights conditions under which STI and STIPa inherently appear to incorrectly predict intelligibility performance. In particular it is shown that the currently available commercial software programmes and instrumentation fail to correctly predict the performance of sound systems exhibiting irregular or band limited frequency responses when they are operating in reverberant environments under quiet (i.e. high signal to noise ratio) conditions.
Systematic & Common Errors in Sound System STI and Intelligibility Measurements
Given a multi-channel loudspeaker system, in a typical single or multiple listener setup, the combined response of the loudspeakers will exhibit significant fluctuation around the crossover region due to noncoincident positions of any two loudspeakers. This fluctuation manifests as an undesired broad spectral notch or a peak around the crossover region. The spectral notch, for example, introduced around the crossover due to complex addition of the two loudspeaker responses, generally, cannot be compensated with only magnitude response equalization. In this paper, we present a recipe for compensating the spectral notch around the crossover region by designing a digital equalization filter using a stable all-pass network.
Phase Equalization for Multi-Channel Loudspeaker-Room Responses
Cabrera, Densil; Willsallen, Scott
This study investigates subjective and objective parameters of a sound reinforcement system in a large sports stadium. The sound at fourteen groups of three receiving positions was studied in a subjective listening test, as well as through objective system measurements. For an orchestral music sample, seventeen system tunings were subjectively assessed with fifteen scales. Objective measurements were made at each receiving position. Results showed significant variation for many of the subjective scales between tunings, as well as between receiving positions. To some extent, subjective and objective measurements were related as they describe system tuning and receiving position. Beyond its specific results, this study highlights a range of difficulties in empirically assessing audio quality for music in a very large venue.
Assessment of Music Audio Quality in a Sports Stadium
Staffeldt, Henrik; Thompson, Ambrose
This paper focuses on the direct sound frequency response of line arrays, rectilinear or curved, at mid and high frequencies (1 kHz - 10 kHz), which is arguably the most important range and one that is relatively easy to measure. In this frequency range a line array may produce irregular on- and off-axis frequency responses in the audience area, which are difficult to predict using simpler models. The irregularities, which appear as frequency-varying attenuation, depend in a complicated way on array configuration and air absorption. Array performance prediction software usually models a line array as a number of directive point sources placed on a line or curve. The directive point source model has been used to simulate line arrays in order to study their frequency response behaviour at mid and high frequencies. The results of the study are compared with frequency response predictions calculated by new software including multi-channel array controller simulations and measured complex spherical polar data for a specific 3-way line array cabinet. The predictions are compared to direct sound frequency response measurements on line arrays using the same 3-way cabinet to show the degree of accuracy with which directive point source models can predict the frequency responses of line arrays.
Line Array Performance at Mid and High Frequencies
Ahnert, Wolfgang; Feistel, Stefan; Lentz, Tobias; Moldrzyk, Christoph; Weinzierl, Stefan
A desirable feature of modern acoustical simulation programs is the easy, fast and reliable auralization of prediction results. To be considered as a serious tool, the auralization results should be equivalent to human perception in reality. In this paper we consider a new auralization technique, based on a head-tracked headphone system with high spatial resolution and real-time convolution. We discuss the measurement of directional head-related transfer functions, the calculation of directional binaural impulse responses and the realisation as a real-time convolution software. A listening test was performed, comparing reality, measurement and prediction results for an example room.
Head-Tracked Auralization of Acoustical Simulation
Gover, Bradford N.
A spherical microphone array has been used to perform directional measurements of airborne sound transmission between rooms. With a source and array on opposite sides of a wall, omnidirectional impulse responses were measured to each of the array microphones. Beamforming resulted in a set of directional impulse responses, which were analyzed to find the distribution of arriving sound energy at the array position during various time ranges. Weak spots in the separating wall are indicated as directions of increased arriving sound energy. The system was able to identify minor defects in a test wall in between two reverberation chambers, and also to identify leaks in the wall of an actual meeting room.
Directional Measurement of Airborne Sound Transmission Paths Using a Spherical Microphone Array
Kashani, Reza; Wischmeyer, James
Bass traps, regardless of their effectiveness in abating bass acoustic coloration in a room, have two somewhat undesirable attributes: 1) large size and 2) lack of adaptability. An alternative to the use of bass traps, discussed in this paper, is incorporating a properly devised feedback control scheme into a powered subwoofer, making the subwoofer exhibit the same dynamics as a bass trap. This patent-pending active coloration control solution, which can be viewed as an 'electronic bass trap', adds acoustic damping to the low-frequency modes of a room. In addition to a powered subwoofer, the electronic bass trap uses a microphone and an op-amp circuit. Numerical and experimental results indicate the effectiveness of the electronic bass trap in adding acoustic damping to the low-frequency standing wave(s) in a room.
Electronic Bass Trap
This paper investigates what has happened to the many transistors used in digital audio engineering. Improvements in signal quality have certainly accrued, but advances in ease of use, reliability, availability, serviceability, power consumption, delay and cost have, from a practical standpoint, far outweighed advances in audio quality. Many examples are given, a novel signal visualization technique is described and, based upon this history, some predictions are made.
Moore’s Law and Digital Audio. What Have We Done with All the Transistors?
Lipshitz, Stanley P.; Vanderkooy, John
Twenty-five years after the discovery of the desirable attributes of nonsubtractive triangular probability density function dither for signal quantization, some misunderstandings, myths, and half-truths still abound regarding what dither does and does not do. The increased use of dithered sigma-delta modulators has recently brought some of these questions to the fore. Some of these errors are relatively easy to explain and correct, while others are considerably more subtle in nature, but nevertheless also need to be addressed. This paper attempts to explain and clarify these matters, with the aid of copious time-domain, frequency-domain, and statistical-domain illustrations. It assumes that the reader already has a good knowledge of the theory of dithered quantization.
Dither Myths and Facts
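The basic property of nonsubtractive triangular-PDF (TPDF) dither referred to in the abstract above can be checked numerically. The following is a minimal sketch, not taken from the paper: the quantizer step of 1 LSB, the DC input of 0.3 LSB, the sample count, and all function names are assumptions. It compares the mean and the power of the total error with and without dither.

```python
import math
import random


def quantize(v):
    """Mid-tread quantizer with a step of 1 LSB."""
    return math.floor(v + 0.5)


def tpdf_dither(rng):
    """TPDF dither, 2 LSB peak to peak: sum of two independent 1 LSB uniform values."""
    return (rng.random() - 0.5) + (rng.random() - 0.5)


def error_stats(dc_input, num_samples, dithered, seed=0):
    """Return (mean, mean square) of the total error for a constant input."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(num_samples):
        d = tpdf_dither(rng) if dithered else 0.0
        err = quantize(dc_input + d) - dc_input
        total += err
        total_sq += err * err
    return total / num_samples, total_sq / num_samples


plain_mean, plain_power = error_stats(0.3, 200000, dithered=False)
dith_mean, dith_power = error_stats(0.3, 200000, dithered=True)
print("undithered mean error and power:", plain_mean, plain_power)
print("TPDF-dithered mean error and power:", dith_mean, dith_power)
```

Without dither the error is a fixed offset that depends on the input. With TPDF dither the mean error goes to zero and the error power settles near one quarter of a squared LSB regardless of the input, at the cost of a higher noise floor.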
Janssen, Erwin; Reefman, Derk; Reiss, Joshua D.; Sandler, Mark
The authors have recently developed a framework for analysis of limit cycle behavior in feedforward sigma delta modulators (SDMs). However, the dynamics of feedback SDMs appear to be completely different. Here, we extend that framework to include limit cycles in feedback SDMs. We prove that for DC inputs, periodic output implies state space periodicity. An outcome of this is that for an Nth order SDM, at least N-1 initial conditions must be fixed in order to have limit cycle behaviour. We present expressions for the minimum disturbance of the input or initial conditions that is needed to break up a limit cycle. These expressions are notably different from the analogous expressions for feedforward SDMs. We show that dithering the quantiser is a sub-optimal approach to removing limit cycles, and limit cycle stability is determined. Examples are provided that illustrate the theoretical results, and these results are also compared with those found for feedforward SDM designs. It is shown that, with respect to limit cycle behaviour, it makes little difference whether feedforward or feedback designs are used.
Description of Limit Cycles in Feedback Sigma Delta Modulators
Angus, James A. S.
Look-ahead Sigma-Delta modulators look forward k samples before deciding to output a "one" or a "zero". The Viterbi algorithm is then used to search the trellis of the exponential number of possibilities that such a procedure generates. This paper describes alternative tree-based algorithms. Tree-based algorithms are simpler to implement because they do not require backtracking to determine the correct output value. They can also be made more efficient using "Stack" algorithms. Both the tree algorithm and the more computationally efficient "Stack" algorithms are described, and implementations of both are presented in some detail, in particular the appropriate data structures for both the trial filters and score memories.
Implementation of "Tree" and "Stack" Algorithms for Look-Ahead Sigma Delta Modulators
Macbeth, Ian; Roberts, Tegid
The control and real-time software programmability of low frequency audio signals in the analog domain is inaccurate, cumbersome and expensive. In the digital domain there are issues relating to low frequency distortions, latency and design time. A fully programmable analog array IC methodology is presented which combines the benefits of DSP programmability with analog signal processing by way of a case study demonstrating software design tools and custom software configuration models. This single chip Subwoofer Conditioner solution implements a sub-sonic filter, adjustable audio compressor, Linkwitz transform filter and low pass output filter with full software control. Performance measurements of this implementation as well as further enhancements to the software models are also discussed.
Novel Subwoofer Signal Conditioner Design Using a Field Programmable Analog Array and Software Tools
Kassakian, Peter; Wessel, David
The synthesis and rotational control of radiation patterns produced by spherical arrays of loudspeakers is studied. We identify operating regions, in terms of complexity of patterns and frequency ranges, over which patterns can be accurately synthesized. By considering an inner product space of far-field patterns, we can reason geometrically about approximation errors when using the systems to synthesize and control target responses. Bounds for normalized error across subspaces, in particular subspaces corresponding to the control operation of rotation, are calculated using singular value decomposition. The bounds can be interpreted as the best and worst case errors encountered when dynamically steering the patterns.
Characterization of Spherical Loudspeaker Arrays
Keele, Jr., D.B. (Don)
This paper describes a class of FIR filter/convolvers based on interpolation that allow sparse specification of the filter's impulse-response waveform or, equivalently, its frequency spectrum in both linear- and log-spaced domains. Interpolation allows the filter's impulse response or frequency response to be specified in significantly fewer samples. This in turn means that far fewer filter taps are required. Linear- and log-sampled interpolating filter/convolvers can further be categorized into two types: Type 1, interpolation in time, and Type 2, interpolation in frequency. Type 1 provides direct specification of the filter's impulse response in linear or log time, while Type 2 allows direct specification of the complex (real-imaginary) frequency response of the filter in linear or log frequency. Each form of filter vastly reduces the number of filter taps but greatly increases the processing complexity at each tap. Efficient implementations of the log-spaced filter/convolvers are presented which use multiple asynchronous sample-rate converters. This paper is a continuation of the author's "Log Sampling" paper presented to the AES in Nov. 1994. This paper represents work in progress with a conceptual description of the convolution technique with minimal mathematical development.
Interpolating Linear- and Log-Sampled Convolution
Hicks, Daniel A.; Letowski, Tomasz; Rao, Mohan D.
The Callsign Acquisition Test (CAT) is a new speech recognition test developed by the U.S. Army to examine speech intelligibility in a military environment. This study compared speech intelligibility results of the Callsign Acquisition Test with another test used widely in industrial applications, the Modified Rhyme Test, using listening tests and objective speech metrics. A group of 24 listeners between the ages of 18 and 25 participated in the study. Six different types of recorded background noises radiating from an armored personnel carrier; helicopter; jet engine; mid-size car; subway train; and standard pink noise were used in the study. Test results demonstrated that the differences in the mean speech recognition scores obtained for CAT and MRT across all selected background noises were not statistically significant. However, the effect of noise and interactions between the noise and the test were statistically significant. A correlation of the measured scores with the spectral content of the background noise revealed somewhat higher scores for MRT compared with CAT under selected background noises that have most of the frequency content above 500 Hz. In contrast, slightly higher scores for CAT were noticed for selected noises having predominantly low frequency components below 250 Hz.
A Comparison of Speech Intelligibility Results Between the Callsign Acquisition Test and the Modified Rhyme Test
Dobrucki, Andrzej; Kozlowski, Piotr
This paper presents further results from continuing research on objective methods that use psychoacoustic knowledge to estimate the quality of audio signals. Software written especially for this research is presented. The program allows the implementation of different published methods for evaluating the quality of perceptually coded audio signals. The PAQM, PSQM, NMR, PEAQ and PESQ protocols have been implemented. All of these algorithms simulate the auditory system. The software is open to the addition of further protocols as plug-ins, and protocols published earlier can be changed and improved. In previous work the authors proposed ways to improve objective protocols, e.g. by changing the pitch scale. A suggested adjustment of the internal signal processing parameters, which improves the results of objective evaluation, is presented. The optimization criterion is the difference between the results of subjective and objective evaluation.
Adjustment of the Parameters Proposed for the Objective, Perceptual Based Evaluation Methods of Compressed Speech and Audio Signals
Czyzewski, Andrzej; Kostek, Bozena; Lorens, Artur; Walkowiak, Adam
Measurement of Spread of Excitation (SoE) provides a potential method of assessing cochlear implant users' benefit. To provide maximum benefit for cochlear implant users, the speech processor should be fitted to the patient's needs. One objective method that could deliver important information for fitting is Neural Response Telemetry (NRT). This method helps to estimate the amplitude of electrical current that is required to elicit a hearing sensation via the cochlear implant. Based on NRT results, it is also possible to determine the Spread of Excitation, the longitudinal spread of electrically evoked neural excitation in the cochlea. The parameters of the Spread of Excitation in the individual patient may help to explain the patient's performance and indicate how sound processing strategies could be modified to improve that patient's benefit. In this work measured profiles of SoE are shown and some preliminary analyses are presented.
New Techniques Assisting Cochlear Implants Fitting
To validate a head-tracking system against a loudspeaker arrangement with regard to the perception of auditory source width (ASW), i.e. the horizontal extension of the auditory event(s), experiments were conducted using narrowband and broadband noise stimuli with chosen degrees of correlation. The excitation signals were presented by (i) pairs of loudspeakers and (ii) headphones with head-tracking in an anechoic environment. The evaluation by five trained subjects showed good correspondence between the outcomes of the loudspeaker and the head-tracking experiments, i.e. the head-tracking system had a negligible influence on spatial perception.
Influence of Head-Tracking on Spatial Perception
Kuwata, Satoshi; Nakayama, Yasushige; Watanabe, Kaoru
We describe a method of controlling three-dimensional (3-D) sound images in which the level of intensity of the sound images is controlled by arranging them near or at a distance from the listener ("control" means amplitude panning for distance). The images were created using two loudspeakers arranged near or at a distance from the listener, or a loudspeaker array. A subjective evaluation was carried out to examine the perceptual distance of octave bands and white noise by changing the direct-to-reverberant energy ratio. It was found that the distance produced by the direct-to-reverberant energy ratio was frequency-dependent, and that the distance for frequencies above 5,660 Hz was not significantly different when the ratio was changed. A 3-D audio coding method was developed based on these results. An experiment using the coding tool showed that the bit-rate efficiency of the method, which combines frequency components above 5,660 Hz, was more than 30% higher than that of dual-mono transform coding, with no degradation in 3-D audio reproduction.
Frequency Dependence of Perceptual Sound Image Distance Using Direct-To-Reverberant Energy Ratio Control Method
Jang, Seongcheol; Kim, Sunmin; Lee, Joonhyun; Park, Sangil
This paper presents a wide stereo algorithm which widens the stereo sound stage for a two-channel loudspeaker layout. The design method consists of a binaural synthesis, a crosstalk canceller and a direct filter. This creates multiple virtual loudspeakers and allows them to spread out in the front. Consequently, the proposed algorithm, which includes the widening of the sound stage and the timbre preservation, is designed in the form of a 2 by 2 filter matrix. The filter order is minimized for easy implementation while maintaining the performance.
Virtual Sound Algorithm for Wide Stereo Sound Stage
Boueri, Maurice; Kyriakakis, Chris
This paper presents a new method for decorrelating audio signals by applying a random time shift to each critical band. The resulting signals exhibit a significantly lower inter-aural cross-correlation. The effects of this type of decorrelation on perceived envelopment and loss of phantom image will be presented.
Audio Signal Decorrelation Based on a Critical Band Approach
Brookes, Tim; Kassier, Rafael; Rumsey, Francis
Despite recent consumer uptake of surround sound systems and the existence of a number of studies into spatial audio attributes, there is currently no system to train listeners in the detection and discrimination of the spatial attributes of reproduced sound. Timbral ear training has been shown to increase response consistency in listening tests, but a number of obstacles must be negotiated in order to successfully implement an ear training system for spatial aspects of sound reproduction. The first of these is the determination of which spatial audio attributes are appropriate. This paper describes the formulation of a new spatial audio paradigm, by testing previously documented attributes against specific selection criteria, and including, modifying or rejecting them accordingly.
A Simplified Scene-Based Paradigm for Use in Spatial Audio Listener Training Applications
Lesso, Paul; Travis, Chris
The question of sample-clock quality is a perennial one for digital audio equipment designers. Yet most chip makers provide very little information about the jitter performance of their products. Consequently, equipment designers sometimes get burnt by jitter issues. The increasing use of packet-based communications and class-D amplification will throw these matters into sharp relief. This paper reviews various ways of characterizing and quantifying jitter, and refines several of them for audio purposes. It also attempts to present a common, unambiguous terminology. The focus includes wideband jitter, baseband jitter, jitter spectra, period jitter, long-term jitter and jitter signatures. Comments are made on jitter transfer through phase-locked loops and on the jitter susceptibility of audio converters.
Specifying the Jitter Performance of Audio Components
To audio systems designers, the "fully differential op amp" is a relatively new entry. Two discrete-circuit variations on the theme are presented, one of which provides effectively floating outputs.
High-Performance Discrete Building Blocks for Balanced Audio Signal Processing
Immersive audio for interactive gaming is necessarily processed and mixed in real time as it is being rendered on the game audio playback platform. It is generally assumed that music and movie soundtracks require no comparable processing during playback because listeners typically provide no real-time input that might affect the final rendering. In reality, pre-packaged audio is being delivered to music and movie playback platforms in increasingly diverse forms. The result is that mismatches between the spatial audio format, bit depth, and frequency range of the content and of the playback system pose an emerging problem for which sophisticated playback processing may be an appropriate response. This paper presents a formal statement of the mismatch problem and proposes a unified solution using frequency-domain processing to perform "partial unmixing" of the pre-packaged content. Lastly, we show how this can enable a new music/movie listening experience rooted in the concept of "personalized audio."
Partial Unmixing for Personalized Audio
Angus, James A. S.
This paper presents a new method of applying high levels of dither to Sigma-Delta Modulators. In particular, it clarifies the position of the overload point in one-bit Sigma-Delta Modulation systems and presents several overload control methods with comparisons of their efficacy. It then goes on to examine the problem of applying dither to one-bit systems and describes a new approach for applying high levels of dither. It also examines the effect of different dither probability density distributions and shows that simple bi-level dither can be effective at lower levels than other probability density distributions. It presents results showing that dither can be applied at a high enough level to be effective in one-bit Sigma-Delta Modulation systems.
A New Method of Applying High Levels of Dither to Delta-Sigma Modulators
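As a point of reference for the abstract above, the following is a minimal sketch of where dither generally enters a one-bit modulator: it is added at the quantizer input. The first-order loop, the function name `sigma_delta_1bit` and the parameter `dither_amp` are illustrative assumptions; this is a textbook structure and not the overload-control or high-level dither method the paper proposes.

```python
import random

def sigma_delta_1bit(x, dither_amp=0.0, seed=0):
    """First-order one-bit sigma-delta modulator (textbook form, assumed here).

    Bi-level dither of amplitude +/- dither_amp is added at the quantizer
    input only; the integrator itself sees the undithered error signal.
    """
    rng = random.Random(seed)
    integrator = 0.0
    previous_bit = 0.0
    bits = []
    for sample in x:
        # Accumulate the difference between the input and the fed-back output.
        integrator += sample - previous_bit
        # Bi-level dither: one of two equally likely values.
        dither = dither_amp * rng.choice((-1.0, 1.0))
        # One-bit quantizer acting on integrator state plus dither.
        previous_bit = 1.0 if integrator + dither >= 0.0 else -1.0
        bits.append(previous_bit)
    return bits

# A DC input of 0.25 should be recovered as the average of the bit stream.
demo_bits = sigma_delta_1bit([0.25] * 10000, dither_amp=0.5)
demo_mean = sum(demo_bits) / len(demo_bits)
```

Because the integrator state stays bounded, the long-term average of the output bits tracks the input even with the dither present.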
Hawksford, Malcolm J.
DSD is a 1-bit coding scheme based upon sigma-delta modulation. In the commercial realization of this technology exploiting DVD optical disc storage, six discrete channels are accommodated each with a constant bit rate of 2.8224 Mb/s, a specification that cannot be changed within the context of the SACD release format. However, a method of embedding additional data in the DSD bitstream is shown to be feasible with the aim of increasing the number of channels to twelve. The technique retains full compatibility with SACD and only requires modest processing to decode an additional six channels.
Scaleable Multi-Channel DSD Coding
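The per-channel bit rate quoted in the abstract follows from one-bit sampling at 64 times the 44.1 kHz CD rate. The short sketch below reproduces that arithmetic; the constant and function names are illustrative and say nothing about how the paper embeds the additional channels.

```python
CD_SAMPLE_RATE_HZ = 44100   # CD sampling rate
DSD_OVERSAMPLING = 64       # DSD samples 64 times faster than CD
BITS_PER_SAMPLE = 1         # one-bit sigma-delta stream

def dsd_channel_bit_rate():
    """Bit rate of one DSD channel in bits per second."""
    return CD_SAMPLE_RATE_HZ * DSD_OVERSAMPLING * BITS_PER_SAMPLE

def dsd_total_bit_rate(channels):
    """Aggregate raw bit rate for a given number of discrete DSD channels."""
    return channels * dsd_channel_bit_rate()

per_channel = dsd_channel_bit_rate()   # 2,822,400 bit/s = 2.8224 Mb/s
six_channels = dsd_total_bit_rate(6)   # 16,934,400 bit/s
```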
Ando, Akio; Hamasaki, Kimio; Nisiguchi, Toshiyuki; Ono, Kazuho
Subjective evaluation tests on perceptual discrimination between musical sounds with and without very high frequency (above 20 kHz) components have been conducted. To make a precise evaluation, the test system is designed to exclude any influence from very high frequency components in the audible frequency range. Moreover, various sound stimuli are originally recorded by a newly developed very wide frequency range microphone, in order to contain enough components in very high frequency range. Tests showed that some subjects might be able to discriminate between musical sounds with and without very high frequency components. This paper describes these subjective evaluations, and discusses the possibility of such discrimination as well as the high resolution audio recording of music.
Perceptual Discrimination of Very High Frequency Components in Musical Sound Recorded with a Newly Developed Wide Frequency Range Microphone
Bergman, Devon A.; Scordilis, Michael
A common problem in acoustical echo cancellation is the continuous change in the room's impulse response. These changes need to be monitored and adapted to in order to create an enjoyable echo-free environment. In the experiments described, measurements were taken to find an optimal FIR filter that would allow fast convergence of the adaptive filter as well as a significant echo return loss. Finally, consideration was also given to creating a smooth adaptation from one configuration to the next.
High-Ordered Adaptive FIR Filters for Acoustical Echo Cancellation
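The abstract does not name the adaptation rule used. As an illustration only, the sketch below uses the normalized LMS (NLMS) update, a common choice for adaptive FIR echo cancellers. The function name, the step size `mu` and the three-tap "room" are assumptions made for the example.

```python
import random

def nlms_echo_canceller(far_end, mic, num_taps, mu=0.5, eps=1e-8):
    """Adapt an FIR filter so that its output matches the echo in `mic`.

    far_end: loudspeaker (reference) signal
    mic:     microphone signal containing the echo of far_end
    Returns the final tap weights and the residual (echo-cancelled) signal.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps          # most recent reference samples
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        # Normalizing by the input power keeps the step size signal-independent.
        power = sum(h * h for h in history) + eps
        gain = mu * error / power
        weights = [w + gain * h for w, h in zip(weights, history)]
        residual.append(error)
    return weights, residual

# Hypothetical short echo path and a white-noise far-end signal.
random.seed(0)
room = [0.6, -0.3, 0.1]
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [sum(room[k] * far[n - k] for k in range(len(room)) if n - k >= 0)
       for n in range(len(far))]
taps, residual = nlms_echo_canceller(far, mic, num_taps=len(room))
```

In this noise-free toy case the taps converge to the echo path and the residual decays toward zero. A real room response needs thousands of taps, which is where convergence speed becomes the limiting factor.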
Class D amplifiers are used for their high efficiency, but they have some undesirable characteristics, one of these being the residual switching frequency ripple. This paper shows a method of switching frequency ripple reduction by means of ripple steering. With this technique a second output is constructed, into which the switching ripple is steered, substantially relieving the main output from a major artifact of Class D operation.
Class D Amplifier with Zero Switching Ripple
Implementing hardware designs in Field Programmable Gate Arrays is a formidable and interesting task, especially for digital signal processing applications. Hardware design skills and a strong background in signal processing are required. Problems sometimes arise in realizing a hardware implementation even for simple systems whose theoretical concept is plausible; care must be taken to account for minute design details. The objective of this paper is to present the design of a digital audio signal processor which performs multi-effect processing and at the same time is capable of real-time configurability on a single FPGA chip. The design is specific to certain algorithmic tasks; there is no need for a general-purpose architecture, and it can be characterized as a system-on-chip application. It is configurable, can change coefficients using look-up tables, and is capable of performing filtering and echo/delay generation.
FPGA Implementation of an Audio Processor
Boudreaux, Randy; Gaboriau, Johann; Hagge, Mel; Melanson, John; Zhang, Lingli
This paper presents a purely digital real-time power supply compensation scheme for both single-ended and bridge-tied-load configured noise-shaped class D amplifiers. Using the appropriate power supply measurement circuitry, the scaled AC and DC components of the power supply voltage rail(s) are fed back into the PWM controller to modify the feedback path and the direct path of the noise shaper. All delays through the feedback loop have been minimized such that the ripple cancellation of the output stage is accomplished in real time. A two-chip ADC/PWM controller with this compensation scheme achieves 40 dB power supply rejection of a 60 Hz ripple and 100 dB system dynamic range.
Real-Time Power Supply Compensation for Noise-Shaped Class D Amplifier
Hawksford, Malcolm J.; Prime, Francis M.
A digital power amplifier topology is proposed optimized specifically for use with DSD-type data streams. The configuration enables direct interfacing of DSD data with no requirement for intermediate signal processing or analogue-to-digital conversion. The output architecture exploits a classic H-bridge configuration and uses a novel form of ac data coupling to simplify internal interface circuitry. Wide range gain control is enabled through modulation of the output-stage power supply voltage that also improves power efficiency at low gain settings. Consideration is given to finite pulse rise time and a modified DSD data format is investigated.
Digital Audio Power Amplifier for DSD Data Streams
Anderegg, Rolf; Felber, Norbert; Fichtner, Wolfgang; Franke, Ulrich
Audio signal processing often requires modeling of large rooms (e.g. churches) with impulse responses of several seconds duration. Direct convolution of the sound stream with such long responses exceeds the capacity of common signal processors by far. Using the Fast Fourier Transform instead reduces the number of operations logarithmically, but introduces unacceptable latency. Segmenting the processing into initial short blocks and subsequent longer ones lets one trade latency vs. computation power as presented in previous AES papers. Hardware-wise the reduction of operations comes at the cost of large storage with high memory bandwidths. Dedicated application specific integrated circuits (ASIC) are predestined to perform the rather regular processing, freeing the processors for other tasks. This paper shows suitable architectures for integration on silicon of optimized fast-convolution algorithms. Possible optimizations for fast-convolution algorithms are examined. Based on these findings different architectures for integration on ASIC/FPGA (Field Programmable Gate Array) of such algorithms are developed, analysed and compared. The paper is concluded by presenting an exemplary ASIC implementation.
Implementation of High-Order Convolution Algorithms with Low Latency on Silicon Chips
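The segmentation idea in the abstract rests on the fact that an impulse response can be split into blocks, each block convolved separately, and the partial results summed at the matching delays. The sketch below checks that identity. For brevity each block is convolved directly here, whereas a practical implementation would use FFTs per block. The function names and the uniform block size are assumptions made for the example.

```python
def convolve(x, h):
    """Direct (time-domain) linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def partitioned_convolve(x, h, block):
    """Convolve x with h by splitting h into segments of `block` taps.

    Each segment's partial result is added into the output at an offset
    equal to the segment's position in h. Only the first segment has to be
    available immediately; later segments contribute to later output
    samples, which is what lets latency be traded against computation.
    """
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(h), block):
        segment = h[start:start + block]
        partial = convolve(x, segment)
        for n, value in enumerate(partial):
            y[start + n] += value
    return y

signal = [1.0, 2.0, 3.0, 4.0, 5.0, -1.0]
impulse = [1.0, 0.0, -1.0, 2.0, 0.5, 3.0, 1.0]
direct = convolve(signal, impulse)
blocked = partitioned_convolve(signal, impulse, block=3)
```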
Yoo, Chul-Jae; Kim, Hyung-Myung
The Modified Discrete Cosine Transform (MDCT) filter bank, often called the Time Domain Aliasing Cancellation (TDAC) filter bank, is widely used in audio coding systems. The last step of the conventional MDCT filter bank is the dual overlap-add procedure that restores the uncompressed original signal. This last step can be generalized using a multiple overlap-add procedure, in which the input and output block size can be reduced as the number of overlapped windows increases. The MDCT system with multiple overlap-add exhibits scalability in proportion to the number of overlapped windows when it is used together with fixed-bit adaptive quantization capable of maintaining nearly the same SNR irrespective of the input level. It is shown that the proposed structure is scalable in block units at the same data rate as the conventional system and that it gives a slight SNR improvement over the conventional one.
Scalability in the Modified Discrete Cosine Transform Filter Bank
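For readers unfamiliar with the conventional dual overlap-add step mentioned in the abstract, the sketch below shows it reconstructing a signal from windowed MDCT blocks. It uses a sine window, which satisfies the Princen-Bradley condition, and a 2/N scaling on the inverse transform. The function names are illustrative, and the multiple overlap-add generalization proposed in the paper is not shown.

```python
import math

def mdct(block):
    """Forward MDCT: 2N windowed samples in, N coefficients out."""
    n = len(block) // 2
    return [sum(block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(2 * n))
            for k in range(n)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients in, 2N time-aliased samples out."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for i in range(2 * n)]

def tdac_roundtrip(x, n):
    """Analyse and resynthesise x with hop size n (len(x) a multiple of n)."""
    # Sine window: w[i]^2 + w[i + n]^2 = 1, so overlapping halves sum to unity.
    window = [math.sin(math.pi * (i + 0.5) / (2 * n)) for i in range(2 * n)]
    padded = [0.0] * n + list(x) + [0.0] * n
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        block = [padded[start + i] * window[i] for i in range(2 * n)]
        aliased = imdct(mdct(block))
        # Dual overlap-add: each output sample receives two windowed blocks,
        # and the time-domain aliasing of one cancels that of the other.
        for i in range(2 * n):
            out[start + i] += aliased[i] * window[i]
    return out[n:n + len(x)]

original = [math.sin(0.3 * i) + 0.1 * i for i in range(16)]
restored = tdac_roundtrip(original, n=4)
```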
Azzali, Andrea; Boreanaz, Giovanni; Farina, Angelo; Irato, Giorgio; Rovai, Guido
A measurable index ("IQSB") quantifying the perceived quality of car stereos has been developed to forecast aural appreciation. Results of panel interviews and listening tests (in a special "auralisation room") have been correlated with the analysis of corresponding binaural recordings. Two outputs were obtained. First, a model of the subjectively most relevant features was identified in terms of statistically significant "verbal descriptors". Second, a single-figure index was constructed as a function of objectively measurable quantities related to audio performance; it correlates well with the average verbal evaluation (of both "naïve" and "expert" listeners). This tool is of great importance for the automotive industry because it allows direct quantification of audio system performance, a significant part of the perceived quality of the product.
Construction of a Car Stereo Audio Quality Index
Bozzoli, Fabio; Farina, Angelo
One of the most widely used intelligibility parameters is the Speech Transmission Index (STI); the techniques for determining it employ an artificial speaker and listener. Inside cars, where the signal-to-noise ratio is particularly low, the value of STI is mainly influenced by this ratio, and determining the sound power of real speakers is the only way to drive the artificial mouth correctly. We have implemented a technique based on a throat-activated microphone that is able to find the level of a real speaker's voice inside noisy spaces under actual conditions. In particular, we have studied speech inside cars and found that the value defined by the typical configuration may be very different from the real one; in this way we have been able to produce more reliable excitation signals. Using this "raised" signal we have tested one car and tried to find a good correlation between drivers' impressions and objective values.
Measurement of Speech Transmission Index Inside Cars Using Throat-Activated Microphone and Analysis of Its Correlation with Drivers’ Impression
Bellini, Alberto; Cavatorta, Matteo; Franceschini, Giovanni; Lorenzani, Emilio; Violi, Francesco
This paper presents an original DC/DC step-up converter topology for high power car audio applications with battery supply. A prototype was realized and tested as the power supply of a TANDEM subwoofer box. Audio signals are characterized by high dynamic variations, so they can be amplified only by relying on power supplies with high dynamic capabilities and low output ripple. The latter constraint can be met only by using closed-loop switch-mode DC/DC converters at high switching frequency. In doing so, converter efficiency is reduced and EMC problems arise. In summary, power supply efficiency and supply voltage quality are key features of the converter design. In this paper the above-mentioned issues are tackled with an open-loop topology. The original solution is the adoption of a three-phase transformer within a full-bridge converter topology. The proposed architecture is referred to as the 3boost power supply. Experimental results confirm that the 3boost power supply topology achieves higher efficiency, a lower ripple factor, and a unitary transformer utilization factor. It turns out to be an efficient power supply for a car audio subwoofer system, specifically for a digital output stage, where the quality of the supply level is a key element. The proposed architecture is patent pending.
High Power Step-Up Converter for Car Subwoofers
S., Krishna Kumar; Sreenivas, Thippur V.
Audio watermarks are often made signal-dependent to keep them imperceptible. A blind watermark detector, which does not have access to the original unwatermarked signal, seems handicapped because an approximate watermark has to be re-derived from the watermarked signal. Referring to the scenario in which the exact watermark is known as semi-blind detection, some reduction in performance is anticipated in blind detection relative to semi-blind detection. The present work is an experimental investigation into this issue, explored around a typical correlation-based audio watermark detection scheme. It is found, surprisingly, that the statistical performance of the blind detector is better than that of the semi-blind detector. It is found that the re-derived watermark is better correlated with the host signal and hence leads to better detection performance. It is confirmed that this happens only if the embedded watermark is the same as the examined watermark.
Increased Correlation in Blind Audio Watermark Detection - a Blessing in Disguise?
Dalka, Piotr; Dziubinski, Marek; Kostek, Bozena
In this paper several algorithms developed for musical sound separation are presented. The proposed techniques for the decomposition of mixed sounds are based on the assumption that the pitch of the sounds contained in the mix is known, i.e. the inputs of the algorithms are pitch tracks of the signals contained in the mixture. The estimation of the phase and amplitude contours representing harmonic components is based on a limited number of inner product operations, performed on the signal using complex exponentials that match the pitch characteristics of the separated signals, and not on discrete spectral representations calculated via the DFT. Examples of separation results are presented and the performance of each algorithm is analyzed. The effectiveness of the separation algorithms is assessed by calculating feature vectors (FVs) derived from the musical sounds after separation and then feeding them to a Neural Network (NN) for automatic musical sound identification. The experimental results are shown and discussed. A comparison of the effectiveness of all presented algorithms is also included, and conclusions are drawn.
Comparison of Effectiveness of Musical Sound Separation Algorithms Employing Neural Networks
Lavoie, Michel C.; Norcross, Scott G.; Soulodre, Gilbert A.
Previous studies have shown that certain inverse filtering methods introduce audible artifacts that can degrade the audio signal. To correct some of these artifacts various techniques such as regularization, smoothing and increasing the length of the inverse filter have been proposed. While these methods help in some cases they may also produce other artifacts or distortions that degrade the audio quality. In the present study formal subjective tests were conducted to systematically investigate modeled distortions similar to those found in inverse filtering. Parameters of the distortions, such as spectral shape, length and time profile were varied for the subjective tests. The results of the tests can be used to better understand the audibility of these artifacts and to create a perceptual model that can be used to design subjectively improved inverse filters.
Distortion Audibility in Inverse Filtering
Coyle, Eugene; Gainza, Mikel; Lawlor, Bob
A technique for separating harmonic sound sources using FIR comb filters is presented. First, a pre-processing task is performed by a multipitch estimator to detect the pitches that the signal is composed of. Then, a method based on the Short Time Fourier Transform (STFT) is utilized to iteratively extract the harmonics belonging to a given source by using FIR comb filters. The presented approach improves upon existing sinusoidal model approaches in terms of the perceptual quality of the extracted signal.
Harmonic Sound Source Separation Using FIR Comb Filters
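The property that makes FIR comb filters useful here is that a delay of one pitch period aligns every harmonic of that pitch with itself. Adding the delayed signal keeps the harmonic series, and subtracting it cancels the series. The sketch below illustrates this in the time domain. It is an assumption-laden simplification: the paper works iteratively on STFT frames after a multipitch estimator, which is not reproduced, and the helper names are hypothetical.

```python
import math

def fir_comb(x, delay, sign):
    """y[n] = (x[n] + sign * x[n - delay]) / 2, treating x[n] as 0 for n < 0.

    sign = +1.0 passes components whose period divides `delay` samples;
    sign = -1.0 places nulls on exactly those components.
    """
    return [0.5 * (x[n] + sign * (x[n - delay] if n >= delay else 0.0))
            for n in range(len(x))]

def harmonic_tone(f0, fs, n_samples, n_harmonics=3):
    """Sum of the first few harmonics of f0 with 1/h amplitudes."""
    return [sum(math.sin(2.0 * math.pi * h * f0 * n / fs) / h
                for h in range(1, n_harmonics + 1))
            for n in range(n_samples)]

FS = 8000
PERIOD = 40                                 # samples per period of 200 Hz at 8 kHz
tone = harmonic_tone(200.0, FS, 400)
kept = fir_comb(tone, PERIOD, +1.0)         # harmonic series passes unchanged
removed = fir_comb(tone, PERIOD, -1.0)      # harmonic series is cancelled
```

After the first `PERIOD` samples, `kept` matches the input and `removed` is numerically zero. In a mixture, the subtracting comb would remove one source while also colouring the others, which is one reason practical systems refine the result iteratively.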
Maher, Robert C.
The AES Technical Committee on Signal Processing is developing a compact disc with educational material and demonstrations intended for students, educators, and working digital audio engineers. The material includes examples of quantization and dither, basic psychoacoustics, and practical DSP. The multi-mode CD will have both audio tracks and a CD-ROM section with a web-browser interface. The CD will be produced for sale by the AES Publications office.
AES Technical Committee on Signal Processing Educational CD Project
Iyer, Subu; Jouppi, Norman P.
We have developed a system we call Telescopic Spatial Radio (TSR). This system transforms monaural transmissions from geographically distributed speakers into a spatial audio presentation using binaural techniques which preserve the actual physical angles between participants. TSR instantly augments the user’s situational awareness with the headings of the speaking users. The system leverages orientation measuring, location tracking, and signal processing capabilities that are rapidly decreasing in cost. TSR has many potential applications ranging from emergency and aviation communication to a richer consumer experience. We have developed a prototype system using laptop computers, GPS, and electronic compasses. The system allows users to select HRTFs from a library, and operates over a computer network.
Telescopic Spatial Radio
Behler, Gottfried; Lentz, Tobias
To create a Virtual Reality environment with true immersion, a precise spatial audio reproduction system is required. Since the placement of the large loudspeaker arrays needed for wave field synthesis systems may be impossible in some environments, alternative solutions must be found. One application of this kind is a multi-screen VR system in which the stereoscopic video images envelop the user. In such a case the presented binaural approach has many advantages. This paper describes virtual sound source imaging by binaural synthesis and reproduction over loudspeakers with a dynamic (tracked) cross-talk cancellation system which needs only three to four loudspeakers to cover all listening positions.
Dynamic Cross-Talk Cancellation for Binaural Synthesis in Virtual Reality Environments
Cohen, Michael; Sasaki, Masahiro
We describe a unique MIDI-based spatial sound system featuring a network-driven bank of four RSS-10s (Roland Sound Space Processors) driving an eight-transducer circumferential speaker array in a "3D Theater", enabling a three-dimensional dynamic musical space. Sound sources can be choreographed by adding dynamic positional gestures to standard MIDI files. Our sequencing system interprets such files, partitioning their data into two streams: one for MIDI tonal events, sent to synthesizers, and the other for positional data, sent simultaneously to sound spatializers. The spatializers are clients in a multimodal system for musical control of sound spatialization, synchronizable with stereographic 3D content, which spatializes music to give an audience a real sense of presence.
Dancing Music: Integrated MIDI-Driven Synthesis and Spatialization for Virtual Reality
Brookes, Tim; Mason, Russell; Rumsey, Francis
A measurement model based on the interaural cross-correlation coefficient (IACC) that attempts to predict the perceived source width of a range of auditory stimuli is currently under development. It is necessary to combine the predictions of this model with measurements of interaural time difference (ITD) to allow the model to provide its output on a meaningful scale and to allow integration of results across frequency. A detailed subjective experiment was undertaken using narrow-band stimuli with a number of centre frequencies, IACCs and ITDs. Subjects were asked to indicate the perceived position of the left and right boundaries of a number of these stimuli by altering the ITD of a pair of white noise comparison stimuli. It is shown that an existing IACC-based model provides a poor prediction of the subjective results but that modifications to the model significantly increase its accuracy.
Integration of Measurements of Interaural Cross-Correlation Coefficient and Interaural Time Difference Within a Single Model of Perceived Source Width
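As background to the abstract above, the IACC is conventionally taken as the maximum of the normalized cross-correlation between the two ear signals over lags of about ±1 ms, and the lag at which that maximum occurs is the usual broadband ITD estimate. The sketch below computes both for equal-length signals. The function name and the test signals are illustrative assumptions, and none of the paper's model modifications are included.

```python
import math
import random

def iacc_and_itd(left, right, fs, max_lag_ms=1.0):
    """Return (IACC, ITD in seconds) for two equal-length ear signals.

    IACC is the largest absolute value of the normalized cross-correlation
    over lags within +/- max_lag_ms; the ITD is the lag of that maximum
    (positive when the right signal lags the left).
    """
    n = min(len(left), len(right))
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    norm = math.sqrt(sum(v * v for v in left[:n]) * sum(v * v for v in right[:n]))
    if norm == 0.0:
        return 0.0, 0.0
    best_value, best_lag = 0.0, 0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(max(0, -lag), min(n, n - lag)):
            acc += left[i] * right[i + lag]
        value = abs(acc) / norm
        if value > best_value:
            best_value, best_lag = value, lag
    return best_value, best_lag / fs

random.seed(1)
FS = 48000
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
delayed = [0.0] * 5 + noise[:-5]                       # right ear 5 samples late
other = [random.gauss(0.0, 1.0) for _ in range(2000)]  # independent noise

same_iacc, same_itd = iacc_and_itd(noise, noise, FS)
late_iacc, late_itd = iacc_and_itd(noise, delayed, FS)
diff_iacc, _ = iacc_and_itd(noise, other, FS)
```

Identical signals give an IACC of 1 at zero lag, a pure delay keeps the IACC near 1 but shifts the ITD, and independent noise gives a value near 0.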
Advancements in nonlinear editing technology have enabled directors to modify their film project at any point during post-production. This freedom provides significant creative flexibility. However, the technologies for sound and film editing are not fully integrated, which poses a challenge for sound editors trying to stay in sync with film edits and changes. This paper introduces new workflows and technologies that enable sound editors to work in tandem with the changing film and to automate manual processes in a collaborative non-linear environment. These new workflows and changing technologies are described using a real-world motion picture case study: The Lord of the Rings.
Sound Editing Workflows and Technologies for Digital Film: The Non Linear Soundtrack
Kouchi, Hiroshi; Ogata, Shinichiro; Uchimura, Kazutsugu
In the international co-production of HD (high definition) programs, there are some problems in post-production. These problems originate in the frame-rate relationship between 24p and 23.976p. Generally, 23.976p shooting is used to ensure compatibility with TV systems such as NTSC, but some problems occur when transferring to the PAL system or to film. In this study, in a co-production with China called "The Ancient Routes of Tea & Horses", we accomplished 5.1 surround sound which was compatible with both 24p and 59.94i HD images. This paper describes the production techniques and problems, and some future challenges. We believe the techniques will be useful for media combinations in the future.
5.1 Surround Sound Productions with Multi-Format HDTV Programs
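The frame-rate relationship behind these problems is that the nominal 23.976p rate is exactly 24000/1001 frames per second, i.e. 24p slowed by a factor of 1.001. The sketch below works out what that 0.1% difference means for running time and for audio sample rates. The function names are illustrative, and the figures are plain arithmetic rather than results from the paper.

```python
from fractions import Fraction

FILM_RATE = Fraction(24, 1)            # 24p
PULLDOWN_RATE = Fraction(24000, 1001)  # nominal "23.976p"

def drift_seconds(program_seconds):
    """Extra running time when 24 fps material is played at 24000/1001 fps."""
    frames = program_seconds * FILM_RATE
    return frames / PULLDOWN_RATE - program_seconds

def pulled_down_sample_rate(sample_rate):
    """Effective audio sample rate if sound is slowed along with the picture."""
    return Fraction(sample_rate) * PULLDOWN_RATE / FILM_RATE

one_hour_drift = drift_seconds(3600)                       # 18/5 s, i.e. 3.6 s
slowed_48k = float(pulled_down_sample_rate(48000))         # about 47952.05 Hz
```

A one-hour program therefore drifts by 3.6 seconds, which is far more than lip sync tolerates, so the audio has to be resampled or re-timed whenever picture moves between the two rates.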
Komiyama, Setsu; Matsui, Kentaro; Okubo, Hiroyuki
A PC-based sound reproduction system, called PC-VRAS control, has been developed which is linked to virtual environments rendered by Virtual Reality Modeling Language (VRML) and provides spatially synchronized three-dimensional sound with the VRML scene. A listener can explore the VRML scene at will. The surrounding sound is synchronized and automatically re-synthesized with each step taken by the listener in real time. The control is built on ActiveX technologies and runs on the Internet Explorer browser window. All processing is done by software that runs on a standard personal computer, so there is no need for any special device.
PC-Based Sound Reproduction System Linked to Virtual Environments Rendered by VRML
Iyer, Subu; Jouppi, Norman P.; Slayden, April
We have developed a headphone-free bidirectional immersive audio telepresence system. The primary user of the system experiences four-channel audio from a remote location while sitting or standing in a 360-degree surround projection display cube. The display cube incorporates numerous acoustic enhancements, including tilted screens, an anechoic ceiling, and speakers ported through slits in the display cube edges. Head tracking based on near-infrared video technology obtains both the user's head position and orientation. Users can then vary the orientation of their projected voice at the remote location merely by rotating their own head. Similarly, the arrival time and volume of sound channels transmitted from the remote location are varied automatically in the display cube based on the position of the user's head, to help maintain proper perceived interaural time and level differences between multiple channels.
A Headphone-Free Head-Tracked Audio Telepresence System
Welti, Todd S.
Bass management has many advantages for surround sound listening; however, there is still a question regarding the audibility of two-channel versus single-channel bass reproduction. Many previous investigations have lacked rigor or been preliminary in nature. The study presented here includes strict control of nuisance variables and significance testing of results. A test is described in which trained listeners compared four different subwoofer configurations in controlled listening tests, using a very sensitive triangle test, in a typical listening room. Methods of controlling nuisance variables are discussed. These include double-blind testing, equalizing all responses flat below 120 Hz, and allowing pre-training for the listening test. Critical selection of audio test loops using low-frequency decorrelation is discussed. Results are presented.
Subjective Comparison of Single Channel versus Two Channel Subwoofer Reproduction