Audio Engineering Society Papers

AES 124th Convention

Amsterdam, The Netherlands
May 17-20, 2008

AES Paper Ordering

Single Convention Papers are available through the AES Paper Search and Shop facility.

Papers Listing

7314
Audio Capacitors. Myth or Reality?
Dodds, Paul; Duncan, Philip; Williams, Nigel
This paper gives an account of work carried out to assess the effects of metallised film polypropylene crossover capacitors on key sonic attributes of reproduced sound. The capacitors under investigation were found to be mechanically resonant within the audio frequency band, and results obtained from subjective listening tests have shown this to have a measurable effect on audio delivery. The listening test methodology employed in this study evolved from initial ABX type tests with set program material to the final A/B tests where trained test subjects used program material that they were familiar with. The main findings were that capacitors used in crossover circuitry can exhibit mechanical resonance, and that maximizing the listener’s control over the listening situation and minimizing stress to the listener were necessary to obtain meaningful subjective test results.

7315
Perceptual Study and Auditory Analysis on Digital Crossover Filters
Karjalainen, Matti; Korhola, Henri
Digital crossover filters offer interesting possibilities for sound reproduction, but there are not many publications on how they behave perceptually. In this research, phase and magnitude errors in digital implementations of linear-phase FIR as well as Linkwitz-Riley crossover filters are studied perceptually and by auditory analysis. In a headphone simulation listening experiment we explored the just noticeable level of degradation due to crossover filter artifacts. In a real loudspeaker experiment we explored rough guidelines for 'safe' filter orders of linear-phase FIR crossover filters, which would not produce audible errors. Possibilities to predict the perceived errors were then explored using auditory analysis, including third-octave magnitude spectrum and group delay as simple auditory correlates. Linear-phase FIR crossovers were found to produce different kinds of phase errors than Linkwitz-Riley crossovers. The auditory analysis can qualitatively explain the perceptibility of the degradation.
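As a rough illustration of why a Linkwitz-Riley crossover has a phase error but no magnitude error, the sketch below evaluates the analog fourth-order prototype (two cascaded second-order Butterworth sections per branch). It is not taken from the paper, which studies digital implementations; the function names and the normalized crossover frequency are assumptions made for the example.

```python
import cmath
import math

def lr4_sections(w, wc=1.0):
    """Return (lowpass, highpass) responses of an analog 4th-order
    Linkwitz-Riley crossover at angular frequency w (crossover at wc)."""
    s = 1j * w / wc                          # normalized Laplace variable on the jw axis
    butter2 = s * s + math.sqrt(2) * s + 1   # 2nd-order Butterworth denominator
    lp = 1 / butter2 ** 2                    # two cascaded Butterworth low-passes
    hp = s ** 4 / butter2 ** 2               # two cascaded Butterworth high-passes
    return lp, hp

def summed_response(w, wc=1.0):
    """Magnitude and phase (radians) of the summed low-pass and high-pass branches."""
    lp, hp = lr4_sections(w, wc)
    total = lp + hp
    return abs(total), cmath.phase(total)

for w in (0.1, 0.5, 1.0, 2.0, 10.0):
    mag, ph = summed_response(w)
    print(f"w/wc={w:5.1f}  |sum|={mag:.6f}  phase={math.degrees(ph):8.2f} deg")
```

The summed magnitude stays at unity at every frequency while the phase rotates, so the sum behaves as an all-pass filter. A linear-phase FIR crossover instead trades this frequency-dependent phase for a constant delay.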

7316
The Air Spring Effect of Flat Panel Speakers
Beer, Daniel; Jahr, Michael; Reich, Alexander; Schuster, Michaela
Flat panel speakers are characterized by their shallow build depth. Compared with conventional loudspeakers, their space-saving integration into existing surroundings is an advantage. From an acoustical point of view, however, the shallow depth brings disadvantages that affect reproduction in the low and middle frequency range. Based on measurements and FEM simulations, the reasons for this behavior were analyzed. Supplementary methods for solving this problem, derived from conventional loudspeaker technologies, have been considered.

7317
The Inertial Air Load of a Loudspeaker Diaphragm
Vanderkooy, John
A typical bass loudspeaker driver has an inertial air load which is about 30% of its actual cone mass. This air load mass is often poorly understood, but it is significant in defining the resonance frequency, and the purpose of this paper is to understand the concept, clarify important aspects, and present some corroborative measurements. The immediate surroundings of the diaphragm determine the low-frequency air load, and measurements on a test driver with different mounting arrangements are made and assessed, including measurements in vacuum. A loudspeaker box presents its own complications. Simulations are used to show how the air load depends on baffle size. In general the air load may not be accurately represented by the usual approximations that apply to a piston in an infinite baffle or to a freely oscillating disk, but they do give a rough estimate.
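The "usual approximation" for a piston in an infinite baffle can be sketched numerically. At low frequencies the air-load mass on one side of a rigid circular piston of radius a is (8/3)ρa³. The driver radius, cone mass, and vacuum resonance below are hypothetical values chosen for illustration, and counting both sides of the diaphragm is an assumption about the mounting, not a result from the paper.

```python
import math

RHO_AIR = 1.2  # kg/m^3, assumed air density near room temperature

def baffled_piston_air_mass(radius_m, rho=RHO_AIR):
    """Low-frequency air-load mass (kg) on ONE side of a rigid piston
    in an infinite baffle: (8/3) * rho * a^3."""
    return (8.0 / 3.0) * rho * radius_m ** 3

def loaded_resonance(f_unloaded_hz, cone_mass_kg, air_mass_kg):
    """Resonance after adding the air load, for fixed suspension stiffness:
    frequency scales as 1/sqrt(total moving mass)."""
    return f_unloaded_hz * math.sqrt(cone_mass_kg / (cone_mass_kg + air_mass_kg))

# Hypothetical driver: 10 cm effective radius, 20 g cone, 40 Hz resonance in vacuum.
radius = 0.10
cone_mass = 0.020
air_mass = 2 * baffled_piston_air_mass(radius)  # both sides loaded (assumption)

print(f"air load: {air_mass * 1e3:.2f} g ({100 * air_mass / cone_mass:.0f}% of cone mass)")
print(f"resonance with air load: {loaded_resonance(40.0, cone_mass, air_mass):.1f} Hz")
```

As the abstract notes, this closed-form value is only a rough estimate; the real load depends on the immediate surroundings of the diaphragm.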

7318
Horn Loudspeakers Nonlinearity Comparison and Linearization Using Volterra Series
Bard, Delphine
The characterization of a weakly nonlinear electroacoustic device with the usual measurement methods (THD, intermodulation) does not reveal the nonlinearities themselves, but only some of their effects. Device linearization can be achieved by applying the inverse nonlinearity upstream of the device, provided that the nonlinearity law is known in detail. This paper compares the nonlinear behavior of horn loudspeakers covering different frequency ranges, using an experimental method of weak-nonlinearity characterisation and compensation based on a Volterra series representation of the nonlinearity and multitone excitations.

7319
Audibility of Phase Response Differences in a Stereo Playback System. Part 1: Headphone Reproduction
Choisel, Sylvain; Martin, Geoff
The audibility of phase distortion in sound reproduction systems has been the subject of many studies; however, it remains a topic of controversy, in particular in the field of loudspeaker or headphone equalization. Most studies lead to the conclusion that, although phase distortion may be audible for specific stimuli, in realistic listening situations in a room it will go largely unnoticed. These studies, however, have focused on monophonic phase distortion, which is a severe limitation, since ignoring phase response in equalization can result in different phase distortion in different channels. The purpose of the present study is to investigate the audibility of stereophonic phase mismatch in the specific case of headphone reproduction. In addition, the implications for microphone design and production are discussed.

7320
Audible ICMP Echo Responses for Monitoring Ultra Low Delayed Audio Streams
Carôt, Alexander; Renaud, Alain; Werner, Christian
Playing live music on the Internet is very demanding in terms of delay, loss, and jitter and hence requires extremely reliable network conditions. Jitter is the most problematic factor because it has a direct influence on the required network buffer sizes for receiving low delay audio streams. Measuring the amount of jitter, however, is a very complex task due to the multi-hop architecture of the Internet. So far it has been impossible to know at which hop these delay variances appear. The authors propose a solution that is able to generate an audible impression of the jitter problem for each hop.

7321
Audio Fingerprint and Its Applications to Peer-To-Peer Systems
D'Aguanno, Antonello; Haus, Goffredo
In this work we analyze the applicability of audio-fingerprint technology to peer-to-peer systems. Audio fingerprinting is a technology commonly applied to tasks such as audio identification or digital rights management. Peer-to-peer is a common Internet paradigm for sharing various kinds of digitized content. In this paper we propose an improvement for typical peer-to-peer architectures (query flooding, centralized directory, hybrid architecture) that permits the application of audio-fingerprint technology to these systems.

7322
EBU Tech.doc. 3326 for Interoperability Between Audio Over IP Units
Coinchon, Mathias; Jonsson, Lars
Audio over IP end units are now common in radio and TV operations for streaming programmes over IP networks. The units are used to create contribution circuits from remote sites or local offices into main studio centres. The IP networks used are usually well managed corporate networks with good Quality of Service (QoS) and usually high bandwidth. Due to its availability, the Internet is also increasingly used for various cases of radio and television contribution, especially over longer distances. However, high bit rates and reliable contribution transmissions cannot be guaranteed over the Internet. Correspondents can choose to use either ISDN or the Internet with their equipment to deliver their reports. More than 20 manufacturers now provide equipment for audio over IP applications. The EBU has issued and verified a standard, EBU TECH 3326-2007, which allows for interoperability between previously incompatible Audio over IP codecs. A plug-test between nine manufacturers held in February 2008 proved that previously incompatible units can now connect according to the new standard.

7323
A Grid-Based Approach to the Remote Control and Recall of the Properties of IEEE1394 Audio Devices
Foss, Richard; Foulkes, Philip
Typically, the configuration of audio hardware and software is not integrated. This paper discusses a software system that has been developed to remotely control and recall the properties of IEEE1394 (FireWire) audio devices via a series of graphical routing matrices. The software presents sound engineers with a graphical routing matrix that shows, along its axes, the available FireWire audio devices on a FireWire network. Inter-device connection management may be performed by selecting the cross points on the grid, and intra-device control may be performed via device editors that are opened from the axes of the matrix. The software application may be hosted by a compatible Digital Audio Workstation (DAW) application to allow for the storing and recalling of the various properties associated with the devices.

7324
Can the Public Internet Be Used for Broadcast Applications?
Daniels, Simon
This paper will look at a number of examples of remote broadcasts over contended IP links and examine the key points in their success. We will talk about issues such as jitter and latency and considerations regarding essential features on IP codec equipment. The experiences of major European broadcasters trialing audio over the Public Internet will form the basis of a discussion of the pitfalls and possibilities associated with using the Public Internet for essential broadcast links.

7325
Objective and Subjective Evaluation of Urban Acoustic Modelling and Auralisation
Kang, Jian; Meng, Yan; Smyrnova, Yuliya
This paper presents the results of objective and subjective evaluation of a simulation/auralisation system based on the CRR (combined ray-tracing and radiosity) model. Auralisation of an urban square has been carried out with various boundary reflection patterns (purely specular, purely diffuse, and a mix of specular and diffuse), using two audio stimuli. The subjective evaluation results reveal a strong impact of sound sources and reflection pattern. Despite similarities in objective measures, there are noticeable differences in subjective attributes between signals based on simulated and measured impulse responses, but current auralisation algorithms are still adequate in simulating real urban environments.

7326
Virtual Vs. Actual Multichannel Acoustical Recording
Kearney, Gavin; Levison, Jeff
We present a comparison of live recordings of a choral ensemble versus dry recordings of the same players, with the acoustic environment reconstructed from impulse responses of the original reverberant performance space. Binaural measurements are used to objectively classify the recordings, and the perceptual attributes are investigated through a series of subjective listening tests. It is shown that the differences between dry recordings convolved with linear time-invariant (LTI) impulse responses and actual acoustical recordings can be perceived by a panel of expert listeners.

7327
Virtual Sources and Moving Targets
Cooper, David; Dickins, Glenn; McGrath, David
This paper presents an analysis of the effects of listener mobility on the stability of virtual source images created by a pair of speakers. A spherical head is used to generate analytic head related transfer functions from which we create a simple perceptual localisation model for the forward half of the horizontal plane. This model is then used to investigate changes in perceived source localisation as the listener moves. The analysis demonstrates that even with this simple model, and the assumption of small listener movements, the source image becomes unstable at a relatively low frequency. Given that for such low frequencies the spherical head model is a reasonable approximation of measured HRTFs, this work suggests that individualised HRTF and pinnae functions are of little benefit when designing a virtualizer system that allows for some listener mobility.

7328
On the Use of Directional Loudspeakers to Create a Sound Source Close to the Listener
de Bruijn, Werner; Härmä, Aki; van de Par, Steven
It is sometimes desired to create an illusion that a sound source appears closer to the listener than the nearest loudspeaker location. By using highly directional loudspeakers one may manipulate the relation between direct and reverberant energy and therefore change the distance cues to make the sound source appear very close to the listener. Another factor that influences the perceived distance of a source is the temporal and spatial fine structure of the early reflections at the listener position. In this paper we present a method for producing controllable distance effects between the listener location and the nearest loudspeakers that combines these two mechanisms. In particular, the method consists of reproducing the source signal by a highly directional loudspeaker, combined with adding synthetic early reflections using a standard surround audio reproduction system. A listening test was carried out, which confirms that the method is indeed capable of producing the desired effect.

7329
Directional Analysis of Sound Field with Linear Microphone Array and Applications in Sound Reproduction
Ahonen, Jukka; Kallinger, Markus; Küch, Fabian; Pulkki, Ville; Schultz-Amling, Richard
The use of a linear microphone array composed of two closely spaced omnidirectional microphones as input to a teleconference application of Directional Audio Coding (DirAC) is presented. DirAC is a method for spatial sound processing in which the direction of arrival of sound and the diffuseness are analyzed and used for different purposes in reproduction. Two-dimensional planar arrays have been used so far to generate input signals for DirAC, in which case it is possible to measure a two-dimensional sound field directly. In this study, a one-dimensional linear array is used to provide input signals for one-dimensional direction and diffuseness analysis in DirAC. Listening tests are conducted to evaluate the intelligibility of speech with two simultaneous talkers when the linear array is used in a teleconference application.

7330
The SoundScape Renderer: A Unified Spatial Audio Reproduction Framework for Arbitrary Rendering Methods
Ahrens, Jens; Geier, Matthias; Spors, Sascha
The SoundScape Renderer is a versatile software framework for real-time spatial audio rendering. The modular system architecture allows the use of arbitrary rendering methods. Three rendering modules are currently implemented: Wave Field Synthesis, Vector Base Amplitude Panning and Binaural Rendering. After a description of the software architecture, the implementation of the available rendering methods is explained and the graphical user interface is shown as well as the network interface for the remote control of the virtual audio scene. Finally, the Audio Scene Description Format, a system-independent storage file format, is briefly presented.

7331
Initial Investigation of Signal Capture Techniques for Objective Measurement of Spatial Impression Considering Head Movement
Kim, Chungeun; Mason, Russell; Brookes, Tim
In a previous study it was discovered that listeners normally make head movements attempting to evaluate source width and envelopment as well as source location. To accommodate this finding in the development of an objective measurement model for spatial impression, two capturing models were introduced and designed in this research, based on binaural technique: 1) rotating Head And Torso Simulator (HATS), and 2) a sphere with multiple microphones. As an initial study, measurements of interaural time difference (ITD), level difference (ILD) and cross-correlation coefficient (IACC) made with the HATS were compared with those made with a sphere containing two microphones. The magnitude of the differences was judged in a perceptually relevant manner by comparing them with the just-noticeable differences (JNDs) of these parameters. The results showed that the differences were generally not negligible, implying the necessity of enhancement of the sphere model, possibly by introducing equivalents of the pinnae or torso. An exception was the case of IACC, where the reference of JND specification affected the perceptual significance of its difference between the two models.
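The three binaural parameters compared in this abstract can be sketched for a pair of ear signals as follows. This is a minimal illustration, not the authors' measurement procedure: the function name, the broadband energy-ratio ILD, and the sign convention are assumptions, and IACC is taken in its common form as the maximum absolute normalized cross-correlation within lags of ±1 ms.

```python
import math

def binaural_cues(left, right, fs, max_lag_ms=1.0):
    """Return (itd_seconds, ild_db, iacc) for two equal-length sample lists.
    Positive ITD means the right-ear signal lags the left-ear signal."""
    n = len(left)
    e_l = sum(x * x for x in left)           # left-ear energy
    e_r = sum(x * x for x in right)          # right-ear energy
    ild_db = 10.0 * math.log10(e_l / e_r)    # broadband level difference
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    norm = math.sqrt(e_l * e_r)
    best_lag, best_val = 0, 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag] over the valid overlap.
        acc = 0.0
        for i in range(max(0, -lag), min(n, n - lag)):
            acc += left[i] * right[i + lag]
        val = acc / norm
        if abs(val) > abs(best_val):
            best_lag, best_val = lag, val
    return best_lag / fs, ild_db, abs(best_val)

# Synthetic check: the right signal is the left one delayed by 3 samples at half amplitude.
fs = 48000
left = [0.0] * 64
left[10:14] = [1.0, 2.0, -1.5, 0.5]
right = [0.0] * 3 + [0.5 * x for x in left[:-3]]
itd, ild, iacc = binaural_cues(left, right, fs)
print(f"ITD = {itd * 1e6:.1f} us, ILD = {ild:.2f} dB, IACC = {iacc:.3f}")
```

Differences in such values between two capture models can then be compared against published JNDs, which is the kind of perceptual weighting the abstract describes.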

7332
A Second Order Differential Microphone Technique for Spatially Encoding Virtual Room Acoustics
Murphy, Damian; Southern, Alexander
Room acoustics modelling using a numerical simulation technique known as the Digital Waveguide Mesh (DWM) has previously been presented as a suitable method for measuring spatial Room Impulse Responses (RIR) of virtual enclosed spaces. In this paper, a new method for capturing the DWM modelled soundfield using an array of spatially distributed pressure sensitive receivers is presented. The polar response of the formed 2nd order virtual microphone is measured and compared to the theoretical polar response. This approach is proven to be capable of decomposing the modelled soundfield into the second order spherical harmonic components that are typically associated with 2nd order Ambisonics.

7333
Time-Varying Transform for High Quality Audio Communication Codecs
Kovesi, Balazs; Philippe, Pierrick; Virette, David
High quality audio communication is a current challenge addressed by the standardisation committees. In this context, ITU and MPEG recently issued standards for high quality coding of both speech and music contents. Transform coding is used and allows quality commensurate with bit rates regardless of the audio content. Up to now, only constant transform sizes were used in these coding schemes since time varying transform needed lookahead for perfect reconstruction, hence adding further delay. In this paper we demonstrate how variable transform sizes can be used without affecting the coding delay. Based on the filterbank theory, a framework avoiding lookahead is presented. The quality improvement offered by the proposed solution is illustrated in the context of MPEG-4 Enhanced Low Delay AAC.

7334
Differential Graph-Based Coding of Spikes in a Biologically-Inspired Universal Audio Coder
Lahdili, Hassan; Najaf-Zadeh, Hossein; Pichevar, Ramin; Thibault, Louis
In a previous work we showed that it is possible to code audio material using a biologically-inspired universal audio coder based on matching pursuit. The best atoms/kernels chosen by matching pursuit are represented by spikes to reflect the biologically-inspired nature of the algorithm. In that work, each spike or atom was defined by parameters such as timing, channel frequency, amplitude, chirp factor, etc. that were encoded independently. However, encoding each atom/spike as a separate entity consumes many bits. In the present work, we propose algorithms to encode only the difference between parameters associated with spikes. Hence, we assume that each spike/atom is a node in a graph and choose the sequence of spikes that will minimize the differential encoding costs. Methods based on the minimum spanning tree and the travelling salesman problem are proposed and compared for the graph-based optimization of the code.
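The graph formulation can be illustrated with a small sketch: each spike is a node, the edge weight is a proxy for the cost of coding one spike relative to another, and Prim's algorithm picks the cheapest set of parent links. The L1 parameter distance used as the cost, the spike tuples (time, channel, amplitude), and the function names are all assumptions for illustration; the paper's actual cost function and coder are not reproduced here.

```python
def diff_cost(a, b):
    """Proxy cost of coding spike b relative to spike a: L1 parameter distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

def sequential_cost(spikes):
    """Cost of coding each spike relative to the previous one in list order."""
    return sum(diff_cost(a, b) for a, b in zip(spikes, spikes[1:]))

def mst_parents(spikes):
    """Prim's algorithm on the complete graph of spikes.
    Returns (parent, total_cost); parent[i] is the spike that spike i is
    coded relative to, or None for the root (spike 0)."""
    n = len(spikes)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known link from the tree to each spike
    parent = [None] * n
    best[0] = 0
    total = 0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                c = diff_cost(spikes[u], spikes[v])
                if c < best[v]:
                    best[v] = c
                    parent[v] = u
    return parent, total

# Hypothetical spikes: (time index, channel, quantized amplitude).
spikes = [(0, 3, 10), (2, 3, 11), (50, 7, 4), (51, 7, 5), (3, 4, 10)]
parent, mst_cost = mst_parents(spikes)
print("parents:", parent)
print("sequential cost:", sequential_cost(spikes), " MST cost:", mst_cost)
```

A tree also requires signalling which spike is the parent of each node, which a simple chain (the travelling-salesman formulation) avoids; that side information is ignored in this sketch.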

7335
Unravelling the Relationship Between Basic Audio Quality and Fidelity Attributes in Low Bit-Rate Multi-Channel Audio Codecs
Marins, Paulo; Rumsey, Francis; Zielinski, Slawomir
Prior to this study the evaluation of multi-channel audio codecs has been done mainly according to the ITU-R standards BS.1116 and BS.1534. Basic audio quality is the only perceptual attribute assessed in the majority of these tests. This approach, although efficient for measuring the overall quality of several codecs at once, does not explain why a particular codec is rated better or worse than another. In this study, fidelity attributes were included; these were based on the attributes suggested in the ITU-R standards but have not been used explicitly in codec evaluation up to now. In this experiment the perceptual importance of these attributes and their contribution to the basic audio quality of low bit-rate surround sound codecs were investigated.

7336
A New Perceptual Model for Audio Coding Based on Spectro-Temporal Masking
Kohlrausch, Armin; Koppens, Jeroen; Oomen, Werner; van de Par, Steven
In psychoacoustics, considerable advances have been made recently in developing computational models that can predict the discriminability of two sounds taking into account spectro-temporal masking effects. These models operate as artificial observers by making predictions about the discriminability of arbitrary signals [e.g. Dau et al. J. Acoust. Soc. Am. 99, Vol. 36(15), 1996]. Therefore, such models can be applied in the context of a perceptual audio coder. A drawback, however, is the computational complexity of such advanced models, especially because the model needs to evaluate each quantization option separately. In this contribution a model is introduced and evaluated that is a computationally lighter version of the Dau model but maintains its essential spectro-temporal masking predictions. Listening test results in a transform coder setting show that the proposed model outperforms a conventional purely spectral masking model and the original model proposed by Dau.

7337
Delayless Mixing - On the Benefits of MPEG-4 AAC-ELD in High Quality Communication Systems
Albert, Tobias; Ekstrand, Per; Geiger, Ralf; Henn, Fredrik; Lutzky, Manfred; Przioda, Daniel; Ruoppila, Vesa; Schmidt, Markus; Schnell, Markus; Tarnes, Erlend
Tele- and video conferencing systems for modern business communication are managed by central hubs, so-called multipoint control units (MCU). One major task of these units is the mixing of audio streams from the participating sites. This is traditionally done by decoding the streams, mixing in the time domain, and then re-encoding the mixed signals. This requires additional processing power and leads to increased delay and degraded audio quality. The paper demonstrates how the recently standardized MPEG-4 Enhanced Low Delay AAC (AAC-ELD) codec offers a solution to these problems by efficient and delayless mixing in the transform domain of the codec.

7338
Low-Power MPEG-4 HE-AAC Version-2 Encoder
Hsu, Han-Wen; Lee, Wen-Chieh; Liu, Chi-Min; Yang, Chung-Han
In the MPEG-4 HE-AAC version-2 encoder, the analysis/synthesis complex-exponential modulation filter banks (CEMFB) are used in spectral band replication (SBR) and parametric stereo (PS) coding. Because of aliasing interference, complex banks rather than real banks are adopted in SBR and PS coding. However, the complex values in the CEMFB and the subsequent processing lead to high operational overhead. Our previous work designed SBR encoders based on real-domain cosine modulation filter banks, for which we proposed a complexification-based approach to SBR coding. This paper extends that work to PS coding. An approximate method for parameter estimation is proposed that saves operational overhead by using only one CEMFB analysis channel. Also, a phase-adjustment down-mixing method is proposed to reduce energy-vanishing effects.

7339
Low Complexity Bit Allocation Algorithms for MP3/AAC Encoding
Nithin, S.; Sreenivas, T. V.; Suresh, Kumaraswamy
We have developed two reduced-complexity bit-allocation algorithms for MP3/AAC-based audio encoding, which can be useful at low bit rates. One algorithm derives the optimum bit allocation using constrained optimization of the weighted noise-to-mask ratio, and the second algorithm uses decoupled iterations for distortion control and rate control, with convergence criteria. MUSHRA-based evaluation indicated that the new algorithm is comparable to AAC while requiring only about 1/10th of the complexity.

7340
Linear Filtering in MDCT Domain
Sreenivas, T. V.; Suresh, Kumaraswamy
In this paper, expressions for convolution multiplication properties of MDCT are derived starting from the equivalent DFT representations. Using these expressions, methods for implementing linear filtering through block convolution in the MDCT domain are presented. The implementation is exact for symmetric filters and approximate for non-symmetric filters in the case of rectangular window based MDCT. For a general MDCT window function, the filtering is done on the windowed segments and hence the convolution is approximate for symmetric as well as non-symmetric filters. This approximation error is shown to be perceptually insignificant for symmetric impulse response filters. Moreover, the inherent 50% overlap between adjacent frames used in MDCT computation does reduce this approximation error similar to smoothing of other block processing errors. The presented techniques are useful for compressed domain processing of audio signals.

7341
A Study of Electrostatic Forces in Single-Acting Condenser Digital Transducer
Husník, Libor
One possible way to design a transducer with direct digital-to-analog conversion, sometimes called a digital loudspeaker, is the miniature condenser transducer manufactured on a silicon chip. Only recently has this micro technology been made available commercially, which opens further application possibilities. This article studies the case in which the back electrode of the electrostatic transducer is partitioned into sections having total areas proportional to powers of 2. Since the electrostatic force acting on the membrane is affected by the distribution of bit groups, which cannot be even, the electrostatic force will not be a linear function of the signal voltage. Correction coefficients for some arrangements are sought.

7342
Ultra-Thin Micro-Loudspeaker Using Oblique Magnetic Circuit
Matsumura, Toshiyuki; Saiki, Shuji; Sano, Koji; Usuki, Sawako
More and more functions are being built into mobile phones while handsets have become smaller, so the devices installed in a mobile phone must be smaller or thinner. The micro-loudspeaker installed in a mobile phone must become thinner while still reproducing high-quality sound. However, it has been very difficult to make a thinner micro-loudspeaker without degrading acoustic performance, because the structure of the conventional dynamic micro-loudspeaker is not suited to thinning. We have succeeded in developing an ultra-thin micro-loudspeaker using an Oblique Magnetic Circuit, which is 1.5 mm thick (45% thinner than a conventional dynamic micro-loudspeaker) without sacrificing sound quality.

7343
A Novel Glass Laminated Structure for Flat Panel Loudspeakers
Harris, Neil; Mal, Olivier; Novotny, Marek; Verbeeren, Bart
A new, patented “sandwich structure” has been developed for various audio applications, in which thin glass sheets are laminated with a special PVB (Polyvinyl Butyral) film to eliminate typical acoustical weaknesses of monolithic glass and standard laminated solutions. The glass improvements include suppression of ringing of the audio signal, and a much more flexible and lightweight glass structure. It results in flatter frequency response (both on-axis and 180° power response) and better transport of vibrations in the glass surface. In addition, better acoustical sensitivity and mechanical resistance are achieved. In this paper, after defining the structure of the developed laminated glass solution, we compare its performance to previously tried monolithic and laminated glass solutions. We also emphasize the key factors influencing the final acoustical properties. Finally, we introduce potential application fields for the developed structure.

7344
A Digitally Direct Driven Dynamic-Type Loudspeaker
Kuroki, Kazushige; Saito, Ryota; Shinkawa, Naoto; Tsuchiya, Tomohiro; Yasuda, Akira
If the speaker can be driven digitally, it becomes possible to perform all processes from the input to the output digitally without analog components such as a power amplifier, and a small, light, and high-quality speaker system can be achieved. In this paper, we propose a basic idea of a Digital-Speaker and a digitally driven dynamic-type loudspeaker that is provided with multiple voice coils and employs multi-bit delta-sigma modulation. The piezoelectric speaker used in a previous paper is replaced with the voice coil. The prototype was implemented with an FPGA, CMOS drivers, and a dynamic-type loudspeaker. THD and SPL are about 0.42% and 104 dB, and the output power is 1 W even when the power supply is 1 V.

7345
Accelerated Power Test Analysis Based on Loudspeaker Life Distribution
Shen, Yong; Wang, Xu; Wu, Zhicheng
For loudspeaker manufacturers, the long time spent on power tests required by the relevant standards or by buyers has strongly affected the product design and development cycle. The authors apply reliability theory to cut the duration of the loudspeaker power test. On the basis of experimental data, a model of loudspeaker life distribution is proposed, from which an acceleration factor for the loudspeaker power test is derived, so that the characteristics of the loudspeaker under normal working conditions can be estimated. The method can be conveniently applied to the relevant power tests and effectively shortens their duration.

7346
Perception and Physical Behavior of Loudspeaker Nonlinearities at Bass Frequencies in Closed Vs. Reflex Enclosures
Ahonen, Jukka; Karjalainen, Matti; Rauhala, Jukka; Tikander, Miikka
This paper examines loudspeaker nonlinearities at bass frequencies in closed and reflex enclosures using signal analysis and perceptual evaluation methods. The nonlinearities are investigated by driving the loudspeakers under comparison with sinusoidal and musical test tones. The produced responses are evaluated in terms of diaphragm displacement, harmonic distortion, and bandwise distortion. In addition, a listening experiment is conducted in order to determine how the nonlinearities are perceived in both reflex and closed enclosures. The results show that signals with energy close to the tuning frequency of the reflex port produce more distortion with the closed enclosure. On the other hand, an acoustic bass test tone behaved in the opposite way, causing more distortion with the reflex enclosure. These phenomena were verified with the listening tests.

7347
Enhancements to the SBC CODEC for Voice Communication in Mobile Devices
Pilati, Laurent; Zadissa, Mohammad
The Bluetooth Audio Distribution profile uses Low complexity sub-band Coder (SBC) as its mandatory audio compression codec. More recently, SBC has been selected for Bluetooth wideband voice communication. Since SBC was first designed for audio compression, it does not incorporate the features that speech coders commonly use. The use of Voice Activity Detection and Comfort Noise Generation to reduce bandwidth usage and power consumption is an example. In this work, we investigated extensions for SBC that would make it better suited for voice compression in the Bluetooth framework. The proposed enhancements were evaluated on the basis of their impact on voice quality, their implementation requirements, and their bandwidth savings.

7348
Efficiently Shuffling Large Sets of Clips
Herrmann, Ulrich
This paper describes a method for randomly shuffling through large sets of video or audio clips. Many current devices can shuffle only up to 200 or 256 songs. The algorithm presents a way of shuffling even large sets with an almost unlimited number of items. It also provides the ability to traverse back and forth with little processing power on today’s microcontrollers. All this is done with only a few bytes of code and almost no RAM.
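The abstract does not describe the algorithm itself, so the following is only a sketch of one well-known way to meet the same constraints (constant memory, forward and backward traversal, no stored playlist): an affine permutation of the indices. The class name and parameters are hypothetical, and this is not claimed to be the author's method.

```python
from math import gcd

class AffineShuffle:
    """Visit indices 0..n-1 in a scrambled order using O(1) state.
    The map pos -> (a * pos + b) mod n is a permutation when gcd(a, n) == 1."""

    def __init__(self, n, multiplier, offset):
        if gcd(multiplier, n) != 1:
            raise ValueError("multiplier must be coprime with n")
        self.n = n
        self.a = multiplier % n
        self.b = offset % n
        self.pos = 0  # position in the play order, not the clip index

    def current(self):
        """Clip index at the current play position."""
        return (self.a * self.pos + self.b) % self.n

    def next(self):
        """Step forward one position (wrapping) and return the clip index."""
        self.pos = (self.pos + 1) % self.n
        return self.current()

    def previous(self):
        """Step back one position (wrapping) and return the clip index."""
        self.pos = (self.pos - 1) % self.n
        return self.current()

# Hypothetical library of 1000 clips; 617 shares no factor with 1000.
shuffle = AffineShuffle(1000, multiplier=617, offset=123)
first_ten = [shuffle.current()] + [shuffle.next() for _ in range(9)]
print(first_ten)
```

Only three integers and a position counter are stored, and stepping backward is as cheap as stepping forward. The fixed stride makes the order less random-looking than a true shuffle, so a practical design would likely add further scrambling.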

7349
Hardware/Software Co-Design of Multi-Format Audio Decoder
Kim, DoHyung; Lee, KangEun; Lee, Shihwa; Ryu, Soojung; Son, ChangYong
This paper presents a hardware/software co-design method for the implementation of a multi-format audio decoder with ultra low power, small chip size, and high flexibility, which are the most critical factors in embedded devices. The approach provides both flexibility and low power with high performance by focusing the hardware implementation on the computationally intensive critical blocks that multiple audio decoders have in common. Hardware blocks are well modularized to allow easy and rapid architecture exploration of several digital audio standards. The proposed system can decode an MP3 bitstream using only about 4 MHz clock frequency and an AAC bitstream using only about 7 MHz clock frequency on average at a sampling rate of 48 kHz and a target bitrate of 128 kbps/stereo.

7350
Audio Enhancement for Portable Device Based Speech Applications
Hoare, Steve; Hughes, Peter; Turnbull, Rory
Portable devices with audio capabilities necessitate the use of small transducers, often with poor frequency responses. This can be a limiting factor in the perception of the speech quality of VoIP services hosted on such a device. This paper seeks to investigate the problem and provide practical solutions through the use of appropriate enhancement technologies. The paper covers the use of equalization, dynamic range compression and psychoacoustic bass enhancement as possible methods for improving intelligibility. Subjective tests are used to evaluate the enhancements prior to making practical recommendations.

7351
An Efficient, Low-Noise Filter Architecture for Bass Processing on a DSP Core
Bentall, Nathan; Eastty, Peter; Stott, Duncan
Bass Enhancement is becoming popular in many forms of consumer devices. Whatever technique is used on whatever processor, the low frequency filtering involved is frequently the major determinant of system signal-to-noise ratio. The architecture described combines an efficient, cascaded, low-pass FIR filter and a poly-phase adaptation of standard low frequency IIR filtering. The resulting circuit achieves a 20 to 30 dB improvement in signal-to-noise ratio at the cost of only 12 instructions per sample. The technique may be applied to any bass processing using fixed or floating point processors. Complete design tables for the cascaded FIR filters are given, as are noise spectrum plots of the results.

7352
Implementation of Dynamic Voltage and Frequency Scaling on Portable Audio Player
Dahyanto, Harliono; Gan, Woon-Seng
Current portable computing devices demand not only higher performance but also lower power consumption. For this reason, this research aims to build a framework that enables rapid design of energy-efficient embedded systems. Specifically, the research focuses on a dynamic voltage scaling algorithm, which has been found effective in reducing power consumption. We develop a method of scaling voltage and frequency dynamically on the latest embedded processor jointly designed by Analog Devices and Intel. The rationale behind this method is to avoid the processor sitting idle at a high operating frequency and voltage. Instead, the processor can save power by running a task at a lower frequency and voltage, completing it just before the real-time deadline. Furthermore, our method can also be implemented on other embedded processors with a voltage-frequency scaling feature.
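
The rationale in this abstract (run slower so the task finishes just before its deadline) can be sketched as a simple selection rule. The following is a minimal illustration under stated assumptions: the operating points in `LEVELS`, the function names, and the energy model (dynamic energy per cycle proportional to the square of the supply voltage) are all hypothetical and are not taken from the paper or from any processor data sheet.

```python
# Hypothetical (frequency in Hz, core voltage in V) operating points.
LEVELS = [(100_000_000, 0.8), (200_000_000, 1.0), (400_000_000, 1.2)]


def pick_level(cycles, deadline_s, levels=LEVELS):
    """Return the slowest (freq, volts) pair that still meets the deadline.

    Falls back to the fastest level if none can meet it.
    """
    ordered = sorted(levels)
    for freq_hz, volts in ordered:
        if cycles / freq_hz <= deadline_s:
            return freq_hz, volts
    return ordered[-1]


def relative_energy(cycles, volts):
    """Relative dynamic energy for a task, assuming energy per cycle ~ V^2."""
    return cycles * volts ** 2


# A task of 1.5 million cycles that must finish within 10 ms.
freq_hz, volts = pick_level(1_500_000, 0.010)
saving = 1.0 - relative_energy(1_500_000, volts) / relative_energy(1_500_000, 1.2)
```

Under this simplified model the energy of a fixed-cycle task depends only on the voltage, so the saving comes entirely from the lower voltage that the lower frequency permits.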

7353
Low-Frequency Extension of Gated Loudspeaker Measurements
Backman, Juha
The free-field response of a loudspeaker system can be approximated through a gated measurement made in a sufficiently large space. The frequency resolution is nominally determined by the time gap between the direct sound and the first reflection, but the actual low-frequency accuracy of gated measurements is also reduced by the group delay of the loudspeaker itself. The group delay at low frequencies may cause a large fraction of the radiated sound energy to be cut off, so that the low-frequency response is underestimated. A method is presented to estimate the approximate low-frequency response from the impedance measurement of the loudspeaker and to use this response to pre-process the acoustical measurement, improving the accuracy of the gated measurement.
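
As a rough worked example of the nominal resolution limit mentioned above, the sketch below uses the common rule of thumb that a gate of length T seconds gives a frequency resolution of about 1/T Hz. The function names, the path lengths, and the speed of sound value of 343 m/s are assumptions for illustration; the paper's own correction method is not reproduced here.

```python
def gate_time(direct_path_m, reflected_path_m, speed_of_sound=343.0):
    """Time gap in seconds between the direct sound and the first reflection."""
    return (reflected_path_m - direct_path_m) / speed_of_sound


def nominal_resolution_hz(gate_s):
    """Rule-of-thumb frequency resolution of a gated measurement: 1 / T."""
    return 1.0 / gate_s


# Hypothetical geometry: microphone at 1 m, first reflection travels 4.43 m.
gap = gate_time(1.0, 4.43)
resolution = nominal_resolution_hz(gap)
```

With these numbers the gap is about 10 ms and the nominal resolution about 100 Hz, which shows why gated measurements in ordinary rooms carry little reliable information in the bass range.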

7354
Measurement and Fourier-Bessel Analysis of Loudspeakers Radiation Patterns Using a Spherical Array of Microphones
Brunel, Vincent; Fazi, Filippo; Hörchens, Lars; Nelson, Philip
Loudspeakers are widely used in three-dimensional sound field reconstruction systems, but their spatial directivity features are relatively little-known. In this paper, a hemispherical array of 40 microphones was designed and built in order to measure the pressure field radiated by different commercially available loudspeakers. The spatial samples of the acoustic pressure were processed in order to estimate the truncated Fourier-Bessel expansion of the sound field, which allows the reconstruction of the 3D radiation pattern. An analysis of the errors involved in the estimation was also performed with a numerical model of the array.

7355
Turbulent and Viscous Air Friction in the Mid-High Frequency Loudspeaker
Djurek, Danijel; Djurek, Ivan; Petosic, Antonio
A mid-high frequency loudspeaker with resonant frequency f = 982 Hz has been studied in atmospheres of air, He4, D2 and H2 at pressures ranging from 0 to 1 bar. Measurements of the viscous and turbulent contributions to the friction entering the Q-factor showed a significant difference compared to a low frequency loudspeaker. The resonant frequency is considerably lower in an evacuated space than in air at 1 bar, which differs from the low frequency loudspeaker, where the opposite is true. Measurements showed that the imaginary part of the viscous friction in the Navier-Stokes equation is dominant, while the contribution of the real part to the friction term is less significant. The Navier-Stokes equation then reduces to the Stokes form grad p = -μ Δv, in which the imaginary part of the viscous force reduces the effective vibrating mass, which in turn enables operation of the loudspeaker at high frequency. The data were interpreted in terms of the Greenspan theory of the piston radiator.

7356
Modeling of an Electrodynamic Loudspeaker Including Membrane Viscoelasticity
Djurek, Danijel; Djurek, Ivan; Petosic, Antonio
A model is proposed based upon the viscoelastic properties of the loudspeaker membrane. The properties considered include stress-strain hysteresis, the creeping effect, the initial stress effect and the appearance of temperature fluctuations on the membrane surface. The creeping displacement response to a step-like excitation current has been measured on different loudspeaker configurations, and the listed effects were analyzed in terms of the N-order Bennewitz-Rötger differential equation, commonly used to describe a vibrating viscoelastic body. The main parameter in this equation is the inverse stress parameter, which connects the friction and restoring terms in the loudspeaker vibrating system.

7357
On a Novel Concept of Membrane Suspension in an Electrodynamic Loudspeaker
Djurek, Danijel; Djurek, Ivan; Petosic, Antonio
A laboratory model of an electrodynamic loudspeaker has been realized with the membrane suspended on a hollow elastic torus positioned in the bottom of the membrane, close to the voice coil. This geometry removes the torque in the membrane coming from the maximum possible distance of the suspension on the outer rim from the voice coil. The suppressed torque results in the suppression of the Bessel vibration modes which generate stochastic deformation tilts on the membrane surface. Such tilts contribute to the intrinsic friction of the membrane, and their absence results in minor viscoelastic losses. Lateral rigidity of the torus is sufficient for operation of the loudspeaker without centric fixation.

7358
The Theory of Wave Field Synthesis Revisited
Ahrens, Jens; Rabenstein, Rudolph; Spors, Sascha
Wave field synthesis is a spatial sound field reproduction technique aiming at authentic reproduction of auditory scenes. Its theoretical foundation was developed almost 20 years ago and has been improved considerably since then. Most of the original work on wave field synthesis is restricted to reproduction in a planar listening area using linear loudspeaker arrays. Extensions such as arbitrarily shaped distributions of secondary sources and three-dimensional reproduction in a listening volume have not been discussed in a unified framework so far. This paper revisits the theory of wave field synthesis and presents a unified theoretical framework covering arbitrarily shaped loudspeaker arrays for two- and three-dimensional reproduction. The paper additionally gives an overview of the artifacts that arise in practical setups and briefly discusses some extensions to the traditional concepts of WFS.

7359
A Finite Difference Time Domain Approach to Analysing Room Effects on Wave Field Synthesis Reproduction
Drumm, Ian; Hirst, Jos; Oldfield, Robert
Probably the largest pitfall for accurate audio reproduction using wave field synthesis (WFS) is the listening space. WFS theory assumes free-field, source-free conditions, which are seldom met in practical sound reproduction. There is consequently a need to determine what effect the reproduction room has upon the synthesised sound field. This paper presents a finite difference time-domain (FDTD) approach to predicting the sound field in a room with arbitrary geometry and frequency dependent absorbing boundaries. A significant benefit of using FDTD is that the WFS system can be modelled both as part of the room and in free-field conditions, so the distortion of the sound field caused by the acoustics of the reproduction room can be quantified.

7360
Wave Field Synthesis Evaluation Using the Minimum Audible Angle in a Concert Hall
Corteel, Etienne; Marentakis, Georgios; Mc Adams, Stephen
Localization accuracy with Wave Field Synthesis (WFS) was estimated in a variable-acoustics concert hall. Contrary to previous studies, we employed a Minimum Audible Angle (MAA) paradigm as a measure of localization performance. The MAA was estimated for three different listening positions, three orientations of the listeners (0°, 60°, 90°) and two acoustical conditions. WFS was found to produce satisfying localization cues that depend little on the reverberation time of the room and only weakly on the position of the listener.

7361
Objective and Subjective Analysis of Localisation Accuracy in Wave Field Synthesis
Corteel, Etienne; Sanson, Joseph; Warusfel, Olivier
This paper analyses localization inaccuracies in the synthesis of virtual sound sources using Wave Field Synthesis (WFS), particularly at high frequencies. Objective and perceptual analyses are conducted through a binaural simulation of the actual sound field reproduced at the listener’s ears. The simulation consists of summing the contribution of each array transducer after filtering it with the HRTF appropriate to the listener position under consideration. High-pass filtered white noise is used as a critical signal to investigate the impact of aliasing on localization accuracy. Objective and perceptual observations show that localization accuracy may degrade for off-centered listening positions, which can be mainly attributed to a mismatch in the elicited Interaural Level Differences (ILD) above the aliasing frequency.

7362
Wave Field Synthesis Rendering with Increased Aliasing Frequency
Corteel, Etienne; Kuhn-Rahloff, Clemens; Pellegrini, Renato
Wave Field Synthesis (WFS) is a sound reproduction technique that enables the synthesis of target sound fields without any assumption on the listening position. Spatial aliasing is one of the remaining artefacts of WFS; it limits exact synthesis to frequencies below a corner frequency referred to as the spatial aliasing frequency. This paper presents a new technique that makes it possible to increase the spatial aliasing frequency of WFS, assuming a preferred listening area. The presented technique is fully scalable and may be adapted to any listening zone shape or location. Applications in the domain of simulation environments and home entertainment are discussed.

7363
Reproduction of Moving Virtual Sound Sources with Special Attention to the Doppler Effect
Ahrens, Jens; Spors, Sascha
In this paper, we outline a basic framework for the reproduction of the wave field of moving virtual sound sources. Conventional implementations usually reproduce moving virtual sources as a sequence of stationary positions. This process leads to various artifacts as reported in the literature. Using wave field synthesis as an example, we show that explicit consideration of the physical properties of the wave field of moving sources avoids these artifacts and allows for accurate reproduction of the Doppler Effect. However, numerical simulations suggest that the artifacts inherent to the reproduction system can lead to a heavy degradation of the reproduction quality.

7364
A Graphical Tool Set for Analyzing Wave Field Synthesis Algorithms
Korn, Thomas
Current Wave Field Synthesis (WFS) rendering realizations consist of large structures of audio signal processing components (filters, delays, amplitude weighting) that are controlled by complex algorithms based on the virtual source's properties. This paper proposes a set of tools for visually analyzing the underlying WFS coefficient calculation algorithms by mapping characteristic measures that depend on the source's and listener's positions. These measures are derived from the reproduction system's idealized transfer function and parametric impulse response description. They reveal functional aspects of the algorithm's behavior. The measures aim at supporting an intuitive understanding of the perception of virtual sound events in a Wave Field Synthesis system, and they also facilitate the basic algorithm development process.

7365
Audio-Visual Processing Tools for Auditory Scene Synthesis
Boland, Frank; Dahyot, Rozenn; Kearney, Gavin
We present an integrated set of audio-visual tracking and synthesis tools to aid matching of the audio to the video position in both horizontal and periphonic sound reinforcement systems. Compensation for screen size and loudspeaker layout for high definition formats is incorporated, and the spatial localisation of the source is rendered using advanced spatialisation techniques. A subjective comparison of several original and enhanced film sequences using the Vector Base Amplitude Panning (VBAP) method is presented. The results show that encoding non-contradictory audio-visual spatial information for presentation on different loudspeaker layouts significantly improves the naturalness of the listening/viewing experience.

7366
Encoding Higher Order Ambisonics with AAC
Burnett, Ian; Hellerud, Erik; Solvang, Audun; Svensson, U. Peter
In this work we explore a simple method for reducing the bit rate needed for transmitting and storing Higher Order Ambisonics (HOA). The HOA B-format signals are simply encoded using Advanced Audio Coding (AAC) as if they were individual mono signals. Wave field simulations show that by allocating more bits to the lower order signals than to the higher order signals, the resulting error is very low in the sweet spot but increases as a function of distance from the center. Encoding the higher order signals with a low bit rate does not lead to reduced audio quality. The spatial information is improved when higher-order channels are included, even if these are encoded with a low bit rate.

7367
Virtualized Listening Tests for Loudspeakers
Hiekkanen, Timo; Karjalainen, Matti; Mäkivirta, Aki
The precise location of a loudspeaker in a listening room is known to affect loudspeaker preference ratings. When multiple loudspeakers are compared, the evaluation is limited by poor human auditory memory. To overcome these problems, a method to evaluate and compare loudspeakers using headphones is proposed. The method utilizes personal head-related transfer functions in rendering the sound field recorded in a standard listening room with an artificial head. Equalization of circumaural headphones and the artificial head are investigated. Formal listening tests are conducted to examine differences between the proposed binaural method and real loudspeakers in a standard listening room. Listening tests show that the virtualized loudspeakers can be nearly indistinguishable from reality in many but not all cases.

7368
Binaural Rendering in MDCT Domain for Multi-Object Audio Coding
Iizuka, Shinya; Kikuiri, Kei; Naka, Nobuhiko
We propose a binaural rendering method in the Modified Discrete Cosine Transform (MDCT) domain. It has good compatibility with audio codecs because a number of audio codecs utilize an MDCT filter bank for time-frequency transform. The proposal maps the MDCT coefficients to the real part of the Modulated Complex Lapped Transform (MCLT) coefficients and processes the amplitudes and phases according to the binaural information. The inverse MCLT is applied to the coefficients with a synthesis window function which is derived from the perfect reconstruction condition for the phase shifted signal under the assumption of a linear phase property. The proposed method is applicable to Binaural Cue Coding Type I and offers subjective quality equivalent to the original binaural signal.

7369
Room-Dependent Preference of Virtual Surround Sound
Roginska, Agnieszka; Scott, Frederick
A common method for simulating surround sound over headphones, so-called virtual surround sound, is the convolution of content information with binaural cues. Often, room information is included. This paper examines if using HRTFs with room impulse responses customized to the room the listener is in enhances the listening experience. Perceptual experiments were conducted to evaluate subjects’ preference of processing method based on the room they were seated in.

7370
Quantization of 2D Higher Order Ambisonics
Hellerud, Erik; Solvang, Audun; Svensson, U. Peter
The spatial distribution of the quantization noise for a 2-D Higher Order Ambisonics (HOA) signal is investigated analytically. Uniformly distributed loudspeakers radiating plane waves in a non-reverberant environment and frequency domain quantization are presumed. It is found that employing the same quantization interval for all orders leads to uniformly distributed quantization noise in space. Assigning a larger quantization interval (i.e. fewer bits) to higher orders leads to a radially increasing quantization noise. Matching the quantization error to the reproduction error at the near perfect reconstruction boundary suggests that as little as four bits per sample can be used for quantization. Furthermore, high-pass filtering the HOA components opens up for employing as little as three bits per sample. This quantization strategy seems very promising for reducing the rate of HOA.
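
To make the link between bits, quantization interval and noise power concrete, the sketch below quantizes uniformly distributed samples with a plain mid-rise uniform quantizer and compares the measured noise power with the textbook value of step squared over 12. This is a generic illustration, not the paper's HOA analysis; the function names, the signal range of [-1, 1) and the sample count are assumptions.

```python
import math
import random


def quantize(x, bits):
    """Mid-rise uniform quantizer for x in [-1, 1) using 2**bits levels."""
    step = 2.0 / (2 ** bits)
    return (math.floor(x / step) + 0.5) * step


def noise_power(bits, num_samples=100_000, seed=0):
    """Return (measured, theoretical) quantization noise power."""
    rng = random.Random(seed)
    step = 2.0 / (2 ** bits)
    total = 0.0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        err = quantize(x, bits) - x
        total += err * err
    return total / num_samples, step * step / 12.0


measured_4, theory_4 = noise_power(4)
measured_3, theory_3 = noise_power(3)
```

Dropping one bit doubles the interval and so raises the noise power by a factor of four (about 6 dB), which is the trade-off behind assigning fewer bits to the higher orders.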

7371
A Binaural Auditory Model for the Evaluation of Reproduced Stereophonic Sound
Lorho, Gaetan; Takanen, Marko
Binaural cues describing the differences in phase and power between the signals at the two ears enable our auditory system to localize sound sources and segregate spatially multiple auditory events. Recent publications on binaural auditory modeling have shown how the interaural coherence can be utilized to estimate these cues and therefore model the localization ability of our auditory system. The approach is exploited in this paper to estimate the binaural cues at different frequency bands and identify the direction of sound sources from signals recorded with a head and torso simulator. We illustrate the application of this binaural auditory model to evaluate sound reproduced by a stereophonic loudspeaker setup in terms of source localization and specific loudness.

7372
An Augmented Reality Audio Mixer and Equalizer
Karjalainen, Matti; Riikonen, Ville; Tikander, Miikka
In Augmented Reality Audio (ARA) applications the real sound environment of the user is extended with virtual objects. The real environment is reproduced as a pseudo-acoustic world via a special ARA headset that consists of binaural microphones and headphones. However, the headset causes coloration to the pseudo-acoustic representation. In order to make the headset acoustically transparent, equalization is needed. Digital equalization easily causes unacceptable delays. This paper presents a novel ARA mixer with real-time analog equalization to correct the coloration caused by the leakage through the headset and changed resonances in the closed ear canal.

7373
Sub-Band Adaptive Crosstalk Cancellation: A Novel Approach for Immersive Audio
Bettarelli, Ferruccio; Cecchi, Stefania; Palestini, Lorenzo; Peretti, Paolo; Piazza, Francesco
In the field of immersive audio, a crosstalk canceller is required when a virtual sound is rendered over two loudspeakers. In the last decade several adaptive algorithms have been proposed: nowadays the least mean square (LMS) algorithm seems to be the best compromise between simplicity and robustness, although its convergence is weakened for colored inputs. In this work a new approach for crosstalk cancellation based on a sub-band adaptive algorithm is derived. The effectiveness of this algorithm for colored input is presented in terms of matrix inversion quality and convergence rate, in comparison with the conventional LMS algorithm.
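
For readers unfamiliar with the baseline that this abstract compares against, the following is a minimal sketch of the conventional full-band LMS update, used here to identify a short unknown FIR filter from white noise. It is not the sub-band algorithm proposed in the paper, and it is not a crosstalk canceller; the function names, step size `mu` and the example filter are assumptions for illustration.

```python
import random


def fir(x, h):
    """Filter x with FIR coefficients h (zero initial state)."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def lms_identify(x, d, num_taps, mu):
    """Adapt num_taps weights so the filtered x tracks the desired signal d."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input sample first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y  # instantaneous error
        # LMS update: move the weights along the error-times-input direction.
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]
    return w


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
unknown = [0.5, -0.3, 0.1]  # hypothetical system to identify
w = lms_identify(x, fir(x, unknown), num_taps=3, mu=0.01)
```

With white input the weights settle quickly; with strongly colored input the same update converges more slowly, which is the weakness the sub-band approach is meant to address.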

7374
Analysis and Adjustment of Planar Microphone Arrays for Application in Directional Audio Coding
Ahonen, Jukka; Del Galdo, Giovanni; Kallinger, Markus; Küch, Fabian; Pulkki, Ville; Schultz-Amling, Richard
Directional Audio Coding (DirAC) is a well-established and efficient way to capture and reproduce a spatial sound event. In a recording room, DirAC requires four spatially coincident microphones to estimate the desired parameters, i.e., direction-of-arrival and diffuseness of sound: one omnidirectional and three figure-of-eight microphones pointing along the axes of a three-dimensional Cartesian coordinate system. In most consumer applications only two dimensional scenes need to be reproduced, implying that only two figure-of-eight microphones are required. Furthermore, instead of directional microphones, arrays of omnidirectional microphones are considered for economic reasons. Therefore, we investigate various two-dimensional microphone configurations with respect to their usability for DirAC. We derive theoretical limits for the correct estimation of both direction-of-arrival and diffuseness for the most suitable planar arrays. Furthermore, we suggest a way to equalize the systematic bias for the direction-of-arrival estimation, introduced by the discrete planar arrays.

7375
Planar Microphone Array Processing for the Analysis and Reproduction of Spatial Audio Using Directional Audio Coding
Ahonen, Jukka; Del Galdo, Giovanni; Kallinger, Markus; Küch, Fabian; Pulkki, Ville; Schultz-Amling, Richard
Recording and reproduction of spatial audio is becoming more and more important as multichannel audio applications gain increasing attention. Directional Audio Coding (DirAC) represents a well-proven approach for the analysis and reproduction of spatial sound. In the analysis part, the direction-of-arrival and the diffuseness of the sound field are estimated in subbands using B-format signals, which can be created with 3D omnidirectional microphone arrays. However, 3D microphone configurations are not practical in consumer applications, e.g., due to physical design constraints. In this paper, we propose a new approach which allows for an approximation of the required B-format signals but is based on a planar microphone configuration only. Comparisons with the standard DirAC approach confirm that the proposed method is able to correctly estimate the desired parameters within a wide frequency range and that the spatial resolution matches human perception.

7376
User-Dependent Optimization of Wave Field Synthesis Reproduction for Directive Sound Fields
de Vries, Diemer; Fröhlich, Bernd; Melchior, Frank; Sladeczek, Christoph
The use of wave field synthesis (WFS) enables the correct localization of virtual sources over a large listening area. While point sources outside the listening area can be accurately reconstructed, sources inside the listening area can be correctly perceived from certain positions only. To avoid these limitations, we continuously optimize the selection of the involved speakers and the signal processing in real-time based on a tracked user position. In addition, user tracking enables us to simulate a specific directivity of a source and to optimize a corresponding room simulation. Our pilot study confirmed that this approach significantly improves the localization and sound quality of focused sources located inside the listening area.

7377
Spatial Audio Object Coding (SAOC) - The Upcoming MPEG Standard on Parametric Object Based Audio Coding
Breebaart, Jeroen; Engdegård, Jonas; Falch, Cornelia; Hellmuth, Oliver; Hilpert, Johannes; Hoelzer, Andreas; Koppens, Jeroen; Oomen, Werner; Resch, Barbara; Schuijers, Erik; Terentiev, Leonid
Following the recent trend of employing parametric enhancement tools for increasing coding or spatial rendering efficiency, Spatial Audio Object Coding (SAOC) is one of the recent standardization activities in the MPEG audio group. SAOC is a technique for efficient coding and flexible, user-controllable rendering of multiple audio objects based on transmission of a mono or stereo downmix of the object signals. The SAOC system extends the MPEG Surround standard by re-using its spatial rendering capabilities. This paper will describe the chosen reference model architecture, the association between the different operational modes and applications, and the current status of the standardization process.

7378
Focusing of Virtual Sound Sources in Higher Order Ambisonics
Ahrens, Jens; Spors, Sascha
Higher order Ambisonics (HOA) is an approach to the physical (re-)synthesis of a given wave field. It is based on the orthogonal expansion of the involved wave fields formulated for interior problems. This implies that HOA is per se only capable of recreating the wave field generated by events outside the listening area. When a virtual source is intended to be reproduced inside the listening area, strong artifacts arise in certain listening positions. These artifacts can be significantly reduced when a wave field with a focus point is reproduced instead of a virtual source. However, the reproduced wave field only coincides with that of the virtual source in one half-space defined by the location and nominal orientation of the focus point. The wave field in the other half-space converges towards the focus point.

7379
Listener Envelopment – What Has Been Done and What Future Research Is Needed?
Berg, Jan; Nyberg, Dan
In concert hall acoustics, the perceived spatial impression and/or spaciousness are characterized by the two attributes apparent source width (ASW) and listener envelopment (LEV). For LEV there is no clear consensus across the results of previous work. This paper discusses the research performed on LEV and how the research results confirm or contradict each other. There is a consensus on the arrival angle of the later sound energy and its influence on LEV, whereas there is no clear agreement on the delay time and frequency content of the late reflections.

7380
Obtaining a Highly Directive Center Channel from Coincident Stereo Microphone Signals
Faller, Christof
Time-frequency based post-processing applied to a coincident stereo recording is proposed to generate an audio signal with a highly directive directional response pointing straight forward. Assuming an ideal coincident stereo microphone, the directional response of this center channel is effectively time and frequency invariant. Further, the look direction can be steered to left and right front directions. The technique is based on the insight that the signal which predicts left from right, modified by limiting the magnitude of the frequency domain prediction gains, has a center forward directional response. The center channel is generated using both a left-right and a right-left magnitude-limited-predictor signal. Applications of the proposed scheme include the use of stereo microphones as "digital steerable shotgun microphones" and center channel generation for music recording.

7381
Spatial Sound in the Use of Multimodal Interfaces for the Acquisition of Motor Skills
Hoffmann, Pablo
This paper discusses the potential effectiveness of spatial sound in the use of multimodal interfaces and virtual environment technologies for the acquisition of motor skills. Because skills are generally of multimodal nature, spatial sound is discussed in terms of the role that it may play in facilitating skill acquisition by complementing, or substituting, other sensory modalities. An overview of related research areas on audiovisual and audiotactile interaction is given in connection to the potential benefits of spatial sound as a means to improve the perceptual quality of the interfaces as well as to convey information considered critical for the transfer of motor skills.

7382
Evaluating the Sensation of Envelopment Arising from 5-Channel Surround Sound Recordings
Bech, Søren; George, Sunish; Rumsey, Francis; Zielinski, Slawomir
This paper discusses a series of listening tests conducted in the UK and Denmark to evaluate the perceived envelopment of surround audio recordings. The listening tests were designed to overcome some drawbacks (such as range equalisation bias) present in the scores of a listening test based on ITU-R Recommendation BS.1534-1 (MUSHRA) [1], [2]. In this method the listeners were asked to evaluate the envelopment of 5-channel surround sound recordings using a 100-point continuous scale. In order to calibrate the scale, two anchor recordings were used to define points 15 and 85 on the scale. The anchor recordings were selected by means of a formal listening test and interviews with the listeners. According to the obtained results, the proposed method provides repeatable results.

7383
An Improved Pattern-Matching Method for Piano Multipitch Detection
Blanco-Martín, Elena; Casajus-Quiros, Francisco Javier; Ortiz-Berenguer, Luis Ignacio
A previous method presented by the authors carried out multi-pitch piano sound identification by using a pattern-matching process. In that method, the identification required, besides the matching-metric calculation, both a spectral predetection process and a validation step. Predetection made it possible to select a subset of the eighty-eight patterns, whereas validation verified whether the detected notes were actually present in the analyzed spectrum. Both greatly increased the true-positive detection ratio, but they restricted the identification of complex real sounds (e.g., two-handed playing). This paper presents an improvement to the method that eliminates both predetection and validation by using a modified matching-metric algorithm. This work has been supported by the Spanish National Project TEC2006-13067-C03-01/TCM.

7384
Polyphonic Piano Transcription Based on Spectral Separation
Canadas-Quesada, Francisco Jesus; Carabias-Orti, Julio Jose; Mata-Campos, Raul; Ruiz-Reyes, Nicolas; Vera-Candeas, Pedro
We propose a discriminative model for polyphonic piano transcription. Spectral features are obtained individually for each note. To solve the overlapping partial problem, we apply spectral separation by estimating the spectral envelope for each note. For classification, support vector machines (SVM) are trained on the spectral energy inferred from these spectral features. We apply a scheme of one-versus-all (OVA) SVM classifiers to discriminate frame-level note instances. To decrease the residual energy that higher notes receive from partials shared with lower notes, a method to cancel the interference from the lower notes on the higher notes has been developed. The classifier output is filtered with a hidden Markov model. Our approach has been tested with synthesized and real piano recordings, obtaining very promising results.

7385
Towards a Real-Time Implementation of a Physical Modelling Based Percussion Synthesizer
Chuchacz, Katarzyna; O'Modhrain, Sile; Woods, Roger
This paper presents work carried out with the objective of designing a novel percussion synthesizer based on a physical model of a plate-based percussion instrument. The algorithm has been implemented in real time for the first time, on a field-programmable gate array (FPGA) chip, allowing a number of parameters, such as excitation value, stroke location, and plate stiffness, to be changed in real time. This presents the player with a number of new modes of playability but requires the definition and design of a flexible interface that gives extensive access to the sound world of the synthesis model. Details of the hardware implementation architecture are put forward as well as fixed point/floating point computation aspects that impact the instrument’s playability.

7386
Dual Noise Suppression in Hearing Aids
Boone, Marinus M.; Schlesinger, Anton
A twofold approach for the enhancement of speech intelligibility in hearing aids is presented. The improvement of the signal-to-noise ratio is fundamental to the success of acoustic hearing aids for the majority of hearing-impaired people, who suffer from a sensorineural hearing loss. Our approach reduces background noise in order to improve speech intelligibility. The suppression of noise is achieved by a combination of optimal beam-forming and computational auditory scene analysis. In terms of objective metrics for assessing speech intelligibility, the associated processing shows advantages compared to either of the underlying methods alone.

7387
Automatic Sound Recognition for Security Purposes
Zwan, Pawel
In the paper an automatic sound recognition system is presented. It forms a part of a larger security system developed in order to monitor outdoor conditions for non-typical audio-visual events. The analyzed audio signal is recorded from a microphone mounted outdoors, so non-stationary noise of significant energy may be present in it. A specially designed algorithm for outdoor noise reduction is presented; non-typical events in the audio stream are automatically detected and parameterized. Parameter values of various audio events are analyzed and sounds are automatically recognized. The automatic recognition accuracy obtained for various feature vectors and some chosen recognition systems is compared. Conclusions are drawn and a plan for future experiments is proposed.

7388
Multipitch Estimation of Harmonically-Related Event-Notes by Improving Harmonic Matching Pursuit Decomposition
Canadas-Quesada, Francisco Jesus; Carabias-Orti, Julio Jose; Mata-Campos, Raul; Ruiz-Reyes, Nicolas; Vera-Candeas, Pedro
In this work, we propose a note detection approach based on harmonic matching pursuit (HMP) and specifically designed to detect simultaneous notes. However, HMP is not able to decompose harmonic sounds into different harmonic atoms when their fundamental frequencies are harmonically related. To solve this problem, we propose an algorithm, called atomic spectral smoothness (SS), that works over the harmonic atoms obtained by HMP. This algorithm is based on the spectral smoothness principle, which supposes that the spectral envelope of a harmonic sound usually forms smooth contours. Our proposal shows promising results for polyphonic musical signals with two harmonically related note-events.

7389
Amplitude Modification Algorithms Within the Framework of Physical Modeling and of Haptic Gestural Interaction
Cadoz, Claude; Kontogeorgakopoulos, Alexandros
Every underlying technique that has been used for the realization of audio effects since the beginning of electronic and computer music introduced different types of sound modifications and proposed new ways of control. The advent of digital signal processing has stimulated audio processing researchers to a great extent, and thus a variety of algorithms were designed to provide novel sound modifications. On the other hand, physical modeling and digital simulation formalisms have been used principally for the mere imitation and emulation of older sound processing systems. The aim of this article is to propose three physical models conceived to offer sound modifications which mainly alter the amplitude of audio signals. The originality of this case is not the resulting audio modifications but their transposition into the framework of physical modeling and digital simulation, which outlines an alternative control procedure.

7390
Circular Pitch Space Based Harmonic Change Detection
Arndt, Daniel; Brandenburg, Karlheinz; Gatzsche, Gabriel; Mehnert, Markus
This paper introduces a novel method for detecting harmonic boundaries in musical audio signals. These boundaries are very useful for chord analysis because they define temporal chord limits. This harmonic-change-driven (event-driven) analysis of musical audio signals is a better basis for subsequent chord analysis than the traditional frame-based-only concept. The method itself works with circular pitch spaces (CPS). CPSs summarize high-level aspects of the audio signal such as semantic and music-theoretical relationships. Using CPSs yields good results in detecting harmonic changes.

7391
Circular Pitch Space Based Musical Tonality Analysis
Arndt, Daniel; Brandenburg, Karlheinz; Gatzsche, Gabriel; Mehnert, Markus
The focus of this paper is to give an overview of existing circular pitch spaces, their special properties, and their application to semantic audio analysis. Besides this, the symmetry model is proposed as a framework to describe the inter-model relationships between different circular pitch spaces. Similar to color spaces in vision, musical pitch spaces organize pitches in a way that semantic/cognitive/theoretical/physical relationships between tones become geometrically apparent. In recent years pitch spaces have mainly been a subject of music theory, but they are becoming more and more interesting for semantic analysis of musical audio signals. Pitch spaces can be applied to key and chord recognition, similarity calculation of musical pieces, genre estimation, tension analysis, or harmonic change detection.

7392
Drift, Wow and Flutter Measurement and Reduction in Shrunken Movie Soundtracks
Czyzewski, Andrzej; Kupryjanow, Adam; Maziewski, Przemyslaw
The paper presents the method and algorithms used to determine and reduce drift, wow and flutter in shrunken movie tapes. The idea behind the algorithms is to use image processing for calculating the local tape shrinkage, which is one of the reasons for drift, wow and flutter. The shrinkage can be calculated by analyzing the image height of a movie frame, sprocket hole, pitch, or another standardized movie tape element, and it can then be expressed as the drift, wow and flutter characteristic. Once the characteristic has been determined, both the soundtrack and the movie frames can be corrected. The paper describes the image-based drift, wow and flutter determination method and the experiments confirming the theoretical findings.

7393
The Norwegian Institute of Recorded Sound: From Collection to Archive to Public Private Partnership
Drews, Mark; von Arb, Jacqueline
In 2006, the Norwegian Institute of Recorded Sound (NIRS) entered into a partnership with Memnon Audio Archiving Services to form MemNor, a commercial audio archiving service based in Stavanger, Norway. This paper traces the evolution of the Norwegian Institute of Recorded Sound from a private collection of music recordings to a municipally funded audio archive to a public private partnership and discusses the past, the current, and the future challenges involved. Details of ongoing activities are included.

7394
Cable-Free Audio Delivery for Home Theater Entertainment Systems
Floros, Andreas; Grimanis, Dimitris; Mourjopoulos, John; Tatlas, Nicolas-Alexander
Real time, multichannel audio content delivery over the air is expected to significantly simplify the interconnection complexity required for setting up typical home theater applications. However, despite the technological advantages of wireless networking standards related to high transmission rates and Quality-of-Service support, a number of issues have to be additionally addressed, such as multiple loudspeaker synchronization and packet delay/losses containing compressed quality and multiplexed audio data. In this work, further developments in the area of wireless audio delivery are presented by considering in detail multichannel reproduction for wireless home theater applications. Using both subjective and objective performance evaluation criteria, it is shown that cable-free multichannel audio playback is feasible under specific networking and audio coding conditions.

7395
Adaptive Playout for VoIP Based on the Enhanced Low Delay AAC Audio Codec
Färber, Nikolaus; Issing, Jochen; Lutzky, Manfred
The MPEG-4 Enhanced Low Delay AAC (AAC-ELD) codec extends the application area of the Advanced Audio Coding (AAC) family towards high quality conversational services. Through the support of the full audio bandwidth at low delay and low bit rate, it offers excellent support for enhanced VoIP applications. In this paper we provide a brief overview of the AAC-ELD codec and describe how its codec structure can be exploited for IP transport. The overlapping frames and excellent error concealment make it possible to use frame insertion/deletion in order to adjust the playout time to varying network delay. A playout algorithm is proposed which estimates the jitter on the network and adapts the size of the de-jitter buffer in order to minimize buffering delay and late loss. Considering typical network conditions and the same average delay, it is shown that the playout algorithm can reduce the loss rate by more than one order of magnitude compared to fixed playout.

7396
Time-Alignment of Multi-Way Speakers with Group Delay Equalization - I
Bharitkar, Sunil; Holman, Tom; Kyriakakis, Chris
In this paper, the first of two parts, a technique for time-aligning the driver responses (viz., woofer, mid-range, and tweeter responses) in a multi-way speaker system is presented. Generally, woofers exhibit a much larger time-of-arrival delay, at a listening position, compared to the mid-range and high-frequency drivers. Moreover, the time-of-arrival delay for all drivers is frequency dependent, exhibiting a large variation over the audible frequency range. Due to these differences, a two-part study was undertaken to understand the effects of these variations, quantitatively and qualitatively. In this first part, we present the motivation behind the system used for applying all-pass filters to process audio signals being delivered to the multi-way speaker and propose a time-delay difference equalization technique. We show that applying all-pass filters results in significant "temporal smearing" of the response, despite flattening of the group delay response. Thus, depending on the amount of group-delay equalization, the smearing with pre-ring effects could potentially have audible effects depending on the content. However, despite the temporal smearing (viz., response dilation in time) for an arbitrary-order all-pass filter, we show that the time-frequency characteristics of these group-delay equalizing filters exhibit a uniform decay rate at all frequencies, allowing group-delay equalization without affecting the modal decay rates. This enables other cascaded filter structures to be utilized for modal equalization in addition to conventional loudspeaker-room equalizers. We also propose group-delay flattening for the woofer and a small range of the midrange frequencies through a weighted approach at the lower frequencies for group-delay equalization.
Future work will involve investigations using perceptually motivated variable-octave complex smoothing of responses (1/24-octave smoothing at low frequencies and 1/3-octave at higher frequencies), and designing all-pass filters based on this phase-smoothed data. Quantitative results obtained will be presented in this paper, whereas the next part of the two-part paper will present results from listening tests.

7397
Singing Voice Separation Combining Panning Information and Pitch Tracking
Cobos, Maximo; López, José J.
Source separation techniques applied to music mixtures are able to extract relevant information that can be very useful for many applications, such as music remixing and reprocessing, lyrics recognition, or music information retrieval. Among all the sources present in modern music, the singing voice is of special interest because it is the only one that combines music, lyrics, and expression. In this paper, we propose a system designed for extracting the singing voice from stereo recordings in several steps. This system combines panning information and pitch tracking, allowing the time-frequency mask applied for extracting a vocal segment to be refined and thus improving the separation. An application example is discussed.

7398
The Downsampling Dilemma: Perceptual Issues in Sample Rate Reduction
Leonard, Brett
Many options currently exist for sample rate conversion. With sample rate reduction playing an integral part in the modern production world, downsampling algorithm quality is more important than ever. This paper presents data exploring the differences in sample rate reduction algorithms. While certain tests clearly display differences in the quality of the algorithms, listening test data shows the average listener is unable to repeatedly discern the difference in sample rate reduction methods.

7399
NU-Tech: The Entry Tool of the hArtes Toolchain for Algorithms Design
Bettarelli, Ferruccio; Cecchi, Stefania; Lattanzi, Ariano
The aim of the hArtes project is to facilitate and automate the rapid design and development of heterogeneous embedded systems, targeting a combination of a general purpose embedded processor, digital signal processing, and reconfigurable hardware. In this paper, we present the NU-Tech platform, the main entry tool of the hArtes toolchain, which has the role of assisting designers in tuning and possibly improving the input algorithm at the highest level of abstraction. A brief description of the project itself will be given and its orientation toward audio highlighted through a case study application.

7400
Recovery of Missing Signals Utilizing GHA (Generalized Harmonic Analysis)-Applied Interpolation
Ifukube, Tohru; Miura, Takahiro; Muraoka, Teruo
For archiving damaged historical recordings, recovery of missing portions is as essential as noise reduction. Conventional countermeasures using functional interpolation are not effective when the missing interval is long. The inharmonic frequency analysis GHA is useful for this purpose, because the signal recomposed from the frequency components obtained by GHA exhibits a very long period. The length of the period is given as an inverse number of least common multiple of the rendered inharmonic frequency's periods. This feature is very advantageous for signal recovery, and the authors devised an extrapolation that simply extends a re-synthesized waveform obtained through GHA analysis/re-synthesis. The authors obtained satisfactory results on the whole by applying interpolation combined with forward and backward extrapolations based upon the above-mentioned method. Results of recovery depend highly upon the character of the signals (such as music), and the authors did not find definite rules for setting GHA's analyzing conditions. These were determined through auditory examination this time.

7401
Combination of Warped and Linear Filter Structures for Loudspeaker Equalization
López, José J.; Pueo, Basilio; Ramos, German
Warped filters were introduced years ago for loudspeaker equalization in order to solve the lack of resolution of linear filters at low frequencies, and also to follow the frequency resolution of psycho-acoustic scales like the Bark scale, which behave more logarithmically than linearly. However, this improvement in frequency resolution at low and mid frequencies comes at the expense of losing resolution at high frequencies and increasing the complexity of the filter and the computational cost of its implementation. In this paper a combination of linear and warped filter structures, previously developed by the authors for FIR filters, is presented with new contributions and extended to IIR filters. This combination saves computational cost and obtains a proper frequency resolution across the whole frequency band, obtaining better results for the same computational cost than when using linear or warped filters alone. The results have been subjectively tested using the ABX methodology with successful results. The presented filter structures, methodology, and apparatus to do the filtering are patent pending.

7402
Multi-Channel Dereverberation System Using Modified Correlation-Based Blind Deconvolution and Multi-Microphone Spectral Subtraction
Jeong, Jae-woong; Lee, Seok-Pil; Park, Young-cheol; Youn, Dae-hee
This paper presents a new multi-channel dereverberation system combining a modified correlation-based blind deconvolution with the multi-microphone spectral subtraction. In the proposed system, we make M combinations of observed signals and apply them to the correlation-based blind deconvolution. The deconvolved signals are then used as inputs to the multi-microphone spectral subtraction. The proposed system estimates the reverberant energy by using both a frame delay and a frequency-dependent weight. Due to accurate estimation of the reverberant energy, the combination of the correlation-based blind deconvolution with the multi-microphone spectral subtraction provides improved dereverberation performance. Performance improvement of the proposed system has been confirmed through experiments.

7403
Harmonic and Intermodulation Analysis of Nonlinear Devices Used in Virtual Bass Systems
Gan, Woon-Seng; Oo, Nay
Nonlinear devices (NLDs) are used in virtual bass systems. NLDs generate harmonics, which in turn create the pitch perception and are used in audio bass enhancement systems based on psychoacoustics. This paper presents the mathematical derivations and analyses of five different NLDs, together with intermodulation analysis of the harmonics generated by these NLDs. The five NLDs are the half-wave rectifier, full-wave rectifier, square wave, polynomial function, and exponential function. The derivations of the harmonic analysis equations are based on Fourier theorems, Chebyshev polynomials, and Taylor series expansions. Besides the harmonics, intermodulation components also result from NLDs. Both mathematical analysis and simulation results are presented for the intermodulation effects of harmonics generated by NLDs.

7404
Speech Quality Measurement for the Hearing Impaired on the Basis of PESQ
Beerends, John; Eneman, Koen; Huber, Rainer; Krebber, Jan; Luts, Heleen
One of the research topics within the HearCom project, a European project that studies the impact of hearing loss on communication, is to find methods with which the speech quality as perceived by the hearing impaired can be measured objectively. ITU-T Recommendation P.862 PESQ and its wideband extension P.862.2, are obvious candidates for this despite the fact that they were developed for normally hearing subjects. This paper investigates the extent to which PESQ and possible simple extensions can be used to measure the quality of speech signals as perceived by hearing impaired subjects.

7405
Subjective Evaluation of Speech Quality in a Conversational Context
Gautier-Turbin, Valérie; Geissner, Emilie; Gros, Laetitia; Gueguin, Marie
Within the framework of ITU-T, an objective conversational model is developed to predict the impact of network impairments on the conversational quality experienced by an end user. To train and validate such a model, subjective scores are required. Assuming that a conversation is made of talking, listening, and interaction activities, a subjective test protocol is specially designed to take into account these multidimensional aspects of the speech quality in a conversation. Subjects are asked to evaluate speech quality in talking, listening, and conversational contexts separately during three successive tasks. The analyses of several tests show that this method is valid for the assessment of listening, talking, and conversational quality.

7406
Contribution of Interaural Difference to Obstacle Sense of the Blind During Walking
Ifukube, Tohru; Ino, Shuichi; Miura, Takahiro; Muraoka, Teruo
Most blind people can recognize, to some degree, objects existing around them by hearing alone. This ability is called "obstacle sense" or "obstacle perception". It is known that this ability is facilitated while the subjects are moving; however, the exact reason for the facilitation has been unknown. It is apparent that some differences between the sounds reaching the two ears change significantly while approaching the obstacles. We focused on this phenomenon, called interaural difference, in order to analyze the facilitation mechanism of the obstacle sense. We investigated how the interaural differences change depending on the head rotation while walking and then measured the DL (Difference Limen) of the interaural difference. Furthermore, we compared the measurement data and the DL with the relationship between the subject-to-obstacle distance, and then discussed one of the factors facilitating the obstacle sense.

7407
The Accuracy of Localizing Virtual Sound Sources: Effects of Pointing Method and Visual Environment
Goupell, Matthew; Laback, Bernhard; Majdak, Piotr; Mihocic, Michael
The ability to localize sound sources in 3D-space was tested in humans. The subjects listened to noises filtered with subject-specific head-related transfer functions. In the experiment using naïve subjects, the conditions included the type of visual environment (darkness or structured virtual world) presented via head mounted display and pointing method (head and manual pointing). The results show that the errors in the horizontal dimension were smaller when head pointing was used. Manual pointing showed smaller errors in the vertical dimension. Generally, the effect of pointing method was significant but small. The presence of a structured virtual visual environment significantly improved the localization accuracy in all conditions. This supports the benefit of using a visual virtual environment in acoustic tasks like sound localization.

7408
Perceived Spatial Distribution and Width of Horizontal Ensemble of Independent Noise Signals as Function of Waveform and Sample Length
Hirvonen, Toni; Pulkki, Ville
This paper investigates the perceived sound distribution and the width of a horizontal loudspeaker ensemble as a function of signal length, as all speakers emit simultaneous, white Gaussian noise bursts. In Experiment 1, subjects indicated the perceived distribution of 10 frozen cases where signal length was 2.5 ms. In Experiment 2, two cases from the previous test were investigated with signal lengths of 5-640 ms. The results indicate that 1) ensembles consisting of different short noise bursts vary in perceived distribution between cases and 2) when the length of the signal is increased, the produced sound event is generally perceived as wider. In perceiving such cases, the hearing system possibly utilizes some temporal integration and/or adaptive processes.

7409
Effect of Minimizing Spatial Separation and Melodic Variations in Simultaneously Presented Two-Syllable Words
Allan, Jon; Berg, Jan
This study will examine two important factors in the concept of auditory streaming defined by Bregman: pitch and localization. By removing one or both of these factors as possible identifiers to separate sound sources, the importance of each of them and the effect of reducing both of them will be studied. Stimuli with combinations of two-syllable words will be presented simultaneously through loudspeakers to subjects and the number of correct identifications will be measured. In one category of stimuli, speech melody will be removed and replaced with a monotonous pitch, equal for all words. One category will have all words presented from one speaker only. A significant effect was found for pitch as a factor for successful segregation of words. Conclusions will be related to earlier studies and common theories, the cocktail party effect among others.

7410
Characterization of the Multidimensional Perceptive Space for Current Speech and Sound Codecs
Etame, Thierry; Faucon, Gérard; Gros, Laetitia; Le Bouquin Jeannes, Régine; Quinquis, Catherine
The purpose of our work is to produce a reference system that can simulate and calibrate degradations of speech and audio codecs which are currently used on telecommunications networks, for subjective assessment tests of voice quality. At first, 20 wideband codecs are evaluated through subjective tests with the general goal of producing the multidimensional perceptive space underlying the perception of current degradations. Then, from a verbalization task, it appears that the identified attributes are clear/muffle, high-frequency noise, noise on speech and hiss. Finally, these dimensions are characterized with correlates such as spectral centroid, spectral flatness measure, Mean Opinion Score and correlation coefficient.

7411
An Automatic Maximum Gain Normalization Technique with Applications to Audio Mixing
Reiss, Joshua D.; Perez Gonzalez, Enrique
A method for real-time magnitude gain normalization of a changing linear system has been developed and tested with a parametric filter design. The method is useful in situations where the maximum gain before feedback is needed. The method automatically calculates the appropriate gain that should be applied in order to maintain maximum unitary gain. The method uses an impulse measurement of a mathematical model of the system to be normalized. This is particularly useful for mixing engineers, who have to continually revise their gain structure in order to maximize gain before feedback. The system is also useful in many other situations where deriving an analytical solution from the mathematical model is not possible.

7412
An Alternative Approach for the Convolution in Time-Domain: The Taches-Algorithm
Millot, Laurent; Pelé, Gérard
We present an alternative temporal approach for convolution, providing a new algorithm, called the taches-algorithm. Based on interferences between the successive delayed and amplified output signals associated respectively with the impulses constituting the input signal, the taches-algorithm can give immediate access to the new output sample and has a low-latency response even without using vector-based optimisation of the calculation. With the taches-algorithm it seems easy to change the impulse response (even in real time) while running the calculation, simply by updating the impulse response so that it is used for the next samples, a task rather difficult to achieve using FFT convolution. Real-time audio demonstrations, notably using Pure Data, and simple explanations of the taches-algorithm will be given.

7413
Performance of Independent Component Analysis When Used to Separate Competing Acoustic Sources in Anechoic and Reverberant Conditions
Kendrick, Paul; Shirley, Ben
A review of existing methods for independent component analysis (ICA) was carried out and a series of experiments conducted assessing the use of existing ICA methods to separate microphone sources in varied acoustic environments. Specifically, the research looked at how effectively ICA could perform in a broadcast context using standard microphone techniques such as spaced omni and coincident crossed cardioid pairs. Experiments were carried out in an anechoic chamber and also in a listening room conforming to the ITU-R BS.1116-2 standard. Comparisons showed a large variance in the performance of different ICA algorithms, and results clearly indicate the limitations of ICA when performed on audio material recorded in a reverberant environment; however, it was still shown to be possible to achieve separation of signals of up to 12 dB even in these conditions.

7414
A Cross-Platform Audio Signal Processing Environment for Real-Time Audio Algorithm Development
Hämäläinen, Matti; Ristimäki, Mika; Turku, Julia; Väänänen, Riitta
This paper presents a real-time audio algorithm development environment for experimental audio system research. The backbone of the system is the Pure Data audio signal processing platform, which enables flexible implementation of real-time audio systems. With the proposed development environment the user can concentrate on real-time audio algorithm development and performance evaluation in the workstation environment. We present the proposed algorithm design method and environment, and its application to the development of an experimental VoIP system.

7415
New Enhancements to the Automatic Noise Removal (ANR) System Utilizing Improved Noise Statistics and Multi-Band Processing
E.V., Harinarayanan; Ferreira, Anibal; Saeed, Shamail; Sinha, Deepen
We recently introduced a novel Automatic Noise Reduction (ANR) algorithm for the removal of wideband stationary/non-stationary noise from audio [1]. Current noise reduction techniques exhibit certain undesirable characteristics. Distortion and/or alteration of the audio characteristics is a common problem. User intervention in identifying the noise profile is sometimes necessary. ANR uses a novel framework employing dominant component subtraction and restoration and performs better than conventional techniques in subjective tests. Here we describe three enhancements to ANR. The first of these increases the level of noise removal for the special case of stationary background noise. The second is a new tool for improving the temporal envelope coherence and yields additional noise removal. The third is a multi-band processing tool for conditioning time-frequency envelope for reduced listener fatigue.

7416
A Channel Vocoder Using Wavelet Packets on a Reconfigurable Device
Salvador Castañeda, César Daniel
A channel vocoder using wavelet packets for computer music applications is proposed. The input audio signals are a modulating voice and a carrier melody. The wavelet packet channel vocoder transforms windowed frames of both signals to a subband domain, mixes the melody with the voice envelope, and transforms the result back to the original domain. Design is performed with Matlab/Simulink tools, and real-time implementations are carried out with Pure Data and a Virtex II Pro FPGA board. Appropriate choices of frame length, wavelets, decomposition levels, and envelope detector filter are proposed to achieve good quality sound effects. Finally, guidelines to improve transmission and compression rates in future work are suggested.

7417
The Effects of Lossy Audio Encoding on Genre Classification Tasks
Casey, Michael; Fields, Ben; Jacobson, Kurt; Sandler, Mark
In large audio collections, it is common to store audio content using perceptual encoding. However, encoding parameters may vary from collection to collection or even within a collection - using different bit rates, sample rates, codecs, etc. We evaluate the effect of various lossy audio encodings on the application of audio spectrum projection features to automatic genre classification tasks. We show that decreases in mean classification accuracy, while small, are statistically significant for bit rates of 96 kbps or lower. Also, a heterogeneous collection of audio encodings shows statistically significant decreases in mean classification accuracy compared to a pure PCM collection.

7418
Loop Region Detection in Music Signals
Ong, Bee Suan; Streich, Sebastian
Spotting loops within a music recording seems to be an easy task for human listeners. Nevertheless, it becomes highly time- and effort-consuming when loop segments are to be identified in a large music collection. The process can be greatly facilitated with an audio editing tool that highlights regions where loops appear and suggests the corresponding loop durations. This paper proposes a method for computing both types of information from the music signals. Our approach is based on identifying sequential and regular repetitions of tonal features. In addition, we present a prototype implementation featuring the proposed method to facilitate the audio browsing and searching process. Finally, we discuss other possible applications of this technology in the audio content description context.

7419
Music-Inspired Harmony Search Algorithm Applied to Feature Selection for Sound Classification in Hearing Aids
Alexandre, Enrique; Álvarez, Lorena; Amor, Javier; Gil-Pita, Roberto; Huerta, Ester
This paper explores the application of the music-inspired Harmony-Search algorithm to the problem of feature selection for sound classification in digital hearing aids. The importance of this problem is given by the strong computational constraints inherent to the DSPs used in modern digital hearing aids. The goal of the feature selection algorithm is to select a subset of features in order to reduce the computational complexity of the system while maintaining a low probability of error. A set of experiments will be performed to test the performance of the proposed system, using a total of 74 different features. The results will be compared with those obtained using other widely-used algorithms, such as sequential search algorithms or random search.

7420
Analysis of the Effects of Finite Precision in Sound Classifiers for Digital Hearing Aids
Alexandre, Enrique; Álvarez, Lorena; Amor, Javier; Gil-Pita, Roberto; Huerta, Ester
This paper deals with the analysis of quantisation effects in an automatic sound classification system for DSP-based hearing aids. The results obtained in this work will be used to find out the impact of finite accuracy determined by the digital signal processor (DSP) on the users of hearing aids. The DSP has a finite word length that affects the main ability of these systems: the automatic adaptation to the changing acoustic environment. The goal of this work is to model a quantized neural-network-based classifier in order to compare the probability of error obtained with that of non-finite-precision systems.

7421
A Constructive Algorithm for Multilayer Perceptrons for Speech/Non-Speech Classification in Hearing Aids
Alexandre, Enrique; Álvarez, Lorena; Cuadra, Lucas; Rosa-Zurera, Manuel; Vicen-Bueno, Raúl
Constructive learning algorithms offer an attractive approach for the incremental construction of near-minimal neural-network architectures for pattern classification. This paper explores the feasibility of using a constructive algorithm for multilayer perceptrons (MLPs) applied to the problem of speech/non-speech classification in hearing aids. When properly designed and trained, MLPs are able to generate an arbitrary classification frontier with a relatively low computational complexity. The paper will focus on the design of a constructive algorithm for MLPs which attempts to converge to the minimum complexity network for the given problem. The results obtained will be compared with those cases in which the constructive algorithm is not considered.

7422
Seeing the Inaudible. Descriptors Used for Generating Objective and Reproducible Data in Real-Time for Musical Instrument Playing Standard Situations
Grosshauser, Tobias; Schwarz, Diemo
This article describes a method to generate objective and reproducible data to assist instrument teaching and practicing. The method is based on using audio descriptors and their efficient visualisation that assist in the perception of musical parameters difficult to hear. To aid comparison, we defined and recorded a comprehensive database of positive and negative sound examples from the violin that encompasses frequent mistakes made by students and a wide variety of playing styles.

7423
Structural Segmentation of Music Using Set Accented Tones
Coyle, Eugene; Dorran, David; Gainza, Mikel; Kelly, Cillian
An approach which efficiently segments Irish Traditional Music into its constituent structural segments is presented. The complexity of the segmentation process is greatly increased due to melodic variation existent within this music type. In order to deal with these variations, a novel method using ‘set accented tones’ is introduced. The premise is that these tones are less susceptible to variation than all other tones. Thus, the location of the accented tones is estimated and pitch information is extracted at these specific locations. Following this, a vector containing the pitch values is used to extract similar patterns using heuristics specific to Irish Traditional Music. The robustness of the approach is evaluated using a set of commercially available Irish Traditional recordings.

7424
AnClaS3: A Blackboard-Based Cooperative Framework for Sound Separation
Degara-Quintela, Norberto; Pena, Antonio; Sobreira-Seoane, Manuel; Torres-Guijarro, Soledad
Blackboard modelling provides a great flexibility in structuring complex problems and a robust adaptation to the conditions of the signal to be analyzed, adding both bottom-up and top-down capabilities to the system. AnClaS3 (Analysis, Classification and Synthesis for Sound Separation) is a cooperative project where five research groups collaborate integrating algorithms and developing new separation methods. This contribution defines a blackboard-based framework where four blackboard-based systems interact to integrate the expertise of independent research groups in order to solve a sound separation problem.

7425
Analysis and Synthesis of Audio Vibrato Using Harmonic Sinusoids
Sandler, Mark; Wen, Xue
This paper introduces the analysis and synthesis of vibrato in music audio. The analyzer separates frequency modulators from their carriers using a demodulation process. It then describes the frequency variations of a vibrato using a period-synchronized parameter set, and the accompanying amplitude variations using a source-filter model, both of which can be regarded as slowly varying. The synthesizer, on the other hand, reconstructs a vibrato from a given set of parameters. Using this system we are able to retrieve specific characteristics of vibratos, or modify them to implement various audio effects.

7426
Distortion Analysis and Reduction for the Parametric Array
Gan, Woon-Seng; Ji, Peifeng; Tan, Ee Leng; Yang, Jun
In this paper, distortion analysis and reduction for the parametric array loudspeaker are presented. The parametric loudspeaker has been found useful in generating a highly directional sound beam. However, due to the nonlinear interaction of ultrasonic waves in air, several undesired harmonic distortions are generated. Conventional approaches to reducing the distortion have not produced satisfactory solutions. A new approach capable of further reducing the distortion is proposed in this paper. Several simulations are carried out in this work to test the effectiveness of the proposed solution and compare it with conventional approaches.

7427
Piano "Forte Pedal" Analysis and Detection
Badeau, Roland; Bertin, Nancy; David, Bertrand; Schutz, Antony; Slock, Dirk
In this paper, we describe some features of the Forte Pedal piano effect and propose a method for detecting it through signal analysis. The detection method is applied to single tones recorded for this purpose. The Forte Pedal is found to increase the decay time of partials; in fact, this effect dominates the behavior of the partials, not only in their duration but also in their evolution. When the sustain pedal is used, a noise floor appears for all the notes of the piano. Here, after the analysis of some relevant characteristics, we provide a method based on harmonic plus noise decomposition to analyse the residual and decide whether the pedal is pressed or not.

7428
Diffusing Boundary Implementations in the 2-D Digital Waveguide Mesh
Murphy, Damian; Shelley, Simon
The digital waveguide mesh is a wave-based time-domain approach to the simulation of sound wave propagation in an acoustic system. The implementation of diffuse reflection is an important consideration in such an application, as the presence of diffuse reflection has a significant effect on an acoustic environment. The scattering effect of diffuse boundaries on reflected sounds, both in simulation and the real world, can be described using a technique that results in the formulation of frequency dependent diffusion coefficients. In this paper, a number of different approaches to modelling diffuse reflection in a 2-D digital waveguide mesh are presented as well as a detailed analysis and comparison of the local scattering effect of the diffuse boundary models using this technique.

7429
RenderAIR – Room Acoustics Simulation Using a Hybrid Digital Waveguide Mesh Approach
Beeson, Mark; Moore, Alastair; Murphy, Damian; Shelley, Simon; Southern, Alexander
The digital waveguide mesh (DWM) is a numerical simulation technique used to model signal propagation through a regular grid of spatio-temporal sampling points, and has been demonstrated as appropriate for modelling the acoustics of an enclosed space, particularly at low frequencies. The RenderAIR DWM application allows intuitive definition of parameters associated with geometry, boundary surface, and source/receiver parameters, required to generate spatially encoded Room Impulse Responses (RIRs). In this paper the expectations and limitations of DWM-based room acoustics modelling are explored through the use of the RenderAIR application in a number of situations. ISO3382 metrics are used as the main benchmark for the results obtained, which compare well with both real-world measurements and more traditional geometric acoustic approaches.

7430
Modelling Frequency-Dependent Boundaries as Digital Impedance Filters in FDTD Room Acoustic Simulations
Kowalczyk, Konrad; van Walstijn, Maarten
This paper presents a new method for modelling frequency-dependent boundaries in finite difference time domain (FDTD) and Kirchhoff variable digital waveguide mesh (K-DWM) room acoustics simulations. The proposed approach allows direct incorporation of a digital impedance filter (DIF) in the multi-dimensional (i.e., 2D or 3D) FDTD boundary model of a locally reacting surface. An explicit boundary update equation is obtained by carefully constructing a suitable recursive formulation. The method is analysed in terms of pressure wave reflectance for different wall impedance filters and angles of incidence. Results obtained from numerical experiments confirm the high accuracy of the proposed digital impedance filter boundary model, the reflectance of which closely matches locally reacting surface (LRS) theory. Furthermore, a numerical boundary analysis (NBA) formula is provided as a technique for analytic evaluation of the numerical reflectance of the proposed digital impedance filter boundary formulation.

7431
Commercial Low Frequency Absorbers - A Comparative Study
Hauser, Gabriel; Noy, Dirk; Storyk, John
This paper ties in to a previous Convention Paper by the same authors (AES 115th Convention, 2003, #5944, [1]) and presents a current set of commercially available passive and active low frequency absorbing devices. One item in particular is of an experimental nature – a wood box loaded with conventional membrane loudspeakers. These are not connected to an amplifier, but to a variety of different passive electronics networks (parallel, serial). Reproducible acoustical measurements have been taken in a completely untreated rectangular concrete room, sequentially with and without a total of eight different absorbing devices. Results are compared and conclusions are presented.

7432
Volumetric Diffusers
Angus, Jamie; Cox, Trevor; Hughes, Richard; Umnova, Olga
Although many types of diffusers have been proposed, they are predominantly surface treatments. This paper places the diffuser in the volume of the room rather than on the surfaces, forming a volume-based diffuser. In particular, we examine suitable sequences for their implementation. We also consider suitable metrics to evaluate their performance. First, single-layer volumetric diffusers are examined, and then multi-layer volumetric diffusers are investigated. In particular, the effects of varying the spacing and the number of layers are examined more closely. The Boundary Element Method (BEM) model is used to gain accurate predictions of the diffuser’s performance. Finally, we demonstrate a diffusion structure that has a similar performance to that of a Primitive Root Diffuser (PRD).

7433
Loudspeaker Time Alignment Using Live Sound Measurements
Ahnert, Wolfgang; Feistel, Stefan; Maier, Thorsten; Miron, Alexandru Radu
The authors previously introduced the measurement software EASERA SysTune which can be used for measurements with live music and speech signals. In this work, we discuss specifically the use of real-time measurements for the time alignment of loudspeaker arrays and distributed systems and for the optimal adjustment of their phase relationships. Being capable of deriving impulse responses of up to 12 seconds length, the measuring process with EASERA SysTune is simpler and more accurate as the real-time function provides a more immediate view on the tuning process. Because measurements can be performed with standard stimulus signals as well as with external speech and music signals, fine-tuning loudspeaker settings becomes possible even during the rehearsal time of the musicians. Required measurement conditions and limitations are given.

7434
INR as an Estimator for the Decay Range of Room Acoustic Impulse Responses
Hak, Constant; Hak, Jan; Wenmaekers, Remy
A room acoustic impulse response can be used to derive the reverberation time and other parameters. For this a certain minimum energy decay range or effective signal to noise ratio is required, which relates to the difference between the initial signal level and the noise level. An impulse response parameter called INR is presented as an estimator for the decay range and shown to be a useful qualifier in practical measurements.
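As a rough illustration of the "difference between the initial signal level and the noise level" mentioned above, the following minimal sketch compares the peak power of an impulse response with the mean power of its tail. This is not the authors' INR estimator: the function name `decay_range_db`, the use of the peak as the initial level, and the `noise_tail_fraction` parameter are all assumptions made for illustration.

```python
import math


def decay_range_db(impulse_response, noise_tail_fraction=0.1):
    """Return peak level minus noise level in dB (simplified, assumed definition).

    The noise power is estimated as the mean squared value of the last
    `noise_tail_fraction` of the response, which is assumed to contain
    only background noise.
    """
    energy = [sample * sample for sample in impulse_response]
    tail_len = max(1, int(len(energy) * noise_tail_fraction))
    noise_power = sum(energy[-tail_len:]) / tail_len
    peak_power = max(energy)
    return 10.0 * math.log10(peak_power / noise_power)


# Toy response: a unit peak that settles onto a noise floor of amplitude 0.001.
toy_ir = [1.0, 0.5, 0.1, 0.01] + [0.001] * 6
print(round(decay_range_db(toy_ir), 1))
```

A practical estimator would typically work on a backward-integrated decay curve and would need a more careful noise estimate than a fixed tail fraction.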

7435
Musical-Inspired Features for Automatic Sound Classification in Digital Hearing Aids
Alexandre, Enrique; Canadas-Quesada, Francisco Jesus; Rosa-Zurera, Manuel; Ruiz-Reyes, Nicolas; Vera-Candeas, Pedro
This paper proposes the use of some musically inspired features for the automatic classification of sounds in digital hearing aids. This kind of application is characterized by very strong constraints in terms of computational complexity. The proposed features are based on fundamental frequency detection and exhibit a low computational complexity while providing good results in terms of probability of correct classification. The performance of the system will be tested using a 1-NN classifier, the goal being to distinguish among speech, noise, and music. For the experiments, a sound database obtained using a hearing aid simulator will be used.

7436
Assessing the Potential Intelligibility of Assistive Audio Systems for the Hard of Hearing and Other Users
Mapp, Peter
Around 14% of the European population suffer from a noticeable degree of hearing loss and would benefit from some form of hearing assistance or deaf aid. Recent DDA legislation and requirements mean that many more hearing assistive systems are being installed – yet there is evidence to suggest that many of these systems fail to perform adequately and provide the benefit expected. This paper reports on the results of some trial acoustic performance testing of such systems. In particular, system microphone type, distance, and location are shown to have a significant effect on the resultant performance. The potential of using the Sound Transmission Index (STI), and in particular STIPa, for carrying out installation surveys has been investigated and a number of practical problems are highlighted. The requirements for a suitable acoustic test source to mimic a human talker are discussed, as is the need to adequately assess the effects of both reverberation and noise. The findings discussed in the paper are also relevant to the installation and testing of classroom ‘sound field’ systems and also boardroom type reinforcement systems and conferencing / teleconferencing systems.

7437
Graphical Control of a Parametric Equalizer
Loviscach, Jörn
Graphic equalizers allow the user to define a filter's magnitude response virtually free of restrictions. Parametric equalizers are much more limited. However, they offer some vital advantages over graphic equalizers, such as consuming less computational power and operating minimally invasively with naturally soft magnitude and phase responses. This work aims at combining the best of both worlds: It presents a range of methods to control a digital parametric equalizer graphically through a curve or a collection of anchor points. While the user is editing the graphical input, an optimization process runs in the background and adjusts the equalizer's parameters to reflect the input. In addition, the number of bands and their type (shelving/peak) can be adjusted automatically to produce a simple solution.

7438
Audio Software Development - An Audio Quality Perspective
Berg, Jan; Ekeroot, Jonas
When developing audio applications, different choices on software implementation aspects influence the total audio software signal path and can be of importance from an audio quality perspective. The field is not well documented in the literature. A study was carried out aiming at identifying relevant questions that must be considered. The general development perspective was on audio software written in C++ to be run on general purpose CPUs. A research review, comprising literature from different fields such as audio engineering, computer science and software engineering, was conducted to summarize and integrate an overview of the field. The result can be viewed as a map of questions for future research activities, consisting of further literature studies and experiments with software prototypes.

7439
Multi Carrier Modulator for Switch-Mode Audio Power Amplifiers
Andersen, Michael A.E.; Knott, Arnold; Pfaffinger, Gerhard
While switch-mode audio power amplifiers allow compact implementations and high output power levels due to their high power efficiency, they are very well known for creating electromagnetic interference (EMI) with other electronic equipment, in particular radio receivers. Lowering the EMI of switch-mode audio power amplifiers while keeping the performance measures to excellent levels is therefore of high general interest. A modulator utilizing multiple carrier signals to generate a two level pulse train will be shown in this paper. The performance of the modulator will be compared in simulation to existing modulation topologies. The lower EMI as well as the preserved audio performance will be shown in simulation as well as in measurement results on a prototype.

7440
A Comparison of Theoretical, Simulated, and Experimental Results Concerning the Stability of Sigma Delta Modulators
Mladenov, Valeri; Reiss, Joshua D.; Tsenov, Georgi
Sigma delta modulation is a popular form of audio analogue-to-digital and digital-to-analogue conversion, but suffers from stability problems for many designs and many input signals. A general theory of stability in sigma delta modulators has been developed which predicts the stability of a high order one bit sigma delta modulator (SDM) under a variety of designs. In this paper, the theoretical approach to stability as it applies to boundedness of states is explained. Several low pass SDM designs are developed which are intended for audio analogue to digital conversion, and predicted results for stability of these designs are given. Stability is examined both in terms of the maximum allowable DC input amplitude and the theoretical sufficient conditions for stable behavior. Theoretical results are compared with simulated results, and where possible, with experimental results from a realisation of a third order SDM with adjustable parameters. Practical observations are then made concerning the effect of noiseshaping, pole/zero placement, and cut-off frequency on the stability.

7441
A New Method for Identification of Nonlinear Systems Using MISO Model with Swept-Sine Technique: Application to Loudspeaker Analysis
Kadlec, Frantisek; Lotton, Pierrick; Novak, Antonin; Simon, Laurent
This work presents a Multiple Input Single Output (MISO) nonlinear model in combination with sine-sweep signals as a method for nonlinear system identification. The method is used for identification of loudspeaker nonlinearities and can be applied to nonlinearities of any audio components. It extends the method based on nonlinear convolution presented by Farina, providing a nonlinear model that allows the identified nonlinear system to be simulated. The MISO model consists of a parallel combination of nonlinear branches containing linear filters and memory-less power-law distortion functions. Once the harmonic distortion components are identified by the method of Farina, the linear filters of the MISO model can be derived. The practical application of the method is demonstrated on a loudspeaker.
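The parallel-branch structure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that branch n applies the power law x^n first and the linear filter second, and the names `convolve`, `miso_output`, and `branch_filters` are hypothetical.

```python
def convolve(signal, kernel):
    """Direct linear convolution; returns len(signal) + len(kernel) - 1 samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def miso_output(x, branch_filters):
    """Sum the outputs of parallel branches.

    Branch n (counted from 1) raises each input sample to the n-th power
    (memory-less power law) and then filters the result with the FIR
    impulse response branch_filters[n - 1].
    """
    length = len(x) + max(len(h) for h in branch_filters) - 1
    y = [0.0] * length
    for order, h in enumerate(branch_filters, start=1):
        powered = [sample ** order for sample in x]
        for i, value in enumerate(convolve(powered, h)):
            y[i] += value
    return y


# Two branches: a pass-through linear path plus a half-gain quadratic path,
# i.e. y[n] = x[n] + 0.5 * x[n]**2.
print(miso_output([1.0, 2.0], [[1.0], [0.5]]))
```

In the identification setting the abstract describes, the branch filters would be derived from the measured harmonic distortion components rather than chosen by hand as they are here.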

7442
Junction Identification Using Acoustic Reflectometry
Kestian, Adam; Roginska, Agnieszka
Acoustic reflectometry is a non-invasive, time-domain method of identifying the geometry of an acoustical space. A sound pulse is injected into a space and the resulting impulse response details particular changes of impedance. In the present study, acoustic reflectometry is utilized to identify scattering junctions of geometric spaces. Most notably, the four most common types of scattering junctions are identified: a cross-sectional increase, cross-sectional decrease, L-intersection, and T-intersection.

7443
Loss of Subjective Localization Cues in Virtual Acoustic Opening
Blanco-Martín, Elena; Casajus-Quiros, Francisco Javier; Gómez-Alfageme, Juan Jose; Ortiz-Berenguer, Luis Ignacio
The quality of the reproduced sound event is very important in a WFS configuration that is used for an acoustic opening. One way of checking the subjective quality perceived by a listener is ITU-R BS.1387, “Method for objective measurements of perceived audio quality”, but this method does not provide information about the listener's ability to localize the sound. A Matlab application (SEL, Sound Event Localization) has been implemented that simulates an acoustic opening configuration. The number of microphones and loudspeakers in the arrays is selectable, as are the sound source position, the gap between array transducers, and the listener position. This simulation has been verified against a real acoustic opening configuration. Moreover, the loss of localization cues has been analyzed with different multichannel codings.

7444
Effect of Interaural Differences on Loudness of Narrowband Noise Bursts
Hirvonen, Toni; Pulkki, Ville
This paper investigates the effects of interaural time and level differences (ITDs and ILDs, respectively) on loudness. Dichotic samples containing various amounts of interaural differences were compared to a diotic reference. The subjects adjusted the relative threshold gain of the test sample using a two-alternative, forced choice adaptive procedure (2AFC). The test signals were Gaussian noise samples with a bandwidth of one critical band and center frequencies of 150, 600, and 2400 Hz. The results imply that ILD is prominently responsible for changes in directional loudness, which is in agreement with present binaural loudness models that consider only ILD. The experiments revealed significant individual differences between subjects even when matching two identical signals.

7445
Perception of Movements of a Focused Sound Generated with a Linear Loudspeaker Array System
Ashihara, Kaoru; Kiryu, Shogo; Manon, Ichiki; Sato, Daiki; Tanno, Tomoaki
A special loudspeaker array system was developed for an experiment on perception of movements of a focused sound. The spatial patterns of the sound pressure level for the focused sounds were measured. The patterns were improved compared to the previous preliminary experiment using commercial devices. A psychoacoustic experiment on perception of movements of the focused sound was conducted using the developed system.

7446
Subjective Evaluation for Music Recording Positions in a Coherent Region of a Reverberant Field
Hara, Yoshifumi; Miyoshi, Kazunori; Nomura, Hiroaki; Tohyama, Mikio
In this article, we describe the most preferable frequency characteristics of the early reflections for music recording positions. We recorded short passages from two music pieces (Haendel, "Water Music Suite" and Brahms, "Symphony No. 4") at various distances from a sound source in a coherent region in a reverberation chamber. Subjects evaluated the preference and the subjective loudness through headphones under the diotic condition by paired comparison tests. As a result, we found that the most preferable distance was the distance at which the loudness reached its maximum. The preferable recording condition could also be characterized by narrow-band envelope spectrum analysis.

7447
Efficient Individualization of HRTF Using Critical-Band Based Spectral Cues Control
Hur, Yoomi; Lee, Seok-Pil; Park, Young-cheol; Youn, Dae-hee
3D audio technologies are now commonly implemented through headphones. A major problem of headphone-based 3D audio is in-the-head localization, which occurs due to inaccurate Head-Related Transfer Functions (HRTFs). Since individual measurements of HRTFs are impractical, there have been several studies on HRTF customization. In this paper, we propose an efficient method of customizing HRTFs. In the proposed method, spectral notches and envelopes are controlled based on a critical-band rate. Thus, the structure of the proposed algorithm is much simpler than that of previous methods, but still effective. The proposed method was evaluated on the problem of externalization, and the results showed that HRTFs customized using the proposed method could greatly improve the externalization performance.

7448
How to Widen the Sweet Spot in Monitoring 5.1.
Bassères, Julien; Thevenot, Patrick
Generally speaking, sound reproduction aims to achieve the widest possible sweet spot. However, this is seldom realized; moreover, the restricted sweet spot has become rather usual and well accepted by the audio community. This paper proposes a new approach to obtaining a wider sweet spot, up to a certain extent, in multichannel reproduction. By optimizing the directivity of each loudspeaker in order to compensate for the position of the listener, this method aims at creating a coherent and homogeneous acoustic field. Special care will be given to the directivity pattern (amplitude and phase) of the loudspeaker system.

7449
Auditory Modeling Via Frequency Warped Transforms
Borowicz, Adam; Parfieniuk, Marek; Petrovsky, Alexander; Petrovsky, Alexander
The goal of this paper is to show and compare four different versions of auditory modeling based on frequency warped transforms: bark-scaled wavelet packet decomposition, bark-scaled adapted wavelet packet decomposition, warped discrete Fourier transform, and four-band wavelet paraunitary filter bank. These are useful for perceptual audio coding, speech enhancement, and parametric audio coding with a matching pursuit procedure based on a psychoacoustically optimized wavelet packet dictionary. Practical implementations of audio signal processing based on the given auditory modeling approaches are considered in detail and analyzed in terms of compression depth, perceptual quality, structural realizability, and suitability for building embedded systems.

7450
The Role of Spectral Features in Sound Localization
Møller, Henrik; Toledo, Daniela
Spectral components of head-related transfer functions (HRTFs) are highly dependent on the anthropometric characteristics of subjects. In the low frequency range, a common structure is often found in HRTFs from different subjects. However, individual differences are seen at high frequencies. In binaural synthesis with non-individual HRTFs, localization errors occur if the spectral characteristics of the directional filters used do not match the individual characteristics of the listener. This investigation is focused on the spectral characteristics of HRTFs that are relevant as localization cues and how to parameterize them. This is done by cross-matching individual and non-individual HRTFs from different subjects according to the results of localization experiments.

7451
Multichannel Loudness Listening Test
Cabrera, Densil; Dash, Ian; Miranda, Luis
As part of ongoing research for ITU Recommendation BS.1770, "Algorithms to measure audio programme loudness and true-peak audio level", listening tests were conducted using a standard five-channel geometry in a standard listening room to confirm the channel gains and the spectral weightings for equal loudness contribution. Most ITU-related work to date has used broadcast program as a test signal. In this test, octave band noise was used as a test signal. Twenty-seven listeners participated. Results were analysed for statistical consistency as well as for average and variance. Agreement between the test results and various broadband loudness models, including ITU-R BS.1770, is examined.

7452
Challenges in Reproduction and Evaluation of Upmixed Audio in an Automotive Environment
Bergweiler, Steffen; Hellmuth, Oliver; Holzhaeuser, Stefan; Neumann, Manfred; Walther, Andreas
Audio systems with high quality sound reproduction capabilities are becoming more and more popular in the car. The need to create a pleasant soundfield has led to an increased number of loudspeakers combined with digital signal processing. To benefit from the advantages of surround sound reproduction for two-channel legacy content as well, an upmixing algorithm is required. In this paper, challenges and requirements for high quality surround sound reproduction and upmixing are first introduced separately and then discussed jointly with a specific focus on the automotive environment. Finally, a test method for the evaluation of different upmixing algorithms in the car is suggested.

7453
A General Approach to Loudspeaker Array Synthesis Methods
Navarro Ruiz, Juan Miguel
Loudspeaker arrays are often used in sound reinforcement for large concert halls and outdoor events to provide increased directivity. Unlike for loudspeaker systems, there is a well-established theory of antenna array synthesis, which has been used extensively over the past few years. This paper focuses on discussing several consolidated antenna array synthesis methods. Then, simulation software is implemented to show their pros and cons when applied to a loudspeaker array. Finally, an efficient synthesis method is proposed to achieve the required characteristics.

7454
On Large Multiactuator Panels for Wave Field Synthesis Applications
Escolano, José; López, José J.; Pueo, Basilio; Ramos, German
Wave Field Synthesis (WFS) is a spatial sound rendering technique that generates a true sound field using loudspeaker arrays. Multiactuator Panels (MAPs) are an alternative technology to dynamic piston loudspeakers, based on distributed mode operation. Because of their low visual profile and the negligible vibration of the panel, MAPs are very suitable for WFS reproduction. However, the size of current prototypes does not allow their use in real immersive environments in which loudspeakers must be integrated as walls or as projection screens. In addition, the extra area of a large panel can be used to accommodate extra exciters with which to generate sound fields at other elevation levels. In this paper, a large MAP prototype is presented that has been designed and built to fulfill the requirements of immersive audio applications. It represents a step forward in the application of MAPs to immersive scenarios. The panel size enhances its acoustic behaviour in the low frequency range. Also, it can be employed as a relatively large projection screen for videoconferencing and for virtual reality.

7455
Temporal Change of Psychological Impressions Regarding Microphone Arrays for Multichannel Recording
Kamekawa, Toru
Microphone technique for surround sound recording of an orchestra is discussed. Seven types of surround microphone sets recorded in a concert hall were compared in a subjective listening test on attributes such as powerfulness and spaciousness using a method inspired by MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor). To minimize temporal change in music, a Phase Randomized Signal (PRS) was proposed. From the average scores of the listening test, a difference in impression between the original source and the PRS was found for some microphone arrays consisting of directional microphones in some pieces. This means that the impression of these arrays depends on temporal changes in music. The data from the listening test between the original source and the PRS showed that the impression of powerfulness had a slightly higher correlation. The relations of the physical factors of each array were also compared, such as SC (Spectral Centroid), LFC (Lateral Fraction Coefficient), and S/M (Side/Mid Ratio). The correlation of these physical factors and the attribute scores shows that the contribution of these physical factors depends on the music and its temporal change. ‘Powerfulness’ is related to timbral character and ‘spaciousness’ is related to temporal change.

7456
Manufacturing Recordings from 100 Year Old Masters
Davies, Sean; Hooning, Rinus
Most work on the 78rpm analogue recording format concentrates on pressings made near to the time of the recording and the best ways to retrieve the information from these for future storage and reproduction. However, a considerable number of metal master plates have been preserved from the earliest days to the end of the format’s active period. This paper describes a project to manufacture new pressings from the original plates, the reasons for so doing and the technical challenges involved.

7457
Replay of Digital Original Tapes: Practical Experiences with Video Tape Based PCM Adapters and R-DAT
Pichler, Heinrich; Spitzbart, Johannes; Wallaszkovits, Nadja
As many of the early digital formats are already obsolete and support of these formats cannot be guaranteed much longer by the manufacturers, archives should presently give priority to the replay of original recordings on such material. Based on a short theoretical discussion and outlining the format specific characteristics, the paper discusses a variety of practical problems of signal retrieval from PCM (Pulse Code Modulation) encoded signal on a VTR (video tape recorder) and R-DAT (Rotary-Head Digital Audio Tape), such as mechanical problems, tracking problems and playback incompatibilities, data integrity checking, extraction and incompatibility of sub-code-information, pre-emphasis as well as other problems occurring from irregular recording conditions (typically with field recordings produced on portable devices) or format peculiarity.

7458
High Resolution Audio Recording, Preservation and Delivery at Indiana University’s Jacobs School of Music
Gregg, Travis; Strauss, Konrad
The Indiana University Jacobs School of Music has been making live concert recordings on a variety of formats since the 1940s, and we continue to record approximately 500 concerts each year. Recent industry trends and changes in technology have led us to investigate the possibility of creating high-resolution digital files rather than continuing to use physical media as the archival format for our recordings. Our goal was to develop a system for the creation, access, and long-term preservation of high-resolution audio recordings and associated metadata that conformed to emerging standards for digital audio preservation. We began building such a system in July of 2006 and reached full implementation in February of 2007. This paper gives an overview of the development process, presents hardware and software solutions, and discusses workflow and data management issues.

7459
A Fast Feature Extraction System on Compressed Audio Data
Friedrich, Tobias; Gruhne, Matthias; Schuller, Gerald
We describe an efficient system that extracts features directly from compressed audio material. It consists of a time/frequency conversion method and a feature extraction algorithm. The conversion method provides the feature extraction algorithm with a suitable complex spectral representation directly from the compressed domain. It further allows a trade-off between computational complexity and conversion accuracy. Several operating points using different conversion accuracies were tested with an MPEG audio identification system in order to evaluate the identification confidence. Based on these results it is possible to reduce the computational complexity from O(N log N) to O(N) compared to the conventional approach (complete decoding followed by a frequency analysis).

7460
A Proposed Audio Visual Product Evaluation Measure
Peters, Joe
The Multimedia Section at the Centre for Instructional Technology at the National University of Singapore has developed an audio visual assessment index (AVAI) to serve as a tool for clients to measure their evaluation of audio and video products. AVAI is based on a listing of indicators and variables that make up the fundamental elements in the capture and processing of AV products (video production): Image, Color, Light, Audio, Form, Aesthetics and Delivery. AVAI is currently being used by professionals for internal evaluation. A series of simulator-based AVAI courses is also underway, the purpose of which is to enable lay persons to understand the indicators and variables through simulated explanations. The thesis is that, in order to keep product value high, the information gap between the producers and the lay clients must be narrowed. A corollary of this thesis is that this narrowing can be achieved through even a single simulator training session. What is presented in this paper is the conceptual framework and some preliminary tests. The tests are not substantial as studies are slow. AVAI is not a core area of the work of the Multimedia Section handling this study. Nevertheless, it is important to have some response from AES on this preliminary presentation.

7461
Nonexistence of Frontal Signal Unmasking from Spatially Wide Masker
Ahonen, Jukka; Pulkki, Ville
The masking of a frontal signal by spatially wide noise sources was investigated in a listening experiment. The noise sources consisted of a single or multiple symmetrically positioned loudspeakers in the frontal horizontal plane in anechoic conditions. It is shown that the detection threshold of the signal does not depend on masker width, which suggests that frontal unmasking does not exist in loudspeaker listening. In additional tests with the signal source positioned to the side, it is shown that moderately small binaural unmasking occurs with a wide masker, and that increasing the width of the masker source decreases the binaural unmasking effect.

7462
Evaluating Perception of Salient Frequencies: Do Mixing Engineers Hear the Same Thing?
Bitzer, Joerg; LeBoeuf, Jay; Simmer, Uwe
In this contribution, we analyze the agreement of mixing engineers when finding salient frequencies in recorded audio tracks. Twenty-two mixing engineers were asked to use an equalizer with a high Q and high gain setting. Using this tool to sweep through the files' frequencies, they analyzed sixteen audio tracks and reported the most perceptually salient frequencies. The results show that the agreement depends on the analysis bandwidth. Most mixing engineers agree within a wide frequency range. However, only a few engineers agree if the matching bandwidth is below or equal to one-third octave. In this paper, we try to explain these results and give a detailed analysis.

7463
Evaluation of Stereophonic Images with Listening Tests and Model Simulations
Kang, Kyeong Ok; Nelson, Philip; Park, Munhum
A binaural hearing model has recently been suggested for the evaluation of the performance of virtual acoustic imaging systems. The model considers excitation-inhibition (EI) cell activity patterns as the internal representation of sound localisation cues, and a pattern-matching procedure with a frequency-weighting scheme produces the estimate of source location in the horizontal plane. Given the reasonable prediction of some important features in human sound localisation and lateralisation, this paper presents a further verification and application of the model in actual listening tests. In this work, participants' responses to stereophonic images have been compared with the predictions of the model, individually established from the subject's own HRTF. Model predictions have been found to be both qualitatively and quantitatively consistent with the test results, and in particular, the agreement between 2 and 3 kHz gave a good indication that, unlike some similar models, the current model can effectively incorporate both ITD and ILD information according to their relative importance.

7464
The Sound Character Space of Spectrally Distorted Telephone Speech and Its Impact on Quality
Möller, Sebastian; Raake, Alexander; Wältermann, Marcel
Spectral distortions of speech transmitted over a telephone channel may stem from linear channel filtering, codecs, electro-acoustic properties of end-user terminals, or the acoustic environment at the send side. In this contribution, a study is presented which aims at revealing the perceptual space of spectrally distorted telephone speech and establishing a link to the overall quality of the speech. Two dimensions were identified as relevant for explaining the perceived quality: "indirectness" and "brightness". Whereas "brightness" is related to the center frequency of a transfer function, "indirectness" is correlated with the equivalent rectangular bandwidth and constitutes the dominating factor in the perceptual space in terms of covered variance. The concept of the bandwidth impairment factor, which fits into the framework of the so-called E-Model and which is based on these simple parameters for computing the integral quality of spectrally distorted speech, could successfully be applied to the given data.

7465
Objective Evaluation of a Non-Environment Control Room for 5.1 Surround Listening
Degara-Quintela, Norberto; Pena, Antonio; Torres-Guijarro, Soledad
The control room at the ‘Universidad de Vigo’ was built for the purpose of assessing small audio artefacts, such as listening analytically to coded material with different data rates. It follows a non-environment design that minimises the influence of the room. The use of such a room as a 5.1 surround listening room will be analysed according to international recommendations. This research includes the study of the electro-acoustic behaviour of loudspeakers, geometric and acoustic properties of the room, and sound field conditions. A discussion of some divergences and implications for its use when performing surround listening tests follows the measurement results.

7466
A Case Study on Sound Reproduction and Acoustic Enhancement in Concert Halls Using Wave Field Synthesis
Casdorff, Max; Kuhn-Rahloff, Clemens; Moser, Roger; Rosenthal, Matthias
This paper presents the wave field synthesis system under construction at the National Conservatory of Music Detmold (Germany). The system is dedicated to sound reproduction for artistic purposes at the Tonmeister department (Erich Thienhaus Institute) and to an enhancement of room acoustics. The system comprises 346 independent loudspeaker channels, including a horizontal loudspeaker array all around the auditorium (500 seats) and ceiling loudspeakers. Since the hall is used for a broad repertoire comprising chamber music, romantic orchestra instrumentations, organ concerts, contemporary music, etc., the hall will be equipped with a variable room acoustic system. The paper presents perceptual aspects of system design, concerning the direct sound and diffuse field as well as practical implementations for WFS rendering.

7467
Small Studios with Gypsum Board Walls, a Review of Their Room Acoustics, Details at the Low Frequencies.
Nastasi, Francesco; Rizzi, Lorenzo
At present, most music pre-production and production is carried out in very small, privately owned rooms called ‘project studios’. Gypsum board technology is very common in the construction of these rooms because it offers high insulation at low monetary and time cost. The article discusses sweet spot impulse response measurements that have been carried out in 3 different but acoustically small rooms built with gypsum board sound insulating structures, comparing them to a masonry-built one. The room modal behavior is underlined, continuing with the analysis of decay over time at low frequencies, related to insights on perception and analysis. A different methodology of study is proposed.

7468
On the Measurement of Electro Acoustic Enhanced Sound Fields
Melchior, Frank; Walter, Florian
The installation and optimization of acoustic enhancement systems requires a large amount of experience. The verification in terms of measurement is most of the time done using conventional reverberation and acoustic parameter measurements according to ISO 3382. This is a good solution for diffuse sound analysis and general examination of early reflections, but in terms of direction dependent analysis the results are not satisfying. In this study a room equipped with an acoustic enhancement system was measured using a circular array. The effects of adding specific early reflections and direction dependent diffuse energy generated by the acoustic enhancement system are investigated. The results are compared to standard measurements according to ISO 3382.

7469
Applying Cochlear Modeling and Psychoacoustics in Room Acoustics
de Vries, Diemer; van Dorp Schuitman, Jasper
The acoustical qualities of a concert hall or any other room are generally expressed using acoustical parameters. These parameters are determined from impulse responses, as measured from single positions in a room or along a line array. However, from array measurements it turned out that parameters can fluctuate severely between small distance steps, something which does not agree with human perception. Applying cochlear modeling and psychoacoustics in this process seems a promising technique to reach results which do not suffer from these fluctuations and thus are much closer to human perception compared with conventional techniques.

7470
Empirical Evaluation of the Frequency-Dependent Boundary Conditions in a Digital Waveguide Mesh
Cobos, Maximo; Escolano, José; López, José J.; Pueo, Basilio
The digital waveguide mesh is a popular method for time domain simulation of acoustic systems, such as room acoustics. One of the main reasons to choose this paradigm is the ease of including boundary conditions in the simulation. This work focuses on the comparison of the simulation with real-world measurements, where a particular scenario is physically built and the corresponding simulation, according to its physical parameters, is carried out. The main scope of this paper is the validation and discussion of a boundary condition model and its correspondence with the measurements through an example.

7471
Subjective Effects of Dispersion in the Simulation of Room Acoustics Using Digital Waveguide Mesh
Cobos, Maximo; Escolano, José; López, José J.; Pueo, Basilio
The simulation of room acoustics using the Digital Waveguide Mesh method has gained interest in recent years. One of the problems of this method is its frequency- and angle-dependent dispersion. In order to reduce this effect, oversampling is usually employed, but at the cost of greatly increasing the computational cost and restricting the simulation to lower frequencies. In this paper, a subjective analysis is carried out in which voice band simulations with different oversampling factors have been performed and evaluated by a set of listeners. Listening tests employing ABX methodology have been used to evaluate the subjective effects, obtaining some preliminary results that, although not conclusive, represent a first approach to the problem.

7472
Bitstream Format for Spatio-Temporal Wave Field Coder
Pinto, Francisco; Vetterli, Martin
We present a non-parametric method for compressing multichannel audio data for reproduction through Wave Field Synthesis. The method consists of applying a two-dimensional filterbank to the input multichannel signal, in both time and channel dimensions, and coding the two-dimensional spectra using a spatio-temporal frequency masking model. The coded spectral data is organized into a bitstream together with side information containing scale factors and Huffman codebook information. We demonstrate how this coding method can be applied to any smooth distribution of loudspeakers in space, while obtaining a stable bitrate that is 15% lower compared to coding each channel independently.

7473
The Design of Ambisonic Decoders for the ITU 5.1 Layout with Even Performance Characteristics
Moore, David; Wakefield, Jonathan
All previously published Ambisonic decoders for irregular loudspeaker layouts have localisation performance which varies significantly by angle around the listener. This contrasts with decoders designed for evenly spaced arrangements of loudspeakers where performance characteristics are isotropic. Furthermore, even localisation performance around the listener is desirable for a number of application areas of 5.1 surround sound. New decoder design criteria are presented which aim to reduce this variation in localisation performance. These criteria are added to a multiobjective fitness function, based on auditory localisation theory, which guides a heuristic search algorithm to derive decoder parameter sets for the ITU 5.1 layout. The derived decoders exhibit a significant improvement in localisation performance variation by angle around the 360° sound stage.

7474
Methods for Sharing Stereo and Multichannel Recordings Among Planetariums
Gaston, Leslie
There is a demand for research on the transferability of surround sound audio from one planetarium to another, so that 1) audiences have similar experiences, and 2) audio engineers can easily create this experience. This research will consider: acoustics, production, delivery, equipment, and seating arrangements. Our recent survey of over 100 planetariums worldwide in the fall of 2007 will provide a look at current practices. The University of Colorado Denver and Gates Planetarium have collaborated in order to explore the potential of current audio technology, and to discover what similarities and differences exist between planetariums in order to achieve this goal of transferability.

7475
Optimal Hierarchical Bandwidth Limitation of Surround Sound
Jiao, Yu; Rumsey, Francis; Zielinski, Slawomir
In order to save the transmission bandwidth of surround sound, a technique named Hierarchical Bandwidth Limitation (HBL) was proposed by the authors. In HBL, a psychoacoustically hierarchical transform is used as the preprocessing algorithm prior to bandwidth limitation. In our former experiments we found that the Karhunen-Loève transform (KLT) is a suitable hierarchical transform for HBL. Besides the hierarchical transform, the choice of an appropriate strategy for bandwidth allocation is also essential from the point of view of the resultant audio quality. In order to find the optimal bandwidth allocation strategy that achieves the best audio quality, the authors attempted to build up the mathematical relationship between audio quality and the bandwidth allocation strategy using a MUSHRA listening test. The experiment design and results are reported in this paper.
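To make the role of the transform concrete, here is a minimal sketch of a Karhunen-Loève transform for the simplest case of two channels, where the transform reduces to a rotation onto the principal axes of the channel covariance. It is not the authors' implementation: the paper works with surround material, and the function name klt_two_channel and the test signals are hypothetical. The point is only that the first output channel collects most of the energy, so a hierarchical scheme could give it more bandwidth than the second.

```python
import math


def klt_two_channel(left, right):
    """Rotate a two-channel signal onto its principal axes (2x2 KLT).

    Returns the rotation angle and the two decorrelated channels, ordered
    so that the first one carries at least as much energy as the second.
    """
    n = len(left)
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    l0 = [v - mean_l for v in left]
    r0 = [v - mean_r for v in right]
    var_l = sum(v * v for v in l0) / n
    var_r = sum(v * v for v in r0) / n
    cov = sum(a * b for a, b in zip(l0, r0)) / n
    # Closed-form angle that diagonalizes the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cov, var_l - var_r)
    c, s = math.cos(theta), math.sin(theta)
    first = [c * a + s * b for a, b in zip(l0, r0)]
    second = [-s * a + c * b for a, b in zip(l0, r0)]
    return theta, first, second


# Two strongly correlated channels sharing a common component.
left = [math.sin(0.3 * t) + 0.1 * math.sin(1.7 * t) for t in range(200)]
right = [0.8 * math.sin(0.3 * t) - 0.1 * math.sin(1.7 * t) for t in range(200)]
theta, first, second = klt_two_channel(left, right)
```

For more than two channels the same idea requires an eigendecomposition of the full covariance matrix, which is omitted here to keep the sketch dependency-free.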

7476
Frequency-Dependent Signal-Correlation in Surround- And Stereo-Microphone Systems and the Blumlein-Pfanzagl-Triple (BPT)
Hoeldrich, Robert; Pfanzagl-Cardone, Edwin
With the aim of recreating the original concert-hall sound-field as faithfully as possible in the control- or living-room, recordings were made simultaneously with an artificial head and several surround-microphone techniques (among them the new BPT method). The surround recordings were re-recorded using the same dummy-head as in the concert-hall. The results of subjective listening tests (loudspeaker as well as binaural) were assessed using ANOVA and correlation analysis. Acoustical analysis of the dummy-head recordings was performed by measuring the ‘Frequency-dependent Inter Aural Cross-correlation Coefficient’ (FIACC): the ‘low-correlation’ AB-PC microphone system was capable of reproducing the original sound-field better than any of the other systems under test (DECCA, KFM, OCT). A microphone system’s ‘Critical Frequency’, below which correlation rises towards 1, is defined.

7477
Holographic Design of a Source Array for Achieving a Desired Sound Field
Cho, Wan-Ho; Ih, Jeong-Guon; Boone, Marinus M.
For realizing a desired complicated sound field, an acoustic source array should be designed appropriately to obtain the acoustic source parameters. To this end, we suggest a method utilizing the acoustical holography technique based on the inverse boundary element method. Acoustical analogy between the problems of source reconstruction and source design was the initial motivation of the study. In the design of the source array, the pressure distribution at specific field points is the constraint of the problem and the signal distribution at the source surface points is the object function of the problem. The whole procedure of the application consists of three stages: First, a condition of the desired sound field should be set as the constraint. Second, the geometry and boundary condition of the source array system and the target field, i.e., points in the sound field of concern, are modeled by the boundary elements. Actual characteristics of source and space can be considered to generate the accurate condition of the target field. Finally, the source parameters are inversely calculated by the backward projection. As an example, a source array to fulfill the plane wave propagating zone and another quiet zone near the propagation zone was designed and tested by simulation and measurement.

7478
New Dimensions for Ambisonics
Chapman, Michael
Both two-dimensional (pantophonic) and three-dimensional (periphonic) representations of soundfields are commonplace in ambisonics. Reproducing either on rigs essentially designed for the other is also commonplace. What, though, if one synthesises a four (or more) dimensional soundfield and reproduces this on a standard rig? As there appears to be no source on hyperspherical harmonics applicable to ambisonics, the mathematical basis is first set out. The manipulation of hyperambisonic soundfields (rotation, mirroring, dominance) is then discussed. During that discussion various 'proofs' are advanced as to the finite range of transformations that can be applied to ambisonic soundfields, of whatever dimension.

7479
Improving Spherical Microphone Arrays
Daniel, Jerome; Epain, Nicolas
Spherical microphone arrays are useful for numerous applications, such as spatial audio capture and beamforming. However, these sensor arrays are known to have a limited frequency range, due to poor directivity at low frequencies and spatial aliasing at high frequencies. In this paper, we study two methods aiming at enhancing the frequency range of spherical microphone arrays without using more sensors. First, the benefit of locating the sensors at the end of cavities within the sphere is assessed through measurements and simulations. Second, we study the influence of using large membrane microphones. Finally, results show that the frequency range could be increased in both cases studied.

7480
Migration of 5.0 Multichannel Microphone Array Design to Higher Order MMAD (6.0, 7.0 & 8.0) with or Without the Inter-Format Compatibility Criteria.
Williams, Michael
The severe limitations of the 5.0 Multichannel Reproduction Standard in reproducing good quality audio-visual or stand-alone audio surround sound have increased the pressure on recording and reproduction system designers to increase the number of channels in an attempt to give an even more satisfactory envelopment experience. This paper extends the MMAD process to show how higher order channel array designs (6.0, 7.0 and 8.0) can be developed from the existing data on 4.0 or 5.0 Multichannel Front Sound Stage Coverage Array Designs with almost perfectly seamless and linear surround sound reproduction. Designing for inter-format compatibility can also be accommodated from the existing multi-format array design data described in a previous paper on Multichannel Arrays Generating Inter-format Compatibility (MAGIC arrays)(3).

7481
Autoregressive Modelling of Hilbert Envelopes for Wide-Band Audio Coding
Ganapathy, Sriram; Garudadri, Harinath; Hermansky, Hynek; Motlicek, Petr
Frequency Domain Linear Prediction (FDLP) is a technique for approximating the temporal envelopes of a signal using autoregressive models. In this paper, we propose a wide-band audio coding system exploiting FDLP. Specifically, FDLP is applied on critically sampled sub-bands to model the Hilbert envelopes. The residual of the linear prediction forms the Hilbert carrier, which is transmitted along with the envelope parameters. This process is reversed at the decoder to reconstruct the signal. In the objective and subjective quality evaluations, the FDLP based audio codec at 66 kbps provides competitive results compared to state-of-the-art codecs at similar bit-rates.

7482
On Locality of Spectral Oriented Tree for Bit-Plane Based Low-Bit Rate Audio Coding
Su, Alvin W.Y.; Wang, Yu-Lin
For Spectral Oriented Tree (SOT) based coders such as SPIHT and CEIHT, locality usually refers to the locations of coefficients within a SOT and their effect on coding efficiency. How to construct a SOT to achieve better locality is very important. This paper presents a diagnostic view of the localities of different ordering techniques for low bit-rate audio coding. We used several coefficient ordering schemes to construct SOTs with the same set of MDCT coefficients and observed their effects. Both objective and subjective results are presented.

7483
Perceptual Matching Pursuit for Audio Coding
Lahdili, Hassan; Najaf-Zadeh, Hossein; Pichevar, Ramin; Thibault, Louis
This paper introduces a Perceptual Matching Pursuit (PMP) algorithm for audio coding. A masking model has been developed and integrated into the matching pursuit algorithm to account for the characteristics of the hearing system. By doing so, only an audible kernel is extracted at each iteration. Moreover, contrary to the matching pursuit algorithm, PMP will stop decomposing an audio signal once there is no audible part left in the residual. We have used ITU-R PEAQ to compare audio materials decomposed by PMP and by matching pursuit. Objective scores for PMP increase by up to 1 unit. A semi-formal listening test has verified the objective scores and shown the perceptual superiority of PMP over the matching pursuit algorithm.
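The difference between the two stopping rules can be sketched with a plain matching pursuit loop. The code below is not the PMP algorithm: the paper's masking model is not described in the abstract, so a fixed residual-energy threshold (the hypothetical parameter stop_energy) stands in for "no audible part left in the residual". The dictionary of cosine atoms and all function names are likewise assumptions made for illustration.

```python
import math


def cosine_atoms(n):
    """Orthonormal DCT-II basis used here as a small unit-norm dictionary."""
    atoms = [[math.sqrt(1.0 / n)] * n]
    for k in range(1, n):
        atoms.append([math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
                      for t in range(n)])
    return atoms


def matching_pursuit(signal, atoms, stop_energy, max_iters=50):
    """Greedy decomposition over unit-norm atoms.

    stop_energy is a hypothetical stand-in for an audibility threshold:
    the loop ends once the residual energy falls to or below it.
    """
    residual = list(signal)
    picks = []
    for _ in range(max_iters):
        if sum(v * v for v in residual) <= stop_energy:
            break
        # Select the atom most correlated with the current residual.
        best_idx, best_coef = 0, 0.0
        for idx, atom in enumerate(atoms):
            coef = sum(r * a for r, a in zip(residual, atom))
            if abs(coef) > abs(best_coef):
                best_idx, best_coef = idx, coef
        if best_coef == 0.0:
            break
        picks.append((best_idx, best_coef))
        residual = [r - best_coef * a for r, a in zip(residual, atoms[best_idx])]
    return picks, residual


atoms = cosine_atoms(16)
# Two strong components plus one very weak component.
signal = [3.0 * a + 0.5 * b + 0.01 * c
          for a, b, c in zip(atoms[2], atoms[5], atoms[9])]
thresholded, leftover = matching_pursuit(signal, atoms, stop_energy=1e-3)
plain, _ = matching_pursuit(signal, atoms, stop_energy=0.0)
```

With the threshold in place the loop stops after the two strong atoms and leaves the weak component in the residual, whereas the plain variant goes on to extract it; a perceptual model would make that decision per kernel rather than with a single global energy value.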

7484
A Unifying Approach to Transform and Sinusoidal Coding of Audio
Bartkowiak, Maciej
The paper describes a new scenario for low bit rate audio compression that combines two classical techniques: transform coding and sinusoidal coding into a united framework. The main idea is to adaptively decompose the audio signal into subbands whose central frequencies follow continuously the local instantaneous frequencies of certain signal components (formants or individual harmonic partials). The content in each subband is encoded in the baseband after frequency shift towards DC. The technique may be considered either as modified transform coding, i.e. coding along instantaneous frequencies or as extended sinusoidal coding, i.e. modeling with partial envelopes that are represented by transform coefficients. In other words, it is a hybrid scheme offering a continuous operating mode between purely transform and purely sinusoidal compression.

7485
Low Bit Rate Audio Coding for Digital Wireless Systems
Wray, Stephen
With the transition from analogue to digital television, the available spectrum for wireless microphones, in-ear monitors and other wireless devices could be under threat. Spectrum is a valuable commodity and it is the responsibility of governments to manage it appropriately. Much has been made recently of the Spectrum Squeeze on both sides of the Atlantic, with discussions on White Spaces and the Digital Dividend. With bandwidth at such a premium the Audio Industry has been forced to consider new technologies that make efficient use of spectrum without sacrificing quality or service. Within this context, we need a revolutionary new approach to maximizing bandwidth efficiency. The author will present a novel coding solution to overcome the prevailing technical limitations and meet industry requirements for wireless applications.

7486
Bit Allocation for Linear Prediction Coefficients with Application to Lossless Audio Compression
Ghido, Florin; Tabus, Ioan
We propose a novel technique of using bit allocation for linear prediction coefficients in asymmetric lossless audio compression. We show how to determine the optimal bit allocation using a new closed-form formula for the excess error from quantization and describe a recently introduced algorithm (Optimization-Quantization Least Squares) which computes the optimal quantized prediction coefficients applied for the allocation. The proposed method, implemented as a modified asymmetrical OptimFROG, obtains small (but consistent) signal dependent compression improvements with virtually no decoder complexity increase (for an 847 MB audio corpus, up to 0.27%, on average around 0.06%). Compared to MPEG-4 ALS, it obtained 0.38% better compression, while being at the same time approximately 5 times faster at decoding.

7487
Design of Framing in MPEG Surround Based on Dynamic Programming Algorithm
Hsu, Han-Wen; Lee, Wen-Chieh; Liu, Chi-Min; Yang, Chung-Han
MPEG Surround (MPS), defined by ISO/IEC, is the audio coding standard for multi-channel signals based on a down-mixed signal and spatial parameters. In MPEG Surround, the time-frequency tiles determine the units that share the same spatial parameters among the multi-channel signals. Hence, the choice of tiles is the critical module determining the resulting quality and bit consumption. However, the large number of combinations of time regions, frequency bands and multi-channel signal statistics spans a huge search space for deciding the tiles. Our previous work in AES 119 proposed a dynamic programming method to efficiently decide the time-frequency units for parametric stereo coding in HE-AAC. This paper extends the dynamic programming method to MPS coding.

7488
New Enhancements to the Audio Bandwidth Extension Toolkit (ABET)
Annadana, Raghuram; E.V., Harinarayanan; Ferreira, Anibal; Sinha, Deepen
Audio bandwidth extension has emerged as a key low bit rate coding tool. In continuation of our ongoing research on audio bandwidth extension, this paper presents new enhancements to the Audio Bandwidth Extension Toolkit (ABET). ABET consists of three primary tools: Accurate Spectral Replacement (ASR), Fractal Self Similarity Model (FSSM) and Multi-band Temporal Envelope Amplitude Coding (MBTAC) [1],[2],[3]. Additionally, we have introduced a blind bandwidth extension mode into ABET [4]. We discuss several new ideas and improvements to ABET. Specifically, enhancements to the blind bandwidth extension architecture which allow it to work with signals with only 3.5-4.0 kHz audio bandwidth are described. We also elaborate on a new tool for efficient coding of the time-frequency envelope which cuts the overhead by 0.75-1.0 kbps/channel. We also address a practical issue, namely the computational complexity, and describe a new low decoder complexity mode of ABET.

7489
Perceptual Evaluation of Numerically Simulated Head-Related Transfer Functions
Kärkkäinen, Asta; Kärkkäinen, Leo; Kirkeby, Ole; Pölönen, Monika; Seppälä, Eira; Turku, Julia; Vilermo, Miikka
Head-related transfer functions (HRTFs) produced by numerical simulations were compared to measured HRTFs through two listening tests. The purpose was to determine whether the numerically simulated HRTFs, which do not contain any of the artifacts associated with acoustic measurements, capture the detail necessary for reproducing convincing 3D sound. The results suggest that when virtual sound sources are presented to listeners binaurally over headphones, the measured and modelled HRTF sets perform equally well in terms of perception of direction. Regarding preference of binauralisation methods, the simulated HRTFs performed slightly better.

7490
Reaction Times and Performances in Recognition Tasks to Assess Speech Quality
Durin, Virginie; Gros, Laetitia; Hericher, Gilles
This paper deals with perceptual test methodologies to assess the speech quality of telecommunication systems. Faced with the drawbacks of typical methodologies recommended by ITU-T, a new way to assess speech quality is investigated, by collecting reaction times and performances while subjects perform tasks involving degraded speech signals. A dual task with a digit recognition memory task and a letter recognition task is proposed. Three different quality levels are applied to audio signals describing digits and letters. The results show significant differences in performance and reaction times between the three quality levels.

7491
Influence of Visual Appearance on Loudspeaker Sound Quality Evaluation
Christensen, Flemming; Karandreas, Alex
Product sound quality evaluation aims to identify relevant attributes and assess their influence on the overall auditory impression. Extending this sound-specific rationale, the present study evaluates overall impression in relation to audition and vision, specifically for loudspeakers. To quantify the bias that loudspeaker appearance has on the sound quality evaluation of a naive listening panel, audio stimuli of varied degradation are coupled with actual loudspeakers of different visual appearance.

7492
Comparison of Loudspeaker-Room Equalization Preferences for Multichannel, Stereo, and Mono Reproductions: Are Listeners More Discriminating in Mono?
Devantier, Allan; Hess, Sean; Olive, Sean
Digital loudspeaker-room correction products are more popular than ever, despite the general lack of perceptual studies on their performance over a wide range of different playback conditions. This paper describes the first of several experiments that explore the influence of important acoustical and perceptual factors on their performance. In this experiment, a panel of trained listeners gave comparative preference ratings for three different loudspeaker equalizations based on anechoic and in situ measurements evaluated in a semi-reflective room, using three multichannel music recordings reproduced in surround, stereo, and mono. These equalizations were compared to the unequalized loudspeaker. The results are summarized as follows: all three equalizations were equally preferred over the unequalized system. The differences in preference ratings increased as the number of playback channels was reduced from 5 channels (surround) to 1 (mono).

7493
Caution and Warning Alarm Design and Evaluation for NASA CEV Auditory Displays
Begault, Durand; Godfroy, Martine; Holden, Kritina; Sandor, Aniko
The design of caution-warning signals for NASA’s Crew Exploration Vehicle (CEV) and other future spacecraft will draw on both best practices from current research and evaluation of current alarms. A design approach is presented based upon cross-disciplinary examination of psychoacoustic research, human factors experience, aerospace practices, and acoustical engineering requirements. A listening test with thirteen participants was performed involving ranking and grading of current and newly developed caution-warning stimuli under three conditions: (1) alarm levels adjusted for compliance with ISO 7731, "Danger signals for work places – Auditory Danger Signals", (2) alarm levels adjusted to an overall 15 dBA s/n ratio, and (3) simulated codec low-pass filtering. The resulting analyses include determination of sounds that were judged as inappropriate, independent of condition.

7494
Loudness Calculation for Individual Acoustical Objects Within Complex Temporally Variable Sounds
Bradter, Cornelius; Hobohm, Klaus
Models used for loudness calculation normally treat their input signal as an integral whole. For sounds consisting of two or more distinguishable acoustical objects, this contradicts listening experience. Auditory perception analyzes and identifies acoustical objects and may treat them differently. By expanding principles used in excitation synthesis based loudness models, we developed a procedure to calculate loudness of a time-varying acoustical object while a second object is simultaneously present. When signals of both objects are available individually and in combination, the procedure reflects effects of one object on the other as well as changes of loudness perception due to signal properties of one or both objects.

7495
An Anatomy of Graph-Based User Interfaces for Media Processing
LeBoeuf, Jay; Loviscach, Jörn; Mathur, Shailendra; Schultz, Christopher
Graph-based user interfaces are employed in a variety of software such as audio synthesizers, video compositing tools, and database application builders. All of these uses afford the graphical metaphor of a graph: "Nodes" such as sound generators or filters are tied together by "links," which may represent signal flow or conceptual relations. Focusing on media production tools, we have examined a large range of current software products to find out which de-facto standards have evolved in the field of graph-based interfaces and which features can be considered unique. We categorize a multitude of interface concepts employed in actual graph-based interfaces and describe differences in their implementation. The findings provide guidelines for developers of media production software.

7496
A Framework for Automatic Mixing Using Timbral Similarity Measures and Genetic Optimization
Kolasinski, Bennett
A novel method is introduced for automatic mix recreation using timbral classification techniques and an optimization algorithm. This approach uses the Euclidean distance between modified Spectral Histograms to calculate the distance between a mix and a target sound and uses a genetic optimization algorithm to figure out the best coefficients for that mix. The implementation has been shown to successfully recreate multitrack mixes accurately and may pave the way towards the automatic mixing of novel multitrack sessions based on a desired target sound.
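
As a rough illustration of the idea in this abstract (a Euclidean distance between mix features and a target, minimized by a genetic algorithm), the following minimal sketch may help. It is not the author's implementation: the feature here is a plain band-energy vector rather than the paper's modified Spectral Histogram, tracks are assumed uncorrelated so that band energies add, and the names `mix_features`, `distance`, and `evolve` as well as all parameter values are hypothetical.

```python
import math
import random


def mix_features(gains, tracks):
    """Band energies of a mix; assumes uncorrelated tracks so energies add."""
    bands = len(tracks[0])
    return [sum(g * g * t[b] for g, t in zip(gains, tracks)) for b in range(bands)]


def distance(gains, tracks, target):
    """Euclidean distance between the mix features and the target features."""
    mix = mix_features(gains, tracks)
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(mix, target)))


def evolve(tracks, target, pop_size=30, generations=200, seed=1):
    """Search for per-track gains with a small elitist genetic algorithm.

    Returns the best gain vector and the best distance per generation.
    """
    rng = random.Random(seed)
    n = len(tracks)
    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda g: distance(g, tracks, target))
        history.append(distance(pop[0], tracks, target))
        parents = pop[: pop_size // 2]  # truncation selection, elite kept as-is
        children = []
        while len(parents) + len(children) < pop_size:
            a = parents[int(rng.random() * len(parents))]
            b = parents[int(rng.random() * len(parents))]
            # Uniform crossover followed by Gaussian mutation, gains kept >= 0.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [max(0.0, c + rng.gauss(0.0, 0.05)) for c in child]
            children.append(child)
        pop = parents + children
    pop.sort(key=lambda g: distance(g, tracks, target))
    history.append(distance(pop[0], tracks, target))
    return pop[0], history
```

Because the better half of each generation is carried over unchanged, the best distance in `history` never increases from one generation to the next.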

7497
Delta-Sigma DAC Topologies for Improved Jitter Performance
Løkken, Ivar; Sæther, Trond; Vinje, Anders
Specifications for audio digital-to-analog converters (DACs) place requirements on the analog circuit design that contradict physical design conditions in a modern, digital-oriented system on a chip process. Because of low supply voltages, use of current-steering DACs has become the dominant choice for high resolution applications. Fed by a delta-sigma modulator that requantizes the digital signal to a manageable number of bits, the current-steering DAC is a continuous time type converter without any discrete time filtering. This makes it very susceptible to sampling clock jitter. In this paper, jitter induced distortion is addressed at a topology level, investigating design choices for the delta-sigma requantizer and the possible use of semidigital multi-bit current-steering filter DACs to reduce problems with jitter susceptibility.

7498
New Measurement Methods for Anechoic Chamber Characterization
Blanco-Martín, Elena; Gómez-Alfageme, Juan Jose; Sánchez-Bote, José Luis
Continuing the work presented at the 122nd AES Convention, this paper studies the qualification of anechoic chambers in depth. Its purpose is to find parameters that allow this type of enclosure to be characterized. The proposed approach obtains data on the absorption of an anechoic chamber from the transfer functions, or alternatively the impulse responses, between pairs of microphones. From the transfer functions between pairs of microphones, agreement with the inverse square law can be checked easily, which allows the chamber cut-off frequency to be determined. Band-pass filtering can then be used to confirm the qualification of the anechoic chamber.
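
The inverse square law check mentioned in this abstract can be sketched as follows. In a free field the level from a point source falls by 20·log10(r_far/r_near) dB between two distances, so the magnitude of the transfer function between a near and a far microphone can be compared against that ideal. The function names and the 1 dB tolerance below are assumptions for illustration, not values taken from the paper or from any standard.

```python
import math


def expected_level_drop_db(r_near, r_far):
    """Free-field level difference in dB between two distances from a point source."""
    return 20.0 * math.log10(r_far / r_near)


def inverse_square_deviation_db(h_mag, r_near, r_far):
    """Deviation of a measured far/near magnitude ratio from the ideal drop.

    h_mag is |p_far / p_near| at one frequency, on a linear scale.
    """
    measured_drop = -20.0 * math.log10(h_mag)
    return measured_drop - expected_level_drop_db(r_near, r_far)


def cutoff_frequency(freqs, h_mags, r_near, r_far, tol_db=1.0):
    """Lowest frequency above which every bin stays within tol_db of the ideal.

    tol_db is a hypothetical tolerance chosen for this sketch.
    Returns None if even the highest bin is out of tolerance.
    """
    cutoff = None
    for f, h in sorted(zip(freqs, h_mags), reverse=True):
        if abs(inverse_square_deviation_db(h, r_near, r_far)) > tol_db:
            break
        cutoff = f
    return cutoff
```

For microphones at 1 m and 2 m, an ideal chamber gives a magnitude ratio of 0.5 (a drop of about 6 dB); bins where the ratio departs from that by more than the tolerance mark the region below the estimated cut-off.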

7499
Acoustic Feedback Reduction Based on LMS and Normalized LMS Algorithms in WOLA Filters Bank Based Digital Hearing Aids
Cuadra-Rodríguez, Lucas; Martínez-Leira, Almudena; Rosa-Zurera, Manuel; Vicen-Bueno, Raúl
Acoustic feedback can disturb the performance of a digital hearing aid at high gains, causing instability in the hearing aid and degradation of the speech. To restore stability, an acoustic feedback reduction (AFR) subsystem is needed that uses adaptive algorithms such as the least-mean-square (LMS) algorithm. This algorithm has a low computational cost, but it is very unstable. To avoid this problem, another feedback reduction system is used, based on a modified version of the LMS algorithm: the Normalized LMS (NLMS). The two algorithms are tested in two digital hearing aid categories: In-The-Ear and In-The-Canal. These categories are selected because they suffer from strong feedback effects, so robust AFR subsystems are needed. The added stable gain (ASG) over the limit gain when the AFR subsystem is working in the digital hearing aid is obtained for each category. The ASG is determined as a trade-off between two measurements: the Segmented Signal-to-Noise Ratio (objective measurement) and the speech quality (subjective measurement). The results show that digital hearing aids whose feedback reduction adaptive filter is adapted with the NLMS algorithm can achieve up to 18 dB of increase over the limit gain.
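
To make the difference between the two update rules concrete, here is a minimal time-domain sketch that identifies a short FIR path with either LMS or NLMS. It is only an illustration under simplifying assumptions: it is full-band rather than the WOLA filter bank structure used in the paper, the path is noise-free, and the names `adapt` and `simulate` and all parameter values are hypothetical. The only difference between the two modes is that NLMS divides the step size by the instantaneous input power, which makes its behaviour independent of the input level, whereas the stable step-size range of plain LMS shrinks as the input power grows.

```python
import random


def adapt(x, d, taps, mu, normalized, eps=1e-8):
    """Estimate an FIR path from input x and desired signal d.

    normalized=False: LMS,  w += mu * e * u
    normalized=True:  NLMS, w += mu * e * u / (eps + ||u||^2)
    Returns the final weight vector.
    """
    w = [0.0] * taps
    u = [0.0] * taps  # most recent input samples, newest first
    for xn, dn in zip(x, d):
        u = [xn] + u[:-1]
        y = sum(wi * ui for wi, ui in zip(w, u))  # filter output
        e = dn - y                                # estimation error
        if normalized:
            step = mu / (eps + sum(ui * ui for ui in u))
        else:
            step = mu
        w = [wi + step * e * ui for wi, ui in zip(w, u)]
    return w


def simulate(path, n, gain, seed=0):
    """White Gaussian input scaled by gain, filtered through a known FIR path."""
    rng = random.Random(seed)
    x = [gain * rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [0.0] * len(path)
    d = []
    for xn in x:
        u = [xn] + u[:-1]
        d.append(sum(p * ui for p, ui in zip(path, u)))
    return x, d
```

Running `adapt(x, d, 3, 0.5, True)` on signals from `simulate([0.5, -0.3, 0.1], 2000, gain)` recovers the path for small and large values of `gain` alike, while the same experiment with `normalized=False` needs `mu` retuned whenever the input level changes.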

7500
Non-Linear Distortions in Capacitors
van der Veen, Menno; van Maanen, Hans
Many people have claimed that capacitors have a notable influence on the audible quality of systems. We have identified one of the major causes of non-linear distortions in capacitors. Charging the capacitor will result in an attractive force acting on the conducting plates. As no material is infinitely stiff, this force will reduce the thickness of the dielectric and thus increase the capacitance. This process occurs in both phases of an AC signal in the same way and is thus non-linear. In this paper the consequences of this process are discussed. It should be noted that other passive components like resistors and inductors can also show similar non-linear behaviour.
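
The mechanism described in this abstract can be illustrated with a short worked derivation. It is a sketch under simplifying assumptions that are not taken from the paper: an ideal parallel-plate capacitor of plate area $A$, permittivity $\varepsilon$, and rest thickness $d_0$; a dielectric that behaves as a linear spring of stiffness $k$ (a hypothetical parameter); and a deformation small compared with $d_0$.

```latex
% Rest capacitance and electrostatic attraction between the plates
C_0 = \frac{\varepsilon A}{d_0}, \qquad
F(V) = \frac{\varepsilon A\, V^2}{2 d_0^2}

% Compression of the dielectric and resulting capacitance (first order)
\Delta d = \frac{F(V)}{k}, \qquad
C(V) = \frac{\varepsilon A}{d_0 - \Delta d}
     \approx C_0 \left( 1 + \alpha V^2 \right), \qquad
\alpha = \frac{\varepsilon A}{2 k d_0^3}

% Stored charge: the V^3 term is the non-linearity
Q(V) = C(V)\, V \approx C_0 \left( V + \alpha V^3 \right)

% For a sinusoidal voltage V = V_0 \sin\omega t
\sin^3 \omega t = \tfrac{1}{4} \left( 3 \sin\omega t - \sin 3\omega t \right)
\;\Rightarrow\;
Q_{3\omega} = -\tfrac{1}{4}\, C_0\, \alpha\, V_0^3 \sin 3\omega t
```

Because the force depends on $V^2$, it has the same sign for both polarities of the signal. Under these assumptions the capacitance therefore rises on both half-cycles, and the leading distortion product is a third harmonic that grows with the cube of the signal amplitude.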


(C) 2008, Audio Engineering Society, Inc.