Audio Engineering Society Papers

AES 123rd Convention

New York, NY, USA
October 5-8, 2007

Papers Listing

7162
Room Reflections Misunderstood?
Linkwitz, Siegfried
In a domestic living space a 2-channel monopolar and a dipolar loudspeaker system are compared for perceived differences in their reproduction of acoustic events. Both sound surprisingly similar, and the similarity is further enhanced by extending dipole behavior to frequencies above 1.4 kHz. The increased bandwidth of reflections is significant for spatial impression. Measured steady-state frequency response and measured reflection patterns differ for the two systems, while perceived sound reproduction is nearly identical in terms of timbre, phantom image placement, and sound stage width. The perceived depth in the recording is greater for the dipole loudspeaker. Auditory pattern recognition and precedence effects appear to explain these observations. Implications for the design of loudspeakers, room treatment, and room equalization are discussed.

7163
Aspects of Reverberation Echo Density
Huang, Patty; Abel, Jonathan S.
Echo density, and particularly its time evolution at the reverberation impulse response onset, is thought to be an important factor in the perceived time domain texture of reverberation. In this paper, the psychoacoustics of reverberation echo density is explored using reverberation impulse responses synthesized via a Poisson process to have a variety of static and evolving echo densities. In addition, a recently proposed echo density measure called the normalized echo density, or NED, is explored, and related via a simple expression to echo density specified in echoes per second using echo patterns with static echo densities. A continuum of perceived time-domain texture was noted, from “sputtery” around 100 echoes per second to “smooth” above about 20,000 echoes per second, at which point it was perceptually identical to Gaussian noise. The character of the reverberation impulse response onset was explored for various rates of echo density increase, and ranged from “sputtery” for long mixing times to “instantly smooth” for short mixing times.
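
As a rough illustration of the synthesis idea in this abstract, the sketch below draws echo arrival times from a Poisson process with a static rate and renders them into an impulse response. The function names, the random echo polarity, and the unit echo amplitude are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
import random

def poisson_arrival_times(rate_hz, duration_s, seed=0):
    """Echo arrival times (seconds) from a Poisson process with a static rate."""
    rng = random.Random(seed)
    times = []
    # Inter-arrival times of a Poisson process are exponentially distributed.
    t = rng.expovariate(rate_hz)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_hz)
    return times

def render_echoes(times, duration_s, sample_rate=44100, seed=0):
    """Place one unit echo per arrival time, with random polarity (an assumption)."""
    rng = random.Random(seed)
    n = int(duration_s * sample_rate)
    out = [0.0] * n
    for t in times:
        idx = int(t * sample_rate)
        if idx < n:  # guard against rounding at the very end of the buffer
            out[idx] += rng.choice((-1.0, 1.0))
    return out

# About 100 echoes per second, the density the abstract calls "sputtery".
sparse = render_echoes(poisson_arrival_times(100.0, 1.0), 1.0)
```

Raising `rate_hz` toward the 20,000 echoes per second mentioned in the abstract fills nearly every sample, which is consistent with the reported noise-like texture.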

7164
Localization in Spatial Audio—From Wave Field Synthesis to 22.2
Liebetrau, Judith; Sporer, Thomas; Korn, Thomas; Kunze, Kristina; Mank, Christoph; Marquard, Daniel; Matheja, Timo; Mauer, Stephan; Mayenfels, Thomas; Möller, Robert; Schnabel, Michael-Andreas; Slobbe, Benjamin; Überschär, Andreas
Spatial audio reproduction used to concentrate on systems with a low number of loudspeakers arranged in the horizontal plane. Wave Field Synthesis (WFS) and NHK's 22.2 are two systems that promise better localisation and envelopment. Comparisons of 22.2 with 5.1 concerning spatial attributes on one hand, and evaluations of the spatial properties of WFS on the other hand, have been published in the past, but different methods have been used. In this paper a listening test method is presented which is tailored to the evaluation of localisation in 3D audio formats at different listener positions. Two experiments have been conducted: In the first experiment the localisation precision of 22.2 reproduction was evaluated. In a second experiment the localisation precision in the horizontal plane as a function of spatial sampling was studied.

7165
Thresholds for Discriminating Upward from Downward Trajectories for Smooth Virtual Source Motion Within a Sagittal Plane
Benson, David H.; Martens, William L.; Scavone, Gary P.
In virtual auditory display, sound source motion is typically cued through dynamic variations in two types of localization cues: the inter-aural time delay (ITD) and binaural spectral cues. Generally, both types of cues contribute to the perception of sound source motion. For certain spatial trajectories, however, namely those lying on the surfaces of cones of confusion, ITD cues are absent, and motion must be inferred solely on the basis of spectral variation. This paper tests the effectiveness of these spectral variation cues in eliciting motion percepts. A virtual sound source was synthesized that traversed sections of a cone of confusion on a particular sagittal plane. The spatial extent of the source’s trajectory was systematically varied to probe directional discrimination thresholds.

7166
Headphone Transparification: A Novel Method for Investigating the Externalization of Binaural Sounds
Moore, Alastair H.; Tew, Anthony I.; Nicol, Rozenn
The only way to be certain that binaurally rendered sounds are properly externalised is to compare them to real sound sources in a discrimination experiment. However, the presence of the headphones required for the binaural rendering interferes with the real sound source. A novel technique is presented which uses small compensating signals applied to the headphones at the same time as the real source is active, such that the signals reaching the ears are the same as if the headphones were not present.

7167
On the Sound Color Properties of Wavefield Synthesis and Stereo
Wittek, Helmut; Rumsey, Francis; Theile, Günther
The sound colour reproduction properties of wavefield synthesis are analysed by listening tests and compared with those of stereophony. A novel technique, "OPSI", designed to avoid spatial aliasing is presented and analysed in theory and practice. Both stereophonic phantom sources and OPSI sources were perceived to be less coloured than was predicted by colouration predictors based on the spectral alterations of the ear signals. This leads to the hypothesis that a decolouration process exists for stereophonic reproduction as proposed in the "association model" of Theile.

7168
Suppression of Musical Noise Artifacts in Audio Noise Reduction by Adaptive 2-D Filtering
Lukin, Alexey; Todd, Jeremy
Spectral attenuation algorithms for audio noise reduction often generate annoying musical noise artifacts. Most existing methods for suppression of musical noise employ a combination of instantaneous and time-smoothed spectral estimates for calculation of spectral gains. In this paper, a 2D approach to the filtering of a time-frequency spectrum is proposed, based on a recently developed Non-Local Means image denoising algorithm. The proposed algorithm demonstrates efficient reduction of musical noise, without creating “noise echoes” artifacts inherent in time-smoothing methods.

7169
Perceptually Motivated Gain Filter Smoothing for Noise Suppression
Favrot, Alexis; Faller, Christof
Stationary noise suppression is widely used, mostly for reducing noise in speech signals or for audio restoration. Most noise suppression algorithms are based on spectral modification, i.e. a real-valued gain filter is applied to short-time spectra of the speech signal to reduce noise. The more noise is to be removed, the more likely are artifacts due to aliasing effects and time variance of the gain filter. A perceptually motivated systematic time and frequency smoothing of the gain filter is proposed to improve quality, considering the frequency resolution of the auditory system and masking. Comparison with a number of previous methods indicates that the proposed noise suppressor performs as well as the best of the other methods, while its computational complexity is much lower.

7170
A Novel Automatic Noise Removal Technique for Audio and Speech Signals
E.V., Harinarayanan; Sinha, Deepen; Saeed, Shamail; Ferreira, Anibal
This paper introduces new ideas on wideband stationary/non-stationary noise removal for audio signals. Current noise reduction techniques have generally proven to be effective, yet these typically exhibit certain undesirable characteristics. Distortion and/or alteration of the audio characteristics of the primary audio sound is a common problem. Also, user intervention in identifying the noise profile is sometimes necessary. The proposed technique is centered on the classical Kalman filtering technique for noise removal but uses a novel architecture whereby advanced signal processing techniques are used to identify and preserve the richness of the audio spectrum. The paper also includes conceptual and derivative results on parameter estimation, a description of a multi-parameter Signal Activity Detector (SAD), and our newly obtained, improved results.

7171
The Concept, Design, and Implementation of a General Dynamic Parametric Equalizer
Wise, Duane K.
The classic operations of dynamics processing and parametric equalization control two separate domains of an audio signal. The operational nature of the two processors gives insight into a manner in which they may be combined into a single processor. This integrated processor can perform as the equivalent of a standalone dynamics processor or parametric equalizer, but can also modify the boost and/or cut of an equalizer stage over time following a dynamics curve. The design of a digital version of this concept is discussed herein, along with implementation issues and proposals for their resolution.

7172
Short-term Memory for Musical Intervals: Cognitive Differences for Consonant and Dissonant Pure-Tone Dyads
Rogers, Susan E.; Levitin, Daniel J.
To explore the origins of sensory and musical consonance/dissonance, 16 participants performed a short-term memory task by listening to sequentially presented pure-tone dyads. Each dyad was presented twice; during each trial participants judged whether a dyad was novel or familiar. Nonmusicians showed greater recognition of musically dissonant than musically consonant dyads. Musicians recognized all dyads more accurately than predicted. Neither group used sensory distinctiveness as a recognition cue, suggesting that the frequency ratio, rather than the frequency difference between two tones, underlies memory for musical intervals. Participants recognized dyads well beyond the generally understood auditory short-term memory limit of 30 seconds, despite the inability to encode the stimuli for long-term storage.

7173
Multiple Regression Modeling of the Emotional Content of Film and Music
Parke, Rob; Chew, Elaine; Kyriakakis, Chris
Our research seeks to model the effect of music on the perceived emotional content of film media. We used participants’ ratings of the emotional content of film-alone, music-alone, and film-music pairings for a collection of emotionally neutral film clips and emotionally provocative music segments. Mapping the results onto a three-dimensional emotion space, we observed a strong relationship between the ratings of the film- and music-alone clips, and those of the film-music pairs. Previously, we modeled the ratings in each dimension independently. We now develop models, using stepwise regression, to describe the film-music ratings using quadratic terms, and based on all dimensions simultaneously. We demonstrate that while linear terms are sufficient for single emotion dimension models, regression models that consider multiple emotion dimensions yield better results.

7174
Measurements and Perception of Nonlinear Distortion—Comparing Numbers and Sound Quality
Voishvillo, Alex
The discrepancy between traditional measures of nonlinear distortion and its perception is commonly recognized. THD, two-tone and multitone intermodulation, and the coherence function provide certain objective information about nonlinear properties of a DUT, but they do not use any psychoacoustical principles responsible for distortion perception. Two approaches to building psychoacoustically-relevant measurement methods are discussed; one is based on simulation of the hearing system's response, similar to the methods used for assessment of a codec's sound quality. The other approach is based on several ideas such as distinguishing “low-level” versus “high-level” nonlinearities, low-order versus high-order nonlinearities, and spectral content of distortion signals that occur below the spectrum of an undistorted signal versus one that overlaps the signal's spectrum or occurs above it. Several auralization examples substantiating this approach are demonstrated.

7175
Influence of Loudness Level on the Overall Quality of Transmitted Speech
Côté, Nicolas; Gautier-Turbin, Valérie; Möller, Sebastian
This paper consists of a study on the influence of the loudness on the perceived quality of transmitted speech. This quality is based on judgments of particular quality features, one of which is loudness. In order to determine the influence of loudness on perceived speech quality, we designed a two-step auditory experiment. We varied the speech level of selected speech samples and degraded them by coding and packet-loss. Results show that loudness has an effect on the overall speech quality, but that effect depends on the other impairments involved in the transmission path, and especially on the bandwidth of the transmitted speech. We tried to predict the auditory judgments with two quality prediction models. The signal-based WB-PESQ model, which normalizes the speech signals to a constant speech level, does not succeed in predicting the speech quality for speech signals with only impairments due to a non-optimum speech level. However, the parametric E-model, which includes a measure of the listening level, provides a good estimation of the speech quality.

7176
On the Use of Graphic Scales in Modern Listening Tests
Zielinski, Slawomir; Brooks, Peter; Rumsey, Francis
This paper provides a basis for discussion of the perception and use of graphic scales in modern listening tests. According to the literature, the distances between the adjacent verbal descriptors used in typical graphic scales are often perceptually unequal. This implies that the scales are perceptually non-linear and the ITU-R Quality Scale is shown to be particularly non-linear in this respect. In order to quantify the degree of violation of linearity in listening tests, the evaluative use of graphic scales was studied in three listening tests. Contrary to expectation, the results showed that the listeners use the scales almost linearly. This may indicate that the listeners ignore the meaning of the descriptors and use the scales without reference to the labels.

7177
A Model-Based Technique for the Perceptual Optimization of Multimodal Musical Performances
Valente, Daniel L.; Braasch, Jonas
As multi-channel audio and visual processing becomes more accessible to the general public, musicians are beginning to experiment with performances where players are in two or more remote locations. These co-located or telepresence performances challenge the conventions and basic rules of traditional musical experience. While they allow for collaboration with musicians and audiences in remote locations, the current limitations of technology restrict the communication between musicians. In addition, a telepresence performance introduces optical distortion that can result in impaired auditory communication, resulting in the need to study certain auditory-visual interactions. One such interaction is the relationship between a musician and a virtual visual environment. How does the attendant visual environment affect the perceived presence of a musician? An experiment was conducted to determine the magnitude of this effect. Two pre-recorded musical performances were presented through virtual display in a number of acoustically diverse environments under different relative background lighting conditions. Participants in this study were asked to balance the direct-to-reverberant ratio and the reverberant level until the virtual musician's acoustic environment was congruent with that of the visual representation. One can expect auditory-visual interactions in the perception of a musician in varying virtual environments. Through a multivariate parameter optimization, the results from this study will be used to develop a parametric model that will control the current auditory rendering system, Virtual Microphone Control (ViMiC), in order to create a more perceptually accurate auditory-visual environment for performance.

7178
Subjective and Objective Rating of Intelligibility of Speech Recordings
Gover, Bradford N.; Bradley, John S.
Recordings of test speech and an STIPA modulated noise stimulus were made with several microphone systems placed in various locations in a range of controlled test spaces. The intelligibility of the test speech recordings was determined by a subjective listening test, revealing the extent of differences among the recording systems and locations. Also, STIPA was determined for each physical arrangement, and compared with the intelligibility test scores. The correlation between STIPA and the intelligibility scores was not found to be high in all situations. A computer program was written to determine STIPA in accordance with IEC 60268-16. The result was found to be highly sensitive to the method of determining the modulation transfer function at each modulation frequency, yielding the most accurate result when normalizing by the pre-measured properties of the specific stimulus used.
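
To make the normalization step in this abstract concrete, the sketch below divides a received modulation depth by the pre-measured depth of the stimulus and converts the result to a transmission index using the usual STI mapping (effective signal-to-noise ratio clipped to ±15 dB, then scaled to 0..1). It is a simplified illustration only: the function names and example depths are hypothetical, and the octave-band weighting, masking, and threshold corrections of IEC 60268-16 are omitted.

```python
import math

def modulation_transfer(received_depth, stimulus_depth):
    """Modulation transfer value m, normalized by the measured stimulus depth."""
    m = received_depth / stimulus_depth
    return min(max(m, 0.0), 1.0)  # keep m inside its physical range

def transmission_index(m):
    """Map a modulation transfer value to a transmission index in [0, 1]."""
    if m <= 0.0:
        return 0.0
    if m >= 1.0:
        return 1.0
    snr_db = 10.0 * math.log10(m / (1.0 - m))  # effective signal-to-noise ratio
    snr_db = max(-15.0, min(15.0, snr_db))     # clip to the +/-15 dB range
    return (snr_db + 15.0) / 30.0

# Hypothetical depths: the stimulus was measured at 0.8, the recording shows 0.4.
m = modulation_transfer(0.4, 0.8)
ti = transmission_index(m)
```

Normalizing by a nominal depth instead of the measured one shifts `m`, and therefore the index, which is one way the sensitivity reported in the abstract can arise.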

7179
Potential Biases in MUSHRA Listening Tests
Zielinski, Slawomir; Hardisty, Philip; Hummersone, Christopher; Rumsey, Francis
The method described in the ITU-R BS.1534-1 standard, commonly known as MUSHRA (MUltiple Stimulus with Hidden Reference and Anchors), is widely used for the evaluation of systems exhibiting intermediate quality levels, in particular low-bit rate codecs. This paper demonstrates that this method, despite its popularity, is not immune to biases. In two different experiments designed to investigate potential biases in the MUSHRA test, systematic discrepancies in the results were observed with a magnitude up to 22%. The data indicates that these discrepancies could be attributed to the stimulus spacing and range equalizing biases.

7180
Loudness Domain Signal Processing
Seefeldt, Alan
Loudness Domain Signal Processing (LDSP) is a new framework within which many useful audio processing tasks may be achieved with high quality results. The LDSP framework presented here involves first transforming audio into a perceptual representation utilizing a psychoacoustic model of loudness perception. This model maps the non-linear variation in loudness perception with signal frequency and level into a domain where loudness perception across frequency and time is represented on a uniform scale. As such, this domain is ideal for performing various loudness modification tasks such as volume control, automatic leveling, etc. These modifications may be performed in a modular and sequential manner, and the resulting modified perceptual representation is then inverted through the psychoacoustic loudness model to produce the final processed audio.

7181
Design of a Flexible Crossfade/Level Controller Algorithm for Portable Media Platforms
Jochelson, Danny; Fedigan, Stephen; Kridner, Jason; Hayes, Jeff
The addition of a growing number of multimedia capabilities on mobile devices necessitates rendering multiple streams simultaneously, fueling the need for intelligent mixing of these streams to achieve proper balance and address the tradeoff between dynamic range and saturation. Additionally, the crossfading of subsequent streams can greatly enhance the user experience on portable media devices. This paper describes the architecture, features, and design challenges for a real-time, intelligent mixer with crossfade capabilities for portable audio platforms. This algorithm shows promise in addressing many audio system challenges on portable devices through a highly flexible and configurable design while maintaining low processing requirements.
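
For readers unfamiliar with crossfading between consecutive streams, here is a minimal sketch of an equal-power crossfade. It is not the algorithm described in the paper, whose details are not given in the abstract; the function names and the choice of a sine/cosine gain law are assumptions for illustration.

```python
import math

def equal_power_gains(position):
    """Gains for the outgoing and incoming streams at a fade position in [0, 1]."""
    p = min(max(position, 0.0), 1.0)
    # cos^2 + sin^2 = 1, so the summed power of uncorrelated streams stays constant.
    return math.cos(p * math.pi / 2.0), math.sin(p * math.pi / 2.0)

def crossfade(outgoing, incoming):
    """Mix the overlapping region of two streams with an equal-power fade."""
    n = min(len(outgoing), len(incoming))
    mixed = []
    for i in range(n):
        pos = i / (n - 1) if n > 1 else 1.0
        g_out, g_in = equal_power_gains(pos)
        mixed.append(g_out * outgoing[i] + g_in * incoming[i])
    return mixed

faded = crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

Because the two gains can sum to more than one mid-fade, a real mixer would still need headroom or limiting, which relates to the tradeoff between dynamic range and saturation mentioned in the abstract.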

7182
Audio Delivery Specification
Lund, Thomas
From the quasi-peak meter in broadcast to sample by sample assessment in music production, normalization of digital audio has traditionally been based on a peak level measure. The paper demonstrates how low dynamic range material under such conditions generally comes out the loudest, and how the recent ITU-R BS.1770 standard offers a coherent alternative to peak level fixation. Taking the ITU-R recommendations into account, novel ways of visualizing short-term loudness and loudness history are presented; and applications for compatible statistical descriptors portraying an entire music track or broadcast program are discussed.

7183
Multi-Core Signal Processing Architecture for Audio Applications
Karley, Brent; Liberman, Sergio; Gallimore, Simon
As already seen in the embedded computing industry and other consumer markets, the trend in audio signal processing architectures is towards multi-core designs. This trend is expected to continue given the need to support higher performance applications that are becoming more prevalent in both the consumer and professional audio industries. This paper describes a multi-core audio architecture being promoted to the audio industry and details the various architectural hardware, software and system level trade-offs. The proper application of multi-core architectures is addressed for both consumer and professional audio applications and a comparison of single core, multi-core, and multi-chip designs is provided based on the authors’ experience in the design, development and application of signal processors.

7184
Rapidly Prototyping and Implementing Audio Algorithms on DSPs Using Model-Based Design and Automatic Code Generation
Ananthan, Arvind
This paper explores the increasingly popular model-based design concept to design audio algorithms within a graphical design environment, Simulink, and automatically generate processor-specific code to implement them on a target DSP in a short time without any manual coding. The final fixed-point processors targeted in this lecture will be the Analog Devices Blackfin processor and the Texas Instruments C6416 DSP. Examples include an acoustic noise cancellation system (using an LMS algorithm), a 3-band parametric equalizer, and miscellaneous audio effects. The design process, starting from a floating-point model and easily converting it to a fixed-point model, is clearly demonstrated. Finally, the model is implemented on a C6416 DSK board and a Blackfin 537 EZ-Kit board using the Real Time Workshop (RTW) code generation tool. Other examples that will be designed and implemented on the DSP within an hour include audio effects such as reverberation, flanging, and voice pitch shifting.

7185
Filter Reconstruction and Program Material Characteristics Mitigating Word Length Loss in Digital Signal Processing-Based Compensation Curves Used for Playback of Analog Recordings
Robinson, Robert S.
Renewed consumer interest in pre-digital recordings, such as vinyl records, has spurred efforts to implement playback emphasis compensation in the digital domain. This facilitates realizing tighter design objectives with less effort than required with practical analog circuitry. A common assumption regarding a drawback to this approach, namely bass resolution loss (word length truncation) of up to approximately seven bits during digital de-emphasis of recorded program material, ignores the reconstructive properties of compensation filtering and the characteristics of typical program material. An analysis of the problem is presented, as well as examples showing a typical resolution loss of zero to one bits. The worst case resolution loss, which is unlikely to be encountered with music, is approximately three bits.

7186
Modeling of Nonlinearities in Electrodynamic Loudspeakers
Bard, Delphine; Sandberg, Göran
This paper proposes a model of the non-linearities in an electro-dynamic loudspeaker based on Volterra series decomposition and taking into account the thermal effects affecting the electrical parameters when temperature increases. This model will be used to predict nonlinearities taking place in a loudspeaker and their evolution as the loudspeaker is used for a long time and/or at high power rates and its temperature increases. A temperature increase of the voice coil will cause its series resistance value to increase, therefore reducing the current flowing in the loudspeaker. This phenomenon is known as power compression.
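
The power compression mechanism described in this abstract can be put into numbers with a simple worked example. The sketch below uses the linear temperature model for conductor resistance and assumes a copper voice coil driven from a constant-voltage source, treating the coil as purely resistive. The temperature coefficient (about 0.0039 per kelvin for copper) and the example values are assumptions, not figures from the paper.

```python
import math

COPPER_TEMP_COEFF = 0.0039  # per kelvin, approximate value for copper (assumption)

def hot_resistance(r_cold, delta_t, alpha=COPPER_TEMP_COEFF):
    """Voice-coil resistance after a temperature rise of delta_t kelvin."""
    return r_cold * (1.0 + alpha * delta_t)

def compression_db(r_cold, delta_t, alpha=COPPER_TEMP_COEFF):
    """Change in coil current, in dB, at constant drive voltage (I = V / R)."""
    return 20.0 * math.log10(r_cold / hot_resistance(r_cold, delta_t, alpha))

# Hypothetical 6-ohm coil heated by 100 K.
r_hot = hot_resistance(6.0, 100.0)
loss = compression_db(6.0, 100.0)
```

Under these assumptions a 100 K rise increases the resistance by 39 percent and lowers the current by roughly 2.9 dB; a full model would also include the reactive and motional parts of the impedance.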

7187
Listening Tests of the Localization Performance of Stereodipole and Ambisonic Systems
Capra, Andrea; Fontana, Simone; Adriaensen, Fons; Farina, Angelo; Grenier, Yves
In order to find a possible correlation between objective parameters and subjective descriptors of the acoustics of theatres, auditoria, or music halls, and so to perform meaningful listening tests, we need to find a reliable 3D audio system that gives the correct perception of distances, good localization all around the listener, and a natural sense of realism. For this purpose a Stereo Dipole system and an Ambisonic system were installed in a listening room at La Casa Della Musica (Parma, Italy). Listening tests were carried out to evaluate the localization performance of the two systems.

7188
Round Robin Comparison of HRTF Simulation Systems: Preliminary Results
Greff, Raphaël; Katz, Brian F. G.
Variability in experimental measurement techniques of the HRTF is a concern that numerical calculation methods can hope to avoid. Numerical techniques such as the Boundary Element Method (BEM) allow for the calculation of the HRTF over the full audio spectrum from a geometrical model. While numerical calculations are not prone to the same errors as physical measurements, other problems appear which cause variations: geometry acquisition and modeling of real shapes as meshes can be performed in different ways. An ongoing international round-robin study, “Club Fritz”, has already gathered many HRTF data sets measured at different laboratories on a single dummy head. This work presents preliminary results of numerical simulations based on an acquired geometrical model of this artificial head.

7189
Simulation of Complex and Large Rooms Using a Digital Waveguide Mesh
López, José J.; Escolano, José; Pueo, Basilio
The Digital Waveguide Mesh (DWM) method for room acoustic simulation has been introduced in recent years to solve sound propagation problems numerically. However, the huge computing power needed to model large rooms and the complexity of incorporating realistic boundary conditions have delayed its general use, restricting it to the validation of theoretical concepts using simple and small rooms. This paper presents a complete DWM implementation that includes a serious treatment of boundary conditions and is able to cope with different materials in very large rooms up to reasonable frequencies. A simulation of a real large building modeled with a high degree of precision has been carried out, and the obtained results are presented and analyzed in detail.

7190
The Flexible Bass Absorber
Adelman-Larsen, Niels W.; Thompson, Eric R.; Gade, Anders C.
Multi-purpose concert halls face a dilemma. They host different performance types that require significantly different acoustic conditions in order to provide the best sound quality to the performers, sound engineers, and audience. Pop and rock music often contain high levels of bass sound energy but still require high definition for good sound quality. The mid- and high-frequency absorption is easily regulated, but adjusting the low-frequency absorption has typically been too expensive or required too much space to be practical for multi-purpose halls. A practical solution to this dilemma has been developed. Measurements were made on a variable and mobile low-frequency absorber. The paper presents the results of prototype sound absorption measurements as well as elements of the design.

7191
The Relation between Active Radiating Factor and Frequency Responses of Loudspeaker Line Arrays - Part 2
Shen, Yong; An, Kang; Ou, Dayi
Active Radiating Factor (ARF) is an important parameter for evaluating the similarity between a real loudspeaker line array and the ideal continuous line source. Our previous paper dealt with the relation between the ARF of the loudspeaker line array and the Differential chart of its Frequency Responses in two distances (FRD). In this paper, an improved way to estimate the ARF of the loudspeaker line array by measuring on-axis frequency responses is introduced. Some further problems are discussed and experimental results are analyzed. The results may be of help to loudspeaker array designers.

7192
Time Varying Behavior of the Loudspeaker Suspension
Pedersen, Bo Rohde; Agerkvist, Finn T.
The suspension part of the electrodynamic loudspeaker is often modelled as a simple linear spring with viscous damping; however, the dynamic behaviour of the suspension is much more complicated than predicted by such a simple model. At higher levels the compliance becomes non-linear and often changes during excitation at high levels. This paper investigates how the compliance of the suspension depends on the excitation, i.e. level and frequency content. The measurements are compared with other known measurement methods of the suspension.

7193
Diffusers with Extended Frequency Range
Dadiotis, Konstantinos; Angus, Jamie A.; Cox, Trevor J.
Schroeder diffusers are unable to diffuse sound when all their wells radiate in phase, a phenomenon known as the flat plate effect. This phenomenon appears at multiples of the frequency pf0, where p is the integer that generates the well depths and f0 the design frequency. A solution is to send the flat plate frequencies above the bandwidth of interest. For QRDs and PRDs to achieve this goal, impractically long sequences are needed. This paper presents Power Residue diffusers, of small length in comparison to their prime generator, as solutions to the problem. Their characteristics are investigated and their performance when applied to Schroeder diffusers is explored while modulation is used to cope with periodicity. The results confirm the expectations.
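
The flat plate effect in this abstract can be checked numerically for the quadratic residue case. The sketch below assumes the standard QRD construction (sequence n² mod p, well depth equal to the sequence value times λ0/(2p)) and an idealized round-trip phase per well; the function names, the speed of sound, and the example prime and design frequency are illustrative choices, and the Power Residue diffusers proposed in the paper are not modeled.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def qrd_sequence(p):
    """Quadratic residue sequence n^2 mod p for a prime generator p."""
    return [(n * n) % p for n in range(p)]

def well_depths(p, f0, c=SPEED_OF_SOUND):
    """Well depths in metres for design frequency f0 (standard QRD formula)."""
    wavelength0 = c / f0
    return [s * wavelength0 / (2.0 * p) for s in qrd_sequence(p)]

def well_reflections(depths, freq, c=SPEED_OF_SOUND):
    """Idealized reflection factor of each well: a pure round-trip phase shift."""
    return [cmath.exp(-1j * 4.0 * math.pi * d * freq / c) for d in depths]

def flat_plate_frequencies(p, f0, f_max):
    """Multiples of p * f0 up to f_max, where every well radiates in phase."""
    return [m * p * f0 for m in range(1, int(f_max // (p * f0)) + 1)]

depths = well_depths(7, 500.0)
at_flat_plate = well_reflections(depths, 7 * 500.0)  # all factors equal 1
at_design = well_reflections(depths, 500.0)          # phases differ between wells
```

At p·f0 each round-trip phase is a whole multiple of 2π, so the surface behaves like a flat plate; pushing p·f0 above the band of interest with a QRD therefore requires a large p, which is the motivation stated in the abstract.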

7194
Waveguide Mesh Reverberator with Internal Decay and Diffusion Structures
Abel, Jonathan S.; Huang, Patty; Smith, Julius O., III
Loss and diffusion elements are proposed for a digital waveguide mesh reverberator. The elements described are placed in the interior of the waveguide mesh, and may be viewed as modeling objects within the acoustical space. Filters at internal scattering junctions provide frequency-dependent losses, and control over decay rate. One proposed design method attenuates signals according to a desired reverberation time, taking into account the local density of loss junctions. Groups of one or several adjacent scattering junctions are altered to break up propagating wavefronts, thereby increasing diffusion. A configuration which includes these internal elements offers more flexibility in tailoring the reverberant impulse response than the common waveguide mesh construction where loss and diffusion elements are uniformly arranged solely at the boundaries. Finally, such interior decay and diffusion elements are ideally suited for use with closed waveguide structures having no boundaries, such as spherical or toroidal meshes, or meshes formed by connecting the edges or surfaces of two or more meshes.

7195
Deriving Physical Predictors for Auditory Attribute Ratings Made in Response to Multichannel Music Reproductions
Kim, Sungyoung; Martens, William L.
A group of 8 students engaged in a Tonmeister training program were presented with multichannel loudspeaker reproductions of a set of solo piano performances, and were asked to complete two attribute rating sessions that were well separated in time. Five of the 8 listeners produced highly consistent ratings after a 6-month period during which they received further Tonmeister training. Physical predictors for the obtained attribute ratings were developed from the analysis of binaural recordings of the piano reproductions in order to support comparison between these stimuli and other stimuli, and thereby to establish a basis for independent variation in the attributes to serve both creative artistic goals and further scientific exploration of such multichannel music reproductions.

7196
Interaction between Loudspeakers and Room Acoustics Influences Loudspeaker Preferences in Multichannel Audio Reproduction
Olive, Sean E.; Martens, William L.
The physical interaction between loudspeakers and the acoustics of the room in which they are positioned has been well established; however, the influence on listener preferences for loudspeakers that results from such variation in room acoustics has received little experimental verification. If listeners adapt to listening room acoustics relatively quickly, then room acoustic variation should not significantly influence loudspeaker preferences. In the current study, two groups of listeners were given differential exposure to listening room acoustics via a binaural room scanning (BRS) measurement and playback system. Although no significant difference in loudspeaker preference was found between these two groups of listeners, the room acoustic variation to which they were exposed did significantly influence loudspeaker preferences.

7197
Evaluating Off-Center Sound Degradation in Surround Loudspeaker Setups for Various Multichannel Microphone Techniques
Peters, Nils; McAdams, Stephen; Braasch, Jonas
Many listening tests have been undertaken to estimate listeners' preferences for different multichannel recording techniques. Usually these tests focus on the sweet spot, the spatial area where the listener maintains optimal perception of virtual sound sources, thereby neglecting to consider off-center listening positions. The purpose of the present study is to determine how different microphone configurations affect the size of the sweet spot. A perceptual method is chosen in which listening impressions achieved by three different multichannel recording techniques for several off-center positions are compared with the listening impression at the sweet spot. Results of this listening experiment are presented and interpreted.

7198
The Effects of Latency on Live Sound Monitoring
Lester, Michael; Boley, Jon
A subjective listening test was conducted to determine how objectionable various amounts of latency are for performers in live monitoring scenarios. Several popular instruments were used and the results of tests with wedge monitors are compared to those with in-ear monitors. It is shown that the audibility of latency is dependent on both the type of instrument and monitoring environment. This experiment shows that the acceptable amount of latency can range from 42 ms to possibly less than 1.4 ms under certain conditions. The differences in latency perception for each instrument are discussed. It is also shown that more latency is generally acceptable for wedge monitoring setups than for in-ear monitors.

7199
A Perforated Desk Surface to Diminish Coloration in Desktop Audio-Production Environments
Gentner, Karl; Braasch, Jonas; Calamia, Paul
In audio-production rooms, a common source of harmful reflections is the mixing console or desk surface itself. A perforated material is proposed as an alternative desk surface to reduce coloration by achieving acoustical transparency. A variety of desk surfaces and perforation schemes were tested within common room conditions. The resulting psychoacoustic study indicates that the fully-perforated desk provides lower coloration than that of the solid desk in every condition. A partially perforated desk shows a similar decrease in coloration, specifically when the perforated area is determined by the Fresnel zones dictated by the source and receiver positions.

7200
Perceptually Modeled Effects of Interchannel Crosstalk in Multichannel Microphone Technique
Lee, Hyun-Kook; Mason, Russell; Rumsey, Francis
One of the most noticeable perceptual effects of interchannel crosstalk in multichannel microphone technique is an increase in perceived source width. The relationship between the perceived source-width-increasing effect and its physical causes was analysed using an IACC-based objective measurement model. A description of the measurement model is presented and the measured data obtained from stimuli created with crosstalk and those without crosstalk are analysed visually. In particular, frequency and envelope dependencies of the measured results and their relationship with the perceptual effect are discussed. The relationship between the delay time of the crosstalk signal and the effect of different frequency content on the perceived source width is also discussed in this paper.

7201
Sigma-Delta Modulators Without Feedback Around the Quantizer?
Lipshitz, Stanley P.; Vanderkooy, John; Bodmann, Bernhard G.
We use a result due to Craven, the “Integer Noise Shaper Theorem,” to show that the internal system dynamics of the class of sigma-delta modulators (or equivalently noise shapers) with integer-coefficient error-feedback filters can be completely understood from the action of simple, linear pre- and de-emphasis filters surrounding a (possibly dithered) quantizer. In these mathematically equivalent models, there is no longer any feedback around the quantizer. The major stumbling block, which has previously prevented a complete dynamical analysis of such systems of order higher than one, is thus removed. The class of integer noise shapers includes, but is not restricted to, the important family of “Pascal” shapers, having all their zeros at dc. Before examining the “integer” shaper case, we discuss and extend Gerzon’s generic “Generalized Noise Shaper Theorem.”
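
The equivalence described above can be checked numerically in the simplest case. The sketch below is a minimal first-order illustration and not the paper's derivation: it assumes an undithered quantizer that rounds to the nearest integer (step size 1, ties rounded up) and a noise transfer function of 1 - z^-1. The function names are hypothetical. Exact rational arithmetic is used so that the two structures can be compared sample for sample.

```python
from fractions import Fraction
from math import floor


def quantize(v):
    """Round to the nearest integer, ties upward (unit step, no dither)."""
    return floor(v + Fraction(1, 2))


def error_feedback_shaper(x):
    """First-order noise shaper with feedback of the quantizer error."""
    y, e_prev = [], Fraction(0)
    for xn in x:
        v = xn - e_prev          # subtract the previous quantization error
        yn = quantize(v)
        e_prev = yn - v          # error fed back on the next sample
        y.append(yn)
    return y


def feedback_free_shaper(x):
    """Integrator (pre-emphasis), quantizer, differentiator (de-emphasis)."""
    y, acc, q_prev = [], Fraction(0), 0
    for xn in x:
        acc += xn                # running sum of the input
        q = quantize(acc)        # quantizer with no feedback around it
        y.append(q - q_prev)     # first difference of the quantized sum
        q_prev = q
    return y


if __name__ == "__main__":
    x = [Fraction((37 * n) % 101, 101) - Fraction(1, 2) for n in range(200)]
    print(error_feedback_shaper(x) == feedback_free_shaper(x))
```

The two outputs agree because the running sum of the shaper output is always an integer, and adding an integer to the quantizer input shifts its output by the same integer.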

7202
The Effect of Different Metrics on the Performance of “Stack” Algorithms for Look-Ahead Sigma Delta Modulators
Websdell, Peter C.; Angus, Jamie A.
Look-ahead Sigma-Delta modulators look forward k samples before deciding to output a “one” or a “zero”. The Viterbi algorithm is then used to search the trellis of the exponential number of possibilities that such a procedure generates. This paper describes alternative tree based algorithms. Tree based algorithms are simpler to implement because they do not require backtracking to determine the correct output value. They can also be made more efficient using “Stack” algorithms. Both the tree algorithm and the more computationally efficient “Stack” algorithms are described. In particular, the effects of different error metrics on the performance of the “Stack” algorithm are described and the average number of moves required per bit discussed. The performance of the “Stack” algorithm is shown to be better than previously thought.

7203
Evaluation of Time-Frequency Analysis Methods and Their Practical Applications
Brunet, Pascal; Rimkunas, Zachary; Temme, Steve
Time-Frequency analysis has been in use for more than 20 years and many different Time-Frequency distributions have been developed. Four in particular, Short Time Fourier Transform, Cumulative Spectral Decay, Wavelet and Wigner-Ville, have gained popularity and firmly established themselves as useful measurement tools. This paper compares these four popular transforms, explains their trade-offs and discusses how to apply them to analyzing audio devices. Practical examples of loudspeaker impulse responses, loose particles, and Rub & Buzz defects are given as well as a demonstration of their application to common problems with digital/analog audio devices such as Bluetooth headsets, MP3 players and VoIP telephones.

7204
Time-Frequency Characterization of Loudspeaker Responses Using Wavelet Analysis
Ponteggia, Daniele; Di Cola, Mario
An electro-acoustic transducer can be characterized by measuring its Impulse Response (IR). Usually the collected IR is then transformed by means of the Fourier Transform to get the complex frequency response. IR and complex frequency response form a pair of equivalent views of the same phenomenon. An alternative joint time-frequency view of the system response can be achieved using the Wavelet Transform and a color-map display. This work illustrates the implementation of the Wavelet Transform in a commercial measurement software package and presents some practical results on different kinds of electro-acoustic systems.

7205
Equalization of Loudspeaker Resonances Using Second-Order Filters Based on Spatially Distributed Impulse Response Measurements
Dyreby, Jakob; Choisel, Sylvain
A new approach for identifying and equalizing resonances in loudspeakers is presented. The method optimizes the placement of poles and zeros in a second-order filter by minimization of the frequency-dependent decay. Each resonance may be equalized by the obtained second-order filter. Furthermore, the use of spectral decay gives opportunity for optimizing on multiple measurements simultaneously making it possible to take multiple spatial directions into account. The proposed procedure is compared to direct inversion and minimum-phase equalization. It makes it possible to equalize precisely the artifacts responsible for ringing, while being largely unaffected by other phenomena such as diffractions, reflections, and noise.

7206
Pump Up the Volume: Enhancing Music Phone Audio Quality and Power Using a Supercapacitor for Power Management
Mars, Pierre
As multimedia and music phones grow in popularity, consumers want an iPod-quality, uninterrupted audio experience without the buzzing and clicks associated with wireless transmission. This article describes the problems delivering high power and high quality audio in music-enabled mobile phones and how a supercapacitor can overcome them. Typically, the audio amplifier power supply input in a mobile phone is connected directly to Vbattery. This paper compares audio performance between the typical setup and connecting the audio amp supply to a supercapacitor charged to 5V through a current limited boost converter.

7207
Digital Audio Processing on a Tiny Scale: Hardware and Software for Personal Devices
Eastty, Peter
The design of an audio signal processor, graphical programming environment, DSP software and parameter adjustment tool is described with reference to the hardware and software requirements of the audio sweetening function in personal devices, particularly cell phones. Special care is taken in the hardware design to ensure low operating power, small size (4 mm × 4 mm package, or 0.5 to 1 sq mm area depending on geometry), stereo analog and digital I/O and high performance. The parameter adjustment tool allows real time control of the DSP so that processing may be customized to the actual properties of the audio sources and the acoustic properties of the enclosure and speakers. A live demonstration of the programming and parameter adjustment of the processor will be given as part of the presentation of the paper.

7208
Enhancing End-User Capabilities in High Speed Audio Networks
Chigwamba, Nyasha; Foss, Richard
FireWire is a digital network technology that can be used to interconnect professional audio equipment, PCs and electronic devices. The Plural Node Architecture splits connection management of FireWire audio devices between two nodes, namely an Enabler and a Transporter. The Audio Engineering Society’s SC-02-12-G Task Group has produced an Open Generic Transporter guideline document which describes a generic interface between the Enabler and Transporter. A client-server implementation above the Plural Node Architecture allows connection management of FireWire audio devices via TCP/IP. This paper describes enhancements made to connection management applications as a result of additional capabilities revealed by the Open Generic Transporter document.

7209
Sharing Acoustic Spaces over Telepresence Using Virtual Microphone Control
Braasch, Jonas; Valente, Daniel L.; Peters, Nils
This paper describes a system which is used to project musicians in two or more co-located venues into a shared virtual acoustic space. The sound of the musicians is captured using spot mics. Afterwards, it is projected at the remote end using spatialization software based on virtual microphone control (ViMiC) and an array of loudspeakers. In order to simulate the same virtual room at all co-located sites, the ViMiC systems communicate using the OpenSound Control protocol to exchange room parameters and the room coordinates of the musicians.

7210
A Tutorial: Fiber Optic Cables and Connectors for Pro-Audio
Ajemian, Ronald G.
There have been many technological breakthroughs in fiber optic technology that have eased its migration into the professional audio arena. With the recent rise of copper prices in worldwide markets, there has been increased usage of fiber optic based equipment, cables and connectors deployed for pro-audio and video. This prompted the writing of this tutorial to bring the professional audio community up to date with some old and new fiber optic cables and connectors now being deployed in pro-audio.

7211
The Most Appropriate Method of Producing TV Program Audio Focusing on the Audience
Ohmata, Hisayuki; Fukada, Akira; Kouchi, Hiroshi
When audiences watch TV programs, they often perceive a difference in audio levels. This is a real annoyance for the audience and it is caused by differences in program audio. In order to have equal audio levels, it is necessary to produce audio under the same conditions for all programs. To solve this problem, we propose a method to produce TV program audio. We make clear the manner in which different monitoring levels influence mixing balance at various mixing stages. This proposal also describes management of audio levels for programs with different digital broadcasting head rooms.

7212
Beyond Splicing: Technical Ear Training Methods Derived from Digital Audio Editing Techniques
Corey, Jason
The process of digital audio editing, especially with classical or acoustic music using a source-destination method, offers an excellent opportunity for ear training. Music editing involves making transparent connections or splices between takes of a piece of music, and often requires specifying precise edit locations by ear. The paper outlines how aspects of digital editing can be used systematically as an ear training method, even out of the context of an editing session. It describes a software tool based on audio editing techniques to create an effective ear training method offering benefits that transfer beyond audio editing.

7213
New Trends in Sound Reinforcement Systems Based on Digital Technology
Kozlowski, Piotr Z.; Dziechcinski, Pawel; Grzadziel, Wojciech
This paper presents new aspects of modern sound reinforcement system design that have emerged with the prevalence of digital technology. The basic structure of a modern digital electro-acoustical system is explained using the example of the one installed at the Wrocław Opera House. The paper focuses on aspects of digital transmission of audio signals, proper sound coverage of the audience area, achieving a smooth frequency response, obtaining directive propagation at low frequencies, and controlling the system. Measurements and tests on these topics were carried out during the tuning of the system at the Wrocław Opera House. The results show that these targets can be met.

7214
Network Music Performance with Ultra-Low-Delay Audio Coding under Unreliable Network Conditions
Kraemer, Ulrich; Hirschfeld, Jens; Schuller, Gerald; Wabnik, Stefan; Carôt, Alexander; Werner, Christian
A key issue for successfully interconnecting musicians in real-time over the Internet is minimizing the end-to-end signal delay for transmission and coding. However, the variance of transmission delay (“jitter”) occasionally causes some packets to arrive too late for playback. To avoid this problem, previous approaches use rather large receive buffers and accept a larger delay. In this paper we present a novel solution that keeps buffer sizes and delay minimal. On the network layer we use a highly optimized audio framework called “Soundjack,” and on the coding layer we work with an ultra-low-delay codec for high-quality audio. We analyze and evaluate a modified transmission and coding scheme for the Fraunhofer Ultra-Low-Delay (ULD) audio coder, which is designed to be more resilient to lost and late arriving data packets.

7215
A Very Low Bit Rate Protection Layer to Increase the Robustness of the AMR-WB+ Codec against Bit Errors
Gournay, Philippe
Audio codecs face various channel impairments when used in challenging applications such as digital radio. The standard AMR-WB+ audio codec includes a concealment procedure to handle lost frames. It is also inherently robust to bit errors, although some bits within any given frame are more sensitive than others. Motivated by this observation, the present paper makes two contributions. First, a detailed study of the sensitivity of individual bits in AMR-WB+ frames is provided. All the bits in a frame are then divided into three sensitivity classes so that efficient unequal error protection (UEP) schemes can be designed. Then, a very low bit rate protection layer to increase the robustness of the codec against bit errors is proposed and assessed using the results of subjective audio quality tests. Remarkably, in contrast to the standard codec, where some errors have a very discernible effect, the protection layer ensures that the decoded audio is free of major channel artifacts even at a significant 0.5% bit error rate.

7216
Trellis Based Approach for Joint Optimization of Window Switching Decisions and Bit Resource Allocation
Melkote, Vinay; Rose, Kenneth
The fact that audio compression for streaming or storage is usually performed offline alleviates traditional constraints on encoding delay. We propose a rate-distortion optimized approach, within the MPEG Advanced Audio Coding framework, to trade delay for optimal window switching and resource allocation across frames. A trellis is constructed where stages correspond to audio frames, nodes represent window choices, and branches implement transition constraints. A suitable cost, comprising bit consumption and psycho-acoustic distortion, is optimized via multiple passes through the trellis until the desired bit-rate is achieved. The procedure offers optimal window switching as well as better bit distribution than conventional bit-reservoir schemes that are restricted to only “borrow” bits from past frames. Objective and subjective tests show considerable performance gains.
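
The trellis structure described above can be sketched as a small dynamic-programming search. This is an illustrative single pass only, not the authors' encoder: the per-frame costs are made-up numbers standing in for a combined bit and distortion cost, the transition table is an assumed set of AAC-style window sequence constraints, and the function name is hypothetical. The multiple passes used to reach a target bit-rate are not shown.

```python
LONG, START, SHORT, STOP = range(4)

# Assumed allowed (previous window, next window) transitions.
ALLOWED = {
    (LONG, LONG), (LONG, START),
    (START, SHORT),
    (SHORT, SHORT), (SHORT, STOP),
    (STOP, LONG), (STOP, START),
}


def best_window_path(costs, allowed=ALLOWED):
    """Return (path, total cost) minimizing the summed per-frame cost.

    costs[t][w] is the cost of choosing window w for frame t.
    """
    n_windows = len(costs[0])
    inf = float("inf")
    best = list(costs[0])        # cheapest cost of reaching each node
    back = []                    # back-pointers, one list per later stage
    for t in range(1, len(costs)):
        new_best = [inf] * n_windows
        pointers = [None] * n_windows
        for w in range(n_windows):
            for p in range(n_windows):
                candidate = best[p] + costs[t][w]
                if (p, w) in allowed and candidate < new_best[w]:
                    new_best[w] = candidate
                    pointers[w] = p
        back.append(pointers)
        best = new_best
    w = min(range(n_windows), key=lambda i: best[i])
    total = best[w]
    path = [w]
    for pointers in reversed(back):   # trace the cheapest path backwards
        w = pointers[w]
        path.append(w)
    path.reverse()
    return path, total


if __name__ == "__main__":
    example_costs = [
        [1, 5, 9, 9],
        [4, 2, 9, 9],
        [10, 9, 1, 9],
        [3, 9, 4, 2],
    ]
    print(best_window_path(example_costs))
```

Because the whole sequence of frames is searched at once, a window choice that is locally expensive can still be selected when it makes a cheaper later choice reachable.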

7217
Transcoding of Dynamic Range Control Coefficients and Other Metadata into MPEG-4 HE AAC
Schildbach, Wolfgang; Krauss, Kurt; Rödén, Jonas
With the introduction of HE AAC (also known as aacPlus) into several new broadcasting systems, the topic of how to best encode new and transcode pre-existing metadata such as dynamic range control (DRC) data, program reference level and downmix coefficients into HE AAC has gained renewed interest. This paper will discuss the means of carrying metadata within HE AAC and derived standards like DVB, and present studies on how to convert metadata persistent in different formats into HE AAC. Listening tests are employed to validate the results.

7218
Advanced Audio for Advanced IPTV Services
Vlaicu, Roland
PRESENTATION AND ABSTRACT: Telco service providers face significant new challenges for audio delivery in next generation broadcast systems such as IPTV. These include the capability to deliver soundtracks from mono to 5.1-channels and beyond with greater efficiency than current systems. The session will discuss the requirements for an audio coding scheme to address both near- and long-term technology challenges of IPTV broadcasting, such as delivering HDTV over IP. Attendees will understand the various factors to be considered in determining the characteristics of a suitable audio coding scheme. Participants will learn how to maintain full control over the listening experience for all consumer environments through comprehensive metadata control and standardized connectivity. Attendees will also understand how to deal with the many requirements posed by multichannel audio, how to facilitate the integration of a new audio codec with a broadcast production environment, and how to offer performance improvements for the consumer, while ensuring simple integration into the consumer listening environment. ADDITIONAL INFORMATION: The session will cover the factors to be considered in determining the characteristics of a suitable audio coding scheme, including requirements for new IPTV and broadcast services such as HDTV and services using advanced video codec such as H.264 or VC-1; opportunities for improving the audio performance of current IPTV broadcast services; the impact of a new audio coding scheme on broadcast production practices; the impact of a new audio coding scheme on consumer hardware and listening environments; and differentiation between IPTV and competitive TV services. For example, the discussion about how providers can prepare for the demands of future broadcast services will include an overview of requirements such as the ability to deliver audio quality improvements to match video quality improvements of high-definition IPTV broadcasting; fl

7219
A Study of the MPEG Surround Quality Versus Bit-Rate Curve
Rödén, Jonas; Breebaart, Jeroen; Hilpert, Johannes; Purnhagen, Heiko; Schuijers, Erik; Kippens, Jeroen; Linzmeier, Karsten; Hölzer, Andreas
MPEG Surround provides unsurpassed multi-channel audio compression efficiency by extending a mono or stereo audio coder with additional side information. This compression method has two important advantages. The first is its backward compatibility, which is important when MPEG Surround is employed to upgrade an existing service. Secondly, the amount of side information can be varied over a wide range to enable high-quality multi-channel audio compression at extremely low bit rates up to perceptual transparency at higher bit rates. The present paper provides a study of the performance of MPEG Surround, highlighting the various tradeoffs that are available when using MPEG Surround. Furthermore, a quality versus bit rate curve describing the MPEG Surround performance will be presented.

7220
Quality Impact of Diotic versus Monaural Hearing on Processed Speech
Nagle, Arnault; Quinquis, Catherine; Sollaud, Aurelien; Battistello, Anne; Slock, Dirk
In VoIP audio conferencing, listening is done over handsets or headphones, that is, with one or two ears. To keep the same loudness perception between the two modes, a listener can only adjust the sound level. The goal of this paper is to show that monaural or diotic hearing has a quality impact on speech processed by VoIP coders. It can increase or decrease the differences in perceived quality between tested coders and even change their ranking according to the sound level. This impact on the ranking of the coders is explained by the normal equal-loudness-level contours over headphones and the specifics of some coders. It is important to be aware of the impact of the hearing system and its associated sound level.

7221
A Novel Audio Post-Processing Toolkit for the Enhancement of Audio Signals Coded at Low Bit Rates
Annadana, Raghuram; E.V., Harinarayanan; Sinha, Deepen; Ferreira, Anibal
Low bit rate audio coding often results in the loss of a number of key audio attributes such as audio bandwidth and stereo separation. Additionally, there is typically a loss in the level of detail and in intelligibility and/or warmth in the signal. Due to the proliferation, e.g. on the Internet, of low bit rate audio coded using a variety of coding schemes and bit rates over which the listener has no control, it is becoming increasingly attractive to incorporate processing tools in the player which can ensure a consistent listener experience. We describe a novel post-processing toolkit which incorporates tools for (i) Stereo Enhancement, (ii) Blind Bandwidth Extension, (iii) Automatic Noise Removal and Audio Enhancement, and (iv) Blind 2-to-5 channel upmixing. Algorithmic details, listening results, and audio demonstrations are presented.

7222
Subjective Evaluation of the Immersive Sound Field Rendition System and Recent Enhancements
Dubey, Chandresh; Annadana, Raghuram; Sinha, Deepen; Ferreira, Anibal
Consumer audio applications such as satellite radio broadcasts, multi-channel audio streaming and playback systems coupled with the need to meet stringent bandwidth requirements are eliciting newer challenges in parametric multi-channel audio coding schemes. This paper describes the continuation of our research concerning the Immersive Soundfield Rendition (ISR) system. In particular we present detailed subjective result data benchmarking the ISR system in comparison to MPEG Surround and also characterizing the audio quality level at different sub-modes of the system. We also describe enhancements to various algorithmic components in particular the blind 2-to-5 channel upmixing algorithm and describe a novel scheme for providing enhanced stereo downmix at the receiver for improved decoding by conventional matrix decoding systems.

7223
Improved Stereo Imaging in Automobiles
Smithers, Michael J.
A significant challenge in the automobile listening environment is the predominance of off-axis listening positions. This leads to audible artifacts including comb filtering and indeterminate stereo imaging, both in traditional stereo and in more recent multi-channel speaker configurations. This paper discusses the problem of off-axis listening as well as methods to improve stereo imaging in a symmetric manner using all-pass FIR and IIR filters. This paper also discusses a more efficient IIR filter design that achieves similar performance to previous filter designs. Use of these filters results in stable, virtual sources in front of off-axis listeners.

7224
A Listening Test System for Automotive Audio – Part 3: Comparison of Attribute Ratings Made in a Vehicle with Those Made Using an Auralization System
Hegarty, Patrick; Choisel, Sylvain; Bech, Søren
A system has been developed to allow listening tests of car audio sound systems to be conducted over headphones. The system employs dynamic binaural technology to capture and reproduce elements of an in-car soundfield. An experiment, a follow up to previous work, to validate the system is described. Seven trained listeners were asked to rate a range of stimuli in a car as well as over headphones for 15 elicited attributes. Analysis of variance was used to compare ratings from the two hardware setups. Results show the ratings for spatial attributes to be preserved while differences exist for some timbral and temporal attributes.

7225
A Listening Test System for Automotive Audio - Part 4: Comparison of Attribute Ratings Made by Expert and Non-Expert Listeners
Choisel, Sylvain; Hegarty, Patrick; Christensen, Flemming; Pedersen, Benjamin; Ellermeier, Wolfgang; Ghani, Jody; Song, Wookeun
A series of experiments was conducted in order to validate an experimental procedure to perform listening tests on car audio systems in a simulation of the car environment in a laboratory, using binaural synthesis with head-tracking. Seven experts and 40 non-expert listeners rated a range of stimuli for 15 sound-quality attributes developed by the experts. This paper presents a comparison between the attribute ratings from the two groups of participants. Overall preference of the non-experts was also measured using direct ratings as well as indirect scaling based on paired comparisons. The results of both methods are compared.

7226
The Application of Direct Digital Feedback for Amplifier System Control
Bell, Craig; Jones, David; Watts, Robert
An effective feedback topology is clearly a beneficial requirement for a well performing digital amplifier. The ability to cancel corrupting influences such as power supply ripple and un-matched components is necessary for good sonic performance. Additional benefits derive from the fact that the feedback information is processed in the digital domain. Current delivered into the speaker load can be inferred. The amplifier acts as a voltage source, the value of which is derived from the recorded source material. The current delivered into the loudspeaker is also clearly influenced by the load impedance, which varies with frequency and other factors. This paper describes the ability of the system to measure current and derive speaker impedance and actual delivered power and goes on to illustrate applications in real systems.

7227
Generation of Variable Frequency Digital PWM
Midya, Pallab
Digital audio amplifiers convert digital PCM to digital PWM to be amplified by a power stage. This paper introduces a method to generate a quantized duty ratio digital PWM with a switching frequency that varies over a 20% range to mitigate EMI issues. The method is able to compensate for the variation in switching frequency such that the SNR in the audio band is comparable to fixed frequency PWM. To obtain good rejection of the noise introduced by the variation of the PWM frequency higher order noise shapers are used. This paper describes in detail the algorithm for a fourth order noise shaper. Using this method dynamic range in excess of 120 dB un-weighted over a 20 kHz bandwidth is achieved.

7228
Recursive Natural Sampling for Digital PWM
Midya, Pallab; Roeckner, Bill; Paulo, Theresa
This paper presents a highly accurate and computationally efficient method for digital-domain computation of naturally sampled digital pulse width modulation (PWM) signals. This method is used in a switching digital audio amplifier. The method is scalable for performance versus calculation complexity. Using a second order version of the algorithm with no iteration, intermodulation linearity of better than 113 dB is obtained with a full scale input at 19 kHz and 20 kHz. Matlab simulation and measured results from a digital amplifier implemented with this algorithm are presented. Overall system performance is not limited by the accuracy of the natural sampling method.
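
To illustrate what natural sampling means, the sketch below finds the instant within one switching period at which a signal crosses a rising sawtooth carrier. It is not the recursive algorithm of the paper: it assumes the signal is linearly interpolated between two consecutive samples x0 and x1 (both in the range -1 to 1), a trailing-edge carrier 2t - 1 with t in [0, 1], and a plain fixed-point iteration. The function name is hypothetical.

```python
def natural_sample_crossing(x0, x1, iterations=20):
    """Return t in [0, 1] where the interpolated signal meets the ramp 2t - 1.

    Assumes linear interpolation between x0 and x1; the iteration
    converges when |x1 - x0| < 2.
    """
    t = 0.5
    for _ in range(iterations):
        signal = x0 + (x1 - x0) * t   # signal value at the current guess
        t = (signal + 1.0) / 2.0      # time at which the ramp reaches it
    return t


if __name__ == "__main__":
    print(natural_sample_crossing(0.2, 0.6))
```

For x0 = 0.2 and x1 = 0.6, uniform sampling would place the edge at (0.2 + 1) / 2 = 0.6 of the period, while the crossing of the interpolated signal lies at 0.75; that difference is the error natural sampling removes.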

7229
Impact of Equalizing Ear Canal Transfer Function on Out-of-Head Sound Localization
Yoshida, Masataka; Kudo, Akihiro; Hokari, Haruhide; Shimada, Shoji
Several papers have pointed out that the frequency characteristics of the ear canal transfer functions (ECTFs) depend on headphone type, ear placement position of headphones, and subject's ear canal shape/volume. However, the effect of these factors on creating out-of-head sound localization has not been sufficiently clarified. The purpose of this paper is to clarify this effect. Sound localization tests using several types of headphones are performed in three conditions: listener's (individualized) ECTFs, HATS's (non-individualized) ECTFs, and omitted ECTFs. The results show that employing the individualized ECTFs generally yields accurate localization, while omitting the ECTFs increases the average horizontal localization error in accordance with the type of headphone employed.

7230
A Method for Estimating the Direction of Sound Image Localization for Designing a Virtual Sound Image Localization Control System
Ohta, Yoshiki; Obata, Kensaku
We developed a method of estimating the direction of sound image localization. Our method is based on the sound pressure distribution in the vicinity of a listener. In the experiment, band noises that only differ in phase were produced from two loudspeakers. We determined what relation existed between the subjective direction of the sound image localization and the objective sound pressure distribution in the vicinity of the listener. We found that an azimuth of localization can be expressed as a linear combination of sound pressure levels in the vicinity of the listener. Our method can be used to estimate azimuths with a high degree of accuracy and to associate phase differences with azimuths. Therefore, it can be used to design a system for controlling virtual sound image localization.

7231
A Preliminary Experimental Study on Perception of Movement of a Focused Sound Using a 16-Channel Loudspeaker Array
Sato, Daiki; Oto, Teruki; Ashihara, Kaoru; Horiguchi, Ryozo; Kiryu, Shogo
We have been developing a sound field effector by using a loudspeaker array. In order to design a practical system, psychoacoustic experiments for recognition of sound fields are required. In this study, perception of a sound focus is investigated using a 16-channel loudspeaker array. Listening experiments were conducted in an anechoic room and a listening room. A movement of 25 cm in the horizontal direction and a movement of 100 cm in the direction from the speaker array towards the subject could be recognized in both rooms, but movement in the vertical direction could not be perceived in either room.

7232
Perceptual Categories of Artificial Reverberation for Headphone Reproduction of Music
Marui, Atsushi
In studies of artificial reverberation, the focus is usually on recreating the natural reverberation that can be heard in real environments. However, little attention has been paid to the evaluation of useful ranges in the application of artificial reverberation in music production. The focus of this paper is to discuss and evaluate three artificial reverberation algorithms intended for headphone reproduction of music, and to propose iso-usefulness contours on those algorithms for different types of musical sounds.

7233
Correspondence Relationship between Physical Factors and Psychological Impressions of Microphone Arrays for Orchestra Recording
Kamekawa, Toru; Marui, Atsushi; Irimajiri, Hideo
Microphone technique for surround sound recording of an orchestra is discussed. Eight types of well-known microphone arrays recorded in a concert hall were compared in a subjective listening test on seven attributes such as spaciousness, powerfulness and localization using a method inspired by MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor). The result of the experiment shows similarity and dissimilarity between the microphone arrays. It is estimated that the directivity of a microphone and the distance between microphones are related to the character of a microphone array, and that these similarities change with the character of the music. The relations of the physical factors of each array were also compared, such as SC (Spectral Centroid), LFC (Lateral Fraction Coefficient), and IACC (Inter Aural Cross-correlation Coefficient) from the impulse response of each array or recordings by a dummy head. The correlation of these physical factors and the attribute scores shows that the contribution of these physical factors depends on the music.

7234
Assessment of the Quality of Digital Audio Reproduction Devices by Panels of Listeners of Different Professional Profiles
Kleczkowski, Piotr; Pluta, Marek; Piotrowski, Szymon
A series of experiments has been conducted, where different panels of listeners assessed the quality of some selected digital audio reproduction devices. The quality of the devices covered a very wide range, from budget MP3 players through to a professional high resolution digital-to-analog conversion system. The main goal of this research was to investigate whether panels of listeners of different professional profiles are able to give different evaluations of the sound quality. Some interesting results have been obtained.

7235
Music Structure Segmentation Using the Azimugram in Conjunction with Principal Component Analysis
Barry, Dan; Gainza, Mikel; Coyle, Eugene
A novel method to segment stereo music recordings into formal musical structures such as verses and choruses is presented. The method performs dimensional reduction on a time-azimuth representation of audio which results in a set of time activation sequences, each of which corresponds to a repeating structural segment. This is based on the assumption that each segment type such as verse or chorus has a unique energy distribution across the stereo field. It can be shown that these unique energy distributions along with their time activation sequences are the latent principal components of the time-azimuth representation. It can be shown that each time activation sequence represents a structural segment such as a verse or chorus.

7236
Using the Semantic Web for Enhanced Audio Experiences
Raimond, Yves; Sandler, Mark
In this paper, we give a quick overview of some key Semantic Web technologies which allow us to overcome the limitations of the current web of documents to create a machine-processable web of data, where information is accessible by automated means. We then detail a framework for dealing with audio-related information on the Semantic Web: the Music Ontology. We describe some examples of how this ontology has been used to link together heterogeneous data sets, dealing with editorial, cultural or acoustic data. Finally, we explain a methodology to embed such knowledge into audio applications (from digital jukeboxes and digital archives to audio editors and sequencers), along with concrete examples and implementations.

7237
Content Management Using Native XML and XML-Enabled Database Systems in Conjunction with XML Metadata Exchange Standards
Sincaglia, Nicolas
The digital entertainment industry has developed communication standards to support the distribution of digital content using XML technology. Recipients of these data communications are challenged when transforming and storing the hierarchical XML data structures into more traditional relational database structures for content management purposes. Native XML and XML-Enabled database systems provide possible solutions to many of these challenges. This paper will consider several data modeling design options and evaluate the suitability of these alternatives for content data management.

7238
Music Information Retrieval in Broadcasting: Some Visual Applications
Mason, Andrew; Evans, Michael J.; Sheikh, Alia
The academic research field of music information retrieval is expanding as rapidly as the MP3 collection of a stereotypical teenager. This could be no coincidence: the benefit of an automated genre classifier increases when the music collection contains several thousand tracks. Of course, there are other applications of music information retrieval. Here we highlight a few that make use of a simple visual representation of an audio signal, based on three easy-to-calculate audio features. The applications range from simple navigation around consumer recordings of broadcasts, to a music video production planning tool, to a short-term, eye-catching "Listen Again" display.

7239
Addressing the Discrepancy Between Measured and Modeled Impulse Responses for Small Rooms
Chen, Zhixin; Maher, Robert
Simple computer modeling of impulse responses for small rectangular rooms is typically based on the image source method, which results in an impulse response with very high time resolution. The image source method is easy to implement, but simulated impulse responses are often a poor match to measured impulse responses because descriptions of sources, receivers, and room surfaces are often too idealized to match real measurement conditions. In this paper, a more elaborate room impulse response computer modeling technique is developed by incorporating measured polar responses of the speaker, measured polar responses of the microphone, and measured reflection coefficients of the room surfaces into the basic image source method. Results show that, compared with the basic image source method, the modeled room impulse response using this method is a better match to the measured room impulse response, as predicted by standard acoustical theories and principles.
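As a rough illustration of the basic (idealized) image source method that the abstract takes as its starting point, the following Python sketch computes the direct sound and the six first-order reflections in a rectangular room. It is not the authors' model: the function names, the single frequency-independent reflection coefficient `beta`, the omnidirectional source and receiver, and the speed of sound are all assumptions made for the example, and they are precisely the idealizations the paper proposes to replace with measured data.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def first_order_images(room, source):
    """Return the source followed by its six first-order mirror images.

    room   -- (Lx, Ly, Lz) in metres, with walls at 0 and L on each axis
    source -- (x, y, z) source position in metres
    """
    images = [tuple(source)]  # index 0 is the real source (direct path)
    for axis in range(3):
        for wall in (0.0, room[axis]):
            mirrored = list(source)
            mirrored[axis] = 2.0 * wall - source[axis]  # reflect across the wall plane
            images.append(tuple(mirrored))
    return images


def impulse_taps(room, source, receiver, beta=0.9):
    """Return sorted (delay_seconds, amplitude) taps for direct sound and first reflections.

    beta is one hypothetical reflection coefficient shared by all six surfaces.
    """
    taps = []
    for index, image in enumerate(first_order_images(room, source)):
        distance = math.dist(image, receiver)
        gain = 1.0 if index == 0 else beta  # each first-order image has one wall bounce
        taps.append((distance / SPEED_OF_SOUND, gain / distance))  # 1/r spreading loss
    return sorted(taps)


taps = impulse_taps(room=(5.0, 4.0, 3.0), source=(1.0, 1.0, 1.5), receiver=(4.0, 3.0, 1.5))
for delay, amplitude in taps:
    print(f"{delay * 1000:7.3f} ms  amplitude {amplitude:.4f}")
```

Higher reflection orders, directional sources and receivers, and frequency-dependent surfaces are left out to keep the sketch short.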

7240
Comparison of Simulated and Measured HRTFs: FDTD Simulation Using MRI Head Data
Mokhtari, Parham; Takemoto, Hironori; Nishimura, Ryouichi; Kato, Hiroaki
This paper presents a comparison of computer-simulated versus acoustically measured, front-hemisphere head related transfer functions (HRTFs) of two human subjects. Simulations were carried out with a 3D finite difference time domain (FDTD) method, using magnetic resonance imaging (MRI) data of each subject's head. A spectral distortion measure was used to quantify the similarity between pairs of HRTFs. Despite various causes of mismatch including a different head-to-source distance, the simulation results agreed considerably with the acoustic measurements, particularly in the major peaks and notches of the front ipsilateral HRTFs. Averaged over 133 source locations and both ears, mean spectral distortions for the two subjects were 4.7 dB and 3.8 dB respectively.

7241
Scattering Uniformity Measurements and First Reflection Analysis in a Large Non-Anechoic Environment
Rizzi, Lorenzo; Farina, Angelo; Galaverna, Paolo; Martignon, Paolo; Rosati, Andrea; Conti, Lorenzo
A new campaign of experiments was run on the floor of a large room to obtain a sufficiently long anechoic time window: this permitted the study of the first reflection from the panels themselves and of their diffusion uniformity. The results are discussed, comparing them with past measurements and with those from a simplified set-up with a smaller geometry. Some key measurement issues are discussed; they were proposed in a recent comment letter posted to the specific AES-4id document committee on its reaffirmation. An analysis of the single reflection and reflectivity data was undertaken to investigate the behavior of a perforated panel and the overall potential of the measurement set-up.

7242
A Note on the Implementation of Directive Sources in Discrete Time-Domain Dispersive Meshes for Room Acoustic Simulation
Escolano, José; López, José J.; Pueo, Basilio; Cobos, Maximo
The use of wave methods to simulate room impulse responses provides the most accurate solutions. Recently, a method to incorporate directive sources in discrete-time methods, such as Finite Differences and Digital Waveguide Mesh, has been proposed. It is based on the proper combination of monopoles in order to achieve the desired directivity pattern in the far-field condition. However, this method is used without taking into account the inherent dispersion in most of these discrete-time paradigms. Through different case studies, this paper analyzes how strongly the dispersion affects the resulting directivity.

7243
Rendering of Virtual Sound Sources with Arbitrary Directivity in Higher Order Ambisonics
Ahrens, Jens; Spors, Sascha
Higher order Ambisonics (HOA) is a spatial audio reproduction technique aiming at physically synthesizing a desired sound field. It is based on the expansion of sound fields into orthogonal basis functions (spatial harmonics). In this paper we present an approach to the two-dimensional reproduction of virtual sound sources at arbitrary positions having arbitrary radiation directivities. The approach is based on the description of the directional properties of a source by a set of circular harmonics. Consequences of truncation of the circular harmonics expansion and spatial sampling as occurring in typical installations of HOA systems due to the employment of a finite number of loudspeakers are discussed. We illustrate our findings with simulated reproduction results.

7244
The Ill-Conditioning Problem in Sound Field Reconstruction
Fazi, Filippo M.; Nelson, Philip A.
A method for the analysis and reconstruction of a three dimensional sound field using an array of microphones and an array of loudspeakers is presented. The criterion used to process the microphone signals and obtain the loudspeakers signals is based on the minimisation of the least-square error between the reconstructed and the original sound field. This approach requires the formulation of an inverse problem that can lead to unstable solutions due to the ill-conditioning of the propagation matrix. The concepts of generalised Fourier transform and singular value decomposition are introduced and applied to the solution of the inverse problem in order to obtain stable solutions and to provide a clear understanding of the regularisation method.
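As a hedged illustration of how a singular value decomposition exposes and tames ill-conditioning, consider the generic Tikhonov-regularised least-squares problem. The abstract does not state which regularisation the authors use, so the choice of Tikhonov regularisation and all symbols here are assumptions: $\mathbf{p}$ is the vector of microphone signals, $\mathbf{G}$ the propagation matrix, $\mathbf{q}$ the loudspeaker signals, and $\beta > 0$ the regularisation parameter.

```latex
\mathbf{q}_\beta
  = \arg\min_{\mathbf{q}} \; \|\mathbf{p} - \mathbf{G}\mathbf{q}\|^2 + \beta \|\mathbf{q}\|^2
  = \left(\mathbf{G}^H \mathbf{G} + \beta \mathbf{I}\right)^{-1} \mathbf{G}^H \mathbf{p},
\qquad
\mathbf{G} = \sum_i \sigma_i \, \mathbf{u}_i \mathbf{v}_i^H
\;\Longrightarrow\;
\mathbf{q}_\beta = \sum_i \frac{\sigma_i}{\sigma_i^2 + \beta} \left(\mathbf{u}_i^H \mathbf{p}\right) \mathbf{v}_i .
```

For $\beta = 0$ each term is weighted by $1/\sigma_i$, so small singular values amplify noise without bound; with $\beta > 0$ the weight $\sigma_i/(\sigma_i^2 + \beta)$ stays bounded and tends to zero as $\sigma_i \to 0$.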

7245
Analysis of Edge Boundary Conditions on Multiactuator Panels
Pueo, Basilio; Escolano, José; López, José J.; Bleda, Sergio
Distributed Mode Loudspeakers consist of a flat panel of a light and stiff material to which a mechanical exciter is attached, creating bending waves that are then radiated as sound fields. Such panels can be used to build arrays for Wave Field Synthesis reproduction by using multiple exciters on a single vibrating surface. The exciter interaction with the panel, the panel material and the panel contour clamp conditions are some of the critical points that need to be evaluated and improved. In this paper, we address the influence of the edge boundary conditions on the quality of the emitted wave field. The measured wave fields have been interpreted in the wavenumber domain, where the source radiation is decomposed into plane waves for arbitrary angles of incidence. Results show how the wave field is degraded when the boundary conditions are modified.

7246
Acoustics in Pop and Rock Music Halls
Adelman-Larsen, Niels W.; Thompson, Eric R.; Gade, Anders C.
The existing body of literature regarding the acoustic design of concert halls has focused almost exclusively on classical music, although there are many more performances of rhythmic music, including rock and pop. Objective measurements were made of the acoustics of twenty rock music venues in Denmark and a questionnaire was used in a subjective assessment of those venues with professional rock musicians and sound engineers. Correlations between the objective and subjective results lead, among others, to a recommendation for reverberation time as a function of hall volume. Since the bass frequency sounds are typically highly amplified, they play an important role in the subjective ratings and the 63-Hz-band must be included in objective measurements and recommendations.

7247
Interactive Beat Tracking for Assisted Annotation of Percussive Music
Evans, Michael J.
A practical, interactive beat-tracking algorithm for percussive music is described. Regularly-spaced note onsets are determined by energy-based analysis and users can then explore candidate beat periods and phases as the overall rhythm pattern develops throughout the track. This assisted approach can allow more flexible rhythmic analysis than purely automatic algorithms. An open-source software API based on the algorithm has been developed, along with several practical applications to allow more effective annotation, segmentation and analysis of music.

7248
Identification of Partials in Polyphonic Mixtures Based on Temporal Envelope Similarity
Gunawan, David; Sen, D.
In musical instrument sound source separation, the temporal envelopes of the partials are correlated due to the physical constraints of the instruments. With this assumption, separation algorithms then exploit the similarities between the partial envelopes in order to group partials into sources. In this paper, we quantitatively investigate the partial temporal envelope similarities of a large database of instrument samples and develop weighting functions in order to model the similarities. These model partials then provide a reference to identify similar partials of the same source. The partial identification algorithm is evaluated in the separation of polyphonic mixtures and is shown to successfully discriminate between partials from different sources.

7249
Structural Decomposition of Recorded Vocal Performances and Its Application to Intelligent Audio Editing
Fazekas, György; Sandler, Mark
In an intelligent editing environment, the semantic music structure can be used as beneficial assistance during the post production process. In this paper we propose a new approach to extract both low and high level hierarchical structure from vocal tracks of multi-track master recordings. Contrary to most segmentation methods for polyphonic audio, we utilize extra information available when analyzing a single audio track. A sequence of symbols is derived using a hierarchical decomposition method involving onset detection, pitch tracking and timbre modelling to capture phonetic similarity. Results show that the applied model well captures similarity of short voice segments.

7250
Vibrato Experiments with Bassoon Sounds by Means of the Digital Pulse Forming Synthesis and Analysis Framework
Oehler, Michael; Reuter, Christoph
The perceived naturalness of real and synthesized bassoon vibrato sounds is investigated in a listening test. The stimuli were generated by means of a currently developed synthesis and analysis framework for wind instrument sounds, based on the pulse forming theory. The framework allows controlling amplitude and frequency parameters at many different stages during the sound production process. Applying an ANOVA and Tukey HSD test it could be shown that timbre modulation (a combined pulse width and cycle duration modulation) is an important factor for the perceived naturalness of bassoon vibrato sounds. Obtained results may be useful for sound synthesis as well as in the field of timbre research.

7251
A High Level Musical Score Alignment Technique Based on Fuzzy Logic and DTW
Gagnon, Bruno; Lefebvre, Roch; Brunet, Charles-Antoine
This paper presents a system to align musical notes extracted from an audio signal with the notes of the musical score being played. Building on conventional alignment systems using Dynamic Time Warping (DTW), the proposed method uses fuzzy logic to create the similarity matrix used by DTW. Like a musician following a score, the fuzzy logic system uses high level information as its inputs, such as note identity, note duration and local rhythm. Using high level information instead of frame by frame information reduces substantially the size of the DTW similarity matrix and thus reduces significantly the complexity to find the best path for alignment. Finally, the proposed method can automatically track where a musician starts and stops playing in a musical score.
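To make the note-level alignment idea concrete, here is a minimal Python sketch of Dynamic Time Warping over a small note-by-note cost matrix. It is not the authors' system: the fuzzy-logic similarity is replaced by a hypothetical 0/1 pitch-mismatch cost, and the function name `dtw_align` and the example note lists are assumptions for illustration only.

```python
def dtw_align(cost):
    """Return (total_cost, path) for the cheapest monotonic path through a cost matrix.

    cost[i][j] is the dissimilarity between performed note i and score note j.
    """
    rows, cols = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * cols for _ in range(rows)]  # accumulated cost table
    acc[0][0] = cost[0][0]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1][j] if i > 0 else inf,                 # extra performed note
                acc[i][j - 1] if j > 0 else inf,                 # skipped score note
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,   # one-to-one match
            )
            acc[i][j] = cost[i][j] + best_prev

    # Backtrack from the end to recover the alignment path.
    i, j = rows - 1, cols - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return acc[rows - 1][cols - 1], path


# Hypothetical data: MIDI pitches of performed notes (one note repeated) and score notes.
performed = [60, 62, 62, 64]
score = [60, 62, 64]
cost = [[0.0 if p == s else 1.0 for s in score] for p in performed]
total, path = dtw_align(cost)
print(total, path)
```

Because the matrix has one row per detected note rather than one per analysis frame, it stays small, which is the complexity advantage the abstract describes.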

7252
Audio Synthesis and Visualization with Flash CS3 and ActionScript 3.0
Kolasinski, Jordan
This paper explains the methods and techniques used to build a fully functional audio synthesizer and FFT-based audio visualizer within the newest version of Flash CS3. Audio synthesis and visualization have not been possible to achieve in previous versions of Flash, but two new elements of ActionScript 3.0 – the Byte Array and Compute Spectrum function – make it possible even though it is not included in Flash’s codebase. Since Flash is present on 99% of the world’s computers, this opens many new opportunities for audio on the web.

7253
Improvement of One-Dimensional Loudspeaker Models
Backman, Juha
Simple one-dimensional waveguide models of loudspeaker enclosures describe well enclosures with simple interior geometry, but their accuracy is limited if used with more complex internal structures. The paper compares the results from one-dimensional models to FEM models for some simplified enclosure geometries found in typical designs. Based on these results it is apparent that one-dimensional models need to be refined to take some three-dimensional aspects of the sound field in close proximity of drivers into account. Approximations matched to FEM solutions are presented for enclosure impedance as seen by the driver and for the end correction of ports, taking both edge rounding and distance to the back wall into account.

7254
Simulating the Directivity Behavior of Loudspeakers with Crossover Filters
Feistel, Stefan; Ahnert, Wolfgang; Hughes, Charles; Olson, Bruce
In previous publications a description of loudspeakers based on high-resolution data was introduced, comprising most importantly complex directivity data for individual drivers as well as crossover filters. This work presents how this concept can be exploited to predict the directivity balloon of multi-way loudspeakers depending on the chosen crossover filters. Simple filter settings such as gain and delay, as well as more complex IIR filters, are utilized for loudspeaker measurements and simulations, and the results are compared and discussed. In addition, advice is given on how measurements should be made, particularly regarding active and passive loudspeaker systems.

7255
Intrinsic Membrane Friction and Onset of Chaos in an Electrodynamic Loudspeaker
Djurek, Danijel; Djurek, Ivan; Petosic, Antonio
The chaotic state observed in an electrodynamic loudspeaker results from a nonlinear equation of motion and is driven by an anharmonic restoring term assisted by intrinsic membrane friction. This friction is not a smooth function of displacement but the sum of local hysteretic surface fluctuations, which give rise to its high differentiability in displacements and are responsible for the onset of Feigenbaum bifurcation cascades and chaos. When a small external perturbation of low differentiability is added to the friction, another type of chaotic state appears, and this state involves a period-3 window, evidenced for the first time in these experiments.

7256
Damping of an Electrodynamic Loudspeaker by Air Viscosity and Turbulence
Djurek, Ivan; Petosic, Antonio; Djurek, Danijel
Damping of an electrodynamic loudspeaker has been studied with respect to air turbulence and viscosity. Both quantities were evaluated as a difference of damping friction measured in air and in an evacuated space. The viscous friction dominates for small driving currents (< 10 mA) and is masked by turbulence for currents extending up to 100 mA. Turbulence contribution was evaluated as a difference of air damping friction recorded at 1.0 and 0.1 bars, and it was studied for selected driving frequencies. Hot wire anemometry has been adopted to meet requirements of convection study from the loudspeaker, and obtained spectra were compared to measured turbulence friction, in order to trace the perturbation of emitted signal by turbulent motion.

7257
Energetic Sound Field Analysis of Stereo and Multichannel Loudspeaker Reproduction
Merimaa, Juha
Energetic sound field analysis has been previously applied to encoding the spatial properties of multichannel signals. This paper contributes to the understanding of how stereo or multichannel loudspeaker signals transform into energetic sound field quantities. Expressions for the active intensity, energy density, and energetic diffuseness estimate are derived as a function of signal magnitudes, cross-correlations, and loudspeaker directions. It is shown that the active intensity vector can be expressed in terms of the Gerzon velocity and energy vectors, and its direction can be related to the tangent law of amplitude panning. Furthermore, several cases are identified where the energetic analysis data may not adequately represent the spatial properties of the original signals.
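The quantities named in the abstract can be illustrated with the standard textbook definitions of the Gerzon velocity and energy vectors and of the tangent panning law. The Python sketch below is not taken from the paper and does not reproduce its derived expressions for active intensity or diffuseness; the function names and the example gains are assumptions, and the signals are assumed to be fully coherent with real, positive gains.

```python
import math


def gerzon_vectors(gains, azimuths_deg):
    """Return (velocity_vector, energy_vector) as (x, y) tuples.

    gains        -- real, positive loudspeaker gains for one coherent source
    azimuths_deg -- loudspeaker azimuths in degrees, counter-clockwise from the front
    """
    units = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in azimuths_deg]
    total_gain = sum(gains)
    total_energy = sum(g * g for g in gains)
    # Velocity vector: gain-weighted mean of the loudspeaker directions.
    velocity = tuple(
        sum(g * u[k] for g, u in zip(gains, units)) / total_gain for k in range(2)
    )
    # Energy vector: energy-weighted mean of the loudspeaker directions.
    energy = tuple(
        sum(g * g * u[k] for g, u in zip(gains, units)) / total_energy for k in range(2)
    )
    return velocity, energy


def tangent_law_azimuth(g_left, g_right, base_angle_deg):
    """Panning direction from tan(phi) / tan(phi0) = (gL - gR) / (gL + gR)."""
    ratio = (g_left - g_right) / (g_left + g_right)
    return math.degrees(math.atan(ratio * math.tan(math.radians(base_angle_deg))))


# Hypothetical stereo example: loudspeakers at +/-30 degrees, left channel 6 dB louder.
velocity, energy = gerzon_vectors([1.0, 0.5], [30.0, -30.0])
velocity_azimuth = math.degrees(math.atan2(velocity[1], velocity[0]))
energy_azimuth = math.degrees(math.atan2(energy[1], energy[0]))
print(velocity_azimuth, tangent_law_azimuth(1.0, 0.5, 30.0), energy_azimuth)
```

For a symmetric stereo pair the direction of the velocity vector coincides with the tangent-law direction, while the energy vector points further toward the louder loudspeaker.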

7258
A New Methodology for the Acoustic Design of Compression Driver Phase-Plugs with Concentric Annular Channels
Dodd, Mark; Oclee-Brown, Jack
In compression drivers a large membrane is coupled to a small horn throat resulting in high efficiency. For this efficiency to be maintained to high frequencies the volume of the resulting cavity, between horn and membrane, must be kept small. Early workers devised a phase-plug to fill most of the cavity volume and connect membrane to horn throat with concentric annular channels of equal length to avoid destructive interference [1]. Later work, representing the cavity as a flat disc, describes a method of calculating the positions and areas of these annular channels where they exit the cavity, giving least modal excitation, thus avoiding undesirable response irregularities [2]. In this paper the result of applying both the equal path-length and modal approaches to a phase-plug with concentric annular channels coupled to a cavity shaped as a flat disc is further explored. The assumption that the cavity may be represented as a flat disc is investigated by comparing its behavior with that of an axially vibrating rigid spherical cap radiating into a curved cavity. It is demonstrated that channel arrangements derived for a flat disc are not optimum for use in a typical compression driver with a curved cavity. A new methodology for calculating the channel positions and areas giving least modal excitation is described. The impact of the new approach will be illustrated with a practical design.

7259
A Computational Model for Optimizing Microphone Placement on Headset Mounted Arrays
Gillett, Philip; Johnson, Marty; Carneal, Jamie
Microphone arrays mounted on headsets provide a platform for performing transparent hearing, source localization, focused listening, and enhanced communications while passively protecting the hearing of the wearer. However it is not trivial to determine the microphone positions that optimize these capabilities, as no analytical solution exists to model acoustical diffraction around both the human and headset. As an alternative to an iterative experimental approach for optimization, an equivalent source model of the human torso, head, and headset is developed. Results show that the model closely matches the microphone responses measured from a headset placed on a Kemar mannequin in an anechoic environment.

7260
A Simple Simulation of Acoustic Radiation from a Vibrating Object
Maxwell, Cynthia Bruyns
The goal of this project is to explore the role that fluid coupling plays on the vibration of an object, and to investigate how one can model such effects. We want to determine whether the effects of coupling to the medium surrounding a vibrating object are significant enough to warrant including them into our current instrument modeling software. For example, we wish to examine how the resonant frequencies of an object change due to the presence of a surrounding medium. We also want to examine the different methods of modeling acoustic radiation in interior and exterior domains. Using a simple 2D beam as an example, this investigation shows that coupling with dense fluids, such as water, dramatically changes the resonant frequencies of the system. We also show that using a simple finite element model and modal analysis, we can simulate the acoustic radiation profile and determine a realistic sound pressure level at arbitrary points in the domain in real-time.

7261
Sampling the Energy in a 3-D Sound Field
Pedersen, Jan Abildgaard
The energy in the 3D sound field in a room holds crucial information needed when designing a room correction system. This paper shows how measured sound pressure in at least 4 randomly selected positions scattered across the entire listening room is a robust estimate of the energy in the 3D sound field. The reproducibility was investigated for different numbers of random positions, which led to an assessment of the robustness of room correction systems based on different numbers of random microphone positions.

7262
Multi-Source Room Equalization: Reducing Room Resonances
Vanderkooy, John
Room equalization traditionally has been implemented as a single correction filter applied to all the channels in the audio system. Having more sources reproducing the same monophonic low-frequency signal in a room has the benefit of not exciting certain room modes, but it does not remove other strong room resonances. This paper explores the concept of using some of the loudspeakers as sources, while others are effectively sinks of acoustic energy, so that as acoustic signals cross the listening area, they flow preferentially from sources to sinks. This approach resists the buildup of room resonances, so that modal peaks and antimodal dips are reduced in level, leaving a more uniform low-frequency response. Impulse responses in several real rooms were measured with a number of loudspeaker positions and a small collection of observer positions. These were used to study the effect of source and sink assignment, and the derivation of an appropriate signal delay and response to optimize the room behaviour. Particular studies are made of a common 5.0 speaker setup, and a stereo configuration with two or more standard subwoofers. A measurable room parameter is defined which quantifies the deleterious effects of low-frequency room resonances, supported by a specific room equalization philosophy. Results are encouraging but not striking. Signal modification needs to be considered.

7263
A Low Complexity Perceptually Tuned Room Correction System
Johnston, James D.; Smirnov, Serge
In many listening situations using loudspeakers, the actualities of room arrangements and the acoustics of the listening space combine to create a situation where the audio signal is unsatisfactorily rendered from the listener’s position. This is often true not only for computer-monitor situations, but also for home theatre or “surround-sound” situations in which some speakers may be too close or too far from the listener, in which some loudspeakers (center, surrounds) may be different than the main loudspeakers, or in which room peculiarities introduce problems in imaging or timbre coloration. In this paper, we explain a room-correction algorithm that restores imaging characteristics, equalizes the first-attack frequency response of the loudspeakers, and substantially improves the listeners’ experience by using relatively simple render-side DSP in combination with a sophisticated room analysis engine that is expressly designed to capture room characteristics that are important for stereo imaging and timbre correction.

7264
Variable-Octave Complex Smoothing
Bharitkar, Sunil
In this paper we present a technique for processing room responses using a variable-octave complex-domain (viz., time-domain) smoother. Traditional techniques for room response processing, for equalization and other applications such as auralization, have focussed on constant-octave (e.g., 1/3-octave), magnitude-domain smoothing of these room responses. However, recent research has shown that room responses need to be processed with a high resolution, especially in the low-frequency region, to characterize the discrete room modal structure, as these modes are distinctly audible. Coupling this with the need to reduce the computational requirements associated with filters obtained from undesirable over-fitting of the high-frequency part of the room response with such a high-Q complex-domain smoother, and with the knowledge that the auditory filters have wider bandwidth (viz., lower resolution) in the high-frequency part of human hearing, the present paper proposes variable-octave complex-domain smoothing. Thus this paper incorporates, simultaneously, the high low-frequency resolution requirement as well as the requirement of relatively lower-resolution fitting of the room response in the high-frequency part through a perceptually motivated approach.

7265
Multichannel Inverse Filtering with Minimal-Phase Regularization
Norcross, Scott G.; Bouchard, Martin
Inverse filtering methods are used in numerous audio applications such as loudspeaker and room correction. Regularization is commonly used to limit the amount of the original response that the inverse filter attempts to correct in an effort to reduce audible artifacts. It has been shown that the amount and type of regularization used in the inversion process must be carefully chosen so that it does not add additional artifacts that can degrade the audio signal. A method of designing a target function based on the regularization magnitude was introduced by the authors, where a minimal-phase target function could be used to reduce any pre-response caused by the regularization. In the current paper, a multi-channel inverse filtering scheme is introduced and explored where the phase of the regularization itself can be chosen to reduce the audibility of the added regularization. In the single-channel case, this approach is shown to be equivalent to the technique that was previously introduced by the authors.

7266
An In-Flight Low Latency Acoustic Feedback Cancellation Algorithm
Osmanovic, Nermin; Clarke, Victor E.; Velandia, Erich
Acoustic feedback is a common problem in high gain systems; it is very unpredictable and unpleasant to the ear. Cockpit communication systems on aircraft may suffer from acoustic feedback between a pilot’s boomset microphone and high gain cockpit speaker. The acoustic feedback tone can compromise flight safety by temporarily blocking communication between the pilot and ground control. This paper presents the design of an in-flight low latency (<6 ms) digital audio processing system that automatically detects and removes acoustic feedback tones from the microphone to speaker audio path. We present information about the acoustic feedback cancellation algorithm, including the calculation of feedback existence probability, as implemented in an aircraft cockpit communication system.

7267
Using Audio Classifiers as a Mechanism for Content-Based Song Similarity
Fields, Benjamin; Casey, Michael
As collections of digital music become larger and more widespread, there is a growing need for assistance in a user's navigation and interaction with a collection and with the individual members of that collection. Examining pairwise song relationships and similarities, based upon content-derived features, provides a useful tool to do so. This paper looks into a means of extending a song classification algorithm to provide song-to-song similarity information. In order to evaluate the effectiveness of this method, the similarity data is used to group the songs into k-means clusters; these clusters are then compared against the original genre sorting algorithm.

7268
Toward Textual Annotation of Rhythmic Style in Electronic Dance Music
Jacobson, Kurt; Davies, Matthew; Sandler, Mark
Music information retrieval encompasses a complex and diverse set of problems. Some recent work has focused on automatic textual annotation of audio data, paralleling work in image retrieval. Here we take a narrower approach to the automatic textual annotation of music signals and focus on rhythmic style. Training data for rhythmic styles are derived from simple, precisely labeled drum loops intended for content creation. These loops are already textually annotated with the rhythmic style they represent. The training loops are then compared against a database of music content to apply textual annotations of rhythmic style to unheard music signals. Three distinct methods of rhythmic analysis are explored. These methods are tested on a small collection of electronic dance music resulting in a labeling accuracy of 73%.

7269
Key-Independent Classification of Harmonic Change in Musical Audio
Li, Ernest; Bello, Juan Pablo
We introduce a novel method for describing the harmonic development of a musical signal by using only low-level audio features. Our approach uses Euclidean and phase distances in a Tonal Centroid space. Both measurements are taken between successive chroma partitions of a harmonically segmented signal, for each of three harmonic circles representing fifths, major thirds, and minor thirds. The resulting feature vector can be used to quantify a string of successive chord changes according to changes in chord quality and movement of the chordal root. We demonstrate that our feature set can provide both unique classification and accurate identification of harmonic changes, while resisting variations in orchestration and key.

7270
Automatic Bar Line Segmentation
Gainza, Mikel; Barry, Dan; Coyle, Eugene
A method that segments the audio according to the position of the bar lines is presented. The method detects musical bars that frequently repeat in different parts of a musical piece by using an audio similarity matrix. The position of each bar line is predicted by using prior information about the position of previous bar lines as well as the estimated bar length. The bar line segmentation method does not depend on the presence of percussive instruments to calculate the bar length. In addition, the alignment of the bars allows moderate tempo deviations.

7271
The Analysis and Determination of the Tuning System in Audio Musical Signals
Heydarian, Peyman; Jones, Lewis; Seago, Allan
The tuning system is an essential aspect of a musical piece. It specifies the scale intervals and contributes to the emotions of a song. There is a direct relationship between the musical mode and the tuning of a piece for modal musical traditions. In a broader sense it represents the different genres. In this research algorithms based on spectral and chroma averages are developed to construct patterns from audio musical files. Then a similarity measure like the Manhattan distance or the cross-correlation determines the similarity of a piece to each tuning class. The tuning system provides valuable information about a piece and is worth incorporating into the metadata of a musical file.
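
The matching step described above can be sketched as a nearest-template search under the Manhattan distance. This is only an illustrative sketch, not the authors' implementation: the class names, the four-bin patterns, and the function names `manhattan` and `closest_class` are hypothetical, and a real system would use averaged spectral or chroma patterns with many more bins.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length patterns."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def closest_class(pattern, templates):
    """Return the name of the template with the smallest L1 distance."""
    return min(templates, key=lambda name: manhattan(pattern, templates[name]))


# Hypothetical averaged patterns for two tuning classes (assumed values).
templates = {
    "class_a": [1.0, 0.0, 0.5, 0.0],
    "class_b": [0.0, 1.0, 0.0, 0.5],
}
piece = [0.9, 0.1, 0.4, 0.1]  # hypothetical pattern built from one piece
best = closest_class(piece, templates)
```

A smaller distance means a closer match; with cross-correlation, the other measure named in the abstract, the largest value would be chosen instead.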

7272
Experiment in Computational Voice Elimination Using Formant Analysis
Begault, Durand R.
This study explores the use of a computational approach to the elimination of a known from an unknown voice exemplar in a forensic voice elimination protocol. A subset of voice exemplars from 11 talkers, taken from the TIMIT data base, were analyzed using a formant tracking program. Intra- versus inter-speaker mean formant frequencies are analyzed and compared.

7273
Applications of ENF Analysis Method in Forensic Authentication of Digital Audio and Video Recordings
Grigoras, Catalin
This paper reports on the electric network frequency (ENF) method as a means of assessing the integrity of digital audio/video evidence. A brief description is given of different ENF types and phenomena that determine ENF variations, analysis methods, stability over different geographical locations on continental Europe, inter-laboratory validation tests, uncertainty of measurement, real case investigations, different compression algorithm effects on ENF values and possible problems to be encountered during forensic examinations. By applying the ENF Method in forensic audio/video analysis, one can determine whether and where a digital recording has been edited, establish whether it was made at the time claimed, and identify the time and date of the registering operation.

7274
Quantifying the Speaking Voice: Generating a Speaker Code as a Means of Speaker Identification Using a Simple Code-Matching Technique
Popolo, Peter S.; Sanders, Richard W.; Titze, Ingo R.
This paper looks at a methodology of quantifying the speaking voice, by which temporal and spectral features of the voice are extracted and processed to create a numeric code that identifies speakers, so those speakers can be searched in a database much like fingerprints. The parameters studied include: (1) average fundamental frequency (F0) of the speech signal over time, (2) standard deviation of the F0, (3) the slope and (4) sign of the F0 contour, (5) the average energy, (6) the standard deviation of the energy, (7) the spectral energy contained from 50 Hz to 1,000 Hz, (8) the spectral energy from 1,000 Hz to 5,000 Hz, (9) the Alpha Ratio, (10) the average speaking rate, and (11) the total duration of the spoken sentence.
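
As a rough illustration of parameters (1) to (4), the sketch below computes the mean, standard deviation, slope, and slope sign of an F0 contour. It assumes the contour is already available as one F0 value in Hz per voiced frame, takes the slope from a least-squares line over the frame index, and uses the sample standard deviation; the abstract specifies none of these choices, and the function name `f0_features` is hypothetical. How the features are turned into the numeric speaker code is not described in the abstract and is not attempted here.

```python
import statistics


def f0_features(f0_hz):
    """Mean, standard deviation, slope, and slope sign of an F0 contour."""
    if len(f0_hz) < 2:
        raise ValueError("need at least two voiced frames")
    n = len(f0_hz)
    mean_f0 = statistics.fmean(f0_hz)
    std_f0 = statistics.stdev(f0_hz)  # sample standard deviation (assumed)
    # Least-squares slope of F0 against frame index, in Hz per frame.
    mean_x = (n - 1) / 2
    num = sum((i - mean_x) * (f - mean_f0) for i, f in enumerate(f0_hz))
    den = sum((i - mean_x) ** 2 for i in range(n))
    slope = num / den
    sign = (slope > 0) - (slope < 0)  # +1 rising, -1 falling, 0 flat
    return {"mean_f0": mean_f0, "std_f0": std_f0, "slope": slope, "sign": sign}


# Hypothetical rising contour, one value per voiced frame.
features = f0_features([100.0, 110.0, 120.0, 130.0])
```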

7275
Further Investigation into the ENF Criterion for Forensic Authentication
Brixen, Eddy B.
In forensic audio one important task is the authentication of audio recordings. In the field of digital audio and digital media one single complete methodology has not been demonstrated yet. However, the ENF (Electric Network Frequency) Criterion has shown promising results and should be regarded as a major tool in that respect. By tracing the electric network frequency in the recorded signal a unique timestamp is provided. This paper analyses a number of situations with the purpose to provide further information for the assessment of this methodology. The topics are: Ways to establish reference data, spectral contents of the electromagnetic fields, low bit rate codecs’ treatment of low level hum components, and tracing ENF harmonic components.

7276
Spatial Audio Scene Coding in a Universal Two-Channel 3-D Stereo Format
Jot, Jean-Marc; Krishnaswami, Arvindh; Laroche, Jean; Merimaa, Juha; Goodwin, Michael M.
We describe a frequency-domain method for phase-amplitude matrix decoding and up-mixing of two-channel stereo recordings, based on spatial analysis of 2-D or 3-D directional and ambient cues in the recording, and re-synthesis of these cues for consistent reproduction over any loudspeaker or headphone playback system. The decoder is compatible with existing two-channel phase-amplitude stereo formats; however, unlike existing time-domain decoders, it preserves source separation and allows accurate reproduction of ambience and reverberation cues. The two-channel spatial encoding/decoding scheme is extended to incorporate 3-D elevation, without relying on HRTF cues. Applications include data-efficient storage or transmission of multi-channel soundtracks and computationally-efficient interactive audio spatialization in a backward-compatible stereo encoding format.

7277
Binaural 3-D Audio Rendering Based on Spatial Audio Scene Coding
Goodwin, Michael M.; Jot, Jean-Marc
In standard virtualization of stereo or multichannel recordings for headphone reproduction, channel-dependent interaural relationships based on head-related transfer functions are imposed on each input channel in the binaural mix. In this paper, we describe a new binaural reproduction paradigm based on frequency-domain spatial analysis-synthesis: the input content is analyzed for channel-independent positional information on a time-frequency basis, and the binaural signal is generated by applying appropriate HRTF cues to each time-frequency component, resulting in a high spatial resolution that overcomes a fundamental limitation of channel-centric virtualization methods. The spatial analysis and synthesis algorithms are discussed in detail and a variety of applications are described.

7279
Real-Time Spatial Representation of Moving Sound Sources
Tsakostas, Christos; Floros, Andreas
The simulation of moving sound sources represents a fundamental issue for efficiently representing virtual worlds and acoustic environments, but it is limited by the resolution of Head Related Transfer Function measurements, a limitation usually overcome by interpolation techniques. In this work, a novel time-varying binaural convolution/filtering algorithm is presented which, taking into account both physical and psychoacoustic criteria, can efficiently simulate a moving sound source. It is shown that the proposed algorithm overcomes the excessive calculation load problems usually raised by legacy moving sound source spatial representation techniques, while high-quality 3D sound spatial quality is achieved in terms of both objective and subjective criteria.

7280
The Use of Cephalometric Features for Headmodels in Spatial Audio Processing
Bharitkar, Sunil; Gislason, Pall
In two-channel or stereo applications, such as for televisions, automotive infotainment, and hi-fi systems, the speakers are typically placed substantially close to each other. The sound field generated from such a setup creates an image that is perceived as monophonic while lacking sufficient spatial "presence". Due to this limitation, a stereo expansion technique may be utilized to widen the soundstage to give the perception to listener(s) that sound is originating from a wider angle (e.g., +/- 30 degrees relative to the median plane) using head-related transfer functions (HRTFs). In this paper, we propose extensions to the headmodel (viz., the ipsilateral and contralateral headshadow functions) based on analysis of the diffraction of sound around head cephalometric features, such as the nose, whose dimensions are of the order to cause variations in the headshadow responses in the high-frequency region. Modeling these variations is important for accurate rendering of a spatialized sound-field for 3-D audio applications. Specifically, this paper presents refinements to the existing spherical head-models for spatial audio applications.

7281
MDCT Domain Analysis and Synthesis of Reverberation for Parametric Stereo Audio
Suresh, K.; Sreenivas, T. V.
We propose a parametric stereo coding analysis and synthesis directly in the MDCT domain using an analysis by synthesis parameter estimation. The stereo signal is represented by an equalized sum signal and spatialization parameters. Equalized sum signal and the spatialization parameters are obtained by sub-band analysis in the MDCT domain. The de-correlated signal required for the stereo synthesis is also generated in the MDCT domain. Subjective evaluation test using MUSHRA shows that the synthesized stereo signal is perceptually satisfactory and comparable to the state of the art parametric coders.

7282
Correlation-Based Ambience Extraction from Stereo Recordings
Merimaa, Juha; Goodwin, Michael M.; Jot, Jean-Marc
One of the key components in current multichannel upmixing techniques is identification and extraction of ambience from original stereo recordings. This paper describes correlation-based ambience extraction within a time-frequency analysis-synthesis framework. Two new estimators for the time- and frequency-dependent amount of ambience in the input channels are analytically derived. These estimators are discussed in relationship to two other algorithms from the literature and evaluated with simulations. It is also shown that the time constant used in a recursive correlation computation is an important factor in determining the performance of the algorithms. Short-time correlation estimates are typically biased such that the amount of ambience is underestimated.
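
The role of the time constant can be illustrated with a generic recursive (exponentially weighted) correlation estimate. This is a minimal sketch under stated assumptions, not the estimators derived in the paper: it works on real-valued sample sequences rather than complex time-frequency tiles, and the function name and the `forgetting` parameter are hypothetical.

```python
import math


def recursive_correlation(left, right, forgetting=0.9):
    """Running correlation coefficient from exponentially weighted averages.

    forgetting close to 1 gives a long time constant; 0 uses only the
    current sample.
    """
    r_ll = r_rr = r_lr = 0.0
    coefficients = []
    for l, r in zip(left, right):
        r_ll = forgetting * r_ll + (1.0 - forgetting) * l * l
        r_rr = forgetting * r_rr + (1.0 - forgetting) * r * r
        r_lr = forgetting * r_lr + (1.0 - forgetting) * l * r
        denom = math.sqrt(r_ll * r_rr)
        coefficients.append(r_lr / denom if denom > 0.0 else 0.0)
    return coefficients


# With no memory at all, every nonzero sample pair looks fully correlated.
short_time = recursive_correlation([1.0, 2.0, 3.0], [3.0, -1.0, 2.0], forgetting=0.0)
```

In the extreme case `forgetting=0.0` the coefficient magnitude is always 1, however unrelated the two channels are. Overestimated correlation of this kind is consistent with the bias noted in the abstract, since highly correlated content would be treated as direct sound rather than ambience.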

7283
A Study of Hearing Damage Caused by Personal MP3 Players
Farina, Adriano
This paper aims to assess the actual in-ear sound pressure level during use of MP3 players. The method is based on standard EN 50332 (100 dB as maximum SPL), IEC 60959 (HATS) and IEC 60711 (ear simulators), as explained in the January 2007 issue of the Bruel and Kjaer Magazine (page 13) [1]. In this study a number of MP3 players were tested, employing a dummy head and spectrum analysis software. The measurements were aimed at assessing the hearing damage risk for youngsters who use an MP3 player for several hours a day. The students of an Italian high school (15-18 years old) were asked to supply their personal devices for testing, leaving untouched the gain from the last usage. The results show that the risk of hearing damage is real for many of the devices tested, which proved capable of reproducing average sound pressure levels well above the risk threshold.

7284
Electret Receiver for In-Ear Earphone
Lin, Shu-Ru; Chiang, Dar-Ming; Lee, I-Chen; Chen, Yan-Ren
This paper presents an electret receiver developed for an in-ear earphone. The electret diaphragm is fabricated from nano-porous fluoropolymer and charged by the corona method at room temperature. When the audio signal is applied, the electrostatic force drives the electret diaphragm to vibrate in piston motion and radiate sound. The influencing factors, such as the electrostatic charge quantity of the electret diaphragm and the distance between the electrode plate and the diaphragm, are investigated to raise the output sound pressure level of the in-ear earphone. An enclosure with a resonator is also designed to improve the efficiency of the in-ear earphone. Consequently, the output sound pressure inside the 2cc coupler can be lifted to exceed 105 dB at 1 kHz with a sound signal driving voltage of Vpp = ±3 V, and the output sound pressure level response at low frequency is remarkably increased.

7285
New Generation Artificial Larynx
Czyzewski, Andrzej; Odya, Piotr; Kostek, Bozena; Szczuko, Piotr
The aim of the presented paper is to show a new generation of devices for laryngectomy patients. The artificial larynx has many disadvantages. The major problem is a background noise caused by the device. There are two different approaches to solve this task. The first one focuses on the artificial larynx. The artificial larynx engineered was equipped with a digital processor and an amplifier. Two algorithms, namely spectral subtraction algorithm and the comb filter were proposed for noise reduction. The second approach employs PDA to generate speech. A speech synthesis is performed, allowing for playing back any sentence, therefore any text can be entered by a user, and played through PDA speaker.

7286
A Graphical Method for Studying Spectra Containing Harmonics and Other Patterns
Catravas, Palmyra
A technique for identifying and characterizing patterns in spectra is described. Motion signatures are a key component of the technique; motion enhances visual recognition of systematic effects. Multiple harmonic series, odd and even harmonics, missing modes and sidebands produce identifiable signatures. The technique works well with more complicated, inharmonic spectral patterns, and reveals systematic behavior in data over larger ranges in frequency than when the spectra are presented in serial form.

7287
Immersive Auditory Environments for Teaching and Learning
Parvin, Elizabeth A.
3D audio simulations allow for the creation of immersive auditory environments for enhanced and alternative interactive learning. Several supporting teaching and learning philosophies are presented. Experimental research and literature on spatial cognition and sound perception provide further backing. Museums, schools, research and training facilities, as well as online educational websites can all benefit significantly from their use. Design dependence on project purpose, content, and audience is explored. An example installation is discussed.

7288
Dynamic Bit-Rate Adaptation for Speech and Audio
van Schijndel, Nicolle H.; Gros, Laetitia; van de Par, Steven
Many audio and speech transmission applications have to deal with highly time-varying channel capacities, making dynamic adaptation to bit rate an important issue. This study investigates such adaptation using a coder that is driven by rate-distortion optimization mechanisms, always coding the full signal bandwidth. For perceptual evaluation, the continuous quality evaluation methodology is used, which has specifically been designed for dynamic quality testing. Results show latency and smoothing effects in the judged sound quality, but no quality penalty for the switching between quality levels; the overall quality using adaptation is comparable to using the average available bit rate. Thus, dynamic bit-rate adaptation has a clear benefit as compared to always using the lowest guaranteed available bit rate.

7289
A 216 kHz 124 dB Single Die Stereo Delta Sigma Audio Analog-to-Digital Converter
Yang, YuQing; Sculley, Terry; Abraham, Jacob
A 216 kHz single die stereo delta sigma ADC is designed for high precision audio applications. A single loop, fifth-order, thirty-three level delta sigma analog modulator with positive and negative feedforward paths is implemented. An interpolated multilevel quantizer with unevenly weighted quantization levels replaces a conventional 5-bit flash type quantizer in this design. These new techniques suppress the signal dependent energy inside the delta sigma loop and reduce internal channel noise coupling. Integrated with an on-chip bandgap reference circuit, a DEM (dynamic element matching) circuit, and a linear-phase FIR decimation filter, the ADC achieves 124 dB dynamic range (A-weighted) and –110 dB THD+N over a 20 kHz bandwidth. Inter-channel isolation is 130 dB. Power consumption is approximately 330 mW.

7290
Encoding Bandpass Signals Using Level Crossings: A Model-Based Approach
Kumaresan, Ramdas; Panchal, Nitesh
A new approach to representing a time-limited and essentially bandlimited signal x(t) by a set of discrete frequency/time values is proposed. The set of discrete frequencies is the set of frequency locations at which (real and imaginary parts of) the Fourier transform of x(t) cross certain levels, and the set of discrete time values corresponds to the traditional level crossings of x(t). The proposed representation is based on a simple bandpass signal model called a Sum-of-Sincs (SOS) model, which exploits our knowledge of the bandwidth/timewidth of x(t). Given the discrete frequency/time locations, we can reconstruct x(t) by solving a least-squares problem. Using this approach, we propose an analysis/synthesis algorithm to decompose and represent composite signals like speech.

7291
Theory of Short-Time Generalized Harmonic Analysis (SGHA) and Its Fundamental Characteristics
Muraoka, Teruo; Miura, Takahiro; Ochiai, Daisuke; Ifukube, Tohru
Current digital signal processing has become practical thanks to rapid progress in processing hardware brought by IC technology and in processing algorithms such as the FFT and digital filtering. In short, this processing modifies digitized signals and falls into two classes: (1) digital filtering [parametric processing] and (2) analysis-synthesis [non-parametric processing]. Both methods share a weak point when detecting and removing locally existing frequency components without side effects. This difficulty can be removed by applying inharmonic frequency analysis, whose fundamental principle was proven by N. Wiener in his 1930 publication "Generalized Harmonic Analysis (GHA)." Its application to practical signal processing was achieved by Dr. Y. Hirata in 1994; because the method corresponds to short-time, sequential processing of GHA, we call it Short-time Generalized Harmonic Analysis (SGHA). The authors have been researching its fundamental characteristics and its application to noise reduction, and have reported the results at previous AES conventions. Here, SGHA's fundamental theory is explained together with its characteristics.

7292
Quality Improvement Using a Sinusoidal Model in HE-AAC
Kim, Jung Geun; Hyun, Dong Il; Youn, Dae Hee; Park, Young Cheol
In Spectral Band Replication (SBR), a tone is often restored as noise, which results in audible distortion. In this paper, we propose an efficient way of restoring the original tones using SBR. In the proposed algorithm, the tones are identified through sinusoidal modeling and their frequencies are adjusted within the QMF band in order to reduce the noise floor. The proposed algorithm is fully compatible with the standard High-Efficiency AAC (HE-AAC) because it doesn't require additional information or operations in the decoding process. Output spectrograms were compared and listening tests were conducted to evaluate the proposed algorithm. Results confirmed the effectiveness of the proposed algorithm.

7293
Special Hearing Aid for Stuttering People
Odya, Piotr; Czyzewski, Andrzej
Owing to recent progress in digital signal processors developments it has been possible to build a subminiature device combining speech and hearing aid. Furthermore, despite its small dimensions, the device can execute quite complex algorithms and can be easily reprogrammed. The paper puts an emphasis on issues related to the design and implementation of algorithms applicable to both speech and hearing aids. Frequency shifting or delaying the audio signal are often used for speech fluency improvement. The basic frequency altering algorithm (FAF) is similar to the sound compression algorithm used in some special hearing aid as above. Therefore, the experimental device presented in the paper provides a universal hearing & speech aid which may be used by hearing or by speech impaired persons or by persons suffering from both problems, simultaneously.

7294
An Improved Low Complexity AMR-WB+ Encoder Using Neural Networks for Mode Selection
Lecomte, Jérémie; Lefebvre, Roch; Richard, Guy
This paper presents an alternative mode selector based on neural networks to improve the low-complexity AMR-WB+ standard audio coder, especially at low bit rates. The AMR-WB+ audio coder is a multi-mode coder using both time-domain and frequency-domain modes. In low complexity operation, the standard encoder determines the coding mode on a frame-by-frame basis by essentially applying thresholding to parameters extracted from the input signal and using a logic which favors time-domain modes. The mode selector proposed in this paper reduces this bias, and achieves a mode decision which is closer to the full complexity encoder. This results in measurable quality improvements, in both objective and subjective assessments.

7295
Real-Time Auralization Employing a Not-Linear, Not-Time-Invariant Convolver
Farina, Angelo; Farina, Adriano
The paper reports the first results of listening tests performed with a new software tool, capable of not-linear convolution (employing the Diagonal Volterra Kernel approach) and of time-variation (employing efficient morphing among a number of kernels). The listening tests were done in a special listening room, employing a menu-driven playback system, capable of presenting blindly sound samples recorded from real-world devices and samples simulated employing the new software tool, and, for comparison, samples obtained by traditional linear, time-invariant convolution. The listener fills up a questionnaire for each sound sample, being able to switch them back and forth for better comparing. The results show that this new device-emulation tool provides much better results than already-existing convolution plugins (which only emulate the linear, time-invariant behavior), requiring little computational load and causing short latency and prompt reaction to user’s action.

7296
Real-Time Panning Convolution Reverberation
Stewart, Rebecca; Sandler, Mark
Convolution reverberation is an excellent method for generating high-quality artificial reverberation that accurately portrays a specific space, but it can only represent the static listener and source positions of the measured impulse response being convolved. In this paper, multiple measured impulse responses along with interpolated impulse responses between measured locations are convolved with dry input audio to create the illusion of a moving source. The computational cost is decreased by using a hybrid approach to reverberation which recreates the early reflections through convolution with a truncated impulse response while the late reverberation is simulated with a feedback delay network.

7297
Ambisonic Panning
Neukom, Martin
Ambisonics is a surround-system for encoding and rendering a 3D sound field. Sound is encoded and stored in multi-channel sound files and is decoded for playback. In this paper a panning function equivalent to the result of ambisonic encoding and so-called in-phase decoding is presented. In this function the order of ambisonic resolution is just a variable that can be an arbitrary positive number not restricted to integers and that can be changed during playback. The equivalence is shown, limitations and advantages of the technique are mentioned and real time applications are described.
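
The abstract does not state the panning function itself. A commonly cited closed form for the gain of in-phase decoded Ambisonics is (1/2 + 1/2 cos γ)^m, where γ is the angle between the source and loudspeaker directions and m is the order. The sketch below assumes that this is the function meant; the name `in_phase_gain` is hypothetical and normalization across the loudspeaker array is omitted.

```python
import math


def in_phase_gain(angle_rad, order):
    """Loudspeaker gain for a source `angle_rad` away from the loudspeaker.

    `order` may be any positive real number, not only an integer.
    """
    if order <= 0:
        raise ValueError("order must be positive")
    # The base lies in [0, 1], so a fractional exponent is well defined.
    return (0.5 + 0.5 * math.cos(angle_rad)) ** order


# Raising the order narrows the panning lobe; fractional orders work too.
wide = in_phase_gain(math.pi / 2, 1.5)
narrow = in_phase_gain(math.pi / 2, 3.5)
```

Because the order enters only as an exponent, it can be changed continuously from one call to the next, which matches the abstract's remark that it can be varied during playback.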

7298
Adaptive Karhunen-Loève Transform for Multichannel Audio
Jiao, Yu; Zielinski, Slawomir; Rumsey, Francis
In previous works, the authors proposed a hierarchical bandwidth limitation technique based on the Karhunen-Loève Transform (KLT) to reduce the bandwidth for multichannel audio transmission. The subjective results proved that this technique could be used to reduce the overall bandwidth without significant audio quality degradation. Further study found that the transform matrix varied considerably over time for many recordings. In this paper, the KLT matrix was calculated based on short-term signals and updated adaptively over time. The perceptual effects of the adaptive KLT process were studied using a series of listening tests. The results showed that adaptive KLT resulted in better spatial quality than non-adaptive KLT but introduced some other artefacts.
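
For two channels the KLT reduces to a rotation, which makes the idea of a short-term, time-varying transform easy to sketch. The example below is an illustration under stated assumptions rather than the authors' method: it handles only two channels, assumes zero-mean signals, recomputes the rotation angle independently for each non-overlapping block with no smoothing between blocks, and all function names are hypothetical.

```python
import math


def klt_angle(left, right):
    """Rotation angle that decorrelates two zero-mean channels."""
    c_ll = sum(l * l for l in left)
    c_rr = sum(r * r for r in right)
    c_lr = sum(l * r for l, r in zip(left, right))
    return 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)


def rotate(left, right, angle):
    """Apply the 2x2 KLT (a rotation) and return both output channels."""
    c, s = math.cos(angle), math.sin(angle)
    first = [c * l + s * r for l, r in zip(left, right)]
    second = [-s * l + c * r for l, r in zip(left, right)]
    return first, second


def block_angles(left, right, block_size):
    """Short-term KLT angle for each consecutive block of samples."""
    return [
        klt_angle(left[i:i + block_size], right[i:i + block_size])
        for i in range(0, len(left), block_size)
    ]


# The channels are in phase in the first block and out of phase in the second,
# so the short-term transform changes from +45 to -45 degrees.
angles = block_angles([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, -1.0, -1.0], 2)
```

A single transform computed over the whole excerpt could not follow this change, which is the motivation the abstract gives for updating the matrix over time.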

7299
Extension of an Analytic Secondary Source Selection Criterion for Wave Field Synthesis
Spors, Sascha
Wave field synthesis (WFS) is a spatial sound reproduction technique that facilitates a high number of loudspeakers (secondary sources) to create a virtual auditory scene for a large listening area. It requires a sensible selection of the loudspeakers that are active for the reproduction of a particular virtual source. For virtual point sources and plane waves suitable intuitively derived selection criteria are used in practical implementations. However, for more complex virtual source models and loudspeaker array contours the selection is not straightforward. In a previous publication the author proposed a secondary source selection criterion on basis of the sound intensity vector. This contribution will extend this criterion to data-based rendering and focused sources, and will discuss truncation effects.

7300
Adaptive Wave Field Synthesis for Sound Field Reproduction: Theory, Experiments, and Future Perspectives
Gauthier, Philippe-Aubert; Berry, Alain
Wave field synthesis is a sound field reproduction technology which assumes that the reproduction environment is anechoic. A real reproduction space thus reduces the objective accuracy of wave field synthesis. Adaptive wave field synthesis is defined as a combination of wave field synthesis and active compensation. With adaptive wave field synthesis the reproduction errors are minimized along with the departure penalty from the wave field synthesis solution. Analysis based on the singular value decomposition connects wave field synthesis, active compensation and "Ambisonics". The decomposition allows the practical implementation of adaptive wave field synthesis based on independent radiation mode control. Results of experiments in different rooms support the theoretical propositions and show the efficiency of adaptive wave field synthesis for sound field reproduction.

7301
360° Localization via 4.x RACE Processing
Glasgal, Ralph
Recursive Ambiophonic Crosstalk Elimination (RACE), implemented as a VST plug-in, convolved from an impulse response, or purchased as part of a TacT Audio or other home audiophile product, properly reproduces all the ITD and ILD data sequestered in most standard two or multichannel media. Ambiophonics is so named because it is intended to be the replacement for 75 year old stereophonics and 5.1 in the home, car, or monitoring studio, but not in theaters. The response curves show that RACE produces a loudspeaker binaural soundfield with no audible colorations, much like Ambisonics or Wavefield Synthesis. RACE can do this starting with most standard CD/LP/DVD two, four or five-channel media, or even better, 2 or 4 channel recordings made with an Ambiophone, using one or two pairs of closely spaced loudspeakers. The RACE stage can easily span up to 170° for two channel orchestral recordings or 360° for movie/electronic-music surround sources. RACE is not sensitive to head rotation and listeners can nod, recline, stand up, lean sideways, move forward and back, or sit one behind the other. As in 5.1, off center listeners can easily localize the center dialog even though no center speaker is ever needed.

7302
Loudspeaker Systems for Flat Television Sets
Behrends, Herwig; Bradinal, Werner; Heinsberger, Christoph
The rapidly increasing sales of Liquid Crystal and Plasma Display Television (TV) sets lead to new challenges for the sound processing inside TV sets. Flat cabinets do not provide sufficient room for loudspeakers that are able to reproduce frequencies below 100 to 200 Hz without distortion and with a reasonable sound pressure level. Cost reduction forces the set makers to use cheap and small loudspeakers, which are in no way comparable anymore to the loudspeakers used in Cathode Ray Tube Televisions. In this paper, we will describe the trends and the requirements of the market, and discuss different approaches and a practical implementation of a new algorithm, which tackles these problems.

7303
Loudspeakers for Flexible Displays
Sugimoto, Takehiro; Ono, Kazuho; Kurozumi, Kohichi; Ando, Akio; Hara, Akira; Morita, Yuichi; Miura, Akito
Flexible displays that can be rolled up would allow users to enjoy programs wherever they are. NHK Science & Technical Research Laboratories have been developing flexible displays for mobile television. The loudspeakers for such televisions must have the same features as the displays; they must be thin, lightweight, and flexible. We created two types of loudspeakers; one was made of polyvinylidene fluoride and the other used electro-dynamic actuators. Their characteristics were demonstrated to be suitable for mobile use and promising for flexible displays.

7304
Software-Based Live Sound Measurements, Part 2
Ahnert, Wolfgang; Feistel, Stefan; Miron, Alexandru Radu; Finder, Enno
In previous publications the authors introduced the software-based measuring system EASERA for measurements with pre-recorded music and speech signals. This second part investigates the use of excitation signals supplied in real time from an independent external source. Using a newly developed program module, live-sound recordings or speech and music signals from a microphone input and from the mixing console can be utilized to obtain impulse response data for further evaluation. New noise suppression methods are presented that allow these impulse responses to be acquired at full length even in occupied venues. As case studies, room acoustic measurements based on live sound supply are discussed for a concert hall and a large cathedral. Required measuring conditions and limitations are derived as a result.

7305
A System for Remote Control of the Height of Suspended Microphones
McKinnie, Douglas
An electrically driven pulley system allowing remote control of the height of cable-suspended microphones is described. It can be assembled from inexpensive and readily available component parts. A reverse block-and-tackle system is used to allow many meters of cable to be drawn into a 1.2-meter-long space, allowing the cable to remain connected and the microphone to remain in use during movement. An advantage of this system is that single microphones, stereo pairs, or mic arrays can be remotely positioned "by ear" during rehearsal, soundcheck, or warmup.

7306
Music at Your Fingertips: An Electrotactile Fader
Loviscach, Jörn
Tactile sensations can be evoked by applying short high-voltage, low-current electrical pulses to the skin. This phenomenon has been researched extensively to support visually or hearing impaired persons. However, it can also be applied to operate audio production tools in eyes-free mode and without acoustical interference. The electrotactile fader presented in this paper is used to indicate markers or to "display" a track's short-time spectrum using five electrodes mounted on the lever. As opposed to mechanical solutions, which may for instance involve the fader's motor, the electrotactile display neither causes acoustic noise nor reduces the fader's input precision due to vibration.

7307
Concept and Components of a Sound Field Effector Using a Loudspeaker Array
Oto, Teruki; Tanno, Tomoaki; Hua, Jiang; Tamaura, Risa; Kiryu, Syogo; Kamekawa, Toru
Most effectors used for electric musical instruments apply some temporal change to sounds. If effectors aimed at spatial expression were developed, artists could create new kinds of performance. We propose a Sound Field Effector using a loudspeaker array. Various sound fields, such as a focus, can be controlled in real time by sound engineers and/or artists. The Sound Field Effector is divided mainly into software parts and hardware parts. A 16-channel system was developed as a prototype. The system can change sound fields within 1 ms. Focal patterns produced with the system were measured in an anechoic room.

7308
A Novel Mapping with Natural Transition from Linear to Logarithmic Scaling
Panzer, Joerg
The area hyperbolic function ArSinh has the interesting property of performing a linear mapping at arguments close to zero and a quasi-logarithmic mapping for large arguments. Further, it works also with a negative abscissa and at the zero-point. The transition from the linear to the logarithmic range is monotonic, so is the transition to the negative range. This paper demonstrates the use of the ArSinh-function in a range of application examples, such as zooming into the display of transfer-functions, sampling of curves with high density at a specific point and a coarse resolution elsewhere. The paper also reviews the linear and logarithmic mapping and discusses the properties of the new ArSinh-mapping.
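
The mapping described in this abstract can be illustrated with a short sketch. It is not taken from the paper: the scale parameter `x0` and the function names are hypothetical, chosen here only to show that asinh is close to linear near zero, close to logarithmic (asinh(x) ≈ ln(2x)) for large arguments, and odd-symmetric, and how equal steps in the mapped domain give dense sampling near a chosen point.

```python
import math

def arsinh_map(x, x0=1.0):
    """Map x through asinh. x0 is a hypothetical scale that sets where
    the behavior changes from roughly linear to roughly logarithmic."""
    return math.asinh(x / x0)

def arsinh_unmap(y, x0=1.0):
    """Inverse of arsinh_map."""
    return x0 * math.sinh(y)

def arsinh_samples(lo, hi, n, x0=1.0):
    """Return n points between lo and hi that are equally spaced in the
    mapped domain, so they are dense near zero and coarse far from it."""
    a, b = arsinh_map(lo, x0), arsinh_map(hi, x0)
    return [arsinh_unmap(a + (b - a) * i / (n - 1), x0) for i in range(n)]

if __name__ == "__main__":
    for x in (-1000.0, -1.0, 0.0, 0.001, 1.0, 1000.0):
        print(f"x = {x:10.3f}  asinh(x) = {arsinh_map(x):+.6f}")
    print([round(p, 3) for p in arsinh_samples(-1000.0, 1000.0, 11)])
```

To concentrate samples around a point other than zero, one could shift the argument before mapping; that shift is likewise an assumption of this sketch rather than something stated in the abstract.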

7309
Real Time Implementation of an Innovative Digital Audio Equalizer
Cecchi, Stefania; Peretti, Paolo; Palestini, Lorenzo; Piazza, Francesco; Bettarelli, Ferruccio; Lattanzi, Ariano
Fixed frequency response audio equalization has well-known problems due to the computational complexity of the algorithms and to filter design techniques. This paper describes the design and real-time implementation of an M-band linear phase digital audio equalizer. Beginning from multirate systems and filterbanks, an innovative audio equalizer with uniform and non-uniform bands is derived. The idea of this work arises from different approaches employed in filterbanks to avoid aliasing in the case of adaptive filtering in each band. The effectiveness of the real-time implementation is shown by comparing it with a frequency domain equalizer. The solution presented here has several advantages in terms of linear phase and uniform frequency response, avoiding ripple between adjacent bands.

7310
Wideband Beamforming Method Using Two-Dimensional Digital Filter
Kushida, Koji; Shimizu, Yasushi; Nishikawa, Kiyoshi
This paper presents a method for designing a DSP-controlled directional array speaker with constant directivity and a specified sidelobe level over a wide frequency band by means of the two-dimensional (2-D) Fourier series approximation. The band of constant directivity can be extended toward lower frequencies by using the non-physical area in the 2-D frequency plane, where the target amplitude response of the 2-D filter is set to design the 2-D FIR filter. We show that the beamwidth of the array speaker can be narrowed at still lower frequencies with a modification of the original algorithm by K. Nishikawa, et al.

7311
Linear Phase Mixed FIR/IIR Crossover Networks: Design and Real-Time Implementation
Palestini, Lorenzo; Peretti, Paolo; Cecchi, Stefania; Piazza, Francesco; Lattanzi, Ariano; Bettarelli, Ferruccio
Crossover networks are crucial components of audio reproduction systems and have therefore received great attention in the literature. In this paper, the design and implementation of a digital crossover are presented. A mixed FIR/IIR solution has been explored in order to exploit the respective benefits of FIR and IIR realizations, with the aim of designing a low-delay, low-complexity, easily extendible, approximately linear phase crossover network. A real-time software implementation of the proposed system for the NU-Tech platform is shown. Practical tests have been carried out to evaluate the performance of the proposed approach.

7312
Convolutive Blind Source Separation of Speech Signals in the Low Frequency Bands
Jafari, Maria G.; Plumbley, Mark D.
Sub-band methods are often used to address the problem of convolutive blind speech separation, as they offer the computational advantage of approximating convolutions by multiplications. The computational load, however, often remains quite high, because separation is performed on several sub-bands. In this paper, we exploit the well-known fact that the high frequency content of speech signals typically conveys little information, since most of the speech power is found in frequencies up to 4 kHz, and consider separation only in frequency bands below a certain threshold. We investigate the effect of changing the threshold, and find that separation performed only in the low frequencies can lead to recovered signals similar in quality to those extracted from all frequencies.
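
The computational idea behind sub-band methods, replacing a time-domain convolution with a per-frequency multiplication, can be sketched as follows. This is an illustration and not the authors' separation algorithm. It uses a plain DFT with zero padding, for which the equivalence is exact; in short-time sub-band processing the frames are typically shorter than the mixing filters, so the multiplication is only an approximation. All function names are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, for illustration only."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
               for k in range(n)) for m in range(n)]
    return [v / n for v in out] if inverse else out

def convolve_direct(x, h):
    """Time-domain linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def convolve_via_dft(x, h):
    """Same result via one multiplication per frequency bin.
    Zero padding to the full output length avoids circular wrap-around."""
    n = len(x) + len(h) - 1
    X = dft(list(x) + [0.0] * (n - len(x)))
    H = dft(list(h) + [0.0] * (n - len(h)))
    Y = [a * b for a, b in zip(X, H)]
    return [v.real for v in dft(Y, inverse=True)]

if __name__ == "__main__":
    x, h = [1.0, 2.0, 3.0], [1.0, 1.0]
    print(convolve_direct(x, h))
    print([round(v, 9) for v in convolve_via_dft(x, h)])
```

Because each bin is handled independently, restricting processing to the bins below a chosen frequency threshold reduces the work roughly in proportion to the number of bins dropped, which is the saving the abstract refers to.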

7313
A Highly Directive 2-Capsule Based Microphone
Faller, Christof
While microphone technology has reached a high level of performance in terms of signal-to-noise ratio and linearity, directivity of commonly used first order microphones is limited. Higher order gradient based microphones can achieve higher directivity but suffer from signal-to-noise ratio issues. The usefulness of beamforming techniques with multiple capsules is limited due to high cost (a high number of capsules is required for high directivity) and highly frequency variant directional response. A highly directive 2-capsule based microphone system is proposed, using two cardioid capsules. Time-frequency processing is applied to the corresponding two signals. A highly directive directional response is achieved which is time invariant and frequency invariant over a large frequency range.


(C) 2008, Audio Engineering Society, Inc.