AES New York 2007
Perception, Part 1
Paper Session Details
Friday, October 5, 9:00 am — 12:00 pm
Chair: William Martens, McGill University - Montreal, Quebec, Canada
P1-1 Room Reflections Misunderstood?—Siegfried Linkwitz, Linkwitz Lab - Corte Madera, CA, USA
In a domestic living space, a 2-channel monopolar and a dipolar loudspeaker system are compared for perceived differences in their reproduction of acoustic events. Both sound surprisingly similar, and the similarity is further enhanced by extending dipole behavior to frequencies above 1.4 kHz. The increased bandwidth of reflections is significant for spatial impression. Measured steady-state frequency response and measured reflection patterns differ for the two systems, while perceived sound reproduction is nearly identical in terms of timbre, phantom image placement, and sound stage width. The perceived depth in the recording is greater for the dipole loudspeaker. Auditory pattern recognition and precedence effects appear to explain these observations. Implications for the design of loudspeakers, room treatment, and room equalization are discussed.
Convention Paper 7162
P1-2 Aspects of Reverberation Echo Density—Patty Huang, Jonathan Abel, Stanford University - Stanford, CA, USA
Echo density, and particularly its time evolution at the reverberation impulse response onset, is thought to be an important factor in the perceived time domain texture of reverberation. In this paper the psychoacoustics of reverberation echo density is explored using reverberation impulse responses synthesized via a Poisson process to have a variety of static and evolving echo densities. In addition, a recently proposed echo density measure called the normalized echo density, or NED, is explored, and related via a simple expression to echo density specified in echoes per second using echo patterns with static echo densities. A continuum of perceived time-domain texture was noted, from “sputtery” around 100 echoes per second to “smooth” above about 20,000 echoes per second, at which point it was perceptually identical to Gaussian noise. The character of the reverberation impulse response onset was explored for various rates of echo density increase, and ranged from “sputtery” for long mixing times to “instantly smooth” for short mixing times.
Convention Paper 7163
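The Poisson echo synthesis and the normalized echo density mentioned in the P1-2 abstract can be illustrated with a minimal sketch. The function names, the unit-amplitude random-sign echoes, and the rectangular analysis window are assumptions made for illustration, not details from the paper. The NED form used here (fraction of window samples beyond one standard deviation, divided by the value expected for Gaussian noise) follows the commonly cited definition.

```python
import math
import random

def poisson_echo_pattern(rate_hz, duration_s, fs=44100, seed=0):
    """Sparse impulse response whose echoes arrive as a Poisson process."""
    rng = random.Random(seed)
    n = int(duration_s * fs)
    h = [0.0] * n
    t = rng.expovariate(rate_hz)  # exponential gaps give Poisson arrivals
    while t < duration_s:
        idx = int(t * fs)
        if idx < n:
            h[idx] += rng.choice((-1.0, 1.0))  # assumed: unit echoes, random sign
        t += rng.expovariate(rate_hz)
    return h

def normalized_echo_density(h, center, half_width):
    """Share of window samples beyond one standard deviation,
    scaled so that Gaussian noise yields a value near 1."""
    lo = max(0, center - half_width)
    hi = min(len(h), center + half_width + 1)
    window = h[lo:hi]
    sigma = math.sqrt(sum(x * x for x in window) / len(window))
    outliers = sum(1 for x in window if abs(x) > sigma)
    return (outliers / len(window)) / math.erfc(1 / math.sqrt(2))
```

Under these assumptions, a pattern at about 100 echoes per second gives a NED close to zero, while Gaussian noise gives a value close to one.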
P1-3 Localization in Spatial Audio—From Wave Field Synthesis to 22.2—Judith Liebetrau, Thomas Sporer, Thomas Korn, Fraunhofer IDMT - Ilmenau, Germany; Kristina Kunze, Christoph Man, Daniel Marquard, Timo Matheja, Stephan Mauer, Thomas Mayenfels, Robert Möller, Michael-Andreas Schnabel, Benjamin Slobbe, Andreas Überschär, Technical University of Ilmenau, Ilmenau, Germany
Spatial audio reproduction used to concentrate on systems with a low number of loudspeakers arranged in the horizontal plane. Wave Field Synthesis (WFS) and NHK's 22.2 are two systems that promise better localization and envelopment. Comparisons of 22.2 with 5.1 concerning spatial attributes on one hand, and evaluation of spatial properties of WFS on the other hand, have been published in the past, but different methods have been used. In this paper a listening test method is presented that is tailored to the evaluation of localization of 3-D audio formats at different listener positions. Two experiments have been conducted. In the first experiment the localization precision of 22.2 reproduction was evaluated. In a second experiment the localization precision in the horizontal plane as a function of spatial sampling was studied.
Convention Paper 7164
P1-4 Thresholds for Discriminating Upward from Downward Trajectories for Smooth Virtual Source Motion within a Sagittal Plane—David H. Benson, William L. Martens, Gary P. Scavone, McGill University - Montreal, Quebec, Canada
In virtual auditory display, sound source motion is typically cued through dynamic variations in two types of localization cues: the inter-aural time delay (ITD) and binaural spectral cues. Generally, both types of cues contribute to the perception of sound source motion. For certain spatial trajectories, however, namely those lying on the surfaces of cones of confusion, ITD cues are absent, and motion must be inferred solely on the basis of spectral variation. This paper tests the effectiveness of these spectral cues in eliciting motion percepts. A virtual sound source was synthesized that traversed sections of a cone of confusion on a particular sagittal plane. The spatial extent of the source's trajectory was systematically varied to probe directional discrimination thresholds.
Convention Paper 7165
P1-5 Headphone Transparification: A Novel Method for Investigating the Externalization of Binaural Sounds—Alastair Moore, Anthony Tew, University of York - York, UK; Rozenn Nicol, France Telecom R&D - Lannion, France
The only way to be certain that binaurally rendered sounds are properly externalized is to compare them to real sound sources in a discrimination experiment. However, the presence of the headphones required for the binaural rendering interferes with the real sound source. A novel technique is presented that uses small compensating signals applied to the headphones at the same time as the real source is active, such that the signals reaching the ears are the same as if the headphones were not present.
Convention Paper 7166
P1-6 On the Sound Color Properties of Wavefield Synthesis and Stereo—Helmut Wittek, Schoeps Mikrofone GmbH - Karlsruhe, Germany, and University of Surrey, Guildford, Surrey, UK; Francis Rumsey, University of Surrey - Guildford, Surrey, UK; Günther Theile, Institut für Rundfunktechnik - Munich, Germany
The sound color reproduction properties of wavefield synthesis are analyzed by listening tests and compared with those of stereophony. A novel technique, "OPSI," designed to avoid spatial aliasing is presented and analyzed in theory and practice. Both stereophonic phantom sources and OPSI sources were perceived to be less colored than was predicted by coloration predictors based on the spectral alterations of the ear signals. This leads to the hypothesis that a decoloration process exists for stereophonic reproduction as proposed in the "association model" of Theile.
Convention Paper 7167
Signal Processing, Part 1
Friday, October 5, 9:00 am — 11:00 am
Chair: Duane Wise, Consultant - Boulder, CO, USA
P2-1 Suppression of Musical Noise Artifacts in Audio Noise Reduction by Adaptive 2-D Filtering—Alexey Lukin, Moscow State University - Moscow, Russia; Jeremy Todd, iZotope, Inc. - Cambridge, MA, USA
Spectral attenuation algorithms for audio noise reduction often generate annoying musical noise artifacts. Most existing methods for suppression of musical noise employ a combination of instantaneous and time-smoothed spectral estimates for calculation of spectral gains. In this paper a 2-D approach to the filtering of a time-frequency spectrum is proposed, based on a recently developed non-local means image denoising algorithm. The proposed algorithm demonstrates efficient reduction of musical noise without creating “noise echoes” inherent in time-smoothing methods.
Convention Paper 7168
P2-2 Perceptually Motivated Gain Filter Smoothing for Noise Suppression—Alexis Favrot, Christof Faller, Illusonic LLC - Chavannes, Switzerland
Stationary noise suppression is widely used, mostly for reducing noise in speech signals or for audio restoration. Most noise suppression algorithms are based on spectral modification, i.e., a real-valued gain filter is applied to short-time spectra of the speech signal to reduce noise. The more noise is to be removed, the more likely are artifacts due to aliasing effects and time variance of the gain filter. A perceptually motivated systematic time and frequency smoothing of the gain filter is proposed to improve quality, considering the frequency resolution of the auditory system and masking. Comparison with a number of previous methods indicates that the proposed noise suppressor performs as well as the best other method, while computational complexity is much lower.
Convention Paper 7169
P2-3 A Novel Automatic Noise Removal Technique for Audio and Speech Signals—Harinarayanan E.V., ATC Labs - Noida, India; Deepen Sinha, ATC Labs - Chatham, NJ, USA; Shamail Saeed, ATC Labs - Noida, India; Anibal Ferreira, University of Porto - Porto, Portugal, and ATC Labs, Chatham, NJ, USA
This paper introduces new ideas on wideband stationary/nonstationary noise removal for audio signals. Current noise reduction techniques have generally proven to be effective, yet they typically exhibit certain undesirable characteristics. Distortion and/or alteration of the audio characteristics of the primary audio sound is a common problem. Also, user intervention in identifying the noise profile is sometimes necessary. The proposed technique is centered on the classical Kalman filtering technique for noise removal but uses a novel architecture whereby advanced signal processing techniques are used to identify and preserve the richness of the audio spectrum. The paper also includes conceptual and derivative results on parameter estimation, a description of a multi-parameter Signal Activity Detector (SAD), and our new-found improved results.
Convention Paper 7170
P2-4 The Concept, Design, and Implementation of a General Dynamic Parametric Equalizer—Duane Wise, Wholegrain Digital Systems, LLC - Boulder, CO, USA
The classic operations of dynamics processing and parametric equalization control two separate domains of an audio signal. The operational nature of the two processors gives insight into a manner in which they may be combined into a single processor. This integrated processor can perform as the equivalent of a standalone dynamics processor or parametric equalizer, but can also modify the boost and/or cut of an equalizer stage over time following a dynamics curve. The design of a digital version of this concept is discussed herein, along with implementation issues and proposals for their resolutions.
Convention Paper 7171
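One way to picture the combination described in the P2-4 abstract is an equalizer stage whose boost or cut is set by a dynamics curve. The sketch below is not the paper's design. It assumes a simple downward compression curve (hypothetical `dynamic_gain_db`) and uses the widely published "audio EQ cookbook" peaking-filter coefficients to turn the resulting gain into a biquad.

```python
import cmath
import math

def dynamic_gain_db(level_db, threshold_db, ratio):
    """Assumed static curve: cut applied once the level exceeds the threshold."""
    over = max(0.0, level_db - threshold_db)
    return -over * (1 - 1 / ratio)

def peaking_biquad(f0, q, gain_db, fs):
    """Peaking-EQ biquad coefficients (cookbook form), normalized so a[0] == 1."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]

def response_db(b, a, f, fs):
    """Magnitude response of the biquad at frequency f, in dB."""
    z = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20 * math.log10(abs(h))
```

For example, a band level 10 dB above threshold with a ratio of 2 yields a 5 dB cut at the center frequency, while the response at DC stays at 0 dB. A real-time version would recompute the coefficients as the detected level changes, which is where the implementation issues mentioned in the abstract would arise.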
Perception, Part 2
Friday, October 5, 1:30 pm — 5:30 pm
Chair: Poppy Crum, Johns Hopkins University School of Medicine - Baltimore, MD, USA
P3-1 Short-Term Memory for Musical Intervals: Cognitive Differences for Consonant and Dissonant Pure-Tone Dyads—Susan Rogers, Daniel Levitin, McGill University - Montreal, Quebec, Canada
To explore the origins of sensory and musical consonance/dissonance, 16 participants performed a short-term memory task by listening to sequentially presented dyads. Each dyad was presented twice; during each trial participants judged whether a dyad was novel or familiar. Nonmusicians showed greater recognition of musically dissonant than musically consonant dyads. Musicians recognized all dyads more accurately than predicted. Neither group used sensory distinctiveness as a recognition cue, suggesting that the frequency ratio, rather than the frequency difference between two tones, underlies memory for musical intervals. Participants recognized dyads well beyond the generally understood auditory short-term memory limit of 30 seconds, despite the inability to encode stimuli for long-term storage.
Convention Paper 7172
P3-2 Multiple Regression Modeling of the Emotional Content of Film and Music—Rob Parke, Elaine Chew, Chris Kyriakakis, University of Southern California - Los Angeles, CA, USA
Our research seeks to model the effect of music on the perceived emotional content of film media. We used participants’ ratings of the emotional content of film-alone, music-alone, and film-music pairings for a collection of emotionally neutral film clips and emotionally provocative music segments. Mapping the results onto a three-dimensional emotion space, we observed a strong relationship between the ratings of the film- and music-alone clips, and those of the film-music pairs. Previously, we modeled the ratings in each dimension independently. We now develop models, using stepwise regression, to describe the film-music ratings using quadratic terms and based on all dimensions simultaneously. We demonstrate that while linear terms are sufficient for single-dimension emotion models, regression models that consider multiple emotion dimensions yield better results.
Convention Paper 7173
P3-3 Measurements and Perception of Nonlinear Distortion—Comparing Numbers and Sound Quality—Alex Voishvillo, JBL Professional - Northridge, CA, USA
The discrepancy between traditional measures of nonlinear distortion and its perception is commonly recognized. THD, two-tone and multitone intermodulation, and the coherence function provide certain objective information about the nonlinear properties of a DUT, but they do not use any of the psychoacoustical principles responsible for distortion perception. Two approaches to building psychoacoustically relevant measurement methods are discussed. One is based on simulation of the hearing system’s response, similar to the methods used for assessing the sound quality of codecs. The other approach is based on several ideas such as distinguishing low-level versus high-level nonlinearities, low-order versus high-order nonlinearities, and spectral content of distortion signals that occur below the spectrum of an undistorted signal versus one that overlaps the signal’s spectrum or occurs above it. Several auralization examples substantiating this approach are demonstrated.
Convention Paper 7174
P3-4 Influence of Loudness Level on the Overall Quality of Transmitted Speech—Nicolas Côté, France Télécom R&D - Lannion, France, and Berlin University of Technology, Berlin, Germany; Valérie Gautier-Turbin, France Télécom R&D - Lannion, France; Sebastian Möller, Berlin University of Technology - Berlin, Germany
This paper studies the influence of loudness on the perceived quality of transmitted speech. This quality is based on judgments of particular quality features, one of which is loudness. In order to determine the influence of loudness on perceived speech quality, we designed a two-step auditory experiment. We varied the speech level of selected speech samples and degraded them by coding and packet loss. Results show that loudness has an effect on the overall speech quality, but that effect depends on the other impairments involved in the transmission path, and especially on the bandwidth of the transmitted speech. We tried to predict the auditory judgments with two quality prediction models. The signal-based WB-PESQ model, which normalizes the speech signals to a constant speech level, does not succeed in predicting the speech quality for speech signals with only impairments due to a non-optimum speech level. However, the parametric E-model, which includes a measure of the listening level, provides a good estimation of the speech quality.
Convention Paper 7175
P3-5 On the Use of Graphic Scales in Modern Listening Tests—Slawomir Zielinski, Peter Brooks, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
This paper provides a basis for discussion of the perception and use of graphic scales in modern listening tests. According to the literature, the distances between the adjacent verbal descriptors used in typical graphic scales are often perceptually unequal. This implies that the scales are perceptually nonlinear and the ITU-R Quality Scale is shown to be particularly nonlinear in this respect. In order to quantify the degree of violation of linearity in listening tests, the evaluative use of graphic scales was studied in three listening tests. Contrary to expectation, the results showed that the listeners use the scales almost linearly. This may indicate that the listeners ignore the meaning of the descriptors and use the scales without reference to the labels.
Convention Paper 7176
P3-6 A Model-Based Technique for the Perceptual Optimization of Multimodal Musical Performances—Daniel Valente, Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA
As multichannel audio and visual processing becomes more accessible to the general public, musicians are beginning to experiment with performances where players are in two or more remote locations. These co-located or telepresence performances challenge the conventions and basic rules of traditional musical experience. While they allow for collaboration with musicians and audiences in remote locations, the current limitations of technology restrict the communication between musicians. In addition, a telepresence performance introduces optical distortion that can result in impaired auditory communication, resulting in the need to study certain auditory-visual interactions. One such interaction is the relationship between a musician and a virtual visual environment. How does the attendant visual environment affect the perceived presence of a musician? An experiment was conducted to determine the magnitude of this effect. Two pre-recorded musical performances were presented through virtual display in a number of acoustically diverse environments under different relative background lighting conditions. Participants in this study were asked to adjust the direct-to-reverberant ratio and the reverberant level until the virtual musician's acoustic environment was congruent with that of the visual representation. One can expect auditory-visual interactions in the perception of a musician in varying virtual environments. Through a multivariate parameter optimization, the results from this paper will be used to develop a parametric model that will control the current auditory rendering system, Virtual Microphone Control (ViMiC), in order to create a more perceptually accurate auditory-visual environment for performance.
Convention Paper 7177
P3-7 Subjective and Objective Rating of Intelligibility of Speech Recordings—Bradford Gover, John Bradley, National Research Council - Ottawa, Ontario, Canada
Recordings of test speech and an STIPA modulated noise stimulus were made with several microphone systems placed in various locations in a range of controlled test spaces. The intelligibility of the test speech recordings was determined by a subjective listening test, revealing the extent of differences among the recording systems and locations. Also, STIPA was determined for each physical arrangement and compared with the intelligibility test scores. The results indicate that STIPA was poorly correlated with the subjective responses, and not very useful for rating the microphone system performance. A computer program was written to determine STIPA in accordance with IEC 60268-16. The result was found to be highly sensitive to the method of determining the modulation transfer function at each modulation frequency, yielding the most accurate result when normalizing by the premeasured properties of the specific stimulus used.
Convention Paper 7178
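The P3-7 abstract notes that the result depends strongly on how the modulation transfer function is determined at each modulation frequency. The sketch below shows only the step that follows, as generally described for STI-type measures: each modulation transfer value m is converted to an effective signal-to-noise ratio, limited to plus or minus 15 dB, and mapped to a transmission index between 0 and 1. The function names are hypothetical, and the plain unweighted average is a simplification. It omits the octave-band weighting of the standard, so it is not a full STIPA computation.

```python
import math

def transmission_index(m):
    """Map one modulation transfer value (0..1) to a transmission index (0..1)."""
    m = min(max(m, 1e-6), 1 - 1e-6)           # keep the logarithm finite
    snr_db = 10 * math.log10(m / (1 - m))     # effective signal-to-noise ratio
    snr_db = min(15.0, max(-15.0, snr_db))    # limit to the +/-15 dB range
    return (snr_db + 15.0) / 30.0

def mean_transmission_index(mtf_values):
    """Unweighted average over all supplied MTF values (simplified)."""
    return sum(transmission_index(m) for m in mtf_values) / len(mtf_values)
```

Because the mapping is steep near m = 0 and m = 1, small errors in estimating m, for example from an incorrect assumption about the modulation depth of the stimulus itself, can shift the index noticeably.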
P3-8 Potential Biases in MUSHRA Listening Tests—Slawomir Zielinski, Philip Hardisty, Christopher Hummersone, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
The method described in the ITU-R BS.1534-1 standard, commonly known as MUSHRA (MUltiple Stimulus with Hidden Reference and Anchors), is widely used for the evaluation of systems exhibiting intermediate quality levels, in particular low-bit rate codecs. This paper demonstrates that this method, despite its popularity, is not immune to biases. In two different experiments designed to investigate potential biases in the MUSHRA test, systematic discrepancies in the results were observed with a magnitude up to 20 percent. The data indicates that these discrepancies could be attributed to the stimulus spacing and range equalizing biases.
Convention Paper 7179
Signal Processing, Part 2
Friday, October 5, 1:30 pm — 4:30 pm
Chair: Alan Seefeldt, Dolby Laboratories - San Francisco, CA, USA
P4-1 Loudness Domain Signal Processing—Alan Seefeldt, Dolby Laboratories - San Francisco, CA, USA
Loudness Domain Signal Processing (LDSP) is a new framework within which many useful audio processing tasks may be achieved with high quality results. The LDSP framework presented here involves first transforming audio into a perceptual representation utilizing a psychoacoustic model of loudness perception. This model maps the nonlinear variation in loudness perception with signal frequency and level into a domain where loudness perception across frequency and time is represented on a uniform scale. As such, this domain is ideal for performing various loudness modification tasks such as volume control, automatic leveling, etc. These modifications may be performed in a modular and sequential manner, and the resulting modified perceptual representation is then inverted through the psychoacoustic loudness model to produce the final processed audio.
Convention Paper 7180
P4-2 Design of a Flexible Crossfade/Level Controller Algorithm for Portable Media Platforms—Danny Jochelson, Texas Instruments, Inc. - Dallas, TX, USA; Stephen Fedigan, General Dynamics SATCOM - Richardson, TX, USA; Jason Kridner, Jeff Hayes, Texas Instruments, Inc. - Stafford, TX, USA
The addition of a growing number of multimedia capabilities on mobile devices necessitates rendering multiple streams simultaneously, fueling the need for intelligent mixing of these streams to achieve proper balance and address the tradeoff between dynamic range and saturation. Additionally, the crossfading of subsequent streams can greatly enhance the user experience on portable media devices. This paper describes the architecture, features, and design challenges for a real-time, intelligent mixer with crossfade capabilities for portable audio platforms. This algorithm shows promise in addressing many audio system challenges on portable devices through a highly flexible and configurable design while maintaining low processing requirements.
Convention Paper 7181
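As a generic illustration of crossfading between two streams, the sketch below uses an equal-power fade law. The P4-2 abstract does not state which fade law or mixing rule the algorithm uses, so the cosine/sine gains and the function names here are assumptions.

```python
import math

def equal_power_gains(position):
    """Fade position in [0, 1] -> (outgoing gain, incoming gain).
    The squared gains always sum to 1, keeping the power roughly constant."""
    p = min(1.0, max(0.0, position))
    return math.cos(p * math.pi / 2), math.sin(p * math.pi / 2)

def crossfade(outgoing, incoming):
    """Mix two equal-rate sample lists across their overlapping length."""
    n = min(len(outgoing), len(incoming))
    mixed = []
    for i in range(n):
        g_out, g_in = equal_power_gains(i / (n - 1) if n > 1 else 1.0)
        mixed.append(g_out * outgoing[i] + g_in * incoming[i])
    return mixed
```

A fixed-point implementation on a portable platform would also have to guard against saturation when the two streams are correlated, which is the dynamic range tradeoff the abstract refers to.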
P4-3 Audio Delivery Specification—Thomas Lund, TC Electronic A/S - Risskov, Denmark
From the quasi-peak meter in broadcast to sample by sample assessment in music production, normalization of digital audio has traditionally been based on a peak level measure. The paper demonstrates how low dynamic range material under such conditions generally comes out the loudest and how the recent ITU-R BS.1770 standard offers a coherent alternative to peak level fixation. Taking the ITU-R recommendations into account, novel ways of visualizing short-term loudness and loudness history are presented; and applications for compatible statistical descriptors portraying an entire music track or broadcast program are discussed.
Convention Paper 7182
P4-4 Multi-Core Signal Processing Architecture for Audio Applications—Brent Karley, Sergio Liberman, Simon Gallimore, Freescale Semiconductor, Inc. - Austin, TX, USA
As already seen in the embedded computing industry and other consumer markets, the trend in audio signal processing architectures is toward multi-core designs. This trend is expected to continue given the need to support higher performance applications that are becoming more prevalent in both the consumer and professional audio industries. This paper describes a multi-core audio architecture being promoted to the audio industry and details the various architectural hardware, software, and system level trade-offs. The proper application of multi-core architectures is addressed for both consumer and professional audio applications and a comparison of single core, multi-core, and multi-chip designs is provided based on the authors’ experience in the design, development, and application of signal processors.
Convention Paper 7183
P4-5 Rapid Prototyping and Implementing Audio Algorithms on DSPs Using Model-Based Design and Automatic Code Generation—Arvind Ananthan, The MathWorks - Natick, MA, USA
This paper explores the increasingly popular model-based design concept to design audio algorithms within a graphical design environment, Simulink, and automatically generate processor-specific code to implement them on a target DSP in a short time without any manual coding. The fixed-point processors targeted in this paper are the Analog Devices Blackfin processor and the Texas Instruments C6416 DSP. The concept of model-based design introduced here will be explained primarily using an acoustic noise cancellation system (using an LMS algorithm) as an example. However, the same approach can be applied to other audio and signal processing algorithms; other examples that will be shown during the lecture include a 3-band parametric equalizer, a reverberation model, flanging, voice pitch shifting, and other audio effects. The design process, from a floating-point model to its conversion to a fixed-point model, is demonstrated in this paper. The model is then implemented on a C6416 DSK board and a Blackfin 537 EZ-Kit board using the automatically generated code. Finally, the paper also explains how to profile the generated code and optimize it using C-intrinsics (C-callable assembly libraries).
Convention Paper 7184
P4-6 Filter Reconstruction and Program Material Characteristics Mitigating Word Length Loss in Digital Signal Processing-Based Compensation Curves Used for Playback of Analog Recordings—Robert S. Robinson, Channel D Corporation - Trenton, NJ, USA
Renewed consumer interest in pre-digital recordings, such as vinyl records, has spurred efforts to implement playback emphasis compensation in the digital domain. This facilitates realizing tighter design objectives with less effort than required with practical analog circuitry. A common assumption regarding a drawback to this approach, namely bass resolution loss (word length truncation) of up to approximately seven bits during digital de-emphasis of recorded program material, ignores the reconstructive properties of compensation filtering and the characteristics of typical program material. An analysis of the problem is presented, as well as examples showing a typical resolution loss of zero to one bits. The worst case resolution loss, which is unlikely to be encountered with music, is approximately three bits.
Convention Paper 7185
Friday, October 5, 2:00 pm — 3:30 pm
P5-1 Modeling of Nonlinearities in Electrodynamic Loudspeakers—Delphine Bard, Göran Sandberg, Lund University - Lund, Sweden
This paper proposes a model of the nonlinearities in an electrodynamic loudspeaker based on Volterra series decomposition and taking into account the thermal effects affecting the electrical parameters when temperature increases. This model will be used to predict nonlinearities taking place in a loudspeaker and their evolution as the loudspeaker is used for a long time and/or at high power rates and its temperature increases. A temperature increase of the voice coil will cause its series resistance value to increase, therefore reducing the current flowing in the loudspeaker. This phenomenon is known as power compression.
Convention Paper 7186
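The power compression effect described in the P5-1 abstract can be given a rough order of magnitude with a small calculation. The sketch assumes a copper voice coil with a temperature coefficient of about 0.0039 per kelvin, a constant-voltage source, and an impedance dominated by the DC resistance. These simplifications are not taken from the paper, whose model is based on a Volterra series.

```python
import math

COPPER_TEMP_COEFF = 0.0039  # per kelvin, approximate value assumed for copper

def hot_resistance(r_cold, delta_t, alpha=COPPER_TEMP_COEFF):
    """Voice-coil resistance after a temperature rise of delta_t kelvin."""
    return r_cold * (1 + alpha * delta_t)

def compression_db(r_cold, delta_t, alpha=COPPER_TEMP_COEFF):
    """Level loss from the reduced current at constant drive voltage,
    treating the coil as a pure resistance."""
    return 20 * math.log10(hot_resistance(r_cold, delta_t, alpha) / r_cold)
```

Under these assumptions, a 6 ohm coil heated by 100 K rises to about 8.3 ohms, which corresponds to a level loss of roughly 2.9 dB.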
P5-2 Listening Tests of the Localization Performance of Stereodipole and Ambisonic Systems—Andrea Capra, LAE Group - Parma, Italy and University of Parma, Parma, Italy; Simone Fontana, LAE Group - Parma, Italy, and Ecole Nationale Supérieure des Télécommunications, Paris, France; Fons Adriaensen, LAE Group - Parma, Italy; Angelo Farina, LAE Group - Parma, Italy, and University of Parma, Parma, Italy; Yves Grenier, Ecole Nationale Supérieure des Télécommunications - Paris, France
In order to find a possible correlation between objective parameters and subjective descriptors of the acoustics of theaters, auditoria, or music halls, and to perform meaningful listening tests, we need a reliable 3-D audio system that gives the correct perception of distances, good localization all around the listener, and a natural sense of realism. For this purpose a Stereo Dipole system and an Ambisonic system were installed in a listening room at La Casa Della Musica (Parma, Italy). Listening tests were carried out to evaluate the localization performance of the two systems.
Convention Paper 7187
P5-3 Round Robin Comparison of HRTF Simulation Results: Preliminary Results—Raphaël Greff, A-Volute - Douai, France; Brian F. G. Katz, LIMSI – CNRS - Orsay, France
Variability in experimental measurement techniques of the HRTF is a concern that numerical calculation methods can hope to avoid. Numerical techniques such as the Boundary Element Method (BEM) allow for the calculation of the HRTF over the full audio spectrum from a geometrical model. While numerical calculations are not prone to the same errors as physical measurements, other problems appear that cause variations: geometry acquisition and modeling of real shapes as meshes can be performed in different ways. An on-going international round-robin study, “Club Fritz,” gathers HRTF data measured from different laboratories on a unique dummy head. This paper presents preliminary results of numerical simulation based on an acquired geometrical model of this artificial head.
Convention Paper 7188
P5-4 Simulation of Complex and Large Rooms Using a Digital Waveguide Mesh—Jose Lopez, Technical University of Valencia - Valencia, Spain; Jose Escolano, University of Jaen - Jaen, Spain; Basilio Pueo, University of Alicante - Alicante, Spain
The Digital Waveguide Mesh (DWM) method for room acoustic simulation has been introduced in recent years to solve sound propagation problems numerically. However, the huge computing power needed to model large rooms and the complexity of incorporating realistic boundary conditions have delayed its general use, restricting it to the validation of theoretical concepts using simple and small rooms. This paper presents a complete DWM implementation that includes a serious treatment of boundary conditions and is able to cope with different materials in very large rooms up to reasonable frequencies. A simulation of a large building modeled with a high degree of precision has been carried out, and the obtained results are presented and analyzed in detail.
Convention Paper 7189
P5-5 The Flexible Bass Absorber—Niels W. Adelman-Larsen, Flex Acoustics - Lyngby, Denmark; Eric Thompson, Anders C. Gade, Technical University of Denmark - Lyngby, Denmark
Multipurpose concert halls face a dilemma. They host different performance types that require significantly different acoustic conditions in order to provide the best sound quality to the performers, sound engineers, and the audience. Pop and rock music contains high levels of bass sound but still requires a high definition for good sound quality. The mid- and high-frequency absorption is easily regulated, but adjusting the low-frequency absorption has typically been too expensive or has required too much space to be practical for multipurpose halls. A practical solution to this dilemma has been developed. Measurements were made on a variable and mobile low-frequency absorber. The paper presents the results of prototype sound absorption measurements as well as elements of the design.
Convention Paper 7190
P5-6 The Relation between Active Radiating Factor and Frequency Responses of Loudspeaker Line Arrays – Part 2—Yong Shen, Kang An, Dayi Ou, Nanjing University - Nanjing, China
Active Radiating Factor (ARF) is an important parameter for evaluating the similarity between a real loudspeaker line array and the ideal continuous line source. Our previous paper dealt with the relation between the ARF of the loudspeaker line array and the Differential chart of its Frequency Responses in two distances (FRD). In this paper an improved way to estimate the ARF of the loudspeaker line array by measuring on-axis frequency responses is introduced. Some further problems are discussed and experimental results are analyzed. The results may be of help to loudspeaker array designers.
Convention Paper 7191
P5-7 Time Varying Behavior of the Loudspeaker Suspension—Bo Rohde Pedersen, Aalborg University - Esbjerg, Denmark; Finn Agerkvist, Technical University of Denmark - Lyngby, Denmark
The suspension part of the electrodynamic loudspeaker is often modeled as a simple linear spring with viscous damping. However, the dynamic behavior of the suspension is much more complicated than predicted by such a simple model. At higher levels the compliance becomes nonlinear and often changes during high excitation. This paper investigates how the compliance of the suspension depends on the excitation, i.e., level and frequency content. The measurements are compared with other known measurement methods of the suspension.
Convention Paper 7192 (Purchase now)
P5-8 Diffusers with Extended Frequency Range—Konstantinos Dadiotis, Jamie Angus, Trevor Cox, University of Salford - Salford, Greater Manchester, UK
Schroeder diffusers are unable to diffuse sound when all their wells radiate in phase, a phenomenon known as flat plate effect. This phenomenon appears at multiple frequencies of pf0, where p is the integer that generates the well depths and f0 the design frequency. A solution is to send the flat plate frequencies above the bandwidth of interest. For QRDs and PRDs to achieve this goal, impractically long sequences are needed. This paper presents power residue diffusers, of small length in comparison to their prime generator, as solutions to the problem. Their characteristics are investigated and their performance when applied to Schroeder diffusers is explored while modulation is used to cope with periodicity. The results confirm the expectations.
Convention Paper 7193 (Purchase now)
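The flat plate effect mentioned in the P5-8 abstract can be illustrated numerically. The following is a minimal sketch for a classic quadratic residue diffuser (QRD), not the power residue diffusers the paper proposes. It assumes the textbook well-depth rule d_n = s_n · c / (2 · p · f0) with s_n = n² mod p; the function names and the speed of sound value are illustrative choices, not taken from the paper.

```python
import cmath


def qrd_sequence(p):
    """Quadratic residue sequence s_n = n^2 mod p for a prime p."""
    return [(n * n) % p for n in range(p)]


def well_depths(p, f0, c=343.0):
    """Well depths in meters, d_n = s_n * c / (2 * p * f0) (assumed textbook rule)."""
    return [s * c / (2 * p * f0) for s in qrd_sequence(p)]


def well_phase_factors(p, f0, f, c=343.0):
    """Round-trip phase factor exp(-2j*k*d_n) of each well at frequency f."""
    k = 2 * cmath.pi * f / c  # wavenumber
    return [cmath.exp(-2j * k * d) for d in well_depths(p, f0, c)]


def all_in_phase(factors, tol=1e-9):
    """True when every well re-radiates with the same phase (flat plate behavior)."""
    return all(abs(z - factors[0]) < tol for z in factors)
```

Under these assumptions, `all_in_phase(well_phase_factors(7, 500.0, 7 * 500.0))` holds, because at f = p · f0 every round-trip phase is a whole multiple of 2π, while at the design frequency itself the wells radiate with different phases.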
P5-9 Waveguide Mesh Reverberator with Internal Decay and Diffusion Structures—Jonathan Abel, Patty Huang, Julius Smith III, Stanford University - Stanford, CA, USA
Loss and diffusion elements are proposed for a digital waveguide mesh reverberator. The elements described are placed in the interior of the waveguide mesh and may be viewed as modeling objects within the acoustical space. Filters at internal scattering junctions provide frequency-dependent losses and control over decay rate. One proposed design method attenuates signals according to a desired reverberation time, taking into account the local density of loss junctions. Groups of one or several adjacent scattering junctions are altered to break up propagating wavefronts, thereby increasing diffusion. A configuration that includes these internal elements offers more flexibility in tailoring the reverberant impulse response than the common waveguide mesh construction where loss and diffusion elements are uniformly arranged solely at the boundaries. Finally, such interior decay and diffusion elements are ideally suited for use with closed waveguide structures having no boundaries, such as spherical or toroidal meshes, or meshes formed by connecting the edges or surfaces of two or more meshes.
Convention Paper 7194 (Purchase now)
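As a rough illustration of attenuating according to a desired reverberation time (P5-9), the sketch below uses the standard relation that a signal must decay by 60 dB over T60 seconds. The density correction is only an assumption made here for illustration: it supposes that a wave meets a loss junction on a fraction `loss_density` of its sample steps. The paper's actual design method is not described in the abstract, and both function names are hypothetical.

```python
def per_sample_gain(t60_seconds, sample_rate):
    """Gain applied every sample so that the level drops 60 dB after t60_seconds."""
    return 10.0 ** (-3.0 / (t60_seconds * sample_rate))


def loss_junction_gain(t60_seconds, sample_rate, loss_density):
    """Gain per loss junction when only a fraction of steps hits a loss junction.

    loss_density is a hypothetical parameter in (0, 1]; the compensation
    g ** (1 / loss_density) is an assumption, not the paper's formula.
    """
    if not 0.0 < loss_density <= 1.0:
        raise ValueError("loss_density must be in (0, 1]")
    return per_sample_gain(t60_seconds, sample_rate) ** (1.0 / loss_density)
```

With sparser loss junctions each junction has to attenuate more strongly to reach the same overall decay, which is the intuition behind taking local density into account.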
Perception, Part 3
Saturday, October 6, 9:00 am — 12:00 pm
Chair: Brent Edwards, Starkey Hearing Research Center - Berkeley, CA, USA
P6-1 Deriving Physical Predictors for Auditory Attribute Ratings Made in Response to Multichannel Music Reproductions—Sungyoung Kim, William Martens, McGill University - Montreal, Quebec, Canada
A group of eight students engaged in a Tonmeister training program were presented with multichannel loudspeaker reproductions of a set of solo piano performances and were asked to complete two attribute rating sessions that were well separated in time. Five of the eight listeners produced highly consistent ratings after a six-month period during which they received further Tonmeister training. Physical predictors for the obtained attribute ratings were developed from the analysis of binaural recordings of the piano reproductions in order to support comparison between these stimuli and other stimuli, and thereby to establish a basis for independent variation in the attributes to serve both creative artistic goals and further scientific exploration of such multichannel music reproductions.
Convention Paper 7195 (Purchase now)
P6-2 Interaction between Loudspeakers and Room Acoustics Influences Loudspeaker Preferences in Multichannel Audio Reproduction—Sean Olive, Harman International Industries, Inc. - Northridge, CA, USA; William Martens, McGill University - Montreal, Quebec, Canada
The physical interaction between loudspeakers and the acoustics of the room in which they are positioned has been well established; however, the influence on listener preferences for loudspeakers that results from such variation in room acoustics has received little experimental verification. If listeners adapt to listening room acoustics relatively quickly, then room acoustic variation should not significantly influence loudspeaker preferences. In the current paper two groups of listeners were given differential exposure to listening room acoustics via a binaural room scanning (BRS) measurement and playback system. Although no significant difference in loudspeaker preference was found between these two groups of listeners, the room acoustic variation to which they were exposed did significantly influence loudspeaker preferences.
Convention Paper 7196 (Purchase now)
P6-3 Evaluating Off-Center Sound Degradation in Surround Loudspeaker Setups for Various Multichannel Microphone Techniques—Nils Peters, Stephen McAdams, McGill University - Montreal, Quebec, Canada; Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA
Many listening tests have been undertaken to estimate listeners' preferences for different multichannel recording techniques. Usually these tests focus on the sweet spot, the spatial area where the listener maintains optimal perception of virtual sound sources, thereby neglecting to consider off-center listening positions. The purpose of the present paper is to determine how different microphone configurations affect the size of the sweet spot. A perceptual method is chosen in which listening impressions achieved by three different multichannel recording techniques for several off-center positions are compared with the listening impression at the sweet spot. Results of this listening experiment are presented and interpreted.
Convention Paper 7197 (Purchase now)
P6-4 The Effects of Latency on Live Sound Monitoring—Michael Lester, Jon Boley, Shure Incorporated - Niles, IL, USA
A subjective listening test was conducted to determine how objectionable various amounts of latency are for performers in live monitoring scenarios. Several popular instruments were used and the results of tests with wedge monitors are compared to those with in-ear monitors. It is shown that the audibility of latency is dependent on both the type of instrument and monitoring environment. This experiment shows that the acceptable amount of latency can range from 42 ms to possibly less than 1.4 ms under certain conditions. The differences in latency perception for each instrument are discussed. It is also shown that more latency is generally acceptable for wedge monitoring setups than for in-ear monitors.
Convention Paper 7198 (Purchase now)
P6-5 A Perforated Desk Surface to Diminish Coloration in Desktop Audio-Production Environments—Karl Gentner, BRC Acoustics & Technology - Seattle, WA, USA; Jonas Braasch, Paul Calamia, Rensselaer Polytechnic Institute - Troy, NY, USA
In audio-production rooms, a common source of harmful reflections is the mixing console or desk surface itself. A perforated material is proposed as an alternative desk surface to reduce coloration by achieving acoustical transparency. A variety of desk surfaces and perforation schemes were tested within common room conditions. The resulting psychoacoustic study indicates that the fully-perforated desk provides lower coloration than that of the solid desk in every condition. A partially-perforated desk shows a similar decrease in coloration, specifically when the perforated area is determined by the Fresnel zones dictated by the source and receiver positions.
Convention Paper 7199 (Purchase now)
P6-6 Perceptually Modeled Effects of Interchannel Crosstalk in Multichannel Microphone Technique—Hyun-Kook Lee, LG electronics - Seoul, Korea; Russell Mason, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
One of the most noticeable perceptual effects of interchannel crosstalk in multichannel microphone techniques is an increase in perceived source width. The relationship between the perceived source-width-increasing effect and its physical causes was analyzed using an IACC-based objective measurement model. A description of the measurement model is presented, and the measured data obtained from stimuli created with crosstalk and those without crosstalk are analyzed visually. In particular, frequency and envelope dependencies of the measured results and their relationship with the perceptual effect are discussed. The relationship between the delay time of the crosstalk signal and the effect of different frequency content on the perceived source width is also discussed in this paper.
Convention Paper 7200 (Purchase now)
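For readers unfamiliar with the quantity behind the IACC-based model in P6-6, here is a minimal broadband sketch of the interaural cross-correlation coefficient: the peak of the normalized cross-correlation between the two ear signals within a lag of about ±1 ms. The paper's model additionally considers frequency and envelope dependencies, which this sketch does not attempt; the function name and parameters are illustrative.

```python
from math import sqrt


def iacc(left, right, sample_rate, max_lag_ms=1.0):
    """Peak normalized cross-correlation of two ear signals within +/- max_lag_ms."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    energy = sqrt(sum(v * v for v in left) * sum(v * v for v in right))
    if energy == 0.0:
        return 0.0
    max_lag = int(round(max_lag_ms * 1e-3 * sample_rate))
    n = len(left)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:  # samples outside the signal count as zero
                acc += left[i] * right[j]
        best = max(best, abs(acc) / energy)
    return best
```

Values near 1 indicate highly similar ear signals, which is usually associated with a narrow source image; lower values are usually associated with a wider perceived source.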
Signal Processing, Part 3
Saturday, October 6, 9:00 am — 11:30 am
Chair: Dana Massie, Audience, Inc. - Mountain View, CA, USA
P7-1 Sigma-Delta Modulators Without Feedback Around the Quantizer?—Stanley Lipshitz, John Vanderkooy, Bernhard Bodmann, University of Waterloo - Waterloo, Ontario, Canada
We use a result due to Craven and Gerzon—the “Integer Noise Shaping Theorem”—to show that the internal system dynamics of the class of sigma-delta modulators (or equivalently noise shapers) with integer-coefficient FIR error-feedback filters can be completely understood from the action of simple, linear pre- and de-emphasis filters surrounding a (possibly nonsubtractively dithered) quantizer. In this mathematically equivalent model, there is no longer any feedback around the quantizer. The major stumbling block, which has previously prevented a complete dynamical analysis of all such systems of order higher than one, is thus removed. The class of integer noise shapers includes, but is not restricted to, the important family of “Pascal” shapers, having all their zeros at dc.
Convention Paper 7201 (Purchase now)
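The equivalence claimed in P7-1 can be checked numerically in a simplified setting. The sketch below assumes an undithered rounding quantizer with unlimited integer output levels (not a 1-bit quantizer) and a noise transfer function G(z) = 1 + g1·z⁻¹ + g2·z⁻² + … with integer taps. It compares an error-feedback noise shaper with a feedback-free chain: pre-filter 1/G(z), quantizer, de-emphasis filter G(z). Exact rational arithmetic is used because the pre-filter output can grow without bound. All names are illustrative, and this is one reading of the theorem rather than the authors' own formulation.

```python
from fractions import Fraction
from math import floor


def quantize(a):
    """Round to the nearest integer (ties upward)."""
    return floor(a + Fraction(1, 2))


def error_feedback_shaper(x, g):
    """Noise shaper y = x + G*e, with the quantizer error e fed back through g."""
    e, y = [], []
    for n, xn in enumerate(x):
        u = xn + sum(gk * e[n - k] for k, gk in enumerate(g, start=1) if n >= k)
        yn = quantize(u)
        e.append(yn - u)
        y.append(yn)
    return y


def feedforward_equivalent(x, g):
    """Pre-filter 1/G(z), quantize, then filter with G(z); no loop around the quantizer."""
    w, v, y = [], [], []
    for n, xn in enumerate(x):
        wn = xn - sum(gk * w[n - k] for k, gk in enumerate(g, start=1) if n >= k)
        w.append(wn)
        v.append(quantize(wn))
        y.append(v[n] + sum(gk * v[n - k] for k, gk in enumerate(g, start=1) if n >= k))
    return y
```

For a second-order "Pascal" shaper, (1 − z⁻¹)² gives `g = [-2, 1]`. The inputs must be `Fraction` values so that both structures are evaluated exactly; with floating point the growing pre-filter output would eventually lose precision.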
P7-2 The Effect of Different Metrics on the Performance of “Stack” Algorithms for Look-Ahead Sigma Delta Modulators—Peter Websdell, Jamie Angus, University of Salford - Salford, Greater Manchester, UK
Look-ahead Sigma-Delta modulators look forward k samples before deciding to output a “one” or a “zero.” The Viterbi algorithm is then used to search the trellis of the exponential number of possibilities that such a procedure generates. This paper describes alternative tree based algorithms. Tree based algorithms are simpler to implement because they do not require backtracking to determine the correct output value. They can also be made more efficient using “Stack” algorithms. Both the tree algorithm and the more computationally efficient “Stack” algorithms are described. In particular, the effects of different error metrics on the performance of the “Stack” algorithm are described and the average number of moves required per bit discussed. The performance of the “Stack” algorithm is shown to be better than previously thought.
Convention Paper 7202 (Purchase now)
P7-3 Evaluation of Time-Frequency Analysis Methods and Their Practical Applications—Pascal Brunet, Zachary Rimkunas, Steve Temme, Listen, Inc. - Boston, MA, USA
Time-Frequency analysis has been in use for more than 20 years and many different time-frequency distributions have been developed. Four in particular, Short Time Fourier Transform (STFT), Cumulative Spectral Decay (CSD), Wavelet, and Wigner-Ville have gained popularity and firmly established themselves as useful measurement tools. This paper compares these four popular transforms, explains their trade-offs, and discusses how to apply them to analyzing audio devices. Practical examples of loudspeaker impulse responses, loose particles, and rub & buzz defects are given as well as demonstration of their application to common problems with digital/analog audio devices such as Bluetooth headsets, MP3 players, and VoIP telephones.
Convention Paper 7203 (Purchase now)
P7-4 Time-Frequency Characterization of Loudspeaker Responses Using Wavelet Analysis—Daniele Ponteggia, Audiomatica - Florence, Italy; Mario Di Cola, Audio Labs Systems - Milan, Italy
An electroacoustic transducer can be characterized by measuring its impulse response (IR). Usually the collected IR is then transformed by means of the Fourier Transform to get the complex frequency response. IR and complex frequency response form a pair of equivalent views of the same phenomena. An alternative joint time-frequency view of the system response can be achieved using wavelet transform and a color-map display. This work illustrates the implementation of the wavelet transform into a commercial measurement software and presents some practical results on different kinds of electroacoustic systems.
Convention Paper 7204 (Purchase now)
P7-5 Equalization of Loudspeaker Resonances Using Second-Order Filters Based on Spatially Distributed Impulse Response Measurements—Jakob Dyreby, Sylvain Choisel, Bang & Olufsen A/S - Struer, Denmark
A new approach for identifying and equalizing resonances in loudspeakers is presented. The method optimizes the placement of poles and zeros in a second-order filter by minimization of the frequency-dependent decay. Each resonance may be equalized by the obtained second-order filter. Furthermore, the use of spectral decay gives opportunity for optimizing on multiple measurements simultaneously making it possible to take multiple spatial directions into account. The proposed procedure is compared to direct inversion and minimum-phase equalization. It makes it possible to equalize precisely the artifacts responsible for ringing, while being largely unaffected by other phenomena such as diffractions, reflections, and noise.
Convention Paper 7205 (Purchase now)
Applications in Audio
Saturday, October 6, 9:30 am — 11:00 am
P8-1 Pump Up the Volume: Enhancing Music Phone Audio Quality and Power Using Supercapacitors for Power Management—Pierre Mars, CAP-XX (Australia) Pty. Ltd. - Sydney, NSW, Australia
As multimedia and music phones grow in popularity, consumers want an iPod-quality, uninterrupted audio experience without the buzzing and clicks associated with wireless transmission. This paper describes the problems delivering high power and high quality audio in music-enabled mobile phones and how a supercapacitor can overcome them. Typically, the audio amplifier power supply input in a mobile phone is connected directly to Vbattery. This paper compares audio performance between the typical setup and connecting the audio amp supply to a supercapacitor charged to 5V through a current limited boost converter.
Convention Paper 7206 (Purchase now)
P8-2 Digital Audio Processing on a Tiny Scale: Hardware and Software for Personal Devices—Peter Eastty, Oxford Digital Limited - Oxfordshire, UK
The design of an audio signal processor, graphical programming environment, DSP software, and parameter adjustment tool is described with reference to the hardware and software requirements of the audio sweetening function in personal devices, particularly cell phones. Special care is taken in the hardware design to ensure low operating power, small size (4 mm x 4 mm package, or 0.5 to 1 sq. mm area depending on geometry), stereo analog and digital I/O, and high performance. The parameter adjustment tool allows real time control of the DSP so that processing may be customized to the actual properties of the audio sources and the acoustic properties of the enclosure and loudspeakers. A live demonstration of the programming and parameter adjustment of the processor will be given as part of the presentation of the paper.
Convention Paper 7207 (Purchase now)
P8-3 Enhancing End-User Capabilities in High Speed Audio Networks—Nyasha Chigwamba, Richard Foss, Rhodes University - Grahamstown, South Africa
Firewire is a digital network technology that can be used to interconnect professional audio equipment, PCs, and electronic devices. The Plural Node Architecture splits connection management of Firewire audio devices between two nodes, namely an enabler and a transporter. The Audio Engineering Society’s SC-02-12-G Task Group has produced an Open Generic Transporter guideline document that describes a generic interface between the enabler and transporter. A client-server implementation above the Plural Node Architecture allows connection management of Firewire audio devices via TCP/IP. This paper describes enhancements made to connection management applications as a result of additional capabilities revealed by the Open Generic Transporter document.
Convention Paper 7208 (Purchase now)
P8-4 Sharing Acoustic Spaces over Telepresence Using Virtual Microphone Control—Jonas Braasch, Daniel L. Valente, Rensselaer Polytechnic Institute - Troy, NY, USA; Nils Peters, McGill University - Montreal, Quebec, Canada
This paper describes a system that is used to project musicians in two or more co-located venues into a shared virtual acoustic space. The sound of the musicians is captured using spot microphones. Afterward, it is projected at the remote end using spatialization software based on virtual microphone control (ViMiC) and an array of loudspeakers. In order to simulate the same virtual room at all co-located sites, the ViMiC systems communicate using the OpenSound Control protocol to exchange room parameters and the room coordinates of the musicians.
Convention Paper 7209 (Purchase now)
P8-5 A Tutorial: Fiber Optic Cables and Connectors for Pro-Audio—Ronald Ajemian, Owl Fiber Optics - Flushing, NY, USA
There have been many technological breakthroughs in the area of fiber optic technology that have allowed an easier transition to migrate into the professional audio arena. Since the current rise of copper prices in the worldwide markets, there has been an increase of usage in fiber optic based equipment, cables, and connectors deployed for pro-audio and video. This prompted the writing of this tutorial to bring the professional audio community up to date with some old and new fiber optic cables and connectors now being deployed in pro-audio. This tutorial will help audio professionals understand the jargon and to better understand fiber optic technology now being deployed in pro-audio.
Convention Paper 7210 (Purchase now)
P8-6 The Most Appropriate Method of Producing TV Program Audio Focusing on the Audience—Hisayuki Ohmata, NHK Science & Technical Research Laboratories - Tokyo, Japan; Akira Fukada, NHK Broadcasting Center - Tokyo, Japan; Hiroshi Kouchi, NHK Kofu Station - Kofu, Yamanashi, Japan
When audiences watch TV programs, they often perceive a difference in audio levels. This is a real annoyance for them, and it is caused by differences in program audio. In order to have equal audio levels, it is necessary to produce audio under the same conditions for all programs. To solve this problem, we propose a method to produce TV program audio. We make clear the manner in which different monitoring levels influence mixing balance at various mixing stages. This paper also describes management of audio levels for programs with different digital broadcasting head rooms.
Convention Paper 7211 (Purchase now)
P8-7 Beyond Splicing: Technical Ear Training Methods Derived from Digital Audio Editing Techniques—Jason Corey, University of Michigan - Ann Arbor, MI, USA
The process of digital audio editing, especially with classical or acoustic music using a source-destination method, offers an excellent opportunity for ear training. Music editing involves making transparent connections or splices between takes of a piece of music and often requires specifying precise edit locations by ear. The paper outlines how aspects of digital editing can be used systematically as an ear training method, even out of the context of an editing session. It describes a software tool that uses specific techniques from audio editing to create an effective ear training method that offers benefits that transfer beyond audio editing.
Convention Paper 7212 (Purchase now)
P8-8 New Trends in Sound Reinforcement Systems Based on Digital Technology—Piotr Kozlowski, Pawel Dziechcinski, Wroclaw University of Technology - Wroclaw, Poland; Wojciech Grzadziel, Pracownia Akustyczna, Acoustic Design Team - Wroclaw, Poland
This paper presents new aspects of the design of modern sound reinforcement systems that have come into view because of the prevalence of digital technology. The basic structure of modern digital electroacoustic systems is explained using as an example the one installed at Wroclaw Opera House. This paper focuses on some aspects connected to digital transmission of audio signals, proper audience area sound coverage, achieving smooth frequency response, getting directive propagation at low frequencies, and controlling the system. Some measurements and tests on the topics presented in the paper were carried out during the tuning of the system at Wroclaw Opera House. The results show that these targets can be achieved.
Convention Paper 7213 (Purchase now)
P8-9 Using Audio Classifiers as a Mechanism for Content-Based Song Similarity—Benjamin Fields, Michael Casey, Goldsmiths College, University of London - London, UK
As collections of digital music become larger and more widespread, there is a growing need for assistance in a user's navigation and interaction with a collection and with the individual members of that collection. Examining pairwise song relationships and similarities, based upon content derived features, provides a useful tool to do so. This paper looks into a means of extending a song classification algorithm to provide song to song similarity information. In order to evaluate the effectiveness of this method, the similarity data is used to group the songs into k-means clusters. These clusters are then compared against the original genre sorting algorithm.
Convention Paper 7267 (Purchase now)
Saturday, October 6, 12:30 pm — 5:00 pm
Chair: James Johnston, Microsoft Corporation - Redmond, WA, USA
P9-1 Network Music Performance with Ultra-Low-Delay Audio Coding under Unreliable Network Conditions—Ulrich Kraemer, Jens Hirschfeld, Gerald Schuller, Stefan Wabnik, Fraunhofer IDMT - Ilmenau, Germany; Alexander Carôt, Christian Werner, University of Lübeck - Lübeck, Germany
A key issue for successfully interconnecting musicians in real-time over the Internet is minimizing the end-to-end signal delay for transmission and coding. The variance of transmission delay (“jitter”) occasionally causes some packets to arrive too late for playback. To avoid this problem, previous approaches use rather large receive buffers at the cost of a larger delay. In this paper we present a novel solution that keeps buffer sizes and delay minimal. On the network layer we use a highly optimized audio framework called “Soundjack” and on the coding layer we work with an ultra low-delay codec for high-quality audio. We analyze and evaluate a modified transmission and coding scheme for the Fraunhofer Ultra-Low-Delay (ULD) audio coder, which is designed to be more resilient to lost and late arriving data packets.
Convention Paper 7214 (Purchase now)
P9-2 A Very Low Bit-Rate Protection Layer to Increase the Robustness of the AMR-WB+ Codec against Bit Errors—Philippe Gournay, University of Sherbrooke - Sherbrooke, Quebec, Canada
Audio codecs face various channel impairments when used in challenging applications such as digital radio. The standard AMR-WB+ audio codec includes a concealment procedure to handle lost frames. It is also inherently robust to bit errors, although some bits within any given frame are more sensitive than others. Motivated by this observation, the present paper makes two contributions. First, a detailed study of the sensitivity of individual bits in AMR-WB+ frames is provided. All the bits in a frame are then divided into three sensitivity classes so that efficient unequal error protection (UEP) schemes can be designed. Then, a very low bit rate protection layer to increase the robustness of the codec against bit errors is proposed and assessed using the results of subjective audio quality tests. Remarkably, in contrast to the standard codec, where some errors have a very discernable effect, the protection layer ensures that the decoded audio is free of major channel artifacts even at a significant 0.5 percent bit error rate.
Convention Paper 7215 (Purchase now)
P9-3 Trellis Based Approach for Joint Optimization of Window Switching Decisions and Bit Resource Allocation—Vinay Melkote, Kenneth Rose, University of California at Santa Barbara - Santa Barbara, CA, USA
The fact that audio compression for streaming or storage is usually performed offline alleviates traditional constraints on encoding delay. We propose a rate-distortion optimized approach, within the MPEG Advanced Audio Coding framework, to trade delay for optimal window switching and resource allocation across frames. A trellis is constructed where stages correspond to audio frames, nodes represent window choices, and branches implement transition constraints. A suitable cost, comprising bit consumption and psychoacoustic distortion, is optimized via multiple passes through the trellis until the desired bit-rate is achieved. The procedure offers optimal window switching as well as better bit distribution than conventional bit-reservoir schemes that are restricted to “borrow” bits from past frames. Objective and subjective tests show considerable performance gains.
Convention Paper 7216 (Purchase now)
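The trellis described in P9-3 can be sketched as a small dynamic program. The example below is a single pass only, under stated assumptions: each frame comes with a precomputed cost per window choice (for instance a weighted sum of bits and distortion), and the transition table is a simplified version of AAC window sequencing. The cost values, the window labels, and the function name are hypothetical; the paper's multi-pass adjustment toward a target bit rate is not reproduced.

```python
# Simplified transition constraints between window choices (assumption).
ALLOWED = {
    "LONG": ("LONG", "START"),
    "START": ("SHORT", "STOP"),
    "SHORT": ("SHORT", "STOP"),
    "STOP": ("LONG", "START"),
}


def best_window_path(frame_costs):
    """Return (total_cost, path) of the cheapest allowed window sequence.

    frame_costs is a list with one dict per frame mapping window -> cost.
    """
    # best[w] holds the cheapest path found so far that ends in window w.
    best = {w: (c, [w]) for w, c in frame_costs[0].items()}
    for costs in frame_costs[1:]:
        nxt = {}
        for prev, (prev_cost, path) in best.items():
            for w in ALLOWED[prev]:
                cand = prev_cost + costs[w]
                if w not in nxt or cand < nxt[w][0]:
                    nxt[w] = (cand, path + [w])
        best = nxt
    return min(best.values(), key=lambda item: item[0])
```

Because only the cheapest path into each node is kept at every stage, the work grows linearly with the number of frames rather than exponentially with it.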
P9-4 Transcoding of Dynamic Range Control Coefficients and Other Metadata into MPEG-4 HE AAC—Wolfgang Schildbach, Kurt Krauss, Coding Technologies - Nuremberg, Germany; Jonas Rödén, Coding Technologies - Stockholm, Sweden
With the introduction of HE-AAC (also known as aacPlus) into several new broadcasting systems, the topic of how to best encode new and transcode pre-existing metadata such as dynamic range control (DRC) data, program reference level and downmix coefficients into HE-AAC has gained renewed interest. This paper will discuss the means of carrying metadata within HE-AAC and derived standards like DVB, and present studies on how to convert metadata persistent in different formats into HE-AAC. Listening tests are employed to validate the results.
Convention Paper 7217 (Purchase now)
P9-5 Advanced Audio for Advanced IPTV Services—Roland Vlaicu, Oren Williams, Dolby Laboratories - San Francisco, CA, USA
Television service providers have significant new requirements for audio delivery in next-generation broadcast systems such as high-definition television and IPTV. These include the capability to deliver soundtracks from mono to 5.1 channels and beyond with greater efficiency than current systems. Compatibility with existing consumer home cinema systems must also be maintained. A new audio delivery system, Enhanced AC-3, has been developed to meet these requirements, and has been standardized in DVB, . . ., as well as in ATSC. Also, Enhanced AC-3 is being included in widely used middleware solutions and paired with RTP considerations. This paper describes how operators can manage multichannel assets on linear broadcast turn-around and video-on-demand services in order to provide a competitive IPTV offering.
Convention Paper 7218 (Purchase now)
P9-6 A Study of the MPEG Surround Quality versus Bit-Rate Curve—Jonas Rödén, Coding Technologies - Stockholm, Sweden; Jeroen Breebaart, Philips Research Laboratories - Eindhoven, The Netherlands; Johannes Hilpert, Fraunhofer Institute for Integrated Circuits - Erlangen, Germany; Heiko Purnhagen, Coding Technologies - Stockholm, Sweden; Erik Schuijers, Jeroen Kippens, Philips Applied Technologies - Eindhoven, The Netherlands; Karsten Linzmeier, Andreas Hölzer, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
MPEG Surround provides unsurpassed multichannel audio compression efficiency by extending a mono or stereo audio coder with additional side information. This compression method has two important advantages. The first is its backward compatibility, which is important when MPEG Surround is employed to upgrade an existing service. Second, the amount of side information can be varied over a wide range to enable high-quality multichannel audio compression at extremely low bit rates up to perceptual transparency at higher bit rates. The present paper provides a study of the performance of MPEG Surround, highlighting the various tradeoffs that are available when using MPEG Surround. Furthermore a quality versus bit rate curve describing the MPEG Surround performance will be presented.
Convention Paper 7219 (Purchase now)
P9-7 Quality Impact of Diotic Versus Monaural Hearing on Processed Speech—Arnault Nagle, Catherine Quinquis, Aurélien Sollaud, Anne Battistello, France Telecom Research and Development - Lannion, France; Dirk Slock, Institut Eurecom - Antipolis Cedex, France
In VoIP audio conferencing, listening is done over handsets or headphones, that is, with one or two ears. In order to keep the same loudness perception between the two modes, a listener can only tune the sound level. The goal of this paper is to show that monaural or diotic hearing has a quality impact on speech processed by VoIP coders. It can increase or decrease the differences in perceived quality between tested coders and even change their ranking according to the sound level. This impact on the ranking of the coders is explained by means of the normal equal-loudness-level contours over headphones and the specifics of some coders. It is important to be aware of the impact of the hearing system and its associated sound level.
Convention Paper 7220 (Purchase now)
P9-8 A Novel Audio Post-Processing Toolkit for the Enhancement of Audio Signals Coded at Low Bit Rates—Raghuram Annadana, Harinarayanan E.V., ATC Labs - Noida, India; Deepen Sinha, ATC Labs - Chatham, NJ, USA; Anibal Ferreira, University of Porto - Porto, Portugal, and ATC Labs, Chatham, NJ, USA
Low bit rate audio coding often results in the loss of a number of key audio attributes such as audio bandwidth and stereo separation. Additionally, there is typically a loss in the level of detail and in the intelligibility and/or warmth of the signal. Due to the proliferation, e.g., on the Internet, of low bit rate audio coded using a variety of coding schemes and bit rates over which the listener has no control, it is becoming increasingly attractive to incorporate processing tools in the player that can ensure a consistent listener experience. We describe a novel post-processing toolkit that incorporates tools for (i) stereo enhancement, (ii) blind bandwidth extension, (iii) automatic noise removal and audio enhancement, and (iv) blind 2-to-5 channel upmixing. Algorithmic details, listening results, and audio demonstrations will be presented.
Convention Paper 7221 (Purchase now)
P9-9 Subjective Evaluation of Immersive Sound Field Rendition System and Recent Enhancements—Chandresh Dubey, Raghuram Annadana, ATC Labs - Noida, India; Deepen Sinha, ATC Labs - Chatham, NJ, USA; Anibal Ferreira, University of Porto - Porto, Portugal, and ATC Labs, Chatham, NJ, USA
Consumer audio applications such as satellite radio broadcasts, multichannel audio streaming, and playback systems coupled with the need to meet stringent bandwidth requirements are eliciting newer challenges in parametric multichannel audio coding schemes. This paper describes the continuation of our research concerning the Immersive Soundfield Rendition (ISR) system. In particular we present detailed subjective result data benchmarking the ISR system in comparison to MPEG Surround and also characterizing the audio quality level at different sub-modes of the system. We also describe enhancements to various algorithmic components in particular the blind 2-to-5 channel upmixing algorithm and describe a novel scheme for providing enhanced stereo downmix at the receiver for improved decoding by conventional matrix decoding systems.
Convention Paper 7222 (Purchase now)
Automotive Audio and Amplifiers
Saturday, October 6, 12:30 pm — 3:30 pm
Chair: Richard Stroud, Stroud Audio - Kokomo, IN, USA
P10-1 Improved Stereo Imaging in Automobiles—Michael Smithers, Dolby Laboratories - Sydney, NSW, Australia
A significant challenge in the automobile listening environment is the predominance of off-axis listening positions. This leads to audible artifacts including comb filtering and indeterminate stereo imaging, both in traditional stereo and in more recent multichannel loudspeaker configurations. This paper discusses the problem of off-axis listening as well as methods to improve stereo imaging in a symmetric manner using all-pass FIR and IIR filters. This paper also discusses a more efficient IIR filter design that achieves similar performance to previous filter designs. Use of these filters results in stable, virtual sources in front of off-axis listeners.
Convention Paper 7223 (Purchase now)
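The P10-1 abstract above refers to all-pass IIR filters but does not give a design. As a minimal sketch only, assuming a generic first-order all-pass section (the coefficient `a` and both function names are illustrative and are not taken from the paper), the following shows the defining property such filters rely on: the magnitude response is unity at every frequency while the phase varies.

```python
import cmath


def allpass_response(a, omega):
    """Frequency response of H(z) = (a + z^-1) / (1 + a z^-1).

    `a` is a real coefficient with |a| < 1 (assumed value, not from the paper);
    `omega` is the normalized radian frequency in [0, pi].
    """
    z_inv = cmath.exp(-1j * omega)
    return (a + z_inv) / (1 + a * z_inv)


def allpass_filter(a, x):
    """Apply the same section in the time domain:
    y[n] = a*x[n] + x[n-1] - a*y[n-1]."""
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for xn in x:
        yn = a * xn + x_prev - a * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


if __name__ == "__main__":
    for omega in (0.1, 1.0, 2.5):
        h = allpass_response(0.5, omega)
        # Magnitude stays at 1; only the phase changes with frequency.
        print(omega, abs(h), cmath.phase(h))
```

Because the magnitude is untouched, a pair of such sections with different coefficients in the left and right channels changes only the interchannel phase, which is the kind of degree of freedom an imaging correction can use.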
P10-2 A Listening Test System for Automotive Audio—Part 3: Comparison of Attribute Ratings Made in a Vehicle with Those Made Using an Auralization System—Patrick Hegarty, Sylvain Choisel, Søren Bech, Bang & Olufsen a/s - Struer, Denmark
A system has been developed to allow listening tests of car audio sound systems to be conducted over headphones. The system employs dynamic binaural technology to capture and reproduce elements of an in-car soundfield. An experiment, a follow-up to a previous work, to validate the system is described. Seven trained listeners were asked to rate a range of stimuli in a car as well as over headphones for 15 elicited attributes. Analysis of variance was used to compare ratings from the two hardware setups. Results show the ratings for spatial attributes to be preserved while differences exist for some timbral and temporal attributes.
Convention Paper 7224 (Purchase now)
P10-3 A Listening Test System for Automotive Audio - Part 4: Comparison of Attribute Ratings Made by Expert and Non-Expert Listeners—Sylvain Choisel, Patrick Hegarty, Bang & Olufsen a/s - Struer, Denmark; Flemming Christensen, Benjamin Pedersen, Wolfgang Ellermeier, Jody Ghani, Wookeun Song, Aalborg University - Aalborg, Denmark
A series of experiments was conducted in order to validate an experimental procedure to perform listening tests on car audio systems in a simulation of the car environment in a laboratory, using binaural synthesis with head-tracking. Seven experts and 40 non-expert listeners rated a range of stimuli for 15 sound-quality attributes developed by the experts. This paper presents a comparison between the attribute ratings from the two groups of participants. Overall preference of the non-experts was also measured using direct ratings as well as indirect scaling based on paired comparisons. The results of both methods are compared.
Convention Paper 7225 (Purchase now)
P10-4 The Application of Direct Digital Feedback for Amplifier System Control—Craig Bell, David Jones, Robert Watts, Zetex Semiconductors - Oldham, UK
An effective feedback topology is clearly a beneficial requirement for a well performing digital amplifier. The ability to cancel corrupting influences such as power supply ripple and unmatched components is necessary for good sonic performance. Additional benefits derive from the fact that the feedback information is processed in the digital domain. Current delivered into the loudspeaker load can be inferred. The amplifier acts as a voltage source, the value of which is derived from the recorded source material. The current delivered into the loudspeaker is also clearly influenced by the load impedance, which varies with frequency and other factors. This paper describes the ability of the system to measure current and derive loudspeaker impedance and the actual delivered power and goes on to illustrate the applications in real systems.
Convention Paper 7226 (Purchase now)
P10-5 Generation of Variable Frequency Digital PWM—Pallab Midya, Freescale Semiconductor Inc. - Lake Zurich, IL, USA
Digital audio amplifiers convert digital PCM to digital PWM to be amplified by a power stage. This paper introduces a method to generate a quantized duty ratio digital PWM with a switching frequency over a 20 percent range to mitigate EMI issues. The method is able to compensate for the variation in switching frequency such that the SNR in the audio band is comparable to fixed frequency PWM. To obtain good rejection of the noise introduced by the variation of the PWM frequency, higher order noise shapers are used. This paper describes in detail the algorithm for a fourth order noise shaper. Using this method, dynamic range in excess of 120 dB unweighted over a 20 kHz bandwidth is achieved.
Convention Paper 7227 (Purchase now)
P10-6 Recursive Natural Sampling for Digital PWM—Pallab Midya, Bill Roeckner, Theresa Paulo, Freescale Semiconductor Inc. - Lake Zurich, IL, USA
This paper presents a highly accurate and computationally efficient method for digital-domain computation of naturally sampled digital pulse width modulation (PWM) signals. This method is used in a switching digital audio amplifier. The method is scalable for performance versus calculation complexity. Using a second order version of the algorithm with no iteration, intermodulation linearity of better than 113 dB is obtained with a full scale input at 19 kHz and 20 kHz. Matlab simulation and measured results from a digital amplifier implemented with this algorithm are presented. Overall system performance is not limited by the accuracy of the natural sampling method.
Convention Paper 7228 (Purchase now)
Saturday, October 6, 2:00 pm — 3:30 pm
P11-1 Impact of Equalizing Ear Canal Transfer Function on Out-of-Head Sound Localization—Masataka Yoshida, Nagaoka University of Technology - Nagaoka, Niigata, Japan; Akihiro Kudo, Tomakomai National College of Technology - Tomakomai, Hokkaido, Japan; Haruhide Hokari, Shoji Shimada, Nagaoka University of Technology - Nagaoka, Niigata, Japan
Several papers have pointed out that the frequency characteristics of the ear canal transfer functions (ECTFs) depend on headphone type, ear placement position of headphones, and subject's ear canal shape/volume. However, the effect of these factors on creating out-of-head sound localization has not been sufficiently clarified. The purpose of this paper is to clarify this effect. Sound localization tests using several types of headphones are performed in three conditions: listener's (individualized) ECTFs, HATS's (non-individualized) ECTFs, and omitted ECTFs. The results show that employing the individualized ECTFs generally yields accurate localization, while omitting the use of ECTFs increases the horizontal average localization error in accordance with the type of headphone employed.
Convention Paper 7229 (Purchase now)
P11-2 A Method for Estimating the Direction of Sound Image Localization for Designing a Virtual Sound Image Localization Control System—Yoshiki Ohta, Kensaku Obata, Pioneer Corporation - Tsurugashima-city, Saitama, Japan
We developed a method of estimating the direction of sound image localization. Our method is based on the sound pressure distribution in the vicinity of a listener. In the experiment, band noises that only differ in phase were produced from two loudspeakers. We determined what relation existed between the subjective direction of the sound image localization and the objective sound pressure distribution in the vicinity of the listener. We found that an azimuth of localization can be expressed as a linear combination of sound pressure levels in the vicinity of the listener. Our method can be used to estimate azimuths with a high degree of accuracy and to associate phase differences with azimuths. Therefore, it can be used to design a system for controlling virtual sound image localization.
Convention Paper 7230 (Purchase now)
P11-3 A Preliminary Experimental Study on Perception of Movement of a Focused Sound Using a 16-Channel Loudspeaker Array—Daiki Sato, Musashi Institute of Technology - Setagaya-ku, Tokyo, Japan; Teruki Oto, Kenwood Corporation - Tokyo, Japan; Kaoru Ashihara, Advanced Industrial Science and Technology - Tsukuba, Japan; Ryuzo Horiguchi, Advanced Industrial Science and Technology - Tsukuba, Japan, and Musashi Institute of Technology, Setagaya-ku, Tokyo, Japan; Shogo Kiryu, Musashi Institute of Technology - Setagaya-ku, Tokyo, Japan
We have been developing a sound field effecter by using a loudspeaker array. In order to design a practical system, psychoacoustic experiments for recognition of sound fields are required. In this paper perception of a sound focus is investigated using a 16-channel loudspeaker array. Listening experiments were conducted in an anechoic room and a listening room. A movement of 25 cm in the horizontal direction and a movement of 100 cm in the direction from the loudspeaker array toward the subject could be recognized in both rooms, but movement in the vertical direction could not be perceived in either room.
Convention Paper 7231 (Purchase now)
P11-4 Perceptual Categories of Artificial Reverberation for Headphone Reproduction of Music—Atsushi Marui, Tokyo National University of Fine Arts and Music - Tokyo, Japan
In studies of artificial reverberation, the focus is usually on recreating the natural reverberation that can be heard in a real environment. However, little attention has been paid to evaluating the useful ranges of artificial reverberation in music production. The focus of this paper is to discuss and evaluate three artificial reverberation algorithms intended for headphone reproduction of music, and to propose iso-usefulness contours on those algorithms for several different types of musical sounds.
Convention Paper 7232 (Purchase now)
P11-5 Correspondence Relationship between Physical Factors and Psychological Impressions of Microphone Arrays for Orchestra Recording—Toru Kamekawa, Atsushi Marui, Tokyo National University of Fine Arts and Music - Tokyo, Japan; Hideo Irimajiri, Mainichi Broadcasting Corporation - Kita-ku, Osaka, Japan
Microphone technique for the surround sound recording of an orchestra is discussed. Eight types of well known microphone arrays recorded in a concert hall were compared in subjective listening tests on seven attributes such as spaciousness, powerfulness, and localization using a method inspired by MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor). The result of the experiment shows similarity and dissimilarity between each microphone array. It is estimated that directivity of a microphone and distance between each microphone are related to the character of the microphone array, and these similarities are changed by music character. The relations of the physical factors of each array were also compared, such as SC (Spectral Centroid), LFC (Lateral Fraction Coefficient), and IACC (Inter Aural Cross-correlation Coefficient) from the impulse response of each array or recordings by a dummy head. The correlation of these physical factors and the attribute scores show that the contribution of these physical factors depends on music.
Convention Paper 7233 (Purchase now)
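Among the physical factors listed in the P11-5 abstract above, the spectral centroid (SC) is the simplest to state. The abstract does not say which definition the authors used, so the following is a sketch under the common assumption that SC is the magnitude-weighted mean frequency of a spectrum; the function name and the example values are illustrative.

```python
def spectral_centroid(freqs_hz, magnitudes):
    """Magnitude-weighted mean frequency of a spectrum.

    freqs_hz   -- center frequency of each bin in Hz
    magnitudes -- non-negative magnitude of each bin (same length)
    This is one common definition; the paper's exact variant is not stated.
    """
    if len(freqs_hz) != len(magnitudes):
        raise ValueError("freqs_hz and magnitudes must have the same length")
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


if __name__ == "__main__":
    # A spectrum with more energy in the upper bin pulls the centroid upward.
    print(spectral_centroid([100.0, 300.0], [1.0, 3.0]))
```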
P11-6 Assessment of the Quality of Digital Audio Reproduction Devices by Panels of Listeners of Different Professional Profiles—Piotr Kleczkowski, AGH University of Science and Technology - Krakow, Poland; Marek Pluta, Academy of Music in Cracow - Krakow, Poland; Szymon Piotrowski, AGH University of Science and Technology - Krakow, Poland
A series of experiments has been conducted, where different panels of listeners assessed the quality of some selected digital audio reproduction devices. The quality of the devices covered a very wide range from budget MP3 players through to a professional high resolution digital-to-analog conversion system. The main goal of this research was to investigate whether panels of listeners of different professional profiles are able to give different evaluations of the sound quality. Some interesting results have been obtained.
Convention Paper 7234 (Purchase now)
Audio Content Management
Saturday, October 6, 3:30 pm — 5:30 pm
Chair: Rob Maher, Montana State University - Bozeman, MT, USA
P12-1 Music Structure Segmentation Using the Azimugram in Conjunction with Principal Component Analysis—Dan Barry, Mikel Gainza, Eugene Coyle, Dublin Institute of Technology - Dublin, Ireland
A novel method to segment stereo music recordings into formal musical structures such as verses and choruses is presented. The method performs dimensional reduction on a time-azimuth representation of audio, which results in a set of time activation sequences, each of which corresponds to a repeating structural segment. This is based on the assumption that each segment type such as verse or chorus has a unique energy distribution across the stereo field. It can be shown that these unique energy distributions along with their time activation sequences are the latent principal components of the time-azimuth representation. It can be shown that each time activation sequence represents a structural segment such as a verse or chorus.
Convention Paper 7235 (Purchase now)
P12-2 Using the Semantic Web for Enhanced Audio Experiences—Yves Raimond, Mark Sandler, Queen Mary, University of London - London, UK
In this paper we give a quick overview of some key Semantic Web technologies, allowing us to overcome the limitations of the current web of documents to create a machine-processable web of data, where information is accessible by automated means. We then detail a framework for dealing with audio-related information on the Semantic Web: the Music Ontology. We describe some examples of how this ontology has been used to link together heterogeneous data sets, dealing with editorial, cultural or acoustic data. Finally, we explain a methodology to embed such knowledge into audio applications (from digital jukeboxes and digital archives to audio editors and sequencers), along with concrete examples and implementations.
Convention Paper 7236 (Purchase now)
P12-3 Content Management Using Native XML and XML-Enabled Database Systems in Conjunction with XML Metadata Exchange Standards—Nicolas Sincaglia, DePaul University - Chicago, IL, USA
The digital entertainment industry has developed communication standards to support the distribution of digital content using XML technology. Recipients of these data communications are challenged when transforming and storing the hierarchical XML data structures into more traditional relational database structures for content management purposes. Native XML and XML-enabled database systems provide possible solutions to many of these challenges. This paper will consider several data modeling design options and evaluate the suitability of these alternatives for content data management.
Convention Paper 7237 (Purchase now)
P12-4 Music Information Retrieval in Broadcasting: Some Visual Applications—Andrew Mason, Michael Evans, Alia Sheikh, British Broadcasting Corporation Research - Tadworth, Surrey, UK
The academic research field of music information retrieval is expanding as rapidly as the MP3 collection of a stereotypical teenager. This could be no coincidence: the benefit of an automated genre classifier increases when the music collection contains several thousand tracks. Of course, there are other applications of music information retrieval. Here we highlight a few that make use of a simple, visual, representation of an audio signal, based on three easy-to-calculate audio features. The applications range from simple navigation around consumer recordings of broadcasts, to a music video production planning tool, to a short term "Listen Again" eye-catching display.
Convention Paper 7238 (Purchase now)
Acoustic Modeling, Part 1
Sunday, October 7, 8:30 am — 12:30 pm
Chair: Kurt Graffy, Arup Acoustics - San Francisco, CA, USA
P13-1 Addressing the Discrepancy Between Measured and Modeled Impulse Responses for Small Rooms—Zhixin Chen, Robert Maher, Montana State University - Bozeman, MT, USA
Simple computer modeling of impulse responses for small rectangular rooms is typically based on the image source method, which results in an impulse response with very high time resolution. The image source method is easy to implement, but simulated impulse responses are often a poor match to measured impulse responses because descriptions of sources, receivers, and room surfaces are often too idealized to match real measurement conditions. In this paper a more elaborate room impulse response computer modeling technique is developed by incorporating measured polar responses of the loudspeaker, measured polar responses of the microphone, and measured reflection coefficients of room surfaces into the basic image source method. Results show that, compared with the basic image source method, the modeled room impulse response using this method is a better match to the measured room impulse response, as predicted by standard acoustical theories and principles.
Convention Paper 7239 (Purchase now)
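For readers unfamiliar with the basic image source method named in the P13-1 abstract above, the following is a minimal sketch of its first-order, idealized form: an omnidirectional source, an omnidirectional receiver, and one frequency-independent reflection coefficient for all six walls. Those idealizations are exactly what the paper sets out to replace with measured data; the speed of sound, the coefficient value, and all function names here are assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def first_order_images(source, room_dims):
    """Mirror the source across each of the six walls of a rectangular room.

    The room occupies [0, Lx] x [0, Ly] x [0, Lz]; `source` is (x, y, z).
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room_dims[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]
            images.append(tuple(image))
    return images


def arrivals(source, receiver, room_dims, reflection_coeff=0.9):
    """Return (delay_seconds, amplitude) pairs for the direct path and the
    six first-order reflections, sorted by delay.

    Amplitude follows 1/distance spreading times the wall coefficient.
    """
    paths = [(source, 1.0)]
    paths += [(img, reflection_coeff)
              for img in first_order_images(source, room_dims)]
    result = []
    for position, gain in paths:
        distance = math.dist(position, receiver)
        result.append((distance / SPEED_OF_SOUND, gain / distance))
    return sorted(result)


if __name__ == "__main__":
    for delay, amp in arrivals((1.0, 2.0, 1.5), (3.0, 2.0, 1.5), (4.0, 5.0, 3.0)):
        print(f"{delay * 1000:.2f} ms  amplitude {amp:.3f}")
```

Higher reflection orders repeat the mirroring on the images themselves; the refinements described in the abstract would replace the constant gains with direction-dependent source, receiver, and surface responses.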
P13-2 Comparison of Simulated and Measured HRTFs: FDTD Simulation Using MRI Head Data—Parham Mokhtari, Hironori Takemoto, Ryouichi Nishimura, Hiroaki Kato, NICT/ATR - Kyoto, Japan
This paper presents a comparison of computer-simulated versus acoustically measured, front-hemisphere head related transfer functions (HRTFs) of two human subjects. Simulations were carried out with a 3-D finite difference time domain (FDTD) method, using magnetic resonance imaging (MRI) data of each subject’s head. A spectral distortion measure was used to quantify the similarity between pairs of HRTFs. Despite various causes of mismatch including a different head-to-source distance, the simulation results agreed considerably with the acoustic measurements, particularly in the major peaks and notches of the front ipsilateral HRTFs. Averaged over 133 source locations and both ears, mean spectral distortions for the two subjects were 4.7 dB and 3.8 dB respectively.
Convention Paper 7240 (Purchase now)
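The P13-2 abstract above reports mean spectral distortions in dB but does not define the measure. One widely used form, assumed here purely for illustration, is the root-mean-square difference between two magnitude responses expressed in dB across frequency bins; the function name is hypothetical.

```python
import math


def spectral_distortion_db(mag_a, mag_b):
    """RMS difference in dB between two magnitude responses.

    mag_a, mag_b -- positive linear magnitudes sampled at the same frequencies.
    This is an assumed, commonly used definition, not necessarily the one
    used in the paper.
    """
    if len(mag_a) != len(mag_b) or not mag_a:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs_db = [20.0 * math.log10(a / b) for a, b in zip(mag_a, mag_b)]
    return math.sqrt(sum(d * d for d in diffs_db) / len(diffs_db))


if __name__ == "__main__":
    # A uniform factor of two between the responses gives about 6.02 dB.
    print(spectral_distortion_db([2.0, 4.0], [1.0, 2.0]))
```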
P13-3 Scattering Uniformity Measurements and First Reflection Analysis in a Large Nonanechoic Environment—Lorenzo Rizzi, LAE - Laboratorio di Acustica ed Elettroacustica - Parma, Italy; Angelo Farina, Università di Parma - Parma, Italy; Paolo Galaverna, Genesis Acoustic Workshop - Parma, Italy; Paolo Martignon, Andrea Rosati, Lorenzo Conti, LAE - Laboratorio di Acustica ed Elettroacustica - Parma, Italy
A new campaign of experiments was run on the floor of a large room to obtain a long enough anechoic time window. This permitted us to study the first reflection from the panels themselves and their diffusion uniformity. The results are discussed, comparing them with past measurements and with the ones from a simplified set-up with a smaller geometry. Some key matters to measurement are discussed; they were proposed in a recent comment letter posted to the specific AES-4id document committee on its reaffirmation. An analysis of the single reflection and reflectivity data was undertaken to investigate the behavior of a perforated panel and the measurement set-up overall potential.
Convention Paper 7241 (Purchase now)
P13-4 A Note On the Implementation of Directive Sources in Discrete Time-Domain Dispersive Meshes for Room Acoustic Simulation—José Escolano, University of Jaén - Jaén, Spain; José J. López, Technical University of Valencia - Valencia, Spain; Basilio Pueo, University of Alicante - Alicante, Spain; Maximo Cobos, Technical University of Valencia - Valencia, Spain
The use of wave methods to simulate room impulse responses provides the most accurate solutions. Recently, a method to incorporate directive sources in discrete-time methods, such as finite differences and the digital waveguide mesh, has been proposed. It is based on the proper combination of monopoles in order to achieve the desired directive pattern in far field conditions. However, this method is used without taking into account the inherent dispersion in most of these discrete-time paradigms. This paper analyzes, through several case studies, how strongly dispersion affects the resulting directivity.
Convention Paper 7242 (Purchase now)
P13-5 Rendering of Virtual Sound Sources with Arbitrary Directivity in Higher Order Ambisonics—Jens Ahrens, Sascha Spors, Technical University of Berlin - Berlin, Germany
Higher order Ambisonics (HOA) is a spatial audio reproduction technique aiming at physically synthesizing a desired sound field. It is based on the expansion of sound fields into orthogonal basis functions (spatial harmonics). In this paper we present an approach to the two-dimensional reproduction of virtual sound sources at arbitrary positions having arbitrary directivities. The approach is based on the description of the directional properties of a source by a set of circular harmonics. Consequences of truncation of the circular harmonics expansion and spatial sampling as occurring in typical installations of HOA systems due to the employment of a finite number of loudspeakers are discussed. We illustrate our descriptions with simulated reproduction results.
Convention Paper 7243 (Purchase now)
P13-6 The Ill-Conditioning Problem in Sound Field Reconstruction—Filippo Fazi, Philip Nelson, University of Southampton - Southampton, UK
A method for the analysis and reconstruction of a three dimensional sound field using an array of microphones and an array of loudspeakers is presented. The criterion used to process the microphone signals and obtain the loudspeaker signals is based on the minimization of the least-square error between the reconstructed and the original sound field. This approach requires the formulation of an inverse problem that can lead to unstable solutions due to the ill-conditioning of the propagation matrix. The concepts of generalized Fourier transform and singular value decomposition are introduced and applied to the solution of the inverse problem in order to obtain stable solutions and to provide a clear understanding of the regularization method.
Convention Paper 7244 (Purchase now)
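The P13-6 abstract above links ill-conditioning, the singular value decomposition, and regularization. The abstract does not name the regularization scheme, so the sketch below assumes Tikhonov regularization as an illustration: in the SVD basis, the plain inverse gain 1/s of each singular value s is replaced by s / (s^2 + beta). The function name and the parameter `beta` are assumptions.

```python
import math


def tikhonov_gains(singular_values, beta):
    """Per-mode inverse gains under Tikhonov regularization.

    With beta = 0 this reduces to the plain inverse 1/s, which blows up for
    small singular values. With beta > 0 every gain is bounded by
    1 / (2 * sqrt(beta)).
    """
    return [s / (s * s + beta) for s in singular_values]


if __name__ == "__main__":
    sigmas = [1.0, 1e-3, 1e-6]  # an ill-conditioned spread of singular values
    beta = 1e-4
    for s, g in zip(sigmas, tikhonov_gains(sigmas, beta)):
        print(f"sigma={s:g}  plain inverse={1.0 / s:g}  regularized={g:g}")
    print("bound:", 1.0 / (2.0 * math.sqrt(beta)))
```

The weakly radiating modes, which would otherwise demand very large loudspeaker signals, are attenuated instead of amplified, which is what makes the solution stable.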
P13-7 Analysis of Edge Boundary Conditions on Multiactuator Panels—Basilio Pueo, University of Alicante - Alicante, Spain; José Escolano, University of Jaén - Jaén, Spain; José J. López, Technical University of Valencia - Valencia, Spain; Sergio Bleda, University of Alicante - Alicante, Spain
Distributed mode loudspeakers consist of a flat panel of a light and stiff material to which a mechanical exciter is attached, creating bending waves that are then radiated as sound fields. They can be used to build arrays for wave field synthesis reproduction by using multiple exciters on a single vibrating surface. The exciter interaction with the panel, the panel material, and the panel contour clamp conditions are some of the critical points that need to be evaluated and improved. In this paper we address how the edge boundary conditions influence the quality of the emitted wave field. The measurements of the wave fields have been interpreted in the wavenumber domain, where the source radiation is decomposed into plane waves for arbitrary angles of incidence. Results show how the wave field is degraded when the boundary conditions are modified.
Convention Paper 7245 (Purchase now)
P13-8 Acoustics in Rock and Pop Music Halls—Niels W. Adelman-Larsen, Flex Acoustics - Lyngby, Denmark; Eric Thompson, Anders C. Gade, Technical University of Denmark - Lyngby, Denmark
The existing body of literature regarding the acoustic design of concert halls has focused almost exclusively on classical music, although there are many more performances of rhythmic music, including rock and pop. Objective measurements were made of the acoustics of twenty rock music venues in Denmark, and a questionnaire was used in a subjective assessment of those venues with professional rock musicians and sound engineers. Correlations between the objective and subjective results lead, among other things, to a recommendation for reverberation time as a function of hall volume. Since the bass frequency sounds are typically highly amplified, they play an important role in the subjective ratings, and the 63-Hz band must be included in objective measurements and recommendations.
Convention Paper 7246 (Purchase now)
Signal Processing Applied To Music
Sunday, October 7, 9:00 am — 12:00 pm
Chair: John Strawn, S Systems - Larkspur, CA, USA
P14-1 Interactive Beat Tracking for Assisted Annotation of Percussive Music—Michael Evans, British Broadcasting Corporation - Tadworth, Surrey, UK
A practical, interactive beat-tracking algorithm for percussive music is described. Regularly-spaced note onsets are determined by energy-based analysis and users can then explore candidate beat periods and phases as the overall rhythm pattern develops throughout the track. This assisted approach can allow more flexible rhythmic analysis than purely automatic algorithms. An open-source software package based on the algorithm has been developed, along with several practical applications to allow more effective annotation, segmentation, and analysis of music.
Convention Paper 7247 (Purchase now)
P14-2 Identification of Partials in Polyphonic Mixtures Based on Temporal Envelope Similarity—David Gunawan, D. Sen, The University of New South Wales - Sydney, NSW, Australia
In musical instrument sound source separation, the temporal envelopes of the partials are correlated due to the physical constraints of the instruments. With this assumption, separation algorithms then exploit the similarities between the partial envelopes in order to group partials into sources. In this paper we quantitatively investigate the partial envelope similarities of a large database of instrument samples and develop weighting functions in order to model the similarities. These model partials then provide a reference to identify similar partials of the same source. The partial identification algorithm is evaluated in the separation of polyphonic mixtures and is shown to successfully discriminate between partials from different sources.
Convention Paper 7248 (Purchase now)
P14-3 Structural Decomposition of Recorded Vocal Performances and Its Application to Intelligent Audio Editing—György Fazekas, Mark Sandler, Queen Mary University of London - London, UK
In an intelligent editing environment, the semantic music structure can be used as beneficial assistance during the postproduction process. In this paper we propose a new approach to extract both low and high level hierarchical structure from vocal tracks of multitrack master recordings. Contrary to most segmentation methods for polyphonic audio, we utilize extra information available when analyzing a single audio track. A sequence of symbols is derived using a hierarchical decomposition method involving onset detection, pitch tracking, and timbre modeling to capture phonetic similarity. Results show that the applied model captures the similarity of short voice segments well.
Convention Paper 7249 (Purchase now)
P14-4 Vibrato Experiments with Bassoon Sounds by Means of the Digital Pulse Forming Synthesis and Analysis Framework—Michael Oehler, Institute for Music and Drama - Hanover, Germany; Christoph Reuter, University of Cologne - Cologne, Germany
The perceived naturalness of real and synthesized bassoon vibrato sounds is investigated in a listening test. The stimuli were generated by means of a currently developed synthesis and analysis framework for wind instrument sounds, based on the pulse forming theory. The framework allows controlling amplitude and frequency parameters at many different stages during the sound production process. Applying an ANOVA and Tukey HSD test it could be shown that timbre modulation (a combined pulse width and cycle duration modulation) is an important factor for the perceived naturalness of bassoon vibrato sounds. Obtained results may be useful for sound synthesis as well as in the field of timbre research.
Convention Paper 7250 (Purchase now)
P14-5 A High Level Musical Score Alignment Technique Based on Fuzzy Logic and DTW—Bruno Gagnon, Roch Lefebvre, Charles-Antoine Brunet, University of Sherbrooke - Sherbrooke, Quebec, Canada
This paper presents a method to align musical notes extracted from an audio signal with the notes of the musical score being played. Building on conventional alignment systems using Dynamic Time Warping (DTW), the proposed method uses fuzzy logic to create the similarity matrix used by DTW. Like a musician following a score, the fuzzy logic system uses high level information as its inputs, such as note identity, note duration, and local rhythm. Using high level information instead of frame by frame information reduces substantially the size of the DTW similarity matrix and thus reduces significantly the complexity to find the best path for alignment. Finally, the proposed method can automatically track where a musician starts and stops playing in a musical score.
Convention Paper 7251 (Purchase now)
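The P14-5 abstract above builds on conventional Dynamic Time Warping. As a minimal sketch of that underlying step only, the following aligns two note sequences with textbook DTW. The fuzzy-logic similarity described in the abstract is not reproduced: the default `dist` (absolute difference of MIDI note numbers) is a stand-in assumption, and the function name is illustrative.

```python
def dtw(seq_a, seq_b, dist=lambda a, b: abs(a - b)):
    """Classic DTW between two non-empty sequences.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    aligning seq_a[i] with seq_b[j] from start to end.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning the first i and j items.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # seq_a item repeated
                                 cost[i][j - 1],      # seq_b item repeated
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end along the cheapest predecessors.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    score = [60, 62, 64, 65]           # notes in the written score
    played = [60, 60, 62, 64, 64, 65]  # performance with repeated detections
    total, path = dtw(score, played)
    print(total, path)
```

The cost table has one cell per pair of items, which is why working on note-level events instead of audio frames, as the abstract proposes, shrinks the matrix so substantially.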
P14-6 Audio Synthesis and Visualization with Flash CS3 and ActionScript 3.0—Jordan Kolasinski, New York University - New York, NY, USA
This paper explains the methods and techniques used to build a fully functional audio synthesizer and FFT-based audio visualizer within the newest version of Flash CS3. Audio synthesis and visualization have not been possible to achieve in previous versions of Flash, but two new elements of ActionScript 3.0—the Byte Array and Compute Spectrum function—make it possible even though it is not included in Flash’s codebase. Since Flash is present on 99 percent of the world’s computers, this opens many new opportunities for audio on the web.
Convention Paper 7252 (Purchase now)
Acoustic Modeling, Part 2
Sunday, October 7, 1:00 pm — 5:00 pm
Chair: Geoff Martin, Bang & Olufsen a/s - Struer, Denmark
P15-1 Improvement of One-Dimensional Loudspeaker Models—Juha Backman, Nokia Corporation - Espoo, Finland
Simple one-dimensional waveguide models of loudspeaker enclosures describe well enclosures with simple interior geometry, but their accuracy is limited if used with more complex internal structures. The paper compares the results from one-dimensional models to FEM models for some simplified enclosure geometries found in typical designs. Based on these results it is apparent that one-dimensional models need to be refined to take some three-dimensional aspects of the sound field in close proximity of drivers into account. Approximations matched to FEM solutions are presented for enclosure impedance as seen by the driver and for the end correction of ports, taking both edge rounding and distance to the back wall into account.
Convention Paper 7253 (Purchase now)
P15-2 Simulating the Directivity Behavior of Loudspeakers with Crossover Filters—Stefan Feistel, Wolfgang Ahnert, Ahnert Feistel Media Group - Berlin, Germany; Charles Hughes, Excelsior Audio Design & Services, LLC - Gastonia, NC, USA; Bruce Olson, Olson Sound Design - Brooklyn Park, MN, USA
In previous publications the description of loudspeakers was introduced based on high-resolution data, comprising, most importantly, complex directivity data for individual drivers as well as crossover filters. This paper shows how this concept can be exploited to predict the directivity balloon of multi-way loudspeakers depending on the chosen crossover filters. Simple filter settings such as gain and delay and more complex IIR filters are utilized for loudspeaker measurements and simulations, and the results are compared and discussed. In addition, advice is given on how measurements should be made, particularly regarding active and passive loudspeaker systems.
Convention Paper 7254 (Purchase now)
P15-3 Intrinsic Membrane Friction and Onset of Chaos in an Electrodynamic Loudspeaker—Danijel Djurek, Alessandro Volta Applied Ceramics (AVAC) - Zagreb, Croatia; Ivan Djurek, Antonio Petosic, University of Zagreb - Zagreb, Croatia
The chaotic state observed in an electrodynamic loudspeaker results from a nonlinear equation of motion and is driven by a harmonic restoring term assisted by intrinsic membrane friction. This friction is not a smooth function of displacements but the sum of local hysteretic surface fluctuations, which give rise to its high differentiability in displacements, being responsible for the onset of Feigenbaum bifurcation cascades and chaos. When an external small perturbation of low differentiability is added to the friction, another type of chaotic state appears, and this state involves a period-3 window evidenced for the first time in these experiments.
Convention Paper 7255 (Purchase now)
P15-4 Damping of an Electrodynamic Loudspeaker by Air Viscosity and Turbulence—Ivan Djurek, Antonio Petosic, University of Zagreb - Zagreb, Croatia; Danijel Djurek, Alessandro Volta Applied Ceramics (AVAC) - Zagreb, Croatia
Damping of an electrodynamic loudspeaker has been studied with respect to air turbulence and viscosity. Both quantities were evaluated as a difference of damping friction measured in air and in an evacuated space. The viscous friction dominates for small driving currents (< 10 mA) and is masked by turbulence for currents extending up to 100 mA. The turbulence contribution was evaluated as a difference of air damping friction at 1.0 and 0.1 bars, and it was studied for selected driving frequencies. Hot-wire anemometry was adopted to study convection from the loudspeaker, and the spectra obtained were compared to the measured turbulence friction in order to trace the perturbation of the emitted signal by turbulent motion.
Convention Paper 7256 (Purchase now)
P15-5 Energetic Sound Field Analysis of Stereo and Multichannel Loudspeaker Reproduction—Juha Merimaa, Creative Advanced Technology Center - Scotts Valley, CA, USA
Energetic sound field analysis has been previously applied to encoding the spatial properties of multichannel signals. This paper contributes to the understanding of how stereo or multichannel loudspeaker signals transform into energetic sound field quantities. Expressions for the active intensity, energy density, and energetic diffuseness estimate are derived as a function of signal magnitudes, cross-correlations, and loudspeaker directions. It is shown that the active intensity vector can be expressed in terms of the Gerzon velocity and energy vectors, and its direction can be related to the tangent law of amplitude panning. Furthermore, several cases are identified where the energetic analysis data may not adequately represent the spatial properties of the original signals.
Convention Paper 7257 (Purchase now)
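The link between the Gerzon velocity vector and the tangent law mentioned in the P15-5 abstract can be checked numerically. The following is a minimal sketch, not the paper's derivation: it assumes real, in-phase (coherent) loudspeaker gains, a 2-D layout with angles measured from the front, and function names (`gerzon_vectors`, `tangent_law_angle`, `direction_deg`) that are purely illustrative.

```python
import math

def gerzon_vectors(gains, angles_deg):
    """Return (velocity_vector, energy_vector) as (x, y) tuples.

    x points to the front, y to the left. Gains are assumed real and in phase.
    """
    units = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
             for a in angles_deg]
    g_sum = sum(gains)                    # normalizer for the velocity vector
    e_sum = sum(g * g for g in gains)     # normalizer for the energy vector
    r_v = (sum(g * u[0] for g, u in zip(gains, units)) / g_sum,
           sum(g * u[1] for g, u in zip(gains, units)) / g_sum)
    r_e = (sum(g * g * u[0] for g, u in zip(gains, units)) / e_sum,
           sum(g * g * u[1] for g, u in zip(gains, units)) / e_sum)
    return r_v, r_e

def tangent_law_angle(g_left, g_right, base_angle_deg):
    """Panning angle from the tangent law for loudspeakers at +/- base_angle_deg."""
    ratio = (g_left - g_right) / (g_left + g_right)
    return math.degrees(math.atan(ratio * math.tan(math.radians(base_angle_deg))))

def direction_deg(vec):
    """Direction of a 2-D vector in degrees (positive to the left)."""
    return math.degrees(math.atan2(vec[1], vec[0]))

# Stereo pair at +/-30 degrees, left channel 6 dB louder than the right.
r_v, r_e = gerzon_vectors([1.0, 0.5], [30.0, -30.0])
print(direction_deg(r_v), tangent_law_angle(1.0, 0.5, 30.0))
```

Under these assumptions the velocity-vector direction and the tangent-law angle coincide, while the energy vector points further toward the louder loudspeaker.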
P15-6 A New Methodology for the Acoustic Design of Compression Driver Phase-Plugs with Concentric Annular Slots—Mark Dodd, Celestion International Ltd. - Ipswich, UK, and GP Acoustics (UK) Ltd., Maidstone, UK; Jack Oclee-Brown, GP Acoustics (UK) Ltd. - Maidstone, UK, and University of Southampton, UK
In compression drivers a large membrane is coupled to a small horn throat resulting in high efficiency. For this efficiency to be maintained to high frequencies the volume of the resulting cavity, between horn and membrane, must be kept small. Early workers devised a phase-plug to fill most of the cavity volume and connect membrane to horn throat with concentric annular channels of equal length to avoid destructive interference. Later work, representing the cavity as a flat disc, describes a method for calculating the positions and areas of these annular channels where they exit the cavity, giving least modal excitation, thus avoiding undesirable response irregularities. In this paper the result of applying both the equal path-length and modal approaches to a phase-plug with concentric annular channels coupled to a cavity shaped as a flat disc is further explored. The assumption that the cavity may be represented as a flat disc is investigated by comparing its behavior to that of an axially vibrating rigid spherical cap radiating into a curved cavity. It is demonstrated that channel arrangements derived for a flat disc are not optimum for use in a typical compression driver with a curved cavity. A new methodology for calculating the channel positions and areas giving least modal excitation is described. The impact of the new approach will be illustrated with a practical design.
Convention Paper 7258 (Purchase now)
P15-7 A Computational Model for Optimizing Microphone Placement on Headset Mounted Arrays—Philip Gillett, Marty Johnson, Jamie Carneal, Virginia Tech Vibration and Acoustics Laboratories - Blacksburg, VA, USA
Microphone arrays mounted on headsets provide a platform for performing transparent hearing, source localization, focused listening, and enhanced communications while passively protecting the hearing of the wearer. However, it is not trivial to determine the microphone positions that optimize these capabilities, as no analytical solution exists to model acoustical diffraction around both the human and the headset. As an alternative to an iterative experimental approach for optimization, an equivalent source model of the human torso, head, and headset is developed. Results show that the model closely matches the microphone responses measured from a headset placed on a KEMAR mannequin in an anechoic environment.
Convention Paper 7259 (Purchase now)
P15-8 A Simple Simulation of Acoustic Radiation from a Vibrating Object—Cynthia Bruyns Maxwell, University of California at Berkeley - Berkeley, CA, USA
The goal of this paper is to explore the role that fluid coupling plays in the vibration of an object, and to investigate how one can model such effects. We want to determine whether the effects of coupling to the medium surrounding a vibrating object are significant enough to warrant including them in our current instrument modeling software. For example, we wish to examine how the resonant frequencies of an object change due to the presence of a surrounding medium. We also want to examine the different methods of modeling acoustic radiation in interior and exterior domains. Using a simple 2-D beam as an example, this investigation shows that coupling with dense fluids, such as water, dramatically changes the resonant frequencies of the system. We also show that, using a simple finite element model and modal analysis, we can simulate the acoustic radiation profile and determine a realistic sound pressure level at arbitrary points in the domain in real time.
Convention Paper 7260 (Purchase now)
Signal Processing for Room Correction
Sunday, October 7, 1:00 pm — 4:00 pm
Chair: Rhonda Wilson, Meridian Audio - UK
P16-1 Sampling the Energy in a 3-D Sound Field—Jan Abildgaard Pedersen, Lyngdorf Audio - Skive, Denmark
The energy in the 3-D sound field in a room holds crucial information needed when designing a room correction system. This paper shows how sound pressure measured in at least 4 randomly selected positions scattered across the entire listening room provides a robust estimate of the energy in the 3-D sound field. The reproducibility was investigated for different numbers of random positions, which led to an assessment of the robustness of a room correction system based on different numbers of random microphone positions.
Convention Paper 7261 (Purchase now)
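The idea in the P16-1 abstract of estimating sound field energy from a handful of random microphone positions can be illustrated with a plain power average. This is only a sketch under stated assumptions: the abstract does not give the paper's estimator, so averaging squared pressure magnitude per frequency bin is a stand-in, and the function name `energy_average_db` and the input layout are hypothetical.

```python
import math

def energy_average_db(pressures_by_position):
    """Power-average pressure spectra over microphone positions.

    pressures_by_position: one list of (complex or real) pressure values per
    position, all of equal length (one value per frequency bin).
    Returns the averaged level per bin in dB relative to unit pressure.
    """
    n_pos = len(pressures_by_position)
    n_bins = len(pressures_by_position[0])
    levels = []
    for k in range(n_bins):
        mean_sq = sum(abs(p[k]) ** 2 for p in pressures_by_position) / n_pos
        # Floor avoids log10(0) for an all-zero bin.
        levels.append(10.0 * math.log10(max(mean_sq, 1e-30)))
    return levels

# Four positions, two frequency bins (toy numbers, not measured data).
spectra = [[1.0, 2.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
print(energy_average_db(spectra))
```

Averaging power rather than complex pressure keeps a deep dip at one position from cancelling the contribution of the others, which is one reason a small number of scattered positions can still give a stable estimate.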
P16-2 Multi-Source Room Equalization: Reducing Room Resonances—John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada
Room equalization traditionally has been implemented as a single correction filter applied to all the channels in the audio system. Having more sources reproducing the same monophonic low-frequency signal in a room has the benefit of not exciting certain room modes, but it does not remove other strong room resonances. This paper explores the concept of using some of the loudspeakers as sources, while others are effectively sinks of acoustic energy, so that as acoustic signals cross the listening area, they flow preferentially from sources to sinks. This approach resists the buildup of room resonances, so that modal peaks and antimodal dips are reduced in level, leaving a more uniform low-frequency response. Impulse responses in several real rooms were measured with a number of loudspeaker positions and a small collection of observer positions. These were used to study the effect of source and sink assignment, and the derivation of an appropriate signal delay and response to optimize the room behavior. Particular studies are made of a common 5.0 loudspeaker setup, and some stereo configurations with two or more standard subwoofers. A measurable room parameter is defined that quantifies the deleterious effects of low-frequency room resonances, supported by a specific room equalization philosophy. Results are encouraging but not striking. Signal modification needs to be considered.
Convention Paper 7262 (Purchase now)
P16-3 A Low Complexity Perceptually Tuned Room Correction System—James Johnston, Serge Smirnov, Microsoft Corporation - Redmond, WA, USA
In many listening situations using loudspeakers, the actualities of room arrangements and the acoustics of the listening space combine to create a situation where the audio signal is unsatisfactorily rendered from the listener’s position. This is often true not only for computer-monitor situations, but also for home theater or surround-sound situations in which some loudspeakers may be too close to or too far from the listener, in which some loudspeakers (center, surrounds) may be different from the main loudspeakers, or in which room peculiarities introduce problems in imaging or timbre coloration. In this paper we explain a room-correction algorithm that restores imaging characteristics, equalizes the first-attack frequency response of the loudspeakers, and substantially improves the listeners’ experience by using relatively simple render-side DSP in combination with a sophisticated room analysis engine that is expressly designed to capture room characteristics that are important for stereo imaging and timbre correction.
Convention Paper 7263 (Purchase now)
P16-4 Variable-Octave Complex Smoothing—Sunil Bharitkar, Audyssey Labs - Los Angeles, CA, USA, and University of Southern California, Los Angeles, CA, USA
In this paper we present a technique for processing room responses using a variable-octave complex-domain (viz., time-domain) smoother. Traditional techniques for room response processing, for equalization and other applications such as auralization, have focused on constant-octave (e.g., 1/3 octave) magnitude-domain smoothing of these room responses. However, recent research has shown that room responses need to be processed with a high resolution, especially in the low-frequency region, to characterize the discrete room modal structure, as these modes are distinctly audible. Coupling this with the need to reduce the computational requirements associated with filters obtained from undesirably over-fitting the high-frequency part of the room response with such a high-Q complex-domain smoother, and with the knowledge that the auditory filters have wider bandwidth (viz., lower resolution) in the high-frequency part of human hearing, the present paper proposes variable-octave complex-domain smoothing. Thus this paper incorporates, simultaneously, the requirement of high resolution at low frequencies as well as the requirement of relatively lower-resolution fitting of the room response in the high-frequency part through a perceptually motivated approach.
Convention Paper 7264 (Purchase now)
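The general idea described in the P16-4 abstract, smoothing the complex response with a window whose fractional-octave width grows with frequency, might be sketched as follows. This is not the author's algorithm: the rectangular window, the 500 Hz split, and the 1/12 and 1/3 octave widths are all assumptions chosen for illustration, as are the names `octave_fraction` and `complex_smooth`.

```python
def octave_fraction(freq_hz, split_hz=500.0, low_frac=1.0 / 12.0, high_frac=1.0 / 3.0):
    """Hypothetical width rule: narrow window below split_hz, wider above."""
    return low_frac if freq_hz < split_hz else high_frac

def complex_smooth(freqs_hz, response, fraction_rule=octave_fraction):
    """Average the complex response over a frequency-dependent octave band.

    freqs_hz: positive bin frequencies; response: complex values per bin.
    Uses a rectangular window and a simple O(N^2) scan for clarity.
    """
    smoothed = []
    for f0 in freqs_hz:
        half = fraction_rule(f0) / 2.0
        lo, hi = f0 * 2.0 ** (-half), f0 * 2.0 ** half
        window = [h for f, h in zip(freqs_hz, response) if lo <= f <= hi]
        # The window always contains the bin at f0 itself, so it is never empty.
        smoothed.append(sum(window) / len(window))
    return smoothed

# Toy response: sign flips on every 10 Hz bin from 10 Hz to 2 kHz.
freqs = [10.0 * i for i in range(1, 201)]
resp = [complex((-1) ** i, 0.0) for i in range(1, 201)]
out = complex_smooth(freqs, resp)
print(abs(out[9]), abs(out[99]))  # 100 Hz bin vs. 1 kHz bin
```

Because the average is taken on complex values rather than magnitudes, rapidly rotating phase within the window cancels, so the wide high-frequency window suppresses fine structure while the narrow low-frequency window leaves individual bins (and hence modal detail) intact.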
P16-5 Multichannel Inverse Filtering With Minimal-Phase Regularization—Scott Norcross, Communications Research Centre - Ottawa, Ontario, Canada; Martin Bouchard, University of Ottawa - Ottawa, Ontario, Canada
Inverse filtering methods are used in numerous audio applications such as loudspeaker and room correction. Regularization is commonly used to limit the amount of the original response that the inverse filter attempts to correct in an effort to reduce audible artifacts. It has been shown that the amount and type of regularization used in the inversion process must be carefully chosen so that it does not add additional artifacts that can degrade the audio signal. A method of designing a target function based on the regularization magnitude was introduced by the authors, where a minimal-phase target function could be used to reduce any pre-response caused by the regularization. In the current paper a multichannel inverse filtering scheme is introduced and explored where the phase of the regularization itself can be chosen to reduce the audibility of the added regularization. In the single-channel case, this approach is shown to be equivalent to the technique that was previously introduced by the authors.
Convention Paper 7265 (Purchase now)
P16-6 An In-flight Low-Latency Acoustic Feedback Cancellation Algorithm—Nermin Osmanovic, Consultant - Seattle, WA, USA; Victor E. Clarke, Erich Velandia, Gables Engineering - Coral Gables, FL, USA
Acoustic feedback is a common problem in high gain systems; it is very unpredictable and unpleasant to the ear. Cockpit communication systems on aircraft may suffer from acoustic feedback between a pilot’s boomset microphone and high gain cockpit loudspeaker. The acoustic feedback tone can compromise flight safety by temporarily blocking communication between the pilot and ground control. This paper presents the design of an in-flight low latency (<6 ms) digital audio processing system that automatically detects and removes acoustic feedback tones from the microphone to loudspeaker audio path. We present information about the acoustic feedback cancellation algorithm including the calculation of feedback existence probability, as implemented in an aircraft cockpit communication system.
Convention Paper 7266 (Purchase now)
Signal Processing Applied to Music
Sunday, October 7, 2:30 pm — 4:00 pm
P17-1 Toward Textual Annotation of Rhythmic Style in Electronic Dance Music—Kurt Jacobson, Matthew Davies, Mark Sandler, Queen Mary University of London - London, UK
Music information retrieval encompasses a complex and diverse set of problems. Some recent work has focused on automatic textual annotation of audio data, paralleling work in image retrieval. Here we take a narrower approach to the automatic textual annotation of music signals and focus on rhythmic style. Training data for rhythmic styles are derived from simple, precisely labeled drum loops intended for content creation. These loops are already textually annotated with the rhythmic style they represent. The training loops are then compared against a database of music content to apply textual annotations of rhythmic style to unheard music signals. Three distinct methods of rhythmic analysis are explored. These methods are tested on a small collection of electronic dance music resulting in a labeling accuracy of 73 percent.
Convention Paper 7268 (Purchase now)
P17-2 Key-Independent Classification of Harmonic Change in Musical Audio—Ernest Li, Juan Pablo Bello, New York University - New York, NY, USA
We introduce a novel method for describing the harmonic development of a musical signal by using only low-level audio features. Our approach uses Euclidean and phase distances in a tonal centroid space. Both measurements are taken between successive chroma partitions of a harmonically segmented signal, for each of three harmonic circles representing fifths, major thirds, and minor thirds. The resulting feature vector can be used to quantify a string of successive chord changes according to changes in chord quality and movement of the chordal root. We demonstrate that our feature set can provide both unique classification and accurate identification of harmonic changes, while resisting variations in orchestration and key.
Convention Paper 7269 (Purchase now)
P17-3 Automatic Bar Line Segmentation—Mikel Gainza, Dan Barry, Eugene Coyle, Dublin Institute of Technology - Dublin, Ireland
A method that segments the audio according to the position of the bar lines is presented. The method detects musical bars that frequently repeat in different parts of a musical piece by using an audio similarity matrix. The position of each bar line is predicted by using prior information about the position of previous bar lines as well as the estimated bar length. The bar line segmentation method does not depend on the presence of percussive instruments to calculate the bar length. In addition, the alignment of the bars allows moderate tempo deviations.
Convention Paper 7270 (Purchase now)
P17-4 The Analysis and Determination of the Tuning System in Audio Musical Signals—Peyman Heydarian, Lewis Jones, Allan Seago, London Metropolitan University - London, UK
The tuning system is an essential aspect of a musical piece. It specifies the scale intervals and contributes to the emotions of a song. There is a direct relationship between the musical mode and the tuning of a piece for modal musical traditions. In a broader sense it represents the different genres. In this paper algorithms based on spectral and chroma averages are developed to construct patterns from audio musical files. Then a similarity measure like the Manhattan distance or the cross-correlation determines the similarity of a piece to each tuning class. The tuning system provides valuable information about a piece and is worth incorporating into the metadata of a musical file.
Convention Paper 7271 (Purchase now)
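The pattern-matching step in the P17-4 abstract, comparing an averaged chroma or spectral pattern against one template per tuning class using the Manhattan distance, can be sketched in a few lines. The template values, class names, and the normalization step below are invented for illustration and are not taken from the paper.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length patterns."""
    return sum(abs(x - y) for x, y in zip(a, b))

def normalize(profile):
    """Scale a non-negative pattern so that its bins sum to 1."""
    total = sum(profile)
    return [v / total for v in profile]

def classify_tuning(profile, templates):
    """Return the name of the template closest to the profile in L1 distance."""
    profile = normalize(profile)
    return min(templates,
               key=lambda name: manhattan(profile, normalize(templates[name])))

# Hypothetical 4-bin patterns; real patterns would use many more pitch bins.
templates = {
    "class_a": [4.0, 1.0, 3.0, 2.0],
    "class_b": [1.0, 4.0, 2.0, 3.0],
}
print(classify_tuning([8.1, 2.2, 5.9, 3.8], templates))  # -> class_a
```

Normalizing before comparison makes the decision independent of overall level, so a louder recording of the same piece maps to the same class.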
Sunday, October 7, 4:00 pm — 6:00 pm
Chair: Durand R. Begault, Charles M. Salter Associates - San Francisco, CA, USA
P18-1 Experiment in Computational Voice Elimination Using Formant Analysis—Durand R. Begault, Charles M. Salter Associates - San Francisco, CA, USA
This paper explores the use of a computational approach to the elimination of a known from an unknown voice exemplar in a forensic voice elimination protocol. A subset of voice exemplars from 11 talkers, taken from the TIMIT database, were analyzed using a formant tracking program. Intra- versus inter-speaker mean formant frequencies are analyzed and compared.
Convention Paper 7272 (Purchase now)
P18-2 Applications of ENF Analysis Method in Forensic Authentication of Digital Audio and Video Recordings—Catalin Grigoras, Forensic Examiner - Bucharest, Romania
This paper reports on the electric network frequency (ENF) method as a means of assessing the integrity of digital audio/video evidence. A brief description is given of different ENF types and the phenomena that determine ENF variations, analysis methods, stability over different geographical locations in continental Europe, interlaboratory validation tests, uncertainty of measurement, real case investigations, the effects of different compression algorithms on ENF values, and possible problems to be encountered during forensic examinations. By applying the ENF method in forensic audio/video analysis, one can determine whether and where a digital recording has been edited, establish whether it was made at the time claimed, and identify the time and date of the registering operation.
Convention Paper 7273 (Purchase now)
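To make the ENF idea in the P18-2 abstract concrete, the sketch below tracks the mains hum frequency frame by frame by scanning a fine frequency grid around the nominal 50 Hz used in continental Europe. It is a minimal illustration, not the analysis method of the paper: the 2-second frame length, the 0.01 Hz grid, and the function names `dominant_frequency` and `enf_track` are all assumptions.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate, f_lo=49.5, f_hi=50.5, step=0.01):
    """Return the grid frequency (Hz) with the largest DFT magnitude."""
    best_f, best_mag = f_lo, -1.0
    n_steps = int(round((f_hi - f_lo) / step))
    for i in range(n_steps + 1):
        f = f_lo + i * step
        w = -2j * math.pi * f / sample_rate
        # Direct evaluation of the discrete-time Fourier transform at f.
        acc = sum(x * cmath.exp(w * n) for n, x in enumerate(samples))
        if abs(acc) > best_mag:
            best_f, best_mag = f, abs(acc)
    return best_f

def enf_track(samples, sample_rate, frame_seconds=2.0, **scan_args):
    """Split the signal into non-overlapping frames and estimate ENF per frame."""
    frame_len = int(frame_seconds * sample_rate)
    return [dominant_frequency(samples[s:s + frame_len], sample_rate, **scan_args)
            for s in range(0, len(samples) - frame_len + 1, frame_len)]

# Synthetic hum: 2 s at 49.97 Hz followed by 2 s at 50.04 Hz, sampled at 400 Hz.
sr = 400
hum = [math.sin(2 * math.pi * 49.97 * n / sr) for n in range(2 * sr)]
hum += [math.sin(2 * math.pi * 50.04 * n / sr) for n in range(2 * sr)]
print(enf_track(hum, sr))
```

In a forensic setting the resulting track would be compared against a reference record of the grid frequency; a discontinuity or mismatch in the track is what suggests an edit or a wrong claimed recording time.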
P18-3 Quantifying the Speaking Voice: Generating a Speaker Code as a Means of Speaker Identification Using a Simple Code-Matching Technique—Peter S. Popolo, National Center for Voice and Speech - Denver, CO, USA, and University of Iowa, Iowa City, IA, USA; Richard W. Sanders, National Center for Voice and Speech - Denver, CO, USA, and University of Colorado at Denver, Denver, CO, USA; Ingo R. Titze, University of Iowa - Iowa City, IA, USA and University of Colorado at Denver, Denver, CO, USA
This paper looks at a methodology for quantifying the speaking voice, by which temporal and spectral features of the voice are extracted and processed to create a numeric code that identifies speakers, so those speakers can be searched in a database much like fingerprints. The parameters studied include: (1) average fundamental frequency (F0) of the speech signal over time, (2) standard deviation of the F0, (3) the slope and (4) sign of the F0 contour, (5) the average energy, (6) the standard deviation of the energy, (7) the spectral energy contained from 50 Hz to 1,000 Hz, (8) the spectral energy from 1,000 Hz to 5,000 Hz, (9) the Alpha Ratio, (10) the average speaking rate, and (11) the total duration of the spoken sentence.
Convention Paper 7274 (Purchase now)
P18-4 Further Investigations into the ENF Criterion for Forensic Authentication—Eddy Brixen, EBB-consult - Smørum, Denmark
In forensic audio one important task is the authentication of audio recordings. In the field of digital audio and digital media a single complete methodology has not yet been demonstrated. However, the ENF (Electric Network Frequency) Criterion has shown promising results and should be regarded as a major tool in that respect. By tracing the electric network frequency in the recorded signal a unique timestamp is provided. This paper analyzes a number of situations with the aim of providing further information for the assessment of this methodology. The topics are: ways to establish reference data, spectral contents of the electromagnetic fields, low bit rate codecs’ treatment of low level hum components, and tracing ENF harmonic components.
Convention Paper 7275 (Purchase now)
Signal Processing for 3-D Audio, Part 1
Monday, October 8, 9:00 am — 12:00 pm
Chair: Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA
P19-1 Spatial Audio Scene Coding in a Universal Two-Channel 3-D Stereo Format—Jean-Marc Jot, Arvindh Krishnaswami, Jean Laroche, Juha Merimaa, Mike Goodwin, Creative Advanced Technology Center - Scotts Valley, CA, USA
We describe a frequency-domain method for phase-amplitude matrix decoding and up-mixing of two-channel stereo recordings, based on spatial analysis of 2-D or 3-D directional and ambient cues in the recording and re-synthesis of these cues for consistent reproduction over any headphone or loudspeaker playback system. The decoder is compatible with existing two-channel phase-amplitude stereo formats; however, unlike existing time-domain decoders, it preserves source separation and allows accurate reproduction of ambiance and reverberation cues. The two-channel spatial encoding/decoding scheme is extended to incorporate 3-D elevation, without relying on HRTF cues. Applications include data-efficient storage or transmission of multichannel soundtracks and computationally-efficient interactive audio spatialization in a backward-compatible stereo encoding format.
Convention Paper 7276 (Purchase now)
P19-2 Binaural 3-D Audio Rendering Based on Spatial Audio Scene Coding—Michael Goodwin, Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA
In standard virtualization of stereo or multichannel recordings for headphone reproduction, channel-dependent interaural relationships based on head-related transfer functions are imposed on each input channel in the binaural mix. In this paper we describe a new binaural reproduction paradigm based on frequency-domain spatial analysis-synthesis. The input content is analyzed for channel-independent positional information on a time-frequency basis, and the binaural signal is generated by applying appropriate HRTF cues to each time-frequency component, resulting in a high spatial resolution that overcomes a fundamental limitation of channel-centric virtualization methods. The spatial analysis and synthesis algorithms are discussed in detail and a variety of applications are described.
Convention Paper 7277 (Purchase now)
P19-3 Real-Time Spatial Representation of Moving Sound Sources—Christos Tsakostas, Holistiks Engineering Systems - Athens, Greece; Andreas Floros, Ionian University - Corfu, Greece
The simulation of moving sound sources represents a fundamental issue for efficiently representing virtual worlds and acoustic environments, but it is limited by the measurement resolution of the Head Related Transfer Function, a limitation usually overcome by interpolation techniques. In this paper a novel time-varying binaural convolution/filtering algorithm is presented that, based on a frequency morphing mechanism that takes into account both physical and psychoacoustic criteria, can efficiently simulate a moving sound source. It is shown that the proposed algorithm overcomes the excessive calculation load problems usually raised by legacy moving sound source spatial representation techniques, while high-quality 3-D sound spatialization is achieved in terms of both objective and subjective criteria.
Convention Paper 7279 (Purchase now)
P19-4 The Use of Cephalometric Features for Headmodels in Spatial Audio Processing—Sunil Bharitkar, Audyssey Labs - Los Angeles, CA, USA, and University of Southern California, Los Angeles, CA, USA; Pall Gislason, Audyssey Labs - Los Angeles, CA, USA
In two-channel or stereo applications, such as for televisions, automotive infotainment, and hi-fi systems, the loudspeakers are typically placed substantially close to each other. The sound field generated from such a setup creates an image that is perceived as monophonic while lacking sufficient spatial “presence.” Due to this limitation, a stereo expansion technique may be utilized to widen the soundstage to give listeners the perception that sound originates from a wider angle (e.g., +/– 30 degrees relative to the median plane) using head-related transfer functions (HRTFs). In this paper we propose extensions to the headmodel (viz., the ipsilateral and contralateral headshadow functions) based on analysis of the diffraction of sound around head cephalometric features, such as the nose, whose dimensions are of an order sufficient to cause variations in the headshadow responses in the high-frequency region. Modeling these variations is important for accurate rendering of a spatialized sound-field for 3-D audio applications. Specifically, this paper presents refinements to the existing spherical head-models for spatial audio applications.
Convention Paper 7280 (Purchase now)
P19-5 MDCT Domain Analysis and Synthesis of Reverberation for Parametric Stereo Audio—K. Suresh, T. V. Sreenivas, Indian Institute of Science - Bangalore, India
We propose parametric stereo coding analysis and synthesis directly in the MDCT domain using analysis-by-synthesis parameter estimation. The stereo signal is represented by an equalized sum signal and spatialization parameters. The equalized sum signal and the spatialization parameters are obtained by sub-band analysis in the MDCT domain. The de-correlated signal required for the stereo synthesis is also generated in the MDCT domain. A subjective evaluation test using MUSHRA shows that the synthesized stereo signal is perceptually satisfactory and comparable to state-of-the-art parametric coders.
Convention Paper 7281 (Purchase now)
P19-6 Correlation-Based Ambience Extraction from Stereo Recordings—Juha Merimaa, Michael M. Goodwin, Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA
One of the key components in current multichannel upmixing techniques is identification and extraction of ambience from original stereo recordings. This paper describes correlation-based ambience extraction within a time-frequency analysis-synthesis framework. Two new estimators for the time- and frequency-dependent amount of ambience in the input channels are analytically derived. These estimators are discussed in relationship to two other algorithms from the literature and evaluated with simulations. It is also shown that the time constant used in a recursive correlation computation is an important factor in determining the performance of the algorithms. Short-time correlation estimates are typically biased such that the amount of ambience is underestimated.
Convention Paper 7282 (Purchase now)
Applications in Audio, Part 1
Monday, October 8, 9:30 am — 12:00 pm
Chair: Michael Kelly, Sony Computer Entertainment Europe - London, UK
P20-1 A Study of Hearing Damage Caused by Personal MP3 Players—Adriano Farina, Liceo Ginnasio statale G.D. Romagnosi - Parma, Italy
This paper aims to assess the actual in-ear sound pressure level during use of MP3 players. The method is based on standard EN 50332 (100 dB as maximum SPL), IEC 60959 (HATS), and IEC 60711 (ear simulators), as explained in the January 2007 issue of the Bruel and Kjaer Magazine (page 13). In this study a number of MP3 players were tested, employing a dummy head and spectrum analysis software. The measurements were aimed at assessing the hearing damage risk for youngsters who use an MP3 player for several hours per day. The students of an Italian high school (15 to 18 years old) were asked to supply their personal devices for testing, leaving the gain untouched from the last usage. The results show that the risk of hearing damage is real for many of the devices tested, which proved capable of reproducing average sound pressure levels well above the risk threshold.
Convention Paper 7283 (Purchase now)
P20-2 Electret Receiver for In-Ear Earphone—Shu-Ru Lin, Dar-Ming Chiang, I-Chen Lee, Yan-Ren Chen, Industrial Technology Research Institute (ITRI) - Chutung, Hsinchu, Taiwan
This paper presents an electret receiver developed for in-ear earphones. The electret diaphragm is fabricated from a nano-porous fluoropolymer and charged by the corona method at room temperature. When the audio signal is applied, the electrostatic force drives the diaphragm to vibrate in a piston motion and produce sound. Influencing factors, such as the electrostatic charge quantity of the electret diaphragm and the distance between the electrode plate and the diaphragm, are investigated to raise the output sound pressure level of the in-ear earphone. An enclosure with resonators is also designed to improve the efficiency of the in-ear earphone. Consequently, the output sound pressure level inside the 2 cc coupler can be raised to exceed 105 dB at 1 kHz with a driving signal voltage of Vpp=±3V, and the output sound pressure level response at low frequency is remarkably increased.
Convention Paper 7284 (Purchase now)
P20-3 New Generation Artificial Larynx—Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland, and Excellence Center, PROKSIM, Warsaw, Poland; Piotr Odya, Gdansk University of Technology - Gdansk, Poland; Bozena Kostek, Gdansk University of Technology - Gdansk, Poland, and Excellence Center, PROKSIM, Warsaw, Poland; Piotr Szczuko, Gdansk University of Technology - Gdansk, Poland
The aim of this paper is to present a new generation of devices for laryngectomy patients. The artificial larynx has many disadvantages. The major problem is the background noise caused by the device. Two different approaches to solving this problem are presented. The first focuses on the artificial larynx itself: the engineered artificial larynx was equipped with a digital processor and an amplifier, and two algorithms, namely spectral subtraction and a comb filter, were proposed for noise reduction. The second approach employs a PDA to generate speech. Speech synthesis is performed, so that any text entered by the user can be played back through the PDA speaker.
Convention Paper 7285 (Purchase now)
P20-4 A Graphical Method for Studying Spectra Containing Harmonics and Other Patterns—Palmyra Catravas, Union College - Schenectady, NY, USA
A technique for identifying and characterizing patterns in spectra is described. Multiple harmonic series, odd and even harmonics, and missing modes produce identifiable signatures. Motion enhances visual recognition of systematic effects. The technique is adapted for use with more complicated, inharmonic spectral patterns.
Convention Paper 7286 (Purchase now)
P20-5 Immersive Auditory Environments for Teaching and Learning—Elizabeth Parvin, New York University - New York, NY, USA
3-D audio simulations allow for the creation of immersive auditory environments for enhanced and alternative interactive learning. Several supporting teaching and learning philosophies are presented. Experimental research and literature on spatial cognition and sound perception provide further backing. Museums, schools, research and training facilities, as well as online educational websites can all benefit significantly from their use. Design dependence on project purpose, content, and audience is explored. An example installation is discussed.
Convention Paper 7287 (Purchase now)
Signal Processing, Part 1
Monday, October 8, 10:00 am — 11:30 am
P21-1 Dynamic Bit-Rate Adaptation for Speech and Audio—Nicolle H. van Schijndel, Philips Research - Eindhoven, The Netherlands; Laetitia Gros, France Telecom R&D - Lannion, France; Steven van de Par, Philips Research - Eindhoven, The Netherlands
Many audio and speech transmission applications have to deal with highly time-varying channel capacities, making dynamic adaptation to bit rate an important issue. This paper investigates such adaptation using a coder that is driven by rate-distortion optimization mechanisms, always coding the full signal bandwidth. For perceptual evaluation, the continuous quality evaluation methodology was used, which has specifically been designed for dynamic quality testing. Results show latency and smoothing effects in the judged audio quality, but no quality penalty for the switching between quality levels; the overall quality using adaptation is comparable to using the average available bit rate. Thus, dynamic bit-rate adaptation has a clear benefit as compared to always using the lowest guaranteed available rate.
Convention Paper 7288 (Purchase now)
P21-2 A 216 kHz 124 dB Single Die Stereo Delta Sigma Audio Analog-to-Digital Converter—YuQing Yang, Terry Scully, Jacob Abraham, Texas Instruments, Inc. - Austin, TX, USA
A 216 kHz single die stereo delta sigma ADC is designed for high precision audio applications. A single loop, fifth-order, thirty-three level delta sigma analog modulator with positive and negative feedforward path is implemented. An interpolated multilevel quantizer with unevenly weighted quantization levels replaces a conventional 5-bit flash type quantizer in this design. These new techniques suppress the signal dependent energy inside the delta sigma loop and reduce internal channel noise coupling. Integrated with an on-chip bandgap reference circuit, DEM (dynamic element matching) circuit and a linear phase, FIR decimation filter, the ADC achieves 124 dB dynamic range (A-weighted), –110 dB THD+N over a 20 kHz bandwidth. Inter-channel isolation is 130 dB. Power consumption is approximately 330 mW.
Convention Paper 7289 (Purchase now)
P21-3 Encoding Bandpass Signals Using Level Crossings: A Model-Based Approach—Ramdas Kumaresan, Nitesh Panchal, University of Rhode Island - Kingston, RI, USA
A new approach to representing a time-limited and essentially bandlimited signal x(t) by a set of discrete frequency/time values is proposed. The set of discrete frequencies is the set of frequency locations at which (the real and imaginary parts of) the Fourier transform of x(t) cross certain levels, and the set of discrete time values corresponds to the traditional level crossings of x(t). The proposed representation is based on a simple bandpass signal model, called a Sum-of-Sincs (SOS) model, that exploits our knowledge of the bandwidth/timewidth of x(t). Given the discrete frequency/time locations, we can reconstruct x(t) by solving a least-squares problem. Using this approach, we propose an analysis/synthesis algorithm to decompose and represent composite signals like speech.
Convention Paper 7290 (Purchase now)
P21-4 Theory of Short-Time Generalized Harmonic Analysis (SGHA) and its Fundamental Characteristics—Teruo Muraoka, University of Tokyo - Meguro-ku, Tokyo, Japan; Takahiro Miura, University of Tokyo - Bunkyo-ku, Tokyo, Japan; Daisuke Ochiai, Tohru Ifukube, University of Tokyo - Meguro-ku, Tokyo, Japan
Digital signal processing became practical through rapid progress in processing hardware, brought about by IC technology, and in processing algorithms such as the FFT and digital filtering. In short, these techniques modify digitized signals and fall into two classes: (1) digital filtering [parametric processing] and (2) analysis-synthesis [non-parametric processing]. Both share a weak point: they cannot detect and remove locally existing frequency components without side effects. This difficulty can be removed by applying inharmonic frequency analysis. Its fundamental principle was proven by N. Wiener in his 1930 publication "Generalized Harmonic Analysis (GHA)." Its application to practical signal processing was achieved by Dr. Y. Hirata in 1994; because the method amounts to short-time, sequential processing of GHA, it is called here Short-Time Generalized Harmonic Analysis (SGHA). The authors have been engaged in research on its fundamental characteristics and its application to noise reduction, and reported the results at previous AES conventions. This time, SGHA's fundamental theory will be explained together with its characteristics.
Convention Paper 7291 (Purchase now)
P21-5 Quality Improvement Using a Sinusoidal Model in HE-AAC—Jung Geun Kim, Dong-Il Hyun, Dae Hee Youn, Yonsei University - Seoul, Korea; Young Cheol Park, Yonsei University - Wonju-City, Korea
This paper identifies a distortion in HE-AAC: when a tone is restored, a noise floor that does not exist in the original input signal is generated. To solve this, it proposes adding a sinusoidal model to the HE-AAC encoder so that only the original tonal components are restored in decoding. The sinusoidal model is used to analyze a tone and to move it to a place where the noise floor is reduced. The lower the bit rate, the lower the frequency at which restoration by SBR (Spectral Band Replication) starts, and at lower frequencies the distortion caused by noise inflow is more easily perceived. The improvement from the proposed method is therefore greater in that case, and it has the further benefit that no additional information or operation is needed in the decoding process.
Convention Paper 7292 (Purchase now)
P21-6 Special Hearing Aid for Stuttering People—Piotr Odya, Gdansk University of Technology - Gdansk, Poland; Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland, and Excellence Center, PROKSIM, Warsaw, Poland
Owing to recent progress in digital signal processor development, it has become possible to build a subminiature device combining a speech aid and a hearing aid. Despite its small dimensions, the device can execute quite complex algorithms and can be easily reprogrammed. The paper puts an emphasis on issues related to the design and implementation of algorithms applicable to both speech and hearing aids. Frequency shifting or delaying of the audio signal is often used for speech fluency improvement. The basic frequency altering algorithm is similar to the sound compression algorithm used in some special hearing aids. Therefore, the experimental device presented in this paper provides a universal hearing and speech aid that may be used by hearing or speech impaired persons, or by persons suffering from both problems simultaneously.
Convention Paper 7293 (Purchase now)
P21-7 An Improved Low Complexity AMR-WB+ Encoder Using Neural Networks for Mode Selection—Jérémie Lecomte, Roch Lefebvre, Guy Richard, Université de Sherbrooke - Sherbrooke, Quebec, Canada
This paper presents an alternative mode selector based on neural networks to improve the low-complexity AMR-WB+ standard audio coder especially at low bit rates. The AMR-WB+ audio coder is a multimode coder using both time-domain and frequency-domain modes. In low complexity operation, the standard encoder determines the coding mode on a frame-by-frame basis by essentially applying thresholding to parameters extracted from the input signal and using a logic that favors time-domain modes. The mode selector proposed in this paper reduces this bias and achieves a mode decision, which is closer to the full complexity encoder. This results in measurable quality improvements in both objective and subjective assessments.
Convention Paper 7294 (Purchase now)
Signal Processing for 3-D Audio, Part 2
Monday, October 8, 1:00 pm — 4:30 pm
Chair: Soren Bech, Bang & Olufsen a/s - Struer, Denmark
P22-1 Real-Time Auralization Employing a Not-Linear, Not-Time-Invariant Convolver—Angelo Farina, University of Parma - Parma, Italy; Adriano Farina, Liceo Ginnasio Statale G. D. Romagnosi - Parma, Italy
The paper reports the first results of listening tests performed with a new software tool capable of not-linear convolution (employing the Diagonal Volterra Kernel approach) and of time-variation (employing efficient morphing among a number of kernels). The listening tests were done in a special listening room, employing a menu-driven playback system capable of blindly presenting sound samples recorded from real-world devices, samples simulated with the new software tool, and, for comparison, samples obtained by traditional linear, time-invariant convolution. The listener answers a questionnaire for each sound sample and can switch back and forth between samples for better comparison. The results show that this new device-emulation tool provides much better results than existing convolution plug-ins (which only emulate linear, time-invariant behavior), while requiring little computational load and offering short latency and prompt reaction to the user’s actions.
Convention Paper 7295 (Purchase now)
P22-2 Real-Time Panning Convolution Reverberation—Rebecca Stewart, Mark Sandler, Queen Mary University of London - London, UK
Convolution reverberation is an excellent method for generating high-quality artificial reverberation that accurately portrays a specific space, but it can only represent the static listener and source positions of the measured impulse response being convolved. In this paper multiple measured impulse responses along with interpolated impulse responses between measured locations are convolved with dry input audio to create the illusion of a moving source. The computational cost is decreased by using a hybrid approach to reverberation that recreates the early reflections through convolution with a truncated impulse response, while the late reverberation is simulated with a feedback delay network.
Convention Paper 7296 (Purchase now)
P22-3 Ambisonic Panning—Martin Neukom, Zurich University of the Arts - Zurich, Switzerland
Ambisonics is a surround-system for encoding and rendering a 3-D sound field. Sound is encoded and stored in multichannel sound files and is decoded for playback. In this paper a panning function equivalent to the result of ambisonic encoding and so-called in-phase decoding is presented. In this function the order of ambisonic resolution is just a variable that can be an arbitrary positive number not restricted to integers and that can be changed during playback. The equivalence is shown, limitations and advantages of the technique are mentioned, and real time applications are described.
Convention Paper 7297 (Purchase now)
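The equivalence described in the Ambisonic Panning abstract can be checked numerically. The sketch below assumes the two-dimensional case and the widely used in-phase weights g_m = (M!)^2 / ((M+m)!(M-m)!); under those assumptions the decoded gain toward a loudspeaker at angle theta from the source collapses to (1/2 + 1/2 cos theta)^M, which also accepts non-integer M. Whether this is exactly the function presented in the paper is an assumption, and the function names are illustrative.

```python
import math
import numpy as np

def panning_gain(theta, order):
    """Closed-form panning gain; `order` may be any positive real number."""
    return (0.5 + 0.5 * np.cos(theta)) ** order

def inphase_decoded_gain(theta, order):
    """2-D ambisonic encode plus in-phase decode, integer order only.

    Sums the circular harmonics with in-phase weights and normalizes
    the result to 1 at theta = 0.
    """
    M = int(order)
    theta = np.asarray(theta, dtype=float)
    total = np.ones_like(theta)
    norm = 1.0
    for m in range(1, M + 1):
        w = math.factorial(M) ** 2 / (math.factorial(M + m) * math.factorial(M - m))
        total = total + 2.0 * w * np.cos(m * theta)
        norm += 2.0 * w
    return total / norm

if __name__ == "__main__":
    theta = np.linspace(-np.pi, np.pi, 181)
    for M in (1, 2, 3):
        same = np.allclose(panning_gain(theta, M), inphase_decoded_gain(theta, M))
        print("order", M, "closed form matches harmonic sum:", same)
    # A fractional order is only available in the closed form.
    print(panning_gain(np.pi / 4, 2.5))
```

Because the order enters only as an exponent in the closed form, it can be varied smoothly during playback, which is the property the abstract highlights.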
P22-4 Adaptive Karhunen-Loève Transform for Multichannel Audio—Yu Jiao, Slawomir Zielinski, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
In previous works, the authors proposed the hierarchical bandwidth limitation technique based on Karhunen-Loève Transform (KLT) to reduce the bandwidth for multichannel audio transmission. The subjective results proved that this technique could be used to reduce the overall bandwidth without significant audio quality degradation. Further study found that the transform matrix varied considerably over time for many recordings. In this paper the KLT matrix was calculated based on short-term signals and updated adaptively over time. The perceptual effects of the adaptive KLT process were studied using a series of listening tests. The results showed that adaptive KLT resulted in better spatial quality than nonadaptive KLT but introduced some other artifacts.
Convention Paper 7298 (Purchase now)
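To make the short-term KLT idea in the abstract above concrete, the following Python sketch recomputes the transform matrix for each block of a multichannel signal from that block's inter-channel covariance. It is only a minimal illustration: the block length, the function names, and the absence of any smoothing between successive matrices are assumptions, not details taken from the paper.

```python
import numpy as np

def klt_matrix(block):
    """Return (T, variances) for a block shaped (channels, samples).

    Rows of T are the eigenvectors of the inter-channel covariance,
    ordered by decreasing variance.
    """
    centered = block - block.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / centered.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # largest variance first
    return eigvecs[:, order].T, eigvals[order]

def adaptive_klt(signal, block_len):
    """Apply a separately computed KLT to each consecutive block."""
    results = []
    for start in range(0, signal.shape[1] - block_len + 1, block_len):
        block = signal[:, start:start + block_len]
        T, _ = klt_matrix(block)
        results.append((start, T, T @ block))   # eigenchannels for this block
    return results

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    common = rng.standard_normal(2048)
    # Two strongly correlated channels plus a little independent noise.
    signal = np.vstack([common + 0.1 * rng.standard_normal(2048),
                        0.8 * common + 0.1 * rng.standard_normal(2048)])
    for start, T, eig in adaptive_klt(signal, 1024):
        print(start, np.round(np.cov(eig), 4))
```

Within each block the eigenchannels are decorrelated, so most of the energy sits in the first one; the low-variance eigenchannels are the natural candidates for bandwidth limitation.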
P22-5 Extension of an Analytic Secondary Source Selection Criterion for Wave Field Synthesis—Sascha Spors, Berlin University of Technology - Berlin, Germany
Wave field synthesis (WFS) is a spatial sound reproduction technique that employs a high number of loudspeakers (secondary sources) to create a virtual auditory scene for a large listening area. It requires a sensible selection of the loudspeakers that are active for the reproduction of a particular virtual source. For virtual point sources and plane waves, suitable intuitively derived selection criteria are used in practical implementations. However, for more complex virtual source models and loudspeaker array contours the selection might not be straightforward. In a previous publication the author proposed a secondary source selection criterion on the basis of the sound intensity vector. This contribution will extend this criterion to data-based rendering and focused sources and will discuss truncation effects.
Convention Paper 7299 (Purchase now)
P22-6 Adaptive Wave Field Synthesis for Sound Field Reproduction: Theory, Experiments, and Future Perspectives—Philippe-Aubert Gauthier, Alain Berry, Université de Sherbrooke - Sherbrooke, Quebec, Canada
Wave field synthesis is a sound field reproduction technology that assumes that the reproduction environment is anechoic. A real reproduction space thus reduces the objective accuracy of wave field synthesis. Adaptive wave field synthesis is defined as a combination of wave field synthesis and active compensation. With adaptive wave field synthesis the reproduction errors are minimized along with the departure penalty from the wave field synthesis solution. Analysis based on the singular value decomposition connects wave field synthesis, active compensation, and Ambisonics. The decomposition allows the practical implementation of adaptive wave field synthesis based on independent radiation mode control. Results of experiments in different rooms support the theoretical propositions and show the efficiency of adaptive wave field synthesis for sound field reproduction.
Convention Paper 7300 (Purchase now)
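The trade-off stated in the adaptive wave field synthesis abstract, minimizing reproduction error together with a penalty on departure from the WFS solution, can be written as a regularized least-squares problem. The Python sketch below solves it directly at a single frequency; it does not use the singular value decomposition that the paper analyzes, and the symbol names (G for the loudspeaker-to-microphone transfer matrix, p_target, q_wfs, penalty) are assumptions made for illustration.

```python
import numpy as np

def adaptive_wfs_drive(G, p_target, q_wfs, penalty):
    """Minimize ||G q - p_target||^2 + penalty * ||q - q_wfs||^2 over q.

    Setting the gradient to zero gives the normal equations
    (G^H G + penalty * I) q = G^H p_target + penalty * q_wfs.
    """
    n_sources = G.shape[1]
    A = G.conj().T @ G + penalty * np.eye(n_sources)
    b = G.conj().T @ p_target + penalty * q_wfs
    return np.linalg.solve(A, b)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    G = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))
    p_target = rng.standard_normal(8) + 1j * rng.standard_normal(8)
    q_wfs = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    for penalty in (1e-3, 1.0, 1e3):
        q = adaptive_wfs_drive(G, p_target, q_wfs, penalty)
        print(penalty,
              np.linalg.norm(G @ q - p_target),   # reproduction error
              np.linalg.norm(q - q_wfs))          # departure from WFS
```

A very large penalty returns the plain WFS driving signals, while a very small one approaches pure active compensation, which is one way to read the abstract's statement that the method combines the two.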
P22-7 360° Localization via 4.x RACE Processing—Ralph Glasgal, Ambiophonics Institute - Rockleigh, NJ, USA
Recursive Ambiophonic Crosstalk Elimination (RACE), implemented as a VST plug-in, convolved from an impulse response, or purchased as part of a TacT Audio or other home audiophile product, properly reproduces all the ITD and ILD data sequestered in most standard two or multichannel media. Ambiophonics is so named because it is intended to be the replacement for 75 year old stereophonics and 5.1 in the home, car, or monitoring studio, but not in theaters. The response curves show that RACE produces a loudspeaker binaural sound field with no audible colorations, much like Ambisonics or Wavefield Synthesis. RACE can do this starting with most standard CD/LP/DVD two, four or five-channel media, or even better, 2 or 4 channel recordings made with an Ambiophone, using one or two pairs of closely spaced loudspeakers. The RACE stage can easily span up to 170° for two channel orchestral recordings or 360° for movie/electronic-music surround sources. RACE is not sensitive to head rotation and listeners can nod, recline, stand up, lean sideways, move forward and back, or sit one behind the other. As in 5.1, off center listeners can easily localize the center dialog even though no center speaker is ever needed.
Convention Paper 7301 (Purchase now)
Applications in Audio, Part 2
Monday, October 8, 1:00 pm — 3:30 pm
Chair: Juha Backman, Nokia Corporation - Espoo, Finland
P23-1 Loudspeaker Systems for Flat Television Sets—Herwig Behrends, Werner Bradinal, Christoph Heinsberger, NXP Semiconductors - Hamburg, Germany
The rapidly increasing sales of liquid crystal- and plasma display television sets lead to new challenges to the sound processing inside the TV-sets. Flat cabinets do not sufficiently accommodate room for loudspeakers that are able to reproduce frequencies below 100 to 200 Hz without distortions and with a reasonable sound pressure level. Cost reduction forces the set makers to use cheap and small loudspeakers, which are in no way comparable to the loudspeakers used in cathode ray tube televisions. In this paper we will describe the trends and the requirements of the market and discuss different approaches and a practical implementation of a new algorithm, which tackle these problems.
Convention Paper 7302 (Purchase now)
P23-2 Loudspeakers for Flexible Displays—Takehiro Sugimoto, Kazuho Ono, NHK Science & Technical Research Laboratories - Setagaya-ku, Tokyo, Japan; Kohichi Kurozumi, NHK Engineering Services - Setagaya-ku, Tokyo, Japan; Akio Ando, NHK Science & Technical Research Laboratories - Setagaya-ku, Tokyo, Japan; Akira Hara, Yuichi Morita, Akito Miura, Foster Electric Co., Ltd. - Akishima, Tokyo, Japan
Flexible displays that can be rolled up would allow users to enjoy programs wherever they are. NHK Science & Technical Research Laboratories has been developing flexible displays for mobile television. Loudspeakers for such televisions must have the same features as the displays: they must be thin, lightweight, and flexible. We created two types of loudspeakers; one was made of polyvinylidene fluoride and the other used electro-dynamic actuators. Their characteristics were demonstrated to be suitable for mobile use and promising for flexible displays.
Convention Paper 7303 (Purchase now)
P23-3 Software-Based Live Sound Measurements, Part 2—Wolfgang Ahnert, Stefan Feistel, Alexandru Radu Miron, Enno Finder, Ahnert Feistel Media Group - Berlin, Germany
In previous publications the authors introduced the software-based measuring system EASERA to be used for measurements with prerecorded music and speech signals. This second part investigates the use of excitation signals supplied from an independent external source in real-time. Using a newly developed program module live-sound recordings or speech and music signals from a microphone input and from the mixing console can be utilized to obtain impulse response data for further evaluation. New noise suppression methods are presented that allow these impulse responses to be acquired in full-length even in occupied venues. As case studies, room acoustic measurements based on live sound supply are discussed for a concert hall and a large cathedral. Required measuring conditions and limitations are derived as a result.
Convention Paper 7304 (Purchase now)
P23-4 A System for Remote Control of the Height of Suspended Microphones—Douglas McKinnie, Middle Tennessee State University - Murfreesboro, TN, USA
An electrically driven pulley system allowing remote control of the height of cable-suspended microphones is described. It can be assembled from inexpensive and readily available component parts. A reverse block-and-tackle system is used to allow many meters of cable to be drawn into a 1.2 meter long space, allowing the cable to remain connected and the microphone to remain in use during movement. An advantage of this system is that single microphones, stereo pairs, or microphone arrays can be remotely positioned "by ear" during rehearsal, soundcheck, or warmup.
Convention Paper 7305 (Purchase now)
P23-5 Music at Your Fingertips: An Electrotactile Fader—Jörn Loviscach, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany
Tactile sensations can be invoked by applying short high-voltage low-current electrical pulses to the skin. This phenomenon has been researched extensively to support visually or hearing impaired persons. However, it can also be applied to operate audio production tools in eyes-free mode and without acoustical interferences. The electrotactile fader presented in this paper is used to indicate markers or to “display” a track’s short-time spectrum using five electrodes mounted on the lever. As opposed to mechanical solutions, which may for instance involve the fader’s motor, the electrotactile display neither causes acoustic noise nor reduces the fader’s input precision due to vibration.
Convention Paper 7306 (Purchase now)
Signal Processing, Part 2
Monday, October 8, 1:30 pm — 3:00 pm
P24-1 Concept and Components of a Sound Field Effector Using a Loudspeaker Array—Teruki Oto, Kenwood Corporation - Tokyo, Japan; Tomoaki Tanno, Jiang Hua, Risa Tamaura, Syogo Kiryu, Musashi Institute of Technology - Tokyo, Japan; Toru Kamekawa, Tokyo National University of Fine Arts and Music - Tokyo, Japan
Most effectors used for electrical music instruments apply temporal changes to sounds. If effectors aimed at spatial expression were developed, artists could explore new kinds of performance. We propose a Sound Field Effector using a loudspeaker array. Various sound fields, such as a focus, can be controlled in real time by sound engineers and/or artists. The Sound Field Effector is divided into software parts and hardware parts. A 16-ch. system was developed as a prototype. The system can change sound fields within 1 msec. A focal pattern produced with the system was measured in an anechoic room.
Convention Paper 7307 (Purchase now)
P24-2 A Novel Mapping with Natural Transition from Linear to Logarithmic Scaling—Joerg Panzer, R&D Team - Salgen, Germany
The area hyperbolic function ArSinh has the interesting property of performing a linear mapping at arguments close to zero and a quasi-logarithmic mapping for large arguments. Further, it works also with a negative abscissa and at the zero-point. The transition from the linear to the logarithmic range is monotonic, so is the transition to the negative range. This paper demonstrates the use of the ArSinh-function in a range of application examples, such as zooming into the display of transfer-functions, sampling of curves with high density at a specific point, and a coarse resolution elsewhere. The paper also reviews the linear and logarithmic mapping and discusses the properties of the new ArSinh-mapping.
Convention Paper 7308 (Purchase now)
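The linear-to-logarithmic behavior of the ArSinh mapping described above is easy to verify numerically. The Python sketch below is an illustration only: the scaling parameter x0 (the point around which the transition occurs) and the helper names are assumptions, not notation from the paper.

```python
import numpy as np

def arsinh_map(x, x0=1.0):
    """Map x so that |x| << x0 is nearly linear and |x| >> x0 is nearly logarithmic."""
    return np.arcsinh(np.asarray(x, dtype=float) / x0)

def arsinh_unmap(y, x0=1.0):
    """Inverse mapping back to the original axis."""
    return x0 * np.sinh(np.asarray(y, dtype=float))

def dense_grid(center, x0, span, n):
    """Sample points that are densest at `center` and coarser farther away.

    Points are equally spaced in the mapped domain over [-span, span]
    and then mapped back, so spacing grows roughly exponentially with
    distance from `center`.
    """
    return center + arsinh_unmap(np.linspace(-span, span, n), x0)

if __name__ == "__main__":
    print(arsinh_map(1e-4))                  # close to 1e-4 (linear range)
    print(arsinh_map(1e6), np.log(2e6))      # close to ln(2x) (log range)
    print(arsinh_map(-3.0), arsinh_map(0.0)) # defined for negatives and zero
    print(dense_grid(1000.0, 10.0, 3.0, 11))
```

Unlike a plain logarithmic axis, the mapping is defined at zero and for negative arguments, which is why it suits displays that need to zoom around a point without excluding the origin.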
P24-3 Real Time Implementation of an Innovative Digital Audio Equalizer—Stefania Cecchi, Paolo Peretti, Lorenzo Palestini, Francesco Piazza, Università Politecnica Delle Marche - Ancona, Italy; Ferruccio Bettarelli, Ariano Lattanzi, Leaff Engineering - Porto Potenza Picena (MC), Italy
Fixed frequency response audio equalization has well-known problems due to the computational complexity of the algorithms and to the filter design techniques. This paper describes the design and the real time implementation of an M-band linear phase digital audio equalizer. Beginning from multirate systems and filterbanks, an innovative audio equalizer with uniform and nonuniform bands is derived. The idea of this work arises from different approaches employed in filterbanks to avoid aliasing in the case of adaptive filtering in each band. The effectiveness of the real time implementation is shown by comparing it with a frequency domain equalizer. The solution presented here has several advantages in terms of low computational complexity, low delay, and uniform frequency response avoiding ripple between adjacent bands.
Convention Paper 7309 (Purchase now)
P24-4 Wideband Beamforming Method Using Two-Dimensional Digital Filter—Koji Kushida, Yasushi Shimizu, Yamaha Corporation - Japan; Kiyoshi Nishikawa, Kanazawa University - Kanazawa, Japan
This paper presents a method for designing a DSP-controlled directional array loudspeaker with constant directivity and specified sidelobe level over a wide frequency band by means of the two-dimensional (2-D) Fourier series approximation. The band of constant directivity can be extended toward lower frequencies by using the nonphysical area in the 2-D frequency plane, where the target amplitude response of the 2-D filter is set to design the 2-D FIR filter. We discuss how the beamwidth of the array loudspeaker can be narrowed in the lower frequency band with a modification of the original algorithm by K. Nishikawa, et al.
Convention Paper 7310 (Purchase now)
P24-5 Linear Phase Mixed FIR/IIR Crossover Networks: Design and Real-Time Implementation—Lorenzo Palestini, Paolo Peretti, Stefania Cecchi, Francesco Piazza, Università Politecnica Delle Marche - Ancona, Italy; Ariano Lattanzi, Ferruccio Bettarelli, Leaff Engineering - Porto Potenza Picena (MC), Italy
Crossover networks are crucial components of audio reproduction systems and therefore they have received great attention in literature. In this paper the design and implementation of a digital crossover will be presented. A mixed FIR/IIR solution has been explored in order to exploit the respective strengths of FIR and IIR realizations, aiming at designing a low delay, low complexity, easily extendible, approximately linear phase crossover network. A software real-time implementation for the NU-Tech platform of the proposed system will be shown. Practical tests have been carried out to evaluate the performance of the proposed approach.
Convention Paper 7311 (Purchase now)
P24-6 Convolutive Blind Source Separation of Speech Signals in the Low Frequency Bands—Maria Jafari, Mark Plumbley, Queen Mary University of London - London, UK
Sub-band methods are often used to address the problem of convolutive blind speech separation, as they offer the computational advantage of approximating convolutions by multiplications. The computational load, however, often remains quite high, because separation is performed on several sub-bands. In this paper we exploit the well known fact that the high frequency content of speech signals typically conveys little information, since most of the speech power is found in frequencies up to 4 kHz, and consider separation only in frequency bands below a certain threshold. We investigate the effect of changing the threshold, and find that separation performed only in the low frequencies can lead to the recovered signals being similar in quality to those extracted from all frequencies.
Convention Paper 7312 (Purchase now)
P24-7 A Highly Directive 2-Capsule Based Microphone—Christof Faller, Illusonic LLC - Chavannes, Switzerland
While microphone technology has reached a high level of performance in terms of signal-to-noise ratio and linearity, directivity of commonly used first order microphones is limited. Higher order gradient based microphones can achieve higher directivity but suffer from signal-to-noise ratio issues. The usefulness of beamforming techniques with multiple capsules is limited due to high cost (a high number of capsules is required for high directivity) and high frequency variant directional response. A highly directive 2-capsule-based microphone is proposed, using two cardioid capsules. Time-frequency processing is applied to the corresponding two signals. A highly directive directional response is achieved that is time invariant and frequency invariant over a large frequency range.
Convention Paper 7313 (Purchase now)