AES Munich 2009
Paper Session Details
Audio for Telecommunications
Thursday, May 7, 09:00 — 11:30
Chair: Damian Murphy
P1-1 20 Things You Should Know Before Migrating Your Audio Network to IP—Simon Daniels, APT - Belfast, Northern Ireland, UK
For many years, synchronous networks have been considered the industry standard for audio transport worldwide. Balanced analog copper circuits, microwave, and synchronous systems such as V.35/X.21 or T1/E1 have been the traditional choice for studio-transmitter links (STLs) and inter-studio links in professional audio broadcast networks. Readily available from all major service providers, synchronous links owe their popularity largely to the fact that they offer dedicated, reliable, point-to-point, bi-directional communication at guaranteed data and error rates. However, their reign as the preferred choice for STLs is now coming under threat from a new challenger: IP-based network technology.
Convention Paper 7651 (Purchase now)
P1-2 Deploying Large Scale Audio IP Networks—Kevin Campbell, APT - Belfast, Northern Ireland, UK
This paper will examine the key considerations for those deploying large-scale IP audio networks. It will include an overview of the main challenges and draw on the experience of national public broadcasters who have already migrated to IP. We will provide an overview of the key concerns, such as jitter, delay, and link reliability, that apply to an IP network of any size. However, this paper will focus mainly on the issues arising from the greater complexity and scale of large national and country-wide deployments, illustrating the points with network applications from real-world deployments.
Paper presented by Hartmut Foerster
Convention Paper 7652 (Purchase now)
P1-3 A Spatial Filtering Approach for Directional Audio Coding—Markus Kallinger, Henning Ochsenfeld, Giovanni Del Galdo, Fabian Kuech, Dirk Mahne, Richard Schultz-Amling, Oliver Thiergart, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
In hands-free telephony, spatial filtering techniques are employed to enhance the intelligibility of speech. More precisely, these techniques aim at reducing the reverberation of the desired speech signal and attenuating interferences. Additionally, it is well known that the spatially separate reproduction of desired and interfering sources enhances the intelligibility of speech. For the latter task, Directional Audio Coding (DirAC) has proven to be an efficient method to capture and reproduce spatial sound. In this paper we propose a spatial filtering processing block that works in the parameter domain of DirAC. Simulation results show that, compared to a standard beamformer, the novel technique offers significantly higher interference attenuation while introducing comparably low distortion of the desired signal. Additional subjective tests of speech intelligibility confirm the instrumentally obtained results.
Convention Paper 7653 (Purchase now)
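The parameter-domain filtering that DirAC enables can be illustrated with a minimal sketch (not the authors' algorithm; all function and parameter names here are hypothetical): given per-bin azimuth and diffuseness estimates, each time-frequency bin of the omnidirectional signal is weighted according to how far its estimated direction of arrival lies from the look direction, and diffuse (reverberant) bins are attenuated as well.

```python
import numpy as np

def dirac_parameter_filter(omni_tf, azimuth_tf, diffuseness_tf,
                           target_az=0.0, width=np.pi / 3, floor=0.1):
    # Angular distance between estimated DOA and the look
    # direction, wrapped to [-pi, pi].
    delta = np.angle(np.exp(1j * (azimuth_tf - target_az)))
    # Directional gain: pass bins inside the window, attenuate the rest.
    directional = np.where(np.abs(delta) <= width, 1.0, floor)
    # Diffuse bins carry no reliable direction estimate, so they are
    # attenuated too, which also reduces reverberation.
    gain = (1.0 - diffuseness_tf) * directional + diffuseness_tf * floor
    return omni_tf * gain
```

A bin from the target direction with zero diffuseness passes unchanged; bins from other directions, or fully diffuse bins, are scaled down to the floor gain.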
P1-4 A New Bandwidth Extension for Audio Signals without Using Side-Information—Kha Le Dinh, Chon Tam Le Dinh, Roch Lefebvre, Université de Sherbrooke - Sherbrooke, Quebec, Canada
The narrow bandwidth (300–3400 Hz) used in the current telephone network limits the perceptual quality of telephone conversations. Migrating to a wideband network would improve quality, but such an upgrade will take a long time. Bandwidth extension can therefore be seen as an alternative solution during the transition. A new bandwidth extension method is presented in this paper. Because it uses no side-information, the proposed method can be applied as a post-processing step at the terminal devices, maintaining compatibility with the current telephone network; no modification is needed in the network nodes. Experimental results show that the proposed solution significantly improves the perceptual quality of the narrowband telephone signal.
Convention Paper 7654 (Purchase now)
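The abstract does not specify the authors' method; as a hedged illustration of blind bandwidth extension, the classic spectral-folding technique regenerates a synthetic high band from the narrowband signal alone, with no side-information. This sketch doubles the sample rate by zero insertion, which mirrors the baseband spectrum above the old Nyquist frequency, and then mixes that folded image back in at a reduced gain; names are illustrative.

```python
import numpy as np

def spectral_folding_bwe(narrowband, hp_gain=0.3):
    n = len(narrowband)
    # Zero-stuffing doubles the rate and creates a mirror image of
    # the baseband spectrum above the old Nyquist frequency.
    upsampled = np.zeros(2 * n)
    upsampled[::2] = narrowband
    spectrum = np.fft.rfft(upsampled)
    half = len(spectrum) // 2
    lowband = spectrum.copy()
    lowband[half:] = 0                 # keep the original baseband only
    highband = spectrum - lowband      # the folded image
    # Recombine: full-level baseband plus attenuated synthetic high band.
    return np.fft.irfft(lowband + hp_gain * highband, 2 * n)
```

Real blind bandwidth-extension systems additionally shape the synthetic high band with an estimated spectral envelope; this sketch only shows the folding step.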
P1-5 Feature Selection vs. Feature Space Transformation in Music Genre Classification Framework—Hanna Lukashevich, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany
Automatic classification of music genres is an important task in music information retrieval research. Nearly all state-of-the-art music genre recognition systems start from a feature extraction block. The extracted acoustical features often tend to be correlated and/or redundant, which can cause various difficulties in the classification stage. In this paper we present a comparative analysis of applying supervised Feature Selection (FS) and Feature Space Transformation (FST) algorithms to reduce the feature dimensionality. We discuss the pros and cons of the methods and weigh the benefits of each against the others.
Convention Paper 7655 (Purchase now)
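The structural difference between the two families can be sketched with simple unsupervised stand-ins (the paper evaluates supervised methods, so these criteria are only illustrative): feature selection keeps a subset of the original columns, while a feature space transformation such as PCA produces linear combinations of all of them.

```python
import numpy as np

def select_top_k(X, k):
    # Feature selection: keep the k original columns with the
    # highest variance (a simple unsupervised criterion).
    idx = np.sort(np.argsort(X.var(axis=0))[::-1][:k])
    return X[:, idx], idx

def pca_transform(X, k):
    # Feature space transformation: project onto the first k
    # principal components (linear combinations of all features).
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T
```

Selected features keep their original physical meaning, which aids interpretation; transformed features decorrelate the data but mix all original dimensions together.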
Audio for Games and Interactive Media
Thursday, May 7, 09:00 — 11:30
Chair: Michael Kelly
P2-1 Viable Distribution of Multichannel Audio-over-IP for Live and Interactive “Voice Talent”-Based Gaming Using High-Quality, Low-Latency Audio Codec Technology—Gregory Massey, APT - Belfast, Northern Ireland, UK
The delivery of multichannel audio—from mono to surround sound—in real-time over public IP networks for the purpose of interactive crowd-participant gaming presents a significant design engineering challenge to games developers, console manufacturers, ISPs, and CDNs. Leveraging expertise gained in professional broadcasting and recording studio postproduction, APT has developed a robust and scalable audio codec technology that meshes with popular gaming systems to realize low-latency distribution of high-quality audio for immersive, instantaneous audio experiences in massively multi-player online games involving interactive audience responses to vocal/singing talent.
Paper will be presented by David Trainor.
Convention Paper 7656 (Purchase now)
P2-2 Elevator: Emotional Tracking Using Audio/Visual Interaction—Basileios Psarras, Andreas Floros, Marianna Strapatsakis, Ionian University - Corfu, Greece
Research interest in modeling everyday human emotions and controlling them through typical multimedia content (i.e., audio and video data) has recently increased. In this paper an interactive methodology is introduced for detecting, controlling, and tracking emotions. Based on this methodology, an interactive audiovisual installation termed “Elevator” was realized, aiming to analyze and manipulate simple emotions of the participants (such as anger) using simplified emotion-detection audio signal processing techniques and specifically selected, combined audio/visual content. As a result, the human emotions are “elevated” to pre-defined levels and appropriately mapped to visual content, which corresponds to the emotional “thumbnail” of the participants.
Convention Paper 7657 (Purchase now)
P2-3 Applications of Bending Wave Technology in Human Interface Devices—Neil Harris, New Transducers Ltd. (NXT) - Cambridge, UK
The application of bending waves to so-called “flat panel loudspeakers” has often been the topic of papers at AES Conventions. This paper looks at other interesting applications of the technology that have, or are beginning to have, commercial pull. These applications are also part of the interface between human and machine, but focus on the sense of touch rather than hearing. The idea of a touch screen is not new, but it is only now becoming ubiquitous with a new generation of devices, typified by the iPhone. If touch sensors are the analog of the microphone, then haptic feedback generators are the analog of the loudspeaker. Bending waves are beginning to find application here too.
Convention Paper 7658 (Purchase now)
P2-4 Designing Auditory Display Menu Interfaces—Cues for Users' Current Location in Extensive Menus—Erik Sikström, Jan Berg, Luleå University of Technology - Luleå, Sweden
This paper reviews current research in auditory display in search of design guidelines for presenting content in audio-only menu interfaces. The aim of the review is to find new directions for auditory display menu interface design. Among several techniques for representing individual menu items, preliminary results show that the spearcon seems to be the most suitable method. For the layout of menu items, studies have shown that spatial separation, different timbres, and staggered onsets between items improve recognition rates, particularly for concurrently presented items. A remaining issue to be investigated is how to remind the user of her current location in the menus of extensive menu interfaces.
Convention Paper 7659 (Purchase now)
P2-5 Symmetry Model-Based Key Finding—Markus Mehnert, Technische Universität Ilmenau - Ilmenau, Germany; Gabriel Gatzsche, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Daniel Arndt, Fraunhofer IIS - Ilmenau, Germany
In this paper we introduce a new key finding algorithm based on the symmetry model introduced by Gatzsche et al. The algorithm consists of two parts. First, the most probable diatonic pitch class set of the musical piece is recognized. Second, the mode of the piece is estimated using one of the subspaces of the symmetry model. The algorithm is evaluated with 100 Beatles songs, 90 newer “Pop and Rock” songs, and 252 classical pieces from the Naxos database. The results are compared to the algorithms of Lerch, Zhu et al., and an algorithm based on binary major and minor chord profiles. The new algorithm achieves the highest overall MIREX’05 key finding score, at 82.9 percent.
Convention Paper 7660 (Purchase now)
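The binary chord-profile baseline mentioned above can be sketched as follows (a hedged illustration, not the symmetry-model algorithm; the use of a harmonic-minor template is an assumption): a 12-bin chroma vector is correlated against binary major and minor templates in all 12 transpositions, and the best-matching transposition and mode are returned.

```python
import numpy as np

def key_by_binary_profiles(chroma):
    # Binary pitch-class templates rooted at pitch class 0.
    major = np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1], float)  # diatonic major
    minor = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1], float)  # harmonic minor
    best, best_score = (0, 'major'), -np.inf
    for pc in range(12):
        for name, tpl in (('major', major), ('minor', minor)):
            # Correlate the chroma vector with the transposed template.
            score = np.corrcoef(chroma, np.roll(tpl, pc))[0, 1]
            if score > best_score:
                best, best_score = (pc, name), score
    return best  # (tonic pitch class, mode)
```

A harmonic-minor rather than natural-minor template is used here so that major and relative minor do not share an identical pitch-class set.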
Recording, Reproduction, and Delivery
Thursday, May 7, 10:00 — 11:30
P3-1 Audio Content Annotation, Description, and Management Using Joint Audio Detection, Segmentation, and Classification Techniques—Christos Vegiris, Charalambos Dimoulas, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
This paper focuses on audio content management by means of joint audio segmentation and classification. We concentrate on the separation of typical audio classes, such as silence/background noise, speech, crowded speech, music, and their combinations. A compact feature-vector subset is selected by a correlation-based feature subset evaluation algorithm after applying the EM clustering algorithm to an initial audio data set. Time and spectral parameters are extracted using filter banks and wavelets in combination with sliding windows and exponential moving averaging techniques. Features are extracted on a point-to-point basis, using the finest possible time resolution, so that each sample can be individually classified into one of the available groups. Clustering algorithms such as EM or simple k-means are tested to evaluate the final point-to-point classification result, and thereby the joint audio detection-classification indexes. The extracted audio detection, segmentation, and classification results can be incorporated into appropriate description schemes that annotate audio events/segments for content description and management purposes.
Convention Paper 7661 (Purchase now)
P3-2 Ambience Sound Recording Utilizing Dual MS (Mid-Side) Microphone Systems Based upon Frequency Dependent Spatial Cross Correlation (FSCC) [Part 3: Consideration of Microphones’ Locations]—Teruo Muraoka, Takahiro Miura, Tohru Ifukube, University of Tokyo - Tokyo, Japan
In order to achieve ambient and accurately localized musical recording with fewer microphones, we studied the sound acquisition performance of microphone arrangements using their Frequency Dependent Spatial Cross Correlation (FSCC). We found that an MS microphone is best suited for this purpose: setting the microphone's directional azimuth to 132 degrees is best for ambient sound acquisition, and setting it to 120 degrees is best for on-stage sound acquisition. We conducted actual concert recordings with a combination of such MS microphones (dual MS microphone systems) and obtained satisfactory results. We then studied the proper positions for these microphones. For ambient sound acquisition, suspending the microphone at the center of the concert hall is favorable; for on-stage sound acquisition, locating it almost directly above the conductor's position is also satisfactory. The course of these studies is reported.
Convention Paper 7662 (Purchase now)
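The mid-side principle underlying these systems decodes the left and right channels as the sum and difference of the mid and side capsule signals, and the stereo width can be varied after the fact by scaling the side component. A minimal sketch:

```python
import numpy as np

def ms_decode(mid, side, width=1.0):
    # Standard mid-side decoding: left/right are the sum and
    # difference of the mid and side signals. 'width' scales the
    # side component: 0 collapses to mono, 1 is nominal width.
    left = mid + width * side
    right = mid - width * side
    return left, right
```

Summing the decoded channels (left + right = 2 × mid) recovers the mid signal, which is why MS recordings remain mono-compatible.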
P3-3 A Comparative Approach to Sound Localization within a 3-D Sound Field—Martin J. Morrell, Joshua D. Reiss, Queen Mary, University of London - London, UK
In this paper we compare different methods for sound localization around and within a 3-D sound field. The first objective is to determine which form of panning is consistently preferred for panning sources around the loudspeaker array. The second objective and main focus of the paper is localizing sources within the loudspeaker array. We seek to determine if the sound sources can be located without movement or a secondary reference source. The authors compare various techniques based on ambisonics, vector base amplitude panning and time delay based panning. We report on subjective listening tests that show which method of panning is preferred by listeners and rate the success of panning within a 3-D loudspeaker array.
Convention Paper 7663 (Purchase now)
P3-4 The Effect of Listening Room on Audio Quality in Ambisonics Reproduction—Olli Santala, Helsinki University of Technology - Espoo, Finland; Heikki Vertanen, Helsinki University of Technology - Espoo, Finland, University of Helsinki, Helsinki, Finland; Jussi Pekonen, Jan Oksanen, Ville Pulkki, Helsinki University of Technology - Espoo, Finland
In multichannel reproduction of spatial audio with first-order Ambisonics the loudspeaker signals are relatively coherent, which produces prominent coloration. The coloration artifacts have been suggested to depend on the acoustics of the listening room. This dependency was researched with subjective listening tests in an anechoic chamber with an octagonal loudspeaker setup. Different virtual listening rooms were created by adding diffuse reverberation with 0.25 seconds RT60 using a 3-D 16-channel loudspeaker setup. In the test, the subjects compared the audio quality in the virtual rooms. The results suggest that optimal audio quality was obtained when the virtual room effect and the direct sound were on equal level at the listening position.
Convention Paper 7664 (Purchase now)
P3-5 Ontology-Based Information Management in Music Production—Gyorgy Fazekas, Mark Sandler, Queen Mary, University of London - London, UK
In information management, ontologies are used for defining the concepts and relationships of a domain in question. The use of a schema permits structuring, interoperability, and automatic interpretation of data, and thus allows information to be accessed by means of complex queries. In this paper we use ontologies to associate metadata, captured during music production, with explicit semantics. The collected data is used for finding audio clips processed in a particular way, for instance, using engineering procedures or acoustic signal features. As opposed to existing metadata standards, our system builds on the Resource Description Framework, the data model of the Semantic Web, which provides flexible and open-ended knowledge representation. Using this model, we demonstrate a framework for managing information relevant to music production.
Convention Paper 7665 (Purchase now)
Recording, Reproduction, and Delivery
Thursday, May 7, 14:00 — 18:30
Chair: Joerg Wuttke
Siegfried Linkwitz, Linkwitz Lab
P4-1 An Expert in Absentia: A Case-Study for Using Technology to Support Recording Studio Practice—Andrew King, University of Hull - Scarborough, North Yorkshire, UK
This paper examines the use of a Learning Technology Interface (LTI) to support the completion of a recording workbook with audio examples over a ten-week period. The LTI provided contingent support to studio users for technical problems encountered in the completion of four recording tasks. Previous research has investigated how students collaborate and problem-solve during a short session in the recording studio using technology as a contingent support tool. In addition, online message boards have been used to record problems encountered when completing a prescribed task (critical-incident recording). A mixed-methods case study approach was used in this study. The students' interactions within the LTI were logged (i.e., frequency, time, duration, type of support), and their feedback was elicited via a user questionnaire at the end of the project. The data from this study demonstrate that learning technology can be a successful support tool, and also highlight the frequency and themes of the recording-practice information accessed by the learners.
Convention Paper 7669 (Purchase now)
P4-2 Recording and Reproduction over Two Loudspeakers as Heard Live—Part 1: Hearing, Loudspeakers, and Rooms—Siegfried Linkwitz, Linkwitz Lab - Corte Madera, CA, USA; Don Barringer, Linkwitz Lab - Arlington, VA, USA
Innate hearing processes define the realism that can be obtained from reproduced sound. An unspecified system with two loudspeakers in a room places considerable limitations upon the degree of auditory realism that can be obtained. It has been observed that loudspeakers and room must be hidden from the auditory scene that is evoked in the listener’s brain. Requirements upon the polar response and the output volume capability of the loudspeaker will be discussed. Problems and solutions in designing a three-way, open baffle loudspeaker with piston drivers will be presented. Loudspeakers and listener must be symmetrically placed in the room to minimize the effects of reflections upon the auditory illusion.
Convention Paper 7670 (Purchase now)
P4-3 Recording and Reproduction over Two Loudspeakers as Heard Live—Part 2: Recording Concepts and Practices—Don Barringer, Linkwitz Lab - Arlington, VA, USA; Siegfried Linkwitz, Linkwitz Lab - Corte Madera, CA, USA
For a half century, the crucial interaction between recording engineer and monitor loudspeakers during two-channel stereophonic recording has not been resolved, leaving the engineer to cope with uncertainties. However, recent advances in defining and improving this loudspeaker-room-listener interface have finally allowed objectivity to inform and shape the engineer’s choices. The full potential of the two-channel format is now accessible to the recording engineer, and in a room that is just as normal as most consumers’ rooms. The improved reproduction has also allowed a deeper understanding of the merits and limits of spaced and coincident/near-coincident microphone arrays. As a result of these and earlier observations, a four-microphone array was conceived that exploits natural hearing processes to achieve greater auditory realism from two loudspeakers. A number of insights have emerged from the experiments.
Convention Paper 7671 (Purchase now)
P4-4 Vision and Technique behind the New Studios and Listening Rooms of the Fraunhofer IIS Audio Laboratory—Andreas Silzle, Stefan Geyersberger, Gerd Brohasga, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Dieter Weninger, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany, Innovationszentrum für Telekommunikationstechnik GmbH IZT, Erlangen, Germany; Michael Leistner, Fraunhofer Institute for Building Physics IBP - Stuttgart, Germany
The new audio laboratory rooms of the Fraunhofer IIS and their technical design are presented here. The vision behind them is driven by the very high demands of a leading-edge audio research organization with more than 100 scientists and engineers. The 300 m² sound studio complex was designed to provide capabilities that are, in combination, far more extensive than those available in common audio research or production facilities. The reproduction room for listening tests follows the strict recommendations of ITU-R BS.1116. The results of the qualification measurements regarding direct sound, reflected sound, and the steady-state sound field are shown, and the construction efforts needed to achieve these values are explained. The connection from all the computers in the server room to more than 70 loudspeakers in the reproduction rooms, other audio interfaces, and the projection screens is handled by an audio and video routing system. The architecture of the advanced control software of this routing system is presented. It allows easy and flexible access for each class of user to all the possibilities made available by this completely new system.
Convention Paper 7672 (Purchase now)
P4-5 Advances in National Broadcaster Networks: Exploring Transparent High Definition IPTV—Matthew O’Donnell, British Sky Broadcasting - Upminster, UK
British commercial broadcasters are increasing their ability to determine the quality of audio-over-IP distribution by acquiring and installing next-generation national Gigabit networks. This paper explores how broadcasters can use advances in broadband technology to transparently integrate supplemental on-demand IPTV services with traditional broadcast transport, giving them confidence in achieving scalable carrier-class quality of service for the delivery of high-definition media direct to the customer's set-top box.
Convention Paper 7673 (Purchase now)
P4-6 Multi-Perspective Surround Sound Audio Recording—Mark J. Sarisky, The University of Texas at Austin - Austin, TX, USA
With the advent of Blu-ray Disc Audio (BD-Audio), high-resolution uncompressed audio recordings can be presented as a consumer product in a variety of surround sound formats. This paper proposes a new take on the recording of live and studio music in surround sound that allows the consumer to benefit from the large capacity of the BD-Audio disc and enjoy the recording from multiple listening perspectives.
Convention Paper 7674 (Purchase now)
P4-7 Sound Intensity-Based Three-Dimensional Panning—Akio Ando, Kimio Hamasaki, NHK Science and Technical Research Laboratories - Setagaya, Tokyo, Japan
Three-dimensional (3-D) panning equipment is essential for the production of 3-D audio content. We have already proposed an algorithm to enable such panning. It generates the input signal to be fed into multichannel loudspeakers so as to realize the same physical properties of sound at the receiving point as those created by a single loudspeaker model of the virtual source. A sound pressure vector is used as the physical property. This paper proposes a new method that uses sound intensity instead of the sound pressure vector and shows that both conventional “vector base amplitude panning” and our previous method come very close to achieving coincidence of sound intensity. A new panning method using four loudspeakers is also proposed.
Convention Paper 7675 (Purchase now)
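The conventional vector base amplitude panning (VBAP) used as a comparison above can be sketched in two dimensions: the gains of the active loudspeaker pair are found by expressing the source direction in the base formed by the two loudspeaker direction vectors, then normalizing for constant power.

```python
import numpy as np

def vbap_2d(source_az, spk_az_1, spk_az_2):
    # Base matrix: columns are the unit direction vectors of the
    # two loudspeakers (angles in radians).
    L = np.array([[np.cos(spk_az_1), np.cos(spk_az_2)],
                  [np.sin(spk_az_1), np.sin(spk_az_2)]])
    # Source direction as a unit vector.
    p = np.array([np.cos(source_az), np.sin(source_az)])
    # Solve L @ g = p for the loudspeaker gains ...
    g = np.linalg.solve(L, p)
    # ... and normalize for constant power.
    return g / np.linalg.norm(g)
```

A centered source between a symmetric pair receives equal gains; a source at a loudspeaker position receives all of the gain, which is the amplitude-panning behavior the intensity-based method above is compared against.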
P4-8 A Practical Comparison of Three Tetrahedral Ambisonic Microphones—Dan Hemingson, Mark Sarisky, The University of Texas at Austin - Austin, TX, USA
This paper compares two low-cost tetrahedral ambisonic microphones, an experimental microphone, and a Core Sound TetraMic with a Soundfield MKV or SPS422B serving as a standard for comparison. Recordings were made in natural environments of live performances, in a recording studio, and in an anechoic chamber. The results of analytical and direct listening tests of these recordings are discussed in this paper. A description of the experimental microphone and the recording setup is included.
Convention Paper 7676 (Purchase now)
P4-9 A New Reference Listening Room for Consumer, Professional, and Automotive Audio Research—Sean Olive, Harman International - Northridge, CA, USA
This paper describes the features, scientific rationale, and acoustical performance of a new reference listening room designed for the purpose of conducting controlled listening tests and psychoacoustic research for consumer, professional, and automotive audio products. The main features of the room include quiet and adjustable room acoustics, a high-quality calibrated playback system, an in-wall loudspeaker mover, and complete automated control of listening tests performed in the room.
Convention Paper 7677 (Purchase now)
Thursday, May 7, 14:00 — 18:00
Chair: John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada
P5-1 Estimating the Velocity Profile and Acoustical Quantities of a Harmonically Vibrating Loudspeaker Membrane from On-Axis Pressure Data—Ronald M. Aarts, Philips Research Europe - Eindhoven, The Netherlands, Technical University of Eindhoven, Eindhoven, The Netherlands; Augustus J. Janssen, Philips Research Europe - Eindhoven, The Netherlands
Formulas are presented for acoustical quantities of a harmonically excited, resilient, flat, circular loudspeaker in an infinite baffle: the on-axis and far-field sound pressure, the directivity, and the total radiated power. These quantities are obtained by expanding the velocity distribution in terms of orthogonal polynomials. For rigid and non-rigid radiators, this yields explicit series expressions for both the on-axis and far-field pressure. In the reverse direction, a method is described for estimating velocity distributions from (measured) on-axis pressures by matching in terms of expansion coefficients. Together with the forward far-field computation scheme, this yields a method for assessing loudspeakers in the far field, and the total radiated power, from (relatively near-field) on-axis data (a generalized Keele scheme).
Convention Paper 7678 (Purchase now)
P5-2 Testing and Simulation of a Thermoacoustic Transducer Prototype—Fotios Kontomichos, Alexandros Koutsioubas, John Mourjopoulos, Nikolaos Spiliopoulos, Alexandros Vradis, Stamatis Vassilantonopoulos, University of Patras - Patras, Greece
Thermoacoustic transduction is the transformation of thermal energy fluctuations into sound. Devices fabricated by appropriate materials utilize such a mechanism in order to achieve acoustic wave generation by direct application of an electrical audio signal and without the use of any moving components. A thermoacoustic transducer causes local vibration of air molecules resulting in a proportional pressure change. The present paper studies an implementation of this alternative audio transduction technique for a prototype developed on silicon wafer. Measurements of the performance of this hybrid solid state device are presented and compared to the theoretical principles of its operation, which are evaluated via simulations.
Convention Paper 7679 (Purchase now)
P5-3 Analysis of Viscoelasticity and Residual Strains in an Electrodynamic Loudspeaker—Ivan Djurek, Antonio Petosic, University of Zagreb - Zagreb, Croatia; Danijel Djurek, Alessandro Volta Applied Ceramics (AVAC) - Zagreb, Croatia
An electrodynamic loudspeaker was analyzed in three steps: (a) as a device supplied by the market, (b) with the upper suspension removed, and (c) as a dismantled assembly consisting only of the vibrating spider and voice coil. In each step, resonant frequency and stiffness were measured dynamically for driving currents up to 100 mA, while stiffness was also measured quasi-statically by the use of calibrated masses. It was found that the widely quoted effect of decreasing resonant frequency, plotted against driving current, comes from residual strain in the vibrating material, with a significant contribution associated with the spider. When the driving current increases, the residual strain is gradually compensated, giving rise to a minimum of stiffness; the further increase of resonant frequency is attributed to a common nonlinearity in the forced vibrating system.
Convention Paper 7680 (Purchase now)
P5-4 Forces in Cylindrical Metalized Film Audio Capacitors—Philip J. Duncan, University of Salford, Greater Manchester, UK; Nigel Williams, Paul S. Dodds, ICW Ltd. - Wrexham, Wales, UK
This paper is concerned with the analysis of forces acting in metalized polypropylene film capacitors in use in loudspeaker crossover circuits. Capacitors have been subjected to rapid discharge measurements to investigate mechanical resonance of the capacitor body and the electrical forces that drive the resonance. The force due to adjacent flat current sheets has been calculated in order that the magnitude of the electro-dynamic force due to the discharge current can be calculated and compared with the electrostatic force due to the potential difference between the capacitor plates. The electrostatic force is found to be dominant by several orders of magnitude, contrary to assumptions in previous work where the electro-dynamic force is assumed to be dominant. The capacitor is then modeled as a series of concentric cylindrical conductors and the distribution of forces within the body of the capacitor is considered. The primary outcome of this is that the electrostatic forces act predominantly within the inner and outer turn of the capacitor body, while all of the forces acting within the body of the capacitor are balanced almost to zero. Experimental results where resonant acoustic emissions have been measured and analyzed are presented and discussed in the context of the model proposed.
Convention Paper 7682 (Purchase now)
P5-5 On the Use of Motion Feedback as Used in 4th Order Systems—Stefan Willems, Denon & Marantz Holding, Premium Sound Solutions - Leuven, Belgium; Guido D’Hoogh, Retired
Class D amplification allows the design of compact, very high-power amplifiers with high efficiency. Such amplifiers are excellent candidates for use in compact high-powered subwoofers. The drawback of compact subwoofers is the nonlinear compression of the air inside the (acoustically) small box. Fourth-order systems are beneficial over 2nd-order systems due to their increased efficiency. To combine the best of both worlds, a 4th-order design with an acoustically small enclosure, a feedback mechanism has been developed to reduce the nonlinear distortion found in compact high-powered subwoofers. Acceleration feedback on woofer systems is traditionally used in 2nd-order systems. This paper discusses the use of an acceleration and velocity feedback system applied to a 4th-order system.
Convention Paper 7683 (Purchase now)
P5-6 Mapping of the Loudspeaker Emission by the Use of Anemometric Method—Danijel Djurek, Alessandro Volta Applied Ceramics (AVAC) - Zagreb, Croatia; Ivan Djurek, Antonio Petosic, University of Zagreb - Zagreb, Croatia
Lateral wire anemometry (LWA) has been developed for recording air vibration. Standard anemometry is based on the hot-wire method, in which the wire temperature, in the range 800–1000 °C, changes with the oscillating air velocity; this is less suitable because of the heat emitted by the wire itself. LWA deals only with the initial slope of the changing wire resistance, and subsequent Fourier analysis enables measurement of the periodic air velocity. The probe has been developed for precise mapping of the air velocity field in front of the membrane, and the local power emission of the membrane may be evaluated over a region of about 0.15 cm².
Convention Paper 7684 (Purchase now)
P5-7 Flat Panel Loudspeaker Consisting of an Array of Miniature Transducers—Daniel Beer, Stephan Mauer, Sandra Brix, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Jürgen Peissig, Sennheiser Electronic GmbH & Co. KG - Wedemark, Germany
Multichannel audio reproduction systems such as Wave Field Synthesis (WFS) use a large number of small, closely spaced loudspeakers. The successful use of WFS requires, among other things, the "invisible" integration of loudspeakers into a room. Compared with conventional cone loudspeakers, flat panel loudspeakers offer advantages for space-saving room integration because of their low depth. Flat panel loudspeakers can thus be found in furniture, in media devices, or hung on the wall like pictures. Besides ease of integration, flat loudspeakers should provide at least the same acoustical performance as conventional loudspeakers. This is a challenge, because the low depth negatively influences the reproduction quality in the low and middle frequency range. This paper presents a new flat panel loudspeaker consisting of an array of miniature transducers.
Convention Paper 7685 (Purchase now)
P5-8 Subwoofer Loudspeaker System with Dynamic Push-Pull Drive—Drazenko Sukalo, DSLab–Device Solution Laboratory - Munich, Germany
This paper examines the influence of mutual coupling between two driver diaphragms, driven by two electrical signals with a 90° phase shift between them, on the voice-coil impedance curve. A new model of the system is described, and the effects are observed using the electrical circuit simulator PSpice. Finally, predicted and measured values are presented.
Convention Paper 7686 (Purchase now)
Thursday, May 7, 14:00 — 15:30
P6-1 Adaptive Predictive Modeling of Stereo LPC with Application to Lossless Audio Compression—Florin Ghido, Ioan Tabus, Tampere University of Technology - Tampere, Finland
We propose a novel method for exploiting the redundancy of stereo linear prediction coefficients by using adaptive linear prediction on the coefficients themselves. We show that a significant proportion of the stereo linear prediction coefficients, in both the intrachannel and the interchannel parts, still contains redundancy inherited from the signal. We can therefore significantly reduce the amplitude range of those LP coefficients by using adaptive linear prediction with orders up to 4, separately on the intrachannel and interchannel parts. When integrated into asymmetrical OptimFROG, the new method obtains on average a 0.29 percent improvement in compression with a negligible increase in decoder complexity.
Convention Paper 7666 (Purchase now)
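The core idea, that smoothly evolving LPC coefficients can themselves be predicted across frames, can be sketched as follows (a minimal illustration with a fixed linear extrapolator standing in for the paper's adaptive predictor; the function and its parameters are hypothetical, not the OptimFROG implementation):

```python
# Sketch: shrink the amplitude range of frame-by-frame LPC coefficients
# by predicting each coefficient from its values in preceding frames.

def predict_coeffs(frames, order=2):
    """frames: list of LPC coefficient vectors, one per analysis frame.
    Returns residual vectors; when the coefficients evolve smoothly,
    the residuals have a much smaller range than the raw coefficients."""
    residuals = []
    history = []
    for frame in frames:
        if len(history) < order:
            pred = history[-1] if history else [0.0] * len(frame)
        else:
            # fixed predictor: linear extrapolation from the last two
            # frames (a stand-in for an adaptive predictor of order <= 4)
            pred = [2 * a - b for a, b in zip(history[-1], history[-2])]
        residuals.append([c - p for c, p in zip(frame, pred)])
        history.append(frame)
    return residuals
```

A coder would then entropy-code the small residuals instead of the raw coefficients; the decoder reverses the prediction at negligible cost.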
P6-2 A Study of MPEG Surround Configurations and Its Performance Evaluation—Evelyn Kurniawati, Samsudin Ng, Sapna George, ST Microelectronics Asia Pacific Pte. Ltd. - Singapore
The standardization of MPEG Surround in 2007 opened a new range of possibilities for low bit rate multichannel audio encoding. While ensuring backward compatibility with legacy decoders, MPEG Surround offers various configurations for upmixing to the desired number of channels. The downmix stream, which can be in mono or stereo format, can be passed to a transform, hybrid, or any other type of encoder. These options give more than one possible combination for encoding a multichannel stream at a specific bit rate. This paper presents a comparative study of those options in terms of quality performance, which will help in choosing the most suitable configuration of MPEG Surround over a range of operating bit rates.
Convention Paper 7667 (Purchase now)
P6-3 Lossless Compression of Spherical Microphone Array Recordings—Erik Hellerud, U. Peter Svensson, Norwegian University of Science and Technology - Trondheim, Norway
The amount of spatial redundancy for recordings from a spherical microphone array is evaluated using a low delay lossless compression scheme. The original microphone signals, as well as signals transformed to the spherical harmonics domain, are investigated. It is found that the correlation between channels is, as expected, very high for the microphone signals, in several different acoustical environments. For the signals in the spherical harmonics domain, the compression gain from using inter-channel prediction is reduced, since this conversion results in many channels with low energy. Several alternatives for reducing the coding complexity are also investigated.
Convention Paper 7668 (Purchase now)
Spatial Audio Processing
Thursday, May 7, 16:30 — 18:00
P7-1 Low Complexity Binaural Rendering for Multichannel Sound—Kangeun Lee, Changyong Son, Dohyung Kim, Samsung Advanced Institute of Technology - Suwon, Korea
This paper is concerned with an effective method for emulating multichannel sound in portable environments where low power consumption is required. The goal is to reduce the complexity of binaural rendering from multichannel to stereo on portable devices. To achieve this, we propose modified discrete cosine transform (MDCT) based binaural rendering, combined with the Dolby Digital (AC-3) multichannel audio decoder. A reverberation algorithm is added to bring the result closer to real sound. This combined structure is implemented on a DSP processor. Complexity and quality are compared with a conventional head-related transfer function (HRTF) filtering method and with Dolby Headphone, currently the most widespread commercial binaural rendering technologies, demonstrating significant complexity reduction and sound quality comparable to Dolby Headphone.
Convention Paper 7687 (Purchase now)
P7-2 Optimal Filtering for Focused Sound Field Reproductions Using a Loudspeaker Array—Youngtae Kim, Sangchul Ko, Jung-Woo Choi, Jungho Kim, SAIT, Samsung Electronics Co., Ltd. - Gyeonggi-do, Korea
This paper describes audio signal processing techniques for designing multichannel filters that reproduce an arbitrary spatial directivity pattern with a typical loudspeaker array. In designing the multichannel filters, design criteria based on, for example, least-squares methods and the maximum-energy array are introduced as non-iterative optimization techniques with low computational complexity. The abilities of the criteria are first evaluated, for a given loudspeaker configuration, in reproducing a desired acoustic property in a spatial area of interest. Additional constraints are then imposed to minimize the error between the amplitudes of the actual and the desired spatial directivity patterns. Limitations in practical applications are revealed by experimental demonstrations, and finally some guidelines for designing optimal filters are proposed.
Convention Paper 7688 (Purchase now)
P7-3 Single-Channel Sound Source Distance Estimation Based on Statistical and Source-Specific Features—Eleftheria Georganti, Philips Research Europe - Eindhoven, The Netherlands, University of Patras, Patras, Greece; Tobias May, Technische Universiteit Eindhoven - Eindhoven, The Netherlands; Steven van de Par, Aki Härmä, Philips Research Europe - Eindhoven, The Netherlands; John Mourjopoulos, University of Patras - Patras, Greece
In this paper we study the problem of estimating the distance of a sound source from a single microphone recording in a room environment. The room effect cannot be separated from the problem without making assumptions about the properties of the source signal. Therefore, it is necessary to develop methods of distance estimation separately for different types of source signals. In this paper we focus on speech signals. The proposed solution is to compute a number of statistical and source-specific features from the speech signal and to use pattern recognition techniques to develop a robust distance estimator for speech signals. Experiments with a database of real speech recordings showed that the proposed model is capable of estimating source distance with acceptable performance for applications such as ambient telephony.
Convention Paper 7689 (Purchase now)
P7-4 Implementation of DSP-Based Adaptive Inverse Filtering System for ECTF Equalization—Masataka Yoshida; Haruhide Hokari; Shoji Shimada, Nagaoka University of Technology - Nagaoka, Niigata, Japan
The Head-Related Transfer Function (HRTF) and the inverse Ear Canal Transfer Function (ECTF) must be accurately determined if stereo earphones are to realize out-of-head sound localization (OHL) with high presence. However, the characteristics of the ECTF depend on the type of earphone used and on the number of earphone mounting and demounting operations. We therefore present a DSP-based adaptive inverse filtering system for ECTF equalization. The buffer composition and size on the DSP were studied so as to implement the required processing. As a result, we succeeded in constructing a system that works over an audio band of 15 kHz at a sampling frequency of 44.1 kHz. Listening tests clarified that the effective estimation error of the adaptive inverse ECTF for OHL was less than –11 dB, with a convergence time of about 0.3 seconds.
Convention Paper 7690 (Purchase now)
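As a rough sketch of the adaptive inverse filtering idea, the following toy example identifies a delayed inverse of a short FIR "ear canal" with an NLMS filter (this is not the authors' DSP implementation; the system, step size, tap count, and delay are illustrative assumptions):

```python
import random

def lms_inverse(system, n_taps=8, delay=2, mu=0.5, n_samples=5000):
    """NLMS adaptation of an inverse filter for `system` (an FIR toy
    stand-in for an ear canal transfer function). A delayed copy of the
    excitation is the desired signal, so the filter converges toward a
    delayed inverse of the system."""
    random.seed(0)
    w = [0.0] * n_taps            # adaptive inverse-filter taps
    x_buf = [0.0] * len(system)   # excitation history (system input)
    y_buf = [0.0] * n_taps        # system-output history (filter input)
    d_buf = [0.0] * (delay + 1)   # delay line for the desired signal
    err = 0.0
    for _ in range(n_samples):
        x = random.uniform(-1.0, 1.0)                      # white excitation
        x_buf = [x] + x_buf[:-1]
        y = sum(h * xi for h, xi in zip(system, x_buf))    # through the "ECTF"
        y_buf = [y] + y_buf[:-1]
        d_buf = [x] + d_buf[:-1]
        d = d_buf[-1]                                      # delayed clean input
        out = sum(wi * yi for wi, yi in zip(w, y_buf))     # inverse-filter output
        err = d - out
        norm = sum(yi * yi for yi in y_buf) + 1e-9
        w = [wi + mu * err * yi / norm for wi, yi in zip(w, y_buf)]  # NLMS update
    return w, err
```

For a minimum-phase toy system such as `[1.0, 0.4]`, the taps converge toward the truncated series of the delayed inverse and the residual error becomes small, mirroring the sub–11 dB estimation error and sub-second convergence reported above.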
P7-5 Improved Localization of Sound Sources Using Multi-Band Processing of Ambisonic Components—Charalampos Dimoulas, George Kalliris, Konstantinos Avdelidis, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
The current paper focuses on the use of multi-band ambisonic processing for improved sound source localization. Energy-based localization can be easily delivered using soundfield microphone pairs, as long as free-field conditions and the single omnidirectional point-source model apply. Multi-band SNR-based selective processing improves the noise tolerance and the localization accuracy, eliminating the influence of reverberation and background noise. Band-related sound-localization statistics are further exploited to verify single or multiple sound-source scenarios, while continuous spectral fingerprinting indicates the potential arrival of a new source. Different sound-excitation scenarios are examined (single/multiple sources, narrowband/wideband signals, time-overlapping, noise, reverberation). Various time-frequency analysis schemes are considered, including filter banks, windowed FFT, and wavelets with different time resolutions. Evaluation results are presented.
Convention Paper 7691 (Purchase now)
P7-6 Spatial Audio Content Management within the MPEG-7 Standard of Ambisonic Localization and Visualization Descriptions—Charalampos Dimoulas, George Kalliris, Kostantinos Avdelidis, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
The current paper focuses on spatial audio video/imaging and sound field visualization using ambisonic processing, combined with MPEG-7 description schemes for multimodal content description and management. Sound localization can be easily delivered using multi-band ambisonic processing under free-field and single point-source excitation conditions, offering an estimate of the achieved accuracy. Sound source forward-propagation models can be applied when confident localization accuracy has been achieved, to visualize the corresponding sound field. Otherwise, 3-D audio/surround sound reproduction simulation can be used instead. In either case, sound level distribution colormap videos and highlighting images can be extracted. MPEG-7 adapted description schemes are proposed for spatial-audio audiovisual content description and management, facilitating a variety of user-interactive postprocessing applications.
Convention Paper 7692 (Purchase now)
Friday, May 8, 09:00 — 11:30
Chair: Ronald M. Aarts
P8-1 Phase Velocity and Group Velocity in Cylindrical and Spherical Waves—Ian M. Dash, Australian Broadcasting Corporation - Sydney, NSW, Australia; Fergus R. Fricke, University of Sydney - Sydney, NSW, Australia
Closed-form expressions are derived for phase velocity and group velocity in cylindrical and spherical sound waves. These are plotted and compared for orders 0, 1, and 2, but the expressions are general and may be applied to waves of any order. Dispersion characteristics of these waves are examined and discussed. The implications for thermodynamic applicability of the wave equation and for application of Huygens’ principle are discussed.
Convention Paper 7693 (Purchase now)
P8-2 Selection of Loudspeaker Positions for Reverberation Time and Sound Field Measurements—Elena Prokofieva, Napier University - Edinburgh, UK
According to various building standards, the source loudspeakers and receiving microphones for internal noise level measurements can be placed “in any convenient position,” with only some restrictions on the distance from the nearest reflecting surfaces. In rooms of different shapes and volumes, the locations of the source and receiver microphone may significantly affect the measured results. If the difference between the reverberation times or noise levels measured at two positions in the same room exceeds 10 percent, they cannot be averaged. A simulation program was created to recommend the most suitable locations for the microphone and loudspeakers in a tested room for reverberation time measurements. The results of a series of tests are analyzed to confirm the results of the simulation.
Convention Paper 7694 (Purchase now)
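The 10 percent averaging criterion mentioned above amounts to a one-line check (expressing the relative difference against the larger of the two measurements is an assumption made for this illustration):

```python
def can_average(rt_a, rt_b, tolerance=0.10):
    """Two measurement positions may be averaged only if their
    reverberation times (or noise levels) differ by at most 10 percent,
    relative to the larger of the two values."""
    return abs(rt_a - rt_b) / max(rt_a, rt_b) <= tolerance
```

For example, 1.00 s and 1.05 s may be averaged, while 1.00 s and 1.20 s may not.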
P8-3 A Rehearsal Hall with Virtual Acoustics for Symphony Orchestras—Tapio Lokki, Jukka Pätynen, Helsinki University of Technology - Espoo, Finland; Timo Peltonen, Olli Salmensaari, Akukon Consulting Engineers Ltd. - Helsinki, Finland
A solution for constructing a small rehearsal hall whose acoustics resemble the stage of a large concert hall is presented. The implemented system was evaluated both objectively with measurements and subjectively by collecting feedback from musicians. The subjective opinions were very positive and encouraging, and the main objective was achieved. The electroacoustically enhanced rehearsal space sounded like a much bigger hall, although the sound pressure level increased by less than one decibel. The presented solution is applicable in all spaces that are not very reverberant by nature and whose height is at least twice the standard room height.
Convention Paper 7695 (Purchase now)
P8-4 Sound Field Characterization and Absorption Measurement of Wideband Absorbers—Soledad Torres-Guijarro, Laboratorio Oficial de Metroloxía de Galicia (LOMG) - Ourense, Spain; Antonio Pena, Alfonso Rodríguez-Molares, Norberto Degara-Quintela, Universidad de Vigo - Vigo, Spain
Wideband absorbers are a fundamental part of non-environment control rooms. They consist of large angled hanging panels in conjunction with a multilayer wall or ceiling. Their absorption capacity is very noticeable, mostly in the low frequency range. In this paper the absorption mechanisms of the wideband absorbers on the rear wall of the control room at the Universidad de Vigo will be studied. Conclusions will be drawn from the analysis of pressure, volume velocity, and intensity measurements performed in the vicinity of the panels, and from the computation of the normal specific acoustic impedance and the normal absorption coefficient.
Convention Paper 7696 (Purchase now)
P8-5 Temporal Matching of 2-D and 3-D Wave-Based Acoustic Modeling for Efficient and Realistic Simulation of Rooms—Jeremy J. Wells, Damian T. Murphy, Mark Beeson, University of York - York, UK
Methods for adapting the output of a two-dimensional Kirchhoff-variable digital waveguide mesh to better match that of a 3-D mesh, both of which are intended to model the same acoustic space, are presented. Details of the methods, including output quality and computational demands, are given, along with details of how they are incorporated into the hybrid system within which they are employed.
Convention Paper 7697 (Purchase now)
Signal Analysis, Measurements, Restoration
Friday, May 8, 09:00 — 12:30
Chair: Jan Abildgaard Pedersen
P9-1 Some Improvements of the Playback Path of Wire Recorders—Nadja Wallaszkovits, Phonogrammarchiv Austrian Academy of Sciences - Vienna, Austria; Heinrich Pichler, Audio Consultant - Vienna, Austria
The archival transfer of wire recordings to the digital domain is a highly specialized process that involves a wide range of specific challenges. One of the basic problems is the format incompatibility between different manufacturers and models. The paper discusses the special design philosophy of using a tone control network in the record path as well as in the playback path. This tone control circuit causes additional phase and group delay distortions. The influence and characteristics of the tone control (which was not present in every model) are discussed, analog phase correction networks are described, and the correction of phase errors is outlined. As this format has been obsolete for many decades, a high quality archival transfer can only be achieved by modifying dedicated equipment. The authors propose some possible modifications and improvements of the playback path of wire recorders, such as signal pickup directly after the playback head into a high quality preamplifier, followed by analog phase correction and correction of the amplitude characteristics. Alternatively, signal pickup directly after the playback head into a high quality preamplifier, followed by digital signal processing to optimize the output signal, is discussed.
Convention Paper 7698 (Purchase now)
P9-2 Acoustics of the Crime Scene as Transmitted by Mobile Phones—Eddy B. Brixen, EBB-consult - Smorum, Denmark
One task of the audio forensics engineer is to extract background information from audio recordings. A major problem is the assessment of telephone calls in general and of mobile phone calls (which use LPC-based algorithms) in particular. This paper first explains what kind of acoustic information can be extracted from a recorded phone call. The parameters used for the characterization of the various acoustic spaces and events in question are described, and it is discussed how the acoustical cues should be assessed. The validity of acoustic analyses carried out in the attempt to provide crime scene information, such as reverberation time, is presented.
Convention Paper 7699 (Purchase now)
P9-3 Silence Sweep: A Novel Method for Measuring Electroacoustical Devices—Angelo Farina, University of Parma - Parma, Italy
This paper presents a new method for measuring some properties of an electroacoustical system, for example a loudspeaker or a complete sound system. Coupled with the already established method based on the Exponential Sine Sweep, this new Silence Sweep method provides a quick and complete characterization of the nonlinear distortion and noise of the device under test. The method is based on the analysis of the distortion products, such as harmonic distortion products or intermodulation effects, that occur when the system is fed with a wide-band signal. By removing a small portion of the whole spectrum from the test signal, it becomes possible to collect and analyze the nonlinear response and the noise of the system in that “suppressed” band. By continuously changing the suppressed band over time, we obtain the Silence Sweep test signal, which allows quick measurement of noise and distortion over the whole spectrum. The paper explains the method with a number of examples. The results obtained for some typical devices are presented and compared with those obtained with a standard, state-of-the-art measurement system.
Convention Paper 7700 (Purchase now)
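A toy construction in the spirit of the Silence Sweep can be built from a multitone signal in which a narrow group of tones is muted and the muted group moves upward over time (this multitone stand-in and all its parameters are illustrative assumptions, not Farina's actual test signal):

```python
import math, random

def silence_sweep(fs=8000, duration=1.0, n_tones=40, notch_width=2):
    """Wide-band multitone signal with a continuously moving 'silent'
    band: block by block, `notch_width` adjacent tones are muted, and
    the muted group sweeps from low to high frequency over `duration`."""
    random.seed(1)
    freqs = [fs / 2 * (k + 1) / (n_tones + 1) for k in range(n_tones)]
    phases = [random.uniform(0, 2 * math.pi) for _ in range(n_tones)]
    n = int(fs * duration)
    out = []
    for i in range(n):
        # index of the suppressed band advances linearly with time
        start = int(i / n * (n_tones - notch_width))
        s = sum(math.sin(2 * math.pi * f * i / fs + p)
                for k, (f, p) in enumerate(zip(freqs, phases))
                if not (start <= k < start + notch_width))
        out.append(s / n_tones)
    return out
```

Whatever energy the device under test produces inside the currently silent band must come from distortion products or noise, which is the measurement principle described above.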
P9-4 Pitch and Played String Estimation in Classic and Acoustic Guitars—Isabel Barbancho, Lorenzo Tardón, Ana M. Barbancho, Simone Sammartino, Universidad de Málaga - Málaga, Spain
In classic and acoustic guitars with standard tuning, the same pitch can be produced on different strings. The aim of this paper is to present a method, based on the time- and frequency-domain characteristics of the recorded sound, to determine not only the pitch but also the string of the guitar that was played to produce that pitch. The system thus provides information not only about the pitch of the notes played but also about how those notes were played. This specific information can be valuable for identifying the style of a player and can be used in guitar teaching.
Convention Paper 7701 (Purchase now)
P9-5 Statistical Properties of Music Signals—Miomir Mijic, Drasko Masovic, Dragana Sumarac-Pavlovic, Faculty of Electrical Engineering - Belgrade, Serbia
This paper presents the results of a comprehensive approach to the statistical properties of various music signals, based on 412 musical pieces classified into 12 different genres. The analyzed signals contain more than 24 hours of music. For each piece, the time variation of the signal level was computed using a 10 ms integration period in the RMS calculation with 90 percent overlap, yielding a new signal representing level as a function of time. For each piece, the signal level was then analyzed statistically via its statistical distribution, cumulative distribution, effective value over the complete duration of the piece, mean level value, and the level value corresponding to the maximum of the statistical distribution. The parameters L1, L10, L50, and L99 were extracted from the cumulative distributions as numerical indicators of dynamic properties. The paper contains detailed statistical data and averaged data for all observed genres, as well as quantitative data on the dynamic range and crest factor of various music signals.
Convention Paper 7702 (Purchase now)
P9-6 Multi-Band Generalized Harmonic Analysis (MGHA) and its Fundamental Characteristics in Audio Signal Processing—Takahiro Miura, Teruo Muraoka, Tohru Ifukube, University of Tokyo - Tokyo, Japan
One of the main problems in the sound restoration of valuable historical recordings is noise reduction. We have been proposing, and continue to improve, a noise reduction method based on inharmonic analysis such as GHA (Generalized Harmonic Analysis). The GHA frequency-extraction algorithm enables us to extract arbitrary frequency components. In this paper, aiming at more accurate frequency identification from noisy signals, we divide the analyzed frequency range into multiple bands before analysis; this algorithm is named Multi-Band GHA (MGHA). Simulations of frequency analysis in a noise-free condition indicate that MGHA is more effective than GHA for the extraction of low frequency components when both the window length and the number of frequency components are small; otherwise, GHA identifies frequency components more precisely. Furthermore, the results of frequency analysis in the presence of steady noise show that MGHA can be applied more effectively in the case of short window lengths, many frequency components, and low S/N.
Convention Paper 7703 (Purchase now)
P9-7 Automatic Detection of Salient Frequencies—Joerg Bitzer, University of Applied Science Oldenburg - Oldenburg, Germany; Jay LeBoeuf, Imagine Research, Inc. - San Francisco, CA, USA
In this paper we present several techniques for finding the most significant frequencies in recorded audio tracks. These estimated frequencies could serve as a starting point for mixing engineers in the EQing process. In order to evaluate the results, we compare the detected frequencies with a list of salient frequencies reported by audio engineers. The results show that automatic detection is possible. Thus, one of the more tedious tasks of a mixing engineer can be automated, giving the mixing engineer more time for the artistic part of the mixing process.
Convention Paper 7704 (Purchase now)
Audio for Telecommunications
Friday, May 8, 10:30 — 12:00
P10-1 Harmonic Representation and Auditory Model-Based Parametric Matching and its Application in Speech/Audio Analysis—Alexey Petrovsky, Elias Azarov, Belarusian State University of Informatics and Radioelectronics - Minsk, Belarus; Alexander Petrovsky, Bialystok Technical University - Bialystok, Poland
The paper presents new methods for the selection of sinusoidal and transient components in hybrid sinusoidal modeling of speech/audio. The instantaneous harmonic parameters (magnitude, frequency, and phase) are calculated as the result of narrow-band filtering of the speech/audio signal. A synthesis of frequency-modulated filters with closed-form impulse responses is proposed. The filter frequency bounds can be determined during frequency tracking of the components and adjusted according to the fundamental frequency modulations, enabling harmonic/noise decomposition of speech/audio. The transient components are modeled by matching pursuit with a frame-based, psychoacoustically optimized wavelet packet dictionary. The choice of the most relevant coefficients is based on maximizing the match between the auditory excitation scalograms of the original and modeled signals.
Convention Paper 7705 (Purchase now)
P10-2 Perceptual Compression Methods for Metadata in Directional Audio Coding Applied to Audiovisual Teleconference—Toni Hirvonen, Institute of Computer Science (ICS) of the Foundation for Research and Technology - Hellas, Greece; Jukka Ahonen, Ville Pulkki, TKK - Finland
In the teleconferencing application of Directional Audio Coding, the transmitted data consist of a monophonic audio signal and time-dependent directional metadata measured in frequency bands. In reproduction, each frequency channel of the signal is reproduced in the corresponding direction with the corresponding diffuseness. This paper examines methods for reducing the data rate of the metadata. The compression methods are based on psychoacoustic studies of the accuracy of directional hearing, and are further developed and validated here. Informal tests with one-way reproduction, as well as usability testing in which an actual teleconference was arranged, were used for this purpose. The results indicate that the data rate can be as low as approximately 3 kbit/s without a significant loss in the reproduced spatial quality.
Convention Paper 7706 (Purchase now)
P10-3 Speaker Detection and Separation with Small Microphone Arrays—Maximo Cobos, Jose J. Lopez, David Martinez, Universidad Politécnica de Valencia - Valencia, Spain
Small microphone arrays are desirable for many practical speech processing applications. In this paper we describe a system for detecting several sound sources in a room and enhancing a predominant target source using a pair of close microphones. The system consists of three main steps: time-frequency processing of the input signals, source localization via model fitting, and time-frequency masking for interference reduction. Experiments and results using recorded signals in real scenarios are discussed.
Convention Paper 7707 (Purchase now)
P10-4 Directional Audio Coding with Stereo Microphone Input—Jukka Ahonen, Ville Pulkki, TKK - Finland; Fabian Kuech, Giovanni Del Galdo, Markus Kallinger, Richard Schultz-Amling, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The use of a stereo microphone configuration as input to the teleconferencing application of Directional Audio Coding (DirAC) is presented. DirAC is a method for spatial sound processing in which the direction of arrival of sound and the diffuseness are analyzed and used for different purposes in reproduction. So far, omnidirectional microphones arranged in an array have been used to generate input signals for one- and two-dimensional sound field analysis in DirAC processing. In this study the possibility of using domestic stereo microphones for DirAC analysis is investigated. Different methods for deriving omnidirectional and dipole signals from stereo microphones for directional analysis are presented, and their applicability is discussed.
Convention Paper 7708 (Purchase now)
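One possible derivation, assuming a coincident pair of left/right-facing cardioids, is the plain sum and difference of the two channels (a hedged sketch of a single variant; the paper compares several such methods):

```python
# For coincident cardioids, L + R approximates an omnidirectional (W)
# signal and L - R approximates a lateral figure-of-eight (Y) signal,
# which are the inputs DirAC needs for its directional analysis.

def stereo_to_w_y(left, right):
    w = [l + r for l, r in zip(left, right)]   # omnidirectional estimate
    y = [l - r for l, r in zip(left, right)]   # dipole (figure-of-eight) estimate
    return w, y
```

From W and Y, a one-dimensional DirAC analysis can estimate the lateral direction of arrival and diffuseness per frequency band.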
P10-5 Robust Noise Reduction Based on Stochastic Spatial Features—Mitsunori Mizumachi, Kyushu Institute of Technology - Fukuoka, Japan
This paper proposes a robust noise reduction method that relies on stochastic spatial features. Almost all noise reduction methods have both strengths and weaknesses in the real world. In this paper, the time evolution of the direction of arrival (DOA) and its stochastic reliability are the clues for selecting a suitable noise reduction approach in time-variant noisy environments, since the DOA is an important spatial feature in beamforming for noise reduction. On the other hand, single-channel approaches to noise reduction may be preferable when DOA estimates are unreliable. Either spectral subtraction or beamforming is therefore selected, depending on the DOA estimate and its reliability, to achieve robust noise reduction. The proposed method showed an advantage in noise reduction over a conventional approach.
Convention Paper 7709 (Purchase now)
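The selection logic described above might look like the following sketch (the function name and the 0.7 reliability threshold are illustrative assumptions, not values from the paper):

```python
# Per time frame: use beamforming when the DOA estimate is reliable,
# otherwise fall back to single-channel spectral subtraction.

def select_method(doa_estimates, reliability, threshold=0.7):
    """Return, for each frame, which noise-reduction path to use
    together with the DOA it should steer to (None for the
    single-channel path, which needs no direction)."""
    plan = []
    for doa, rel in zip(doa_estimates, reliability):
        if rel >= threshold:
            plan.append(("beamforming", doa))
        else:
            plan.append(("spectral_subtraction", None))
    return plan
```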
Friday, May 8, 13:00 — 16:30
Chair: Nick Zacharov
P11-1 A Novel Scheme for Low Bit Rate Unified Speech and Audio Coding—MPEG RM0—Max Neuendorf, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Philippe Gournay, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Markus Multrus, Jérémie Lecomte, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Bruno Bessette, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Ralf Geiger, Stefan Bayer, Guillaume Fuchs, Johannes Hilpert, Nikolaus Rettelbach, Frederik Nagel, Julien Robilliard, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Redwan Salami, VoiceAge Corporation - Montreal, Quebec, Canada; Gerald Schuller, Fraunhofer IDMT - Ilmenau, Germany; Roch Lefebvre, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Bernhard Grill, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Coding of speech signals at low bit rates, such as 16 kbps, has to rely on an efficient speech reproduction model to achieve reasonable speech quality. However, for audio signals not fitting the model this approach generally fails. On the other hand, generic audio codecs, designed to handle any kind of audio signal, tend to show unsatisfactory results for speech signals, especially at low bit rates. To overcome this, a process was initiated by ISO/MPEG aiming to standardize a new codec with consistently high quality for speech, music, and mixed content over a broad range of bit rates. After a formal listening test evaluating several proposals, MPEG selected the best performing codec as the reference model for the standardization process. This paper describes this codec in detail and shows that the new reference model reaches the goal of consistently high quality for all signal types.
Convention Paper 7713 (Purchase now)
P11-2 A Time-Warped MDCT Approach to Speech Transform Coding—Bernd Edler, Sascha Disch, Leibniz Universität Hannover - Hannover, Germany; Stefan Bayer, Guillaume Fuchs, Ralf Geiger, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The modified discrete cosine transform (MDCT) is often used for audio coding due to its critical sampling property and good energy compaction, especially for harmonic tones with constant fundamental frequencies (pitch). However, in voiced human speech the pitch is time-varying and thus the energy is spread over several transform coefficients, leading to a reduction of coding efficiency. The approach presented herein compensates for pitch variation in each MDCT block by application of time-variant re-sampling. A dedicated signal adaptive transform window computation ensures the preservation of the time domain aliasing cancellation (TDAC) property. Re-sampling can be designed such that the duration of the processed blocks is not altered, facilitating the replacement of the conventional MDCT in existing audio coders.
Convention Paper 7710 (Purchase now)
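The block-wise time-variant re-sampling can be illustrated with simple linear interpolation along a warp contour (a sketch under stated assumptions; the codec's actual re-sampler and the adaptive window computation that preserves TDAC are considerably more elaborate):

```python
# Re-sample one block along a monotonic warp contour while keeping the
# number of output samples unchanged, so the warped block can feed a
# conventional MDCT in place of the original samples.

def warp_block(block, warp):
    """warp: monotonically increasing read positions, len == len(block),
    with warp[0] >= 0 and warp[-1] <= len(block) - 1."""
    out = []
    for t in warp:
        i = min(int(t), len(block) - 2)   # index of the left neighbor
        frac = t - i                      # fractional position between samples
        out.append((1 - frac) * block[i] + frac * block[i + 1])
    return out
```

A warp contour that tracks the pitch trajectory makes a gliding harmonic locally constant in frequency, so its energy compacts into fewer MDCT coefficients.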
P11-3 A Phase Vocoder Driven Bandwidth Extension Method with Novel Transient Handling for Audio Codecs—Frederik Nagel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Sascha Disch, Leibniz Universitaet Hanover - Hanover, Germany; Nikolaus Rettelbach, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Storage or transmission of audio signals is often subject to strict bit-rate constraints. This is accommodated by audio encoders that encode the lower frequency part in a waveform-preserving way and approximate the high frequency signal from the lower frequency data using a set of reconstruction parameters. This so-called bandwidth extension can lead to roughness and other unpleasant auditory sensations. In this paper the origin of these artifacts is identified, and an improved bandwidth extension method called Harmonic Bandwidth Extension (HBE) that avoids auditory roughness in the reconstructed audio signal is outlined. Since HBE is based on phase vocoders, and is thus intrinsically not well suited to transient signals, an enhancement of the method by a novel transient handling approach is presented. A listening test demonstrates the advantage of the proposed method over a simple phase vocoder approach.
Convention Paper 7711 (Purchase now)
P11-4 Efficient Cross-Fade Windows for Transitions between LPC-Based and Non-LPC-Based Audio Coding—Jérémie Lecomte, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Philippe Gournay, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Ralf Geiger, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Bruno Bessette, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Max Neuendorf, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The reference model selected by MPEG for the forthcoming unified speech and audio codec (USAC) switches between a non-LPC-based coding mode (based on AAC) operating in the transform-domain and an LPC-based coding (derived from AMR-WB+) operating either in the time domain (ACELP) or in the frequency domain (wLPT). Seamlessly switching between these different coding modes required the design of a new set of cross-fade windows optimized to minimize the amount of overhead information sent during transitions between LPC-based and non-LPC-based coding. This paper presents the new set of windows that was designed in order to provide an adequate trade-off between overlap duration and time/frequency resolution, and to maintain the benefits of critical sampling through all coding modes.
Convention Paper 7712 (Purchase now)
P11-5 Low Bit-Rate Audio Coding in Multichannel Digital Wireless Microphone Systems—Stephen Wray, APT Licensing Ltd. - Belfast, Northern Ireland, UK
Despite advances in voice and data communications in other domains, sound production for live events (concerts, theater, conferences, sports, worship, etc.) still largely depends on spectrum-inefficient forms of analog wireless microphone technology. In these live scenarios, low-latency transmission of high-quality audio is mission critical. However, while demand increases for wireless audio channels (for microphones, in-ear monitoring, and talkback systems), some of the radio bands available for “Program Making and Special Events” are to be re-assigned to new wireless mobile telephony and Internet connectivity services: the FCC recently decided to permit so-called White Space Devices to operate in sections of UHF spectrum previously reserved for shared use by analog TV and wireless microphones. This paper examines the key performance aspects of low bit-rate audio codecs for the next generation of bandwidth-efficient digital wireless microphone systems that meet the future needs of live events.
Convention Paper 7714 (Purchase now)
P11-6 Krasner’s Audio Coder Revisited—Jamie Angus, Chris Ball, Thomas Peeters, Rowan Williams, University of Salford - Salford, Greater Manchester, UK
An audio compression encoder and decoder system based on Krasner’s work was implemented. An improved Quadrature Mirror Filter tree, which more closely approximates modern critical-band measurements, splits the input signal into subbands that are encoded using both adaptive quantization and entropy coding. The uniform adaptive quantization scheme developed by Jayant was implemented and enhanced through the addition of non-uniform quantization steps and look-ahead. The complete codecs are evaluated using the perceptual audio evaluation algorithm PEAQ, and their performance is compared to equivalent MPEG-1 Layer III files. Initial, limited tests reveal that the proposed codecs score Objective Difference Grades close to, or even better than, MPEG-1 Layer III files encoded at a similar bit rate.
Convention Paper 7715 (Purchase now)
P11-7 Inter-Channel Prediction to Prevent Unmasking of Quantization Noise in Beamforming—Mauri Väänänen, Nokia Research Center - Tampere, Finland
This paper studies the use of inter-channel prediction to prevent or reduce the risk of noise unmasking when beamforming-type processing is applied to quantized microphone array signals. The envisaged application is the re-use and postprocessing of user-created content. Simulations with an AAC coder and real-world recordings made with two microphones are performed to study the suitability of two existing coding tools for this purpose: M/S stereo coding and the AAC Long Term Predictor (LTP) tool adapted for inter-channel prediction. The results indicate that LTP adapted for inter-channel prediction often gives more coding gain than M/S stereo coding alone, both in terms of signal-to-noise ratio and perceptual entropy.
Convention Paper 7716 (Purchase now)
Friday, May 8, 13:30 — 15:00
P12-1 Reduction of Distortion in Conical Horn Loudspeakers at High Levels—Sverre Holm, University of Oslo - Oslo, Norway; Rune Skramstad, Paragon Arrays - Drammen, Norway
Many horns have audible distortion at high levels. We measured a horn consisting of six conical sections with a 10-inch element at 99 dB SPL. A closed back gave a maximum of 2.4 percent second-harmonic and 3.4 percent third-harmonic distortion in the 100–1000 Hz range, while an open construction gave 1.25 percent and 0.6 percent. A new semi-permeable back chamber reduced this to 0.7 percent and 0.35 percent. We hypothesize that the distortion is due partly to the non-linear compliance of air in the back chamber and partly to the element’s interaction with the front and back loading of the horn, and that the new construction loads the element in a more optimal way.
Convention Paper 7717 (Purchase now)
P12-2 Comparison of Different Methods for the Subjective Sound Quality Evaluation of Compression Drivers—José Martínez, Acustica Beyma S.L. - Valencia, Spain; Joan Croañes, Escola Politecnica Superior de Gandia - Valencia, Spain; Jorge Francés Monllor, Jaime Ramis, Universidad de Alicante - Alicante, Spain
In this paper an approach to the problem of sound quality evaluation of radiating systems is considered, applying a perceptual model. One of the objectives is to use the parameter proposed by Moore [. . .] to test whether it provides satisfactory results when applied to the quality evaluation of indirect-radiation loudspeakers. Three compression drivers have been used for this purpose. Recordings with different test signals at different input voltages have been made. Using this experimental base, the problem is approached from different points of view: [. . .] taking into consideration classic sound quality parameters such as roughness, sharpness, and tonality; [. . .] applying the parameter suggested by Moore, obtained from the application of a perceptual model. Moreover, a psychoacoustic experiment has been conducted with a population of 25 people. The results, although preliminary and strongly dependent on the reference signal used to obtain Rnonlin, show a good correlation with the Rnonlin values.
Convention Paper 7718 (Purchase now)
P12-3 Membrane Modes in Transducers with the Direct D/A Conversion—Libor Husník, Czech Technical University in Prague - Prague, Czech Republic
The operating principle of systems with direct acoustic D/A conversion, which are sometimes called digital loudspeakers, brings new features to the field of transducer design. There are many design possibilities for these systems, using different transduction principles and spatial arrangements of the constituent parts. This paper deals with the single-acting condenser transducer, suitable for micromachining applications, in which the membrane is driven by a partitioned back electrode. While in conventional transducers the electric force between the back electrode and the membrane is evenly distributed, in digital transducers this is no longer the case. Consequences for membrane vibrations are presented for some cases of excitation by various distributions of forces representing given binary combinations from the dynamic level.
Convention Paper 7719 (Purchase now)
P12-4 Increasing Active Radiating Factor of High-Frequency Horns by Using Staggered Arrangement in Loudspeaker Line Array—Kang An, Yong Shen, Aiping Zhang, Nanjing University - Nanjing, China
Active Radiating Factor (ARF) is an important parameter for analyzing a loudspeaker line array when considering the gap between adjacent transducers, especially for high-frequency horns. As the ARF should be as high as possible, a staggered arrangement of horns is introduced in this paper. The responses in the vertical and horizontal directions are analyzed. Compared with the conventional arrangement, the negative effects of the gaps are reduced and the responses are improved in simulation.
Convention Paper 7720 (Purchase now)
Spatial Audio and Spatial Perception
Friday, May 8, 14:00 — 18:30
Chair: Tapio Lokki
P13-1 Evaluation of Equalization Methods for Binaural Signals—Zora Schärer, Alexander Lindau, TU Berlin - Berlin, Germany
The most demanding test criterion for the quality of binaural simulations of acoustical environments is whether they can be perceptually distinguished from a real sound field. If the simulation provides natural interaction and sufficient spatial resolution, differences are predominantly perceived as spectral distortions due to imperfect equalization of the transfer functions of the recording and reproduction systems (dummy-head microphones, headphones). In order to evaluate different compensation methods, several headphone transfer functions were measured on a dummy head. Based on these measurements, the performance of different inverse filtering techniques re-implemented from the literature was evaluated using auditory measures of spectral difference. Additionally, an ABC/HR listening test was conducted using two different headphones and two different audio stimuli (pink noise, acoustic guitar). In the listening test, a real loudspeaker was directly compared to a binaural simulation with high spatial resolution, compensated using seven different equalization methods.
Convention Paper 7721 (Purchase now)
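One common family of headphone compensation filters is frequency-domain regularized inversion of the measured response; a minimal sketch under that assumption (the regularization constant and FFT size are illustrative, and this is not necessarily one of the seven methods the authors tested):

```python
import numpy as np

def regularized_inverse_filter(h, n_fft=1024, beta=1e-3):
    """Invert a measured headphone impulse response h in the frequency
    domain; beta limits the gain where |H| is small (deep notches),
    which would otherwise be boosted excessively by a direct 1/H."""
    H = np.fft.rfft(h, n_fft)
    H_inv = np.conj(H) / (np.abs(H) ** 2 + beta)
    return np.fft.irfft(H_inv, n_fft)

# For an ideal (impulse) headphone response, the compensation filter is
# again close to an impulse, scaled down slightly by the regularization.
f = regularized_inverse_filter(np.array([1.0]))
```

Convolving the measured response with such a filter approximates a flat system, with the regularization trading residual spectral error against ringing and noise amplification.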
P13-2 Crosstalk Cancellation between Phantom Sources—Florian Völk, Thomas Musialik, Hugo Fastl, Technical University of München - München, Germany
This paper presents an approach using phantom sources (resulting from the so-called summing localization of two loudspeakers) as sources for crosstalk cancellation (CTC). The phantom sources can be rotated synchronously with the listener’s head, thus demanding significantly less processing power than traditional approaches using fixed CTC loudspeakers, as an online re-computation of the CTC filters is (under certain circumstances) not necessary. First results of localization experiments show the general applicability of this procedure.
Convention Paper 7722 (Purchase now)
P13-3 Preliminary Evaluation of Sweet Spot Size in Virtual Sound Reproduction Using Dipoles—Yesenia Lacouture Parodi, Per Rubak, Aalborg University - Aalborg, Denmark
In a previous study, three crosstalk cancellation techniques were evaluated and compared under different conditions. Least-squares approximations in the frequency and time domains were evaluated along with a method based on minimum-phase approximation and a frequency-independent delay. In general, the least-squares methods outperformed the method based on minimum-phase approximation. However, the evaluation was only done for the best-case scenario, where the transfer functions used to design the filters correspond to the listener’s transfer functions and his/her location and orientation relative to the loudspeakers. In this paper we present a follow-up evaluation of the performance of the three inversion techniques when these conditions are violated. A setup to measure the sweet spot of different loudspeaker arrangements is described. Preliminary measurement results are presented for loudspeakers placed in the horizontal plane and at an elevated position, where a typical 60-degree stereo setup is compared with two closely spaced loudspeakers. Additionally, two- and four-channel arrangements are evaluated.
Convention Paper 7723 (Purchase now)
P13-4 The Importance of the Direct to Reverberant Ratio in the Perception of Distance, Localization, Clarity, and Envelopment—David Griesinger, Consultant - Cambridge, MA, USA
The Direct to Reverberant ratio (D/R)—the ratio of the energy in the first wave front to the reflected sound energy—is absent from most discussions of room acoustics. Yet only the direct sound (DS) provides information about the localization and distance of a sound source. This paper discusses how the perception of DS in a reverberant field depends on the D/R and the time delay between the DS and the reverberant energy. Threshold data for DS perception will be presented, and the implications for listening rooms, hall design, and electronic enhancement will be discussed. We find that both clarity and envelopment depend on DS detection. In listening rooms the direct sound must be at least equal to the total reflected energy for accurate imaging. As the room becomes larger (and the time delay increases) the threshold goes down. Some conclusions: typical listening rooms benefit from directional loudspeakers, small concert halls should not have a shoe-box shape, early reflections need not be lateral, and electroacoustic enhancement of late reverberation may be vital in small halls.
Convention Paper 7724 (Purchase now)
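The D/R on which the paper builds can be estimated from a measured impulse response by splitting its energy a short time after the first arrival; a rough sketch, in which the 5 ms split point and the synthetic impulse response are arbitrary illustrative choices:

```python
import numpy as np

def direct_to_reverberant_db(ir, fs, direct_ms=5.0):
    """Estimate D/R in dB by splitting the impulse-response energy
    a fixed time after the first wavefront."""
    ir = np.asarray(ir, dtype=float)
    onset = int(np.argmax(np.abs(ir)))           # first wavefront
    split = onset + int(direct_ms * 1e-3 * fs)   # end of "direct" window
    direct = np.sum(ir[:split] ** 2)
    reverberant = np.sum(ir[split:] ** 2)
    return 10.0 * np.log10(direct / reverberant)

# Synthetic IR: a unit direct spike followed by an exponentially
# decaying reverberant tail
fs = 48000
t = np.arange(fs // 2)
tail = 0.02 * np.cos(0.1 * t) * np.exp(-t / (0.2 * fs))
ir = np.concatenate(([1.0], np.zeros(400), tail))
dr_db = direct_to_reverberant_db(ir, fs)
```

Real measurements complicate this picture (early reflections fall inside or outside the direct window depending on room size), which is exactly the time-delay dependence the paper examines.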
P13-5 Frequency-Domain Interpolation of Empirical HRTF Data—Brian Carty, Victor Lazzarini, National University of Ireland - Maynooth, Ireland
This paper discusses Head-Related Transfer Function (HRTF)-based artificial spatialization of audio. Two alternatives to the minimum-phase method of HRTF interpolation are suggested, offering novel approaches to the challenge of phase interpolation. A phase truncation, magnitude interpolation technique aims to avoid complex preparation, manipulation, or transformation of empirical HRTF data, and any inaccuracies that these operations may introduce. A second technique adds low-frequency nonlinear frequency scaling to a functionally based phase model. This approach aims to provide a low-frequency spectrum more closely aligned with the empirical HRTF data. Test results indicate favorable performance of the new techniques.
Convention Paper 7725 (Purchase now)
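The "phase truncation, magnitude interpolation" idea can be sketched as follows: interpolate only the magnitude spectra between two measured HRTFs and reuse the phase of the nearer measurement unchanged. The nearest-phase rule and function names here are assumptions for illustration, not the authors' exact procedure:

```python
import numpy as np

def interp_hrtf(H_a, H_b, frac):
    """Magnitude-interpolated HRTF between two measured complex
    spectra H_a and H_b; frac in [0, 1] is the position between the
    measurement directions. Phase is 'truncated': taken unchanged
    from the nearer of the two measurements."""
    mag = (1.0 - frac) * np.abs(H_a) + frac * np.abs(H_b)
    phase = np.angle(H_a if frac < 0.5 else H_b)
    return mag * np.exp(1j * phase)
```

The appeal is that the empirical data need no minimum-phase decomposition or other transformation before use; the cost is that interaural time differences jump rather than interpolate, which motivates the paper's second, phase-model-based technique.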
P13-6 Analysis and Implementation of a Stereophonic Playback System for Adjusting the “Sweet Spot” to the Listener’s Position—Sebastian Merchel, Stephan Groth, Dresden University of Technology - Dresden, Germany
This paper focuses on a stereophonic playback system designed to adjust the “sweet spot” to the listener’s position. The system includes an optical face tracker that provides information about the listener’s x-y position. Accordingly, the loudspeaker signals are manipulated in real time in order to move the “sweet spot.” Stereophonic perception with an adjusted “sweet spot” is investigated theoretically on the basis of several models of binaural hearing. The results indicate that adjusting the signals to the center of the listener’s head does improve localization over the whole listening area, although some localization error remains due to asymmetric signal paths at off-center listening positions; this error can be estimated and compensated for.
Convention Paper 7726 (Purchase now)
P13-7 Issues on Dummy-Head HRTFs Measurements—Daniela Toledo, Henrik Møller, Aalborg University - Aalborg, Denmark
The dimensions of a person are small compared to the wavelength at low frequencies. Therefore, at these frequencies head-related transfer functions (HRTFs) should decrease asymptotically until they reach 0 dB—i.e., unity gain—at DC. This is not the case in measured HRTFs: the limitations of the equipment used result in a wrong—and random—value at DC, and the effect can be seen well within the audio frequencies. We have measured HRTFs on a commercially available Neumann KU-100 dummy head and analyzed issues associated with calibration, DC correction, and low-frequency response. Informal listening tests suggest that the ripples seen in HRTFs with a wrong DC value affect the sound quality in binaural synthesis.
Convention Paper 7727 (Purchase now)
P13-8 Binaural Processing Algorithms: Importance of Clustering Analysis for Preference Tests—Andreas Silzle, Bernhard Neugebauer, Sunish George, Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The acceptability of a newly proposed technology for commercial application is often assumed if the sound quality reached in a listening test surpasses a certain target threshold. As an example, it is a well-established procedure for decisions on the deployment of audio codecs to run a listening test comparing the coded/decoded signal with the uncoded reference signal. For other technologies, e.g., upmix or binaural processing, however, only the unprocessed signal can act as a “comparison signal.” Here, the goal is to achieve a significant preference for the processed over the comparison signal. For such preference listening tests, we underline the importance of clustering the test results to obtain additional valuable information, as opposed to using only standard statistical metrics like the mean and confidence interval. This approach makes it possible to determine the size of the user group that would significantly prefer the proposed algorithm were it available in a consumer device. As an example, listening test data for binaural processing algorithms are analyzed in this investigation.
Convention Paper 7728 (Purchase now)
P13-9 Perception of Head-Position-Dependent Variations in Interaural Cross-Correlation Coefficient—Russell Mason, Chungeun Kim, Tim Brookes, University of Surrey - Guildford, Surrey, UK
Experiments were undertaken to elicit the perceived effects of head-position-dependent variations in the interaural cross-correlation coefficient of a range of signals. A graphical elicitation experiment showed that the variations in the IACC strongly affected the perceived width and depth of the reverberant environment, as well as the perceived width and distance of the source. A verbal experiment gave similar results and also indicated that the head-position-dependent IACC variations caused changes in the perceived spaciousness and envelopment of the stimuli.
Convention Paper 7729 (Purchase now)
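For reference, the IACC underlying these experiments is conventionally computed as the maximum of the normalized interaural cross-correlation over lags of about ±1 ms; a minimal sketch (the lag range is the conventional choice, not necessarily the authors' exact measurement condition):

```python
import numpy as np

def iacc(left, right, fs, max_lag_ms=1.0):
    """Interaural cross-correlation coefficient: the maximum of the
    normalized cross-correlation of the two ear signals over
    interaural lags of roughly +/- 1 ms."""
    max_lag = int(max_lag_ms * 1e-3 * fs)
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    full = np.correlate(left, right, mode="full")
    center = len(left) - 1                       # zero-lag index
    window = full[center - max_lag: center + max_lag + 1]
    return np.max(np.abs(window)) / norm

# Identical ear signals are perfectly correlated (IACC = 1); decorrelated
# reverberant energy at the two ears lowers the IACC.
rng = np.random.default_rng(0)
x = rng.standard_normal(4800)
```

Measuring this quantity at several head positions, as in the paper, then captures how the degree of interaural similarity varies as the listener moves through the sound field.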
Friday, May 8, 16:30 — 18:30
Chair: Ville Pulkki
P14-1 Further EBU Tests of Multichannel Audio Codecs—David Marston, BBC R&D - Tadworth, Surrey, UK; Franc Kozamernik, EBU - Geneva, Switzerland; Gerhard Stoll, Gerhard Spikofski, IRT - Munich, Germany
The European Broadcasting Union technical group D/MAE has been assessing the quality of multichannel audio codecs in a series of subjective tests. The two most recent tests and results are described in this paper. The first set of tests covered 5.1 multichannel audio emission codecs at a range of bit-rates from 128 kbit/s to 448 kbit/s. The second set of tests covered cascaded contribution codecs, followed by the most prominent emission codecs. Codecs under test include offerings from Dolby, DTS, MPEG, Apt, and Linear Acoustics. The conclusions observe that while high quality is achievable at lower bit-rates, there are still precautions to be aware of. The results from cascading of codecs have shown that the emission codec is usually the bottleneck of quality.
Convention Paper 7730 (Purchase now)
P14-2 Spatial Parameter Decision by Least Squared Error in Parametric Stereo Coding and MPEG Surround—Chi-Min Liu, Han-Wen Hsu, Yung-Hsuan Kao, Wen-Chieh Lee, National Chiao Tung University - Hsinchu, Taiwan
Parametric stereo coding (PS) and MPEG Surround (MPS) reconstruct stereo or multichannel signals from down-mixed signals and a few spatial parameters. For extracting spatial parameters, the first issue is to decide on a time-frequency (T-F) tiling, which controls the resolution of the reconstructed spatial scene and largely determines the number of bits consumed. On the other hand, according to the standard syntax, the up-mixing matrices for time slots not on time borders are reconstructed by interpolation in the decoder. Therefore, the second issue is to decide the transmitted parameter values on the time borders so as to minimize the reconstruction error of the matrices. For both PS and MPS, based on a least-squared-error criterion, this paper proposes a generic dynamic programming method for resolving these two issues under the trade-off between audio quality and limited bits.
Convention Paper 7731 (Purchase now)
P14-3 The Potential of High Performance Computing in Audio Engineering—David Moore, Jonathan Wakefield, University of Huddersfield - West Yorkshire, UK
High Performance Computing (HPC) resources are fast becoming more readily available. HPC hardware now exists for use in conjunction with standard desktop computers. This paper looks at what impact this could have on the audio engineering industry. Several potential applications of HPC within audio engineering research are discussed. A case study is also presented that highlights the benefits of using the Single Instruction, Multiple Data (SIMD) architecture when employing a search algorithm to produce surround sound decoders for the standard 5-speaker surround sound layout.
Convention Paper 7732 (Purchase now)
P14-4 Efficient Methods for High Quality Merging of Spatial Audio Streams in Directional Audio Coding—Giovanni Del Galdo, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Ville Pulkki, Helsinki University of Technology - Espoo, Finland; Fabian Kuech, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Mikko-Ville Laitinen, Helsinki University of Technology - Espoo, Finland; Richard Schultz-Amling, Markus Kallinger, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Directional Audio Coding (DirAC) is an efficient technique to capture and reproduce spatial sound. The analysis step outputs a mono DirAC stream, comprising an omnidirectional microphone pressure signal and side information, i.e., direction of arrival and diffuseness of the sound field expressed in the time-frequency domain. This paper proposes efficient methods to merge multiple mono DirAC streams to allow joint playback at the reproduction side. The problem of merging two or more streams arises in applications such as immersive spatial audio teleconferencing, virtual reality, and online gaming. Compared to a trivial direct merging of the decoder outputs, the proposed methods are more efficient as they do not require the synthesis step. A further benefit is that the loudspeaker setup at the reproduction side does not have to be known in advance. Simulations and listening tests confirm that the proposed methods introduce no artifacts and are practically indistinguishable from ideal merging.
Convention Paper 7733 (Purchase now)
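The structure of such a merge can be sketched per time-frequency tile. The specific rules below (summing pressures, taking the direction of the more energetic stream, energy-weighting the diffuseness) are illustrative assumptions for this sketch, not the paper's derivation:

```python
def merge_dirac_tiles(p1, doa1, psi1, p2, doa2, psi2):
    """Merge two mono DirAC streams in one time-frequency tile without
    a resynthesis step. Each stream is (complex pressure, azimuth in
    radians, diffuseness in [0, 1]); merging rules are illustrative."""
    e1, e2 = abs(p1) ** 2, abs(p2) ** 2    # tile energies
    p = p1 + p2                            # pressures simply add
    doa = doa1 if e1 >= e2 else doa2       # keep the dominant direction
    psi = (e1 * psi1 + e2 * psi2) / (e1 + e2)  # energy-weighted diffuseness
    return p, doa, psi
```

Because the output is again a valid mono DirAC stream, the receiving side can run a single synthesis stage for any loudspeaker layout, which is the efficiency argument made in the abstract.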
Friday, May 8, 16:30 — 18:00
P15-1 Psychoacoustics and Noise Perception Survey in Workers of the Construction Sector—Marcos D. Fernández, Bálder Vitón, José Antonio Ballesteros, Samuel Quintana, Isabel González, Escuela Universitaria Politécnica de Cuenca, Universidad de Castilla-La Mancha - Cuenca, Spain
Noise levels alone are not enough to fully assess the influence of noise; therefore, psychoacoustic and perception surveys should also be taken into account. The noise that construction workers produce in their tasks is recorded with a head and torso simulator (HATS). Those recordings are then processed to derive different parameters: spectrum, weighted equivalent levels, and the main psychoacoustic parameters. A specific survey has then been developed to assess the perception of these activity noises during working time, in order to correlate adjectives of perception with the parameters mentioned. The survey has been designed to be answered by the workers exposed to the noise, so that conclusions can be drawn about the sensations and annoyance that the noise can cause.
Convention Paper 7734 (Purchase now)
P15-2 On the Design of Automatic Sound Classification Systems for Digital Hearing Aids—Enrique Alexandre, Lorena Álvarez-Perez, Roberto Gil-Pita, Raúl Vicen-Bueno, Lucas Cuadra, University of Alcalá - Alcalá de Henares, Spain
The design of digital hearing aids able to carry out advanced functionalities (for instance, classifying the acoustic environment and automatically selecting the best amplification program for the user’s comfort) is very difficult. Since hearing aids have to work at a very low clock frequency in order to minimize power consumption and maximize battery life, the number of available instructions per second is very small. This enforces the design of efficient algorithms with a reduced number of instructions. In particular, this paper focuses on three closely related topics: (1) the design of low-complexity features; (2) the use of automatic feature selection algorithms to optimize the performance of the classifier; and (3) a critical analysis of a variety of classification algorithms, based on their complexity and performance, to determine whether they are feasible to implement.
Convention Paper 7735 (Purchase now)
P15-3 Pruning Algorithms for Multilayer Perceptrons Tailored for Speech/Non-Speech Classification in Digital Hearing Aids—Lorena Álvarez, Enrique Alexandre, Manuel Rosa-Zurera, University of Alcalá - Alcalá de Henares, Spain
This paper explores the feasibility of using different pruning algorithms for multilayer perceptrons (MLPs) applied to the problem of speech/non-speech classification in digital hearing aids. A classifier based on MLPs is considered the best option in spite of its presumably high computational cost. Nevertheless, its implementation has been proven feasible: it requires trade-offs balancing the computational demands (that is, the number of neurons) against the quality perceived by the user. In this respect, this paper focuses on the design of three novel pruning algorithms for MLPs, which attempt to converge to the minimum-complexity network (that is, the lowest number of neurons in the hidden layer) without degrading its performance. The results obtained with the proposed algorithms are compared with those obtained using another pruning algorithm from the literature.
Convention Paper 7736 (Purchase now)
P15-4 Evolutionary Optimization for Hearing Aids of Computational Auditory Scene Analysis—Anton Schlesinger, Marinus M. Boone, Technical University of Delft - Delft, The Netherlands
Computational auditory scene analysis (CASA) provides an excellent means to improve speech intelligibility in adverse acoustical situations. In order to utilize CASA algorithms in hearing aids, sets of algorithmic parameters need to be adjusted to the individual auditory performance of the listener and the acoustic scene in which they are employed. Performed manually, this optimization is an expensive procedure. We therefore developed a framework in which CASA algorithms are automatically optimized by the principles of evolution, i.e., by a genetic algorithm. By using the speech transmission index (STI) as the objective function, the framework provides a holistic routine based solely on psychoacoustical and physiological models to improve and assess speech intelligibility. An initial listening test revealed a discrepancy between the objective and subjective assessment of speech intelligibility, which suggests a review of the objective function. Once the objective function is in accordance with the individual perception of speech intelligibility, the presented framework could be applied to the optimization of any complex speech processor, accelerating its assessment and application.
Convention Paper 7737 (Purchase now)
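The evolutionary loop at the core of such a framework can be sketched generically. The encoding, rates, and truncation selection below are illustrative choices, with `objective` standing in for an STI-based fitness function:

```python
import random

def genetic_optimize(objective, n_params, pop_size=20, generations=30, seed=0):
    """Maximize `objective` over real-valued parameter vectors in [0, 1]
    using truncation selection, one-point crossover, and Gaussian
    mutation. All hyperparameters here are illustrative."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_params)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=objective, reverse=True)
        parents = ranked[: pop_size // 2]          # keep the best half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_params)       # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_params)            # mutate one gene
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0.0, 0.1)))
            children.append(child)
        population = parents + children
    return max(population, key=objective)
```

Since each fitness evaluation in the real framework involves a full intelligibility model, most of the cost lies in the objective function, not in the evolutionary bookkeeping shown here.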
P15-5 Enhanced Control of On-Screen Faders with a Computer Mouse—Michael Hlatky, Kristian Gohlke, David Black, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany; Jörn Loviscach, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany
Audio studio input devices that were formerly physical have mostly been converted into virtual controls on the computer screen. While this transition saves space and cost, it has reduced the performance of these controls, as virtual controls adjusted with the computer mouse do not offer the accuracy and accessibility of their physical counterparts. Previous studies show that interaction with scrollable timelines can be enhanced by an intelligent interpretation of mouse movement. We apply similar techniques to virtual faders as used for audio control, leveraging approaches such as controllable zoom levels and pseudo-haptic interaction. Tests conducted on five such methods provide insight into how to decouple the fader from the mouse movement to improve accuracy without impairing the speed of interaction.
Convention Paper 7738 (Purchase now)
P15-6 Modeling of External Ear Acoustics for Insert Headphone Usage—Marko Hiipakka, Miikka Tikander, Matti Karjalainen, Helsinki University of Technology - Espoo, Finland
Although the acoustics of the external ear have been studied extensively for auralization and hearing aids, the acoustic behavior with insert headphones is not as well known. Our research focused on the effects of the outer ear’s physical dimensions, particularly on the sound pressure at the eardrum. The main parameter was the length of the ear canal, but the eardrum’s damping of resonances was also studied. Ear canal simulators and a dummy head were constructed, and measurements were also performed on human ear canals. The study was carried out both with unblocked ear canals and with the canal entrance blocked by an insert earphone. Special insert earphones with in-ear microphones were constructed for this purpose. Physics-based computational models were finally used to validate the approach.
Convention Paper 7739 (Purchase now)
Spatial Rendering–Part 1
Saturday, May 9, 09:00 — 11:30
Chair: Andreas Silzle
P16-1 An Alternative Ambisonics Formulation: Modal Source Strength Matching and the Effect of Spatial Aliasing—Franz Zotter, Hannes Pomberger, University of Music and Dramatic Arts - Graz, Austria; Matthias Frank, Graz University of Technology - Graz, Austria
Ambisonics synthesizes sound fields as a sum over angular (spherical/cylindrical harmonic) modes, resulting in the definition of an isotropically smooth angular resolution. This means that virtual sources are synthesized with outstanding smoothness across all angles of incidence, using discrete loudspeakers that uniformly cover a spherical or circular surface around the listening area. The classical Ambisonics approach models the fields of these discrete loudspeakers as a sampled continuum of plane waves. More accurately, the contemporary concept of Ambisonics instead uses a continuous angular distribution of point sources at finite distance, which is considerably easier to imagine. This also improves the accuracy of holophonic sound field synthesis and the analytic description of the sweet spot, the limited area of faultless synthesis that emerges from truncation of the angular harmonics. Additionally, playback with discrete loudspeakers causes spatial aliasing. This concept therefore allows for a successive consideration of the two major shortcomings of Ambisonics: the limited sweet-spot size and spatial aliasing. To elaborate on this concept, the paper starts with the solution of the nonhomogeneous wave equation for a spherical point-source distribution and ends with a novel study of spatial aliasing in Ambisonics.
Convention Paper 7740 (Purchase now)
P16-2 Sound Field Reproduction Employing Non-Omnidirectional Loudspeakers—Jens Ahrens, Sascha Spors, Deutsche Telekom Laboratories, Technische Universität Berlin - Berlin, Germany
In this paper we treat sound field reproduction via circular distributions of loudspeakers. The general formulation of the approach has been recently published by the authors. In this paper we concentrate on the employment of secondary sources (i.e., loudspeakers) whose spatio-temporal transfer function is not omnidirectional. The presented approach allows us to treat each spatial mode of the secondary source’s spatio-temporal transfer function individually. We finally outline the general process of incorporating spatio-temporal transfer functions obtained from microphone array measurements.
Convention Paper 7741 (Purchase now)
P16-3 Alterations of the Temporal Spectrum in High-Resolution Sound Field Reproduction of Different Spatial Bandwidths—Jens Ahrens, Sascha Spors, Deutsche Telekom Laboratories, Technische Universität Berlin - Berlin, Germany
We present simulations of the wave field reproduced by a discrete circular distribution of loudspeakers. The loudspeaker distribution is driven either with signals of infinite spatial bandwidth (as is the case in wave field synthesis) or with signals of finite spatial bandwidth (as is the case in near-field compensated higher-order Ambisonics). The different spatial bandwidths lead to different accuracies of the desired component of the reproduced wave field and to spatial aliasing artifacts with essentially different properties. Our investigation focuses on the potential consequences of these artifacts for human perception.
Convention Paper 7742 (Purchase now)
P16-4 Cooperative Spatial Audio Authoring: Systems Approach and Analysis of Use Cases—Jens-Oliver Fischer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Francis Gropengiesser, TU Ilmenau - Ilmenau, Germany; Sandra Brix, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany
Today’s audio production process is highly parallel and segregated. This is especially the case in the field of audio postproduction for motion pictures. The introduction of spatial audio systems like 5.1, 22.2, or Wave Field Synthesis adds a further production step, namely spatial authoring, to accomplish a rich experience for the audience. This paper proposes a system that enables audio engineers to work together on the same project. The proposed system is planned to be implemented for an existing spatial authoring software but can be utilized by any other application that organizes its data in a tree-structured way. Three major use cases, i.e., Single User, Work Space, and Work Group, are introduced and analyzed.
Convention Paper 7743 (Purchase now)
P16-5 Spatial Sampling Artifacts of Wave Field Synthesis for the Reproduction of Virtual Point Sources—Sascha Spors, Jens Ahrens, Deutsche Telekom Laboratories, Technische Universität Berlin - Berlin, Germany
Spatial sound reproduction systems with a large number of loudspeakers are increasingly being used. Wave field synthesis is a reproduction technique using a large number of densely placed loudspeakers (loudspeaker array). The underlying theory, however, assumes a continuous distribution of loudspeakers. Individual loudspeakers placed at discrete positions constitute a spatial sampling process that may lead to sampling artifacts. These may degrade the perceived reproduction quality and will limit the application of active control techniques like active room compensation. The sampling artifacts for the reproduction of plane waves have already been discussed in previous papers. This paper derives the spatial sampling artifacts and anti-aliasing conditions for the reproduction of virtual point sources on linear loudspeaker arrays using wave field synthesis techniques.
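As background (a commonly cited rule of thumb, not a result of this paper), the worst-case spatial aliasing frequency of a linear loudspeaker array follows directly from the loudspeaker spacing:

```python
def aliasing_frequency(spacing_m, c=343.0):
    """Worst-case spatial aliasing frequency (Hz) for a linear loudspeaker
    array with inter-element spacing `spacing_m` (meters). Above this
    frequency the sampled secondary source distribution no longer satisfies
    the spatial Nyquist criterion for all propagation directions."""
    return c / (2.0 * spacing_m)

# A typical WFS array with 15 cm spacing aliases above roughly 1.1 kHz:
f_al = aliasing_frequency(0.15)
```

Halving the spacing doubles the aliasing frequency, which is why dense arrays are needed for wide-band artifact-free reproduction.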
Convention Paper 7744 (Purchase now)
Room Acoustics & Loudspeaker Interaction
Saturday, May 9, 09:00 — 11:00
Chair: Eddy B. Brixen
P17-1 Effects of Loudspeaker Directivity on Perceived Sound Quality—A Review of Existing Studies—William Evans, University of Surrey - Guildford, Surrey, UK; Jakob Dyreby, Søren Bech, Bang & Olufsen A/S - Struer, Denmark; Slawomir Zielinski, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
The directivity of a loudspeaker system is often regarded as a prominent factor in the overall subjective quality of the reproduced sound experience. Much literature is available on the topic, and a broad range of opinion currently exists among designers. This paper provides an overview of the available literature, as well as an extended investigation into listener-based research. The results indicate that, for such a widely debated topic, conclusive measurement data from human listeners are limited; therefore, a proposal for more informative listening tests is presented.
Convention Paper 7745 (Purchase now)
P17-2 Subjective Validity of Figures of Merit for Room Aspect Ratio Design—Matthew Wankling, Bruno Fazenda, University of Huddersfield - Huddersfield, West Yorkshire, UK
Attempts have long been made to classify a room’s low-frequency audio reproduction capability with regard to its aspect ratio. The metrics commonly used have relied on a homogeneous distribution of modal frequencies, and from these a number of “optimal” aspect ratios have emerged. However, most of these metrics ignore the coupling of source and receiver to the mode shapes; only a few account for this in the derivation of a figure of merit. The subjective validity of these attempts is tested and discussed. Examples are given of supposedly good room ratios with bad performance and vice versa. A subjective assessment of various room scenarios was undertaken, and a ranking order was obtained and correlated with a proposed figure of merit.
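The mode-distribution metrics discussed here build on the standard rigid-wall rectangular-room formula; as an illustrative sketch (not code from the paper), the modal frequencies can be enumerated as:

```python
import math

def mode_frequencies(Lx, Ly, Lz, n_max=4, c=343.0):
    """Axial, tangential, and oblique mode frequencies (Hz) of a rigid-walled
    rectangular room with dimensions Lx x Ly x Lz (meters), for mode indices
    up to n_max per axis."""
    freqs = []
    for nx in range(n_max + 1):
        for ny in range(n_max + 1):
            for nz in range(n_max + 1):
                if nx == ny == nz == 0:
                    continue  # skip the trivial (0,0,0) case
                f = (c / 2.0) * math.sqrt((nx / Lx) ** 2 +
                                          (ny / Ly) ** 2 +
                                          (nz / Lz) ** 2)
                freqs.append(f)
    return sorted(freqs)

# Lowest axial mode of a 5 m dimension: c / (2 * 5) = 34.3 Hz
modes = mode_frequencies(5.0, 4.0, 3.0)
```

Aspect-ratio figures of merit then score how evenly such a list is spread; the point of the paper is that this spread alone ignores source/receiver coupling to the mode shapes.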
Convention Paper 7746 (Purchase now)
P17-3 A Study of Low-Frequency Near- and Far-Field Loudspeaker Behavior—John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada, B&W Group Ltd., Steyning, West Sussex, UK; Martial Rousseau, B&W Group Ltd. - Steyning, West Sussex, UK
Low-frequency loudspeaker measurements are difficult. Room reflections, mediocre anechoic chambers, and random noise play havoc with the quest. Diffraction is different in the near field and the far field. This paper covers a range of topics that bear on these problems, such as boundary element diffraction simulations, an approximate theory for low frequencies, methods to shorten the impulse response, and near-field characteristics. A few points are illustrated with measurements. An earlier simplified diffraction theory of Kessel is checked for axisymmetric cylindrical and rectangular boxes by boundary-element simulations, in an attempt to pin down the diffractive 4π-to-2π transition. It turns out to have a strong connection to the acoustic center of a loudspeaker. Some measurements are made under various conditions. Shortening methods are used to minimize the deleterious effect of truncating room reflections from the impulse response.
Convention Paper 7747 (Purchase now)
P17-4 Subwoofers in Symmetrical and Asymmetrical Rooms—Juha Backman, Nokia Corporation - Espoo, Finland
A theoretical study of the behavior of single and multiple subwoofers is presented, also taking into account the geometrical and acoustical asymmetry of practical listening environments. The results indicate that configurations aimed at precise cancellation of individual modes are highly sensitive to deviations from the ideal. With multiple subwoofers, however, it is possible to find robust placements that reduce both the spatial variation of the sound field and the frequency variation of the response. This, however, requires loudspeaker placements in which the height of the source above the floor is also varied.
Convention Paper 7748 (Purchase now)
Saturday, May 9, 10:30 — 12:00
P18-1 WhisPER—A New Tool for Performing Listening Tests—Simon Ciba, André Wlodarski, Hans-Joachim Maempel, Technical University of Berlin - Berlin, Germany
A software tool is presented for performing experiments in the field of perceptual audio evaluation and psychoacoustic measurement, controlling the interaction with both the subject and the playback environment. For this purpose a repertoire of test procedures has been implemented, including popular qualitative and quantitative approaches. By using OpenSound Control commands, not only traditional multichannel reproduction is supported, but also advanced spatial audio reproduction such as dynamic binaural synthesis or wave field synthesis. WhisPER has been written in MATLAB to facilitate its further development within the scientific community. As opposed to existing libraries, it provides a coherent graphical user interface system that allows easier access and configuration, even for users without advanced programming experience.
Convention Paper 7749 (Purchase now)
P18-2 Psychoacoustic Assessment of the Noise Emitted by Machines: The Case of Grinders—Marcos D. Fernández, José Antonio Ballesteros, Iván Suárez, Samuel Quintana, Isabel González, Escuela Universitaria Politécnica de Cuenca, Universidad de Castilla-La Mancha - Cuenca, Spain
Sound quality describes the suitability of the sound emitted by a machine, depending on the characteristics of that sound and on the perceptual sensation it produces, which reflects the degree of acceptance of the machine by the user. In order to evaluate the sound quality of the grinders under study, binaural recordings of the emitted sounds are required to determine the objective psychoacoustic parameters; subjective tests are then made with a representative number of people concerning the impression made by each particular sound.
Convention Paper 7750 (Purchase now)
P18-3 Investigations of the Effects of Nonlinear Distortions on Psychoacoustical Measures—Stephan Herzog, Technical University Kaiserslautern - Kaiserslautern, Germany
The perception of nonlinear distortions of audio devices, in particular nonlinear distortions of digital audio, is only insufficiently described by typical measures like THD. To provide better insight into the perceptual effects of nonlinear distortions, their audibility and their impact on psychoacoustical measures such as loudness and sharpness are examined. For this purpose a test method has been developed. The first step of the test is the measurement of the frequency response of the device under test with an efficient method that enables the separation of linear and nonlinear processing. The second step consists of the computation of the psychoacoustical measures and of the thresholds for the audibility of nonlinear distortions. Both computations are based on the same psychoacoustical model to obtain consistent results. Results for several types of distortion, obtained with simulations and with measurements on analog circuits, are presented.
Convention Paper 7751 (Purchase now)
Event, Stage, and Sound Reinforcement
Saturday, May 9, 13:30 — 15:00
Chair: Francis Rumsey, University of Surrey - Guildford, Surrey, UK
P19-1 Comparative Evaluation of Howling Detection Criteria in Notch-Filter-Based Howling Suppression—Toon van Waterschoot, Marc Moonen, Katholieke Universiteit Leuven - Leuven, Belgium
Notch-filter-based howling suppression (NHS) is one of the most popular methods for acoustic feedback control in public address and hands-free communication systems. The NHS method consists of two stages: howling detection and notch-filter design. While the design of notch filters is based on well-established filter design techniques, there is little agreement in the NHS literature on how the howling detection subproblem should be tackled. Moreover, since the NHS literature mainly consists of patents, only a few experimental results have been reported. The aim of this paper is to describe a unifying framework for howling detection and to provide a comparative evaluation of existing and novel howling detection criteria.
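As an illustration of the kind of criterion surveyed (a simplified sketch, not the paper's algorithm), a peak-to-average power ratio (PAPR) detector flags spectral bins that stand far above the average spectrum; the threshold value here is illustrative:

```python
import math

def howling_candidates(power_spectrum, papr_threshold_db=10.0):
    """Flag spectral bins whose peak-to-average power ratio (PAPR) exceeds a
    threshold, one of the classical howling-detection criteria. Input is a
    list of per-bin power values; returns the indices of candidate bins."""
    avg = sum(power_spectrum) / len(power_spectrum)
    candidates = []
    for i, p in enumerate(power_spectrum):
        if p > 0.0 and avg > 0.0:
            papr_db = 10.0 * math.log10(p / avg)
            if papr_db >= papr_threshold_db:
                candidates.append(i)
    return candidates

# A flat noise floor with one strong tonal component at bin 40:
spectrum = [1.0] * 256
spectrum[40] = 1e4
hits = howling_candidates(spectrum)
```

A real NHS system would combine such a spectral criterion with temporal ones (e.g., sustained growth of the candidate tone) before placing a notch filter.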
Convention Paper 7752 (Purchase now)
P19-2 Professional Wireless Microphone Systems: Current Situation and Upcoming Changes in Regulatory Issues in Europe and USA—Frank Ernst, Beyerdynamic GmbH & Co. KG - Heilbronn, Germany
Professional wireless microphones have been in use for almost 50 years now. Their operation is based on frequency sharing with TV broadcast transmitters. With the transition to digital TV, this situation changes. Digital TV is more spectrum-efficient; after the transition is completed, areas of the spectrum will be cleared of TV broadcast and will become available for new services. These cleared spectrum areas are referred to as the “digital dividend” or white spaces. With the allocation of these bands to other services, valuable resources for the operation of professional wireless microphones will be lost. This paper gives an overview of the current situation for professional wireless microphones and the upcoming changes with the transition to digital TV.
Convention Paper 7753 (Purchase now)
P19-3 Sound Field Reconstruction: An Improved Approach for Wave Field Synthesis—Mihailo Kolundzija, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland; Martin Vetterli, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland, University of California at Berkeley, Berkeley, CA, USA
Wave field synthesis (WFS) is a prevalent approach to multiple-loudspeaker sound reproduction for an extended listening area. Although powerful as a theoretical concept, its deployment is hampered by practical limitations due to diffraction, aliasing, and the effects of the listening room. Reconstructing the desired sound field in the listening area while accounting for the propagation characteristics of the medium is another approach, termed sound field reconstruction (SFR). It is based on the essential band-limitedness of the sound field, which makes it possible to match the reconstructed and desired sound fields continuously by matching them on a discrete set of points spaced below the Nyquist distance. We compare the two approaches in a common single-source free-field setup and show that SFR provides improved sound field reproduction compared to WFS in a wide listening area around a defined reference line.
Convention Paper 7754 (Purchase now)
Saturday, May 9, 13:30 — 15:00
P20-1 Subjective Audio Quality with Multiple Description Coding in a WLAN 802.11-Based Multicast Distribution Network—Marcus Purat, Tom Ritter, TFH Berlin - Berlin, Germany
This paper presents a number of results of a study of different methods to mitigate the impact of packet loss on the subjective quality in a wireless distribution network of compressed high fidelity audio. The system was simulated in MATLAB based on parameters of an 802.11a WLAN in multicast-mode and the Vorbis codec. The performance of multiple description coding in terms of expected subjective audio quality and its impact on the data rate is quantified and compared to other receiver-based packet loss concealment methods. The optimum set of parameters for a multiple description coder that achieves the best subjective audio quality for a given packet loss rate in the simulated system is presented.
Convention Paper 7755 (Purchase now)
P20-2 A Joint Approach to Extract Multiple Fundamental Frequencies in Polyphonic Signals Minimizing Gaussian Spectral Distance—Francisco J. Cañadas-Quesada, Pedro Vera-Candeas, Nicolás Ruiz-Reyes, Julio Jose Carabias-Orti, D. Martínez-Muñoz, University of Jaén - Jaén, Spain
This paper presents a joint estimation approach to extract multiple fundamental frequencies (F0) from monaural polyphonic music signals. In a frame-based analysis, we generate a spectral envelope for each combination of F0 candidates, from non-overlapped partials, under the assumption that a harmonic sound is characterized by a Gaussian mixture model (GMM). The optimal combination of F0 candidates minimizes a Euclidean spectral distance between the original spectrum and the Gaussian spectral models. An evaluation carried out on several piano recordings shows promising results.
Convention Paper 7756 (Purchase now)
P20-3 A Mixture-of-Experts Approach for Note Onset Detection—Norberto Degara, Antonio Pena, Manuel Sobreira-Seoane, Universidade de Vigo - Vigo, Spain; Soledad Torres-Gijarro, Laboratorio Oficial de Metroloxía de Galicia (LOMG) - Tecnópole, Ourense, Spain
Finding the starting time of events (onsets) is useful in a number of applications for audio signals. The goal of this paper is to present a combination of techniques for automatic detection of events in audio signals. The proposed system uses a supervised classification algorithm to combine a set of features extracted from the audio signal and reduce the original signal to a robust detection function. Onsets are obtained by using a simple peak-picking algorithm. This paper describes the analysis system used to extract the features and the details of the neural network algorithm used to combine them. We conclude by comparing the performance of the proposed algorithm with the system that obtained the first place in the 2005 Music Information Retrieval Evaluation eXchange.
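A minimal sketch of the general pipeline described here (detection function plus peak picking), using half-wave-rectified spectral flux as a stand-in detection function rather than the paper's learned combination of features:

```python
def spectral_flux(spectrogram):
    """Half-wave-rectified spectral flux: sums the positive magnitude
    increases between consecutive frames. `spectrogram` is a list of
    frames, each a list of per-bin magnitudes."""
    flux = [0.0]
    for prev, cur in zip(spectrogram[:-1], spectrogram[1:]):
        flux.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return flux

def pick_onsets(detection_fn, threshold):
    """Simple peak picking: a frame is an onset if it is a local maximum
    of the detection function and exceeds the threshold."""
    onsets = []
    for i in range(1, len(detection_fn) - 1):
        if (detection_fn[i] > threshold
                and detection_fn[i] >= detection_fn[i - 1]
                and detection_fn[i] > detection_fn[i + 1]):
            onsets.append(i)
    return onsets

# A toy spectrogram: quiet, then a broadband burst at frame 3, then decay.
spec = [[0.1] * 8, [0.1] * 8, [0.1] * 8, [1.0] * 8, [0.8] * 8, [0.6] * 8]
onsets = pick_onsets(spectral_flux(spec), threshold=1.0)  # -> [3]
```

In the paper's mixture-of-experts setting, the hand-crafted flux would be replaced by a neural combination of several such features before the peak-picking stage.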
Paper presented by Soledad Torres-Gijarro.
Convention Paper 7757 (Purchase now)
P20-4 Automatic Adjustment of Off-the-Shelf Reverberation Effects—Sebastian Heise, Michael Hlatky, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany; Jörn Loviscach, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany
Virtually all effect units that process digital audio—software plug-ins as well as dedicated hardware—can be controlled digitally. This allows their settings to be subjected to optimization processes. We demonstrate the automatic adaptation of reverberation plug-ins to given room impulse responses. This facilitates replacing computationally expensive convolution reverberation units with standard ones, which are also amenable to easier parameter tweaking after their overall setting has been adjusted through our method. We propose optimization strategies for this multi-dimensional nonlinear problem that need no adaptation to the particularities of each effect unit and that are sped up using multicore processors and networked computers. The optimization process evaluates the difference between the actual response and the targeted response on the basis of psychoacoustic features. An acoustic comparison with effect parameter settings crafted by professional human operators indicates that the computationally optimized settings yield comparable or better results.
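The black-box optimization idea can be sketched with a toy one-parameter "effect" and a crude energy distance standing in for the psychoacoustic feature distance (all names and the decay model are hypothetical, not the authors' implementation):

```python
import math

def toy_reverb_ir(rt60, n=1000, fs=1000.0):
    """Hypothetical stand-in for an effect unit: an exponential decay whose
    rate is set by a single 'RT60' parameter (seconds)."""
    k = math.log(1000.0)  # -60 dB corresponds to a factor of 1/1000
    return [math.exp(-k * (i / fs) / rt60) for i in range(n)]

def distance(ir_a, ir_b):
    """Sample-wise energy distance between two impulse responses (a crude
    stand-in for a psychoacoustic feature distance)."""
    return sum((a - b) ** 2 for a, b in zip(ir_a, ir_b))

def optimize_rt60(target_ir, candidates):
    """Derivative-free search over a parameter grid: render each setting
    through the (black-box) effect and keep the closest match."""
    return min(candidates, key=lambda rt: distance(toy_reverb_ir(rt), target_ir))

target = toy_reverb_ir(0.8)  # pretend this is a measured room impulse response
best = optimize_rt60(target, [0.2, 0.4, 0.6, 0.8, 1.0])  # -> 0.8
```

A real plug-in exposes many interacting parameters, which is why the paper needs multi-dimensional strategies and parallel evaluation rather than a one-dimensional grid.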
Convention Paper 7758 (Purchase now)
P20-5 Improvements on Automatic Parametric Equalization and Cross-Over Alignment of Audio Systems—German Ramos, Pedro Tomas, Technical University of Valencia - Valencia, Spain
The idea and algorithm for automatic parametric equalization and cross-over alignment of audio systems were proposed previously by one of the authors, with proven success. The method designs Infinite Impulse Response (IIR) equalization and cross-over filters directly as a series of second-order sections (SOS), employing peak filters and pre-initialized high-pass and low-pass filters defined by their parameters (frequency, gain, and Q). The method supports the inclusion of constraints (maximum and minimum parameter values) and designs the SOS in order of their importance to the equalization, thus providing a scalable filter implementation. In order to lower the required filter order, and aiming for an automatic decision on the selection and initialization of filter types, several improvements are presented. The algorithm can now select, configure, and use shelving filters in the SOS chain for equalization. In addition, the decision on and initialization of the needed high-pass and low-pass SOS filters can be automatic, which helps in the cross-over design stage for active audio systems.
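The peak-filter SOS building block referred to here is commonly realized with the RBJ "Audio EQ Cookbook" formulas; a sketch of one such section and a response check (not the authors' implementation):

```python
import cmath
import math

def peaking_sos(f0, gain_db, q, fs):
    """Second-order peaking-filter section (b, a coefficients, normalized so
    that a[0] = 1), following the widely used RBJ Audio EQ Cookbook."""
    big_a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * big_a, -2.0 * math.cos(w0), 1.0 - alpha * big_a]
    a = [1.0 + alpha / big_a, -2.0 * math.cos(w0), 1.0 - alpha / big_a]
    return [bi / a[0] for bi in b], [ai / a[0] for ai in a]

def magnitude_db(b, a, f, fs):
    """Magnitude response (dB) of a biquad at frequency f."""
    z = cmath.exp(1j * 2.0 * math.pi * f / fs)
    num = b[0] + b[1] / z + b[2] / z ** 2
    den = a[0] + a[1] / z + a[2] / z ** 2
    return 20.0 * math.log10(abs(num / den))

# A +6 dB boost at 1 kHz, Q = 2, at 48 kHz sampling rate:
b, a = peaking_sos(f0=1000.0, gain_db=6.0, q=2.0, fs=48000.0)
```

Cascading such sections, each defined by (frequency, gain, Q), gives exactly the scalable SOS chain the abstract describes; shelving and high-/low-pass sections use analogous cookbook formulas.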
Convention Paper 7759 (Purchase now)
P20-6 Low Noise Transformer Input Preamp Design—A Solution that Eliminates CMID—Milan Kovinic, MMK Instruments - Belgrade, Serbia; Dragan Drincic, Advanced School for Electrical & Computer Engineering - Belgrade, Serbia; Sasha Jankovic, OXYGEN-Digital, Parkgate Studio - Sussex, UK
This paper examines more closely the advantages of input transformer and op-amp configurations, especially those implemented in low-noise designs. The usual transformer input stage topology uses a non-inverting architecture, since this allows the transformer to work with optimum loading and maximizes the signal-to-noise ratio. However, this configuration is subject to Common-Mode voltage Induced Distortion (CMID), and the susceptibility is further increased if the amplifier's source impedance is not perfectly matched. This is illustrated by tests on popular audio op-amps. The advanced transformer input stage topology proposed in this paper completely prevents this kind of distortion. Noise performance remains unaffected, and listening tests in a practical application confirm the sound to be more pleasant.
Convention Paper 7760 (Purchase now)
Spatial Rendering–Part 2
Saturday, May 9, 15:00 — 18:30
Chair: Sascha Spors, Technical University of Berlin - Berlin, Germany
P21-1 Score File Generators for Boids-Based Granular Synthesis in Csound—Enda Bates, Dermot Furlong, Trinity College - Dublin, Ireland
In this paper we present a set of score file generators and granular synthesis instruments for the Csound language. The applications use spatial data generated by the Boids flocking algorithm along with various user-defined values to generate score files for grainlet additive synthesis, granulation, and glisson synthesis instruments. Spatialization is accomplished using Higher Order Ambisonics and distance effects are modeled using the Doppler Effect, early reflections, and global reverberation. The sonic quality of each synthesis method is assessed and an original composition by the author is presented.
Convention Paper 7761 (Purchase now)
P21-2 Acoustical Rendering of an Interior Space Using the Holographically Designed Sound Array—Wan-Ho Cho, Jeong-Guon Ih, KAIST - Daejeon, Korea
It has been reported that the filter for an acoustic array can be designed inversely in a holographic way, as demonstrated in a free field. In this study, the same method, based on the boundary element method (BEM), was employed to render an interior sound field in an acoustically desired fashion. Because the inverse BEM technique can deal with arbitrarily shaped source or bounding surfaces, one can simultaneously consider the effects of an irregular radiation surface and of reflecting boundaries having impedances, such as walls, floor, and ceiling. To examine the applicability of the method, a field-rendering example was tested to control the relative spatial distribution of sound pressure in the enclosed field.
Convention Paper 7762 (Purchase now)
P21-3 Validation of a Loudspeaker-Based Room Auralization System Using Speech Intelligibility Measures—Sylvain Favrot, Jörg M. Buchholz, Technical University of Denmark - Lyngby, Denmark
A novel loudspeaker-based room auralization (LoRA) system has been proposed to generate versatile and realistic virtual auditory environments (VAEs) for investigating human auditory perception. This system efficiently combines modern room acoustic models with loudspeaker auralization using either single-loudspeaker or higher-order Ambisonics (HOA) auralization. The LoRA signal processing of the direct sound and the early reflections was investigated by measuring the speech intelligibility enhancement provided by early reflections in diffuse background noise. Danish sentences were simulated in a classroom, and the direct sound and each early reflection were auralized with either a single loudspeaker, HOA, or first-order Ambisonics. The results indicate that (i) absolute intelligibility scores depend significantly on the reproduction technique and that (ii) early reflections reproduced with HOA provide a similar intelligibility benefit as those reproduced with a single loudspeaker. It is concluded that speech intelligibility experiments can be carried out with the LoRA system using either the single-loudspeaker or the HOA technique.
Convention Paper 7763 (Purchase now)
P21-4 Low Complexity Directional Sound Sources for Finite Difference Time Domain Room Acoustic Models—Alexander Southern, Damian Murphy, University of York - York, UK
The demand for more natural and realistic auralization has resulted in a number of approaches to the time domain implementation of directional sound sources in wave-based acoustic modeling schemes such as the Finite Difference Time Domain (FDTD) method and the Digital Waveguide Mesh (DWM). This paper discusses an approach for implementing simple regular directive sound sources using multiple monopole excitations with distributed spatial positioning. These arrangements are tested along with a discussion of the characteristic limitations for each setup scenario.
Convention Paper 7764 (Purchase now)
P21-5 Binaural Reverberation Using a Modified Jot Reverberator with Frequency-Dependent and Interaural Coherence Matching—Fritz Menzer, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland
An extension of the Jot reverberator is presented, producing binaural late reverberation where the interaural coherence can be controlled as a function of frequency such that it matches the frequency-dependent interaural coherence of a reference binaural room impulse response (BRIR). The control of the interaural coherence is implemented using linear filters outside the reverberator’s recursive loop. In the absence of a reference BRIR, these filters can be calculated from an HRTF set.
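The underlying principle of coherence control, mixing two uncorrelated signals with an orthogonal gain pair, can be sketched per frequency band as follows (a textbook construction, not the paper's exact filter design):

```python
import math

def coherence_mixing_gains(phi):
    """Gains that mix two uncorrelated, unit-power signals x1, x2 into
    L = gl1*x1 + gl2*x2 and R = gr1*x1 + gr2*x2 so that L and R each keep
    unit power and have correlation coefficient `phi` (real-valued)."""
    theta = 0.5 * math.acos(max(-1.0, min(1.0, phi)))
    gl1, gl2 = math.cos(theta), math.sin(theta)
    gr1, gr2 = math.cos(theta), -math.sin(theta)
    return (gl1, gl2), (gr1, gr2)

# In a frequency-dependent design, phi would follow the interaural coherence
# measured from the reference BRIR in each band.
(gl1, gl2), (gr1, gr2) = coherence_mixing_gains(0.3)
```

Because cos(2θ) equals the resulting correlation, sweeping phi from 1 to 0 moves the pair from identical to fully decorrelated outputs; applying this per band with linear filters is what keeps the processing outside the reverberator's recursive loop.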
Convention Paper 7765 (Purchase now)
P21-6 Design and Limitations of Non-Coincidence Correction Filters for Soundfield Microphones—Christof Faller, Illusonic LLC - Lausanne, Switzerland; Mihailo Kolundzija, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland
The tetrahedral microphone capsule arrangement in a soundfield microphone captures a so-called A-format signal, which is then converted to a corresponding B-format signal. The phase differences between the A-format signal channels due to non-coincidence of the microphone capsules cause coloration and errors in the corresponding B-format signals and linear combinations thereof. Various strategies for designing B-format non-coincidence correction filters are compared and limitations are discussed.
Convention Paper 7766 (Purchase now)
P21-7 Generalized Multiple Sweep Measurement—Stefan Weinzierl, Andre Giese, Alexander Lindau, TU Berlin - Berlin, Germany
A system identification by impulse response measurements with multiple sound source configurations can benefit greatly from time-efficient measurement procedures. An optimized method based on interleaving and overlapping multiple exponential sweeps (MESM) used as excitation signals was presented by Majdak et al. (2007). For single system identifications, however, much higher signal-to-noise ratios (SNR) can be reached with sweeps whose magnitude spectra are adapted to the background noise spectrum of the acoustical environment, as proposed by Müller & Massarani (2001). We investigated under which conditions and to what extent the efficiency of multiple sweep measurements can be increased by using arbitrary, spectrally adapted sweeps. An extension of the MESM approach toward generalized sweep spectra is presented, along with a recommended measurement procedure and a prediction of the efficiency of multiple sweep measurements under typical measurement conditions.
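For reference, the exponential sweep that both cited methods build on can be generated as follows (a standard Farina-style formulation, not code from the paper):

```python
import math

def exponential_sweep(f1, f2, duration, fs):
    """Exponential (logarithmic) sine sweep from f1 to f2 Hz over `duration`
    seconds at sample rate fs. Returns the samples and the instantaneous
    phase, so the instantaneous frequency can be inspected."""
    n = int(duration * fs)
    L = duration / math.log(f2 / f1)  # sweep rate constant
    phase = [2.0 * math.pi * f1 * L * (math.exp((i / fs) / L) - 1.0)
             for i in range(n)]
    return [math.sin(p) for p in phase], phase

sweep, phase = exponential_sweep(20.0, 20000.0, 2.0, 48000.0)
```

Its instantaneous frequency rises as f1·e^(t/L), from f1 at the start to f2 at the end; the MESM idea interleaves and overlaps several such sweeps, and the paper generalizes the scheme to sweeps with noise-adapted magnitude spectra.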
Convention Paper 7767 (Purchase now)
Microphones and Headphones
Saturday, May 9, 15:00 — 18:30
Chair: William Evans, University of Surrey - Guildford, Surrey, UK
P22-1 Frequency Response Adaptation in Binaural Hearing—David Griesinger, Consultant - Cambridge, MA, USA
The pinna and ear canals act as listening trumpets to concentrate sound pressure on the eardrum. This concentration is strongly frequency-dependent, typically showing a rise in pressure of 20 dB at 3000 Hz. In addition, diffraction and reflections from the pinna substantially alter the frequency response of the eardrum pressure as a function of the direction of a sound source. In spite of these large departures from a flat response, listeners usually report that a uniform pink power spectrum sounds frequency-balanced, and loudspeakers are manufactured to this standard. But on close listening, frontal pink noise does not sound uniform. The ear clearly uses adaptive correction of timbre to achieve these results. This paper discusses and demonstrates the properties and limits of this adaptation. The results are important for our experience of live music in halls and for the reproduction of music through loudspeakers and headphones.
Convention Paper 7768 (Purchase now)
P22-2 Concha Headphones and Their Coupling to the Ear—Lola Blanchard, Bang & Olufsen ICEpower s/a - Lyngby, Denmark; Finn T. Agerkvist, Technical University of Denmark - Lyngby, Denmark
The purpose of this study is to obtain a better understanding of concha headphones, the small type of earpiece that is placed in the concha. These headphones are not sealed to the ear, so there is a leak between the earpiece and the ear; this leak is the reason for the significant lack of bass when using such headphones. This paper investigates the coupling between the headphone and the ear by means of measurements on artificial ears and models. The influence of the back volume is taken into account.
Convention Paper 7769 (Purchase now)
P22-3 Subjective Evaluation of Headphone Target Frequency Responses—Gaëtan Lorho, Nokia Corporation - Finland
The effect of headphone frequency response equalization on listeners’ preference was studied for music and speech reproduction. The high-quality circum-aural headphones selected for this listening experiment were first equalized to produce a flat frequency response. Then a set of filters was created based on two parameters defining the amplitude and center frequency of the main peak found around 3 kHz in the free-field and diffuse-field equalization curves. Two listening tests with different methodologies and a total of 80 listeners were carried out to evaluate these equalization candidates. The results of this study indicate that a target frequency response with a 3 kHz peak of lower amplitude than in the diffuse-field response is preferred by listeners for both music and speech.
Convention Paper 7770 (Purchase now)
P22-4 Study and Consideration on Symmetrical KEMAR HATS Conforming to IEC60959—Kiyofumi Inanaga, Homare Kon, Sony Corporation - Tokyo, Japan; Gunnar Rasmussen, Per Rasmussen, G.R.A.S. Sound & Vibration A/S - Holte, Denmark; Yasuhiro Riko, Riko Associates - Yokohama, Japan
KEMAR is widely recognized as a leading model of head and torso simulator (HATS) for different types of acoustic measurements, meeting the requirements of the global industrial standards ANSI S3.36/ASA58-1985 and IEC 60959:1990. One of the KEMAR HATS pinna models has a reputation for good reproducibility of measured results when examining headphones and earphones. However, it requires free-field compensation in order to conduct the measurements; thus, the head-related transfer function (HRTF) of a HATS fitted with this pinna model must be corrected. Because headphones and earphones are usually designed symmetrically, we developed a prototype Symmetrical KEMAR HATS, based on the original KEMAR fitted with the pinna model with good reproducibility, and measured and evaluated a set of HRTFs from the sound source to both ears. Our study concludes that the developed HATS has symmetrical characteristics, that it is suitable as a tool for measuring the quality of a variety of acoustic devices alongside the conventional KEMAR, and that it can serve as a new common platform for different types of electroacoustic measurements.
Convention Paper 7771 (Purchase now)
P22-5 Spatio-Temporal Gradient Analysis of Differential Microphone Arrays—Mihailo Kolundzija, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland; Martin Vetterli, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland, University of California at Berkeley, Berkeley, CA, USA
The literature on gradient and differential microphone arrays makes a distinction between the two, yet shows how both types can be used to obtain the same response. A theoretically sound rationale for using delays in differential microphone arrays has, however, not yet been given. This paper presents a gradient analysis of the sound field viewed as a spatio-temporal phenomenon and gives a theoretical interpretation of the working principles of gradient and differential microphone arrays. It shows that both types of microphone arrays can be viewed as devices for approximately measuring the spatio-temporal derivatives of the sound field. Furthermore, it motivates the design of high-order differential microphone arrays using this spatio-temporal gradient analysis.
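The first-order case can be illustrated numerically: subtracting a delayed rear microphone from a front one approximates a spatio-temporal derivative and yields the familiar first-order directivity patterns (a minimal sketch; the parameter values are illustrative):

```python
import cmath
import math

def differential_pair_response(theta, f, d, delay, c=343.0):
    """Far-field magnitude response of a two-element differential array:
    the rear microphone, delayed by `delay` seconds, is subtracted from the
    front one, for a plane wave arriving at angle theta (0 = on-axis) and
    frequency f, with inter-microphone spacing d (meters)."""
    tau = d * math.cos(theta) / c  # inter-mic travel time along the axis
    return abs(1.0 - cmath.exp(-1j * 2.0 * math.pi * f * (tau + delay)))

# Choosing delay = d/c places a null at the rear (a cardioid-like pattern):
d = 0.02
rear = differential_pair_response(math.pi, 1000.0, d, d / 343.0)
front = differential_pair_response(0.0, 1000.0, d, d / 343.0)
```

Setting delay = 0 instead gives the figure-of-eight gradient response; the paper's point is that both cases are approximations of the same spatio-temporal derivative.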
Convention Paper 7772 (Purchase now)
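The delay-and-subtract principle analyzed in this paper can be illustrated numerically. A minimal sketch of a first-order differential array's directional response (spacing, delay, and frequency are illustrative values, not taken from the paper):

```python
import numpy as np

def diff_array_response(theta, f, d=0.01, T=None, c=343.0):
    """Magnitude response of a two-element delay-and-subtract
    (first-order differential) array to a plane wave from angle theta.
    theta: arrival angle in radians (0 = on-axis), f: frequency in Hz,
    d: microphone spacing in m, T: internal delay in s."""
    if T is None:
        T = d / c  # delay equal to the acoustic travel time gives a cardioid
    w = 2.0 * np.pi * f
    # y(t) = x1(t) - x2(t - T); for a plane wave the transfer magnitude is
    # |1 - exp(-j w (T + (d / c) cos theta))|, which for small w is
    # proportional to a combined spatio-temporal derivative of the field.
    return np.abs(1.0 - np.exp(-1j * w * (T + (d / c) * np.cos(theta))))

f = 500.0  # well below the spatial-aliasing limit c / (2 d)
front = diff_array_response(0.0, f)
back = diff_array_response(np.pi, f)
print(front, back)  # the rear arrival is nulled when T = d / c
```

For T = d/c the internal delay exactly cancels the acoustic travel time for a rear arrival, producing the cardioid null; this is the role of delays that the paper's spatio-temporal gradient view explains.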
P22-6 The Analog Microphone Interface and its History—Joerg Wuttke, Joerg Wuttke Consultancy - Pfinztal, Germany, Schoeps GmbH, Karlsruhe, Germany
The interface between microphones and microphone inputs has special characteristics and requires special attention. The low output levels of microphones and the possible need for long cables have made it necessary to think about noise and interference of all kinds. A microphone input is also the electrical load for a microphone and can have an adverse influence on its performance. Condenser microphones contain active circuitry that requires some form of powering. With the introduction of transistorized circuitry in the 1960s, it became practical for this powering to be incorporated into microphone inputs. Various methods appeared in the beginning; 48-volt phantom powering is now the dominant standard, but even this standard method is still not always implemented correctly.
Convention Paper 7773 (Purchase now)
P22-7 Handling Noise Analysis in Large Cavity Microphone Windshields—Improved Solution—Philippe Chenevez, CINELA - Paris, France
Pressure gradient microphones are well known to be highly sensitive to vibrations. Carefully designed suspensions provide the best possible isolation, but when the microphone is placed inside a large-cavity windshield, the external skin behaves as a drum excited by the vibrations of the support (boom or stand). As a consequence, structure-borne noise is also transmitted acoustically to the microphone, due to the close proximity of the skin. Some theoretical aspects and practical measurements are presented, together with a proposed improved solution.
Convention Paper 7774 (Purchase now)
Psychoacoustics and Perception
Saturday, May 9, 16:30 — 18:00
P23-1 Influence of the Listening Room in the Perception of a Musical Work—Nelia Valverde, Marcos D. Fernández, José Antonio Ballesteros, Leticia Martínez, Samuel Quintana, Isabel González, Escuela Universitaria Politécnica de Cuenca - Cuenca, Spain
Listening to the same musical composition generates a unique perception in every listener, but at the same time the specific acoustic conditions of the room have a decisive influence on that perception. In order to evaluate such differences depending on the listening room, a musical work for choir was composed and recorded with a HATS in an anechoic room, in a reverberant room, and in a normal room. Using those recordings, surveys of professional musicians and non-expert listeners were carried out after they had heard the recordings over headphones, and the answers obtained were evaluated in order to determine the influence of the listening room on the perception of the musical work.
Convention Paper 7775 (Purchase now)
P23-2 Comparison of Methods for Measuring Sound Quality through HATS and Binaural Microphones—José Antonio Ballesteros, Marcos D. Fernández, Samuel Quintana, Isabel González, Laura Rodríguez, Escuela Universitaria Politécnica de Cuenca, Universidad de Castilla-La Mancha - Cuenca, Spain
Sound quality techniques are currently gaining importance as they take into account the human perception of sound. To date, there are no well-established international standards for measuring sound quality and no widely recognized reference index for its assessment. Either a HATS or a pair of binaural microphones can therefore be used for measuring the typical sound quality parameters. A set of measurements, under the same conditions, has been carried out using both devices in order to assess the differences and the possible variation in the results. Based on these results, guidance is given for choosing the device that best fits each measurement context.
Convention Paper 7776 (Purchase now)
P23-3 Improving Perceived Tempo Estimation by Statistical Modeling of Higher-Level Musical Descriptors—Ching-Wei Chen, Markus Cremer, Kyogu Lee, Peter DiMaria, Ho-Hsiang Wu, Gracenote, Inc. - Emeryville, CA, USA
Conventional tempo estimation algorithms generally work by detecting significant audio events and finding periodicities of repetitive patterns in an audio signal. However, human perception of tempo is subjective and relies on a far richer set of information, causing many tempo estimation algorithms to suffer from octave errors, or “double/half-time” confusion. In this paper we propose a system that uses higher-level musical descriptors such as mood to train a statistical model of perceived tempo classes, which can then be used to correct the estimate from a conventional tempo estimation algorithm. Our experimental results show reliable classification of perceived tempo class, as well as a significant reduction of octave errors when applied to an array of available tempo estimation algorithms.
Convention Paper 7777 (Purchase now)
P23-4 Perceptually-Motivated Audio Morphing: Softness—Duncan Williams, Tim Brookes, University of Surrey - Guildford, Surrey, UK
A system for morphing the softness and brightness of two sounds independently from their other perceptual or acoustic attributes was coded. The system is an extension of a previous one that morphed brightness only, that was based on the Spectral Modeling Synthesis additive/residual model. A Multidimensional Scaling analysis, of listener responses to paired comparisons of stimuli generated by the morpher, showed movement in three perceptually-orthogonal directions. These directions were labeled in a subsequent verbal elicitation experiment that found that the effects of the brightness and softness controls were perceived as intended. A Timbre Morpher, adjusting additional timbral attributes with perceptually-meaningful controls, can now be considered for further work.
Convention Paper 7778 (Purchase now)
P23-5 Resolution of Spatial Distribution Perception with Distributed Sound Source in Anechoic Conditions—Olli Santala, Ville Pulkki, Helsinki University of Technology - Espoo, Finland
The resolution of directional perception of spatially distributed sound sources was investigated in a listening test in an anechoic chamber using various sound source distributions. Fifteen loudspeakers were used to produce test cases that included sound sources of varying widths and wide sound sources with gaps in the distribution. The subjects were asked to indicate which loudspeakers they perceived as emitting sound. The results show that small gaps in the sound source were not perceived accurately and that wide sound sources were perceived as narrower than they actually were. The results also indicate that the resolution for fine spatial details was worse than 15 degrees when the sound source was wide.
Convention Paper 7779 (Purchase now)
P23-6 Perceived Roughness—A Recent Psychoacoustic Measurement—Robert Mores, Thorsten Smit, Jana-Marie Wiese, University of Applied Sciences - Hamburg, Germany
This paper revisits Aures' 1984 investigation of perceived roughness, whose findings were based on psychoacoustic tests with synthetic sounds and a small group of listeners. Those results have repeatedly been used for modeling roughness perception since then, for instance in the context of noise perception. Roughness is again an issue when investigating the perceived quality or timbre of musical sounds, where it is one of some ten mid-level features to be extracted. Here, perceived roughness is measured again, but on a broader basis than in the earlier investigation. This paper outlines the psychoacoustic investigation, which essentially follows the method of Aures but modifies some of the questions under study. The results are plausible and differ from the earlier findings in several respects.
Convention Paper 7780 (Purchase now)
P23-7 A Physiological Auditory Model—Václav Vencovsky, Czech Technical University in Prague - Prague, Czech Republic
A physiological auditory model is described. The model simulates the processing of sound by the outer, middle, and inner ear. The nonlinear inner-ear model comprises a cochlear frequency-selectivity model and an inner-hair-cell model designed according to mammalian physiological data. The model's capability to simulate human psychophysical masking data is verified.
Convention Paper 7781 (Purchase now)
Assessment and Evaluation
Sunday, May 10, 09:00 — 12:30
Chair: Gaëtan Lorho
P24-1 Influence of Level Setting on Loudspeaker Preference Ratings—Vincent Koehl, Mathieu Paquier, Université de Brest - Plouzané, France
The perceived audio quality of a sound-reproduction device such as a loudspeaker is hard to evaluate. Industrial and academic researchers are still working on the design of reliable assessment procedures to measure this subjective character. One of the main issues with listening tests concerns their validity with regard to real comparison situations (Hi-Fi magazine evaluations, audiophiles, sound engineers, customers, etc.). Are the conclusions of laboratory tests consistent with these rather informal comparisons? For example, one of the main differences between listening tests and real-life comparisons concerns loudness matching. This paper compares paired-comparison tests as commonly performed under laboratory conditions with a procedure assumed to be closer to real-life conditions. It shows that differences in the test procedures led to differences in the subjective assessments.
Convention Paper 7782 (Purchase now)
P24-2 Comparing Three Methods for Sound Quality Evaluation with Respect to Speed and Accuracy—Florian Wickelmaier, Nora Umbach, Konstantin Sering, University of Tübingen - Tübingen, Germany; Sylvain Choisel, Bang & Olufsen A/S - Struer, Denmark, now at Philips Consumer Lifestyle, Leuven, Belgium
The goal of the present study was to compare three response-collection methods that may be used in sound quality evaluation. To this end, 52 listeners took part in an experiment where they assessed the audio quality of musical excerpts and six processed versions thereof. For different types of program material, participants performed (a) a direct ranking of the seven sound samples, (b) pairwise comparisons, and (c) a novel procedure, called ranking by elimination. The latter requires subjects on each trial to eliminate the least preferred sound; the elimination continues until only the sample with the highest audio quality is left. The methods are compared with respect to the resulting ranking/scaling and the time required to obtain the results.
Convention Paper 7783 (Purchase now)
P24-3 Reference Units for the Comparison of Speech Quality Test Results—Nicolas Côté, Ecole Nationale d’Ingénieurs de Brest - Plouzané, France, Deutsche Telekom Laboratories, Berlin, Germany; Vincent Koehl, Université de Brest - Plouzané, France; Valérie Gautier-Turbin, France Telecom R&D - Lannion, France; Alexander Raake, Sebastian Möller, Deutsche Telekom Laboratories - Berlin, Germany
Subjective tests are carried out to assess the quality of an entity as perceived by a user. However, several characteristics inherent to the subject or to the test methodology might influence the users’ judgments. As a result, reference conditions are usually included in subjective tests. In the field of quality of transmitted speech, reference conditions correspond to a speech sample impaired by a known amount of degradation. In this paper several kinds of reference conditions and the process used for their production are presented. Examples of the corresponding normalization procedure of each kind of reference are given.
Convention Paper 7784 (Purchase now)
P24-4 The Influence of Sound Processing on Listeners’ Program Choice in Radio Broadcasting—Hans-Joachim Maempel, Fabian Gawlik, Technische Universität Berlin - Berlin, Germany
Many opinions on broadcast sound processing are founded on tacit assumptions about certain effects on listeners. These assumptions, however, have so far lacked support from internally and ecologically valid empirical data. Thus, under largely realistic conditions, it has been experimentally investigated to what extent broadcast sound processing influences listeners' program choice. Technical features of the stimuli, socio-demographic data of the test persons, and data on the listening conditions were additionally collected. In the main experiment, subjects were asked to choose one out of six audio stimuli varied in content and sound processing. The varied sound processing caused marginal, statistically nonsignificant differences in the frequencies of program choice. By contrast, a subsequent experiment enabling a direct comparison of different processings of the same audio content yielded distinct preferences for certain processings.
Convention Paper 7785 (Purchase now)
P24-5 Free Choice Profiling and Natural Grouping as Methods for the Assessment of Emotions in Musical Audio Signals—Sebastian Schneider, Florian Raschke, Ilmenau University of Technology - Ilmenau, Germany; Gabriel Gatzsche, Fraunhofer Institute for Digital Media Technology, IDMT - Ilmenau, Germany; Dominik Strohmeier, Ilmenau University of Technology - Ilmenau, Germany
To measure the perceived emotions evoked by musical audio signals we propose to use "Free Choice Profiling" (FCP) combined with "Natural Grouping" (NG). FCP/NG, originally derived from food research and new to the study of music perception, allow participants to evaluate stimuli using their own vocabulary. To evaluate the proposed methods, we conducted an experiment in which 16 participants had to assess major-major and minor-minor chord pairs. Contrary to what one might expect, allowing participants to express themselves freely does not degrade the quality of the data. Instead, clearly interpretable results consistent with music theory and emotional psychology were obtained. These results encourage further investigations, which could lead to a general method for assessing emotions in music.
Convention Paper 7786 (Purchase now)
P24-6 Subjective Quality Evaluation of Audio Streaming Applications on Absolute and Paired Rating Scales—Bernhard Feiten, Alexander Raake, Marie-Neige Garcia, Ulf Wüstenhagen, Jens Kroll, Deutsche Telekom Laboratories - Berlin, Germany
In the context of the development of a parametric model for the quality assessment of audiovisual IP-based multimedia applications, audio tests have been carried out. The test method used for the subjective audio tests was aligned with the method used for video tests; hence, the Absolute Category Rating (ACR) method was applied. To prove the usability of ACR tests for this purpose, MUSHRA and ACR were applied in parallel listening tests. The MPEG audio codecs AAC, HE-AAC, MP2, and MP3 at different bit rates and under different packet-loss conditions were evaluated. The test results show that the ACR method also reveals the quality differences at higher qualities, even though MUSHRA has superior resolution.
Convention Paper 7787 (Purchase now)
P24-7 Assessor Selection Process for Multisensory Applications—Søren Vase Legarth, Nick Zacharov, DELTA SenseLab - Hørsholm, Denmark
Assessor panels are used to perform perceptual evaluation tasks in the form of listening and viewing tests. In order to ensure the quality of the collected data it is vital that the selected assessors have the desired qualities in terms of discrimination aptitude as well as consistent rating ability. This work extends existing procedures in this field to provide a statistically robust and efficient manner of assessing and evaluating the performance of assessors for listening and viewing tasks.
Convention Paper 7788 (Purchase now)
Sound Design and Processing
Sunday, May 10, 09:00 — 12:30
Chair: Michael Hlatky
P25-1 Hierarchical Perceptual Mixing—Alexandros Tsilfidis, Charalambos Papadakos, John Mourjopoulos, University of Patras - Patras, Greece
A novel technique of perceptually-motivated signal-dependent audio mixing is presented. The proposed Hierarchical Perceptual Mixing (HPM) method is implemented in the spectro-temporal domain; its principle is to combine only the perceptually relevant components of the audio signals, derived after the calculation of the minimum masking threshold, which is introduced in the mixing stage. Objective measures are presented indicating that the resulting signals have enhanced dynamic range and lower crest factor with no unwanted artifacts, compared to the traditionally mixed signals. The overall headroom is improved, while clarity and tonal balance are preserved.
Convention Paper 7789 (Purchase now)
P25-2 Source-Filter Modeling in Sinusoid Domain—Wen Xue, Mark Sandler, Queen Mary, University of London - London, UK
This paper presents the theory and implementation of source-filter modeling in the sinusoid domain and its application to timbre processing. The technique decomposes the instantaneous amplitude in a sinusoid model into a source part and a filter part, each capturing a different aspect of the timbral property. We show that sinusoid-domain source-filter modeling is approximately equivalent to its time- or frequency-domain counterparts. Two methods are proposed for the estimation of the source and filter: a least-squares method based on the assumption that source and filter vary slowly in time, and a filter-bank method that models the global spectral envelope in the filter. Tests show the effectiveness of the algorithms for isolating frequency-driven amplitude variations. Example applications demonstrate the use of the technique for timbre processing.
Convention Paper 7790 (Purchase now)
P25-3 Analysis of a Modified Boss DS-1 Distortion Pedal—Matthew Schneiderman, Mark Sarisky, University of Texas at Austin - Austin, TX, USA
Guitar players are increasingly modifying (or paying someone else to modify) inexpensive mass-produced guitar pedals into boutique units. The Keeley modification of the Boss DS-1 is a prime example. In this paper we compare the measured and perceived performance of a Boss DS-1 before and after applying the Keeley All-Seeing-Eye and Ultra mods. This paper sheds light on psychoacoustics, signal processing, and guitar recording techniques in relation to low fidelity guitar distortion pedals.
Convention Paper 7791 (Purchase now)
P25-4 Phase and Amplitude Distortion Methods for Digital Synthesis of Classic Analog Waveforms—Joseph Timoney, Victor Lazzarini, Brian Carty, NUI Maynooth - Maynooth, Ireland; Jussi Pekonen, Helsinki University of Technology - Espoo, Finland
An essential component of digital emulations of subtractive synthesizer systems is the set of algorithms used to generate the classic oscillator waveforms: sawtooth, square, and triangle waves. Not only should these be perceived as sonically authentic, they should also exhibit minimal aliasing distortion and be computationally efficient to implement. This paper examines a set of novel techniques for producing the classic oscillator waveforms of analog subtractive synthesis, derived from amplitude or phase distortion of a mono-component input waveform. Expressions for the outputs of these distortion methods are given that allow parameter control to ensure properly bandlimited behavior. Additionally, their implementation is demonstrably efficient. Finally, the results presented illustrate their equivalence to the original analog counterparts.
Convention Paper 7792 (Purchase now)
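As context for the bandlimiting requirement, a minimal sketch of the standard additive (truncated Fourier series) sawtooth, which is alias-free by construction and commonly serves as the reference against which such oscillator algorithms are judged (this is not the paper's phase- or amplitude-distortion method):

```python
import numpy as np

def additive_sawtooth(f0, fs, n):
    """Reference bandlimited sawtooth via a truncated Fourier series:
    saw(t) = -(2 / pi) * sum_{k=1..K} sin(2 pi k f0 t) / k,
    where K keeps every harmonic at or below Nyquist, so the signal is
    alias-free by construction (but costly: one oscillator per harmonic)."""
    t = np.arange(n) / fs
    K = int((fs / 2.0) // f0)  # highest harmonic not exceeding Nyquist
    x = np.zeros(n)
    for k in range(1, K + 1):
        x += np.sin(2.0 * np.pi * k * f0 * t) / k
    return -(2.0 / np.pi) * x

fs, f0, n = 48000, 1000.0, 4096
x = additive_sawtooth(f0, fs, n)
spec = np.abs(np.fft.rfft(x * np.hanning(n)))
peak_bin = int(np.argmax(spec))
print(peak_bin)  # spectral peak at the fundamental
```

The efficiency appeal of distortion-based oscillators is precisely that they avoid summing K oscillators per voice while keeping the spectrum above Nyquist empty, as this reference does by construction.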
P25-5 Soundscape Attribute Identification—Martin Ljungdahl Eriksson, Jan Berg, Luleå University of Technology - Luleå, Sweden
In soundscape research, the field's methods can be combined with approaches based on sound quality attributes in order to create a deeper understanding of sound images and soundscapes and of how these may be described and designed. The integration of four methods is outlined, two from the soundscape domain and two from the sound engineering domain.
Convention Paper 7793 (Purchase now)
P25-6 SonoSketch: Querying Sound Effect Databases through Painting—Michael Battermann, Sebastian Heise, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany; Jörn Loviscach, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany
Numerous techniques support finding sounds that are acoustically similar to a given one. It is hard, however, to find a sound to start the similarity search with. Inspired by systems for image search that allow drawing the shape to be found, we address quick input for audio retrieval. In our system, the user literally sketches a sound effect, placing curved strokes on a canvas. Each of these represents one sound from a collection of basic sounds. The audio feedback is interactive, as is the continuous update of the list of retrieval results. The retrieval is based on symbol sequences formed from MFCC data compared with the help of a neural net using an editing distance to allow small temporal changes.
Convention Paper 7794 (Purchase now)
P25-7 Generic Sound Effects to Aid in Audio Retrieval—David Black, Sebastian Heise, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany; Jörn Loviscach, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany
Sound design applications are often hampered because the sound engineer must either produce new sounds using physical objects, or search through a database of sounds to find a suitable sample. We created a set of basic sounds to mimic these physical sound-producing objects, leveraging the mind's onomatopoetic clustering capabilities. These sounds, grouped into onomatopoetic categories, aid the sound designer in music information retrieval (MIR) and sound categorization applications. Initial testing regarding the grouping of individual sounds into groups based on similarity has shown that participants tended to group certain sounds together, often reflecting the groupings our team constructed.
Convention Paper 7795 (Purchase now)
Room Acoustics and Loudspeaker Interaction
Sunday, May 10, 10:30 — 12:00
P26-1 Acoustic Design of Classrooms—Suthikshn Kumar, PESIT - Bangalore, India
Acoustic principles, when used effectively in classroom design, can dramatically improve the audibility of the lecturer. Cost-effective acoustic enhancement serves several purposes: less speaking effort on the part of the lecturer, students hearing the lecture more clearly, and improved communication and, hence, an improved learning experience. Several improvements can be made to the classroom architecture to enhance the signal-to-noise ratio and reduce reverberation and background noise. We propose an innovative way of placing parabolic reflectors near the platform to amplify the lecturer's voice. This paper focuses on the cost-effective, energy-efficient acoustic design of classrooms.
Convention Paper 7796 (Purchase now)
P26-2 Epidaurus: Comments on the Acoustics of the Legendary Ancient Greek Theater—Christos Goussios, Christos Sevastiadis, Kalliopi Chourmouziadou, George Kalliris, Aristotle University of Thessaloniki - Thessaloniki, Greece
The ancient Greek theaters, and especially the well-preserved theater of Epidaurus, are of great interest because of their legendary acoustic characteristics. In the present paper the history and the construction characteristics of this theater are presented. The differences between its ancient and modern uses are explained. Important acoustic parameters calculated from in situ measurements are presented. The conclusions show the relation between its excellent acoustic performance and the obtained results.
Convention Paper 7797 (Purchase now)
P26-3 A Matlab Toolbox for the Analysis of Ando’s Factors—Dario D'Orazio, Paolo Guidorzi, Massimo Garai, University of Bologna - Bologna, Italy
Autocorrelation and crosscorrelation analysis, as is well known in the literature, yields remarkable results in various scientific fields. The use of the autocorrelation function (ACF) and the interaural crosscorrelation function (IACF) in architectural acoustics is known thanks to Y. Ando's work. The toolbox presented here has been developed to compute Ando's significant and spatial factors (as the factors obtained from the ACF and IACF are called), to compute subjective preference functions, and to investigate further applications.
Convention Paper 7798 (Purchase now)
Sunday, May 10, 13:00 — 17:00
Chair: Günther Thiele
P27-1 On the Myth of Pulse Width Modulated Spectrum in Theory and Practice—Arnold Knott, Technical University of Denmark - Lyngby, Denmark, Harman/Becker Automotive Systems GmbH, Straubing, Germany; Gerhard Pfaffinger, Harman/Becker Automotive Systems GmbH - Straubing, Germany; Michael A. E. Andersen, Technical University of Denmark - Lyngby, Denmark
Switch-mode audio power amplifiers are commonly used in sound reproduction. Their well-known drawback is the radiation of high-frequency energy, which can disturb radio and TV receivers. The designer of switch-mode audio equipment therefore needs to take measures to prevent this coupling, which would otherwise result in poor audio performance. A deep understanding of the pulse width modulated (PWM) signal is therefore essential, and in the past this has resulted in different mythic models such as pulse, trapezoidal, or Double Fourier Series (DFS) representations. This paper clarifies these theoretical approaches by comparing them with reality from both the time- and frequency-domain perspectives. For validation, a switch-mode audio power amplifier was built, delivering program material at 50 W with less than 0.06 percent distortion across the audio band. The switch-mode signals have been evaluated very precisely in the time and spectral domains to examine the assumptions about PWM spectra and dispel this myth.
Convention Paper 7799 (Purchase now)
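The spectral structure in question can be reproduced numerically. A minimal sketch of double-edge natural-sampling PWM of a sine tone (all parameters are illustrative, not those of the amplifier described in the paper):

```python
import numpy as np

fs = 10_000_000   # analysis rate, high enough to resolve the switching edges
fc = 400_000      # switching (carrier) frequency
fm = 10_000       # audio-band modulating tone
n = 1 << 16
t = np.arange(n) / fs

audio = 0.8 * np.sin(2.0 * np.pi * fm * t)      # modulation index 0.8
tri = 4.0 * np.abs((fc * t) % 1.0 - 0.5) - 1.0  # triangle carrier in [-1, 1]
pwm = np.where(audio > tri, 1.0, -1.0)          # double-edge natural PWM

spec = np.abs(np.fft.rfft(pwm * np.hanning(n)))
freqs = np.fft.rfftfreq(n, 1.0 / fs)
# The audio tone survives at fm, while the high-frequency energy the paper
# discusses clusters around the carrier fc and its harmonics and sidebands.
audio_bin = int(np.argmin(np.abs(freqs - fm)))
carrier_bin = int(np.argmin(np.abs(freqs - fc)))
print(spec[audio_bin], spec[carrier_bin])
```

Plotting such a spectrum against the pulse, trapezoidal, and DFS predictions is, in essence, the comparison the paper carries out with measured amplifier signals.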
P27-2 Design Approaches for Psychoacoustical In-Band Noise Shaping Filters—Jochen Hahn, University of Kaiserslautern - Kaiserslautern, Germany
Noise shaping is a state-of-the-art technique for preserving the perceived quality of audio signals when requantization occurs. Noise shaping filters are special digital audio filters because the nonlinear characteristics of human hearing have to be taken into account in their design. The design approaches presented in this contribution meet these requirements: they minimize or limit the filter magnitude, the unweighted noise amplification, and the group delay characteristics of the filter.
Convention Paper 7800 (Purchase now)
P27-3 A New Analog Input Topology for Extreme Dynamic Range Analog to Digital Conversion—Jamie Angus, University of Salford - Salford, Greater Manchester, UK
The purpose of this paper is to introduce a novel form of input topology for the analog inputs of oversampled analog to digital converters. This new topology, when used with existing components, can achieve a dynamic range of 28 linear bits but has the potential to achieve even more if suitable technology can be developed. The paper analyzes the current limitations of existing topologies, presents the new topology, and shows how it can achieve much higher dynamic ranges. The optimal application of the topology and means of extending it for higher dynamic ranges is also discussed.
Convention Paper 7801 (Purchase now)
P27-4 Automatic Equalization of Flat TV Loudspeakers Using Parametric IIR Filters—Herwig Behrends, NXP Semiconductors - Hamburg, Germany; Adrian von dem Knesebeck, Helmut Schmidt University, University of the Federal Armed Forces - Hamburg, Germany; Werner Bradinal, Peter Neumann, NXP Semiconductors - Hamburg, Germany; Udo Zölzer, Helmut Schmidt University, University of the Federal Armed Forces - Hamburg, Germany
Small loudspeakers used in today's flat television cabinets, together with the requirement for "invisible sound," lead to a frequency response that is influenced very disadvantageously by the physical design constraints. Loudspeakers are deeply embedded within the cabinet, so the sound is forced through narrow vents, funnels, or other waveguides. Down- or back-firing placements of the loudspeakers are also common practice in order to minimize their visibility as much as possible. Generally, this leads to a non-flat frequency response with a strong coloration of the sound. We present an approach to compensate for these effects by means of simple second-order equalizer sections (biquads), where the center frequency, gain, and bandwidth of each section are automatically calculated from a measured frequency response. The tool is usable in a laboratory environment with relatively inexpensive standard PC sound cards and microphones.
Convention Paper 7802 (Purchase now)
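A common way to realize one such parametric section is the peaking biquad from the widely used Audio EQ Cookbook; a minimal sketch (the paper's automatic extraction of parameters from measured responses is not reproduced, and the example correction values are hypothetical):

```python
import math

def peaking_biquad(fs, f0, gain_db, Q):
    """Second-order peaking-EQ coefficients (b, a), normalized so a[0] = 1,
    following the widely used Audio EQ Cookbook (R. Bristow-Johnson)."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * Q)
    a0 = 1.0 + alpha / A
    b = [(1.0 + alpha * A) / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha * A) / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha / A) / a0]
    return b, a

# Hypothetical correction: a +6 dB bell at 3 kHz to fill a measured dip, fs = 48 kHz
b, a = peaking_biquad(48000, 3000.0, 6.0, 2.0)
print(b, a)
```

An automatic equalizer of the kind described would fit a set of such (f0, gain, Q) triples to the measured response and cascade the resulting sections.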
P27-5 Audio n-Genie: Domain Specific Language for Audio Processing—Tiziano Leidi, Institute for Applied Computer Science and Industrial Technologies of Southern Switzerland (ICIMSI) - Manno, Switzerland; Thierry Heeb, ANAGRAM Technologies SA - Préverenges, Switzerland; Marco Colla, Institute for Applied Computer Science and Industrial Technologies of Southern Switzerland (ICIMSI) - Manno, Switzerland; Jean-Philippe Thiran, Ecole Polytechnique Federale de Lausanne (EPFL) - Lausanne, Switzerland
Specialized development environments today represent important added value for domain-specific system providers, who have lacked a dedicated, ergonomic, efficient, and affordable tool able to boost their core business. This paper describes Audio n-genie, a domain-specific language and its associated development environment, which supports the automatic production of digital audio processing applications by means of component-based, model-driven generative programming.
Convention Paper 7803 (Purchase now)
P27-6 Acoustic Echo Cancellation Using MIMO Blind Deconvolution—Ephraim Gower, Malcolm Hawksford, University of Essex - Colchester, UK
A new multiple-input multiple-output frame-processing algorithm is introduced that exploits blind deconvolution for acoustic echo cancellation. The channel deconvolution filters, which can be blindly estimated as either finite or infinite impulse responses, are optimized by maximizing the information flow through several nonlinear neurons. The algorithm requires that for every system audio output there be a corresponding microphone for effective feedback signal separation/cancellation. The desired talker signal among the algorithm outputs is recognized and transmitted, while the identified feedback signals are discarded.
Convention Paper 7804 (Purchase now)
P27-7 Implementing Audio Algorithms and Integrating Processor-Specific Code Using Model Based Design—Arvind Ananthan, The MathWorks - Natick, MA, USA; Mark Corless, The MathWorks - Novi, MI, USA; Marco Roggero, The MathWorks - Ismaning, Germany
This paper explores the final stages in the model-based design and implementation of an audio algorithm on a fixed-point embedded processor. Once the algorithm, a 3-band parametric equalizer in this example, is designed and simulated using a combination of scripting and graphical modeling tools, real-time C-code is automatically generated from this model. This paper illustrates how algorithmic C-code generated from such a model in Simulink can be integrated into a stand-alone embedded project as a library and implemented on an Analog Devices Blackfin® 537 processor. It also elaborates how processor specific C-callable assembly code can then be integrated into the model for both simulation and code generation to improve its execution performance on this processor.
Convention Paper 7805 (Purchase now)
P27-8 Subjective and Objective Evaluation of the Audio Vacuum-Tube Amplifiers—Andrzej Dobrucki, Wroclaw University of Technology - Wroclaw, Poland; Stanislaw Maleczek, Military Institute of Engineering Technology - Wroclaw, Poland; Maurycy Kin, Wroclaw University of Technology - Wroclaw, Poland
The subjective and objective evaluation of five high-quality vacuum-tube audio amplifiers is presented in this paper. A professional transistor amplifier was used as the reference. The subjective evaluation was performed by a team of judges as well as with a computer-based psychoacoustic model following the PAQM protocol. The amplifiers' sound quality as assessed by the listeners is consistent with that evaluated using the psychoacoustic model. It was found that the best sound quality was obtained by the vacuum-tube amplifiers and the worst by the reference amplifier. The results of the subjective evaluation are inconsistent with the quality assessed by measurement of objective parameters: all amplifiers are of comparable measured quality, and by these measures the best is the transistor amplifier because of its lowest THD+N level.
Convention Paper 7806 (Purchase now)
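As a rough illustration of the objective side of such an evaluation, the sketch below estimates THD+N from a recorded sine response: it removes the fundamental by least-squares correlation against sine and cosine at the test frequency, then compares the residual RMS to the total RMS. This is a generic textbook estimator, not the authors' measurement procedure, and it assumes an integer number of cycles in the analysis window.

```python
import math

def thd_n(samples, fs, f0):
    """THD+N in dB: subtract a least-squares fit of the fundamental,
    then take 20*log10(residual RMS / total RMS). Assumes the window
    holds an integer number of cycles of f0."""
    n = len(samples)
    s = [math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]
    c = [math.cos(2 * math.pi * f0 * i / fs) for i in range(n)]
    a = 2 / n * sum(x * si for x, si in zip(samples, s))   # in-phase amplitude
    b = 2 / n * sum(x * ci for x, ci in zip(samples, c))   # quadrature amplitude
    residual = [x - a * si - b * ci for x, si, ci in zip(samples, s, c)]
    rms = lambda v: math.sqrt(sum(t * t for t in v) / len(v))
    return 20 * math.log10(rms(residual) / rms(samples))
```

With a full-cycle window the sine/cosine correlations extract the fundamental exactly, so everything left in the residual is distortion plus noise.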
Psychoacoustics and Perception
Sunday, May 10, 13:00 — 16:00
Chair: Florian Wickelmaier
P28-1 Localization of Consecutive Sound Events in Reverberant Environment—Marko Takanen, Antti Jylhä, Tapani Pihlajamäki, Juha Holm, Ilkka Huhtakallio, Ville Pulkki, Helsinki University of Technology - Espoo, Finland
A listening test was conducted to assess the localization of consecutive sound events in simulated reverberant conditions. The stimuli consisted of two sound events: reverberant wideband harmonic sounds reproduced over a multichannel setup in an anechoic chamber. The localization threshold for the latter sound event was measured as the direct-to-reverberant sound level ratio, using an adaptive transformed up-down method. The factors studied for their effect on the localization threshold were the time interval and pitch difference between the two sound events and the time gap between the direct sound and the reverberation. The results indicate that all factors have a significant effect on localization.
Convention Paper 7807 (Purchase now)
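The adaptive transformed up-down method mentioned above can be illustrated with a minimal 1-up/2-down staircase, which converges near the 70.7%-correct point; the threshold is estimated as the mean of the levels at which the track reverses direction. The trial logic, step size, and reversal count below are generic choices, not the paper's actual procedure.

```python
def staircase(respond, start, step, reversals_needed=6):
    """Transformed 1-up/2-down staircase: the level drops after two
    consecutive correct answers and rises after one wrong answer.
    `respond(level)` returns True for a correct trial. The threshold
    estimate is the mean of the reversal levels."""
    level, correct_in_row, direction = start, 0, 0
    reversal_levels = []
    while len(reversal_levels) < reversals_needed:
        if respond(level):
            correct_in_row += 1
            if correct_in_row == 2:       # two in a row -> step down
                correct_in_row = 0
                if direction == +1:        # was going up: record a reversal
                    reversal_levels.append(level)
                direction = -1
                level -= step
        else:                              # one wrong -> step up
            correct_in_row = 0
            if direction == -1:            # was going down: record a reversal
                reversal_levels.append(level)
            direction = +1
            level += step
    return sum(reversal_levels) / len(reversal_levels)
```

With a deterministic listener that answers correctly above some true threshold, the staircase settles into oscillation one step around it.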
P28-2 The Contrasting and Conflicting Definitions of Envelopment—Jan Berg, Luleå University of Technology - Luleå, Sweden
In spatial audio, the term envelopment is not unambiguously defined, and the different de facto definitions both overlap and contradict one another. This lack of clarity may pose a problem when the sensation of being surrounded by sound is the subject of investigation and analysis. This paper reviews the different concepts of envelopment in order to point out where problems may occur. A tentative terminology that can serve the different contexts of enveloping sound is also suggested.
Convention Paper 7808 (Purchase now)
P28-3 Apparent Source Width in ITU Surround—Jorge Medina Victoria, Thomas Görne, Hamburg University of Applied Sciences - Hamburg, Germany
The Apparent Source Width (ASW) of phantom images in an ITU-R BS.775-1 standard surround loudspeaker configuration has been investigated for different signals by means of a randomized blind test. Test signals were generated from anechoic recordings by amplitude panning between adjacent channels. The listening test showed that an increase in Apparent Source Width coincides with an increase in localization uncertainty at the side and back areas of the ITU setup. The largest ASW values were found between the RS and LS channels.
Convention Paper 7809 (Purchase now)
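Amplitude panning between adjacent channels, as used to generate the test signals, can be sketched with a constant-power pan law for a single loudspeaker pair. The ±30° angles below are just the ITU front pair and the sine/cosine law is one common choice; the paper's exact pan law is not stated in the abstract.

```python
import math

def pan_pair(angle, deg_l=-30.0, deg_r=30.0):
    """Constant-power amplitude panning across one adjacent loudspeaker
    pair: the source angle is mapped to 0..pi/2 and the gains satisfy
    gL^2 + gR^2 = 1, keeping total power constant as the image moves."""
    t = (angle - deg_l) / (deg_r - deg_l) * (math.pi / 2)
    return math.cos(t), math.sin(t)
```

At the center of the pair both gains equal 1/√2 (about -3 dB each), which is what keeps the perceived level steady while panning.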
P28-4 A New Methodological Approach to the Noise Threat Evaluation Based on the Selected Physiological Properties of the Human Hearing System—Jozef Kotus, Bozena Kostek, Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
A new way of assessing noise-induced harmful effects on the human hearing system is presented in this paper. The method takes into consideration selected physiological properties of the human hearing system. New indicators of noise harmfulness are proposed on the basis of hearing examinations, noise measurement results, and the performance of a psychoacoustic noise dosimeter. The proposed indicators were evaluated using hearing examinations in real noise-exposure situations as well as simulations with standard test signals (such as white, pink, and brown noise). The analysis performed and the results obtained confirm the practical usefulness and correctness of the proposed indicators.
Convention Paper 7813 (Purchase now)
P28-5 Octave-Band Analysis on ITU-R Listening Test Data—Ian M. Dash, Australian Broadcasting Corporation - Sydney, NSW, Australia
Listening test data collected in 2003 on 49 audio program samples were used to formulate the ITU-R BS.1770 program loudness prediction algorithm. The validity of this data at low frequencies was unproven. Octave-band analysis has therefore been performed on the test samples to test for audibility in each band. Results suggest that further listening tests may be needed to obtain reliable low-frequency data. A multiple regression analysis was also performed on the octave-band data to obtain a least-squares weighting curve for comparison with the BS.1770/RLB2 weighting curve. Results suggest that while the BS.1770 curve performs well, there is still room for improvement.
Convention Paper 7811 (Purchase now)
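Octave-band analysis of a test sample can be sketched by folding power-spectrum bins into standard octave bands, with band edges one half-octave either side of each center frequency. The naive DFT and the simple bin-assignment below are for illustration only; they are not the analysis used in the paper, which would use proper octave-band filtering of the program material.

```python
import cmath, math

OCTAVE_CENTERS = [31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000, 16000]

def octave_band_levels(x, fs):
    """Group power-spectrum bins into octave bands (edges at
    fc/sqrt(2) .. fc*sqrt(2)) and return per-band level in dB.
    Naive O(n^2) DFT: fine for short analysis frames."""
    n = len(x)
    spec = [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2)]
    levels = []
    for fc in OCTAVE_CENTERS:
        lo, hi = fc / math.sqrt(2), fc * math.sqrt(2)
        p = sum(p_k for k, p_k in enumerate(spec) if lo <= k * fs / n < hi)
        levels.append(10 * math.log10(p) if p > 0 else float("-inf"))
    return levels
```

Bands above the Nyquist frequency simply collect no bins, so a low sample rate truncates the list gracefully rather than producing spurious levels.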
P28-6 Windowed Sine Bursts: In Search of Optimal Test Signals for Detecting the Threshold of Audibility of Temporal Decays—Andrew Goldberg, Helsinki University of Technology - Espoo, Finland
A slow decay in an audio signal is perceived as ringing and is commonly caused by room modes. This affects the perception of intelligibility, clarity, definition, and spatial rendering. A method has previously been devised to find the threshold of audibility of the decay in low-frequency narrow-band signals. One of the test signals in the large-scale listening test will be a low-frequency sine burst, but spectral spreading at the start and end of the test signal acts as an additional non-modal cue. This effect is removed by windowing, for example with a half-Hann window. The aim of this paper is to determine the window length required (the threshold) to render the end of the test signal free from audible spectral spreading. The Parameter Estimation by Sequential Testing (PEST) method and calibrated headphones (to remove factors associated with the listening environment) are used in subjective listening tests. The window-length threshold is found to be constant above 200 Hz but rises exponentially toward low frequencies, and is replay-level dependent. The threshold may be related to the absolute threshold of hearing, masking curves, and/or auditory filter bandwidth.
Convention Paper 7812 (Purchase now)
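Generating such a test signal, a sine burst whose ending is shaped by a falling half-Hann window, can be sketched as follows. The burst and fade lengths here are arbitrary illustrative values, not the thresholds the paper reports, and only the offset is windowed for brevity.

```python
import math

def windowed_burst(f0, fs, burst_ms, fade_ms):
    """Sine burst whose final `fade_ms` is shaped by a falling half-Hann
    window, suppressing the spectral spreading an abrupt gate would add."""
    n = int(fs * burst_ms / 1000)
    nf = int(fs * fade_ms / 1000)
    out = []
    for i in range(n):
        s = math.sin(2 * math.pi * f0 * i / fs)
        if i >= n - nf:                     # fade region at the end
            j = i - (n - nf)
            s *= 0.5 * (1 + math.cos(math.pi * j / nf))   # half Hann, 1 -> 0
        out.append(s)
    return out
```

The longer the fade, the narrower the spectral spread at the offset, which is exactly the trade-off whose audibility threshold the paper measures.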
Signal Analysis, Measurements, Restoration
Sunday, May 10, 13:30 — 15:00
P29-1 Evaluation and Comparison of Audio Chroma Feature Extraction Methods—Michael Stein, Benjamin M. Schubert, Ilmenau University of Technology - Ilmenau, Germany; Matthias Gruhne, Gabriel Gatzsche, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Markus Mehnert, Ilmenau University of Technology - Ilmenau, Germany
This paper analyzes and compares different methods for digital audio chroma feature extraction. The chroma feature is a descriptor that represents the tonal content of a musical audio signal in condensed form. Chroma features can therefore be considered an important prerequisite for high-level semantic analysis such as chord recognition or harmonic similarity estimation, where higher-quality chroma features enable much better results. To assess this quality, seven different state-of-the-art chroma feature extraction methods have been implemented. The output of these algorithms is critically evaluated on an audio database containing 55 variations of triads. The best results were obtained with the Enhanced Pitch Class Profile.
Convention Paper 7814 (Purchase now)
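A baseline chroma extractor of the kind compared in the paper can be sketched by folding DFT-bin energies onto the 12 pitch classes via a nearest-MIDI-note mapping. This is only an illustration of the general idea, not the Enhanced Pitch Class Profile or any of the paper's seven methods.

```python
import cmath, math

def chroma(x, fs):
    """Fold a magnitude spectrum onto the 12 pitch classes: each DFT bin's
    energy is added to the chroma bin of its nearest MIDI note (A4 = 440 Hz),
    and the result is normalized to sum to 1."""
    n = len(x)
    c = [0.0] * 12
    for k in range(1, n // 2):
        f = k * fs / n
        if f < 27.5:                        # ignore bins below A0
            continue
        midi = round(69 + 12 * math.log2(f / 440.0))
        mag = abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        c[midi % 12] += mag * mag           # 12 pitch classes, octave-folded
    total = sum(c)
    return [v / total for v in c] if total else c
```

Real extractors refine every step of this (tuning estimation, harmonic weighting, spectral whitening), which is precisely where the seven compared methods differ.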
P29-2 Measuring Transient Structure-Borne Sound in Musical Instruments—Proposal and First Results from a Laser Intensity Measurement Setup—Robert Mores, Hamburg University of Applied Sciences - Hamburg, Germany; Marcel thor Straten, Consultant - Seevetal, Germany; Andreas Selk, Consultant - Hamburg, Germany
The proposal for this new measurement setup is motivated by interest in transients propagating across the arched tops of violins. Understanding the impact of edge construction on transient wave reflection back to the top of a violin, or on conduction into the rib, requires single-shot recordings, ideally without statistical processing. The signal-to-noise ratio must be high even though the mechanical amplitudes at distinct locations on the structure's surface are in the range of only a few micrometers. In the proposed setup, the intensity of a laser beam is measured directly after it passes a screen attached to the device under test. The signal-to-noise ratio achieved for one-micrometer transients in single-shot recordings is significantly more than 60 dB.
Convention Paper 7815 (Purchase now)
P29-3 Evaluating Ground Truth for ADRess as a Preprocess for Automatic Musical Instrument Identification—Joseph McKay, Mikel Gainza, Dan Barry, Dublin Institute of Technology - Dublin, Ireland
Most research in musical instrument identification has focused on labeling isolated samples or solo phrases. A robust instrument identification system capable of dealing with polytimbral recordings remains a necessity in music information retrieval. Experiments are described that evaluate the ground truth of ADRess, a sound source separation technique used as a preprocess to automatic musical instrument identification. The ground truth experiments are based on a number of basic acoustic features, with a Gaussian Mixture Model as the classification algorithm. Using all 44 acoustic feature dimensions, successful identification rates are achieved.
Convention Paper 7816 (Purchase now)
P29-4 Improving Rhythmic Pattern Features Based on Logarithmic Preprocessing—Matthias Gruhne, Christian Dittmar, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany
In music information retrieval, the rhythmic analysis of music plays an important role. Several feature extraction algorithms for deriving rhythmic information from music signals have been described in the literature. Most of them extract the rhythmic information by auto-correlating temporal envelopes derived from different frequency bands of the music signal. Using the auto-correlated envelopes directly as an audio feature has the disadvantage of tempo dependence. To circumvent this problem, further postprocessing via higher-order statistics has been proposed; however, the resulting statistical features are still tempo dependent to a certain extent. This paper describes a novel method that applies a logarithmic transformation to the lag axis of the auto-correlated envelope and discards the tempo-dependent part. This approach leads to tempo-invariant rhythmic features. A quantitative comparison of the original methods with the proposed procedure is described and discussed in this paper.
Convention Paper 7817 (Purchase now)
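The log-lag idea can be sketched as follows: autocorrelate an onset envelope, resample the autocorrelation on a logarithmic lag axis so that a tempo change becomes a shift, then take DFT magnitudes, which are shift-invariant, thereby discarding the tempo-dependent part. This is a schematic reading of the abstract, not the authors' exact feature; the sampling grid and feature size are arbitrary choices.

```python
import cmath, math

def tempo_invariant_feature(envelope, n_log=32, lag_min=1, lag_max=None):
    """Rhythm feature sketch: ACF of a mean-removed onset envelope,
    sampled on a logarithmic lag axis (tempo scaling -> shift),
    followed by DFT magnitudes (shift-invariant)."""
    n = len(envelope)
    lag_max = lag_max or n // 2
    mean = sum(envelope) / n
    e = [v - mean for v in envelope]
    acf = [sum(e[t] * e[t + lag] for t in range(n - lag)) for lag in range(lag_max + 1)]
    # nearest-neighbor sampling of the ACF at logarithmically spaced lags
    log_acf = []
    for i in range(n_log):
        lag = lag_min * (lag_max / lag_min) ** (i / (n_log - 1))
        log_acf.append(acf[min(int(round(lag)), lag_max)])
    # DFT magnitudes of the log-lag ACF: invariant to a shift of the lag axis
    return [abs(sum(log_acf[t] * cmath.exp(-2j * math.pi * k * t / n_log)
                    for t in range(n_log))) for k in range(n_log // 2)]
```

The key property is that doubling the tempo halves every ACF lag, which on a logarithmic axis is a constant offset, and constant offsets vanish in the DFT magnitude.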
P29-5 Further Developments of Parameterization Methods of Audio Stream Analysis for Security Purposes—Pawel Zwan, Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
The paper presents an automatic sound recognition algorithm intended for application in an audiovisual security monitoring system. The distributed character of security systems does not allow for simultaneous observation of multiple multimedia streams, so an automatic recognition algorithm must be introduced. A module for the parameterization and automatic detection of audio events is described. Spectral analyses of the sounds of a broken window, a gunshot, and a scream are performed, and parameterization methods are proposed and discussed. Moreover, a sound classification system based on the Support Vector Machine (SVM) algorithm is presented and its accuracy discussed. The practical application of the system with a monitoring station is shown, and a plan of further experiments and the conclusions drawn are presented.
Convention Paper 7818 (Purchase now)
P29-6 Estimating Instrument Spectral Envelopes for Polyphonic Music Transcription in a Music Scene-Adaptive Approach—Julio J. Carabias-Orti, Pedro Vera-Candeas, Nicolas Ruiz-Reyes, Francisco J. Cañadas-Quesada, Pablo Cabañas-Molero, University of Jaén - Linares, Spain
We propose a method for estimating the spectral envelope patterns of musical instruments in a music scene-adaptive scheme, without any prior knowledge of the true transcription. A musical note is defined as stable when the variations between its harmonic amplitudes are held constant over a certain period of time. A density-based clustering algorithm is applied to the stable notes in order to separate different envelope models for each note. Music scene-adaptive envelope patterns are finally obtained from the similarity and continuity of the different note models. Our approach has been tested in a polyphonic music transcription scheme with synthesized and real music recordings, obtaining very promising results.
Convention Paper 7819 (Purchase now)
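The density-based clustering step can be illustrated with a minimal DBSCAN-style routine. Here the "points" stand in for note-envelope models, and the distance function, `eps`, and `min_pts` values are all illustrative choices; the paper's actual clustering of harmonic-amplitude vectors is not reproduced.

```python
def dbscan(points, eps, min_pts, dist):
    """Minimal DBSCAN: grow a cluster from each unvisited core point
    (one whose eps-neighborhood holds at least min_pts members);
    points reachable from no core point are labeled noise (-1)."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neigh = [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]
        if len(neigh) < min_pts:
            labels[i] = -1                 # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in neigh if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:            # noise absorbed as border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = [k for k in range(len(points)) if dist(points[j], points[k]) <= eps]
            if len(jn) >= min_pts:         # j is core: expand through it
                seeds.extend(k for k in jn if labels[k] is None)
    return labels
```

Density-based clustering fits this task because the number of distinct envelope models per note is unknown in advance, and outlier notes are naturally left unclustered rather than forced into a pattern.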