AES Amsterdam 2008
Loudspeakers - 1
Paper Session Details
Saturday, May 17, 09:00 — 12:00
Chair: Stanley Lipshitz, University of Waterloo - Waterloo, Ontario, Canada
P1-1 Audio Capacitors. Myth or Reality? —Philip Duncan, University of Salford - Salford, Greater Manchester, UK; Paul Dodds, Nigel Williams, ICW, Ltd. - Wrexham, Wales, UK
This paper gives an account of work carried out to assess the effects of metallized film polypropylene crossover capacitors on key sonic attributes of reproduced sound. The capacitors under investigation were found to be mechanically resonant within the audio frequency band, and results obtained from subjective listening tests have shown this to have a measurable effect on audio delivery. The listening test methodology employed in this study evolved from initial ABX type tests with set program material to the final A/B tests where trained test subjects used program material that they were familiar with. The main findings were that capacitors used in crossover circuitry can exhibit mechanical resonance, and that maximizing the listener’s control over the listening situation and minimizing stress to the listener were necessary to obtain meaningful subjective test results.
Convention Paper 7314 (Purchase now)
P1-2 Perceptual Study and Auditory Analysis on Digital Crossover Filters —Henri Korhola, Matti Karjalainen, Helsinki University of Technology - Espoo, Finland
Digital crossover filters offer interesting possibilities for sound reproduction, but there are not many publications on how they behave perceptually. In this paper phase and magnitude errors in digital implementations of linear-phase FIR as well as Linkwitz-Riley crossover filters are studied perceptually and by auditory analysis. In a headphone simulation listening experiment we explored the just noticeable level of degradation due to crossover filter artifacts. In a real loudspeaker experiment we explored rough guidelines for "safe" filter orders of linear-phase FIR crossover filters, which would not produce audible errors. Possibilities to predict the perceived errors were then explored using auditory analysis, including third-octave magnitude spectrum and group delay as simple auditory correlates. Linear-phase FIR crossovers were found to produce different kinds of phase errors than Linkwitz-Riley crossovers. The auditory analysis can qualitatively explain the perceptibility of the degradation.
Convention Paper 7315 (Purchase now)
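As a rough illustration of the Linkwitz-Riley crossovers named in the abstract above (not the authors' digital implementation), the analog fourth-order prototype can be written as a squared second-order Butterworth response. The sketch below assumes a hypothetical 2 kHz crossover frequency and checks that the low-pass and high-pass outputs sum to unit magnitude, so that what remains in the summed response is a phase shift rather than a magnitude error.

```python
import cmath
import math

def lr4_responses(f, fc):
    """Analog 4th-order Linkwitz-Riley low/high-pass responses at frequency f (Hz)."""
    s = 1j * f / fc                              # normalized complex frequency
    den = (s * s + math.sqrt(2) * s + 1) ** 2    # squared 2nd-order Butterworth
    low = 1 / den
    high = s ** 4 / den
    return low, high

fc = 2000.0  # hypothetical crossover frequency, not taken from the paper
for f in (100.0, 2000.0, 10000.0):
    low, high = lr4_responses(f, fc)
    total = low + high
    # Magnitude of the sum stays at 1; only the phase changes with frequency.
    print(f, abs(total), math.degrees(cmath.phase(total)))
```

At the crossover frequency each branch is 6 dB down and the two are in phase with each other, which is why their sum keeps unit magnitude.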
P1-3 The Air Spring Effect of Flat Panel Loudspeakers —Daniel Beer, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany; Michaela Schuster, Michael Jahr, Technical University Ilmenau - Ilmenau, Germany; Alexander Reich, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany
Flat panel loudspeakers are characterized by their shallow build depth. Compared with conventional loudspeakers, their space-saving integration into existing surroundings is an advantage. From an acoustical point of view, however, the shallow depth brings disadvantages that affect reproduction in the lower and middle frequency range. The reasons for this behavior were analyzed on the basis of measurements and FEM simulations. Supplementary methods for solving this problem, derived from conventional loudspeaker technologies, have been considered.
Convention Paper 7316 (Purchase now)
P1-4 The Inertial Air Load of a Loudspeaker Diaphragm —John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada, and B&W Group Ltd., Steyning, West Sussex, UK
A typical bass loudspeaker driver has an inertial air load, which is about 30% of its actual cone mass. This air mass is often poorly understood, but it is significant in defining the resonance frequency; and the purpose of this paper is to understand the concept, clarify important aspects, and present corroborative measurements. The immediate surroundings of the diaphragm determine the low-frequency air load, and measurements on a test driver with different mounting arrangements are made and assessed, including measurements in vacuum. A loudspeaker box presents its own complications. Simulations are used to show how the air load depends on baffle size. In general, the air load may not be accurately represented by the usual approximations that apply to a piston in an infinite baffle or to a freely oscillating disk, but they do give a rough estimate.
Convention Paper 7317 (Purchase now)
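The two "usual approximations" mentioned in the abstract above have simple textbook low-frequency forms: a rigid piston in an infinite baffle carries an air mass of (8/3)ρa³ per side, and an unbaffled rigid disk carries (8/3)ρa³ in total. The sketch below evaluates both for a hypothetical 10 cm effective radius; the radius and air density are assumptions for illustration, not values from the paper.

```python
RHO_AIR = 1.2  # kg/m^3, approximate air density at room temperature (assumed)

def air_mass_infinite_baffle(radius_m, sides=1):
    """Low-frequency air mass (kg) on a rigid piston in an infinite baffle."""
    return sides * (8.0 / 3.0) * RHO_AIR * radius_m ** 3

def air_mass_free_disk(radius_m):
    """Total low-frequency air mass (kg) on an unbaffled rigid disk, both sides."""
    return (8.0 / 3.0) * RHO_AIR * radius_m ** 3

radius = 0.1  # hypothetical effective piston radius in metres
print(air_mass_infinite_baffle(radius, sides=2) * 1000, "g (baffled, both sides)")
print(air_mass_free_disk(radius) * 1000, "g (free disk, both sides)")
```

Under these assumptions the baffled load is twice the free-disk load, which gives a feel for how strongly the mounting can shift the resonance frequency.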
P1-5 Horn Loudspeaker Nonlinearity Comparison and Linearization Using Volterra Series —Delphine Bard, University of Lund - Lund, Sweden
The characterization of a weakly nonlinear electroacoustic device with the usual methods of measurement (THD, intermodulation) does not illustrate the nonlinearities themselves, but only some of their effects. Device linearization can be achieved by applying the inverse nonlinearity upstream of the device, under the condition that the nonlinearity law is known in detail. This paper compares the nonlinear behavior of horn loudspeakers of different frequency ranges using an experimental method of weak nonlinearity characterization and compensation, based on a representation of the nonlinearity by Volterra series using multitone excitations.
Convention Paper 7318 (Purchase now)
P1-6 Audibility of Phase Response Differences in a Stereo Playback System. Part I: Headphone Reproduction of Wide-Band Stimuli —Geoff Martin, Sylvain Choisel, Bang & Olufsen A/S - Struer, Denmark
The audibility of phase distortion in sound reproduction systems has been the subject of many studies. However, it remains a topic of controversy, in particular in the field of loudspeaker or headphone equalization. Most studies lead to the conclusion that, although phase distortion may be audible for specific stimuli, in realistic listening situations in a room it will go largely unnoticed. These studies, however, have focused on monophonic phase distortion; a severe limitation, since ignoring phase response in equalization can result in different phase distortion in different channels. It is the purpose of the present study to investigate the audibility of stereophonic phase mismatch in the specific case of headphone reproduction. In addition, the implications for microphone design and production are discussed.
Convention Paper 7319 (Purchase now)
Saturday, May 17, 09:30 — 12:00
Chair: Sascha Spors, Technische Universität Berlin - Berlin, Germany
P2-1 EBU Tech.doc. 3326 for Interoperability between Audio over IP Units —Lars Jonsson, Swedish Radio - Stockholm, Sweden; Mathias Coinchon, European Broadcasting Union - Geneva, Switzerland
Audio over IP end units are now common in radio and TV operations for streaming programs over IP networks. The units are used to create contribution circuits from remote sites or local offices into main studio centers. The IP networks used are usually well managed corporate networks with good Quality of Service (QoS) and usually high bandwidth. Due to its availability, the Internet is also increasingly used for various cases of radio and television contribution, especially over longer distances. However, the use of high bit rates and reliable contribution transmissions over the Internet cannot be guaranteed. Correspondents have the choice in their equipment to use either ISDN or the Internet to deliver their reports. More than 20 manufacturers now provide equipment for audio over IP applications. The EBU has issued and verified a standard, EBU TECH 3326-2007, which allows for interoperability between previously incompatible Audio over IP codecs. A plug-test between nine manufacturers held in February 2008 proved that previously incompatible units can now connect according to the new standard.
Convention Paper 7322 (Purchase now)
P2-2 Audio Fingerprint and its Applications to Peer-to-Peer Systems —Antonello D'Aguanno, Goffredo Haus, Università degli Studi di Milano - Milano, Italy
In this paper we analyze the applicability of audio-fingerprint technology to peer-to-peer systems. Audio fingerprinting is a technology commonly applied to tasks such as audio identification or digital rights management. Peer-to-peer is a common Internet paradigm for sharing various kinds of digitized content. We propose an improvement for typical peer-to-peer architectures (query flooding, centralized directory, hybrid architecture) that permits the application of audio fingerprint technology to these systems.
Convention Paper 7321 (Purchase now)
P2-3 Audible ICMP Echo Responses for Monitoring Ultra Low Delayed Audio Streams —Alexander Carôt, University of Lübeck - Lübeck, Germany; Alain B. Renaud, Queen’s University Belfast - Belfast, Northern Ireland, UK; Christian Werner, University of Lübeck - Lübeck, Germany
Playing live music on the Internet is very demanding in terms of delay, loss, or jitter and hence requires extremely reliable network conditions. Jitter is the most problematic factor because it has a direct influence on the required network buffer sizes for receiving low delay audio streams. Measuring the amount of jitter, however, is a very complex task due to the multi-hop architecture of the Internet. So far it has been impossible to know at which hop these delay variances appear. The authors propose a solution that is able to generate an audible impression of the jitter problem for each hop.
Convention Paper 7320 (Purchase now)
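The abstract above concerns making per-hop jitter audible; as a purely numerical companion, the sketch below estimates jitter per hop from a list of echo round-trip times. It is not the authors' method. It borrows the running estimator from RFC 3550 (J += (|D| - J) / 16), and the hop names and delay values are invented for illustration.

```python
def interarrival_jitter(delays_ms):
    """Running jitter estimate using the RFC 3550 smoothing rule J += (|D| - J) / 16."""
    jitter = 0.0
    for prev, cur in zip(delays_ms, delays_ms[1:]):
        jitter += (abs(cur - prev) - jitter) / 16.0
    return jitter

# Hypothetical round-trip times (ms) of echo replies from two hops.
hops = {
    "hop 1": [1.0, 1.1, 1.0, 1.1, 1.0],
    "hop 2": [12.0, 19.0, 11.0, 25.0, 13.0],
}
for name, rtts in hops.items():
    print(name, interarrival_jitter(rtts))
```

A hop whose estimate is clearly larger than its predecessor's is a candidate for where the delay variance is introduced.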
P2-4 A Grid-Based Approach to the Remote Control and Recall of the Properties of IEEE1394 Audio Devices —Philip Foulkes, Richard Foss, Rhodes University - Grahamstown, South Africa
Typically, the configuration of audio hardware and software is not integrated. This paper discusses a software system that has been developed to remotely control and recall the properties of IEEE1394 (FireWire) audio devices via a series of graphical routing matrices. The software presents sound engineers with a graphical routing matrix that shows, along its axes, the available FireWire audio devices on a FireWire network. Inter device connection management may be performed by selecting the cross points on the grid, and intra device control may be performed via device editors that are displayed via the axes of the matrix. The software application may be hosted by a compatible Digital Audio Workstation (DAW) application to allow for the storing and recalling of the various properties associated with the devices.
Convention Paper 7323 (Purchase now)
P2-5 Can the Public Internet Be Used for Broadcast? —Simon Daniels, Audio Processing Technology - Belfast, Northern Ireland, UK
This paper will look at a number of examples of remote broadcasts over contended IP links and examine the key points in their success. We will talk about issues such as jitter and latency and considerations regarding essential features on IP codec equipment. The experiences of major European broadcasters trialing audio over the Public Internet will form the basis of a discussion of the pitfalls and possibilities associated with using the public Internet for essential broadcast links.
Convention Paper 7324 (Purchase now)
Spatial Audio Perception and Processing - 1
Saturday, May 17, 14:00 — 18:00
Chair: Gunther Theile
P3-1 Objective and Subjective Evaluation of Urban Acoustic Modeling and Auralization —Yuliya Smyrnova, Yan Meng, Jian Kang, University of Sheffield - Western Bank, Sheffield, UK
This paper presents the results of objective and subjective evaluation of a simulation and auralization system based on the CRR model (combined ray-tracing and radiosity). Auralization of an urban square has been carried out with various boundary reflection patterns (purely specular, purely diffuse, and a mix of specular and diffuse) using two audio stimuli. The subjective evaluation results reveal a strong impact of sound sources and reflection pattern. Despite similarities in objective measures, there are noticeable differences in subjective attributes between signals based on simulated and measured impulse responses, but current auralization algorithms are still adequate in simulating real urban environments.
Convention Paper 7325 (Purchase now)
P3-2 Virtual vs. Actual Multichannel Acoustical Recording —Gavin Kearney, Trinity College - Dublin, Ireland; Jeff Levison, Euphonix, Inc. - Palo Alto, CA, USA
We present a comparison of live recordings of a choral ensemble versus dry recordings of the same players, with the acoustic environment reconstructed from impulse responses of the original reverberant performance space. Binaural measurements are used to objectively classify the recordings, and the perceptual attributes are investigated through a series of subjective listening tests. It is shown that the differences between dry recordings convolved with linear time-invariant (LTI) impulse responses and actual acoustical recordings can be perceived by a panel of expert listeners.
Convention Paper 7326 (Purchase now)
P3-3 Virtual Sources and Moving Targets —Glenn Dickins, David Cooper, David McGrath, Dolby Laboratories - Sydney, NSW, Australia
This paper presents an analysis of the effects of listener mobility on the stability of virtual source images created by a pair of loudspeakers. A spherical head is used to generate analytic head related transfer functions from which we create a simple perceptual localization model for the forward half of the horizontal plane. This model is then used to investigate changes in perceived source localization as the listener moves. The analysis demonstrates that even with this simple model, and the assumption of small listener movements, the source image becomes unstable at a relatively low frequency. Given that for such low frequencies the spherical head model is a reasonable approximation of measured HRTFs, this work suggests that individualized HRTF and pinnae functions are of little benefit when designing a virtualizer system that allows for some listener mobility.
Convention Paper 7327 (Purchase now)
P3-4 On the Use of Directional Loudspeakers to Create a Sound Source Close to the Listener—Aki Härmä, Steven van de Par, Werner de Bruijn, Philips Research Laboratories - Eindhoven, The Netherlands
It is sometimes desired to create an illusion that a sound source appears closer to the listener than the nearest loudspeaker location. By using highly directional loudspeakers one may manipulate the relation between direct and reverberant energy and therefore change the distance cues to make the sound source appear very close to the listener. In this paper we present a method combining highly directional sound with surround audio reproduction to produce controllable distance effects between the listener location and the nearest loudspeakers.
Convention Paper 7328 (Purchase now)
P3-5 Directional Analysis of Sound Field with Linear Microphone Array and Applications in Sound Reproduction —Jukka Ahonen, Ville Pulkki, Helsinki University of Technology - Espoo, Finland; Fabian Küch, Markus Kallinger, Richard Schultz-Amling, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The use of a linear microphone array composed of two closely spaced omnidirectional microphones as input to a teleconference application of Directional Audio Coding (DirAC) is presented. DirAC is a method for spatial sound processing, where the direction of arrival of sound and diffuseness are analyzed and used for different purposes in reproduction. Two-dimensional plane arrays have been used so far to generate input signals for DirAC, in which case it is possible to measure directly a two-dimensional sound field. In this paper a one-dimensional linear array is used to provide input signals for one-dimensional direction and diffuseness analysis in DirAC. Listening tests are conducted to evaluate the intelligibility of speech with simultaneous talkers when the linear array is used in teleconference applications.
Convention Paper 7329 (Purchase now)
P3-6 The SoundScape Renderer: A Unified Spatial Audio Reproduction Framework for Arbitrary Rendering Methods —Matthias Geier, Jens Ahrens, Sascha Spors, Technische Universität Berlin - Berlin, Germany
The SoundScape Renderer is a versatile software framework for real-time spatial audio rendering. The modular system architecture allows the use of arbitrary rendering methods. Three rendering modules are currently implemented: Wave Field Synthesis, Vector Base Amplitude Panning, and Binaural Rendering. After a description of the software architecture, the implementation of the available rendering methods is explained and the graphical user interface is shown as well as the network interface for the remote control of the virtual audio scene. Finally, the Audio Scene Description Format, a system-independent storage file format, is briefly presented.
Convention Paper 7330 (Purchase now)
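Of the three rendering methods listed in the abstract above, Vector Base Amplitude Panning is the easiest to show in a few lines. The sketch below is a generic two-loudspeaker, 2-D formulation and is not taken from the SoundScape Renderer source; the function name and the angles are assumptions. It solves for the two gains whose weighted loudspeaker direction vectors add up to the source direction, then normalizes for constant power.

```python
import math

def vbap_2d(source_deg, spk1_deg, spk2_deg):
    """Pairwise 2-D VBAP gains for a source between two loudspeakers."""
    def unit(deg):
        rad = math.radians(deg)
        return math.cos(rad), math.sin(rad)
    px, py = unit(source_deg)
    ax, ay = unit(spk1_deg)
    bx, by = unit(spk2_deg)
    det = ax * by - ay * bx            # zero if the loudspeakers are collinear
    g1 = (px * by - py * bx) / det     # Cramer's rule for p = g1*a + g2*b
    g2 = (ax * py - ay * px) / det
    norm = math.hypot(g1, g2)          # constant-power normalization
    return g1 / norm, g2 / norm

# Hypothetical stereo pair at +30 and -30 degrees.
print(vbap_2d(0.0, 30.0, -30.0))   # centered source: equal gains
print(vbap_2d(30.0, 30.0, -30.0))  # source at loudspeaker 1: only that speaker plays
```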
P3-7 Initial Investigation of Signal Capture Techniques for Objective Measurement of Spatial Impression Considering Head Movement —Chungeun Kim, Russell Mason, Tim Brookes, University of Surrey - Guildford, Surrey, UK
In a previous study it was discovered that listeners normally make head movements attempting to evaluate source width and envelopment as well as source location. To accommodate this finding in the development of an objective measurement model for spatial impression, two capturing models were introduced and designed in this research, based on binaural technique: 1) rotating Head And Torso Simulator (HATS), and 2) a sphere with multiple microphones. As an initial study, measurements of interaural time difference, level difference and cross-correlation made with the HATS were compared with those made with a sphere containing two microphones. The magnitude of the differences was judged in a perceptually relevant manner by comparing them with the just-noticeable differences of these parameters.
Convention Paper 7331 (Purchase now)
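One of the parameters compared in the abstract above, interaural time difference, is commonly estimated as the lag that maximizes the cross-correlation between the two ear signals. The sketch below shows that idea on a toy click. The signals, the 48 kHz sample rate, and the function name are assumptions for illustration and do not reproduce the authors' measurement chain.

```python
def best_lag(left, right, max_lag):
    """Lag (in samples) of `right` relative to `left` that maximizes cross-correlation."""
    def corr(lag):
        return sum(left[n] * right[n + lag]
                   for n in range(len(left))
                   if 0 <= n + lag < len(right))
    return max(range(-max_lag, max_lag + 1), key=corr)

SAMPLE_RATE = 48000  # Hz, hypothetical

# Toy click that reaches the right ear three samples after the left ear.
left = [0, 0, 1.0, 0.5, 0, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 0, 1.0, 0.5, 0, 0, 0]

lag = best_lag(left, right, max_lag=5)
itd_us = lag / SAMPLE_RATE * 1e6
print(lag, "samples,", itd_us, "microseconds")
```

A positive lag here means the right-ear signal arrives later than the left-ear signal.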
P3-8 A Second Order Differential Microphone Technique for Spatially Encoding Virtual Room Acoustics —Alexander Southern, Damian Murphy, University of York - Heslington, York, UK
Room acoustics modeling using a numerical simulation technique known as the Digital Waveguide Mesh (DWM) has previously been presented as a suitable method for measuring spatial Room Impulse Responses (RIR) of virtual enclosed spaces. In this paper a new method for capturing the DWM modeled soundfield using an array of spatially distributed pressure-sensitive receivers is presented. The polar response of the formed 2nd order virtual microphone is measured and compared to the theoretical polar response. This approach is proven to be capable of decomposing the modeled soundfield into second order spherical harmonic components that are typically associated with 2nd order Ambisonics.
Convention Paper 7332 (Purchase now)
Low Bit-Rate Audio Coding
Saturday, May 17, 14:00 — 18:00
Chair: Karlheinz Brandenburg, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany
P4-1 Time-Varying Transform for High Quality Audio Communication Codecs —Pierrick Philippe, France Télécom R&D - Cesson Sevigne, France; David Virette, Balázs Kövesi, France Télécom R&D - Lannion, France
High quality audio communication is a current challenge addressed by the standardization committees. In this context, ITU and MPEG recently issued standards for high quality coding of both speech and music content. Transform coding is used and allows quality commensurate with bit rates regardless of the audio content. Up to now, only constant transform sizes were used in these coding schemes since time-varying transforms needed look-ahead for perfect reconstruction, hence adding further delay. In this paper we demonstrate how variable transform sizes can be used without affecting the coding delay. Based on filter bank theory, a framework avoiding look-ahead is presented. The quality improvement offered by the proposed solution is illustrated in the context of MPEG-4 Enhanced Low Delay AAC.
Convention Paper 7333 (Purchase now)
P4-2 Differential Graph-Based Coding of Spikes in a Biologically-Inspired Universal Audio Coder—Ramin Pichevar, Hossein Najaf-Zadeh, Louis Thibault, Hassan Lahdili, Communications Research Centre - Ottawa, Ontario, Canada
In a previous work we showed that it is possible to code audio materials using a biologically-inspired universal audio coder based on matching pursuit. The best atoms/kernels chosen by matching pursuit are represented by spikes to reflect the biologically-inspired nature of the algorithm. In that work, each spike or atom was defined by parameters such as timing, channel frequency, amplitude, chirp factor, etc., that were encoded independently. However, encoding each atom/spike as a separate entity is very bit consuming. In the present paper, we propose algorithms to encode only the difference between parameters associated with spikes. Hence, we assume that each spike/atom is a node in a graph and choose the sequence of spikes that will minimize the differential encoding costs. Methods based on minimum spanning tree and traveling salesman are proposed and compared for the graph-based optimization of the code.
Convention Paper 7334 (Purchase now)
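The minimum-spanning-tree variant mentioned in the abstract above can be sketched as follows: treat each spike as a graph node, weight each edge by the cost of coding one spike as a difference from another, and keep the cheapest tree. The parameter tuples and the cost function (sum of absolute parameter differences) are hypothetical stand-ins, since the abstract does not give the authors' actual bit-cost model.

```python
def mst_differential_order(spikes):
    """Prim's algorithm: return (parent, child) edges and the total difference cost."""
    def cost(a, b):
        # Hypothetical stand-in for the bits needed to code b given a.
        return sum(abs(x - y) for x, y in zip(a, b))
    in_tree = {0}                      # spike 0 is coded in full as the root
    edges, total = [], 0
    while len(in_tree) < len(spikes):
        c, parent, child = min(
            (cost(spikes[i], spikes[j]), i, j)
            for i in in_tree
            for j in range(len(spikes)) if j not in in_tree
        )
        in_tree.add(child)
        edges.append((parent, child))
        total += c
    return edges, total

# Hypothetical spikes as (time, channel, amplitude) tuples.
spikes = [(0, 3, 10), (1, 3, 11), (8, 7, 2), (9, 7, 3), (2, 4, 10)]
edges, total = mst_differential_order(spikes)
print(edges, total)
```

Each edge says which already-coded spike a new spike is differenced against; a traveling-salesman formulation would instead force a single chain through all spikes.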
P4-3 Unraveling the Relationship between Basic Audio Quality and Fidelity Attributes in Low Bit-Rate Multichannel Audio Codecs —Paulo Marins, Francis Rumsey, Slawomir Zielinski, University of Surrey - Guildford, Surrey, UK
Prior to this study the evaluation of multichannel audio codecs has been done mainly according to the ITU-R standards BS.1116 and BS.1534. Basic audio quality is the only perceptual attribute assessed in the majority of these tests. This approach, although efficient for measuring the overall quality of several codecs at once, does not provide reasons why a particular codec is rated better or worse than another. In this paper fidelity attributes were included; these were based on the attributes suggested in the ITU-R standards but have not been used explicitly in codec evaluation up to now. In this experiment the perceptual importance of these attributes and their contribution to the basic audio quality of low bit-rate surround sound codecs were investigated.
Convention Paper 7335 (Purchase now)
P4-4 A New Perceptual Model for Audio Coding Based on Spectro-Temporal Masking —Steven van de Par, Philips Research Europe - Eindhoven, The Netherlands; Jeroen Koppens, Philips Applied Technologies - Eindhoven, The Netherlands; Armin Kohlrausch, Philips Research Europe - Eindhoven, The Netherlands, and Eindhoven University of Technology, Eindhoven, The Netherlands; Werner Oomen, Philips Applied Technologies - Eindhoven, The Netherlands
In psychoacoustics, considerable advances have been made recently in developing computational models that can predict the discriminability of two sounds taking into account spectro-temporal masking effects. These models operate as artificial observers by making predictions about the discriminability of arbitrary signals [e.g., Dau et al., J. Acoust. Soc. Am. 99, Vol. 36(15), 1996]. Therefore, such models can be applied in the context of a perceptual audio coder. A drawback, however, is the computational complexity of such advanced models, especially because the model needs to evaluate each quantization option separately. In this paper a model is introduced and evaluated that is a computationally lighter version of the Dau model but maintains its essential spectro-temporal masking predictions. Listening test results in a transform coder setting show that the proposed model outperforms a conventional purely spectral masking model and the original model proposed by Dau.
Convention Paper 7336 (Purchase now)
P4-5 Delayless Mixing—On the Benefits of MPEG-4 AAC-ELD in High Quality Communication Systems —Markus Schnell, Markus Schmidt, Fraunhofer IIS - Erlangen, Germany; Per Ekstrand, Dolby Sweden - Stockholm, Sweden; Tobias Albert, Daniel Przioda, Manfred Lutzky, Ralf Geiger, Fraunhofer IIS - Erlangen, Germany; Vesa Ruoppila, Dolby Germany - Nuremberg, Germany; Fredrik Henn, Dolby Sweden - Stockholm, Sweden; Erlend Tårnes, Tandberg - Oslo, Norway
Tele- and video conferencing systems for modern business communication are managed by central hubs, so-called multipoint control units (MCU). One major task of these units is the mixing of audio streams from the participating sites. This is traditionally done by decoding the streams, mixing in time domain, and then re-encoding of the mixed signals. This requires additional processing power, leads to increased delay, and degraded audio quality. The paper demonstrates how the recently standardized MPEG-4 Enhanced Low Delay AAC (AAC-ELD) codec offers a solution to these problems by efficient and delayless mixing in the transform domain of the codec.
Convention Paper 7337 (Purchase now)
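Mixing in the transform domain, as described in the abstract above, relies on the transform being linear: the transform of a sum of signals equals the sum of their transforms. The sketch below checks this with a plain, unwindowed MDCT as a stand-in. AAC-ELD itself uses a low-delay filter bank and quantized spectra, neither of which is modeled here, and the two "site" frames are invented test data.

```python
import math

def mdct(frame):
    """Direct-form MDCT of a 2N-sample frame, returning N coefficients (no window)."""
    n_half = len(frame) // 2
    return [
        sum(x * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n, x in enumerate(frame))
        for k in range(n_half)
    ]

# Hypothetical 8-sample frames from two conference sites.
site_a = [0.1, 0.4, -0.2, 0.3, 0.0, -0.5, 0.2, 0.1]
site_b = [-0.3, 0.2, 0.1, 0.0, 0.4, 0.1, -0.1, 0.2]

mixed_in_time = mdct([a + b for a, b in zip(site_a, site_b)])
mixed_in_transform = [a + b for a, b in zip(mdct(site_a), mdct(site_b))]

# The largest difference is at floating-point rounding level.
print(max(abs(p - q) for p, q in zip(mixed_in_time, mixed_in_transform)))
```

Because the two results agree, a mixer can in principle add coefficients directly and skip the decode, mix, and re-encode round trip.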
P4-6 Low-Power MPEG-4 HE-AAC Version-2 Encoder —Chi-Min Liu, Han-Wen Hsu, Chung-Han Yang, Wen-Chieh Lee, National Chiao Tung University - Hsinchu, Taiwan
In the MPEG-4 HE-AAC version-2 encoder, analysis/synthesis complex-exponential modulation filter banks (CEMFB) are used in spectral band replication (SBR) and parametric stereo (PS) coding. Because of aliasing interference, complex banks rather than real banks are adopted in SBR and PS coding. However, the complex values in the CEMFB and the subsequent processing lead to high operational overhead. In previous work we designed SBR encoders based on real-domain cosine modulation filter banks and proposed a complexification-based approach for SBR coding. This paper extends that work to PS coding. An approximate method for parameter estimation is proposed that saves operational overhead by using only one CEMFB-analysis channel. In addition, a phase-adjustment down-mixing method is proposed to reduce energy-vanishing effects.
Convention Paper 7338 (Purchase now)
P4-7 Low Complexity Bit Allocation Algorithms for MP3/AAC Encoding —S Nithin, National Institute of Technology - Surathkal, India; Kumaraswamy Suresh, T. V. Sreenivas, Indian Institute of Science - Bangalore, India
We have developed two reduced-complexity bit-allocation algorithms for MP3/AAC based audio encoding, which can be useful at low bit-rates. One algorithm derives the optimum bit-allocation using constrained optimization of the weighted noise-to-mask ratio, and the second algorithm uses decoupled iterations for distortion control and rate control, with convergence criteria. MUSHRA based evaluation indicated that the new algorithm is comparable to AAC while requiring only about 1/10th the complexity.
Convention Paper 7339 (Purchase now)
P4-8 Linear Filtering in MDCT Domain —Kumaraswamy Suresh, T. V. Sreenivas, Indian Institute of Science - Bangalore, India
In this paper expressions for convolution multiplication properties of MDCT are derived starting from equivalent DFT representations. Using these expressions, methods for implementing linear filtering through block convolution in the MDCT domain are presented. The implementation is exact for symmetric filters and approximate for non-symmetric filters in the case of rectangular window-based MDCT. For a general MDCT window function, the filtering is done on the windowed segments and hence the convolution is approximate for symmetric as well as non-symmetric filters. This approximation error is shown to be perceptually insignificant for symmetric impulse response filters. Moreover, the inherent 50% overlap between adjacent frames used in MDCT computation does reduce this approximation error similar to smoothing of other block processing errors. The presented techniques are useful for compressed domain processing of audio signals.
Convention Paper 7340 (Purchase now)
Microphones and Loudspeakers
Saturday, May 17, 14:00 — 15:30
P5-1 A Study of Electrostatic Forces in Single-Acting Condenser Digital Transducer—Libor Husník, Czech Technical University in Prague - Prague, Czech Republic
One of the possibilities for designing a transducer with direct digital-to-analog conversion, sometimes called a digital loudspeaker, is the miniature condenser transducer manufactured on a silicon chip. Only recently has this micro technology been made available commercially, which can open further application possibilities. This paper studies a design in which the back electrode of the electrostatic transducer is partitioned into sections having total areas proportional to powers of 2. Since the electrostatic force acting on the membrane is affected by the distribution of bit groups, which cannot be even, this force will not be a linear function of the signal voltage. Correction coefficients are sought for some arrangements.
Convention Paper 7341 (Purchase now)
P5-2 Ultra-Thin Micro-Loudspeaker Using Oblique Magnetic Circuit—Toshiyuki Matsumura, Shuji Saiki, Sawako Usuki, Matsushita Electric Industrial Co., Ltd. - Kadoma City, Osaka, Japan; Koji Sano, Panasonic Electronic Devices Co., Ltd. - Matsusaka City, Japan
More and more functions are being built into mobile phones even as handsets become smaller, so the devices installed in them must be downsized or made thinner. Micro-loudspeakers in particular are required to be thinner while still reproducing high quality sound. However, it has been very difficult to make micro-loudspeakers thinner without degrading acoustic performance, because the structure of the conventional dynamic micro-loudspeaker is not suited to thinning. We have succeeded in developing an ultra-thin micro-loudspeaker using an Oblique Magnetic Circuit, which is 1.5 mm thick (45% thinner than conventional dynamic micro-loudspeakers) without deteriorating the acoustic performance.
Convention Paper 7342 (Purchase now)
P5-3 A Novel Glass Laminated Structure for Flat Panel Loudspeakers—Olivier Mal, Marek Novotny, Bart Verbeeren, AGC Research & Development Centre - Jumet, Belgium; Neil Harris, New Transducers Ltd. (NXT) - Cambourne, UK
A new, patented “sandwich structure” has been developed for various audio applications, in which thin glass sheets are laminated with a special PVB (Polyvinyl Butyral) film to eliminate typical acoustical weaknesses of monolithic glass and standard laminated solutions. The glass improvements include suppression of ringing of the audio signal and a much more flexible and lightweight glass structure. It results in flatter frequency response (both on-axis and 180° power response) and better transport of vibrations in the glass surface. In addition, better acoustical sensitivity and mechanical resistance are achieved. In this paper, after defining the structure of the developed laminated glass solution, we compare its performance to previously tried monolithic and laminated glass solutions. We also emphasize the key factors influencing the final acoustical properties. Finally, we introduce potential application fields for the developed structure.
Convention Paper 7343 (Purchase now)
P5-4 A Digitally Direct Driven Dynamic-Type Loudspeaker—Ryota Saito, Akira Yasuda, Kazushige Kuroki, Tomohiro Tsuchiya, Naoto Shinkawa, Hosei University - Koganei, Tokyo, Japan
If a speaker can be driven digitally, all processes from the input to the output can be performed digitally without the use of analog components such as power amplifiers; and a small, light, and high-quality speaker system can be realized. In this paper we propose the basic principle behind Digital-Speaker, and a digitally driven dynamic-type loudspeaker provided with multiple voice coils employing multibit delta-sigma modulation. The piezoelectric-speaker used in our previous study is replaced by the voice coil. The prototype is implemented along with an FPGA, CMOS drivers, and a dynamic-type loudspeaker. The THD and SPL are approximately 0.1% and 104 dB, respectively, and the output power is 1 W even when the power supply voltage is 1 V.
Convention Paper 7344 (Purchase now)
P5-5 Accelerated Power Test Analysis Based on Loudspeaker Life Distribution—Xu Wang, Yong Shen, Zhicheng Wu, Nanjing University - Nanjing, China
For loudspeaker manufacturers, the long time spent on power tests required by relevant standards or buyers significantly lengthens the product design and development cycle. The authors apply reliability theory to cut the duration of loudspeaker power tests. On the basis of experimental data, a model of loudspeaker life distribution is proposed, from which an acceleration factor for the loudspeaker power test is derived, and then the characteristics of the loudspeaker under normal working conditions can be estimated. The method can be conveniently applied to the relevant power tests and shortens their duration effectively.
Convention Paper 7345 (Purchase now)
P5-6 Perception and Physical Behavior of Loudspeaker Nonlinearities at Bass Frequencies in Closed vs. Reflex Enclosures—Jukka Rauhala, Jukka Ahonen, Miikka Tikander, Matti Karjalainen, Helsinki University of Technology - Espoo, Finland
This paper examines loudspeaker nonlinearities at bass frequencies in closed and reflex enclosures using signal analysis and perceptual evaluation methods. The nonlinearities are investigated by driving the loudspeakers to be compared with sinusoidal and musical test tones. The produced responses are evaluated in terms of diaphragm displacement, harmonic distortion, and bandwise distortion. In addition, a listening experiment is conducted in order to determine how the nonlinearities are perceived in both reflex and closed enclosures. The results show that signals with energy close to the tuning frequency of the reflex port produce more distortion with the closed enclosure. On the other hand, the acoustic bass test tone behaved in the opposite way, causing more distortion with the reflex enclosure. These phenomena were verified with the listening tests.
Convention Paper 7346 (Purchase now)
Mobile Phone Audio
Saturday, May 17, 16:00 — 17:30
P6-1 Enhancements to the SBC CODEC for Voice Communication in Bluetooth Devices —Laurent Pilati, Broadcom Corp. - Sophia Antipolis, France; Mohammad Zad-issa, Broadcom Corp. - Irvine, CA, USA
The Bluetooth Audio Distribution profile uses a low-complexity sub-band coder (SBC) as its mandatory audio compression codec. More recently, SBC has been selected for Bluetooth wideband voice communication. Since SBC was first designed for audio compression, it does not incorporate the features that speech coders commonly use, such as voice activity detection and comfort noise generation to reduce bandwidth usage and power consumption. In this paper we investigated extensions for SBC that would make it better suited for voice compression in the Bluetooth framework. The proposed enhancements were evaluated on the basis of their impact on voice quality, their implementation requirements, and their bandwidth and power savings.
Convention Paper 7347 (Purchase now)
P6-2 Efficiently Shuffling Large Sets of Clips —Ulrich Herrmann, austriamicrosystems - Graz, Austria
A method for randomly shuffling through large sets of video or audio clips is presented in this paper. Many current devices can shuffle only up to 200 or 256 songs. The algorithm provides a way of shuffling even large sets with an almost unlimited number of items. It also provides the ability to traverse back and forth with little processing power on today’s microcontrollers. All this is done with few bytes of code and almost no RAM.
Convention Paper 7348 (Purchase now)
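The abstract of P6-2 does not describe the algorithm itself. As a rough illustration of how a large set can be shuffled with constant memory and traversed in both directions, the sketch below uses a different, well-known technique: a full-period linear congruential generator over a power-of-two modulus, with out-of-range values skipped. The class name, constants, and seed handling are assumptions made for this example and are not taken from the paper.

```python
class LcgShuffler:
    """Visit every index in range(n) exactly once per cycle with O(1) state.

    Illustrative only: a full-period LCG (Hull-Dobell conditions) over a
    power-of-two modulus m >= n; values >= n are skipped ("cycle walking").
    """

    def __init__(self, n, seed=0, a=1664525, c=1013904223):
        if n < 1:
            raise ValueError("n must be positive")
        m = 4
        while m < n:
            m <<= 1
        self.n, self.m = n, m
        # a % 4 == 1 and c odd give a single cycle of length m.
        self.a, self.c = a % m, c % m
        # Modular inverse of a (Python 3.8+), used to step backwards.
        self.a_inv = pow(self.a, -1, m)
        self.x = seed % n

    def next(self):
        """Advance to the next clip index."""
        while True:
            self.x = (self.a * self.x + self.c) % self.m
            if self.x < self.n:
                return self.x

    def prev(self):
        """Step back to the previous clip index."""
        while True:
            self.x = (self.a_inv * (self.x - self.c)) % self.m
            if self.x < self.n:
                return self.x
```

The only state is the current index, so memory use does not grow with the number of clips. The statistical quality of such a simple generator is modest; it is shown here only to make the constant-memory idea concrete.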
P6-3 Hardware/Software Co-design of Multi-Format Audio Decoder —ChangYong Son, KangEun Lee, DoHyung Kim, Soojung Ryu, Shihwa Lee, Samsung Advanced Institute of Technology - Suwon, Korea
This paper presents a hardware/software co-design method for the implementation of a multi-format audio decoder with ultra low power, small chip size, and high flexibility, which are the most critical factors in embedded devices. This approach can provide both flexibility and low power with high performance in such a way that hardware implementation has been focused on the commonly used critical blocks of multiple audio decoders having intensive computations. Hardware blocks are well modularized to allow easy and rapid architecture exploration of several digital audio standards. The proposed system can decode an MP3 bitstream using only about 4 MHz clock frequency and AAC bitstream using only about 7 MHz clock frequency on average at the sampling rate of 48 kHz and the target bit rate of 128 kbps/stereo.
Convention Paper 7349 (Purchase now)
P6-4 Audio Enhancement for Portable Device-Based Speech Applications—Rory Turnbull, Peter Hughes, Steve Hoare, BT Group CTO - Ipswich, Suffolk, UK
Portable devices with audio capabilities necessitate the use of small transducers, often with poor frequency responses. This can be a limiting factor in the perception of the speech quality of VoIP services hosted on such a device. This paper seeks to investigate the problem and provide practical solutions through the use of appropriate enhancement technologies. The paper covers the use of equalization, dynamic range compression, and psychoacoustic bass enhancement as possible methods for improving intelligibility. Subjective tests are used to evaluate the enhancements prior to making practical recommendations.
Convention Paper 7350 (Purchase now)
P6-5 An Efficient, Low-Noise Filter Architecture for Bass Processing on a Processor Core—Peter Eastty, Nathan Bentall, Duncan Stott, Oxford Digital Limited - Stonesfield, Oxfordshire, UK
Bass enhancement is becoming popular in many forms of consumer devices. Whatever technique is used on whatever processor, the low-frequency filtering involved is frequently the major determinant of system signal-to-noise ratio. The architecture described combines an efficient, cascaded, low-pass FIR filter and a poly-phase adaptation of standard low-frequency IIR filtering. The resulting circuit achieves a 20 to 30 dB improvement in signal-to-noise ratio at the cost of only 12 instructions per sample. The technique may be applied to any bass processing using fixed or floating point processors. Complete design tables for the cascaded FIR filters are given, as are noise spectrum plots of the results.
Convention Paper 7351 (Purchase now)
P6-6 Implementation of Dynamic Voltage and Frequency Scaling on Portable Audio Players—Dahyanto Harliono, Woon-Seng Gan, Nanyang Technological University - Singapore
Current portable computing devices demand not only higher performance but also lower power consumption. For this reason, this research aims to build a framework that enables the rapid design of energy-efficient embedded systems. Specifically, this research is focused on a dynamic voltage scaling algorithm, which has been found effective in reducing power consumption. We developed a method of scaling voltage and frequency dynamically on the latest embedded processor, jointly designed by Analog Devices and Intel. The rationale behind this method is to avoid the processor being idle at a high operating frequency and voltage. Instead, the processor can save power by running its task at a lower frequency and voltage, and completing it just before the real-time deadline. Furthermore, our method can also be implemented in other embedded processors with voltage-frequency scaling features.
Convention Paper 7352 (Purchase now)
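The rationale in P6-6 (run slower and finish just before the deadline instead of idling at a high frequency and voltage) can be sketched as a simple selection over operating points. The function names, the operating-point table, and the energy model (dynamic energy proportional to cycles times voltage squared) are assumptions for illustration only; they are not the authors' framework or the processor's actual operating points.

```python
def pick_operating_point(cycles, deadline_s, points):
    """Return the slowest (freq_hz, volts) pair that still meets the deadline.

    points is a hypothetical list of (frequency in Hz, core voltage in V).
    """
    feasible = [p for p in points if cycles / p[0] <= deadline_s]
    if not feasible:
        raise ValueError("no operating point meets the deadline")
    return min(feasible, key=lambda p: p[0])


def relative_dynamic_energy(cycles, volts):
    """Textbook approximation: dynamic energy scales with cycles * V^2."""
    return cycles * volts ** 2


# Hypothetical operating points for an audio decode task.
POINTS = [(100e6, 0.80), (200e6, 0.95), (400e6, 1.20)]
chosen = pick_operating_point(cycles=1.5e6, deadline_s=0.010, points=POINTS)
saving = 1.0 - (relative_dynamic_energy(1.5e6, chosen[1])
                / relative_dynamic_energy(1.5e6, POINTS[-1][1]))
```

Under these assumed numbers the 200 MHz point is chosen, and the simplified model suggests roughly a third less dynamic energy than running the same task at the highest point.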
Loudspeakers - 2
Sunday, May 18, 09:00 — 11:30
Chair: Ronald Aarts, Philips Research - Eindhoven, The Netherlands
P7-1 Low-Frequency Extension of Gated Loudspeaker Measurements—Juha Backman, Nokia Corporation - Espoo, Finland
The free-field response of a loudspeaker system can be approximated through a gated measurement, made in a sufficiently large space. The frequency resolution is nominally determined by the time gap between the direct sound and the first reflection, but the actual low-frequency accuracy of gated measurements is also reduced by the group delay of the loudspeaker itself. The group delay at low frequencies may cause a large fraction of the radiated sound energy to be cut off, underestimating the low-frequency response. A method is presented to estimate the approximate low-frequency response from the impedance measurement of the loudspeaker and to use the response to pre-process the acoustical measurement to improve the accuracy of the gated measurement.
Convention Paper 7353 (Purchase now)
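The nominal resolution limit mentioned in P7-1 can be illustrated with a small calculation. The sketch below assumes a single floor reflection, with loudspeaker and microphone at the same height, and uses the common rule of thumb that the frequency resolution is about the reciprocal of the reflection-free gate length. The geometry and function name are assumptions for this example and do not come from the paper.

```python
import math


def gate_resolution_hz(distance_m, height_m, c=343.0):
    """Approximate frequency resolution of a gated measurement.

    Assumes one floor reflection, source and microphone both at height_m,
    separated horizontally by distance_m, and speed of sound c in m/s.
    """
    direct = distance_m
    # Image-source path length for the floor reflection.
    reflected = math.hypot(distance_m, 2.0 * height_m)
    gate_s = (reflected - direct) / c
    return 1.0 / gate_s


resolution = gate_resolution_hz(distance_m=1.0, height_m=1.5)
```

For a 1 m measurement distance at 1.5 m height this gives a gate of about 6.3 ms and a resolution near 160 Hz, which shows why low-frequency information is hard to obtain from gated data alone.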
P7-2 Measurement and Fourier-Bessel Analysis of Loudspeaker Radiation Patterns Using a Spherical Array of Microphones—Filippo M. Fazi, Vincent Brunel, Philip A. Nelson, University of Southampton - Highfield, Southampton, UK; Lars Hörchens, Delft University of Technology - Delft, The Netherlands; Jeongil Seo, Electronics and Telecommunications Research Institute (ETRI) - Daejeon, Korea
Loudspeakers are widely used in three-dimensional sound field reconstruction systems, but their spatial directivity features are relatively little-known. In this paper a hemispherical array of 40 microphones was designed and built in order to measure the pressure field radiated by different commercially available loudspeakers. The spatial samples of the acoustic pressure were processed in order to estimate the truncated Fourier-Bessel expansion of the sound field, which allows the reconstruction of the 3-D radiation pattern. An analysis of the errors involved in the estimation was also performed with a numerical model of the array.
Convention Paper 7354 (Purchase now)
P7-3 Turbulent and Viscous Air Friction in the Mid-High Frequency Loudspeaker—Ivan Djurek, Antonio Petosic, University of Zagreb - Zagreb, Croatia; Danijel Djurek, Alessandro Volta Applied Ceramics (AVAC) - Zagreb, Croatia
A mid-high frequency loudspeaker with resonant frequency f = 982 Hz has been studied in atmospheres of air, He4, D2, and H2 at pressures ranging from 0 to 1 bar. The measurements of viscous and turbulent contributions to the friction entering the Q-factor showed a significant difference compared to a low-frequency loudspeaker. The resonant frequency in air is considerably lower in an evacuated space than at 1 bar, and this differs from the low-frequency loudspeaker, where the opposite is true. Measurements showed that the imaginary part of the viscous friction in the Navier-Stokes equation is dominant, while the contribution of the real part to the friction term is less significant, and the Navier-Stokes equation reduces to the Stokes form ∇p = −µΔv_r, where the imaginary part of the viscous force reduces the effective vibration mass, which in turn enables operation of the loudspeaker at high frequency. The data were interpreted in terms of the Greenspan theory of the piston radiator.
Convention Paper 7355 (Purchase now)
P7-4 Modeling of an Electrodynamic Loudspeaker including Membrane Viscoelasticity—Antonio Petosic, Ivan Djurek, University of Zagreb - Zagreb, Croatia; Danijel Djurek, Alessandro Volta Applied Ceramics (AVAC) - Zagreb, Croatia
A model is proposed based on the viscoelastic properties of the loudspeaker membrane; the properties considered include stress-strain hysteresis, the creeping effect, the initial stress effect, and the appearance of temperature fluctuations on the membrane surface. The creeping displacement response to a step-like excitation current has been measured on different loudspeaker configurations, and the listed effects were analyzed in terms of the N-order Bennewitz-Rötger differential equation, commonly used to describe a vibrating viscoelastic body. The main parameter in this equation is the inverse stress parameter, which connects the friction and restoring terms in the loudspeaker vibrating system.
Convention Paper 7356 (Purchase now)
P7-5 On a Novel Concept of Membrane Suspension in an Electrodynamic Loudspeaker—Danijel Djurek, Alessandro Volta Applied Ceramics (AVAC) - Zagreb, Croatia; Ivan Djurek, Antonio Petosic, University of Zagreb - Zagreb, Croatia
A laboratory model of an electrodynamic loudspeaker has been realized with the membrane suspended on a hollow elastic torus positioned in the bottom of the membrane, close to the voice coil. This geometry removes the torque in the membrane coming from the maximum possible distance of the suspension on the outer rim from the voice coil. The suppressed torque results in the suppression of the Bessel vibration modes, which generate stochastic deformation tilts on the membrane surface. Such tilts contribute to the intrinsic friction of the membrane, and their absence results in minor viscoelastic losses. Lateral rigidity of the torus is sufficient for operation of the loudspeaker without centric fixation.
Convention Paper 7357 (Purchase now)
Wave Field Synthesis
Sunday, May 18, 09:00 — 12:30
Chair: Diemer de Vries, Delft University of Technology - Delft, The Netherlands
P8-1 The Theory of Wave Field Synthesis Revisited—Sascha Spors, Technische Universität Berlin - Berlin, Germany; Rudolph Rabenstein, University of Erlangen-Nuremberg - Erlangen, Germany; Jens Ahrens, Technische Universität Berlin - Berlin, Germany
Wave field synthesis is a spatial sound field reproduction technique aiming at authentic reproduction of auditory scenes. Its theoretical foundation was developed almost 20 years ago and has been improved considerably since then. Most of the original work on wave field synthesis is restricted to the reproduction in a planar listening area using linear loudspeaker arrays. Extensions like arbitrarily shaped distributions of secondary sources and three-dimensional reproduction in a listening volume have not been discussed in a unified framework so far. This paper revisits the theory of wave field synthesis and presents a unified theoretical framework covering arbitrarily shaped loudspeaker arrays for two- and three-dimensional reproduction. The paper additionally gives an overview on the artifacts resulting in practical setups and briefly discusses some extensions to the traditional concepts of WFS.
Convention Paper 7358 (Purchase now)
P8-2 A Finite Difference Time-Domain Approach to Analyzing Room Effects on Wave Field Synthesis Reproduction—Robert Oldfield, Ian Drumm, Jos Hirst, University of Salford - Salford, Greater Manchester, UK
Probably the largest pitfall for accurate audio reproduction using wave field synthesis (WFS) is the listening space. The WFS theory assumes free-field, source-free conditions that are seldom the case for practical sound reproduction. There is consequently a need to determine what effect the reproduction room has upon the synthesized sound field. This paper presents a finite difference time-domain (FDTD) approach to predicting the sound field in a room with arbitrary geometry and frequency dependent absorbing boundaries. A significant benefit of using FDTD is that the WFS system can be modeled both as part of the room and also in free-field conditions; therefore distortion of the sound field from the acoustics of the reproduction room can be quantified.
Convention Paper 7359 (Purchase now)
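To make the FDTD idea in P8-2 concrete, here is a minimal one-dimensional acoustic FDTD sketch on a staggered grid with rigid (fully reflecting) ends. It is far simpler than the method the abstract describes, which handles arbitrary room geometry and frequency-dependent absorbing boundaries; the grid size, pulse shape, and parameter values are assumptions chosen only for illustration.

```python
import numpy as np


def fdtd_1d(n_cells=200, n_steps=300, dx=0.01, c=343.0, rho=1.21, courant=0.9):
    """Propagate a Gaussian pressure pulse in a 1-D tube with rigid ends.

    Returns the initial and final pressure arrays.
    """
    dt = courant * dx / c                     # time step from the Courant number
    idx = np.arange(n_cells)
    p0 = np.exp(-((idx - n_cells // 2) / 10.0) ** 2)
    p = p0.copy()                             # pressure at cell centres
    u = np.zeros(n_cells + 1)                 # particle velocity at cell faces
    for _ in range(n_steps):
        # Momentum equation; u[0] and u[-1] stay zero (rigid walls).
        u[1:-1] -= dt / (rho * dx) * (p[1:] - p[:-1])
        # Continuity equation.
        p -= rho * c ** 2 * dt / dx * (u[1:] - u[:-1])
    return p0, p


p_start, p_end = fdtd_1d()
```

Running the same update with and without the walls (or with different boundary conditions) is the kind of comparison that lets the room's contribution to the sound field be separated from the free-field result.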
P8-3 Wave Field Synthesis Evaluation Using the Minimum Audible Angle in a Concert Hall—Georgios Marentakis, McGill University - Montreal, Quebec, Canada; Etienne Corteel, sonic emotion ag - Oberglatt, Switzerland; Stephen McAdams, McGill University - Montreal, Quebec, Canada
Localization accuracy with Wave Field Synthesis (WFS) was estimated in a variable-acoustics concert hall. Contrary to previous studies, we employed a Minimum Audible Angle (MAA) paradigm as a measure of localization performance. The MAA was estimated for three different listening positions, three orientations of the listeners (0, 60, 90 degrees) and two acoustical conditions. WFS was found to produce satisfying localization cues that depend little on the reverberation time of the room and only weakly on the position of the listener.
Convention Paper 7360 (Purchase now)
P8-4 Objective and Subjective Analysis of Localization Accuracy in Wave Field Synthesis—Joseph Sanson, IRCAM - Paris, France; Etienne Corteel, sonic emotion ag - Oberglatt, Switzerland; Olivier Warusfel, IRCAM - Paris, France
This paper analyses localization inaccuracies in the synthesis of virtual sound sources using Wave Field Synthesis (WFS), particularly at high frequencies. Objective and perceptual analyses are conducted through a binaural simulation of the actual sound field reproduced at the listener’s ears. The simulation consists of summing the respective contribution of each array transducer after filtering it with the appropriate HRTF for the listener position considered. High-pass filtered white noise is used as a critical signal to investigate the impact of aliasing on localization accuracy. Objective and perceptual observations show that localization accuracy may degrade for off-centered listening positions, which can be mainly attributed to a mismatch in the elicited Interaural Level Differences (ILD) above the aliasing frequency.
Convention Paper 7361 (Purchase now)
P8-5 Wave Field Synthesis with Increased Aliasing Frequency—Etienne Corteel, Renato Pellegrini, Clemens Kuhn-Rahloff, sonic emotion ag - Oberglatt (Zurich), Switzerland
Wave Field Synthesis (WFS) is a sound reproduction technique that enables the synthesis of target sound fields without any assumption on the listening position. Spatial aliasing is one of the remaining artifacts of WFS; it limits exact synthesis to frequencies below a corner frequency referred to as the spatial aliasing frequency. This paper presents a new technique that enables an increase of the spatial aliasing frequency of WFS assuming a preferred listening area. The presented technique is fully scalable and may be adapted to any listening zone shape or location. Applications in the domain of simulation environments and home entertainment are discussed.
Convention Paper 7362 (Purchase now)
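For orientation on the corner frequency discussed in P8-5, a commonly quoted worst-case estimate for a linear array with loudspeaker spacing Δx is f_al ≈ c / (2Δx). The sketch below evaluates only this textbook approximation; it is not the technique proposed in the paper, and the spacing value is a hypothetical example.

```python
def spatial_aliasing_frequency(spacing_m, c=343.0):
    """Worst-case estimate f_al ~ c / (2 * spacing) for a linear array."""
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return c / (2.0 * spacing_m)


f_al = spatial_aliasing_frequency(0.15)   # hypothetical 15 cm spacing
```

With 15 cm spacing this estimate lies a little above 1.1 kHz. The actual value depends on source position and listening position, which is the dependence a preferred listening area can exploit.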
P8-6 Reproduction of Moving Virtual Sound Sources with Special Attention on the Doppler Effect—Jens Ahrens, Sascha Spors, Technische Universität Berlin - Berlin, Germany
In this paper we outline a basic framework for the reproduction of the wave field of moving virtual sound sources. Conventional implementations usually reproduce moving virtual sources as a sequence of stationary positions. This process leads to various artifacts as reported in the literature. Using wave field synthesis as an example, we show that the explicit consideration of the physical properties of the wave field of moving sources avoids these artifacts and allows for the accurate reproduction of the Doppler Effect. However, numerical simulations suggest that the artifacts inherent to the reproduction system can lead to a heavy degradation of the reproduction quality.
Convention Paper 7363 (Purchase now)
P8-7 A Graphical Tool Set for Analyzing Wave Field Synthesis Algorithms—Thomas Korn, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany
Current Wave Field Synthesis (WFS) rendering realizations consist of large structures of audio signal processing components (filters, delays, amplitude weighting) that are controlled by complex algorithms based on the virtual source's properties. This paper proposes a set of tools that is used to analyze the underlying WFS coefficient calculation algorithms visually by mapping characteristic measures dependent on the source's and listener's position. These measures are derived from the reproduction system's idealized transfer function and parametric impulse response description. They reveal functional aspects of the algorithm's behavior. The measures aim at supporting an intuitive understanding of the perception of virtual sound events in a Wave Field Synthesis system, but also they facilitate the basic algorithm development process.
Convention Paper 7364 (Purchase now)
Spatial Audio Perception and Processing
Sunday, May 18, 09:30 — 11:00
P9-1 Audio-Visual Processing Tools for Auditory Scene Synthesis—Gavin Kearney, Rozenn Dahyot, Frank Boland, Trinity College Dublin - Dublin, Ireland
We present an integrated set of audio-visual tracking and synthesis tools to aid matching of the audio to the video position in both horizontal and periphonic sound reinforcement systems. Compensation for screen size and loudspeaker layout for high definition formats is incorporated, and the spatial localization of the source is rendered using advanced spatialization techniques. A subjective comparison of several original and enhanced film sequences using the Vector Base Amplitude Panning (VBAP) method is presented. The results show that the encoding of non-contradictory audio-visual spatial information for presentation on different loudspeaker layouts significantly improves the naturalness of the listening/viewing experience.
Convention Paper 7365 (Purchase now)
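P9-1 uses Vector Base Amplitude Panning (VBAP) for its comparison. As a reminder of how the basic two-dimensional form of VBAP works, the sketch below solves for the gains of one loudspeaker pair so that the weighted sum of the loudspeaker direction vectors points at the source, then normalizes to constant power. The function name and the angle convention (degrees, counterclockwise from the front) are assumptions for this example; the paper's tools are not reproduced here.

```python
import numpy as np


def vbap_2d(source_deg, spk1_deg, spk2_deg):
    """Constant-power gains for a source panned between two loudspeakers."""
    def unit(deg):
        rad = np.radians(deg)
        return np.array([np.cos(rad), np.sin(rad)])

    # Columns of the base matrix are the loudspeaker unit vectors.
    base = np.column_stack([unit(spk1_deg), unit(spk2_deg)])
    gains = np.linalg.solve(base, unit(source_deg))
    return gains / np.linalg.norm(gains)


center = vbap_2d(0.0, 30.0, -30.0)   # source midway between the pair
hard_left = vbap_2d(30.0, 30.0, -30.0)
```

A source midway between the pair receives equal gains of about 0.707, and a source at a loudspeaker position is reproduced by that loudspeaker alone.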
P9-2 Encoding Higher Order Ambisonics with AAC—Erik Hellerud, Norwegian University of Science and Technology - Trondheim, Norway; Ian Burnett, University of Wollongong - Wollongong, New South Wales, Australia; Audun Solvang, U. Peter Svensson, Norwegian University of Science and Technology - Trondheim, Norway
In this paper we explore a simple method for reducing the bit rate needed for transmitting and storing Higher Order Ambisonics (HOA). The HOA B-format signals are simply encoded using Advanced Audio Coding (AAC) as if they were individual mono signals. Wave field simulations show that by allocating more bits to the lower-order signals than to the higher-order ones, the resulting error is very low in the sweet spot but increases as a function of distance from the center. Encoding the higher-order signals with a low bit rate does not lead to a reduced audio quality. The spatial information is improved when higher-order channels are included, even if these are encoded with a low bit rate.
Convention Paper 7366 (Purchase now)
P9-3 Virtualized Listening Tests for Loudspeakers—Timo Hiekkanen, Helsinki University of Technology - Espoo, Finland; Aki Mäkivirta, Genelec Oy - Iisalmi, Finland; Matti Karjalainen, Helsinki University of Technology - Espoo, Finland
The precise location of a loudspeaker in a listening room is known to affect loudspeaker preference ratings. When multiple loudspeakers are compared the evaluation is limited by the poor human auditory memory. To overcome these problems, a method to evaluate and compare loudspeakers using headphones is proposed. The method utilizes personal head-related transfer functions in rendering the sound field recorded in a standard listening room with an artificial head. Equalization of circumaural headphones and the artificial head are investigated. Formal listening tests are conducted to examine differences between the proposed binaural method and real loudspeakers in a standard listening room. Listening tests show that the virtualized loudspeakers can be nearly indistinguishable from reality in many but not in all cases.
Convention Paper 7367 (Purchase now)
P9-4 Binaural Rendering in MDCT Domain for Multi-Object Audio Coding—Shinya Iizuka, Kei Kikuiri, Nobuhiko Naka, NTT DoCoMo, Inc. - Yokosuka, Kanagawa, Japan
We propose a binaural rendering method in the Modified Discrete Cosine Transform (MDCT) domain. It has good compatibility with audio codecs because a number of audio codecs utilize an MDCT filter bank for time-frequency transform. The proposal maps MDCT coefficients to the real part of the Modulated Complex Lapped Transform (MCLT) coefficients and processes the amplitudes and phases according to the binaural information. The inverse MCLT is applied to the coefficients with a synthesis window function, which is derived from the perfect reconstruction condition for the phase shifted signal under the assumption of linear phase property. The proposed method is applicable to the Binaural Cue Coding Type I and offers equivalent subjective quality to the original binaural signal.
Convention Paper 7368 (Purchase now)
P9-5 Room-Dependent Preference of Virtual Surround Sound—Frederick Scott, Agnieszka Roginska, New York University - New York, NY, USA
A common method for simulating surround sound over headphones, so-called virtual surround sound, is the convolution of content information with binaural cues. Often, room information is included. This paper examines if using HRTFs with room impulse responses customized to the room the listener is in enhances the listening experience. Perceptual experiments were conducted to evaluate whether or not listeners prefer a room accurate rendering versus a room that is dissimilar to the one a listener is seated in. A preference test was conducted using music as the test material.
Convention Paper 7369 (Purchase now)
P9-6 Quantization of 2-D Higher Order Ambisonics Wave Fields—Audun Solvang, U. Peter Svensson, Erik Hellerud, Norwegian University of Science and Technology - Trondheim, Norway
The spatial distribution of the quantization noise for a 2-D Higher Order Ambisonics (HOA) signal is investigated analytically. Uniformly distributed loudspeakers radiating plane waves in a nonreverberant environment and frequency domain quantization are presumed. It is found that employing the same quantization interval for all orders leads to uniformly distributed quantization noise in space. Assigning a larger quantization interval (i.e., fewer bits) to higher orders leads to a radially increasing quantization noise. Matching the quantization error to the reproduction error at the near perfect reconstruction boundary suggests that as few as four bits per sample can be used for quantization. Furthermore, high-pass filtering the HOA components makes it possible to use as few as three bits per sample. This quantization strategy seems very promising for reducing the bit rate of HOA.
Convention Paper 7370 (Purchase now)
P9-7 A Binaural Auditory Model for the Evaluation of Reproduced Stereophonic Sound—Marko Takanen, Helsinki University of Technology - Espoo, Finland; Gaëtan Lorho, Nokia Corporation - Helsinki, Finland
Binaural cues describing the differences in phase and power between signals at the two ears enable our auditory system to localize sound sources and segregate spatially multiple auditory events. Recent publications on binaural auditory models have shown how the interaural coherence can be utilized to estimate these cues and therefore model the localization ability of our auditory system. This approach is exploited in this paper to estimate the binaural cues at different frequency bands and identify the spatial location of sound sources from recorded broadband signals. We illustrate the application of a binaural auditory model to evaluate sound reproduced by a stereophonic loudspeaker setup in terms of source localization and specific loudness.
Convention Paper 7371 (Purchase now)
P9-8 An Augmented Reality Audio Mixer and Equalizer—Ville Riikonen, Miikka Tikander, Matti Karjalainen, Helsinki University of Technology - Espoo, Finland
In Augmented Reality Audio (ARA) applications the real sound environment of the user is extended with virtual objects. The real environment is reproduced as a pseudo-acoustic world via a special ARA headset that consists of binaural microphones and headphones. However, the headset causes coloration to the pseudo-acoustic representation. In order to make the headset acoustically transparent, equalization is needed. Digital equalization easily causes unacceptable delays. This paper presents a novel ARA mixer with real-time analog equalization to correct the coloration caused by the leakage through the headset and changed resonances in the closed ear canal.
Convention Paper 7372 (Purchase now)
P9-9 Sub-Band Adaptive Crosstalk Cancellation: A Novel Approach for Immersive Audio—Stefania Cecchi, Lorenzo Palestini, Paolo Peretti, Francesco Piazza, Universita Politecnica delle Marche - Ancona, Italy; Ferruccio Bettarelli, Leaff Engineering - Porto Potenza Picena (MC), Italy
In the field of immersive audio, a crosstalk canceller is required when a virtual sound is rendered over two loudspeakers. In the last decade several adaptive algorithms have been proposed; at present the least mean square (LMS) algorithm seems to be the best compromise between simplicity and robustness, although its convergence is weakened for colored inputs. In this paper a new approach for crosstalk cancellation based on a sub-band adaptive algorithm will be derived. The effectiveness of this algorithm, considering colored input, will be presented in terms of matrix inversion quality and fast convergence rate, comparing it with the conventional LMS algorithm.
Convention Paper 7373 (Purchase now)
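For reference, the conventional LMS algorithm that P9-9 uses as its baseline can be sketched as below, here in a plain system-identification setting rather than a crosstalk canceller. The function name, step size, and test filter are assumptions for illustration; the sub-band algorithm proposed in the paper is not shown.

```python
import numpy as np


def lms_identify(x, d, n_taps, mu):
    """Adapt an FIR filter w so that w applied to x tracks d (full-band LMS)."""
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)            # most recent input samples, newest first
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        e = d[n] - w @ buf            # a-priori error
        w += mu * e * buf             # stochastic-gradient update
    return w


rng = np.random.default_rng(0)
x = rng.standard_normal(5000)         # white input: the easy case for LMS
h_true = np.array([0.5, -0.3, 0.1])   # hypothetical unknown system
d = np.convolve(x, h_true)[: len(x)]
w_est = lms_identify(x, d, n_taps=3, mu=0.01)
```

With white input the estimate converges quickly to the unknown filter. With colored input the same update converges more slowly, which is the weakness the abstract mentions as motivation for a sub-band approach.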
Spatial Audio Perception and Processing - Part 2
Sunday, May 18, 13:00 — 17:30
Chair: Renato Pellegrini, sonic emotion ag - Oberglatt (Zurich), Switzerland
P10-1 Analysis and Adjustment of Planar Microphone Arrays for Application in Directional Audio Coding—Markus Kallinger, Fabian Kuech, Richard Schultz-Amling, Giovanni Del Galdo, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jukka Ahonen, Ville Pulkki, Helsinki University of Technology - Espoo, Finland
Directional Audio Coding (DirAC) is a well-established and efficient way to capture and reproduce a spatial sound event. In a recording room, DirAC requires four spatially coincident microphones to estimate the desired parameters, i.e., direction-of-arrival and diffuseness of sound: one omnidirectional and three figure-of-eight microphones pointing along the axes of a three-dimensional Cartesian coordinate system. In most consumer applications only two dimensional scenes need to be reproduced, implying that only two figure-of-eight microphones are required. Furthermore, instead of directional microphones, arrays of omnidirectional microphones are considered for economic reasons. Therefore, we investigate various two-dimensional microphone configurations with respect to their usability for DirAC. We derive theoretical limits for the correct estimation of both direction-of-arrival and diffuseness for the most suitable planar arrays. Furthermore, we suggest a way to equalize the systematic bias for the direction-of-arrival estimation, introduced by the discrete planar arrays.
Convention Paper 7374 (Purchase now)
P10-2 Planar Microphone Array Processing for the Analysis and Reproduction of Spatial Audio Using Directional Audio Coding—Richard Schultz-Amling, Fabian Kuech, Markus Kallinger, Giovanni Del Galdo, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jukka Ahonen, Ville Pulkki, Helsinki University of Technology - Espoo, Finland
Recording and reproduction of spatial audio are becoming increasingly important as multichannel audio applications gain attention. Directional Audio Coding (DirAC) represents a well-proven approach for the analysis and reproduction of spatial sound. In the analysis part, the direction-of-arrival and the diffuseness of the sound field are estimated in subbands using B-format signals, which can be created with 3-D omnidirectional microphone arrays. However, 3-D microphone configurations are not practical in consumer applications, e.g., due to physical design constraints. In this paper we propose a new approach that allows for an approximation of the required B-format signals but is based on a planar microphone configuration only. Comparisons with the standard DirAC approach confirm that the proposed method is able to correctly estimate the desired parameters within a wide range of frequency and that the spatial resolution matches human perception.
Convention Paper 7375 (Purchase now)
P10-3 User-Dependent Optimization of Wave Field Synthesis Reproduction for Directive Sound Fields—Frank Melchior, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany, and Delft University of Technology, Delft, The Netherlands; Christoph Sladeczek, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany, and Bauhaus Universität Weimar, Germany; Diemer de Vries, Delft University of Technology - Delft, The Netherlands; Bernd Fröhlich, Bauhaus Universität Weimar - Weimar, Germany
The use of wave field synthesis (WFS) enables the correct localization of sources over a large listening area. This works well for simulated point sources outside the listening area. The perception of focused sources is only correct for a subspace of the listening area. The subspace depends on the selected set of loudspeakers used for the reproduction of the focused source. If the position of the listener is known the selection of loudspeakers as well as the signal processing can be optimized. By the use of continuous tracking of the listener this adaptation can be done in real time. The same data can be used to simulate a specific directivity of a source and optimize a corresponding room simulation for the tracked listener. We present a wave field synthesis system for the simulation of directive focused sources including room simulation, which is continuously optimized for the position of a tracked listener. Our observations confirm that this approach significantly improves the localization and sound quality of focused sources located inside the listening area.
Convention Paper 7376 (Purchase now)
P10-4 Spatial Audio Object Coding (SAOC)—The Upcoming MPEG Standard on Parametric Object Based Audio Coding—Jonas Engdegård, Barbara Resch, Dolby Sweden - Stockholm, Sweden; Cornelia Falch, Oliver Hellmuth, Johannes Hilpert, Andreas Hoelzer, Leonid Terentiev, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jeroen Breebaart, Philips Research Laboratories - Eindhoven, The Netherlands; Jeroen Koppens, Erik Schuijers, Werner Oomen, Philips Applied Technologies - Eindhoven, The Netherlands
Following the recent trend of employing parametric enhancement tools for increasing coding or spatial rendering efficiency, Spatial Audio Object Coding (SAOC) is one of the recent additions to the standardization activities in the MPEG audio group. SAOC is a technique for efficient coding and flexible, user-controllable rendering of multiple audio objects based on transmission of a mono or stereo downmix of the object signals. The SAOC system extends the MPEG Surround standard by re-using its spatial rendering capabilities. This paper will describe the chosen reference model architecture, the association between the different operational modes and applications, and the current status of the standardization process.
Convention Paper 7377 (Purchase now)
P10-5 Focusing of Virtual Sound Sources in Higher Order Ambisonics—Jens Ahrens, Sascha Spors, Technische Universität Berlin - Berlin, Germany
Higher order Ambisonics (HOA) is an approach to the physical (re-)synthesis of a given wave field. It is based on the orthogonal expansion of the involved wave fields formulated for interior problems. This implies that HOA is per se only capable of recreating the wave field generated by events outside the listening area. When a virtual source is intended to be reproduced inside the listening area, strong artifacts arise in certain listening positions. These artifacts can be significantly reduced when a wave field with a focus point is reproduced instead of a virtual source. However, the reproduced wave field only coincides with that of the virtual source in one half-space defined by the location and nominal orientation of the focus point. The wave field in the other half-space converges toward the focus point.
Convention Paper 7378 (Purchase now)
P10-6 Listener Envelopment—What Has Been Done and What Future Research Is Needed?—Dan Nyberg, Jan Berg, Luleå University of Technology - Piteå, Sweden
In concert hall acoustics, the perceived spatial impression and/or spaciousness are characterized by the two attributes apparent source width (ASW) and listener envelopment (LEV). For LEV there is no clear consensus across the results of previous work. This paper aims to discuss the research performed on LEV and how these research results confirm or contradict each other. There is a consensus on the arrival angle of the later sound energy and its influence on LEV, whereas there is no clear agreement on the delay time and frequency content of the late reflections.
Convention Paper 7379 (Purchase now)
P10-7 Obtaining a Highly Directive Center Channel from Coincident Stereo Microphone Signals—Christof Faller, Illusonic LLC - Lausanne, Switzerland
Time-frequency based postprocessing applied to a coincident stereo recording is proposed to generate an audio signal with a highly directive directional response pointing straight forward. Assuming an ideal coincident stereo microphone, the directional response of this center channel is effectively time and frequency invariant. Further, the look direction can be steered to left and right front directions. The technique is based on the insight that the signal that predicts left from right, modified by limiting the magnitude of the frequency domain prediction gains, has a center forward directional response. The center channel is generated using both a left-right and a right-left magnitude-limited-predictor signal. Applications of the proposed scheme are the use of stereo microphones as “digital steerable shotgun microphones” and center channel generation for music recording.
Convention Paper 7380 (Purchase now)
P10-8 Spatial Sound in the Use of Multimodal Interfaces for the Acquisition of Motor Skills—Pablo F. Hoffmann, Aalborg University - Aalborg, Denmark
This paper discusses the potential effectiveness of spatial sound in the use of multimodal interfaces and virtual environment technologies for the acquisition of motor skills. Because skills are generally of a multimodal nature, spatial sound is discussed in terms of the role that it may play in facilitating skill acquisition by complementing, or substituting for, other sensory modalities. An overview of related research areas on audiovisual and audiotactile interaction is given in connection with the potential benefits of spatial sound as a means to improve the perceptual quality of the interfaces as well as to convey information that may prove critical for the transfer of motor skills.
Convention Paper 7381 (Purchase now)
P10-9 Evaluating the Sensation of Envelopment Arising from 5-Channel Surround Sound Recordings—Sunish George, Slawomir Zielinski, Francis Rumsey, University of Surrey - Guildford, Surrey, UK; Søren Beck, Bang & Olufsen a/s - Struer, Denmark
This paper discusses a series of listening tests conducted in the UK and Denmark to evaluate the perceived envelopment of surround audio recordings. The listening tests were designed to overcome some drawbacks (such as range equalization bias) present in the scores of a listening test based on ITU-R Recommendation BS.1534-1 (MUSHRA). In this method the listeners were asked to evaluate the envelopment of 5-channel surround sound recordings using a 100-point continuous scale. In order to calibrate the scale, two anchor recordings were used to define points 15 and 85 on the scale. The anchor recordings were selected by means of a formal listening test and interviews with the listeners. According to the obtained results, the proposed method provides repeatable results.
Convention Paper 7382 (Purchase now)
Analysis and Synthesis of Sound
Sunday, May 18, 13:00 — 17:30
Chair: Olivier Warusfel, IRCAM - Paris, France
P11-1 An Improved Pattern-Matching Method for Piano Multipitch Detection—Luis Ortiz-Berenguer, Francisco J. Casajus-Quiros, Elena Blanco-Martin, Technical University of Madrid - Madrid, Spain
A previous method presented by the authors carried out multipitch piano sound identification by using a pattern-matching process. In that method, the identification required, besides the matching-metric calculation, both a spectral predetection process and a validation step. Predetection allowed selection of a subset of the eighty-eight patterns, whereas validation verified whether the detected notes were actually in the analyzed spectrum. Both greatly increased the true-positive detection ratio, but they imposed restrictions on the identification of complex real sounds (e.g., two-handed playing). This paper presents an improvement to the method that eliminates both predetection and validation by using a modified matching-metric algorithm. This work has been supported by the Spanish National Project TEC2006-13067-C03-01/TCM.
Convention Paper 7383 (Purchase now)
P11-2 Polyphonic Piano Transcription Based on Spectral Separation—Julio Jose Carabias-Orti, Pedro Vera-Candeas, Nicolas Ruiz-Reyes, Raul Mata-Campos, Francisco Jesus Cañadas-Quesada, University of Jaén - Linares, Jaén, Spain
We propose a discriminative model for polyphonic piano transcription. Spectral features are obtained individually for each note. To solve the overlapping partial problem, we apply spectral separation by estimating the spectral envelope for each note. For classification, support vector machines (SVM) are trained on the spectral energy inferred from these spectral features. We apply a scheme of one-versus-all (OVA) SVM classifiers to discriminate frame-level note instances. To decrease the residual energy in high-frequency notes caused by partials shared with lower notes, a method to cancel the interference from lower notes on higher notes has been developed. The classifier output is filtered with a hidden Markov model. Our approach has been tested with synthesized and real piano recordings, obtaining very promising results.
Convention Paper 7384 (Purchase now)
P11-3 Toward a Real-Time Implementation of a Physical Modeling Based Percussion Synthesizer—Katarzyna Chuchacz, Roger Woods, Sile O'Modhrain, Queen’s University Belfast - Belfast, Northern Ireland, UK
This paper presents work carried out with the objective of designing a novel percussion synthesizer based on a physical model of a plate-based percussion instrument. The algorithm has been implemented in real time for the first time on a Field Programmable Gate Array (FPGA) chip, allowing a number of parameters, such as excitation value, stroke location, and plate stiffness, to be changed in real time. This presents the player with a number of new modes of playability but requires the definition and design of a flexible interface that gives extensive access to the sound world of the synthesis model. Details of the hardware implementation architecture are put forward, as well as fixed-point/floating-point computation aspects that impact the instrument’s playability.
Convention Paper 7385 (Purchase now)
P11-4 Dual Noise Suppression in Hearing Aids—Anton Schlesinger, Marinus M. Boone, Delft University of Technology - Delft, The Netherlands
A combined processing scheme for the enhancement of speech intelligibility in hearing aids is presented. The approach utilizes an optimized beam-forming method in connection with a biologically inspired processing model of modulation perception and binaural interaction.
Convention Paper 7386 (Purchase now)
P11-5 Automatic Sound Recognition for Security Purposes—Pawel Zwan, Gdansk University of Technology - Gdansk, Poland
In this paper an automatic sound recognition system is presented. It forms part of a larger security system developed to monitor outdoor conditions for non-typical audio-visual events. The analyzed audio signal is recorded from a microphone mounted outdoors; thus non-stationary noise of significant energy may be present in it. A specially designed algorithm for outdoor noise reduction is presented, and non-typical events in the audio stream are automatically detected and parameterized. Parameter values of various audio events are analyzed and sounds are automatically recognized. The automatic recognition accuracy obtained for various feature vectors and some chosen recognition systems is compared. Conclusions are drawn and a plan of future experiments is proposed.
Convention Paper 7387 (Purchase now)
P11-6 Multipitch Estimation of Harmonically-Related Event-Notes by Improving Harmonic Matching Pursuit Decomposition—Francisco Jesus Canadas-Quesada, Pedro Vera-Candeas, Nicolas Ruiz-Reyes, Raul Mata-Campos, Julio Jose Carabias-Orti, University of Jaén - Linares, Jaén, Spain
In this paper we propose a note detection approach based on harmonic matching pursuit (HMP) and specifically designed to detect simultaneous notes. However, HMP is not able to decompose harmonic sounds into different harmonic atoms when their fundamental frequencies are harmonically related. To solve this problem, we propose an algorithm, called atomic spectral smoothness (SS), which works on the harmonic atoms obtained by HMP. This algorithm is based on the spectral smoothness principle, which supposes that the spectral envelope of a harmonic sound usually forms smooth contours. Our proposal shows promising results for polyphonic musical signals with two harmonically related note-events.
Convention Paper 7388 (Purchase now)
P11-7 Amplitude Modification Algorithms within the Framework of Physical Modeling and of Haptic Gestural Interaction—Alexandros Kontogeorgakopoulos, Claude Cadoz, Institut National Polytechnique de Grenoble - Grenoble, France
Every underlying technique that has been used for the realization of audio effects since the beginning of electronic and computer music has introduced different types of sound modifications and proposed new ways of control. The advent of digital signal processing has stimulated audio processing researchers to a great extent; thus a variety of algorithms were designed to provide novel sound modifications. On the other hand, physical modeling and digital simulation formalisms have been used principally for the mere imitation and emulation of older sound processing systems. The aim of this paper is to propose three physical models conceived to offer sound modifications that mainly alter the amplitude of audio signals. The originality of this work lies not in the resulting audio modifications but in their transposition into the framework of physical modeling and digital simulation, which outlines an alternative control procedure.
Convention Paper 7389 (Purchase now)
P11-8 Circular Pitch Space Based Harmonic Change Detection—Markus Mehnert, Technische Universität Ilmenau - Ilmenau, Germany; Gabriel Gatzsche, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany; Daniel Arndt, Technische Universität Ilmenau - Ilmenau, Germany; Karlheinz Brandenburg, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany
This paper introduces a novel method for detecting harmonic boundaries in musical audio signals. These boundaries are important for chord analysis because exactly one chord lies between two adjacent boundaries. This event-driven analysis of musical audio signals is a better basis for subsequent chord analysis than the traditional frame-based-only concept. The method itself works with circular pitch spaces (CPS). The idea behind CPS is the calculation of parameters that summarize high-level aspects of the audio signal, such as semantic and music-theoretical relationships. Using CPSs yields good results in detecting harmonic changes.
Convention Paper 7390 (Purchase now)
P11-9 Circular Pitch Space Based Musical Tonality Analysis—Gabriel Gatzsche, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany; Markus Mehnert, Technische Universität Ilmenau - Ilmenau, Germany; Daniel Arndt, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany; Karlheinz Brandenburg, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany
The focus of this paper is to give an overview of existing circular pitch spaces, their special properties, and their application to semantic audio analysis. In addition, the symmetry model is proposed as a framework to describe the inter-model relationships between different circular pitch spaces. Similar to color spaces in vision, musical pitch spaces organize pitches in such a way that semantic/cognitive/theoretical/physical relationships between tones become geometrically apparent. In recent years pitch spaces were mainly the subject of music theory, but they are becoming more and more interesting for semantic analysis of musical audio signals. Pitch spaces can be applied to key and chord recognition, similarity calculation of musical pieces, genre estimation, tension analysis, or harmonic change detection.
Convention Paper 7391 (Purchase now)
Audio Archiving, Storage, Restoration, and Content Management & Audio Networking
Sunday, May 18, 14:00 — 15:30
P12-1 Drift, Wow, and Flutter Measurement and Reduction in Shrunken Movie Soundtracks—Przemek Maziewski, Adam Kupryjanow, Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
The paper presents the method and algorithms used to determine and reduce drift, wow, and flutter in shrunken movie tapes. The idea behind the algorithms is to use image processing for calculating the local tape shrinkage, which is one of the reasons for drift, wow, and flutter. The shrinkage can be calculated by analyzing the image height of a movie frame, sprocket hole, pitch, or another standardized movie tape element, and then expressed as the drift, wow, and flutter characteristic. Once the characteristic has been determined, both the soundtrack and the movie frames can be corrected. The paper presents the description of the image-based drift, wow, and flutter determination method and the experiments confirming the theoretical findings.
Convention Paper 7392 (Purchase now)
P12-2 The Norwegian Institute of Recorded Sound: From Collection to Archive to Public Private Partnership—Mark Drews, University of Stavanger - Stavanger, Norway; Jacqueline von Arb, Norwegian Institute of Recorded Sound - Stavanger, Norway
In 2006, the Norwegian Institute of Recorded Sound (NIRS) entered into a partnership with Memnon Audio Archiving Services to form MemNor, a commercial audio archiving service based in Stavanger, Norway. This paper traces the evolution of the Norwegian Institute of Recorded Sound from a private collection of music recordings to a municipally funded audio archive to a public private partnership and discusses the past, current, and future challenges involved. Details of ongoing activities are included.
Convention Paper 7393 (Purchase now)
P12-3 Cable-free Audio Delivery for Home Theater Entertainment Systems—Andreas Floros, Ionian University - Corfu, Greece; Nicolas-Alexander Tatlas, John Mourjopoulos, Dimitris Grimanis, University of Patras - Patras, Greece
Real-time, multichannel audio content delivery over the air is expected to significantly reduce the interconnection complexity of setting up typical home theater applications. However, despite the technological advantages of wireless networking standards related to high transmission rates and Quality-of-Service support, a number of additional issues have to be addressed, such as multiple loudspeaker synchronization and the delay and loss of packets containing compressed-quality, multiplexed audio data. In this paper further developments in the area of wireless audio delivery are presented by considering in detail multichannel reproduction for wireless home theater applications. Using both subjective and objective performance evaluation criteria, it is shown that cable-free multichannel audio playback is feasible under specific networking and audio coding conditions.
Convention Paper 7394 (Purchase now)
P12-4 Adaptive Playout for VoIP Based on the Enhanced Low Delay AAC Audio Codec—Jochen Issing, Nikolaus Färber, Manfred Lutzky, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The MPEG-4 Enhanced Low Delay AAC (AAC-ELD) codec extends the application area of the Advanced Audio Coding (AAC) family toward high quality conversational services. Through the support of the full audio bandwidth at low delay and low bit rate, it offers excellent support for enhanced VoIP applications. In this paper we provide a brief overview of the AAC-ELD codec and describe how its codec structure can be exploited for IP transport. The overlapping frames and excellent error concealment make it possible to use frame insertion/deletion in order to adjust the playout time to varying network delay. A playout algorithm is proposed that estimates the jitter on the network and adapts the size of the de-jitter buffer in order to minimize buffering delay and late loss. Considering typical network conditions and the same average delay, it is shown that the playout algorithm can reduce the loss rate by more than one order of magnitude compared to fixed playout.
Convention Paper 7395 (Purchase now)
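The idea of estimating network jitter and adapting the de-jitter buffer by inserting or deleting frames, as outlined in the abstract above, can be sketched as follows. This is not the algorithm proposed in the paper: the jitter estimate uses the interarrival jitter formula from RFC 3550 as a stand-in, and the buffer target rule, the 10 ms frame duration, the safety factor, and all function names are assumptions made for illustration.

```python
import math

def update_jitter(jitter_ms, prev_transit_ms, transit_ms):
    """One step of the RFC 3550 interarrival jitter estimate (gain 1/16)."""
    d = abs(transit_ms - prev_transit_ms)
    return jitter_ms + (d - jitter_ms) / 16.0

def target_buffer_frames(jitter_ms, frame_ms=10.0, safety=4.0, min_frames=1):
    """Hypothetical rule: buffer a few multiples of the jitter, in whole frames."""
    return max(min_frames, math.ceil(safety * jitter_ms / frame_ms))

def playout_decisions(transit_ms, frame_ms=10.0, safety=4.0):
    """Return one decision per packet: 'insert', 'delete', or 'keep' a frame."""
    decisions = []
    jitter = 0.0
    buffered = 1  # current de-jitter buffer size in frames (assumed start)
    for prev, cur in zip(transit_ms, transit_ms[1:]):
        jitter = update_jitter(jitter, prev, cur)
        target = target_buffer_frames(jitter, frame_ms, safety)
        if target > buffered:
            decisions.append("insert")   # stretch playout by one frame
            buffered += 1
        elif target < buffered:
            decisions.append("delete")   # shorten playout by one frame
            buffered -= 1
        else:
            decisions.append("keep")
    return decisions

steady = playout_decisions([50.0, 50.0, 50.0, 50.0])
bursty = playout_decisions([50.0, 130.0, 50.0, 130.0])
```

Changing the buffer by at most one frame per packet reflects the frame insertion/deletion mechanism mentioned in the abstract; a real implementation would also need to decide when such changes are least audible.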
Signal Processing, Sound Quality Design
Sunday, May 18, 16:00 — 17:30
P13-1 Time-Alignment of Multiway Loudspeakers with Group Delay Equalization: Part I—Sunil Bharitkar, Chris Kyriakakis, Tom Holman, University of Southern California and Audyssey Labs - Los Angeles, CA, USA
In this paper, the first of two parts, a technique for time-aligning the driver responses (viz., woofer, mid-range, and tweeter responses) in a multiway loudspeaker system is presented. Generally, woofers exhibit a much larger time-of-arrival delay at a listening position compared to the mid-range and high-frequency drivers, due to the presence of crossover networks. Moreover, the time-of-arrival delay for all drivers is frequency dependent, exhibiting a large variation over the audible frequency range. Due to these differences a two-part study was undertaken to understand the effects of these variations, quantitatively and qualitatively, in the direct as well as the reverberant field in typical listening rooms with diverse content. Various multiway loudspeakers, measured in one of the anechoic chambers at Audyssey, were selected to provide a diverse corpus of responses. In this first part we present the motivation behind the system used for applying all-pass filters to process audio signals being delivered to the multiway speaker and propose a time-delay difference equalization technique between the drivers of the multiway loudspeakers, showing group-delay equalization while retaining a flat magnitude response. Clearly, applying all-pass filters will result in temporal smearing of the measured response. Furthermore, an investigation using perceptually motivated variable-octave complex smoothing of responses, and designing all-pass filters based on this phase-smoothed data, will also be undertaken. Quantitative results are presented in this paper, whereas the next part of the two-part paper will present results from listening tests.
Convention Paper 7396 (Purchase now)
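The property that the abstract above relies on, namely that an all-pass filter changes group delay while keeping a flat magnitude response, can be checked numerically with a small sketch. It uses a generic first-order all-pass section H(z) = (a + z^-1) / (1 + a z^-1) with a hypothetical coefficient `a`; it is not the filter design used by the authors.

```python
import cmath
import math

def allpass_response(a, omega):
    """Frequency response of H(z) = (a + z^-1) / (1 + a z^-1) at omega (rad/sample)."""
    z_inv = cmath.exp(-1j * omega)
    return (a + z_inv) / (1 + a * z_inv)

def group_delay_numeric(a, omega, delta=1e-5):
    """Group delay in samples from a central difference of the phase."""
    ratio = allpass_response(a, omega + delta) / allpass_response(a, omega - delta)
    return -cmath.phase(ratio) / (2 * delta)

def group_delay_closed_form(a, omega):
    """Closed-form group delay of the same first-order section, in samples."""
    return (1 - a * a) / (1 + 2 * a * math.cos(omega) + a * a)

a = 0.5  # hypothetical coefficient, |a| < 1 for stability
omegas = [0.1, 1.0, 2.5]
magnitudes = [abs(allpass_response(a, w)) for w in omegas]
delays = [group_delay_numeric(a, w) for w in omegas]
```

With `a = 0.5` the magnitude stays at one for every frequency while the delay grows toward high frequencies; cascading several such sections with different coefficients is one common way to shape a target group-delay curve.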
P13-2 Singing Voice Separation Combining Panning Information and Pitch Tracking—Maximo Cobos, Jose J. Lopez, Technical University of Valencia - Valencia, Spain
Source separation techniques applied to music mixtures are able to extract relevant information that can be very useful for many applications, such as music remixing and reprocessing, lyrics recognition, or music information retrieval. Among all the sources present in modern music, the singing voice is of special interest because it is the only one that combines music, lyrics, and expression. In this paper we propose a system designed for extracting the singing voice from stereo recordings in several steps. This system combines panning information and pitch tracking, allowing the refinement of the time-frequency mask applied for extracting a vocal segment and thus improving the separation. An application example is discussed.
Convention Paper 7397 (Purchase now)
P13-3 The Downsampling Dilemma: Perceptual Issues in Sample Rate Reduction—Brett Leonard, New York University - New York, NY, USA
Many options currently exist for sample rate conversion. With sample rate reduction playing an integral part in the modern production world, downsampling algorithm quality is more important than ever. This paper presents data exploring the differences in sample rate reduction algorithms. While certain tests clearly display differences in the quality of the algorithms, listening test data shows the average listener is unable to repeatedly discern the difference in sample rate reduction methods.
Convention Paper 7398 (Purchase now)
P13-4 NU-Tech: The Entry Tool of the hArtes Toolchain for Algorithms Design—Ariano Lattanzi, Ferruccio Bettarelli, Leaff Engineering - Porto Potenza Picena (MC), Italy; Stefania Cecchi, Università Politecnica delle Marche - Ancona, Italy
The aim of the hArtes project is to facilitate and automate the rapid design and development of heterogeneous embedded systems, targeting a combination of a general purpose embedded processor, digital signal processing, and reconfigurable hardware. In this paper we present the NU-Tech platform, the main entry tool of the hArtes toolchain, which has the role of assisting designers in tuning and possibly improving the input algorithm at the highest level of abstraction. A brief description of the project itself will be given, and its orientation toward audio will be highlighted through a case study application.
Convention Paper 7399 (Purchase now)
P13-5 Recovery of Missing Signals Utilizing (GHA) Generalized Harmonic Analysis—Applied Interpolation—Teruo Muraoka, Takahiro Miura, Tohru Ifukube, University of Tokyo - Tokyo, Japan
For archiving damaged historical recordings, recovery of missing portions is as important as noise reduction. Conventional countermeasures based on functional interpolation are not effective when the missing interval is long. The inharmonic frequency analysis GHA is useful for this purpose, because a signal recomposed from the frequency components obtained by GHA exhibits a very long period. The length of this period is given by the least common multiple of the periods of the extracted inharmonic frequency components. This feature is very advantageous for signal recovery, and the authors devised an extrapolation that simply extends a waveform re-synthesized through GHA analysis/re-synthesis. The authors obtained satisfactory results overall by applying interpolation that combines forward and backward extrapolations based on the above-mentioned method. Recovery results depend strongly on the character of the signal (such as music), and the authors did not find definite rules for setting GHA’s analysis conditions. In this work, those conditions were set through auditory examination.
Convention Paper 7400 (Purchase now)
P13-6 Combination of Warped and Linear Filter Structures for Loudspeaker Equalization—German Ramos, Jose J. Lopez, Technical University of Valencia - Valencia, Spain; Basilio Pueo, University of Alicante - Alicante, Spain
Warped filters were introduced years ago for loudspeaker equalization in order to overcome the lack of resolution of linear filters at low frequencies, and also to follow the frequency resolution of psychoacoustic scales like the Bark scale, which behave more logarithmically than linearly. However, this improvement in frequency resolution at low and mid frequencies comes at the expense of losing resolution at high frequencies and increasing the complexity of the filter and the computational cost of its implementation. In this paper a smart combination of linear and warped filter structures, previously developed by the authors for FIR filters, is presented with new contributions and extended to IIR filters. This combination saves computational cost and obtains a proper frequency resolution across the whole frequency band, obtaining better results for the same computational cost than linear or warped filters alone. The results have been subjectively tested using the ABX methodology with successful results. The presented filter structures, methodology, and apparatus to do the filtering are patent pending.
Convention Paper 7401 (Purchase now)
P13-7 Multichannel Dereverberation System Using Modified Correlation-Based Blind Deconvolution and Multi-Microphone Spectral Subtraction—Jae-woong Jeong, Yonsei University - Seoul, Korea; Young-cheol Park, Yonsei University - Wonju, Korea; Seok-Pil Lee, Korea Electronics Technology Institute (KETI) - Sungnam, Korea; Dae-hee Youn, Yonsei University - Seoul, Korea
This paper presents a new multichannel dereverberation system combining modified correlation-based blind deconvolution with multi-microphone spectral subtraction. In the proposed system, we make M combinations of observed signals and apply them to the correlation-based blind deconvolution. The deconvolved signals are then used as inputs to the multi-microphone spectral subtraction. These spectral subtractions with the multiple deconvolved signals estimate the reverberant energy by using both a frame delay and a frequency-dependent weight. Due to the accurate estimation of the reverberant energy, the combination of correlation-based blind deconvolution with the multi-microphone spectral subtraction provides improved dereverberation performance. Performance improvement of the proposed system has been confirmed through experiments.
Convention Paper 7402 (Purchase now)
P13-8 Harmonic and Intermodulation Analysis of Nonlinear Devices Used in Virtual Bass Systems—Nay Oo, Woon-Seng Gan, Nanyang Technological University - Singapore
Nonlinear devices (NLDs) are used in virtual bass systems. NLDs generate harmonics, which in turn create the pitch perception exploited by psychoacoustic audio bass enhancement systems. This paper presents the mathematical derivations and analysis of five different NLDs, together with intermodulation analysis of the harmonics generated by these NLDs. The five NLDs are the half-wave rectifier, full-wave rectifier, square wave, polynomial function, and exponential function. The derivation of the harmonic analysis equations is based on Fourier theorems, Chebyshev polynomials, and Taylor series expansions. Besides the harmonics, intermodulation components also result from NLDs. Both mathematical analysis and simulation results are presented for the intermodulation effects of harmonics generated by NLDs.
Convention Paper 7403 (Purchase now)
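The harmonic generation described in the abstract above can be illustrated numerically for two of the listed NLDs, the half-wave and full-wave rectifiers. The sketch below is not taken from the paper: it simply passes one period of a unit-amplitude sine through a memoryless nonlinearity and reads the harmonic amplitudes from a direct DFT. The function name `harmonic_amplitudes` and the sample count are assumptions.

```python
import cmath
import math

def harmonic_amplitudes(nld, n_harmonics=5, n_samples=1024):
    """Amplitudes of DC and harmonics 1..n_harmonics of nld(sin(theta)).

    nld: memoryless nonlinearity applied sample by sample (assumed).
    Returns a list where index k holds the amplitude of harmonic k.
    """
    y = [nld(math.sin(2 * math.pi * n / n_samples)) for n in range(n_samples)]
    amps = []
    for k in range(n_harmonics + 1):
        c = sum(y[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n in range(n_samples)) / n_samples
        # DC is |c0|; a real sinusoid of amplitude A has |ck| = A / 2.
        amps.append(abs(c) if k == 0 else 2 * abs(c))
    return amps

def half_wave(v):
    return max(v, 0.0)

def full_wave(v):
    return abs(v)

half = harmonic_amplitudes(half_wave)
full = harmonic_amplitudes(full_wave)
```

The half-wave rectifier keeps the fundamental at amplitude 1/2 and adds only even harmonics, with 2/(3*pi) at the second harmonic, whereas the full-wave rectifier removes the fundamental and doubles the even harmonics, which follows from the standard Fourier series of the two rectified sines.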
Psychoacoustics, Perception, and Listening Tests - 1
Monday, May 19, 09:00 — 12:00
Chair: Jan de Laat, LUMC - Leiden, The Netherlands
P14-1 Speech Quality Measurement for the Hearing Impaired on the Basis of PESQ—John G. Beerends, TNO Information and Communication Technology - Delft, The Netherlands; Jan Krebber, Technische Universität Dresden - Dresden, Germany, now with Sysopendigia PLC, Helsinki, Finland; Rainer Huber, HörTech GmbH - Oldenburg, Germany; Koen Eneman, Heleen Luts, Katholieke Universiteit Leuven - Leuven, Belgium
One of the research topics within the HearCom project, a European project that studies the impact of hearing loss on communication, is to find methods with which the speech quality as perceived by the hearing impaired can be measured objectively. ITU-T Recommendation P.862 PESQ and its wideband extension P.862.2, are obvious candidates for this despite the fact that they were developed for normal hearing subjects. This paper investigates the extent to which PESQ and possible simple extensions can be used to measure the quality of speech signals as perceived by hearing impaired subjects.
Convention Paper 7404 (Purchase now)
P14-2 Subjective Evaluation of Speech Quality in a Conversational Context—Emilie Geissner, Valérie Gautier-Turbin, France Télécom R&D - Lannion, France; Marie Guéguin, Laboratoire Traitement du Signal et de l'Image - Rennes, France, and Université de Rennes, Rennes, France; Laetitia Gros, France Télécom R&D - Lannion, France
Within the framework of ITU-T, an objective conversational model is being developed to predict the impact of network impairments on the conversational quality experienced by an end user. To train and validate such a model, subjective scores are required. Assuming that a conversation is made of talking, listening, and interaction activities, a subjective test protocol is specially designed to take into account these multidimensional aspects of speech quality in a conversation. Subjects are asked to evaluate speech quality in talking, listening, and conversational contexts separately during three successive tasks. The analyses of several tests show that this method is valid for the assessment of listening, talking, and conversational quality.
Convention Paper 7405 (Purchase now)
P14-3 Contribution of Interaural Difference to Obstacle Sense of the Blind While Walking—Takahiro Miura, Teruo Muraoka, Shuichi Ino, Tohru Ifukube, University of Tokyo - Tokyo, Japan
Most blind people can recognize, to some extent, objects around them by hearing alone. This ability is called "obstacle sense" or "obstacle perception." It is known that this ability is facilitated while the subjects are moving; however, the exact reason for the facilitation has been unknown. It is apparent that differences between the sounds reaching the two ears change significantly while approaching an obstacle. We focused on this phenomenon, called interaural difference, in order to analyze the facilitation mechanism of the obstacle sense. We investigated how the interaural differences change depending on head rotation while walking and then measured the DL (Difference Limen) of the interaural difference. Furthermore, we compared the measurement data and the DL in relation to the subject-to-obstacle distance and then discussed one of the factors facilitating the obstacle sense.
Convention Paper 7406 (Purchase now)
P14-4 The Accuracy of Localizing Virtual Sound Sources: Effects of Pointing Method and Visual Environment—Piotr Majdak, Bernhard Laback, Matthew Goupell, Michael Mihocic, Austrian Academy of Sciences - Vienna, Austria
The ability to localize sound sources in 3-D space was tested in humans. The subjects listened to noises filtered with subject-specific head-related transfer functions. In the experiment using naïve subjects, the conditions included the type of visual environment (darkness or structured virtual world) presented via a head-mounted display and the pointing method (head or manual pointing). The results show that the errors in the horizontal dimension were smaller when head pointing was used. Manual pointing showed smaller errors in the vertical dimension. Generally, the effect of pointing method was significant but small. The presence of a structured virtual visual environment significantly improved the localization accuracy in all conditions. This supports the benefit of using a visual virtual environment in acoustic tasks like sound localization.
Convention Paper 7407 (Purchase now)
P14-5 Perceived Spatial Distribution and Width of Horizontal Ensemble of Independent Noise Signals as Function of Waveform and Sample Length—Toni Hirvonen, Ville Pulkki, Helsinki University of Technology - Espoo, Finland
This paper investigates the perceived sound distribution and width of a horizontal loudspeaker ensemble as a function of signal length, as all loudspeakers emit simultaneous white Gaussian noise bursts. In Experiment 1, subjects indicated the perceived distribution of 10 frozen cases where signal length was 2.5 ms. In Experiment 2, two cases from the previous test were investigated with signal lengths of 5-640 ms. The results indicate that (1) ensembles consisting of different short noise bursts vary in perceived distribution between cases and (2) when the length of the signal is increased, the produced sound event is generally perceived as wider. In perceiving such cases, the hearing system possibly utilized some temporal integration and/or adaptive processes.
Convention Paper 7408 (Purchase now)
P14-6 Effect of Minimizing Spatial Separation and Melodic Variations in Simultaneously Presented Two-Syllable Words—Jon Allan, Jan Berg, Luleå University of Technology - Piteå, Sweden
This paper will examine two important factors in the concept of auditory streaming defined by Bregman: pitch and localization. By removing one or both of these factors as possible identifiers for separating sound sources, the importance of each of them and the effect of reducing both of them will be studied. Stimuli with combinations of two-syllable words will be presented simultaneously over loudspeakers to subjects, and the number of correct identifications will be measured. In one category of stimuli, speech melody will be removed and replaced with a monotonous pitch, equal for all words. One category will have all words presented from one loudspeaker only. Conclusions will be related to earlier studies and common theories, the cocktail party effect among others.
Convention Paper 7409 (Purchase now)
Signal Processing, Sound Quality Design
Monday, May 19, 09:00 — 12:00
Chair: Jan Abildgaard Pedersen, Lyngdorf Audio - Skive, Denmark
P15-1 Characterization of the Multidimensional Perceptive Space for Current Speech and Sound Codecs—Thierry Etame, France Télécom R&D - Lannion Cedex, France, and University of Rennes, Rennes, France; Laetitia Gros, Catherine Quinquis, France Télécom R&D - Lannion Cedex, France; Gérard Faucon, Régine Le Bouquin Jeannes, INSERM - Rennes, France, and University of Rennes, Rennes, France
The purpose of our work is to produce a reference system that can simulate and calibrate degradations of speech and audio codecs currently used on telecommunications networks, for subjective assessment tests of voice quality. First, 20 wideband codecs are evaluated through subjective tests with the general goal of producing the multidimensional perceptive space underlying the perception of current degradations. Then, from a verbalization task, it appears that the identified attributes are clear/muffled, high-frequency noise, noise on speech, and hiss. Finally, these dimensions are characterized with correlates such as spectral centroid, spectral flatness measure, Mean Opinion Score, and correlation coefficient.
Convention Paper 7410 (Purchase now)
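Two of the correlates named in the abstract above, spectral centroid and spectral flatness, have standard textbook definitions. The following is a minimal sketch of those definitions only, not the authors' analysis chain; the function names, the frame length, the sample rate, and the two test signals are assumptions chosen for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, fine for short frames)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectral_centroid(mags, fs, n):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mags)
    return sum(k * fs / n * m for k, m in enumerate(mags)) / total


def spectral_flatness(mags, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum."""
    power = [m * m + eps for m in mags]  # eps avoids log(0) on empty bins
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    return geometric / (sum(power) / len(power))


# Hypothetical test frames: a 500 Hz sine (tonal) and a single click (flat spectrum)
fs, n = 8000.0, 64
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
click = [1.0] + [0.0] * (n - 1)

tone_mags = magnitude_spectrum(tone)
click_mags = magnitude_spectrum(click)
```

With these frames the centroid of the tone sits at the tone frequency, while flatness is close to 0 for the tone and close to 1 for the click, which is why the pair is often used to separate tonal from noise-like degradations.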
P15-2 An Automatic Maximum Gain Normalization Technique with Applications to Audio Mixing—Enrique Perez Gonzalez, Joshua D. Reiss, Queen Mary, University of London - London, UK
A method for real-time magnitude gain normalization of a changing linear system has been developed and tested with a parametric filter design. The method is useful in situations where the maximum gain before feedback is needed. The method automatically calculates the appropriate gain that should be applied in order to maintain maximum unitary gain. The method uses an impulse measurement of a mathematical model of the system to be normalized. This is particularly useful for mixing engineers, who have to continually revise their gain structure in order to maximize gain before feedback. The system is also useful in many other situations where solving the analytical solution from the mathematical model is not possible.
Convention Paper 7411 (Purchase now)
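The idea of deriving a normalization gain from an impulse measurement of a system model can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the model is a hypothetical peaking biquad in the widely used Audio EQ Cookbook form, the peak of the magnitude response is estimated with a plain DFT of the truncated impulse response, and all names and parameter values are invented for the example.

```python
import cmath
import math


def peaking_biquad(fs, f0, gain_db, q):
    """Peaking-EQ coefficients (Audio EQ Cookbook form), scaled so a[0] == 1."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def impulse_response(b, a, n):
    """Feed a unit impulse through the biquad (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for i in range(n):
        x0 = 1.0 if i == 0 else 0.0
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def peak_magnitude(h):
    """Largest DFT magnitude over bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(h)
    return max(abs(sum(h[t] * cmath.exp(-2j * math.pi * k * t / n)
                       for t in range(n)))
               for k in range(n // 2 + 1))


def normalization_gain(b, a, n=512):
    """Gain that scales the largest magnitude of the modelled system to unity."""
    return 1.0 / peak_magnitude(impulse_response(b, a, n))


# Hypothetical setting: +6 dB boost at 3 kHz, 48 kHz sample rate
b, a = peaking_biquad(fs=48000.0, f0=3000.0, gain_db=6.0, q=2.0)
g = normalization_gain(b, a)
```

For the +6 dB boost used here the resulting gain is close to 0.5, that is, the boost is compensated so that the largest gain of the filter is unity. Because the gain is recomputed from the model, it can be updated whenever the filter parameters change.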
P15-3 An Alternative Approach for the Convolution in Time-Domain: The Taches-Algorithm—Laurent Millot, Gérard Pelé, ENS Louis-Lumière - Noisy-le-Grand Cedex, France
We present an alternative temporal approach for convolution, providing a new algorithm, called the taches-algorithm. Based on interferences between the successive delayed and amplified output signals associated respectively with the impulses constituting the input signal, the taches-algorithm can give access immediately to the new output sample and have a low latency response using vector-based optimization of the calculation. With the taches-algorithm it is easy to change (even in real time) the impulse response while running the calculation, simply by updating the impulse response to use it for next samples, a task rather difficult to achieve using FFT convolution. Real time audio demonstrations using Pure Data and simple explanations of the taches-algorithm will be given.
Convention Paper 7412 (Purchase now)
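The abstract above describes the output as a superposition of delayed, scaled responses, one per input impulse. A generic input-driven time-domain convolution built on that superposition view can be sketched as follows. It is an interpretation for illustration and should not be read as the taches-algorithm itself; the class and method names are hypothetical, and the vector-based optimization mentioned in the abstract is not attempted.

```python
class ScatterConvolver:
    """Time-domain convolution driven by input samples.

    Each incoming sample adds a scaled copy of the current impulse response
    to a buffer of pending output; the head of that buffer is the finished
    output sample, so it is available as soon as the input sample arrives.
    """

    def __init__(self, impulse_response):
        self.h = list(impulse_response)          # assumed non-empty
        self.pending = [0.0] * len(self.h)

    def set_impulse_response(self, impulse_response):
        """Swap the response; it applies to input samples processed from now on."""
        self.h = list(impulse_response)
        if len(self.pending) < len(self.h):
            self.pending.extend([0.0] * (len(self.h) - len(self.pending)))

    def process(self, x):
        for k, hk in enumerate(self.h):
            self.pending[k] += x * hk            # superpose the delayed, scaled response
        y = self.pending.pop(0)                  # a ring buffer would avoid this O(N) shift
        self.pending.append(0.0)
        return y


conv = ScatterConvolver([1.0, 0.5, 0.25])
out = [conv.process(x) for x in [1.0, 2.0, 0.0, 0.0]]
```

Because the contributions of earlier input samples are already stored in the pending buffer, replacing the impulse response between two calls to process does not require recomputing anything; the tails of earlier samples simply ring out with the response that was active when they arrived.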
P15-4 Performance of Independent Component Analysis when Used to Separate Competing Acoustic Sources in Anechoic and Reverberant Conditions—Ben Shirley, Paul Kendrick, University of Salford - Salford, Greater Manchester, UK
A review of existing methods for independent component analysis was carried out and a series of experiments conducted assessing the use of existing independent component analysis (ICA) methods to separate microphone sources in varied acoustic environments. Specifically the research looked at how effectively ICA could perform in a broadcast context using standard microphone techniques such as spaced omni and coincident crossed cardioid pairs. Experiments were carried out in an anechoic chamber and also in a listening room conforming to the ITU-R BS.1116-2 standard. Results clearly indicate the limitations of ICA when performed on audio material recorded in a reverberant environment; however it was still shown possible to achieve separation of signals of up to 12 dB even in these conditions.
Convention Paper 7413 (Purchase now)
P15-5 A Cross-Platform Audio Signal Processing Environment for Real-Time Audio Algorithm Development—Mika Ristimäki, Nokia Research Center - Helsinki, Finland; Matti Hämäläinen, Nokia Research Center - Tampere, Finland; Julia Turku, Nokia Research Center - Helsinki, Finland; Riitta Väänänen, Nokia Research Center - Tampere, Finland
This paper presents a real-time audio algorithm development environment for experimental audio system research. The backbone of the system is the Pure Data audio signal processing platform, which enables flexible implementation of real-time audio systems. With the proposed development environment the user can concentrate on real-time audio algorithm development and performance evaluation in the workstation environment. We present the proposed algorithm design method and environment, and its application to the development of an experimental Voice over Internet Protocol (VoIP) system.
Convention Paper 7414 (Purchase now)
P15-6 New Enhancements to the Automatic Noise Removal (ANR) System Utilizing Improved Noise Statistics and Multi-Band Processing—Shamail Saeed, Harinarayanan E. V., ATC Labs - Noida, India; Deepen Sinha, ATC Labs - Chatham, NJ, USA; Anibal Ferreira, University of Porto - Porto, Portugal, and ATC Labs, Chatham, NJ, USA
We recently introduced a novel Automatic Noise Reduction (ANR) algorithm for the removal of wideband stationary/nonstationary noise from audio. Current noise reduction techniques exhibit certain undesirable characteristics. Distortion and/or alteration of the audio characteristics is a common problem. User intervention in identifying the noise profile is sometimes necessary. ANR uses a novel framework employing dominant component subtraction and restoration and performs better than conventional techniques in subjective tests. Here we describe three enhancements to ANR. The first of these increases the level of noise removal for the special case of stationary background noise. The second is a new tool for improving the temporal envelope coherence and yields additional noise removal. The third is a multi-band processing tool for conditioning time-frequency envelope for reduced listener fatigue.
Convention Paper 7415 (Purchase now)
Analysis and Synthesis of Sound, Part 1
Monday, May 19, 09:30 — 11:00
P16-1 A Channel Vocoder Using Wavelet Packets over a Reconfigurable Device—César Daniel Salvador Castañeda, Pontificia Universidad Católica del Perú - Lima, Peru
A channel vocoder using wavelet packets for computer music applications is proposed. The inputs are a modulating signal, which is chosen to be voice, and a carrier signal, which can be music or noise. The Wavelet Packets Channel Vocoder transforms windowed frames of both signals to a symmetric multiresolution representation, mixes the envelope of the modulating signal with the carrier, and transforms the result back to the original domain. Simulations are run with Simulink. Real-time implementations are presented for Pure Data and a Xilinx Virtex II Pro FPGA. Appropriate choices of window, overlap, wavelets, decomposition levels, and envelope detector are presented to achieve different sound effects. Finally, new ideas to improve transmission and compression rates in future work are also proposed.
Convention Paper 7416 (Purchase now)
P16-2 The Effects of Lossy Audio Encoding on Genre Classification Tasks—Kurt Jacobson, Queen Mary University of London - London, UK; Ben Fields, Goldsmith's College, University of London - London, UK; Mark Sandler, Queen Mary, University of London - London, UK; Michael Casey, Goldsmith's College, University of London - London, UK
In large audio collections, it is common to store audio content using perceptual encoding. However, encoding parameters may vary from collection to collection or even within a collection, with different bit rates, sample rates, codecs, etc. We evaluate the effect of various lossy audio encodings on the application of audio spectrum projection features to automatic genre classification tasks. We show that decreases in mean classification accuracy, while small, are statistically significant for bit rates of 96 kbps or lower. Also, a heterogeneous collection of audio encodings shows statistically significant decreases in mean classification accuracy compared to a pure PCM collection.
Convention Paper 7417 (Purchase now)
P16-3 Loop Region Detection in Music Signals—Bee Suan Ong, Sebastian Streich, Centre for Advanced Sound Technologies, Yamaha Corporation - Japan
Spotting loops within a music recording seems to be an easy task for human listeners. Nevertheless, it becomes highly time-consuming and laborious when loop segments are to be identified in a large music collection. The process can be greatly facilitated with an audio editing tool that highlights regions where loops appear and suggests the corresponding loop durations. This paper proposes a method for computing both types of information from the music signals. Our approach is based on identifying sequential and regular repetitions of tonal features. In addition, we present a prototype implementation featuring the proposed method to facilitate the audio browsing and searching process. Finally, we discuss other possible applications of this technology in the audio content description context.
Convention Paper 7418 (Purchase now)
P16-4 Music-Inspired Harmony Search Algorithm Applied to Feature Selection for Sound Classification in Hearing Aids—Javier Amor, Enrique Alexandre, Roberto Gil-Pita, Lorena Álvarez, Ester Huerta, Universidad de Alcalá - Alcalá, Spain
This paper explores the application of the music-inspired harmony search algorithm to the problem of feature selection for sound classification in digital hearing aids. The importance of this problem is given by the strong computational constraints inherent to the DSPs used in modern digital hearing aids. The goal of the feature selection algorithm is to select a subset of features in order to reduce the computational complexity of the system while maintaining a low probability of error. A set of experiments will be performed to test the performance of the proposed system, using a total of 74 different features. The results will be compared with those obtained using other widely-used algorithms, such as a genetic algorithm, a sequential search algorithm or random search.
Convention Paper 7419 (Purchase now)
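For readers unfamiliar with harmony search, the basic loop applied to a binary feature mask can be sketched as below. This is a generic textbook-style variant written for illustration, not the configuration used in the paper above: the parameter values, the bit-flip form of "pitch adjustment," and the toy fitness function (which stands in for a real classifier error estimate) are all assumptions.

```python
import random


def harmony_search(fitness, n_features, memory_size=10, hmcr=0.9, par=0.1,
                   iterations=500, seed=0):
    """Binary harmony search; returns the best feature mask found and its score."""
    rng = random.Random(seed)
    memory = [[rng.random() < 0.5 for _ in range(n_features)]
              for _ in range(memory_size)]
    scores = [fitness(mask) for mask in memory]
    for _ in range(iterations):
        new = []
        for i in range(n_features):
            if rng.random() < hmcr:
                # Memory consideration: reuse this bit from a stored harmony
                bit = memory[rng.randrange(memory_size)][i]
                if rng.random() < par:
                    bit = not bit  # "pitch adjustment" for a binary variable
            else:
                bit = rng.random() < 0.5  # random selection
            new.append(bit)
        score = fitness(new)
        worst = min(range(memory_size), key=scores.__getitem__)
        if score > scores[worst]:
            memory[worst], scores[worst] = new, score  # replace the worst harmony
    best = max(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]


# Hypothetical fitness: features 0-2 help, and every selected feature has a cost
USEFUL = {0, 1, 2}


def toy_fitness(mask):
    hits = sum(1 for i, on in enumerate(mask) if on and i in USEFUL)
    return hits - 0.1 * sum(mask)


mask, score = harmony_search(toy_fitness, n_features=8)
```

The cost term in the toy fitness plays the role of the computational constraint described in the abstract: a mask is rewarded for useful features and penalized for every feature it keeps.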
P16-5 Analysis of the Effects of Finite Precision in Sound Classifiers for Digital Hearing Aids—Ester Huerta, Enrique Alexandre, Roberto Gil-Pita, Lorena Álvarez, Javier Amor, Universidad de Alcalá - Alcalá de Henares, Madrid, Spain
This paper deals with the analysis of quantization effects in an automatic sound classification system for DSP-based hearing aids. The results obtained in this work will be used to find out the impact on hearing aid users of the finite accuracy imposed by the digital signal processor (DSP). The DSP has a finite word length that affects the main ability of these systems: the automatic adaptation to the changing acoustic environment. The goal of this work is to model a quantized neural-network-based classifier in order to compare its probability of error with that obtained by systems without finite-precision constraints.
Convention Paper 7420 (Purchase now)
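The effect of a finite word length on a neural-network classifier can be illustrated on a single neuron. The sketch below quantizes only the weights and the bias to a signed fixed-point grid and compares the output with the floating-point reference; the number format (one sign bit, one integer bit, the rest fractional), the tanh activation, and all numeric values are assumptions for the example and do not come from the paper above.

```python
import math


def quantize(value, word_bits, frac_bits):
    """Round to a signed fixed-point grid with saturation."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    code = max(lo, min(hi, round(value * scale)))
    return code / scale


def neuron(inputs, weights, bias):
    """Single tanh neuron."""
    return math.tanh(sum(x * w for x, w in zip(inputs, weights)) + bias)


# Hypothetical neuron and input vector
inputs = [0.2, 0.4, -0.6]
weights = [0.3, -0.7, 0.55]
bias = 0.1
reference = neuron(inputs, weights, bias)

errors = {}
for word_bits in (8, 12, 16):
    frac_bits = word_bits - 2  # assumed format: sign bit, one integer bit, rest fractional
    q_weights = [quantize(w, word_bits, frac_bits) for w in weights]
    q_bias = quantize(bias, word_bits, frac_bits)
    errors[word_bits] = abs(neuron(inputs, q_weights, q_bias) - reference)
```

In a full study the quantity of interest would be the change in classification error over a test set rather than the output deviation of one neuron, but the same quantize step would be applied to every stored parameter.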
P16-6 A Constructive Algorithm for Multilayer Perceptrons for Speech/Non-Speech Classification in Hearing Aids—Lorena Álvarez, Enrique Alexandre, Raúl Vicen, Lucas Cuadra, Manuel Rosa, Universidad de Alcalá - Alcalá de Henares, Spain
Constructive learning algorithms offer an attractive approach for the incremental construction of near-minimal neural-network architectures for pattern classification. This paper explores the feasibility of using a constructive algorithm for multilayer perceptrons (MLPs) applied to the problem of speech/non-speech classification in hearing aids. When properly designed and trained, MLPs are able to generate an arbitrary classification frontier with a relatively low computational complexity. The paper will focus on the design of a constructive algorithm for MLPs that attempts to converge to the minimum complexity network for the given problem. The results obtained will be compared with those cases in which the constructive algorithm is not considered.
Convention Paper 7421 (Purchase now)
P16-7 Seeing the Inaudible. Descriptors Used for Generating Objective and Reproducible Data in Real-Time for Musical Instrument Playing Standard Situations—Tobias Grosshauser, Diemo Schwarz, Norbert Schnell, IRCAM - Paris, France
This paper describes a method to generate objective and reproducible data to assist instrument teaching and practicing. The method is based on using audio descriptors and their efficient visualization, which assist in the perception of musical parameters that are difficult to hear. To aid comparison, we defined and recorded a comprehensive database of positive and negative sound examples from the violin that encompasses frequent mistakes made by students and a wide variety of playing styles.
Convention Paper 7422 (Purchase now)
Analysis and Synthesis of Sound, Part 2
Monday, May 19, 11:30 — 13:00
P17-1 Structural Segmentation of Music Using Set Accented Tones—Cillian Kelly, Mikel Gainza, David Dorran, Eugene Coyle, Dublin Institute of Technology - Dublin, Ireland
An approach that efficiently segments Irish Traditional Music into its constituent structural segments is presented. The complexity of the segmentation process is greatly increased due to melodic variation existent within this music type. In order to deal with these variations, a novel method using “set accented tones” is introduced. The premise is that these tones are less susceptible to variation than all other tones. Thus, the location of the accented tones is estimated and pitch information is extracted at these specific locations. Following this, a vector containing the pitch values is used to extract similar patterns using heuristics specific to Irish Traditional Music. The robustness of the approach is evaluated using a set of commercial Irish Traditional recordings.
Convention Paper 7423 (Purchase now)
P17-2 AnClaS3: A Blackboard-Based Cooperative Framework for Sound Separation—Antonio Pena, Norberto Degara-Quintela, Manuel Sobreira-Seoane, Universidade de Vigo - Vigo, Spain; Soledad Torres-Guijarro, Laboratorio Oficial de Metroloxía de Galicia (LOMG) - Tecnópole, Ourense, Spain
Blackboard modeling provides a great flexibility in structuring complex problems and a robust adaptation to the conditions of the signal to be analyzed, adding both bottom-up and top-down capabilities to the system. AnClaS3 (Analysis, Classification, and Synthesis for Sound Separation) is a cooperative project where five research groups collaborate integrating algorithms and developing new separation methods. This contribution defines a blackboard-based framework where four blackboard-based systems interact to integrate the expertise of independent research groups in order to solve a sound separation problem.
Convention Paper 7424 (Purchase now)
P17-3 Analysis and Synthesis of Audio Vibrato Using Harmonic Sinusoids—Wen Xue, Mark Sandler, Queen Mary, University of London - London, UK
This paper introduces the analysis and synthesis of vibrato in music audio. The analyzer separates frequency modulators from their carriers using a demodulation process. It then describes the frequency variations of a vibrato using a period-synchronized parameter set and the accompanying amplitude variations using a source-filter model, both of which can be regarded as slowly varying. The synthesizer, on the other hand, reconstructs a vibrato from a given set of parameters. Using this system we are able to retrieve specific characteristics of vibratos, or modify them to implement various audio effects.
Convention Paper 7425 (Purchase now)
P17-4 Distortion Analysis and Reduction for the Parametric Array—Ee-Leng Tan, Woon-Seng Gan, Nanyang Technological University - Singapore; PeiFeng Ji, Jun Yang, Chinese Academy of Sciences - Beijing, China
In this paper distortion analysis and reduction for the parametric array loudspeaker are presented. The parametric loudspeaker has been found useful in generating a highly directional sound beam. However, due to the nonlinear interaction of ultrasonic waves in air, several undesired harmonic distortions are generated. Conventional approaches to reducing the distortion have not produced satisfactory solutions. A new approach capable of further reducing the distortion is proposed in this paper. Several simulations are carried out in this work to test and compare the effectiveness of the proposed solution with conventional approaches.
Convention Paper 7426 (Purchase now)
P17-5 Piano "Forte Pedal" Analysis and Detection—Antony Schutz, EURECOM Institute - Valbonne Sophia-Antipolis, France; Valentin Emiya, Telecom Paris Tech - Paris, France; Dirk T. M. Slock, EURECOM Institute - Valbonne Sophia-Antipolis, France; Bertrand David, Roland Badeau, Telecom Paris Tech - Paris, France
In this paper we describe some features of the Forte Pedal piano effect and propose a method for detecting it through signal analysis. The detection method is applied to single tones recorded for this purpose. The Forte Pedal is found to increase the decay time of partials. In fact, this effect dominates the behavior of the partials, not only in their duration but also in their evolution. When the sustain pedal is used, a noise floor appears for all the notes of the piano. Here, after the analysis of some relevant characteristics, we provide a method based on harmonic plus noise decomposition for analyzing the residual and deciding whether the pedal is pressed.
Convention Paper 7427 (Purchase now)
Room and Architectural Acoustics & Sound Reinforcement
Monday, May 19, 12:30 — 17:00
Chair: Jan Voetmann, DELTA Acoustics - Hoersholm, Denmark
P18-1 Diffusing Boundary Implementations in the 2-D Digital Waveguide Mesh—Simon Shelley, Technische Universiteit Eindhoven - Eindhoven, The Netherlands; Damian Murphy, University of York - Heslington, York, UK
The digital waveguide mesh is a wave-based time-domain approach to the simulation of sound wave propagation in an acoustic system. The implementation of diffuse reflection is an important consideration in such an application, as the presence of diffuse reflection has a significant effect on an acoustic environment. The scattering effect of diffuse boundaries on reflected sounds, both in simulation and the real world, can be described using a technique that results in the formulation of frequency dependent diffusion coefficients. In this paper a number of different approaches to modeling diffuse reflection in a 2-D digital waveguide mesh are presented as well as a detailed analysis and comparison of the local scattering effect of the diffuse boundary models using this technique.
Convention Paper 7428 (Purchase now)
P18-2 RenderAIR—Room Acoustics Simulation Using a Hybrid Digital Waveguide Mesh—Damian Murphy, Mark Beeson, University of York - Heslington, York, UK; Simon Shelley, Eindhoven University of Technology - Eindhoven, The Netherlands; Alex Southern, Alastair Moore, University of York - Heslington, York, UK
The digital waveguide mesh (DWM) is a numerical simulation technique used to model signal propagation through a regular grid of spatio-temporal sampling points, and has been demonstrated as appropriate for modeling the acoustics of an enclosed space, particularly at low frequencies. The RenderAIR DWM application allows intuitive definition of parameters associated with geometry, boundary surface, and source/receiver parameters required to generate spatially encoded Room Impulse Responses (RIRs). In this paper the expectations and limitations of DWM-based room acoustics modeling are explored through the use of the RenderAIR application in a number of situations. ISO 3382 metrics are used as the main benchmark for the results obtained, which compare well with both real-world measurements and more traditional geometric acoustic approaches.
Convention Paper 7429 (Purchase now)
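To make the notion of a regular grid of spatio-temporal sampling points concrete, the standard update of a 2-D rectilinear mesh in its Kirchhoff-variable (pressure) form is sketched below: each interior node becomes half the sum of its four neighbours minus its own value two steps earlier. The sketch says nothing about how RenderAIR itself is implemented; the grid size, the source and receiver positions, and the boundary nodes pinned to zero pressure are assumptions made to keep the example short.

```python
def step(p_now, p_prev):
    """One time step of a 2-D rectilinear mesh (Kirchhoff-variable form)."""
    rows, cols = len(p_now), len(p_now[0])
    p_next = [[0.0] * cols for _ in range(rows)]  # edge nodes stay at zero pressure
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbours = (p_now[i - 1][j] + p_now[i + 1][j]
                          + p_now[i][j - 1] + p_now[i][j + 1])
            p_next[i][j] = 0.5 * neighbours - p_prev[i][j]
    return p_next


# Hypothetical setup: 21 x 21 nodes, impulse at the centre, receiver 4 nodes away
size = 21
p_prev = [[0.0] * size for _ in range(size)]
p_now = [[0.0] * size for _ in range(size)]
p_now[10][10] = 1.0

response = []
for _ in range(12):
    p_now, p_prev = step(p_now, p_prev), p_now
    response.append(p_now[10][14])
```

The list response is a short impulse response at the receiver node: it stays at zero until the wavefront has travelled the four nodes between source and receiver, after which the mesh's own dispersion and the boundary reflections shape the remainder.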
P18-3 Volumetric Diffusers—Richard Hughes, Jamie A. S. Angus, Trevor Cox, Olga Umnova, University of Salford - Salford, Greater Manchester, UK
Although many types of diffusers have been proposed, they are predominantly surface treatments. This paper places the diffuser in the volume of the room rather than on the surfaces, forming a volume-based diffuser. In particular, we examine suitable sequences for their implementation. We also consider suitable metrics to evaluate their performance. First, single-layer volumetric diffusers are examined, and then multi-layer volumetric diffusers are investigated. In particular, the effects of varying the spacing and the number of layers are examined more closely. The Boundary Element Method (BEM) model is used to gain accurate predictions of the diffuser’s performance. Finally, we demonstrate a diffusion structure that has a similar performance to that of a Primitive Root Diffuser (PRD).
Convention Paper 7432 (Purchase now)
P18-4 Commercial Low Frequency Absorbers—A Comparative Study—Gabriel Hauser, Dirk Noy, Walters-Storyk Design Group - Basel, Switzerland; John Storyk, Walters-Storyk Design Group - Highland, NY, USA
This paper ties in to a previous Convention Paper by the same authors (AES 115th Convention, 2003, #5944) and presents a current set of commercially available passive and active low frequency absorbing devices. One item in particular is of an experimental nature—a wood box loaded with conventional membrane loudspeakers. These are not connected to an amplifier, but to a variety of different passive electronics networks (parallel, serial). Reproducible acoustical measurements have been taken in a completely untreated rectangular concrete room, sequentially with and without a total of eight different absorbing devices. Results are compared and conclusions are presented.
Convention Paper 7431 (Purchase now)
P18-5 Modeling Frequency-Dependent Boundaries as Digital Impedance Filters in FDTD and K-DWM Room Acoustic Simulation—Konrad Kowalczyk, Maarten van Walstijn, Queen's University Belfast - Belfast, Northern Ireland, UK
This paper presents a new method for modeling frequency-dependent boundaries in finite difference time domain (FDTD) and Kirchhoff variable digital waveguide mesh (K-DWM) room acoustics simulations. The proposed approach allows direct incorporation of a digital impedance filter (DIF) in the multi-dimensional (i.e., 2-D or 3-D) FDTD boundary model of a locally reacting surface. An explicit boundary update equation is obtained by carefully constructing a suitable recursive formulation. The method is analyzed in terms of pressure wave reflectance for different wall impedance filters and angles of incidence. Results obtained from numerical experiments confirm the high accuracy of the proposed digital impedance filter boundary model, the reflectance of which closely matches locally reacting surface (LRS) theory. Furthermore, a numerical boundary analysis (NBA) formula is provided as a technique for analytic evaluation of the numerical reflectance of the proposed digital impedance filter boundary formulation.
Winner of the Student Paper Award
Convention Paper 7430 (Purchase now)
P18-6 Loudspeaker Time Alignment Using Live Sound Measurements—Wolfgang Ahnert, Stefan Feistel, Thorsten Maier, Alexandru Radu Miron, Ahnert Feistel Media Group - Berlin, Germany
The authors previously introduced the measurement software EASERA SysTune, which can be used for measurements with live music and speech signals. In this paper we discuss specifically the use of real-time measurements for the time alignment of loudspeaker arrays and distributed systems and for the optimal adjustment of their phase relationships. Being capable of deriving impulse responses of up to 12 seconds length, the measuring process with EASERA SysTune is simpler and more accurate as the real-time function provides a more immediate view on the tuning process. Because measurements can be performed with standard stimulus signals as well as with external speech and music signals, fine-tuning loudspeaker settings becomes possible even during the rehearsal time of the musicians. Required measurement conditions and limitations are given.
Convention Paper 7433 (Purchase now)
P18-7 INR as an Estimator for the Decay Range of Room Acoustic Impulse Responses—Constant Hak, Eindhoven University of Technology - Eindhoven, The Netherlands; Jan Hak, Acoustics Engineering - Boxmeer, The Netherlands; Remy Wenmaekers, Level Acoustics - Eindhoven, The Netherlands
A room acoustic impulse response can be used to derive the reverberation time and other parameters. To this end a certain minimum energy decay range or effective signal to noise ratio is required, which relates to the difference between the integrated signal level and the noise level. An impulse response parameter called INR is presented as an estimator for the decay range and shown to be a useful qualifier in practical measurements.
Convention Paper 7434 (Purchase now)
P18-8 Musical-Inspired Features for Automatic Sound Classification in Digital Hearing Aids—Pedro Vera-Candeas, Francisco J. Cañadas-Quesada, University of Jaén - Linares, Jaén, Spain; Enrique Alexandre, Manuel Rosa, University of Alcalá - Alcalá, Spain
This paper proposes the use of some musical-inspired features for the automatic classification of sounds in digital hearing aids. This kind of application is characterized by very strong constraints in terms of computational complexity. The proposed features are based on fundamental frequency detection and exhibit a low computational complexity while providing good results in terms of probability of correct classification. The performance of the system will be tested using a 1-NN classifier, the goal being to distinguish among speech, noise, and music. For the experiments a sound database, obtained using a hearing aid simulator, will be used.
Convention Paper 7435 (Purchase now)
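A 1-NN classifier of the kind mentioned in the abstract above simply assigns a sound the label of the closest training example in feature space. The sketch below shows that rule with Euclidean distance; the two-dimensional feature vectors and their values are invented for illustration and are not the fundamental-frequency-based features proposed in the paper.

```python
import math


def nearest_neighbour(train, query):
    """Return the label of the training vector closest to `query`."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        d = math.dist(features, query)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Hypothetical training set: (feature vector, class label)
train = [
    ((0.9, 0.1), "speech"),
    ((0.2, 0.8), "music"),
    ((0.1, 0.1), "noise"),
]

label_a = nearest_neighbour(train, (0.8, 0.2))
label_b = nearest_neighbour(train, (0.15, 0.7))
```

The rule needs no training phase, but its cost grows with the number of stored examples, so in a device with tight computational limits the size of the training set matters as much as the cost of the features.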
P18-9 Assessing the Potential Intelligibility of Assistive Audio Systems for the Hard of Hearing and Other Users—Peter Mapp, Peter Mapp Associates - Colchester, Essex, UK
Around 14% of the European population suffer from a noticeable degree of hearing loss and would benefit from some form of hearing assistance or deaf aid. Recent DDA legislation and requirements mean that many more hearing assistive systems are being installed, yet there is evidence to suggest that many of these systems fail to perform adequately and provide the benefit expected. This paper reports on the results of some trial acoustic performance testing of such systems. In particular, system microphone type, distance, and location are shown to have a significant effect on the resultant performance. The potential of using the Sound Transmission Index (STI), and in particular STIPa, for carrying out installation surveys has been investigated, and a number of practical problems are highlighted. The requirements for a suitable acoustic test source to mimic a human talker are discussed, as is the need to adequately assess the effects of both reverberation and noise. The findings discussed in the paper are also relevant to the installation and testing of classroom “sound field” systems, boardroom-type reinforcement systems, and conferencing/teleconferencing systems.
Convention Paper 7436 (Purchase now)
Software, Instrumentation, and Measurement
Monday, May 19, 12:30 — 15:30
Chair: John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada
P19-1 Graphical Control of a Parametric Equalizer—Jörn Loviscach, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany
Graphic equalizers allow the user to define a filter’s magnitude response virtually free of restrictions. Parametric equalizers are much more limited. However, they offer some vital advantages over graphic equalizers, such as consuming less computational power and operating minimally invasively with naturally soft magnitude and phase responses. This paper aims at combining the best of both worlds. It presents a range of methods to control a digital parametric equalizer graphically through a curve or a collection of anchor points. While the user is editing the graphical input, an optimization process runs in the background and adjusts the equalizer’s parameters to reflect the input. In addition, the number of bands and their type (shelving/peak) can be adjusted automatically to produce a simple solution.
Convention Paper 7437 (Purchase now)
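To make the idea of a parametric band concrete, the sketch below evaluates the magnitude response of a single peaking filter, the kind of quantity a background optimizer would compare against the user's drawn curve. It uses the widely circulated "Audio EQ Cookbook" biquad formulas; that choice, the function name, and the parameter values are assumptions, since the abstract does not specify the filter structure or the optimization method.

```python
import cmath
import math

def peak_gain_db(freq_hz, center_hz, gain_db, q, sample_rate):
    """Magnitude (dB) at freq_hz of one peaking biquad (Audio EQ Cookbook form)."""
    a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    # Numerator (b) and denominator (a) coefficients of the biquad.
    b0, b1, b2 = 1.0 + alpha * a, -2.0 * math.cos(w0), 1.0 - alpha * a
    a0, a1, a2 = 1.0 + alpha / a, -2.0 * math.cos(w0), 1.0 - alpha / a
    # Evaluate H(z) on the unit circle at the requested frequency.
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)
    h = (b0 + b1 * z1 + b2 * z1 * z1) / (a0 + a1 * z1 + a2 * z1 * z1)
    return 20.0 * math.log10(abs(h))

# Hypothetical band: +6 dB at 1 kHz, Q = 1, 48 kHz sample rate.
for f in (100.0, 1000.0, 10000.0):
    print(f, round(peak_gain_db(f, 1000.0, 6.0, 1.0, 48000.0), 2))
```

Summing such per-band responses in dB over all bands gives the overall curve that could be fitted to a set of anchor points.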
P19-2 Audio Software Development—An Audio Quality Perspective—Jonas Ekeroot, Jan Berg, Luleå University of Technology - Piteå, Sweden
When developing audio applications, different choices on software implementation aspects influence the total audio software signal path and can be of importance from an audio quality perspective. The field is not well documented in the literature. A study was carried out aiming at identifying relevant questions that must be considered. The general development perspective was on audio software written in C++ to be run on general purpose CPUs. A research review, comprising literature from different fields such as audio engineering, computer science, and software engineering was conducted to summarize and integrate an overview of the field. The result can be viewed as a map of questions for future research activities, consisting of further literature studies and experiments with software prototypes.
Convention Paper 7438 (Purchase now)
P19-3 Multi Carrier Modulator for Switch-Mode Audio Power Amplifiers—Arnold Knott, Harman/Becker Automotive Systems GmbH - Straubing, Germany, and Technical University of Denmark, Lyngby, Denmark; Gerhard Pfaffinger, Harman/Becker Automotive Systems GmbH - Straubing, Germany; Michael A. E. Andersen, Technical University of Denmark - Lyngby, Denmark
While switch-mode audio power amplifiers allow small implementations and high output power levels due to their high power efficiency, they are very well known for creating electromagnetic interference (EMI) with other electronic equipment, in particular radio receivers. Lowering the EMI of switch-mode audio power amplifiers while keeping the performance measures to excellent levels is, therefore, of high general interest. A modulator utilizing multiple carrier signals to generate a two level pulse train will be shown in this paper. The performance of the modulator will be compared in simulation to existing modulation topologies. The lower EMI as well as the preserved audio performance will be shown in simulation as well as measurement results of a prototype.
Convention Paper 7439 (Purchase now)
P19-4 A Comparison of Theoretical, Simulated, and Experimental Results Concerning the Stability of Sigma Delta Modulators—Georgi Tsenov, Valeri Mladenov, Technical University of Sofia - Sofia, Bulgaria; Joshua D. Reiss, Queen Mary, University of London - London, UK
Sigma delta modulation is a popular form of audio analog-to-digital and digital-to-analog conversion, but suffers from stability problems for many designs and many input signals. A general theory of stability in sigma delta modulators has been developed that predicts the stability of a high order one-bit sigma-delta modulator (SDM) under a variety of designs. In this paper the theoretical approach to stability as it applies to boundedness of states is explained. Several low pass SDM designs are developed that are intended for audio analog to digital conversion, and predicted results for stability of these designs are given. Stability is examined both in terms of the maximum allowable DC input amplitude and the theoretical sufficient conditions for stable behavior. Theoretical results are compared with simulated results, and where possible, with experimental results from a realization of a third order SDM with adjustable parameters. Practical observations are then made concerning the effect of noiseshaping, pole/zero placement, and cut-off frequency on the stability.
Convention Paper 7440 (Purchase now)
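The notion of stability as boundedness of the modulator states can be illustrated with a deliberately simple case. The sketch below simulates a first-order one-bit sigma-delta modulator with a DC input; it is not one of the high-order designs studied in the paper, and the function name and input values are assumptions chosen for illustration.

```python
def first_order_sdm(samples):
    """Simulate a first-order one-bit sigma-delta modulator.

    Returns the output bit stream (+1.0 / -1.0) and the integrator state
    after each sample.
    """
    state = 0.0
    bits, states = [], []
    for x in samples:
        y = 1.0 if state >= 0.0 else -1.0   # one-bit quantizer
        state += x - y                      # integrate the quantization error feedback
        bits.append(y)
        states.append(state)
    return bits, states

# Hypothetical DC input of 0.3 (full scale is 1.0).
dc_input = [0.3] * 10000
bits, states = first_order_sdm(dc_input)
print("mean of output bits:", sum(bits) / len(bits))
print("largest |state|:", max(abs(s) for s in states))
```

For this first-order loop the state stays bounded for any DC input with magnitude below full scale, and the average of the bit stream tracks the input. Higher-order loops, which are the subject of the paper, can lose this property at much smaller input amplitudes.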
P19-5 A New Method for Identification of Nonlinear Systems Using MISO Model with Swept-Sine Technique: Application to Loudspeaker Analysis—Antonín Novák, Czech Technical University in Prague - Prague, Czech Republic, and Université du Maine, Le Mans, France; Laurent Simon, Pierrick Lotton, Université du Maine - Le Mans, France; Frantisek Kadlec, Czech Technical University in Prague - Prague, Czech Republic
This paper presents a Multiple Input Single Output (MISO) nonlinear model in combination with sine-sweep signals as a method for nonlinear system identification. The method is used for identification of loudspeaker nonlinearities and can be applied to nonlinearities of any audio components. It extends the method based on nonlinear convolution presented by Farina, providing a nonlinear model that allows simulation of the identified nonlinear system. The MISO model consists of a parallel combination of nonlinear branches containing linear filters and memory-less power-law distortion functions. Once the harmonic distortion components are identified by the method of Farina, the linear filters of the MISO model can be derived. The practical application of the method is demonstrated on a loudspeaker.
Convention Paper 7441 (Purchase now)
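The structure of such a model, parallel branches each combining a memoryless power law with a linear filter, can be sketched as follows. The ordering within a branch (power law first, then an FIR filter), the function names, and the filter taps are assumptions for illustration; the identification of the filters from swept-sine measurements is not shown.

```python
def fir_filter(signal, taps):
    """Causal FIR filtering of `signal` with coefficients `taps`."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out

def miso_power_series(x, branch_filters):
    """Sum of branches: branch m raises x to the power m, then filters it."""
    y = [0.0] * len(x)
    for order, taps in enumerate(branch_filters, start=1):
        powered = [s ** order for s in x]          # memoryless power-law distortion
        filtered = fir_filter(powered, taps)       # linear filter of this branch
        y = [a + b for a, b in zip(y, filtered)]
    return y

# Hypothetical model: unity linear branch plus a weak quadratic branch.
print(miso_power_series([1.0, 0.5, -0.5], [[1.0], [0.1]]))
```

Once the branch filters are known, the same function can simulate the response of the identified system to an arbitrary input, which is the benefit the abstract claims over identifying the harmonic distortion components alone.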
P19-6 Junction Identification Using Acoustic Reflectometry—Adam Kestian, Agnieszka Roginska, New York University - New York, NY, USA
Acoustic reflectometry is a non-invasive, time-domain method of identifying the geometry of an acoustical space. A sound pulse is injected into a space and the resulting impulse response details particular changes of impedance. In the present paper acoustic reflectometry is utilized to identify scattering junctions of geometric spaces. Most notably, the four most common types of scattering junctions are identified: a cross-sectional increase, cross-sectional decrease, L-intersection, and T-intersection.
Convention Paper 7442 (Purchase now)
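For the two simplest junction types, the sign of the reflection already distinguishes them. The sketch below uses the textbook lossless plane-wave relation for a duct whose cross-sectional area changes from S1 to S2, R = (S1 - S2) / (S1 + S2); this relation and the function names are assumptions brought in for illustration and are not the identification method of the paper. L- and T-intersections are not covered.

```python
def reflection_coefficient(area_in, area_out):
    """Pressure reflection coefficient at an area change (lossless plane waves)."""
    return (area_in - area_out) / (area_in + area_out)

def junction_type(area_in, area_out):
    """Classify an area change from the sign of the reflected pulse."""
    r = reflection_coefficient(area_in, area_out)
    if r < 0.0:
        return "cross-sectional increase"   # reflected pulse is inverted
    if r > 0.0:
        return "cross-sectional decrease"   # reflected pulse keeps its polarity
    return "no discontinuity"

# Hypothetical areas in arbitrary units.
print(junction_type(1.0, 2.0), reflection_coefficient(1.0, 2.0))
print(junction_type(2.0, 1.0), reflection_coefficient(2.0, 1.0))
```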
Psychoacoustics, Perception, and Listening Tests
Monday, May 19, 14:00 — 15:30
P20-1 Loss of Subjective Localization Cues in Virtual Acoustic Opening—Elena Blanco-Martín, Francisco Javier Casajus-Quiros, Juan Jose Gomez-Alfageme, L. I. Ortiz-Berenguer, Universidad Politécnica de Madrid - Madrid, Spain
The reproduced sound event quality is very important in a WFS configuration that is used for an acoustic opening. One way of checking the subjective quality perceived by a listener is ITU-R BS.1387, “Method for objective measurements of perceived audio quality,” but this method does not provide information about the listener’s ability to localize the sound. A Matlab application has been implemented (SEL, Sound Event Localization) simulating an acoustic opening configuration. The number of microphones and loudspeakers in the arrays is selectable, as are the sound source position, the gap between array transducers, and the listener position. This simulation has been verified against a real configuration of acoustic opening. Moreover, the loss of localization cues has been analyzed with different multichannel codifications.
Convention Paper 7443 (Purchase now)
P20-2 Effect of Interaural Differences on Loudness of Narrowband Noise Bursts—Toni Hirvonen, Ville Pulkki, Helsinki University of Technology - Espoo, Finland
This paper investigates the effects of interaural time and level differences (ITDs and ILDs, respectively) on loudness. Dichotic samples containing various amounts of interaural differences were compared to a diotic reference. The subjects adjusted the relative threshold gain of the test sample using a two-alternative, forced choice adaptive procedure (2AFC). The test signals were Gaussian noise samples with a bandwidth of one critical band and center frequencies of 150, 600, and 2400 Hz. The results imply that ILD is prominently responsible for changes in directional loudness, which is in agreement with present binaural loudness models that consider only ILD. The experiments revealed significant individual differences between subjects even when matching two identical signals.
Convention Paper 7444 (Purchase now)
P20-3 Perception of Movements of a Focused Sound Generated with a Linear Loudspeaker Array System—Ichiki Manon, Daiki Sato, Tomoaki Tanno, Musashi Institute of Technology - Setagaya-ku, Tokyo, Japan; Kaoru Ashihara, National Institute of Advanced Science and Technology - Tsukuba, Japan; Shogo Kiryu, Musashi Institute of Technology - Setagaya-ku, Tokyo, Japan
A special loudspeaker array system was developed for an experiment on perception of movements of a focused sound. The spatial patterns of the sound pressure level for the focused sounds were measured. The patterns were improved compared to the previous preliminary experiment using commercial devices. A psychoacoustic experiment on perception of movements of the focused sound was conducted using the developed system.
Convention Paper 7445 (Purchase now)
P20-4 Subjective Evaluation for Music Recording Positions in a Coherent Region of a Reverberant Field—Yoshifumi Hara, Kogakuin University - Shinjuku-ku, Tokyo, Japan; Hiroaki Nomura, Kure College of Technology - Kure-city, Hiroshima, Japan; Mikio Tohyama, Waseda University - Shinjuku, Tokyo, Japan; Kazunori Miyoshi, Kogakuin University - Shinjuku-ku, Tokyo, Japan
In this paper we describe the most preferable frequency characteristics of the early reflections for music recording positions. We recorded short passages from two music pieces (Haendel, “Water Music Suite” and Brahms, “Symphony No. 4”) at various distances from a sound source in a coherent region in a reverberation chamber. Subjects evaluated the preference and the subjective loudness through headphones under the diotic condition by paired comparison tests. As a result, we found that the most preferable distance was the distance at which the loudness reached its maximum. The preferable recording condition could also be characterized by narrow-band envelope spectrum analysis.
Convention Paper 7446 (Purchase now)
P20-5 Efficient Individualization of HRTF Using Critical-Band-Based Spectral Cues Control—Yoomi Hur, Yonsei University - Seoul, Korea; Young-cheol Park, Yonsei University - Wonju, Korea; Dae Hee Youn, Yonsei University - Seoul, Korea; Seok-Pil Lee, Korea Electronics Technology Institute - Bundang-Gu, Sungnam-Si, Korea
Recently, 3-D audio technologies have been commonly implemented through headphones. A major problem of headphone-based 3-D audio is in-the-head localization, which occurs due to an inaccurate Head-Related Transfer Function (HRTF). Since individual measurements of HRTFs are impractical, there have been several studies on HRTF customization. In this paper we propose an efficient method of customizing HRTFs. In the proposed method, spectral notches and envelopes are controlled based on a critical-band rate. Thus, the structure of the proposed algorithm is much simpler than that of previous methods, but still effective. The proposed method was evaluated on the problem of externalization, and the results showed that the customized HRTF using the proposed method could greatly improve the externalization performance.
Convention Paper 7447 (Purchase now)
P20-6 How to Widen the Sweet Spot in Monitoring 5.1—Julien Bassères, Patrick Thevenot, Taylor Made System - Nangis, France
Generally speaking, sound reproduction aims to achieve the widest possible sweet spot. This is seldom realized, however, and a restricted sweet spot has become usual and well accepted by the audio community. This paper proposes a new approach to obtaining a wider sweet spot, up to a certain extent, in multichannel reproduction. By optimizing the directivity of each loudspeaker in order to compensate for the position of the listener, this method aims at creating a coherent and homogeneous acoustic field. Special care will be given to the directivity pattern (amplitude and phase) of the loudspeaker system.
Convention Paper 7448 (Purchase now)
P20-7 Auditory Modeling via Frequency Warped Transforms—Alexey Petrovsky, Belarusian State University of Informatics and Radioelectronics - Minsk, Belarus; Marek Parfieniuk, Adam Borowicz, Bialystok Technical University - Bialystok, Poland; Alexander Petrovsky, Belarusian State University of Informatics and Radioelectronics - Minsk, Belarus, and Bialystok Technical University, Bialystok, Poland
The goal of this paper is to show and compare four different versions of auditory modeling based on frequency warped transforms: bark-scaled wavelet packet decomposition, bark-scaled adapted wavelet packet decomposition, warped discrete Fourier transform, and four-band wavelets paraunitary filter bank, useful for perceptual audio coding, speech enhancement, and parametric audio coding matching pursuit procedure based on the psychoacoustic optimized wavelet packet dictionary. Practical implementations of audio signal processing based on the given auditory modeling approaches are considered in detail and analyzed in terms of compression depth, perceptual quality, structural realizability, and suitability for building embedded systems.
Convention Paper 7449 (Purchase now)
P20-8 The Role of Spectral Features in Sound Localization—Daniela Toledo, Henrik Møller, Aalborg University - Aalborg, Denmark
Spectral components of head-related transfer functions (HRTFs) are highly dependent on the anthropometric characteristics of subjects. In the low frequency range, a common structure is often found in HRTFs from different subjects. However, individual differences are seen at high frequencies. In binaural synthesis with non-individual HRTFs, localization errors occur if the spectral characteristics of the directional filters used do not match the individual characteristics of the listener. This investigation is focused on the spectral characteristics of HRTFs that are relevant as localization cues and how to parameterize them. This is done by cross-matching individual and non-individual HRTFs from different subjects according to the results of localization experiments.
Convention Paper 7450 (Purchase now)
P20-9 Multichannel Loudness Listening Test—Ian Dash, Australian Broadcasting Corporation - Sydney, NSW, Australia; Luis Miranda, Densil Cabrera, University of Sydney - Sydney, NSW, Australia
As part of ongoing research for ITU Recommendation BS.1770, “Algorithms to measure audio programme loudness and true-peak audio level,” listening tests were conducted using a standard five-channel geometry in a standard listening room to confirm the channel gains and the spectral weightings for equal loudness contribution. Most ITU-related work to date has used broadcast program as a test signal. In this test, octave band noise was used as a test signal. Twenty-seven listeners participated. Results were analyzed for statistical consistency as well as for average and variance. Agreement between the test results and various broadband loudness models, including ITU-R BS.1770, is examined.
Convention Paper 7451 (Purchase now)
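The channel gains referred to in P20-9 enter the BS.1770 loudness measure as weights on the mean-square energy of each channel. The sketch below shows only that summation step, using the weights given in the Recommendation (1.0 for left, right, and center, 1.41 for the surrounds, LFE excluded). The function names are hypothetical, and the K-weighting pre-filter is omitted, so the inputs are assumed to be already filtered.

```python
import math

# Channel weights from ITU-R BS.1770; the LFE channel is not included.
CHANNEL_GAINS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}

def mean_square(samples):
    """Mean-square value of one (already K-weighted) channel signal."""
    return sum(s * s for s in samples) / len(samples)

def programme_loudness(channels):
    """Loudness in LKFS from a dict mapping channel name to its samples."""
    total = sum(CHANNEL_GAINS[name] * mean_square(sig)
                for name, sig in channels.items())
    return -0.691 + 10.0 * math.log10(total)

# Hypothetical signals: the same square wave in a front and a surround channel.
square = [1.0, -1.0] * 100
print(programme_loudness({"L": square}))
print(programme_loudness({"Ls": square}))
```

The same signal measures about 1.5 dB louder in a surround channel than in a front channel, which is the kind of channel weighting the listening test set out to confirm.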
Monday, May 19, 16:00 — 17:30
P21-1 Challenges in Reproduction and Evaluation of Upmixed Audio in an Automotive Environment—Oliver Hellmuth, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Steffen Bergweiler, Manfred Neumann, Stefan Holzhäuser, Lear Corporation GmbH - Kronach, Germany; Andreas Walther, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Audio systems with high quality sound reproduction capabilities are becoming more and more popular in the car. The need to create a pleasant sound field has led to an increased number of loudspeakers combined with digital signal processing. To benefit from the advantages of surround sound reproduction also for two-channel legacy content, an upmixing algorithm is required. In this paper challenges and requirements for high quality surround sound reproduction and upmixing are first introduced separately and then discussed jointly with the specific focus on the automotive environment. Finally a test method for the evaluation of different upmixing algorithms in the car is suggested.
Convention Paper 7452 (Purchase now)
P21-2 A General Approach to Methods for Loudspeaker Array Synthesis—Juan Miguel Navarro Ruiz, Universidad Católica San Antonio de Murcia - Guadalupe, Murcia, Spain
Loudspeaker arrays are often used as sound reinforcement in large concert halls and outdoor events to provide increased directivity. Contrary to what happens in loudspeaker systems, there is an entrenched theory in antenna array synthesis, which has been used extensively over the past few years. This paper focuses on discussing several consolidated antenna array synthesis methods. Then, simulation software is implemented to show the pros and cons of using them on loudspeaker arrays. Finally, an efficient synthesis method is proposed to achieve the required characteristics.
Convention Paper 7453 (Purchase now)
P21-3 On Large Multiactuator Panels for Wave Field Synthesis Applications—Basilio Pueo, University of Alicante - Alicante, Spain; José Escolano, University of Jaén - Linares (Jaén), Spain; José Javier López, Germán Ramos, Technical University of Valencia - Valencia, Spain
Wave Field Synthesis (WFS) is a spatial sound rendering technique that generates a true sound field using loudspeaker arrays. Multiactuator Panels (MAPs) are an alternative technology to dynamic piston loudspeakers, based on distributed mode operation. Because of their low visual profile and the negligible vibration of the panel, MAPs are very suitable for WFS reproduction. However, the size of current prototypes does not allow their use in real immersive environments in which the loudspeakers must be integrated as walls or as projection screens. In addition, the extra area of a large panel can be used to accommodate extra exciters with which to generate sound fields at another elevation level. In this paper a very large MAP prototype is presented that has been designed and built to fulfill the requirements of immersive audio applications. It represents a step forward in the applications of MAPs for immersive scenarios. The panel size enhances its acoustic behavior in the low frequency range. Also, it can be employed for relatively large projection screens for videoconferencing and for virtual reality.
Convention Paper 7454 (Purchase now)
P21-4 Temporal Changes of Psychological Impressions Regarding Microphone Arrays for Multichannel Recording—Toru Kamekawa, Atsushi Marui, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan
Microphone technique for surround sound recording of an orchestra is discussed. Seven types of surround microphone sets recorded in a concert hall were compared in a subjective listening test on attributes such as powerfulness and spaciousness using a method inspired by MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor). To minimize temporal change in music, Phase Randomized Signal (PRS) was proposed. From the average score of the listening test, an impression difference between the original source and PRS was found for some microphone arrays consisting of directional microphones, for some pieces. This means that the impression of these arrays depends on temporal changes in music. The data from the listening test between the original source and PRS showed that impressions of powerfulness had slightly higher correlation. The relations of the physical factors of each array were also compared, such as SC (Spectral Centroid), LFC (Lateral Fraction Coefficient), and S/O (Side/Omni Ratio) of each array. The correlation of these physical factors and the attribute scores shows that the contribution of these physical factors depends on music and its temporal change.
Convention Paper 7455 (Purchase now)
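Of the physical factors listed in P21-4, the spectral centroid is the simplest to state: the magnitude-weighted mean frequency of the spectrum. The sketch below computes it with a naive DFT. Weighting by magnitude rather than power, the function name, and the test signal are assumptions, since the abstract does not give the exact definition used.

```python
import cmath
import math

def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency (Hz) over DFT bins 0..N/2."""
    n = len(samples)
    weighted = total = 0.0
    for k in range(n // 2 + 1):
        # Naive DFT coefficient for bin k (fine for short illustrative signals).
        coeff = sum(s * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, s in enumerate(samples))
        mag = abs(coeff)
        weighted += (k * sample_rate / n) * mag
        total += mag
    return weighted / total if total else 0.0

# Hypothetical test signal: a 4 Hz sine sampled at 64 Hz for one second.
sine = [math.sin(2.0 * math.pi * 4.0 * i / 64.0) for i in range(64)]
print(spectral_centroid(sine, 64.0))
```

For a pure tone the centroid sits at the tone frequency; adding high-frequency content moves it upward, which is why it is often used as a correlate of perceived brightness.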
Audio Archiving, Storage, Restoration, and Content Management
Tuesday, May 20, 09:00 — 11:00
Chair: Tin Jonker, NOB - Hilversum, The Netherlands
P22-1 Manufacturing Recordings from 100-Year-Old Masters—Sean Davies, S.W. Davies Ltd. - Aylesbury, UK; Rinus Hooning, Record Industry Bv - Haarlem, The Netherlands
Most work on the 78 rpm analog recording format concentrates on pressings made near to the time of the recording and the best ways to retrieve the information from these for future storage and reproduction. However, a considerable number of metal master plates have been preserved from the earliest days to the end of the format’s active period. This paper describes a project to manufacture new pressings from the original plates, the reasons for doing so, and the technical challenges involved.
Convention Paper 7456 (Purchase now)
P22-2 Replay of Digital Original Tapes: Practical Experiences with Video Tape Based PCM Adapters and R-DAT—Nadja Wallaszkovits, Austrian Academy of Sciences - Vienna, Austria; Heinrich Pichler, Audio Consultant - Vienna, Austria; Johannes Spitzbart, Austrian Academy of Sciences - Vienna, Austria
As many of the early digital formats are already obsolete and support of these formats cannot be guaranteed much longer by the manufacturers, archives should presently give priority to the replay of original recordings on such material. Based on a short theoretical discussion and outlining the format-specific characteristics, the paper discusses a variety of practical problems of signal retrieval from PCM (Pulse Code Modulation) encoded signals on a VTR (video tape recorder) and R-DAT (Rotary-Head Digital Audio Tape), such as mechanical problems, tracking problems and playback incompatibilities, data integrity checking, extraction and incompatibility of sub-code-information, pre-emphasis, as well as other problems occurring from irregular recording conditions (typically with field recordings produced on portable devices) or format peculiarity.
Convention Paper 7457 (Purchase now)
P22-3 A New System for File-Based Audio Recording, Preservation, and Access at Indiana University’s Jacobs School of Music—Konrad Strauss, Travis Gregg, Indiana University - Bloomington, IN, USA
The Indiana University Jacobs School of Music has been making live concert recordings on a variety of formats since the 1940s, and we continue to record approximately 500 concerts each year. Recent industry trends and changes in technology have led us to investigate the possibility of creating high-resolution digital files rather than continuing to use physical media as the archival format for our recordings. Our goal was to develop a system for the creation, access, and long-term preservation of high-resolution audio recordings and associated metadata that conformed to emerging standards for digital audio preservation. We began building such a system in July of 2006 and reached full implementation in February of 2007. This paper gives an overview of the development process, presents hardware and software solutions, and discusses workflow and data management issues.
Convention Paper 7458 (Purchase now)
P22-4 A Fast Feature Extraction System on Compressed Audio Data—Tobias Friedrich, Matthias Gruhne, Gerald Schuller, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany
We describe an efficient system that directly extracts features from compressed audio material. It consists of a time/frequency conversion method and a feature extraction algorithm. The conversion method provides the feature extraction algorithm with a suitable complex spectral representation directly from the compressed domain. It further allows a trade-off between computational complexity and conversion accuracy. Several operating points using different conversion accuracies were tested with an MPEG audio identification system in order to evaluate the identification confidence. Based on these results it is possible to reduce the computational complexity from O(N log N) to O(N) compared to the conventional approach (complete decoding followed by a frequency analysis).
Convention Paper 7459 (Purchase now)
Psychoacoustics, Perception, and Listening Tests - 2
Tuesday, May 20, 09:30 — 12:00
Chair: Joerg Bitzer, University of Applied Science Oldenburg - Oldenburg, Germany
P23-1 A Proposed Audio Visual Product Measure—Joe Peters, National University of Singapore - Singapore
The Multimedia Section at the Centre for Instructional Technology at the National University of Singapore has developed an audio visual assessment index (AVAI) to serve as a tool for clients to measure their evaluation of audio and video products. AVAI is based on a listing of indicators and variables that make up the fundamental elements in the capture and processing of AV products (video production): image, color, light, audio, form, aesthetics, and delivery. AVAI is currently being used by professionals for internal evaluation. A series of simulator-based AVAI courses is also underway, the purpose of which is to enable lay persons to understand the indicators and variables through simulated explanations. The thesis is that, in order to keep product value high, the information gap between the producers and the lay clients must be narrowed. The sub-set of this thesis is that this narrowing can be achieved through even a single simulator training session. What is presented in this paper is the conceptual framework and some preliminary tests. The tests are not substantial as studies are slow. AVAI is not a core area of the work of the Multimedia Section handling this study. Nevertheless, it is important to have some response from AES on this preliminary presentation.
Convention Paper 7460 (Purchase now)
P23-2 Nonexistence of Frontal Signal Unmasking from Spatially Wide Masker—Ville Pulkki, Jukka Ahonen, Helsinki University of Technology - Espoo, Finland
The masking of a frontal signal by spatially wide noise sources was investigated in a listening experiment. The noise sources consisted of a single or multiple symmetrically positioned loudspeakers in the frontal horizontal plane in anechoic conditions. It is shown that the detection threshold of the signal does not depend on masker width, which suggests that frontal unmasking does not exist in loudspeaker listening. In additional tests with the signal source positioned to the side, it is shown that moderately small binaural unmasking occurs in that case with a wide masker, and that increasing the width of the masker source decreases the binaural unmasking effect.
Convention Paper 7461 (Purchase now)
P23-3 Reaction Times and Performances in Recognition Tasks to Assess Speech Quality—Virginie Durin, Laetitia Gros, Orange Labs - Lannion, France; Gilles Hericher, Laboratoire Psychologie et Neurosciences de la cognition - Mont Saint Aignan, France
This paper deals with perceptual test methodologies to assess the speech quality of telecommunication systems. Faced with the drawbacks of typical methodologies recommended by ITU-T, a new way to assess speech quality is investigated, by collecting reaction times and performances when subjects are carrying out tasks involving degraded speech signals. A dual task with a digit recognition memory task and a letter recognition task is proposed. Three different quality levels are applied to audio signals describing digits and letters. The results show significant differences in performance and reaction times between the three quality levels.
Convention Paper 7490 (Purchase now)
P23-4 Evaluation of Stereophonic Images with Listening Tests and Model Simulations—Munhum Park, Philip Nelson, University of Southampton - Highfield, Southampton, UK; Kyeong Ok Kang, Electronics and Telecommunications Research Institute (ETRI) - Daejeon, Korea
A binaural hearing model has recently been suggested for the evaluation of the performance of virtual acoustic imaging systems. The model considers excitation-inhibition (EI) cell activity patterns as the internal representation of sound localization cues and a pattern-matching procedure with a frequency-weighting scheme produces the estimate of source location in the horizontal plane. Given the reasonable prediction of some important features in human sound localization and lateralization, this paper presents a further verification and application of the model in actual listening tests. In this paper participants' responses to stereophonic images have been compared with the predictions of the model, individually established from the subject's own HRTF. Model predictions have been found to be both qualitatively and quantitatively consistent with the test results, and in particular, the agreement between 2 and 3 kHz gave a good indication that, unlike some similar models, the current model can effectively incorporate both ITD and ILD information according to their relative importance.
Convention Paper 7463 (Purchase now)
P23-5 The Sound Character Space of Spectrally Distorted Telephone Speech and its Impact on Quality—Marcel Wältermann, Alexander Raake, Sebastian Möller, Berlin University of Technology - Berlin, Germany
Spectral distortions of speech transmitted over a telephone channel may stem from linear channel filtering, codecs, electro-acoustic properties of end-user terminals, or the acoustic environment at the send side. In this paper a study is presented that aims at revealing the perceptual space of spectrally distorted telephone speech and establishing a link to the overall quality of the speech. Two dimensions were identified as relevant for explaining the perceived quality: indirectness and brightness. Whereas brightness is related to the center frequency of a transfer function, indirectness is correlated with the equivalent rectangular bandwidth and constitutes the dominating factor in the perceptual space in terms of covered variance. The concept of the bandwidth impairment factor, which fits into the framework of the so-called E-Model and is based on these simple parameters for computing the integral quality of spectrally distorted speech, could successfully be applied to the given data.
Convention Paper 7464 (Purchase now)
Room and Architectural Acoustics & Sound Reinforcement
Tuesday, May 20, 09:30 — 11:00
P24-1 Objective Evaluation of a Non-Environmental Control Room for 5.1 Surround Listening—Soledad Torres-Guijarro, Laboratorio Oficial de Metroloxía de Galicia (LOMG) - Tecnópole, Ourense, Spain; Antonio Pena, Norberto Degara-Quintela, Universidad de Vigo - Vigo, Spain
The control room of the Universidad de Vigo was built for the purpose of assessing small audio artifacts, such as listening analytically to coded material with different data rates. It follows a non-environment design that minimizes the influence of the room. The use of such a room as a 5.1 surround listening room will be analyzed according to international recommendations. This research includes the study of the electro-acoustic behavior of loudspeakers, geometric and acoustic properties of the room, and sound field conditions. A discussion of some divergences and implications for its use when performing surround listening tests follows the measurement results.
Convention Paper 7465 (Purchase now)
P24-2 A Case Study of Sound Reproduction and Acoustic Enhancement in Concert Halls Using Wave Field Synthesis—Clemens Kuhn-Rahloff, Matthias Rosenthal, Max Casdorff, Roger Moser, sonic emotion ag - Oberglatt (Zurich), Switzerland
This paper presents the wave field synthesis system under construction at the National Conservatory of Music Detmold (Germany). The system is dedicated to sound reproduction for artistic purposes at the Tonmeister department (Erich Thienhaus Institute) and to an enhancement of room acoustics. The system comprises 346 independent loudspeaker channels, including a horizontal loudspeaker array all around the auditorium (500 seats) and ceiling loudspeakers. Since the hall is used for a broad repertoire comprising chamber music, romantic orchestra instrumentations, organ concerts, contemporary music, etc., the hall will be equipped with a variable room acoustic system. The paper presents perceptual aspects of system design concerning the direct sound and diffuse field as well as practical implementations for WFS rendering.
Convention Paper 7466 (Purchase now)
P24-3 Small Studios with Gypsum Board Sound Insulation: A Review of Their Room Acoustics, Details at the Low Frequencies—Lorenzo Rizzi, Francesco Nastasi, Rizzi Acustica - Lecco LC, Italy
At present, most music preproduction and production is carried out in very small, privately owned rooms called “project studios.” Gypsum board technology is very common in the construction of these rooms because of its high insulation capabilities relative to its low monetary and time costs. The paper discusses sweet spot impulse response measurements carried out in three different but acoustically small rooms built with gypsum board sound insulating structures, comparing them to a masonry-built one. The room modal behavior is underlined, continuing with the analysis of decay over time at low frequencies related to insights on perception and analysis. A different methodology of study is proposed.
Convention Paper 7467 (Purchase now)
P24-4 On the Measurement of Electro Acoustic Enhanced Sound Fields—Florian Walter, Mueller-BBM GmbH - Munich, Germany; Frank Melchior, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany
The installation and optimization of acoustic enhancement systems require a large amount of experience. Verification by measurement is usually done using conventional reverberation and acoustic parameter measurements according to ISO 3382. This is a good solution for diffuse sound analysis and general examination of early reflections, but for direction-dependent analysis the results are not satisfying. In this paper a room equipped with an acoustic enhancement system was measured using a circular array. The effects of adding specific early reflections and direction-dependent diffuse energy generated by the acoustic enhancement system are investigated. The results are compared to standard measurements according to ISO 3382.
Convention Paper 7468 (Purchase now)
P24-5 Applying Cochlear Modeling and Psychoacoustics in Room Acoustics—Jasper van Dorp Schuitman, Diemer de Vries, Delft University of Technology - Delft, The Netherlands
The acoustical qualities of a concert hall or any other room are generally expressed using acoustical parameters. These parameters are determined from impulse responses, as measured from single positions in a room or along a line array. However, from array measurements it turned out that parameters can fluctuate severely between small distance steps, something which does not agree with human perception. Applying cochlear modeling and psychoacoustics in this process seems a promising technique to reach results that do not suffer from these fluctuations and thus are much closer to human perception compared with conventional techniques.
Convention Paper 7469 (Purchase now)
P24-6 Empirical Evaluation of the Frequency-Dependent Boundary Conditions in a Digital Waveguide Mesh—José Escolano, University of Jaén - Linares, Jaén, Spain; Basilio Pueo, University of Alicante - Alicante, Spain; José J. López, Máximo Cobos, Technical University of Valencia - Valencia, Spain
The digital waveguide mesh is a popular method for time domain simulation of acoustic systems such as rooms. One of the main reasons to choose this paradigm lies in the ease of including boundary conditions in the simulation. This paper focuses on the comparison of the simulation with real-world measurements, where a particular scenario is physically built and the corresponding simulation, according to its physical parameters, is carried out. The main scope of this paper is the validation and discussion of a boundary condition model and its correspondence with the measurements through an example.
Convention Paper 7470 (Purchase now)
P24-7 Subjective Effects of Dispersion in the Simulation of Room Acoustics Using Digital Waveguide Mesh—Jose J. Lopez, Technical University of Valencia - Valencia, Spain; José Escolano, University of Jaén - Linares (Jaén), Spain; Maximo Cobos, Technical University of Valencia - Valencia, Spain; Basilio Pueo, University of Alicante - Alicante, Spain
The simulation of room acoustics using the Digital Waveguide Mesh method has gained interest in the last few years. One of the problems of this method is the frequency- and angle-dependent dispersion. In order to reduce this effect, oversampling is usually employed, but at the price of a much higher computational cost and a restriction of the simulation to lower frequencies. In this paper a subjective analysis is carried out in which voice band simulations with different oversampling factors have been performed and evaluated by a set of listeners. Listening tests employing ABX methodology have been used to evaluate the subjective effects, obtaining some preliminary results that, although not conclusive, represent a first approach to the problem.
Convention Paper 7471 (Purchase now)
Tuesday, May 20, 11:30 — 16:00
Chair: Günther Theile, Institut für Rundfunktechnik - Munich, Germany
P25-1 Bitstream Format for Spatio-Temporal Wave Field Coder—Francisco Pinto, Martin Vetterli, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland
We present a non-parametric method for compressing multichannel audio data for reproduction through Wave Field Synthesis. The method consists of applying a two-dimensional filterbank to the input multichannel signal, in both time and channel dimensions, and coding the two-dimensional spectra using a spatio-temporal frequency masking model. The coded spectral data is organized into a bitstream together with side information containing scale factors and Huffman codebook information. We demonstrate how this coding method can be applied to any smooth distribution of loudspeakers in space, while obtaining a stable bit rate that is 15% lower compared to coding each channel independently.
Convention Paper 7472 (Purchase now)
P25-2 The Design of Ambisonic Decoders for the ITU 5.1 Layout with Even Performance Characteristics—David Moore, Jonathan Wakefield, University of Huddersfield - Huddersfield, West Yorkshire, UK
All previously published Ambisonic decoders for irregular loudspeaker layouts have localization performance that varies significantly by angle around the listener. This contrasts with decoders designed for evenly spaced arrangements of loudspeakers where performance characteristics are isotropic. Furthermore, even localization performance around the listener is desirable for a number of application areas of 5.1 surround sound. New decoder design criteria are presented that aim to reduce this variation in localization performance. These criteria are added to a multi-objective fitness function, based on auditory localization theory, which guides a heuristic search algorithm to derive decoder parameter sets for the ITU5.1 layout. The derived decoders exhibit a significant improvement in localization performance variation by angle around the 360-degree sound stage.
Convention Paper 7473 (Purchase now)
P25-3 Methods for Sharing Stereo and Multichannel Recordings among Planetariums—Leslie Gaston, Peter Dougall, Erick D. Thompson, University of Colorado at Denver - Denver, CO, USA
There is a demand for research on the transferability of surround sound audio from one planetarium to another, so that (1) audiences have similar experiences and (2) audio engineers can easily create this experience. This paper will consider: acoustics, production, delivery, equipment, and seating arrangements. Our recent survey of over 100 planetariums worldwide in the fall of 2007 will provide a look at current practices. The University of Colorado Denver and Gates Planetarium have collaborated in order to explore the potential of current audio technology, and to discover what similarities and differences exist between planetariums in order to achieve this goal of transferability.
Convention Paper 7474 (Purchase now)
P25-4 Optimal Hierarchical Bandwidth Limitation of Surround Sound—Yu Jiao, Slawomir Zielinski, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
In order to save the transmission bandwidth of surround sound, a technique named Hierarchical Bandwidth Limitation (HBL) was proposed by the authors. In HBL, a psychoacoustically hierarchical transform is used as the preprocessing algorithm prior to bandwidth limitation. In our former experiments we found that the Karhunen-Loève transform (KLT) is a suitable hierarchical transform for HBL. Besides the hierarchical transform, the choice of an appropriate strategy for bandwidth allocation is also essential from the point of view of the resultant audio quality. In order to find the optimal bandwidth allocation strategy that achieves the best audio quality, the authors attempted to establish the mathematical relationship between audio quality and the bandwidth allocation strategy using a MUSHRA listening test. The experiment design and results of this listening test are reported in this paper.
Convention Paper 7475 (Purchase now)
P25-5 Frequency-Dependent Signal-Correlation in Surround- and Stereo-Microphone Systems and the Blumlein-Pfanzagl-Triple (BPT)—Edwin Pfanzagl-Cardone, Salzburg Festival - Salzburg, Austria; Robert Höldrich, Institute of Electronic Music and Acoustics - Graz, Austria
With the aim of recreating the original concert-hall sound field as faithfully as possible in the control- or living-room, recordings were made simultaneously with an artificial head and several surround microphone techniques (among them the new BPT method). The surround recordings were rerecorded using the same dummy-head as in the concert hall. The results of subjective listening tests (loudspeaker as well as binaural) were assessed using ANOVA and correlation analysis. Acoustical analysis of the dummy-head recordings was performed by measuring the Frequency-Dependent Inter Aural Cross-Correlation Coefficient (FIACC): the low-correlation AB-PC microphone system was capable of reproducing the original sound field better than any of the other systems under test (DECCA, KFM, OCT). A microphone system's Critical Frequency, below which correlation rises toward 1, is defined.
Convention Paper 7476 (Purchase now)
P25-6 Holographic Design of Source Array for Achieving a Desired Sound Field—Wan-Ho Cho, Jeong-Guon Ih, Korea Advanced Institute of Science and Technology (KAIST) - Daejeon, Korea; Marinus M. Boone, Delft University of Technology - Delft, The Netherlands
For realizing a desired complicated sound field, an acoustic source array should be designed appropriately to obtain the acoustic source parameters. To this end, we suggest a method utilizing the acoustical holography technique based on the inverse boundary element method. Acoustical analogy between the problems of source reconstruction and source design was the initial motivation of the study. In the design of the source array, the pressure distribution at specific field points is the constraint of the problem and the signal distribution at the source surface points is the object function of the problem. The whole procedure of the application consists of three stages. First, a condition of the desired sound field should be set as the constraint. Second, the geometry and boundary condition of the source array system and the target field, i.e., points in the sound field of concern, are modeled by the boundary elements. Actual characteristics of source and space can be considered to generate the accurate condition of the target field. Finally, the source parameters are inversely calculated by the backward projection. As an example, a source array to fulfill the plane wave propagating zone and another quiet zone near the propagation zone was designed and tested by simulation and measurement.
Convention Paper 7477 (Purchase now)
P25-7 New Dimensions for Ambisonics—Michael Chapman - Culoz, France
Both two-dimensional (pantophonic) and three-dimensional (periphonic) representations of soundfields are commonplace in ambisonics. Reproducing either on rigs essentially designed for the other is commonplace. What though if one synthesizes a four (or more) dimensional soundfield and reproduces this on a standard rig? As there appears to be no source on hyperspherical harmonics applicable to ambisonics, the mathematical basis is first set out. The manipulation of hyperambisonic soundfields (rotation, mirroring, dominance) is then discussed. During that discussion various “proofs” are advanced as to the finite range of transformations that can be applied to ambisonic soundfields, of whatever dimension.
Convention Paper 7478 (Purchase now)
P25-8 Improving Spherical Microphone Arrays—Nicolas Epain, Jérome Daniel, France Télécom R&D - Lannion, France
Spherical microphone arrays are useful for numerous applications, such as spatial audio capture and beamforming. However, these sensor arrays are known to have a limited frequency range, due to poor directivity at low frequencies and spatial aliasing at high frequencies. In this paper we study two methods aiming at enhancing the frequency range of spherical microphone arrays without using more sensors. First, the benefit of locating the sensors at the end of cavities within the sphere is assessed through measurements and simulations. Second, we study the influence of using large membrane microphones. Finally, results show that the frequency range could be increased in both cases studied.
Convention Paper 7479 (Purchase now)
P25-9 Migration of 5.0 Multichannel Microphone Array Design to Higher Order MMAD (6.0, 7.0, and 8.0) with or without the Inter-Format Compatibility Criteria—Michael Williams, Sounds of Scotland - Paris, France
The severe limitations of the 5.0 Multichannel Reproduction Standard in delivering good quality audio-visual or stand-alone audio surround sound reproduction have increased the pressure on recording and reproduction system designers to increase the number of channels in an attempt to give an even more satisfactory envelopment experience. This paper extends the MMAD process to show how higher order channel array designs (6.0, 7.0, and 8.0) can be developed from the existing data on 4.0 or 5.0 Multichannel Front Sound Stage Coverage Array Designs with almost perfectly seamless and linear surround sound reproduction. Designing for inter-format compatibility can also be accommodated from the existing multi-format array design data described in a previous paper on Multichannel Arrays Generating Inter-format Compatibility (MAGIC arrays).
Convention Paper 7480 (Purchase now)
Low Bit-Rate Audio Coding
Tuesday, May 20, 11:30 — 13:00
P26-1 Autoregressive Modeling of Hilbert Envelopes for Wide-Band Audio Coding—Sriram Ganapathy, IDIAP Research Institute - Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Petr Motlicek, IDIAP Research Institute - Martigny, Switzerland; Hynek Hermansky, IDIAP Research Institute - Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Harinath Garudadri, Qualcomm Inc. - San Diego, CA, USA
Frequency Domain Linear Prediction (FDLP) is a technique for approximating temporal envelopes of a signal using autoregressive models. In this paper we propose a wide-band audio coding system exploiting FDLP. Specifically, FDLP is applied on critically sampled sub-bands to model the Hilbert envelopes. The residual of the linear prediction forms the Hilbert carrier, which is transmitted along with the envelope parameters. This process is reversed at the decoder to reconstruct the signal. In the objective and subjective quality evaluations, the FDLP-based audio codec at 66 kbps provides competitive results compared to the state-of-the-art codecs at similar bit-rates.
Convention Paper 7481 (Purchase now)
P26-2 On Locality of Spectral Oriented Tree for Bit-Plane Based Low-Bit Rate Audio Coding—Yu-Lin Wang, Alvin W. Y. Su, National Cheng-Kung University - Tainan, Taiwan
For Spectral Oriented Tree (SOT) based coders such as SPIHT and CEIHT, locality usually refers to the locations of coefficients within a SOT and their effect on coding efficiency. How to construct a SOT to achieve better locality is very important. This paper presents a diagnostic view of the localities of different ordering techniques for low bit-rate audio coding. We used several coefficient ordering schemes to construct SOTs with the same set of MDCT coefficients and observed their effects. Both objective and subjective results are presented.
Convention Paper 7482 (Purchase now)
P26-3 Perceptual Matching Pursuit for Audio Coding—Hossein Najaf-Zadeh, Ramin Pichevar, Hassan Lahdili, Louis Thibault, Communications Research Centre Canada - Ottawa, Ontario, Canada
This paper introduces a Perceptual Matching Pursuit (PMP) algorithm for audio coding. A masking model has been developed and integrated into the matching pursuit algorithm to account for the characteristics of the hearing system. By doing so, only an audible kernel is extracted at each iteration. Moreover, contrary to the matching pursuit algorithm, PMP will stop decomposing an audio signal once there is no audible part left in the residual. We have used ITU-R PEAQ to compare audio materials decomposed by PMP and by matching pursuit. Objective scores for PMP increase by up to 1 unit. A semi-formal listening test has verified the objective scores and shown the perceptual superiority of PMP over the matching pursuit algorithm.
Convention Paper 7483 (Purchase now)
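As a rough illustration of the stopping rule described in the abstract above, the following Python sketch runs a plain matching pursuit and a variant that halts once the strongest remaining component falls below a threshold. The fixed `audible_threshold` is only a stand-in for the authors' masking model, and the function name, toy dictionary, and signal values are assumptions made for this example.

```python
def matching_pursuit(signal, atoms, audible_threshold=0.0, max_iter=50):
    """Greedy decomposition of `signal` over unit-norm `atoms`.

    Stops early once the best remaining correlation is at or below
    `audible_threshold` (an assumed stand-in for a masking model).
    """
    residual = list(signal)
    picks = []                                   # (atom index, coefficient)
    for _ in range(max_iter):
        corrs = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(corrs[i]))
        if abs(corrs[best]) <= audible_threshold:
            break                                # nothing "audible" is left
        picks.append((best, corrs[best]))
        residual = [r - corrs[best] * a for r, a in zip(residual, atoms[best])]
    return picks, residual


# Toy orthonormal dictionary: the four standard basis vectors.
atoms = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
signal = [3.0, 0.05, -2.0, 0.01]

plain, _ = matching_pursuit(signal, atoms, audible_threshold=0.0)
perceptual, _ = matching_pursuit(signal, atoms, audible_threshold=0.1)
```

With these toy values the plain pursuit extracts all four components, while the thresholded variant keeps only the two large ones and leaves the small remainder in the residual.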
P26-4 A Unifying Approach to Transform and Sinusoidal Coding of Audio—Maciej Bartkowiak, Poznan University of Technology - Poznan, Poland
The paper describes a new scenario for low bit rate audio compression that combines two classical techniques: transform coding and sinusoidal coding into a united framework. The main idea is to adaptively decompose the audio signal into subbands whose central frequencies follow continuously the local instantaneous frequencies of certain signal components (formants or individual harmonic partials). The content in each subband is encoded in the baseband after frequency shift toward DC. The technique may be considered either as modified transform coding, i.e., coding along instantaneous frequencies or as extended sinusoidal coding, i.e., modeling with partial envelopes that are represented by transform coefficients. In other words, it is a hybrid scheme offering a continuous operating mode between purely transform and purely sinusoidal compression.
Convention Paper 7484 (Purchase now)
P26-5 Low Bit Rate Audio Coding for Digital Wireless Systems—Stephen Wray, APT Ltd. - Belfast, Northern Ireland, UK
With the transition from analog to digital television, the available spectrum for wireless microphones, in-ear monitors, and other wireless devices could be under threat. Spectrum is a valuable commodity, and it is the responsibility of governments to manage it appropriately. Much has been made recently of the Spectrum Squeeze on both sides of the Atlantic with discussions on White Spaces and the Digital Dividend. With bandwidth at such a premium the audio industry has been forced to consider new technologies that make efficient use of spectrum without sacrificing quality or service. Within this context, we need a revolutionary approach to maximizing bandwidth efficiency. The author will present a novel coding solution to overcome the prevailing technical limitations and meet industry requirements for wireless applications.
Convention Paper 7485 (Purchase now)
P26-6 Bit Allocation for Linear Prediction Coefficients with Application to Lossless Audio Compression—Florin Ghido, Ioan Tabus, Tampere University of Technology - Tampere, Finland
We propose a novel technique of using bit allocation for linear prediction coefficients in asymmetric lossless audio compression. We show how to determine the optimal bit allocation using a new closed-form formula for the excess error from quantization, and describe a recently introduced algorithm (Optimization-Quantization Least Squares), which computes the optimal quantized prediction coefficients applied for the allocation. The proposed method, implemented as a modified asymmetrical OptimFROG, obtains small (but consistent) signal dependent compression improvements with virtually no decoder complexity increase (on an 847 MB audio corpus, up to 0.27%, on average 0.06%). Compared to MPEG-4 ALS, it obtained 0.38% better compression, while being at the same time approximately 5 times faster at decoding.
Convention Paper 7486 (Purchase now)
P26-7 Design of Framing in MPEG Surround Based on Dynamic Programming Algorithm—Chi-Min Liu, Chung-Han Yang, Han-Wen Hsu, Wen-Chieh Lee, National Chiao Tung University - Hsinchu, Taiwan
MPEG Surround (MPS), defined by ISO/IEC, is the audio coding standard for multichannel signals based on a down-mixed signal and spatial parameters. In MPEG Surround, the time-frequency tiles determine the units that share the same spatial parameters among the multichannel signals. Hence, the choice of tiles is the critical module determining the resulting quality and bit count. However, the large number of combinations of time regions, frequency bands, and multichannel signal statistics spans a huge search space for choosing the tiles. Our previous work at AES 119 proposed a dynamic programming method to efficiently decide the time-frequency units for parametric stereo coding in HE-AAC. This paper extends the dynamic programming method to MPS coding.
Convention Paper 7487 (Purchase now)
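To give a feel for how dynamic programming can tame a framing search of this kind, here is a minimal Python sketch. It is not the authors' algorithm: it assumes a one-dimensional sequence of parameter values, models distortion as the squared error when a segment shares one value, and charges a fixed hypothetical `bits_penalty` per segment.

```python
def segment_cost(values, i, j):
    """Squared error when values[i:j] share a single parameter (their mean)."""
    chunk = values[i:j]
    mean = sum(chunk) / len(chunk)
    return sum((v - mean) ** 2 for v in chunk)


def best_framing(values, bits_penalty):
    """Return (total cost, segments) minimizing distortion plus a fixed
    penalty per segment, by dynamic programming over segment end points."""
    n = len(values)
    best = [0.0] + [float("inf")] * n     # best[j]: optimal cost of values[:j]
    prev = [0] * (n + 1)                  # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + segment_cost(values, i, j) + bits_penalty
            if c < best[j]:
                best[j], prev[j] = c, i
    bounds, j = [], n
    while j > 0:                          # walk back through the choices
        bounds.append((prev[j], j))
        j = prev[j]
    return best[n], bounds[::-1]


# Illustrative parameter track with three clearly distinct regions.
params = [1.0, 1.0, 1.0, 5.0, 5.0, 2.0, 2.0, 2.0]
cost, frames = best_framing(params, bits_penalty=0.5)
```

The table `best` reuses the optimal cost of every prefix, so the search costs on the order of n squared segment evaluations instead of enumerating every possible partition.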
P26-8 New Enhancements to the Audio Bandwidth Extension Toolkit (ABET)—Harinarayanan E. V., Raghuram Annadana, ATC Labs - Noida, India; Deepen Sinha, ATC Labs - Chatham, NJ, USA; Anibal Ferreira, University of Porto - Porto, Portugal, and ATC Labs, Chatham, NJ, USA
Audio bandwidth extension has emerged as a key low bit rate coding tool. Continuing our ongoing research on audio bandwidth extension, this paper presents new enhancements to the Audio Bandwidth Extension Toolkit (ABET). ABET consists of three primary tools: Accurate Spectral Replacement (ASR), Fractal Self Similarity Model (FSSM), and Multi-band Temporal Envelope Amplitude Coding (MBTAC). Additionally, we have introduced a blind bandwidth extension mode into ABET. We discuss several new ideas and improvements to ABET. Specifically, enhancements to the blind bandwidth extension architecture that allow it to work with signals with only 3.5–4.0 kHz audio bandwidth are described. We also elaborate on a new tool for efficient coding of the time-frequency envelope that cuts the overhead by 0.75–1.0 kbps/channel. Finally, we address a practical issue, the computational complexity, and describe a new low decoder complexity mode of ABET.
Convention Paper 7488 (Purchase now)
Psychoacoustics, Perception, and Listening Tests - 3
Tuesday, May 20, 13:00 — 16:00
Chair: John Beerends, TNO Information and Communication Technology - Delft, The Netherlands
P27-1 Perceptual Evaluation of Numerically Simulated Head-Related Transfer Functions—Julia Turku, Miikka Vilermo, Eira Seppälä, Monika Pölönen, Ole Kirkeby, Asta Kärkkäinen, Leo Kärkkäinen, Nokia Research Center - Helsinki, Finland
Head-related transfer functions (HRTFs) produced by numerical simulations were compared to measured HRTFs through two listening tests. The purpose was to determine whether the numerically simulated HRTFs, which do not contain any of the artifacts associated with acoustic measurements, capture the detail necessary for reproducing convincing 3-D sound. The results suggest that when virtual sound sources are presented to listeners binaurally over headphones, the measured and modeled HRTF sets perform equally well in terms of perception of direction. Regarding preference of binauralization methods, the simulated HRTFs performed slightly better.
Convention Paper 7489 (Purchase now)
P27-2 Evaluating Perception of Salient Frequencies: Do Mixing Engineers Hear the Same Thing?—Joerg Bitzer, University of Applied Science Oldenburg - Oldenburg, Germany; Jay LeBoeuf, Imagine Research, Inc. - San Francisco, CA, USA; Uwe Simmer, University of Applied Science Oldenburg - Oldenburg, Germany
In this paper we analyze the agreement of mixing engineers when finding salient frequencies in recorded audio tracks. Twenty-two mixing engineers were asked to use an equalizer with a high-Q and high-gain setting. Using this tool to sweep through the files’ frequencies, they analyzed sixteen audio tracks and reported the most perceptually salient frequencies. The results show that the agreement depends on the analysis bandwidth. Most mixing engineers agree when the matching bandwidth is wide. However, only a few engineers agree if the matching bandwidth is below or equal to one-third octave. In this paper we try to explain these results and give a detailed analysis.
Convention Paper 7462 (Purchase now)
P27-3 Influence of Visual Appearance on Loudspeaker Sound Quality Evaluation—Alex Karandreas, Flemming Christensen, Aalborg University - Aalborg, Denmark
Product sound quality evaluation aims to identify relevant attributes and assess their influence on the overall auditory impression. Extending this sound specific rationale, the present paper evaluates overall impression in relation to audition and vision, specifically for loudspeakers. In order to quantify the bias that the loudspeaker appearance has on the sound quality evaluation of a naive listening panel, audio stimuli of varied degradation are coupled with actual loudspeakers of different visual appearance.
Convention Paper 7491 (Purchase now)
P27-4 Comparison of Loudspeaker/Room Equalization Preferences for Multichannel, Stereo, and Mono Reproductions: Are Listeners More Discriminating in Mono?—Sean Olive, Sean Hess, Allan Devantier, Harman International Industries, Inc. - Northridge, CA, USA
Automated digital loudspeaker/room correction products are more popular than ever despite the general lack of perceptual studies on their performance measured over a range of different playback conditions. This paper describes the first of several perceptual experiments designed to explore how different loudspeaker-room correction methods affect the sound quality of reproduction given a range of different listening rooms, loudspeakers, setups, and programs that might influence their perceived performance. A panel of trained listeners gave comparative preference ratings for three different loudspeaker equalizations evaluated in a semi-reflective room using three multichannel music recordings reproduced in surround, stereo, and mono playback modes. The three equalizations were based on either anechoic or in-room measurements with different perceptual weighting given to the direct versus the direct + reflected sounds radiated by the loudspeaker. The different equalizations were identical below 400 Hz to focus on perceptual effects occurring above the room’s transition frequency. The results are summarized as follows: all three equalizations were equally preferred over the unequalized system; the difference in preference increased monotonically as the number of playback channels was reduced from 5 (surround) to 1 (mono).
Convention Paper 7492 (Purchase now)
P27-5 Caution and Warning Alarm Design and Evaluation for NASA CEV Auditory Displays—Durand Begault, NASA Ames Research Center - Moffett Field, CA, USA; Martine Godfroy, San Jose State University Foundation, NASA Ames Research Center - Moffett Field, CA, USA; Aniko Sandor, LZ Technology, NASA Johnson Space Center - Houston, TX, USA; Kritina Holden, Lockheed Martin Corporation, NASA Johnson Space Center - Houston, TX, USA
The design of caution-warning signals for NASA’s Crew Exploration Vehicle (CEV) and other future spacecraft will be based on both best practices based on current research and evaluation of current alarms. A design approach is presented based upon cross-disciplinary examination of psychoacoustic research, human factors experience, aerospace practices, and acoustical engineering requirements. A listening test with thirteen participants was performed involving ranking and grading of current and newly developed caution-warning stimuli under three conditions: (1) alarm levels adjusted for compliance with ISO 7731, "Danger signals for work places—Auditory Danger Signals;" (2) alarm levels adjusted to an overall 15 dBA s/n ratio; and (3) simulated codec low-pass filtering. The resulting analyses include determination of sounds that were judged as inappropriate, independent of condition.
Convention Paper 7493 (Purchase now)
P27-6 Loudness Calculation for Individual Acoustical Objects within Complex Temporally Variable Sounds—Cornelius Bradter, Klaus Hobohm, Hochschule für Film und Fernsehen - Potsdam, Germany
Models used for loudness calculation normally treat their input signal as an integral whole. For sounds consisting of two or more distinguishable acoustical objects this contradicts the listening experience. Auditory perception analyzes and identifies acoustical objects and may treat them differently. By expanding principles used in excitation synthesis-based loudness models, we developed a procedure to calculate loudness of a time-varying acoustical object while a second object is simultaneously present. When signals of both objects are available individually and in combination, the procedure reflects effects of one object on the other as well as changes of loudness perception due to signal features of one or both objects.
Convention Paper 7494 (Purchase now)
Software, Instrumentation, and Measurement
Tuesday, May 20, 13:30 — 15:00
P28-1 An Anatomy of Graph-Based User Interfaces for Media Processing—Christopher Schultz, Universität Bremen - Bremen, Germany, now at mediaclipping, Bremen, Germany; Jörn Loviscach, Hochschule Bremen - Bremen, Germany; Shailendra Mathur, Softimage Corp., Avid Technology, Inc. - Montreal, Quebec, Canada; Jay LeBoeuf, Digidesign - Daly City, CA, USA, now at Imagine Research, Inc., San Francisco, CA, USA
Graph-based user interfaces are employed in a variety of software such as audio synthesizers, video compositing tools, and database application builders. All of these uses afford the graphical metaphor of a graph: “Nodes” such as sound generators or filters are tied together by “links,” which may represent signal flow or conceptual relations. Focusing on media production tools, we have examined a large range of current software products to find out which de-facto standards have evolved in the field of graph-based interfaces and which features can be considered unique. We categorize a multitude of interface concepts employed in actual graph-based interfaces and describe differences in their implementation. The findings provide guidelines for developers of media production software.
Convention Paper 7495 (Purchase now)
P28-2 A Framework for Automatic Mixing Using Timbral Similarity Measures and Genetic Optimization—Bennett Kolasinski, New York University - New York, NY, USA
A novel method is introduced for automatic mix recreation using timbral classification techniques and an optimization algorithm. This approach uses the Euclidean distance between modified Spectral Histograms to calculate the distance between a mix and a target sound and uses a genetic optimization algorithm to find the best coefficients for that mix. The implementation has been shown to recreate multitrack mixes accurately and may pave the way toward the automatic mixing of novel multitrack sessions based on a desired target sound.
Convention Paper 7496 (Purchase now)
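The combination of a Euclidean distance measure and a genetic search mentioned in the abstract above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: each track is reduced to a short hypothetical feature vector, the mix's features are modeled as a gain-weighted sum, and the selection, crossover, and mutation settings are arbitrary choices rather than the paper's.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mix_features(gains, tracks):
    """Feature vector of a mix, modeled here as a gain-weighted sum."""
    return [sum(g * t[k] for g, t in zip(gains, tracks))
            for k in range(len(tracks[0]))]


def evolve_gains(tracks, target, pop_size=30, generations=60, seed=0):
    """Search for per-track gains whose mix features approach `target`."""
    rng = random.Random(seed)

    def cost(gains):
        return distance(mix_features(gains, tracks), target)

    pop = [[rng.uniform(0.0, 1.0) for _ in tracks] for _ in range(pop_size)]
    history = []                                  # best cost per generation
    for _ in range(generations):
        pop.sort(key=cost)
        history.append(cost(pop[0]))
        parents = pop[: pop_size // 2]            # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]   # crossover
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.05)))
                     for g in child]                           # mutation
            children.append(child)
        pop = parents + children
    pop.sort(key=cost)
    history.append(cost(pop[0]))
    return pop[0], history


# Hypothetical three-track session with three-bin feature vectors.
tracks = [[1.0, 0.2, 0.0], [0.1, 1.0, 0.3], [0.0, 0.4, 1.0]]
target = mix_features([0.8, 0.2, 0.5], tracks)
gains, history = evolve_gains(tracks, target)
```

Because the fitter half of each generation survives unchanged, the best distance recorded in `history` never increases from one generation to the next.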
P28-3 Delta-Sigma DAC Topologies for Improved Jitter Performance—Ivar Løkken, Anders Vinje, Trond Sæther, Norwegian University of Science and Technology - Trondheim, Norway
Specifications for audio digital-to-analog-converters (DACs) place requirements on the analog circuit design that contradict physical design conditions in a modern, digital-oriented system on a chip process. Because of low supply voltages, use of current-steering DACs has become the dominant choice for high resolution applications. Fed by a delta-sigma modulator that requantizes the digital signal to a manageable number of bits, the current-steering DAC is a continuous time type converter without any discrete time filtering. This makes it very susceptible to sampling clock jitter. In this paper jitter distortion is addressed at a topology level, investigating design choices for the delta-sigma requantizer and the possible use of semidigital multi-bit current-steering filter DACs to reduce problems with jitter susceptibility.
Convention Paper 7497 (Purchase now)
P28-4 New Measurement Methods for Anechoic Chamber Characterization—Juan Gómez-Alfageme, José Luis Sánchez-Bote, Elena Blanco-Martín, Universidad Politécnica de Madrid - Madrid, Spain
As a continuation of the work presented at the 122nd AES Convention (Paper 7153), this paper studies the qualification of anechoic chambers in depth. Its purpose is to find parameters that allow this type of enclosure to be characterized. The proposal obtains data on the absorption of an anechoic chamber from the transfer functions, or from the impulse responses, between pairs of microphones. The transfer-function results can easily be checked for agreement with the inverse square law, which allows the chamber cut-off frequency to be determined. Band filtering confirmed the anechoic chamber’s qualification.
Convention Paper 7498 (Purchase now)
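The inverse square law check mentioned in the P28-4 abstract can be illustrated as follows. In a free field the sound pressure level falls by 20 log10(r2/r1) dB between distances r1 and r2 from the source, which is about 6 dB per doubling of distance. The function names and the sample levels below are hypothetical and are not taken from the paper; this is only a sketch of the comparison for one microphone pair.

```python
import math

def expected_drop_db(r1, r2):
    """Free-field level drop in dB when moving from distance r1 to r2 (same units)."""
    return 20.0 * math.log10(r2 / r1)

def deviation_from_inverse_square(r1, level1_db, r2, level2_db):
    """Measured level drop between two microphones minus the free-field prediction."""
    measured_drop = level1_db - level2_db
    return measured_drop - expected_drop_db(r1, r2)

# Hypothetical readings: 94.0 dB at 1 m and 88.4 dB at 2 m.
dev = deviation_from_inverse_square(1.0, 94.0, 2.0, 88.4)
```

A deviation near zero in a given frequency band suggests free-field behavior in that band. A deviation that grows toward low frequencies is the kind of evidence from which a chamber cut-off frequency could be estimated.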
P28-5 Acoustic Feedback Reduction Based on LMS and Normalized LMS Algorithms in WOLA Filters Bank Based Digital Hearing Aids—Raúl Vicen-Bueno, Universidad de Alcalá - Alcalá de Henares, Madrid, Spain; Almudena Martínez-Leira, Dimetronic Signals - San Fernando de Henares, Madrid, Spain; Manuel Rosa-Zurera, Lucas Cuadra-Rodríguez, Universidad de Alcalá - Alcalá de Henares, Madrid, Spain
Acoustic feedback can disturb the performance of a digital hearing aid at high gains, causing instability in the hearing aid and degradation of the speech. In order to restore a stable situation, an acoustic feedback reduction (AFR) subsystem using adaptive algorithms such as the least-mean square (LMS) algorithm is needed. This algorithm has a reduced computational cost, but it is very unstable. In order to avoid this situation, another feedback reduction system is used, based on a modified version of the LMS algorithm: the Normalized LMS (NLMS). These two algorithms are tested in two digital hearing aid categories: the In-The-Ear and the In-The-Canal. These categories are selected because they have great feedback effects, so robust AFR subsystems are needed. The added stable gain (ASG) over the limit gain when an AFR subsystem is working in the digital hearing aid is obtained for each category. The ASG is determined as a trade-off between two measurements: the segmented signal-to-noise ratio (objective measurement) and the speech quality (subjective measurement). The results show that digital hearing aids working with a feedback reduction adaptive filter adapted with the NLMS algorithm are able to achieve up to 18 dB of increase over the limit gain.
Convention Paper 7499 (Purchase now)
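The difference between the two update rules named in the P28-5 abstract can be shown with a small time-domain sketch. It is not the WOLA filter bank implementation from the paper: it is a plain system-identification example in which an adaptive FIR filter estimates a made-up feedback path `h`, and the step sizes are assumptions. The only difference between the two variants is that NLMS divides the step size by the instantaneous input power.

```python
import numpy as np

def adapt(x, d, n_taps, mu, normalized, eps=1e-8):
    """Adapt an FIR filter so that w @ [x[n], x[n-1], ...] tracks d[n].

    normalized=False gives the plain LMS update; normalized=True gives NLMS,
    where the step is divided by the input power in the tap buffer.
    """
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)          # most recent input samples, newest first
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        e[n] = d[n] - w @ buf       # a-priori error
        step = mu / (eps + buf @ buf) if normalized else mu
        w = w + step * e[n] * buf
    return w, e

rng = np.random.default_rng(0)
# Hypothetical feedback path, chosen only for this illustration.
h = np.array([0.5, -0.3, 0.2, 0.1, -0.05, 0.02, 0.0, 0.01])
x = rng.standard_normal(5000)
d = np.convolve(x, h)[: len(x)]     # signal returned through the feedback path

w_lms, e_lms = adapt(x, d, len(h), mu=0.01, normalized=False)
w_nlms, e_nlms = adapt(x, d, len(h), mu=0.5, normalized=True)
```

With plain LMS the largest stable step size depends on the input power, so a step that works at one signal level can diverge at a higher one. The normalization in NLMS removes that dependence, which is the usual reason it is preferred when the input level varies widely, as speech does.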
P28-6 Nonlinear Distortions in Capacitors—Menno van der Veen, ir.bureau Vanderveen bv - Zwolle, The Netherlands; Hans van Maanen, Temporal Coherence
Many people have claimed that capacitors have a notable influence on the audible quality of systems. We have identified one of the major causes of nonlinear distortions in capacitors. Charging the capacitor will result in an attractive force acting on the conducting plates. As no material is infinitely stiff, this force will reduce the thickness of the dielectric and thus increase the capacitance. This process occurs in both phases of an AC signal in the same way and is thus nonlinear. In this paper the consequences of this process are discussed. It should be noted that other passive components like resistors and inductors can also show similar nonlinear behavior.
Convention Paper 7500 (Purchase now)
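The mechanism described in the P28-6 abstract can be sketched for an idealized case. The derivation below is not taken from the paper. It assumes an ideal parallel-plate capacitor with plate area $A$, rest spacing $d_0$, and permittivity $\varepsilon$, and it models the dielectric as a linear spring of stiffness $k$ (all of these symbols are assumptions introduced here).

```latex
\begin{align}
F &= \frac{\varepsilon A V^2}{2 d_0^2}
  && \text{attractive force between the plates} \\
\Delta d &= \frac{F}{k}
  && \text{compression of the dielectric} \\
C(V) &= \frac{\varepsilon A}{d_0 - \Delta d}
  \approx C_0 \left(1 + \alpha V^2\right),
  \quad C_0 = \frac{\varepsilon A}{d_0},
  \quad \alpha = \frac{\varepsilon A}{2 k d_0^3} \\
Q(V) &= C(V)\, V \approx C_0 V + \alpha C_0 V^3
\end{align}
```

Because the force depends on $V^2$, it has the same sign in both half-cycles, which matches the abstract's remark that the process acts the same way in both phases. For $V = V_0 \cos \omega t$, the identity $\cos^3 \omega t = \tfrac{1}{4}(3\cos \omega t + \cos 3\omega t)$ shows that the cubic term in $Q$ adds a component at $3\omega$, so under these assumptions the distortion is predominantly third harmonic.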