AES Budapest 2012
Paper Session Details
P1 - Applications in Audio
Thursday, April 26, 09:30 — 11:00 (Room: Lehar)
Chair: Ville Pulkki
P1-1 Efficient Binaural Audio Rendering Using Independent Early and Diffuse Paths—Fritz Menzer, MN Signal Processing - Schwerzenbach, Switzerland
A multi-source binaural audio rendering structure is proposed that efficiently implements plausible binaural reverberation including early reflections and late reverberation. The structure contains delay lines and a feedback delay network that operate independently, modeling early reflections and diffuse reverberation, respectively. Computationally efficient heuristics are presented for the implementation of an HRTF set and for the diffuse reverberation. A real-time implementation on a mobile device is presented.
Convention Paper 8584
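The feedback delay network mentioned in the P1-1 abstract can be illustrated with a minimal sketch. This is not the author's structure: the four delay lengths, the Hadamard feedback matrix, and the feedback gain are assumptions chosen for illustration, and the early-reflection delay lines and HRTF processing are left out.

```python
def fdn_impulse_response(num_samples, delays=(149, 211, 263, 293), gain=0.85):
    """Impulse response of a 4-line feedback delay network (mono in, mono out).

    Delay lengths and gain are illustrative assumptions, not values from the paper.
    """
    # 4x4 Hadamard matrix; scaled by 0.5 it is orthogonal, so the feedback
    # loop is lossless before the gain (< 1) is applied.
    hadamard = [
        [1, 1, 1, 1],
        [1, -1, 1, -1],
        [1, 1, -1, -1],
        [1, -1, -1, 1],
    ]
    buffers = [[0.0] * d for d in delays]  # one circular buffer per delay line
    positions = [0] * len(delays)
    output = []
    for n in range(num_samples):
        x = 1.0 if n == 0 else 0.0  # unit impulse input
        # Read the oldest sample of every delay line.
        taps = [buffers[i][positions[i]] for i in range(4)]
        output.append(sum(taps))
        # Mix the delay-line outputs and feed them back, attenuated by the gain.
        for i in range(4):
            mixed = 0.5 * sum(hadamard[i][j] * taps[j] for j in range(4))
            buffers[i][positions[i]] = x + gain * mixed
            positions[i] = (positions[i] + 1) % delays[i]
    return output


ir = fdn_impulse_response(8000)
early_energy = sum(v * v for v in ir[:1000])
late_energy = sum(v * v for v in ir[-1000:])
print(early_energy, late_energy)
```

The first output sample arrives after the shortest delay, and the energy decays over time because the feedback gain is below one.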
P1-2 The Hand Clap as an Impulse Source for Measuring Room Acoustics—Prem Seetharaman, Stephen P. Tarzia, Northwestern University - Evanston, IL, USA
We test the suitability of hand clap recordings for measuring several acoustic features of musical performance and recording rooms. Our goal is to make acoustic measurement possible for amateur musicians and hobbyists through the use of a smartphone or web app. Hand claps are an attractive acoustic stimulus because they can be produced easily and without special equipment. Hand claps lack the high energy and consistency of other impulse sources, such as pistol shots, but we introduce some signal processing steps that mitigate these problems to produce reliable acoustical measurements. Our signal processing tool chain is fully automated, which allows both amateurs and technicians to perform measurements in just a few seconds. Using our technique, measuring a room's reverberation times and frequency response is as easy as starting a smartphone app and clapping several times.
Convention Paper 8585
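As an illustration of how a reverberation time can be derived from an impulse-like recording such as the hand clap in P1-2, the sketch below uses Schroeder backward integration followed by a line fit over the -5 dB to -25 dB range. This is a common textbook approach and an assumption here; the abstract does not state which method the authors use. The test signal is a synthetic exponential decay, not a real clap.

```python
import math


def schroeder_decay_db(h):
    """Energy decay curve in dB, by backward integration of the squared response."""
    edc = [0.0] * len(h)
    running = 0.0
    for n in range(len(h) - 1, -1, -1):
        running += h[n] * h[n]
        edc[n] = running
    total = edc[0]
    return [10.0 * math.log10(max(e / total, 1e-30)) for e in edc]


def estimate_rt60(h, fs, upper_db=-5.0, lower_db=-25.0):
    """Fit a line to the decay between upper_db and lower_db, extrapolate to -60 dB."""
    decay = schroeder_decay_db(h)
    pts = [(n / fs, d) for n, d in enumerate(decay) if lower_db <= d <= upper_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second
    return -60.0 / slope


# Synthetic response (assumption): exponential decay with a 0.5 s reverberation time.
fs = 8000
true_rt60 = 0.5
a = 3.0 * math.log(10.0) / (true_rt60 * fs)  # amplitude falls by 60 dB after true_rt60
h = [math.exp(-a * n) * (1.0 if n % 2 == 0 else -1.0) for n in range(2 * fs)]
rt60 = estimate_rt60(h, fs)
print(round(rt60, 3))
```

With a real clap, the recording would first need onset detection and noise-floor handling, which this sketch omits.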
P1-3 Subjective Sound Quality Evaluation of a Codec for Digital Wireless Transmission—Matthias Frank, Alois Sontacchi, University of Music and Performing Arts Graz - Graz, Austria; Thomas Lindenbauer, Martin Opitz, AKG Acoustics GmbH - Vienna, Austria
This paper presents a subjective evaluation of a proprietary sub-band ADPCM (Adaptive Differential Pulse Code Modulation) codec for digital wireless transmission. The evaluation is carried out with 40 expert listeners and is divided into several experimental stages. First, the audibility threshold for codec artifacts is determined for each frequency sub-band, separately. In the next stage, different configurations are ranked on a scale of subjective sound quality ratings, with the resolutions varied across all bands. Finally, selected configurations corresponding to different quality ratings are compared to signals of analog wireless transmission in a multidimensional test. This test reveals the characteristic artifacts for each transmission method. Overall, digital transmission can achieve a better sound quality than analog transmission.
Convention Paper 8586
P2 - Emerging and Innovative Audio
Thursday, April 26, 09:30 — 11:00 (Room: Liszt)
Chair: Francis Rumsey
P2-1 Virtual Microphones: Using Ultrasonic Sound to Receive Audio Waves—Tobias Merkel, Beuth Hochschule für Technik - Berlin, Germany; Hans-G. Lühmann, Lütronik Elektroakustik GmbH - Berlin, Germany; Tom Ritter, Beuth Hochschule für Technik - Berlin, Germany
A highly focused ultrasound beam was sent through the room. At a distance of several meters the ultrasonic wave was received again with an ultrasonic microphone. The wave field of a common audio source was overlaid with the ultrasonic beam. It was found that the phase shift of the received ultrasonic signal carries the audio information of the overlaid field. Since the ultrasonic beam itself acts as the sound receiver, no technical device such as a membrane is necessary in the direct vicinity of sound reception. Because this kind of sound receiver is not visible or touchable we call it “Virtual Microphone.”
Convention Paper 8587
P2-2 Implementation and Evaluation of Autonomous Multi-Track Fader Control—Stuart Mansbridge, Saoirse Finn, Joshua D. Reiss, Queen Mary University of London - London, UK
A new approach to the autonomous control of faders for multi-track audio mixing is presented. The algorithm is designed to generate an automatic sound mix from an arbitrary number of monaural or stereo audio tracks of any sample rate and to be suitable for both live and postproduction use. Mixing levels are determined by the use of the EBU R-128 loudness measure, with a cross-adaptive process to bring each track to a time-varying average. A hysteresis loudness gate and selective smoothing prevent the adjustment of intentional dynamics in the music. Real-time and off-line software implementations have been created. Subjective evaluation is provided in the form of listening tests, where the method is compared against the results of a human mix and a previous automatic fader implementation.
Convention Paper 8588
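The hysteresis loudness gate described in P2-2 can be sketched in a few lines. The thresholds, the smoothing coefficient, and the choice to average directly on dB values are all assumptions made for illustration; the abstract does not give the authors' parameters.

```python
def gated_running_loudness(levels_db, open_db=-40.0, close_db=-50.0, alpha=0.5):
    """Track a smoothed loudness that is only updated while a hysteresis gate is open.

    The gate opens when the level rises above open_db and closes only when it
    falls below close_db, so levels between the two thresholds do not toggle it.
    All parameter values are hypothetical.
    """
    gate_open = False
    average = None
    states, averages = [], []
    for level in levels_db:
        if gate_open and level < close_db:
            gate_open = False
        elif not gate_open and level > open_db:
            gate_open = True
        if gate_open:
            # One-pole smoothing; the average is held while the gate is closed.
            average = level if average is None else alpha * level + (1 - alpha) * average
        states.append(gate_open)
        averages.append(average)
    return states, averages


levels = [-60, -45, -38, -42, -47, -52, -39]
states, averages = gated_running_loudness(levels)
print(states)
print(averages)
```

Note that the -45 dB block does not open the gate, while the later -42 dB and -47 dB blocks keep it open: that asymmetry is the hysteresis.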
P2-3 A Voice Classification System for Younger Children with Applications to Content Navigation—Christopher Lowis, Christopher Pike, Yves Raimond, BBC R&D - UK
A speech classification system is proposed that has applications for accessibility of content for younger children. To allow a young child to access online content (where typical interfaces such as search engines or hierarchical navigation would be inappropriate) we propose a voice classification system trained to recognize a range of sounds and vocabulary typical of younger children. As an example we designed a system for classifying animal noises. Acoustic features are extracted from a corpus of animal noises made by a class of young children. A Support Vector Machine is trained to classify the sounds into one of 12 corresponding animals. We investigated the precision and recall of the classifier for various classification parameters. We investigated an appropriate choice of features to extract from the audio and compared the performance when using mean Mel-frequency Cepstral Coefficients (MFCC), a single-Gaussian model fitted to the MFCCs, as well as a range of temporal features. To investigate the real-world applicability of the system we paid particular attention to the difference between training a generic classifier from a collected corpus of examples and one trained to a particular voice.
Convention Paper 8589
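The precision and recall figures mentioned in P2-3 are computed per class from the true and predicted labels. The sketch below shows the standard definitions; the animal labels and predictions are made-up illustrative data, not results from the paper.

```python
def precision_recall(true_labels, predicted_labels):
    """Return {label: (precision, recall)} for every label that occurs."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for label in sorted(set(true_labels) | set(predicted_labels)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (precision, recall)
    return scores


# Hypothetical classifier output for five recordings.
truth = ["cat", "cat", "dog", "dog", "cow"]
predicted = ["cat", "dog", "dog", "dog", "cow"]
scores = precision_recall(truth, predicted)
print(scores)
```

Here one "cat" recording is misclassified as "dog", which lowers the recall for "cat" and the precision for "dog".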
P3 - Music and Modeling
Thursday, April 26, 11:00 — 12:30 (Room: Lehar)
Chair: Ville Pulkki
P3-1 Physical Model of the Slide Guitar: An Approach Based on Contact Forces—Gianpaolo Evangelista, Linköping University - Campus Norrköping, Sweden
In this paper we approach the synthesis of the slide guitar, which is a particular play mode of the guitar where continuous tuning of the tones is achieved by sliding a metal or glass piece, the bottleneck, along the strings on the guitar neck side. The bottleneck constitutes a unilateral constraint for the string vibration. Dynamics is subject to friction, scraping, textured displacement, and collisions. The presented model is physically inspired and is based on a dynamic model of friction, together with a geometrical model of the textured displacements and a model for collisions of the string with the bottleneck. These models are suitable for implementation in a digital waveguide computational scheme for the 3-D vibration of the string, where continuous pitch bending is achieved by all-pass filters to approximate fractional delays. Friction is captured by nonlinear state-space systems in the slide junction and textured displacements by signal injection at a variable point in the waveguide.
Convention Paper 8590
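The all-pass approximation of a fractional delay mentioned in P3-1 can be illustrated with a first-order all-pass filter. The sketch assumes the common first-order design with coefficient (1 - d) / (1 + d) for a delay of d samples; the abstract does not specify which all-pass design the author uses.

```python
def allpass_fractional_delay(x, delay):
    """First-order all-pass filter approximating a delay of `delay` samples.

    Transfer function: H(z) = (eta + z^-1) / (1 + eta * z^-1),
    with eta = (1 - delay) / (1 + delay). The delay is matched exactly at
    DC and approximately at low frequencies.
    """
    eta = (1.0 - delay) / (1.0 + delay)
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for sample in x:
        out = eta * sample + x_prev - eta * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y


impulse = [1.0] + [0.0] * 199
h = allpass_fractional_delay(impulse, 0.7)
energy = sum(v * v for v in h)  # an all-pass filter preserves signal energy
dc_delay = sum(n * v for n, v in enumerate(h)) / sum(h)  # group delay at DC
print(energy, dc_delay)
```

In a digital waveguide, such a filter would sit in series with an integer-length delay line, and changing `delay` over time would bend the pitch continuously.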
P3-2 Measuring Spectral Directivity of an Electric Guitar Amplifier—Agnieszka Roginska, New York University - New York, NY, USA; Alex U. Case, University of Massachusetts Lowell - Lowell, MA, USA; Andrew Madden, Jim Anderson, New York University - New York, NY, USA
The recorded timbre of an electric guitar amplifier is highly dependent on the position of the microphone. Small changes in the location of the microphone can yield significant spectral differences, particularly at positions very close to the amp. This paper presents densely measured radiation pattern characteristics of an electric guitar amplifier on a 3-D grid in front, beside, behind, and above the amplifier in a hemi-anechoic space. We use this data to analyze the change in spectral differences between the numerous points on the measurement grid. Differences between acoustically measured and estimated frequency responses (predicted, using interpolation) are used to study the change in the acoustic field in order to gain insight and an understanding of the spectral directivity sensitivity factor of the electric guitar amplifier.
Convention Paper 8592
P3-3 Magnitude-Priority Filter Design for Audio Applications—Balázs Bank, Budapest University of Technology and Economics - Budapest, Hungary
In audio, specialized filter design methods are often used that take into account the logarithmic frequency resolution of hearing. A notable side-effect of these quasi-logarithmic frequency design methods is a high-frequency attenuation for non-minimum-phase targets due to the frequency-dependent windowing effect of the filter design. This paper presents two approaches for the correction of this high-frequency attenuation, based either on the iterative update of the magnitude, or the iterative update of the phase of the target specification. As a result, the filter follows both magnitude and phase in those frequency regions where it can, while where this is not possible, it focuses on the magnitude. Thus, the new method combines the advantages of traditional complex and magnitude-only filter designs. The algorithms are demonstrated by parallel filter designs, but since the method does not make any assumption on the filter design algorithm used in the iteration, it is equally applicable to other techniques.
Convention Paper 8591
P4 - Sound Reinforcement and Studio Technologies
Thursday, April 26, 11:00 — 12:30 (Room: Liszt)
Chair: Diemer de Vries
P4-1 Full Room Equalization at Low Frequencies with Asymmetric Loudspeaker Arrangements—Balázs Bank, Budapest University of Technology and Economics - Budapest, Hungary
For rectangular rooms with symmetric loudspeaker arrangements, full room equalization can be achieved at low frequencies, as demonstrated by previous research. The method is based on generating a plane wave that propagates along the room. However, often the room is not rectangular, and/or a symmetric loudspeaker setup cannot be assured, leading to a deteriorated equalization performance. In addition, the performance of the method drops significantly above a cutoff frequency where a plane wave cannot be generated. These problems are addressed by the proposed method by prescribing only the magnitude in the control points, while the phase is determined by an iterative optimization process starting from the plane wave solution. A true “magnitude-only” variant of the method is also presented. Comparison is given to the plane-wave based methods by introducing asymmetries to the loudspeaker setup in a simulated environment, showing that the new methods result in smaller average magnitude deviations compared to the previous plane-wave based approach.
Convention Paper 8593
P4-2 Linear Mixing Models for Active Listening of Music Productions in Realistic Studio Conditions—Nicolas Sturmel, Université Paris Diderot - Paris, France; Antoine Liutkus, Telecom ParisTech - Paris, France; Jonathan Pinel, Laurent Girin, GIPSA-Lab, Grenoble INP - Grenoble, France; Sylvain Marchand, Université de Bretagne Occidentale - Brest, France; Gaël Richard, Roland Badeau, Telecom ParisTech - Paris, France; Laurent Daudet, Université Paris Diderot - Paris, France
The mixing/demixing of audio signals as addressed in the signal processing literature (the “source separation” problem) and music production in the studio remain quite separate worlds. Scientific audio scene analysis rather focuses on “natural” mixtures and most often uses linear (convolutive) models of point sources placed in the same acoustic space. In contrast, the sound engineer can mix musical signals of very different nature and belonging to different acoustic spaces, and exploits many audio effects including nonlinear processes. In the present paper we discuss these differences within the strongly emerging framework of active music listening, which is precisely at the crossroads of these two worlds: it consists of giving the listener the ability to manipulate the different musical sources while listening to a musical piece. We propose a model that allows the description of a general studio mixing process as a linear stationary process of “generalized source image signals” considered as individual tracks. Such a model can be used to allow the recovery of the isolated tracks while preserving the professional sound quality of the mixture. A simple addition of these recovered tracks enables the end-user to recover the full-quality stereo mix, while these tracks can also be used for, e.g., basic remix / karaoke / soloing and re-orchestration applications.
Convention Paper 8594
P4-3 Capturing Height: The Addition of Z Microphones to Stereo and Surround Microphone Arrays—Paul Geluso, New York University - New York, NY, USA
As surround systems with height channels become more commonplace, new microphone techniques to capture sound in 3-D are needed. In order for the height channels to be effective, they must contain sonic information that is compatible with the 5.1 surround channels in order to improve the listener’s sense of three-dimensional sound imaging, space, and immersion. Complex height information can be captured by pairing horizontally oriented microphones with vertically oriented bi-directional microphones. In this paper the author presents a rationale, methodology, and preliminary evaluation of a microphone technique based on this concept.
Convention Paper 8595
P5 - Recording and Production
Thursday, April 26, 14:30 — 18:30 (Room: Lehar)
Chair: Balázs Bank
P5-1 Automated Horizontal Orchestration Based on Multichannel Musical Recordings—Maximos Kaliakatsos-Papakostas, University of Patras - Patras, Greece; Andreas Floros, Ionian University - Corfu, Greece; Michael N. Vrahatis, University of Patras - Patras, Greece
Orchestration of computer-aided music composition aims to approximate musical expression using vertical instrument sound combinations, i.e., through finding appropriate sets of instruments to replicate synthesized sound samples. In this paper we focus on horizontal orchestration replication, i.e., the potential of replicating the instantaneous intensity variation of a number of instruments that comprise an existing, target music recording. A method that efficiently performs horizontal orchestration replication is provided, based on the calculation of the instrumental Intensity Variation Curves. It is shown that this approach achieves perceptually accurate automated orchestration replication when combined with automated music generation algorithms.
Convention Paper 8596
P5-2 The Effect of Scattering on Sound Field Control with a Circular Double-Layer Array of Loudspeakers—Jiho Chang, Finn Jacobsen, Technical University of Denmark - Lyngby, Denmark
A recent study has shown that a circular double-layer array of loudspeakers makes it possible to achieve a sound field control that can generate a controlled field inside the array and reduce sound waves propagating outside the array. This is useful if it is desirable not to disturb people outside the array or to prevent the effect of reflections from the room. The study assumed free-field conditions; however, in practice a listener will be located inside the array. The listener scatters sound waves, which propagate outward. Consequently, the scattering effect can be expected to degrade the performance of the system. This paper computationally examines the scattering effect based on the simple assumption that the listener’s head is a rigid sphere. In addition, methods to solve the problem are discussed.
Convention Paper 8597
P5-3 The Equidome, a Personal Spatial Reproduction Array—Late Cancellation—James L. Barbour, Swinburne University of Technology - Melbourne, Australia
Late Cancellation: This paper will not be presented.
Convention Paper 8598
P5-4 A Clipping Detector for Layout-Independent Multichannel Audio Production—Giulio Cengarle, Toni Mateos, Fundació Barcelona Media - Barcelona, Spain
In layout-independent audio production, content is produced independently from the number of channels and their location, so that it can be played back in different multichannel setups. In such contexts, sound is monitored through a playback system that might differ from the potentially many exhibition layouts. Signals combine to the outputs of each playback system in a different way and may produce clipping in some loudspeakers. A method is presented for detecting and quantitatively estimating clipping in the output stage of such systems, based on a suitable definition of a worst-case loudspeaker layout, and associated audio scene rotation and decoding. Practical examples are provided to validate the algorithm.
Convention Paper 8599
P5-5 Evaluation of Spatial Impression Comparing Surround with Height Channels for 3-D Imagery—Toru Kamekawa, Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan; Toshihiko Date, Aiko Kawanaka, AVC Networks Company, Panasonic Corporation - Osaka, Japan; Masaaki Enatsu, marimoRECORDS Inc. - Tokyo, Japan
Three-dimensional (3-D) imagery is now spreading widely as one of the next visual formats for Blu-ray or other future media. Since more audio channels are available with future media, the authors aim to find a suitable sound format for 3-D imagery. A Semantic Differential method using 24 attributes such as “presence,” “naturalness,” and “preference” was carried out comparing combinations of 3-D and 2-D imagery with 2-channel stereo, 5-channel surround, and 7-channel surround sound (5-channel surround plus 2 height channels). Three factors (spatial factor, preference factor, and quality factor) were extracted from the results of Factor Analysis. The combination of 3-D imagery with 7-channel surround gives higher scores on all three factors.
Convention Paper 8600
P5-6 Microphone Array Design for Localization with Elevation Cues—Michael Williams, Sounds of Scotland - Le Perreux sur Marne, France
Analysis of the HRTF characteristics with respect to both azimuth and elevation localization cues would seem to suggest that, while inter-aural time difference and inter-aural level difference information give strong azimuth localization cues, we only have spectral variations in the vertical plane to generate localization cues with respect to elevation. Of course positioning of a second layer of loudspeakers above the horizontal reference plane will already introduce listener spectral differences relative to the horizontal plane reproduction, but the microphone array that feeds this second layer must not generate information that will be in conflict with the localization cues already generated in the horizontal plane of the first layer or main array. In the event of time difference and level difference information being generated between microphones in both layers, localization characteristics must be considered as projected onto the main horizontal plane of localization information.
Convention Paper 8601
P5-7 General Integral Equation for the With-Height Reproduction of a Focused Source Inside—Jung-Woo Choi, Yang-Hann Kim, Korea Advanced Institute of Science and Technology (KAIST) - Daejeon, Korea
A general integral formula to reproduce the sound field from a virtual sound source located inside secondary sources is proposed. The proposed formula extends the theory on the focused sound source of Wave Field Synthesis for the reproduction of a three-dimensional sound field. To resolve the non-existence problem involved with the reproduction of a source inside, an alternative sound field satisfying the homogeneous wave equation is derived. The Kirchhoff-Helmholtz integral is reformulated in such a way that the alternative field is reproduced in terms of the secondary sources distributed on a surface. Then the general equation is reduced to simpler forms using various approximations, such as a single-layer formula reproducing the sound field only by monopole sound sources.
Convention Paper 8602
P5-8 Evaluation of a New Active Acoustics System in Performances of Five String Quartets—Doyuen Ko, Wieslaw Woszczyk, Song Hui Chon, McGill University - Montreal, Quebec, Canada
An innovative electro-acoustic enhancement system, based on measured high-resolution impulse responses, was developed at the Virtual Acoustics Technology (VAT) lab of McGill University and was permanently installed in the Multi-Media Room, a large rectangular scoring stage. Standard acoustic measures confirmed that the system was able to effectively improve room acoustic conditions in both spectral and spatial parameters. Subjective evaluation was conducted with twenty professional musicians from five string quartets on three different acoustic conditions. Spatial impression, stage support, and tonal quality were found to be the three most dominant perceptual dimensions, while “naturalism of reverberation” was the most salient attribute affecting musicians’ preferences. Results showed a strong preference for enhanced acoustics over natural acoustics of the space.
Convention Paper 8603
P6 - Multimodal Apps and Broadcast
Thursday, April 26, 14:30 — 17:00 (Room: Liszt)
Chair: Bozena Kostek
P6-1 Immersive Audiovisual Environment with 3-D Audio Playback—Javier Gómez Bolaños, Ville Pulkki, Aalto University - Espoo, Finland
The design of an immersive audiovisual environment for researching the aspects of the perception of spatial sound in the presence of a surrounding moving visual image is presented. The system consists of a visual display with a wide field of view based on acoustically transparent screens that span 226° in the horizontal plane and 57° in the vertical plane. In addition, a 3-D multichannel sound reproduction system with 29 active loudspeakers is installed. The total system is optimized for audio playback, and measured acoustical system responses are presented. The system is equipped with a tracking system based on infrared cameras, which enables head tracking for headphone listening and also interaction based on gestures. This audiovisual system aims to be a tool for researching spatial audio, crossmodal interaction and psychoacoustics, auralization, and gaming.
Convention Paper 8604
P6-2 Sensitive Audio Data Encryption for Multimodal Surveillance Systems—Janusz Cichowski, Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
Novel algorithms for data processing in audiovisual surveillance systems were developed, allowing for better personal data protection. A solution that merges image and audio encryption to protect privacy-sensitive data, employing the audio stream, is described. The main objectives of this research study, including the motivation and the state of the art, are presented with a comprehensive explanation of how the audio stream relates to surveillance. An invertible encryption methodology for privacy preservation using an audio container is applied. The experiments are described and the obtained results are reported, including prospects for future improvements.
Convention Paper 8605
P6-3 Loudness Normalization of Wide-Dynamic Range Broadcast Material—Scott Norcross, Michel Lavoie, Communications Research Center - Ottawa, Ontario, Canada
Various techniques are being used by broadcasters to normalize the loudness levels of their programs. For long-form content, EBU R128 recommends that the full program be measured using the algorithm described in ITU-R BS.1770-2. ATSC A/85 recommends instead that only the “Anchor Element” of long-form content need be measured. For narrow dynamic range material, the differences between the two measures are not large, but there can be large differences between the two approaches when the material has a wide dynamic range. This paper compares these two measurement approaches and explores their subjective consequences.
Convention Paper 8606
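To illustrate why full-program measurement behaves differently on wide-dynamic-range material, as discussed in P6-3, the sketch below applies the two-stage gating of ITU-R BS.1770-2 (absolute gate at -70 LKFS, relative gate 10 LU below the absolutely gated level) to a list of block energies. It assumes the inputs are already K-weighted, channel-summed mean-square values of the measurement blocks; the filtering and block segmentation are omitted, and the example program is made up.

```python
import math


def block_loudness(mean_square):
    """Loudness in LKFS of one block, given its K-weighted mean-square value."""
    return -0.691 + 10.0 * math.log10(mean_square)


def integrated_loudness(block_energies, absolute_gate=-70.0, relative_gate=-10.0):
    """Two-stage gated loudness over a list of block mean-square values."""
    # Stage 1: drop blocks below the absolute threshold.
    kept = [z for z in block_energies if z > 0 and block_loudness(z) > absolute_gate]
    if not kept:
        return float("-inf")
    # Stage 2: drop blocks more than 10 LU below the stage-1 loudness.
    threshold = block_loudness(sum(kept) / len(kept)) + relative_gate
    kept = [z for z in kept if block_loudness(z) > threshold]
    return block_loudness(sum(kept) / len(kept))


def energy_from_loudness(lkfs):
    """Inverse of block_loudness, used only to build the synthetic example."""
    return 10.0 ** ((lkfs + 0.691) / 10.0)


# Hypothetical program: loud passages, quiet passages, and near-silence.
blocks = ([energy_from_loudness(-20.0)] * 10
          + [energy_from_loudness(-50.0)] * 10
          + [energy_from_loudness(-80.0)] * 5)
gated = integrated_loudness(blocks)
ungated = block_loudness(sum(blocks) / len(blocks))
print(round(gated, 2), round(ungated, 2))
```

In this synthetic case the gating discards the quiet and silent blocks, so the gated result reflects only the loud passages, whereas the ungated average is pulled several LU lower.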
P6-4 Creating Mood Dictionary Associated with Music—Magdalena Plewa, Bozena Kostek, Gdansk University of Technology - Gdansk, Poland
The paper presents an attempt to create a dictionary of words related to mood associated with music. Two parts of a listening test were designed and carried out with a group of students, most of them users of online social music services. The listeners' task was to propose adjectives that describe the music tracks well. These words were given in Polish and then compared to their English equivalents. The obtained results show that terms associated with music are language-specific and in addition there is a need to use multi-label mood description.
Convention Paper 8607
P6-5 Redundancy Optimization for Networked Audio Systems—Damian Kowalski, Piotr Z. Kozlowski, Wroclaw University of Technology - Wroclaw, Lower Silesia, Poland
Networked audio systems can be simply defined as a connection of IT and professional audio. Nowadays, we can use protocols developed by IT specialists to ensure system recovery without human intervention. There is a possibility to improve the recovery time of the system after failure by optimizing the protocols responsible for network redundancy. The paper is a summary of research completed at Wroclaw University of Technology in May 2011. It contains guidelines on how to optimize the network redundancy in order to achieve the best results.
Convention Paper 8608
P7 - Perception
Thursday, April 26, 17:00 — 18:00 (Room: Liszt)
Chair: Bozena Kostek
P7-1 Detection of Two Subwoofers: Effect of Broad-Band-Channel Level and Crossover Frequency—Jussi Rämö, Sakari Bergen, Julian Parker, Veli-Matti Yli-Kätkä, Ville Pulkki, Aalto University - Espoo, Finland
The use of multiple subwoofers can be advantageous compared to a setup consisting of a single subwoofer due to the cancellation of room modes. We investigate the effect of subwoofer crossover frequency and program material on the perceived localization of bass frequencies using single or dual subwoofers, via a listening test. Test results show that dual subwoofer setups are harder to detect than single subwoofer setups and also exhibit the well-known relationship between crossover frequency and difficulty of localization.
Convention Paper 8609
P7-2 Pitch, Timbre, Source Separation, and the Myths of Loudspeaker Imaging—David Griesinger, David Griesinger Acoustics - Cambridge, MA, USA
Standard models for both timbre detection and sound localization do not account for our acuity of localization in reverberant environments or when there are several simultaneous sound sources. They also do not account for our near-instant ability to determine whether a sound is near or far. This paper presents data on how both semantic content and localization information are encoded in the harmonics of complex tones and the method by which the brain separates this data from multiple sources and from noise and reverberation. Much of the information in these harmonics is lost when a sound field is recorded and reproduced, leading to a sound image which may be plausible but is not remotely as clear as the original sound field.
Convention Paper 8610
P8 - Listening Tests: Part 1
Friday, April 27, 09:00 — 11:00 (Room: Lehar)
Chair: Jan Plogsties
P8-1 Comparison of Localization Performance of Blind and Sighted Subjects on a Virtual Audio Display and in Real-life Environments—György Wersényi, József Répás, Széchenyi István University - Győr, Hungary
Localization performance of blind subjects was measured in a virtual audio environment using non-individualized but customized HRTFs. Results were compared with former results of sighted users using the same measurement setup. Furthermore, orientation and navigation tasks in a real-life outdoor environment were performed in order to compare localization ability of sighted and visually impaired including "walking straight" tasks with and without acoustic feedback and test runs using the white cane as an acoustic tool during navigation.
Convention Paper 8611
P8-2 HELM: High Efficiency Loudness Model for Broadcast Content—Alessandro Travaglini, Fox International Channels Italy - Rome, Italy; Andrea Alemanno, Aurelio Uncini, University of Rome “La Sapienza” - Rome, Italy
In this paper we propose a new algorithm for measuring the loudness levels of broadcast content. It is called the High Efficiency Loudness Model (HELM) and it aims to provide robust measurement of programs of any genre, style, and format, including stereo and multichannel audio 5.1 surround sound. HELM was designed taking into account the typical conditions of the home listening environment, and it is therefore particularly good at meeting the needs of broadcast content users. While providing a very efficient assessment of typical generic programs, it also successfully approaches some issues that arise when assessing unusual content such as programs heavily based on bass frequencies, wide loudness range programs, and multichannel programs as opposed to stereo ones. This paper details the structure of HELM, including its channel-specific frequency weighting and recursive gating implementation. Finally, we present the results of a mean opinion score (MOS) subjective test that demonstrates the effectiveness of the proposed method.
Convention Paper 8612
P8-3 Defining the Listening Comfort Zone in Broadcasting through the Analysis of the Maximum Loudness Levels—Alessandro Travaglini, Fox International Channels Italy - Rome, Italy; Andrea Alemanno, University of Rome “La Sapienza” - Rome, Italy; Fabrizio Lantini, Electric Light Studio - Rome (RM), Italy
Over the last few years, the broadcasting industry has finally approached the loudness issue by standardizing its measurement and recommending target loudness levels with which all programs are required to comply. If the recommendations are applied and all programs are normalized at the target level, viewers ought to experience consistent perceived loudness levels throughout transmissions. However, due to the inner loudness modulation of the programs themselves, this is not always the case. In fact, even if the overall program loudness levels perfectly match the required target level, excessive loudness modulations can still generate annoyance to viewers if the foreground sound levels exceed the so-called “comfort zone.” The fact is that we still have no clear data on which metering can provide visual/numeric feedback on the perception of “hearing annoyance.” This paper investigates this issue and aims to provide objective evidence of which parameters would better represent this phenomenon. In particular, we describe an extensive subjective test performed for both the typical Stereo TV and the 5.1 home theater set reproductions and analyze its results in order to verify whether the Maximum Momentary Loudness Level, the Maximum Short Loudness Level, and Loudness Range (LRA) values described in EBU R128 can provide robust and reliable numeric references to generate a comfortable listening experience for viewers. Furthermore, we perform a similar analysis for the loudness descriptors of the algorithm HELM and finally indicate the values of those parameters that show the most consistent and reliable figures.
Convention Paper 8613 (Purchase now)
P8-4 The Relative Importance of Speech and Non-Speech Components for Preferred Listening Levels—Ian Dash, Consultant - Marrickville, NSW, Australia; Miles Mossman, Densil Cabrera, University of Sydney - Sydney, NSW, Australia
In a prior paper the authors reported on a listening test that attempted to establish the relative importance of speech and non-speech components of a mixed soundtrack when matching loudness to reference audio items. That paper concluded that listeners match loudness by overall content rather than by the loudness of the speech or non-speech components. This paper reports on a follow-up listening test that attempts to establish the relative importance of speech and non-speech components in setting preferred listening level without any external reference. The results indicate that while speech levels are set more consistently than non-speech levels, listeners tend to set the overall levels more consistently than either of these components.
Convention Paper 8614 (Purchase now)
P9 - Analysis and Synthesis: Part 1
Friday, April 27, 09:00 — 12:00 (Room: Liszt)
Chair:
Juha Backman
P9-1 Emergency Voice/Stress-level Combined Recognition for Intelligent House Applications—Konstantinos Drosos, Andreas Floros, Ionian University - Corfu, Greece; Kyriakos Agkavanakis, BlueDev Ltd. - Patras, Greece; Nicolas-Alexander Tatlas, Technological Educational Institute of Piraeus - Piraeus, Greece; Nikolaos-Grigorios Kanellopoulos, Ionian University - Corfu, Greece
Legacy technologies for word recognition can benefit from emerging affective voice retrieval, potentially leading to intelligent applications for smart houses enhanced with new features. In this paper we introduce the implementation of a system capable of reacting to common spoken words while taking into account the estimated vocal stress level, thus allowing the realization of a prioritized, affective aural interaction path. Upon successful word recognition and the corresponding stress level estimation, the system triggers particular affective-prioritized actions, defined within the application scope of an intelligent home environment. Application results show that the established affective interaction path significantly improves the ambient intelligence provided by an affective vocal sensor that can be easily integrated with any sensor-based home monitoring system.
Convention Paper 8615 (Purchase now)
P9-2 Loudness Range (LRA)—Design and Evaluation—Esben Skovenborg, TC Electronic A/S - Risskov, Denmark
Loudness Range (LRA) is a measure of the variation of loudness on a macroscopic time-scale. Essentially, LRA is the difference in loudness level between the soft and loud parts of a program. In 2009 the algorithm for computing LRA was published by TC Electronic and was then included in the EBU R-128 recommendation for loudness normalization. This paper describes the design choices underlying the LRA algorithm. For each of its parameters the interval of optimal values is presented, supported by analyses of audio examples. Consequences of setting parameter values outside these intervals are also described. Although the LRA measure has already proven its usefulness in practice, this paper provides background knowledge that could support further refinement and standardization of the LRA measure.
Convention Paper 8616 (Purchase now)
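As an illustration of the LRA measure summarized in P9-2, the following is a minimal sketch of a gated percentile-range computation. It assumes the parameter values published in EBU Tech 3342 (absolute gate at -70 LUFS, relative gate 20 LU below the gated mean, 10th and 95th percentiles) and takes a list of short-term loudness values as input; the function names and the simple percentile rule are illustrative choices, not the paper's reference implementation.

```python
import math

# Assumed parameter values (EBU Tech 3342); the abstract does not list them.
ABS_GATE_LUFS = -70.0
REL_GATE_LU = -20.0
LOW_PCT, HIGH_PCT = 10.0, 95.0


def energy_mean_lufs(levels):
    """Mean of loudness levels computed in the power domain."""
    power = sum(10.0 ** (level / 10.0) for level in levels) / len(levels)
    return 10.0 * math.log10(power)


def percentile(sorted_levels, pct):
    """Simple rank-based percentile on an ascending list (half-up rounding)."""
    index = int((len(sorted_levels) - 1) * pct / 100.0 + 0.5)
    return sorted_levels[index]


def loudness_range(short_term_lufs):
    """Return an LRA estimate in LU from short-term loudness values in LUFS."""
    # Absolute gate: drop near-silent blocks.
    gated = [level for level in short_term_lufs if level >= ABS_GATE_LUFS]
    if not gated:
        return 0.0
    # Relative gate: drop blocks far below the mean of what remains.
    threshold = energy_mean_lufs(gated) + REL_GATE_LU
    gated = sorted(level for level in gated if level >= threshold)
    # Range between the high and low percentiles of the gated distribution.
    return percentile(gated, HIGH_PCT) - percentile(gated, LOW_PCT)
```

For example, a program whose short-term loudness sits at -30 LUFS for half of its duration and at -20 LUFS for the other half yields an LRA of 10 LU under these assumptions, regardless of any interspersed silence below the absolute gate.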
P9-3 Statistical Properties of the Close-Microphone Responses—Elias K. Kokkinis, Eleftheria Georganti, John Mourjopoulos, University of Patras - Patras, Greece
The close-microphone technique is widely used in modern sound engineering practice. It is mainly used to minimize the effect of leakage and room acoustics on the received signal. In this paper the properties of the close-microphone response are investigated from a signal processing point of view, through the respective frequency domain statistical moments. Room impulse response measurements were made in various rooms and source-microphone distances, and statistical moments were calculated over frequency and distance. It is shown that the statistical properties of the impulse responses remain relatively constant for short source-microphone distances and this in turn provides a consistent sound, which was validated through a subjective evaluation test.
Convention Paper 8617 (Purchase now)
P9-4 Reproduction of Proximity Virtual Sources Using a Line Array of Loudspeakers—Jung-Min Lee, Jung-Woo Choi, Dong-Soo Kang, Yang-Hann Kim, Korea Advanced Institute of Science and Technology (KAIST) - Daejeon, Korea
For reproducing a desired sound field from virtual sources, Wave Field Synthesis (WFS) assumes that virtual sources are positioned at far-field from the loudspeaker array. This far-field assumption inevitably produces reproduction errors when the virtual source is adjacent to the array. In this paper we propose a method that can render the sound field from virtual sources positioned near the loudspeaker array. The driving functions of loudspeakers are derived for the planar array geometry, and then the surface integral is reduced to a line integral by utilizing different approximations from WFS. In addition, a modified equation for the discrete loudspeaker distribution is presented. Numerical simulations show that the proposed method can reduce the reproduction error to a practically acceptable level.
Convention Paper 8618 (Purchase now)
P9-5 On the Statistics of Binaural Room Transfer Functions—Eleftheria Georganti, University of Patras - Patras, Greece; Tobias May, Steven van de Par, University of Oldenburg - Oldenburg, Germany; John Mourjopoulos, University of Patras - Patras, Greece
The well-known property of the spectral standard deviation of Room Transfer Functions (RTFs), that is, its convergence to 5.57 dB, is extended to reverberant Binaural Room Transfer Functions (BRTFs). The BRTFs are related to the anechoic Head Related Transfer Functions (HRTFs) and the corresponding RTFs. Consequently, the statistical properties of the RTFs and HRTFs can be systematically related to the statistical properties of the BRTFs. In this paper the standard deviation of BRTFs measured in different types of rooms, for various source/receiver distances and azimuth angles is computed. The derived values are compared to the ones obtained from the single channel RTFs measured at the same positions. Their relationship to the 5.57 dB value is discussed.
Convention Paper 8619 (Purchase now)
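The 5.57 dB figure mentioned in P9-5 can be checked numerically under the usual diffuse-field assumption that the transfer function at each frequency behaves like a zero-mean complex Gaussian variable. The sketch below is only a sanity check of that single-channel property, not the binaural analysis of the paper; the function names, sample count, and seed are arbitrary.

```python
import math
import random
import statistics


def theoretical_std_db():
    """Std of 10*log10 of an exponentially distributed power, in dB."""
    return (10.0 / math.log(10.0)) * math.pi / math.sqrt(6.0)


def simulated_std_db(n=200_000, seed=1):
    """Std of the level (dB) of n complex Gaussian transfer-function samples."""
    rng = random.Random(seed)
    levels = []
    for _ in range(n):
        re = rng.gauss(0.0, 1.0)
        im = rng.gauss(0.0, 1.0)
        # Squared magnitude of a complex Gaussian is exponentially distributed.
        levels.append(10.0 * math.log10(re * re + im * im))
    return statistics.pstdev(levels)
```

Both functions should return values close to 5.57 dB; the closed form follows from the standard deviation of the logarithm of an exponential variable, which is pi divided by the square root of 6 in natural-log units.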
P9-6 On Acoustic Detection of Vocal Modes—Eddy B. Brixen, EBB-consult - Smorum, Denmark; Cathrine Sadolin, Henrik Kjelin, Complete Vocal Institute - Copenhagen, Denmark
This paper is a last minute cancellation.
According to the Complete Vocal Technique four vocal modes are defined: Neutral, Curbing, Overdrive, and Edge. These modes are valid for both the singing voice and the speaking voice. The modes are clearly identified both from listening and from visual laryngograph inspection of the vocal cords and the surrounding area of the vocal tract. For many reasons it would be preferred to apply a simple acoustic analysis to identify the modes. This paper looks at the characteristics of the voice modes from an acoustical perspective based on voice samples from four male and two female subjects. The paper describes frequency domain criteria for the discrimination of the various modes.
Convention Paper 8620 (Purchase now)
P10 - Education
Friday, April 27, 11:00 — 12:00 (Room: Lehar)
Chair:
Jan Plogsties
P10-1 Audio DSP from Scratch for Students of Computing—Ewa Lukasik, Poznan University of Technology - Poznan, Poland
The paper presents a set of dedicated programming applications for teaching digital audio basics to the novices with the background in computing. These applications have been used in the Institute of Computing Science Poznan University of Technology for over 15 years. Each application is devoted to an individual subject, is self explanatory, illustrates the given topic, and includes several simple problems to be solved by students. The topics include: the idea of signal energy and RMS, signals orthogonality, signals approximations, simple sinusoidal synthesis, spectrum, sampling theorem, upsampling and downsampling, convolution, idea of digital filters and interpretation of frequency characteristics, FIR and IIR filters—their zeroes and poles, windowing, DFT and its time and frequency resolution, cepstral analysis, and others. This set of programs proved useful as a starting point to more advanced audio projects in the domain of MIR, speech and speaker recognition, speech synthesis, audio coding, audio transmission, and others.
Convention Paper 8621 (Purchase now)
P10-2 A Comparison of Audio Frameworks for Teaching, Research, and Development—Martin Robinson, University of the West of England - Bristol, UK; Jamie Bullock, Birmingham Conservatoire - Birmingham, UK
This paper compares a range of audio frameworks for the support of teaching, research, and the development of audio applications. The authors employ a range of metrics with which to compare the frameworks including: licensing terms, portability across different architectures, audio data-type support, efficiency of processing code, expressiveness, usability, and community activity. Conclusions are drawn that none of these frameworks score highly in all of these domains. This suggests that while there are already a large number of such frameworks there remain areas to be addressed. The authors suggest that this might be through the development of existing systems or the development of new frameworks to meet these needs.
Convention Paper 8622 (Purchase now)
P11 - Quality Evaluation and Spatial Audio
Friday, April 27, 12:30 — 14:00 (Room: Foyer)
P11-1 Influence of Resolution of Head Tracking in Synthesis of Binaural Audio—Mikko-Ville Laitinen, Tapani Pihlajamäki, Stefan Lösler, Ville Pulkki, Aalto University - Espoo, Finland
The use of head tracking in binaural synthesis of spatial sound increases the quality of reproduction. The required quality of a head-tracking system for this purpose is the topic of interest in this paper. A listening test was performed to evaluate the effect of four common sources of error in head-tracking systems. The listeners rated the naturalness of binaural reproduction in different head-tracking conditions. According to the test, the main requirement for the head-tracking system is to obtain an accurate and unrestricted azimuth angle. Furthermore, lower update rate of the tracking system affects the quality. The results showed prominent dependence on program material, and individual differences were also notable.
Convention Paper 8623 (Purchase now)
P11-2 Objective and Subjective Tests of Consumer-Class Audio Devices—Marek Pluta, Pawel Malecki, AGH University of Science and Technology - Krakow, Poland
The paper presents a new stand for subjective listening tests in the Department of Mechanics and Vibroacoustics of Krakow University of Science and Technology, as well as a procedure and results of preliminary tests that utilize it. The most important difference, compared to the previously used stand, is exclusion of an analog mixing console, which reduces the length of electroacoustic channel, and substituting active studio monitors with 12 pairs of closed headphones in order to provide a larger group of listeners with the same listening conditions. The tests study popular consumer audio reproduction devices, therein standalone and motherboard integrated sound cards, as well as portable players. The research included measurements and comparison of objective parameters, as well as ABX method listening test, with a studio-class audio interface as a reference device. A group of listeners of various professional profiles, including students of the Academy of Music in Krakow and Acoustic Engineering of Krakow University of Science and Technology, took part in these listening tests. Results are compared to the data obtained using the previous test stand.
Convention Paper 8624 (Purchase now)
P11-3 Subjective Evaluations of Perspective Control Microphone Array (PCMA) —Hyunkook Lee, University of Huddersfield - Huddersfield, UK
Perspective Control Microphone Array (PCMA) is a technique that allows one to flexibly render spatial audio images depending on the desired virtual listening position in a reproduced sound field. Two subjective listening experiments have been conducted to evaluate the effectiveness of PCMA on the controls over the perceived source/ensemble distance and width attributes. The first experiment verified a hypothesis suggesting that perceived width would decrease as source-listener distance increased, using anechoic trumpet and conga sources convolved with binaural impulse responses of a concert hall. It was shown that the distance and width linearly changed at doubled distances and were negatively correlated. The second experiment tested three reference virtual array configurations of PCMA on the same attributes using the same sources. The results agreed with the perceptual patterns observed in the concert hall situation in that there was a linear decrease in perceived width at an increased perceived distance. The main effect of PCMA configuration was found to be statistically significant. These results seem to strongly validate the effectiveness of PCMA for postproduction and user-interactive applications.
Convention Paper 8625 (Purchase now)
P11-4 Objective Profiling of Perceived Punch and Clarity in Produced Music—Steven Fenton, Jonathan Wakefield, University of Huddersfield - Huddersfield, UK
This paper describes and evaluates an objective measurement that profiles a complex musical signal over time in terms of identification of dynamic content and overall perceived quality. The authors have previously identified a potential correlation between inter-band dynamics and the subjective quality of produced music excerpts. This paper describes the previously presented Inter-Band Relationship (IBR) descriptor and extends this work by conducting more thorough testing of its relationship to perceived punch and clarity over varying time resolutions. A degree of correlation is observed between subjective test scores and the objective IBR descriptor, suggesting it could be used as an additional model output variable (MOV) to describe punch and clarity within a piece of music. Limitations have been identified in the measure, however, and further consideration is required with regard to the choice of threshold adopted based on the range of dynamics detected within the musical extract and the possible inclusion of a gate as utilized in some loudness algorithms.
Convention Paper 8626 (Purchase now)
P11-5 A Hybrid Method Combining Synthesis of a Sound Field and Control of Acoustic Contrast—Martin Bo Møller, Martin Olsen, Bang & Olufsen a/s - Struer, Denmark; Finn Jacobsen, Technical University of Denmark - Lyngby, Denmark
Spatially confined regions with different sound field characteristics, in the following referred to as sound zones, may be desired in some situations. Recently, various sound field control methods for generating separate sound zones have been proposed in the literature. The different algorithms introduce different levels of control over the physical characteristics of the resulting sound fields. This paper introduces a hybrid of two existing methods employed for obtaining sound zones: “Energy Difference Maximization” for control of the sound field energy distribution and “Pressure Matching,” which contributes with synthesis of a desired sound field. The hybrid method introduces a tradeoff between acoustic contrast between two sound zones and the degree to which the phase is controlled in the optimized sound fields.
Convention Paper 8627 (Purchase now)
P11-6 Validation of Room Plane Wave Decomposition as a Tool for Spatial Echogram Analysis of Rooms—Ana Torres, Universidad de Castilla La Mancha - Ciudad Real, Castile-La Mancha, Spain; Jose J. Lopez, Technical University of Valencia - Valencia, Spain; Basilio Pueo, Universidad de Alicante - Alicante, Spain
The classical point-based analysis of sound attributes in room acoustics is usually not enough for analyzing the acoustic complexity of a room. Some spatial attributes that provide more information are being employed. Circular arrays of microphones and the subsequent plane wave decomposition have been proposed in the literature and employed successfully by the authors and others. In this paper we validate our previous work by comparing the resulting measured echograms with a simulation of the room acoustics using ROOMSIM, a freely available basic room acoustics modeling software package. The resulting echograms of both methods are compared, identifying strengths and weaknesses of the measurement method based on circular arrays and proposing some ideas for a future, more in-depth analysis.
Convention Paper 8628 (Purchase now)
P12 - Amps and Measurement
Friday, April 27, 14:00 — 16:30 (Room: Lehar)
Chair:
Mark Sandler
P12-1 Current-Driven Switch-Mode Audio Power Amplifiers—Arnold Knott, Niels Christian Buhl, Michael A. E. Andersen, Technical University of Denmark - Lyngby, Denmark
The conversion of electrical energy into sound waves by electromechanical transducers is proportional to the current through the coil of the transducer. However virtually all audio power amplifiers provide a controlled voltage through the interface to the transducer. This paper presents a switch-mode audio power amplifier not only providing controlled current but also being supplied by current. This results in an output filter size reduction by a factor of 6. The implemented prototype shows decent audio performance with THD + N below 0.1%.
Convention Paper 8629 (Purchase now)
P12-2 Debugging of Class-D Audio Power Amplifiers—Lasse Crone, Jeppe Arnsdorf Pedersen, Jakob Døllner Mønster, Arnold Knott, Technical University of Denmark - Lyngby, Denmark
Determining and optimizing the performance of a Class-D audio power amplifier can be very difficult without knowledge of the use of audio performance measuring equipment and of how the various noise and distortion sources influence the audio performance. This paper gives an introduction on how to measure the performance of the amplifier and how to find the noise and distortion sources and suggests ways to remove them. Throughout the paper measurements of a test amplifier are presented along with the relevant theory.
Convention Paper 8630 (Purchase now)
P12-3 Investigation of Crosstalk in Self Oscillating Switch Mode Audio Power Amplifier—Thomas Haagen Birch, Rasmus Ploug, Niels Elkjær Iversen, Arnold Knott, Technical University of Denmark - Lyngby, Denmark
Self oscillating switch mode power amplifiers are known to be susceptible to interchannel disturbances also known as crosstalk. This phenomenon has a significant impact on the performance of an amplifier of this type. The goal of this paper is to investigate the presence and origins of crosstalk in a two-channel self oscillating switch mode power amplifier (class D). A step-by-step reduction of elements in an amplifier built for this task is used for methodically determining the actual presence and origins of crosstalk. The investigation shows that the crosstalk is caused by couplings in the self oscillating pulse width modulation circuits, but also that the output filter has a major impact on the level of crosstalk.
Convention Paper 8631 (Purchase now)
P12-4 Measuring Mixing Time in Non-Sabinian Rooms: How Scattering Influences Small Room Responses—Lorenzo Rizzi, Gabriele Ghelfi, Suono e Vita – Acoustic Engineering - Lecco, Italy
The goal of this work is to optimize a DSP tool for extrapolating, from a room impulse response, information regarding the way in which the transition between early reflections and late reverberation occurs. Two different methods for measuring this transition (usually referred to as mixing time, tmix) have been found in the literature, both based on statistical properties of acoustic spaces. Appropriate changes have been implemented and the algorithms have been tested on I.R. measured in eight different environments. Particular attention is given to non-Sabinian environments such as small rooms for music. A relationship between sound diffusion and tmix has also been measured, showing how the presence of scattering elements contributes to a lower tmix, altering the statistical properties of the I.R.
Convention Paper 8632 (Purchase now)
P12-5 Separation of High Order Impulse Responses in Methods Based on the Exponential Swept-Sine—Stephan Tassart, ST-Ericsson - Paris, France; Aneline Grand, Arkamys - Paris, France
Many real analog systems (e.g., electroacoustic loudspeaker, audio amplifiers, filters, etc.) exhibit weakly nonlinear features when driven by large amplitude signals. A large scale of such electromechanical devices are well modeled by the cascade of Hammerstein models. The exponential swept-sine is a natural excitation vector in order to identify the structural elements from those models. This paper extends the original swept-sine principle to the case of band-limited test vectors, suggests an intermodulation law for the generation of band-limited test vectors, and shows that a long-duration swept-sine can be replaced by a series of slightly phase-shifted short-duration swept-sines. High order impulse responses are separable even in case of temporal overlap with a linear combination of the measurements. The method is demonstrated on examples.
Convention Paper 8633 (Purchase now)
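For readers unfamiliar with the excitation signal discussed in P12-5, the following minimal sketch generates a baseline exponential swept-sine whose instantaneous frequency rises from f1 to f2 over the given duration. It shows only the classical full-band sweep; the band-limited test vectors and phase-shifted short sweeps proposed in the paper are not reproduced here, and the function and parameter names are illustrative.

```python
import math


def exponential_sweep(f1, f2, duration, fs):
    """Return samples of sin(phase(t)) with frequency f1 * (f2/f1)**(t/duration)."""
    rate = math.log(f2 / f1)
    # Phase is the integral of 2*pi*f(t), chosen so that phase(0) = 0.
    scale = 2.0 * math.pi * f1 * duration / rate
    n_samples = int(round(duration * fs))
    return [
        math.sin(scale * (math.exp((i / fs) / duration * rate) - 1.0))
        for i in range(n_samples)
    ]


def instantaneous_frequency(f1, f2, duration, t):
    """Frequency in Hz reached by the sweep at time t (seconds)."""
    return f1 * (f2 / f1) ** (t / duration)
```

A call such as `exponential_sweep(20.0, 20000.0, 1.0, 48000)` returns 48000 samples. Because the frequency grows exponentially, each octave takes the same amount of time, which is the property that lets harmonic distortion responses separate in time after deconvolution.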
P13 - Analysis and Synthesis: Part 2; Content Management
Friday, April 27, 14:00 — 16:30 (Room: Liszt)
Chair:
Michael Kelly
P13-1 Overview of Feature Selection for Automatic Speech Recognition—Branislav Gerazov, Zoran Ivanovski, Faculty of Electrical Engineering and Information Technologies - Skopje, Macedonia
The selection of features to be used for the task of Automatic Speech Recognition (ASR) is critical to the overall performance of the ASR system. Throughout the history of development of ASR systems, a variety of features have been proposed and used, with greater or lesser success. Still, the research for new features, as well as modifications to the traditional ones, continues. Newly proposed features as well as traditional feature optimization focus on adding robustness to ASR systems, which is of great importance for applications involving noisy environments. The paper seeks to give a general overview of the various features that have been used in ASR systems, giving details to an extent granted by the space available.
Convention Paper 8634 (Purchase now)
P13-2 Evaluating the Influence of Source Separation Methods in Robust Automatic Speech Recognition with a Specific Cocktail-Party Training—Amparo Marti, Universitat Politècnica de València - València, Spain; Maximo Cobos, University of Valencia - Burjassot (Valencia), Spain; Jose J. Lopez, Universitat Politècnica de València - València, Spain
Automatic Speech Recognition (ASR) allows a computer to identify the words that a person speaks into a microphone and convert them to written text. One of the most challenging situations for ASR is the cocktail party environment. Although source separation methods have already been investigated to deal with this problem, the separation process is not perfect and the resulting artifacts pose an additional problem to ASR performance when separation methods based on time-frequency masks are used. Recently, the authors proposed a specific training method to deal with simultaneous speech situations in practical ASR systems. In this paper we study how the speech recognition performance is affected by selecting different combinations of separation algorithms both at the training and test stages of the ASR system under different acoustic conditions. The results show that, while different separation methods produce different types of artifacts, the overall performance of the method is always increased when using any cocktail-party training.
Convention Paper 8635 (Purchase now)
P13-3 Automatic Regular Voice, Raised Voice, and Scream Recognition Employing Fuzzy Logic—Kuba Lopatka, Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
A method of automatic recognition of regular voice, raised voice, and scream used in an audio surveillance system is presented. The algorithm for detection of voice activity in a noisy environment is discussed. Signal features used for sound classification, based on energy, spectral shape, and tonality are introduced. Sound feature vectors are processed by a fuzzy classifier. The method is employed in an audio surveillance system working in real-time both in an indoor and outdoor environment. Achieved results of classifying real signals are presented and discussed.
Convention Paper 8636 (Purchase now)
P13-4 Enhanced Chroma Feature Extraction from HE-AAC Encoder—Marco Fink, University of Erlangen-Nuremberg - Erlangen, Germany; Arijit Biswas, Dolby Germany GmbH - Nuremberg, Germany; Walter Kellermann, University of Erlangen-Nuremberg - Erlangen, Germany
A perceptually enhanced chroma feature extraction during the HE-AAC audio encoding process is proposed. Extraction of chroma features from the MDCT-domain spectra of the encoder and its further enhancement utilizing the perceptual model of the encoder are investigated. The main advantage of such a scheme is a reduced computational complexity when both chroma feature extraction and encoding are desired. Specifically, the system is designed to produce reliable chroma features irrespective of the block switching decision of the encoder. Three methods are discussed to circumvent the poor frequency resolution during short blocks. All proposed enhancements are evaluated systematically within a well-known state-of-the-art chord recognition framework.
Convention Paper 8637 (Purchase now)
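To make the notion of a chroma feature in P13-4 concrete, the sketch below folds the energy of spectral bins onto the twelve pitch classes. It is a generic chroma computation under simple assumptions (equal temperament, A4 = 440 Hz, bin centre frequencies supplied by the caller); it does not model the encoder's perceptual enhancements or block switching, and all names are hypothetical.

```python
import math


def pitch_class(freq_hz, ref_a4=440.0):
    """Map a frequency to a pitch class 0..11, where 0 is C and 9 is A."""
    midi_note = 69.0 + 12.0 * math.log2(freq_hz / ref_a4)
    return int(round(midi_note)) % 12


def chroma_from_spectrum(bin_freqs_hz, magnitudes, fmin=55.0, fmax=5000.0):
    """Accumulate bin energy per pitch class for bins inside [fmin, fmax]."""
    chroma = [0.0] * 12
    for freq, mag in zip(bin_freqs_hz, magnitudes):
        if fmin <= freq <= fmax:
            chroma[pitch_class(freq)] += mag * mag
    return chroma
```

With coarse frequency resolution, neighbouring semitones fall into the same bin at low frequencies, which is the difficulty the abstract refers to for short blocks.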
P13-5 Hum Removal Filters: Overview and Analysis—Matthias Brandt, Jörg Bitzer, Jade University of Applied Sciences - Oldenburg, Germany
In this contribution we analyze different methods for removing sinusoidal disturbances from audio recordings. In order to protect the desired signal, high frequency selectivity of the used filters is necessary. However, due to the time-bandwidth uncertainty principle, high frequency selectivity brings about long impulse responses. This can result in audibly resonating filters, causing artifacts in the output signal. Thus, the choice of the optimal algorithm is a compromise between frequency selectivity and acceptable time domain behavior. In this context, different filter structures and algorithms have different characteristics. To investigate their influence on the hum disturbance and the desired signal, we have evaluated three methods using objective error measures to illustrate advantages and drawbacks of the individual approaches.
Convention Paper 8638 (Purchase now)
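The trade-off described in P13-5 between frequency selectivity and time-domain behavior can be illustrated with a basic second-order IIR notch. This is a textbook structure chosen for illustration, not necessarily one of the three methods evaluated in the paper: the zeros sit on the unit circle at the hum frequency and the poles at radius r just inside it. The names, the unnormalized passband gain, and the 60 dB ring-time definition are assumptions of this sketch.

```python
import math


def notch_coefficients(f0, fs, r):
    """Biquad notch at f0 Hz: zeros on the unit circle, poles at radius r < 1."""
    w0 = 2.0 * math.pi * f0 / fs
    b = [1.0, -2.0 * math.cos(w0), 1.0]
    a = [1.0, -2.0 * r * math.cos(w0), r * r]
    return b, a  # passband gain is close to, but not exactly, unity


def filter_signal(b, a, x):
    """Direct-form I biquad filtering of the sample list x."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def ring_time_seconds(r, fs, decay_db=60.0):
    """Time for the pole envelope r**n to decay by decay_db."""
    return decay_db / (-20.0 * math.log10(r)) / fs
```

Moving r closer to 1 narrows the notch (roughly (1 - r) * fs / pi Hz wide) and so protects more of the desired signal, but `ring_time_seconds` grows accordingly: at fs = 8000 Hz the decay takes about 0.09 s for r = 0.99 and about 0.86 s for r = 0.999.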
P14 - Listening Tests: Part 2
Friday, April 27, 16:30 — 18:00 (Room: Lehar)
Chair:
Thomas Sporer
P14-1 Determining the Threshold of Acceptability for an Interfering Audio Program—Jon Francombe, Russell Mason, Martin Dewhirst, University of Surrey - Guildford, Surrey, UK; Søren Bech, Bang & Olufsen - Struer, Denmark
An experiment was performed in order to establish the threshold of acceptability for an interfering audio program on a target audio program, varying the following physical parameters: target program, interferer program, interferer location, interferer spectrum, and road noise level. Factors were varied in three levels in a Box-Behnken fractional factorial design. The experiment was performed in three scenarios: information gathering, entertainment, and reading/working. Nine listeners performed a method of adjustment task to determine the threshold values. Produced thresholds were similar in the information and entertainment scenarios; however, there were significant differences between subjects, and factor levels also had a significant effect: interferer program was the most important factor across the three scenarios, while interferer location was the least important.
Convention Paper 8639 (Purchase now)
P14-2 Signal Processing Framework for Virtual Headphone Listening Tests in a Noisy Environment—Jussi Rämö, Vesa Välimäki, Aalto University - Espoo, Finland
A signal processing framework is introduced to enable parallel evaluation of headphones in a virtual listening test. It is otherwise impractical to conduct a blind comparison of several headphones. The ambient noise isolation capability of headphones has become an important design feature, since the mobile usage of earphones takes place in noisy listening environments. Therefore, the proposed signal processing framework allows a noise signal to be fed through a filter simulating the ambient sound isolation at the same time when music is played. This enables the simultaneous evaluation of the timbre and background noise characteristics, which together define the total headphone listening experience. Methods to design FIR filters for compensating the reference headphone response and for simulating the frequency response and isolation curve of the headphones to be tested are presented. Furthermore, a real-time test environment implemented using Matlab and Playrec is described.
Convention Paper 8640 (Purchase now)
P14-3 Perceptual Evaluation of Stochastic-Event-Based Percussive Timbres for Use in Statistical Sonification—William Martens, Mark McKinnon-Bassett, Densil A. Cabrera, University of Sydney - Sydney, NSW, Australia
The results of statistical data analysis have typically been presented using visual displays, but the sonification of data for auditory display, particularly using sound varying along realistic timbral dimensions, can offer an attractive alternative means for rendering such results. It was hypothesized that stochastic-event-based percussive timbres could be useful in communicating the details of statistical data, and so a preliminary study was designed to investigate the use of these timbres in data sonification. This study examined the ability of listeners to estimate the variation in a physical parameter for stimuli selected from a set of recorded percussive events. Specifically, two experiments were executed to determine whether listeners are generally able to estimate the number of small pellets that were present inside a container based upon the sounds that were made when the container was shaken a few times. The experimental results showed that a 6-dB trial-to-trial variation in reproduction level had no significant effect on obtained estimates, whereas variation in spectral energy distribution did significantly affect estimates of the number of small pellets in the shaken container. While the capacity to discriminate between sonification system outputs has been established, investigation of system effectiveness in applications remains to be done.
Convention Paper 8642 (Purchase now)
P15 - Spatial Audio: Part 1
Saturday, April 28, 09:00 — 11:30 (Room: Lehar)
Chair:
Christof Faller
P15-1 Analysis on Error Caused by Multi-Scattering of Multiple Sound Sources in HRTF Measurement—Guangzheng Yu, Bosun Xie, Zewei Chen, Yu Liu, South China University of Technology - Guangzhou, China
A model consisting of two pulsating spherical sound sources and a rigid-spherical head is proposed to evaluate the error caused by multi-scattering of multiple sound sources in HRTF measurement. The results indicate that the ipsilateral error below 20 kHz caused by multi-scattering is within ± 1 dB when the radius of sound sources does not exceed 0.025 m, the source distance to head center is not less than 0.5 m, and the angular interval between the two adjacent sources is not less than 25 degrees. This accuracy basically satisfies the requirement of ipsilateral HRTF measurements. For improving the accuracy in contralateral HRTF measurement, some sound absorption treatments on the source surface are necessary.
Convention Paper 8643 (Purchase now)
P15-2 Personalization of Headphone Spatialization Based on the Relative Localization Error in an Auditory Gaming Interface—Aki Härmä, Ralph van Dinther, Thomas Svedström, Munhum Park, Jeroen Koppens, Philips Research Europe - Eindhoven, The Netherlands
In binaural sound reproduction applications using head-related transfer functions (HRTFs), it is beneficial that the properties of the HRTFs correspond to the personal characteristics of the real HRTFs of the user. In this paper we propose a method to choose HRTFs using a relative localization test. This allows us to select the best HRTFs using a simple auditory interface. It is possible to design the HRTF personalization interface in a consumer device as an auditory game where the task of the user is to place sound objects in relation to each other. Two different interfaces are compared in a listening test. The results of the tests reported in the current paper are mixed and do not give a conclusive picture of the performance of the proposed system; however, they do give interesting insights into the properties of binaural listening.
Convention Paper 8644 (Purchase now)
P15-3 Robustness of a Mixed-Order Ambisonics Microphone Array for Sound Field Reproduction—Marton Marschall, Sylvain Favrot, Technical University of Denmark - Lyngby, Denmark; Jörg Buchholz, National Acoustic Laboratories - Chatswood, Australia
Spherical microphone arrays can be used to capture and reproduce the spatial characteristics of acoustic scenes. A mixed-order Ambisonics (MOA) approach was recently proposed to improve the horizontal spatial resolution of microphone arrays with a given number of transducers. In this paper the performance and robustness of an MOA array to variations in microphone characteristics as well as self-noise were investigated. Two array processing strategies were evaluated. Results showed that the expected performance benefits of MOA are achieved at high frequencies, and that robustness to various errors was similar to that of HOA arrays with both strategies. The approach based on minimizing the error of the reproduced spherical harmonic functions showed better performance at high frequencies for the MOA layout.
Convention Paper 8645 (Purchase now)
P15-4 An Algorithm for Efficiently Synthesizing Multiple Near-Field Virtual Sources in Dynamic Virtual Auditory Display—Bosun Xie, Chengyun Zhang, South China University of Technology - Guangzhou, China
An algorithm for efficiently synthesizing multiple near-field virtual sources in dynamic virtual auditory display (VAD) is proposed. Applying the method of principal component analysis, a set of measured near-field head-related impulse responses (HRIRs) for a KEMAR manikin at various source directions and distances is decomposed into a weighted sum of 15 time-domain basis functions along with a mean time-domain function, in which the time-independent weights represent the location dependence of the HRIRs. Accordingly, synthesis of multiple virtual sources at various locations is implemented by a common bank of 16 filters representing the time-domain basis functions and the mean time-domain function, in which the adjustable gains of the filters for each input stimulus (as well as an overall gain and delay for each input stimulus) control the intended source locations. The computational cost of the proposed algorithm is reduced compared with that of conventional ones. Psychoacoustic experiments via a dynamic VAD with head-tracking validate the performance of the proposed algorithm.
Convention Paper 8646 (Purchase now)
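The shared filter bank described in the P15-4 abstract relies on the linearity of convolution: because every HRIR is the mean function plus a weighted sum of the same basis functions, the source signals can be weighted and mixed first, and each filter then runs only once. The following is a minimal sketch of that idea, not the authors' implementation. The toy mean and basis functions, the weights, and all function names are hypothetical, only 2 basis functions are used instead of 15, and the per-source overall gain and delay mentioned in the abstract are omitted.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

# Hypothetical toy data: a mean function and 2 basis functions
# (the abstract describes 15 basis functions obtained by PCA).
MEAN = [4, 2, 1, 0]
BASIS = [[1, -1, 0, 0], [0, 1, -2, 1]]

def hrir_for(weights):
    """Rebuild one HRIR as mean + weighted sum of the basis functions."""
    return [m + sum(w * b[t] for w, b in zip(weights, BASIS))
            for t, m in enumerate(MEAN)]

def render_per_source(sources):
    """Reference path: one full HRIR convolution per source, then sum."""
    out = [0] * (len(sources[0][0]) + len(MEAN) - 1)
    for signal, weights in sources:
        for t, v in enumerate(convolve(signal, hrir_for(weights))):
            out[t] += v
    return out

def render_shared_bank(sources):
    """Shared path: weight and mix the inputs, then run each filter once."""
    length = len(sources[0][0])
    # Input to the mean filter: plain sum of all source signals.
    mean_in = [sum(sig[t] for sig, _ in sources) for t in range(length)]
    out = convolve(mean_in, MEAN)
    for k, basis_k in enumerate(BASIS):
        # Input to basis filter k: sources mixed with their k-th weights.
        mixed = [sum(w[k] * sig[t] for sig, w in sources) for t in range(length)]
        for t, v in enumerate(convolve(mixed, basis_k)):
            out[t] += v
    return out

# Each source is (signal, location-dependent weights); all signals share a length.
sources = [([1, 0, 2, -1], [3, 1]), ([0, 5, 1, 1], [-2, 4]), ([2, 2, 0, 1], [0, -1])]
print(render_shared_bank(sources) == render_per_source(sources))
```

In this sketch the reference path needs one convolution per source, while the shared path needs one convolution per filter regardless of how many sources are mixed in.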
P15-5 Scalable Coding of Three-Dimensional Multichannel Sound—Design of Conversion Matrix and Modeling of Unmasking Phenomenon—Akio Ando, NHK Science and Technical Research Laboratories - Setagaya, Tokyo, Japan, and Tokyo Institute of Technology, Meguro, Tokyo, Japan
We propose two methods for coding and transmitting three-dimensional multichannel sound signals: scalable coding and transmission, and modeling of the quantization error. The first method converts N-channel sound signals into M-channel basic signals and (N-M)-channel additional signals using a matrix operation. The matrix is trained by simulated annealing to minimize its condition number and the energy of the additional signals. An unmasking artifact may occur when the N-channel signals are restored from the decoded signals using the inverse matrix. The second method estimates the quantization error signals by polynomial approximation of the decoded signals. Experimental results showed that the combination of both methods could realize a 1.2 Mbps scalable transmission of 22-channel sound without notable sound degradation.
Convention Paper 8647 (Purchase now)
P16 - High Resolution and Low Bit Rate
Saturday, April 28, 09:00 — 12:30 (Room: Liszt)
Chair:
Vesa Välimäki
P16-1 Time Domain Performance of Decimation Filter Architectures for High Resolution Sigma-Delta Analog to Digital Conversion—Yonghao Wang, Queen Mary University of London - London, UK, Birmingham City University, Birmingham, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
We present the results of a comparison of different decimation architectures for high resolution sigma delta analog to digital conversion in terms of passband, transition band performance, simulated signal to noise ratio, and computational cost. In particular, we focus on the comparison of time domain group delay response of different filter architectures including classic multistage FIR, cascaded integrator-comb (CIC) with FIR compensation filters, particularly multistage polyphase IIR filter, cascaded halfband minimum phase FIR filter, and multistage minimum phase FIR filter designs. The analysis shows that the multistage minimum phase FIR filter and multistage polyphase IIR filter are most promising for low group delay audio applications.
Convention Paper 8648 (Purchase now)
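As background for one of the architectures compared in P16-1, a cascaded integrator-comb (CIC) decimator can be sketched in a few lines. This is only an illustration of the general CIC structure, not the filters evaluated in the paper: the function name is hypothetical, the differential delay is assumed to be 1, and the FIR compensation stage mentioned in the abstract is left out.

```python
def cic_decimate(samples, rate, stages):
    """CIC decimator sketch: `stages` integrators at the input rate,
    downsampling by `rate`, then `stages` combs (differential delay 1)."""
    integrators = [0] * stages
    comb_prev = [0] * stages
    out = []
    for i, x in enumerate(samples):
        acc = x
        # Integrator cascade runs on every input sample.
        for s in range(stages):
            integrators[s] += acc
            acc = integrators[s]
        # Keep every `rate`-th sample and pass it through the comb cascade.
        if (i + 1) % rate == 0:
            for s in range(stages):
                acc, comb_prev[s] = acc - comb_prev[s], acc
            out.append(acc)
    return out

# A constant input settles at the DC gain rate**stages (here 4**2 = 16).
print(cic_decimate([1] * 16, 4, 2))
```

The structure needs only additions and subtractions, which is why it is attractive as the first decimation stage. Python integers do not overflow, whereas a fixed-point implementation would have to size its registers for the gain of `rate**stages`.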
P16-2 A Delta-Sigma Modulator Using Dual NTF for 1-Bit Digital Switching Amplifier—Jungmin Choi, Jaeyong Cho, Haekwang Park, Samsung Electronics - Suwon, Korea
In this paper a fifth-order single-loop single-bit delta-sigma modulator (DSM), constructed in cascade-of-integrators, feedback (CIFB) form, is proposed for a 1-bit digital audio switching amplifier. A high-order DSM can achieve a high signal-to-noise ratio (SNR), but it is prone to oscillation. To achieve high SNR and improve the stability of the modulator over a large input range, we propose a DSM composed of dual noise transfer functions (NTFs). One is a high-SNR mode that maximizes the SNR of the DSM, and the other is a stable mode that enhances its stability. The proposed architecture is simulated at the register transfer level (RTL) and implemented on an FPGA board.
Convention Paper 8649 (Purchase now)
P16-3 9 Years HE AAC—Technical Challenges Using an Open Standard in Real-World Applications—Martin Wolters, Dolby Germany GmbH - Nuremberg, Germany; Gregory McGarry, Dolby Australia Pty Ltd., - Sydney, Australia; Andreas Schneider, Robin Thesing, Dolby Germany GmbH - Nuremberg, Germany
The technical work on creating the MPEG HE AAC standard was finished nine years ago. Since then the format has become very popular in specific markets and devices such as PCs and mobile phones mainly due to its high-compression efficiency. However, creating a reliable eco-system based on this open standard remains a technical challenge. In this paper results of several compatibility tests, which were conducted over the last two years on both mobile phones and broadcast receivers, are presented. The problems encountered and recommended solutions are described.
Convention Paper 8650 (Purchase now)
P16-4 Subjective Tests on Audio Mix Dedicated to MP3 Coding—Szymon Piotrowski, AGH University of Science and Technology - Krakow, Poland; Magdalena Plewa, Gdansk University of Technology - Gdansk, Poland
Over the past years the Internet has become very popular as a means of distributing audio. MP3-coded audio is present on the Internet, on a bus, and in broadcast. Sound engineers agree that there can often be a lack of control over the downstream processing that is applied to the final material. The aim of the presented work is to compare audio mixes dedicated to the CD format and then converted to MP3 with MP3-dedicated productions, and to evaluate them.
Convention Paper 8651 (Purchase now)
P16-5 New Enhancements for Improved Image Quality and Channel Separation in the Immersive Sound Field Rendition (ISR) Parametric Multichannel Audio Coding System—Hari Om Aggrawal, ATC Labs - Noida, India; Deepen Sinha, ATC Labs - NJ, USA
Consumer audio applications such as satellite broadcasts, HDTV, multichannel audio streaming, gaming, and playback systems are highlighting newer challenges in low-bit-rate parametric multichannel audio coding. This paper describes the continuation of our research related to the Immersive Sound Field Rendition (ISR) parametric multichannel encoding system. We focus on the recent enhancements for the surround and center channel generation components of the ISR system. The emphasis is on improving the fidelity and quality of reconstructed 5/5.1-channel audio so that it achieves a level of transparency desirable for high-end applications. Furthermore, we attempt to improve the robustness of the coding scheme to various difficult signals and listening environments by reducing inter-channel leakage to a minimum. We describe challenging cases and various algorithmic improvements to the ISR algorithm that address them, and also discuss the subjective impact of these improvements.
Convention Paper 8652 (Purchase now)
P16-6 Novel Decimation-Whitening Filter in Spectral Band Replication—Han-Wen Hsu, Chi-Min Liu, National Chiao Tung University - Hsinchu, Taiwan
MPEG-4 high-efficiency advanced audio coding (HE-AAC) has adopted spectral band replication (SBR) to efficiently compress high-frequency parts of the audio. In SBR, linear prediction is applied to low-frequency subbands to clip the tonal components and smooth the associated spectrum for replication to high-frequency bands. Such a process is referred to as whitening filtering. In SBR, to avoid the alias artifact from spectral adjustment, a complex filterbank is adopted instead of a real filterbank. For the QMF subbands, this paper shows that the linear prediction defined in the SBR standard results in predictive bias. A new whitening filter, called the decimation-whitening filter, is proposed to eliminate the predictive bias and provide advantages in terms of noise-to-signal ratio measure, frequency resolution, energy leakage, and computational complexity for SBR.
Convention Paper 8653 (Purchase now)
P16-7 MPEG Unified Speech and Audio Coding—The ISO/MPEG Standard for High-Efficiency Audio Coding of All Content Types—Max Neuendorf, Markus Multrus, Nikolaus Rettelbach, Guillaume Fuchs, Julien Robilliard, Jérémie Lecomte, Stephan Wilde, Stefan Bayer, Sascha Disch, Christian Helmrich, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Roch Lefebvre, Philippe Gournay, Bruno Bessette, Jimmy Lapierre, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Kristofer Kjörling, Heiko Purnhagen, Lars Villemoes, Dolby Sweden AB - Stockholm, Sweden; Werner Oomen, Erik Schuijers, Philips Research Laboratories - Eindhoven, The Netherlands; Kei Kikuiri, NTT DOCOMO, INC. - Yokosuka, Kanagawa, Japan; Toru Chinen, Sony Corporation - Shinagawa, Tokyo, Japan; Takeshi Norimatsu, Chong Kok Seng, Panasonic Corporation; Eumi Oh, Miyoung Kim, Samsung Electronics - Suwon, Korea; Schuyler Quackenbush, Audio Research Labs - Scotch Plains, NJ, USA; Bernhard Grill, Fraunhofer IIS, Erlangen, Germany
In early 2012 the ISO/IEC JTC1/SC29/WG11 (MPEG) finalized the new MPEG-D Unified Speech and Audio Coding standard. The new codec brings together the previously separated worlds of general audio coding and speech coding. It does so by integrating elements from audio coding and speech coding into a unified system. The present publication outlines all aspects of this standardization effort, starting with the history and motivation of the MPEG work item, describing all technical features of the final system, and further discussing listening test results and performance numbers that show the advantages of the new system over current state-of-the-art codecs.
Convention Paper 8654 (Purchase now)
P17 - Audio Effects
Saturday, April 28, 11:30 — 13:00 (Room: Lehar)
Chair:
Christof Faller
P17-1 Amplitude Manipulation for Perceived Movement in Depth—Sonia Wilkie, Tony Stockman, Joshua D. Reiss, Queen Mary University of London - London, UK
The presentation of objects moving in depth toward the viewer (looming) is a technique used in film (particularly those in 3-D) to assist in drawing the viewer into the created world. The sounds that accompany these looming objects can affect the extent to which a viewer can perceptually immerse within the multidimensional world and interact with moving objects. However, the extent to which sound parameters should be manipulated remains unclear. For example, amplitude, spectral components, reverb, and spatialization can all be altered, but the degree of their alteration and the resulting perception need further investigation. This paper presents the results from an investigation into one of the sound parameters used as an audio cue in looming scenes by the film industry, namely amplitude, reporting the degree and slope of its manipulation.
Convention Paper 8655 (Purchase now)
P17-2 Virtual 5.1 Channel Reproduction of Stereo Sound for Mobile Devices—Kangeun Lee, Changyong Son, Dohyung Kim, Shihwa Lee, Samsung Advanced Institute of Technology - Suwon, Korea
With rapid development in mobile devices, consumer demand for a premium sound experience is growing. In this paper a method for 5.1 channel upmixing is introduced. First, the primary and ambience components are separated and the primary signal is decomposed into the standard 5.1 channel directions. Since all separated sources should appear in the upmixed output, we designed a new masking scheme for the panning coefficients. In order to reproduce the 5.1 channel sound field in an earphone or headset, the decomposed multichannel signal is virtually rendered by means of HRTFs. The proposed method was compared with conventional upmixing methods and demonstrated a better spatial image and better panning effects with very low complexity requirements, which allows easy implementation on a wide variety of platforms.
Convention Paper 8656 (Purchase now)
P17-3 From Short- to Long-Range Signal Tunneling—Alexander Carôt, Hochschule Anhalt - Köthen/Anhalt, Germany; Horst Aichmann, Agilent Technologies - Kronberg, Germany
Research results in superluminal pulse transmission indicate propagation with velocities faster than the speed of light. This has so far been considered a short-range effect to be applied within distances of several centimeters. Based on these results and the corresponding theory of quantum tunneling the authors revisit such experiments with distances of several meters. They show that long-range superluminal signal transmission is possible and that it is only restricted by the actual signal bandwidth.
Convention Paper 8657 (Purchase now)
P18 - Education and Human Factors; Applications
Saturday, April 28, 13:00 — 14:30 (Room: Foyer)
P18-1 Optimizing Teaching Room Acoustics: A Comparison of Minor Structural Modifications to Dereverberation Based on Smoothed Responses—Panagiotis Hatziantoniou, University of Patras - Patras, Greece; Nicolas-Alexander Tatlas, Stelios Potirakis, Technological Education Institute of Piraeus - Aigaleo-Athens, Greece
In this work a comparison between traditional acoustical treatment such as building material substitution and digital room acoustics dereverberation is presented for teaching rooms. Measured responses for a number of listening positions for two rooms are shown as well as relevant parameters namely T30, EDT, C-50, and STI. Corresponding values are calculated by employing a simulation model in order to verify its accuracy; minor changes are then introduced to the model aiming to improve speech intelligibility and a new set of parameters is obtained. Finally, dereverberation achieved from inversion of appropriately modified measured room responses based on Complex Smoothing is employed and the acoustical parameters from the filtered impulse responses are derived.
Convention Paper 8658 (Purchase now)
P18-2 Designing an Audio Engineer's User Interface for Microphone Arrays—Stefan Weigand, Technicolor, Research and Innovation - Hannover, Germany, University of Applied Science (HAW), Hamburg, Germany; Thomas Görne, University of Applied Science (HAW) - Hamburg, Germany; Johann-Markus Batke, Technicolor, Research and Innovation - Hannover, Germany
Microphone arrays are rarely used in artistic recordings, despite the benefits they offer. We think this is due to a lack of user interfaces that consider audio engineers' needs and enable them to address the features in an easy and suitable way. This paper contributes to solving this problem by outlining guidelines for such interfaces. A graphical user interface (GUI) for microphone arrays employing Higher Order Ambisonics (HOA), incorporating the audio engineers' aims and expectations, has been developed by analyzing their common activities. The presented solution offers three operation modes, covering the most frequent tasks in (professional) audio productions, thus making it more likely to engage audio engineers in using microphone arrays in the future.
Convention Paper 8659 (Purchase now)
P18-3 User Interface Evaluation for Discrete Sound Placement in Film and TV Post-Production—Braham Hughes, Jonathan Wakefield, University of Huddersfield - Huddersfield, UK
This paper describes initial experiments to evaluate the effectiveness of different 3-D input interfaces combined with visual feedback for discrete sound placement in film and TV post-production. The experiments required the user to control the 3-D position of a sound object for a moving target object within a video clip on screen using a range of physical interfaces with and without visual feedback. Inclusion of visual feedback had a statistically significant impact on the accuracy of the tracking of the target object. The Wii remote controller appeared to perform the best in the tests and in the user preference ranking. The traditional desk-based input method performed worst in all tests and the user preference ranking.
Convention Paper 8660 (Purchase now)
P18-4 KnuckleTap—Exploring the Possibilities of Audio Input in a Mobile Rhythmic Notepad Application—Julian Rubisch, breakingwav.es - Vienna, Austria; Michael Jaksche, University of Applied Sciences - St. Poelten, Austria
Apart from some significant contributions in the scientific community and a few notable product innovations, audio input for musical interaction purposes on mobile devices has until now been widely neglected as a control parameter. This situation does not correspond to the fact that musical ideas are often vocalized and developed by musicians using their voice or other sounding objects, nor does it take advantage of contemporary mobile devices' most rigid and reliable sensor—the microphone. As an example case exploiting these possibilities, we conceived a notepad application to track rhythmic ideas by recording taps on a surface with a smartphone's built-in microphone, refined by subsequent detection of onsets, clustering, instantaneous rearranging of the detected events, and export capabilities.
Convention Paper 8661 (Purchase now)
P18-5 An Optical System to Track Azimuth Head Rotations for Use in Binaural Listening Tests of Automotive Audio Systems—Anthony Price, Bang & Olufsen a/s - Struer, Denmark, presently at University of Surrey, Guildford, Surrey, UK
Binaural technology is used to capture elements of an automotive audio system and reproduce them over headphones. This requires the tracking of azimuth head rotations of listening test participants in order to assist source localization. The parameters and faults of the currently implemented system are discussed and a new method of tracking azimuth head rotations is described. The system is tested, implemented, and found to have an error within 0.26 degrees. The potential for its further development, and the development of the field, is discussed.
Convention Paper 8662 (Purchase now)
P18-6 Investigation of Salient Audio-Features for Pattern-Based Semantic Content Analysis of Radio Productions—Rigas Kotsakis, George Kalliris, Charalampos Dimoulas, Aristotle University of Thessaloniki - Thessaloniki, Greece
The paper focuses on the investigation of salient audio features for pattern-based semantic analysis of radio programs. Most “news and music” radio programs have many structural similarities with respect to the appearance of different content types. Speech and music are continuously interchanged and overlapped, whereas specific speakers and voice patterns are more important to recognize. Recent research showed that various taxonomies and hierarchical classification schemes can be effectively deployed in combination with supervised and unsupervised training for semantic audio content analysis. Undoubtedly, audio feature extraction and selection is very important for the success of the finally trained expert system. The current paper employs feature ranking algorithms, investigating audio feature saliency in various classification taxonomies of radio production content.
Convention Paper 8663 (Purchase now)
P18-7 Listeners Who Have Low Hearing Thresholds Do Not Perform Better in Difficult Listening Tasks—Piotr Kleczkowski, Marek Pluta, Paulina Macura, Elzbieta Paczkowska, AGH University of Science and Technology - Krakow, Poland
A relationship between measures of hearing acuity and performance in listening tasks for normally hearing subjects has not been supported by solid evidence. In this work six one-parameter measures of hearing acuity, based on audiograms, were used to investigate whether a relationship between those measures and listeners’ performance existed. The quantifiable results of several listening tests were analyzed, using speech and non-speech stimuli. The results showed no correlation between hearing acuity and performance, thus demonstrating that hearing acuity should not be a critical factor in the choice of listeners.
Convention Paper 8641 (Purchase now)
P19 - Spatial Audio: Part 2
Saturday, April 28, 14:30 — 17:30 (Room: Lehar)
Chair:
Antti Kelloniemi
P19-1 Sound Field Reproduction Method in Spatio-Temporal Frequency Domain Considering Directivity of Loudspeakers—Shoichi Koyama, Ken'ichi Furuya, Yusuke Hiwasaki, Yoichi Haneda, NTT Cyber Space Laboratories, NTT Corporation - Musashino-shi, Tokyo, Japan
A method for transforming received signals of a microphone array into driving signals of a loudspeaker array for sound field reproduction is needed to achieve real-time sound field transmission systems from the far-end to the near-end. We recently proposed a transform method using planar or linear microphone and loudspeaker arrays in the spatio-temporal frequency domain, which is more efficient than conventional methods based on a least squares algorithm. In this method, the directivity of loudspeakers in the array is assumed to be omnidirectional to derive the transform filter. However, the directivity of common loudspeakers is not always omnidirectional, especially at high frequencies. We therefore propose a transform method that takes into consideration the directivity of loudspeakers in the array, which is derived using analytical and numerical approaches. Numerical simulation results indicated that the accurately reproduced region of the proposed method was larger than that of the method with an omnidirectional assumption.
Convention Paper 8664 (Purchase now)
P19-2 Practical Applications of Chameleon Subwoofer Arrays—Adam J. Hill, Malcolm O. J. Hawksford, University of Essex - Colchester, Essex, UK
Spatiotemporal variations of the low-frequency response in a closed-space are predominantly caused by room-modes. Chameleon subwoofer arrays (CSA) were developed to minimize this variance over a listening area using multiple independently-controllable source components and calibrated with one-time measurements. Although CSAs are ideally implemented using hybrid (multiple source component) subwoofers, they can alternatively be realized using conventional subwoofers. This capability is exploited in this work where various CSA configurations are tested using commercially-available subwoofers in a small-sized listening room. Spectral and temporal evaluation is performed using tone-burst and maximum length sequence (MLS) measurements. The systems are implemented with practicality in mind, keeping the number of subwoofers and calibration measurements to a minimum while maintaining correction benefits.
Convention Paper 8665 (Purchase now)
P19-3 Localization in Binaural Reproduction with Insert Headphones—Marko Hiipakka, Marko Takanen, Symeon Delikaris-Manias, Archontis Politis, Ville Pulkki, Aalto University - Espoo, Finland
Circumaural headphones are commonly used in binaural reproduction and it is well known that individual equalization of the headphones improves the quality of the reproduction. The suitability of insert headphones for binaural reproduction has not been studied, partly due to the lack of a commonly accepted individual equalization method for insert headphones. Recently, a method to estimate the frequency response evoked by insert headphones has been presented. In this paper the localization accuracy of test subjects is evaluated in binaural listening with insert headphones and high-quality circumaural headphones. The results show that the accuracy with inserts is similar to that with circumaural headphones when the recently proposed method is applied for equalization, which motivates their use in binaural reproduction.
Convention Paper 8666 (Purchase now)
P19-4 A Comparative Evaluation between Numerical Techniques for Implementing the Acoustic Diffusion Equation Model—Juan M. Navarro, Juan E. Noriega, San Antonio's Catholic University of Murcia - Guadalupe, Spain; Jose Escolano, University of Jaén - Linares, Spain; Jose J. Lopez, Universidad Politécnica de Valencia - Valencia, Spain
The acoustic diffusion equation model is an energy-based model that has been successfully applied in room acoustics over the past few years for predicting the late part of the decay. Early research usually used a finite element method to solve the diffusion equation model. Recently, an alternative implementation using finite difference methods has been proposed. A comparison between both numerical techniques could be helpful to clarify the pros and cons of each method. In this paper this evaluation is made by several simulations in a cubic shaped room. Both prediction accuracy and computational performance are compared using different absorption distributions. It is suggested that the finite difference implementation is less computationally intensive than the finite element method. Moreover, the values obtained in the simulations are at least as accurate as those of other geometrical models.
Convention Paper 8667 (Purchase now)
P19-5 A Bayesian Framework for Sound Source Localization—José Escolano, University of Jaén - Linares, Spain; Maximo Cobos, University of Valencia - Burjassot, Valencia, Spain; Jose M. Pérez-Lorenzo, University of Jaén - Linares, Spain; José J. López, Universidad Politécnica de Valencia - Valencia, Spain; Ning Xiang, Rensselaer Polytechnic Institute - Troy, NY, USA
The localization of sound sources, and particularly speech, has numerous applications in industry. This has motivated a continuous effort in developing robust direction-of-arrival detection algorithms. Time difference of arrival-based methods, and particularly generalized cross-correlation approaches, have been widely investigated in acoustic signal processing. Once a probability distribution function is obtained, indicating those directions of arrival with highest probability, the vast majority of methods have to assume a certain number of sound sources in order to process the information conveniently. In this paper a model selection based on a Bayesian framework is proposed in order to determine, in an unsupervised way, how many sound sources are estimated. Real measurements using two microphones are used to corroborate the proposed model.
Convention Paper 8668 (Purchase now)
P19-6 A Comparison of Modal versus Delay-and-Sum Beamforming in the Context of Data-Based Binaural Synthesis—Sascha Spors, Hagen Wierstorf, Matthias Geier, Deutsche Telekom Laboratories, Technische Universität Berlin - Berlin, Germany
Several approaches to data-based binaural synthesis have been published that capture a sound field by means of a spherical microphone array. The captured sound field is decomposed into plane waves that are then auralized using head-related transfer functions (HRTFs). The decomposition into plane waves is often based upon modal beamforming techniques that represent the captured sound field with respect to surface spherical harmonics. An efficient and numerically stable approximation to modal beamforming is the delay-and-sum technique. This paper compares these two beamforming techniques in the context of data-based binaural synthesis. Their frequency- and time-domain properties are investigated, as well as the perceptual properties of the resulting binaural synthesis according to a binaural model.
Convention Paper 8669 (Purchase now)
P20 - Transducers
Sunday, April 29, 09:00 — 11:30 (Room: Lehar)
Chair:
David Griesinger
P20-1 Loudspeaker for Low Frequency Signal Driven by Four Piezoelectric Ultrasonic Motors—Juro Ohga, Shibaura Institute of Technology/MIX Corporation - Kamakura, Japan; Ryousuke Suzuki, Keita Ishikawa, Chiba Institute of Technology - Narashino, Japan; Hirokazu Negishi, MIX Corporation - Yokosuka, Japan; Ikuo Oohira, I. Oohira and Associates - Yokohama, Japan; Kazuaki Maeda, TOA Corporation - Takarazuka, Japan; Hajime Kubota, Chiba Institute of Technology - Narashino, Japan
The authors have been developing a completely new direct-radiator loudspeaker construction that is driven by continuous revolution of piezoelectric ultrasonic motors. It converts continuous revolution of ultrasonic motors to reciprocal motion of a cone radiator. This loudspeaker shows almost flat phase frequency characteristics in the low-frequency region, because it has no resonance in that region. Therefore it is useful for radiation of the lowest frequency part of the audio signal. At this convention the authors are going to present a practical model of this loudspeaker driven by co-operation of four ultrasonic motors.
Convention Paper 8670 (Purchase now)
P20-2 Comprehensive Measurements of Head Influence on a Supercardioid Microphone—Hannes Pomberger, Franz Zotter, University of Music and Performing Arts Graz - Graz, Austria; Dominik Biba, AKG Acoustics GmbH - Vienna, Austria
The directional pickup pattern of microphones is designed to assist the audio engineer in avoiding acoustic feedback or interference from other sound sources. If the pattern deviates from the specified one, it is important for the audio engineer to know. This paper presents comprehensive measurements of a supercardioid microphone directivity under the influence of a dummy head. This dummy models the diffraction of a human talker or singer in front of the microphone. The discussed measurements collect information on a 15x15 degree grid in azimuth and elevation for different distances between the microphone and dummy head. Based on this data, we are able to discuss the influence of a singer’s body on the free field directivity of the supercardioid microphone, its directivity index, and its front-to-back random ratio in detail.
Convention Paper 8671 (Purchase now)
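For orientation, the two figures of merit named in the abstract above can be worked out for an idealized free-field pattern. The following is a minimal sketch, assuming the textbook first-order model g(θ) = a + (1 − a)·cos θ with a = (√3 − 1)/2 for a supercardioid; it does not use the paper's measured data, and the function name is hypothetical.

```python
import math

def first_order_metrics(a):
    """Directivity index and front-to-back random ratio, both in dB,
    for the ideal first-order pattern g(theta) = a + (1 - a) * cos(theta).
    This is an idealized model, not measured microphone data."""
    b = 1.0 - a
    # Closed-form integrals of g(theta)^2 * sin(theta) over each hemisphere.
    front = a * a + a * b + b * b / 3.0   # theta from 0 to 90 degrees
    back = a * a - a * b + b * b / 3.0    # theta from 90 to 180 degrees
    # Mean squared response over the full sphere (on-axis response is 1).
    mean_energy = (front + back) / 2.0
    directivity_index_db = -10.0 * math.log10(mean_energy)
    front_to_back_db = 10.0 * math.log10(front / back)
    return directivity_index_db, front_to_back_db

# Assumed supercardioid coefficient for the ideal first-order model.
A_SUPERCARDIOID = (math.sqrt(3.0) - 1.0) / 2.0
di_db, fb_db = first_order_metrics(A_SUPERCARDIOID)
print(f"DI = {di_db:.2f} dB, front-to-back random ratio = {fb_db:.2f} dB")
```

A diffracting head in front of the microphone would be expected to shift both values away from these free-field ideals, which is the kind of deviation the measurements are meant to quantify.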
P20-3 Simulation of a 4” Compression Driver Using a Fully Coupled Vibroacoustic Finite Element Analysis including Viscous and Thermal Losses—René Christensen, Ulrik Skov, iCapture ApS - Gadstrup, Denmark
A 4” JBL compression driver is simulated using finite element analysis (FEA). Unlike a conventional electrodynamic driver, a compression driver has a phase plug with slits in front of the diaphragm. The slits are acoustically narrow and the diaphragm is separated from the phase plug only by a thin gap, so an accurate model must include viscothermal effects to account for the losses associated with the narrow air gaps. Air domains, structural domains, and viscothermal domains are all fully coupled to ensure the proper continuity of their variables. Simulated results are compared to experimental measurements, and it is demonstrated how the viscothermal effects damp acoustic and structural modes.
Convention Paper 8672 (Purchase now)
P20-4 Design of Vented Boxes Using Current Feedback Filters—Juha Backman, Nokia Corporation - Espoo, Finland; Tim Mellow, MWT Acoustics - Farnham, Surrey, UK
A current feedback arrangement for a loudspeaker system is discussed that can be tuned to provide a predetermined frequency-response shape over a fairly wide and continuous range of box volumes. A conventional high-pass filter only allows the system to be tuned to give a particular frequency-response shape if the box volume is correct. The traditional arrangement is either a flat-response amplifier, a passive filter between the amplifier and loudspeaker, or an active filter before the amplifier. This paper discusses an alternative arrangement in which current feedback and a filter provide the desired amplifier output impedance and voltage transfer function characteristics. These interact directly with the complex load impedance of the loudspeaker. Practical realizations of the current feedback implementations are presented.
Convention Paper 8673 (Purchase now)
P20-5 Midrange Coloration Caused by Resonant Scattering in Loudspeakers—Juha Backman, Nokia Corporation - Espoo, Finland
One of the significant sources of midrange coloration in loudspeakers is the resonant scattering of the exterior sound field from ports, recesses, or horns. This paper discusses a computationally efficient model for such scattering, based on waveguide models for the acoustical elements (ports, etc.) and a mutual radiation impedance model for their coupling to the sound field generated by the drivers. This allows rapid evaluation of the effect of port placement and is suitable for numerical optimization of loudspeaker enclosure layouts.
Convention Paper 8674 (Purchase now)
P21 - Audio Equipment and Instrumentation
Sunday, April 29, 12:00 — 13:30 (Room: Foyer)
P21-1 Evaluation of Vibrating Sound Transducers with Glass Membrane Based on Measurements and Numerical Simulations—György Wersényi, Széchenyi István University - Győr, Hungary
In recent years manufacturers have introduced so-called "invisible sound" solutions. In-wall, surface-mount, and glass-mount versions of different vibrating transducers are commercially available. The entire surface becomes a loudspeaker delivering sound, and the frequency response is said to be equivalent to that of conventional diaphragm speakers. Furthermore, the sound is said to be omnidirectional at nearly all frequencies (60 Hz – 15 kHz) while channel separation is maintained. This paper presents measurement results for the SolidDrive SD1g transducer mounted on different glass surfaces, including vibration measurements and acoustic parameters. In addition, based on a numerical FEM model built in COMSOL, a comparison between measured and simulated results and an estimation of the transfer function and directional characteristics are presented.
Convention Paper 8675 (Purchase now)
P21-2 Voicecoil Inter-Turn Faults Modeling and Simulation—German Ruiz, J. A. Ortega, J. Hernández, UPC-Universitat Politecnica de Catalunya - Terrassa, Spain
The purpose of this paper is to present a new model for studying inter-turn short-circuit faults in a dynamic loudspeaker. The loudspeaker is modeled using a classical voice coil parametric model attached to a mechanical piston, and the equations are modified to take the voice coil inter-turn faults into account. The loudspeaker model is global and can work in both normal and fault conditions thanks to a fictitious resistance in the winding circuit. Various simulation results are presented, indicating the instant at which the fault occurs and its corresponding effect on the power spectral density. The model can serve as a step toward the development of fault detection and diagnosis algorithms.
Convention Paper 8676 (Purchase now)
P21-3 Headphone Selection for Binaural Synthesis with Blocked Auditory Canal Recording—Florian Völk, AG Technische Akustik, MMK, Technische Universität München - Munich, Germany
Binaural synthesis aims to elicit the hearing sensations of the reference scene by recreating the sound pressures at the eardrums, typically using headphones. If all transfer functions involved are approximated based on eardrum probe microphone or traditional artificial head measurements, the headphones have been shown not to influence the synthesis. It is also possible to achieve correct binaural synthesis with transfer functions measured at the entrances to the blocked auditory canals. In that case, the headphones may influence the results. In this paper a blocked auditory canal headphone selection criterion (HPC) for binaural synthesis is proposed. Further, a procedure is derived that allows the HPC to be evaluated for specific circumaural headphones based on four measurements using a specifically designed artificial head.
Convention Paper 8677 (Purchase now)
P21-4 A Low Latency Multichannel Audio Processing Evaluation Platform—Yonghao Wang, Queen Mary University of London - London, UK; Xiangyu Zhu, Hebei University of Science and Technology - Shijiazhuang, China; Qiang Fu, Shijiazhuang Mechanical Engineering College - Shijiazhuang, China
For live digital audio systems with high-resolution multichannel functionality, it is desirable to have accurate latency control and estimation over all stages of the digital audio processing chain. The evaluation system we designed supports 12-channel, 24-bit sigma-delta-based ADC/DAC and incorporates both a programmable FPGA and a digital signal processor. It can be used for testing and evaluating different ADC/DAC digital filter architectures, audio sample buffer subsystem designs, interrupt handling and scheduling, high-level audio processing algorithms, and other system factors that might affect latency. It can also estimate the synchronization and delay of multiple channels.
Convention Paper 8678 (Purchase now)
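As an illustration of the delay estimation mentioned in the abstract above, inter-channel delay is commonly estimated by cross-correlating a reference signal with its delayed copy. The following is a minimal sketch under stated assumptions: the 48 kHz sample rate, the test signal, and the function name are hypothetical, and this is a generic technique rather than the method implemented on the authors' platform.

```python
import random

def estimate_delay(reference, delayed, max_lag):
    """Return the lag in samples (0..max_lag) that maximizes the
    cross-correlation between `reference` and `delayed`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate the reference with the delayed signal shifted back by `lag`.
        score = sum(r * d for r, d in zip(reference, delayed[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

SAMPLE_RATE_HZ = 48000  # assumed sample rate, for illustration only
rng = random.Random(0)
# Hypothetical test signal: a short noise burst on the reference channel.
reference = [rng.uniform(-1.0, 1.0) for _ in range(200)]
# Second channel: the same burst arriving 37 samples later.
delayed = [0.0] * 37 + reference

lag_samples = estimate_delay(reference, delayed, max_lag=100)
latency_ms = 1000.0 * lag_samples / SAMPLE_RATE_HZ
print(f"estimated delay: {lag_samples} samples ({latency_ms:.3f} ms)")
```

A broadband test signal such as noise gives a sharp correlation peak; a pure tone would produce ambiguous peaks one period apart.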
P22 - Quality Evaluation
Sunday, April 29, 14:00 — 16:30 (Room: Lehar)
Chair:
György Wersényi
P22-1 Evaluating Spatial Congruency of 3-D Audio and Video Objects—Kristina Kunze, Judith Liebetrau, Thomas Korn, Fraunhofer Institute for Digital Media Technology, IDMT - Ilmenau, Germany
In this paper we demonstrate the evaluation of spatial congruency of object-based audio and 3-D video reproduction. Current developments in 3-D video representation make it possible to introduce a depth dimension. Furthermore, audio reproduction systems such as Wave Field Synthesis can reproduce the sound field of virtual sound sources at various positions in the room, including in front of or behind a video screen. When combining these technologies, audio objects can be placed at the positions of 3-D video objects. Subjective evaluations are needed to investigate the quality of such combinations. In our experiment we displaced the audio and video objects relative to each other by certain angles and evaluated the noticeable displacement angle. Displacements of more than 5° are noticeable and become annoying above 10°.
Convention Paper 8679 (Purchase now)
P22-2 Some New Evidence that Teenagers and College Students May Prefer Accurate Sound Reproduction—Sean Olive, Harman International Industries Inc. - Northridge, CA, USA
A group of 18 high school and 40 college students with different expertise in sound evaluation participated in two separate controlled listening tests that measured their preference choices between (1) music reproduced in MP3 (128 kbps) and lossless CD-quality file formats, and (2) music reproduced through four different consumer loudspeakers. As a group, the students preferred the CD-quality reproduction in 70% of the trials and preferred music reproduced through the most accurate, neutral loudspeaker. Critical listening experience was a significant factor in the listeners’ performance and preferences. Together, these tests provide some new evidence that both teenagers and college students can discern and appreciate a better quality of reproduced sound when given the opportunity to directly compare it against lower quality options.
Convention Paper 8683 (Purchase now)
P22-3 Evaluation of Cultural Similarity in Playlist Generation—Mariusz Kleć, Polish-Japanese Institute of Information Technology - Warsaw, Poland
Choosing appropriate songs that satisfy one’s needs is often frustrating, tiresome, and ineffective because of the growing number and size of music collections. Successive songs should fit the situation, our mood, or at least have common features. Consequently, there is a need to develop solutions that would enrich our experience of listening to music. In this paper musical similarity is studied at the cultural level in the playlist generation process. A program developed by the author for testing different playlists is also described. It is used to perform an experiment examining the quality of playlists created using a cultural similarity model.
Convention Paper 8681 (Purchase now)
P22-4 Toward an Unbiased Standard in Testing Laptop PC Audio Quality—Ravi Kondapalli, Ben-Zhen Sung, CCRMA, Stanford University - Stanford, CA, USA
For a rapidly growing population, laptop PCs and tablet devices have become a primary means of watching and listening to electronic media. Nonetheless, form factor restrictions and the prioritization of device functionality over audio quality have left a gap in the overall quality of the laptop listening experience. Multiple audio post-processing solutions from audio DSP developers aim to improve this. However, the quality of audio produced by pairing these post-processing solutions with PCs from a variety of manufacturers varies greatly. To date, there have been no blind and systematic comparisons of audio quality resulting from unique PC/post-processing algorithm pairings. Here we present an unbiased methodological approach for evaluating such combinations, looking at audio quality for three different commercially available post-processing solutions as implemented on laptop PCs from two different manufacturers.
Convention Paper 8682 (Purchase now)
P22-5 User Evaluation on Loudness Harmonization on the Web—Gerhard Spikofski, Peter Altendorf, Christian Hartmann, Institut für Rundfunktechnik - Munich, Germany
The problem of annoying loudness jumps between different TV and radio stations or programs has become generally known over the last decades. Since the publication of the recommendations ITU-R BS.1770-1 (in 2006) and EBU R 128 (in 2010), and the current version of the ITU recommendation, BS.1770-2 (2011), more and more tools have become available that allow loudness harmonization in broadcasting. The European research project NoTube integrates loudness harmonization into its research concepts in order to investigate the applicability of these tools in web environments, where the situation is even worse. The user evaluation of different loudness and loudness range adaptations presented in this paper was carried out online and thus considers real web conditions. In particular, the interdependence between parameters of the listening environment and the loudness/loudness range adaptations is highlighted based on the results of nearly 100 participants.
Convention Paper 8680 (Purchase now)
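The basic arithmetic behind loudness harmonization, as referred to in the abstract above, is a per-item gain offset toward a common target level. The following is a minimal sketch, assuming the integrated loudness of each item has already been measured with an ITU-R BS.1770 meter and using the −23 LUFS target level of EBU R 128; the item names and measured values are hypothetical, and the loudness range adaptation studied in the paper is not covered.

```python
TARGET_LUFS = -23.0  # target level recommended by EBU R 128

def normalization_gain_db(measured_lufs, target_lufs=TARGET_LUFS):
    """Gain in dB that moves an item from its measured integrated
    loudness to the target loudness."""
    return target_lufs - measured_lufs

def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

# Hypothetical measured integrated loudness values (LUFS), for illustration.
measured = {"station_a": -16.5, "station_b": -27.0, "web_clip": -11.0}

gains_db = {name: normalization_gain_db(lufs) for name, lufs in measured.items()}
for name, gain in gains_db.items():
    print(f"{name}: {gain:+.1f} dB (x{db_to_linear(gain):.3f})")
```

Positive gains, as for the quietest item here, can push peaks into clipping, so a practical tool would also check the true-peak level before applying them.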