AES Rome 2013
Paper Session Details
P1 - Education and Semantic Audio
Saturday, May 4, 10:30 — 12:30 (Sala Carducci)
Jörn Loviscach, University of Applied Sciences - Bielefeld, Germany
P1-1 Alternate Software and Pedagogical Strategies for Teaching Audio Technology to Classically Trained Music Students—Christopher J. Keyes, Hong Kong Baptist University - Kowloon, Hong Kong
Teaching audio and audio technology to music students brings as many challenges as rewards. Their vastly different scientific and technical backgrounds often require fundamentally different pedagogical strategies from other subjects in a music curriculum. The dissemination of information via lectures and/or lengthy technical readings is especially problematic. This paper addresses these challenges by presenting a new pedagogical approach centered on a free pedagogical software package: the Interactive Workshop for Audio Intelligence and Literacy (i-WAIL). When combined with inquiry-based learning approaches and projects based on each student's area of interest, it has so far proven highly effective with music students of widely varying prior abilities.
Convention Paper 8809 (Purchase now)
P1-2 Auditory-Visual Attention Stimulator—Adam Kupryjanow, Gdansk University of Technology - Gdansk, Poland; Lukasz Kosikowski, Gdansk University of Technology - Gdansk, Poland; Piotr Odya, Gdansk University of Technology - Gdansk, Poland; Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
A new approach to addressing irregularities in the formation of lateralization is proposed, with emphasis on the relationship between visual and auditory attention. In this approach hearing is stimulated using time-scale modified speech, and sight is stimulated by rendering the text of the currently heard speech. Moreover, the displayed text is modified using several techniques, e.g., zooming and highlighting. In the experimental part of the paper, results obtained for reading comprehension training are presented. It is shown that the proposed method can improve this skill in a group of children between the ages of 7 and 8 years.
Convention Paper 8810 (Purchase now)
P1-3 Evaluation of Acoustic Features for Music Emotion Recognition—Chris Baume, BBC Research and Development - London, UK
Classification of music by mood is a growing area of research with interesting applications, including navigation of large music collections. Mood classifiers are usually based on acoustic features extracted from the music, but often they are used without knowing which ones are most effective. This paper describes how 63 acoustic features were evaluated using 2,389 music tracks to determine their individual usefulness in mood classification, before using feature selection algorithms to find the optimum combination.
Convention Paper 8811 (Purchase now)
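As a brief illustration of the kind of feature selection the abstract mentions, the sketch below ranks and greedily selects features with a simple Fisher-style class-separability score. This is not the paper's method; the function name, the scoring criterion, and all parameters are illustrative assumptions.

```python
import numpy as np

def forward_select(X, y, k):
    """Greedy forward feature selection. Each candidate feature set is scored
    with a Fisher-style criterion (between-class vs. within-class variance),
    a lightweight stand-in for a full wrapper-based selection."""
    def score(cols):
        Z = X[:, cols]
        mu = Z.mean(axis=0)
        s_b = s_w = 0.0
        for c in np.unique(y):
            Zc = Z[y == c]
            s_b += len(Zc) * np.sum((Zc.mean(axis=0) - mu) ** 2)  # between-class
            s_w += np.sum((Zc - Zc.mean(axis=0)) ** 2)            # within-class
        return s_b / (s_w + 1e-12)
    chosen = []
    for _ in range(k):
        rest = [j for j in range(X.shape[1]) if j not in chosen]
        chosen.append(max(rest, key=lambda j: score(chosen + [j])))
    return chosen
```

On synthetic data where only one feature carries class information, the procedure picks that feature first.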
P1-4 Investigating Auditory Human-Machine Interaction: Analysis and Classification of Sounds Commonly Used by Consumer Devices—Konstantinos Drossos, Ionian University - Corfu, Greece; Rigas Kotsakis, Aristotle University of Thessaloniki - Thessaloniki, Greece; Panos Pappas, Technological Educational Institute of Ionian Islands - Lixouri, Greece; George M. Kalliris, Aristotle University of Thessaloniki - Thessaloniki, Greece; Andreas Floros, Ionian University - Corfu, Greece
Many common consumer devices use a short sound indication to declare various modes of their functionality, such as the start and the end of their operation. This is likely to result in an intuitive auditory human-machine interaction that imputes semantic content to the sounds used. In this paper we investigate sound patterns mapped to "Start" and "End" of operation manifestations and explore whether the perception of these semantics rests on users' prior auditory training or on sound patterns that naturally convey the appropriate information. To this aim, listening and machine learning tests were conducted. The obtained results indicate a strong relation between acoustic cues and semantics, with no prior knowledge needed for message conveyance.
Convention Paper 8812 (Purchase now)
P2 - Audio Signal Processing—Part 1
Saturday, May 4, 10:30 — 12:30 (Sala Foscolo)
Danilo Comminiello, Sapienza University of Rome - Rome, Italy
P2-1 Loudness Measurement of Multitrack Audio Content Using Modifications of ITU-R BS.1770—Pedro Duarte Pestana, Research Center for Science and Technology of the Arts, Portuguese Catholic University (CITAR-UCP) - Almada, Portugal; CEAUL-FCUL; Joshua D. Reiss, Queen Mary University of London - London, UK; Alvaro Barbosa, Research Center for Science and Technology of the Arts, Portuguese Catholic University (CITAR-UCP) - Lisbon, Portugal
The recent loudness measurement recommendations by the ITU and the EBU have gained widespread recognition in the broadcast community. The material they deal with is usually full-range mastered audio content, and their applicability to multitrack material is not yet clear. In the present work we investigate how well the evaluated perception of single-track loudness agrees with the measured value as defined by ITU-R BS.1770. We analyze the underlying features that may cause the disparity and propose some parameter alterations that might yield better results for multitrack material with minimal modification to the rating of broadcast content. The best parameter sets are then evaluated by a panel of experts in terms of how well they produce an equal-loudness multitrack mix and are shown to be significantly more successful.
Convention Paper 8813 (Purchase now)
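For readers unfamiliar with the baseline the paper modifies, a minimal sketch of a BS.1770-style block-loudness measurement follows. It is deliberately simplified: the K-weighting pre-filter, the 75%-overlapped blocks, and the relative gate of BS.1770-2/EBU R 128 are omitted, so values will differ from a compliant meter; the function name and constants shown are illustrative.

```python
import numpy as np

def bs1770_loudness(x, fs, block_s=0.4, abs_gate=-70.0):
    """Simplified BS.1770-style integrated loudness:
    mean-square energy over 400 ms blocks, absolute gating at -70 LKFS.
    K-weighting is omitted here for brevity."""
    n = int(block_s * fs)
    ms = [np.mean(x[i:i + n] ** 2) for i in range(0, len(x) - n + 1, n)]
    # keep only blocks above the absolute gate
    kept = [z for z in ms if -0.691 + 10 * np.log10(z + 1e-12) > abs_gate]
    if not kept:
        return float("-inf")
    return -0.691 + 10 * np.log10(np.mean(kept))
```

A full-scale sine (mean square 0.5) measures about -3.7 LKFS under this simplification, and attenuating the signal lowers the reading as expected.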
P2-2 Delayless Robust DPCM Audio Transmission for Digital Wireless Microphones—Florian Pflug, Technische Universität Braunschweig - Braunschweig, Germany; Tim Fingscheidt, Technische Universität Braunschweig - Braunschweig, Germany
The employment of digital wireless microphones in professional contexts requires ultra-low delay, strong robustness, and high audio quality. On the other hand, audio coding is required in order to comply with the restrictions on the bandwidth of the radio channel, making the resulting source-coded audio signal more vulnerable to channel distortions. Therefore, in this paper we present a transmission system for differential pulse-code modulated (DPCM) audio with receiver-sided soft-decision error concealment exploiting channel reliability information, explicit redundancy by simple delayless parity-check codes, and residual redundancy within the source-coded audio signal. Simulations on frequency-shift keying (FSK)-modulated channels with additive white Gaussian noise show considerable gains in audio quality compared to hard-decision decoding and soft-decision decoding only exploiting reliability information and 0th-order a priori knowledge.
Convention Paper 8814 (Purchase now)
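As background for readers, the DPCM core that the transmission system above builds on can be sketched in a few lines: a predictor tracks the signal and only the quantized prediction residual is transmitted. This is a generic first-order illustration, not the paper's codec; the predictor coefficient and step size are arbitrary choices.

```python
import numpy as np

def dpcm_encode(x, q_step=0.01, a=0.95):
    """First-order predictive DPCM: quantize the prediction residual.
    The encoder runs the same reconstruction as the decoder so both stay in sync."""
    codes, pred = [], 0.0
    for s in x:
        e = s - a * pred                    # prediction residual
        c = int(round(e / q_step))          # uniform residual quantizer
        codes.append(c)
        pred = a * pred + c * q_step        # decoder-matched reconstruction
    return codes

def dpcm_decode(codes, q_step=0.01, a=0.95):
    out, pred = [], 0.0
    for c in codes:
        pred = a * pred + c * q_step
        out.append(pred)
    return np.array(out)
```

Because encoder and decoder share the reconstruction loop, the per-sample error is bounded by half the quantizer step.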
P2-3 Violin Sound Computer Classification Based on Expert Knowledge—Adam Robak, Poznan University of Technology - Poznan, Poland; Ewa Lukasik, Poznan University of Technology - Poznan, Poland
The paper presents results of the analysis of violins recorded during the final stage of the international violin-makers competition held in Poznan in 2011. In the quest for attributes that are both efficient for machine learning and interpretable for human experts, we referred to the research of the violin acousticians Duennwald, Buen, and Fritz and calculated violin sound power in the frequency bands recommended by these researchers. The resulting features, obtained for the averaged spectra of the musical pieces played at the competition, were used for clustering and classification experiments. Results are discussed, and a notable experiment is presented in which the classifier assigns each analyzed violin to an instrument from the preceding violin-makers' competition (2001) and compares their rankings.
Convention Paper 8815 (Purchase now)
P2-4 A Finite Difference Method for the Excitation of a Digital Waveguide String Model—Leonardo Gabrielli, Università Politecnica delle Marche - Ancona, Italy; Luca Remaggi, Università Politecnica delle Marche - Ancona, Italy; Stefano Squartini, Università Politecnica delle Marche - Ancona, Italy; Vesa Välimäki, Aalto University - Espoo, Finland
With digital waveguide (DWG) modeling, a number of excitation methods have been proposed to feed the delay line properly. Generally speaking, these may be based on signal models fitting recorded samples, on excitation signals extracted from recorded samples, or on digital filter networks. While allowing for stable, computationally efficient sound emulation, they may be unable to emulate secondary effects generated by distributed physical interaction, e.g., between string and hammer. On the other hand, Finite Difference Time Domain (FDTD) models are more accurate in emulating the physical excitation mechanism, at the expense of a higher computational cost and a complex coefficient design to ensure numerical stability. In this paper a mixed model is proposed, composed of a two-step FDTD model, a commuted DWG, and an adaptor block joining the two sections. Properties of the model are provided, and computer results are given for the Clavinet tangent-string mechanism as an example application.
Convention Paper 8816 (Purchase now)
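To fix ideas, the DWG core that such excitation methods feed can be sketched with the classic Karplus-Strong structure: a delay line whose length sets the pitch, closed by a lossy averaging loop filter. This is only the generic waveguide skeleton under a noise-burst pluck, not the paper's FDTD hammer coupling; the function name and damping value are illustrative.

```python
import numpy as np

def waveguide_pluck(f0, fs, dur, damping=0.995):
    """Karplus-Strong-style digital waveguide string:
    delay line of length fs/f0 with a two-point averaging loop filter
    (lowpass plus loss), excited by a noise burst."""
    n = int(fs / f0)                             # delay-line length sets the pitch
    line = np.random.uniform(-1, 1, n)           # noise burst = pluck excitation
    out = np.empty(int(dur * fs))
    for i in range(len(out)):
        out[i] = line[0]
        avg = damping * 0.5 * (line[0] + line[1])  # loop filter
        line = np.append(line[1:], avg)
    return out
```

The output decays over time as energy is lost in the loop filter, mimicking a plucked string.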
P3 - Perception
Saturday, May 4, 14:15 — 17:15 (Sala Carducci)
Frank Melchior, BBC Research and Development - Salford, UK
P3-1 The Relation between Preferred TV Program Loudness, Screen Size, and Display Format—Ian Dash, Consultant - Marrickville, NSW, Australia; Todd Power, University of Sydney - Sydney, NSW, Australia; Densil A. Cabrera, University of Sydney - Sydney, NSW, Australia
The effect of television screen size and display format on preferred TV program loudness was investigated by listening tests using typical program material. While no significant influence of screen size or color level on preferred program loudness was observed, other preference patterns of interest related to soundtrack content type were observed.
Convention Paper 8817 (Purchase now)
P3-2 Vibration in Music Perception—Sebastian Merchel, Dresden University of Technology - Dresden, Germany; M. Ercan Altinsoy, Dresden University of Technology - Dresden, Germany
The coupled perception of sound and vibration is a well-known phenomenon during live pop or organ concerts. However, even during a symphonic concert in a classical hall, sound can excite perceivable vibrations on the surface of the body. This study analyzes the influence of audio-induced vibrations on the perceived quality of the concert experience. Therefore, sound and seat vibrations are controlled separately in an audio reproduction scenario. Because the correlation between sound and vibration is naturally strong, vibrations are generated from audio recordings using various approaches. Different parameters during this process (frequency and intensity modifications) are examined in relation to their perceptual consequences using psychophysical experiments. It can be concluded that vibrations play a significant role during the perception of music.
Convention Paper 8818 (Purchase now)
P3-3 An Assessment of Virtual Surround Sound Systems for Headphone Listening of 5.1 Multichannel Audio—Chris Pike, BBC Research and Development - Salford, York, UK; Frank Melchior, BBC Research and Development - Salford, UK
It is now common for broadcast signals to feature 5.1 surround sound. It is also increasingly common that audiences access broadcast content on portable devices using headphones. Binaural techniques can be applied to create a spatially enhanced headphone experience from surround sound content. This paper presents a subjective assessment of the sound quality of 12 state-of-the-art systems for converting 5.1 surround sound to a 2-channel signal for headphone listening. A multiple stimulus test was used with hidden reference and anchors; the reference stimulus was an ITU stereo down-mix. Dynamic binaural synthesis, based on individualized binaural room impulse response measurements and head orientation tracking, was also incorporated into the test. The experimental design and detailed analysis of the results are presented in this paper.
Convention Paper 8819 (Purchase now)
P3-4 Effect of Target Signal Envelope on Direction Discrimination in Spatially Complex Sound Scenarios—Olli Santala, Aalto University School of Electrical Engineering - Aalto, Finland; Marko Takanen, Aalto University School of Electrical Engineering - Aalto, Finland; Ville Pulkki, Aalto University - Aalto, Finland
The temporal envelope of a sound signal has been found to have an effect on localization. Whether this is valid for spatially complex scenarios was addressed by conducting a listening experiment in which a spatially distributed sound source consisted of a target between two interfering noise-like sound sources, all emitting sound simultaneously. All the signals were harmonic complex tones with components within 2 kHz–8.2 kHz and were presented using loudspeaker reproduction in an anechoic chamber. The phases of the harmonic tones of the target signal were altered, causing the envelope to change. The results indicated that prominent peaks in the envelope of the target signal aided in the discrimination of its direction inside the widely distributed sound source.
Convention Paper 8820 (Purchase now)
P3-5 A Framework for Adaptive Real-Time Loudness Control—Andrea Alemanno, Sapienza University of Rome - Rome, Italy; Alessandro Travaglini, Fox International Channels Italy - Guidonia Montecelio (RM), Italy; Simone Scardapane, Sapienza University of Rome - Rome, Italy; Danilo Comminiello, Sapienza University of Rome - Rome, Italy; Aurelio Uncini, Sapienza University of Rome - Rome, Italy
Over the last few years, loudness control has been one of the most frequently investigated topics in audio signal processing. In this paper we describe a framework designed to provide adaptive real-time loudness measurement and processing of audio files and streamed content reproduced by mobile players hosted in laptops, tablets, and mobile phones. The proposed method aims to improve the users' listening experience by normalizing the loudness level of the content in real time, while preserving the creative intent of the original soundtrack. The loudness measurement and adaptation are based on a customization of the High Efficiency Loudness Model algorithm described in AES Convention Paper 8612 ("HELM: High Efficiency Loudness Model for Broadcast Content," presented at the 132nd Convention, April 2012). Technical and subjective tests were performed to evaluate the performance of the proposed method. In addition, the way the subjective test was arranged offered the opportunity to gather information on the preferred target level of streamed and media files reproduced on portable devices.
Convention Paper 8821 (Purchase now)
P3-6 The Perception of Masked Sounds and Reverberation in 3-D vs. 2-D Playback Systems—Giulio Cengarle, Imm Sound S.A., a Dolby company - Barcelona, Spain; Alexandre Pereda, Fundació Barcelona Media - Barcelona, Spain
This paper presents studies on perceptual aspects of spatial audio and their dependency on the playback format. The first study regards the perception of sound in the presence of a masker in stereo, 5.1, and 3-D. Psychoacoustic tests show that the detection threshold improves with the spread of the masker, which justifies the claim that individual elements of dense soundtracks are more audible when they are distributed in a wider panorama. The second study indicates that the perception of the reverberation level does not depend on the playback format. The joint interpretation of these results justifies the claim that in 3-D sound engineers can use higher levels of reverberation without compromising the intelligibility of the sound sources.
Convention Paper 8822 (Purchase now)
P4 - Audio Signal Processing—Part 2
Saturday, May 4, 14:30 — 16:30 (Sala Foscolo)
Leonardo Gabrielli, Università Politecnica delle Marche - Ancona, Italy
P4-1 User-Driven Quality Enhancement for Audio Signal—Danilo Comminiello, Sapienza University of Rome - Rome, Italy; Simone Scardapane, Sapienza University of Rome - Rome, Italy; Michele Scarpiniti, Sapienza University of Rome - Rome, Italy; Aurelio Uncini, Sapienza University of Rome - Rome, Italy
Classical methods for audio and speech enhancement are often based on error-driven optimization strategies, such as the mean-square error minimization. However, these approaches do not always satisfy the quality requirements demanded by users of the system. In order to meet subjective specifications, we put forward the idea of a user-driven approach to audio enhancement through the inclusion in the optimization stage of an interactive evolutionary algorithm (IEA). In this way, performance of the system can be adapted to any user in a principled and systematic way, thus reflecting the desired subjective quality. Experiments in the context of echo cancellation support the proposed methodology, showing significant statistical advantage of the proposed framework with respect to classical approaches.
Convention Paper 8823 (Purchase now)
P4-2 Partial Spectral Flatness Measure for Tonality Estimation in a Filter Bank-Based Psychoacoustic Model for Perceptual Audio Coding—Armin Taghipour, International Audio Laboratories Erlangen - Erlangen, Germany; Maneesh Chandra Jaikumar, International Audio Laboratories Erlangen - Erlangen, Germany; Hochschule Rosenheim, University of Applied Science - Rosenheim, Germany; Bernd Edler, International Audio Laboratories Erlangen - Erlangen, Germany; Holger Stahl, Hochschule Rosenheim, University of Applied Science - Rosenheim, Germany
Perceptual audio codecs use psychoacoustic models for irrelevancy reduction by exploiting masking effects in the human auditory system. In masking, the tonality of the masker plays an important role and therefore should be evaluated in the psychoacoustic model. In this study a partial Spectral Flatness Measure (SFM) is applied to a filter bank-based psychoacoustic model to estimate tonality. The Infinite Impulse Response (IIR) band-pass filters are designed to take into account the spreading in simultaneous masking. Tonality estimation is adapted to temporal and spectral resolution of the auditory system. Employing subjective audio coding preference tests, the Partial SFM is compared with prediction-based tonality estimation.
Convention Paper 8824 (Purchase now)
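The Spectral Flatness Measure at the heart of the abstract has a compact definition: the ratio of the geometric to the arithmetic mean of the power spectrum, optionally restricted to a partial band. The sketch below shows that definition only; the band-pass filter design and auditory adaptation described in the paper are not reproduced, and the function name is illustrative.

```python
import numpy as np

def spectral_flatness(power_spec, lo=None, hi=None):
    """Spectral Flatness Measure over a partial band [lo, hi):
    geometric mean / arithmetic mean of the power spectrum.
    Near 1 for noise-like (flat) bands, near 0 for tonal (peaky) bands."""
    p = np.asarray(power_spec, dtype=float)[lo:hi] + 1e-12  # avoid log(0)
    return np.exp(np.mean(np.log(p))) / np.mean(p)
```

A flat spectrum yields a value of 1, while a spectrum dominated by a single peak yields a value close to 0, which is what makes the measure usable as a tonality estimator.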
P4-3 A New Approach to Model-Based Development for Audio Signal Processing—Carsten Tradowsky, Karlsruhe Institute of Technology (KIT) - Karlsruhe, Germany; CTmusic - Karlsruhe, Germany; Peter Figuli, Karlsruhe Institute of Technology (KIT) - Karlsruhe, Germany; Erik Seidenspinner, Karlsruhe Institute of Technology (KIT) - Karlsruhe, Germany; Felix Held, Karlsruhe Institute of Technology (KIT) - Karlsruhe, Germany; Jürgen Becker, Karlsruhe Institute of Technology (KIT) - Karlsruhe, Germany
Today, digital audio systems are restricted in their functionality. For example, a digital audio player still has 16-bit resolution and a 44.1-kHz sample rate. This relatively low quality does not exhaust the possibilities offered by modern hardware for music production. In most cases the functionality is described in software. This abstraction is very common these days, as only few engineers understand the potential of their target hardware, and design time increases significantly when developing efficiently for it. Because common compiler tool chains are used, the software is statically mapped onto the hardware, which restricts the number of channels per processing core to a minimum when targeting high-quality audio. One possibility to close this productivity gap is a high-level, model-based development approach: the audio signal processing flow is described at a more abstract level, and the model is then compiled platform-independently, including automatically generated simulation and verification input. Platform-dependent code can be generated automatically from the verified model, enabling the evaluation of different target architectures and their trade-offs from the same model description. This paper presents a concept that uses such a model-based approach to describe audio signal processing algorithms and to compile C and HDL code from the same model description in order to evaluate different target platforms. The goal is to compare trade-offs for audio signal processing algorithms on a multicore Digital Signal Processor (DSP) target platform. Measurements using data parallelism inside the generated code show a significant speedup on the multicore DSP platform. Conclusions are drawn regarding the usability of the proposed model-based tool flow as well as its applicability to the multicore DSP platform.
Convention Paper 8825 (Purchase now)
P4-4 Accordion Music and its Automatic Transcription to MIDI Format—Tomasz Maciejewski, Poznan University of Technology - Poznan, Poland; Ewa Lukasik, Poznan University of Technology - Poznan, Poland
The paper is devoted to problems related to the automatic transcription of accordion sound. The accordion is a musical instrument from the free-reed aerophone family that can produce polyphonic, multi-chord music. First the playing modes are briefly characterized and problems related to the polyphonic nature of the sound are discussed. Then the results of the analysis and MIDI transcription of right-side monophonic and polyphonic melodies are presented. Finally, an attempt to transcribe music generated by both sides of the instrument, recorded in two channels, is presented, laying the foundation for further research.
Convention Paper 8827 (Purchase now)
P5 - Speech Processing
Saturday, May 4, 15:00 — 16:30 (Foyer)
P5-1 A Speech-Based System for In-Home Emergency Detection and Remote Assistance—Emanuele Principi, Università Politecnica delle Marche - Ancona, Italy; Danilo Fuselli, FBT Elettronica Spa - Recanati (MC), Italy; Stefano Squartini, Università Politecnica delle Marche - Ancona, Italy; Maurizio Bonifazi, FBT Elettronica Spa - Recanati (MC), Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy
This paper describes a system for detecting emergency states and remotely assisting people in their own homes. Emergencies are detected by recognizing distress calls with a speech recognition engine. When an emergency is detected, a phone call is automatically established with a relative or friend by means of a VoIP stack and an acoustic echo canceller. Several low-consumption embedded units are distributed throughout the house to monitor the acoustic environment, and one central unit coordinates the system operation. This unit also integrates multimedia content delivery services and home automation functionalities. As this is an ongoing project, the paper describes the entire system and then focuses on the algorithms implemented for the acoustic monitoring and hands-free communication services. Preliminary experiments have been conducted to assess the performance of the recognition module in noisy and reverberant environments and its out-of-grammar rejection capabilities. Results showed that the implemented Power Normalized Cepstral Coefficients extraction pipeline improves word recognition accuracy in noisy and reverberant conditions, and that introducing a "garbage phone" in the acoustic model makes it possible to reject out-of-grammar words and sentences effectively.
Convention Paper 8828 (Purchase now)
P5-2 Assessment of Speech Quality in the Digital Audio Broadcasting (DAB+) System—Stefan Brachmanski, Wroclaw University of Technology - Wroclaw, Poland; Maurycy Kin, Wroclaw University of Technology - Wroclaw, Poland
The methods for assessment of speech quality fall into two classes: subjective and objective. This paper includes an overview of selected methods of subjective listening measurement (ACR, DCR) recommended by ITU-T. The influence of bit rate on sound quality was a subject of the research presented in this paper, and the influence of the Spectral Band Replication (SBR) process on speech quality was also investigated. The tested samples were taken from experimental Digital Audio Broadcasting emission in Poland as well as from an internet network. The subjective assessment of DAB speech signals was performed with both the ACR and DCR methods. It turned out that the SBR process significantly improves speech quality at lower bit rates, making it as good as at higher bit rates. It was also found that for higher bit rates (96 kbit/s or higher) the two methods give different results.
Convention Paper 8829 (Purchase now)
P5-3 Investigation on Objective Quality Evaluation for Heavily Distorted Speech—Mitsunori Mizumachi, Kyushu Institute of Technology - Kitakyushu, Fukuoka, Japan
Demand for evaluating speech quality is increasing, and it is advisable to employ a common objective measure for the wide variety of adverse speech signals. Unfortunately, current speech quality measures are not suited to heavily distorted speech signals. In this paper both the applicability and the limits of the Perceptual Evaluation of Speech Quality (PESQ) are investigated by comparison with subjective Mean Opinion Scores (MOS) for noise-added and noise-reduced speech signals. It is found that the PESQ scores are consistent with the MOS values for noise-reduced speech signals in non-stationary noise conditions.
Convention Paper 8830 (Purchase now)
P5-4 Novel 5.1 Downmix Algorithm with Improved Dialogue Intelligibility—Kuba Lopatka, Gdansk University of Technology - Gdansk, Poland; Bartosz Kunka, Gdansk University of Technology - Gdansk, Poland; Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
A new algorithm for 5.1-to-stereo downmix is introduced that addresses the problem of dialogue intelligibility. The algorithm utilizes proposed signal processing techniques to enhance the intelligibility of movie dialogue, especially in difficult listening conditions or with a compromised loudspeaker setup. To account for the latter, a playback configuration using a portable device, i.e., an ultrabook, is examined. Experiments confirming the efficiency of the introduced method are presented. Both objective measurements and subjective listening tests were conducted, and the new downmix algorithm is compared to the output of a standard downmix matrix method. The results of the subjective tests show that improved dialogue intelligibility is achieved.
Convention Paper 8831 (Purchase now)
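For reference, the standard downmix matrix the paper compares against follows the familiar ITU-style equations, with center and surround channels attenuated by about 3 dB and the LFE commonly dropped. The sketch below shows that baseline only, not the paper's dialogue-aware processing; the function name is illustrative.

```python
import numpy as np

def downmix_51_to_stereo(L, R, C, LFE, Ls, Rs, k=0.7071):
    """Standard ITU-style 5.1-to-stereo downmix (Lo/Ro):
    Lo = L + k*C + k*Ls, Ro = R + k*C + k*Rs, with k = 1/sqrt(2)
    (~ -3 dB). The LFE channel is commonly omitted from the downmix."""
    Lo = L + k * C + k * Ls
    Ro = R + k * C + k * Rs
    return Lo, Ro
```

With only the center channel active, both stereo outputs receive the center signal at -3 dB, which is precisely why dialogue can lose level relative to wide-panned effects after a plain matrix downmix.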
P5-5 Monaural Speech Source Separation by Estimating the Power Spectrum Using Multi-Frequency Harmonic Product Spectrum—David Ayllon, University of Alcala - Alcalá de Henares, Spain; Roberto Gil-Pita, University of Alcalá - Alcalá de Henares, Spain; Manuel Rosa-Zurera, University of Alcala - Alcalá de Henares, Spain
This paper proposes an algorithm to perform monaural speech source separation by means of time-frequency masking. The algorithm is based on the estimation of the power spectrum of the original speech signals as a combination of a carrier signal multiplied by an envelope. A Multi-Frequency Harmonic Product Spectrum (MF-HPS) algorithm is used to estimate the fundamental frequency of the signals in the mixture. These frequencies are used to estimate both the carrier and the envelope from the mixture. Binary masks are generated by comparing the estimated spectra of the signals. Results show an important improvement in separation compared to the original algorithm, which uses only the information from the HPS.
Convention Paper 8832 (Purchase now)
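The single-frequency Harmonic Product Spectrum that the MF-HPS extends is a classic pitch estimator: the magnitude spectrum is multiplied by integer-decimated copies of itself so that harmonics reinforce at the fundamental. The sketch below shows that basic form, not the paper's multi-frequency variant; the function name and harmonic count are illustrative.

```python
import numpy as np

def hps_pitch(x, fs, n_harm=4):
    """Harmonic Product Spectrum pitch estimate: the windowed magnitude
    spectrum is multiplied by its decimated copies, so partials at
    2*f0, 3*f0, ... fold onto the bin of f0 and reinforce it."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    hps = spec.copy()
    for h in range(2, n_harm + 1):
        dec = spec[::h]                     # spectrum decimated by factor h
        hps[:len(dec)] *= dec
    peak = np.argmax(hps[1:len(spec) // n_harm]) + 1  # skip the DC bin
    return peak * fs / len(x)
```

On a synthetic harmonic complex the estimate lands on the fundamental to within the FFT bin resolution.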
P5-6 The Effectiveness of Speech Transmission Index (STI) in Accounting for the Effects of Multiple Arrivals—Timothy J. Ryan, Webster University - St. Louis, MO, USA; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA; William L. Martens, University of Sydney - Sydney, NSW, Australia
The authors conducted concurrent experiments employing subjective evaluation methods to examine the effects of the manipulation of several sound system design and optimization parameters on the intelligibility of reinforced speech. During the course of these experiments, objective testing methods were also employed to measure the Speech Transmission Index (STI) associated with each of the variable treatments used. Included in this paper is a comparison of the results of these two testing methods. The results indicate that, while STI is capable of detecting many effects of multiple arrivals, it appears to overestimate the degradation to intelligibility caused by multiple arrivals with short delay times.
Convention Paper 8833 (Purchase now)
P5-7 Introducing Synchronization of Speech Mixtures in Blind Sparse Separation Problems—Cosme Llerena, University of Alcalá - Alcala de Henares (Madrid), Spain; Lorena Álvarez, University of Alcalá - Alcalá de Henares, Spain; Roberto Gil-Pita, University of Alcalá - Alcalá de Henares, Spain; Manuel Rosa-Zurera, University of Alcala - Alcalá de Henares, Spain
This paper explores the feasibility of synchronizing speech mixtures prior to blind sparse source separation in order to improve its results. Broadly, methods that assume sparse sources use level and phase differences between mixtures as their features and separate the signals from them. If one mixture is considerably delayed with respect to the rest, the information extracted from these differences can be misleading. With this idea in mind, this paper focuses on using time delay estimation algorithms to synchronize the mixtures and on the improvement this yields in a blind sparse source separation algorithm. The results obtained show the feasibility of synchronizing the speech mixtures.
Convention Paper 8834 (Purchase now)
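A basic time-delay estimator of the kind such synchronization relies on can be built from the peak of the cross-correlation, computed efficiently via the FFT. This is a generic sketch, not the specific TDE algorithms evaluated in the paper (e.g., no PHAT weighting); the function name is illustrative.

```python
import numpy as np

def estimate_delay(x, y):
    """Estimate the delay of y relative to x (in samples) from the peak of
    the FFT-based cross-correlation. Positive result means y lags x."""
    n = len(x) + len(y) - 1
    nfft = 1 << (n - 1).bit_length()        # next power of two, no wraparound
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(y, nfft)
    cc = np.fft.irfft(Y * np.conj(X), nfft)
    # reorder circular output to lags -(len(x)-1) .. len(y)-1
    cc = np.concatenate((cc[-(len(x) - 1):], cc[:len(y)]))
    return int(np.argmax(cc)) - (len(x) - 1)
```

For a noise signal and a copy of it shifted by a known number of samples, the estimator recovers that shift exactly, which is what makes resynchronizing the mixtures before separation straightforward.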
P5-8 An Embedded-Processor Driven Test Bench for Acoustic Feedback Cancellation in Real Environments—Francesco Faccenda, Università Politecnica delle Marche - Ancona, Italy; Stefano Squartini, Università Politecnica delle Marche - Ancona, Italy; Emanuele Principi, Università Politecnica delle Marche - Ancona, Italy; Leonardo Gabrielli, Università Politecnica delle Marche - Ancona, Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy
In order to facilitate communication among speakers, speech reinforcement systems equipped with microphones and loudspeakers are employed. Due to the acoustic coupling between them, speech intelligibility may be degraded and, moreover, high channel gains can drive the system to instability. Acoustic Feedback Cancellation (AFC) methods must therefore be applied to keep the system stable. In this paper a new test bench for evaluating AFC algorithms in real environments is proposed. It is based on the TMS320C6748 processor running the Suppressor-PEM algorithm, a recent technique based on the PEM-AFROW paradigm. The partitioned-block frequency-domain adaptive filter (PB-FDAF) structure has been adopted to keep the computational complexity low. A professional sound card and a PC, on which an automatic gain controller has been implemented to prevent signal clipping, complete the framework. Several experimental tests confirmed the framework's suitability to operate under diverse acoustic conditions.
Convention Paper 8835 (Purchase now)
P6 - Recording and Production
Sunday, May 5, 09:00 — 13:00 (Sala Carducci)
Alex Case, University of Massachusetts—Lowell - Lowell, MA, USA
P6-1 Automated Tonal Balance Enhancement for Audio Mastering Applications—Stylianos-Ioannis Mimilakis, Technological Educational Institute of Ionian Island - Lixouri, Greece; Konstantinos Drossos, Ionian University - Corfu, Greece; Andreas Floros, Ionian University - Corfu, Greece; Dionysios Katerelos, Technological Educational Institute of Ionian Island - Lixouri, Greece
Modern audio mastering procedures involve the selective enhancement or attenuation of specific frequency bands, the main aim being the tonal enhancement of the original, unmastered audio material. This process is largely based on the musical information and mode of the audio material, which can be retrieved either by listening to the original stimuli or from the corresponding musical key notes. The current work presents an adaptive, automated equalization system that performs this mastering procedure, based on a novel method of fundamental frequency tracking. In addition, the overall system is evaluated with objective PEAQ analysis and subjective listening tests under real mastering conditions.
Convention Paper 8836 (Purchase now)
P6-2 A Pairwise and Multiple Stimuli Approach to Perceptual Evaluation of Microphone Types—Brecht De Man, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
An essential but complicated task in the audio production process is the selection of microphones that are suitable for a particular source. A microphone is often chosen based on price or common practices, rather than whether the microphone actually sounds best in that particular situation. In this paper we perceptually assess six microphone types for recording a female singer. Listening tests using a pairwise and multiple stimuli approach are conducted to identify the order of preference of these microphone types. The results of this comparison are discussed, and the performance of each approach is assessed.
Convention Paper 8837 (Purchase now)
P6-3 Comparison of Power Supply Pumping of Switch-Mode Audio Power Amplifiers with Resistive Loads and Loudspeakers as Loads—Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark; Lars Press Petersen, Technical University of Denmark - Kgs. Lyngby, Denmark
Power supply pumping is generated by switch-mode audio power amplifiers in half-bridge configuration when they drive energy back into their source. In most designs this leads to a rising rail voltage and can be destructive for the decoupling capacitors, the rectifier diodes in the power supply, or the power stage of the amplifier. Therefore precautions are taken by amplifier and power supply designers to avoid these effects. Existing power supply pumping models are based on an ohmic load attached to the amplifier. This paper shows the analytical derivation of the resulting waveforms and extends the model to loudspeaker loads. Measurements verify that the amount of supply pumping is reduced by a factor of four when comparing the nominal resistive load to a loudspeaker. A simplified and more accurate model is proposed, and the influence of supply pumping on the audio performance is shown to be marginal.
Convention Paper 8838 (Purchase now)
P6-4 The Psychoacoustic Testing of the 3-D Multiformat Microphone Array Design and the Basic Isosceles Triangle Structure of the Array and the Loudspeaker Reproduction Configuration—Michael Williams, Sounds of Scotland - Le Perreux sur Marne, France
Optimizing the loudspeaker configuration for 3-D microphone array design can only be achieved with a clear knowledge of the psychoacoustic parameters governing the reproduction of height localization. Unfortunately, HRTF characteristics do not seem to give useful information when applied to loudspeaker reproduction. A set of psychoacoustic parameters has to be measured for different configurations in order to design an efficient microphone array recording system, all the more so if a minimalistic approach to array design is a prime objective. In particular, the position of a second layer of loudspeakers with respect to the primary horizontal layer is of fundamental importance and can only be based on the psychoacoustics of height perception. What are the localization characteristics between two loudspeakers, one in each of the two layers? Is time difference, rather than level difference, a better approach to interlayer localization? This paper tries to answer these questions and also justifies the basic isosceles triangle loudspeaker structure that helps optimize the reproduction of height information.
Convention Paper 8839 (Purchase now)
P6-5 A Perceptual Audio Mixing Device—Michael J. Terrell, Queen Mary University of London - London, UK; Andrew J. R. Simpson, Queen Mary University of London - London, UK; Mark B. Sandler, Queen Mary University of London - London, UK
A method and device are presented that allow novice and expert audio engineers to perform mixing using perceptual controls. In this paper we use Auditory Scene Analysis [Bregman, 1990, MIT Press, Cambridge] to relate the multitrack component signals of a mix to the perception of that mix. We define the multitrack components of a mix as a group of audio streams, which are transformed into sound streams by the act of reproduction, and which are ultimately perceived as auditory streams by the listener. The perceptual controls provide direct manipulation of loudness balance within a mixture of sound streams, as well as the overall mix loudness. The system employs a computational optimization strategy to perform automatic signal gain adjustments to component audio streams, such that the intended loudness balance of the associated sound streams is produced. Perceptual mixing is performed using a complete auditory model, based on a model of loudness for time-varying sound streams [Glasberg and Moore, J. Audio Eng. Soc., vol. 50, 331-342 (2002 May)]. The use of the auditory model enables the loudness balance to be maintained automatically regardless of the listening level. Thus, a perceptual definition of the mix is presented that is listening-level independent, and a method of realizing the mix practically is given.
Convention Paper 8840 (Purchase now)
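The core gain-adjustment idea can be illustrated with a minimal sketch. Note that the paper performs the optimization with the Glasberg and Moore time-varying loudness model; here plain RMS stands in as a crude loudness proxy, and the function name and target-share parameterization are illustrative assumptions, not the authors' interface.

```python
import numpy as np

def balance_gains(tracks, target_shares, mix_rms=0.1):
    """Per-track linear gains so each track contributes its intended
    share of the overall mix level.

    RMS is a crude stand-in for the Glasberg & Moore loudness model
    used in the paper; `target_shares` are the intended relative
    levels of the sound streams.
    """
    shares = np.asarray(target_shares, dtype=float)
    shares /= shares.sum()                        # normalize to a balance
    rms = np.array([np.sqrt(np.mean(t ** 2)) for t in tracks])
    return shares * mix_rms / rms                 # maps each track onto its target level

# usage: three streams at arbitrary input levels, mixed with a 2:1:1 balance
rng = np.random.default_rng(0)
tracks = [a * rng.standard_normal(4800) for a in (0.5, 0.05, 2.0)]
gains = balance_gains(tracks, [2, 1, 1])
out_rms = [np.sqrt(np.mean((g * t) ** 2)) for g, t in zip(gains, tracks)]
```

Because the gains are computed from the ratio of target to measured level, the adjusted streams sit in the intended 2:1:1 ratio regardless of their original levels, mirroring the listening-level independence the paper describes.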
P6-6 On the Use of a Haptic Feedback Device for Sound Source Control in Spatial Audio Systems—Frank Melchior, BBC Research and Development - Salford, UK; Chris Pike, BBC Research and Development - Salford, York, UK; Matthew Brooks, BBC Research and Development - Salford, UK; Stuart Grace, BBC Research and Development - Salford, UK
Next generation spatial audio systems are likely to be capable of 3-D sound reproduction. Systems currently under discussion require the sound designer to position and manipulate sound sources in three dimensions. New intuitive tools, designed to meet the requirements of audio production environments, are needed to make efficient use of this new technology. This paper investigates a haptic feedback controller as a user interface for spatial audio systems. The paper will give an overview of conventional tools and controllers. A prototype has been developed based on the requirements of different tasks and reproduction methods. The implementation will be described in detail and the results of a user evaluation will be given.
Convention Paper 8841 (Purchase now)
P6-7 Audio Level Alignment—Evaluation Method and Performance of EBU R 128 by Analyzing Fader Movements—Jon Allan, Luleå University of Technology - Piteå, Sweden; Jan Berg, Luleå University of Technology - Piteå, Sweden
A method is proposed for evaluating audio meters in terms of how well the engineer conforms to a level alignment recommendation and succeeds to achieve evenly perceived audio levels. The proposed method is used to evaluate different meter implementations, three conforming to the recommendation EBU R 128 and one conforming to EBU Tech 3205-E. In an experiment, engineers participated in a simulated live broadcast show and the resulting fader movements were recorded. The movements were analyzed in terms of different characteristics: fader mean level, fader variability, and fader movement. Significant effects were found showing that engineers do act differently depending on the meter and recommendation at hand.
Convention Paper 8842 (Purchase now)
P6-8 Balance Preference Testing Utilizing a System of Variable Acoustic Condition—Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Brett Leonard, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Scott Levine, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Grzegorz Sikora, Bang & Olufsen Deutschland GmbH - Pullach, Germany
In the modern world of audio production, there exists a significant disconnect between the music mixing control room of the audio professional and the listening space of the consumer or end user. The goal of this research is to evaluate a series of varying acoustic conditions commonly used in such listening environments. Expert listeners are asked to perform basic balancing tasks, under varying acoustic conditions. The listener can remain in position while motorized panels rotate behind a screen, exposing a different acoustic condition for each trial. Results show that listener fatigue as a variable is thereby eliminated, and the subject’s aural memory is quickly cleared, so that the subject is unable to adapt to the newly presented condition for each trial.
Convention Paper 8843 (Purchase now)
P7 - Transducers—Part 1: Loudspeakers
Sunday, May 5, 09:00 — 11:30 (Sala Foscolo)
Balazs Bank, Budapest University of Technology and Economics - Budapest, Hungary
P7-1 Distortion Improvement in the Current Coil of a Loudspeaker—Gaël Pillonnet, University of Lyon - Lyon, France; CPE dept; Eric Sturtzer, University of Lyon - Lyon, France; Timothé Rossignol, ON Semiconductor - Toulouse, France; Pascal Tournier, ON Semiconductor - Toulouse, France; Guy Lemarquand, Université du Mans - Le Mans Cedex 9, France
This paper compares voltage and current drive in an active audio system, presenting the effect of the audio amplifier control on the coil current of an electrodynamic loudspeaker. In a voltage control topology, the electromagnetic force linked to the coil current is controlled through the load impedance; the electromechanical conversion linearity is therefore degraded by the impedance variation, which implies a reduction of the overall audio quality. A current driving method can reduce the effect of the non-linear impedance by controlling the coil current, and thereby the acceleration, directly. Large-signal impedance modeling is given in this paper to underline the non-linear effects of electrodynamic loudspeaker parameters on the coupling. As a result, the practical comparison of voltage and current driving methods proves that current control reduces the voice coil current distortion in the three different loudspeakers under test.
Convention Paper 8844 (Purchase now)
P7-2 Driving Electrostatic Transducers—Dennis Nielsen, Technical University of Denmark - Kgs. Lyngby, Denmark; Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark; Michael A. E. Andersen, Technical University of Denmark - Kgs. Lyngby, Denmark
Electrostatic transducers represent a very interesting alternative to the traditional, inefficient electrodynamic transducers. In order to establish the full potential of these transducers, power amplifiers must be developed that fulfill the strict requirements imposed by such loads: high impedance, a frequency-dependent load, and a high bias voltage for linearization. This paper analyzes power stages and bias configurations suitable for driving an electrostatic transducer. Measurement results of a ±300 V prototype amplifier are shown. Measuring THD across a high-impedance source is discussed, and a high-voltage attenuation interface for an audio analyzer is presented. THD below 0.1% is reported.
Convention Paper 8845 (Purchase now)
P7-3 Boundary Element Simulation of an Arrayable Loudspeaker Horn—Tommaso Nizzoli, Acoustic Vibration Consultant - Reggio Emilia, Italy; Stefano Prati, RCF S.p.A. - Reggio Emilia, Italy
The Boundary Element method implemented in a commercial code is used to verify the acoustic directional characteristic in the far field of an arrayable loudspeaker’s horn in comparison with the full space, far field, measured acoustic balloon. A simple model of the full arrayable loudspeaker horn’s splay idealizes each source only by the calculated emission on a flat plane at each horn’s mouth. This approach reduces significantly the BEM’s calculation time with regard to having to model each source by its full geometry. Comparison with the full space, far field, predicted pressure from test data shows good agreement in all the frequency range of interest.
Convention Paper 8846 (Purchase now)
P7-4 Single Permanent Magnet Co-Axial Loudspeakers—Dimitar Dimitrov, BMS Production
Co-axial loudspeakers are designed with a single ring permanent magnetic structure providing a dual flux path for the two voice coil gaps. An internal magnet is used in two realizations with parallel flux division at both of its diameters. These two variants are convenient for Nd magnets, and one of them has its local internal flux path crossing two gaps in series for a dual-membrane compression driver implementation. Another realization uses an external permanent magnetic structure with series flux through the gaps. The proposed co-axial loudspeaker types are very compact, simple, and lightweight, and they can all use “Stepped Gap” designs for their low-frequency voice coils. Comparative measurements against conventional co-axial loudspeakers reveal competitive performance with much reduced weight and production cost.
Convention Paper 8847 (Purchase now)
P7-5 Multiple Low-Frequency Corner Folded Horn Designs—Rumen Artarski, AVC - Sofia, Bulgaria; Plamen Valtchev, Univox - Sofia, Bulgaria; Dimitar Dimitrov, BMS Production; Yovko Handzhiev, BMS Production
Low-frequency horn-loaded systems for pi/2 radiation are designed employing push-push operated dual-membrane loudspeakers, closely mounted in parallel against each other. Three-axis symmetrical and efficient horn mouth loading is transformed into symmetrical and uniform membrane cone loading. By doubling the loudspeaker diaphragm, and correspondingly the horn throat, a lower horn cut-off frequency was achieved with the same extension rate, besides a doubling of acoustic power. Two such corner horn systems can be stacked together for quarter-space pi loading, with important usage in front-of-stage subwoofer applications. Four horn systems can be grouped together on the floor or on the ceiling for 2 pi radiation. Finally, eight such systems can be united for full-space low-frequency radiation.
Convention Paper 8848 (Purchase now)
P8 - Audio Processing and Semantics
Sunday, May 5, 09:30 — 11:00 (Foyer)
P8-1 Combination of Growing and Pruning Algorithms for Multilayer Perceptrons for Speech/Music/Noise Classification in Digital Hearing Aids—Lorena Álvarez, University of Alcalá - Alcalá de Henares, Spain; Enrique Alexandre, University of Alcalá - Alcalá de Henares (Madrid), Spain; Cosme Llerena, University of Alcalá - Alcalá de Henares (Madrid), Spain; Roberto Gil-Pita, University of Alcalá - Alcalá de Henares, Spain; Manuel Rosa-Zurera, University of Alcalá - Alcalá de Henares, Spain
This paper explores the feasibility of combining growing and pruning algorithms in such a way that the global approach finds a smaller multilayer perceptron (MLP) in terms of network size, which enhances speech/music/noise classification performance in digital hearing aids with the added bonus of demanding fewer hidden neurons and, consequently, a lower computational cost. With this in mind, the paper focuses on the design of an approach that starts by adding neurons to an initially small MLP until the stopping criterion for the growing stage is reached. The MLP size is then reduced by successively pruning the least significant hidden neurons while maintaining a continuously decreasing error function. The results obtained with the proposed approach are compared with those obtained when using the growing and pruning algorithms separately.
Convention Paper 8850 (Purchase now)
P8-2 Automatic Sample Recognition in Hip-Hop Music Based on Non-Negative Matrix Factorization—Jordan L. Whitney, University of Miami - Coral Gables, FL, USA; Colby N. Leider, University of Miami - Coral Gables, FL, USA
We present a method for automatic detection of samples in hip-hop music. A sample is defined as a short extraction from a source audio corpus that may have been embedded into another audio mixture. A series of non-negative matrix factorizations are applied to spectrograms of hip-hop music and the source material from a master corpus. The factorizations result in matrices of base spectra and amplitude envelopes for the original and mixed audio. Each window of the mixed audio is compared to the original audio clip by examining the extracted amplitude envelopes. Several image-similarity metrics are employed to determine how closely the samples and mixed amplitude envelopes match. Preliminary testing indicates that, distinct from existing audio fingerprinting algorithms, the algorithm we describe is able to confirm instances of sampling in a hip-hop music mixture that the untrained listener is frequently unable to detect.
Convention Paper 8851 (Purchase now)
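The factorization step described above can be sketched roughly as follows: a magnitude spectrogram is split into base spectra and amplitude envelopes by non-negative matrix factorization, and envelopes are then compared. This is a generic Euclidean-distance NMF with multiplicative updates, and the normalized-correlation comparison is a simple stand-in for the image-similarity metrics the paper employs; function names and parameters are illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=300, seed=0):
    """Factor a non-negative magnitude spectrogram V (freq x time) as
    V ~ W @ H via multiplicative updates: W holds base spectra,
    H holds amplitude envelopes."""
    rng = np.random.default_rng(seed)
    f, t = V.shape
    W = rng.random((f, rank)) + 1e-3
    H = rng.random((rank, t)) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)    # update envelopes
        W *= (V @ H.T) / (W @ (H @ H.T) + 1e-9)  # update base spectra
    return W, H

def envelope_similarity(h_a, h_b):
    """Normalized correlation between two amplitude envelopes — a simple
    stand-in for the image-similarity metrics used in the paper."""
    a = (h_a - h_a.mean()) / (h_a.std() + 1e-12)
    b = (h_b - h_b.mean()) / (h_b.std() + 1e-12)
    return float(np.mean(a * b))
```

An envelope extracted from a window of the mix can then be scored against the envelopes of candidate source material; high similarity flags a likely sample.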
P8-3 Performance Optimization of GCC-PHAT for Delay and Polarity Correction under Real World Conditions—Nicholas Jillings, Queen Mary University of London - London, UK; Alice Clifford, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
When coherent audio streams are summed, delays can cause comb filtering and polarity inversion can result in cancellation. The Generalized Cross Correlation with Phase Transform (GCC-PHAT) algorithm is a popular method for detecting, and hence correcting, the delay. This paper explores the performance of GCC-PHAT for delay and polarity correction under a variety of conditions and parameter settings and offers optimizations for those conditions. In particular, we investigate performance with moving sources, background noise, and reverberation, and we consider the effect of varying the Fourier transform size when performing GCC-PHAT. In addition to accuracy, computational efficiency and latency are also used as performance metrics.
Convention Paper 8852 (Purchase now)
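A compact version of the delay and polarity estimator discussed above can be sketched as follows. This is a generic GCC-PHAT, not the authors' optimized implementation; the FFT sizing and the epsilon guard in the whitening step are illustrative choices.

```python
import numpy as np

def gcc_phat(sig, ref, n_fft=None):
    """Estimate the delay (in samples) of `sig` relative to `ref` via
    GCC-PHAT. Returns (delay, polarity): polarity is -1 when the peak
    of the cross-correlation is negative, indicating inversion."""
    n = len(sig) + len(ref)
    if n_fft is None:
        n_fft = 1 << (n - 1).bit_length()      # next power of two
    SIG = np.fft.rfft(sig, n=n_fft)
    REF = np.fft.rfft(ref, n=n_fft)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12             # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n_fft)
    cc = np.concatenate((cc[-(n_fft // 2):], cc[: n_fft // 2]))  # center lag 0
    peak = int(np.argmax(np.abs(cc)))
    delay = peak - n_fft // 2
    polarity = 1 if cc[peak] >= 0 else -1
    return delay, polarity
```

Because the phase transform whitens the cross-spectrum, the correlation peak stays sharp in reverberation, which is one reason the paper studies this weighting in particular.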
P8-4 Reducing Binary Masking Artifacts in Blind Audio Source Separation—Toby Stokes, University of Surrey - Guildford, Surrey, UK; Christopher Hummersone, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK
Binary masking is a common technique for separating target audio from an interferer. Its use is often justified by the high signal-to-noise ratio achieved. The mask can introduce musical noise artifacts, limiting its perceptual performance and that of techniques that use it. Three mask-processing techniques, involving adding noise or cepstral smoothing, are tested and the processed masks are compared to the ideal binary mask using the perceptual evaluation for audio source separation (PEASS) toolkit. Each processing technique's parameters are optimized before the comparison is made. Each technique is found to improve the overall perceptual score of the separation. Results show a trade-off between interferer suppression and artifact reduction.
Convention Paper 8853 (Purchase now)
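The ideal binary mask the processed masks are compared against is straightforward to construct. The sketch below builds one from separate target and interferer spectrograms and applies a simple moving-average smoothing across frequency as an illustrative stand-in for the paper's noise-addition and cepstral-smoothing variants; STFT parameters and function names are arbitrary choices.

```python
import numpy as np

def stft_mag(x, win=512, hop=256):
    """Magnitude spectrogram (freq x time) with a Hann window."""
    w = np.hanning(win)
    frames = [np.abs(np.fft.rfft(w * x[i:i + win]))
              for i in range(0, len(x) - win + 1, hop)]
    return np.array(frames).T

def ideal_binary_mask(target, interferer, lc_db=0.0):
    """1 where the target exceeds the interferer by lc_db, else 0."""
    T = stft_mag(target)
    I = stft_mag(interferer)
    snr_db = 20 * np.log10((T + 1e-12) / (I + 1e-12))
    return (snr_db > lc_db).astype(float)

def smooth_mask(mask, k=5):
    """Soften on/off transitions with a moving average over frequency;
    abrupt transitions are what give rise to musical-noise artifacts."""
    kernel = np.ones(k) / k
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, mask)
```

The smoothed (no longer strictly binary) mask is then applied to the mixture spectrogram in place of the hard mask, trading some interferer suppression for fewer artifacts, as the abstract's results describe.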
P8-5 Detection of Sinusoids Using Statistical Goodness-of-Fit Test—Pushkar P. Patwardhan, Nokia India Pvt. Ltd. - Bangalore, India; Ravi R. Shenoy, Nokia India Pvt. Ltd. - Bangalore, India
Detection of tonal components from the magnitude spectrum is an important initial step in several speech and audio processing applications. In this paper we present an approach for detecting sinusoidal components in the magnitude spectrum using a “goodness-of-fit” test. The key idea is to test the null hypothesis that the region of the spectrum under observation is drawn from the magnitude spectrum of an ideal windowed sinusoid. This hypothesis is tested with a chi-square goodness-of-fit test, the outcome of which is a decision about the presence of a sinusoid in the observed region of the magnitude spectrum. We have evaluated the performance of the proposed approach using synthetically generated samples containing steady and modulated harmonics in clean and noisy conditions.
Convention Paper 8854 (Purchase now)
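The hypothesis test can be sketched as follows: the observed spectral region around a candidate peak is normalized and compared against the mainlobe of an ideally windowed sinusoid with a chi-square statistic. The Hann window and the threshold value here are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np

def chi2_sinusoid_test(mag, peak_bin, half_width=2, win_len=1024,
                       n_fft=1024, thresh=0.1):
    """Chi-square goodness-of-fit test of the null hypothesis that the
    spectral region around `peak_bin` matches the magnitude spectrum of
    an ideal Hann-windowed sinusoid. Returns (is_sinusoid, statistic)."""
    # observed region, normalized to a unit-sum "distribution"
    idx = np.arange(peak_bin - half_width, peak_bin + half_width + 1)
    obs = mag[idx].astype(float)
    obs = obs / obs.sum()
    # expected shape: mainlobe of the window transform around DC
    W = np.abs(np.fft.fft(np.hanning(win_len), n_fft))
    exp = W[np.arange(-half_width, half_width + 1)]   # wraps around DC
    exp = exp / exp.sum()
    stat = float(np.sum((obs - exp) ** 2 / (exp + 1e-12)))
    return stat < thresh, stat
```

A region dominated by a true windowed sinusoid reproduces the template shape almost exactly and yields a tiny statistic, while a noise region deviates strongly at the low-energy template bins, where the chi-square denominator makes the statistic explode.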
P8-6 Novel Designs for the Parametric Peaking EQ User Interface for Single Channel Corrective EQ Tasks—Christopher Dewey, University of Huddersfield - Huddersfield, UK; Jonathan Wakefield, University of Huddersfield - Huddersfield, West Yorkshire, UK
This paper evaluates the suitability of existing parametric peaking EQ interfaces of analog and digital mixing desks and audio plug-ins for single-channel corrective EQ tasks. It proposes novel alternatives based upon displaying FFT bin maxima for the full audio duration behind the EQ curve, automatically detecting and displaying the top five FFT bin maximum peaks to assist the engineer, an alternative numerical list display of those top five peaks, and an interface that allows direct manipulation of the displayed FFT bin maxima. All interfaces were evaluated on the time taken to perform a corrective EQ task, preference ranking, and qualitative comments. Results indicate that the novel EQ interfaces show promise relative to existing EQ interfaces.
Convention Paper 8855 (Purchase now)
P8-7 Drum Replacement Using Wavelet Filtering—Robert Barañski, AGH University of Science and Technology - Krakow, Poland; Szymon Piotrowski, AGH University of Science and Technology - Krakow, Poland; Magdalena Plewa, Gdansk University of Technology - Gdansk, Poland
The paper presents a solution that can be used to unify the snare drum sound within a chosen fragment. The algorithm is based on the wavelet transform and allows replacement of sub-bands of particular sounds that fall outside a certain range. Five experienced sound engineers put the algorithm to the test using samples of five different snare drums. Wavelet filtering appears useful for drum replacement, and the sound engineers' responses were, in most cases, positive.
Convention Paper 8856 (Purchase now)
P8-8 Collaborative Annotation Platform for Audio Semantics—Nikolaos Tsipas, Aristotle University of Thessaloniki - Thessaloniki, Greece; Charalampos A. Dimoulas, Aristotle University of Thessaloniki - Thessaloniki, Greece; George M. Kalliris, Aristotle University of Thessaloniki - Thessaloniki, Greece; George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
In the majority of audio classification tasks that involve supervised machine learning, ground truth samples are regularly required as training inputs. Most researchers in this field annotate audio content by hand and for their individual requirements. This practice has resulted in the absence of solid datasets; consequently, research conducted by different researchers on the same topic cannot be effectively pulled together and elaborated upon. A collaborative audio annotation platform is proposed for both scientific and application-oriented audio-semantic tasks. Innovation points include easy operation and interoperability, on-the-fly annotation while playing audio content online, efficient collaboration with feature engines and machine learning algorithms, and enhanced interaction and personalization via state-of-the-art Web 2.0/3.0 services.
Convention Paper 8857 (Purchase now)
P8-9 Investigation of Wavelet Approaches for Joint Temporal, Spectral and Cepstral Features in Audio Semantics—Charalampos A. Dimoulas, Aristotle University of Thessaloniki - Thessaloniki, Greece; George M. Kalliris, Aristotle University of Thessaloniki - Thessaloniki, Greece
The current paper focuses on the investigation of wavelet approaches for joint time, frequency, and cepstral audio feature extraction. Wavelets have been thoroughly studied over the last decades as an alternative signal analysis approach. Wavelet-features have also been successfully implemented in a variety of pattern recognition applications, including audio semantics. Recently, wavelet-adapted mel-frequency cepstral coefficients have been proposed as applicable features in speech recognition and general audio classification, incorporating perceptual attributes. In this context, various wavelet configuration-schemes are examined for wavelet-cepstral audio features extraction. Additional wavelet parameters are utilized in the formation of wavelet-feature-vectors and evaluated in terms of salient feature ranking. Comparisons with classical time-frequency and cepstral audio features are conducted in typical audio-semantics scenarios.
Convention Paper 8858 (Purchase now)
P10 - Transducers—Part 2: Arrays, Microphones
Sunday, May 5, 14:15 — 17:15 (Sala Foscolo)
P10-1 The Radiation Characteristics of an Array of Horizontally Asymmetrical Waveguides that Utilize Continuous Arc Diffraction Slots—Soichiro Hayashi, Bose Corporation - Framingham, MA, USA; Akira Mochimaru, Bose Corporation - Framingham, MA, USA; Paul F. Fidlin, Bose Corporation - Framingham, MA, USA
Previous work presented the radiation characteristics of a horizontally asymmetrical waveguide that utilizes a continuous arc diffraction slot. It showed good coverage control above 1 kHz, as long as the waveguide center line axis angle stays below a certain limit. This paper examines the radiation characteristics of an array of horizontally asymmetrical waveguides. Waveguides with different angular variations are developed, and several vertical arrays are constructed from those waveguides. The radiation characteristics of the arrays are measured, horizontally symmetrical and asymmetrical vertical arrays are compared, and the consistency within the coverage area and the arrays' limitations are discussed.
Convention Paper 8865 (Purchase now)
P10-2 Numerical Simulation of Microphone Wind Noise, Part 1: External Flow—Juha Backman, Nokia Corporation - Espoo, Finland
This paper discusses the use of the computational fluid dynamics (CFD) for computational analysis of microphone wind noise. The first part of the work, presented in this paper, discusses the behavior of the flow around the microphone. One of the practical questions answered in this work is the well-known difference between “pop noise,” i.e., noise caused by transient flows, and wind noise generated by more stationary flows. It appears that boundary layer separation and related modification of flow field near the boundary is a significant factor in transient flow noise, while vortex shedding, emerging at higher flow velocities, is significant for steady state flow. The paper also discusses the effects of the geometrical shape and surface details on the wind noise.
Convention Paper 8866 (Purchase now)
P10-3 Listener Preferences for Different Headphone Target Response Curves—Sean Olive, Harman International - Northridge, CA, USA; Todd Welti, Harman International - Northridge, CA, USA; Elisabeth McMullin, Harman International - Northridge, CA USA
There is little consensus among headphone manufacturers on the preferred headphone target frequency response required to produce optimal sound quality for reproduction of stereo recordings. To explore this topic further we conducted two double-blind listening tests in which trained listeners rated their preferences for eight different headphone target frequency responses reproduced using two different models of headphones. The target curves included the diffuse-field and free-field curves in ISO 11904-2, a modified diffuse-field target recommended by Lorho, the unequalized headphone, and a new target response based on acoustical measurements of a calibrated loudspeaker system in a listening room. For both headphones the new target based on an in-room loudspeaker response was the most preferred target response curve.
Convention Paper 8867 (Purchase now)
P10-4 Optimal Condition of Receiving Transducer in Wireless Power Transfer Based on Ultrasonic Resonance Technology—WooSub Youm, Electronics and Telecommunications Research Institute (ETRI) - Daejeon, Korea; Gunn Hwang, Electronics and Telecommunications Research Institute (ETRI) - Daejeon, Korea; Sung Q Lee, Electronics and Telecommunications Research Institute (ETRI) - Daejeon, Korea
Recently, wireless power transfer technology has drawn much attention because it eliminates wires and makes charging convenient. Previous technologies such as magnetic resonance and induction coupling have the drawbacks of short transfer distance and potential harm to human health. As an alternative, ultrasonic resonance wireless power transfer technology is proposed. A pair of commercial ultrasonic transducer arrays designed for sensor applications is used for wireless power transfer. Two parameters must be chosen to maximize the power transfer efficiency. The first is the load resistance connected to the receiving transducer, which should be matched to the impedance of the receiving transducer at the resonance frequency. The second is the operating frequency of the transmitting transducer, which should be matched to the optimal frequency of the receiving transducer. In this paper the optimal load resistance and operating frequency of the receiving transducer are analyzed based on circuit theory and verified through experiment.
Convention Paper 8868 (Purchase now)
P10-5 Design of a Headphone Equalizer Control Based on Principal Component Analysis—Felix Fleischmann, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Bernhard Neugebauer, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Unlike for loudspeakers, the optimal frequency response for headphones is not flat. A perceptually optimal target equalization curve for headphones was identified in a previous study. Moreover, strong variability in the frequency characteristics of 13 popular headphone models was observed. Model-specific equalization filters can be implemented but would require the headphone to be known. For most consumer applications this seems impractical. Principal component analysis was applied to the measured headphone equalization data, followed by a reduction of the degrees of freedom. The remaining error is identified objectively. It can be shown that using only one principal component, the sum of a fixed and a weighted filter curve can replace model-specific equalization.
Convention Paper 8869 (Purchase now)
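The reduction described above, a fixed filter curve plus one weighted principal-component curve per model, can be sketched on synthetic data: center the per-model equalization curves, take the first principal component via SVD, and reconstruct each curve from the mean and a single per-model weight. The data here are toy stand-ins; the paper used measurements of 13 popular headphone models.

```python
import numpy as np

rng = np.random.default_rng(1)
n_models, n_freqs = 13, 64
freq_axis = np.linspace(0.0, np.pi, n_freqs)
base = 3.0 * np.sin(freq_axis)            # shared curve shape (synthetic)
variation = np.cos(freq_axis)             # per-model deviation shape (synthetic)
weights_true = rng.normal(0.0, 1.0, n_models)
curves = (base + np.outer(weights_true, variation)
          + 0.001 * rng.standard_normal((n_models, n_freqs)))

mean_curve = curves.mean(axis=0)          # the fixed filter curve
X = curves - mean_curve
U, s, Vt = np.linalg.svd(X, full_matrices=False)
pc1 = Vt[0]                               # the single weighted filter curve
w = X @ pc1                               # one scalar weight per headphone model

# rebuild every model-specific curve from just (mean_curve, pc1, weight)
recon = mean_curve + np.outer(w, pc1)
rel_err = np.linalg.norm(curves - recon) / np.linalg.norm(curves)
```

With one dominant mode of variation, a single component already reconstructs the curves almost exactly, which is the observation that lets a fixed-plus-weighted filter pair replace model-specific equalization.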
P10-6 Effect of Elastic Gel Layer on Energy Transfer from the Source to Moving Part of Sound Transducer—Minsung Cho, Edinburgh Napier University - Edinburgh, UK; Elena Prokofieva, Edinburgh Napier University - Edinburgh, UK; Mike Barker, Edinburgh Napier University - Edinburgh, UK
The use of porous materials in sound reproduction is well known. Such materials can be used to absorb unwanted radiation, to dissipate and redistribute waves travelling through the material's thickness, and to enhance the overall sound and hence the energy transfer within the material. A three-layer system comprising two rigid layers with a soft gel layer between them was investigated in this research work to establish the effect of the gel material on the system's energy performance. The role of the gel layer in transferring energy to the panel was examined. Experiments demonstrated that the gel layer between the top and bottom layers enhances the performance of the overall construction and also minimizes mechanical distortion by absorbing its bending waves. This effect makes it possible to extend the radiating frequency range of the construction both to higher and to lower frequencies.
Convention Paper 8849 (Purchase now)
P9 - Room Acoustics
Sunday, May 5, 14:30 — 18:00 (Sala Carducci)
Chris Baume, BBC Research and Development - London, UK
P9-1 Various Applications of Active Field Control—Takayuki Watanabe, Yamaha Corp. - Hamamatsu, Shizuoka, Japan; Masahiro Ikeda, Yamaha Corporation - Hamamatsu, Shizuoka, Japan
The Active Field Control system is an acoustic enhancement system that was developed to improve the acoustic conditions of a space so as to match the acoustic conditions required for a variety of different types of performance programs. This system is unique in that it uses FIR filtering to ensure freedom of control and the concept of spatial averaging to achieve stability with a lower number of channels than comparative systems. This system has been used in over 70 projects in both the U.S. and Japan. This paper will provide an overview of the characteristics of the system and examples of how the system has been applied.
Convention Paper 8859 (Purchase now)
P9-2 Comparative Acoustic Measurements: Spherical Sound Source vs. Dodecahedron—Plamen Valtchev, Univox - Sofia, Bulgaria; Denise Gerganova, Spherovox
A spherical sound source, consisting of a pair of coaxial loudspeakers and a pair of compression drivers radiating into a common radially expanding horn, is used for acoustic measurements of rooms for speech and music. For exactly the same source-microphone positions, comparative measurements were made with a typical dodecahedron, keeping the same microphone technique, identical signals, and the same recording hardware under the same measuring conditions. Several software programs were used to evaluate the acoustical parameters extracted from the impulse responses. The parameters are presented in tables and graphics for easier comparison of the sound sources. The spherical sound source reveals a higher dynamic range and perfectly repeatable parameters under source rotation, in contrast to the dodecahedron, where rotation steps caused deviations in some parameters.
Convention Paper 8860 (Purchase now)
P9-3 Archaeoacoustics: An Introduction—A New Take on an Old Science—Lise-Lotte Tjellesen, CLARP - London, UK; Karen Colligan, CLARP - London, UK
What is archaeoacoustics and how is it defined? This paper discusses the history and varying aspects of the discipline of archaeoacoustics, i.e., sound that has been measured, modeled, and analyzed with modern techniques in and around ancient sites, temple complexes, and standing stones. By piecing together sound environments from a long-lost past, the discipline is brought to life as a tool for archaeologists and historians. This paper provides a general overview of some of the most prolific studies to date, discusses some measurement and modeling methods, and considers where archaeoacoustics may be headed in the future and what purpose it serves in academia.
Convention Paper 8861 (Purchase now)
P9-4 Scattering Effects in Small-Rooms: From Time and Frequency Analysis to Psychoacoustic Investigation—Lorenzo Rizzi, Suono e Vita - Acoustic Engineering - Lecco, Italy; Gabriele Ghelfi, Suono e Vita - Acoustic Engineering - Lecco, Italy
This work continues the authors' effort to optimize a DSP tool for extracting information on mixing time and sound scattering effects from room impulse responses (R.I.R.s) measured in situ. Confirming earlier findings, a new dedicated experiment made it possible to scrutinize the effects of QRD scattering panels in non-Sabinian environments, in both the frequency and time domains. Listening tests were performed to investigate how scattering panels are perceived to affect small-room acoustic quality. The sound diffusion properties were explored through dedicated headphone auralization interviews, convolving known R.I.R.s with anechoic musical samples and correlating the calculated data with the psychoacoustic responses. The results validate the known effect on close-field recording in small rooms for music and recording, giving new insights.
Convention Paper 8862 (Purchase now)
P9-5 The Effects of Temporal Alignment of Loudspeaker Array Elements on Speech Intelligibility—Timothy J. Ryan, Webster University - St. Louis, MO, USA; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA; William L. Martens, University of Sydney - Sydney, NSW, Australia
The effects of multiple arrivals on the intelligibility of speech produced by live-sound reinforcement systems are examined. Investigated variables include the delay time between arrivals from multiple loudspeakers within an array and the geometry and type of array. Subjective testing, using captured binaural recordings of the Modified Rhyme Test under various treatment conditions, was carried out to determine the first- and second-order effects of the two experimental variables. Results indicate that different interaction effects exist for different amounts of delay offset.
Convention Paper 8863 (Purchase now)
P9-6 Some Practical Aspects of STI Measurement and Prediction—Peter Mapp, Peter Mapp Associates - Colchester, Essex, UK
The Speech Transmission Index (STI) has become the internationally accepted method of testing and assessing the potential intelligibility of sound systems. The technique is standardized in IEC 60268-16; however, it is not flawless. The paper discusses a number of common mechanisms that can affect the accuracy of STI measurements and predictions. In particular, it is shown that RaSTI is a poor predictor of STI in many sound system applications, and that the standard speech spectrum assumed by STI often does not replicate the speech spectrum of real announcements and is not in good agreement with other speech spectrum studies. The effects on STI measurements of common signal processing techniques such as equalization, compression, and AGC are also demonstrated and the implications discussed. The simplified STI derivative STIPA is shown to be a more robust method of assessing sound systems than RaSTI and, when applied as a direct measurement method, can have significant advantages over impulse-response-based STI measurement techniques.
Convention Paper 8864 (Purchase now)
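As background, the core STI computation maps measured modulation-transfer values to an effective signal-to-noise ratio and then to a transmission index. The sketch below is a deliberately simplified illustration: it keeps that mapping but replaces the IEC 60268-16 octave-band weighting and redundancy correction with a plain average, so it is not a standard-compliant STI.

```python
import numpy as np

def sti_from_mtf(m):
    """Simplified STI from a modulation-transfer matrix m (bands x mod-freqs).

    Each m value lies in (0, 1). Uses the standard m -> effective-SNR ->
    transmission-index mapping; the IEC 60268-16 octave-band weights and
    redundancy correction are intentionally omitted here.
    """
    m = np.clip(np.asarray(m, dtype=float), 1e-6, 1.0 - 1e-6)
    snr_eff = 10.0 * np.log10(m / (1.0 - m))   # effective SNR per cell, in dB
    snr_eff = np.clip(snr_eff, -15.0, 15.0)    # limit to +/-15 dB
    ti = (snr_eff + 15.0) / 30.0               # transmission index in [0, 1]
    return ti.mean()                           # unweighted average (simplified)
```

A fully degraded channel (m near 0) yields an index near 0, a clean one (m near 1) an index of 1, with m = 0.5 mapping to 0.5.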
P9-7 Combined Quasi-Anechoic and In-Room Equalization of Loudspeaker Responses—Balazs Bank, Budapest University of Technology and Economics - Budapest, Hungary
This paper presents a combined approach to loudspeaker/room response equalization based on simple in-room measurements. In the first step, the anechoic response of the loudspeaker, which mostly determines localization and timbre perception, is equalized with a low-order non-minimum-phase equalizer. This is done using the gated in-room response, which means that the equalization is incorrect at low frequencies, where the gate time is shorter than the anechoic impulse response. In the second step, a standard fractional-octave-resolution minimum-phase equalizer is designed based on the in-room response pre-equalized with the quasi-anechoic equalizer. This second step, in addition to correcting the room response, automatically compensates for the low-frequency errors introduced by the use of gated responses in the quasi-anechoic equalizer design.
Convention Paper 8826 (Purchase now)
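A minimal sketch of the two-step idea, assuming a single measured in-room impulse response; the gating window, FFT size, regularization floors, and magnitude-only first stage are illustrative choices of ours, not the paper's design:

```python
import numpy as np

def minimum_phase_from_magnitude(mag):
    """Minimum-phase spectrum with the given magnitude (real-cepstrum folding)."""
    n = len(mag)
    log_mag = np.log(np.maximum(mag, 1e-12))
    cep = np.fft.ifft(log_mag).real
    fold = np.zeros(n)               # fold the anticausal cepstrum onto the causal part
    fold[0] = cep[0]
    fold[1:n // 2] = 2.0 * cep[1:n // 2]
    fold[n // 2] = cep[n // 2]
    return np.exp(np.fft.fft(fold))

def two_step_equalizer(in_room_ir, gate_len, n_fft=4096):
    """Step 1: quasi-anechoic inverse from the gated (early) response.
    Step 2: minimum-phase correction of the pre-equalized in-room response."""
    gated = in_room_ir[:gate_len] * np.hanning(2 * gate_len)[gate_len:]
    H_quasi = np.fft.fft(gated, n_fft)
    eq1 = 1.0 / np.maximum(np.abs(H_quasi), 1e-6)   # magnitude-only inverse
    H_room = np.fft.fft(in_room_ir, n_fft)
    residual = np.abs(H_room) * eq1                  # pre-equalized in-room response
    eq2 = minimum_phase_from_magnitude(1.0 / np.maximum(residual, 1e-6))
    return eq1, eq2
```

Applying both stages flattens the magnitude of the in-room response: the second, minimum-phase stage corrects whatever the gated first stage got wrong at low frequencies.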
P11 - Perception and Education
Sunday, May 5, 15:30 — 17:00 (Foyer)
P11-1 Improve the Listening Ability Using E-Learning Methods—Bartlomiej Kruk, Wroclaw University of Technology - Wroclaw, Poland; Bartosz Zawieja, Wroclaw University of Technology - Wroclaw, Poland
The main aim of this paper is to show the possibility of listening training using e-learning methods combined with classical teaching. The described technique uses all available electronic media as well as traditional teaching methods; theory and practical examples are discussed. E-learning additionally allows students to work independently and to take tests at a convenient place and time. The e-learning exercises have been designed to develop skills including memorization, timbre description, and hearing sensitivity to changes in sound.
Convention Paper 8870 (Purchase now)
P11-2 Optimizing Teaching Room Acoustics: Investigating the Exclusive Use of a Distributed Electroacoustic Installation to Improve the Speech Intelligibility—Panagiotis Hatziantoniou, University of Patras - Patras, Greece; Nicolas Tatlas, Technological Educational Institute of Piraeus - Athens, Greece; Stelios M. Potirakis, Technological Educational Institute of Piraeus - Athens, Greece
This paper investigates the possibility of improving speech intelligibility in classrooms of inadequate acoustic design exclusively by means of an electroacoustic installation. Measured responses of a six-loudspeaker arrangement at different locations inside a test classroom are compared with those of a single loudspeaker placed at the speaker's location. Preliminary results indicate that well-established parameters such as C50 and D50, calculated automatically, show no significant improvement. Extended investigation of the responses, as well as other criteria such as the direct-to-reverberant ratio (DRR), showed a noteworthy enhancement, in agreement with subjective acoustic perception. Moreover, the DRR is shown to improve further when a room correction technique based on smoothed responses is employed.
Convention Paper 8871 (Purchase now)
P11-3 Software Techniques for Good Practice in Audio and Music Research—Luis Figueira, Queen Mary University of London - London, UK; Chris Cannam, Queen Mary University of London - London, UK; Mark Plumbley, Queen Mary University of London - London, UK
In this paper we discuss how software development can be improved in the audio and music research community by implementing tighter and more effective development feedback loops. We suggest first that researchers in an academic environment can benefit from the straightforward application of peer code review, even for ad-hoc research software; and second, that researchers should adopt automated software unit testing from the start of research projects. We discuss and illustrate how to adopt both code reviews and unit testing in a research environment. Finally, we observe that the use of a software version control system provides support for the foundations of both code reviews and automated unit tests. We therefore also propose that researchers should use version control with all their projects from the earliest stage.
Convention Paper 8872 (Purchase now)
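As a minimal illustration of the kind of automated unit test the authors advocate even for ad-hoc research software (the `rms` function and the tolerance are hypothetical examples of ours, not from the paper):

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal block."""
    return np.sqrt(np.mean(np.square(x)))

# A minimal automated unit test, runnable with `pytest` or plain `python`.
def test_rms_of_unit_sine():
    t = np.arange(48000) / 48000.0
    sine = np.sin(2 * np.pi * 1000.0 * t)
    # a full-scale sine over whole cycles has RMS 1/sqrt(2)
    assert abs(rms(sine) - 1.0 / np.sqrt(2.0)) < 1e-4

if __name__ == "__main__":
    test_rms_of_unit_sine()
```

Kept under version control from the start of a project, such tests document intended behavior and catch regressions introduced during later experiments.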
P11-4 A Practical Step-by-Step Guide to the Time-Varying Loudness Model of Moore, Glasberg, and Baer (1997; 2002)—Andrew J. R. Simpson, Queen Mary University of London - London, UK; Michael J. Terrell, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
In this tutorial article we provide a condensed, practical step-by-step guide to the excitation pattern loudness model of Moore, Glasberg, and Baer [J. Audio Eng. Soc., vol. 45, 224–240 (1997 Apr.); J. Audio Eng. Soc., vol. 50, 331–342 (2002 May)]. The various components of this model have been separately described in the well-known publications of Patterson et al. [J. Acoust. Soc. Am., vol. 72, 1788–1803 (1982)], Moore [Hearing, 161-205 (Academic Press 1995)], Moore et al. (1997), and Glasberg and Moore (2002). This paper provides a consolidated and concise introduction to the complete model for those who find the disparate and complex references intimidating and who wish to understand the function of each of the component parts. Furthermore, we provide a consolidated notation and integral forms. This introduction may be useful to the loudness theory beginner and to those who wish to adapt and apply the model for novel, practical purposes.
Convention Paper 8873 (Purchase now)
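For orientation, the auditory frequency scale underlying this model rests on the Glasberg and Moore (1990) equivalent-rectangular-bandwidth (ERB) formulas; a sketch of those conversions follows (the 0.25-Cam sampling step is the commonly used spacing for evaluating the excitation pattern in the 1997 model):

```python
import numpy as np

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at center frequency f
    (Glasberg and Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def hz_to_cam(f_hz):
    """Convert frequency in Hz to the ERB-number (Cam) scale."""
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)

def cam_to_hz(cam):
    """Inverse of hz_to_cam."""
    return (10.0 ** (cam / 21.4) - 1.0) * 1000.0 / 4.37

# Center frequencies spaced at 0.25-Cam steps, a common choice when
# sampling the excitation pattern
cams = np.arange(1.75, 39.0, 0.25)
centers = cam_to_hz(cams)
```

At 1 kHz, for example, the ERB is roughly 133 Hz, and the Cam scale compresses the audible range into about 40 units.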
P11-5 Subjective Evaluation of Sound Quality of Musical Recordings Transmitted via the DAB+ System—Maurycy Kin, Wroclaw University of Technology - Wroclaw, Poland
The results of research on the sound quality of various kinds of music transmitted via Digital Audio Broadcasting, using the Absolute Category Rating and Comparison Category Rating scaling methods, are presented. The results showed that bit rate significantly influences perceived quality. Spectral Band Replication increases sound quality more at low bit rates than at high ones, depending on the kind of music. The spatial attributes of sound, such as perspective, spaciousness, localization stability, and the accuracy of phantom sources, also depend on bit rate, but these relations differ. It was also found that the two evaluation methods give different results, with the CCR method being more accurate for sound assessment at higher bit rates.
Convention Paper 8874 (Purchase now)
P11-6 Music and Emotions: A Comparison of Measurement Methods—Judith Liebetrau, Fraunhofer IDMT - Ilmenau, Germany; Ilmenau University of Technology - Ilmenau, Germany; Sebastian Schneider, Ilmenau University of Technology - Ilmenau, Germany
Music emotion recognition (MER), a part of music information retrieval (MIR), examines which parts of music evoke which emotions and how they can be automatically classified. Classification systems need to be trained in terms of feature selection and prediction. Due to the subjectivity of emotions, the generation of appropriate ground-truth data poses challenges for MER. This paper describes the obstacles to defining and measuring emotions evoked by music. Two methods, in principle able to overcome the problems of measuring affective states induced by music, are outlined and their results compared. Although the results of both methods are in line with psychological theories of emotion, the question remains how well the perceived emotions are captured by either method and whether these methods are sufficient for ground-truth generation.
Convention Paper 8875 (Purchase now)
P11-7 Multidimensional Scaling Analysis Applied to Music Mood Recognition—Magdalena Plewa, Gdansk University of Technology - Gdansk, Poland; Bozena Kostek, Gdansk University of Technology - Gdansk, Poland
The paper presents two experiments aimed at categorizing the mood associated with music. A two-part listening test was designed and carried out with a group of students, most of whom were users of online social music services. The initial experiment evaluated the extent to which a given label describes the mood of a particular music excerpt. The second subjective test collected similarity data for multidimensional scaling (MDS) analysis. The results were subjected to various MDS and correlation analyses. The obtained MDS representation is relevant and coherent with the widely accepted two-dimensional Thayer model, as well as with the evaluation using six mood labels.
Convention Paper 8876 (Purchase now)
P11-8 Artificial Stereo Extension Based on Gaussian Mixture Model—Nam In Park, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Kwang Myung Jeon, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Chan Jun Chun, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea
In this paper an artificial stereo extension method is proposed to provide stereophonic sound. The proposed method employs a minimum mean squared error (MMSE) estimator based on a Gaussian mixture model (GMM) to produce stereo signals from a mono signal. The performance of the proposed stereo extension method is evaluated using a multiple stimuli with hidden reference and anchor (MUSHRA) test and compared with that of the parametric stereo method. The test shows that the mean opinion score of the signals extended by the proposed method is around 5% higher than that of the conventional stereo extension method based on inter-channel coherence (ICC).
Convention Paper 8877 (Purchase now)
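A GMM-based MMSE estimator of this kind has the standard closed form E[y|x] = Σ_k p(k|x) (μ_y,k + Σ_yx Σ_xx⁻¹ (x − μ_x,k)). A generic sketch using scikit-learn follows; the feature choices and dimensions are illustrative, and the paper's actual mono-to-stereo feature mapping is not reproduced here:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(x, y, n_components=8, seed=0):
    """Fit a GMM on joint vectors [x, y]: x (N, dx) observed, y (N, dy) target."""
    return GaussianMixture(n_components=n_components, covariance_type='full',
                           random_state=seed).fit(np.hstack([x, y]))

def mmse_regress(gmm, x, dx):
    """Per-sample MMSE estimate E[y|x] under a joint GMM fit on [x, y]."""
    K, N = gmm.n_components, x.shape[0]
    dy = gmm.means_.shape[1] - dx
    log_r = np.zeros((N, K))
    cond = np.zeros((K, N, dy))
    for k in range(K):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        mu_x, mu_y = mu[:dx], mu[dx:]
        Sxx, Syx = S[:dx, :dx], S[dx:, :dx]
        # responsibilities p(k|x) come from the x-marginal of each component
        log_r[:, k] = (np.log(gmm.weights_[k])
                       + multivariate_normal(mu_x, Sxx).logpdf(x))
        gain = Syx @ np.linalg.inv(Sxx)            # regression matrix Syx Sxx^-1
        cond[k] = mu_y + (x - mu_x) @ gain.T       # component-wise E[y|x, k]
    r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    return np.einsum('nk,knd->nd', r, cond)
```

On data with a clear x-to-y relationship, the mixture of per-component linear regressions tracks the conditional mean closely.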
P12 - Spatial Audio—Part 1: Binaural, HRTF
Monday, May 6, 09:00 — 13:00 (Sala Carducci)
Michele Geronazzo, University of Padova - Padova, Italy
P12-1 Binaural Ambisonic Decoding with Enhanced Lateral Localization—Tim Collins, University of Birmingham - Birmingham, UK
When rendering an ambisonic recording, a uniform speaker array is often preferred with the number of speakers chosen to suit the ambisonic order. Using this arrangement, localization in the lateral regions can be poor but can be improved by increasing the number of speakers. However, in practice this can lead to undesirable spectral impairment. In this paper a time-domain analysis of the ambisonic decoding problem is presented that highlights how a non-uniform speaker distribution can be used to improve localization without incurring perceptual spectral impairment. This is especially relevant to binaural decoders, where the locations of the virtual speakers are fixed with respect to the head, meaning that the interaction between speakers can be reliably predicted.
Convention Paper 8878 (Purchase now)
P12-2 A Cluster and Subjective Selection-Based HRTF Customization Scheme for Improving Binaural Reproduction of 5.1 Channel Surround Sound—Bosun Xie, South China University of Technology - Guangzhou, China; Chengyun Zhang, South China University of Technology - Guangzhou, China; Xiaoli Zhong, South China University of Technology - Guangzhou, China
This work proposes a cluster and subjective selection-based HRTF customization scheme for improving binaural reproduction of 5.1 channel surround sound. Based on similarity of HRTFs from an HRTF database with 52 subjects, a cluster analysis on HRTF magnitudes is applied. Results indicate that HRTFs of most subjects can be classified into seven clusters and represented by the corresponding cluster centers. Subsequently, HRTFs used in binaural 5.1 channel reproduction are customized from the seven cluster centers by means of subjective selection, i.e., a subjective selection-based customization scheme. Psychoacoustic experiments indicate that the proposed scheme partly improves the localization performance in the binaural 5.1 channel surround sound.
Convention Paper 8879 (Purchase now)
P12-3 Spatially Oriented Format for Acoustics: A Data Exchange Format Representing Head-Related Transfer Functions—Piotr Majdak, Austrian Academy of Sciences - Vienna, Austria; Yukio Iwaya, Tohoku Gakuin University - Tagajo, Japan; Thibaut Carpentier, UMR STMS IRCAM-CNRS-UPMC - Paris, France; Rozenn Nicol, Orange Labs, France Telecom - Lannion, France; Matthieu Parmentier, France Television - Paris, France; Agnieszka Roginska, New York University - New York, NY, USA; Yoiti Suzuki, Tohoku University - Sendai, Japan; Kankji Watanabe, Akita Prefectural University - Yuri-Honjo, Japan; Hagen Wierstorf, Technische Universität Berlin - Berlin, Germany; Harald Ziegelwanger, Austrian Academy of Sciences - Vienna, Austria; Markus Noisternig, UMR STMS IRCAM-CNRS-UPMC - Paris, France
Head-related transfer functions (HRTFs) describe the spatial filtering of incoming sound. Currently available HRTFs are stored in various formats, making exchange difficult because of incompatibilities between the formats. We propose a format for storing HRTFs with a focus on interchangeability and extensibility. The Spatially Oriented Format for Acoustics (SOFA) aims at representing HRTFs in a general way, thus allowing storage of data such as directional room impulse responses (DRIRs) measured with a microphone array excited by a loudspeaker array. The SOFA specifications consider data compression, network transfer, and links to complex room geometries, and aim at simplifying the development of programming interfaces for Matlab, Octave, and C++. SOFA conventions for a consistent description of measurement setups are provided for future HRTF and DRIR databases.
Convention Paper 8880 (Purchase now)
P12-4 Head Movements in Three-Dimensional Localization—Tommy Ashby, University of Surrey - Guildford, Surrey, UK; Russell Mason, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK
Previous studies give contradicting evidence as to the importance of head movements in localization. In this study head movements were shown to increase localization response accuracy in elevation and azimuth. For elevation, it was found that head movement improved localization accuracy in some cases and that when pinna cues were impeded the significance of head movement cues was increased. For azimuth localization, head movement reduced front-back confusions. There was also evidence that head movement can be used to enhance static cues for azimuth localization. Finally, it appears that head movement can increase the accuracy of listeners’ responses by enabling an interaction between auditory and visual cues.
Convention Paper 8881 (Purchase now)
P12-5 A Modular Framework for the Analysis and Synthesis of Head-Related Transfer Functions—Michele Geronazzo, University of Padova - Padova, Italy; Simone Spagnol, University of Padova - Padova, Italy; Federico Avanzini, University of Padova - Padova, Italy
The paper gives an overview of a number of tools for the analysis and synthesis of head-related transfer functions (HRTFs) that we have developed over the past four years at the Department of Information Engineering, University of Padova, Italy. The main objective of our work is the progressive development of a collection of algorithms for constructing a fully synthetic personal HRTF set, replacing both cumbersome and tedious individual HRTF measurements and the use of inaccurate non-individual HRTF sets. Our research methodology is highlighted, along with the multiple possibilities for present and future research offered by such tools.
Convention Paper 8882 (Purchase now)
P12-6 Measuring Directional Characteristics of In-Ear Recording Devices—Flemming Christensen, Aalborg University - Aalborg, Denmark; Pablo F. Hoffmann, Aalborg University - Aalborg, Denmark; Dorte Hammershøi, Aalborg University - Aalborg, Denmark
With the availability of small in-ear headphones and miniature microphones it is possible to construct combined in-ear devices for binaural recording and playback. When mounting a microphone on the outside of an insert earphone the microphone position deviates from ideal positions in the ear canal. The pinna and thereby also the natural sound transmission are altered by the inserted device. This paper presents a methodology for accurately measuring the directional dependent transfer functions of such in-ear devices. Pilot measurements on a commercially available device are presented and possibilities for electronic compensation of the non-ideal characteristics are considered.
Convention Paper 8883 (Purchase now)
P12-7 Modeling the Broadband Time-of-Arrival of the Head-Related Transfer Functions for Binaural Audio—Harald Ziegelwanger, Austrian Academy of Sciences - Vienna, Austria; Piotr Majdak, Austrian Academy of Sciences - Vienna, Austria
Binaural audio is based on the head-related transfer functions (HRTFs) that provide directional cues for the localization of virtual sound sources. HRTFs incorporate the time-of-arrival (TOA), the monaural timing information yielding the interaural time difference, essential for the rendering of multiple virtual sound sources. In this study we propose a method to robustly estimate spatially continuous TOA from an HRTF set. The method is based on a directional outlier remover and a geometrical model of the HRTF measurement setup. We show results for HRTFs of human listeners from three HRTF databases. The benefits of calculating the TOA in the light of the HRTF analysis, modifications, and synthesis are discussed.
Convention Paper 8884 (Purchase now)
P12-8 Multichannel Ring Upmix—Christof Faller, Illusonic GmbH - Uster, Switzerland; Lutz Altmann, IOSONO GmbH - Erfurt, Germany; Jeff Levison, IOSONO GmbH - Erfurt, Germany; Markus Schmidt, Illusonic GmbH - Uster, Switzerland
Multichannel spatial decompositions and upmixes have been proposed, but these are usually based on an unrealistically simple direct/ambient sound model that does not capture the full diversity offered by N discrete audio channels, where in an extreme case each channel can contain an independent sound source. While it has been argued that a simple direct/ambient model is sufficient, in practice it limits the achievable audio quality. To circumvent the problem of capturing multichannel signals with a single model, the proposed “ring upmix” applies a cascade of 2-channel upmixes to surround signals to generate channels for setups with more loudspeakers, featuring full support for 360-degree panning with high channel separation.
Convention Paper 8908 (Purchase now)
P13 - Room Acoustics
Monday, May 6, 10:00 — 11:30 (Foyer)
P13-1 The Effect of Playback System on Reverberation Level Preference—Brett Leonard, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Grzegorz Sikora, Bang & Olufsen Deutschland GmbH - Pullach, Germany
The critical role of reverberation in modern acoustic music production is undeniable. Unlike many other effects, reverberation’s spatial nature makes it extremely dependent upon the playback system over which it is experienced. While this characteristic of reverberation has been widely acknowledged among recording engineers for years, the increase in headphone listening prompts further exploration of these effects. In this study listeners are asked to add reverberation to a dry signal as presented over two different playback systems: headphones and loudspeakers. The final reverberation levels set by each subject are compared for the two monitoring systems. The resulting data show significant level differences across the two monitoring systems.
Convention Paper 8886 (Purchase now)
P13-2 Adaptation of a Large Exhibition Hall as a Concert Hall Using Simulation and Measurement Tools—Marco Facondini, TanAcoustics Studio - Pesaro (PU), Italy; Daniele Ponteggia, Studio Ing. Ponteggia - Terni (TR), Italy
Due to the growing demand for multifunctional performing spaces, there is strong interest in adapting non-dedicated spaces to host musical performances. This presents acousticians with new challenges, new design constraints, and very tight time frames. This paper shows a practical example of the adaptation of the “Sala della Piazza” of the “Palacongressi” of Rimini. Thanks to the combined use of prediction and measurement tools, it was possible to design the acoustical treatments with a high degree of accuracy, reaching all targets while respecting the tight deadlines.
Convention Paper 8887 (Purchase now)
P13-3 Digital Filter for Modeling Air Absorption in Real Time—Carlo Petruzzellis, ZP Engineering S.r.L. - Rome, Italy; Umberto Zanghieri, ZP Engineering S.r.L. - Rome, Italy
Sound atmospheric attenuation is a relevant aspect of realistic space modeling in 3-D audio simulation systems. A digital filter has been developed on commercial DSP processors to match air absorption curves. This paper focuses on the algorithm implementation of a digital filter with continuous roll-off control, to simulate high frequency damping of audio signals in various atmospheric conditions, along with rules to allow a precise approximation of the behavior described by analytical formulas.
Convention Paper 8888 (Purchase now)
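By way of illustration only (this is not the paper's filter), a distance-dependent one-pole lowpass is a common lightweight stand-in for high-frequency air absorption in real-time systems; the 100 dB/km coefficient at 10 kHz below is a made-up example value, not a figure from the paper or from atmospheric tables:

```python
import numpy as np

def air_absorption_lowpass(distance_m, fs=48000, alpha_db_per_km_at_10k=100.0):
    """One-pole lowpass approximating high-frequency air absorption.

    Hypothetical mapping: choose the cutoff so that the first-order
    attenuation at 10 kHz matches a given dB/km absorption coefficient
    for the travel distance."""
    target_db = alpha_db_per_km_at_10k * distance_m / 1000.0
    lin = 10.0 ** (-target_db / 20.0)
    # |H(f)|^2 = 1 / (1 + (f/fc)^2) for a first-order lowpass;
    # solve for fc so that the attenuation at 10 kHz equals target_db
    fc = 10000.0 / np.sqrt(max(1.0 / lin ** 2 - 1.0, 1e-9))
    # bilinear-transform one-pole coefficients
    w = np.tan(np.pi * fc / fs)
    b0 = w / (1.0 + w)
    a1 = (w - 1.0) / (1.0 + w)
    return b0, a1   # y[n] = b0*(x[n] + x[n-1]) - a1*y[n-1]
```

The single cutoff parameter gives the continuous roll-off control mentioned in the abstract: as the simulated distance grows, the cutoff slides downward and high frequencies are damped more.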
P13-4 Development of Multipoint Mixed-Phase Equalization System for Multiple Environments—Stefania Cecchi, Universitá Politecnica della Marche - Ancona, Italy; Marco Virgulti, Universitá Politecnica della Marche - Ancona, Italy; Stefano Doria, Leaff Engineering - Ancona, Italy; Ferruccio Bettarelli, Leaff Engineering - Ancona, Italy; Francesco Piazza, Universitá Politecnica della Marche - Ancona (AN), Italy
The development of a mixed-phase equalizer is still an open problem in the field of room response equalization. In this context, a multipoint mixed-phase impulse response equalization system is presented, combining a magnitude equalization procedure based on time-frequency segmentation of the impulse responses with a phase equalization technique based on group delay analysis. Furthermore, an automatic software tool for measuring the environment impulse responses and designing a suitable equalizer is presented. Using this tool, several tests were performed, applying objective and subjective analysis in a real environment and comparing the obtained results with different approaches.
Convention Paper 8889 (Purchase now)
P13-5 Acoustics Modernization of the Recording Studio in Wroclaw University of Technology—Magdalena Kaminska, Wroclaw University of Technology - Wroclaw, Poland; Patryk Kobylt, Wroclaw University of Technology - Wroclaw, Poland; Bartlomiej Kruk, Wroclaw University of Technology - Wroclaw, Poland; Jan Sokolnicki, Wroclaw University of Technology - Wroclaw, Poland
The aim of this paper is to present the results of the acoustic modernization of the Wroclaw University of Technology recording studio. The project focused on a problem arising in one part of the studio: the so-called flutter echo phenomenon. To minimize this effect, we present a several-stage process adapting the studio. The first step was to measure the acoustic properties of the room, concentrating on the aforementioned effect. Next, a one-dimensional diffuser was designed and placed where the phenomenon occurred. The last stage of the research was an acoustic measurement after the modification and a comparison with the properties before the changes.
Convention Paper 8890 (Purchase now)
P13-6 Accurate Acoustic Echo Reduction with Residual Echo Power Estimation for Long Reverberation—Masahiro Fukui, NTT Corporation - Musashino-shi, Tokyo, Japan; Suehiro Shimauchi, NTT Corporation - Musashino-shi, Tokyo, Japan; Yusuki Hioka, University of Canterbury - Christchurch, New Zealand; Hitoshi Ohmuro, NTT Corporation - Musashino-shi, Tokyo, Japan; Yoichi Haneda, The University of Electro-Communications - Chofu-shi, Tokyo, Japan
This paper deals with the problem of estimating and reducing residual echo components that result from reverberant components longer than the FFT block. The residual echo reduction process suppresses the residual echo by applying a multiplicative gain calculated from the estimated echo power spectrum. However, the estimated power spectrum reproduces only a fraction of the echo-path impulse response, so not all the reverberant components are considered. To address this problem, we introduce a finite nonnegative convolution method, by which each segment of the echo-path impulse response is convolved with the received signal in the power spectral domain. With the proposed method, the power spectra of each segment of the echo-path impulse response are collectively estimated by solving a least-mean-squares problem between the microphone and estimated-residual-echo power spectra. The performance of this method was demonstrated by simulation results in which speech distortion was decreased compared with the conventional method.
Convention Paper 8891 (Purchase now)
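The finite nonnegative convolution step can be posed, per frequency bin, as a nonnegative least-squares problem over frame-delayed far-end power sequences; a sketch under that reading (the function names and the choice of SciPy's NNLS solver are ours, not the paper's):

```python
import numpy as np
from scipy.optimize import nnls

def estimate_echo_path_powers(far_power, mic_power, n_segments):
    """Estimate nonnegative per-segment echo-path powers g so that, per band,
    mic_power[l] ~ sum_m g[m] * far_power[l - m]  (finite nonnegative convolution).

    far_power, mic_power: (L,) power sequences for one frequency bin over L frames.
    """
    L = len(far_power)
    A = np.zeros((L, n_segments))
    for m in range(n_segments):
        A[m:, m] = far_power[:L - m]      # delayed copies of the far-end power
    g, _ = nnls(A, mic_power)             # nonnegative least-squares fit
    return g

def residual_echo_power(far_power, g):
    """Predicted residual echo power from the estimated per-segment powers."""
    L = len(far_power)
    out = np.zeros(L)
    for m, gm in enumerate(g):
        out[m:] += gm * far_power[:L - m]
    return out
```

The nonnegativity constraint keeps the predicted residual echo power physically meaningful, so the suppression gain derived from it never over-subtracts below zero power.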
P14 - Applications in Audio
Monday, May 6, 14:30 — 17:30 (Sala Carducci)
Juha Backman, Nokia Corporation - Espoo, Finland
P14-1 Implementation of an Intelligent Equalization Tool Using Yule-Walker for Music Mixing and Mastering—Zheng Ma, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK; Dawn A. A. Black, Queen Mary University of London - London, UK
A new approach for automatically equalizing an audio signal toward a target frequency spectrum is presented. The algorithm is based on the Yule-Walker method and designs IIR digital filters by least-squares fitting to any desired frequency response. The target equalization curve is obtained from the spectral distribution analysis of a large dataset of popular commercial recordings. A real-time C++ VST plug-in and an off-line Matlab implementation have been created. Straightforward objective evaluation is also provided, where the output frequency spectra are compared against the target equalization curve and the ones produced by an alternative equalization method.
Convention Paper 8892 (Purchase now)
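The Yule-Walker step described above can be illustrated compactly: turn a target power spectrum into an autocorrelation sequence and solve the resulting Toeplitz normal equations for an all-pole filter. The sketch below is a minimal numpy illustration of the general method with an arbitrary made-up target curve, not the authors' C++/Matlab implementation.

```python
import numpy as np

def yule_walker_ar(target_mag, order):
    """Fit an all-pole (AR) filter to a target magnitude response.

    target_mag: desired magnitude on a uniform grid from DC to Nyquist.
    Returns (a, g) with a[0] == 1 such that |g / A(e^jw)| approximates
    target_mag in a least-squares sense.
    """
    # Power spectrum on the full circle (mirror the half-spectrum).
    power = np.concatenate([target_mag, target_mag[-2:0:-1]]) ** 2
    # Autocorrelation = inverse FFT of the power spectrum.
    r = np.fft.ifft(power).real
    # Yule-Walker normal equations: R a = -r[1:order+1], R Toeplitz.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a_tail = np.linalg.solve(R, -r[1:order + 1])
    a = np.concatenate([[1.0], a_tail])
    # Gain from the prediction-error power.
    g = np.sqrt(max(r[0] + np.dot(a_tail, r[1:order + 1]), 1e-12))
    return a, g
```

For an equalizer, the same fit would be applied to the ratio of target to measured spectrum rather than to the target itself.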
P14-2 On the Informed Source Separation Approach for Interactive Remixing in Stereo—Stanislaw Gorlow, University of Bordeaux - Talence, France; Sylvain Marchand, Université de Brest - Brest, France
Informed source separation (ISS) has become a popular trend in the audio signal processing community over the past few years. Its purpose is to decompose a mixture signal into its constituent parts at the desired or the best possible quality level given some metadata. In this paper we present a comparison between two ISS systems and relate the ISS approach in various configurations with conventional coding of separate tracks for interactive remixing in stereo. The compared systems are Underdetermined Source Signal Recovery (USSR) and Enhanced Audio Object Separation (EAOS). The latter forms a part of MPEG’s Spatial Audio Object Coding technology. The performance is evaluated using objective difference grades computed with PEMO-Q. The results suggest that USSR performs perceptually better than EAOS and has a lower computational complexity.
Convention Paper 8893 (Purchase now)
P14-3 Scene Inference from Audio—Daniel Arteaga, Fundacio Barcelona Media - Barcelona, Spain; Universitat Pompeu Fabra - Barcelona, Spain; David García-Garzón, Universitat Pompeu Fabra - Barcelona, Spain; Toni Mateos, imm sound - Barcelona, Spain; John Usher, Hearium Labs - San Francisco, CA, USA
We report on the development of a system to characterize the geometric and acoustic properties of a space from an acoustic impulse response measured within it. This can be thought of as the inverse problem to the common practice of obtaining impulse responses from either real-world or virtual spaces. Starting from an impulse response recorded in an original scene, the method described here uses a non-linear search strategy to select a scene that is perceptually as close as possible to the original one. Potential applications of this method include audio production, non-intrusive acquisition of room geometry, and audio forensics.
Convention Paper 8894 (Purchase now)
P14-4 Continuous Mobile Communication with Acoustic Co-Location Detection—Robert Albrecht, Aalto University - Espoo, Finland; Sampo Vesa, Nokia Research Center - Nokia Group, Finland; Jussi Virolainen, Nokia Lumia Engineering - Nokia Group, Finland; Jussi Mutanen, JMutanen Software - Jyväskylä, Finland; Tapio Lokki, Aalto University - Aalto, Finland
In a continuous mobile communication scenario, e.g., between co-workers, participants may occasionally be located in the same space and thus hear each other naturally. To avoid hearing echoes, the audio transmission between these participants should be cut off. In this paper an acoustic co-location detection algorithm is proposed for the task, grouping participants together based solely on their microphone signals and mel-frequency cepstral coefficients thereof. The algorithm is tested both on recordings of different communication situations and in real time integrated into a voice-over-IP communication system. Tests on the recordings show that the algorithm works as intended, and the evaluation using the voice-over-IP conferencing system concludes that the algorithm improves the overall clarity of communication compared with not using the algorithm. The acoustic co-location detection algorithm thus proves a useful aid in continuous mobile communication systems.
Convention Paper 8895 (Purchase now)
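The grouping idea in the abstract above, deciding from microphone signals alone whether two participants share a room, can be sketched with a crude spectral-envelope correlation. The paper uses mel-frequency cepstral coefficients; the version below substitutes coarse log-band energies and a fixed correlation threshold, both hypothetical simplifications.

```python
import numpy as np

def frame_log_spectra(x, frame=512, hop=256, n_bands=20):
    """Coarse log-spectral envelopes per frame (a rough stand-in for MFCCs)."""
    frames = []
    for start in range(0, len(x) - frame + 1, hop):
        seg = x[start:start + frame] * np.hanning(frame)
        mag = np.abs(np.fft.rfft(seg))
        # Average FFT bins into a few coarse bands.
        bands = np.array_split(mag, n_bands)
        frames.append(np.log(np.array([b.mean() for b in bands]) + 1e-9))
    return np.array(frames)

def colocated(x, y, threshold=0.8):
    """Heuristic co-location test: correlate band-energy trajectories."""
    fx, fy = frame_log_spectra(x), frame_log_spectra(y)
    n = min(len(fx), len(fy))
    rho = np.corrcoef(fx[:n].ravel(), fy[:n].ravel())[0, 1]
    return rho > threshold
```

Two microphones in the same room pick up the same acoustic scene, so their envelope trajectories correlate strongly; microphones in different rooms do not.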
P14-5 Advancements and Performance Analysis on the Wireless Music Studio (WeMUST) Framework—Leonardo Gabrielli, Universitá Politecnica delle Marche - Ancona, Italy; Stefano Squartini, Università Politecnica delle Marche - Ancona, Italy; Francesco Piazza, Universitá Politecnica della Marche - Ancona (AN), Italy
Music production devices and musical instruments can take advantage of IEEE 802.11 wireless networks for interconnection and audio data sharing. In previous works such networks have been shown to support high-quality audio streaming between devices at acceptable latencies in several application scenarios. In this work a prototype device discovery mechanism is described that improves ease of use and flexibility. A diagnostic tool that characterizes average network latency and packet loss is also described and provided to the community. Lower latencies are reported after software optimization, and the sustainability of multiple audio channels is demonstrated by means of experimental tests.
Convention Paper 8896 (Purchase now)
P14-6 Acoustical Characteristics of Vocal Modes in Singing—Eddy B. Brixen, EBB-consult - Smorum, Denmark; Cathrine Sadolin, Complete Vocal Institute - Copenhagen, Denmark; Henrik Kjelin, Complete Vocal Institute - Copenhagen, Denmark
According to the Complete Vocal Technique four vocal modes are defined: Neutral, Curbing, Overdrive, and Edge. These modes are valid for both the singing voice and the speaking voice. The modes are clearly identified both from listening and from visual laryngograph inspection of the vocal cords and the surrounding area of the vocal tract. In a recent work a model was described to distinguish between the modes based on acoustical analysis. This paper looks further into the characteristics of the vocal modes in singing in order to test the model already provided. The conclusion is that the model is too simple to cover the full range. The work has also provided information on singers’ SPL and formant repositioning as a function of pitch. Further work is recommended.
Convention Paper 8897 (Purchase now)
P15 - Spatial Audio
Monday, May 6, 15:00 — 16:30 (Foyer)
P15-1 Intelligent Acoustic Interfaces for Immersive Audio—Danilo Comminiello, Sapienza University of Rome - Rome, Italy; Michele Scarpiniti, Sapienza University of Rome - Rome, Italy; Raffaele Parisi, Sapienza University of Rome - Rome, Italy; Aurelio Uncini, Sapienza University of Rome - Rome, Italy
Emerging audio technologies prioritize the perceptual quality of audio signals, offering users an immersive audio experience that involves both listening and the acquisition of audio signals. In such a scenario a fundamental role is played by intelligent acoustic interfaces, which aim at acquiring audio information, processing it, and returning the processed information while fulfilling the quality requirements demanded by users. In this paper we introduce intelligent acoustic interfaces for the immersive audio experience and prove their effectiveness in the context of immersive speech communications. In particular, we introduce an intelligent acoustic interface that combines an adaptive beamforming scheme with a microphone array and is able to enhance the processed signals in immersive scenarios.
Convention Paper 8898 (Purchase now)
P15-2 The Effects of Spatial Depth in the Combinations of 3-D Imagery and 7-Channel Surround with Height Channels—Toru Kamekawa, Tokyo University of the Arts - Tokyo, Japan; Atsushi Marui, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan; Toshihiko Date, AVC Networks Company, Panasonic Corporation - Osaka, Japan; Masaaki Enatsu, marimoRECORDS, Inc., - Tokyo, Japan
The effect of height-channel loudspeakers on the perceived spatial depth of combined 3-D imagery and multichannel sound was studied. The first experiment used magnitude estimation, asking how near or far the combination of perceived visual and auditory events appeared. In the second experiment, subjects rated the suitability of the sound to the image using the same materials as the previous experiment. The results show that 7-channel and 5-channel surround were perceived as near and 2-channel stereo as far when no image was present. Regarding the suitability of sound to an image, 3-D imagery with 7-channel surround scored higher at near distances.
Convention Paper 8899 (Purchase now)
P15-3 Comparative Analysis on Compact Representation for Spatial Variation of Individual Head-Related Transfer Functions Based on Singular Value Decomposition—Shouichi Takane, Akita Prefectural University - Yurihonjo, Akita, Japan
In this paper compact representation of head-related transfer functions (HRTFs) and head-related impulse responses (HRIRs) based on singular value decomposition (SVD) was investigated, focusing on two points concerning the parameters used to construct the average vectors and covariance matrices. The first is how the derived eigenvectors reflect the properties of the HRTFs and/or HRIRs. A high correlation was found between the SVD parameters and the amplitude of the HRTFs over a wide region, except at the contralateral side. The second is the number of HRTFs or HRIRs required to construct the average vectors and covariance matrices; it was found that this number can be reduced to about 1/4 of the full set of directions for the HRTFs used. Among the three representations considered (HRIRs, HRTF amplitude, and HRTF log-amplitude), the HRTF amplitude was shown to be the most effective for constructing the SVD parameters.
Convention Paper 8900 (Purchase now)
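The SVD-based compact representation discussed above amounts to principal-component compression of an HRIR set: subtract the mean, keep only the leading singular vectors, and store per-direction weights. A toy numpy sketch, with synthetic low-rank data standing in for measured HRIRs:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for a measured set: 200 directions x 128-tap HRIRs,
# built from a few latent basis shapes so it is genuinely low-rank + noise.
basis = rng.standard_normal((5, 128))
weights = rng.standard_normal((200, 5))
hrirs = weights @ basis + 0.01 * rng.standard_normal((200, 128))

mean = hrirs.mean(axis=0)
U, s, Vt = np.linalg.svd(hrirs - mean, full_matrices=False)

k = 5  # keep only the leading components
approx = mean + (U[:, :k] * s[:k]) @ Vt[:k]
err = np.linalg.norm(hrirs - approx) / np.linalg.norm(hrirs)
```

Storing `mean`, `Vt[:k]`, and the 5 weights per direction replaces 128 taps per direction, the kind of compaction the paper evaluates on real HRTF data.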
P15-4 Calculation of Individualized Near-Field Head-Related Transfer Function Database Using Boundary Element Method—Yuanqing Rui, South China University of Technology - Guangzhou, China; Guangzheng Yu, South China University of Technology - Guangzhou, Guangdong, China; Bosun Xie, South China University of Technology - Guangzhou, China; Yu Liu, South China University of Technology - Guangzhou, China
Measurement is a common method of obtaining far-field HRTFs. Due to measurement difficulties, near-field HRTF databases for artificial heads are rare, and an individualized database for human subjects has so far been unavailable. The present work uses a laser 3-D scanner to acquire the geometrical surfaces of human subjects and then applies the boundary element method to calculate their near-field HRTFs. Finally, an individualized near-field HRTF database of 56 human subjects is established. To evaluate the accuracy of the database, the HRTFs for KEMAR are also calculated and compared with measured ones.
Convention Paper 8901 (Purchase now)
P15-5 A Standardized Repository of Head-Related and Headphone Impulse Response Data—Michele Geronazzo, University of Padova - Padova, Italy; Fabrizio Granza, University of Padova - Padova, Italy; Simone Spagnol, University of Padova - Padova, Italy; Federico Avanzini, University of Padova - Padova, Italy
This paper proposes a repository for storing full- and partial-body Head-Related Impulse Responses (HRIRs/pHRIRs) and Headphone Impulse Responses (HpIRs) from several databases in a standardized environment. The main differences among the available databases concern coordinate systems, sound source stimuli, sampling frequencies, and other important specifications. The repository is organized so as to consider all these differences. The structure of our repository is an improvement with respect to the MARL-NYU data format, born as an attempt to unify HRIR databases. The introduced information supports flexible analysis and synthesis processes and robust headphone equalization.
Convention Paper 8902 (Purchase now)
P15-6 Influence of Different Microphone Arrays on IACC as an Objective Measure of Spaciousness—Marco Conceição, Trinity College - Dublin, Ireland; Instituto Politécnico do Porto - Porto, Portugal; Dermot Furlong, Trinity College - Dublin, Ireland
Inter-aural cross correlation (IACC) measurements are used as physical measures of listener spaciousness experience in a comparative study of the influence of different microphone arrays, allowing an objective approach to exploring how microphone arrays affect perceived spaciousness for stereo and surround sound reconstructions. The different microphone arrays recorded simulated direct and indirect sound components. The recorded signals were played back in three different rooms and IACC measurements were made for the reconstructed sound fields using a dummy head microphone system. The results show how microphone array details influence the IACC peak and lead to a better understanding of how spaciousness can be controlled for 2-channel stereo and 5.1 presentations. Parametric variation of microphone arrays can therefore be employed to facilitate spaciousness control for reconstructed sound fields.
Convention Paper 8885 (Purchase now)
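The IACC used above as the objective measure is the peak of the normalized interaural cross-correlation within roughly ±1 ms of lag. A direct numpy sketch (broadband and unweighted, which is a simplification of the standard room-acoustic measurement):

```python
import numpy as np

def iacc(left, right, fs, max_lag_ms=1.0):
    """Interaural cross-correlation coefficient: peak of the normalized
    cross-correlation of the two ear signals within +/- max_lag_ms of lag."""
    max_lag = int(fs * max_lag_ms / 1000)
    denom = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            num = np.sum(left[lag:] * right[:len(right) - lag])
        else:
            num = np.sum(left[:lag] * right[-lag:])
        best = max(best, abs(num) / denom)
    return best
```

Values near 1 indicate coherent ear signals (low spaciousness); values near 0 indicate decorrelated ear signals (high spaciousness).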
P16 - Spatial Audio—Part 2: 3-D Microphone and Loudspeaker Systems
Tuesday, May 7, 09:00 — 12:30 (Sala Carducci)
Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK
P16-1 Recording and Playback Techniques Employed for the “Urban Sounds” Project—Angelo Farina, Università di Parma - Parma, Italy; Andrea Capra, University of Parma - Parma, Italy; Alberto Amendola, University of Parma - Parma, Italy; Simone Campanini, University of Parma - Parma, Italy
The “Urban Sounds” project, born from a cooperation between the Industrial Engineering Department of the University of Parma and the municipal institution La Casa della Musica, aims to record characteristic soundscapes in the town of Parma with a dual purpose: delivering to posterity an archive of recorded sound fields documenting Parma in 2012, employing advanced 3-D surround recording techniques, and creating a “musical” Ambisonics composition leading the audience through a virtual tour of the town. The archive includes recordings of various soundscapes considered characteristic of the town, such as streets, squares, schools, churches, meeting places, public parks, the train station, and the airport. This paper details the advanced digital sound processing techniques employed for recording, processing, and playback.
Convention Paper 8903 (Purchase now)
P16-2 Robust 3-D Sound Source Localization Using Spherical Microphone Arrays—Carl-Inge Colombo Nilsen, University of Oslo - Oslo, Norway; Squarehead Technology AS - Norway; Ines Hafizovic, SquareHead Technology AS - Oslo, Norway; Sverre Holm, University of Oslo - Oslo, Norway
Spherical arrays are gaining increased interest in spatial audio reproduction, especially in Higher Order Ambisonics. In many audio applications sound source detection and localization are of crucial importance, calling for robust localization methods suitable for spherical arrays. The well-known direction-of-arrival estimator, the ESPRIT algorithm, is not directly applicable to spherical arrays for 3-D applications. The eigenbeam ESPRIT (EB-ESPRIT) is based on the spherical harmonics framework and is derived especially for spherical arrays. However, ESPRIT methods are in general susceptible to errors in the presence of correlated sources, and conventional spatial decorrelation is not possible for spherical arrays. In our new implementation of spherical harmonics-based ESPRIT (SA2ULA-ESPRIT), robustness against correlation is achieved by spatial decorrelation incorporated directly into the algorithm formulation. The simulated performance of the new algorithm is compared to EB-ESPRIT.
Convention Paper 8904 (Purchase now)
P16-3 Parametric Spatial Audio Coding for Spaced Microphone Array Recordings—Archontis Politis, Aalto University - Espoo, Finland; Mikko-Ville Laitinen, Aalto University - Espoo, Finland; Jukka Ahonen, Akukon Ltd. - Helsinki, Finland; Ville Pulkki, Aalto University - Aalto, Finland
Spaced microphone arrays for multichannel recording of music performances, when reproduced in a multichannel system, exhibit reduced inter-channel coherence that translates perceptually to a pleasant ‘enveloping’ quality, at the expense of accurate localization of sound sources. We present a method to process the spaced microphone recordings using the principles of Directional Audio Coding (DirAC), based on knowledge of the array configuration and the frequency-dependent microphone patterns. The method achieves quality equal to or better than the standard high-quality version of DirAC, and it improves on the common one-to-one channel playback of spaced multichannel recordings by offering improved and stable localization cues.
Convention Paper 8905 (Purchase now)
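DirAC-style processing, referred to in the abstract above, parameterizes the sound field by a direction of arrival and a diffuseness estimate derived from the active intensity of B-format signals. A broadband time-domain sketch; actual DirAC operates per time-frequency tile and with proper physical scaling constants, which are simplified away here:

```python
import numpy as np

def dirac_params(w, x, y, z):
    """Broadband DirAC-style parameters from B-format signals.

    w: omni (pressure); x/y/z: figure-of-eight (particle velocity) signals.
    Returns (azimuth, elevation, diffuseness) from the time-averaged
    active intensity vector, with simplified normalization.
    """
    # Time-averaged intensity components (up to constant factors).
    intensity = np.array([np.mean(w * x), np.mean(w * y), np.mean(w * z)])
    energy = 0.5 * (np.mean(w ** 2) + np.mean(x ** 2 + y ** 2 + z ** 2))
    azimuth = np.arctan2(intensity[1], intensity[0])
    elevation = np.arctan2(intensity[2], np.hypot(intensity[0], intensity[1]))
    diffuseness = 1.0 - np.linalg.norm(intensity) / (energy + 1e-12)
    return azimuth, elevation, diffuseness
```

A single plane wave yields diffuseness near 0 with the correct azimuth; independent signals in all channels (a diffuse field) yield diffuseness near 1.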
P16-4 Optimal Directional Pattern Design Utilizing Arbitrary Microphone Arrays: A Continuous-Wave Approach—Symeon Delikaris-Manias, Aalto University - Helsinki, Finland; Constantinos A. Valagiannopoulos, Aalto University - Espoo, Finland; Ville Pulkki, Aalto University - Aalto, Finland
A frequency-domain method is proposed for designing directional patterns from arbitrary microphone arrays employing the complex Fourier series. A target directional pattern is defined and an optimal set of sensor weights is determined in a least-squares sense, adopting a continuous-wave approach. The method is based on discrete measurements with a high spatial sampling density, which mitigates potential aliasing effects. Fourier analysis is a common method for audio signal decomposition; in this approach, however, a set of criteria is employed to define the optimal number of Fourier coefficients and microphones for the decomposition of the microphone array signals in each frequency band. Furthermore, low-frequency robustness is increased by smoothing the target patterns in those bands. The performance of the algorithm is assessed by calculating the directivity index and the sensitivity. Applications such as synthesizing virtual microphones, beamforming, and binaural and loudspeaker rendering are presented.
Convention Paper 8906 (Purchase now)
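The least-squares pattern design described above can be condensed to its core step: build a steering matrix from plane-wave responses at the sensors, sample the target pattern on a dense angular grid, and solve for the weights. The sketch below assumes a circular array of ideal omni sensors in 2-D free field, a simplification of the paper's arbitrary-array formulation:

```python
import numpy as np

def ls_pattern_weights(mic_azimuths, target, wavenumber_radius):
    """Least-squares sensor weights approximating a target directivity
    pattern with a circular array of omni sensors (2-D free field).

    target: function azimuth -> desired response.
    wavenumber_radius: k*r, frequency dependent.
    """
    look = np.linspace(0, 2 * np.pi, 72, endpoint=False)  # dense grid
    mics = np.asarray(mic_azimuths)
    # Steering matrix: plane wave from direction phi observed at each mic.
    A = np.exp(1j * wavenumber_radius
               * np.cos(look[:, None] - mics[None, :]))
    d = np.array([target(phi) for phi in look])
    w, *_ = np.linalg.lstsq(A, d, rcond=None)
    return w
```

With the weights in hand, the achieved pattern at any frequency is just the steering matrix applied to them, which is how the directivity index would be evaluated.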
P16-5 Layout Remapping Tool for Multichannel Audio Productions—Tim Schmele, Fundació Barcelona Media - Barcelona, Spain; David García-Garzón, Universitat Pompeu Fabra - Barcelona, Spain; Umut Sayin, Fundació Barcelona Media - Barcelona, Spain; Davide Scaini, Fundació Barcelona Media - Barcelona, Spain; Universitat Pompeu Fabra - Barcelona, Spain; Daniel Arteaga, Fundacio Barcelona Media - Barcelona, Spain; Universitat Pompeu Fabra - Barcelona, Spain
Several multichannel audio formats are present in the recording industry with reduced interoperability among the formats. This diversity of formats leaves the end user with limited accessibility to content and/or audience. In addition, the preservation of recordings—that are made for a particular format—comes under threat, should the format become obsolete. To tackle such issues, we present a layout-to-layout conversion tool that allows converting recordings that are designated for one particular layout to any other layout. This is done by decoding the existent recording to a layout independent equivalent and then encoding it to desired layout through different rendering methods. The tool has proven useful according to expert opinions. Simulations depict that after several consecutive conversions the results exhibit a decrease in spatial accuracy and increase in overall gain. This suggests that consecutive conversions should be avoided and only a single conversion from the originally rendered material should be done.
Convention Paper 8907 (Purchase now)
P16-6 Discrimination of Changing Loudspeaker Positions of 22.2 Multichannel Sound System Based on Spatial Impressions—Ikuko Sawaya, Science & Technology Research Laboratories, Japan Broadcasting Corp. - Setagaya, Tokyo, Japan; Kensuke Irie, Science & Technology Research Laboratories, Japan Broadcasting Corp. - Setagaya, Tokyo, Japan; Takehiro Sugimoto, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Tokyo Institute of Technology - Midori-ku, Yokohama, Japan; Akio Ando, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Kimio Hamasaki, NHK Science & Technology Research Laboratories - Setagaya, Tokyo, Japan
In 22.2 multichannel reproduction, sound is sometimes reproduced over a loudspeaker arrangement different from the one used in production, and listeners do not always clearly perceive a difference in spatial impression between the two. In this paper we discuss the effects of changing some of the loudspeaker positions from the reference on the spatial impressions of a 22.2 multichannel sound system. Subjective evaluation tests were carried out by altering the azimuth and elevation angles from the reference in each condition. Experimental results showed that listeners did not perceive a difference in spatial impression from the reference for some loudspeaker arrangements. On the basis of these results, the ranges over which the spatial impression remains equivalent to the reference are discussed for material carrying signals on all channels of the 22.2 multichannel sound system.
Convention Paper 8909 (Purchase now)
P16-7 Modeling Sound Localization of Amplitude-Panned Phantom Sources in Sagittal Planes—Robert Baumgartner, Austrian Academy of Sciences - Vienna, Austria; Piotr Majdak, Austrian Academy of Sciences - Vienna, Austria
Localization of sound sources in sagittal planes (front/back and up/down) relies on listener-specific monaural spectral cues. A functional model approximating human processing of spectro-spatial information is applied to assess localization of phantom sources created by amplitude panning of coherent loudspeaker signals. Based on model predictions we investigated the localization of phantom sources created by loudspeakers positioned in the front and in the back, the effect of loudspeaker span and panning ratio on localization performance in the median plane, and the amount of spatial coverage provided by common surround sound systems. Our findings are discussed in the light of previous results from psychoacoustic experiments.
Convention Paper 8910 (Purchase now)
P17 - Measurements and Modeling
Tuesday, May 7, 10:00 — 11:30 (Foyer)
P17-1 Estimation of Overdrive in Music Signals—Lasse Vetter, Queen Mary University of London - London, UK; Michael J. Terrell, Queen Mary University of London - London, UK; Andrew J. R. Simpson, Queen Mary University of London - London, UK; Andrew McPherson, Queen Mary University of London - London, UK
In this paper we report experimental and modeling results from an investigation of listeners’ ability to estimate overdrive in a signal. The term overdrive is used to characterize the result of systematic, level-dependent nonlinearity typical of audio equipment and processors (e.g., guitar amplifiers). Listeners (N=7) were given the task of estimating the degree of overdrive in music signals that had been processed with a static, saturating nonlinearity to introduce varying degrees of nonlinear distortion. A statistical model is proposed to account for the data, which is based on a measure of time-variance in the summed frequency-response deviation introduced by the nonlinearity. This provides a useful “black-box” metric that describes the perceived amount of overdrive introduced by an audio processing device.
Convention Paper 8911 (Purchase now)
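A static saturating nonlinearity of the kind used as a stimulus above is easy to reproduce, together with a simple objective distortion measure; the perceptual model proposed in the paper is more elaborate than this energy ratio:

```python
import numpy as np

def saturate(x, drive):
    """Static saturating nonlinearity (tanh), normalized to unity slope at 0."""
    return np.tanh(drive * x) / drive

def harmonic_distortion(drive, f0=997.0, fs=48000, n=48000):
    """Fraction of signal energy outside the fundamental after saturating
    a unit sine, as an RMS ratio (a crude THD-like figure)."""
    t = np.arange(n) / fs
    y = saturate(np.sin(2 * np.pi * f0 * t), drive)
    spec = np.abs(np.fft.rfft(y * np.hanning(n))) ** 2
    k0 = int(round(f0 * n / fs))
    fund = spec[k0 - 2:k0 + 3].sum()   # energy in the fundamental's mainlobe
    total = spec.sum()
    return np.sqrt((total - fund) / total)
```

Raising the drive parameter pushes the sine deeper into saturation, so the distortion figure grows monotonically, which is the "degree of overdrive" dimension the listeners were asked to judge.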
P17-2 Mobile Audio Measurements Platform: Toward Audio Semantic Intelligence into Ubiquitous Computing Environments—Lazaros Vrysis, Aristotle University of Thessaloniki - Thessaloniki, Greece; Charalampos A. Dimoulas, Aristotle University of Thessaloniki - Thessaloniki, Greece; George M. Kalliris, Aristotle University of Thessaloniki - Thessaloniki, Greece; George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
The current paper presents the implementation of a mobile software environment that provides a suite of professional-grade audio and acoustic analysis tools for smartphones and tablets. The suite includes sound level monitoring, real-time time-frequency analysis, reverberation time, and impulse response measurements, while feature-based intelligent content analysis is deployed for long-term audio event detection and segmentation. The paper investigates the implementation of a flexible and user-friendly environment that can be easily used by non-specialists, providing the professional functionality and fidelity of special-purpose devices while eliminating mobile interfacing and hardware limitations. Emphasis is given to the integration of additional capabilities that offer valuable amenities to the user, concerning the management of measurement sessions and intelligent cloud-based semantic analysis.
Convention Paper 8912 (Purchase now)
P17-3 System Identification Based on Hammerstein Models Using Cubic Splines—Michele Gasparini, Universitá Politecnica della Marche - Ancona, Italy; Andrea Primavera, Universitá Politecnica della Marche - Ancona, Italy; Laura Romoli, Universitá Politecnica della Marche - Ancona, Italy; Stefania Cecchi, Universitá Politecnica della Marche - Ancona, Italy; Francesco Piazza, Universitá Politecnica della Marche - Ancona (AN), Italy
Nonlinear system modeling plays an important role in the field of digital audio systems, since most real-world devices show nonlinear behavior. Among nonlinear models, Hammerstein systems are nonlinear systems composed of a static nonlinearity cascaded with a linear filter. In this paper a novel approach to estimating the static nonlinearity is proposed, based on an adaptive Catmull-Rom cubic spline, in order to overcome problems related to the adaptation of the high-order polynomials needed to identify highly nonlinear systems. Experimental results confirm the effectiveness of the approach, including comparisons with state-of-the-art techniques.
Convention Paper 8913 (Purchase now)
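A uniform Catmull-Rom spline, as used above for the static nonlinearity, interpolates its control points with local cubic segments, so adapting a handful of control values can shape a smooth curve without high-order polynomials. A minimal sketch of the spline and the Hammerstein cascade, illustrative only and without the paper's adaptation loop:

```python
import numpy as np

def catmull_rom(control_y, x):
    """Evaluate a uniform Catmull-Rom spline at x in [-1, 1].

    control_y: values at knots uniformly spaced on [-1, 1]; the endpoints
    are duplicated so the curve is defined over the whole interval.
    """
    p = np.concatenate([[control_y[0]], control_y, [control_y[-1]]])
    n = len(control_y) - 1                      # number of spans
    u = (np.clip(x, -1.0, 1.0) + 1.0) / 2.0 * n
    i = np.minimum(u.astype(int), n - 1)        # span index
    t = u - i                                   # position within the span
    p0, p1, p2, p3 = p[i], p[i + 1], p[i + 2], p[i + 3]
    return 0.5 * (2 * p1 + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3)

def hammerstein(x, control_y, fir):
    """Hammerstein cascade: static spline nonlinearity, then a linear FIR."""
    return np.convolve(catmull_rom(control_y, x), fir)[:len(x)]
```

In an identification scheme of this kind, the control values and the FIR taps would be the quantities adapted to match the measured device.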
P17-4 Impulse Responses Measured with MLS or Swept-Sine Signals: A Comparison between the Two Methods Applied to Noise Barrier Measurements—Paolo Guidorzi, University of Bologna - Bologna, Italy; Massimo Garai, University of Bologna - Bologna, Italy
A sound source and a microphone grid are used for measuring a set of impulse responses with the purpose of estimating the in-situ acoustical characteristics of noise barriers (sound reflection and airborne sound insulation) following the CEN/TS 1793-5 European standard guidelines as improved by the European project QUIESST. The impulse responses are measured using MLS (Maximum Length Sequence) and Swept-sine signals. The acoustical characteristics of the noise barrier, obtained using the two signals, are generally equivalent, but in some critical measurement conditions a discrepancy can be found. Differences and advantages between measurements, obtained by means of MLS or Swept-sine methods are analyzed and discussed in this paper.
Convention Paper 8914 (Purchase now)
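The swept-sine method referenced above (after Farina) excites the system with an exponential sweep and recovers the impulse response by convolving the recording with an amplitude-compensated, time-reversed copy of the sweep. A compact numpy sketch of that deconvolution:

```python
import numpy as np

def exp_sweep(f1, f2, duration, fs):
    """Exponential sine sweep and its inverse filter (Farina method).

    The inverse is the time-reversed sweep with a +6 dB/octave envelope,
    so that sweep * inverse approximates a band-limited impulse."""
    t = np.arange(int(duration * fs)) / fs
    R = np.log(f2 / f1)
    sweep = np.sin(2 * np.pi * f1 * duration / R
                   * (np.exp(t * R / duration) - 1))
    inv = sweep[::-1] * np.exp(-t * R / duration)
    return sweep, inv

def measure_ir(recorded, inv, n_taps):
    """Deconvolve a recorded sweep response; the linear IR starts at the
    peak of the convolution with the inverse filter."""
    full = np.convolve(recorded, inv)
    peak = np.argmax(np.abs(full))
    return full[peak:peak + n_taps]
```

An MLS measurement would replace the sweep with a pseudo-random sequence and the convolution with a fast Hadamard transform; the sweep variant shown here additionally pushes harmonic-distortion artifacts ahead of the linear response.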
P17-5 Polar Measurements of Harmonic and Multitone Distortion of Direct Radiating and Horn Loaded Transducers—Mattia Cobianchi, Lavoce Italiana - Colonna (RM), Italy; Fabrizio Mizzoni, Sapienza University of Rome - Rome, Italy; Aurelio Uncini, Sapienza University of Rome - Rome, Italy
While extensive literature is available on the topic of polar pattern measurements and predictions of loudspeakers’ fundamental SPL, only a single paper to our knowledge deals with the polar pattern of nonlinear distortions, in particular with the harmonic distortion products of cone-type loudspeakers. This paper contains the first results of a more thorough study intended to fill this gap in both measurement techniques and loudspeaker types. Relative and absolute harmonic distortion, as well as relative and absolute multitone distortion, have been measured for cone, dome, and horn-loaded transducers.
Convention Paper 8915 (Purchase now)
P17-6 An Efficient Nonlinear Acoustic Echo Canceller for Low-Cost Audio Devices—Danilo Comminiello, Sapienza University of Rome - Rome, Italy; Antonio Grosso, bdSound - Milan, Italy; Fabio Cagnetti, bdSound - Milan, Italy; Aurelio Uncini, Sapienza University of Rome - Rome, Italy
One of the most challenging problems to address in the modeling of acoustic channels is the presence of nonlinearities generated by loudspeakers. This problem has become even more widespread due to the growing availability of low-cost devices that introduce a larger amount of distortions and decrease the quality of hands-free speech communication. In order to reduce the effect of loudspeaker nonlinearities on the speech quality, nonlinear acoustic echo cancellers are adopted. In this paper we present a new adaptive filtering structure for the reduction of nonlinearities in the acoustic path, based on a nonlinear transformation of the input audio signal by means of functional links. We use such a nonlinear model in conjunction with a linear filter providing a nonlinear adaptive architecture for acoustic applications. Experimental results prove the effectiveness of the proposed model in reducing loudspeaker nonlinearities affecting speech signals.
Convention Paper 8916 (Purchase now)
P17-7 Radiation Pattern Differences between Electric Guitar Amplifiers—Justin Mathew, New York University - New York, NY, USA; Stephen Blackmore, New York University - New York, NY, USA
Various aspects of electric guitar amplifiers differentiate them from one another. Two of the major differentiating characteristics are frequency response and unique radiation patterns. These characteristics are directly related to differences in shape, size, and circuit configuration between guitar amplifier models. In this paper the differences in radiation patterns of multiple guitar amplifiers are presented, as well as a method of classifying the differences.
Convention Paper 8917 (Purchase now)
P18 - Spatial Audio—Part 3: Ambisonics, WFS
Tuesday, May 7, 14:30 — 16:30 (Sala Carducci)
Symeon Delikaris-Manias, Aalto University - Helsinki, Finland
P18-1 An Ambisonics Decoder for Irregular 3-D Loudspeaker Arrays—Daniel Arteaga, Fundacio Barcelona Media - Barcelona, Spain; Universitat Pompeu Fabra - Barcelona, Spain
We report on the practical implementation of an Ambisonics decoder for irregular 3-D loudspeaker layouts. The developed decoder, which uses a non-linear search algorithm to look for the optimal Ambisonics coefficients for each loudspeaker, has a number of features specially tailored for reproduction in real-world 3-D audio venues (for example, special 3-D audio installations, concert halls, audiovisual installations in museums, etc.). In particular, it performs well even for highly irregular speaker arrays, giving an acceptable listening experience over a large audience area.
Convention Paper 8918 (Purchase now)
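For contrast with the non-linear search used above, the textbook baseline decoder for a regular first-order layout is mode matching: pseudo-invert the matrix of spherical-harmonic values sampled at the loudspeaker directions. A minimal sketch, restricted to the 2-D first-order case with simplified channel normalization:

```python
import numpy as np

def foa_decoder(speaker_azimuths):
    """First-order horizontal Ambisonics decoder via mode matching:
    pseudo-inverse of the harmonic matrix at the loudspeaker directions."""
    az = np.asarray(speaker_azimuths)
    # Rows: W, X, Y components sampled at each loudspeaker direction.
    Y = np.vstack([np.ones_like(az), np.cos(az), np.sin(az)])
    return np.linalg.pinv(Y)        # shape (n_speakers, 3)

def encode(source_az):
    """First-order horizontal encoding of a point source (W, X, Y)."""
    return np.array([1.0, np.cos(source_az), np.sin(source_az)])
```

For irregular 3-D layouts this pseudo-inverse becomes ill-conditioned, which is precisely the regime where a non-linear search over decoder coefficients, as in the paper, pays off.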
P18-2 The Effect of Low Frequency Reflections on Stereo Images—Jamie A. S. Angus, University of Salford - Salford, Greater Manchester, UK
This paper looks at the amount of absorption required to adequately control early reflections in reflection-controlled environments at low frequencies (< 700 Hz), where the interaural time difference (ITD) cue is important. Most previous work has focused on wideband energy-time performance. In particular, the paper derives the effect that a reflection of a given angle and strength has on the phantom image location using the Blumlein equations, which allow the effect of reflections to be quantified as a function of frequency. It shows that the effect of floor and ceiling reflections is comparatively small, but that the effect of lateral reflections depends on the phantom image location and worsens the more off-center the phantom image becomes.
Convention Paper 8919 (Purchase now)
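The Blumlein-style relation between channel gains and phantom image position can be sketched with the stereophonic law of sines. The reflection model below, an in-phase low-frequency reflection reinforcing one loudspeaker's signal at the listener, is an assumption for illustration only; the paper derives the frequency-dependent effect properly.

```python
import numpy as np

def image_angle(gl, gr, base_deg=30.0):
    """Phantom-image angle (deg) from channel gains via the law of sines,
    for loudspeakers at +/- base_deg."""
    s = (gl - gr) / (gl + gr) * np.sin(np.deg2rad(base_deg))
    return np.rad2deg(np.arcsin(s))

# Centered image, then the same image with a hypothetical in-phase
# reflection (relative strength 0.2) reinforcing the left channel.
g = 1.0 / np.sqrt(2.0)
clean = image_angle(g, g)           # image dead centre
shifted = image_angle(1.2 * g, g)   # reflection pulls the image left
print(f"image moves from {clean:.1f} to {shifted:.1f} degrees")
```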
P18-3 Parametric Spatial Audio Reproduction with Higher-Order B-Format Microphone Input—Ville Pulkki, Aalto University - Aalto, Finland; Archontis Politis, Aalto University - Espoo, Finland; Giovanni Del Galdo, Ilmenau University of Technology - Ilmenau, Germany; Achim Kuntz, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
A time-frequency-domain non-linear parametric method for spatial audio processing is presented that can utilize microphone input having directional patterns of any order. The method is based on dividing the sound field into overlapping or non-overlapping sectors; local pressure and velocity signals are measured within each sector, and individual Directional Audio Coding (DirAC) processing is performed for each sector. It is shown that in certain acoustically complex conditions the sector-based processing enhances the quality compared to traditional first-order DirAC processing.
Convention Paper 8920 (Purchase now)
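The core per-sector analysis step of DirAC, estimating direction of arrival and diffuseness from pressure and particle-velocity signals, can be sketched as follows. The signals here are synthetic (a single broadband plane wave plus noise) and the sector formation from higher-order input is omitted; both are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
az_true = np.deg2rad(60.0)

# Plane wave arriving from az_true: particle velocity opposes the
# propagation direction, so the active intensity I = p*v points away
# from the source and the DOA is the -I direction.
p = rng.standard_normal(n)                        # pressure signal
vx = -p * np.cos(az_true) + 0.05 * rng.standard_normal(n)
vy = -p * np.sin(az_true) + 0.05 * rng.standard_normal(n)

ix, iy = np.mean(p * vx), np.mean(p * vy)         # time-averaged intensity
doa = np.rad2deg(np.arctan2(-iy, -ix))            # estimated arrival azimuth

energy = 0.5 * np.mean(p**2 + vx**2 + vy**2)
diffuseness = 1.0 - np.hypot(ix, iy) / energy     # ~0 = fully directional
print(f"DOA ~ {doa:.1f} deg, diffuseness ~ {diffuseness:.2f}")
```

In the sector-based scheme, this same analysis would run independently on the pressure and velocity signals beamformed within each sector of the higher-order input.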
P18-4 Wave Field Synthesis of Virtual Sound Sources with Axisymmetric Radiation Pattern Using a Planar Loudspeaker Array—Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK; Ferdinando Olivieri, University of Southampton - Southampton, Hampshire, UK; Thibaut Carpentier, UMR STMS IRCAM-CNRS-UPMC - Paris, France; Markus Noisternig, UMR STMS IRCAM-CNRS-UPMC - Paris, France
A number of methods have been proposed for the application of Wave Field Synthesis to the reproduction of sound fields generated by point sources that exhibit a directional radiation pattern. However, a straightforward implementation of these solutions involves a large number of real-time operations that may lead to very high computational load. This paper proposes a simplified method to synthesize virtual sources with axisymmetric radiation patterns using a planar loudspeaker array. The proposed simplification relies on the symmetry of the virtual source radiation pattern and on the far-field approximation, although a near-field formula is also derived. The mathematical derivation of the method is presented and numerical simulations validate the theoretical results.
Convention Paper 8921 (Purchase now)
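The simplification the abstract describes can be sketched geometrically: under the far-field approximation, each loudspeaker's drive for an axisymmetric virtual source reduces to a propagation delay, a distance attenuation, and a single directivity weight evaluated at the angle between the source axis and the line to that loudspeaker. The array geometry and the cardioid-like directivity below are invented for the example; the paper's actual WFS driving functions include further prefiltering.

```python
import numpy as np

C = 343.0                                     # speed of sound, m/s
src = np.array([0.0, 0.0, -1.0])              # virtual point source behind array
axis = np.array([0.0, 0.0, 1.0])              # source radiation axis

def directivity(cos_angle, order=1):
    """Axisymmetric cardioid-like pattern: ((1 + cos a) / 2) ** order."""
    return ((1.0 + cos_angle) / 2.0) ** order

# Planar array in the z = 0 plane: 5 x 5 loudspeakers, 0.2 m spacing.
xs = np.linspace(-0.4, 0.4, 5)
grid = np.array([[x, y, 0.0] for x in xs for y in xs])

rel = grid - src
dist = np.linalg.norm(rel, axis=1)
delays = dist / C                             # propagation delay per speaker
cosang = (rel @ axis) / dist                  # cosine of angle from source axis
gains = directivity(cosang) / dist            # directivity weight * 1/r decay

i_on_axis = int(np.argmin(np.linalg.norm(grid, axis=1)))  # centre speaker
print(f"on-axis speaker: delay {delays[i_on_axis]*1e3:.2f} ms, "
      f"gain {gains[i_on_axis]:.2f}")
```

Because the pattern is axisymmetric, one scalar weight per loudspeaker replaces the per-frequency directional filtering a general directive source would require, which is the source of the computational saving.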