AES New York 2011
Paper Session Details
P1 - Room Acoustics
Thursday, October 20, 9:00 am — 11:00 am (Room: 1E09)
Peter Mapp, Peter Mapp Associates
P1-1 New Thoughts on Active Acoustic Absorbers—John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada
This paper continues an earlier exploration of using full-range loudspeakers as both acoustic sources and sinks, in an attempt to reduce the room decay time for bass frequencies. We develop the theory for a point active absorber immersed in the acoustic source field from a point source. This would apply to normal loudspeakers used as either sources or absorbers at frequencies below about 300 Hz, where they act as points. The result extends the theory of Nelson and Elliott for a point absorber interacting with a plane wave. An extra term occurs that has little net effect when averaged over frequency or distance. In rooms such cancellation occurs due to the varying distances from all the source images to the absorber. Impulse responses in several small rooms were measured from a source and an absorber loudspeaker to both a few listening microphones and a microphone mounted at the absorber. The efficacy of the active absorber is assessed and the results are enigmatic.
Convention Paper 8458
P1-2 Investigations of Room Acoustics with a Spherical Microphone Array—Samuel W. Clapp, Rensselaer Polytechnic Institute - Troy, NY, USA; Anne Guthrie, Rensselaer Polytechnic Institute - Troy, NY, USA, Arup, New York, NY, USA; Jonas Braasch, Ning Xiang, Rensselaer Polytechnic Institute - Troy, NY, USA
Most room acoustic parameters are calculated with data from omni-directional or figure-of-eight microphones. Using a spherical microphone array to record room impulse responses can open up several new areas of inquiry. It can yield much more information about the spatial characteristics of the sound field, including the diffuseness of the sound field and the directions of individual reflections. A 16-channel microphone array was designed, built, and tested with both simulations and simple, controlled sound events. Room impulse responses were then measured in reverberant rooms used for music from stage and audience positions, and the results were analyzed using beamforming techniques to determine spatial information about the sound field.
Convention Paper 8459
P1-3 Room Acoustics Using a 2.5 Dimensional Approach with Damping Included—Patrick Macey, PACSYS Limited - Nottingham, UK
Cavity modes of a finite bounded region with rigid boundaries can be used to compute the steady-state harmonic response for point source excitation. In cuboid domains this is straightforward. In general regions, determining a set of orthonormal modes is more difficult. Previous work showed that for rooms of constant height, 3-D modes can be computed from the cross-section modes, and this was used for a fast solution. This approach used modal damping. More realistic damping associated with wall areas could be included using a damped eigenvalue calculation of the cross-section modes, but this restricts the damping formulations available. An alternative non-modal approach, using a trigonometric expansion through the height, is proposed. This is still faster than 3-D FEM.
Convention Paper 8460
P1-4 Accurate Acoustic Modeling of Small Rooms—Holger Schmalle, AFMG Ahnert Feistel Media Group - Berlin, Germany; Dirk Noy, WSDG Walters-Storyk Design Group - Basel, Switzerland; Stefan Feistel, AFMG Ahnert Feistel Media Group - Berlin, Germany; Gabriel Hauser, WSDG Walters-Storyk Design Group - Basel, Switzerland; Wolfgang Ahnert, AFMG Ahnert Feistel Media Group - Berlin, Germany; John Storyk, WSDG Walters-Storyk Design Group - Basel, Switzerland
Modeling of sound reinforcement systems and room acoustics in large and medium-size venues has become a standard in the audio industry. However, acoustic modeling of small rooms has not yet evolved into a widely accepted concept, mainly because of the unavailable tool set. This work introduces a practical and accurate software-based approach for simulating the acoustic properties of studio rooms based on FEM. A detailed case study is presented and modeling results are compared with measurements. It is shown that results match within given uncertainties. Also, it is indicated how the simulation software can be enhanced to optimize loudspeaker locations and place absorbers and diffusers in order to improve the acoustic quality of the space and thus the listening experience.
Convention Paper 8457
P2 - Recording and Sound Production
Thursday, October 20, 9:00 am — 11:00 am (Room: 1E07)
Justin Paterson, University of West London - London, UK
P2-1 Computer Assisted Microphone Array Design (CAMAD)—Michael Williams, Freelance Sound Recording Engineer and Lecturer - Sounds of Scotland, Le Perreux sur Marne, France
The basic aim of Microphone Array Design is to create microphone array recording systems with smooth seamless or “Critically Linked” segment coverage of the surround sound field. Each configuration must take into account the interaction of the many design parameters, with the specific coverage of each segment that is required. The difficulty in the manipulation of these many parameters is one of the major obstacles in developing a wide range of microphone arrays that meet the needs of each particular sound recording environment. This paper will outline the various parameters that need to be taken into consideration and explain the basic approach to developing MATLAB-based software that gives a clear and unambiguous display of all the salient characteristics needed to achieve a stable and reliable microphone array, no matter the number of channels involved, or the type of directivity pattern chosen for each microphone.
Convention Paper 8461
P2-2 In Situ Measurements of the Concert Grand Piano—Brett Leonard, McGill University - Montreal, Quebec, Canada, Centre for Interdisciplinary Research in Music, Media and Technology, Montreal, Quebec, Canada; Grzegorz Sikora, McGill University - Montreal, Quebec, Canada; Martha de Francisco, McGill University - Montreal, Quebec, Canada, Centre for Interdisciplinary Research in Music, Media and Technology, Montreal, Quebec, Canada
An in situ approach to the acoustical study of the grand piano is presented in which the instrument is coupled with a typical, reflective recording space. By using accurate, tightly controlled automated playback of expertly performed material, a small number of high-quality transducers are employed to capture more than 1300 spatially distributed data points in the process known as acoustic space sampling (AcSS). The AcSS measurement task is performed on two pianos in two unique recording environments. The data are analyzed using accepted acoustic metrics and psychoacoustic predictors. It is shown that certain spatial areas containing salient physical and psychoacoustic measures are highly correlated with recording engineer preference.
Convention Paper 8462
P2-3 Beyond Surround Sound—Creation, Coding and Reproduction of 3-D Audio Soundtracks—Jean-Marc Jot, Zoran Fejzo, DTS, Inc.
We propose a flexible and practical end-to-end solution for creating, encoding, transmitting, decoding, and reproducing spatial audio soundtracks. The soundtrack encoding format is compatible with legacy surround-sound decoders, while enabling the representation of a three-dimensional audio scene, irrespective of the listener’s playback system configuration. It allows for encoding one or more selected audio objects that can be rendered with optimal fidelity and interactively in any target spatial audio format (existing or future). The transmission or storage data rate and the decoder complexity are scalable at delivery time. A 3-D audio soundtrack may thus be produced once and transmitted or broadcast and reproduced as faithfully as possible on the broadest range of target devices.
Convention Paper 8463
P2-4 The Effects of Multiple Arrivals on the Intelligibility of Reinforced Speech—Timothy J. Ryan, Richard King, McGill University - Montreal, Quebec, Canada; Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA; William L. Martens, University of Sydney - Sydney, NSW, Australia
The effects of multiple arrivals on the intelligibility of speech produced by live-sound reinforcement systems are examined. The intent is to determine if correlations exist between the manipulation of sound system optimization parameters and the subjective attribute speech intelligibility. Investigated variables are signal-to-noise ratio (SNR), delay time between signals arriving from multiple elements of a loudspeaker array, and array type and geometry. Intelligibility scores were obtained through subjective evaluation of binaural recordings, reproduced via headphone, using the Modified Rhyme Test.
Convention Paper 8464
P3 - Transducers
Thursday, October 20, 10:00 am — 11:30 am (Room: 1E Foyer)
P3-1 Inter- and Intra-Individual Variability in Blocked Auditory Canal Transfer Functions of Three Circum-Aural Headphones—Florian Völk, AG Technische Akustik, MMK, Technische Universität München - Munich, Germany
In headphone playback, different factors contribute to deviations of the presented stimuli from those intended. Most important are the headphone transfer functions, their inter-individual differences, and the intra-individual variability due to repeated positioning on the subjects’ heads. This paper gives a detailed inspection of the blocked auditory canal transfer characteristics for one specimen of each of three different circum-aural headphone models frequently used in psychoacoustics, two operating on the electro-dynamic and one on the electro-static converter principle. It is shown that the variability can have considerable influence on the stimuli presented, especially in the frequency range above 6 kHz. The data indicate headphone-specific variability, suggesting that this variability should be considered as a headphone selection criterion.
Convention Paper 8465
P3-2 Cable Matters: Instrument Cables Affect the Frequency Response of Electric Guitars—Rafael Cauduro Dias de Paiva, Aalto University School of Electrical Engineering - Espoo, Finland, Nokia Technology Institute INdT, Brasilia, Brazil; Henri Penttinen, Aalto University School of Electrical Engineering - Espoo, Finland
This paper presents analysis results of the effects an instrument cable has on the timbre of an electric guitar. In typical, well-designed audio equipment with proper impedance buffers, the effect of the cable can be considered insignificant at audio frequencies. In contrast, magnetic pickups used in electric guitars act as resonating low-pass filters. The cable is attached to a resonating electrical circuit, and hence its impedance characteristics can influence the system. The simulation and measurement results show that the capacitance of an instrument cable affects the frequency response of the system, whereas the effects of the inductance and series or parallel resistance are negligible. The largest shift in the resonant frequency for the measured cables was 1.4 kHz.
Convention Paper 8466
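The resonating low-pass behavior described above can be illustrated with a toy LC calculation. The component values below are illustrative assumptions, not figures from the paper: treating the pickup inductance together with the combined pickup and cable capacitance as a simple resonator shows how added cable capacitance pulls the resonant frequency down.

```python
import math

def pickup_resonance_hz(l_pickup_h, c_pickup_f, c_cable_f):
    """Resonant frequency of a pickup modeled as an LC resonator; the
    cable capacitance appears in parallel with the pickup's own."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_pickup_h * (c_pickup_f + c_cable_f)))

# Illustrative values (assumptions): 2.5 H pickup inductance, 150 pF
# self-capacitance, 6 m of cable at roughly 100 pF/m.
f_bare = pickup_resonance_hz(2.5, 150e-12, 0.0)
f_cabled = pickup_resonance_hz(2.5, 150e-12, 6 * 100e-12)
print(f"{f_bare:.0f} Hz -> {f_cabled:.0f} Hz")  # resonance shifts down with the cable
```

With these assumed values the resonance moves down by a factor of sqrt(5), since the total capacitance grows fivefold; the paper's measured shifts are smaller.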
P3-3 An Approach to Small Size Direct Radiation Transducers with High SPL—Jose Martinez, Acústica Beyma, S.L. - Valencia, Spain; Enrique Segovia, Obras Públicas e Infraestructura Urbana - Alicante, Spain; Jaime Ramis, Ingeniería de Sistemas y Teoria de la Señal - San Vicent del Raspeig, Spain; Alejandro Espí, Acústica Beyma, S.L. - Valencia, Spain; Jesús Carbajo, Ingeniería de Sistemas y Teoria de la Señal - San Vicent del Raspeig, Spain
This work analyses some of the issues related to the design of small-size direct-radiation loudspeakers and aims to achieve high SPL with this kind of loudspeaker. To achieve this, large diaphragm displacements are needed, so the structural dynamic behavior of the moving assembly must be given particular attention. With the aid of a numerical model implemented with finite elements, it is possible to quantify the influence of changing the number of folds in the suspension (spider), the distance between spiders, and the effect of unbalanced forces inherent to the loudspeaker construction. Numerical model predictions are compared with experimental results, taking a six-inch loudspeaker development as reference.
Convention Paper 8467
P3-4 High-Order Analog Control of a Clocked Class-D Audio Amplifier with Global Feedback Using Z-Domain Methods—Pieter Kemp, Toit Mouton, University of Stellenbosch - Matieland, South Africa; Bruno Putzeys, Hypex Electronics B.V. - Groningen, The Netherlands
The design of a clocked analog controlled pulse-width modulated class-D audio amplifier with global negative feedback is presented. The analog control loop is designed in the z-domain by modeling the comparator as a sampling operation. A method is presented to improve clip recovery and ensure stability during over-modulation. Loop gain is shaped to provide a high gain across the audio band, and ripple compensation is implemented to minimize the negative effect of ripple feedback. Experimental results are presented.
Convention Paper 8468
P3-5 A Novel Sharp Beam-Forming Flat Panel Loudspeaker Using Digitally Driven Speaker System—Mitsuhiro Iwaide, Akira Yasuda, Daigo Kuniyoshi, Kazuyuki Yokota, Yugo Moriyasu, Kenji Sakuta, Fumiaki Nakashima, Masayuki Yashiro, Michitaka Yoshino, Hosei University - Koganei, Tokyo, Japan
In this paper we propose a beam-forming speaker based on a digitally direct-driven speaker system (digital-SP); the proposed speaker employs multi-bit delta-sigma modulation in addition to a line speaker array with flat-panel loudspeakers (FPLS) and a delay circuit. The proposed speaker can be realized with only D flip-flops and the digital-SP. The sound direction can easily be controlled digitally. All processes can be performed digitally without the use of analog components such as power amplifiers, and a small, light, thin, high-quality beam-forming speaker system can be realized. The prototype is constructed using an FPGA, CMOS drivers, and a line FPLS array. An attenuation of 20 dB at a direction of 20 degrees is measured.
Convention Paper 8469
P3-6 The Influence of the Directional Radiation Performance of the Individual Speaker Modules, and Overall Array, on the Tonal Balance, Quality and Consistency of Sound Reinforcement Systems—Akira Mochimaru, Paul Fidlin, Soichiro Hayashi, Kevin Manzolini, Bose Corporation - Framingham, MA, USA
Room acoustic characteristics, such as reflections and reverberation, often change the performance of speaker systems in a room. It has always been challenging to maintain the tonal balance of a single speaker module when multiple modules are used to form speaker arrays. Both phenomena are unique to sound reinforcement speakers and are mainly determined by a combination of the radiation characteristics of an individual speaker module and the interactions between modules, known as arrayability. First, this study reviews the performance of conventional speaker systems. Next, the acoustic characteristics of speaker modules desired for forming an ideal speaker array are discussed. A new category of speaker system, the Progressive Directivity Array, is introduced to realize the theory as a practical solution.
Convention Paper 8470
P4 - Loudspeakers
Thursday, October 20, 2:30 pm — 4:30 pm (Room: 1E09)
Alex Voishvillo, JBL Professional
P4-1 Subwoofers in Rooms: Equalization of Multiple Subwoofers—Juha Backman, Nokia Corporation - Espoo, Finland, Aalto University, Espoo, Finland
The effectiveness of multiple subwoofers in controlling low-frequency room modes can be improved through designing individual equalization for each of the subwoofers. The paper discusses two strategies toward equalizer design, including individual equalization of multiple subwoofer responses, minimizing the total energy radiated to the room, and optimization for minimizing sound field variation. The frequency responses and sound field distributions obtained using these methods are compared to the results of conventional equalization and modal control only through loudspeaker placement.
Convention Paper 8471
P4-2 A Systematic Approach to Measurement Limit Definitions in Loudspeaker Production—Gregor Schmidle, NTi Audio AG - Schaan, Liechtenstein
A typical end-of-line loudspeaker test comprises ten or more different tested parameters. Each parameter has its own pass/fail limits contributing to the overall test result of the loudspeaker and therefore to the yield of the production line. This paper gives a comprehensive overview of limit calculation methods and procedures commonly used in the industry. It also provides systematic guidance for choosing the right limit scheme to maximize yield, quality, and throughput.
Convention Paper 8472
P4-3 Inverse Distance Weighting for Extrapolating Balloon-Directivity-Plots—Joerg Panzer, R&D Team - Salgen, Germany; Daniele Ponteggia, Audiomatica Srl - Firenze, Italy
This paper investigates an extrapolation method for missing directivity data in connection with balloon plots. Such plots display the spherical directivity pattern of radiators and receivers in the form of contoured sound pressure levels. Normally the directivity data are distributed evenly, so that at each point of the display sphere there are sufficient data points. However, there are circumstances where we want to display data that are not evenly distributed; for example, only horizontal and vertical scans may be available. The proposed Inverse Distance Weighting method is a means to extrapolate into these gaps. This paper explains the method and demonstrates some examples.
Convention Paper 8473
P4-4 Mechanical Fatigue and Load-Induced Aging of Loudspeaker Suspension—Wolfgang Klippel, Klippel GmbH - Dresden, Germany
The mechanical suspension becomes more and more compliant over time changing the loudspeaker properties (e.g., resonance frequency) significantly. This aging process is reproducible and the decay of the stiffness can be modeled by accumulating the apparent power supplied to the suspension part and using an exponential relationship. The free parameters of this model are estimated from empirical data provided by on-line monitoring or intermittent measurements during regular power tests or other kinds of long-term testing. The identified model can be used to predict the load-induced aging for music or test signals having arbitrary spectral properties. New characteristics are being introduced that simplify the quality assessment of suspension parts and separate mechanical fatigue from the initial break-in effect. Practical experiments are performed to verify the model and to demonstrate the diagnostic value for selecting optimal suspension parts providing sufficient long-term stability.
Convention Paper 8474
P5 - Audio Processing—Part 1
Thursday, October 20, 2:30 pm — 4:30 pm (Room: 1E07)
Dana Massie, Audience, Inc. - Mountain View, CA, USA
P5-1 Automatic Detection of the Proximity Effect—Alice Clifford, Josh Reiss, Queen Mary University of London - London, UK
The proximity effect in directional microphones is characterized by an undesired boost in low-frequency energy as the source-to-microphone distance decreases. Traditional methods for reducing the proximity effect use a high-pass filter to cut low frequencies, which alters the tonal characteristics of the sound and is not dependent on the input source. This paper proposes an intelligent approach to detect the proximity effect in a single-capsule directional microphone in real time. The low-frequency boost is detected by analyzing the spectral flux of the signal over a number of bands over time. A comparison is then made between the bands to indicate the existence of the proximity effect. The proposed method is shown to accurately detect the proximity effect in test recordings of white noise and of musical inputs. This work has applications in the reduction of the proximity effect.
Convention Paper 8475
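As a much-simplified sketch of the general idea (not the authors' algorithm), one can compute spectral flux per frequency band and flag signals whose low band's flux dominates. The band edges, dominance ratio, and function names below are all assumptions.

```python
import numpy as np

def band_flux(frames, sr, bands=((0, 200), (200, 1000), (1000, 4000))):
    """Per-band spectral flux: summed positive change in band magnitude
    between consecutive frames. `frames` is (n_frames, frame_len)."""
    spec = np.abs(np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], 1.0 / sr)
    flux = []
    for lo, hi in bands:
        band = spec[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
        flux.append(np.maximum(np.diff(band), 0.0).sum())
    return flux

def proximity_suspected(flux, ratio=2.0):
    """Flag a low-frequency boost when the lowest band's flux dominates."""
    return flux[0] > ratio * max(flux[1:])
```

A rising 100 Hz tone against a steady 2 kHz tone, for example, produces flux concentrated in the lowest band and trips the detector.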
P5-2 Vibrato Detection Using Cross Correlation Between Temporal Energy and Fundamental Frequency—Henrik von Coler, Technical University of Berlin - Berlin, Germany; Axel Röbel, IRCAM - Paris, France
In this work we present an approach for detecting quasi periodic frequency modulations (vibrato) in monophonic instrument recordings. Since a frequency modulation in physical instruments usually causes an amplitude modulation, our method is based on a block wise cross correlation between the extracted frequency- and amplitude-modulation trajectories. These trajectories are obtained by removing the constant components. The resulting cross correlation curve shows significant positive peaks at vibrato regions and local minima at note boundaries. Our approach has the advantage of working without a previous note boundary detection and needs only a small look ahead. Furthermore no presumptions on vibrato parameters have to be made.
Convention Paper 8476
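The core of the method — block-wise correlation between de-meaned frequency and amplitude trajectories — can be sketched as follows. Block and hop sizes are illustrative choices, not the paper's.

```python
import numpy as np

def blockwise_corr(f0, amp, block=64, hop=32):
    """Zero-lag normalized cross-correlation between de-meaned f0 and
    amplitude trajectories, computed block by block. Values near +1
    suggest coherent modulation of pitch and amplitude (vibrato)."""
    out = []
    for start in range(0, len(f0) - block + 1, hop):
        a = f0[start:start + block] - f0[start:start + block].mean()
        b = amp[start:start + block] - amp[start:start + block].mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        out.append(float((a * b).sum() / denom) if denom > 0 else 0.0)
    return out
```

On a synthetic note whose pitch and amplitude are both modulated by the same 5 Hz sinusoid, every block correlates near +1, while a flat pitch trajectory yields zeros.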
P5-3 A Non-Time-Progressive Partial Tracking Algorithm for Sinusoidal Modeling—Maciej Bartkowiak, Poznan University of Technology - Poznan, Poland; Tomasz Zernicki, Telcordia Poland - Poznan, Poland
In this paper we propose a new sinusoidal model tracking algorithm that implements a non-progressive way of data processing. Sinusoidal partial parameters are estimated in the consecutive frames; however, the order of establishing individual connections between partials is optimized within the whole signal or within a specific time window. In this way, the strongest connections may be determined early, and subsequent predictions of each trajectory evolution are based on a more reliable partial evolution history, compared to a traditional progressive scheme. As a consequence, the proposed non-progressive tracking algorithm offers a statistically significant improvement in the obtained trajectories in terms of classic pattern-recognition measures.
Convention Paper 8477
P5-4 Perceptually Relevant Models for Articulation in Synthesized Drum Patterns—Ryan Stables, Jamie Bullock, Ian Williams, Birmingham City University - Birmingham, UK
In this study we evaluate current techniques for drum pattern humanization and suggest new methods using a probabilistic model. Our statistical analysis shows that both deviations from a fixed grid and corresponding amplitude values of drum patterns can have non-Gaussian distributions with underlying temporal structures. We plot distributions and probability matrices of sequences played by humans in order to demonstrate this. A new method for humanization with structural preservation is proposed, using a Markov Chain and an Empirical Cumulative Distribution Function (ECDF) in order to weight pseudorandom variables. Finally we demonstrate the perceptual relevance of these methods using paired listening tests.
Convention Paper 8478
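The ECDF-based weighting can be illustrated with a minimal inverse-transform sampler that draws timing deviations from the empirical distribution of human-played data, preserving its possibly non-Gaussian shape. This sketch omits the Markov-chain component that preserves temporal structure, and all names are hypothetical.

```python
import random

def make_ecdf_sampler(observed, seed=None):
    """Inverse-transform sampler over the empirical CDF of observed
    timing deviations (e.g., onset offsets in ticks)."""
    data = sorted(observed)
    rng = random.Random(seed)
    def sample():
        u = rng.random()  # uniform pseudorandom variable, weighted via the ECDF
        return data[min(int(u * len(data)), len(data) - 1)]
    return sample

# Hypothetical deviations (in ticks) measured from a human performance:
draw = make_ecdf_sampler([-10, -2, 0, 0, 0, 3, 3, 8], seed=1)
humanized_offsets = [draw() for _ in range(16)]
```

Unlike adding Gaussian jitter, the samples can only take values the (interpolation-free) empirical distribution supports, and their relative frequencies match the training data.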
P6 - Speech
Thursday, October 20, 3:00 pm — 4:30 pm (Room: 1E Foyer)
P6-1 A Systematic Study on Formant Transition in Speech Using Sinusoidal Glide—Wen-Jie Wang, City University of New York - New York, NY, USA; Benjamin Guo, Chin-Tuan Tan, New York University, School of Medicine - New York, NY, USA
The goal of this study was to use sinusoidal glide to investigate the perceivable acoustic cues in a consonant-vowel /CV/ syllable systematically. The sinusoidal glide is designed to mimic the formant trajectory in a /CV/ syllable with two parts: a frequency glide followed by a constant frequency. The experiment varied the frequency step (with rising and falling glide) and duration of the initial part, and the center frequency and duration of the final part of the sinusoidal glide. We asked 6 normal hearing subjects to discriminate sinusoidal glides from sinusoids of constant frequency, and found that subjects require a larger frequency step when the duration of the glide is shortened but a smaller frequency step when the center frequency of the final part is lowered, to discriminate the two stimuli. The outcome of this experiment is compared to the outcomes of previous studies using synthesized formants and sinusoidal replicas.
Convention Paper 8479
P6-2 Perceived Quality of Resonance-Based Decomposed Vowels and Consonants—Chin-Tuan Tan, Benjamin Guo, New York University School of Medicine - New York, NY, USA; Ivan Selesnick, Polytechnic Institute of New York University - Brooklyn, NY, USA
The ultimate objective of this study is to employ a resonance-based decomposition method for the manipulation of acoustic cues in speech. Resonance-based decomposition (Selesnick, 2010) is a newly proposed nonlinear signal analysis method based not on frequency or scale but on resonance; the method is able to decompose a complex non-stationary signal into a “high-resonance" component and a “low-resonance" component using a combination of low- and high- Q-factors. In this study we conducted a subjective listening experiment on five normal hearing listeners to assess the perceived quality of decomposed components, with the intention of deriving the perceptually relevant combinations of low- and high- Q-factors. Our results show that normal hearing listeners generally rank high-resonance components of speech stimuli higher than low-resonance components. This may be due to a greater salience of perceptually significant formant cues in high-resonance stimuli.
Convention Paper 8480
P6-3 Relationship between Subjective and Objective Evaluation of Distorted Speech Signals—Mitsunori Mizumachi, Kyushu Institute of Technology - Fukuoka, Japan
When designing a noise reduction algorithm, it is important to evaluate the quality of noise-reduced speech signals accurately and efficiently. Subjective evaluation is accurate but requires listening tests with many subjects and much time. Objective distortion measures are therefore employed for efficient evaluation. However, almost all distortion measures do not consider the temporal variation of speech distortion. In this paper the temporal aspect of the segmental speech distortion is investigated based on higher-order statistics, that is, variance, skewness, and kurtosis. Interestingly, the skewness of the objective evaluation gives a good explanation for the discrepancy between subjective and objective evaluation.
Convention Paper 8481
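The higher-order statistics mentioned above can be computed directly from a sequence of per-segment distortion values. A minimal sketch using population moments and Pearson kurtosis (how the per-segment values are obtained is left open, as the paper's distortion measure is not specified here):

```python
import math

def higher_order_stats(seg_distortion):
    """Variance, skewness, and kurtosis of per-segment distortion values,
    summarizing how the distortion fluctuates over time."""
    n = len(seg_distortion)
    mean = sum(seg_distortion) / n
    m2 = sum((x - mean) ** 2 for x in seg_distortion) / n  # variance
    m3 = sum((x - mean) ** 3 for x in seg_distortion) / n
    m4 = sum((x - mean) ** 4 for x in seg_distortion) / n
    sd = math.sqrt(m2)
    return m2, m3 / sd ** 3, m4 / m2 ** 2  # variance, skewness, kurtosis
```

A symmetric distortion profile yields zero skewness; a few strongly distorted segments among many clean ones produce large positive skewness, which is the kind of asymmetry the paper relates to subjective judgments.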
P6-4 Multiple Microphones Speech Enhancement Using Minimum Redundant Array—Kwang-Cheol Oh, Samsung Electronics Co., Ltd. - Suwon City, Gyeong-Gi Do, Korea
A non-uniformly spaced multiple-microphone array speech enhancement method with a small aperture size is proposed and analyzed. The technique utilizes a minimum redundant array structure, as used in antenna arrays, to prevent spatial aliasing at high frequencies, and uses phase-difference-based dual-microphone speech enhancement techniques to implement a small microphone array. It is highly directive evenly from low to high frequencies, and its performance is measured with the directivity index. The directivity index (DI) for the proposed approach is about 3 dB higher than that of a multiple-microphone approach with the phase-based filter.
Convention Paper 8482
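The minimum redundant array idea can be illustrated with the classic 4-element layout: by spacing the elements non-uniformly, every inter-element lag up to the aperture is realized with fewer microphones than a uniform array would need. The layout below is a standard textbook example, not necessarily the one used in the paper.

```python
from itertools import combinations

def pairwise_lags(positions):
    """Distinct positive inter-element spacings (in units of the base
    spacing) realized by an array with elements at `positions`."""
    return {abs(a - b) for a, b in combinations(positions, 2)}

# Classic 4-element minimum redundant layout: elements at 0, 1, 4, and 6
# base spacings realize every lag from 1 to 6; a uniform array would need
# 7 elements for the same set of spacings.
mra = [0, 1, 4, 6]
print(sorted(pairwise_lags(mra)))  # [1, 2, 3, 4, 5, 6]
```

The large spacings give directivity at low frequencies while the unit spacing keeps the high-frequency behavior free of the grating lobes a sparse uniform array would exhibit.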
P7 - Sound Field Analysis and Reproduction—Part 1
Thursday, October 20, 4:30 pm — 6:30 pm (Room: 1E09)
Sascha Spors, Deutsche Telekom Laboratories, Technische Universität Berlin - Berlin, Germany
P7-1 Two Physical Models for Spatially Extended Virtual Sound Sources—Jens Ahrens, Sascha Spors, Deutsche Telekom Laboratories, Technische Universität Berlin - Berlin, Germany
We present physical models for the sound field radiated by plates of finite size and spheres vibrating in higher modes. The intention is obtaining a model that allows for controlling the perceived size of a virtual sound source in model-based sound field synthesis. Analytical expressions for radiated sound fields are derived and simulations of the latter are presented. An analysis of interaural coherence in a virtual listener, which has been shown to serve as an indicator for perceived spatial extent, provides an initial proof of concept.
Convention Paper 8483
P7-2 Auditory Depth Control: A New Approach Utilizing a Plane Wave Loudspeaker Radiating from above a Listener—Sungyoung Kim, Hiraku Okumura, Hideki Sakanashi, Takurou Sone, Yamaha Corporation - Hamamatsu, Japan
One of the distinct features of a 3-D image is that the depth perceived by viewer is controlled so that objects appear to project toward the viewers. However, it has been hard to move auditory imagery near to listeners using conventional loudspeakers and panning algorithms. In this study we proposed a new system for controlling auditory depth, which incorporates two loudspeakers: one that radiates sound from the front of a listener and another that radiates plane waves from above a listener. With additional equalization that removes spectral cues corresponding to elevation, the proposed system generates an auditory image "near a listener" and controls the depth perceived by the listener, thereby enhancing the listener’s perception of 3-D sound.
Convention Paper 8484
P7-3 The SCENIC Project: Space-Time Audio Processing for Environment-Aware Acoustic Sensing and Rendering—Paolo Annibale, University of Erlangen - Erlangen, Germany; Fabio Antonacci, Paolo Bestagini, Politecnico di Milano - Milan, Italy; Alessio Brutti, Fondazione Bruno Kessler – IRST - Trento, Italy; Antonio Canclini, Politecnico di Milano - Milan, Italy; Luca Cristoforetti, Fondazione Bruno Kessler – IRST - Trento, Italy; Emanuël Habets, University of Erlangen - Erlangen, Germany and Imperial College London, London, UK; J. Filos, Walter Kellerman, Konrad Kowalczyk, Anthony Lombard, Edwin Mabande, University of Erlangen - Erlangen, Germany; Dejan Markovic, Politecnico di Milano - Milan, Italy; Patrick Naylor, Imperial College London - London, UK; Maurizio Omologo, Fondazione Bruno Kessler – IRST, Trento, Italy; Rudolf Rabenstein, University of Erlangen, Erlangen, Germany; Augusto Sarti, Politecnico di Milano, Milan, Italy; Piergiorgio Svaizer, Fondazione Bruno Kessler – IRST, Trento, Italy; Mark Thomas, I
SCENIC is an EC-funded project aimed at developing a harmonized corpus of methodologies for environment-aware acoustic sensing and rendering. The project focuses on space-time acoustic processing solutions that do not just accommodate the environment in the modeling process but make the environment help toward achieving the goal at hand. The solutions developed within this project cover a wide range of applications, including acoustic self-calibration, aimed at estimating the parameters of the acoustic system, and environment inference, aimed at identifying and characterizing all the relevant acoustic reflectors in the environment. The information gathered through such steps is then used to boost the performance of wavefield rendering methods as well as source localization/characterization/extraction in reverberant environments.
Convention Paper 8485
P7-4 Object-Based Sound Re-Mix for Spatially Coherent Audio Rendering of an Existing Stereoscopic-3-D Animation Movie—Marc Evrard, University of Liege - Liege, Belgium; Cédric R. André, University of Liege - Liege, Belgium, LIMSI-CNRS, Orsay, France; Jacques G. Verly, Jean-Jacques Embrechts, University of Liege - Liege, Belgium; Brian F. G. Katz, LIMSI-CNRS - Orsay, France
While 3-D cinema is becoming more mainstream, little effort has focused on the general problem of producing a 3-D sound scene spatially coherent with the visual content of a stereoscopic-3-D (s-3D) movie. The perceptual relevance of such spatial audiovisual coherence is of significant interest. In order to carry out such experiments, it is necessary to have an appropriate s-3D movie and its corresponding 3-D audio track. This paper presents the procedure followed to obtain this joint 3-D video and audio content from an existing animated s-3D film, problems encountered, and some of the solutions employed.
Convention Paper 8486 (Purchase now)
P8 - Loudness Measurement and Perception
Thursday, October 20, 4:30 pm — 6:30 pm (Room: 1E07)
Dan Harris, Sennheiser Technology and Innovation
P8-1 Effect of Horizontal Diffusivity on Loudness—Densil Cabrera, Luis Miranda, University of Sydney - Sydney, NSW, Australia
This paper examines how the spatial characteristics of a sound field affect its loudness. In an experiment, listeners adjusted the gain of stimuli so as to match the loudness of a reference stimulus. The experiment was conducted using binaural stimuli presented over headphones, simulating the sound of eight sources evenly distributed in a circle around the listener. Four degrees of diffusivity were tested, ranging from a single active sound source to all eight sources producing decorrelated sound with identical power spectra. Four power spectra were tested: broadband pink noise, and low-, mid-, and high-frequency bandpass-filtered pink noise. The paper finds that in modeling the binaural loudness summation of the diffuse stimuli, a binaural gain constant about 1 to 2 dB greater than that of the non-diffuse stimuli provides the least error.
Convention Paper 8487 (Purchase now)
P8-2 Locating the Missing 6 dB by Loudness Calibration of Binaural Synthesis—Florian Völk, Hugo Fastl, AG Technische Akustik, MMK, Technische Universität München - Munich, Germany
Binaural synthesis is a sound reproduction technology based on the convolution of sound signals with impulse responses defined between a source and a listener's eardrums. The convolution products are typically presented over headphones. For perceptual verification, subjects traditionally remove the headphones to listen to the corresponding real scenario, which is cumbersome and requires a pause between the stimuli. In this paper loudness adjustments are presented using a method that allows direct comparison by defining the reference scene for a listener wearing headphones. Influences of different headphones and equalization procedures are discussed, and an explanation is deduced for the difference in auditory canal pressure between headphone and loudspeaker reproduction at the same loudness, commonly referred to as the "missing 6 dB."
Convention Paper 8488 (Purchase now)
P8-3 Difference between the EBU R-128 Meter Recommendation and Human Subjective Loudness Perception—Fabian Begnert, Håkan Ekman, Jan Berg, Luleå University of Technology - Piteå, Sweden
The vast loudness span of broadcast sound can be reduced by the use of loudness meters. In the ideal case, measured and perceived loudness would be equal. A loudness meter fulfilling the EBU R-128 recommendation was investigated for its correspondence with perceived loudness. Several sound stimuli with large loudness differences, representing five different types of broadcast program material, were normalized to equal meter-measured loudness level. Subjects listened to pair-wise presentations of the normalized stimuli, which they subsequently set to equal perceived loudness. The settings were recorded and analyzed. The results show that the normalization yields equal perceived loudness between some program types but not others. The maximum difference was ±2.82 dB.
Convention Paper 8489 (Purchase now)
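The normalization step used to prepare such stimuli can be sketched as follows. This is a minimal illustration that substitutes a plain RMS measure and an assumed -23 dB FS target for the K-weighted, gated loudness measurement (LUFS) that EBU R-128 actually specifies.

```python
import math

def rms_db(samples):
    """RMS level of a block of samples, in dB relative to full scale."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_sq + 1e-12)

def normalize_to_target(samples, target_db):
    """Scale samples so their RMS level matches target_db.
    Plain RMS is a stand-in for the K-weighted, gated R-128 measure."""
    gain = 10.0 ** ((target_db - rms_db(samples)) / 20.0)
    return [s * gain for s in samples]

# two synthetic "program items" with a large level difference
rate = 48000
quiet = [0.05 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
loud = [0.80 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
a = normalize_to_target(quiet, -23.0)
b = normalize_to_target(loud, -23.0)
```

After normalization both items measure the same level on the meter, mirroring how the stimuli were equalized before the pair-wise listening comparisons; any remaining perceived difference is then attributable to the measure itself.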
P8-4 A Novel Multi-Stage Loudness Control Algorithm for Audio Processing—Balaji Vodapally, Brijesh Singh Tiwari, ATC Labs - Noida, India; Deepen Sinha, ATC Labs - NJ, USA
Modern audio broadcast processors are designed to provide audio with consistent loudness within a specified dynamic range. In general, tight control over the processed audio level is provided by the Automatic Gain Control (AGC) mechanism. Most of the other processing functions are tuned for a pre-defined signal level, which gives rise to the need for multi-stage loudness control. This paper presents the gain control mechanism employed in the various stages of the proposed Multi-stage Audio Loudness Control (MALC) and the interaction between the stages. We also present results showing how the proposed scheme provides better control over loudness across varying signal levels. The paper also emphasizes the importance of a frequency-weighting-based loudness measure.
Convention Paper 8490 (Purchase now)
P9 - Applications in Audio
Friday, October 21, 9:00 am — 12:30 pm (Room: 1E07)
Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA
P9-1 Simulation-Based Interface for Playing with Sounds in Media Applications—Insook Choi, City University of New York - Brooklyn, NY, USA; Robin Bargar, Columbia College Chicago - Chicago, IL, USA
Advanced audio processing for interactive media is in demand for a wide range of applications and devices. The requirements for interactive media contexts tend to impose both device-specific and style-specific constraints. The goal of the present research is to develop a robust approach to interactive audio that may be persistent across diverse media contexts. This project adopts a structural approach to the relationship of interactive sounds to interactive graphical media. We refer to this as a model-to-model architecture. Sound production is decoupled from specific media styles, enabling abstractions using feature analysis of simulation output that can be adapted to a variety of media devices. The identifying metaphor for this approach is playing with sounds through graphical representations and interactive scenarios.
Convention Paper 8491 (Purchase now)
P9-2 Advances in ENF Database Configuration for Forensic Authentication of Digital Media—Catalin Grigoras, Jeffrey M. Smith, Christopher W. Jenkins, University of Colorado Denver - Denver, CO, USA
When building an Electric Network Frequency (ENF) database for forensic purposes, ensuring that the recorded signal satisfies standards for forensic analysis is crucial. The ENF signal shall be free of clipping and lossy-compression distortions, the signal-to-noise ratio shall be as high as possible, and the acquisition system clock shall be synchronized with an atomic clock. Using an ENF database to compare reference and questioned ENF involves precise measurements of amplitude, spectrum, and zero-crossings in order to accurately time-stamp, discover potential edits, and authenticate digital audio/video recordings. Due to inherent differences in electronic components, building multiple ENF probes that create databases with matching waveforms can be challenging. This paper addresses that challenge by using MathWorks MATLAB to calculate the best combination of components and to produce graphical displays that visualize the outcome, in order to build a high-quality ENF probe. It also addresses the challenge of establishing a fail-safe database in which to safely store the accurately acquired ENF information, and concludes that a reliable ENF database is mandatory both for scientific research and for forensic examination.
Convention Paper 8492 (Purchase now)
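The zero-crossing measurement mentioned above can be sketched with a simple estimator: average the spacing of rising zero-crossings, with sub-sample interpolation, to track the mains frequency. This is an illustrative sketch rather than the authors' implementation, and the slightly off-nominal test tone is an arbitrary stand-in for a real mains signal.

```python
import math

def zero_crossing_frequency(samples, rate):
    """Estimate frequency from the mean spacing of rising zero-crossings,
    using linear interpolation for sub-sample crossing positions."""
    crossings = []
    for n in range(1, len(samples)):
        a, b = samples[n - 1], samples[n]
        if a < 0.0 <= b:
            crossings.append((n - 1) + a / (a - b))
    if len(crossings) < 2:
        return None
    spacing = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return rate / spacing

# synthetic mains hum slightly below the 60 Hz nominal frequency
rate = 8000
mains = [math.sin(2 * math.pi * 59.98 * n / rate) for n in range(2 * rate)]
estimate = zero_crossing_frequency(mains, rate)
```

A sequence of such per-block estimates, time-stamped against a synchronized clock, is what a reference ENF database stores for later comparison with questioned recordings.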
P9-3 Virtual Systems Engineering in Automotive Audio—Alfred J. Svobodnik, Harman International - Vienna, Austria
The present paper focuses on virtual product development for automotive audio systems. At its core, a multidisciplinary simulation environment is used to perform all system engineering tasks in a fully virtual environment. First, the theory of a multiphysical simulation model of electrodynamic loudspeakers is described. This model is then extended to account for enclosures used as a resonance volume for loudspeakers, especially for the reproduction of low-frequency musical content, and it is shown how the multiphysical model of loudspeakers and enclosures can be further extended to simulate the radiation of sound waves into the car interior. Finally, the virtual audio system, described by a multiphysical simulation model, is virtually tuned and auralized long before any piece of hardware exists. Tuning and auralization require extending the simulation model into a multidisciplinary environment: in addition to engineering analysis methods, paradigms from digital signal processing, psychoacoustics, binaural audio, and subjective evaluation are added. The human factor, i.e., how audio events are perceived with respect to spectral and spatial effects, is integrated into the tuning process, and it is demonstrated how we can ultimately listen to a virtual audio system by means of advanced auralization techniques based on a binaural playback system. Some remarks on the business benefits of these methods are given, and the uncertainties inherent to every modeling approach are addressed as well.
Convention Paper 8493 (Purchase now)
P9-4 Acoustical Modeling of Gunshots including Directional Information and Reflections—Robert Maher, Montana State University - Bozeman, MT, USA
Audio recordings of gunshots exhibit acoustical properties that depend upon the geometry and acoustical characteristics of nearby reflecting surfaces and the relative orientation of the firearm with respect to the recording microphone. Prior empirical studies have demonstrated the basic principles of gunshot recordings near the firearm and near the target. This paper describes an experiment to model the directional characteristics and reflections of several firearm types for a set of test configurations. The results show that reflections and reverberation can be a significant portion of the total acoustic energy received at the microphone.
Convention Paper 8494 (Purchase now)
P9-5 Influence of Recording Distance and Direction on the Analysis of Voice Formants—Initial Considerations—Eddy B. Brixen, EBB Consult - Smørum, Denmark; Siham Christensen
Based on recordings carried out in an anechoic chamber, we investigate the degree to which voice formants are affected by recording distance in the near field (10, 20, 40, 80, and 100 cm) and by recording direction (on-axis; 45 and 90 degrees horizontally; +/-45 degrees vertically). This paper presents the analysis applied and discusses to what extent the results must be taken into consideration when assessing voice samples for general phonetic research and for automatic voice ID/voice comparison. It is concluded from the analysis that the weaker formants in particular are displaced to a non-negligible degree.
Convention Paper 8495 (Purchase now)
P9-6 Musical Movements—Gesture Based Audio Interfaces—Anthony Churnside, Chris Pike, Max Leonard, BBC R&D - Media City, Salford, UK
Recent developments have led to the availability of consumer devices capable of recognizing certain human movements and gestures. This paper is a study of novel gesture-based audio interfaces. The authors present two prototypes for interacting with audio/visual experiences. The first allows a user to “conduct” a recording of an orchestral performance, controlling the tempo and dynamics. The paper describes the audio and visual capture of the orchestra and the design and construction of the audio-visual playback system. An analysis of this prototype, based on testing and feedback from a number of users, is also provided. The second prototype uses the gesture-tracking algorithm to control a three-dimensional audio panner. This panner is tested, and feedback from a number of professional engineers is analyzed.
Convention Paper 8496 (Purchase now)
P9-7 A Nimble Video Editor that Puts Audio First—Jörn Loviscach, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany
Video editing software tends to be feature-laden, to respond sluggishly to user input—and to be focused on visuals rather than on auditory aspects. All of this is a burden when the task is to edit material in which audio plays the leading role, such as a talk show, a video podcast, or a lecture recording. This paper presents a highly visual no-frills video editor that is tailored to these tasks. It combines a range of techniques that speed up the process of searching and reviewing. They range from an overview-and-detail display to speech recognition to constant-pitch variable-speed playback. The implementation is heavily multithreaded and fully leverages the computer’s main memory to ensure a highly fluid interaction.
Convention Paper 8497 (Purchase now)
P10 - Transducers and Audio Equipment
Friday, October 21, 9:00 am — 12:00 pm (Room: 1E09)
P10-1 A Parametric Study of Magnet System Topologies for Miniature Loudspeakers—Holger Hiebel, Knowles Electronics Austria GmbH - Vienna, Austria, Graz University of Technology, Graz, Austria
This paper presents an overview of the results of a parametric study on miniature loudspeaker (microspeaker) designs. It compares a specific microspeaker design with fixed outer dimensions in three different electrodynamic magnet system topologies, namely the centermagnet, ringmagnet, and doublemagnet configurations. The study results are derived from simulations of the Bl-factor, moving mass, and effective radiating area with the voice coil inner diameter being the independent variable. Sound pressure level, electrical quality factor, and resonance frequency in a closed box were calculated and used to create easily understandable charts comparing the three topologies.
Convention Paper 8498 (Purchase now)
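The chain from motor and diaphragm parameters to the charted quantities can be sketched with standard lumped-parameter relations: resonance from stiffness and moving mass, electrical Q from the force factor, and reference efficiency converted to SPL. The numeric values below are hypothetical microspeaker figures chosen for illustration, not data from the study.

```python
import math

RHO0 = 1.18  # density of air, kg/m^3
C0 = 345.0   # speed of sound, m/s

def resonance_hz(k_total, mms):
    """Closed-box resonance from total stiffness (suspension + box) and moving mass."""
    return math.sqrt(k_total / mms) / (2.0 * math.pi)

def electrical_q(fs, mms, re, bl):
    """Electrical quality factor Qes."""
    return 2.0 * math.pi * fs * mms * re / (bl * bl)

def spl_1w_1m(bl, sd, re, mms):
    """Half-space reference efficiency converted to SPL at 1 W / 1 m."""
    eta0 = RHO0 * bl ** 2 * sd ** 2 / (2.0 * math.pi * C0 * re * mms ** 2)
    return 112.0 + 10.0 * math.log10(eta0)

# hypothetical microspeaker values (illustrative only, not from the study)
bl = 0.9      # force factor, T*m
mms = 120e-6  # moving mass, kg
sd = 1.4e-4   # effective radiating area, m^2
re = 7.2      # voice coil DC resistance, ohm
k = 2.5e3     # total stiffness, N/m

fs = resonance_hz(k, mms)
q = electrical_q(fs, mms, re, bl)
spl = spl_1w_1m(bl, sd, re, mms)
```

Sweeping the voice-coil inner diameter changes Bl, Mms, and Sd together; re-evaluating these expressions at each diameter is how comparison charts of the three magnet topologies can be generated.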
P10-2 A Computational Model of Vented Band-Pass Enclosure Using Transmission Line Enclosure Modeling—Jongbae Kim, Gyung-Tae Lee, Yongje Kim, Samsung Electronics Co., Ltd. - Suwon City, Gyeong-Gi Do, Korea
The lumped-parameter model is very useful for predicting the low-frequency performance of loudspeaker systems. However, it does not consider the enclosure geometry, so in some cases it produces serious deviations between simulation and experimental results. Following the recent trend toward slim IT devices, the majority of flat-panel TVs and mobile devices adopt not only thin, long enclosures but also front-radiating structures with waveguides. Such a loudspeaker system can be simplified as a vented band-pass enclosure; however, because geometry is neglected, the effects of vent and driver location cannot be considered. This paper discusses a computational model of the complicated vented band-pass enclosure using Backman’s low-frequency method for slim band-pass enclosure models. Simulation results were compared with experimental results to verify the validity of the computational model.
Convention Paper 8499 (Purchase now)
P10-3 Nonlinear Viscoelastic Models—Finn T. Agerkvist, Technical University of Denmark - Lyngby, Denmark
Viscoelastic effects are often present in loudspeaker suspensions. This can be seen in the displacement transfer function, which often shows a frequency-dependent value below the resonance frequency. In this paper nonlinear versions of the standard linear solid model (SLS) are investigated. The simulations show that the nonlinear version of the Maxwell SLS model can result in a time-dependent small-signal stiffness, while the Kelvin-Voigt version does not.
Convention Paper 8500 (Purchase now)
P10-4 Practical Applications of a Closed Feedback Loop Transducer System Equipped with Differential Pressure Control—Fabio Blasizzo - Trieste, Italy; Paolo Desii, Powersoft S.r.l. - Firenze, Italy; Mario Di Cola, Audio Labs Systems - Chieti, Italy; Claudio Lastrucci, Powersoft S.r.l. - Firenze, Italy
A closed feedback loop transducer system dedicated to very low frequency reproduction can be used in several different applications. A feedback control loop can be very helpful for overcoming some well-known transducer limitations and for improving the acoustical performance of most subwoofer systems. The feedback control of this system is based on a differential pressure sensor. The entire system control is performed by a “Zero Latency DSP” application designed specifically for this purpose, so that the system can be processed with real-time performance. Practical applications to real-world examples are shown, with design details and some test results.
Convention Paper 8501 (Purchase now)
P10-5 Dual Diaphragm Compression Drivers—Alex Voishvillo, JBL Professional - Northridge, CA, USA
A new type of compression driver consists of two motors, two diaphragm assemblies, and two phasing plugs connected to the same acoustical load (horn or waveguide). The annular flexural diaphragms are made of light, strong polymer film providing low moving mass. A unique configuration of the phasing plugs sums both acoustical signals and directs the resulting signal into a mutual acoustical load. The principles of operation of the dual driver are explained using a combination of matrix analysis, finite element analysis, and data obtained from a scanning vibrometer. A comparison between this driver and a conventional driver based on a titanium dome diaphragm is performed. The new transducer provides increased power handling, lower thermal compression, a smoother frequency response, and decreased nonlinear distortion and sub-harmonics.
Convention Paper 8502 (Purchase now)
P10-6 Distortions in Audio Op-Amps and Their Effect on Listener Perception of Character and Quality—Robert-Eric Gaskell, Peter E. Gaskell, George Massenburg, McGill University - Montreal, Quebec, Canada
Different operational amplifier topologies are frequently thought to play a significant role in the sonic character of analog audio equipment. This paper explores whether common audio operational amplifiers are capable of producing distortion characteristics within their normal operational range that can be detected by listeners and alter listener perception of character and quality. Differences in frequency response and noise are carefully controlled while the distortion characteristics of the op-amps are amplified. Listening tests are performed in order to determine what differences listeners perceive. Listening tests also examine listener preference for different op-amps for the purpose of exploring what physical measurements best predict differences in perceived audio character and quality.
Convention Paper 8503 (Purchase now)
P11 - Production and Analysis of Musical Sounds
Friday, October 21, 9:30 am — 11:00 am (Room: 1E Foyer)
P11-1 Filling the Gaps between the Grains of Down-Pitching PSOLA or Getting the Frog Out of PSOLA—Adrian von dem Knesebeck, Udo Zölzer, Helmut-Schmidt-University Hamburg - Hamburg, Germany
An improvement regarding the down-pitching quality of the Pitch Synchronous Overlap Add (PSOLA) technique is presented. The behavior of the common PSOLA algorithm when decreasing the input signal's pitch by one octave or even less is analyzed. A new grain processing algorithm is proposed, which modifies the synthesis grains depending on the synthesis period. The time domain and frequency domain properties of the proposed algorithm are discussed. The presented algorithm improves the perceived quality of the PSOLA algorithm when down-pitching while preserving the low complexity.
Convention Paper 8504 (Purchase now)
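The gap problem the paper addresses can be seen in a bare-bones PSOLA resynthesis loop: with fixed-length grains, halving the pitch doubles the synthesis-mark spacing, so the windowed grains no longer overlap and low-energy gaps appear between them. This is an illustrative sketch assuming a constant pitch period, not the authors' algorithm.

```python
import math

def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def psola_resynthesize(x, analysis_period, pitch_factor):
    """Overlap-add two-period Hann grains at stretched synthesis marks.
    Down-pitching (pitch_factor < 1) spreads the grains apart; with grains
    of fixed length this leaves low-energy gaps between them -- the artifact
    the paper's grain-processing stage is designed to fill."""
    grain_len = 2 * analysis_period
    synth_period = int(round(analysis_period / pitch_factor))
    win = hann(grain_len)
    out = [0.0] * (len(x) * 2)
    a_mark, s_mark = 0, 0
    while a_mark + grain_len <= len(x):
        for i in range(grain_len):
            out[s_mark + i] += win[i] * x[a_mark + i]
        a_mark += analysis_period
        s_mark += synth_period
    return out[:s_mark + grain_len]

rate, f0 = 16000, 200
period = rate // f0  # 80 samples per pitch period
tone = [math.sin(2 * math.pi * f0 * n / rate) for n in range(rate // 4)]
one_octave_down = psola_resynthesize(tone, period, 0.5)
```

With pitch_factor 0.5 the 160-sample grains land exactly 160 samples apart, so the Hann tapers meet at near-zero amplitude; the proposed method modifies the synthesis grains to restore energy in these regions.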
P11-2 Content-Based Approach to Automatic Recommendation of Music—Bozena Kostek, Gdansk University of Technology - Gdansk, Poland
This paper presents a content-based approach to music recommendation. For this purpose a database containing more than 50,000 music excerpts acquired from public repositories was built. The datasets contain tracks of distinct performers within several music genres. All music pieces were converted to mp3 format and then parameterized based on MPEG-7, mel-cepstral, and dedicated time-related parameters. All feature vectors are stored as CSV files and will be available online. A study of the database’s statistical characteristics was performed. Different splits into training and test sets were investigated to provide the most accurate evaluation of the decision-based solutions. Classification time and memory complexity were also evaluated.
Convention Paper 8505 (Purchase now)
P11-3 A Digital Emulation of the Boss SD-1 Super Overdrive Pedal Based on Physical Modeling—Martin Holters, Kristjan Dempwolf, Udo Zölzer, Helmut Schmidt University - Hamburg, Germany
The Boss SD-1 Super Overdrive effect pedal is a classical overdrive circuit for electric guitars. It consists of commonly found building blocks, namely common collector circuits for input and output buffering, the distortion stage as an op-amp circuit with diodes in the feed-back, and a tone-control block as a parametric linear circuit around an op-amp. In this paper we analyze the circuit to derive a digital model where carefully applied simplifications strike a good balance between faithful emulation and computational efficiency. Due to the generality of the analyzed sub-circuits, the results should also be easily transferable to the many similar sub-circuits found in other effect units.
Convention Paper 8506 (Purchase now)
P11-4 A Triode Model for Guitar Amplifier Simulation with Individual Parameter Fitting—Kristjan Dempwolf, Martin Holters, Udo Zölzer, Helmut Schmidt University - Hamburg, Germany
A new approach for the modeling of triodes is presented, featuring simple and physically-motivated equations. The mathematical description includes the replication of the grid current, which is a relevant parameter for the simulation of overdriven guitar amplifiers. If reference data from measurements of practical triodes is available, an individual fitting to the reference can be performed, adapting some free parameters. Parameter sets for individual models are given. To study the suitability for circuit simulations, a SPICE model is created and tested under various conditions. Results of the model itself and when embedded in SPICE simulations are presented and compared with measurements. It is shown that the equations characterize the properties of real tubes in good accordance.
Convention Paper 8507 (Purchase now)
P11-5 Dereverberation of Musical Instrument Recordings for Improved Note Onset Detection and Instrument Recognition—Thomas Wilmering, Mathieu Barthet; Mark B. Sandler, Queen Mary University of London - London, UK
Previous experiments have shown that reverberation reduces the accuracy of onset detection and instrument recognition. In automatic speech recognition (ASR), where reverberation likewise degrades performance, pre-processing the reverberated signal with dereverberation has proven effective in mitigating this degradation. In this paper we present the results of an experimental study addressing onset detection and instrument recognition from musical signals in reverberant conditions by pre-processing the audio material with a dereverberation algorithm. The experiments include four different onset detection techniques based on energy, spectrum, and phase. The instrument recognition algorithm is based on line spectral frequencies (LSF) and k-means clustering. Results show improved onset detection performance, particularly for the spectrum-based techniques. In certain conditions we also observed improvement in instrument recognition.
Convention Paper 8508 (Purchase now)
P11-6 Dimensional Reduction in Digital Synthesizers GUI Using Parameter Analysis—Daniel Gómez, Universidad ICESI - Cali, Colombia; Juan Sebastián Botero, Ypisilon Tech - Medellín, Colombia
Digital synthesizers impose a cognitive overload on interaction, especially for novice or intermediate users. A system is developed that generates a custom GUI based on timbre clustering and parameter-data analysis of the preset programs present in VST synthesizers. The result is a new interface with dynamic control over the number of variables while maintaining full synthesizer functionality. The system is designed to adapt to the user’s degree of knowledge and cognitive control of synthesizer parameters. The results of using diverse clustering techniques for synthesizer sound analysis and an original statistical analysis of preset-program parameter data are presented. Implications of using the system in real-world scenarios are reviewed.
Convention Paper 8509 (Purchase now)
P12 - Loudspeaker Reproduction
Friday, October 21, 2:00 pm — 4:00 pm (Room: 1E09)
P12-1 Size and Shape of Listening Area Reproduced by Three-Dimensional Multichannel Sound System with Various Numbers of Loudspeakers—Ikuko Sawaya, Satoshi Oode, Akio Ando, Kimio Hamasaki, NHK Science and Technology Research Laboratories - Setagaya, Tokyo, Japan
A wide listening area is necessary so that several people can listen to a multichannel sound program together. It is considered that the size of the listening area depends on the number of loudspeakers. To examine the relationship between the number of loudspeakers and the size of the listening area over which the spatial impression at the center of a three-dimensional multichannel sound field is maintained, two subjective evaluation experiments were carried out. The first experiment showed that the size of the listening area increases with the number of loudspeakers. The second experiment showed that the shape of the listening area depends on the locations of the loudspeakers. On the basis of the experimental results, a new parameter for estimating the shape of the listening area is proposed.
Convention Paper 8510 (Purchase now)
P12-2 Numerically Optimized Touring Loudspeaker Arrays—Practical Applications—Ambrose Thompson, Jason Baird, Bill Webb, Martin Audio - High Wycombe, UK
We describe the implementation of a user-guided automated process that improves the quality and consistency of loudspeaker array deployment. After determining the basic venue geometry, a few easily understood goals for regions surrounding the array are specified. The relative importance of the goals is then communicated to the optimization algorithm in an intuitive manner. Some representative examples are presented, initially optimized with default coverage goals. We then impose extra requirements, such as changing the coverage at the last moment, avoiding noise-sensitive regions, and demanding a particularly quiet stage.
Convention Paper 8511 (Purchase now)
P12-3 Vertical Loudspeaker Arrangement for Reproducing Spatially Uniform Sound—Satoshi Oode, Ikuko Sawaya, Akio Ando, Kimio Hamasaki, Japan Broadcasting Corporation - Setagaya, Tokyo, Japan; Kenji Ozawa, University of Yamanashi - Kofu, Yamanashi, Japan
It was recently recognized that the loudspeaker arrangement of multichannel sound systems can be vertically expanded to improve the spatial impression. This paper discusses the relationship between the vertical interval between loudspeakers placed in a semicircle above ear height and the impression of spatially uniform sound, as part of a study of three-dimensional multichannel sound systems. In total, 24 listeners evaluated the spatial uniformity of white noise reproduced by loudspeakers arranged in vertical semicircles at equal intervals of 15°, 30°, 45°, 60°, 90° or 180°, and with azimuthal angles of 0°, 45°, or 90°. Loudspeakers arranged with vertical intervals that were less than 45° were found to reproduce spatially uniform sound for all of the azimuthal angles tested.
Convention Paper 8512 (Purchase now)
P12-4 Multichannel Sound Reproduction in the Environment for Auditory Research—Mark A. Ericson, Army Research Laboratory - Aberdeen Proving Ground, MD, USA
The Environment for Auditory Research (EAR) is a new U.S. Army Research Laboratory facility at Aberdeen Proving Ground, Maryland, dedicated to spatial sound perception and speech communication research. The EAR is comprised of four indoor research spaces (Sphere Room, Dome Room, Distance Hall, and Listening Laboratory), one outdoor research space (Open EAR), and one common control center (Control Room). Digital audio signals are routed through state-of-the-art RME Hammerfall DSP and Peavey MediaMatrix® hardware to over 600 loudspeakers and microphone channels throughout the facility. The facility’s acoustic environments range from anechoic, through various soundscapes, to real field environments. The EAR facility layout, the audio signal processing capabilities, and some current research activities are described.
Convention Paper 8513 (Purchase now)
P13 - Low Bit-Rate Coding—Part 1
Friday, October 21, 2:00 pm — 4:30 pm (Room: 1E07)
Stephan Preihs, Leibniz Universität Hannover - Hannover, Germany
P13-1 Performance of MPEG Unified Speech and Audio Coding—Schuyler Quackenbush, Audio Research Labs - Scotch Plains, NJ, USA; Roch Lefebvre, Université de Sherbrooke - Sherbrooke, Quebec, Canada
The MPEG Unified Speech and Audio Coding (USAC) standard completed technical development in July 2011 and is expected to be issued as international standard ISO/IEC 23003-3, Unified Speech and Audio Coding, in late 2011. Verification tests were conducted to assess the subjective quality of the new specification. Test material consisted of 24 items of mixed speech and music content, and performance was assessed via subjective listening tests at coding rates ranging from 8 kb/s for mono material to 96 kb/s for stereo material. The mean performance of USAC was found to be better than that of MPEG High Efficiency AAC v2 (HE-AAC v2) and Adaptive Multi-Rate Wideband Plus (AMR-WB+) at all tested operating points; for most bit rates this advantage is large. Furthermore, USAC provides much more consistent quality across signal content types than the other systems tested. This paper summarizes the results of the verification tests.
Convention Paper 8514 (Purchase now)
P13-2 Improved Error Robustness for Predictive Ultra Low Delay Audio Coding—Michael Schnabel, Ilmenau University of Technology - Ilmenau, Germany; Michael Werner, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Gerald Schuller, Ilmenau University of Technology - Ilmenau, Germany
This paper proposes a new method for improving the audio quality of predictive perceptual audio coding in the context of the Ultra Low Delay (ULD) coding scheme for real-time applications. The commonly used auto-regressive (AR) signal model leads to an IIR predictor in the decoder. To allow random access to the transmission, as well as to handle transmission errors, the predictor states are reset in both encoder and decoder. These resets reduce the prediction performance and thus the SNR, especially during stationary signal parts, and the resulting noise peaks can become audible. This paper shows that using adaptive reset intervals, chosen according to psychoacoustic rules, improves the audio quality.
Convention Paper 8515 (Purchase now)
P13-3 AAC-ELD v2—The New State of the Art in High Quality Communication Audio Coding—Manfred Lutzky, María Luis Valero, Markus Schnell, Johannes Hilpert, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Recently MPEG finished the standardization of a Low Delay MPEG Surround tool that enhances the widely adopted AAC-ELD low delay codec for high-quality audio communication into AAC-ELD v2. In combination with the Low Delay MPEG Surround tool, the coding efficiency for stereo content outperforms competing low delay audio codecs by at least a factor of 2. This paper describes the technical challenges and solutions involved in designing a low delay codec whose performance is comparable to that of existing state-of-the-art compression schemes with much higher delay, such as HE-AAC v2. It provides a comparison with competing proprietary and ITU-T codecs, as well as a guideline for selecting the best possible operating points. Applications facilitated by AAC-ELD v2 in broadcasting and mobile video conferencing are discussed.
Convention Paper 8516 (Purchase now)
P13-4 QMF-Based Harmonic Spectral Band Replication—Haishan Zhong, Panasonic Corporation; Lars Villemoes, Per Ekstrand, Dolby Sweden AB; Sascha Disch, Frederik Nagel, Stephan Wilde, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Kok Seng Chong, Takeshi Norimatsu, Panasonic Corporation
Unified speech and audio coding (USAC) is the next step in the evolution of audio codecs standardized by the Moving Picture Experts Group (MPEG). USAC provides consistent quality for music, speech, and mixed material by extending an audio codec with speech codec functions. Two novel flavors of Spectral Band Replication (SBR) were introduced to enhance its perceptual quality: Discrete Fourier Transform (DFT) based harmonic SBR and Quadrature Mirror Filterbank (QMF) based harmonic SBR. The DFT-based SBR has higher frequency resolution for the harmonic transposition process, resulting in good sound quality. The QMF-based SBR has significantly lower computational complexity. This paper describes the detailed technical aspects of the low complexity QMF-based harmonic SBR tool within USAC. A complexity comparison and listening test results are also presented.
Convention Paper 8517 (Purchase now)
P13-5 Perceptually Optimized Cascaded Long Term Prediction of Polyphonic Signals for Enhanced MPEG-AAC—Tejaswi Nanjundaswamy, Kenneth Rose, University of California Santa Barbara - Santa Barbara, CA, USA
MPEG-4 Advanced Audio Coding uses the long term prediction (LTP) tool to exploit inter-frame correlations by providing a segment of previously reconstructed samples as prediction for the current frame, which is naturally useful for encoding signals with a single periodic component. However, most audio signals are polyphonic in nature containing a mixture of several periodic components. While such polyphonic signals are themselves periodic with overall period equaling the least common multiple of the individual component periods, the signal rarely remains sufficiently stationary over the extended period, rendering the LTP tool ineffective. Further hindering the LTP tool is the typically employed parameter selection based on minimizing the mean squared error as opposed to the perceptual distortion criteria defined for audio coding. We thus propose a technique to exploit the correlation of each periodic component with its immediate past, while taking into account the perceptual distortion criteria. Specifically, we propose cascading LTP filters corresponding to individual periodic components, designed appropriately in a two stage method, wherein an initial set of parameters is estimated backward adaptively to minimize the mean squared prediction error, followed by a refinement stage where parameters are adjusted to minimize the perceptual distortion. Objective and subjective results validate the effectiveness of the proposal on a variety of polyphonic signals.
Convention Paper 8518 (Purchase now)
P14 - Audio Processing
Friday, October 21, 2:30 pm — 4:00 pm (Room: 1E Foyer)
P14-1 Acoustic Channel Decorrelation with Phase Modification for Stereo Acoustic Echo Cancellation—Jae-Hoon Jeong, So-Young Jeong, Woo-Jeong Lee, Jung-Eun Park, Jeong-Su Kim, Yongje Kim, Samsung Electronics Co., Ltd. - Suwon, Korea
In this paper we propose a novel acoustic channel decorrelation method that prevents poor convergence in stereophonic echo cancellation. In order to minimize audio quality degradation while maximizing channel decorrelation, the proposed method applies a subband phase modification to each channel that depends on the amount of inter-channel phase difference, so that redundant phase alteration is avoided. In addition, we introduce a phase control parameter for each subband to preserve the perceptual stereo image. The performance of the proposed method is verified with a MUSHRA subjective audio quality test, impulse response misalignment, and echo return loss enhancement. The results show that the proposed method achieves good decorrelation performance while minimizing signal distortion and stereo image change.
Convention Paper 8519 (Purchase now)
P14-2 Least-Squares Local Tuning Frequency Estimation for Choir Music—Volker Gnann, Markus Kitza, Julian Becker, Martin Spiertz, RWTH Aachen University - Aachen, Germany
Choir conductors often have to deal with the problem that the tuning pitch of a choir tends to decrease gradually over time. For that reason, we present an algorithm that measures and displays the evolution of the tuning frequency of polyphonic choir music over time. Basically, it analyzes the short-time Fourier spectrogram, picks out the most important peaks, and determines their frequencies. From these frequencies, the algorithm calculates the concert A pitch that leads to the smallest least-squares error when the measured frequencies are sorted into a semitone grid.
Convention Paper 8520 (Purchase now)
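The least-squares tuning estimation described in the abstract above can be illustrated with a short sketch. This is a minimal grid search over candidate A4 frequencies, not the authors' implementation; the candidate range and step size are arbitrary assumptions:

```python
import numpy as np

def estimate_tuning(peak_freqs_hz, candidates=np.arange(430.0, 450.0, 0.1)):
    """Estimate the concert-A tuning that best explains the measured peaks.

    For each candidate A4 frequency, every peak frequency is converted to a
    fractional semitone distance from A4; the deviation from the nearest
    integer semitone is the error. The candidate with the smallest sum of
    squared deviations wins (a least-squares fit to the semitone grid).
    """
    f = np.asarray(peak_freqs_hz, dtype=float)
    best_a4, best_err = None, np.inf
    for a4 in candidates:
        semitones = 12.0 * np.log2(f / a4)     # position on the semitone grid
        dev = semitones - np.round(semitones)  # deviation from nearest note
        err = np.sum(dev ** 2)
        if err < best_err:
            best_a4, best_err = a4, err
    return best_a4
```

Since the candidate range spans less than one semitone ratio (450/430 < 2^(1/12)), the minimum within the range is unique.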
P14-3 A Low Latency Implementation of a Non Uniform Partitioned Overlap and Save Algorithm for Real Time Applications—Andrea Primavera, Stefania Cecchi, Laura Romoli, Paolo Peretti, Francesco Piazza, Università Politecnica delle Marche - Ancona, Italy
FIR convolution is a widely used operation in digital signal processing, especially for filtering in real-time scenarios. In this context, computationally efficient techniques for calculating convolutions with low input/output latency become essential, considering that the real-time requirements are strictly related to the impulse response length. In this paper a multithreaded real-time implementation of a Non Uniform Partitioned Overlap and Save algorithm is proposed, with the aim of lowering the workload required in applications such as reverberation, also exploiting the sensitivity of the human ear. Several results are reported to show the effectiveness of the proposed approach in terms of computational cost, taking into consideration different impulse responses and including comparisons with existing state-of-the-art techniques.
Convention Paper 8521 (Purchase now)
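The partitioned overlap-save idea underlying the paper above can be sketched in its uniform form (the paper's scheme is non-uniform and multithreaded; this simplified uniform sketch only shows the frequency-domain delay-line mechanics):

```python
import numpy as np

def partitioned_overlap_save(x, h, block=64):
    """Uniformly partitioned overlap-save FFT convolution.

    The impulse response h is split into `block`-sized partitions; each
    input block is convolved with every partition in the frequency domain
    via a frequency-domain delay line (FDL). The output matches direct
    linear convolution for the first len(x) samples.
    """
    B, N = block, 2 * block
    n_part = -(-len(h) // B)                     # ceil(len(h) / B)
    H = np.array([np.fft.rfft(h[i*B:(i+1)*B], N) for i in range(n_part)])
    fdl = np.zeros_like(H)                       # frequency-domain delay line
    x = np.concatenate([x, np.zeros(-len(x) % B)])
    prev = np.zeros(B)
    out = []
    for i in range(len(x) // B):
        xb = x[i*B:(i+1)*B]
        X = np.fft.rfft(np.concatenate([prev, xb]))  # overlap-save buffer
        prev = xb
        fdl = np.roll(fdl, 1, axis=0)                # age the spectra
        fdl[0] = X
        y = np.fft.irfft((fdl * H).sum(axis=0), N)
        out.append(y[B:])                            # keep the valid half
    return np.concatenate(out)
```

The non-uniform variant in the paper uses short partitions at the head of the response (low latency) and longer ones in the tail (low cost), but the per-partition multiply-accumulate structure is the same.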
P14-4 3-D Audio Depth Rendering Method for 3-DTV—Sunmin Kim, Young Woo Lee, Yongje Kim, Samsung Electronics Co., Ltd. - Suwon, Korea
This paper proposes a novel 3-D audio depth rendering method using stereo loudspeakers to enhance the immersion of 3-D video content. The 3-D audio depth rendering system for 3-DTV consists of an audio depth index, which estimates the distance of an audio object between the TV and a listener, and a distance control algorithm based on that index. Two audio depth index estimation algorithms are presented: one utilizes the disparity map of the stereo image, and the other tracks the loudness of the stereo audio signal. Listening tests show that the proposed audio depth rendering system allows the listener to feel the depth of the sound corresponding to the pop-up effect of the 3-D image.
Convention Paper 8522 (Purchase now)
P14-5 Virtual Height Speaker Rendering for Samsung 10.2-Channel Vertical Surround System—Young Woo Lee, Sunmin Kim, Samsung Electronics Co., Ltd. - Suwon, Korea; Hyun Jo, Youngjin Park, KAIST - Daejeon, Korea; Yongje Kim, Samsung Electronics Co., Ltd. - Suwon, Korea
This paper proposes a virtual sound elevation rendering algorithm that can give a listener the impression of virtual 10.2-channel loudspeakers. The proposed algorithm requires 10.2-channel input signals and a conventional 7.1-channel loudspeaker system (ITU-R BS.775-2). The proposed virtual height speaker rendering consists of a generic head-related transfer function (HRTF), calculated from 45 individualized HRTFs of the CIPIC database, and a mixing algorithm using four loudspeakers among the 7.1 channels. For subjective evaluation, three kinds of playback are compared with various materials: the original 10.2-channel signal, a down-mixed 7.1-channel signal, and the proposed 7.1-channel signal, in terms of source positioning, envelopment, and overall sound quality.
Convention Paper 8523 (Purchase now)
P14-6 Non Linear Convolution and its Application to Audio Effects—Lamberto Tronchin, University of Bologna - Bologna, Italy
Non-linear convolution can be applied to extend linear convolution for emulating acoustic musical instruments and audio devices. In this paper a novel technology based on exponential sine sweep measurements is presented. The non-linear convolution is based on the Volterra series approach and enables real-time non-linear emulation of acoustic devices (such as valve amplifiers and musical instruments). The newly developed tool (a VST plug-in written in C++ in the JUCE framework) is presented. The emulation of musical instruments is compared with real recordings. Finally, the results are analyzed and discussed.
Convention Paper 8524 (Purchase now)
P15 - Sound Field Analysis and Reproduction—Part 2
Friday, October 21, 4:00 pm — 6:30 pm (Room: 1E09)
P15-1 Broadband Analysis and Synthesis for Directional Audio Coding Using A-Format Input Signals—Archontis Politis, Ville Pulkki, Aalto University - Espoo, Finland
Directional Audio Coding (DirAC) is a parametric non-linear technique for spatial sound recording and reproduction, with flexibility in terms of loudspeaker reproduction setups. In the general 3-dimensional case, DirAC utilizes as input B-format signals, traditionally derived from the signals of a regular tetrahedral first-order microphone array, termed A-format. For high-quality rendering, the B-format signals are also exploited in the synthesis stage. In this paper we propose an alternative formulation of the analysis and synthesis, which avoids the effect of non-ideal B-format signals on both stages, and achieves improved broadband estimation of the DirAC parameters. Furthermore, a scheme for the synthesis stage is presented that utilizes directly the A-format signals without conversion to B-format.
Convention Paper 8525 (Purchase now)
P15-2 Beamforming Regularization, Scaling Matrices, and Inverse Problems for Sound Field Extrapolation and Characterization: Part I—Theory—Philippe-Aubert Gauthier, Éric Chambatte, Cédric Camier, Yann Pasco, Alain Berry, Université de Sherbrooke - Sherbrooke, Québec, Canada, and McGill University, Montreal, Québec, Canada
Sound field extrapolation (SFE) is aimed at the prediction of a sound field in an extrapolation region using a microphone array in a measurement region. For sound environment reproduction purposes, sound field characterization (SFC) aims at a more generic or parametric description of a measured or extrapolated sound field using different physical or subjective metrics. In this paper an SFE method recently introduced is presented and further developed. The method is based on an inverse problem formulation combined with a beamforming matrix in the discrete smoothing norm of the cost function. The results obtained from the SFE method are applied to SFC for subsequent sound environment reproduction. A set of classification criteria is proposed to distinguish simple types of sound fields on the basis of two simple scalar metrics. A companion paper presents the experimental verifications of the theory presented in this paper.
Convention Paper 8526 (Purchase now)
P15-3 Beamforming Regularization, Scaling Matrices and Inverse Problems for Sound Field Extrapolation and Characterization: Part II—Experiments—Philippe-Aubert Gauthier, Éric Chambatte, Cédric Camier, Yann Pasco, Alain Berry, Université de Sherbrooke - Sherbrooke, Québec, Canada, and McGill University, Montreal, Québec, Canada
Sound field extrapolation (SFE) is aimed at the prediction of a sound field in an extrapolation region using a microphone array. For sound environment reproduction purposes, sound field characterization (SFC) aims at a more generic or parametric description of a measured or extrapolated sound field using different physical or subjective metrics. In this paper, experiments with a recently developed SFE method (Part I—Theory) are first reported. The method is based on an inverse problem formulation combined with a recently proposed regularization approach: a beamforming matrix in the discrete smoothing norm of the cost function. The results obtained from the SFE method are then applied to SFC as presented in Part I. The SFC classification method is verified in two environments that recreate ideal or complex sound fields. In light of the presented results and discussion, it is argued that the proposed SFE and SFC methods are effective.
Convention Paper 8527 (Purchase now)
P15-4 Mixed-Order Ambisonics Recording and Playback for Improving Horizontal Directionality—Sylvain Favrot, Marton Marschall, Johannes Käsbach, Technical University of Denmark - Lyngby, Denmark; Jörg Buchholz, Macquarie University - Sydney, NSW, Australia; Tobias Weller, Technical University of Denmark - Lyngby, Denmark
Planar (2-D) and periphonic (3-D) higher-order Ambisonics (HOA) systems are widely used to reproduce spatial properties of acoustic scenarios. Mixed-order Ambisonics (MOA) systems combine the benefit of higher order 2-D systems, i.e., a high spatial resolution over a larger usable frequency bandwidth, with a lower order 3-D system to reproduce elevated sound sources. In order to record MOA signals, the location and weighting of the microphones on a hard sphere were optimized to provide a robust MOA encoding. A detailed analysis of the encoding and decoding process showed that MOA can improve both the spatial resolution in the horizontal plane and the useable frequency bandwidth for playback as well as recording. Hence the described MOA scheme provides a promising method for improving the performance of current 3-D sound reproduction systems.
Convention Paper 8528 (Purchase now)
P15-5 Local Sound Field Synthesis by Virtual Acoustic Scattering and Time-Reversal—Sascha Spors, Karim Helwani, Jens Ahrens, Deutsche Telekom Laboratories, Technische Universität Berlin - Berlin, Germany
Sound field synthesis techniques like Wave Field Synthesis and near-field compensated higher order Ambisonics aim at synthesizing a desired sound field within an extended area using an ensemble of individually driven loudspeakers. Local sound field synthesis techniques achieve an increased accuracy within a restricted local listening area at the cost of stronger artifacts outside. This paper proposes a novel approach to local sound field synthesis that is based upon the scattering from a virtual object bounding the local listening area and the time-reversal principle of acoustics. The physical foundations of the approach are introduced and discussed. Numerical simulations of synthesized sound fields are presented as well as a comparison to other published methods.
Convention Paper 8529 (Purchase now)
P16 - Low Bit-Rate Coding—Part 2
Friday, October 21, 4:30 pm — 6:30 pm (Room: 1E07)
Christof Faller, Illusonic LLC - St-Sulpice, Switzerland
P16-1 A Subband Analysis and Coding Method for Downmixing Based Multichannel Audio Codec—Shi Dong, Ruimin Hu, Weiping Tu, Xiang Zheng, Wuhan University - Wuhan, Hubei, China
In the present downmixing-based multichannel coding standards, the downmixing process causes the “tone leakage” problem by mixing different channels into one channel. In this paper a novel multichannel analysis method is proposed to reduce the “tone leakage” phenomenon using additional side information. The basic idea is to find the subbands with the largest spectral difference and to code their spectral envelope information. By analyzing the decoded signals, leakage tones are identified and attenuated, and the original ones are reconstructed, while the original inter-channel level difference (ICLD) of the subbands is kept unchanged. Results show our method can improve subjective quality compared with the HE-AAC (v2) codec with only a slight increase in bit rate.
Convention Paper 8530 (Purchase now)
P16-2 Characterizing the Perceptual Effects Introduced by Low Bit Rate Spatial Audio Codecs—Paulo Marins, Universidade de Brasília - Brasília, Brazil
This paper describes a series of experiments carried out to characterize the perceptual effects introduced by low bit rate spatial audio codecs. An initial study was conducted to investigate the contribution of selected attributes to the basic audio quality of low bit rate spatial codecs. Two further experiments were then performed to identify the perceptually salient dimensions, i.e., the independent perceptual attributes, related to the artifacts introduced by low bit rate spatial audio coding systems.
Convention Paper 8531 (Purchase now)
P16-3 Error Robust Low Delay Audio Coding Based on Subband-ADPCM—Stephan Preihs, Jörn Ostermann, Institute for Information Processing, Leibniz Universität Hannover - Hannover, Germany
In this paper we present an approach for error robust audio coding at a medium data rate of about 176 kbps (mono, 44.1 kHz sampling rate). By combining a delay-free Adaptive Differential Pulse Code Modulation (ADPCM) coding scheme and a numerically optimized low delay filter bank, we achieve a very low algorithmic coding delay of only about 0.5 ms. The structure of the codec also allows for a high robustness against random single bit errors and even supports error resilience. Implementation structure, results of a listening test, and PEAQ (Perceptual Evaluation of Audio Quality) based objective audio quality evaluation, as well as tests of random single bit error performance, are given. The presented coding scheme provides a very good audio quality for vocals and speech. For most of the critical signals the audio quality can still be denoted as acceptable. Tests of random single bit error performance show good results for error rates up to 10^-4.
Convention Paper 8532 (Purchase now)
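The ADPCM principle at the core of the codec above can be illustrated with a toy first-order sketch. This is not the paper's subband scheme; the predictor order, step adaptation constants, and word length below are all illustrative assumptions:

```python
import numpy as np

def adpcm_codec(x, bits=4):
    """Toy first-order ADPCM: quantize the prediction residual with an
    adaptive step size. Returns the decoder-side reconstruction, since the
    decoder mirrors the encoder's state exactly (no reset handling here).
    """
    step, pred = 0.01, 0.0
    qmax = 2 ** (bits - 1) - 1
    out = np.empty_like(x, dtype=float)
    for n, s in enumerate(x):
        # Quantize the residual to `bits` (clipped to the code range).
        q = int(np.clip(round((s - pred) / step), -qmax, qmax))
        recon = pred + q * step
        # Adapt: grow the step on slope overload, shrink it otherwise.
        step = max(step * (1.5 if abs(q) == qmax else 0.95), 1e-4)
        pred = recon                       # first-order predictor state
        out[n] = recon
    return out
```

Because encoder and decoder share this state, a single bit error desynchronizes both the predictor and the step size, which is exactly why error robustness is the focus of the paper.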
P16-4 The Transient Steering Decorrelator Tool in the Upcoming MPEG Unified Speech and Audio Coding Standard—Achim Kuntz, Sascha Disch, Tom Bäckström, Julien Robillard, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Applause signals are still challenging to code with good perceptual quality using parametric multichannel audio coding techniques. To improve the codec performance for these particular items, the Transient Steering Decorrelator (TSD) tool has been adopted into the upcoming Moving Picture Experts Group (MPEG) standard on Unified Speech and Audio Coding (USAC) as an amendment to the MPEG Surround 2-1-2 module (MPS). TSD improves the perceptual quality of signals that contain rather dense, spatially distributed transient auditory events, such as applause-like signals. Within TSD, transient events are separated from the core decoder output, and a dedicated decorrelator algorithm distributes the transients in the spatial image according to parametric guiding information transmitted in the bitstream. Listening tests show a substantial improvement in subjective quality.
Convention Paper 8533 (Purchase now)
P17 - Audio Equipment and Measurement
Friday, October 21, 4:30 pm — 6:00 pm (Room: 1E Foyer)
P17-1 Swept Sine Grains Applied to Acoustical Measurements Using Perceptual Masking Effects or Musical Compositions—Joel Preto Paulo, ISEL - Instituto Superior de Engenharia de Lisboa - Lisbon, Portugal, CAPS - Instituto Superior Técnico, TU Lisbon - Lisbon, Portugal; J. L. Bento Coelho, CAPS - Instituto Superior Técnico, TU Lisbon - Lisbon, Portugal
The swept sine technique has proven to yield accurate estimates of the room impulse response, even under low SNR, non-linearity, and time variance of the system under test. Owing to the distributive property of convolution, the swept sine signal can be split into several segments, namely grains, and sent separately to the room. At the receiver, the full frame is assembled from the grains by an overlap-add procedure. By choosing appropriate windows and suitable values for the truncation and overlap of the captured grains, the degree of degradation of the final results, measured by the amount of noise produced, can be controlled. Each grain can therefore be matched to the tempered musical scale, which allows the grains to be used in musical compositions, or in perceptual models that take human auditory masking into account, to compose a test signal frame for acoustical measurements. The possibility of using polyphonic musical compositions for acoustical measurements is assessed and discussed.
Convention Paper 8534 (Purchase now)
P17-2 Multichannel Impulse Response Measurement in Matlab—Braxton Boren, Agnieszka Roginska, New York University - New York, NY, USA
This paper describes ScanIR, an application for flexible multichannel impulse response measurement in Matlab intended for public distribution. The application interfaces with the PortAudio API using Psychtoolbox-3, a toolkit in Matlab allowing high-precision control of a multichannel audio interface. ScanIR contains single-channel, binaural, and multichannel input modes, and it also allows the use of multiple output test signals. It is hoped that this application will prove useful to researchers using Matlab for physical or psychological acoustic measurements.
Convention Paper 8535 (Purchase now)
P17-3 Why Do Tube Amplifiers Have Fat Sound while Solid State Amplifiers Don't?—Shengchao Li, Wintersweet Electronics, LLC - Potomac, MD, USA
I propose an explanation for why tube amplifiers sound better than solid state amplifiers in certain circumstances. The explanation is that the interaction of (1) the nonlinearity of the output tube, (2) the output impedance of the amplifier, and (3) the nonlinearity of the output transformer inductance caused by the core material's B-H curve results in a frequency-selective nonlinear feedback system that softly limits the speaker cone excursion for low frequency music signals with excessive amplitude, while having little effect on high frequency music signals or on low frequency music signals with low to moderate amplitude. Better yet, when low frequency music signals with excessive amplitude are superposed with high frequency music signals, this system selectively limits the low frequency signals and has little effect on the superposed high frequency signals. Compared to a typical solid state amplifier, this mechanism trades some amplifier nonlinearity for less speaker nonlinearity, resulting in less overall nonlinearity in the sound waves people's ears perceive.
Convention Paper 8536 (Purchase now)
P18 - Headphone Playback
Saturday, October 22, 9:00 am — 12:00 pm (Room: 1E09)
P18-1 The Effects of Headphones on Listener HRTF Preference—Braxton Boren, Agnieszka Roginska, New York University - New York, NY, USA
Listener-selected HRTFs have the potential to provide the accuracy of an individualized HRTF without the time and resources required for HRTF measurements. This study tests listeners’ HRTF preference for three different sets of headphones. HRTF datasets heard over the noise-cancelling Bose Aviation headset were selected as having good externalization more often than those heard over Sennheiser HD650 open headphones or Sony MDR-7506 closed headphones. It is thought that the Bose headset’s frequency response is responsible for its superior externalization. This suggests that in systems where high quality headphones are not available, post-processing equalization should be applied to account for the effect of the headphones on HRTF reproduction.
Convention Paper 8537 (Purchase now)
P18-2 Head Orientation Tracking Using Binaural Headset Microphones—Hannes Gamper, Sakari Tervo, Tapio Lokki, Aalto University School of Science - Aalto, Finland
A head orientation tracking system using binaural headset microphones is proposed. Unlike previous approaches, the proposed method does not require anchor sources but relies on speech signals from the wearers of the binaural headsets. From the binaural microphone signals, time difference of arrival (TDOA) estimates are obtained. The tracking is performed using a particle filter integrated with a maximum likelihood estimation function. In a case study the proposed method is used to track the head orientations of three conferees in a meeting scenario. With an accuracy of about 10 degrees, the proposed method is shown to outperform a reference method, which achieves an accuracy of about 35 degrees.
Convention Paper 8538 (Purchase now)
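The TDOA estimation step in the abstract above can be sketched with a standard cross-correlation estimator. GCC-PHAT is a common choice for this task, used here purely as an illustration; the paper's exact estimator may differ:

```python
import numpy as np

def tdoa_gcc_phat(left, right, fs):
    """Estimate the time difference of arrival between two microphone
    signals via GCC-PHAT (phase-transform-weighted cross-correlation).
    Returns the delay of `left` relative to `right` in seconds
    (positive if `left` lags `right`).
    """
    n = len(left) + len(right)
    L = np.fft.rfft(left, n)
    R = np.fft.rfft(right, n)
    cross = L * np.conj(R)
    cross /= np.abs(cross) + 1e-12      # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n)
    max_lag = n // 2
    cc = np.concatenate([cc[-max_lag:], cc[:max_lag + 1]])  # center lag 0
    lag = np.argmax(np.abs(cc)) - max_lag
    return lag / fs
```

In the paper's setting, a sequence of such TDOA estimates from the wearers' own speech would feed the particle filter that tracks head orientation.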
P18-3 Observing the Clustering Tendencies of Head Related Transfer Function Databases—Areti Andreopoulou, Agnieszka Roginska, Juan Bello, New York University - New York, NY, USA
This study offers a detailed description of the clustering tendencies of a large, standardized HRTF repository and compares the quality of the results to those of a CIPIC database subset. The statistical analysis applied k-means clustering to the log magnitude of HRTFs on the horizontal plane for a varying number of clusters. A thorough report on the grouping behavior of the filters as the number of clusters increases revealed that, for the majority of the HRTF datasets, the repository was superior to the CIPIC subset in describing common behaviors across equivalent azimuth positions.
Convention Paper 8539 (Purchase now)
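The analysis pipeline described above (k-means on log-magnitude HRTFs) can be sketched as follows. The feature extraction mirrors the abstract; the deterministic farthest-point initialization and iteration count are illustrative assumptions:

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain k-means on the rows of X, with deterministic
    farthest-point initialization."""
    centers = X[:1].copy()
    for _ in range(1, k):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2).min(axis=1)
        centers = np.vstack([centers, X[d.argmax()]])
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        centers = np.vstack([X[labels == j].mean(axis=0) if np.any(labels == j)
                             else centers[j] for j in range(k)])
    return labels

def cluster_hrtfs(hrirs, k):
    """Cluster head-related impulse responses by the log magnitude of
    their frequency responses, the feature named in the abstract."""
    mags = 20 * np.log10(np.abs(np.fft.rfft(hrirs, axis=1)) + 1e-9)
    return kmeans(mags, k)
```

Varying `k` and inspecting how the filters regroup is the "clustering tendency" analysis the study performs at scale.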
P18-4 Individual Perception of Headphone Reproduction Asymmetry—Juha Merimaa, Sennheiser Research Laboratory - Palo Alto, CA, USA; V. Ralph Algazi, University of California Davis - Davis, CA, USA; Richard O. Duda - Menlo Park, CA, USA
With headphone listening, the naturally occurring left/right asymmetry in head and ear shapes can produce frequency-dependent variations in the perceived location of a sound source. In this paper the perception of such asymmetry is studied by determining the interaural level differences required to center a set of narrow-band stimuli with different center frequencies. It is shown that the asymmetry varies from one listener to another. Some of the asymmetry can be explained with asymmetry in transmission of sound from the headphones to the entrances of a listener’s ear canals. However, the perceived asymmetry for individual listeners is also correlated between different headphone types including in-ear headphones that couple directly to the ear canals. The asymmetry is relatively stable over different times of wearing the headphones. The effect of correcting for the asymmetry ranges from imperceptible to substantial depending on the individual subject.
Convention Paper 8540 (Purchase now)
P18-5 Binaural Reproduction of Stereo Signals Using Upmixing and Diffuse Rendering—Christof Faller, Illusonic LLC - St-Sulpice, Switzerland; Jeroen Breebaart, ToneBoosters - Eindhoven, The Netherlands
In this paper benefits and challenges related to binaural rendering for conventional stereo content are explained in terms of width of the sound stage, timbral changes, the perceived distance, and the naturalness of phantom sources. To resolve some of the identified issues, a two-stage process consisting of a spatial decomposition followed by dedicated post processing methods is proposed. In the first stage, several direct sound source signals and additional ambience components are extracted from the stereo content. These signals are subsequently processed with dedicated algorithms to render virtual sound sources by means of HRTF or BRIR convolution and to render an additional diffuse sound field with the correct inter-aural coherence properties based on the extracted ambience signals. It is argued that this approach results in a wider sound stage, more natural and accurate spatial imaging of sound sources, and resolves the “here and now” versus the “there and then” duality for room acoustic simulation in binaural rendering methods.
Convention Paper 8541 (Purchase now)
P18-6 Sound Quality Evaluation Based on Attributes—Application to Binaural Contents—Sarah Le Bagousse, Orange Labs - Cesson Sévigné, France; Mathieu Paquier, Laboratoire d’Informatique des Systèmes Complexes - Brest, France; Catherine Colomes, Samuel Moulin, Orange Labs - Cesson Sévigné, France
Audio quality assessment is based on standards that mainly evaluate the overall quality, to the detriment of more precise sound criteria. On the other hand, a major problem of an assessment based on sound criteria is that their meaning and understanding have to be the same for each listener. A previous study clustered a list of sound attributes into three main categories called “timbre,” “space,” and “defaults.” The work presented here builds on those results and aims at tuning a subjective test methodology for spatial audio quality. The three families were therefore included in a test dedicated to the assessment of spatial audio quality with binaural content. The test was based on the MUSHRA method but used three anchors specific to each attribute and no explicit reference; the original version was added as the hidden reference. The aim of the listening test described in this paper was to verify the relevance of those three attributes and their influence on overall quality.
Convention Paper 8542 (Purchase now)
P19 - Recording and Reproduction
Saturday, October 22, 9:30 am — 11:00 am (Room: 1E Foyer)
P19-1 A Revised Approach to Teaching Audio Mixing Techniques: Applying the Deliberate Practice Model—John Holt Merchant, III, Middle Tennessee State University - Murfreesboro, TN, USA
This paper presents an overview of the Mixing Techniques course currently offered at Middle Tennessee State University, which was designed to help students develop substantive foundational knowledge and technological competencies in the aesthetic and technological aspects of audio mixing by applying the tenets of the Deliberate Practice model. Relevant studies in human performance, characteristics of Millennial students, and pedagogy for developing mental models of audio engineering systems are considered as they apply to recording arts course and curricular design. The results of this study suggest that implementing rigorous, formal practice of foundational skills in audio mixing courses significantly improves students’ capabilities.
Convention Paper 8543 (Purchase now)
P19-2 Sound Field Recording and Reproduction Using Transform Filter Designed in Spatio-Temporal Frequency Domain—Shoichi Koyama, Ken'ichi Furuya, Yusuke Hiwasaki, Yoichi Haneda, NTT Cyber Space Laboratories, NTT Corporation - Tokyo, Japan
A method of transforming the received signals of a microphone array into the driving signals of a loudspeaker array for sound field reproduction is investigated. Our objective is to obtain the driving signals of a planar or linear loudspeaker array solely from the sound pressure distribution acquired by a planar or linear microphone array. We derive a formulation of the transform from the received signals of the microphone array to the driving signals of the loudspeaker array. The transform is achieved by means of a filter in the spatio-temporal frequency domain. Results of measurement experiments in an anechoic room are presented to compare the proposed method with a method based on the conventional least mean square algorithm. The reproduction accuracies were found to be almost the same, but the filter size and amount of computation required for the proposed method were much smaller than those of the least mean square based method.
Convention Paper 8544 (Purchase now)
P19-3 Practical Digital Playback of Gramophone Records Using Flat-Bed Scanner Images—Baozhong Tian, West Virginia University Institute of Technology - Montgomery, WV, USA; Samuel Sambasivam, Azusa Pacific University - Azusa, CA, USA; John L. Barron, The University of Western Ontario - London, Ontario, Canada
We present an Optical Audio Reconstruction (OAR) system that plays back audio from gramophone records using image processing. OAR uses an off-the-shelf scanner to achieve an affordable and practical method of reconstructing audio; converting analog records to a digital format is important for preserving many historical recordings. The records are scanned with a high resolution, large format flat-bed scanner. The images are then segmented, and the grooves are tracked to simulate the movement of the stylus. The sound signal is converted from the groove track positions. Our OAR algorithm was able to reconstruct the audio successfully from scanned records, demonstrating that a fast OAR system producing good quality sound can be built economically.
Convention Paper 8545 (Purchase now)
P19-4 CAIRA—A Creative Artificially-Intuitive and Reasoning Agent as Conductor of Telematic Music Improvisations—Jonas Braasch, Doug Van Nort, Selmer Bringsjord, Pauline Oliveros, Anthony Parks, Colin Kuebler, Rensselaer Polytechnic Institute - Troy, NY, USA
This paper reports on the architecture and performance of the Creative Artificially-Intuitive and Reasoning Agent CAIRA as a conductor for improvised avant-garde music. CAIRA's listening skills are based on a music recognition system that simulates the human auditory periphery to perform Auditory Scene Analysis (ASA). Its simulation of cognitive processes includes a comprehensive cognitive calculus for logic-based reasoning and decision-making. In a specific application, CAIRA is used as a conductor for live music performances with distributed ensembles, in which the musicians are connected via the internet. CAIRA uses a visual score and directs the ensemble members based on tension-arc estimates for the individual performers.
Convention Paper 8546 (Purchase now)
P19-5 Discrimination between Phonograph Playback Systems—David M. Weigl, Jason Hockman, Catherine Guastavino, Ichiro Fujinaga, McGill University - Montreal, Quebec, Canada
Digitization of phonograph records is an important step toward the preservation of our cultural history and heritage. The phonograph playback systems (PPS) required for this digitization process are comprised of several components in a variety of price ranges. We report on the results of two listening tests intended to ascertain the extent to which expert listeners can discriminate between PPS of different price ranges. These results are intended to determine the extent to which component selection affects the discrimination between PPS and to provide a set of guidelines for the purchase of PPS components for the digitization of phonograph record collections.
Convention Paper 8547 (Purchase now)
P19-6 Vibration Analysis of Edge and Middle Exciters in Multiactuator Panels—Basilio Pueo, José Vicente Rico, University of Alicante - Alicante, Spain; José Javier López, Technical University of Valencia - Valencia, Spain
Multiactuator panels (MAPs) are an extension of distributed-mode loudspeaker technology for use in Wave Field Synthesis (WFS) applications. In this paper a special type of semiclamped boundary condition, which showed good results in recent convention papers by the authors, is used to prepare a MAP for use. For that purpose, the surface velocity at specific points was measured with a Laser Doppler Vibrometer, paying special attention to representative exciter locations: the central area, edges, and corners. To understand the sound-generating behavior of the panel, measurements were taken both for exciters accommodated in a roughly centered line, as in earlier MAP prototypes, and for an identical array of exciters positioned at the upper side of the panel, where the behavior was still acceptable for WFS purposes. Experiments were also conducted to analyze the role that the exciter coupling ring, used to physically attach the transducer to the panel, plays in the vibrational behavior and radiated sound.
Convention Paper 8548 (Purchase now)
P19-7 Challenges in 2.4 GHz Wireless Audio Streaming—Robin Hoel, Tomas Motos, Texas Instruments Inc. - Oslo, Norway
Based on the experiences accumulated during the development of a family of 2.4 GHz wireless audio streaming ICs, the paper presents challenges in providing high-quality, uninterrupted audio streaming in the 2.4 GHz ISM band. It discusses the impairments to expect in a dense indoor radio environment, gives an overview of the main interfering radio standards in the 2.4 GHz band that any such system must coexist with, and outlines methods and techniques that are essential in order to overcome these difficulties. To exemplify, some of the critical design choices that were made for the first device in the family are presented along with thoughts on possible future improvements.
Convention Paper 8549 (Purchase now)
P19-8 Ray-Traced Graphical User Interfaces for Audio Effect Plug-ins—Benjamin Doursout, ESIEA—École supérieure d’informatique, électronique, automatique - Laval, France; Jörn Loviscach, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany
On the computer, effects and software-based music synthesizers are often represented using graphical interfaces that mimic analog equipment almost photorealistically. These representations are, however, limited to a fixed perspective and do not include more advanced visual effects such as polished chrome. Leveraging the flexibility of the audio plug-in programming interface, we have created software that equips a broad class of synthesis and effect plug-ins with interactive, ray-traced 3-D replicas of their user interfaces. These 3-D models are built through an automated analysis of each plug-in's standard 2-D interface. Our experiments show that interactive frame rates can be achieved even with low-end graphics cards. The methods presented may also be used for automatic analysis of settings and for realistic interactive simulations in the design phase of hardware controls.
Convention Paper 8550 (Purchase now)
P20 - Audio Processing—Part 2
Saturday, October 22, 2:30 pm — 4:30 pm (Room: 1E09)
James (JJ) Johnston
P20-1 Digital Low-Pass Filter Design with Analog-Matched Magnitude Response—Michael Massberg, Brainworx Music & Media GmbH - Leichlingen, Germany
Using the bilinear transform to derive digital low-pass filters from their analog prototypes introduces severe warping of the response once the cutoff frequency approaches the Nyquist limit. We show how to design pre-warped first and second order low-pass prototypes that, after application of the bilinear transform, provide a better match with the magnitude response of the analog original than applying the bilinear transform directly. Result plots are given and compared for different cutoff frequencies and Q factors.
Convention Paper 8551 (Purchase now)
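The frequency warping that motivates this paper can be demonstrated numerically. Below is a minimal Python sketch (an illustration of the problem setting only, not the paper's matched prototype design) comparing a first-order analog low-pass with its bilinear-transformed digital counterpart, with and without standard cutoff pre-warping:

```python
import cmath
import math

def analog_mag(f, fc):
    # |H(j*2*pi*f)| of the first-order analog low-pass H(s) = 1/(1 + s/wc)
    return abs(1.0 / (1.0 + 1j * f / fc))

def bilinear_mag(f, fc, fs, prewarp=True):
    # Magnitude of the digital filter obtained from the same prototype
    # via the bilinear transform s = K*(z - 1)/(z + 1).
    wc = 2.0 * math.pi * fc
    if prewarp:
        K = wc / math.tan(wc / (2.0 * fs))  # pre-warp: exact match at fc
    else:
        K = 2.0 * fs                        # plain bilinear transform
    z = cmath.exp(1j * 2.0 * math.pi * f / fs)
    s = K * (z - 1.0) / (z + 1.0)
    return abs(1.0 / (1.0 + s / wc))
```

With pre-warping the digital filter matches the analog -3 dB point exactly at the cutoff, but the response elsewhere still deviates as the cutoff approaches Nyquist, which is the mismatch an analog-matched design aims to reduce.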
P20-2 Performance Evaluation of Algorithms for Arbitrary Sample Rate Conversion—Andreas Franck, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany
Arbitrary sample rate conversion (ASRC) enables changes of the sampling frequency by flexible, time-varying ratios. It can be utilized advantageously in many applications of audio signal processing. Consequently, numerous algorithms for ASRC have been proposed. However, it is often difficult to choose a minimal-cost algorithm that meets the requirements of a specific application. In this paper several approaches to ASRC are reviewed. Special emphasis is placed on algorithms that enable optimal designs, which minimize the resampling error with respect to a selectable norm. Evaluations are performed to assess the computational efficiency of different algorithms as a function of the achievable quality. These analyses demonstrate that structures based on oversampling and resampling filters specifically adapted to this structure yield significant performance improvements over established algorithms.
Convention Paper 8552 (Purchase now)
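As background, a common low-cost ASRC building block is polynomial interpolation between input samples at arbitrary fractional positions. The sketch below resamples a signal by an arbitrary ratio with a third-order Lagrange interpolator; it illustrates the problem setting only and is not one of the optimized structures evaluated in the paper:

```python
def asrc_lagrange(x, ratio):
    """Resample x by an arbitrary ratio f_out/f_in using third-order
    (4-point) Lagrange interpolation between input samples."""
    out = []
    step = 1.0 / ratio   # input-time increment per output sample
    t = 1.0              # start where a full 4-point neighborhood exists
    while t < len(x) - 2:
        n = int(t)       # integer part: base sample index
        d = t - n        # fractional part in [0, 1)
        # 4-point Lagrange weights for samples x[n-1] .. x[n+2]
        c_m1 = -d * (d - 1.0) * (d - 2.0) / 6.0
        c_0 = (d + 1.0) * (d - 1.0) * (d - 2.0) / 2.0
        c_1 = -(d + 1.0) * d * (d - 2.0) / 2.0
        c_2 = (d + 1.0) * d * (d - 1.0) / 6.0
        out.append(c_m1 * x[n - 1] + c_0 * x[n] + c_1 * x[n + 1] + c_2 * x[n + 2])
        t += step
    return out
```

Because the fractional position `d` varies per output sample, the same routine handles any fixed or time-varying conversion ratio; the resampling error of such interpolators is what the optimal designs in the paper minimize.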
P20-3 Acoustic Echo Cancellation for Surround Sound Using Spatial Decorrelation—Namgook Cho, Jaeyoun Cho, Jaewon Lee, Yongje Kim, Samsung Electronics Co., Ltd. - Korea
One of the main challenges for a stereophonic acoustic echo canceller is that it suffers from poor convergence, which is caused by strong correlation between input signals. We have proposed a new decorrelation technique that adopts spatial decorrelation to address the problem without altering the input signals. Here, we extend the results to a more generic setting, i.e., a 5.1-channel surround system. In the scheme, the input signals are decomposed and projected into the signal subspace and the noise subspace. When the decorrelated signals are fed to the adaptive filters, the interchannel coherence between the input signals decreases significantly, which provides performance improvement in echo reduction. Experiments in a real-world environment and performance comparison with state-of-the-art techniques are conducted to demonstrate the effectiveness of the proposed technique.
Convention Paper 8553 (Purchase now)
P20-4 A Fixed Beamforming Based Approach for Stereophonic Audio-Conference Systems—Matteo Pirro, Stefano Squartini, Francesco Piazza, Università Politecnica delle Marche - Ancona, Italy
Hands-free communication systems must first and foremost reduce the impact of the inevitable acoustic echo. In recent years, attention has also been devoted to algorithmic frameworks that provide stereophonic acoustic rendering and thus increase the pleasantness of the audio-conference experience. In this paper the authors propose an optimally designed fixed Beamformer (BF) based solution for Stereophonic Acoustic Echo Cancellation (SAEC), with a twofold objective: reducing the echo power and maximizing the stereophonic spatial impression. To the authors' knowledge this approach is new to the literature, and the experimental results confirm its effectiveness. Preliminary subjective listening tests have been carried out to evaluate the attainable stereo audio quality. Moreover, the proposed solution significantly reduces the overall computational cost of the SAEC framework: the BF implementation requires only a few extra filtering operations with respect to the baseline approach, while making a decorrelation module unnecessary.
Convention Paper 8554 (Purchase now)
P21 - Perception
Saturday, October 22, 2:30 pm — 4:00 pm (Room: 1E Foyer)
P21-1 Influence of Different Test Room Environments on IACC as an Objective Measure of Spatial Impression or Spaciousness—Marco Conceição, Trinity College Dublin - Dublin, Ireland, Escola Superior de Musica, Artes e Espectáculo–IPP, Porto, Portugal; Dermot Furlong, Trinity College Dublin - Dublin, Ireland
To investigate the perceptual impression of spaciousness, a physical measure that relates to the listener's experience of spaciousness was used. A variable setup was introduced that made it possible to control spaciousness in different rooms. IACC measurements were made using a frontal loudspeaker for the direct sound and a second loudspeaker, positioned in the horizontal plane, for an angled single early reflection. Measurements were performed under controlled conditions in which the Interaural Cross Correlation was captured with a dummy head for variable sound fields. It was possible to conclude that IACC results show a similar trend when the experiments are repeated in different rooms; that is, the acoustic details of the measurement room are not crucial to the observed trends in IACC variation.
Convention Paper 8555 (Purchase now)
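For reference, IACC is conventionally computed as the peak of the normalized interaural cross-correlation within about +/-1 ms of lag between the two binaural impulse responses. A minimal sketch, assuming non-silent, equal-length ear signals (not the authors' measurement chain):

```python
import math

def iacc(left, right, fs, max_lag_ms=1.0):
    """Interaural cross-correlation coefficient: peak magnitude of the
    normalized cross-correlation of two binaural impulse responses
    within +/- max_lag_ms of lag. Assumes non-silent signals."""
    max_lag = int(fs * max_lag_ms / 1000.0)
    norm = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        s = sum(left[n] * right[n + lag]
                for n in range(len(left))
                if 0 <= n + lag < len(right))
        best = max(best, abs(s) / norm)
    return best
```

An IACC near 1 indicates highly correlated ear signals (low spatial impression); lower values correspond to greater perceived spaciousness.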
P21-2 The Relationship between Interchannel Time and Level Differences in Vertical Sound Localization and Masking—Hyunkook Lee, University of Huddersfield - Huddersfield, UK
Listening experiments were conducted with a pair of vertically arranged loudspeakers. A group of subjects measured the level of a delayed height-channel signal at which any subjective effect of the signal became completely inaudible (masked threshold), as well as that at which the perceived sound image was localized fully at the lower loudspeaker (localized threshold), at nine different delay times ranging from 0 to 50 ms. The sound sources were anechoic recordings of bongo and cello performance excerpts. At delay times up to 5 ms, source type did not have a significant effect on either threshold, and neither threshold varied significantly as the delay time increased. In this time range the average level reduction required for a full image shift was 6–7 dB, while that for masking was 9–10 dB. At higher delay times, on the other hand, both thresholds decreased as the delay time increased, and the difference between the two sources was significant for both thresholds. Furthermore, the relationship between the two thresholds varied depending on the source type.
Convention Paper 8556 (Purchase now)
P21-3 Observations on Human Sound Source Localization in the Mid-Sagittal Plane and in Anechoic Space—Daniela Toledo, COWI AS - Oslo, Norway
A group of 5 subjects who showed consistently biased sound source localization in the mid-sagittal plane with real sound sources under anechoic conditions is presented. Three of these subjects were also tested with virtual sound sources synthesized from their own individual head-related transfer functions. Localization under both conditions showed similar trends, even though the conditions could not be considered equivalent. This suggests that binaural technology came close to emulating the aural experience that the subjects had with real sound sources. These cases are presented to discuss issues inherent to binaural synthesis, such as the way in which the technology is validated and the assumptions that serve as its basis.
Convention Paper 8557 (Purchase now)
P21-4 Reducing the Cost of Audiovisual Content Classification by Experiment—Ulrich Reiter, Norwegian University of Science and Technology – NTNU - Trondheim, Norway
A set of subjective attributes of audiovisual media content, originally suggested by Woszczyk et al. in a 1995 AES Convention paper was examined for suitability for AV content classification tasks. Subjective experiments indicate that the 4x4 matrix of dimensions and attributes as suggested in the original paper can be reduced to a more compact 3x2 design for classification purposes without loss of information about the perceptual properties of the content. This can significantly reduce the cost of content classification by experiment.
Convention Paper 8558 (Purchase now)
P21-5 Quality and Performance Assessment of Wave Field Synthesis Reproducing Moving Sound Sources—Michele Gasparini, Paolo Peretti, Laura Romoli, Stefania Cecchi, Francesco Piazza, Università Politecnica delle Marche - Ancona, Italy
Reproduction of moving sound sources by Wave Field Synthesis (WFS) raises specific problems that make the static-source approach ineffective. An algorithm for reducing artifacts and naturally representing the movement effect, based on the synthesis of two virtual sources moving together, was proposed in previous work by the same authors. In this paper a practical implementation of the algorithm is presented and evaluated in terms of workload and subjective sound source localization. The influence of some processing parameters on the computational cost has been studied. The sound quality with regard to true spatial reproduction is assessed through listening tests, and a comparison with previous approaches is reported.
Convention Paper 8559 (Purchase now)
P22 - Listening Tests
Sunday, October 23, 9:00 am — 12:30 pm (Room: 1E09)
Duncan Williams, University of Oxford - Oxford, UK
P22-1 Comparison of Subjective Assessments Obtained from Listening Tests through Headphones and Loudspeaker Setups—Vincent Koehl, Mathieu Paquier, University of Brest (UEB) - Plouzané, France; Simeon Delikaris-Manias, National Engineering School of Brest (UEB) - Plouzané, France
Sound reproduction over headphones is, because of its convenience, used indiscriminately to reproduce and assess a large variety of audio content. Nevertheless, it has not been proven that differences between sound sequences are perceived equally when played back through headphones and through dedicated loudspeaker systems. This study evaluates whether differences and preferences between excerpts are perceived equally with these two reproduction methods. Various types of audio content, produced by two different recording systems, were compared on both headphone and loudspeaker setups. The results indicate that the two reproduction methods provided consistent similarity and preference judgments. This suggests that the features involved in similarity and preference assessments were preserved when reproducing these excerpts over headphones.
Convention Paper 8560 (Purchase now)
P22-2 A Subjective Validation Method for Musical Instrument Emulation—Leonardo Gabrielli, Stefano Squartini, Università Politecnica delle Marche - Ancona, Italy; Vesa Välimäki, Aalto University - Espoo, Finland
This paper deals with the problem of assessing the distinguishability between the sound generated by an acoustical or electric instrument and an algorithm designed to emulate its behavior. To accomplish this, several previous works employed subjective listening tests. These are briefly reviewed in the paper. Different metrics to evaluate test results are discussed as well. Results are reported for listening tests performed on the sound of the Clavinet and a computational model aimed at its emulation. After discussing these results a guideline for subjective listening tests in the field of sound synthesis is proposed to the research community for further discussion and improvement.
Convention Paper 8561 (Purchase now)
P22-3 Exploratory Studies on Perceptual Stationarity in Listening Tests—Part I: Real World Signals from Custom Listening Tests—Max Neuendorf, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Frederik Nagel, International Audio Laboratories - Erlangen, Germany, Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany
Many recent publications related to audio coding use the recommendation "MUltiple Stimuli with Hidden Reference and Anchor" (MUSHRA; ITU-R BS.1534-1) for the evaluation of subjective audio quality. Judging the quality of multiple conditions can be inconclusive if the employed test excerpts exhibit more than one prevalent artifact. Two papers investigate the impact of time-varying artifacts in both synthetic and real-world signals and claim "perceptual stationarity" as a requirement for test sequences used in MUSHRA tests. This first part deals with commonly used test signals. These often have a length of 10 to 20 seconds and frequently contain time-varying perceptual artifacts. Ratings of those items are compared to ratings of cutouts that are predominantly perceptually stationary over time.
Convention Paper 8562 (Purchase now)
P22-4 Exploratory Studies on Perceptual Stationarity in Listening Tests—Part II: Synthetic Signals with Time Varying Artifacts—Frederik Nagel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany, International Audio Laboratories Erlangen, Erlangen, Germany; Max Neuendorf, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Many recent publications related to audio coding use the recommendation "MUltiple Stimuli with Hidden Reference and Anchor" (MUSHRA; ITU-R BS.1534-1) for the evaluation of subjective audio quality. Judging the quality of multiple conditions can be inconclusive if the employed test excerpts exhibit more than one prevalent artifact. Two papers investigate the impact of time-varying artifacts in both synthetic and real-world signals and claim "perceptual stationarity" as a requirement for test sequences used in MUSHRA tests. This second part focuses on the alternation of multiple types of artifacts within one item and the differences in ratings compared to items that contain only one of the respective types. It furthermore discusses the significance of the temporal position of artifacts within an item.
Convention Paper 8563 (Purchase now)
P22-5 The Practical Effects of Lateral Energy in Critical Listening Environments—Richard King, Brett Leonard, McGill University - Montreal, Quebec, Canada, Centre for Interdisciplinary Research in Music, Media and Technology, Montreal, Quebec, Canada; Grzegorz Sikora, McGill University - Montreal, Quebec, Canada
Limited information exists on the practical effects of lateral reflections in small rooms designed for high-quality sound reproduction and critical listening. A study is undertaken to determine what effect specular and diffuse lateral reflections have on a trained listener. A task-based methodology is employed in which a highly trained subject is asked to perform a task commonly seen in their daily work. The physical conditions of the listening environment are altered to minimize, maximize, and diffuse side-wall reflections. Results correlate the presence of strong lateral energy with an initial reduction in subjects' ability to complete the task within normal tolerances, but adaptation soon occurs, restoring the subjects to a practically normal pace and accuracy.
Convention Paper 8565 (Purchase now)
P22-6 The Effects of Monitoring Systems on Balance Preference: A Comparative Study of Mixing on Headphones versus Loudspeakers—Richard King, Brett Leonard, McGill University - Montreal, Quebec, Canada, Centre for Interdisciplinary Research in Music, Media and Technology, Montreal, Quebec, Canada; Grzegorz Sikora, McGill University - Montreal, Quebec, Canada
The typical work-flow of the modern recording engineer often necessitates the use of a number of different monitoring systems over the course of a single project, including both loudspeaker-based systems and headphones. Anecdotal evidence exists that suggests different outcomes when using headphones, but there is little quantified, perceptually-based data to guide engineers in the differences to expect when working between monitoring systems. By conducting controlled, in situ measurements with recording engineers performing mixing tasks on both headphones and loudspeakers, the practical effects of both monitoring systems are shown.
Convention Paper 8566 (Purchase now)
P22-7 The Effect of Head Movement on Perceived Listener Envelopment and Apparent Source Width—Anthony Parks, Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA
This study investigates the effect of head movement in the evaluation of LEV and ASW under 15 different concert hall conditions simulated over eight loudspeakers using Virtual Microphone Control. The conditions consist of varying ratios of front-to-back energy and varying levels of cross-correlated reverberant energy. Head movements are monitored in terms of angular rotation using a head tracker while listeners are prompted to assign subjective ratings of LEV and ASW. The tests are repeated while listeners are asked to keep their heads fixed. Head movements are analyzed and results of the tests are compared.
Convention Paper 8567 (Purchase now)
P23 - Spatial Audio Processing—Part 1
Sunday, October 23, 9:30 am — 11:00 am (Room: 1E Foyer)
P23-1 System Theory of Binaural Synthesis—Florian Völk, AG Technische Akustik, MMK, Technische Universität München - Munich, Germany
Binaural synthesis is widely used as an efficient tool for the simulation of acoustical environments. Different headphones together with artificial as well as human heads are employed for the transfer function measurements involved, with considerable influence on the synthesis quality. Within this paper a detailed system-theoretical analysis of the signal paths and systems involved in a typical data-based binaural synthesis scenario is given. The components to be equalized are identified, and equalization methods for every scenario are discussed. Further, restrictions and requirements for the use of artificial or human recording heads and for headphone selection are given. The most important results are the necessity of blocked-auditory-canal measurements and of proper headphone selection for completely correct individual binaural synthesis.
Convention Paper 8568 (Purchase now)
P23-2 A Quantization Method Based on Binaural Perceptual Characteristic for Interaural Level Difference—Heng Wang, Ruimin Hu, Weiping Tu, Xiaochen Wang, Wuhan University - Wuhan, Hubei, China
In this paper we study the mechanism of perceptual redundancy in spatial parameters and extend redundancy removal from the energy domain to the parameter domain. We establish a binaural perceptual model based on the frequency dependence of the Interaural Level Difference (ILD) and use this model to direct the quantization of ILD. This addresses the difficulty of removing perceptual redundancy from spatial parameters. The new quantization strategy quantizes only the perceptible variation of ILD, reducing the coding bit rate. Experimental results showed that this method can reduce the parametric bit rate by about 15% compared with parametric stereo, while maintaining subjective sound quality.
Convention Paper 8569 (Purchase now)
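As context, the ILD of a subband is the energy ratio between the ear signals expressed in dB, and quantizing it more coarsely where the auditory system is less sensitive is what saves bits. A minimal sketch with a hypothetical uniform quantizer; the paper's perceptual model for choosing the step size per band is not reproduced here:

```python
import math

def ild_db(left, right):
    """Interaural Level Difference of one subband in dB (energy ratio
    between the ear signals; assumes neither band is silent)."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10(e_left / e_right)

def quantize_ild(ild, step_db):
    """Uniform mid-tread quantizer. A perceptual model would pick step_db
    per frequency band, coarser where the ILD just-noticeable difference
    is larger, so that fewer quantizer levels (and bits) are needed."""
    return step_db * round(ild / step_db)
```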
P23-3 Estimation of Head-Related Impulse Responses from Impulse Responses Acquired for Multiple Source Positions in Ordinary Sound Field—Shouichi Takane, Koji Abe, Kanji Watanabe, Sojun Sato, Akita Prefectural University - Akita, Japan
In this paper a new method is proposed for estimating Head-Related Impulse Responses (HRIRs) from impulse responses acquired in an ordinary sound field. Estimation of a single HRIR from the impulse response acquired in the same direction in an ordinary sound field was previously proposed, but its performance was shown to be insufficient in some directions [S. Takane, 127th AES Convention, Paper No. 7885 (2009)]. To improve the estimation accuracy, the proposed method uses impulse responses acquired at multiple source positions and assumes that the Auto-Regressive (AR) coefficients of the HRIRs are common to all source positions. Example estimations of the HATS's HRIRs showed that the estimation accuracy was significantly improved compared with our previously proposed method.
Convention Paper 8570 (Purchase now)
P23-4 Toward the Creation of a Standardized HRTF Repository—Areti Andreopoulou, Agnieszka Roginska, New York University - New York, NY, USA
Much current work in spatial audio is directed toward an efficient method for individualizing Head-Related Transfer Functions. A major limitation in this area of research is the lack of a large and uniform database that incorporates as many individualized properties as possible. This paper presents the MARL-NYU file format for storing HRTF datasets and investigates the normalization steps necessary to assure a uniform and standardized HRTF repository, compiling selected datasets from four HRTF databases.
Convention Paper 8571 (Purchase now)
P23-5 On the Synthetic Binaural Signals of Moving Sources—Nara Hahn, Doo-Young Sung, Koeng-Mo Sung, Seoul National University - Seoul, Korea
Binaural signals of moving sources are synthesized using head-related impulse responses. The ear signals are synthesized such that the physical properties are correctly preserved. The source signal at each time instant is filtered by the instantaneous head-related impulse response, and this wavelet is superimposed at the external ear. A number of properties of the synthetic binaural signals are investigated. The spectral shift and head-shadowing effect are analyzed in the time domain and in the time-frequency domain. Interpolation/extrapolation methods are employed to compute unmeasured head-related impulse responses, and the artifacts caused by these processes are briefly reviewed.
Convention Paper 8572 (Purchase now)
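The synthesis procedure described, filtering each source sample with its instantaneous HRIR and superimposing the resulting wavelet at the ear, can be sketched as a time-varying convolution. In this minimal illustration, `hrir_at` is a hypothetical callback supplying the HRIR for each sample index (in practice obtained by interpolating measured HRIRs along the source trajectory):

```python
def synthesize_moving(source, hrir_at):
    """Time-varying convolution for one ear: each source sample is filtered
    with the instantaneous HRIR for its emission time and the resulting
    wavelet is superimposed on the output. hrir_at(n) is a hypothetical
    callback returning the (fixed-length) HRIR for sample index n."""
    hrir_len = len(hrir_at(0))
    out = [0.0] * (len(source) + hrir_len - 1)
    for n, x in enumerate(source):
        h = hrir_at(n)                 # instantaneous HRIR at time n
        for k, hk in enumerate(h):     # superimpose the filtered wavelet
            out[n + k] += x * hk
    return out
```

When `hrir_at` returns the same HRIR for every index, this reduces to ordinary convolution; a time-varying trajectory produces the spectral shift and head-shadowing changes analyzed in the paper.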
P23-6 The Role of Head Related Transfer Functions' Spectral Features in Sound Source Localization in the Mid-Sagittal Plane—Daniela Toledo, COWI AS - Oslo, Norway
The individual nature of HRTFs is responsible for localization errors when non-individual HRTFs are used in binaural synthesis: localization performance is degraded if the spectral characteristics of the directional filters used do not match the individual characteristics of the listener's HRTFs. How similar the HRTFs must be to avoid degraded performance is still unknown. This investigation focuses on identifying and parameterizing spectral characteristics of HRTFs that are relevant as localization cues in the mid-sagittal plane. Results suggest that parameters computed from three spectral features of simplified versions of HRTFs help explain sound source localization in that plane. Those parameters could be used to individualize non-individual HRTFs.
Convention Paper 8573 (Purchase now)
P24 - Spatial Audio Processing—Part 2
Sunday, October 23, 2:30 pm — 4:00 pm (Room: 1E Foyer)
P24-1 A Selection Model Based on Perceptual and Statistics Characteristic for Interaural Level Difference—Heng Wang, Ruimin Hu, Weiping Tu, Ge Gao, Wuhan University - Wuhan, Hubei, China
In present mobile communication systems, low bit-rate audio signals are expected to be delivered with high quality. This paper studies the mechanism of perceptual and statistical redundancy in binaural cues and establishes a selection model based on the joint perceptual and statistical characteristics of ILD. According to the selection model, ILD values are not quantized in frequency bands that cannot easily be perceived by human ears. Experimental results showed that this method can reduce the parametric bit rate by about 15% compared with parametric stereo, while maintaining subjective sound quality.
Convention Paper 8574 (Purchase now)
P24-2 Perceptual Evaluation of a Spatial Audio Algorithm Based on Wave Field Synthesis Using a Reduced Number of Loudspeakers—Frank Melchior, Udo Heusinger, IOSONO GmbH - Erfurt, Germany; Judith Liebetrau, Technical University Ilmenau - Ilmenau, Germany
With 3-D pictures being the driving force of today's motion picture production, there is a growing need for adequate audio solutions, e.g., spatial audio algorithms for reproduction with flexible loudspeaker setups. While these reproduction systems must fulfill high quality demands, the number of loudspeakers needed should be kept as low as possible for commercial reasons. One suitable algorithm from a quality point of view is wave field synthesis (WFS), which, however, requires a large number of loudspeakers if implemented as described in the literature. This paper presents the results of a perceptual evaluation of a new algorithm based on WFS. A listening experiment compared state-of-the-art WFS, the new algorithm, and Vector Base Amplitude Panning with regard to perceived localization and coloration.
Convention Paper 8575 (Purchase now)
P24-3 Design and Evaluation of an Interactive Simulated Reverberant Environment—Alan C. Johnson, Kevin Salchert, Andreas Sprotte-Hansen, New York University - New York, NY, USA
There are many existing approaches to the challenge of simulating a reverberant field. Most of these methods are designed to operate on a signal that has been recorded in a relatively anechoic environment and seek to add in the simulated reverberation of a chosen space. This paper describes a low-cost, scalable approach for directly converting an acoustically dry space into a reverberant space of a larger size, with a number of configurable parameters. This is accomplished by harnessing the mutual feedback among microphones and loudspeakers arranged in the space. The result is a simple, tunable, and interactive system for creating a convincing reverberant environment. Several novel applications of such a system are also discussed.
Convention Paper 8576 (Purchase now)
P24-4 Multi-Touch Room Expansion Controller for Real-Time Acoustic Gestures—Andrew Madden, Pia Blumental, Areti Andreopoulou, Braxton Boren, Shengfeng Hu, Zhengshan Shi, Agnieszka Roginska, New York University - New York, NY, USA
This paper describes an application that provides real-time high accuracy room acoustics simulation. Using a multi-touch interface, the user can easily manipulate the dimensions of a virtual space while hearing the room’s acoustics change in real-time. Such an interface enables a more fluid and intuitive control of our application, which better lends itself to expressive artistic gestures for use in such activities as sound design, performance, and education. The system relies on high accuracy room impulse responses from CATT-Acoustic and real-time audio processing through Max/MSP and provides holistic control of a spatial environment rather than applying generic reverberation via individual acoustic parameters (i.e., early reflections, RT60, etc.). Such an interface has the capability to create a more realistic effect without compromising flexibility of use.
Convention Paper 8577
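The authors implement the processing in Max/MSP; as an illustration of the underlying operation only, here is a Python sketch of convolving a dry signal with a measured impulse response, plus a hypothetical linear crossfade between two IRs as a stand-in for moving between CATT-Acoustic responses as the room is resized (the paper's actual interpolation scheme is not described in the abstract):

```python
def convolve(dry, ir):
    """Apply a room impulse response to a dry signal (direct convolution)."""
    out = [0.0] * (len(dry) + len(ir) - 1)
    for i, s in enumerate(dry):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

def crossfade_irs(ir_a, ir_b, t):
    """Linear blend of two impulse responses, t in [0, 1].

    A hypothetical stand-in for interpolating between responses
    precomputed for two different virtual room sizes.
    """
    n = max(len(ir_a), len(ir_b))
    pad = lambda ir: list(ir) + [0.0] * (n - len(ir))
    return [(1.0 - t) * a + t * b for a, b in zip(pad(ir_a), pad(ir_b))]
```

A real-time system would use partitioned FFT convolution rather than this O(N·M) direct form, but the input/output relationship is the same.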
P24-5 Adaptive Crosstalk Cancellation Using Common Acoustical Pole and Zero (CAPZ) Model—Common Pole Estimation—Hanwook Chung, Seoul National University - Seoul, Korea; Sang Bae Chon, Samsung Electronics Co. Ltd. - Suwon, Gyeonggido, Korea; Nara Hahn, Koeng-Mo Sung, Seoul National University - Seoul, Korea
In this paper we introduce adaptive crosstalk cancellation using a common acoustical pole and zero (CAPZ) model of the head-related transfer function (HRTF). The CAPZ model interprets the HRTF as a set of zeros that describe the spatial differences caused by the acoustical propagation path and a set of common poles that describe the characteristics of the human auditory system, so we designed the proposed canceller to adapt only the zero components of the model. Through common-pole estimation and simulations, we verified that the proposed model exhibits enhanced performance compared with a conventional finite impulse response model of the HRTF.
Convention Paper 8578
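The CAPZ structure can be sketched as a family of IIR filters that all share one denominator (the common poles) while only the numerator (the direction-dependent zeros) varies. All coefficients below are made up for illustration, not taken from the paper:

```python
def iir_filter(b, a, x):
    """Direct-form IIR filter: numerator b models zeros, denominator a poles."""
    y = []
    for n in range(len(x)):
        acc = sum(bk * x[n - k] for k, bk in enumerate(b) if n - k >= 0)
        acc -= sum(ak * y[n - k] for k, ak in enumerate(a)
                   if k >= 1 and n - k >= 0)
        y.append(acc / a[0])
    return y

# Hypothetical coefficients: one shared denominator, per-direction numerators.
a_common = [1.0, -0.6]    # common poles: listener-dependent, direction-independent
b_left = [1.0, 0.3]       # zeros for one propagation path
b_right = [0.8, -0.2]     # zeros for another
impulse = [1.0] + [0.0] * 7
hrir_left = iir_filter(b_left, a_common, impulse)
hrir_right = iir_filter(b_right, a_common, impulse)
```

Because `a_common` is fixed, an adaptive canceller built on this structure only has to track the (shorter) zero polynomials, which is the economy the CAPZ model offers over a full FIR representation.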
P25 - Auditory Perception
Sunday, October 23, 3:00 pm — 5:30 pm (Room: 1E09)
Richard King, McGill University - Montreal, Quebec, Canada, Centre for Interdisciplinary Research in Music, Media and Technology, Montreal, Quebec, Canada
P25-1 The Impact of Producers’ Comments and Musicians’ Self-Evaluation on Performance during Recording Sessions—Amandine Pras, Catherine Guastavino, McGill University - Montreal, Quebec, Canada
When recording in the studio, musicians repeat the same musical composition over and over again without the presence of an audience. Furthermore, recording technologies transform the musical performance that musicians hear in the studio. We conducted a field experiment to investigate whether record producers’ comments and musicians’ self-evaluation helped musicians improve from one take to another during recording sessions. Twenty-five jazz players, grouped into five ensembles, participated in recording sessions with four record producers. Two types of feedback between takes were varied independently: with or without comments from a record producer and with or without musicians’ self-evaluation after listening to the takes in the control room. Our results show that both external comments and self-evaluation give the ensemble a common ground but can also make musicians overly self-conscious.
Convention Paper 8579
P25-2 The Influence of Camera Focal Length in the Direct-to-Reverb Ratio Suitability and Its Effect in the Perception of Distance for a Motion Picture—Luiz Fernando Kruszielski, Toru Kamekawa, Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan
To study the possible influence of camera focal length on auditory distance perception, two experiments were conducted. In the first, participants adjusted the amount of reverb to match presented visual images that were shot with different focal lengths and therefore showed different backgrounds. In the second, participants rated the egocentric sense of distance to the sound source and the suitability of the sound for the visual image in a pairwise comparison. The results show that the overall sense of distance depends mainly on the focal length; however, when the foreground object keeps the same size in the image, the focal length can alter the perception of sound distance.
Convention Paper 8580
P25-3 Automatic Soundscape Classification via Comparative Psychometrics and Machine Learning—Krithika Rajagopal, University of Miami - Coral Gables, FL, USA, Audio Precision, Beaverton, OR, USA; Phil Minnick, Colby Leider, University of Miami - Coral Gables, FL, USA
Computational acoustical ecology is a relatively new field in which long-term environmental recordings are mined for meaningful data. Humans quite naturally and automatically associate environmental sounds with emotions and can easily identify the components of a soundscape. It is far more difficult, however, to equip a computer to accurately and automatically rate unknown environmental recordings along subjective psychoacoustic dimensions, let alone to report with high accuracy the environment (e.g., beach, barnyard, home kitchen, research lab) in which the recordings were made. We present a robust algorithm for automatic soundscape classification in which both psychometric data and computed audio features are compared and used to train a Naive Bayesian classifier, yielding an algorithm that classifies soundscapes across different categories. In a pilot test, automatic classification accuracy of 88% was achieved on 20 soundscapes, and the classifier outperformed human ratings in some tests; in a second test, classification accuracy of 95% was achieved on 30 soundscapes.
Convention Paper 8581
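The paper's feature set and training details are not given in the abstract; as a sketch of the classifier family it names, here is a minimal Gaussian naive Bayes in plain Python, with two invented features (a spectral-centroid-like value in kHz and a level in dB) and toy "beach"/"kitchen" labels:

```python
import math

def fit_gaussian_nb(X, y):
    """Per-class priors, feature means, and variances from labeled examples."""
    model = {}
    for c in set(y):
        rows = [x for x, lab in zip(X, y) if lab == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        vars_ = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                 for col, m in zip(cols, means)]  # small floor avoids zero variance
        model[c] = (len(rows) / len(y), means, vars_)
    return model

def classify(model, x):
    """Return the class with the highest Gaussian log-posterior."""
    def log_post(c):
        prior, means, vars_ = model[c]
        lp = math.log(prior)
        for v, m, s2 in zip(x, means, vars_):
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return lp
    return max(model, key=log_post)

# Invented training data: (centroid kHz, level dB) per soundscape recording.
X = [(2.5, 70.0), (2.7, 72.0), (0.8, 55.0), (0.9, 58.0)]
y = ["beach", "beach", "kitchen", "kitchen"]
model = fit_gaussian_nb(X, y)
```

The reported system additionally folds comparative psychometric ratings into the feature vector; the independence assumption behind naive Bayes is what lets heterogeneous features of that kind be combined with a simple per-feature likelihood product.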
P25-4 Effect of Whole-Body Vibration on Speech. Part II: Effect on Intelligibility—Durand Begault, NASA Ames Research Center - Moffett Field, CA, USA
Speech intelligibility was measured for talkers who read Diagnostic Rhyme Test material while exposed to 0.7 g whole-body vibration simulating a space vehicle launch. Across all talkers, vibration degraded the percentage of correctly transcribed words from 83% to 74%. The magnitude of the effect on speech communication varies between individuals, for both talkers and listeners. A “worst case” scenario for intelligibility pairs the most “sensitive” listener with the most “sensitive” talker; one subject’s intelligibility was reduced by 26 percentage points (from 97% to 71%) for one of the talkers.
Convention Paper 8582
P25-5 Some New Evidence that Teenagers May Prefer Accurate Sound Reproduction—Sean Olive, Harman International Industries, Inc. - Northridge, CA, USA
A group of 18 high school students with no prior listening experience participated in two separate controlled listening tests that measured their preferences between music reproduced in (1) MP3 and lossless CD-quality file formats, and (2) music reproduced through four different consumer loudspeakers. Overall, the teenagers preferred the sound quality of the CD-quality file format, and the most accurate, neutral loudspeaker. Together, these tests provide some new evidence that teenagers can discern and appreciate a better quality of reproduced sound when given the opportunity to directly compare it against lower quality options.
Convention Paper 8583