AES New York 2013
Paper Session Details

P1 - Transducers—Part 1: Microphones


Thursday, October 17, 9:00 am — 11:00 am (Room 1E07)

Chair:
Helmut Wittek, SCHOEPS GmbH - Karlsruhe, Germany

P1-1 Portable Spherical Microphone for Super Hi-Vision 22.2 Multichannel Audio
Kazuho Ono, NHK Engineering System, Inc. - Setagaya-ku, Tokyo, Japan; Toshiyuki Nishiguchi, NHK Science & Technology Research Laboratories - Setagaya, Tokyo, Japan; Kentaro Matsui, NHK Science & Technology Research Laboratories - Setagaya, Tokyo, Japan; Kimio Hamasaki, NHK Science & Technology Research Laboratories - Setagaya, Tokyo, Japan
NHK has been developing a portable microphone for the simultaneous recording of 22.2ch multichannel audio. The microphone is 45 cm in diameter and has acoustic baffles that partition the sphere into angular segments, in each of which an omnidirectional microphone element is mounted. Owing to the effect of the baffles, each segment exhibits narrow-angle directivity with a constant beam width at frequencies above 6 kHz. The directivity becomes wider as frequency decreases and is almost omnidirectional below 500 Hz. The authors also developed a signal processing method that improves the directivity below 800 Hz.
Convention Paper 8922

P1-2 Sound Field Visualization Using Optical Wave Microphone Coupled with Computerized Tomography
Toshiyuki Nakamiya, Tokai University - Kumamoto, Japan; Fumiaki Mitsugi, Kumamoto University - Kumamoto, Japan; Yoichiro Iwasaki, Tokai University - Kumamoto, Japan; Tomoaki Ikegami, Kumamoto University - Kumamoto, Japan; Ryoichi Tsuda, Tokai University - Kumamoto, Japan; Yoshito Sonoda, Tokai University - Kumamoto, Kumamoto, Japan
The novel method, which we call the “Optical Wave Microphone (OWM)” technique, is based on a Fraunhofer diffraction effect between a sound wave and a laser beam. The light diffraction technique is an effective method for detecting sound and is flexible for practical uses as it involves only a simple optical lens system. OWM is also very useful for detecting sound waves without disturbing the sound field. This new method enables high-accuracy measurement of slight changes in atmospheric density. Moreover, OWM can be used for sound field visualization by computerized tomography (CT) because the ultra-small modulation by the sound field is integrated along the laser beam path.
Convention Paper 8923

P1-3 Proposal of Optical Wave Microphone and Physical Mechanism of Sound Detection
Yoshito Sonoda, Tokai University - Kumamoto, Kumamoto, Japan; Toshiyuki Nakamiya, Tokai University - Kumamoto, Japan
An optical wave microphone with no diaphragm, which uses wave optics and a laser beam to detect sounds, can measure sounds without disturbing the sound field. The theoretical equation for this measurement can be derived from the optical diffraction integration equation coupled to the optical phase modulation theory, but the physical interpretation or meaning of this phenomenon is not clear from the mathematical calculation process alone. In this paper the physical meaning in relation to wave-optical processes is considered. Furthermore, the spatial sampling theorem is applied to the interaction between a laser beam with a small radius and a sound wave with a long wavelength, showing that the wavenumber resolution is lost in this case, and the spatial position of the maximum intensity peak of the optical diffraction pattern generated by a sound wave is independent of the sound frequency. This property can be used to detect complex tones composed of different frequencies with a single photo-detector. Finally, the method is compared with the conventional Raman-Nath diffraction phenomena relating to ultrasonic waves.
AES 135th Convention Best Peer-Reviewed Paper Award Cowinner
Convention Paper 8924

P1-4 Numerical Simulation of Microphone Wind Noise, Part 2: Internal Flow
Juha Backman, Nokia Corporation - Espoo, Finland
This paper discusses the use of computational fluid dynamics (CFD) for the analysis of microphone wind noise. The previous part of this work showed that an external flow produces a pressure difference on the external boundary, and this pressure causes flow in the microphone internal structures, mainly between the protective grid and the diaphragm. The examples presented in this work describe the effect of microphone grille structure and microphone diaphragm properties on the wind noise sensitivity related to the behavior of this kind of internal flow.
Convention Paper 8925


P2 - Signal Processing—Part 1


Thursday, October 17, 9:00 am — 12:00 pm (Room 1E09)

Chair:
Jaeyong Cho, Samsung Electronics DMC R&D Center - Suwon, Korea

P2-1 Linear Phase Implementation in Loudspeaker Systems: Measurements, Processing Methods, and Application Benefits
Rémi Vaucher, NEXO - Plailly, France
The aim of this paper is to present a new generation of EQ. It provides a way to ensure phase compatibility from 20 Hz to 20 kHz over a range of different speaker cabinets. This method is based on a mix of FIR filters and IIR filters. The use of FIR filters allows the phase to be tuned independently of magnitude and allows an acoustic linear phase above 500 Hz. All targets used to compute the FIR coefficients are based upon extensive measurements and subjective listening tests. A template has been set to normalize the crossover frequencies in the low range, enabling compatibility of every sub-bass with the main cabinets.
Convention Paper 8926

P2-2 Applications of Inverse Filtering to the Optimization of Professional Loudspeaker Systems
Daniele Ponteggia, Studio Ponteggia - Terni (TR), Italy; Mario Di Cola, Audio Labs Systems - Casoli (CH), Italy
Applying FIR filter technology to implement inverse filtering in professional loudspeaker systems is now easier and more affordable, thanks to recent developments in DSP technology and the existence of a new DSP platform dedicated to the end user. This paper presents an analysis, based on real-world examples, of a possible methodology for synthesizing an appropriate inverse filter, both to process a single driver in a multi-way system from a time-domain perspective and to process the output pass-band of a multi-way system for phase linearization. The results for some applications are analyzed and discussed using real-world tests and measurements.
Convention Paper 8927

P2-3 Live Event Performer Tracking for Digital Console Automation Using Industry-Standard Wireless Microphone Systems
Adam J. Hill, University of Derby - Derby, Derbyshire, UK; Kristian "Kit" Lane, University of Derby - Derby, UK; Adam P. Rosenthal, Gand Concert Sound - Elk Grove Village, IL, USA; Gary Gand, Gand Concert Sound - Elk Grove Village, IL, USA
The ever-increasing popularity of digital consoles for audio and lighting at live events provides a greatly expanded set of possibilities regarding automation. This research works toward a solution for performer tracking using wireless microphone signals that operates within the existing infrastructure at professional events. Principles of navigation technology such as received signal strength (RSS), time difference of arrival (TDOA), angle of arrival (AOA), and frequency difference of arrival (FDOA) are investigated to determine their suitability and practicality for use in such applications. Analysis of potential systems indicates that performer tracking is feasible over the width and depth of a stage using only two antennas with a suitable configuration, but limitations of current technology restrict the practicality of such a system.
Convention Paper 8928

P2-4 Real-Time Simulation of a Family of Fractional-Order Low-Pass Filters
Thomas Hélie, IRCAM-CNRS UMR 9912-UPMC - Paris, France
This paper presents a family of low-pass filters whose attenuation can be continuously adjusted from 0 decibels per octave (the filter is a unit gain) to -6 decibels per octave (standard one-pole filter). This continuum is produced through a filter of fractional order between 0 (unit gain) and 1 (one-pole filter). Such a filter proves to be a (continuous) infinite combination of one-pole filters. Efficient approximations are proposed from which simulations in the time-domain are built.
Convention Paper 8929
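
The continuum of slopes described in P2-4 can be illustrated with a minimal sketch. It assumes the prototype transfer function H(s) = (1 + s/wc)^(-alpha), which is one common way to write a fractional-order low-pass filter; the abstract does not state the exact form used in the paper, and the function names below are hypothetical. The sketch only evaluates the magnitude response; it does not reproduce the paper's time-domain approximation.

```python
import math


def fractional_lowpass_gain_db(freq_hz, cutoff_hz, alpha):
    """Magnitude in dB of the ASSUMED prototype H(s) = (1 + s/wc)**(-alpha).

    |H(jw)| = (1 + (w/wc)**2) ** (-alpha / 2), so the gain in dB is
    -10 * alpha * log10(1 + (w/wc)**2).
    """
    ratio = freq_hz / cutoff_hz
    return -10.0 * alpha * math.log10(1.0 + ratio * ratio)


def slope_db_per_octave(cutoff_hz, alpha):
    """Gain change over one octave, measured far above the cutoff."""
    f = 1000.0 * cutoff_hz  # far enough above cutoff for the asymptote to hold
    high = fractional_lowpass_gain_db(2.0 * f, cutoff_hz, alpha)
    low = fractional_lowpass_gain_db(f, cutoff_hz, alpha)
    return high - low


if __name__ == "__main__":
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(alpha, round(slope_db_per_octave(1000.0, alpha), 2))
```

Under this assumption the asymptotic slope scales linearly with the order: alpha = 0 leaves the signal untouched, alpha = 1 gives the familiar one-pole roll-off of about -6 dB per octave, and intermediate orders fall in between.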

P2-5 A Computationally Efficient Behavioral Model of the Nonlinear Devices
Jaeyong Cho, Samsung Electronics DMC R&D Center - Suwon, Korea; Hanki Kim, Samsung Electronics DMC R&D Center - Suwon, Korea; Seungkwan Yu, Samsung Electronics DMC R&D Center - Suwon, Korea; Haekwang Park, Samsung Electronics DMC R&D Center - Suwon, Korea; Youngoo Yang, Sungkyunkwan University - Suwon, Korea
This paper presents a new computationally efficient behavioral model to reproduce the output signal of nonlinear devices in real-time systems. The proposed model is designed using the memory gain structure and verified for its accuracy and computational complexity against other nonlinear models. The model parameters are extracted from a vacuum tube amplifier, Heathkit’s W-5M, using an exponentially swept sinusoidal signal. The experimental results show that the proposed model requires 27% of the computational load of the generalized Hammerstein model while maintaining similar modeling accuracy.
Convention Paper 8930

P2-6 High-Precision Score-Based Audio Indexing Using Hierarchical Dynamic Time Warping
Xiang Zhou, Bose Corporation - Framingham, MA, USA; Fangyu Ke, University of Rochester - Rochester, NY, USA; Cheng Shu, University of Rochester - Rochester, NY, USA; Gang Ren, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
We propose a novel audio signal processing algorithm for high-precision score-based audio indexing that accurately maps a music score to its corresponding audio. Specifically we improve the time precision of existing score-audio alignment algorithms to find the accurate positions of audio onsets and offsets. We achieve higher time precision by (1) improving the resolution of alignment sequences, and (2) admitting a hierarchy of spectrographic analysis results as audio alignment features. The performance of our proposed algorithm is verified by comparing the segmentation results with manually composed reference datasets. Our proposed algorithm achieves robust alignment results and enhanced segmentation accuracy and thus is suitable for audio engineering applications such as automatic music production and human-media interactions.
Convention Paper 8931
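
For readers unfamiliar with the alignment step behind P2-6, the following is a minimal sketch of plain dynamic time warping between a score feature sequence and an audio feature sequence. It is the textbook single-level algorithm, not the hierarchical variant the paper proposes. The scalar pitch features, the absolute-difference cost, and the function name are assumptions made for illustration.

```python
def dtw_path(score_feats, audio_feats):
    """Return (total_cost, path) aligning two 1-D feature sequences.

    path is a list of (score_index, audio_index) pairs from start to end.
    """
    n, m = len(score_feats), len(audio_feats)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning the first i score items
    # with the first j audio frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(score_feats[i - 1] - audio_feats[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    score = [60, 62, 64]              # three notes as MIDI pitches
    audio = [60, 60, 62, 62, 62, 64]  # per-frame pitch estimates
    total, path = dtw_path(score, audio)
    print(total, path)
```

In this toy input the second score note first appears in the path at audio frame 2, which is the kind of onset position an indexing system would read off the alignment.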


P3 - Audio Education


Thursday, October 17, 11:00 am — 12:00 pm (Room 1E07)

Chair:
Timothy J. Ryan, Webster University - St. Louis, MO, USA

P3-1 Music to Our Ears: The Effect of Background Music in Higher Education Learning Environments
Adam J. Hill, University of Derby - Derby, Derbyshire, UK
Learning and teaching practice in higher education has embraced various forms of technology over recent years directed at enhancing the learning experience. Background music is well-known to benefit learning in elementary schools but has been largely ignored in higher education. There is evidence that background music is particularly beneficial for students with previous musical training, which is important for educators of audio engineering or similar courses linked closely with music. This work aims to determine if there are merits to background music in higher education and to point toward future work required to give definitive proof.
Convention Paper 8932

P3-2 Recording History in Audio Education
Jeffrey Ratterman, Front Range Community College - Fort Collins, CO, USA
This research will discuss the state of history education for the audio recording field. As the audio industry evolves, it is becoming more apparent that its history, for purposes of teaching, is rather unorganized. Also, on the whole, students learning the practice of audio recording are not being well educated in its history. As a result, students studying to become audio experts are not gaining awareness of a fundamental aspect of the field. This report surveys experiences and viewpoints of professionals, highlights existing recording history in audio education, and explores methods for audio history pedagogy. In addition, a suggested framework of historical periods of audio is presented and a concise list of resources is suggested.
Convention Paper 8933


P4 - Room Acoustics


Thursday, October 17, 2:30 pm — 4:30 pm (Room 1E07)

Chair:
Ben Kok, SCENA acoustic consultants - Uden, The Netherlands

P4-1 Investigating Auditory Room Size Perception with Autophonic Stimuli
Manuj Yadav, University of Sydney - Sydney, NSW, Australia; Densil A. Cabrera, University of Sydney - Sydney, NSW, Australia; Luis Miranda, University of Sydney - Sydney, NSW, Australia; William L. Martens, University of Sydney - Sydney, NSW, Australia; Doheon Lee, University of Sydney - Sydney, NSW, Australia; Ralph Collins, University of Sydney - Sydney, NSW, Australia
Although looking at a room gives a visual indicator of its “size,” auditory stimuli alone can also provide an appreciation of room size. This paper investigates such aurally perceived room size by allowing listeners to hear the sound of their own voice in real-time through two modes: natural conduction and auralization. The auralization process involved convolution of the talking-listener’s voice with an oral-binaural room impulse response (OBRIR; some from actual rooms, and others manipulated), which was output through head-worn ear-loudspeakers, and thus augmented natural conduction with simulated room reflections. This method allowed talking-listeners to rate room size without additional information about the rooms. The subjective ratings were analyzed against relevant physical acoustic measures derived from OBRIRs. The results indicate an overall strong effect of reverberation time on the room size judgments, expressed as a power function, although energy measures were also important in some cases.
Convention Paper 8934

P4-2 Digitally Steered Columns: Comparison of Different Products by Measurement and Simulation
Stefan Feistel, AFMG Technologies GmbH - Berlin, Germany; Anselm Goertz, Institut für Akustik und Audiotechnik (IFAA) - Herzogenrath, Germany
Digitally steered loudspeaker columns have become the predominant means to achieve satisfying speech intelligibility in acoustically challenging spaces. This work compares the performance of several commercially available array loudspeakers in a medium-size, reverberant church. Speech intelligibility as well as other acoustic quantities are compared on the basis of extensive measurements and computer simulations. The results show that formally different loudspeaker products provide very similar transmission quality. Also, measurement and modeling results match accurately within the uncertainty limits.
Convention Paper 8935

P4-3 A Concentric Compact Spherical Microphone and Loudspeaker Array for Acoustical Measurements
Luis Miranda, University of Sydney - Sydney, NSW, Australia; Densil A. Cabrera, University of Sydney - Sydney, NSW, Australia; Ken Stewart, University of Sydney - Sydney, NSW, Australia
Several commonly used descriptors of acoustical conditions in auditoria (ISO 3382-1) utilize omnidirectional transducers for their measurements, disregarding the directional properties of the source and the direction of arrival of reflections. This situation is further complicated when the source and the receiver are collocated as would be the case for the acoustical characterization of stages as experienced by musicians. A potential solution to this problem could be a concentric compact microphone and loudspeaker array, capable of synthesizing source and receiver spatial patterns. The construction of a concentric microphone and loudspeaker spherical array is presented in this paper. Such a transducer could be used to analyze the acoustic characteristics of stages for singers, while preserving the directional characteristics of the source, acquiring spatial information of reflections and preserving the spatial relationship between source and receiver. Finally, its theoretical response and optimal frequency range are explored.
Convention Paper 8936

P4-4 Adapting Loudspeaker Array Radiation to the Venue Using Numerical Optimization of FIR Filters
Stefan Feistel, AFMG Technologies GmbH - Berlin, Germany; Mario Sempf, AFMG Technologies GmbH - Berlin, Germany; Kilian Köhler, IBS Audio - Berlin, Germany; Holger Schmalle, AFMG Technologies GmbH - Berlin, Germany
Over the last two decades loudspeaker arrays have been employed increasingly for sound reinforcement. Their high output power and focusing ability facilitate extensive control capabilities as well as extraordinary performance. Based on acoustic simulation, numerical optimization of the array configuration, particularly of FIR filters, adds a new level of flexibility. Radiation characteristics can be established that are not available for conventionally tuned sound systems. It is shown that substantial improvements in sound field uniformity and output SPL can be achieved. Different real-world case studies are presented based on systematic measurements and simulations. Important practical implementation aspects are discussed such as the spatial resolution of driven sources, the number of FIR coefficients, and the quality of loudspeaker data.
Convention Paper 8937


P5 - Signal Processing—Part 2


Thursday, October 17, 2:30 pm — 5:00 pm (Room 1E09)

Chair:
Juan Pablo Bello, New York University - New York, NY, USA

P5-1 Evaluation of Dynamics Processors’ Effects Using Signal Statistics
Tim Shuttleworth, Renkus Heinz - Oceanside, CA, USA
Existing methods of evaluating the action of dynamics processors (limiters, compressors, expanders, and gates) do not provide results that correlate directly with the perceived and actual effect on the signal's dynamics, that is, with aspects such as crest factor, dynamic range, subjective acceptability of the processed signal, or the degree of optimization of the use of the transmission medium. A method is described that uses statistical analysis of the pre- and post-processed signal to allow the processor’s action to be characterized in a manner that correlates to the perceived effects and actual modification of signal dynamics. A number of signal statistical and user-definable characteristics are introduced and, in addition to well-known statistical techniques, form the basis for this evaluation method.
Convention Paper 8938
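
One of the statistics named in P5-1, crest factor, is easy to compare before and after processing. The sketch below is a rough illustration under stated assumptions: the hard limiter is a hypothetical stand-in for a real dynamics processor, and the paper's own set of statistics and user-definable characteristics is not reproduced here.

```python
import math


def crest_factor_db(samples):
    """Peak-to-RMS ratio of a block of samples, in dB."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)


def hard_limit(samples, ceiling):
    """Hypothetical stand-in processor: clip every sample to +/- ceiling."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


if __name__ == "__main__":
    n = 1000
    sine = [math.sin(2.0 * math.pi * k / n) for k in range(n)]  # one full cycle
    limited = hard_limit(sine, 0.5)
    print("pre :", round(crest_factor_db(sine), 2), "dB")
    print("post:", round(crest_factor_db(limited), 2), "dB")
```

Comparing the two numbers gives a single pre/post statistic for the processor's action: a sine wave has a crest factor of about 3 dB, and clipping it reduces that figure.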

P5-2 A New Ultra Low Delay Audio Communication Coder
Brijesh Singh Tiwari, ATC Labs - Noida, India; Midathala Harish, ATC Labs - Noida, India; Deepen Sinha, ATC Labs - Newark, NJ, USA
We propose a new full-bandwidth audio codec with an algorithmic delay as low as 0.67 ms and at most 2.7 ms. Low delay is a critical requirement in many real-time applications such as networked music performances, wireless speakers and microphones, and Bluetooth devices. The proposed Ultra Low Delay Audio Communication Codec (ULDACC) is a perceptual transform codec utilizing very small transform windows, the shape of which is optimized to compensate for the lack of frequency resolution. A specially adapted psychoacoustic model and intra-frame coding techniques are employed to achieve transparent audio quality for bit rates approaching 128 kbps/channel at an algorithmic delay of about 1 ms.
Convention Paper 8939

P5-3 Cascaded Long Term Prediction of Polyphonic Signals for Low Power Decoders
Tejaswi Nanjundaswamy, University of California, Santa Barbara - Santa Barbara, CA, USA; Kenneth Rose, University of California, Santa Barbara - Santa Barbara, CA, USA
An optimized cascade of long term prediction filters, each corresponding to an individual periodic component of the polyphonic audio signal, was shown in our recent work to be highly effective as an inter-frame prediction tool for low delay audio compression. The earlier paradigm involved backward adaptive parameter estimation, and hence significantly higher decoder complexity, which is unsuitable for applications that pose a stringent power constraint on the decoder. This paper overcomes this limitation via extension to include forward adaptive parameter estimation, in two modes that trade complexity for side information: (i) a subset of parameters is sent as side information and the remainder is backward adaptively estimated; (ii) all parameters are sent as side information. We further exploit inter-frame parameter dependencies to minimize the side information rate. Objective and subjective evaluation results clearly demonstrate substantial gains and effective control of the tradeoff between rate-distortion performance and decoder complexity.
Convention Paper 8940
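
The cascade idea in P5-3 can be illustrated with a toy example: one prediction stage per periodic component, each removing what can be predicted from one period earlier. This sketch assumes single-tap predictors with known lags and gains. Parameter estimation (backward or forward adaptive) and the side-information coding the paper discusses are not modeled, and the function names are hypothetical.

```python
def ltp_residual(signal, lag, gain):
    """One long-term prediction stage: e[n] = x[n] - gain * x[n - lag].

    Samples before the start of the signal are treated as zero.
    """
    return [
        x - gain * (signal[n - lag] if n >= lag else 0.0)
        for n, x in enumerate(signal)
    ]


def cascaded_ltp_residual(signal, stages):
    """Apply one stage per (lag, gain) pair, feeding each residual onward."""
    residual = list(signal)
    for lag, gain in stages:
        residual = ltp_residual(residual, lag, gain)
    return residual


if __name__ == "__main__":
    # Toy polyphonic signal: a period-3 component plus a period-4 component.
    a = [1.0, 0.0, -1.0]
    b = [2.0, -1.0, 0.0, 1.0]
    x = [a[n % 3] + b[n % 4] for n in range(24)]
    print(cascaded_ltp_residual(x, [(3, 1.0), (4, 1.0)]))
```

After a start-up transient of 3 + 4 = 7 samples the cascade's residual is zero, whereas either stage alone leaves the other periodic component in place. That is why a single long-term predictor is a poor fit for a mixture of periodic sources.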

P5-4 Voice Coding with Opus
Koen Vos, vocTone - San Francisco, CA, USA; Karsten Vandborg Sørensen, Microsoft - Stockholm, Sweden; Søren Skak Jensen, GN Netcom A/S - Ballerup, Denmark; Jean-Marc Valin, Mozilla Corporation - Mountain View, CA, USA
In this paper we describe the voice mode of the Opus speech and audio codec. As only the decoder is standardized, the details in this paper will help anyone who wants to modify the encoder or gain a better understanding of the codec. We go through the main components that constitute the voice part of the codec, provide an overview, give insights, and discuss the design decisions made during the development. Tests have shown that Opus quality is comparable to or better than several state-of-the-art voice codecs, while covering a much broader application area than competing codecs.
Convention Paper 8941

P5-5 High-Quality, Low-Delay Music Coding in the Opus Codec
Jean-Marc Valin, Mozilla Corporation - Mountain View, CA, USA; Gregory Maxwell, Mozilla Corporation; Timothy B. Terriberry, Mozilla Corporation; Koen Vos, vocTone - San Francisco, CA, USA
The IETF recently standardized the Opus codec as RFC6716. Opus targets a wide range of real-time Internet applications by combining a linear prediction coder with a transform coder. We describe the transform coder, with particular attention to the psychoacoustic knowledge built into the format. The result out-performs existing audio codecs that don't operate under real-time constraints.
Convention Paper 8942


P6 - Spatial Audio


Thursday, October 17, 3:00 pm — 4:30 pm (1E Foyer)

P6-1 Improvement of 3-D Sound Systems by Vertical Loudspeaker Arrays
Akira Saji, University of Aizu - Aizuwakamatsu City, Japan; Keita Tanno, University of Aizu - Aizuwakamatsu, Fukushima, Japan; Jie Huang, University of Aizu - Aizuwakamatsu City, Japan
Recently we proposed a 3-D sound system using a horizontal arrangement of loudspeakers by combining the effect of HRTF and the amplitude panning method. In that system, the loudspeakers are set at the subject's ear level, and the sweet spot is limited by the height of the loudspeakers. When the listener's ear level differs from the loudspeaker height, sound localization becomes difficult or breaks down. However, it is difficult to properly adjust both the height of the loudspeakers and the subject's ear level every time. In this paper we aim to improve the robustness of the 3-D sound system using vertical loudspeaker arrays. Our experiments show that the loudspeaker arrays can improve the robustness of the 3-D sound system.
Convention Paper 8944

P6-2 An Integrated Algorithm for Optimized Playback of Higher Order Ambisonics
Robert E. Davis, University of the West of Scotland - Paisley, Scotland, UK; D. Fraser Clark, University of the West of Scotland - Paisley, Scotland, UK
An algorithm is presented that gives improved playback performance of higher order ambisonic material on practical loudspeaker arrays. The optimizations are based on sound field reproduction theories with additional parameters to account for the compensation of loudspeaker and listener positioning constraints and numbers of listeners. Automatic calculation of loudspeaker distances is also achieved based on room dimensions and a gain calibration routine is incorporated. Results are given to quantify the resulting algorithm performance, informal listening tests were carried out, and aspects of implementation are discussed.
Convention Paper 8945

P6-3 I Hear NY3D: Ambisonic Capture and Reproduction of an Urban Sound Environment
Braxton Boren, New York University - New York, NY, USA; Areti Andreopoulou, New York University - New York, NY, USA; Michael Musick, New York University - New York, NY, USA; Hariharan Mohanraj, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA
This paper describes “I Hear NY3D,” a project for capturing and reproducing 3D soundfields in New York City. First-order Ambisonic recordings of various locations in Manhattan have been made, to be used for both aesthetic and informational purposes. The collected data allows for the creation of high quality, fully immersive auditory soundscapes that can be played back on any periphonic speaker array configuration through real-time matrixing. Binaural renderings of the same data are also available for more portable applications.
Convention Paper 8946

P6-4 I Hear NY3D: An Ambisonic Installation Reproducing NYC Soundscapes
Michael Musick, New York University - New York, NY, USA; Areti Andreopoulou, New York University - New York, NY, USA; Braxton Boren, New York University - New York, NY, USA; Hariharan Mohanraj, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA
This paper describes the development of a reproduction installation for the "I Hear NY3D" project. This project’s aim is the capture and reproduction of immersive soundfields around Manhattan. A means of creating an engaging reproduction of these soundfields through the medium of an installation will also be discussed. The goal for this installation is an engaging, immersive experience that allows participants to create connections to the soundscapes and observe relationships between the soundscapes. This required the consideration of how to best capture and reproduce these recordings, the presentation of simultaneous multiple soundscapes, and a means of interaction with the material.
Convention Paper 8947

P6-5 Auralization of Measured Room Impulse Responses Considering Head Movements
Anthony Parks, Rensselaer Polytechnic Institute - Troy, NY, USA; Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA; Samuel W. Clapp, Rensselaer Polytechnic Institute - Troy, NY, USA
The purpose of this paper is to describe a novel method for auralizing measured room impulse responses over headphones using impulse responses recorded from a 16-channel spherical microphone array decoded to eight virtual loudspeakers mixed-down binaurally using nonindividualized HRTFs. The novelty of this method lies not in the ambisonic binaural-mixdown process, but rather, the use of head pose estimation code from the Kinect API sent to a Max/MSP patch using Open Sound Control messages. This provides a fast, reliable alternative to auralizations over headphones that allow for head movements without the need for head-related transfer function interpolation by performing a rotation on the spherical harmonic that corresponds to the listener's head rotation.
Convention Paper 8948
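
The rotation step mentioned in P6-5 can be sketched for the simplest case. The example below assumes first-order horizontal B-format (W, X, Y, Z with the traditional 1/sqrt(2) weighting on W) and yaw-only head movement. The paper itself works with a 16-channel array and higher-order components, which need larger rotation matrices, and the function names here are hypothetical.

```python
import math


def encode_first_order(sample, azimuth_rad):
    """Encode a horizontal plane-wave source into (W, X, Y, Z)."""
    return (
        sample / math.sqrt(2.0),
        sample * math.cos(azimuth_rad),
        sample * math.sin(azimuth_rad),
        0.0,
    )


def compensate_head_yaw(bformat, head_yaw_rad):
    """Rotate the sound field by -head_yaw so sources stay fixed in the room.

    W and Z are unaffected by a rotation about the vertical axis.
    """
    w, x, y, z = bformat
    c, s = math.cos(head_yaw_rad), math.sin(head_yaw_rad)
    return (w, x * c + y * s, -x * s + y * c, z)


if __name__ == "__main__":
    # Source 90 degrees to the left; the listener turns the head 90 degrees left.
    field = encode_first_order(1.0, math.pi / 2.0)
    print(compensate_head_yaw(field, math.pi / 2.0))
```

After compensation the source is encoded straight ahead of the listener (X near 1, Y near 0). Because the rotation acts on the sound-field channels, the virtual loudspeaker HRTFs stay fixed and no HRTF interpolation is needed, which is the advantage the abstract points to.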

P6-6 Reduced Representations of HRTF Datasets: A Discriminant Analysis Approach
Areti Andreopoulou, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA; Juan Pablo Bello, New York University - New York, NY, USA
This paper discusses reduced representations of HRTF datasets, fully descriptive of one’s personalized properties. The data reduction is achieved through elimination of the least discriminative binaural-filter pairs from a set. For this purpose Linear Discriminant Analysis (LDA) was applied to the Music and Audio Research Laboratory (MARL) database of repeated HRTF measurements, which resulted in a 67% data reduction. The effectiveness of the sparse HRTF mapping is assessed by way of the performance of a database matching system, followed by a subjective evaluation study. The results indicate that participants demonstrated a strong preference for the selected HRTF sets, in contrast to the generic KEMAR set and the least similar selections from the repository.
Convention Paper 8949

P6-7 Investigation of HRTF Sets Using Content with Limited Spatial Resolution
Johann-Markus Batke, Audio & Acoustics, Technicolor Research & Innovation - Hannover, Germany; Stefan Abeling, Audio & Acoustics, Technicolor Research & Innovation - Hannover, Germany; Stefan Balke, Leibniz Universität Hannover - Hannover, Germany; Gerald Enzner, Ruhr-Universität Bochum - Bochum, Germany
Headphone rendering of sound fields represented by Higher Order Ambisonics (HOA) is greatly facilitated by the binaural synthesis of virtual loudspeakers. Individualized head-related transfer function (HRTF) sets corresponding to the spatial positions of the virtual loudspeakers are used in conjunction with head-tracking to achieve the externalization of the sound event. We investigate the localization accuracy for HOA representations of limited spatial resolution using individualized and generic HRTF sets.
Convention Paper 8950


P7 - Spatial Audio—Part 1


Thursday, October 17, 4:30 pm — 7:00 pm (Room 1E07)

Chair:
Wieslaw Woszczyk, McGill University - Montreal, QC, Canada

P7-1 Reproducing Real-Life Listening Situations in the Laboratory for Testing Hearing Aids
Pauli Minnaar, Oticon A/S - Smørum, Denmark; Signe Frølund Albeck, Oticon A/S - Smørum, Denmark; Christian Stender Simonsen, Oticon A/S - Smørum, Denmark; Boris Søndersted, Oticon A/S - Smørum, Denmark; Sebastian Alex Dalgas Oakley, Oticon A/S - Smørum, Denmark; Jesper Bennedbæk, Oticon A/S - Smørum, Denmark
The main purpose of the current study was to demonstrate how a Virtual Sound Environment (VSE), consisting of 29 loudspeakers, can be used in the development of hearing aids. A listening test was done by recording everyday sound scenes with a spherical microphone array that has 32 microphone capsules. The playback in the VSE was implemented by convolving the recordings with inverse filters, which were derived by directly inverting a matrix of 928 measured transfer functions. While listening to 5 sound scenes, 10 hearing-impaired listeners could switch between hearing aid settings in real time, by interacting with a touch screen in a MUSHRA-like test. The setup proves to be very valuable for ensuring that hearing aid settings work well in real-world situations.
Convention Paper 8951

P7-2 Measuring Speech Intelligibility in Noisy Environments Reproduced with Parametric Spatial Audio
Teemu Koski, Aalto University - Espoo, Finland; Ville Sivonen, Cochlear Nordic AB - Vantaa, Finland; Ville Pulkki, Aalto University - Espoo, Finland; Technical University of Denmark - Denmark
This work introduces a method for speech intelligibility testing in reproduced sound scenes. The proposed method uses background sound scenes augmented by target speech sources and reproduced over a multichannel loudspeaker setup with time-frequency domain parametric spatial audio techniques. Subjective listening tests were performed to validate the proposed method: speech recognition thresholds (SRT) in noise were measured in a reference sound scene and in a room where the reference was reproduced by a loudspeaker setup. The listening tests showed that for normal-hearing test subjects the method provides nearly identical speech intelligibility compared to the real-life reference when using a nine-loudspeaker reproduction setup in anechoic conditions (<0.3 dB error in SRT). Due to the flexible technical requirements, the method is potentially applicable to clinical environments.
AES 135th Convention Student Technical Papers Award Cowinner
Convention Paper 8952

P7-3 On the Influence of Headphones on Localization of Loudspeaker SourcesDarius Satongar, University of Salford - Salford, Greater Manchester, UK; Chris Pike, BBC Research and Development - Salford, Greater Manchester, UK; University of York - Heslington, York, UK; Yiu W. Lam, University of Salford - Salford, UK; Tony Tew, University of York - York, UK
When validating systems that use headphones to synthesize virtual sound sources, a direct comparison between virtual and real sources is sometimes needed. This paper presents objective and subjective measurements of the influence of headphones on external loudspeaker sources. Objective measurements of the effect of a number of headphone models are given and analyzed using an auditory filter bank and binaural cue extraction. Objective results highlight that all of the headphones had an effect on localization cues. A subjective localization test was undertaken using one of the best performing headphones from the measurements. It was found that the presence of the headphones caused a small increase in localization error but also that the process of judging source location was different, highlighting a possible increase in the complexity of the localization task.
Convention Paper 8953 (Purchase now)

P7-4 Binaural Reproduction of 22.2 Multichannel Sound with Loudspeaker Array FrameKentaro Matsui, NHK Science & Technology Research Laboratories - Setagaya, Tokyo, Japan; Akio Ando, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan
NHK has proposed a 22.2 multichannel sound system to be an audio format for future TV broadcasting. The system consists of 22 loudspeakers and 2 low frequency effect loudspeakers for reproducing three-dimensional spatial sound. To allow 22.2 multichannel sound to be reproduced in homes, various reproduction methods that use fewer loudspeakers have been investigated. This paper describes binaural reproduction of 22.2 multichannel sound with a loudspeaker array frame integrated into a flat panel display. The processing for binaural reproduction is done in the frequency domain. Methods of designing inverse filters for binaural processing with expanded multiple control points are proposed to enlarge the listening area outside the sweet spot.
Convention Paper 8954 (Purchase now)

P7-5 An Offline Binaural Converting Algorithm for 3D Audio Contents: A Comparative Approach to the Implementation Using Channels and ObjectsRomain Boonen, SAE Institute Brussels - Brussels, Belgium
This paper describes and compares two offline binaural converting algorithms based on HRTFs (Head-Related Transfer Functions) for 3D audio contents. Recognizing the widespread use of headphones by the typical modern audio content consumer, two strategies to binaurally translate the 3D mixes are explored in order to give a convincing 3D aural experience "on the go." Aiming for the best output quality possible and avoiding the compromises inherent to real-time processing, the paper compares the channel- and the object-based models, notably looking respectively into the spectral analysis of channels for usage of HRTFs at intermediate positions between the virtual speakers and the dynamic convolution of the HRTFs with the objects according to their position coordinates in time.
Convention Paper 8955 (Purchase now)

 
 

P8 - Recording and Production


Friday, October 18, 9:00 am — 12:00 pm (Room 1E07)

Chair:
Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada

P8-1 Music Consumption Behavior of Generation Y and the Reinvention of the Recording IndustryBarry Marshall, The New England Institute of Art - Brookline, MA, USA
This paper will give an overview of the last 15 years of the recording industry’s problems with piracy and decreasing sales, while reporting on research into the music consumption behavior of a group of audio students in both the United States and in eight European countries. Audio students have a unique perspective on the issues surrounding the recording industry’s problems since the advent of Napster and the later generations of illegal file sharing. Their insights into issues like the importance of access to music, the quality of the listening experience, and the ethical quandary of participating in copyright infringement, may help point to a direction for the future of the recording industry.
Convention Paper 8956 (Purchase now)

P8-2 (Re)releasing the BeatlesBrett Barry, Syracuse University - Syracuse, NY, USA
This paper presents a comparative analysis of various Beatles releases, including original 1960s vinyl, early compact discs, and present-day digital downloads through services like iTunes. I will provide original research using source material and interviews with persons directly involved in recording and releasing Beatles albums, examining variations in dynamic range, spectral distribution, psychoacoustics, and track anomalies. Considerations are given to mastering and remastering a catalog of classics.
Convention Paper 8957 (Purchase now)

P8-3 Maximum Averaged and Peak Levels of Vocal Sound PressureBraxton Boren, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA; Brian Gill, New York University - New York, NY, USA
This work describes research on the maximum sound pressure level achievable by the spoken and sung human voice. Trained actors and singers were measured for peak and averaged SPLs at an on-axis distance of 1 m at three different subjective dynamic levels and also for two different vocal techniques (“back” and “mask” voices). The “back” sung voice was found to achieve a consistently lower SPL than the “mask” voice at a corresponding dynamic level. Some singers were able to achieve high averaged levels with both spoken and sung voice, while others produced much higher levels singing than speaking. A few of the vocalists were able to produce averaged levels above 90 dBA, the highest found in the existing literature.
Convention Paper 8958 (Purchase now)

P8-4 Listener Adaptation in the Control Room: The Effect of Varying Acoustics on Listener FamiliarizationRichard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Brett Leonard, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Scott Levine, Skywalker Sound - San Francisco, CA, USA; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Grzegorz Sikora, Bang & Olufsen Deutschland GmbH - Pullach, Germany
The area of auditory adaptation is of central importance to a recording engineer operating in unfamiliar or less-than-ideal acoustic conditions. This study prompts expert listeners to perform a controlled level-balancing task while exposed to three different acoustic conditions. The length of exposure is varied to test the role of adaptation on such a task. Results show that there is a significant difference in the variance of participants’ results when exposed to one condition for a longer period of time. In particular, subjects seem to most easily adapt to reflective acoustic conditions.
Convention Paper 8959 (Purchase now)

P8-5 Spectral Characteristics of Popular Commercial Recordings 1950-2010Pedro Duarte Pestana, Catholic University of Oporto - CITAR - Oporto, Portugal; Lusíada University of Portugal (ILID); Centro de Estatística e Aplicacoes; Zheng Ma, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK; Alvaro Barbosa, Catholic University of Oporto - CITAR - Oporto, Portugal; Dawn A. A. Black, Queen Mary University of London - London, UK
In this work the long-term spectral contours of a large dataset of popular commercial recordings were analyzed. The aim was to analyze overall trends, as well as yearly and genre-specific ones. A novel method for averaging spectral distributions is proposed that yields results that lend themselves to comparison. With it, we found that there is a consistent leaning toward a target equalization curve that stems from practices in the music industry but also to some extent mimics natural, acoustic spectra of ensembles.
Convention Paper 8960 (Purchase now)

P8-6 A Knowledge-Engineered Autonomous Mixing SystemBrecht De Man, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
In this paper a knowledge-engineered mixing engine is introduced that uses semantic mixing rules and bases mixing decisions on instrument tags as well as elementary, low-level signal features. Mixing rules are derived from practical mixing engineering textbooks. The performance of the system is compared to existing automatic mixing tools as well as human engineers by means of a listening test, and future directions are established.
Convention Paper 8961 (Purchase now)

 
 

P9 - Applications in Audio—Part I


Friday, October 18, 9:00 am — 11:30 am (Room 1E09)

Chair:
Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA

P9-1 Audio Device Representation, Control, and Monitoring Using SNMPAndrew Eales, Wellington Institute of Technology - Wellington, New Zealand; Rhodes University - Grahamstown, South Africa; Richard Foss, Rhodes University - Grahamstown, Eastern Cape, South Africa
The Simple Network Management Protocol (SNMP) is widely used to configure and monitor networked devices. The architecture of complex audio devices can be elegantly represented using SNMP tables. Carefully considered table indexing schemes support a logical device model that can be accessed using standard SNMP commands. This paper examines the use of SNMP tables to represent the architecture of audio devices. A representational scheme that uses table indexes to provide direct-access to context-sensitive SNMP data objects is presented. The monitoring of parameter values and the implementation of connection management using SNMP are also discussed.
Convention Paper 8962 (Purchase now)

P9-2 IP Audio in the Real-World; Pitfalls and Practical Solutions Encountered and Implemented when Rolling Out the Redundant Streaming Approach to IP AudioKevin Campbell, WorldCast Systems /APT - Belfast, N Ireland; Miami, Florida
This paper will review the development of IP audio links for audio delivery and chiefly look at the possibility of harnessing the flexibility and cost-effectiveness of the public internet for professional audio delivery. We will discuss first the benefits of IP audio when measured against traditional synchronous audio delivery and also the typical problems associated with delivering real-time broadcast audio across packetized networks, specifically in the context of unmanaged IP networks. The paper contains an examination of some techniques employed to overcome these issues with an in-depth look at the redundant packet streaming approach.
Convention Paper 8963 (Purchase now)
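
As a rough illustration of the redundant packet streaming idea mentioned in the abstract above, the following minimal sketch merges copies of the same audio packets arriving over two network paths. It is not the implementation described in the paper. The function name `merge_redundant_streams`, the `(sequence_number, payload)` tuple layout, and the "first copy wins" rule are assumptions made for this example only.

```python
def merge_redundant_streams(arrivals):
    """Merge packets from redundant paths into one in-order stream.

    `arrivals` is an iterable of (sequence_number, payload) tuples in the
    order they reached the receiver, from any path (hypothetical layout).
    The first copy of each sequence number is kept; later duplicates are
    dropped.
    """
    seen = {}
    for seq, payload in arrivals:
        if seq not in seen:
            seen[seq] = payload
    # Re-order by sequence number before handing audio to the decoder.
    return [seen[seq] for seq in sorted(seen)]


# Path A loses packet 2 and path B loses packet 4; together nothing is missing.
# Payloads are labelled by path only to show which copy was kept.
path_a = [(1, "a1"), (3, "a3"), (4, "a4")]
path_b = [(1, "b1"), (2, "b2"), (3, "b3")]
# Interleave the two paths to mimic arrival order at the receiver.
arrivals = [path_a[0], path_b[0], path_b[1], path_a[1], path_b[2], path_a[2]]
merged = merge_redundant_streams(arrivals)
print(merged)
```

A real receiver would also bound the buffer and enforce a playout deadline; both are left out here to keep the sketch short.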

P9-3 Implementation of AES-64 Connection Management for Ethernet Audio/Video Bridging DevicesJames Dibley, Rhodes University - Grahamstown, South Africa; Richard Foss, Rhodes University - Grahamstown, Eastern Cape, South Africa
AES-64 is a standard for the discovery, enumeration, connection management, and control of multimedia network devices. This paper describes the implementation of an AES-64 protocol stack and control application on devices that support the IEEE Ethernet Audio/Video Bridging standards for streaming multimedia, enabling connection management of network audio streams.
Convention Paper 8964 (Purchase now)

P9-4 Simultaneous Acquisition of a Massive Number of Audio Channels through Optical MeansGabriel Pablo Nava, NTT Communication Science Laboratories - Kanagawa, Japan; Yutaka Kamamoto, NTT Communication Science Laboratories - Kanagawa, Japan; Takashi G. Sato, NTT Communication Science Laboratories - Kanagawa, Japan; Yoshifumi Shiraki, NTT Communication Science Laboratories - Kanagawa, Japan; Noboru Harada, NTT Communication Science Labs - Atsugi-shi, Kanagawa-ken, Japan; Takehiro Moriya, NTT Communication Science Labs - Atsugi-shi, Kanagawa-ken, Japan
Sensing sound fields at multiple locations often may become considerably time consuming and expensive when large wired sensor arrays are involved. Although several techniques have been developed to reduce the number of necessary sensors, less work has been reported on efficient techniques to acquire the data from all the sensors. This paper introduces an optical system, based on the concept of visible light communication, which allows the simultaneous acquisition of audio signals from a massive number of channels via arrays of light emitting diodes (LEDs) and a high speed camera. Similar approaches use LEDs to express the sound pressure of steady state fields as a scaled luminous intensity. The proposed sensor units, in contrast, transmit optically the actual digital audio signal sampled by the microphone in real time. Experiments to illustrate two examples of typical applications are presented: a remote acoustic imaging sensor array and a spot beamforming based on the compressive sampling theory. Implementation issues are also addressed to discuss the potential scalability of the system.
Convention Paper 8965 (Purchase now)

P9-5 Blind Microphone Analysis and Stable Tone Phase Analysis for Audio Tampering DetectionLuca Cuccovillo, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Sebastian Mann, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Patrick Aichroth, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Marco Tagliasacchi, Politecnico di Milano - Milan, Italy; Christian Dittmar, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany
In this paper we present an audio tampering detection method based on the combination of blind microphone analysis and phase analysis of stable tones, e.g., the electrical network frequency (ENF). The proposed algorithm uses phase analysis to detect segments that might have been tampered with. Afterwards, the segments are further analyzed using a feature vector able to discriminate among different microphone types. Using this combined approach, it is possible to achieve a significantly lower false-positive rate and higher reliability as compared to standalone phase analysis.
Convention Paper 8966 (Purchase now)
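
The phase-analysis step in the abstract above can be illustrated with a small sketch. It covers only the stable-tone phase idea, not the paper's algorithm or its microphone analysis. The tone is tracked with a single-bin DFT per block, and a block is flagged when its phase jumps relative to the previous block. The function names, the 50 Hz tone, the 1 kHz sample rate, the block length, and the 0.5 rad threshold are all assumptions chosen for the example.

```python
import cmath
import math


def block_phases(signal, fs, tone_hz, block_len):
    """Phase of a tone in each full block, via a single-bin DFT.

    The DFT kernel restarts at zero in every block, so with a whole number
    of tone cycles per block an unedited recording gives a constant phase.
    """
    phases = []
    for start in range(0, len(signal) - block_len + 1, block_len):
        acc = 0j
        for n in range(block_len):
            acc += signal[start + n] * cmath.exp(-2j * math.pi * tone_hz * n / fs)
        phases.append(cmath.phase(acc))
    return phases


def suspicious_blocks(phases, threshold_rad=0.5):
    """Indices of blocks whose phase jumps relative to the previous block."""
    flagged = []
    for i in range(1, len(phases)):
        # Wrap the difference into [-pi, pi] before comparing.
        delta = cmath.phase(cmath.exp(1j * (phases[i] - phases[i - 1])))
        if abs(delta) > threshold_rad:
            flagged.append(i)
    return flagged


fs, tone_hz, block_len = 1000, 50.0, 200  # 10 tone cycles per block
clean = [math.sin(2 * math.pi * tone_hz * n / fs) for n in range(2000)]
# Simulate an edit: remove 7 samples at the start of block 5.
edited = clean[:1000] + clean[1007:]

flagged_clean = suspicious_blocks(block_phases(clean, fs, tone_hz, block_len))
flagged_edited = suspicious_blocks(block_phases(edited, fs, tone_hz, block_len))
print(flagged_clean, flagged_edited)
```

In a real recording the tone would first have to be isolated from the program material, and its frequency drifts slowly, so the threshold would need tuning.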

 
 

P10 - Amplifiers


Friday, October 18, 2:00 pm — 2:30 pm (Room 1E07)

Chair:
Alexander Voishvillo, JBL/Harman Professional - Northridge, CA, USA

P10-1 Supply-Feedback Fully-Digital Class D Audio Amplifier Featuring 100 dBA+ SNR and 0.5 W to 1 W Selectable Output PowerRossella Bassoli, ST-Ericsson - Monza Brianza, Italy; Federico Guanziroli, ST-Ericsson - Monza Brianza, Italy; Carlo Crippa, ST-Ericsson - Monza Brianza, Italy; Germano Nicollini, ST-Ericsson - Monza Brianza, Italy
This paper presents a real-time power supply noise correction technique in a fully-digital class D audio amplifier. The power supply is scaled and applied to a 12-bit Nyquist ADC to modify the amplitude of the Pulse-Width-Modulator reference carrier. An improved supply extrapolation algorithm results in a power supply rejection from one to two orders of magnitude higher than reported implementations. Class D sensitivity to clock jitter is presented. SNRs higher than 100 dBA have been measured in the presence of both power supply ripple and clock jitter. The PWM and output stage are integrated in the same chip in a 0.13 µm digital CMOS technology, whereas an external ADC has been used to demonstrate the validity of the supply-feedback algorithm.
Convention Paper 8968 (Purchase now)
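
To see why letting the PWM carrier amplitude follow the supply can cancel supply ripple, consider the idealized sketch below. It is not the circuit or the extrapolation algorithm from the paper. It assumes an ideal H-bridge whose mean output over one carrier period is the supply times (2 × duty − 1), and it ignores ADC quantization and measurement delay. The names and the 3.6 V nominal supply are hypothetical.

```python
NOMINAL_SUPPLY = 3.6  # volts, hypothetical value for the example


def average_output(audio, supply, carrier_amplitude):
    """Mean H-bridge output of an ideal PWM stage over one carrier period."""
    # Comparing the audio sample with a triangle carrier gives this duty cycle.
    duty = 0.5 * (1.0 + audio / carrier_amplitude)
    return supply * (2.0 * duty - 1.0)


def fixed_carrier(audio, supply):
    """Carrier amplitude stays constant, so the output follows the supply."""
    return average_output(audio, supply, carrier_amplitude=1.0)


def supply_tracking_carrier(audio, supply):
    """Carrier amplitude is scaled by the measured supply (assumed ideal)."""
    return average_output(audio, supply, carrier_amplitude=supply / NOMINAL_SUPPLY)


audio = 0.5  # normalized audio sample
for supply in (3.6, 3.3):
    print(supply, fixed_carrier(audio, supply), supply_tracking_carrier(audio, supply))
```

With the fixed carrier the output drops with the supply, whereas with the tracking carrier the supply term cancels and the output stays at the nominal value.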

 
 

P11 - Perception—Part 1


Friday, October 18, 2:15 pm — 4:45 pm (Room 1E09)

Chair:
Jason Corey, University of Michigan - Ann Arbor, MI, USA

P11-1 On the Perceptual Advantage of Stereo Subwoofer Systems in Live Sound ReinforcementAdam J. Hill, University of Derby - Derby, Derbyshire, UK; Malcolm O. J. Hawksford, University of Essex - Colchester, Essex, UK
Recent research into low-frequency sound-source localization confirms the lowest localizable frequency is a function of room dimensions, source/listener location, and reverberant characteristics of the space. Larger spaces therefore facilitate accurate low-frequency localization and should gain benefit from broadband multichannel live-sound reproduction compared to the current trend of deriving an auxiliary mono signal for the subwoofers. This study explores whether the monophonic approach is a significant limit to perceptual quality and if stereo subwoofer systems can create a superior soundscape. The investigation combines binaural measurements and a series of listening tests to compare mono and stereo subwoofer systems when used within a typical left/right configuration.
Convention Paper 8970 (Purchase now)

P11-2 Auditory Adaptation to Loudspeakers and Listening Room AcousticsCleopatra Pike, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK; Russell Mason, University of Surrey - Guildford, Surrey, UK
Timbral qualities of loudspeakers and rooms are often compared in listening tests involving short listening periods. Outside the laboratory, listening occurs over a longer time course. In a study by Olive et al. (1995) smaller timbral differences between loudspeakers and between rooms were reported when comparisons were made over shorter versus longer time periods. This is a form of timbral adaptation, a decrease in sensitivity to timbre over time. The current study confirms this adaptation and establishes that it is not due to response bias but may be due to timbral memory, specific mechanisms compensating for transmission channel acoustics, or attentional factors. Modifications to listening tests may be required where tests need to be representative of listening outside of the laboratory.
Convention Paper 8971 (Purchase now)

P11-3 Perception Testing: Spatial AcuityP. Nigel Brown, Ex'pression College for Digital Arts - Emeryville, CA, USA
There is a lack of readily accessible data in the public domain detailing individual spatial aural acuity. Introducing new tests of aural perception, this document specifies testing methodologies and apparatus, with example test results and analyses. Tests are presented to measure the resolution of a subject's perception and their ability to localize a sound source. The basic tests are designed to measure minimum discernible change across a 180° horizontal soundfield. More complex tests are conducted over two or three axes for pantophonic or periphonic analysis. Example results are shown from tests including unilateral and bilateral hearing aid users and profoundly monaural subjects. Examples are provided of the applicability of the findings to sound art, healthcare, and other disciplines.
Convention Paper 8972 (Purchase now)

P11-4 Evaluation of Loudness Meters Using Parameterization of Fader MovementsJon Allan, Luleå University of Technology - Piteå, Sweden; Jan Berg, Luleå University of Technology - Piteå, Sweden
The EBU recommendation R 128 regarding loudness normalization is now generally accepted and countries in Europe are adopting the new recommendation. There is now a need to know more about how and when to use the different meter modes, Momentary and Short term, proposed in R 128, as well as to understand how different implementations of R 128 in audio level meters affect the engineers’ actions. A method is tentatively proposed for evaluating the performance of audio level meters in live broadcasts. The method was used to evaluate different meter implementations, three of them conforming to the recommendation from EBU, R 128. In an experiment, engineers adjusted audio levels in a simulated live broadcast show and the resulting fader movements were recorded. The movements were parameterized into “Fader movement,” “Adjustment time,” “Overshoot,” etc. Results show that the proposed parameters produced significant differences caused by the meters and that the experience of the engineer operating the fader is a significant factor.
Convention Paper 8973 (Purchase now)

P11-5 Validation of the Binaural Room Scanning Method for Cinema Audio ResearchLinda A. Gedemer, University of Salford - Salford, UK; Harman International - Northridge, CA, USA; Todd Welti, Harman International - Northridge, CA, USA
Binaural Room Scanning (BRS) is a method of capturing a binaural representation of a room using a dummy head with binaural microphones in the ears and later reproducing it over a pair of calibrated headphones. In this method multiple measurements are made at differing head angles that are stored separately as data files. A playback system employing headphones and a headtracker recreates the original environment for the listener, so that as they turn their head, the rendered audio during playback matches the listeners' current head angle. This paper reports the results of a validation test of a custom BRS system that was developed for research and evaluation of different loudspeakers and different listening spaces. To validate the performance of the BRS system, listening evaluations of different in-room equalizations of a 5.1 loudspeaker system were made both in situ and via the BRS system. This was repeated using three different loudspeaker systems in three different sized listening rooms.
Convention Paper 8974 (Purchase now)
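
The head-tracked playback described in the abstract above has to pick, for each tracker reading, one of the stored measurements. A minimal sketch of that lookup follows. The 30° measurement grid and the function name are assumptions for illustration, and the system in the paper may interpolate between measurements rather than pick the nearest one.

```python
def nearest_measured_angle(head_angle_deg, measured_angles_deg):
    """Return the measured head angle closest to the tracker reading."""

    def circular_distance(a, b):
        # Shortest angular distance on a circle, in degrees.
        diff = abs(a - b) % 360.0
        return min(diff, 360.0 - diff)

    return min(
        measured_angles_deg,
        key=lambda m: circular_distance(head_angle_deg, m),
    )


# Hypothetical set of head angles at which the room was measured.
measured = [-60, -30, 0, 30, 60]
print(nearest_measured_angle(22.0, measured))
print(nearest_measured_angle(350.0, measured))
```

The circular distance matters when the tracker reports angles in a 0° to 360° range while the measurements are stored as signed angles.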

 
 

P12 - Signal Processing


Friday, October 18, 3:00 pm — 4:30 pm (1EFoyer)

P12-1 Temporal Synchronization for Audio Watermarking Using Reference Patterns in the Time-Frequency DomainTobias Bliem, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Juliane Borsum, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Giovanni Del Galdo, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Stefan Krägeloh, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Temporal synchronization is an important part of any audio watermarking system that involves an analog audio signal transmission. We propose a synchronization method based on the insertion of two-dimensional reference patterns in the time-frequency domain. The synchronization patterns consist of a combination of orthogonal sequences and are continuously embedded along with the transmitted data, so that the information capacity of the watermark is not affected. We investigate the relation between synchronization robustness and payload robustness and show that the length of the synchronization pattern can be used to tune a trade-off between synchronization robustness and the probability of false positive watermark decodings. Interpreting the two-dimensional binary patterns as one-dimensional N-ary sequences, we derive a bound for the autocorrelation properties of these sequences to facilitate an exhaustive search for good patterns.
Convention Paper 8975 (Purchase now)
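
The idea of exhaustively searching for patterns with good autocorrelation can be sketched on a much simpler case than the one in the paper. The example below uses one-dimensional binary (±1) sequences, the aperiodic autocorrelation, and plain brute force with no pruning bound. All of these are simplifying assumptions, and the function names are hypothetical.

```python
from itertools import product


def peak_sidelobe(seq):
    """Largest off-peak magnitude of the aperiodic autocorrelation."""
    n = len(seq)
    return max(
        abs(sum(seq[i] * seq[i + lag] for i in range(n - lag)))
        for lag in range(1, n)
    )


def best_sequences(length):
    """Brute-force search over all +/-1 sequences of the given length."""
    best, winners = None, []
    for seq in product((1, -1), repeat=length):
        psl = peak_sidelobe(seq)
        if best is None or psl < best:
            best, winners = psl, [seq]
        elif psl == best:
            winners.append(seq)
    return best, winners


best_psl, winners = best_sequences(7)
print(best_psl, len(winners))
```

For length 7 the search space is only 128 sequences. The cost doubles with every added element, which is why a bound that rules out candidates early becomes useful for longer patterns.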

P12-2 Sound Source Separation Using Interaural Intensity Difference in Real EnvironmentsChan Jun Chun, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea
In this paper, a sound source separation method is proposed by using the interaural intensity difference (IID) of stereo audio signal recorded in real environments. First, in order to improve the channel separability, a minimum variance distortionless response (MVDR) beamformer is employed to increase the intensity difference between stereo channels. Then, IID between stereo channels processed by the beamformer is computed and applied to sound source separation. The performance of the proposed sound source separation method is evaluated on the stereo audio source separation evaluation campaign (SASSEC) measures. It is shown from the evaluation that the proposed method outperforms a sound source separation method without applying a beamformer.
Convention Paper 8976 (Purchase now)
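
As a minimal sketch of the IID step only, the example below computes the interaural intensity difference per time-frequency bin and assigns each bin to one of two sources. The MVDR beamformer stage is omitted, and the 0 dB threshold, the two-source assumption, and the function names are choices made for this illustration rather than details from the paper.

```python
import math


def iid_db(left_mag, right_mag, eps=1e-12):
    """Interaural intensity difference of one bin, in dB (left re right)."""
    return 20.0 * math.log10((left_mag + eps) / (right_mag + eps))


def assign_bins(left_mags, right_mags, threshold_db=0.0):
    """Label each bin by the channel in which it is stronger."""
    return [
        "left" if iid_db(lm, rm) > threshold_db else "right"
        for lm, rm in zip(left_mags, right_mags)
    ]


# Hypothetical magnitudes of four time-frequency bins in each channel.
left = [1.0, 0.2, 0.8, 0.1]
right = [0.5, 0.4, 0.1, 0.9]
labels = assign_bins(left, right)
print(labels)
```

The labels act as a binary mask: applying each mask to the mixture spectrum and inverting the transform would give one estimated signal per source.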

P12-3 Reverberation and Dereverberation Effect on Byzantine ChantsAlexandros Tsilfidis, accusonus, Patras Innovation Hub - Patras, Greece; Charalampos Papadakos, University of Patras - Patras, Greece; Elias Kokkinis, accusonus - Patras, Greece; Georgios Chryssochoidis, National and Kapodistrian University of Athens - Athens, Greece; Dimitrios Delviniotis, National and Kapodistrian University of Athens - Athens, Greece; Georgios Kouroupetroglou, National and Kapodistrian University of Athens - Athens, Greece; John Mourjopoulos, University of Patras - Patras, Greece
Byzantine music is typically monophonic and is characterized by (i) prolonged music phrases and (ii) Byzantine scales that often contain intervals smaller than the Western semitone. As happens with most religious music genres, reverberation is a key element of Byzantine music. Byzantine churches/cathedrals are usually characterized by particularly diffuse fields and very long Reverberation Time (RT) values. In the first part of this work, the perceptual effect of long reverberation on Byzantine music excerpts is investigated. Then, a case where Byzantine music is recorded in non-ideal acoustic conditions is considered. In such scenarios, a sound engineer might need to add artificial reverb to the recordings. Here it is suggested that the step of adding extra reverberation can be preceded by dereverberation processing to suppress the originally recorded non-ideal reverberation. Therefore, in the second part of the paper a subjective test is presented that evaluates the above sound engineering scenario.
Convention Paper 8977 (Purchase now)

P12-4 Cepstrum-Based Preprocessing for Howling Detection in Speech ApplicationsRenhua Peng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Jian Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Chengshi Zheng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Xiaoliang Chen, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China
Conventional howling detection algorithms exhibit dramatic performance degradations in the presence of harmonic components of speech that have properties similar to those of the howling components. To solve this problem, this paper proposes a cepstrum preprocessing-based howling detection algorithm. First, the impact of howling components on cepstral coefficients is studied in both theory and simulation. Second, according to the theoretical results, the cepstrum preprocessing-based howling detection algorithm is proposed. The Receiver Operating Characteristic (ROC) simulation results indicate that the proposed algorithm can increase the detection probability at the same false alarm rate. Objective measurements, such as Speech Distortion (SD) and Maximum Stable Gain (MSG), further confirm the validity of the proposed algorithm.
Convention Paper 8978 (Purchase now)

P12-5 Delayless Method to Suppress Transient Noise Using Speech Properties and Spectral CoherenceChengshi Zheng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Xiaoliang Chen, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Shiwei Wang, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Renhua Peng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China
This paper proposes a novel delayless transient noise reduction method that is based on speech properties and spectral coherence. The proposed method has three stages. First, the transient noise components are detected in each subband by using energy-normalized variance. Second, we apply the harmonic property of the voiced speech and the continuity of the speech signal to reduce speech distortion in voiced speech segments. Third, we define a new spectral coherence to distinguish the unvoiced speech from the transient noise to avoid suppressing the unvoiced speech. Compared with existing methods, the proposed method is computationally efficient and causal. Experimental results show that the proposed algorithm can effectively suppress transient noise up to 30 dB without introducing audible speech distortion.
Convention Paper 8979 (Purchase now)

P12-6 Artificial Stereo Extension Based on Hidden Markov Model for the Incorporation of Non-Stationary Energy TrajectoryNam In Park, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Kwang Myung Jeon, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Seung Ho Choi, Prof., Seoul National University of Science and Technology - Seoul, Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea
In this paper an artificial stereo extension method is proposed to provide stereophonic sound from mono sound. While frame-independent artificial stereo extension methods, such as Gaussian mixture model (GMM)-based extension, do not consider the correlation of energies of previous frames, the proposed stereo extension method employs a minimum mean-squared error estimator based on a hidden Markov model (HMM) for the incorporation of non-stationary energy trajectory. The performance of the proposed stereo extension method is evaluated by a multiple stimuli with a hidden reference and anchor (MUSHRA) test. It is shown from the statistical analysis of the MUSHRA test results that the stereo signals extended by the proposed stereo extension method have significantly better quality than those of a GMM-based stereo extension method.
Convention Paper 8980 (Purchase now)

P12-7 Simulation of an Analog Circuit of a Wah Pedal: A Port-Hamiltonian ApproachAntoine Falaize-Skrzek, IRCAM - Paris, France; Thomas Hélie, IRCAM-CNRS UMR 9912-UPMC - Paris, France
Several methods are available to simulate electronic circuits. However, for nonlinear circuits, the stability guarantee is not straightforward. In this paper the approach of the so-called "Port-Hamiltonian Systems" (PHS) is considered. This framework naturally preserves the energetic behavior of elementary components and the power exchanges between them. This guarantees the passivity of the (source-free part of the) circuit.
Convention Paper 8981 (Purchase now)

P12-8 Improvement in Parametric High-Band Audio Coding by Controlling Temporal Envelope with Phase ParameterKijun Kim, Kwangwoon University - Seoul, Korea; Kihyun Choo, Samsung Electronics Co., Ltd. - Suwon, Korea; Eunmi Oh, Samsung Electronics Co., Ltd. - Suwon, Korea; Hochong Park, Kwangwoon University - Seoul, Korea
This study proposes a method to improve temporal envelope control in parametric high-band audio coding. Conventional parametric high-band coders may have difficulties with controlling the fine high-band temporal envelope, which can cause deterioration in sound quality for certain audio signals. In this study a novel method is designed to control the temporal envelope using spectral phase as an additional parameter. The objective and the subjective evaluations suggest that the proposed method should improve the quality of sound whose temporal envelope is severely degraded by the conventional method.
Convention Paper 8982 (Purchase now)

 
 

P13 - Applications in Audio—Part 2


Saturday, October 19, 9:00 am — 11:30 am (Room 1E07)

Chair:
Hans Riekehof-Boehmer, SCHOEPS Mikrofone - Karlsruhe, Germany

P13-1 Level-Normalization of Feature Films Using Loudness vs Speech
Esben Skovenborg, TC Electronic - Risskov, Denmark; Thomas Lund, TC Electronic A/S - Risskov, Denmark
We present an empirical study of the differences between level-normalization of feature films using the two dominant methods: loudness normalization and speech (“dialog”) normalization. The sound of 35 recent “blockbuster” DVDs was analyzed using both methods. The difference in normalization level was up to 14 dB and 5.5 dB on average. For all films the loudness method provided the lowest normalization level and hence the greatest headroom. Comparison of automatic speech measurement to manual measurement of dialog anchors shows a typical difference of 4.5 dB, with the automatic measurement producing the higher level. Employing the speech classifier to process rather than measure the films, a listening test suggested that the automatic measure is positively biased because it sometimes fails to distinguish between “normal speech” and speech combined with “action” sounds. Finally, the DialNorm values encoded in the AC-3 streams on DVDs were compared to both the automatically and the manually measured speech levels and found to match neither one well. AES 135th Convention Best Peer-Reviewed Paper Award Cowinner
Convention Paper 8983

P13-2 Sound Identification from MPEG-Encoded Audio Files
Joseph G. Studniarz, Montana State University - Bozeman, MT, USA; Robert C. Maher, Montana State University - Bozeman, MT, USA
Numerous methods have been proposed for searching and analyzing long-term audio recordings for specific sound sources. It is increasingly common that audio recordings are archived using perceptual compression, such as MPEG-1 Layer 3 (MP3). Rather than performing sound identification upon the reconstructed time waveform after decoding, we operate on the undecoded MP3 audio data as a way to improve processing speed and efficiency. The compressed audio format is only partially processed using the initial bitstream unpacking of a standard decoder, but then the sound identification is performed directly using the frequency spectrum represented by each MP3 data frame. Practical uses are demonstrated for identifying anthropogenic sounds within a natural soundscape recording.
Convention Paper 8984

P13-3 Pilot Workload and Speech Analysis: A Preliminary Investigation
Rachel M. Bittner, New York University - New York, NY, USA; Durand R. Begault, Human Systems Integration Division, NASA Ames Research Center - Moffett Field, CA, USA; Bonny R. Christopher, San Jose State University Research Foundation, NASA Ames Research Center - Moffett Field, CA, USA
Prior research has questioned the effectiveness of speech analysis to measure a talker's stress, workload, truthfulness, or emotional state. However, the question remains regarding the utility of speech analysis for restricted vocabularies such as those used in aviation communications. A part-task experiment was conducted in which participants performed Air Traffic Control read-backs in different workload environments. Participants' subjective workload and the speech qualities of fundamental frequency (F0) and articulation rate were evaluated. A significant increase in subjective workload rating was found for high workload segments. F0 was found to be significantly higher during high workload while articulation rates were found to be significantly slower. No correlation was found to exist between subjective workload and F0 or articulation rate.
Convention Paper 8985

P13-4 Gain Stage Management in Classic Guitar Amplifier Circuits
Bryan Martin, McGill University - Montreal, QC, Canada
The guitar amplifier became a common tool in musical creation during the second half of the 20th Century. This paper attempts to detail some of the internal mechanisms by which the tones are created and their dependent interactions. Two early amplifier designs are examined to determine the circuit relationships and design decisions that came to define the sound of the electric guitar.
Convention Paper 8986

P13-5 Audio Pre-Equalization Models for Building Structural Sound Transmission Suppression
Cheng Shu, University of Rochester - Rochester, NY, USA; Fangyu Ke, University of Rochester - Rochester, NY, USA; Xiang Zhou, Bose Corporation - Framingham, MA, USA; Gang Ren, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
We propose a novel audio pre-equalization model that utilizes the transmission characteristics of building structures to reduce the interference reaching adjacent neighbors while maintaining the audio quality for the target listener. The audio transmission profiles are obtained by field acoustical measurements in several typical types of building structures. We also measure the spectrum of audio to adapt the pre-equalization model to a specific audio segment. We apply a computational auditory model to (1) monitor the perceptual audio quality for the target listener and (2) assess the interference caused to adjacent neighbors. The system performance is then evaluated using subjective rating experiments.
Convention Paper 8987

 
 

P14 - Transducers—Part 2: Headphones and Loudspeakers


Saturday, October 19, 2:30 pm — 6:00 pm (Room 1E07)

Chair:
Christopher Struck, CJS Labs - San Francisco, CA, USA

P14-1 Application of Matrix Analysis to Identification of Mechanical and Acoustical Parameters of Compression Drivers
Alexander Voishvillo, JBL/Harman Professional - Northridge, CA, USA
In the author's previous work, special measurement methods were used to obtain the transfer matrices of compression drivers. This data was coupled with the results of the FEA simulations of horns. This made it possible to simulate the frequency amplitude and directivity responses of horn drivers without building actual physical horns. In this work, a different set of measurements is used to obtain the transfer matrix of a vibrating diaphragm. This approach results in a more detailed and flexible method to analyze and design compression drivers. Other parameters used in the identification process are the electrical parameters of the motor and the acoustical parameters of the compression chamber and phasing plug. The method was used in the design and optimization of the new JBL dual-diaphragm compression driver to be used in a new JBL line array system.
Convention Paper 8988
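
To make the idea of coupling transfer matrices concrete, here is a minimal sketch of cascading two-port (ABCD) transfer matrices and computing the impedance seen at the input. It is a generic illustration, not the measurement or identification method of the paper: the helper names, the series and shunt elements, and the numeric values are assumptions chosen for this example.

```python
def matmul2(t1, t2):
    """Multiply two 2x2 transfer matrices given as ((a, b), (c, d))."""
    (a1, b1), (c1, d1) = t1
    (a2, b2), (c2, d2) = t2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))


def series_impedance(z):
    """Two-port for an impedance z in series with the signal path."""
    return ((1.0, z), (0.0, 1.0))


def shunt_admittance(y):
    """Two-port for an admittance y connected across the signal path."""
    return ((1.0, 0.0), (y, 1.0))


def cascade(*stages):
    """Overall transfer matrix of stages connected input to output, in order."""
    total = ((1.0, 0.0), (0.0, 1.0))
    for stage in stages:
        total = matmul2(total, stage)
    return total


def input_impedance(t, z_load):
    """Impedance seen at the input when the output is terminated by z_load."""
    (a, b), (c, d) = t
    return (a * z_load + b) / (c * z_load + d)
```

Because every stage is just a 2x2 matrix, a measured matrix for one part of a system could in principle be multiplied with a simulated matrix for another part; the elements may be complex and frequency dependent, in which case the cascade is evaluated once per frequency.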

P14-2 Application of Static and Dynamic Magnetic Finite Elements Analysis to Design and Optimization of Transducers Moving Coil Motors
Alexander Voishvillo, JBL/Harman Professional - Northridge, CA, USA; Felix Kochendörfer, JBL/Harman Professional - Northridge, CA, USA
Transducer motors are a potential source of nonlinear distortion. There are several nonlinear mechanisms that generate nonlinear distortion in motors. Typical loudspeaker nonlinear models include the dependence of the Bl-product and the voice coil inductance Lvc on the voice coil position and current. These effects cause nonlinearity in the driving force, electrodynamic damping, and generate nonlinear flux modulation and reluctance force. In reality, the voice coil inductance and resistive losses depend also on frequency. To take these effects into account, the so-called LR-2 impedance model is used. The L2 and R2 elements are nonlinear functions of the voice coil position and current. In this work a detailed analysis of a nonlinear model incorporating these elements is performed. The developed approach is illustrated by the FEA-based design and optimization of a new JBL ultra-linear transducer to be used in a new line array system.
Convention Paper 8989
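
The frequency dependence mentioned in this abstract can be sketched with the LR-2 model in its usual linear, small-signal form, assumed here to be Z(jw) = Re + jw*Le + (R2 * jw*L2) / (R2 + jw*L2), that is, the inductance Le in series with the parallel pair L2 and R2. The function name and parameter values below are hypothetical, and the dependence of the elements on voice coil position and current, which is the subject of the paper, is deliberately left out.

```python
import math


def lr2_impedance(f_hz, re, le, l2, r2):
    """Blocked voice-coil impedance of the linear LR-2 model at frequency f_hz.

    re: DC resistance (ohm), le: series inductance (H),
    l2, r2: parallel inductance (H) and resistance (ohm).
    """
    w = 2.0 * math.pi * f_hz
    z_le = 1j * w * le
    z_l2 = 1j * w * l2
    z_parallel = (r2 * z_l2) / (r2 + z_l2)
    return re + z_le + z_parallel
```

At DC the model reduces to `re`, and at very high frequencies the resistive part approaches `re + r2`, which is how the model captures losses that rise with frequency.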

P14-3 End-of-Line Test Concepts to Achieve and Maintain Yield and Quality in High Volume Loudspeaker Production
Gregor Schmidle, NTi Audio AG - Schaan, Liechtenstein
Managing high-volume loudspeaker production across multiple lines and locations is a challenging task that requires interdisciplinary skills. This paper offers concepts for designing and maintaining end-of-line test systems that help to achieve and maintain consistent yield and quality. Topics covered include acoustic and electric test parameter selection, mechanical test jig design, limit finding strategies, fault-tolerant workflow creation, test system calibration, and environmental influence handling, as well as the use of statistics and statistical process control.
Convention Paper 8990

P14-4 Advances in Impedance Measurement of Loudspeakers and Headphones
Steve Temme, Listen, Inc. - Boston, MA, USA; Tony Scott, Octave Labs, LLC - Eastchester, NY, USA
Impedance measurement is often the sole electrical measurement in a battery of QC tests on loudspeakers and headphones. Two test methods are commonly used: single channel and dual channel. Dual-channel measurement offers greater accuracy because both the voltage across the speaker (or headphone) and the voltage across the reference resistor are measured to calculate the impedance. Single-channel methods are more commonly used on the production line because they require only one channel of a stereo soundcard, which leaves the other free for simultaneous acoustic tests. They are less accurate, however, because they assume constant voltage or constant current. In this paper we discuss a novel electrical circuit that offers impedance measurement accuracy comparable to that of complex dual-channel methods while using just one channel. This is expected to become popular for high-throughput production line measurements where only one channel is available because the second channel of the typical soundcard is being used for simultaneous acoustic tests.
Convention Paper 8991
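
The difference between the two conventional methods can be sketched as follows, assuming a simple series circuit in which the source drives a reference resistor and the device under test (DUT) in series. The topology, function names, and values are assumptions for illustration only; the novel circuit discussed in the paper is not reproduced here.

```python
def impedance_dual_channel(v_dut, v_ref, r_ref):
    """Impedance from two measured phasors: the voltage across the DUT and
    the voltage across the reference resistor (current = v_ref / r_ref)."""
    return r_ref * v_dut / v_ref


def impedance_single_channel(v_ref, v_source_assumed, r_ref):
    """Impedance from one measured phasor, assuming the source voltage is
    constant and known, so that v_dut = v_source_assumed - v_ref."""
    return r_ref * (v_source_assumed - v_ref) / v_ref
```

If the real source voltage sags below the assumed value, the single-channel estimate is biased, while the dual-channel estimate is unaffected because both voltages are measured.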

P14-5 Auralization of Signal Distortion in Audio Systems—Part 1: Generic Modeling
Wolfgang Klippel, Klippel GmbH - Dresden, Germany
Auralization techniques are developed for generating a virtual output signal of an audio system where the different kinds of signal distortion are separately enhanced or attenuated to evaluate the impact on sound quality by systematic listening or perceptive modeling. The generation of linear, regular nonlinear and irregular nonlinear distortion components is discussed to select suitable models and measurements for the auralization of each component. New methods are presented for the auralization of irregular distortion generated by defects (e.g., rub & buzz) where no physical models are available. The auralization of signal distortion is a powerful tool for defining the target performance of an audio product in marketing, developing products at optimal performance-cost ratio and for ensuring sufficient quality in manufacturing.
Convention Paper 8992

P14-6 Free Plus Diffuse Sound Field Target Earphone Response Derived from Classical Room Acoustics Theory
Christopher Struck, CJS Labs - San Francisco, CA, USA
The typical standardized free or diffuse field reference or target earphone responses in general represent boundary conditions rather than a realistic listening situation. Therefore a model using classical room acoustics is introduced to derive a more realistic target earphone response in a direct plus diffuse sound field. The insertion gain concept as applied to earphone response measurements using an ear simulator equipped test manikin is detailed in order to appropriately apply the model output to a typical earphone design. Data for multiple sound sources, multiple rooms, and variants of the direct 0° on-axis free field response are shown. Limits of the method are discussed and the results are compared to the well-known free and diffuse field responses.
Convention Paper 8993
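
One way to picture a direct plus diffuse target is the textbook relation in which the squared pressure at distance r from a source is proportional to Q / (4*pi*r^2) + 4 / R, with Q the directivity factor and R the room constant. The sketch below uses that relation to weight a free-field and a diffuse-field response on an energy basis. This weighting scheme, the function names, and any response data passed in are assumptions for illustration and are not taken from the paper.

```python
import math


def direct_and_reverberant_weights(distance_m, room_constant_m2, directivity_q=1.0):
    """Fractions of the total sound energy carried by the direct and diffuse fields."""
    direct = directivity_q / (4.0 * math.pi * distance_m ** 2)
    reverberant = 4.0 / room_constant_m2
    total = direct + reverberant
    return direct / total, reverberant / total


def critical_distance_m(room_constant_m2, directivity_q=1.0):
    """Distance at which direct and diffuse energies are equal."""
    return math.sqrt(directivity_q * room_constant_m2 / (16.0 * math.pi))


def combined_target_db(free_field_db, diffuse_field_db, w_direct, w_reverberant):
    """Energy-weighted combination of two responses given in dB per frequency band."""
    return [10.0 * math.log10(w_direct * 10.0 ** (ff / 10.0)
                              + w_reverberant * 10.0 ** (df / 10.0))
            for ff, df in zip(free_field_db, diffuse_field_db)]
```

With weights of (1, 0) the result is the free-field response and with (0, 1) the diffuse-field response, so the two standardized responses appear as the limiting cases.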

P14-7 Listener Preferences for In-Room Loudspeaker and Headphone Target Responses
Sean Olive, Harman International - Northridge, CA, USA; Todd Welti, Harman International - Northridge, CA, USA; Elisabeth McMullin, Harman International - Northridge, CA, USA
Based on preference, listeners adjusted the relative bass and treble levels of three music programs reproduced through a high quality stereo loudspeaker system equalized to a flat in-room target response. The same task was repeated using a high quality circumaural headphone equalized to match the flat in-room loudspeaker response as measured at the eardrum reference point (DRP). The results show that listeners on average preferred an in-room loudspeaker target response that had 2 dB more bass and treble compared to the preferred headphone target response. There were significant variations in the preferred bass and treble levels due to differences in individual taste and listener training.
Convention Paper 8994

 
 

P15 - Applications in Audio—Part I


Saturday, October 19, 3:00 pm — 4:30 pm (1EFoyer)

P15-1 An Audio Game App Using Interactive Movement Sonification for Targeted Posture Control
Daniel Avissar, University of Miami - Coral Gables, FL, USA; Colby N. Leider, University of Miami - Coral Gables, FL, USA; Christopher Bennett, University of Miami - Coral Gables, FL, USA; Oygo Sound LLC - Miami, FL, USA; Robert Gailey, University of Miami - Coral Gables, FL, USA
Interactive movement sonification has been gaining validity as a technique for biofeedback and auditory data mining in research and development for gaming, sports, and physiotherapy. Naturally, the harvesting of kinematic data over recent years has been a function of an increased availability of more portable, high-precision sensor technologies, such as smart phones, and dynamic real-time programming environments, such as Max/MSP. Whereas the overlap of motor skill coordination and acoustic events has been a staple of musical pedagogy, musicians and music engineers have been surprisingly less involved than biomechanical, electrical, and computer engineers in research efforts in these fields. Thus, this paper proposes a prototype for an accessible virtual gaming interface that uses music and pitch training as positive reinforcement in the accomplishment of target postures.
Convention Paper 8995

P15-2 Evaluation of the SMPTE X-Curve Based on a Survey of Re-Recording Mixers
Linda A. Gedemer, University of Salford - Salford, UK; Harman International - Northridge, CA, USA
Cinema calibration methods, which include targeted equalization curves for both dub stages and cinemas, are currently used to ensure an accurate translation of a film's sound track from dub stage to cinema. In recent years, there has been an effort to reexamine how cinemas and dub-stages are calibrated with respect to preferred or standardized room response curves. Most notable is the work currently underway reviewing the SMPTE standard ST202:2010 "For Motion-Pictures - Dubbing Stages (Mixing Rooms), Screening Rooms and Indoor Theaters -B-Chain Electroacoustic Response." There are both scientific and anecdotal reasons to question the effectiveness of the SMPTE standard in its current form. A survey of re-recording mixers was undertaken in an effort to better understand the efficaciousness of the SMPTE standard from the users' point of view.
Convention Paper 8996

P15-3 An Objective Comparison of Stereo Recording Techniques through the Use of Subjective Listener Preference Ratings
Wei Lim, University of Michigan - Ann Arbor, MI, USA
Stereo microphone techniques offer audio engineers the ability to capture a soundscape that approximates how one might hear realistically. To illustrate the differences between six common stereo microphone techniques, namely XY, Blumlein, ORTF, NOS, AB, and Faulkner, I asked 12 study participants to rate recordings of a Yamaha Disklavier piano. I examined the inter-rating correlation between subjects to find a preferential trend toward near-coincident techniques. Further evaluation showed that there was a preference for clarity over spatial content in a recording. Subjects did not find that wider microphone placements provided for more spacious-sounding recordings. Using this information, this paper also discusses the need to re-evaluate how microphone techniques are typically categorized by distance between microphones.
Convention Paper 8997

P15-4 Tampering Detection of Digital Recordings Using Electric Network Frequency and Phase Angle
Jidong Chai, University of Tennessee - Knoxville, TN, USA; Yuming Liu, Electrical Power Research Institute, Chongqing Electric Power Corp. - Chongqing, China; Zhiyong Yuan, China Southern Power Grid - Guangzhou, China; Richard W. Conners, Virginia Polytechnic Institute and State University - Blacksburg, VA, USA; Yilu Liu, University of Tennessee - Knoxville, TN, USA; Oak Ridge National Laboratory
In the field of forensic authentication of digital audio recordings, the ENF (electric network frequency) Criterion is one of the possible tools and has shown promising results. An important task for forensic authentication is to determine whether a recording has been tampered with. Previous work performs tampering detection by looking for discontinuities in either the ENF or the phase angle extracted from digital recordings. However, using only frequency or only phase angle to detect tampering may not be sufficient. In this paper both frequency and phase angle, together with a corresponding reference database, are used for tampering detection of digital recordings, which results in more reliable detection. This paper briefly introduces the Frequency Monitoring Network (FNET) at UTK and its frequency and phase angle reference database. A short-time Fourier transform (STFT) is employed to estimate the ENF and phase angle embedded in audio files. A procedure for using the ENF criterion to detect tampering is proposed, covering signal preprocessing, ENF and phase angle estimation, frequency database matching, and tampering detection. Results show that utilizing frequency and phase angle jointly can improve the reliability of tampering detection in authentication of digital recordings.
Convention Paper 8998
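
The estimation and matching steps named in this abstract can be sketched, under simplifying assumptions, as a short-time Fourier analysis that scans a narrow band around the nominal mains frequency in each frame. Everything specific below is an assumption for illustration: the Hann window, the 0.01 Hz search grid, the 60 Hz nominal value, the tolerance, and the function names. The reference values stand in for a database such as the one described in the paper, and phase matching against the reference is omitted.

```python
import cmath
import math


def hann(n):
    """Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def dft_at(frame, fs, freq_hz):
    """Discrete Fourier transform of one frame evaluated at an arbitrary frequency."""
    return sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fs)
               for k, x in enumerate(frame))


def estimate_enf(signal, fs, frame_len, hop,
                 nominal_hz=60.0, span_hz=0.5, step_hz=0.01):
    """Return one (frequency_hz, phase_rad) pair per frame."""
    n_candidates = int(round(2.0 * span_hz / step_hz)) + 1
    candidates = [nominal_hz - span_hz + i * step_hz for i in range(n_candidates)]
    window = hann(frame_len)
    track = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [w * x for w, x in zip(window, signal[start:start + frame_len])]
        # Pick the candidate frequency with the largest spectral magnitude.
        best = max(candidates, key=lambda f: abs(dft_at(frame, fs, f)))
        track.append((best, cmath.phase(dft_at(frame, fs, best))))
    return track


def mismatched_frames(track, reference_hz, tolerance_hz=0.03):
    """Indices of frames whose frequency deviates from the reference track."""
    return [i for i, ((f, _), ref) in enumerate(zip(track, reference_hz))
            if abs(f - ref) > tolerance_hz]
```

Frames flagged by `mismatched_frames` would be candidates for closer inspection; a fuller procedure would also compare the phase angle and search the reference data for the best time alignment before comparing.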

P15-5 Portable Speech Encryption Based Anti-Tapping Device
C. R. Suthikshn Kumar, Defence Institute of Advanced Technology (DIAT) - Girinagar, Pune, India
Tapping telephones nowadays is a major concern. There is a need for a portable device that can be attached to a mobile phone to prevent tapping. Users want to encrypt their voice during conversation, mainly for privacy. Encrypting the conversation can defeat tapping of mobile calls, which the network operator may tap for various reasons. In this paper we propose a portable device that can be attached to a mobile or landline phone and serves as an anti-tapping device. The device encrypts the outgoing speech and decrypts the incoming encrypted speech in real time. The main idea is that speech is unintelligible when encrypted.
Convention Paper 8999

P15-6 Personalized Audio Systems—A Bayesian Approach
Jens Brehm Nielsen, Technical University of Denmark - Kongens Lyngby, Denmark; Widex A/S - Lynge, Denmark; Bjørn Sand Jensen, Technical University of Denmark - Kongens Lyngby, Denmark; Toke Jansen Hansen, Technical University of Denmark - Kongens Lyngby, Denmark; Jan Larsen, Technical University of Denmark - Kgs. Lyngby, Denmark
Modern audio systems are typically equipped with several user-adjustable parameters unfamiliar to most listeners. To obtain the best possible system setting, the listener is forced into non-trivial multi-parameter optimization with respect to the listener's own objective and preference. To address this, the present paper presents a general interactive framework for robust personalization of such audio systems. The framework builds on Bayesian Gaussian process regression in which the belief about the user's objective function is updated sequentially. The parameter setting to be evaluated in a given trial is carefully selected by sequential experimental design based on the belief. A Gaussian process model is proposed that incorporates assumed correlation among particular parameters, which provides better modeling capabilities compared to a standard model. A five-band constant-Q equalizer is considered for demonstration purposes, in which the equalizer parameters are optimized for each individual using the proposed framework. Twelve test subjects obtain a personalized setting with the framework, and these settings are significantly preferred to those obtained with random experimentation.
Convention Paper 9000

 
 

P16 - Spatial Audio—Part 2


Sunday, October 20, 9:00 am — 12:00 pm (Room 1E07)

Chair:
Jean-Marc Jot, DTS, Inc. - Los Gatos, CA, USA

P16-1 Defining the Un-Aliased Region for Focused Sources
Robert Oldfield, University of Salford - Salford, Greater Manchester, UK; Ian Drumm, University of Salford - Salford, Greater Manchester, UK
Sound field synthesis reproduction techniques such as wave field synthesis can accurately reproduce wave fronts of arbitrary curvature, including the wave fronts of a source in front of the array. The wave fronts are accurate up to the spatial aliasing frequency, above which there are no longer enough secondary sources (loudspeakers) to reproduce the wave front accurately; the resulting spatial aliasing contributions manifest as additional wave fronts propagating in directions other than intended. These contributions cause temporal, spectral, and spatial errors in the reproduced wave front. Focused sources (sources in front of the loudspeaker array) have a unique attribute in this sense in that there is a clearly defined region around the virtual source position that exhibits no spatial aliasing contributions even at extremely high frequencies. This paper presents a method for the full characterization of this un-aliased region using both a ray-based propagation model and a time domain approach.
Convention Paper 9001
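
For orientation, the spatial aliasing frequency mentioned in this abstract is often approximated by the conservative rule of thumb f = c / (2 * dx) for a linear array with loudspeaker spacing dx. The sketch below evaluates only that rule; it is an assumed simplification and does not describe the un-aliased region derived in the paper, and the function name and default speed of sound are illustrative.

```python
def spatial_aliasing_frequency_hz(spacing_m, speed_of_sound_m_s=343.0):
    """Worst-case spatial aliasing frequency of a linear array (rule of thumb)."""
    if spacing_m <= 0.0:
        raise ValueError("loudspeaker spacing must be positive")
    return speed_of_sound_m_s / (2.0 * spacing_m)
```

For example, a spacing of 12.5 cm with c = 343 m/s gives 1372 Hz, which shows why aliasing affects much of the audible band for practical loudspeaker spacings.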

P16-2 Using Ambisonics to Reconstruct Measured Soundfields
Samuel W. Clapp, Rensselaer Polytechnic Institute - Troy, NY, USA; Anne E. Guthrie, Rensselaer Polytechnic Institute - Troy, NY, USA; Arup Acoustics - New York, NY, USA; Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA; Ning Xiang, Rensselaer Polytechnic Institute - Troy, NY, USA
Spherical microphone arrays can measure a soundfield's spherical harmonic components, subject to certain bandwidth constraints depending on the array radius and the number and placement of the array's sensors. Ambisonics is designed to reconstruct the spherical harmonic components of a soundfield via a loudspeaker array and also faces certain limitations on its accuracy. This paper looks at how to reconcile these sometimes conflicting limitations to produce the optimum solution for decoding. In addition, binaural modeling is used as a method of evaluating the proposed decoding method and the accuracy with which it can reproduce a measured soundfield.
Convention Paper 9002

P16-3 Subjective Evaluation of Multichannel Sound with Surround-Height Channels
Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA; Doyuen Ko, Belmont University - Nashville, TN, USA; McGill University - Montreal, Quebec, Canada; Aparna Nagendra, Rochester Institute of Technology - Rochester, NY, USA; Wieslaw Woszczyk, McGill University - Montreal, QC, Canada
In this paper we report results from an investigation of listener perception of surround-height channels added to standard multichannel stereophonic reproduction. An ITU-R horizontal loudspeaker configuration was augmented by the addition of surround-height loudspeakers in order to reproduce concert hall ambience from above the listener. Concert hall impulse responses (IRs) were measured at three heights using an innovative microphone array designed to capture surround-height ambience. IRs were then convolved with anechoic music recordings in order to produce seven-channel surround sound stimuli. Listening tests were conducted in order to determine the perceived quality of surround-height channels as affected by three loudspeaker positions and three IR heights. Fifteen trained listeners compared each reproduction condition and ranked them based on their degree of appropriateness. Results indicate that surround-height loudspeaker position has a greater influence on perceived sound quality than IR height. Listeners considered the naturalness, spaciousness, envelopment, immersiveness, and dimension of the reproduced sound field when making judgments of surround-height channel quality.
Convention Paper 9003

P16-4 A Perceptual Evaluation of Recording, Rendering, and Reproduction Techniques for Multichannel Spatial Audio
David Romblom, McGill University - Montreal, Quebec, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, Quebec, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Catherine Guastavino, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada
The objective of this project is to perceptually evaluate the relative merits of two different spatial audio recording and rendering techniques within the context of two different multichannel reproduction systems. The two recording and rendering techniques are "natural," using main microphone arrays, and "virtual," using spot microphones, panning, and simulated acoustic delay. The two reproduction systems are the 3/2 system (5.1 surround) and a 12/2 system, where the frontal L/C/R triplet is replaced by a 12-loudspeaker linear array. The perceptual attributes of multichannel spatial audio have been established by previous authors. In this study magnitude ratings of selected spatial audio attributes are presented for the above treatments and results are discussed.
Convention Paper 9004

P16-5 The Optimization of Wave Field Synthesis for Real-Time Sound Sources Rendered in Non-Anechoic Environments
Ian Drumm, University of Salford - Salford, Greater Manchester, UK; Robert Oldfield, University of Salford - Salford, Greater Manchester, UK
Presented here is a technique that employs audio capture and adaptive recursive filter design to render, in real time, dynamic, interactive, and content-rich soundscapes within non-anechoic environments. Typically, implementations of wave field synthesis utilize convolution to compensate for the amplitude errors associated with the application of linear loudspeaker arrays. Although recursive filtering approaches have been suggested before, this paper aims to build on that work by presenting an approach that exploits quasi-Newton adaptive filter design to construct components of the filtering chain that help compensate for both the particular system configuration and the mediating environment. Early results utilizing in-house developed software running on a 112-channel wave field synthesis system show the potential to improve the quality of real-time 3-D sound rendering in less-than-ideal contexts.
Convention Paper 9005

P16-6 A Perceptual Evaluation of Room Effect Methods for Multichannel Spatial Audio
David Romblom, McGill University - Montreal, Quebec, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, Quebec, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Catherine Guastavino, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada
The room effect is an important aspect of sound recording technique and is typically captured separately from the direct sound. The perceptual attributes of multichannel spatial audio have been established by previous authors, while the psychoacoustic underpinnings of room perception are known to varying degrees. The Hamasaki Square, in combination with a delay plan and an aesthetic disposition to "natural" recordings, is an approach practiced by some sound recording engineers. This study compares the Hamasaki Square to an alternative room effect and to dry approaches in terms of a number of multichannel spatial audio attributes. A concurrent experiment investigated the same spatial audio attributes with regard to the microphone and reproduction approach. As such, the current study uses a 12/2 system based upon 3/2 (5.1 surround) where the frontal L/C/R triplet has been replaced by a linear wavefront reconstruction array. AES 135th Convention Student Technical Papers Award Cowinner
Convention Paper 9006

 
 

P17 - Applications in Audio—Part 2


Sunday, October 20, 10:30 am — 12:00 pm (1EFoyer)

P17-1 Source of ENF in Battery-Powered Digital Recordings
Jidong Chai, University of Tennessee - Knoxville, TN, USA; Fan Liu, Chongqing University - Chongqing, China; Zhiyong Yuan, China Southern Power Grid - Guangzhou, China; Richard W. Conners, Virginia Polytechnic Institute and State University - Blacksburg, VA, USA; Yilu Liu, University of Tennessee - Knoxville, TN, USA; Oak Ridge National Laboratory
Forensic audio authentication has developed remarkably over the last few years due to advances in digital recording and processing technology. The ENF (Electric Network Frequency) Criterion is one of the possible tools and has shown very promising results in forensic authentication of digital recordings. However, there are currently very few experiments and papers studying the source of the ENF signals present in digital recordings. In addition, it is unclear whether or not there are detectable ENF traces in battery-powered digital audio recordings. In this paper a study of the ENF source in battery-powered digital recordings is presented; it shows that ENF in these recordings may be caused mainly by low-frequency audible hum rather than by low-frequency electromagnetic field induction. This paper includes a number of experiments to explore the possible sources of ENF in battery-powered digital recordings. In these experiments, the electric and magnetic field strength in different locations is measured and the results of the corresponding ENF extraction are analyzed. Understanding this underlying phenomenon is critical for verifying the validity of ENF techniques.
Convention Paper 9007

P17-2 The Audio Performance Comparison and Method of Designing Switching Amplifiers Using GaN FET
Jaecheol Lee, Samsung Electronics Co., Ltd. - Suwon, Korea; Haejong Kim, Samsung Electronics Co., Ltd. - Suwon, Korea; Keeyeong Cho, Samsung Electronics Co., Ltd. - Suwon, Korea; Haekwang Park, Samsung Electronics DMC R&D Center - Suwon, Korea
This paper addresses the physical characteristics of FET materials, a method of designing switching amplifiers using GaN FETs, and a comparison of the audio performance of silicon and GaN FETs. The physical characteristics of GaN FETs are excellent, but technical limitations hinder their application to consumer electronics. A depletion-mode GaN FET is used in the proposed system. Its characteristics are better than those of an enhancement-mode device, but it is normally on. To solve this problem, a cascaded GaN switch block is used, which combines a depletion-mode GaN FET with an enhancement-mode Si FET. The proposed design achieves better audio performance than a switching amplifier built with silicon FETs.
Convention Paper 9008

P17-3 Audio Effect Classification Based on Auditory Perceptual Attributes
Thomas Wilmering, Queen Mary University of London - London, UK; György Fazekas, Queen Mary University of London - London, UK; Mark B. Sandler, Queen Mary University of London - London, UK
While the classification of audio effects has several applications in music production, the heterogeneity of possible taxonomies, as well as the many viable points of view for organizing effects, present research problems that are not easily solved. Creating extensible Semantic Web ontologies provides a possible solution to this problem. This paper presents the results of a listening test that facilitates the creation of a classification system based on auditory perceptual attributes that are affected by the application of audio effects. The obtained results act as a basis for a classification system to be integrated in a Semantic Web Ontology covering the domain of audio effects in the context of music production.
Convention Paper 9009

P17-4 Development of Volume Balance Adjustment Device for Voices and Background Sounds within Programs for Elderly People
Tomoyasu Komori, NHK Engineering System, Inc. - Setagaya-ku, Tokyo, Japan; Waseda University - Shinjuku-ku, Tokyo, Japan; Atsushi Imai, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Nobumasa Seiyama, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Reiko Takou, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Tohru Takagi, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Yasuhiro Oikawa, Waseda University - Shinjuku-ku, Tokyo, Japan
Elderly people sometimes find the background sounds (music and sound effects) of broadcast programs annoying. In response, we have developed a receiver-side device that adjusts the mixing balance of program sounds to suit elderly listeners. In speech segments (intervals in which narration and dialog are mixed with background sounds), the device suppresses uncorrelated components in the stereo background sound; in non-speech segments, it suppresses the background sounds by gain control alone, without deterioration. Subjective evaluations verified that the proposed method can suppress the background sounds of programs by the equivalent of 6 dB, and viewing experiments with elderly people showed that program sounds became easier to understand.
Convention Paper 9010

P17-5 Acoustical Measurement Software Housed on Mobile Operating Systems Test
Felipe Tavera, Walters Storyk Design Group - Highland, NY, USA
A measurement test is devised to compare a dedicated Type I sound pressure level meter with a PDA running a mobile application with proprietary additional components. The test aims to analyze and compare results considering only frequency response, linearity over a selected dynamic range, and the transducer’s directivity under controlled on-site conditions. The purpose is to examine the accuracy of the non-dedicated hardware for acoustic measurements.
Convention Paper 9011

P17-6 Evaluating iBall—An Intuitive Interface and Assistive Audio Mixing Algorithm for Live Football Events
Henry Bourne, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
Mixing the on-pitch audio for a live football event is a mentally challenging task requiring the experience of a skilled operator to capture all the important audio events. iBall is an intuitive interface coupled with an assistive mixing algorithm that aids the operator in achieving a comprehensive mix. This paper presents the results of subjective and empirical evaluation of the system. Using multiple stimulus comparison, event counting, fader tracking, and cross-correlation of mixes using different systems, this paper shows that less skilled operators can produce more reliable, more dynamic, and more consistent mixes using iBall than when mixing using the traditional fader-based approach, reducing the level of skill required to create broadcast-quality mixes.
Convention Paper 9012

P17-7 A Definition of XML File Format and an Editor Application for Korean Traditional Music Notation System
Keunwoo Choi, Electronics and Telecommunications Research Institute (ETRI) - Daejeon, Korea; Yong Ju Lee, Electronics and Telecommunications Research Institute (ETRI) - Daejeon, Korea; Yongju Lee, Electronics and Telecommunications Research Institute (ETRI) - Daejeon, Korea; Kyeongok Kang, Electronics & Telecom. Research Institute (ETRI) - Daejeon, Korea
In this paper a computer-based system for representing Jeongganbo, the Korean traditional music notation system, is introduced. The system consists of an XML Document Type Definition, an editor application, and a converter into MusicXML. All information in Jeongganbo, including notes, directions, playing techniques, and lyrics, is encoded into XML using the grammar of the proposed Document Type Definition. In addition, users can create and edit Jeongganbo XML files using the proposed editor and export them as MusicXML files. As a result, users can represent, edit, and share the musical content of Korean traditional music in the digital domain, as well as analyze score-based content for information retrieval.
Convention Paper 9013

P17-8 The Structure of Noise Power Spectral Density-Driven Adaptive Post-Filtering Algorithm
Jie Wang, Guangzhou University - Guangzhou, China; Chengshi Zheng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Chunliang Zhang, Guangzhou University - Guangzhou, China; Yueyan Sun, Guangzhou University - Guangzhou, China
Conventional post-filtering (CPF) algorithms often use a fixed filter bandwidth to estimate the auto-spectra and the cross-spectrum. This paper first studies the drawback of the CPF algorithms under the stochastic model and discusses ways to improve their performance. To improve noise reduction without introducing audible speech distortion, we propose a novel spectral estimator based on the structure of the noise power spectral density (NPSD). The proposed spectral estimator is applied to improve the performance of the CPF. Experimental results verify that the proposed algorithm outperforms the CPF algorithms in terms of segmental signal-to-noise-ratio improvement and noise reduction; the noise reduction in particular is about 6 dB higher than that of the CPF.
Convention Paper 8943
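
As an illustration of the conventional approach the abstract starts from, the following is a minimal sketch of one common two-channel post-filter (a Zelinski-style gain) in which the auto-spectra and cross-spectrum are estimated by first-order recursive averaging with a fixed smoothing constant. The function name, the `alpha` value, and the use of recursive averaging as the "fixed bandwidth" estimator are assumptions made for illustration; this is not the NPSD-driven estimator proposed in the paper.

```python
def conventional_postfilter_gain(frames_x1, frames_x2, alpha=0.9):
    """Per-bin gain from two-channel STFT frames (lists of complex bins).

    Hypothetical helper: a Zelinski-style post-filter with a FIXED
    smoothing constant alpha, standing in for a conventional estimator.
    """
    n_bins = len(frames_x1[0])
    p11 = [0.0] * n_bins  # smoothed auto-spectrum, channel 1
    p22 = [0.0] * n_bins  # smoothed auto-spectrum, channel 2
    p12 = [0j] * n_bins   # smoothed cross-spectrum
    for x1, x2 in zip(frames_x1, frames_x2):
        for k in range(n_bins):
            p11[k] = alpha * p11[k] + (1 - alpha) * abs(x1[k]) ** 2
            p22[k] = alpha * p22[k] + (1 - alpha) * abs(x2[k]) ** 2
            p12[k] = alpha * p12[k] + (1 - alpha) * x1[k] * x2[k].conjugate()
    gains = []
    for k in range(n_bins):
        denom = 0.5 * (p11[k] + p22[k])
        g = p12[k].real / denom if denom > 0 else 0.0
        gains.append(min(max(g, 0.0), 1.0))  # clamp to [0, 1]
    return gains


if __name__ == "__main__":
    coherent = [[1 + 0j, 2 + 0j]] * 5
    print(conventional_postfilter_gain(coherent, coherent))
```

With identical (fully coherent) channels the gain approaches 1 in every bin, while channels whose cross-spectrum has no real part yield a gain of 0. A fixed `alpha` trades estimator variance against tracking speed in the same way for every bin and frame, which is the kind of limitation an adaptive estimator aims to remove.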

 
 

P18 - Perception—Part 2


Sunday, October 20, 2:00 pm — 4:00 pm (Room 1E07)

Chair:
Agnieszka Roginska, New York University - New York, NY, USA

P18-1 Negative Formant Space, “O Superman,” and Meaning
S. Alexander Reed, Ithaca College - New York, NY, USA
This in-progress exploration considers both some relationships between sounding and silent formants in music and the compositional idea of spectral aggregates. Using poststructuralist lenses and also interpretive spectrographic techniques informed by music theorist Robert Cogan, it offers a reading of Laurie Anderson’s 1982 hit “O Superman” that connects the aforementioned concerns of timbre with interpretive processes of musical meaning. In doing so, it contributes to the expanding musicological considerations of timbre beyond its physical, psychoacoustic, and orchestrational aspects.
Convention Paper 9014

P18-2 The Effects of Interaural Level Differences Caused by Interference between Lead and Lag on Summing Localization
M. Torben Pastore, Rensselaer Polytechnic Institute - Troy, NY, USA; Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA
Traditionally, the perception of an auditory event in the summing localization range is shown as a linear progression from a location between a coherent lead and lag to the lead location as the delay between them increases from 0 ms to approximately 1 ms. This experiment tested the effects of interference between temporally overlapping lead and lag stimuli on summing localization. We found that the perceived lateralization of the auditory event oscillates with the period of the center frequency of the stimulus, unlike what the traditional linear model would predict. Analysis shows that this is caused by interaural level differences due to interference between a coherent lead and lag.
Convention Paper 9015
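
The interference mechanism described in this abstract can be illustrated with a simplified model. The sketch below assumes a pure tone at the stimulus center frequency, equal lead and lag levels, a lead that reaches the left ear first and a lag that reaches the right ear first, each with the same interaural time difference. The function name and the 500 Hz and 300 µs defaults are hypothetical and are not taken from the paper's stimuli or analysis.

```python
import cmath
import math


def ild_db(delay_s, fc_hz=500.0, itd_s=300e-6):
    """ILD (left re right, in dB) for a coherent lead/lag tone pair.

    Hypothetical model: lead arrives at the left ear at t = 0 and at the
    right ear at t = itd_s; lag arrives at the right ear at t = delay_s
    and at the left ear at t = delay_s + itd_s.
    """
    w = 2.0 * math.pi * fc_hz
    left = 1.0 + cmath.exp(-1j * w * (delay_s + itd_s))
    right = cmath.exp(-1j * w * itd_s) + cmath.exp(-1j * w * delay_s)
    floor = 1e-12  # avoids log of zero at exact cancellation
    return 20.0 * math.log10(max(abs(left), floor) / max(abs(right), floor))


if __name__ == "__main__":
    for delay_ms in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"delay {delay_ms:4.2f} ms -> ILD {ild_db(delay_ms * 1e-3):+6.2f} dB")
```

In this model the level at each ear depends on the phase difference between lead and lag at that ear, so the ILD is zero at zero delay and repeats whenever the delay grows by one period of the tone (1/fc), rather than changing monotonically with delay.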

P18-3 Paired Comparison as a Method for Measuring Emotions
Judith Liebetrau, Ilmenau University of Technology - Ilmenau, Germany; Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Johannes Nowak, Ilmenau University of Technology - Ilmenau, Germany; Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Thomas Sporer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Ilmenau University of Technology - Ilmenau, Germany; Matthias Krause, Ilmenau University of Technology - Ilmenau, Germany; Martin Rekitt, Ilmenau University of Technology - Ilmenau, Germany; Sebastian Schneider, Ilmenau University of Technology - Ilmenau, Germany
Due to the growing complexity and functionality of multimedia systems, quality evaluation becomes a cross-disciplinary task that takes technology-centric assessment as well as human factors into account. Undoubtedly, emotions induced during perception have a reasonably high influence on the experienced quality. Therefore, the assessment of users’ affective state is of great interest for the development and improvement of multimedia systems. This work presents problems of common assessment methods as well as newly applied methods in emotion research. Direct comparison of stimuli, as a method intended for faster and easier assessment of emotions, is investigated and compared to previous work. The results showed that paired comparison seems inadequate for assessing multidimensional items or problems, which often occur in multimedia applications.
Convention Paper 9016

P18-4 Media Content Emphasis Using Audio Effect Contrasts: Building Quantitative Models from Subjective Evaluations
Xuchen Yang, University of Rochester - Rochester, NY, USA; Zhe Wen, University of Rochester - Rochester, NY, USA; Gang Ren, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
In this paper we study media content emphasis patterns of audio effects and construct their quantitative models using subjective evaluation experiments. The media content emphasis patterns are produced by contrasts between effect sections and non-effect sections, which change the focus of audience attention. We investigate media emphasis patterns of typical audio effects including equalization, reverberation, dynamic range control, and chorus. We compile audio test samples by applying different settings of audio effects and their permutations. Then we construct quantitative models based on the audience rating of the “subjective significance” of test audio segments. Statistical experiment design and analysis techniques are employed to establish the statistical significance of our proposed models.
Convention Paper 9017

 
 


