AES New York 2013
Recording & Production Track Event Details

Wednesday, October 16, 5:00 pm — 7:00 pm (Room 1E10)

Workshop: W21 - Lies, Damn Lies, and Audio Gear Specs

Chair:
Ethan Winer, RealTraps - New Milford, CT, USA
Panelists:
Scott Dorsey, Williamsburg, VA, USA
David Moran, Boston Audio Society - Wayland, MA, USA
Mike Rivers, Gypsy Studio - Falls Church, VA, USA

Abstract:
The fidelity of audio devices is easily measured, yet vendors and magazine reviewers often omit important details. For example, a loudspeaker review will state the size of the woofer but not the low frequency cut-off. Or the cut-off frequency is given, but without stating how many dB down or the rate at which the response rolls off below that frequency. Or it will state distortion for the power amps in a powered monitor but not the distortion of the speakers themselves, which of course is what really matters. This workshop therefore defines a list of standards that manufacturers and reviewers should follow when describing the fidelity of audio products. It will also explain why measurements are a better way to assess fidelity than listening alone.

Excerpts from this workshop are available on YouTube.
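
As a rough illustration of why a cut-off frequency means little without its "dB down" reference and roll-off rate, here is a minimal sketch. It assumes the loudspeaker's low end behaves like a second-order Butterworth high-pass, which is a simplification chosen for this example and not something the workshop specifies. The function names and the 50 Hz figure are hypothetical.

```python
import math

def highpass_db(f_hz, fc_hz):
    """Magnitude in dB of an assumed 2nd-order Butterworth high-pass.

    fc_hz is the -3 dB frequency of this model.
    """
    x = f_hz / fc_hz
    return 20.0 * math.log10(x * x / math.sqrt(1.0 + x ** 4))

def freq_at_level(level_db, fc_hz):
    """Frequency at which the model's response equals level_db (negative dB)."""
    p = 10.0 ** (level_db / 10.0)          # power ratio, 0 < p < 1
    return fc_hz * (p / (1.0 - p)) ** 0.25  # from |H|^2 = x^4 / (1 + x^4)

fc = 50.0  # hypothetical -3 dB point in Hz
f_minus_10 = freq_at_level(-10.0, fc)
# Roll-off far below cut-off, measured over one octave (5 Hz vs 2.5 Hz).
slope = highpass_db(5.0, fc) - highpass_db(2.5, fc)
print(f"-3 dB at {fc:.0f} Hz, -10 dB at {f_minus_10:.1f} Hz, "
      f"roll-off about {slope:.1f} dB/octave")
```

Under this model the same speaker could be advertised as reaching 50 Hz or roughly 29 Hz, depending only on which reference level is quoted.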

Thursday, October 17, 9:00 am — 11:00 am (Room 1E12)

Live Sound Seminar: LS1 - AC Power and Grounding

Chair:
Bruce C. Olson, Olson Sound Design - Brooklyn Park, MN, USA; Ahnert Feistel Media Group - Berlin, Germany
Panelist:
Bill Whitlock, Jensen Transformers, Inc. - Chatsworth, CA, USA; Whitlock Consulting - Oxnard, CA, USA

Abstract:
There is a lot of misinformation about what is needed for AC power for events, and much of it amounts to life-threatening advice. This panel will discuss how to provide AC power properly and safely and without causing noise problems. This session will cover power for small to large systems, from a couple of boxes on sticks up to multiple stages in ballrooms, road houses, and event centers; large scale installed systems, including multiple transformers and company switches, service types, generator sets, 1ph, 3ph, 240/120, 208/120. Get the latest information on grounding and typical configurations from this panel of industry veterans.

Thursday, October 17, 9:00 am — 10:30 am (Room 1E13)

Tutorial: T1 - FXpertise: Compression

Presenter:
Alex Case, University of Massachusetts Lowell - Lowell, MA, USA

Abstract:
Compressors were invented to control dynamic range. The next day, engineers started doing so much more—increasing loudness, improving intelligibility, adding distortion, extracting ambience, and, most importantly, reshaping timbre. This diversity of signal processing possibilities is realized only indirectly, by choosing the right compressor for the job and coaxing the parameters of ratio, threshold, attack, and release into place. Learn when to reach for compression, know a good starting place for compressor settings, and advance your understanding of what to listen for and which way to tweak.
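
To make the four parameters named above concrete, here is a minimal sketch of a hard-knee compressor's gain computer with one-pole attack/release smoothing. It is a textbook-style illustration, not taken from the tutorial. The function names and the example settings are assumptions.

```python
import math

def static_gain_db(level_db, threshold_db, ratio):
    """Gain change in dB for a hard-knee compressor (0 below threshold)."""
    if level_db <= threshold_db:
        return 0.0
    over = level_db - threshold_db
    # Output rises only over/ratio dB above threshold, so gain is negative.
    return over / ratio - over

def smooth_gain(gains_db, attack_s, release_s, sample_rate):
    """One-pole smoothing: attack when gain is falling, release when rising."""
    a_att = math.exp(-1.0 / (attack_s * sample_rate))
    a_rel = math.exp(-1.0 / (release_s * sample_rate))
    state = 0.0
    out = []
    for g in gains_db:
        coeff = a_att if g < state else a_rel
        state = coeff * state + (1.0 - coeff) * g
        out.append(state)
    return out

# Example: a -10 dB input, -20 dB threshold, 4:1 ratio.
target = static_gain_db(-10.0, -20.0, 4.0)
smoothed = smooth_gain([target] * 1000, attack_s=0.001,
                       release_s=0.1, sample_rate=48000)
print(target, round(smoothed[-1], 3))
```

A shorter attack makes the smoothed gain reach its target sooner. A longer release holds the gain reduction after the level falls back below threshold.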

Thursday, October 17, 9:00 am — 11:00 am (Room 1E07)

Paper Session: P1 - Transducers—Part 1: Microphones

Chair:
Helmut Wittek, SCHOEPS GmbH - Karlsruhe, Germany

P1-1 Portable Spherical Microphone for Super Hi-Vision 22.2 Multichannel Audio—Kazuho Ono, NHK Engineering System, Inc. - Setagaya-ku, Tokyo, Japan; Toshiyuki Nishiguchi, NHK Science & Technology Research Laboratories - Setagaya, Tokyo, Japan; Kentaro Matsui, NHK Science & Technology Research Laboratories - Setagaya, Tokyo, Japan; Kimio Hamasaki, NHK Science & Technology Research Laboratories - Setagaya, Tokyo, Japan
NHK has been developing a portable microphone for the simultaneous recording of 22.2ch multichannel audio. The microphone is 45 cm in diameter and has acoustic baffles that partition the sphere into angular segments, in each of which an omnidirectional microphone element is mounted. Owing to the baffles, each segment exhibits narrow-angle directivity with a constant beam width at frequencies above 6 kHz. The directivity becomes wider as frequency decreases and is almost omnidirectional below 500 Hz. The authors also developed a signal processing method that improves the directivity below 800 Hz.
Convention Paper 8922

P1-2 Sound Field Visualization Using Optical Wave Microphone Coupled with Computerized Tomography—Toshiyuki Nakamiya, Tokai University - Kumamoto, Japan; Fumiaki Mitsugi, Kumamoto University - Kumamoto, Japan; Yoichiro Iwasaki, Tokai University - Kumamoto, Japan; Tomoaki Ikegami, Kumamoto University - Kumamoto, Japan; Ryoichi Tsuda, Tokai University - Kumamoto, Japan; Yoshito Sonoda, Tokai University - Kumamoto, Kumamoto, Japan
The novel method, which we call the “Optical Wave Microphone (OWM)” technique, is based on a Fraunhofer diffraction effect between a sound wave and a laser beam. The light diffraction technique is an effective method for detecting sound and is flexible in practical use, as it involves only a simple optical lens system. OWM can also detect the sound wave without disturbing the sound field. This new method enables high-accuracy measurement of slight density changes in the atmosphere. Moreover, OWM can be used for sound field visualization by computerized tomography (CT) because the ultra-small modulation by the sound field is integrated along the laser beam path.
Convention Paper 8923

P1-3 Proposal of Optical Wave Microphone and Physical Mechanism of Sound Detection—Yoshito Sonoda, Tokai University - Kumamoto, Kumamoto, Japan; Toshiyuki Nakamiya, Tokai University - Kumamoto, Japan
An optical wave microphone with no diaphragm, which uses wave optics and a laser beam to detect sounds, can measure sounds without disturbing the sound field. The theoretical equation for this measurement can be derived from the optical diffraction integration equation coupled to the optical phase modulation theory, but the physical interpretation or meaning of this phenomenon is not clear from the mathematical calculation process alone. In this paper the physical meaning in relation to wave-optical processes is considered. Furthermore, the spatial sampling theorem is applied to the interaction between a laser beam with a small radius and a sound wave with a long wavelength, showing that the wavenumber resolution is lost in this case, and the spatial position of the maximum intensity peak of the optical diffraction pattern generated by a sound wave is independent of the sound frequency. This property can be used to detect complex tones composed of different frequencies with a single photo-detector. Finally, the method is compared with the conventional Raman-Nath diffraction phenomena relating to ultrasonic waves.
AES 135th Convention Best Peer-Reviewed Paper Award Cowinner
Convention Paper 8924

P1-4 Numerical Simulation of Microphone Wind Noise, Part 2: Internal Flow—Juha Backman, Nokia Corporation - Espoo, Finland
This paper discusses the use of computational fluid dynamics (CFD) for the analysis of microphone wind noise. The previous part of this work showed that an external flow produces a pressure difference on the external boundary, and that this pressure causes flow in the microphone's internal structures, mainly between the protective grid and the diaphragm. The examples presented in this work describe the effect of microphone grille structure and diaphragm properties on wind noise sensitivity, as related to the behavior of this kind of internal flow.
Convention Paper 8925

Thursday, October 17, 9:00 am — 12:00 pm (Room 1E09)

Paper Session: P2 - Signal Processing—Part 1

Chair:
Jaeyong Cho, Samsung Electronics DMC R&D Center - Suwon, Korea

P2-1 Linear Phase Implementation in Loudspeaker Systems: Measurements, Processing Methods, and Application Benefits—Rémi Vaucher, NEXO - Plailly, France
The aim of this paper is to present a new generation of EQ. It provides a way to ensure phase compatibility from 20 Hz to 20 kHz over a range of different speaker cabinets. This method is based on a mix of FIR filters and IIR filters. The use of FIR filters allows the phase to be tuned independently of magnitude and allows an acoustic linear phase above 500 Hz. All targets used to compute FIR coefficients are based upon extensive measurement and subjective listening tests. A template has been set to normalize the crossover frequencies in the low range, enabling compatibility of every sub-bass with the main cabinets.
Convention Paper 8926
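
As background to the linear-phase claim in the abstract above, here is a minimal sketch of the standard property that an FIR filter with symmetric taps has exactly linear phase, with a constant delay of (N-1)/2 samples. This is general DSP background, not the paper's method. The helper names and the example taps are hypothetical.

```python
import cmath
import math

def freq_response(taps, omega):
    """Frequency response of an FIR filter at omega (radians/sample)."""
    return sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps))

def has_symmetric_linear_phase(taps, n_points=64, tol=1e-9):
    """True if removing a delay of (N-1)/2 samples leaves a purely real response.

    This holds for symmetric taps; antisymmetric designs are not covered here.
    """
    delay = (len(taps) - 1) / 2.0
    for k in range(1, n_points):
        omega = math.pi * k / n_points
        zero_phase = freq_response(taps, omega) * cmath.exp(1j * omega * delay)
        if abs(zero_phase.imag) > tol:
            return False
    return True

print(has_symmetric_linear_phase([1.0, 2.0, 3.0, 2.0, 1.0]))  # symmetric taps
print(has_symmetric_linear_phase([1.0, 2.0, 3.0, 4.0, 5.0]))  # asymmetric taps
```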

P2-2 Applications of Inverse Filtering to the Optimization of Professional Loudspeaker Systems—Daniele Ponteggia, Studio Ponteggia - Terni (TR), Italy; Mario Di Cola, Audio Labs Systems - Casoli (CH), Italy
The application of FIR filter technology to implement inverse filtering in professional loudspeaker systems is now easier and more affordable, thanks to recent developments in DSP technology and to the existence of a new DSP platform dedicated to the end user. This paper presents an analysis, based on real-world examples, of a possible methodology for synthesizing an appropriate inverse filter, both to process a single driver in a multi-way system from a time-domain perspective and to process the output pass-band of a multi-way system for phase linearization. The results for some applications are analyzed and discussed through real-world tests and measurements.
Convention Paper 8927

P2-3 Live Event Performer Tracking for Digital Console Automation Using Industry-Standard Wireless Microphone Systems—Adam J. Hill, University of Derby - Derby, Derbyshire, UK; Kristian "Kit" Lane, University of Derby - Derby, UK; Adam P. Rosenthal, Gand Concert Sound - Elk Grove Village, IL, USA; Gary Gand, Gand Concert Sound - Elk Grove Village, IL, USA
The ever-increasing popularity of digital consoles for audio and lighting at live events provides a greatly expanded set of possibilities regarding automation. This research works toward a solution for performer tracking using wireless microphone signals that operates within the existing infrastructure at professional events. Principles of navigation technology such as received signal strength (RSS), time difference of arrival (TDOA), angle of arrival (AOA), and frequency difference of arrival (FDOA) are investigated to determine their suitability and practicality for use in such applications. Analysis of potential systems indicates that performer tracking is feasible over the width and depth of a stage using only two antennas with a suitable configuration, but limitations of current technology restrict the practicality of such a system.
Convention Paper 8928

P2-4 Real-Time Simulation of a Family of Fractional-Order Low-Pass Filters—Thomas Hélie, IRCAM-CNRS UMR 9912-UPMC - Paris, France
This paper presents a family of low-pass filters, the attenuation of which can be continuously adjusted from 0 decibel per octave (filter is a unit gain) to -6 decibels per octave (standard one-pole filter). This continuum is produced through a filter of fractional-order between 0 (unit gain) and 1 (one-pole filter). Such a filter proves to be a (continuous) infinite combination of one-pole filters. Efficient approximations are proposed from which simulations in the time-domain are built.
Convention Paper 8929
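
To illustrate the 0 to -6 dB per octave continuum described in the abstract above, here is a minimal sketch. It assumes the fractional-order filter has the form H(s) = 1 / (1 + s/wc)^alpha with alpha between 0 and 1. That form is an assumption made for this example, and the paper's exact definition may differ. Only the magnitude response is evaluated here. The paper's time-domain approximation is not reproduced.

```python
import math

def magnitude_db(f_hz, fc_hz, alpha):
    """Magnitude in dB of the assumed filter 1 / (1 + j f/fc)^alpha."""
    x = f_hz / fc_hz
    return -10.0 * alpha * math.log10(1.0 + x * x)

def slope_db_per_octave(fc_hz, alpha):
    """Asymptotic slope, measured over one octave far above cut-off."""
    f = 1000.0 * fc_hz
    return magnitude_db(2.0 * f, fc_hz, alpha) - magnitude_db(f, fc_hz, alpha)

for alpha in (0.0, 0.25, 0.5, 1.0):
    print(alpha, round(slope_db_per_octave(100.0, alpha), 2))
```

Under this assumption the high-frequency slope scales linearly with the order, at about -6.02 * alpha dB per octave.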

P2-5 A Computationally Efficient Behavioral Model of the Nonlinear Devices—Jaeyong Cho, Samsung Electronics DMC R&D Center - Suwon, Korea; Hanki Kim, Samsung Electronics DMC R&D Center - Suwon, Korea; Seungkwan Yu, Samsung Electronics DMC R&D Center - Suwon, Korea; Haekwang Park, Samsung Electronics DMC R&D Center - Suwon, Korea; Youngoo Yang, Sungkyunkwan University - Suwon, Korea
This paper presents a new computationally efficient behavioral model that reproduces the output signal of nonlinear devices in real-time systems. The proposed model is designed using the memory gain structure, and its accuracy and computational complexity are verified against other nonlinear models. The model parameters are extracted from a vacuum tube amplifier, Heathkit’s W-5M, using an exponentially swept sinusoidal signal. The experimental results show that the proposed model requires 27% of the computational load of the generalized Hammerstein model while maintaining similar modeling accuracy.
Convention Paper 8930

P2-6 High-Precision Score-Based Audio Indexing Using Hierarchical Dynamic Time Warping—Xiang Zhou, Bose Corporation - Framingham, MA, USA; Fangyu Ke, University of Rochester - Rochester, NY, USA; Cheng Shu, University of Rochester - Rochester, NY, USA; Gang Ren, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
We propose a novel audio signal processing algorithm for high-precision score-based audio indexing that accurately maps a music score to its corresponding audio. Specifically, we improve the time precision of existing score-audio alignment algorithms to find the accurate positions of audio onsets and offsets. We achieve higher time precision by (1) improving the resolution of alignment sequences, and (2) admitting a hierarchy of spectrographic analysis results as audio alignment features. The performance of our proposed algorithm is verified by comparing the segmentation results with manually composed reference datasets. Our proposed algorithm achieves robust alignment results and enhanced segmentation accuracy and thus is suitable for audio engineering applications such as automatic music production and human-media interactions.
Convention Paper 8931
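
For readers unfamiliar with the underlying technique, here is a minimal sketch of classic dynamic time warping between a score sequence and an audio feature sequence. It is the plain single-level algorithm, not the hierarchical variant the paper proposes. The one-number-per-frame "pitch" features are a hypothetical simplification.

```python
def dtw_path(a, b):
    """Classic DTW on two 1-D sequences; returns (total_cost, path).

    path is a list of (index_in_a, index_in_b) pairs from start to end.
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path

score = [1, 2, 3]            # hypothetical note pitches from the score
audio = [1, 1, 2, 2, 3, 3]   # hypothetical per-frame pitch estimates
total, path = dtw_path(score, audio)
# First audio frame aligned to each score note approximates its onset.
onsets = {}
for note, frame in path:
    onsets.setdefault(note, frame)
print(total, onsets)
```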

Thursday, October 17, 10:30 am — 12:00 pm (Room 1E13)

Workshop: W2 - FX Design Panel: Compression

Chair:
Alex Case, University of Massachusetts Lowell - Lowell, MA, USA
Panelists:
David Derr, Empirical Labs - Parsippany, NJ, USA
Dave Hill, Crane Song - Superior, WI, USA; Dave Hill Designs
Colin McDowell, McDSP - Sunnyvale, CA, USA

Abstract:
Meet the designers whose talents and philosophies are reflected in the products they create, driving sound quality, ease of use, reliability, price, and all the other attributes that motivate us to patch, click, and tweak their effects processors.

Thursday, October 17, 11:00 am — 1:00 pm (Room 1E12)

Workshop: W4 - Microphone Specifications—Believe it or Not

Chair:
Eddy B. Brixen, EBB-consult/DPA Microphones - Smorum, Denmark
Panelists:
Juergen Breitlow, Neumann - Berlin, Germany
Jackie Green, Audio-Technica U.S., Inc. - Stow, OH, USA
Bill Whitlock, Jensen Transformers, Inc. - Chatsworth, CA, USA; Whitlock Consulting - Oxnard, CA, USA
Helmut Wittek, SCHOEPS GmbH - Karlsruhe, Germany
Joerg Wuttke, Joerg Wuttke Consultancy - Pfinztal, Germany

Abstract:
There are lots and lots of microphones available to the audio engineer. The final choice is often made on the basis of experience or perhaps just habit. (Sometimes the mic is chosen because of the looks … .) Nevertheless, there is essential and very useful information to be found in the microphone specifications. This workshop will present the most important microphone specs and provide the attendees with up-to-date information on how these are obtained and understood. Each member of the panel, all from top industry brands, will present one item from the spec sheet. The workshop takes a critical look at how specs are presented to the user, what to look for, and what to expect. The workshop is organized by the AES Technical Committee on Microphones and Applications.

This session is presented in association with the AES Technical Committee on Microphones and Applications

Thursday, October 17, 2:15 pm — 3:45 pm (Room 1E08)

Broadcast and Streaming Media: B3 - Listener Fatigue and Retention

Chair:
Richard Burden, Richard W. Burden Associates - Canoga Park, CA, USA
Panelists:
Frank Foti, Telos - New York, NY, USA
Greg Ogonowski, Orban - San Leandro, CA, USA
Sean Olive, Harman International - Northridge, CA, USA
Robert Reams, Psyx Research
Elliot Scheiner

Abstract:
This panel will discuss listener fatigue and its impact on listener retention. While listener fatigue is an issue of interest to broadcasters, it is also an issue of interest to telecommunications service providers, consumer electronics manufacturers, music producers, and others. Fatigued listeners to a broadcast program may tune out, while fatigued listeners to a cell phone conversation may switch to another carrier, and fatigued listeners to a portable media player may purchase another company’s product. The experts on this panel will discuss their research and experiences with listener fatigue and its impact on listener retention.

Thursday, October 17, 2:30 pm — 4:30 pm (Room 1E07)

Paper Session: P4 - Room Acoustics

Chair:
Ben Kok, SCENA acoustic consultants - Uden, The Netherlands

P4-1 Investigating Auditory Room Size Perception with Autophonic Stimuli—Manuj Yadav, University of Sydney - Sydney, NSW, Australia; Densil A. Cabrera, University of Sydney - Sydney, NSW, Australia; Luis Miranda, University of Sydney - Sydney, NSW, Australia; William L. Martens, University of Sydney - Sydney, NSW, Australia; Doheon Lee, University of Sydney - Sydney, NSW, Australia; Ralph Collins, University of Sydney - Sydney, NSW, Australia
Although looking at a room gives a visual indicator of its “size,” auditory stimuli alone can also provide an appreciation of room size. This paper investigates such aurally perceived room size by allowing listeners to hear the sound of their own voice in real-time through two modes: natural conduction and auralization. The auralization process involved convolution of the talking-listener’s voice with an oral-binaural room impulse response (OBRIR; some from actual rooms, and others manipulated), which was output through head-worn ear-loudspeakers, and thus augmented natural conduction with simulated room reflections. This method allowed talking-listeners to rate room size without additional information about the rooms. The subjective ratings were analyzed against relevant physical acoustic measures derived from OBRIRs. The results indicate an overall strong effect of reverberation time on the room size judgments, expressed as a power function, although energy measures were also important in some cases.
Convention Paper 8934

P4-2 Digitally Steered Columns: Comparison of Different Products by Measurement and Simulation—Stefan Feistel, AFMG Technologies GmbH - Berlin, Germany; Anselm Goertz, Institut für Akustik und Audiotechnik (IFAA) - Herzogenrath, Germany
Digitally steered loudspeaker columns have become the predominant means to achieve satisfying speech intelligibility in acoustically challenging spaces. This work compares the performance of several commercially available array loudspeakers in a medium-size, reverberant church. Speech intelligibility as well as other acoustic quantities are compared on the basis of extensive measurements and computer simulations. The results show that formally different loudspeaker products provide very similar transmission quality. Also, measurement and modeling results match accurately within the uncertainty limits.
Convention Paper 8935

P4-3 A Concentric Compact Spherical Microphone and Loudspeaker Array for Acoustical Measurements—Luis Miranda, University of Sydney - Sydney, NSW, Australia; Densil A. Cabrera, University of Sydney - Sydney, NSW, Australia; Ken Stewart, University of Sydney - Sydney, NSW, Australia
Several commonly used descriptors of acoustical conditions in auditoria (ISO 3382-1) utilize omnidirectional transducers for their measurements, disregarding the directional properties of the source and the direction of arrival of reflections. This situation is further complicated when the source and the receiver are collocated as would be the case for the acoustical characterization of stages as experienced by musicians. A potential solution to this problem could be a concentric compact microphone and loudspeaker array, capable of synthesizing source and receiver spatial patterns. The construction of a concentric microphone and loudspeaker spherical array is presented in this paper. Such a transducer could be used to analyze the acoustic characteristics of stages for singers, while preserving the directional characteristics of the source, acquiring spatial information of reflections and preserving the spatial relationship between source and receiver. Finally, its theoretical response and optimal frequency range are explored.
Convention Paper 8936

P4-4 Adapting Loudspeaker Array Radiation to the Venue Using Numerical Optimization of FIR Filters—Stefan Feistel, AFMG Technologies GmbH - Berlin, Germany; Mario Sempf, AFMG Technologies GmbH - Berlin, Germany; Kilian Köhler, IBS Audio - Berlin, Germany; Holger Schmalle, AFMG Technologies GmbH - Berlin, Germany
Over the last two decades loudspeaker arrays have been employed increasingly for sound reinforcement. Their high output power and focusing ability facilitate extensive control capabilities as well as extraordinary performance. Based on acoustic simulation, numerical optimization of the array configuration, particularly of FIR filters, adds a new level of flexibility. Radiation characteristics can be established that are not available for conventionally tuned sound systems. It is shown that substantial improvements in sound field uniformity and output SPL can be achieved. Different real-world case studies are presented based on systematic measurements and simulations. Important practical implementation aspects are discussed such as the spatial resolution of driven sources, the number of FIR coefficients, and the quality of loudspeaker data.
Convention Paper 8937

Thursday, October 17, 2:30 pm — 5:00 pm (Room 1E09)

Paper Session: P5 - Signal Processing—Part 2

Chair:
Juan Pablo Bello, New York University - New York, NY, USA

P5-1 Evaluation of Dynamics Processors’ Effects Using Signal Statistics—Tim Shuttleworth, Renkus Heinz - Oceanside, CA, USA
Existing methods of evaluating the action of dynamics processors (i.e., limiters, compressors, expanders, and gates) do not provide results that correlate directly with the perceived and actual effect on the signal’s dynamics: aspects such as crest factor, dynamic range, and subjective acceptability of the processed signal, or the degree of optimization of the use of the transmission medium. A method is described that uses statistical analysis of the pre- and post-processed signal to allow the processor’s action to be characterized in a manner that correlates to the perceived effects and actual modification of signal dynamics. A number of signal statistical and user definable characteristics are introduced and, in addition to well-known statistical techniques, form the basis for this evaluation method.
Convention Paper 8938
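
Crest factor, one of the quantities named in the abstract above, is simple to compute before and after processing. Here is a minimal sketch using the standard peak-to-RMS definition. It is not the paper's evaluation method. The hard clipper stands in for a limiter purely for illustration, and all names are hypothetical.

```python
import math

def crest_factor_db(samples):
    """Peak-to-RMS ratio of a signal, in dB."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("crest factor is undefined for silence")
    return 20.0 * math.log10(peak / rms)

def hard_clip(samples, ceiling):
    """Crude stand-in for a limiter: clamp samples to +/- ceiling."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]

# One full cycle of a unit-amplitude sine, before and after clipping.
sine = [math.sin(2.0 * math.pi * n / 1000.0) for n in range(1000)]
clipped = hard_clip(sine, 0.5)
print(round(crest_factor_db(sine), 2), round(crest_factor_db(clipped), 2))
```

The unprocessed sine measures about 3 dB. The clipped version measures lower, reflecting the reduced peak-to-average ratio.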

P5-2 A New Ultra Low Delay Audio Communication Coder—Brijesh Singh Tiwari, ATC Labs - Noida, India; Midathala Harish, ATC Labs - Noida, India; Deepen Sinha, ATC Labs - Newark, NJ, USA
We propose a new full bandwidth audio codec with an algorithmic delay as low as 0.67 ms and no more than 2.7 ms. Low delay is a critical requirement in many real-time applications such as networked music performances, wireless speakers and microphones, and Bluetooth devices. The proposed Ultra Low Delay Audio Communication Codec (ULDACC) is a perceptual transform codec utilizing very small transform windows, the shape of which is optimized to compensate for the lack of frequency resolution. A specially adapted psychoacoustic model and intra-frame coding techniques are employed to achieve transparent audio quality for bit rates approaching 128 kbps/channel at an algorithmic delay of about 1 ms.
Convention Paper 8939

P5-3 Cascaded Long Term Prediction of Polyphonic Signals for Low Power Decoders—Tejaswi Nanjundaswamy, University of California, Santa Barbara - Santa Barbara, CA, USA; Kenneth Rose, University of California, Santa Barbara - Santa Barbara, CA, USA
An optimized cascade of long term prediction filters, each corresponding to an individual periodic component of the polyphonic audio signal, was shown in our recent work to be highly effective as an inter-frame prediction tool for low delay audio compression. The earlier paradigm involved backward adaptive parameter estimation, and hence significantly higher decoder complexity, which is unsuitable for applications that pose a stringent power constraint on the decoder. This paper overcomes this limitation via extension to include forward adaptive parameter estimation, in two modes that trade complexity for side information: (i) a subset of parameters is sent as side information and the remainder is backward adaptively estimated; (ii) all parameters are sent as side information. We further exploit inter-frame parameter dependencies to minimize the side information rate. Objective and subjective evaluation results clearly demonstrate substantial gains and effective control of the tradeoff between rate-distortion performance and decoder complexity.
Convention Paper 8940
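
The idea of cascading one long term prediction filter per periodic component can be sketched as follows. This minimal example assumes each stage is a single-tap predictor e[n] = x[n] - gain * x[n - lag] with known integer lags and unit gains. The paper's parameter estimation, whether forward or backward adaptive, is not modeled, and all names and test signals are hypothetical.

```python
def ltp_residual(x, lag, gain):
    """Residual of a single-tap long term predictor: e[n] = x[n] - gain*x[n-lag]."""
    return [x[n] - (gain * x[n - lag] if n >= lag else 0.0)
            for n in range(len(x))]

def cascaded_ltp_residual(x, stages):
    """Apply one predictor per (lag, gain) pair, each on the previous residual."""
    for lag, gain in stages:
        x = ltp_residual(x, lag, gain)
    return x

# Hypothetical "polyphonic" signal: two periodic components, periods 4 and 6.
comp_a = [1.0, 0.0, -1.0, 0.0]
comp_b = [2.0, 1.0, 0.0, -1.0, 0.0, 1.0]
signal = [comp_a[n % 4] + comp_b[n % 6] for n in range(48)]

single = ltp_residual(signal, 4, 1.0)
cascade = cascaded_ltp_residual(signal, [(4, 1.0), (6, 1.0)])
# After the start-up transient (4 + 6 samples) the cascade predicts both parts.
print(max(abs(e) for e in single[10:]), max(abs(e) for e in cascade[10:]))
```

A single stage removes only the component matching its lag. The cascade removes both, which is why the residual drops to zero once both delay lines are filled.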

P5-4 Voice Coding with Opus—Koen Vos, vocTone - San Francisco, CA, USA; Karsten Vandborg Sørensen, Microsoft - Stockholm, Sweden; Søren Skak Jensen, GN Netcom A/S - Ballerup, Denmark; Jean-Marc Valin, Mozilla Corporation - Mountain View, CA, USA
In this paper we describe the voice mode of the Opus speech and audio codec. As only the decoder is standardized, the details in this paper will help anyone who wants to modify the encoder or gain a better understanding of the codec. We go through the main components that constitute the voice part of the codec, provide an overview, give insights, and discuss the design decisions made during the development. Tests have shown that Opus quality is comparable to or better than several state-of-the-art voice codecs, while covering a much broader application area than competing codecs.
Convention Paper 8941

P5-5 High-Quality, Low-Delay Music Coding in the Opus Codec—Jean-Marc Valin, Mozilla Corporation - Mountain View, CA, USA; Gregory Maxwell, Mozilla Corporation; Timothy B. Terriberry, Mozilla Corporation; Koen Vos, vocTone - San Francisco, CA, USA
The IETF recently standardized the Opus codec as RFC 6716. Opus targets a wide range of real-time Internet applications by combining a linear prediction coder with a transform coder. We describe the transform coder, with particular attention to the psychoacoustic knowledge built into the format. The result out-performs existing audio codecs that don't operate under real-time constraints.
Convention Paper 8942

Thursday, October 17, 3:00 pm — 4:30 pm (1EFoyer)

Poster: P6 - Spatial Audio

P6-1 Improvement of 3-D Sound Systems by Vertical Loudspeaker Arrays—Akira Saji, University of Aizu - Aizuwakamatsu City, Japan; Keita Tanno, University of Aizu - Aizuwakamatsu, Fukushima, Japan; Jie Huang, University of Aizu - Aizuwakamatsu City, Japan
Recently we proposed a 3-D sound system using a horizontal arrangement of loudspeakers by combining the effect of HRTF and the amplitude panning method. In that system, loudspeakers are set at the height of the subject's ear level, and its sweet spot is limited by the height of the loudspeakers. When the listener's ear level differs from the loudspeaker height, sound localization becomes difficult or breaks down. However, it is difficult to match the height of the loudspeakers to the subject's ear level every time. In this paper we aimed to improve the robustness of the 3-D sound system using vertical loudspeaker arrays. Experimental results show that the loudspeaker arrays can improve the robustness of the 3-D sound system.
Convention Paper 8944

P6-2 An Integrated Algorithm for Optimized Playback of Higher Order Ambisonics—Robert E. Davis, University of the West of Scotland - Paisley, Scotland, UK; D. Fraser Clark, University of the West of Scotland - Paisley, Scotland, UK
An algorithm is presented that gives improved playback performance of higher order ambisonic material on practical loudspeaker arrays. The optimizations are based on sound field reproduction theories with additional parameters to account for the compensation of loudspeaker and listener positioning constraints and numbers of listeners. Automatic calculation of loudspeaker distances is also achieved based on room dimensions and a gain calibration routine is incorporated. Results are given to quantify the resulting algorithm performance, informal listening tests were carried out, and aspects of implementation are discussed.
Convention Paper 8945

P6-3 I Hear NY3D: Ambisonic Capture and Reproduction of an Urban Sound Environment—Braxton Boren, New York University - New York, NY, USA; Areti Andreopoulou, New York University - New York, NY, USA; Michael Musick, New York University - New York, NY, USA; Hariharan Mohanraj, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA
This paper describes “I Hear NY3D,” a project for capturing and reproducing 3D soundfields in New York City. First order Ambisonic recordings of various locations in Manhattan have been made, to be used for both aesthetic and informational purposes. The collected data allows for the creation of high quality, fully immersive auditory soundscapes that can be played back on any periphonic speaker array configuration through real time matrixing. Binaural renderings of the same data are also available for more portable applications.
Convention Paper 8946

P6-4 I Hear NY3D: An Ambisonic Installation Reproducing NYC Soundscapes—Michael Musick, New York University - New York, NY, USA; Areti Andreopoulou, New York University - New York, NY, USA; Braxton Boren, New York University - New York, NY, USA; Hariharan Mohanraj, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA
This paper describes the development of a reproduction installation for the "I Hear NY3D" project. This project’s aim is the capture and reproduction of immersive soundfields around Manhattan. A means of creating an engaging reproduction of these soundfields through the medium of an installation will also be discussed. The goal for this installation is an engaging, immersive experience that allows participants to create connections to the soundscapes and observe relationships between the soundscapes. This required the consideration of how to best capture and reproduce these recordings, the presentation of simultaneous multiple soundscapes, and a means of interaction with the material.
Convention Paper 8947

P6-5 Auralization of Measured Room Impulse Responses Considering Head Movements—Anthony Parks, Rensselaer Polytechnic Institute - Troy, NY, USA; Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA; Samuel W. Clapp, Rensselaer Polytechnic Institute - Troy, NY, USA
The purpose of this paper is to describe a novel method for auralizing measured room impulse responses over headphones using impulse responses recorded from a 16-channel spherical microphone array, decoded to eight virtual loudspeakers and mixed down binaurally using nonindividualized HRTFs. The novelty of this method lies not in the ambisonic binaural-mixdown process but rather in the use of head pose estimation code from the Kinect API, sent to a Max/MSP patch using Open Sound Control messages. This provides a fast, reliable alternative for headphone auralizations that allow for head movements, without the need for head-related transfer function interpolation, by rotating the spherical harmonics according to the listener's head rotation.
Convention Paper 8948
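
Rotating the spherical-harmonic representation instead of interpolating HRTFs can be sketched in its simplest case: a yaw rotation of a first-order B-format signal. The paper works with a 16-channel array and head poses from the Kinect API, neither of which is modeled here. The channel convention (X front, Y left, Z up), the function names, and the omission of W-channel normalization are assumptions of this example.

```python
import math

def rotate_yaw(w, x, y, z, angle_rad):
    """Rotate a first-order sound field about the vertical axis.

    A source at azimuth theta ends up at azimuth theta + angle_rad.
    """
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return w, c * x - s * y, s * x + c * y, z

def compensate_head_yaw(w, x, y, z, head_yaw_rad):
    """Counter-rotate the field so sources stay fixed in the room."""
    return rotate_yaw(w, x, y, z, -head_yaw_rad)

# A source 30 degrees to the left; the listener turns 30 degrees toward it.
theta = math.radians(30.0)
field = (1.0, math.cos(theta), math.sin(theta), 0.0)
w2, x2, y2, z2 = compensate_head_yaw(*field, head_yaw_rad=theta)
print(round(x2, 6), round(y2, 6))  # source now directly ahead of the head
```

Because the rotation acts on the sound-field channels, the virtual loudspeakers and their HRTFs stay fixed relative to the head. Only this small matrix changes with each head pose update.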

P6-6 Reduced Representations of HRTF Datasets: A Discriminant Analysis Approach—Areti Andreopoulou, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA; Juan Pablo Bello, New York University - New York, NY, USA
This paper discusses reduced representations of HRTF datasets, fully descriptive of one’s personalized properties. The data reduction is achieved through elimination of the least discriminative binaural-filter pairs from a set. For this purpose Linear Discriminant Analysis (LDA) was applied on the Music and Audio Research Laboratory (MARL) database of repeated HRTF measurements, which resulted in 67% data reduction. The effectiveness of the sparse HRTF mapping is assessed by way of the performance of a database matching system, followed by a subjective evaluation study. The results indicate that participants have demonstrated strong preference towards the selected HRTF sets, in contrast to the generic KEMAR set and the least similar selections from the repository.
Convention Paper 8949

P6-7 Investigation of HRTF Sets Using Content with Limited Spatial Resolution—Johann-Markus Batke, Audio & Acoustics, Technicolor Research & Innovation - Hannover, Germany; Stefan Abeling, Audio & Acoustics, Technicolor Research & Innovation - Hannover, Germany; Stefan Balke, Leibniz Universität Hannover - Hannover, Germany; Gerald Enzner, Ruhr-Universität Bochum - Bochum, Germany
Headphone rendering of sound fields represented by Higher Order Ambisonics (HOA) is greatly facilitated by the binaural synthesis of virtual loudspeakers. Individualized head-related transfer function (HRTF) sets corresponding to the spatial positions of the virtual loudspeakers are used in conjunction with head-tracking to achieve the externalization of the sound event. We investigate the localization accuracy for HOA representations of limited spatial resolution using individualized and generic HRTF sets.
Convention Paper 8950

Thursday, October 17, 4:30 pm — 7:00 pm (Room 1E07)

Paper Session: P7 - Spatial Audio—Part 1

Chair:
Wieslaw Woszczyk, McGill University - Montreal, QC, Canada

P7-1 Reproducing Real-Life Listening Situations in the Laboratory for Testing Hearing Aids—Pauli Minnaar, Oticon A/S - Smørum, Denmark; Signe Frølund Albeck, Oticon A/S - Smørum, Denmark; Christian Stender Simonsen, Oticon A/S - Smørum, Denmark; Boris Søndersted, Oticon A/S - Smørum, Denmark; Sebastian Alex Dalgas Oakley, Oticon A/S - Smørum, Denmark; Jesper Bennedbæk, Oticon A/S - Smørum, Denmark
The main purpose of the current study was to demonstrate how a Virtual Sound Environment (VSE), consisting of 29 loudspeakers, can be used in the development of hearing aids. A listening test was done by recording everyday sound scenes with a spherical microphone array that has 32 microphone capsules. The playback in the VSE was implemented by convolving the recordings with inverse filters, which were derived by directly inverting a matrix of 928 measured transfer functions. While listening to 5 sound scenes, 10 hearing-impaired listeners could switch between hearing aid settings in real time, by interacting with a touch screen in a MUSHRA-like test. The setup proves to be very valuable for ensuring that hearing aid settings work well in real-world situations.
Convention Paper 8951

P7-2 Measuring Speech Intelligibility in Noisy Environments Reproduced with Parametric Spatial Audio—Teemu Koski, Aalto University - Espoo, Finland; Ville Sivonen, Cochlear Nordic AB - Vantaa, Finland; Ville Pulkki, Aalto University - Espoo, Finland; Technical University of Denmark - Denmark
This work introduces a method for speech intelligibility testing in reproduced sound scenes. The proposed method uses background sound scenes augmented by target speech sources and reproduced over a multichannel loudspeaker setup with time-frequency domain parametric spatial audio techniques. Subjective listening tests were performed to validate the proposed method: speech recognition thresholds (SRT) in noise were measured in a reference sound scene and in a room where the reference was reproduced by a loudspeaker setup. The listening tests showed that for normally-hearing test subjects the method provides nearly identical speech intelligibility compared to the real-life reference when using a nine-loudspeaker reproduction setup in anechoic conditions (<0.3 dB error in SRT). Due to the flexible technical requirements, the method is potentially applicable to clinical environments.
AES 135th Convention Student Technical Papers Award Cowinner
Convention Paper 8952

P7-3 On the Influence of Headphones on Localization of Loudspeaker Sources—Darius Satongar, University of Salford - Salford, Greater Manchester, UK; Chris Pike, BBC Research and Development - Salford, Greater Manchester, UK; University of York - Heslington, York, UK; Yiu W. Lam, University of Salford - Salford, UK; Tony Tew, University of York - York, UK
When validating systems that use headphones to synthesize virtual sound sources, a direct comparison between virtual and real sources is sometimes needed. This paper presents objective and subjective measurements of the influence of headphones on external loudspeaker sources. Objective measurements of the effect of a number of headphone models are given and analyzed using an auditory filter bank and binaural cue extraction. Objective results highlight that all of the headphones had an effect on localization cues. A subjective localization test was undertaken using one of the best performing headphones from the measurements. It was found that the presence of the headphones caused a small increase in localization error but also that the process of judging source location was different, highlighting a possible increase in the complexity of the localization task.
Convention Paper 8953

P7-4 Binaural Reproduction of 22.2 Multichannel Sound with Loudspeaker Array Frame—Kentaro Matsui, NHK Science & Technology Research Laboratories - Setagaya, Tokyo, Japan; Akio Ando, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan
NHK has proposed a 22.2 multichannel sound system to be an audio format for future TV broadcasting. The system consists of 22 loudspeakers and 2 low frequency effect loudspeakers for reproducing three-dimensional spatial sound. To allow 22.2 multichannel sound to be reproduced in homes, various reproduction methods that use fewer loudspeakers have been investigated. This paper describes binaural reproduction of 22.2 multichannel sound with a loudspeaker array frame integrated into a flat panel display. The processing for binaural reproduction is done in the frequency domain. Methods of designing inverse filters for binaural processing with expanded multiple control points are proposed to enlarge the listening area outside the sweet spot.
Convention Paper 8954

P7-5 An Offline Binaural Converting Algorithm for 3D Audio Contents: A Comparative Approach to the Implementation Using Channels and Objects—Romain Boonen, SAE Institute Brussels - Brussels, Belgium
This paper describes and compares two offline binaural converting algorithms based on HRTFs (Head-Related Transfer Functions) for 3D audio contents. Recognizing the widespread use of headphones by the typical modern audio content consumer, two strategies to binaurally translate the 3D mixes are explored in order to give a convincing 3D aural experience "on the go." Aiming for the best output quality possible and avoiding the compromises inherent to real-time processing, the paper compares the channel- and the object-based models, notably looking respectively into the spectral analysis of channels for usage of HRTFs at intermediate positions between the virtual speakers and the dynamic convolution of the HRTFs with the objects according to their position coordinates in time.
Convention Paper 8955

Thursday, October 17, 4:30 pm — 7:00 pm (Room 1E11)

Workshop: W8 - Digital Room Correction—Does it Really Work?

Chair:
Bob Katz, Digital Domain Mastering - Orlando, FL, USA
Panelists:
Ulrich Brüggemann, AudioVero - Herzebrock, Germany
Michael Chafee, Michael Chafee Enterprises - Sarasota, FL, USA
Will Eggleston, Genelec Inc. - Natick, MA, USA
Curt Hoyt, 3D Audio Consultant - Huntington Beach, CA, USA; Trinnov Audio USA Operations
Floyd Toole, Acoustical consultant to Harman, ex. Harman VP Acoustical Engineering - Oak Park, CA, USA

Abstract:
The practice of digital room and loudspeaker correction (DRC) is an especially fruitful beneficiary of Moore's law and increased skills among DSP programmers. DRC is a hot topic of interest for recording, mixing and mastering engineers, and studio designers. The workshop will explore the principles of DRC with three panelists and an expert guest.

This session is presented in association with the AES Technical Committee on Acoustics and Sound Reinforcement and AES Technical Committee on Recording Technology and Practices

Thursday, October 17, 5:00 pm — 7:00 pm (Room 1E15/16)

Special Event: Producing Across Generations: New Challenges, New Solutions—Making Records for Next to Nothing in the 21st Century

Moderator:
Nick Sansano
Panelists:
Frank Filipetti, the living room - West Nyack, NY, USA; METAlliance
Jesse Lauter, New York, NY, USA
Carter Matschullat
Bob Power
Kaleb Rollins, Grand Staff, LLC - Brooklyn, NY, USA
Hank Shocklee
Craig Street, Independent - New York, NY, USA

Abstract:
Budgets are small, retail is dying, studios are closing, fed-up audiences are taking music at will … yet devoted music professionals continue to make records for a living. How are they doing it? How are they getting paid? What type of contracts are they commanding? In a world where the “record” has become an artist’s business card, how will the producer and mixer derive participatory income? Are studio professionals being left out of the so-called 360 deals? Let’s get a quality bunch of young rising producers and a handful of seasoned vets in a room and finally open the discussion about empowerment and controlling our own destiny.

Friday, October 18, 9:00 am — 11:00 am (Room 1E11)

Game Audio: G3 - Scoring "Tomb Raider": The Music of the Game

Presenter:
Alex Wilmer, Crystal Dynamics

Abstract:
"Tomb Raider's" score has been critically acclaimed as being uniquely immersive and at a level of quality on par with film. It is a truly scored experience that has raised the bar for the industry. To achieve this, new techniques in almost every part of the music's production needed to be developed. This talk will focus on the process of scoring "Tomb Raider." Every aspect will be covered, from the music's creative direction and composition to its implementation and the technology behind it.

Friday, October 18, 9:00 am — 12:00 pm (Room 1E07)

Paper Session: P8 - Recording and Production

Chair:
Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada

P8-1 Music Consumption Behavior of Generation Y and the Reinvention of the Recording Industry—Barry Marshall, The New England Institute of Art - Brookline, MA, USA
This paper will give an overview of the last 15 years of the recording industry’s problems with piracy and decreasing sales, while reporting on research into the music consumption behavior of a group of audio students in both the United States and in eight European countries. Audio students have a unique perspective on the issues surrounding the recording industry’s problems since the advent of Napster and the later generations of illegal file sharing. Their insights into issues like the importance of access to music, the quality of the listening experience, and the ethical quandary of participating in copyright infringement, may help point to a direction for the future of the recording industry.
Convention Paper 8956

P8-2 (Re)releasing the Beatles—Brett Barry, Syracuse University - Syracuse, NY, USA
This paper presents a comparative analysis of various Beatles releases, including original 1960s vinyl, early compact discs, and present-day digital downloads through services like iTunes. I will provide original research using source material and interviews with persons directly involved in recording and releasing Beatles albums, examining variations in dynamic range, spectral distribution, psychoacoustics, and track anomalies. Considerations are given to mastering and remastering a catalog of classics.
Convention Paper 8957

P8-3 Maximum Averaged and Peak Levels of Vocal Sound Pressure—Braxton Boren, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA; Brian Gill, New York University - New York, NY, USA
This work describes research on the maximum sound pressure level achievable by the spoken and sung human voice. Trained actors and singers were measured for peak and averaged SPLs at an on-axis distance of 1 m at three different subjective dynamic levels and also for two different vocal techniques (“back” and “mask” voices). The “back” sung voice was found to achieve a consistently lower SPL than the “mask” voice at a corresponding dynamic level. Some singers were able to achieve high averaged levels with both spoken and sung voice, while others produced much higher levels singing than speaking. A few of the vocalists were able to produce averaged levels above 90 dBA, the highest found in the existing literature.
Convention Paper 8958

P8-4 Listener Adaptation in the Control Room: The Effect of Varying Acoustics on Listener Familiarization—Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Brett Leonard, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Scott Levine, Skywalker Sound - San Francisco, CA, USA; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Grzegorz Sikora, Bang & Olufsen Deutschland GmbH - Pullach, Germany
The area of auditory adaptation is of central importance to a recording engineer operating in unfamiliar or less-than-ideal acoustic conditions. This study prompts expert listeners to perform a controlled level-balancing task while exposed to three different acoustic conditions. The length of exposure is varied to test the role of adaptation on such a task. Results show that there is a significant difference in the variance of participants’ results when exposed to one condition for a longer period of time. In particular, subjects seem to most easily adapt to reflective acoustic conditions.
Convention Paper 8959

P8-5 Spectral Characteristics of Popular Commercial Recordings 1950-2010—Pedro Duarte Pestana, Catholic University of Oporto - CITAR - Oporto, Portugal; Lusíada University of Portugal (ILID); Centro de Estatística e Aplicacoes; Zheng Ma, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK; Alvaro Barbosa, Catholic University of Oporto - CITAR - Oporto, Portugal; Dawn A. A. Black, Queen Mary University of London - London, UK
In this work the long-term spectral contours of a large dataset of popular commercial recordings were analyzed. The aim was to analyze overall trends, as well as yearly and genre-specific ones. A novel method for averaging spectral distributions is proposed that yields results that lend themselves to comparison. With it, we found that there is a consistent leaning toward a target equalization curve that stems from practices in the music industry but also to some extent mimics natural, acoustic spectra of ensembles.
Convention Paper 8960

P8-6 A Knowledge-Engineered Autonomous Mixing System—Brecht De Man, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
In this paper a knowledge-engineered mixing engine is introduced that uses semantic mixing rules and bases mixing decisions on instrument tags as well as elementary, low-level signal features. Mixing rules are derived from practical mixing engineering textbooks. The performance of the system is compared to existing automatic mixing tools as well as human engineers by means of a listening test, and future directions are established.
Convention Paper 8961

Friday, October 18, 9:00 am — 11:30 am (Room 1E09)

Paper Session: P9 - Applications in Audio—Part I

Chair:
Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA

P9-1 Audio Device Representation, Control, and Monitoring Using SNMP—Andrew Eales, Wellington Institute of Technology - Wellington, New Zealand; Rhodes University - Grahamstown, South Africa; Richard Foss, Rhodes University - Grahamstown, Eastern Cape, South Africa
The Simple Network Management Protocol (SNMP) is widely used to configure and monitor networked devices. The architecture of complex audio devices can be elegantly represented using SNMP tables. Carefully considered table indexing schemes support a logical device model that can be accessed using standard SNMP commands. This paper examines the use of SNMP tables to represent the architecture of audio devices. A representational scheme that uses table indexes to provide direct-access to context-sensitive SNMP data objects is presented. The monitoring of parameter values and the implementation of connection management using SNMP are also discussed.
Convention Paper 8962

P9-2 IP Audio in the Real-World; Pitfalls and Practical Solutions Encountered and Implemented when Rolling Out the Redundant Streaming Approach to IP Audio—Kevin Campbell, WorldCast Systems /APT - Belfast, N Ireland; Miami, Florida
This paper will review the development of IP audio links for audio delivery and chiefly look at the possibility of harnessing the flexibility and cost-effectiveness of the public internet for professional audio delivery. We will discuss first the benefits of IP audio when measured against traditional synchronous audio delivery and also the typical problems associated with delivering real-time broadcast audio across packetized networks, specifically in the context of unmanaged IP networks. The paper contains an examination of some techniques employed to overcome these issues with an in-depth look at the redundant packet streaming approach.
Convention Paper 8963
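
The receiving side of the redundant packet streaming idea in the P9-2 abstract might look roughly like the sketch below. The abstract does not give protocol details, so everything here is an assumption for illustration: packets are modeled as `(sequence_number, payload)` tuples, the same packets are assumed to travel over two or more independent streams, and the hypothetical function `merge_redundant_streams` keeps the first copy of each sequence number it sees.

```python
def merge_redundant_streams(arrivals):
    """Return each packet once, sorted by sequence number.

    `arrivals` is an iterable of (sequence_number, payload) tuples as received
    from any of the redundant streams, interleaved in arrival order.
    """
    seen = set()
    unique = []
    for seq, payload in arrivals:
        if seq in seen:
            continue  # duplicate delivered by another stream; drop it
        seen.add(seq)
        unique.append((seq, payload))
    # Restore playout order, since a first copy may arrive late on one stream.
    unique.sort(key=lambda item: item[0])
    return unique
```

A packet lost on one stream is recovered as long as its copy arrives on another. A real receiver would do this inside a bounded jitter buffer rather than on a complete list.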

P9-3 Implementation of AES-64 Connection Management for Ethernet Audio/Video Bridging Devices—James Dibley, Rhodes University - Grahamstown, South Africa; Richard Foss, Rhodes University - Grahamstown, Eastern Cape, South Africa
AES-64 is a standard for the discovery, enumeration, connection management, and control of multimedia network devices. This paper describes the implementation of an AES-64 protocol stack and control application on devices that support the IEEE Ethernet Audio/Video Bridging standards for streaming multimedia, enabling connection management of network audio streams.
Convention Paper 8964

P9-4 Simultaneous Acquisition of a Massive Number of Audio Channels through Optical Means—Gabriel Pablo Nava, NTT Communication Science Laboratories - Kanagawa, Japan; Yutaka Kamamoto, NTT Communication Science Laboratories - Kanagawa, Japan; Takashi G. Sato, NTT Communication Science Laboratories - Kanagawa, Japan; Yoshifumi Shiraki, NTT Communication Science Laboratories - Kanagawa, Japan; Noboru Harada, NTT Communication Science Labs - Atsugi-shi, Kanagawa-ken, Japan; Takehiro Moriya, NTT Communication Science Labs - Atsugi-shi, Kanagawa-ken, Japan
Sensing sound fields at multiple locations often may become considerably time consuming and expensive when large wired sensor arrays are involved. Although several techniques have been developed to reduce the number of necessary sensors, less work has been reported on efficient techniques to acquire the data from all the sensors. This paper introduces an optical system, based on the concept of visible light communication, which allows the simultaneous acquisition of audio signals from a massive number of channels via arrays of light emitting diodes (LEDs) and a high speed camera. Similar approaches use LEDs to express the sound pressure of steady state fields as a scaled luminous intensity. The proposed sensor units, in contrast, transmit optically the actual digital audio signal sampled by the microphone in real time. Experiments to illustrate two examples of typical applications are presented: a remote acoustic imaging sensor array and a spot beamforming based on the compressive sampling theory. Implementation issues are also addressed to discuss the potential scalability of the system.
Convention Paper 8965

P9-5 Blind Microphone Analysis and Stable Tone Phase Analysis for Audio Tampering Detection—Luca Cuccovillo, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Sebastian Mann, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Patrick Aichroth, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Marco Tagliasacchi, Politecnico di Milano - Milan, Italy; Christian Dittmar, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany
In this paper we present an audio tampering detection method based on the combination of blind microphone analysis and phase analysis of stable tones, e.g., the electrical network frequency (ENF). The proposed algorithm uses phase analysis to detect segments that might have been tampered with. Afterwards, the segments are further analyzed using a feature vector able to discriminate among different microphone types. Using this combined approach, it is possible to achieve a significantly lower false-positive rate and higher reliability as compared to standalone phase analysis.
Convention Paper 8966
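
The phase-analysis half of the P9-5 method can be illustrated with a simplified sketch. It is not the authors' algorithm: it assumes a perfectly stable tone at a known frequency (real ENF drifts), ignores the microphone-classification stage entirely, and uses the hypothetical helper names `tone_phase_per_frame` and `phase_jumps`. Each frame is correlated against a reference tone tied to absolute sample time, so an untouched recording shows a constant phase and a cut shows up as a jump.

```python
import cmath
import math

def tone_phase_per_frame(samples, sample_rate, tone_hz, frame_len):
    """Estimate the phase of a stable tone in consecutive non-overlapping frames."""
    phases = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        acc = 0j
        for n in range(start, start + frame_len):
            # Reference phase depends on the absolute sample index n, not the frame.
            acc += samples[n] * cmath.exp(-2j * math.pi * tone_hz * n / sample_rate)
        phases.append(cmath.phase(acc))
    return phases

def phase_jumps(phases, threshold_rad=0.5):
    """Return indices of frames whose phase differs from the previous frame."""
    jumps = []
    for i in range(1, len(phases)):
        # Wrap the difference into [-pi, pi) before comparing.
        d = (phases[i] - phases[i - 1] + math.pi) % (2 * math.pi) - math.pi
        if abs(d) > threshold_rad:
            jumps.append(i)
    return jumps
```

For a 50 Hz tone sampled at 1 kHz, deleting five samples (a quarter cycle) shifts the phase of every later frame by about 90 degrees, which `phase_jumps` flags at the first affected frame.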

Friday, October 18, 10:30 am — 12:00 pm (Room 1E13)

Workshop: W12 - FX Design Panel: Equalization

Chair:
Francis Rumsey, Logophon Ltd. - Oxfordshire, UK
Panelists:
Nir Averbuch, Sound Radix Ltd. - Israel
George Massenburg, Schulich School of Music, McGill University - Montreal, Quebec, Canada
Saul Walker, New York University - New York, NY, USA

Friday, October 18, 11:00 am — 12:00 pm (Room 1E15/16)

Special Event: Friday Keynote: The Current and Future Direction of the Recording Process from an Artist, Engineer, and Producer’s Perspective

Presenter:
Jimmy Jam, Flyte Tyme Productions

Abstract:
Five-time GRAMMY Award winner Jimmy Jam is a renowned songwriter, record producer, musician, entrepreneur, and half of the most influential and successful writing/producing duo in modern music history. Since forming their company Flyte Tyme Productions in 1982, Jam and partner Terry Lewis have collaborated with such diverse and legendary artists as Janet Jackson, Mary J. Blige, Gwen Stefani, Robert Palmer, Mariah Carey, Boyz II Men, Rod Stewart, Yolanda Adams, Sting, Heather Headley, Usher, Celine Dion, Kanye West, Chaka Khan, Trey Songz, and Michael Jackson, among others. Jimmy and Terry have written and/or produced over 100 albums and singles that have reached gold, platinum, multi-platinum, or diamond status, including 26 No. 1 R&B and 16 No. 1 pop hits, giving the pair more Billboard No. 1's than any other duo in chart history. Jimmy Jam’s Lunchtime Keynote address will focus on the current and future direction of the recording process from various perspectives. As a songwriter, artist, engineer, and producer, Jimmy is uniquely qualified to give a bird’s-eye view of how each of these “personalities” interacts and contributes to the overall final product, and along the way, how technology has evolved and what it has meant to his craft. In Jimmy’s words, “Of course it all starts with a great song, but then, it's important to consider how and what technology should be used to capture that creativity. It’s that intersection between the technology and creativity that I have always looked at every day throughout my career. Ultimately, it’s my job as an artist/producer to have those two elements meet and not crash. And that’s when you're using the available technology to capture the artist in their purest form.”

Friday, October 18, 12:00 pm — 1:00 pm (Room 1E13)

Workshop: W13 - Microphone and Recording Techniques for the Music Ensembles of the United States Military Academy

Chair:
Brandie Lane, United States Military Academy Band - West Point, NY, USA
Panelist:
Joseph Skinner, United States Army, West Point Band - West Point, NY, USA

Abstract:
The United States Military Academy is home to the oldest active duty military band. Our mission is to provide world-class music to educate, train, and inspire the Corps of Cadets and to serve as ambassadors of the United States Military Academy to the local, national, and international communities. This workshop will discuss advanced microphone and recording techniques (stereo and multi-track) used to capture the different elements of the West Point Band including the Concert Band, Jazz Knights, and Field Music group in a recording/studio or live sound reinforcement setting. The recording of other USMA musical elements including the Cadet Glee Club and Cadet Pipes and Drums will also be discussed.

Friday, October 18, 12:45 pm — 2:15 pm (Room 1E15/16)

Special Event: From the Motor City to Broadway: Making "Motown The Musical" Cast Album

Moderator:
Harry Weinger, Universal Music Enterprises (UMe) - New York, NY, USA; New York University - New York, NY, USA
Panelists:
Frank Filipetti, the living room - West Nyack, NY, USA; METAlliance
Jawan Jackson, Motown The Musical - New York, NY, USA
Ethan Popp, Special Guest Music Productions, LLC - New York, NY, USA

Abstract:
Tracing the path taken by pop-R&B classics known the world over to the Broadway stage and the modern-day recording studio—and how cast albums get made with no time and no do-overs.

A panel and Q&A with album producer and mixer Frank Filipetti, a multi-Grammy Award winner, and co-producer Ethan Popp, the show's Tony-nominated musical supervisor.

Moderator: Harry Weinger, VP of A&R at UMe, two-time Grammy winner and album Executive Producer.

Friday, October 18, 1:15 pm — 2:15 pm (Room 1E11)

Special Event: Lunchtime Keynote: On the Transmigration of Souls

Presenter:
Michael Bishop, Five/Four Productions, Ltd.

Abstract:
"On the Transmigration of Souls," a multi-Grammy-winning work for orchestra, chorus, children’s choir, and pre-recorded tape, is a composition by John Adams. It was commissioned by the New York Philharmonic and Lincoln Center’s Great Performers, and Mr. Adams received the 2003 Pulitzer Prize in music for the piece. Its premiere recording received the 2005 Grammy Award for Best Classical Album, Best Orchestral Performance, and Best Classical Contemporary Composition and the 2009 Grammy Award for Best Surround Sound Album. Surround Recording Engineer Michael Bishop will discuss the surround production process and play the work in its entirety.

Friday, October 18, 2:00 pm — 2:30 pm (Room 1E07)

Paper Session: P10 - Amplifiers

Chair:
Alexander Voishvillo, JBL/Harman Professional - Northridge, CA, USA

P10-1 Supply-Feedback Fully-Digital Class D Audio Amplifier Featuring 100 dBA+ SNR and 0.5 W to 1 W Selectable Output Power—Rossella Bassoli, ST-Ericsson - Monza Brianza, Italy; Federico Guanziroli, ST-Ericsson - Monza Brianza, Italy; Carlo Crippa, ST-Ericsson - Monza Brianza, Italy; Germano Nicollini, ST-Ericsson - Monza Brianza, Italy
This paper presents a real-time power supply noise correction technique in a fully-digital class D audio amplifier. The power supply is scaled and applied to a 12-bit Nyquist ADC to modify the amplitude of the Pulse-Width-Modulator reference carrier. An improved supply extrapolation algorithm results in a power supply rejection from one to two orders of magnitude higher than reported implementations. Class D sensitivity to clock jitter is presented. SNR values higher than 100 dBA have been measured in the presence of both power supply ripple and clock jitter. The PWM and output stage are integrated in the same chip in a 0.13 µm digital CMOS technology, whereas an external ADC has been used to demonstrate the validity of the supply-feedback algorithm.
Convention Paper 8968
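
The reason that scaling the PWM reference carrier with the supply can cancel supply noise, as the P10-1 abstract describes, can be shown with an idealized model. This is only a sketch under strong assumptions, not the paper's design: it treats the output stage as an ideal full bridge, ignores the ADC and the extrapolation algorithm, and the function names and the `nominal_supply_v` parameter are hypothetical.

```python
def pwm_average_output(signal, supply_v, carrier_amplitude):
    """Average output of an ideal full-bridge PWM stage for a slowly varying input."""
    # Comparing the input with a triangle carrier gives this duty cycle.
    duty = 0.5 * (1.0 + signal / carrier_amplitude)
    duty = min(1.0, max(0.0, duty))
    # The bridge swings between +supply_v and -supply_v.
    return supply_v * (2.0 * duty - 1.0)

def compensated_output(signal, supply_v, nominal_supply_v=1.0):
    """Same stage, with the carrier amplitude tracking the measured supply."""
    # Gain is supply_v / carrier_amplitude, so scaling both together keeps it fixed.
    carrier = supply_v / nominal_supply_v
    return pwm_average_output(signal, supply_v, carrier)
```

In this model a 20% supply increase raises the uncompensated output by 20%, while the compensated output stays unchanged as long as the duty cycle does not clip.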

Friday, October 18, 2:15 pm — 4:45 pm (Room 1E09)

Paper Session: P11 - Perception—Part 1

Chair:
Jason Corey, University of Michigan - Ann Arbor, MI, USA

P11-1 On the Perceptual Advantage of Stereo Subwoofer Systems in Live Sound Reinforcement—Adam J. Hill, University of Derby - Derby, Derbyshire, UK; Malcolm O. J. Hawksford, University of Essex - Colchester, Essex, UK
Recent research into low-frequency sound-source localization confirms the lowest localizable frequency is a function of room dimensions, source/listener location, and reverberant characteristics of the space. Larger spaces therefore facilitate accurate low-frequency localization and should gain benefit from broadband multichannel live-sound reproduction compared to the current trend of deriving an auxiliary mono signal for the subwoofers. This study explores whether the monophonic approach is a significant limit to perceptual quality and if stereo subwoofer systems can create a superior soundscape. The investigation combines binaural measurements and a series of listening tests to compare mono and stereo subwoofer systems when used within a typical left/right configuration.
Convention Paper 8970

P11-2 Auditory Adaptation to Loudspeakers and Listening Room Acoustics—Cleopatra Pike, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK; Russell Mason, University of Surrey - Guildford, Surrey, UK
Timbral qualities of loudspeakers and rooms are often compared in listening tests involving short listening periods. Outside the laboratory, listening occurs over a longer time course. In a study by Olive et al. (1995) smaller timbral differences between loudspeakers and between rooms were reported when comparisons were made over shorter versus longer time periods. This is a form of timbral adaptation, a decrease in sensitivity to timbre over time. The current study confirms this adaptation and establishes that it is not due to response bias but may be due to timbral memory, specific mechanisms compensating for transmission channel acoustics, or attentional factors. Modifications to listening tests may be required where tests need to be representative of listening outside of the laboratory.
Convention Paper 8971

P11-3 Perception Testing: Spatial Acuity—P. Nigel Brown, Ex'pression College for Digital Arts - Emeryville, CA, USA
There is a lack of readily accessible data in the public domain detailing individual spatial aural acuity. Introducing new tests of aural perception, this document specifies testing methodologies and apparatus, with example test results and analyses. Tests are presented to measure the resolution of a subject's perception and their ability to localize a sound source. The basic tests are designed to measure minimum discernible change across a 180° horizontal soundfield. More complex tests are conducted over two or three axes for pantophonic or periphonic analysis. Example results are shown from tests including unilateral and bilateral hearing aid users and profoundly monaural subjects. Examples are provided of the applicability of the findings to sound art, healthcare, and other disciplines.
Convention Paper 8972

P11-4 Evaluation of Loudness Meters Using Parameterization of Fader Movements—Jon Allan, Luleå University of Technology - Piteå, Sweden; Jan Berg, Luleå University of Technology - Piteå, Sweden
The EBU recommendation R 128 regarding loudness normalization is now generally accepted and countries in Europe are adopting the new recommendation. There is now a need to know more about how and when to use the different meter modes, Momentary and Short term, proposed in R 128, as well as to understand how different implementations of R 128 in audio level meters affect the engineers’ actions. A method is tentatively proposed for evaluating the performance of audio level meters in live broadcasts. The method was used to evaluate different meter implementations, three of them conforming to the recommendation from EBU, R 128. In an experiment, engineers adjusted audio levels in a simulated live broadcast show and the resulting fader movements were recorded. The movements were parameterized into “Fader movement,” “Adjustment time,” “Overshoot,” etc. Results show that the proposed parameters produced significant differences caused by the meters and that the experience of the engineer operating the fader is a significant factor.
Convention Paper 8973 (Purchase now)
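
As a rough illustration of the two meter modes named in the abstract: in the EBU meter specification, Momentary loudness uses a 400 ms sliding window and Short term uses a 3 s window. The sketch below computes a sliding mean-square level over such windows. It is a simplification under stated assumptions: K-weighting and multichannel summation are omitted, so the values are not true LUFS readings, and the function name and parameters are hypothetical rather than taken from the paper.

```python
import math

def sliding_level_db(samples, sample_rate, window_seconds):
    """Sliding-window level in dB for a mono signal.

    Simplified: no K-weighting filter is applied, so this only mimics
    the windowing of a loudness meter, not its frequency weighting.
    """
    n = int(round(window_seconds * sample_rate))
    if n <= 0 or len(samples) < n:
        return []
    squares = [x * x for x in samples]
    running = sum(squares[:n])  # energy in the first window
    levels = []
    for i in range(n, len(samples) + 1):
        mean_square = max(running / n, 1e-12)  # guard against log(0)
        levels.append(-0.691 + 10.0 * math.log10(mean_square))
        if i < len(samples):
            # Slide the window forward by one sample.
            running += squares[i] - squares[i - n]
    return levels

MOMENTARY_WINDOW_S = 0.4   # 400 ms window
SHORT_TERM_WINDOW_S = 3.0  # 3 s window
```

Because the Short term window is 7.5 times longer than the Momentary one, it reacts more slowly to level changes, which is one reason the choice of mode could plausibly influence how an engineer moves a fader.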

P11-5 Validation of the Binaural Room Scanning Method for Cinema Audio Research—Linda A. Gedemer, University of Salford - Salford, UK; Harman International - Northridge, CA, USA; Todd Welti, Harman International - Northridge, CA, USA
Binaural Room Scanning (BRS) is a method of capturing a binaural representation of a room using a dummy head with binaural microphones in the ears and later reproducing it over a pair of calibrated headphones. In this method multiple measurements are made at differing head angles that are stored separately as data files. A playback system employing headphones and a headtracker recreates the original environment for the listener, so that as they turn their head, the rendered audio during playback matches the listener's current head angle. This paper reports the results of a validation test of a custom BRS system that was developed for research and evaluation of different loudspeakers and different listening spaces. To validate the performance of the BRS system, listening evaluations of different in-room equalizations of a 5.1 loudspeaker system were made both in situ and via the BRS system. This was repeated using three different loudspeaker systems in three different sized listening rooms.
Convention Paper 8974 (Purchase now)
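
The head-tracked playback described above implies a lookup from the current head angle to one of the stored measurements. A minimal sketch of that lookup follows, assuming the measurements are indexed by head azimuth in degrees. The angle grid, function names, and nearest-neighbor selection are illustrative assumptions; the paper's system may interpolate or crossfade between measurements instead.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_measurement(head_angle, measured_angles):
    """Return the stored measurement angle closest to the tracked head angle."""
    return min(measured_angles, key=lambda m: angular_distance(head_angle, m))

# Hypothetical grid of head angles at which the room was scanned.
measured = [-60, -30, 0, 30, 60]

# A tracker reading of 12 degrees selects the 0-degree measurement.
selected = nearest_measurement(12.0, measured)
```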

Friday, October 18, 3:00 pm — 4:30 pm (1EFoyer)

Poster: P12 - Signal Processing

P12-1 Temporal Synchronization for Audio Watermarking Using Reference Patterns in the Time-Frequency Domain—Tobias Bliem, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Juliane Borsum, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Giovanni Del Galdo, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Stefan Krägeloh, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Temporal synchronization is an important part of any audio watermarking system that involves an analog audio signal transmission. We propose a synchronization method based on the insertion of two-dimensional reference patterns in the time-frequency domain. The synchronization patterns consist of a combination of orthogonal sequences and are continuously embedded along with the transmitted data, so that the information capacity of the watermark is not affected. We investigate the relation between synchronization robustness and payload robustness and show that the length of the synchronization pattern can be used to tune a trade-off between synchronization robustness and the probability of false positive watermark decodings. Interpreting the two-dimensional binary patterns as one-dimensional N-ary sequences, we derive a bound for the autocorrelation properties of these sequences to facilitate an exhaustive search for good patterns.
Convention Paper 8975 (Purchase now)

P12-2 Sound Source Separation Using Interaural Intensity Difference in Real Environments—Chan Jun Chun, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Hong Kook Kim, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea
In this paper, a sound source separation method is proposed that uses the interaural intensity difference (IID) of a stereo audio signal recorded in real environments. First, in order to improve the channel separability, a minimum variance distortionless response (MVDR) beamformer is employed to increase the intensity difference between stereo channels. Then, the IID between the stereo channels processed by the beamformer is computed and applied to sound source separation. The performance of the proposed sound source separation method is evaluated on the stereo audio source separation evaluation campaign (SASSEC) measures. The evaluation shows that the proposed method outperforms a sound source separation method that does not apply a beamformer.
Convention Paper 8976 (Purchase now)
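
To make the IID step concrete, here is a minimal sketch that computes the level difference in dB between left and right magnitudes for each time-frequency bin and turns it into a binary mask. The threshold, the function names, and the binary masking are assumptions for illustration; the MVDR beamformer stage is not shown, and the paper's actual separation rule may differ.

```python
import math

def iid_db(left_mag, right_mag, eps=1e-12):
    """Interaural intensity difference in dB for one time-frequency bin."""
    return 20.0 * math.log10((left_mag + eps) / (right_mag + eps))

def binary_masks(left_mags, right_mags, threshold_db=0.0):
    """Assign each bin to the left-dominant or right-dominant source.

    threshold_db is a hypothetical decision boundary; bins whose IID
    exceeds it go to the left source, all others to the right source.
    """
    left_mask = [
        1 if iid_db(l, r) > threshold_db else 0
        for l, r in zip(left_mags, right_mags)
    ]
    right_mask = [1 - m for m in left_mask]
    return left_mask, right_mask

# Three example bins: left-dominant, right-dominant, and balanced.
left_mask, right_mask = binary_masks([1.0, 0.1, 0.5], [0.1, 1.0, 0.5])
```

A larger intensity difference between channels makes the IID values of different sources cluster further apart, which is the motivation the abstract gives for applying the beamformer first.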

P12-3 Reverberation and Dereverberation Effect on Byzantine Chants—Alexandros Tsilfidis, accusonus, Patras Innovation Hub - Patras, Greece; Charalampos Papadakos, University of Patras - Patras, Greece; Elias Kokkinis, accusonus - Patras, Greece; Georgios Chryssochoidis, National and Kapodistrian University of Athens - Athens, Greece; Dimitrios Delviniotis, National and Kapodistrian University of Athens - Athens, Greece; Georgios Kouroupetroglou, National and Kapodistrian University of Athens - Athens, Greece; John Mourjopoulos, University of Patras - Patras, Greece
Byzantine music is typically monophonic and is characterized by (i) prolonged music phrases and (ii) Byzantine scales that often contain intervals smaller than the Western semitone. As happens with most religious music genres, reverberation is a key element of Byzantine music. Byzantine churches/cathedrals are usually characterized by particularly diffuse fields and very long Reverberation Time (RT) values. In the first part of this work, the perceptual effect of long reverberation on Byzantine music excerpts is investigated. Then, a case where Byzantine music is recorded in non-ideal acoustic conditions is considered. In such scenarios, a sound engineer might need to add artificial reverb to the recordings. Here it is suggested that the step of adding extra reverberation can be preceded by dereverberation processing to suppress the originally recorded non-ideal reverberation. Therefore, in the second part of the paper a subjective test is presented that evaluates the above sound engineering scenario.
Convention Paper 8977 (Purchase now)

P12-4 Cepstrum-Based Preprocessing for Howling Detection in Speech Applications—Renhua Peng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Jian Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Chengshi Zheng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Xiaoliang Chen, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China
Conventional howling detection algorithms exhibit dramatic performance degradations in the presence of harmonic components of speech that have properties similar to those of the howling components. To solve this problem, this paper proposes a cepstrum preprocessing-based howling detection algorithm. First, the impact of howling components on cepstral coefficients is studied in both theory and simulation. Second, according to the theoretical results, the cepstrum preprocessing-based howling detection algorithm is proposed. The Receiver Operating Characteristic (ROC) simulation results indicate that the proposed algorithm can increase the detection probability at the same false alarm rate. Objective measurements, such as Speech Distortion (SD) and Maximum Stable Gain (MSG), further confirm the validity of the proposed algorithm.
Convention Paper 8978 (Purchase now)

P12-5 Delayless Method to Suppress Transient Noise Using Speech Properties and Spectral Coherence—Chengshi Zheng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Xiaoliang Chen, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Shiwei Wang, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Renhua Peng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China
This paper proposes a novel delayless transient noise reduction method that is based on speech properties and spectral coherence. The proposed method has three stages. First, the transient noise components are detected in each subband by using energy-normalized variance. Second, we apply the harmonic property of the voiced speech and the continuity of the speech signal to reduce speech distortion in voiced speech segments. Third, we define a new spectral coherence to distinguish the unvoiced speech from the transient noise to avoid suppressing the unvoiced speech. Compared with existing methods, the proposed method is computationally efficient and causal. Experimental results show that the proposed algorithm can effectively suppress transient noise up to 30 dB without introducing audible speech distortion.
Convention Paper 8979 (Purchase now)

P12-6 Artificial Stereo Extension Based on Hidden Markov Model for the Incorporation of Non-Stationary Energy Trajectory—Nam In Park, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Kwang Myung Jeon, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Seung Ho Choi, Prof., Seoul National University of Science and Technology - Seoul, Korea; Hong Kook Kim, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea
In this paper an artificial stereo extension method is proposed to provide stereophonic sound from mono sound. While frame-independent artificial stereo extension methods, such as Gaussian mixture model (GMM)-based extension, do not consider the correlation of energies of previous frames, the proposed stereo extension method employs a minimum mean-squared error estimator based on a hidden Markov model (HMM) for the incorporation of non-stationary energy trajectory. The performance of the proposed stereo extension method is evaluated by a multiple stimuli with a hidden reference and anchor (MUSHRA) test. It is shown from the statistical analysis of the MUSHRA test results that the stereo signals extended by the proposed stereo extension method have significantly better quality than those of a GMM-based stereo extension method.
Convention Paper 8980 (Purchase now)

P12-7 Simulation of an Analog Circuit of a Wah Pedal: A Port-Hamiltonian Approach—Antoine Falaize-Skrzek, IRCAM - Paris, France; Thomas Hélie, IRCAM-CNRS UMR 9912-UPMC - Paris, France
Several methods are available to simulate electronic circuits. However, for nonlinear circuits, the stability guarantee is not straightforward. In this paper the approach of the so-called "Port-Hamiltonian Systems" (PHS) is considered. This framework naturally preserves the energetic behavior of elementary components and the power exchanges between them. This guarantees the passivity of the (source-free part of the) circuit.
Convention Paper 8981 (Purchase now)
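
For readers unfamiliar with the framework, the passivity claim can be illustrated with the textbook input-state-output form of a Port-Hamiltonian system. The notation below (state x, stored energy H, skew-symmetric interconnection matrix J, positive semidefinite dissipation matrix R, input matrix G) is the standard one and is assumed here; it is not necessarily the formulation used in the paper.

```latex
\begin{aligned}
\dot{x} &= \bigl(J(x) - R(x)\bigr)\,\nabla H(x) + G(x)\,u, \\
y &= G(x)^{\top}\,\nabla H(x), \\[4pt]
\frac{dH}{dt} &= \nabla H^{\top}\dot{x}
  = \underbrace{\nabla H^{\top} J\,\nabla H}_{=\,0 \text{ since } J^{\top} = -J}
  \;-\; \underbrace{\nabla H^{\top} R\,\nabla H}_{\ge\, 0 \text{ since } R \succeq 0}
  \;+\; y^{\top} u
  \;\le\; y^{\top} u .
\end{aligned}
```

The stored energy therefore never grows faster than the power supplied through the ports, which is the passivity property the abstract refers to.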

P12-8 Improvement in Parametric High-Band Audio Coding by Controlling Temporal Envelope with Phase Parameter—Kijun Kim, Kwangwoon University - Seoul, Korea; Kihyun Choo, Samsung Electronics Co., Ltd. - Suwon, Korea; Eunmi Oh, Samsung Electronics Co., Ltd. - Suwon, Korea; Hochong Park, Kwangwoon University - Seoul, Korea
This study proposes a method to improve temporal envelope control in parametric high-band audio coding. Conventional parametric high-band coders may have difficulty controlling the fine high-band temporal envelope, which can cause deterioration in sound quality for certain audio signals. In this study a novel method is designed to control the temporal envelope using spectral phase as an additional parameter. The objective and subjective evaluations suggest that the proposed method should improve the quality of sound whose temporal envelope is severely degraded by the conventional method.
Convention Paper 8982 (Purchase now)

Friday, October 18, 5:00 pm — 6:30 pm (Room 1E15/16)

Special Event: Inside Abbey Road 1967—Photos from the Sgt. Pepper Sessions

Moderator:
Allan Kozinn, NY Times - New York, NY, USA
Panelists:
Henry Grossman
Brian Kehew, CurveBender Publishing - Los Angeles, CA, USA

Abstract:
Allan Kozinn, noted Beatles expert and reviewer for the NY Times, will moderate this panel, which offers a behind-the-scenes look at EMI/Abbey Road studios during the making of the landmark "Sgt. Pepper's Lonely Hearts Club Band." Famed Beatles photographer Henry Grossman visited the sessions, where he took several hundred photos, many of which are still largely unseen. Henry will show photos and share memories of that creative era. Brian Kehew (co-author of the acclaimed Recording the Beatles book) will illustrate key technical aspects found in Grossman's photos. (Henry Grossman is also the author of Kaleidoscope Eyes: A Day in the Life of Sgt. Pepper and Places I Remember: My Time with The Beatles, considered two of the greatest collections of Beatles photography to date.)

Friday, October 18, 5:00 pm — 6:30 pm (Room 1E13)

Historical: The Art of Recording the Big Band, Revisited

Presenter:
Robert Auld, Auldworks - New York, NY, USA

Abstract:
The jazz big band was born in the 1920s, came of age in the 1930s, enjoyed its greatest popularity in the 1940s, and went into popular decline in the 1950s. In the 1960s the big band enjoyed a comeback of sorts but was displaced from the front pages by The Beatles and other things. In the 1970s it looked like the big band would either expire, or be transformed out of recognition. And yet, it persists; people still play in big bands, still dance to them, still record them. It has proved a most durable ensemble.

In his presentation Robert Auld will survey the history of jazz big bands, both from a musical and a technical, recording point of view. He will also show how he recorded a modern big band in 2011, using techniques typical of recording sessions in the "golden age" of stereo in the late 1950s. The presentation will include photos and recorded examples.

Saturday, October 19, 9:00 am — 11:30 am (Room 1E07)

Paper Session: P13 - Applications in Audio—Part 2

Chair:
Hans Riekehof-Boehmer, SCHOEPS Mikrofone - Karlsruhe, Germany

P13-1 Level-Normalization of Feature Films Using Loudness vs Speech—Esben Skovenborg, TC Electronic - Risskov, Denmark; Thomas Lund, TC Electronic A/S - Risskov, Denmark
We present an empirical study of the differences between level-normalization of feature films using the two dominant methods: loudness normalization and speech (“dialog”) normalization. The soundtracks of 35 recent “blockbuster” DVDs were analyzed using both methods. The difference in normalization level was up to 14 dB, on average 5.5 dB. For all films the loudness method provided the lowest normalization level and hence the greatest headroom. Comparison of automatic speech measurement to manual measurement of dialog anchors shows a typical difference of 4.5 dB, with the automatic measurement producing the highest level. Employing the speech-classifier to process rather than measure the films, a listening test suggested that the automatic measure is positively biased because it sometimes fails to distinguish between “normal speech” and speech combined with “action” sounds. Finally, the DialNorm values encoded in the AC-3 streams on DVDs were compared to both the automatically and the manually measured speech levels and found to match neither one well. AES 135th Convention Best Peer-Reviewed Paper Award Cowinner
Convention Paper 8983 (Purchase now)

P13-2 Sound Identification from MPEG-Encoded Audio Files—Joseph G. Studniarz, Montana State University - Bozeman, MT, USA; Robert C. Maher, Montana State University - Bozeman, MT, USA
Numerous methods have been proposed for searching and analyzing long-term audio recordings for specific sound sources. It is increasingly common that audio recordings are archived using perceptual compression, such as MPEG-1 Layer 3 (MP3). Rather than performing sound identification upon the reconstructed time waveform after decoding, we operate on the undecoded MP3 audio data as a way to improve processing speed and efficiency. The compressed audio format is only partially processed using the initial bitstream unpacking of a standard decoder, but then the sound identification is performed directly using the frequency spectrum represented by each MP3 data frame. Practical uses are demonstrated for identifying anthropogenic sounds within a natural soundscape recording.
Convention Paper 8984 (Purchase now)

P13-3 Pilot Workload and Speech Analysis: A Preliminary Investigation—Rachel M. Bittner, New York University - New York, NY, USA; Durand R. Begault, Human Systems Integration Division, NASA Ames Research Center - Moffett Field, CA, USA; Bonny R. Christopher, San Jose State University Research Foundation, NASA Ames Research Center - Moffett Field, CA, USA
Prior research has questioned the effectiveness of speech analysis to measure a talker's stress, workload, truthfulness, or emotional state. However, the question remains regarding the utility of speech analysis for restricted vocabularies such as those used in aviation communications. A part-task experiment was conducted in which participants performed Air Traffic Control read-backs in different workload environments. Participants' subjective workload and the speech qualities of fundamental frequency (F₀) and articulation rate were evaluated. A significant increase in subjective workload rating was found for high workload segments. F₀ was found to be significantly higher during high workload while articulation rates were found to be significantly slower. No correlation was found to exist between subjective workload and F₀ or articulation rate.
Convention Paper 8985 (Purchase now)

P13-4 Gain Stage Management in Classic Guitar Amplifier Circuits—Bryan Martin, McGill University - Montreal, QC, Canada
The guitar amplifier became a common tool in musical creation during the second half of the 20th Century. This paper attempts to detail some of the internal mechanisms by which the tones are created and their dependent interactions. Two early amplifier designs are examined to determine the circuit relationships and design decisions that came to define the sound of the electric guitar.
Convention Paper 8986 (Purchase now)

P13-5 Audio Pre-Equalization Models for Building Structural Sound Transmission Suppression—Cheng Shu, University of Rochester - Rochester, NY, USA; Fangyu Ke, University of Rochester - Rochester, NY, USA; Xiang Zhou, Bose Corporation - Framingham, MA, USA; Gang Ren, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
We propose a novel audio pre-equalization model that utilizes the transmission characteristics of building structures to reduce the interference reaching adjacent neighbors while maintaining the audio quality for the target listener. The audio transmission profiles are obtained by field acoustical measurements in several typical types of building structures. We also measure the spectrum of audio to adapt the pre-equalization model to a specific audio segment. We apply a computational auditory model to (1) monitor the perceptual audio quality for the target listener and (2) assess the interference caused to adjacent neighbors. The system performance is then evaluated using subjective rating experiments.
Convention Paper 8987 (Purchase now)

Saturday, October 19, 10:30 am — 12:00 pm (Room 1E11)

Sound for Picture: SP4 - Music Production for Film (Sound for Pictures Master Class)

Presenters:
Brian McCarty, Coral Sea Studios Pty. Ltd - Clifton Beach, QLD, Australia
Simon Franglen, Class1 Media - Los Angeles, CA, USA; London
Chris Hajian

Abstract:
Film soundtracks contain three elements: dialog, music, and sound effects. The creation of a music soundtrack is far more complex than previously, now encompassing “temp music” for preview screenings, synthesizer-enhanced orchestra tracks, and other special techniques. This Master Class with one of Hollywood's leading professionals puts the process under the microscope.

This session is presented in association with the AES Technical Committee on Audio for Cinema

Saturday, October 19, 10:30 am — 12:00 pm (Room 1E13)

Workshop: W16 - FX Design Panel: Distortion

Chair:
Jan Berg, Luleå University of Technology - Piteå, Sweden
Panelists:
Ken Bogdanowicz, SoundToys - Burlington, VT, USA
Marc Gallo, Studio Devil Virtual Tube Amplification - New York, NY, USA
Aaron Wishnick, iZotope - Somerville, MA, USA

Saturday, October 19, 11:15 am — 12:45 pm (Room 1E14)

Tutorial: T13 - A Holistic Approach to Crossover Systems and Equalization for Loudspeakers (A Master Class Event)

Presenter:
Malcolm O. J. Hawksford, University of Essex - Colchester, Essex, UK

Abstract:
Loudspeaker systems employ crossover filters and equalization to optimize their performance in the presence of electroacoustic transducer limitations and associated loudspeaker enclosures. This Master Class will discuss both analog and digital techniques and include examples of constant-voltage, all-pass, and constant-delay crossover alignments together with the constraints imposed by the choice of signal processing. The meaning of “minimum-phase” will be described, including its linkage to causality, and digital equalization strategies will be presented that emphasize the importance of loudspeaker impulse response decomposition into minimum-phase and excess-phase transfer functions. The session will include demonstrations on minimum-phase response derivation from a magnitude-frequency response and on the audibility of pure phase distortion to justify the use of the Linkwitz-Riley 4th-order class of analog crossover alignment.
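
As background to the Linkwitz-Riley 4th-order alignment mentioned above, the following sketch checks a well-known property numerically: each LR4 section is a squared 2nd-order Butterworth filter, and the low-pass and high-pass outputs sum to an all-pass response with flat magnitude. This is a general illustration and not material from the Master Class; frequency is normalized so that the crossover sits at w = 1, and the function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def lr4_lowpass(w):
    """LR4 low-pass at normalized angular frequency w (crossover at w = 1)."""
    s = 1j * w
    butterworth2 = 1.0 / (s * s + SQRT2 * s + 1.0)
    return butterworth2 * butterworth2

def lr4_highpass(w):
    """LR4 high-pass at normalized angular frequency w (crossover at w = 1)."""
    s = 1j * w
    butterworth2 = (s * s) / (s * s + SQRT2 * s + 1.0)
    return butterworth2 * butterworth2

def db(h):
    """Magnitude of a complex response in dB."""
    return 20.0 * math.log10(abs(h))

# Each branch is about 6 dB down at the crossover frequency,
# and the summed magnitude stays at 0 dB across the band.
crossover_lowpass_db = db(lr4_lowpass(1.0))
summed_db = [db(lr4_lowpass(w) + lr4_highpass(w)) for w in (0.1, 0.5, 1.0, 2.0, 10.0)]
```

The sum is flat in magnitude but not in phase: it reduces to the all-pass term (s² - √2·s + 1)/(s² + √2·s + 1), which is why the audibility of pure phase distortion is relevant to this alignment.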

Saturday, October 19, 11:30 am — 1:30 pm (Room 1E15/16)

Special Event: Platinum Producers

Moderator:
David Weiss, SonicScoop - New York, NY, USA
Panelists:
Jeff Jones, "The Jedi Master", World Alert Music - New York, NY, USA; Jazz at Lincoln Center - New York, NY, USA
Dano "ROBOPOP" Omelio
Dave Tozer, Dave Tozer - New York, NY, USA

Abstract:
The musical continuum, and its role in music production, comes into focus at this year's Platinum Producers Panel. How does an understanding of music's past, present, and future serve the producer in their quest to fully realize the artist's vision? We’ll go deep with this elite panel of Jeff Jones (Eric Clapton, Wynton Marsalis, Norah Jones), ROBOPOP (Gym Class Heroes, Maroon 5, Lana Del Rey), and Dave Tozer (John Legend, Kanye West, Justin Timberlake), moderated by David Weiss (Founder/Editor of SonicScoop). Their collective experience spans decades and has produced hit singles and albums in rock, R&B, hip-hop, pop, jazz, and beyond. From their application of classic techniques to late-breaking revelations, this trio of hit makers will provide inside information on tracking, mixing, mastering, and getting the very best out of artists in the studio.

Saturday, October 19, 12:00 pm — 1:00 pm (Room 1E13)

Tutorial: T14 - Why Facilities Need Tech Managers & How to Be One!

Presenter:
Eric Wenocur, Lab Tech Systems - Silver Spring, MD, USA

Abstract:
This tutorial describes the role of a Technical Manager and discusses the need for Technical Management at facilities of any size—but especially at small shops where people wear many hats. Good and bad experiences from the field will help to emphasize why this matters, and approaches for handling Tech Manager duties will be suggested. The perspective is from someone who designs and builds facilities, interacts with managers and operators, and sees what happens when this role is left to chance!

Saturday, October 19, 12:15 pm — 1:15 pm (Room 1E11)

Historical: Restoring Peggy Lee’s Capitol Records Album “Jump for Joy”

Presenter:
Alan Silverman, Arf! Mastering - New York, NY, USA; NYU|Steinhardt Dept. of Music Technology

Abstract:
“Jump for Joy,” featuring Peggy Lee and arranged by Nelson Riddle, was one of the first records released by Capitol as a stereo LP. The year was 1959, the year the label first made stereo LPs available to the public, but this seminal album was never released in stereo on CD, only in mono. An assignment to master the original stereo mixes for digital release led to the discovery of a 54-year-old audio mystery. Had something gone awry at the original stereo mix date? This special event uses photos and high-resolution transfers of original session material to detail a surprising finding and the steps that were taken to reach back in time to restore the album for today’s audience as it was intended to be heard.

Saturday, October 19, 1:30 pm — 3:00 pm (Room 1E08)

Broadcast and Streaming Media: B11 - Maintenance, Repair, and Troubleshooting

Chair:
John Bisset, Telos Alliance
Panelists:
Michael Azzarello, CBS
Bill Sacks, Orban / Optimod Refurbishing - Hollywood, MD, USA
Kimberly Sacks, Optimod Refurbishing - Hollywood, MD, USA

Abstract:
Much of today's audio equipment may be categorized as “consumer, throw-away” gear, or so complex that factory assistance is required for a board or module swap. The art of Maintenance, Repair, and Troubleshooting is actually as important as ever, even as the areas of focus may be changing. This session brings together some of the sharpest troubleshooters in the audio business. They'll share their secrets to finding problems, fixing them, and working to ensure they don't happen again. We'll delve into troubleshooting on the systems level, module level, and the component level, and explain some guiding principles that top engineers share.

Saturday, October 19, 1:30 pm — 3:00 pm (Room 1E14)

Tutorial: T15 - The Art of Drum Programming

Presenter:
Justin Paterson, London College of Music, University of West London - London, UK

Abstract:
Drum programming has often faced boundaries in terms of how effectively it could address the complexities of certain genres. This tutorial will explore and push some of these boundaries as implemented in contemporary professional practice, showing contrasting techniques used in the creation of both human emulation and the unashamedly synthetic. Alongside this, many of the studio production techniques often used to enhance such work will be discussed, ranging from dynamic processing to intricate automation. The session will include numerous live demonstrations covering a range of approaches. Although introducing all key concepts from scratch, its range and hybridization should provide inspiration even for experienced practitioners.

Saturday, October 19, 2:30 pm — 4:00 pm (Room 1E15/16)

Special Event: Grammy SoundTable: What Would Ramone Do?

Co-moderators:
BJ Ramone
Elliot Scheiner
Presenters:
Jim Boyer
Peter Chaikin, JBL Professional - Northridge, CA, USA
Jill Dell'Abate
Mark Ethier
Frank Filipetti, the living room - West Nyack, NY, USA; METAlliance
Jimmy Jam, Flyte Tyme Productions
Leslie Ann Jones, Skywalker Sound - San Rafael, CA, USA
Bob Ludwig, Gateway Mastering Studios, Inc. - Portland, ME, USA
Rob Mathes
Al Schmitt, Los Angeles, CA, USA

Abstract:
What Would Ramone Do?

This educational and inspirational career retrospective will delve into the music, creativity, and vision of legendary 14-time GRAMMY Award winning producer/engineer/technologist Phil Ramone. From Marilyn Monroe's performance/rendition of "Happy Birthday" for JFK, Getz/Gilberto’s “Girl From Ipanema,” Billy Joel’s “Just The Way You Are,” Paul Simon’s “50 Ways to Leave Your Lover,” Frank Sinatra’s Duets album, and live concerts in Italy with Luciano Pavarotti, to overseeing groundbreaking sound evolutions for the GRAMMY Awards Telecast, Phil Ramone’s career spanned more than 50 years of artistic and technical innovation. For this retrospective, we’ll go behind the scenes with colleagues, footage, and friends for an analysis of the wisdom and knowledge behind his achievements. This session is guaranteed to be insightful and thought-provoking.

Saturday, October 19, 2:30 pm — 6:00 pm (Room 1E07)

Paper Session: P14 - Transducers—Part 2: Headphones and Loudspeakers

Chair:
Christopher Struck, CJS Labs - San Francisco, CA, USA

P14-1 Application of Matrix Analysis to Identification of Mechanical and Acoustical Parameters of Compression Drivers—Alexander Voishvillo, JBL/Harman Professional - Northridge, CA, USA
In previous work of the author special measurement methods were used to obtain the transfer matrices of compression drivers. This data was coupled with the results of the FEA simulations of horns. It made it possible to simulate the frequency amplitude and directivity responses of horn drivers without building actual physical horns. In this work, a different set of measurements is used to obtain the transfer matrix of a vibrating diaphragm. This approach results in a more detailed and flexible method to analyze and design compression drivers. Other parameters used in the identification process are the electrical parameters of the motor and the acoustical parameters of compression chamber and phasing plug. The method was used in design and optimization of the new JBL dual-diaphragm compression driver to be used in a new JBL line array system.
Convention Paper 8988 (Purchase now)

P14-2 Application of Static and Dynamic Magnetic Finite Elements Analysis to Design and Optimization of Transducers Moving Coil Motors—Alexander Voishvillo, JBL/Harman Professional - Northridge, CA, USA; Felix Kochendörfer, JBL/Harman Professional - Northridge, CA, USA
Transducer motors are a potential source of nonlinear distortion. There are several nonlinear mechanisms that generate nonlinear distortion in motors. Typical loudspeaker nonlinear models include the dependence of the Bl-product and the voice coil inductance L_vc on the voice coil position and current. These effects cause nonlinearity in the driving force and electrodynamic damping, and generate nonlinear flux modulation and reluctance force. In reality, the voice coil inductance and resistive losses depend also on frequency. To take these effects into account the so-called LR-2 impedance model is used. The L₂ and R₂ elements are nonlinear functions of the voice coil position and current. In this work detailed analysis of a nonlinear model incorporating these elements is performed. The developed approach is illustrated by the FEA-based design and optimization of a new JBL ultra-linear transducer to be used in a new line array system.
Convention Paper 8989 (Purchase now)

P14-3 End-of-Line Test Concepts to Achieve and Maintain Yield and Quality in High Volume Loudspeaker Production—Gregor Schmidle, NTi Audio AG - Schaan, Liechtenstein
Managing high volume, multiple line, and multiple location loudspeaker production is a challenging task that requires interdisciplinary skills. This paper offers concepts for designing and maintaining end-of-line test systems that help to achieve and maintain consistent yield and quality. Topics covered include acoustic and electric test parameter selection, mechanical test jig design, limit finding strategies, fault-tolerant workflow creation, test system calibration, and environmental influence handling, as well as utilizing statistics and statistical process control.
Convention Paper 8990 (Purchase now)

P14-4 Advances in Impedance Measurement of Loudspeakers and Headphones—Steve Temme, Listen, Inc. - Boston, MA, USA; Tony Scott, Octave Labs, LLC - Eastchester, NY, USA
Impedance measurement is often the sole electrical measurement in a battery of QC tests on loudspeakers and headphones. Two test methods are commonly used—single channel and dual channel. Dual Channel measurement offers greater accuracy as both the voltage across the speaker (or headphone) and the voltage across the reference resistor are measured to calculate the impedance. Single Channel measurement methods are more commonly used on the production line because they only require one channel of a stereo soundcard, which leaves the other free for simultaneous acoustic tests. They are less accurate, however, because the test methods assume constant voltage or constant current. In this paper we discuss a novel electrical circuit that offers impedance measurement accuracy similar to that of complex dual channel measurement methods while using just one channel. This is expected to become popular for high throughput production line measurements where only one channel is available because the second channel of the typical soundcard is being used for simultaneous acoustic tests.
Convention Paper 8991
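
As an illustrative sketch of the two conventional approaches contrasted in P14-4 (not the novel circuit the paper proposes), the Python below computes impedance using a series reference resistor. The function names, the resistor value, and the voltages are assumptions chosen for illustration; the voltages are treated as complex phasors at a single test frequency.

```python
import numpy as np


def dual_channel_impedance(v_speaker, v_reference, r_reference_ohms):
    """Impedance from two measured voltages (hypothetical helper).

    The current through the series circuit is the voltage across the
    reference resistor divided by its resistance; the impedance is the
    speaker voltage divided by that current.
    """
    v_speaker = np.asarray(v_speaker, dtype=complex)
    v_reference = np.asarray(v_reference, dtype=complex)
    current = v_reference / r_reference_ohms
    return v_speaker / current


def single_channel_impedance(v_reference, assumed_source_volts, r_reference_ohms):
    """Impedance from one measured voltage (hypothetical helper).

    Only the reference-resistor voltage is measured. The speaker voltage
    is inferred by assuming the source voltage is constant and known,
    which is the assumption that limits accuracy.
    """
    v_reference = np.asarray(v_reference, dtype=complex)
    current = v_reference / r_reference_ohms
    return (assumed_source_volts - v_reference) / current


# Example: a 1-ohm reference resistor in series with a speaker.
v_ref, v_spk = 0.1, 0.8  # measured phasors in volts, purely real here
z_dual = dual_channel_impedance(v_spk, v_ref, 1.0)
z_single_ok = single_channel_impedance(v_ref, 0.9, 1.0)   # assumption holds
z_single_off = single_channel_impedance(v_ref, 1.0, 1.0)  # real source sagged to 0.9 V
print(abs(z_dual), abs(z_single_ok), abs(z_single_off))
```

When the assumed source voltage matches the real one, the two estimates agree; when the source sags under load, the single-channel estimate drifts, which is the kind of inaccuracy the abstract attributes to constant-voltage assumptions.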

P14-5 Auralization of Signal Distortion in Audio Systems—Part 1: Generic Modeling—Wolfgang Klippel, Klippel GmbH - Dresden, Germany
Auralization techniques are developed for generating a virtual output signal of an audio system where the different kinds of signal distortion are separately enhanced or attenuated to evaluate the impact on sound quality by systematic listening or perceptive modeling. The generation of linear, regular nonlinear and irregular nonlinear distortion components is discussed to select suitable models and measurements for the auralization of each component. New methods are presented for the auralization of irregular distortion generated by defects (e.g., rub & buzz) where no physical models are available. The auralization of signal distortion is a powerful tool for defining the target performance of an audio product in marketing, developing products at optimal performance-cost ratio and for ensuring sufficient quality in manufacturing.
Convention Paper 8992

P14-6 Free Plus Diffuse Sound Field Target Earphone Response Derived from Classical Room Acoustics Theory—Christopher Struck, CJS Labs - San Francisco, CA, USA
The typical standardized free or diffuse field reference or target earphone responses in general represent boundary conditions rather than a realistic listening situation. Therefore a model using classical room acoustics is introduced to derive a more realistic target earphone response in a direct plus diffuse sound field. The insertion gain concept as applied to earphone response measurements using an ear simulator equipped test manikin is detailed in order to appropriately apply the model output to a typical earphone design. Data for multiple sound sources, multiple rooms, and variants of the direct 0° on-axis free field response are shown. Limits of the method are discussed and the results are compared to the well-known free and diffuse field responses.
Convention Paper 8993

P14-7 Listener Preferences for In-Room Loudspeaker and Headphone Target Responses—Sean Olive, Harman International - Northridge, CA, USA; Todd Welti, Harman International - Northridge, CA, USA; Elisabeth McMullin, Harman International - Northridge, CA, USA
Based on preference, listeners adjusted the relative bass and treble levels of three music programs reproduced through a high quality stereo loudspeaker system equalized to a flat in-room target response. The same task was repeated using a high quality circumaural headphone equalized to match the flat in-room loudspeaker response as measured at the eardrum reference point (DRP). The results show that listeners on average preferred an in-room loudspeaker target response that had 2 dB more bass and treble compared to the preferred headphone target response. There were significant variations in the preferred bass and treble levels due to differences in individual taste and listener training.
Convention Paper 8994

Saturday, October 19, 3:00 pm — 4:30 pm (1EFoyer)

Poster: P15 - Applications in Audio—Part I

P15-1 An Audio Game App Using Interactive Movement Sonification for Targeted Posture Control—Daniel Avissar, University of Miami - Coral Gables, FL, USA; Colby N. Leider, University of Miami - Coral Gables, FL, USA; Christopher Bennett, University of Miami - Coral Gables, FL, USA; Oygo Sound LLC - Miami, FL, USA; Robert Gailey, University of Miami - Coral Gables, FL, USA
Interactive movement sonification has been gaining validity as a technique for biofeedback and auditory data mining in research and development for gaming, sports, and physiotherapy. Naturally, the harvesting of kinematic data over recent years has been a function of an increased availability of more portable, high-precision sensory technologies, such as smart phones, and dynamic real time programming environments, such as Max/MSP. Whereas the overlap of motor skill coordination and acoustic events has been a staple of musical pedagogy, musicians and music engineers have been surprisingly less involved than biomechanical, electrical, and computer engineers in research efforts in these fields. Thus, this paper proposes a prototype for an accessible virtual gaming interface that uses music and pitch training as positive reinforcement in the accomplishment of target postures.
Convention Paper 8995

P15-2 Evaluation of the SMPTE X-Curve Based on a Survey of Re-Recording Mixers—Linda A. Gedemer, University of Salford - Salford, UK; Harman International - Northridge, CA, USA
Cinema calibration methods, which include targeted equalization curves for both dub stages and cinemas, are currently used to ensure an accurate translation of a film's sound track from dub stage to cinema. In recent years, there has been an effort to reexamine how cinemas and dub-stages are calibrated with respect to preferred or standardized room response curves. Most notable is the work currently underway reviewing the SMPTE standard ST202:2010 "For Motion-Pictures - Dubbing Stages (Mixing Rooms), Screening Rooms and Indoor Theaters -B-Chain Electroacoustic Response." There are both scientific and anecdotal reasons to question the effectiveness of the SMPTE standard in its current form. A survey of re-recording mixers was undertaken in an effort to better understand the efficaciousness of the SMPTE standard from the users' point of view.
Convention Paper 8996

P15-3 An Objective Comparison of Stereo Recording Techniques through the Use of Subjective Listener Preference Ratings—Wei Lim, University of Michigan - Ann Arbor, MI, USA
Stereo microphone techniques offer audio engineers the ability to capture a soundscape that approximates how one might hear realistically. To illustrate the differences between six common stereo microphone techniques, namely XY, Blumlein, ORTF, NOS, AB, and Faulkner, I asked 12 study participants to rate recordings of a Yamaha Disklavier piano. I examined the inter-rating correlation between subjects to find a preferential trend toward near-coincident techniques. Further evaluation showed that there was a preference for clarity over spatial content in a recording. Subjects did not find that wider microphone placements provided for more spacious-sounding recordings. Using this information, this paper also discusses the need to re-evaluate how microphone techniques are typically categorized by distance between microphones.
Convention Paper 8997

P15-4 Tampering Detection of Digital Recordings Using Electric Network Frequency and Phase Angle—Jidong Chai, University of Tennessee - Knoxville, TN, USA; Yuming Liu, Electrical Power Research Institute, Chongqing Electric Power Corp. - Chongqing, China; Zhiyong Yuan, China Southern Power Grid - Guangzhou, China; Richard W. Conners, Virginia Polytechnic Institute and State University - Blacksburg, VA, USA; Yilu Liu, University of Tennessee - Knoxville, TN, USA; Oak Ridge National Laboratory
In the field of forensic authentication of digital audio recordings, the ENF (electric network frequency) criterion is one of the possible tools and has shown promising results. An important task in forensic authentication is to determine whether a recording has been tampered with. Previous work performs tampering detection by looking for a discontinuity in either the ENF or the phase angle extracted from digital recordings. However, using only frequency or only phase angle to detect tampering may not be sufficient. In this paper both frequency and phase angle, together with a corresponding reference database, are used for tampering detection of digital recordings, which results in more reliable detection. This paper briefly introduces the Frequency Monitoring Network (FNET) at UTK and its frequency and phase angle reference database. A short-time Fourier transform (STFT) is employed to estimate the ENF and phase angle embedded in audio files. A procedure for using the ENF criterion to detect tampering is proposed, covering signal preprocessing, ENF and phase angle estimation, frequency database matching, and tampering detection. Results show that utilizing frequency and phase angle jointly can improve the reliability of tampering detection in the authentication of digital recordings.
Convention Paper 8998
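
The STFT-based estimation step mentioned in P15-4 can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function name, the 60 Hz nominal frequency, the frame and hop lengths, the Hann window, and the zero-padding factor are all assumptions, and the database-matching stage is omitted.

```python
import numpy as np


def estimate_enf(signal, sample_rate, nominal_hz=60.0, search_hz=0.5,
                 frame_seconds=2.0, hop_seconds=1.0, pad_factor=8):
    """Per-frame ENF (Hz) and phase angle (degrees) via a zero-padded STFT.

    Hypothetical helper: every parameter default is an illustrative choice.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(frame_seconds * sample_rate))
    hop = int(round(hop_seconds * sample_rate))
    n_fft = frame_len * pad_factor  # zero-padding gives a finer frequency grid
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # Only bins close to the nominal mains frequency are searched.
    band = np.flatnonzero(np.abs(freqs - nominal_hz) <= search_hz)

    enf, phase = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame, n=n_fft)
        peak = band[np.argmax(np.abs(spectrum[band]))]
        enf.append(freqs[peak])
        phase.append(np.degrees(np.angle(spectrum[peak])))
    return np.array(enf), np.array(phase)


# Synthetic check: an 8-second hum at 60.02 Hz sampled at 1 kHz.
fs = 1000
t = np.arange(8 * fs) / fs
hum = np.cos(2 * np.pi * 60.02 * t)
enf_track, phase_track = estimate_enf(hum, fs)
print(enf_track)
```

With these settings the frequency grid is spaced 0.0625 Hz apart, so each estimate lands on the grid point nearest the true hum frequency. A practical system would likely need finer resolution or peak interpolation before comparing against a reference database.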

P15-5 Portable Speech Encryption Based Anti-Tapping Device—C. R. Suthikshn Kumar, Defence Institute of Advanced Technology (DIAT) - Girinagar, Pune, India
Telephone tapping is a major concern nowadays. There is a need for a portable device that can be attached to a mobile phone to prevent tapping. Users want to encrypt their voice during conversation, mainly for privacy. Encrypting the conversation can prevent tapping of mobile calls, since the network operator may tap calls for various reasons. In this paper we propose a portable device that can be attached to a mobile phone or landline phone and serves as an anti-tapping device. The device encrypts the outgoing speech and decrypts the incoming encrypted speech in real time. The main idea is that speech is unintelligible when encrypted.
Convention Paper 8999

P15-6 Personalized Audio Systems—A Bayesian Approach—Jens Brehm Nielsen, Technical University of Denmark - Kongens Lyngby, Denmark; Widex A/S - Lynge, Denmark; Bjørn Sand Jensen, Technical University of Denmark - Kongens Lyngby, Denmark; Toke Jansen Hansen, Technical University of Denmark - Kongens Lyngby, Denmark; Jan Larsen, Technical University of Denmark - Kgs. Lyngby, Denmark
Modern audio systems are typically equipped with several user-adjustable parameters unfamiliar to most listeners. To obtain the best possible system setting, the listener is forced into non-trivial multi-parameter optimization with respect to the listener's own objective and preference. To address this, the present paper presents a general interactive framework for robust personalization of such audio systems. The framework builds on Bayesian Gaussian process regression in which the belief about the user's objective function is updated sequentially. The parameter setting to be evaluated in a given trial is carefully selected by sequential experimental design based on the belief. A Gaussian process model is proposed that incorporates assumed correlation among particular parameters, which provides better modeling capabilities compared to a standard model. A five-band constant-Q equalizer is considered for demonstration purposes, in which the equalizer parameters are optimized for each individual using the proposed framework. Twelve test subjects obtain a personalized setting with the framework, and these settings are significantly preferred to those obtained with random experimentation.
Convention Paper 9000
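
The sequential loop described in P15-6 (update a Gaussian process belief, then pick the next setting to try) can be illustrated with a generic sketch. Everything here is an assumption made for illustration: the squared-exponential kernel, the zero-mean unit-variance prior, the noise level, the upper-confidence-bound selection rule, and all function names. The paper's own model, which encodes correlation among particular parameters, is not reproduced.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between rows of a and rows of b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale ** 2)


def gp_posterior(x_rated, ratings, x_candidates, noise_var=1e-2):
    """Posterior mean and variance of the (assumed) preference function."""
    k_rr = rbf_kernel(x_rated, x_rated) + noise_var * np.eye(len(x_rated))
    k_rc = rbf_kernel(x_rated, x_candidates)
    mean = k_rc.T @ np.linalg.solve(k_rr, ratings)
    reduction = np.sum(k_rc * np.linalg.solve(k_rr, k_rc), axis=0)
    variance = np.maximum(1.0 - reduction, 0.0)  # prior variance is 1
    return mean, variance


def next_setting(x_rated, ratings, x_candidates, kappa=2.0):
    """Index of the candidate with the highest upper confidence bound."""
    mean, variance = gp_posterior(x_rated, ratings, x_candidates)
    return int(np.argmax(mean + kappa * np.sqrt(variance)))


# Two hypothetical two-parameter settings already rated by a listener.
x_rated = np.array([[0.0, 0.0], [1.0, 1.0]])
ratings = np.array([0.2, 0.9])
candidates = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [2.0, 2.0]])
best = next_setting(x_rated, ratings, candidates)
print(candidates[best])
```

Each row is one parameter setting, so a five-band equalizer would use five columns. The selection rule trades off settings predicted to be good against settings the model is still uncertain about.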

Saturday, October 19, 3:00 pm — 5:00 pm (Room 1E13)

Workshop: W19 - "Help! I Have a Tape Recorder!"—Restoration and Rebuilding Analog Tape Machines

Chair:
Noah Simon, New York University - New York, NY, USA
Panelists:
John French, JRF Magnetic Sciences Inc - Greendell, NJ, USA
Bob Shuster, Shuster Sound - Smithtown, NY, USA
Daniel Zellman, Zeltec Service Labs - New York, NY, USA; Zeltec Research & Development

Abstract:
A new generation of engineers, musicians, and audiophiles is discovering how the analog recorders from the “good old days” can help them get a better sound or get that “analog sound” into their recordings. At the other end of the spectrum, archivists, preservationists, remastering engineers, and high-end audiophiles need to know what’s involved in taking care of these machines. This workshop will discuss the various options available to these users for purchasing, maintaining, restoring, and using these recorders. During the workshop discussion, we hope to show examples of tape recorder repairs and restoration and have a running Q&A session.

This session is presented in association with the AES Technical Committee on Archiving, Restoration, and Digital Libraries

Saturday, October 19, 4:30 pm — 6:00 pm (Room 1E15/16)

Special Event: Bruce Swedien: I Have No Secrets

Presenters:
Bruce Swedien, Ocala, FL, USA
Bill Gibson, Hal Leonard Performing Arts Publishing Group - Seattle, WA, USA; Berklee College of Music Online

Abstract:
This Special Event showcases the mindset of one of music’s most-important engineers—ever! Interviewed by author Bill Gibson, Bruce Swedien generously shares the depth of his technical and artistic insights, inspiring greatness in the musical application of technology in recording and production. In an industry propelled by the excessive use of plug-ins, automatic tuning, and processing, Swedien reveals a different approach—his approach, which achieves massive sonic power through the mastery of musical and technical fundamentals and the insightful understanding of the role of microphones, the acoustical environment, effects processors, and the all-important emotional component in the recording process.

Bring your questions. Don’t miss a chance to learn from an audio industry master—a legend, an icon, and a friend to engineers around the world. Bruce Swedien has always been generous with his knowledge—he has no secrets! There will be space in the program for you to ask questions.

A five-time Grammy winner—and thirteen-time Grammy nominee—Swedien’s impact on popular music is undeniable! His approach to recording music has proven to be a game changer, with engineers at all levels referencing his work as a definitive sonic standard. From recording and mixing Michael Jackson’s albums (Off the Wall, Thriller, Bad, Dangerous, Invincible, and HIStory) to many of Quincy Jones’ hits (The Dude, Back on the Block, Q’s Jook Joint, and more) to the music of greats such as Count Basie, Duke Ellington, the Brothers Johnson, and Natalie Cole, Bruce Swedien has always operated at the very highest level of excellence and expertise in the recording industry.

Saturday, October 19, 5:00 pm — 7:00 pm (Room 1E09)

Historical: The 35mm Album Master Fad

Presenter:
Thomas Fine, (sole proprietor of private studio) - Brewster, NY, USA

Abstract:
In the late 1950s and early 1960s, a new market emerged for ultra-high fidelity recordings. Once cutting and playback of the stereo LP were brought up to high quality levels, buyers of this new super-realistic format wanted ever more "absolute" sound quality. The notion emerged, first with Everest Records, a small independent record label in Queens, to use 35mm magnetic film as the recording and mastering medium. 35mm had distinct advantages over tape formulations and machines of that time—lower noise floor, less wow and flutter, higher absolute levels before saturation, almost no crosstalk or print-through, etc. Everest Records made a splash with the first 35mm LP masters not connected to motion-picture soundtracks but quickly faltered as a business. The unique set of recording equipment and the Everest studio remained intact and was used to make commercially successful 35mm records for Mercury, Command, Cameo-Parkway, and Project 3. The fad faded by the mid-60s as tape machines and tape formulations improved, and the high cost of working with 35mm magnetic film became unsustainable. The original Everest equipment survived to be used in the Mercury Living Presence remasters for CD. Just recently, the original Everest 35mm recordings have been reissued in new high-resolution digital remasters. This presentation will trace the history of 35mm magnetic recording, the brief but high-profile fad of 35mm-based LPs, and the after-life of those original recordings. We will also look at the unique set of hardware used to make the vast majority of the 35mm LPs. The presentation will be augmented with plenty of audio examples from the original recordings.

Saturday, October 19, 5:00 pm — 7:00 pm (Room 1E14)

Workshop: W22 - Loudness Wars: Leave Those Peaks Alone

Chair:
Thomas Lund, TC Electronic A/S - Risskov, Denmark
Panelists:
John Atkinson
Florian Camerer, ORF - Austrian TV - Vienna, Austria; EBU - European Broadcasting Union
Bob Ludwig, Gateway Mastering Studios, Inc. - Portland, ME, USA
George Massenburg, Schulich School of Music, McGill University - Montreal, Quebec, Canada
Susan Rogers

Abstract:
Music production, distribution, and consumption have been caught in a vicious spiral, leaving two decades of our music heritage damaged. Because of irreversible dynamics processing and data reduction from production onwards, new tracks and remastered ones typically sound worse than what could even be expected from compact cassette. However, with Apple, WiMP, and Spotify now engaged in a competition on quality, and FM radio in Europe adopting EBU R128 loudness normalization, limbo-practice is finally losing its grip on distribution.

The panel uses the terms "Peak to Loudness Ratio" (PLR) and "Headroom" to analyze recorded music fidelity over the past 50 years from four different angles: physiological, production, distribution, and consumption. In the new realm, it's futile to master music louder than –16 LKFS.

Sunday, October 20, 9:00 am — 12:00 pm (Room 1E07)

Paper Session: P16 - Spatial Audio—Part 2

Chair:
Jean-Marc Jot, DTS, Inc. - Los Gatos, CA, USA

P16-1 Defining the Un-Aliased Region for Focused Sources—Robert Oldfield, University of Salford - Salford, Greater Manchester, UK; Ian Drumm, University of Salford - Salford, Greater Manchester, UK
Sound field synthesis reproduction techniques such as wave field synthesis can accurately reproduce wave fronts of arbitrary curvature, including the wave fronts of a source in front of the array. The wave fronts are accurate up to the spatial aliasing frequency, above which there are no longer enough secondary sources (loudspeakers) to reproduce the wave front accurately, resulting in spatial aliasing contributions that manifest as additional wave fronts propagating in directions other than intended. These contributions cause temporal, spectral, and spatial errors in the reproduced wave front. Focused sources (sources in front of the loudspeaker array) have a unique attribute in this sense in that there is a clearly defined region around the virtual source position that exhibits no spatial aliasing contributions even at an extremely high frequency. This paper presents a method for the full characterization of this un-aliased region using both a ray-based propagation model and a time domain approach.
Convention Paper 9001
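
For orientation, the spatial aliasing frequency referred to in P16-1 is often approximated by a worst-case rule of thumb for a linear array, f ≈ c / (2·Δx), where Δx is the loudspeaker spacing. The sketch below applies that rule only. It is not the paper's ray-based or time-domain method; the function name and the example spacing are assumptions, and the actual limit depends on source and listener geometry.

```python
def spatial_aliasing_frequency(spacing_m, speed_of_sound=343.0):
    """Worst-case aliasing frequency in Hz for a linear loudspeaker array.

    Rule of thumb only: half a wavelength must span the driver spacing.
    """
    if spacing_m <= 0:
        raise ValueError("loudspeaker spacing must be positive")
    return speed_of_sound / (2.0 * spacing_m)


# Hypothetical array with 12.5 cm between adjacent drivers.
print(spatial_aliasing_frequency(0.125))
```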

P16-2 Using Ambisonics to Reconstruct Measured Soundfields—Samuel W. Clapp, Rensselaer Polytechnic Institute - Troy, NY, USA; Anne E. Guthrie, Rensselaer Polytechnic Institute - Troy, NY, USA; Arup Acoustics - New York, NY, USA; Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA; Ning Xiang, Rensselaer Polytechnic Institute - Troy, NY, USA
Spherical microphone arrays can measure a soundfield's spherical harmonic components, subject to certain bandwidth constraints depending on the array radius and the number and placement of the array's sensors. Ambisonics is designed to reconstruct the spherical harmonic components of a soundfield via a loudspeaker array and also faces certain limitations on its accuracy. This paper looks at how to reconcile these sometimes conflicting limitations to produce the optimum solution for decoding. In addition, binaural modeling is used as a method of evaluating the proposed decoding method and the accuracy with which it can reproduce a measured soundfield.
Convention Paper 9002

P16-3 Subjective Evaluation of Multichannel Sound with Surround-Height Channels—Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA; Doyuen Ko, Belmont University - Nashville, TN, USA; McGill University - Montreal, Quebec, Canada; Aparna Nagendra, Rochester Institute of Technology - Rochester, NY, USA; Wieslaw Woszczyk, McGill University - Montreal, QC, Canada
In this paper we report results from an investigation of listener perception of surround-height channels added to standard multichannel stereophonic reproduction. An ITU-R horizontal loudspeaker configuration was augmented by the addition of surround-height loudspeakers in order to reproduce concert hall ambience from above the listener. Concert hall impulse responses (IRs) were measured at three heights using an innovative microphone array designed to capture surround-height ambience. IRs were then convolved with anechoic music recordings in order to produce seven-channel surround sound stimuli. Listening tests were conducted in order to determine the perceived quality of surround-height channels as affected by three loudspeaker positions and three IR heights. Fifteen trained listeners compared each reproduction condition and ranked them based on their degree of appropriateness. Results indicate that surround-height loudspeaker position has a greater influence on perceived sound quality than IR height. Listeners considered the naturalness, spaciousness, envelopment, immersiveness, and dimension of the reproduced sound field when making judgments of surround-height channel quality.
Convention Paper 9003
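
The stimulus-generation step in P16-3 (convolving measured impulse responses with anechoic recordings) can be sketched as below. The function name and the toy data are assumptions; the study's actual impulse responses, channel layout, and processing chain are not described in the abstract.

```python
import numpy as np


def render_stimulus(anechoic, impulse_responses):
    """Convolve one anechoic signal with one impulse response per channel.

    Returns an array of shape (channels, samples); shorter channels are
    zero-padded at the end so all rows have equal length.
    """
    anechoic = np.asarray(anechoic, dtype=float)
    channels = [np.convolve(anechoic, np.asarray(ir, dtype=float))
                for ir in impulse_responses]
    length = max(len(ch) for ch in channels)
    out = np.zeros((len(channels), length))
    for i, ch in enumerate(channels):
        out[i, :len(ch)] = ch
    return out


# Toy example: a three-sample "recording" and two trivial impulse responses
# (a unit impulse and a two-sample delay). A seven-channel stimulus would
# pass seven measured impulse responses instead.
dry = [1.0, 0.5, 0.25]
toy_irs = [[1.0], [0.0, 0.0, 1.0]]
stimulus = render_stimulus(dry, toy_irs)
print(stimulus)
```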

P16-4 A Perceptual Evaluation of Recording, Rendering, and Reproduction Techniques for Multichannel Spatial Audio—David Romblom, McGill University - Montreal, Quebec, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, Quebec, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Catherine Guastavino, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada
The objective of this project is to perceptually evaluate the relative merits of two different spatial audio recording and rendering techniques within the context of two different multichannel reproduction systems. The two recording and rendering techniques are "natural," using main microphone arrays, and "virtual," using spot microphones, panning, and simulated acoustic delay. The two reproduction systems are the 3/2 system (5.1 surround) and a 12/2 system, where the frontal L/C/R triplet is replaced by a 12-loudspeaker linear array. The perceptual attributes of multichannel spatial audio have been established by previous authors. In this study magnitude ratings of selected spatial audio attributes are presented for the above treatments and results are discussed.
Convention Paper 9004

P16-5 The Optimization of Wave Field Synthesis for Real-Time Sound Sources Rendered in Non-Anechoic Environments—Ian Drumm, University of Salford - Salford, Greater Manchester, UK; Robert Oldfield, University of Salford - Salford, Greater Manchester, UK
Presented here is a technique that employs audio capture and adaptive recursive filter design to render, in real time, dynamic, interactive, and content-rich soundscapes within non-anechoic environments. Typically, implementations of wave field synthesis utilize convolution to mitigate the amplitude errors associated with the application of linear loudspeaker arrays. Although recursive filtering approaches have been suggested before, this paper aims to build on that work by presenting an approach that exploits Quasi Newton adaptive filter design to construct components of the filtering chain that help compensate for both the particular system configuration and the mediating environment. Early results utilizing in-house developed software running on a 112-channel wave field synthesis system show the potential to improve the quality of real-time 3-D sound rendering in less than ideal contexts.
Convention Paper 9005

P16-6 A Perceptual Evaluation of Room Effect Methods for Multichannel Spatial Audio—David Romblom, McGill University - Montreal, Quebec, Canada; Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) - Montreal, Quebec, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Catherine Guastavino, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada
The room effect is an important aspect of sound recording technique and is typically captured separately from the direct sound. The perceptual attributes of multichannel spatial audio have been established by previous authors, while the psychoacoustic underpinnings of room perception are known to varying degrees. The Hamasaki Square, in combination with a delay plan and an aesthetic disposition to "natural" recordings, is an approach practiced by some sound recording engineers. This study compares the Hamasaki Square to an alternative room effect and to dry approaches in terms of a number of multichannel spatial audio attributes. A concurrent experiment investigated the same spatial audio attributes with regard to the microphone and reproduction approach. As such, the current study uses a 12/2 system based upon 3/2 (5.1 surround) where the frontal L/C/R triplet has been replaced by a linear wavefront reconstruction array.
AES 135th Convention Student Technical Papers Award Cowinner
Convention Paper 9006

Sunday, October 20, 9:00 am — 10:30 am (Room 1E11)

Workshop: W24 - DSD vs. DXD: Extreme DSD and PCM Resolutions Compared

Chair:
Dominique Brulhart, Merging Technologies
Panelists:
Morten Lindberg, 2L (Lindberg Lyd AS) - Oslo, Norway
John Newton, Soundmirror, Inc. - Jamaica Plain, MA, USA

Abstract:
With the recent release of 11.2 MHz Quad-DSD production tools, more than a decade of DSD and DXD productions, and the rapidly growing amount of DSD and DXD material available for download, there is a constant debate in both the professional and the audiophile sectors about the difference between DSD and PCM and, ultimately, which one “sounds better.” This panel offers two well-known specialists in these formats, John Newton from Soundmirror and Morten Lindberg from 2L, the opportunity to present some of their recordings and discuss their experience making productions in DSD and DXD. Recent recordings in 11.2 MHz DSD, DSD, and DXD will be presented, and recording, editing, mixing, and mastering techniques and considerations using DSD and DXD will be discussed and compared.

Sunday, October 20, 9:00 am — 10:30 am (Room 1E14)

Workshop: W25 - Audio @ The Near Speed of Light with Fiber Optics

Chair:
Ronald Ajemian, Owl Fiber Optics - Flushing, NY, USA
Panelists:
Marc Brunke, Optocore GmbH - Grafelfing, Germany
Steve Lampen, Belden - San Francisco, CA, USA
Fred Morgenstern, Neutrik USA
Warren Osse, Applications/Senior Design Engineer, Vistacom, Inc. - Allentown, PA, USA

Abstract:
As audio and video become more integrated and continue to grow, so does the demand for fiber optic technology. Sending, receiving, and storing a good quality audio/video data file is crucial in many areas of our industry. The workshop panel will educate, inform, and update the audience in a question-and-answer format. Everyone who attends this workshop will walk away with more knowledge and be better prepared for existing and future uses of fiber optic technology as it pertains to professional audio/video.

This session is presented in association with the AES Technical Committee on Fiber Optics for Audio

Sunday, October 20, 10:30 am — 12:00 pm (1EFoyer)

Poster: P17 - Applications in Audio—Part 2

P17-1 Source of ENF in Battery-Powered Digital Recordings—Jidong Chai, University of Tennessee - Knoxville, TN, USA; Fan Liu, Chongqing University - Chongqing, China; Zhiyong Yuan, China Southern Power Grid - Guangzhou, China; Richard W. Conners, Virginia Polytechnic Institute and State University - Blacksburg, VA, USA; Yilu Liu, University of Tennessee - Knoxville, TN, USA; Oak Ridge National Laboratory
Forensic audio authentication has developed remarkably over the last few years due to advances in digital recording processing technology. The ENF (Electric Network Frequency) Criterion is one of the possible tools and has shown very promising results in forensic authentication of digital recordings. However, there are currently very few experiments and papers studying the source of the ENF signals present in digital recordings. In addition, it is unclear whether or not there are detectable ENF traces in battery-powered digital audio recordings. In this paper a study of the ENF source in battery-powered digital recordings is presented, and it shows that ENF in these recordings may not be mainly caused by low frequency electromagnetic field induction but by low frequency audible hum. This paper includes a number of experiments to explore the possible sources of ENF in battery-powered digital recordings. In these experiments, the electric and magnetic field strength in different locations is measured and the results of corresponding ENF extraction are analyzed. Understanding this underlying phenomenon is critical to verifying the validity of ENF techniques.
Convention Paper 9007

P17-2 The Audio Performance Comparison and Method of Designing Switching Amplifiers Using GaN FET—Jaecheol Lee, Samsung Electronics Co., Ltd. - Suwon, Korea; Haejong Kim, Samsung Electronics Co., Ltd. - Suwon, Korea; Keeyeong Cho, Samsung Electronics Co., Ltd. - Suwon, Korea; Haekwang Park, Samsung Electronics DMC R&D Center - Suwon, Korea
This paper addresses the physical characteristics of FET materials, a method of designing switching amplifiers using GaN FETs, and a comparison of the audio performance of silicon and GaN FETs. The physical characteristics of GaN FETs are excellent, but technical limitations hinder their application in consumer electronics. A depletion-mode GaN FET is used in the proposed system. Its characteristics are better than those of an enhancement-mode device, but it is normally on. To solve this problem, a cascaded GaN switch block is used, combining a depletion-mode GaN FET with an enhancement-mode Si FET. The proposed method achieves better audio performance than a switching amplifier using silicon FETs.
Convention Paper 9008

P17-3 Audio Effect Classification Based on Auditory Perceptual Attributes—Thomas Wilmering, Queen Mary University of London - London, UK; György Fazekas, Queen Mary University of London - London, UK; Mark B. Sandler, Queen Mary University of London - London, UK
While the classification of audio effects has several applications in music production, the heterogeneity of possible taxonomies, as well as the many viable points of view for organizing effects, present research problems that are not easily solved. Creating extensible Semantic Web ontologies provides a possible solution to this problem. This paper presents the results of a listening test that facilitates the creation of a classification system based on auditory perceptual attributes that are affected by the application of audio effects. The obtained results act as a basis for a classification system to be integrated in a Semantic Web Ontology covering the domain of audio effects in the context of music production.
Convention Paper 9009

P17-4 Development of Volume Balance Adjustment Device for Voices and Background Sounds within Programs for Elderly People—Tomoyasu Komori, NHK Engineering System, Inc. - Setagaya-ku, Tokyo, Japan; Waseda University - Shinjuku-ku, Tokyo, Japan; Atsushi Imai, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Nobumasa Seiyama, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Reiko Takou, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Tohru Takagi, NHK Science & Technology Research Laboratories - Setagaya-ku, Tokyo, Japan; Yasuhiro Oikawa, Waseda University - Shinjuku-ku, Tokyo, Japan
Elderly people sometimes feel that the background sounds (music and sound effects) of broadcast programs are annoying. In response, we have developed a device that can adjust, on the receiver side, the mixing balance of program sounds to suit elderly people. In speech segments (intervals in which narration and dialog are mixed with background sounds), the device suppresses uncorrelated components in the stereo background sound; in non-speech segments, it suppresses only the background sounds, without deterioration, by gain control alone. By subjective evaluations, we have verified that the proposed method can suppress the background sounds of programs by an equivalent of 6 dB, and viewing experiments with elderly people have shown that program sounds have become easier to understand.
Convention Paper 9010 (Purchase now)

P17-5 Acoustical Measurement Software Housed on Mobile Operating Systems Test—Felipe Tavera, Walters Storyk Design Group - Highland, NY, USA
A measurement test is devised to provide comparative results between a dedicated Type I sound pressure level meter and a PDA and mobile application with proprietary additional components. The test aims to analyze and compare results considering only frequency response, linearity over a selected dynamic range, and the transducer’s directivity under controlled on-site conditions. The purpose is to examine how accurately the non-dedicated hardware performs acoustic measurements.
Convention Paper 9011 (Purchase now)

P17-6 Evaluating iBall—An Intuitive Interface and Assistive Audio Mixing Algorithm for Live Football Events—Henry Bourne, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
Mixing the on-pitch audio for a live football event is a mentally challenging task requiring the experience of a skilled operator to capture all the important audio events. iBall is an intuitive interface coupled with an assistive mixing algorithm that aids the operator in achieving a comprehensive mix. This paper presents the results of subjective and empirical evaluation of the system. Using multiple stimulus comparison, event counting, fader tracking, and cross-correlation of mixes using different systems, this paper shows that less-skilled operators can produce more reliable, more dynamic, and more consistent mixes using iBall than when mixing using the traditional fader-based approach, reducing the level of skill required to create broadcast quality mixes.
Convention Paper 9012 (Purchase now)

P17-7 A Definition of XML File Format and an Editor Application for Korean Traditional Music Notation System—Keunwoo Choi, Electronics and Telecommunications Research Institute (ETRI) - Daejeon, Korea; Yong Ju Lee, Electronics and Telecommunications Research Institute (ETRI) - Daejeon, Korea; Yongju Lee, Electronics and Telecommunications Research Institute (ETRI) - Daejeon, Korea; Kyeongok Kang, Electronics & Telecom. Research Institute (ETRI) - Daejeon, Korea
In this paper a computer-based system for representing Jeongganbo, the Korean traditional music notation system, is introduced. The system consists of an XML Document Type Definition, an editor application, and a converter into MusicXML. All information in Jeongganbo, including notes, directions, playing techniques, and lyrics, is encoded into XML using the grammar of the proposed Document Type Definition. In addition, users can create and edit Jeongganbo XML files using the proposed editor and export them as a MusicXML file. As a result, users can represent, edit, and share the musical content of Korean traditional music in the digital domain, as well as analyze score-based content for information retrieval.
Convention Paper 9013 (Purchase now)

P17-8 The Structure of Noise Power Spectral Density-Driven Adaptive Post-Filtering Algorithm—Jie Wang, Guangzhou University - Guangzhou, China; Chengshi Zheng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Chunliang Zhang, Guangzhou University - Guangzhou, China; Yueyan Sun, Guangzhou University - Guangzhou, China
Conventional post-filtering (CPF) algorithms often use a fixed filter bandwidth to estimate the auto-spectra and the cross-spectrum. This paper first studies the drawback of the CPF algorithms under the stochastic model and discusses ways to improve their performance. To improve noise reduction without introducing audible speech distortion, we propose a novel spectral estimator, which is based on the structure of the noise power spectral density (NPSD). The proposed spectral estimator is applied to improve the performance of the CPF. Experimental results verify that the proposed algorithm outperforms the CPF algorithms in terms of both segmental signal-to-noise-ratio improvement and noise reduction; the noise reduction in particular is about 6 dB higher than that of the CPF algorithms.
Convention Paper 8943 (Purchase now)

Sunday, October 20, 10:30 am — 12:00 pm (Room 1E13)

Workshop: W26 - FX Design Panel: Reverb

Chair:
Joshua D. Reiss, Queen Mary University of London - London, UK
Panelists:
Michael Carnes, Exponential Audio - Cottonwood Heights, UT, USA
Casey Dowdell, Bricasti

Sunday, October 20, 11:00 am — 1:00 pm (Room 1E15/16)

Special Event: The State of Mastering – 2013

Moderator:
Bob Ludwig, Gateway Mastering Studios, Inc. - Portland, ME, USA
Presenters:
Greg Calbi, Sterling Sound - New York, NY, USA
Darcy Proper, Wisseloord Studios - Hilversum, The Netherlands
Douglas Sax, The Mastering Lab - Ojai, CA, USA
Tim Young, Metropolis Mastering - London, UK

Abstract:
Ten years ago top mastering studios generally mastered and created final production masters for only the Compact Disc. Now we commonly create production masters for CDs, Downloads, files for streaming, special "Mastered for iTunes" downloads, and high resolution files for vinyl disk cutting, HDtracks, and Pure Audio Blu-ray masters.

Our Platinum Panelists will talk about the ramifications of the state of mastering in 2013 and what the future may hold. We will include some special sound demonstrations.

Sunday, October 20, 11:00 am — 12:30 pm (Room 1E10)

Workshop: W27 - DSP Studio Monitoring

Chair:
Dave Malekpour, Professional Audio Design Inc. - Pembroke, MA, USA; Augspurger Monitors
Panelists:
Michael Blackmer, Professional Audio Design
David Kotch, Criterion Acoustics
Andrew Munro, Munro Acoustics, Dynaudio Acoustics - London, UK
Carl Nappa, Extreme Institute by Nelly - Saint Louis, MO, USA
Paul Stewart, Genelec

Abstract:
Monitoring systems have evolved over the last decade to include DSP systems for both room correction and system voicing. We will explore how this affects the listening environment for control rooms, mastering, and critical listening environments. We will examine room measurements and the correction curves employed, along with the subsequent results. The panelists will discuss their experiences and show real-world examples of where this has and has not worked in practice.

We will look at how this impacts the user's listening environment for accuracy and sonic quality. We will also explore how it affects studio design and how it is implemented by manufacturers to get the results they want from the speaker design.

Sunday, October 20, 12:30 pm — 1:30 pm (Room 1E07)

Special Event: Lunchtime Keynote: Studio of the Future: 2020–2050

Presenter:
John La Grou, Millennia Music & Media Systems - Sierra Nevada, California, USA; POW-R Consortium

Abstract:
A brief look at the evolution of audio electronics, a theory of innovation, and a sweeping vision for the next forty years of audio production technology. Informed by the growth theories of Moore, Cray, and Kurzweil, we project the next forty years of professional audio products, production techniques, and delivery formats.

Sunday, October 20, 1:30 pm — 3:00 pm (Room 1E15/16)

Special Event: Era of the Engineer (Young Guru)

Presenter:
Young Guru, Roc Nation - Brooklyn, NY

Abstract:
Revered as “The Sound of New York,” Young Guru possesses over a decade of experience in sound engineering and production for the acclaimed Roc-A-Fella Records and Def Jam Recordings. Through his lecture and demo series, #eraoftheengineer, Guru examines the recent emergence of a new generation of do-it-yourself engineers, analyzing and demonstrating what it means for the culture at large.

Sunday, October 20, 2:00 pm — 4:00 pm (Room 1E07)

Paper Session: P18 - Perception—Part 2

Chair:
Agnieszka Roginska, New York University - New York, NY, USA

P18-1 Negative Formant Space, “O Superman,” and Meaning—S. Alexander Reed, Ithaca College - New York, NY, USA
This in-progress exploration considers both some relationships between sounding and silent formants in music and the compositional idea of spectral aggregates. Using poststructuralist lenses and also interpretive spectrographic techniques informed by music theorist Robert Cogan, it offers a reading of Laurie Anderson’s 1982 hit “O Superman” that connects the aforementioned concerns of timbre with interpretive processes of musical meaning. In doing so, it contributes to the expanding musicological considerations of timbre beyond its physical, psychoacoustic, and orchestrational aspects.
Convention Paper 9014 (Purchase now)

P18-2 The Effects of Interaural Level Differences Caused by Interference between Lead and Lag on Summing Localization—M. Torben Pastore, Rensselaer Polytechnic Institute - Troy, NY, USA; Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA
Traditionally, the perception of an auditory event in the summing localization range is shown as a linear progression from a location between a coherent lead and lag to the lead location as the delay between them increases from 0 ms to approximately 1 ms. This experiment tested the effects of interference between temporally overlapping lead and lag stimuli on summing localization. We found that the perceived lateralization of the auditory event oscillates with the period of the center frequency of the stimulus, unlike what the traditional linear model would predict. Analysis shows that this is caused by interaural level differences due to interference between a coherent lead and lag.
Convention Paper 9015 (Purchase now)

P18-3 Paired Comparison as a Method for Measuring Emotions—Judith Liebetrau, Ilmenau University of Technology - Ilmenau, Germany; Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Johannes Nowak, Ilmenau University of Technology - Ilmenau, Germany; Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Thomas Sporer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Ilmenau University of Technology - Ilmenau, Germany; Matthias Krause, Ilmenau University of Technology - Ilmenau, Germany; Martin Rekitt, Ilmenau University of Technology - Ilmenau, Germany; Sebastian Schneider, Ilmenau University of Technology - Ilmenau, Germany
Due to the growing complexity and functionality of multimedia systems, quality evaluation is becoming a cross-disciplinary task that takes technology-centric assessment as well as human factors into account. Undoubtedly, emotions induced during perception have a reasonably high influence on the experienced quality. Therefore, the assessment of users’ affective state is of great interest for the development and improvement of multimedia systems. This work presents the problems of common assessment methods as well as methods newly applied in emotion research. Direct comparison of stimuli, a method intended for faster and easier assessment of emotions, is investigated and compared to previous work. The results show that paired comparison seems inadequate for assessing multidimensional items or problems, which often occur in multimedia applications.
Convention Paper 9016 (Purchase now)

P18-4 Media Content Emphasis Using Audio Effect Contrasts: Building Quantitative Models from Subjective Evaluations—Xuchen Yang, University of Rochester - Rochester, NY, USA; Zhe Wen, University of Rochester - Rochester, NY, USA; Gang Ren, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
In this paper we study media content emphasis patterns of audio effects and construct their quantitative models using subjective evaluation experiments. The media content emphasis patterns are produced by contrasts between effect-sections and non-effect sections, which change the focus of audience attention. We investigate media emphasis patterns of typical audio effects including equalization, reverberation, dynamic range control, and chorus. We compile audio test samples by applying different settings of audio effects and their permutations. Then we construct quantitative models based on the audience rating of the “subjective significance” of test audio segments. Statistical experiment design and analysis techniques are employed to establish the statistical significance of our proposed models.
Convention Paper 9017 (Purchase now)

Sunday, October 20, 2:30 pm — 4:00 pm (Room 1E14)

Workshop: W30 - Mastering Our Future Music

Chair:
Rob Toulson, Anglia Ruskin University - Cambridge, UK
Panelists:
Mandy Parnell, Black Saloon Studios - London, UK
Michael Romanowski, Michael Romanowski Mastering - San Francisco, CA, USA; Owner Coast Recorders
Jonathan Shakhovskoy, Script - London, UK

Abstract:
Emerging technologies are impacting the way in which music is captured, packaged, and delivered to the listener. Communications and working practices are evolving, bringing new challenges and opportunities for producing a high quality final product. Technical initiatives including mastering for iTunes, high resolution playback, dynamic range control, and advances in metadata require mastering engineers to continuously modernize their methods. Additionally, the methods and systems for music delivery are evolving, with artists exploring new avenues for engaging their audience. In particular the “Album App” format has been considered with regard to high resolution audio, secure digital content, and the inclusion of album artwork and interactive features. Each of these contemporary initiatives has an impact on the way the audio is mastered and finalized.

This session is presented in association with the AES Technical Committee on Recording Technology and Practices

EXHIBITION HOURS: October 18th 10am – 6pm; October 19th 10am – 6pm; October 20th 10am – 4pm

REGISTRATION DESK: October 16th 3pm – 7pm; October 17th 8am – 6pm; October 18th 8am – 6pm; October 19th 8am – 6pm; October 20th 8am – 4pm

TECHNICAL PROGRAM: October 17th 9am – 7pm; October 18th 9am – 7pm; October 19th 9am – 7pm; October 20th 9am – 6pm
