AES San Francisco 2010
Broadcast and Media Streaming Track Event Details
Thursday, November 4, 9:30 am — 11:00 am (Room 133)
Broadcast and Media Streaming: B1 - Broadcast Facility Design: Attending to the Details
John Storyk, Walters-Storyk Design Group
Keith Hanadel, HLW Broadcast Facility Design
Bill Jarett, Food Network
Jim Servies, LA ESPN Facilities
Architect/acoustician John Storyk, co-principal, Walters-Storyk Design Group, will chair a blue-ribbon panel focused on the innumerable details that must be addressed when designing or upgrading a broadcast production/post-production facility. Panelists include: Food Network VP of Engineering Bill Jarett; leading SF-based acoustician Bob Skye; Keith Hanadel, architect/project manager, HLW Broadcast Facility Design; and Jim Servies, Principal Engineer, ESPN. Among the topics to be covered are: facilitating workflow via intelligent systems design; determining and achieving exact acoustic requirements; a variety of real-world, facility-specific design issues; and the end-users’ perspective: working the room.
Thursday, November 4, 9:30 am — 12:30 pm (Room 236)
Paper Session: P2 - Speech Processing
P2-1 Language Scrambling for In-Game Voice-Chat Applications—Nicolas Tsingos, Charles Robinson, Dolby Laboratories - San Francisco, CA, USA
We propose a solution to enable speech-driven alien language synthesis in an in-game voice-chat system. Our technique selectively replaces the users’ input speech with a corresponding alien language output synthesized on-the-fly. It is optimized for a client-server architecture and uses a concatenative synthesis framework. To limit memory requirements, as well as preserve forwarding capabilities on the server, the concatenative synthesis is performed in the coded domain. For gaming applications, our approach can be used to selectively scramble the speech of opposing team members in order to provide compelling in-game voice feedback without exposing their strategy. The system has been implemented with multiple alien races in a virtual environment with effective, entertaining results.
Convention Paper 8161
P2-2 Speech Referenced Limiting: Controlling the Loudness of a Signal with Reference to its Speech Loudness—Michael Fisher, Nicky Chong-White, The HEARing CRC - Melbourne, Victoria, Australia, National Acoustic Laboratories, Sydney, NSW, Australia; Harvey Dillon, National Acoustic Laboratories - Sydney, NSW, Australia
A novel method of sound amplitude limiting for signals conveying speech is presented. The method uses the frequency-specific levels of the speech conveyed by the signal to generate a set of time-varying speech reference levels. It limits the level of sounds conveyed by the signal to these speech reference levels. The method is called speech referenced limiting (SRL). It provides minimal limiting of speech while providing greater control over the loudness of non-speech sounds compared to conventional (fixed threshold) limiters. It is appropriate for use in applications where speech is the primary signal of interest such as telephones, computers, amplified hearing protectors, and hearing aids. The effect of SRL on speech and non-speech sounds is presented.
Convention Paper 8162
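The limiting rule in P2-2 can be illustrated for a single frequency band. The sketch below is hypothetical and is not the authors' implementation: it assumes frame levels in dB are already available, that some external speech detector supplies per-frame flags, and that the speech reference is a smoothed speech level plus a fixed margin (`margin_db`, `smoothing`, and `initial_ref_db` are illustrative parameters).

```python
# Hypothetical sketch of speech-referenced limiting for ONE frequency band.
# Speech flags, margin, smoothing constant and initial reference are assumptions.


def srl_gains_db(levels_db, is_speech, margin_db=6.0, smoothing=0.9, initial_ref_db=65.0):
    """Return a gain in dB (always <= 0) for each frame of one band.

    levels_db: band level per frame, in dB.
    is_speech: per-frame flags from some speech detector (assumed to exist).
    """
    ref_db = initial_ref_db
    gains = []
    for level, speech in zip(levels_db, is_speech):
        if speech:
            # Track the speech reference level only while speech is present.
            ref_db = smoothing * ref_db + (1.0 - smoothing) * level
        limit_db = ref_db + margin_db
        # Attenuate only the part of the level that exceeds the speech-based limit.
        gains.append(min(0.0, limit_db - level))
    return gains


# Two speech frames at 60 dB, then a 90 dB non-speech burst.
print(srl_gains_db([60.0, 60.0, 90.0], [True, True, False],
                   margin_db=6.0, smoothing=0.5, initial_ref_db=60.0))
# prints [0.0, 0.0, -24.0]
```

Speech frames near the reference pass untouched, while the burst is pulled down to the tracked speech level plus the margin, rather than to a fixed threshold.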
P2-3 Individually Adjustable Signal Processing Algorithms for Improved Speech Intelligibility with Cochlear Implants—Isabell Kiral-Kornek, Bernd Edler, Jörn Ostermann, Leibniz Universität Hannover - Hannover, Germany; Andreas Büchner, Hörzentrum Hannover, Hannover Medical School - Hannover, Germany
Thanks to the development of cochlear implants (CIs), the treatment of certain hearing impairments, and even deafness, has become possible. However, individual adjustment of the speech processing within a cochlear implant has so far been limited to unequal amplification of different frequency bands. Because patients' perception of speech differs in more than just loudness, individually adjustable signal processing needs to be improved. A novel approach is presented that aims to increase intelligibility for patients by extending common speech recognition tests to allow speech processing optimized for each patient's needs.
Convention Paper 8163
P2-4 Objective Evaluation of Wideband Speech Codecs for Bluetooth Voice Communication—Gary Spittle, CSR - Cambridge Silicon Radio - Cambridge, UK; Jacek Spiewla, Walter Kargus, Walter Zuluaga, Xuejing Sun, CSR - Cambridge Silicon Radio - Detroit, MI, USA
Bluetooth devices that stream audio are becoming increasingly popular. User expectations have risen, and as a result the requirements on wireless audio devices using Bluetooth have produced a number of challenges. This paper discusses the impact of various adverse connection conditions on wideband speech, measured in terms of speech quality and intelligibility. A detailed understanding of the various forms of degradation allows proper solutions to be provided.
Convention Paper 8164
P2-5 Speech Synthesis Controlled by Eye Gazing—Andrzej Czyzewski, Kuba Lopatka, Bartosz Kunka, Rafal Rybacki, Bozena Kostek, Gdansk University of Technology - Gdansk, Poland
A method of communication based on eye-gaze control is presented. Gaze tracking has been investigated in a variety of application contexts. The proposed solution could be described as "talking by eyes," offering an innovative approach in the domain of speech synthesis. The application is intended for disabled people, especially persons with so-called locked-in syndrome, who cannot talk or move any part of their body. The paper describes a methodology for determining the fixation point on a computer screen, then presents the concatenative speech synthesis algorithm used in the engineered solution. An analysis of working with the system is provided, and conclusions focusing on system characteristics are included.
Convention Paper 8165
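P2-5 does not spell out its fixation-detection method here. As an illustration only, the widely used dispersion-threshold (I-DT) approach groups consecutive gaze samples whose spread stays within a limit and reports their centroid as the fixation point. The sketch assumes screen coordinates in pixels sampled at a fixed rate; `max_dispersion` and `min_samples` are hypothetical tuning values, not figures from the paper.

```python
# Illustrative dispersion-threshold (I-DT) fixation detection; not the paper's method.


def dispersion(window):
    """Spread of a window of (x, y) samples: x-range plus y-range."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt_fixations(points, max_dispersion, min_samples):
    """Return fixations as (centre_x, centre_y, start_index, end_index)."""
    fixations = []
    i = 0
    while i + min_samples <= len(points):
        j = i + min_samples
        if dispersion(points[i:j]) <= max_dispersion:
            # Grow the window while the samples stay tightly clustered.
            while j < len(points) and dispersion(points[i:j + 1]) <= max_dispersion:
                j += 1
            window = points[i:j]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((cx, cy, i, j))
            i = j
        else:
            i += 1  # treat the first sample as part of a saccade and move on
    return fixations


gaze = [(100, 100)] * 10 + [(500, 300)] * 10
print(idt_fixations(gaze, max_dispersion=20, min_samples=5))
# prints [(100.0, 100.0, 0, 10), (500.0, 300.0, 10, 20)]
```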
P2-6 Voice Samples Recording and Speech Quality Assessment for Forensic and Automatic Speaker Identification—Andrey Barinov, Speech Technology Center Ltd. - Saint Petersburg, Russia
The task of speaker recognition, or speaker identification, has become very important in our digital world. Most law enforcement organizations use either automatic or manual speaker identification tools in investigations. In either case, before carrying out the identification analysis, they usually need to record a voice sample from the suspect, either for one-to-one comparison or to add to a database. In this paper we describe the speech signal parameters that are important for speaker identification performance, propose approaches to quality assessment, and provide practical recommendations for recording a high-quality voice sample acceptable for speaker identification. The material in this paper may be useful to both software/hardware developers and forensic practitioners.
Convention Paper 8166
Thursday, November 4, 9:45 am — 11:45 am (Room 120)
Workshop: W2 - Standards for Multichannel Audio Distribution
Veronique Larcher, Sennheiser Research
Peter Jax, Technicolor Research & Innovation
Rozenn Nicol, Orange Labs
Nils Peters, CNMAT, ICSI, UC Berkeley
Jack Vad, San Francisco Symphony
Wilfried van Baelen, Galaxy Studios
The case of 7.1 audio distribution seems to be well covered by Blu-ray discs and the Dolby TrueHD or DTS-HD formats. But how should this audio content be streamed? What mechanisms are in place to play it back on mobile platforms? Or to broadcast it to stadiums? To the home? The benefits offered by more channels, and specifically by surround sound with height, are gaining traction in the car industry. What benefits exactly? Will the channel inflation ever stop? The video industry has made progress toward ubiquitous high-definition and 3-D formats. What constraints does it face in combining its video content with our multichannel audio? This workshop will gather practitioners from these application fields and collect everyone's hopes and constraints for the next multichannel audio distribution standard.
Thursday, November 4, 11:30 am — 1:00 pm (Room 130)
Tutorial: T2 - Equalization—Are You Getting the Most Out of this Humble Effect?
Alex U. Case, University of Massachusetts Lowell - Lowell, MA, USA
Track by track, mix by mix, we reach for equalization constantly. EQ seems easy at first, and it becomes more intuitive with a deep understanding of the parameters, types, and technologies used, plus deep knowledge of the spectral content of the most common pop and rock instruments. Alex Case offers a routine for applying EQ and strategies for its use: fixing problems, enhancing features, fitting the spectral pieces together, and more.
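To make the EQ parameters concrete, here is a peaking (bell) equalizer built from the widely used Audio EQ Cookbook biquad formulas by Robert Bristow-Johnson. This is an illustrative sketch, not material from the tutorial: centre frequency, gain, and Q map directly onto the filter coefficients, and a helper evaluates the response magnitude at any frequency.

```python
import cmath
import math


def peaking_eq(f0, gain_db, q, fs):
    """Audio EQ Cookbook peaking filter; returns normalized (b, a) coefficients."""
    a_lin = 10 ** (gain_db / 40)           # square root of the linear peak gain
    w0 = 2 * math.pi * f0 / fs             # centre frequency in radians/sample
    alpha = math.sin(w0) / (2 * q)         # bandwidth term set by Q
    b = [1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def magnitude_db(b, a, f, fs):
    """Magnitude response of the biquad at frequency f, in dB."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(abs(h))


# +6 dB bell at 1 kHz, Q = 1, 48 kHz sample rate.
b, a = peaking_eq(1000.0, 6.0, 1.0, 48000.0)
for f in (100.0, 1000.0, 10000.0):
    print(f, round(magnitude_db(b, a, f, 48000.0), 2))
```

The full boost appears exactly at the centre frequency, and the response returns to 0 dB far from it; raising Q narrows the affected region.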
Thursday, November 4, 11:30 am — 1:00 pm (Room 133)
Broadcast and Media Streaming: B2 - Innovations in Digital TV
Jerry Whitaker, ATSC
Tim Carroll, Linear Acoustic
Sterling Davis, Cox Media Group
David Layer, NAB
Geir Skaaden, DTS Inc.
David Wilson, CEA
With the transition to digital television in North America well behind us, the various elements of the broadcast-to-consumer chain continue to look for ways to improve the service. Recent developments include technologies such as mobile DTV, non-real-time delivery of program material, and Internet-enabled television sets. Cooperative efforts are underway to develop new services while at the same time preserving the legacy services enjoyed by millions of consumers. This session will examine work currently underway to advance digital television to the next level, including concepts, options, and possible timelines.
Thursday, November 4, 2:30 pm — 6:30 pm (Room 236)
Paper Session: P3 - Acoustical Measurements
P3-1 Methods for Extending Room Impulse Responses beyond Their Noise Floor—Nicholas J. Bryan, Jonathan S. Abel, Stanford University - Stanford, CA, USA
Two methods of extending measured room impulse responses below their noise floor and beyond their measured duration are presented. Both methods extract frequency-dependent reverberation energy decay rates, equalization levels, and noise floor levels, and subsequently extrapolate the reverberation decay toward silence. The first method crossfades impulse response frequency bands with a late-field response synthesized from Gaussian noise. The second method imposes the desired decay rates on the original impulse response bands. Both methods maintain an identical impulse response prior to the noise floor arrival in each band and seamlessly transition to a natural sounding decay after the noise floor arrival.
Convention Paper 8167
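The first method in P3-1 (decay-rate fit, noise-floor estimate, crossfade to synthesized Gaussian noise) can be sketched for a single frequency band. This is a simplified illustration under stated assumptions, not the authors' code: the caller supplies the sample range holding clean decay and the start of a noise-only tail, the decay is fitted as a straight line in dB, and a full version would repeat the process per band after a filterbank split.

```python
import math
import random


def block_levels_db(x, start, stop, block):
    """Mean-square level (dB) and centre index of consecutive blocks in x[start:stop]."""
    centres, levels = [], []
    for b in range(start, stop - block + 1, block):
        energy = sum(v * v for v in x[b:b + block]) / block
        centres.append(b + block / 2.0)
        levels.append(10.0 * math.log10(energy + 1e-30))
    return centres, levels


def extend_ir(ir, fit_start, fit_stop, noise_start, extra, block=256, fade=512, seed=0):
    """Extend one band of an impulse response below its noise floor (sketch).

    fit_start/fit_stop: sample range assumed to hold clean exponential decay.
    noise_start: index after which the response is assumed to be noise only.
    Returns (extended_ir, noise_floor_arrival_index).
    """
    # 1. Noise-floor level from the assumed noise-only tail.
    tail = ir[noise_start:]
    noise_db = 10.0 * math.log10(sum(v * v for v in tail) / len(tail))

    # 2. Least-squares straight line (dB versus sample index) through block levels.
    t, lv = block_levels_db(ir, fit_start, fit_stop, block)
    t_mean, l_mean = sum(t) / len(t), sum(lv) / len(lv)
    slope = (sum((a - t_mean) * (b - l_mean) for a, b in zip(t, lv))
             / sum((a - t_mean) ** 2 for a in t))
    if slope >= 0:
        raise ValueError("fit range does not show a decay")
    intercept = l_mean - slope * t_mean

    # 3. Noise-floor arrival: where the fitted decay line meets the noise floor.
    arrival = int((noise_db - intercept) / slope)

    # 4. Gaussian noise shaped by the fitted decay, faded in (equal power) after
    #    the arrival point, so samples before the arrival are left unchanged.
    rng = random.Random(seed)
    out = []
    for k in range(len(ir) + extra):
        synth = 10.0 ** ((intercept + slope * k) / 20.0) * rng.gauss(0.0, 1.0)
        measured = ir[k] if k < len(ir) else 0.0
        w = min(max((k - arrival) / fade, 0.0), 1.0)
        out.append(math.sqrt(1.0 - w) * measured + math.sqrt(w) * synth)
    return out, arrival
```

Samples before the returned arrival index are passed through untouched, mirroring the property stated in the abstract; after it, the noise floor is replaced by a decay that continues toward silence.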
P3-2 On the Use of Ultrasound Transducer Arrays to Account for Time-Variance on Room Acoustics Measurements—Joel Preto Paulo, ISEL- Instituto Superior de Engenharia de Lisboa - Lisbon, Portugal, CAPS – Instituto Superior Técnico, TU Lisbon, Lisbon, Portugal; José Bento Coelho, CAPS – Instituto Superior Técnico, TU Lisbon - Lisbon, Portugal
In real room acoustical measurements, the assumption of a time-invariant system usually does not hold. A measurement technique was set up to monitor the acoustic medium, to search for time-variance phenomena, and to handle low-SNR situations. An ultrasonic probe signal is sent into the room by a highly directional parametric loudspeaker array, simultaneously with the test signal frames. The parameters relevant to establishing time variance, and the associated thresholds, are then estimated from the acquired ultrasonic sound. Valid test signal frames, which pass the threshold test, are labeled with a weighting factor depending on their significance; the remaining frames are rejected and excluded from the averaging process. Results are presented and discussed.
Convention Paper 8168
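The frame-selection step in P3-2 amounts to a thresholded, weighted average. In the sketch below, the per-frame score is a hypothetical zero-lag normalized correlation between the emitted and received ultrasonic probe; the paper's actual time-variance parameters, thresholds, and weights may differ.

```python
import math


def normalized_correlation(reference, received):
    """Zero-lag normalized correlation between emitted and received probe (hypothetical score)."""
    num = sum(a * b for a, b in zip(reference, received))
    den = math.sqrt(sum(a * a for a in reference) * sum(b * b for b in received))
    return num / den if den else 0.0


def weighted_average(frames, scores, threshold):
    """Average equal-length frames, rejecting those scoring below threshold
    and weighting the survivors by their score."""
    kept = [(f, s) for f, s in zip(frames, scores) if s >= threshold]
    if not kept:
        raise ValueError("no frame passed the time-variance test")
    total = sum(s for _, s in kept)
    return [sum(s * f[i] for f, s in kept) / total for i in range(len(frames[0]))]


# The third frame is disturbed (low score) and is left out of the average.
print(weighted_average([[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]], [1.0, 1.0, 0.1], 0.5))
# prints [2.0, 2.0]
```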
P3-3 Impulse Response Measurements in the Presence of Clock Drift—Nicholas J. Bryan, Miriam A. Kolar, Jonathan S. Abel, Stanford University - Stanford, CA, USA
There are many impulse response measurement scenarios in which the playback and recording devices maintain separate, unsynchronized digital clocks, resulting in clock drift. Clock drift is problematic for impulse response measurement techniques involving convolution, including sinusoidal sweeps and pseudo-random noise sequences. We present an analysis of both a drifting record clock and a drifting playback clock, with a focus on swept sinusoids. When a sinusoidal sweep is used without accounting for clock drift, the resulting impulse response is seen to be convolved with an allpass filter that has the same frequency-trajectory form as the input swept sinusoid, with a duration proportional to the input sweep length. Two methods are proposed for estimating the clock drift and compensating for its effects in producing an impulse response measurement. Both methods are shown to effectively eliminate clock-drift effects in room impulse response measurements.
Convention Paper 8169
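The two compensation methods of P3-3 are not detailed in the abstract, so the following is only a generic illustration of the idea: estimate a drift ratio from the spacing of two reference events (a hypothetical setup), then resample the recording onto the nominal time grid before deconvolution. Linear interpolation keeps the sketch short; a practical implementation would use a higher-quality resampler.

```python
def drift_ratio(nominal_interval, measured_interval):
    """Ratio of recorded to nominal sample count between two reference events."""
    return measured_interval / nominal_interval


def resample_linear(x, ratio):
    """Map a recording stretched by `ratio` back onto the nominal time grid.

    Output sample n is read, with linear interpolation, at position n * ratio.
    """
    n_out = int((len(x) - 1) / ratio) + 1
    y = []
    for n in range(n_out):
        pos = n * ratio
        i = int(pos)
        frac = pos - i
        if i + 1 < len(x):
            y.append((1.0 - frac) * x[i] + frac * x[i + 1])
        else:
            y.append(x[-1])  # guard against rounding at the very end
    return y


# Two reference clicks nominally 441000 samples apart arrive 441022 samples apart.
r = drift_ratio(441000, 441022)
print(f"drift: {(r - 1) * 1e6:.1f} ppm")  # prints drift: 49.9 ppm
```

Even this modest offset amounts to 22 samples over 10 s at 44.1 kHz, which is why uncorrected drift smears a long sweep measurement.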
P3-4 Quasi-Anechoic Loudspeaker Measurement Using Notch Equalization for Impulse Shortening—Richard Stroud, Stroud Audio Inc. - Kokomo, IN, USA
The length of the impulse response of a typical piston driver is largely determined by the characteristic second-order high-pass response of the driver. This time response makes anechoic (i.e., gated) measurement difficult in non-anechoic environments, as reflections must be held off for 30 ms or more. This paper outlines a quasi-anechoic frequency and phase response modification technique using a tuned notch, or band-cut, equalization that shortens the impulse response and allows correct full-range loudspeaker measurement in moderately sized non-anechoic rooms.
Convention Paper 8170
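The 30 ms figure in P3-4 can be sanity-checked with a worked example. For a second-order high-pass with complex poles, the impulse-response envelope decays as exp(-σt) with σ = 2πf0/(2Q). Assuming an illustrative driver with f0 = 50 Hz and Q = 0.7 (values not taken from the paper), the time for the envelope to fall by, say, 60 dB is:

```python
import math


def decay_time_s(f0_hz, q, drop_db=60.0):
    """Time for the impulse-response envelope of a 2nd-order high-pass to fall by drop_db.

    Valid for q > 0.5 (complex poles), where the envelope decays as exp(-sigma * t)
    with sigma = 2*pi*f0 / (2*q).
    """
    if q <= 0.5:
        raise ValueError("sketch assumes an underdamped response (q > 0.5)")
    sigma = 2.0 * math.pi * f0_hz / (2.0 * q)
    return drop_db * math.log(10.0) / 20.0 / sigma


print(f"{decay_time_s(50.0, 0.7) * 1000:.1f} ms")  # prints 30.8 ms
```

Halving f0 doubles this time, so lower-tuned drivers need an even longer reflection-free window unless the response is shortened, which is what the notch equalization aims to do.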
P3-5 Estimating Room Impulse Responses from Recorded Balloon Pops—Jonathan S. Abel, Nicholas J. Bryan, Patty P. Huang, Miriam A. Kolar, Bissera V. Pentcheva, Stanford University - Stanford, CA, USA
Balloon pops are convenient for probing the acoustics of a space, as they generate relatively uniform radiation patterns and consistent “N-wave” waveforms. However, the N-wave spectrum contains nulls that impart an undesired comb-filter-like quality when the recorded balloon pop is convolved with audio. Here, a method for converting recorded balloon pops into full bandwidth impulse responses is presented. Rather than directly processing the balloon pop recording, an impulse response is synthesized according to the echo density and frequency band energies estimated in running windows over the balloon pop. Informal listening tests show good perceptual agreement between measured room impulse responses using a loudspeaker source and a swept sine technique and those derived from recorded balloon pops.
Convention Paper 8171
P3-6 Complex Modulation Transfer Function and its Applications in Transducer and Room Acoustics Measurements—Juha Backman, Nokia Corporation - Espoo, Finland
The modulation transfer function describes the clarity of sound well in audio applications, but conventional definitions and measurement methods are not easily applicable to transducer measurements, low-frequency acoustics, or capturing the effects of narrow-band phenomena. A revised definition of the modulation transfer function is presented, taking into account the magnitude and phase of the modulation transfer for each combination of carrier and modulator frequency. This function is derived from the complex frequency response by analyzing the response at the carrier frequency and at the modulation sidebands. The distortion of the modulation envelope arising from asymmetry, especially in the phase transfer properties, is also discussed. Examples of the use of the complex modulation transfer function are presented for simple filters, anechoic response measurements of loudspeakers, and loudspeakers in rooms.
Convention Paper 8172
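One way to obtain a complex modulation transfer value from the complex frequency response, in the spirit of P3-6, follows from a small-modulation-depth analysis. A carrier at fc with sinusoidal amplitude modulation at fm (depth m) passes through H(f); to first order in m, the normalized output envelope is 1 + m·Re{M·exp(j2πfm·t)} with M = [H(fc+fm)/H(fc) + conj(H(fc-fm)/H(fc))]/2. This derivation is offered for illustration; the paper's exact definition may differ.

```python
import cmath
import math


def complex_mtf(h, fc, fm):
    """First-order complex modulation transfer for carrier fc and modulation fm.

    h: callable returning the complex frequency response H(f).
    Combines the upper sideband with the conjugate of the lower sideband,
    both normalized by the response at the carrier.
    """
    h0 = h(fc)
    upper = h(fc + fm) / h0
    lower = h(fc - fm) / h0
    return (upper + lower.conjugate()) / 2


# A pure 1 ms delay: the envelope is delayed (phase) but not attenuated (|M| = 1).
delay = lambda f: cmath.exp(-2j * math.pi * f * 1e-3)
m = complex_mtf(delay, 1000.0, 10.0)
print(abs(m), cmath.phase(m))
```

For a pure delay the phase of M equals the envelope delay at fm, while any asymmetry between the two sidebands shows up as a change in |M| or arg M.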
P3-7 Practical Implementation of Perceptual Rub & Buzz Distortion and Experimental Results—Steve Temme, Pascal Brunet, Brian Fallon, Listen, Inc. - Boston, MA, USA
In a previous paper, we demonstrated how an auditory perceptual model based on an ITU standard can be used to detect audible Rub & Buzz defects in loudspeakers using a single tone stimulus. In this paper we demonstrate a practical implementation using a stepped sine sweep stimulus and present detailed experimental results on loudspeakers, including comparison to human listeners and other perceptual methods.
Convention Paper 8173
P3-8 Measurement of Turbulent Air Noise Distortion in Loudspeaker Systems—Wolfgang Klippel, Robert Werner, Klippel GmbH - Dresden, Germany
Air leaks in the dust cap and cabinets of loudspeakers generate turbulent noise that severely impairs perceived sound quality, as rub and buzz and other loudspeaker defects do. However, traditional measurement techniques often fail to detect air leaks because the noise has a large spectral bandwidth but a low power density, with spectral properties similar to those of the ambient noise in a production environment. The paper models the generation process of turbulent air noise and develops a novel measurement technique based on asynchronous demodulation and envelope averaging. The technique accumulates the total energy of the leak noise radiated during the measurement interval and increases sensitivity by more than 20 dB for measurement times longer than 1 s. The paper also presents the results of a practical evaluation and discusses application to end-of-line testing.
Convention Paper 8174
Thursday, November 4, 2:30 pm — 4:00 pm (Room 132)
Tutorial: T3 - Headphones, Headsets, and Earphones: Electroacoustic Design and Verification
Christopher J. Struck
This presentation reviews the basic electroacoustic concepts of gain, sensitivity, sound fields, signals, and linear and non-linear systems for ear-worn devices. The Insertion Gain concept is introduced. The orthotelephonic response is described as a target for both the free and diffuse fields. Equivalent volume and acoustic impedance are defined. Ear simulators and test manikins appropriate for circum-, supra-, and intra-aural earphones are presented. The salient portions of the IEC 60268-4 standard are reviewed, and examples are given of the basic measurements: frequency response, distortion, and impedance. The basic concepts of noise-canceling devices are also presented.
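Sensitivity of ear-worn devices is quoted either per milliwatt or per volt, and converting between the two is a frequent source of confusion. The sketch below assumes the nominal impedance is purely resistive at the measurement frequency; the example figures (110 dB SPL at 1 V into 32 Ω) are illustrative, not taken from the tutorial or from the standard.

```python
import math


def spl_per_mw(spl_db, drive_v, impedance_ohm):
    """Sensitivity in dB SPL/mW from an SPL measured at drive_v volts.

    Assumes a purely resistive load, so electrical power is V^2 / Z.
    """
    power_mw = drive_v ** 2 / impedance_ohm * 1000.0
    return spl_db - 10.0 * math.log10(power_mw)


def spl_per_v(spl_db, drive_v):
    """Sensitivity in dB SPL/V from an SPL measured at drive_v volts."""
    return spl_db - 20.0 * math.log10(drive_v)


print(round(spl_per_mw(110.0, 1.0, 32.0), 1), spl_per_v(110.0, 1.0))  # prints 95.1 110.0
```

At 1 V a 32 Ω load draws 31.25 mW, so the per-milliwatt figure sits about 15 dB below the per-volt figure; for a different impedance the offset changes, which is why the two ratings cannot be compared directly.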
Thursday, November 4, 2:30 pm — 5:00 pm (Room 220)
Paper Session: P4 - Loudness and Dynamics
P4-1 The Loudness War: Background, Speculation, and Recommendations—Earl Vickers, STMicroelectronics, Inc. - Santa Clara, CA, USA
There is growing concern that the quality of commercially distributed music is deteriorating as a result of mixing and mastering practices used in the so-called “loudness war.” Due to the belief that “louder is better,” dynamics compression is used to squeeze more and more loudness into the recordings. This paper reviews the history of the loudness war and explores some of its possible consequences, including aesthetic concerns and listening fatigue. Next, the loudness war is analyzed in terms of game theory. Evidence is presented to question the assumption that loudness is significantly correlated to listener preference and sales rankings. The paper concludes with practical recommendations for de-escalating the loudness war.
Convention Paper 8175
P4-2 Subjective Evaluation of Gating Methods for Use with the ITU-R BS.1770 Loudness Algorithm—Scott Norcross, Michel Lavoie, Communications Research Centre - Ottawa, Ontario, Canada
Loudness measurements using ITU-R Recommendation BS.1770 can be biased downward relative to the perceived loudness level when periods of silence and/or low-level signals are present in the program being measured. To address this, it has been proposed that some form of gating be added to the loudness algorithm. To evaluate various gating methods, a formal subjective test was conducted to measure the subjective loudness of broadcast material. The results of the subjective test were used to assess the performance of the gating technique proposed by the EBU P/LOUD expert group on loudness. The study further explored the effect of gating threshold and analysis window size on the accuracy of the objective measurement. While the use of gating did improve the accuracy of the loudness algorithm, no single combination of threshold and window size could be found that satisfied all scenarios.
Convention Paper 8176
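Two-stage gating of BS.1770-style loudness can be sketched on precomputed block energies. This is illustrative: the inputs are assumed to be K-weighted mean-square values per measurement block for a single channel (channel weight 1), and the defaults (absolute gate -70 LKFS, relative gate -10 LU) are the values adopted in later revisions of BS.1770, not necessarily those tested in P4-2. The threshold and block size are exactly the parameters the paper varies.

```python
import math


def block_loudness(mean_square):
    """BS.1770 loudness of one block (mono, channel weight 1), in LKFS."""
    return -0.691 + 10.0 * math.log10(mean_square)


def gated_loudness(block_ms, abs_gate=-70.0, rel_gate=-10.0):
    """Integrated loudness from K-weighted block mean-squares with two-stage gating.

    rel_gate must be negative (or -inf to disable relative gating).
    """
    # Stage 1: the absolute gate discards near-silent blocks.
    stage1 = [z for z in block_ms if z > 0 and block_loudness(z) > abs_gate]
    if not stage1:
        return float("-inf")
    # Stage 2: the relative gate is set below the loudness of the surviving blocks.
    threshold = block_loudness(sum(stage1) / len(stage1)) + rel_gate
    stage2 = [z for z in stage1 if block_loudness(z) > threshold]
    return block_loudness(sum(stage2) / len(stage2))


loud = [10 ** ((-23.0 + 0.691) / 10)] * 10   # ten blocks at -23 LKFS
quiet = [1e-12] * 10                         # ten near-silent blocks
print(round(gated_loudness(loud + quiet), 2))  # prints -23.0
print(round(gated_loudness(loud + quiet, abs_gate=float("-inf"),
                           rel_gate=float("-inf")), 2))  # prints -26.01
```

Without gating, the silent half of the program pulls the reading about 3 dB low, which is the downward bias described above; gating restores -23 LKFS.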
P4-3 Comparing Continuous Subjective Loudness Responses and Computational Models of Loudness for Temporally Varying Sounds—Sam Ferguson, University of New South Wales - Sydney, NSW, Australia; Densil Cabrera, The University of Sydney - Sydney, NSW, Australia; Emery Schubert, University of New South Wales - Sydney, NSW, Australia
There are many ways in which loudness can be objectively estimated, including simple weighted models based on physical sound level, as well as complex and computationally intensive models that incorporate many psychoacoustical factors. These complex models have been generated from principles and data derived from listening experiments using highly controlled, usually brief, artificial stimuli; whereas the simple models tend to have a real world emphasis in their derivation and validation. Loudness research has recently also focused on estimating time-varying loudness, as temporal aspects can have a strong effect on loudness. In this paper continuous subjective loudness responses are compared to time-series outputs of loudness models. We use two types of stimuli: a sequence of sine tones and a sequence of band-limited noise bursts. The stimuli were analyzed using a variety of loudness models, including those of Glasberg and Moore, Chalupper and Fastl, and Moore, Glasberg and Baer. Continuous subjective responses were obtained from 24 university students, who rated loudness continuously in time over the period of the experiment, while using an interactive interface.
Convention Paper 8177
P4-4 Measuring Dynamics: Comparing and Contrasting Algorithms for the Computation of Dynamic Range—Jon Boley, LSB Audio LLC - Lafayette, IN, USA; Michael Lester, Shure Incorporated - Niles, IL, USA; Christopher Danner, University of Miami - Coral Gables, FL, USA
There is a consensus among many in the audio industry that recorded music has grown increasingly compressed over the past few decades. Some industry professionals are concerned that this compression often results in poor audio quality with little dynamic range. Although some algorithms have been proposed for calculating dynamic range, we have not been able to find any studies suggesting that any of these metrics accurately represent any perceptual dimension of the measured sound. In this paper we review the various proposed algorithms and compare their results with the results of a listening test. We show that none of the tested metrics accurately predict the perceived dynamic range of a musical track, but we identify some potential directions for future work.
Convention Paper 8178
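As an example of the kind of simple, signal-based metric that P4-4 examines, the crest factor (peak-to-RMS ratio in dB) is sketched below. Whether crest factor was among the metrics tested is not stated in the abstract, which reports that none of the tested metrics reliably predicted perceived dynamic range.

```python
import math


def crest_factor_db(x):
    """Peak-to-RMS ratio of a signal block, in dB."""
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20.0 * math.log10(peak / rms)


sine = [math.sin(2 * math.pi * n / 8) for n in range(80)]   # 10 full periods
square = [1.0, -1.0] * 40
print(round(crest_factor_db(sine), 2), round(crest_factor_db(square), 2))  # prints 3.01 0.0
```

Heavy limiting drives a track toward the square-wave end of this scale, but a single number like this ignores how level varies over time, one reason such metrics can disagree with listeners.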
P4-5 Dynamic Range Control for Audio Signals Using Fourth-Order Processing—Qing Yang, John Harris, University of Florida - Gainesville FL, USA
The human auditory system has been shown to be more sensitive to transient signals than to stationary signals of the same energy. Conventional second-order measurements based on energy or root-mean-square value cannot adequately characterize the auditory perception of non-stationary audio signals. A fourth-order dynamic range control (DRC) algorithm is proposed in this paper. The perceptual quality and dynamic range reduction effectiveness are evaluated for both second-order and fourth-order DRC algorithms. Evaluation results show that the proposed fourth-order DRC algorithm offers a better balance of perceptual quality and dynamic range reduction than the conventional second-order approach.
Convention Paper 8179
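The motivation behind P4-5 can be illustrated numerically: two frames with equal energy have the same RMS level, but a fourth-order level (fourth root of the mean fourth power) reads higher for the transient one. The compressor curve below is a generic static curve with hypothetical threshold and ratio values; it is not the authors' DRC algorithm.

```python
import math


def rms_level(x):
    """Second-order (RMS) level of a frame."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def fourth_order_level(x):
    """Fourth-order level: fourth root of the mean fourth power."""
    return (sum(v ** 4 for v in x) / len(x)) ** 0.25


def compressor_gain_db(level_db, threshold_db=-15.0, ratio=4.0):
    """Static compressor curve: gain reduction above threshold (hypothetical settings)."""
    over = level_db - threshold_db
    return 0.0 if over <= 0 else -over * (1.0 - 1.0 / ratio)


steady = [0.1] * 100           # stationary frame
click = [1.0] + [0.0] * 99     # transient frame with the same energy
for name, frame in (("steady", steady), ("click", click)):
    rms_db = 20 * math.log10(rms_level(frame))
    m4_db = 20 * math.log10(fourth_order_level(frame))
    print(name, round(rms_db, 1), round(m4_db, 1), compressor_gain_db(m4_db))
```

Both frames read -20 dB RMS, but the click reads 10 dB higher on the fourth-order scale, so a fourth-order detector compresses the transient while leaving the steady frame alone.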
Thursday, November 4, 3:00 pm — 4:30 pm (Room 133)
Broadcast and Media Streaming: B3 - Lip Sync Issue
Jonathan S. Abrams, CBNT, Chief Technical Engineer, Nutmeg Post
Paul Briscoe, Manager, Strategic Engineering, Harris Broadcast Communications Division
Dan Desmet, Flanders Scientific, Inc.
Pat Waddell, Chair of ATSC TSG/S6
Dave Wilson, Sr. Director, Technology & Standards, Consumer Electronics Association
Lip sync remains a complex problem, with several causes and few solutions. From production through transmission and reception, there are many points where lip sync can either be properly corrected or made even worse. This session's panel will discuss several key issues. Where do the latency issues exist? How can the latency be measured? What correction techniques exist for controlled environments? How does video display design affect lip sync? Who is responsible for implementing the mechanisms that ensure lip sync is maintained when the signal reaches your television?
Join us as our panel addresses these questions and more.
Thursday, November 4, 3:00 pm — 5:00 pm
Technical Tour: TT4 - CBS Interactive
CBS Interactive (cbsinteractive.com) is the digital division of CBS Corporation and includes CNET, TV.com, CBS.com, CBSSports.com, MetaCritic.com, GameSpot, Chow, The Insider.com, Money Watch, and BNET. Located in the heart of San Francisco’s SOMA district, their headquarters includes media production and distribution facilities for the entire content distribution network.
Technical Tours are made available on a first come, first served basis to anyone with an All Access badge. Tickets can be purchased during normal registration hours at the convention center.
Price: $35 Members / $45 Nonmembers
Thursday, November 4, 4:30 pm — 6:30 pm (Room 130)
Tutorial: T5 - ImmersAV—"Infinite-Channel" Surround Sound with HD Video—A New Entertainment Format
Robert B. Schulein
In previous tutorials at AES125 and AES126 in San Francisco and Munich, the essential elements of binaural hearing, recording, and playback were presented from historical, current-practice, and future-trends perspectives. One future trend presented was the entertainment potential of the synergy between binaural audio and high-definition video. Unlike traditional audio recordings, often experienced with one's eyes closed, visual cues directly related to an audio recording are readily observed to heighten the perceived spatial accuracy of the audio experience. Given how much entertainment audio and video is experienced with earphones connected to personal media players and computers, a case can be made for producing entertainment in this fashion. The result can be a "you are there," infinite-multichannel surround audio format with high-definition video. The focus of this tutorial is to present the elements of creating such productions from artistic and technical perspectives. Of particular importance are the considerations given to the acoustic space, the music, and the musician. A range of production examples will be presented, supported by a variety of headphone and high-definition video playback systems.
Thursday, November 4, 5:00 pm — 6:30 pm (Room 206)
Game Audio: G3 - The Wide Wonderful World of 5.1 Orchestral Recordings
Richard Dekkard, Director, Orphic Media LLC
Tim Gedemer, Owner/Supervising Sound Editor, Source Sound Inc.
When recording an orchestra was a simple affair using two or three microphones, the performance, the choice and placement of those microphones, and the quality of the recording medium were all that factored into the result. These days, orchestral recording takes almost as many forms as pop recording: spot mics, multichannel arrays, postproduction, and editing are all used in the production process. In this panel, experts in both game and film audio will discuss how producers and engineers arrive at their final goals for different formats and deal with the challenges of 5.1 orchestral recording in their media. Topics will include the different footprint limits of the 5.1 format used for games versus the film format, as well as the limits involved in streaming bandwidth for both games and movies. Panelists will go over editing in 5.1 for games to accommodate player-driven music, as compared to the linear progression standard in film editing, and will also discuss the lack of 5.1 standards in games versus the established processes and standards for film.
Thursday, November 4, 5:00 pm — 6:00 pm (Room 133)
Broadcast and Media Streaming: B4 - Case Study of PungaNet: Uniting Radio Stations Across a Country
Kirk Harnack, Telos
Igor Zukina, AVC-Group
The traditional one-to-many audio distribution model is ineffective. Indeed, it is only half a solution when each affiliate has contributions for all or part of the organization. Consider that a network of related radio stations has resources that are likely geographically widespread. A Central Router Management System provides a management cloud from which individual affiliates may choose published resources and publish their own talent and programming assets.
PungaNet is an ingenious suite of mostly off-the-shelf, IP-connected technologies that enables many simultaneous network topologies, each on a scheduled or ad hoc basis, for distributing and sharing audio, control, metadata, and broadcast business processes over standard IP infrastructure.
Friday, November 5, 9:00 am — 10:45 am (Room 131)
Live Sound Seminar: LS3 - Measurement Microphones
Ray Rayburn, K2 Audio
David Josephson, Josephson Engineering
Noland Lewis, ACO Pacific
Karl Winkler, Lectrosonics
What makes measurement microphones different from regular mics? How do you choose, use, and store one? Type 1 or type 2? Free field or pressure response? Wired or wireless? Learn what the experts have to say.
Friday, November 5, 9:00 am — 11:00 am (Room 130)
Workshop: W4 - Wireless Audio Streaming
Gary Spittle, Audio Consultant
Deepen Sinha, ATC Labs
David Trainor, APTX
High-quality audio is streamed wirelessly over many forms of radio channel. These range from satellite broadcasts to mobile telephone networks, Bluetooth ecosystems, and proprietary ultra-low-latency systems. This workshop will discuss the challenges we face, along with some of the techniques used, in delivering high-quality audio over these connections. Audio codecs are an essential component of each radio link, and it will be shown how they are adapted to the specific audio source material, radio channel, and receiving device in the system. Furthermore, the impact of interference on the channel will be presented in relation to the codec, along with how its effects can be minimized.
Friday, November 5, 9:00 am — 10:15 am (Room 206)
Broadcast and Media Streaming: B5 - Audio for the Olympic Broadcast
Michael Nunan, CTV
Joshua Tidsbury, CTV
Broadcasting the Vancouver 2010 Olympic Winter Games for Canada’s Olympic Broadcast Media Consortium was an exercise in large numbers.
2450 hours of programming
12 TV channels
20 Radio stations
7 Production Control Rooms, 6 studios, 15 crews
21 edit suites and more than 30 editors
“The Olympic Suite” music package produced in-house for the Games contained more than 240 cues. Over 400 animated production elements were designed, mixed, and deployed. More than 40 hours of "Feature" content was pre-produced for in-games use.
Tackling the audio for this massive undertaking would be daunting under any circumstances—but to do it in support of the first ever Olympic Winter Games to be produced entirely in 5.1 Surround was remarkable. Join us for a look behind the scenes of an amazing sonic adventure in Vancouver. The presentation will provide some insight into infrastructure, training, work flow, sound design, music production, and much more—from pre-production through to the Closing Ceremony.
Presenting this session will be two members of the CTV Operations and Engineering group: Michael Nunan and Joshua Tidsbury. Between them, Michael and Josh lived every moment of the Games and are pleased to have the opportunity to share their experiences.
Friday, November 5, 9:00 am — 10:45 am (Room 133)
Workshop: W5 - How Does It Sound Now? The Evolution of Audio
Gary Gottlieb, Webster University
With 27 Grammy awards between them, panelists Al Schmitt, Elliot Scheiner, Ed Cherney, and Mark Rubel are uniquely qualified to address the issues surrounding quality in audio, the one constant through decades of transitions in our business. Moderator Gary Gottlieb (engineer, author and educator) draws from the old Chet Atkins story with the punch line, "How does it sound now?" as these audio all-stars discuss the methodology employed when confronted with new and evolving technology and how we retain quality and continue to create a product that conforms to our own high standards. This may lead to other conversations about the musicians we work with, the consumers we serve, and the differences and similarities between their standards and our own. How high should your standards be? How should it sound now? How should it sound tomorrow?
Friday, November 5, 9:00 am — 10:30 am (Room 220)
Paper Session: P6 - Microphone Processing
P6-1 Digitally Enhanced Shotgun Microphone with Increased Directivity—Helmut Wittek, SCHOEPS Mikrofone GmbH - Karlsruhe, Germany; Christof Faller, Illusonic LLC - Lausanne, Switzerland; Christian Langen, SCHOEPS Mikrofone GmbH - Karlsruhe, Germany; Alexis Favrot, Christophe Tournery, Illusonic LLC - Lausanne, Switzerland
Shotgun microphones are still state of the art when the goal is the highest possible directivity and signal-to-noise ratio with high signal fidelity. Unlike beamformers, properly designed shotgun microphones suffer little from inconsistencies and sound-color artifacts. A digitally enhanced shotgun microphone is proposed that adds a second, backward-oriented capsule and digital signal processing to significantly improve directivity and reduce diffuse gain at low and medium frequencies while leaving the sound color essentially unchanged. The shotgun microphone’s rear lobe is also attenuated.
Convention Paper 8187
P6-2 Conversion of Two Closely Spaced Omnidirectional Microphone Signals to an XY Stereo Signal—Christof Faller, Illusonic LLC - St-Sulpice, Switzerland
For cost and form-factor reasons, it is often advantageous to use omnidirectional microphones in consumer devices. If the signals of such a microphone pair are used directly, the result is time-delay stereo with possibly some weak level-difference cues (from shadowing by the device body), giving weak localization and little channel separation. If the microphones are relatively closely spaced, the time-delay cues can be converted to intensity-difference cues by applying delay-and-subtract processing to obtain two cardioids. This delay-and-subtract processing is generalized so that it also applies when there is a device body between the microphones. The two cardioids could be used directly as a stereo signal, but to prevent low-frequency noise, the output signals are instead derived by applying a time-variant filter to the input microphone signals.
Convention Paper 8188
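A minimal sketch of the basic delay-and-subtract step, assuming two omnidirectional capsules placed left and right, free-field plane-wave arrival, no device body between them, and an acoustic delay across the pair that rounds to a whole number of samples. The function names are hypothetical. The paper's generalization for a device body and its time-variant output filter are not shown.

```python
import numpy as np

def delay(x, k):
    """Delay signal x by k whole samples, zero-padding the start."""
    return np.concatenate([np.zeros(k), x[:len(x) - k]])

def omni_pair_to_cardioids(x_left, x_right, spacing_m, fs, c=343.0):
    """Delay-and-subtract two omni signals into back-to-back cardioids.

    Assumption: the travel time across the pair is rounded to an integer
    number of samples; a practical design would use fractional delays.
    """
    k = int(round(spacing_m / c * fs))
    left_card = x_left - delay(x_right, k)   # null toward the right side
    right_card = x_right - delay(x_left, k)  # null toward the left side
    return left_card, right_card
```

The subtraction gives a first-order high-pass response (rising about 6 dB/octave at low frequencies), so equalizing the cardioids flat would boost low-frequency noise. The abstract avoids this by deriving the outputs with a time-variant filter instead.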
P6-3 Determined Source Separation for Microphone Recordings Using IIR Filters—Christian Uhle, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Josh Reiss, Queen Mary University of London - London, UK
A method for determined blind source separation of microphone recordings is presented that attenuates the direct-path cross-talk using IIR filters. The unmixing filters are derived by approximating each transmission path between a source and a microphone by a delay and a gain factor. For evaluation, the proposed method is compared with three other approaches. The degradation of separation performance caused by fractional delays and by the directivity of microphones and sources is discussed. The advantages of the proposed method are low latency, low computational complexity, and high sound quality.
Convention Paper 8189
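To show why IIR filters arise, the sketch below assumes the delay-and-gain model from the abstract. It uses integer cross-talk delays d1, d2 (at least one sample in total) and known gains g1, g2, all hypothetical parameters. The paper's estimation of these parameters is not shown. Cancelling the cross-talk by FIR subtraction leaves each source filtered by 1 - g1·g2·z^-(d1+d2). A recursive comb filter inverts that exactly, and stably when |g1·g2| < 1.

```python
import numpy as np

def delayed(x, k):
    """Delay x by k >= 0 whole samples, zero-padding the start."""
    return np.concatenate([np.zeros(k), x[:len(x) - k]])

def recursive_comb(y, g, k):
    """IIR filter s[n] = y[n] + g * s[n - k], i.e. 1 / (1 - g z^-k)."""
    s = np.zeros(len(y))
    for n in range(len(y)):
        s[n] = y[n] + (g * s[n - k] if n >= k else 0.0)
    return s

def unmix(x1, x2, g1, d1, g2, d2):
    """Invert x1 = s1 + g2*s2[n-d2], x2 = s2 + g1*s1[n-d1] (hypothetical model)."""
    assert d1 + d2 >= 1, "the recursion needs a non-zero loop delay"
    # FIR part: subtract the delayed cross-talk estimate from each channel.
    y1 = x1 - g2 * delayed(x2, d2)
    y2 = x2 - g1 * delayed(x1, d1)
    # IIR part: undo the residual self-echo g1*g2 at lag d1 + d2.
    return (recursive_comb(y1, g1 * g2, d1 + d2),
            recursive_comb(y2, g1 * g2, d1 + d2))
```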
Friday, November 5, 9:30 am — 11:00 am (Room 226)
Poster: P8 - Audio Processing—1
P8-1 Near and Far-Field Control of Focused Sound Radiation Using a Loudspeaker Array—Sangchul Ko, Youngtae Kim, Jung-Woo Choi, SAIT, Samsung Electronics Co. Ltd. - Gyeonggi-do, Korea
This paper proposes a sound manipulation technique that prevents eavesdropping and avoids disturbing others nearby when a multimedia device is used in a public place. The technique creates a spatial region of high acoustic potential energy at the listener’s position. To this end, the paper discusses the design of multichannel filters with a spatial directivity pattern for an arbitrary loudspeaker array configuration. Some limitations of conventional beamforming techniques are presented first, and then a novel control strategy is suggested for reproducing a desired acoustic property in a spatial area of interest close to the loudspeaker array. The technique can also control an acoustic property in an area relatively far from the array with a single objective function. To produce the desired shape of energy distribution precisely in both areas, a spatial weighting technique is introduced. The results are compared with those obtained by controlling each area separately.
Convention Paper 8198
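A generic way to combine several zones in one objective is weighted, regularized least-squares pressure matching at a single frequency. The sketch below assumes free-field monopole loudspeakers, hand-picked control points, and hypothetical function names. It is not the control strategy proposed in the paper. It only illustrates how spatial weights enter a single cost function.

```python
import numpy as np

def free_field_transfer(sources, points, k):
    """Monopole transfer matrix G[m, l] = exp(-j k r) / (4 pi r), shape (points, sources)."""
    r = np.linalg.norm(points[:, None, :] - sources[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / (4 * np.pi * r)

def weighted_pressure_matching(G, p_target, weights, beta=1e-6):
    """Source strengths q minimizing sum_m w_m |(G q - p)_m|^2 + beta * ||q||^2."""
    GhW = G.conj().T * weights              # equals G^H @ diag(weights)
    A = GhW @ G + beta * np.eye(G.shape[1])
    return np.linalg.solve(A, GhW @ p_target)
```

Raising a control point's weight makes the solver trade accuracy elsewhere for a closer match at that point, which is the lever a spatial weighting scheme adjusts. The `beta` term limits loudspeaker effort.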
P8-2 A Real-Time implementation of a Novel Psychoacoustic Approach for Stereo Acoustic Echo Cancellation—Stefania Cecchi, Laura Romoli, Paolo Peretti, Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy
Stereo acoustic echo cancellers (SAECs) are used in teleconferencing systems to reduce undesired echoes caused by coupling between loudspeakers and microphones. The main difficulty is uniquely identifying each pair of room acoustic paths, owing to the high interchannel coherence. This paper proposes a real-time implementation of a novel SAEC approach based on the psychoacoustic missing-fundamental effect. An adaptive algorithm tracks and removes the fundamental frequency of one of the two channels, ensuring continuous decorrelation without affecting stereo quality. Tests of a real-time implementation on a DSP framework are presented to confirm its effectiveness.
Convention Paper 8199
P8-3 Solo Plucked String Sound Detection by the Energy-to-Spectral Flux Ratio (ESFR)—Byung Suk Lee, LG Electronics Inc. - Seocho-Gu, Seoul, Korea, Columbia University, New York, NY, USA; Chang-Heon Lee, Yonsei University - Seoul, Korea; Gyuhyeok Jeong, In Gyu Kang, LG Electronics Inc. - Seocho-Gu, Seoul, Korea
We address the problem of distinguishing solo plucked string sound from speech. Due to the harmonic components present in both types of signals, a low complexity music/speech classifier often misclassifies these signals. To capture the sustained harmonic structures observed in solo plucked string sound, we propose a new feature, the Energy-to-Spectral Flux Ratio (ESFR). The values and the statistics of the ESFR for solo plucked string sound were distinct from those for speech when calculated over windows of 20 to 50 ms. By building a low complexity detector with the ESFR, we demonstrate the discriminating performance of the ESFR feature for the considered problem.
Convention Paper 8200
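The abstract does not give the ESFR formula, so the sketch below assumes one plausible definition. It takes per-frame spectral energy and divides it by the spectral flux, meaning the summed squared change in STFT magnitude from the previous frame. Frames are 32 ms with a Hann window, inside the 20 to 50 ms range mentioned. A sustained harmonic tone changes little from frame to frame and yields a large ratio. A rapidly changing signal such as noise yields a small one. The function name and defaults are hypothetical.

```python
import numpy as np

def esfr(x, fs, frame_ms=32.0, hop_ms=16.0, eps=1e-12):
    """Hypothetical energy-to-spectral-flux ratio, one value per frame after the first."""
    n = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    win = np.hanning(n)
    frames = np.array([x[i:i + n] * win for i in range(0, len(x) - n + 1, hop)])
    mags = np.abs(np.fft.rfft(frames, axis=1))
    energy = np.sum(mags[1:] ** 2, axis=1)             # energy of frames 1..F-1
    flux = np.sum(np.diff(mags, axis=0) ** 2, axis=1)  # change from the previous frame
    return energy / (flux + eps)
```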
P8-4 Separation of Repeating and Varying Components in Audio Mixtures—Sean Coffin, Stanford University - Stanford, CA, USA
Much modern pop music contains digital “loops” or “samples” (short audio clips) that appear multiple times during a song. This paper presents a novel approach to separating these exactly repeating component waveforms from the rest of an audio mixture. The method examines time-frequency representations of the mixture during several instances of a single repeating component. For each time-frequency bin, it takes the complex value with the smallest magnitude across all instances. This effectively extracts the content perceived as repeating, provided the rest of the mixture varies sufficiently. Results demonstrate successful application to commercially available recordings as well as to constructed audio mixtures, with signal-to-interference ratios of up to 42.8 dB.
Convention Paper 8201
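The per-bin selection rule described above can be sketched as follows. It assumes that the STFT of the mixture is already computed and that the frame index where each instance of the loop begins is known and frame-aligned. These inputs and the function name are hypothetical, and the paper's method for locating the instances is not shown.

```python
import numpy as np

def extract_repeating(stft, starts, length):
    """Keep, per time-frequency bin, the complex value of smallest magnitude.

    stft: complex array of shape (bins, frames)
    starts: frame index where each instance of the repeating segment begins
    length: number of frames in one instance
    """
    stack = np.stack([stft[:, s:s + length] for s in starts])  # (instances, bins, length)
    idx = np.argmin(np.abs(stack), axis=0)                     # quietest instance per bin
    return np.take_along_axis(stack, idx[np.newaxis], axis=0)[0]
```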
P8-5 High Quality Time-Domain Pitch Shifting Using PSOLA and Transient Preservation—Adrian von dem Knesebeck, Pooya Ziraksaz, Udo Zölzer, Helmut-Schmidt-University - Hamburg, Germany
An enhanced pitch-shifting system is presented that uses the Pitch Synchronous Overlap Add (PSOLA) technique together with transient detection to process monophonic speech or instrument signals. The PSOLA algorithm needs the pitch and the pitch marks for signal segmentation in the analysis stage. The pitch is obtained with a well-established pitch detector. A new, robust pitch-mark positioning algorithm is presented that achieves high-quality results and positions the pitch marks frame by frame, enabling real-time application. The quality of the pitch shifter is further improved by extracting the transient components before PSOLA and reapplying them at the synthesis stage, which eliminates repeated transients.
Convention Paper 8202
Friday, November 5, 10:15 am — 11:15 am (Room 120)
Tutorial: T7 - Analysis and Modeling of the dbx 902 De-esser
Analog device modeling is an increasingly important tool in modern audio signal processing, and a variety of techniques exist for modeling many different devices. This tutorial presents, from start to finish, an example of modeling one device: the dbx 902, a highly regarded hardware de-esser. We will describe a set of techniques for analyzing the hardware unit as a “grey” box to determine its characteristics. These draw on the device’s specifications and, most importantly, on empirical results from probing the unit with test signals. Mathematical models for analyzing the de-esser will be presented, and these also apply to other dynamic range control processors. We will examine how this device differs from other typical de-esser implementations and, finally, describe a digital emulation. The lessons learned should be useful to any beginner interested in device modeling.
Friday, November 5, 10:30 am — 12:45 pm (Room 206)
Workshop: W6 - Single Unit Surround Microphones
Eddy B. Brixen, EBB-Consult
Gary Elko, mh acoustics LLC
David Josephson, Josephson Engineering
Jim Pace, Sanken Microphones / Plus 24
Pieter Schillebeeckx, SoundField
Morten Stove, DPA Microphones
Mattias Strömberg, Milab
Helmut Wittek, SCHOEPS Mikrofone GmbH
The workshop will present available single-unit surround sound microphones in a kind of "shoot-out." A number of these microphones are available, and more are on their way. They are based on different principles, and their compact size may or may not restrict their performance. The workshop will present the different products and the ideas and theories behind them.
Friday, November 5, 10:45 am — 12:45 pm (Room 132)
Workshop: W7 - Applications for High-Quality Audio over Long-Distance Networks
Nathan Brock, University of California San Diego
Chris Chafe, Stanford University
Elizabeth Cohen, Cohen Acoustical
Jeremy Cooperstock, McGill University
Peter Stevens, BBC
The recent deployment of wide-area fiber networks has made low-latency streaming of uncompressed and lightly-compressed audio possible for many users in academia and industry. Applications for such streaming media, and for fast file transfers over such networks, have been explored for the past decade but are not widely known outside of the networking research community. This workshop will present several use cases for such networks in areas including live performance, production and postproduction, archiving, telecommunications, remote pedagogy, and broadcasting.
Friday, November 5, 11:00 am — 1:00 pm (Room 131)
Live Sound Seminar: LS4 - Measurement Systems and Applications
John Murray, Optimum System Solutions
Jamie Anderson, Rational Acoustics
Ralph Heinz, Renkus-Heinz
Bruce C. Olson, Olson Sound Design
Karl Winkler, Lectrosonics
An investigation of the most popular measurement systems used to align sound systems in the field. Methods and results will be discussed.
Friday, November 5, 11:00 am — 1:00 pm (Room 133)
Broadcast and Media Streaming: B6 - Loudness, Metadata, and other Audio Concerns for DTV
Tomlinson Holman, University of Southern California School of Cinematic Arts and Viterbi School of Engineering
Tim Carroll, Linear Acoustic
David Casey, DTS Inc.
Sterling Davis, Cox Media Group
Thomas Lund, TC Electronics
Steve Lyman, Dolby Laboratories
Jim Starzynski, NBC Universal, Chair ATSC S6-3 Audio Loudness Group
Pat Waddell, Harmonic Inc., Chair ATSC S6 Audio and Video Coding
The introduction of digital television to the U.S. market proceeded in a number of steps: standardization in the early 1990s, the first transmissions and sets in the late 1990s, and accelerating adoption over time, culminating in the NTSC shutoff in 2009. The audio standards were thought to be easy, so they came first on the testing schedule, but they turned out to be far more complex than expected. They added a number of features to conventional workflows that are only now beginning to be understood in some areas of the vast television production chain. This workshop will discuss the situation today from several points of view, and how the standards are likely to be promulgated over the next few years.
Friday, November 5, 11:00 am — 1:00 pm (Room 220)
Paper Session: P9 - Listening Tests
P9-1 A Digital-Domain Listening Test for High-Resolution—John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada
There is much debate over whether sampling rates and word lengths greater than the CD standard are significant for high-quality audio. Tests of this question require extreme care in selecting compatible devices with known characteristics. I propose tests that use the highest-quality wide-band microphones, only one set of ADCs and DACs, and wide-band reproducing loudspeakers. Real music and artificial signals with ultrasonic content can be used. The ADCs and DACs are always run at the same extended bit width and high sampling rate, typically 24 bits and 176.4 or 192 kHz. To perform comparative tests at reduced sampling rates and lower bit widths, the digital data is mathematically altered to conform closely to the reduced specification. Files created this way can be played back with precise time registration and identical level. ABX tests can be used to determine whether differences are heard and to ensure the tests are blind. Program material can be switched in the digital domain, avoiding relays or other compromising connections. This paper discusses some remaining difficult issues. It also outlines the mathematical computations needed for sample-rate conversion, linear-phase anti-aliasing and reconstruction filters, dithering, and noise shaping of the processed signal.
Convention Paper 8203
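One step of the proposed processing, word-length reduction, can be sketched as requantization with TPDF dither, formed by summing two independent uniform variables of ±0.5 LSB. The sketch assumes floating-point samples in [-1, 1) and uses a hypothetical function name. It omits the sample-rate conversion and noise shaping that the paper also discusses.

```python
import numpy as np

def requantize_tpdf(x, bits, rng=None):
    """Reduce float audio in [-1, 1) to `bits` using TPDF dither (no noise shaping)."""
    rng = np.random.default_rng() if rng is None else rng
    q = 2.0 ** (1 - bits)  # one LSB at the target word length
    # Sum of two uniform +/-0.5 LSB variables gives triangular dither spanning +/-1 LSB.
    dither = (rng.uniform(-0.5, 0.5, x.shape) + rng.uniform(-0.5, 0.5, x.shape)) * q
    y = np.round((x + dither) / q) * q
    return np.clip(y, -1.0, 1.0 - q)
```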
P9-2 Variance in Level Preference of Balance Engineers: A Study of Mixing Preference and Variance Over Time—Richard King, Brett Leonard, Grzegorz Sikora, McGill University - Montreal, Quebec, Canada
Little research has quantified how much expert listeners vary over time. A task-based testing method is used to determine the range of variance an expert listener displays over both short and long periods. Mixing engineers are given a basic mixing task consisting of one stereo backing track and a solo instrument or voice. Trends are observed by tracking, over a number of trials, the range of levels at which the engineers place the soloist in the accompanying track. Distributions are calculated for three genres of music, and variance is calculated over time. The results show that the variance is in fact relatively low, and lower still for the more experienced subjects. These results also provide a baseline for future testing.
Convention Paper 8204
P9-3 Evaluation of Superwideband Speech and Audio Codecs—Ulf Wüstenhagen, Bernhard Feiten, Jens Kroll, Alexander Raake, Marcel Wältermann, Deutsche Telekom AG Laboratories - Berlin, Germany
The growing use of headphones in telephony applications comes with higher user expectations of quality. Several standardization bodies have recently begun work on enhancing telephone services. One objective is to improve quality with a codec that supports low-delay super-wideband or fullband audio and performs well not only for speech but also for music. Deutsche Telekom Laboratories have evaluated a range of low-delay super-wideband speech and audio codecs in comprehensive listening tests using the MUSHRA method. A mixture of speech and audio conditions was used to check codec performance for different program types. The results are presented and discussed in light of future applications.
Convention Paper 8205
P9-4 Subjective Listening Tests and Neural Correlates of Speech Degradation in Case of Signal-Correlated Noise—Jan-Niklas Antons, Anne K. Porbadnigk, Robert Schleicher, Benjamin Blankertz, Sebastian Möller, Berlin Institute of Technology - Berlin, Germany; Gabriel Curio, Charité-University Medicine - Berlin, Germany
In this paper we examine whether a particular sensitivity of the human cortex to reductions in speech quality is visible in the electroencephalogram (EEG), and whether such measures can be used to improve the behavioral assessment of speech quality. We degraded a speech stimulus (the vowel /a/) in a scalable way and asked for a behavioral rating, while also measuring brain activity with EEG. The classifiers we trained were able to distinguish between events that seem similar at the behavioral level (i.e., no button press) but differ neurally. The noise contamination is detected in the brain, which may affect long-term satisfaction with the transmission quality.
Convention Paper 8206
Friday, November 5, 11:30 am — 1:00 pm (Room 120)
Game Audio: G5 - Developing Sensible Reference Level Standards
Steve Martz, Sr. Design Engineer, THX Ltd.
Lance Brown, Cinematic Game Audio Consultant
Charles Deenen, Senior Creative Director, Audio, Electronic Arts
Ken Felton, Sound Design Manager, SCEA
Tom Hays, Director of Audio Services, Technicolor
Francesco Zambon, Audio Project Lead, Binari Sonori s.r.l.
A continuing challenge for game developers, particularly where the mix is dynamic and constantly changing, is devising (and abiding by) guidelines for appropriate playback levels. The ever-louder, heavily dynamic-range-compressed strategies of the music industry may suit that world. Games, however, can use several alternative techniques to 'feel' louder while maintaining a wide dynamic range, without forcing players to scramble for their remotes. This panel will present the findings of an ongoing multi-platform, multi-studio conversation about what such a set of guidelines would look like and how they can be applied.
Friday, November 5, 11:30 am — 1:00 pm (Room 226)
Poster: P10 - Audio Processing—2
P10-1 MPEG-A Professional Archival Application Format and its Application for Audio Data Archiving—Noboru Harada, Yutaka Kamamoto, Takehiro Moriya, NTT Communication Science Labs. - Atsugi, Kanagawa, Japan; Masato Otsuka, Memory-Tech Corporation - Tokyo, Japan
ISO/IEC 23000-6 (MPEG-A) Professional Archival Application Format (PA-AF) has just been standardized. This paper proposes an optimized, standard-compliant implementation of a PA-AF archiving tool for audio archiving applications. The implementation uses an optimized MPEG-4 Audio Lossless Coding (ALS) codec library to compress audio data and Gzip for other files. The PA-AF specification was extended to support platform-specific attributes of Mac OS while maintaining interoperability with other operating systems. Performance tests on actual audio data, such as Pro Tools HD projects, show that the PA-AF archiving tool runs twice as fast as MacDMG and WinZip and produces much smaller compressed data.
Convention Paper 8207
P10-2 Switched Convolution Reverberator with Two-Stage Decay and Onset Time Control—Keun-Sup Lee, Jonathan S. Abel, Stanford University - Stanford, CA, USA
An efficient artificial reverberator with two-stage decay and onset time controls is presented. A second-order comb filter, which controls the reverberator’s frequency-dependent decay rates and onset times, drives a switched convolution with short noise sequences. In this way, the comb filter produces a non-exponential reverberation envelope, while the switched convolution structure produces a high echo density. Several schemes for generating two-stage decays and onset time controls with different onset characteristics in different frequency bands are described.
Convention Paper 8208
P10-3 Guitar-to-MIDI Interface: Guitar Tones to MIDI Notes Conversion Requiring No Additional Pickups—Mamoru Ishikawa, Takeshi Matsuda, Michael Cohen, University of Aizu - Aizu-Wakamatsu, Fukushima-ken, Japan
Many musicians, especially guitarists (both professional and amateur), use effects processors, and in recent years a large variety of digital effects have become available to consumers. Desktop music, whose “lingua franca” is MIDI, has also become widespread through advances in computer technology and DSP. We are therefore developing a “Guitar to MIDI” interface device that analyzes the analog guitar audio signal and emits a standard MIDI stream. Similar products are already on the market (such as the Roland GI-20 GK-MIDI Interface), but almost all of them require additional pickups or guitar modification. The interface we are developing requires no special guitar accessories. We describe a PC-based prototype that anticipates a self-contained embedded system.
Convention Paper 8209
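Once a pitch detector (not shown) has estimated a string's fundamental, the conversion to MIDI is standard. The equal-tempered note number is 69 + 12·log2(f/440), and it is sent in a three-byte Note On message. The sketch assumes A4 = 440 Hz and ignores pitch bend. The function names are hypothetical and are not taken from the paper.

```python
import math

def freq_to_midi(f_hz, a4_hz=440.0):
    """Nearest equal-tempered MIDI note number for a fundamental (A4 = note 69)."""
    return int(round(69 + 12 * math.log2(f_hz / a4_hz)))

def note_on(note, velocity=100, channel=0):
    """Three-byte MIDI Note On message: status 0x90 | channel, note, velocity."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])
```

For example, the open low E string (about 82.41 Hz) maps to MIDI note 40.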
P10-4 A Mixed Mechanical/Digital Approach for Sound Beam Pointing with Loudspeakers Line Array—Paolo Peretti, Stefania Cecchi, Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy; Marco Secondini, Andrea Fusco, FBT Elettronica S.p.a. - Recanati (MC), Italy
Digital steering is often used in line array sound systems to tilt the reproduced sound beam in a desired direction. Unfortunately, it works only at low and medium frequencies, so high-frequency sound beams can be tilted only by mechanical steering, which entails both expensive manufacturing and a greater environmental impact. The proposed solution is a mixed approach to beam steering. It combines an on-axis mechanical rotation of each loudspeaker with the classical digital control applied to the whole system. In this way the sound beam can be tilted at high frequencies as well, while the array keeps its straight-line geometry. Simulations using real loudspeaker directivity will be shown to demonstrate the effectiveness of the proposed approach.
Convention Paper 8210
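For reference, the classical digital steering that the paper combines with mechanical rotation applies a linear delay progression along the array. The sketch below is minimal. It assumes a straight array of equally spaced elements, far-field steering, and c = 343 m/s, and the function name is hypothetical.

```python
import math

def steering_delays(num_elements, spacing_m, tilt_deg, c=343.0):
    """Per-element delays (seconds) that tilt a straight line array's main beam."""
    step = spacing_m * math.sin(math.radians(tilt_deg)) / c
    delays = [n * step for n in range(num_elements)]
    offset = min(delays)  # shift so every delay is non-negative (causal)
    return [d - offset for d in delays]
```

Grating lobes appear once the element spacing d becomes comparable to the wavelength, and d ≤ λ/2 is the usual conservative rule. Delay steering therefore behaves cleanly only below roughly c/(2d), which is consistent with the abstract's point that digital steering is limited to low and medium frequencies.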
P10-5 The Non-Flat and Continually Changing Frequency Response of Multiband Compressors—Earl Vickers, STMicroelectronics, Inc. - Santa Clara, CA, USA
Multiband dynamic range compressors are powerful, versatile tools for audio mastering, broadcast, and playback. However, they are subject to certain problems relating to frequency response. First, when excited by a time-varying narrow-band input such as a swept sinusoid, they create unwanted magnitude peaks at the band boundaries. Second, and more importantly, the frequency response continually changes, which may have unwanted effects on the long-term average spectral balance. This paper proposes a frequency-domain solution for the unwanted magnitude peaks, whereby slight adjustments to the band boundaries prevent sinusoidal peaks from being midway between two bands. For the second problem, real-time spectral balance compensation may be implemented in either the time or frequency domain.
Convention Paper 8211
P10-6 Volterra Series-Based Distortion Effect—Finn T. Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark
A large part of the characteristic sound of the electric guitar comes from nonlinearities in the signal path. These nonlinearities may come from the input or output stage of the amplifier, which is often equipped with vacuum tubes, or from a dedicated distortion pedal. In this paper the Volterra series expansion for nonlinear systems is investigated as a means of generating musically desirable distortion. The Volterra series allows unlimited adjustment of the level and frequency dependence of each distortion component. Subjectively relevant ways of linking the different orders are discussed.
Convention Paper 8212
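A full Volterra series uses multidimensional kernels. A common simplification keeps only the diagonal terms, filtering x^k with an FIR kernel h_k for each order k. The sketch below assumes that diagonal (generalized Hammerstein) form, which is not necessarily the structure used in the paper, and its function name is hypothetical. Each kernel sets the level and frequency dependence of its order's contribution. For a sine input, x^2 produces DC and the second harmonic, and x^3 produces the fundamental and the third harmonic.

```python
import numpy as np

def diagonal_volterra(x, kernels):
    """y[n] = sum_k sum_m h_k[m] * x[n-m]**k  (diagonal, truncated Volterra series).

    kernels: list of 1-D FIR kernels; kernels[0] filters the linear term,
    kernels[1] the squared input, kernels[2] the cubed input, and so on.
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros(len(x))
    for order, h in enumerate(kernels, start=1):
        y += np.convolve(x ** order, h)[:len(x)]
    return y
```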
Friday, November 5, 2:30 pm — 4:15 pm (Room 131)
Live Sound Seminar: LS5 - Wireless Microphones for the Future
James Stoffo, Professional Wireless Systems
Don Boomer, Line 6
Mark Brunner, Shure
Joe Ciaudelli, Sennheiser
Gino Sigismondi, Shure
Karl Winkler, Lectrosonics
The FCC keeps changing the wireless spectrum available for microphones. The 700 MHz band is already off limits, and now Super-WiFi and the National Broadband Plan must be considered below 700 MHz. Is any part of the spectrum safe? Learn the latest developments from the FCC and how the experts are ensuring reliable RF operation now and preparing for the future.
Friday, November 5, 2:30 pm — 4:00 pm (Room 120)
Game Audio: G6 - Mobile Game Audio for Headphones and Micro-Speakers
Steve Martz, Sr. Design Engineer, THX Ltd.
Peter "pdx" Drescher, Sound Designer, Twittering Machine
Greg Klas, Sr. Manager, Audio Engineering, Fisher-Price, Inc.
Jeffrey Xia, Sr. Acoustics Engineer, Ole Wolf Electronics
Mobile platforms (phones, toys, portable gaming devices) are typically relegated to using small speakers as a means to recreate an immersive environment for games. These playback devices have certain attributes that require unique approaches when creating content for mobile entertainment.
A panel consisting of speaker manufacturers and mobile game creators will discuss the performance characteristics and limitations of headphones and other micro-speakers as they pertain to playback of game audio on those devices, as well as considerations for designing game content.
Friday, November 5, 2:30 pm — 3:45 pm (Room 133)
Broadcast and Media Streaming: B7 - Innovations in Digital Radio
David Bialik, Consultant
Steve Fluker, Cox Radio
Frank Foti, Telos-Omnia-Axia
David Layer, NAB
Skip Pizzi, Consultant/Radio Ink
Tom Ray, WOR - Buckley Broadcasting
Geir Skaaden, DTS
David Wilson, CEA
This session will discuss the various innovations of the past year plus what is on the horizon. Transmission, playback, production and reception are some of the topics. This will be a discussion of technology and technique.
Friday, November 5, 2:30 pm — 4:00 pm (Room 130)
Workshop: W9 - Live Monitoring and Latency with Digital Audio Networks
Umberto Zanghieri, ZP Engineering srl
Carl Bader, Aviom
Kevin Gross, AVA Networks
Michael Lester, Shure
Robert Scovill, Avid
The increasing adoption of digital audio networks for live events can impact the latency of audio signals as perceived on stage. Issues related to audio latency when considering personal monitoring and traditional, speaker-based monitoring are discussed. Real cases are shown and detailed, as well as the preferences and habits of performers.
Friday, November 5, 2:30 pm — 6:30 pm (Room 220)
Paper Session: P11 - Acoustical and Physical Modeling
Julius O. Smith
P11-1 Virtual Acoustic Prototyping—Practical Applications for Loudspeaker Development—Alex Salvatti, JBL Professional - Northridge, CA, USA
Acoustic simulations using finite elements have been used in loudspeaker development for over 20 years, with complexity and accuracy increasing in step with the computing power generally available on the engineering desktop. Using user-friendly, modern FEA software, the author presents an overview of methods for building virtual prototypes of both horns and loudspeaker drivers. These methods significantly reduce the number of physical prototypes needed and shorten development time. A comparison of simulated and measured data confirms the validity of the methods.
Convention Paper 8213
P11-2 Simulation of Horn Driver Response by Combination of Matrix Analysis and FEA—Alex Voishvillo, JBL Professional - CA, USA
To assess the performance of a horn driver (a compression driver loaded by a horn), on-axis and off-axis frequency responses must be measured. The measurement process is time-consuming, especially if the entire three-dimensional “balloon” of responses is to be measured. The directional responses of the horn alone (without the compression driver) can be predicted by FEA (Finite Element Analysis) or BEA (Boundary Element Analysis). However, FEA or BEA of the horn alone provides only its relative directional properties. The SPL responses of the horn driver at different angles remain unknown. They depend on the interaction of the electrical, mechanical, and acoustical parameters of the compression driver with the acoustical parameters of the horn. New methods based on a combination of FEA and matrix analysis make it possible to predict the response of various compression driver and horn combinations without measuring each combination, and even without physically building the horns. The method was verified during the development of a new AM series of JBL Professional loudspeaker systems and showed high accuracy.
Convention Paper 8214
P11-3 Dynamic Motion of the Corrugated Ribbon In a Ribbon Microphone—Daniel Moses Schlessinger, Sennheiser DSP Research Laboratory - Palo Alto, CA, USA; Jonathan S. Abel, Stanford University - Stanford, CA, USA
Ribbon microphones are known for their warm sonics, owing in part to the unique ribbon motion induced by the sound field. Here the motion of the corrugated ribbon element in a sound field is considered, and a physical model of the ribbon motion is presented. The model separately computes propagating torsional disturbances and coupled transverse and longitudinal disturbances. Each propagation mode is implemented as a mass-spring model where a mass is identified with a ribbon corrugation fold. The model is parametrized using ribbon material and geometric properties. Laser vibrometer measurements are presented, revealing stiffness in the transverse and longitudinal propagation and showing close agreement between measured and modeled ribbon motion.
Convention Paper 8215
P11-4 Modeling of Leaky Acoustic Tube for Narrow-Angle Directional Microphone—Kazuho Ono, Takehiro Sugimoto, Akio Ando, Kimio Hamasaki, NHK Science and Technology Research Laboratories - Kinuta Setagaya-ku, Tokyo, Japan; Takeshi Ishii, Yutaka Chiba, Keishi Imanaga, Sanken Microphone Co. Ltd. - Suginami-ku, Tokyo, Japan
Line microphones have long been popular as narrow-angle directional microphones. They use a leaky acoustical tube with many slits to suppress off-axis sensitivity, together with a directional capsule attached to the tube. Although many microphones of this type are on the market, there has been no quantitative theory to explain their behavior, which is very important for designing directivity effectively. We therefore modeled the leaky acoustical tube with a distributed equivalent circuit and combined it with the directional capsule’s equivalent circuit model. The model agreed well with measurements, particularly in the directional characteristics, whereas an ordinary delay-and-sum model of the acoustical tube did not.
Convention Paper 8216
P11-5 Modeling Viscoelasticity of Loudspeaker Suspensions Using Retardation Spectra—Tobias Ritter, Finn Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark
It is well known that, due to viscoelastic effects in the suspension, the displacement of a loudspeaker increases with decreasing frequency below resonance. Present creep models are either not precise enough or purely empirical rather than derived from physics. In this investigation, the viscoelastic retardation spectrum, which provides a more fundamental description of suspension viscoelasticity, is first used to explain the accuracy of the empirical LOG creep model (Knudsen et al.). Then, two extensions to the LOG model are proposed that include the low- and high-frequency limits of the compliance, which the original LOG model does not account for. The new creep models are verified by measurements on two 5.5-inch loudspeakers with different surrounds.
Convention Paper 8217
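In a LOG-type creep model, compliance falls linearly with the logarithm of frequency. The sketch below assumes the form C(f) = C0 (1 - lam * log10(f / f_ref)). The exact normalization used by Knudsen et al. may differ, and `c0`, `lam`, and `f_ref` are placeholders rather than measured values.

```python
import math


def log_creep_compliance(f, c0, lam, f_ref=1.0):
    """Suspension compliance from a LOG-type creep model (assumed form).

    C(f) = C0 * (1 - lam * log10(f / f_ref)). Compliance, and hence the
    displacement for a given force, grows as frequency decreases.
    """
    if f <= 0:
        raise ValueError("frequency must be positive")
    return c0 * (1.0 - lam * math.log10(f / f_ref))
```

The formula grows without bound as f approaches zero. This illustrates why models that include low- and high-frequency compliance limits are of interest.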
P11-6 Physical Modeling and Synthesis of Motor Noise for Replication of a Sound Effects Library—Simon Hendry, Josh Reiss, Queen Mary University of London - London, UK
This paper presents the results of objective tests exploring the concept of using a small number of physical models to create and replicate a large number of samples from a traditional sound effects library. The design of a DC motor model is presented, and this model is used to create both a household drill and a small boat engine. The harmonic characteristics, as well as the spectral centroid, were compared with the original samples, and all the features agree to within 6.1%. The results of the tests are discussed with a heavy emphasis on realism and perceived accuracy, and the parameters that must be improved in order to humanize a model are explored.
Convention Paper 8218
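One of the features compared above is the spectral centroid: the magnitude-weighted mean frequency of a spectrum. Below is a minimal sketch of that feature and of a percentage deviation between a synthesized and a recorded sample. The abstract does not say how the paper computes its spectra (window, FFT size, averaging). The inputs are therefore assumed to be magnitude spectra already computed on a common frequency grid.

```python
def spectral_centroid(freqs, mags):
    """Magnitude-weighted mean frequency (Hz) of a magnitude spectrum."""
    total = sum(mags)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * m for f, m in zip(freqs, mags)) / total


def percent_difference(model_value, reference_value):
    """Relative deviation (%) of a synthesized feature from the recorded one."""
    return 100.0 * abs(model_value - reference_value) / abs(reference_value)
```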
P11-7 Measures and Parameter Estimation of Triodes for the Real-Time Simulation of a Multi-Stage Guitar Preamplifier—Ivan Cohen, Ircam - Paris, France, Orosys R&D, Montpellier, France; Thomas Hélie, Ircam - Paris, France
This paper deals with the real-time simulation of a multi-stage guitar preamplifier. Dynamic triode models based on Norman Koren's model are considered, together with "secondary phenomena" such as the grid rectification effect and parasitic capacitances. The circuit is then modeled by a nonlinear differential algebraic system with extended state-space representations. Standard numerical schemes yield efficient, stable simulations of the circuit, which are implemented as VST plug-ins. Measurements of real triodes were made to develop new triode models and to characterize the capabilities of aged and new triodes. The results are compared for all the models, using lookup tables generated from the measurements and Norman Koren's model with its parameters estimated from the measurements.
Convention Paper 8219
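Koren's triode model is a static plate-current equation. The sketch below uses that equation in its commonly published form, with commonly quoted 12AX7 parameters. Treat both as assumptions to check against Koren's original material. This is not the paper's dynamic model or its measured lookup tables. Grid current, parasitic capacitances, and other secondary phenomena are ignored.

```python
import math

# Commonly quoted Koren parameters for a 12AX7 (assumption: verify at the source).
KOREN_12AX7 = {"mu": 100.0, "ex": 1.4, "kg1": 1060.0, "kp": 600.0, "kvb": 300.0}


def _softplus(z):
    """log(1 + exp(z)), evaluated without overflow for large z."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def koren_plate_current(ep, eg, mu, ex, kg1, kp, kvb):
    """Plate current (A) from Koren's static triode equation.

    ep: plate-to-cathode voltage (V); eg: grid-to-cathode voltage (V).
    """
    e1 = (ep / kp) * _softplus(kp * (1.0 / mu + eg / math.sqrt(kvb + ep * ep)))
    if e1 <= 0.0:
        return 0.0  # tube cut off
    return 2.0 * e1 ** ex / kg1
```

With a plate voltage of 250 V and a grid voltage of -2 V, this gives a plate current just under 1 mA.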
P11-8 ZFIT: A MATLAB Tool for Thiele-Small Parameter Fitting and Optimization—Christopher Struck, CJS Labs - San Francisco, CA, USA
Over the years, many approaches to calculating the Thiele-Small parameters have been presented. Most current methods rely on curve-fitting the impedance magnitude data to a specific lumped parameter model. A flexible MATLAB least-mean-squares optimization tool for complex loudspeaker impedance data is described. Magnitude and phase data are fit to a user-selected lumped parameter model of variable complexity. Appropriate constraints on the optimization help identify whether the selected model is of sufficient order or overly complex for the given data. Examples are shown for impedance data from several different loudspeaker drivers.
Convention Paper 8220
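The sketch below shows why fitting complex impedance (magnitude and phase) is attractive. It is deliberately small, and it is neither ZFIT nor MATLAB code. It assumes the simplest lumped model: a DC resistance Re in series with a parallel Res-Cmes-Lces circuit, with Re known and voice-coil inductance neglected. Under those assumptions, the motional admittance 1/(Z - Re) is linear in 1/Res, Cmes, and 1/Lces. An ordinary least-squares solve can therefore replace a nonlinear optimizer. The function names are hypothetical.

```python
import math


def motional_impedance(f, re, res, cmes, lces):
    """Impedance of Re in series with a parallel Res-Cmes-Lces circuit."""
    w = 2.0 * math.pi * f
    y_mot = 1.0 / res + 1j * w * cmes + 1.0 / (1j * w * lces)
    return re + 1.0 / y_mot


def fit_motional(freqs, z, re):
    """Least-squares fit of Res, Cmes, Lces (and fs, Qms) to complex impedance.

    Uses Y = 1/(Z - Re) = 1/Res + j(w*Cmes - 1/(w*Lces)): the real part gives
    1/Res, and the imaginary part is linear in Cmes and Bces = 1/Lces.
    """
    g_sum = 0.0
    a11 = a12 = a22 = b1 = b2 = 0.0
    for f, zz in zip(freqs, z):
        w = 2.0 * math.pi * f
        y = 1.0 / (zz - re)
        g_sum += y.real
        p, q = w, -1.0 / w  # regressors for Cmes and Bces
        a11 += p * p
        a12 += p * q
        a22 += q * q
        b1 += p * y.imag
        b2 += q * y.imag
    # Solve the 2x2 normal equations by Cramer's rule.
    det = a11 * a22 - a12 * a12
    cmes = (b1 * a22 - b2 * a12) / det
    bces = (a11 * b2 - a12 * b1) / det
    res = len(freqs) / g_sum
    lces = 1.0 / bces
    fs = 1.0 / (2.0 * math.pi * math.sqrt(lces * cmes))
    qms = 2.0 * math.pi * fs * res * cmes
    return {"Res": res, "Cmes": cmes, "Lces": lces, "fs": fs, "Qms": qms}
```

On noise-free synthetic data generated with `motional_impedance`, the fit recovers the fs, Qms, and Res used to generate the data. Real drivers need the inductance term and usually a nonlinear fit.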
Friday, November 5, 2:30 pm — 4:00 pm (Room 226)
Poster: P13 - Audio Equipment and Measurement
P13-1 Neutral-Point Oscillation Control Based on a New Audio Space Vector Modulation (A-SVM) for DCI-NPC Power Amplifiers—Vicent Sala, Luis Romeral, G. Ruiz, UPC-Universitat Politecnica de Catalunya - Terrassa, Spain
In this paper, the oscillation (fluctuation) of the DC-BUS neutral point in DCI-NPC (Diode Clamped Inverter – Neutral Point Clamped) amplifiers is presented as one of the most important sources of distortion. This perturbation is characterized and studied, together with its causes and distorting effects. The paper also presents two vector modulation techniques for audio. Used intelligently in the vector modulation process, these techniques redistribute the charge between the two DC-BUS capacitors. This allows control of the neutral-point voltage and therefore cancellation of the fluctuation and its distorting effects. Experimental and simulation results that verify these strategies are presented.
Convention Paper 8227
P13-2 Vacuum Tube Amplifiers Using Electronic DC Transformers—Theeraphat Poomalee, Kamon Jirasereeamornkul, King Mongkut’s University of Technology Thonburi - Tung-kru, Bangkok Thailand; Marian K. Kazimierczuk, Wright State University - Dayton, OH, USA
This paper proposes a method to synthesize vacuum-tube audio amplifiers using electronic DC transformers in place of the audio-frequency output transformers traditionally used in the amplifier's output stage. The proposed amplifiers can achieve a frequency response from DC to 100 kHz if the DC transformers operate at a 500 kHz switching frequency and an interleaving technique is used. The principle of operation, a DC model, and various examples are given.
Convention Paper 8228
P13-3 The Single Stereo Display and Stereo VU Meters—Michael D. Callaghan, Radio Station KIIS-FM - Los Angeles, CA, USA
This paper describes the use of a single row of bi-color indicators to replace the typical pair of meters used to show left and right signal levels in stereo applications, and to overcome their deficiencies. Bi-color elements actually yield a total of three colors: one color when the left channel is driven, another when the right channel is driven, and a mixture of the two when both channels are driven. Watching the row of indicators during program operation therefore shows three different amplitudes: the left channel volume, the right channel volume, and the difference between them. These amplitudes are immediately obvious and very easy to interpret.
Convention Paper 8229
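The bi-color display logic can be sketched as a comparison of each channel level against each segment threshold. In this hypothetical Python version, red stands for the left channel, green for the right, and amber for segments where both light. The abstract does not give the actual colors or thresholds of the described meter.

```python
def meter_row(left_db, right_db, thresholds_db):
    """Color of each segment in a single-row bi-color stereo meter.

    Hypothetical mapping: red = left only, green = right only,
    amber = both elements lit (the mixed color), off = neither.
    """
    row = []
    for t in thresholds_db:
        left_on = left_db >= t
        right_on = right_db >= t
        if left_on and right_on:
            row.append("amber")
        elif left_on:
            row.append("red")
        elif right_on:
            row.append("green")
        else:
            row.append("off")
    return row
```

Take the left channel at -7 dB and the right at -12 dB, with segments at -20, -10, -5, and 0 dB. The row reads amber, red, off, off. The amber length shows the quieter channel, and the red tail shows the difference between the channels.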
P13-4 Frequency Characteristics Measurements of Cylindrical Record Player by the Pulse-Train Method—Teruo Muraoka, Takahiro Miura, Tohru Ifukube, The University of Tokyo - Tokyo, Japan
The authors have been engaged in research on restoring seriously damaged audio signals using Generalized Harmonic Analysis (GHA). In this research, it is important to know the frequency characteristics of the sound reproducing equipment in order to obtain clear sound with proper tonal equalization. The authors previously measured the frequency characteristics of several acoustic 78 rpm shellac-record players using the Pulse-Train Method, and have recently measured cylindrical record players with the same method. Conventionally, the frequency characteristics of phonograph record players were measured using frequency test records; however, shellac and cylindrical test records can no longer be obtained. The authors therefore employed the Pulse-Train Method, which was originally developed in the 1970s for measuring phonograph cartridges and cutter heads. For this measurement, the authors first made a cylindrical record by carving a silent sound groove on the cylinder surface and then carving an additional groove perpendicular to it. The pulse-train response was obtained by reproducing this cylindrical record on the record players under test and on a reference electric record player. The frequency characteristics of the players under test were analyzed by applying the DFT to the measured pulse-train waveforms.
Convention Paper 8230
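The analysis step can be illustrated with a plain DFT. A periodic pulse train has energy only at multiples of its repetition frequency. Comparing the DFT bins of the response from a player under test with those of the reference player therefore gives a relative frequency response at those harmonics. The sketch below is an assumption about the general procedure, not the authors' code. For clarity, it evaluates single DFT bins directly rather than using an FFT.

```python
import cmath
import math


def dft_bin(x, k):
    """Single DFT coefficient X[k] of the sequence x."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))


def relative_response_db(measured, reference, bins):
    """Level (dB) of the player under test relative to the reference player."""
    return [20.0 * math.log10(abs(dft_bin(measured, k)) / abs(dft_bin(reference, k)))
            for k in bins]
```

With a pulse every 8 samples in a 64-sample record, the harmonics fall on bins 8, 16, 24, and so on. A player that halves every pulse reads about -6.02 dB at each harmonic.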
P13-5 Seeing Sound: Sound Sensor Array with Optical Outputs—Charles Seagrave, Seagrave Instruments - San Rafael, CA, USA; Eric Benjamin, Consultant - Pacifica, CA, USA
Characterization of acoustic spaces frequently involves taking SPL measurements at numerous locations within the space. Such measurements typically require relocating the measurement apparatus or using multiple microphones wired to a multiplexer. This approach can be time consuming, especially if it must be repeated after changes in loudspeaker location, acoustical treatment, or other modifications. This paper presents methods of visualizing both standing waves in rooms and loudspeaker coverage uniformity in outdoor venues, using an array of sound sensors with optical (visible light) output. This new approach allows rapid visual observation of sound fields and simultaneous SPL data collection from multiple positions.
Convention Paper 8231
P13-6 Effects of Oversampling on SNR Using Swept-Sine Analysis—Christopher Bennett, Daniel Harris, Adam Tankanow, Ryan Twilley, Oygo Sound, LLC - Miami, FL, USA
The swept-sine technique is an alternative method to acquire impulse response measurements and distortion component responses. Swept-sine analysis has been under recent investigation for its use in auditory applications. In this paper the researchers seek to show that an improvement in signal-to-noise ratio (SNR) can be achieved by applying oversampling while utilizing swept-sine analysis. Oversampling does not give an improvement in SNR in traditional click impulse response methods; however, due to the noise shaping properties of the post-processing involved in swept-sine analysis, the noise floor can be reduced.
Convention Paper 8232
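A swept-sine measurement starts from the sweep itself. The sketch below generates the exponential (logarithmic) sweep commonly used for impulse-response and distortion measurements, following the usual Farina-style formulation. The abstract does not say whether the authors used exactly this sweep, nor how they oversampled and post-processed it.

```python
import math


def exponential_sweep(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 Hz lasting `duration` s at rate fs."""
    rate = duration / math.log(f2 / f1)  # time constant of the frequency growth
    n = int(round(duration * fs))
    return [math.sin(2.0 * math.pi * f1 * rate * (math.exp((i / fs) / rate) - 1.0))
            for i in range(n)]


def instantaneous_frequency(t, f1, f2, duration):
    """Frequency (Hz) of the sweep at time t; rises exponentially from f1 to f2."""
    return f1 * math.exp(t * math.log(f2 / f1) / duration)
```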
P13-7 Rapid In-Place Measurements of Multichannel Venues—John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada
It is often useful to have transfer-function measurements of large venues with an audience present. This precludes the use of multiple chirps or other long-duration signals. This paper studies the use of simultaneous, multiple “orthogonal” maximum-length sequences applied to the loudspeakers and captured by a number of microphones at selected listening positions. Such MLS signals last only a few seconds and are noise-like, so they are minimally disruptive to an audience. Yet they allow full transfer-function system identification between each loudspeaker and microphone. The main drawback of the method is that the effective noise level is high. This paper studies implementation issues and assesses the S/N of such measurements. It turns out that exciting each loudspeaker separately is usually better than simultaneous excitation, except in special circumstances. An example is shown for the simultaneous measurement of two loudspeakers in a room with two microphones.
Convention Paper 8233
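An MLS can be generated with a linear feedback shift register. The sketch below builds one binary sequence and checks the property that makes MLS useful for system identification. Mapped to +1/-1, its circular autocorrelation equals N at zero lag and -1 at every other lag. It does not construct the multiple "orthogonal" sequences studied in the paper. The tap set (4, 3) for a 4-bit register is one primitive choice, used here for illustration.

```python
def mls(n_bits, taps):
    """Maximum-length sequence (0/1) from a Fibonacci linear feedback shift register.

    taps are 1-based register positions; they must correspond to a primitive
    polynomial for the period to be 2**n_bits - 1.
    """
    state = [1] * n_bits  # any nonzero start state works
    seq = []
    for _ in range(2 ** n_bits - 1):
        seq.append(state[-1])  # output the oldest bit
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]  # shift the feedback bit in
    return seq


def circular_autocorrelation(bits):
    """Circular autocorrelation of the +1/-1 version of a 0/1 sequence."""
    s = [1 - 2 * b for b in bits]  # 0 -> +1, 1 -> -1
    n = len(s)
    return [sum(s[i] * s[(i + lag) % n] for i in range(n)) for lag in range(n)]
```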
P13-8 Ground Loops: The Rest of the Story—Bill Whitlock, Jensen Transformers, Inc. - Chatsworth, CA, USA; Jamie Fox, The Engineering Enterprise - Alameda, CA, USA
The mechanisms by which so-called ground loops cause hum, buzz, and other audio system noise problems are well known. But what causes power-line-related currents to flow in signal cables in the first place? This paper explains how magnetic induction in ordinary premises AC wiring creates the small voltage differences normally found among system ground connections, even if “isolated” or “technical” grounding is used. The theoretical basis is explored, experimental data are shown, and an actual case history is related. Little has been written about this “elephant in the room” topic in the engineering literature, and apparently nothing in the context of audio or video systems. It is shown that simply twisting L-N pairs in the premises wiring can profoundly reduce system noise problems.
Convention Paper 8234
Friday, November 5, 4:00 pm — 5:30 pm (Room 133)
Broadcast and Media Streaming: B8 - Listener Fatigue and Retention
Sam Berkow, SIA
Frank Foti, Omnia
JJ Johnston, DTS Inc.
Sean Olive, Harman
Bill Sacks, Optimod.FM
Thomas Sporer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany
This panel will discuss listener fatigue and its impact on listener retention. While listener fatigue is an issue of interest to broadcasters, it is also an issue of interest to telecommunications service providers, consumer electronics manufacturers, music producers and others. Fatigued listeners to a broadcast program may tune out, while fatigued listeners to a cell phone conversation may switch to another carrier, and fatigued listeners to a portable media player may purchase another company's product. The experts on this panel will discuss their research and experiences with listener fatigue and its impact on listener retention.
Friday, November 5, 4:15 pm — 5:15 pm (Room 120)
Game Audio: G7 - Audio Shorts—Sound Design
Randy Buck, Principal, The Sound Department - Austin, TX, USA
Charles Deenen, Senior Creative Director, Audio, Electronic Arts
Kristoffer Larson, Audio Manager, WB Games, Seattle, WA, USA
Marc Schaefgen, Principal/Owner, The Sound Department - Austin, TX, USA
Jay Weinland, Senior Audio Lead, Bungie Studios
Three mini sessions are presented by game audio dudes that guarantee you will walk away with cool new techniques. Twenty minutes each to serve up an in-depth look at topics in sound design that matter most to them. Q&A to follow.
Game Audio Sound Sourcing - What is special about gathering sonic source material for games as opposed to other media? Need for longer ambiences (streams can be up to 5 minutes or more), more variation, more mic perspectives, components rather than complex events.
The Loop Trick and More - How to create seamless loops of any length and the best way to approach sound design for looping material. And, to loop or not to loop? That is always the question when it comes to rapid fire weapons. Learn a few techniques that go beyond the loop.
My Favorite Plugin! - Three speakers from other sessions will talk about their current favorite plug-in, why they love it so much, and how they use and/or abuse it.
Friday, November 5, 4:30 pm — 6:30 pm (Room 130)
Workshop: W10 - Audio Network Control Protocols
Kevin Gross, AVA Networks - Denver, CO, USA
Bradford Benn, Harman Corporation - Los Angeles, CA, USA
Richard Foss, Rhodes University - Grahamstown, South Africa
Jeff Koftinoff, Meyer Sound - Berkeley, CA, USA
Andy Schmeder, CNMAT - University of California Berkeley
Peter Stevens, BBC - London, UK
Digital audio networks have solved a number of problems related to the distribution of audio within a number of contexts, including recording studios, stadiums, convention centers, theaters, and live concerts. They provide cabling ease, better immunity to interference, and enhanced control over audio routing and signal processing when compared to analog solutions. There exist a number of audio network types, and also a number of audio network protocols, that define the messaging necessary for connection management and control of devices within networks. In this workshop a panel of audio network protocol experts will share the features of audio network protocols that they are familiar with and how network protocols might adapt and change over the next few years, bearing in mind the need for interoperability.
Friday, November 5, 4:30 pm — 6:00 pm (Room 132)
Workshop: W11 - AES42 and Digital Microphones
Helmut Wittek, SCHOEPS Mikrofone GmbH
Stephan Flock, DirectOut GmbH
Tom Frey, Sennheiser
Stephan Peus, Georg Neumann GmbH
The AES42 interface for digital microphones is not yet widely used. This may be due to the relative novelty of digital microphone technology, but also to a lack of knowledge of, and practice with, digital microphones and the corresponding interface. The advantages and disadvantages need to be communicated in an open and neutral way, regardless of commercial interests and on the basis of engineers' actual needs. This workshop builds on an available “White paper” about AES42 and digital microphones, compiled by several authors, which aims to provide neutral, in-depth information. The workshop intends to separate facts from prejudices on this topic.
Friday, November 5, 4:30 pm — 6:30 pm (Room 131)
Workshop: W12 - Keep Turning it Down! Developing an Exit Strategy for the Loudness Wars
Martin Walsh, DTS Inc.
Bob Ludwig, Gateway Mastering
Thomas Lund, TC Electronic
Susan Rogers, Berklee College of Music
Following on from the popular workshop presented at the 127th AES Convention that delved into topics relating to the nature and the consequences of the loudness wars, our panel of loudness experts and "master" mastering engineers will provide an update on the progress toward ending the war and returning peace, harmony, and dynamic range to the people.
The workshop will focus on alternatives to the practice of overly aggressive dynamic range compression using weapons such as seasoned mastering techniques and gain normalization algorithms and standards. Audience participation is encouraged and all are welcome to voice their own opinion and comments in relation to the issues discussed.
Friday, November 5, 4:30 pm — 6:00 pm (Room 226)
Poster: P14 - Loudspeakers and Microphones
P14-1 Coaxial Flat Panel Loudspeaker System with Dynamic Push-Pull Drive—Drazenko Sukalo, DSLab - Device Solution Laboratory - Munich, Germany
After the successful introduction of flat televisions, acousticians have turned to the design of a “full-range” flat panel loudspeaker. A new low-depth design, consisting of an array of two conventional cone drivers and a transmission line, is presented together with a method for driving them. The main aim was to build a small flat panel box with an extended low-frequency response and low-distortion output owing to the extended linear diaphragm excursion. The PSpice-OrCAD® simulator was used to represent a distributed model of the transmission line. The simulation results show the influence of the transmission-line enclosure parameters on the impedance curve and the resonant frequency of the woofer driver. Among other things, this paper is concerned with an active filter design that drives the loudspeaker units in an appropriate phase relationship in the low-frequency region by implementing DPP (dynamic push-pull) drive. A prototype of the flat panel loudspeaker was built according to the described design concept, and the results of sound pressure level measurements are presented. The design results from work performed for DSLab and is subject to the referenced patent.
Convention Paper 8235
P14-2 A Novel Universal-Serial-Bus-Powered Digitally Driven Loudspeaker System with Low Power Dissipation and High Fidelity—Hajime Ohtani, Akira Yasuda, Kenzo Tsuihiji, Ryota Suzuki, Daigo Kuniyoshi, Hosei University - Koganei, Tokyo, Japan; Junichi Okamura, Trigence Semiconductor - Chiyoda, Tokyo, Japan
We propose a novel digitally driven loudspeaker system that uses a newly devised mismatch shaper method, multilevel noise shaping dynamic element matching, to realize high fidelity, a high sound power level, and low power dissipation. With the aid of an H-bridge circuit, the unit used for the mismatch shaper method can easily increase the number of sound pressure levels, even when the number of sub-speakers is fixed. Further, it reduces the noise caused by quantization and loudspeaker mismatches and decreases the switching loss. With six voice coils, the output sound power level is 94 dB/m when a 3.3-V universal-serial-bus power supply is used exclusively. The power efficiency is 95% at 0 dBFS and 75% at –10 dBFS.
Convention Paper 8236
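The abstract does not describe the paper's multilevel noise-shaping dynamic element matching in enough detail to reproduce. As a simpler, well-known stand-in, the sketch below implements data-weighted averaging (DWA). DWA is a first-order mismatch shaper that rotates which unit elements (here, hypothetical voice coils) are driven, so that each is used equally often over time.

```python
def dwa_select(codes, n_elements):
    """Data-weighted averaging: choose which unit elements to drive per sample.

    codes: number of elements to switch on at each sample (0..n_elements).
    Returns the element indices used at each sample. Rotating the start
    pointer spreads use evenly, pushing mismatch error toward high frequencies.
    """
    pointer = 0
    selections = []
    for c in codes:
        if not 0 <= c <= n_elements:
            raise ValueError("code out of range")
        selections.append([(pointer + i) % n_elements for i in range(c)])
        pointer = (pointer + c) % n_elements
    return selections
```

DWA shapes mismatch only to first order. Read it as background, not as the proposed method.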
P14-3 Loudspeaker Rub Fault Detection by Means of a New Nonstationary Procedure Test—German Ruiz, Vicent Sala, Miguel Delgado, Juan Antonio Ortega, UPC-Universitat Politecnica de Catalunya - Terrassa, Spain
This paper addresses the detection of rub defects in loudspeakers. The study includes a simulation in which a rub model based on classical static Coulomb friction is added to a parametric model of loudspeaker nonlinearities. The simulation demonstrates that the current signal is viable for rub failure detection. The electric current signal is analyzed by means of the Zhao-Atlas-Marks distribution (ZAMD). A failure extractor based on segmentation of the relevant harmonic ZAMD frequency regions and on the Mahalanobis distance is presented. The simulation and experimental results show the effectiveness and reliability of the presented rub detection method.
Convention Paper 8237
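The Mahalanobis distance measures how far a feature vector lies from a reference population, scaled by that population's covariance. The sketch below assumes a hypothetical two-feature vector, such as energies in two ZAMD harmonic regions. It also assumes a covariance estimated from known-good units. The feature choice and the 3.0 threshold are illustrative assumptions, not values from the paper.

```python
import math


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D feature vector from a reference population.

    cov is the 2x2 covariance [[a, b], [b, c]] of features from healthy units.
    """
    (a, b), (_, c) = cov
    det = a * c - b * b
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    # d^T * inverse(cov) * d using the closed-form 2x2 inverse.
    q = (c * d0 * d0 - 2.0 * b * d0 * d1 + a * d1 * d1) / det
    return math.sqrt(q)


def is_rub_suspect(x, mean, cov, threshold=3.0):
    """Flag a unit whose features lie far from the healthy population."""
    return mahalanobis_2d(x, mean, cov) > threshold
```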
P14-4 Contributions to the Improvement of the Response of a Pleated Loudspeaker—Jaime Ramis, Rita Martinez, Acustica Beyma S.L. - Moncada, Valencia, Spain; E. Segovia, Obras Públicas e Infraestructura Urbana - Spain; Jesus Carbajo, Jaime Ramis, Universidad de Alicante - Alicante, Spain
In this paper we describe some results that have improved the response of an Air Motion Transformer loudspeaker. First, an approximate analytical solution has been found for the system of differential equations that governs the behavior of the moving assembly of this type of transducer. It is valid when the length of the pleat is much greater than the radius of the cylindrical part. This solution is valid for any type of analysis (static, modal, and harmonic), and the modes are significantly simplified under this assumption. In addition, we have analyzed the influence of the thickness and the perforation shape of the pole piece on the frequency response of the loudspeaker.
Convention Paper 8238
P14-5 Exploring the Ultra-directional Acoustic Responses of an Electret Cell Array Loudspeaker—Yu-Chi Chen, Wen-Ching Ko, National Taiwan University - Taipei, Taiwan; Chang-Ho Liou, National Taiwan University - Taipei, Taiwan, Industrial Technology Research Institute, Hsinchu Taiwan; Wen-Hsin Hsiao, Chih-Chiang Cheng, Wen-Jong Wu, Pei-Zen Chang, National Taiwan University - Taipei, Taiwan; Chih-Kung Lee, National Taiwan University - Taipei, Taiwan, Institute for Information Industry, Taipei, Taiwan
In recent years, novel thin-plate loudspeakers have attracted much interest, and applications in areas such as 3C peripherals, automobile audio systems, and home theater have been actively discussed. However, the acoustic directivity of a thin-plate loudspeaker depends on frequency, and at present thin-plate loudspeakers have poor directivity. If this limitation can be overcome, thin-plate loudspeakers could find useful applications in museums, supermarkets, or exhibition areas that require channeling sound to a particular area without affecting nearby areas or unintended audiences. Previous studies have confirmed that electret cell arrays make excellent flexible flat loudspeakers, since they can produce high-performance sound in the mid-to-high frequency range. An electret loudspeaker can generate ultra-directional audible sound by adjusting the array size, amplitude modulation, and layout structure.
Convention Paper 8239
P14-6 A Soundfield Microphone Using Tangential Capsules—Eric Benjamin, Surround Research - Pacifica, CA, USA
The traditional soundfield microphone is a tetrahedral array of pressure gradient microphones whose outputs are linearly combined to produce signals proportional to those of co-located microphones: one with omnidirectional sensitivity and three orthogonal ones with figure-of-eight sensitivity. This configuration works well and has been the basis of commercial products for a number of years. Recently, Craven, Law, and Travis disclosed [2,3] an alternative array type composed of pressure gradient sensors whose principal axes are oriented tangentially with respect to the center. Additional analysis has been performed, and several prototypes were constructed and evaluated.
Convention Paper 8240
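For comparison with the tangential design, the traditional tetrahedral soundfield microphone combines its capsules with a simple sum-and-difference matrix. The sketch below shows that matrix for capsules labelled left-front-up, right-front-down, left-back-down, and right-back-up. Scaling and the frequency-dependent correction filters of real products are omitted. This is not the tangential-capsule processing described in the paper.

```python
def a_to_b_format(lfu, rfd, lbd, rbu):
    """Tetrahedral capsule signals to W, X, Y, Z (unscaled, no correction filters).

    Capsules: left-front-up, right-front-down, left-back-down, right-back-up.
    """
    w = lfu + rfd + lbd + rbu  # omnidirectional sum
    x = lfu + rfd - lbd - rbu  # front minus back
    y = lfu - rfd + lbd - rbu  # left minus right
    z = lfu - rfd - lbd + rbu  # up minus down
    return w, x, y, z
```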
P14-7 A 2-Way Loudspeaker Array System with Pseudorandom Spacing for Music Concerts—Yuki Ayabe, Saburo Nakano, Tokyo City University - Setagaya-ku, Tokyo, Japan; Kaoru Ashihara, Advanced Industrial Science and Technology - Tsukuba, Japan; Shogo Kiryu, Tokyo City University - Setagaya-ku, Tokyo, Japan
A 96-channel loudspeaker array system that allows real-time control of the sound field has been developed for live musical concerts. Using the system, multiple sound foci at different points can be generated and controlled independently. The variable delay circuits, the power amplifier controller, and the communication circuit between the hardware and the computer are implemented in FPGAs. To extend the frequency range and reduce spatial aliasing, the loudspeaker array is assembled from two-way loudspeakers with pseudorandom spacing.
Convention Paper 8241
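To focus a line array on a point, each loudspeaker is delayed so that all wavefronts arrive at the focus together. The sketch below computes those delays for a hypothetical line of loudspeakers, assuming a speed of sound of 343 m/s. It ignores amplitude weighting and the FPGA implementation.

```python
import math


def focus_delays(speaker_x, focus, c=343.0):
    """Per-loudspeaker delays (s) so that all wavefronts meet at `focus`.

    speaker_x: x positions (m) of loudspeakers on a line at y = 0.
    focus: (x, y) of the desired focal point in metres.
    The farthest loudspeaker gets zero delay; nearer ones wait longer.
    """
    dists = [math.hypot(focus[0] - sx, focus[1]) for sx in speaker_x]
    d_max = max(dists)
    return [(d_max - d) / c for d in dists]
```

The delay rule does not require uniform spacing, so the same function applies unchanged to pseudorandomly spaced positions.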
Friday, November 5, 5:45 pm — 7:00 pm (Room 133)
Broadcast and Media Streaming: B9 - Audio Processing for Streaming
Bill Sacks, Optimod.FM
Ray Archie, CBS
Frank Foti, Telos
Greg Ogonowski, Orban
Skip Pizzi, Consultant/Radio Ink
Finding the silver lining in the internet cloud
People from traditional broadcast and recording media have a very different perspective from IT people. To a seasoned audio engineer, compression now means two very different things, depending on whether the conversation is about dynamic range reduction or the efficiency of a data stream. We must learn to communicate with one another. We must learn the TCP/IP language and protocols well enough to relate our needs to our IT counterparts. We also need to be able to teach them, in their language, what we need them to do with us in order to accomplish our mission.
This panel will discuss the evolving relationship between audio and IT, and how to improve not just the technical interfaces but also mutual understanding of our needs.
Saturday, November 6, 9:00 am — 10:45 am (Room 132)
Product Design: PD4 - Grounding & Shielding - Circuits and Interference (Part 1)
This first session will discuss the way signals and power are transported. We will discuss the basic meanings of words such as voltage, current, capacitance, and inductance; the role conductor geometries play in controlling where signals and power can travel; the problems of utility and facility design, together with the meaning of ground and earth; the interference problems created by transformers and facility wiring; and shielding as applied to analog circuits and radiating structures. Terms such as differential, balanced, common-mode, normal-mode, and single-ended will be explained.
Saturday, November 6, 9:00 am — 10:30 am (Room 133)
Broadcast and Media Streaming: B10 - Audio Over IP: A Tutorial
Steve Church, Telos Systems
Skip Pizzi, Media Technology Consultant & Technology Editor, Radio Ink magazine
IP-based networking continues to grow in popularity among broadcast, sound reinforcement, and other audio facilities. Learn the latest from the men who “wrote the book” on AoIP in this session, which will cover topics ranging from general advantages to specific applications of this groundbreaking new technology.
Saturday, November 6, 9:00 am — 1:00 pm (Room 220)
Paper Session: P15 - Multichannel Audio Playback
P15-1 Why Ambisonics Does Work—Eric Benjamin, Surround Research - Pacifica, CA, USA; Richard Lee, Pandit Littoral - Cooktown, Australia; Aaron Heller, SRI International - Menlo Park, CA, USA
Several techniques exist for surround sound, including Ambisonics, VBAP, WFS, and pair-wise panning. Each of these systems has strengths and weaknesses, but Ambisonics has long been favored for its extensibility and for being a complete solution, covering both recording and playback. Yet Ambisonics has not met with great critical or commercial success, despite having been available in one form or another for many years. Some observers have gone so far as to suggest that Ambisonics can’t work. The present paper analyzes the performance of Ambisonics in terms of various psychoacoustic mechanisms of spatial hearing, such as localization and envelopment.
Convention Paper 8242
P15-2 Design of Ambisonic Decoders for Irregular Arrays of Loudspeakers by Non-Linear Optimization—Aaron J. Heller, SRI International - Menlo Park, CA, USA; Eric Benjamin, Surround Research - Pacifica, CA, USA; Richard Lee, Pandit Littoral - Cooktown, Queensland, Australia
In previous papers, the present authors described techniques for design, implementation, and evaluation of Ambisonic decoders for regular loudspeaker arrays. However, to accommodate domestic listening rooms, irregular arrays are often required. Because the figures of merit used to predict decoder performance are non-linear functions of loudspeaker positions, non-linear optimization techniques are needed. In this paper we discuss the implementation of an open-source application based on the NLopt non-linear optimization software library that derives decoders for arbitrary arrays of loudspeakers, as well as providing a prediction of their performance using psychoacoustic criteria, such as Gerzon’s velocity and energy localization vectors. We describe the implementation and optimization criteria and report on listening tests comparing the decoders produced.
Convention Paper 8243
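Gerzon's velocity and energy vectors are gain-weighted averages of the unit vectors toward the loudspeakers. Below is a minimal sketch for a horizontal array. It takes the decoder gains for one source direction as input. It does not perform the NLopt-based optimization described in the paper.

```python
import math


def gerzon_vectors(azimuths_deg, gains):
    """Velocity (rV) and energy (rE) localization vectors for a horizontal array.

    rV = sum(g_i * u_i) / sum(g_i); rE = sum(g_i**2 * u_i) / sum(g_i**2),
    where u_i is the unit vector toward loudspeaker i.
    """
    ux = [math.cos(math.radians(a)) for a in azimuths_deg]
    uy = [math.sin(math.radians(a)) for a in azimuths_deg]
    p = sum(gains)              # pressure (amplitude) sum
    e = sum(g * g for g in gains)  # energy sum
    r_v = (sum(g * x for g, x in zip(gains, ux)) / p,
           sum(g * y for g, y in zip(gains, uy)) / p)
    r_e = (sum(g * g * x for g, x in zip(gains, ux)) / e,
           sum(g * g * y for g, y in zip(gains, uy)) / e)
    return r_v, r_e
```

An energy-vector magnitude of 1 corresponds to sound coming from a single loudspeaker. Decoders are commonly designed so that both vectors point toward the intended source direction. The velocity-vector magnitude is kept near 1 at low frequencies, and the energy-vector magnitude is made as large as practical at high frequencies.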
P15-3 Discrete Driving Functions for Wave Field Synthesis and Higher Order Ambisonics—César D. Salvador, Universidad de San Martín de Porres - Lima, Peru
Practical implementations of physics-based spatial sound reproduction techniques, such as Wave Field Synthesis (WFS) and Higher Order Ambisonics (HOA), require real-time filtering, scaling, and delaying operations on the audio signal to be spatialized. These operations form the so-called loudspeaker’s driving function. This paper describes a discretization method to obtain a rational representation in the z-plane from the continuous WFS and HOA driving functions. Visual and numerical comparisons between the continuous and discrete driving functions, and between the continuous and discrete sound pressure fields, synthesized with circular loudspeaker arrays, are shown. The percentage discretization errors, in the reproducible frequency range and in the whole listening area, are in the order of 1%. A methodology for the reconstruction of immersive soundscapes composed with nature sounds is also reported as a practical application.
Convention Paper 8244
P15-4 Reducing Artifacts of Focused Sources in Wave Field Synthesis—Hagen Wierstorf, Matthias Geier, Sascha Spors, Technische Universität Berlin - Berlin, Germany
Wave Field Synthesis provides the possibility to synthesize virtual sound sources located between the loudspeaker array and the listener. Such sources are known as focused sources. Previous studies have shown that the reproduction of focused sources is subject to audible artifacts. The strength of those artifacts heavily depends on the size of the loudspeaker array. This paper proposes a method to reduce artifacts in the reproduction of focused sources by using only a subset of loudspeakers of the array. A listening test verifies the method and compares it to previous results.
Convention Paper 8245
P15-5 On the Anti-Aliasing Loudspeaker for Sound Field Synthesis Employing Linear and Circular Distributions of Secondary Sources—Jens Ahrens, Sascha Spors, Deutsche Telekom AG Laboratories - Berlin, Germany
The theory of analytical approaches to sound field synthesis, such as wave field synthesis, near-field compensated higher order Ambisonics, and the spectral division method, requires continuous distributions of secondary sources. In practice, discrete loudspeakers are employed, and the synthesized sound field is corrupted by a number of artifacts commonly referred to as spatial aliasing. This paper presents a theoretical investigation of the properties that loudspeakers would need in order to suppress such spatial aliasing artifacts. It is shown that employing such loudspeakers is not desirable: with practical loudspeaker spacings, suppressing spatial aliasing comes at the cost of a significant restriction of the reproducible spatial information.
Convention Paper 8246
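For a rough sense of the spacings involved, a commonly quoted rule of thumb places the onset of spatial aliasing for a discrete linear array near f_al ≈ c / (2·Δx), where Δx is the loudspeaker spacing. The sketch below only evaluates that approximation; the speed of sound (343 m/s) and the example spacings are assumptions, and the exact limit also depends on source and listener geometry, which this simple formula ignores.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 °C


def aliasing_frequency(spacing_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Rule-of-thumb spatial aliasing frequency c / (2 * spacing) for a discrete linear array."""
    if spacing_m <= 0:
        raise ValueError("loudspeaker spacing must be positive")
    return c / (2.0 * spacing_m)


# Hypothetical example spacings: 5 cm, 10 cm and 20 cm
for spacing in (0.05, 0.10, 0.20):
    print(f"{spacing * 100:.0f} cm spacing -> about {aliasing_frequency(spacing):.0f} Hz")
```

Under this approximation, halving the spacing doubles the aliasing frequency, so spacings of several centimetres still leave the upper audio range above the limit.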
P15-6 The Relationship between Sound Field Reproduction and Near-Field Acoustical Holography—Filippo M. Fazi, Philip Nelson, University of Southampton - Southampton, UK
The problem of reproducing a desired sound field with an array of loudspeakers and the technique known as Near-Field Acoustical Holography share some fundamental theoretical aspects. It is shown that both problems can be formulated as an integral equation that usually defines an ill-posed problem. The examples of spherical and planar geometry are discussed in detail. It is shown that, for both the reproduction and the acoustical holography cases, the ill-conditioning of the problem is strongly affected by the distance between the source layer and the measurement/control surface.
Convention Paper 8247
P15-7 Surround Sound with Height in Games Using Dolby Pro Logic IIz—Nicolas Tsingos, Christophe Chabanne, Charles Robinson, Dolby Laboratories - San Francisco, CA, USA; Matt McCallus, RedStorm Entertainment - Cary, NC, USA
Dolby Pro Logic IIz is a new matrix encoding/decoding system that enables the transmission of a pair of height channels within a conventional surround sound stream (e.g., 5.1). In this paper we provide guidelines for the use of Pro Logic IIz in interactive gaming applications, including recommended speaker placement, creation of elevation information, and details on how to embed the height channels within a 5- or 7-channel stream. Surround sound with height is already widely available in home-theater receivers. It offers increased immersion to the user and is a perfect fit for 2-D or stereoscopic 3-D video games.
Convention Paper 8248
P15-8 Optimal Location and Orientation for Midrange and High Frequency Loudspeakers in the Instrument Panel of an Automotive Interior—Roger Shively, Harman International - Novi, MI, USA; Jérôme Halley, Harman International - Karlsbad, Germany; François Malbos, Harman International - Chateau du Loir, France; Gabriel Ruiz, Harman International - Bridgend, Wales, UK
As a follow-up to a previous paper (AES Convention Paper 8023, May 2010), and using the modeling process described there for loudspeakers in an automotive interior, this paper reports on optimizing the positions of midrange and high-frequency tweeter loudspeakers for best acoustic performance on the driver's (left) and passenger's (right) sides of an automotive instrument panel.
Convention Paper 8249
Saturday, November 6, 9:00 am — 12:30 pm (Room 236)
Paper Session: P16 - Signal Analysis and Synthesis
Agnieszka Roginska, New York University - New York, NY, USA
P16-1 Maintaining Sonic Texture with Time Scale Compression by a Factor of 100 or More—Robert Maher, Montana State University - Bozeman, MT, USA
Time-lapse photography is a common technique to present a slowly evolving visual scene on an artificially rapid time scale. Events in the scene that unfold over minutes, hours, or days in real time can be viewed in a shorter video clip. Audio time scaling by a large compression factor can be considered the aural equivalent of time-lapse video, but obtaining meaningful time-compressed audio that retains the original sonic texture poses interesting practical and conceptual challenges. This paper reviews a variety of existing techniques for compressing 24 hours of audio into just a few minutes of representative "time-lapse" audio and explores several useful modifications and optimizations.
Convention Paper 8250
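One naive way to obtain "time-lapse" audio, offered only as a baseline and not as a technique from the paper, is to splice evenly spaced short excerpts of the long recording with crossfades. In this sketch the excerpt and fade lengths are hypothetical parameters, the input is assumed to be a mono NumPy array, and the compression factor is simply the input length divided by the output length (24 hours into 3 minutes is a factor of 480).

```python
import numpy as np


def time_lapse(signal, sample_rate, out_seconds, excerpt_seconds=0.5, fade_seconds=0.05):
    """Naive time-lapse audio: splice evenly spaced excerpts with linear crossfades.

    Assumes a mono signal; excerpt and fade lengths are illustrative defaults.
    """
    signal = np.asarray(signal, dtype=float)
    excerpt = int(excerpt_seconds * sample_rate)
    fade = int(fade_seconds * sample_rate)
    if not 0 < fade < excerpt <= len(signal):
        raise ValueError("need 0 < fade < excerpt <= len(signal)")
    hop = excerpt - fade  # consecutive excerpts overlap by `fade` samples in the output
    n_excerpts = max(1, int((out_seconds * sample_rate - fade) // hop))
    # Evenly spaced read positions across the whole recording
    starts = np.linspace(0, len(signal) - excerpt, n_excerpts).astype(int)

    # Linear fade-in / fade-out whose overlapping parts sum to exactly 1
    fade_in = (np.arange(fade) + 0.5) / fade
    window = np.ones(excerpt)
    window[:fade] = fade_in
    window[-fade:] = 1.0 - fade_in

    out = np.zeros(hop * n_excerpts + fade)
    for i, start in enumerate(starts):
        out[i * hop : i * hop + excerpt] += signal[start : start + excerpt] * window
    return out
```

A linear (amplitude-complementary) crossfade is the simplest choice; for uncorrelated material an equal-power crossfade would avoid level dips at the splices.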
P16-2 Sound Texture Analysis Based on a Dynamical Systems Model and Empirical Mode Decomposition—Doug Van Nort, Jonas Braasch, Pauline Oliveros, Rensselaer Polytechnic Institute - Troy, NY, USA
This paper describes a system for separating a musical stream into sections having different textural qualities. This system translates several contemporary approaches to video texture analysis, creating a novel approach in the realm of audio and music. We first represent the signal as a set of mode functions by way of the Empirical Mode Decomposition (EMD) technique for time/frequency analysis, before expressing the dynamics of these modes as a linear dynamical system (LDS). We utilize both linear and nonlinear techniques in order to learn the system dynamics, which leads to a successful separation of the audio in time and frequency.
Convention Paper 8251
P16-3 An Improved Audio Watermarking Scheme Based on Complex Spectral Phase Evolution Spectrum—Jian Wang, Ron Healy, Joe Timoney, NUI Maynooth - Co. Kildare, Ireland
In this paper a new audio watermarking algorithm based on the Complex Spectral Phase Evolution (CSPE) algorithm is presented, extending a previous scheme. Peaks in a spectral representation derived from the CSPE are used for watermarking, instead of the previously proposed frequency identification. Although the new scheme is simple, it achieves high robustness in addition to perceptual transparency and accuracy; this robustness is its distinguishing advantage over our previous scheme.
Convention Paper 8252
P16-4 About This Dereverberation Business: A Method for Extracting Reverberation from Audio Signals—Gilbert Soulodre, Camden Labs - Ottawa, Ontario, Canada
There are many situations where the reverberation found in an audio signal is not appropriate for its final use, and therefore we would like to have a means of altering the reverberation. Furthermore we would like to be able to modify this reverberation without having to directly measure the acoustic space in which it was recorded. In the present paper we describe a method for extracting the reverberant component from an audio signal. The method allows an estimate of the underlying dry signal to be derived. In addition, the reverberant component of the signal can be altered.
Convention Paper 8253
P16-5 Automatic Recording Environment Identification Using Acoustic Reverberation—Usman Amin Chaudhary, Hafiz Malik, University of Michigan-Dearborn - Dearborn, MI, USA
A recording environment leaves its acoustic signature in any audio recording captured in it. For example, the persistence of sound due to multiple reflections from various surfaces in a room causes temporal and spectral smearing of the recorded sound. This distortion is referred to as reverberation. Because the amount of reverberation depends on the geometry and composition of a recording location, differences in the estimated acoustic signature can be used for recording environment identification. We describe a statistical framework based on maximum likelihood estimation to estimate the acoustic signature from an audio recording and use it for automatic recording environment identification. To achieve these objectives, the digital audio recording is first analyzed to estimate the acoustic signature (in the form of reverberation time and variance of the background noise), and competitive neural network based clustering is then applied to the estimated acoustic signature for automatic recording location identification. We have also analyzed the impact of source-sensor directivity, microphone type, and the learning rate of the clustering algorithm on the identification accuracy of the proposed method.
Convention Paper 8254
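The paper estimates reverberation time blindly from recordings with a maximum likelihood framework. As a simpler point of reference (a different, standard technique, not the authors' estimator), reverberation time can be read from a measured impulse response by Schroeder backward integration and a straight-line fit to the energy decay curve. This sketch assumes a clean impulse response and a fit range from -5 dB to -25 dB, extrapolated to a 60 dB decay.

```python
import numpy as np


def rt60_schroeder(ir, sample_rate, fit_range_db=(-5.0, -25.0)):
    """Estimate RT60 from an impulse response via Schroeder backward integration.

    Fits a line to the energy decay curve between the two levels in fit_range_db
    and extrapolates the slope to a 60 dB decay.
    """
    energy = np.asarray(ir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]  # energy remaining from each sample onward
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-300)
    upper, lower = fit_range_db
    idx = np.nonzero((edc_db <= upper) & (edc_db >= lower))[0]
    if len(idx) < 2:
        raise ValueError("impulse response does not decay through the fit range")
    slope_db_per_s, _ = np.polyfit(idx / sample_rate, edc_db[idx], 1)
    return -60.0 / slope_db_per_s
```

With background noise in a real recording, the tail of the decay curve flattens, which is one reason blind estimators such as the one in the paper also model the noise variance.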
P16-6 Automatic Music Production System Employing Probabilistic Expert Systems—Gang Ren, Gregory Bocko, Justin Lundberg, Dave Headlam, Mark F. Bocko, University of Rochester - Rochester, NY, USA
An automatic music production system based on expert audio engineering knowledge is proposed. An expert system based on a probabilistic graphical model is employed to embed professional audio engineering knowledge and infer automatic production decisions based on musical information extracted from audio files. The production pattern, which is represented as a probabilistic graphical model, can be “learned” from the operation data of a human audio engineer or manually constructed from domain knowledge. The authors also discuss the real-time implementation of the proposed automatic production system for live mixing scenarios. Musical event alignment and prediction algorithms are introduced to improve the time synchronization performance of our production model. The authors conclude with performance evaluations and a brief summary.
Convention Paper 8255
P16-7 Musical Eliza: An Automatic Musical Accompany System Based on Expressive Feature Analysis—Gang Ren, Justin Lundberg, Gregory Bocko, Dave Headlam, Mark F. Bocko, University of Rochester - Rochester, NY, USA
We propose an interactive algorithm that musically accompanies musicians by matching expressive feature patterns to existing archive recordings. For each accompaniment segment, multiple realizations with different musical characteristics are performed by master performers and recorded. Musical expressive features are extracted from each accompaniment segment, and its semantic analysis is obtained using a musical expressive language model. As the system user performs, we extract and analyze musical expressive features in real time and play back the accompaniment track from the archive database that best matches the expressive feature pattern. By creating a sense of musical correspondence, our proposed system provides an exciting interactive musical communication experience and finds versatile entertainment and pedagogical applications.
Convention Paper 8256
Saturday, November 6, 10:30 am — 12:00 pm (Room 130)
Tutorial: T9 - Damping of the Room Low-Frequency Acoustics (Passive and Active)
As a result of its size and geometry, a room excessively amplifies sound at certain frequencies. This is caused by the standing waves (acoustic resonances or modes) of the room: waves whose original oscillation is continuously reinforced by their own reflections. Rooms have many resonances, but only the low-frequency ones are discrete and distinct, are unaffected by the sound-absorbing material in the room, and account for most of the acoustic energy buildup in the room.
In this tutorial, after an overview of low-frequency room acoustics, different passive and active bass-trapping techniques for adding damping to a room will be presented and their advantages and disadvantages discussed. The event will conclude by comparing and contrasting damping with equalization.
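For an idealized rectangular room with rigid walls, the modal frequencies follow the standard formula f = (c/2)·sqrt((nx/Lx)² + (ny/Ly)² + (nz/Lz)²) for non-negative integers nx, ny, nz. The sketch below lists the low-frequency modes; the room dimensions and c = 343 m/s are assumptions, and real rooms with non-rigid boundaries and furnishings will deviate from these values.

```python
import itertools
import math


def room_modes(lx, ly, lz, f_max, c=343.0):
    """Return sorted (frequency_hz, (nx, ny, nz)) for modes of a rigid-walled rectangular room."""
    # Index n along a dimension of length L alone gives f >= c*n/(2L), so n <= 2*L*f_max/c
    limits = [int(2 * length * f_max / c) for length in (lx, ly, lz)]
    modes = []
    for nx, ny, nz in itertools.product(*(range(n + 1) for n in limits)):
        if nx == ny == nz == 0:
            continue  # (0, 0, 0) is not a mode
        f = (c / 2) * math.sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)
        if f <= f_max:
            modes.append((f, (nx, ny, nz)))
    return sorted(modes)


# Hypothetical 5 m x 4 m x 3 m room, modes up to 100 Hz
for freq, index in room_modes(5.0, 4.0, 3.0, 100.0):
    kind = {1: "axial", 2: "tangential", 3: "oblique"}[sum(n > 0 for n in index)]
    print(f"{freq:6.1f} Hz  {index}  {kind}")
```

The sparse spacing of these modes at low frequencies is what makes them discrete and distinct; higher up, the mode density grows until individual resonances overlap.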
Saturday, November 6, 11:00 am — 12:30 pm (Room 133)
Broadcast and Media Streaming: B11 - Audio Processing for Radio
Tom Ray, Buckley Broadcasting
Steve Fluker, Cox Media - Orlando, FL, USA
Frank Foti, Omnia
Jeff Keith, Wheatstone Corporation
Robert Orban, Orban
There is much discussion as to why radio stations are “over-processed,” a term that is true or not depending on your point of view. This panel will discuss audio processing in the radio environment, beginning with a brief history of audio processing up to and including the advantages of using digital processors. And since radio today is not just an analog medium, we will discuss dos and don’ts for processing radio in the digital realm and take a look into the future.
Saturday, November 6, 11:00 am — 1:00 pm (Room 206)
Workshop: W13 - Progress in Computer-Based Playback of High Resolution Audio
Vicki R. Melchior, Audio DSP Consultant - Boston, MA, USA
Bob Bauman, Lynx Studio Technology - Costa Mesa, CA, USA
James Johnston, DTS Inc. - Calabasas, CA, USA
Andy McHarg, dCS Ltd. - Cambridge, UK
Daniel Weiss, Weiss Engineering Ltd. - Zurich, Switzerland
With the continuing decline in discs as music sources and concurrent growth of electronic distribution, computers and network attached storage (NAS) are now rapidly evolving as front end components in place of traditional transports and players. Computers have long been useful within mastering workflows, though not always loved, and their introduction into high quality music systems raises a new range of engineering challenges.
Intrinsic to computers are problems of EMC, switching noise, dirty power, jittered clocks, crosstalk, driver and operating system variability, protocol incompatibilities, and software errors, to name a few. These may directly influence audio quality. Of special importance, for example, are the design and system configuration of digital audio interfaces (USB, FireWire, S/PDIF, Wi-Fi, Ethernet, etc.), D/A conversion, and data processing, along with clocks and power sourcing.
The panelists in this workshop are active in the design of these systems and will discuss some of their results and thoughts regarding the most salient factors for optimizing sonic performance in this area.
Saturday, November 6, 2:30 pm — 4:30 pm (Room 132)
Product Design: PD5 - Grounding & Shielding - Circuits and Interference (Part 2)
This second session of Grounding and Shielding: Circuits and Interference picks up where the first session ended. Topics will include: the role of digital logic and processors in the audio world; A/D converters; transmission line basics; impedance control and impedance matching; the relation between rise and fall times and frequency spectrum; the need for ground and power planes; decoupling and filtering on logic structures; the interface between analog and digital circuits; digital analog filters; aliasing errors; balanced digital lines and common-mode rejection; multilayer boards; and interference problems such as crosstalk, ground bounce, and via locations.
Saturday, November 6, 2:30 pm — 4:00 pm (Room 133)
Broadcast and Media Streaming: B12 - Audio for Newsgathering
Skip Pizzi, Media Technology Consultant & Technology Editor, Radio Ink magazine
Daniel Mansergh, Director of Engineering, KQED-FM, San Francisco
Jeff Towne, Co-producer/Engineer, “Echoes”, and Tools Editor, Transom.org
The world of broadcast news presents numerous challenges to the recording, production, and presentation of the rich information carried by sound. Most challenging is the proper recording of raw sound for news events as they happen in the field. From faraway battlefields to neighborhood playgrounds, the tools and processes required to capture this sound appropriately and reliably for broadcast are a highly specialized tributary of audio technology. Practicing experts in the field will share tips and techniques honed over years of experience on the beat.
Saturday, November 6, 2:30 pm — 3:30 pm (Room 130)
Workshop: W14 - Rethinking the Digital Audio Workstation
Michael Hlatky, accessive tools GmbH
Jörn Loviscach, University of Applied Sciences Bielefeld
Guy McNally, Uncut Video Inc.
Bernard Mont-Reynard, SoundHound Inc.
Allen Saego, London Metropolitan University
The DAWs of today very much resemble those of 1989. Yes, the buttons have become nicer and we can record more tracks in parallel, but with the technology advances since then, we should be doing much better. There are, however, not many companies on the market today that have the ability to rethink how their products work. Yet universities and research centers have brought us an immense collection of new technologies to build new products upon. How about, for instance, real-time online cross-DAW collaboration, leveraging social networks for finding the optimal effect settings, and making DAWs not only bulletproof, but also foolproof? This workshop surveys existing technologies, looks into possible synergies with other fields of computer science, and proposes practical improvements to, and/or radical changes in, DAW software.
Saturday, November 6, 2:30 pm — 6:30 pm (Room 220)
Paper Session: P17 - Real-Time Audio Processing
P17-1 A Time Distributed FFT for Efficient Low Latency Convolution—Jeffrey Hurchalla, Garritan Corp. - Orcas, WA, USA
To enable efficient low latency convolution, a Fast Fourier Transform (FFT) is presented that balances processor and memory load across incoming blocks of input. The proposed FFT transforms a large block of input data in steps spread across the arrival of smaller blocks of input and can be used to transform large partitions of an impulse response and input data for efficiency, while facilitating convolution at very low latency. Its primary advantage over a standard FFT as used for a non-uniform partition convolution method is that it can be performed in the same processing thread as the rest of the convolution, thereby avoiding problems associated with the combination of multithreading and near real-time calculations on general purpose computing architectures.
Convention Paper 8257
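The paper's contribution is a time-distributed FFT for non-uniform partitions. For orientation, the sketch below shows the simpler uniformly partitioned overlap-save scheme that such methods build on: the impulse response is split into equal blocks, each block's spectrum is multiplied with a frequency-domain delay line of past input spectra, and the latency is one block. It is a reference sketch in NumPy, not the author's algorithm; the block size is a free parameter.

```python
import numpy as np


def partitioned_convolve(x, h, block):
    """Uniformly partitioned overlap-save convolution (FFT size 2*block, latency one block).

    Produces the same result as np.convolve(x, h) but processes the input block by block.
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    out_len = len(x) + len(h) - 1
    n_parts = -(-len(h) // block)  # ceil(len(h) / block)
    h_pad = np.pad(h, (0, n_parts * block - len(h)))
    # Spectrum of each impulse-response partition, zero-padded to 2*block
    H = np.array([np.fft.rfft(h_pad[p * block:(p + 1) * block], 2 * block) for p in range(n_parts)])
    fdl = np.zeros_like(H)  # frequency-domain delay line: spectra of the last n_parts input buffers

    n_blocks = -(-out_len // block)
    x_pad = np.pad(x, (0, n_blocks * block - len(x)))
    buf = np.zeros(2 * block)  # sliding window holding the previous and current input block
    y = np.empty(n_blocks * block)
    for k in range(n_blocks):
        buf[:block] = buf[block:]
        buf[block:] = x_pad[k * block:(k + 1) * block]
        fdl = np.roll(fdl, 1, axis=0)  # age every stored spectrum by one block
        fdl[0] = np.fft.rfft(buf)
        Y = (fdl * H).sum(axis=0)  # partition p meets the input from p blocks ago
        # Overlap-save: only the last `block` samples of the circular result are valid
        y[k * block:(k + 1) * block] = np.fft.irfft(Y, 2 * block)[block:]
    return y[:out_len]
```

Because the FFT size is tied to the block size here, long impulse responses need many small partitions. Non-uniform schemes, such as the one the paper targets, use larger partitions for later parts of the response, which is where spreading a large FFT across the arrival of several small input blocks becomes useful.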
P17-2 An Infinite Impulse Response (IIR) Hilbert Transformer Filter Design Guide for Audio—Daniel Harris, Sennheiser Research Laboratory - Palo Alto, CA, USA; Edgar Berdahl, Stanford University - Stanford, CA, USA
Hilbert Transformers have found many applications in the signal processing community, from single-sideband communication systems to audio effects. IIR implementations are attractive for computationally sensitive systems due to their lower number of coefficients. However, as in any advanced filter design problem, their tuning and implementation present a number of design challenges and tradeoffs. Furthermore, while literature addressing these problems exists, designers must draw from several sources to find answers. In this paper we present a complete start-to-finish explanation of how to implement an efficient infinite impulse response (IIR) Hilbert transformer filter. We start from a half-band filter design and show how the poles move as the half-band filter is transformed into summed all-pass filters and then from there into a Hilbert transformer filter. The design technique is based largely on pole locations and creates a filter in the cascaded 1st order allpass form, which is numerically robust.
Convention Paper 8258
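The building block of the cascaded form is the first-order allpass section A(z) = (a + z^-1)/(1 + a·z^-1), which has unit magnitude at every frequency for any real coefficient (and is stable for |a| < 1). The sketch below runs a signal through a cascade of such sections and checks the flat magnitude response. In a Hilbert transformer, two such cascades run in parallel, with coefficients designed (as the paper describes) so the branch outputs differ by about 90 degrees; the coefficients here are arbitrary placeholders, not a designed Hilbert transformer.

```python
import numpy as np


def allpass_cascade(x, coeffs):
    """Run x through cascaded first-order allpass sections A(z) = (a + z^-1) / (1 + a z^-1)."""
    y = np.asarray(x, dtype=float)
    for a in coeffs:
        out = np.empty_like(y)
        x_prev = y_prev = 0.0
        for n, xn in enumerate(y):
            # Difference equation: y[n] = a*x[n] + x[n-1] - a*y[n-1]
            yn = a * xn + x_prev - a * y_prev
            out[n] = yn
            x_prev, y_prev = xn, yn
        y = out
    return y


# Placeholder coefficients (hypothetical, not a designed Hilbert transformer)
impulse = np.zeros(4096)
impulse[0] = 1.0
h = allpass_cascade(impulse, [0.5, -0.3, 0.9])
magnitude = np.abs(np.fft.rfft(h))
print(magnitude.min(), magnitude.max())  # both very close to 1
```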
P17-3 Automatic Parallelism from Dataflow Graphs—Ramy Sadek, University of Southern California - Playa Vista, CA, USA
This paper presents a novel algorithm to automate high-level parallelization from graph-based data structures representing data flow. Algorithm correctness is shown via a formal proof by construction. This automatic optimization yields large performance improvements for multi-core machines running host-based applications. Results of these advances are shown through their incorporation into the audio processing engine Application Rendering Immersive Audio (ARIA) presented at AES 117. Although the ARIA system is the target framework, the contributions presented in this paper are generic and therefore applicable in a variety of software such as Pure Data and Max/MSP, game audio engines, non-linear editors and related systems. Additionally, the parallel execution paths extracted are shown to give effectively optimal cache performance, yielding significant speedup for such host-based applications.
Convention Paper 8259
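A common, simple way to expose parallelism in a dataflow graph (shown as an illustration, not the algorithm proved in the paper) is to group nodes into stages by topological depth: every node in a stage depends only on earlier stages, so the nodes of one stage can run concurrently. The node names in the example graph are hypothetical.

```python
from collections import defaultdict


def parallel_stages(nodes, edges):
    """Group the nodes of an acyclic dataflow graph into stages by topological depth.

    Nodes in the same stage have no dependencies on one another, so they could be
    dispatched to different cores; the stages themselves must still run in order.
    """
    indegree = {node: 0 for node in nodes}
    consumers = defaultdict(list)
    for src, dst in edges:
        consumers[src].append(dst)
        indegree[dst] += 1

    stage = [node for node in nodes if indegree[node] == 0]
    stages = []
    while stage:
        stages.append(stage)
        next_stage = []
        for node in stage:
            for dst in consumers[node]:
                indegree[dst] -= 1
                if indegree[dst] == 0:  # all inputs of dst are now computed
                    next_stage.append(dst)
        stage = next_stage

    if sum(len(s) for s in stages) != len(nodes):
        raise ValueError("graph contains a cycle")
    return stages


# Hypothetical processing graph: the input feeds an EQ and a reverb in parallel
nodes = ["input", "eq", "reverb", "mix", "output"]
edges = [("input", "eq"), ("input", "reverb"), ("eq", "mix"), ("reverb", "mix"), ("mix", "output")]
print(parallel_stages(nodes, edges))
# [['input'], ['eq', 'reverb'], ['mix'], ['output']]
```

Stage-by-stage scheduling inserts a barrier after each stage; finer-grained schedulers can start a node as soon as its own inputs are ready.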
P17-4 The Design of Low-Complexity Wavelet-Based Audio Filter Banks Suitable for Embedded Platforms—Neil Smyth, CSR - Cambridge Silicon Radio - Belfast, N. Ireland, UK
Many audio applications require the use of low complexity, low power, and low latency filter banks (e.g., real-time audio streaming to mobile devices). The underlying mathematics of wavelet transforms provides these attractive characteristics for embedded platforms. However, commonly used wavelets (Haar, Daubechies) possess coefficients containing irrational numbers that lead to distortion in fixed-point implementations. This paper discusses the development and provides practical performance comparisons of filter banks using wavelet transforms as an alternative to more commonly used sub-banding filter banks in PCM audio coding algorithms. The advantages and disadvantages of wavelets used in such audio compression applications are also discussed.
Convention Paper 8260
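The fixed-point issue is easy to see with the Haar wavelet, whose orthonormal filters use the irrational factor 1/√2. A standard workaround, shown here as an illustration rather than as one of the filter banks evaluated in the paper, is the integer-to-integer Haar transform (S-transform) built by lifting: it maps integers to integers and reconstructs them exactly.

```python
def haar_int_forward(x):
    """One level of the integer Haar (S-)transform via lifting; len(x) must be even."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]            # predict step: high band
    approx = [e + (d >> 1) for e, d in zip(even, detail)]  # update step: floor((e + o) / 2)
    return approx, detail


def haar_int_inverse(approx, detail):
    """Exact inverse: undo the update step, then the predict step."""
    even = [a - (d >> 1) for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    out = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out


samples = [3, -7, 12, 5, -1, 0, 32767, -32768]  # e.g. 16-bit PCM values
low, high = haar_int_forward(samples)
print(low, high)
print(haar_int_inverse(low, high) == samples)  # True: perfect reconstruction
```

Python's `>>` floors toward negative infinity, and the inverse uses the same rounding, which is why reconstruction is exact even though the low band is a rounded average.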
P17-5 Application of Optimized Inverse Filtering to Improve Time Response and Phase Linearization in Multiway Loudspeaker Systems—Mario Di Cola, Audio Labs Systems - Casoli (CH), Italy; Daniele Ponteggia, Studio Ing. Ponteggia - Terni (TR), Italy
Digital processing has been widely demonstrated to be a very useful technique for improving loudspeaker system performance. Inverse filtering is particularly interesting when applied to loudspeaker systems because it can improve performance and sound quality in terms of transient response and reduced overall phase shift. Inverse filtering is a processing technique that can be realized with FIR filters whose taps must be synthesized “ad hoc” for a specific transducer and/or a specific loudspeaker system configuration. With very few exceptions, most studies on this matter so far have focused on the “DSP processing” point of view, generally dealing with the mathematics involved and the related numerical problems. This paper discusses the philosophy that should drive the application of this technique to a loudspeaker system in order to truly improve it; consequently, it focuses on analyzing the nature of the loudspeaker system and understanding what can really be processed with a one-dimensional “action.” We will discuss what can be synthesized as a “2-port” model of the loudspeaker and then what can effectively be obtained by processing the input signal of a loudspeaker system.
Convention Paper 8261
P17-6 Filter Design for a Double Dipole Flat Panel Loudspeaker System Using Time Domain Toeplitz Equations—Tobias Corbach, Martin Holters, Udo Zölzer, Helmut-Schmidt-University/University of the Federal Armed Forces - Hamburg, Germany
Today flat panel loudspeakers are used in many applications. Due to their high directivity and good structural integration properties, flat panel loudspeakers are commonly used for directed acoustic information. A previously proposed system of two parallel flat panel dipole loudspeakers with adapted input filtering ensures high suppression of the backward radiation with only minor influence on the forward radiation. This paper presents a new approach to computing the filters for this application. It makes use of time domain convolution, realized by Toeplitz matrices, and builds the desired filter impulse responses by a least squares approach. The different filter computations as well as the numerical and measured results are shown.
Convention Paper 8262
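The core of a time-domain least-squares design fits in a few lines: convolution with a known response g is written as multiplication by a Toeplitz matrix G, and the FIR filter h minimizing ||G·h - d|| for a desired response d is found by least squares. This generic NumPy sketch assumes g and d are already known (for example, from measurements); how the paper sets up g and d for the two dipole panels is not reproduced here.

```python
import numpy as np


def convolution_matrix(g, n_taps):
    """Toeplitz matrix G with G @ h == np.convolve(g, h) for any h of length n_taps."""
    g = np.asarray(g, dtype=float)
    G = np.zeros((len(g) + n_taps - 1, n_taps))
    for j in range(n_taps):
        G[j : j + len(g), j] = g  # column j is g delayed by j samples
    return G


def least_squares_fir(g, desired, n_taps):
    """FIR filter h (n_taps long) minimizing the squared error between g * h and desired."""
    G = convolution_matrix(g, n_taps)
    if len(desired) > G.shape[0]:
        raise ValueError("desired response is longer than len(g) + n_taps - 1")
    d = np.zeros(G.shape[0])
    d[: len(desired)] = desired  # zero-pad the target to the full convolution length
    h, *_ = np.linalg.lstsq(G, d, rcond=None)
    return h
```

When the desired response is exactly reachable, as in a simple self-check that convolves g with a known filter, least squares recovers that filter; with measured data the fit is approximate, and the filter length trades accuracy against cost.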
P17-7 A Low Complexity Approach for Loudness Compensation—Pradeep D. Prasad, Ittiam Systems Pvt. Ltd. - Bangalore, Karnataka, India
The essence of loudness compensation is to maintain the perceived spectral balance of audio content irrespective of the playback volume level. The need for this compensation arises from the inherent non-linearity of human aural perception, which manifests as a change in spectral balance with level. The required compensation varies with critical band and with the original and playback specific loudness, so a direct approach is computationally intensive: it must estimate the original and target specific loudness and calculate the required compensation for every frame. A low complexity algorithm is proposed to enable resource-constrained devices to perform loudness compensation efficiently. A closed form expression is derived for the proposed compensation, followed by an analysis of the quality versus complexity tradeoff.
Convention Paper 8263
P17-8 MPEG Spatial Audio Object Coding—The ISO/MPEG Standard for Efficient Coding of Interactive Audio Scenes—Oliver Hellmuth, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Heiko Purnhagen, Dolby Sweden AB - Stockholm, Sweden; Jeroen Koppens, Philips Applied Technologies - Eindhoven, The Netherlands; Jürgen Herr, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jonas Engdegård, Dolby Sweden AB - Stockholm, Sweden; Johannes Hilpert, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Lars Villemoes, Dolby Sweden AB - Stockholm, Sweden; Leonid Terentiev, Cornelia Falch, Andreas Hölzer, María Luis Valero, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Barbara Resch, Dolby Sweden AB - Stockholm, Sweden; Harald Mundt, Dolby Germany GmbH, Nürnberg, Germany; Hyen-O Oh, Digital TV Lab., LG Electronics, Seoul, Korea
In 2007, the ISO/MPEG Audio standardization group started a new work item on efficient coding of sound scenes comprising several audio objects by parametric coding techniques. Finalized in the summer of 2010, the resulting MPEG “Spatial Audio Object Coding” (SAOC) specification allows the representation of such scenes at bit rates commonly used for coding of mono or stereo sound. At the decoder side, each object can be interactively rendered, supporting applications like user-controlled music remixing and spatial teleconferencing. This paper summarizes the results of the standardization process, provides an overview of MPEG SAOC technology, and illustrates its performance with the results of recent verification tests. The tests include operation modes for several typical application scenarios that take advantage of object-based processing.
Convention Paper 8264
Saturday, November 6, 2:30 pm — 6:30 pm (Room 236)
Paper Session: P18 - Binaural Audio
Durand R. Begault
P18-1 Modification of HRTF Filters to Reduce Timbral Effects in Binaural Synthesis, Part 2: Individual HRTFs—Juha Merimaa, Sennheiser Research Laboratory - Palo Alto, CA, USA
In the first part of this study, a method for designing modified head-related transfer function (HRTF) filters with reduced timbral effects was proposed. Spectral localization cues were effectively scaled down while preserving the interaural time and level differences. For non-individualized HRTFs, the modifications were found to produce no statistically significant changes in localization. This paper continues the investigation using individual HRTFs. It is shown that in this case the reduction in timbral effects comes at a slight listener-dependent cost in localization performance. The filter design thus enables trading off more neutral timbre against more accurate localization.
Convention Paper 8265
P18-2 On the Improvement of Auditory Accuracy with Non-Individualized HRTF-Based Sounds—Catarina Mendonça, Jorge Santos, University of Minho - Minho, Portugal; Guilherme Campos, Paulo Dias, José Vieira, University of Aveiro - Aveiro, Portugal; João Ferreira, University of Minho - Minho, Portugal
Auralization is a powerful tool to increase the realism and sense of immersion in Virtual Reality environments. The Head Related Transfer Function (HRTF) filters commonly used for auralization are non-individualized, as obtaining individualized HRTFs poses very serious practical difficulties. It is therefore extremely important to understand to what extent this hinders sound perception. In this paper we address this issue from a learning perspective. In a set of experiments, we observed that mere exposure to virtual sounds processed with generic HRTFs did not improve the subjects’ performance in sound source localization, but short training periods involving active learning and feedback led to significantly better results. We propose that auralization with non-individualized HRTFs should always be preceded by a learning period.
Convention Paper 8266
P18-3 Processing and Improving Head-Related Impulse Response Database for Auralization—Ben Supper, Focusrite Audio Engineering Ltd. - High Wycombe, UK
Converting a database of anechoic head-related impulse responses (HRIRs) into a data set that is suitable for auralization involves many stages of processing. The output data set must be precisely corrected to account for some circumstances of the recording. It must then be equalized to remove coloration. Finally, the database must be interpolated to a finer resolution. This paper explains these stages of correction, equalization, and spatial interpolation for a frequently used data set obtained from a KEMAR dummy head. The result is a useful database of HRIRs that can be applied dynamically to audio signals for research and entertainment purposes.
Convention Paper 8267
P18-4 Stimulus-Dependent HRTF Preference—Agnieszka Roginska, New York University - New York, NY, USA; Gregory H. Wakefield, University of Michigan - Ann Arbor, MI, USA; Thomas S. Santoro, Naval Submarine Medical Research Lab - Groton, CT, USA
Measurement of individual Head Related Transfer Functions (HRTFs) can be inconvenient, expensive, and time consuming. User-selected HRTFs can avoid the complexity of individually measured HRTFs and make better quality 3-D audio available to more listeners. This paper presents the results of a study designed to investigate the use of user-selected HRTFs augmented with customized interaural cues. In the study, subjects were asked to select HRTFs that resulted in an accurate percept based on three specific criteria: externalization quality, elevation, and front/back discrimination. Subjective tests were conducted using three different stimuli, and the results of the experiment are presented.
Convention Paper 8268
P18-5 Comparison between Spherical Headmodels and HRTFs in Upmixing for Headphone-Based Virtual Surround and Stereo Expansion—Part I—Sunil Bharitkar, Chris Kyriakakis, Audyssey Labs., Inc. - Los Angeles, CA, USA, University of Southern California, Los Angeles, CA, USA
In this paper, the first of multiple parts, we compare the performance of head models with that of previously published head-related transfer functions (HRTFs), using different upmixing techniques for headphone virtual surround. We consider a spherical head, with and without a pinna or torso model, whereas for the HRTFs we incorporate the CIPIC, Nagoya, and MIT HRTF sets in the upmixing. The upmixing techniques include the Moorer reverberator, a modified Moorer reverberator, and modeling of the direct sound, the first several discrete reflections (with adjustable delay and amplitude), and the diffuse field reflections with a tunable frequency-dependent decorrelator. Furthermore, since measured HRTFs can introduce audible coloration, we investigate whether there is a trade-off between localization and timbre by incorporating complex-domain smoothing of the HRTF time responses. To evaluate localization and timbre performance across the models we use movie and music content (viz., stereo, ITU downmix, and a commercial downmix method) as well as Gaussian tone noise bursts of critical bandwidth.
Convention Paper 8269
P18-6 HRTF Measurements with Recorded Reference Signal—Marko Durkovic, Florian Sagstetter, Klaus Diepold, Technische Universität München - München, Germany
Head-Related Transfer Functions (HRTFs) are used for adding spatial information in 3-D audio synthesis or for binaural robotic sound localization. Both tasks work best when using a custom HRTF database that fits the physiology of each person or robot. Usually, measuring HRTFs is a time consuming and complex procedure that is performed with expensive equipment in an anechoic chamber. In this paper we present a method that enables HRTF measurement in everyday environments by passively recording the surroundings without the need to actively emit special excitation signals. Experiments show that our method captures the HRTF's spatial cues and enables accurate sound localization.
Convention Paper 8270
P18-7 Angular Resolution Requirements for Binaural Room Scanning—Todd Welti, Harman International - Northridge, CA, USA; Xinting Zhang, State University of New York at Binghamton - Binghamton, NY, USA
Binaural Room Scanning is a method of capturing and reproducing a binaural representation of a room or car, using a dummy head incorporating binaural microphones and individual measurements made with the dummy head positioned at a number of different head angles. The measurement process can be time-consuming, so it is important to know how fine the angular resolution needs to be. An experiment was performed to see whether the angular resolution could be reduced from the current 1-degree resolution to 15-degree resolution without causing an audible difference. Using a three-alternative forced-choice method, trained listeners compared 1-degree and 15-degree angular resolution and could not reliably detect the difference.
Convention Paper 8271 (Purchase now)
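"Could not reliably detect" in a forced-choice test is usually judged against the chance rate, which is 1/3 for three alternatives. The sketch below computes a one-sided binomial p-value and the minimum number of correct answers needed for significance; the function names and the 0.05 threshold are illustrative assumptions, not the statistics reported in the paper.

```python
from math import comb


def p_value_forced_choice(correct, trials, chance=1 / 3):
    """Probability of getting at least `correct` right out of `trials` by guessing."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def min_correct_for_significance(trials, alpha=0.05, chance=1 / 3):
    """Smallest score whose guessing probability is <= alpha, or None if unreachable."""
    for k in range(trials + 1):
        if p_value_forced_choice(k, trials, chance) <= alpha:
            return k
    return None
```

For example, with only three trials a listener must get all three right (guessing probability 1/27), and with two trials no score is significant at the 0.05 level.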
P18-8 Binaural Reproduction of 22.2 Multichannel Sound over Loudspeakers—Kentaro Matsui, Akio Ando, NHK Science and Technology Research Laboratories - Tokyo, Japan
NHK has proposed the 22.2 multichannel sound system, which uses 22 loudspeakers plus 2 LFE loudspeakers to produce three-dimensional spatial sound, as the format for future TV broadcasting. To allow it to be reproduced in homes, we have investigated various reproduction methods that use fewer loudspeakers. We introduce a design for binaural rendering of 22.2 multichannel sound over three frontal loudspeakers as a minimum configuration for homes. The system inverse filters can be processed stably by dividing them into all-pass and minimum-phase components, and sound quality is successfully compensated with a peak-suppression method.
Convention Paper 8272 (Purchase now)
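A common way to split a filter into minimum-phase and all-pass components, as the abstract above describes, is the real-cepstrum (homomorphic) method. The paper does not state which method it uses, so treat this NumPy sketch as an assumption. The minimum-phase part keeps the original magnitude response and has a causal, stable inverse, while the all-pass part carries the remaining (excess) phase. The FFT size should be large compared with the filter length to keep cepstral aliasing small.

```python
import numpy as np


def split_min_phase_allpass(h, n_fft=1024):
    """Split filter h into (h_min, H_ap) via the real cepstrum.

    h_min: minimum-phase impulse response with the same magnitude response as h
    H_ap:  all-pass frequency response, H / H_min (unit magnitude)
    """
    H = np.fft.fft(h, n_fft)
    # Floor the magnitude to avoid log(0) at spectral zeros.
    log_mag = np.log(np.maximum(np.abs(H), 1e-12))
    cepstrum = np.fft.ifft(log_mag).real
    # Fold the anti-causal half of the cepstrum onto the causal half.
    fold = np.zeros(n_fft)
    fold[0] = 1.0
    fold[1 : n_fft // 2] = 2.0
    fold[n_fft // 2] = 1.0
    H_min = np.exp(np.fft.fft(cepstrum * fold))
    h_min = np.fft.ifft(H_min).real
    return h_min, H / H_min
```

For instance, the maximum-phase filter `[1, -2]` maps to the minimum-phase filter `[2, -1]`, which has the same magnitude response but its zero inside the unit circle.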
Saturday, November 6, 4:00 pm — 5:30 pm (Room 133)
Broadcast and Media Streaming: B13 - Stream Formats for Content Delivery Networks
Ray Archie, CBS
Benny Fischer, Limelight
Andy Jones, Stream Guys
Andrew Snook, StreamOn
Sam Sousa, Stream the World
The Streaming Formats for CDNs panel examines the relationship between distribution and encoding methodologies. Licensing, error correction, quality vs. compression, and consumer adoption are just a few of the variables this all-star panel will discuss. We hope to shed light on the future of scalable and reliable digital distribution.
Saturday, November 6, 5:30 pm — 6:30 pm (Room 120)
Broadcast and Media Streaming: B14 - Careers in Broadcasting
Chriss Scherer, CPBE CBNT; Editor, Radio magazine; Past President, Society of Broadcast Engineers
William Blum, Station Engineer, KBLX-FM
Russell Brown, Chief Engineer, KMTP-TV
Steve Lampen, Multimedia Technology Manager and Product Line Manager, Belden
Kimberly Sacks, Contract Engineer
As technology has evolved, pro audio and broadcasting seem to have diverged. But the skills you use in a pro audio career are likely applicable to a career in broadcasting, too. The moderator and panelists each have experience in pro audio and broadcasting, and they will share their career insights to show that the two industries have a great deal in common. A Q&A will also be held to clarify the bridge between the two industries. Students and professionals are encouraged to attend.
Sunday, November 7, 9:00 am — 10:45 am (Room 130)
Tutorial: T12 - Loudspeakers and Headphones - Diagnostics of Sound Radiation
Wolfgang Klippel, Klippel GmbH
Distributed mechanical parameters describe the vibration and geometry of the sound-radiating surface of loudspeaker drive units. This data is the basis for predicting the sound pressure output and for decomposing the total vibration into modal and sound-pressure-related components. This analysis separates acoustical from mechanical problems, shows the relationship to the geometry and material properties, and gives indications for practical improvement. The tutorial combines the theoretical background with practical loudspeaker diagnostics illustrated on various kinds of transducers, such as woofers, tweeters, compression drivers, microspeakers, and headphones.
Sunday, November 7, 9:00 am — 12:30 pm (Room 220)
Paper Session: P21 - Low-Bit-Rate Audio Coding
P21-1 Combination of Different Perceptual Models with Different Audio Transform Coding Schemes—Implementation and Evaluation—Armin Taghipour, Nicole Knölke, Bernd Edler, Jörn Ostermann, Leibniz Universität Hannover - Hannover, Germany
In this paper, four combinations of perceptual models and transform coding systems are implemented and compared. The first perceptual model is based on a DFT with uniform frequency resolution. The second model uses IIR filters designed in accordance with the temporal/spectral resolution of the auditory system. Both transform coding systems use a uniform spectral decomposition (MDCT). In the first system, the quantizers are controlled directly by the perceptual model; the second system uses a pre- and post-filter with frequency warping to shape the quantization noise with a temporal/spectral resolution better adapted to the auditory system. Implementation details are given and results of subjective tests are presented.
Convention Paper 8283 (Purchase now)
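Both coding systems in the abstract above rely on the MDCT. The sketch below uses the direct O(N²) definitions (not a fast algorithm) to show the property that makes the MDCT usable for coding: with 50% overlap and a sine window applied at analysis and synthesis, overlap-adding the inverse transforms cancels the time-domain aliasing and reconstructs the input. The function names and the 2/N inverse scaling are choices made for this illustration.

```python
import math


def mdct(frame):
    """MDCT of a 2N-sample frame, returning N coefficients (direct form)."""
    two_n = len(frame)
    n_half = two_n // 2
    n0 = 0.5 + n_half / 2
    return [
        sum(frame[n] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5)) for n in range(two_n))
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Inverse MDCT with 2/N scaling, returning 2N time-aliased samples."""
    n_half = len(coeffs)
    n0 = 0.5 + n_half / 2
    return [
        2.0 / n_half
        * sum(coeffs[k] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5)) for k in range(n_half))
        for n in range(2 * n_half)
    ]


def sine_window(two_n):
    # Satisfies the Princen-Bradley condition w[n]^2 + w[n + N]^2 = 1.
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def analysis_synthesis(x, n_half):
    """Windowed MDCT -> IMDCT with hop size N; returns the overlap-added output."""
    w = sine_window(2 * n_half)
    y = [0.0] * len(x)
    for start in range(0, len(x) - 2 * n_half + 1, n_half):
        frame = [x[start + n] * w[n] for n in range(2 * n_half)]
        out = imdct(mdct(frame))
        for n in range(2 * n_half):
            y[start + n] += out[n] * w[n]
    return y
```

Only samples covered by two overlapping frames are reconstructed exactly; the first and last N samples lack their partner frame.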
P21-2 Using Noise Substitution for Backwards-Compatible Audio Codec Improvement—Colin Raffel, Experimentalists Anonymous - Stanford, CA, USA
A method for representing error in perceptual audio coding as filtered noise is presented. Various techniques are compared for analyzing and re-synthesizing the noise representation. A focus is placed on improving the perceived audio quality with minimal data overhead. In particular, it is demonstrated that per-critical-band energy levels are sufficient to provide an increase in quality. Methods for including the coded error data in an audio file in a backwards-compatible manner are also discussed. The MP3 codec is treated as a case study, and an implementation of this method is presented.
Convention Paper 8284 (Purchase now)
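The abstract above reports that per-critical-band energy levels of the coding error are sufficient side information. A minimal sketch of that idea, assuming the error is available as spectral coefficients: the encoder measures error energy per band, and the decoder fills each band with random noise scaled to that energy. The band edges below are hypothetical and only roughly widen like critical bands; this is not the paper's implementation or its MP3 bitstream format.

```python
import math
import random


def band_energies(coeffs, edges):
    """Energy per band; edges[i]..edges[i+1] are coefficient indices (edges[0] == 0)."""
    return [sum(c * c for c in coeffs[lo:hi]) for lo, hi in zip(edges[:-1], edges[1:])]


def synthesize_noise(energies, edges, seed=0):
    """Gaussian noise whose energy in each band matches the transmitted energy."""
    rng = random.Random(seed)
    out = []
    for energy, lo, hi in zip(energies, edges[:-1], edges[1:]):
        noise = [rng.gauss(0.0, 1.0) for _ in range(hi - lo)]
        noise_energy = sum(v * v for v in noise) or 1.0
        scale = math.sqrt(energy / noise_energy)
        out.extend(v * scale for v in noise)
    return out


# Hypothetical band layout: narrow bands at low frequencies, wider ones above.
EDGES = [0, 2, 4, 8, 16]
```

Only one energy value per band needs to be transmitted, which is why the data overhead stays small compared with coding the error waveform itself.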
P21-3 An Introduction to AVS Lossless Audio Coding—Haiyan Shu, Haibin Huang, Ti-Eu Chan, Rongshan Yu, Susanto Rahardja, Institute for Infocomm Research, Agency for Science, Technology & Research - Singapore
Recently, the Audio Video coding Standard workgroup of China (AVS) issued a call for proposals for lossless audio coding. Several proposals were received, and the one from the Institute for Infocomm Research was selected as the Reference Model (RM). The RM is based on time-domain linear prediction and residual entropy coding. It introduces a novel residual pre-processing method for random-access data frames and a memory-efficient arithmetic coder with dynamic symbol probability generation. The performance of the RM is found to be comparable to that of MPEG-4 ALS and SLS. AVS lossless coding is expected to be finalized at the end of 2010 and will become the latest extension of the AVS-P3 audio coding standard.
Convention Paper 8285 (Purchase now)
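To illustrate the general structure "time-domain linear prediction plus residual entropy coding" named above, the sketch below substitutes two simple, well-known stand-ins: a fixed second-order polynomial predictor instead of an adaptive LPC, and Rice codes instead of the RM's arithmetic coder. Neither is the AVS RM's actual method. Because the predictor uses only integer arithmetic, the decoder can undo it exactly, which is what makes the scheme lossless.

```python
def residual_order2(x):
    """Residual of the fixed predictor x_hat[n] = 2*x[n-1] - x[n-2] (integers)."""
    res = list(x[:2])  # warm-up samples are passed through unchanged
    for n in range(2, len(x)):
        res.append(x[n] - (2 * x[n - 1] - x[n - 2]))
    return res


def reconstruct_order2(res):
    """Exact inverse of residual_order2."""
    x = list(res[:2])
    for n in range(2, len(res)):
        x.append(res[n] + 2 * x[n - 1] - x[n - 2])
    return x


def zigzag(v):
    """Map signed integers to non-negative ones: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * v if v >= 0 else -2 * v - 1


def rice_bits(values, k):
    """Total Rice-code length: unary quotient, stop bit, then k remainder bits."""
    return sum((zigzag(v) >> k) + 1 + k for v in values)
```

A well-predicted signal leaves small residuals, so the coded length in `rice_bits` drops well below the length of the raw samples.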
P21-4 Audio Re-Synthesis Based on Waveform Lookup Tables—Sebastian Heise, Michael Hlatky, Accessive Tools GmbH - Bremen, Germany; Jörn Loviscach, Hochschule Bielefeld, University of Applied Sciences - Bielefeld, Germany
Transmitting speech signals at optimum quality over a weak narrowband network requires audio codecs that must not only be robust to packet loss and operate at low latency, but also offer a very low bit rate and maintain the original sound of the coded signal. Advanced speech codecs for real-time communication based on code-excited linear prediction provide bit rates as low as 2 kbit/s. We propose a new coding approach that promises even lower bit rates through a synthesis approach based not on the source-filter model, but merely on a lookup table of audio waveform snippets and their corresponding Mel-Frequency Cepstral Coefficients (MFCC). The encoder performs a nearest-neighbor search for the MFCC features of each incoming audio frame against the lookup table. This process is greatly accelerated by building a multi-dimensional search tree of the MFCC features. In a speech coding application, only the index of the nearest neighbor in the lookup table would need to be transmitted for each audio frame. The decoder synthesizes the audio signal from the waveform snippets corresponding to the transmitted indices.
Convention Paper 8286 (Purchase now)
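The encode/decode loop described in the abstract above can be sketched as follows. This version uses a brute-force nearest-neighbor search in place of the multi-dimensional search tree the authors use, and it concatenates snippets without crossfading; both are simplifications. The feature vectors and snippets in the usage example are made-up placeholders rather than real MFCCs.

```python
def nearest_index(feature, table_features):
    """Index of the table entry closest to `feature` (squared Euclidean distance)."""
    best, best_dist = -1, float("inf")
    for i, ref in enumerate(table_features):
        dist = sum((a - b) ** 2 for a, b in zip(feature, ref))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


def encode(frame_features, table_features):
    """Encoder: one table index per frame is all that needs to be transmitted."""
    return [nearest_index(f, table_features) for f in frame_features]


def decode(indices, table_snippets):
    """Decoder: concatenate the stored waveform snippets for the received indices."""
    out = []
    for i in indices:
        out.extend(table_snippets[i])
    return out


# Placeholder table: 2-D "features" and 2-sample "snippets" (hypothetical values).
TABLE_FEATURES = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
TABLE_SNIPPETS = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

The bit rate per frame is then just log2 of the table size, independent of the frame's sample count.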
P21-5 A Low Bit Rate Mobile Audio High Frequency Reconstruction—Bo Hang, Ruimin Hu, Yuhong Yang, Ge Gao, Wuhan University - Wuhan, China
Current communication systems are expected to deliver high-quality audio at a low bit rate and with low computational complexity. To improve high-frequency band quality in such systems, this paper proposes a novel high-frequency bandwidth extension method for audio coding that improves decoded audio quality at the cost of only a few extra bits per frame and a small increase in computational complexity. The method calculates high-frequency synthesis filter parameters using a codebook mapping method and transmits quantized gain corrections for the high-frequency part in the multiplexed coded bitstream. Test results show that this method provides comparable audio quality with lower bit consumption and computational complexity than the high-frequency regeneration of AVS-P10.
Convention Paper 8287 (Purchase now)
P21-6 Perceptual Distortion-Rate Optimization of Long Term Prediction in MPEG AAC—Tejaswi Nanjundaswamy, Vinay Melkote, University of California, Santa Barbara - Santa Barbara, CA, USA; Emmanuel Ravelli, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Kenneth Rose, University of California, Santa Barbara - Santa Barbara, CA, USA
Long Term Prediction (LTP) in MPEG Advanced Audio Coding (AAC) exploits inter-frame redundancies via predictive coding of the current frame, given previously reconstructed data. In particular, AAC Low Delay mandates LTP to exploit correlations that would otherwise be ignored due to the shorter frame size. The LTP parameters are typically selected by time-domain techniques aimed at minimizing the mean squared prediction error, which is mismatched with the ultimate perceptual criteria of audio coding. We therefore propose a novel trellis-based approach that optimizes the LTP parameters, in conjunction with the quantization and coding parameters of the frame, explicitly in terms of the perceptual distortion and rate tradeoffs. A low-complexity "two-loop" search alternative to the trellis is also proposed. Objective and subjective results provide evidence of substantial gains.
Convention Paper 8288 (Purchase now)
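For context, the conventional time-domain selection that the abstract above argues is mismatched with perceptual criteria can be sketched as follows: for each candidate lag, the optimal gain is the normalized cross-correlation, and the lag with the smallest residual mean squared error wins. This is the MSE baseline, not the authors' trellis method. The restriction to lags at least one frame long, so that only fully reconstructed past samples are used, is a simplification made here.

```python
def ltp_search(history, frame, min_lag, max_lag):
    """Return (lag, gain, mse) minimizing the mean squared prediction error.

    history: previously reconstructed samples, most recent last.
    Lags shorter than the frame are skipped (simplification: prediction
    uses only samples that are already fully reconstructed).
    """
    n = len(frame)
    energy = sum(v * v for v in frame)
    best = None  # (lag, gain, mse)
    for lag in range(max(min_lag, n), max_lag + 1):
        start = len(history) - lag
        if start < 0:
            break
        pred = history[start : start + n]
        cross = sum(a * b for a, b in zip(frame, pred))
        pred_energy = sum(b * b for b in pred)
        # Least-squares gain; the residual energy is then energy - cross^2 / pred_energy.
        gain = cross / pred_energy if pred_energy > 0 else 0.0
        mse = (energy - gain * cross) / n
        if best is None or mse < best[2]:
            best = (lag, gain, mse)
    return best
```

For a signal that repeats exactly every five samples, the search returns lag 5 with unit gain and zero error; a perceptually driven search, as proposed in the paper, may pick different parameters once quantization and rate are taken into account.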
P21-7 Stereo Audio Coding Improved by Phase Parameters—Miyoung Kim, Eunmi Oh, Hwan Shim, SAIT, Samsung Electronics Co. Ltd. - Gyeonggi-do, Korea
Parametric stereo coding that exploits phase parameters in a bit-efficient way is part of the MPEG-D USAC (Unified Speech and Audio Coding) standard. This paper describes a downmixing and upmixing scheme that further enhances stereo coding of strongly or nearly out-of-phase signals. The conventional downmix for parametric stereo coding, a sum of the left and right channels, suffers from phase cancellation in out-of-phase signals, which results in audible artifacts. This paper proposes phase alignment using the estimated overall phase difference (OPD) and inter-channel phase difference (IPD) parameters. Furthermore, it describes a phase modification that minimizes phase discontinuities in the downmixed signal by scaling the size of the stereo channels.
Convention Paper 8289 (Purchase now)
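The cancellation problem and the idea of IPD-based alignment described above can be illustrated on complex subband samples. The sketch estimates the IPD of one band from the cross-spectrum and rotates the right channel into phase with the left before summing. It omits the OPD parameter and the scaling-based phase modification the paper also describes, and the function names are illustrative.

```python
import cmath


def ipd(left_band, right_band):
    """Inter-channel phase difference of one band, from the summed cross-spectrum."""
    cross = sum(l * r.conjugate() for l, r in zip(left_band, right_band))
    return cmath.phase(cross)


def downmix_sum(left_band, right_band):
    """Conventional downmix: cancels when the channels are out of phase."""
    return [(l + r) / 2 for l, r in zip(left_band, right_band)]


def downmix_phase_aligned(left_band, right_band):
    """Rotate the right channel by the band's IPD before summing."""
    rot = cmath.exp(1j * ipd(left_band, right_band))
    return [(l + r * rot) / 2 for l, r in zip(left_band, right_band)]
```

For a fully out-of-phase band, the plain sum is zero, while the aligned downmix preserves the signal.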
Sunday, November 7, 9:00 am — 10:00 am (Room 236)
Paper Session: P22 - Enhancement of Audio Reproduction
Richard Foss, Rhodes University - Grahamstown, South Africa
P22-1 Enhancing Stereo Audio with Remix Capability—Hyen-O Oh, LG Electronics Inc. - Seoul, Korea, Yonsei University, Seoul, Korea; Yang-Won Jung, LG Electronics Inc. - Seoul, Korea; Alexis Favrot, Illusonic LLC - Lausanne, Switzerland; Christof Faller, Illusonic LLC - Lausanne, Switzerland, EPFL, Lausanne, Switzerland
Many audio appliances feature capabilities for modifying audio signals, such as equalization, acoustic room effects, etc. However, these modification capabilities are always limited in the sense that they apply to the audio signal as a whole and not to a specific "audio object." We propose a scheme that enables modification of the stereo panning and gain of specific objects inherent in a stereo signal. This capability is enabled (possibly in a stereo backwards-compatible way) by adding a few kilobits of side information to the stereo signal. Generating the side information requires the signals of the objects to be modified in the stereo signal.
Convention Paper 8290 (Purchase now)
P22-2 Automatically Optimizing Situation Awareness and Sound Quality for a Sound Isolating Earphone—John Usher, Hearium Labs - San Francisco, CA, USA
Sound isolating (SI) earphones are increasingly used by the general public with portable media players in noisy urban and transport environments. The dangers of SI earphones are becoming increasingly apparent, and legislators are recommending an urgent review of their usage. The problem is that the user is removed from the local ambient scene: a reduction in “situation awareness” that often leads to accidents involving unheard oncoming vehicles. This paper introduces a new automatic gain control system that automatically mixes the ambient sound field with the reproduced audio material. The audio system architecture is discussed, and an analysis of 20 different warning sounds is used to suggest suitable parameters.
Convention Paper 8291 (Purchase now)
Sunday, November 7, 9:30 am — 10:45 am (Room 133)
Broadcast and Media Streaming: B15 - Gating Methods and the New Loudness Recommendation EBU R 128
Florian Camerer, ORF - Austrian TV; chairman of EBU group PLOUD
Steve Lyman, Dolby Labs
One of the most fundamental changes in the history of audio in broadcasting is underway: the change of the leveling paradigm from peak normalization to loudness normalization. This session presents two aspects of loudness normalization. The first deals with the evaluation of different gating methods intended to further improve the match between the objective measurement and the subjective impression of loudness.
Dolby Labs recently began building a database of wide and narrow dynamic range program samples and evaluating their subjective loudness. The loudness was assessed using the same method that was used by the ITU to develop Recommendation ITU-R BS.1770. The object of the work is to be able to evaluate the effectiveness of any proposed gated or un-gated loudness measurement method on wide and narrow dynamic range program material. Results of the current studies will be presented by Steve Lyman from Dolby Labs.
In the second presentation the core document of the EBU working group PLOUD will be introduced in detail: EBU R 128 "Loudness normalization and permitted maximum level of audio signals." Florian Camerer, the chairman of PLOUD, will explain this groundbreaking recommendation as well as the Technical Documents about Loudness Metering and the descriptor Loudness Range. The documents will also be examined with a special focus on their practical implications and consequences. Audio examples will illustrate the concept of loudness normalization.
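As background for the gating discussion above, the sketch below shows the two-stage gating used for integrated loudness in later revisions of ITU-R BS.1770 (and in EBU R 128 measurements): 400 ms blocks with 75% overlap are measured, blocks below an absolute gate of −70 LKFS are discarded, then blocks more than a relative offset below the loudness of the remaining blocks are discarded. The K-weighting filter and block segmentation are omitted here: the input is assumed to be per-block, per-channel mean-square values of already K-weighted audio. The relative-gate offset is a parameter; −10 LU is the value in later BS.1770 revisions.

```python
import math

ABS_GATE_LKFS = -70.0


def block_loudness(channel_ms, weights):
    """BS.1770 loudness of one block from per-channel K-weighted mean squares."""
    total = sum(g * z for g, z in zip(weights, channel_ms))
    return -0.691 + 10.0 * math.log10(total) if total > 0 else float("-inf")


def integrated_loudness(blocks, weights, rel_gate_lu=-10.0):
    """Gated integrated loudness.

    blocks:  list of per-channel mean-square values, one list per 400 ms block
    weights: channel weights G_i (1.0 for L/R/C, 1.41 for surrounds; LFE excluded)
    """

    def loudness_of(selected):
        n = len(selected)
        mean_ms = [sum(b[i] for b in selected) / n for i in range(len(weights))]
        return block_loudness(mean_ms, weights)

    # Stage 1: absolute gate removes silence and near-silence.
    stage1 = [b for b in blocks if block_loudness(b, weights) > ABS_GATE_LKFS]
    if not stage1:
        return float("-inf")
    # Stage 2: relative gate, measured from the loudness of the stage-1 blocks.
    rel_threshold = loudness_of(stage1) + rel_gate_lu
    stage2 = [b for b in stage1 if block_loudness(b, weights) > rel_threshold]
    return loudness_of(stage2)
```

In this sketch, a programme with two blocks at −20 LKFS, one quiet block at −40 LKFS, and one silent block measures −20 LKFS: the absolute gate removes the silent block and the relative gate removes the quiet one, so pauses and quiet passages do not drag the measured loudness down.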
Sunday, November 7, 10:15 am — 11:15 am (Room 236)
Paper Session: P24 - Audio Transmission
Richard Foss, Rhodes University - Grahamstown, South Africa
P24-1 Parameter Relationships in High-Speed Audio Networks—Nyasha Chigwamba, Richard Foss, Rhodes University - Grahamstown, South Africa; Robby Gurdan, Brad Klindradt, Universal Media Access Networks GmbH - Dusseldorf, Germany
There exists a need to remotely control and monitor parameters within audio devices. It is often necessary for changes in one parameter to affect other parameters. Thus, it is important to create relationships between parameters. The capability for relationships has existed for some time between the parameters within mixing consoles. This paper explores the parameter relationships within mixing consoles, the parameter relationships in current audio networks, and then goes on to propose some fundamental relationships that should exist between parameters. It describes how these relationships have been implemented within the X170 protocol.
Convention Paper 8301 (Purchase now)
P24-2 Experiment of Sixteen-Channel Audio Transmission Over IP Network by MPEG-4 ALS and Audio Rate-Oriented Adaptive Bit-Rate Video Codec—Yutaka Kamamoto, Noboru Harada, NTT Communication Science Labs. - Atsugi, Kanagawa, Japan, NTT Network Innovation Laboratories, Yokosuka, Kanagawa, Japan; Takehiro Moriya, NTT Communication Science Labs. - Atsugi, Kanagawa, Japan; Sunyong Kim, Masanori Ogawara, Tatsuya Fujii, NTT Network Innovation Laboratories - Yokosuka, Kanagawa, Japan
This paper describes an experiment in lossless audio transmission over an IP network and introduces a prototype codec that combines lossless audio coding and variable-bit-rate video coding. In the experiment, 16-channel acoustic signals compressed by MPEG-4 ALS were transmitted from a live venue to a cafe via the IP network to provide high-quality music. At the cafe, the received sound data were decoded losslessly and remixed to suit the environment at that location. The combination of high-definition video and audio data enables fans to enjoy a musical performance at places other than the live venue at the same time. This experiment motivates us to develop a codec that guarantees audio quality.
Convention Paper 8302 (Purchase now)
Sunday, November 7, 11:00 am — 12:30 pm (Room 133)
Broadcast and Media Streaming: B16 - Audio Performance in Streaming
David Prentice, Dale Pro Audio
J. Todd Baker, SRS Labs
Alex Kosiorek, Cleveland Institute of Music
Jan Nordmann, Fraunhofer
“This program is available over the air, on your computer, or on your mobile device.”
It’s a simple sentence, repeated by broadcasters all over the country. Like many simple sentences, it raises more questions than it answers. Battling bandwidth restrictions and with playback monitors ranging from full-range systems to ear buds, streaming presents a challenging environment for delivering high-quality audio. With the obligation to deliver programs via streaming media, how does a broadcaster maintain the highest audio quality throughout the delivery chain, and how does a broadcast engineer evaluate the audio quality to maximize their program’s audio impact? Are there accepted best practices and is anyone creating regulations or standards for program evaluation? Our panel will address practices, standards, and discuss new delivery formats in a lively presentation.
Sunday, November 7, 2:30 pm — 5:00 pm (Room 236)
Paper Session: P27 - Room Acoustics
P27-1 First Results from a Large-Scale Measurement Program for Home Theaters—Tomlinson Holman, Ryan Green, University of Southern California - Los Angeles, CA, USA, Audyssey Laboratories, Los Angeles, CA, USA
The introduction of one auto-equalization system to the home theater market, with an accompanying reporting infrastructure, provides methods of data collection that allow research into many practical system installations. Among the results delivered are histograms of room volume, reverberation time vs. volume and frequency, early-arrival sound frequency response both equalized and unequalized, and steady-state frequency response both equalized and unequalized. The variation in response over the listening area is studied as well and sheds light on contemporary use of the Schroeder frequency.
Convention Paper 8310 (Purchase now)
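The Schroeder frequency mentioned above marks the approximate transition between the low-frequency region, where individual room modes dominate, and the higher region, where modes overlap densely enough for statistical treatment. The usual estimate is f_S ≈ 2000·√(T60 / V), with T60 in seconds and V in cubic meters. The room values in the example below are illustrative, not results from the paper.

```python
import math


def schroeder_frequency(rt60_s, volume_m3):
    """Schroeder frequency estimate in Hz: 2000 * sqrt(T60 / V)."""
    return 2000.0 * math.sqrt(rt60_s / volume_m3)


# Illustrative small home theater: 60 m^3 with a 0.4 s reverberation time.
example_hz = schroeder_frequency(0.4, 60.0)  # about 163 Hz
```

Because small rooms have both a small volume and a short reverberation time, their Schroeder frequency typically sits well above the subwoofer range, which is why low-frequency response varies so much across a listening area.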
P27-2 Improving the Assessment of Low Frequency Room Acoustics Using Descriptive Analysis—Matthew Wankling, Bruno Fazenda, William J. Davies, University of Salford - Salford, Greater Manchester, UK
Several factors contribute to the perceived quality of reproduced low-frequency audio in small rooms. Listeners often use descriptive terms such as “boomy” or “resonant.” However, a robust terminology for rating samples during listening tests does not currently exist. This paper reports on a procedure to develop such a set of subjective descriptors for low-frequency reproduced sound, using descriptive analysis. The resulting descriptors are Articulation, Resonance, and Bass Content. These terms have been used in listening tests to measure the subjective effect of changing three objective room parameters: modal decay time, room volume, and source/receiver position. Reducing decay time increased Articulation, while increased preference is associated with increased Articulation and decreased Resonance.
Convention Paper 8311 (Purchase now)
P27-3 Subjective Preference of Modal Control Methods in Listening Rooms—Bruno M. Fazenda, Lucy A. Elmer, Matthew Wankling, J. A. Hargreaves, J. M. Hirst, University of Salford - Greater Manchester, UK
Room modes are well known to cause unwanted effects in the correct reproduction of low frequencies in critical listening rooms. Methods to control these problems range from simple loudspeaker/listener positioning to quite complex digital signal processing. Nonetheless, the subjective importance and impact of these methods have rarely been quantified. A number of simple control methods have been implemented in an IEC standard listening environment. Eight different configurations were set up in the room simultaneously and could therefore be compared directly with each other. A panel of 20 listeners was asked to state their preferred configuration using the method of paired comparisons. Results show clear winners and losers, pointing to an informed strategy for efficient control.
Convention Paper 8312 (Purchase now)
P27-4 Wide-Area Psychoacoustic Correction for Problematic Room Modes Using Non-Linear Bass Synthesis—Adam J. Hill, Malcolm O. J. Hawksford, University of Essex - Colchester, UK
Small room acoustics are characterized by a limited number of dominant low-frequency room modes that result in wide spatio-pressure variations that traditional room correction systems find elusive to correct over a broad listening area. A psychoacoustic-based methodology is proposed whereby signal components coincident only with problematic modes are filtered and substituted by virtual bass components to forge an illusion of the suppressed frequencies. A scalable and hierarchical approach is studied using the Chameleon Subwoofer Array (CSA), and subjective evaluation confirms a uniform large-area performance. Bass synthesis exploits parallel nonlinear and phase vocoder generators with outputs blended as a function of transient and steady-state signal content.
Convention Paper 8313 (Purchase now)
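Virtual bass relies on the "missing fundamental" effect: harmonics of a low frequency can evoke its pitch even when the fundamental itself is removed. The abstract above mentions nonlinear generators; the sketch below uses a half-wave rectifier, one common nonlinearity chosen here as an assumption, and measures single DFT bins to show that it creates a second harmonic that the input sine lacks. A practical system would also filter the bass band before the nonlinearity, keep only the useful harmonics, and blend them with the main signal, none of which is shown.

```python
import cmath
import math


def half_wave(x):
    """Half-wave rectifier: a simple nonlinearity that generates harmonics."""
    return [max(v, 0.0) for v in x]


def dft_bin(x, k):
    """Magnitude of DFT bin k, normalized by the length of x."""
    n_total = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * n / n_total) for n, v in enumerate(x))) / n_total


# A sine exactly at bin 4 of a 64-point frame (one period every 16 samples).
SINE = [math.sin(2 * math.pi * 4 * n / 64) for n in range(64)]
```

For this input, the rectified signal gains energy at bin 8 (the second harmonic) and none at bin 12, because a half-wave rectified sine contains only the fundamental plus even harmonics.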
P27-5 Beyond Coding: Reproduction of Direct and Diffuse Sounds in Multiple Environments—James D. Johnston, DTS Inc. - Kirkland, WA, USA; Jean-Marc Jot, DTS Inc. - Scotts Valley, CA, USA; Zoran Fejzo, DTS Inc. - Calabasas, CA, USA; Steve R. Hastings, DTS Inc. - Scotts Valley, CA, USA
For many years, the difference in perception between perceptually direct sounds (i.e., sounds with a specific direction) and perceptually diffuse sounds (i.e., sounds that "surround" or "envelop" the listener) has been recognized, leading to a variety of approaches for simulating or capturing these perceptual effects. Here, we discuss a system that separates direct and diffuse signals or, for synthetic signals (e.g., those made by modern production methods), synthesizes the diffuse signal in one of several ways, so that the reproduction system, after measuring the characteristics of the playback system, can provide the best possible sensation from that particular set of playback equipment.
Convention Paper 8314 (Purchase now)
Sunday, November 7, 4:30 pm — 6:00 pm (Room 131)
Live Sound Seminar: LS11 - Networked Audio for Live Sound
Jonathan Novick, Audio Precision
Carl Bader, Aviom
Kevin Gross, AVA Networks
Lee Minich, Lab X Technologies
David Scheirman, JBL Professional
Steve Seable, Yamaha Corporation
Are audio networks the panacea we all hoped for or the peril we all fear? When it comes to live sound, networks offer plenty of advantages. However, there are also tradeoffs. No two networks are alike, and each offers unique benefits. Should you jump in now or wait for more standardization? Figure out whether networking makes sense for your live business.
Sunday, November 7, 4:30 pm — 6:00 pm (Room 130)
Workshop: W20 - Return to Quality in Audio Production
Andres Mayo, Andres Mayo Mastering - Buenos Aires, Argentina
Ronald Prent, Galaxy Studios - Mol, Belgium
Francisco Miranda, Engineer/Studio Owner - Mexico City, Mexico
Dave Reitzas, Mixer/Producer - Los Angeles, CA, USA
Jeff Wolpert, Producer/Educator - Toronto, Ontario, Canada
We are witnessing a return to the search for better quality in current audio productions, with engineers and producers more concerned about long-lasting recordings instead of just thinking about MP3 and Internet delivery. In Los Angeles, London, and Mexico City (just to name a few), great-sounding studios have recently opened, and established ones are regaining clientele thanks to new and improved recording and mastering systems. This panel will discuss the paradigm shift that is positively affecting industry professionals throughout the globe.
Sunday, November 7, 4:45 pm — 5:45 pm (Room 133)
Tutorial: T13 - Comparative Listening: What Can We Really Hear?
Learn objective comparative listening techniques while participating in a series of experiments that will dispel or confirm extraordinary claims made by equipment manufacturers and industry professionals.
Throughout his 20+ years as a record maker, Eric Valentine has continually heard extraordinary claims about audible performance differences among a huge variety of tools and products used in the industry. When he asked, he frequently found that people were making claims based on listening experiences that were neither scientific nor objective. As the industry continues to push for better and better performance from the equipment we use, in many cases (digital converters, external clocking devices, cables, or even mic pres) the differences have become very minute and are impossible to evaluate in a casual way. Opinions derived from these casual listening tests can motivate purchases that involve many thousands of dollars. Valentine will explain how to apply the traditional scientific method to listening tests, how psychological influences play a role, and what it all means when choosing the tools and methods for record making. The goal of this tutorial is for all attendees to leave with techniques and information that help them make confident, objective decisions when choosing equipment to buy or use, while participating in a fun, interactive series of listening experiments.