AES Show: Make the Right Connections - Audio Engineering Society

AES San Francisco 2008
Consumer Products and Applications Event Details

Thursday, October 2, 9:00 am — 10:15 am

SDA Meeting – 1

Jose Leonardo Pupo, Chair
Teri Grossheim, Vice Chair

This opening meeting of the Student Delegate Assembly will introduce new events and election proceedings, announce candidates for the coming year’s election for the North/Latin America Regions, announce the finalists in the recording competition categories, hand out the judges’ sheets to the nonfinalists, and announce any upcoming events of the convention. Students and student sections will be given the opportunity to introduce themselves and their past/upcoming activities. In addition, candidates for the SDA election will be invited to the stage to give a brief speech outlining their platform.

All students and educators are invited and encouraged to participate in this meeting. There will also be an opportunity at this time to sign up for the mentoring sessions, a popular activity with limited space for participation.

Thursday, October 2, 9:00 am — 10:45 am

T1 - Electroacoustic Measurements

Christopher J. Struck, CJS Labs - San Francisco, CA, USA

This tutorial focuses on applications of electroacoustic measurement methods, instrumentation, and data interpretation as well as practical information on how to perform appropriate tests. Linear system analysis and alternative measurement methods are examined. The topic of simulated free field measurements is treated in detail. Nonlinearity and distortion measurements and causes are described. Last, a number of advanced tests are introduced.

This tutorial is intended to enable the participants to perform accurate audio and electroacoustic tests and provide them with the necessary tools to understand and correctly interpret the results.
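
As a small illustration of the distortion measurements this tutorial covers, total harmonic distortion (THD) can be computed from DFT bin amplitudes at the fundamental and its harmonics. The sketch below is illustrative only (function names and the test signal are assumptions, not material from the tutorial) and assumes an integer number of cycles per analysis block:

```python
import cmath, math

def tone(freq_cycles, amp, n_samples):
    """A sampled sinusoid with an integer number of cycles per block."""
    return [amp * math.sin(2 * math.pi * freq_cycles * n / n_samples)
            for n in range(n_samples)]

def bin_amplitude(x, k):
    """Amplitude of DFT bin k (k cycles per block) of a real signal."""
    n = len(x)
    s = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
    return 2 * abs(s) / n

def thd(x, fundamental_bin, n_harmonics=5):
    """THD: rms sum of harmonics 2..n relative to the fundamental."""
    fund = bin_amplitude(x, fundamental_bin)
    harm = math.sqrt(sum(bin_amplitude(x, k * fundamental_bin) ** 2
                         for k in range(2, n_harmonics + 1)))
    return harm / fund

# A fundamental plus a -40 dB (1%) third harmonic gives THD of about 0.01
N = 1024
x = [a + b for a, b in zip(tone(8, 1.0, N), tone(24, 0.01, N))]
print(round(thd(x, 8), 4))  # 0.01
```

In practice a measurement system would window the signal and sweep the fundamental; the principle of comparing harmonic energy to the fundamental is the same.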

Thursday, October 2, 9:00 am — 10:45 am

B1 - Listening Tests on Existing and New HDTV Surround Coding Systems

Gerhard Stoll, IRT
Florian Camerer, ORF
Kimio Hamasaki, NHK Science & Technical Research Laboratories
Steve Lyman, Dolby Laboratories
Andrew Mason, BBC R&D
Bosse Ternström, SR

With the advent of HDTV services, the public is increasingly being exposed to surround sound presentations in so-called home theater environments. However, the restricted bandwidth available into the home, whether by broadcast or via broadband, means that there is increasing interest in the performance of low bit-rate surround sound audio coding systems for “emission” coding. The European Broadcasting Union Project Group D/MAE (Multichannel Audio Evaluations) conducted extensive listening tests to assess the sound quality of multichannel audio codecs for broadcast applications at bit rates from 64 kbit/s to 1.5 Mbit/s. Several laboratories in Europe contributed to this work.

This Broadcast Session will present detailed information about these tests and their results. It will also describe how the professional industry, i.e., codec proponents and decoder manufacturers, is taking further steps to develop new products for multichannel sound in HDTV.

Thursday, October 2, 9:00 am — 12:30 pm

P1 - Audio Coding

Chair: Marina Bosi, Stanford University - Stanford, CA, USA

P1-1 A Parametric Instrument Codec for Very Low Bit Rates
Mirko Arnold, Gerald Schuller, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany
A technique for the compression of guitar signals is presented that utilizes a simple model of the guitar. The goal for the codec is to obtain acceptable quality at significantly lower bit rates than universal audio codecs. This instrument codec achieves its data compression by transmitting an excitation function and model parameters to the receiver instead of the waveform. The parameters are extracted from the signal using weighted least squares approximation in the frequency domain. For evaluation, a listening test was conducted, and the results are presented. They show that this compression technique provides a quality level comparable to recent universal audio codecs. At this stage, however, the application is limited to very simple guitar melody lines. [This paper is being presented by Gerald Schuller.]
Convention Paper 7501 (Purchase now)

P1-2 Stereo ACC Real-Time Audio Communication
Anibal Ferreira, University of Porto - Porto, Portugal, and ATC Labs - Chatham, NJ, USA; Filipe Abreu, SEEGNAL Research - Portugal; Deepen Sinha, ATC Labs - Chatham, NJ, USA
Audio Communication Coder (ACC) is a codec that has been optimized for monophonic encoding of mixed speech/audio material while minimizing codec delay and improving intrinsic error robustness. In this paper we describe two major recent algorithmic improvements to ACC: on-the-fly bit rate switching and coding of stereo. A combination of source, parametric, and perceptual coding techniques allows very graceful switching between different bit rates with minimal impact on subjective quality. A real-time GUI demonstration platform is available that illustrates ACC operation from 16 kbit/s mono to 256 kbit/s stereo. A real-time two-way stereo communication platform over Bluetooth has been implemented that illustrates ACC's operational flexibility and robustness in error-prone environments.
Convention Paper 7502 (Purchase now)

P1-3 MPEG-4 Enhanced Low Delay AAC—A New Standard for High Quality Communication
Markus Schnell, Markus Schmidt, Manuel Jander, Tobias Albert, Ralf Geiger, Fraunhofer IIS - Erlangen, Germany; Vesa Ruoppila, Per Ekstrand, Dolby - Stockholm, Sweden, and Nuremberg, Germany; Bernhard Grill, Fraunhofer IIS - Erlangen, Germany
The MPEG Audio standardization group has recently concluded the standardization process for the MPEG-4 ER Enhanced Low Delay AAC (AAC-ELD) codec. This codec is a new member of the MPEG Advanced Audio Coding family. It represents the efficient combination of the AAC Low Delay codec and the Spectral Band Replication (SBR) technique known from HE-AAC. This paper provides a complete overview of the underlying technology, presents points of operation as well as applications, and discusses MPEG verification test results.
Convention Paper 7503 (Purchase now)

P1-4 Efficient Detection of Exact Redundancies in Audio Signals
José R. Zapata G., Universidad Pontificia Bolivariana - Medellín, Antioquia, Colombia; Ricardo A. Garcia, Kurzweil Music Systems - Waltham, MA, USA
An efficient method to identify bitwise identical long-time redundant segments in audio signals is presented. It uses audio segmentation with simple time domain features to identify long term candidates for similar segments, and low level sample accurate metrics for the final matching. Applications in compression (lossy and lossless) of music signals (monophonic and multichannel) are discussed.
Convention Paper 7504 (Purchase now)
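
The core idea of the abstract above, indexing blocks of samples and matching them bitwise, can be sketched in a few lines. This toy version uses a fixed block grid; the paper instead finds candidate segments with time-domain features before the sample-accurate match, and the names below are assumptions:

```python
def find_exact_repeats(samples, window):
    """Group window-sized blocks that recur bitwise-identically.

    Returns a list of index groups; each group holds the start offsets
    of blocks whose samples are exactly equal.
    """
    seen = {}
    for start in range(0, len(samples) - window + 1, window):
        key = tuple(samples[start:start + window])   # exact-match key
        seen.setdefault(key, []).append(start)
    return [starts for starts in seen.values() if len(starts) > 1]

riff = [3, 1, 4, 1, 5, 9, 2, 6]
track = riff + [2, 7, 1, 8, 2, 8, 1, 8] + riff   # the riff repeats at 0 and 16
print(find_exact_repeats(track, 8))  # [[0, 16]]
```

A lossless coder could then store one copy of each repeated block plus references, which is the compression application the paper discusses.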

P1-5 An Improved Distortion Measure for Audio Coding and a Corresponding Two-Layered Trellis Approach for its Optimization
Vinay Melkote, Kenneth Rose, University of California - Santa Barbara, CA, USA
The efficacy of rate-distortion optimization in audio coding is constrained by the quality of the distortion measure. The proposed approach is motivated by the observation that the Noise-to-Mask Ratio (NMR) measure, as it is widely used, is only well adapted to evaluate relative distortion of audio bands of equal width on the Bark scale. We propose a modification of the distortion measure to explicitly account for Bark bandwidth differences across audio coding bands. Substantial subjective gains are observed when this new measure is utilized instead of NMR in the Two Loop Search, for quantization and coding parameters of scalefactor bands in an AAC encoder. Comprehensive optimization of the new measure, over the entire audio file, is then performed using a two-layered trellis approach, and yields nearly artifact-free audio even at low bit-rates.
Convention Paper 7505 (Purchase now)
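
The proposed Bark-bandwidth weighting can be illustrated with a toy per-band NMR. The Zwicker-Terhardt Bark approximation is standard, but the weighting form below is an assumption for illustration and not necessarily the paper's exact formulation:

```python
import math

def bark_width(f_lo, f_hi):
    """Width of the band [f_lo, f_hi] Hz on the Bark scale
    (Zwicker & Terhardt approximation)."""
    bark = lambda f: 13 * math.atan(7.6e-4 * f) + 3.5 * math.atan((f / 7500.0) ** 2)
    return bark(f_hi) - bark(f_lo)

def weighted_nmr(noise_energy, mask_threshold, band_edges):
    """Average noise-to-mask ratio with each band weighted by its Bark width,
    so wide bands are not treated the same as narrow ones."""
    total, weight = 0.0, 0.0
    for e, m, (lo, hi) in zip(noise_energy, mask_threshold, band_edges):
        w = bark_width(lo, hi)
        total += w * (e / m)
        weight += w
    return total / weight

# When every band has the same NMR, the weighting leaves the value unchanged
print(round(weighted_nmr([1.0, 1.0], [2.0, 2.0], [(0, 500), (500, 2000)]), 3))  # 0.5
```

The effect of the weighting only appears when bands of different Bark widths carry different NMRs, which is exactly the case the paper targets.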

P1-6 Spatial Audio Scene Coding
Michael M. Goodwin, Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA
This paper provides an overview of a framework for generalized multichannel audio processing. In this Spatial Audio Scene Coding (SASC) framework, the central idea is to represent an input audio scene in a way that is independent of any assumed or intended reproduction format. This format-agnostic parameterization enables optimal reproduction over any given playback system as well as flexible scene modification. The signal analysis and synthesis tools needed for SASC are described, including a presentation of new approaches for multichannel primary-ambient decomposition. Applications of SASC to spatial audio coding, upmix, phase-amplitude matrix decoding, multichannel format conversion, and binaural reproduction are discussed.
Convention Paper 7507 (Purchase now)

P1-7 Microphone Front-Ends for Spatial Audio Coders
Christof Faller, Illusonic LLC - Lausanne, Switzerland
Spatial audio coders, such as MPEG Surround, have enabled low bit-rate and stereo backwards compatible coding of multichannel surround audio. Directional audio coding (DirAC) can be viewed as spatial audio coding designed around specific microphone front-ends. DirAC is based on B-format spatial sound analysis and has no direct stereo backwards compatibility. We are presenting a number of two capsule-based stereo compatible microphone front-ends and corresponding spatial audio encoder modifications that enable the use of spatial audio coders to directly capture and code surround sound.
Convention Paper 7508 (Purchase now)

Thursday, October 2, 9:00 am — 12:30 pm

P2 - Analysis and Synthesis of Sound

Chair: Hiroko Terasawa, Stanford University - Stanford, CA, USA

P2-1 Spatialized Additive Synthesis of Environmental Sounds
Charles Verron, Orange Labs - Lannion, France, and Laboratoire de Mécanique et d’Acoustique - Marseille, France; Mitsuko Aramaki, Institut de Neurosciences Cognitives de la Méditerranée - Marseille, France; Richard Kronland-Martinet, Laboratoire de Mécanique et d’Acoustique - Marseille, France; Grégory Pallone, Orange Labs - Lannion, France
In virtual auditory environments, sound sources are typically created in two stages: the “dry” monophonic signal is synthesized, and then the spatial attributes (such as source directivity, width, and position) are applied by specific signal processing algorithms. In this paper we present an architecture that combines additive sound synthesis and 3-D positional audio at the same level of sound generation. Our algorithm is based on inverse fast Fourier transform synthesis and amplitude-based sound positioning. It allows efficient synthesis and spatialization of sinusoids and colored noise, to simulate point-like and extended sound sources. The audio rendering can be adapted to any reproduction system (headphones, stereo, 5.1, etc.). Possibilities offered by the algorithm are illustrated with environmental sounds.
Convention Paper 7509 (Purchase now)
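
Amplitude-based positioning at the synthesis stage can be sketched with a constant-power pan law applied per partial. This is a simplified stereo stand-in for the paper's IFFT-based multichannel engine; the function names and angle convention are assumptions:

```python
import math

def pan_gains(azimuth_deg):
    """Constant-power stereo gains for a source between -45 (left) and +45 (right) degrees."""
    p = math.radians(azimuth_deg + 45.0)   # map [-45, +45] degrees onto [0, pi/2]
    return math.cos(p), math.sin(p)

def spatialized_partial(freq, amp, azimuth_deg, sr=48000, n=64):
    """Render one sinusoidal partial directly into positioned stereo channels,
    merging synthesis and spatialization in a single step."""
    gl, gr = pan_gains(azimuth_deg)
    mono = [amp * math.sin(2 * math.pi * freq * i / sr) for i in range(n)]
    return [gl * s for s in mono], [gr * s for s in mono]

gl, gr = pan_gains(0.0)            # a centered source
print(round(gl, 3), round(gr, 3))  # 0.707 0.707
```

Summing many such partials, each with its own gains, yields a positioned additive scene; extended sources can be simulated by spreading partials over a range of azimuths.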

P2-2 Harmonic Sinusoidal + Noise Modeling of Audio Based on Multiple F0 Estimation
Maciej Bartkowiak, Tomasz Zernicki, Poznan University of Technology - Poznan, Poland
This paper deals with the detection and tracking of multiple harmonic series. We consider a bootstrap approach based on prior estimation of F0 candidates and subsequent iterative adjustment of a harmonic sieve with simultaneous refinement of the F0 and inharmonicity factor. Experiments show that this simple approach is an interesting alternative to popular strategies, where partials are detected without harmonic constraints, and harmonic series are resolved from mixed sets afterwards. The most important advantage is that common problems of tonal/noise energy confusion in case of unconstrained peak detection are avoided. Moreover, we employ a popular LP-based tracking method that is generalized to dealing with harmonically related groups of partials by using a vector inner product as the prediction error measure. Two alternative extensions of the harmonic model are also proposed in the paper that result in greater naturalness of the reconstructed audio: an individual frequency deviation component and a complex narrowband individual amplitude envelope.
Convention Paper 7510 (Purchase now)

P2-3 Sound Extraction of Delackered Records
Ottar Johnsen, Frédéric Bapst, Ecole d'ingenieurs et d'architectes de Fribourg - Fribourg, Switzerland; Lionel Seydoux, Connectis AG - Berne, Switzerland
Most direct-cut records are made of an aluminum or glass plate coated with acetate lacquer. Such records are often cracked due to shrinkage of the coating, making it impossible to read them mechanically. We present here a technique to reconstruct the sound from such records by scanning an image of the record and combining the sound from the different parts of the "puzzle." The system has been tested by extracting sounds from sound archives in Switzerland and Austria. The concepts will be presented, as well as the main challenges. Extracted sound samples will be played.
Convention Paper 7511 (Purchase now)

P2-4 Parametric Interpolation of Gaps in Audio Signals
Alexey Lukin, Moscow State University - Moscow, Russia; Jeremy Todd, iZotope, Inc. - Cambridge, MA, USA
The problem of interpolation of gaps in audio signals is important for the restoration of degraded recordings. Following the parametric approach over a sinusoidal model recently suggested in JAES by Lagrange et al., this paper proposes an extension to this interpolation algorithm by considering interpolation of a noisy component in a “sinusoidal + noise” signal model. Additionally, a new interpolator for sinusoidal components is presented and evaluated. The new interpolation algorithm becomes suitable for a wider range of audio recordings than just interpolation of a sinusoidal signal component.
Convention Paper 7512 (Purchase now)

P2-5 Classification of Musical Genres Using Audio Waveform Descriptors in MPEG-7
Nermin Osmanovic, Microsoft Corporation - Seattle, WA, USA
Automated genre classification makes it possible to determine the musical genre of an incoming audio waveform. One application of this is to help listeners find music they like more quickly among millions of tracks in an online music store. By using numerical thresholds and the MPEG-7 descriptors, a computer can analyze the audio stream for occurrences of specific sound events such as kick drum, snare hit, and guitar strum. The knowledge about sound events provides a basis for the implementation of a digital music genre classifier. The classifier inputs a new audio file, extracts salient features, and makes a decision about the musical genre based on the decision rule. The final classification results show a recognition rate in the range 75% to 94% for five genres of music.
Convention Paper 7513 (Purchase now)
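
The decision-rule idea, thresholds applied to detected sound-event statistics, can be sketched as follows. The feature names, threshold values, and genre labels here are invented for illustration and are not the paper's trained values:

```python
RULES = [  # hypothetical (genre, feature-bounds) pairs, checked in order
    ("electronic", {"kick_hits_per_sec": (1.5, 10.0), "guitar_strums_per_sec": (0.0, 0.5)}),
    ("rock",       {"kick_hits_per_sec": (0.5, 4.0),  "guitar_strums_per_sec": (0.5, 8.0)}),
]

def classify(features):
    """Return the first genre whose every threshold test passes, else 'other'."""
    for genre, bounds in RULES:
        if all(lo <= features[name] <= hi for name, (lo, hi) in bounds.items()):
            return genre
    return "other"

print(classify({"kick_hits_per_sec": 2.0, "guitar_strums_per_sec": 0.2}))  # electronic
```

A real system would first extract such event rates from the MPEG-7 waveform descriptors; the classifier stage itself can remain this simple.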

P2-6 Loudness Descriptors to Characterize Programs and Music Tracks
Esben Skovenborg, TC Group Research - Risskov, Denmark; Thomas Lund, TC Electronic - Risskov, Denmark
We present a set of key numbers to summarize loudness properties of an audio segment, broadcast program, or music track: the loudness descriptors. The computation of these descriptors is based on a measurement of loudness level, such as specified by the ITU-R BS.1770. Two fundamental loudness descriptors are introduced: Center of Gravity and Consistency. These two descriptors were computed for a collection of audio segments from various sources, media, and formats. This evaluation demonstrates that the descriptors can robustly characterize essential properties of the segments. We propose three different applications of the descriptors: for diagnosing potential loudness problems in ingest material; as a means for performing a quality check, after processing/editing; or for use in a delivery specification.
Convention Paper 7514 (Purchase now)
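
The flavor of such descriptors can be sketched over a series of short-term loudness readings, as a BS.1770-style meter would produce. The statistics below are simple stand-ins chosen for illustration; the actual Center of Gravity and Consistency definitions are the authors' and are not given in the abstract:

```python
import statistics

def loudness_descriptors(short_term_lufs, window_lu=4.0):
    """Summarize a short-term loudness series with two numbers:
    a central loudness (stand-in for Center of Gravity) and the fraction
    of readings within window_lu of it (stand-in for Consistency)."""
    center = statistics.mean(short_term_lufs)
    within = sum(abs(l - center) <= window_lu for l in short_term_lufs)
    return center, within / len(short_term_lufs)

levels = [-23.0, -22.5, -24.0, -23.5, -40.0]   # a steady program with one quiet outlier
center, consistency = loudness_descriptors(levels)
print(round(center, 1), round(consistency, 2))  # -26.6 0.6
```

Even this crude pair flags the ingest problem the paper describes: the outlier drags the central value down and the consistency below 1.0.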

P2-7 Methods for Identification of Tuning System in Audio Musical Signals
Peyman Heydarian, Lewis Jones, Allan Seago, London Metropolitan University - London, UK
The tuning system is an important aspect of a piece. It specifies the scale intervals and is an indicator of the emotional character of a piece; in modal musical traditions there is a direct relationship between musical mode and tuning. The tuning system thus carries valuable information that is worth incorporating into a file's metadata. In this paper different algorithms for automatic identification of the tuning system are presented and compared. In the training process, spectral and chroma averages, and pitch histograms, are used to construct reference patterns for each class. The same is done for the test samples, and a similarity measure such as the Manhattan distance classifies a piece into one of the tuning classes.
Convention Paper 7515 (Purchase now)
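
The classification step reduces to nearest-reference matching under a city-block metric. A minimal sketch follows, with invented 4-bin patterns standing in for the trained chroma averages (real systems would use 12 or more bins per octave):

```python
def manhattan(a, b):
    """City-block distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def classify_tuning(chroma, references):
    """Assign the tuning class whose reference pattern is nearest."""
    return min(references, key=lambda name: manhattan(chroma, references[name]))

REFERENCES = {  # invented patterns standing in for per-class training averages
    "class_a": [1.0, 0.0, 0.5, 0.0],
    "class_b": [0.0, 1.0, 0.0, 0.5],
}
print(classify_tuning([0.9, 0.1, 0.4, 0.0], REFERENCES))  # class_a
```

Swapping in another distance (Euclidean, cosine) only changes the `manhattan` function, which is why the paper can compare similarity measures so directly.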

P2-8 “Roughometer”: Realtime Roughness Calculation and Profiling
Julian Villegas, Michael Cohen, University of Aizu - Aizu-Wakamatsu, Fukushima-ken, Japan
A software tool capable of determining auditory roughness in real-time is presented. This application, based on Pure-Data (Pd), calculates the roughness of audio streams using a spectral method originally proposed by Vassilakis. The processing speed is adequate for many real-time applications and results indicate limited but significant agreement with an Internet application of the chosen model. Finally, the usage of this tool is illustrated by the computation of a roughness profile of a musical composition that can be compared to its perceived patterns of “tension” and “relaxation.”
Convention Paper 7516 (Purchase now)
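
Vassilakis's spectral method scores roughness pairwise over a signal's partials and sums the contributions. The sketch below uses constants as commonly cited from his SRA work; the transcription is assumed, so treat the numbers as indicative rather than authoritative:

```python
import math

def pair_roughness(f1, a1, f2, a2):
    """Roughness contribution of one pair of sinusoids (after Vassilakis)."""
    fmin, fmax = min(f1, f2), max(f1, f2)
    amin, amax = min(a1, a2), max(a1, a2)
    x = (a1 * a2) ** 0.1                              # overall level term
    y = 0.5 * (2 * amin / (amin + amax)) ** 3.11      # amplitude-fluctuation term
    s = 0.24 / (0.0207 * fmin + 18.96)                # frequency-dependent scaling
    z = math.exp(-3.5 * s * (fmax - fmin)) - math.exp(-5.75 * s * (fmax - fmin))
    return x * y * z

def total_roughness(partials):
    """Sum pair_roughness over all pairs of (freq, amp) partials in a spectrum."""
    return sum(pair_roughness(*partials[i], *partials[j])
               for i in range(len(partials)) for j in range(i + 1, len(partials)))

close = total_roughness([(440.0, 1.0), (460.0, 1.0)])
far = total_roughness([(440.0, 1.0), (880.0, 1.0)])
print(close > far)  # True: nearby partials beat audibly, an octave apart they do not
```

Profiling a piece, as the paper does, amounts to evaluating `total_roughness` frame by frame over the analyzed spectrum.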

Thursday, October 2, 11:00 am — 1:00 pm

T2 - Standards-Based Audio Networks Using IEEE 802.1 AVB

Robert Boatright, Harman International - CA, USA
Matthew Xavier Mora, Apple - Cupertino, CA, USA
Michael Johas Teener, Broadcom Corp. - Irvine CA, USA

Recent work by IEEE 802 working groups will allow vendors to build a standards-based network with the appropriate quality of service for high quality audio performance and production. This new set of standards, developed by the IEEE 802.1 Audio Video Bridging Task Group, provides three major enhancements for 802-based networks:

1. Precise timing to support low-jitter media clocks and accurate synchronization of multiple streams,
2. A simple reservation protocol that allows an endpoint device to notify the various network elements in a path so that they can reserve the resources necessary to support a particular stream, and
3. Queuing and forwarding rules that ensure that such a stream will pass through the network within the delay specified by the reservation.

These enhancements require no changes to the Ethernet lower layers and are compatible with all the other functions of a standard Ethernet switch (a device that follows the IEEE 802.1Q bridge specification). As a result, the rest of the Ethernet ecosystem is available to developers; in particular, the various high-speed physical layers (up to 10 gigabit/s in current standards, with even higher speeds in development), security features (encryption and authorization), and advanced management features (remote testing and configuration) can be used. This tutorial will outline the basic protocols and capabilities of AVB networks, describe how such a network can be used, and provide some simple demonstrations of network operation (including a live comparison with a legacy Ethernet network).

Thursday, October 2, 1:00 pm — 2:30 pm

Opening Ceremonies
Keynote Speech

Opening Remarks:
• Executive Director Roger Furness
• President Bob Moses
• Convention Cochairs John Strawn, Valerie Tyler
• AES Awards Presentation
• Introduction of Keynote Speaker
• Keynote Address by Chris Stone

Awards Presentation

Please join us as the AES presents special awards to those who have made outstanding contributions to the Society in such areas as research, scholarship, and publications, as well as other accomplishments that have contributed to the enhancement of our industry. The awardees are:

BOARD OF GOVERNORS AWARD: Jim Anderson, Peter Swarte
FELLOWSHIP AWARD: Jonathan Abel, Angelo Farina, Rob Maher, Peter Mapp, Christoph Musialik, Neil Shaw, Julius Smith, Gerald Stanley, Alexander Voishvillo, William Whitlock
GOLD MEDAL AWARD: George Massenburg

Keynote Speaker

Record Plant co-founder Chris Stone will explore new trends and opportunities in the music industry and what it takes to succeed in today's environment, including how to utilize networking and free services to reduce risk when starting a new small business. Speaking from his strengths as a business/marketing entrepreneur, Stone will focus on the artist’s need to develop a sophisticated approach to operating their own business and also how traditional engineers can remain relevant and play a meaningful role in the ongoing evolution of the recording industry. Stone’s keynote address is entitled: The Artist Owns the Industry.

Thursday, October 2, 2:30 pm — 6:00 pm

TT2 - Dolby Laboratories, San Francisco

Visit legendary Dolby Laboratories' headquarters while you are in San Francisco. Dolby, known for more than 40 years of audio innovation and leadership, will showcase its latest audio and video technologies for high-definition packaged disc media and digital cinema. Demonstrations will take place in Dolby's state-of-the-art listening rooms and in its world-class Presentation Studio.

Dolby Laboratories (NYSE:DLB) is the global leader in technologies that are essential elements in the best entertainment experiences. Founded in 1965 and best known for high-quality audio and surround sound, Dolby innovations enrich entertainment at the movies, at home, or on the go.

Note: Maximum of 40 participants per tour.

Price: $30 (members), $40 (nonmembers)

Thursday, October 2, 2:30 pm — 4:30 pm

T3 - Broadband Noise Reduction: Theory and Applications

Alexey Lukin, iZotope, Inc. - Boston, MA, USA
Jeremy Todd, iZotope, Inc. - Boston, MA, USA

Broadband noise reduction (BNR) is a common technique for attenuating background noise in audio recordings. Implementations of BNR have steadily improved over the past several decades, but the majority of them share the same basic principles. This tutorial discusses the signal processing theory behind various BNR techniques, including earlier implementations such as broadband and multiband gates and compander-based systems for tape recording. Beyond these early methods, greater emphasis will be placed on recent advances in modern techniques such as spectral subtraction, including multi-resolution processing, psychoacoustic models, and the separation of noise into tonal and broadband parts. We will compare examples of each technique for their effectiveness on several types of audio recordings.
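
The core of spectral subtraction can be sketched per frame: estimate the noise magnitude spectrum from a noise-only passage, subtract it from each frame's spectrum, and clamp to a spectral floor to tame "musical noise." This is a textbook sketch under assumed parameter names, not any particular product's implementation:

```python
def spectral_subtract(magnitudes, noise_estimate, over=1.0, floor=0.05):
    """Per-bin magnitude spectral subtraction.

    over  -- oversubtraction factor (>1 removes more noise, risks artifacts)
    floor -- spectral floor as a fraction of the input magnitude
    """
    out = []
    for mag, noise in zip(magnitudes, noise_estimate):
        cleaned = mag - over * noise
        out.append(max(cleaned, floor * mag))   # never go below the floor
    return out

frame = [1.0, 0.3, 0.25, 2.0]    # noisy magnitude spectrum of one frame
noise = [0.2, 0.25, 0.25, 0.2]   # estimate taken from a noise-only passage
print([round(v, 4) for v in spectral_subtract(frame, noise)])  # [0.8, 0.05, 0.0125, 1.8]
```

A full restoration chain would apply this in overlapping FFT frames and reuse the noisy phase for resynthesis; the multi-resolution and psychoacoustic refinements mentioned above mainly shape how `noise_estimate`, `over`, and `floor` vary across time and frequency.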

Thursday, October 2, 2:30 pm — 4:00 pm

W3 - Analyzing, Recommending, and Searching Audio Content—Commercial Applications of Music Information Retrieval

Jay LeBoeuf, Imagine Research
Markus Cremer, Gracenote
Matthias Gruhne, Fraunhofer Institute for Digital Media Technology
Tristan Jehan, The Echo Nest
Keyvan Mohajer, Melodis Corporation

This workshop will focus on cutting-edge applications of music information retrieval (MIR) technology. MIR is a key technology behind music startups recently featured in Wired and Popular Science. Online music consumption is dramatically enhanced by automatic music recommendation, customized playlisting, song identification via cell phone, and rich metadata / digital fingerprinting technologies. Emerging startups offer intelligent music recommender systems, lookup of songs by humming the melody, and searching through large archives of audio. Recording and music software now offer powerful new features leveraging MIR techniques. What’s out there and where is this all going? This workshop will inform AES members of the practical developments and exciting opportunities within MIR, particularly the growing body of commercial work in this area. Panelists include industry thought-leaders: a blend of established commercial companies and emerging start-ups.

Thursday, October 2, 2:30 pm — 4:30 pm

P4 - Acoustic Modeling and Simulation

Chair: Scott Norcross, Communications Research Centre - Ottawa, Ontario, Canada

P4-1 Application of Multichannel Impulse Response Measurement to Automotive Audio
Michael Strauß, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany, and Technical University of Delft - Delft, The Netherlands; Diemer de Vries, Technical University of Delft - Delft, The Netherlands
Audio reproduction in small enclosures differs in several respects from conventional room acoustics. Today’s car audio systems meet sophisticated expectations, but the automotive listening environment still presents challenging acoustic properties. During the design of such an audio system it is helpful to gain insight into the temporal and spatial distribution of the acoustic field's properties. Because room acoustic modeling software reaches its limits here, acoustic imaging methods are a promising approach. This paper describes the application of wave field analysis based on multichannel impulse response measurements in an automotive use case. After a suitable preparation of the theoretical aspects, the analysis method is used to investigate the acoustic wave field inside a car cabin.
Convention Paper 7521 (Purchase now)

P4-2 Multichannel Low Frequency Room Simulation with Properly Modeled Source Terms—Multiple Equalization Comparison
Ryan J. Matheson, University of Waterloo - Waterloo, Ontario, Canada
At low frequencies, unwanted room resonances in regular-sized rectangular listening rooms cause problems. Various methods for reducing these resonances are available, including some multichannel methods. With the introduction of setups such as 5.1 surround into home theater systems, there are now more options for performing active resonance control using the existing loudspeaker array. We focus primarily on comparing, separately, each step of loudspeaker placement and its effect on the room response, as well as the effect of adding symmetrically placed loudspeakers in the rear to cancel additional room resonances. The comparison is performed with a Finite Difference Time Domain (FDTD) simulator, with attention to properly modeling the source in the simulation. The ability of a standard 5.1 setup to use a multichannel equalization technique (without adding loudspeakers) and a modal equalization technique is then discussed.
Convention Paper 7522 (Purchase now)
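
The FDTD idea is easiest to see in one dimension: a leapfrog update of the discrete wave equation marches pressure forward in time. The sketch below uses pressure-release (p = 0) boundaries and a crude impulse source, which is far simpler than the properly modeled 3-D source terms the paper is concerned with:

```python
def fdtd_1d(n_cells, n_steps, courant=1.0):
    """March the 1-D wave equation with a leapfrog FDTD scheme.

    Pressure-release (p = 0) ends, unit pressure impulse at the center;
    courant <= 1 keeps the scheme stable.
    """
    p_prev = [0.0] * n_cells
    p = [0.0] * n_cells
    p[n_cells // 2] = 1.0            # crude impulse "source"
    c2 = courant ** 2
    for _ in range(n_steps):
        p_next = [0.0] * n_cells     # boundary cells stay at zero
        for i in range(1, n_cells - 1):
            p_next[i] = (2 * p[i] - p_prev[i]
                         + c2 * (p[i + 1] - 2 * p[i] + p[i - 1]))
        p_prev, p = p, p_next
    return p

field = fdtd_1d(101, 40)
print(sum(field))  # 41.0: the impulse has spread over 41 cells before any reflection
```

The crudeness of this source model is exactly the issue the paper addresses: how the source is injected strongly affects the simulated low-frequency response.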

P4-3 A Super-Wide-Range Microphone with Cardioid Directivity
Kazuho Ono, Takehiro Sugimoto, Akio Ando, NHK Science and Technical Research Laboratories - Tokyo, Japan; Tomohiro Nomura, Yutaka Chiba, Keishi Imanaga, Sanken Microphone Co. Ltd. - Japan
This paper describes a super-wide-range microphone with cardioid directivity, which covers the frequency range up to 100 kHz. The authors have previously developed an omni-directional microphone capable of picking up sounds up to 100 kHz with low noise. The proposed microphone uses the omni-directional capsule adopted in that super-wide-range microphone and a newly designed bi-directional capsule matched to its characteristics. The output signals of both capsules are combined to achieve cardioid directivity. Measurement results show that the proposed microphone achieves a wide frequency range up to 100 kHz, as well as low noise and excellent cardioid directivity.
Convention Paper 7523 (Purchase now)

P4-4 Methods and Limitations of Line Source Simulation
Stefan Feistel, Ahnert Feistel Media Group - Berlin, Germany; Ambrose Thompson, Martin Audio - High Wycombe, Bucks, UK; Wolfgang Ahnert, Ahnert Feistel Media Group - Berlin, Germany
Although line array systems are in widespread use today, investigations of the requirements and methods for accurate modeling of line sources are scarce. In previous publications the concept of the Generic Loudspeaker Library (GLL) was introduced. We show that on the basis of directional elementary sources with complex directivity data finite line sources can be simulated in a simple, general, and precise manner. We derive measurement requirements and discuss the limitations of this model. Additionally, we present a second step of refinement, namely the use of different directivity data for cabinets of identical type based on their position in the array. All models are validated by measurements. We compare the approach presented with other proposed solutions.
Convention Paper 7524 (Purchase now)
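
The baseline model, summing elementary sources with the phase offsets their positions impose, can be sketched for the idealized far-field case. Real GLL data additionally attaches a measured complex directivity to each cabinet, which this omits; names and parameters are illustrative:

```python
import cmath, math

def line_array_pressure(n_elems, spacing, freq, angle_deg, c=343.0):
    """Normalized far-field pressure of n identical point sources on a line.

    Each element is phase-shifted by the extra path length its position
    adds toward angle_deg (0 = broadside / on-axis).
    """
    k = 2 * math.pi * freq / c                                   # wavenumber
    phase_step = k * spacing * math.sin(math.radians(angle_deg)) # per-element phase
    total = sum(cmath.exp(-1j * i * phase_step) for i in range(n_elems))
    return abs(total) / n_elems

print(round(line_array_pressure(8, 0.2, 1000, 0.0), 3))   # 1.0 on-axis
print(round(line_array_pressure(8, 0.2, 1000, 45.0), 3))  # much lower off-axis
```

Replacing the uniform `1.0` element response with per-element measured directivity data is the refinement the paper's GLL approach provides, including position-dependent data within the array.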

Thursday, October 2, 4:00 pm — 5:00 pm

Listening Session

Students are encouraged to bring their projects to a noncompetitive listening session for feedback and comments from Dave Greenspan, a panel, and the audience. Students will be able to sign up for time slots at the first SDA meeting. Students who are finalists in the recording competition are excluded from this event, to give those who were not finalists the opportunity for feedback.

Thursday, October 2, 4:30 pm — 6:30 pm

B4 - Mobile/Handheld Broadcasting: Developing a New Medium

Jim Kutzner, Public Broadcasting Service
Mark Aitken, Sinclair Broadcast Group
Sterling Davis, Cox Broadcasting
Brett Jenkins, Ion Media Networks
Dakx Turcotte, Neural Audio Corp.

The broadcasting industry, the broadcast and consumer equipment vendors, and the Advanced Television Systems Committee have been vigorously moving toward the development of a Mobile/Handheld DTV broadcast standard and its practical implementation. In order to bring this new service to the public, players from various industry segments have come together in an unprecedented fashion. In this session key leaders in this activity will present what the emerging system includes, how far the industry has progressed, and what’s left to be done.

Thursday, October 2, 5:00 pm — 6:45 pm

M1 - Basic Acoustics: Understanding the Loudspeaker

John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada

This presentation is for AES members at an intermediate level and introduces many concepts in acoustics. The basic propagation of sound waves in air, for both plane and spherical waves, is developed and applied to the operation of a simple sealed-box loudspeaker. Topics such as acoustic impedance, compact source operation, and diffraction are included. Live demonstrations with a simple loudspeaker, microphone, and measuring computer are used to illustrate the basic radiation principle of a typical electrodynamic driver mounted in a sealed box.

Thursday, October 2, 5:00 pm — 6:45 pm

L3 - AC Power and Grounding

Bruce C. Olson, Olson Sound Design - Minneapolis, MN, USA
David Stevens
Bill Whitlock, Jensen Transformers - Chatsworth, CA, USA

How do you kill the hum without killing yourself? This panel will discuss how to provide AC power properly and avoid hum without endangering the performers, technicians, or yourself. A lot of the advice out there isn’t just wrong, it is potentially fatal. Being safe, however, is easy; the only question is why everyone doesn’t already know this. We will also discuss the use of generator sets, the myths and facts about grounding, and typical configurations.

Thursday, October 2, 5:00 pm — 6:45 pm

W5 - Engineering Mistakes We Have Made in Audio

Peter Eastty, Oxford Digital Limited - UK
Robert Bristow-Johnson, Audio Imagination
James D. (JJ) Johnston, Neural Audio Corp.
Mel Lambert, Media & Marketing
George Massenburg, Massenburg Design Works
Jim McTigue, Impulsive Audio

Six leading audio product developers will share the enlightening, thought-provoking, and (in retrospect) amusing lessons they have learned from actual mistakes they have made in the product development trenches.

Friday, October 3, 9:00 am — 11:00 am

Perceptual Audio Coding—The First 20 Years

Marina Bosi, Stanford University; author of Introduction to Digital Audio Coding and Standards
Karlheinz Brandenburg, Fraunhofer Institute for Digital Media Technology; TU Ilmenau - Ilmenau, Germany
Bernd Edler, University of Hannover
Louis Fielder, Dolby Laboratories
J. D. Johnston, Neural Audio Corp. - Kirkland, WA, USA
John Princen, BroadOn Communications
Gerhard Stoll, IRT
Ken Sugiyama, NEC

Who would have guessed that teenagers and everybody else would be clamoring for devices with MP3/AAC (MPEG Layer III/MPEG Advanced Audio Coding) perceptual audio coders that fit into their pockets? As perceptual audio coders become more and more integral to our daily lives, residing within DVDs, mobile devices, broad/webcasting, electronic distribution of music, etc., a natural question to ask is: what made this possible and where is this going? This panel, which includes many of the early pioneers who helped advance the field of perceptual audio coding, will present a historical overview of the technology and a look at how the market evolved from niche to mainstream and where the field is heading.

Friday, October 3, 9:00 am — 11:00 am

M2 - Binaural Audio Technology—History, Current Practice, and Emerging Trends

Robert Schulein, Schaumburg, IL, USA

During the winter and spring of 1931-32, Bell Telephone Laboratories, in cooperation with Leopold Stokowski and the Philadelphia Symphony Orchestra, undertook a series of tests of musical reproduction using the most advanced apparatus obtainable at that time. The objectives were to determine how closely an acoustic facsimile of an orchestra could be approached using both stereo loudspeakers and binaural reproduction. Detailed documents discovered within the Bell Telephone archives will serve as a basis for describing the results and problems revealed while creating the binaural demonstrations. Since these historic events, interest in binaural recording and reproduction has grown in areas such as sound field recording, acoustic research, sound field simulation, audio for electronic games, music listening, and artificial reality. Each of these technologies has its own technical concerns involving transducers, environmental simulation, human perception, position sensing, and signal processing. This Master Class will cover the underlying principles germane to binaural perception, simulation, recording, and reproduction. It will include live demonstrations as well as recorded audio/visual examples.

Friday, October 3, 9:00 am — 11:30 am

T4 - Perceptual Audio Evaluation

Søren Bech, Bang & Olufsen A/S - Struer, Denmark
Nick Zacharov, SenseLab - Delta, Denmark

The aim of this tutorial is to provide an overview of perceptual evaluation of audio through listening tests, based on good practices in the audio and affiliated industries. The tutorial is aimed at anyone interested in the evaluation of audio quality and will provide a highly condensed overview of all aspects of performing listening tests in a robust manner. Topics will include: (1) definition of a suitable research question and associated hypothesis, (2) definition of the question to be answered by the subject, (3) scaling of the subjective response, (4) control of experimental variables such as choice of signal, reproduction system, listening room, and selection of test subjects, (5) statistical planning of the experiments, and (6) statistical analysis of the subjective responses. The tutorial will include both theory and practical examples including discussion of the recommendations of relevant international standards (IEC, ITU, ISO). The presentation will be made available to attendees and an extended version will be available in the form of the text “Perceptual Audio Evaluation” authored by Søren Bech and Nick Zacharov.

Friday, October 3, 9:00 am — 11:30 am

P5 - Audio Equipment and Measurements

Chair: John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada

P5-1 Can One Perform Quasi-Anechoic Loudspeaker Measurements in Normal Rooms?
John Vanderkooy, Stanley Lipshitz, University of Waterloo - Waterloo, Ontario, Canada
This paper is an analysis of two methods that attempt to achieve high resolution frequency responses at low frequencies from measurements made in normal rooms. Such data is contaminated by reflections before the low-frequency impulse response of the system has fully decayed. By modifying the responses to decay more rapidly, then windowing a reflection-free portion, and finally recovering the full response by deconvolution, these quasi-anechoic methods purport to thwart the usual reciprocal uncertainty relationship between measurement duration and frequency resolution. One method works by equalizing the response down to dc, the other by increasing the effective highpass corner frequency of the system. Each method is studied with simulations, and both appear to work to varying degrees, but we question whether they are true measurements or simply model extensions. In practice noise significantly degrades both procedures.
Convention Paper 7525 (Purchase now)
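
The windowing step these methods build on can be sketched numerically. Below, a toy impulse response is truncated just before its first reflection, illustrating the usual trade-off between window length and frequency resolution; `window_ir`, the reflection delay, and the fade length are all invented for the example and are not the authors' procedure:

```python
import numpy as np

def window_ir(ir, fs, reflection_time, fade=0.002):
    """Truncate an impulse response just before the first reflection.

    A half-cosine fade (length `fade` seconds) avoids a hard edge.
    The usable frequency resolution is roughly 1/reflection_time Hz.
    """
    n = int(reflection_time * fs)
    w = np.ones(n)
    nf = int(fade * fs)
    w[-nf:] = 0.5 * (1 + np.cos(np.linspace(0, np.pi, nf)))
    return ir[:n] * w

# Toy example: direct sound (decaying tone) plus a floor reflection at 8 ms.
fs = 48000
t = np.arange(0, 0.05, 1 / fs)
direct = np.exp(-t / 0.002) * np.sin(2 * np.pi * 1000 * t)
ir = direct.copy()
d = int(0.008 * fs)
ir[d:] += 0.5 * direct[:-d]          # reflection contaminates the tail

clean = window_ir(ir, fs, reflection_time=0.008)
print("window: %.1f ms -> frequency resolution ~%.0f Hz"
      % (1000 * len(clean) / fs, fs / len(clean)))
```

The quasi-anechoic methods in the paper aim precisely at beating this 1/T resolution limit, which simple windowing cannot.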

P5-2 Automatic Verification of Large Sound Reinforcement Systems Using Models of Loudspeaker Performance Data
Klas Dalbjörn, Johan Berg, Lab.gruppen AB - Kungsbacka, Sweden
A method is described to automatically verify individual loudspeaker integrity and confirm the proper configuration of amplifier-loudspeaker connections in sound reinforcement systems. Using impedance-sensing technology in conjunction with software-based loudspeaker performance modeling, the procedure verifies that the load presented at each amplifier output corresponds to impedance characteristics as described in the DSP system’s currently loaded model. Accurate verification requires use of load impedance models created by iterative testing of numerous loudspeakers.
Convention Paper 7526 (Purchase now)
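
The verification idea reduces to a tolerance check of measured impedance against a stored model. The sketch below illustrates this with hypothetical numbers; `verify_load`, the probe frequencies, and the 20% tolerance are assumptions for illustration, not Lab.gruppen's actual procedure or data:

```python
import numpy as np

def verify_load(measured_z, model_z, tol=0.20):
    """Flag an amplifier output whose measured impedance magnitude deviates
    from the stored loudspeaker model by more than `tol` (relative) at any
    probe frequency. Returns (passed, worst relative deviation)."""
    dev = np.abs(measured_z - model_z) / model_z
    return bool(np.all(dev <= tol)), float(np.max(dev))

# Hypothetical model: impedance magnitude (ohms) at a few probe tones.
model = np.array([6.2, 14.0, 7.5, 6.8, 8.1])
healthy = model * np.array([1.05, 0.95, 1.02, 0.98, 1.00])   # normal unit spread
open_vc = np.array([120.0, 130.0, 125.0, 118.0, 122.0])      # open/blown voice coil

print("healthy load:", verify_load(healthy, model))   # passes
print("open circuit:", verify_load(open_vc, model))   # fails
```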

P5-3 Bend Radius
Stephen Lampen, Carl Dole, Shulamite Wan, Belden - San Francisco, CA, USA
Designers, installers, and system integrators have many rules and guidelines to follow. Most of these are intended to maximize cable and equipment performance. Many of these are “rules-of-thumb,” simple guidelines, easy to remember, and often just as easily broken. One of these is the “rule-of-thumb” regarding the bending of cable, especially coaxial cable. Many may have heard the term “No tighter than ten times the diameter.” While this can be helpful, in a general way, there is a deeper and more complex question. What happens when you do bend cable? What if you have no choice? Often a specific choice of rack or configuration of equipment requires that cables be bent tighter than that recommendation. And what happens if you “unbend” a cable that has been damaged? Does it stay damaged or can it be restored? This paper outlines a series of laboratory tests to determine exactly what happens when cable is bent and what the reaction is. Further, we will analyze the effect of bending on cable performance, specifically looking at impedance variations and return loss (signal reflection). For high-definition video signals (HD-SDI) return loss is the key to maximum cable length, bit errors, and open eye patterns. So analyzing the effect of bending will allow us to determine signal quality based on the bending of an individual cable. But does this apply to digital audio cables? Do the relatively low frequencies of AES digital signals make a difference? Can these cables be bent with less effect on performance? These tests were repeated on both coaxial cables of different sizes and twisted pairs. Flexible coax cables were tested, as well as the standard solid-core installation versions. Paired cables, consisting of AES digital audio shielded cables in both install and flexible versions, were also tested.
Convention Paper 7527 (Purchase now)

P5-4 Detecting Changes in Audio Signals by Digital Differencing
Bill Waslo, Liberty Instruments Inc. - Liberty Township, OH, USA
A software application has been developed to provide an accessible method, based on signal subtraction, to determine whether or not an audio signal may have been perceptibly changed by passing through components, cables, or similar processes or treatments. The goals of the program, the capabilities required of it, its effectiveness, and the algorithms it uses are described. The program is made freely available to any interested users for use in such tests.
Convention Paper 7528 (Purchase now)
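
The signal-subtraction ("null test") principle can be sketched as follows. This is a minimal illustration of the general idea, not the program's actual algorithm; the cross-correlation alignment and least-squares gain match are assumptions of the sketch:

```python
import numpy as np

def residual_db(reference, test):
    """Null test: time-align `test` to `reference` by cross-correlation,
    gain-match it, subtract, and report residual level re: the reference."""
    # Coarse integer-sample alignment via cross-correlation.
    corr = np.correlate(test, reference, mode="full")
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    aligned = np.roll(test, -lag)
    # Least-squares gain match removes a pure level difference.
    g = np.dot(aligned, reference) / np.dot(aligned, aligned)
    diff = reference - g * aligned
    return 10 * np.log10(np.sum(diff ** 2) / np.sum(reference ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
# A delayed copy with a trace of noise nulls almost perfectly...
y = np.roll(x, 3) + 1e-5 * rng.standard_normal(4096)
print("delayed copy nulls to %.0f dB" % residual_db(x, y))
# ...while mild nonlinear distortion leaves a clearly audible-level residue.
print("2nd-order distortion nulls to %.0f dB" % residual_db(x, x + 0.01 * x ** 2))
```

A deep null (large negative dB) suggests the device under test passed the signal unchanged; a shallow one flags a difference worth listening to.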

P5-5 Research on a Measuring Method of Headphones and Earphones Using HATS
Kiyofumi Inanaga, Takeshi Hara, Sony Corporation - Tokyo, Japan; Gunnar Rasmussen, G.R.A.S. Sound & Vibration A/S - Copenhagen, Denmark; Yasuhiro Riko, Riko Associates - Tokyo, Japan
Currently, various types of couplers are used for the measurement of headphones and earphones. The coupler is selected by the measurer according to the device under test, which makes it difficult to compare the characteristics of different headphones and earphones. A measuring method was previously proposed using HATS and a simulated program signal; however, the method had some problems with the shape of the ear hole, and the measured results were not reproducible. We tried to improve the reproducibility of the measurement using several pinna models. As a result, we achieved a measuring platform using HATS that gives good reproducibility of measured results for various types of headphones and earphones and thus makes it possible to compare the measured results fairly.
Convention Paper 7529 (Purchase now)

Friday, October 3, 9:00 am — 1:00 pm

P6 - Loudspeaker Design

Chair: Alexander Voishvillo, JBL Professional - Northridge, CA, USA

P6-1 Loudspeaker Production Variance
Steven Hutt, Equity Sound Investments - Bloomington, IN, USA; Laurie Fincham, THX Ltd. - San Rafael, CA, USA
Numerous quality assurance philosophies have evolved over the last few decades designed to manage manufacturing quality. Managing quality control of production loudspeakers is particularly challenging. Variation of subcomponents and assembly processes across loudspeaker driver production batches may lead to excessive variation of sensitivity, bandwidth, frequency response, distortion characteristics, etc. As loudspeaker drivers are integrated into production audio systems, these variations result in broad performance differences from system to system that affect all aspects of acoustic balance and spatial attributes. This paper will discuss traditional electro-dynamic loudspeaker production variation.
Convention Paper 7530 (Purchase now)

P6-2 Distributed Mechanical Parameters Describing Vibration and Sound Radiation of Loudspeaker Drive Units
Wolfgang Klippel, University of Technology Dresden - Dresden, Germany; Joachim Schlechter, KLIPPEL GmbH - Dresden, Germany
The mechanical vibration of loudspeaker drive units is described by a set of linear transfer functions and geometrical data that are measured at selected points on the surface of the radiator (cone, dome, diaphragm, piston, panel) by using a scanning technique. These distributed parameters supplement the lumped parameters (T/S, nonlinear, thermal), simplify the communication between cone, driver, and loudspeaker system design and open new ways for loudspeaker diagnostics. The distributed vibration can be summarized to a new quantity called accumulated acceleration level (AAL), which is comparable with the sound pressure level (SPL) if no acoustical cancellation occurs. This and other derived parameters are the basis for modal analysis and novel decomposition techniques that make the relationship between mechanical vibration and sound pressure output more transparent. Practical problems and indications for practical improvements are discussed for various example drivers. Finally, the usage of the distributed parameters within finite and boundary element analyses is addressed and conclusions for the loudspeaker design process are made.
Convention Paper 7531 (Purchase now)

P6-3 A New Methodology for the Acoustic Design of Compression Driver Phase-Plugs with Radial Channels
Mark Dodd, Celestion International Ltd. - Ipswich, UK, and GP Acoustics (UK) Ltd., Maidstone, UK; Jack Oclee-Brown, GP Acoustics (UK) Ltd. - Maidstone, UK, and University of Southampton, Southampton, UK
Recent work by the authors describes an improved methodology for the design of annular-channel, dome compression drivers. Although not so popular, radial channel phase plugs are used in some commercial designs. While there has been some limited investigation into the behavior of this kind of compression driver, the literature is much more extensive for annular types. In particular, the modern approach to compression driver design, based on a modal description of the compression cavity, as first pioneered by Smith, has no equivalent for radial designs. In this paper we first consider if a similar approach is relevant to radial-channel phase plug designs. The acoustical behavior of a radial-channel compression driver is analytically examined in order to derive a geometric condition that ensures minimal excitation of the compression cavity modes.
Convention Paper 7532 (Purchase now)

P6-4 Mechanical Properties of Ferrofluids in Loudspeakers
Guy Lemarquand, Romain Ravaud, Valerie Lemarquand, Claude Depollier, Laboratoire d’Acoustique de l’Université du Maine - Le Mans, France
This paper describes the properties of ferrofluid seals in ironless electrodynamic loudspeakers. The motor consists of several outer stacked ring permanent magnets. The inner moving part is a piston. In addition, two ferrofluid seals are used that replace the classic suspension. Indeed, these seals fulfill several functions. First, they ensure the airtightness between the loudspeaker faces. Second, they act as bearings and center the moving part. Finally, the ferrofluid seals also exert a pull back force on the moving piston. Both radial and axial forces exerted on the piston are calculated thanks to analytical formulations. Furthermore, the shape of the seal is discussed as well as the optimal quantity of ferrofluid. The seal capacity is also calculated.
Convention Paper 7533 (Purchase now)

P6-5 An Ironless Low Frequency Subwoofer Functioning under its Resonance Frequency
Benoit Merit, Université du Maine - Le Mans, France, Orkidia Audio, Saint Jean de Luz, France; Guy Lemarquand, Université du Maine - Le Mans, France; Bernard Nemoff, Orkidia Audio - Saint Jean de Luz, France
A low frequency loudspeaker (10 Hz to 100 Hz) is described. Its structure is totally ironless in order to avoid nonlinear effects due to the presence of iron. The large diaphragm and the high force factor of the loudspeaker lead to its high efficiency. Efforts have been made to reduce the nonlinearities of the loudspeaker for more accurate sound reproduction. In particular, we have developed a motor made entirely of permanent magnets, which create a uniform induction across the entire intended displacement of the coil. The motor linearity and the high force factor of this flat loudspeaker make it possible to function under its resonance frequency with great accuracy.
Convention Paper 7534 (Purchase now)

P6-6 Line Arrays with Controllable Directional Characteristics—Theory and Practice
Laurie Fincham, Peter Brown, THX Ltd. - San Rafael, CA, USA
A so-called arc line array is capable of providing directivity control. Applying simple amplitude shading can, in theory, provide good off-axis lobe suppression and constant directivity over a frequency range determined at low frequencies by line length and at high frequencies by driver spacing. Array transducer design presents additional challenges: the dual requirements of close spacing, for accurate high-frequency control, and a large effective radiating area, for good bass output, are incompatible with the use of multiple full-range drivers. A novel drive unit layout is proposed and theoretical and practical design criteria are presented for a two-way line with controllable directivity and virtual elimination of spatial aliasing. The PC-based array controller permits real-time changes in beam parameters for multiple overlaid beams.
Convention Paper 7535 (Purchase now)

P6-7 Loudspeaker Directivity Improvement Using Low Pass and All Pass Filters
Charles Hughes, Excelsior Audio Design & Services, LLC - Gastonia, NC, USA
The response of loudspeaker systems employing multiple drivers within the same pass band is often less than ideal. This is due to the physical separation of the drivers and their lack of proper acoustical coupling within the higher frequency region of their use. The resultant comb filtering is sometimes addressed by applying a low pass filter to one or more of the drivers within the pass band. This can cause asymmetries in the directivity response of the loudspeaker system. A method is presented to greatly minimize these asymmetries through the use of low pass and all pass filters. This method is also applicable as a means to extend the directivity control of a loudspeaker system to lower frequencies.
Convention Paper 7536 (Purchase now)
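
Why an all pass filter can compensate the phase lag a low pass filter introduces on one driver can be seen with first-order analog responses. The numerical sketch below is an illustration of the principle only, not the filters from the paper; the corner-frequency choice (allpass corner at twice the low pass corner, matching the low-frequency phase slope) is an assumption of the sketch:

```python
import numpy as np

def lowpass_phase(w, w0):
    return -np.arctan(w / w0)          # phase of H(jw) = 1 / (1 + jw/w0)

def allpass_phase(w, wa):
    return -2 * np.arctan(w / wa)      # phase of H(jw) = (1 - jw/wa) / (1 + jw/wa)

# A low pass at w0 on driver A adds -atan(w/w0) of phase lag. An all pass on
# driver B with corner wa = 2*w0 has the same low-frequency slope (-w/w0),
# so the two drivers stay phase-matched well below the corner and the lobing
# axis of the combined polar response is not tilted.
w0 = 2 * np.pi * 1000.0
w = 2 * np.pi * np.linspace(20, 500, 200)     # region well below the corner
err = np.degrees(allpass_phase(w, 2 * w0) - lowpass_phase(w, w0))
print("max inter-driver phase error below 500 Hz: %.2f degrees"
      % np.max(np.abs(err)))
```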

P6-8 On the Necessary Delay for the Design of Causal and Stable Recursive Inverse Filters for Loudspeaker Equalization
Avelino Marques, Diamantino Freitas, Polytechnic Institute of Porto - Porto, Portugal
The authors have developed and applied a novel approach to the equalization of non-minimum phase loudspeaker systems, based on the design of Infinite Impulse Response (recursive) inverse filters. In this paper the results and improvements attained on this novel IIR filter design method are presented. Special attention has been given to the delay of the equalized system. The boundaries to be posed on the search space of the delay for a causal and stable inverse filter, to be used in the nonlinear least squares minimization routine, are studied, identified, and related with the phase response of a test system and with the order of the inverse filter. Finally, these observations and relations are extended and applied to multi-way loudspeaker systems, demonstrating the connection of the lower and upper bounds of the delay with the loudspeaker’s crossover filters phase response and inverse filter order.
Convention Paper 7537 (Purchase now)

Friday, October 3, 9:00 am — 10:30 am

W6 - Audio Networking for the Pros

Umberto Zanghieri, ZP Engineering srl
Steve Gray, Peavey Digital Research
Greg Shay, Axia Audio
Jérémie Weber, Auvitran
Aidan Williams, Audinate

Several solutions are available on the market today for digital audio transfer over conventional data cabling, but only some of them allow usage of standard networking equipment. This workshop presents some commercially available solutions (Cobranet, Livewire, Ethersound, Dante), with specific focus on noncompressed, low-latency audio transmission for pro-audio and live applications using standard IEEE 802.3 network technology. The main challenges of digital audio transport will be outlined, including compatibility with common networking equipment, reliability, latency, and deployment. Typical scenarios will be proposed, with panelists explaining their own approaches and solutions.

Friday, October 3, 9:00 am — 10:30 am

P7 - Audio Content Management

P7-1 A Piano Sound Database for Testing Automatic Transcription Methods
Luis Ortiz-Berenguer, Elena Blanco-Martin, Alberto Alvarez-Fernandez, Jose A. Blas-Moncalvillo, Francisco J. Casajus-Quiros, Universidad Politecnica de Madrid - Madrid, Spain
A piano sound database, called PianoUPM, is presented. It is intended to help the research community in developing and testing transcription methods. A practical database needs to contain notes and chords played through the full piano range, and it needs to be recorded from acoustic pianos rather than synthesized ones. The presented piano sound database includes the recording of thirteen pianos from different manufacturers. There are both upright and grand pianos. The recordings include the eighty-eight notes and eight different chords played both in legato and staccato styles. It also includes some notes of every octave played with four different forces to analyze the nonlinear behavior. This work has been supported by the Spanish National Project TEC2006-13067-C03-01/TCM.
Convention Paper 7538 (Purchase now)

P7-2 Measurements of Spaciousness for Stereophonic Music
Andy Sarroff, Juan P. Bello, New York University - New York, NY, USA
The spaciousness of pre-recorded stereophonic music, or how large and immersive its virtual space is perceived to be, is an important feature of a produced recording. Quantitative models of spaciousness as a function of a recording’s (1) wideness of source panning and (2) amount of overall reverberation are proposed. The models are independently evaluated in two controlled experiments. In one, the panning widths of a distribution of sources with varying degrees of panning are estimated; in the other, the extent of reverberation for controlled mixtures of sources with varying degrees of reverberation is estimated. The models are shown to be valid in a controlled experimental framework.
Convention Paper 7539 (Purchase now)
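
A crude version of a panning-width measure can be sketched from channel energies alone. This energy-based pan index is an assumption for illustration, not the authors' model:

```python
import numpy as np

def panning_width(left, right, frame=1024):
    """Per-frame pan index in [-1, 1] from channel energies
    (-1 = hard left, +1 = hard right); width = max - min over frames."""
    pans = []
    for i in range(0, len(left) - frame, frame):
        el = np.sum(left[i:i + frame] ** 2)
        er = np.sum(right[i:i + frame] ** 2)
        pans.append((er - el) / (er + el + 1e-12))
    return float(np.max(pans) - np.min(pans))

rng = np.random.default_rng(1)
src = rng.standard_normal(44100)
# A "wide" mix: one source hard left for the first half, another hard right after.
wide_l = np.concatenate([src[:22050], np.zeros(22050)])
wide_r = np.concatenate([np.zeros(22050), src[22050:]])
print("wide mix width:     %.2f" % panning_width(wide_l, wide_r))   # near 2
print("centered mix width: %.2f" % panning_width(src, src))         # near 0
```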

P7-3 Music Annotation and Retrieval System Using Anti-Models
Zhi-Sheng Chen, Jia-Min Zen, Jyh-Shing Roger Jang, National Tsing Hua University - Taiwan
Query-by-semantic-description (QBSD) is a natural way for searching/annotating music in a large database. We propose such a system by considering anti-words for each annotation word based on the concept of supervised multi-class labeling (SML). Moreover, words that are highly correlated with the anti-semantic meaning of a word constitute its anti-word set. By modeling both a word and its anti-word set, our system can achieve +8.21% and +1.6% gains of average precision and recall against SML under the condition of an equal average number of annotation words, that is, 10. By incorporating anti-models, we also allow queries with anti-semantic words, which is not an option for previous systems.
Convention Paper 7540 (Purchase now)

P7-4 The Effects of Lossy Audio Encoding on Onset Detection Tasks
Kurt Jacobson, Matthew Davies, Mark Sandler, Queen Mary University of London - London, UK
In large audio collections, it is common to store audio content with perceptual encoding. However, encoding parameters may vary from collection to collection or even within a collection—using different bit rates, sample rates, codecs, etc. We evaluated the effect of various audio encodings on the onset detection task and show that audio-based onset detection methods are surprisingly robust in the presence of MP3 encoded audio. Statistically significant changes in onset detection accuracy only occur at bit-rates lower than 32 kbps.
Convention Paper 7541 (Purchase now)
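
A basic spectral-flux onset detector of the kind evaluated in such studies can be sketched as follows. The frame sizes, the threshold rule `k`, and the peak-picking logic are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def spectral_flux_onsets(x, frame=512, hop=256, k=2.0):
    """Half-wave-rectified spectral flux with a simple global threshold;
    returns onset positions as frame indices (in hops)."""
    win = np.hanning(frame)
    frames = [x[i:i + frame] * win for i in range(0, len(x) - frame, hop)]
    mags = np.array([np.abs(np.fft.rfft(f)) for f in frames])
    # Sum of positive magnitude changes between consecutive frames.
    flux = np.sum(np.maximum(mags[1:] - mags[:-1], 0), axis=1)
    thresh = k * np.mean(flux)
    # A frame is an onset if it exceeds the threshold and is a local peak.
    return [i + 1 for i in range(1, len(flux) - 1)
            if flux[i] > thresh and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]]

# Two tone bursts starting at 0.5 s and 1.0 s (fs = 8 kHz).
fs = 8000
x = np.zeros(fs * 2)
t = np.arange(fs // 4) / fs
for start in (fs // 2, fs):
    x[start:start + fs // 4] += np.sin(2 * np.pi * 440 * t)
found = spectral_flux_onsets(x)
print("onsets near frames:", found)   # expect two, near frames 16 and 31
```

Rerunning such a detector on the same material after MP3 encoding at various bit rates is the essence of the evaluation described above.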

P7-5 An Evaluation of Pre-Processing Algorithms for Rhythmic Pattern Analysis
Matthias Gruhne, Christian Dittmar, Daniel Gaertner, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany; Gerald Schuller, Ilmenau Technical University - Ilmenau, Germany
For the semantic analysis of polyphonic music, such as genre recognition, rhythmic pattern features (also called Beat Histogram) can be used. Feature extraction is based on the correlation of rhythmic information from drum instruments in the audio signal. In addition to drum instruments, the sounds of pitched instruments are usually also part of the music signal to analyze. This can have a significant influence on the correlation patterns. This paper describes the influence of pitched instruments on the extraction of rhythmic features and evaluates two different pre-processing methods. One method computes a sinusoidal and noise model, where its residual signal is used for feature extraction. In the second method, a drum transcription based on spectral characteristics of drum sounds is performed, and the rhythm pattern feature is derived directly from the occurrences of the drum events. Finally, the results are explained and compared in detail.
Convention Paper 7542 (Purchase now)

P7-6 A Framework for Producing Rich Musical Metadata in Creative Music Production
Gyorgy Fazekas, Yves Raimond, Mark Sandler, Queen Mary University of London - London, UK
Musical metadata may include references to individuals, equipment, procedures, parameters, or audio features extracted from signals. There are countless possibilities for using this data during the production process. An intelligent audio editor, besides internally relying on it, can be both producer and consumer of information about specific aspects of music production. In this paper we propose a framework for producing and managing meta information about a recording session, a single take or a subsection of a take. As basis for the necessary knowledge representation we use the Music Ontology with domain specific extensions. We provide examples on how metadata can be used creatively, and demonstrate the implementation of an extended metadata editor in a multitrack audio editor application.
Convention Paper 7543 (Purchase now)

P7-7 SoundTorch: Quick Browsing in Large Audio Collections
Sebastian Heise, Michael Hlatky, Jörn Loviscach, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany
Musicians, sound engineers, and foley artists face the challenge of finding appropriate sounds in vast collections containing thousands of audio files. Imprecise naming and tagging forces users to review dozens of files in order to pick the right sound. Acoustic matching is not necessarily helpful here as it needs a sound exemplar to match with and may miss relevant files. Hence, we propose to combine acoustic content analysis with accelerated auditioning: Audio files are automatically arranged in 2-D by psychoacoustic similarity. A user can shine a virtual flashlight onto this representation; all sounds in the light cone are played back simultaneously, their position indicated through surround sound. User tests show that this method can leverage the human brain's capability to single out sounds from a spatial mixture and enhance browsing in large collections of audio content.
Convention Paper 7544 (Purchase now)

P7-8 File System Tricks for Audio Production
Michael Hlatky, Sebastian Heise, Jörn Loviscach, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany
Not every file presented by a computer operating system needs to be an actual stream of independent bits. We demonstrate that different types of virtual files and folders including so-called "Filesystems in Userspace" (FUSE) allow streamlining audio content management with relatively little additional complexity. For instance, an off-the-shelf database system may present a distributed sound library through (seemingly) standard files in a project-specific hierarchy with no physical copying of the data involved. Regions of audio files may be represented as separate files; audio effect plug-ins may be displayed as collections of folders for on-demand processing while files are read. We address differences between operating systems, available implementations, and lessons learned when applying such techniques.
Convention Paper 7545 (Purchase now)

Friday, October 3, 10:00 am — 12:30 pm

TT4 - Paul Stubblebine Mastering/The Tape Project, San Francisco

A world-class mastering studio with credits that include classic recordings for The Grateful Dead and Santana and such new artists as Ferron, California Zephyr, and Jennifer Berezan. Now deeply involved with DVD as well as traditional audio mastering, the studio recently moved to a larger, full service Mission Street complex. The Tape Project remasters recordings for analog tape distribution.

Note: Maximum of 20 participants per tour.

Price: $30 (members), $40 (nonmembers)

Friday, October 3, 11:00 am — 1:00 pm

B5 - Loudness Workshop

John Chester
Marvin Caesar, Aphex
James Johnston, Neural Audio Corp.
Thomas Lund, TC Electronic A/S
Andrew Mason, BBC
Robert Orban, Orban/CRL
Jeffery Riedmiller, Dolby Laboratories

New challenges and opportunities await broadcast engineers concerned about optimum sound quality in this contemporary age of multichannel sound and digital broadcasting. The earliest studies in the measurement of loudness levels were directed to telephony issues, with the publication in 1933 of the equal-loudness contours of Fletcher and Munson, and the Bell Labs tests of more than a half-million listeners at the 1938 New York World's Fair demonstrating that age and gender are also important factors in hearing response. A quarter of a century later, broadcasters began to take notice of the often-conflicting requirements of controlling both modulation and loudness levels. These are still concerns today as new technologies are being adopted. This session will explore the current state of the art in the measurement and control of loudness levels and look ahead to the next generation of techniques that may be available to audio broadcasters.

Friday, October 3, 11:00 am — 1:00 pm

L5 - Practical Advice for Wireless Systems Users

Karl Winkler, Lectrosonics
Freddy Chancellor
Henry Cohen, Production Radio Rentals - NYC, NY, USA
Michael Pettersen, Shure Incorporated - Niles, IL, USA

From houses of worship to wedding bands to community theaters, there are small- to medium-sized wireless microphone systems and IEMs in use by the millions. Unlike the Super Bowl or the Grammys, these smaller systems often do not have dedicated technicians, sophisticated frequency coordination, or in many cases even the proper basic attention to system setup. This live sound event will begin with a basic discussion of the elements of properly choosing components, designing systems, and setting them up in order to minimize the potential for interference while maximizing performance. Topics covered will include antenna placement, antenna cabling, spectrum scanning, frequency coordination, gain structure, system monitoring, and simple testing/troubleshooting procedures. Planning for upcoming RF spectrum changes will also be covered briefly.

Friday, October 3, 11:30 am — 1:00 pm

P8 - Room Acoustics and Binaural Audio

P8-1 On the Minimum-Phase Nature of Head-Related Transfer Functions
Juhan Nam, Miriam A. Kolar, Jonathan S. Abel, Stanford University - Stanford, CA, USA
For binaural synthesis, head-related transfer functions (HRTFs) are commonly implemented as pure delays followed by minimum-phase systems. Here, the minimum-phase nature of HRTFs is studied. The cross-coherence between minimum-phase and unprocessed measured HRTFs was seen to be greater than 0.9 for a vast majority of the HRTFs, and was rarely below 0.8. Non-minimum-phase filter components resulting in reduced cross-coherence appeared in frontal and ipsilateral directions. The excess group delay indicates that these non-minimum-phase components are associated with regions of moderate HRTF energy. Other regions of excess phase correspond to high-frequency spectral nulls, and have little effect on cross-coherence.
Convention Paper 7546 (Purchase now)
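
The minimum-phase systems referred to here are conventionally built from a measured magnitude response via the real cepstrum. The sketch below shows that standard construction on a toy delayed response (not a measured HRTF); the folding method is textbook DSP, but the test signal and FFT size are assumptions of the example:

```python
import numpy as np

def minimum_phase(h, nfft=1024):
    """Minimum-phase reconstruction via the real cepstrum (folding method).
    Returns an impulse response with the same magnitude spectrum as h."""
    H = np.abs(np.fft.fft(h, nfft))
    # Real cepstrum of the magnitude response (clamped to avoid log(0)).
    c = np.fft.ifft(np.log(np.maximum(H, 1e-12))).real
    # Fold the anticausal part of the cepstrum onto the causal part.
    fold = np.zeros(nfft)
    fold[0] = c[0]
    fold[1:nfft // 2] = 2 * c[1:nfft // 2]
    fold[nfft // 2] = c[nfft // 2]
    return np.fft.ifft(np.exp(np.fft.fft(fold))).real

# Toy response: a short decaying sequence delayed by 20 samples.
h = np.zeros(256)
h[20:24] = [1.0, 0.5, 0.25, 0.125]
hm = minimum_phase(h)

# Same magnitude spectrum, but the energy is packed at time zero:
err = np.max(np.abs(np.abs(np.fft.fft(h, 1024)) - np.abs(np.fft.fft(hm, 1024))))
print("max magnitude-spectrum deviation: %.1e" % err)
print("peak moved from sample %d to sample %d"
      % (np.argmax(np.abs(h)), np.argmax(np.abs(hm))))
```

In the binaural-synthesis usage described above, the extracted delay is then reapplied as a pure delay in front of the minimum-phase filter.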

P8-2 Apparatus Comparison for the Characterization of Spaces
Adam Kestian, Agnieszka Roginska, New York University - NY, NY, USA
This work presents an extension of the Acoustic Pulse Reflectometry (APR) methodology that was previously used to obtain the characteristics of smaller acoustic spaces. Upon reconstructing larger spaces, the geometric configuration and characteristics of the measurement apparatus can be directly related to the clarity of the results. This paper describes and compares three measurement setups and apparatus configurations. The advantages and disadvantages of each methodology are discussed.
Convention Paper 7547 (Purchase now)

P8-3 Quantifying the Effect of Room Response on Automatic Speech Recognition Systems
Jeremy Anderson, John Harris, University of Florida - Gainesville, FL, USA
It has been demonstrated that the acoustic environment has an impact on timbre and speech intelligibility. Automatic speech recognition is an established area that suffers from the negative effects of mismatch between different room impulse responses (RIR). To better understand the changes imparted by the RIR, we have created synthetic responses to simulate utterances recorded in different locations. Using speech recognition techniques to quantify our results, we then looked for trends in performance to connect with impulse response changes.
Convention Paper 7548 (Purchase now)

P8-4 In Situ Determination of Acoustic Absorption Coefficients
Scott Mallais, University of Waterloo - Waterloo, Ontario, Canada
The determination of absorption characteristics for a given material is developed for in situ measurements. Experiments utilize maximum length sequences and a single microphone. The sound pressure is modeled using the compact source approximation. Emphasis is placed on low frequency resolution that is dependent on the geometry of both the loudspeaker-microphone-sample configuration and the room in which the measurement is performed. Methods used to overcome this limitation are discussed. The concept of the acoustic center is applied in the low frequency region, modifying the calculation of the absorption coefficient.
Convention Paper 7549 (Purchase now)

P8-5 Head-Related Transfer Function Customization by Frequency Scaling and Rotation Shift Based on a New Morphological Matching Method
Pierre Guillon, Laboratoire d’Acoustique de l’Université du Maine - Le Mans, France, and Orange Labs, Lannion, France; Thomas Guignard, Rozenn Nicol, Orange Labs - Lannion, France
Head-Related Transfer Function (HRTF) individualization is required to achieve high-quality Virtual Auditory Spaces. An alternative to acoustic measurements is the customization of non-individual HRTFs. To transform HRTF data, we propose a combination of frequency scaling and rotation shifts, whose parameters are predicted by a new morphological matching method. Mesh models of head and pinnae are acquired, and differences in size and orientation of pinnae are evaluated with a modified Iterative Closest Point (ICP) algorithm. Optimal HRTF transformations are computed in parallel. A relatively good correlation between morphological and transformation parameters is found and allows one to predict the customization parameters from the registration of pinna shapes. The resulting model achieves better customization than frequency scaling only, which shows that adding the rotation degree of freedom improves HRTF individualization.
Convention Paper 7550 (Purchase now)

Friday, October 3, 12:00 pm — 2:30 pm

TT5 - Singer V7 Studios/Universal Audio, San Francisco

Four-time Emmy Award-Winning Composer/Producer/Performer/Writer, and Director Scott Singer has been at the cutting edge of technology-based entertainment for three decades. Most recently Mr. Singer was the Technical Musical Director and Assistant Director for the High-Definition DVD live recordings of Boz Scaggs’ Jazz Album and Greatest Hits, and the HD simulcast of the San Francisco Opera Rigoletto.

Now in its 24th year of operation, Singer Productions and Singer Studios V7 continues to serve as a state of the art recording facility for both Mr. Singer’s projects as well as many other talented recording artists and performers. Scott has just completed a full studio remodel (Version 7) adding the world’s first Bentley Edition Recording Suite—featuring a custom British mixing desk from John Oram/Trident, the “GP40,” as well as classic high-end components from Neve, SSL, Universal Audio, RCA, and GML.

State of the Emulation

Dave Berners will be co-presenting and answering questions on plug-in emulations. They will demo UA gear, with in-studio live vocals by singer Kyah Doran.

Note: Maximum of 30 participants per tour.

Price: $30 (members), $40 (nonmembers)

Friday, October 3, 1:00 pm — 2:00 pm

Lunchtime Keynote: Dave Giovannoni of First Sounds

Before Edison—Recovering the World's First Audio Recordings

First Sounds, an informal collaborative of audio engineers and historians, recently corrected the historical record and made international headlines by playing back a phonautogram made in Paris in April 1860—a ghostly, ten-second evocation of a French folk song. This and other phonautograms establish a forgotten French typesetter as the first person to record reproducible airborne sound 17 years before Edison invented the phonograph. Primitive and nearly accidental, the world’s first audio recordings pose a unique set of technical challenges. David Giovannoni of First Sounds discusses their recovery and restoration and will premiere two newly restored recordings.

Friday, October 3, 2:00 pm — 4:45 pm

Recording Competition - Surround

The Student Recording Competition is a highlight at each convention. A distinguished panel of judges participates in critiquing finalists of each category in an interactive presentation and discussion. Student members can submit stereo and surround recordings in the categories classical, jazz, folk/world music, and pop/rock. Meritorious awards will be presented at the closing Student Delegate Assembly Meeting on Sunday.

Friday, October 3, 2:00 pm — 6:00 pm

TT6 - Tarpan Studios/Ursa Minor Arts & Media, San Rafael

World-renowned producer/artist Narada Michael Walden has owned this gem-like studio for over twenty years. During that time artists such as Aretha Franklin, Whitney Houston, Mariah Carey, Steve Winwood, Kenny G, and Sting have recorded gold and platinum albums here. The tour will also include URSA Minor Arts & Media, an innovative web and multimedia production company.

Note: Maximum of 20 participants per tour.

Price: $35 (members), $45 (nonmembers)

Friday, October 3, 2:30 pm — 4:00 pm

Compressors—A Dynamic Perspective

Fab Dupont
Dave Derr
Wade Goeke
Dave Hill
Hutch Hutchison
George Massenburg
Rupert Neve

A device that, some might say, is being abused by those involved in the “loudness wars,” the dynamic range compressor can also be a very creative tool. But how exactly does it work? Six of the audio industry’s top designers and manufacturers lift the lid on one of the key components in any recording, broadcast, or live sound signal chain. They will discuss the history, philosophy, and evolution of this often misunderstood processor. Is one compressor design better than another? What design features work best for which application? The panel will also reveal the workings behind the mysteries of feedback and feed-forward designs, side-chains, and hard and soft knees, and explore the uses of multiband, parallel, and serial compression.
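
As background for the panel's discussion of feed-forward topologies and soft knees: a dynamic range compressor reduces to a static gain curve (threshold, ratio, knee width) driven by a smoothed level detector with attack and release times. The Python sketch below is a minimal feed-forward design with a soft knee; it is not any panelist's circuit, and all parameter names and default values are illustrative:

```python
import numpy as np

def compress(x, fs, threshold_db=-20.0, ratio=4.0, knee_db=6.0,
             attack_ms=5.0, release_ms=50.0):
    """Feed-forward compressor: the gain is computed from the input level
    (a feedback design would derive it from the output instead)."""
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    env_db = -120.0  # smoothed level-detector state, in dB
    out = np.empty_like(x)
    for i, s in enumerate(x):
        level_db = 20.0 * np.log10(max(abs(s), 1e-9))
        # attack/release smoothing of the detected level
        a = a_att if level_db > env_db else a_rel
        env_db = a * env_db + (1.0 - a) * level_db
        # soft-knee static curve: compression fades in around the threshold
        over = env_db - threshold_db
        if over <= -knee_db / 2:
            gain_db = 0.0
        elif over < knee_db / 2:
            gain_db = (1 / ratio - 1) * (over + knee_db / 2) ** 2 / (2 * knee_db)
        else:
            gain_db = (1 / ratio - 1) * over
        out[i] = s * 10.0 ** (gain_db / 20.0)
    return out
```

In a feedback topology the detector would instead be fed from the compressor's own output, one of the design distinctions the panel will explore.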

Friday, October 3, 2:30 pm — 4:15 pm

M3 - Sonic Methodology and Mythology

Keith O. Johnson - Pacifica, CA, USA

Do extravagant designs and superlative specifications satisfy sonic expectations? Can power cords, interconnects, marker dyes and other components in a controversial lineup improve staging, clarity, and other features? Intelligent measurements and neural feedback studies support these sonic issues as well as predict misdirected methodology from speculative thought. Sonic changes and perceptual feats to hear them are possible and we'll explore recorders, LPs, amplifiers, conversion, wire, circuits and loudspeakers to observe how they create artifacts and interact in systems. Hearing models help create and interpret tests intended to excite predictive behaviors of components. Time domain, tone cluster and fast sweep signals along with simple test devices reveal small complex artifacts. Background knowledge of halls, recording techniques, and cognitive perception becomes helpful to interpret results, which can reveal simple explanations to otherwise remarkable physics. Other topics include power amplifiers that can ruin a recording session, noise propagation from regulators, singing wire, coherent noise, eigensonics, and speakers prejudicial to key signatures. Waveform perception, tempo shifting, and learned object sounds will be demonstrated.

Friday, October 3, 2:15 pm — 5:15 pm

TT7 - Sony Computer Entertainment America, Foster City

The Sony Computer Entertainment America Inc. (SCEA) audio facility in Foster City, CA, was constructed in 2006 to provide exceptional audio support for the PlayStation® family of products. This 13-room facility, in partnership with its 15-room sister facility in San Diego, provides dialog, music, and sound design support for first-party games on the PS3, PSP®, and PlayStation®2 game platforms. The Foster City facility has eleven 5.1 sound pods, one 7.1 mix room, and a recording studio. All rooms have hardware and software parity with the San Diego facility so that project work can be shared and team sizes can scale throughout the development cycle of a game. All rooms are THX approved and have Mac and PC computers running in an external machine room. Monitors are custom designed by Pelonis Sound and Acoustics, Inc.

God of War®, Uncharted: Drake’s Fortune, SOCOM: U.S. Navy SEALs, and MLB 08 The Show are some of the titles that have benefited from music, sound design, and dialog work created in the Foster City and San Diego audio facilities.

SCEA policy requires all visitors to sign a Non-Disclosure Agreement to enter the facility.

Note: Maximum of 46 participants per tour.

Price: $35 (members), $45 (nonmembers)

Friday, October 3, 2:30 pm — 5:00 pm

TT8 - Singer V7 Studios/Universal Audio, San Francisco

Four-time Emmy Award-Winning Composer/Producer/Performer/Writer, and Director Scott Singer has been at the cutting edge of technology-based entertainment for three decades. Most recently Mr. Singer was the Technical Musical Director and Assistant Director for the High-Definition DVD live recordings of Boz Scaggs’ Jazz Album and Greatest Hits, and the HD simulcast of the San Francisco Opera Rigoletto.

Now in its 24th year of operation, Singer Productions and Singer Studios V7 continues to serve as a state of the art recording facility for both Mr. Singer’s projects as well as many other talented recording artists and performers. Scott has just completed a full studio remodel (Version 7) adding the world’s first Bentley Edition Recording Suite—featuring a custom British mixing desk from John Oram/Trident, the “GP40,” as well as classic high-end components from Neve, SSL, Universal Audio, RCA, and GML.

State of the Emulation

Dave Berners will be co-presenting and answering questions on plug-in emulations. They will demo UA gear, with in-studio live vocals by singer Kyah Doran.

Note: Maximum of 30 participants per tour.

Price: $30 (members), $40 (nonmembers)

Friday, October 3, 2:30 pm — 6:30 pm

P9 - Multichannel Sound Reproduction

Chair: Durand Begault, NASA Ames Research Center - Mountain View, CA, USA

P9-1 An Investigation of 2-D Multizone Surround Sound Systems
Mark Poletti, Industrial Research Limited - Lower Hutt, Wellington, New Zealand
Surround sound systems can produce a desired sound field over an extended region of space by using higher-order Ambisonics. One application of this capability is the production of multiple independent sound fields in separate zones. This paper investigates multizone surround systems for the case of two-dimensional reproduction. A least-squares approach is used for deriving the loudspeaker weights for producing a desired single-frequency wave field in one of N zones. It is shown that reproduction in the active zone is more difficult when an inactive zone is in line with the virtual sound source and the active zone. Methods for controlling this problem are discussed.
Convention Paper 7551 (Purchase now)
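
The least-squares formulation described in this abstract can be illustrated in a few lines: stack the loudspeaker-to-control-point transfer matrices for the active and inactive zones, set the target field in the active zone and zero in the inactive zone, and solve for the loudspeaker weights. The Python sketch below uses the 3-D free-field Green's function as a simplified stand-in for the paper's 2-D formulation; the function names, geometry, and quiet-zone weighting are assumptions, not the author's method:

```python
import numpy as np

def green(src, pts, k):
    """Free-field point-source transfer matrix (3-D Green's function, used
    here as a simplified stand-in for a 2-D kernel)."""
    r = np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=2)
    return np.exp(-1j * k * r) / (4 * np.pi * r)

def multizone_weights(spk, active_pts, quiet_pts, target, k, quiet_weight=10.0):
    """Least-squares loudspeaker weights: reproduce `target` at active_pts
    while driving the field at quiet_pts toward zero."""
    A = np.vstack([green(spk, active_pts, k),
                   quiet_weight * green(spk, quiet_pts, k)])
    b = np.concatenate([target, np.zeros(len(quiet_pts))])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w
```

With more loudspeakers than control points the system is underdetermined and the least-squares solution can match the target field in the active zone while keeping the inactive zone quiet, which is the trade-off the paper analyzes.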

P9-2 Two-Channel Matrix Surround Encoding for Flexible Interactive 3-D Audio Reproduction
Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA
The two-channel matrix surround format is widely used for connecting the audio output of a video gaming system to a home theater receiver for multichannel surround reproduction. This paper describes the principles of a computationally-efficient interactive audio spatialization engine for this application. Positional cues including 3-D elevation are encoded for each individual sound source by frequency-independent interchannel phase and amplitude differences, rather than HRTF cues. A matrix surround decoder based on frequency-domain Spatial Audio Scene Coding (SASC) is able to faithfully reproduce both ambient reverberation and positional cues over headphones or arbitrary multichannel loudspeaker reproduction formats, while preserving source separation despite the intermediate encoding over only two channels.
Convention Paper 7552 (Purchase now)

P9-3 Is My Decoder Ambisonic?
Aaron Heller, SRI International - Menlo Park, CA, USA; Richard Lee, Pandit Littoral - Cooktown, Queensland, Australia; Eric Benjamin, Dolby Laboratories - San Francisco, CA, USA
In earlier papers, the present authors established the importance of various aspects of Ambisonic decoder design: a decoding matrix matched to the geometry of the loudspeaker array in use, phase-matched shelf filters, and distance compensation. These are needed for accurate reproduction of spatial localization cues, such as interaural time difference (ITD), interaural level difference (ILD), and distance cues. Unfortunately, many listening tests of Ambisonic reproduction reported in the literature either omit the details of the decoding used or utilize suboptimal decoding. In this paper we review the acoustic and psychoacoustic criteria for Ambisonic reproduction; present a methodology and tools for "black box" testing to verify the performance of a candidate decoder; and present and discuss the results of this testing on some widely used decoders.
Convention Paper 7553 (Purchase now)

P9-4 Exploiting Human Spatial Resolution in Surround Sound Decoder Design
David Moore, Jonathan Wakefield, University of Huddersfield - West Yorkshire, UK
This paper presents a technique whereby the localization performance of surround sound decoders can be improved in directions in which human hearing is more sensitive to sound source location. Research into the Minimum Audible Angle is explored and incorporated into a fitness function based upon a psychoacoustic model. This fitness function is used to guide a heuristic search algorithm to design new Ambisonic decoders for a 5-speaker surround sound layout. The derived decoder is successful in matching the variation in localization performance of the human listener with better performance to the front and rear and reduced performance to the sides. The effectiveness of the standard ITU 5-speaker layout versus a non-standard layout is also considered in this context.
Convention Paper 7554 (Purchase now)

P9-5 Surround System Based on Three-Dimensional Sound Field Reconstruction
Filippo M. Fazi, Philip A. Nelson, Jens E. Christensen, University of Southampton - Southampton, UK; Jeongil Seo, Electronics and Telecommunications Research Institute (ETRI) - Daejeon, Korea
The theoretical fundamentals and the simulated and experimental performance of an innovative surround sound system are presented. The proposed technology is based on the physical reconstruction of a three-dimensional target sound field over a region of the space using an array of loudspeakers surrounding the listening area. The computation of the loudspeaker gains includes the numerical or analytical solution of an integral equation of the first kind. The experimental setup and the measured reconstruction performance of a system prototype constituted by a three dimensional array of 40 loudspeakers are described and discussed.
Convention Paper 7555 (Purchase now)

P9-6 A Comparison of Wave Field Synthesis and Higher-Order Ambisonics with Respect to Physical Properties and Spatial Sampling
Sascha Spors, Jens Ahrens, Technische Universität Berlin - Berlin, Germany
Wave field synthesis (WFS) and higher-order Ambisonics (HOA) are two high-resolution spatial sound reproduction techniques aiming at overcoming some of the limitations of stereophonic reproduction techniques. In the past, the theoretical foundations of WFS and HOA have been formulated in quite different fashions. Although some work has been published that aims at comparing the two approaches, their similarities and differences are not well documented. This paper formulates the theory of both approaches in a common framework, highlights the different assumptions made to derive the driving functions, and examines the resulting physical properties of the reproduced wave field. Special attention will be drawn to the spatial sampling of the secondary sources, since the two approaches differ significantly here.
Convention Paper 7556 (Purchase now)

P9-7 Reproduction of Virtual Sound Sources Moving at Supersonic Speeds in Wave Field Synthesis
Jens Ahrens, Sascha Spors, Technische Universität Berlin - Berlin, Germany
In conventional implementations of wave field synthesis, moving sources are reproduced as sequences of stationary positions. As reported in the literature, this process introduces various artifacts. It has been shown recently that these artifacts can be reduced when the physical properties of the wave field of moving virtual sources are explicitly considered. However, the findings were only applied to virtual sources moving at subsonic speeds. In this paper we extend the published approach to the reproduction of virtual sound sources moving at supersonic speeds. The properties of the actual reproduced sound field are investigated via numerical simulations.
Convention Paper 7557 (Purchase now)

P9-8 An Efficient Method to Generate Particle Sounds in Wave Field Synthesis
Michael Beckinger, Sandra Brix, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany
Rendering a handful of virtual sound sources for wave field synthesis (WFS) in real time is now feasible with the calculation power of state-of-the-art personal computers. However, if immersive atmospheres containing thousands of sound particles, such as rain and applause, are to be rendered in real time for a large listening area with high spatial accuracy, the computational complexity increases enormously. A new algorithm, based on continuously generated impulse responses and subsequent convolutions, that renders many sound particles efficiently will be presented in this paper. The algorithm was verified by initial listening tests, and its computational complexity was evaluated as well.
Convention Paper 7558 (Purchase now)

Friday, October 3, 2:30 pm — 5:00 pm

P10 - Nonlinearities in Loudspeakers

Chair: Laurie Fincham, THX Ltd. - San Rafael, CA, USA

P10-1 Audibility of Phase Response Differences in a Stereo Playback System. Part 2: Narrow-Band Stimuli in Headphones and Loudspeakers
Sylvain Choisel, Geoff Martin, Bang & Olufsen A/S - Struer, Denmark
A series of experiments was conducted in order to measure the audibility thresholds of phase differences between channels using mismatched crossover networks. In Part 1 of this study, it was shown that listeners are able to detect very small inter-channel phase differences when presented with wide-band stimuli over headphones, and that the threshold was frequency dependent. This second part of the investigation focuses on listeners’ abilities with narrow-band signals (from 63 to 8000 Hz) in headphones as well as loudspeakers. The results confirm the frequency dependency of the audibility threshold over headphones, whereas for loudspeaker playback the threshold was essentially independent of frequency.
Convention Paper 7559 (Purchase now)

P10-2 Time Variance of the Suspension Nonlinearity
Finn Agerkvist, Technical University of Denmark - Lyngby, Denmark; Bo Rhode Petersen, Aalborg University - Esbjerg, Denmark
It is well known that the resonance frequency of a loudspeaker depends on how it is driven before and during the measurement. Measurements made right after exposing it to high levels of electrical power and/or excursion give lower values than what can be measured when the loudspeaker is cold. This paper investigates the changes in compliance that the driving signal can cause; the investigation includes low-level, short-duration measurements of the resonance frequency as well as high-power, long-duration measurements of the nonlinearity of the suspension. It is found that at low levels the suspension softens but recovers quickly. The high-power, long-term measurements affect the nonlinearity of the loudspeaker by increasing the compliance value for all values of displacement. This level dependency is validated with distortion measurements, and it is demonstrated how improved accuracy of the nonlinear model can be obtained by including the level dependency.
Convention Paper 7560 (Purchase now)

P10-3 A Study of the Creep Effect in Loudspeaker Suspension
Finn Agerkvist, Technical University of Denmark - Lyngby, Denmark; Knud Thorborg, Carsten Tinggaard, Tymphany A/S - Taastrup, Denmark
This paper investigates the creep effect, the viscoelastic behavior of loudspeaker suspension parts, which can be observed as an increase in displacement far below the resonance frequency. The creep effect means that the suspension cannot be modeled as a simple spring. The need for an accurate creep model is all the greater now that loudspeaker models are sought to be extended far into the nonlinear domain of the loudspeaker. Different creep models are investigated and implemented both in simple lumped-parameter models and in time-domain nonlinear models, and the simulation results are compared with a series of measurements on three versions of the same loudspeaker with different thicknesses and rubber types used in the surround.
Convention Paper 7561 (Purchase now)

P10-4 The Influence of Acoustic Environment on the Threshold of Audibility of Loudspeaker Resonances
Shelley Uprichard, Bang & Olufsen A/S - Struer, Denmark and University of Surrey, Guildford, Surrey, UK; Sylvain Choisel, Bang & Olufsen A/S - Struer, Denmark
Resonances in loudspeakers can produce a detrimental effect on sound quality. The reduction or removal of unwanted resonances has therefore become a recognized practice in loudspeaker tuning. This paper presents the results of a listening test that has been used to determine the audibility threshold of a single resonance in different acoustic environments: headphones, loudspeakers in a standard listening room, and loudspeakers in a car. Real loudspeakers were measured and the resonances modeled as IIR filters. Results show that there is a significant interaction between acoustic environment and program material.
Convention Paper 7562 (Purchase now)

P10-5 Confirmation of Chaos in a Loudspeaker System Using Time Series Analysis
Joshua Reiss, Queen Mary, University of London - London, UK; Ivan Djurek, Antonio Petosic, University of Zagreb - Zagreb, Croatia; Danijel Djurek, AVAC – Alessandro Volta Applied Ceramics, Laboratory for Nonlinear Dynamics - Zagreb, Croatia
The dynamics of an experimental electrodynamic loudspeaker are studied using the tools of chaos theory and time series analysis. Delay time, embedding dimension, fractal dimension, and other empirical quantities are determined from experimental data. Particular attention is paid to issues of stationarity in the system in order to identify sources of uncertainty. Lyapunov exponents and fractal dimension are measured using several independent techniques. Results are compared in order to establish independent confirmation of low-dimensional dynamics and a positive dominant Lyapunov exponent. We thus show that the loudspeaker may function as a chaotic system suitable for low-dimensional modeling and the application of chaos control techniques.
Convention Paper 7563 (Purchase now)

Friday, October 3, 2:30 pm — 4:00 pm

P11 - Listening Tests & Psychoacoustics

P11-1 Testing Loudness Models—Real vs. Artificial Content
James Johnston, Neural Audio Corp. - Kirkland, WA, USA
A variety of loudness models have been recently proposed and tested by various means. In this paper some basic properties of loudness are examined, and a set of artificial signals are designed to test the "loudness space" based on principles dating back to Harvey Fletcher, or arguably to Wegel and Lane. Some of these signals, designed to model "typical" content, seem to reinforce the results of prior loudness model testing. Other signals, less typical of standard content, seem to show that there are some substantial differences when these less common signals and signal spectra are used.
Convention Paper 7564 (Purchase now)

P11-2 Audibility of High Q-factor All-Pass Components in Head-Related Transfer Functions
Daniela Toledo, Henrik Møller, Aalborg University - Aalborg, Denmark
Head-related transfer functions (HRTFs) can be decomposed into minimum phase, linear phase, and all-pass components. It is known that low Q-factor all-pass sections in HRTFs are audible as lateral shifts when the interaural group delay at low frequencies is above 30 µs. The goal of our investigation is to test the audibility of high Q-factor all-pass components in HRTFs and the perceptual consequences of removing them. A three-alternative forced choice experiment has been conducted. Results suggest that high Q-factor all-pass sections are audible when presented alone, but inaudible when presented with their minimum phase HRTF counterpart. It is concluded that high Q-factor all-pass sections can be discarded in HRTFs used for binaural synthesis.
Convention Paper 7565 (Purchase now)

P11-3 A Psychoacoustic Measurement and ABR for the Sound Signals in the Frequency Range between 10 kHz and 24 kHz
Mizuki Omata, Musashi Institute of Technology - Tokyo, Japan; Kaoru Ashihara, Advanced Industrial Science and Technology - Tsukuba, Japan; Motoki Koubori, Yoshitaka Moriya, Masaki Kyouso, Shogo Kiryu, Musashi Institute of Technology - Tokyo, Japan
High-definition audio media such as SACD and DVD-Audio use a wide frequency range extending far beyond 20 kHz. However, the auditory characteristics at frequencies above 20 kHz are not yet well understood. As a first step toward clarifying these characteristics, we conducted psychoacoustic and auditory brainstem response (ABR) measurements for sound signals in the frequency range between 10 kHz and 24 kHz. At a frequency of 22 kHz, the hearing threshold in the psychoacoustic measurement could be measured for 4 of 5 subjects; the minimum sound pressure level was 80 dB. Thresholds of 100 dB in the ABR measurement could be measured for 1 of the 5 subjects.
Convention Paper 7566 (Purchase now)

P11-4 Quantifying the Strategy Taken by a Pair of Ensemble Hand-Clappers under the Influence of Delay
Nima Darabii, Peter Svensson, The Centre for Quantifiable Quality of Service in Communication Systems, NTNU - Trondheim, Norway; Snorre Farner, IRCAM - Paris, France
Pairs of subjects were placed in two acoustically isolated rooms and asked to clap together under the influence of delays up to 68 ms. Their trials were recorded and analyzed based on a definition of a compensation factor. This parameter was calculated from the recorded observations for both performers as a discrete function of time and taken as a measure of the strategy adopted by the subjects while clapping. The compensation factor was shown to have a strong individual as well as a fairly strong musical dependence. With increasing delay, the compensation factor was shown to increase, as needed to avoid a tempo decrease at such high latencies. Virtual anechoic conditions caused less deviation in this factor than reverberant conditions. A slightly positive compensation factor at very short latencies may lead to tempo acceleration, in accordance with the Chafe effect.
Convention Paper 7567 (Purchase now)

P11-5 Quantitative and Qualitative Evaluations for TV Advertisements Relative to the Adjacent Programs
Eiichi Miyasaka, Akiko Kimura, Musashi Institute of Technology - Yokohama, Kanagawa, Japan
The sound levels of advertisements (CMs) in Japanese conventional terrestrial analog broadcasting (TAB) were quantitatively compared with those in Japanese terrestrial digital broadcasting (TDB). The results show that the average CM sound level in TDB was about 2 dB lower and the average standard deviation was wider than in TAB, while there were few differences between TAB and TDB at some TV stations. Some CMs in TDB were perceived as clearly louder than the adjacent programs, although the sound level differences between the CMs and the programs were within only ±2 dB. Next, the methods used to insert CMs into the main programs in Japan were qualitatively investigated. The results show that some of these methods can irritate viewers unacceptably.
Convention Paper 7568 (Purchase now)

Friday, October 3, 4:00 pm — 6:45 pm

B6 - History of Audio Processing

Emil Torick
Dick Burden
Marvin Caesar, Aphex
Glen Clark, Glen Clark & Associates
Mike Dorrough, Dorrough Electronics
Frank Foti, Omnia
Greg J. Ogonowski, Orban/CRL
Bob Orban, Orban/CRL
Eric Small, Modulation Sciences

The participants of this session pioneered audio processing and developed the tools we still use today. A discussion of the developments, technology, and the “Loudness Wars” will take place. This session is a must if you want to understand how and why audio processing is used.

Friday, October 3, 4:30 pm — 6:00 pm

T7 - Sound in the UI

Jeff Essex, Audiosyncrasy - Albany, CA, USA

Many computer and consumer electronics products use sound as part of their UI, both to communicate actions and to create a "personality" for their brand. This session will present numerous real-world examples of sounds created for music players, set-top boxes and operating systems. We'll follow projects from design to implementation with special attention to solving real-world problems that arise during development. We'll also discuss some philosophies of sound design, showing examples of how people respond to various audio cues and how those reactions can be used to convey information about the status of a device (navigation through menus, etc.).

Friday, October 3, 4:30 pm — 7:00 pm

TT9 - Singer V7 Studios/Universal Audio, San Francisco

Four-time Emmy Award-Winning Composer/Producer/Performer/Writer, and Director Scott Singer has been at the cutting edge of technology-based entertainment for three decades. Most recently Mr. Singer was the Technical Musical Director and Assistant Director for the High-Definition DVD live recordings of Boz Scaggs’ Jazz Album and Greatest Hits, and the HD simulcast of the San Francisco Opera Rigoletto.

Now in its 24th year of operation, Singer Productions and Singer Studios V7 continues to serve as a state of the art recording facility for both Mr. Singer’s projects as well as many other talented recording artists and performers. Scott has just completed a full studio remodel (Version 7) adding the world’s first Bentley Edition Recording Suite—featuring a custom British mixing desk from John Oram/Trident, the “GP40,” as well as classic high-end components from Neve, SSL, Universal Audio, RCA, and GML.

State of the Emulation

Dave Berners will be co-presenting and answering questions on plug-in emulations. They will demo UA gear, with in-studio live vocals by singer Kyah Doran.

Note: Maximum of 30 participants per tour.

Price: $30 (members), $40 (nonmembers)

Friday, October 3, 5:00 pm — 6:30 pm

W7 - Same Techniques, Different Technologies—Recurring Strategies for Producing Game, Web, and Mobile Audio

Peter Drescher
Steve Horowitz, NickOnline
George "The Fatman" Sanger, Legendary Game Audio Guru
Guy Whitmore, Microsoft Game Studio

When any new technology develops, the limitations of current systems are inevitably met. Bandwidth constraints then generate a class of techniques designed to maximize information transfer. Over time as bottlenecks expand, new kinds of applications become possible, making previous methods and file formats obsolete. By the time broadband access becomes available, we can observe a similar progression taking place in the next developing technology. The workshop discusses this trend as exhibited in the gaming, Internet, and mobile industries, with particular emphasis on audio file types and compression techniques. The presenter will compare and contrast obsolete tricks of the trade with current practices and invite industry veterans to discuss the trend from their points of view. Finally the panel makes predictions about the evolution of media.

Friday, October 3, 5:00 pm — 6:30 pm

P12 - Amplifiers and Automotive Audio

P12-1 Imperfections and Possible Advances in Analog Summing Amplifier DesignMilan Kovinic, MMK Instruments - Belgrade, Serbia; Dragan Drincic, Advanced School for Electrical & Computer Engineering - Belgrade, Serbia; Sasha Jankovic, OXYGEN-Digital, Parkgate Studio - Sussex, UK
The major requirement in the design of an analog summing amplifier is the quality of the summing bus. The key problem in most common designs is the artifact of summing-bus impedance, which cannot be considered a true physical impedance because it is generated by negative feedback. The loop gain of the amplifier limits performance at higher audio frequencies, where the loop gain is lower, increasing inter-channel crosstalk. An inevitable effect of heavy feedback is the amplifier's increased susceptibility to oscillation, as well as its sensitivity to RFI. The advanced solution presented in this paper uses a transistor common-base pair (CB-CB) configuration as the summing bus. The CB pair offers inherently low input impedance, low noise, very good frequency response and, very importantly, makes the application of heavy overall feedback unnecessary.
Convention Paper 7569 (Purchase now)

P12-2 A Switchmode Power Supply Suitable for Audio Power AmplifiersJay Gordon, Factor One Inc. - Keyport, NJ, USA
Power supplies for audio amplifiers have different requirements than typical commercial power supplies. A tabulation of power supply parameters that affect the audio application is presented and discussed. Different types of audio amplifiers are categorized and shown to have different requirements. Over time new technologies have emerged that affect the implementation of AC to DC converters used in audio amplifiers. A brief history of audio power supply technology is presented. The evolution of the newly proposed interleaved boost with LLC resonant half bridge topology from preceding technologies is shown. The operation of the new topology is explained and its advantages are shown by a simulation of the circuit.
Convention Paper 7570 (Purchase now)

P12-3 On the Optimization of Enhanced CascodeDimitri Danyuk, Consultant - Miami, FL, USA
Twenty years ago the enhanced cascode and other circuit topologies based on the same design principles were presented to audio amplifier designers. The circuit was intended for transconductance gain stages and current sources. The enhanced cascode was used in some commercial products but has not received wide adoption. It has been speculated that the enhanced cascode has reduced phase margin and, at times, higher distortion compared to the conventional cascode. Here the enhanced cascode is analyzed on the basis of distortion and frequency response, and it is shown how to make the most of it. An optimized novel circuit topology is presented.
Convention Paper 7571 (Purchase now)

P12-4 An Active Load and Test Method for Evaluating the Efficiency of Audio Power AmplifiersHarry Dymond, Phil Mellor, University of Bristol - Bristol, UK
This paper presents the design, implementation, and use of an “active load” for audio power amplifier efficiency testing. The active load can simulate linear complex loads representative of real-world amplifier operation with a load modulus between 4 and 50 ohms inclusive, load phase-angles between -60° and +60° inclusive, and operates from 20 to 20,000 Hz. The active load allows for the development of an automated test procedure for evaluating the efficiency of an audio power amplifier across a range of output voltage amplitudes, load configurations, and output signal frequencies. The results of testing a class-B and a class-D amplifier, each rated at 100 watts into 8 ohms, are presented.
Convention Paper 7572 (Purchase now)
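
The efficiency figure at the heart of this paper can be sketched numerically. The snippet below is an illustrative calculation only (function names and the supply-power measurement are assumptions, not the authors' test procedure): for a complex load, the real power delivered is Vrms·Irms·cos(θ), which with |Z| and phase θ gives Vrms²·cos(θ)/|Z|.

```python
import numpy as np

def real_output_power(v_rms, z_mod, z_phase_deg):
    """Real (dissipated) power delivered to a complex load.
    z_mod is the load modulus in ohms, z_phase_deg its phase angle."""
    return v_rms ** 2 * np.cos(np.radians(z_phase_deg)) / z_mod

def efficiency(v_rms, z_mod, z_phase_deg, p_supply):
    """Amplifier efficiency at one test point: real output power
    divided by the measured DC supply input power (assumed known)."""
    return real_output_power(v_rms, z_mod, z_phase_deg) / p_supply
```

Sweeping `z_mod` over 4 to 50 ohms and `z_phase_deg` over ±60 degrees reproduces the load grid described in the abstract.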

P12-5 An Objective Method of Measuring Subjective Click-and-Pop Performance for Audio AmplifiersKymberly Christman (Schmidt), Maxim Integrated Products - Sunnyvale, CA, USA
Click-and-pop refers to any “clicks” and “pops” or other unwanted, audio-band transient signals that are reproduced by headphones or loudspeakers when the audio source is turned on or off. Until recently, the industry’s characterization of this undesirable effect has been almost purely subjective. Marketing phrases such as “low pop noise” and “clickless/popless operation” illustrate the subjectivity applied in quantifying click-and-pop performance. This paper presents a method that objectively quantifies this parameter, allowing meaningful, repeatable comparisons to be drawn between different components. Further, results of a subjective click-and-pop listening test are presented to provide a baseline for objectionable click-and-pop levels in headphone amplifiers.
Convention Paper 7573 (Purchase now)

P12-6 Effective Car Audio System Enabling Individual Signal Processing Operations of Coincident Multiple Audio Sources through Single Digital Audio Interface LineChul-Jae Yoo, In-Sik Ryu, Hyundai Autonet - South Korea
There are three major audio sources in recent car environments: primary audio (usually music, including radio), navigation voice prompts, and hands-free voice. Listening situations in cars include not only a single audio source but also concurrent multiple sources: for example, navigation guidance while listening to music, or either of these while talking on a hands-free cell phone. In this paper a conventional external amplifier system, connected to the head unit by three audio interface lines, is first reviewed. Then an effective automotive audio system with a single SPDIF interface line, capable of concurrently processing the above three kinds of audio sources, is proposed. The new system reduces the wire harness in the car and also improves voice quality by transmitting voice signals over the SPDIF digital line rather than over analog lines.
Convention Paper 7574 (Purchase now)

P12-7 Digital Equalization of Automotive Sound Systems Employing Spectral Smoothed FIR FiltersMarco Binelli, Angelo Farina, University of Parma - Parma, Italy
In this paper we investigate the use of spectrally smoothed FIR filters for equalizing a car audio system. A further goal is to build short filters that can run on DSPs with limited computing power. The inversion algorithm is based on the Nelson-Kirkeby method and on independent phase and magnitude smoothing, by means of a continuous-phase method as shown by Panzer and Ferekidis. The filter is designed to reach a "target" frequency response, not necessarily flat, using a small number of taps while maintaining good performance everywhere inside the car's cockpit. As listening tests also show, smoothing and the choice of the right target frequency response improve the performance of car audio systems.
Convention Paper 7575 (Purchase now)
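
The Nelson-Kirkeby inversion mentioned in the abstract can be illustrated with a minimal frequency-domain sketch. This is not the authors' implementation (their independent phase/magnitude smoothing is omitted, and the function name, tap count, and regularization constant are assumptions); it shows only the core regularized inverse conj(H)/(|H|² + β), which limits gain where the measured response is weak.

```python
import numpy as np

def kirkeby_inverse(h, n_taps=512, beta=1e-3):
    """Regularized FIR inverse of a measured impulse response h
    (Nelson-Kirkeby style); beta bounds the gain near response nulls."""
    n_fft = 4 * n_taps
    H = np.fft.rfft(h, n_fft)
    # Core regularized inversion: conj(H) / (|H|^2 + beta)
    H_inv = np.conj(H) / (np.abs(H) ** 2 + beta)
    h_inv = np.fft.irfft(H_inv, n_fft)
    # Shift to make the filter causal, then truncate and window
    h_inv = np.roll(h_inv, n_taps // 2)[:n_taps]
    return h_inv * np.hanning(n_taps)
```

Convolving the original response with the returned filter should yield an approximately flat magnitude, at the cost of a delay of half the filter length.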

P12-8 Implementation of a Generic Algorithm on Various Automotive PlatformsThomas Esnault, Jean-Michel Raczinski, Arkamys - Paris, France
This paper describes a methodology to adapt a generic automotive algorithm to various embedded platforms while keeping the same audio rendering. To get over the limitations of the target DSPs, we have developed tools to control the transition from one platform to another including algorithm adaptation and coefficients computing. Objective and subjective validation processes allow us to certify the quality of the adaptation. With this methodology, productivity has been increased in an industrial context.
Convention Paper 7576 (Purchase now)

P12-9 Advanced Audio Algorithms for a Real Automotive Digital Audio SystemStefania Cecchi, Lorenzo Palestini, Paolo Peretti, Emanuele Moretti, Francesco Piazza, Università Politecnica delle Marche - Ancona, Italy; Ariano Lattanzi, Ferruccio Bettarelli, Leaff Engineering - Porto Potenza Picena (MC), Italy
In this paper an innovative modular digital audio system for car entertainment is proposed. The system is based on a plug-in-based real-time software framework allowing reconfigurability and flexibility. Each plug-in is dedicated to a particular audio task, such as equalization or crossover filtering, implementing innovative algorithms. The system has been tested in a real car environment with a hardware platform comprising professional audio equipment, running on a PC. Informal listening tests have been performed to validate the overall audio quality, and satisfactory results were obtained.
Convention Paper 7577 (Purchase now)

Friday, October 3, 7:00 pm — 8:30 pm

Heyser Lecture
followed by
Technical Council

The Richard C. Heyser distinguished lecturer for the 125th AES Convention is Floyd Toole. Toole studied electrical engineering at the University of New Brunswick and at the Imperial College of Science and Technology, University of London, where he received a Ph.D. In 1965 he joined the National Research Council of Canada, where he reached the position of Senior Research Officer in the Acoustics and Signal Processing Group. In 1991, he joined Harman International Industries, Inc. as Corporate Vice President – Acoustical Engineering. In this position he worked with all Harman International companies, and directed the Harman Research and Development Group, a central resource for technology development and subjective measurements, retiring in 2007.

Toole’s research has focused on the acoustics and psychoacoustics of sound reproduction in small rooms, directed to improving engineering measurements, objectives for loudspeaker design and evaluation, and techniques for reducing variability at the loudspeaker/room/listener interface. For papers on these subjects he has received two AES Publications Awards and the AES Silver Medal. He is a Fellow and Past President of the AES and a Fellow of the Acoustical Society of America. In September, 2008, he was awarded the CEDIA Lifetime Achievement Award. He has just completed a book Sound Reproduction: Loudspeakers and Rooms (Focal Press, 2008). The title of his lecture is, “Sound Reproduction: Where We Are and Where We Need to Go.”

Over the past twenty years scientific research has made considerable progress in identifying the significant variables in sound reproduction and in clarifying the psychoacoustic relationships between measurements and perceptions. However, this knowledge is not widespread, and the audio industry remains burdened by unsubstantiated practices and folklore. Oft repeated beliefs can have status and influence commensurate with scientific facts.

One problem has been that much of the essential data was obscured by disorder: the knowledge was buried in papers in numerous books and journals, indexed under many different topics, and sometimes a key point was peripheral to the main subject of the paper. Assembling and organizing the information was the purpose of my recent book, Sound Reproduction (Focal Press, 2008). It turns out that we know a great deal about the acoustics and psychoacoustics of loudspeakers in small rooms, and this knowledge provides substantial guidance about designing and integrating systems to provide high quality sound reproduction.

However, what we hear over these installations is of variable sound quality and, more importantly, not always what was intended by the artists. Inconsistent and imperfect devices and practices in both the professional and consumer domains result in mismatches between recording and playback. Standards exist but are not often used. Many of them are fundamentally flawed. If we in the audio industry are serious about our mission to deliver the aural art in music and movies, as it was created, to consumers, there is work to be done. It begins with agreeing on the objectives, and is followed by an application of the science we know.

Saturday, October 4, 9:00 am — 10:45 am

B7 - DTV Audio Myth Busters

Jim Kutzner, PBS
Robert Bleidt, Fraunhofer USA Digital Media Technologies
Tim Carroll, Linear Acoustic, Inc.
Ken Hunold, Dolby Laboratories
David Wilson, Consumer Electronics Association

There is no limit to the confusion created by the audio options in DTV. What do the systems really do? What happens when the systems fail? How much control can be exercised at each step in the content food chain? There are thousands of opinions and hundreds of options, but what really works and how do you keep things under control? Bring your questions and join the discussion as four experts from different stages in the chain try to sort it out.

Saturday, October 4, 9:00 am — 10:45 am

T9 - How I Does Filters: An Uneducated Person’s Way to Design Highly Regarded Digital Equalizers and Filters

Peter Eastty, Oxford Digital Limited - Oxfordshire, UK

Much has been written in many learned papers about the design of audio filters and equalizers; this is NOT another one of those. The presenter is a bear of little brain and has, over the years, had to reduce the subject of digital filtering to bite-sized lumps containing a number of simple recipes that have got him through most of his professional life. Complete practical implementations are given of high-pass and low-pass multi-order filters, bell (or presence) filters, and shelving filters, including the infrequently seen higher-order types. The tutorial is designed for the complete novice; it is light on mathematics and heavy on explanation and visualization. Even so, the provided code works and can be put to practical use.

Saturday, October 4, 9:00 am — 10:30 am

T10 - New Technologies for Up to 7.1 Channel Playback in Any Game Console Format

Geir Skaaden, Neural Audio Corp. - Kirkland, WA, USA

This tutorial investigates methods for increasing the number of audio channels in a gaming console beyond its current hardware limitations. The audio engine within a game is capable of creating a 360° environment; however, the console hardware uses only a few channels to represent this world. If home playback systems are commonly able to reproduce up to 7.1 channels, how do game developers increase the number of playback channels for a platform that is limited to 2 or 5 outputs? New encoding technologies make this possible. Current methods will be described, in addition to new console-independent technologies that run within the game engine. Game content will be used to demonstrate the encode/decode process.

Saturday, October 4, 9:00 am — 10:30 am

P13 - Spatial Perception

Chair: Richard Duda, San Jose State University - San Jose, CA, USA

P13-1 Individual Subjective Preferences for the Relationship between SPL and Different Cinema Shot SizesRoberto Munoz, U. Tecnológica de Chile INACAP - Santiago, Chile; Manuel Recuero, Universidad Politécnica de Madrid - Madrid, Spain; Manuel Gazzo, Diego Duran, U. Tecnológica de Chile INACAP - Santiago, Chile
The main motivation of this study was to find Individual Subjective Preferences (ISP) for the relationship between SPL and different cinema shot sizes. By means of the psychophysical method of Adjustment (MA), the preferred SPL for four of the most frequently used shot sizes, i.e., long shot, medium shot, medium close-up, and close-up, was subjectively quantified.
Convention Paper 7578 (Purchase now)

P13-2 Improvements to a Spherical Binaural Capture Model for Objective Measurement of Spatial Impression with Consideration of Head MovementsChungeun Kim, Russell Mason, Tim Brookes, University of Surrey - Guildford, Surrey, UK
This research aims, ultimately, to develop a system for the objective evaluation of spatial impression, incorporating the finding from a previous study that head movements are naturally made during subjective evaluation. A spherical binaural capture model, comprising a head-sized sphere with multiple attached microphones, has been proposed. Research already conducted found significant differences in interaural time and level differences, and in the cross-correlation coefficient, between this spherical model and a head and torso simulator. An attempt is made to lessen these differences by adding a torso and simplified pinnae to the sphere. Further analysis of the head movements made by listeners in a range of listening situations determines the range of head positions that needs to be taken into account. Analysis of these results informs the optimum positioning of the microphones around the sphere model.
Convention Paper 7579 (Purchase now)

P13-3 Predicting Perceived Off-Center Sound Degradation in Surround Loudspeaker Setups for Various Multichannel Microphone TechniquesNils Peters, Bruno Giordano, Sungyoung Kim, McGill University - Montreal, Quebec, Canada; Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA; Stephen McAdams, McGill University - Montreal, Quebec, Canada
Multiple listening tests were conducted to examine the influence of microphone techniques on the quality of sound reproduction. Generally, testing focuses on the central listening position (CLP), and neglects off-center listening positions. Exploratory tests focusing on the degradation in sound quality at off-center listening positions were presented at the 123rd AES Convention. Results showed that the recording technique does influence the degree of sound degradation at off-center positions. This paper focuses on the analysis of the binaural re-recording at the different listening positions in order to interpret the results of the previous listening tests. Multiple linear regression is used to create a predictive model which accounts for 85% of the variance in the behavioral data. The primary successful predictors were spectral and the secondary predictors were spatial in nature.
Convention Paper 7580 (Purchase now)

Saturday, October 4, 9:00 am — 12:00 pm

P14 - Listening Tests & Psychoacoustics

Chair: Poppy Crum, Johns Hopkins University - Baltimore, MD, USA

P14-1 Rapid Learning of Subjective Preference in EqualizationAndrew Sabin, Bryan Pardo, Northwestern University - Evanston, IL, USA
We describe and test an algorithm to rapidly learn a listener’s desired equalization curve. First, a sound is modified by a series of equalization curves. After each modification, the listener indicates how well the current sound exemplifies a target sound descriptor (e.g., “warm”). After rating, a weighting function is computed where the weight of each channel (frequency band) is proportional to the slope of the regression line between listener responses and within-channel gain. Listeners report that sounds generated using this function capture their intended meaning of the descriptor. Machine ratings generated by computing the similarity of a given curve to the weighting function are highly correlated to listener responses, and asymptotic performance is reached after only ~25 listener ratings.
Convention Paper 7581 (Purchase now)
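
The weighting rule described in the abstract, in which each band's weight is the slope of the regression line between listener ratings and within-band gain, can be sketched directly. This is an illustrative reconstruction, not the authors' code; the function name and data shapes are assumptions.

```python
import numpy as np

def descriptor_weights(gains, ratings):
    """Per-band weights for a subjective descriptor (e.g., "warm").

    gains   : (n_trials, n_bands) equalization gains played to the listener
    ratings : (n_trials,) how well each sound exemplified the descriptor

    Each band's weight is the least-squares slope of ratings against
    that band's gain: cov(gain, rating) / var(gain).
    """
    g = gains - gains.mean(axis=0)
    r = ratings - ratings.mean()
    return (g * r[:, None]).sum(axis=0) / (g ** 2).sum(axis=0)
```

Applying the resulting weight vector as an EQ curve approximates the listener's intended meaning of the descriptor, as the abstract reports.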

P14-2 An Initial Validation of Individualized Crosstalk Cancellation Filters for Binaural Perceptual ExperimentsAlastair Moore, Anthony Tew, University of York - York, UK; Rozenn Nicol, France Télécom R&D - Lannion, France
Crosstalk cancellation provides a means of delivering binaural stimuli to a listener for psychoacoustic research that avoids many of the problems of using headphones in experiments. The aim of this study was to determine whether individual crosstalk cancellation filters can be used to present binaural stimuli that are perceptually indistinguishable from a real sound source. The fast deconvolution with frequency-dependent regularization method was used to design the crosstalk cancellation filters. The reproduction loudspeakers were positioned at ±90 degrees azimuth and the synthesized location was 0 degrees azimuth. Eight listeners were tested with three types of stimuli. In twenty-two out of the twenty-four listener/stimulus combinations there were no perceptible differences between the real and virtual sources. The results suggest that this method of producing individualized crosstalk cancellation filters is suitable for binaural perceptual experiments.
Convention Paper 7582 (Purchase now)

P14-3 Reverberation Echo Density PsychoacousticsPatty Huang, Jonathan S. Abel, Hiroko Terasawa, Jonathan Berger, Stanford University - Stanford, CA, USA
A series of psychoacoustic experiments were carried out to explore the relationship between an objective measure of reverberation echo density, called the normalized echo density (NED), and subjective perception of the time-domain texture of reverberation. In one experiment, 25 subjects evaluated the dissimilarity of signals having static echo densities. The reported dissimilarities matched absolute NED differences with an R2 of 93%. In a 19-subject experiment, reverberation impulse responses having evolving echo densities were used. With an R2 of 90% the absolute log ratio of the late field onset times matched reported dissimilarities between impulse responses. In a third experiment, subjects reported breakpoints in the character of static echo patterns at NED values of 0.3 and 0.7.
Convention Paper 7583 (Purchase now)
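
The normalized echo density (NED) used throughout this paper can be sketched from its published definition (Abel and Huang): in each sliding window, count the fraction of samples lying more than one standard deviation from zero, and normalize by erfc(1/√2), the fraction Gaussian noise would give. The code below is a hypothetical re-implementation for illustration; the window length is an assumption.

```python
import numpy as np
from math import erfc, sqrt

def echo_density(ir, win=1024):
    """Normalized echo density profile of an impulse response.

    Fully dense, Gaussian-like reverberation approaches NED = 1;
    sparse early reflections give values near 0."""
    norm = erfc(1.0 / sqrt(2.0))  # ~0.3173, Gaussian reference fraction
    ned = np.zeros(len(ir) - win)
    for n in range(len(ned)):
        seg = ir[n:n + win]
        # Fraction of window samples beyond one standard deviation
        ned[n] = np.mean(np.abs(seg) > seg.std()) / norm
    return ned
```

On white Gaussian noise the profile hovers near 1, while a train of isolated impulses yields values close to 0, matching the measure's intended behavior.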

P14-4 Optimal Modal Spacing and Density for Critical ListeningBruno Fazenda, Matthew Wankling, University of Huddersfield - Huddersfield, West Yorkshire, UK
This paper presents a study on the subjective effects of modal spacing and density. These are measures often used as indicators to define particular aspect ratios and source positions to avoid low frequency reproduction problems in rooms. These indicators imply a given modal spacing leading to a supposedly less problematic response for the listener. An investigation into this topic shows that subjects can identify an optimal spacing between two resonances associated with a reduction of the overall decay. Further work to define a subjective counterpart to the Schroeder Frequency has revealed that an increase in density may not always lead to an improvement, as interaction between mode-shapes results in serious degradation of the stimulus, which is detectable by listeners.
Convention Paper 7584 (Purchase now)

P14-5 The Illusion of Continuity Revisited on Filling Gaps in the Saxophone SoundPiotr Kleczkowski, AGH University of Science and Technology - Cracow, Poland
Some time-frequency gaps were cut from a recording of a motif played legato on the saxophone. Subsequently, the gaps were filled with various sonic material: noises and sounds of an accompanying band. The quality of the saxophone sound processed in this way was investigated by listening tests. In all of the tests, the saxophone seemed to continue through the gaps, an impairment in quality being observed as a change in the tone color or an attenuation of the sound level. There were two aims of this research. First, to investigate whether the continuity illusion contributed to this effect, and second, to discover what kind of sonic material filling the gaps would cause the least deterioration in sound quality.
Convention Paper 7585 (Purchase now)

P14-6 The Incongruency Advantage for Sounds in Natural ScenesBrian Gygi, Veterans Affairs Northern California Health Care System - Martinez, CA, USA; Valeriy Shafiro, Rush University Medical Center - Chicago, IL, USA
This paper tests identification of environmental sounds (dogs barking or cars honking) in familiar auditory background scenes (street ambience, restaurants). Initial results with subjects trained on both the background scenes and the sounds to be identified showed a significant advantage of about 5% better identification accuracy for sounds that were incongruous with the background scene (e.g., a rooster crowing in a hospital). Studies with naïve listeners showed this effect is level-dependent: there is no advantage for incongruent sounds up to a Sound/Scene ratio (So/Sc) of –7.5 dB, after which there is again about 5% better identification. Modeling using spectral-temporal measures showed that saliency based on acoustic features cannot account for this difference.
Convention Paper 7586 (Purchase now)

Saturday, October 4, 9:00 am — 10:30 am

P15 - Loudspeakers—Part 1

P15-1 Advanced Passive Loudspeaker ProtectionScott Dorsey, Kludge Audio - Williamsburg, VA, USA
In a follow-on to a previous conference paper (AES Convention Paper 5881), the author explores the use of polymeric positive temperature coefficient (PPTC) protection devices that have a discontinuous I/V curve that is the result of a physical state change. He gives a simple model for designing networks employing incandescent lamps and PPTC devices together to give linear operation at low levels while providing effective limiting at higher levels to prevent loudspeaker damage. Some discussion of applications in current service is provided.
Convention Paper 7588 (Purchase now)

P15-2 Target Modes in Moving Assemblies of Compression Drivers and Other LoudspeakersFernando Bolaños, Pablo Seoane, Acústica Beyma S.A. - Valencia, Spain
This paper deals with how the important modes in a moving assembly of compression drivers and other loudspeakers can be found. Dynamic importance is an essential tool for those who work on modal analysis of systems with many degrees of freedom and complex structures. The calculation or measurement of important modes in moving assemblies is an objective (absolute) method for finding the relevant modes that act on the dynamics of these transducers. Our paper discusses axial modes and breathing modes, which are basic for loudspeakers. The modal generalized masses and the participation factors are useful tools for finding the moving assembly's important modes (target modes). The strain energy of the moving assembly, which represents the amount of available potential energy, is essential as well.
Convention Paper 7589 (Purchase now)

P15-3 Determining Manufacture Variation in Loudspeakers Through Measurement of Thiele/Small ParametersScott Laurin, Karl Reichard, Pennsylvania State University - State College, PA, USA
Thiele/Small parameters have become a standard for characterizing loudspeakers. Using fairly straightforward methods, the Thiele/Small parameters for twenty nominally identical loudspeakers were determined. The data were compiled to determine the manufacturing variations. Manufacturing tolerances can have a large impact on the variability and quality of loudspeakers produced. Generally, when more stringent tolerances are applied, there is less variation and drivers become more expensive. Now that the loudspeakers have been characterized, each one will be driven to failure. Some loudspeakers will be intentionally degraded to accelerate failures. The goal is to correlate variation in the Thiele/Small parameters with variation in speaker failure modes and operating life.
Convention Paper 7590 (Purchase now)
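
The "fairly straightforward methods" behind such a characterization can be sketched with the classic impedance-curve procedure for estimating fs and the Q factors. This is a generic textbook method, not the authors' measurement setup; the function name and inputs are illustrative assumptions.

```python
import numpy as np

def small_signal_q(freqs, z_mag, re):
    """Estimate fs, Qms, Qes, Qts from a driver's free-air impedance
    magnitude curve.

    freqs : ascending frequency axis in Hz
    z_mag : measured impedance magnitude in ohms
    re    : DC voice-coil resistance in ohms
    """
    i_peak = int(np.argmax(z_mag))
    fs = freqs[i_peak]                 # resonance at the impedance peak
    r0 = z_mag[i_peak] / re            # impedance ratio at resonance
    z_12 = re * np.sqrt(r0)            # level defining the f1/f2 points
    f1 = freqs[:i_peak][z_mag[:i_peak] <= z_12][-1]
    f2 = freqs[i_peak:][z_mag[i_peak:] <= z_12][0]
    qms = fs * np.sqrt(r0) / (f2 - f1)
    qes = qms / (r0 - 1.0)
    qts = qms * qes / (qms + qes)
    return fs, qms, qes, qts
```

Running this over impedance curves from nominally identical drivers gives the parameter spreads the paper compiles.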

P15-4 About Phase Optimization in Multitone Excitations Delphine Bard, Vincent Meyer, University of Lund - Lund, Sweden
Multitone signals are often used as excitation for the characterization of audio systems. The frequency spectrum of the response consists of harmonics of the frequencies contained in the excitation and of intermodulation products. Besides the choice of frequencies, made so as to avoid frequency overlapping, there is also the need to choose adequate magnitudes and phases for the different components that constitute the multitone signal. In this paper we investigate how the choice of phases impacts the properties of the multitone signal, and also how it affects the performance of a compensation method based on Volterra kernels that uses multitone signals as excitation.
Convention Paper 7591 (Purchase now)

P15-5 Viscous Friction and Temperature Stability of the Mid-High Frequency LoudspeakerIvan Djurek, Antonio Petosic, University of Zagreb - Zagreb, Croatia; Danijel Djurek, Alessandro Volta Applied Ceramics (AVAC) - Zagreb, Croatia
Mid-high frequency loudspeakers behave quite differently from low-frequency units with regard to effects arising from the surrounding air medium. Previous work stressed the strong influence of the imaginary part of the viscous force, which significantly affects the resonance frequency of mid-high frequency loudspeakers. The viscous force depends strongly on the temperature and humidity of the surrounding air, and in this paper we evaluate how changes in temperature and humidity affect the loudspeaker's linearity, which may be significant for the quality of sound reproduction.
Convention Paper 7592 (Purchase now)

P15-6 Calorimetric Evaluation of Intrinsic Friction in the Loudspeaker MembraneAntonio Petosic, Ivan Djurek, University of Zagreb - Zagreb, Croatia; Danijel Djurek, Alessandro Volta Applied Ceramics (AVAC) - Zagreb, Croatia
Friction losses in the vibrating system of an electrodynamic loudspeaker are represented by the intrinsic friction Ri, which enters the equation of motion, and these losses are accompanied by irreversible release of heat. A method is proposed for measuring the friction losses in the loudspeaker's membrane by means of a thermocouple temperature probe glued to the membrane. The temperature on the membrane surface fluctuates stochastically as a result of thermo-elastic coupling in the membrane material. Evaluation of the amplitude of these temperature fluctuations enables an absolute and direct evaluation of the intrinsic friction Ri entering the friction force F = Ri·(dx/dt), irrespective of the type and strength of nonlinearity associated with the loudspeaker operation.
Convention Paper 7593 (Purchase now)

P15-7 Phantom Powering the Modern Condenser Microphone: A Practical Look at Conditions for Optimized PerformanceMark Zaim, Tadashi Kikutani, Jackie Green, Audio-Technica U.S., Inc.
Phantom powering a microphone is a decades-old concept, with powering conventions and methods that may have become obsolete, ineffective, or inefficient. Modern sound techniques, including those of live sound settings, now use many condenser microphones in settings that were previously dominated by dynamic microphones. As a prerequisite for considering a modern phantom power specification or method, we study the efficiencies and requirements of microphones in typical multiple-mic and high-SPL settings in order to understand the circuit and design requirements for maximum dynamic-range performance.
Convention Paper 7594 (Purchase now)

Saturday, October 4, 10:00 am — 11:00 am

Listening Session

Students are encouraged to bring in their projects to a non-competitive listening session for feedback and comments from Dave Greenspan, a panel, and audience. Students will be able to sign up at the first SDA meeting for time slots. Students who are finalists in the Recording competition are excluded from this event to allow others who were not finalists the opportunity for feedback.

Saturday, October 4, 10:30 am — 1:00 pm

P16 - Spatial Audio Quality with Playback Demonstration on Sunday 9:00 am – 10:00 am

Chair: Francis Rumsey, University of Surrey - Guildford, Surrey, UK

P16-1 QESTRAL (Part 1): Quality Evaluation of Spatial Transmission and Reproduction Using an Artificial ListenerFrancis Rumsey, Slawomir Zielinski, Philip Jackson, Martin Dewhirst, Robert Conetta, Sunish George, University of Surrey - Guildford, Surrey, UK; Søren Bech, Bang & Olufsen A/S - Struer, Denmark; David Meares, DJM Consultancy - Sussex, UK
Most current perceptual models for audio quality have so far tended to concentrate on the audibility of distortions and noises that mainly affect the timbre of reproduced sound. The QESTRAL model, however, is specifically designed to take account of distortions in the spatial domain such as changes in source location, width, and envelopment. It is not aimed only at codec quality evaluation but at a wider range of spatial distortions that can arise in audio processing and reproduction systems. The model has been calibrated against a large database of listening tests designed to evaluate typical audio processes, comparing spatially degraded multichannel audio material against a reference. Using a range of relevant metrics and a sophisticated multivariate regression model, results are obtained that closely match those obtained in listening tests.
Convention Paper 7595 (Purchase now)

P16-2 QESTRAL (Part 2): Calibrating the QESTRAL Model Using Listening Test Data
Robert Conetta, Francis Rumsey, Slawomir Zielinski, Philip Jackson, Martin Dewhirst, University of Surrey - Guildford, Surrey, UK; Søren Bech, Bang & Olufsen a/s - Struer, Denmark; David Meares, DJM Consultancy - Sussex, UK; Sunish George, University of Surrey - Guildford, Surrey, UK
The QESTRAL model is a perceptual model that aims to predict changes to spatial quality of service between a reference system and an impaired version of the reference system. To achieve this, the model required calibration using perceptual data from human listeners. This paper describes the development, implementation, and outcomes of a series of listening experiments designed to investigate the spatial quality impairment of 40 processes. Assessments were made using a multi-stimulus test paradigm with a label-free scale, where only the scale polarity is indicated. The tests were performed at two listening positions, using experienced listeners. Results from these calibration experiments are presented. A preliminary study on the process of selecting stimuli is also discussed.
Convention Paper 7596 (Purchase now)

P16-3 QESTRAL (Part 3): System and Metrics for Spatial Quality Prediction
Philip J. B. Jackson, Martin Dewhirst, Rob Conetta, Slawomir Zielinski, Francis Rumsey, University of Surrey - Guildford, Surrey, UK; David Meares, DJM Consultancy - Sussex, UK; Søren Bech, Bang & Olufsen A/S - Struer, Denmark; Sunish George, University of Surrey - Guildford, Surrey, UK
The QESTRAL project aims to develop an artificial listener for comparing the perceived quality of a spatial audio reproduction against a reference reproduction. This paper presents implementation details for simulating the acoustics of the listening environment and the listener's auditory processing. Acoustical modeling is used to calculate binaural signals and simulated microphone signals at the listening position, from which a number of metrics corresponding to different perceived spatial aspects of the reproduced sound field are calculated. These metrics are designed to describe attributes associated with location, width, and envelopment attributes of a spatial sound scene. Each provides a measure of the perceived spatial quality of the impaired reproduction compared to the reference reproduction. As validation, individual metrics from listening test signals are shown to match closely subjective results obtained, and can be used to predict spatial quality for arbitrary signals.
Convention Paper 7597 (Purchase now)

P16-4 QESTRAL (Part 4): Test Signals, Combining Metrics and the Prediction of Overall Spatial Quality
Martin Dewhirst, Robert Conetta, Francis Rumsey, Philip Jackson, Slawomir Zielinski, Sunish George, University of Surrey - Guildford, Surrey, UK; Søren Bech, Bang & Olufsen A/S - Struer, Denmark; David Meares, DJM Consultancy - Sussex, UK
The QESTRAL project has developed an artificial listener that compares the perceived quality of a spatial audio reproduction to a reference reproduction. Test signals designed to identify distortions in both the foreground and background audio streams are created for both the reference and the impaired reproduction systems. Metrics are calculated from these test signals and are then combined using a regression model to give a measure of the overall perceived spatial quality of the impaired reproduction compared to the reference reproduction. The results of the model are shown to match closely the results obtained in listening tests. Consequently, the model can be used as an alternative to listening tests when evaluating the perceived spatial quality of a given reproduction system, thus saving time and expense.
Convention Paper 7598 (Purchase now)

P16-5 An Unintrusive Objective Model for Predicting the Sensation of Envelopment Arising from Surround Sound Recordings
Sunish George, Slawomir Zielinski, Francis Rumsey, Robert Conetta, Martin Dewhirst, Philip Jackson, University of Surrey - Guildford, Surrey, UK; David Meares, DJM Consultancy - West Sussex, UK; Søren Bech, Bang & Olufsen A/S - Struer, Denmark
This paper describes the development of an unintrusive objective model, developed independently as a part of the QESTRAL project, for predicting the sensation of envelopment arising from commercially available 5-channel surround sound recordings. The model was calibrated using subjective scores obtained from listening tests that used a grading scale defined by audible anchors. For predicting subjective scores, a number of features based on Inter-Aural Cross Correlation (IACC), the Karhunen-Loeve Transform (KLT), and signal energy levels were extracted from recordings. The ridge regression technique was used to build the objective model, and the calibrated model was validated using a database of listening test scores obtained with a different group of listeners, different stimuli, and a different location. The initial results showed a high correlation between predicted scores and actual scores obtained from listening tests.
Convention Paper 7599 (Purchase now)
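The ridge regression step mentioned in the abstract above has a simple closed form. As an illustrative sketch only (using synthetic stand-in features rather than the paper's IACC/KLT/energy features, and a hypothetical regularization value):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X'X + lam*I)^-1 X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic stand-in for per-recording features and envelopment scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))           # 40 recordings, 3 features
w_true = np.array([1.5, -0.7, 0.2])    # hypothetical "true" weights
y = X @ w_true + 0.01 * rng.normal(size=40)

w = ridge_fit(X, y, lam=0.1)
pred = X @ w                           # predicted envelopment scores
```

The regularization term `lam` shrinks the weights slightly, which stabilizes the model when features (such as IACC in different bands) are correlated.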

Saturday, October 4, 11:00 am — 1:00 pm

B8 - Lip Sync Issue

Jonathan S. Abrams, Nutmeg Audio Post
Scott Anderson, Syntax-Brillian
Richard Fairbanks, Pharoah Editorial, Inc.
David Moulton, Sausalito Audio, LLC
Kent Terry, Dolby Laboratories

Lip sync is a complex problem, with several causes and fewer solutions. From production to broadcast, there are many points in the signal path and the postproduction process where lip sync can either be properly corrected or made even worse.

This session’s panel will discuss several key issues. Where do the latency issues exist in postproduction? Where do they exist in broadcast? Is there an acceptable window of latency? How can this latency be measured? What correction techniques exist? Does one type of video display exhibit less latency than another? What is being done in display design to address the latency? What proposed methods are on the horizon for addressing this issue in the future?

Join us as our panel covers the field from measurement, to post, to broadcast, and to the home.

Saturday, October 4, 11:30 am — 1:00 pm

Platinum Mastering

Bob Ludwig
Bernie Grundman, Bernie Grundman Mastering - Hollywood, CA, USA
Scott Hull, Scott Hull Mastering/Masterdisk - New York, NY, USA
Herb Powers Jr., Powers Mastering Studios - Orlando, FL, USA
Doug Sax, The Mastering Lab - Ojai, CA, USA

In this session moderated by mastering legend Bob Ludwig, mastering all-stars talk about the craft and business of mastering and answer audience questions—including queries submitted in advance by students at top recording colleges. Panelists include Bernie Grundman of Bernie Grundman Mastering, Hollywood; Scott Hull of Scott Hull Mastering/Masterdisk, New York; Herb Powers Jr. of Powers Mastering Studios, Orlando, Fla.; and Doug Sax of The Mastering Lab, Ojai, Calif.

Saturday, October 4, 11:30 am — 1:00 pm

P17 - Loudspeakers—Part 2

P17-1 Accuracy Issues in Finite Element Simulation of Loudspeakers
Patrick Macey, PACSYS Limited - Nottingham, UK
Finite element-based software for simulating loudspeakers has been around for some time but is being used more widely now, due to improved solver functionality, faster hardware, and improvements in links to CAD software and other preprocessing improvements. The analyst has choices to make in what techniques to employ, what approximations might be made, and how much detail to model.
Convention Paper 7600 (Purchase now)

P17-2 Nonlinear Loudspeaker Unit Modeling
Bo Rohde Pedersen, Aalborg University - Esbjerg, Denmark; Finn T. Agerkvist, Technical University of Denmark - Lyngby, Denmark
Simulations of a 6½-inch loudspeaker unit are performed and compared with a displacement measurement. The nonlinear loudspeaker model is based on the major nonlinear functions and expanded with time-varying suspension behavior and flux modulation. The results are presented with FFT plots of three frequencies and different displacement levels. The model errors are discussed and analyzed including a test with a loudspeaker unit where the diaphragm is removed.
Convention Paper 7601 (Purchase now)

P17-3 An Optimized Pair-Wise Constant Power Panning Algorithm for Stable Lateral Sound Imagery in the 5.1 Reproduction System
Sungyoung Kim, Yamaha Corporation - Shizuoka, Japan, and McGill University, Montreal, Quebec, Canada; Masahiro Ikeda, Akio Takahashi, Yamaha Corporation - Shizuoka, Japan
Auditory image control in the 5.1 reproduction system has been a challenge due to the arrangement of loudspeakers, especially in the lateral region. To suppress typical artifacts in a pair-wise constant power algorithm, a new gain ratio between the Left and Left Surround channel has been experimentally determined. Listeners were asked to estimate the gain ratio between two loudspeakers for seven lateral positions so as to set the direction of the sound source. From these gain ratios, a polynomial function was derived in order to parametrically represent a gain ratio in an arbitrary direction. The result of validating the experiments showed that the new function produced stable auditory imagery in the lateral region.
Convention Paper 7602 (Purchase now)
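For reference, the standard pair-wise constant-power law that the paper takes as its starting point (before replacing it with experimentally determined gain ratios for the lateral L/Ls pair) can be sketched as:

```python
import numpy as np

def constant_power_gains(pan):
    """Pair-wise constant-power panning gains for a pan position in [0, 1].

    pan = 0 puts the source entirely in the first loudspeaker, pan = 1
    entirely in the second; g1**2 + g2**2 == 1 for every position, so
    the total radiated power stays constant across the pair.
    """
    theta = pan * np.pi / 2.0
    return np.cos(theta), np.sin(theta)
```

A source panned halfway between the pair gets equal gains of about 0.707 in each loudspeaker; the paper's contribution is that for lateral loudspeaker pairs in 5.1 this fixed law produces unstable imagery, motivating the experimentally derived polynomial gain function.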

P17-4 The Use of Delay Control for Stereophonic Audio Rendering Based on VBAP
Dongil Hyun, Tacksung Choi, Daehee Youn, Yonsei University - Seoul, Korea; Seokpil Lee, Broadcasting-Communication Convergence Research Center KETI - Seongnam, Korea; Youngcheol Park, Yonsei University - Wonju, Korea
This paper proposes a new panning method that enhances the performance of stereophonic audio rendering based on VBAP. The proposed system introduces delay control: sample delays are used to reduce the energy cancellation caused by out-of-phase components. Preliminary simulations and measurements were conducted to verify the controllability of ILD by delay control between stereophonic loudspeakers. By simulating the ILD produced by the delay control, the spatial direction at frequencies where energy cancellation occurred could be perceived more stably than with conventional VBAP. The performance of the proposed system is also verified by a subjective listening test.
Convention Paper 7603 (Purchase now)

P17-5 Ambience Sound Recording Utilizing Dual MS (Mid-Side) Microphone Systems Based upon Frequency Dependent Spatial Cross Correlation (FSCC)—Part-2: Acquisition of On-Stage Sounds
Teruo Muraoka, Takahiro Miura, Tohru Ifukube, University of Tokyo - Tokyo, Japan
In music recording, a forest of microphones is commonly observed. It serves good sound localization and favorable ambience; however, a sparser forest is desirable for less laborious setup and mixing. To this end the authors studied the sound-image representation of stereophonic microphone arrangements using Frequency Dependent Spatial Cross Correlation (FSCC), the cross correlation of two microphones' outputs. The authors first examined the FSCCs of typical microphone arrangements for acquiring ambient sound and concluded that an MS (Mid-Side) microphone system with its directional azimuth set at 132 degrees is best. They also studied the conditions of on-stage sound acquisition and found that the FSCC of a coaxial microphone takes the constant value of +1, which is advantageous for stable sound localization. The authors then compared the sound acquisition characteristics of the MS system (directional azimuth set at 120 degrees) and the XY system and found the former to be superior. Finally, they propose dual MS microphone systems: one for on-stage sound acquisition with its directional azimuth set at 120 degrees, and the other for ambient sound acquisition with its directional azimuth set at 132 degrees.
Convention Paper 7604 (Purchase now)

P17-6 Ambisonic Loudspeaker Arrays
Eric Benjamin, Dolby Laboratories - San Francisco, CA, USA
The Ambisonic system is one of very few surround sound systems that offers the promise of reproducing full three-dimensional (periphonic) audio. It can be shown that arrays configured as regular polyhedra can allow the recreation of an accurate sound field at the center of the array. But the regular polyhedral shape can be impractical for everyday use because the requirement that the listener's head be located at the center of the array forces the lower loudspeakers beneath the floor, or even places a loudspeaker directly beneath the listener. This is obviously impracticable, especially in domestic applications. Likewise, the width of the array is typically larger than the room boundaries can accommodate. The infeasibility of such arrays is a primary reason why they have not been more widely deployed. The intent of this paper is to explore the efficacy of alternative array shapes for both horizontal and periphonic reproduction.
Convention Paper 7605 (Purchase now)

P17-7 Optimum Placement for Small Desktop/PC Loudspeakers
Vladimir Filevski, Audio Expert DOO - Skopje, Macedonia
A desktop/PC loudspeaker usually stands on a desk, so the direct sound from the loudspeaker interferes with the reflected sound from the desk. On the desk, a "perfect" loudspeaker with flat anechoic frequency response will not give a flat, but a comb-like resultant frequency response. Here is presented one simple and inexpensive solution to this problem—a small, conventional loudspeaker is placed on a holder. The holder is a horizontal pivoting telescopic arm that enables easy positioning of the loudspeaker. With one side, the arm is attached on the top corner of the PC monitor, and the other side is attached to the loudspeaker. The listener extends and rotates the arm in horizontal plane to such a position that no reflection from the desk or from the PC monitor reaches the listener, thus preserving the presumably flat anechoic frequency response of the loudspeaker.
Convention Paper 7606 (Purchase now)
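The comb filtering described above follows directly from summing the direct path with one delayed, attenuated desk reflection. A minimal sketch (all distances and the reflection coefficient here are hypothetical values, not from the paper):

```python
import numpy as np

def desk_comb_response(freqs, d_direct, d_reflected, refl=0.8, c=343.0):
    """|H(f)| for direct sound plus a single desk reflection.

    The reflection arrives (d_reflected - d_direct)/c seconds late and
    is weaker by the 1/r spreading ratio times the desk's reflection
    coefficient, producing notches at odd multiples of 1/(2*dt).
    """
    dt = (d_reflected - d_direct) / c          # extra delay of the reflection
    rel = (d_direct / d_reflected) * refl      # relative level of the reflection
    return np.abs(1.0 + rel * np.exp(-2j * np.pi * np.asarray(freqs) * dt))
```

With a 0.6 m direct path and a 0.8 m reflected path, the extra delay is about 0.58 ms, so the first notch falls near 860 Hz, squarely in the midrange. This is the frequency-response damage the pivoting-arm placement is designed to avoid.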

Saturday, October 4, 1:00 pm — 2:00 pm

Lunchtime Keynote:
Peter Gotcher
of Topspin Media

The Music Business Is Dead—Long Live the NEW Music Business!

Peter Gotcher will deliver a high-level view of the changing business models facing the music industry today. Gotcher will explain why it no longer works for artists to derive their income from record labels that provide a tiny share of high volume sales. He will also explore new revenue models that include multiple revenue streams for artists; the importance of getting rid of unproductive middlemen; and generating more revenue from fewer fans.

Saturday, October 4, 2:00 pm — 4:00 pm

TT14 - WAM Studios, San Francisco

Women's Audio Mission is a San Francisco-based, non-profit organization dedicated to the advancement of women in music production and the recording arts. In a field where women are chronically under-represented (less than 5%), WAM seeks to "change the face of sound" by providing hands-on training, experience, career counseling, and job placement to women and girls in media technology for music, radio, film, television, and the internet. WAM believes that women's mastery of music technology and inclusion in the production process will expand the vision and voice of media and popular culture.

Note: Maximum of 20 participants per tour.

Price: $30 (members), $40 (nonmembers)

Saturday, October 4, 2:30 pm — 6:30 pm

Recording Competition - Stereo

The Student Recording Competition is a highlight at each convention. A distinguished panel of judges participates in critiquing finalists of each category in an interactive presentation and discussion. Student members can submit stereo and surround recordings in the categories classical, jazz, folk/world music, and pop/rock. Meritorious awards will be presented at the closing Student Delegate Assembly Meeting on Sunday.

Saturday, October 4, 2:30 pm — 4:30 pm

B9 - Listener Fatigue and Longevity

David Wilson, CEA
Sam Berkow, SIA Acoustics
Marvin Caesar, Aphex
James Johnston, Neural Audio Corp.
Ted Ruscitti, On-Air Research

This panel will discuss listener fatigue and its impact on listener retention. While listener fatigue is an issue of interest to broadcasters, it is also an issue of interest to telecommunications service providers, consumer electronics manufacturers, music producers, and others. Fatigued listeners to a broadcast program may tune out, while fatigued listeners to a cell phone conversation may switch to another carrier, and fatigued listeners to a portable media player may purchase another company’s product. The experts on this panel will discuss their research and experiences with listener fatigue and its impact on listener retention.

Saturday, October 4, 2:30 pm — 4:00 pm

P18 - Innovative Audio Applications

Chair: Cynthia Bruyns-Maxwell, University of California Berkeley - Berkeley, CA, USA

P18-1 An Audio Reproduction Grand Challenge: Design a System to Sonic Boom an Entire House
Victor W. Sparrow, Steven L. Garrett, Pennsylvania State University - University Park, PA, USA
This paper describes an ongoing research study to design a simulation device that can accurately reproduce sonic booms over the outside surface of an entire house. Sonic booms and previous attempts to reproduce them will be reviewed. The authors will present some calculations that suggest that it will be very difficult to produce the required pressure amplitudes using conventional sound reinforcement electroacoustic technologies. However, an additional purpose is to make AES members aware of this research and to solicit feedback from attendees prior to a January 2009 down-selection activity for the design of an outdoor sonic boom simulation system.
Convention Paper 7607 (Purchase now)

P18-2 A Platform for Audiovisual Telepresence Using Model- and Data-Based Wave-Field Synthesis
Gregor Heinrich, Fraunhofer Institut für Graphische Datenverarbeitung (IGD) - Darmstadt, Germany, and vsonix GmbH, Darmstadt, Germany; Christoph Jung, Volker Hahn, Michael Leitner, Fraunhofer Institut für Graphische Datenverarbeitung (IGD) - Darmstadt, Germany
We present a platform for real-time transmission of immersive audiovisual impressions using model- and data-based audio wave-field analysis/synthesis and panoramic video capturing/projection. The audio sub-system focused on in this paper is based on circular cardioid microphone and weakly directional loudspeaker arrays. We report on both linear and circular setups that feed different wave-field synthesis systems. While we can report on perceptual results for the model-based wave-field synthesis prototypes with beamforming and supercardioid input, we present findings for the data-based approach derived using experimental simulations. This data-based wave-field analysis/synthesis (WFAS) approach uses a combination of cylindrical-harmonic decomposition of cardioid array signals and angular windowing to enforce causal propagation of the synthesized field. Specifically, our contributions include (1) the creation of a high-resolution reproduction environment that is omnidirectional in both the auditory and visual modality, as well as (2) a study of data-based WFAS for real-time holophonic reproduction with realistic microphone directivities.
Convention Paper 7608 (Purchase now)

P18-3 SMART-I2: “Spatial Multi-User Audio-Visual Real-Time Interactive Interface”
Marc Rébillat, University of Paris Sud - Paris, France; Etienne Corteel, sonic emotion ag - Oberglatt, Switzerland; Brian F. Katz, University of Paris Sud - Paris, France
The SMART-I2 aims at creating a precise and coherent virtual environment by providing users with accurate audio and visual localization cues. It is known that Wave Field Synthesis (for audio rendering) and Tracked Stereoscopy (for visual rendering) each individually permit high-quality spatial immersion within an extended space. The proposed system combines these two rendering approaches through the use of a large multi-actuator panel serving as both a loudspeaker array and a projection screen, considerably reducing audio-visual incoherencies. The system's performance has been confirmed by an objective validation of the audio interface and a perceptual evaluation of the audio-visual rendering.
Convention Paper 7609 (Purchase now)

Saturday, October 4, 2:30 pm — 5:30 pm

P19 - Spatial Audio Processing

Chair: Agnieszka Roginska, New York University - New York, NY, USA

P19-1 Head-Related Transfer Functions Reconstruction from Sparse Measurements Considering a Priori Knowledge from Database Analysis: A Pattern Recognition Approach
Pierre Guillon, Rozenn Nicol, Orange Labs - Lannion, France; Laurent Simon, Laboratoire d’Acoustique de l’Université du Maine - Le Mans, France
Individualized Head-Related Transfer Functions (HRTFs) are required to achieve high-quality Virtual Auditory Spaces. This paper proposes decreasing the total number of measured directions in order to make acoustic measurements more comfortable. To overcome the sparseness limit beyond which classical interpolation techniques fail to properly reconstruct HRTFs, additional knowledge has to be injected. Focusing on the spatial structure of HRTFs, the analysis of a large HRTF database enables the introduction of spatial prototypes. After a pattern recognition process, these prototypes serve as a well-informed background for the reconstruction of any sparsely measured set of individual HRTFs. This technique shows better spatial fidelity than blind interpolation techniques.
Convention Paper 7610 (Purchase now)

P19-2 Near-Field Compensation for HRTF Processing
David Romblom, Bryan Cook, Sennheiser DSP Research Lab - Palo Alto, CA, USA
It is difficult to present near-field virtual audio displays using available HRTF filters, as most existing databases are measured at a single distance in the far-field of the listener’s head. Measuring near-field data is possible, but would quickly become tiresome due to the large number of distances required to simulate sources moving close to the head. For applications requiring a compelling near-field virtual audio display, one could compensate the far-field HRTF filters with a scheme based on 1/r spreading roll off. However, this would not account for spectral differences that occur in the near-field. Using difference filters based on a spherical head model, as well as a geometrically accurate HRTF lookup scheme, we are able to compensate existing data and present a convincing virtual audio display for near field distances.
Convention Paper 7611 (Purchase now)
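The 1/r spreading compensation mentioned above amounts to a per-ear, distance-dependent gain. A minimal sketch of that crude baseline (which, as the abstract notes, ignores the near-field spectral differences the paper's difference filters address; the ear geometry below is an assumed spherical-head simplification):

```python
import numpy as np

def near_field_gain(r_source, r_measured=1.0):
    """1/r spreading compensation when a far-field HRTF measured at
    r_measured (meters) is reused for a source at r_source: sources
    closer than the measurement distance are boosted."""
    return r_measured / r_source

def ear_distances(src_xyz, head_radius=0.0875):
    """Source-to-ear distances for ears placed on the +y/-y axis of a
    spherical head centered at the origin (head_radius is assumed)."""
    left = np.array([0.0, head_radius, 0.0])
    right = np.array([0.0, -head_radius, 0.0])
    src = np.asarray(src_xyz, dtype=float)
    return np.linalg.norm(src - left), np.linalg.norm(src - right)
```

For example, a source at 0.25 m reused against a 1 m measurement gets a gain of 4 (about +12 dB); applying the gain per ear via `ear_distances` captures the exaggerated interaural level difference that arises close to the head.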

P19-3 A Method for Estimating Interaural Time Difference for Binaural Synthesis
Juhan Nam, Jonathan S. Abel, Julius O. Smith III, Stanford University - Stanford, CA, USA
A method for estimating interaural time difference (ITD) from measured head-related transfer functions (HRTFs) is presented. The method forms ITD as the difference in left-ear and right-ear arrival times, estimated as the times of maximum cross-correlation between measured HRTFs and their minimum-phase counterparts. The arrival time estimate is related to a least-squares fit to the measured excess phase, emphasizing those frequencies having large HRTF magnitude and deweighting large phase delay errors. As HRTFs are nearly minimum-phase, this method is robust compared to the conventional approach of cross-correlating left-ear and right-ear HRTFs, which can be very different. The method also performs slightly better than techniques averaging phase delay over a limited frequency range.
Convention Paper 7612 (Purchase now)
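The arrival-time estimate described above can be sketched as follows. This is an illustrative reconstruction using a standard cepstral minimum-phase computation; the paper's exact least-squares phase weighting is not reproduced here:

```python
import numpy as np

def minimum_phase(h, nfft=1024):
    """Minimum-phase reconstruction of h via the real cepstrum."""
    mag = np.abs(np.fft.fft(h, nfft)) + 1e-12
    cep = np.fft.ifft(np.log(mag)).real
    fold = np.zeros(nfft)              # fold the cepstrum to make it causal
    fold[0] = 1.0
    fold[1:nfft // 2] = 2.0
    fold[nfft // 2] = 1.0
    spec = np.exp(np.fft.fft(cep * fold))
    return np.fft.ifft(spec).real[:len(h)]

def arrival_time(hrir):
    """Arrival time in samples: lag of maximum cross-correlation
    between the measured HRIR and its minimum-phase counterpart."""
    mp = minimum_phase(hrir)
    xcorr = np.correlate(hrir, mp, mode="full")
    return np.argmax(xcorr) - (len(mp) - 1)

def itd_samples(hrir_left, hrir_right):
    """ITD as the difference of left-ear and right-ear arrival times."""
    return arrival_time(hrir_left) - arrival_time(hrir_right)
```

Because each HRIR is correlated against its own minimum-phase version rather than against the other ear's very different response, the peak stays well defined, which is the robustness advantage the abstract claims.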

P19-4 Efficient Delay Interpolation for Wave Field Synthesis
Andreas Franck, Karlheinz Brandenburg, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany; Ulf Richter, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany, and HTWK Leipzig, Leipzig, Germany
Wave Field Synthesis enables the reproduction of complex auditory scenes and moving sound sources. Moving sound sources induce time-variant delays of the source signals; to avoid severe distortions, sophisticated delay interpolation techniques must be applied. The typically large numbers of both virtual sources and loudspeakers in a WFS system result in a very high number of simultaneous delay operations, making delay interpolation one of the most performance-critical aspects of a WFS rendering system. In this paper we investigate suitable delay interpolation algorithms for WFS. To overcome the prohibitive computational cost induced by high-quality algorithms, we propose a computational structure that achieves a significant complexity reduction through a novel algorithm partitioning and efficient data reuse.
Convention Paper 7613 (Purchase now)
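As a baseline for the delay interpolation discussed above, the simplest (and lowest-quality) option is linear interpolation between the two nearest samples; a sketch of that reference point (the paper targets higher-quality interpolators at reduced cost, not this one):

```python
import numpy as np

def fractional_delay(x, delay):
    """Delay signal x by `delay` samples (scalar, or a per-sample array
    for a time-variant delay) via linear interpolation between the two
    nearest input samples."""
    delay = np.asarray(delay, dtype=float)
    pad = int(np.ceil(delay.max())) + 1
    xp = np.concatenate([np.zeros(pad), x, np.zeros(1)])  # zero history
    pos = np.arange(len(x)) + pad - delay                 # fractional read positions
    i = np.floor(pos).astype(int)
    frac = pos - i
    return (1.0 - frac) * xp[i] + frac * xp[i + 1]
```

Linear interpolation acts as a signal-dependent low-pass filter, which is exactly the audible distortion that motivates the higher-order interpolators and the complexity-reduction structure the paper proposes.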

P19-5 Obtaining Binaural Room Impulse Responses from B-Format Impulse Responses
Fritz Menzer, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland
Given a set of head-related transfer functions (HRTFs) and a room impulse response measured with a Soundfield microphone, the proposed technique computes binaural room impulse responses (BRIRs) similar to those that would be measured if, in place of the Soundfield microphone, the dummy head used for the HRTF set had directly recorded the BRIRs. The proposed technique makes it possible to obtain, from one set of HRTFs, corresponding BRIRs for different rooms without the dummy head or person needing to be present for the measurements.
Convention Paper 7614 (Purchase now)

P19-6 A New Audio Postproduction Tool for Speech Dereverberation
Keisuke Kinoshita, Tomohiro Nakatani, Masato Miyoshi, NTT Communication Science Laboratories - Kyoto, Japan; Toshiyuki Kubota, NTT Media Lab. - Tokyo, Japan
This paper proposes a new audio postproduction tool for speech dereverberation based on our previously proposed method. In earlier studies we proposed a single-channel dereverberation method as a preprocessing step for automatic speech recognition and reported good performance. This paper focuses on improving the audible quality of the dereverberated signals. To achieve good dereverberation with fewer audible artifacts, the previously proposed dereverberation method is combined with postprocessing that implicitly takes account of the perceptual masking property. The system has three adjustable parameters for controlling the audible quality. In an informal evaluation, we found that the proposed tool allows professional audio engineers to dereverberate a set of reverberant recordings efficiently.
Convention Paper 7615 (Purchase now)

Saturday, October 4, 2:30 pm — 4:00 pm

P20 - Loudspeakers—Part 3

P20-1 Preliminary Results of Calculation of a Sound Field Distribution for the Design of a Sound Field Effector Using a 2-Way Loudspeaker Array with Pseudorandom Configuration
Yoshihiro Iijima, Musashi Institute of Technology - Tokyo, Japan; Kaoru Ashihara, Advanced Industrial Science and Technology - Tsukuba, Japan; Shogo Kiryu, Musashi Institute of Technology - Tokyo, Japan
We have been developing a loudspeaker array system that can control a sound field in real time for live concerts. In order to reduce the sidelobes and to improve the frequency range, a 2-way loudspeaker array with pseudorandom configuration is proposed. Software is being developed to determine the configuration. For now, the configuration is optimized for a focused sound. The software calculates the ratio between the sound pressure of the focus point and the average of the sound pressure around the focus. It was shown that the sidelobes can be reduced with a pseudorandom configuration.
Convention Paper 7616 (Purchase now)

P20-2 Design and Implementation of a Sound Field Effector Using a Loudspeaker Array
Seigo Hayashi, Tomoaki Tanno, Musashi Institute of Technology - Tokyo, Japan; Toru Kamekawa, Tokyo National University of Fine Arts and Music - Tokyo, Japan; Kaoru Ashihara, Advanced Industrial Science and Technology - Tokyo, Japan; Shogo Kiryu, Musashi Institute of Technology - Tokyo, Japan
We have been developing an effector that uses a 128-channel two-way loudspeaker array system for live concerts. The system was designed to realize the change of the sound field within 10 ms. The variable delay circuits and the communication circuit between the hardware and the control computer are implemented in one FPGA. All of the delay data that have been calculated in advance are stored in the SDRAM that is mounted on the FPGA board, and only the simple command is sent from the control computer. The system can control up to four sound focuses independently.
Convention Paper 7617 (Purchase now)

P20-3 Wave Field Synthesis: Practical Implementation and Application to Sound Beam Digital Pointing
Paolo Peretti, Laura Romoli, Lorenzo Palestini, Stefania Cecchi, Francesco Piazza, Università Politecnica delle Marche - Ancona, Italy
Wave Field Synthesis (WFS) is a digital signal processing technique introduced to achieve an optimal acoustic sensation in a larger area than in traditional systems (Stereophony, Dolby Digital). It is based on a large number of loudspeakers and its real-time implementation needs the study of efficient solutions in order to limit the computational cost. To this end, in this paper we propose an approach based on a preprocessing of the driving function component, which does not depend on the audio streaming. Linear and circular geometries tests will be described and the application of this technique to digital pointing of the sound beam will be presented.
Convention Paper 7618 (Purchase now)

P20-4 Highly Focused Sound Beamforming Algorithm Using Loudspeaker Array System
Yoomi Hur, Seong Woo Kim, Yonsei University - Seoul, Korea; Young-cheol Park, Yonsei University - Wonju, Korea; Dae Hee Youn, Yonsei University - Seoul, Korea
This paper presents a sound beamforming technique that can generate a highly focused sound beam using a loudspeaker array. For this purpose, we find the optimal weights that maximize the contrast in sound power between the target region and the other regions. However, there is a limit to how low the level in the non-target regions can be made with the directly derived weights, so the iterative pattern synthesis technique, originally introduced for antenna arrays, is investigated. Since imaginary signal powers are assumed in the non-target regions, the system works iteratively to further improve the contrast ratio. The performance of the proposed method was evaluated, and the results showed that it generates a more highly focused sound beam than the conventional method.
Convention Paper 7619 (Purchase now)

P20-5 Super-Directive Loudspeaker Array for the Generation of a Personal Sound Zone
Jung-Woo Choi, Youngtae Kim, Sangchul Ko, Jung-Ho Kim, Samsung Electronics Co. Ltd. - Gyeonggi-do, Korea
A sound manipulation technique is proposed for selectively enhancing a desired acoustic property in a zone of interest called the personal sound zone. In order to create a personal sound zone in which a listener can experience a high sound level, acoustic energy is focused on only a selected area. Recently, two performance measures indicating acoustic properties of the personal sound zone—acoustic brightness and contrast—were employed to optimize the driving functions of a loudspeaker array. In this paper, some limitations of the individual control methods are first presented, and then a novel control strategy is suggested that combines the advantages of both in a single objective function. Precise control of a sound field with a desired shape of energy distribution is made possible by introducing a continuous spatial weighting technique. The results are compared to those based on the least-squares optimization technique.
Convention Paper 7620 (Purchase now)
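The acoustic contrast measure mentioned above leads to a generalized eigenvalue problem: the loudspeaker weights maximizing the ratio of bright-zone to dark-zone energy are the dominant eigenvector of (Gd^H Gd)^-1 Gb^H Gb. A sketch with random stand-in transfer matrices (this shows only the classic contrast-control solution, not the paper's combined brightness/contrast objective or its spatial weighting):

```python
import numpy as np

def contrast_weights(Gb, Gd, reg=1e-6):
    """Loudspeaker weights maximizing acoustic contrast: the ratio of
    energy over the bright-zone points (rows of Gb) to energy over the
    dark-zone points (rows of Gd). Solution: dominant eigenvector of
    (Gd^H Gd + reg*I)^-1 Gb^H Gb."""
    n_speakers = Gb.shape[1]
    A = Gb.conj().T @ Gb
    B = Gd.conj().T @ Gd + reg * np.eye(n_speakers)
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
    return vecs[:, np.argmax(vals.real)]

# Random stand-ins for zone-point-to-loudspeaker transfer functions.
rng = np.random.default_rng(1)
Gb = rng.normal(size=(8, 5)) + 1j * rng.normal(size=(8, 5))  # bright zone
Gd = rng.normal(size=(8, 5)) + 1j * rng.normal(size=(8, 5))  # dark zone
w = contrast_weights(Gb, Gd)
contrast = np.linalg.norm(Gb @ w)**2 / np.linalg.norm(Gd @ w)**2
```

The small `reg` term keeps the dark-zone matrix invertible; in practice it also limits the array effort, which pure contrast maximization does not constrain.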

Saturday, October 4, 3:30 pm — 5:30 pm

Grammy SoundTable

Mike Clink, producer/engineer/entrepreneur (Guns N’ Roses, Sarah Kelly, Mötley Crüe)
Sylvia Massy, producer/engineer/entrepreneur (System of a Down, Johnny Cash, Econoline Crush)
Keith Olsen, producer/engineer/entrepreneur (Fleetwood Mac, Ozzy Osbourne, POGOLOGO Productions/MSR Acoustics)
Phil Ramone, producer/engineer/visionary (Elton John, Ray Charles, Shelby Lynne)
Carmen Rizzo, artist/producer/remixer (Seal, Paul Oakenfold, Coldplay)
John Vanderslice, artist/indie rock innovator/studio owner (MK Ultra, Mountain Goats, Spoon)

The 20th Annual GRAMMY Recording SoundTable is presented by the National Academy of Recording Arts & Sciences Inc. (NARAS) and hosted by AES.

YOU, Inc.! New Strategies for a New Economy

Today’s audio recording professional need only walk down the aisle of a Best Buy, turn on a TV, or listen to a cell phone ring to hear possibilities for new revenue streams and new applications to showcase their talents. From video games to live shows to ringbacks and 360 deals, money and opportunities are out there. It’s up to you to grab them.

For this special event the Producers & Engineers Wing has assembled an all-star cast of audio pros who’ll share their experiences and entrepreneurial expertise in creating opportunities in music and audio. You’ll laugh, you’ll cry, you’ll learn.

Saturday, October 4, 4:30 pm — 6:00 pm

T13 - Improved Power Supplies for Audio Digital-Analog Conversion

Mark Brasfield, National Semiconductor Corporation - Santa Clara, CA, USA
Robert A. Pease, National Semiconductor Corporation - Santa Clara, CA, USA

It is well known that good, stable, linear, constant-impedance, wide-bandwidth power supplies are important for high-quality digital-to-analog conversion. Poor supplies can add noise, jitter, and other uncertainties, adversely affecting the audio.

Saturday, October 4, 5:00 pm — 6:45 pm

B10 - Audio Transport

David Prentice, VCA
Kevin Campbell, APT Ltd.
Chris Crump, Comrex
Angela DePascale, Global Digital Datacom Services Inc.
Herb Squire, DSI RF
Mike Uhl, Telos

This session will discuss the techniques and technologies used for transporting audio (i.e., STL, RPU, codecs, etc.). Transporting audio can be complex, and the panel will explore the various roads you can take.

Saturday, October 4, 5:00 pm — 6:45 pm

T14 - Electric Guitar-The Science Behind the Ritual

Alex U. Case, University of Massachusetts - Lowell, MA, USA

It is an unwritten law that recording engineers approach the electric guitar amplifier with a Shure SM57, in close against the grille cloth, a bit off-center of the driver, and angled a little. These recording decisions serve us well, but do they really matter? What changes when you back the microphone away from the amp, move it off center of the driver, or change the angle? Alex Case, Sound Recording Technology professor at UMass Lowell, breaks it down, with measurements and discussion of the variables that lead to punch, crunch, and other desirables in electric guitar tone.

Saturday, October 4, 5:00 pm — 6:30 pm

P21 - Low Bit-Rate Audio Coding

P21-1 A Framework for a Near-Optimal Excitation Based Rate-Distortion Algorithm for Audio Coding
Miikka Vilermo, Nokia Research Center - Tampere, Finland
An optimal excitation based rate-distortion algorithm remains an elusive target in audio coding. Typical complexity of the problem for one frame alone is on the order of 60^50. This paper presents a framework for reducing the complexity. Excitation is calculated using cochlear filters that have relatively steep slopes above and below the central frequency of the filter. An approximation of the excitation can be calculated by limiting the cochlear filters to a small frequency region. For example, the cochlear filters may span 15 subbands. In this way, the complexity can be reduced approximately to the order of 60^15·50.
Convention Paper 7621 (Purchase now)

P21-2 Audio Bandwidth Extension by Frequency Scaling of Sinusoidal Partials
Tomasz Zernicki, Maciej Bartkowiak, Poznan University of Technology - Poznan, Poland
This paper describes a new technique for efficient coding of high-frequency signal components as an alternative to Spectral Band Replication. The main idea is to reconstruct the high-frequency harmonic structure trajectories by using fundamental frequencies obtained at the encoder side. The audio signal is decomposed into narrow subbands by demodulation based on the local instantaneous fundamental frequency of individual partials. High-frequency components are reconstructed by modulation of the baseband signals with appropriately scaled instantaneous frequencies. Such an approach offers correct synthesis of rapidly changing sinusoids as well as proper reconstruction of harmonic structure in the high-frequency band. This technique allows correct energy adjustment over sinusoidal partials. The high efficiency of the proposed technique has been confirmed by listening tests.
Convention Paper 7622 (Purchase now)
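The reconstruction step the abstract describes — synthesizing a high-frequency partial by scaling an instantaneous-frequency track and integrating phase — can be sketched as follows. The vibrato track, sample rate, and harmonic index are illustrative assumptions, not the authors' parameters:

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs

# Hypothetical slowly varying fundamental (vibrato around 220 Hz), standing in
# for the encoder-side fundamental-frequency track the method assumes.
f0 = 220.0 + 5.0 * np.sin(2 * np.pi * 3.0 * t)

def partial(f_inst, fs):
    """Synthesize a unit-amplitude sinusoid from an instantaneous-frequency
    track by integrating phase, so rapid frequency changes stay artifact-free."""
    phase = 2 * np.pi * np.cumsum(f_inst) / fs
    return np.sin(phase)

# Reconstruct a high-frequency partial as the k-th harmonic by scaling the
# instantaneous frequency, rather than copying spectrum as SBR does.
k = 7
hf_partial = partial(k * f0, fs)
```

Summing such partials with amplitudes matched to the transmitted spectral envelope would complete the high-band reconstruction.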

P21-3 Robustness Issues in Multi-View Audio Coding
Mauri Väänänen, Nokia Research Center - Tampere, Finland
This paper studies the problem of noise unmasking when multiple spatial filtering options (multiple views) are required from multi-microphone recordings compressed with lossy coding. The envisaged application is re-use and postprocessing of user-created content. A potential solution based on inter-channel prediction is outlined that would also allow subtractive downmix options without excessive noise unmasking. The simple case of two relatively closely spaced omnidirectional microphones and a mono downmix is used as an example, experimenting with real-world recordings and MPEG-1 Layer 3 coding.
Convention Paper 7623 (Purchase now)

P21-4 Quality Improvement of Very Low Bit Rate HE-AAC Using Linear Prediction Module
GunWoo Lee, JaeSeong Lee, Yonsei University - Seoul, Korea; YoungCheol Park, Yonsei University - Wonju-city, Korea; DaeHee Youn, Yonsei University - Seoul, Korea
This paper proposes a new method of improving the quality of High Efficiency Advanced Audio Coding (HE-AAC) at very low bit rates under 16 kbps. Low bit rate HE-AAC often produces obvious spectral holes inducing musical noise in low-energy frequency bands due to its limited number of available bits. In the proposed system, a linear prediction module is combined with HE-AAC as a pre-processor to reduce the spectral holes. For its efficient implementation, the masking threshold of the psychoacoustic model is normalized with the LPC spectral envelope to quantize the LPC residual signal with an appropriate masking threshold. To reduce pre-echo, we also modified the block switching module. Experimental results show that, at very low bit rate modes, the linear prediction module effectively reduces the spectral holes, which results in the reduction of musical noise compared to conventional HE-AAC.
Convention Paper 7624 (Purchase now)

P21-5 An Implementation of MPEG-4 ALS Standard Compliant Decoder on ARM Core CPUs
Noboru Harada, Takehiro Moriya, Yutaka Kamamoto, NTT Communication Science Labs. - Kanagawa, Japan
MPEG-4 Audio Lossless Coding (ALS) is a standard that losslessly compresses audio signals in an efficient manner. MPEG-4 ALS is a suitable compression scheme for high-sound-quality portable music players. We have implemented a decoder compliant with the MPEG-4 ALS standard on the ARM platform. In this paper the required CPU resources for MPEG-4 ALS tools on ARM9E are characterized by using an ARM CPU emulator, called ARMulator, as a simulation platform. It is shown that the required CPU clock rate for decoding MPEG-4 ALS standard compliant bit streams is less than 20 MHz for 44.1-kHz/16-bit stereo signals on ARM9E when the combination of the MPEG-4 ALS tools is properly selected and coding parameters are properly restricted.
Convention Paper 7625 (Purchase now)

Saturday, October 4, 6:00 pm — 7:00 pm

MIX Foundation 2008 TECnology Hall of Fame

Hosted by Mix Magazine Executive Editor/TECnology Hall of Fame director George Petersen.

Presented annually by the Mix Foundation for Excellence in Audio to honor significant, lasting contributions to the advancement of audio technology, this year's event will recognize fifteen audio innovations. "It is interesting to note how many of these products are still in daily use decades after their introduction," Petersen says. "These aren't simply museum pieces, but working tools. We're proud to recognize their significance to the industry."

Sunday, October 5, 9:00 am — 10:00 am

P16 Papers Demo

Playback session related to Paper Session 16 "Spatial Audio Quality" held Saturday, October 4 from 10:30 am to 1:00 pm.

Sunday, October 5, 9:00 am — 10:30 am

Design Competition

The Design Competition is open to audio projects developed by students at any university or recording school, challenging them with an opportunity to showcase their technical skills. It is not for recording projects or theoretical papers, but rather design concepts and prototypes. Designs will be judged by a panel of industry experts in design and manufacturing. Multiple prizes will be awarded.

Sunday, October 5, 9:00 am — 11:00 am

M4 - Acoustics and Multiphysics Modeling

John Dunec, Comsol - Palo Alto, CA, USA

This Master Class covers acoustics and multiphysics modeling using Comsol. The Acoustics Module is specifically designed for those who work in classical acoustics with devices that produce, measure, and utilize acoustic waves. Application areas include the design of loudspeakers, microphones, hearing aids, noise control, sound barriers, mufflers, buildings, and performance spaces.

Sunday, October 5, 9:00 am — 10:45 am

W10 - File Formats for Interactive Applications and Games

Chris Grigg, Beatnik, Inc.
Christof Faller, Illusonic LLC
John Lazzaro, University of California, Berkeley
Juergen Schmidt, Thomson

There are a number of different standards covering file formats that may be applicable to interactive or game applications. However, some of these older formats have not been widely adopted, and newer formats may not yet be very well known. Other formats may be used in non-interactive applications but may be equally suitable for interactive applications. This workshop reviews the requirements of an interactive file format. It presents an overview of currently available formats and discusses their suitability for certain interactive applications. The panel will discuss why past efforts at interactive audio standards have not made it to product and look to widely adopted standards in related fields (graphics and networking) in order to borrow their successful traits for future standards. The workshop is presented by a number of experts who have been involved in the standardization or development of these formats. The formats covered include Ambisonics B-Format, MPEG-4 object coding, MPEG-4 Structured Audio Orchestral Language, MPEG-4 Audio BIFS, and the upcoming iXMF standard.

Sunday, October 5, 9:00 am — 10:45 am

B11 - Internet Streaming—Audio Quality, Measurement, and Monitoring

David Bialik
Ray Archie, CBS Radio
Rusty Hodge, SomaFM
Benjamin Larson, Streambox, Inc.
Greg J. Ogonowski, Orban/CRL
Skip Pizzi, Contributing Editor, Radio World magazine
Geir Skaaden, Neural Audio Corp.

Internet streaming has become a major channel for delivering audio and video content to the public. Now that the public has embraced the medium, providers need to deliver content with quality comparable to other media. Audio monitoring and quantifying performance are therefore becoming important, so that streamers can deliver a product of consistent quality.

Sunday, October 5, 9:00 am — 11:00 am

P23 - Audio DSP

Chair: Jon Boley, LSB Audio

P23-1 Determination and Correction of Individual Channel Time Offsets for Signals Involved in an Audio Mixture
Enrique Perez Gonzalez, Joshua Reiss, Queen Mary University of London - London, UK
A method for reducing comb-filtering effects due to delay time differences between audio signals in a sound mixer has been implemented. The method uses a multichannel cross-adaptive effect topology to automatically determine the minimal delay and polarity contributions required to optimize the sound mixture. The system uses real-time, time-domain transfer function measurements to determine and correct the individual channel offset for every signal involved in the audio mixture. The method has applications in live and recorded audio mixing where recording a single sound source with more than one signal path is required, for example when recording a drum set with multiple microphones. Results are reported that determine the effectiveness of the proposed method.
Convention Paper 7631 (Purchase now)
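The core measurement the abstract relies on — finding each channel's delay and polarity relative to a reference — can be illustrated with a plain cross-correlation peak search. This is a generic time-domain sketch under invented parameters, not the authors' transfer-function method:

```python
import numpy as np

def delay_and_polarity(ref, sig):
    """Estimate the integer-sample offset and polarity of `sig` relative to
    `ref` from the peak of their cross-correlation."""
    xcorr = np.correlate(sig, ref, mode="full")
    peak = int(np.argmax(np.abs(xcorr)))
    lag = peak - (len(ref) - 1)          # positive lag: sig arrives later
    polarity = 1 if xcorr[peak] >= 0 else -1
    return lag, polarity

rng = np.random.default_rng(1)
ref = rng.standard_normal(4096)
# Second microphone signal: 30 samples later, inverted polarity, plus noise.
sig = -np.roll(ref, 30) + 0.05 * rng.standard_normal(4096)

lag, pol = delay_and_polarity(ref, sig)
# Align by advancing the late channel and undoing the inversion before mixing.
aligned = pol * np.roll(sig, -lag)
```

In a mixer, the estimated corrections would be applied per channel before summation to avoid comb filtering.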

P23-2 STFT-Domain Estimation of Subband Correlations
Michael M. Goodwin, Creative Advanced Technology Center - Scotts Valley, CA, USA
Various frequency-domain and subband audio processing algorithms for upmix, format conversion, spatial coding, and other applications have been described in the recent literature. Many of these algorithms rely on measures of the subband autocorrelations and cross-correlations of the input audio channels. In this paper we consider several approaches for estimating subband correlations based on a short-time Fourier transform representation of the input signals.
Convention Paper 7632 (Purchase now)
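One straightforward way to estimate such subband correlations from STFT frames is to average per-bin auto- and cross-spectra and normalize. A minimal sketch (the two-channel test signal and frame-averaging choice are assumptions for illustration, not the paper's estimators):

```python
import numpy as np
from scipy.signal import stft

fs = 48000
rng = np.random.default_rng(2)
common = rng.standard_normal(fs)
# Two channels sharing a common component plus independent noise.
left = common + 0.3 * rng.standard_normal(fs)
right = 0.8 * common + 0.3 * rng.standard_normal(fs)

_, _, L = stft(left, fs=fs, nperseg=1024)
_, _, R = stft(right, fs=fs, nperseg=1024)

# Per-bin auto- and cross-spectra, averaged uniformly over frames; a real-time
# system would instead smooth recursively along the frame axis.
S_ll = np.mean(np.abs(L) ** 2, axis=-1)
S_rr = np.mean(np.abs(R) ** 2, axis=-1)
S_lr = np.mean(L * np.conj(R), axis=-1)

# Normalized subband cross-correlation (a coherence-like measure in [0, 1]).
coherence = np.abs(S_lr) / np.sqrt(S_ll * S_rr + 1e-12)
```

Upmix and spatial-coding algorithms would then derive panning or decorrelation decisions per subband from these statistics.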

P23-3 Separation of Singing Voice from Music Accompaniment with Unvoiced Sounds Reconstruction for Monaural Recordings
Chao-Ling Hsu, Jyh-Shing Roger Jang, National Tsing Hua University - Hsinchu, Taiwan; Te-Lu Tsai, Institute for Information Industry - Taipei, Taiwan
Separating singing voice from music accompaniment is an appealing but challenging problem, especially in the monaural case. One existing approach is based on computational audio scene analysis, which uses pitch as the cue to resynthesize the singing voice. However, the unvoiced parts of the singing voice are totally ignored since they have no pitch at all. This paper proposes a method to detect unvoiced parts of an input signal and to resynthesize them without using pitch information. The experimental results show that the unvoiced parts can be reconstructed successfully, with a signal-to-noise ratio 3.28 dB higher than that achieved by the current state-of-the-art method in the literature.
Convention Paper 7633 (Purchase now)

P23-4 Low Latency Convolution In One Dimension Via Two Dimensional Convolutions: An Intuitive Approach
Jeffrey Hurchalla, Garritan Corp. - Orcas, WA, USA
This paper presents a class of algorithms that can be used to efficiently perform the running convolution of a digital signal with a finite impulse response. The impulse is uniformly partitioned and transformed into the frequency domain, changing the one dimensional convolution into a two dimensional convolution that can be efficiently solved with nested short length acyclic convolution algorithms applied in the frequency domain. The latency of the running convolution is the time needed to acquire a block of data equal in size to the uniform partition length.
Convention Paper 7634 (Purchase now)
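The uniform-partitioning idea can be illustrated with the standard uniformly partitioned overlap-save scheme: a frequency-domain delay line of input-block spectra is multiplied against pre-transformed impulse-response segments, and latency equals one partition length. This is a generic sketch of that family, not the paper's nested short-length acyclic algorithms:

```python
import numpy as np

def partitioned_convolve(x, h, B):
    """Uniformly partitioned overlap-save FFT convolution of signal x with
    impulse response h, processed in blocks of B samples (the latency).
    Returns the full linear convolution, length len(x) + len(h) - 1."""
    nfft = 2 * B
    # Partition the impulse response and pre-transform each segment.
    P = int(np.ceil(len(h) / B))
    H = np.array([np.fft.rfft(h[p * B:(p + 1) * B], nfft) for p in range(P)])

    n_out = len(x) + len(h) - 1
    x = np.concatenate([x, np.zeros(P * B)])           # flush the tail
    fdl = np.zeros((P, nfft // 2 + 1), dtype=complex)  # freq-domain delay line
    prev = np.zeros(B)
    out = []
    for start in range(0, len(x), B):
        cur = np.zeros(B)
        blk = x[start:start + B]
        cur[:len(blk)] = blk
        # Overlap-save input buffer: previous block followed by current block.
        fdl = np.roll(fdl, 1, axis=0)
        fdl[0] = np.fft.rfft(np.concatenate([prev, cur]))
        y = np.fft.irfft((fdl * H).sum(axis=0), nfft)
        out.append(y[B:])                              # keep the valid half
        prev = cur
    return np.concatenate(out)[:n_out]
```

Each output block becomes available as soon as its B input samples arrive, which is the latency property the paper emphasizes.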

Sunday, October 5, 9:00 am — 10:30 am

P24 - Audio Digital Signal Processing and Effects—Part 1

P24-1 Simple Arbitrary IIRs
Richard Lee, Pandit Littoral - Cooktown, Queensland, Australia
This paper presents a method of fitting IIRs (infinite impulse response filters) to an arbitrary frequency response that is simple enough to incorporate in intelligent AV receivers. Short IIR filters are useful where computational power is limited and at low frequencies, where FIRs perform poorly. Loudspeaker and microphone frequency response defects are often better matched by IIRs. Some caveats for digital EQ design are discussed. The emphasis is on loudspeakers and microphones.
Convention Paper 7635 (Purchase now)

P24-2 Analysis of Design Parameters for Crosstalk Cancellation Filters Applied to Different Loudspeaker Configurations
Yesenia Lacouture Parodi, Aalborg University - Aalborg, Denmark
Several approaches to render binaural signals through loudspeakers have been proposed in past decades. Some studies have focused on the optimum loudspeaker arrangement while others have proposed more efficient filters. However, to our knowledge, the identification of optimal parameters for crosstalk cancellation filters applied to different loudspeaker configurations has not yet been addressed systematically. In this paper we document a study of three different inversion techniques applied to several loudspeaker arrangements. Least-squares approximations in the frequency and time domains are evaluated along with a crosstalk canceller based on a minimum-phase approximation. The three methods are simulated in a two-channel configuration, and the least-squares approaches also in four-channel configurations. Different span angles and elevations are evaluated for each case. In order to obtain optimum parameters, we varied the bandwidth, filter length, and regularization constant for each loudspeaker position and each method. We present a description of the simulations carried out and the optimum regularization values, expected channel separation, and performance error obtained for each configuration.
Convention Paper 7636 (Purchase now)
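The frequency-domain least-squares inversion with a regularization constant that the abstract evaluates can be sketched per frequency bin as C = (H^H H + beta I)^-1 H^H. The random stand-in transfer functions and the value of beta below are illustrative assumptions, not the study's measured HRTFs or optimum values:

```python
import numpy as np

rng = np.random.default_rng(4)
n_bins, n_ears, n_spk = 257, 2, 2
# Hypothetical ear-to-loudspeaker transfer functions per frequency bin
# (in practice these come from measured or modeled HRTFs).
H = (rng.standard_normal((n_bins, n_ears, n_spk))
     + 1j * rng.standard_normal((n_bins, n_ears, n_spk)))

beta = 0.01        # regularization constant, traded against channel separation
I = np.eye(n_spk)

# Regularized least-squares inverse per bin: C = (H^H H + beta I)^-1 H^H,
# so that H @ C approximates the identity (each binaural signal reaches
# only its intended ear).
Hh = np.conj(np.swapaxes(H, 1, 2))
C = np.linalg.inv(Hh @ H + beta * I) @ Hh

# Channel separation check: diagonal (direct path) vs. off-diagonal (crosstalk).
T = H @ C
direct = np.abs(T[:, 0, 0]).mean()
crosstalk = np.abs(T[:, 0, 1]).mean()
```

Sweeping beta, the filter length, and the bandwidth per loudspeaker position, as the paper does, then maps out the separation/robustness trade-off.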

P24-3 A Hybrid Time and Frequency Domain Audio Pitch Shifting Algorithm
Nicolas Juillerat, University of Fribourg - Fribourg, Switzerland; Stefan Müller Arisona, University of Santa Barbara - Santa Barbara, CA, USA; Simon Schubiger-Banz, Computer Systems Institute, ETH Zürich - Zürich, Switzerland
This paper presents an abstract algorithm that performs audio pitch shifting as a combination of a signal analysis, a filter bank, and frequency shifting operations. Then, it is shown that two previously proposed pitch shifting algorithms are actually concrete implementations of the presented abstract algorithm. One of them is implemented in the frequency domain whereas the other is implemented in the time domain. Based on an analysis and comparison of the properties of these two implementations (quality, artifacts, assumptions on the signal), we propose a new hybrid implementation working partially in the frequency domain and partially in the time domain, and achieving superior quality by taking the best from each of the two existing implementations.
Convention Paper 7637 (Purchase now)

P24-4 A Colored Noise Suppressor Using Lattice Filter with Correlation Controlled Algorithm
Arata Kawamura, Youji Iiguni, Osaka University - Toyonaka, Osaka, Japan
A noise suppression technique is necessary in a wide range of applications, including mobile communication and speech recognition systems. We have previously proposed a noise suppressor using a lattice filter that can cancel white noise from an observed signal. Unfortunately, many practical noises are not white, and hence the conventional noise suppressor cannot handle them. In this paper we propose a new adaptive algorithm for the lattice filter to suppress colored noise. The proposed algorithm can be directly derived from the conventional time-recursive algorithm. To extract speech mixed with colored noise, the lattice filter with the proposed algorithm generates a noise replica whose auto-correlation is close to that of the noise. Subtracting the noise replica from the observed noisy speech, we obtain the extracted speech. Simulation results showed that the proposed noise suppressor can extract speech mixed with tunnel noise, a colored noise recorded in a practical environment.
Convention Paper 7638 (Purchase now)

P24-5 Accurate IIR Equalization to an Arbitrary Frequency Response, with Low Delay and Low Noise Real-Time Adjustment
Peter Eastty, Oxford Digital Limited - Stonesfield, Oxfordshire, UK
A new form of equalizer has been developed that combines minimum phase, low delay, IIR signal processing with low noise, real-time adjustment of coefficients to accurately deliver an arbitrary frequency response as entered from a graphical user interface. The use of a join-the-dots type graphical user interface combined with cubic or similar splines is a common method of entering curved lines into 2-D drawing programs. The equalizer described in this paper combines a similar type of user interface with low-delay, minimum phase, IIR audio DSP. Key attributes also include real-time, nearly noiseless adjustment of the DSP coefficients in response to user input. All necessary information for the construction of these filters is included.
Convention Paper 7639 (Purchase now)

P24-6 A Method of Capacity Increase for Time-Domain Audio Watermarking Based on Low-Frequency Amplitude Modification
Harumi Murata, Akio Ogihara, Motoi Iwata, Akira Shiozaki, Osaka Prefecture University - Osaka, Japan
The objective of this work is to increase the capacity of watermark information in “the audio watermarking method based on amplitude modification,” which has been proposed by W. N. Lie as a prevention technique against copyright infringement. In this conventional method, the capacity of watermark information is not enough, and it is desirable that the capacity of watermark information is increased. In this paper we increase the capacity of watermark information by embedding multiple watermarks in the different levels of audio data independently. The proposed method has many data-channels for embedding, and hence it is possible to embed multiple watermarks by selecting the proper data-channel according to required data capacity or recovery rate.
Convention Paper 7640 (Purchase now)

P24-7 Constrained-Optimized Sound Beamforming of Loudspeaker-Array System
Myung Song, Soonho Baek, Yonsei University - Seoul, Korea; Seok-Pil Lee, Korea Electronics Technology Institute - Seongnam, Korea; Hong-Goo Kang, Yonsei University - Seoul, Korea
This paper proposes a novel loudspeaker-array system to form relatively high sound pressure toward the desired location. The proposed algorithm adopts a constrained-optimization technique such that the array response toward the desired location is maintained over the mainlobe width while minimizing the sidelobe level. First, the characteristics of sound propagation in a reverberant environment are analyzed by off-line computer simulation. Then, the performance of the implemented loudspeaker-array system is evaluated by measuring the sound pressure distribution in a real test room. The results show that the proposed sound beamforming algorithm forms a more concentrated sound beam toward the desired location than conventional algorithms, even in a reverberant environment.
Convention Paper 7641 (Purchase now)

Sunday, October 5, 11:00 am — 1:00 pm

W11 - Upcoming MPEG Standard for Efficient Parametric Coding and Rendering of Audio Objects

Oliver Hellmuth, Fraunhofer Institute for Integrated Circuits IIS
Jonas Engdegård
Christof Faller
Jürgen Herre
Leon van de Kerkhof

Through exploiting the human perception of spatial sound, “Spatial Audio Coding” technology enabled new ways of low bit-rate audio coding for multichannel signals. Following the finalization of the MPEG Surround specification, ISO/MPEG launched a follow-up standardization activity for bit-rate-efficient and backward compatible coding of several sound objects. On the receiving side, such a Spatial Audio Object Coding (SAOC) system renders the objects interactively into a sound scene on a reproduction setup of choice. The workshop reviews the ideas, principles, and prominent applications behind Spatial Audio Object Coding and reports on the status of the ongoing ISO/MPEG Audio standardization activities in this field. The benefits of the new approach will be highlighted and illustrated by means of real-time demonstrations.

Sunday, October 5, 11:30 am — 1:00 pm

Platinum Road Warriors

Clive Young
Eddie Mapp
Paul “Pappy” Middleton
Howard Page

An all-star panel of leading front-of-house engineers will explore subject matter ranging from gear to gossip, in what promises to be an insightful, amusing, and enlightening 90-minute session. Engineers for superstar artists will discuss war stories, technical innovations, and heroic efforts to maintain the eternal “show must go on” code of the road. Ample time will be provided for an audience Q&A session.

Sunday, October 5, 11:30 am — 1:00 pm

T15 - Real-Time Embedded Audio Signal Processing

Paul Beckmann, DSP Concepts, LLC - Sunnyvale, CA, USA

Product developers implementing audio signal processing algorithms in real-time encounter a host of challenges and tradeoffs. This tutorial focuses on the high-level architectural design decisions commonly faced. We discuss memory usage, block processing, latency, interrupts, and threading in the context of modern digital signal processors with an eye toward creating maintainable and reusable code. The impact of integrating audio decoders and streaming audio to the overall design will be presented. Examples will be drawn from typical professional, consumer, and automotive audio applications.

Sunday, October 5, 11:30 am — 1:00 pm

T16 - Latest Advances in Ceramic Loudspeakers and Their Drivers

Mark Cherry, Maxim, Inc. - Sunnyvale, CA, USA
Robert Polleros, Maxim Integrated Products - Austria
Peter Tiller, Murata - Atlanta, GA, USA

New cell phone designs demand a small form factor while maintaining audio sound-pressure level, and handsets have become so thin that the dynamic speaker is typically the component that limits how thin manufacturers can make them. New developments in ceramic, or piezoelectric, loudspeakers have opened the door for sleek new designs: these speakers can deliver competitive sound-pressure levels (SPL) in a thin, compact package, making them a viable alternative that could replace traditional voice-coil dynamic speakers. Due to the capacitive nature of ceramic speakers, special considerations need to be taken into account when choosing an audio amplifier to drive them.

Sunday, October 5, 2:30 pm — 4:00 pm

W13 - Wanna Feel My LFE? And 51 Other Questions to Scare Your Grandma

Florian Camerer, ORF – Austrian TV
Bosse Ternstrom, Swedish Radio

Florian Camerer, ORF, and Bosse Ternstrom, Swedish Radio, are two veterans of surround sound production and mixing. In this workshop a multitude of diverse examples will be played, with controversial styles, wildly differing mixing techniques, earth shaking low frequency effects, and dynamic range that lives up to its name! Be prepared for a rollercoaster ride through multichannel audio wonderland!

Sunday, October 5, 2:30 pm — 4:00 pm

W14 - Navigating the Technology Mine Field in Game Audio

Marc Schaefgen, Midway Games
Rich Carle, Midway Games
Clark Crawford, Midway Games
Kristoffer Larson, Midway Games

In the early days of game audio systems, tools and assets were all developed and produced in-house. The growth of the games industry has resulted in larger audio teams with separate groups dedicated to technology or content creation. The breadth of game genres and number of “specialisms” required to create the technology and content for a game have mandated that developers look out of house for some of their game audio needs.

In this workshop the panel discusses the changing needs of the game-audio industry and the models that a studio typically uses to produce the audio for a game. The panel consists of a number of audio directors from different first-party studios owned by Midway games. They will examine the current middleware market and explain how various tools are used by their studios in the audio production chain. The panel also explains how out-of-house musicians or sound designers are outsourced as part of the production process.

Sunday, October 5, 2:30 pm — 4:30 pm

T17 - An Introduction to Digital Pulse Width Modulation for Audio Amplification

Pallab Midya, Freescale Semiconductor Inc. - Austin, TX, USA

Digital PWM is highly suitable for audio amplification. Digital audio sources can be readily converted to digital PWM using digital signal processing. The mathematical nonlinearity associated with PWM can be corrected with extremely high accuracy. Natural sampling and other techniques that convert a PCM signal to a digital PWM signal will be discussed. Due to limitations of digital clock speeds and jitter, the duty ratio of the PWM signal has to be quantized to a small number of bits. The noise due to quantization can be effectively shaped to fall outside the audio band. PWM-specific noise shaping techniques will be explained in detail. Further, there is a need for sample rate conversion for a digital PWM modulator to work with a digital PCM signal that is generated using a different clock. The mathematics of asynchronous sample rate converters will also be discussed. Digital PWM signals are amplified by a power stage that introduces nonlinearity and mixes in noise from the power supply. This mechanism will be examined and ways to correct for it will be discussed.
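The duty-ratio quantization and noise-shaping step can be illustrated with a toy first-order error-feedback quantizer: the quantization error of each duty value is subtracted from the next target, pushing the noise toward high frequencies. This is a generic sketch under assumed parameters (8-bit duty resolution, 1 kHz test tone), not the PWM-specific shapers the tutorial covers:

```python
import numpy as np

def pcm_to_duty(pcm, bits=8):
    """Map PCM samples in [-1, 1] to PWM duty ratios quantized to `bits`
    bits, with first-order error feedback so quantization noise is
    first-difference shaped away from the audio band."""
    levels = 2 ** bits
    duty = np.empty_like(pcm)
    err = 0.0
    for n, s in enumerate(pcm):
        target = (s + 1.0) / 2.0 - err   # desired duty minus fed-back error
        q = np.clip(np.round(target * levels) / levels, 0.0, 1.0)
        err = q - target                 # quantization error for next sample
        duty[n] = q
    return duty

t = np.arange(1024) / 48000.0
duty = pcm_to_duty(0.5 * np.sin(2 * np.pi * 1000 * t))
```

Each duty value would then drive one PWM period of the power stage; higher-order shapers buy more in-band noise suppression at the same duty resolution.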

Sunday, October 5, 2:30 pm — 4:00 pm

P26 - Audio Digital Signal Processing and Effects—Part 2

P26-1 Applications of Algorithmically-Generated Digital Audio for Web-Based Sonic Measure Ear Training
Christopher Ariza, Towson University - Towson, MD, USA
This paper examines applications of algorithmically-generated digital audio for a new type of ear training. This approach, called sonic measure ear training, circumvents the many limits of MIDI-based aural testing, and may offer a valuable resource for computer musicians and audio engineers. The Post-Ut system, introduced here, is the first web-based ear training system to offer sonic measure ear-training. After describing the design of the Post-Ut system, including the use of athenaCL, Csound, Python, and MySQL, the audio generation procedures are examined in detail. The design of questions and perceptual considerations are evaluated, and practical applications and opportunities for future development are outlined.
Convention Paper 7645 (Purchase now)

P26-2 A Perceptual Model-Based Speech Enhancement Algorithm
Rongshan Yu, Dolby Laboratories - San Francisco, CA, USA
This paper presents a perceptual model-based speech enhancement algorithm. The proposed algorithm measures the amount of the audible noise in the input noisy speech explicitly by using a psychoacoustic model, and decides an appropriate amount of noise reduction accordingly to achieve good noise level reduction without introducing significant distortion to the clean speech embedded in the input noisy signal. The proposed algorithm also mitigates the musical noise problem commonly encountered in conventional speech enhancement algorithms by having the amount of noise reduction adapt to the instantly estimated noise amplitude. Good performance of the proposed algorithm has been confirmed through objective and subjective tests.
Convention Paper 7646 (Purchase now)

P26-3 Real Time Implementation of an ESPRIT-Based Bass Enhancement Algorithm
Lorenzo Palestini, Emanuele Moretti, Paolo Peretti, Stefania Cecchi, Laura Romoli, Francesco Piazza, Università Politecnica delle Marche - Ancona, Italy
This paper presents a real-time software implementation, for the NU-Tech platform, of a bass enhancement algorithm based on the FAPI subspace tracker and the ESPRIT algorithm for fundamental estimation, which improves the bass response of small loudspeakers by exploiting the well-known psychoacoustic phenomenon of the missing fundamental. Comparative informal listening tests have been performed to validate the virtual bass improvement, and their results show that the proposed method is well appreciated.
Convention Paper 7647 (Purchase now)

P26-4 Low-Power Implementation of a Subband Acoustic Echo Canceller for Portable Devices
Julie Johnson, David Hermann, John Wdowiak, Edward Chau, Hamid Sheikhzadeh, ON Semiconductor - Waterloo, Ontario, Canada
Portable audio communication devices require increasingly superior audio quality while using minimal power. Devices such as cell phones with speakerphone functionality can generate substantial acoustic echo due to the proximity of the microphone and speaker. To improve the audio quality in such devices, an oversampled subband acoustic echo canceller has been implemented on a miniature low-power dual core DSP system. This application is comprised of three subband-based algorithms: a Pseudo-Affine Projection adaptive filter, an Ephraim-Malah based single-microphone noise reduction algorithm, and a novel nonlinear residual echo suppressor. The system consumes less than 4 mW of power when configured with a 128 ms filter. Real-world tests indicate an echo return loss enhancement of greater than 30 dB for typical input levels.
Convention Paper 7648 (Purchase now)
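
The core mechanism of any acoustic echo canceller, adaptively modeling the speaker-to-microphone echo path and subtracting the predicted echo, can be sketched with a plain full-band NLMS filter. This is a deliberately simplified stand-in for the paper's subband pseudo-affine-projection filter, and all names and parameters are ours:

```python
import numpy as np

def nlms_echo_cancel(far, mic, L=64, mu=0.5, eps=1e-8):
    """Cancel the echo of the far-end signal from the mic signal.

    far: far-end (loudspeaker) signal; mic: microphone signal
    containing the acoustic echo of `far`. Returns the residual,
    i.e., the echo-cancelled near-end signal.
    """
    w = np.zeros(L)              # adaptive estimate of the echo path
    x = np.zeros(L)              # far-end tap-delay line
    err = np.zeros(len(mic))     # echo-cancelled output
    for n in range(len(mic)):
        x = np.roll(x, 1)
        x[0] = far[n]
        e = mic[n] - w @ x                  # subtract predicted echo
        err[n] = e
        w += mu * e * x / (x @ x + eps)     # NLMS coefficient update
    return err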

P26-5 A Digital Model of the Echoplex Tape Delay
Steinunn Arnardottir, Jonathan S. Abel, Julius O. Smith, Stanford University - Stanford, CA, USA
The Echoplex is a tape delay unit featuring fixed playback and erase heads, a moveable record head, and a tape loop moving at roughly 8 ips. The relatively slow tape speed allows large frequency shifts, including "sonic booms" and shifting of the tape bias signal into the audio band. Here, the Echoplex tape delay is modeled with read, write, and erase pointers moving along a circular buffer. The model separately generates the quasi-periodic capstan and pinch-wheel components and the drift of the observed fluctuating time delay. This delay drives an interpolated write simulating the record head. To prevent aliasing in the presence of a changing record-head speed, an anti-aliasing filter with a variable cutoff frequency is described.
Convention Paper 7649 (Purchase now)
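
The circular-buffer idea, pointers separated by a fractional and possibly drifting delay, with feedback re-recording the playback onto the loop, can be sketched in a few lines. This is a minimal fractional-delay sketch, not the paper's full read/write/erase-head model with interpolated write and anti-aliasing; names are illustrative:

```python
import numpy as np

def tape_delay(x, delay, feedback=0.4):
    """Circular-buffer delay with a linear-interpolated read.

    x: input samples; delay: per-sample delay in samples (an array,
    so it may drift over time, producing the pitch shifts the paper
    describes); feedback: fraction of the playback re-recorded onto
    the loop, as by the Echoplex record head.
    """
    N = int(np.max(delay)) + 2       # circular buffer ("tape loop")
    buf = np.zeros(N)
    out = np.zeros(len(x))
    w = 0                            # write pointer (record head)
    for n in range(len(x)):
        r = (w - delay[n]) % N       # fractional read position
        i, frac = int(r), r - int(r)
        y = (1 - frac) * buf[i] + frac * buf[(i + 1) % N]
        out[n] = y
        buf[w] = x[n] + feedback * y # record input plus feedback
        w = (w + 1) % N
    return out
```

Slowly varying `delay` yields the characteristic tape pitch warble; the paper's model additionally shapes that variation into separate capstan, pinch-wheel, and drift components.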

P26-6 A Digital Reverberator Modeled after the Scattering of Acoustic Waves by Trees in a Forest
Kyle Spratt, Jonathan S. Abel, Stanford University - Stanford, CA, USA
A digital reverberator modeled after the scattering of acoustic waves among trees in an idealized forest is presented. Termed "treeverb," the technique simulates forest acoustics using a network of digital waveguides, with bi-directional delay lines connecting trees represented by multi-port scattering junctions. The reverberator is designed by selecting tree locations and diameters, with waveguide delays determined by inter-tree distances, and scattering filters fixed according to tree-to-tree angles and trunk diameters. The scattering is modeled as that of plane waves normally incident on a rigid cylinder, and a simple low-order scattering filter is presented and shown to closely approximate the theoretical scattering. Small forests are seen to yield dense, gated reverb-like impulse responses.

Sunday, October 5, 4:30 pm — 6:00 pm

W15 - Interactive MIDI-Based Technologies for Game Audio

Steve Martz, THX Ltd.
Chris Grigg, IASIG
Larry the O
Tom Savell, Creative Labs

The MIDI Manufacturers Association (MMA) has developed three new standards for MIDI-based technologies with applications in game audio. The 3-D MIDI Controllers specification allows for real-time positioning and movement of music and sound sources in 3-D space, under MIDI control. The Interactive XMF specification marks the first nonproprietary file format for portable, cue-oriented interactive audio and MIDI content with integrated scripting. Finally, the MMA is working toward a completely new, and drastically simplified, 32-bit version of the MIDI message protocol for use on modern transports and software APIs, called the HD Protocol for MIDI Devices.

Sunday, October 5, 4:30 pm — 6:30 pm

T19 - Point-Counterpoint—Fixed vs. Floating-Point DSPs

Robert Bristow-Johnson, Audio Imagination
Jayant Datta, THX - Syracuse, NY, USA
Boris Lerner, Analog Devices - Norwood, MA, USA
Matthew Watson, Texas Instruments, Inc. - Dallas, TX, USA

There is considerable controversy and interest in the signal processing community concerning the use of fixed- and floating-point DSPs, and there are various trade-offs between the two approaches. The audience will walk away with an appreciation of both approaches and an understanding of the strengths and weaknesses of each. Further, this tutorial will focus on audio-specific signal processing applications to show when a fixed-point DSP is applicable and when a floating-point DSP is suitable.

Sunday, October 5, 5:00 pm — 6:45 pm

T20 - Radio Frequency Interference and Audio Systems

Jim Brown, Audio Systems Group, Inc.

This tutorial begins by identifying and discussing the fundamental mechanisms that couple RF into audio systems and allow it to be detected. Attention is then given to design techniques, for both equipment and systems, that avoid these problems, as well as to methods of fixing problems in existing equipment and systems that have been poorly designed or built.