AES San Francisco 2008
Library Event Details

Thursday, October 2, 9:00 am — 12:30 pm

P1 - Audio Coding

Chair: Marina Bosi, Stanford University - Stanford, CA, USA

P1-1 A Parametric Instrument Codec for Very Low Bit Rates—Mirko Arnold, Gerald Schuller, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany
A technique for the compression of guitar signals is presented that utilizes a simple model of the guitar. The goal for the codec is to obtain acceptable quality at significantly lower bit rates compared to universal audio codecs. This instrument codec achieves its data compression by transmitting an excitation function and model parameters to the receiver instead of the waveform. The parameters are extracted from the signal using weighted least squares approximation in the frequency domain. For evaluation a listening test has been conducted and the results are presented. They show that this compression technique provides a quality level comparable to recent universal audio codecs. The application however is, at this stage, limited to very simple guitar melody lines. [This paper is being presented by Gerald Schuller.]
Convention Paper 7501

P1-2 Stereo ACC Real-Time Audio Communication—Anibal Ferreira, University of Porto - Porto, Portugal, ATC Labs, Chatham, NJ, USA; Filipe Abreu, SEEGNAL Research - Portugal; Deepen Sinha, ATC Labs - Chatham, NJ, USA
Audio Communication Coder (ACC) is a codec that has been optimized for monophonic encoding of mixed speech/audio material while minimizing codec delay and improving intrinsic error robustness. In this paper we describe two major recent algorithmic improvements to ACC: on-the-fly bit rate switching and coding of stereo. A combination of source, parametric, and perceptual coding techniques allows a very graceful switching between different bit rates with minimal impact on the subjective quality. A real-time GUI demonstration platform is available that illustrates the ACC operation from 16 kbit/s mono to 256 kbit/s stereo. A real-time two-way stereo communication platform over Bluetooth has been implemented that illustrates the ACC operational flexibility and robustness in error-prone environments.
Convention Paper 7502

P1-3 MPEG-4 Enhanced Low Delay AAC—A New Standard for High Quality Communication—Markus Schnell, Markus Schmidt, Manuel Jander, Tobias Albert, Ralf Geiger, Fraunhofer IIS - Erlangen, Germany; Vesa Ruoppila, Per Ekstrand, Dolby Stockholm/Sweden, Nuremberg/Germany; Bernhard Grill, Fraunhofer IIS - Erlangen, Germany
The MPEG Audio standardization group has recently concluded the standardization process for the MPEG-4 ER Enhanced Low Delay AAC (AAC-ELD) codec. This codec is a new member of the MPEG Advanced Audio Coding family. It represents the efficient combination of the AAC Low Delay codec and the Spectral Band Replication (SBR) technique known from HE-AAC. This paper provides a complete overview of the underlying technology, presents points of operation as well as applications, and discusses MPEG verification test results.
Convention Paper 7503

P1-4 Efficient Detection of Exact Redundancies in Audio Signals—José R. Zapata G., Universidad Pontificia Bolivariana - Medellín, Antioquia, Colombia; Ricardo A. Garcia, Kurzweil Music Systems - Waltham, MA, USA
An efficient method to identify bitwise identical long-time redundant segments in audio signals is presented. It uses audio segmentation with simple time domain features to identify long term candidates for similar segments, and low level sample accurate metrics for the final matching. Applications in compression (lossy and lossless) of music signals (monophonic and multichannel) are discussed.
Convention Paper 7504
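
The two-stage idea described in P1-4 (a cheap search for candidates, followed by sample-accurate matching) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract's time-domain segmentation features are replaced here by a plain dictionary lookup of short sample windows, and the function name `find_exact_repeats` and the `block` parameter are hypothetical.

```python
def find_exact_repeats(samples, block=4):
    """Return (earlier_start, later_start, length) for exactly repeated runs.

    Assumption: candidates come from a dictionary of short windows, which
    stands in for the feature-based segmentation mentioned in the abstract.
    """
    first_seen = {}   # window contents -> first start index
    repeats = []
    n = len(samples)
    i = 0
    while i + block <= n:
        key = tuple(samples[i:i + block])
        j = first_seen.get(key)
        if j is None:
            first_seen[key] = i
            i += 1
            continue
        # Candidate found: extend the match sample by sample,
        # never letting the earlier run overlap the later one.
        length = 0
        while (i + length < n and j + length < i
               and samples[j + length] == samples[i + length]):
            length += 1
        if length >= block:
            repeats.append((j, i, length))
            i += length   # skip past the matched run
        else:
            i += 1
    return repeats


signal = [1, 2, 3, 4, 5, 9, 9, 1, 2, 3, 4, 5, 7]
matches = find_exact_repeats(signal, block=4)
```

A lossless coder could, in principle, replace each later run with a reference to the earlier one; how the paper actually exploits the matches is not stated in the abstract.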

P1-5 An Improved Distortion Measure for Audio Coding and a Corresponding Two-Layered Trellis Approach for its Optimization—Vinay Melkote, Kenneth Rose, University of California - Santa Barbara, CA, USA
The efficacy of rate-distortion optimization in audio coding is constrained by the quality of the distortion measure. The proposed approach is motivated by the observation that the Noise-to-Mask Ratio (NMR) measure, as it is widely used, is only well adapted to evaluate relative distortion of audio bands of equal width on the Bark scale. We propose a modification of the distortion measure to explicitly account for Bark bandwidth differences across audio coding bands. Substantial subjective gains are observed when this new measure is utilized instead of NMR in the Two Loop Search, for quantization and coding parameters of scalefactor bands in an AAC encoder. Comprehensive optimization of the new measure, over the entire audio file, is then performed using a two-layered trellis approach, and yields nearly artifact-free audio even at low bit-rates.
Convention Paper 7505

P1-6 Spatial Audio Scene Coding—Michael M. Goodwin, Jean-Marc Jot, Creative Advanced Technology Center - Scotts Valley, CA, USA
This paper provides an overview of a framework for generalized multichannel audio processing. In this Spatial Audio Scene Coding (SASC) framework, the central idea is to represent an input audio scene in a way that is independent of any assumed or intended reproduction format. This format-agnostic parameterization enables optimal reproduction over any given playback system as well as flexible scene modification. The signal analysis and synthesis tools needed for SASC are described, including a presentation of new approaches for multichannel primary-ambient decomposition. Applications of SASC to spatial audio coding, upmix, phase-amplitude matrix decoding, multichannel format conversion, and binaural reproduction are discussed.
Convention Paper 7507

P1-7 Microphone Front-Ends for Spatial Audio Coders—Christof Faller, Illusonic LLC - Lausanne, Switzerland
Spatial audio coders, such as MPEG Surround, have enabled low bit-rate and stereo backwards compatible coding of multichannel surround audio. Directional audio coding (DirAC) can be viewed as spatial audio coding designed around specific microphone front-ends. DirAC is based on B-format spatial sound analysis and has no direct stereo backwards compatibility. We present a number of two-capsule-based, stereo-compatible microphone front-ends and corresponding spatial audio encoder modifications that enable the use of spatial audio coders to directly capture and code surround sound.
Convention Paper 7508

Thursday, October 2, 9:00 am — 10:45 am

W1 - New Frontiers in Audio Forensics

Chair:
Richard Sanders
Panelists:
Eddy B. Brixen
Rob Maher
Jeffrey Smith
Gregg Stutchman

Abstract:
In recent history, Audio Forensics has been primarily the practice of audio enhancement, analog audio authenticity, and speaker identification. Due to the transition to “all things digital,” new areas of audio forensics are necessarily emerging. These include digital media authenticity, audio for digital video, and dealing with compressed audio files such as those from cell phones, portable recorders, and messaging machines. Other new topics include the possible use of the variation of the electric network frequency and audio ballistics analysis. Dealing with the new technologies requires an additional knowledge base, some of which will be presented in this workshop.

Thursday, October 2, 11:00 am — 1:00 pm

W2 - Archiving and Preservation for Audio Engineers

Chair:
Konrad Strauss
Panelists:
Chuck Ainlay
George Massenburg
John Spencer

Abstract:
The art of audio recording is 130 years old. Recordings from the late 1890s to the present day have been preserved thanks to the longevity of analog media, but can the same be said for today's digital recordings? Digital storage technology is transient in nature, making lifespan and obsolescence a significant concern. Additionally, digital recordings are usually platform specific, relying on the existence of unique software and hardware platforms, and the practice of nondestructive recording creates a staggering amount of data, much of which is redundant or unneeded. This workshop will address the subject of best practices for storage and preservation of digital audio recordings and outline current thinking and archiving strategies from the home studio to the large production facility.

Thursday, October 2, 1:00 pm — 2:30 pm

Opening Ceremonies
Awards
Keynote Speech

Abstract:
Opening Remarks:
• Executive Director Roger Furness
• President Bob Moses
• Convention Cochairs John Strawn, Valerie Tyler
Program:
• AES Awards Presentation
• Introduction of Keynote Speaker
• Keynote Address by Chris Stone

Awards Presentation

Please join us as the AES presents special awards to those who have made outstanding contributions to the Society in such areas as research, scholarship, and publications, as well as other accomplishments that have contributed to the enhancement of our industry. The awardees are:

PUBLICATIONS AWARD: Roger S. Grinnip III
BOARD OF GOVERNORS AWARD: Jim Anderson, Peter Swarte
FELLOWSHIP AWARD: Jonathan Abel, Angelo Farina, Rob Maher, Peter Mapp, Christoph Musialik, Neil Shaw, Julius Smith, Gerald Stanley, Alexander Voishvillo, William Whitlock
SILVER MEDAL AWARD: Keith Johnson
GOLD MEDAL AWARD: George Massenburg
DISTINGUISHED SERVICE MEDAL AWARD: Jay McKnight

Keynote Speaker

Record Plant co-founder Chris Stone will explore new trends and opportunities in the music industry and what it takes to succeed in today's environment, including how to utilize networking and free services to reduce risk when starting a new small business. Speaking from his strengths as a business/marketing entrepreneur, Stone will focus on the artist’s need to develop a sophisticated approach to operating their own business and also how traditional engineers can remain relevant and play a meaningful role in the ongoing evolution of the recording industry. Stone’s keynote address is entitled: The Artist Owns the Industry.

Thursday, October 2, 2:30 pm — 6:00 pm

TT2 - Dolby Laboratories, San Francisco

Abstract:
Visit legendary Dolby Laboratories’ headquarters while you are in San Francisco. Dolby, known for its more than 40 years of audio innovation and leadership, will showcase its latest technologies (audio and video) for high-definition packaged disc media and digital cinema. Demonstrations will take place in Dolby’s state-of-the-art listening rooms, and in their world-class Presentation Studio.

Dolby Laboratories (NYSE:DLB) is the global leader in technologies that are essential elements in the best entertainment experiences. Founded in 1965 and best known for high-quality audio and surround sound, Dolby innovations enrich entertainment at the movies, at home, or on the go. Visit http://www.dolby.com for more information.

Note: Maximum of 40 participants per tour.

Price: $30 (members), $40 (nonmembers)

Thursday, October 2, 2:30 pm — 4:30 pm

T3 - Broadband Noise Reduction: Theory and Applications

Presenters:
Alexey Lukin, iZotope, Inc. - Boston, MA, USA
Jeremy Todd, iZotope, Inc. - Boston, MA, USA

Abstract:
Broadband noise reduction (BNR) is a common technique for attenuating background noise in audio recordings. Implementations of BNR have steadily improved over the past several decades, but the majority of them share the same basic principles. This tutorial discusses various techniques used in the signal processing theory behind BNR. These include earlier methods of implementation such as broadband and multiband gates and compander-based systems for tape recording. Beyond explaining these early methods, the tutorial places greater emphasis on recent advances in more modern techniques such as spectral subtraction. These advances include multi-resolution processing, psychoacoustic models, and the separation of noise into tonal and broadband parts. We will compare examples of each technique for their effectiveness on several types of audio recordings.
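
For readers unfamiliar with spectral subtraction, the basic magnitude-domain rule can be sketched as below. This is a generic textbook form, not the presenters' implementation. It assumes the magnitude spectra have already been computed per frame (for example by a short-time Fourier transform, omitted here), and the function names, the `over_subtraction` factor, and the `floor` value are illustrative assumptions.

```python
def estimate_noise(noise_frames):
    """Average the magnitude spectra of frames known to contain only noise."""
    n_bins = len(noise_frames[0])
    count = len(noise_frames)
    return [sum(frame[k] for frame in noise_frames) / count
            for k in range(n_bins)]


def spectral_subtract(frame_mag, noise_mag, over_subtraction=1.0, floor=0.05):
    """Subtract the noise estimate from one frame's magnitude spectrum.

    Each bin is clamped to a small fraction of its input magnitude
    (the spectral floor), so no bin ever becomes negative.
    """
    cleaned = []
    for x, n in zip(frame_mag, noise_mag):
        cleaned.append(max(x - over_subtraction * n, floor * x))
    return cleaned


# Two noise-only frames and one noisy frame, two frequency bins each.
noise = estimate_noise([[1.0, 2.0], [3.0, 2.0]])
result = spectral_subtract([10.0, 1.0], noise)
```

In a full system the cleaned magnitudes would be recombined with the original phase and resynthesized; the multi-resolution and psychoacoustic refinements named in the abstract go beyond this sketch.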

Thursday, October 2, 2:30 pm — 4:00 pm

W3 - Analyzing, Recommending, and Searching Audio Content—Commercial Applications of Music Information Retrieval

Chair:
Jay LeBoeuf, Imagine Research
Panelists:
Markus Cremer, Gracenote
Matthias Gruhne, Fraunhofer Institute for Digital Media Technology
Tristan Jehan, The Echo Nest
Keyvan Mohajer, Melodis Corporation

Abstract:
This workshop will focus on the cutting-edge applications of music information retrieval technology (MIR). MIR is a key technology behind music startups recently featured in Wired and Popular Science. Online music consumption is dramatically enhanced by automatic music recommendation, customized playlisting, song identification via cell phone, and rich metadata / digital fingerprinting technologies. Emerging startups offer intelligent music recommender systems, lookup of songs via humming the melody, and searching through large archives of audio. Recording and music software now offer powerful new features, leveraging MIR techniques. What’s out there and where is this all going? This workshop will inform AES members of the practical developments and exciting opportunities within MIR, particularly with the rich combination of commercial work in this area. Panelists will include industry thought-leaders: a blend of established commercial companies and emerging start-ups.

Thursday, October 2, 3:00 pm — 5:00 pm

Evolution of Video Game Sound

Moderator:
John Griffin, Marketing Director, Games, Dolby Laboratories - USA
Panelists:
Simon Ashby, Product Director, Audiokinetic - Canada
Will Davis, Audio Lead, Electronic Arts/Pandemic Studios - USA
Charles Deenen, Sr. Audio Director, Electronic Arts Black Box - Canada
Tom Hays, Director, Technicolor Interactive Services - USA

Abstract:
From the discrete-logic build of Pong to the multi-core processors of modern consoles, video game audio has made giant strides in complexity, reaching a heightened level of immersion and user interactivity. From its modest beginnings of monophonic bleeps to high-resolution multichannel orchestrations and point-of-view audio panning, audio professionals have creatively stretched the envelopes of audio production techniques, as well as the game engine capabilities.

The panel of distinguished video game audio professionals will discuss audio production challenges of landmark game platforms, techniques used to maximize the video game audio experience, the dynamics leading to the modern video game soundtracks, and where the video game audio experience is heading.

This event has been organized by Gene Radzik, AES Historical Committee Co-Chair.

Thursday, October 2, 5:00 pm — 6:45 pm

M1 - Basic Acoustics: Understanding the Loudspeaker

Presenter:
John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada

Abstract:
This presentation is for AES members at an intermediate level and introduces many concepts in acoustics. The basic propagation of sound waves in air for both plane and spherical waves is developed and applied to the operation of a simple, sealed-box loudspeaker. Topics such as the acoustic impedance, compact source operation, and diffraction are included. Some live demonstrations with a simple loudspeaker, microphone, and measuring computer are used to illustrate the basic radiation principle of a typical electrodynamic driver mounted in a sealed box.

Thursday, October 2, 5:00 pm — 6:45 pm

W4 - How to Avoid Critical Data Loss After Catastrophic System Failure

Chair:
Chris Bross, DriveSavers, Inc.

Abstract:
Natural disaster, drive failure, human error, and cyber-related crime or corruption can threaten business continuity and a studio’s long-term survival if its data storage devices are compromised. Backup systems and disaster recovery plans can help the studio survive a system crash, but precautionary measures should also be taken to prevent catastrophic data loss should backup measures fail or be incomplete.

A senior data recovery engineer will review the most common causes of drive failure, demonstrate the consequences of improper diagnosis and mishandling of the device, and provide appropriate action steps you can take for each type of data loss. Learn how to avoid further media damage or permanent data loss after the crash and optimize the chance for a complete data recovery.

Friday, October 3, 9:00 am — 11:00 am

Perceptual Audio Coding—The First 20 Years

Moderator:
Marina Bosi, Stanford University; author of Introduction to Digital Audio Coding and Standards
Panelists:
Karlheinz Brandenburg, Fraunhofer Institute for Digital Media Technology; TU Ilmenau - Ilmenau, Germany
Bernd Edler, University of Hannover
Louis Fielder, Dolby Laboratories
J. D. Johnston, Neural Audio Corp. - Kirkland, WA, USA
John Princen, BroadOn Communications
Gerhard Stoll, IRT
Ken Sugiyama, NEC

Abstract:
Who would have guessed that teenagers and everybody else would be clamoring for devices with MP3/AAC (MPEG Layer III/MPEG Advanced Audio Coding) perceptual audio coders that fit into their pockets? As perceptual audio coders become more and more integral to our daily lives, residing within DVDs, mobile devices, broad/webcasting, electronic distribution of music, etc., a natural question to ask is: what made this possible and where is this going? This panel, which includes many of the early pioneers who helped advance the field of perceptual audio coding, will present a historical overview of the technology and a look at how the market evolved from niche to mainstream and where the field is heading.

Friday, October 3, 9:00 am — 11:00 am

M2 - Binaural Audio Technology—History, Current Practice, and Emerging Trends

Presenter:
Robert Schulein, Schaumburg, IL, USA

Abstract:
During the winter and spring of 1931-32, Bell Telephone Laboratories, in cooperation with Leopold Stokowski and the Philadelphia Symphony Orchestra, undertook a series of tests of musical reproduction using the most advanced apparatus obtainable at that time. The objectives were to determine how closely an acoustic facsimile of an orchestra could be approached using both stereo loudspeakers and binaural reproduction. Detailed documents discovered within the Bell Telephone archives will serve as a basis for describing the results and problems revealed while creating the binaural demonstrations. Since these historic events, interest in binaural recording and reproduction has grown in areas such as sound field recording, acoustic research, sound field simulation, audio for electronic games, music listening, and artificial reality. Each of these technologies has its own technical concerns involving transducers, environmental simulation, human perception, position sensing, and signal processing. This Master Class will cover the underlying principles germane to binaural perception, simulation, recording, and reproduction. It will include live demonstrations as well as recorded audio/visual examples.

Friday, October 3, 9:00 am — 12:00 pm

TT3 - Center for New Music and Audio Technologies, UC Berkeley

Abstract:
The UC Berkeley Center for New Music and Audio Technologies (CNMAT) houses programs in research, pedagogy, and public performance that are focused on the creative interaction between music and technology. CNMAT's pedagogy program is highly integrated with the Department of Music's graduate program in composition, while the research program is linked with other disciplines and departments on campus such as architecture, mathematics, statistics, mechanical engineering, computer science, electrical engineering, psychology, cognitive science, physics, space sciences, the Center for New Media, and the Department of Theater, Dance, and Performance Studies. Presenters David Wessel (Co-Director, CNMAT) and Adrian Freed (Research Director, CNMAT) will give an overview of CNMAT research projects. For more information, visit http://cnmat.berkeley.edu/.

All visitors are required to sign a Non-Disclosure Agreement to enter the facility.

Note: Maximum of 47 participants per tour.

Price: $35 (members), $45 (nonmembers)

Friday, October 3, 9:00 am — 11:30 am

T4 - Perceptual Audio Evaluation

Presenters:
Søren Bech, Bang & Olufsen A/S - Struer, Denmark
Nick Zacharov, SenseLab - Delta, Denmark

Abstract:
The aim of this tutorial is to provide an overview of perceptual evaluation of audio through listening tests, based on good practices in the audio and affiliated industries. The tutorial is aimed at anyone interested in the evaluation of audio quality and will provide a highly condensed overview of all aspects of performing listening tests in a robust manner. Topics will include: (1) definition of a suitable research question and associated hypothesis, (2) definition of the question to be answered by the subject, (3) scaling of the subjective response, (4) control of experimental variables such as choice of signal, reproduction system, listening room, and selection of test subjects, (5) statistical planning of the experiments, and (6) statistical analysis of the subjective responses. The tutorial will include both theory and practical examples including discussion of the recommendations of relevant international standards (IEC, ITU, ISO). The presentation will be made available to attendees and an extended version will be available in the form of the text “Perceptual Audio Evaluation” authored by Søren Bech and Nick Zacharov.

Friday, October 3, 9:00 am — 10:30 am

P7 - Audio Content Management

P7-1 A Piano Sound Database for Testing Automatic Transcription Methods—Luis Ortiz-Berenguer, Elena Blanco-Martin, Alberto Alvarez-Fernandez, Jose A. Blas-Moncalvillo, Francisco J. Casajus-Quiros, Universidad Politecnica de Madrid - Madrid, Spain
A piano sound database, called PianoUPM, is presented. It is intended to help the research community in developing and testing transcription methods. A practical database needs to contain notes and chords played through the full piano range, and it needs to be recorded from acoustic pianos rather than synthesized ones. The presented piano sound database includes recordings of thirteen pianos from different manufacturers. There are both upright and grand pianos. The recordings include the eighty-eight notes and eight different chords played both in legato and staccato styles. It also includes some notes of every octave played with four different forces to analyze the nonlinear behavior. This work has been supported by the Spanish National Project TEC2006-13067-C03-01/TCM.
Convention Paper 7538

P7-2 Measurements of Spaciousness for Stereophonic Music—Andy Sarroff, Juan P. Bello, New York University - New York, NY, USA
The spaciousness of pre-recorded stereophonic music, or how large and immersive its virtual space is perceived to be, is an important feature of a produced recording. Quantitative models of spaciousness are proposed as a function of (1) the width of a recording’s source panning and (2) its amount of overall reverberation. The models are independently evaluated in two controlled experiments. In one, the panning widths of a distribution of sources with varying degrees of panning are estimated; in the other, the extent of reverberation for controlled mixtures of sources with varying degrees of reverberation is estimated. The models are shown to be valid in a controlled experimental framework.
Convention Paper 7539

P7-3 Music Annotation and Retrieval System Using Anti-Models—Zhi-Sheng Chen, Jia-Min Zen, Jyh-Shing Roger Jang, National Tsing Hua University - Taiwan
Query-by-semantic-description (QBSD) is a natural way for searching/annotating music in a large database. We propose such a system by considering anti-words for each annotation word based on the concept of supervised multi-class labeling (SML). Moreover, words that are highly correlated with the anti-semantic meaning of a word constitute its anti-word set. By modeling both a word and its anti-word set, our system can achieve +8.21% and +1.6% gains of average precision and recall against SML under the condition of an equal average number of annotation words, that is, 10. By incorporating anti-models, we also allow queries with anti-semantic words, which is not an option for previous systems.
Convention Paper 7540

P7-4 The Effects of Lossy Audio Encoding on Onset Detection Tasks—Kurt Jacobson, Matthew Davies, Mark Sandler, Queen Mary University of London - London, UK
In large audio collections, it is common to store audio content with perceptual encoding. However, encoding parameters may vary from collection to collection or even within a collection—using different bit rates, sample rates, codecs, etc. We evaluated the effect of various audio encodings on the onset detection task and show that audio-based onset detection methods are surprisingly robust in the presence of MP3 encoded audio. Statistically significant changes in onset detection accuracy only occur at bit-rates lower than 32 kbps.
Convention Paper 7541

P7-5 An Evaluation of Pre-Processing Algorithms for Rhythmic Pattern Analysis—Matthias Gruhne, Christian Dittmar, Daniel Gaertner, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany; Gerald Schuller, Ilmenau Technical University - Ilmenau, Germany
For the semantic analysis of polyphonic music, such as genre recognition, rhythmic pattern features (also called Beat Histogram) can be used. Feature extraction is based on the correlation of rhythmic information from drum instruments in the audio signal. In addition to drum instruments, the sounds of pitched instruments are usually also part of the music signal to analyze. This can have a significant influence on the correlation patterns. This paper describes the influence of pitched instruments for the extraction of rhythmic features, and evaluates two different pre-processing methods. One method computes a sinusoidal and noise model, where its residual signal is used for feature extraction. In the second method, a drum transcription based on spectral characteristics of drum sounds is performed, and the rhythm pattern feature is derived directly from the occurrences of the drum events. Finally, the results are explained and compared in detail.
Convention Paper 7542

P7-6 A Framework for Producing Rich Musical Metadata in Creative Music Production—Gyorgy Fazekas, Yves Raimond, Mark Sandler, Queen Mary University of London - London, UK
Musical metadata may include references to individuals, equipment, procedures, parameters, or audio features extracted from signals. There are countless possibilities for using this data during the production process. An intelligent audio editor, besides internally relying on it, can be both producer and consumer of information about specific aspects of music production. In this paper we propose a framework for producing and managing meta information about a recording session, a single take or a subsection of a take. As basis for the necessary knowledge representation we use the Music Ontology with domain specific extensions. We provide examples on how metadata can be used creatively, and demonstrate the implementation of an extended metadata editor in a multitrack audio editor application.
Convention Paper 7543

P7-7 SoundTorch: Quick Browsing in Large Audio Collections—Sebastian Heise, Michael Hlatky, Jörn Loviscach, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany
Musicians, sound engineers, and foley artists face the challenge of finding appropriate sounds in vast collections containing thousands of audio files. Imprecise naming and tagging force users to review dozens of files in order to pick the right sound. Acoustic matching is not necessarily helpful here as it needs a sound exemplar to match with and may miss relevant files. Hence, we propose to combine acoustic content analysis with accelerated auditioning: Audio files are automatically arranged in 2-D by psychoacoustic similarity. A user can shine a virtual flashlight onto this representation; all sounds in the light cone are played back simultaneously, their position indicated through surround sound. User tests show that this method can leverage the human brain's capability to single out sounds from a spatial mixture and enhance browsing in large collections of audio content.
Convention Paper 7544

P7-8 File System Tricks for Audio Production—Michael Hlatky, Sebastian Heise, Jörn Loviscach, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany
Not every file presented by a computer operating system needs to be an actual stream of independent bits. We demonstrate that different types of virtual files and folders including so-called "Filesystems in Userspace" (FUSE) allow streamlining audio content management with relatively little additional complexity. For instance, an off-the-shelf database system may present a distributed sound library through (seemingly) standard files in a project-specific hierarchy with no physical copying of the data involved. Regions of audio files may be represented as separate files; audio effect plug-ins may be displayed as collections of folders for on-demand processing while files are read. We address differences between operating systems, available implementations, and lessons learned when applying such techniques.
Convention Paper 7545

Friday, October 3, 10:00 am — 12:30 pm

TT4 - Paul Stubblebine Mastering/The Tape Project, San Francisco

Abstract:
A world-class mastering studio with credits that include classic recordings for The Grateful Dead and Santana and such new artists as Ferron, California Zephyr, and Jennifer Berezan. Now deeply involved with DVD as well as traditional audio mastering, the studio recently moved to a larger, full service Mission Street complex. The Tape Project remasters recordings for analog tape distribution.

Note: Maximum of 20 participants per tour.

Price: $30 (members), $40 (nonmembers)

Friday, October 3, 1:00 pm — 2:00 pm

Lunchtime Keynote: Dave Giovannoni of First Sounds

Abstract:
Before Edison—Recovering the World's First Audio Recordings

First Sounds, an informal collaborative of audio engineers and historians, recently corrected the historical record and made international headlines by playing back a phonautogram made in Paris in April 1860—a ghostly, ten-second evocation of a French folk song. This and other phonautograms establish a forgotten French typesetter as the first person to record reproducible airborne sound 17 years before Edison invented the phonograph. Primitive and nearly accidental, the world’s first audio recordings pose a unique set of technical challenges. David Giovannoni of First Sounds discusses their recovery and restoration and will premiere two newly restored recordings.

Friday, October 3, 2:30 pm — 4:00 pm

Compressors—A Dynamic Perspective

Moderator:
Fab Dupont
Panelists:
Dave Derr
Wade Goeke
Dave Hill
Hutch Hutchison
George Massenburg
Rupert Neve

Abstract:
A device that, some might say, is being abused by those involved in the “loudness wars,” the dynamic range compressor can also be a very creative tool. But how exactly does it work? Six of the audio industry’s top designers and manufacturers lift the lid on one of the key components in any recording, broadcast or live sound signal chain. They will discuss the history, philosophy and evolution of this often misunderstood processor. Is one compressor design better than another? What design features work best for what application? The panel will also reveal the workings behind the mysteries of feedback and feed-forward designs, side-chains, and hard and soft knees, and explore the uses of multiband, parallel and serial compression.

Friday, October 3, 2:30 pm — 4:15 pm

M3 - Sonic Methodology and Mythology

Presenter:
Keith O. Johnson - Pacifica, CA, USA

Abstract:
Do extravagant designs and superlative specifications satisfy sonic expectations? Can power cords, interconnects, marker dyes and other components in a controversial lineup improve staging, clarity, and other features? Intelligent measurements and neural feedback studies support these sonic issues as well as predict misdirected methodology from speculative thought. Sonic changes and perceptual feats to hear them are possible and we'll explore recorders, LPs, amplifiers, conversion, wire, circuits and loudspeakers to observe how they create artifacts and interact in systems. Hearing models help create and interpret tests intended to excite predictive behaviors of components. Time domain, tone cluster and fast sweep signals along with simple test devices reveal small complex artifacts. Background knowledge of halls, recording techniques, and cognitive perception becomes helpful to interpret results, which can reveal simple explanations to otherwise remarkable physics. Other topics include power amplifiers that can ruin a recording session, noise propagation from regulators, singing wire, coherent noise, eigensonics, and speakers prejudicial to key signatures. Waveform perception, tempo shifting, and learned object sounds will be demonstrated.

Friday, October 3, 4:00 pm — 6:45 pm

B6 - History of Audio Processing

Chair:
Emil Torick
Panelists:
Dick Burden
Marvin Caesar, Aphex
Glen Clark, Glen Clark & Associates
Mike Dorrough, Dorrough Electronics
Frank Foti, Omnia
Greg J. Ogonowski, Orban/CRL
Bob Orban, Orban/CRL
Eric Small, Modulation Sciences

Abstract:
The participants of this session pioneered audio processing and developed the tools we still use today. A discussion of the developments, technology, and the “Loudness Wars” will take place. This session is a must if you want to understand how and why audio processing is used.

Friday, October 3, 5:30 pm — 6:45 pm

T8 - Free Source Code for Processing AES Audio Data

Presenters:
Gregg C. Hawkes, Xilinx - San Jose, CA, USA
Reed Tidwell, Xilinx - San Jose, CA, USA

Abstract:
This session is a tutorial on the Xilinx free Verilog and VHDL source code for extracting and inserting audio in SDI streams, including “on the fly” error correction and high performance, continuously adaptive, asynchronous sample rate conversion. The audio sample rate conversion supports large ratios as well as fractional conversion rates and maintains high performance while continuously adapting itself to the input and output rates without user control. The features, device utilization, and performance of the IP will be presented and demonstrated with industry standard audio hardware.

Saturday, October 4, 9:00 am — 12:00 pm

P14 - Listening Tests & Psychoacoustics

Chair: Poppy Crum, Johns Hopkins University - Baltimore, MD, USA

P14-1 Rapid Learning of Subjective Preference in Equalization—Andrew Sabin, Bryan Pardo, Northwestern University - Evanston, IL, USA
We describe and test an algorithm to rapidly learn a listener’s desired equalization curve. First, a sound is modified by a series of equalization curves. After each modification, the listener indicates how well the current sound exemplifies a target sound descriptor (e.g., “warm”). After rating, a weighting function is computed where the weight of each channel (frequency band) is proportional to the slope of the regression line between listener responses and within-channel gain. Listeners report that sounds generated using this function capture their intended meaning of the descriptor. Machine ratings generated by computing the similarity of a given curve to the weighting function are highly correlated to listener responses, and asymptotic performance is reached after only ~25 listener ratings.
Convention Paper 7581 (Purchase now)
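
The weighting step described in the P14-1 abstract can be sketched as follows: for each frequency channel, fit a least-squares line of listener rating against the gain applied in that channel, and take the slope as the channel weight. This is only a sketch under stated assumptions; the function name, the data layout, and the use of the raw slope as the weight (the abstract says only "proportional to") are hypothetical and not taken from the paper.

```python
def channel_weights(gains, ratings):
    """Per-channel regression slopes of rating versus within-channel gain.

    gains   -- list of trials; each trial is a list of per-channel gains (dB)
    ratings -- one listener rating per trial
    Returns one slope per channel. The layout is an assumption for illustration.
    """
    n = len(ratings)
    mean_rating = sum(ratings) / n
    weights = []
    for ch in range(len(gains[0])):
        column = [trial[ch] for trial in gains]
        mean_gain = sum(column) / n
        sxx = sum((g - mean_gain) ** 2 for g in column)
        sxy = sum((g - mean_gain) * (r - mean_rating)
                  for g, r in zip(column, ratings))
        # Ordinary least-squares slope; a channel whose gain never varied
        # carries no information, so it gets zero weight.
        weights.append(sxy / sxx if sxx else 0.0)
    return weights


if __name__ == "__main__":
    # Three trials, two channels: ratings rise with channel 0 gain
    # and fall with channel 1 gain.
    trial_gains = [[-6.0, 6.0], [0.0, 0.0], [6.0, -6.0]]
    trial_ratings = [-1.0, 0.0, 1.0]
    print(channel_weights(trial_gains, trial_ratings))
```

In this toy data set the first channel receives a positive weight and the second a negative weight of equal size, which is the behavior the abstract describes: channels whose boost raises the rating are emphasized in the learned curve.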

P14-2 An Initial Validation of Individualized Crosstalk Cancellation Filters for Binaural Perceptual Experiments—Alastair Moore, Anthony Tew, University of York - York, UK; Rozenn Nicol, France Télécom R&D - Lannion, France
Crosstalk cancellation provides a means of delivering binaural stimuli to a listener for psychoacoustic research that avoids many of the problems of using headphones in experiments. The aim of this study was to determine whether individual crosstalk cancellation filters can be used to present binaural stimuli that are perceptually indistinguishable from a real sound source. The fast deconvolution method with frequency-dependent regularization was used to design the crosstalk cancellation filters. The reproduction loudspeakers were positioned at ±90-degrees azimuth and the synthesized location was 0-degrees azimuth. Eight listeners were tested with three types of stimuli. In twenty-two out of the twenty-four listener/stimulus combinations there were no perceptible differences between the real and virtual sources. The results suggest that this method of producing individualized crosstalk cancellation filters is suitable for binaural perceptual experiments.
Convention Paper 7582 (Purchase now)
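
To make the filter-design step in the P14-2 abstract more concrete, the sketch below computes a regularized inverse of a 2x2 loudspeaker-to-ear transfer matrix at a single frequency bin, using the form H = (C^H C + beta I)^-1 C^H that is commonly associated with regularized deconvolution. This is an assumed, simplified form: the function names are hypothetical, the modeling delay is omitted, and the paper's actual regularization shape is not given in the abstract.

```python
def matmul2(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]


def crosstalk_filters_2x2(c, beta):
    """Regularized inverse H = (C^H C + beta*I)^-1 C^H for one frequency bin.

    c    -- 2x2 matrix of loudspeaker-to-ear transfer functions (complex)
    beta -- regularization weight for this bin; making it vary with
            frequency is what "frequency-dependent" refers to (assumption).
    """
    # Conjugate transpose of C.
    ch = [[c[0][0].conjugate(), c[1][0].conjugate()],
          [c[0][1].conjugate(), c[1][1].conjugate()]]
    g = matmul2(ch, c)
    g[0][0] += beta
    g[1][1] += beta
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    return matmul2(g_inv, ch)


if __name__ == "__main__":
    plant = [[1 + 0j, 0.5j], [0.5j, 1 + 0j]]
    for beta in (0.0, 0.1):
        filters = crosstalk_filters_2x2(plant, beta)
        print(beta, matmul2(plant, filters))
```

With `beta = 0` the product of the plant and the filters is the identity matrix, meaning each ear receives only its intended signal. A positive `beta` trades some cancellation accuracy for lower filter gain, which is the usual motivation for regularizing at frequencies where the plant is ill-conditioned.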

P14-3 Reverberation Echo Density Psychoacoustics—Patty Huang, Jonathan S. Abel, Hiroko Terasawa, Jonathan Berger, Stanford University - Stanford, CA, USA
A series of psychoacoustic experiments were carried out to explore the relationship between an objective measure of reverberation echo density, called the normalized echo density (NED), and subjective perception of the time-domain texture of reverberation. In one experiment, 25 subjects evaluated the dissimilarity of signals having static echo densities. The reported dissimilarities matched absolute NED differences with an R² of 93%. In a 19-subject experiment, reverberation impulse responses having evolving echo densities were used. With an R² of 90% the absolute log ratio of the late field onset times matched reported dissimilarities between impulse responses. In a third experiment, subjects reported breakpoints in the character of static echo patterns at NED values of 0.3 and 0.7.
Convention Paper 7583 (Purchase now)

P14-4 Optimal Modal Spacing and Density for Critical Listening—Bruno Fazenda, Matthew Wankling, University of Huddersfield - Huddersfield, West Yorkshire, UK
This paper presents a study on the subjective effects of modal spacing and density. These are measures often used as indicators to define particular aspect ratios and source positions to avoid low frequency reproduction problems in rooms. These indicators imply a given modal spacing leading to a supposedly less problematic response for the listener. An investigation into this topic shows that subjects can identify an optimal spacing between two resonances associated with a reduction of the overall decay. Further work to define a subjective counterpart to the Schroeder Frequency has revealed that an increase in density may not always lead to an improvement, as interaction between mode-shapes results in serious degradation of the stimulus, which is detectable by listeners.
Convention Paper 7584 (Purchase now)

P14-5 The Illusion of Continuity Revisited on Filling Gaps in the Saxophone Sound—Piotr Kleczkowski, AGH University of Science and Technology - Cracow, Poland
Some time-frequency gaps were cut from a recording of a motif played legato on the saxophone. Subsequently, the gaps were filled with various sonic material: noises and sounds of an accompanying band. The quality of the saxophone sound processed in this way was investigated by listening tests. In all of the tests, the saxophone seemed to continue through the gaps, an impairment in quality being observed as a change in the tone color or an attenuation of the sound level. There were two aims of this research. First, to investigate whether the continuity illusion contributed to this effect, and second, to discover what kind of sonic material filling the gaps would cause the least deterioration in sound quality.
Convention Paper 7585 (Purchase now)

P14-6 The Incongruency Advantage for Sounds in Natural Scenes—Brian Gygi, Veterans Affairs Northern California Health Care System - Martinez, CA, USA; Valeriy Shafiro, Rush University Medical Center - Chicago, IL, USA
This paper tests identification of environmental sounds (dogs barking or cars honking) in familiar auditory background scenes (street ambience, restaurants). Initial results with subjects trained on both the background scenes and the sounds to be identified showed a significant advantage of about 5% better identification accuracy for sounds that were incongruous with the background scene (e.g., a rooster crowing in a hospital). Studies with naïve listeners showed this effect is level-dependent: there is no advantage for incongruent sounds up to a Sound/Scene ratio (So/Sc) of –7.5 dB, after which there is again about 5% better identification. Modeling using spectral-temporal measures showed that saliency based on acoustic features cannot account for this difference.
Convention Paper 7586 (Purchase now)

Saturday, October 4, 11:00 am — 12:00 pm

T11 - [Canceled]

Saturday, October 4, 5:00 pm — 6:45 pm

B10 - Audio Transport

Chair:
David Prentice, VCA
Panelists:
Kevin Campbell, APT Ltd.
Chris Crump, Comrex
Angela DePascale, Global Digital Datacom Services Inc.
Herb Squire, DSI RF
Mike Uhl, Telos

Abstract:
This session discusses the techniques and technologies used for transporting audio (e.g., STLs, RPUs, and codecs). Transporting audio can be complex, and the panel will explore the various roads you can take.

Saturday, October 4, 6:00 pm — 7:00 pm

MIX Foundation 2008 TECnology Hall of Fame

Abstract:
Hosted by Mix Magazine Executive Editor/TECnology Hall of Fame director George Petersen.

Presented annually by the Mix Foundation for Excellence in Audio to honor significant, lasting contributions to the advancement of audio technology, this year's event will recognize fifteen audio innovations. "It is interesting to note how many of these products are still in daily use decades after their introduction," Petersen says. "These aren't simply museum pieces, but working tools. We're proud to recognize their significance to the industry."

Sunday, October 5, 9:00 am — 10:45 am

W10 - File Formats for Interactive Applications and Games

Chair:
Chris Grigg, Beatnik, Inc.
Panelists:
Christof Faller, Illusonic LLC
John Lazzaro, University of California, Berkeley
Juergen Schmidt, Thomson

Abstract:
There are a number of different standards covering file formats that may be applicable to interactive or game applications. However, some of these older formats have not been widely adopted and newer formats may not yet be very well known. Other formats may be used in non-interactive applications but may be equally suitable to interactive applications. This tutorial reviews the requirements of an interactive file format. It presents an overview of currently available formats and discusses their suitability to certain interactive applications. The panel will discuss why past efforts at interactive audio standards have not made it to product and look to widely-adopted standards in related fields (graphics and networking) in order to borrow their successful traits for future standards. The workshop is presented by a number of experts who have been involved in the standardization or development of these formats. The formats covered include Ambisonics B-Format, MPEG-4 object coding, MPEG-4 Structured Audio Orchestral Language, MPEG-4 Audio BIFS, and the upcoming iXMF standard.

Sunday, October 5, 9:00 am — 11:30 am

P22 - Hearing Enhancement

Chair: Alan Seefeldt, Dolby Laboratories - San Francisco, CA, USA

P22-1 Assessing the Acoustic Performance and Potential Intelligibility of Assistive Audio Systems for the Hard of Hearing and Other Users—Peter Mapp, Peter Mapp Associates - Colchester, Essex, UK
Around 14% of the general population suffer from a noticeable degree of hearing loss and would benefit from some form of hearing assistance or deaf aid. Recent DDA legislation and requirements mean that many more hearing assistive systems are being installed, yet there is evidence to suggest that many of these systems fail to perform adequately and provide the benefit expected. There has also been a proliferation of classroom and lecture room “soundfield” systems, with much conflicting evidence as to their apparent effectiveness. This paper reports on the results of some trial acoustic performance testing of such systems. In particular, system microphone type, distance, and location are shown to have a significant effect on the resultant performance. The potential of using the Sound Transmission Index (STI), and in particular STIPa, for carrying out installation surveys has been investigated and a number of practical problems are highlighted. The requirements for a suitable acoustic test source to mimic a human talker are discussed, as is the need to adequately assess the effects of both reverberation and noise. The findings discussed in the paper are also relevant to the installation and testing of boardroom and conference room telecommunication systems.
Convention Paper 7626 (Purchase now)

P22-2 Aging and Sound Perception: Desirable Characteristics of Entertainment Audio for the Elderly—Hannes Müsch, Dolby Laboratories - San Francisco, CA, USA
During the last few years the research community has made substantial progress toward understanding how aging affects the way the ear and brain process sound. A review of the literature supports our experience as audio professionals that elderly listeners have preferences for the reproduction of entertainment audio that differ from those of young listeners. This presentation reviews the literature on aging and sound perception with a focus on speech. The review identifies desirable audio reproduction characteristics and discusses signal processing techniques to generate audio that is suited for elderly listeners.
Convention Paper 7627 (Purchase now)

P22-3 Speech Enhancement of Movie Sound—Christian Uhle, Oliver Hellmuth, Jan Weigel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Today, many people have problems understanding the speech content of a movie, e.g., due to hearing impairments. This paper describes a method for improving speech intelligibility of movie sound. Speech is detected by means of a pattern recognition method; the audio signal is then attenuated during periods where speech is absent. The speech signals are further processed by a spectral weighting method aiming at the suppression of the background noise. The spectral weights are computed by means of feature extraction and a neural network regression method. The output signal finally carries all relevant speech with reduced background noise allowing the listener to follow the plot of the movie more easily. Results of numerical evaluations and of listening tests are presented.
Convention Paper 7628 (Purchase now)

P22-4 An Investigation of Audio Balance for Elderly Listeners Using Loudness as the Main Parameter—Tomoyasu Komori, Toru Takagi, NHK Science and Technical Research Laboratories - Tokyo, Japan; Kohichi Kurozumi, NHK Engineering Service, Inc. - Tokyo, Japan; Kazuhiro Murakawa, Yamaki Electric Corporation - Tokyo, Japan
We have been studying the best sound balance for audibility for elderly listeners. We conducted subjective tests on the balance between narration and background sound using professional sound mixing engineers. The comparative loudness of narration to background sound was used to calculate appropriate respective levels for use in documentary programs. Monosyllabic intelligibility tests were then conducted in a noisy environment with both elderly and young people and a condition that complicates hearing for the elderly was identified. Assuming that the recruitment phenomenon and reduced ability to separate narration from background sound cause hearing problems for the elderly, we estimated appropriate loudness levels for them. We also constructed a prototype to assess the best audio balance for the elderly objectively.
Convention Paper 7629 (Purchase now)

P22-5 Estimating the Transfer Function from Air Conduction Recording to One’s Own Hearing—Sook Young Won, Jonathan Berger, Stanford University - Stanford, CA, USA
It is well known that there is often a sense of disappointment when an individual hears a recording of his/her own voice. The perceptual disparity between the live and recorded sound of one’s own voice can be explained scientifically as the result of the multiple paths via which our body transmits vibrations from the vocal cords to the auditory system during vocalization, as opposed to the single air-conducted path in hearing a playback of one’s own recorded voice. In this paper we aim to investigate the spectral characteristics of one’s own hearing as compared to an air-conducted recording. To accomplish this objective, we designed and conducted a perceptual experiment with a real-time filtering application.
Convention Paper 7630 (Purchase now)

Sunday, October 5, 11:00 am — 1:00 pm

The Evolution of Electronic Instrument Interfaces: Past, Present, Future

Moderator:
Gino Robair, editor of Electronic Musician magazine
Panelists:
Roger Linn, Roger Linn Designs
Tom Oberheim, Founder, Oberheim Electronics
Dave Smith, Dave Smith Instruments

Abstract:
Developing musical instruments that take advantage of new technologies is exciting. However, coming up with something that is not only intuitive and musically useful but that will be accepted by musicians requires more than just a feature-rich box with sexy industrial design. This panel will discuss the issues involved in creating new musical instruments, with a focus on interface design, as well as explore ways to avoid the mistakes of the past when designing products for the future. These three panelists have brought a variety of innovative products to market (with varying degrees of success), which have made each of them household names in the MI world.

Sunday, October 5, 11:00 am — 1:00 pm

W11 - Upcoming MPEG Standard for Efficient Parametric Coding and Rendering of Audio Objects

Chair:
Oliver Hellmuth, Fraunhofer Institute for Integrated Circuits IIS
Panelists:
Jonas Engdegård
Christof Faller
Jürgen Herre
Leon van de Kerkhof

Abstract:
Through exploiting the human perception of spatial sound, “Spatial Audio Coding” technology enabled new ways of low bit-rate audio coding for multichannel signals. Following the finalization of the MPEG Surround specification, ISO/MPEG launched a follow-up standardization activity for bit-rate-efficient and backward compatible coding of several sound objects. On the receiving side, such a Spatial Audio Object Coding (SAOC) system renders the objects interactively into a sound scene on a reproduction setup of choice. The workshop reviews the ideas, principles, and prominent applications behind Spatial Audio Object Coding and reports on the status of the ongoing ISO/MPEG Audio standardization activities in this field. The benefits of the new approach will be highlighted and illustrated by means of real-time demonstrations.

Sunday, October 5, 2:30 pm — 4:30 pm

P25 - Forensic Analysis

Chair: Eddy Bogh Brixen, EBB-Consult - Smørum, Denmark

P25-1 Magnetic Field Mapping of Analog Audio and Video Tapes (Invited Paper)—David P. Pappas, Fabio C. S. da Silva, National Institute of Standards and Technology
Forensic analysis of magnetic tape evidence can be critical in many cases. In particular, it has been shown that imaging the magnetic patterns on tapes can give important information related to authenticity and identifying the type of recorder(s) used. We present here an imaging system that allows examiners to view the magnetic patterns while they are playing, copying, or listening to cassette audio and VHS video tapes. With the added benefits of high resolution and polarity sensitivity, this system significantly improves on the accuracy and speed of the examination. Finally, the images, which constitute a true magnetic field map of the tape, can be saved to a computer file. We demonstrate that analog audio data can be recovered directly from these maps with bandwidths only limited by the sampling rate of the electronics. For helical video signals on VHS tapes, we can see the signature of the magnetic recording. Higher sampling rates and transverse spatial resolution would be needed to reconstruct video data from the images. However, for cases where VHS video tape has been cut into small pieces, we have built a custom fixture that allows the tape to be held up to it. It can display the low frequency synchronization track, allowing examiners to quickly identify the side of the tape that the data is on and the orientation. The system is based on magnetoresistive imaging heads capable of scanning 256 channels simultaneously along linear ranges of either 4 mm or 13 mm. High speed electronics read the channels and transfer the data to a computer that builds and displays the images.

P25-2 Magnetic Development: Magneto-Optical Indicator Film Imaging vs. Ferrofluids—Jonathan C. Broyles, Image and Sound Forensics - Parker, CO, USA
The techniques, advantages, and disadvantages of two methods of magnetic development, ferrofluids and magneto-optical indicator film (MOIF) imaging, are discussed. Test results are presented and discussed with supporting test images and figures, together with a number of MOIF imaging examples. A general overview of how magneto-optical magnetic development works is given. The magnetic development examples were processed on a magneto-optical imaging system developed and built by the author.
Convention Paper 7642 (Purchase now)

P25-3 Extraction of Electric Network Frequency Signals from Recordings Made in a Controlled Magnetic Field—Richard Sanders, University of Colorado Denver - Denver, CO, USA; Pete Popolo, National Center for Voice & Speech - Denver, CO, USA
An Electric Network Frequency (ENF) signal is the 60 Hz component of an AC power signal that varies over time due to fluctuations in power production and consumption, across the entire grid of a power distribution network. When present in audio recordings, these signals (or their harmonics) can be used to authenticate a recording, time stamp the original, or determine if a recording was copied or edited. This paper will present the results of an experiment to determine if ENF signals in a controlled magnetic field can be detected and extracted from audio recordings made with battery operated audio recording devices.
Convention Paper 7643 (Purchase now)
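
As a rough illustration of what extracting an ENF signal involves, the sketch below tracks the dominant frequency near 60 Hz frame by frame, using the Goertzel recurrence to measure power at a grid of candidate frequencies. It is a minimal sketch, not the authors' procedure: the function names, the 2-second frame length, the search span, and the 0.01 Hz step are all assumptions, and a practical analysis would normally band-limit the recording first.

```python
import math


def goertzel_power(frame, freq_hz, sample_rate):
    """Signal power at one frequency, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s1 = s2 = 0.0
    for x in frame:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def enf_track(samples, sample_rate, nominal_hz=60.0, span_hz=0.5,
              step_hz=0.01, frame_s=2.0):
    """Estimate the mains-hum frequency in each non-overlapping frame.

    Every default here is an illustrative assumption.
    """
    frame_len = int(frame_s * sample_rate)
    n_steps = int(round(2.0 * span_hz / step_hz))
    candidates = [nominal_hz - span_hz + i * step_hz for i in range(n_steps + 1)]
    track = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Keep the candidate frequency that carries the most power.
        track.append(max(candidates,
                         key=lambda f: goertzel_power(frame, f, sample_rate)))
    return track


if __name__ == "__main__":
    rate = 400  # low rate keeps the demo fast; hum near 60 Hz is still below Nyquist
    hum = [math.sin(2 * math.pi * 59.9 * n / rate) for n in range(800)]
    hum += [math.sin(2 * math.pi * 60.1 * n / rate) for n in range(800)]
    print(enf_track(hum, rate))
```

The resulting sequence of per-frame frequencies is the kind of trace that can be compared against a reference record of grid frequency, which is how the authentication and time-stamping uses described in the abstract are typically approached.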

P25-4 Forensic Voice Identification Utilizing Digitally Extracted Speech Characteristics—Jeff M. Smith, Richard Sanders, University of Colorado Denver - Denver, CO, USA
By combining modern capabilities in the digital domain with more traditional methods of aural comparison and spectrographic inspection, the acquisition of identity from recorded evidence can be executed with greater confidence. To aid the Audio Forensic Examiner’s efforts in this, an effective approach to manual voice identification is presented here. To this end, this paper relates the research into and application of unique vocal characteristics utilized by the SIDNI (Speaker Identification by Numerical Imprint) automated system of voice identification to manual forensic investigation. Some characteristics include: fundamental speaking frequency, rate of vowels, proportional relationships in spectral distribution, amplitude of speech, and perturbation measurements.
Convention Paper 7644 (Purchase now)
