AES New York 2005
P1 - Analysis, Synthesis of Sound
Paper Session Details
Friday, October 7, 9:30 am — 11:30 am
Chair: Durand Begault, Audio Forensic Center, Charles M. Salter Associates - San Francisco, CA, USA
P1-1 Perceptual Modeling of Piano Tones—Brahim Hamadicharef, Emmanuel Ifeachor, University of Plymouth - Plymouth, Devon, UK
A modeling system for piano tones is presented. It fully automates the modeling process and includes three main stages: sound analysis, sound synthesis, and sound quality assessment. High-quality piano sounds are analyzed in the time and frequency domains. The analysis results are then used to design filter models matching the string resonances and to create excitation signals, via an inverse filtering technique, for the excitation-filter synthesis model. The impact of each sound model parameter on the perceived sound quality has been assessed using the Perceptual Evaluation of Audio Quality (PEAQ) algorithm. This helps optimize the DSP resource requirements for real-time implementation on multimedia PCs and FPGA-based hardware.
Convention Paper 6525 (Purchase now)
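A minimal sketch of the excitation-filter idea the abstract describes: a two-pole resonator (a stand-in for one string resonance; the coefficients and signal here are hypothetical, not from the paper) is driven by an excitation, and its exact FIR inverse recovers that excitation from the "recorded" tone.

```python
# Hypothetical excitation-filter sketch: an all-pole resonator models a
# string resonance; inverse filtering the tone recovers the excitation.
import math

def resonator(x, f0, r, fs):
    """All-pole resonator y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    a1 = 2 * r * math.cos(2 * math.pi * f0 / fs)
    a2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = xn + a1 * y1 + a2 * y2
        y2, y1 = y1, yn
        y.append(yn)
    return y

def inverse_filter(y, f0, r, fs):
    """Exact FIR inverse: x[n] = y[n] - a1*y[n-1] - a2*y[n-2]."""
    a1 = 2 * r * math.cos(2 * math.pi * f0 / fs)
    a2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for yn in y:
        out.append(yn - a1 * y1 - a2 * y2)
        y2, y1 = y1, yn
    return out

fs, f0, r = 44100, 440.0, 0.999
excitation = [1.0] + [0.0] * 99           # idealized hammer impulse
tone = resonator(excitation, f0, r, fs)   # synthetic "piano" tone
recovered = inverse_filter(tone, f0, r, fs)
```

Because the inverse of an all-pole filter is FIR, the recovery is exact; a real system would fit the resonator to measured string partials first.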
P1-2 Multichannel Audio Processing Using a Unified Domain Representation—Kevin Short, Ricardo Garcia, Michelle Daniels, Chaoticom Technologies - Andover, MA, USA
The Unified Domain representation for synchronized multichannel audio streams is introduced. This lossless and invertible transformation describes multiple streams of audio as a single frequency domain magnitude component multiplied by a complex matrix encoding the spatial and phase relationship information for each channel. Unified domain analysis and signal processing techniques for applications such as high-resolution frequency analysis, sound source separation, spatial psychoacoustic models and low bit rate audio coding are presented.
Convention Paper 6526 (Purchase now)
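An illustrative guess at the flavor of such a representation (the paper's exact transform is not given in the abstract): per frequency bin, factor the channel spectra into one shared magnitude and a complex vector carrying each channel's relative level and phase, which inverts losslessly.

```python
# Illustrative (not the authors' exact transform): one shared magnitude
# per bin times per-channel complex coefficients, exactly invertible.
import cmath

def to_unified(bins):
    """bins: per-channel complex DFT values for a single frequency bin."""
    mag = sum(abs(b) ** 2 for b in bins) ** 0.5    # shared magnitude
    if mag == 0.0:
        return 0.0, [0j for _ in bins]
    return mag, [b / mag for b in bins]            # spatial/phase entries

def from_unified(mag, coeffs):
    return [mag * c for c in coeffs]               # exact inversion

left = 0.8 * cmath.exp(1j * 0.3)
right = 0.5 * cmath.exp(-1j * 1.1)
mag, coeffs = to_unified([left, right])
rec = from_unified(mag, coeffs)
```

The per-channel coefficients have unit total energy by construction, so all level information lives in the shared magnitude, which is the property that makes a single-magnitude frequency analysis of multiple channels possible.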
P1-3 Multichannel Audio Time-Scale Modification—David Dorran, Dublin Institute of Technology - Dublin, Ireland; Robert Lawlor, National University of Ireland - Maynooth, Ireland; Eugene Coyle, Dublin Institute of Technology - Dublin, Ireland
Phase vocoder-based approaches to audio time scale modification introduce a reverberant artifact into the time-scaled output. Recent techniques have been developed to reduce the presence of this artifact. However, these techniques have the effect of introducing additional issues relating to their application to multichannel recordings. This paper addresses these issues by collectively analyzing all channels prior to time-scaling each individual channel.
Convention Paper 6527 (Purchase now)
P1-4 Improving MPEG-7 Sound Classification—Holger Crysandt, Aachen University (RWTH) - Aachen, Germany
This paper describes a mechanism to improve the sound classification algorithm included in the MPEG-7 standard without modifying or extending the standard itself. The sequential classification is turned into a hierarchical classification, which makes it possible to adapt the classification algorithm more flexibly to the characteristics of the sound classes. The paper also gives a detailed view of how the algorithm is implemented, using an XML database to store and retrieve content information about the audio signals and model descriptions of the sound classes using the MPEG-7 standard.
Convention Paper 6528 (Purchase now)
P2 - Acoustics & Desktop Production
Friday, October 7, 9:30 am — 11:30 am
Chair: Brad Gover, National Research Council - Ottawa, Ontario, Canada
P2-1 Measurement of Architectural Speech Security of Closed Offices and Meeting Rooms—Bradford Gover, John Bradley, National Research Council Canada - Ottawa, Ontario, Canada
A measurement procedure has been developed for rating the architectural speech security of closed offices and meeting rooms. It is based on measuring the attenuation between average levels in the meeting room and received levels at spot locations outside the room, 0.25 m from the room boundaries. These attenuations are used with statistical distributions of speech and noise levels to calculate a suitable signal-to-noise measure. This previously derived objective measure is related to the audibility and intelligibility of the transmitted speech. The measurement at spot receiver locations allows detection and characterization of localized weak spots (“hot spots”) in the room’s boundaries.
Convention Paper 6529 (Purchase now)
P2-2 Frequency-Based Coloring of the Waveform Display to Facilitate Audio Editing and Retrieval—Stephen Rice, The University of Mississippi - University, MS, USA; Comparisonics Corporation, Grass Valley, CA, USA
The audio waveform display provides the visual focus in audio-editing systems yet sounds are difficult to see in the display. Using a new technique, the display is colored to represent the frequency content to make sounds more visible. This requires extraction of frequency information from the audio signal and an appropriate mapping of this information to the color space. Ideally, the coloring is independent of recording level, and similar sounds are represented by similar colors. Audio-editing systems are enhanced by the improved user interface. Audio-retrieval systems can present colored waveform displays as visual “thumbnails” in a list of search results.
Convention Paper 6530 (Purchase now)
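A minimal sketch of the kind of mapping the abstract calls for, under assumptions of mine (the paper's actual mapping is not specified): estimate each block's dominant frequency with a small DFT and map it logarithmically onto a hue; level independence follows from using only the spectral peak's location, never its magnitude.

```python
# Minimal sketch: dominant frequency per block -> hue, level-independent.
import colorsys, math

def dominant_freq(block, fs):
    """Return the frequency of the strongest DFT bin (DC excluded)."""
    n = len(block)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(block))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(block))
        mag = re * re + im * im
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fs / n

def freq_to_rgb(f, lo=50.0, hi=8000.0):
    """Map log-frequency onto hue: low = red (0.0), high = violet (0.8)."""
    t = (math.log(max(f, lo)) - math.log(lo)) / (math.log(hi) - math.log(lo))
    return colorsys.hsv_to_rgb(0.8 * min(t, 1.0), 1.0, 1.0)

fs, n = 8000, 64
tone = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(n)]
f = dominant_freq(tone, fs)        # 1000 Hz lands exactly on bin 8
color = freq_to_rgb(f)
quiet_f = dominant_freq([0.1 * s for s in tone], fs)  # same color at -20 dB
```

Scaling the block's amplitude scales every bin equally, so the peak bin, and hence the color, is unchanged, matching the abstract's requirement that similar sounds get similar colors regardless of recording level.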
P2-3 Development of Auditory Alerts for Air Traffic Control Consoles—Densil Cabrera, Sam Ferguson, University of Sydney - Sydney, NSW, Australia; Gary Laing, Airservices Australia - Brisbane, Australia
This paper documents a project that developed a hierarchical auditory alert scheme for air traffic control consoles, replacing a basic system of auditory alerts. Alerts are designed to convey the level of urgency, not provoke annoyance, be easily distinguished, minimize speech interference, and be easily localized. User evaluations indicate that the new alert scheme is highly advantageous, especially when combined with improved visual coding of alerts. The alert scheme was implemented in Australian air traffic control centers in July 2005.
Convention Paper 6531 (Purchase now)
P2-4 Multichannel Impulse Response Measurement, Analysis, and Rendering in Archaeological Acoustics—Damian Murphy, University of York - Heslington, York, UK
Developments in measuring the acoustic characteristics of concert halls and opera houses are leading to standardized methods of impulse response capture for a wide variety of auralization applications and delivery methods. This paper extends and develops these methods to nontraditional performance venues and examines how objective acoustic parameter analysis can be applied in the field of acoustic archaeology. An initial study of selected archaeological sites in the UK is presented, each site demonstrating some feature of interest in terms of its acoustic characteristics. The resulting database of measurements has a particular use in convolution-based reverberation, and an acoustic analysis of the impulse responses provides an additional insight as to the characteristics and construction of these spaces.
Convention Paper 6532 (Purchase now)
P3 - Restoration, Storage & Car Audio
Friday, October 7, 1:30 pm — 3:30 pm
Chair: Brett Crockett, Dolby Laboratories - San Francisco, CA, USA
P3-1 Preferred Listening Levels in the Automotive Environment—Eric Benjamin, Brett Crockett, Dolby Laboratories - San Francisco, CA, USA
In recent years the automobile passenger compartment has become increasingly important as a location in which entertainment, primarily audio entertainment, is consumed. Listener preference for dialog levels in the domestic environment is fairly well understood and follows conversational speech levels. It has been known for some time that preferred listening levels for music are higher than those for dialog, and it has also been observed that, because of the high ambient noise in the automobile passenger compartment, higher listening levels are used in order to bring the reproduction above the noise level. What is the range of noise levels within the automobile passenger compartment, and how does it affect listener preference for sound reproduction levels?
Convention Paper 6533 (Purchase now)
P3-2 High Frequency Compensation for Compressed Digital Audio Using Sampled-Data Control—Koji Fujiyama, Naoya Iwasaki, Riku Kaibe, Hiroshi Kano, SANYO Electric Co., Ltd. - Hirakata, Osaka, Japan; Yutaka Yamamoto, Kyoto University - Kyoto, Japan
Demand for improving the quality of compressed digital audio is growing as portable non-CD players become popular. Because high compression degrades sound quality, a way to restore it is being sought. One effective approach is to generate high-frequency components from the lower-frequency ones. We propose using imaging components together with digital filters designed by "sampled-data control" to minimize the error between an assumed original analog signal and the interpolated digital signal. We improved the sound quality of compressed audio by achieving a frequency spectrum close to that of the CD original, especially for low-bit-rate material.
Convention Paper 6534 (Purchase now)
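A sketch of the imaging effect the proposal builds on (the paper's sampled-data filter design itself is not reproduced here): inserting a zero between samples (2x upsampling) creates a spectral image of each baseband component, mirrored about the original Nyquist frequency, and those images are the raw material for the regenerated high band.

```python
# Zero-stuffed 2x upsampling: each baseband tone acquires an image
# mirrored about the original Nyquist frequency, with equal magnitude.
import math

def dft_mag(x, k):
    n = len(x)
    re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
    return math.hypot(re, im)

n, k0 = 32, 5
x = [math.cos(2 * math.pi * k0 * i / n) for i in range(n)]  # baseband tone
up = []
for v in x:                        # insert a zero after every sample
    up.extend([v, 0.0])

base = dft_mag(up, k0)             # original component (bin 5 of 64)
image = dft_mag(up, n - k0)        # image mirrored about old Nyquist (bin 27)
```

A bandwidth-extension scheme keeps the image region instead of low-pass filtering it away, then shapes it with filters such as those the authors design via sampled-data control.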
P3-3 Are There Criteria to Evaluate Optical Disc Quality that Are Relevant for End-Users?—Jean-Marc Fontaine, Jacques Poitevineau, Laboratoire d'Acoustique Musicale - Paris, France
This paper deals with long-term preservation of sound and audiovisual heritage collections. Write-once recordable optical discs, among other media, can fulfill this objective, provided rigorous measures are taken, essentially through careful inspection of the discs with specific equipment. We have studied disc quality criteria by thoroughly considering end-user applications, especially during data transfer processing (initial quality) and later access to existing collections (aging behavior). We show that error rates cannot be adopted as the sole quality descriptors and that other parameters have to be defined, e.g., from the data provided by our research-grade analyzer. A multidimensional statistical method such as Principal Component Analysis (PCA) gives promising indications toward this goal. The studies we have been conducting for many years are now reinforced by the constitution of a network (Groupe d'Intérêt Scientifique-GIS in France) that combines complementary abilities and research equipment.
Convention Paper 6535 (Purchase now)
P3-4 An Open Design and Implementation for the Enabler Component of the Plural Node Architecture of Professional Audio Devices—Jun-ichi Fujimori, Yamaha - Hamamatsu, Japan; Harold Okai-Tettey, Richard John Foss, Rhodes University - Grahamstown, South Africa
The Plural Node architecture is an implementation architecture for professional audio devices that adhere to the Audio and Music (A/M) protocol. The Plural-Node implementation architecture comprises two components on separate IEEE 1394 nodes—a “Transporter” component dedicated to A/M protocol handling and an “Enabler” component that controls the Transporter and provides high level plug abstractions. An open generic transporter specification has been developed for the transporter component. This paper details an open design and implementation for the enabler component that allows for connection management via abstract mLAN plugs.
Convention Paper 6536 (Purchase now)
P4 - Spatial Perception and Processing
Friday, October 7, 1:30 pm — 4:30 pm
Chair: Richard Stroud
P4-1 Audibility of Spectral Switching in Head-Related Transfer Functions—Pablo Faundez Hoffmann, Henrik Møller, Aalborg University - Aalborg, Denmark
Binaural synthesis of a time-varying sound field is performed by updating head-related transfer functions (HRTFs). The updating reflects the changes in the sound transmission to the listener's ears that occur as a result of moving sound. Unless the differences between HRTFs are sufficiently small, a direct switch between them will cause an audible artifact that is heard as a click. By modeling HRTFs as minimum-phase filters plus pure delays, it is possible to study the effects of spectral and time switching separately. Time switching was studied in a previous investigation. This work presents preliminary results on minimum audible spectral switching (MASS).
Convention Paper 6537 (Purchase now)
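A sketch of the decomposition step the abstract relies on, under my own simplified assumptions: before the spectral (minimum-phase) part of two HRTFs can be compared or switched separately, the pure-delay part must be factored out, here estimated by cross-correlation so the impulse responses can be time-aligned.

```python
# Sketch: factor out the pure-delay part of an HRTF pair by estimating
# the lag of maximum cross-correlation, leaving only spectral differences.
def cross_corr_delay(a, b, max_lag):
    """Lag at which b best matches a (b assumed a delayed copy of a)."""
    best_lag, best = 0, float("-inf")
    for lag in range(max_lag + 1):
        s = sum(a[i] * b[i + lag] for i in range(len(a) - lag))
        if s > best:
            best_lag, best = lag, s
    return best_lag

# Toy impulse response standing in for a measured HRTF
h = [0.0, 1.0, 0.5, -0.3, 0.1, 0.0, 0.0, 0.0]
delay = 3
h_delayed = [0.0] * delay + h[:-delay]   # same spectrum, extra pure delay
est = cross_corr_delay(h, h_delayed, 5)
```

After removing the estimated delay, the residual filters differ only spectrally, which is the condition under which spectral switching can be studied in isolation, as in the paper.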
P4-2 Virtual Source Location Information for Binaural Cue Coding—Sang Bae Chon, In Yong Choi, Han-gil Moon, Seoul National University - Seoul, Korea; Joengil Seo, Electronics and Telecommunications Research Institute - Daejeon, Korea; Koeng-Mo Sung, Seoul National University - Seoul, Korea
Binaural Cue Coding (BCC) is an audio coding technology that expresses multichannel audio signals with mono or stereo downmixed audio signal(s) and side information, namely Inter-Channel Level Difference (ICLD), Inter-Channel Time Delay (ICTD), and Inter-Channel Correlation (ICC). Among these, the ICLD describes the level difference between the signal of one channel and the reference downmixed signal of the BCC system and plays the most important role in lateralizing the spatial image. However, since the spatial image of a sound is created in nature by the location of the sound source, the question arises whether there is a more direct way than ICLD to describe the source location for the spatial image. Virtual Source Location Information (VSLI), the new side information proposed in this paper, provides an answer to this question.
Convention Paper 6538 (Purchase now)
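A quick sketch of the ICLD cue the abstract contrasts VSLI against, assuming the usual power-ratio-in-dB formulation from the BCC literature (the paper's exact per-band definition may differ):

```python
# Sketch: ICLD as the power ratio (in dB) between one channel and the
# downmix reference signal -- the key BCC lateralization cue.
import math

def icld_db(channel, reference):
    e_ch = sum(x * x for x in channel)
    e_ref = sum(x * x for x in reference)
    return 10.0 * math.log10(e_ch / e_ref)

ref = [1.0, -0.5, 0.25, -0.125]
half_power = [x / math.sqrt(2.0) for x in ref]  # 3 dB below the reference
level = icld_db(half_power, ref)                # about -3.01 dB
```

VSLI's point, per the abstract, is that an angle describing the virtual source position is a more direct description of the same spatial image than this per-channel level difference.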
P4-3 Perceptual Movement of Auditory Images Fed through Multiway Loudspeakers Perpendicularly Set Up—Yu Agatsuma, Eiichi Miyasaka, Musashi Institute of Technology - Yokohama, Kanagawa, Japan
Vertical localization and perceptual movement of auditory images were investigated using eight loudspeakers mounted in a vertical line with 30-cm spacing between adjacent units. Results obtained from more than 20 observers show that localization was fairly well identified for one-octave bands of noise with center frequencies from 2 to 8 kHz, and that smooth movement of auditory images from low to high was perceived when the center frequency of a one-octave noise band was swept up linearly at various speeds.
Convention Paper 6539 (Purchase now)
P4-4 High Order Spatial Audio Capture and Its Binaural Head-Tracked Playback Over Headphones with HRTF Cues—Ramani Duraiswami, Dmitry Zotkin, Zhiyun Li, Elena Grassi, Nail Gumerov, Larry Davis, University of Maryland - College Park, MD, USA
A theory and a system for capturing an audio scene and then rendering it remotely are developed and presented. The sound capture is performed with a spherical microphone array. The sound field at the location is deduced from the captured sound and is represented using either spherical wave-functions or plane-wave expansions. The sound field representation is then transmitted to a remote location for immediate rendering or stored for later use. The sound renderer, coupled with the head tracker, reconstructs the acoustic field using individualized head-related transfer functions to preserve the perceptual spatial structure of the audio scene. Rigorous error bounds and a Nyquist-like sampling criterion for the representation of the sound field are presented and verified.
Convention Paper 6540 (Purchase now)
P4-5 Performance of Spatial Audio Using Dynamic Cross-Talk Cancellation—Tobias Lentz, Ingo Assenmacher, Jan Sokoll, Aachen University (RWTH) - Aachen, Germany
Creating a virtual sound scene with spatially distributed sources requires a technique for introducing spatial cues into audio signals and an appropriate reproduction system. In our case a complete binaural approach is used, consisting of binaural synthesis and head-tracked dynamic cross-talk cancellation (CTC) for reproduction of the binaural signal at the ears of the listener. In this paper the performance and limitations of the complete system, as well as of the various subsystems, are investigated and discussed. The channel separation of the dynamic CTC system is measured for various positions in the listening area, and localization accuracy is examined in listening tests.
Convention Paper 6541 (Purchase now)
P4-6 An Application of Lined-Up Loudspeaker Array System for Mixed Reality Audio-Visual Reproduction System—Hiroyuki Okubo, Yasushige Nakayama, NHK Science & Technical Research Laboratories - Setagaya, Tokyo, Japan; Yoichi Naito, NHK Engineering Administration Department - Shibuya, Tokyo, Japan; Toshikazu Ikenaga, NHK Nagasaki Station - Nagasaki City, Nagasaki, Japan; Setsu Komiyama, NHK Broadcasting Engineering Department - Shibuya, Tokyo, Japan
An interactive audio-visual system called the Mixed Reality Audio-Visual (MRAV) reproduction system has been developed. The MRAV system employs stereoscopic image projection and a multichannel sound field reproduction technique in which a loudspeaker array focuses the sound pressure in front of the listener to coincide with the three-dimensional (3-D) visual image. A new sound screen with a silver metallic coating has also been developed; it helps maintain sound quality even when the sound is radiated from a loudspeaker array set behind the screen. This paper describes the design of the loudspeaker array system and discusses our examination of the generated sound field corresponding to 3-D CG images.
Convention Paper 6542 (Purchase now)
P5 - Multichannel Sound -1
Saturday, October 8, 9:00 am — 12:00 pm
Chair: Thomas Sporer, Fraunhofer IDMT - Ilmenau, Germany
P5-1 Perceptual Evaluation of 5.1 Downmix Algorithms—Thomas Sporer, Beate Klehs, Fraunhofer IDMT - Ilmenau, Germany; Judith Liebetrau, Felix Richter, Alexander Krake, Gabi Muckenschnabl, Mandy Weitzel, Technical University of Ilmenau - Ilmenau, Germany
In a workshop at the 118th AES Convention, problems and solutions for automatic downmixing were summarized. The key question is for which items automatic algorithms provide acceptable quality. Closely connected to this is the question of how to evaluate the quality of the result. Standardized listening test procedures such as ITU-R BS.1116 and ITU-R BS.1534 are designed to evaluate the difference between an unimpaired reference and the modified signal under test; they were never intended for comparing 5-channel signals to 2-channel signals. In this paper a new listening test procedure is described that is designed to judge the quality of downmixing algorithms, and the first results of listening tests performed using this procedure are presented.
Convention Paper 6543 (Purchase now)
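For context, the static downmix that automatic algorithms are commonly judged against is the ITU-R BS.775-style equation (coefficients below are the conventional -3 dB values; whether the paper uses exactly these is an assumption):

```python
# ITU-R BS.775-style static 5.0 -> 2.0 downmix:
#   Lo = L + 0.707*C + 0.707*Ls,  Ro = R + 0.707*C + 0.707*Rs
A = 0.7071  # -3 dB coefficient

def downmix_5to2(l, r, c, ls, rs):
    """Per-sample downmix of five full-range channels (LFE omitted)."""
    lo = [li + A * ci + A * lsi for li, ci, lsi in zip(l, c, ls)]
    ro = [ri + A * ci + A * rsi for ri, ci, rsi in zip(r, c, rs)]
    return lo, ro

l, r = [1.0, 0.0], [0.0, 1.0]
c = [1.0, 1.0]
ls, rs = [0.5, 0.0], [0.0, 0.5]
lo, ro = downmix_5to2(l, r, c, ls, rs)
```

The new test procedure the paper proposes is needed precisely because signals produced this way and the original 5-channel mix have no common unimpaired reference in the BS.1116/BS.1534 sense.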
P5-2 Discrimination of Auditory Source Focus for Musical Instrument Sounds with Varying Low-Frequency Cross Correlation in Multichannel Loudspeaker Reproduction—Sungyoung Kim, William Martens, Atsushi Marui, McGill University - Montreal, Quebec, Canada
This paper examines the changes in auditory spatial impression associated with changes in signal incoherence within the low-frequency portion of a multichannel loudspeaker reproduction. Multichannel recordings were made in reverberant concert settings of single notes played on musical instruments with significant low-frequency energy. A signal processing method was then developed to manipulate low-frequency correlation in the prerecorded material while maintaining high sound quality; subsequent listening tests measured the perceptual effects of varying low frequency correlation on otherwise identical recordings of low-pitch, single-note performances on musical instruments such as the bass violin. For cutoff frequencies ranging from 200 Hz down to 63 Hz, the effects of cutoff frequency on discrimination thresholds were measured for changes in low-frequency correlation using a two-alternative forced-choice task. Listeners also made forced-choice identifications regarding auditory source focus. Results indicated that both discrimination and identification performance was degraded in the presence of the higher-frequency portion of the musical stimuli.
Convention Paper 6544 (Purchase now)
P5-3 Optimizing Placement and Equalization of Multiple Low Frequency Loudspeakers in Rooms—Adrian Celestinos, Sofus Birkedal Nielsen, Aalborg University - Aalborg, Denmark
Every room has a strong influence on the low-frequency performance of a loudspeaker, and this influence is often difficult to control and to predict. The modal resonances modify the response of the loudspeaker depending on its placement and the listening position. In order to anticipate the behavior of low-frequency loudspeakers in rooms, a simulation tool has been created based on finite-difference time-domain (FDTD) approximations. Simulations have shown that increasing the number of loudspeakers and modifying their placement yields a significant improvement: a more even sound pressure level distribution across the listening area is obtained. The placement of the loudspeakers has been optimized, and an equalization strategy can additionally be implemented for optimization purposes. This solution can be combined with multichannel sound systems.
Convention Paper 6545 (Purchase now)
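A minimal 1-D FDTD sketch of the scheme family the simulation tool belongs to (the paper works in 3-D rooms; grid size, boundaries, and Courant number here are my own toy choices). The leapfrog pressure update is the standard second-order one, and at Courant number 1 the 1-D scheme propagates a pulse exactly one cell per step.

```python
# Minimal 1-D FDTD leapfrog update (toy stand-in for the paper's 3-D rooms):
#   p_next[i] = 2*p[i] - p_prev[i] + C^2 * (p[i+1] - 2*p[i] + p[i-1])
N, C2 = 64, 1.0                  # grid points, squared Courant number
p_prev = [0.0] * N
p = [0.0] * N
p[32] = 1.0                      # initial pressure pulse at the center

for _ in range(20):              # time steps (pulse never reaches a wall)
    p_next = [0.0] * N
    for i in range(1, N - 1):
        p_next[i] = 2 * p[i] - p_prev[i] + C2 * (p[i + 1] - 2 * p[i] + p[i - 1])
    p_next[0], p_next[N - 1] = p_next[1], p_next[N - 2]  # crude rigid walls
    p_prev, p = p, p_next
```

After 20 steps the disturbance has spread exactly 20 cells to each side and the field stays bounded, the kind of sanity check that precedes using such a scheme to map modal pressure distributions over a listening area.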
P5-4 An Immersive Audio Environment with Source Positioning Based on Virtual Microphone Control—Jonas Braasch, Wieslaw Woszczyk, Timothy Ryan, McGill University - Montreal, Quebec, Canada
In this paper an auditory virtual environment (AVE) is described that uses virtual microphone control (ViMiC) to address a 24-channel loudspeaker system based on ribbon speakers. In the newly designed environment, the microphones, with adjustable directivity patterns and axes of orientation, can be spatially placed as desired. The system architecture was designed to comply with the augmented ITU surround-sound loudspeaker placement and to create sound imagery similar to that associated with standard sound recording practice. The AVE will be used with close-spot microphone techniques in two-way internet audio transmissions to avoid feedback loops and provide dynamic placement for a number of sources.
Convention Paper 6546 (Purchase now)
P5-5 Simulation and Visualization of Room Compensation for Wave Field Synthesis with the Functional Transformation Method—Stefan Petrausch, Sascha Spors, Rudolf Rabenstein, University of Erlangen-Nuremberg - Erlangen, Germany
Active room compensation based on wave field synthesis (WFS) has recently been introduced. So far, verification of the compensation algorithm has only been possible through elaborate acoustical measurements. Therefore, a new simulation method is applied that is based on the functional transformation method (FTM). Compared with other simulation techniques, the FTM provides several advantages that facilitate correct simulation of the complete wave field, particularly in the frequency ranges of interest for WFS. The complete procedure, starting from the virtual "measurements" of the acoustical properties of the simulated room, via the correct excitation of the simulated wave field, to the resulting animations and sounds, is presented in this paper.
Convention Paper 6547 (Purchase now)
P5-6 Acoustic Intensity in Multichannel Rendering Systems—Antoine Hurtado-Huyssen, Jean-Dominique Polack, Université de Paris - Paris, France
Acoustic intensity is the mechanical energy stream through a point in the sound field. In the far field it also identifies the direction of the main source (when there is one), yet it remains neglected in multichannel recording and reproduction systems. This paper describes the information contained in sound intensity, how it can be used, and what can be expected from it. It also underlines the fact that this data is accessible in the frequency domain through any existing cardioid recording system, such as Ambisonic or double MS.
Convention Paper 6548 (Purchase now)
P6 - Signal Processing for Audio -1
Saturday, October 8, 9:30 am — 12:00 pm
Chair: Vicki Melchior, Audio Signal Processing Consultant - San Anselmo, CA, USA
P6-1 Toward a Procedure for Stability Analysis of High Order Sigma Delta Modulators—Josh Reiss, Queen Mary, University of London - London, UK
One of the greatest unsolved problems in the theory of sigma delta modulation concerns the ability to analytically derive the stability, or boundedness, of a high order sigma delta modulator (SDM). In this paper we describe the existing literature and try to clarify the issues involved. We fully derive the stability of first order sigma delta modulators and derive some important results for the basic second order sigma delta modulator. For third order sigma delta modulators, we describe interesting simulated results as well as sketch a proof of instability, based on linear programming, for one particular SDM. Finally, we present two theoretical results concerning stability of general high order SDMs that point towards promising directions of future research.
Convention Paper 6549 (Purchase now)
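The first-order result the paper derives analytically can be checked numerically. This sketch (input signal and bound are my own illustration) simulates the textbook first-order SDM state update with a 1-bit quantizer and confirms the well-known invariant: for inputs with |x| < 1 and a state starting inside the bound, |u| never exceeds 1 + max|x|.

```python
# First-order sigma delta modulator: u[n+1] = u[n] + x[n] - q(u[n]),
# q(u) = sign(u). For |x| <= 0.5 the state stays within |u| <= 1.5.
import math

def sdm_first_order(x, u0=0.0):
    u, states, bits = u0, [], []
    for xn in x:
        q = 1.0 if u >= 0 else -1.0   # 1-bit quantizer output
        bits.append(q)
        u = u + xn - q                # integrator state update
        states.append(u)
    return states, bits

x = [0.5 * math.sin(0.01 * n) for n in range(5000)]  # |x| <= 0.5 < 1
states, bits = sdm_first_order(x)
peak = max(abs(u) for u in states)
```

A useful side effect of the bounded state is exact low-frequency tracking: since u[N] = sum(x) - sum(bits), the accumulated bit stream can never drift from the accumulated input by more than the state bound, which is exactly the kind of argument that resists generalization to high-order loops.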
P6-2 An Interface for Analysis-Driven Sound Processing—Niels Bogaards, Axel Röbel, IRCAM - Paris, France
AudioSculpt is an application for the musical analysis and processing of sound files. The program unites a very detailed inspection of sound, both visually and auditorily, with high quality analysis-driven effects, such as time-stretch, transposition, and spectral filtering. Multiple algorithms provide automatic segmentation to guide the placement of sound treatments and steer processing parameters. By designing transformations directly on the sonogram, very precise spectral modifications can be made, allowing both intuitive sound design as well as sound restoration and source separation.
Convention Paper 6550 (Purchase now)
P6-3 A Comparison of Digital Power Amplifiers with Conventional Linear Technology: Performance, Function, and Application—Craig Bell, Isaac Sibson, Zetex Semiconductors plc - Oldham, Lancashire, UK
Drawing conclusions about the actual subjective performance of an amplifier from the results of standard tests can be difficult. The measured harmonic distortion and intermodulation distortion results for the linear amplifier are excellent and exceed those of the digital amplifier. However, a subjective assessment with music material ranked the digital amplifier's performance as superior. Further investigation was therefore required.
Convention Paper 6551 (Purchase now)
P6-4 Selective Mixing of Sounds—Piotr Kleczkowski, AGH University of Science and Technology - Krakow, Poland
An interesting psychoacoustic phenomenon has been found: the removal of large parts of musical tracks in the time-frequency domain may not be perceived in the mix at all, whereas some details of the sounds are heard enhanced in the mix. The phenomenon is described and investigations into the possibility of its practical use are presented. It is shown how the details of implementation and particular parameters affect the attributes of the sound. The differences in the sound of a standard mix and the sound of the mix based on this phenomenon are summarized.
Convention Paper 6552 (Purchase now)
P6-5 An Efficient Asynchronous Sampling-Rate Conversion Algorithm for Multichannel Audio Applications—Paul Beckmann, Timothy Stilson, Analog Devices - San Jose, CA, USA
We describe an asynchronous sampling-rate conversion (SRC) algorithm that is specifically tailored to multichannel audio applications. The algorithm is capable of converting between arbitrary asynchronous sampling rates around a fixed operating point and is designed to operate in multithreaded systems. The algorithm uses a set of fractional delay filters together with cubic interpolation to achieve accurate and efficient sampling-rate conversion.
Convention Paper 6553 (Purchase now)
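A sketch of the cubic-interpolation core of such a converter (the paper combines this with fractional-delay prefilters, which are omitted here; signal and rates below are illustrative): 4-point third-order Lagrange interpolation evaluated at a fractional read position that advances by the input/output rate ratio.

```python
# Cubic (third-order Lagrange) interpolation at an arbitrary fractional
# read position -- the inner loop of an asynchronous sample-rate converter.
import math

def cubic_interp(x, pos):
    """Interpolate x at fractional index pos using samples i-1 .. i+2."""
    i = int(pos)
    f = pos - i
    xm1, x0, x1, x2 = x[i - 1], x[i], x[i + 1], x[i + 2]
    # Lagrange basis polynomials for nodes -1, 0, 1, 2 evaluated at f
    return (-f * (f - 1) * (f - 2) / 6 * xm1
            + (f * f - 1) * (f - 2) / 2 * x0
            - f * (f + 1) * (f - 2) / 2 * x1
            + f * (f * f - 1) / 6 * x2)

fs_in, fs_out, freq = 48000.0, 44100.0, 1000.0
src = [math.sin(2 * math.pi * freq * n / fs_in) for n in range(256)]
ratio = fs_in / fs_out            # read-position increment per output sample
out = [cubic_interp(src, 1.0 + n * ratio) for n in range(200)]
true = [math.sin(2 * math.pi * freq * (1.0 + n * ratio) / fs_in)
        for n in range(200)]
err = max(abs(a - b) for a, b in zip(out, true))
```

In an asynchronous converter the increment `ratio` is not fixed but tracked around a nominal operating point; the fractional-delay filters the authors add sharpen the interpolation beyond what the bare cubic achieves.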
P7 - Multichannel Sound -2
Saturday, October 8, 1:00 pm — 4:00 pm
Chair: Richard Duda, University of California at Davis - Davis, CA, USA
P7-1 An Approach for Multichannel Recording and Reproduction of Sound Source Directivity—Roland Jacques, Technical University Ilmenau - Ilmenau, Germany; Bernhard Albrecht, Technical University Ilmenau - Ilmenau, Germany; Frank Melchior, Fraunhofer Institute for Digital Media Technology - Ilmenau, Germany; Diemer de Vries, Delft University of Technology - Delft, The Netherlands
Current holophonic sound systems allow the creation and free positioning of virtual sound sources, but only a few systems incorporate the directivity characteristics of natural sources; virtual sources are mostly monopoles. Especially for auditory scenes with a high degree of freedom in source and listener positioning, the reproduction of directivities is desirable. This paper presents an approach for the efficient multichannel capture of source directivities, as well as a suitable reproduction technique that creates an extensive sound field approximating the directional radiation of the real source. The performance of this system, which is based on wave field synthesis, has been examined by means of dedicated measurements and listening tests using recordings of brass instruments.
Convention Paper 6554 (Purchase now)
P7-2 Artificial Reverberation Algorithm to Control Distance and Direction of Sound Source for Multichannel Audio System—Jeong-Hun Seo, Hwan Shim, Seoul National University - Seoul, Korea; Jae-Hyoun Yoo, Electronics and Telecommunications Research Institute (ETRI) - Daejon, Korea; Koeng-Mo Sung, Seoul National University - Seoul, Korea
A multichannel artificial reverberation algorithm that controls perceived direction and distance is described in this paper. In conventional algorithms using IIR filters, reverberation time is the only parameter that can be controlled. Moreover, convolution-based conventional algorithms apply the same impulse response to every source and do not consider sound localization, so they are not realistic enough. The new algorithm proposed in this paper uses early reflections segmented according to the azimuth from which the direct sound arrives; it controls perceived direction by panning the direct sound and controls perceived distance by adjusting the Energy Decay Curve (EDC) of the reverberation and the gain of the direct sound. In addition, the algorithm enhances Listener Envelopment (LEV) by making the late reverberation incoherent among channels.
Convention Paper 6555 (Purchase now)
P7-3 Surround Recording of Music: Problems and Solutions—Joerg Wuttke, SCHOEPS GmbH - Karlsruhe (Durlach), Germany
How can we make surround recording for pure audio applications more successful? Musical content and good sound (among other concerns) are as important in surround as they are in 2-channel stereo. But the manufacturers of home playback equipment and the producers of program material have not yet fulfilled all the possibilities. Surround, if it is to justify its higher costs, must offer both localization and a sense of spaciousness throughout an enlarged listening area. Success will therefore depend on the prevalence of recording methods that extract maximum benefit from the center channel. Various possibilities exist for surround microphone arrangements; several different methods for 5.0 pickup will be described, along with the issues of crosstalk and "stereo down-mix" compatibility.
Convention Paper 6556 (Purchase now)
P7-4 Motion-Tracked Binaural Sound for Personal Music Players—V. Ralph Algazi, Robert Dalton, Jr., Richard Duda, Dennis M. Thompson, University of California at Davis - Davis, CA, USA
Motion-tracked binaural or MTB recording enhances headphone-based spatial sound reproduction by capturing and exploiting localization cues that result from voluntary head motion. For music reproduction, the sound field can be stabilized for any arbitrary head rotation by using sixteen microphones to sample the space around the head and by employing the signal from a head tracker to interpolate between these channels. MTB’s use of headphones makes it particularly suitable for portable music players. However, the technique must be modified to meet the special needs of this application. Methods are described for (a) converting legacy recordings to MTB format, (b) reducing the number of channels from 16 to 2.5, and (c) processing the head-tracker signal to extract head motion from the combination of head and torso motion.
Convention Paper 6557 (Purchase now)
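The interpolation step described above, stabilizing the sound field by crossfading between the microphones that bracket the tracked head direction, can be sketched as follows. The even circular spacing and the linear crossfade are illustrative assumptions, not MTB's exact interpolation filter:

```python
import numpy as np

def mtb_interpolate(channels, head_azimuth_deg):
    """Pick the two microphones (assumed evenly spaced on a circle)
    bracketing one ear's direction and crossfade between them.
    A linear crossfade is an illustrative choice."""
    n = len(channels)
    spacing = 360.0 / n
    pos = (head_azimuth_deg % 360.0) / spacing
    i = int(pos) % n
    j = (i + 1) % n
    frac = pos - int(pos)
    return (1.0 - frac) * channels[i] + frac * channels[j]
```

With 16 channels the microphone spacing is 22.5 degrees, so a head rotation sweeps smoothly from one capsule's signal to the next.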
P7-5 Subjective Consumer Evaluation of Multichannel Audio Codecs—Jim Barbour, Swinburne University of Technology - Melbourne, Australia
Are normal listeners able to identify any significant differences between multichannel audio codecs when listening to commercial music releases on good quality, consumer audio equipment? Audio professionals have often questioned whether consumers are able to hear the difference between high density, uncompressed multichannel formats, and lower data-rate delivery formats. In this study, formal subjective listening tests were conducted according to the ITU-R BS.1534 (MUSHRA) recommendation to evaluate consumer perception of popular 5.1 surround sound formats, namely Dolby AC-3, DTS, WMA Pro and mp3surround. Results suggest there is a threshold data-rate below which consumers are able to hear audible differences. Experimental design, methodology, and results will be presented and discussed.
Convention Paper 6558 (Purchase now)
P7-6 Advanced Multichannel Audio System for Reproducing a Live Sound Field with Ultimate Sensation of Presence—Kimio Hamasaki, Hiroyuki Okubo, Toshiyuki Nishiguchi, Yasushige Nakayama, Reiko Okumura, Masakazu Iwki, NHK Science & Technical Research Laboratories - Tokyo, Japan
An advanced multichannel audio system for reproducing a live sound field with ultimate sensation of presence has been studied for reproducing the sound field without any restrictions on the listening points. This system also aims to give audiences the best natural impression of presence and reality according to their listening point and to be applicable to live broadcasting. This paper introduces the experimental sound system and describes in detail the recording techniques and latest subjective evaluation experiments to evaluate the appropriateness of necessary conditions of the new sound system. The paper also discusses the advantages of the new sound system regarding the impression of presence and interactivity compared with conventional multichannel audio systems.
Convention Paper 6559 (Purchase now)
P8 - Signal Processing for Audio -2
Saturday, October 8, 1:00 pm — 4:00 pm
Chair: Gilbert Soulodre, Communications Research Centre - Ottawa, Ontario, Canada
P8-1 VisualAudio—An Environment for Designing, Tuning, and Testing Embedded Audio Applications—David A. Jaffe, Paul Beckmann, Britton Peddie, Timothy Stilson, Scott Van Duyne, Analog Devices, Inc. - San Jose, CA, USA
Different hardware configurations and applications suggest different audio system design trade-offs. VisualAudio is focused on embedded processor applications and currently works with Analog Devices, Inc. SHARC and Blackfin processors. VisualAudio is appropriate for a wide range of applications, including general purpose audio, pro audio, music “stomp” boxes, consumer electronics (such as audio-visual receiver (AVR) systems), and automotive audio systems. This article describes the decisions that were made in the design of VisualAudio and how they are tailored to the embedded processing environment. It contrasts VisualAudio with previous systems created by the authors, particularly Staccato Systems’ “SynthCore,” currently known as Analog Devices’ “SoundMAX.”
Convention Paper 6560 (Purchase now)
P8-2 Analysis and Design Algorithm of Time Varying Reverberator for Low Memory Applications—Tacksung Choi, Junho Lee, Young-Cheol Park, Dae Hee Youn, Yonsei University - Seoul, Korea
Development of an artificial reverberation algorithm with low memory requirements has been an important issue in applications such as mobile multimedia devices. One possible solution to this problem is to embed a time-varying all-pass filter in the feedback loop of the comb filter. In this paper theoretical and perceptual analyses of reverberators embedding time-varying all-pass filters in their feedback loops are presented. The aim of the analyses is to find a perceptually acceptable degree of phase variation introduced by the all-pass filter. Based on the analyses, we propose a new methodology for designing reverberators that embed time-varying all-pass filters. Through subjective tests, we show that, even with less memory, the proposed method is capable of providing sound perceptually superior to previous methods involving time-invariant parameters.
Convention Paper 6561 (Purchase now)
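The structure described above, a feedback comb filter whose loop contains a first-order all-pass with a slowly varying coefficient, can be sketched sample by sample. The modulation shape and parameter values below are illustrative assumptions, not the paper's design choices:

```python
import numpy as np

def tv_comb(x, fs, delay=200, g=0.7, a_depth=0.05, rate_hz=1.0):
    """Feedback comb filter with a first-order time-varying all-pass
    in the loop: y[n] = x[n] + g * allpass(y[n - delay])."""
    y = np.zeros(len(x))
    buf = np.zeros(delay)      # comb delay line (holds y[n - delay])
    x1 = y1 = 0.0              # all-pass state
    for n in range(len(x)):
        a = a_depth * np.sin(2 * np.pi * rate_hz * n / fs)
        d = buf[n % delay]
        ap = a * d + x1 - a * y1   # H(z) = (a + z^-1) / (1 + a z^-1)
        x1, y1 = d, ap
        y[n] = x[n] + g * ap
        buf[n % delay] = y[n]
    return y
```

Because the all-pass has unit magnitude response, the modulation perturbs only the phase of the recirculating echoes, which is what breaks up the metallic ringing of a static comb without adding memory.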
P8-3 A Comparison of the Performance of “Pruned Tree” versus “Stack” Algorithms for Look-Ahead Sigma Delta Modulators—James Angus, The University of Salford - Salford, Greater Manchester, UK
Look-ahead sigma-delta modulators look forward k samples before deciding to output a “one” or a “zero.” The Viterbi algorithm is then used to search the trellis of the exponential number of possibilities that such a procedure generates. This paper describes alternative tree-based algorithms. Tree-based algorithms are simpler to implement because they do not require backtracking to determine the correct output value. They can also be made more efficient using “stack” algorithms. Both the tree algorithm and the more computationally efficient “stack” algorithms are described. Implementations of both algorithms are described in some detail, in particular the appropriate data structures for the trial filters and score memories. Comparative results of their performance are also presented.
Convention Paper 6562 (Purchase now)
P8-4 Adaptive Strategies for Inverse Filtering—Scott Norcross, Communications Research Centre - Ottawa, Ontario, Canada; Martin Bouchard, University of Ottawa - Ottawa, Ontario, Canada; Gilbert Soulodre, Communications Research Centre - Ottawa, Ontario, Canada
Inverse filtering methods commonly use techniques such as regularization and/or smoothing to reduce artifacts created by the inverse filter. Previous studies have shown that these additional techniques can themselves introduce audible artifacts. Furthermore, the “optimal” amount of regularization or smoothing must be chosen by trial and error. This paper introduces some adaptive strategies based on analyzing the incoming audio to improve the subjective performance of various inverse filtering methods. The incoming audio signal is processed in blocks and the spectrum or masking curve can be calculated. One can then use the information from the audio signal to modify the inverse filter to help its performance. The characteristics of the incoming audio signal could also be used to determine if the application of an inverse filter is even necessary. In this paper two approaches are used to help define an inverse filter that is dependent on the incoming audio signal based on a frequency-domain fast-deconvolution method.
Convention Paper 6563 (Purchase now)
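The frequency-domain fast-deconvolution method the abstract builds on can be sketched with a fixed regularization constant; the constant `beta` below stands in for the signal-adaptive weighting (spectrum or masking curve) that the paper derives from the incoming audio:

```python
import numpy as np

def regularized_inverse(h, n_fft=1024, beta=0.01):
    """Kirkeby-style frequency-domain inversion of impulse response h:
    Hinv = conj(H) / (|H|^2 + beta). Larger beta suppresses artifacts
    at the cost of inversion accuracy."""
    H = np.fft.rfft(h, n_fft)
    Hinv = np.conj(H) / (np.abs(H) ** 2 + beta)
    return np.fft.irfft(Hinv, n_fft)
```

An adaptive scheme would recompute `beta` per block and per frequency band from the incoming signal's masking curve rather than choosing it by trial and error.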
P8-5 New Understandings of the Use of Ferrites in the Prevention and Suppression of RF Interference to Audio Systems—Jim Brown, Audio Systems Group, Inc. - Chicago, IL, USA
Building on the work of Muncy, the author has shown that radio-frequency current on cable shields is often coupled to audio systems by two mechanisms—“the pin 1 problem” and shield-current-induced noise (SCIN). An improved equivalent circuit for a ferrite choke is developed that addresses both dimensional resonance within ferrites and the self resonance of inductors formed using those materials, then compared with measured data. Field tests show that chokes formed by passing signal cables through ferrite cores can significantly reduce current-coupled interference over the range of 500 kHz to 1,000 MHz. Guidelines are presented for diagnosing the causes of EMI from sources as diverse as AM broadcast transmitters and cell phones. Solutions are presented both for use in new products and for RFI suppression in field installations.
Convention Paper 6564 (Purchase now)
P8-6 Parametric Control of Filter Slope versus Time Delay for Linear Phase Crossovers—David McGrath, Justin Baird, Bruce Jackson, Lake Technology - Surry Hills, New South Wales, Australia
Linear phase crossover filters are a powerful tool for sound system designers. They deliver a near-ideal response with ruler-flat pass band, steep transition slopes, and adjustable stop-band rejection—all with zero phase shift. Transition slopes can be matched to a target response, for example 24 dB or 48 dB per octave, and can also be arbitrarily specified while still retaining a perfect-reconstruction characteristic. Practical application of linear phase crossovers requires manipulation of center frequency, transition slope, and stopband rejection. A graphical user interface is described that gives users new degrees of freedom in defining linear phase filter parameters. By setting bounds for parameters such as delay, a user can continuously vary other parameters while the graphical user interface optimizes the resulting filter. This paper presents new parameters for optimization of a target transition slope within a bounded delay parameter, providing fast and efficient user controls for working with and adjusting the crossover filters in real time.
Convention Paper 6565 (Purchase now)
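The perfect-reconstruction property mentioned above can be illustrated with a complementary windowed-sinc pair: a linear-phase lowpass plus its spectral complement (a delayed delta minus the lowpass) sum to a pure delay. This is a generic sketch; the paper's bounded-delay optimization of slope is not reproduced here, but the tap count plays the same role, trading transition slope against delay:

```python
import numpy as np

def linear_phase_crossover(fc, fs, n_taps=511):
    """Windowed-sinc linear-phase lowpass and complementary highpass.
    The pair sums exactly to a delay of (n_taps - 1) / 2 samples."""
    n = np.arange(n_taps) - (n_taps - 1) / 2
    lp = 2 * fc / fs * np.sinc(2 * fc / fs * n) * np.hamming(n_taps)
    lp /= lp.sum()                    # unity gain at DC
    hp = -lp
    hp[(n_taps - 1) // 2] += 1.0      # complement: delta - lowpass
    return lp, hp
```

More taps give a steeper slope but a longer delay, which is exactly the trade-off a bounded-delay parameter exposes to the user.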
P9 - Posters: Miscellaneous -1
Saturday, October 8, 1:00 pm — 2:30 pm
P9-1 A Robust Partial Tracker for Analysis of Music Signals—Hamid Satar-Boroujeni, Bahram Shafai, Northeastern University - Boston, MA, USA
We propose a novel approach for tracking of partials in music signals based on a robust Kalman filter. This tracker is based on a regularized least-squares approach that is designed to minimize the worst-possible regularized residual norm over the class of admissible uncertainties at each iteration. We introduce a set of state-space models for our signals based on the evolution of frequency and amplitude in different classes of musical instruments. These prior models are used to estimate future values of partial tracks in successive time frames of our spectral data. Parameters of evolution models are treated as bounded uncertainties, and our tracker can robustly track both frequency and power partials in all frequency regions.
Convention Paper 6566 (Purchase now)
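The tracking idea above can be illustrated with a plain scalar Kalman filter following one partial's frequency across frames; this is a minimal, non-robust sketch, whereas the paper's regularized variant additionally bounds the model uncertainties:

```python
def kalman_track(measurements, q=1e-4, r=1e-2):
    """Scalar Kalman filter with a random-walk state model tracking
    one partial's frequency across analysis frames. q is the assumed
    process noise, r the measurement noise (illustrative values)."""
    x, p = measurements[0], 1.0
    track = [x]
    for z in measurements[1:]:
        p += q                  # predict
        k = p / (p + r)         # Kalman gain
        x += k * (z - x)        # update with new peak measurement
        p *= (1 - k)
        track.append(x)
    return track
```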
P9-2 Automatic Retrieval of Musical Rhythmic Patterns—Bozena Kostek, Gdansk University of Technology - Gdansk, Poland; Jaroslaw Wojcik, Wroclaw University of Technology - Wroclaw, Poland
Even though the research within Music Information Retrieval domain is well-advanced, searching for music is still under development. Thanks to melody search methods applied in "query by humming" systems, users can retrieve melodies on the basis of an audio input. However, the research on rhythm is not advanced to such an extent yet. This paper addresses automatic retrieval of rhythmic patterns based on symbolic representation of music employing repeating rhythmic and melodic patterns. In the experiments the importance of melorhythmic representation of a musical piece is verified and compared to the sound duration-based hypothesis ranking method. Since most of the musical files to be found on the Internet are polyphonic the lowest or the highest sounds of the chords are also taken into consideration.
Convention Paper 6567 (Purchase now)
P9-3 A Spectrogram Display for Loudspeaker Transient Response—David Gunness, William Hoy, Eastern Acoustic Works, Inc. - Whitinsville, MA, USA
A spectrogram is a two-dimensional depiction of a waveform or transfer function in which frequency is depicted on one axis and time is depicted on the other. The level is plotted against frequency and time by using a color or gray scale. If the time resolution is constant, the display is usually referred to as a Fourier transform spectrogram. If the time resolution is scaled to the frequency, it is usually referred to as a wavelet transform spectrogram. In this paper we present a novel and efficient method for calculating a wavelet transform spectrogram, which is optimized for the analysis of loudspeaker transient response. The new method employs complex convolution of the frequency response, rather than explicit time domain windowing, or the wavelet transform.
Convention Paper 6568 (Purchase now)
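The frequency-scaled time resolution described above can be illustrated with a straightforward constant-Q analysis, convolving with complex exponential kernels whose length is a fixed number of cycles at each frequency. This is a generic sketch of the wavelet-style spectrogram, not the paper's efficient complex-convolution-of-the-frequency-response method:

```python
import numpy as np

def wavelet_spectrogram(x, fs, freqs, cycles=8):
    """Constant-Q spectrogram: each row uses a Hann-windowed complex
    exponential spanning a fixed number of cycles, so the time window
    shrinks as frequency rises."""
    out = []
    for f in freqs:
        n = max(int(cycles * fs / f) | 1, 3)        # odd kernel length
        t = (np.arange(n) - n // 2) / fs
        kern = np.hanning(n) * np.exp(2j * np.pi * f * t)
        kern /= np.sum(np.hanning(n))
        out.append(np.abs(np.convolve(x, kern, mode="same")))
    return np.array(out)            # shape: (len(freqs), len(x))
```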
P9-4 Quality Enhancement of Low Bit Rate MPEG1-Layer 3 Audio Based on Audio Resynthesis—Demetrios Cantzos, Chris Kyriakakis, University of Southern California - Los Angeles, CA, USA
One of the most popular audio compression formats is indisputably the MPEG1-Layer 3 format, which is based on the idea of low-bit-rate transparent encoding. As these types of audio signals migrate from portable players with inexpensive headphones to higher quality home audio systems, it is becoming evident that higher bit rates may be required to maintain transparency. We propose a novel method that enhances low bit rate MP3 encoded audio segments by applying multichannel audio resynthesis methods in a post-processing stage or during decoding. Our algorithm employs the highly efficient Generalized Gaussian mixture model, which, combined with cepstral smoothing, leads to very low cepstral reconstruction errors. In addition, residual conversion is applied, which proves to significantly improve the enhancement performance. The method presented can be easily generalized to include other audio formats for which sound quality is an issue.
Convention Paper 6569 (Purchase now)
P9-5 Obtaining 120-dB Performance Using Switching Power Supplies—Gregg Rouse, Larry Gaddy, AKM Semiconductors, Inc. - San Jose, CA, USA
There is a growing tendency to use switching power supplies to reduce costs. Many designers believe that switching supplies and high performance are mutually exclusive. With some careful design considerations, it is possible to optimize performance using switching power supplies. This paper will review selections of optimal switching frequency. The tradeoffs of switching frequency and efficiency, which are typical in high performance systems, will be examined. Measurement results using asynchronous audio switching clocks will be presented. Measurement results of synchronizing power supply switching to audio clocking multiples will demonstrate how to achieve 120-dB performance, the current high benchmark for professional audio systems.
Convention Paper 6570 (Purchase now)
P9-6 Influence of Artificial Mouth’s Directivity in Determining Speech Transmission Index—Fabio Bozzoli, Paolo Bilzi, Angelo Farina, University of Parma - Parma, Italy
In room acoustics, one of the most widely used parameters for evaluating speech intelligibility is the Speech Transmission Index (STI). Its experimental evaluation generally employs an artificial mouth as the speaker and a binaural head as the listener. In this paper the influence of the artificial mouth's emission directivity on the measurements was investigated in different acoustic environments. We found that in many cases (i.e., large rooms or telecommunication systems) the results are not sensitive to modifications of the directivity; inside cars, on the contrary, the shape of the whole directivity balloon is important for obtaining correct and comparable values, and the different mouths studied give markedly different results in the same situation.
Convention Paper 6571 (Purchase now)
P10 - Posters: Audio Coding & Loudspeakers & Hi Resolution Audio
Saturday, October 8, 3:00 pm — 4:30 pm
P10-1 A New Low-Delay Codec for Two-Way High-Quality Audio Communication—Aníbal Ferreira, University of Porto - Porto, Portugal and ATC Labs, Chatham, NJ, USA; Deepen Sinha, ATC Labs - Chatham, NJ, USA
High-quality audio bit-rate reduction systems are widely used in many application areas involving audio broadcast, streaming, and download services. With the advent of 3G mobile and wireless communication networks, there is a clear opportunity for new multimedia services, notably those relying on two-way high-quality audio communication. In this paper we describe a new source/perceptual audio coder that features low delay, intrinsic error robustness, and high subjective audio quality at competitive compression ratios. The structure of the audio coder is described, with emphasis on its innovative approaches to semantic signal segmentation and decomposition, independent coding of sinusoidal and noise components, and bandwidth extension using accurate spectral replacement. A few test results are presented that illustrate the operation and performance of the new coder. Audio demonstrations are available at http://www.atc-labs.com/acc/.
Convention Paper 6572 (Purchase now)
P10-2 Compensation of Nonlinearities of Horn Loudspeakers—Delphine Bard, Mario Rossi, Ecole Polytechnique Federale de Lausanne - Lausanne, Switzerland; Mauro Del Nobile, Swissphonics - Peseux, Switzerland
This paper presents a method for compensating the nonlinearities of horn loudspeakers. It is possible to compensate for the nonlinear effects of electroacoustic devices by applying the inverse nonlinearity upstream. The method is based on measurements of the nonlinearity by Volterra series using multitone excitations. Once the Volterra kernels have been determined, we proceed by computing the inverse Volterra kernels, in both magnitude and phase. The method was implemented and validated in non-real time and in real time (DSP implementation). To validate the nonlinearity compensation, total harmonic distortion measurements with and without compensation were compared.
Convention Paper 6573 (Purchase now)
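The pre-inversion principle described above, applying the inverse nonlinearity upstream, can be shown with a memoryless second-order example; a real Volterra pre-inverse would also include the memory (filtering) in each kernel, so this is only a stand-in for the technique:

```python
def nonlinearity(x, a2=0.1):
    """Memoryless stand-in for the device: y = x + a2*x^2."""
    return x + a2 * x ** 2

def predistort(x, a2=0.1):
    """Second-order pre-inverse: feeding x - a2*x^2 into the device
    cancels the quadratic distortion to first order in a2."""
    return x - a2 * x ** 2
```

Cascading `nonlinearity(predistort(x))` leaves only a residual of order `a2**2`, which is why the measured THD drops when compensation is switched in.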
P10-3 Diaphragm Parameters and Radiation Characteristics of Multilayer Piezoelectric Ceramic Loudspeakers—Jun Fujii, Juro Ohga, Shibaura Institute of Technology - Minato-ku, Tokyo, Japan; Norikazu Sashida, Ashida Sound Co. - Shinagawa-ku, Tokyo, Japan; Ikuo Oohira, Taiyo Yuden Co., Ltd. - Haruna-machi, Gunma, Japan
This paper presents diaphragm parameters and an analysis of the radiation characteristics of a small loudspeaker that uses a multilayer piezoelectric ceramic bimorph diaphragm. The multilayer ceramic wafer is suitable for battery-operated mobile phones because of its low electrical impedance. Three diaphragm parameter measuring methods are compared to develop the optimum measurement of diaphragm parameters. Then, the output sound pressure frequency characteristics of a loudspeaker model with actual acoustical loads are analyzed.
Convention Paper 6574 (Purchase now)
P10-4 A Study on Lumped Elements Model and Thermal Effects of Eddy Currents in Loudspeakers—Ning Wu, Yong Shen, Xiaobing Xu, Nanjing University - Nanjing, China
A frequency-divided thermal model is developed to study the heat arising from eddy currents in electro-dynamic loudspeakers. Using pure tones as the test signal, the steady-state temperature of the voice coil is measured point by point over a high frequency range. The results permit comparison of all the existing lumped electrical models of eddy currents and show clearly which is better. Also, a set of innovative thermal expressions covering different circumstances is deduced from the simplified thermal model. With these expressions and the measurement data, the values of all the thermal elements in the model can be obtained. An arbitrary temperature-rise course can then be predicted easily if several necessary parameters are given.
Convention Paper 6575 (Purchase now)
P10-5 Spatial Audio Coding System Based on Virtual Source Location Information—Jeongil Seo, Inseon Jang, Kyeongok Kang, Electronics and Telecommunications Research Institute (ETRI) - Daejon, Korea
Spatial audio coding (SAC) is a process to represent multichannel audio signals as down-mixed mono or stereo signals with spatial cues. The main strength of SAC is the significant bit-rate reduction while maintaining the perceptual sound quality. Binaural cue coding (BCC) has been introduced and has become an important scheme for multichannel SAC, both in the sense of audio coding and as a standardization issue in MPEG. However, the interchannel level difference (ICLD), one of the essential spatial cues for SAC, has a limitation: quantizing the ICLD for transmission may degrade the sound quality of the decoded signal. In this paper we propose virtual source location information (VSLI), an angle representing the geometric spatial information between channels on the playback layout, as a replacement for the ICLD, along with a VSLI-based SAC system. Since a listener cannot easily distinguish variations of the spatial angle smaller than about three degrees, the spatial angle, and hence the VSLI, can be quantized with three-degree resolution while maintaining the perceptual quality of the output signals. The objective and subjective assessment results of our proposed system confirm superior performance to the ICLD-based SAC system.
Convention Paper 6576 (Purchase now)
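The level-to-angle mapping and the three-degree quantization described above can be sketched for a stereo pair; the tangent panning law and the 30-degree speaker angle are illustrative assumptions, since the paper defines VSLI over a full 5.1 playback layout:

```python
import numpy as np

def icld_to_azimuth(level_l, level_r, speaker_angle=30.0):
    """Map an interchannel level difference (dB levels per channel)
    to a virtual-source angle via the tangent panning law."""
    g_l, g_r = 10 ** (level_l / 20), 10 ** (level_r / 20)
    t = (g_l - g_r) / (g_l + g_r) * np.tan(np.radians(speaker_angle))
    return np.degrees(np.arctan(t))

def quantize_angle(angle_deg, step=3.0):
    """Quantize the virtual-source angle with the three-degree
    resolution the abstract cites."""
    return step * round(angle_deg / step)
```

Quantizing the angle rather than the raw ICLD keeps the quantization error perceptually uniform across source positions, which is the motivation for VSLI.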
P10-6 An Ultra High Performance DAC with Controlled Time-Domain Response—Paul Lesso, Anthony Magrath, Wolfson Microelectronics - Edinburgh, Scotland, UK
This paper describes the design of an ultra-high performance stereo digital-to-analog converter (DAC) employing advanced digital filtering techniques. Recently there has been renewed interest in the time-domain properties of digital filters used for interpolation and decimation. Linear phase FIR filters, which have dominated digital filter design for the last two decades, have the undesirable properties of pre-ringing and high group delay. Conversely, minimum phase filters, which offer lower levels of pre-ringing, do not have a uniform phase response. This paper describes the trade-offs in the design of filters with controlled pre-ringing, coupled with desirable phase and magnitude characteristics. The paper also describes architectural choices in the implementation of the DAC signal processing chain required to achieve commensurate analog performance.
Convention Paper 6577 (Purchase now)
P10-7 Understanding the Effects of AES-17 When Evaluating 192-kHz Converter Performance—Richard Kulavik, Larry Gaddy, AKM Semiconductors, Inc. - San Jose, CA, USA
This paper will cover hidden performance issues in 192-kHz converters that are evaluated using AES-17. AES-17 is the “AES Standard method for digital audio engineering—Measurement of digital audio equipment.” This standard calls out how to test most digital audio equipment, and it outlines the standard test procedures and methods in testing of audio. It is important to understand that these methods allow for several important issues to be hidden in the real spectrum of the converter. These hidden elements will be addressed in detail. Specifically, sections 9.1 and 9.3 of the AES standard will be examined.
Convention Paper 6578 (Purchase now)
P11 - Loudspeakers -1
Sunday, October 9, 9:00 am — 12:00 pm
Chair: Wolfgang Klippel, Klippel GmbH - Dresden, Germany
P11-1 Wideband Piezoelectric Rectangular Loudspeaker Using A Tuck Shape PVDF Bimorph—Toshinori Ouchi, Juro Ohga, Shibaura Institute of Technology - Minato-ku, Tokyo, Japan; Toshitaka Takei, Take T Co. - Kanagawa, Japan; Nobuhiro Moriyama, Kureha Chemical Industry Co. - Tyuou-ku, Tokyo Japan
A bimorph sheet of PVDF (polyvinylidenfluoride) film is applied to a flat rectangular loudspeaker as a folded zigzag-tack shape diaphragm whose size is, for example, 260 mm x 144 mm with various depths. These loudspeakers are characterized by moderate size with a wide frequency range, light weight, and no magnetic flux radiation. This paper examines the electro-acoustic transducer characteristics of this loudspeaker. Sensitivity and resonant frequency were measured by using both a flat panel baffle and a closed box. Theoretical estimation was carried out by thin curved beam theory. The estimated values are compared to the measured results.
Convention Paper 6579 (Purchase now)
P11-2 A Proposal for Low Frequency Loudspeaker Design Utilizing Ultrasonic Motor—Juro Ohga, Shibaura Institute of Technology - Tokyo, Japan; Hirokazu Negishi, DiMagic Co. Ltd. - Tokyo, Japan; Ikuo Oohira, Ashida Sound Co. - Tokyo Japan
The limited low-frequency reproduction ability of conventional direct-radiator loudspeakers is fundamental, because their diaphragms must be driven in a mass-controlled range to obtain a flat response. This paper proposes a novel direct-radiator loudspeaker suitable for low-frequency signal radiation. It utilizes an ultrasonic motor (USM) incorporating a piezoelectric transducer. A velocity-modulated continuous revolution is preferable to a reciprocating motion because it avoids waveform distortion caused by the difference between dynamic and static frictional forces. A few fundamental designs, a continuously revolving flat radiator, an air-flow modulation type without any mechanical radiator, and a conventional radiator actuated by a revolving mass, are compared to investigate the merits of the proposed loudspeaker.
Convention Paper 6581 (Purchase now)
P11-3 Finite Element Modeling of a Loudspeaker. Part 1: Theory and Validation—David Henwood, Brighton University - Brighton, UK; Gary Geaves, B&W Group Ltd. - Steyning, W. Sussex, UK
The paper describes finite element modeling of an axi-symmetric loudspeaker and the resulting predicted behavior, both the motion and the resulting sound pressure field, in both the time and frequency domains. The effect of the electrical circuit is included through postprocessing. Laser and impedance measurements are shown to aid the estimation of the material parameters. Predictions are compared with measured responses and are seen to represent the main features accurately. A significant spider resonance is described. Modes can be loosely classified by the position of their dominant motion: in the spider, cone, or surround. This understanding of the modal structure is used in a study that attempts to reduce the influence of a cone mode by varying a surround parameter (thickness). A companion paper, Part 2, at this convention describes applications of the model.
Convention Paper 6582 (Purchase now)
P11-4 Radiated Sound Field Analysis of Loudspeaker Systems: Discrete Geometrical Distribution of Circular Membranes versus Co-Incident Annular Rings—Bernard Debail, Cabasse Acoustic Center - Plouzané, France; Hmaied Shaiek, GET - Brest Cedex, France
This paper addresses the problem of the spatial distribution of sound pressure generated by an annular membrane. A generalized theoretical approach is developed to predetermine the sound field radiated by a disc- or ring-shaped diaphragm placed in a rigid infinite baffle. Because no assumption is made regarding the observation point, this generalized method can predict the acoustic pressure not only in the far-field region but also in the near field. Results demonstrate the superiority of the co-incident ring distribution compared to the traditional discrete distribution of discs. A new transducer based on concentric rings and a disc, especially designed to respect this coincidence criterion, will be introduced.
Convention Paper 6583 (Purchase now)
P11-5 Loudspeaker Nonlinearities— Causes, Parameters, Symptoms—Wolfgang Klippel, Klippel GmbH - Dresden, Germany
The paper addresses the relationship between nonlinear distortion measurements and the nonlinearities that are the physical causes of signal distortion in loudspeakers, headphones, micro-speakers, and other transducers. Using simulation techniques, characteristic symptoms are found for each nonlinearity and presented systematically in a guide for loudspeaker diagnostics. This information is important for the interpretation of nonlinear parameters and for performing measurements that describe the loudspeaker more comprehensively. The practical application of the new techniques is demonstrated on three different loudspeakers.
Convention Paper 6584 (Purchase now)
P11-6 Modeling Compression Drivers Using T-Matrices and Finite Element Analysis—David J. Murphy, Krix Loudspeakers - Hackham, SA, Australia; Rick Morgans, Consultant - Australia
Models for a commercial compression driver were developed using transmission line matrices and finite element analysis with the commercial package ANSYS. The models were compared with measurements made using plane-wave tube loading, and discrepancies were investigated. The electrical impedance was measured in vacuo to obtain Thiele-Small parameters without acoustic loading. A resonance was investigated and found to be caused by air leakage into the magnet cavity. The development of frequency-dependent damping in the matrix and FEA models was necessary to improve the simulation accuracy.
Convention Paper 6580 (Purchase now)
P12 - Audio Coding -1
Sunday, October 9, 9:30 am — 12:00 pm
Chair: Alan Seefeldt, Dolby Laboratories - San Francisco, CA, USA
P12-1 Upfront Time Segmentation Methods for Transform Coding of Audio—Omar Niamut, Richard Heusdens, Huib Lincklaen Arriëns, Delft University of Technology - Delft, The Netherlands
We study a transform coder that employs a dynamic programming-based rate-distortion optimization framework for time segmentation. Although this coder exhibits high performance, its computational complexity makes it infeasible for many practical applications. We investigate whether up-front time segmentation can reduce computational complexity without a significant decrease in performance. Up-front time segmentation can be accomplished by replacing the rate-distortion cost functional with low-complexity cost measures that are independent of bit rate and perceptual distortion. Through both quantitative and qualitative evaluation it is shown that dynamic programming-based up-front time segmentation for minimization of perceptual entropy can be a viable alternative to rate-distortion optimal time segmentation.
Convention Paper 6585 (Purchase now)
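The dynamic-programming segmentation search described above can be sketched generically: given a precomputed cost for every candidate segment, find the boundaries minimizing the total. The cost matrix here is an arbitrary stand-in; the paper's point is precisely which cost measure (rate-distortion vs. low-complexity up-front measures) fills it:

```python
def optimal_segmentation(costs):
    """Dynamic-programming search for segment boundaries minimizing
    total cost, where costs[i][j] is the cost of one segment covering
    frames i..j inclusive. Returns (total cost, list of (start, end))."""
    n = len(costs)
    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + costs[i][j - 1]
            if c < best[j]:
                best[j], back[j] = c, i
    cuts, j = [], n
    while j > 0:
        cuts.append((back[j], j))
        j = back[j]
    return best[n], cuts[::-1]
```

The quadratic number of candidate segments is what makes the full rate-distortion version expensive: each `costs[i][j]` entry then requires an encode-and-measure pass, which up-front measures avoid.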
P12-2 Enhanced Accuracy of the Tonality Measure and Control Parameter Extraction Modules in MPEG-4 HE-AAC—Sang-Uk Ryu, Kenneth Rose, University of California at Santa Barbara - Santa Barbara, CA, USA
This paper investigates possible enhancements of the high efficiency-advanced audio coding (HE-AAC) encoder, with focus on the spectral band replication modules. The HE-AAC encoder generates side information, including control parameters, that characterizes the energy distribution across time and frequency as well as tonal and noise components, to ensure perceptually coherent regeneration of the high band at the decoder. The accuracy of the encoder's tonality measure and control parameter extraction modules is analyzed, leading to the proposal of an alternative approach employing sinusoidal analysis, which offers enhanced estimation of tonal and noise energy levels, as well as an improved control parameter extraction procedure. Comparative performance evaluation of the standard and modified encoders on a set of audio signals demonstrates the perceptual impact of estimation inaccuracy on the regenerated high-band quality and identifies the types of audio where it causes meaningful degradation.
Convention Paper 6586 (Purchase now)
P12-3 New Techniques in Spatial Audio Coding—Alan Seefeldt, Mark Vinton, Charles Robinson, Dolby Laboratories - San Francisco, CA, USA
The goal of spatial audio coding is to data compress multichannel audio material by combining channels into a composite signal and transmitting supporting side-information so that a decoder can reconstruct an approximation of the original signal from the composite. Many techniques have been discussed in the literature, most of which manipulate across time and frequency the magnitude and phase of the composite channels to create a perceptual approximation of the original multichannel sound field. Building on this framework, we discuss new techniques for computing and applying the side-information, new de-correlation techniques, and a new way of utilizing a traditional spatial coding system for the purpose of synthesizing a multichannel signal blindly from an existing stereo signal. We also compare the performance of this system to other existing systems.
Convention Paper 6587 (Purchase now)
P12-4 A New Broadcast Quality Low Bit Rate Audio Coding Scheme Utilizing Novel Bandwidth Extension Tools—Deepen Sinha, ATC Labs - Chatham, NJ, USA; Anibal Ferreira, University of Porto - Porto, Portugal, ATC Labs, Chatham, NJ, USA
In this paper we describe the components of a novel audio coding algorithm capable of delivering high-fidelity, CD-like stereo audio at bit rates of 40 to 48 kbps and natural-sounding FM-grade mono at bit rates of 18 to 22 kbps. Bandwidth Extension has emerged as an important tool for the satisfactory performance of low bit rate audio codecs. Previously, we proposed a new class of Bandwidth Extension techniques that are applied directly to a high-resolution frequency representation of the signal (e.g., the MDCT). This technique is based on a Fractal Self-Similarity Model (FSSM) for the signal spectrum. FSSM bandwidth extension forms a key component of the proposed codec. Other important components of the proposed scheme include a novel parametric stereo coding technique and a wideband psychoacoustic model that makes explicit use of the Comodulation Masking Release (CMR) phenomenon. This audio coding scheme is geared toward broadcast applications, where codec latency and encoder complexity are generally not overriding concerns. Algorithmic details, audio demonstrations, and comparisons to other audio coding schemes will be presented.
Convention Paper 6588 (Purchase now)
P12-5 The MPEG-4 Audio Lossless Coding (ALS) Standard—Technology and Applications—Tilman Liebchen, Technical University of Berlin - Berlin, Germany; Takehiro Moriya, Noboru Harada, Yutaka Kamamoto, NTT Communication Science Labs - Atsugi, Japan; Yuriy Reznik, Real Networks, Inc. - Seattle, WA, USA
MPEG-4 Audio Lossless Coding (ALS) is a new extension of the MPEG-4 audio coding family. The ALS core codec is based on forward-adaptive linear prediction, which offers remarkable compression together with low complexity. Additional features include long-term prediction, multichannel coding, and compression of floating-point audio material. In this paper, authors who have actively contributed to the standard describe the basic elements of the ALS codec, with a focus on prediction, entropy coding, and related tools. We also present the latest developments in the standardization process and point out the most important applications of this new lossless audio format.
Convention Paper 6589 (Purchase now)
P13 - Loudspeakers -2
Sunday, October 9, 1:00 pm — 4:00 pm
Chair: D. B. (Don) Keele, Jr., Harman International Industries - Northridge, CA, USA
P13-1 Improving Loudspeaker Transient Response with Digital Signal Processing—David Gunness, Loud Technologies, Inc. - Whitinsville, MA, USA
The transient response of a loudspeaker represents the combined effect of a multitude of physical behaviors. Some of these behaviors are time-variant, nonlinear, or spatially variable and are not good candidates for digital correction. Others are sufficiently LTI (linear, time-invariant) and sufficiently consistent directionally to be largely correctable with specialized digital filters. In the particular case of high powered, horn-loaded loudspeakers, most of the observed transient misbehavior is the result of stable, correctable phenomena. Consequently, the transient response of such loudspeakers can be significantly improved with signal preconditioning. Measurements demonstrate the improvements that are possible.
Convention Paper 6590 (Purchase now)
P13-2 Simulation of Harmonic Distortion in Horns Using an Extended BEM Postprocessing—Michael Makarski, RWTH Aachen University - Aachen, Germany
The Boundary Element Method is a well-known tool to calculate sound radiation of horns. As the BEM is based on the linearized sound field equation, only linear properties of the sound field (frequency response, directivity, etc.) can be calculated. In addition to these linear properties, the nonlinear wave propagation in horns is of great interest. It depends mainly on the shape of the horn and the growth rate of the first narrow part. This paper describes a method to combine the pure linear method BEM with the calculation of nonlinear wave propagation in horns. Simulation and measurement results of different horns are presented and discussed. As first results indicate, this method offers a fast and accurate way to calculate nonlinear wave propagation in horns.
Convention Paper 6591 (Purchase now)
P13-3 Modal Analysis and Nonlinear Normal Modes (NNM) on Moving Assemblies of Loudspeakers—Fernando Bolaños, Acústica Beyma SA - Valencia, Spain
The most important modes for a direct acoustic radiator are the axial modes, which are axisymmetric circular modes of high coherence. Numerical modal analysis and measurement of the free and forced acceleration and displacement responses of the moving assemblies are performed to establish the main modes involved in the acoustic response. The axial modes have been identified by measurement (within the intrinsic degree of uncertainty). The experiments show evidence of clearly nonlinear normal modes (NNM), which explains the high complexity of mode finding in loudspeaker cones. Based on the axial modes, a three-degrees-of-freedom model is proposed in which only one of the masses is externally forced. The modal analysis of a double-cone speaker is also treated in brief.
Convention Paper 6592 (Purchase now)
P13-4 Finite Element Modeling of a Loudspeaker. Part 2: Applications—John Vanderkooy, University of Waterloo - Waterloo, Ontario, Canada; David Henwood, Gary Geaves, B&W Group Ltd. - Steyning, West Sussex, UK
The finite-element loudspeaker model presented in Part 1 is extended to three applications. First, we study the effect of a significant increase of the magnetic motor strength Bl on the breakup modes and other resonances of a typical driver. Approximations to the theory allow modal decomposition even when the loudspeaker voice coil is driven from a normal amplifier, showing that the modes most affected are those for which the back-emf due to voice-coil motion is strong. Second, we probe how the shape of the cone influences the resonances, breakup, and acoustic performance. Cones that flare outward have the most desirable characteristics. A final third study concerns the result of a change in the distribution of the damping and stiffness of parts of the driver, to see if useful characteristics ensue. The model is also used to investigate some aspects of measurements.
Convention Paper 6593 (Purchase now)
P13-5 Ground-Plane Constant Beamwidth Transducer (CBT) Loudspeaker Circular-Arc Line Arrays—D. B. (Don) Keele, Jr., Harman International Industries - Northridge, CA, USA; Douglas J. Button, JBL Professional - Northridge, CA, USA
This paper describes a design variation of the CBT loudspeaker line array that is intended to operate very close to a planar reflecting surface. The original free-standing CBT array is halved lengthwise and then positioned close to a flat surface so that acoustic reflections essentially recreate the missing half of the array. This halved array can then be doubled in size, forming an array twice the height of the original. When compared to the original free-standing array, the ground-plane CBT array provides several advantages: it (1) eliminates floor reflections, (2) doubles array height, (3) doubles array sensitivity, (4) doubles maximum SPL capability, (5) extends vertical beamwidth control down an octave, and (6) minimizes near-far variation of SPL. This paper explores these characteristics through sound-field simulations and over-the-ground-plane measurements of three systems: (1) a conventional two-way compact monitor, (2) an experimental unshaded straight-line array, and (3) an experimental CBT Legendre-shaded circular-arc curved-line array.
Convention Paper 6594 (Purchase now)
P13-6 A Balanced Modal Radiator (BMR)—Neil Harris, New Transducers Ltd. - Huntingdon, Cambs., UK; Graham Bank, Consultant
The goal of a practical loudspeaker that behaves like the "perfect point source" has long been sought. Mathematical analysis shows that the prototype for such a device does indeed exist, but it does not point to an obvious embodiment. Using this prototype, a practical flat-diaphragm loudspeaker is developed that has a substantially flat on-axis pressure response as well as a smooth and extended power response. A fully coupled FEA model is used to investigate the intrinsic characteristics of this radiator in both the mechanical and acoustical domains. Measurements from a real prototype loudspeaker illustrate the practicality of the method.
Convention Paper 6595 (Purchase now)
P14 - Audio Coding -2
Sunday, October 9, 1:00 pm — 4:00 pm
Chair: Jürgen Herre, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
P14-1 Jointly Optimal Time Segmentation, Distribution, and Quantization for Sinusoidal Coding of Audio and Speech—Richard Heusdens, Jesper Jensen, Pim Korten, Delft University of Technology - Delft, The Netherlands
In this paper we propose a rate-distortion optimal algorithm for sinusoidal coding of audio and speech. For a pre-specified target bit rate, the algorithm determines the optimal (variable-length) time segmentation, the optimal distribution of sinusoidal components over the segments, and the optimal (scalar) quantizers for the sinusoid parameters (amplitude, phase, and frequency). The optimization is done by jointly optimizing the segment lengths, number of sinusoids, and quantizers using high-resolution quantization theory and dynamic programming techniques, which makes it possible to execute the algorithm in polynomial time. A particular advantage of the proposed method is that, given a target bit rate, it solves the problem of finding the optimal balance between the total number of sinusoids and the number of bits per sinusoid.
Convention Paper 6596 (Purchase now)
P14-2 Enhanced Performance in the Functionality of Fine Grain Scalability—KiHyun Choo, Eunmi Oh, Jung-Hoe Kim, ChangYong Son, Samsung Advanced Institute of Technology - Suwon, Korea
The purpose of this paper is to take advantage of the characteristics of arithmetic decoding to improve the coding efficiency of codecs that provide fine-grain scalability. The smart decoding algorithm exploits the fact that the decoding buffer still contains meaningful information for arithmetic decoding even when there are no further bits to be fed into the buffer. We tested the effect of the symbols additionally decoded from truncated MPEG-4 BSAC and MPEG-4 scalable lossless audio coding (SLS) bit streams. On average, approximately 41 and 13 additional symbols are uniquely decodable per frame in MPEG-4 BSAC and MPEG-4 SLS, respectively. The experimental results show much less spectral difference and higher SNR with smart arithmetic decoding. This additional “compression” can be effective when transmitting truncated bit streams at lower bit rates.
Convention Paper 6597 (Purchase now)
P14-3 Scalability in KOZ Audio Compression Technology—Kevin Short, Ricardo Garcia, Michelle Daniels, Chaoticom Technologies - Andover, MA, USA
Intra-codec scalability in the KOZ audio compression technology is presented in detail. The KOZ codec uses a psychoacoustic model and high-resolution spectral analysis to create, prioritize, and layer audio objects, making it inherently scalable by varying the number of layers. The layers are sufficiently fine-grained to allow both small-step and large-step bit rate variations in real-time during content delivery. Decoder scalability based on availability of device resources is introduced. An overview of the architecture of the KOZ technology and some of the applications of its scalability are discussed.
Convention Paper 6598 (Purchase now)
P14-4 MPEG Spatial Audio Coding/MPEG Surround: Overview and Current Status—Jeroen Breebaart, Philips Research Laboratories - Eindhoven, The Netherlands; Jürgen Herre, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Christof Faller, Agere Systems - Allentown, PA, USA; J. Rödén, Coding Technologies - Stockholm, Sweden; F. Myburg, Philips Applied Technologies - Eindhoven, The Netherlands; S. Disch, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Heiko Purnhagen, Coding Technologies - Stockholm, Sweden; G. Hotho, Philips Research Laboratories - Eindhoven, The Netherlands; M. Neusinger, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; K. Kjörling, Coding Technologies - Stockholm, Sweden; W. Oomen, Philips Applied Technologies
Recently, the MPEG audio standardization group started a new work item on spatial audio coding. This new approach allows for a fully backward compatible representation of multichannel audio at bit rates that are only slightly higher than common rates currently used for coding of mono/stereo sound. This paper briefly describes the underlying idea and reports on the current status of the MPEG standardization activities. It provides an overview of the resulting "MPEG Surround" technology and discusses its capabilities. The current level of performance will be illustrated by listening test results.
Convention Paper 6599 (Purchase now)
P14-5 Efficient Design of Time-Frequency Stereo Parameter Sets for Parametric HE-AAC—Kan-Chun Lee, Chung-Han Yang, Han-Wen Hsu, Wen-Chieh Lee, Chi-Min Liu, Tzu-Wen Chang, National Chiao Tung University - Hsinchu, Taiwan
A parametric stereo (PS) coding tool is used to reconstruct a stereo signal from a monaural signal. The tool can be used jointly with HE-AAC to achieve a high compression ratio, and this combination is referred to as parametric HE-AAC in this paper. The PS tool captures the stereo image of the input audio signal in a limited number of parameters, requiring only a small overhead. In MPEG-4 HE-AAC, the PS tool segments a frame into several regions in the time domain and into stereo bands in the frequency domain to deliver stereo parameter sets. This paper considers the design of these stereo parameter sets. The methods are integrated into the NCTU-HE-AAC codec, and objective experiments are conducted to verify the quality.
Convention Paper 6600 (Purchase now)
P14-6 Structural Analysis of Low Latency Audio Coding Schemes—Ralf Geiger, Manfred Lutzky, Markus Schnell, Markus Schmidt, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Low latency audio coding is gaining importance for emerging high-quality communication applications such as videoconferencing and VoIP. This paper provides a comparison of two low latency audio codecs suitable for these tasks: MPEG-4 ER AAC-LD and ITU-T G.722.1 Annex C. Despite their similar coding strategies, the two codecs show significant differences with respect to the tools used and their coding performance. A comparison of the coding tools is provided, and their influence on different signal classes is discussed.
Convention Paper 6601 (Purchase now)
P15 - Posters: Signal Processing
Sunday, October 9, 1:00 pm — 2:30 pm
P15-1 The Design of Half-Band FIR Filters Using Ripple Attenuation of a Manipulated Lowpass—Duane Wise, Consultant - Boulder, CO, USA
This paper investigates a technique for extracting a half-band FIR filter from a lowpass root FIR. Employing this technique with a root filter designed with an optimal least squares algorithm can result in a half-band FIR with very low ripple over most of the pass-band and stop-band, at the expense of ripple size at the band-edges. The principal advantage of this technique is in the design of half-band FIRs with less priority to band-edge ripple without the manipulation of an arbitrary weighting function.
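The root-to-half-band relationship the abstract exploits can be sketched generically (this is a plain windowed-sinc construction for illustration, not the paper's least-squares design or its ripple-attenuation manipulation; function names are our own):

```python
import numpy as np

def design_root(M, window=np.hamming):
    """Windowed-sinc 'root' taps: the ideal half-band impulse response
    sampled at the odd offsets +/-1, +/-3, ..., +/-(2M-1) from center."""
    m = np.arange(-(2 * M - 1), 2 * M, 2)       # the odd offsets
    r = np.sin(np.pi * m / 2) / (np.pi * m)     # ideal lowpass (cutoff fs/4)
    r *= window(2 * M)                          # taper to control ripple
    return r / r.sum() * 0.5                    # odd taps carry half the DC gain

def halfband_from_root(root):
    """Assemble a half-band FIR: root taps at odd offsets from the center,
    0.5 at the center, zeros everywhere else. The zero-phase amplitude
    then satisfies A(w) + A(pi - w) = 1."""
    r = np.asarray(root, dtype=float)
    h = np.zeros(2 * len(r) - 1)
    h[::2] = r                  # even absolute indices = odd offsets from center
    h[len(r) - 1] = 0.5         # center tap
    return h
```

The half of the taps forced to zero is what makes half-band filters cheap to run; any root design method (windowed sinc here, least squares in the paper) plugs into the same assembly step.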
Convention Paper 6602 (Purchase now)
P15-2 Single Channel Source Separation Using Short-Time Independent Component Analysis—Dan Barry, Dublin Institute of Technology - Dublin, Ireland; Derry Fitzgerald, Cork Institute of Technology - Cork, Ireland; Eugene Coyle, Dublin Institute of Technology - Dublin, Ireland; Bob Lawlor, National University of Ireland - Maynooth, Ireland
In this paper we develop a method for sound source separation of single-channel mixtures using Independent Component Analysis within a time-frequency representation of the audio signal. We apply standard Independent Component Analysis techniques to contiguous magnitude frames of the short-time Fourier transform of the mixture. Provided that the amplitude envelopes of the sources are sufficiently different, it is possible to recover the independent short-time power spectra of each source. A simple scoring scheme based on auditory scene analysis cues is then used to overcome the source-ordering problem, ultimately allowing each of the independent spectra to be assigned to the correct output source. A final stage of adaptive filtering is then applied, which forces each of the spectra to become more independent. Each of the sources is then resynthesized using the standard inverse short-time Fourier transform with an overlap-add scheme.
Convention Paper 6603 (Purchase now)
P15-3 A New Class of Smooth Power Complementary Windows and their Application to Audio Signal Processing—Deepen Sinha, ATC Labs - Chatham, NJ, USA; Anibal Ferreira, University of Porto - Porto, Portugal and ATC Labs, Chatham, NJ, USA
In this paper we describe a new family of smooth power complementary windows that exhibit a very high degree of localization in both the time and frequency domains. The window family is parameterized by a "smoothness quotient." As the smoothness quotient increases, the window becomes increasingly localized in time (most of the energy is concentrated in the center half of the window) and in frequency (far-field rejection becomes increasingly small, on the order of -150 dB or lower). A closed-form solution for such window functions exists, and the associated design procedure is described. The new class of windows is quite attractive for a number of applications, as switching functions, equalization functions, or as windows for overlap-add and modulated filter banks. An extension to the family of smooth windows that exhibits improved near-field response in the frequency domain is also described.
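The power-complementary property that this window family generalizes can be checked on its simplest well-known member, the MDCT sine window (a sketch for illustration only; the paper's smoothness-parameterized family itself is not reproduced here, and `is_power_complementary` is our own helper name):

```python
import numpy as np

def sine_window(N):
    """MDCT sine window, the simplest power-complementary window:
    w[n]^2 + w[n + N/2]^2 = 1 over the half-window overlap."""
    n = np.arange(N)
    return np.sin(np.pi * (n + 0.5) / N)

def is_power_complementary(w, tol=1e-10):
    """Check the overlap-add condition w[n]^2 + w[n + N/2]^2 = 1,
    which guarantees perfect reconstruction in a 50%-overlap
    modulated filter bank."""
    N = len(w)
    h = N // 2
    return bool(np.all(np.abs(w[:h] ** 2 + w[h:] ** 2 - 1.0) < tol))
```

A Hamming window, by contrast, fails this test, which is why it is unsuitable as an MDCT analysis/synthesis window despite its good spectral behavior.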
Convention Paper 6604 (Purchase now)
P15-4 Active Leak Compensation in Small-Sized Loudspeakers Using Digital Signal Processing—Varun Chopra, Chalmers University of Technology - Gothenburg, Sweden
The frequency response of a small-sized loudspeaker unit used in applications such as mobile telephones changes substantially with changes in the acoustical load of the speaker. Presently, an acoustical solution is used for reducing the variations in the acoustical load of the loudspeaker. The acoustical solution relies on certain space and volume considerations to function satisfactorily, which are difficult to attain in today's compact mobile phones. An unconventional approach using digital signal processing to counter such degradation in frequency response is described here.
Convention Paper 6605 (Purchase now)
P15-5 Digital Signal Processing within the Steinberg VST Architecture—Roberto Osorio-Goenaga, New York University - New York, NY, USA
Steinberg Media Technologies GmbH of Germany is one of the leading manufacturers of professional audio hardware and software products. Within its software realm, the company has developed a plug-in architecture that lets third-party developers add DSP functionality to programs that support it. The architecture, commonly referred to as VST (Virtual Studio Technology), has become a standard for third-party add-ons over the last decade, partly because of its cross-platform functionality. The software development kit (SDK) for VST plug-ins is available free of charge from Steinberg and is optimized for building within Microsoft's Visual C++ environment on x86 PCs and the CodeWarrior environment on Apple computers. This paper focuses on the implementation of classic and experimental filters within this architecture, created and compiled in Visual C++; rebuilding these examples on a Mac should be a straightforward process. The documentation covers the relevant DSP literature as well as the process of creating VST plug-ins in a clear and understandable manner. A suite of VST plug-ins is produced and included as an addendum to the project.
Convention Paper 6606 (Purchase now)
P15-6 Comparison Between Time Delay Based and Nonuniform Phase Based Equalization for Multichannel Loudspeaker-Room Responses—Sunil Bharitkar, Chris Kyriakakis, Audyssey Labs., Inc. and University of Southern California - Los Angeles, CA, USA
Traditionally, room response equalization is performed to improve sound quality at a listener. Given a loudspeaker and a listener in a room, a loudspeaker-room response is measured and an inverse filter is designed to equalize the loudspeaker-room magnitude response. However, due to the noncoincident positions of any two loudspeakers in a multichannel setup, the combined response of the two loudspeakers may exhibit an undesired broad spectral notch or peak, or large spectral deviations, in the crossover region. These spectral deviations around the crossover, caused by the combined phase response, generally cannot be compensated with magnitude response equalization. In this paper we compare two methods (time delay and all-pass cascade) for correcting the spectral deviations in the crossover region. We demonstrate that a nonuniform phase distribution around the crossover region, realized with a cascade of all-pass filters, yields better correction than a constant phase shift (i.e., a fixed, optimized time delay in the satellite), at the cost of increased complexity. We also present an automatic approach for evaluating performance with the time-delay approach.
Convention Paper 6607 (Purchase now)
P15-7 Objective Function for Automatic Multiposition Equalization and Bass Management Filter Selection—Sunil Bharitkar, Chris Kyriakakis, Audyssey Labs., Inc. and University of Southern California - Los Angeles, CA, USA
Traditionally, multiposition room response equalization is performed to improve sound quality at multiple listeners. Furthermore, even after multiposition equalization, due to noncoincident positions of the subwoofer and the satellite, in a multichannel setup, the combined response of the two loudspeakers may include undesirable spectral deviations in the crossover region, which are different at different positions. These spectral deviations introduced around the crossover, due to the combined phase response, may be fixed by proper choice of the bass management filters. In this paper we present an objective function that can be used to characterize the performance of multiposition equalization, determine the uniformity of equalization, as well as allow automatic selection of the bass management filters for correcting the spectral deviations in the crossover region.
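As a rough illustration of what such an objective function might combine (the weighting and function names below are our own assumptions, not the paper's actual metric), one can score a set of equalized responses by their deviation from flat and their spread across positions, then pick the bass-management candidate with the lowest score:

```python
import numpy as np

def eq_uniformity_score(mags_db, w_dev=1.0, w_spread=1.0):
    """Score equalized magnitude responses (positions x frequency bins, in dB).

    Lower is better. Combines (a) the average spectral deviation from flat
    at each position and (b) the spread of the responses across positions,
    so one number reflects both equalization quality and its uniformity."""
    mags_db = np.asarray(mags_db, dtype=float)
    flatness = np.mean(np.std(mags_db, axis=1))   # deviation from flat, per position
    spread = np.mean(np.std(mags_db, axis=0))     # disagreement across positions
    return w_dev * flatness + w_spread * spread

def pick_bass_filter(candidate_mags):
    """candidate_mags: dict of name -> (positions x bins) dB responses around
    the crossover; return the candidate with the best (lowest) score."""
    return min(candidate_mags, key=lambda k: eq_uniformity_score(candidate_mags[k]))
```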
Convention Paper 6608 (Purchase now)
P15-8 Acoustical Monitoring Research for National Parks and Wilderness Areas—Rob Maher, B. Jerry Gregoire, Zhixin Chen, Montana State University - Bozeman, MT, USA
The natural sonic environment, or soundscape, of parks and wilderness areas is not yet fully characterized in a scientific sense. Published research in the U.S. National Park System is generally based on short-term sound level measurements or visitor response surveys associated with regulatory evaluation of noise intrusions from motorized recreational vehicles, tour aircraft, or nearby industrial activity. This paper reviews the history of soundscape studies in the National Park System and describes several recent advances that will allow automated recording and analysis of long-term audio recordings covering days, weeks, and months at a time.
Convention Paper 6609 (Purchase now)
P16 - Posters: Miscellaneous -2
Sunday, October 9, 3:00 pm — 4:30 pm
P16-1 A Binaural Model to Predict Position and Extension of Spatial Images Created with Standard Sound Recording Techniques—Jonas Braasch, McGill University - Montreal, Quebec, Canada
A binaural model was used to investigate different microphone techniques (Blumlein, ORTF, MS, spaced omni). In contrast to previous attempts, the model algorithm was not only designed to predict the position, but also the spatial extent, of a reproduced spatial image. The model also contains elements to simulate the precedence effect, which is required for analyzing spaced-microphone techniques, and is also useful when measuring the influence of the concert space on the recording.
Convention Paper 6610 (Purchase now)
P16-2 An Unsupervised Adaptive Filtering Approach of 2- to 5-Channel Upmix—Yan Li, Peter Driessen, University of Victoria - Victoria, British Columbia, Canada
A new algorithm for converting 2-channel audio material to 5-channel, based on subband unsupervised adaptive filtering, is proposed in this paper. The algorithm uses a subband analysis-processing-synthesis framework. In each subband, a robust stereo image is obtained using principal component analysis, and an effective energy redistribution among the surround channels is achieved by mapping the cross-correlation between the two input channels to a weighted panning matrix.
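The per-subband building blocks the abstract names, principal component analysis and channel cross-correlation, can be sketched for a single band as follows (a simplified illustration; the paper's subband framework and panning matrix are not reproduced, and the function names are ours):

```python
import numpy as np

def pca_split(stereo_block):
    """Split a 2-channel block into a dominant (primary) component and an
    ambience residual via PCA, as a single-band sketch of the subband
    analysis described in the abstract."""
    x = np.asarray(stereo_block, dtype=float)   # shape (2, n), zero-mean assumed
    C = x @ x.T / x.shape[1]                    # 2x2 covariance estimate
    w, V = np.linalg.eigh(C)                    # eigenvalues in ascending order
    v = V[:, -1]                                # principal direction
    primary = v[:, None] * (v @ x)              # projection onto dominant direction
    ambience = x - primary                      # orthogonal residual
    return primary, ambience

def channel_correlation(stereo_block):
    """Normalized cross-correlation between the two channels, usable as a
    weight for redistributing ambience energy to the surround channels."""
    l, r = np.asarray(stereo_block, dtype=float)
    return float(l @ r / np.sqrt((l @ l) * (r @ r) + 1e-12))
```

Highly correlated content yields correlation near 1 and a near-zero ambience residual (kept in the front channels), while decorrelated content yields correlation near 0 and a large residual (steered to the surrounds).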
Convention Paper 6611 (Purchase now)
P16-3 Investigation on the Related Effect Caused by ECM Miniaturization—Jun Lee, Zonghan Wu, Zongbao Hu, Tao Zhang, Shenzhen Horn Electroacoustic Technology Co., Ltd. - Shenzhen, China
Miniaturization is a very important issue to keep in mind during the design and production of electronic components. The miniaturization of the ECM (electret condenser microphone) requires corresponding changes to its structure, dimensions, and circuit layout, and such changes affect the specifications of the ECM. This paper discusses these effects of ECM miniaturization, which is useful for the design of miniature microphones.
Convention Paper 6612 (Purchase now)
P16-4 On Amplitude Panning and Asymmetric Loudspeaker Set-Ups—Arno van Leest, Philips Research Laboratories - Eindhoven, The Netherlands
The aim of amplitude panning is to create a phantom sound source that is heard at a certain predetermined position by feeding a mono signal to several loudspeakers with particular weighting factors. Two models used for amplitude panning are of particular importance: the velocity model and the energy model. A nice property of these models is that the position of a phantom sound source can be described as a linear combination of the vectors pointing toward the physical positions of the loudspeakers. In this paper we describe how the coefficients of these linear combinations can be computed numerically by means of simple matrix algebra, such that both the velocity and the energy model are satisfied simultaneously for a given optimization criterion.
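For a two-loudspeaker, 2-D case, velocity-model gains can be computed with exactly the kind of simple matrix algebra the abstract describes (a VBAP-style sketch under our own naming; the paper's joint velocity/energy optimization for asymmetric setups is more elaborate):

```python
import numpy as np

def panning_gains(speaker_azims_deg, source_azim_deg):
    """Velocity-model gains for a 2-D loudspeaker pair.

    Solves L g = p, where the columns of L are unit vectors toward the
    loudspeakers and p is the unit vector toward the intended phantom
    source; the gains are then normalized for constant energy (sum g^2 = 1)."""
    def unit(az):
        a = np.deg2rad(az)
        return np.array([np.cos(a), np.sin(a)])
    L = np.column_stack([unit(az) for az in speaker_azims_deg])
    g = np.linalg.solve(L, unit(source_azim_deg))
    return g / np.linalg.norm(g)
```

A source midway between the loudspeakers gets equal gains; a source at one loudspeaker's position gets all the gain in that loudspeaker.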
Convention Paper 6613 (Purchase now)
P16-5 Phantom Audio Sources with Vertically Separated Loudspeakers—Shiva Sundaram, Chris Kyriakakis, University of Southern California - Los Angeles, CA, USA
Multichannel auditory displays and immersive audio systems are frequently integrated with video displays. Often in these applications, the video display placement makes it difficult to place the center loudspeaker in front of the listener. A solution to this problem is to create a phantom center channel in front of the listener using loudspeakers placed elsewhere. Conventionally, phantom sources are created by amplitude-panning techniques in the horizontal plane. However, since it is often more practical and aesthetically acceptable to place loudspeakers above or below the video display, in this paper we propose a technique to create a phantom center at 0 degrees elevation and 0 degrees azimuth relative to the listener, using two vertically separated loudspeakers placed above and below the horizontal plane at 0 degrees azimuth. The phantom center is created using inverse filtering techniques. This technique can also be extended to create phantom sources in the median plane.
Convention Paper 6614 (Purchase now)
P16-6 Difference of the Sound Levels among 15 Japanese Terrestrial and Digital Satellite TV Broadcasting Channels—Eiichi Miyasaka, Takahiro Kamada, Musashi Institute of Technology - Yokohama, Kanagawa, Japan
Toward better sound services for elderly listeners, we investigated, for the first time, the sound levels of 15 Japanese terrestrial analog and digital satellite broadcasting channels, including NHK and the commercial broadcasting bureaus. The results show that (1) the daily averaged sound level of the terrestrial channels was –6.1 dB, which is 4 dB higher than that of the BS digital channels; (2) the maximum difference in averaged sound level between main programs and the advertisements inserted in them was 15 dB; and (3) the average level difference was 2.9 dB. These results imply perceptually significant problems for elderly viewers.
Convention Paper 6615 (Purchase now)
P16-7 Electroacoustic Analogy Analysis of Electret Condenser Microphones with Noise-Canceling Effects—Fang-Ching Lee, Industrial Technology Research Institute - Hsinchu, Taiwan
An electroacoustic analogy is developed to analyze the open-circuit sensitivity, noise-canceling effect, and frequency response of electret condenser microphones. In contrast to conventional models of electret condenser microphones in the literature, the present electroacoustic analogy analysis (EAA) model details the open-circuit sensitivity, noise-canceling effect, and frequency response of the microphones. Two commercially available electret condenser microphones are analyzed to demonstrate the model. The results show that the calculated responses of the microphones are consistent with the measured data.
Convention Paper 6616 (Purchase now)
P17 - Hi Resolution Audio & Psychoacoustics, Perception, Listening Tests
Monday, October 10, 10:00 am — 12:00 pm
Chair: Gilbert Soulodre, Communications Research Centre - Ottawa, Ontario, Canada
P17-1 Stereo and Multichannel Loudness Perception and Metering—Gilbert Soulodre, Michel Lavoie, Communications Research Centre - Ottawa, Ontario, Canada
Much research has been conducted in recent years on loudness perception and metering, motivated by an ITU-R effort to identify a loudness measure suitable for broadcast applications. The initial ITU-R effort focused exclusively on mono signals, and a simple loudness meter based on a weighted energy sum [Leq(RLB)] was shown to perform best. In the present paper the work on loudness perception is extended to the stereo and multichannel cases. Formal subjective tests were conducted using typical broadcast material to derive a subjective database for evaluating the performance of new loudness meters. The results were used to examine the requirements for a loudness meter for mono, stereo, and multichannel signals.
Convention Paper 6618 (Purchase now)
P17-2 The Influence of Stereophony on the Restitution of Timbre by Loudspeakers—Mathieu Lavandier, Benjamin Guyot, Sabine Meunier, Philippe Herzog, Laboratoire de Mécanique et d'Acoustique - Marseille, France
In a previous study [Lavandier et al., AES 117th Convention, 2004, Paper 6240], the restitution of timbre by loudspeakers was evaluated by perceptual and physical measurements in order to find the link between these two parallel approaches. An experimental protocol was built that consists of recording the sound radiated by single loudspeakers in a room and then submitting the recorded sounds to listening tests under headphones. Although the spatial dimension of sound reproduction is not investigated here, stereophony might also influence the restitution of timbre by loudspeakers. A panel of loudspeakers was therefore recorded in both monophonic and stereophonic reproduction. The recordings were submitted to listening tests revealing the influence of stereophony on the perceived differences between loudspeakers. This influence turns out to be minor, and the perceptual spaces resulting from multidimensional scaling analysis of the perceived differences were not affected by the change of reproduction configuration.
Convention Paper 6619 (Purchase now)
P17-3 Audio Processor ICs for Advanced TV—Johan Mansson, et al., Analog Devices, Inc. - Limerick, Ireland
A family of fully integrated advanced audio processors for advanced TV applications, implemented in deep sub-micron complementary metal oxide semiconductor (CMOS) technology, will be presented. These are mixed-signal designs with integrated data converters and digital signal processing that enable a rich acoustic experience in cost-sensitive systems. The digital performance required, in terms of speed and large memory, in conjunction with the low cost demanded by the application, calls for the use of a small-geometry CMOS process. It is technically challenging to design high-performance converters in such sub-micron CMOS, particularly due to substrate noise and flicker noise. To overcome these issues, continuous-time converter architectures have been developed and designed.
Convention Paper 6620 (Purchase now)
P17-4 Noise Shaping in Time-Domain Quantized LFM—Malcolm Hawksford, University of Essex - Colchester, Essex, UK
SDM binary code can be derived from linear frequency modulation (LFM) using zero crossing detection and time-domain quantization. However, the jitter-like nature of this quantization process does not generally yield well-structured noise shaped characteristics compared to conventional time domain coders. The problem of improving the frequency domain performance of the error resulting from an LFM-based model is studied, and further observations are made on the relationship between SDM and quantized LFM.
Convention Paper 6617 (Purchase now)
P18 - Microphones & Internet Audio
Monday, October 10, 10:00 am — 12:00 pm
Chair: Eddy B. Brixen, DPA Microphones A/S - Allerød, Denmark; and EBB-consult, Smørum, Denmark
P18-1 The Native B-Format Microphone: Part 1—Eric Benjamin, Thomas Chen, Dolby Laboratories - San Francisco, CA, USA
Ambisonic sound recording is predicated on the acquisition of audio signals in what has been termed “B-format,” which is the output of a microphone array known as the Soundfield Microphone. That microphone is a tetrahedral array of nominally cardioid microphone capsules. The capsule signals are processed in such a way as to give four output signals that are proportional to the pressure and the three-dimensional particle velocity vector at the center of the array. The same signals can be obtained by using a single omnidirectional microphone and three figure-of-eight microphones located close to one another. Either approach introduces errors, arising from deviations in the shape of the polar patterns or in the ratio of the direct- to diffuse-field sensitivity of the microphones. The two methods are compared both in a theoretical analysis and in acoustical measurements. The results of a comparison of experimental recordings made with these two types of arrays will be presented in part two of this paper.
Convention Paper 6621 (Purchase now)
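The tetrahedral processing described above can be sketched as a simple matrix operation. This is a minimal illustration of the classic A-format-to-B-format conversion, omitting the frequency-dependent capsule equalization required in practice; the capsule naming follows the conventional Soundfield arrangement and is an assumption, not taken from the paper.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert one sample from each tetrahedral capsule (A-format) to
    B-format. Capsule names are front-left-up, front-right-down,
    back-left-down, back-right-up; the per-capsule equalization
    filters used in real Soundfield processing are omitted."""
    w = 0.5 * (flu + frd + bld + bru)   # pressure (omni) component
    x = 0.5 * (flu + frd - bld - bru)   # front-back velocity component
    y = 0.5 * (flu - frd + bld - bru)   # left-right velocity component
    z = 0.5 * (flu - frd - bld + bru)   # up-down velocity component
    return w, x, y, z
```

The “native” array discussed in the paper instead records W, X, Y, and Z directly with one omnidirectional and three figure-of-eight capsules, avoiding this conversion step entirely.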
P18-2 A Web Search Engine for Sound Effects—Stephen V. Rice, University of Mississippi - University, MS, USA; Stephen M. Bailey, Comparisonics Corporation - Grass Valley, CA, USA
FindSounds is the first Web search engine for sound effects. Queries are processed using a selective index of Web audio files that includes sound effects and musical instrument samples but excludes song and speech recordings. A text search retrieves audio files based on how they are labeled, and a “sounds-like” search locates audio files based on sound similarity. Each month FindSounds processes more than 1.5 million queries for more than 150,000 Internet users.
Convention Paper 6622 (Purchase now)
P18-3 Alternative Approaches for Recording Surround Sound—Colin Preston, South East Essex College - Southend-on-Sea, Essex, UK
The aim of this paper is to describe a series of adaptations of stereo microphone techniques for surround sound. The adaptations provide variable microphone polar patterns, multiple outputs for surround sound, as well as retaining a conventional 2-channel stereo output. The system enables the recording engineer to position and adjust the microphone cluster for a conventional stereo output and then derive the surround sound outputs. It is also possible to record the microphone outputs separately and create the desired polar patterns in postproduction. The technique also has creative applications for multimicrophone or multitrack situations where a number of single microphones would be used.
Convention Paper 6623 (Purchase now)
P18-4 Microphones, High Wind, and Rain—Eddy B. Brixen, DPA Microphones A/S - Allerød, Denmark; and EBB-consult, Smørum, Denmark
In outdoor recording, high wind and rain are a general problem, as they cause unwanted noise in the microphone signal. To prevent this noise, microphones can be protected from the weather by windjammers, windscreens, etc. However, the effectiveness of these products varies significantly, and full specifications characterizing the maximum permitted wind speed, wind-noise attenuation, spectral damping, influence of rain, etc., are seldom given. This paper gives an overview of the problems involved in specifying microphone systems for outdoor recording and proposes measurements and a form of presentation that might provide more informative specifications.
Convention Paper 6624 (Purchase now)
P19 - Psychoacoustics, Perception, Listening Tests
Monday, October 10, 1:00 pm — 3:30 pm
Chair: Bozena Kostek, Gdansk University of Technology - Gdansk, Poland, and Institute of Physiology and Pathology of Hearing, Warsaw, Poland
P19-1 Automatic Evaluation of Musical Sound Separation Quality—Marek Dziubinski, Bozena Kostek, Gdansk University of Technology - Gdansk, Poland
This paper addresses the problem of evaluating the effectiveness of musical sound separation algorithms. No standardized procedure for evaluating separation quality exists. The most convincing and typical way to evaluate it is to carry out subjective listening tests. However, subjective tests need solid statistical validation: many experts must take part, the room characteristics must be adequate, and, importantly, such tests are time consuming. This paper therefore attempts to show that the evaluation can be carried out automatically by employing an Artificial Neural Network (ANN), an approach further justified by experts’ opinions.
Convention Paper 6625 (Purchase now)
P19-2 Internet-Based Automatic Hearing Assessment System—Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland; Bozena Kostek, Gdansk University of Technology - Gdansk, Poland, and Institute of Physiology and Pathology of Hearing, Warsaw, Poland; Henryk Skarzynski, Institute of Physiology and Pathology of Hearing - Warsaw, Poland
In this paper an Internet-based system that allows for automatic hearing testing is described. Hearing impairment is one of the fastest growing health problems of modern society, so it is very important to organize mass screening tests to identify people suffering from this kind of impairment. The described application provides a test that uses automatic questionnaire analysis, standardized audiometric tone test procedures, and assessment of speech intelligibility in noise. When all testing is completed, the system automatically analyzes the results for each person examined, and based on the number of incorrect answers, a decision is made automatically by the expert system. Persons whose hearing impairment is confirmed are referred for treatment in rehabilitation centers. All these centers are connected via the Internet and are provided with access to a special distributed database, allowing them to automatically register and track patients identified during remote screening.
Convention Paper 6626 (Purchase now)
P19-3 Constructing Individual and Group Timbre Spaces for Sharpness-Matched Distorted Guitar Timbres—Atsushi Marui, William Martens, McGill University - Montreal, Quebec, Canada
In a previous study on predicting timbral variation resulting from distortion-based effects processing, two types of direct ratings were collected from a large group of naïve subjects (approximately 50) for a set of sharpness-matched guitar timbres; first on a dissimilarity scale and then on 11 bipolar adjective scales. For the current study, similar data were collected from five trained subjects to allow for a comparison between results derived for each of the trained subjects and the previously reported group results. To investigate differences in how individuals might describe these distorted guitar timbres, the trained subjects’ adjective ratings were submitted separately to analysis using a method inspired by the Repertory Grid Technique (RGT), as well as being submitted for external multidimensional unfolding (MDU) analyses.
Convention Paper 6627 (Purchase now)
P19-4 Physiological and Content Considerations for a Second Very Low Frequency Channel for Bass Management, Subwoofers, and LFE—Robert (Robin) Miller III, FilmakerTechnology - Bethlehem, PA, USA
Conventional practice assumes that frequencies below 90 Hz produce no interaural cues useful for spatial sound or localization. Yet some listeners claim to hear a difference between a single subwoofer channel (whether reproduced through one subwoofer or several) and two channels (“stereo bass”). Reported research supports the Jeffress model of interaural time difference (ITD) determination in brain structures and extends the accepted lower frequency limit of interaural phase difference (IPD). Meanwhile, uncorrelated very low frequencies (VLF, <100 Hz) exist in nearly all existing multichannel music and movie content. The audibility, recording, and reproduction of uncorrelated VLF are explored in theory and experiments.
Convention Paper 6628 (Purchase now)
P19-5 Individual Vocabulary Profiling of Spatial Enhancement Systems for Stereo Headphone Reproduction—Gaëtan Lorho, Nokia Corporation - Helsinki, Finland
This paper presents an audio descriptive analysis experiment employing an individual vocabulary development approach. The aim of the study was to compare the perceptual characteristics of spatial enhancement systems for stereo headphone reproduction. Five musical programs were selected and processed with two subsets of algorithms representing different approaches to spatial enhancement for headphones, including stereo enhancement systems and Virtual Home Theatre systems for headphone reproduction. Ten listeners were selected based on their discriminative and descriptive skills. Each subject developed his or her own set of attributes in three hours and performed a comparative evaluation of seven series of eight stimuli. The methods employed for the descriptive analysis process and for the analysis of this individual vocabulary profiling data are presented and some results from the perceptual evaluation are reported.
Convention Paper 6629 (Purchase now)
P20 - Sound Reinforcement
Monday, October 10, 1:00 pm — 3:00 pm
Chair: Wolfgang Ahnert, Acoustic Design Ahnert GmbH - Berlin, Germany
P20-1 Assessing the Suitability of Digital Signal Processing as Applied to Performance Audio such as In-Ear Monitoring Systems—Steve Armstrong, Keith Gordon, Betty Rule, Gennum Corporation - Burlington, Ontario, Canada
In the sound reinforcement field, current in-ear monitor (IEM) systems provide a number of benefits over floor wedges including hearing protection, reduced stage volume, and improved coverage. However, new problems arise from occlusion caused by the tight earbud seal while old problems such as lack of personalization still remain. By applying digital signal processing (DSP) derived from the current state of the art in the hearing aid (HA) industry, these problems can be overcome. DSP is applied to both ambient microphones located at the users’ ears and the monitor audio feed. It provides multiband parametric equalization, compression, and limiting for each feed and ear separately, allowing for precise tailoring of the sound, including compensation for hearing loss.
Convention Paper 6630 (Purchase now)
P20-2 New Data Format to Describe Complex Sound Sources—Stefan Feistel, Software Design Ahnert GmbH - Berlin, Germany; Wolfgang Ahnert, Acoustic Design Ahnert - Berlin, Germany; Steffen Bock, Software Design Ahnert GmbH - Berlin, Germany
Driven by the evolution of modern sound systems and the resulting need to describe them formally, a new data format was developed to capture the mechanical, electronic, and acoustic properties of complex sound sources such as loudspeaker clusters, column speakers, or line arrays. The GLL (Generic Loudspeaker Library) format can utilize measurement data such as impulse responses or transfer functions directly, or include already postprocessed fractional-octave data. To conveniently manage the amount of data involved, a specialized storage algorithm was developed. Furthermore, freely available software created during this research is presented; it provides import and export functions for the format as well as tools for inspecting the data. As a part of the EASE acoustic simulation program, the software will allow loudspeaker manufacturers and users of acoustic prediction software to create and exchange complex loudspeaker data in a high-resolution format.
Convention Paper 6631 (Purchase now)
P20-3 The Significance of Phase Data for the Acoustic Prediction of Combinations of Sound Sources—Stefan Feistel, Software Design Ahnert GmbH - Berlin, Germany; Wolfgang Ahnert, Acoustic Design Ahnert GmbH - Berlin, Germany
To date, only a few acoustic prediction software packages utilize complex directivity data to characterize sound sources such as line arrays. This paper gives some theoretical background on the significance of phase data for the prediction of combinations of coherent sound sources. A mathematical model is introduced that allows evaluation of the error ranges for several loudspeaker measurement methods. It is shown that, in contrast to what one might naively expect, the choice of the reference point for measuring complex data is rather irrelevant within given limits. These limits are derived from the propagation equation for spherical waves. It is further shown analytically that the use of phase data reduces the expected measurement error by at least an order of magnitude.
Convention Paper 6632 (Purchase now)
P20-4 Simulation, Auralization, and Their Verification of Acoustic Parameters Using Line Arrays—Wolfgang Ahnert, Acoustic Design Ahnert GmbH - Berlin, Germany; Stefan Feistel, Software Design Ahnert GmbH - Berlin, Germany
An existing room with installed line arrays was investigated by means of acoustic measurements. In addition, the direct sound coverage and the room impulse response were simulated in EASE 4.1 to derive sound levels and intelligibility measures. Different approaches were utilized, including a statistical field approximation and the room-acoustic algorithms of EASE AURA. The same acoustic parameters were also measured using the measurement software EASERA. The comparison of measured and simulated results showed good correlation. To further confirm the similarity, a comparison of auralized files was performed, which showed deviations audible only to experts.
Convention Paper 6633 (Purchase now)