AES San Francisco 2010
Paper Session P16
P16 - Signal Analysis and Synthesis
Saturday, November 6, 9:00 am — 12:30 pm (Room 236)
Chair: Agnieszka Roginska, New York University - New York, NY, USA
P16-1 Maintaining Sonic Texture with Time Scale Compression by a Factor of 100 or More—Robert Maher, Montana State University - Bozeman, MT, USA
Time lapse photography is a common technique for presenting a slowly evolving visual scene on an artificially rapid temporal scale. Events in the scene that unfold over minutes, hours, or days in real time can be viewed in a shorter video clip. Audio time scaling by a major compression factor can be considered the aural equivalent of time lapse video, but obtaining meaningful time-compressed audio that retains the original sonic texture poses interesting practical and conceptual challenges. This paper reviews a variety of existing techniques for compressing 24 hours of audio into just a few minutes of representative "time lapse" audio and explores several useful modifications and optimizations.
Convention Paper 8250
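One simple way to think about compression on this scale is to keep a short grain of audio at regular intervals and crossfade consecutive grains. The sketch below illustrates that idea only; it is an assumption for illustration, not the method described in the paper. The function name `time_lapse` and its parameters are hypothetical, and `factor` is assumed to be a positive integer.

```python
def time_lapse(samples, factor, grain_len, fade_len):
    """Keep one grain of `grain_len` samples out of every `factor` grains'
    worth of input and join the kept grains with linear crossfades."""
    if factor < 1 or grain_len < 1 or fade_len < 0 or 2 * fade_len > grain_len:
        raise ValueError("need factor >= 1, grain_len >= 1, 0 <= 2*fade_len <= grain_len")
    hop_out = grain_len - fade_len   # output advance per grain (grains overlap by fade_len)
    hop_in = hop_out * factor        # input advance per grain
    if len(samples) < grain_len:
        return []
    n_grains = (len(samples) - grain_len) // hop_in + 1
    out = [0.0] * ((n_grains - 1) * hop_out + grain_len)
    for g in range(n_grains):
        src, dst = g * hop_in, g * hop_out
        for i in range(grain_len):
            if i < fade_len:                    # fade-in at the start of the grain
                w = (i + 0.5) / fade_len
            elif i >= grain_len - fade_len:     # fade-out at the end of the grain
                w = (grain_len - i - 0.5) / fade_len
            else:
                w = 1.0
            # Overlapping fade-out and fade-in weights sum to 1.
            out[dst + i] += w * samples[src + i]
    return out
```

The output is roughly `1 / factor` of the input length. Short grains discard most of the signal, so the choice of grain length largely determines how much of the original texture survives.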
P16-2 Sound Texture Analysis Based on a Dynamical Systems Model and Empirical Mode Decomposition—Doug Van Nort, Jonas Braasch, Pauline Oliveros, Rensselaer Polytechnic Institute - Troy, NY, USA
This paper describes a system for separating a musical stream into sections having different textural qualities. This system translates several contemporary approaches to video texture analysis, creating a novel approach in the realm of audio and music. We first represent the signal as a set of mode functions by way of the Empirical Mode Decomposition (EMD) technique for time/frequency analysis, before expressing the dynamics of these modes as a linear dynamical system (LDS). We utilize both linear and nonlinear techniques in order to learn the system dynamics, which leads to a successful separation of the audio in time and frequency.
Convention Paper 8251
P16-3 An Improved Audio Watermarking Scheme Based on Complex Spectral Phase Evolution Spectrum—Jian Wang, Ron Healy, Joe Timoney, NUI Maynooth - Co. Kildare, Ireland
This paper presents a new audio watermarking algorithm based on the Complex Spectral Phase Evolution (CSPE) algorithm, extending a previous scheme. Peaks in a spectral representation derived from the CSPE are used for watermarking, instead of the previously proposed frequency identification. Although the new scheme is simple, it achieves high robustness in addition to perceptual transparency and accuracy, which is its distinguishing advantage over our previous scheme.
Convention Paper 8252
P16-4 About This Dereverberation Business: A Method for Extracting Reverberation from Audio Signals—Gilbert Soulodre, Camden Labs - Ottawa, Ontario, Canada
There are many situations where the reverberation found in an audio signal is not appropriate for its final use, and therefore we would like to have a means of altering the reverberation. Furthermore we would like to be able to modify this reverberation without having to directly measure the acoustic space in which it was recorded. In the present paper we describe a method for extracting the reverberant component from an audio signal. The method allows an estimate of the underlying dry signal to be derived. In addition, the reverberant component of the signal can be altered.
Convention Paper 8253
P16-5 Automatic Recording Environment Identification Using Acoustic Reverberation—Usman Amin Chaudhary, Hafiz Malik, University of Michigan-Dearborn - Dearborn, MI, USA
A recording environment leaves its acoustic signature in any audio recording captured in it. For example, the persistence of sound, due to multiple reflections from various surfaces in a room, causes temporal and spectral smearing of the recorded sound. This distortion is referred to as audio reverberation. Because the amount of reverberation depends on the geometry and composition of a recording location, differences in the estimated acoustic signature can be used for recording environment identification. We describe a statistical framework based on maximum likelihood estimation to estimate the acoustic signature from the audio recording and use it for automatic recording environment identification. To achieve these objectives, the digital audio recording is first analyzed to estimate the acoustic signature (in the form of reverberation time and variance of the background noise), and competitive neural network based clustering is then applied to the estimated acoustic signature for automatic recording location identification. We have also analyzed the impact of source-sensor directivity, microphone type, and the learning rate of the clustering algorithm on the identification accuracy of the proposed method.
Convention Paper 8254
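The clustering step can be pictured with a basic winner-take-all competitive learning rule: for each signature, the nearest prototype is moved toward it by a fraction set by the learning rate. The sketch below is a generic illustration under that assumption, not the authors' network. The function names and the signature values (reverberation time in seconds, background-noise variance) are hypothetical.

```python
def nearest(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda k: sum((p - v) ** 2 for p, v in zip(prototypes[k], x)))

def competitive_clustering(points, init_prototypes, learning_rate=0.1, epochs=20):
    """Winner-take-all learning: only the nearest prototype moves toward each point."""
    protos = [list(p) for p in init_prototypes]
    for _ in range(epochs):
        for x in points:
            k = nearest(protos, x)
            protos[k] = [p + learning_rate * (v - p) for p, v in zip(protos[k], x)]
    return protos

# Hypothetical signatures: (reverberation time in s, background-noise variance).
signatures = [
    (0.30, 0.010), (0.32, 0.012), (0.28, 0.011),   # made-up "small room" recordings
    (1.20, 0.050), (1.25, 0.048), (1.18, 0.052),   # made-up "large hall" recordings
]
# Initialise one prototype per assumed environment from the data itself.
prototypes = competitive_clustering(signatures, [signatures[0], signatures[3]])
labels = [nearest(prototypes, s) for s in signatures]
```

Here the two features have very different scales, so the distance is dominated by reverberation time; in practice the features would likely be normalized first. A larger learning rate makes the prototypes track recent samples more closely, at the cost of stability.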
P16-6 Automatic Music Production System Employing Probabilistic Expert Systems—Gang Ren, Gregory Bocko, Justin Lundberg, Dave Headlam, Mark F. Bocko, University of Rochester - Rochester, NY, USA
An automatic music production system based on expert audio engineering knowledge is proposed. An expert system based on a probabilistic graphical model is employed to embed professional audio engineering knowledge and infer automatic production decisions based on musical information extracted from audio files. The production pattern, which is represented as a probabilistic graphical model, can be “learned” from the operation data of a human audio engineer or manually constructed from domain knowledge. The authors also discuss the real-time implementation of the proposed automatic production system for live mixing application scenarios. Musical event alignment and prediction algorithms are introduced to improve the time synchronization performance of the production model. The authors conclude with performance evaluations and a brief summary.
Convention Paper 8255
P16-7 Musical Eliza: An Automatic Musical Accompany System Based on Expressive Feature Analysis—Gang Ren, Justin Lundberg, Gregory Bocko, Dave Headlam, Mark F. Bocko, University of Rochester - Rochester, NY, USA
We propose an interactive algorithm that musically accompanies musicians by matching expressive feature patterns to existing archive recordings. For each accompaniment segment, multiple realizations with different musical characteristics are performed by master performers and recorded. Musical expressive features are extracted from each accompaniment segment, and a semantic analysis is obtained using a music expressive language model. As the user's performance is recorded, we extract and analyze its musical expressive features in real time and play back the accompaniment track from the archive database that best matches the expressive feature pattern. By creating a sense of musical correspondence, the proposed system provides an engaging interactive musical communication experience and has both entertainment and pedagogical applications.
Convention Paper 8256