AES New York 2009
Paper Session P18
P18 - Analysis and Synthesis of Sound
Monday, October 12, 9:00 am — 10:30 am
Chair: Sunil Bharitkar, Audyssey Labs/USC - Los Angeles, CA, USA
P18-1 Audio Bandwidth Extension Using Cluster Weighted Modeling of Spectral Envelopes—Nikolay Lyubimov, Alexey Lukin, Moscow State University - Moscow, Russian Federation
This paper presents a method for blind bandwidth extension of band-limited audio signals. A rough generation of the high-frequency content is performed by nonlinear distortion (waveshaping) applied to the mid-range band of the input signal. The second stage is shaping of the high-frequency spectrum envelope. It is done by a Cluster Weighted Model for MFCC coefficients, trained on full-bandwidth audio material. An objective quality measure is introduced and the results of listening tests are presented.
Convention Paper 7946
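The first stage described in P18-1, generating high-frequency content by waveshaping, can be illustrated with a minimal sketch. The cubic nonlinearity, the 1 kHz test tone, and the helper names below are assumptions chosen for illustration; the abstract does not state which nonlinearity the authors use, and the band splitting and envelope-shaping stages are omitted.

```python
import math

def tone(freq_hz, sample_rate, n):
    """Unit-amplitude sine wave of n samples."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def waveshape(x):
    """Cubic waveshaper (an assumed, illustrative nonlinearity)."""
    return [s ** 3 for s in x]

def magnitude_at(x, freq_hz, sample_rate):
    """Amplitude of the sinusoidal component of x at freq_hz (single-bin DFT)."""
    n = len(x)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(x))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(x))
    return 2 * math.hypot(re, im) / n

sample_rate = 16000
band_limited = tone(1000, sample_rate, 1600)  # input with no content at 3 kHz
shaped = waveshape(band_limited)

# sin^3(x) = (3 sin(x) - sin(3x)) / 4, so a third harmonic appears at 3 kHz.
before = magnitude_at(band_limited, 3000, sample_rate)
after = magnitude_at(shaped, 3000, sample_rate)
```

Here the shaped signal carries a component at three times the input frequency that the input lacks, which is the kind of rough high-band excitation a later envelope-shaping stage would then have to correct.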
P18-2 Applause Sound Detection with Low Latency—Christian Uhle, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
This paper presents a comprehensive investigation of the detection of applause sounds in audio signals. It focuses on the processing of single-microphone recordings in real time with low latency. A particular concern is the intensity of the applause within the sound mixture and the influence of interfering sounds on the recognition performance, which is investigated experimentally. Well-known feature sets, feature processing, and classification methods are compared. Additional low-pass filtering of the feature time series leads to the concept of sigma features and yields further improvements of the detection result.
Convention Paper 7947
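The low-pass filtering of a feature time series mentioned in P18-2 can be sketched with a causal one-pole smoother, which adds no look-ahead and therefore suits low-latency use. The filter type, the coefficient `alpha`, and the function name are assumptions; the abstract does not specify the filter, and this sketch does not attempt to reproduce the sigma features themselves.

```python
def smooth_features(values, alpha=0.2):
    """Causal one-pole low-pass filter over a per-frame feature sequence.

    alpha is an assumed smoothing coefficient in (0, 1]; smaller values
    smooth more strongly but respond more slowly to changes.
    """
    smoothed = []
    state = values[0] if values else 0.0  # start from the first observation
    for v in values:
        state = alpha * v + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed

# A hypothetical feature that flickers between 0 and 1 from frame to frame.
noisy = [0.0, 1.0] * 20
steady = smooth_features(noisy)
spread_in = max(noisy) - min(noisy)
spread_out = max(steady[10:]) - min(steady[10:])
```

After the initial frames, the smoothed sequence fluctuates far less than the raw one, so a classifier fed with it sees fewer frame-to-frame flips.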
P18-3 Loudness Descriptors to Characterize Wide Loudness-Range Material—Esben Skovenborg, Thomas Lund, TC Electronic A/S - Risskov, Denmark
Previously we introduced the concept of loudness descriptors: key numbers that summarize loudness properties of a broadcast program or music track. This paper presents two descriptors, Foreground Loudness and Loudness Range. Wide loudness-range material is typically level-aligned based on foreground sound rather than overall loudness. Foreground Loudness measures the level of foreground sound, and Loudness Range quantifies the variation in loudness. We propose to use these descriptors for loudness profiling and alignment, especially when live, raw, and film material is combined with other broadcast programs, thereby minimizing level jumps and also preventing unnecessary dynamics processing. The loudness descriptors were computed for audio segments in both linear PCM and perceptually coded versions. This evaluation demonstrates that the descriptors are robust against nearly transparent transformations.
Convention Paper 7948
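One way to quantify variation in loudness, as Loudness Range does in P18-3, is the spread between a low and a high percentile of a sequence of loudness measurements. The sketch below is only an illustration of that idea: the 10th and 95th percentiles, the nearest-rank method, and the absence of any gating are all assumptions, since the abstract does not give the descriptor's definition.

```python
def percentile(sorted_values, p):
    """Nearest-rank style percentile of an already sorted, non-empty list."""
    index = round(p * (len(sorted_values) - 1) / 100)
    return sorted_values[index]

def loudness_range(loudness_values, low_p=10, high_p=95):
    """Spread between two percentiles of per-window loudness values.

    low_p and high_p are assumed percentile choices; using percentiles
    rather than min and max keeps brief extremes from dominating.
    """
    ordered = sorted(loudness_values)
    return percentile(ordered, high_p) - percentile(ordered, low_p)

# Hypothetical per-window loudness measurements, one per dB from -40 to -20.
measurements = list(range(-40, -19))
spread = loudness_range(measurements)
```

For the hypothetical measurements above, the 10th percentile is -38 and the 95th is -21, giving a spread of 17. A track with constant loudness would give 0.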