AES Munich 2009
Thursday, May 7, 10:00 — 11:00
Library Event Details
Archiving, Restoration, and Digital Libraries
Thursday, May 7, 11:00 — 12:00
Coding of Audio Signals
Thursday, May 7, 12:00 — 13:30
Please join us as the AES presents special awards to those who have made outstanding contributions to the Society in such areas as research, scholarship, and publications, as well as other accomplishments that have contributed to the enhancement of our industry. The awardees are:
Bronze Medal Award:
• Ivan Stamac
• Martin Wöhr
Board of Governors Award:
• Jan Berg
• Klaus Blasquiz
• Kimio Hamasaki
• Shinji Koyano
• Tapio Lokki
• Jiri Ocenasek
• John Oh
• Jan Abildgaard Pedersen
• Joshua Reiss
This year’s Keynote Speaker is Gerhard Thoma. Thoma has been leading the department of acoustics projects at BMW for more than 20 years. His speech will highlight many aspects of perception and acoustics from an unusual point of view: What does a driver in a car need to hear, what should he not hear, and how can the acoustics and sounds of a car help to significantly enhance driving pleasure and safety?
Thursday, May 7, 14:00 — 15:30
P6 - Multichannel Coding
P6-1 Adaptive Predictive Modeling of Stereo LPC with Application to Lossless Audio Compression—Florin Ghido, Ioan Tabus, Tampere University of Technology - Tampere, Finland
We propose a novel method for exploiting the redundancy of stereo linear prediction coefficients by using adaptive linear prediction for the coefficients themselves. We show that a significant proportion of the stereo linear prediction coefficients, on both the intrachannel and the interchannel parts, still contains substantial redundancy inherited from the signal. We can therefore significantly reduce the amplitude range of those LP coefficients by using adaptive linear prediction with orders up to 4, separately on the intrachannel and interchannel parts. When integrated into asymmetrical OptimFROG, the new method obtains on average 0.29 percent improvement in compression with negligible increase in decoder complexity.
Convention Paper 7666
P6-2 A Study of MPEG Surround Configurations and Its Performance Evaluation—Evelyn Kurniawati, Samsudin Ng, Sapna George, ST Microelectronics Asia Pacific Pte. Ltd. - Singapore
The standardization of MPEG Surround in 2007 opens a new range of possibilities for low bit rate multichannel audio encoding. While ensuring backward compatibility with legacy decoders, MPEG Surround offers various configurations to upmix to the desired number of channels. The downmix stream, which can be in mono or stereo format, can be passed to a transform, hybrid, or any other type of encoder. These options give us more than one possible combination to encode a multichannel stream at a specific bit rate. This paper presents a comparative study of those options in terms of their quality performance that will help us choose the most suitable configuration of MPEG Surround for a range of operating bit rates.
Convention Paper 7667
P6-3 Lossless Compression of Spherical Microphone Array Recordings—Erik Hellerud, U. Peter Svensson, Norwegian University of Science and Technology - Trondheim, Norway
The amount of spatial redundancy for recordings from a spherical microphone array is evaluated using a low delay lossless compression scheme. The original microphone signals, as well as signals transformed to the spherical harmonics domain, are investigated. It is found that the correlation between channels is, as expected, very high for the microphone signals, in several different acoustical environments. For the signals in the spherical harmonics domain, the compression gain from using inter-channel prediction is reduced, since this conversion results in many channels with low energy. Several alternatives for reducing the coding complexity are also investigated.
Convention Paper 7668
Thursday, May 7, 16:30 — 18:00
P7 - Spatial Audio Processing
P7-1 Low Complexity Binaural Rendering for Multichannel Sound—Kangeun Lee, Changyong Son, Dohyung Kim, Samsung Advanced Institute of Technology - Suwon, Korea
The current paper is concerned with an effective method to emulate multichannel sound in a portable environment where low power is required. The goal of this paper is to show the complexity of binaural rendering of multichannel to stereo sound systems on portable devices. To achieve this, we propose modified discrete cosine transform (MDCT) based binaural rendering, combined with the Dolby Digital decoder (AC-3), which is a multichannel audio decoder. A reverberation algorithm is added to the proposed algorithm to come closer to real sound. This combined structure is implemented on a DSP processor. The complexity and quality are compared with a conventional head-related transfer function (HRTF) filtering method and Dolby headphone, which are the most current commercial binaural rendering technologies, demonstrating significant complexity reduction and sound quality comparable to the Dolby headphone.
Convention Paper 7687
P7-2 Optimal Filtering for Focused Sound Field Reproductions Using a Loudspeaker Array—Youngtae Kim, Sangchul Ko, Jung-Woo Choi, Jungho Kim, SAIT, Samsung Electronics Co., Ltd. - Gyeonggi-do, Korea
This paper describes audio signal processing techniques in designing multichannel filters for reproducing an arbitrary spatial directivity pattern with a typical loudspeaker array. In designing the multichannel filters, some design criteria based on, for example, least-squares methods and the maximum energy array are introduced as non-iterative optimization techniques with a lower computational complexity. The abilities of the criteria are first evaluated with a given loudspeaker configuration for reproducing a desired acoustic property in a spatial area of interest. Also, additional constraints are imposed to minimize the error between the amplitudes of the actual and the desired spatial directivity patterns. Their limitations in practical applications are revealed by experimental demonstrations, and finally some guidelines are proposed for designing optimal filters.
Convention Paper 7688
P7-3 Single-Channel Sound Source Distance Estimation Based on Statistical and Source-Specific Features—Eleftheria Georganti, Philips Research Europe - Eindhoven, The Netherlands, University of Patras, Patras, Greece; Tobias May, Technische Universiteit Eindhoven - Eindhoven, The Netherlands; Steven van de Par, Aki Härmä, Philips Research Europe - Eindhoven, The Netherlands; John Mourjopoulos, University of Patras - Patras, Greece
In this paper we study the problem of estimating the distance of a sound source from a single microphone recording in a room environment. The room effect cannot be separated from the problem without making assumptions about the properties of the source signal. Therefore, it is necessary to develop methods of distance estimation separately for different types of source signals. In this paper we focus on speech signals. The proposed solution is to compute a number of statistical and source-specific features from the speech signal and to use pattern recognition techniques to develop a robust distance estimator for speech signals. Experiments with a database of real speech recordings showed that the proposed model is capable of estimating source distance with acceptable performance for applications such as ambient telephony.
Convention Paper 7689
P7-4 Implementation of DSP-Based Adaptive Inverse Filtering System for ECTF Equalization—Masataka Yoshida, Haruhide Hokari, Shoji Shimada, Nagaoka University of Technology - Nagaoka, Niigata, Japan
The Head Related Transfer Function (HRTF) and the inverse Ear Canal Transfer Function (ECTF) must be accurately determined if stereo earphones are to realize out-of-head sound localization (OHL) with high presence. However, the characteristics of the ECTF depend on the type of earphone used and the number of earphone mounting and demounting operations. Therefore, we present a DSP-based adaptive inverse filtering system for ECTF equalization in this paper. The buffer composition and size of the DSP were studied in order to implement the processing operations. As a result, we succeeded in constructing a system that was able to work in the audio band of 15 kHz with a sampling frequency of 44.1 kHz. Listening tests clarified that the effective estimation error of the adaptive inverse-ECTF for OHL was less than –11 dB with a convergence time of about 0.3 seconds.
Convention Paper 7690
P7-5 Improved Localization of Sound Sources Using Multi-Band Processing of Ambisonic Components—Charalampos Dimoulas, George Kalliris, Konstantinos Avdelidis, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
The current paper focuses on the use of multi-band ambisonic-processing for improved sound source localization. Energy-based localization can be easily delivered using soundfield microphone pairs, as long as free field conditions and the single omni-directional-point-source model apply. Multi-band SNR-based selective processing improves the noise tolerance and the localization accuracy, eliminating the influence of reverberation and background noise. Band-related sound-localization statistics are further exploited to verify the single or multiple sound-sources scenario, while continuous spectral fingerprinting indicates the potential arrival of a new source. Different sound-excitation scenarios are examined (single/multiple sources, narrowband/wideband signals, time-overlapping, noise, reverberation). Various time-frequency analysis schemes are considered, including filter-banks, windowed-FFT and wavelets with different time resolutions. Evaluation results are presented.
Convention Paper 7691
P7-6 Spatial Audio Content Management within the MPEG-7 Standard of Ambisonic Localization and Visualization Descriptions—Charalampos Dimoulas, George Kalliris, Konstantinos Avdelidis, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
The current paper focuses on spatial audio video/imaging and sound field visualization using ambisonic-processing, combined with MPEG-7 description schemes for multi-modal content description and management. Sound localization can be easily delivered using multi-band ambisonic processing under free-field and single point-source excitation conditions, offering an estimate of the achieved accuracy. Sound source forward propagation models can be applied to visualize the corresponding sound field when confident localization accuracy has been achieved. Otherwise, 3-D audio/surround sound reproduction simulation can be used instead. In any case, sound level distribution colormap-videos and highlighting images can be extracted. MPEG-7 adapted description schemes are proposed for spatial-audio audiovisual content description and management, facilitating a variety of user-interactive postprocessing applications.
Convention Paper 7692
Thursday, May 7, 17:00 — 18:00
Semantic Audio Analysis
Thursday, May 7, 18:30 — 19:30
The Richard C. Heyser distinguished lecturer for the 126th AES Convention is Gunnar Rasmussen, a pioneer in the construction of acoustic instrumentation, particularly of microphones, transducers, vibration and related devices. He was employed at Brüel & Kjær Denmark as an electronics engineer immediately after his graduation in 1950. After holding various positions in development, testing, and quality control, he spent one year in the United States working for Brüel & Kjær in sales and service.
After his return to Denmark in the mid-1950s he began the development of a new measurement microphone. This resulted in superior mechanical stability and increased temperature and long-term stability. The resulting one-inch pressure microphone soon became the de facto standard microphone for acoustical measurements, replacing the famous W.E. 640AA standardized microphone.
The optimized mechanical design of the new generation of measurement microphones opened up the possibility for reducing the size of the microphones, first to a ½” microphone and then to ¼” and 1/8” microphones with essentially the same superior mechanical, temperature and long term stability. Notably the ½” microphone is still the most widely used measurement tool today. Since the beginning of the 1960s, this microphone design has been preferred for all types of acoustic measurements and has formed the basis for the IEC 1094 series of international standards for measurement microphones.
Gunnar Rasmussen received the Danish Design Award in 1969 for his novel design of the microphones that were exhibited at the New York Museum of Modern Art. He also developed the first acoustically optimized sound level meter, where the shape of the body was designed to minimize the effect of reflections from the casing to the microphone. This type 2203 Sound Level meter was for many years seen as the archetype of sound level meters and its characteristic shape became the symbol of a sound level meter.
Other major inventions and designs include the Delta Shear accelerometer, the dual piston pistonphone calibrator for precision calibration, the face-to-face sound intensity probe and hydrophones, occluded ears, artificial mouth, etc. Rasmussen is also the author of numerous papers on acoustics and vibration and has served as chairman and vice-chairman of various international organizations and standard committees. In 1990 he received the CETIM medal for his contribution to the field of intensity techniques. He is also a Fellow of the Acoustical Society of America.
In 1994 Rasmussen started his own company, G.R.A.S. Sound and Vibration. Originally a company specializing in precision Outdoor Microphones for permanent noise monitoring around airports, it is now one of the world’s leading companies in acoustic front-ends and transducers forming a wide range of general purpose and specialized microphones, electro-acoustic measurement devices such as ear couplers, precision calibration tools and multi-dimensional sound intensity probes. The title of his lecture is, “The Reproduction of Sound Starts at the Microphone.”
The microphones may be developed for many specific purposes: for communication, recording or precision measurements. Quality may have different meaning for different applications. Price may be a dominating factor. Carbon microphones were dominating up to the 1950s. Electret microphones have taken the place of carbon microphones with great improvement in quality and performance at low prices. The MEMS microphones are on the way.
The challenge in the high quality microphone development is to match or exceed the human ear in perception of sound for measurement purposes. Without measurements we cannot qualify our progress. We are still trying to match the frequency band, the dynamic range, the phase linearity of the human ear and to obtain very good reproducibility in all situations where humans are involved. We need microphones for development, for standardized measurements and for legal related measurements. Where are we today?
Friday, May 8, 09:00 — 12:30
P9 - Signal Analysis, Measurements, Restoration
Chair: Jan Abildgaard Pedersen
P9-1 Some Improvements of the Playback Path of Wire Recorders—Nadja Wallaszkovits, Phonogrammarchiv Austrian Academy of Sciences - Vienna, Austria; Heinrich Pichler, Audio Consultant - Vienna, Austria
The archival transfer of wire recordings to the digital domain is a highly specialized process that incorporates a wide range of specific challenges. One of the basic problems is the format incompatibility between different manufacturers and models. The paper discusses the special design philosophy, using the tone control network in the record path as well as in the playback path. This tone control circuit causes additional phase and group delay distortions. The influence and characteristics of the tone control (which was not a priori present with every model) is discussed and analog phase correction networks are described. The correction of phase errors is outlined. As this format has been obsolete for many decades, a high quality archival transfer can only be reached by modifying dedicated equipment. The authors propose some possible main modifications and improvements of the playback path of wire recorders, such as signal pickup directly after the playback head, introducing a high quality preamplifier, followed by analog phase correction and correction of the amplitude characteristics. Alternatively signal pickup directly after the playback head, introducing a high quality preamplifier, followed by digital signal processing to optimize the output signal is discussed.
Convention Paper 7698
P9-2 Acoustics of the Crime Scene as Transmitted by Mobile Phones—Eddy B. Brixen, EBB-consult - Smorum, Denmark
One task for the audio forensics engineer is to extract background information from audio recordings. A major problem is the assessment of analyzed telephone calls in general and mobile phones (LPC-algorithms) in particular. In this paper the kind of acoustic information to be extracted from a recorded phone call is initially explained. The parameters used for the characterization of the various acoustic spaces and events in question are described. It is discussed how the acoustical cues should be assessed. The validity of acoustic analyses carried out in the attempt to provide crime scene information like reverberation time is presented.
Convention Paper 7699
P9-3 Silence Sweep: A Novel Method for Measuring Electroacoustical Devices—Angelo Farina, University of Parma - Parma, Italy
This paper presents a new method for measuring some properties of an electroacoustical system, for example a loudspeaker or a complete sound system. Coupled with the already established method based on the Exponential Sine Sweep, this new Silence Sweep method provides a quick and complete characterization of the nonlinear distortions and noise of the device under test. The method is based on the analysis of the distortion products, such as harmonic distortion products or intermodulation effects, occurring when the system is fed with a wide-band signal. By removing a small portion of the whole spectrum from the test signal, it becomes possible to collect and analyze the nonlinear response and the noise of the system in that “suppressed” band. By continuously changing the suppressed band over time, we get the Silence Sweep test signal, which allows for quick measurement of noise and distortion over the whole spectrum. The paper explains the method with a number of examples. The results obtained for some typical devices are presented and compared with those obtained with a standard, state-of-the-art measurement system.
Convention Paper 7700
P9-4 Pitch and Played String Estimation in Classic and Acoustic Guitars—Isabel Barbancho, Lorenzo Tardón, Ana M. Barbancho, Simone Sammartino, Universidad de Málaga - Málaga, Spain
In classic and acoustic guitars that use standard tuning, the same pitch can be produced on different strings. The aim of this paper is to present a method based on the time and frequency-domain characteristics of the recorded sound to determine not only the pitch but also the string of the guitar that has been played to produce that pitch. This system will provide information not only about the pitch of the notes played, but also about how those notes were played. This specific information can be valuable for identifying the style of the player and can be used in teaching guitar playing.
Convention Paper 7701
P9-5 Statistical Properties of Music Signals—Miomir Mijic, Drasko Masovic, Dragana Sumarac-Pavlovic, Faculty of Electrical Engineering - Belgrade, Serbia
This paper is concerned with the results of a complex approach to statistical properties of various music signals based on 412 musical pieces classified in 12 different genres. The analyzed signals contain more than 24 hours of music. For each piece the time variation of the signal level was found, using a 10 ms period of integration in the rms calculation and 90 percent overlap, making a new signal representing the level as a function of time. For each piece the statistical analysis of the signal level has been performed by its statistical distribution, cumulative distribution, effective value within the complete duration of the piece, mean level value, and level value corresponding to the maximum of the statistical distribution. The parameters L1, L10, L50, and L99 were extracted from the cumulative distributions as numerical indicators of dynamic properties. The paper contains detailed statistical data and averaged data for all observed genres, as well as quantitative data about dynamic range and crest factor of various music signals.
Convention Paper 7702
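The level analysis described in the P9-5 abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the dB reference (full scale = 1.0), the silence floor, and the reading of Ln as "the level exceeded during n percent of the windows" (the usual convention in noise statistics) are all assumptions made here for illustration. Only the 10 ms rms window and the 90 percent overlap come from the abstract.

```python
import math

def level_track(samples, sample_rate, window_s=0.010, overlap=0.90):
    """Short-time rms level in dB for each analysis window.

    window_s and overlap default to the values quoted in the abstract;
    everything else in this function is an illustrative assumption.
    """
    win = max(1, round(window_s * sample_rate))
    hop = max(1, round(win * (1.0 - overlap)))
    levels = []
    for start in range(0, len(samples) - win + 1, hop):
        frame = samples[start:start + win]
        mean_square = sum(x * x for x in frame) / win
        # Assumed floor so that digital silence does not produce log10(0).
        levels.append(10.0 * math.log10(max(mean_square, 1e-12)))
    return levels

def exceedance_level(levels, percent):
    """Level exceeded during `percent` of the windows (assumed meaning of Ln)."""
    ordered = sorted(levels, reverse=True)
    index = min(len(ordered) - 1, int(len(ordered) * percent / 100.0))
    return ordered[index]

if __name__ == "__main__":
    # Hypothetical test signal: a loud half followed by a quiet half.
    sr = 8000
    signal = [(1.0 if n < sr // 2 else 0.1) * math.sin(2 * math.pi * 100 * n / sr)
              for n in range(sr)]
    track = level_track(signal, sr)
    for p in (1, 10, 50, 99):
        print(f"L{p} = {exceedance_level(track, p):.1f} dB")
```

Under this reading, the difference between a low-percent and a high-percent level (for example L1 minus L99) gives one simple numerical indicator of dynamic range.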
P9-6 Multi-Band Generalized Harmonic Analysis (MGHA) and its Fundamental Characteristics in Audio Signal Processing—Takahiro Miura, Teruo Muraoka, Tohru Ifukube, University of Tokyo - Tokyo, Japan
One of the main problems in sound restoration of valuable historical recordings is noise reduction. We have been proposing and continuing to improve a noise reduction method that utilizes inharmonic analysis such as GHA (Generalized Harmonic Analysis). The GHA frequency extraction algorithm enables us to extract arbitrary frequency components. In this paper we aim at more accurate frequency identification from noisy signals by dividing the analyzed frequency range into multiple bands before analysis: this algorithm is named Multi-Band GHA (MGHA). The simulation of frequency analysis in a noise-free condition indicated that MGHA is more effective than GHA for the extraction of low frequency components when both the window length is short and the number of frequency components is small. However, outside that case, GHA identifies frequency components more precisely. Furthermore, the result of frequency analysis under steady noise shows that MGHA can be applied more effectively in the case of short window length, many frequency components, and low S/N.
Convention Paper 7703
P9-7 Automatic Detection of Salient Frequencies—Joerg Bitzer, University of Applied Science Oldenburg - Oldenburg, Germany; Jay LeBoeuf, Imagine Research, Inc. - San Francisco, CA, USA
In this paper we present several techniques to find the most significant frequencies in recorded audio tracks. These estimated frequencies could be used as a starting point for mixing engineers in the EQing process. In order to evaluate the results, we compare the detected frequencies with a list of reported salient frequencies from audio engineers. The results show that automatic detection is possible. Thus, one of the more boring tasks of a mixing engineer can be automated, which gives the mixing engineer more time to do the artistic part of the mixing process.
Convention Paper 7704
Friday, May 8, 10:00 — 11:00
Friday, May 8, 13:00 — 16:30
P11 - Audio Coding
Chair: Nick Zacharov
P11-1 A Novel Scheme for Low Bit Rate Unified Speech and Audio Coding—MPEG RM0—Max Neuendorf, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Philippe Gournay, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Markus Multrus, Jérémie Lecomte, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Bruno Bessette, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Ralf Geiger, Stefan Bayer, Guillaume Fuchs, Johannes Hilpert, Nikolaus Rettelbach, Frederik Nagel, Julien Robilliard, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Redwan Salami, VoiceAge Corporation - Montreal, Quebec, Canada; Gerald Schuller, Fraunhofer IDMT - Ilmenau, Germany; Roch Lefebvre, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Bernhard Grill, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Coding of speech signals at low bit rates, such as 16 kbps, has to rely on an efficient speech reproduction model to achieve reasonable speech quality. However, for audio signals not fitting to the model this approach generally fails. On the other hand, generic audio codecs, designed to handle any kind of audio signal, tend to show unsatisfactory results for speech signals, especially at low bit rates. To overcome this, a process was initiated by ISO/MPEG, aiming to standardize a new codec with consistent high quality for speech, music, and mixed content over a broad range of bit rates. After a formal listening test evaluating several proposals MPEG has selected the best performing codec as the reference model for the standardization process. This paper describes this codec in detail and shows that the new reference model reaches the goal of consistent high quality for all signal types.
Convention Paper 7713
P11-2 A Time-Warped MDCT Approach to Speech Transform Coding—Bernd Edler, Sascha Disch, Leibniz Universität Hannover - Hannover, Germany; Stefan Bayer, Guillaume Fuchs, Ralf Geiger, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The modified discrete cosine transform (MDCT) is often used for audio coding due to its critical sampling property and good energy compaction, especially for harmonic tones with constant fundamental frequencies (pitch). However, in voiced human speech the pitch is time-varying and thus the energy is spread over several transform coefficients, leading to a reduction of coding efficiency. The approach presented herein compensates for pitch variation in each MDCT block by application of time-variant re-sampling. A dedicated signal adaptive transform window computation ensures the preservation of the time domain aliasing cancellation (TDAC) property. Re-sampling can be designed such that the duration of the processed blocks is not altered, facilitating the replacement of the conventional MDCT in existing audio coders.
Convention Paper 7710
P11-3 A Phase Vocoder Driven Bandwidth Extension Method with Novel Transient Handling for Audio Codecs—Frederik Nagel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Sascha Disch, Leibniz Universität Hannover - Hannover, Germany; Nikolaus Rettelbach, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Storage or transmission of audio signals is often subject to strict bit-rate constraints. This is accommodated by audio encoders that encode the lower frequency part in a waveform preserving way and approximate the high frequency signal from the lower frequency data by using a set of reconstruction parameters. This so called bandwidth extension can lead to roughness and other unpleasant auditory sensations. In this paper the origin of these artifacts is identified, and an improved bandwidth extension method called Harmonic Bandwidth Extension (HBE) is outlined avoiding auditory roughness in the reconstructed audio signal. Since HBE is based on phase vocoders, and thus intrinsically not well suited for transient signals, an enhancement of the method by a novel transient handling approach is presented. A listening test demonstrates the advantage of the proposed method over a simple phase vocoder approach.
Convention Paper 7711
P11-4 Efficient Cross-Fade Windows for Transitions between LPC-Based and Non-LPC-Based Audio Coding—Jérémie Lecomte, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Philippe Gournay, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Ralf Geiger, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Bruno Bessette, Université de Sherbrooke - Sherbrooke, Quebec, Canada; Max Neuendorf, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The reference model selected by MPEG for the forthcoming unified speech and audio codec (USAC) switches between a non-LPC-based coding mode (based on AAC) operating in the transform-domain and an LPC-based coding (derived from AMR-WB+) operating either in the time domain (ACELP) or in the frequency domain (wLPT). Seamlessly switching between these different coding modes required the design of a new set of cross-fade windows optimized to minimize the amount of overhead information sent during transitions between LPC-based and non-LPC-based coding. This paper presents the new set of windows that was designed in order to provide an adequate trade-off between overlap duration and time/frequency resolution, and to maintain the benefits of critical sampling through all coding modes.
Convention Paper 7712
P11-5 Low Bit-Rate Audio Coding in Multichannel Digital Wireless Microphone Systems—Stephen Wray, APT Licensing Ltd. - Belfast, Northern Ireland, UK
Despite advances in voice and data communications in other domains, sound production for live events (concerts, theater, conferences, sports, worship, etc.) still largely depends on spectrum-inefficient forms of analog wireless microphone technology. In these live scenarios, low-latency transmission of high-quality audio is mission critical. However, while demand increases for wireless audio channels (for microphones, in-ear monitoring, and talkback systems), some of the radio bands available for “Program Making and Special Events” are to be re-assigned for new wireless mobile telephony and Internet connectivity services: the FCC recently decided to permit so-called White Space Devices to operate in sections of UHF spectrum previously reserved for shared use by analog TV and wireless microphones. This paper examines the key performance aspects of low bit-rate audio codecs for the next generation of bandwidth-efficient digital wireless microphone systems that meet the future needs of live events.
Convention Paper 7714
P11-6 Krasner’s Audio Coder Revisited—Jamie Angus, Chris Ball, Thomas Peeters, Rowan Williams, University of Salford - Salford, Greater Manchester, UK
An audio compression encoder and decoder system based on Krasner’s work was implemented. An improved Quadrature Mirror Filter tree, which more closely approximates modern critical band measurements, splits the input signal into sub bands that are encoded using both adaptive quantization and entropy coding. The uniform adaptive quantization scheme developed by Jayant was implemented and enhanced through the addition of non-uniform quantization steps and look ahead. The complete codecs are evaluated using the perceptual audio evaluation algorithm PEAQ and their performance compared to equivalent MPEG-1 Layer III files. Initial, limited, tests reveal that the proposed codecs score Objective Difference Grades close to or even better than MPEG-1 Layer III files encoded at a similar bit rate.
Convention Paper 7715
P11-7 Inter-Channel Prediction to Prevent Unmasking of Quantization Noise in Beamforming—Mauri Väänänen, Nokia Research Center - Tampere, Finland
This paper studies the use of inter-channel prediction for the purpose of preventing or reducing the risk of noise unmasking when beamforming type of processing is applied to quantized microphone array signals. The envisaged application is the re-use and postprocessing of user-created content. Simulations with an AAC coder and real-world recordings using two microphones are performed to study the suitability of two existing coding tools for this purpose: M/S stereo coding and the AAC Long Term Predictor (LTP) tool adapted for inter-channel prediction. The results indicate that LTP adapted for inter-channel prediction often gives more coding gain than mere M/S stereo coding, both in terms of signal-to-noise ratio and perceptual entropy.
Convention Paper 7716
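The two ideas compared in the P11-7 abstract can be sketched generically in Python. This is not the AAC M/S tool or the AAC Long Term Predictor, and it is not the author's adaptation: the function names and the one-tap least-squares predictor with a fixed integer lag are assumptions chosen only to show the principle of predicting one microphone channel from the other.

```python
def ms_encode(left, right):
    """Mid/side transform: mid = (L + R) / 2, side = (L - R) / 2."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Inverse mid/side transform: L = mid + side, R = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def predict_from_reference(reference, target, lag):
    """One-tap least-squares prediction of target[n] from reference[n - lag].

    Hypothetical helper: assumes 0 <= lag < len(target) and equal-length
    channels. Returns the prediction gain factor and the residual signal.
    """
    delayed = [0.0] * lag + list(reference[:len(target) - lag])
    energy = sum(d * d for d in delayed)
    gain = sum(t * d for t, d in zip(target, delayed)) / energy if energy > 0 else 0.0
    residual = [t - gain * d for t, d in zip(target, delayed)]
    return gain, residual

if __name__ == "__main__":
    # Hypothetical two-microphone signal: the second channel is a delayed,
    # attenuated copy of the first, which M/S alone cannot fully exploit.
    left = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0]
    right = [0.0, 0.5, -1.0, 1.5, 0.25, -0.75]
    gain, residual = predict_from_reference(left, right, lag=1)
    print(gain, residual)
```

When the channels differ mainly by a delay and a gain, as in this toy signal, the prediction residual carries far less energy than the side signal of an M/S transform, which is the intuition behind the coding-gain comparison in the abstract.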
Friday, May 8, 13:30 — 17:30
TT5 - Stadtmuseum Musikinstrumente
Museum of the City of Munich Instruments
The extraordinary collection of the Sammlung Musik-Münchner Stadtmuseum presents exhibits highlighting the construction of musical instruments from different cultures as well as a wide survey of the musical activities of mankind. On show are about 1,500 musical instruments from Africa, Asia, the precolonial Americas, and Europe out of a total of 6,000 objects. During the guided tour of the collections, visitors have the opportunity to play the complete gamelans from the Indonesian islands of Java and Bali.
Price: EUR 20
Friday, May 8, 14:00 — 18:30
P13 - Spatial Audio and Spatial Perception
Chair: Tapio Lokki
P13-1 Evaluation of Equalization Methods for Binaural Signals—Zora Schärer, Alexander Lindau, TU Berlin - Berlin, Germany
The most demanding test criterion for the quality of binaural simulations of acoustical environments is whether they can be perceptually distinguished from a real sound field or not. If the simulation provides a natural interaction and sufficient spatial resolution, differences are predominantly perceived in terms of spectral distortions due to a non-perfect equalization of the transfer functions of the recording and reproduction systems (dummy head microphones, headphones). In order to evaluate different compensation methods, several headphone transfer functions were measured on a dummy head. Based upon these measurements, the performance of different inverse filtering techniques re-implemented from literature was evaluated using auditory measures for spectral differences. Additionally, an ABC/HR listening test was conducted, using two different headphones and two different audio stimuli (pink noise, acoustical guitar). In the listening test, a real loudspeaker was directly compared to a binaural simulation with high spatial resolution, which was compensated using seven different equalization methods.
Convention Paper 7721 (Purchase now)
P13-2 Crosstalk Cancellation between Phantom Sources—Florian Völk, Thomas Musialik, Hugo Fastl, Technical University of München - München, Germany
This paper presents an approach using phantom sources (resulting from the so-called summing localization of two loudspeakers) as sources for crosstalk cancellation (CTC). The phantom sources can be rotated synchronously with the listener’s head, thus demanding significantly less processing power than traditional approaches using fixed CTC loudspeakers, as an online re-computation of the CTC filters is (under certain circumstances) not necessary. First results of localization experiments show the general applicability of this procedure.
Convention Paper 7722 (Purchase now)
P13-3 Preliminary Evaluation of Sweet Spot Size in Virtual Sound Reproduction Using Dipoles—Yesenia Lacouture Parodi, Per Rubak, Aalborg University - Aalborg, Denmark
In a previous study, three crosstalk cancellation techniques were evaluated and compared under different conditions. Least square approximations in frequency and time domain were evaluated along with a method based on minimum-phase approximation and a frequency independent delay. In general, the least square methods outperformed the method based on minimum-phase approximation. However, the evaluation was only done for the best-case scenario, where the transfer functions used to design the filters correspond to the listener’s transfer functions and his/her location and orientation relative to the loudspeakers. In this paper we present a follow-up evaluation of the performance of the three inversion techniques when these conditions are violated. A setup to measure the sweet spot of different loudspeaker arrangements is described. Preliminary measurement results are presented for loudspeakers placed at the horizontal plane and an elevated position, where a typical 60-degree stereo setup is compared with two closely spaced loudspeakers. Additionally, two- and four-channel arrangements are evaluated.
Convention Paper 7723 (Purchase now)
P13-4 The Importance of the Direct to Reverberant Ratio in the Perception of Distance, Localization, Clarity, and Envelopment—David Griesinger, Consultant - Cambridge, MA, USA
The Direct to Reverberant ratio (D/R)—the ratio of the energy in the first wave front to the reflected sound energy—is absent from most discussions of room acoustics. Yet only the direct sound (DS) provides information about the localization and distance of a sound source. This paper discusses how the perception of DS in a reverberant field depends on the D/R and the time delay between the DS and the reverberant energy. Threshold data for DS perception will be presented, and the implications for listening rooms, hall design, and electronic enhancement will be discussed. We find that both clarity and envelopment depend on DS detection. In listening rooms the direct sound must be at least equal to the total reflected energy for accurate imaging. As the room becomes larger (and the time delay increases) the threshold goes down. Some conclusions: typical listening rooms benefit from directional loudspeakers, small concert halls should not have a shoe-box shape, early reflections need not be lateral, and electroacoustic enhancement of late reverberation may be vital in small halls.
Convention Paper 7724 (Purchase now)
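To make the D/R definition in P13-4 concrete, the sketch below estimates the ratio in decibels from a measured room impulse response by splitting it shortly after the strongest peak. The function name and the 2.5 ms direct-sound window are assumptions chosen for illustration; the paper does not specify how the split is made.

```python
import math

# Illustrative sketch only: the window length and names are assumptions.

def direct_to_reverberant_db(ir, sample_rate, direct_window_ms=2.5):
    """Return 10*log10(direct energy / reverberant energy) for an impulse response.

    Everything up to `direct_window_ms` after the largest peak counts as
    direct sound; the remainder counts as reflected energy.
    """
    peak = max(range(len(ir)), key=lambda i: abs(ir[i]))
    split = peak + int(direct_window_ms * sample_rate / 1000) + 1
    direct = sum(x * x for x in ir[:split])
    reverb = sum(x * x for x in ir[split:])
    if reverb == 0:
        return math.inf  # no reflected energy at all
    return 10 * math.log10(direct / reverb)
```

A result of 0 dB corresponds to the condition stated in the abstract, where the direct sound equals the total reflected energy.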
P13-5 Frequency-Domain Interpolation of Empirical HRTF Data—Brian Carty, Victor Lazzarini, National University of Ireland - Maynooth, Ireland
This paper discusses Head Related Transfer Function (HRTF)-based artificial spatialization of audio. Two alternatives to the minimum phase method of HRTF interpolation are suggested, offering novel approaches to the challenge of phase interpolation. A phase truncation, magnitude interpolation technique aims to avoid complex preparation, manipulation or transformation of empirical HRTF data, and any inaccuracies that may be introduced by these operations. A second technique adds low frequency nonlinear frequency scaling to a functionally based phase model. This approach aims to provide a low frequency spectrum more closely aligned to the empirical HRTF data. Test results indicate favorable performance of the new techniques.
Convention Paper 7725 (Purchase now)
P13-6 Analysis and Implementation of a Stereophonic Play Back System for Adjusting the “Sweet Spot” to the Listener’s Position—Sebastian Merchel, Stephan Groth, Dresden University of Technology - Dresden, Germany
This paper focuses on a stereophonic play back system designed to adjust the “sweet spot” to the listener’s position. The system includes an optical face tracker that provides information about the listener’s x-y position. Accordingly, the loudspeaker signals are manipulated in real-time in order to move the “sweet spot.” The stereophonic perception with an adjusted “sweet spot” is theoretically investigated on the basis of several models of binaural hearing. The results indicate that an adjustment of signals corresponding to the center of the listener’s head does improve the localization over the whole listening area. Some localization error remains due to asymmetric signal paths for off-center listening positions, but it can be estimated and compensated for.
Convention Paper 7726 (Purchase now)
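One simple way to picture the signal manipulation described in P13-6 is a per-loudspeaker delay and gain that equalizes arrival time and level at the tracked head position. The sketch below assumes point-source loudspeakers, a 1/r level law, a speed of sound of 343 m/s, and hypothetical names; it is not the authors' implementation.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def sweet_spot_compensation(listener, left_spk, right_spk):
    """Return ((delay_left, delay_right), (gain_left, gain_right)).

    Positions are (x, y) tuples in meters. The nearer loudspeaker is delayed
    and attenuated so both signals arrive together and equally loud.
    """
    d_left = math.dist(listener, left_spk)
    d_right = math.dist(listener, right_spk)
    d_max = max(d_left, d_right)
    delays = tuple((d_max - d) / SPEED_OF_SOUND for d in (d_left, d_right))
    gains = tuple(d / d_max for d in (d_left, d_right))  # 1/r law assumption
    return delays, gains
```

For a listener on the center line both delays are zero and both gains are one; moving toward one loudspeaker delays and attenuates that loudspeaker only.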
P13-7 Issues on Dummy-Head HRTFs Measurements—Daniela Toledo, Henrik Møller, Aalborg University - Aalborg, Denmark
The dimensions of a person are small compared to the wavelength at low frequencies. Therefore, at these frequencies head-related transfer functions (HRTFs) should decrease asymptotically until they reach 0 dB (i.e., unity gain) at DC. This is not the case in measured HRTFs: the limitations of the equipment used result in a wrong (and random) value at DC, and the effect can be seen well within the audio frequencies. We have measured HRTFs on a commercially available dummy head, the Neumann KU-100, and analyzed issues associated with calibration, DC correction, and low-frequency response. Informal listening tests suggest that the ripples seen in HRTFs with a wrong DC value affect the sound quality in binaural synthesis.
Convention Paper 7727 (Purchase now)
P13-8 Binaural Processing Algorithms: Importance of Clustering Analysis for Preference Tests—Andreas Silzle, Bernhard Neugebauer, Sunish George, Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The acceptability of a newly proposed technology for commercial application is often assumed if the sound quality reached in a listening test surpasses a certain target threshold. As an example, it is a well-established procedure for decisions on the deployment of audio codecs to run a listening test comparing the coded/decoded signal with the uncoded reference signal. For other technologies, e.g., upmix or binaural processing, however, the unprocessed signal can act only as a "comparison signal." Here, the goal is to achieve a significant preference of the processed over the comparison signal. For such preference listening tests, we underline the importance of clustering the test results to obtain additional valuable information, as opposed to using standard statistical metrics like mean and confidence interval. This approach allows determining the size of the user group that would significantly prefer to use the proposed algorithm if it were available in a consumer device. As an example, listening test data for binaural processing algorithms are analyzed in this investigation.
Convention Paper 7728 (Purchase now)
P13-9 Perception of Head-Position-Dependent Variations in Interaural Cross-Correlation Coefficient—Russell Mason, Chungeun Kim, Tim Brookes, University of Surrey - Guildford, Surrey, UK
Experiments were undertaken to elicit the perceived effects of head-position-dependent variations in the interaural cross-correlation coefficient of a range of signals. A graphical elicitation experiment showed that the variations in the IACC strongly affected the perceived width and depth of the reverberant environment, as well as the perceived width and distance of the source. A verbal experiment gave similar results and also indicated that the head-position-dependent IACC variations caused changes in the perceived spaciousness and envelopment of the stimuli.
Convention Paper 7729 (Purchase now)
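The interaural cross-correlation coefficient studied in P13-9 is commonly computed as the maximum absolute value of the normalized cross-correlation between the two ear signals over lags of about ±1 ms. The sketch below follows that common definition; the function name and the direct (non-FFT) evaluation are choices made for illustration, and the paper's own measurement procedure may differ.

```python
import math

# Illustrative sketch only: follows the common IACC definition, names assumed.

def iacc(left, right, sample_rate, max_lag_ms=1.0):
    """Maximum absolute normalized cross-correlation within +/- max_lag_ms."""
    n = min(len(left), len(right))
    max_lag = int(max_lag_ms * sample_rate / 1000)
    norm = math.sqrt(sum(x * x for x in left[:n]) * sum(y * y for y in right[:n]))
    if norm == 0:
        return 0.0  # at least one channel is silent
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        best = max(best, abs(acc) / norm)
    return best
```

Identical ear signals give a value of 1, while uncorrelated signals give values near 0, which is typically associated with a wider, more enveloping impression.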
Friday, May 8, 16:30 — 18:30
P14 - Multichannel Coding
Chair: Ville Pulkki
P14-1 Further EBU Tests of Multichannel Audio Codecs—David Marston, BBC R&D - Tadworth, Surrey, UK; Franc Kozamernik, EBU - Geneva, Switzerland; Gerhard Stoll, Gerhard Spikofski, IRT - Munich, Germany
The European Broadcasting Union technical group D/MAE has been assessing the quality of multichannel audio codecs in a series of subjective tests. The two most recent tests and results are described in this paper. The first set of tests covered 5.1 multichannel audio emission codecs at a range of bit-rates from 128 kbit/s to 448 kbit/s. The second set of tests covered cascaded contribution codecs, followed by the most prominent emission codecs. Codecs under test include offerings from Dolby, DTS, MPEG, Apt, and Linear Acoustics. The conclusions observe that while high quality is achievable at lower bit-rates, there are still precautions to be aware of. The results from cascading of codecs have shown that the emission codec is usually the bottleneck of quality.
Convention Paper 7730 (Purchase now)
P14-2 Spatial Parameter Decision by Least Squared Error in Parametric Stereo Coding and MPEG Surround—Chi-Min Liu, Han-Wen Hsu, Yung-Hsuan Kao, Wen-Chieh Lee, National Chiao Tung University - Hsinchu, Taiwan
Parametric stereo coding (PS) and MPEG Surround (MPS) are used to reconstruct stereo or multichannel signals from down-mixed signals with a few spatial parameters. For extracting spatial parameters, the first issue is to decide on a time-frequency (T-F) tiling, which controls the resolution of the reconstructed spatial scene and largely determines the number of bits consumed. In addition, according to the standard syntax, the up-mixing matrices for time slots not on time borders are reconstructed by interpolation in the decoder. Therefore, the second issue is to decide the transmitted parameter values on the time borders so as to minimize the reconstruction error of the matrices. For both PS and MPS, based on the criterion of least squared error, this paper proposes a generic dynamic programming method for resolving both issues under the tradeoff between audio quality and limited bits.
Convention Paper 7731 (Purchase now)
P14-3 The Potential of High Performance Computing in Audio Engineering—David Moore, Jonathan Wakefield, University of Huddersfield - West Yorkshire, UK
High Performance Computing (HPC) resources are fast becoming more readily available. HPC hardware now exists for use in conjunction with standard desktop computers. This paper looks at what impact this could have on the audio engineering industry. Several potential applications of HPC within audio engineering research are discussed. A case study is also presented that highlights the benefits of using the Single Instruction, Multiple Data (SIMD) architecture when employing a search algorithm to produce surround sound decoders for the standard 5-speaker surround sound layout.
Convention Paper 7732 (Purchase now)
P14-4 Efficient Methods for High Quality Merging of Spatial Audio Streams in Directional Audio Coding—Giovanni Del Galdo, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Ville Pulkki, Helsinki University of Technology - Espoo, Finland; Fabian Kuech, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Mikko-Ville Laitinen, Helsinki University of Technology - Espoo, Finland; Richard Schultz-Amling, Markus Kallinger, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Directional Audio Coding (DirAC) is an efficient technique to capture and reproduce spatial sound. The analysis step outputs a mono DirAC stream, comprising an omnidirectional microphone pressure signal and side information, i.e., direction of arrival and diffuseness of the sound field expressed in the time-frequency domain. This paper proposes efficient methods to merge multiple mono DirAC streams to allow a joint playback at the reproduction side. The problem of merging two or more streams arises in applications such as immersive spatial audio teleconferencing, virtual reality, and online gaming. Compared to a trivial direct merging of the decoder outputs, the proposed methods are more efficient as they do not require the synthesis step. A further benefit is that the loudspeaker setup at the reproduction side does not have to be known in advance. Simulations and listening tests confirm the absence of any artifacts and that the proposed methods are practically indistinguishable from the ideal merging.
Convention Paper 7733 (Purchase now)
Friday, May 8, 19:30 — 21:30
Isar Brau, Munchen Pullach
This year the Banquet will take place in a small old railway station, above the valley of the River Isar. The railway opened in 1891 and steam trains took people from the city to many beautiful places in the south of Munich. Today the steam trains have been replaced and the line is now part of the S-Bahn, so the old station is not needed anymore and has been turned into a traditional Bavarian style restaurant with its own micro-brewery. What could be more natural than making this location a pleasant place for a “get together” in a lovely atmosphere?
The welcome beer from the micro brewery and other drinks will be followed by a fine buffet with Bavarian delicacies. At the end of a long day at the Convention, these “Schmankerl” will be a good way to relax and enjoy the evening with old and new friends and colleagues. Come and savour Munich’s lifestyle. The ticket price includes all food and drinks and the bus to the restaurant and back.
55 Euros for AES members; 65 Euros for nonmembers
Tickets will be available at the Special Events desk.
Saturday, May 9, 13:00 — 14:00
Saturday, May 9, 15:30 — 17:00
The Career Fair will feature several companies from the exhibit floor. All attendees of the convention, students and professionals alike, are welcome to come talk with representatives from the companies and find out more about job and internship opportunities in the audio industry. Bring your resume!
Saturday, May 9, 18:30 — 19:00
The band featured in the Live Sound Workshop LS2, Rauschenberger, will continue to play after the workshop finishes, in a concert open to all attendees.
The band "Rauschenberger" is a new, up-and-coming group from Hannover built around singer and leader Rauschenberger, who has a splendid and very characteristic voice.
Sunday, May 10, 09:00 — 12:30
P25 - Sound Design and Processing
Chair: Michael Hlatky
P25-1 Hierarchical Perceptual Mixing—Alexandros Tsilfidis, Charalambos Papadakos, John Mourjopoulos, University of Patras - Patras, Greece
A novel technique of perceptually-motivated signal-dependent audio mixing is presented. The proposed Hierarchical Perceptual Mixing (HPM) method is implemented in the spectro-temporal domain; its principle is to combine only the perceptually relevant components of the audio signals, derived after the calculation of the minimum masking threshold, which is introduced in the mixing stage. Objective measures are presented indicating that the resulting signals have enhanced dynamic range and lower crest factor with no unwanted artifacts, compared to the traditionally mixed signals. The overall headroom is improved, while clarity and tonal balance are preserved.
Convention Paper 7789 (Purchase now)
P25-2 Source-Filter Modeling in Sinusoid Domain—Wen Xue, Mark Sandler, Queen Mary, University of London - London, UK
This paper presents the theory and implementation of source-filter modeling in the sinusoid domain and its applications to timbre processing. The technique decomposes the instantaneous amplitude in a sinusoid model into a source part and a filter part, each capturing a different aspect of the timbral property. We show that sinusoid-domain source-filter modeling is approximately equivalent to its time- or frequency-domain counterparts. Two methods are proposed for the evaluation of the source and filter: a least-square method based on the assumption of slow variation of source and filter in time, and a filter bank method that models the global spectral envelope in the filter. Tests show the effectiveness of the algorithms for isolating frequency-driven amplitude variations. Example applications are given to demonstrate the use of the technique for timbre processing.
Convention Paper 7790 (Purchase now)
P25-3 Analysis of a Modified Boss DS-1 Distortion Pedal—Matthew Schneiderman, Mark Sarisky, University of Texas at Austin - Austin, TX, USA
Guitar players are increasingly modifying (or paying someone else to modify) inexpensive mass-produced guitar pedals into boutique units. The Keeley modification of the Boss DS-1 is a prime example. In this paper we compare the measured and perceived performance of a Boss DS-1 before and after applying the Keeley All-Seeing-Eye and Ultra mods. This paper sheds light on psychoacoustics, signal processing, and guitar recording techniques in relation to low fidelity guitar distortion pedals.
Convention Paper 7791 (Purchase now)
P25-4 Phase and Amplitude Distortion Methods for Digital Synthesis of Classic Analog Waveforms—Joseph Timoney, Victor Lazzarini, Brian Carty, NUI Maynooth - Maynooth, Ireland; Jussi Pekonen, Helsinki University of Technology - Espoo, Finland
Essential components of digital emulations of subtractive synthesizer systems are the algorithms used to generate the classic oscillator waveforms of sawtooth, square, and triangle waves. Not only should these be perceived to be authentic sonically, but they should also exhibit minimal aliasing distortions and be computationally efficient to implement. This paper examines a set of novel techniques for the production of the classic oscillator waveforms of analog subtractive synthesis that are derived from using amplitude or phase distortion of a mono-component input waveform. Expressions for the outputs of these distortion methods are given that allow parameter control to ensure proper bandlimited behavior. Additionally, their implementation is demonstrably efficient. Last, the results presented illustrate their equivalence to their original analog counterparts.
Convention Paper 7792 (Purchase now)
P25-5 Soundscape Attribute Identification—Martin Ljungdahl Eriksson, Jan Berg, Luleå University of Technology - Luleå, Sweden
In soundscape research, the field’s methods can be employed in combination with approaches involving sound quality attributes in order to create a deeper understanding of sound images and soundscapes and how these may be described and designed. The integration of four methods is outlined, two from the soundscape domain and two from the sound engineering domain.
Convention Paper 7793 (Purchase now)
P25-6 SonoSketch: Querying Sound Effect Databases through Painting—Michael Battermann, Sebastian Heise, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany; Jörn Loviscach, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany
Numerous techniques support finding sounds that are acoustically similar to a given one. It is hard, however, to find a sound to start the similarity search with. Inspired by systems for image search that allow drawing the shape to be found, we address quick input for audio retrieval. In our system, the user literally sketches a sound effect, placing curved strokes on a canvas. Each of these represents one sound from a collection of basic sounds. The audio feedback is interactive, as is the continuous update of the list of retrieval results. The retrieval is based on symbol sequences formed from MFCC data compared with the help of a neural net using an editing distance to allow small temporal changes.
Convention Paper 7794 (Purchase now)
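The retrieval step in P25-6 compares symbol sequences using an editing distance. The sketch below shows a plain Levenshtein distance between two symbol sequences as a minimal example of that general idea; it leaves out the neural net and the MFCC-to-symbol mapping described in the abstract, and the function name is an assumption.

```python
# Illustrative sketch only: a textbook Levenshtein distance, not the paper's system.

def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]
```

Because insertions and deletions are allowed, two sequences that differ only by a slightly stretched or shortened segment remain close, which is the tolerance to small temporal changes the abstract refers to.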
P25-7 Generic Sound Effects to Aid in Audio Retrieval—David Black, Sebastian Heise, Hochschule Bremen (University of Applied Sciences) - Bremen, Germany; Jörn Loviscach, Fachhochschule Bielefeld (University of Applied Sciences) - Bielefeld, Germany
Sound design applications are often hampered because the sound engineer must either produce new sounds using physical objects, or search through a database of sounds to find a suitable sample. We created a set of basic sounds to mimic these physical sound-producing objects, leveraging the mind's onomatopoetic clustering capabilities. These sounds, grouped into onomatopoetic categories, aid the sound designer in music information retrieval (MIR) and sound categorization applications. Initial testing regarding the grouping of individual sounds into groups based on similarity has shown that participants tended to group certain sounds together, often reflecting the groupings our team constructed.
Convention Paper 7795 (Purchase now)
Sunday, May 10, 13:30 — 15:00
P29 - Signal Analysis, Measurements, Restoration
P29-1 Evaluation and Comparison of Audio Chroma Feature Extraction Methods—Michael Stein, Benjamin M. Schubert, Ilmenau University of Technology - Ilmenau, Germany; Matthias Gruhne, Gabriel Gatzsche, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Markus Mehnert, Ilmenau University of Technology - Ilmenau, Germany
This paper analyzes and compares different methods for digital audio chroma feature extraction. The chroma feature is a descriptor, which represents the tonal content of a musical audio signal in a condensed form. Therefore chroma features can be considered as an important prerequisite for high-level semantic analysis, like chord recognition or harmonic similarity estimation. A better quality of the extracted chroma feature enables much better results in these high-level tasks. In order to discover the quality of chroma features, seven different state-of-the-art chroma feature extraction methods have been implemented. Based on an audio database, containing 55 variations of triads, the output of these algorithms is critically evaluated. The best results were obtained with the Enhanced Pitch Class Profile.
Convention Paper 7814 (Purchase now)
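As background for P29-1, the sketch below shows the core of a very basic chroma computation: each spectral peak is mapped to the nearest equal-tempered pitch and its magnitude is accumulated into one of twelve pitch classes. The input format (a list of frequency/magnitude pairs), the 440 Hz reference, and the function name are assumptions; none of the seven methods compared in the paper is reproduced here.

```python
import math

# Illustrative sketch only: a basic pitch-class folding, names assumed.

def chroma_from_peaks(peaks, ref_a4=440.0):
    """Fold (frequency_hz, magnitude) peaks into a normalized 12-bin chroma vector.

    Bin 0 is C, bin 9 is A.
    """
    chroma = [0.0] * 12
    for freq, mag in peaks:
        if freq <= 0:
            continue  # skip DC and invalid entries
        semitones_from_a4 = round(12 * math.log2(freq / ref_a4))
        pitch_class = (semitones_from_a4 + 9) % 12
        chroma[pitch_class] += mag
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma
```

For a C major triad (C4, E4, G4), the energy lands in bins 0, 4, and 7, which is the kind of condensed tonal description that chord recognition builds on.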
P29-2 Measuring Transient Structure-Borne Sound in Musical Instruments—Proposal and First Results from a Laser Intensity Measurement Setup—Robert Mores, Hamburg University of Applied Sciences - Hamburg, Germany; Marcel thor Straten, Consultant - Seevetal, Germany; Andreas Selk, Consultant - Hamburg, Germany
The proposal for this new measurement setup is motivated by curiosity about transients propagating across arched tops of violins. Understanding the impact of edge construction on transient wave reflection back to the top of a violin or on conduction into the rib requires single-shot recordings, possibly without statistical processing. Signal-to-noise ratio should be high although mechanical amplitudes at distinct locations on the structure surface are in the range of a few micrometers only. In the proposed setup, the intensity of a laser beam is directly measured after passing a screen attached to the device under test. The signal-to-noise ratio achieved for one micrometer transients in single-shot recordings is significantly more than 60 dB.
Convention Paper 7815 (Purchase now)
P29-3 Evaluating Ground Truth for ADRess as a Preprocess for Automatic Musical Instrument Identification—Joseph McKay, Mikel Gainza, Dan Barry, Dublin Institute of Technology - Dublin, Ireland
Most research in musical instrument identification has focused on labeling isolated samples or solo phrases. A robust instrument identification system capable of dealing with polytimbral recordings of instruments remains a necessity in music information retrieval. Experiments are described that evaluate the ground truth of ADRess as a sound source separation technique used as a preprocess to automatic musical instrument identification. The ground truth experiments are based on a number of basic acoustic features, while using a Gaussian Mixture Model as the classification algorithm. Using all 44 acoustic feature dimensions, successful identification rates are achieved.
Convention Paper 7816 (Purchase now)
P29-4 Improving Rhythmic Pattern Features Based on Logarithmic Preprocessing—Matthias Gruhne, Christian Dittmar, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany
In the area of Music Information Retrieval, the rhythmic analysis of music plays an important role. In order to derive rhythmic information from music signals, several feature extraction algorithms have been described in the literature. Most of them extract the rhythmic information by auto-correlating the temporal envelope derived from different frequency bands of the music signal. Using the auto-correlated envelopes directly as an audio-feature is afflicted with the disadvantage of tempo dependency. To circumvent this problem, further postprocessing via higher-order statistics has been proposed. However, the resulting statistical features are still tempo dependent to a certain extent. This paper describes a novel method, which logarithmizes the lag-axis of the auto-correlated envelope and discards the tempo-dependent part. This approach leads to tempo-invariant rhythmic features. A quantitative comparison of the original methods versus the proposed procedure is described and discussed in this paper.
Convention Paper 7817 (Purchase now)
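The key observation behind P29-4 is that a tempo change scales the lag axis of the autocorrelation, and on a logarithmic lag axis a scaling becomes a simple shift. The sketch below computes an autocorrelation and resamples it onto a log-spaced lag axis with linear interpolation. Function names, the interpolation, and the bin layout are assumptions, and the paper's step of discarding the tempo-dependent part is not shown.

```python
# Illustrative sketch only: names, interpolation, and bin layout are assumptions.

def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation for lags 0..max_lag."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def log_lag_resample(acf, min_lag, bins_per_octave, num_bins):
    """Sample the autocorrelation at lags min_lag * 2**(b / bins_per_octave)."""
    out = []
    for b in range(num_bins):
        lag = min_lag * 2 ** (b / bins_per_octave)
        lo = int(lag)
        if lo + 1 >= len(acf):
            break  # ran past the available lags
        frac = lag - lo
        out.append((1 - frac) * acf[lo] + frac * acf[lo + 1])
    return out


# Two envelope pulse trains, the second at half the tempo of the first.
fast = [1.0 if i % 4 == 0 else 0.0 for i in range(64)]
slow = [1.0 if i % 8 == 0 else 0.0 for i in range(64)]
log_fast = log_lag_resample(autocorrelation(fast, 32), 2, 1, 5)
log_slow = log_lag_resample(autocorrelation(slow, 32), 2, 1, 5)
```

In this toy case the first nonzero bin moves by exactly one octave bin when the tempo is halved, so the rhythmic pattern keeps its shape and only its position on the log-lag axis changes.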
P29-5 Further Developments of Parameterization Methods of Audio Stream Analysis for Security Purposes—Pawel Zwan, Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
The paper presents an automatic sound recognition algorithm intended for application in an audiovisual security monitoring system. The distributed character of security systems does not allow for simultaneous observation of multiple multimedia streams; thus an automatic recognition algorithm must be introduced. In the paper a module for the parameterization and automatic detection of audio events is described. Spectral analysis of the sounds of a broken window, a gunshot, and a scream is performed, and parameterization methods are proposed and discussed. Moreover, a sound classification system based on the Support Vector Machines (SVM) algorithm is presented and its accuracy is discussed. The practical application of the system with the use of a monitoring station is shown. The plan for further experiments is presented and conclusions are drawn.
Convention Paper 7818 (Purchase now)
P29-6 Estimating Instrument Spectral Envelopes for Polyphonic Music Transcription in a Music Scene-Adaptive Approach—Julio J. Carabias-Orti, Pedro Vera-Candeas, Nicolas Ruiz-Reyes, Francisco J. Cañadas-Quesada, Pablo Cabañas-Molero, University of Jaén - Linares, Spain
We propose a method for estimating the spectral envelope pattern of musical instruments in a musical scene-adaptive scheme, without having any prior knowledge about the real transcription. A musical note is defined as stable when variations between its harmonic amplitudes are held constant during a certain period of time. A density-based clustering algorithm is used with the stable notes in order to separate different envelope models for each note. Music scene-adaptive envelope patterns are finally obtained from similarity and continuity of the different note models. Our approach has been tested in a polyphonic music transcription scheme with synthesized and real music recordings obtaining very promising results.
Convention Paper 7819 (Purchase now)
Sunday, May 10, 14:30 — 16:30
W22 - MPEG SAOC: Interactive Audio and Broadcasting, Music 2.0, Next Generation Telecommunication
Oliver Hellmuth, Fraunhofer IIS - Erlangen, Germany
Jonas Engdegård, Dolby - Stockholm, Sweden
Christof Faller, Illusonic LLC - Lausanne, Switzerland
Jürgen Herre, Fraunhofer IIS - Erlangen, Germany
Werner Oomen, Philips Applied Technologies - Eindhoven, The Netherlands
Recently the ISO/MPEG standardization group launched an activity for bit rate-efficient and backward compatible coding of multiple sound objects that heavily exploits the human perception of spatial sound. On the receiving side, such a "Spatial Audio Object Coding" (SAOC) system renders the transmitted objects interactively into a sound scene on any desired reproduction setup. Based on the SAOC technology, elegant solutions for Interactive Audio and Broadcasting, Music 2.0, and Next Generation Telecommunication become feasible. The workshop reviews the ideas and principles behind Spatial Audio Object Coding, especially highlighting its possibilities and benefits for those new types of applications. Additionally, the potential of SAOC is illustrated by means of real-time demonstrations.