Authors:Alary, Benoit; Politis, Archontis; Schlecht, Sebastian; Välimäki, Vesa
Affiliation:Acoustics Lab, Dept. of Signal Processing and Acoustics, Aalto University, Espoo, Finland;Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland;International Audio Laboratories Erlangen, Erlangen, Germany;Acoustics Lab, Dept. of Signal Processing and Acoustics, Aalto University, Espoo, Finland
Artificial reverberation algorithms are used to enhance dry audio signals. Delay-based reverberators can produce a realistic effect at a reasonable computational cost. While the recent popularity of spatial audio algorithms is mainly related to the reproduction of the perceived direction of sound sources, there is also a need to spatialize the reverberant sound field. Usually multichannel reverberation algorithms output a series of decorrelated signals yielding an isotropic energy decay. This means that the reverberation time is uniform in all directions. However, the acoustics of physical spaces can exhibit more complex direction-dependent characteristics. This paper proposes a new method to control the directional distribution of energy over time, within a delay-based reverberator, capable of producing a directional impulse response with anisotropic energy decay. The discussion explores a method using multichannel delay lines in conjunction with a direction-dependent transform in the spherical harmonic domain to control the direction-dependent decay of the late reverberation. The new reverberator extends the feedback delay network, retaining its time-frequency domain characteristics. The proposed directional feedback delay network reverberator can produce nonuniform direction-dependent decay time, suitable for anisotropic decay reproduction on a loudspeaker array or in binaural playback through the use of ambisonics.
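As a point of reference for the abstract above, the sketch below shows a conventional (isotropic) feedback delay network, the structure the paper extends. It is not the proposed directional method: the delay lengths, the Hadamard feedback matrix, the sample rate, and the name `fdn_impulse_response` are all assumptions chosen for illustration. Each line's gain uses the usual relation g = 10^(-3m / (T60 · fs)) so that every delay line decays by 60 dB in `t60` seconds.

```python
def fdn_impulse_response(num_samples, delays=(149, 211, 263, 293), t60=0.5, fs=8000):
    """Impulse response of a 4-line feedback delay network (illustrative values only)."""
    n = len(delays)
    # Orthogonal feedback matrix: 4x4 Hadamard scaled by 1/2 (an assumed choice).
    hadamard = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
    feedback = [[0.5 * v for v in row] for row in hadamard]
    # Per-line attenuation giving a 60 dB decay after t60 seconds.
    gains = [10.0 ** (-3.0 * m / (t60 * fs)) for m in delays]
    buffers = [[0.0] * m for m in delays]  # circular delay lines
    pos = [0] * n
    out = []
    for t in range(num_samples):
        x = 1.0 if t == 0 else 0.0  # unit impulse input
        # Read the attenuated output of every delay line.
        s = [gains[i] * buffers[i][pos[i]] for i in range(n)]
        out.append(sum(s))
        # Mix the outputs through the feedback matrix and write them back.
        for i in range(n):
            fb = sum(feedback[i][j] * s[j] for j in range(n))
            buffers[i][pos[i]] = x + fb
            pos[i] = (pos[i] + 1) % delays[i]
    return out

ir = fdn_impulse_response(4000)
early_energy = sum(v * v for v in ir[:2000])
late_energy = sum(v * v for v in ir[2000:])
```

Because the same decay time applies to every delay line, this baseline decays uniformly; a direction-dependent decay, as described in the abstract, would need the additional spherical-harmonic-domain processing that is not attempted here.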
Authors:Horváth, Kristóf; Bank, Balázs
Affiliation:Dept. of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary;Dept. of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary
Infinite impulse response (IIR) filters are widely used in audio signal processing, but they are sensitive to numerical effects, especially when only fixed-point arithmetic is available. The numerical problems can be reduced by converting the filter to parallel second-order sections. This is not always sufficient in audio signal processing because a filter having a logarithmic pole distribution leads to poles near the unit circle, which generates unacceptable levels of numerical noise. This can be avoided by implementing these problematic sections with specialized filter structures. In this paper various second-order structures are systematically analyzed, including the common direct-form structures and the Gold & Rader, Kingsbury, Zölzer, and optimized warped IIR structures. The paper also proposes an extension to the Chamberlin state variable filter so that it can be used as a general IIR filter, and shows that this extended filter has the best noise performance among the tested structures for the problematic low pole frequencies.
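For readers unfamiliar with the structure named above, here is a minimal floating-point sketch of the basic Chamberlin state variable filter. It does not include the paper's extension to a general IIR filter, and it does not model fixed-point noise; the function name `chamberlin_svf` and the parameter values are assumptions for illustration.

```python
import math

def chamberlin_svf(samples, cutoff_hz, q_factor, sample_rate):
    """Basic Chamberlin state variable filter; returns (low, band, high) output lists."""
    f = 2.0 * math.sin(math.pi * cutoff_hz / sample_rate)  # frequency coefficient
    q = 1.0 / q_factor                                      # damping coefficient
    low = 0.0
    band = 0.0
    lows, bands, highs = [], [], []
    for x in samples:
        low = low + f * band        # lowpass state (first integrator)
        high = x - low - q * band   # highpass output
        band = band + f * high      # bandpass state (second integrator)
        lows.append(low)
        bands.append(band)
        highs.append(high)
    return lows, bands, highs

# Feed a unit step through a 100 Hz filter at a 48 kHz sample rate.
step = [1.0] * 5000
lows, bands, highs = chamberlin_svf(step, cutoff_hz=100.0, q_factor=0.707,
                                    sample_rate=48000.0)
```

With a step input, the lowpass output settles at the input level while the bandpass and highpass outputs settle at zero, which is a quick sanity check that the two integrators are wired correctly.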
Affiliation:Department of Computer Science, University of Milan, Via Celoria, Milan, Italy
Computational Auditory Scene Analysis (CASA) is typically achieved by using statistical models that have been trained offline on available data. Their performance relies heavily on the assumption that the process generating the data and the recording conditions are stationary over time. Nowadays, the focus of CASA is moving from structured, well-defined scenarios to unrestricted scenes with realistic characteristics where the stationarity assumption might not hold. Therefore, there is a high demand for methodologies and tools dealing with a series of problems tightly coupled with such nonstationary conditions, such as changes in the recording conditions, reverberant effects, etc. This paper formulates these obstacles under the concept drift framework and explores two fundamental adaptation approaches: active and passive. The overall aim is to learn online the statistical properties of the evolving data distribution and incorporate them into the recognition mechanism to boost its performance. The proposed CASA system encompasses a concept drift detector and an online adaptation module. The proposed framework was evaluated in the auditory analysis of three environments (office, meeting room, and lecture hall) with diverse characteristics (dimensions, reverberation times, etc.). The results are encouraging in terms of classification rate, false positive rate, false negative rate, and detection delay.
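The abstract does not say which drift detector the system uses. As a generic illustration of an active approach, the sketch below applies the Page-Hinkley test, a common change detector, to a stream of per-frame error indicators (0 = correct, 1 = misclassified). The class name, the `delta` and `threshold` values, and the synthetic stream are all assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated drift magnitude
        self.threshold = threshold  # alarm threshold
        self.count = 0
        self.mean = 0.0
        self.cumulative = 0.0
        self.minimum = 0.0

    def update(self, value):
        """Consume one observation; return True if drift is signalled."""
        self.count += 1
        self.mean += (value - self.mean) / self.count  # running mean
        self.cumulative += value - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        return self.cumulative - self.minimum > self.threshold

def first_alarm(stream):
    """Index of the first sample at which drift is signalled, or None."""
    detector = PageHinkley()
    for index, value in enumerate(stream):
        if detector.update(value):
            return index
    return None

# Synthetic error stream: the classifier is correct for 200 frames, then
# conditions change and every frame is misclassified.
errors = [0.0] * 200 + [1.0] * 200
alarm_index = first_alarm(errors)
```

In an active scheme, an alarm like this would trigger the adaptation step; the gap between the true change point (frame 200 here) and `alarm_index` corresponds to the detection delay mentioned in the abstract.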
Authors:Howie, Will; Martin, Denis; Kim, Sungyoung; Kamekawa, Toru; King, Richard
Affiliation:CBC/Radio-Canada, Vancouver, Canada;The Graduate Program in Sound Recording, McGill University, Montreal, Quebec, Canada;College of Engineering Technology, Rochester Institute of Technology, Rochester, NY, USA;Department of Musical Creativity and the Environment, Tokyo University of the Arts, Tokyo, Japan;The Graduate Program in Sound Recording, McGill University, Montreal, Quebec, Canada
Well-executed listening tests often require a great deal of time and resources for the creation or acquisition of appropriate stimuli, design of a testing interface, selection of subjects, and implementation of the experiment in an environment that is acoustically and technologically appropriate. Numerous previous studies have shown that using experienced, trained, or practiced subjects in listening tests reduces the number of subjects needed to gather consistent, meaningful data. This study examined the effect of audio production experience, musical training, technical ear training, age, and previous experience listening to 3D music recordings on listener performance within the context of 3D audio evaluation. The results showed: (a) audio production was the most valuable type of previous experience for predicting listener consistency in making preference or ranking judgments; (b) music training was also found to be a good predictor of subject consistency; (c) technical ear training and previous experience hearing 3D music recordings had no influence on listener consistency; and (d) subjects in their early to mid-30s appear to occupy an optimal age range in terms of ability to focus on the types of listening test tasks described in this study. Stimuli used in this study were limited to orchestral music.
Authors:Sampaio, Jose Fabrizio Pereira; Nascimento, Francisco Assis de Oliveira
Affiliation:Brazilian Federal Police, Brasilia, Brazil;University of Brasilia, Brasilia, Brazil
Forensic audio authentication techniques include an extensive toolbox for verifying recording originality and exposing forgery traces. The AMR (adaptive multirate) codec is a widespread standard audio format used to store speech in smartphones or digital recorders, making AMR audio an important source of audio to be authenticated. An original AMR audio file is supposed to be single compressed, while a tampered version should be double compressed. This makes AMR double-compression detection an interesting topic for multimedia forensics. In this paper a new method is proposed to detect AMR double compression using compressed-domain features based on linear prediction coefficients and a support vector machine (SVM). By using a robust scaling procedure, a detection accuracy of 98% was achieved with the TIMIT database, matching the performance of state-of-the-art methods. The feature extraction and computation are designed for the specific problem, representing AMR audio files by a more appropriate set of vectors. The main conclusion is that robust scaling is a key tool to increase the accuracy of the method. Compared to the previous experiments using compressed-domain features and min-max scaling, the method offers an increase of about 5% in average accuracy.
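The abstract does not define its robust scaling procedure. Assuming the common definition (subtract the median, divide by the interquartile range), the sketch below contrasts it with min-max scaling on a single made-up feature column containing one outlier. The function names and the data are hypothetical and are not taken from the paper.

```python
import statistics

def min_max_scale(values):
    """Map values linearly onto [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return [(v - median) / (q3 - q1) for v in values]

# Hypothetical feature column: nine typical values and one extreme outlier.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
mm = min_max_scale(feature)
rb = robust_scale(feature)
```

With min-max scaling, the single outlier squeezes the nine typical values into a narrow band near zero, whereas the median and interquartile range are barely affected by it, so the typical values keep a usable spread. This is one plausible reason a scale-sensitive classifier such as an SVM could benefit from robust scaling.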
Virtual or simulated environments are becoming an important tool in the tuning and evaluation of car audio systems. The main challenge seems to be generating convincing evidence that the results from such simulations are sufficiently similar to those from real cars to be usable as an alternative. There is evidence that engine harmonic cancellation can coexist with entertainment audio “under one roof,” provided that care is taken over the evaluation criteria. Finally, loudspeaker control systems can be combined with microphone systems in cars to create a fully self-contained test system.