Authors:Wei, Yi; Zeng, Yumin; Li, Chen
Affiliation:School of Physics and Technology, Nanjing Normal University, Nanjing, China; Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education, Nanjing, China; State Key Laboratory Cultivation Base of Geographical Environment Evolution (Jiangsu Province), Nanjing, China; Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing, China
The goal of speech enhancement is to make speech more pleasant and understandable by improving one or more of its perceptual aspects, such as quality or intelligibility. This paper addresses single-channel speech enhancement. The authors explore an improved multiband spectral subtraction method based on the equivalent rectangular bandwidth (ERB) scale. In the proposed algorithm, the full speech spectrum is divided into nonuniform frequency bands, and spectral subtraction is performed separately in each band. Moreover, subband spectral entropy is used directly for noise estimation instead of speech endpoint detection. The ERB scale is adopted in the subband spectral entropy in place of the traditional linear scale or the Bark scale. Subband spectral entropy based on the ERB scale yields a more accurate noise estimate, which in turn gives better single-channel speech enhancement. Speech spectrograms, objective measures, and informal subjective listening tests show that the proposed algorithm suppresses more residual noise than Upadhyay’s algorithm.
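The ERB-scale band division and subband spectral entropy described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the Glasberg and Moore ERB-rate formula, bands of equal width on that scale, and a plain Shannon entropy over normalized band energies. The function names, the Hann window, and the default of 16 bands are all hypothetical choices made for the example.

```python
import numpy as np

def hz_to_erb_rate(f_hz):
    """ERB-rate (Glasberg and Moore) for a frequency in Hz."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))

def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (np.asarray(erb, dtype=float) / 21.4) - 1.0) / 0.00437

def erb_band_edges(f_max_hz, n_bands):
    """Band edges in Hz, equally spaced on the ERB-rate scale from 0 to f_max_hz."""
    erb_max = float(hz_to_erb_rate(f_max_hz))
    return erb_rate_to_hz(np.linspace(0.0, erb_max, n_bands + 1))

def subband_spectral_entropy(frame, fs, n_bands=16):
    """Shannon entropy of the frame's energy distribution over ERB-spaced bands.

    Assumption: the entropy is taken over normalized band energies; the
    paper may use a different normalization or smoothing.
    """
    frame = np.asarray(frame, dtype=float)
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    edges = erb_band_edges(fs / 2.0, n_bands)
    # Assign every FFT bin to the band whose edges enclose its frequency.
    band_idx = np.clip(np.searchsorted(edges, freqs, side="right") - 1, 0, n_bands - 1)
    band_energy = np.bincount(band_idx, weights=power, minlength=n_bands)
    p = band_energy / (band_energy.sum() + 1e-12)
    p = p[p > 0]  # skip empty bands so log(0) never occurs
    return float(-np.sum(p * np.log(p)))
```

In a sketch like this, noise-like frames spread their energy over many bands and give a high entropy, while a frame dominated by a single tone gives a low one, which is the property an entropy-based noise estimator would rely on.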
Authors:Bomhardt, Ramona; Mejía, Isabel C. Patiño; Zell, Andreas; Fels, Janina
Affiliation:RWTH Aachen University, Institute of Technical Acoustics, Medical Acoustics Group, Aachen, Germany; University of Tübingen, Tübingen, Germany
Numerous studies have shown that the interaural time difference (ITD) depends on the angle of incidence of the sound wave as well as on the individual’s anthropometric dimensions. When a geometric model is used to determine the ITD, exact anthropometric head dimensions are desirable. However, measuring anthropometric dimensions always introduces uncertainties, which are caused primarily by the projection of the three-dimensional head shape onto one-dimensional measures. This paper describes a listening experiment to derive the direction-dependent just-noticeable time deviation from an individual ITD. The resulting threshold is then used to calculate the required measurement accuracy of the input dimensions of an exemplary ITD model. A feasibility study is presented in which four required head dimensions of 17 subjects are measured automatically on three-dimensional head models. These models are obtained using an RGBD sensor and software for three-dimensional surface reconstruction. The accuracy of the automatically determined anthropometric head dimensions is evaluated by comparing them to dimensions obtained from magnetic resonance imaging scans.
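To illustrate how a just-noticeable ITD deviation translates into a required measurement accuracy, the sketch below uses Woodworth's spherical-head formula, ITD = (a / c)(θ + sin θ). This is a stand-in for the geometric model in the paper, which uses several head dimensions; the single head radius, the speed of sound of 343 m/s, and the function names are assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def woodworth_itd(head_radius_m, azimuth_rad, c=SPEED_OF_SOUND):
    """ITD in seconds for a rigid spherical head and a distant source.

    Valid for azimuths between 0 (front) and pi/2 (side) in the horizontal plane.
    """
    return head_radius_m / c * (azimuth_rad + math.sin(azimuth_rad))

def radius_tolerance(itd_threshold_s, azimuth_rad, c=SPEED_OF_SOUND):
    """Largest head-radius error (m) that keeps the ITD error below the threshold.

    The model is linear in the radius, so the tolerance is
    threshold * c / (theta + sin(theta)).
    """
    if azimuth_rad <= 0.0:
        raise ValueError("azimuth must be > 0; the ITD is zero at the front")
    return itd_threshold_s * c / (azimuth_rad + math.sin(azimuth_rad))
```

With a hypothetical threshold of 20 microseconds and a source at 90 degrees, `radius_tolerance(20e-6, math.pi / 2)` gives a tolerance of a few millimeters; the threshold values actually measured in the listening experiment are not reproduced here.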
Authors:Francombe, Jon; Brookes, Tim; Mason, Russell
Affiliation:Institute of Sound Recording, University of Surrey, Guildford, UK
With object-based audio transmission, a scene is distributed as a set of audio objects, as opposed to discrete audio channels. An object comprises an audio stream for a particular aspect of the scene, accompanied by some metadata, such as the desired level and spatial position of the object. Object-based audio offers the possibility of altering the rendering of an audio scene in order to modify or maintain perceptual attributes if the relationships between attributes and mix parameters are known. This research aims to determine the relationship between parameters of an object-based mix and the perception of envelopment (an important attribute of spatial audio reproduction systems), and to develop and test a system for manipulating envelopment in object-based audio in a perceptually relevant manner. An experiment was performed in which mixing engineers were asked to create mixes of object-based content at three levels of envelopment (low, medium, and high) while keeping the overall mix quality at an acceptable level. This enabled analysis of parameter values in order to assess how participants created different levels of envelopment. It was shown in a validation experiment that these parameters could be used to adjust envelopment to a target level.
Authors:Denk, Florian; Kollmeier, Birger; Ewert, Stephan D.
Affiliation:Medizinische Physik and Cluster of Excellence
Acoustic reflections in impulse responses can be eliminated by truncating the response to a short observation time that excludes the reflections. However, truncating the response tail distorts the corresponding low-frequency transfer function. When reflections in “semianechoic” data originate from moderate-sized objects, e.g., equipment in anechoic chambers, they consist mostly of high frequencies. Consequently, truncation needs to be performed only in the mid to high frequencies, where the information is contained in a brief time interval; the impulse response tail is anechoic at low frequencies and can be retained. The authors present a frequency-dependent truncation approach that exploits this property by adapting the truncation length in each band. This avoids low-frequency errors while the disturbing reflections are windowed out. Among several tested formulations, a novel Short Time Fourier Transform-based formulation generated the fewest artifacts while preserving the anechoic impulse response well in both simulated and measured semianechoic data.
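The idea of truncating only where reflections occur can be sketched with a simplified two-band variant. Note that this is not the Short Time Fourier Transform formulation the authors favor: the example splits the response into a low and a high band with complementary zero-phase weights, windows only the high band in time, and adds the bands back together. The function name, the one-octave raised-cosine crossover, and the fade length are assumptions made for illustration.

```python
import numpy as np

def truncate_high_band(ir, fs, crossover_hz, trunc_s, fade_s=0.001):
    """Truncate an impulse response above crossover_hz only.

    ir           : impulse response samples
    fs           : sampling rate in Hz
    crossover_hz : center of the (assumed) one-octave crossover region
    trunc_s      : time in seconds at which the high band starts to fade out
    fade_s       : duration of the half-cosine fade-out in seconds
    """
    ir = np.asarray(ir, dtype=float)
    n = len(ir)
    spectrum = np.fft.rfft(ir)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # Raised-cosine crossover on a log-frequency axis; the low-band weight is
    # 1 - high_weight, so the two bands always sum to the original response.
    lo, hi = crossover_hz / np.sqrt(2.0), crossover_hz * np.sqrt(2.0)
    x = (np.log2(np.maximum(freqs, 1e-9)) - np.log2(lo)) / (np.log2(hi) - np.log2(lo))
    high_weight = 0.5 - 0.5 * np.cos(np.pi * np.clip(x, 0.0, 1.0))

    low_part = np.fft.irfft(spectrum * (1.0 - high_weight), n=n)
    high_part = np.fft.irfft(spectrum * high_weight, n=n)

    # Half-cosine fade-out applied to the high band only; the low-band tail is kept.
    t = np.arange(n) / fs
    fade = np.clip((t - trunc_s) / fade_s, 0.0, 1.0)
    window = 0.5 + 0.5 * np.cos(np.pi * fade)
    return low_part + high_part * window
```

One caveat of this simple split is that the zero-phase band filters are circular and smear each band in time, which is the kind of artifact a more careful formulation would need to control.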
Authors:Zhou, Haoran; Lu, Jing; Qiu, Xiaojun
Affiliation:Key Lab of Modern Acoustics, Institute of Acoustics, Nanjing University, Nanjing, China; Centre for Audio, Acoustics and Vibration, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, Australia
Because of their superior directional properties, shotgun microphones are the preferred choice for high-quality speech and audio recording in environments with intense ambient noise. However, their directivity at low and middle frequencies is usually not high enough for practical use. As an alternative, linear microphone arrays with unequally spaced elements enable high wideband directivity with a small number of microphones. A general procedure for designing such arrays, steered in the endfire direction, is proposed for high-quality audio recording from 20 Hz to 16 kHz. A simulated annealing method is used to iteratively optimize the spatial distribution for microphones that do not have matched amplitude and phase. The challenge in the optimization process arises because a large bandwidth requires large matrices, which produce large accumulated errors. Therefore, the optimization process needs to be carefully regularized so that the error is restricted to a relatively small, tolerable level. The proposed method can produce microphone arrays with higher directivity than shotgun microphones of the same length, with a comparably low self-noise level.
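A toy version of optimizing microphone positions by simulated annealing might look like the sketch below. It is only meant to show the shape of the search and departs from the paper in several assumed ways: it uses plain delay-and-sum weights steered to endfire, ideal matched omnidirectional microphones in free field, no regularization, and a cost equal to the mean directivity index over a few frequencies. All names and parameter values are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def directivity_index_db(positions_m, freq_hz, n_angles=361):
    """Directivity index of a delay-and-sum line array steered to endfire (theta = 0)."""
    theta = np.linspace(0.0, np.pi, n_angles)
    k = 2.0 * np.pi * freq_hz / SPEED_OF_SOUND
    phase = k * np.outer(np.cos(theta) - 1.0, positions_m)
    power = np.abs(np.exp(1j * phase).mean(axis=1)) ** 2
    # The pattern is rotationally symmetric about the array axis, so the
    # spherical average reduces to an integral over theta with weight sin(theta).
    integrand = power * np.sin(theta)
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(theta))
    return 10.0 * np.log10(2.0 / integral)

def anneal_positions(n_mics, length_m, freqs_hz, n_iter=300, seed=0):
    """Search for unequal spacings that raise the mean directivity index.

    Requires n_mics >= 3. The end microphones stay fixed at 0 and length_m;
    no minimum spacing is enforced in this sketch.
    """
    rng = np.random.default_rng(seed)

    def cost(pos):
        return -np.mean([directivity_index_db(pos, f) for f in freqs_hz])

    current = np.linspace(0.0, length_m, n_mics)  # start from equal spacing
    current_cost = cost(current)
    best, best_cost = current.copy(), current_cost
    for i in range(n_iter):
        temperature = 1.0 - i / n_iter + 1e-3  # linear cooling schedule
        candidate = current.copy()
        candidate[rng.integers(1, n_mics - 1)] = rng.uniform(0.0, length_m)
        candidate.sort()
        cand_cost = cost(candidate)
        accept = cand_cost < current_cost or \
            rng.random() < np.exp((current_cost - cand_cost) / temperature)
        if accept:
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current.copy(), current_cost
    return best, -best_cost
```

Because the search keeps the best layout seen so far and starts from equal spacing, the returned mean directivity index is never lower than that of the equally spaced array of the same length.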
Authors:Vencovsky, Vaclav; Rund, Frantisek; Slegl, David
Affiliation:Czech Technical University in Prague, Department of Radioelectronics, Czech Republic
For a given type of headphone, the sound pressure levels of pure tones at which an adequate number of young listeners without hearing loss just perceive the tones are called Reference Equivalent Threshold Sound Pressure Levels (RETSPLs). Current standards and norms have established RETSPL values for specific headphones; other headphones do not have such values. Although the Sennheiser HD 650 circumaural headphone is often used in behavioral experiments and listening tests, its RETSPL values have not yet been published. To establish RETSPL values, the HD 650 was measured at frequencies between 125 Hz and 16 kHz with 25 young listeners aged 19 to 28 years. In addition, the paper compares the newly measured RETSPLs with those of several other circumaural and supra-aural headphones: the Sennheiser HDA 300, the Sennheiser HDA 200, the Beyer DT 48, and the Telephonics TDH 39. The results showed significant differences between the data for circumaural and supra-aural headphones at frequencies below 500 Hz.
Dialog recording, editing, and replacement are probably among the most important aspects of movie sound production. Dialog is the basis of good storytelling, as poor dialog is the “best way to ruin a movie.” Panelists in the Audio for Cinema sessions at the 143rd Convention tackled this fascinating topic. Afterwards, a panel chaired by Nuno Fonseca debated the future challenges of audio for cinema.