AES E-Library Search Results

Search Results (Displaying 1-10 of 13 matches)

A General Framework for Incorporating Time–Frequency Domain Sparsity in Multichannel Speech Dereverberation

Effective speech dereverberation is a prerequisite in such applications as hands-free telephony, voice-based human-machine interfaces, and hearing aids. Blind multichannel speech dereverberation methods based on multichannel linear prediction (MCLP) can estimate the dereverberated speech component without any knowledge of the room acoustics. This can be achieved by estimating and subtracting the undesired reverberant component from the reference microphone signal. This report presents a general framework for MCLP-based speech dereverberation that exploits sparsity in the time–frequency domain. The framework combines a wideband or a narrowband signal model with either an analysis or a synthesis sparsity prior and generalizes state-of-the-art MCLP-based speech dereverberation methods.
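The subtract-the-prediction step described above can be sketched per frequency bin in the short-time Fourier transform domain. This is a minimal, unweighted linear-prediction sketch rather than the paper's sparsity-based framework; the prediction order, delay, and regularization constant are illustrative assumptions.

```python
import numpy as np

def mclp_dereverb_bin(Y, order=20, delay=3, reg=1e-6):
    """Y: (num_mics, num_frames) complex STFT coefficients of one frequency bin.
    Returns the dereverberated reference-channel frames."""
    M, T = Y.shape
    ref = Y[0]                                   # reference microphone channel
    # Stack delayed past frames of all microphones into a regression matrix.
    X = np.zeros((T, M * order), dtype=complex)
    for m in range(M):
        for k in range(order):
            tau = delay + k
            X[tau:, m * order + k] = Y[m, :T - tau]
    # Least-squares prediction of the reverberant tail; the paper's framework
    # additionally imposes time-frequency sparsity priors on the desired signal.
    A = X.conj().T @ X + reg * np.eye(M * order)
    g = np.linalg.solve(A, X.conj().T @ ref)
    return ref - X @ g                           # estimated early/direct component
```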

JAES Volume 65 Issue 1/2 pp. 17-30; January 2017


A Machine-Learning Approach to Application of Intelligent Artificial Reverberation

Digital audio effects, such as artificial reverberation, are transformations on an audio signal, where the transformation depends on a set of control parameters. Users change parameters over time based on the resulting perceived sound. This research automates that process using supervised learning, training classifiers to assign effect parameter sets to audio features. Training can be done a priori, for example by an expert user of the reverberation effect, or online by the user of such an effect. An automatic reverberator trained on a set of audio is expected to apply reverberation correctly to similar audio, defined by such properties as timbre, tempo, etc. For this reason, in order to create a reverberation effect that is as general as possible, training requires a large and diverse set of audio data. In one investigation, the user provides monophonic examples of desired reverberation characteristics for individual tracks taken from the Open Multitrack Testbed. These data were used to train a set of models that automatically apply reverberation to similar tracks. The models were evaluated using classifier F1-scores, mean squared errors, and multistimulus listening tests. The best features from a 31-dimensional feature space were used.
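As a rough illustration of the supervised-learning idea (not the paper's specific models or features), a classifier can be trained to map track-level audio features to one of a small set of expert-chosen reverberation parameter presets; the placeholder data, feature dimensionality, and preset count below are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for extracted audio features and the
# reverb-preset labels an expert user would assign to each track.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 31))     # 200 tracks, 31-dimensional feature space
y = rng.integers(0, 4, size=200)   # index of the chosen reverberation preset

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("macro F1:", cross_val_score(clf, X, y, scoring="f1_macro", cv=5).mean())
```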

JAES Volume 65 Issue 1/2 pp. 56-65; January 2017


A Rapid Sensory Analysis Method for Perceptual Assessment of Automotive Audio

As today’s automotive audio systems rapidly evolve, it is unclear whether current perceptual assessment protocols fully capture the human sensations evoked by such new systems. The highly complex and acoustically hostile environment of the automobile cabin hinders the effectiveness of standard objective metrics, which lack robustness, repeatability, and perceptual relevance. This report examines the current assessment protocols and their identified limitations. A new assessment protocol is proposed. It uses the Spatial Decomposition Method for acquiring, analyzing, and reproducing the sound field over loudspeakers in a laboratory, thereby allowing instant comparisons of automotive audio systems. A rapid sensory analysis protocol, the Flash Profile, is employed to evaluate the perceptual experience in a time-efficient manner using individually elicited attributes. A pilot experiment is described in which expert, experienced, and naive assessors followed the procedure and evaluated three sound fields. Current findings suggest that this method allows for the assessment of both spatial and timbral properties of automotive sound.

JAES Volume 65 Issue 1/2 pp. 130-146; January 2017


A Speech Preprocessing Method Based on Overlap-Masking Reduction to Increase Intelligibility in Reverberant Environments

The reproduction of speech over loudspeakers in a reverberant environment is often encountered in daily life, for example in a train station or during a telephone conference. Reverberation degrades intelligibility. This study proposes two perceptually motivated preprocessing approaches that are applied to the dry speech before it is played into a reverberant environment. In the first algorithm, which assumes prior knowledge of the room impulse response, the amount of overlap-masking due to successive phonemes is reduced. In the second algorithm, onset emphasis is combined with overlap-masking reduction. A speech intelligibility model is used to find the best parameters for these algorithms by minimizing the predicted speech reception thresholds. Listening tests show that this preprocessing can indeed improve speech intelligibility in reverberant environments, with Speech Reception Thresholds improving by up to 6 dB.
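A toy sketch of the overlap-masking idea only, assuming the room impulse response is known: predict how much late-reverberant energy each frame of the dry speech would spill onto the following frames and attenuate frames that would mask them. The frame length, late-reverberation onset, and attenuation cap are illustrative assumptions; the paper's actual algorithms operate on phonemes with a perceptual model.

```python
import numpy as np

def reduce_overlap_masking(dry, rir, fs, frame_len=0.02, late_start=0.05,
                           max_atten_db=9.0):
    """dry: 1-D float array of dry speech; rir: room impulse response."""
    hop = int(frame_len * fs)
    n = len(dry) // hop
    e = np.array([np.sum(dry[i*hop:(i+1)*hop]**2) for i in range(n)])
    # Per-frame energy-decay profile of the late part of the room response.
    late = rir[int(late_start * fs):]
    d = np.array([np.sum(late[k*hop:(k+1)*hop]**2)
                  for k in range(max(1, len(late) // hop))])
    out = dry[:n*hop].astype(float)
    for i in range(n):
        future = e[i+1:i+1+len(d)]                        # frames this frame spills onto
        spill = e[i] * np.sum(d[:len(future)])            # predicted overlap energy
        if len(future) and spill > np.sum(future):        # it would mask them
            ratio = min(spill / max(np.sum(future), 1e-12),
                        10 ** (max_atten_db / 10))
            out[i*hop:(i+1)*hop] *= 1.0 / np.sqrt(ratio)  # attenuate the masker frame
    return out
```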

JAES Volume 65 Issue 1/2 pp. 31-41; January 2017


Confidence Measures for Nonintrusive Estimation of Speech Clarity Index

Measuring the amount and type of reverberation in a room typically assumes that the room impulse response is available for the computation. When the impulse response is not available, a nonintrusive room acoustic (NIRA) method must be used. In this report, the authors use the C50 clarity index to characterize reverberation in the signal because it has been shown to be more highly correlated with speech recognition performance than other measures of reverberation. Multiple features are extracted from a reverberant speech signal and are then used to train a bidirectional long short-term memory model that maps from the feature space to the target C50 value. Prediction intervals, which provide an upper and lower bound on the estimate, can be derived from the standard deviation of the per-frame estimates. Confidence measures are then obtained by normalizing these prediction intervals. These measures are highly correlated with the absolute C50 estimation errors. The performance of the prediction intervals and confidence measures is shown to be consistent across many different noisy reverberant environments. The procedure proposed in this paper for deriving C50 prediction intervals and confidence measures could also be applied to the estimation of other room acoustic parameters, for example T60 (reverberation decay time to 60 dB) or DRR (direct-to-reverberant ratio).
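For reference, the C50 clarity index that the model is trained to predict is defined directly from the room impulse response (used only to obtain ground-truth targets, since the method itself is nonintrusive), and the confidence idea can be sketched as a normalization of the per-frame spread. The normalization constant below is an illustrative assumption, not the paper's mapping.

```python
import numpy as np

def clarity_c50(rir, fs):
    """C50 = 10 log10 of early (first 50 ms) to late energy of the impulse
    response, assuming the direct sound arrives at sample 0."""
    n50 = int(0.05 * fs)
    early = np.sum(rir[:n50] ** 2)
    late = np.sum(rir[n50:] ** 2) + 1e-12
    return 10.0 * np.log10(early / late)

def confidence_from_frames(frame_estimates, max_spread_db=10.0):
    """Per-frame C50 estimates -> mean estimate, prediction-interval width,
    and a confidence value in [0, 1] obtained by normalizing that width."""
    frame_estimates = np.asarray(frame_estimates, dtype=float)
    interval = 2.0 * np.std(frame_estimates)
    confidence = 1.0 - min(interval / max_spread_db, 1.0)
    return float(np.mean(frame_estimates)), interval, confidence
```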

JAES Volume 65 Issue 1/2 pp. 90-99; January 2017


Instrumental and Perceptual Evaluation of Dereverberation Techniques Based on Robust Acoustic Multichannel Equalization

Speech signals recorded in an enclosed space by microphones at a distance from the speaker are often corrupted by reverberation, which arises from the superposition of many delayed and attenuated copies of the source signal. Because reverberation degrades the signal, removing it enhances quality. Dereverberation techniques based on acoustic multichannel equalization are known to be sensitive to room impulse response perturbations. In order to increase robustness, several methods have been proposed, for example using a shorter reshaping filter length, incorporating regularization, or applying a sparsity-promoting penalty function. This paper focuses on evaluating the performance of these methods for single-source multi-microphone scenarios, using instrumental performance measures as well as subjective listening tests. By analyzing the correlation between the instrumental and the perceptual results, it is shown that signal-based performance measures are more advantageous than channel-based performance measures for evaluating the perceptual speech quality of signals dereverberated by equalization techniques. Furthermore, this analysis demonstrates the need to develop more reliable instrumental performance measures.
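The class of techniques being evaluated can be sketched in its simplest regularized form: a MINT-style least-squares design of reshaping filters from (possibly perturbed) room impulse response estimates, with Tikhonov regularization for robustness. The filter length, target delay, and regularization weight are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import toeplitz

def conv_matrix(h, Lg):
    """(len(h) + Lg - 1, Lg) convolution matrix of impulse response h."""
    col = np.concatenate([h, np.zeros(Lg - 1)])
    row = np.zeros(Lg)
    row[0] = h[0]
    return toeplitz(col, row)

def regularized_equalizer(rirs, Lg=256, delay=32, reg=1e-3):
    """rirs: list of M equal-length room impulse response estimates.
    Returns M reshaping filters of length Lg targeting a delayed impulse."""
    H = np.hstack([conv_matrix(h, Lg) for h in rirs])   # multichannel convolution
    d = np.zeros(H.shape[0])
    d[delay] = 1.0                                      # desired overall response
    g = np.linalg.solve(H.T @ H + reg * np.eye(H.shape[1]), H.T @ d)
    return g.reshape(len(rirs), Lg)
```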

JAES Volume 65 Issue 1/2 pp. 117-129; January 2017


Multichannel Wiener Filters in Binaural and Bilateral Hearing Aids — Speech Intelligibility Improvement and Robustness to DoA Errors

An adaptive multichannel Wiener filter (MWF) can be used for joint dereverberation and noise reduction in hearing aids. Using the short-time objective intelligibility (STOI) measure, the authors compare bilateral and binaural configurations of the MWF for several cases: (a) different directions of arrival (DoA) of the target speech, (b) different errors in the assumed DoA, and (c) different levels of microphone self-noise. While much less robust against DoA errors, the binaural MWFs outperformed the bilateral MWF when the correct DoA was assumed. Furthermore, the bilateral MWF was shown to be affected by microphone self-noise more than the binaural MWFs. A listening test indicated that a well-steered binaural MWF is able to improve speech intelligibility in a noisy and reverberant speech scenario, and that this improvement is greater than that of the bilateral MWF. This was true despite the fact that the binaural MWF distorted the binaural cues such that no binaural advantage could be obtained. The post-filters of the bilateral and the binaural MWFs significantly improved the measured speech intelligibility because of the particular maximum likelihood spectral estimator used to compute the spectral gain of the filters.
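In its generic narrowband (per-frequency-bin) form, the MWF is built from estimated speech and noise covariance matrices. The sketch below is the textbook formulation, not the authors' specific adaptive implementation, and the trade-off parameter mu is an assumption.

```python
import numpy as np

def mwf_weights(phi_speech, phi_noise, ref=0, mu=1.0):
    """phi_speech, phi_noise: (M, M) spatial covariance matrices of one bin.
    mu > 1 favors noise reduction over speech distortion; mu = 1 is the MWF."""
    M = phi_speech.shape[0]
    e_ref = np.zeros(M)
    e_ref[ref] = 1.0                       # select the reference microphone
    w = np.linalg.solve(phi_speech + mu * phi_noise, phi_speech @ e_ref)
    return w                               # apply per STFT frame y as np.vdot(w, y)
```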

JAES Volume 65 Issue 1/2 pp. 8-16; January 2017


Object-Based Reverberation for Spatial Audio

Object-based frameworks are currently being explored as a means to make future audio systems more immersive, interactive, and easily accessible. In object-based audio, a scene is composed of a number of objects, each comprising audio content and metadata. The metadata is interpreted by a renderer, which creates the audio to be sent to each loudspeaker with knowledge of the specific target reproduction system. While recent standardization activities provide recommendations for object formats, the method for capturing and reproducing reverberation remains open. This research presents a parametric approach for capturing, representing, editing, and rendering reverberation over a 3D spatial audio system. A Reverberant Spatial Audio Object (RSAO) allows an object to synthesize the required reverberation. An example RSAO framework is illustrated, with listening tests showing that the approach correctly retains room size and source distance. The rendering is agnostic to the reproduction system; editing the parameters can alter the perceived room size and source distance, and greater listener envelopment can be achieved with an appropriate reproduction system.
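Purely to illustrate the object-plus-metadata idea (the field names below are hypothetical and do not follow the RSAO specification), reverberation metadata might group early-reflection descriptors with a parametric late tail that a system-agnostic renderer synthesizes for whatever loudspeaker layout is available.

```python
# Hypothetical reverberation-object metadata (illustrative field names only).
reverb_object = {
    "type": "reverb",
    "early_reflections": [
        # per-reflection delay (s), level (dB), direction (azimuth, elevation in degrees)
        {"delay_s": 0.012, "level_db": -8.0, "direction_deg": [45.0, 10.0]},
        {"delay_s": 0.019, "level_db": -11.0, "direction_deg": [-60.0, 5.0]},
    ],
    "late_reverb": {
        # frequency-dependent decay times and onset of the diffuse tail
        "band_centres_hz": [250, 1000, 4000],
        "t60_s": [1.1, 0.9, 0.6],
        "mixing_time_s": 0.08,
        "level_db": -14.0,
    },
}
```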

Open Access

JAES Volume 65 Issue 1/2 pp. 66-77; January 2017


Perceptual Evaluation and Analysis of Reverberation in Multitrack Music Production

Despite the prominence of artificial reverberation in music production, there are few studies that explore its conventional usage and the resulting perception in the context of a mixing studio. Research into the use of artificial reverberation is difficult because of the lack of standardized parameters, inconsistent interfaces, and a diverse range of algorithms. In multistimulus listening tests, trained engineers were asked to rate 80 mixes generated from 10 professional-grade music recordings. Annotated subjective comments were also analyzed to determine the importance of reverberation in the perception of mixes, as well as to classify mixes as having too much or too little overall reverberation. The results support the notion that a universally preferred amount of reverberation is unlikely to exist, but upper and lower bounds can be identified. The importance of careful parameter adjustment is evident from the limited range of acceptable feature values with regard to the perceived amount of reverberation, relative to the just-noticeable differences in both reverberation loudness and early decay time. This study confirms previous findings that a perceived excess of reverberation typically has a detrimental effect on subjective preference. The ability to predict the desired amount of reverberation with a reasonable degree of accuracy has applications in automatic mixing and intelligent audio effects.

Open Access

JAES Volume 65 Issue 1/2 pp. 108-116; January 2017


Rolling out AES67

[Feature] AES67 has reached the point of wide acceptance across the industry. It provides a mode of operation that allows one system to communicate audio information with another, while still allowing enhanced features to be added on top. While certain features such as device/stream discovery were intentionally omitted from the standard, open solutions are emerging to address this.

JAES Volume 65 Issue 1/2 pp. 148-151; January 2017

