Robust real-time audio signal enhancement increasingly relies on multichannel microphone arrays for signal acquisition, and sophisticated beamforming algorithms have been developed to maximize the benefit of multiple microphones. Despite the recent success of deep learning models for audio signal processing, neural beamforming remains an open research topic. This paper presents a neural beamformer architecture capable of performing spatial beamforming with microphones randomly distributed over very large areas, even in negative signal-to-noise-ratio environments with multiple noise sources and reverberation. The proposed method combines adaptive, nonlinear filtering and the computation of spatial relations with state-of-the-art mask estimation networks. The resulting end-to-end network architecture is fully differentiable and provides excellent signal separation performance. Combining a small number of principal building blocks, the method achieves low-latency, domain-specific signal enhancement even in challenging environments.
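As a point of reference for the spatial-filtering stage, a classical delay-and-sum beamformer aligns each microphone signal by its propagation delay and averages the result; the mask-based neural stages described above refine this idea with learned, nonlinear filtering. A minimal sketch, assuming known integer-sample delays (the function name and conventions are illustrative, not from the paper):

```python
import numpy as np

def delay_and_sum(mics, delays_samples):
    """Delay-and-sum beamforming.

    mics: array of shape [n_channels, n_samples].
    delays_samples: per-channel integer delays (samples) toward the source.
    Returns the time-aligned channel average; coherent signals add,
    uncorrelated noise is attenuated by roughly sqrt(n_channels).
    """
    n_ch, n = mics.shape
    out = np.zeros(n)
    for ch in range(n_ch):
        # Undo each channel's delay (circular shift; real systems pad instead).
        out += np.roll(mics[ch], -delays_samples[ch])
    return out / n_ch
```

In practice the shifts would be fractional and applied in the frequency domain, and the averaging weights would be replaced by the adaptive filters the paper learns.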
Traditional room equalization involves exciting one loudspeaker at a time and deconvolving the loudspeaker-room response from the recording. As the number of loudspeakers and positions increases, so does the time required to measure loudspeaker-room responses. In this paper, we present a technique to deconvolve impulse responses after exciting all loudspeakers at the same time. The stimuli are shifted relative to a base stimulus and are optionally pre-processed with arbitrary filters to create specific-sounding signals. The stimulus shift ensures capture of the low-frequency reverberation tail after deconvolution. Various deconvolution techniques, with and without spectrum-shaping filters, are presented. Performance is reported in terms of log-spectral distortion, as a function of stimulus length and shift, along with impulse- and magnitude-response error plots for the Multichannel Acoustic Reverberation Database at York (MARDY).
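The core idea can be illustrated with regularized spectral division: if every loudspeaker is driven by a circularly shifted copy of one base stimulus, a single deconvolution of the combined recording places each room response at its shift offset, from which the individual responses can be sliced, provided the shifts exceed the response length. A minimal sketch under those assumptions (not the authors' exact processing chain; names and the regularization constant are illustrative):

```python
import numpy as np

def deconvolve_shifted(recording, stimulus, shifts, ir_len, eps=1e-8):
    """Recover per-loudspeaker impulse responses from one recording of all
    loudspeakers driven by circularly shifted copies of a base stimulus.

    recording: mixture captured at the microphone (length n).
    stimulus: base excitation signal.
    shifts: per-loudspeaker circular shift (samples); spacing must exceed ir_len.
    """
    n = len(recording)
    S = np.fft.rfft(stimulus, n)
    R = np.fft.rfft(recording, n)
    # Regularized spectral division removes the base stimulus from the mix.
    h_all = np.fft.irfft(R * np.conj(S) / (np.abs(S) ** 2 + eps), n)
    # Each loudspeaker's response appears at its own shift offset.
    return [h_all[s:s + ir_len] for s in shifts]
```

With a sufficiently long stimulus, the shift spacing also bounds how much of the low-frequency reverberation tail survives the slicing, which is the trade-off the paper studies.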
Devices from smartphones to televisions are beginning to employ dual-purpose displays, where the display serves as both a video screen and a loudspeaker. In this paper we demonstrate a method to generate localized sound-radiating regions on a flat-panel display. An array of force actuators affixed to the back of the panel is driven by appropriately filtered audio signals so the total response of the panel due to the actuator array approximates a target spatial acceleration profile. The response of the panel to each actuator individually is initially measured via a laser vibrometer, and the required actuator filters for each source position are determined by an optimization procedure that minimizes the mean squared error between the reconstructed and targeted acceleration profiles. Since the single-actuator panel responses are determined empirically, the method does not require analytical or numerical models of the system’s modal response, and thus is well-suited to panels having the complex boundary conditions typical of television screens, mobile devices, and tablets. The method is demonstrated on two panels with differing boundary conditions. When integrated with display technology, the localized audio source rendering method may transform traditional displays into multimodal audio-visual interfaces by colocating localized audio sources and objects in the video stream.
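The optimization described above reduces, per frequency bin, to a least-squares fit of complex actuator weights to the target acceleration profile over the measured points. A hedged sketch, with a Tikhonov regularization term added to keep the solve well conditioned (the function name, shapes, and regularization are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def actuator_weights(A, target, reg=1e-6):
    """Least-squares actuator drive weights for one frequency bin.

    A: [n_points, n_actuators] measured panel response at point j due to
       actuator k (from the laser-vibrometer measurements).
    target: [n_points] desired spatial acceleration profile.
    reg: Tikhonov regularization to limit drive effort on ill-conditioned bins.
    Minimizes ||A w - target||^2 + reg * ||w||^2 via the normal equations.
    """
    AhA = A.conj().T @ A
    return np.linalg.solve(AhA + reg * np.eye(A.shape[1]),
                           A.conj().T @ target)
```

Solving this independently in each bin and inverse-transforming the weight spectra yields one FIR filter per actuator, which is the structure the abstract describes.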
This study introduces an inclusive and innovative online teaching pedagogy for sound design and modular synthesis using open-source software, developed to achieve student-centered learning outcomes and a positive learning experience during the COVID-19 pandemic. The pedagogy proved effective after the course was offered, human subject research was conducted, and class evaluation data were analyzed. The teaching strategies include comprehensive analysis of sound synthesis theory using sample patches, an introduction to basic electronics, collaborative learning, hands-on lab experiments, student presentations, and alternative reading assignments in the form of educational videos. Online teaching software was used to track student engagement. From a transformative perspective, the authors aim to cultivate student-centered learning, inclusive education, and equal opportunity in an online higher-education classroom. The goal is to achieve the same level of engagement as in-person classes, inspire a diverse student body, offer ample technical and mental-health support, and open the possibility of learning sound design on Eurorack modular synthesizers without investing in expensive hardware. Students’ assignments, midterms, and final projects demonstrated thorough understanding of the course material, strong motivation, and vibrant creativity. Human subject research was conducted during the course to improve the students’ learning experience and further shape the pedagogy: three surveys and one-on-one interviews were administered to a class of 25 students. The qualitative and quantitative data indicate that this student-centered pedagogy was effective and well received. Promoting social interaction and student well-being while teaching challenging topics during challenging times was also achieved.
We consider the audibility of multiplicative flicker noise introduced by audio equipment. Variable resistors used as volume controls generate flicker noise that acts multiplicatively on the signal passing through them. Flicker-noise measurements were made for several variable resistors. In addition, a listening test was conducted to investigate at what magnitude multiplicative flicker noise becomes perceptible. The results indicate that untrained listeners can rarely discern the multiplicative effect of volume-control flicker noise.
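The condition tested in the listening experiment can be simulated by generating approximately 1/f noise and applying it as a modulation of the signal rather than adding it. A rough sketch, assuming frequency-domain spectral shaping and a small modulation depth (both are illustrative choices, not the paper's procedure):

```python
import numpy as np

def pink_noise(n, rng):
    """Approximate 1/f (flicker) noise by spectrally shaping white noise:
    1/f power spectrum corresponds to a 1/sqrt(f) amplitude weighting."""
    spec = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)
    f[0] = f[1]                      # avoid division by zero at DC
    spec *= 1.0 / np.sqrt(f)
    x = np.fft.irfft(spec, n)
    return x / np.max(np.abs(x))     # normalize to unit peak

def apply_multiplicative_noise(signal, noise, depth=0.01):
    """Flicker noise from a volume pot modulates the signal passing through
    it, so it is applied as a gain fluctuation, not an additive term."""
    return signal * (1.0 + depth * noise)
```

Note that multiplicative noise vanishes in silence and scales with the program material, which is one reason it is harder to notice than additive hiss.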
Interior panning algorithms enable content authors to position auditory events not only at the periphery of the loudspeaker configuration but also within the internal space between the listeners and the loudspeakers. In this study such algorithms are rigorously evaluated, comparing rendered static auditory events at various locations against true physical loudspeaker references. Various algorithmic approaches are subjectively assessed in terms of Overall, Timbral, and Spatial Quality for three different stimuli, at five positions and three radii. Results show that, for static positions, standard Vector Base Amplitude Panning performs as well as, or better than, all other interior panning algorithms tested here. Timbral Quality is maintained across all distances. Ratings for Spatial Quality vary, with some algorithms performing significantly worse at closer distances. Ratings for Overall Quality decrease moderately with reduced reproduction radius and are predominantly influenced by Timbral Quality.
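For context, the two-dimensional form of Vector Base Amplitude Panning referenced above computes a pair of loudspeaker gains by inverting the matrix of loudspeaker direction vectors and power-normalizing the result. A minimal sketch for a single speaker pair (function name and angle conventions are illustrative):

```python
import numpy as np

def vbap_gains(source_az_deg, spk_az_deg_pair):
    """2-D VBAP for one loudspeaker pair.

    Solves g1*l1 + g2*l2 = p, where l1, l2 are unit vectors toward the
    loudspeakers and p points toward the virtual source, then normalizes
    the gains to constant power.
    """
    def unit(az_deg):
        a = np.radians(az_deg)
        return np.array([np.cos(a), np.sin(a)])

    L = np.column_stack([unit(a) for a in spk_az_deg_pair])
    g = np.linalg.solve(L, unit(source_az_deg))
    return g / np.linalg.norm(g)
```

Interior panning methods extend this surface-only formulation so that sources can also be rendered at radii inside the array, which is exactly where the study finds the quality differences.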
Automatic coded audio quality assessment is an important task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen codecs, bitrates, and content types, and the inflexibility of existing approaches. ViSQOL v3 (ViV3), a typical human-perception-related metric, has been shown to correlate highly with human quality ratings. In this study, we tackle the problem of predicting coded audio quality using only programmatically generated data informed by expert domain knowledge. We propose a neural network, InSE-NET, with a backbone of Inception and Squeeze-and-Excitation modules, to assess the perceived quality of coded audio at a 48-kHz sample rate, and we demonstrate that synthetic data augmentation enhances the prediction. The proposed method is intrusive, i.e., it requires Gammatone spectrograms of the unencoded reference signal. Besides performance comparable to ViV3, our approach provides more robust predictions at higher bitrates.
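The Squeeze-and-Excitation module named in the backbone reweights the channels of a feature map using a gate computed from globally pooled statistics. A framework-free sketch of the operation (weight shapes and names are illustrative, not the InSE-NET implementation):

```python
import numpy as np

def squeeze_excitation(x, w1, w2):
    """Squeeze-and-Excitation channel reweighting.

    x: feature map [channels, height, width].
    w1: [channels // r, channels] reduction weights (r = reduction ratio).
    w2: [channels, channels // r] expansion weights.
    """
    z = x.mean(axis=(1, 2))                  # squeeze: global average pool
    s = np.maximum(w1 @ z, 0.0)              # excitation: FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))      # FC + sigmoid gate in (0, 1)
    return x * s[:, None, None]              # rescale each channel
```

The gate lets the network emphasize or suppress whole frequency-band channels of the Gammatone representation, which is a plausible reason the module suits a perceptual-quality backbone.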
User generated recordings (UGRs) are common in audio forensic examination. The prevalence of handheld private recording devices, stationary doorbell cameras, law enforcement body cameras, and other systems capable of creating UGRs at public incidents is only expected to increase with the development of new and less expensive recording technology. It is increasingly likely that an audio forensic examiner will have to deal with an ad hoc collection of unsynchronized UGRs from mobile and stationary audio recording devices. The examiner’s tasks will include proper time synchronization, deducing microphone positions, and reducing the presence of competing sound sources and noise. We propose a standard forensic methodology for handling UGRs, including best practices for assessing authenticity and timeline synchronization.
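One common building block for the timeline-synchronization task is estimating the pairwise offset between two recordings of the same event by cross-correlation. A minimal sketch, not a prescribed forensic procedure (the FFT-based formulation and names are illustrative, and real UGRs also need resampling to correct clock drift):

```python
import numpy as np

def estimate_offset(ref, other):
    """Estimate the lag of `other` relative to `ref` in samples via
    FFT cross-correlation. Positive lag means the common sound event
    appears later in `other`. Assumes equal sample rates."""
    n = len(ref) + len(other) - 1
    nfft = 1 << (n - 1).bit_length()         # next power of two
    xc = np.fft.irfft(np.conj(np.fft.rfft(ref, nfft)) *
                      np.fft.rfft(other, nfft), nfft)
    lag = int(np.argmax(xc))
    if lag > nfft // 2:                      # unwrap circular lags
        lag -= nfft
    return lag
```

Anchoring every recording to a shared reference event (a gunshot, a siren onset) with such offsets gives the common timeline on which the examiner's further analysis proceeds.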
Hearing protection devices (HPDs) are essential for musicians during loud performances to avoid hearing damage, but the standard Noise Reduction Rating (NRR) performance metric for HPDs says little about their behavior in a musical setting. One analysis tool used in the noise-exposure research community to evaluate HPDs is kurtosis measured in the ear, and the reduction of noise kurtosis through an HPD. A musical signal, especially live music, will often have a high crest factor and kurtosis, so evaluating kurtosis loss is important for an objective evaluation of musicians’ HPDs. This paper provides background on kurtosis and on filters affecting kurtosis, and describes a setup for generating high-kurtosis signals and measuring in-ear kurtosis loss through an HPD. Measurement results on a variety of musicians’ HPDs show that 83% of the devices measured strongly reduce kurtosis, and that kurtosis loss is likely an independent performance metric because it is not correlated with the mean or standard deviation of the spectral insertion loss.
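Kurtosis here is the normalized fourth central moment, which equals 3 for Gaussian noise, is lower for a pure tone, and is much higher for impulsive signals; kurtosis loss is then the drop from the outside signal to the in-ear signal. A small sketch of both quantities (function names are illustrative):

```python
import numpy as np

def kurtosis(x):
    """Sample kurtosis (non-excess): E[(x - mu)^4] / sigma^4.
    Equals 3 for Gaussian noise, 1.5 for a sine, and grows with
    impulsiveness (high crest factor)."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s2 = ((x - m) ** 2).mean()
    return ((x - m) ** 4).mean() / s2 ** 2

def kurtosis_loss(outside, in_ear):
    """Reduction of kurtosis through an HPD; positive values mean the
    device smooths impulsive peaks relative to the free-field signal."""
    return kurtosis(outside) - kurtosis(in_ear)
```

Because kurtosis is scale-invariant, this metric captures waveform shaping (peak limiting, ringing) rather than overall attenuation, which is why it can be uncorrelated with spectral insertion loss.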
We present a subjective evaluation of six 3D main-microphone techniques for three-dimensional binaural music production. Forty-seven subjects participated in the survey, listening on headphones. Among the included 3D arrays, results show a subjective preference for ESMA-3D, followed by Decca tree with height. However, the dummy head and a stereo AB microphone pair performed as well as any of the arrays for general preference, timbre, and envelopment. Though not implemented for this study, our workflow allows the inclusion of individualized HRTFs and head tracking; their impact will be considered in a future study.