This work introduces an Open Source turntable for the measurement of electro-acoustical devices. The idea is to provide an inexpensive and highly customizable device that can be adjusted according to specific measurement needs. Development of such turntable devices in the past required significant investment: specific mechanical and motor-control design skills were needed, leading to both costly and time-consuming processes. Recent developments in mechatronics and 3D printing make it possible to design and build a cost-effective solution.
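With a 3D-printed mechanism and an off-the-shelf stepper driver, much of the motor control reduces to simple arithmetic. As a purely illustrative sketch (not taken from the paper; the motor parameters below are hypothetical defaults), this converts a desired turntable rotation into microsteps:

```python
def steps_for_angle(angle_deg, steps_per_rev=200, microsteps=16, gear_ratio=1.0):
    """Convert a desired turntable rotation (degrees) into the number of
    stepper-motor microsteps to issue. Parameter values are illustrative:
    a common 200-step motor with 16x microstepping and direct drive."""
    steps = angle_deg / 360.0 * steps_per_rev * microsteps * gear_ratio
    return round(steps)

# 5-degree increments, e.g. for a 72-point polar directivity measurement
print(steps_for_angle(5.0))
```

With a non-unity gear ratio the same function covers belt- or gear-driven platters, which is the kind of customization the authors aim to enable.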
Click to purchase paper as a non-member or login as an AES member. If your company or school subscribes to the E-Library then switch to the institutional version. If you are not an AES member and would like to subscribe to the E-Library then Join the AES!
This paper costs $33 for non-members and is free for AES members and E-Library subscribers.
Start a discussion about this paper!
A method for recreating complex soundscapes for tasks related to audio quality evaluation is presented. This approach uses an ambisonics-inspired basis for recreating dynamic noise in a system compatible with ETSI standard EG 202 396-1 for background noise reproduction. Recordings were captured with a spherical 32-microphone array and processed to match the two-dimensional four-loudspeaker array by creating four directional beams, each feeding an individual channel. As a result, a spatial background noise ambience is recreated, preserving the transient characteristics of the original recording.
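The paper derives its four loudspeaker feeds from a 32-capsule recording; as a simplified illustration of the idea (not the paper's actual processing), the sketch below forms four first-order cardioid beams, one per loudspeaker, assuming the recording has already been encoded to horizontal B-format signals W, X, Y:

```python
import numpy as np

def four_cardioid_feeds(W, X, Y):
    """Four horizontal cardioid beams at 0/90/180/270 degrees azimuth from
    first-order horizontal B-format, one beam per loudspeaker channel."""
    feeds = []
    for az in np.deg2rad([0.0, 90.0, 180.0, 270.0]):
        # first-order cardioid: 0.5*W + 0.5*(cos(az)*X + sin(az)*Y)
        feeds.append(0.5 * W + 0.5 * (np.cos(az) * X + np.sin(az) * Y))
    return np.stack(feeds)
```

Because each beam is a fixed linear combination of the input channels, transients survive intact, which is the property the abstract emphasizes.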
The ability to generate appropriate auditory localization cues is an important requisite of spatial audio rendering technology that contributes to the plausibility of virtual sounds presented to a user, especially in XR applications (VR/AR/MR). Algorithmic approaches have been proposed to quantify such technologies’ ability to reproduce interaural level difference (ILD) cues through regression and statistical methods, providing a useful standardization and automation method to estimate the localization accuracy potential of a given spatial audio rendering engine. Previous approaches are extended to include interaural time difference (ITD) cues as part of the perceptual transform through the use of the interaural transfer function (ITF). The extended algorithmic approach of quantifying localization accuracy may provide an adequate substitute for critical listening studies as an evaluation method. However, this approach has not yet been validated through comparison with localization listening studies. Relevant listening tests are reviewed in conclusion to increase confidence in the presented methods of algorithmically quantifying the localization accuracy potential of a spatial audio rendering engine.
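As a rough illustration of the two cues involved (not the paper's algorithm), ITD can be estimated from the cross-correlation peak of a left/right impulse-response pair, and broadband ILD from their RMS ratio:

```python
import numpy as np

def itd_ild(h_left, h_right, fs):
    """Estimate ITD (seconds) from the cross-correlation peak and broadband
    ILD (dB) from the RMS ratio of a left/right HRIR pair. Illustrative only;
    real analyses typically band-limit and window these estimates."""
    xcorr = np.correlate(h_left, h_right, mode="full")
    lag = int(np.argmax(np.abs(xcorr))) - (len(h_right) - 1)
    itd = lag / fs  # negative lag: the left-ear response arrives earlier
    rms = lambda h: np.sqrt(np.mean(np.square(h)))
    ild = 20.0 * np.log10(rms(h_left) / rms(h_right))
    return itd, ild
```

Running such estimators over a grid of rendered source directions and comparing against reference HRTF measurements is the kind of automated evaluation the abstract describes.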
MPEG-H Audio is a Next Generation Audio (NGA) system offering a new audio experience for various applications: Object-based immersive sound delivers a new degree of realism and artistic freedom for immersive music applications, such as the 360 Reality Audio music service. Advanced interactivity options enable improved personalization and accessibility. Solutions exist to create object-based features from legacy material, e.g., deep-learning-based dialogue enhancement. 'Universal delivery' allows for optimal rendering of a production across all kinds of devices and various means of distribution such as broadcast or streaming. All these new features are achieved by adding metadata to the audio, which is defined during production and offers content providers flexible control of interaction and rendering options. Thus, new possibilities are introduced, but new requirements are also imposed on the production process. This paper provides an overview of production scenarios using MPEG-H Audio along with examples of state-of-the-art NGA production workflows. Special attention is given to immersive music and broadcast applications as well as accessibility features.
Download Now (1.9 MB)
This paper is Open Access which means you can download it for free.
This article deals with the realization of an automated classification of loudspeaker enclosures. The acoustic load of the enclosure is reflected in the electrical impedance of the loudspeaker and is hence detectable from the point of view of the power amplifier. In order to classify the enclosures of passive one-way speakers, an artificial neural network is trained with synthetic impedance spectra based on equivalent electrical circuit models. The generalization capability is validated with measured test sets of closed, vented, band-pass and transmission-line enclosures. The resulting classification procedure works well within a synthetic test set. However, a good generalization to the measured test data requires further investigations to achieve better separation between the different vented enclosure types.
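To give a flavor of the equivalent-circuit models used to synthesize training spectra (all parameter values below are illustrative, not the paper's): for a sealed box, the electrical impedance is the blocked voice-coil impedance plus the motional term Bl²/Zm, where the box compliance stiffens the suspension and shifts the resonance peak upward:

```python
import numpy as np

def closed_box_impedance(f, Re=6.0, Le=0.5e-3, Bl=7.0,
                         Mms=0.02, Cms=0.7e-3, Rms=1.5, box_ratio=0.5):
    """Synthetic electrical impedance of a driver in a sealed enclosure,
    from a lumped equivalent circuit. All parameter values are illustrative."""
    w = 2.0 * np.pi * np.asarray(f, dtype=float)
    Cmt = Cms * box_ratio                        # box compliance stiffens the system
    Zm = Rms + 1j * (w * Mms - 1.0 / (w * Cmt))  # mechanical impedance
    return Re + 1j * w * Le + Bl**2 / Zm         # blocked + motional impedance
```

Sweeping f and reading off the number and position of impedance peaks is exactly the fingerprint that separates closed (one peak), vented (two peaks), and more complex enclosures from the amplifier's point of view.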
Next Generation Audio (NGA) systems like MPEG-H Audio rely on metadata to enable a wide variety of features. Information such as channel layouts, the position and properties of audio objects, and user interactivity options are only some of the data that can be used to improve the consumer experience. Creating these metadata requires suitable tools, which are used in a process known as "authoring", where interactive features and the options for 3D immersive sound rendering are defined by the content creator. Different types of productions impose specific requirements on these authoring tools, which has led to a number of solutions appearing on the market. Using the example of MPEG-H Audio, this paper will detail some of the latest developments and authoring solutions designed to enable immersive and interactive live and post-productions.
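As a purely hypothetical sketch of the kind of per-object metadata an authoring tool emits (field names invented for illustration; the actual MPEG-H metadata syntax is defined by the standard, ISO/IEC 23008-3):

```python
# Hypothetical authoring output: one audio object with a stage position
# and user-interactivity ranges. All field names are invented.
dialogue_object = {
    "id": 1,
    "label": "Dialogue",
    "azimuth_deg": 0.0,        # rendered position on the 3D stage
    "elevation_deg": 0.0,
    "gain_db": 0.0,
    "interactivity": {
        "gain_min_db": -6.0,   # how far the user may attenuate it...
        "gain_max_db": 9.0,    # ...or boost it, e.g. dialogue enhancement
        "position_adjustable": False,
    },
}
print(dialogue_object["interactivity"]["gain_max_db"])
```

Constraining these ranges at authoring time is how the content creator keeps artistic control while still allowing personalization at playback.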
Download Now (273 KB)
In order to exploit strengths and avoid weaknesses of the First Order Ambisonics (FOA) microphone technique, we devised a new, portable 3D microphone recording technique, “W-Ambisonics.” This new technique incorporates a spaced stereo cardioid microphone pair (for frontal information) with two FOA microphone arrays (for lateral, rear, and height information). In W-Ambisonics, two FOA microphones are spaced 17 cm apart to capture and represent interaural cues precisely, with two cardioid microphones spaced 50 cm apart, 50 cm in front, which improves frontal directionality. Combining these two microphone pairs enables the translation of recorded audio into various reproduction formats according to practical limitations in reproduction peripherals. The design focus of this technique was efficiency in the recording stage and scalability in the reproduction stage. We conducted three perceptual experiments whose results show that the W-Ambisonics method enables improved lateral localization, provides comparable sound quality to the conventional spaced array technique, and translates spacious yet precise sound images in listening evaluations of a binauralized headphone rendering. The W-Ambisonics microphone technique is practical, precise, and scalable across multiple reproduction scenarios, from binaural to multichannel systems.
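The 17 cm spacing of the two FOA arrays is roughly ear spacing, so the pair captures natural time-of-arrival differences directly. A far-field sketch of that cue (illustrative arithmetic, not from the paper):

```python
import math

def spaced_pair_tdoa(azimuth_deg, spacing_m=0.17, c=343.0):
    """Far-field time-difference-of-arrival (seconds) between two receivers
    spaced `spacing_m` apart, for a source `azimuth_deg` off the front axis."""
    return spacing_m * math.sin(math.radians(azimuth_deg)) / c

# A fully lateral source yields roughly 0.5 ms, close to natural ITD magnitudes
print(spaced_pair_tdoa(90.0))
```

This is why the spaced arrays can deliver interaural cues for binaural rendering without head-diffraction modeling at the capture stage.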
Tele-ensemble systems are now widely used, and in them listeners hear mixed acoustic signals carrying different spatial acoustic characteristics from each space. However, the perceptual impression of mixed acoustic signals containing multiple different spatial acoustic characteristics has not been sufficiently investigated. In this study, a listening test was performed to investigate differences in impression between mixed acoustic signals with a single spatial acoustic characteristic and with different ones. Three instrumental signals (guitar, bass, and drums) were played over a loudspeaker and recorded in three rooms. For the listening test, four mixed acoustic signals were prepared: in No. 1, all instruments were captured in the low-reverberation room; in No. 2, all in the medium-reverberation room; in No. 3, all in the high-reverberation room; and in No. 4, bass was captured in the low-reverberation room, drums in the medium-reverberation room, and guitar in the high-reverberation room. Participants listened to No. 4, compared it with Nos. 1, 2, and 3, and selected one of seven evaluation items (pleasant, natural, reverberant, coherent, clear, likeable, noisy). Participants perceived No. 1 as a pleasant acoustic signal with little reverberation, and Nos. 2 and 3 as unpleasant acoustic signals with more reverberation, compared to No. 4. This suggests that mixed acoustic signals recorded in the low-reverberation room are judged the least reverberant and the most comfortable. These results lead to the conclusion that homogenizing the spatial acoustic characteristics by suppressing reverberation in the acoustic signals captured in the multiple spaces is one method of giving a pleasant impression.
This paper explores methods for creating vibrato effects with artificial reverberation. Feedback delay networks (FDNs) have been used for many reverb effects. Past work with time-varying feedback delay networks has focused primarily on small modulations of the delays and/or feedback matrices in order to create a more natural-sounding reverb. In this paper, we consider the possibility of using wider modulations of these reverbs for the purpose of sound-effect generation. Specifically, amplitude modulation and frequency modulation can be obtained by varying the feedback matrices or the delay lines, respectively. The results show a convincing vibrato effect with minor artifacts and promise for using FDNs in sound-effect generation. Future work will include reducing artifacts and fine-tuning control parameters.
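As a minimal stand-in for the paper's FDN (a single modulated delay line rather than a full network), the sketch below shows the frequency-modulation mechanism: sinusoidally varying a fractional delay imposes vibrato on whatever passes through it.

```python
import numpy as np

def vibrato(x, fs, rate_hz=5.0, depth_ms=2.0, base_ms=10.0):
    """Vibrato via a sinusoidally modulated fractional delay line with
    linear interpolation -- the mechanism an FDN exhibits when its delay
    lines are modulated widely. Requires depth_ms < base_ms so the read
    position never runs ahead of the write position."""
    n = np.arange(len(x))
    delay = (base_ms + depth_ms * np.sin(2.0 * np.pi * rate_hz * n / fs)) * fs / 1000.0
    read = n - delay                      # fractional read position (samples)
    i = np.floor(read).astype(int)
    frac = read - i
    y = np.zeros(len(x))
    ok = i >= 0                           # before this, the delay line is empty
    y[ok] = (1.0 - frac[ok]) * x[i[ok]] + frac[ok] * x[i[ok] + 1]
    return y
```

The instantaneous pitch shift is proportional to the rate of change of the delay, so larger depths or faster rates give a wider, more audible vibrato, along with the interpolation artifacts the paper discusses.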
Deep learning approaches for beat and downbeat tracking have brought significant advances. However, these approaches continue to rely on hand-crafted, subsampled spectral features as input, restricting the information available to the model. In this work, we propose WaveBeat, an end-to-end approach for joint beat and downbeat tracking operating directly on waveforms. This method forgoes engineered spectral features and instead produces beat and downbeat predictions directly from the waveform, the first of its kind for this task. Our model utilizes temporal convolutional networks (TCNs) operating on waveforms that achieve a very large receptive field (≈30 s) at audio sample rates in a memory-efficient manner by employing rapidly growing dilation factors with fewer layers. With a straightforward data augmentation strategy, our method outperforms previous state-of-the-art methods on some datasets, while producing comparable results on others, demonstrating the potential for time-domain approaches.
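The receptive-field claim is easy to sanity-check: for a stack of 1-D convolutions where layer i uses dilation g**i, the receptive field grows geometrically with depth. A sketch (the hyperparameters in the usage lines are illustrative, not the paper's exact configuration):

```python
def tcn_receptive_field(kernel_size, n_layers, dilation_growth):
    """Receptive field (in samples) of a stack of dilated 1-D conv layers
    where layer i uses dilation dilation_growth**i."""
    rf = 1
    for i in range(n_layers):
        rf += (kernel_size - 1) * dilation_growth ** i
    return rf

# Rapidly growing dilations dwarf an undilated stack of the same depth
print(tcn_receptive_field(15, 8, 8))   # dilations 1, 8, 64, ...
print(tcn_receptive_field(15, 8, 1))   # same depth, no dilation
```

Reaching tens of seconds of context at full audio rate with only a handful of layers is what makes the waveform-domain approach tractable in memory.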