AES E-Library

AES E-Library Search Results

Search Results (Displaying 1-10 of 11 matches)
The Fast Local Sparsity Method: A Low-Cost Combination of Time-Frequency Representations Based on the Hoyer Sparsity


This paper describes a novel, low-cost method for combining time-frequency representations into a sparser one. To this end, a new local quality measure is proposed, based on an amplitude-weighted version of the so-called Hoyer sparsity. A detailed evaluation procedure, employing a dataset with nearly perfect f0 annotations of melodic signals and a set of white-noise pulses, is adopted for assessing the attained time-frequency resolution. The proposed method is shown to produce state-of-the-art results among existing combination methods in terms of energy concentration at frequency contours, onsets, and offsets, while meeting the most desirable requirements: high time-frequency resolution, low computational cost, and the capability of combining representations with nonlinear frequency scales.
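The Hoyer sparsity the abstract builds on is a standard measure, a normalized ratio of the l1 and l2 norms that ranges from 0 (flat spectrum) to 1 (all energy in one bin); the amplitude-weighted local variant is the paper's contribution and is not reproduced here. A minimal sketch of the base measure, with illustrative inputs only:

```python
import numpy as np

def hoyer_sparsity(x):
    """Hoyer sparsity: 0 for a perfectly flat vector, 1 for a one-hot vector."""
    x = np.abs(np.asarray(x, dtype=float))
    n = x.size
    l1 = x.sum()
    l2 = np.sqrt((x ** 2).sum())
    if l2 == 0.0:
        return 0.0
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)

flat = hoyer_sparsity(np.ones(16))    # 0.0: energy spread evenly over all bins
peak = hoyer_sparsity(np.eye(16)[0])  # 1.0: all energy concentrated in one bin
```

A sparser time-frequency patch scores higher, which is what lets the measure rank candidate representations locally.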

JAES Volume 70 Issue 9 pp. 698-707; September 2022


This paper costs $33 for non-members and is free for AES members and E-Library subscribers.



A Comparative Study of Music Mastered by Human Engineers and Automated Services


Technological advances in music and audio engineering have brought forth a new method of audio mastering in the form of online automatic mastering services. However, many claim that the results of automatic mastering are inferior to the work of professional human engineers. The presented investigation explores the perception of material mastered by popular services found online. Music was submitted for mastering to two human mastering engineers and two automatic mastering services. In a listening test, subjects were first asked to identify the human-mastered samples and later to provide preference rankings among human-mastered and instant-mastered samples. Furthermore, objective parameters pertaining to timbre and spectral energy distribution were calculated from the stimuli. Subjects were unable to consistently identify the human-mastered samples. A preference for human-mastered samples was observed for jazz excerpts but not for rock excerpts. These results show partial, preference-based support for claims of human mastering superiority. This study provides a new perspective on the perception of content from human and instant mastering, which may offer a first step toward understanding the differences between the two services.

JAES Volume 70 Issue 9 pp. 764-776; September 2022





Antialiasing for Simplified Nonlinear Volterra Models


Antiderivative antialiasing (ADAA) has emerged as a recent approach to reducing aliasing in mathematically defined nonlinearities. In this study, ADAA is applied to simplified nonlinear Volterra modeling, a method for black-box modeling of Hammerstein nonlinearities. Previously reported ADAA approaches contain a variable difference term in the denominator and therefore rely on a continuous piecewise function to prevent very small denominators. When applied to simplified Volterra models, however, this denominator term is eliminated, resulting in a purely polynomial function. This polynomial ADAA was tested against the standard approach of low-pass filtering the input to prevent aliasing. The two approaches were found to perform comparably, but combining them achieves superior alias reduction.
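The simplification the abstract describes can be illustrated with standard first-order ADAA on a cubic nonlinearity (an illustrative stand-in, not the paper's Volterra model): the divided difference of the antiderivative F(x) = x^4/4 expands into a polynomial in the two samples, so the variable denominator and its fallback branch disappear.

```python
import numpy as np

def adaa_cubic_divided_difference(x):
    """First-order ADAA for f(x) = x**3 via the antiderivative F(x) = x**4 / 4.
    Needs a fallback branch when successive samples are nearly equal."""
    y = np.empty_like(x)
    y[0] = x[0] ** 3
    for n in range(1, len(x)):
        a, b = x[n], x[n - 1]
        if abs(a - b) < 1e-9:              # ill-conditioned: midpoint fallback
            y[n] = ((a + b) / 2) ** 3
        else:
            y[n] = (a ** 4 / 4 - b ** 4 / 4) / (a - b)
    return y

def adaa_cubic_polynomial(x):
    """Same ADAA output in expanded polynomial form: the denominator cancels
    algebraically, so no fallback branch is needed."""
    y = np.empty_like(x)
    y[0] = x[0] ** 3
    for n in range(1, len(x)):
        a, b = x[n], x[n - 1]
        y[n] = (a ** 3 + a ** 2 * b + a * b ** 2 + b ** 3) / 4
    return y
```

The identity (a^4 - b^4) / (4(a - b)) = (a^3 + a^2 b + a b^2 + b^3) / 4 holds for any a != b, and the polynomial form also gives the correct limit a^3 when a = b.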

JAES Volume 70 Issue 9 pp. 690-697; September 2022





Conditioned Source Separation by Attentively Aggregating Frequency Transformations With Self-Conditioning

Label-conditioned source separation extracts the target source, specified by an input symbol, from an input mixture track. A recently proposed label-conditioned source separation model, the Latent Source Attentive Frequency Transformation (LaSAFT)--Gated Point-Wise Convolutional Modulation (GPoCM)--Net, introduced a block for latent source analysis called LaSAFT. Employing LaSAFT blocks, it established state-of-the-art performance on several tasks of the MUSDB18 benchmark. This paper enhances the LaSAFT block with a self-conditioning method. Whereas the existing method considers only the symbolic relationship between the target-source symbol and the latent sources, ignoring the audio content, the new approach also takes the audio content into account: the enhanced block computes its attention mask conditioned on both the label and the input audio feature map. It is shown that a conditioned U-Net employing the enhanced LaSAFT blocks outperforms the previous model, and that the present model performs audio-query--based separation with only a slight modification.
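The conditioning difference can be sketched schematically in numpy: attention weights over latent sources computed from the label embedding alone versus from the label embedding together with a summary of the audio feature map. All projections, names, and dimensions here are hypothetical; this is not the LaSAFT architecture itself.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                       # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
n_sources, d = 4, 8                       # latent sources, embedding size
source_keys = rng.normal(size=(n_sources, d))

label_emb = rng.normal(size=d)            # embedding of the target-source symbol
audio_feat = rng.normal(size=d)           # summary of the input audio feature map

# Symbol-only conditioning: attention over latent sources from the label alone.
w_symbol = softmax(source_keys @ label_emb)

# Self-conditioning: the query also sees the audio content.
W_q = rng.normal(size=(d, 2 * d)) / np.sqrt(2 * d)   # hypothetical projection
query = W_q @ np.concatenate([label_emb, audio_feat])
w_conditioned = softmax(source_keys @ query)

# Both are convex weights over latent sources; only the second adapts to the audio.
```

The same label therefore yields different source weightings for different input mixtures, which is the content-awareness the enhanced block adds.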

Open Access

JAES Volume 70 Issue 9 pp. 661-673; September 2022


Download Now (882 KB)

This paper is Open Access which means you can download it for free.



Deep Audio Effects for Snare Drum Recording Transformations


The ability to perceptually modify drum recording parameters in a post-recording process would be of great benefit to engineers limited by time or equipment. In this work, a data-driven approach to post-recording modification of the dampening and microphone-positioning parameters commonly associated with snare drum capture is proposed. The system consists of a deep encoder that analyzes the input audio and predicts optimal parameters for one or more third-party audio effects, which are then used to process the audio and produce the desired transformed output. Furthermore, two novel audio effects are developed specifically to take advantage of the system's ability to learn multiple parameters. The perceptual quality of the transformations is assessed through a subjective listening test, and an objective evaluation is used to measure system performance. The results demonstrate a capacity to emulate snare dampening; attempts to emulate microphone position changes, however, were not successful.

Open Access

JAES Volume 70 Issue 9 pp. 742-752; September 2022


Download Now (676 KB)




Loudspeaker Equalization for a Moving Listener


When a person listens to loudspeakers, the perceived sound is affected not only by the loudspeaker properties but also by the acoustics of the surroundings. Loudspeaker equalization can be used to correct the loudspeaker-room response. However, when the listener moves in front of the loudspeakers, both the loudspeaker response and the room effect change. So that the best correction is achieved at all times, this paper proposes adaptive equalization: a loudspeaker-correction system that uses the listener's current location to determine the correction parameters. The position of the listener's head is tracked with a depth-sensing camera, and suitable equalizer settings are then selected based on measurements and interpolation. By correcting the loudspeaker's response at multiple locations and changing the equalization in real time based on the user's location, a loudspeaker response with reduced coloration is achieved compared to no calibration or conventional calibration methods, with magnitude-response deviations decreasing from 10.0 to 5.6 dB within the passband of a high-quality loudspeaker. The proposed method can improve audio monitoring in music studios and other situations in which a single listener moves within a restricted space.
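One common way to realize "selected based on measurements and interpolation" is to interpolate per-band equalizer gains linearly between calibrated positions. The positions and gain values below are hypothetical placeholders, not the paper's measurements:

```python
import numpy as np

# EQ gains (dB per band) measured at calibration positions along the
# listening axis; at run time, settings for the tracked head position are
# linearly interpolated between the two nearest measurements.
positions = np.array([0.5, 1.0, 1.5, 2.0])   # distance from loudspeaker (m)
eq_gains = np.array([                         # per-band correction gains, dB
    [-2.0,  1.0,  0.5, -1.0],
    [-1.0,  0.5,  0.0, -0.5],
    [ 0.0,  0.0, -0.5,  0.0],
    [ 0.5, -0.5, -1.0,  0.5],
])

def eq_for_position(pos):
    """Interpolate per-band EQ gains at the listener's current position."""
    pos = np.clip(pos, positions[0], positions[-1])   # stay inside the grid
    return np.array([np.interp(pos, positions, eq_gains[:, b])
                     for b in range(eq_gains.shape[1])])

settings = eq_for_position(1.25)   # halfway between the 1.0 m and 1.5 m rows
```

Smooth interpolation avoids audible jumps in the correction filter as the tracked head position moves between measurement points.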

Open Access

JAES Volume 70 Issue 9 pp. 722-730; September 2022


Download Now (727 KB)




Nyquist Band Transform: An Order-Preserving Transform for Bandlimited Discretization


This article proposes a method for discretizing continuous-time infinite impulse response filters with features near or above the Nyquist limit. The proposed method, called the Nyquist Band Transform (NBT), uses conformal mapping to pre-map a prototype continuous-time system such that, when it is discretized through the bilinear transform, its frequency response is effectively truncated at the Nyquist limit. The discretized system shows little frequency warping compared with the original continuous-time magnitude response. The NBT is order-preserving, parametrizable, and agnostic to the original system's design. Its efficacy is demonstrated through a virtual analog modeling application.
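The warping the NBT compensates for comes from the bilinear transform's standard frequency mapping, omega_d = (2/T) arctan(omega_a T / 2), which squeezes the entire analog frequency axis below Nyquist. The sketch below only shows where analog features land after a plain bilinear transform; the NBT's conformal pre-map itself is not reproduced here.

```python
import numpy as np

fs = 48_000.0
T = 1.0 / fs
nyquist = fs / 2

def bilinear_warped_freq(f_analog):
    """Frequency (Hz) where an analog feature at f_analog lands after the
    plain bilinear transform (no prewarping)."""
    w_a = 2 * np.pi * f_analog
    w_d = (2.0 / T) * np.arctan(w_a * T / 2.0)
    return w_d / (2 * np.pi)

# Low frequencies map almost unchanged, but a resonance placed near or
# above Nyquist is warped well below it:
for f in (1_000.0, 20_000.0, 24_000.0, 30_000.0):
    print(f"{f:8.0f} Hz -> {bilinear_warped_freq(f):8.1f} Hz"
          f" (Nyquist = {nyquist:.0f} Hz)")
```

This severe compression of near-Nyquist features is exactly the regime where an order-preserving pre-map becomes attractive.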

JAES Volume 70 Issue 9 pp. 674-689; September 2022





Phase-Aware Transformations in Variational Autoencoders for Audio Effects


This paper analyzes the impact of signal-phase handling in one of the most popular architectures for the generative synthesis of audio effects: variational autoencoders (VAEs). Until quite recently, autoencoders based on the Fast Fourier Transform routinely avoided the phase of the signal, either storing the phase information and retrieving it at the output or relying on phase regenerators such as Griffin--Lim. We evaluate different VAE networks capable of generating a latent space with intrinsic information from both signal amplitude and phase. The Modulated Complex Lapped Transform (MCLT) is evaluated as an alternative to the Short-Time Fourier Transform (STFT). A novel database of beats was designed for testing the architectures. Results were objectively assessed (reconstruction errors and objective metrics approximating opinion scores) for autoencoders on STFT and MCLT representations, using Griffin--Lim phase regeneration and multichannel networks, as well as the Complex VAE. The autoencoders successfully learned to represent the phase information and handle it in a holistic approach. State-of-the-art quality standards were reached for audio effects. The autoencoders show a remarkable ability to generalize and deliver new sounds, while overall quality depends on the reconstruction of both phase and amplitude.

JAES Volume 70 Issue 9 pp. 731-741; September 2022





Style Transfer of Audio Effects with Differentiable Signal Processing


This work presents a framework that imposes the audio effects and production style of one recording onto another by example, with the goal of simplifying the audio production process. A deep neural network is trained to analyze an input recording and a style reference recording and to predict the control parameters of the audio effects used to render the output. In contrast to past work, this approach integrates audio effects as differentiable operators, enabling backpropagation through the effects and end-to-end optimization with an audio-domain loss. Pairing this framework with a self-supervised training strategy enables automatic control of audio effects without any labeled or paired training data. A survey of existing and new approaches to differentiable signal processing is presented, demonstrating how each can be integrated into the proposed framework, along with a discussion of their trade-offs. The approach is evaluated on both speech and music tasks, demonstrating generalization to unseen recordings and even to sample rates different from those used during training. Convincing production style transfer results are demonstrated, with the ability to transform input recordings into produced recordings, yielding audio-effect control parameters that enable interpretability and user interaction.
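The end-to-end idea can be reduced to a toy example: a single differentiable gain effect whose parameter is optimized against an audio-domain loss by gradient descent, with the gradient written out by hand. This is a sketch of the principle only, not the paper's network or effect chain.

```python
import numpy as np

# Toy "differentiable audio effect" optimization: recover the gain applied
# to a reference recording by descending an audio-domain loss through the
# effect (real systems replace the gain with EQ, compression, etc. and
# predict parameters with a neural network).
rng = np.random.default_rng(1)
x = rng.normal(size=1024)          # input recording
y = 0.5 * x                        # style reference: same audio with gain 0.5

g = 2.0                            # initial effect parameter
lr = 0.5
for _ in range(100):
    err = g * x - y                # render through the effect, compare audio
    grad = 2.0 * np.mean(err * x)  # d/dg of the mean squared error
    g -= lr * grad                 # gradient step on the effect parameter

# g converges to the reference gain of 0.5
```

Because the effect is differentiable, no paired parameter labels are needed; the audio-domain loss alone drives the parameter to the value that reproduces the reference.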

Open Access

JAES Volume 70 Issue 9 pp. 708-721; September 2022


Download Now (450 KB)




The Dynamic Grid: Time-Varying Parameters for Musical Instrument Simulations Based on Finite-Difference Time-Domain Schemes


Several well-established approaches to physical modeling synthesis for musical instruments exist. Finite-difference time-domain methods are known for their generality and flexibility in terms of the systems one can model but are less flexible with regard to smooth parameter variations because of their reliance on a static grid. This paper presents the dynamic grid, a method to smoothly change the grid configuration of finite-difference time-domain schemes based on sub-audio--rate time variation of parameters. This allows the behavior of physical models to be extended beyond the physically possible, broadening the range of expressive possibilities for the musician. The method is applied to the 1D wave equation, the stiff string, and 2D systems including the 2D wave equation and the thin plate. Results show that the method does not introduce noticeable artifacts when changing between grid configurations, even for systems that include loss.
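A static-grid finite-difference time-domain scheme of the kind the paper extends can be sketched for the 1D wave equation u_tt = c^2 u_xx; the standard explicit scheme below fixes the grid spacing from the wave speed and sample rate at startup, which is exactly the rigidity the dynamic grid removes. The dynamic-grid method itself is not reproduced here.

```python
import numpy as np

# Static-grid FDTD for the 1D wave equation with fixed (Dirichlet) ends.
c, L, fs = 343.0, 1.0, 44_100   # wave speed (m/s), length (m), sample rate
k = 1.0 / fs                    # time step
h = c * k                       # grid spacing chosen at the stability limit
lam = c * k / h                 # Courant number; stability requires lam <= 1
N = int(L / h)                  # number of grid intervals

u_prev = np.zeros(N + 1)
u = np.zeros(N + 1)
u[N // 2] = 1.0                 # impulse excitation at the midpoint

for n in range(200):            # time steps
    u_next = np.zeros(N + 1)
    u_next[1:-1] = (2 * u[1:-1] - u_prev[1:-1]
                    + lam**2 * (u[2:] - 2 * u[1:-1] + u[:-2]))
    u_prev, u = u, u_next       # endpoints stay 0: Dirichlet boundaries
```

Changing c (or other parameters) mid-simulation changes the stable grid spacing h, so the number of grid points N would have to change too; interpolating state between such grid configurations smoothly is the problem the dynamic grid addresses.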

Open Access

JAES Volume 70 Issue 9 pp. 650-660; September 2022


Download Now (540 KB)




AES - Audio Engineering Society