
AES E-Library Search Results

Search Results (Displaying 1-10 of 20 matches)

 

Style Transfer for Non-differentiable Audio Effects


Digital audio effects are widely used by audio engineers to alter the acoustic and temporal qualities of audio data. However, these effects can have a large number of parameters, which can make them difficult for beginners to learn and can hamper creativity for professionals. Recently, there have been a number of efforts to apply advances in deep learning to acquiring the low-level parameter configurations of audio effects by minimising an objective function between an input and a reference track, a task commonly referred to as style transfer. However, current approaches use inflexible black-box techniques or require that the effects under consideration be implemented in an auto-differentiation framework. In this work, we propose a deep learning approach to audio production style matching that can be used with effects implemented in some of the most widely used frameworks, requiring only that the parameters under consideration have a continuous domain. Further, our method supports style matching for various classes of effects, many of which are difficult or impossible to approximate closely with differentiable functions. We show that our audio embedding approach creates logical encodings of timbral information, which can be used for a number of downstream tasks. Finally, we report a listening test demonstrating that our approach can convincingly style match a multi-band compressor effect.
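The core idea of style matching a non-differentiable effect can be sketched with a toy gradient-free search. Here a hard clipper stands in for a non-differentiable effect, and a log-spectral distance between the processed input and the reference is minimised over a continuous parameter domain by simple grid search. Everything in this sketch (the effect, the loss, the search) is an illustrative assumption, not the paper's learned method:

```python
import numpy as np

# Hypothetical non-differentiable "effect": a hard clipper whose threshold
# cannot be learned by backpropagation through the effect itself.
def hard_clip(x, threshold):
    return np.clip(x, -threshold, threshold)

def spectral_loss(a, b):
    # Distance between log-magnitude spectra, a stand-in for the
    # style-matching objective between processed input and reference.
    A = np.abs(np.fft.rfft(a)) + 1e-8
    B = np.abs(np.fft.rfft(b)) + 1e-8
    return float(np.mean((np.log(A) - np.log(B)) ** 2))

def match_parameter(x, reference, grid):
    # Black-box (gradient-free) search over a continuous parameter domain;
    # the paper uses a learned model, this grid search is only a sketch.
    losses = [spectral_loss(hard_clip(x, t), reference) for t in grid]
    return grid[int(np.argmin(losses))]

rng = np.random.default_rng(0)
x = rng.standard_normal(2048)
reference = hard_clip(x, 0.5)        # "reference track" made with an unknown threshold
grid = np.linspace(0.1, 1.0, 10)
est = match_parameter(x, reference, grid)   # recovers the 0.5 threshold
```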


This paper costs $33 for non-members and is free for AES members and E-Library subscribers.

Subject: Signal Processing


Analysis of Musical Spectral Distortions as a Rating Mechanism for High-Fidelity Earplugs


High-fidelity earplugs are generally assumed to preserve the musical characteristics of attenuated sound reaching the eardrums better than general-purpose earplugs by having flatter spectral attenuation (or insertion loss) characteristics. However, as all earplugs display non-flat insertion loss characteristics, quantifying and comparing spectral distortion magnitude is non-trivial: a simple flatness calculation does not capture how relevant the different types of spectral distortion are to musical perception. In this paper, a method is proposed that compares earplug spectral distortion by its effect on commonly used spectral measures of timbre, computed over a collection of audio files: centroid, spread, skewness, kurtosis, flatness, and brightness. This musical spectral distortion score is shown to have very little correlation with a simple metric evaluating insertion loss flatness, and moderate correlation with simple average insertion loss in decibels, while also showing that a foam earplug is more distorting than any of the musician earplugs.
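The first four of the spectral measures named above (centroid, spread, skewness, kurtosis) are moments of the magnitude spectrum treated as a distribution over frequency. A minimal sketch, using a toy source spectrum and a hypothetical tilted insertion loss rather than anything from the paper, shows how a non-flat attenuation shifts these measures:

```python
import numpy as np

def spectral_moments(mag, freqs):
    # Centroid, spread, skewness, kurtosis of a magnitude spectrum,
    # treating the normalised spectrum as a probability distribution.
    p = mag / np.sum(mag)
    centroid = np.sum(freqs * p)
    spread = np.sqrt(np.sum(((freqs - centroid) ** 2) * p))
    skew = np.sum(((freqs - centroid) ** 3) * p) / spread ** 3
    kurt = np.sum(((freqs - centroid) ** 4) * p) / spread ** 4
    return centroid, spread, skew, kurt

freqs = np.linspace(0.0, 8000.0, 257)       # Hz
mag = np.exp(-freqs / 2000.0)               # toy source spectrum
# Hypothetical earplug with a tilted (non-flat) insertion loss in dB:
insertion_loss_db = freqs / 8000.0 * 20.0
attenuated = mag * 10.0 ** (-insertion_loss_db / 20.0)

c_open, *_ = spectral_moments(mag, freqs)
c_plug, *_ = spectral_moments(attenuated, freqs)
# The high-frequency tilt lowers the centroid, i.e. darkens the timbre,
# even though a flat attenuation of any depth would leave it unchanged.
```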


Subject: Applications in Audio


Dynamic Polar Patterns: Advancing Recordist Agency via Dual-Output Microphones


While multichannel mediation continues to grow in popularity, traditional mono and stereophonic recording techniques remain those underpinning audio production workflows. By incorporating dual-output microphone technology into established practices, capacity exists for nuancing recordist agency in ways not documented in existing literature. The Dynamic Polar Pattern is introduced as a simple process for simulating polar patterns that change shape over time, with affordances associated with proximity effect, distance factor, frequency masking, and stereo width. A practice-led and practice-based methodology catalogues the benefits of dual-output agency, including the ability to capture multiple stereo techniques simultaneously, pedagogical attribute demonstration, rear-output panning, performance panning, sample packaging, and DIY microphone modelling. An overarching position on “Why employ dual-output microphones?” is interrogated alongside technical data.

Open Access



Download Now (1.4 MB)

This paper is Open Access, which means you can download it for free.

Subject: Recording and Production


Generative Machine Listener


We show how a neural network can be trained on individual intrusive listening test scores to predict a distribution of scores for each pair of reference and coded input stereo or binaural signals. We nickname this method the Generative Machine Listener (GML), as it is capable of generating an arbitrary amount of simulated listening test data. Compared to a baseline system using regression over mean scores, we observe lower outlier ratios (OR) for the mean score predictions, and obtain easy access to the prediction of confidence intervals (CI). The introduction of data augmentation techniques from the image domain results in a significant increase in CI prediction accuracy as well as Pearson and Spearman rank correlation of mean scores.


Subject: Signal Processing


Detection of phase alignment and polarity in drum tracks


A time-shift applied to individual tracks that removes timing differences between microphones, called “phase alignment,” is frequently promoted as a way to improve the clarity and definition of live-recorded drum tracks. Common techniques include manual and automated micro-timing adjustments and switching the electrical polarity of “problem” tracks. This study aimed to determine whether there is a clearly audible difference between an original and a corrected recording. Using a paired comparison test, listeners were asked whether two audio samples were the same or different and, later, which of the two they preferred. The evidence presented here questions the tacit assumption that time-shift techniques greatly improve, or even appreciably alter, the perceived quality of a drum mix.
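The automated micro-timing adjustment mentioned above is typically done by cross-correlating two tracks and removing the lag at the correlation peak. A minimal sketch under simplified assumptions (a synthetic delay, circular shifting, no polarity handling), not taken from the study:

```python
import numpy as np

def align_tracks(close_mic, overhead, max_lag=256):
    # Estimate the inter-microphone lag by brute-force cross-correlation
    # over +/- max_lag samples, then time-shift the overhead track to remove it.
    lags = np.arange(-max_lag, max_lag + 1)
    xc = [np.dot(close_mic[max_lag:-max_lag],
                 np.roll(overhead, -l)[max_lag:-max_lag]) for l in lags]
    best = lags[int(np.argmax(xc))]
    return np.roll(overhead, -best), best

rng = np.random.default_rng(1)
drum = rng.standard_normal(4096)      # toy close-mic signal
overhead = np.roll(drum, 40)          # simulated 40-sample propagation delay
aligned, lag = align_tracks(drum, overhead)   # recovers the 40-sample lag
```

Whether removing such a lag audibly improves the mix is precisely what the study's listening test puts into question.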

Open Access



Download Now (1.2 MB)


Subject: Applications in Audio


Convolutional Transformer for Neural Speech Coding


In this paper, we propose a Convolutional-Transformer speech codec which utilizes stacks of convolutions and self-attention layers to remove redundant information at the downsampling and upsampling blocks of a U-Net-style encoder-decoder neural codec architecture. We design the Transformers to use channel and temporal attention with any number of attention stages and heads while maintaining causality. This allows us to take into consideration the characteristics of the input vectors and flexibly utilize temporal and channel-wise relationships at different scales when encoding the salient information that is present in speech. This enables our model to reduce the dimensionality of its latent embeddings and improve its quantization efficiency while maintaining quality. Experimental results demonstrate that our approach achieves significantly better performance than convolution-only baselines.


Subject: Signal Processing


Improved Panning on Non-Equidistant Loudspeakers with Direct Sound Level Compensation


Loudspeaker rendering techniques that create phantom sound sources often assume an equidistant loudspeaker layout. Typical home setups might not fulfill this condition as loudspeakers deviate from canonical positions, thus requiring a corresponding calibration. The standard approach is to compensate for delays and to match the loudness of each loudspeaker at the listener’s location. It was found that a shift of the phantom image occurs when this calibration procedure is applied and one of a pair of loudspeakers is significantly closer to the listener than the other. In this paper, a novel approach to panning on non-equidistant loudspeaker layouts is presented whereby the panning position is governed by the direct sound and the perceived loudness is governed by the full impulse response. Subjective listening tests are presented that validate the approach and quantify the perceived effect of the compensation. In a setup where the standard calibration leads to an average error of 10°, the proposed direct sound compensation largely returns the phantom source to its intended position.
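The standard calibration the paper improves upon can be sketched as follows: each loudspeaker is delayed so that all direct sounds arrive together, and gains equalize level at the listening position. The 1/d free-field level model and the specific distances are illustrative assumptions, not values from the paper:

```python
import numpy as np

C = 343.0  # speed of sound in air, m/s

def standard_calibration(distances):
    # Standard delay/level calibration for non-equidistant loudspeakers:
    # delay each speaker so all arrivals coincide with the farthest one,
    # and (under a 1/d free-field level model) cut the gain of closer
    # speakers to match levels at the listening position.
    d = np.asarray(distances, dtype=float)
    delays = (d.max() - d) / C          # seconds of added delay per speaker
    gains = d / d.max()                 # linear gain, farthest speaker = 1.0
    return delays, gains

# Hypothetical stereo pair: left speaker at 2.0 m, right at 3.5 m.
delays, gains = standard_calibration([2.0, 3.5])
```

The paper's finding is that this calibration alone still shifts the phantom image when the distance mismatch is large, motivating the proposed direct-sound-based compensation.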


Subject: Recording and Production


A Novel Digital Audio Network for Musical Instruments


We present the rationale behind the development of a new digital audio network for professional audio consumers, musicians, and audio equipment manufacturers. The network aims to minimize cost while simplifying system complexity, instrument connectivity, digital transmission, channel count, bus-power capability, and equipment manufacturer variants to serve the needs of today’s musicians. This paper presents a physical layer and digital protocol intended for careful study and further refinement by the industry.


Subject: Signal Processing


Vocal Affects Perceived from Spontaneous and Posed Speech


This study examines listeners’ natural ability to identify an anonymous speaker’s emotions from speech samples with broad ranges of emotional intensity. It compares emotional ratings between posed and spontaneous speech samples and analyzes how basic acoustic parameters are utilized. The spontaneous samples were extracted from the Korean Spontaneous Speech corpus, which consists of casual conversations. The posed samples with emotions (happiness, neutrality, anger, sadness) were obtained from the Emotion Classification dataset. Non-native listeners were asked to evaluate seven opposite pairs of affective attributes perceived from the speech samples. Listeners perceived fewer spontaneous samples as having negative valence. The posed samples had higher mean rating scores than the spontaneous samples only for negative valence. Listeners reacted more sensitively to posed than to spontaneous speech in negative valence and had difficulty detecting happiness in the posed samples. The spontaneous samples perceived as positive had higher pitch variance and higher maximum pitch than those perceived as negative. In contrast, for the posed samples, perceived negative valence correlated positively with higher values of the pitch parameters. These results can be utilized to assign specific vocal affects to artificial-intelligence voice agents or virtual humans, rendering more human-like voices.

Open Access



Download Now (3.8 MB)


Subject: Signal Processing


Perceptual Comparison of Dynamic Binaural Reproduction Methods for Head-Mounted Microphone Arrays


This paper presents results of a listening experiment evaluating three-degrees-of-freedom binaural reproduction of head-mounted microphone array signals. The methods are applied to an array of five microphones whose signals were simulated for static and dynamic array orientations. Methods under test involve scene-agnostic binaural reproduction methods as well as methods that have knowledge of (a subset of) source directions. In the results, the end-to-end magnitude-least-squares reproduction method outperforms other scene-agnostic approaches. Above all, linearly constrained beamformers using known source directions in combination with the end-to-end magnitude-least-squares method outcompete the scene-agnostic methods in perceived quality, especially for a rotating microphone array under anechoic conditions.


Subject: Immersive & Spatial Audio


AES - Audio Engineering Society