AES Journal

Journal of the Audio Engineering Society


The Journal of the Audio Engineering Society — the official publication of the AES — is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers.

The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; preview and review reports of AES conventions and other society activities; news from AES sections around the world; Standards and Education Committee work; and membership news, patents, new products, and newsworthy developments in the field of audio.


Current Issue: 2016 July/August - Volume 64 Number 7/8

Guest Editors’ Note, Special Issue on Intelligent Audio Processing, Semantics, and Interaction

Author: George M. Kalliris, Charalampos A. Dimoulas, and Christian Uhle

Page: 464

Download: PDF (33KB)

Papers

Variation in Multitrack Mixes: Analysis of Low-level Audio Signal Features

Open Access



Understanding how a human mixing engineer functions is necessary for the design of intelligent music production tools. The goal of such tools is to generate mixes that could realistically have been created by a human mixing engineer. This paper presents an analysis of 1501 mixes of 10 different songs, created by mixing engineers; the number of mixes per song ranged from 97 to 373. A variety of objective signal features were extracted and principal component analysis was performed, revealing four dimensions of mix variation, which can be described as amplitude, brightness, bass, and width. The feature distributions suggest multimodal behavior dominated by one specific mode; this distribution appears to be independent of the choice of song, though with variation in modal parameters. These distributions are then used to obtain general trends and tolerance bounds for the features. The results are useful as parametric guidance for intelligent music production systems and provide insight into the creative decision-making processes of mixing engineers.
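The principal component analysis described in the abstract can be sketched as follows. The feature matrix, feature count, and component interpretation here are illustrative placeholders, not the paper's actual data or pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature matrix: 100 mixes x 8 low-level features
# (e.g., loudness, spectral centroid, bass energy, stereo width).
X = rng.normal(size=(100, 8))

# Centre the features, then take principal components via SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of total variance explained by each component.
explained = s**2 / np.sum(s**2)

# Project each mix onto the first four components (the paper interprets
# its four dimensions as amplitude, brightness, bass, and width).
scores = Xc @ Vt[:4].T
print(scores.shape)
```

On real data, the per-component `explained` values would indicate how many dimensions of mix variation are worth interpreting.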

  Download: PDF (HIGH Res) (1.1MB)

  Download: PDF (LOW Res) (566KB)


Organizing a Sonic Space through Vocal Imitations

Open Access



This research investigates how the vocal mimicking capabilities of humans may be exploited to access and explore a given sonic space. Experiments showed that prototype vocal sounds can be represented in a two-dimensional space and still remain perceptually distinct from each other. The experiments provide a measure of how meaningful the machine distribution and grouping of vocal sounds are to humans, and confirm that humans are able to effectively use the acoustic and articulatory cues at their disposal to associate sounds with given prototypes. When used in an automatic clustering process, these cues are sufficiently consistent with those used by humans when categorizing acoustic phenomena. The procedure of dimensionality reduction and clustering is demonstrated for imitations of engine sounds, which then represent the sonic space of a motor sound model. A two-dimensional space is particularly attractive for sound design because it can be used as a sonic map whose landmarks contain both a synthetic sound and its vocal imitation.
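A minimal sketch of the clustering stage, assuming invented 2-D embeddings of imitations and plain k-means; the paper's actual dimensionality-reduction and clustering methods are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-D embedding of vocal imitations: three point groups
# standing in for imitations of three engine-sound prototypes.
pts = np.vstack([
    rng.normal(loc=c, scale=0.1, size=(20, 2))
    for c in [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
])

def kmeans(x, k, iters=50, seed=2):
    """Plain Lloyd's algorithm: assign points to nearest center, update."""
    r = np.random.default_rng(seed)
    centers = x[r.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers) ** 2).sum(-1), axis=1)
        # Keep the old center if a cluster goes empty.
        centers = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

labels, centers = kmeans(pts, 3)
```

Each resulting cluster would serve as one landmark of the sonic map, pairing a synthetic sound with its vocal imitations.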

  Download: PDF (HIGH Res) (6.0MB)

  Download: PDF (LOW Res) (432KB)


Soundscape Audio Signal Classification and Segmentation Using Listeners' Perception of Background and Foreground Sound


A soundscape recording captures the sonic environment at a given location at a given time using one or more fixed or moving microphones. In most cases, the soundscape is uncontrolled and unscripted. Human listeners experience sonic components as being either background or foreground depending on their salient perceptual characteristics, such as proximity, repetition, and spectral attributes. Analyzing soundscapes in research tasks requires the classification and segmentation of the important sonic components, but that process is time consuming when done manually. This research establishes the background and foreground classification task within a musicological and soundscape context and then presents a method for the automatic segmentation of soundscape recordings. Using a soundscape corpus with ground truth data obtained from a human perception study, the analysis shows that participants have a high level of agreement on the category assigned to background samples (92.5%), foreground samples (80.8%), and background with foreground samples (75.3%). Experiments demonstrate how smaller window sizes affect the performance of the classifier.
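A minimal sketch of window-based background/foreground labeling, assuming a simple RMS-energy threshold. The paper's actual classifier and features are more sophisticated; the signal, window size, and threshold below are invented for illustration:

```python
import numpy as np

sr = 16000
rng = np.random.default_rng(3)

# Hypothetical soundscape: quiet background noise with a louder
# foreground event in the middle second (stands in for a recording).
sig = rng.normal(scale=0.01, size=sr * 3)
sig[sr:2 * sr] += 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)

def segment(x, sr, win=0.25, thresh=0.05):
    """Label each analysis window background/foreground by RMS energy."""
    n = int(win * sr)
    frames = x[: len(x) // n * n].reshape(-1, n)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return ["foreground" if r > thresh else "background" for r in rms]

labels = segment(sig, sr)
print(labels)
```

Shrinking `win` gives finer segment boundaries at the cost of noisier per-window decisions, which is the window-size trade-off the abstract alludes to.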

  Download: PDF (HIGH Res) (1.2MB)

  Download: PDF (LOW Res) (162KB)


Multipath Beat Tracking


Most music compositions evolve according to an underlying unit of time, sometimes implied and sometimes audible, called the beat. Beat trackers are essential components of rhythmic analysis systems in a wide range of applications involving musical information extraction. The authors describe a new methodology for tracking the sequence of beat instants given the ODF (Onset Detection Function) and the IBI (Inter Beat Interval) estimate. This alternate solution is based on a divide-and-conquer strategy that concurrently manages a multitude of simpler trackers from a rule-based perspective, effectively performing the search only for the most promising beat candidates. As confirmed by the experimental results, the performance of this method improves when the number of simple trackers (paths) is increased. Compared to dynamic programming, this approach is preferable in terms of computational efficiency while yielding accuracy scores that are comparable with many known beat trackers.
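A toy illustration of the multipath idea, under the assumption of a greedy snap-to-nearest-peak rule per tracker; the authors' actual rule set is not reproduced here, and the onset positions and IBI are invented:

```python
import numpy as np

# Hypothetical ODF peaks at 100 frames/s with an IBI estimate of 50
# frames; peak positions are jittered around a perfect 50-frame grid.
ibi = 50
onsets = np.array([0, 49, 101, 150, 199, 251, 300])

def track(start, onsets, ibi, n_beats):
    """Greedy single-path tracker: from a start candidate, step one IBI
    at a time, snap to the nearest ODF peak, and accumulate the error."""
    path, err, t = [start], 0.0, start
    for _ in range(n_beats - 1):
        target = t + ibi
        nearest = int(onsets[np.argmin(np.abs(onsets - target))])
        err += abs(nearest - target)
        path.append(nearest)
        t = nearest
    return path, err

# Multipath: launch one simple tracker per plausible start candidate
# and keep the path with the lowest accumulated error.
candidates = [int(o) for o in onsets[:3]]
paths = [track(c, onsets, ibi, 6) for c in candidates]
best_path, best_err = min(paths, key=lambda p: p[1])
print(best_path)  # → [0, 49, 101, 150, 199, 251]
```

Increasing the number of start candidates (paths) widens the search, which matches the abstract's observation that performance improves with more simple trackers.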

  Download: PDF (HIGH Res) (986KB)

  Download: PDF (LOW Res) (394KB)


An Intelligent Interface for Drum Pattern Variation and Comparative Evaluation of Algorithms

Open Access



Drum tracks are a central and style-defining element of electronic dance music, but creating them can be a cumbersome task because of a lack of appropriate tools and input devices. The authors created a tool that supports musicians in an intuitive way in creating variations of drum patterns or finding inspiration for new patterns. Starting from a basic seed pattern provided by the user, a list of variations with varying degrees of similarity to the seed is generated. The variations are created using one of three algorithms: a similarity-based lookup method using a rhythm pattern database, a generative approach based on a stochastic neural network, and a genetic algorithm using similarity measures as the target function. Expert users in electronic music production evaluated aspects of the prototype and algorithms. In addition, a web-based survey was performed to assess perceptual properties of the variations in comparison to baseline patterns created by a human expert. The study shows that the algorithms produce musical and interesting variations and that the different algorithms have their strengths in different areas.
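The genetic-algorithm variant can be sketched as follows, assuming a binary 16-step pattern and a simple step-agreement similarity as the target function; the pattern, mutation rates, and population sizes are invented for illustration:

```python
import random

random.seed(4)

STEPS = 16
# Hypothetical seed kick pattern provided by the user.
seed_pattern = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0]

def similarity(a, b):
    """Fraction of steps on which two patterns agree."""
    return sum(x == y for x, y in zip(a, b)) / STEPS

def fitness(p, target=0.75):
    """A variation is fit when its similarity to the seed is near the
    requested target (here: 75% agreement)."""
    return -abs(similarity(p, seed_pattern) - target)

def mutate(p, rate=0.1):
    """Flip each step with a small probability."""
    return [(1 - s) if random.random() < rate else s for s in p]

# Tiny genetic loop: keep the fittest half, refill with mutated copies.
pop = [mutate(seed_pattern, 0.3) for _ in range(20)]
for _ in range(30):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:10] + [mutate(p) for p in pop[:10]]

best = max(pop, key=fitness)
```

Varying `target` yields the list of variations with different degrees of similarity to the seed that the abstract describes.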

  Download: PDF (HIGH Res) (6.3MB)

  Download: PDF (LOW Res) (547KB)


Improving Multilingual Interaction for Consumer Robots through Signal Enhancement in Multichannel Speech


In order for social robots to be truly successful, they need the ability to communicate orally with humans, providing feedback and accepting commands. Social robots need automatic speech recognition (ASR) tools that function with different users, using different languages, voice pitches, pronunciations, and speech speeds over a wide range of sound and noise levels. This paper describes different methodologies for voice activity detection and noise elimination when used for ASR-based oral interaction on an affordable, low-budget robot. Acoustically quasi-stationary environments are assumed, which, in conjunction with the high background noise of the robot's microphones, makes ASR challenging. This work has been performed in the context of project RAPP, which aims to deliver a cloud repository of applications and services that can be utilized by heterogeneous robots, with the goal of assisting people with a range of disabilities. Results show that noise estimation and elimination techniques are necessary for successfully performing ASR in environments with quasi-stationary noise.
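One common noise-elimination technique for quasi-stationary noise is magnitude spectral subtraction. The sketch below is not the paper's specific method: it assumes a noise-only reference segment for the noise profile, and the signals are invented:

```python
import numpy as np

rng = np.random.default_rng(5)
sr = 8000

# Hypothetical noisy capture: a tone (standing in for speech) buried
# in quasi-stationary noise, with the same noise available separately
# as a noise-only reference (e.g., from speech-free frames).
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 300 * t)
noise = rng.normal(scale=0.5, size=sr)
noisy = clean + noise

def spectral_subtract(x, noise_ref, n_fft=256):
    """Subtract the average noise magnitude from each frame's spectrum,
    floor at zero, and resynthesize with the noisy phase."""
    frames = x[: len(x) // n_fft * n_fft].reshape(-1, n_fft)
    ref = noise_ref[: len(noise_ref) // n_fft * n_fft].reshape(-1, n_fft)
    noise_mag = np.abs(np.fft.rfft(ref)).mean(axis=0)
    out = []
    for f in frames:
        spec = np.fft.rfft(f)
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        out.append(np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=n_fft))
    return np.concatenate(out)

enhanced = spectral_subtract(noisy, noise)
```

In a deployed system, the noise profile would be updated from frames that a voice activity detector marks as speech-free, which is where the paper's VAD component comes in.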

  Download: PDF (HIGH Res) (7.9MB)

  Download: PDF (LOW Res) (285KB)


On the Impact of the Semantic Content of Sound Events in Emotion Elicitation


Sound events are known to have an influence on the listener's emotions, but the reason for this influence is less clear. Take, for example, the sound produced by a gun firing. Does the emotional impact arise from the fact that the listener recognizes that a gun produced the sound (semantic content), or does it arise from the acoustic attributes of the sound itself? This research explores the relation between the semantic similarity of sound events and the elicited emotions. Results indicate that the semantic content seems to have a limited role in the formation of the listener's affective states. However, when the semantic content is matched to specific areas of the Arousal-Valence space, or when the source's spatial position is considered, the effect of the semantic content is stronger, especially for medium to low valence and medium to high arousal, or when the sound source is positioned lateral to the listener's head.

  Download: PDF (HIGH Res) (877KB)

  Download: PDF (LOW Res) (284KB)


Supervised and Unsupervised Sound Retrieval by Vocal Imitation


Existing methods to index and search audio documents are generally based on text metadata and text-based search engines, but this approach is often problematic and time consuming because the text label does not necessarily describe the audio content. Query by Example (QBE) is an alternative approach for improving the effectiveness and efficiency of sound retrieval. In this research, the authors propose a novel approach to sound query by vocal imitation. Vocal imitation is commonly used in human communication and can be employed for human-computer interaction. Two systems are proposed: (1) a supervised system that trains a multiclass classifier using training vocal imitations of different sound classes in the library and classifies a new imitation query into one of the classes; (2) an unsupervised system that is more flexible because it measures the feature distance between the imitation query and each sound in the library, returning the sounds most similar to the query. Such systems require an effective feature representation of imitation queries and sounds in the library. Existing handcrafted audio features may not work well given the variety of vocal imitations and the mismatch between vocal imitations and actual sounds. The authors therefore propose to learn feature representations from training vocal imitations automatically using a Stacked Auto-Encoder (SAE). Experiments show that sound retrieval using the automatically learned features outperforms retrieval using carefully handcrafted features in both supervised and unsupervised settings.
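The unsupervised system's retrieval step can be sketched as a nearest-neighbor search over feature vectors; the features below are random placeholders standing in for SAE-learned representations:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical learned feature vectors: 5 library sounds and one
# vocal-imitation query, each 32-dimensional; the query is constructed
# to resemble library sound 2.
library = rng.normal(size=(5, 32))
query = library[2] + rng.normal(scale=0.1, size=32)

def rank_by_similarity(query, library):
    """Rank library sounds by cosine similarity of their feature
    vectors to the query's feature vector (most similar first)."""
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    return np.argsort(-(lib @ q))

ranking = rank_by_similarity(query, library)
print(ranking[0])  # sound 2 should rank first
```

Because no classifier is trained, new sounds can be added to the library at any time, which is the flexibility the abstract attributes to the unsupervised system.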

  Download: PDF (HIGH Res) (824KB)

  Download: PDF (LOW Res) (605KB)


Features

140th Convention Report, Paris

Page: 544

Download: PDF (1.4MB)

140th Convention Exhibitors and Sponsors

Page: 553

Download: PDF (216KB)

141st Convention Preview, Los Angeles

Page: 556

Download: PDF (452KB)

141st Convention Exhibitor and Sponsor Preview

Page: 558

Download: PDF (762KB)

2016 Conference on Audio for Virtual and Augmented Reality Preview, Los Angeles

Page: 582

Download: PDF (387KB)

Immersive Audio: Objects, Mixing, and Rendering


As immersive audio systems and production techniques gain greater prominence in the market, the need for cost-effective solutions becomes apparent. Existing systems need to be adapted to enable object-based production techniques, and "ideal" reproduction solutions must be rationalized for practical purposes. Headphones offer one possible destination for immersive content without excessive hardware requirements.

  Download: PDF (321KB)


140th Convention Paper Abstracts

Page: 592

Download: PDF (314KB)

AES Bylaws

Page: 618

Download: PDF (58KB)

Departments

Book Review

Page: 581

Download: PDF (55KB)

Section News

Page: 589

Download: PDF (221KB)

AES Conventions and Conferences

Page: 624

Download: PDF (139KB)

Extras

Table of Contents

Download: PDF (41KB)

Cover & Sustaining Members List

Download: PDF (77KB)

AES Officers, Committees, Offices & Journal Staff

Download: PDF (74KB)
