AES E-Library Search Results

Search Results (Displaying 9 matches)

An Intelligent Interface for Drum Pattern Variation and Comparative Evaluation of Algorithms

Drum tracks are a central and style-defining element of electronic dance music, but creating them can be cumbersome because of a lack of appropriate tools and input devices. The authors created a tool that supports musicians in intuitively creating variations of drum patterns or finding inspiration for new ones. Starting from a basic seed pattern provided by the user, a list of variations with varying degrees of similarity to the seed is generated. The variations are created using one of three algorithms: a similarity-based lookup in a rhythm pattern database, a generative approach based on a stochastic neural network, and a genetic algorithm that uses similarity measures as its target function. Expert users in electronic music production evaluated aspects of the prototype and algorithms. In addition, a web-based survey was performed to assess perceptual properties of the variations in comparison to baseline patterns created by a human expert. The study shows that the algorithms produce musical and interesting variations and that each algorithm has its strengths in different areas.
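
As an illustration of the genetic-algorithm variant, the following sketch evolves binary 16-step patterns toward a chosen degree of similarity to a seed. The pattern encoding, the Hamming-style similarity measure, and the GA parameters are assumptions made for this example, not the authors' implementation.

```python
# Hedged sketch: evolve variations of a binary 16-step drum pattern whose
# similarity to the seed approaches a requested target value.
import random

STEPS = 16

def similarity(a, b):
    # Fraction of steps on which two patterns agree (Hamming-style similarity).
    return sum(x == y for x, y in zip(a, b)) / STEPS

def fitness(candidate, seed, target_sim):
    # Best candidates match the requested degree of similarity to the seed.
    return -abs(similarity(candidate, seed) - target_sim)

def evolve(seed, target_sim=0.75, pop_size=50, generations=100, mut_rate=0.05):
    population = [[random.randint(0, 1) for _ in range(STEPS)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, seed, target_sim), reverse=True)
        parents = population[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, STEPS)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - v if random.random() < mut_rate else v for v in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda c: fitness(c, seed, target_sim))

seed = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]   # four-on-the-floor kick
print(evolve(seed))
```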

Open Access

JAES Volume 64, Issue 7/8, pp. 503-513; July 2016


Immersive Audio: Objects, Mixing, and Rendering

[Feature] As immersive audio systems and production techniques gain prominence in the market, the need for cost-effective solutions becomes apparent. Existing systems must be adapted to enable object-based production techniques, and “ideal” reproduction solutions have to be rationalized for practical purposes. Headphones offer one possible destination for immersive content without excessive hardware requirements.

JAES Volume 64, Issue 7/8, pp. 584-588; July 2016


Improving Multilingual Interaction for Consumer Robots through Signal Enhancement in Multichannel Speech

For social robots to be truly successful, they need the ability to communicate orally with humans, providing feedback and accepting commands. Social robots need automatic speech recognition (ASR) tools that work with different users, languages, voice pitches, pronunciations, and speech speeds over a wide range of sound and noise levels. This paper describes methodologies for voice activity detection and noise elimination used with ASR-based oral interaction in an affordable robot. Acoustically quasi-stationary environments are assumed, which, in conjunction with the high background noise of the robot’s microphones, makes ASR challenging. This work was performed in the context of project RAPP, which aims to deliver a cloud repository of applications and services that can be used by heterogeneous robots to assist people with a range of disabilities. Results show that noise estimation and elimination techniques are necessary for successfully performing ASR in environments with quasi-stationary noise.
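
A minimal sketch of the kind of processing the abstract describes, under simplifying assumptions: the noise floor is estimated from the leading frames of a recording (reasonable for quasi-stationary noise), frames whose level rises well above that floor are flagged as speech, and the noise estimate is spectrally subtracted. The frame sizes, the 6 dB threshold, and the test signal are illustrative, not taken from the paper.

```python
# Hedged sketch: energy-based VAD plus spectral subtraction for
# quasi-stationary noise (illustrative parameters throughout).
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def simple_vad_denoise(x, noise_frames=10, threshold_db=6.0):
    frames = frame_signal(x)
    spectra = np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1)
    mag = np.abs(spectra)
    noise_mag = mag[:noise_frames].mean(axis=0)        # noise estimate from leading frames
    level_db = 20 * np.log10(mag.mean(axis=1) / (noise_mag.mean() + 1e-12) + 1e-12)
    is_speech = level_db > threshold_db                # energy-based VAD decision
    clean_mag = np.maximum(mag - noise_mag, 0.0)       # spectral subtraction
    return is_speech, clean_mag

# Example: one second of background noise followed by a louder "speech-like" burst.
rng = np.random.default_rng(0)
sig = np.concatenate([0.01 * rng.standard_normal(16000),
                      0.2 * rng.standard_normal(8000)])
speech_flags, _ = simple_vad_denoise(sig)
print(speech_flags.astype(int))
```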

JAES Volume 64, Issue 7/8, pp. 514-524; July 2016


Multipath Beat Tracking

Most music compositions evolve according to an underlying unit of time, sometimes implied and sometimes audible, called the beat. Beat trackers are essential components of rhythmic analysis systems in a wide range of applications involving musical information extraction. The authors describe a new methodology for tracking the sequence of beat instants given the onset detection function (ODF) and an inter-beat interval (IBI) estimate. This alternative solution is based on a divide-and-conquer strategy that concurrently manages a multitude of simpler trackers from a rule-based perspective, effectively restricting the search to the most promising beat candidates. As confirmed by the experimental results, the performance of this method improves as the number of simple trackers (paths) is increased. Compared to dynamic programming, this approach is preferable in terms of computational efficiency while yielding accuracy scores comparable to many known beat trackers.
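
The multiple-tracker idea can be sketched as follows, under simplifying assumptions: each path starts from a different candidate first beat, repeatedly jumps one estimated inter-beat interval ahead, snaps to the strongest ODF value within a small tolerance window, and the path that accumulates the most onset strength wins. This illustrates the divide-and-conquer strategy only; the paper's actual rules and scoring differ.

```python
# Hedged sketch: several simple beat trackers run in parallel ("paths"),
# and the best-scoring path is kept (illustrative rules, not the authors').
import numpy as np

def track_path(odf, start, ibi, tol):
    beats, score, pos = [start], float(odf[start]), start
    while pos + ibi < len(odf):
        centre = pos + ibi                            # predicted next beat
        lo, hi = max(0, centre - tol), min(len(odf), centre + tol + 1)
        pos = lo + int(np.argmax(odf[lo:hi]))         # snap to the strongest local onset
        beats.append(pos)
        score += float(odf[pos])
    return beats, score

def multipath_beats(odf, ibi, n_paths=8, tol=5):
    starts = np.argsort(odf[:ibi])[-n_paths:]         # most promising first-beat candidates
    paths = [track_path(odf, int(s), ibi, tol) for s in starts]
    return max(paths, key=lambda p: p[1])[0]          # beats of the best-scoring path

# Toy ODF: a peak every 50 frames plus noise.
rng = np.random.default_rng(1)
odf = 0.2 * rng.random(500)
odf[10::50] += 1.0
print(multipath_beats(odf, ibi=50))
```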

JAES Volume 64, Issue 7/8, pp. 493-502; July 2016


On the Impact of the Semantic Content of Sound Events in Emotion Elicitation

Sound events are known to influence the listener’s emotions, but the reason for this influence is less clear. Take, for example, the sound produced by a gun firing. Does the emotional impact arise from the fact that the listener recognizes that a gun produced the sound (semantic content), or does it arise from the acoustic attributes of the sound itself? This research explores the relation between the semantic similarity of sound events and the emotions they elicit. Results indicate that semantic content plays a limited role in the formation of the listener’s affective states. However, when the semantic content is matched to specific areas of the Arousal-Valence space or when the source’s spatial position is considered, the effect of the semantic content is stronger, especially for medium-to-low valence and medium-to-high arousal, or when the sound source is at lateral positions relative to the listener’s head.

Open Access

JAES Volume 64, Issue 7/8, pp. 525-532; July 2016


Organizing a Sonic Space through Vocal Imitations

This research investigates how the vocal mimicking capabilities of humans may be exploited to access and explore a given sonic space. Experiments showed that prototype vocal sounds can be represented in a two-dimensional space and still remain perceptually distinct from each other. The experiments provide a measure of how meaningful the machine’s distribution and grouping of vocal sounds are to humans, and confirm that humans are able to effectively use the acoustic and articulatory cues at their disposal to associate sounds with given prototypes. When used in an automatic clustering process, these cues are sufficiently consistent with those used by humans when categorizing acoustic phenomena. The procedure of dimensionality reduction and clustering is demonstrated on imitations of engine sounds, which then represent the sonic space of a motor sound model. A two-dimensional space is particularly attractive for sound design because it can be used as a sonic map whose landmarks contain both a synthetic sound and its vocal imitation.
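
A minimal sketch of the dimensionality-reduction-plus-clustering step, using stand-in ingredients: simple spectral descriptors, a two-component PCA projection, and a small k-means. The descriptors, the synthetic test tones, and the clustering details are assumptions made for illustration; the paper's own features and procedure may differ.

```python
# Hedged sketch: project spectral descriptors of sounds onto a 2-D map
# and group them with k-means (illustrative features and data).
import numpy as np

def spectral_features(x, fs=16000):
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    centroid = (freqs * mag).sum() / (mag.sum() + 1e-12)           # spectral centroid
    rolloff = freqs[np.searchsorted(np.cumsum(mag), 0.85 * mag.sum())]
    return np.array([centroid, rolloff, mag.std()])

def pca_2d(X):
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T                                           # coordinates on the 2-D map

def kmeans(X, k=2, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centres) ** 2).sum(-1), axis=1)
        centres = np.stack([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                            for j in range(k)])
    return labels

rng = np.random.default_rng(2)
sounds = [np.sin(2 * np.pi * f * np.arange(8000) / 16000) + 0.05 * rng.standard_normal(8000)
          for f in (110, 120, 880, 900)]                           # two "low" and two "high" prototypes
coords = pca_2d(np.stack([spectral_features(s) for s in sounds]))
print(kmeans(coords, k=2))
```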

Open Access

JAES Volume 64, Issue 7/8, pp. 474-483; July 2016


Soundscape Audio Signal Classification and Segmentation Using Listeners’ Perception of Background and Foreground Sound

A soundscape recording captures the sonic environment at a given location at a given time using one or more fixed or moving microphones. In most cases, the soundscape is uncontrolled and unscripted. Human listeners experience sonic components as being either background or foreground depending on their salient perceptual characteristics, such as proximity, repetition, and spectral attributes. Analyzing soundscapes in research tasks requires the classification and segmentation of the important sonic components, but that process is time consuming when done manually. This research establishes the background and foreground classification task within a musicological and soundscape context and then presents a method for the automatic segmentation of soundscape recordings. Using a soundscape corpus with ground truth data obtained from a human perception study, the analysis shows that participants have a high level of agreement on the category assigned to background samples (92.5%), foreground samples (80.8%), and background with foreground samples (75.3%). Experiments demonstrate how smaller window sizes affect the performance of the classifier.
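
A minimal sketch of windowed background/foreground labelling, assuming a very simple rule: a window is marked foreground when its RMS level rises well above a running estimate of the background level. The window lengths and the 6 dB margin are illustrative; the sketch only shows how the choice of window size changes segmentation granularity, not the paper's classifier.

```python
# Hedged sketch: label analysis windows as foreground when their RMS level
# exceeds a crude running background estimate by a fixed margin.
import numpy as np

def segment(x, win, margin_db=6.0):
    n = len(x) // win
    rms = np.sqrt((x[: n * win].reshape(n, win) ** 2).mean(axis=1) + 1e-12)
    background = np.minimum.accumulate(rms) + 1e-12     # running minimum as background estimate
    return 20 * np.log10(rms / background) > margin_db  # True = foreground window

rng = np.random.default_rng(3)
scene = 0.02 * rng.standard_normal(48000)               # steady background
scene[16000:20000] += 0.3 * rng.standard_normal(4000)   # a foreground event
for win in (1024, 4096, 16384):                         # smaller windows give finer segmentation
    print(win, segment(scene, win).astype(int))
```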

JAES Volume 64, Issue 7/8, pp. 484-492; July 2016


Supervised and Unsupervised Sound Retrieval by Vocal Imitation

Existing methods to index and search audio documents are generally based on text metadata and text-based search engines, but this approach is often problematic and time-consuming because the text label does not necessarily describe the audio content. Query by Example (QBE) is an alternative approach for improving the effectiveness and efficiency of sound retrieval. In this research, the authors propose a novel approach to sound query by vocal imitation. Vocal imitation is commonly used in human communication and can be employed for human-computer interaction. Two systems are proposed: (1) a supervised system that trains a multiclass classifier on training vocal imitations of the different sound classes in the library and classifies a new imitation query into one of those classes; (2) an unsupervised system that is more flexible because it measures the feature distance between the imitation query and each sound in the library, returning the sounds most similar to the query. Such systems require an effective feature representation of imitation queries and of the sounds in the library. Existing handcrafted audio features may not work well given the variety of vocal imitations and the mismatch between vocal imitations and actual sounds. It is therefore proposed to learn feature representations from training vocal imitations automatically using a Stacked Auto-Encoder (SAE). Experiments show that retrieval performance with automatically learned features outperforms that with carefully handcrafted features in both supervised and unsupervised settings.
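
The unsupervised retrieval path can be sketched as below: every library sound and the vocal-imitation query are mapped to a feature vector, and library sounds are ranked by cosine similarity to the query. The short-frame log-magnitude-spectrum feature is only a stand-in for the learned SAE representation, and the signals are synthetic; the ranking logic is the part being illustrated.

```python
# Hedged sketch: distance-based (unsupervised) retrieval of library sounds
# from a vocal-imitation query, using a stand-in spectral feature.
import numpy as np

def features(x, n_bins=128):
    mag = np.abs(np.fft.rfft(x, n=2 * n_bins - 2))       # log-magnitude spectrum of a short leading frame
    return np.log1p(mag)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def retrieve(query, library):
    q = features(query)
    scores = [(name, cosine(q, features(snd))) for name, snd in library.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)   # most similar first

t = np.arange(8000) / 16000
library = {"low_hum": np.sin(2 * np.pi * 100 * t),
           "high_whine": np.sin(2 * np.pi * 2000 * t)}
imitation = np.sin(2 * np.pi * 110 * t)                   # "query" close to the low hum
print(retrieve(imitation, library))
```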

JAES Volume 64, Issue 7/8, pp. 533-543; July 2016


Variation in Multitrack Mixes: Analysis of Low-level Audio Signal Features

Understanding how a human mixing engineer works is necessary for the design of intelligent music production tools. The goal of such tools is to generate mixes that could realistically have been created by a human mix engineer. This paper presents an analysis of 1501 mixes of 10 different songs created by mix engineers, with between 97 and 373 mixes per song. A variety of objective signal features were extracted and principal component analysis was performed, revealing four dimensions of mix variation that can be described as amplitude, brightness, bass, and width. The feature distributions suggest multimodal behavior dominated by one specific mode. This distribution appears to be independent of the choice of song, though with variation in modal parameters, and is used to obtain general trends and tolerance bounds for these features. The results are useful as parametric guidance for intelligent music production systems and provide insight into the creative decision-making processes of mix engineers.
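
A minimal sketch of the analysis idea: assemble a mixes-by-features matrix of low-level descriptors, standardize it, and inspect the variance explained by each principal component. The random data stands in for real mixes, and the four descriptor names merely echo the amplitude/brightness/bass/width dimensions reported; nothing here reproduces the paper's numbers.

```python
# Hedged sketch: PCA over a mixes-by-features matrix of low-level descriptors
# (synthetic data; descriptor names are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(4)
n_mixes = 200
features = np.column_stack([
    rng.normal(-14, 2, n_mixes),     # integrated loudness (amplitude-like)
    rng.normal(2500, 400, n_mixes),  # spectral centroid in Hz (brightness-like)
    rng.normal(0.3, 0.05, n_mixes),  # fraction of energy below 100 Hz (bass-like)
    rng.normal(0.6, 0.1, n_mixes),   # side/mid energy ratio (width-like)
])

z = (features - features.mean(axis=0)) / features.std(axis=0)   # standardize each feature
_, s, vt = np.linalg.svd(z, full_matrices=False)
explained = s ** 2 / (s ** 2).sum()
print("variance explained per component:", np.round(explained, 3))
print("loadings of first component:", np.round(vt[0], 3))
```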

Open Access

JAES Volume 64, Issue 7/8, pp. 466-473; July 2016

