AES E-Library Search Results

Search Results (Displaying 9 matches)

An Intelligent Interface for Drum Pattern Variation and Comparative Evaluation of Algorithms

Drum tracks are a central and style-defining element of electronic dance music, but creating them can be cumbersome because of a lack of appropriate tools and input devices. The authors created a tool that supports musicians in intuitively creating variations of drum patterns or finding inspiration for new ones. Starting from a basic seed pattern provided by the user, a list of variations with varying degrees of similarity to the seed is generated. The variations are created using one of three algorithms: a similarity-based lookup in a rhythm pattern database, a generative approach based on a stochastic neural network, and a genetic algorithm that uses similarity measures as its target function. Expert users in electronic music production evaluated aspects of the prototype and algorithms. In addition, a web-based survey was performed to assess perceptual properties of the variations in comparison to baseline patterns created by a human expert. The study shows that the algorithms produce musical and interesting variations and that each algorithm has its strengths in different areas.
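
As an illustration of the genetic-algorithm variant, the following sketch evolves binary 16-step patterns toward a chosen degree of similarity to a seed. The pattern encoding, the Hamming-style similarity measure, and the GA parameters are assumptions made for this example, not the authors' implementation.

```python
# Hedged sketch: evolve variations of a binary 16-step drum pattern whose
# similarity to the seed approaches a requested target value.
import random

STEPS = 16

def similarity(a, b):
    # Fraction of steps on which two patterns agree (Hamming-style similarity).
    return sum(x == y for x, y in zip(a, b)) / STEPS

def fitness(candidate, seed, target_sim):
    # Best candidates match the requested degree of similarity to the seed.
    return -abs(similarity(candidate, seed) - target_sim)

def evolve(seed, target_sim=0.75, pop_size=50, generations=100, mut_rate=0.05):
    population = [[random.randint(0, 1) for _ in range(STEPS)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, seed, target_sim), reverse=True)
        parents = population[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, STEPS)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - v if random.random() < mut_rate else v for v in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda c: fitness(c, seed, target_sim))

seed = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]   # four-on-the-floor kick
print(evolve(seed))
```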

Open Access

JAES Volume 64, Issue 7/8, pp. 503-513; July 2016


Immersive Audio: Objects, Mixing, and Rendering

[Feature] As immersive audio systems and production techniques gain prominence in the market, the need for cost-effective solutions becomes apparent. Existing systems must be adapted to enable object-based production techniques, and “ideal” reproduction solutions have to be rationalized for practical purposes. Headphones offer one possible destination for immersive content without excessive hardware requirements.

JAES Volume 64, Issue 7/8, pp. 584-588; July 2016


Improving Multilingual Interaction for Consumer Robots through Signal Enhancement in Multichannel Speech

For social robots to be truly successful, they need the ability to communicate orally with humans, providing feedback and accepting commands. Social robots need automatic speech recognition (ASR) tools that work with different users, languages, voice pitches, pronunciations, and speech speeds over a wide range of sound and noise levels. This paper describes methodologies for voice activity detection and noise elimination used with ASR-based oral interaction in an affordable robot. Acoustically quasi-stationary environments are assumed, which, in conjunction with the high background noise of the robot’s microphones, makes ASR challenging. This work was performed in the context of project RAPP, which aims to deliver a cloud repository of applications and services that can be used by heterogeneous robots to assist people with a range of disabilities. Results show that noise estimation and elimination techniques are necessary for successfully performing ASR in environments with quasi-stationary noise.
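
A minimal sketch of the kind of processing the abstract describes, under simplifying assumptions: the noise floor is estimated from the leading frames of a recording (reasonable for quasi-stationary noise), frames whose level rises well above that floor are flagged as speech, and the noise estimate is spectrally subtracted. The frame sizes, the 6 dB threshold, and the test signal are illustrative, not taken from the paper.

```python
# Hedged sketch: energy-based VAD plus spectral subtraction for
# quasi-stationary noise (illustrative parameters throughout).
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def simple_vad_denoise(x, noise_frames=10, threshold_db=6.0):
    frames = frame_signal(x)
    spectra = np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1)
    mag = np.abs(spectra)
    noise_mag = mag[:noise_frames].mean(axis=0)        # noise estimate from leading frames
    level_db = 20 * np.log10(mag.mean(axis=1) / (noise_mag.mean() + 1e-12) + 1e-12)
    is_speech = level_db > threshold_db                # energy-based VAD decision
    clean_mag = np.maximum(mag - noise_mag, 0.0)       # spectral subtraction
    return is_speech, clean_mag

# Example: one second of background noise followed by a louder "speech-like" burst.
rng = np.random.default_rng(0)
sig = np.concatenate([0.01 * rng.standard_normal(16000),
                      0.2 * rng.standard_normal(8000)])
speech_flags, _ = simple_vad_denoise(sig)
print(speech_flags.astype(int))
```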

JAES Volume 64, Issue 7/8, pp. 514-524; July 2016


Multipath Beat Tracking

Most music compositions evolve according to an underlying unit of time, sometimes implied and sometimes audible, called the beat. Beat trackers are essential components of rhythmic analysis systems in a wide range of applications involving musical information extraction. The authors describe a new methodology for tracking the sequence of beat instants given the onset detection function (ODF) and an inter-beat interval (IBI) estimate. This alternative solution is based on a divide-and-conquer strategy that concurrently manages a multitude of simpler trackers from a rule-based perspective, effectively restricting the search to the most promising beat candidates. As confirmed by the experimental results, the performance of this method improves as the number of simple trackers (paths) is increased. Compared to dynamic programming, this approach is preferable in terms of computational efficiency while yielding accuracy scores comparable to many known beat trackers.
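
The multiple-tracker idea can be sketched as follows, under simplifying assumptions: each path starts from a different candidate first beat, repeatedly jumps one estimated inter-beat interval ahead, snaps to the strongest ODF value within a small tolerance window, and the path that accumulates the most onset strength wins. This illustrates the divide-and-conquer strategy only; the paper's actual rules and scoring differ.

```python
# Hedged sketch: several simple beat trackers run in parallel ("paths"),
# and the best-scoring path is kept (illustrative rules, not the authors').
import numpy as np

def track_path(odf, start, ibi, tol):
    beats, score, pos = [start], float(odf[start]), start
    while pos + ibi < len(odf):
        centre = pos + ibi                            # predicted next beat
        lo, hi = max(0, centre - tol), min(len(odf), centre + tol + 1)
        pos = lo + int(np.argmax(odf[lo:hi]))         # snap to the strongest local onset
        beats.append(pos)
        score += float(odf[pos])
    return beats, score

def multipath_beats(odf, ibi, n_paths=8, tol=5):
    starts = np.argsort(odf[:ibi])[-n_paths:]         # most promising first-beat candidates
    paths = [track_path(odf, int(s), ibi, tol) for s in starts]
    return max(paths, key=lambda p: p[1])[0]          # beats of the best-scoring path

# Toy ODF: a peak every 50 frames plus noise.
rng = np.random.default_rng(1)
odf = 0.2 * rng.random(500)
odf[10::50] += 1.0
print(multipath_beats(odf, ibi=50))
```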

JAES Volume 64, Issue 7/8, pp. 493-502; July 2016


On the Impact of the Semantic Content of Sound Events in Emotion Elicitation

Sound events are known to influence the listener’s emotions, but the reason for this influence is less clear. Take, for example, the sound produced by a gun firing. Does the emotional impact arise from the fact that the listener recognizes that a gun produced the sound (semantic content), or does it arise from the acoustic attributes of the sound itself? This research explores the relation between the semantic similarity of sound events and the emotions they elicit. Results indicate that semantic content plays a limited role in the formation of the listener’s affective states. However, when the semantic content is matched to specific areas of the Arousal-Valence space or when the source’s spatial position is considered, the effect of the semantic content is stronger, especially for medium-to-low valence and medium-to-high arousal, or when the sound source is at lateral positions relative to the listener’s head.

Open Access

JAES Volume 64, Issue 7/8, pp. 525-532; July 2016


Organizing a Sonic Space through Vocal Imitations

This research investigates how the vocal mimicking capabilities of humans may be exploited to access and explore a given sonic space. Experiments showed that prototype vocal sounds can be represented in a two-dimensional space and still remain perceptually distinct from each other. The experiments provide a measure of how meaningful the machine’s distribution and grouping of vocal sounds are to humans, and confirm that humans are able to effectively use the acoustic and articulatory cues at their disposal to associate sounds with given prototypes. When used in an automatic clustering process, these cues are sufficiently consistent with those used by humans when categorizing acoustic phenomena. The procedure of dimensionality reduction and clustering is demonstrated on imitations of engine sounds, which then represent the sonic space of a motor sound model. A two-dimensional space is particularly attractive for sound design because it can be used as a sonic map whose landmarks contain both a synthetic sound and its vocal imitation.
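
A minimal sketch of the dimensionality-reduction-plus-clustering step, using stand-in ingredients: simple spectral descriptors, a two-component PCA projection, and a small k-means. The descriptors, the synthetic test tones, and the clustering details are assumptions made for illustration; the paper's own features and procedure may differ.

```python
# Hedged sketch: project spectral descriptors of sounds onto a 2-D map
# and group them with k-means (illustrative features and data).
import numpy as np

def spectral_features(x, fs=16000):
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    centroid = (freqs * mag).sum() / (mag.sum() + 1e-12)           # spectral centroid
    rolloff = freqs[np.searchsorted(np.cumsum(mag), 0.85 * mag.sum())]
    return np.array([centroid, rolloff, mag.std()])

def pca_2d(X):
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T                                           # coordinates on the 2-D map

def kmeans(X, k=2, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centres) ** 2).sum(-1), axis=1)
        centres = np.stack([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                            for j in range(k)])
    return labels

rng = np.random.default_rng(2)
sounds = [np.sin(2 * np.pi * f * np.arange(8000) / 16000) + 0.05 * rng.standard_normal(8000)
          for f in (110, 120, 880, 900)]                           # two "low" and two "high" prototypes
coords = pca_2d(np.stack([spectral_features(s) for s in sounds]))
print(kmeans(coords, k=2))
```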

Open Access

JAES Volume 64, Issue 7/8, pp. 474-483; July 2016


Soundscape Audio Signal Classification and Segmentation Using Listeners’ Perception of Background and Foreground Sound

A soundscape recording captures the sonic environment at a given location at a given time using one or more fixed or moving microphones. In most cases, the soundscape is uncontrolled and unscripted. Human listeners experience sonic components as being either background or foreground depending on their salient perceptual characteristics, such as proximity, repetition, and spectral attributes. Analyzing soundscapes in research tasks requires the classification and segmentation of the important sonic components, but that process is time consuming when done manually. This research establishes the background and foreground classification task within a musicological and soundscape context and then presents a method for the automatic segmentation of soundscape recordings. Using a soundscape corpus with ground truth data obtained from a human perception study, the analysis shows that participants have a high level of agreement on the category assigned to background samples (92.5%), foreground samples (80.8%), and background with foreground samples (75.3%). Experiments demonstrate how smaller window sizes affect the performance of the classifier.
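
A minimal sketch of windowed background/foreground labelling, assuming a very simple rule: a window is marked foreground when its RMS level rises well above a running estimate of the background level. The window lengths and the 6 dB margin are illustrative; the sketch only shows how the choice of window size changes segmentation granularity, not the paper's classifier.

```python
# Hedged sketch: label analysis windows as foreground when their RMS level
# exceeds a crude running background estimate by a fixed margin.
import numpy as np

def segment(x, win, margin_db=6.0):
    n = len(x) // win
    rms = np.sqrt((x[: n * win].reshape(n, win) ** 2).mean(axis=1) + 1e-12)
    background = np.minimum.accumulate(rms) + 1e-12     # running minimum as background estimate
    return 20 * np.log10(rms / background) > margin_db  # True = foreground window

rng = np.random.default_rng(3)
scene = 0.02 * rng.standard_normal(48000)               # steady background
scene[16000:20000] += 0.3 * rng.standard_normal(4000)   # a foreground event
for win in (1024, 4096, 16384):                         # smaller windows give finer segmentation
    print(win, segment(scene, win).astype(int))
```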

JAES Volume 64, Issue 7/8, pp. 484-492; July 2016


Supervised and Unsupervised Sound Retrieval by Vocal Imitation

Existing methods to index and search audio documents are generally based on text metadata and text-based search engines, but this approach is often problematic and time-consuming because the text label does not necessarily describe the audio content. Query by Example (QBE) is an alternative approach for improving the effectiveness and efficiency of sound retrieval. In this research, the authors propose a novel approach to sound query by vocal imitation. Vocal imitation is commonly used in human communication and can be employed for human-computer interaction. Two systems are proposed: (1) a supervised system that trains a multiclass classifier on training vocal imitations of the different sound classes in the library and classifies a new imitation query into one of those classes; (2) an unsupervised system that is more flexible because it measures the feature distance between the imitation query and each sound in the library, returning the sounds most similar to the query. Such systems require an effective feature representation of imitation queries and of the sounds in the library. Existing handcrafted audio features may not work well given the variety of vocal imitations and the mismatch between vocal imitations and actual sounds. It is therefore proposed to learn feature representations from training vocal imitations automatically using a Stacked Auto-Encoder (SAE). Experiments show that retrieval performance with automatically learned features outperforms that with carefully handcrafted features in both supervised and unsupervised settings.
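
The unsupervised retrieval path can be sketched as below: every library sound and the vocal-imitation query are mapped to a feature vector, and library sounds are ranked by cosine similarity to the query. The short-frame log-magnitude-spectrum feature is only a stand-in for the learned SAE representation, and the signals are synthetic; the ranking logic is the part being illustrated.

```python
# Hedged sketch: distance-based (unsupervised) retrieval of library sounds
# from a vocal-imitation query, using a stand-in spectral feature.
import numpy as np

def features(x, n_bins=128):
    mag = np.abs(np.fft.rfft(x, n=2 * n_bins - 2))       # log-magnitude spectrum of a short leading frame
    return np.log1p(mag)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def retrieve(query, library):
    q = features(query)
    scores = [(name, cosine(q, features(snd))) for name, snd in library.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)   # most similar first

t = np.arange(8000) / 16000
library = {"low_hum": np.sin(2 * np.pi * 100 * t),
           "high_whine": np.sin(2 * np.pi * 2000 * t)}
imitation = np.sin(2 * np.pi * 110 * t)                   # "query" close to the low hum
print(retrieve(imitation, library))
```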

JAES Volume 64, Issue 7/8, pp. 533-543; July 2016


Variation in Multitrack Mixes: Analysis of Low-level Audio Signal Features

Understanding how a human mixing engineer works is necessary for the design of intelligent music production tools. The goal of such tools is to generate mixes that could realistically have been created by a human mix engineer. This paper presents an analysis of 1501 mixes of 10 different songs created by mix engineers, with between 97 and 373 mixes per song. A variety of objective signal features were extracted and principal component analysis was performed, revealing four dimensions of mix variation that can be described as amplitude, brightness, bass, and width. The feature distributions suggest multimodal behavior dominated by one specific mode. This distribution appears to be independent of the choice of song, though with variation in modal parameters, and is used to obtain general trends and tolerance bounds for these features. The results are useful as parametric guidance for intelligent music production systems and provide insight into the creative decision-making processes of mix engineers.
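
A minimal sketch of the analysis idea: assemble a mixes-by-features matrix of low-level descriptors, standardize it, and inspect the variance explained by each principal component. The random data stands in for real mixes, and the four descriptor names merely echo the amplitude/brightness/bass/width dimensions reported; nothing here reproduces the paper's numbers.

```python
# Hedged sketch: PCA over a mixes-by-features matrix of low-level descriptors
# (synthetic data; descriptor names are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(4)
n_mixes = 200
features = np.column_stack([
    rng.normal(-14, 2, n_mixes),     # integrated loudness (amplitude-like)
    rng.normal(2500, 400, n_mixes),  # spectral centroid in Hz (brightness-like)
    rng.normal(0.3, 0.05, n_mixes),  # fraction of energy below 100 Hz (bass-like)
    rng.normal(0.6, 0.1, n_mixes),   # side/mid energy ratio (width-like)
])

z = (features - features.mean(axis=0)) / features.std(axis=0)   # standardize each feature
_, s, vt = np.linalg.svd(z, full_matrices=False)
explained = s ** 2 / (s ** 2).sum()
print("variance explained per component:", np.round(explained, 3))
print("loadings of first component:", np.round(vt[0], 3))
```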

Open Access

JAES Volume 64, Issue 7/8, pp. 466-473; July 2016

