Journal of the AES

2012 September - Volume 60 Number 9

Papers

MPEG Spatial Audio Object Coding—The ISO/MPEG Standard for Efficient Coding of Interactive Audio Scenes

Authors: Herre, Jürgen; Purnhagen, Heiko; Koppens, Jeroen; Hellmuth, Oliver; Engdegård, Jonas; Hilper, Johannes; Villemoes, Lars; Terentiv, Leon; Falch, Cornelia; Hölzer, Andreas; Valero, María Luis; Resch, Barbara; Mundt, Harald; Oh, Hyen-O
Affiliation: International Audio Laboratories Erlangen, Erlangen, Germany (a joint institution of the University of Erlangen / Nuremberg and Fraunhofer IIS); Fraunhofer Institute for Integrated Circuits, Erlangen, Germany; MED-EL, Innsbruck, Austria; Dolby Sweden AB, Stockholm, Sweden; Skype, Stockholm, Sweden; Philips Research, Eindhoven, The Netherlands; Dolby Germany GmbH, Nürnberg, Germany; Digital TV Laboratory, LG Electronics, Seoul, Korea; GOODDAY Invent & Patent LAB, Seoul, South Korea
Page: 655

In 2010 the ISO/MPEG Audio standardization group issued the Spatial Audio Object Coding (SAOC) specification to define technology for parametric low bit-rate coding of audio object signals with a mono or stereo downmix. This paper provides an overview of MPEG SAOC technology and discusses recent verification tests. The authors examine operation modes for typical application scenarios that take advantage of object-based processing. Most importantly, SAOC enables transmission of multi-object signals at data rates of the same order of magnitude as those used to represent two-channel audio. The main application scenarios are envisaged to be high-quality spatial teleconferencing, personal audio, interactive gaming, and rich media. Because the SAOC representation is independent of any particular loudspeaker setup, SAOC signals can be rendered efficiently on either a target loudspeaker configuration or a portable device.
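As a rough illustration of the general idea behind parametric object coding (a downmix plus a small amount of per-object side information), the sketch below mixes two object signals to mono and keeps one relative power value per object. It is a toy under stated assumptions, not the SAOC bitstream or decoder: the function names, the single full-band level per object, and the energy-share rendering rule are all hypothetical simplifications.

```python
# Toy illustration only: not the SAOC bitstream, parameters, or decoder.

def downmix_and_levels(objects):
    """Mix equal-length object signals to mono and return relative object powers."""
    n = len(objects[0])
    downmix = [sum(obj[i] for obj in objects) for i in range(n)]
    powers = [sum(x * x for x in obj) / n for obj in objects]
    peak = max(powers) or 1.0  # avoid dividing by zero for all-silent input
    levels = [p / peak for p in powers]  # 1.0 for the strongest object
    return downmix, levels


def render_shares(levels, user_gains):
    """Combine transmitted levels with user-chosen gains into energy shares."""
    weighted = [level * gain * gain for level, gain in zip(levels, user_gains)]
    total = sum(weighted) or 1.0
    return [w / total for w in weighted]


# Hypothetical two-object scene: a louder "voice" and a quieter "music" bed.
voice = [0.5, -0.5, 0.5, -0.5]
music = [0.1, 0.1, -0.1, -0.1]

downmix, levels = downmix_and_levels([voice, music])
# The listener asks for the music object to be boosted by a factor of 2.
shares = render_shares(levels, user_gains=[1.0, 2.0])
print(downmix, levels, shares)
```

Only the downmix and the short `levels` list would need to be transmitted in this toy, which is why the data rate stays close to that of the downmix alone.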

In Search of a Perceptual Metric for Timbre: Dissimilarity Judgments among Synthetic Sounds with MFCC-Derived Spectral Envelopes

Authors: Terasawa, Hiroko; Berger, Jonathan; Makino, Shoji
Affiliation: Life Science Center of TARA, University of Tsukuba, Tsukuba, Ibaraki, Japan; JST, PRESTO (Information Science and Humans), Chiyoda-ku, Tokyo, Japan; CCRMA, Department of Music, Stanford University, Stanford, CA, USA
Page: 674

Because the spectral envelope of a sound is a crucial aspect of timbre perception, the authors propose a quantitative model of spectral envelope perception using a set of orthogonal basis functions, analogous to the three primary colors in vision. The goal is to find a quantitative mapping between the physical description of the spectral envelope and its perception. This allows for a meaningful and reliable way of controlling timbre in sonification. This paper presents a quantitative metric to describe the multidimensionality of spectral envelope perception, i.e., the perception that is specifically related to the spectral element of timbre. Mel-frequency cepstral coefficients (MFCC) were chosen as a metric for spectral envelope perception because of their linearity, orthogonality, and multidimensionality. Quantitative data from two experiments illustrate the linear relationship between the subjective perception of spectrally-varied synthetic sounds and the MFCC.
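To make the orthogonal-basis idea concrete, the following minimal sketch shows the two textbook ingredients of MFCCs: a mel frequency warping and a discrete cosine transform (DCT-II) of log band energies. It assumes the band energies have already been measured on a mel-spaced filterbank; the function names and the four-band examples are hypothetical, and the mel formula is one common variant rather than necessarily the one used in the paper.

```python
import math


def hz_to_mel(freq_hz):
    """One widely used mel-scale formula (an assumption, other variants exist)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def dct_ii(values):
    """Unnormalized DCT-II: projects the input onto orthogonal cosine basis functions."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def cepstrum_from_band_energies(energies):
    """Cepstral coefficients from positive band energies (assumed mel-spaced)."""
    return dct_ii([math.log(e) for e in energies])


# A flat envelope has no spectral shape, so every coefficient is zero.
flat = cepstrum_from_band_energies([1.0, 1.0, 1.0, 1.0])
# A downward spectral tilt shows up mainly in the first non-constant coefficient.
tilted = cepstrum_from_band_energies([8.0, 4.0, 2.0, 1.0])
print(flat, tilted)
```

Each coefficient weights one cosine-shaped envelope component, which is what allows individual coefficients to be varied independently when synthesizing test sounds.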

Acoustic Detection of Human Activities in Natural Environments

Authors: Ntalampiras, Stavros; Potamitis, Ilyas; Fakotakis, Nikos
Affiliation: Politecnico di Milano, Milan, Italy; Technological Educational Institute of Crete, Rethymno, Greece; University of Patras, Patras, Greece
Page: 686

Automatic recognition of sound events can be valuable for efficient analysis of audio scenes. For example, detecting human activities like trespassing and hunting in natural environments can play an important role in their preservation by alerting authorities to take action. In the proposed system, each sound class is represented by a hidden Markov model created from descriptors in the time, frequency, and wavelet domains. The system can automatically adapt to the acoustic conditions of different scenes via a feedback loop that refines an unsupervised model. A reliable testing process was adopted for assessing the performance of the system under adverse conditions characterized by highly nonstationary environmental noise.
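The one-model-per-class scheme can be sketched as follows: score an observation sequence against each class's hidden Markov model with the forward algorithm and pick the class with the highest likelihood. This is a minimal sketch under strong assumptions, not the authors' system: it uses discrete observation symbols instead of time, frequency, and wavelet descriptors, it omits the adaptation loop, and the class names and all probabilities are made up for illustration.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm for a discrete-observation HMM."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)  # rescale each step to avoid numerical underflow
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_like


def classify(obs, models):
    """Return the class whose HMM gives the observations the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical two-state models: (start, transition, emission) per sound class.
# Symbol 0 stands for a "tonal" frame and symbol 1 for an "impulsive" frame.
MODELS = {
    "birdsong": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.6, 0.4]]),
    "gunshot": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.1, 0.9], [0.2, 0.8]]),
}

label = classify([1, 1, 0, 1, 1], MODELS)
print(label)
```

In a real detector the discrete symbols would be replaced by continuous feature vectors with per-state densities, but the decision rule stays the same.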

Naviton—A Prototype Mobility Aid for Auditory Presentation of Three-Dimensional Scenes to the Visually Impaired

Authors: Bujacz, Michal; Skulimowski, Piotr; Strumillo, Pawel
Affiliation: Lodz University of Technology, Lodz, Poland
Page: 696

To augment the task of navigation and orientation of blind individuals, a new travel aid uses 3D scene sonification to present information about the environment using nonverbal audio. The model is composed of two classes of objects: obstacles and planes. The algorithm uses scene image segmentation, personalized spatial audio, musical tones, and sonar-like sound patterns. Individually measured head-related transfer functions were used to provide users with the illusion of sounds originating from the locations of sonified scene elements. Using a segmented and parametric description overcomes the sensory mismatch between visual and auditory perception. In a pilot study using both blind and sighted volunteers, subjects were able to utilize the prototype for spatial orientation and obstacle avoidance after a few minutes of training, attaining 90% accuracy in estimating the direction and depth of obstacles.

Digital Fabrication of Acoustic Sonifications

Author: Barrass, Stephen
Affiliation: University of Canberra, Canberra, Australia
Page: 709

Because the human brain is often optimal for detecting subtle patterns, this paper explores a novel transformation that maps numerical data into sound. In this research, a set of data taken from head-related transfer functions was used to create physical objects (bells made from stainless steel) whose acoustics were then presented to listeners. The technique is called acoustic sonification. Listeners were able to hear differences in pitch and timbre of bells that were constructed from different datasets, while bells constructed from similar datasets sounded similar. Modulating the shape of a bell with a dataset can influence the acoustic spectrum in a way that results in audible differences, even though there was no apparent visual difference. Acoustic sonification can take advantage of auditory pattern recognition.

Standards and Information Documents

AES Standards Committee News

Page: 716

Features

46th Conference Report, Denver

Page: 718

Audio Bit Rates

Author: Rumsey, Francis
Page: 729

For many years now low bit-rate coding has remained a hot topic in audio research and development. It would not be overstating the case to say that advances in this field, together with the increasing ubiquity of the Internet, were primarily responsible for the revolution in the digital music market. One might wonder, therefore, if there is anything left to do or say about the topic, yet it continues to result in innovations and enhanced standards, as well as new products and licensing opportunities.

The Analog Microphone Interface and its History

Author: Wuttke, Jörg
Page: 735

The interface between microphones and microphone inputs has special characteristics and requires special attention. The low output levels of microphones and the possible need for long cables have made it necessary to think about noise and interference of all kinds. A microphone input is also the electrical load for a microphone and can have an adverse influence on its performance. Condenser microphones contain active circuitry that requires some form of powering. With the introduction of transistorized circuitry in the 1960s, it became practical for this powering to be incorporated into microphone inputs. Various methods appeared in the beginning; 48-volt phantom powering is now dominant, but this standard method is still not always implemented correctly.
