AES E-Library

AES E-Library Search Results

Audio Pattern Recognition of Baby Crying Sound Events

Infants can communicate their internal state (such as pain, hunger, fear, fatigue, or stress) by the nature of their crying. Experts in linguistics suggest that the cry constitutes the first manifestation of speech. This article describes a design methodology for classifying baby crying sound events according to the pathological status of the infant. Such an automated system can aid an attending physician in performing a diagnosis. To address this challenge, a wide variety of audio parameters (Perceptual Linear Prediction, Mel Frequency Cepstral Coefficients, Perceptual Wavelet Packets, Teager Energy Operator, Temporal Modulation) were considered. Classification techniques, including Multilayer Perceptron, Support Vector Machine, Random Forest, Reservoir Network, Gaussian Mixture Model, and Hidden Markov Model, were customized. The goal is to provide an automatic and noninvasive framework for monitoring infants and helping inexperienced or trainee pediatricians, parents, and baby caregivers to identify the baby’s pathological status.
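As a rough illustration of one of the audio parameters named in the abstract, the discrete Teager Energy Operator is commonly defined as psi[n] = x[n]^2 - x[n-1] * x[n+1]. The following is a minimal sketch of that textbook definition only. The function name `teager_energy` and the test tone are assumptions made for illustration, and this is not the feature pipeline used in the paper.

```python
import math

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, because the first and last samples
    lack one of the two neighbors the operator needs.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Hypothetical test signal (not from the paper): a pure tone A * cos(omega * n).
A, omega = 0.5, 0.3
tone = [A * math.cos(omega * n) for n in range(64)]
psi = teager_energy(tone)

# For a pure tone the operator is constant and equals A^2 * sin(omega)^2,
# so it tracks both the amplitude and the frequency of the tone.
expected = (A * math.sin(omega)) ** 2
```

Because the operator responds to both amplitude and frequency, it is often used as a compact energy feature. How the paper combines it with the other listed parameters is not stated in the abstract.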

Author: Ntalampiras, Stavros
Affiliation: Politecnico di Milano, Department of Electronics, Information and Bioengineering, Milano, Italy
JAES Volume 63 Issue 5 pp. 358-369; May 2015
Publication Date: May 22, 2015

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.

Autonomous Multitrack Equalization Based on Masking Reduction

In multitrack music production, some sounds are masked by others, and the listener is less able to fully hear and distinguish the sound sources in the mix. The authors designed a simplified measure of masking based on best practices, and then implemented both an offline and a real-time autonomous multitrack equalization system that reduces masking in multitrack audio. The system used objective measures of spectral masking in the resultant mixes. Listening tests provided a subjective comparison between the mix results of different implementations of the system, a raw mix, and manual mixes made by an amateur and a professional mix engineer. The results showed that autonomous systems reduce both the perceived and objective masking. The offline semi-autonomous system is capable of improving the raw mix better than an amateur and close to a professional mix by simply controlling one user parameter. The results also suggest that existing objective measures of masking are ill-suited for quantifying perceived masking in multitrack musical audio.

Authors: Hafezi, Sina; Reiss, Joshua D.
Affiliation: Queen Mary University of London, London, UK
JAES Volume 63 Issue 5 pp. 312-323; May 2015
Publication Date: May 22, 2015

Immersive Audio, Objects, and Coding

[Feature] Immersive audio is an increasingly important topic for the audio industry, and object-based treatment is related to it. The perceptual advantages of immersive audio depend strongly on the program material, and various novel coding schemes have emerged to deliver it at low bit rates.

Author: Rumsey, Francis
JAES Volume 63 Issue 5 pp. 394-398; May 2015
Publication Date: May 22, 2015

Nearfield Crosstalk Increases Listener Preferences for Headphone-Reproduced Stereophonic Imagery

Although final mixing and mastering are monitored over loudspeakers, the majority of music listeners use headphones on mobile devices. Preferences for spatial processing depend on the method of reproduction. For a variety of program material using headphones, listeners often prefer a stereophonic image that is created by simulating nearfield crosstalk compared to the biphonic spatial image. This novel approach, called Nearfield Crosstalk Simulation, describes crosstalk that simulates closely located loudspeakers. Previous work used farfield crosstalk simulation in an effort to produce an enhanced stereophonic effect, but such results were less preferred. The primary difference between the more conventional farfield crosstalk and the novel nearfield crosstalk developed for this study was the introduction of a level and a time difference at low frequency, consistent with what actually occurs for sound sources very close to a listener’s head.

Authors: Manor, Ella; Martens, William; Marui, Atsushi; Cabrera, Densil
Affiliations: University of Sydney, Sydney, Australia; Tokyo University of the Arts, Adachi-ku, Tokyo, Japan (See document for exact affiliation information.)
JAES Volume 63 Issue 5 pp. 324-335; May 2015
Publication Date: May 22, 2015

Noise Reduction Performance of Various Signals for Impulse Response Measurement

When the acoustic impulse response is measured in a real environment, acoustic and electrical noise corrupt the result. To improve the quality of measurements, researchers often use various excitation measurement signals, such as a linearly swept sine (SS) wave or a maximum length sequence (MLS) signal, because they have high energy at all frequencies. But when the noise is dominated by low-frequency components, the S/N in this region is reduced. In this study, the noise reduction performance (NRP) of different excitation signals was theoretically examined to derive equations that can determine the NRP from the spectra of the measurement signal and noise. From the theoretical and experimental examinations, the following results were obtained. The NRP for white signals and noise-whitening signals is actually the same. Only the minimum noise (MN) signal that minimizes the noise component showed a significant improvement in NRP. A pink spectrum measurement signal showed good NRP in the presence of 1/k^2 spectrum noise, where k is the discrete frequency number, but worse performance with other types of noise. This supports the conclusion that using the MN signal, which has a power spectrum that is the square root of the power spectrum of the noise, is the best method of reducing the effect of noise on the measured impulse response.
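The comparison in the abstract can be sketched numerically under a simplified model. This model is an assumption and does not reproduce the paper's equations: the measurement signal has power spectrum S(k) with a fixed total energy, the response is recovered by inverse filtering, and additive noise with power spectrum N(k) therefore leaves a residual noise power of sum over k of N(k) / S(k). All names below (`residual_noise`, `normalize`, the bin count `K`) are hypothetical.

```python
import math

def residual_noise(signal_ps, noise_ps):
    """Noise power left after inverse filtering: sum of N(k) / S(k)."""
    return sum(n / s for n, s in zip(noise_ps, signal_ps))

def normalize(shape, energy=1.0):
    """Scale a spectral shape so that its bins sum to the given total energy."""
    total = sum(shape)
    return [energy * v / total for v in shape]

# Hypothetical low-frequency-dominated noise: N(k) = 1/k^2 for k = 1..K.
K = 256
noise = [1.0 / (k * k) for k in range(1, K + 1)]

# Four candidate measurement spectra, all with the same total energy.
white = normalize([1.0] * K)                            # flat spectrum
whitening = normalize(noise)                            # S(k) proportional to N(k)
min_noise = normalize([math.sqrt(n) for n in noise])    # S(k) proportional to sqrt(N(k))
pink = normalize([1.0 / k for k in range(1, K + 1)])    # S(k) proportional to 1/k

results = {
    "white": residual_noise(white, noise),
    "whitening": residual_noise(whitening, noise),
    "min_noise": residual_noise(min_noise, noise),
    "pink": residual_noise(pink, noise),
}
for name, value in results.items():
    print(f"{name:10s} {10 * math.log10(value):7.2f} dB")
```

Under this assumed model, the white and noise-whitening spectra leave the same residual noise, the square-root spectrum leaves less, and the pink spectrum coincides with the square-root spectrum for 1/k^2 noise because sqrt(1/k^2) = 1/k. This is consistent with the statements in the abstract, but the paper's own derivation should be consulted for the exact NRP definitions.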

Author: Kaneda, Yutaka
Affiliation: Department of Information and Communication Engineering, Faculty of Engineering, Tokyo Denki University, Adachi-ku, Tokyo, Japan
JAES Volume 63 Issue 5 pp. 348-357; May 2015
Publication Date: May 22, 2015

This paper is Open Access which means you can download it for free.

Numerical Modeling and Sound Synthesis for Articulated String/Fretboard Interactions

Sound synthesis for fretted instruments, such as the guitar, has a long history because of numerous challenges; the interaction between the finger, string, and fretboard under variable playing conditions is a delicate one, leading to many subtle features that are difficult to emulate using standard synthesis methods. A physical modeling approach thus becomes an attractive option. A vibrating string is subject to intermittent contact/recontact phenomena along the length of the fretboard; it is driven by a plucking interaction and stopped by a finger, whose position and applied force are gestural parameters. In this research, a finite-difference time-domain method is developed with a penalty potential allowing for a convenient model of distributed collision. Implementation details are discussed, and simulation results and visualizations are presented illustrating a variety of typical playing gestures. Finally, given that such methods for highly nonlinear systems are prone to numerical instability, a brief description of an energy-balanced or Hamiltonian framework is provided, allowing for convenient numerical stability conditions.
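To give a feel for the energy-balanced idea, here is a minimal sketch of the standard explicit finite-difference scheme for an ideal lossless string with fixed ends, together with a discrete energy that the scheme conserves. It deliberately omits everything specific to the paper (the penalty potential, fretboard collisions, the finger, and the plucking model), and all parameter values and names are assumptions chosen for illustration.

```python
def step(curr, prev, lam):
    """One update of u[n+1] = 2u[n] - u[n-1] + lam^2 * (second spatial difference)."""
    nxt = [0.0] * len(curr)  # end points stay at zero (fixed ends)
    for l in range(1, len(curr) - 1):
        nxt[l] = (2.0 * curr[l] - prev[l]
                  + lam * lam * (curr[l + 1] - 2.0 * curr[l] + curr[l - 1]))
    return nxt

def energy(curr, prev, h, k, c):
    """Discrete energy of the scheme: kinetic term plus a two-time-level potential term."""
    kinetic = 0.5 * h * sum(((a - b) / k) ** 2 for a, b in zip(curr, prev))
    potential = 0.5 * c * c / h * sum(
        (curr[l + 1] - curr[l]) * (prev[l + 1] - prev[l])
        for l in range(len(curr) - 1))
    return kinetic + potential

# Hypothetical parameters: N grid intervals, wave speed c (m/s), string length (m).
N, c, length = 50, 200.0, 1.0
lam = 0.9                # Courant number c*k/h; the scheme is stable for lam <= 1
h = length / N           # grid spacing
k = lam * h / c          # time step implied by the chosen Courant number

# Triangular initial displacement (1 mm peak), approximately at rest.
peak = 15
shape = [0.001 * (l / peak if l <= peak else (N - l) / (N - peak))
         for l in range(N + 1)]
prev, curr = shape[:], shape[:]

e0 = energy(curr, prev, h, k, c)
energies = [e0]
for _ in range(500):
    prev, curr = curr, step(curr, prev, lam)
    energies.append(energy(curr, prev, h, k, c))
```

In this linear case the discrete energy stays constant up to rounding error, and it is non-negative when lam <= 1, which is what yields the stability condition. The abstract indicates that a comparable energy-based argument is used for the nonlinear collision model, but the details there differ from this sketch.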

Authors: Bilbao, Stefan; Torin, Alberto
Affiliation: Acoustics and Audio Group, University of Edinburgh, Edinburgh, UK
JAES Volume 63 Issue 5 pp. 336-347; May 2015
Publication Date: May 22, 2015
