Authors:Hafezi, Sina; Reiss, Joshua D.
Affiliation:Queen Mary University of London, London, UK
In multitrack music production, some sounds get masked by other sounds and the listener has less ability to fully hear and distinguish the sound sources in the mix. The authors designed a simplified measure of masking based on best practices, and then implemented both an off-line and real-time, autonomous multitrack equalization system that reduces masking in multitrack audio. The system used objective measures of spectral masking in the resultant mixes. Listening tests provided a subjective comparison between the mix results of different implementations of the system, a raw mix, and manual mixes made by an amateur and a professional mix engineer. The results showed that autonomous systems reduce both the perceived and objective masking. The offline semi-autonomous system is capable of improving the raw mix better than an amateur and close to a professional mix by simply controlling one user parameter. The results also suggest that existing objective measures of masking are ill-suited for quantifying perceived masking in multitrack musical audio.
Download: PDF (HIGH Res) (2.6MB)
Download: PDF (LOW Res) (317KB)
Authors:Manor, Ella; Martens, William; Marui, Atsushi; Cabrera, Densil
Affiliation:University of Sydney, Sydney, Australia; Tokyo University of the Arts, Adachi-ku, Tokyo, Japan
Although final mixing and mastering is monitored over loudspeakers, the majority of music listeners use headphones on mobile devices. Preferences for spatial processing depend on the method of reproduction. For a variety of program material using headphones, listeners often prefer a stereophonic image that is created by simulating nearfield crosstalk compared to the biphonic spatial image. This novel approach, called Nearfield Crosstalk Simulation, describes crosstalk that simulates closely located loudspeakers. Previous work used farfield crosstalk simulation in an effort to produce an enhanced stereophonic effect, but such results were less preferred. The primary difference between the more conventional farfield crosstalk and the novel nearfield crosstalk developed for this study was the introduction of a level and a time difference at low frequency, consistent with what actually occurs for sound sources very close to a listener’s head.
Download: PDF (HIGH Res) (3.7MB)
Download: PDF (LOW Res) (315KB)
Authors:Bilbao, Stefan; Torin, Alberto
Affiliation:Acoustics and Audio Group, University of Edinburgh, Edinburgh, UK
Sound synthesis for fretted instruments, such as the guitar, has a long history because of numerous challenges; the interaction between the finger, string, and fretboard under variable playing conditions is a delicate one, leading to many subtle features that are difficult to emulate using standard synthesis methods. A physical modeling approach thus becomes an attractive option. A vibrating string is subject to intermittent contact/recontact phenomena along the length of the fretboard; the string is driven by a plucking interaction and stopped by a finger, whose position and applied force are gestural parameters. In this research, a finite-difference time-domain method is developed with a penalty potential allowing for a convenient model of distributed collision. Implementation details are discussed, and simulation results and visualizations are presented illustrating a variety of typical playing gestures. Finally, given that such methods for highly nonlinear systems are prone to numerical instability, a brief description of an energy-balanced or Hamiltonian framework is provided, allowing for convenient numerical stability conditions.
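As a rough illustration of what a penalty potential for collision can look like, the sketch below uses a one-sided power-law form, in which the contact force is the derivative of a potential that is zero whenever the string is clear of the fret. The power-law form, the function names, the sign convention (eta is positive when the string penetrates the fret), and the stiffness and exponent values are all assumptions made for this example; the abstract does not state the form used in the paper.

```python
def penalty_potential(eta, stiffness=1.0e6, exponent=1.5):
    """Assumed potential: K / (alpha + 1) * max(eta, 0) ** (alpha + 1).

    eta is the penetration depth (positive means string and fret overlap).
    The stiffness and exponent defaults are illustrative only.
    """
    overlap = max(eta, 0.0)
    return stiffness / (exponent + 1.0) * overlap ** (exponent + 1.0)


def penalty_force(eta, stiffness=1.0e6, exponent=1.5):
    """Contact force, the derivative of the potential: K * max(eta, 0) ** alpha."""
    overlap = max(eta, 0.0)
    return stiffness * overlap ** exponent


# No force while the string is clear of the fret, a stiff restoring
# force once it penetrates by a small amount.
force_clear = penalty_force(-1.0e-3)
force_contact = penalty_force(1.0e-3)
```

Because the force derives from a potential, the energy stored in the collision can be tracked alongside the string's kinetic and potential energy, which is the kind of bookkeeping an energy-balanced formulation relies on.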
Download: PDF (HIGH Res) (3.5MB)
Download: PDF (LOW Res) (425KB)
Affiliation:Department of Information and Communication Engineering, Faculty of Engineering, Tokyo Denki University, Adachi-ku, Tokyo, Japan
When measuring the acoustic impulse response in a real environment, acoustic and electrical noise corrupts the result. To improve the quality of measurements, researchers often use various excitation measurement signals, such as a linearly swept sine (SS) wave or a maximum length sequence (MLS) signal, because they have high energy at all frequencies. But when the noise is dominated by low-frequency components, the S/N in this region is reduced. In this study, the noise reduction performance (NRP) of different excitation signals was theoretically examined to derive equations that can determine the NRP from the spectra of the measurement signal and noise. From the theoretical and experimental examinations, the following results were obtained. The NRP for white signals and noise-whitening signals is actually the same. Only the minimum noise (MN) signal that minimizes the noise component showed a significant improvement in NRP. A pink spectrum measurement signal showed good NRP in the presence of 1/k^2 spectrum noise, where k is the discrete frequency number, but worse performance with other types of noise. This supports the conclusion that using the MN signal, which has a power spectrum that is the square root of the power spectrum of the noise, is the best method of reducing the effect of noise on the measured impulse response.
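The relationships described above can be checked numerically under a simple model. The sketch below assumes that every candidate measurement signal has the same total energy and that, after inverse filtering, the noise power left in the measured response is the sum over frequency bins of the noise power divided by the signal power. This model, the function names, and the bin count are assumptions made for illustration, not the paper's equations.

```python
import math


def normalize(power, energy=1.0):
    """Scale a power spectrum so that its bins sum to a fixed total energy."""
    total = sum(power)
    return [energy * p / total for p in power]


def residual_noise(signal_power, noise_power):
    """Assumed figure of merit: noise power remaining after inverse filtering."""
    return sum(n / s for n, s in zip(noise_power, signal_power))


bins = range(1, 1025)                    # discrete frequency numbers k
noise = [1.0 / k ** 2 for k in bins]     # 1/k^2 noise power spectrum

candidates = {
    "white": normalize([1.0 for _ in bins]),
    "noise_whitening": normalize(noise),                      # power follows the noise
    "pink": normalize([1.0 / k for k in bins]),               # power follows 1/k
    "minimum_noise": normalize([math.sqrt(n) for n in noise]),  # sqrt of noise power
}

results = {name: residual_noise(power, noise) for name, power in candidates.items()}
```

Under this model the white and noise-whitening signals leave the same residual noise, and for 1/k^2 noise the pink spectrum coincides with the square-root (MN) spectrum, which is why it performs well for that noise type in particular.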
Download: PDF (HIGH Res) (5.2MB)
Download: PDF (LOW Res) (497KB)
Affiliation:Politecnico di Milano, Department of Electronics, Information and Bioengineering, Milano, Italy
Infants can communicate their internal state (such as pain, hunger, fear, fatigue, or stress) by the nature of their crying. Experts in linguistics suggest that the cry comprises the first speech manifestations. This article describes the design methodology for classifying baby crying sound events according to the pathological status of the infant. Such an automated system can be an aid to an attending physician performing a diagnosis. In order to address this challenge, a great variety of audio parameters (Perceptual Linear Prediction, Mel Frequency Cepstral Coefficients, Perceptual Wavelet Packets, Teager Energy Operator, Temporal Modulation) were considered. Classification techniques, including Multilayer Perceptron, Support Vector Machine, Random Forest, Reservoir Network, Gaussian Mixture Model, and Hidden Markov Model, were customized. The goal is to provide an automatic and noninvasive framework for monitoring infants and helping inexperienced/trainee pediatricians, parents, and baby caregivers to identify the baby’s pathological status.
Download: PDF (HIGH Res) (4.6MB)
Download: PDF (LOW Res) (414KB)
Immersive audio is an increasingly important topic for the audio industry, and object-based treatment is related to it. The perceptual advantages of immersive audio depend strongly on the program material, and various novel coding schemes have emerged to deliver it at low bit rates.
Download: PDF (429KB)