Author: Bozena Kostek
Authors: Paiva, Rafael C. D.; Pakarinen, Jyri; Välimäki, Vesa
Affiliation: Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland; Nokia Institute of Technology, INdT, Brasilia, Brazil
In order to synthesize steel-stringed instruments, such as a guitar, a model of the pickup phenomenon is required. This model includes the pickup position, the sensitivity width of the transducer, mixing options with multiple pickups, linear resonant filtering, and distortion produced by the distance-dependent magnetic flux. A waveguide framework was used to describe frequency coloration of the pickup location and the low-pass effect of sensitivity width. The resulting models can be used in musical sound synthesis and digital effects. The physical properties of the pickup transducer modify the timbre of the instrument in various ways. For implementing audio effects for real guitar signals, hexaphonic pickups for separate signal streams for each string would be needed. Several commercial implementations of such pickup systems currently exist.
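As a rough illustration of the frequency coloration caused by pickup location, the sketch below uses the textbook mode shape of an ideal string fixed at both ends. It is not the authors' waveguide model; the function name and the idealized point pickup are assumptions made for the example.

```python
import math


def pickup_harmonic_gain(harmonic, rel_position):
    """Relative level of one string harmonic at an ideal point pickup.

    Assumption: an ideal string fixed at both ends, so harmonic k has the
    mode shape sin(k * pi * x / L). `rel_position` is x / L, the pickup
    distance from the bridge divided by the string length.
    """
    if harmonic < 1:
        raise ValueError("harmonic index starts at 1")
    if not 0.0 <= rel_position <= 1.0:
        raise ValueError("rel_position must lie in [0, 1]")
    return abs(math.sin(math.pi * harmonic * rel_position))


# Gains of the first eight harmonics for a pickup at one quarter of the string.
gains = [pickup_harmonic_gain(k, 0.25) for k in range(1, 9)]
```

Under this idealization, a pickup at one quarter of the string length sits on a node of every fourth harmonic, which produces the comb-like notches associated with pickup position.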
Authors: Schönstein, David; Katz, Brian F.G.
Affiliation: Arkamys, Paris, France; LIMSI-CNRS, Université Paris Sud, Orsay, France
Because appropriate head-related transfer functions (HRTFs) are key to binaural rendering, an evaluation is required to assess processing steps when individual HRTFs are not available. This study involving six subjects showed significant response variability in perceptual evaluations of HRTFs when subjects were asked to judge six sets of HRTFs, including individual HRTFs, with three different attributes. Insufficient reproducibility is problematic when trying to select nonindividual HRTFs. In order to minimize the effect of learning, adequate training should be provided. By using attribute evaluations and assessor selection, this study offers a methodology that might be used to produce consistent evaluations in commercial binaural syntheses.
Authors: Nikunen, Joonas; Virtanen, Tuomas; Vilermo, Miikka
Affiliation: Tampere University of Technology, Tampere, Finland; Nokia Research Center, Tampere, Finland
The expanding use of portable multimedia devices has intensified the need for better forms of scalable spatial audio coding (SAC) that match the connectivity rate and multichannel playback capabilities of the receiving device. A new SAC method is based on the parameterization of multichannel audio by representing it as a linear combination of objects composed of fixed spectral bases with time-varying gain and channel-dependent spatial gain. Spatial parameters can be estimated from the original multichannel signal using psychoacoustic properties of sound source localization. The base audio can be monophonic or downmixed stereophonic. Listening tests showed that the proposed SAC algorithm achieved the performance of conventional spatial audio coding methods with similar bit rates. The sound separation performance was evaluated and found applicable for separating sound sources in the coding domain directly.
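The parameterization described above can be read as a sum over objects, each contributing a fixed spectral basis scaled by a time-varying gain and a per-channel spatial gain. The sketch below shows only that synthesis step on magnitude values; the function and argument names are hypothetical, and parameter estimation and coding are not covered.

```python
def synthesize_magnitudes(bases, gains, spatial_gains):
    """Rebuild per-channel magnitude spectrograms from object parameters.

    bases[k][f]         : fixed spectral basis of object k at frequency bin f
    gains[k][t]         : time-varying gain of object k at frame t
    spatial_gains[c][k] : gain of object k in output channel c
    Returns out[c][f][t] = sum_k spatial_gains[c][k] * bases[k][f] * gains[k][t].
    """
    num_objects = len(bases)
    num_bins = len(bases[0])
    num_frames = len(gains[0])
    out = []
    for channel_weights in spatial_gains:
        channel = [[0.0] * num_frames for _ in range(num_bins)]
        for k in range(num_objects):
            for f in range(num_bins):
                for t in range(num_frames):
                    channel[f][t] += channel_weights[k] * bases[k][f] * gains[k][t]
        out.append(channel)
    return out


# Two objects, two frequency bins, two frames, two output channels.
example = synthesize_magnitudes(
    bases=[[1.0, 0.0], [0.0, 2.0]],
    gains=[[1.0, 2.0], [3.0, 0.0]],
    spatial_gains=[[1.0, 0.0], [0.5, 0.5]],
)
```

Because the spatial gains are the only channel-dependent quantities in this form, the same bases and time-varying gains can serve a mono or stereo base signal, which is consistent with the scalability the abstract describes.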
Authors: Zotter, Franz; Frank, Matthias
Affiliation: Institute of Electronic Music and Acoustics, University of Music and Performing Arts, Graz, Austria
The ideal panning algorithm for creating virtual locations in surround sound would have small variations in energy as the target locations change. All-Round Ambisonic Panning (AllRAP) is an algorithm that aims to create phantom sources of stable loudness and adjustable width for arbitrary loudspeaker arrangements. This is achieved by combining an extended version of Vector-Base Amplitude Panning (VBAP) with Ambisonics. Ambisonics as an audio format needs to be decoded to loudspeakers, which conventionally requires either dedicated loudspeaker arrangements or sophisticated mathematical treatment. In contrast, the proposed panning and decoding algorithm is highly generic and easily applicable to any arrangement of loudspeakers and to platforms with only basic computational capacities. Because AllRAP also works with loudspeaker arrangements covering only a part of a sphere, it is suitable for upcoming surround-with-height formats.
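For readers unfamiliar with the VBAP building block mentioned above, the following is a minimal sketch of plain two-dimensional, pairwise VBAP with energy normalization. It is not AllRAP and not the extended VBAP variant used by the authors; the function name and the angle convention (degrees, counterclockwise from the front) are assumptions.

```python
import math


def vbap_2d(source_deg, spk1_deg, spk2_deg):
    """Gains for one loudspeaker pair using plain 2-D VBAP.

    Solves g1 * l1 + g2 * l2 = p for unit direction vectors l1, l2
    (loudspeakers) and p (source), then scales so g1**2 + g2**2 == 1.
    """
    def unit(deg):
        rad = math.radians(deg)
        return math.cos(rad), math.sin(rad)

    p = unit(source_deg)
    l1 = unit(spk1_deg)
    l2 = unit(spk2_deg)
    det = l1[0] * l2[1] - l2[0] * l1[1]
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    # Cramer's rule for the 2x2 system.
    g1 = (p[0] * l2[1] - l2[0] * p[1]) / det
    g2 = (l1[0] * p[1] - p[0] * l1[1]) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


# Source straight ahead between loudspeakers at +30 and -30 degrees.
center_gains = vbap_2d(0.0, 30.0, -30.0)
```

A source midway between the pair receives equal gains, and a source in a loudspeaker's direction is routed to that loudspeaker alone. A negative gain would indicate that the source lies outside the arc spanned by the pair.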
Authors: Mendonça, Catarina; Campos, Guilherme; Dias, Paulo; Vieira, José; Ferreira, João P.; Santos, Jorge A.
Affiliation: University of Minho, School of Psychology, Centro Algoritmi, Centro de Computação Gráfica, Guimarães, Portugal; University of Aveiro, Department of Electronics, Telecommunications and Informatics, Aveiro, Portugal
Even though individual head-related transfer function (HRTF) filters produce better performance in virtual-reality environments, measuring individuals is labor intensive and expensive. Can training be used to enhance the performance of generic filters? This research shows that short training sessions with feedback allow for perceptual adaptation, whereas simple exposure to generic HRTF filters did not. The benefits of training were observed not only for the trained sounds but also for other stimulus positions that were not part of the training. Apparently, subjects were actually adapting and generalizing to the generic HRTF filters, which is a manifestation of sensory neural plasticity. Learning profiles are unique to individuals. Any testing of localization performance should recognize the influence of training.
Authors: Sofianos, Stratis; Ariyaeeinia, Aladdin; Polfreman, Richard; Sotudeh, Reza
Affiliation: University of Hertfordshire, Hatfield, Hertfordshire, UK; University of Southampton, Southampton, UK
Separating the singing voice from accompanying instruments is important in music information-retrieval systems, since it allows for such applications as melody extraction, lyrics recognition, and singer identification. The authors investigate effective methods for unsupervised separation of the singing voice and propose a method called H-Semantics (Hybrid Singing Extraction through Multiband Amplitude Enhanced Thresholding and Independent Component Subtraction). The proposed method adds time-domain separation to the previous work, which was based on frequency-domain cepstral methods. The results indicate an improvement of approximately 8.5 dB in signal-to-distortion ratio over the baseline.
Analysis of the electric network frequency (ENF) has rapidly emerged as a crucial tool in the armory of the forensic audio analyst. Traces of the ENF are often picked up on recordings, either by electromagnetic induction or acoustically, and these can be detected subsequently by analysts. It turns out that unique patterns in the frequency signature of power-line-related signals can be used to identify the time and place in which audio recordings might have been made. This is possible because some power grid companies keep records of the ENF in a database, against which forensic audio samples can be compared. Even if no such database is available, which is sometimes the case, analysis of residual traces of the ENF in a recording can be used to detect audio editing and other processes that might have been used to modify it. During the 46th International Conference, held recently in Denver, Colorado, a substantial number of the papers and posters were devoted to aspects of ENF analysis, a selection of which are summarized here.
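To make the idea of an ENF trace concrete, the sketch below estimates the dominant frequency near the nominal mains frequency in one block of samples by projecting the signal onto a grid of candidate frequencies. It is only an illustration under stated assumptions: the function name, the 0.1 Hz grid, and the synthetic test tone are hypothetical, and practical ENF work typically adds band-pass filtering, overlapping frames, and comparison against a reference database.

```python
import math


def estimate_enf(samples, sample_rate, lo_hz=49.0, hi_hz=51.0, step_hz=0.1):
    """Return the candidate frequency with the largest spectral magnitude.

    Evaluates a single-frequency Fourier projection of `samples` at each
    candidate in [lo_hz, hi_hz] and keeps the strongest one.
    """
    best_freq = lo_hz
    best_mag = -1.0
    num_steps = int(round((hi_hz - lo_hz) / step_hz))
    for i in range(num_steps + 1):
        freq = lo_hz + i * step_hz
        omega = 2.0 * math.pi * freq / sample_rate
        re = sum(x * math.cos(omega * n) for n, x in enumerate(samples))
        im = sum(x * math.sin(omega * n) for n, x in enumerate(samples))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_freq, best_mag = freq, mag
    return best_freq


# Synthetic hum at 50.2 Hz: 5 seconds sampled at 500 Hz (assumed values).
rate = 500
hum = [math.sin(2.0 * math.pi * 50.2 * n / rate) for n in range(5 * rate)]
enf_estimate = estimate_enf(hum, rate)
```

Repeating the estimate over successive blocks of a recording yields a frequency-versus-time curve, and it is this kind of curve that would be compared with grid records or inspected for discontinuities that suggest editing.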