AES New York 2007: Paper Session P3

AES 123rd Convention - Where Audio Comes Alive

P3 - Perception, Part 2

Friday, October 5, 1:30 pm — 5:30 pm
Chair: Poppy Crum, Johns Hopkins University School of Medicine - Baltimore, MD, USA

P3-1 Short-Term Memory for Musical Intervals: Cognitive Differences for Consonant and Dissonant Pure-Tone Dyads—Susan Rogers, Daniel Levitin, McGill University - Montreal, Quebec, Canada
To explore the origins of sensory and musical consonance/dissonance, 16 participants performed a short-term memory task by listening to sequentially presented dyads. Each dyad was presented twice; during each trial participants judged whether a dyad was novel or familiar. Nonmusicians showed greater recognition of musically dissonant than musically consonant dyads. Musicians recognized all dyads more accurately than predicted. Neither group used sensory distinctiveness as a recognition cue, suggesting that the frequency ratio, rather than the frequency difference between two tones, underlies memory for musical intervals. Participants recognized dyads well beyond the generally understood auditory short-term memory limit of 30 seconds, despite the inability to encode stimuli for long-term storage.
Convention Paper 7172

P3-2 Multiple Regression Modeling of the Emotional Content of Film and Music—Rob Parke, Elaine Chew, Chris Kyriakakis, University of Southern California - Los Angeles, CA, USA
Our research seeks to model the effect of music on the perceived emotional content of film media. We used participants’ ratings of the emotional content of film-alone, music-alone, and film-music pairings for a collection of emotionally neutral film clips and emotionally provocative music segments. Mapping the results onto a three-dimensional emotion space, we observed a strong relationship between the ratings of the film-alone and music-alone clips and those of the film-music pairs. Previously, we modeled the ratings in each dimension independently. We now use stepwise regression to develop models that describe the film-music ratings using quadratic terms and all dimensions simultaneously. We demonstrate that while linear terms are sufficient for single-dimension emotion models, regression models that consider multiple emotion dimensions yield better results.
Convention Paper 7173
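The difference between the two model families in P3-2 can be illustrated by the pool of candidate terms a stepwise procedure would choose from. The following is a minimal sketch, not the authors' implementation: the dimension names `d1` to `d3`, the predictor names, and the helper functions are all hypothetical, and the selection step itself is omitted.

```python
from itertools import combinations_with_replacement


def candidate_terms(predictors, degree=2):
    """Return the intercept plus every monomial of the predictors up to `degree`.

    Each term is a tuple of predictor names; the empty tuple is the intercept.
    """
    terms = [()]
    for d in range(1, degree + 1):
        terms.extend(combinations_with_replacement(predictors, d))
    return terms


def evaluate(term, row):
    """Evaluate one candidate term for a single observation (a dict of ratings)."""
    value = 1.0
    for name in term:
        value *= row[name]
    return value


# Hypothetical names: three emotion dimensions, rated for film alone and music alone.
dims = ["d1", "d2", "d3"]
predictors = [f"{source}_{d}" for source in ("film", "music") for d in dims]

# Earlier approach: one dimension at a time, linear terms only.
single_dim = candidate_terms(["film_d1", "music_d1"], degree=1)

# Current approach: all dimensions at once, with squares and cross-products.
all_dims = candidate_terms(predictors, degree=2)

print(len(single_dim), "terms vs", len(all_dims), "terms")
```

A stepwise regression would then add or drop terms from the larger pool according to a fit criterion; which terms survive depends on the data and is not shown here.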

P3-3 Measurements and Perception of Nonlinear Distortion—Comparing Numbers and Sound Quality—Alex Voishvillo, JBL Professional - Northridge, CA, USA
The discrepancy between traditional measures of nonlinear distortion and its perception is commonly recognized. THD, two-tone and multitone intermodulation, and the coherence function provide certain objective information about the nonlinear properties of a DUT, but they do not use any of the psychoacoustical principles responsible for distortion perception. Two approaches to building psychoacoustically relevant measurement methods are discussed. One is based on simulating the hearing system’s response, similar to the methods used to assess codec sound quality. The other is based on several distinctions: low-level versus high-level nonlinearities, low-order versus high-order nonlinearities, and distortion products that fall below the spectrum of the undistorted signal versus those that overlap the signal’s spectrum or fall above it. Several auralization examples substantiating this approach are demonstrated.
Convention Paper 7174
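As a small illustration of why a single THD figure can hide the low-order versus high-order distinction raised in P3-3, the sketch below computes THD relative to the fundamental from a list of harmonic amplitudes. The amplitude values are invented for illustration and are not taken from the paper.

```python
import math


def thd(harmonic_amplitudes):
    """Total harmonic distortion relative to the fundamental.

    harmonic_amplitudes[0] is the fundamental, [1] the 2nd harmonic, and so on.
    """
    fundamental = harmonic_amplitudes[0]
    distortion = math.sqrt(sum(a * a for a in harmonic_amplitudes[1:]))
    return distortion / fundamental


# Invented spectra: same distortion energy, different harmonic order.
low_order = [1.0, 0.03, 0.04]                       # 2nd and 3rd harmonics
high_order = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.05]   # 7th harmonic only

print(f"low-order THD:  {thd(low_order):.1%}")
print(f"high-order THD: {thd(high_order):.1%}")
```

Both spectra yield about the same THD, although the abstract's distinction suggests they need not sound alike; the number alone carries no information about harmonic order.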

P3-4 Influence of Loudness Level on the Overall Quality of Transmitted Speech—Nicolas Côté, France Télécom R&D - Lannion, France, and Berlin University of Technology, Berlin, Germany; Valérie Gautier-Turbin, France Télécom R&D - Lannion, France; Sebastian Möller, Berlin University of Technology - Berlin, Germany
This paper studies the influence of loudness on the perceived quality of transmitted speech. Overall quality is based on judgments of particular quality features, one of which is loudness. To determine the influence of loudness on perceived speech quality, we designed a two-step auditory experiment. We varied the speech level of selected speech samples and degraded them by coding and packet loss. Results show that loudness has an effect on the overall speech quality, but that the effect depends on the other impairments in the transmission path, and especially on the bandwidth of the transmitted speech. We tried to predict the auditory judgments with two quality prediction models. The signal-based WB-PESQ model, which normalizes the speech signals to a constant speech level, fails to predict the quality of speech signals whose only impairment is a non-optimum speech level. However, the parametric E-model, which includes a measure of the listening level, provides a good estimation of the speech quality.
Convention Paper 7175

P3-5 On the Use of Graphic Scales in Modern Listening Tests—Slawomir Zielinski, Peter Brooks, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
This paper provides a basis for discussion of the perception and use of graphic scales in modern listening tests. According to the literature, the distances between the adjacent verbal descriptors used in typical graphic scales are often perceptually unequal. This implies that the scales are perceptually nonlinear, and the ITU-R Quality Scale is shown to be particularly nonlinear in this respect. To quantify the degree of violation of linearity, the evaluative use of graphic scales was studied in three listening tests. Contrary to expectation, the results showed that the listeners used the scales almost linearly. This may indicate that the listeners ignored the meaning of the descriptors and used the scales without reference to the labels.
Convention Paper 7176

P3-6 A Model-Based Technique for the Perceptual Optimization of Multimodal Musical Performances—Daniel Valente, Jonas Braasch, Rensselaer Polytechnic Institute - Troy, NY, USA
As multichannel audio and visual processing becomes more accessible to the general public, musicians are beginning to experiment with performances where players are in two or more remote locations. These co-located or telepresence performances challenge the conventions and basic rules of traditional musical experience. While they allow for collaboration with musicians and audiences in remote locations, the current limitations of technology restrict the communication between musicians. In addition, a telepresence performance introduces optical distortion that can impair auditory communication, which makes it necessary to study certain auditory-visual interactions. One such interaction is the relationship between a musician and a virtual visual environment. How does the attendant visual environment affect the perceived presence of a musician? An experiment was conducted to determine the magnitude of this effect. Two pre-recorded musical performances were presented through a virtual display in a number of acoustically diverse environments under different relative background lighting conditions. Participants were asked to adjust the direct-to-reverberant ratio and the reverberant level until the virtual musician's acoustic environment was congruent with that of the visual representation. One can expect auditory-visual interactions in the perception of a musician in varying virtual environments. Through a multivariate parameter optimization, the results from this paper will be used to develop a parametric model that will control the current auditory rendering system, Virtual Microphone Control (ViMiC), in order to create a more perceptually accurate auditory-visual environment for performance.
Convention Paper 7177

P3-7 Subjective and Objective Rating of Intelligibility of Speech Recordings—Bradford Gover, John Bradley, National Research Council - Ottawa, Ontario, Canada
Recordings of test speech and an STIPA modulated noise stimulus were made with several microphone systems placed in various locations in a range of controlled test spaces. The intelligibility of the test speech recordings was determined by a subjective listening test, revealing the extent of differences among the recording systems and locations. Also, STIPA was determined for each physical arrangement and compared with the intelligibility test scores. The results indicate that STIPA was poorly correlated with the subjective responses, and not very useful for rating the microphone system performance. A computer program was written to determine STIPA in accordance with IEC 60268-16. The result was found to be highly sensitive to the method of determining the modulation transfer function at each modulation frequency, yielding the most accurate result when normalizing by the premeasured properties of the specific stimulus used.
Convention Paper 7178
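The sensitivity P3-7 reports, where the result depends on how the modulation transfer function is determined, can be illustrated with a simplified sketch. It estimates the modulation depth of an intensity envelope at one modulation frequency, optionally normalizes by the depth measured on the stimulus itself, and maps the result to a transmission index with the usual STI-style conversion (apparent signal-to-noise ratio clipped to ±15 dB). This is an assumption-laden illustration, not the authors' program: the envelopes, sample rate, and depths are invented, and the octave-band filtering, band weighting, and masking corrections of IEC 60268-16 are omitted.

```python
import cmath
import math


def modulation_depth(intensity, fs, fm):
    """Estimate the modulation depth of an intensity envelope at fm (Hz)."""
    total = sum(intensity)
    component = sum(
        x * cmath.exp(-2j * math.pi * fm * n / fs)
        for n, x in enumerate(intensity)
    )
    return 2.0 * abs(component) / total


def transmission_index(m):
    """Map a modulation transfer value to a transmission index between 0 and 1."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)          # keep the logarithm finite
    snr_db = 10.0 * math.log10(m / (1.0 - m))  # apparent signal-to-noise ratio
    snr_db = min(max(snr_db, -15.0), 15.0)     # clip to the +/-15 dB range
    return (snr_db + 15.0) / 30.0


def envelope(depth, fs, fm, seconds):
    """Synthetic intensity envelope 1 + depth * cos(2*pi*fm*t)."""
    n = int(fs * seconds)
    return [1.0 + depth * math.cos(2.0 * math.pi * fm * i / fs) for i in range(n)]


# Invented values: the stimulus itself is not fully modulated.
fs, fm = 1000, 5.0
source = envelope(0.8, fs, fm, 2.0)     # stimulus as generated
received = envelope(0.4, fs, fm, 2.0)   # stimulus as recorded

m_raw = modulation_depth(received, fs, fm)
m_norm = m_raw / modulation_depth(source, fs, fm)

print(f"TI without normalization: {transmission_index(m_raw):.3f}")
print(f"TI with normalization:    {transmission_index(m_norm):.3f}")
```

In this toy case the unnormalized value understates the transmission index because the stimulus's own modulation depth is below 1, which is the kind of discrepancy that normalizing by premeasured stimulus properties is meant to remove.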

P3-8 Potential Biases in MUSHRA Listening Tests—Slawomir Zielinski, Philip Hardisty, Christopher Hummersone, Francis Rumsey, University of Surrey - Guildford, Surrey, UK
The method described in the ITU-R BS.1534-1 standard, commonly known as MUSHRA (MUltiple Stimulus with Hidden Reference and Anchors), is widely used for the evaluation of systems exhibiting intermediate quality levels, in particular low-bit-rate codecs. This paper demonstrates that this method, despite its popularity, is not immune to biases. In two different experiments designed to investigate potential biases in the MUSHRA test, systematic discrepancies in the results were observed with a magnitude of up to 20 percent. The data indicate that these discrepancies could be attributed to stimulus-spacing and range-equalizing biases.
Convention Paper 7179

Last Updated: 20070820, mei
