Events of the AES: AES 113th Convention: SESSION M: PSYCHOACOUSTICS, PART 1

Tuesday, October 8, 9:00 am – 12:00 noon
SESSION M: PSYCHOACOUSTICS, PART 1

Chair: Christopher Struck, Dolby Laboratories, San Francisco, CA, USA

M-1 Comparison of Objective Measures of Loudness Using Audio Program Material—Eric Benjamin, Dolby Laboratories, San Francisco, CA, USA

Level measurements do not necessarily correlate well with subjective loudness. Several methods are described for making objective measurements that are designed to correlate better with actual loudness. These measures include A-weighting and B-weighting, the methods of Stevens and Zwicker as described in ISO 532A and B, and the method described by Moore and Glasberg. All of these measures are intended to describe the loudness of sounds with constant spectra. How well do these measures work with typical audio signals distributed via broadcast or recording?
Convention Paper 5703
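
As an illustration of the simplest measure named in the M-1 abstract, the following is a minimal sketch of A-weighting applied to per-band levels. It uses the widely published analog A-weighting formula; the function names and the band-level input format are assumptions made for this example and are not taken from the paper.

```python
import math

def a_weighting_db(freq_hz: float) -> float:
    """A-weighting gain in dB at freq_hz (standard analog formula, about 0 dB at 1 kHz)."""
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(numerator / denominator) + 2.00

def a_weighted_level_db(band_levels_db: dict[float, float]) -> float:
    """Combine per-band levels (hypothetical input: {center_freq_hz: level_db})
    into a single A-weighted level by summing weighted band powers."""
    total_power = sum(
        10.0 ** ((level + a_weighting_db(freq)) / 10.0)
        for freq, level in band_levels_db.items()
    )
    return 10.0 * math.log10(total_power)

if __name__ == "__main__":
    # Equal band levels: the low band contributes far less after weighting.
    print(round(a_weighted_level_db({100.0: 70.0, 1000.0: 70.0}), 2))
```

A frequency weighting of this kind depends only on the spectrum of the signal. The Stevens, Zwicker, and Moore and Glasberg methods mentioned in the abstract are considerably more involved and are not sketched here.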

M-2 Descriptive Analysis and Ideal Point Modeling of Speech Quality in Mobile Communication—Ville-Veikko Mattila, Nokia Research Center, Tampere, Finland

Descriptive analysis of processed speech quality was carried out by semantic differentiation, and external preference mapping was used to relate the attributes to overall quality judgments. Clean and noisy speech samples from different speakers were processed by various processing chains representing mobile communications, resulting in a total of 170 samples. The perceptual characteristics of the test samples were described by 18 screened subjects, and the final descriptive language with 21 attributes was developed in panel discussions. The scaled attributes were mapped to overall quality evaluations collected from 30 screened subjects by partial least squares regression. The Phase II ideal point modeling was used to predict the quality with an average error of about 6 percent and to study the linearity of the attributes.
Convention Paper 5704

M-3 Relating Multilingual Semantic Scales to a Common Timbre Space—William L. Martens, Charith N. W. Giragama, University of Aizu, Aizu-Wakamatsu, Fukushima-ken, Japan

A single, common perceptual space for a small set of processed guitar timbres was derived for two groups of listeners, one group comprising native speakers of the Japanese language, the other group comprising native speakers of Sinhala, a language of Sri Lanka. Subsets of these groups made ratings on 13 bipolar adjective scales for the same set of sounds, each of the two groups using anchoring adjectives taken from their native language. The adjectives were those freely chosen most often in a preliminary elicitation. The results showed that the Japanese and Sinhalese semantic scales related differently to the dimensions of their shared timbre space, which was derived using MDS analysis of the combined dissimilarity ratings of listeners from both groups.
Convention Paper 5705

M-4 Design and Evaluation of Binaural Cue Coding Schemes—Frank Baumgarte, Christof Faller, Agere Systems, Murray Hill, NJ, USA

Binaural cue coding (BCC) offers a compact parametric representation of auditory spatial information such as localization cues inherent in multichannel audio signals. BCC allows reconstruction of the spatial image given a mono signal and spatial cues that require a very low rate of a few kbit/s. This paper reviews relevant auditory perception phenomena exploited by BCC. The BCC core processing scheme design is discussed from a psychoacoustic point of view. This approach leads to a BCC implementation based on binaural perception models. The audio quality of this implementation is compared with low-complexity FFT-based BCC schemes presented earlier. Furthermore, spatial equalization schemes are introduced to optimize the auditory spatial image of loudspeaker or headphone presentation.
Convention Paper 5706
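
The idea in the M-4 abstract of a mono signal plus compact spatial cues can be illustrated with a deliberately simplified sketch. The code below uses one full-band inter-channel level difference per frame. This is an assumption made for brevity, not the authors' scheme, which is described as being based on binaural perception models; practical systems of this kind typically work in frequency bands and use further cues. All function names are hypothetical.

```python
import math

def encode_frame(left, right):
    """Return (mono, icld_db): a sum downmix plus one level-difference cue in dB."""
    mono = [0.5 * (l + r) for l, r in zip(left, right)]
    eps = 1e-12  # avoids division by zero and log of zero on silent frames
    p_left = sum(x * x for x in left) + eps
    p_right = sum(x * x for x in right) + eps
    icld_db = 10.0 * math.log10(p_left / p_right)
    return mono, icld_db

def decode_frame(mono, icld_db):
    """Rebuild a stereo frame by scaling the mono signal according to the level cue."""
    ratio = 10.0 ** (icld_db / 20.0)   # left/right amplitude ratio
    g_right = 2.0 / (1.0 + ratio)      # gains chosen so g_left + g_right = 2,
    g_left = ratio * g_right           # which undoes the 0.5 * (l + r) downmix
    return [g_left * m for m in mono], [g_right * m for m in mono]

if __name__ == "__main__":
    source = [math.sin(2.0 * math.pi * 440.0 * n / 48000.0) for n in range(480)]
    left = [0.8 * s for s in source]    # source panned toward the left
    right = [0.2 * s for s in source]
    mono, cue = encode_frame(left, right)
    out_left, out_right = decode_frame(mono, cue)
    print(round(cue, 2), round(max(abs(a - b) for a, b in zip(left, out_left)), 6))
```

For a single amplitude-panned source, this toy decoder recovers both channels almost exactly from the mono signal and a single number per frame. It cannot represent time differences or several sources at different positions within the same frame.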

M-5 Evaluating Influences of a Central Automotive Loudspeaker on Perceived Spatial Attributes Using a Graphical Assessment Language—Natanya Ford¹, Francis Rumsey¹, Tim Nind²; ¹University of Surrey, Guildford, Surrey, UK; ²Harman/Becker Automotive Systems, Martinsville, IN, USA

An investigation is described which further develops a graphical assessment language (GAL) for subjectively evaluating spatial attributes of audio reproductions. Two groups of listeners, those with previous experience of using a GAL and listeners new to graphical elicitation, were involved in the study which considered the influence of a central automotive loudspeaker on listeners’ perception of ensemble width, instrument focus, and image skew. Listeners represented these attributes from both driver’s and passenger’s seats using their own graphical descriptors. Source material for the study consisted of simple instrumental and vocal sources chosen for their spectral and temporal characteristics. Sources ranged from a sustained cello melody to percussion and speech extracts. When analyzed using conventional statistical methods, responses highlighted differences in listeners’ perceptions of width, focus, and skew for the various experimental conditions.
Convention Paper 5707

M-6 Multidimensional Perceptual Calibration for Distortion Effects Processing Software—Marui Atsushi, William L. Martens, University of Aizu, Aizuwakamatsu-shi, Fukushima-ken, Japan

Controlled nonlinear distortion effects processing produces a wide range of musically useful outputs, especially in the production of popular guitar sounds. But systematic control of distortion effects has been difficult to attain, due to the complex interaction of input gain, drive level, and tone controls. Rather than attempting to calibrate the output of commercial effects processing hardware, which typically employs proprietary distortion algorithms, a real-time software-based distortion effects processor was implemented and tested. Three distortion effect types were modeled using both waveshaping and a second-order filter to provide more complete control over the parameters typically manipulated in controlling effects for electric guitars. The motivation was to relate perceptual differences between effects processing outputs to the mathematical functions describing the nonlinear waveshaping producing variation in distortion. Perceptual calibration entailed listening sessions where listeners adjusted the tone of each of nine test outputs, and then made both pairwise dissimilarity ratings and attribute ratings for those nine stimuli. The results provide a basis for an effects-processing interface that is perceptually calibrated for system users.
Convention Paper 5708
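
The combination of waveshaping and a second-order filter mentioned in the M-6 abstract can be sketched as follows. The abstract does not specify the three distortion types or the filter design, so the hyperbolic tangent shaper, the low-pass biquad (coefficients from the commonly used Audio EQ Cookbook formulas), and all parameter names here are assumptions chosen for illustration only.

```python
import math

def waveshape(samples, drive):
    """Soft-clip each sample with tanh; a higher drive gives stronger distortion."""
    return [math.tanh(drive * x) for x in samples]

def lowpass_biquad(samples, cutoff_hz, sample_rate, q=0.707):
    """Second-order low-pass filter acting as a simple tone control."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b0 = (1.0 - cos_w0) / 2.0
    b1 = 1.0 - cos_w0
    b2 = (1.0 - cos_w0) / 2.0
    a0 = 1.0 + alpha
    a1 = -2.0 * cos_w0
    a2 = 1.0 - alpha
    out = []
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs (filter state)
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def distort(samples, drive, tone_hz, sample_rate):
    """Waveshaper followed by the tone filter (one possible ordering)."""
    return lowpass_biquad(waveshape(samples, drive), tone_hz, sample_rate)

if __name__ == "__main__":
    rate = 44100.0
    tone = [0.5 * math.sin(2.0 * math.pi * 220.0 * n / rate) for n in range(441)]
    shaped = distort(tone, drive=8.0, tone_hz=3000.0, sample_rate=rate)
    print(len(shaped))
```

Even in this small sketch the audible result depends jointly on the input level, `drive`, and `tone_hz`, which is the kind of control interaction the abstract identifies as the reason for perceptual calibration.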