AES Los Angeles 2014
Paper Session P8
P8 - Perception: Part 1
Friday, October 10, 2:00 pm — 5:00 pm (Room 308 AB)
Dan Mapes-Riordan, Etymotic Research - Elk Grove Village, IL, USA; DMR Consultants - Evanston, IL, USA
P8-1 Effect of Phase on the Perceived Level of Bass—Mikko-Ville Laitinen, Aalto University - Espoo, Finland; Kai Jussila, Aalto University - Espoo, Finland; Ville Pulkki, Aalto University - Espoo, Finland; Technical University of Denmark - Denmark
The perceived level of bass is typically considered to be related to the level of the magnitude spectrum at the corresponding frequencies. However, it has recently been found that, in the case of harmonic complex signals, the phase spectrum can also affect it. This paper studies this effect further using formal listening tests. It is found that the phase spectrum that produces the perception of the loudest bass depends on the individual. Furthermore, the loudness of the bass appears to be affected by the phase characteristics of the tone in a relatively wide band.
Convention Paper 9137
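As a rough illustration of the premise in the P8-1 abstract, the following sketch builds two harmonic complex signals that share the same magnitude spectrum but differ in phase spectrum. The sample rate, fundamental, number of harmonics, and the quadratic phase rule are all assumptions chosen for illustration; they are not the stimuli used in the paper. The sketch only shows that phase changes the waveform while the RMS level stays the same, and says nothing about perceived bass level.

```python
import math

SAMPLE_RATE = 8000    # Hz (assumed for illustration)
F0 = 50               # fundamental in Hz (assumed); one period = 160 samples
NUM_HARMONICS = 10    # equal-amplitude harmonics (assumed)


def harmonic_complex(phases):
    """Return one period of a sum of unit-amplitude harmonics with the given phases."""
    period = SAMPLE_RATE // F0
    return [
        sum(
            math.cos(2 * math.pi * k * n / period + phases[k - 1])
            for k in range(1, NUM_HARMONICS + 1)
        )
        for n in range(period)
    ]


def rms(x):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def peak(x):
    """Largest absolute sample value."""
    return max(abs(v) for v in x)


# Case 1: all harmonics start in cosine phase, so they align at n = 0.
cosine_phases = [0.0] * NUM_HARMONICS

# Case 2: a quadratic (Schroeder-type) phase rule, a hypothetical choice
# that spreads the harmonics' peaks across the period.
quadratic_phases = [
    -math.pi * k * (k - 1) / NUM_HARMONICS for k in range(1, NUM_HARMONICS + 1)
]

cosine_wave = harmonic_complex(cosine_phases)
quadratic_wave = harmonic_complex(quadratic_phases)

print("RMS  (cosine, quadratic):", rms(cosine_wave), rms(quadratic_wave))
print("Peak (cosine, quadratic):", peak(cosine_wave), peak(quadratic_wave))
```

Both signals have identical harmonic amplitudes, so a magnitude-only analysis cannot tell them apart, yet their peak values differ.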
P8-2 Auditory Compensation for Spectral Coloration—Cleopatra Pike, University of Surrey - Guildford, Surrey, UK; Russell Mason, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK
The “spectral compensation effect” (Watkins, 1991) describes a decrease in perceptual sensitivity to spectral modifications caused by the transmission channel (e.g., loudspeakers, listening rooms). Few studies have examined this effect: its extent and perceptual mechanisms have not been confirmed. The extent to which compensation affects the perception of sounds colored by loudspeakers and other channels should be determined. This compensation has been studied mainly with speech. Evidence suggests that speech engages special perceptual mechanisms, so compensation might not occur with non-speech sounds. The current study provides evidence of compensation for spectrum in non-speech tests: channel coloration was reduced by approximately 20%.
Convention Paper 9138
P8-3 The Importance of Onset Features in Listeners’ Perception of Vocal Modes in Singing—Eddy B. Brixen, EBB-consult - Smorum, Denmark; Cathrine Sadolin, Complete Vocal Institute - Copenhagen, Denmark; Henrik Kjelin, Complete Vocal Institute - Copenhagen, Denmark
The Complete Vocal Technique defines four vocal modes: Neutral, Curbing, Overdrive, and Edge. This paper reports the results of a listening test involving 59 subjects. The goal was to determine the importance of onset and decay features when identifying the vocal modes. The conclusion is that the onset is responsible only to a minor degree for the aural detection of vocal modes.
Convention Paper 9139
P8-4 The Influence of Up- and Down-mixes on the Overall Listening Experience—Michael Schoeffler, International Audio Laboratories Erlangen - Erlangen, Germany; Alexander Adami, International Audio Laboratories Erlangen - Erlangen, Germany; Jürgen Herre, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany
Previous studies have shown that up- and down-mix algorithms have a significant effect on ratings of audio quality. The question arises whether this significant effect is also verifiable when it comes to rating the overall listening experience of music. When listeners rate the overall listening experience, they are allowed to take into account everything that is important to them for enjoying a listening experience. An experiment was conducted in which 25 participants rated the overall listening experience while listening to music that was artistically mixed and up- and down-mixed by six algorithms. The results show that there are no significant differences between the artistic mixes and the up- and down-mix algorithms, except for two mixing algorithms that served as “lower anchors” and had a significant negative effect on the ratings.
Convention Paper 9140
P8-5 Measures of Microdynamics—Esben Skovenborg, TC Electronic - Risskov, Denmark
Overall loudness variations such as the distance between soft and loud scenes of a movie are known as macrodynamics and can be quantified with the Loudness Range measure. Microdynamics, in contrast, concern variations on a (much) finer time-scale. In this study six types of objective measures—some based on loudness level, some based on peak-to-average ratio—were evaluated against perceived microdynamics. A novel measure LDR, based on the maximum difference between a “fast” and a “slow” loudness level, had the strongest perceptual correlation. Peak-to-average ratio (or crest factor) type of measures had little or no correlation. The ratings of perceived microdynamics were obtained in a listening experiment, with stimuli consisting of music and speech of different dynamical properties.
Convention Paper 9141
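The two families of measures named in the P8-5 abstract can be sketched roughly as follows. This is a simplified illustration only: it uses plain RMS level in dB instead of a perceptual loudness level, and the window lengths, hop size, and test signals are hypothetical. It is not the LDR definition from the paper, which the abstract does not spell out.

```python
import math


def level_db(samples):
    """Mean-square level in dB (plain RMS, no loudness weighting)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10 * math.log10(max(mean_square, 1e-12))


def crest_factor_db(signal):
    """Peak-to-average ratio: peak level minus RMS level, in dB."""
    peak = max(abs(x) for x in signal)
    return 20 * math.log10(peak) - level_db(signal)


def max_fast_slow_difference_db(signal, fast_win, slow_win, hop):
    """Largest amount by which a short-window level exceeds a long-window level.

    Both windows end at the same sample, so the comparison is made at a
    common point in time. Window lengths are in samples (assumed values).
    """
    best = -math.inf
    for end in range(slow_win, len(signal) + 1, hop):
        fast = level_db(signal[end - fast_win:end])
        slow = level_db(signal[end - slow_win:end])
        best = max(best, fast - slow)
    return best


def tone(amplitudes):
    """50 Hz sine at an assumed 1000 Hz sample rate, scaled per sample."""
    return [a * math.sin(math.pi * n / 10) for n, a in enumerate(amplitudes)]


# Hypothetical test signals: a steady tone, and a quiet tone with a loud burst.
steady = tone([0.5] * 2000)
bursty = tone([1.0 if 1000 <= n < 1100 else 0.1 for n in range(2000)])

for name, sig in (("steady", steady), ("bursty", bursty)):
    diff = max_fast_slow_difference_db(sig, fast_win=40, slow_win=400, hop=10)
    print(name, "crest factor (dB):", round(crest_factor_db(sig), 2),
          "max fast-slow difference (dB):", round(diff, 2))
```

The windows here span whole periods of the tone, so the steady signal yields a fast-slow difference of essentially zero, while the burst pushes the short-window level well above the long-window level.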
P8-6 Real-Time Infant Cry Detection in Diverse Environments: A Novel Approach—Anant Baijal, Samsung Electronics Co. Ltd. - Suwon, Korea; Jinsung Kim, Samsung Electronics Co. Ltd. - Suwon, Republic of Korea; Jae-hoon Jeong, Samsung Electronics Co. Ltd. - Suwon, Korea; Inwoo Hwang, Samsung Electronics Co. Ltd. - Suwon-si, Gyeonggi-do, Korea; JungEun Park, Samsung Electronics Co. Ltd. - Suwon, Korea; Byeong-Seob Ko, Samsung Electronics Co. Ltd. - Suwon, Korea
We present a novel approach to detecting infant cries in actual outdoor and indoor settings. Using computationally inexpensive features such as Mel Frequency Cepstral Coefficients (MFCCs) and timbre-related features, the proposed algorithm yields very high recall rates for detecting infant cries in challenging settings such as café, street, playground, office, and home environments, even when the Signal to Noise Ratio (SNR) is as low as 6 dB, while maintaining high precision. The results indicate that our approach is highly accurate, robust, and works in real time.
Convention Paper 9142
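The P8-6 abstract reports its results in terms of recall and precision. For readers unfamiliar with these terms, the sketch below computes both from frame-level detection labels. The label sequences are invented purely for illustration and do not come from the paper; the function name is likewise hypothetical.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall from two equal-length 0/1 label sequences.

    precision = true positives / all frames flagged as cry
    recall    = true positives / all frames that really contain a cry
    """
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    false_neg = sum(1 for p, a in zip(predicted, actual) if a and not p)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Made-up example: 1 = frame labeled as infant cry, 0 = anything else.
actual_labels    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(predicted_labels, actual_labels)
print("precision:", precision, "recall:", recall)
```

A detector tuned for high recall tends to flag more frames and risks more false alarms, which is why the abstract states both figures together.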