AES Paris 2016
Poster Session P12

P12 - Perception Part 1 and Audio Signal Processing Part 2

Sunday, June 5, 14:45 — 16:45 (Foyer)

P12-1 Auditory Perception of the Listening Position in Virtual Rooms Using Static and Dynamic Binaural Synthesis
Annika Neidhardt, Technische Universität Ilmenau - Ilmenau, Germany; Bernhard Fiedler, University of Technology Ilmenau - Ilmenau, Germany; Tobias Heinl, University of Technology Ilmenau - Ilmenau, Germany
Virtual auditory environments (VAEs) can be explored by controlling the position and orientation of an avatar and listening to the scene from its changing perspective. Reverberation is essential for immersion and plausibility as well as for externalization and distance perception of the sound sources. Current room simulation algorithms provide a high degree of realism for static and dynamic binaural reproduction. This investigation studies people’s ability to discriminate listening positions within a virtual room. The aim is to find out whether state-of-the-art room simulation algorithms are perceptually appropriate, and also to learn more about people’s ability to orient themselves within a purely acoustical scene. New findings will help in designing suitable VAEs. [Also a lecture: see session P6-7]
Convention Paper 9517


Convention Paper 9519

P12-3 An Innovative Structure for the Approximation of Stereo Reverberation Effect Using Mixed FIR/IIR Filters
Andrea Primavera, Universitá Politecnica della Marche - Ancona, Italy; Stefania Cecchi, Universitá Politecnica della Marche - Ancona, Italy; Laura Romoli, Universitá Politecnica della Marche - Ancona, Italy; Massimo Garai, University of Bologna - Bologna, Italy; Francesco Piazza, Universitá Politecnica della Marche - Ancona (AN), Italy
Reverberation is a well-known effect that has an important role in our listening experience. Focusing on hybrid reverberator structures, this paper presents an innovative structure for the approximation of stereo impulse responses using low-complexity filters. In more detail, the conventional hybrid reverberator structure is modified by adding a tone correction filter that emulates high-frequency air absorption and by introducing a technique for reproducing the stereo perception. On this basis, the presented approach yields a better approximation of the impulse responses in both the time and frequency domains. Results are reported for several real impulse responses and compared with previous techniques in terms of computational complexity and reverberation quality.
Convention Paper 9542

P12-4 Improvement of DUET for Blind Source Separation in Closely Spaced Stereo Microphone Recording
Chan Jun Chun, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea
This paper proposes a blind source separation (BSS) method to improve the performance of the degenerate unmixing estimation technique (DUET) when sound sources are recorded using closely spaced stereo microphones. In particular, the attenuation-delay-based discrimination analysis employed in DUET is replaced with a microphone spacing- and source direction-based discrimination analysis in order to remedy the problem of DUET when the attenuation factors between recorded stereo audio signals are not distinguishable. In other words, the proposed BSS method generates a histogram as a function of the microphone spacing and the directional difference between stereo signals. Next, the generated histogram is used to partition the time-frequency representations of the mixtures into that of each sound source. The performance of the proposed method is evaluated by means of both objective and subjective measures. Consequently, it is shown from the evaluation that the proposed BSS method outperforms the conventional DUET in a closely spaced stereo microphone recording environment.
Convention Paper 9543
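
The abstract above does not give the formulas behind its spacing- and direction-based histogram, so the following is only a rough sketch of the general idea under a standard far-field assumption: an interchannel phase difference at a known frequency implies a delay, and with a known microphone spacing that delay implies an arrival angle. The function names, the 343 m/s speed of sound, and the equal-width binning are illustrative assumptions, not the authors' method. The phase-to-angle mapping is unambiguous only while the spacing is below half a wavelength.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def direction_from_phase(phase_diff_rad, freq_hz, spacing_m):
    """Far-field arrival angle (radians) implied by an interchannel phase difference."""
    delay = phase_diff_rad / (2.0 * math.pi * freq_hz)  # seconds
    sine = SPEED_OF_SOUND * delay / spacing_m
    # Clamp so that noise cannot push the value outside asin's domain.
    sine = max(-1.0, min(1.0, sine))
    return math.asin(sine)


def direction_histogram(angles_rad, n_bins):
    """Count angles in equal-width bins spanning -90 to +90 degrees."""
    counts = [0] * n_bins
    width = math.pi / n_bins
    for angle in angles_rad:
        index = min(int((angle + math.pi / 2) / width), n_bins - 1)
        counts[index] += 1
    return counts
```

In a separation pipeline of this kind, each time-frequency bin would contribute one angle estimate, and peaks in the histogram would be taken as candidate source directions.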

P12-5 A Phase-Matched Exponential Harmonic Weighting for Improved Sensation of Virtual Bass
Hyungi Moon, Yonsei University - Seoul, Korea; Gyutae Park, Yonsei University - Seoul, Korea; Young-cheol Park, Yonsei University - Wonju, Kwangwon-do, Korea; Dae Hee Youn, Yonsei University - Seoul, Korea
The Virtual Bass System (VBS), which is based on the psychoacoustic phenomenon called the “missing fundamental,” is widely used to extend the lower frequency limit of small loudspeakers. The perceptual quality of the VBS is highly dependent on the weighting strategy for the generated harmonics. Several weighting strategies have been proposed, including loudness matching, exponential attenuation, and timbre matching. To convey the weighting strategy precisely, however, it is essential to match the phases of the reproduced harmonics to those of the natural harmonics contained in the original signal. In this paper the limitations of previous harmonic weighting schemes are addressed and a new harmonic weighting scheme is proposed. In the proposed scheme, the slope of the attenuation weighting is dynamically varied according to the frequency of the missing fundamental, and phase matching between the original and generated harmonics is performed prior to the harmonic weighting. Subjective tests show that the proposed method provides a more natural and effective bass sensation than the conventional schemes.
Convention Paper 9544

P12-6 Extraction of Interchannel Coherent Component from Multichannel Audio
Akio Ando, University of Toyama - Toyama, Japan; Hiroki Tanaka, University of Toyama - Toyama, Japan; Hiro Furuya, University of Toyama - Toyama, Japan
Three-dimensional audio recording usually involves a number of spatially distributed microphones to capture the spatial sound. The temporal differences in arrival of sound from a source to microphones make the recorded signal less coherent than that with coincident microphones. In this paper a new method that extracts the interchannel coherent component from multichannel audio signal is proposed. It estimates the component of one channel signal from the other channel signals based on the least squares estimation. The experimental result showed that the new method can extract the interchannel coherent component from multichannel audio signal regardless of the number of channels of the signal.
Convention Paper 9545
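
As a rough illustration of the least squares idea in the abstract above, the sketch below estimates one channel as a weighted sum of the other channels and treats that estimate as the coherent component. It assumes instantaneous (memoryless) real-valued weights, which is a simplification chosen for brevity; the paper may well use filters or a frequency-domain formulation. The function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unmodified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("reference channels are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def coherent_component(target, references):
    """Least-squares estimate of `target` as a weighted sum of `references`."""
    n = len(references)
    # Normal equations: (R R^T) w = R t, with one row of R per reference channel.
    gram = [[sum(p * q for p, q in zip(references[i], references[j]))
             for j in range(n)] for i in range(n)]
    cross = [sum(p * q for p, q in zip(references[i], target)) for i in range(n)]
    weights = solve_linear(gram, cross)
    estimate = [sum(w * ch[k] for w, ch in zip(weights, references))
                for k in range(len(target))]
    return weights, estimate
```

The residual `target - estimate` is orthogonal to every reference channel, so under these assumptions it holds the part of the target that the other channels cannot explain.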

P12-7 The Difference in Perceptual Attributes for the Distortion Timbre of the Electric Guitar between Guitar Players and Non-Guitar Players
Koji Tsumoto, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan; Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan; Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan
Subjective evaluation experiments were performed to reveal the perceptual attributes of the distorted timbre of the electric guitar. The motivation was to make conversations about distorted timbre between guitar players and non-guitar players at recording sessions go more smoothly. The signals of three guitar performances were distorted with three different amounts of distortion and three kinds of frequency characteristics, for a total of twenty-seven stimuli. Sixteen non-guitar players and sixteen electric guitar players participated in the rating experiments using semantic scales anchored by eight bipolar adjective pairs. The results indicated that both groups had similar perceptual attributes for distorted guitar timbres. One latent factor was found and was correlated with the acoustic features. The alterations of frequency characteristics did not appear as a variable affecting the judgment of distortion timbres.
Convention Paper 9546

P12-8 The Effect of a Vertical Reflection on the Relationship between Preference and Perceived Change in Timbre and Spatial Attributes
Thomas Robotham, University of Huddersfield - Huddersfield, UK; Matthew Stephenson, University of Huddersfield - Huddersfield, West Yorkshire, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
This study aims to investigate a vertical reflection’s beneficial or detrimental contribution to subjective preference compared with perceived change in timbral and spatial attributes. A vertical reflection was electro-acoustically simulated and evaluated through subjective tests using musical stimuli in the context of listening for entertainment. Results indicate that the majority of subjects preferred audio reproduction with the addition of a reflection. Furthermore, there is a potential relationship between positive preference and the perceived level of both timbral and spatial differences, although this relationship is dependent on the stimuli presented. Subjects also described perceived differences where the reflection was present. These descriptors provide evidence suggesting a link between timbral descriptions and preference. However, this link was not observed between preference and spatial descriptions.
Convention Paper 9547

P12-9 Relative Contribution of Interaural Time and Level Differences to Selectivity for Sound Localization
Si Wang, Wuhan Polytechnic University - Wuhan, Hubei, China; Heng Wang, Wuhan Polytechnic University - Wuhan, Hubei, China; Cong Zhang, Wuhan Polytechnic University - Wuhan, Hubei, China
In the present study, we measured the threshold of the interaural level difference in the standard stimulus (ILDs) via the interaural time difference in the variable stimulus (ITDv), and tested the just noticeable difference (JND) of the interaural time difference in the standard stimulus (ITDs) via the interaural level difference in the variable stimulus (ILDv), for sine waves at frequencies from 150 to 1500 Hz and at several lateral positions of the sound image. Two separate experiments were conducted based on a two-alternative forced-choice (2AFC) task and a 1 up/2 down adaptive procedure. From these experimental data we could explore the relative contribution of interaural level difference (ILD) and interaural time difference (ITD) to sound localization as a function of position and frequency. The results showed that lateral discrimination between stimuli is not difficult at frequencies of 350, 450, 570, and 700 Hz when we tested the JND of ILD in the standard stimulus. When we measured the JND of ITD in the standard stimulus, the auditory system discriminated two sound images more easily and was more sensitive in localizing the lateral positions of the standard stimulus as frequency varied from 700 to 1500 Hz.
Convention Paper 9548
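
For readers unfamiliar with the adaptive procedure named in the abstract above, here is a minimal sketch of a generic 1 up/2 down staircase: the stimulus difference is made larger after every incorrect response and smaller after two consecutive correct responses. The fixed step size, the starting level, and the function name are illustrative assumptions; the abstract does not state the authors' step rule or stopping criterion.

```python
def run_staircase(responses, start_level, step):
    """Return the stimulus level before each trial and after the last one.

    `responses` is a sequence of booleans (True = correct response).
    """
    level = start_level
    correct_in_a_row = 0
    levels = [level]
    for correct in responses:
        if correct:
            correct_in_a_row += 1
            if correct_in_a_row == 2:
                # Two correct in a row: make the task harder.
                level -= step
                correct_in_a_row = 0
        else:
            # Any incorrect response: make the task easier.
            level += step
            correct_in_a_row = 0
        levels.append(level)
    return levels
```

A rule of this form settles near the level at which about 70.7% of responses are correct, which is why it is a common choice for threshold and JND measurements.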

P12-10 Assessment of the Impact of Spatial Audiovisual Coherence on Source Unmasking
Julian Palacino, UBO - LabSTICC - Lorient, France; Mathieu Paquier, UBO - Brest, France; Vincent Koehl, UBO - Lab-STICC - Brest, France; Frédéric Changenet, Radio France - Paris, France; Etienne Corteel, Sonic Emotion Labs - Paris, France
The present study aims to evaluate the contribution of spatial audiovisual coherence to sound source unmasking in live music mixing. Sound engineers working with wave field synthesis (WFS) technologies for live sound mixing have reported that their mixing methods have radically changed. With conventional mixing methods, the audio spectrum is balanced in order to make each instrument intelligible inside the stereo mix. In contrast, with WFS technologies, source intelligibility can be achieved through spatial audiovisual coherence and/or sound spatialization, without spectral modifications. The respective effects of spatial audiovisual coherence and sound spatialization should be perceptually evaluated. As a first step, the ability of naive and expert subjects to identify a spatialized mix was evaluated by a discrimination task. For this purpose, live performances (rock, jazz, and classical) were played back to subjects with and without stereoscopic video display and with VBAP or WFS audio rendering. Two sound engineers produced the audio mixes for three pieces of music and for both audio technologies in the same room where the test was carried out. [Also a lecture: see session P6-5]
Convention Paper 9516

P12-11 Modeling the Perceptual Components of Loudspeaker Distortion
Sune Lønbæk Olsen, GoerTek Audio Technologies - Copenhagen, Denmark; Technical University of Denmark; Finn T. Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark; Ewen MacDonald, Technical University of Denmark - Lyngby, Denmark; Tore Stegenborg-Andersen, DELTA SenseLab - Hørsholm, Denmark; Christer P. Volk, DELTA SenseLab - Hørsholm, Denmark; Aalborg University - Aalborg, Denmark
While non-linear distortion in loudspeakers decreases audio quality, the perceptual consequences can vary substantially. This paper investigates the metric Rnonlin [1] which was developed to predict subjective measurements of sound quality in nonlinear systems. The generalizability of the metric in a practical setting was explored across a range of different loudspeakers and signals. Overall, the correlation of Rnonlin predictions with subjective ratings was poor. Based on further investigation, an additional normalization step is proposed, which substantially improves the ability of Rnonlin to predict the perceptual consequences of non-linear distortion.
Convention Paper 9549

P12-12 Comparison of the Objective and the Subjective Parameters of the Different Types of Microphone Preamplifiers
Michal Luczynski, Wroclaw University of Technology - Wroclaw, Poland; Maciej Sabiniok, Wroclaw University of Technology - Wroclaw, Poland
The aim of this paper is to compare different types of microphone preamplifiers. The authors designed six types of preamps using different technologies (e.g., based on vacuum tubes, transistors, or operational amplifiers). Assumed parameters such as input signal, gain, and power supply were the same for all circuits. The preamps were tested by objective and subjective methods. The authors then looked for relations between different gain components, electroacoustic parameters, and subjective sensation. The intent was not to create commercial devices, but to compare and classify objective and subjective parameters depending on the type of microphone preamplifier.
Convention Paper 9550

P12-13 Plane Wave Identification with Circular Arrays by Means of a Finite Rate of Innovation Approach
Falk-Martin Hoffmann, University of Southampton - Southampton, Hampshire, UK; Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK; Philip Nelson, University of Southampton - Southampton, UK
Many problems in the field of acoustic measurements depend on the direction of incoming wave fronts with respect to a measurement device or aperture. This knowledge can be useful for signal processing purposes such as noise reduction, source separation, de-aliasing, and super-resolution strategies, among others. This paper presents a signal processing technique for identifying the directions of travel of the principal plane wave components in a sound field measured with a circular microphone array. The technique is derived from a finite rate of innovation data model, and its performance is evaluated by means of a simulation study for different numbers of plane waves in the sound field. [Also a lecture: see session P7-4]
Convention Paper 9521

P12-14 Automatic Localization of a Virtual Sound Image Generated by a Stereophonic Configuration
Laura Romoli, Universitá Politecnica della Marche - Ancona, Italy; Stefania Cecchi, Universitá Politecnica della Marche - Ancona, Italy; Ferruccio Bettarelli, Leaff Engineering - Ancona, Italy; Francesco Piazza, Universitá Politecnica della Marche - Ancona (AN), Italy
Sound localization systems aim at providing the position of a particular sound source as perceived by the human auditory system. Interaural level difference, interaural time difference, and spectral representations of the binaural signals are the main cues adopted for localization. When two sound sources are simultaneously active, a virtual source is created. In this paper a novel approach is presented for estimating the position, as perceived by a human listener, of a sound image created by two loudspeakers. The solution is based on both frequency-dependent binaural and monaural cues in order to account for the sensitivity of the human auditory system in spatial sound localization. Experimental results proved the effectiveness of the proposed approach in correctly estimating the horizontal and vertical position of the virtual source.
Convention Paper 9551

P12-15 The Effect of Early Impulse Response Length and Visual Environment on Externalization of Binaural Virtual Sources
Joseph Sinker, University of Salford - Salford, UK; Ben Shirley, University of Salford - Salford, Greater Manchester, UK
When designing an audio-augmented-reality (AAR) system capable of rendering acoustic “overlays” to real environments, it is advantageous to create externalized virtual sources with minimal computational complexity. This paper describes experiments designed to explore the relationships between early impulse response (EIR) length, visual environment and perceived externalization, and to identify if reduced IR data can effectively render a virtual source in matched and unmatched environments. In both environments a broadly linear trend is exhibited between EIR length and perceived externalization, and statistical analysis suggests a threshold at approximately 30-40 ms above which the extension of the EIR yields no significant increase in externalization.
Convention Paper 9552

P12-16 The Perception of Vertical Image Spread by Interchannel Decorrelation
Christopher Gribben, University of Huddersfield - Huddersfield, West Yorkshire, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
Subjective listening tests were conducted to assess the general perception of decorrelation in the vertical domain. Interchannel decorrelation was performed between a pair of loudspeakers in the median plane, one at ear level and the other elevated 30° above. The test stimuli consisted of decorrelated octave-band pink noise samples (63–8000 Hz), generated using three decorrelation techniques. Each method featured three degrees of the interchannel cross-correlation coefficient (ICCC): 0.1, 0.4, and 0.7. Thirteen subjects participated in the experiment, using a pairwise comparison method to grade the sample with the greater perceived vertical image spread (VIS). Results suggest there is broadly little difference in overall VIS between decorrelation methods, and changes to vertical interchannel decorrelation appear to be better perceived in the upper-middle frequencies. [Also a lecture: see session 6-3]
Convention Paper 9514
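
The ICCC values quoted in the abstract above (0.1, 0.4, 0.7) can be read as normalized correlation between the two loudspeaker signals. The sketch below computes a zero-lag normalized cross-correlation as one simple reading of that coefficient. Using zero lag only, with no mean removal and no search over lags, is an assumption made here for brevity; the abstract does not say how the authors computed the ICCC.

```python
import math


def iccc(x, y):
    """Zero-lag normalized cross-correlation of two equal-length signals."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if denominator == 0.0:
        raise ValueError("at least one signal has zero energy")
    return numerator / denominator
```

Identical signals give 1, mutually orthogonal signals give 0, and intermediate values describe partially decorrelated pairs.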
