AES Paris 2016
Paper Session Details

P1 - Audio Equipment and Audio Formats

Saturday, June 4, 09:00 — 11:30 (Room 353)

Chair:
Menno van der Veen, Ir. bureau Vanderveen - Hichtum, Netherlands

P1-1 Linearization Technique of the Power Stage in Open-Loop Class D Amplifiers—Federico Guanziroli, STMicroelectronics - Milan, Italy; Pierangelo Confalonieri, STMicroelectronics - Milan, Italy; Germano Nicollini, STMicroelectronics - Milan, Italy
An efficient method to linearize the switching (power) stage of open-loop class D amplifiers is presented. This technique has been successfully applied to an open-loop fully-digital PWM class D amplifier designed in a 40 nm CMOS process, leading to nearly 15 dB improvement in the Total Harmonic Distortion (THD). In simulation, the open-loop class D amplifier achieved a 105 dBA Signal-to-Noise Ratio (SNR) and 1 W output power into 8 Ohm with 90% power efficiency and 0.014% THD.
Convention Paper 9484

P1-2 Physically-Based Large-Signal Modeling for Miniature Type Dual Triode Tube—Shiori Oshimo, Hiroshima Institute of Technology - Hiroshima, Japan; Kanako Takemoto, Hiroshima Institute of Technology - Hiroshima, Japan; Toshihiko Hamasaki, Hiroshima Institute of Technology - Hiroshima, Japan
A precise SPICE model for miniature (MT) triode tubes of high-µ 12AX7 and medium-µ 12AU7 is proposed, based on the physical analysis of the measurement results. Comparing the characteristics between these tubes, the grid current at lower plate voltage and positive grid bias condition is modeled successfully with novel perveance parameters for the first time, though it was known that the perveance depends on both grid and plate bias. It is shown that the modulation factor of the space charge for the MT triodes is different from the other classic tubes. The model is implemented in LTspice and replicates the grid current and cathode current well over three orders of magnitude. [Also a poster—see session P5-2]
Convention Paper 9485

P1-3 Analysis of Current MEMS Microphones for Cost-Effective Microphone Arrays—A Practical Approach—Sven Kissner, Jade University of Applied Sciences - Oldenburg, Germany; Jörg Bitzer, Jade Hochschule Oldenburg - Oldenburg, Germany
With this paper we present a practically relevant investigation of current, commercially available MEMS microphones (Micro-ElectroMechanical Systems). We compared the static noise floor exhibited by single and various parallel MEMS microphone configurations and a conventional and commonly used electret capsule, as well as the directivity patterns of selected configurations. The results suggest that while current types are exhibiting an already acceptable static noise floor, a direct parallel circuit of MEMS microphones allows further reductions of the noise floor close to the theoretical value of 3 dB SPL per doubling of number of microphones while maintaining omnidirectionality below 5 kHz.
Convention Paper 9486
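
The 3 dB-per-doubling figure quoted in the P1-3 abstract follows from averaging microphone outputs whose self-noise is uncorrelated and of equal power. A minimal Python sketch of that theoretical limit is given below. The function name `parallel_noise_reduction_db` is hypothetical and not taken from the paper, and real arrays can be expected to fall somewhat short of this ideal value.

```python
import math


def parallel_noise_reduction_db(num_mics: int) -> float:
    """Ideal noise-floor reduction in dB when num_mics outputs are averaged.

    Assumption (not from the paper): every microphone contributes
    uncorrelated self-noise of equal power, so the signal-to-noise
    ratio improves by a factor of num_mics in power.
    """
    if num_mics < 1:
        raise ValueError("num_mics must be at least 1")
    return 10.0 * math.log10(num_mics)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        # Each doubling adds about 3.01 dB in this idealized model.
        print(f"{n} microphone(s): {parallel_noise_reduction_db(n):.2f} dB")
```

Under these assumptions, two microphones give about 3.01 dB, four about 6.02 dB, and eight about 9.03 dB of reduction relative to a single capsule.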

P1-4 Matching the Amplifier to the Audio for Highly Efficient Linear Amplifiers—Jamie Angus, University of Salford - Salford, Greater Manchester, UK; JASA Consultancy - York, UK
“Class-D” switching amplifiers are considered to be the most efficient amplifiers available on the market. However, designers must deal with supply rail and radio frequency interference, as well as the need to switch power devices at high frequencies. Because of these and other problems, not everyone wishes to use switching-based technologies for their amplifiers. Unfortunately, linear amplifiers are significantly less efficient than switching amplifiers under sine wave testing. However, real audio signals spend much more time at low amplitudes than a sine wave does. By choosing the switch points appropriately, “Class-G” or “Class-H” amplifiers can have efficiencies that rival those of “Class-D” amplifiers producing the same output. The paper develops optimum switch points for both single and multiple switching points, with respect to the expected amplitude distribution of the audio.
Convention Paper 9487

P1-5 Delay-Reduced Mode of MPEG-4 Enhanced Low Delay AAC (AAC-ELD)—Markus Schnell, Fraunhofer IIS - Erlangen, Germany; Wolfgang Jaegers, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Pablo Delgado, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Conrad Benndorf, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Tobias Albert, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Manfred Lutzky, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The MPEG-4 AAC Enhanced Low Delay (AAC-ELD) coder is well established in high quality communication applications, such as Apple’s FaceTime, as well as in professional live broadcasting. Both applications require high interactivity, which typically demands an algorithmic codec delay between 15 ms and 35 ms. Recently, MPEG finalized a new delay-reduced mode for AAC-ELD featuring only a fraction of the regular algorithmic delay. This mode operates virtually at higher sampling rates while maintaining standard sampling rates for I/O. Supporting this feature, AAC-ELD can address even more delay-critical applications, like wireless microphones or headsets for TV. In this paper, the main details of the delay-reduced mode of AAC-ELD are presented and application scenarios are outlined. Audio quality aspects are discussed and compared against other codecs with a delay below 10 ms. [Also a poster—see session P5-3]
Convention Paper 9488

P2 - Audio Signal Processing—Part 1: Coding, Encoding, and Perception

Saturday, June 4, 09:00 — 12:00 (Room 352B)

Chair:
Dejan Todorovic, Dirigent Acoustics - Belgrade, Serbia

P2-1 Low Complexity, Software Based, High Rate DSD Modulator Using Vector Quantification—Thierry Heeb, ISIN-SUPSI - Manno, Switzerland; Digimath - Sainte-Croix, Switzerland; Tiziano Leidi, ISIN-SUPSI - Manno, Switzerland; Diego Frei, ISIN-SUPSI - Manno, Switzerland; Alexandre Lavanchy, Engineered SA - Yverdon-les-Bains, Switzerland
High rate Direct Stream Digital (DSD) is emerging as a format of choice for distribution of high-definition audio content. However, real-time encoding of such streams requires considerable computing resources due to their high sampling rate, constraining implementations to hardware based platforms. In this paper we disclose a new modulator topology allowing for reduction in computational load and making real-time high rate DSD encoding suitable for software based implementation on off-the-shelf Digital Signal Processors (DSPs). We first present the architecture of the proposed modulator and then show results from a practical real-time implementation.
Convention Paper 9489

P2-2 Phase Derivative Correction of Bandwidth-Extended Signals for Perceptual Audio Codecs—Mikko-Ville Laitinen, Aalto University - Espoo, Finland; Sascha Disch, Fraunhofer IIS, Erlangen - Erlangen, Germany; Christopher Oates, Fraunhofer IIS - Erlangen, Germany; Ville Pulkki, Aalto University - Espoo, Finland
Bandwidth extension methods, such as spectral band replication (SBR), are often used in low-bit-rate codecs. They allow transmitting only a relatively narrow low-frequency region along with parametric information about the higher bands. The signal for the higher bands is obtained by simply copying it from the transmitted low-frequency region. The copied-up signal is processed by multiplying the magnitude spectrum with suitable gains based on the transmitted parameters to obtain a magnitude spectrum similar to that of the original signal. However, the phase spectrum of the copied-up signal is typically not processed but is used directly. In this paper we describe the perceptual consequences of directly using the copied-up phase spectrum. Based on the observed effects, two metrics for detecting the perceptually most significant effects are proposed. Based on these, methods for correcting the phase spectrum are proposed, as well as strategies for minimizing the number of additional parameter values that must be transmitted to perform the correction. Finally, the results of formal listening tests are presented.
Convention Paper 9490

P2-3 AC-4 – The Next Generation Audio Codec—Kristofer Kjörling, Dolby Sweden AB - Stockholm, Sweden; Jonas Rödén, Dolby Sweden AB - Stockholm, Sweden; Martin Wolters, Dolby Germany GmbH - Nuremberg, Germany; Jeff Riedmiller, Dolby Laboratories - San Francisco, CA, USA; Arijit Biswas, Dolby Germany GmbH - Nuremberg, Germany; Per Ekstrand, Dolby Sweden AB - Stockholm, Sweden; Alexander Gröschel, Dolby Germany GmbH - Nuremberg, Germany; Per Hedelin, Dolby Sweden AB - Stockholm, Sweden; Toni Hirvonen, Dolby Laboratories - Stockholm, Sweden; Holger Hörich, Dolby Germany GmbH - Nuremberg, Germany; Janusz Klejsa, Dolby Sweden AB - Stockholm, Sweden; Jeroen Koppens, Dolby Sweden AB - Stockholm, Sweden; K. Krauss, Dolby Germany GmbH - Nuremberg, Germany; Heidi-Maria Lehtonen, Dolby Sweden AB - Stockholm, Sweden; Karsten Linzmeier, Dolby Germany GmbH - Nuremberg, Germany; Hannes Muesch, Dolby Laboratories, Inc. - San Francisco, CA, USA; Harald Mundt, Dolby Germany GmbH - Nuremberg, Germany; Scott Norcross, Dolby Laboratories - San Francisco, CA, USA; J. Popp, Dolby Germany GmbH - Nuremberg, Germany; Heiko Purnhagen, Dolby Sweden AB - Stockholm, Sweden; Jonas Samuelsson, Dolby Sweden AB - Stockholm, Sweden; Michael Schug, Dolby Germany GmbH - Nuremberg, Germany; L. Sehlström, Dolby Sweden AB - Stockholm, Sweden; R. Thesing, Dolby Germany GmbH - Nuremberg, Germany; Lars Villemoes, Dolby Sweden - Stockholm, Sweden; Mark Vinton, Dolby - San Francisco, CA, USA
AC-4 is a state-of-the-art audio codec standardized by ETSI (TS 103 190 and TS 103 190-2), and TS 103 190 is part of the DVB toolbox (TS 101 154). AC-4 is designed to address the current and future needs of video and audio entertainment services, including broadcast and Internet streaming. As such, it incorporates a number of features beyond the traditional audio coding algorithms, such as capabilities to support immersive and personalized audio, support for advanced loudness management, video-frame synchronous coding, and dialogue enhancement. This paper will outline the thinking behind the design of the AC-4 codec, explain the different coding tools used and the systemic features included, and give an overview of performance and applications. [Also a poster—see session P5-6]
Convention Paper 9491

P2-4 Using Phase Information to Improve the Reconstruction Accuracy in Sinusoidal Modeling—Clara Hollomey, Glasgow Caledonian University - Glasgow, Scotland, UK; David Moore, Glasgow Caledonian University - Glasgow, Lanarkshire, UK; Don Knox, Glasgow Caledonian University - Glasgow, Scotland, UK; W. Owen Brimijoin, MRC/CSO Institute of Hearing Research - Glasgow, Scotland, UK; William Whitmer, MRC/CSO Institute of Hearing Research - Glasgow, Scotland, UK
Sinusoidal modeling is one of the most common techniques for general purpose audio synthesis and analysis. Owing to the ever increasing amount of available computational resources, nowadays practically all types of sounds can be constructed up to a certain degree of perceptual accuracy. However, the method is computationally expensive and can in some cases, particularly for transient signals, still exceed the available computational resources. In this work, methods derived from the realm of machine learning are exploited to provide a simple and efficient means to estimate the achievable reconstruction quality. The peculiarities of common classes of musical instruments are discussed and, finally, the existing metrics are extended by information on the signal's phase propagation to allow for more accurate estimations. [Also a poster—see session P5-8]
Convention Paper 9492

P2-5 Equalization of Spectral Dips Using Detection Thresholds—Sunil G. Bharitkar, HP Labs., Inc. - San Francisco, CA, USA; Charles Q. Robinson, Dolby Laboratories - San Francisco, CA, USA; Andrew Poulain, Dolby - San Jose, CA, USA
Frequency response equalization is often performed to improve audio reproduction. Variations from the target system response due to playback equipment or room acoustics can result in perceptible timbre distortion. In the first part of this paper we describe experiments conducted to determine the audibility of artificially introduced spectral dips. In particular, we measured notch depth detection threshold (dependent variable) with respect to notch center frequency and Q-factor (independent variables). Listening tests were administered to 10 listeners in a small listening room and a screening room (small cinema with approximately 100 seats). Pink noise was used as the stimulus as it is perceptually flat (with roughly 3 dB/octave spectral tilt with frequency) and is known to be a reliable and discriminating signal for performing timbre judgments. The listeners gave consistent notch depth results with low variability around the mean value. The notch audibility data was then used to develop multiple candidate algorithms that generate equalization curves designed to perceptually match a desired target response, while minimizing the equalization gain applied. Informal subjective results validated the performance of the final algorithm.
Convention Paper 9493

P2-6 Single-Channel Audio Source Separation Using Deep Neural Network Ensembles—Emad M. Grais, University of Surrey - Guildford, Surrey, UK; Gerard Roma, University of Surrey - Guildford, Surrey, UK; Andrew J. R. Simpson, University of Surrey - Guildford, Surrey, UK; Mark D. Plumbley, University of Surrey - Guildford, Surrey, UK
Deep neural networks (DNNs) are often used to tackle the single channel source separation (SCSS) problem by predicting time-frequency masks. The predicted masks are then used to separate the sources from the mixed signal. Different types of masks produce separated sources with different levels of distortion and interference. Some types of masks produce separated sources with low distortion, while other masks produce low interference between the separated sources. In this paper a combination of different DNNs’ predictions (masks) is used for SCSS to achieve better quality of the separated sources than using each DNN individually. We train four different DNNs by minimizing four different cost functions to predict four different masks. The first and second DNNs are trained to approximate reference binary and soft masks. The third DNN is trained to predict a mask from the reference sources directly. The last DNN is trained similarly to the third DNN but with an additional discriminative constraint to maximize the differences between the estimated sources. Our experimental results show that combining the predictions of different DNNs achieves separated sources with better quality than using each DNN individually. [Also a poster—see session P5-7]
Convention Paper 9494
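
To make the reference binary and soft masks mentioned in the P2-6 abstract concrete, the following Python sketch computes both from the magnitudes of two known sources over a handful of time-frequency bins. It uses common textbook definitions, which may differ from those in the paper. The function names are hypothetical, and the plain averaging in `average_masks` is only an assumed stand-in for the combination step, which the abstract does not specify.

```python
def binary_mask(mag_target, mag_other):
    """1.0 where the target source dominates a time-frequency bin, else 0.0."""
    return [1.0 if t > o else 0.0 for t, o in zip(mag_target, mag_other)]


def soft_mask(mag_target, mag_other, eps=1e-12):
    """Ratio of target magnitude to total magnitude in each bin (0 to 1)."""
    return [t / (t + o + eps) for t, o in zip(mag_target, mag_other)]


def average_masks(masks):
    """Assumed combination rule: bin-wise mean of several predicted masks."""
    return [sum(values) / len(values) for values in zip(*masks)]


if __name__ == "__main__":
    # Magnitudes of two reference sources in four time-frequency bins.
    target = [3.0, 1.0, 0.0, 2.0]
    other = [1.0, 1.0, 2.0, 2.0]

    hard = binary_mask(target, other)
    soft = soft_mask(target, other)
    combined = average_masks([hard, soft])

    # Applying a mask to the mixture magnitude gives the source estimate.
    mixture = [t + o for t, o in zip(target, other)]
    estimate = [m * x for m, x in zip(combined, mixture)]
    print(hard, soft, combined, estimate)
```

The binary mask keeps or discards each bin outright, while the soft mask scales it, which is one way to see why the two kinds of mask trade off distortion against interference differently.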

P3 - Instrumentation and Measurement

Saturday, June 4, 13:30 — 17:00 (Room 353)

Chair:
Bert Kraaijpoel, Dutch Film Academy (NFA) - Amsterdam, Netherlands; Royal Conservatory - The Hague, Netherlands

P3-1 Characterization and Measurement of Wind Noise around Microphones—Chris Woolf, Broadcast Engineering Systems - Cornwall, UK
Producing wind noise measurements for microphones that correlate well with practical use has always proved difficult. Characterizing the noise numerically, rather than spectrally, has proved even harder. This paper explores some novel approaches to both problems. The airflow patterns of a newly designed wind generator are mapped, and a simple method of producing turbulent flow from a laminar stream is demonstrated. In order to characterize the wind noise numerically a dual number approach is explored as a possibility. This takes the spectral curves for a protected and unprotected microphone in an airstream, and produces two numbers: one for the level of noise reduction, and a second one for the accuracy with which the two curves track, duly corrected for audibility.
Convention Paper 9495

P3-2 Rocking Modes (Part 2: Diagnostics)—William Cardenas, Klippel GmbH - Dresden, Germany; Wolfgang Klippel, Klippel GmbH - Dresden, Germany
The rocking behavior of the diaphragm is a severe problem in headphones, micro-speakers, and other kinds of loudspeakers causing voice coil rubbing that limits the maximum acoustical output at low frequencies. The root causes of this problem are small imbalances in the distribution of the stiffness, mass, and force factor in the gap. Based on lumped parameter modeling, modal decomposition, and signal flow charts presented in a previous paper (Part 1), this paper focuses on the practical measurement using laser vibrometry, parameter identification, and root cause analysis. New characteristics are presented that simplify the interpretation of the identified parameters. The new technique has been validated by numerical simulations and systematic modifications of a real transducer. The diagnostic value of the new measurement technique has been illustrated on a transducer used in headphones.
Convention Paper 9496

P3-3 Harmonic Distortion Measurement for Nonlinear System Identification—John Vanderkooy, University of Waterloo - Waterloo, ON, Canada; Sean Thomson, Bowers & Wilkins - Steyning, West Sussex, UK
In order to model nonlinearities in loudspeakers, accurate measurement of harmonic distortion is necessary with particular attention to the relative phases of fundamental and harmonics. This paper outlines several ways that logarithmic sweeps can be used to achieve this goal. It is shown that Novak’s redesign of the logsweep is not strictly necessary, if proper account is taken of the phase relationships of the various harmonics. We study several other types of sweeps and methods to extract precise harmonic amplitudes and phases, using tracking filter concepts. The paper also deals with measurement systems that may have fractional-sample delays between excitation, reference, and data channels. Such details are important for accurate phase characterization of transfer functions. An intermodulation example is given for which sweeps with a single instantaneous frequency are inadequate. [Also a poster—see session P8-5]
Convention Paper 9497

P3-4 Evaluation of a Fast HRTF Measurement System—Jan-Gerrit Richter, Institute of Technical Acoustics, RWTH Aachen University - Aachen, Germany; Gottfried Behler, RWTH Aachen University - Aachen, Germany; Janina Fels, RWTH Aachen University - Aachen, Germany
This paper describes and evaluates a measurement setup that acquires individual Head-Related Transfer Functions (HRTFs) at high spatial resolution in a short time. The setup is constructed to have as little impact on the measurement as possible. It consists of a circular arc segment of approximately 160 degrees on which a large number of broadband loudspeakers are placed, forming one continuous surface. By rotating the subject or the arc horizontally, HRTFs are acquired along a spherical surface. To evaluate the influence of the measurement setup, a solid sphere and an artificial head are measured with the presented system, and the results are compared with simulation data obtained using the Boundary Element Method and with a traditional, well-evaluated HRTF measurement system with only one loudspeaker.
Convention Paper 9498

P3-5 Efficiency of Switch-Mode Power Audio Amplifiers—Test Signals and Measurement Techniques—Niels Elkjær Iversen, Technical University of Denmark - Kongens Lyngby, Denmark; Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark; Michael A. E. Andersen, Technical University of Denmark - Kgs. Lyngby, Denmark
Switch-mode technology is widely used for audio amplification, mainly because of the high efficiency it offers. Normally the efficiency of a switch-mode audio amplifier is measured using a sine wave input. However, this paper shows that sine waves represent real audio very poorly. An alternative signal is proposed for test purposes. The efficiency of a switch-mode power audio amplifier is modeled and measured with both a sine wave and the proposed test signal as inputs. The results show that switching devices with low on-resistances are unfairly favored when measuring the efficiency with sine waves. A 10% efficiency improvement was found for low power outputs. It is therefore of great importance to use proper test signals when measuring the efficiency.
Convention Paper 9499

P3-6 ITU-R BS.1770 Based Loudness for Immersive Audio—Scott Norcross, Dolby Laboratories - San Francisco, CA, USA; Sachin Nanda, Dolby Laboratories - San Francisco, CA, USA; Zack Cohen, Dolby Laboratories - San Francisco, CA, USA
With the adoption of ITU-R BS.1770 and the creation of numerous loudness recommendations, measuring and controlling the loudness of audio for broadcast is now a standard practice for legacy (5.1 and stereo) content. With new immersive and personalized audio content, the measurement and control of loudness is still in its infancy. ITU-R BS.1770 has recently been revised to support an arbitrary number of audio channels. However, dynamic object-based audio measurement is not explicitly covered in this revision, though the revision can be used to measure the rendered object-based audio. This paper summarizes the results of subjective loudness matching tests that were conducted using rendered dynamic object-based audio to verify the revision of ITU-R BS.1770.
Convention Paper 9500

P3-7 Metrics for Constant Directivity—Rahulram Sridhar, Princeton University - Princeton, NJ, USA; Joseph G. Tylka, Princeton University - Princeton, NJ, USA; Edgar Choueiri, Princeton University - Princeton, NJ, USA
It is often desired that a transducer have a polar radiation pattern that is invariant with frequency, but there is currently no way of quantifying the extent to which a transducer possesses this quality (often called “constant directivity” or “controlled directivity”). To address the problem, commonly-accepted criteria are used to propose two definitions of constant directivity. The first, stricter definition, is that the polar radiation pattern of a transducer should be invariant over a specified frequency range, whereas the second definition is that the directivity factor (i.e., the ratio between the on-axis power spectrum and the average power spectrum over all directions), or index when expressed in dB, should be invariant with frequency. Furthermore, to quantify each criterion, five metrics are derived: (1) Fourier analysis of contour lines (i.e., lines of constant sensitivity over frequency and angle), (2) directional average of frequency response distortions, (3) distortion thresholding of polar responses, (4) standard deviation of directivity index, and (5) cross-correlation of polar responses. Measured polar radiation data for four loudspeakers are used to compute all five metrics that are then evaluated based on their ability to quantify constant directivity. Results show that all five metrics are able to quantify constant directivity according to the criterion on which each is based, while only two of them, metrics 4 and 5, are able to adequately quantify both proposed definitions of constant directivity. [Also a poster—see session P8-6]
Convention Paper 9501
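
As an illustration of metric (4) in the P3-7 abstract, the sketch below computes a directivity index per frequency from the definition given there (on-axis power relative to the average power over all directions, expressed in dB) and then takes its standard deviation across frequency. This is a simplified reading rather than the authors' implementation. It assumes the measured directions sample the sphere uniformly, so that a plain mean approximates the spatial average, and the function names are hypothetical.

```python
import math
import statistics


def directivity_index_db(on_axis_power, powers_all_directions):
    """Directivity index in dB at one frequency.

    Assumption: powers_all_directions samples the sphere uniformly,
    so the arithmetic mean approximates the average over all directions.
    """
    mean_power = sum(powers_all_directions) / len(powers_all_directions)
    return 10.0 * math.log10(on_axis_power / mean_power)


def di_std_db(on_axis_by_freq, powers_by_freq):
    """Population standard deviation of the directivity index over frequency."""
    di_values = [
        directivity_index_db(on_axis, powers)
        for on_axis, powers in zip(on_axis_by_freq, powers_by_freq)
    ]
    return statistics.pstdev(di_values)


if __name__ == "__main__":
    # Same normalized pattern at both frequencies: DI does not vary.
    constant = [[1.0, 0.5, 0.25, 0.25], [1.0, 0.5, 0.25, 0.25]]
    print(di_std_db([1.0, 1.0], constant))

    # Pattern becomes omnidirectional at the second frequency: DI varies.
    varying = [[1.0, 0.5, 0.25, 0.25], [1.0, 1.0, 1.0, 1.0]]
    print(di_std_db([1.0, 1.0], varying))
```

A value near zero indicates a directivity index that is invariant with frequency, which corresponds to the second, weaker definition of constant directivity in the abstract.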

P4 - Room Acoustics

Saturday, June 4, 13:30 — 16:30 (Room 352B)

Chair:
Ben Kok, BEN KOK - acoustic consulting - Uden, The Netherlands

P4-1 Small-Rooms Dedicated to Music: From Room Response Analysis to Acoustic Design—Lorenzo Rizzi, Suono e Vita - Acoustic Engineering - Lecco, Italy; Gabriele Ghelfi, Suono e Vita - Acoustic Engineering - Lecco, Italy; Maurizio Santini, Università degli Studi di Bergamo - Bergamo, Italy
This paper reviews field experience gained by the authors in the analysis of small rooms dedicated to music. Case studies from everyday working practice make it possible to address specific situations that are seldom described by the usual theoretical models and literature. Using the analysis procedure developed and refined by the authors, it is possible to investigate the characteristics of the acoustic response of small rooms in more detail. Case studies of particular interest are described, and different small-room phenomena are shown in the reported measurements. [Also a poster—see session P8-2]
Convention Paper 9502

P4-2 Direction of Late Reverberation and Envelopment in Two Reproduced Berlin Concert Halls—Winfried Lachenmayr, Mueller-BBM - Munich, Germany; Musikhochschule Detmold; Aki Haapaniemi, Aalto University School of Science - Aalto, Finland; Tapio Lokki, Aalto University - Aalto, Finland
Most studies on the influence of the direction of late reverberation on listener envelopment (LEV) in concert halls have been conducted in laboratory conditions, i.e., where synthetic sound fields and a relatively limited number of loudspeakers were used to approximate a real, spatially quite complex acoustic situation. This study approaches LEV from the real acoustics. The late part of the sound field of two measured concert halls, Berlin Konzerthaus and Berlin Philharmonie, auralized with a state-of-the-art reproduction method, is altered virtually regarding its direction. Results suggest that the figure-of-eight weighting applied in late lateral level LJ for predicting envelopment underestimates the importance of reverberation from directions such as ceiling and rear.
Convention Paper 9503

P4-3 Electronic Shell—Improvement of Room Acoustics without Orchestra Shell Utilizing Active Field Control—Takayuki Watanabe, Yamaha Corp. - Hamamatsu, Shizuoka, Japan; Hideo Miyazaki, Yamaha Corp. - Hamamatsu, Shizuoka, Japan; Masahiro Ikeda, Yamaha Corporation - Hamamatsu, Shizuoka, Japan
This paper introduces an example of an Electronic Shell acoustic enhancement system that was installed in a multi-purpose hall without an orchestra shell. The system is based on the concept of Active Field Control using electroacoustic means. The three objectives of this system were (1) the enhancement of early reflection for performers, (2) the increase of the reverberation time and the total sound energy on stage, and (3) the enhancement of early reflection in the audience area. The application of this system showed an improvement of about 1 to 2 dB in STearly and more than 2 dB in G in the audience area, which is equivalent to or better than the performance of a simple mobile-type orchestra shell. [Also a poster—see session P8-3]
Convention Paper 9504

P4-4 Experimental Assessment of Low-Frequency Electroacoustic Absorbers for Modal Equalization in Actual Listening Rooms—Etienne Rivet, Ecole polytechnique fédérale de Lausanne (EPFL) - Lausanne, Switzerland; Sami Karkar, Ecole Polytechnique Fédérale de Lausanne (EPFL) - Lausanne, Switzerland; Hervé Lissek, Ecole Polytechnique Fédérale de Lausanne (EPFL) - Lausanne, Switzerland; Torje Nikolai Thorsen, Goldmund International - Monaco, Monaco; Véronique Adam, Goldmund International - Monaco, Monaco
In listening rooms, low-frequency modal resonances lead to uneven distributions in space and frequency of the acoustic energy, as well as an alteration of the temporal behavior of the original music content. While usual absorption techniques have severe limitations for reducing the negative impact of room modes, the authors have previously proposed the use of electroacoustic absorbers for room modal equalization. Each device consists of a current-driven, closed-box loudspeaker associated with a hybrid sensor-/shunt-based impedance control. In this communication we assess the performance of these electroacoustic absorbers in actual listening rooms by measuring frequency responses at different locations, as well as their modal decay times. The electroacoustic absorbers perform as expected and the room modal equalization is clearly improved in the low-frequency range.
Convention Paper 9505

P4-5 Modeling Non-Shoebox Shaped Rooms with the Mode Matching Method—Bjørn Kolbrek, Norwegian University of Science and Technology - Trondheim, Norway; U. Peter Svensson, NTNU - Trondheim, Norway
When a room is not shoebox shaped, usually no analytical expressions exist for the determination of resonance frequencies and mode shapes. One option is to employ the Finite Element Method (FEM). In this paper an alternative method, the Mode Matching Method (MMM), is used to compute the transfer function and sound field of a non-shoebox shaped room with rigid walls and is compared to an FEM solution. The two methods show excellent agreement. [Also a poster—see session P8-7]
Convention Paper 9506

P4-6 Room Acoustic Measurements Using a High SPL Dodecahedron—Dario D'Orazio, University of Bologna - Bologna, Italy; Simona De Cesaris, University of Bologna - Bologna, Italy; Paolo Guidorzi, University of Bologna - Bologna, Italy; Luca Barbaresi, University of Bologna - Bologna, Italy; Massimo Garai, University of Bologna - Bologna, Italy; Roberto Magalotti, B&C Speakers S.p.A. - Bagno a Ripoli (FI), Italy
In this paper a dodecahedron with high powered loudspeakers is presented. The source is designed to allow high SPL with very low distortion. Compared with a reference sound source, the high SPL dodecahedron shows a flat frequency response over the one-third octave bands from 80 Hz to 5000 Hz, enough to meet all the ISO 3382 criteria. Laboratory measurements have been performed to test the performance and the robustness of the dodecahedron using different techniques at different sound pressure levels and background noises. The prototype allows a good signal-to-noise ratio of the impulse response even when 75 dB of stationary noise is added during the measurements.
Convention Paper 9507

P5 - Audio Equipment, Audio Formats, and Audio Signal Processing Part 1

Saturday, June 4, 13:30 — 15:30 (Foyer)

P5-1 A Comparison of Optimization Methods for Compression Driver Design—Michele Gasparini, Università Politecnica delle Marche - Ancona, Italy; Emiliano Capucci, Faital S.P.A. - Milan, Italy; Stefania Cecchi, Università Politecnica delle Marche - Ancona, Italy; Romolo Toppi, Faital S.P.A. - Milan, Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy
Finite element analysis is a powerful and widespread mathematical technique capable of modeling even very complex physical systems. The use of this method is quite common in loudspeaker design processes, although simulations may often become time consuming. In order to reduce the number of simulations needed to define an optimal design, some advanced metaheuristic algorithms can be employed. The use of these techniques is well known in many optimization tasks when an analytical description of the system is not available a priori. In this paper a comparison among three different optimization procedures in the design of a compression driver is presented. The algorithms will be evaluated in terms of both convergence time and residual error.
Convention Paper 9508

P5-2 Physically-Based Large-Signal Modeling for Miniature Type Dual Triode Tube—Shiori Oshimo, Hiroshima Institute of Technology - Hiroshima, Japan; Kanako Takemoto, Hiroshima Institute of Technology - Hiroshima, Japan; Toshihiko Hamasaki, Hiroshima Institute of Technology - Hiroshima, Japan
A precise SPICE model for miniature (MT) triode tubes of high-µ 12AX7 and medium-µ 12AU7 is proposed, based on the physical analysis of the measurement results. Comparing the characteristics between these tubes, the grid current at lower plate voltage and positive grid bias condition is modeled successfully with novel perveance parameters for the first time, though it was known that the perveance depends on both grid and plate bias. It is shown that the modulation factor of the space charge for the MT triodes is different from the other classic tubes. The model is implemented in LTspice and replicates the grid current and cathode current well over three orders of magnitude. [Also a lecture—see session P1-2]
Convention Paper 9485 (Purchase now)

P5-3 Delay-Reduced Mode of MPEG-4 Enhanced Low Delay AAC (AAC-ELD)—Markus Schnell, Fraunhofer IIS - Erlangen, Germany; Wolfgang Jaegers, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Pablo Delgado, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Conrad Benndorf, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Tobias Albert, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Manfred Lutzky, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The MPEG-4 AAC Enhanced Low Delay (AAC-ELD) coder is well established in high quality communication applications, such as Apple’s FaceTime, as well as in professional live broadcasting. Both applications require high interactivity, which typically demands an algorithmic codec delay between 15 ms and 35 ms. Recently, MPEG finalized a new delay-reduced mode for AAC-ELD featuring only a fraction of the regular algorithmic delay. This mode operates virtually at higher sampling rates while maintaining standard sampling rates for I/O. Supporting this feature, AAC-ELD can address even more delay critical applications, like wireless microphones or headsets for TV. In this paper the main details of the delay-reduced mode of AAC-ELD are presented and application scenarios are outlined. Audio quality aspects are discussed and compared against other codecs with a delay below 10 ms. [Also a lecture—see session P1-5]
Convention Paper 9488 (Purchase now)

P5-4 Advances to a Frequency-Domain Parametric Coder of Wideband Speech—Aníbal Ferreira, University of Porto - Porto, Portugal; Deepen Sinha, ATC Labs - Newark, NJ, USA
In recent years, tools in perceptual coding of high-quality audio have been tailored to capture highly detailed information regarding signal components so that they gained an intrinsic ability to represent audio parametrically. In a recent paper we described a first validation model to such an approach applied to parametric coding of wideband speech. In this paper we describe specific advances to such an approach that improve coding efficiency and signal quality. A special focus is devoted to the fact that transmission to the decoder of any phase information is avoided, and that direct synthesis in the time-domain of the periodic content of speech is allowed in order to cope with fast F0 changes. A few examples of signal coding and transformation illustrate the impact of those improvements.
Convention Paper 9509 (Purchase now)

P5-5 Visual Information Search in Digital Audio Workstations—Joshua Mycroft, Queen Mary University London - London, UK; Tony Stockman, Queen Mary University London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
As the amount of visual information within Digital Audio Workstations increases, the interface potentially becomes more cluttered and time consuming to navigate. The increased graphical information may tax available display space requirements and potentially overload visual perceptual and attentional bandwidth. This study investigates the extent to which Dynamic Query filters (sliders, buttons, and other filters) can be used in audio mixing interfaces to improve both visual search times and concurrent critical listening tasks (identifying subtle attenuation of named instruments in a multichannel mix). The results of the study suggest that the inclusion of Dynamic Query filters results in a higher amount of correctly completed visual and aural tasks.
Convention Paper 9510 (Purchase now)

P5-6 AC-4 – The Next Generation Audio Codec—Kristofer Kjörling, Dolby Sweden AB - Stockholm, Sweden; Jonas Rödén, Dolby Sweden AB - Stockholm, Sweden; Martin Wolters, Dolby Germany GmbH - Nuremberg, Germany; Jeff Riedmiller, Dolby Laboratories - San Francisco, CA USA; Arijit Biswas, Dolby Germany GmbH - Nuremberg, Germany; Per Ekstrand, Dolby Sweden AB - Stockholm, Sweden; Alexander Gröschel, Dolby Germany GmbH - Nuremberg, Germany; Per Hedelin, Dolby Sweden AB - Stockholm, Sweden; Toni Hirvonen, Dolby Laboratories - Stockholm, Sweden; Holger Hörich, Dolby Germany GmbH - Nuremberg, Germany; Janusz Klejsa, Dolby Sweden AB - Stockholm, Sweden; Jeroen Koppens, Dolby Sweden AB - Stockholm, Sweden; K. Krauss, Dolby Germany GmbH - Nuremberg, Germany; Heidi-Maria Lehtonen, Dolby Sweden AB - Stockholm, Sweden; Karsten Linzmeier, Dolby Germany GmbH - Nuremberg, Germany; Hannes Muesch, Dolby Laboratories, Inc. - San Francisco, CA, USA; Harald Mundt, Dolby Germany GmbH - Nuremberg, Germany; Scott Norcross, Dolby Laboratories - San Francisco, CA, USA; J. Popp, Dolby Germany GmbH - Nuremberg, Germany; Heiko Purnhagen, Dolby Sweden AB - Stockholm, Sweden; Jonas Samuelsson, Dolby Sweden AB - Stockholm, Sweden; Michael Schug, Dolby Germany GmbH - Nuremberg, Germany; L. Sehlström, Dolby Sweden AB - Stockholm, Sweden; R. Thesing, Dolby Germany GmbH - Nuremberg, Germany; Lars Villemoes, Dolby Sweden - Stockholm, Sweden; Mark Vinton, Dolby - San Francisco, CA, USA
AC-4 is a state of the art audio codec standardized in ETSI (TS103 190 and TS103 190-2) and the TS103 190 is part of the DVB toolbox (TS101 154). AC-4 is an audio codec designed to address the current and future needs of video and audio entertainment services including broadcast and Internet streaming. As such, it incorporates a number of features beyond the traditional audio coding algorithms, such as capabilities to support immersive and personalized audio, support for advanced loudness management, video-frame synchronous coding, dialogue enhancement, etc. This paper will outline the thinking behind the design of the AC-4 codec, explain the different coding tools used, the systemic features included, and give an overview of performance and applications. [Also a lecture—see session P2-3]
Convention Paper 9491 (Purchase now)

P5-7 Single-Channel Audio Source Separation Using Deep Neural Network Ensembles—Emad M. Grais, University of Surrey - Guildford, Surrey, UK; Gerard Roma, University of Surrey - Guildford, Surrey, UK; Andrew J. R. Simpson, University of Surrey - Guildford, Surrey, UK; Mark D. Plumbley, University of Surrey - Guildford, Surrey, UK
Deep neural networks (DNNs) are often used to tackle the single channel source separation (SCSS) problem by predicting time-frequency masks. The predicted masks are then used to separate the sources from the mixed signal. Different types of masks produce separated sources with different levels of distortion and interference. Some types of masks produce separated sources with low distortion, while other masks produce low interference between the separated sources. In this paper a combination of different DNNs’ predictions (masks) is used for SCSS to achieve better quality of the separated sources than using each DNN individually. We train four different DNNs by minimizing four different cost functions to predict four different masks. The first and second DNNs are trained to approximate reference binary and soft masks. The third DNN is trained to predict a mask from the reference sources directly. The last DNN is trained similarly to the third DNN but with an additional discriminative constraint to maximize the differences between the estimated sources. Our experimental results show that combining the predictions of different DNNs achieves separated sources with better quality than using each DNN individually. [Also a lecture—see session P2-6]
Convention Paper 9494 (Purchase now)

P5-8 Using Phase Information to Improve the Reconstruction Accuracy in Sinusoidal Modeling—Clara Hollomey, Glasgow Caledonian University - Glasgow, Scotland, UK; David Moore, Glasgow Caledonian University - Glasgow, Lanarkshire, UK; Don Knox, Glasgow Caledonian University - Glasgow, Scotland, UK; W. Owen Brimijoin, MRC/CSO Institute of Hearing Research - Glasgow, Scotland, UK; William Whitmer, MRC/CSO Institute of Hearing Research - Glasgow, Scotland, UK
Sinusoidal modeling is one of the most common techniques for general purpose audio synthesis and analysis. Owing to the ever increasing amount of available computational resources, nowadays practically all types of sounds can be constructed up to a certain degree of perceptual accuracy. However, the method is computationally expensive and can for some cases, particularly for transient signals, still exceed the available computational resources. In this work methods derived from the realm of machine learning are exploited to provide a simple and efficient means to estimate the achievable reconstruction quality. The peculiarities of common classes of musical instruments are discussed and finally, the existing metrics are extended by information on the signal's phase propagation to allow for more accurate estimations. [Also a lecture—see session P2-4]
Convention Paper 9492 (Purchase now)

P5-9 Just Noticeable Difference of Interaural Level Difference to Frequency and Interaural Level Difference—Heng Wang, Wuhan Polytechnic University - Wuhan, Hubei, China; Cong Zhang, Wuhan Polytechnic University - Wuhan, Hubei, China; Yafei Wu, Wuhan University - Wuhan, Hubei, China
In order to explore the perceptual mechanism of Interaural Level Difference (ILD) and to study how the ILD limen depends on frequency and on ILD, eight reference values of ILD were selected according to a qualitative analysis of the ILD sensitivity of the human ear. The frequency range was divided into 24 critical bands, and the center frequency of each band was tested. The experiment adopted traditional test methods (1 up/2 down and 2AFC). The results show that the ILD thresholds vary markedly with frequency: they are smaller at 500 Hz and 4000 Hz and reach a maximum at about 1000 Hz. The thresholds also increase as the reference value of ILD increases. This work provides basic data for comprehensively exploring the perceptual characteristics of the human ear and theoretical support for efficient audio compression.
Convention Paper 9511 (Purchase now)

P6 - Perception: Part 1

Sunday, June 5, 09:00 — 12:00 (Room 353)

Chair:
Dan Mapes-Riordan, Etymotic Research - Elk Grove Village, IL, USA; DMR Consulting - Evanston, IL, USA

P6-1 Perception of Low Frequency Transient Acoustic Phenomena in Small Rooms for Music—Lorenzo Rizzi, Suono e Vita - Acoustic Engineering - Lecco, Italy; Federico Ascari, Politecnico di Milano - Milan, Italy; Gabriele Ghelfi, Suono e Vita - Acoustic Engineering - Lecco, Italy; Michele Ferroni, Politecnico di Milano - Milan, Italy
Reducing the gap between analysis of low-frequency behavior of small rooms and actual perception, we introduce the importance of transient energetic phenomena besides classic FFT steady state analysis. After a frequency and temporal domain analysis of real-world impulse responses of critical listening rooms, headphone tests were performed. Results show that, for short musical sounds, a new curve called “Overshoot Response” can be more useful than classic frequency response regarding the level perception. Furthermore, the perceived loss of definition after the convolution with R.I.R. is correlated with decaying time and two metrics that were defined—“Room Slowness” and “Room Inertia.”
Convention Paper 9512 (Purchase now)

P6-2 The Reduction of Vertical Interchannel Crosstalk: The Analysis of Localization Thresholds for Musical Sources—Rory Wallis, University of Huddersfield - Huddersfield, West Yorkshire, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
Musical sources were presented to subjects as phantom images from vertically arranged stereophonic loudspeakers. Loudspeakers were arranged in two layers: “main” and “height.” Subjects reduced the amplitude of the height layer until the resultant phantom image matched the position of the same source presented from the lower loudspeaker alone; this was referred to as the “localization threshold.” Delays ranging from 0–10 ms were applied to the height layer. The localization threshold was only significantly affected by the ICTD. The median threshold for 0 ms was –9.5 dB, which was significantly lower than the –7 dB found for the stimuli in which the height layer was delayed. No evidence was found to support the existence of the precedence effect in the median plane.
Convention Paper 9513 (Purchase now)

P6-3 The Perception of Vertical Image Spread by Interchannel Decorrelation—Christopher Gribben, University of Huddersfield - Huddersfield, West Yorkshire, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
Subjective listening tests were conducted to assess the general perception of decorrelation in the vertical domain. Interchannel decorrelation was performed between a pair of loudspeakers in the median plane; one at ear level and the other elevated 30° above. The test stimuli consisted of decorrelated octave-band pink noise samples (63–8000 Hz), generated using three decorrelation techniques—each method featured three degrees of the interchannel cross-correlation coefficient (ICCC): 0.1, 0.4, and 0.7. Thirteen subjects participated in the experiment, using a pairwise comparison method to grade the sample with the greater perceived vertical image spread (VIS). Results suggest there is broadly little difference of overall VIS between decorrelation methods, and changes to vertical interchannel decorrelation appear to be better perceived in the upper-middle-frequencies. [Also a poster—see session P12-16]
Convention Paper 9514 (Purchase now)

P6-4 Measurements to Determine the Ranking Accuracy of Perceptual Models—Andy Pearce, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK; Russell Mason, University of Surrey - Guildford, Surrey, UK; Martin Dewhirst, University of Surrey - Guildford, Surrey, UK
Linear regression is commonly used in the audio industry to create objective measurement models that predict subjective data. For any model development, the measure used to evaluate the accuracy of the prediction is important. The most common of these assume a linear relationship between the subjective data and the prediction, though in the early stages of model development this is not always the case. Measures based on rank ordering (such as Spearman’s test), can alternatively be used. Spearman’s test, however, does not consider the variance of the subjective results. This paper presents a method of incorporating the subjective variance in the Spearman’s rank ordering test using Monte Carlo simulations and shows how this can be used to develop predictive models.
Convention Paper 9515 (Purchase now)
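
The idea summarized in the abstract above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes each subjective result is described by a mean and a standard deviation, that listener scores are normally distributed, and that resampling them many times and recomputing Spearman's rank correlation against the model prediction gives a spread of plausible correlations. The function names (`ranks`, `spearman`, `monte_carlo_spearman`) and the 95% interval are hypothetical choices made for illustration.

```python
import random


def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


def monte_carlo_spearman(means, std_devs, predictions, trials=2000, seed=0):
    """Resample subjective scores (assumed normal) and collect Spearman's rho.

    Returns the mean rho and an approximate 95% interval (lower, upper).
    """
    rng = random.Random(seed)
    rhos = []
    for _ in range(trials):
        sample = [rng.gauss(m, s) for m, s in zip(means, std_devs)]
        rhos.append(spearman(sample, predictions))
    rhos.sort()
    lower = rhos[int(0.025 * trials)]
    upper = rhos[int(0.975 * trials) - 1]
    return sum(rhos) / trials, lower, upper
```

A call such as `monte_carlo_spearman([1, 2, 3, 4, 5], [0.5] * 5, [1.1, 1.9, 3.2, 3.9, 5.5])` returns a mean correlation with an interval. A wide interval suggests that the rank order of the subjective data is itself uncertain, so a single Spearman value may overstate or understate how well the model ranks the stimuli.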

P6-5 Assessment of the Impact of Spatial Audiovisual Coherence on Source Unmasking—Julian Palacino, UBO - LabSTICC - Lorient, France; Mathieu Paquier, UBO - Brest, France; Vincent Koehl, UBO - Lab-STICC - Brest, France; Frédéric Changenet, Radio France - Paris, France; Etienne Corteel, Sonic Emotion Labs - Paris, France
The present study aims at evaluating the contribution of spatial audiovisual coherence for sound source unmasking for live music mixing. Sound engineers working with WFS technologies for live sound mixing have reported that their mixing methods have radically changed. Using conventional mixing methods, the audio spectrum is balanced in order to get each instrument intelligible inside the stereo mix. In contrast, when using WFS technologies, the source intelligibility can be achieved thanks to spatial audiovisual coherence and/or sound spatialization (and without using spectral modifications). The respective effects of spatial audiovisual coherence and sound spatialization should be perceptually evaluated. As a first step, the ability of naive and expert subjects to identify a spatialized mix was evaluated by a discrimination task. For this purpose, live performances (rock, jazz, and classic) were played back to subjects with and without stereoscopic video display and VBAP or WFS audio rendering. Two sound engineers realized the audio mixing for three pieces of music and for both audio technologies in the same room where the tests were carried out. [Also a poster—see session P12-10]
Convention Paper 9516 (Purchase now)

P6-6 Auditory Perception of the Listening Position in Virtual Rooms Using Static and Dynamic Binaural Synthesis—Annika Neidhardt, Technische Universität Ilmenau - Ilmenau, Germany; Bernhard Fiedler, University of Technology Ilmenau - Ilmenau, Germany; Tobias Heinl, University of Technology Ilmenau - Ilmenau, Germany
Virtual auditory environments (VAEs) can be explored by controlling the position and orientation of an avatar and listening to the scene from its changing perspective. Reverberation is essential for immersion and plausibility as well as for externalization and the distance perception of the sound sources. These days, room simulation algorithms provide a high degree of realism for static and dynamic binaural reproduction. In this investigation, the ability of people to discriminate listening positions within a virtual room is studied. This is interesting to find out whether the state of the art room simulation algorithms are perceptually appropriate, but also to learn more about people’s capability of orientating themselves within a purely acoustical scene. New findings will help in designing suitable VAEs. [Also a poster—see session P12-1]
Convention Paper 9517 (Purchase now)

P7 - Audio Signal Processing—Part 2: Beamforming, Upmixing, HRTF

Sunday, June 5, 09:00 — 12:30 (Room 352B)

Chair:
Jamie Angus, University of Salford - Salford, Greater Manchester, UK; JASA Consultancy - York, UK

P7-1 Dual-Channel Beamformer Based on Hybrid Coherence and Frequency Domain Filter for Noise Reduction in Reverberant Environments—Hong Liu, Peking University - Beijing, China; Miao Sun, Shenzhen Graduate School, Peking University - Guangdong, China
As an effective technique for suppressing coherent noise, adaptive beamforming shows a strong decrease in reverberant rooms due to multipath room reflections of received signals. In this paper a dual-channel beamformer based on noise coherence and frequency domain filter is proposed. First, hybrid coherence based on coherent-to-diffuse energy ratio (CDR) is introduced to approximate the coherence of noise signals. Then the hybrid coherence is used to estimate the noise power spectral density (PSD), which is applied to the frequency domain filter to reduce noise and reverberation components in microphone signals. Finally, outputs of the filter are processed by a beamformer to suppress residual noise. Experiments demonstrate that the proposed system has noticeable improvements in SNR and quality of the output in reverberant environments.
Convention Paper 9518 (Purchase now)

P7-2 WITHDRAWN—N/A

Convention Paper 9519 (Purchase now)

P7-3 Estimation of Individualized HRTF in Unsupervised Conditions—Mounira Maazaoui, UMR STMS, IRCAM-CNRS-UPMC Sorbonne Universités - Paris, France; Olivier Warusfel, UMR STMS, IRCAM-CNRS-UPMC Sorbonne Universités - Paris, France
Head Related Transfer Functions (HRTF) are the key features of binaural sound spatialization. Those filters are specific to each individual and generally measured in an anechoic room using a complex process. Although the use of non-individual filters can cause perceptual artifacts, such measurements are hardly accessible to the general public. Thus, many authors have proposed alternative individualization methods that avoid measuring HRTFs. Examples of such methods are based on numerical modeling, adaptation of non-individual HRTFs or selection of non-individual HRTFs from a database. In this article we propose an individualization method where the best matching set of HRTFs is selected from a database on the basis of unsupervised binaural recordings of the listener in a real-life environment.
Convention Paper 9520 (Purchase now)

P7-4 Plane Wave Identification with Circular Arrays by Means of a Finite Rate of Innovation Approach—Falk-Martin Hoffmann, University of Southampton - Southampton, Hampshire, UK; Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK; Philip Nelson, University of Southampton - Southampton, UK
Many problems in the field of acoustic measurements depend on the direction of incoming wave fronts w.r.t. a measurement device or aperture. This knowledge can be useful for signal processing purposes such as noise reduction, source separation, de-aliasing, and super-resolution strategies among others. This paper presents a signal processing technique for the identification of the directions of travel for the principal plane wave components in a sound field measured with a circular microphone array. The technique is derived from a finite rate of innovation data model and the performance is evaluated by means of a simulation study for different numbers of plane waves in the sound field. [Also a poster—see session P12-13]
Convention Paper 9521 (Purchase now)

P7-5 Mismatch between Interaural Level Differences Derived from Human Heads and Spherical Models—Ramona Bomhardt, RWTH Aachen University - Aachen, Germany; Janina Fels, RWTH Aachen University - Aachen, Germany
The individualization of head-related transfer functions (HRTFs) is important for binaural reproduction to reduce measurement efforts and localization errors. One common assumption of individualization for frequencies below 6 kHz is that the sound pressure field around a sphere is similar to the one of a human head. To investigate the accuracy of this approximation, this paper compares the frequency-dependent interaural level difference (ILD) from a spherical approximation, a simulation using magnetic resonance imaging and individually measured HRTFs of 23 adults' heads. With this database, it is possible to analyze the influence of the head shape and the pinna on ILD using the boundary element method and the measured HRTFs. While the mismatch between the spherical and human ILD below 1.5 kHz in the horizontal plane is small, the two differ above that frequency. In the frequency range between 1.5 and 3.5 kHz, the ILD of one side of the head is dominated by two maxima. The offset of the ear canal entrance towards the back of the head and the depth of the head are the two major influencing factors. In general, it is observed that the maxima of a spherical ILD are much smaller and more widely spaced than in the human ILD. Above 4 kHz the difference between human and spherical ILDs is even stronger.
Convention Paper 9522 (Purchase now)

P7-6 Stereo Panning Law Remastering Algorithm Based on Spatial Analysis—François Becker, Paris, France; Benjamin Bernard, Medialab Consulting SNP - Monaco, Monaco; Longcat Audio Technologies - Chalon-sur-Saone, France
Changing the panning law of a stereo mixture is often impossible when the original multitrack session cannot be retrieved or used, or when the mixing desk uses a fixed panning law. Yet such a modification would be of interest during tape mastering sessions, among other applications. We present a frequency-based algorithm that computes the panorama power ratio from stereo signals and changes the panning law without altering the original panorama. [Also a poster—see session P19-12]
Convention Paper 9523 (Purchase now)

P7-7 Non-Linear Extraction of a Common Signal for Upmixing Stereo Sources—François Becker, Paris, France; Benjamin Bernard, Medialab Consulting SNP - Monaco, Monaco; Longcat Audio Technologies - Chalon-sur-Saone, France
In the context of a two- to three-channel upmix, center channel derivations fall within the field of common signal extraction methods. In this paper we explore the pertinence of the performance criteria that can be obtained from a probabilistic approach to source extraction; we propose a new, non-linear method to extract a common signal from two sources that makes the implementation choice of deeper extraction with a criterion of information preservation; and we provide the results of preliminary listening tests made with real-world audio materials. [Also a poster—see session P19-13]
Convention Paper 9524 (Purchase now)

P8 - Room Acoustics, Instrumentation and Measurement

Sunday, June 5, 09:00 — 11:00 (Foyer)

P8-1 Numerical Modeling of Sound Intensity Distributions around Acoustic Transducer—Adam Kurowski, Gdansk University of Technology - Gdansk, Poland; Józef Kotus, Gdansk University of Technology - Gdansk, Poland; Bozena Kostek, Gdansk University of Technology - Gdansk, Poland; Audio Acoustics Lab.; Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
The aim of this research study is to measure, simulate, and compare sound intensity distribution generated by the acoustic transducers of the loudspeaker. The comparison of the gathered data allows for validating the numerical model of the acoustic radiation. An accurate model of a sound source is necessary in mathematical modeling of the sound field distribution near the scattering obstacles. An example of such an obstacle is a human head. Preparation of a robust mathematical model of the sound field generated by a loudspeaker is one of the important factors in simulation of sound waves scattering by the human head. The numerical model is developed for the purpose of this kind of research.
Convention Paper 9525 (Purchase now)

P8-2 Small-Rooms Dedicated to Music: From Room Response Analysis to Acoustic Design—Lorenzo Rizzi, Suono e Vita - Acoustic Engineering - Lecco, Italy; Gabriele Ghelfi, Suono e Vita - Acoustic Engineering - Lecco, Italy; Maurizio Santini, Università degli Studi di Bergamo - Bergamo, Italy
Drawing on the authors' professional field experience in the analysis of small rooms dedicated to music, this paper presents case studies from everyday working practice that address specific situations seldom described by the usual theoretical models and literature. Using the analysis procedure developed and refined by the authors, it is possible to investigate the characteristics of the acoustic response of small rooms in more detail. Case studies of particular interest are described, and different small-room phenomena are shown in the reported measurements. [Also a lecture—see session P4-1]
Convention Paper 9502 (Purchase now)

P8-3 Electronic Shell—Improvement of Room Acoustics without Orchestra Shell Utilizing Active Field Control—Takayuki Watanabe, Yamaha Corp. - Hamamatsu, Shizuoka, Japan; Hideo Miyazaki, Yamaha Corp. - Hamamatsu, Shizuoka, Japan; Masahiro Ikeda, Yamaha Corporation - Hamamatsu, Shizuoka, Japan
This paper introduces an example of an Electronic Shell acoustic enhancement system that was installed in a multi-purpose hall without an orchestra shell. The system is based on the concept of Active Field Control using electroacoustic means. The three objectives of this system were (1) the enhancement of early reflection for performers, (2) the increase of the reverberation time and the total sound energy on stage, and (3) the enhancement of early reflection in the audience area. The application of this system showed an improvement of about 1 to 2 dB in STearly and more than 2 dB in G in the audience area, which is equivalent to or better than the performance of a simple mobile-type orchestra shell. [Also a lecture—see session P4-3]
Convention Paper 9504 (Purchase now)

P8-4 A Novel Approach of Multichannel and Stereo Control Room Acoustic Treatment, Second Edition—Bogic Petrovic, MyRoom Acoustics - Belgrade, Serbia; Zorica Davidovic, MyRoom Acoustics - Belgrade, Serbia
This paper describes additional development and improvement for all walls and ceiling diffusers, a new principle for multichannel or stereo control room setup/treatment, as was originally published at the 129th AES Convention (Paper Number 8295). The main effort focused on lowering the price of treatment, optimization of LF absorption, simplification of diffuser construction, solution for long diffusers without periodic repetition of diffusive sequence, and increasing room decay. All of these procedures and design principles will be described and attached to this paper, including theoretical analysis and room acoustical measurements from some of the first control rooms built following this new and improved principle.
Convention Paper 9526 (Purchase now)

P8-5 Harmonic Distortion Measurement for Nonlinear System Identification—John Vanderkooy, University of Waterloo - Waterloo, ON, Canada; Sean Thomson, Bowers & Wilkins - Steyning, West Sussex, UK
In order to model nonlinearities in loudspeakers, accurate measurement of harmonic distortion is necessary with particular attention to the relative phases of fundamental and harmonics. This paper outlines several ways that logarithmic sweeps can be used to achieve this goal. It is shown that Novak’s redesign of the logsweep is not strictly necessary, if proper account is taken of the phase relationships of the various harmonics. We study several other types of sweeps and methods to extract precise harmonic amplitudes and phases, using tracking filter concepts. The paper also deals with measurement systems that may have fractional-sample delays between excitation, reference, and data channels. Such details are important for accurate phase characterization of transfer functions. An intermodulation example is given for which sweeps with a single instantaneous frequency are inadequate. [Also a lecture—see session P3-3]
Convention Paper 9497 (Purchase now)

P8-6 Metrics for Constant Directivity—Rahulram Sridhar, Princeton University - Princeton, NJ, USA; Joseph G. Tylka, Princeton University - Princeton, NJ, USA; Edgar Choueiri, Princeton University - Princeton, NJ, USA
It is often desired that a transducer have a polar radiation pattern that is invariant with frequency, but there is currently no way of quantifying the extent to which a transducer possesses this quality (often called “constant directivity” or “controlled directivity”). To address the problem, commonly-accepted criteria are used to propose two definitions of constant directivity. The first, stricter definition, is that the polar radiation pattern of a transducer should be invariant over a specified frequency range, whereas the second definition is that the directivity factor (i.e., the ratio between the on-axis power spectrum and the average power spectrum over all directions), or index when expressed in dB, should be invariant with frequency. Furthermore, to quantify each criterion, five metrics are derived: (1) Fourier analysis of contour lines (i.e., lines of constant sensitivity over frequency and angle), (2) directional average of frequency response distortions, (3) distortion thresholding of polar responses, (4) standard deviation of directivity index, and (5) cross-correlation of polar responses. Measured polar radiation data for four loudspeakers are used to compute all five metrics that are then evaluated based on their ability to quantify constant directivity. Results show that all five metrics are able to quantify constant directivity according to the criterion on which each is based, while only two of them, metrics 4 and 5, are able to adequately quantify both proposed definitions of constant directivity. [Also a lecture—see session P3-7]
Convention Paper 9501 (Purchase now)
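
Metric 4 in the abstract above (standard deviation of directivity index) can be sketched directly from the definition given there: the directivity factor is the ratio of the on-axis power to the average power over all directions, and the directivity index is that ratio in dB. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the directional average is unweighted unless weights are supplied (a full-sphere average would normally need solid-angle weights), and the population standard deviation across frequency is an arbitrary choice.

```python
import math
import statistics


def directivity_index_db(on_axis_power, powers_by_direction, weights=None):
    """Directivity index in dB for one frequency.

    powers_by_direction holds the power measured in each sampled direction.
    weights (optional) are per-direction averaging weights; equal weights
    are assumed when none are given.
    """
    if weights is None:
        weights = [1.0] * len(powers_by_direction)
    average_power = (
        sum(w * p for w, p in zip(weights, powers_by_direction)) / sum(weights)
    )
    return 10.0 * math.log10(on_axis_power / average_power)


def di_standard_deviation(polar_data):
    """Standard deviation (dB) of the directivity index across frequency.

    polar_data is a list with one (on_axis_power, powers_by_direction)
    pair per frequency. A value near 0 dB means the index barely changes
    with frequency.
    """
    indices = [directivity_index_db(on_axis, powers) for on_axis, powers in polar_data]
    return statistics.pstdev(indices)
```

Under this sketch, a transducer whose polar pattern is identical at every frequency yields a standard deviation of zero, while a pattern that narrows with frequency yields a larger value.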

P8-7 Modeling Non-Shoebox Shaped Rooms with the Mode Matching Method—Bjørn Kolbrek, Norwegian University of Science and Technology - Trondheim, Norway; U. Peter Svensson, NTNU - Trondheim, Norway
When a room is not shoebox shaped, usually no analytical expressions exist for the determination of resonance frequencies and mode shapes. One option is to employ the Finite Element Method (FEM). In this paper an alternative method, the Mode Matching Method (MMM), is used to compute the transfer function and sound field of a non-shoebox shaped room with rigid walls and is compared to an FEM solution. The two methods show excellent agreement. [Also a lecture—see session P4-5]
Convention Paper 9506 (Purchase now)

P9 - Live Sound Production and Upmixing

Sunday, June 5, 12:45 — 13:45 (Room 353)

Chair:
Mark Drews, University of Stavanger - Stavanger, Norway; Norwegian Institute of Recorded Sound - Stavanger, Norway

P9-1 A Hybrid Approach to Live Spatial Sound Mixing—Etienne Corteel, Sonic Emotion Labs - Paris, France; Raphael Foulon, Sonic Emotion Labs - Paris, France; Frédéric Changenet, Radio France - Paris, France
In this paper, we present an approach for live sound mixing that combines object-oriented mixing and Wave Field Synthesis rendering with more standard live mixing techniques. This approach combines a standard mixing desk and an external processing unit. We first describe the system and the controls available to the sound engineer. Such a system makes it possible to create extensive contrast in the mix by working on spatial positioning (angle, depth) as well as on the projection of sound. We then review the use of the system in various musical genres (classical, jazz, pop), describing concrete application installations.
Convention Paper 9527 (Purchase now)

P9-2 Mono-to-Stereo Upmixing—Christian Uhle, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Patrick Gampp, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
A method for upmixing of single-channel audio signals for stereophonic sound reproduction in real-time is presented. To this end, the input signal is decomposed into a foreground signal and a background signal. The background signal is decorrelated using a network of nested allpass filters. The intensity of the decorrelation is controlled using a computational model for the perceived intensity of decorrelation. The foreground sound sources like singers and soloists are reproduced in the center of the stereo image. The proposed method enables upmixing from mono to stereo signals (and can also be applied to enhance the stereo image) with low latency, moderate computational load, and low memory requirements. It produces output signals with a high sound quality and is suitable for automotive and low-bitrate streaming applications.
Convention Paper 9528 (Purchase now)
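
To make the decorrelation step in the abstract above more concrete, here is a minimal sketch of a single nested allpass section, in which the plain delay of an outer Schroeder allpass is replaced by a delay followed by an inner allpass. The delay lengths and gains are arbitrary illustrative values; the paper's actual filter network, its foreground/background decomposition, and its perceptual control model are not reproduced here.

```python
import numpy as np

def nested_allpass(x, d_outer, g_outer, d_inner, g_inner):
    """One nested allpass section (hypothetical parameter names).

    The outer allpass uses a delay of d_outer samples followed by an inner
    Schroeder allpass (delay d_inner, gain g_inner) in place of a plain delay.
    The magnitude response stays flat; only the phase is altered.
    """
    outer = np.zeros(d_outer)  # circular buffer for the outer delay line
    inner = np.zeros(d_inner)  # circular buffer for the inner delay line
    y = np.zeros(len(x))
    io = ii = 0
    for n, xn in enumerate(x):
        # Signal leaving the outer delay line.
        delayed_v = outer[io]
        # Inner allpass applied to the delayed signal.
        delayed_u = inner[ii]
        u = delayed_v + g_inner * delayed_u
        w = -g_inner * u + delayed_u
        inner[ii] = u
        ii = (ii + 1) % d_inner
        # Outer allpass feedback and feedforward paths.
        v = xn + g_outer * w
        y[n] = -g_outer * v + w
        outer[io] = v
        io = (io + 1) % d_outer
    return y

# Impulse response of one section with illustrative settings.
impulse = np.zeros(4096)
impulse[0] = 1.0
h = nested_allpass(impulse, d_outer=7, g_outer=0.5, d_inner=3, g_inner=0.5)
print(np.sum(h ** 2))  # energy of the impulse response
```

Because the section is allpass, filtering a background signal with it changes the phase but not the magnitude spectrum, which is why such filters are a common choice for producing a second, decorrelated channel.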

P10 - Audio Quality

Sunday, June 5, 12:45 — 16:15 (Room 352B)

Chair:
Robin Reumers, Sonic City Studios - Amsterdam, The Netherlands

P10-1 Subjective Evaluation of High Resolution Audio through Headphones—Mitsunori Mizumachi, Kyushu Institute of Technology - Kitakyushu, Fukuoka, Japan; Ryuta Yamamoto, Digifusion Japan Co., Ltd. - Hiroshima, Japan; Katsuyuki Niyada, Hiroshima Cosmopolitan University - Hiroshima, Japan
Recently, high resolution audio (HRA) has become playable on portable devices and is spreading across musical genres and generations. This means that most people listen to HRA through headphones and earphones. In this study perceptual discrimination among audio formats including HRA has been investigated using headphones. Thirty-six subjects, with a variety of audio and musical experience and ranging in age from their 20s to their 70s, participated in listening tests. Headphone presentation is superior to loudspeaker presentation in discriminating details. It is, however, found that headphone presentation is weak in reproducing presence and reality. Audio enthusiasts and musicians discriminated audio formats significantly better than ordinary listeners in both headphone and loudspeaker listening conditions. [Also a poster—see session P15-6]
Convention Paper 9529 (Purchase now)

P10-2 A Headphone Measurement System Covers both Audible Frequency and beyond 20 kHz (Part 2)—Naotaka Tsunoda, Sony Corporation - Shinagawa-ku, Tokyo, Japan; Takeshi Hara, Sony Video & Sound Products Inc. - Tokyo, Japan; Koji Nageno, Sony Video and Sound Corporation - Tokyo, Japan
A new scheme, consisting of measurement with a wide-range HATS and free-field HRTF correction, was proposed to enable measurement of the entire frequency response, from the audible range up to 140 kHz, and direct comparison with free-field loudspeaker frequency responses. This report supplements the previous report [N. Tsunoda et al., “A Headphone Measurement System for Audible Frequency and Beyond 20kHz,” AES Convention 139, October 2015, convention paper 9375], which described the system concept, by adding details of the ear simulator and tips for obtaining reliable data with much improved reproducibility. [Also a poster—see session P15-8]
Convention Paper 9530 (Purchase now)

P10-3 Methodologies for High-dimensional Objective Assessment of Spatial Audio Quality—Dan Darcy, Dolby Laboratories, Inc. - San Francisco, CA, USA; Kent Terry, Dolby Laboratories Inc. - San Francisco, CA, USA; Grant Davidson, Dolby Laboratories, Inc. - San Francisco, CA, USA; Rich Graff, Dolby Laboratories, Inc. - San Francisco, CA, USA; Alex Brandmeyer, Dolby Laboratories - San Francisco, CA, USA; Poppy Crum, Dolby Laboratories - San Francisco, CA, USA
Traditional methods of subjective assessment of sound, such as ratings scales and forced-choice tasks, can be limited and time intensive in their ability to reflect the depth of experiential qualities associated with spatial hearing. Attempts to report localization of sound can be challenging when confounds or noise are introduced by constrained motions of head turning or pointing, and these approaches do not all record higher-dimensional features of sound like dispersion and trajectory. We propose a structured method of testing to reliably capture the quality of experience of spatial sound. Feature extraction of the high-dimensional representation of reported experiences converts to robust metrics used to tune and drive system performance toward desired perceptual attributes and optimal experiential performance.
Convention Paper 9531 (Purchase now)

P10-4 Objective Measures of Voice Quality for Mobile Handsets—Holly Francois, Samsung Electronics R&D Institute UK - Staines-Upon Thames, Surrey, UK; Scott Isabelle, Knowles Inc. - Mountain View, CA, USA; Eunmi Oh, Samsung Electronics Co., Ltd. - Seoul, Korea
Mobile phones include noise suppression to facilitate use in noisy environments; therefore listening tests in accordance with ITU-T P.835 are appropriate for comparing handset performance. Objective speech quality measures are an often used cheaper alternative; however the results can be misleading, as rank order compared to listening tests is not always preserved. We compare the outputs of PESQ, POLQA, and 3Quest with the results of P.835 listening tests. As expected, measures intended for use with noise suppression perform that task better than tools that were not initially designed to do so. However, improved measures, that aim to preserve rank order while minimizing both maximum error and RMSE, would improve the reliability of comparative evaluations in background noise.
Convention Paper 9532 (Purchase now)
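
The three criteria named in the abstract above (rank-order preservation, maximum error, and RMSE) can be sketched as follows. The scores are invented purely for illustration and are not results from the paper; the rank correlation uses the simple Spearman formula, which assumes there are no tied scores.

```python
import numpy as np

def spearman(subjective, objective):
    """Spearman rank correlation, assuming no tied values in either list."""
    s = np.argsort(np.argsort(subjective))  # ranks 0..n-1
    o = np.argsort(np.argsort(objective))
    n = len(s)
    d2 = np.sum((s - o) ** 2)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

def rmse(subjective, objective):
    """Root-mean-square error between the two sets of scores."""
    diff = np.asarray(subjective, float) - np.asarray(objective, float)
    return float(np.sqrt(np.mean(diff ** 2)))

def max_error(subjective, objective):
    """Largest absolute difference between the two sets of scores."""
    diff = np.asarray(subjective, float) - np.asarray(objective, float)
    return float(np.max(np.abs(diff)))

# Hypothetical listening-test scores and objective predictions for five handsets.
listening = [3.9, 3.1, 2.4, 4.3, 2.9]
predicted = [3.6, 3.3, 2.8, 4.0, 3.4]

print(spearman(listening, predicted))
print(rmse(listening, predicted))
print(max_error(listening, predicted))
```

A rank correlation below 1 means that the objective measure would order at least two handsets differently from the listening test, even when the RMSE looks small.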

P10-5 The Difference between Stereophony and Wave Field Synthesis in the Context of Popular Music—Christoph Hold, Technische Universität Berlin - Berlin, Germany; Hagen Wierstorf, Technische Universität Ilmenau - Ilmenau, Germany; Alexander Raake, Technische Universität Ilmenau - Ilmenau, Germany
Stereophony and Wave Field Synthesis (WFS) are capable of providing the listener with a rich spatial audio experience. They both come with different advantages and challenges. Due to different requirements during the music production stage, a meaningful direct comparison of both methods has rarely been carried out in previous research. As stereophony relies on a channel- and WFS on a model-based approach, the same mix cannot be used for both systems. In this study mixes of different popular-music recordings have been generated, each for two-channel stereophony, surround stereophony, and WFS. The focus is on comparability between the reproduction systems in terms of the resulting sound quality. In a paired-comparison test listeners rated their preferred listening experience. (Also a poster—see session P15-10)
Convention Paper 9533 (Purchase now)

P10-6 Accelerometer Based Motional Feedback Integrated in a 2 3/4" Loudspeaker—Ruben Bjerregaard, Technical University of Denmark - Kongens Lyngby, Denmark; Anders N. Madsen, Technical University of Denmark - Kongens Lyngby, Denmark; Henrik Schneider, Technical University of Denmark - Kgs. Lyngby, Denmark; Finn T. Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark; Michael A. E. Andersen, Technical University of Denmark - Kgs. Lyngby, Denmark
It is a well known fact that loudspeakers produce distortion when they are driven into large diaphragm displacements. Various methods exist to reduce distortion using forward compensation and feedback methods. Acceleration based motional feedback is one of these methods and was already thoroughly described in the 1960s showing good results at low frequencies. In spite of this, the technique has mainly been used for closed box subwoofers to a limited extent. In this paper design and experimental results for a 2 3/4" acceleration based motional feedback loudspeaker are shown to extend this feedback method to a small full range loudspeaker. Furthermore, the audio quality from the system with feedback is discussed based on measurements of harmonic distortion, intermodulation distortion, and subjective evaluation. [Also a poster—see session P15-7]
Convention Paper 9534 (Purchase now)

P10-7 Visualization Tools for Soundstage Tuning in Cars—Delphine Devallez, Arkamys - Paris, France; Alexandre Fénières, Arkamys - Paris, France; Vincent Couteaux, Telecom ParisTech - Paris, France
In order to improve the spatial fidelity of automotive audio systems by means of digital signal processing, the authors investigated means to objectively assess the spatial perception of reproduced stereophonic sound in car cabins. This involved choosing a convenient binaural microphone system representative of real listening situations and metrics to analyze interaural time differences under 1.5 kHz in those binaural recordings. Frequency-dependent correlation correctly showed the frequencies at which the fidelity was improved and made it possible to quantify the improvement. The time-domain correlation seemed to be a good indicator of the apparent source width, but failed to give the perceived azimuth of the virtual sound source. Therefore that metric must be refined to be used efficiently during audio tunings. (Also a poster—see session P15-9)
Convention Paper 9536 (Purchase now)

P11 - Audio Content Management & Applications in Audio

Sunday, June 5, 14:00 — 16:30 (Room 353)

Chair:
Mark Drews, University of Stavanger - Stavanger, Norway; Norwegian Institute of Recorded Sound - Stavanger, Norway

P11-1 Development Tools for Modern Audio Codecs—Jonas Larsen, Dolby Germany GmbH - Nuremberg, Germany; Martin Wolters, Dolby Germany GmbH - Nuremberg, Germany
The Dolby Bitstream Syntax Description Language (BSDL) is a generic, XML-based language for describing the syntactical structure of compressed audio-visual streams. This paper describes how the representation of a bitstream syntax in the BSDL is used to ease the development of serialization, deserialization, and editing tools. Additionally, the formal syntax description allows realizing a range of novel analysis methods including bitstream syntax coverage measurements, detailed bitrate profiles, and the automatic generation of rich specification documentation. The approach is exemplified using the AC-4 codec.
Convention Paper 9537 (Purchase now)

P11-2 Can Bluetooth ever Replace the Wire?—Jonny McClintock, Qualcomm Technology International Ltd. - Belfast, Northern Ireland, UK
Bluetooth is widely used as a wireless connection for audio applications including mobile phones, media players, and wearables, removing the need for cables. The combination of the A2DP protocol and frame based codecs used in many Bluetooth stereo audio implementations has led to excessive latency and acoustic performance significantly below CD quality. This paper will cover the latest developments in Bluetooth audio connectivity that will deliver CD quality audio, or better, and low latency for video and gaming applications. These developments together with the increased battery life delivered by Bluetooth Smart could lead to the elimination of wires for many applications. [Also a poster—see session P15-11]
Convention Paper 9538 (Purchase now)

P11-3 Deep Neural Networks for Dynamic Range Compression in Mastering Applications—Stylianos Ioannis Mimilakis, Fraunhofer Institute for Digital Media Technology (IDMT) - Ilmenau, Germany; Konstantinos Drossos, Tampere University of Technology - Tampere, Finland; Tuomas Virtanen, Tampere University of Technology - Tampere, Finland; Gerald Schuller, Ilmenau University of Technology - Ilmenau, Germany; Fraunhofer Institute for Digital Media Technology (IDMT) - Ilmenau, Germany
The process of audio mastering often, if not always, includes various audio signal processing techniques such as frequency equalization and dynamic range compression. With respect to the genre and style of the audio content, the parameters of these techniques are controlled by a mastering engineer, in order to process the original audio material. This operation relies on musical and perceptually pleasing facets of the perceived acoustic characteristics, transmitted from the audio material under the mastering process. Modeling such dynamic operations, which involve adaptation regarding the audio content, becomes vital in automated applications since it significantly affects the overall performance. In this work we present a system capable of modeling such behavior, focusing on automatic dynamic range compression. It predicts, via a trained deep neural network, frequency coefficients that allow the dynamic range compression, and applies them to an unmastered audio signal serving as input. Both dynamic range compression and the prediction of the corresponding frequency coefficients take place inside the time-frequency domain, using magnitude spectra acquired from a critical band filter bank, similar to humans’ peripheral auditory system. Results from conducted listening tests, incorporating professional music producers and audio mastering engineers, demonstrate on average an equivalent performance compared to professionally mastered audio content. Improvements were also observed when compared to relevant and commercial software. [Also a poster—see session P15-9]
Convention Paper 9539 (Purchase now)

P11-4 Principles of Control Protocol Design and Implementation—Andrew Eales, Rhodes University - Grahamstown, South Africa; Richard Foss, Rhodes University - Grahamstown, Eastern Cape, South Africa
Control protocols are used within audio networks to manage both audio streams and networked audio devices. A number of control protocols for audio devices have been recently developed, including the AES standards AES64-2012 and AES70-2015. Despite these developments, an ontology of control protocol design and implementation does not exist. This paper proposes design and implementation heuristics for control protocols. Different categories of control protocol design and implementation heuristics are presented and the implications of individual heuristics are discussed. These heuristics allow the features provided by different control protocols to be compared and evaluated and provide guidelines for future control protocol development.
Convention Paper 9540 (Purchase now)

P11-5 Absorption Materials in Reflex Loudspeakers—Juha Backman, Genelec Oy - Iisalmi, Finland; Microsoft Mobile - Espoo, Finland
It is well known that the placement of absorbent material has an effect on the behavior of ported (reflex) enclosures, even if the acoustic solution of the field inside the enclosure would predict that the pressure field is quite homogeneous and that the flow velocities in the acoustic field are small. A CFD model is used to study this phenomenon, and the results indicate that there is strong vortex formation inside an unlined enclosure even at small volume velocities, and that the presence and the distribution of porous material has a strong effect on these vortices.
Convention Paper 9541 (Purchase now)

P12 - Perception Part 1 and Audio Signal Processing Part 2

Sunday, June 5, 14:45 — 16:45 (Foyer)

P12-1 Auditory Perception of the Listening Position in Virtual Rooms Using Static and Dynamic Binaural Synthesis—Annika Neidhardt, Technische Universität Ilmenau - Ilmenau, Germany; Bernhard Fiedler, University of Technology Ilmenau - Ilmenau, Germany; Tobias Heinl, University of Technology Ilmenau - Ilmenau, Germany
Virtual auditory environments (VAEs) can be explored by controlling the position and orientation of an avatar and listening to the scene from its changing perspective. Reverberation is essential for immersion and plausibility as well as for externalization and the distance perception of the sound sources. These days, room simulation algorithms provide a high degree of realism for static and dynamic binaural reproduction. In this investigation, the ability of people to discriminate listening positions within a virtual room is studied. This is of interest both for finding out whether state-of-the-art room simulation algorithms are perceptually appropriate and for learning more about people’s capability of orienting themselves within a purely acoustical scene. New findings will help in designing suitable VAEs. [Also a lecture—see session P6-7]
Convention Paper 9517 (Purchase now)

P12-2 WITHDRAWN—N/A

Convention Paper 9519 (Purchase now)

P12-3 An Innovative Structure for the Approximation of Stereo Reverberation Effect Using Mixed FIR/IIR Filters—Andrea Primavera, Universitá Politecnica della Marche - Ancona, Italy; Stefania Cecchi, Universitá Politecnica della Marche - Ancona, Italy; Laura Romoli, Universitá Politecnica della Marche - Ancona, Italy; Massimo Garai, University of Bologna - Bologna, Italy; Francesco Piazza, Universitá Politecnica della Marche - Ancona (AN), Italy
Reverberation is a well-known effect that has an important role in our listening experience. Focusing on hybrid reverberator structures, this paper presents an innovative structure for the approximation of stereo impulse responses using low complexity filters. In more detail, the conventional hybrid reverberator structure has been modified by adding a tone correction filter for the emulation of the high frequency air-absorption effect and by introducing a technique for the reproduction of the stereo perception. On this basis, the presented approach makes it possible to obtain a better approximation of the impulse responses in both the time and frequency domains. Several results are reported for different real impulse responses, and the results are compared with previous techniques in terms of computational complexity and reverberation quality.
Convention Paper 9542 (Purchase now)

P12-4 Improvement of DUET for Blind Source Separation in Closely Spaced Stereo Microphone Recording—Chan Jun Chun, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea
This paper proposes a blind source separation (BSS) method to improve the performance of the degenerate unmixing estimation technique (DUET) when sound sources are recorded using closely spaced stereo microphones. In particular, the attenuation-delay-based discrimination analysis employed in DUET is replaced with a microphone spacing- and source direction-based discrimination analysis in order to remedy the problem of DUET when the attenuation factors between recorded stereo audio signals are not distinguishable. In other words, the proposed BSS method generates a histogram as a function of the microphone spacing and the directional difference between stereo signals. Next, the generated histogram is used to partition the time-frequency representations of the mixtures into that of each sound source. The performance of the proposed method is evaluated by means of both objective and subjective measures. Consequently, it is shown from the evaluation that the proposed BSS method outperforms the conventional DUET in a closely spaced stereo microphone recording environment.
Convention Paper 9543 (Purchase now)

P12-5 A Phase-Matched Exponential Harmonic Weighting for Improved Sensation of Virtual Bass—Hyungi Moon, Yonsei University - Seoul, Korea; Gyutae Park, Yonsei University - Seoul, Korea; Young-cheol Park, Yonsei University - Wonju, Kwangwon-do, Korea; Dae Hee Youn, Yonsei University - Seoul, Korea
The Virtual Bass System (VBS), which is based on the psychoacoustic phenomenon called the “missing fundamental,” is widely used to extend the lower frequency limit of small loudspeakers. The perceptual quality of the VBS is highly dependent on the weighting strategy for the generated harmonics. There have been several weighting strategies for the generated harmonics including loudness matching, exponential attenuation, and timbre matching. To precisely convey the weighting strategy, however, it is essential to match the phases of the reproduced harmonics to those of the natural harmonics contained in the original signal. In this paper limitations of the previous harmonic weighting schemes are addressed and a new harmonic weighting scheme is proposed. In the proposed weighting scheme, the slope of the attenuation weighting is dynamically varied according to the frequency of the missing fundamental, and a phase matching between the original and generated harmonics is performed prior to the harmonic weighting. Subjective tests show that the proposed method provides more natural and effective bass sensation than the conventional schemes.
Convention Paper 9544 (Purchase now)

P12-6 Extraction of Interchannel Coherent Component from Multichannel Audio—Akio Ando, University of Toyama - Toyama, Japan; Hiroki Tanaka, University of Toyama - Toyama, Japan; Hiro Furuya, University of Toyama - Toyama, Japan
Three-dimensional audio recording usually involves a number of spatially distributed microphones to capture the spatial sound. The temporal differences in arrival of sound from a source to microphones make the recorded signal less coherent than that with coincident microphones. In this paper a new method that extracts the interchannel coherent component from multichannel audio signal is proposed. It estimates the component of one channel signal from the other channel signals based on the least squares estimation. The experimental result showed that the new method can extract the interchannel coherent component from multichannel audio signal regardless of the number of channels of the signal.
Convention Paper 9545 (Purchase now)
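
The least squares idea in the abstract above can be sketched as follows: one channel is estimated as a linear combination of the other channels, and the estimate is treated as its interchannel coherent component. This sketch assumes a purely instantaneous (sample-by-sample) combination with no time lags, which is a simplification chosen for the example; the function name and the synthetic signals are hypothetical and the paper's exact formulation may differ.

```python
import numpy as np

def coherent_component(target, others):
    """Least squares estimate of `target` from the rows of `others`.

    target : array (n_samples,)
    others : array (n_channels - 1, n_samples)
    Returns (coherent, residual), where coherent + residual == target.
    """
    a = np.asarray(others, dtype=float).T  # shape (n_samples, n_channels - 1)
    t = np.asarray(target, dtype=float)
    coef, *_ = np.linalg.lstsq(a, t, rcond=None)
    coherent = a @ coef
    return coherent, t - coherent

# Synthetic three-channel example: channel 0 shares content with channels
# 1 and 2 and also carries an independent part of its own.
rng = np.random.default_rng(0)
others = rng.standard_normal((2, 2000))
shared = 0.5 * others[0] + 0.3 * others[1]
own = 0.2 * rng.standard_normal(2000)
coherent, residual = coherent_component(shared + own, others)
print(np.corrcoef(coherent, shared)[0, 1])
```

Since the other channels simply form the columns of the regression matrix, the same function applies unchanged to any number of channels.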

P12-7 The Difference in Perceptual Attributes for the Distortion Timbre of the Electric Guitar between Guitar Players and Non-Guitar Players—Koji Tsumoto, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan; Atsushi Marui, Tokyo University of the Arts - Tokyo, Japan; Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan
Subjective evaluation experiments were performed to reveal the perceptual attributes for the distorted timbre of the electric guitar. The motivation was to make conversation about the distorted timbre between guitar players and non-guitar players smoother at recording sessions. The signals of three guitar performances were distorted with three different amounts of distortion and three kinds of frequency characteristics. This brings the total to twenty-seven stimuli. Sixteen non-guitar players and sixteen electric guitar players participated in the rating experiments using semantic scales anchored by eight bipolar adjective pairs. The results indicated that both groups had similar perceptual attributes for distorted guitar timbres. One latent factor was found and was correlated with the acoustic features. The alterations of frequency characteristics did not appear as a variable affecting the judgment of distortion timbres.
Convention Paper 9546 (Purchase now)

P12-8 The Effect of a Vertical Reflection on the Relationship between Preference and Perceived Change in Timbre and Spatial Attributes—Thomas Robotham, University of Huddersfield - Huddersfield, UK; Name Withheld, Removed at the request of the presenter.; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
This study aims to investigate a vertical reflection’s beneficial or detrimental contribution to subjective preference compared with perceived change in timbral and spatial attributes. A vertical reflection was electro-acoustically simulated and evaluated through subjective tests using musical stimuli in the context of listening for entertainment. Results indicate that the majority of subjects preferred audio reproduction with the addition of a reflection. Furthermore, there is a potential relationship between positive preference and the perceived level of both timbral and spatial differences, although this relationship is dependent on the stimuli presented. Subjects also described perceived differences where the reflection was present. These descriptors provide evidence suggesting a link between timbral descriptions and preference. However, this link was not observed between preference and spatial descriptions.
Convention Paper 9547 (Purchase now)

P12-9 Relative Contribution of Interaural Time and Level Differences to Selectivity for Sound Localization—Si Wang, Wuhan Polytechnic University - Wuhan, Hubei, China; Heng Wang, Wuhan Polytechnic University - Wuhan, Hubei, China; Cong Zhang, Wuhan Polytechnic University - Wuhan, Hubei, China
In the present study, we measured the threshold of the interaural level difference in a standard stimulus (ILDs) by means of the interaural time difference in a variable stimulus (ITDv), and tested the just noticeable difference (JND) of the interaural time difference in a standard stimulus (ITDs) by means of the interaural level difference in a variable stimulus (ILDv), for sine waves at frequencies ranging from 150 to 1500 Hz at several lateral positions of the sound image. Two separate experiments were conducted based on a two-alternative forced-choice (2AFC) task and a 1 up/2 down adaptive procedure. From these experimental data we could explore the relative contribution of the Interaural Level Difference (ILD) and the Interaural Time Difference (ITD) to sound localization as a function of position and frequency. The results showed that lateral discrimination between stimuli is not difficult at frequencies of 350, 450, 570, and 700 Hz when we tested the JND of the ILD in the standard stimulus, and that the auditory system discriminates two sound images more easily and is more sensitive in localizing the lateral positions of the standard stimulus as frequency is varied from 700 to 1500 Hz when we measured the JND of the ITD in the standard stimulus.
Convention Paper 9548 (Purchase now)
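
The 1 up/2 down adaptive procedure mentioned in the abstract above can be sketched as a simple staircase: the stimulus difference is made larger after every incorrect response and smaller after two consecutive correct responses, so the track settles near the level that yields roughly 70.7% correct. The starting level, step size, and the idealized deterministic listener below are assumptions chosen for illustration only.

```python
def one_up_two_down(respond, start, step, n_trials):
    """Run a 1 up/2 down staircase and return the level used on each trial.

    respond : callable taking a level and returning True for a correct answer
    start   : initial stimulus difference (arbitrary units)
    step    : amount by which the level changes
    """
    level = start
    streak = 0  # consecutive correct responses at the current level
    track = []
    for _ in range(n_trials):
        track.append(level)
        if respond(level):
            streak += 1
            if streak == 2:  # two correct in a row: make the task harder
                level -= step
                streak = 0
        else:  # one incorrect: make the task easier
            level += step
            streak = 0
    return track

# Idealized listener (hypothetical): always correct at or above a
# difference of 3 units, always incorrect below it.
track = one_up_two_down(lambda level: level >= 3, start=10, step=1, n_trials=40)
print(track[-10:])
```

With a real listener the responses are probabilistic, and the threshold is usually estimated by averaging the levels at the reversal points of the track.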

P12-10 Assessment of the Impact of Spatial Audiovisual Coherence on Source Unmasking—Julian Palacino, UBO - LabSTICC - Lorient, France; Mathieu Paquier, UBO - Brest, France; Vincent Koehl, UBO - Lab-STICC - Brest, France; Frédéric Changenet, Radio France - Paris, France; Etienne Corteel, Sonic Emotion Labs - Paris, France
The present study aims at evaluating the contribution of spatial audiovisual coherence to sound source unmasking in live music mixing. Sound engineers working with WFS technologies for live sound mixing have reported that their mixing methods have radically changed. Using conventional mixing methods, the audio spectrum is balanced in order to make each instrument intelligible inside the stereo mix. In contrast, when using WFS technologies, source intelligibility can be achieved thanks to spatial audiovisual coherence and/or sound spatialization (and without using spectral modifications). The respective effects of spatial audiovisual coherence and sound spatialization should be perceptually evaluated. As a first step, the ability of naive and expert subjects to identify a spatialized mix was evaluated by a discrimination task. For this purpose, live performances (rock, jazz, and classical) were played back to subjects with and without stereoscopic video display and with VBAP or WFS audio rendering. Two sound engineers produced the audio mixes for three pieces of music and for both audio technologies in the same room where the tests were carried out. [Also a lecture—see session P6-5]
Convention Paper 9516 (Purchase now)

P12-11 Modeling the Perceptual Components of Loudspeaker Distortion—Sune Lønbæk Olsen, GoerTek Audio Technologies - Copenhagen, Denmark; Technical University of Denmark; Finn T. Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark; Ewen MacDonald, Technical University of Denmark - Lyngby, Denmark; Tore Stegenborg-Andersen, DELTA SenseLab - Hørsholm, Denmark; Christer P. Volk, DELTA SenseLab - Hørsholm, Denmark; Aalborg University - Aalborg, Denmark
While non-linear distortion in loudspeakers decreases audio quality, the perceptual consequences can vary substantially. This paper investigates the metric Rnonlin [1] which was developed to predict subjective measurements of sound quality in nonlinear systems. The generalizability of the metric in a practical setting was explored across a range of different loudspeakers and signals. Overall, the correlation of Rnonlin predictions with subjective ratings was poor. Based on further investigation, an additional normalization step is proposed, which substantially improves the ability of Rnonlin to predict the perceptual consequences of non-linear distortion.
Convention Paper 9549 (Purchase now)

P12-12 Comparison of the Objective and the Subjective Parameters of the Different Types of Microphone Preamplifiers—Michal Luczynski, Wroclaw University of Technology - Wroclaw, Poland; Maciej Sabiniok, Wroclaw University of Technology - Wroclaw, Poland
The aim of this paper is to compare different types of microphone preamplifiers. The authors designed six types of preamps using different technologies (e.g., based on vacuum tubes, transistors, or operational amplifiers). Assumed parameters such as input signal, gain, and power supply were the same for all circuits. Preamps were tested by objective and subjective methods. Then the authors tried to find relations between different gain components, electroacoustic parameters, and subjective sensation. The authors did not intend to create commercial devices, only to compare and classify objective and subjective parameters depending on the type of microphone preamplifier.
Convention Paper 9550 (Purchase now)

P12-13 Plane Wave Identification with Circular Arrays by Means of a Finite Rate of Innovation Approach—Falk-Martin Hoffmann, University of Southampton - Southampton, Hampshire, UK; Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK; Philip Nelson, University of Southampton - Southampton, UK
Many problems in the field of acoustic measurements depend on the direction of incoming wave fronts with respect to a measurement device or aperture. This knowledge can be useful for signal processing purposes such as noise reduction, source separation, de-aliasing, and super-resolution strategies among others. This paper presents a signal processing technique for the identification of the directions of travel for the principal plane wave components in a sound field measured with a circular microphone array. The technique is derived from a finite rate of innovation data model and the performance is evaluated by means of a simulation study for different numbers of plane waves in the sound field. [Also a lecture—see session P7-4]
Convention Paper 9521 (Purchase now)

P12-14 Automatic Localization of a Virtual Sound Image Generated by a Stereophonic Configuration—Laura Romoli, Universitá Politecnica della Marche - Ancona, Italy; Stefania Cecchi, Universitá Politecnica della Marche - Ancona, Italy; Ferruccio Bettarelli, Leaff Engineering - Ancona, Italy; Francesco Piazza, Universitá Politecnica della Marche - Ancona (AN), Italy
Sound localization systems aim at providing the position of a particular sound source as perceived by the human auditory system. Interaural level difference, interaural time difference, and spectral representations of the binaural signals are the main cues adopted for localization. When two sound sources are simultaneously active, a virtual source is created. In this paper a novel approach is presented to estimate the position, as perceived by a human listener, of a sound image created by two loudspeakers. The solution is based on both frequency-dependent binaural and monaural cues in order to take into account the sensitivity of the human auditory system to spatial sound localization. Experimental results proved the effectiveness of the proposed approach in correctly estimating the horizontal and vertical position of the virtual source.
Convention Paper 9551 (Purchase now)

P12-15 The Effect of Early Impulse Response Length and Visual Environment on Externalization of Binaural Virtual Sources—Joseph Sinker, University of Salford - Salford, UK; Ben Shirley, University of Salford - Salford, Greater Manchester, UK
When designing an audio-augmented-reality (AAR) system capable of rendering acoustic “overlays” to real environments, it is advantageous to create externalized virtual sources with minimal computational complexity. This paper describes experiments designed to explore the relationships between early impulse response (EIR) length, visual environment and perceived externalization, and to identify if reduced IR data can effectively render a virtual source in matched and unmatched environments. In both environments a broadly linear trend is exhibited between EIR length and perceived externalization, and statistical analysis suggests a threshold at approximately 30-40 ms above which the extension of the EIR yields no significant increase in externalization.
Convention Paper 9552 (Purchase now)

P12-16 The Perception of Vertical Image Spread by Interchannel Decorrelation—Christopher Gribben, University of Huddersfield - Huddersfield, West Yorkshire, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
Subjective listening tests were conducted to assess the general perception of decorrelation in the vertical domain. Interchannel decorrelation was performed between a pair of loudspeakers in the median plane; one at ear level and the other elevated 30° above. The test stimuli consisted of decorrelated octave-band pink noise samples (63–8000 Hz), generated using three decorrelation techniques—each method featured three degrees of the interchannel cross-correlation coefficient (ICCC): 0.1, 0.4, and 0.7. Thirteen subjects participated in the experiment, using a pairwise comparison method to grade the sample with the greater perceived vertical image spread (VIS). Results suggest there is broadly little difference of overall VIS between decorrelation methods, and changes to vertical interchannel decorrelation appear to be better perceived in the upper-middle-frequencies. [Also a lecture—see session P6-3]
Convention Paper 9514 (Purchase now)

P13 - Perception: Part 2

Monday, June 6, 08:45 — 11:15 (Room 353)

Chair:
Thomas Görne, Hamburg University of Applied Sciences - Hamburg, Germany

P13-1 Exploiting Envelope Fluctuations to Enhance Binaural Perception—G. Christopher Stecker, Vanderbilt University School of Medicine - Nashville, TN, USA
A review of recent and classic studies of binaural perception leads to the conclusion that envelope fluctuations, such as sound onsets, play a critical role in the sampling of spatial information from auditory stimuli. Specifically, listeners’ perception of sound location corresponds with the binaural cues (interaural time and level differences) that coincide with brief increases in sound amplitude, and disregards binaural cues occurring at other times. This discrete, envelope-triggered sampling of binaural information can be exploited to enhance spatial perception of synthesized sound mixtures, or to facilitate the localization of mixture components. Also a poster—see session P19-9
Convention Paper 9553 (Purchase now)

P13-2 Two Alternative Minimum-Phase Filters Tested Perceptually—Robert Mores, University of Applied Sciences Hamburg - Hamburg, Germany; Ralf Hendrych, University of Applied Sciences Hamburg - Hamburg, Germany
A widely used method for designing minimum phase filters is based on the real cepstrum (Oppenheim, 1975). An alternative method is proposed for symmetric FIR filters that flips the filter’s “left side” around the central coefficient to the “right side” using a sine ramp of perceptually irrelevant duration. The resulting phase is nearly minimal and nearly linear. The method is applied to impulse responses. Perception tests use original sound samples (A), samples processed by real-cepstrum-based minimum phase filters (B), and samples processed by the proposed method (C). The tests reveal that for impulsive sound samples the perceived dissimilarity between A and C is smaller than the dissimilarity between A and B, suggesting that the alternative method has some potential for sound processing. [Also a poster—see session P19-1]
Convention Paper 9554 (Purchase now)

P13-3 Subjective Listening Tests for Preferred Room Response in Cinemas - Part 2: Preference Test Results—Linda A. Gedemer, University of Salford - Salford, UK; Harman International - Northridge, CA, USA
SMPTE and ISO have specified nearly identical in-room target response curves for cinemas and dubbing stages. However, to this author’s knowledge, these standards have never been scientifically tested and validated with modern technology and measurement techniques. For this reason it is still not known whether the current SMPTE and ISO in-room target response curves are optimal or whether better solutions exist. Using a Binaural Room Scanning system for room capture and simulation, various seating positions in three cinemas were reproduced through headphones for the purpose of conducting controlled listening experiments. This system uses a binaural mannequin equipped with a computer-controlled rotating head to accurately capture binaural impulse responses of the sound system and the listening space, which are then reproduced via calibrated headphones equipped with a head-tracker. In this way controlled listening evaluations can be made among different cinema audio systems tuned to different in-room target responses. Results from a MUSHRA-style preference test are presented. (Also a poster—see session P19-8)
Convention Paper 9555 (Purchase now)

P13-4 Binaural Spatialization over a Bone Conduction Headset: Minimum Discernable Angular Difference—Amit Barde, University of Canterbury - Christchurch, Canterbury, New Zealand; William S. Helton, University of Canterbury - Christchurch, Canterbury, New Zealand; Gun Lee, University of Canterbury - Christchurch, Canterbury, New Zealand; Mark Billinghurst, University of South Australia - Mawson Lakes, South Australia, Australia
Binaural spatialization in the horizontal plane over a bone conduction headset (BCH) was investigated using inexpensive and commercially available hardware and software components. The aim of this study was to determine the minimum discernable angular difference between two successively spatialized sound sources. Localization accuracy and externalization were also explored. Statistically significant results were observed for angular separations of 10° and above. Localization accuracy was found to be significantly poorer than that seen for previous loudspeaker- and headphone-based reproduction. Localization errors of 30°–35° were observed for stimuli presented in front, back, and sides, and 92% of the participants reported externalization. The study demonstrates that an acceptable level of spatial resolution and externalization is achievable using an inexpensive bone conduction headset and software components.
Convention Paper 9556 (Purchase now)

P13-5 The Harmonic Centroid as a Predictor of String Instrument Timbral Clarity—Kirsten Hermes, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK; Chris Hummersone, University of Surrey - Guildford, Surrey, UK
Spectrum is an important factor in determining timbral clarity. An experiment where listeners rate the changes in timbral clarity resulting from spectral equalization (EQ) can provide insight into the relationship between EQ and the clarity of string instruments. Overall, higher frequencies contribute to clarity more positively than lower ones, but the relationship is program-item-dependent. Fundamental frequency and spectral slope both appear to be important. Change in harmonic centroid (or dimensionless spectral centroid) correlates well with change in clarity, more so than octave band boosted/cut, harmonic number boosted/cut, or other variations on the spectral centroid. [Also a poster—see session P19-7]
Convention Paper 9557 (Purchase now)

P14 - Audio Signal Processing: Part 3—Audio Applications

Monday, June 6, 08:45 — 12:15 (Room 352B)

Chair:
Iva Salom, Institute Mihajlo Pupin, University of Belgrade - Belgrade, Serbia

P14-1 Ensemble Effect Using Gaussian Matrices—Connor McCullough, Bose Corporation - Boston, MA, USA; University of Miami - Coral Gables, FL
The purpose of this paper is to propose an algorithm to serve as an alternative to the chorus effect, the current standard for simulating an ensemble from a single track. Due to the deterministic nature of chorus, specifically the use of an LFO to modulate the delay, chorus often has audible oscillation and does not truly model the behavior of musicians playing simultaneously. The proposed alternative is the implementation of a Gaussian-based algorithm that attempts to model the actual process of musicians playing together. This modeling will be achieved by generating a Gaussian matrix ([# of instruments] x [# of notes]), with each index containing a resampling factor that will temporally and tonally shift each note in a recording. While the Gaussian distribution will serve as the basis for the algorithm, additional constraints will be applied to the resampling factor in order to properly model ensemble behavior.
Convention Paper 9558 (Purchase now)

P14-2 A Loudness Function for Analog and Digital Sound Systems Based on Equal Loudness Level Contours—Sofus Birkedal Nielsen, Aalborg University - Aalborg, Denmark
A new and better loudness compensation has been designed based on the differences between the Equal Loudness Level Contours (ELLC) in ISO 226:2003. Sound productions are normally mixed at a high Mixing Level (ML, in dB) but often played back at a lower Listening Level (LL), which means that the perceived frequency balance changes whenever LL is lower or higher than ML. The differences between the ELLC call for level-based equalization using fractional-order filters. A design technique for both analog and digital fractional-order filters has been developed. The analog solution is based on op-amps, and the digital solution is realized in a 16/32-bit fixed-point DSP and could be implemented in any sound-producing system.
Convention Paper 9559 (Purchase now)

P14-3 Spatial Multi-Zone Sound Field Reproduction Using Higher-Order Loudspeakers in Reverberant Rooms—Keigo Wakayama, NTT Service Evolution Laboratories - Kanagawa, Japan; Hideaki Takada, NTT Service Evolution Laboratories - Kanagawa, Japan
We propose a method for reproducing multi-zone sound fields in a reverberant room using an array of higher-order loudspeakers. This method enables sparse arrangement of loudspeakers and reproduction of independent sound fields for multiple listeners without the need for headphones. For multi-zone reproduction, global sound field coefficients are obtained using a translation operator. By using the coefficients of the room transfer function, measured or simulated with an extension of the image-source method, the loudspeakers’ coefficients are then calculated with the minimum norm method in the cylindrical harmonic domain. Experiments with two-zone and three-zone examples show a (2N + 1)-fold decrease in the number of Nth-order loudspeakers needed for accurate reproduction with the proposed method compared to conventional methods.
Convention Paper 9560 (Purchase now)

P14-4 WITHDRAWN—N/A

Convention Paper 9561 (Purchase now)

P14-5 Comparison of Simple Self-Oscillating PWM Modulators—Nicolai Dahl, Technical University of Denmark - Lyngby, Denmark; Niels Elkjær Iversen, Technical University of Denmark - Kongens Lyngby, Denmark; Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark; Michael A. E. Andersen, Technical University of Denmark - Kgs. Lyngby, Denmark
Switch-mode power amplifiers have become the conventional choice for audio applications due to their superior efficiency and excellent audio performance. These amplifiers rely on high frequency modulation of the audio input. Conventional modulators use a fixed high frequency for modulation. Self-oscillating modulators do not have a fixed modulation frequency and can provide good audio performance with very simple circuitry. This paper proposes a new type of self-oscillating modulator. The proposed modulator is compared, both theoretically and experimentally, with an existing modulator of similar type. The results show that the proposed modulator provides a higher degree of linearity, resulting in around 2% lower Total Harmonic Distortion (THD). [Also a poster—see session P19-10]
Convention Paper 9562 (Purchase now)

P14-6 Low Energy Audio DSP Design: Going Beyond The Hardware Barrier—Jamie Angus, University of Salford - Salford, Greater Manchester, UK; JASA Consultancy - York, UK
Modern digital audio signal processors need to be energy efficient, both for mobile audio and for environmental reasons. Improving technology has been reducing the power of these devices via better, smaller transistors and reduced voltage swings between one and zero. However, there is a limit to how far this improvement can go. To further reduce processor energy consumption, the number of transitions between one and zero must be reduced. This paper presents a method of doing this for instructions, addresses, and data. By looking at the interaction between their usage statistics and their digital representation, and modifying the representation to match the usage, a reduction in energy consumption is achieved. The paper presents both measured usage statistics and bit allocation strategies to achieve this.
Convention Paper 9563 (Purchase now)

P14-7 WITHDRAWN—N/A

Convention Paper 9564 (Purchase now)

P15 - Live Sound Practice, Rendering, Human Factors and Interfaces

Monday, June 6, 08:45 — 10:45 (Foyer)

P15-1 Evaluation of Quality Features of Spatial Audio Signals in Non-Standardized Rooms: Two Mixed Method Studies—Ulrike Sloma, Technische Universität Ilmenau - Ilmenau, Germany
It is known that the propagation and the characteristics of a reproduced sound wave in a room are influenced by the room acoustics. Is the perceived sound quality influenced as well? To answer this question it is indispensable to research the quality evaluation of reproduced spatial audio signals in non-standardized rooms and to compare the results with those from standardized listening rooms, in which quality evaluation is usually conducted. Besides the overall quality, it is reasonable to assess which parameters of the room acoustics influence which quality features. To evaluate the principal influence of different listening rooms on the perception of audio signals, two listening tests were conducted in which three acoustically different rooms were examined. In the first study the aim was to find out whether there is an influence on the basic audio quality and five given quality features. This was realized as a single stimulus test. Based on the results a second test was conducted, with an approach adapted from Open Profiling of Quality. The results from the studies suggest that the influence of the room characteristics is of minor importance for the perception of spatial audio signals.
Convention Paper 9565 (Purchase now)

P15-2 Novel Designs for the Audio Mixing Interface Based on Data Visualization First Principles—Christopher Dewey, University of Huddersfield - Huddersfield, UK; Jonathan Wakefield, University of Huddersfield - Huddersfield, UK
Given the shortcomings of current audio mixing interfaces (AMIs), this study focuses on the development of alternative AMIs based on data visualization first principles. The elementary perceptual tasks defined by Cleveland informed the design process. Two design ideas were considered for pan: using the elementary perceptual task “scale” to display pan on either a single or multiple horizontal lines. Four design ideas were considered for level: using “length,” “area,” “saturation,” or “scalable icon” for visualization. Each level idea was prototyped with each pan idea, totaling eight novel interfaces. Seven subjects undertook a usability evaluation, replicating a 16-channel reference mix with each interface. Results showed that “scalable icons,” especially on multiple horizontal lines, appear to show potential.
Convention Paper 9566 (Purchase now)

P15-3 The Method for Generating Movable Sound Source—Heng Wang, Wuhan Polytechnic University - Wuhan, Hubei, China; Yafei Wu, Wuhan University - Wuhan, Hubei, China; Cong Zhang, Wuhan Polytechnic University - Wuhan, Hubei, China
The rapid development of 3D video has created demand for 3D audio technology and products, but the products currently on the market are limited to the original stereo or surround sound technology, which makes it difficult to produce a three-dimensional sound field synchronized with 3D video content. The proposed method is based on the VBAP principle: principles and formulas for generating a movable sound source in 3D space are derived, and a method for generating movable sound sources is implemented in a purpose-built 3D audio system. The trajectory and speed of the resulting virtual sound source can be customized in 3D space. Existing audio equipment is retained as far as possible and only needs to be configured according to the allocation model, giving the audience an immersive audio-visual experience. Field tests show that the movement of the virtual sound sources generated by this method is clearly perceptible in the 3D sound field. The method offers a good way of generating movable 3D sound sources not only for future research and experimental film-making but also for promoting 3D audio in home entertainment.
Convention Paper 9567 (Purchase now)

P15-4 Graphical Interface Aimed for Organizing Music Based on Mood of Music—Magdalena Plewa, Gdansk University of Technology - Gdansk, Poland; Bozena Kostek, Gdansk University of Technology - Gdansk, Poland; Audio Acoustics Lab.; Mateusz Bień, Academy of Music in Kraków - Kraków, Poland
Mood of music is one of the most intuitive criteria for listeners, thus it is used in automated systems for organizing music. This study is based on the emotional content of music and its automatic recognition and contains outcomes of a series of experiments related to building models and description of emotions in music. One-hundred-fifty-four excerpts from 10 music genres were evaluated in the listening experiments using a graphical model proposed by the authors, dedicated to the subjective evaluation of mood of music. The proposed model of mood of music was created in a Max MSP environment. Automatic mood recognition employing SOM and ANN was carried out and both methods returned results coherent with subjective evaluation.
Convention Paper 9568 (Purchase now)

P15-5 Subjective Evaluation of High Resolution Audio through Headphones—Mitsunori Mizumachi, Kyushu Institute of Technology - Kitakyushu, Fukuoka, Japan; Ryuta Yamamoto, Digifusion Japan Co., Ltd. - Hiroshima, Japan; Katsuyuki Niyada, Hiroshima Cosmopolitan University - Hiroshima, Japan
Recently, high resolution audio (HRA) has become playable on portable devices and is spreading across musical genres and generations. This means that most people listen to HRA through headphones and earphones. In this study perceptual discrimination among audio formats including HRA has been investigated using headphones. Thirty-six subjects, with a variety of audio and musical experience and ranging in age from their 20s to their 70s, participated in listening tests. Headphone presentation is superior to loudspeaker presentation in discriminating details. It is, however, found that headphone presentation is weak in reproducing presence and reality. Audio enthusiasts and musicians could discriminate audio formats significantly better than ordinary listeners in both headphone and loudspeaker listening conditions. [Also a lecture—see session P10-1]
Convention Paper 9529 (Purchase now)

P15-6 Accelerometer Based Motional Feedback Integrated in a 2 3/4" Loudspeaker—Ruben Bjerregaard, Technical University of Denmark - Kongens Lyngby, Denmark; Anders N. Madsen, Technical University of Denmark - Kongens Lyngby, Denmark; Henrik Schneider, Technical University of Denmark - Kgs. Lyngby, Denmark; Finn T. Agerkvist, Technical University of Denmark - Kgs. Lyngby, Denmark; Michael A. E. Andersen, Technical University of Denmark - Kgs. Lyngby, Denmark
It is a well known fact that loudspeakers produce distortion when they are driven into large diaphragm displacements. Various methods exist to reduce distortion using forward compensation and feedback methods. Acceleration based motional feedback is one of these methods and was already thoroughly described in the 1960s showing good results at low frequencies. In spite of this, the technique has mainly been used for closed box subwoofers to a limited extent. In this paper design and experimental results for a 2 3/4" acceleration based motional feedback loudspeaker are shown to extend this feedback method to a small full range loudspeaker. Furthermore, the audio quality from the system with feedback is discussed based on measurements of harmonic distortion, intermodulation distortion, and subjective evaluation. [Also a lecture—see session P10-6]
Convention Paper 9534 (Purchase now)

P15-7 A Headphone Measurement System Covers both Audible Frequency and beyond 20 kHz (Part 2)—Naotaka Tsunoda, Sony Corporation - Shinagawa-ku, Tokyo, Japan; Takeshi Hara, Sony Video & Sound Products Inc. - Tokyo, Japan; Koji Nageno, Sony Video and Sound Corporation - Tokyo, Japan
A new scheme consisting of measurement with a wide-range HATS and free-field HRTF correction was proposed to enable measurement of the entire frequency response, from the audible range up to 140 kHz, and direct comparison with free-field loudspeaker frequency responses. This report supplements the previous report [N. Tsunoda et al., “A Headphone Measurement System for Audible Frequency and Beyond 20kHz,” AES Convention 139, October 2015, convention paper 9375], which described the system concept, by adding details of the ear simulator and tips for obtaining reliable data with much improved reproducibility. [Also a lecture—see session P10-2]
Convention Paper 9530 (Purchase now)

P15-8 Deep Neural Networks for Dynamic Range Compression in Mastering Applications—Stylianos Ioannis Mimilakis, Fraunhofer Institute for Digital Media Technology (IDMT) - Ilmenau, Germany; Konstantinos Drossos, Tampere University of Technology - Tampere, Finland; Tuomas Virtanen, Tampere University of Technology - Tampere, Finland; Gerald Schuller, Ilmenau University of Technology - Ilmenau, Germany; Fraunhofer Institute for Digital Media Technology (IDMT) - Ilmenau, Germany
The process of audio mastering often, if not always, includes various audio signal processing techniques such as frequency equalization and dynamic range compression. With respect to the genre and style of the audio content, the parameters of these techniques are controlled by a mastering engineer, in order to process the original audio material. This operation relies on musical and perceptually pleasing facets of the perceived acoustic characteristics, transmitted from the audio material under the mastering process. Modeling such dynamic operations, which involve adaptation regarding the audio content, becomes vital in automated applications since it significantly affects the overall performance. In this work we present a system capable of modeling such behavior, focusing on automatic dynamic range compression. Using a trained deep neural network, it predicts frequency coefficients that perform the dynamic range compression and applies them to the unmastered audio signal supplied as input. Both dynamic range compression and the prediction of the corresponding frequency coefficients take place inside the time-frequency domain, using magnitude spectra acquired from a critical band filter bank, similar to humans’ peripheral auditory system. Results from conducted listening tests, incorporating professional music producers and audio mastering engineers, demonstrate on average an equivalent performance compared to professionally mastered audio content. Improvements were also observed when compared to relevant and commercial software. [Also a lecture—see session P11-3]
Convention Paper 9539 (Purchase now)

P15-9 Visualization Tools for Soundstage Tuning in Cars—Delphine Devallez, Arkamys - Paris, France; Alexandre Fénières, Arkamys - Paris, France; Vincent Couteaux, Telecom ParisTech - Paris, France
In order to improve the spatial fidelity of automotive audio systems by means of digital signal processing, the authors investigated means to objectively assess the spatial perception of reproduced stereophonic sound in car cabins. This involved choosing a convenient binaural microphone system representative of real listening situations and metrics to analyze interaural time differences under 1.5 kHz in those binaural recordings. Frequency-dependent correlation correctly showed the frequencies at which the fidelity was improved and allowed the improvement to be quantified. The time-domain correlation seemed to be a good indicator of the apparent source width, but failed to give the perceived azimuth of the virtual sound source. Therefore that metric must be refined to be used efficiently during audio tunings. (Also a poster—see session P10-7)
Convention Paper 9536 (Purchase now)

P15-10 The Difference between Stereophony and Wave Field Synthesis in the Context of Popular Music—Christoph Hold, Technische Universität Berlin - Berlin, Germany; Hagen Wierstorf, Technische Universität Ilmenau - Ilmenau, Germany; Alexander Raake, Technische Universität Ilmenau - Ilmenau, Germany
Stereophony and Wave Field Synthesis (WFS) are capable of providing the listener with a rich spatial audio experience. They both come with different advantages and challenges. Due to different requirements during the music production stage, a meaningful direct comparison of both methods has rarely been carried out in previous research. As stereophony relies on a channel- and WFS on a model-based approach, the same mix cannot be used for both systems. In this study mixes of different popular-music recordings have been generated, each for two-channel stereophony, surround stereophony, and WFS. The focus is on comparability between the reproduction systems in terms of the resulting sound quality. In a paired-comparison test listeners rated their preferred listening experience. (Also a lecture—see session P10-5)
Convention Paper 9533 (Purchase now)

P15-11 Can Bluetooth ever Replace the Wire?—Jonny McClintock, Qualcomm Technology International Ltd. - Belfast, Northern Ireland, UK
Bluetooth is widely used as a wireless connection for audio applications including mobile phones, media players, and wearables, removing the need for cables. The combination of the A2DP protocol and frame based codecs used in many Bluetooth stereo audio implementations has led to excessive latency and acoustic performance significantly below CD quality. This paper will cover the latest developments in Bluetooth audio connectivity that will deliver CD quality audio, or better, and low latency for video and gaming applications. These developments together with the increased battery life delivered by Bluetooth Smart could lead to the elimination of wires for many applications. [Also a lecture—see session P11-2]
Convention Paper 9538 (Purchase now)

P16 - Recording and Production Techniques

Monday, June 6, 12:30 — 14:30 (Room 352B)

Chair:
Sonja Krstic, School of Electrical Engineering and Computer Science - Belgrade, Serbia

P16-1 Microphone Array Design Applied to Complete Hemispherical Sound Reproduction—From Integral 3D to Comfort 3D—Michael Williams, Sounds of Scotland - Le Perreux sur Marne, France
This paper describes the parameters that need to be taken into account in the design of a 13 channel microphone array recording system for reproduction also in a 13 loudspeaker hemispherical configuration. Both the microphone array and the loudspeaker array use 8 channels in the horizontal reference plane, 4 channels in the 45° elevation plane, and a Zenith channel at the top (90° elevation). This paper will also describe the various stages of advancement to complete 3D coverage (Integral 3D), and the logical development of this type of array to a new format proposition - the 16 channel “Comfort 3D format.”
Convention Paper 9569 (Purchase now)

P16-2 Object-Based Audio Recording Methods—Jean-Christophe Messonnier, CNSMDP Conservatoire de Paris - Paris, France; Jean-Marc Lyzwa, CNSMDP - Paris, France; Delphine Devallez, Arkamys - Paris, France; Catherine De Boisheraud, CNSMDP - Paris, France
The new ADM standard makes it possible to define an audio file as object-based audio. Along with many other functionalities, the polar coordinates can be specified for each audio object. An audio scene can therefore be described independently of the reproduction system. This means that an object-based recording can be rendered on a 5.1 system, a binaural system, or any other system. In the case of a binaural system, it also gives the opportunity to interact with the audio content, as a headtracker can be used to follow the movements of the listener’s head and change the binaural rendering accordingly. This paper describes how such an object-based recording can be achieved. [Also a poster—see session P19-6]
Convention Paper 9570 (Purchase now)

P16-3 A Further Investigation of Echo Thresholds for the Optimization of Fattening Delays—Michael Uwins, University of Huddersfield - Huddersfield, UK; Dan Livesey, Confetti College, Nottingham Trent University - Nottingham, UK
Since the introduction of stereophonic sound systems, mix engineers have developed and employed numerous artificial methods in order to enhance their productions. A simple yet notable example is the effect commonly known as “fattening,” where a mono signal is cloned, delayed, and then panned to the opposite side of the stereo field. The technique can improve a sound’s prominence in the mix by increasing its overall amplitude while creating a pseudostereo image and is a consequence of a renowned psychoacoustic phenomenon, the “precedence effect.” The aim of this investigation was to build upon previous accepted studies, conducting further experiments in order to produce refined estimates for echo thresholds for elements common to a multi-track music production. This investigation obtained new estimates of echo thresholds and fattening delay times, for a variety of isolated instrumental and vocal recordings, as perceived by a sample population of trained mix engineers. The study concludes that current recommendations for delay times used to create fattening effects should be refined, taking into account not only those features of the but also the consequences of temporal and spectral masking, when applied in the context of a multitrack [Also a poster—see session P19-2]
Convention Paper 9571 (Purchase now)

P16-4 An Investigation into the Sonic Signature of Three Classic Dynamic Range Compressors—Austin Moore, University of Huddersfield - Huddersfield, UK; Rupert Till, University of Huddersfield - Huddersfield, UK; Jonathan Wakefield, University of Huddersfield - Huddersfield, UK
Dynamic range compression (DRC) is a much-used process in music production. Traditionally the process was implemented in order to control the dynamic range of program material to minimize the potential of overloading recording devices. However, over time DRC became a process that was applied more as a creative effect and less as a preventative measure. In a professional recording environment it is not uncommon for engineers to have access to several different types of DRC unit, each with their own purportedly unique sonic signature. This paper investigates the differences between three popular vintage dynamic range compressors by conducting a number of measurements on the devices. The compressors were tested using: THD measurements, tone bursts, and objective analysis of music-based material using spectrum analysis and audio feature extraction.
Convention Paper 9572 (Purchase now)
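
For readers unfamiliar with THD measurements, the following minimal sketch shows one common way to compute total harmonic distortion relative to the fundamental from a sampled test tone. It is not the measurement procedure used in the paper; the function names, the number of harmonics, and the synthetic test signal are assumptions, and the frame is assumed to contain a whole number of periods of the fundamental.

```python
import cmath
import math


def tone_amplitude(signal, sample_rate, freq):
    """Amplitude of the sinusoidal component at `freq` (single-bin DFT)."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * i / sample_rate)
              for i, x in enumerate(signal))
    return 2.0 * abs(acc) / n


def thd(signal, sample_rate, fundamental, highest_harmonic=5):
    """THD as sqrt(sum of squared harmonic amplitudes) / fundamental amplitude."""
    a1 = tone_amplitude(signal, sample_rate, fundamental)
    harmonics = [tone_amplitude(signal, sample_rate, k * fundamental)
                 for k in range(2, highest_harmonic + 1)]
    return math.sqrt(sum(a * a for a in harmonics)) / a1


# Synthetic example (assumed values): 1 kHz tone with a third harmonic at 1% amplitude.
fs, f0, n = 48000, 1000.0, 480  # 480 samples = exactly 10 periods of 1 kHz
test_tone = [math.sin(2 * math.pi * f0 * i / fs)
             + 0.01 * math.sin(2 * math.pi * 3 * f0 * i / fs)
             for i in range(n)]
thd_ratio = thd(test_tone, fs, f0)  # close to 0.01, i.e., about 1%
```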

P17 - Rendering Systems

Monday, June 6, 14:45 — 16:45 (Room 353)

Chair:
Libor Husnik, Czech Technical University in Prague, Faculty of Electrical Engineering - Prague, Czech Republic

P17-1 Optimized Spatial Layout for Virtual Spatial Audio Conference—Liyun Pang, Huawei Technologies Duesseldorf GmbH - Düsseldorf, Germany; European Research Center - Munich, Germany; Pablo Hoffmann, Huawei Technologies Düsseldorf GmbH - Düsseldorf, Germany
This paper is related to three-dimensional (3D) audio signal processing for virtual spatial audio conference applications. The core idea is to provide users of a virtual spatial audio conference with recommendations for the optimal spatial layout (spatial arrangement) of participants where optimal implies maximal speech intelligibility. That is, the listener’s ability to understand speech from an individual speaker in a multi-speaker scenario is enhanced. The idea combines information of the individual’s voice together with directional audio information to estimate the speech intelligibility of candidate spatial layouts (spatial arrangement). The candidate layout that provides the best speech intelligibility estimate is then selected.
Convention Paper 9573 (Purchase now)

P17-2 A Listener Adaptive Optimal Source Distribution System for Virtual Sound Imaging—Marcos F. Simon Galvez, University of Southampton - Southampton, Hampshire, UK; Takeshi Takeuchi, University of Southampton - Southampton, Hampshire, UK; Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK
This paper describes the use of an optimal source distribution loudspeaker array for binaural reproduction. In this paper the device is made adaptive to the listener’s position. This is obtained by using a fixed set of crosstalk cancellation filters created for a central listening position plus a listener-position dependent delay network, which varies the delay of each loudspeaker unit to maximize the cross-talk cancellation response at the listener’s position. The paper introduces the formulation required for the adaptive control and simulated results predicting the performance of the device for symmetrical and asymmetrical listening positions. It is also shown how the proposed formulation has been implemented on a Sherwood S7 OPSODIS soundbar.
Convention Paper 9574 (Purchase now)

P17-3 Multizone Soundfield Reproduction with Virtual Elevations Using a Linear Loudspeaker Array—Wenyu Jin, Huawei Technologies Düsseldorf GmbH - Düsseldorf, Germany; European Research Center - Munich, Germany; Milos Markovic, Huawei Technologies Düsseldorf GmbH - Düsseldorf, Germany; European Research Center - Munich, Germany; Liyun Pang, Huawei Technologies Duesseldorf GmbH - Düsseldorf, Germany; European Research Center - Munich, Germany
Multizone soundfield reproduction has recently drawn researchers' attention. The paper introduces a soundfield rendering system for the simultaneous reproduction of sound sources with different elevations over multiple listening areas using a linear loudspeaker array. A novel method based on the use of HRTF (Head Related Transfer Function) spectral elevation cues in conjunction with a horizontal multi-zone sound rendering system is proposed. The proposed method operates in a dual-band manner and aims to accurately reproduce the desired 3-D elevated sound, taking HRTF cues into account within the selected listening zones, while also minimizing the sound leakage to the targeted quiet zones over the entire audio frequency range. A listening test is conducted and the results confirm the feasibility of simultaneous multizone soundfield rendering with different elevations using a single 2-D loudspeaker array.
Convention Paper 9575 (Purchase now)

P17-4 Characterization of the Acoustical Directivity of a Speaker on a Sound Bar Enclosure: A Comparison between Measurements, Boundary Element Method, and a Spheroidal Analytical Model—Vincent Roggerone, Ecole Polytechnique - Palaiseau, France; Xavier Boutillon, Ecole Polytechnique - Palaiseau, France; Etienne Corteel, Sonic Emotion Labs - Paris, France
The directivity of a sound bar of slender shape is analyzed. Measurements are compared to the results of a boundary element method. A good agreement is obtained in the low-mid frequency range. In order to reduce the computing time, a geometrical approximation based on a spheroidal analytical model is also considered. This approximation holds up to a certain frequency. The spheroid shape produces a more regular sound field pattern.
Convention Paper 9576 (Purchase now)

P18 - Human Factors and Interfaces

Monday, June 6, 14:45 — 15:45 (Room 352B)

Chair:
Sorgun Akkor, STD Ltd. - Istanbul, Turkey

P18-1 Robustness of Speaker Recognition from Noisy Speech Samples and Mismatched Languages—Ahmed Al-Noori, University of Salford - Salford, Greater Manchester, UK; Francis F. Li, University of Salford - Salford, UK; Philip J. Duncan, University of Salford - Salford, Greater Manchester, UK
Speaker recognition systems can typically attain high performance in ideal conditions. However, significant degradations in accuracy are found in channel-mismatched scenarios. Non-stationary environmental noises and their variations are listed at the top of speaker recognition challenges. The Gammatone frequency cepstral coefficient (GFCC) method has been developed to improve the robustness of speaker recognition. This paper presents systematic comparisons between the performance of GFCC-based and conventional MFCC-based speaker verification systems with a purposely collected noisy speech data set. Furthermore, the current work extends the experiments to include investigations into language independency features in recognition phases. The results show that GFCC has better verification performance in noisy environments than MFCC. However, GFCC shows a higher sensitivity to language mismatch between the enrollment and recognition phases.
Convention Paper 9577 (Purchase now)

P18-2 A Reliable Singing Voice-Driven MIDI Controller Using Electroglottographic Signal—Kostas Kehrakos, TEI of Crete - E. Daskalaki, Greece; Christos Chousidis, Brunel University - Uxbridge, London, UK; Spyros Kouzoupis, TEI of Crete - E. Daskalaki, Greece
Modern synthesizers and software have created a need for encoding musical performance. However, the human singing voice, which is the dominant means of musical expression, lacks a reliable encoding system. This is because of the difficulty of extracting the necessary control information from its heavy harmonic content. In this paper a novel singing voice MIDI controller, based on electroglottography, is proposed. The electroglottographic signal has lower harmonic content than audio, but it contains enough information to describe musical expression. The system uses autocorrelation for pitch extraction and a set of supplementary algorithms to provide information for dynamics and duration. The results show that the proposed system can serve as a platform for the development of reliable singing voice MIDI controllers.
Convention Paper 9578 (Purchase now)
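
The abstract mentions autocorrelation for pitch extraction. The sketch below is a generic, textbook-style autocorrelation pitch estimator, not the authors' implementation; the function name, the search range, and the synthetic sine wave standing in for an electroglottographic frame are all assumptions.

```python
import math


def autocorr_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate pitch in Hz by picking the lag with the largest autocorrelation.

    Only lags corresponding to frequencies between fmin and fmax are searched.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Synthetic stand-in for a voiced frame: a 200 Hz sine sampled at 8 kHz.
fs = 8000
frame = [math.sin(2 * math.pi * 200.0 * i / fs) for i in range(800)]
pitch_hz = autocorr_pitch(frame, fs)  # period of 40 samples, i.e., 200 Hz
```

A practical controller would likely add voicing detection and octave-error handling before mapping the estimate to a MIDI note number; those steps are outside this sketch.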

P19 - Perception Part 2, Audio Signal Processing Part 3, and Recording and Production Techniques

Monday, June 6, 14:45 — 16:45 (Foyer)

P19-1 Two Alternative Minimum-Phase Filters Tested Perceptually—Robert Mores, University of Applied Sciences Hamburg - Hamburg, Germany; Ralf Hendrych, University of Applied Sciences Hamburg - Hamburg, Germany
A widely used method for designing minimum phase filters is based on the real cepstrum (Oppenheim, 1975). An alternative method is proposed for symmetric FIR filters that flips the filter’s “left side” around the central coefficient to the “right side” using a sine ramp of perceptually irrelevant duration. The resulting phase is nearly minimal and nearly linear. The method is applied to impulse responses. Perception tests use original sound samples (A), samples processed by real-cepstrum-based minimum phase filters (B), and samples processed by the proposed method (C). The tests reveal that for impulsive sound samples the perceived dissimilarity between A and C is smaller than the dissimilarity between A and B, suggesting that the alternative method has some potential for sound processing. [Also a lecture—see session P13-2]
Convention Paper 9554 (Purchase now)
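
To make the real-cepstrum baseline mentioned in this abstract more concrete, here is a minimal sketch of the classic cepstral folding approach to obtaining a minimum-phase impulse response with the same magnitude response. It illustrates the widely used method only, not the alternative proposed in the paper. The naive DFT, the function names, and the FFT length are assumptions chosen to keep the example dependency-free; the FFT length is assumed to be even and much longer than the filter.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT, adequate for short illustrative filters."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def minimum_phase(h, n_fft=64):
    """Minimum-phase version of FIR filter h via real-cepstrum folding."""
    padded = list(h) + [0.0] * (n_fft - len(h))
    log_mag = [math.log(max(abs(v), 1e-12)) for v in dft(padded)]
    cepstrum = [v.real for v in dft(log_mag, inverse=True)]  # real cepstrum
    half = n_fft // 2
    folded = [0.0] * n_fft
    folded[0] = cepstrum[0]            # keep the zero-quefrency term
    for i in range(1, half):
        folded[i] = 2.0 * cepstrum[i]  # double the causal part, drop the anticausal part
    folded[half] = cepstrum[half]
    min_spectrum = [cmath.exp(v) for v in dft(folded)]
    return [v.real for v in dft(min_spectrum, inverse=True)][:len(h)]


# [0.5, 1.0] has its zero outside the unit circle; the minimum-phase
# counterpart with the same magnitude response is approximately [1.0, 0.5].
h_min = minimum_phase([0.5, 1.0])
```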

P19-2 A Further Investigation of Echo Thresholds for the Optimization of Fattening Delays—Michael Uwins, University of Huddersfield - Huddersfield, UK; Dan Livesey, Confetti College, Nottingham Trent University - Nottingham, UK
Since the introduction of stereophonic sound systems, mix engineers have developed and employed numerous artificial methods in order to enhance their productions. A simple yet notable example is the effect commonly known as “fattening,” where a mono signal is cloned, delayed, and then panned to the opposite side of the stereo field. The technique can improve a sound’s prominence in the mix by increasing its overall amplitude while creating a pseudostereo image and is a consequence of a renowned psychoacoustic phenomenon, the “precedence effect.” The aim of this investigation was to build upon previously accepted studies, conducting further experiments in order to produce refined estimates for echo thresholds for elements common to a multi-track music production. This investigation obtained new estimates of echo thresholds and fattening delay times, for a variety of isolated instrumental and vocal recordings, as perceived by a sample population of trained mix engineers. The study concludes that current recommendations for delay times used to create fattening effects should be refined, taking into account not only those features of the but also the consequences of temporal and spectral masking, when applied in the context of a multitrack mix. [Also a lecture—see session P16-3]
Convention Paper 9571 (Purchase now)

P19-3 Extraction of Anthropometric Measures from 3D-Meshes for the Individualization of Head-Related Transfer Functions—Manoj Dinakaran, Huawei Technologies, European Research Center - Munich, Germany; Technical University of Berlin - Berlin, Germany; Peter Grosche, Huawei Technologies European Research Center - Munich, Germany; Fabian Brinkmann, Technical University of Berlin - Berlin, Germany; Stefan Weinzierl, Technical University of Berlin - Berlin, Germany
Anthropometric measures are used for individualizing head-related transfer functions (HRTFs), for example, by selecting best-match HRTFs from a large library or by manipulating HRTFs with respect to anthropometrics. Within this process, an accurate extraction of anthropometric measures is crucial as small changes may already influence the individualization. Anthropometrics can be measured in many different ways, e.g., from pictures or anthropometers. However, these approaches tend to be inaccurate. Therefore, we propose to use Kinect for generating individual 3D head-and-shoulder meshes from which anthropometrics are automatically extracted. This is achieved by identifying and measuring distances between characteristic points on the outline of each mesh and was found to yield accurate and reliable estimates of corresponding features. In our experiment, a large set of anthropometric measures was automatically extracted for 61 subjects and evaluated in terms of a cross-validation against the manually extracted correspondent.
Convention Paper 9579 (Purchase now)

P19-4 Methods of Phase-Aligning Individual Instruments Recorded with Multiple Microphones during Post-Production—Bartlomiej Kruk, Wroclaw University Technology, Faculty of Electronics Chair of Acoustic and Multimedia - Wroclaw, Poland; University of Applied Science - Nysa, Poland; Aleksander Sobecki, Wroclaw University of Technology - Wroclaw, Poland
When recording any instrument, like a guitar cabinet or a drum set, with a multi-microphone setup, phase plays a key role in shaping the sound. Despite its importance, phase is often overlooked during the recording process because of lack of time or experience. Then, during the mixing stage, engineers tend to use equalizers and compressors to correct issues that might originate in signals not being well time-aligned. Phase measuring tools like goniometers are widely used by mastering engineers to diagnose any phase-related issues in a mix, yet their usefulness in shaping sounds of individual instruments is vastly overlooked. The main aim of this paper is to present and analyze easy phase-aligning methods.
Convention Paper 9580 (Purchase now)

P19-5 Wireless Sensor Networks for Sound Design: A Summary on Possibilities and Challenges—Felipe Reinoso Carvalho, Vrije Universiteit Brussel, ETRO - Pleinlaan 2, 1050, Brussels; Ku Leuven; Abdellah Touhafi, Vrije Universiteit Brussel - Pleinlaan, Brussels; Kris Steenhaut, Vrije Universiteit Brussel - Pleinlaan, Brussels
This article presents opportunities for using Wireless Sensor Networks (WSNs) equipped with acoustic sensors as tools for sound design. We introduce the technology, examples considered as State of the Art, and several potential applications involving different profiles of sound design. The importance of adding real-time audio-messages into sound design is considered a main issue in this proposal. The current state of the technology and its challenges are discussed. The usage of WSNs for sound design is plausible, although technological challenges demand strong interaction between sound designers and WSN developers.
Convention Paper 9581 (Purchase now)

P19-6 Object-Based Audio Recording Methods—Jean-Christophe Messonnier, CNSMDP Conservatoire de Paris - Paris, France; Jean-Marc Lyzwa, CNSMDP - Paris, France; Delphine Devallez, Arkamys - Paris, France; Catherine De Boisheraud, CNSMDP - Paris, France
The new ADM standard makes it possible to define an audio file as object-based audio. Along with many other functionalities, the polar coordinates can be specified for each audio object. An audio scene can therefore be described independently of the reproduction system. This means that an object-based recording can be rendered on a 5.1 system, a binaural system, or any other system. In the case of a binaural system, it also gives the opportunity to interact with the audio content, as a headtracker can be used to follow the movements of the listener’s head and change the binaural rendering accordingly. This paper describes how such an object-based recording can be achieved. [Also a paper—see session P16-2]
Convention Paper 9570 (Purchase now)

P19-7 The Harmonic Centroid as a Predictor of String Instrument Timbral Clarity—Kirsten Hermes, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK; Chris Hummersone, University of Surrey - Guildford, Surrey, UK
Spectrum is an important factor in determining timbral clarity. An experiment where listeners rate the changes in timbral clarity resulting from spectral equalization (EQ) can provide insight into the relationship between EQ and the clarity of string instruments. Overall, higher frequencies contribute to clarity more positively than lower ones, but the relationship is program-item-dependent. Fundamental frequency and spectral slope both appear to be important. Change in harmonic centroid (or dimensionless spectral centroid) correlates well with change in clarity, more so than octave band boosted/cut, harmonic number boosted/cut, or other variations on the spectral centroid. [Also a paper—see session P13-5]
Convention Paper 9557 (Purchase now)
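
The abstract refers to the harmonic centroid as a dimensionless spectral centroid. One plausible reading, assumed here and not confirmed by the abstract, is the amplitude-weighted spectral centroid divided by the fundamental frequency, which expresses the centroid in units of harmonic number. The function names and the example partials below are hypothetical.

```python
def spectral_centroid(freqs_hz, magnitudes):
    """Amplitude-weighted mean frequency of a set of spectral components."""
    total = sum(magnitudes)
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


def harmonic_centroid(freqs_hz, magnitudes, f0_hz):
    """Spectral centroid normalized by the fundamental (assumed definition)."""
    return spectral_centroid(freqs_hz, magnitudes) / f0_hz


# Hypothetical string tone: three harmonics of 220 Hz with decaying amplitudes.
hc_low = harmonic_centroid([220.0, 440.0, 660.0], [1.0, 0.5, 0.25], 220.0)
# The same harmonic amplitudes one octave higher give the same value,
# which is what makes the measure independent of pitch.
hc_high = harmonic_centroid([440.0, 880.0, 1320.0], [1.0, 0.5, 0.25], 440.0)
```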

P19-8 Subjective Listening Tests for Preferred Room Response in Cinemas - Part 2: Preference Test Results—Linda A. Gedemer, University of Salford - Salford, UK; Harman International - Northridge, CA, USA
SMPTE and ISO have specified nearly identical in-room target response curves for cinemas and dubbing stages. However, to this author's knowledge, to date these standards have never been scientifically tested and validated with modern technology and measurement techniques. For this reason it is still not known if the current SMPTE and ISO in-room target response curves are optimal or if better solutions exist. Using a Binaural Room Scanning system for room capture and simulation, various seating positions in three cinemas were reproduced through headphones for the purpose of conducting controlled listening experiments. This system used a binaural mannequin equipped with a computer-controlled rotating head to accurately capture binaural impulse responses of the sound system and the listening space, which are then reproduced via calibrated headphones equipped with a head-tracker. In this way controlled listening evaluations can be made among different cinema audio systems tuned to different in-room target responses. Results from a MUSHRA-style preference test are presented. [Also a lecture—see session P13-3]
Convention Paper 9555 (Purchase now)

P19-9 Exploiting Envelope Fluctuations to Enhance Binaural Perception—G. Christopher Stecker, Vanderbilt University School of Medicine - Nashville, TN, USA
A review of recent and classic studies of binaural perception leads to the conclusion that envelope fluctuations, such as sound onsets, play a critical role in the sampling of spatial information from auditory stimuli. Specifically, listeners’ perception of sound location corresponds with the binaural cues (interaural time and level differences) that coincide with brief increases in sound amplitude, and disregards binaural cues occurring at other times. This discrete, envelope-triggered sampling of binaural information can be exploited to enhance spatial perception of synthesized sound mixtures, or to facilitate the localization of mixture components. [Also a lecture—see session P13-1]
Convention Paper 9553 (Purchase now)

P19-10 Comparison of Simple Self-Oscillating PWM Modulators—Nicolai Dahl, Technical University of Denmark - Lyngby, Denmark; Niels Elkjær Iversen, Technical University of Denmark - Kogens Lyngby, Denmark; Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark; Michael A. E. Andersen, Technical University of Denmark - Kgs. Lyngby, Denmark
Switch-mode power amplifiers have become the conventional choice for audio applications due to their superior efficiency and excellent audio performance. These amplifiers rely on high frequency modulation of the audio input. Conventional modulators use a fixed high frequency for modulation. Self-oscillating modulators do not have a fixed modulation frequency and can provide good audio performance with very simple circuitry. This paper proposes a new type of self-oscillating modulator. The proposed modulator is compared to an already existing modulator of similar type, and their performances are compared both theoretically and experimentally. The results show that the proposed modulator provides a higher degree of linearity, resulting in around 2% lower Total Harmonic Distortion (THD). [Also a lecture—see session P14-5]
Convention Paper 9562 (Purchase now)

P19-11 Spatial Multi-Zone Sound Field Reproduction Using Higher-Order Loudspeakers in Reverberant Rooms—Keigo Wakayama, NTT Service Evolution Laboratories - Kanagawa, Japan; Hideaki Takada, NTT Service Evolution Laboratories - Kanagawa, Japan
We propose a method for reproducing multi-zone sound fields in a reverberant room using an array of higher-order loudspeakers. This method enables sparse arrangement of loudspeakers and reproduction of independent sound fields for multiple listeners without the need for headphones. For multi-zone reproduction, global sound field coefficients are obtained using a translation operator. By using the coefficients of the room transfer function measured or simulated with an extension of the image-source method, the loudspeakers’ coefficients are then calculated with the minimum norm method in the cylindrical harmonic domain. From experiments with two-zone and three-zone examples, we show that there was a 2N + 1-fold decrease in the number of Nth-order loudspeakers for accurate reproduction with the proposed method compared to conventional methods. [Also a lecture—see session P14-3]
Convention Paper 9560 (Purchase now)

P19-12 Stereo Panning Law Remastering Algorithm Based on Spatial Analysis—François Becker, Paris, France; Benjamin Bernard, Medialab Consulting SNP - Monaco, Monaco; Longcat Audio Technologies - Chalon-sur-Saone, France
Changing the panning law of a stereo mixture is often impossible when the original multitrack session cannot be retrieved or used, or when the mixing desk uses a fixed panning law. Yet such a modification would be of interest during tape mastering sessions, among other applications. We present a frequency-based algorithm that computes the panorama power ratio from stereo signals and changes the panning law without altering the original panorama. [Also a lecture—see session P7-6]
Convention Paper 9523 (Purchase now)

P19-13 Non-Linear Extraction of a Common Signal for Upmixing Stereo Sources—François Becker, Paris, France; Benjamin Bernard, Medialab Consulting SNP - Monaco, Monaco; Longcat Audio Technologies - Chalon-sur-Saone, France
In the context of a two- to three-channel upmix, center channel derivations fall within the field of common signal extraction methods. In this paper we explore the pertinence of the performance criteria that can be obtained from a probabilistic approach to source extraction; we propose a new, non-linear method to extract a common signal from two sources that makes the implementation choice of deeper extraction with a criterion of information preservation; and we provide the results of preliminary listening tests made with real-world audio materials. [Also a lecture—see session P7-7]
Convention Paper 9524 (Purchase now)

P20 - Perception: Part 3

Tuesday, June 7, 08:45 — 10:45 (Room 353)

Chair:
John Mourjopoulos, University of Patras - Patras, Greece

P20-1 Conflicting Dynamic and Spectral Directional Cues Form Separate Auditory Images—Henri Pöntynen, Aalto University School of Electrical Engineering - Espoo, Finland; Olli Santala, Aalto University School of Electrical Engineering - Aalto, Finland; Ville Pulkki, Aalto University - Espoo, Finland
Auditory localization under conflicting dynamic and spectral cues was investigated in a listening experiment where head-motion-coupled amplitude panning was used to create front-back confusions with moving free-field stimuli. Subjects reported whether stimuli of various spectra formed auditory images in the front, rear or both hemiplanes simultaneously. The results show that panned low-pass stimuli were consistently localized to the rear hemiplane while high-pass stimuli did not produce hemiplane reversals. The main result of the experiment is that broadband stimuli providing low-frequency ITD sequences that are inconsistent with the source directions implied by the spectral cues can lead to the formation of two segregated auditory images. This effect was observed with both continuous and discontinuous stimulus spectra.
Convention Paper 9582 (Purchase now)

P20-2 Discrimination of Formant Frequency in Pink Noise—Tomira Rogala, Fryderyk Chopin University of Music - Warsaw, Poland
The paper reports an experiment conducted to determine discrimination thresholds for timbre in tonmeister students and non-musicians. The variations of timbre were obtained through introducing a 1/3-octave wide formant into the spectrum of noise and shifting the formant’s center frequency. Discrimination thresholds were measured using a 3AFC procedure. The results have shown that the threshold values determined for tonmeister students were considerably lower than those obtained for non-musicians. In both groups of listeners a learning effect was observed: the thresholds decreased in successive measurement series completed by a listener. It also was found that the formant frequency discrimination thresholds depended on the formant frequency and were much higher at 125 Hz than at 315 Hz and higher frequencies.
Convention Paper 9583 (Purchase now)

P20-3 The Influence of Room Acoustics on Musical Performance and Interpretation—A Pilot Study—Jan Berg, Luleå University of Technology - Piteå, Sweden; Sverker Jullander, Luleå University of Technology - Piteå, Sweden; Petter Sundkvist, Luleå University of Technology - Piteå, Sweden; Helge Kjekshus, Luleå University of Technology - Piteå, Sweden
Concert hall acoustics is an important factor that influences musical performance. Different acoustics lead to different musical results. For a musical performer, the artistic impression of a performance is paramount. Therefore, it is essential to study the relation between concert hall acoustics and musical performance. Such studies might also be relevant for architects and acousticians. A pilot study was devised, enabled by a unique concert hall with mechanically variable acoustics. A musician played the grand piano in four trials, each having a distinctive acoustic condition. The trials were recorded for later analysis. The performances were assessed by experts and the pianist himself. The results show that clear as well as subtle differences in interpretation and performance existed between the trials.
Convention Paper 9584 (Purchase now)

P20-4 Timbre Preferences of Four Listener Groups and the Influence of their Cultural Backgrounds—Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA; Ron Bakker, Yamaha Music Europe - Vianen, Netherlands; Masahiro Ikeda, Yamaha Corporation - Hamamatsu, Shizuoka, Japan
The cultural influence on listeners’ timbre preference was investigated using the magnitude estimation method. Four listener groups (Dutch, Japanese, Korean, and American) participated in a listening test in their own countries. The listeners manipulated the timbre of five stimuli (Dutch, Japanese, Korean, and English popular songs, and orchestral music) by adjusting gains of three frequency bands according to their preferences. The statistical analysis (a mixed design ANOVA) showed that only the interaction factor of the listener groups and the stimuli significantly differentiated the preferred spectral responses of the four listener groups. This implies that a listener group from one country had a unique timbre preference that appeared when listening to a song in its own language. [Also a poster—see session P22-3]
Convention Paper 9585 (Purchase now)

P21 - Immersive Audio: Part 1

Tuesday, June 7, 08:45 — 11:45 (Room 352B)

Chair:
Bob Schulein, RBS Consultants / ImmersAV Technology - Schaumburg, IL, USA

P21-1 Low-Complexity Stereo Signal Decomposition and Source Separation for Application in Stereo to 3D Upmixing—Sebastian Kraft, Helmut-Schmidt-University - Hamburg, Germany; Udo Zölzer, Helmut-Schmidt-University - Hamburg, Germany
In this paper we present a general low-complexity stereo signal decomposition approach. Based on a common stereo signal model, the panning coefficients and azimuth positions of the sources in a stereo mix are estimated. In a next step, this information is used to separate direct and ambient signal components. The simple algorithm can be implemented at low computational cost and its application in a stereo to 3D upmix context is described. Particular focus is put on the generation of additional ambient channels by using decorrelation filters in a tree structure. Finally, the separation performance is evaluated with several standard measures and compared to other algorithms. [Also a poster—see session P22-5]
Convention Paper 9586 (Purchase now)

P21-2 Immersive Audio Delivery Using Joint Object Coding—Heiko Purnhagen, Dolby Sweden AB - Stockholm, Sweden; Toni Hirvonen, Dolby Laboratories - Stockholm, Sweden; Lars Villemoes, Dolby Sweden - Stockholm, Sweden; Jonas Samuelsson, Dolby Sweden AB - Stockholm, Sweden; Janusz Klejsa, Dolby Sweden AB - Stockholm, Sweden
Immersive audio experiences (3D audio) are an important element of next-generation audio entertainment systems. This paper presents joint object coding techniques that enable the delivery of object-based immersive audio content (e.g., Dolby Atmos) at low bit rates. This is achieved by conveying a multichannel downmix of the immersive content using perceptual audio coding algorithms together with parametric side information that enables the reconstruction of the audio objects from the downmix in the decoder. An advanced joint object coding tool is part of the AC-4 system recently standardized by ETSI. Joint object coding is also used in a backwards compatible extension of the Dolby Digital Plus system. Listening test results illustrate the performance of joint object coding in these two applications. [Also a poster—see session P22-7]
Convention Paper 9587 (Purchase now)

P21-3 Design and Subjective Evaluation of a Perceptually-Optimized Headphone Virtualizer—Grant Davidson, Dolby Laboratories, Inc. - San Francisco, CA, USA; Dan Darcy, Dolby Laboratories, Inc. - San Francisco, CA, USA; Louis Fielder, Dolby - San Francisco, CA, USA; Zhiwei Schuang, Dolby Laboratories Intl. Services Co. Ltd. - Beijing, China; Rich Graff, Dolby Laboratories, Inc. - San Francisco, CA, USA; Jeroen Breebaart, Dolby Australia Pty. Ltd. - McMahons Point, NSW, Australia; Poppy Crum, Dolby Laboratories - San Francisco, CA, USA
We describe a novel method for designing echoic headphone virtualizers based on a stochastic room model and a numerical optimization procedure. The method aims to maximize sound source externalization under a natural-timbre constraint. The stochastic room model generates a number of binaural room impulse response (BRIR) candidates for each virtual channel, each embodying essential perceptual cues. A perceptually-based distortion metric evaluates the timbre of each candidate, and the optimal candidate is selected for use in the virtualizer. We designed a 7.1.4 channel virtualizer and evaluated it relative to a LoRo stereo downmix using a single-interval A:B preference test. For a pool of 10 listeners, the test resulted in an overall virtualizer preference of 75%, with no stereo test item preferred over binaural.
Convention Paper 9588 (Purchase now)

P21-4 An Open 3D Audio Production Chain Proposed by the Edison 3D Project—Etienne Corteel, Sonic Emotion Labs - Paris, France; David Pesce, b<>com - Cesson-Sévigné, France; Raphael Foulon, Sonic Emotion Labs - Paris, France; Gregory Pallone, b<>com - Cesson-Sévigné, France; Frédéric Changenet, Radio France - Paris, France; Hervé Dejardin, Radio France - Paris, France
In this paper we present a production chain for Next Generation Audio formats that combines standard digital audio workstations with external 3D audio rendering software using an open communication protocol. We first describe the scope of the Edison 3D project. In a second part, we revisit existing 3D audio formats outlining the need for an open format for content creation and archiving. We then describe the tools developed in Edison 3D enabling user interaction, storage of object positions in the timeline, monitoring of audio content in various rendering formats (stereo, 5.1, binaural, WFS, HOA), and export into the open and recently standardized (ITU BS.2076) 3D audio format: Audio Definition Model. We finally provide an outlook into future work.
Convention Paper 9589 (Purchase now)

P21-5 Perceptual Evaluation of Transpan for 5.1 Mixing of Acoustic Recordings—Gaëtan Juge, Paris Conservatory (CNSMDP) - Paris, France; Amandine Pras, Paris Conservatoire (CNSMDP) - Paris, France; Stetson University - DeLand, FL, USA; Ilja Frissen, McGill University - Montreal, Quebec, Canada
We evaluate the efficiency of a 3D spatialization software named Transpan in the context of mixing acoustic recordings on a 5.1 reproduction system. The study aims to investigate whether the use of the binaural with cross-talk cancellation (XTC) processing implemented in Transpan can improve the localization of lateral sources and their stability through listeners’ movements. We administered a listening test to 22 expert listeners in Paris and in Berlin. The test consisted of comparisons between two mixes, with and without Transpan binaural/XTC panning, for four classical music excerpts under five listening conditions, i.e., at the sweet spot and while performing specific movements. Quantitative analysis of multiple choice questions showed that Transpan can enlarge the 5.1 sweet spot area toward the rear speakers. Qualitative analysis of participants’ feedback yielded five main categories of comments, namely Localization stability; Precise localization accuracy; Vague localization accuracy; Timbral and spectral artifacts; and Spatial differences. Together the results show that Transpan allows for better source lateralization in 5.1 mixing. [Also a poster—see session P22-5]
Convention Paper 9590 (Purchase now)

P21-6 The Influence of Head Tracking Latency on Binaural Rendering in Simple and Complex Sound Scenes—Peter Stitt, LIMSI Université Paris-Saclay - Orsay, France; Etienne Hendrickx, Paris Conservatory (CNSMDP) - Paris, France; Jean-Christophe Messonnier, CNSMDP Conservatoire de Paris - Paris, France; Brian Katz, LIMSI-CNRS - Orsay, France
Head tracking has been shown to improve the quality of multiple aspects of binaural rendering for single sound sources, such as reduced front-back confusions. This paper presents the results of an AB experiment to investigate the influence of tracker latency on the perceived stability of virtual sounds. The stimuli used are a single frontal sound source and a complex (5 source) sound scene. A comparison is performed between the results for the simple and complex sound scenes and the head motions of the subjects for various latencies. The perceptibility threshold was found to be 10 ms higher for the complex scene compared to the simple one. The subject head movement speeds were found to be 6 degrees/s faster for the complex scene.
Convention Paper 9591 (Purchase now)

P22 - Rendering, Human Factors and Interfaces

Tuesday, June 7, 12:00 — 14:00 (Foyer)

P22-1 A Subjective Comparison of Discrete Surround Sound and Soundbar Technology by Using Mixed Methods—Tim Walton, Newcastle University - Newcastle-Upon-Tyne, UK; BBC Research and Development - Salford, UK; Michael Evans, BBC Research & Development - Salford, Greater Manchester, UK; David Kirk, Newcastle University - Newcastle-Upon-Tyne, UK; Frank Melchior, BBC Research and Development - Salford, UK
In recent years, soundbars have seen a rise in interest from consumers of home audio. Such technology offers an alternative means to experience surround sound content compared to conventional discrete multichannel systems. This paper presents a subjective comparison between two soundbars, a discrete 5-channel surround system, and a discrete stereo system, for a range of content and listener experience, in order to evaluate how soundbar technology compares to conventional discrete systems. A mixed methods approach, Open Profiling of Quality, was used to better understand preference ratings for the various reproduction systems. The results show that the discrete surround system was significantly preferred to the soundbars for all content due to a combination of timbral and spatial factors.
Convention Paper 9592 (Purchase now)

P22-2 WITHDRAWN—N/A

Convention Paper 9593 (Purchase now)

P22-3 Timbre Preferences of Four Listener Groups and the Influence of their Cultural Backgrounds—Sungyoung Kim, Rochester Institute of Technology - Rochester, NY, USA; Ron Bakker, Yamaha Music Europe - Vianen, Netherlands; Masahiro Ikeda, Yamaha Corporation - Hamamatsu, Shizuoka, Japan
The cultural influence on listeners’ timbre preference was investigated using the magnitude estimation method. Four listener groups (Dutch, Japanese, Korean, and American) participated in a listening test in their own countries. The listeners manipulated the timbre of five stimuli (Dutch, Japanese, Korean, and English popular songs, and orchestral music) by adjusting the gains of three frequency bands according to their preferences. The statistical analysis (a mixed design ANOVA) showed that only the interaction factor of the listener groups and the stimuli significantly differentiated the preferred spectral responses of the four listener groups. This implies that a listener group from one country had a unique timbre preference that appeared when listening to a song in its own language. [Also a lecture—see session P20-4]
Convention Paper 9585 (Purchase now)

P22-4 Low-Complexity Stereo Signal Decomposition and Source Separation for Application in Stereo to 3D Upmixing—Sebastian Kraft, Helmut-Schmidt-University - Hamburg, Germany; Udo Zölzer, Helmut-Schmidt-University - Hamburg, Germany
In this paper we present a general low-complexity stereo signal decomposition approach. Based on a common stereo signal model, the panning coefficients and azimuth positions of the sources in a stereo mix are estimated. In a next step, this information is used to separate direct and ambient signal components. The simple algorithm can be implemented at low computational cost and its application in a stereo to 3D upmix context is described. Particular focus is put on the generation of additional ambient channels by using decorrelation filters in a tree structure. Finally, the separation performance is evaluated with several standard measures and compared to other algorithms. [Also a lecture—see session P21-1]
Convention Paper 9586 (Purchase now)

P22-5 Perceptual Evaluation of Transpan for 5.1 Mixing of Acoustic Recordings—Gaëtan Juge, Paris Conservatory (CNSMDP) - Paris, France; Amandine Pras, Paris Conservatoire (CNSMDP) - Paris, France; Stetson University - DeLand, FL, USA; Ilja Frissen, McGill University - Montreal, Quebec, Canada
We evaluate the efficiency of a 3D spatialization software named Transpan in the context of mixing acoustic recordings on a 5.1 reproduction system. The study aims to investigate whether the use of the binaural with cross-talk cancellation (XTC) processing implemented in Transpan can improve the localization of lateral sources and their stability through listeners’ movements. We administered a listening test to 22 expert listeners in Paris and in Berlin. The test consisted of comparisons between two mixes with and without Transpan binaural/XTC panning, for four classical music excerpts under five listening conditions, i.e., at the sweet spot and while performing specific movements. Quantitative analysis of multiple choice questions showed that Transpan can enlarge the 5.1 sweet spot area toward the rear speakers. From qualitative analysis of participants’ feedback emerged five main categories of comments, namely Localization stability; Precise localization accuracy; Vague localization accuracy; Timbral and spectral artifacts; and Spatial differences. Together the results show that Transpan allows for better source lateralization in 5.1 mixing. [Also a lecture—see session P21-5]
Convention Paper 9590 (Purchase now)

P22-6 Immersive Audio Delivery Using Joint Object Coding—Heiko Purnhagen, Dolby Sweden AB - Stockholm, Sweden; Toni Hirvonen, Dolby Laboratories - Stockholm, Sweden; Lars Villemoes, Dolby Sweden - Stockholm, Sweden; Jonas Samuelsson, Dolby Sweden AB - Stockholm, Sweden; Janusz Klejsa, Dolby Sweden AB - Stockholm, Sweden
Immersive audio experiences (3D audio) are an important element of next-generation audio entertainment systems. This paper presents joint object coding techniques that enable the delivery of object-based immersive audio content (e.g., Dolby Atmos) at low bit rates. This is achieved by conveying a multichannel downmix of the immersive content using perceptual audio coding algorithms together with parametric side information that enables the reconstruction of the audio objects from the downmix in the decoder. An advanced joint object coding tool is part of the AC-4 system recently standardized by ETSI. Joint object coding is also used in a backwards compatible extension of the Dolby Digital Plus system. Listening test results illustrate the performance of joint object coding in these two applications. [Also a lecture—see session P21-2]
Convention Paper 9587 (Purchase now)

P23 - Immersive Audio: Part 2

Tuesday, June 7, 12:15 — 14:15 (Room 352B)

Chair:
Robin Reumers, Sonic City Studios - Amsterdam, The Netherlands

P23-1 Exploring the Benefits of Surround Sound in Contemporary Live Music Performances—John Crossley, University of Derby - Derby, UK
Spatial audio utilizing 5.1 surround sound and newer developments such as object-oriented audio has become well established in cinema and home theaters. Its expansion into live musical performance is quite limited. This work explores the benefits of surround sound for contemporary music performance. A 20-channel Wave Field Synthesis system was compared to a high quality stereo sound reinforcement system under identical experimental conditions. An original composition was used to avoid familiarity with program material and to encourage focus on spatial considerations. Data drawn from audiences at both performances is used to quantify the perceptual differences for the average audience and to draw conclusions as to the usefulness of using a system of this type in an “average” contemporary live music performance.
Convention Paper 9594 (Purchase now)

P23-2 Extended Bass Management Methods for Cost-Efficient Immersive Audio Reproduction in Digital Cinema—Toni Hirvonen, Dolby Laboratories - Stockholm, Sweden; Charles Q. Robinson, Dolby Laboratories - San Francisco, CA, USA
New, more sophisticated cinema audio formats have recently been developed and deployed that provide a more immersive sound field for the audience members. This paper discusses techniques that can reduce the economic costs associated with the installation and maintenance of the audio reproduction devices in contemporary immersive digital cinema, while best retaining the benefits of these new formats. The focus of this paper is novel bass management techniques that enable the use of cost-effective loudspeakers. These proposed bass processing methods take advantage of the reduced spatial saliency of low frequency audio and allow for a reduced spatial resolution for audio signals in that range. We present subjective tests conducted in a state of the art cinema installation that illustrate the effects of the proposed solutions on sound quality. Some of these techniques have been incorporated into the future version of the Dolby Atmos cinema specification.
Convention Paper 9595 (Purchase now)

P23-3 Local Wave Field Synthesis by Spatial Band-Limitation in the Circular/Spherical Harmonics Domain—Nara Hahn, University of Rostock - Rostock, Germany; Fiete Winter, Universität Rostock - Rostock, Germany; Sascha Spors, University of Rostock - Rostock, Germany
The achievable accuracy of sound field synthesis (SFS) techniques, such as Wave Field Synthesis (WFS), is mainly limited in practice by the limited loudspeaker density. Above the so-called spatial aliasing frequency, considerable artifacts are introduced in the synthesized sound field. In local SFS, the accuracy within a local listening area is increased at the cost of degradations outside. In this paper a new approach for local WFS is proposed. The WFS driving functions are computed based on an order-limited harmonics expansion of the target sound field. A local listening area is created around the shifted expansion center where the synthesized sound field exhibits higher accuracy. The size of the local area is controlled by the expansion order of the driving function. The derivations of 2D, 3D, and 2.5D driving functions are given, and the synthesized sound fields are evaluated by numerical simulations.
Convention Paper 9596 (Purchase now)

P23-4 Investigation on Subjective HRTF Rating Repeatability—Areti Andreopoulou, LIMSI-CNRS - Orsay, France; Brian Katz, LIMSI-CNRS - Orsay, France
This paper investigates the repeatability of an HRTF evaluation protocol, assessing the spatial quality of binaural stimuli, moving along pre-defined trajectories on the horizontal and median planes, on a forced-choice 9-point rating scale. The protocol assessment was based on data simulations and subjective studies. Repeatability was evaluated as a function of the size and content of the HRTF corpus, the trajectories, and the resolution of the rating scale. Analysis of the data revealed that HRTF rating is a reliable, yet challenging task with low repeatability rates of [about] 50%. Therefore participant screening through pre-tests should be used to maximize reliability of the responses.
Convention Paper 9597 (Purchase now)


EXHIBITION HOURS June 5th 10:00 – 18:00 June 6th 09:00 – 18:00 June 7th 09:00 – 16:00

REGISTRATION DESK June 4th 08:00 – 18:00 June 5th 08:00 – 18:00 June 6th 08:00 – 18:00 June 7th 08:00 – 16:00

TECHNICAL PROGRAM June 4th 09:00 – 18:30 June 5th 08:30 – 18:00 June 6th 08:30 – 18:00 June 7th 08:45 – 16:00

Audio Engineering Society
