AES Berlin 2014
Paper Session Details
P1 - Perception—Part 1
Saturday, April 26, 10:00 — 12:30 (Room Paris)
Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada
P1-1 Consistency of High Frequency Preference Among Expert Listeners—Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Brett Leonard, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Stuart Bremner, McGill University - Montreal, Quebec, Canada; Grzegorz Sikora, Bang & Olufsen Deutschland GmbH - Munich, Germany; McGill University - Montreal, Canada
Consistency is one of the most fundamental skills of the recording professional. This is particularly important in tasks that involve the shaping of timbre. A study was designed in which expert listeners used a simple shelving equalizer to alter the high-frequency content of high-quality stereo program material over repeated trials. Fifteen trained subjects performed the test. Results indicate an expectedly large range of preference for high-frequency content, accompanied, however, by a somewhat large variance. In contrast to previous studies of other balancing tasks, consistency of high-frequency preference was found to be less closely related to the subjects' experience.
Convention Paper 9018
P1-2 Subjective Evaluation of High Resolution Recordings in PCM and DSD Audio Formats—Atsushi Marui, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan; Toru Kamekawa, Tokyo University of the Arts - Adachi-ku, Tokyo, Japan; Kazuhiko Endo, TEAC Corporation - Tokyo, Japan; Erisa Sato, TEAC Corporation - Tokyo, Japan
High-resolution audio production and consumption are attracting increasing interest, supported by the release of relatively affordable audio recorders from multiple manufacturers and by the broader bandwidth of the Internet. However, differences in audio quality between high-resolution audio formats are still not well known, especially between the different formats available on these recorders. To evaluate differences in the subjective impression of sounds recorded in high-resolution audio formats, three formats (PCM at 192 kHz/24 bits, DSD at 2.8 MHz, and DSD at 5.6 MHz), recorded with multiple studio-quality audio recorders, were evaluated in a double-blind A/B comparison listening test. Evaluations of six sound programs by forty-six participants on eight attributes revealed statistically significant differences between PCM and DSD, but not between the two DSD sampling frequencies (2.8 MHz and 5.6 MHz).
Convention Paper 9019
P1-3 The Acceptability of Speech with Interfering Radio Program Material—Khan Baykaner, University of Surrey - Guildford, Surrey, UK; Christopher Hummersone, University of Surrey - Guildford, Surrey, UK; Russell Mason, University of Surrey - Guildford, Surrey, UK; Søren Bech, Bang & Olufsen a/s - Struer, Denmark; Aalborg University - Aalborg, Denmark
A listening test was conducted to investigate the acceptability of audio-on-audio interference for radio programs featuring speech as the target. Twenty-one subjects, including naïve and expert listeners, were presented with 200 randomly assigned pairs of stimuli and asked to report, for each trial, whether the listening scenario was acceptable or unacceptable. Stimulus pairs were set to randomly selected SNRs ranging from 0 to 45 dB. Results showed no significant difference between subjects according to listening experience. A logistic regression of acceptability on SNR was carried out. The model's accuracy was R² = 0.87, RMSE = 14%, and RMSE* = 7%. By accounting for the presence of background audio in the target program, 90% of the variance could be explained.
Convention Paper 9020
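The abstract reports a logistic regression of acceptability on SNR but gives no coefficients. As an illustration of the model form only, the Python sketch below fits p(acceptable) = 1 / (1 + exp(-(b0 + b1·SNR))) to binary acceptable/unacceptable judgements by batch gradient ascent on the log-likelihood. The toy data points and the names `fit_logistic`, `p_accept`, and `snr_50` are hypothetical; they are not the study's data or code.

```python
import math

def fit_logistic(snr_db, accepted, lr=0.5, iters=2000):
    """Fit p(accept | SNR) = 1 / (1 + exp(-(b0 + b1 * SNR))) by batch gradient ascent.

    SNR values are standardized internally for stable steps; the returned
    coefficients (b0, b1) are converted back to raw dB units.
    """
    n = len(snr_db)
    mean = sum(snr_db) / n
    scale = math.sqrt(sum((s - mean) ** 2 for s in snr_db) / n) or 1.0
    z = [(s - mean) / scale for s in snr_db]
    a0 = a1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0
        for zi, yi in zip(z, accepted):
            p = 1.0 / (1.0 + math.exp(-(a0 + a1 * zi)))
            g0 += yi - p          # d(log-likelihood)/d(intercept)
            g1 += (yi - p) * zi   # d(log-likelihood)/d(slope)
        a0 += lr * g0 / n
        a1 += lr * g1 / n
    # a0 + a1 * (s - mean) / scale == b0 + b1 * s
    return a0 - a1 * mean / scale, a1 / scale

def p_accept(b0, b1, snr_db):
    """Predicted probability that a trial at this SNR is judged acceptable."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * snr_db)))

# Hypothetical toy judgements (1 = acceptable); NOT data from the study.
snrs = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
votes = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
b0, b1 = fit_logistic(snrs, votes)
snr_50 = -b0 / b1  # SNR at which acceptance is predicted to be 50%
```

A positive `b1` means acceptability rises with SNR, and `snr_50` gives a convenient single-number summary of such a fit.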
P1-4 The Effect of Dynamic Range Compression on Loudness and Quality Perception in Relation to Crest Factor—Mark Wendl, University of Huddersfield - Huddersfield, West Yorkshire, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
Two listening tests were carried out to determine how perceived loudness and perceived quality change as the crest factor is changed by limiting, a type of dynamic range compression, for three genres (rock, electronic, and jazz). The stimuli ranged in crest factor from 15 dB to 9 dB in 1 dB steps. Both loudness and quality differed significantly between crest factors, suggesting that a change in crest factor affects both. Loudness and quality were correlated for rock and jazz but not for electronic music, suggesting that genre can affect how quality is perceived.
Convention Paper 9021
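Crest factor is the ratio of a signal's peak level to its RMS level, usually expressed in dB. The Python sketch below computes it and shows that hard clipping lowers it. Hard clipping is used here only as a crude stand-in for the limiter in the study, whose settings the abstract does not give; the function names are hypothetical.

```python
import math

def crest_factor_db(samples):
    """Crest factor in dB: 20 * log10(peak / RMS)."""
    if not samples:
        raise ValueError("need at least one sample")
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("crest factor is undefined for a silent signal")
    return 20.0 * math.log10(peak / rms)

def hard_limit(samples, ceiling):
    """Brick-wall clipping at +/- ceiling: an assumed stand-in for a real look-ahead limiter."""
    return [max(-ceiling, min(ceiling, x)) for x in samples]

# One full cycle of a sine: peak 1, RMS 1/sqrt(2), so the crest factor is about 3.01 dB.
sine = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]
limited = hard_limit(sine, 0.5)  # clipping raises RMS relative to peak
```

Clipping the sine at half its amplitude pushes it toward a square wave, whose crest factor is 0 dB, so `crest_factor_db(limited)` falls below the sine's value.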
P1-5 Hyper-Compression in Music Production: Listener Preferences on Dynamic Range Reduction—Robert W. Taylor, University of Newcastle - Callaghan, NSW, Australia; University of Sydney - Sydney, NSW, Australia; William L. Martens, University of Sydney - Sydney, NSW, Australia
Achieving “loud” recordings as a result of hyper-compression is a prevailing expectation within the creative system of music production, sustaining a myth that has been developing since the mid-twentieth century as a consequence of the “louder is better” paradigm. The study reported here investigated whether the amounts of hyper-compression typical of current audio practice produce results that listeners prefer. The experimental approach was a subjective preference test requiring listeners to make a forced choice between seven levels of compression for each of five musical programs that differed in musical genre. The seven versions of each musical program were carefully matched in loudness as they were varied in compression level, so differences in loudness per se cannot account for the differences in preference choices observed between musical programs. In addition, subject factors such as age group and, speculatively, the amount of exposure to different genres were found to have considerable influence on listener preferences.
Convention Paper 9022
P2 - Perception—Part 2
Saturday, April 26, 14:30 — 18:30 (Room Paris)
Jürgen Herre, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany
P2-1 An Approach to Quantifying the Latency Tolerance Range in Non-Collaborative Musical Performances—Jorge Medina Victoria, University of Applied Sciences - Darmstadt, Germany; Cork Institute of Technology - Cork, Ireland
Latency is a well-known issue in collaborative music performance over networks such as the Internet. The effects of latency on networked performance have been researched over the last decade; however, relatively few studies address how musicians cope with their own latency in non-collaborative (solo) performance. This paper introduces the new concept of a Latency Tolerance Range (LTR) and describes a methodological approach to developing a listening test whose results may demonstrate the influence of the musicians' instruments (chordophones, aerophones, and membranophones) on latency perception.
Convention Paper 9023
P2-2 Emotional Impact of Different Forms of Spatialization in Everyday Mediatized Music Listening: Placebo or Technology Effects?—Steffen Lepa, Technical University of Berlin - Berlin, Germany; Audio Communication Group; Stefan Weinzierl, Technical University of Berlin - Berlin, Germany; Hans-Joachim Maempel, Staatliches Institut für Musikforschung Preußischer Kulturbesitz - Berlin, Germany; Technische Universität Berlin - Berlin, Germany; Elena Ungeheuer, Julius-Maximilians-Universität - Würzburg, Germany
Do the spatial cues conveyed by different audio playback technologies alter the affective experience of music listening, or is this rather a matter of quality expectations leading to “placebo effects”? To find out, we conducted a 2-factorial between-subjects study employing “spatialization type” (stereo headphones / stereo loudspeakers / live concert simulation) and “spatial quality expectations” (yes / no) as independent experimental factors. Three hundred and six subjects rated the perceived intensity of emotional expression when listening to four different musical pieces, as well as the overall audio quality. While we observed significant effects of spatialization type on perceived affective expressivity of music and on spatial audio quality, expectation-related placebo effects affected perceived spatial audio quality only. Results are discussed in terms of their significance for music and media research.
Convention Paper 9024
P2-3 Data-Driven Modeling of the Spatial Sound Experience—Aki Härmä, Philips Research - Eindhoven, The Netherlands; Munhum Park, Philips Research Laboratories - Eindhoven, The Netherlands; Armin Kohlrausch, Philips Research Europe - Eindhoven, The Netherlands
Since the subjective evaluation of audio systems or processing schemes is time-consuming and resource-intensive, alternative objective evaluation methods have attracted considerable research interest. However, current perceptual models are not yet capable of replacing a human listener, especially when the test stimulus is complex, for example, a sound scene consisting of multiple time-varying acoustic images. This paper describes a data-driven approach to developing a model that predicts the subjective evaluation of complex acoustic scenes, using the extensive set of listening test results collected in the latest MPEG-H 3-D audio initiative as training data. The results showed that a few selected outputs of various auditory models may form a useful feature set: linear regression and multilayer perceptron models reasonably predicted the overall distribution of listening test scores, estimating both mean and variance.
Convention Paper 9025
P2-4 Investigation into Vertical Stereophonic Localization in the Presence of Interchannel Crosstalk—Rory Wallis, University of Huddersfield - Huddersfield, West Yorkshire, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
Listening tests were carried out with 12 subjects, using stereophonic loudspeakers arranged vertically in the median plane, to determine the threshold by which the amplitude of a delayed upper loudspeaker had to be reduced for stimuli to be localized fully at the lower loudspeaker. The test stimuli were seven octave bands of noise (125, 250, 500, 1000, 2000, 4000, and 8000 Hz) and one broadband source (125-8000 Hz). The effect of frequency on the threshold was significant (the 1000 and 2000 Hz bands had the lowest thresholds), while the effect of delay time was non-significant. The threshold for the broadband stimulus was significantly lower than that for each of the noise bands.
Convention Paper 9026
P2-5 The Perceptual Effects of Horizontal and Vertical Interchannel Decorrelation Using the Lauridsen Decorrelator—Christopher Gribben, University of Huddersfield - Huddersfield, West Yorkshire, UK; Bedford, Bedfordshire, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
The perceptual effects of interchannel decorrelation, using a method proposed by Lauridsen, have been investigated subjectively, looking specifically at the frequency dependency of decorrelation. Twelve subjects graded the perceived auditory image width of a pink noise sample that had been decorrelated by a Lauridsen decorrelator algorithm, varying the frequency-band, time-delay, and decorrelation factor for each sample. The same test has been carried out in both the horizontal and vertical planes. Results generally indicate that decorrelation is more effective horizontally than vertically. For horizontal decorrelation, the higher the frequency, the more effective the decorrelation, with a longer time-delay required for lower frequencies. In contrast, the vertical width produced by vertical decorrelation is better perceived at lower frequencies than higher ones.
Convention Paper 9027
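Lauridsen's method derives two channels from one by adding and subtracting a delayed copy of the signal, which yields complementary comb-filtered outputs. The Python sketch below assumes that form, L = x + g·x[n-d] and R = x - g·x[n-d], with the delay d in samples and the gain g standing in for the abstract's "decorrelation factor"; this parameterization is an assumption, and the band-limiting used in the study is omitted. It measures the resulting interchannel correlation coefficient on white noise, for which the coefficient should come out close to (1 - g²)/(1 + g²), so g = 1 gives near-complete decorrelation. Function names are hypothetical.

```python
import math
import random

def lauridsen(x, delay, gain):
    """Return (left, right) = (x + g*x[n-d], x - g*x[n-d]) from a mono signal x."""
    delayed = [x[n - delay] if n >= delay else 0.0 for n in range(len(x))]
    left = [a + gain * b for a, b in zip(x, delayed)]
    right = [a - gain * b for a, b in zip(x, delayed)]
    return left, right

def correlation(a, b):
    """Normalized cross-correlation at lag 0 (interchannel correlation coefficient)."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den

rng = random.Random(1)
noise = [rng.uniform(-1.0, 1.0) for _ in range(20000)]  # white-noise test signal
for g in (0.0, 0.5, 1.0):
    left, right = lauridsen(noise, delay=10, gain=g)
    print(g, round(correlation(left, right), 2))
```

Because the two outputs are complementary comb filters, their notches interleave in frequency; the delay sets the notch spacing, which is consistent with the abstract's finding that lower frequencies need a longer time delay.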
P2-6 The Effect of Auditory Memory on the Perception of Timbre—Cleopatra Pike, University of Surrey - Guildford, Surrey, UK; Russell Mason, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK
Listeners are more sensitive to timbral differences when comparing stimuli side by side than when the stimuli are temporally separated. The contributions of auditory memory and spectral compensation to this effect are unclear. A listening test examined the role of auditory memory in timbral discrimination across retention intervals (RIs) of up to 40 s. For timbrally complex music stimuli, discrimination accuracy was good across all RIs, but there was increased sensitivity to the onset spectrum, which decreased with increasing RI. Noise stimuli showed no onset sensitivity, but discrimination performance declined at RIs of 40 s. The difference between program types may suggest different onset sensitivity and memory encoding (categorical vs. non-categorical). The onset bias suggests that memory effects should be measured before future investigations of spectral compensation.
Convention Paper 9028
P2-7 Investigation of a Random Radio Sampling Method for Selecting Ecologically Valid Music Program Material—Jon Francombe, University of Surrey - Guildford, Surrey, UK; Russell Mason, University of Surrey - Guildford, Surrey, UK; Martin Dewhirst, University of Surrey - Guildford, Surrey, UK; Søren Bech, Bang & Olufsen a/s - Struer, Denmark; Aalborg University - Aalborg, Denmark
When performing subjective tests of an audio system, it is necessary to use appropriately selected program material to excite that system. Program material is often required to be wide-ranging and representative of commonly consumed audio, while having minimal selection bias. A random radio sampling procedure was investigated for its ability to produce such a stimulus set. Nine popular stations were sampled at six different times of day over a number of days to produce a 200-item pool. Musical and signal-based characteristics were examined; the items were found to span a wide range of genres and years, and physical similarities were found between items in the same genre. The proposed method is beneficial for collecting a wide and representative stimulus set.
Convention Paper 9029
P2-8 Criticality of Audio Stimuli for Listening Tests – Listening Durations During a Ranking Task—Jonas Ekeroot, Luleå University of Technology - Piteå, Sweden; Jan Berg, Luleå University of Technology - Piteå, Sweden; Arne Nykänen, Luleå University of Technology - Luleå, Sweden
The process of selecting critical audio stimuli for listening tests is known from the literature to be both labor-intensive and time-consuming, and has been described as more of an art than a science. Explicit accounts of systematic procedures are uncommon. In a previous study, a ranking-by-elimination method was investigated, resulting in a rank order that could be used as a guide for selecting critical stimuli. This paper presents a further exploratory analysis of data on the subjects' listening durations, both as a function of the number of stimuli left on screen and individually per stimulus. A strong negative correlation was found between the rank order of criticality and playing duration.
Convention Paper 9030
P3 - Signal Processing—Part 1
Sunday, April 27, 09:00 — 11:30 (Room Paris)
Ville Pulkki, Aalto University - Espoo, Finland; Technical University of Denmark - Denmark
P3-1 Memory Requirements Reduction Technique for Delay Storage in Real Time Acoustic Cameras—Héctor A. Sánchez-Hevia, University of Alcalá - Alcalá de Henares, Madrid, Spain; Inma Mohino-Herranz, Universidad de Alcalá - Alcalá de Henares, Madrid, Spain; Roberto Gil-Pita, University of Alcalá - Alcalá de Henares, Madrid, Spain; Manuel Rosa-Zurera, University of Alcalá - Alcalá de Henares, Madrid, Spain
Acoustic cameras are devices capable of displaying a visual representation of sound waves. These devices typically rely on delay-based techniques, such as delay-and-sum beamforming, in which calculating the proper delay values is a key component of the system. For real-time systems with a large number of microphones, computing these delays on the fly is impractical, so a common strategy is to pre-calculate them offline and store them in memory; this allows faster access to the data but increases memory requirements. In this paper we present a technique for optimizing delay storage, based on various symmetries found within the pre-calculated values, that reduces the initial memory requirements by a factor of up to almost 16.
Convention Paper 9031
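The abstract does not say which symmetries are exploited, so the following Python sketch illustrates only the general idea under its own assumptions: a hypothetical 4-microphone line array symmetric about its centre, a symmetric grid of steering angles, and the far-field delay model τ = x·sin θ / c with an assumed c = 343 m/s. Because τ changes sign when either the microphone position or the steering angle is mirrored, only about a quarter of the table needs to be stored. The names `build_reduced_table` and `lookup` are hypothetical; the paper's near-16-fold saving presumably relies on additional symmetries of its own array geometry.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def far_field_delay(x, theta):
    """Plane-wave arrival delay (s) at array position x (m) for steering angle theta (rad)."""
    return x * math.sin(theta) / SPEED_OF_SOUND

def build_reduced_table(positions, angles):
    """Store delays only for the first half of the mics and the upper half of the angles.

    Assumes positions[M-1-m] == -positions[m] and angles[K-1-k] == -angles[k].
    """
    M, K = len(positions), len(angles)
    return [[far_field_delay(positions[m], angles[k]) for k in range(K // 2, K)]
            for m in range((M + 1) // 2)]

def lookup(table, m, k, M, K):
    """Recover the delay for mic m and angle index k from the reduced table."""
    sign = 1.0
    if m >= (M + 1) // 2:   # mirrored microphone: delay changes sign
        m, sign = M - 1 - m, -sign
    if k < K // 2:          # mirrored steering angle: delay changes sign
        k, sign = K - 1 - k, -sign
    return sign * table[m][k - K // 2]

positions = [-0.15, -0.05, 0.05, 0.15]                  # hypothetical 4-mic line array (m)
angles = [math.radians(a) for a in range(-90, 91, 10)]  # 19 steering angles
table = build_reduced_table(positions, angles)
full_size = len(positions) * len(angles)    # 76 entries
reduced_size = len(table) * len(table[0])   # 20 entries
```

A practical system would usually store integer sample delays offset to be non-negative; the sign symmetry then becomes a reflection about that offset rather than a plain sign flip.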
P3-2 Introducing Waveform Distribution Moments for Audio Content Analysis—Henrik von Coler, SIM (Staatliches Institut für Musikforschung) - Berlin, Germany; Technical University of Berlin - Berlin, Germany
This paper introduces waveform distribution moments as features for audio content analysis. Moments and central moments of distributions are directly calculated from the squared waveform, in order to extract information on the energy development of a signal. The feature trajectories thus obtained promise to be applicable in transient detection, onset detection, and related tasks and are more sensitive to rapid changes than root mean square based methods, as a qualitative analysis reveals. An evaluation of the proposed features is presented in a feature ranking experiment related to transient detection and in an onset detection experiment. In both applications the waveform distribution moments show promising results in comparison to other signal descriptors.
Convention Paper 9032
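The abstract computes moments of the squared waveform to follow a signal's energy development. The Python sketch below is one plausible reading, assumed rather than taken from the paper: within a frame, the squared samples are normalized into a distribution over sample index, and its mean (temporal centroid), standard deviation (spread), and standardized third moment (skewness) are returned. An onset late in a frame pulls the centroid toward the end of the frame, a change that a single RMS value for the frame does not locate. `waveform_moments` is a hypothetical name.

```python
import math

def waveform_moments(frame):
    """Temporal centroid, spread, and skewness of the squared waveform in one frame.

    The squared samples are treated as a distribution over sample index n (assumed reading).
    """
    w = [x * x for x in frame]
    total = sum(w)
    if total == 0.0:
        raise ValueError("silent frame")
    centroid = sum(n * wn for n, wn in enumerate(w)) / total
    var = sum((n - centroid) ** 2 * wn for n, wn in enumerate(w)) / total
    spread = math.sqrt(var)
    third = sum((n - centroid) ** 3 * wn for n, wn in enumerate(w)) / total
    skew = third / spread ** 3 if spread > 0.0 else 0.0  # 0 by convention for a single impulse
    return centroid, spread, skew

steady = [1.0] * 100
onset = [0.0] * 70 + [1.0] * 30   # energy starts late in the frame
print(waveform_moments(steady)[0], waveform_moments(onset)[0])  # centroid moves later
```

Tracking these values frame by frame gives the feature trajectories the abstract refers to; a jump in the centroid from one frame to the next is a simple cue for a transient.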
P3-3 Efficient Low Frequency Echo Cancellation Using Sparse Adaptive FIR Filters—Alexis Favrot, Illusonic GmbH - Lausanne, Switzerland; Christof Faller, Illusonic GmbH - Uster, Switzerland
It is shown how finite impulse response (FIR) filtering and filter adaptation can be implemented with reduced computational complexity when applied to signals containing only low frequencies. A sparse adaptive filter (with only every Mth coefficient being non-zero) with a reduced adaptation rate achieves a result similar to that of a conventional adaptive filter, but with lower computational complexity. An echo control scheme based on a sparse adaptive filter is described: low-frequency echoes are cancelled, and echo suppression is then applied over all frequencies.
Convention Paper 9033
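The abstract does not name the adaptation rule, so the Python sketch below assumes NLMS. Only every `spacing`-th coefficient is non-zero, so each output sample costs `num_taps` multiply-adds instead of `num_taps * spacing`, and the coefficients are updated only every `update_every` samples to mimic a reduced adaptation rate. The demo identifies a hypothetical sparse echo path from white noise. The paper's point, that such a sparse filter suffices when the signals contain only low frequencies, is not exercised by this toy test. All names and values are illustrative.

```python
import random

def sparse_nlms(x, d, num_taps, spacing, mu=0.5, update_every=1, eps=1e-8):
    """NLMS adaptive FIR whose only non-zero taps sit at lags 0, spacing, 2*spacing, ...

    Returns the final weights and the error signal e[n] = d[n] - y[n].
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Tap inputs at the sparse lags only (zero before the signal starts).
        u = [x[n - k * spacing] if n >= k * spacing else 0.0 for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))
        e = d[n] - y
        errors.append(e)
        if n % update_every == 0:  # reduced adaptation rate
            norm = eps + sum(uk * uk for uk in u)
            w = [wk + mu * e * uk / norm for wk, uk in zip(w, u)]
    return w, errors

# Hypothetical echo path with non-zero taps only at lags 0, 4 and 8.
true_taps = [0.8, -0.3, 0.1]
spacing = 4
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
d = [sum(c * x[n - k * spacing] for k, c in enumerate(true_taps) if n >= k * spacing)
     for n in range(len(x))]
w, errors = sparse_nlms(x, d, num_taps=3, spacing=spacing, update_every=spacing)
```

Here three multiply-adds per sample model a 9-sample echo path, and adaptation runs on one sample in four.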
P3-4 Computationally-Efficient Speech Enhancement Algorithm for Binaural Hearing Aids—David Ayllón, University of Alcalá - Alcalá de Henares, Spain; Roberto Gil-Pita, University of Alcalá - Alcalá de Henares, Madrid, Spain; Manuel Rosa-Zurera, University of Alcalá - Alcalá de Henares, Madrid, Spain
The improvement of speech intelligibility in hearing aids is a complex and unsolved problem. The recent development of binaural hearing aids allows speech enhancement algorithms to exploit the benefits of binaural hearing. In this paper a novel source separation algorithm for binaural speech enhancement, based on supervised machine learning and time-frequency masking, is presented. The proposed algorithm requires less than 10% of the instructions available for signal processing in a state-of-the-art hearing aid and achieves good separation performance, in terms of W-disjoint orthogonality (WDO), at low SNR.
Convention Paper 9035
P3-5 Two-Stage Impulsive Noise Detection Using Inter-frame Correlation and Hidden Markov Model for Audio Restoration—Kwang Myung Jeon, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Dong Yun Lee, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Nam In Park, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Myung Kyu Choi, Samsung Electronics - Gyeonggi-do, Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea
In this paper a two-stage impulsive noise detection method is proposed to improve the quality of audio signals distorted by impulsive noise. To reduce false alarms and missed detections, the proposed method first detects whether a frame includes onsets on the basis of inter-frame correlation. Next, hidden Markov model-based maximum likelihood classification is carried out to decide whether each onset was caused by impulsive noise. Performance evaluation shows that the proposed method achieves higher detection accuracy than conventional residual-domain-based methods under various impulsive noise distributions.
Convention Paper 9036
P4 - Signal Processing—Part 2
Sunday, April 27, 13:30 — 16:30 (Room Paris)
Grzegorz Sikora, Bang & Olufsen Deutschland GmbH - Munich, Germany; McGill University - Montreal, Canada
P4-1 Creation of New Virtual Patterns for Emotion Recognition through PSOLA—Inma Mohino-Herranz, Universidad de Alcalá - Alcalá de Henares, Madrid, Spain; Héctor A. Sánchez-Hevia, University of Alcalá - Alcalá de Henares, Madrid, Spain; Roberto Gil-Pita, University of Alcalá - Alcalá de Henares, Madrid, Spain; Manuel Rosa-Zurera, University of Alcalá - Alcalá de Henares, Madrid, Spain
Human emotions can be recognized through speech analysis. One main problem in this discipline is the lack of databases with a sufficient number of patterns for proper training, which makes generalization in the learning process more difficult. One possible solution is to create new virtual patterns that enlarge the training set. To carry out this enlargement, we modify the average pitch using Pitch Synchronous Overlap and Add (PSOLA) combined with resampling, which allows the average pitch to be changed without altering either the pitch variations or the speech rate. The emotion in the utterance is therefore left unaltered. Results on the original test set show that the proposed creation of new virtual training patterns can significantly reduce generalization error.
Convention Paper 9037
P4-2 Extended Subtractive Synthesis of Harmonic Musical Tones—Rémi Mignot, Aalto University - Espoo, Finland; Vesa Välimäki, Aalto University - Espoo, Finland
A new approach is presented for the digital analysis-synthesis of musical tones. Based on the source-filter principle, Extended Subtractive Synthesis consists, in broad terms, of the real-time filtering of a source signal by a digital filter. Starting from recorded notes of a given instrument, the time-varying fundamental frequency and the digital filters are jointly analyzed. A first key point of this work is the use of new advanced tools for filter identification, which allow a relatively low-order approximation of the spectral envelopes with a perceptually based criterion. Second, we propose a particular filter chain for the separated sine and noise parts, which significantly reduces the simulation cost in the case of polyphonic synthesis and facilitates time-varying filtering.
Convention Paper 9038
P4-3 A Virtual Acoustic Environment for Automated Parameter Optimization of a Multichannel Downmix Algorithm—Fabian Knappe, Hamburg University of Applied Sciences - Hamburg, Germany; Robert Mores, Hamburg University of Applied Sciences - Hamburg, Germany; Christian Hartmann, Institut für Rundfunktechnik - Munich, Germany
This paper presents an environment for automated parameter optimization of a multichannel downmix algorithm. Manual optimization of multiple parameters in audio signal algorithms is likely to deliver poor results, especially if many parameters interact with each other. Even professionals struggle to adjust all the parameters correctly. At the same time, broadcast environments demand automated and efficient handling. This paper approaches automated optimization of a 5.0 to 2.0 channel downmix algorithm by defining a virtual acoustic environment and using an optimization process based on the Levenberg-Marquardt algorithm. The aim of the study is to determine recommendations for parameterizing the downmix algorithm that enable mixing engineers to exploit the algorithm's potential without knowledge of all the parameters' dependencies. A listening test validates the results across various genres.
Convention Paper 9039
P4-4 An Approach to the Generation of Subharmonic Frequencies in Audio Applications—Dieter Leckschat, University of Applied Science Düsseldorf - Düsseldorf, Germany; Christian Epe, University of Applied Sciences Düsseldorf - Duesseldorf, Germany
In recording studios it is common to use equipment and algorithms to enhance audio productions in the low-frequency range. Today's methods use either frequency-selective dynamic compression or psychoacoustics, taking advantage of the residuum effect. The subject of this paper is a method to generate subharmonics of an audio signal. The most interesting subharmonic lies one octave below a signal's fundamental frequency. By implementing a mathematical formula it is possible to produce an oscillation at half the frequency of a given harmonic oscillation. The method works in the analog or digital domain and operates instantaneously, which makes it suitable for real-time use by musicians. Depending on the design, the process can be optimized for stationary signals or for signals with transient components.
Convention Paper 9040
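The abstract does not give its formula. One textbook possibility, offered here purely as an illustration and not as the authors' method, is the half-angle identity: for a unit-amplitude tone x = cos θ, |cos(θ/2)| = sqrt((1 + x)/2), and the correct sign is recovered by flipping the output polarity each time x passes a minimum (where θ crosses an odd multiple of π). The Python sketch works sample by sample, so it is instantaneous, but it assumes a clean, unit-amplitude sinusoid; real program material would need amplitude normalization and more robust minimum detection. `half_frequency` is a hypothetical name.

```python
import math

def half_frequency(x):
    """Octave-down a unit-amplitude cosine sample by sample via the half-angle identity.

    Assumes |x| <= 1 and a single clean sinusoid (illustrative only).
    """
    out = []
    sign = 1.0
    falling = False
    prev = None
    for v in x:
        if prev is not None:
            if v < prev:
                falling = True
            elif v > prev and falling:
                sign = -sign      # just passed a minimum (x near -1): cos(theta/2) changes sign
                falling = False
        magnitude = math.sqrt(max(0.0, (1.0 + v) / 2.0))  # |cos(theta/2)|
        out.append(sign * magnitude)
        prev = v
    return out

fs, f0 = 48000.0, 480.0
tone = [math.cos(2 * math.pi * f0 * n / fs) for n in range(1000)]
sub = half_frequency(tone)  # should track cos(pi * f0 * n / fs), i.e. a 240 Hz tone
```

The sign flip happens where the output magnitude is close to zero, so a one-sample detection lag causes only a negligible error.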
P4-5 True Peak Metering—A Tutorial Review—Ian Dash, Consultant - Marrickville, NSW, Australia
Along with the loudness algorithm, ITU-R Recommendation BS.1770 specifies a true-peak metering method using oversampling and interpolation. The need for such metering is discussed, along with considerations on its implementation and usage. Implementation issues include the oversampling factor, the tradeoff between accuracy and processor load, and the proportion of total processor load when combined with loudness measurement. Four sources of error are examined: timing of interpolated samples, ripple in the passband response of the interpolation filter, incomplete alias suppression in the stopband response of the filter, and departure from linear-phase response. Implications of filter topology and filter order are discussed. An example implementation is given, along with its performance parameters.
Convention Paper 9041
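BS.1770 defines its own interpolation filter, and its coefficients are not reproduced here. As a stand-in, the Python sketch below uses a Hann-windowed sinc interpolator with an assumed half-width of 16 taps to oversample by 4 and report the largest absolute interpolated value. The test tone, a sine at a quarter of the sampling rate with a 45° phase offset, has sample peaks of only about 0.707 (about -3 dBFS) while its true peak is 1.0, which shows why a sample-peak meter under-reads. Function names are hypothetical.

```python
import math

def interpolate(x, t, half_width=16):
    """Hann-windowed sinc interpolation of sequence x at fractional index t."""
    n0 = math.floor(t)
    acc = 0.0
    for k in range(n0 - half_width + 1, n0 + half_width + 1):
        if 0 <= k < len(x):
            d = t - k
            sinc = 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)
            window = 0.5 + 0.5 * math.cos(math.pi * d / half_width)  # Hann taper over |d| < half_width
            acc += x[k] * sinc * window
    return acc

def true_peak(x, oversample=4, half_width=16):
    """Largest |value| of x oversampled by `oversample`.

    Edges where the kernel would be truncated are skipped; a streaming meter
    would instead keep the previous samples as history.
    """
    peak = 0.0
    for n in range(half_width, len(x) - half_width):
        for j in range(oversample):
            peak = max(peak, abs(interpolate(x, n + j / oversample, half_width)))
    return peak

tone = [math.sin(math.pi * n / 2 + math.pi / 4) for n in range(128)]
sample_peak = max(abs(v) for v in tone)  # about 0.707
tp = true_peak(tone)                     # close to 1.0
tp_dbtp = 20 * math.log10(tp)            # close to 0 dBTP
```

The window length trades accuracy against processor load, one of the implementation issues the abstract lists: a shorter kernel is cheaper but adds passband ripple and leaves more aliasing.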
P4-6 Drift and Wow Correction of Analogue Magnetic Tape Recordings in the Analogue Domain Using HF-Bias Signals—Nadja Wallaszkovits, Phonogrammarchiv, Austrian Academy of Science - Vienna, Austria; Tobias Hetzer, University of Applied Sciences FH Technikum Wien - Vienna, Austria; Heinrich Pichler, Audio Consultant - Vienna, Austria
Building on various existing ideas of using the high-frequency (HF) bias signal of analogue magnetic tape recordings as a reference signal for irregular speed deviations, this paper discusses an approach that pre-processes the bias signal in the analogue domain so that it can control the playback speed of the replay machine in a servo loop. To this end, the bias signal is captured via a sensor head and specifically pre-processed to match the reference frequency of the capstan motor of the tape machine. The machine is set to external vari-speed control mode, so deviations of the bias signal act as the external vari-speed reference, allowing automatic speed correction in real time. The paper discusses the possibilities, problems, and limits of the technical implementation of such a prototype.
Convention Paper 9042
P5 - Perception/Spatial Audio/Room Acoustics
Sunday, April 27, 15:00 — 16:30 (Foyer)
P5-1 Elicitation and Objective Grading of Punch within Produced Music—Steven Fenton, University of Huddersfield - Huddersfield, West Yorkshire, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK; Jonathan Wakefield, University of Huddersfield - Huddersfield, UK
This paper details the results of an investigation into the objective grading of punch within a complex musical signal. Punch is a subjective term often used to characterize music or sound sources that exhibit a sense of dynamic power or weight to the listener. In a novel reverse elicitation process, experts were asked to create audio samples that they perceived as having punch, using a multi-band wave shaping process. Expert listeners then graded the generated punchy audio samples in a controlled listening test. Statistical analysis identified correlations between Mean Subject Scores and the parameters used to create the punchy audio samples, suggesting that an algorithm could be developed to evaluate punch in produced music objectively.
Convention Paper 9043
P5-2 The Subjective Effect of BRIR Length on Perceived Headphone Sound Externalization and Tonal Coloration—Ryan Crawford-Emery, University of Huddersfield - Huddersfield, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
Binaural room impulse responses (BRIRs) of various lengths were convolved with stereophonic audio signals. Listening tests were conducted to assess how the length of the BRIRs affected the perceived externalization and tonal coloration of the audio. The results showed statistically significant correlations between BRIR length and both externalization and tonal coloration. Conclusions are drawn from these results, together with supporting reasoning, a critical evaluation, and suggestions for further work. The experiment provides the basis for further development of an effective and efficient externalization algorithm.
Convention Paper 9044
P5-3 3-D Audio Object Rendering into 5.1 Surround System—Kangeun Lee, Samsung Electronics - Yongin-si, Gyeonggi-do, Korea; Seokhwan Jo, Samsung Electronics - Yongin-si, Gyeonggi-do, Korea; Do-Hyung Kim, Samsung Electronics - Suwon-si, Gyeonggi-do, Korea
Following the recent trend of employing UHD video for increased realism, audio object-based representation is one of the candidates for a UHD audio format. This paper is concerned with an effective method for rendering audio objects to a conventional 5.1 surround system. To represent 3-D objects on the upper hemisphere around the listener, the proposed system introduces object localization and virtualization of height speakers. Each object is mapped to a 10.1-channel layout using object localization, and the 10.1 channels are then rendered to the 5.1 surround layout by virtualization based on a mixed HRTF and VBAP structure. Subjective impressiveness was compared with that of a 19.1 loudspeaker system, and the proposed system demonstrated almost the same localization performance in the horizontal and vertical planes. The proposed system is therefore capable of delivering sound movement effects to listeners over a conventional surround system.
Convention Paper 9045
P5-4 Elevation Localization Response Accuracy on Vertical Planes of Differing Azimuth—Tommy Ashby, University of Surrey - Guildford, Surrey, UK; Russell Mason, University of Surrey - Guildford, Surrey, UK; Tim Brookes, University of Surrey - Guildford, Surrey, UK
Head movement has been shown to significantly improve localization response accuracy in elevation. It is unclear from previous research whether this is due to static cues created once the head has reached a new stationary position or to dynamic cues created by the act of moving the head. In this experiment, listeners were asked to report the location of loudspeakers placed on vertical planes at four different azimuth angles (0°, 36°, 72°, 108°) with no head movement. Static elevation responses were significantly more accurate for sources away from the median plane. This finding, combined with the observation that listeners orient to face the source when localizing, suggests that dynamic cues are the cause of improved localization through head movement.
Convention Paper 9046 (Purchase now)
P5-5 A New Method for the Determination of Acoustically Good Room Dimension Ratios—John Sarris, Aretaieio University Hospital - Athens, Greece
A new method for the determination of acoustically good room dimension ratios is presented. The method is based on a metric, the variation of mean pressure, defined as the variation of the mean levels of the sound pressure distribution within a room over a frequency range. This metric quantifies the overall sound pressure variation within the room and is representative of the evenness of the frequency response among the various listening positions. Simulation results are presented for a small and a larger room, in which the new index is used to draw maps from which appropriate room proportions can be chosen.
Convention Paper 9047 (Purchase now)
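The "variation of mean pressure" metric itself is not defined in enough detail here to reproduce. As background only, and not the author's method, the sketch below computes the standard modal frequencies of a rigid-walled rectangular room, f = (c/2)·sqrt((nx/Lx)² + (ny/Ly)² + (nz/Lz)²). The modal distribution is what any room-ratio criterion ultimately depends on. The function name, the 343 m/s default, and the example dimensions are assumptions.

```python
import itertools
import math

def room_modes(lx, ly, lz, f_max, c=343.0):
    """Sorted mode frequencies (Hz) below f_max for a rigid-walled rectangular room.

    lx, ly, lz are the room dimensions in metres; c is the speed of sound in m/s.
    """
    # Upper bound on each index so that the axial mode along that dimension is <= f_max.
    n_max = [int(2 * f_max * dim / c) + 1 for dim in (lx, ly, lz)]
    modes = []
    for nx, ny, nz in itertools.product(*(range(n + 1) for n in n_max)):
        if nx == ny == nz == 0:
            continue  # (0, 0, 0) is not a mode
        f = (c / 2) * math.sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)
        if f <= f_max:
            modes.append(f)
    return sorted(modes)
```

For a hypothetical 5 m × 4 m × 3 m room, the lowest mode is the first axial mode along the 5 m dimension, at 343 / (2 × 5) = 34.3 Hz.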
P5-6 A Novel Approach for Prototype Extraction in a Multipoint Equalization Procedure—Stefania Cecchi, Università Politecnica delle Marche - Ancona, Italy; Laura Romoli, Università Politecnica delle Marche - Ancona, Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy; Balázs Bank, Budapest University of Technology and Economics - Budapest, Hungary; Alberto Carini, University of Urbino "Carlo Bo" - Urbino, Italy
Multipoint equalization is a useful procedure for enlarging the equalized zone in sound reproduction systems. It works by measuring the room impulse responses at multiple locations and deriving a prototype function capable of representing the real environment. This paper introduces a novel prototype function derived by combining quasi-anechoic impulse responses with the impulse responses recorded in the real environment to be equalized. This is motivated by the fact that, at mid and high frequencies, timbre perception and localization are dominated by the direct sound. The measurable but mostly inaudible magnitude deviations due to reflections should therefore not be equalized. Several experiments have been conducted in a real environment to validate the proposed approach, reporting objective and subjective measurements in comparison with the state of the art.
Convention Paper 9048 (Purchase now)
P5-7 Implementation of a Binaural Localization Algorithm in Hearing Aids: Specifications and Achievable Solutions—Gilles Courtois, Swiss Federal Institute of Technology (EPFL) - Lausanne, Switzerland; Patrick Marmaroli, Swiss Federal Institute of Technology (EPFL) - Lausanne, Switzerland; Morten Lindberg, 2L (Lindberg Lyd AS) - Oslo, Norway; Yves Oesch, Phonak Communications AG - Murten, Switzerland; William Balande, Phonak Communications AG - Murten, Switzerland
This paper introduces the constraints and issues related to implementing a binaural localization algorithm on a pair of hearing aids. Such an algorithm should improve the rendering of the spatial information available in the audio signals. This information is usually distorted by the signal processing algorithms in hearing devices, which degrades localization cues. First, several reported algorithms achieving binaural sound localization in the frontal horizontal plane are reviewed. The paper then discusses how the standard methods and processes could be used in the context of hearing aids. Finally, a solution suitable for a certain type of system is proposed.
Convention Paper 9034 (Purchase now)
P6 - Room Acoustics
Sunday, April 27, 16:30 — 18:30 (Room Paris)
Ben Kok, BEN KOK - acoustic consulting - Uden, The Netherlands
P6-1 Infrasound in Vehicles—Theory, Measurement, and Analysis—John Vanderkooy, University of Waterloo - Waterloo, ON, Canada
Infrasound (IS) in cars is quite strong and may be responsible for health effects. This paper presents measurements and simplified mechanisms for the production of IS in vehicles. Four mechanisms are proposed:

(1) turbulence from the moving vehicle or other traffic, entering through the vents;
(2) flexing of the body, causing volume changes;
(3) acceleration of the vehicle, causing an inertial reaction of the enclosed and external air;
(4) pressure variations due to altitude changes.

The analysis of the acoustic pressure from these mechanisms is simplified by the fact that IS wavelengths are much larger than the vehicle. Measurements are presented and analyzed to elucidate the acoustic contribution of each mechanism.
Convention Paper 9049 (Purchase now)
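Two quantities in the abstract can be checked by hand. The first is why IS wavelengths dwarf a car. The second is how large mechanism (4) can be. The sketch below uses λ = c/f and the hydrostatic relation dp/dz = -ρg. It is an illustration under assumed round values (343 m/s, 1.2 kg/m³), not the author's analysis.

```python
SPEED_OF_SOUND = 343.0  # m/s, air at about 20 degrees C (assumed)
AIR_DENSITY = 1.2       # kg/m^3 (assumed)
G = 9.81                # m/s^2

def wavelength_m(freq_hz, c=SPEED_OF_SOUND):
    """Acoustic wavelength in metres."""
    return c / freq_hz

def altitude_pressure_rate_pa_per_s(vertical_speed_m_s, rho=AIR_DENSITY):
    """Rate of static-pressure change while climbing, from dp/dz = -rho * g."""
    return -rho * G * vertical_speed_m_s
```

At 10 Hz the wavelength is about 34 m, several times the length of a typical car. This is why the cabin can be treated as a lumped volume. A climb of 1 m/s lowers the static pressure by roughly 11.8 Pa every second.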
P6-2 Open Plan Office Acoustics and Computer Modeling: Theory versus Practice—Lise W. Tjellesen, Applied Acoustic Design (AAD) - Ataines, UK
The acoustics of open plan offices, and of offices in general, have long been the subject of numerous studies looking at privacy levels and speech intelligibility. Acoustic consultants deal with office design on a daily basis. Various forms of guidance, such as national and international standards, building regulations, or general codes of practice, are normally used as references when carrying out the design. In more complex office designs, especially open plan offices, computer modeling is increasingly used as an integral and important tool alongside experience accumulated from measurements. This paper explores the problems and challenges of using computer modeling in connection with open plan office design.
Convention Paper 9050 (Purchase now)
P6-3 On Low and Mid Frequencies Sound Absorption Characteristics of Porous Materials—Elena Prokofieva, Edinburgh Napier University - Edinburgh, UK
Porous materials are frequently used in sound insulation applications. In building acoustics they are installed inside separating constructions to absorb unwanted mid and high frequencies propagating through them. Low frequency noise transferred through separating partitions is not covered by UK Building Regulations. Occupants nevertheless consider it a nuisance, and it could and should be addressed. A laboratory study was conducted to investigate how various types of porous materials that may be used as partition fillings affect sound absorption at low and mid frequencies. The results suggest that open pore materials could improve absorption in the low frequency range while having no detrimental effect on mid frequency absorption.
Convention Paper 9051 (Purchase now)
P6-4 Optimizing Microphone Placement and Formats for Assistive Listening System Sound Pick-Up—Peter Mapp, Peter Mapp Associates - Colchester, Essex, UK
Approximately 10–14% of the general population (USA and Northern Europe) suffer from a noticeable degree of hearing loss and would benefit from some form of hearing assistance or deaf aid. However, many assistive listening systems (ALS) do not provide the benefit that they should, because their poor acoustic performance often lets them down. The paper investigates the acoustic and speech intelligibility requirements for ALS performance. It also examines a number of microphone pick-up scenarios in terms of their potential intelligibility and sound quality performance. The results of tests carried out in a number of rooms and venues are presented, mainly in terms of the resulting Speech Transmission Index (STI) measurements. The paper concludes with a number of recommendations and “rules of thumb” for successful microphone placement and testing.
Convention Paper 9052 (Purchase now)
P7 - Transducers—Part 1: Loudspeakers
Monday, April 28, 09:00 — 13:00 (Room Paris)
Markus Koch, Bang & Olufsen Deutschland GmbH - Pullach, Germany
P7-1 Small-Signal Loudspeaker Impedance Emulator—Niels Elkjær Iversen, Technical University of Denmark - Kongens Lyngby, Denmark; Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark
The performance of audio amplifiers is typically specified by playing sine waves into a purely ohmic load. However, real loudspeaker impedances are not purely ohmic. They are characterized by the mechanical resonance between the mass of the diaphragm and the compliance of its suspension, which vary from driver to driver. A loudspeaker emulator whose impedance can be adjusted to match a given driver is therefore needed for measurement purposes. This paper proposes a loudspeaker emulator circuit for small signals. Simulations and experimental results are compared. They show that the loudspeaker impedance can be emulated with an electric circuit and that its resonance frequency can be changed by tuning two resistors.
Convention Paper 9053 (Purchase now)
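The impedance the emulator has to reproduce is commonly described by the textbook lumped model. In this model, the electrical resistance Re and inductance Le are in series with the motional impedance, which is a parallel R-L-C network reflected through the force factor Bl. The resonance lies at fs = 1/(2π·sqrt(Mms·Cms)). The sketch below implements that standard model, not the paper's emulator circuit. All parameter values used with it are hypothetical.

```python
import math

def voice_coil_impedance(f, re, le, bl, mms, cms, rms):
    """Small-signal electrical input impedance (complex, ohms) of a driver.

    Re + jwLe in series with the motional impedance, modeled as a parallel
    R-L-C (Res, Lces, Cmes) reflected from the mechanical side through Bl.
    """
    w = 2 * math.pi * f
    res = bl ** 2 / rms   # mechanical resistance reflected as an electrical resistance
    lces = bl ** 2 * cms  # suspension compliance reflected as an inductance
    cmes = mms / bl ** 2  # moving mass reflected as a capacitance
    y_mot = 1 / res + 1 / (1j * w * lces) + 1j * w * cmes  # parallel admittance
    return re + 1j * w * le + 1 / y_mot

def resonance_frequency(mms, cms):
    """Mechanical resonance fs = 1 / (2*pi*sqrt(Mms*Cms)), in Hz."""
    return 1 / (2 * math.pi * math.sqrt(mms * cms))
```

At fs the reactive branches cancel. With Le = 0, the impedance magnitude peaks at Re + Bl²/Rms. This peak is the feature an emulator has to place at the right frequency.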
P7-2 Dynamic Measurement of Loudspeaker Suspension Parameters Using an Active Harmonic Control Technique—Antonin Novak, Orkidia Audio - Ascain, France; Université du Maine, UMR CNRS 6613 - Le Mans, France; Pierrick Lotton, Université du Maine, UMR CNRS 6613 - Le Mans, France; Laurent Simon, Université du Maine, UMR CNRS 6613 - Le Mans, France
A new nondestructive technique to measure the nonlinear suspension parameters (stiffness Kms and mechanical resistance Rms) of a loudspeaker using an active harmonic control technique is presented. The goal of the active harmonic control is to eliminate the higher harmonics from the displacement signal so that a purely harmonic motion of the diaphragm is ensured. The nonlinear stiffness Kms is then measured as a function of instantaneous and peak displacement; the mechanical resistance Rms is measured as a function of velocity. A frequency dependence of these parameters is also discussed.
Convention Paper 9054 (Purchase now)
P7-3 Auralization of Signal Distortion in Audio Systems Part 2: Transducer Modeling—Wolfgang Klippel, Klippel GmbH - Dresden, Germany
A new method is presented for the auralization of selected distortion components generated by regular nonlinearities inherent in loudspeaker systems. In contrast to the generic approach presented in Part 1, the approach presented here exploits the results of lumped-parameter modeling in the state space. A mixing device generates a virtual output signal in which the nonlinear distortion is attenuated or enhanced by a user-defined scaling factor. The auralization output can be used for systematic listening tests or perceptual modeling. Such tests can determine audibility thresholds and assess the impact of the dominant loudspeaker nonlinearities on sound quality.
Convention Paper 9055 (Purchase now)
P7-4 Quantifying Acoustic Measurement Tolerances and Their Importance in the Loudspeaker Supply Chain—Peter John Chapman, Bang & Olufsen a/s - Struer, Denmark
Tolerances are attached to any type of measurement, and acoustical measurements are typically associated with relatively large tolerances. Despite this, measurement results are often quoted to a high degree of precision, and test limits are regularly set without consideration of the measurement tolerances involved. Quantifying measurement tolerances in manufacturing in general is well documented. The literature does not, however, describe how suitable analysis methods apply to acoustical measurements. The paper presents the consequences of measurement tolerances for classifying parts and describes the shortcomings of the Gauge R&R study. It also describes how to quantify a capable measurement system, including a simple method for quantifying acoustical measurement tolerances. This is particularly relevant to quality assurance in loudspeaker production. It relates strongly to the definition of test limits and loudspeaker specifications in the supply chain.
Convention Paper 9056 (Purchase now)
P7-5 An Investigation of Loudspeaker Simulation Efficiency and Accuracy Using a Conventional Model, a Near-to-Far-Field Transformation, and the Rayleigh Integral—Ulrik Skov, iCapture ApS - Roskilde, Copenhagen, Denmark; René Christensen, iCapture ApS - Roskilde, Denmark
Simulating loudspeaker drivers requires a conventional fully coupled vibroacoustic model. Such a model captures both the loading mass of the air on the moving parts and the geometric topology of the cone, dust cap, and surround. An accurate vibroacoustic model can be time-consuming to solve, especially in 3-D. In practice, this makes the decision to move on to the next simulation model slow and inefficient. To overcome this, the loudspeaker designer can use a near-to-far-field transformation. Alternatively, the designer can post-process structural-only results via the Rayleigh integral. Either approach reduces or totally eliminates the computationally demanding open-air domain in front of the speaker. These simplifications come at the cost of frequency-dependent inaccuracy. This paper compares the efficiency and accuracy of three models for three different drivers (totally flat, concave cone, and convex dome):

(1) a conventional fully coupled vibroacoustic model, in which the measurement point is included in the computational FEA domain;
(2) a reduced-air-domain model, in which the measurement point lies outside the computational FEA domain and is obtained by a near-to-far-field transformation;
(3) a model relying on structural-only Rayleigh integral post-processing.
Convention Paper 9057 (Purchase now)
P7-6 Mechanical Nonlinearities of Electrodynamic Loudspeakers: An Experimental Study—Balbine Maillou, Université du Maine - LAUM CNRS UMR 6613 - Le Mans, France; Pierrick Lotton, Université du Maine, UMR CNRS 6613 - Le Mans, France; Antonin Novak, Orkidia Audio - Ascain, France; Université du Maine, UMR CNRS 6613 - Le Mans, France; Laurent Simon, Université du Maine, UMR CNRS 6613 - Le Mans, France
Because of their assembly geometry and intrinsic materials, spider and surround suspensions are responsible for the viscoelastic and nonlinear behavior of loudspeakers. We propose a new dynamic experimental method to characterize these properties. We drive the loudspeaker moving part with a shaker and measure the driving force, acceleration, velocity, and displacement. Results are presented and discussed for a given loudspeaker whose surround suspension exhibits viscoelastic behavior.
Convention Paper 9058 (Purchase now)
P7-7 Active Loudspeaker Heat Protection—Stéphan Tassart, STMicroelectronics - Paris, France; Simon Valcin, STMicroelectronics - Grenoble, France; Michel Menu, STMicroelectronics - Grenoble, France
Loudspeakers accumulate heat during the transduction process. The rise in temperature is potentially harmful to the voice coil. When passive and mechanical dissipation schemes become inefficient, it must be countered by active heat control (AHC). Known AHC schemes limit the voice-coil temperature through a closed-loop approach. They may lead to oscillations and audio artifacts when temperature measurements are only available with latency. This paper establishes that an open-loop AHC ensures a bounded voice-coil temperature. It relies on a dynamic range compressor configured as a brick-wall limiter whose threshold is modulated by the temperature of the magnetic components. The temperature of the magnetic assembly can be estimated in real time by a linear quadratic observer (a Kalman filter). The driving force of the loudspeaker can be estimated by an envelope follower. The new AHC scheme is demonstrated and compared with closed-loop AHC on a simulation example.
Convention Paper 9059 (Purchase now)
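Two of the building blocks named above are generic: an envelope follower and a brick-wall limiter with a variable threshold. The sketch below shows them under loudly stated assumptions. The attack and release times and the linear temperature-to-threshold law (`limiter_threshold`, with its `t_ref_c` and `t_max_c` values) are hypothetical stand-ins, not the paper's design. The Kalman-filter temperature observer is omitted entirely.

```python
import math

def envelope_follower(x, fs, attack_s=0.001, release_s=0.1):
    """Peak envelope follower with separate attack and release time constants."""
    a_att = math.exp(-1.0 / (attack_s * fs))
    a_rel = math.exp(-1.0 / (release_s * fs))
    env, out = 0.0, []
    for sample in x:
        level = abs(sample)
        coeff = a_att if level > env else a_rel  # fast rise, slow decay
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out

def limiter_threshold(magnet_temp_c, t_ref_c=25.0, t_max_c=120.0, thr_ref=1.0):
    """HYPOTHETICAL threshold law: full threshold at t_ref_c, falling linearly to 0 at t_max_c."""
    frac = (t_max_c - magnet_temp_c) / (t_max_c - t_ref_c)
    return thr_ref * min(1.0, max(0.0, frac))

def limiter_gain(envelope, threshold):
    """Brick-wall limiter gain: unity below threshold, otherwise scale the envelope down to it."""
    if envelope <= threshold:
        return 1.0
    return threshold / envelope
```

Because the threshold only falls as the estimated magnet temperature rises, the limiter caps the drive level without waiting for a delayed temperature measurement. This is the open-loop property the abstract contrasts with closed-loop AHC.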
P7-8 A Novel Moving Magnet Linear Motor—Claudio Lastrucci, Powersoft S.p.a. - Scandicci (FI), Italy
The approach to electrical-to-acoustic conversion has not changed since the beginnings of electroacoustics. New electronic amplification technologies and the latest magnetic materials open the door to alternative methods of acoustic transduction. A new electrodynamic device has been developed that considerably improves electrical-to-acoustical conversion efficiency, sound quality, robustness, and power handling. It combines a fully balanced and symmetrical moving magnet motor design with an integrated anisotropic magnetic compound. The result is substantially better acceleration, linearity, and efficiency, providing additional degrees of freedom in high quality professional loudspeaker design.
Convention Paper 9060 (Purchase now)
P8 - Transducers—Part 2
Monday, April 28, 13:30 — 16:00 (Room Paris)
Hans Riekehof-Boehmer, SCHOEPS Mikrofone - Karlsruhe, Germany
P8-1 Ambient Atmospheric Conditions and Their Influence on Acoustic Measurements—Peter John Chapman, Bang & Olufsen a/s - Struer, Denmark
The paper describes an area that is often not considered by those involved in performing acoustic measurements. Specifically, the measured sound pressure level is directly influenced by the ambient atmospheric conditions in which the measurement is performed and by the raw condition of the device under test. The influence of temperature, atmospheric pressure, and humidity is described, and different strategies for removing these influences are presented. Furthermore, the consequences of ignoring these influences in the laboratory and on the production line are illustrated in terms of measurement error, falling yield, and misalignment of sensitivity in active systems. The paper focuses on indoor acoustic measurements, but the subject is equally relevant to outdoor measurements.
Convention Paper 9061 (Purchase now)
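To give a feel for the size of these effects, here is a rough sketch using textbook relations for dry air. These are the ideal-gas density ρ = p/(R·T) and the approximation c ≈ 331.3·sqrt(1 + T/273.15). Humidity is ignored. The level change is computed for a source whose far-field pressure scales with air density, such as a small monopole driven at constant volume velocity. That scaling assumption is ours, not necessarily the strategy used in the paper.

```python
import math

R_DRY_AIR = 287.05  # specific gas constant of dry air, J/(kg*K)

def air_density(temp_c, pressure_pa):
    """Density of dry air from the ideal gas law (humidity ignored), kg/m^3."""
    return pressure_pa / (R_DRY_AIR * (temp_c + 273.15))

def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air, m/s."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)

def level_change_db(temp_c, pressure_pa, ref_temp_c=20.0, ref_pressure_pa=101325.0):
    """Level change relative to reference conditions, ASSUMING radiated pressure is
    proportional to air density (e.g. a small monopole at constant volume velocity)."""
    rho = air_density(temp_c, pressure_pa)
    rho_ref = air_density(ref_temp_c, ref_pressure_pa)
    return 20.0 * math.log10(rho / rho_ref)
```

Under this assumption, measuring at 90 kPa (roughly 1000 m above sea level) instead of 101.325 kPa lowers the level by about 1 dB at the same temperature. This shift is large enough to matter against typical production test limits.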
P8-2 Numerical Simulation of Microphone Wind Noise, Part 3: Wind Screens and Shields—Juha Backman, Nokia Corporation - Espoo, Finland
This paper discusses the use of computational fluid dynamics (CFD) for the analysis of microphone wind noise. This final part of the study discusses the effect of typical wind shields and pop screens on the flow around the microphone body. All types of screens share common mechanisms of action: they move the vortices generated by the flow farther away from the microphone diaphragm and absorb the vortex energy. The simulations show, however, that the importance of these mechanisms varies with the type and size of the shield. In addition to the flow properties, the effect of different wind shields on the acoustical response of the microphone is also discussed.
Convention Paper 9062 (Purchase now)
P8-3 Graphene Microphone—Dejan Todorovic, Dirigent Acoustics Ltd. - Belgrade, Serbia; Iva Salom, Institute Mihajlo Pupin - Belgrade, Serbia; Djordje Jovanovic, Institute of Physics Belgrade, University of Belgrade - Belgrade, Serbia; Aleksandar Matkovic, University of Belgrade - Belgrade, Serbia; Marijana Milicevic, University of Belgrade - Belgrade, Serbia; Mirjana Radosavljevic, Dirigent Acoustics Ltd. - Belgrade, Serbia
This paper analyzes recent trends in graphene research and applications in acoustics and audio technology and attempts to identify the directions in which the field is likely to develop. The research of the graphene group focuses on the application of single- or multi-layer graphene as membranes in transducers. FEM and experimental analyses of single- and multi-layer graphene are in progress, as is the realization of the first samples of acoustic transducers.
Convention Paper 9063 (Purchase now)
P8-4 How Far Do Microphones Reach? A Comparison between Dynamic and Analog / Digital Condenser Microphones—Jürgen Breitlow, Georg Neumann Berlin - Berlin, Germany; Dominic Haul, Georg Neumann Berlin - Berlin, Germany
Recording engineers occasionally describe condenser microphones as “hearing too far.” A condenser microphone reproduces unwanted background noises, such as fans, clocks, and air conditioners, that are not heard when recording with a dynamic microphone. In the present study, the physical cause of this phenomenon is examined. The whole signal chain from the microphone to the recorder has to be considered. It must therefore be determined how the self-noise of the signal chain influences the masking effect and the perceptibility of quiet sources.
Convention Paper 9064 (Purchase now)
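The abstract attributes the audibility of quiet sources to the self-noise of the whole chain. As a hedged illustration of one standard step, and not the authors' method, uncorrelated noise contributions combine by power summation once they are referred to a common point (for example, as equivalent input SPL). The levels passed to the function are hypothetical.

```python
import math

def sum_uncorrelated_levels_db(levels_db):
    """Total level of uncorrelated noise sources, all referred to the same point (dB)."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))
```

Two equal contributions raise the total by about 3 dB. A contribution 20 dB below the dominant one adds less than 0.05 dB. This is why the quietest link in the chain sets the floor below which quiet sources vanish.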
P8-5 A Condenser Microphone for Close-Miking and Very High Sound Pressure Levels–Revisited—Martin Schneider, Georg Neumann GmbH - Berlin, Germany
In the late 1960s, tube condenser microphones were superseded by their transistorized counterparts. Close-miking was adopted for pop and rock music, so microphones had to be developed that could handle these very high SPLs while keeping the familiar sonic characteristics. One model, which has since become a classic in its own right, is described in detail, drawing on well-sorted archives and the original documents.
Convention Paper 9065 (Purchase now)
P9 - Audio Signal Processing/Transducers/Recording/Network Audio
Monday, April 28, 15:00 — 16:30 (Foyer)
P9-1 Adaptive Digital Oscillator for Virtual Acoustic Feedback—Leonardo Gabrielli, Università Politecnica delle Marche - Ancona, Italy; Marco Giobbi, Università Politecnica delle Marche - Ancona, Italy; Stefano Squartini, Università Politecnica delle Marche - Ancona, Italy; Vesa Välimäki, Aalto University - Espoo, Finland
In Virtual Acoustics research, the emulation of acoustic feedback, such as so-called guitar howling, has scarcely been addressed. This paper takes this peculiar effect as its starting point. It introduces a computational technique for emulating the effect and extending it to possible new Virtual Acoustics scenarios. A nonlinear digital oscillator is employed to emulate guitar feedback (or guitar howling). The oscillator runs in real time, with good stability properties and low computational cost. Its frequency is tuned by a pitch detection system that adaptively tracks pitch changes in real time. A real-time implementation of the algorithm has been developed in the Pure Data environment to provide guitar howling emulation.
Convention Paper 9066 (Purchase now)
P9-2 A Psychoacoustic-based Vocal Suppression for Enhanced Interactive Service Using Spatial Audio Object Coding—Tung Chin Lee, Yonsei University - Seoul, Korea; Young-cheol Park, Yonsei University - Wonju, Kwangwon-do, Korea; Dae Hee Youn, Yonsei University - Seoul, Korea
In this paper we present a new vocal suppression algorithm that can enhance the quality of music signals coded using Spatial Audio Object Coding (SAOC) in Karaoke mode. The residual vocal component in the coded music signal is estimated and suppressed using a spectral subtraction method. The level of the residual vocal component varies with the input object power. Exploiting this, we propose a psychoacoustic rule in which the suppression level is adapted according to auditory masking properties. Objective and subjective tests were performed, and the results confirm that the proposed algorithm offers improved quality.
Convention Paper 9067 (Purchase now)
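The core spectral subtraction step can be sketched per STFT frame as follows. The residual-vocal power estimate is subtracted from the mixture power, and the result is held above a spectral floor to limit musical noise. The over-subtraction factor `alpha` and floor `beta` are fixed, hypothetical parameters here. The paper instead adapts the suppression level to auditory masking, which this sketch does not attempt.

```python
import math

def spectral_subtract(mix_mag, vocal_mag_est, alpha=1.0, beta=0.01):
    """Power spectral subtraction on one STFT frame (per-bin magnitudes).

    alpha: over-subtraction factor (how aggressively the estimate is removed).
    beta:  spectral floor as a fraction of the mixture power, limiting
           musical-noise artifacts where the estimate exceeds the mixture.
    """
    out = []
    for x, v in zip(mix_mag, vocal_mag_est):
        power = x * x - alpha * v * v
        floor = beta * x * x
        out.append(math.sqrt(max(power, floor)))
    return out
```

A typical resynthesis reuses the mixture phase for each bin and then applies an inverse STFT with overlap-add. Bins where the vocal estimate dominates are clamped to the floor rather than zeroed.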
P9-3 Application of Common-Pole Parallel Filters to Nonlinear Models Based on Orthogonal Functions—Laura Romoli, Università Politecnica delle Marche - Ancona, Italy; Stefania Cecchi, Università Politecnica delle Marche - Ancona, Italy; Balázs Bank, Budapest University of Technology and Economics - Budapest, Hungary; Michele Gasparini, Università Politecnica delle Marche - Ancona, Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy
Various nonlinear models are used to model real-world devices. Among them, an effective technique for nonlinear system identification combines orthogonal nonlinear functions with frequency-domain adaptive filtering algorithms. This paper first demonstrates that the model is independent of the choice of orthogonal basis, complementing previously obtained results. It then presents a highly efficient model implementation that uses fixed-pole parallel filters for the linear filtering part. The efficiency comes both from common-pole modeling and from a warped filter design that takes into account the frequency resolution of human hearing. Experimental results demonstrate the effectiveness of the proposed approach and its suitability for real-time digital simulation of nonlinear audio devices.
Convention Paper 9068 (Purchase now)
P9-4 Multiphysic Modeling and Heuristic Optimization of Compression Driver Design—Michele Gasparini, Università Politecnica delle Marche - Ancona, Italy; Stefania Cecchi, Università Politecnica delle Marche - Ancona, Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy; Emiliano Capucci, Faital S.P.A. - Milan, Italy; Romolo Toppi, Faital S.P.A. - Milan, Italy
The use of finite element analysis is quite common in modern design techniques. Modeling saves time and effort, especially when complex phenomena have to be considered. A compression driver is an example of a product with a problematic design. The large number of variables and the different physics involved in the sound generation process make the direct solution of mathematical models nontrivial. This paper presents an algorithm that optimizes the design parameters of the driver through a procedure based on evolution strategies, taking advantage of accurate finite element simulation results. The method has been tested by optimizing a real compression driver, and the results are reported.
Convention Paper 9069 (Purchase now)
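The abstract does not say which evolution strategy is used. For orientation only, the simplest variant is shown below: a (1+1)-ES with the classic 1/5 success rule for step-size adaptation. In the paper each objective evaluation would be a finite element simulation of the driver. Here a cheap hypothetical quadratic stands in, and the adaptation factors (1.22, 0.82) are common textbook choices rather than values from the paper.

```python
import random

def one_plus_one_es(objective, x0, sigma=0.5, iterations=500, seed=0):
    """Minimize `objective` with a (1+1) evolution strategy and the 1/5 success rule."""
    rng = random.Random(seed)
    parent = list(x0)
    f_parent = objective(parent)
    successes = 0
    for k in range(1, iterations + 1):
        # Mutate every design parameter with Gaussian noise of width sigma.
        child = [xi + sigma * rng.gauss(0.0, 1.0) for xi in parent]
        f_child = objective(child)
        if f_child <= f_parent:  # keep the child only if it is no worse
            parent, f_parent = child, f_child
            successes += 1
        if k % 10 == 0:  # adapt the step size every 10 generations
            rate = successes / 10
            sigma *= 1.22 if rate > 0.2 else 0.82
            successes = 0
    return parent, f_parent
```

Because only one candidate is evaluated per generation, this variant suits objectives as expensive as a multiphysics FEM run. Population-based strategies trade more evaluations for robustness against local minima.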
P9-5 Properties of Gradient Loudspeakers—Sigmund Gudvangen, Buskerud and Vestfold University College - Kongsberg, Norway
The radiation pattern of a loudspeaker plays a crucial role in how the acoustic power is distributed in the room. There is mounting evidence that early reflections from the side walls are beneficial, while early reflections parallel to the sagittal plane appear to be less desirable. Gradient loudspeakers provide a means of producing unidirectional radiation patterns. Moreover, their radiation patterns are frequency-independent. In view of these very desirable properties, the characteristics of first-order gradient loudspeakers are analyzed. General expressions for sound pressure and particle velocity are derived, and the distortion of the radiation patterns in the high-frequency region is reviewed.
Convention Paper 9070 (Purchase now)
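For reference, the ideal first-order pattern behind the "unidirectional, frequency-independent" claim is D(θ) = a + (1 - a)·cos θ. The directivity index follows from the mean of D² over the sphere, a² + (1 - a)²/3. This sketch covers only the idealized pattern. The high-frequency deviations the paper reviews are not modeled.

```python
import math

def first_order_pattern(theta_deg, a):
    """Far-field directivity of an ideal first-order source: a + (1 - a) * cos(theta).

    a = 1 is omnidirectional, 0.5 cardioid, 0.25 hypercardioid, 0 dipole.
    """
    return a + (1.0 - a) * math.cos(math.radians(theta_deg))

def directivity_index_db(a):
    """Directivity index (dB) of the ideal first-order pattern."""
    # Mean of D^2 over the sphere: a^2 + (1 - a)^2 / 3 (mean cos = 0, mean cos^2 = 1/3).
    return 10.0 * math.log10(1.0 / (a * a + (1.0 - a) ** 2 / 3.0))
```

A cardioid (a = 0.5) has a null at 180° and a directivity index of about 4.8 dB. The hypercardioid (a = 0.25) reaches about 6.0 dB. Neither value depends on frequency in the ideal model.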
P9-6 A Guide to the Design and Evaluation of New User Interfaces for the Audio Industry—Christopher Dewey, University of Huddersfield - Huddersfield, UK; Jonathan Wakefield, University of Huddersfield - Huddersfield, UK
This paper starts from the viewpoint that the audio industry should take advantage of the possibilities offered by new visual and interactive interfaces in order to provide the best tools for audio tasks. Audio industry products have moved toward better displays in modern digital mixers and digital audio workstations. They have not, however, fully embraced the possibilities of current interface technology and remain largely traditional in interface design. For audio engineers to develop new visual and interactive audio products, an understanding of existing Human-Computer Interface (HCI) design and evaluation methodology is required. This paper presents a design and evaluation process that is tailored to audio industry product development and was used in developing a new EQ interface.
Convention Paper 9071 (Purchase now)
P9-7 An Open-Source Dynamic Networked Audio System—Michelle Daniels, University of California San Diego - La Jolla, CA, USA
This paper presents an open-source networked audio system for managed networks. It consists of a single Streaming Audio Manager (SAM) and an arbitrary number of clients that can be dynamically added to or removed from the system. Clients stream multichannel uncompressed audio to SAM using the Real-Time Transport Protocol (RTP). Within SAM, clients can be muted and soloed, and their volume can be adjusted. Additionally, client streams can be delayed to compensate for static differences in latency between audio and video playback in a multimedia environment. Basic SAM setups mix all incoming streams to a specified set of output channels. In advanced setups, however, SAM can send discrete outputs for each client to an external audio rendering system, which communicates with SAM using Open Sound Control (OSC). Third-party developers can create their own renderers for advanced audio processing. They can also implement user interfaces to remotely control and monitor SAM and its clients using additional OSC messages.
Convention Paper 9072 (Purchase now)
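Each RTP packet a client sends begins with the 12-byte fixed header defined in RFC 3550. The sketch below packs and parses that header with Python's `struct` module. It is generic RTP, not SAM's code. The abstract does not state SAM's payload format or payload type, so the dynamic payload type 96 used as a default here is an assumption.

```python
import struct

def rtp_header(seq, timestamp, ssrc, payload_type=96, marker=False):
    """Pack the 12-byte fixed RTP header of RFC 3550 (no padding, extension, or CSRCs)."""
    version = 2
    byte0 = version << 6  # padding = 0, extension = 0, CSRC count = 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def parse_rtp_header(packet):
    """Unpack the fields of a fixed 12-byte RTP header."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

For audio, the timestamp normally advances by the number of sample frames carried in each packet. The sequence number increments by one per packet and wraps at 65536, which is how a receiver detects loss and reordering.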
P10 - Human Factors
Monday, April 28, 16:30 — 18:00 (Room Paris)
Hyunkook Lee, University of Huddersfield - Huddersfield, UK
P10-1 A New Algorithm for Vocal Tract Shape Extraction from Singer's Waveforms—Rebecca Vos, University of York - Heslington, York, UK; Jamie Andrea Shyla Angus, University of Salford - Salford, Greater Manchester, UK; Brad H. Story, University of Arizona - Tucson, AZ, USA
This paper presents a new algorithm for extracting vocal tract shape from speech or singing. Based on acoustic sensitivity functions, it removes the ambiguity from which conventional methods suffer. We describe acoustic sensitivity functions and how we extract the necessary formant frequencies from the acoustic waveform. Results are presented for a variety of male and female singers singing a variety of vowels and notes. The results are good. Besides its applications in voice training, the system could also be used to control games or music synthesis.
Convention Paper 9073 (Purchase now)
P10-2 Participatory Amplitude Level Adjustment of Gesture Controlled Upper Body Garment Sound in Immersive Virtual Reality—Erik Sikström, Aalborg University Copenhagen - Copenhagen, Denmark; Morten Havemøller Laursen, Aalborg University Copenhagen - Copenhagen, Denmark; Kasper Søndergaard Pedersen, Aalborg University Copenhagen - Copenhagen, Denmark; Amalia de Götzen, Aalborg University Copenhagen - Copenhagen, Denmark; Stefania Serafin, Aalborg University Copenhagen - Copenhagen, Denmark
Gesture-controlled sounds from virtual clothes in immersive virtual environments are a relatively unexplored topic. This paper reports an experiment aimed at finding the range between the highest and the lowest acceptable amplitude levels for the sounds of an upper-body garment. Participants set these two amplitude levels for the virtual-clothing sounds generated by their own gestures. They did so relative to other sound sources with predefined levels (footsteps and ambient sounds). This task was performed while walking around in a virtual park area. The results yielded two dynamic ranges whose placement differed depending on whether the sound was initially presented at the loudest or the lowest possible level.
Convention Paper 9074 (Purchase now)
P10-3 Audio Information Mining – Pragmatic Review, Outlook, and a Universal Open Architecture—Philip J. Duncan, University of Salford - Salford, Greater Manchester, UK; Duraid Y. Mohammed, University of Salford - Salford, Greater Manchester, UK; Francis F. Li, University of Salford - Salford, Greater Manchester, UK
An immense amount of audio data with unspecified content is currently available. Its classification and the generation of metadata pose a significant and challenging research problem. We review past and current work in this field, specifically in the three principal areas of segmentation, feature extraction, and classification. We also give an overview and critical appraisal of the techniques currently in use. Major impediments to progress in the field have been specialism and the inability of classifiers to generalize. We therefore propose a nonexclusive, generalized open-architecture framework for classifying audio data. The framework accommodates third-party plugins and works with a multidimensional feature/descriptor space as input.
Convention Paper 9075 (Purchase now)
P11 - Spatial Audio
Tuesday, April 29, 09:00 — 12:30 (Room Paris)
Clemens Par, Swiss Audec - Morges, Switzerland
P11-1 Control of Frame Loudspeaker Array for 3-D Television—Akio Ando, University of Toyama - Toyama, Japan; Masafumi Fujii, University of Toyama - Toyama, Japan
To obtain stable sound localization on a TV display, a loudspeaker array mounted on the frame of the display may be a solution. However, the frequency response and the shape of the wave front reproduced by the array sometimes deteriorate. This is because wave field synthesis with Rayleigh integrals may not be effective when there are no secondary sources on the display itself. In this study we use the Rayleigh I integral to calculate the input signals of the loudspeakers. We then introduce weighting coefficients for these signals to alleviate the deterioration. Error functions are defined to quantify this deterioration and are minimized by simulated annealing. As a result, the frequency response and the wave front were improved regardless of the virtual source position.
Convention Paper 9076
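A minimal sketch for P11-1, illustrating only the optimization step: a generic simulated annealer that searches per-loudspeaker weighting coefficients to minimize an error function. The quadratic `toy_error` function, its `TARGET` weights, the cooling schedule, and every parameter value are hypothetical stand-ins; they are not the error functions defined in the paper, which measure frequency-response and wavefront deterioration.

```python
import math
import random


def simulated_annealing(error, x0, step=0.1, t0=1.0, cooling=0.995,
                        iterations=5000, seed=0):
    """Minimize `error` over a list of weights by simulated annealing.

    Generic sketch (all parameters are assumptions): perturb one weight per
    iteration, accept moves with the Metropolis rule, cool geometrically.
    """
    rng = random.Random(seed)
    x = list(x0)
    e = error(x)
    best_x, best_e = list(x), e
    t = t0
    for _ in range(iterations):
        # Perturb one randomly chosen weighting coefficient.
        candidate = list(x)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.gauss(0.0, step)
        e_new = error(candidate)
        # Always accept improvements; accept worse moves with probability exp(-dE/T).
        if e_new < e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = list(x), e
        t *= cooling  # geometric cooling schedule
    return best_x, best_e


# Hypothetical error function: squared distance from an arbitrary target weighting.
TARGET = [1.0, 0.8, 0.6, 0.8, 1.0]


def toy_error(w):
    return sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))


weights, err = simulated_annealing(toy_error, [0.5] * len(TARGET))
print([round(w, 2) for w in weights], round(err, 4))
```

The annealer tracks the best solution seen so far, so the returned error never exceeds that of the starting weights; in a real system `toy_error` would be replaced by a simulation of the reproduced sound field.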
P11-2 Ambidio: Sound Stage Width Extension for Internal Laptop Loudspeakers—Tsai-Yi Wu, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA; Ralph Glasgal, Ambiophonics Institute - Rockleigh, NJ, USA
This paper introduces a sound stage width extension method for internal laptop loudspeakers. Ambidio is a real-time application that enhances a stereo sound file playing on a laptop in order to provide a more immersive experience over the built-in loudspeakers. The method, based on Ambiophonics principles, is relatively robust to the listener's head position and requires no measured or synthesized HRTFs. The key novelty of the approach is a pre/post-processing algorithm that dynamically tracks the image spread and adapts it to the hardware setting in real time. Two detailed evaluations are provided to assess the robustness of the proposed method. Experimental results show that the average perceived stage width of Ambidio is 176° using internal speakers, while maintaining a relatively flat frequency response and achieving a higher user preference rating.
Convention Paper 9077
P11-3 On Spatial-Aliasing-Free Sound Field Reproduction using Infinite Line Source Arrays—Frank Schultz, University of Rostock / Institute of Communications Engineering - Rostock, Germany; Till Rettberg, University of Rostock - Rostock, Germany; Sascha Spors, University of Rostock - Rostock, Germany
Concert sound reinforcement systems aim to reproduce homogeneous sound fields over extended audience areas across the whole audio bandwidth. For the last two decades this has mostly been approached using so-called line source arrays, owing to their superior ability to produce homogeneous sound fields. In the literature, design and setup criteria for line source arrays have been derived under the name Wavefront Sculpture Technology. This paper approaches the problem from the viewpoint of a signal processing model for sound field synthesis. It will be shown that the optimal radiation of a line source array can be considered a special case of spatial-aliasing-free synthesis of a wavefront that propagates perpendicular to the array. At high frequencies the so-called waveguide operates as a spatial low-pass filter and therefore attenuates energy that would otherwise lead to spatial aliasing artifacts.
Convention Paper 9078
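A sketch for P11-3, using textbook uniform-linear-array theory rather than the paper's own signal processing model: for a discrete array of in-phase point sources with spacing d (a wavefront propagating perpendicular to the array), spatial aliasing appears as grating lobes at sin θ_m = mλ/d, which first become real radiating lobes when λ ≤ d. The spacing and frequencies below are illustrative assumptions, and real line-array elements with waveguides are not point sources.

```python
import math


def grating_lobe_angles(frequency_hz, spacing_m, c=343.0):
    """Grating-lobe angles (degrees from broadside) of an unsteered,
    uniformly spaced line array of point sources.

    Lobes satisfy sin(theta_m) = m * wavelength / spacing; orders with
    |sin(theta_m)| > 1 are evanescent and do not radiate.
    """
    wavelength = c / frequency_hz
    angles = []
    m = 1
    while m * wavelength / spacing_m <= 1.0:
        theta = math.degrees(math.asin(m * wavelength / spacing_m))
        angles.extend([-theta, theta])  # lobes appear symmetrically about broadside
        m += 1
    return sorted(angles)


def first_grating_lobe_frequency(spacing_m, c=343.0):
    """Lowest frequency at which a grating lobe appears (at endfire, wavelength == spacing)."""
    return c / spacing_m


# Hypothetical 10 cm element spacing.
print(grating_lobe_angles(1000, 0.1))       # no grating lobes: wavelength > spacing
print(grating_lobe_angles(5000, 0.1))       # one symmetric pair of grating lobes
print(first_grating_lobe_frequency(0.1))    # about 3430 Hz
```

Above `first_grating_lobe_frequency`, a plain discrete array radiates energy in unintended directions; this is the energy that, according to the abstract, the waveguide's spatial low-pass behavior attenuates.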
P11-4 2-D to 3-D Upmixing Based on Perceptual Band Allocation (PBA)—Hyunkook Lee, University of Huddersfield - Huddersfield, UK
Listening tests were carried out to evaluate the performance of a 2-D to 3-D ambience upmixing technique based on “Perceptual Band Allocation (PBA),” a novel vertical image extension method. Five-channel recordings were made in a concert hall with a 3-channel frontal microphone array and a 4-channel ambience array. The 4-channel ambience signals were low- and high-pass filtered at three different crossover frequencies: 0.5, 1, and 4 kHz. For 2-D to 3-D upmixing, the low-passed signals were routed to the corresponding lower-layer loudspeakers and the high-passed signals to the upper-layer loudspeakers of a 9-channel Auro3D-inspired setup. Results suggested that, depending on the crossover frequency, the proposed method produced a perceived 3-D listener envelopment similar to or greater than that of an original 9-channel ambience recording as well as that of the original 5-channel recording.
Convention Paper 9079
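A minimal sketch of the PBA routing step described for P11-4: each ambience channel is split at a crossover frequency, the low band feeding a lower-layer loudspeaker and the high band the corresponding upper-layer loudspeaker. The abstract does not state the filter type, so the first-order complementary split here (high band = input minus low band, which sums back to the input exactly) is an assumption chosen for brevity; the function names are hypothetical and the paper's filters and slopes may differ.

```python
import math


def one_pole_lowpass(x, cutoff_hz, fs):
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for sample in x:
        state += a * (sample - state)
        y.append(state)
    return y


def pba_split(ambience_channel, crossover_hz, fs=48000):
    """Split one ambience channel into (lower_layer_feed, upper_layer_feed).

    The high band is the complement of the low band, so the two feeds
    sum back to the original signal (assumed filter design).
    """
    low = one_pole_lowpass(ambience_channel, crossover_hz, fs)
    high = [x - l for x, l in zip(ambience_channel, low)]
    return low, high


fs = 48000
crossover = 1000  # the paper tested 500, 1000, and 4000 Hz crossovers
# Hypothetical test signal standing in for one ambience channel: a 100 Hz tone.
signal = [math.sin(2 * math.pi * 100 * n / fs) for n in range(4800)]
lower_feed, upper_feed = pba_split(signal, crossover, fs)
```

In a full upmix the split would be applied to each of the four ambience channels, with each pair of feeds routed to vertically aligned lower- and upper-layer loudspeakers.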
P11-5 Customization of Head-Related Impulse Response Via Two-Dimension Common Factor Decomposition and Sampled Measurements—Zhixin Wang, City University of Hong Kong - Kowloon, Hong Kong; Cheung Fat Chan, City University of Hong Kong - Kowloon, Hong Kong
A method based on subject-dependent impulse response extraction is proposed for customizing head-related impulse responses. In the training step, a two-dimension common factor decomposition algorithm is applied to train a set of direction-dependent impulse responses that are common to all subjects. At the same time, a subject-dependent impulse response is extracted for each subject to capture the subject-dependent information. In the customization step, the subject-dependent impulse response of a target subject is extracted from several head-related impulse response measurements of that subject. The extracted subject-dependent impulse response is then convolved with the trained direction-dependent impulse responses to construct all head-related impulse responses for the target subject. It is shown that, with head-related impulse responses measured at a few directions for a target subject, head-related impulse responses at all trained directions can be customized with fairly low distortion.
Convention Paper 9080
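A sketch of only the final construction step of P11-5: the subject-dependent impulse response is convolved with each trained direction-dependent impulse response to obtain an HRIR for every trained direction. The training (common factor decomposition) and extraction steps are not shown. The short impulse responses and the (azimuth, elevation) keys below are hypothetical placeholders, not data from the paper.

```python
def convolve(a, b):
    """Full linear convolution of two finite impulse responses."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def customize_hrirs(subject_ir, direction_irs):
    """Build one HRIR per trained direction by convolving the subject-dependent
    IR with each direction-dependent IR (construction step only)."""
    return {direction: convolve(subject_ir, ir) for direction, ir in direction_irs.items()}


# Hypothetical trained direction-dependent IRs, keyed by (azimuth, elevation) in degrees.
direction_irs = {
    (0, 0): [1.0, 0.5],
    (90, 0): [0.2, 0.7, 0.1],
}
# Hypothetical subject-dependent IR extracted for a target subject.
subject_ir = [1.0, -0.3]

hrirs = customize_hrirs(subject_ir, direction_irs)
print(hrirs)
```

Because the direction-dependent part is shared by all subjects, customizing for a new listener only requires estimating the short subject-dependent response, which is why a few measured directions can suffice.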
P11-6 A Flexible System Architecture for Collaborative Sound Engineering in Object-Based Audio Environments—Gabriel Gatzsche, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Audanika GmbH; Christoph Sladeczek, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany
On the one hand, object-based sound reproduction allows sound engineers to interact with sound objects not only during production but also in the reproduction venue. On the other hand, object-based systems are quite complex: multicore audio processors are used to render complex sound scenes consisting of hundreds of audio objects, which are reproduced over a large number of loudspeaker channels. This creates a need for applications that are optimally adapted to the user and for working tasks to be parallelized. This paper outlines a software architecture that helps incorporate the many audio processing components of an object-based spatial audio environment into a unified system. The architecture allows multiple sound engineers to access, monitor, control, and/or change the parameters of these system components collaboratively using wireless mobile devices.
Convention Paper 9082
P11-7 Effect of Microphone Number and Positioning on the Average of Frequency Responses in Cinema Calibration—Giulio Cengarle, Dolby Laboratories - Barcelona, Spain; Toni Mateos, Dolby Laboratories - Barcelona, Spain
When the response of a loudspeaker is measured by averaging multiple points in a room, the results typically vary with the number of microphones employed and their positions. We present an interpretation of the averaging procedure showing that the average converges to a compromise response over the relevant listening area, at a rate inversely proportional to the square root of the number of microphones employed. We then provide real-world examples by performing measurements in a dubbing stage and a cinema theater and analyzing the variations of average frequency responses over a large set of different microphone numbers and positions. Results confirm the predicted scaling of the deviations and quantify their magnitude in typical rooms. The data provided helped to establish the point of diminishing returns in the number of microphones.
Convention Paper 9083
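A toy Monte Carlo illustration of the 1/√N scaling reported in P11-7. The model is an assumption, not the paper's derivation: at a single frequency, each microphone position is taken to deviate from the spatial-mean ("compromise") response by an independent, zero-mean Gaussian error in dB with a hypothetical spread of 3 dB, and the N readings are averaged in dB.

```python
import math
import random


def rms_deviation_of_average(num_mics, trials=20000, sigma_db=3.0, seed=1):
    """RMS deviation (dB) of an N-microphone average from the spatial mean.

    Assumed model: each position's deviation is independent N(0, sigma_db^2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        avg = sum(rng.gauss(0.0, sigma_db) for _ in range(num_mics)) / num_mics
        total += avg * avg
    return math.sqrt(total / trials)


for n in (1, 4, 16):
    simulated = rms_deviation_of_average(n)
    predicted = 3.0 / math.sqrt(n)  # sigma / sqrt(N)
    print(n, round(simulated, 2), round(predicted, 2))
```

Quadrupling the number of microphones only halves the expected deviation, which is the diminishing return the paper quantifies with real measurements; real position-to-position deviations in a room are correlated, so measured behavior can differ from this idealized model.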
P12 - Applications in Audio
Tuesday, April 29, 14:30 — 16:30 (Room Paris)
Dylan Menzies, De Montfort University - Leicester, UK
P12-1 A Delayed Parallel Filter Structure with an FIR Part Having Improved Numerical Properties—Balázs Bank, Budapest University of Technology and Economics - Budapest, Hungary; Julius O. Smith, III, Stanford University - Stanford, CA, USA
In real-world applications high-order IIR filters are often converted to series or parallel second-order sections to decrease the negative effects of coefficient truncation and round-off noise. While series biquads are more common, the parallel structure is attracting increasing interest because it allows full code parallelization. In addition, it is relatively simple to design a filter directly in parallel form, which can be efficiently utilized for the logarithmic frequency resolution filtering often required in audio. If the numerator order of the original transfer function is higher than that of the denominator, a parallel FIR part arises in addition to the second-order IIR sections. Unfortunately, in this case the gain of the sections and that of the FIR filter can be significantly higher than that of the final transfer function, which requires downscaling the filter coefficients to avoid overload. This leads to a significant loss of useful bit depth. This paper analyzes the problem and suggests delaying the IIR part so that there is no overlap between the responses of the FIR part and the second-order sections.
Convention Paper 9084
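A minimal sketch of the delayed parallel structure described in P12-1: an FIR part of M + 1 taps runs in parallel with second-order IIR sections whose common input is delayed by M + 1 samples, so the FIR part occupies impulse-response samples 0..M and the IIR sections start at sample M + 1. The FIR taps and section coefficients are hypothetical, and the sketch shows only the filtering structure, not the paper's design procedure or its numerical analysis.

```python
def biquad_section(x, b0, b1, a1, a2):
    """Second-order section H(z) = (b0 + b1 z^-1) / (1 + a1 z^-1 + a2 z^-2), direct form I."""
    y = []
    x1 = y1 = y2 = 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 - a1 * y1 - a2 * y2
        x1, y2, y1 = xn, y1, yn  # shift the state
        y.append(yn)
    return y


def delayed_parallel_filter(x, fir, sections):
    """y = FIR(x) + sum_k SOS_k(x delayed by len(fir) samples).

    Delaying the IIR input by the FIR length keeps the two impulse
    responses from overlapping in time.
    """
    n = len(x)
    delay = len(fir)
    # Parallel FIR part.
    y = [sum(fir[m] * x[i - m] for m in range(len(fir)) if i - m >= 0) for i in range(n)]
    # Delayed input shared by all second-order sections (truncated to len(x)).
    x_delayed = ([0.0] * delay + list(x))[:n]
    for b0, b1, a1, a2 in sections:
        for i, v in enumerate(biquad_section(x_delayed, b0, b1, a1, a2)):
            y[i] += v
    return y


# Hypothetical coefficients: a 3-tap FIR part and one stable section (poles at 0.4 and 0.5).
FIR = [0.5, 0.25, 0.125]
SECTIONS = [(1.0, 0.0, -0.9, 0.2)]

impulse = [1.0] + [0.0] * 11
h = delayed_parallel_filter(impulse, FIR, SECTIONS)
print([round(v, 4) for v in h])
```

The printed impulse response starts with the FIR taps and only then shows the decaying IIR tail, which is the non-overlapping behavior the paper relies on to avoid the large intermediate gains that force coefficient downscaling.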
P12-2 A Loudness-Based Adaptive Equalization Technique for Subjectively Improved Sound Reproduction—Konstantinos Drossos, Ionian University - Corfu, Greece; Andreas Floros, Ionian University - Corfu, Greece; Nikolaos-Grigorios Kanellopoulos, Ionian University - Corfu, Greece
Sound equalization is a common approach for objectively or subjectively defining the reproduction level in specific frequency bands. It is also well known that the human auditory system applies an inherent weighting to sound. As a result, perceived loudness changes with frequency and with the user-defined reproduction gain, so the perceived equalization deviates from the intended one as the sound level changes. In this work we introduce a novel equalization approach that takes this perceptual loudness effect into account in order to achieve subjectively constant equalization. A series of listening tests shows that the proposed equalization technique is an efficient and listener-preferred alternative for both professional and home audio reproduction applications.
Convention Paper 9085
P12-3 New Sound and Visual System of the State Parliament of North-Rhine Westphalia in Duesseldorf, Germany—Enno Finder, ADA Acoustics & Media Consultants GmbH - Berlin, Germany; Wolfgang Ahnert, ADA Acoustics & Media Consultants GmbH - Berlin, Germany
In 2012 the assembly hall of the state parliament of North-Rhine Westphalia, located in Duesseldorf, Germany, was extensively renovated. As part of this work, the floor structure was changed to improve conditions for handicapped delegates and guests. The new sound design had to solve two tasks: excellent sound coverage for speech from the lectern and the president's position, and sound localization for speeches from the delegates' desks using a new type of delta-stereophonic network. The paper explains all design issues, including the new microphones on the lectern. Fitting the new loudspeaker systems and the new microphone types into the architectural environment is also discussed, as is the start of the renovation of the visual systems.
Convention Paper 9086
P12-4 Latest Improvements for Spatial Sound Reinforcement: Configuration’s Automation, Remote Control Using Mobile Devices, and Object Based Room Simulation—Javier Frutos-Bonilla, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Gabriel Gatzsche, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Audanika GmbH; Rene Rodigast, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany
In 2005 Fraunhofer IDMT presented a sound reinforcement system based on the precedence effect that can recreate the natural spatial impression of static and dynamic sources on the stage regardless of the listeners' positions. Drawing on the experience gained over recent years, this paper presents three developments for this system. They concern the automatic determination of the configuration parameters, the multi-user adjustment of fine parameters on the tribune using portable devices, and a new room simulation approach that takes into account both source and listener dependencies.
Convention Paper 9087
P13 - Applications in Audio/Education/Forensics
Tuesday, April 29, 15:00 — 16:30 (Foyer)
P13-1 Diffused System of Noise Measurement, Concept, and Implementation—Bartlomiej Kruk, Wroclaw University of Technology - Wroclaw, Poland; Michal Luczynski, Wroclaw University of Technology - Wroclaw, Poland; Adrian Pralat, Wroclaw University of Technology - Wroclaw, Poland
The main purpose of this paper is to explore ways of improving the process of noise measurement. Performing simultaneous noise measurements at multiple locations requires many people to operate the equipment and to ensure that the results are valid. The goal is to improve the measurement process by using modern technology so that data can be collected in a controlled manner and submitted to a central location. This eliminates the data preprocessing step and facilitates acquisition and analysis in measurements that require data from multiple locations, reducing both the labor requirements and the financial cost of measurements.
Convention Paper 9088
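One analysis step that a central location in a system like the one in P13-1 might perform, sketched under assumptions: each measurement node reports sound levels in dB, and the server energy-averages the readings per node using the standard formula L = 10 log10(mean of 10^(L_i/10)). The abstract does not describe the data format or the analysis; the node identifiers, readings, and function names are hypothetical.

```python
import math
from collections import defaultdict


def energy_average_db(levels_db):
    """Energy (logarithmic) average of sound levels in dB."""
    mean_power = sum(10.0 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_power)


def aggregate(readings):
    """Group (node_id, level_db) readings by node and energy-average each group."""
    by_node = defaultdict(list)
    for node_id, level_db in readings:
        by_node[node_id].append(level_db)
    return {node: energy_average_db(levels) for node, levels in by_node.items()}


# Hypothetical readings submitted by two measurement nodes.
readings = [("node-A", 60.0), ("node-A", 70.0), ("node-B", 55.0)]
result = aggregate(readings)
print({node: round(level, 2) for node, level in result.items()})
```

Averaging in the energy domain rather than arithmetically on dB values is what makes the 70 dB reading dominate node A's result (about 67.4 dB rather than 65 dB).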
P13-2 Improving the Performance of an In-Home Acoustic Monitoring System by Integrating a Vocal Effort Classification Algorithm—Emanuele Principi, Università Politecnica delle Marche - Ancona, Italy; Roberto Bonfigli, Università Politecnica delle Marche - Ancona, Italy; Stefano Squartini, Università Politecnica delle Marche - Ancona, Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy
Research interest in technologies that support people in their own homes is constantly increasing. In this context, this paper proposes a speech-interfaced system for recognizing home automation commands and distress calls. The robustness of the system is increased by employing Power Normalized Cepstral Coefficients as features and by using an adaptive algorithm to reduce known sources of interference. In addition, the mismatch introduced by vocal effort variability is reduced by employing a vocal effort classifier and multiple acoustic models. The performance has been evaluated on ITAAL, a recently proposed corpus of home automation commands and distress calls in Italian. The results confirm that the adopted solutions are effective in a distorted acoustic scenario.
Convention Paper 9089
P13-3 Eyes-Free Interaction for Personal Media Devices—Thomas Svedström, Aalto University - Espoo, Finland; Aki Härmä, Philips Research - Eindhoven, The Netherlands
The use of visual user interfaces in smartphones and other personal media devices (PMD) leads to decreased situational awareness, for example in city traffic. The paper proposes that many menu navigation functions in PMDs can be replaced by an eyes-free auditory interface and an input device based on acoustic recognition of tactile gestures. Using a novel experimental setup, we demonstrate that the proposed auditory interface reduces reaction times to external events in comparison to a visual UI. In addition, although task completion times in menu navigation increased somewhat with the auditory interface, the subjects were able to complete the given interaction tasks correctly within a reasonable time.
Convention Paper 9090
P13-4 Supporting TV Sound in the UK – A New Role for Education?—Patrick Quinn, Glasgow Caledonian University - Glasgow, Scotland, UK
The demands placed on staff working in TV sound have changed and grown over the last few decades, particularly for those in senior roles. Based on observations of and interviews with senior staff, this paper gives an overview of the challenges for those working in TV sound in the UK and suggests an enhanced role for Higher and Further Education in supporting the industry.
Convention Paper 9091
P13-5 Cross Level Peer Tutoring to Support Students Learning Audio Programming—David Moore, Glasgow Caledonian University - Glasgow, Lanarkshire, UK; Steven Walters, Glasgow Caledonian University - Glasgow, Scotland, UK
Computer programming supports the learning of key concepts in audio and music technology education, including digital audio processing and sound synthesis. However, programming can pose a challenge, particularly for students whose primary focus of study is not pure computer science. This paper examines cross-level peer tutoring as a method for supporting audio students learning programming as part of an audio processing module. It examines the viability of this scheme as a method for enhancing student self-efficacy and achievement, and explores the benefits and issues from the point of view of both the tutees and the tutors through quantitative and qualitative research.
Convention Paper 9092
P13-6 From Faraday to Fourier: Teaching Audio Technology Fundamentals Using Loudspeaker Design—Scott Beveridge, Glasgow Caledonian University - Glasgow, Scotland, UK
This paper presents a novel method of teaching basic audio principles. We describe a loudspeaker design activity that encompasses a large number of core learning outcomes. These include the basics of sound and hearing, digital audio, the audio signal path, and electroacoustics. Following the constructivist learning paradigm, the task encourages students to actively develop their own understanding. The task also promotes deep learning strategies in addition to providing a fun and engaging practical learning experience. From an instructor's perspective the activity presents a unified, structured, cost-effective method of presenting course content.
Convention Paper 9093
P13-7 Efficient Cross-Codec Framing Grid Analysis for Audio Tampering Detection—Daniel Gärtner, Fraunhofer IDMT - Ilmenau, Germany; Christian Dittmar, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Patrick Aichroth, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Luca Cuccovillo, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Sebastian Mann, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Gerald Schuller, Ilmenau University of Technology - Ilmenau, Germany
In this paper we present an audio tampering detection method based on the analysis of discontinuities in the framing grid, caused by manipulations either within the same recording or across recordings, even with codec changes. The approach extends state-of-the-art methods for MP3 framing grid detection in terms of efficiency and robustness, and adds multi-codec support for the mp3PRO, AAC, and HE-AAC schemes. An evaluation has been carried out using a publicly available dataset. High performance is reported both for detecting tampering and for identifying codecs, showing the usefulness of the approach in audio forensics.
Convention Paper 9094
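A simplified illustration of the framing-grid idea behind P13-7. Estimating frame boundaries from decoded audio is the hard part addressed by the paper and is not shown; the sketch assumes a hypothetical list of detected boundary positions (in samples) and flags places where the grid offset, the position modulo the frame length, changes. It uses the MPEG-1 Layer III frame length of 1152 samples; AAC frames are 1024 samples long.

```python
MP3_FRAME = 1152  # samples per MPEG-1 Layer III frame


def grid_offsets(boundaries, frame_len=MP3_FRAME):
    """Offset of each detected frame boundary relative to the frame grid."""
    return [b % frame_len for b in boundaries]


def find_grid_breaks(boundaries, frame_len=MP3_FRAME):
    """Return boundary positions whose grid offset differs from the previous one.

    An untouched encoding keeps a constant offset; a cut, or an insertion of
    material with a different framing, shifts it.
    """
    offsets = grid_offsets(boundaries, frame_len)
    return [boundaries[i] for i in range(1, len(offsets)) if offsets[i] != offsets[i - 1]]


# Hypothetical detected boundaries: five frames with offset 100, then a
# simulated 300-sample shift caused by an edit.
boundaries = [100 + k * MP3_FRAME for k in range(5)]
boundaries += [100 + 5 * MP3_FRAME + 300 + k * MP3_FRAME for k in range(5)]
print(find_grid_breaks(boundaries))  # [6160]
```

For cross-codec analysis the same check would be repeated with each candidate codec's frame length, which is one reason the paper's detector also yields a codec identification.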