Saturday, May 4, 10:30 — 12:30 (Sala Foscolo)

Paper Session: P2 - Audio Signal Processing—Part 1

Danilo Comminiello, Sapienza University of Rome - Rome, Italy

P2-1 Loudness Measurement of Multitrack Audio Content Using Modifications of ITU-R BS.1770
Pedro Duarte Pestana, Research Center for Science and Technology of the Arts, Portuguese Catholic University (CITAR-UCP) - Almada, Portugal; CEAUL-FCUL; Joshua D. Reiss, Queen Mary University of London - London, UK; Alvaro Barbosa, Research Center for Science and Technology of the Arts, Portuguese Catholic University (CITAR-UCP) - Lisbon, Portugal
The recent loudness measurement recommendations by the ITU and the EBU have gained widespread recognition in the broadcast community. The material they deal with is usually full-range mastered audio content, and their applicability to multitrack material is not yet clear. In the present work we investigate how well the perceived loudness of single tracks agrees with the value measured as defined by ITU-R BS.1770. We analyze the underlying features that may cause this disparity and propose some parameter alterations that might yield better results for multitrack material with minimal change to how broadcast content is rated. The best parameter sets are then evaluated by a panel of experts in terms of how well they produce an equal-loudness multitrack mix and are shown to be significantly more successful.
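As a reference point for the modifications discussed above, the baseline gated ITU-R BS.1770 measurement can be sketched in a few lines: K-weighting (two biquads), mean-square power per 400 ms block with 75 % overlap, an absolute gate at -70 LKFS, and a relative gate 10 LU below the level of the absolutely gated blocks. This is a minimal sketch assuming 48 kHz input held in plain Python lists; the function names are illustrative, and the parameter alterations proposed in the paper are not reproduced.

```python
import math

# K-weighting biquad coefficients for fs = 48 kHz as tabulated in ITU-R BS.1770:
# stage 1 is a high-frequency shelf, stage 2 the RLB high-pass.
# Other sample rates need recomputed coefficients (not handled in this sketch).
SHELF = ((1.53512485958697, -2.69169618940638, 1.19839281085285),
         (-1.69065929318241, 0.73248077421585))
HIGHPASS = ((1.0, -2.0, 1.0),
            (-1.99004745483398, 0.99007225036621))


def biquad(x, coeffs):
    """Direct-form I biquad filter over a list of samples."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def to_lkfs(power):
    """Convert weighted mean-square power to LKFS."""
    return -0.691 + 10 * math.log10(power)


def integrated_loudness(channels, fs=48000, weights=None):
    """Gated integrated loudness of equal-length channels (lists of floats)."""
    if fs != 48000:
        raise ValueError("sketch only carries 48 kHz K-weighting coefficients")
    # Channel weights G_i: 1.0 for L/R/C; surround channels would use 1.41.
    weights = weights or [1.0] * len(channels)
    filtered = [biquad(biquad(ch, SHELF), HIGHPASS) for ch in channels]
    block, step = int(0.4 * fs), int(0.1 * fs)  # 400 ms blocks, 75 % overlap
    n = len(filtered[0])
    powers = []
    for start in range(0, n - block + 1, step):
        z = [sum(s * s for s in ch[start:start + block]) / block for ch in filtered]
        powers.append(sum(g * zi for g, zi in zip(weights, z)))
    # Absolute gate at -70 LKFS (also drops digital silence).
    gated = [p for p in powers if p > 0 and to_lkfs(p) > -70.0]
    if not gated:
        return float("-inf")
    # Relative gate 10 LU below the loudness of the absolutely gated blocks.
    threshold = to_lkfs(sum(gated) / len(gated)) - 10.0
    gated = [p for p in gated if to_lkfs(p) > threshold]
    return to_lkfs(sum(gated) / len(gated))
```

A full-scale 997 Hz sine in a single channel should read close to -3.01 LKFS, the calibration point given in the recommendation; the per-channel weights and gate thresholds are the kind of parameters a multitrack variant could alter.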
P2-2 Delayless Robust DPCM Audio Transmission for Digital Wireless Microphones
Florian Pflug, Technische Universität Braunschweig - Braunschweig, Germany; Tim Fingscheidt, Technische Universität Braunschweig - Braunschweig, Germany
The employment of digital wireless microphones in professional contexts requires ultra-low delay, strong robustness, and high audio quality. On the other hand, audio coding is required in order to comply with the restrictions on the bandwidth of the radio channel, making the resulting source-coded audio signal more vulnerable to channel distortions. Therefore, in this paper we present a transmission system for differential pulse-code modulated (DPCM) audio with receiver-sided soft-decision error concealment exploiting channel reliability information, explicit redundancy by simple delayless parity-check codes, and residual redundancy within the source-coded audio signal. Simulations on frequency-shift keying (FSK)-modulated channels with additive white Gaussian noise show considerable gains in audio quality compared to hard-decision decoding and soft-decision decoding only exploiting reliability information and 0th-order a priori knowledge.
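The transmitter side of such a system can be illustrated with a first-order DPCM coder whose codewords each carry one even-parity bit; because the parity bit depends only on the current codeword, it adds no delay. This is a minimal sketch under assumed parameters (predictor coefficient 0.95, 6-bit quantizer, step 0.01). The receiver shown performs only hard-decision parity checking and conceals a failed check by falling back to the prediction; it is not the soft-decision concealment exploiting reliability information and a priori knowledge that the paper proposes.

```python
import math


def quantize(e, step, bits):
    """Uniform mid-tread quantizer index, clipped to a signed `bits`-bit range."""
    q = math.floor(e / step + 0.5)
    half = 2 ** (bits - 1)
    return max(-half, min(half - 1, q))


def dpcm_encode(x, step=0.01, bits=6, a=0.95):
    codes, prev = [], 0.0
    for xn in x:
        pred = a * prev                    # first-order prediction
        q = quantize(xn - pred, step, bits)
        prev = pred + q * step             # track the decoder's reconstruction
        codes.append(q)
    return codes


def dpcm_decode(codes, step=0.01, a=0.95):
    out, prev = [], 0.0
    for q in codes:
        prev = a * prev + q * step
        out.append(prev)
    return out


def parity(u):
    return bin(u).count("1") & 1


def add_parity(codes, bits=6):
    """Map each index to an unsigned codeword and attach an even-parity bit."""
    offset = 2 ** (bits - 1)
    return [(q + offset, parity(q + offset)) for q in codes]


def check_and_conceal(frames, bits=6):
    """Hard-decision receiver: a failed parity check is concealed with q = 0,
    so the decoder outputs the prediction alone for that sample."""
    offset = 2 ** (bits - 1)
    return [u - offset if parity(u) == p else 0 for u, p in frames]
```

Without channel errors the reconstruction error stays within half a quantizer step; a single flipped bit in a codeword is detected and concealed for that sample only.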
P2-3 Violin Sound Computer Classification Based on Expert Knowledge
Adam Robak, Poznan University of Technology - Poznan, Poland; Ewa Lukasik, Poznan University of Technology - Poznan, Poland
The paper presents results of the analysis of violins recorded during the final stage of the international violin makers’ competition held in Poznan in 2011. In the quest for attributes that are both efficient for machine learning and interpretable for human experts, we referred to the research of violin acousticians Duennwald, Buen, and Fritz and calculated violin sound power in the frequency bands recommended by these researchers. The resulting features, obtained from the averaged spectra of the musical pieces played at the competition, were used for clustering and classification experiments. Results are discussed, and a notable experiment is presented in which the classifier assigns each analyzed violin to an instrument from the previous violin makers’ competition (2001) and compares their rankings.
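Band-power features of this kind can be sketched as follows: average the power spectra of the analyzed recordings bin by bin, then sum the power falling in each band and express it in dB relative to the total. The band edges below are hypothetical placeholders for illustration, not the bands recommended by Duennwald, Buen, or Fritz, and the input spectra are assumed to be precomputed with uniform bin spacing.

```python
import math

# Hypothetical band edges in Hz, for illustration only; substitute the bands
# recommended in the violin-acoustics literature.
BANDS = [(190, 650), (650, 1300), (1300, 4200), (4200, 6400)]


def average_spectrum(spectra):
    """Bin-by-bin mean of several equal-length power spectra."""
    return [sum(col) / len(spectra) for col in zip(*spectra)]


def band_levels(power, bin_hz, bands=BANDS):
    """Power in each band relative to the summed power of all bands, in dB.

    Assumes every band contains some nonzero power.
    """
    totals = [sum(p for k, p in enumerate(power) if lo <= k * bin_hz < hi)
              for lo, hi in bands]
    overall = sum(totals)
    return [10 * math.log10(t / overall) for t in totals]
```

The resulting vector of band levels per violin can then be fed to any clustering or classification method.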
P2-4 A Finite Difference Method for the Excitation of a Digital Waveguide String Model
Leonardo Gabrielli, Università Politecnica delle Marche - Ancona, Italy; Luca Remaggi, Università Politecnica delle Marche - Ancona, Italy; Stefano Squartini, Università Politecnica delle Marche - Ancona, Italy; Vesa Välimäki, Aalto University - Espoo, Finland
In Digital Waveguide (DWG) modeling, a number of excitation methods have been proposed to feed the delay line properly. Generally speaking, these may be based on signal models fitted to recorded samples, on excitation signals extracted from recorded samples, or on digital filter networks. While allowing for stable, computationally efficient sound emulation, they may be unable to emulate secondary effects generated by the physical excitation, e.g., the distributed interaction of string and hammer. On the other hand, FDTD (Finite Difference Time Domain) models emulate the physical excitation mechanism more accurately, at the expense of a higher computational cost and a complex coefficient design to ensure numerical stability. In this paper a mixed model is proposed, composed of a two-step FDTD model, a commuted DWG, and an adaptor block to join the two sections. Properties of the model are provided, and simulation results are given for the Clavinet tangent-string mechanism as an example application.
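To make the DWG side concrete, here is a minimal single-loop waveguide string (a lumped, Karplus-Strong-style loop) fed by an arbitrary excitation signal. It is only an illustrative sketch: the two-point averaging loss filter, the loss factor, and simply adding the excitation into the loop are assumptions. The paper's two-step FDTD excitation model and the adaptor block joining it to a commuted DWG are not reproduced.

```python
def waveguide_string(excitation, delay, loss=0.996, n_out=None):
    """Single-loop digital waveguide string fed by an excitation signal.

    The loop holds `delay` samples; a two-point average scaled by `loss`
    acts as the lumped loss filter, so the fundamental is roughly
    fs / (delay + 0.5). The excitation is simply added into the loop
    (an assumption standing in for a proper adaptor).
    """
    n_out = len(excitation) if n_out is None else n_out
    line = [0.0] * delay
    prev, idx, out = 0.0, 0, []
    for n in range(n_out):
        e = excitation[n] if n < len(excitation) else 0.0
        y = line[idx]                      # sample leaving the delay line
        line[idx] = loss * 0.5 * (y + prev) + e
        prev = y
        idx = (idx + 1) % delay
        out.append(y)
    return out
```

An impulse excitation reappears after one loop delay and then decays, since the loss filter's gain never exceeds `loss` < 1.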
Saturday, May 4, 14:15 — 17:15 (Sala Carducci)

Paper Session: P3 - Perception

Frank Melchior, BBC Research and Development - Salford, UK

P3-1 The Relation between Preferred TV Program Loudness, Screen Size, and Display Format
Ian Dash, Consultant - Marrickville, NSW, Australia; Todd Power, University of Sydney - Sydney, NSW, Australia; Densil A. Cabrera, University of Sydney - Sydney, NSW, Australia
The effect of television screen size and display format on preferred TV program loudness was investigated by listening tests using typical program material. While no significant influence of screen size or color level on preferred program loudness was observed, other preference patterns of interest, related to soundtrack content type, were found.
P3-2 Vibration in Music Perception
Sebastian Merchel, Dresden University of Technology - Dresden, Germany; M. Ercan Altinsoy, Dresden University of Technology - Dresden, Germany
The coupled perception of sound and vibration is a well-known phenomenon at live pop or organ concerts. However, even during a symphonic concert in a classical hall, sound can excite perceivable vibrations on the surface of the body. This study analyzes the influence of audio-induced vibrations on the perceived quality of the concert experience. To this end, sound and seat vibrations are controlled separately in an audio reproduction scenario. Because sound and vibration are naturally strongly correlated, vibrations are generated from audio recordings using various approaches. The perceptual consequences of different parameters in this process (frequency and intensity modifications) are examined in psychophysical experiments. It can be concluded that vibrations play a significant role in the perception of music.
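One simple way to derive a seat-vibration signal from audio, assumed here purely for illustration, is to low-pass filter the signal and apply a gain; cutoff and gain then play the roles of the frequency and intensity modifications mentioned above. The one-pole filter and the 100 Hz default cutoff are hypothetical choices, not the approaches evaluated in the study.

```python
import math


def audio_to_vibration(x, fs, cutoff_hz=100.0, gain_db=0.0):
    """One-pole low-pass of an audio signal followed by a gain in dB.

    cutoff_hz and gain_db are hypothetical stand-ins for the frequency and
    intensity modifications studied in the paper.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)  # smoothing factor
    g = 10 ** (gain_db / 20)
    y, state = [], 0.0
    for xn in x:
        state += alpha * (xn - state)      # y[n] = y[n-1] + alpha (x[n] - y[n-1])
        y.append(g * state)
    return y
```

Low-frequency content passes almost unchanged while content near the Nyquist frequency is strongly attenuated.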
P3-3 An Assessment of Virtual Surround Sound Systems for Headphone Listening of 5.1 Multichannel Audio
Chris Pike, BBC Research and Development - Salford, York, UK; Frank Melchior, BBC Research and Development - Salford, UK
It is now common for broadcast signals to feature 5.1 surround sound. It is also increasingly common that audiences access broadcast content on portable devices using headphones. Binaural techniques can be applied to create a spatially enhanced headphone experience from surround sound content. This paper presents a subjective assessment of the sound quality of 12 state-of-the-art systems for converting 5.1 surround sound to a 2-channel signal for headphone listening. A multiple stimulus test was used with hidden reference and anchors; the reference stimulus was an ITU stereo down-mix. Dynamic binaural synthesis, based on individualized binaural room impulse response measurements and head orientation tracking, was also incorporated into the test. The experimental design and detailed analysis of the results are presented in this paper.
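For readers unfamiliar with the reference condition: a common form of the ITU-R BS.775 stereo down-mix folds the center and surround channels into left and right with a gain of about -3 dB (0.7071) and discards the LFE channel. The sketch below assumes that form and per-sample lists; the exact coefficients and LFE handling used in the test are not stated in the abstract.

```python
import math

K = 1 / math.sqrt(2)  # about -3 dB


def itu_stereo_downmix(L, R, C, LFE, Ls, Rs):
    """Fold 5.1 channel lists down to (Lo, Ro); the LFE channel is discarded."""
    Lo = [l + K * c + K * ls for l, c, ls in zip(L, C, Ls)]
    Ro = [r + K * c + K * rs for r, c, rs in zip(R, C, Rs)]
    return Lo, Ro
```

Because several channels are summed, practical implementations usually add normalization or limiting to avoid clipping; that step is omitted here.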
P3-4 Effect of Target Signal Envelope on Direction Discrimination in Spatially Complex Sound Scenarios
Olli Santala, Aalto University School of Electrical Engineering - Aalto, Finland; Marko Takanen, Aalto University School of Electrical Engineering - Aalto, Finland; Ville Pulkki, Aalto University - Aalto, Finland
The temporal envelope of a sound signal has been found to have an effect on localization. Whether this holds for spatially complex scenarios was addressed in a listening experiment in which a spatially distributed sound source consisted of a target placed between two interfering noise-like sound sources, all emitting sound simultaneously. All signals were harmonic complex tones with components between 2 kHz and 8.2 kHz and were presented over loudspeakers in an anechoic chamber. The phases of the harmonic components of the target signal were altered, which changed its envelope. The results indicated that prominent peaks in the envelope of the target signal aided the discrimination of its direction inside the widely distributed sound source.
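The link between component phases and envelope can be shown with two harmonic complex tones that share the same 2 kHz to 8.2 kHz magnitude spectrum: with all components in cosine phase they add up to one sharp peak per period, while Schroeder phases spread the energy evenly over the period. The 200 Hz fundamental (harmonics 10 to 41) and the Schroeder phase rule are assumptions for illustration; the paper's exact phase manipulations are not described in the abstract.

```python
import math


def complex_tone(f0, harmonics, phases, fs=48000, periods=1):
    """Sum of equal-amplitude cosine harmonics with the given phases."""
    n_samples = int(round(periods * fs / f0))
    return [sum(math.cos(2 * math.pi * h * f0 * n / fs + p)
                for h, p in zip(harmonics, phases))
            for n in range(n_samples)]


def crest_factor(x):
    """Peak-to-RMS ratio, a simple measure of how peaky the envelope is."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms


F0 = 200.0                               # hypothetical fundamental
HARMONICS = list(range(10, 42))          # 2000 Hz ... 8200 Hz
N = len(HARMONICS)
zero_phase = [0.0] * N                   # peaky envelope
schroeder = [math.pi * k * (k - 1) / N for k in range(1, N + 1)]  # flat envelope
```

With 32 equal components in cosine phase the crest factor is exactly 8 (peak 32, RMS 4), whereas the Schroeder-phase version is far lower, even though both signals have identical magnitude spectra.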
P3-5 A Framework for Adaptive Real-Time Loudness Control
Andrea Alemanno, Sapienza University of Rome - Rome, Italy; Alessandro Travaglini, Fox International Channels Italy - Guidonia Montecelio (RM), Italy; Simone Scardapane, Sapienza University of Rome - Rome, Italy; Danilo Comminiello, Sapienza University of Rome - Rome, Italy; Aurelio Uncini, Sapienza University of Rome - Rome, Italy
In recent years, loudness control has become one of the most frequently investigated topics in audio signal processing. In this paper we describe a framework designed to provide adaptive real-time loudness measurement and processing of audio files and streamed content reproduced by mobile players on laptops, tablets, and mobile phones. The proposed method aims to improve the users’ listening experience by normalizing the loudness level of the content in real time while preserving the creative intent of the original soundtrack. The loudness measurement and adaptation are based on a customization of the High Efficiency Loudness Model algorithm described in AES Convention Paper #8612 (“HELM: High Efficiency Loudness Model for Broadcast Content,” presented at the 132nd Convention, April 2012). Technical and subjective tests were performed to evaluate the performance of the proposed method. In addition, the design of the subjective test offered the opportunity to gather information on the preferred target level of streamed content and media files reproduced on portable devices.
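The adaptive part of such a controller can be illustrated with a gain that moves toward the value that would bring each measured loudness block to a target level, but by at most a fixed step per block, so that short-term dynamics are not flattened. This is a generic sketch, not the HELM algorithm: the loudness readings are assumed to come from some meter, and the -23 LUFS target, the 0.5 dB step, and the -70 LUFS silence threshold are hypothetical parameters.

```python
def adaptive_gains(loudness_blocks, target=-23.0, max_step_db=0.5,
                   silence_floor=-70.0):
    """Per-block gain in dB moving toward (target - measured), slew-limited.

    All default values are hypothetical; blocks at or below silence_floor
    hold the current gain instead of boosting near-silence.
    """
    gain, gains = 0.0, []
    for measured in loudness_blocks:
        if measured > silence_floor:
            desired = target - measured
            step = max(-max_step_db, min(max_step_db, desired - gain))
            gain += step
        gains.append(gain)
    return gains
```

For a steady program at -30 LUFS the gain ramps up in 0.5 dB steps and settles at +7 dB after 14 blocks; the step limit is the design choice that trades convergence speed against audible pumping.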
P3-6 The Perception of Masked Sounds and Reverberation in 3-D vs. 2-D Playback Systems
Giulio Cengarle, Imm Sound S.A., a Dolby company - Barcelona, Spain; Alexandre Pereda, Fundació Barcelona Media - Barcelona, Spain
This paper presents studies on perceptual aspects of spatial audio and their dependence on the playback format. The first study concerns the perception of sound in the presence of a masker in stereo, 5.1, and 3-D. Psychoacoustic tests show that the detection threshold improves with the spread of the masker, which supports the claim that individual elements of dense soundtracks are more audible when they are distributed over a wider panorama. The second study indicates that the perceived reverberation level does not depend on the playback format. Taken together, these results support the claim that in 3-D, sound engineers can use higher levels of reverberation without compromising the intelligibility of the sound sources.
Sunday, May 5, 09:00 — 10:00 (Sala Alighieri)


Loudness Day: L1 - Loudness 101—A Hitchhiker's Guide to Audio Nirvana

Florian Camerer, ORF - Austrian TV - Vienna, Austria; EBU - European Broadcasting Union

This session will bring participants up to speed on most aspects of loudness control and metering. It is targeted at sound engineers in general, giving a brief introduction to the algorithm and the metering paradigms and then expanding to common misunderstandings and pitfalls as well as opportunities and challenges. New concepts such as "gating" and "true peak level" will be explained in detail. As chairman of the European loudness group PLOUD and senior post-production mixing engineer at ORF (Austrian TV), Florian Camerer has enough experience under his belt to provide a thorough workout!
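"True peak level" refers to the peak of the reconstructed continuous waveform rather than of the stored samples, and it is usually estimated by oversampling. The sketch below uses 4x oversampling with a Hann-windowed sinc interpolator; this is a stand-in for, not a copy of, the interpolation filter specified in ITU-R BS.1770, and the half-width of 16 samples is an arbitrary choice.

```python
import math


def windowed_sinc(d, half_width):
    """Hann-windowed sinc interpolation kernel evaluated at offset d."""
    if d == 0:
        return 1.0
    if abs(d) >= half_width:
        return 0.0
    window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
    return math.sin(math.pi * d) / (math.pi * d) * window


def true_peak(x, oversample=4, half_width=16):
    """Estimate the inter-sample peak of x (linear) by oversampled interpolation."""
    peak = 0.0
    for n in range(len(x)):
        for m in range(oversample):
            t = n + m / oversample
            lo = max(0, int(math.floor(t)) - half_width + 1)
            hi = min(len(x), int(math.floor(t)) + half_width + 1)
            y = sum(x[k] * windowed_sinc(t - k, half_width) for k in range(lo, hi))
            peak = max(peak, abs(y))
    return peak


def to_dbtp(peak):
    """Express a linear peak value in dB relative to full scale."""
    return 20 * math.log10(peak)
```

A sine at a quarter of the sample rate, sampled 45 degrees away from its crests, shows the difference: every sample sits at about 0.707 (about -3 dBFS), while the reconstructed waveform between them reaches about 1.0 (about 0 dBTP).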


Sunday, May 5, 09:00 — 13:00 (Sala Carducci)

Paper Session: P6 - Recording and Production

Alex Case, University of Massachusetts—Lowell - Lowell, MA, USA

P6-1 Automated Tonal Balance Enhancement for Audio Mastering Applications
Stylianos-Ioannis Mimilakis, Technological Educational Institute of Ionian Island - Lixouri, Greece; Konstantinos Drossos, Ionian University - Corfu, Greece; Andreas Floros, Ionian University - Corfu, Greece; Dionysios Katerelos, Technological Educational Institute of Ionian Island - Lixouri, Greece
Modern audio mastering procedures involve the selective enhancement or attenuation of specific frequency bands, mainly for the tonal enhancement of the original (unmastered) audio material. This process is mostly based on the musical information and the mode of the audio material, which can be obtained by listening to the original material or from its corresponding musical key notes. The current work presents an adaptive, automated equalization system that performs this mastering procedure, based on a novel method of fundamental frequency tracking. In addition, the overall system is evaluated with objective PEAQ analysis and subjective listening tests under real mastering conditions.
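The fundamental-frequency step can be illustrated with a plain autocorrelation estimator plus a conversion to the nearest MIDI note number, from which a key-dependent equalization could then be chosen. This is a generic textbook technique used here as a stand-in; it is not the novel tracking method of the paper, and the 50 Hz to 1000 Hz search range is an assumption.

```python
import math


def estimate_f0(frame, fs, f_min=50.0, f_max=1000.0):
    """Pick the autocorrelation peak within the allowed lag range.

    The f_min/f_max search range is a hypothetical default.
    """
    min_lag = int(fs / f_max)
    max_lag = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


def to_midi_note(f0):
    """Nearest MIDI note number (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(f0 / 440.0))
```

Because the lag is an integer, the frequency resolution is coarse (a 220 Hz tone at 8 kHz comes out within a few hertz), but that is enough to identify the note; parabolic interpolation around the peak would refine it.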
P6-2 A Pairwise and Multiple Stimuli Approach to Perceptual Evaluation of Microphone Types
Brecht De Man, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
An essential but complicated task in the audio production process is the selection of microphones that are suitable for a particular source. A microphone is often chosen based on price or common practice rather than on whether it actually sounds best in that particular situation. In this paper we perceptually assess six microphone types for recording a female singer. Listening tests using a pairwise and a multiple stimuli approach are conducted to identify the order of preference of these microphone types. The results of this comparison are discussed, and the performance of each approach is assessed.
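One simple way to turn pairwise judgments into an order of preference is to count how often each microphone was preferred over the alternative. The sketch assumes each judgment is stored as a (winner, loser) pair and uses made-up labels; it is not necessarily the analysis used in the paper.

```python
from collections import Counter


def preference_order(judgments):
    """Rank items by the number of pairwise comparisons they won.

    `judgments` is an iterable of (winner, loser) tuples; ties keep the
    order in which items were first seen.
    """
    wins = Counter()
    for winner, loser in judgments:
        wins[winner] += 1
        wins[loser] += 0          # make sure items that never win are listed
    return sorted(wins, key=lambda item: -wins[item])
```

Win counts ignore how close each decision was. Models such as Bradley-Terry estimate a preference scale instead, but the simple tally is often a useful first look.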
P6-3 Comparison of Power Supply Pumping of Switch-Mode Audio Power Amplifiers with Resistive Loads and Loudspeakers as Loads
Arnold Knott, Technical University of Denmark - Kgs. Lyngby, Denmark; Lars Press Petersen, Technical University of Denmark - Kgs. Lyngby, Denmark
Power supply pumping is generated by switch-mode audio power amplifiers in half-bridge configuration when they drive energy back into their source. In most designs this leads to a rising rail voltage, which can damage the decoupling capacitors, the rectifier diodes in the power supply, or the power stage of the amplifier. Amplifier and power supply designers therefore take precautions to avoid these effects. Existing power supply pumping models assume an ohmic load attached to the amplifier. This paper shows the analytical derivation of the resulting waveforms and extends the model to loudspeaker loads. Measurements verify that the amount of supply pumping is reduced by a factor of four when the nominal resistive load is compared with a loudspeaker. A simplified and more accurate model is proposed, and the influence of supply pumping on audio performance is shown to be marginal.
P6-4 The Psychoacoustic Testing of the 3-D Multiformat Microphone Array Design and the Basic Isosceles Triangle Structure of the Array and the Loudspeaker Reproduction Configuration
Michael Williams, Sounds of Scotland - Le Perreux sur Marne, France
Optimizing the loudspeaker configuration for 3-D microphone array design can only be achieved with a clear knowledge of the psychoacoustic parameters of height localization in reproduction. Unfortunately, HRTF characteristics do not seem to give useful information when applied to loudspeaker reproduction. A set of psychoacoustic parameters has to be measured for different configurations in order to design an efficient microphone array recording system, all the more so if a minimalistic approach to array design is a prime objective. In particular, the position of a second layer of loudspeakers with respect to the primary horizontal layer is of fundamental importance and can only be based on the psychoacoustics of height perception. What are the localization characteristics between two loudspeakers situated in each of the two layers? Is time difference a better approach than level difference to interlayer localization? This paper will try to answer these questions and also justify the basic isosceles triangle loudspeaker structure that will help to optimize the reproduction of height information.
P6-5 A Perceptual Audio Mixing Device
Michael J. Terrell, Queen Mary University of London - London, UK; Andrew J. R. Simpson, Queen Mary University of London - London, UK; Mark B. Sandler, Queen Mary University of London - London, UK
A method and device are presented that allow novice and expert audio engineers to perform mixing using perceptual controls. In this paper we use Auditory Scene Analysis [Bregman, 1990, MIT Press, Cambridge] to relate the multitrack component signals of a mix to the perception of that mix. We define the multitrack components of a mix as a group of audio streams, which are transformed into sound streams by the act of reproduction and are ultimately perceived as auditory streams by the listener. The perceptual controls provide direct manipulation of the loudness balance within a mixture of sound streams, as well as of the overall mix loudness. The system employs a computational optimization strategy to perform automatic gain adjustments on the component audio streams, such that the intended loudness balance of the associated sound streams is produced. Perceptual mixing is performed using a complete auditory model, based on a model of loudness for time-varying sound streams [Glasberg and Moore, J. Audio Eng. Soc., vol. 50, 331-342 (2002 May)]. The use of the auditory model enables the loudness balance to be maintained automatically regardless of the listening level. Thus, a perceptual definition of the mix is presented that is independent of listening level, together with a practical method of realizing the mix.
P6-6 On the Use of a Haptic Feedback Device for Sound Source Control in Spatial Audio Systems
Frank Melchior, BBC Research and Development - Salford, UK; Chris Pike, BBC Research and Development - Salford, York, UK; Matthew Brooks, BBC Research and Development - Salford, UK; Stuart Grace, BBC Research and Development - Salford, UK
Next-generation spatial audio systems are likely to be capable of 3-D sound reproduction. Systems currently under discussion require the sound designer to position and manipulate sound sources in three dimensions. New intuitive tools, designed to meet the requirements of audio production environments, are needed to make efficient use of this new technology. This paper investigates a haptic feedback controller as a user interface for spatial audio systems. The paper gives an overview of conventional tools and controllers. A prototype has been developed based on the requirements of different tasks and reproduction methods. The implementation is described in detail, and the results of a user evaluation are given.
P6-7 Audio Level Alignment—Evaluation Method and Performance of EBU R 128 by Analyzing Fader Movements
Jon Allan, Luleå University of Technology - Piteå, Sweden; Jan Berg, Luleå University of Technology - Piteå, Sweden
A method is proposed for evaluating audio meters in terms of how well the engineer conforms to a level alignment recommendation and succeeds in achieving evenly perceived audio levels. The proposed method is used to evaluate different meter implementations, three conforming to the recommendation EBU R 128 and one conforming to EBU Tech 3205-E. In an experiment, engineers participated in a simulated live broadcast show, and the resulting fader movements were recorded. The movements were analyzed in terms of different characteristics: fader mean level, fader variability, and fader movement. Significant effects were found, showing that engineers act differently depending on the meter and recommendation at hand.
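The three fader characteristics named above could be computed from a sampled fader trace roughly as follows. The exact definitions used in the study are not given in the abstract, so this sketch assumes: mean level as the average fader position in dB, variability as its population standard deviation, and movement as the summed absolute change between successive samples. The trace format (a list of dB positions at a fixed rate) is also an assumption.

```python
import statistics


def fader_characteristics(trace_db):
    """Mean level, variability, and total movement of a fader trace in dB.

    The three definitions are assumptions, not the study's own.
    """
    mean_level = statistics.mean(trace_db)
    variability = statistics.pstdev(trace_db)
    movement = sum(abs(b - a) for a, b in zip(trace_db, trace_db[1:]))
    return mean_level, variability, movement
```

Computed per engineer and per meter condition, these three numbers can then be compared statistically across the meter implementations.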
P6-8 Balance Preference Testing Utilizing a System of Variable Acoustic Condition
Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Brett Leonard, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Scott Levine, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Grzegorz Sikora, Bang & Olufsen Deutschland GmbH - Pullach, Germany
In the modern world of audio production, there is a significant disconnect between the music mixing control room of the audio professional and the listening space of the consumer or end user. The goal of this research is to evaluate a series of acoustic conditions commonly found in such listening environments. Expert listeners are asked to perform basic balancing tasks under varying acoustic conditions. The listener can remain in position while motorized panels rotate behind a screen, exposing a different acoustic condition for each trial. Results show that this setup eliminates listener fatigue as a variable and quickly clears the subject’s aural memory, so that the subject is unable to adapt to the newly presented condition in each trial.
Sunday, May 5, 10:00 — 10:30 (Sala Alighieri)


Loudness Day: L2 - All Loudness Recommendations Are Equal—But Some Are More Equal than Others

Andrew Mason, BBC Research and Development - London, UK

This tutorial will explain the different loudness standards used in Europe (R 128), the US (A/85), and other countries such as Australia and Japan.
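EBU R 128 and ATSC A/85 both measure loudness with the ITU-R BS.1770 algorithm but specify different targets: -23 LUFS for R 128 and -24 LKFS for A/85 (LUFS and LKFS denote the same unit). Normalizing a program to either one therefore comes down to a gain offset, as in this small sketch; the table lists only these two targets and leaves out tolerances and other countries' recommendations.

```python
# Target integrated loudness per recommendation (LUFS and LKFS are equivalent).
# Only two entries are listed; tolerances are deliberately omitted.
TARGETS = {"EBU R 128": -23.0, "ATSC A/85": -24.0}


def normalization_gain(measured, recommendation):
    """Gain in dB that brings a program measured at `measured` LUFS to target."""
    return TARGETS[recommendation] - measured
```

The same program measured at -20 LUFS needs 3 dB of attenuation for R 128 but 4 dB for A/85, so the two recommendations differ by a constant 1 dB offset for any program.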


Sunday, May 5, 10:45 — 11:30 (Sala Alighieri)

Loudness Day: L3 - Loudness for Commercials—How Esthetics Change(d)

Matteo Milani
Alessandro Travaglini, Fox International Channels Italy - Guidonia Montecelio (RM), Italy
Rubens Zambelli
Carlos Zarattini, Discovery Networks Italy - Milan, Italy

Commercials have often been the number one complaint when it came to loudness problems and level jumps. The fear of being softer than the competition has led to overcompression of the spots, to an extremely narrow loudness range, and to the perception of the audio signal being constantly "smashed." "Free transients" and liberation from this loudness competition have finally arrived with the transition to loudness normalization, and for commercials in particular this is more than welcome. Three sound designers with extensive experience in producing detailed soundtracks for commercials will demonstrate how this new paradigm has changed, and is still changing, their approach, and show how the new dynamic possibilities can be used to great effect. Among the points discussed will be:

• Use of headroom and dynamic processors
• Use of sound effects (in particular low-frequency sounds)
• Speech definition and sound artifacts, and
• Short-term loudness limitations

Examples of their work will be played and explained.


Sunday, May 5, 11:45 — 13:15 (Sala Alighieri)

Loudness Day: L4 - Are Movies Too Loud? The Loudness Race Reaches the Cinema

Florian Camerer, ORF - Austrian TV - Vienna, Austria; EBU - European Broadcasting Union
Eelco Grimm, Grimm Audio - Utrecht, Netherlands

Cinema operators are receiving growing numbers of complaints from audiences about soundtracks that are "too loud." Simply turning them down also lowers the dialog level, which leaves the movie quieter but unintelligible. The EU has loudness standards that are now starting to be applied to cinemas, so standards may be needed to ensure that theaters receive soundtracks that comply with EU law. This workshop will investigate the issues involved and the work necessary to resolve them.


Sunday, May 5, 14:15 — 16:15 (Sala Alighieri)

Loudness Day: L5 - Make LUFS Not War

Thomas Lund, TC Electronic A/S - Risskov, Denmark
Florian Camerer, ORF - Austrian TV - Vienna, Austria; EBU - European Broadcasting Union
George Massenburg, Schulich School of Music, McGill University - Montreal, Quebec, Canada

Newly produced pop/rock music rarely sounds good on fine loudspeakers, commercials on TV are annoyingly loud, and a visit to the cinema may be a deafening experience. This is audio's dark middle ages, from which there will be little content for future generations to enjoy.

However, 2013 could be the year in which a renaissance again spreads from Italy. Transparent loudness normalization has arrived in radio, TV, and on the iPod, and the panel sets out to describe the far-reaching implications this will have on audio production at large. Hear about new quality-defining criteria, and save your next album for generations to come.


Sunday, May 5, 16:30 — 17:30 (Sala Alighieri)


Loudness Day: L6 - Give Peaks a Chance

Thomas Lund, TC Electronic A/S - Risskov, Denmark

Hearing is our most acute temporal sense by far, but the terms we have for describing dynamic changes in audio are few and not well defined. This session is about micro-dynamics and macro-dynamics in music and in speech, what effect they have, and what it takes to actually register them as a listener. Engineers be warned. Though audio examples are given, the presentation will primarily be based on anatomy, physiology, and psychology.

[A follow-up session to the Loudness War panel]


Sunday, May 5, 17:45 — 18:15 (Sala Alighieri)


Loudness Day: L8 - Loudness in Radio—The Next Step

Florian Camerer, ORF - Austrian TV - Vienna, Austria; EBU - European Broadcasting Union

After the start of the switchover from peak normalization to loudness leveling in TV, the logical next step is the move into radio. One could argue that, due to the hypercompression used in music production and by pop/rock stations over the last years, loudness differences are not an issue in radio. But that somewhat cynical view is definitely not an excuse to leave out the vast world of radio programming. On the contrary! Albeit from a different angle and with other strategies, loudness-based production has many benefits for supercompressed radio stations as well. In this session those differences and challenges will be examined, and an outlook on the forthcoming work of the EBU loudness group PLOUD in that area will be given.


