AES New York 2019
Engineering Brief EB8
EB8 - Applications in Audio
Saturday, October 19, 3:30 pm — 4:30 pm
Sunil G. Bharitkar, HP Labs., Inc. - San Francisco, CA, USA
EB8-1 Vibrary: A Consumer-Trainable Music Tagging Utility—Scott Hawley, Belmont University - Nashville, TN, USA; Jason Bagley, Art+Logic - Pasadena, CA, USA; Brett Porter, Art+Logic - Fanwood, NJ, USA; Daisey Traynham, Art+Logic - Pasadena, CA, USA
We present the engineering underlying a consumer application that helps music industry professionals find audio clips and samples of personal interest within their large audio libraries, which typically consist of heterogeneously labeled clips supplied by various vendors. We enable users to train an indexing system on their own custom tags (e.g., instruments, genres, moods) by means of convolutional neural networks operating on spectrograms. Since the intended users are not data scientists and may not possess the required computational resources (i.e., Graphics Processing Units, GPUs), our primary contributions consist of (i) designing an intuitive user experience for a local client application to help users create representative spectrogram datasets, and (ii) "seamless" integration with a cloud-based GPU server for efficient neural network training.
Engineering Brief 562
EB8-2 Casualty Accessible and Enhanced (A&E) Audio: Trialling Object-Based Accessible TV Audio—Lauren Ward, University of Salford - Salford, UK; BBC R&D - Salford, UK; Matthew Paradis, BBC Research and Development - London, UK; Ben Shirley, University of Salford - Salford, Greater Manchester, UK; Salsa Sound Ltd - Salford, Greater Manchester, UK; Laura Russon, BBC Studios - Cardiff, Wales, UK; Robin Moore, BBC Research & Development - Salford, UK; Rhys Davies, BBC Studios - Cardiff, Wales, UK
Casualty Accessible and Enhanced (A&E) Audio is the first public trial of accessible audio technology using a narrative importance approach. This trial allows viewers to personalize the audio of an episode of the BBC’s "Casualty" drama series based on their hearing needs. Using a simple interface, the audio can be varied between the broadcast mix and an accessible mix containing narratively important non-speech sounds, enhanced dialogue, and attenuated background sounds. This paper describes the trial’s development, implementation, and evaluation by normal-hearing and hard-of-hearing listeners (n=5209 on 20/8/2019). 299 participants also completed a survey, rating the technology 3.6/5 stars. 73% reported the technology made the content more enjoyable or easier to understand.
Engineering Brief 563
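The abstract for EB8-2 describes a single control that moves the audio between the broadcast mix and an accessible mix. The following is a minimal sketch of one way such a control could work, assuming each audio object carries a target gain for the fully accessible mix and that gains are interpolated linearly in decibels. The class `AudioObject`, the functions, and all gain values are hypothetical illustrations; the abstract does not state how the trial maps the control to object gains.

```python
from dataclasses import dataclass


@dataclass
class AudioObject:
    name: str
    # Hypothetical gain (dB) applied to this object in the fully accessible mix.
    accessible_gain_db: float


def object_gains_db(objects, control):
    """Interpolate each object's gain in dB.

    control = 0.0 gives the broadcast mix (0 dB for every object);
    control = 1.0 gives the accessible mix (each object's target gain).
    """
    if not 0.0 <= control <= 1.0:
        raise ValueError("control must lie in [0, 1]")
    return {obj.name: control * obj.accessible_gain_db for obj in objects}


def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


# Assumed example objects: dialogue is enhanced, background is attenuated,
# and a narratively important non-speech sound is left unchanged.
EXAMPLE_OBJECTS = [
    AudioObject("dialogue", 3.0),
    AudioObject("important_effect", 0.0),
    AudioObject("background", -12.0),
]
```

Calling `object_gains_db(EXAMPLE_OBJECTS, 0.5)` returns the gains for a setting halfway between the two mixes; `db_to_linear` converts each value to the factor applied to the object's samples.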
EB8-3 Generative Modeling of Metadata for Machine Learning Based Audio Content Classification—Sunil G. Bharitkar, HP Labs., Inc. - San Francisco, CA, USA
Automatic content classification is an essential tool in multimedia applications. Present research on audio-based classifiers looks at short- and long-term analysis of signals, using both temporal and spectral features. In this paper we present a neural network that classifies content as movie (cinematic, TV shows), music, or voice using metadata contained in either the audio or the video stream. Toward this end, statistical models of the various metadata are created, since a large metadata dataset is not available. Subsequently, synthetic metadata are generated from these statistical models and input to the ML classifier as feature vectors. The resulting classifier is then able to classify real-world content (e.g., YouTube) with an accuracy of ~90% and very low latency (on average ~7 ms) based on real-world metadata.
Engineering Brief 564
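The pipeline outlined in the EB8-3 abstract (fit statistical models to scarce metadata, sample synthetic metadata, train a classifier on the samples) can be sketched as follows. This is an illustration under stated assumptions, not the author's method: the metadata fields (channel count and bitrate), the seed values, and the independent per-feature Gaussian models are all hypothetical, and a nearest-centroid classifier stands in for the neural network described in the abstract.

```python
import random
import statistics

# Hypothetical seed metadata per class: (channel count, bitrate in kbps).
SEED = {
    "movie": [(6, 384), (6, 448), (8, 640)],
    "music": [(2, 256), (2, 320), (2, 192)],
    "voice": [(1, 32), (1, 64), (1, 48)],
}


def fit_gaussians(samples_by_class):
    """Fit an independent Gaussian (mean, stdev) to each feature of each class."""
    models = {}
    for label, rows in samples_by_class.items():
        columns = list(zip(*rows))
        models[label] = [(statistics.mean(c), statistics.stdev(c)) for c in columns]
    return models


def sample_synthetic(models, n_per_class, rng):
    """Draw labeled synthetic feature vectors from the fitted models."""
    data = []
    for label, params in models.items():
        for _ in range(n_per_class):
            data.append(([rng.gauss(mu, sigma) for mu, sigma in params], label))
    return data


def train_centroids(data):
    """Stand-in classifier: compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for vec, label in data:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], vec)),
    )
```

A typical use would be `centroids = train_centroids(sample_synthetic(fit_gaussians(SEED), 200, random.Random(0)))`, followed by `classify(centroids, [2, 256])` on a real metadata vector. In practice the features would need scaling before distance-based classification, since bitrate dominates channel count in this toy example.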
EB8-4 Individual Headphone Equalization at the Eardrum with New Apps for Computers and Cellphones—David Griesinger, David Griesinger Acoustics - Cambridge, MA, USA
Ear canal resonances that concentrate energy on the eardrum are highly individual, and headphones alter or eliminate them. The result is inaccurate timbre and in-head localization. We have developed computer apps that use an equal-loudness test to match the sound spectrum at the eardrums from a pair of headphones to the spectrum at the eardrums from a frontal loudspeaker. The result is precise timbre and frontal localization. The improvement in sound is startling. In this presentation we will demonstrate the process and the easy-to-use software that is now available for VST, AAX, Windows, Mac, and Android and iOS cellphones. [Presentation only; not available in E-Library]