AES E-Library

AES E-Library Search Results

A System Architecture for Semantically Informed Rendering of Object-Based Audio

Object-based audio promises format-agnostic reproduction and extensive personalization of spatial audio content. However, in practical listening scenarios, such as consumer audio, ideal reproduction is typically not possible. Maximizing the quality of the listening experience therefore requires a different approach, for example modifying metadata to adjust for the reproduction layout or for personalization choices. This paper proposes a novel system architecture for semantically informed rendering (SIR), which combines object audio rendering with high-level processing of object metadata. In many cases, this processing uses novel, advanced metadata describing the objects to adjust the audio scene optimally to the reproduction system or to listener preferences. The proposed system is evaluated with several adaptation strategies, including semantically motivated downmix to layouts with few loudspeakers, manipulation of perceptual attributes, perceptual reverberation compensation, and orchestration of mobile devices for immersive reproduction. These examples demonstrate how SIR can significantly improve the media experience and provide advanced personalization controls, for example by maintaining smooth object trajectories on systems with few loudspeakers or by providing personalized envelopment levels. An example implementation of the proposed system architecture is described and provided as an open, extensible software framework that combines object-based audio rendering and high-level processing of advanced object metadata.

Open Access

Authors: Franck, Andreas; Francombe, Jon; Woodcock, James; Hughes, Richard; Coleman, Philip; Menzies, Dylan; Cox, Trevor J.; Jackson, Philip J.B.; Fazi, Filippo Maria
Affiliations: Institute of Sound and Vibration Research, University of Southampton, Southampton, Hampshire, UK; BBC Research and Development, Dock House, MediaCityUK, Salford, UK; Acoustics Research Centre, University of Salford, Salford, UK; Institute of Sound Recording, University of Surrey, Guildford, Surrey, UK; Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, Surrey, UK (See document for exact affiliation information.)
JAES Volume 67 Issue 7/8 pp. 498-509; July 2019
Publication Date: August 14, 2019

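To make the rendering half of such an architecture concrete, the following is a minimal sketch of a constant-power (sine/cosine) amplitude panner that maps one object's azimuth to gains for a stereo loudspeaker pair. It is an illustrative assumption, not the renderer described in the paper: the function names `pan_stereo` and `render_object`, the ±30° loudspeaker span, the "positive azimuth = left" convention, and the linear azimuth-to-pan-angle mapping (simpler than, e.g., the tangent law) are all hypothetical choices. In an SIR-style system, a metadata-processing stage would adjust object positions or gains before a panner like this is applied.

```python
import math


def pan_stereo(azimuth_deg: float, span_deg: float = 30.0) -> tuple[float, float]:
    """Constant-power (sine/cosine) pan of one object onto a stereo pair.

    Hypothetical sketch: the left loudspeaker is assumed at +span_deg and the
    right at -span_deg (positive azimuth = towards the left); azimuths outside
    that arc are clamped to the nearest loudspeaker.
    """
    az = max(-span_deg, min(span_deg, azimuth_deg))
    # Map azimuth in [-span, +span] linearly to theta in [0, pi/2].
    theta = (az + span_deg) / (2.0 * span_deg) * (math.pi / 2.0)
    # sin^2 + cos^2 = 1, so total power is independent of the object position.
    return math.sin(theta), math.cos(theta)


def render_object(samples: list[float], azimuth_deg: float) -> tuple[list[float], list[float]]:
    """Render one mono object signal to (left, right) loudspeaker feeds."""
    g_left, g_right = pan_stereo(azimuth_deg)
    return [g_left * s for s in samples], [g_right * s for s in samples]


if __name__ == "__main__":
    for az in (-30.0, 0.0, 15.0, 30.0):
        g_left, g_right = pan_stereo(az)
        print(f"az={az:+5.1f} deg  L={g_left:.3f}  R={g_right:.3f}")
```

The constant-power law keeps perceived loudness roughly stable as an object moves between the two loudspeakers, which is one simple way to keep a trajectory smooth on a layout with few loudspeakers.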

Audio Forensics: Keeping up in the Age of Smartphones and Fakery

[Feature] Some interesting issues in audio forensics arise from the widespread use of smartphones for recording chunks of audio, and how one can detect edits and copying or moving of the resulting files. The idea that microphones leave unique signatures on the signal is an intriguing one for further investigation, and identification based on frequency-response features seems a promising avenue. In the area of forgery detection there are big challenges for the development of new technology as systems get more and more clever at faking human characteristics or hiding the results of artificial processing. Papers from the 2019 Audio Forensics Conference are summarized.

Author: Rumsey, Francis
JAES Volume 67 Issue 7/8 pp. 617-622; July 2019
Publication Date: August 14, 2019

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.

Estimation of Late Reverberation Characteristics from a Single Two-Dimensional Environmental Image Using Convolutional Neural Networks

In augmented-reality (AR) applications, reproducing acoustic reverberation is essential for an immersive audio experience. The audio components of an AR system should simulate the acoustics of the environment that the users experience. Previously, in virtual-reality (VR) applications, sound engineers could program all of the reverberation parameters for a particular scene in advance or for a fixed user position. For AR applications, however, adjusting the reverberation parameters with conventional procedures is difficult, because the unlimited range of such parameters cannot be programmed in advance. It is therefore necessary to estimate the reverberation characteristics dynamically, based on the environments in which the users move. Considering that skilled acoustic engineers can estimate reverberation parameters from images of a room without performing any measurements, we trained convolutional neural networks to estimate the reverberation parameters from two-dimensional images. The proposed method does not require simulating sound propagation using 3D reconstruction techniques.

Authors: Kon, Homare; Koike, Hideki
Affiliation: School of Computing, Tokyo Institute of Technology, Tokyo, Japan
JAES Volume 67 Issue 7/8 pp. 540-548; July 2019
Publication Date: August 14, 2019

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.

Influence of Visual Stimuli on Perceptual Attributes of Spatial Audio

Although audio is often reproduced with a visual counterpart, the audio technology for these systems is often researched and evaluated in isolation from the visual component. Previous research indicates that the auditory and visual modalities are not processed separately by the brain. For example, visual stimuli can influence ratings of audio quality and vice versa. This paper presents an experiment to investigate the influence of visual stimuli on a set of attributes relevant to the perception of spatial audio. Eighteen participants took part in a paired comparison listening test in which they were asked to judge pairs of stimuli rendered with fourteen-, five-, and two-channel systems using ten perceptual attributes. The stimuli were presented in both audio-only and audiovisual conditions. The results show a significant and large effect of the loudspeaker configuration for all tested attributes other than overall spectral balance and depth of field. The effect of visual stimuli was found to be small but significant for the attributes realism, sense of space, and spatial clarity. These results suggest that evaluations of audiovisual technologies that aim to evoke a sense of realism or presence should consider the influence of both the audio and visual modalities.

Open Access

Authors: Woodcock, James; Davies, William J.; Cox, Trevor J.
Affiliation: Acoustics Research Centre, University of Salford, Salford, UK
JAES Volume 67 Issue 7/8 pp. 557-567; July 2019
Publication Date: August 14, 2019

Multichannel Compensated Amplitude Panning, An Adaptive Object-Based Reproduction Method

Conventional approaches to surround sound panning require loudspeakers to be distributed over the regions where images are required. However, in many listening situations it is not practical or desirable to place loudspeakers at some positions, such as behind or above the listener. Compensated Amplitude Panning (CAP) is an object-based reproduction method that adapts dynamically to the listener’s head orientation to provide stable images in any direction in the frequency range up to approximately 1000 Hz. This is achieved by accurately controlling the Interaural Time Difference (ITD) cue. CAP can also provide images in the near-field range by controlling the Interaural Level Difference (ILD). It was previously shown that, with two loudspeakers and full six-degrees-of-freedom head tracking, low-frequency images can be created in any direction, although excessive gain is required for some listener orientations. With three loudspeakers, however, images in all directions can be reproduced with moderate gain. Adding more loudspeakers to a stereo configuration does not worsen performance. For comparison, an Ambisonic approach with position tracking can reproduce horizontal surround images with three frontal loudspeakers, and full 3D images with four loudspeakers.

Open Access

Authors: Menzies, Dylan; Fazi, Filippo Maria
Affiliation: Institute of Sound and Vibration Research, University of Southampton, Southampton, UK
JAES Volume 67 Issue 7/8 pp. 549-556; July 2019
Publication Date: August 14, 2019

Near-Field Object-Based Audio Rendering on Flat-Panel Displays

Devices such as smartphones and televisions are beginning to employ screens as both a video display and a loudspeaker. This multimodal device is well suited for object-based encoding of audio, where audio objects may be rendered at the location corresponding to the visual images. The audio object renderer must be configured to account for variations in panel behavior at different excitation frequencies. This research proposes a multiband crossover network for the audio object renderer that separates the signal for each audio object into low, midrange, and high-frequency bands. Each band is then reproduced on the panel using a different vibration rendering technique. The different rendering techniques are realized by employing a combination of actuator array processing and the natural vibration localization characteristics of point-driven panels. The cutoff frequencies for each band are determined by the physical properties of the panel. Experiments on a prototype panel employing the multiband crossover system demonstrate that the vibration response behaves as predicted in each frequency range. This system provides a platform for rendering spatial audio on devices when listeners are close to the screen, and where there are restrictions related to weight, power consumption, and form-factor.

Authors: Heilemann, Michael C.; Anderson, David A.; Bocko, Mark F.
Affiliations: University of Rochester, Rochester, NY, USA; University of Pittsburgh, Pittsburgh, PA, USA (See document for exact affiliation information.)
JAES Volume 67 Issue 7/8 pp. 531-539; July 2019
Publication Date: August 14, 2019

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.

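The band-splitting step of such a renderer can be illustrated with a minimal three-band complementary crossover. This is a sketch under stated assumptions, not the paper's filter design: it uses first-order one-pole low-pass filters and derives the upper bands by subtraction, so the three bands always sum back exactly to the input. The cutoff values (`fc_low`, `fc_high`) and sample rate below are hypothetical; the abstract only states that the real cutoffs follow from the panel's physical properties.

```python
import math


def one_pole_lowpass(x: list[float], fc: float, fs: float) -> list[float]:
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * fc / fs)  # smoothing coefficient
    y, state = [], 0.0
    for sample in x:
        state += a * (sample - state)
        y.append(state)
    return y


def three_band_split(x: list[float], fc_low: float, fc_high: float, fs: float):
    """Split x into (low, mid, high) bands that sum exactly to x.

    Hypothetical sketch of a subtractive (complementary) crossover.
    """
    low = one_pole_lowpass(x, fc_low, fs)
    rest = [s - l for s, l in zip(x, low)]     # content above roughly fc_low
    mid = one_pole_lowpass(rest, fc_high, fs)  # part of the remainder below roughly fc_high
    high = [r - m for r, m in zip(rest, mid)]  # what is left above roughly fc_high
    return low, mid, high


if __name__ == "__main__":
    fs = 48_000.0
    fc_low, fc_high = 300.0, 3_000.0  # hypothetical, panel-dependent in practice
    x = [math.sin(2 * math.pi * 1_000.0 * n / fs) for n in range(480)]
    low, mid, high = three_band_split(x, fc_low, fc_high, fs)
    err = max(abs(s - (l + m + h)) for s, l, m, h in zip(x, low, mid, high))
    print(f"max reconstruction error: {err:.2e}")
```

Each band would then be routed to its own vibration rendering technique; the subtractive structure guarantees perfect summation, at the cost of shallow (first-order) band slopes.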

Personalization in Object-based Audio for Accessibility: A Review of Advancements for Hearing Impaired Listeners

Hearing loss is widespread and significantly impacts an individual’s ability to engage with broadcast media. Access for people with impaired hearing can be improved through new object-based audio personalization methods. Utilizing the literature on hearing loss and intelligibility, this paper develops three dimensions that have the potential to improve intelligibility: spatial separation, speech-to-noise ratio, and redundancy. These can be personalized, individually or concurrently, using object-based audio. A systematic review of all work in object-based audio personalization is then undertaken. These dimensions are utilized to evaluate each project’s approach to personalization, identifying successful approaches, commercial challenges, and the next steps required to ensure continuing improvements to broadcast audio for hard-of-hearing individuals. Although no single solution will address all problems faced by individuals with hearing impairments when accessing broadcast audio, several approaches covered in this review show promise.

Open Access

Authors: Ward, Lauren A.; Shirley, Ben G.
Affiliation: Acoustics Research Centre, University of Salford, Manchester, UK
JAES Volume 67 Issue 7/8 pp. 584-597; July 2019
Publication Date: August 14, 2019

Quality of Experience Tests of an Object-based Radio Reproduction App on a Mobile Device

Object-based audio (OBA) provides many enhancements and new features; yet, many of these require the user to be active in choosing and selecting the functionalities via visual representations and graphical interfaces. Basic investigations of the user experience of OBA within the EU research project ORPHEUS helped to identify the necessary criteria and dimensions. The user experience in object-based media comprises three dimensions: audio, information, and usability experience. During the project, a radio app for mobile devices was designed, developed, and tested. It includes many of the end-user features available with OBA. A first Quality of Experience (QoE) test to evaluate the radio app was carried out at JOSEPHS, an open innovation lab located in Nuremberg, Germany. The second QoE test took place at b<>com’s user experience lab in Rennes, France. For both investigations, the main objective was to find out how users access, interact with, and appreciate the various new features of OBA. For the first test, two typical user and listening scenarios were simulated: mobile listening and listening at home. The general acceptance of the new features and functions that come with OBA is very high, and usability is rated high. The test users also suggested further possibilities for improvement. The very good perceived sound quality, with surround sound over loudspeakers or binaural reproduction over headphones, impressed the listeners most. The second test focused mainly on comparing and evaluating the features from acceptability to acceptance, or from expectations to fulfillment. In the second test, the most appreciated feature was the ability to set the foreground-to-background balance; this feature ranked second in the first test. The importance of speech intelligibility for radio and TV is a known and much-discussed issue. Now, with OBA and the Next Generation Audio (NGA) codec MPEG-H, solutions are at hand to address it.

Open Access

Authors: Silzle, Andreas; Schmidt, Rebekka; Bleisteiner, Werner; Epain, Nicolas; Ragot, Martin
Affiliations: Fraunhofer IIS, Erlangen, Germany; Fraunhofer SCS, Nürnberg, Germany; Bayerischer Rundfunk, München, Germany; b<>com, Cesson-Sévigné, France (See document for exact affiliation information.)
JAES Volume 67 Issue 7/8 pp. 568-583; July 2019
Publication Date: August 14, 2019

Source Separation for Enabling Dialogue Enhancement in Object-based Broadcast with MPEG-H

Low intelligibility of narration or dialogue resulting from high background levels is one of the most common complaints in broadcasting. Even when intelligibility is not compromised, listeners may have personal preferences that differ from the mix being broadcast. Dialogue Enhancement (DE) enables the delivery of an optimal dialogue mix to each listener, whether in terms of intelligibility or aesthetic preference. This makes DE one of the most promising applications of the user interactivity enabled by object-based audio broadcasting, such as MPEG-H. This paper investigates the use of source separation methods to extract dialogue and background from the complex sound mixture, enabling object-based broadcasting when dialogue is not available from the production process, as is the case, for example, with legacy content. The presented source separation technology integrates several separation approaches with known limitations into a more powerful overall architecture. In addition, the paper evaluates the subjective benefit of DE using the Adjustment/Satisfaction Test, in which the listeners made extensive use of the dialogue level personalization. The high variance of the preferred dialogue level among the listeners indicates the need for this functionality. Even when an imperfect separation result was used for enabling DE, the possibility of personalizing the dialogue level led to increased listener satisfaction.

Open Access

Authors: Paulus, Jouni; Torcoli, Matteo; Uhle, Christian; Herre, Jürgen; Disch, Sascha; Fuchs, Harald
Affiliations: Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany; International Audio Laboratories Erlangen, Erlangen, Germany, a joint institution of Universität Erlangen-Nürnberg and Fraunhofer IIS (See document for exact affiliation information.)
JAES Volume 67 Issue 7/8 pp. 510-521; July 2019
Publication Date: August 14, 2019

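Once dialogue and background exist as separate objects, whether delivered by production or estimated by source separation, the personalization step can be modeled as a user-selected gain on the dialogue object before remixing. The sketch below assumes exactly that simple model; `remix_dialogue` and its decibel parameter are hypothetical names, and neither the paper's separation method nor MPEG-H's actual interactivity controls are modeled here.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def remix_dialogue(dialogue: list[float], background: list[float],
                   dialogue_db: float = 0.0) -> list[float]:
    """Remix separated dialogue and background with a user-chosen dialogue level.

    Hypothetical sketch: 0 dB reproduces the original mix (dialogue + background).
    """
    if len(dialogue) != len(background):
        raise ValueError("dialogue and background must have the same length")
    g = db_to_gain(dialogue_db)
    return [g * d + b for d, b in zip(dialogue, background)]


if __name__ == "__main__":
    dialogue = [0.5, -0.5, 0.25]
    background = [0.1, 0.1, -0.1]
    print(remix_dialogue(dialogue, background, 0.0))  # original balance
    print(remix_dialogue(dialogue, background, 6.0))  # dialogue raised by 6 dB
```

Because each listener picks their own `dialogue_db`, the same broadcast can serve the widely varying preferred dialogue levels reported in the listening test.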

Spatial Coding of Complex Object-Based Program Material

Object-based audio (OBA) program material is challenging to distribute over low-bandwidth channels and costly to render for thin clients. This research proposes a dynamic object-grouping solution that can represent a complex object-based scene as an equivalent reduced set of object groups while maintaining perceptually transparent rendering quality. This solution is a type of spatial coding. This paper introduces a real-time greedy simplification technique that addresses limitations of previous approaches by modeling spatial release from masking and distributing input objects into multiple output groups. The core algorithm is extended to preserve other types of artistic metadata beyond object position. Results of perceptual tests show that this solution can achieve a 10:1 reduction in object count while maintaining high-quality audio playback and rendering flexibility at the endpoint. Spatial coding does not require perceptual coding of the objects’ audio essence, but it can be further combined with audio coding tools to deliver OBA content at low bit rates. This makes spatial coding a key component of an OBA production and distribution workflow. Object-based content creation, distribution, and rendering workflows require novel methods to process, combine, encode, and simplify complex auditory scenes, allowing endpoint rendering flexibility, efficiency, and adaptability, as well as the means to cater for personalized experiences.

Authors: Breebaart, Jeroen; Cengarle, Giulio; Lu, Lie; Mateos, Toni; Purnhagen, Heiko; Tsingos, Nicolas
Affiliation: Dolby Laboratories
JAES Volume 67 Issue 7/8 pp. 486-497; July 2019
Publication Date: August 14, 2019

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.

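To illustrate the general idea of reducing an object scene to fewer groups, the sketch below uses a deliberately simple greedy scheme: seed the first group with the loudest object, repeatedly promote the object with the highest "loudness times distance to the nearest existing centroid" score, then assign every object to its nearest centroid. This is a stand-in under loud assumptions, not the paper's algorithm: it ignores spatial release from masking, assigns each object to exactly one group instead of distributing it across several, and the `Obj` type, its `loudness` weight, and the `max_groups` parameter are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    pos: tuple[float, float, float]  # hypothetical Cartesian position (x, y, z)
    loudness: float                  # hypothetical importance weight, >= 0


def greedy_group(objects: list[Obj], max_groups: int):
    """Greedily pick up to max_groups centroids, then assign objects to the nearest one.

    Simplified sketch: no masking model, and each object joins exactly one group.
    Returns (centroid_positions, groups), where groups[i] lists object names.
    """
    if not objects or max_groups < 1:
        return [], []
    # The loudest object seeds the first group.
    centroids = [max(objects, key=lambda o: o.loudness).pos]

    def score(o: Obj) -> float:
        # Loud objects far from every existing centroid score highest.
        return o.loudness * min(math.dist(o.pos, c) for c in centroids)

    while len(centroids) < max_groups:
        best = max(objects, key=score)
        if score(best) == 0.0:
            break  # remaining objects already coincide with centroids (or are silent)
        centroids.append(best.pos)

    groups: list[list[str]] = [[] for _ in centroids]
    for o in objects:
        nearest = min(range(len(centroids)), key=lambda k: math.dist(o.pos, centroids[k]))
        groups[nearest].append(o.name)
    return centroids, groups


if __name__ == "__main__":
    scene = [
        Obj("dialogue", (0.0, 1.0, 0.0), 1.0),
        Obj("fx_left", (-1.0, 0.2, 0.0), 0.5),
        Obj("fx_left2", (-0.9, 0.3, 0.0), 0.4),
        Obj("ambience_rear", (0.0, -1.0, 0.0), 0.3),
    ]
    centroids, groups = greedy_group(scene, max_groups=2)
    for c, members in zip(centroids, groups):
        print(c, members)
```

A production system would additionally mix each group's member signals into one group signal and derive group metadata (position, and other artistic metadata) from its members; that step is omitted here.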
