ECMA-407: New Approaches to 3D Audio Content Data Rate Reduction with RVC-CAL
J. J. Ahmad, C. Alberti, J. W. (J.) Hong, B. Leonard, M. Mattavelli, C. Par, S. Quackenbush, and W. Woszczyk, "ECMA-407: New Approaches to 3D Audio Content Data Rate Reduction with RVC-CAL," Paper 9218, (2014 October).
Abstract: Inverse problems have only been known in spatial audio for a very short time; their only solution, called "inverse coding" in the literature, is essentially based on time-level modeling. Inverse problems, however, unlike parametric coding, require only an initial transmission of spatial side information, and thus can achieve much lower bitrates than could be achieved with parametric coding. For instance, inversely coded NHK 22.2 multichannel signals in combination with USAC may be delivered at bitrates as low as 48 kb/s, and optimum performance can be achieved in combination with commercially available HE-AAC v2 at 256 kb/s, without any scaling of output channel order and with moderate complexity in the decoder. A new way to perceptually eliminate redundant information makes use of invariant theory inside the encoder. Invariants with Gaussian processes were unknown until 2010 and have represented one major problem in non-applied mathematics for more than a century: David Hilbert’s proof that these coefficient functions form a field then insinuated that their existence in random processes was very likely. As will be shown, when applied to spatial audio coding, invariants represent a numerically efficient and perceptually powerful algebraic tool. We likewise present a 3D audio codec design for signals up to NHK 22.2 with two profiles: one profile, based on co-incidence, is able to code and synthesize a full Higher Order Ambisonics soundfield, up to order 6, at 48 kb/s, 64 kb/s, 96 kb/s, 128 kb/s, and above. The second profile, which optimizes de-correlation for phantom source imaging, codes channel-based or object-based signals at the same bitrates. The technology has been specified as the world’s first international 3D audio standard, ECMA-407, and may be further extended with static models in the frequency domain.
A preliminary version of this technology, based on a downmix in the frequency domain, was submitted to MPEG’s "Phase 2" selection of low-bitrate 3D coding technologies and made use of a USAC binary, which unfortunately offered no tuning options.
@article{ahmad2014ecma-407:,
author={Ahmad, Junaid Jameel and Alberti, Claudio and Hong, Jung Wook (Jonathan) and Leonard, Brett and Mattavelli, Marco and Par, Clemens and Quackenbush, Schuyler and Woszczyk, Wieslaw},
journal={Journal of the Audio Engineering Society},
title={ECMA-407: New Approaches to 3D Audio Content Data Rate Reduction with RVC-CAL},
year={2014},
volume={},
number={},
pages={},
doi={},
month={October},
}
TY - paper
TI - ECMA-407: New Approaches to 3D Audio Content Data Rate Reduction with RVC-CAL
SP -
EP -
AU - Ahmad, Junaid Jameel
AU - Alberti, Claudio
AU - Hong, Jung Wook (Jonathan)
AU - Leonard, Brett
AU - Mattavelli, Marco
AU - Par, Clemens
AU - Quackenbush, Schuyler
AU - Woszczyk, Wieslaw
PY - 2014
JO - Journal of the Audio Engineering Society
IS -
VO -
VL -
Y1 - October 2014
Authors:
Ahmad, Junaid Jameel; Alberti, Claudio; Hong, Jung Wook (Jonathan); Leonard, Brett; Mattavelli, Marco; Par, Clemens; Quackenbush, Schuyler; Woszczyk, Wieslaw
Affiliations:
Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland; McGill University, Montreal, QC, Canada; GKL Audio Inc., Montreal, QC, Canada; University of Nebraska at Omaha, Omaha, NE; McGill University, Montreal, Quebec, Canada; Swiss Audec, Morges, Switzerland; Audio Research Labs, Scotch Plains, USA (See document for exact affiliation information.)
AES Convention:
137 (October 2014)
Paper Number:
9218
Publication Date:
October 8, 2014
Subject:
Spatial Audio
Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=17541