144th AES CONVENTION Paper Session P10: Audio Coding, Analysis, and Synthesis

AES Milan 2018

P10 - Audio Coding, Analysis, and Synthesis

Thursday, May 24, 11:15 — 12:45 (Scala 2)

Jürgen Herre, International Audio Laboratories Erlangen - Erlangen, Germany; Fraunhofer IIS - Erlangen, Germany

P10-1 Bandwidth Extension Method Based on Generative Adversarial Nets for Audio Compression
Qingbo Huang, Peking University - Beijing, China; Xihong Wu, Peking University - Beijing, China; Tianshu Qu, Peking University - Beijing, China
Reducing the bandwidth of the audio signal can significantly improve the compression ratio of a core encoder, but it degrades perceived audio quality. This paper proposes a bandwidth extension method based on generative adversarial nets (GAN) that extends the bandwidth of an audio signal to create a more natural sound. The method uses a GAN as a generative model to fit the distribution of the high-frequency MDCT coefficients of the audio signal. Through the minimax two-player game between generator and discriminator, more natural high-frequency information can be estimated. On this basis, a codec system was built. To evaluate the proposed bandwidth extension system, MUSHRA listening tests were carried out; the results show performance comparable to HE-AAC.
Convention Paper 9954
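
The abstract describes the GAN as modeling the MDCT coefficients of the high-frequency band. As a minimal sketch, assuming a sine window, a single frame, and an illustrative cutoff (the sample rate, frame size, cutoff, and the helper names `mdct` and `split_bands` are all hypothetical, not taken from the paper), the following Python shows how one frame's MDCT coefficients divide into the low band a core encoder would keep and the high band a bandwidth extension model would have to generate:

```python
import numpy as np


def mdct(frame: np.ndarray) -> np.ndarray:
    """Direct O(N^2) MDCT of one 2N-sample frame using a sine window.

    Returns N coefficients. Written for clarity, not speed.
    """
    two_n = len(frame)
    if two_n % 2 != 0:
        raise ValueError("frame length must be even (2N)")
    n_half = two_n // 2
    n = np.arange(two_n)
    k = np.arange(n_half)
    window = np.sin(np.pi * (n + 0.5) / two_n)  # sine window of length 2N
    # MDCT kernel: cos(pi/N * (n + 1/2 + N/2) * (k + 1/2)), shape (N, 2N)
    basis = np.cos(np.pi / n_half * (n[None, :] + 0.5 + n_half / 2) * (k[:, None] + 0.5))
    return basis @ (window * frame)


def split_bands(coeffs: np.ndarray, sample_rate: float, cutoff_hz: float):
    """Split MDCT coefficients into (low band, high band) at cutoff_hz.

    Bin k of an N-bin MDCT is centred at (k + 1/2) * sample_rate / (2N).
    """
    n_half = len(coeffs)
    centres = (np.arange(n_half) + 0.5) * sample_rate / (2 * n_half)
    low = centres < cutoff_hz
    return coeffs[low], coeffs[~low]


if __name__ == "__main__":
    fs = 48_000.0      # hypothetical sample rate
    frame_len = 2048   # hypothetical frame length (2N), so N = 1024 bins
    t = np.arange(frame_len) / fs
    x = np.sin(2 * np.pi * 1000.0 * t)  # 1 kHz test tone
    low_band, high_band = split_bands(mdct(x), fs, cutoff_hz=8000.0)
    print(len(low_band), len(high_band))  # 341 683
```

In a GAN-based scheme of the kind described, the core encoder would transmit something like `low_band`, and the generator would be trained to produce plausible `high_band` values; the GAN itself is not sketched here.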

P10-2 Device-Specific Distortion Observed in Portable Devices Available for Recording Device Identification
Akira Nishimura, Tokyo University of Information Sciences - Chiba-shi, Japan
This study addresses device-specific distortion observed in recorded audio, with the aim of identifying the built-in system-on-a-chip (SoC) of a portable device. A swept sinusoidal signal is emitted from a loudspeaker and recorded by each portable device under study. Spectral analysis of the recorded signals reveals three types of distortion: folded components at frequencies symmetrical to the signal component about 4 kHz and 8 kHz; non-harmonic, non-subharmonic distortion components at frequencies 4 kHz below, and multiples of 4 kHz above, the signal frequency; and a mixture of non-subharmonic and folded components in the low-frequency region. These distortions can also be observed in the correlation matrix of temporal amplitude variations across frequencies, derived from recorded speech signals.
Convention Paper 9955
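
One reading of the distortion pattern in the abstract can be written as simple frequency arithmetic: mirroring a signal frequency f about 4 kHz and 8 kHz gives folded components at 8000 - f and 16000 - f, and the non-harmonic components sit at f - 4000 and f + m * 4000. The sketch below is a hypothetical helper (`predicted_distortion_freqs`, with an assumed 24 kHz Nyquist limit, i.e. 48 kHz sampling) that lists those candidate frequencies for a given sweep frequency; it illustrates the arithmetic only and is not the paper's analysis procedure.

```python
def predicted_distortion_freqs(f_signal: float,
                               nyquist: float = 24_000.0,
                               step: float = 4_000.0,
                               fold_axes: tuple = (4_000.0, 8_000.0)) -> dict:
    """Candidate distortion frequencies (Hz) for a sinusoid at f_signal.

    Assumed reading of the abstract (hypothetical helper):
      - folded: f_signal mirrored about each axis, i.e. 2*axis - f_signal
      - below:  f_signal - step
      - above:  f_signal + m*step for m = 1, 2, ...
    Only frequencies strictly between 0 and nyquist are kept.
    """
    candidates = {
        "folded": [2.0 * axis - f_signal for axis in fold_axes],
        "below": [f_signal - step],
        "above": [f_signal + m * step for m in range(1, int(nyquist // step) + 1)],
    }
    return {kind: [f for f in freqs if 0.0 < f < nyquist]
            for kind, freqs in candidates.items()}


if __name__ == "__main__":
    for f in (1_000.0, 5_000.0):
        print(f, predicted_distortion_freqs(f))
```

For a 1 kHz sweep component this predicts folded components at 7 kHz and 15 kHz and upper components at 5, 9, 13, 17 and 21 kHz; the 4 kHz-below component would be negative and is dropped.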

P10-3 Physically Derived Synthesis Model of an Edge Tone
Rod Selfridge, Queen Mary University of London - London, UK; University of Edinburgh - Edinburgh, UK; Joshua D. Reiss, Queen Mary University of London - London, UK; Eldad J. Avital, Queen Mary University of London - London, UK
The edge tone is the sound generated when a planar jet of air from a nozzle strikes a wedge and a number of physical conditions are met. Fluid dynamics equations were used to synthesize authentic edge tones without the need for complex computation. A real-time physically derived synthesis model was designed using the jet airspeed and the nozzle-exit-to-wedge geometry. Different theoretical equations for predicting the tone frequency are compared. A decision tree, learned from previously published experimental results, was used to predict the correct mode of operation. The results showed accurate mode selection and highlighted areas where the model's operation follows or deviates from previously published data.
Convention Paper 9956
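
The abstract mentions comparing theoretical equations that predict the edge-tone frequency. As one illustration, the following sketch evaluates a classic empirical relation from the edge-tone literature, Brown's (1937) equation, in the form in which it is commonly quoted: f = 0.466 j (U - 40)(1/h - 0.07), with jet speed U in cm/s, nozzle-to-wedge distance h in cm, and mode factors j = 1, 2.3, 3.8, 5.4 for modes I to IV. Whether the paper uses this equation, and in exactly this form, is an assumption; the function name and the operating point are hypothetical, and the decision-tree mode selection described in the abstract is not reproduced here.

```python
# Mode factors j for modes I-IV, as commonly quoted for Brown's equation (assumed).
BROWN_MODE_FACTORS = {1: 1.0, 2: 2.3, 3: 3.8, 4: 5.4}


def brown_edge_tone_hz(jet_speed_cm_s: float, distance_cm: float, mode: int = 1) -> float:
    """Edge-tone frequency in Hz from Brown's empirical equation.

    f = 0.466 * j * (U - 40) * (1/h - 0.07), with U in cm/s and h in cm.
    Raises ValueError if the inputs fall outside the range where the
    expression gives a positive frequency.
    """
    if mode not in BROWN_MODE_FACTORS:
        raise ValueError(f"mode must be one of {sorted(BROWN_MODE_FACTORS)}")
    if distance_cm <= 0:
        raise ValueError("distance_cm must be positive")
    j = BROWN_MODE_FACTORS[mode]
    freq = 0.466 * j * (jet_speed_cm_s - 40.0) * (1.0 / distance_cm - 0.07)
    if freq <= 0:
        raise ValueError("inputs outside the equation's valid range")
    return freq


if __name__ == "__main__":
    # Hypothetical operating point: 10 m/s (1000 cm/s) jet, 1 cm nozzle-to-wedge distance.
    for m in BROWN_MODE_FACTORS:
        print(m, round(brown_edge_tone_hz(1000.0, 1.0, m), 1))
```

At 10 m/s and 1 cm this gives about 416 Hz for mode I, and each higher mode scales that value by its factor j, so choosing the mode (the task the paper assigns to its decision tree) changes the predicted pitch substantially.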
