144th AES CONVENTION Poster Session P21: Posters: Audio Coding and Quality

AES Milan 2018
P21 - Posters: Audio Coding and Quality

Friday, May 25, 15:00 — 16:30 (Arena 2)

P21-1 Quantization with Signal Adding Noise Shaping Using Long Range Look-Ahead Optimization
Akihiko Yoneya, Nagoya Institute of Technology - Nagoya, Aichi-pref., Japan
A re-quantization approach for digital audio signals using noise shaping by extra signal addition is studied. The approach has been proposed by the author, but its properties have not been studied well. In this paper, the features and performance of the approach are investigated. As a result, the noise shaping performance is found to be a little better than that of the conventional approach, and the perceptual evaluation is superior in terms of the fineness of the sound source image, especially when the optimization horizon used in the additional signal calculation is wide. Since a wide horizon requires a lot of computation, a pruning scheme for the optimization is proposed to reduce the calculation time, and the amount of computation is evaluated experimentally.
Convention Paper 9999
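
For context, the conventional noise shaping that the abstract uses as its baseline can be sketched as a first-order error-feedback requantizer. This is not the author's signal-adding method and includes no look-ahead optimization; the function name, the step size, and the first-order filter are illustrative assumptions.

```python
def requantize_error_feedback(samples, step):
    """Requantize samples to multiples of `step` with first-order noise shaping.

    Illustrative baseline only (hypothetical helper, not from the paper).
    The previous quantization error is subtracted from the next input, so the
    error appears at the output filtered by (1 - z^-1), i.e. pushed toward
    high frequencies.
    """
    out = []
    err = 0.0  # quantization error of the previous sample
    for x in samples:
        v = x - err                  # feed back the previous error
        q = step * round(v / step)   # uniform quantizer
        err = q - v                  # error made on this sample
        out.append(q)
    return out


if __name__ == "__main__":
    # A constant input below one step: the output toggles between levels so
    # that its running sum tracks the input's running sum.
    print(requantize_error_feedback([0.3] * 10, 1.0))
```

Because each error is cancelled by the next sample, the sum of the output differs from the sum of the input by at most half a quantization step, which is the low-frequency suppression that noise shaping is meant to provide.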

P21-2 A Comparison of Clarity in MQA Encoded Files vs. Their Unprocessed State as Performed by Three Groups – Expert Listeners, Musicians, and Casual Listeners
Mariane Generale, McGill University - Montreal, QC, Canada; Richard King, McGill University - Montreal, Quebec, Canada; The Centre for Interdisciplinary Research in Music Media and Technology - Montreal, Quebec, Canada; Denis Martin, McGill University - Montreal, QC, Canada; CIRMMT - Montreal, QC, Canada
This paper aims to examine perceived clarity in MQA encoded audio files compared to their unprocessed state (96-kHz 24-bit). Utilizing a methodology initially proposed by the authors in a previous paper, this study aims to investigate any reported differences in clarity for three musical sources of varying genres. A double-blind test is conducted using three groups—expert listeners, musicians, and casual listeners—in a controlled environment using high-quality loudspeakers and headphones. The researchers were interested in comparing the responses of the three target groups and whether playback systems had any significant effect on listeners’ perception. Data shows that listeners were not able to significantly discriminate between MQA encoded files and the unprocessed original due to several interaction effects.
Convention Paper 10000

P21-3 A Subjective Evaluation of High Bitrate Coding of Music
Kristine Grivcova, BBC Research & Development - Salford, UK; Chris Pike, BBC R&D - Salford, UK; University of York - York, UK; Thomas Nixon, BBC R&D - Salford, UK
The demand to deliver high quality audio has led broadcasters to consider lossless delivery. However, the difference in quality compared with the formats used in existing services is not clear. A subjective listening test was carried out to assess the perceived difference in quality between AAC-LC at 320 kbps and an uncompressed reference, using the method of ITU-R BS.1116. Twelve audio samples were used in the test, which included orchestral, jazz, vocal music, and speech. A total of 18 participants with critical listening experience took part in the experiment. The results showed no perceptible difference between AAC-LC at 320 kbps and the reference.
Convention Paper 10001

P21-4 Subjective Evaluation of a Spatialization Feature for Hearing Aids by Normal-Hearing and Hearing-Impaired Subjects
Gilles Courtois, Swiss Federal Institute of Technology (EPFL) - Lausanne, Switzerland; Hervé Lissek, Swiss Federal Institute of Technology (EPFL) - Lausanne, Switzerland; Philippe Estoppey, Acoustique Riponne - Lausanne, Switzerland; Yves Oesch, Phonak Communications AG - Murten, Switzerland; Xavier Gigandet, Phonak Communications AG - Murten, Switzerland
Remote microphone systems significantly improve the speech intelligibility performance offered by hearing aids. The voice of the speaker(s) is captured close to the mouth by a microphone, then wirelessly sent to the hearing aids. However, the sound is rendered in a diotic way, which bypasses the spatial cues for localizing and identifying the speaker. The authors had formerly proposed a feature that localizes and spatializes the voice. The current study investigates the perception of that feature by normal-hearing and hearing-impaired subjects with and without remote microphone system experience. Comparing the diotic and binaural reproductions, subjects rated their preference over various audiovisual stimuli. The results show that experienced subjects mostly preferred the processing achieved by the feature, contrary to the other subjects.
Convention Paper 10002

P21-5 Virtual Reality for Subjective Assessment of Sound Quality in Cars
Angelo Farina, Università di Parma - Parma, Italy; Daniel Pinardi, Università di Parma - Parma, Italy; Marco Binelli, University of Parma - Parma, Italy; Michele Ebri, University of Parma - Parma, Italy; Politecnico di Torino - Torino, Italy; Lorenzo Ebri, Ask Industries S.p.A. - Montecavolo di Quattrocastella (RE), Italy; University of Parma - Parma (PR), Italy
Binaural recording and playback have been used for decades in the automotive industry for performing subjective assessment of sound quality in cars, avoiding expensive and difficult tests on the road. Despite the success of this technology, several drawbacks are inherent in this approach. The playback on headphones does not have the benefit of head-tracking, so the localization is poor. The HRTFs embedded in the binaural rendering are those of the dummy head employed for recording the sound inside the car, and finally there is no visual feedback, so the listener gets a mismatch between visual and aural stimulations. The new Virtual Reality approach solves all these problems. The research focuses on obtaining a 360° panoramic video of the interior of the vehicle, accompanied by audio processed in High Order Ambisonics format, ready for being rendered on a stereoscopic VR visor. It is also possible to superimpose onto the video a real-time color map of noise levels, with iso-level curves and calibrated SPL values. Finally, both the sound level color map and the spatial audio can be filtered by the coherence with one or multiple reference signals, making it possible to listen to and localize selected noise sources very precisely while excluding all the others. These results have been acquired employing a massive spherical microphone array, a 360° panoramic video recording system, and accelerometers or microphones for the reference signals.
Convention Paper 10003

P21-6 Quality Evaluation of Sound Broadcasted Via DAB+ System Based on a Single Frequency Network
Stefan Brachmanski, Wroclaw University of Technology - Wroclaw, Poland; Maurycy Kin, Wroclaw University of Science and Technology - Wroclaw, Poland
This paper presents the results of quality assessment of speech and music signals transmitted via the DAB+ system. The musical signals have been evaluated in terms of both overall quality and some particular attributes. The subjective tests were carried out using the ACR procedure according to the ITU recommendation, and the results are presented as MOS values for various bit rates. The speech signals were additionally examined with the PESQ method. The results show that the assumed quality of 4 MOS for this kind of broadcasting could be achieved at 48 kbit/s. This was confirmed by both the subjective and the objective research. The differences between the results obtained for overall sound quality and for particular sound attributes are discussed.
Convention Paper 10004
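
As a rough illustration of how MOS values are derived from ACR ratings, the sketch below averages five-point scores for one condition and attaches a normal-approximation 95% confidence interval. The function name and the ratings are hypothetical and are not data from the paper; the confidence interval formula is a common convention assumed here, not something the abstract specifies.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of the confidence interval) for ACR ratings.

    `ratings` are integer scores on the 1..5 ACR scale from individual
    listeners for a single test condition. Hypothetical helper for
    illustration; z=1.96 assumes a normal approximation at the 95% level.
    """
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ACR ratings must lie on the 1..5 scale")
    mos = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


if __name__ == "__main__":
    # Made-up ratings for one bit rate, for demonstration only.
    mos, ci = mos_with_ci([4, 5, 4, 3, 4])
    print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Repeating this per bit rate gives the kind of MOS-versus-bit-rate curve against which a target such as 4 MOS can be read off.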

P21-7 An Investigation into Spatial Attributes of 360° Microphone Techniques for Virtual Reality
Connor Millns, University of Huddersfield - Huddersfield, UK; Hyunkook Lee, University of Huddersfield - Huddersfield, UK
Listening tests were conducted to evaluate perceived spatial attributes of two types of 360° microphone techniques for virtual reality: First Order Ambisonics (FOA) and the Equal Segment Microphone Array (ESMA). A binaural dummy head was also included as a baseline for VR audio. The four attributes tested were: source shift/ensemble spread, source/ensemble distance, environmental width, and environmental depth. The stimuli used in these tests included single and multi-source sounds consisting of both human voice and instruments. The results indicate that listeners can distinguish differences in three of the four spatial attributes. The binaural head was rated the highest for each attribute and FOA was rated the lowest except for environmental depth.
Convention Paper 10005

P21-8 Statistical Tests with MUSHRA Data
Catarina Mendonça, Aalto University - Espoo, Finland; Symeon Delikaris-Manias, Aalto University - Helsinki, Finland
This work raises concerns regarding the statistical analysis of data obtained with the MUSHRA method. There is a widespread tendency to prefer the ANOVA test, which is supported by the recommendation. This work analyzes four assumptions underlying the ANOVA tests: interval scale, normality, equal variances, and independence. Data were collected from one experiment and one questionnaire. It is found that MUSHRA data tend to violate all of the above assumptions. The consequences of each violation are debated. The violation of multiple assumptions is of concern. The violation of independence of observations leads to the most serious concern. In light of these findings, it is concluded that ANOVA tests have a high likelihood of resulting in type 1 error (false positives) with MUSHRA data and should therefore never be used with this type of data. The paper finishes with a section devoted to statistical recommendations. It is recommended that when using the MUSHRA method, the Wilcoxon or Friedman tests be used. Alternatively, statistical tests based on resampling methods are also appropriate.
Convention Paper 10006
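
One example of a resampling-based test of the kind the abstract mentions is a paired sign-flip permutation test, which compares two conditions rated by the same listeners without assuming normality or an interval scale. The sketch below is a minimal illustration under that assumption; the function name and the scores are hypothetical, and the abstract does not state which resampling procedure the paper recommends.

```python
from itertools import product


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test for paired scores.

    Each listener rates both conditions, so only the within-listener
    differences are used. Under the null hypothesis the sign of every
    difference is arbitrary; the p-value is the share of all 2**n sign
    assignments whose absolute summed difference is at least as large as
    the observed one. Hypothetical helper; practical only for small n.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired test needs equally long score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float scores
            extreme += 1
    return extreme / total


if __name__ == "__main__":
    # Made-up 0..100 MUSHRA-style scores from six listeners.
    codec_a = [80, 75, 90, 85, 70, 88]
    codec_b = [60, 70, 72, 80, 65, 70]
    print(paired_sign_flip_test(codec_a, codec_b))
```

With these made-up scores every listener prefers the first condition, so only the two uniform sign assignments are as extreme as the observation and the p-value is 2/64.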

P21-9 Investigation of Audio Tampering in Broadcast Content
Nikolaos Vryzas, Aristotle University of Thessaloniki - Thessaloniki, Greece; Anastasia Katsaounidou, Aristotle University of Thessaloniki - Thessaloniki, Greece; Rigas Kotsakis, Aristotle University of Thessaloniki - Thessaloniki, Greece; Charalampos A. Dimoulas, Aristotle University of Thessaloniki - Thessaloniki, Greece; George Kalliris, Aristotle University of Thessaloniki - Thessaloniki, Greece
Audio content forgery detection in broadcasting is crucial to prevent the spread of misinformation. Tools for the authentication of audio files can prove very useful, and several techniques have been proposed. In the current paper a database for the evaluation of such techniques is introduced. A script was created for the automatic generation of tampered audio files from a number of original source files that contain recorded speech. The source files were encoded in different audio formats (MP3, AAC, AMR, FLAC) and bitrates and were then used to generate the tampered audio files. The database was subjectively evaluated by experts in terms of the audibility of the sample changes. The effect of tampering on several audio features was tested, in order to propose semi-automatic methods for discrimination between the original and tampered files. The database and the scripts are publicly accessible so that researchers can use the pre-generated files or use the script to create datasets oriented to their research interests.
Convention Paper 10007
