Monday, October 13, 4:00 pm – 5:30 pm
Session Z8 Posters: Psychoacoustics and Coding, Part 2
Z8-1 Lossless Compression for Audio Data in the IEEE Floating-Point Format
Dai Yang, Takehiro Moriya, NTT Cyber Space Laboratory, Tokyo, Japan
Most lossless audio-coding algorithms are designed for PCM input formats. Little work has been done on the lossless compression of IEEE floating-point audio files. In this paper we propose an efficient lossless-coding algorithm that handles IEEE floating-point data as well as PCM data. Even in the worst case, in which the proposed algorithm was applied to artificially generated 32-bit floating-point sound files sampled at 48 kHz, a compression ratio of less than 70 percent is still achieved. Moreover, the proposed algorithm allows random access and is easily extended to lossless/variable-lossy operation, providing the scalability needed to meet the requirements of a wider range of applications and platforms.
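Lossless coding of floating-point audio must reproduce every bit of each sample: sign, exponent, and mantissa. The sketch below is only an illustration, not the authors' algorithm. It shows the bit-exact split of an IEEE 754 single-precision value into those three fields using Python's `struct` module, plus a hypothetical helper `pcm_integer_if_exact` that checks whether a float sample is really a scaled 16-bit PCM integer, one simple case where float data can be passed to an integer lossless coder without loss.

```python
import struct


def float_to_fields(x):
    """Split a value, stored as IEEE 754 single precision, into its bit fields."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF  # 8-bit biased exponent
    mantissa = bits & 0x7FFFFF      # 23 stored fraction bits
    return sign, exponent, mantissa


def fields_to_float(sign, exponent, mantissa):
    """Reassemble the three fields into the identical single-precision value."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    (x,) = struct.unpack("<f", struct.pack("<I", bits))
    return x


def pcm_integer_if_exact(x, bits=16):
    """Hypothetical helper: return k if x == k / 2**(bits-1) for an in-range
    PCM integer k, otherwise None (the sample needs a float-specific path)."""
    scale = 1 << (bits - 1)
    scaled = x * scale
    if scaled.is_integer() and -scale <= scaled < scale:
        return int(scaled)
    return None
```

Because the split is a pure bit reinterpretation, any coder that stores the three fields (or the integer plus a residual) without loss can return the original file bit for bit.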
Z8-2 Optimum Quantization of Flattened MDCT Coefficients
Aníbal Ferreira, University of Porto/INESC Porto, Porto, Portugal
The performance of perceptual audio coders depends on how well the quantization operation keeps the quantization noise masked by the audio signal. This objective is better met by coding different signal components, such as sinusoids, transients, and stationary noise, separately. In this paper we use an audio coder that normalizes the MDCT spectrum by a smooth spectral envelope and by the periodicities due to sinusoids. The resulting flattened MDCT coefficients are shown to have a probability density function with small uncertainty, which allows the design of an optimum nonuniform scalar quantizer. Its distortion-rate function is derived and compared both with those of known quantizers and with that obtained under real audio coding conditions.
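A standard way to design an MSE-optimum nonuniform scalar quantizer for a known distribution is the Lloyd (Lloyd-Max) iteration: alternate between placing decision thresholds midway between reconstruction points and moving each point to the centroid of its cell. The sketch below assumes, purely as a placeholder, that the flattened coefficients are unit Laplacian; the abstract does not state the measured pdf, and this is not necessarily the paper's derivation.

```python
import bisect
import random


def lloyd_max(samples, levels, iterations=50):
    """Design an MSE-optimised nonuniform scalar quantizer (Lloyd algorithm).

    Returns (reconstruction_points, decision_thresholds), both ascending.
    """
    data = sorted(samples)
    lo, hi = data[0], data[-1]
    # Start from a uniform quantizer spanning the data range.
    points = [lo + (hi - lo) * (i + 0.5) / levels for i in range(levels)]
    for _ in range(iterations):
        # Nearest-neighbour condition: decision thresholds sit at midpoints.
        thresholds = [(a + b) / 2 for a, b in zip(points, points[1:])]
        # Centroid condition: each point moves to the mean of its cell.
        cells = [[] for _ in range(levels)]
        for x in data:
            cells[bisect.bisect_right(thresholds, x)].append(x)
        points = [sum(c) / len(c) if c else p for c, p in zip(cells, points)]
    thresholds = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return points, thresholds


def quantize(x, points, thresholds):
    """Map x to the reconstruction point of the cell it falls into."""
    return points[bisect.bisect_right(thresholds, x)]


def mse(samples, points, thresholds):
    """Mean squared quantization error over the samples."""
    return sum((x - quantize(x, points, thresholds)) ** 2 for x in samples) / len(samples)


# Placeholder source: unit Laplacian samples stand in for flattened MDCT coefficients.
rng = random.Random(0)
data = [rng.expovariate(1.0) * rng.choice((-1, 1)) for _ in range(5000)]
points, thresholds = lloyd_max(data, levels=8)
```

Each Lloyd step cannot increase the training-set distortion, so the result is never worse than the uniform quantizer it starts from; a narrower pdf (the "small uncertainty" above) lets the points concentrate where the coefficients actually fall.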
Z8-3 Low Frequency Optimization and Nonbass Masking Effects for Sound Field Recreation
Graeme Huon, Zeljko Velican, Huonlabs, Victoria, Australia
The future delivery of sound must create, or render, a realistic image of the sound event. A model for a 3-D sound capture and rendering format, based on depth perception, has been proposed earlier. This paper considers the requirements for optimized low-frequency reproduction and the nonbass masking effects related to this model. Theoretical modeling, practical verification in real listening environments, and subjective assessment with skilled and unskilled listeners are presented, and conclusions are drawn. It is shown that room-mode influence, low-frequency spatial energy distribution, and integration with the main system for low-frequency reproduction in rooms can be managed effectively. New apparatus is described that enables the loudspeaker to be placed optimally with respect to the listener so as to minimize room-mode coloration of low frequencies, and its use in quantifying frequency-dependent loss is defined. Nonbass spatial masking effects are also reported in the context of the depth perception model.
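Room-mode coloration arises because a room resonates at discrete low frequencies. For an idealized rigid-walled rectangular room of dimensions Lx, Ly, Lz, the textbook mode frequencies are f = (c/2)·sqrt((nx/Lx)² + (ny/Ly)² + (nz/Lz)²). The sketch below lists them; the room dimensions and the speed of sound c = 343 m/s are assumptions, and this is background only, not the authors' model or apparatus.

```python
import itertools
import math


def room_modes(lx, ly, lz, f_max, c=343.0):
    """Return (frequency_hz, (nx, ny, nz)) for every mode of a rigid-walled
    rectangular room up to f_max, sorted by frequency."""
    # Largest index per axis that can still give a mode at or below f_max.
    limits = [int(2 * length * f_max / c) for length in (lx, ly, lz)]
    modes = []
    for n in itertools.product(*(range(m + 1) for m in limits)):
        if n == (0, 0, 0):
            continue  # not a resonance
        f = (c / 2) * math.sqrt((n[0] / lx) ** 2 + (n[1] / ly) ** 2 + (n[2] / lz) ** 2)
        if f <= f_max:
            modes.append((f, n))
    return sorted(modes)


# Hypothetical 5 m x 4 m x 2.5 m listening room, modes up to 80 Hz.
modes = room_modes(5.0, 4.0, 2.5, 80.0)
```

Widely spaced modes at the bottom of the list are the ones most likely to color bass reproduction, which is why loudspeaker and listener placement relative to their pressure maxima matters.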
Z8-4 An Information-Theoretic Model for Audio Watermarking
Ruihua Ma, Institute for Infocomm Research, Singapore
Recent information-hiding theory views information hiding as a game between the hider (embedder/decoder) and the attacker, in which optimal information-embedding and attack strategies can be developed. Much work has applied this theory to image watermarking, but little research has applied it to audio watermarking. This paper aims to fill that gap by presenting an information-theoretic model for audio watermarking. We use wavelet statistical models for audio signals and compute the data-hiding capacity for compressed and uncompressed host-audio sources. Simulation results show that the proposed system performs close to the theoretical upper bounds.
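To give a feel for what a per-subband capacity computation looks like, the sketch below uses a deliberately simplified stand-in: the attack is modeled as additive Gaussian noise of fixed variance N, so each coefficient behaves like Costa's "writing on dirty paper" channel, whose capacity ½·log2(1 + D1/N) does not depend on the host because the embedder knows it. Independent wavelet subbands are then summed as parallel channels. This ignores the game-theoretic optimization over attacks in the paper, and the subband sizes and distortion budgets are hypothetical.

```python
import math


def dirty_paper_capacity(embed_distortion, noise_variance):
    """Costa capacity in bits per coefficient: 0.5 * log2(1 + D1 / N).

    The host is known to the embedder, so its variance does not appear.
    """
    return 0.5 * math.log2(1.0 + embed_distortion / noise_variance)


def hiding_capacity(subbands):
    """Total bits for one frame of independent wavelet subbands.

    Each subband is (num_coefficients, embed_distortion D1, noise_variance N);
    treating subbands as parallel Gaussian channels is an assumption.
    """
    return sum(n * dirty_paper_capacity(d1, nv) for n, d1, nv in subbands)


# Hypothetical frame: a coarse subband with a larger embedding budget and a fine one.
bits_per_frame = hiding_capacity([(100, 3.0, 1.0), (50, 1.0, 1.0)])
```

In the full game-theoretic setting the attacker chooses the noise to minimize this quantity under its own distortion budget, so the figure above is best read as an upper-level intuition rather than the paper's bound.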
Z8-5 Watermark Insertion into MP3 Bitstream Using the Linbits Characteristics
Seung-Jin Yang, Do-Hyung Kim, Jae-Ho Chung, Inha University, Incheon, Korea
We propose a watermarking technique that inserts additional information into the quantized integer coefficients whose values exceed 15, which are coded with linbits during Huffman coding in the MP3 encoding procedure. The linbits are written into the bitstream directly as binary codes. We inserted watermarks by modifying the linbits and evaluated the audible distortion with a MOS test. In our experiment, 20 untrained listeners rated 20 samples of about 15 seconds each, watermarked at 128 kb/s, according to perceived quality on a scale from 1 (very annoying) to 5 (imperceptible). The results confirm that watermarks or other additional information of about 60 bytes/s can be inserted with an average sound quality of MOS 4.6.
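In MPEG-1 Layer III, the Huffman tables that support linbits code large magnitudes with an escape: a quantized magnitude of 15 or more is sent as the table entry 15 followed by a linbits field that holds the magnitude minus 15 as plain binary of fixed width. Since that field is not Huffman coded, changing its bits does not change the bitstream length. The sketch below is a hypothetical scheme (the abstract does not say how the linbits are modified): it writes one watermark bit into the least significant bit of each linbits field. Every escape-coded value (magnitude 15 or more) is used as a carrier so that embedding never moves a value out of the carrier set and the extractor stays in sync.

```python
def embed_bits(coeffs, bits, linbits):
    """Hypothetical scheme: hide one bit in the LSB of the linbits field of
    each escape-coded coefficient (|v| >= 15), in order, until bits run out.

    coeffs  -- quantized MDCT integers of one granule
    bits    -- sequence of 0/1 watermark bits
    linbits -- escape-field width of the Huffman table in use (assumed fixed, >= 1)
    """
    if linbits < 1:
        raise ValueError("table has no linbits field")
    max_field = (1 << linbits) - 1
    out = list(coeffs)
    pending = iter(bits)
    for i, v in enumerate(out):
        magnitude = abs(v)
        if magnitude < 15:
            continue  # coded by the Huffman table alone; no linbits field
        field = magnitude - 15
        if field > max_field:
            raise ValueError("value too large for this table")
        bit = next(pending, None)
        if bit is None:
            break  # all watermark bits embedded
        field = (field & ~1) | bit  # max_field is odd, so this stays in range
        out[i] = (15 + field) if v > 0 else -(15 + field)
    return out


def extract_bits(coeffs, count):
    """Read back the LSB of the linbits field of escape-coded coefficients."""
    return [(abs(v) - 15) & 1 for v in coeffs if abs(v) >= 15][:count]


original = [3, -20, 15, 0, 40, -16, 7]
marked = embed_bits(original, [0, 1, 0, 0], linbits=6)
```

Each carrier changes by at most one quantization step, and only large coefficients are touched, which is consistent with the small audible impact reported above.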
Z8-6 Perceptual Convolution for Reverberation
Wen-Chieh Lee, Chung-Han Yang, Chi-Min Liu, National Chiao Tung University, Taiwan; Jiun-In Guo, National Chung Cheng University, Taiwan
FIR-based reverberators, which convolve the input sequence with an impulse response modeling a concert hall, offer better quality than the IIR-based approach. However, their high computational complexity limits their use in most cost-oriented systems. This paper introduces a method that uses a perceptual criterion to reduce the complexity of convolution-based reverberation, together with an objective measurement criterion for checking the perceptual difference caused by the reduction. The results show that the impulse response can be shortened by 60 percent without affecting the perceived reverberation quality. The method integrates well with the existing FFT-based approach, giving a speed-up of around 30 percent. It also adapts flexibly to different computational budgets, with graceful degradation of reverberation quality.
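The abstract does not give the perceptual criterion, so the sketch below substitutes a plain energy rule as a placeholder: integrate the impulse-response energy backwards from the end (as in a Schroeder decay curve) and cut the response where the discarded tail falls below a threshold such as -30 dB. The impulse response and threshold are hypothetical, and direct convolution stands in for the FFT-based implementation for brevity.

```python
def truncation_length(ir, threshold_db=-60.0):
    """Length of the shortest prefix of ir whose discarded tail carries at most
    threshold_db of the total energy (backward energy integration, as in a
    Schroeder decay curve)."""
    total = sum(h * h for h in ir)
    limit = total * 10 ** (threshold_db / 10)
    tail = 0.0
    n = len(ir)
    # Walk back from the end, dropping samples while the tail stays below limit.
    while n > 0 and tail + ir[n - 1] ** 2 <= limit:
        tail += ir[n - 1] ** 2
        n -= 1
    return n


def convolve(x, h):
    """Direct-form FIR convolution; cost is proportional to len(x) * len(h)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


# Hypothetical exponentially decaying impulse response.
ir = [0.99 ** k for k in range(2000)]
keep = truncation_length(ir, threshold_db=-30.0)
short_ir = ir[:keep]
```

With direct convolution the saving is proportional to the number of taps removed; with block FFT convolution the gain is smaller than the length reduction, which fits the difference between the 60 percent shortening and the roughly 30 percent speed-up reported above.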
Z8-7 Advances in Trellis-Based SDM Structures
Erwin Janssen, Derk Reefman, Philips Research, Eindhoven, The Netherlands
Recently, a new type of 1-bit sigma-delta modulator (SDM), called the Trellis noise-shaping converter, was introduced. It offers several advantages over a standard SDM, including better stability, signal-to-noise ratio (SNR), and linearity. The major drawback of the Trellis architecture is its large computational requirement. This paper refines the concept of Trellis noise shaping and introduces a new algorithm that offers better performance at a greatly reduced cost. A comparative performance analysis of the new algorithm is presented. Cost savings of several orders of magnitude have been achieved while all the benefits of Trellis noise shaping are maintained. Finally, special attention is paid to critical implementation details.
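The computational burden mentioned above comes from searching over future output bits instead of deciding one bit at a time. The sketch below is neither the Trellis converter nor the paper's new algorithm; it is a brute-force look-ahead first-order 1-bit SDM, assumed here only to show the cost structure. At each sample it simulates all 2^depth candidate bit sequences through the integrator and emits the first bit of the sequence with the smallest summed squared integrator state. Practical trellis converters use higher-order loop filters and restrict this search.

```python
import itertools


def lookahead_sdm(x, depth):
    """1-bit first-order sigma-delta modulator with exhaustive look-ahead.

    Returns (bits, paths_evaluated); bits are -1 or +1.
    """
    state = 0.0  # integrator of (input - output)
    bits = []
    evaluated = 0
    for n in range(len(x)):
        horizon = x[n:n + depth]  # shorter near the end of the signal
        best_cost, best_bit = None, 1
        for seq in itertools.product((-1, 1), repeat=len(horizon)):
            s, cost = state, 0.0
            for xk, yk in zip(horizon, seq):
                s += xk - yk
                cost += s * s  # penalize large integrator excursions
            evaluated += 1
            if best_cost is None or cost < best_cost:
                best_cost, best_bit = cost, seq[0]
        # Commit only the first bit, then slide the horizon forward.
        state += x[n] - best_bit
        bits.append(best_bit)
    return bits, evaluated


bits, evaluated = lookahead_sdm([0.5] * 200, depth=4)
```

With depth 1 this reduces to the ordinary sign-of-integrator SDM; each extra sample of look-ahead doubles the work per output bit, which is the kind of cost that algorithmic refinements of trellis noise shaping aim to cut.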