AES 115th Convention: PAPERS

Monday, October 13 2:00 pm – 3:30 pm
Session Z7 Posters: Psychoacoustics and Coding, Part 1

Z7-1 Cascaded Trellis-Based Optimization for MPEG-4 Advanced Audio Coding—Cheng-Han Yang, Hsueh-Ming Hang, National Chiao Tung University, Hsinchu, Taiwan
A low-complexity, high-performance scheme for choosing MPEG-4 Advanced Audio Coding (AAC) parameters is proposed. One key element in producing good-quality compressed audio at low bit rates is selecting proper coding parameter values, a task that the MPEG committee AAC reference model performs poorly. A joint trellis-based optimization approach has previously been proposed; it leads to a near-optimal selection of parameters, but at the cost of extremely high computational complexity. It is therefore very desirable to achieve similar coding performance (audio quality) at much lower complexity. Simulation results indicate that the proposed cascaded trellis-based optimization scheme has a coding performance close to that of the joint trellis-based scheme while requiring only 1/70 of the computation.
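As a rough illustration of what a trellis search over coding parameters looks like, the sketch below runs a generic minimum-cost path search (Viterbi style). It is not the authors' cascaded scheme: the stages stand in for coding bands, the states for candidate parameter values, and `node_cost` and `transition_cost` are hypothetical placeholders for a rate-distortion cost and the cost of changing the parameter between neighboring bands.

```python
def trellis_search(node_cost, transition_cost):
    """Return (path, total_cost) of the cheapest path through a trellis.

    node_cost[t][s] is the cost of picking state s at stage t (hypothetical,
    e.g. distortion plus weighted bits); transition_cost(p, s) is the cost of
    moving from state p to state s between neighboring stages.
    """
    n_states = len(node_cost[0])
    best = list(node_cost[0])   # cheapest cost of reaching each state so far
    back = []                   # back-pointers, one list per later stage
    for t in range(1, len(node_cost)):
        new_best, pointers = [], []
        for s in range(n_states):
            prev = min(range(n_states),
                       key=lambda p: best[p] + transition_cost(p, s))
            new_best.append(best[prev] + transition_cost(prev, s) + node_cost[t][s])
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace the cheapest final state back to the first stage.
    state = min(range(n_states), key=lambda s: best[s])
    total = best[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, total


# Three stages, two candidate values per stage (made-up costs).
costs = [[1, 5], [5, 1], [1, 5]]
sticky_path, sticky_total = trellis_search(costs, lambda p, s: 10 * abs(p - s))
free_path, free_total = trellis_search(costs, lambda p, s: 0)
```

With an expensive transition the search keeps one value across all stages; with free transitions it picks the cheapest value at each stage independently. The work per stage grows with the square of the number of states, which is why the size of the state space dominates the cost of such a search.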

Z7-2 Implementing MPEG Advanced Audio Coding and Layer-3 Encoders on 32-Bit and 16-Bit Fixed-Point Processors—Marc Gayer, Markus Lohwasser, Manfred Lutzky, Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany
Encoder implementations of MPEG Advanced Audio Coding and Layer-3 on 32-bit or 16-bit fixed-point processors are challenging because the usable word length is restricted to 32 bits if low processing power is required. This paper describes the modifications and optimizations that had to be applied to the algorithms of these audio encoders to make a true fixed-point implementation on a 32-bit or 16-bit device possible. The implementation avoids floating-point emulation, and even avoids 64-bit values for the signal energies and thresholds in the psychoacoustic model of the encoder, while still achieving high encoding quality and speed. Memory and processing power requirements on various platforms, as well as results from an objective listening test, will be presented.

Z7-3 An Extended-Run-Length Coding Tool for Audio Compression—Dai Yang, Takehiro Moriya, Akio Jin, Kazunaga Ikeda, NTT Cyber Space Laboratories, NTT Corporation, Musashino, Japan
An extended-run-length coding tool called zero compaction is proposed in this paper. The proposed coding tool is far more efficient and generates significantly better results than traditional run-length coding when a certain type of data appears in an intermediate stage of our lossless audio compression systems. The zero-compaction technique has been integrated into two lossless audio coding schemes. When tested on seven sets of standard audio files from the ISO, the inclusion of zero compaction improved the average compression rates of the systems by more than 1 percent in all cases. In addition, the simplicity and good performance of zero compaction shave up to 4 percent off the total encoding time.
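For reference, the traditional run-length coding that zero compaction is compared against can be sketched as below. This is only the baseline, not the zero-compaction tool itself, whose details the abstract does not give; the token format (a `("Z", n)` tuple for a run of `n` zeros) and the function names are assumptions made for illustration.

```python
def encode_zero_runs(samples):
    """Collapse each run of zeros into a ("Z", run_length) token."""
    tokens = []
    run = 0
    for s in samples:
        if s == 0:
            run += 1
        else:
            if run:
                tokens.append(("Z", run))
                run = 0
            tokens.append(s)
    if run:  # flush a run that reaches the end of the input
        tokens.append(("Z", run))
    return tokens


def decode_zero_runs(tokens):
    """Invert encode_zero_runs, expanding each token back into zeros."""
    samples = []
    for t in tokens:
        if isinstance(t, tuple):
            samples.extend([0] * t[1])
        else:
            samples.append(t)
    return samples


residual = [3, 0, 0, 0, 0, -1, 0, 0, 5]
tokens = encode_zero_runs(residual)
restored = decode_zero_runs(tokens)
```

The scheme is lossless by construction: decoding the tokens reproduces the input exactly, and the saving depends entirely on how long the zero runs are.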

Z7-4 Implementation of Interactive 3-D Audio Using MPEG-4 Multimedia Standards—Jeongil Seo, Gi Yoon Park, Dae-Young Jang, Kyeoungok Kang, ETRI, Daejeon, Korea
We present the implementation procedure of an interactive 3-D audio player using MPEG-4 standards. The MPEG-4 standards allow interaction with important objects in a scene and provide high-efficiency coding tools for audiovisual objects. We apply the AudioBIFS system (version 1), part of BIFS in the MPEG-4 Systems standard, to provide object-based interactivity for audio objects. It also provides geometric information for spatializing an audio object in the 3-D audio scene. Because the method of 3-D spatialization is not normative in MPEG-4 AudioBIFS, we adopt a novel 3-D spatialization method based on HRTF processing. To prevent an excessive increase in the number of audio objects, we define a background audio object that presents the atmosphere of a 3-D audio scene. While the important audio objects are kept as individual audio objects, the other audio objects are merged into the background audio object. Although our interactive 3-D audio player was developed as a terminal player for a Digital Multimedia Broadcasting (DMB) system, it can also be applied to 3-DTV, virtual reality games, interactive home shopping applications, etc.

Z7-5 Error Mitigation of MPEG-4 Audio Packet Communication Systems—Schuyler Quackenbush, Audio Research Labs, Scotch Plains, NJ, USA; Peter Driessen, University of Victoria, Victoria, British Columbia, Canada
This paper investigates techniques for mitigating the effect of missing packets of MPEG-4 Advanced Audio Coding (AAC) data so as to minimize perceived audio degradation. Applications include streaming of AAC music files over the Internet and wireless packet data channels. A range of techniques is presented, but statistical interpolation in the time/frequency domain is found to be the most effective. The novelty of the work lies in applying statistical interpolation techniques intended for time-domain samples to the frequency-domain coefficients. A means of complexity reduction is presented, after which the error mitigation is found to require on average 17 percent additional computation for a channel with 5 percent errors as compared to a clear channel. In an informal listening test, all subjects preferred this technique over the simpler technique of signal repetition, and for one signal item statistical interpolation was preferred to the original.
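The signal-repetition baseline mentioned in the listening test can be sketched as follows. This is an assumed, minimal form of repetition concealment (reuse the last correctly received frame of coefficients), not the statistical interpolation the paper proposes; the frame representation (a list of coefficients, or `None` for a lost packet) and the function name are hypothetical.

```python
def conceal_by_repetition(frames):
    """Fill each lost frame (None) with a copy of the last good frame.

    Lost frames before the first good frame are replaced with silence.
    Assumes at least one frame was received.
    """
    frame_len = next(len(f) for f in frames if f is not None)
    last_good = [0.0] * frame_len
    concealed = []
    for frame in frames:
        if frame is not None:
            last_good = list(frame)
        concealed.append(list(last_good))
    return concealed


received = [[1.0, 2.0], None, [3.0, 4.0], None, None]
concealed = conceal_by_repetition(received)
```

Repetition costs almost nothing to compute, which is the trade-off against the interpolation approach and its reported extra computation.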

Z7-6 Combined Source and Perceptual Audio Coding—Aníbal Ferreira, University of Porto/INESC Porto, Porto, Portugal; André Rocha, INESC Porto, Porto, Portugal
An advanced Audio Spectral Coder (ASC) is described that implements a new approach to the problem of efficient audio compression by combining source coding with perceptual coding techniques. This approach decomposes the audio into three main components: transient events in the time domain, harmonic structures of sinusoids, and stationary noise in the MDCT frequency domain. It is shown that this decomposition permits the independent parametrization and coding of the components according to appropriate representation models and applicable psychoacoustic rules. The coding performance of ASC is characterized, and it is shown that, due to its underlying structure, functionalities other than compression are also possible, namely bitstream semantic scalability, access, and classification.

Z7-7 Phase Transmission in a Sinusoidal Audio and Speech Coder—Albertus den Brinker, Andy Gerrits, Rob Sluijter, Philips Research Laboratories, Eindhoven, The Netherlands
In sinusoidal coding, frequency tracks are formed in the encoder and the amplitude and frequency information of a track is transmitted. Phase is usually not transmitted, but reconstructed at the decoder by assuming that the phase is the integral of the frequency. Such a reconstructed phase accumulates inaccuracies, leading to audible artifacts. To prevent this, a mechanism is proposed to transmit phase information of sinusoidal tracks. In the proposed mechanism, the phase is unwrapped based on the measured phases and frequencies in the encoder. The unwrapped phase is quantized and transmitted. The frequencies are not transmitted, but restored from the phase information by differentiation.
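The relationship between phase and frequency described above can be sketched numerically. The example is a simplification under stated assumptions: the unwrapped phase is synthesized by integrating a known frequency track rather than measured in an encoder, and the frame interval `FRAME_S`, the uniform quantizer, and its step size are hypothetical choices, not values from the paper.

```python
import math

FRAME_S = 0.01  # hypothetical interval between track updates, in seconds


def unwrapped_phase(freqs_hz, phase0=0.0, frame_s=FRAME_S):
    """Integrate per-frame frequency into an unwrapped phase track (radians)."""
    phases = [phase0]
    for f in freqs_hz:
        phases.append(phases[-1] + 2.0 * math.pi * f * frame_s)
    return phases


def quantize(phases, step):
    """Uniform quantizer applied to the unwrapped phase."""
    return [round(p / step) * step for p in phases]


def frequencies_from_phase(phases, frame_s=FRAME_S):
    """Restore frequency as the first difference of the unwrapped phase."""
    return [(b - a) / (2.0 * math.pi * frame_s)
            for a, b in zip(phases, phases[1:])]


track_hz = [440.0, 442.0, 445.0, 443.0]
step = math.pi / 16
sent = quantize(unwrapped_phase(track_hz), step)
restored_hz = frequencies_from_phase(sent)

max_err_hz = max(abs(a - b) for a, b in zip(track_hz, restored_hz))
bound_hz = step / (2.0 * math.pi * FRAME_S)
```

Because each quantized phase value is off by at most half a step, the restored frequency of any frame is off by at most `bound_hz`, and the error does not grow along the track. This contrasts with integrating quantized frequencies at the decoder, where phase errors add up from frame to frame.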

Z7-8 Objective Estimates of Partial Masking Thresholds for Mobile Terminal Alert Tones—David Isherwood, Ville-Veikko Mattila, Nokia Research Center, Tampere, Finland
Listening tests were performed to define the perceived loudness threshold above which a number of mobile terminal alert tones became only partially masked by environmental noise. The relative intensity of the noise and alert tone at the partial masking threshold was then used to create masker+maskee samples to be objectively measured with various loudness models. The objective estimates were then compared to the subjective results to derive recommendations as to how best to objectively estimate the partial masking thresholds.