AES Vienna 2007: Poster Session P6

P6 - Signal Processing, Sound Quality Design

Saturday, May 5, 15:00 — 16:30

P6-1 On Development of New Audio Codecs—Imre Varga, Siemens Networks - Munich, Germany
This paper presents work recently completed or ongoing in 3GPP and ITU-T on the development of new audio codecs. The main applications are wideband speech telephony, audio conferencing, and mobile multimedia applications including Packet-Switched Streaming (PSS), Multimedia Messaging (MMS), and Multimedia Broadcast/Multicast Service (MBMS). In the standardization process, the terms of reference (describing design constraints and performance requirements), the test plans, and the selection rules are finalized first. Next, extensive subjective listening tests are conducted. The codec is then selected on the basis of the selection test results and the selection rules. A characterization phase of testing then provides fuller information on codec behavior.
Convention Paper 7022

P6-2 Fixed-Point Processing Optimization of the MPEG Audio Encoder Using a Statistical Model—Keun-Sup Lee, Samsung Electronics Co. Ltd. - Suwon, Korea; Young-Cheol Park, Yonsei University - Wonju, Korea; Dae Hee Youn, Yonsei University - Seoul, Korea
Audio applications for portable devices face two critical restrictions: small size and low power consumption. Fixed-point implementations are therefore essential for those applications. Even with a fixed-point processor, however, the data width can still be an issue because it affects both hardware cost and power consumption. In this paper we propose a statistical model for the MPEG AAC audio encoder that can provide an optimal precision for the implementation. Hardware with the optimal precision should produce output errors that are perceptually insignificant compared with a floating-point system. To find the optimal precision for the AAC encoder, we use the statistical model to estimate the maximum allowable amount of fixed-point arithmetic error in the bit-allocation process. Finally, we present a system architecture suited to encoding audio signals with minimum error under fixed-point processing. Tests showed that the fixed-point system optimized using the proposed model had sound quality comparable to the floating-point encoding system.
Convention Paper 7023
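
As a rough illustration of the word-length trade-off described in P6-2, the sketch below quantizes a test signal to several fixed-point widths and reports the resulting signal-to-noise ratio. It is not the authors' statistical model, and it does not touch the AAC bit-allocation process; the function names `quantize`, `snr_db`, and `word_length_sweep` and the sine test signal are assumptions made for this example.

```python
import math


def quantize(x, frac_bits):
    """Round x to a signed fixed-point grid with frac_bits fractional bits.

    Values are saturated to the representable range [-1, 1).
    """
    scale = 1 << frac_bits
    q = round(x * scale)
    q = max(-scale, min(scale - 1, q))
    return q / scale


def snr_db(reference, approx):
    """Signal-to-noise ratio in dB of approx relative to reference."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return 10.0 * math.log10(signal / noise)


def word_length_sweep(samples, widths):
    """Map each total word length (sign bit included) to the SNR it achieves."""
    return {w: snr_db(samples, [quantize(s, w - 1) for s in samples])
            for w in widths}


# Hypothetical test signal: a sine below full scale, so nothing saturates.
samples = [0.9 * math.sin(2.0 * math.pi * 0.01234 * n) for n in range(2000)]
sweep = word_length_sweep(samples, [8, 12, 16])
for width, snr in sweep.items():
    print(f"{width:2d} bits: {snr:6.1f} dB")
```

Each additional bit buys roughly 6 dB of SNR for a signal like this, which is why the choice of data width matters for both cost and audible error.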

P6-3 Enhanced Bass Reinforcement Algorithm for Small-Sized Transducer—Han-gil Moon, Manish Arora, Chiho Chung, Seong-cheol Jang, Samsung Electronics Co. Ltd. - Suwon, Gyeonggi-Do, Korea
Mobile devices such as cell phones and mp3 players that use small loudspeaker systems to deliver sound to users are now very popular. Small transducers are used mainly because of the design and size of the devices. Unfortunately, the same design and size constraints prevent the transducers from delivering high-quality low-frequency performance. To break through this physical barrier of poor low-frequency reproduction, the well-known psychoacoustic “missing fundamental” illusion is exploited. In this paper a method of enhancing bass perception using virtual pitch is presented. In our demonstration, listeners can perceive deep bass with fewer artifacts.
Convention Paper 7024
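
The missing-fundamental idea in P6-3 can be sketched with a simple harmonic generator: a low tone that a small loudspeaker cannot reproduce is passed through a nonlinearity that creates its second and third harmonics, whose spacing still implies the original pitch. The abstract does not say which nonlinearity the authors use; the polynomial x^2 + x^3 below, and the names `harmonic_generator` and `tone_magnitude`, are assumptions chosen for illustration.

```python
import math


def harmonic_generator(x):
    """Polynomial nonlinearity: x^2 yields 2f, x^3 yields f and 3f."""
    return [s * s + s * s * s for s in x]


def tone_magnitude(x, freq, fs):
    """Amplitude of the component at freq, estimated from a single DFT bin.

    Exact when the signal holds an integer number of periods of freq.
    """
    n = len(x)
    re = sum(s * math.cos(2.0 * math.pi * freq * i / fs) for i, s in enumerate(x))
    im = sum(s * math.sin(2.0 * math.pi * freq * i / fs) for i, s in enumerate(x))
    return 2.0 * math.hypot(re, im) / n


fs = 8000     # sample rate in Hz (assumed)
f0 = 50.0     # bass fundamental below the transducer's range (assumed)
bass = [math.sin(2.0 * math.pi * f0 * i / fs) for i in range(fs)]  # one second
enhanced = harmonic_generator(bass)

for harmonic in (1, 2, 3):
    print(harmonic * f0, "Hz:", round(tone_magnitude(enhanced, harmonic * f0, fs), 3))
```

In a complete system the generated harmonics would be mixed back into the signal at a controlled level and the unreproducible fundamental filtered out; this sketch only shows that the harmonics at 2f and 3f appear.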

P6-4 Subordinate Audio Channels—Tim Jackson, Keith Yates, Manchester Metropolitan University - Manchester, UK; Francis Li, University of Salford - Salford, Greater Manchester, UK
In this paper we propose a model for a backward-compatible subordinate audio channel within a host digital audio signal. Embedding and extraction methods are presented and objective-perceptual assessment results are reported. The method is designed to minimize perceptual degradation of the host signal and to maintain compatibility with existing systems. The implementations use the discrete cosine transform and the masking properties of the human auditory system. Performance is assessed using the objective perceptual measure objective difference grade. Test results indicate that both the host and subordinate audio channels can maintain good audio fidelity without significant perceptual degradation.
Convention Paper 7025

P6-5 Room Equalization Based on Acoustic and Human Perceptual Features—Lae-Hoon Kim, Seoul National University - Seoul, Korea; Mark Hasegawa-Johnson, University of Illinois at Urbana-Champaign - Urbana, IL, USA; Jun-Seok Lim, Sejong University - Seoul, Korea; Koeng-Mo Sung, Seoul National University - Seoul, Korea
Room equalization has the potential to create improved audio display for homes, cars, and professional applications. In this paper, the signal is inverse filtered with a filter computed from newly introduced regularized optimal multipoint frequency-warped linear prediction coefficients. Experimental results show that the proposed room equalization algorithm improves equalization on the equalizable parts, thus enlarging the region of perceptually effective equalization.
Convention Paper 7026

P6-6 Parametric Loudspeaker Equalization—Results and Comparison with Other Methods—German Ramos, Jose J. Lopez, Technical University of Valencia - Valencia, Spain
The results obtained by a loudspeaker equalization method are presented and compared with other equalization methods. The main characteristic of the proposed method is that the equalizer structure is planned from the beginning as a chain of second-order sections (SOS), where each SOS is a low-pass, high-pass, or parametric filter defined by its parameters (frequency, gain, and Q) and designed by a direct search method. This filter structure, combined with the subjectively motivated error function employed, gives better results from a subjective point of view at a lower computational cost. The results have been compared with different FIR (finite impulse response) and IIR (infinite impulse response) filter design methods, with and without warped structures. In all cases, for the same computational cost, the presented method obtains a lower error function value.
Convention Paper 7027
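
To make the SOS chain in P6-6 concrete, the sketch below builds one parametric (peaking) section from frequency, gain, and Q, and evaluates the magnitude response of a cascade. The coefficient formulas are the widely used peaking-filter design from Robert Bristow-Johnson's Audio EQ Cookbook, which is an assumption here: the abstract does not state the authors' section formulas, and neither their direct search nor their error function is reproduced. The names `peaking_sos` and `cascade_response_db` are hypothetical.

```python
import cmath
import math


def peaking_sos(f0, gain_db, q, fs):
    """Return (b0, b1, b2, a0, a1, a2) for a peaking section, with a0 == 1."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0 = 1.0 + alpha * amp
    b1 = -2.0 * math.cos(w0)
    b2 = 1.0 - alpha * amp
    a0 = 1.0 + alpha / amp
    a1 = -2.0 * math.cos(w0)
    a2 = 1.0 - alpha / amp
    # Normalize so that the leading denominator coefficient is 1.
    return (b0 / a0, b1 / a0, b2 / a0, 1.0, a1 / a0, a2 / a0)


def cascade_response_db(sections, freq, fs):
    """Magnitude in dB of a chain of second-order sections at freq."""
    z1 = cmath.exp(-2j * math.pi * freq / fs)  # z^-1 on the unit circle
    h = 1.0 + 0.0j
    for b0, b1, b2, a0, a1, a2 in sections:
        h *= (b0 + b1 * z1 + b2 * z1 * z1) / (a0 + a1 * z1 + a2 * z1 * z1)
    return 20.0 * math.log10(abs(h))


fs = 48000  # sample rate in Hz (assumed)
chain = [peaking_sos(1000.0, 6.0, 2.0, fs), peaking_sos(4000.0, -3.0, 1.0, fs)]
for f in (100.0, 1000.0, 4000.0, 10000.0):
    print(f, "Hz:", round(cascade_response_db(chain, f, fs), 2), "dB")
```

A search-based design would adjust the (frequency, gain, Q) triples of each section to minimize an error function computed from a response like `cascade_response_db`.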

P6-7 A Zero-Pole Vocal Tract Model Estimation Method Accurately Reproducing Spectral Zeros—Damián Marelli, University of Vienna - Vienna, Austria; Peter Balazs, Austrian Academy of Sciences - Vienna, Austria
Model-based speech coding consists of modeling the vocal tract as a linear time-variant system. The all-pole model produced by the computationally efficient linear predictive coding method provides a good representation for the majority of speech sounds. However, nasal and fricative sounds, as well as stop consonants, contain spectral zeros, which require the use of a zero-pole model. Roughly speaking, a zero-pole model estimation method typically makes a nonparametric estimate of the vocal tract impulse response and tunes the zero-pole model to fit this estimate in a least-squares sense. In this paper we propose an alternative strategy. We tune the zero-pole model to directly fit the power spectrum of the speech signal on a logarithmic scale, to be consistent with the way the human ear perceives sounds. In this way, we avoid the error introduced by the vocal tract impulse response estimation and obtain a model that reproduces spectral zeros more accurately on a logarithmic scale. A drawback of the proposed method, however, is its computational complexity.
Convention Paper 7028
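
The fitting criterion described in P6-7, matching the power spectrum on a logarithmic scale, can be illustrated with a small error measure. The sketch below evaluates a zero-pole model B(z)/A(z) in dB on a frequency grid and computes its mean squared difference from a target spectrum. It shows only the criterion, not the authors' estimation procedure; the names `power_spectrum_db` and `log_spectral_error`, the grid size, and the example coefficients are assumptions.

```python
import cmath
import math


def power_spectrum_db(b, a, n_freqs=64):
    """Power spectrum in dB of B(z)/A(z) at n_freqs points in [0, pi).

    Assumes no zero or pole lies exactly on a sampled frequency.
    """
    spectrum = []
    for k in range(n_freqs):
        z1 = cmath.exp(-1j * math.pi * k / n_freqs)  # z^-1 on the unit circle
        num = sum(c * z1 ** i for i, c in enumerate(b))
        den = sum(c * z1 ** i for i, c in enumerate(a))
        spectrum.append(20.0 * math.log10(abs(num / den)))
    return spectrum


def log_spectral_error(model_db, target_db):
    """Mean squared difference between two dB spectra."""
    return sum((m - t) ** 2 for m, t in zip(model_db, target_db)) / len(model_db)


# Hypothetical target with a deep spectral zero near a quarter of the sample rate.
target = power_spectrum_db([1.0, 0.0, 0.81], [1.0, -0.5])
# An all-pole candidate cannot follow the notch, so its log-scale error is large.
all_pole = power_spectrum_db([1.0], [1.0, -0.5])
print("all-pole error:", round(log_spectral_error(all_pole, target), 2), "dB^2")
```

Because the error is taken in dB, a deep notch that carries little energy still contributes strongly, which is why a log-scale fit rewards models that reproduce spectral zeros.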

P6-8 Artificial Speech Synthesis Using LPC—Manjunath D. Kadaba, Uvinix Computing Solutions Bangalore/Karnataka - Bangalore, India
Speech analysis and synthesis with Linear Predictive Coding (LPC) exploit the predictable nature of speech signals. Cross-correlation, autocorrelation, and autocovariance provide the mathematical tools to determine this predictability. If we know the autocorrelation of the speech sequence, we can use the Levinson-Durbin algorithm to find an efficient solution to the least mean-square modeling problem and use the solution to compress or resynthesize the speech.
Convention Paper 7029
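
The Levinson-Durbin step mentioned in P6-8 can be sketched as follows: given the autocorrelation of a frame, the recursion solves for the linear predictor coefficients, which can then drive an all-pole filter to resynthesize the signal from an excitation. This is a minimal textbook-style version written for this listing, not the author's implementation; the function names and the test autocorrelation are assumptions.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a real sequence (unnormalized)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns (a, err) where a[0] == 1.0 and err is the prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        # Order update of the predictor coefficients.
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def synthesize(a, excitation):
    """All-pole synthesis: y[n] = e[n] - sum_i a[i] * y[n - i]."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e - sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0))
    return y


# Autocorrelation of a first-order autoregressive process with pole at 0.5.
r = [0.5 ** k for k in range(3)]
coeffs, error_power = levinson_durbin(r, 2)
print(coeffs, error_power)
```

For compression, only the coefficients (or the reflection coefficients) and a compact description of the excitation need to be stored; `synthesize` then rebuilds an approximation of the frame.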