AES 34th International Conference, Jeju Island, Korea

Technical Sessions

Thursday, 28 August

Friday, 29 August

Saturday, 30 August

09:00am-09:30am

Invited Lecture III: Nokia's view on Mobile Audio Development
Juha Beckman, Nokia, Finland

Audio has become an increasingly important component of wireless handsets. From the viewpoint of a leading mobile phone company, we will explore future directions in hardware aspects including electronics, loudspeakers, microphones, signal processing, connectivity, services, and content.

09:30am-10:30am
<ET> Evaluation & Testing

ET-1
Impression Evaluation Model for Button Sounds Using a Neural Network
Gen Onishi, Hiroshima International University, Japan, Shunsuke Ishimitsu, Hiroshima City University, Japan, Koji Sakamoto, Takayuki Arai, University of Hyogo, Japan, Toshikazu Yoshimi, Yuichi Fujimoto and Kenichi Kawasaki, Pioneer Corporation, Saitama, Japan

This paper presents a neural-network-based impression evaluation model for the button sounds generated when users press the buttons on car audio equipment. The dynamic characteristics of 11 kinds of button sounds, obtained from their wavelet transform frequencies and sound pressure values, are fed into the network as inputs. The model then outputs three factor scores, "esthetic", "force" and "metallic", and an evaluation value on an "offensive - pleasant" scale. By analyzing the internal functions of the neural network after training, we confirmed that the model acquired a mechanism that extracts these four impression evaluation values from the sound characteristics, showing that the model could enable automated button sound design.

ET-2
A Study of Evaluating the Button Sounds using Wavelets
Shunsuke Ishimitsu, Koji Sakamoto, Hiroshima City University, Japan, Toshikazu Yoshimi, Yuichi Fujimoto, Kenichi Kawasaki, Pioneer Corporation, Saitama, Japan

A lot of attention has been directed at designing various sounds that are treated as noise, such as automobile acceleration sounds and cleaner sounds. The reason is that the idea of sound being a normal part of product operation has permeated society. We focused on the design and evaluation of 11 kinds of button sounds. First, impressions were extracted by the semantic differential (SD) method, and their relationship to a time-frequency analysis was investigated. Next, we examined whether the impression changed when a sound that generated a bad impression was processed using adaptive control into a sound that generated a good impression.

11:00am-12:30pm
<CAS2> Coding for Audio and Speech, PART2

CAS2-1
Introduction to the OpenCORE audio components used in the Android platform
Javier Tapia, Jim Kosmach, Greg Sherwood, Ralph Neff, PacketVideo Corporation, San Diego CA, USA

Audio and speech codecs such as MP3, AAC, and AMR are used extensively on mobile devices throughout the world. In the ideal case, such codecs rely on hardware acceleration. However, it is also very common to see software audio codecs running on the main application processor, which is often an ARM core processor. Such codecs must be memory efficient, processing cycle efficient, portable to multiple operating systems, robust to data loss, and must also have a modular interface. In this paper, we introduce the OpenCORE multimedia framework and associated optimized audio codecs which are a part of the Android platform. We show how these components meet the challenging requirements for use in mobile devices. The OpenCORE audio components are currently available from the Open Handset Alliance as part of the Android SDK, and the source code for these components is scheduled for release in late 2008. The components are thus freely available for use in mobile device projects, and for non-mobile projects as well.

CAS2-2
An Efficient Forward Prediction Order Selection Method for MPEG-4 Audio Lossless Coding
Choong Sang Cho, Je Woo Kim, Seok Pil Lee, Byeong Ho Choi, Korea Electronic Technology Institute (KETI), Kyunggi, Korea, Jin Ah Kang, Hong Kook Kim, Gwangju Institute of Science and Technology (GIST), Gwangju, Korea

Users of multimedia products increasingly require high-quality audio services. Such services can be realized by lossless coding of multi-channel audio, and MPEG-4 audio lossless coding (ALS) is a good candidate. However, proper selection of the forward prediction order is crucial, since the compression ratio of MPEG-4 ALS depends heavily on the prediction order. In this paper, we propose a new method that minimizes the total bit rate with respect to the prediction order. That is, the proposed method estimates the prediction order at which the bit rate increment due to the prediction order becomes higher than the bit rate reduction of the residual encoding, so that the total bit rate starts to increase. To this end, we first compute the mean squared errors (MSEs) of the residuals corresponding to two successive prediction orders, and then estimate the prediction order by comparing the relative reduction in MSE with a predefined threshold that corresponds to the bit rate for the quantization of a reflection coefficient. The performance of the proposed method is evaluated by measuring the compression ratio. The experiments show that the proposed method achieves a better compression ratio than the normal method in MPEG-4 ALS and a compression ratio comparable to that of the adaptive method in MPEG-4 ALS.
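
The stopping rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the residual MSE per order is obtained from a Levinson-Durbin recursion on the frame's autocorrelation, and the names `residual_mse_per_order`, `select_order`, and `synth_ar2`, as well as the threshold value 0.05, are hypothetical placeholders. In the paper the threshold is tied to the bit cost of quantizing one reflection coefficient, which is not modeled here.

```python
import random


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def residual_mse_per_order(frame, max_order):
    """Levinson-Durbin recursion.

    Returns [E_0, ..., E_max_order], the residual MSE of the optimal
    forward linear predictor at each order.
    """
    r = autocorrelation(frame, max_order)
    errors = [r[0]]
    coeffs = []  # predictor coefficients a_1..a_(m-1) of the previous order
    for m in range(1, max_order + 1):
        if errors[-1] <= 0.0:  # silent or perfectly predicted frame
            errors.extend([errors[-1]] * (max_order - m + 1))
            break
        acc = r[m] - sum(coeffs[j] * r[m - 1 - j] for j in range(m - 1))
        k = acc / errors[-1]  # reflection coefficient of order m
        coeffs = [coeffs[j] - k * coeffs[m - 2 - j] for j in range(m - 1)] + [k]
        errors.append(errors[-1] * (1.0 - k * k))
    return errors


def select_order(frame, max_order, threshold):
    """Largest order reached before the relative MSE reduction between two
    successive orders first drops below the threshold (assumed value)."""
    errors = residual_mse_per_order(frame, max_order)
    for m in range(1, max_order + 1):
        prev, cur = errors[m - 1], errors[m]
        if prev <= 0.0:
            return m - 1
        relative_reduction = (prev - cur) / prev  # equals k_m squared
        if relative_reduction < threshold:
            return m - 1
    return max_order


def synth_ar2(n, seed=0):
    """Hypothetical test signal: x[t] = 1.5 x[t-1] - 0.8 x[t-2] + noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(1.5 * x[-1] - 0.8 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


if __name__ == "__main__":
    frame = synth_ar2(2000)
    print(select_order(frame, max_order=10, threshold=0.05))
```

Stopping at the first order whose gain falls below the threshold avoids encoding the residual at every candidate order, which is the efficiency argument the abstract makes against an exhaustive search.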

CAS2-3
Segmented Dimensionality Reduction Coding on Frequency Domain Signal
Minje Kim, Seungkwon Beack, Taejin Lee, Daeyoung Jang, Kyeongok Kang, Electronics and Telecommunications Research Institute (ETRI), Daejeon, Korea

This paper proposes schemes for compressing frequency domain acoustic signals using dimensionality reduction methods. Dimensionality reduction methods that work on a two-dimensional matrix usually achieve a high compression ratio, since they not only allow us to represent the input matrix with a smaller amount of data but also exploit intrinsic information in the original data. A frequency domain signal can be seen as a (number of frequency bands) × (number of total frames) input matrix for a dimensionality reduction method. In this case, however, real-time encoding is not possible, and an encoder-side delay equal to the length of the whole input signal is inevitable. To minimize the delay, this paper proposes a coding scheme that applies dimensionality reduction serially to segments of the input data frames.
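
The segment-by-segment idea can be sketched as follows. The abstract does not name the dimensionality reduction method, so this example substitutes a rank-1 approximation computed by power iteration as a stand-in; the functions `rank1_approx`, `encode`, and `decode` and the segment length are assumptions made for illustration only.

```python
import math


def rank1_approx(segment, iterations=50):
    """Approximate a (bands x frames) segment as outer(u, v) by power iteration.

    u has unit norm; v carries the scale. Starts from the highest-energy row.
    """
    n_rows, n_cols = len(segment), len(segment[0])
    v = list(max(segment, key=lambda row: sum(x * x for x in row)))
    u = [0.0] * n_rows
    for _ in range(iterations):
        u = [sum(row[c] * v[c] for c in range(n_cols)) for row in segment]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:  # all-zero segment
            return [0.0] * n_rows, [0.0] * n_cols
        u = [x / norm for x in u]
        v = [sum(segment[r][c] * u[r] for r in range(n_rows))
             for c in range(n_cols)]
    return u, v


def encode(spectrogram, segment_len):
    """Reduce each block of segment_len frames independently, so the encoder
    only waits for one segment rather than for the whole signal."""
    n_frames = len(spectrogram[0])
    coded = []
    for start in range(0, n_frames, segment_len):
        segment = [row[start:start + segment_len] for row in spectrogram]
        coded.append(rank1_approx(segment))
    return coded


def decode(coded):
    """Rebuild the (bands x frames) matrix from the per-segment factors."""
    n_rows = len(coded[0][0])
    rows = [[] for _ in range(n_rows)]
    for u, v in coded:
        for r in range(n_rows):
            rows[r].extend(u[r] * x for x in v)
    return rows
```

Each segment is stored as (bands + frames) numbers instead of (bands × frames), and the encoder delay is bounded by `segment_len` frames. A rank-1 model is lossy for general input; a practical coder would keep more components per segment.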
