AES Conventions and Conferences


  Program at a glance

  Opening & Conference Talk

Opening and Introduction John Oh, Chair of 29th AES Conference
Welcoming Address Koeng-Mo Sung
Seoul Nat’l Univ. & Institute of Electronics Engineers of Korea
Congratulatory Address Jung-Hee Song
Ministry of Information and Communication

Conference Talk: Perspective of Mobile Music Service
Won-Yong Jo
Senior Manager, MelOn Business Team Leader
Content Business Division
Specific information

Senior Manager Won-Yong Jo is in charge of the ubiquitous music service “MelOn” at SK Telecom, known as Korea’s top wireless operator.


As the designer and operator who drove the “MelOn” project to success, he continues to carry “MelOn” forward, creating a new digital content market through the synergy of converged wired and wireless networks.


He played a great role in conceiving and developing “MelOn,” which proved to be a big hit: a total of 6,000,000 people signed up for it within about 18 months of its inception.


SK Telecom, a leading mobile operator in Korea, has acquired over six million subscribers within about a year of launching its fixed-mobile music portal, MelOn. Combating prevailing piracy and convincing consumers of the service’s value has been a challenge. Nevertheless, MelOn demonstrates the potential of mobile music services through an integrated rental business model.
MelOn is an integrated music service in the sense that users can download or stream over 850,000 tunes through the MelOn website via broadband networks, or on the move through SKT’s EV-DO networks. The tunes can be shared between PCs, MP3 players, and handsets without additional charge. In addition to well-organized music pools arranged by chart or genre, MelOn hosts a showcase that delivers the latest albums before they appear in record shops. Users can watch music videos as well.

MelOn has been regarded as a prototype in terms of business model and service features in the upcoming digital music market.

  Industrial Solution Introduction

Date & Time: September 2 (Saturday), 4:30 – 5:30 p.m.
Place: Grand Ballroom, Engineer House

Presentation Schedule
Time Company
16:30 - 16:40   Fraunhofer IIS
16:40 - 16:50   LG Electronics
16:50 - 17:00   ON Semiconductor
17:00 - 17:10   Oxford Digital Limited
17:10 - 17:20   Pulsus Technologies
17:20 - 17:30   Samsung Advanced Institute of Technology
17:30 - 17:40   Softmax

  Workshop: Power Efficient Audio

Date & Time: September 4 (Monday) 1:30 – 3:30 p.m.
Place: Grand Ballroom, Engineer House


Power efficiency of audio systems is increasingly important in mobile and handheld applications. Music playback has become an essential function of mobile handsets, and the performance and storage capacity of MP3 players are improving rapidly. The purpose of this workshop is to review theory, design, and practice in power-efficient audio technologies from various aspects. The discussion will include hardware issues of semiconductor chips, transducers, software, and efficient amplification methods including the Class-D amplifier.


Juha Backman, Nokia, Finland
Kiyoung Choi, Seoul National University, Korea
Pascal Tournier, ON Semiconductor, France
Pierre-Louis Bossart, Freescale, USA (to be confirmed)
Jihong Kim, Seoul National University, Korea
Yonhong Jhung, Tamul Multimedia, Korea
John J.H. Oh (Chair), Pulsus Technologies, Korea

* More Information and abstracts will be provided onsite.

  Technical Sessions (This preliminary program is accurate as of press time)

Saturday, September 2

Paper Session 1 Coding I 10:00 am-10:30 am

1-1 Multichannel Goes Mobile: MPEG Surround Binaural Rendering (Invited), Jeroen Breebaart1, Juergen Herre2, Lars Villemoes3, Craig Jin4, Kristofer Kjörling3, Jan Plogsties2, 1Philips Research Laboratories, The Netherlands, 2Fraunhofer IIS, Germany, 3Coding Technologies, Sweden, 4VAST Audio, Australia
Surround sound is on the verge of broad adoption in consumers' homes, for digital broadcasting and even for Internet services. The currently developed MPEG Surround technology offers bitrate-efficient, mono/stereo-compatible transmission of high-quality multi-channel audio. This enables multi-channel services for applications where mono or stereo backwards compatibility is required, as well as applications with severely bandwidth-limited distribution channels. This paper outlines a significant addition to the MPEG Surround specification which enables computationally efficient decoding of MPEG Surround data into binaural stereo, as is appropriate for appealing surround sound reproduction on mobile devices such as cellular phones. The publication describes the basics of the underlying MPEG Surround architecture, the binaural decoding process, and subjective testing results.

Paper Session 2 Coding II 11:00 am-12:30 pm

2-1 Bandwidth Extension for Scalable Audio Coding, Miyoung Kim, Eunmi Oh, Dohyung Kim, Junghoe Kim, Sangwook Kim, Samsung Advanced Institute of Technology (SAIT), Korea
MPEG-4 BSAC has fine-grain scalability: the bitstream can be truncated and decoded at any layer from a single full bitstream. This fine-grain scalability supports adaptive transmission and decoding according to user control, terminal specifications, and network environment. In the current BSAC scheme, however, as fewer layers are transmitted, the decoded output loses its high-frequency content and the sound quality degrades. In this paper, a novel method is proposed that recovers the missing frequency content when the decoded bitrate is lower than the top bitrate. It provides full bandwidth at any bitrate below the top bitrate and graceful degradation of sound quality in scalable reproduction. It also improves audio quality at low bitrates.

2-2 Preprocessing Method for Enhancing Digital Audio Quality in Speech Communication Systems, Geun-Bae Song1, Chul-Yong Ahn1, Jae Bum Kim1, Ho-Chong Park2, Austin Kim1, 1Samsung Electronics, Korea, 2Kwangwoon University, Korea
This paper presents a preprocessing method that modifies the input audio signals of a speech coder so that enhanced signals are finally obtained at the decoder. For this purpose, we introduce a noise suppression (NS) scheme and adaptive gain control (AGC), where an audio input and its coding error are treated as a noisy signal and a noise, respectively. The coding error is suppressed from the input, and the suppressed input is then level-aligned to the original input by the subsequent AGC operation. Consequently, this preprocessing redistributes the spectral energy of the music input over the spectral domain so that the preprocessed music can be coded more effectively by the following coder. As a drawback, the procedure needs an additional encoding pass to calculate the coding error. However, it provides a generalized formulation applicable to many existing speech coders. Preference listening tests indicate that the proposed approach produces significant enhancements in perceived music quality.

2-3 Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting, Dalwon Jang1, Seungjae Lee2, Jun Seok Lee2, Minho Jin1, Jin S. Seo2, Sunil Lee1, Chang D. Yoo1, 1Korea Advanced Institute of Science and Technology, Korea, 2Electronics and Telecommunications Research Institute, Korea
In this paper, an automatic commercial monitoring system using audio fingerprinting is proposed. The goal of the commercial monitoring system is to identify the title and the exact duration of commercials in real time. To achieve this, only the audio is considered: audio is easy to handle in real time and can provide high accuracy for commercial identification. More precisely, spectral subband centroids are extracted from the audio part of a commercial and indexed using the K-D tree algorithm. To detect aired commercials robustly, a four-step verification method using the indexed tree of the commercials is proposed. Experimental results show that the proposed system is robust against degradations during the real broadcasting and recording process and thus can perform commercial monitoring satisfactorily.
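As an illustration of the fingerprint feature described above, the sketch below computes spectral subband centroids and matches a query against stored fingerprints by nearest-neighbor search. Frame length, band count, and all function names are illustrative assumptions; the paper indexes fingerprints with a K-D tree, for which brute-force search stands in here.

```python
import numpy as np

def subband_centroids(frame, sr, n_bands=8):
    """Spectral subband centroids of one audio frame: the magnitude-weighted
    mean frequency of each of n_bands equal-width spectral bands."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    edges = np.linspace(0, len(spec), n_bands + 1, dtype=int)
    return np.array([
        freqs[lo:hi] @ spec[lo:hi] / (spec[lo:hi].sum() + 1e-12)
        for lo, hi in zip(edges[:-1], edges[1:])
    ])

def identify(query, fingerprints):
    """Nearest-fingerprint lookup by Euclidean distance (the paper uses a
    K-D tree for this step; brute force keeps the sketch short)."""
    dists = [np.linalg.norm(query - fp) for fp in fingerprints]
    return int(np.argmin(dists))
```

A 440 Hz tone corrupted by mild noise still lands nearest its own clean fingerprint, which is the robustness property the system relies on.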

Paper Session 3 Class-D Amplifier I 1:30 pm-2:30 pm

3-1 Full Digital Amplification for Handheld Devices (Invited), John Oh, Pulsus Technologies, Korea
The Class-D audio amplifier has an intrinsic power-efficiency advantage in mobile and handheld applications such as mobile phones and MP3 players. The digital Class-D amplifier (in other words, the full digital amplifier) is ideal because better audio quality can be implemented with integrated DSP features, while the digital interface is immune to the noisy environment of handheld systems such as mobile phones. However, this technology could not previously be adopted in handheld audio systems because of several technological challenges. We review the issues of full digital amplification in the handheld environment and the implementation of SoC products, which were recently first applied to a music mobile phone.

3-2 Chaotic Modulation in PWM Digital Amplifier, Vladislav Shimanskiy, Samsung Electronics, Korea
The performance of a pure digital audio amplifier using pulse width modulation (PWM) depends strongly on the suppression of high-frequency, inaudible components in its spectrum. Noise-like phase modulation of the PWM carrier signal reduces nonlinear effects in the demodulation filter and can be advantageous for EMC reasons. In this paper we propose a method of digital PWM amplifier implementation in which a digital chaotic oscillator is used for carrier spreading. The modulation principle, interval modulator schematic, and chaotic oscillator structure, as well as experimental results, are presented.
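The core idea, chaotic phase dither of a PWM carrier, can be sketched as follows. The logistic map stands in for the paper's (unspecified) chaotic oscillator, and the pulse-rotation scheme, period, and shift range are illustrative assumptions; the point is that the dither spreads the carrier spectrum while leaving the per-period duty cycle, and hence the encoded audio value, untouched.

```python
import numpy as np

def logistic_chaos(n, x0=0.37, r=3.99):
    """Logistic map as a stand-in chaotic oscillator (assumption: the paper's
    actual oscillator structure is not reproduced here)."""
    xs, x = np.empty(n), x0
    for i in range(n):
        x = r * x * (1.0 - x)
        xs[i] = x
    return xs

def pwm_with_chaotic_dither(samples, period=64, max_shift=8):
    """Naive uniform PWM: each sample in [-1, 1] becomes one pulse whose
    width encodes its value; the pulse is rotated within its period by a
    chaotic amount to spread the carrier's spectral lines."""
    shifts = (logistic_chaos(len(samples)) * max_shift).astype(int)
    frames = []
    for s, shift in zip(samples, shifts):
        width = int(round((s + 1.0) / 2.0 * period))
        pulse = np.zeros(period)
        pulse[:width] = 1.0
        frames.append(np.roll(pulse, shift))  # phase dither, duty preserved
    return np.concatenate(frames)
```

Averaging each period recovers the duty cycle regardless of the dither, which is why the audible content survives the carrier spreading.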

Poster Session 2:30 pm-3:30 pm

P-1 Dual Channel Audio Decoding Architecture of Digital TV SoC, Hyo Jin Kim, Joon Il Lee, Tae Hoon Hwang, Myeong Hoon Lee, H. O. Oh, Seong Jong Choi, LG Electronics, Korea
In this paper, we describe design issues of an audio decoding processor that can decode and/or encode two or more audio streams simultaneously. Dual channel decoding is an emerging issue in upcoming convergent multimedia applications including digital TV, DVD, cellular phones, and Internet audio. To satisfy the various requirements for dual channel decoding, we propose a novel DSP hardware architecture and software structure.

P-2 Multichannel Sound Scene Control for MPEG Surround, Seungkwon Beack, Jeongil Seo, Inseon Jang, Dae-young Jang, Electronics and Telecommunications Research Institute (ETRI), Korea
MPEG Surround is a technique that compresses multichannel signals as backward-compatible mono or stereo sound, with a small increase in bitrate for additional side information. This additional information, called spatial cues, removes much of the inter-channel redundancy. In this paper, to provide useful functionality to MPEG Surround, we propose a multichannel sound scene control algorithm based on the modification of spatial cues. By simple modification of the spatial cues, the overall sound scene produced by the multichannel signal can be easily controlled and panned according to user preference or other system requests, such as synchronization with multi-view video and interactive TV.

P-3 Evaluation of PEAQ for the Quality Measurement of Perceptual Audio Encoders, Ashok Magadum, Ittiam Systems, Karnataka, India
Quality measurement of perceptual audio encoders during design and embedded platform implementation plays a critical role. Perceptual audio codecs, which are lossy in nature, provide tradeoff between quality and complexity based on the application. In this investigation, the PEAQ (Perceptual Evaluation of Audio Quality) is used for measurement of the audio quality and observations are made on its benefits and limitations. Enhancements to improve the performance of PEAQ for quality measurement of perceptual audio codecs are also discussed.

P-4 Implementation of a Sound Engine for the Gayageum Based on the Intel Bulverde PXA272 Processor, Sang-Jin Cho, Sung-min Cho, Hoon Oh, Ui-Pil Chong, University of Ulsan, Korea
In this paper, we describe sound synthesis of the Korean plucked string instrument called the gayageum and the implementation of a sound engine based on the Intel Bulverde PXA272 processor. Commuted waveguide synthesis (CWS) is used for physical modeling of the gayageum and is modified to include the transmission characteristic of the movable bridge called the Anjok. The characteristic of the Anjok is extracted from recorded sound by means of adaptive filtering for inverse system identification. The sound engine is implemented on an X-hyper272A board equipped with the Intel Bulverde PXA272 processor as a prototype.
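The commuted waveguide model itself is not given in the abstract; as a rough sketch of the string-modeling family it belongs to, the classic Karplus-Strong loop below shows the basic plucked-string mechanism (a delay line excited by noise and smoothed by a one-pole average). All parameters are illustrative, and this is a simpler relative of CWS, not the paper's method.

```python
import numpy as np

def karplus_strong(freq, sr, dur, seed=0):
    """Karplus-Strong plucked string: a noise-filled delay line of length
    sr/freq, repeatedly read out and smoothed by a two-tap average, which
    makes the tone decay like a damped string."""
    rng = np.random.default_rng(seed)
    n = int(sr / freq)                      # delay-line length sets the pitch
    buf = rng.uniform(-1.0, 1.0, n)         # "pluck" = initial noise burst
    out = np.empty(int(sr * dur))
    for i in range(len(out)):
        out[i] = buf[i % n]
        buf[i % n] = 0.5 * (buf[i % n] + buf[(i + 1) % n])  # lowpass feedback
    return out
```

The averaging filter damps high frequencies fastest, so the output starts noisy and decays toward a soft, pitched tail, which is the audible signature of the plucked-string model.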

Paper Session 4 Class-D Amplifier II 3:30 pm-4:30 pm

4-1 A Hybrid System Approach for Class-D Audio Amplifier, Gaël Pillonnet1, Nacer Abouchi2, Philippe Marguery1, 1ST Microelectronics, France, 2CPE Lyon, France
The field of switching electronics poses challenging control problems that cannot be treated in a complete manner using traditional modelling and controller design approaches. Class-D amplifiers invite the application of advanced hybrid systems methodologies. Recent theoretical advances in the field of hybrid systems are inviting both the control and the electronics communities to revisit the modelling and control issues. Motivated by this, a novel model for Class D is outlined in this paper. As in the motor drive and DC-DC conversion fields, hybrid control theory can be applied to the audio amplifier, but the first step is to find a good hybrid model to describe the system.

4-2 Class AB versus D for Multimedia Applications in Portable Electronic Devices, Pascal Tournier, On Semiconductor, France
Portable electronic products are integrating more and more multimedia capabilities. This has created great interest in incorporating stereo capabilities. Thus, when considering the power management inside a cellular phone, for example, the audio function is becoming a key player. By looking at the basics of Class AB and Class D topologies, this paper promotes the efficiency of the new filterless Class-D amplifiers.

Sunday, September 3

Paper Session 5 Implementations I 9:00 am-10:30 am

5-1 Using aacPlus for Premium Color Ring Back Tones, Andreas Ehret1, Michael Schug1, Holger Hoerich1, Seongsoo Park2, Dae Sic Woo2, Dong-Hahk Lee2, Jin Soo Park3, Sung Il Park3, 1Coding Technologies, Germany, 2SK Telecom, Korea, 3Mcubeworks, Korea
aacPlus is a highly efficient audio coding scheme which is capable of providing very good audio quality at low bit rates. Color ring back tone services provide a music experience while waiting for a called person to answer the phone, replacing the traditional beep sound ring tones. These services have been commercialized since 2002 but some audio quality issues remain as the existing speech codecs are being used to code music. With the increasing availability of aacPlus on mobile phones it becomes possible to use aacPlus for this kind of application and it will be the core component of new premium color ring back tone services. The new service will provide significantly improved audio quality which allows for an excellent user experience with music ring back tones. The basic parameters and extensions to adapt the aacPlus codec to the transmission channel are described. Simulation and subjective listening test results are also presented in this paper.

5-2 Low Complexity Virtual Bass Enhancement Algorithm for Portable Multimedia Device, Manish Arora, HyuckJae Lee, Seongcheol Jang, Samsung Electronics, Korea
Earbuds and insert earphones have recently gained immense popularity with mobile multimedia systems owing to their convenience and easy, low-cost design. Unfortunately, their design is subject to severe constraints, especially affecting their low-frequency performance. Bass reproduction contributes significantly to perceived sound quality, so good bass performance is essential. Simply increasing the sound energy in the bass range is not viable, since the gains required are exceedingly high and signal distortion occurs because of speaker overload. Recently, methods have been proposed that invoke a low-frequency illusion using the psychoacoustic phenomenon of the missing fundamental. This paper proposes a simple and effective signal processing method to create a bass illusion using the missing fundamental effect, at a complexity of 12 MIPS on a Motorola 56371 audio DSP.
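The paper's algorithm is not given in the abstract, but the missing-fundamental idea can be sketched: isolate the bass band, pass it through a nonlinearity to generate harmonics, and mix those harmonics back in above the cutoff, where small transducers can actually reproduce them. The cutoff, the output gain, and the choice of a half-wave rectifier as the nonlinearity are all assumptions for illustration.

```python
import numpy as np

def virtual_bass(x, sr, cutoff=200.0):
    """Crude missing-fundamental sketch: brick-wall-isolate the bass band,
    generate harmonics with a half-wave rectifier, and add only the part of
    the harmonic signal that lies above the cutoff back to the input."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / sr)
    bass = np.fft.irfft(np.where(f < cutoff, X, 0), len(x))
    harm = np.maximum(bass, 0.0)                 # nonlinearity -> harmonics
    H = np.fft.rfft(harm)
    harm_hp = np.fft.irfft(np.where(f >= cutoff, H, 0), len(x))
    return x + 2.0 * harm_hp                     # mixing gain is ad hoc
```

Feeding in a 100 Hz tone produces new energy at 200 Hz and above; the ear infers the 100 Hz fundamental from that harmonic series even when the driver cannot radiate it.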

5-3 A Fast Quantization Loop Algorithm for MP3/AAC Encoders, Hyen-O Oh1, Young-Cheol Park2, Hyo Jin Kim1, Seung Jong Choi1, 1LG Electronics, Korea, 2Yonsei University, Korea
A fast quantization loop algorithm for low-power implementations of MP3/AAC encoders is proposed. The proposed algorithm exploits an empirically derived linear relationship between quantizer parameters, and it can dramatically reduce the maximum number of iterations needed in quantization. The resulting MP3/AAC encoder can be implemented in real time on a power-limited system.

Paper Session 6 Speech Processing 11:00 am-12:30 pm

6-1 Hands-Free Audio and Its Application to Telecommunication Terminals, Martin Schönle1, Christophe Beaugeant1, Kai Steinert1, Heinrich W. Loellmann2, Bastian Sauert2, Peter Vary2, 1Siemens AG, Germany, 2RWTH Aachen University, Germany
In this paper we focus on the enhancement of speech quality by hands-free audio systems in modern, multimedia-enabled telecommunication terminals. Requirements such as compatibility with wideband audio and full-duplex hands-free operation demand sophisticated system designs. As a starting point we introduce a state-of-the-art system developed for mobile phones, considering practical aspects of algorithm design. Algorithmic extensions to this solution are presented, supporting speech enhancement for the user of the terminal or reducing the round-trip delay. In addition, more advanced concepts are discussed. They offer the capability of joint design for the most important algorithms and provide the possibility to exploit psychoacoustic properties of the human ear.

6-2 A Band Extension Technique for G.711 Speech Based on Full Wave Rectification and Steganography, Naofumi Aoki, Hokkaido University, Japan
This study investigates a band extension technique for speech data encoded with G.711, the most common codec for digital speech communication systems such as VoIP. The proposed technique employs steganography for the transmission of the side information required for the band extension, which is based on full wave rectification. Thanks to the steganography, the proposed technique is able to enhance the speech quality without increasing the amount of data transmitted. The results of a subjective evaluation indicate that the proposed technique may be useful for improving speech quality compared with the conventional codec.

6-3 Adaptive Microphone Array with Self-Delay Estimator, Yang-Won Jung1, Hong-Goo Kang2, Hyo Jin Kim1, Seung Jong Choi1, 1LG Electronics, Korea, 2Yonsei University, Korea
In real environments, it is difficult to acquire clean speech because of noise and reverberation. A microphone array is one of the most promising solutions for accurate acquisition of speech in noisy environments. Normally, a time delay estimator is required for an adaptive microphone array system. In this paper, an adaptive microphone array system with a self-delay estimator is proposed. The generalized sidelobe canceller (GSC) with an adaptive blocking matrix (ABM) is considered as the adaptive array algorithm. By showing that the ABM can estimate the relative time delay between sensors, the proposed system utilizes the ABM not only for blocking target components in the blocked signal path, but also for estimating the relative time delay. Therefore, the proposed system requires only the GSC structure while maintaining performance similar to a conventional system that uses an additional time delay estimator as a preprocessor. Simulation results show that the performance of the proposed system is identical to that of the conventional system with an additional time delay estimation module.

Monday, September 4

Paper Session 7 Implementations II 9:00 am-10:30 am

7-1 SLIMbus™: An Audio, Data and Control Interface for Mobile Devices, Juha Backman1, James Schuessler2, Bernard van Vlimmeren3, Xavier Lambrecht4, Peter Kavanagh2, Kenneth Boyce2, Genevieve Vansteeg2, Chris Travis5, 1Nokia, Finland, 2National Semiconductor, USA, 3Philips Research, The Netherlands, 4Philips Applied Technologies, Belgium, 5Wolfson Microelectronics, UK
A new inter-chip interface standard under development, SLIMbus™, to be published by the MIPI (Mobile Industry Processor Interface) Alliance is described. The primary target of the interface is isochronous transfer of digital audio signals, supporting all common sample rates and word lengths, and related device control, but the interface is equally well applicable to any application needing moderate data rates up to 28 Mbit/s. The serial multi-drop bus with separate data and clock wires supports high-quality mobile audio by providing a high-quality clock, flexible power management, and a large number of devices and audio channels.

7-2 Acoustic Communication with OFDM Signal Embedded in Audio, Hosei Matsuoka, Yusuke Nakashima, Takeshi Yoshimura, Multimedia Laboratories, NTT DoCoMo, Japan
This paper presents a method of aerial acoustic communication in which data is modulated using OFDM (Orthogonal Frequency Division Multiplexing) and embedded in regular audio material. It can transmit at a data rate of about 1kbps, which is much higher than is possible with other data hiding techniques. In our method, the high frequency band of the audio signal is replaced with OFDM carriers, each of which is power-controlled according to the spectrum envelope of the audio signal. The method enables the transmission of short messages from loudspeakers to mobile handheld devices without significantly degrading the quality of the original voice or music signals.
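A toy version of the embedding idea: data bits become BPSK symbols on high-frequency subcarriers, and one time-domain symbol is synthesized with an inverse FFT. The subcarrier indices, FFT size, and the omission of the paper's per-carrier power control against the audio's spectral envelope are all simplifying assumptions.

```python
import numpy as np

def ofdm_modulate(bits, n_fft=256, lo=64):
    """Map bits to BPSK (+1/-1) on subcarriers lo..lo+len(bits)-1 and
    synthesize one real time-domain OFDM symbol via inverse FFT."""
    sym = np.zeros(n_fft // 2 + 1, dtype=complex)
    sym[lo:lo + len(bits)] = 2 * np.array(bits) - 1
    return np.fft.irfft(sym, n_fft)

def ofdm_demodulate(sig, n_bits, n_fft=256, lo=64):
    """Recover the bits by taking the FFT and thresholding the real part
    of each data subcarrier."""
    spec = np.fft.rfft(sig, n_fft)
    return (spec.real[lo:lo + n_bits] > 0).astype(int)
```

In the actual system each carrier would be scaled to sit under the audio's high-band envelope before being added to the program material, so the data ride inaudibly on the music.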

7-3 A Survey of Mobile Audio Architecture Issues, Pierre-Louis Bossart, Freescale Semiconductors, USA
This paper aims to provide a broad overview of the status of audio support in mobile handheld devices. The evolution from audio players or cell phones into convergent devices supporting multiple and complex use cases has generated new system constraints and new mobile audio architectures. By reviewing the complete mobile audio framework, from codecs to audio middleware, and from platform architecture to hardware, we provide a down-to-earth explanation of critical and dimensioning factors and shed some light on existing standards, upcoming solutions, and needed compromises.

Paper Session 8 3D Audio, Synthetic Audio 11:00 am-12:30 pm

8-1 Low Complexity 3D Audio Algorithms for Handheld Devices, Young-Cheol Park1, Taik-sung Choi2, Jae-woong Jung2, Dae-Hee Youn2, Si-wook Nam2, Jung-min Song2, 1Yonsei University, 2LG Electronics, Korea
In this paper, we present low-complexity 3D audio algorithms suitable for applications based on low-power and low-memory hardware. The algorithms include an artificial reverberator associated with a time-varying all-pass filter, IIR crosstalk cancellation filters implementing frequency warping, and a headphone externalization algorithm based on simplified head-related transfer functions (HRTFs). Performances and complexities of the presented algorithms are measured and compared with conventional ones. A 20-30% memory reduction was gained using the time-varying artificial reverberator, and a 50% computational reduction was achieved using the IIR crosstalk cancellation filters. Finally, 80% of the computational complexity and 50% of the memory were saved by using the simplified externalization system.

8-2 Stereo Widening for Loudspeakers in Mobile Devices, Pauli Minnaar, Jan Abildgaard Pedersen, AM3D, Denmark
A new cross-talk cancellation system is proposed for processing normal stereo signals to be played back through a pair of very closely-spaced loudspeakers, such as in a mobile phone. The goal of the proposed system is to substantially widen the stereo image while maintaining the timbre of the original stereo material. Several implementations of the system are shown and the advantages over traditional cross-talk cancellation systems are discussed.

8-3 Evaluation of Iterative Matching for Scalable Wavetable Synthesis, Simon Wun1, Andrew Horner2, 1Institute for Infocomm Research, Singapore, 2Hong Kong University of Science and Technology, Hong Kong
Wavetable synthesis of musical instrument tones has attracted some interest in its application in the generation of polyphonic ringtones on mobile phones. Recent work on wavetable synthesis assumes that a constant number of wavetables are used throughout the duration of a tone. A scalable approach would dynamically allocate wavetables to simultaneous voices to allow more polyphony and improve the sound quality. Iterative wavetable matching finds the basis spectra one by one over several iterations, and offers real-time control of the number of wavetables, supporting scalability. This paper describes and evaluates several iterative methods. Matching results for a range of instruments show that, on average, iterative local search can find matches with errors within 0.5% of near-optimal non-iterative solutions. Iterative local search only rarely gets stuck on bad local optima.
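As a rough sketch of iterative matching, the greedy loop below selects basis spectra one at a time, fitting each with a least-squares amplitude so the residual error shrinks at every step. The plain greedy selection and the toy dictionary are illustrative assumptions; the paper evaluates more sophisticated iterative local search.

```python
import numpy as np

def iterative_match(target, dictionary, n_tables=3):
    """Greedy iterative wavetable matching: at each iteration pick the
    dictionary spectrum (with its least-squares amplitude) that most
    reduces the squared residual, then subtract it and repeat."""
    residual = np.asarray(target, dtype=float).copy()
    chosen, amps = [], []
    for _ in range(n_tables):
        best, best_err, best_a = None, np.inf, 0.0
        for i, d in enumerate(dictionary):
            a = residual @ d / (d @ d)            # least-squares amplitude
            err = np.sum((residual - a * d) ** 2)
            if err < best_err:
                best, best_err, best_a = i, err, a
        chosen.append(best)
        amps.append(best_a)
        residual -= best_a * dictionary[best]
    return chosen, amps, residual
```

Because each table is chosen and committed one at a time, playback can simply truncate the list when fewer wavetables are available, which is exactly the scalability property discussed above.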


(C) 2006, Audio Engineering Society, Inc.