AES 29th International Conference, Seoul, Korea

Program at a glance

Opening and Introduction	John Oh, Chair of 29th AES Conference
Welcoming Address	Koeng-Mo Sung Seoul Nat’l Univ. & Institute of Electronics Engineers of Korea
Congratulatory Address	Jung-Hee Song Ministry of Information and Communication

Conference Talk: Perspective of Mobile Music Service

Won-Yong Jo
Senior Manager, MelOn Business Team Leader
Content Business Division

Specific information

-	Senior Manager Won-Yong Jo is in charge of the ubiquitous music service “MelOn” of SK Telecom, which is known as Korea’s top wireless operator.
-	As the designer and operator who drove the “MelOn” project to success, he is carrying “MelOn” forward to create a new digital content market through the synergy of converging wired and wireless networks.
-	He played a major role in conceiving and developing “MelOn”, which proved to be a big hit: a total of 6,000,000 people signed up for it within about 18 months of its inception.

Abstract

A leading mobile operator in Korea, SK Telecom, has acquired over 6 million subscribers after a year of providing its fixed-mobile music portal, Melon. Combating prevailing piracy and convincing consumers of the service's value has been a challenge. However, Melon demonstrates the potential of mobile music service through an integrated rental business model.
Melon is an integrated music service in the sense that users can download or stream over 850,000 tunes through the Melon website via broadband networks, or on the move through SKT’s EV-DO networks. The tunes can be shared between PCs, MP3 players and handsets without additional charge. In addition to well-organized music pools arranged by chart or genre, Melon hosts a showcase that delivers the latest albums before they appear in record shops. Users can watch music videos as well.

MelOn has been regarded as a prototype in terms of business model and service features in the upcoming digital music market.

Industrial Solution Introduction

Date & Time: September 2 (Saturday), 4:30 – 5:30 p.m.
Place: Grand Ballroom, Engineer House

Presentation Schedule

Time	Company
16:30 - 16:40	Fraunhofer IIS
16:40 - 16:50	LG Electronics
16:50 - 17:00	ON Semiconductor
17:00 - 17:10	Oxford Digital Limited
17:10 - 17:20	Pulsus Technologies
17:20 - 17:30	Samsung Advanced Institute of Technology
17:30 - 17:40	Softmax

Workshop: Power Efficient Audio

Date & Time: September 4 (Monday), 1:30 – 3:30 p.m.
Place: Grand Ballroom, Engineer House

Description

Power efficiency of audio systems is increasingly important in mobile and handheld applications. Music playback has become an essential function of mobile handsets, and the performance and storage capacity of MP3 players are rapidly improving. The purpose of this workshop is to review theory, design and practice in power-efficient audio technologies from various aspects. The discussion will include hardware issues of semiconductor chips, transducers, software, and efficient amplification methods including class-D amplifiers.

Panel

Juha Backman, Nokia, Finland
Kiyoung Choi, Seoul National University, Korea
Pascal Tournier, ON Semiconductor, France
Pierre-Louis Bossart, Freescale, USA (to be confirmed)
Jihong Kim, Seoul National University, Korea
Yonhong Jhung, Tamul Multimedia, Korea
John J.H. Oh (Chair), Pulsus Technologies, Korea

* More information and abstracts will be provided onsite.

Technical Sessions (This preliminary program is accurate as of press time)

Saturday, September 2

Paper Session 1 Coding I	10:00 am-10:30 am

1-1 Multichannel Goes Mobile: MPEG Surround Binaural Rendering (Invited), Jeroen Breebaart¹, Juergen Herre², Lars Villemoes³, Craig Jin⁴, Kristofer Kjörling³, Jan Plogsties², ¹Philips Research Laboratories, The Netherlands, ²Fraunhofer IIS, Germany, ³Coding Technologies, Sweden, ⁴VAST Audio, Australia Surround sound is on the verge of broad adoption in consumers' homes, for digital broadcasting and even for Internet services. The currently developed MPEG Surround technology offers bitrate efficient, and mono/stereo compatible, transmission of high-quality multi-channel audio. This enables multi-channel services for applications where mono or stereo backwards compatibility is required, as well as applications with severely bandwidth limited distribution channels. This paper outlines a significant addition to the MPEG Surround specification which enables computationally efficient decoding of MPEG Surround data into binaural stereo, as is appropriate for appealing surround sound reproduction on mobile devices, such as cellular phones. The publication describes the basics of the underlying MPEG Surround architecture, the binaural decoding process, and subjective testing results.

Paper Session 2 Coding II	11:00 am-12:30 pm

2-1 Bandwidth Extension for Scalable Audio Coding, Miyoung Kim, Eunmi Oh, Dohyung Kim, Junghoe Kim, Sangwook Kim, Samsung Advanced Institute of Technology (SAIT), Korea MPEG-4 BSAC offers fine-grain scalability: the bitstream can be truncated and decoded at any layer from one full bitstream. This fine-grain scalability supports adaptive transmission and decoding according to user control, terminal specifications and network environment. In the current BSAC scheme, however, as fewer layers are transmitted the decoded output loses its high-frequency content and the sound quality degrades. This paper proposes a novel method that recovers the missing high-frequency signals when the decoded bitrate is lower than the top bitrate. It provides full bandwidth at any bitrate below the top bitrate and graceful degradation of sound quality in scalable reproduction. It also improves audio quality at low bitrates with the top layer.

2-2 Preprocessing Method For Enhancing Digital Audio Quality In Speech Communication System, Geun-Bae Song¹, Chul-Yong Ahn¹, Jae Bum Kim¹, Ho-Chong Park², Austin Kim¹, ¹Samsung Electronics, Korea, ²Kwangwoon University, Korea This paper presents a preprocessing method that modifies the input audio signals of a speech coder so that enhanced signals are obtained at the decoder. For this purpose, we introduce a noise suppression (NS) scheme and adaptive gain control (AGC), where an audio input and its coding error are considered as a noisy signal and a noise, respectively. The coding error is suppressed from the input and then the suppressed input is level-aligned to the original input by the following AGC operation. Consequently, this preprocessing method redistributes the spectral energy of the music input over the spectral domain so that the preprocessed music can be coded more effectively by the following coder. As a drawback, this procedure needs an additional encoding pass to calculate the coding error. However, it provides a generalized formulation applicable to many existing speech coders. Preference listening tests indicated that the proposed approach produces significant enhancements in perceived music quality.

2-3 Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting, Dalwon Jang¹, Seungjae Lee², Jun Seok Lee², Minho Jin¹, Jin S. Seo², Sunil Lee¹, Chang D. Yoo¹, ¹Korea Advanced Institute of Science and Technology, Korea, ²Electronics and Telecommunications Research Institute, Korea In this paper, an automatic commercial monitoring system using audio fingerprinting is proposed. The goal of the commercial monitoring system is to identify the title and the exact duration of commercials in real-time. To achieve this, only the audio is considered. The audio is easy to handle in real-time and can provide high accuracy for commercial identification. More precisely, the spectral subband centroids are extracted from an audio part of a commercial and indexed using the K-D tree algorithm. To detect aired commercials robustly, a four-step verification method using the indexed tree of the commercials is proposed. Experimental results show that the proposed system is robust against degradations during the real broadcasting and recording process and thus can fulfill the commercial monitoring satisfactorily.
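
As a rough illustration of the feature named in abstract 2-3, the sketch below computes spectral subband centroids for one audio frame: the power-weighted mean frequency bin within each subband. It is a minimal sketch, not the authors' implementation. The function names, the naive DFT, and the band edges are assumptions made for this example, and the K-D tree indexing and four-step verification are not shown.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of a real frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def subband_centroids(power, band_edges):
    """Power-weighted mean bin index for each subband [lo, hi).

    band_edges is a hypothetical list of bin boundaries, e.g. [0, 8, 16, 33].
    """
    centroids = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        total = sum(power[lo:hi])
        if total == 0.0:
            # Silent band: fall back to the geometric center of the band.
            centroids.append((lo + hi - 1) / 2.0)
        else:
            centroids.append(sum(k * power[k] for k in range(lo, hi)) / total)
    return centroids
```

For a frame containing a single tone that falls exactly on bin 5, the centroid of a subband covering bins 0 to 7 comes out at 5, which is the property that makes the feature track dominant spectral peaks.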

Paper Session 3. Class-D Amplifier I	1:30 pm-2:30 pm

3-1 Full Digital Amplification for Handheld Devices (Invited), John Oh, Pulsus Technologies, Korea The Class-D audio amplifier has the intrinsic advantage of power efficiency in mobile and handheld applications such as mobile phones and MP3 players. The digital class-D amplifier (in other words, the full digital amplifier) is ideal because it can deliver better audio quality with integrated DSP features, while its digital interface is immune to the noisy environment of handheld systems such as mobile phones. However, this technology could not previously be adopted in handheld audio systems because of several technological challenges. We will review the issues of full digital amplification in the handheld environment and the implementation of SoC products, which were recently applied to a music mobile phone for the first time.

3-2 Chaotic Modulation in PWM Digital Amplifier, Vladislav Shimanskiy, Samsung Electronics, Korea Performance of a pure digital audio amplifier using pulse width modulation (PWM) depends strongly on suppression of high-frequency, inaudible components in its spectrum. Noise-like phase modulation of the PWM carrier signal reduces nonlinear effects in the demodulation filter and can be advantageous for EMC reasons. In this paper we propose a method of digital PWM amplifier implementation in which a digital chaotic oscillator is used for carrier spreading. The modulation principle, interval modulator schematic, and chaotic oscillator structure, as well as experimental results, are presented.
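
The idea of carrier spreading with a chaotic oscillator in abstract 3-2 can be sketched as follows. This is a hypothetical illustration only: the abstract does not specify the oscillator, so the logistic map, its parameter `r = 3.99`, the seed `x0`, and all function names are assumptions. Each PWM carrier period is jittered by the chaotic sequence while the duty ratio of every pulse is kept unchanged.

```python
def logistic_sequence(x0: float, r: float, n: int) -> list:
    """Iterate the logistic map x -> r * x * (1 - x), which is chaotic for r near 4."""
    values = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def spread_carrier_periods(nominal_period: int, max_jitter: int, n: int,
                           x0: float = 0.3, r: float = 3.99) -> list:
    """Return n carrier periods (in clock ticks) jittered around nominal_period."""
    periods = []
    for x in logistic_sequence(x0, r, n):
        # Map x in (0, 1) to an integer offset in [-max_jitter, +max_jitter].
        offset = round((2.0 * x - 1.0) * max_jitter)
        periods.append(nominal_period + offset)
    return periods


def high_ticks(duty: float, period: int) -> int:
    """Number of ticks the PWM output stays high for a duty ratio in [0, 1]."""
    return round(duty * period)
```

Because the period changes from cycle to cycle, the carrier energy is spread over a band rather than concentrated at one switching frequency and its harmonics, which is the EMC motivation the abstract mentions.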

Poster Session	2:30 pm-3:30 pm

P-1 Dual Channel Audio Decoding Architecture of Digital TV SoC Hyo Jin Kim, Joon Il Lee, Tae Hoon Hwang, Myeong Hoon Lee, H. O. Oh, Seong Jong Choi, LG Electronics, Korea In this paper, we describe design issues of an audio decoding processor that can be applied especially to decoding and/or encoding more than two audio contents simultaneously. The dual channel decoding is an emerging issue in the upcoming convergent multimedia applications including Digital TV, DVD, cellular phone, and Internet audio. To satisfy various requirements for dual channel decoding, we propose a novel DSP hardware architecture and a software structure.

P-2 Multichannel Sound Scene Control for MPEG Surround, Seungkwon Beack, Jeongil Seo, Inseon Jang, Dae-young Jang, Electronics and Telecommunications Research Institute (ETRI), Korea MPEG Surround is a technique for compressing multichannel signals into a backward-compatible mono or stereo signal, with a small increase in bitrate for additional side information. This side information consists of spatial cues, which remove much of the redundancy between channels. In this paper, to add useful functionality to MPEG Surround, we propose a multichannel sound scene control algorithm based on the modification of spatial cues. By simple modification of the spatial cues, the overall sound scene produced by multichannel sound can be easily controlled and panned according to user preference and the requests of other systems, such as synchronization with multi-view video and interactive TV.

P-3 Evaluation of PEAQ for the Quality measurement of Perceptual Audio Encoders Ashok Magadum, Ittiam Systems, Karnataka, India Quality measurement of perceptual audio encoders during design and embedded platform implementation plays a critical role. Perceptual audio codecs, which are lossy in nature, provide tradeoff between quality and complexity based on the application. In this investigation, the PEAQ (Perceptual Evaluation of Audio Quality) is used for measurement of the audio quality and observations are made on its benefits and limitations. Enhancements to improve the performance of PEAQ for quality measurement of perceptual audio codecs are also discussed.

P-4 Implementation of Sound Engine Of Gayageum Based on Intel Bulverde PX272A Processor Sang-Jin Cho, Sung-min Cho, Hoon Oh, Ui-Pil Chong, University of Ulsan, Korea In this paper, we describe sound synthesis of the Korean plucked string instrument called the gayageum and the implementation of a sound engine based on the Intel Bulverde PXA272A processor. Commuted waveguide synthesis (CWS) is used for physical modeling of the gayageum and is modified to include the transmission characteristic of the movable bridge called the Anjok. The characteristic of the Anjok is extracted from recorded sound by means of adaptive filtering for inverse system identification. To implement the sound engine, we use an X-hyper272A board equipped with an Intel Bulverde PXA272 processor as a prototype.

Paper Session 4. Class-D Amplifier II	3:30 pm-4:30 pm

4-1 A Hybrid System Approach for Class-D Audio Amplifier, Gaël Pillonnet¹, Nacer Abouchi², Philippe Marguery¹, ¹ST Microelectronics, France, ²CPE Lyon, France The field of switching electronics poses challenging control problems that cannot be treated in a complete manner using traditional modelling and controller design approaches. Class D amplifiers invite the application of advanced hybrid systems methodologies. Recent theoretical advances in the field of hybrid systems are inviting both the control and the electronics communities to revisit the modelling and control issues. Motivated by this, a novel model for Class D is outlined in this paper. As in the motor drive and dc-dc conversion fields, hybrid control theory can be applied to audio amplifiers, but the first step is to find a good hybrid model to describe the system.

4-2 Class AB versus D for Multimedia Applications in Portable Electronic Devices, Pascal Tournier, ON Semiconductor, France Portable electronic products are integrating more and more multimedia capabilities. This has created great interest in incorporating stereo capabilities. Thus, when considering the power management inside a cellular phone, for example, the audio function is becoming a key player. By looking at the basics of Class AB and D topologies, this paper highlights the efficiency of the new filterless Class D amplifiers.
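
To make the efficiency comparison in abstract 4-2 concrete, the sketch below uses the textbook result for an ideal class B output stage driving a sine wave, eta = (pi / 4) * (V_peak / V_supply), as an upper bound for class AB (which is somewhat lower in practice because of its bias current). It is not taken from the paper. The function names are hypothetical, and the 90% class D figure in the usage note is an assumed value for illustration only.

```python
import math


def class_b_efficiency(v_peak: float, v_supply: float) -> float:
    """Ideal class B efficiency for a sine output: (pi / 4) * (v_peak / v_supply)."""
    if not 0.0 < v_peak <= v_supply:
        raise ValueError("expected 0 < v_peak <= v_supply")
    return (math.pi / 4.0) * (v_peak / v_supply)


def dissipated_power(p_out: float, efficiency: float) -> float:
    """Power lost as heat while delivering p_out watts at the given efficiency."""
    return p_out / efficiency - p_out
```

At half the maximum output swing the ideal class B figure is already below 40%, so delivering 100 mW costs roughly 155 mW of heat, whereas an amplifier at an assumed 90% efficiency would dissipate about 11 mW for the same output. Typical listening levels sit well below full swing, which is why the gap matters for battery life.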

Sunday, September 3

Paper Session 5 Implementations I	9:00 am-10:30 am

5-1 Using aacPlus for Premium Color Ring Back Tones Andreas Ehret¹, Michael Schug¹, Holger Hoerich¹, Seongsoo Park², Dae Sic Woo², Dong-Hahk Lee², Jin Soo Park³, Sung Il Park³, ¹Coding Technologies, Germany, ²SK Telecom, Korea, ³Mcubeworks, Korea aacPlus is a highly efficient audio coding scheme which is capable of providing very good audio quality at low bit rates. Color ring back tone services provide a music experience while waiting for a called person to answer the phone, replacing the traditional beep sound ring tones. These services have been commercialized since 2002 but some audio quality issues remain as the existing speech codecs are being used to code music. With the increasing availability of aacPlus on mobile phones it becomes possible to use aacPlus for this kind of application and it will be the core component of new premium color ring back tone services. The new service will provide significantly improved audio quality which allows for an excellent user experience with music ring back tones. The basic parameters and extensions to adapt the aacPlus codec to the transmission channel are described. Simulation and subjective listening test results are also presented in this paper.

5-2 Low Complexity Virtual Bass Enhancement Algorithm for Portable Multimedia Device, Manish Arora, HyuckJae Lee, Seongcheol Jang, Samsung Electronics, Korea Earbuds and insert earphones have recently gained immense popularity with mobile multimedia systems owing to their convenience and simple, low-cost design. Unfortunately, their design is subject to severe constraints, especially affecting their low-frequency signal performance. Bass signal performance contributes significantly to the user-perceived sound quality, and good bass signal reproduction is essential. Increasing the sound energy in the bass signal range is not a viable solution, since the gains required are exceedingly high and signal distortion occurs because of speaker overload. Recently, methods have been proposed to evoke a low-frequency illusion using the psychoacoustic phenomenon of the missing fundamental. This paper proposes a simple and effective signal processing method to create a bass signal illusion using the missing fundamental effect, at a complexity of 12 MIPS on a Motorola 56371 audio DSP.

5-3 A Fast Quantization Loop Algorithm for MP3/AAC Encoders Hyen-O Oh¹, Young-Cheol Park², Hyo Jin Kim¹, Seung Jong Choi¹, ¹LG Electronics, Korea, ²Yonsei University, Korea A fast quantization loop algorithm for low power implementations of MP3/AAC encoder is proposed. The proposed algorithm exploits the empirically-driven linear relationship of quantizer parameters, and it can reduce dramatically the maximum number of iterations needed in quantization. The resulting MP3/AAC encoder can be implemented in real time with a power-limited system.

Paper Session 6 Speech Processing	11:00 am-12:30 pm

6-1 Hands-Free Audio and Its Application to Telecommunication Terminals, Martin Schönle¹, Christophe Beaugeant¹, Kai Steinert¹, Heinrich W. Loellmann², Bastian Sauert², Peter Vary², ¹Siemens AG, Germany, ²RWTH Aachen University, Germany In this paper we focus on the enhancement of speech quality by hands-free audio systems in modern, multimedia-enabled telecommunication terminals. Requirements like compatibility with wideband audio and full-duplex hands-free operation demand sophisticated system designs. As a starting point we introduce a state-of-the-art system developed for mobile phones, considering practical aspects of algorithm design. Algorithmic extensions to this solution are presented, supporting speech enhancement for the user of the terminal or reducing the round-trip delay. In addition, more advanced concepts are discussed. They offer the capability of joint design for the most important algorithms and provide the possibility to exploit psychoacoustic properties of the human ear.

6-2. A Band Extension Technique for G.711 Speech Based on Full Wave Rectification and Steganography Naofumi Aoki, Hokkaido University, Japan This study investigates a band extension technique for speech data encoded with G.711, the most common codec for digital speech communications systems such as VoIP. The proposed technique employs steganography for the transmission of the side information required for the band extension based on full wave rectification. Due to the steganography, the proposed technique is able to enhance the speech quality without an increase of the amount of data transmission. From the results of a subjective evaluation, it is indicated that the proposed technique may potentially be useful for improving the speech quality, compared with the conventional
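
The two ingredients named in abstract 6-2 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the author's scheme: the abstract does not say how the side information is hidden, so plain least-significant-bit replacement in the 8-bit G.711 codewords is assumed here, and all function names are hypothetical. Full-wave rectification is simply the absolute value of the signal, a nonlinearity that generates components above the original band.

```python
def embed_bits(codewords: bytes, bits: list) -> bytes:
    """Hide one bit in the least significant bit of each 8-bit codeword (assumed scheme)."""
    if len(bits) > len(codewords):
        raise ValueError("not enough codewords to carry the payload")
    out = bytearray(codewords)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | (bit & 1)
    return bytes(out)


def extract_bits(codewords: bytes, n: int) -> list:
    """Read back the first n hidden bits."""
    return [codewords[i] & 1 for i in range(n)]


def full_wave_rectify(samples: list) -> list:
    """Full-wave rectification: take the absolute value of every sample."""
    return [abs(s) for s in samples]
```

Because only the lowest bit of each codeword changes, the payload travels inside the existing 64 kbit/s stream, which matches the abstract's point that the data rate does not increase.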

6-3 Adaptive Microphone Array with Self-Delay Estimator, Yang-Won Jung¹, Hong-Goo Kang², Hyo Jin Kim¹, Seung Jong Choi¹, ¹LG Electronics, Korea, ²Yonsei University, Korea In real environments, it is difficult to acquire clean speech because of noise and reverberation. A microphone array system is one of the most promising solutions for accurate acquisition of speech in noisy environments. Normally, a time delay estimator is required for an adaptive microphone array system. In this paper, an adaptive microphone array system with a self-delay estimator is proposed. The generalized sidelobe canceller (GSC) with an adaptive blocking matrix (ABM) is considered as the adaptive array algorithm. By showing that the ABM can estimate the relative time delay between each sensor, the proposed system utilizes the ABM not only for blocking target components in the blocked signal path, but also for estimating the relative time delay. Therefore, the proposed system requires only the GSC structure while maintaining system performance similar to the conventional system using an additional time delay estimator as a preprocessor. Simulation results show that the performance of the proposed system is identical to the conventional system that uses an additional time delay estimation module.

Monday, September 4

Paper Session 7 Implementations II	9:00 am-10:30 am

7-1 SLIMbus™: An Audio, Data and Control Interface For Mobile Devices, Juha Backman¹, James Schuessler², Bernard van Vlimmeren³, Xavier Lambrecht⁴, Peter Kavanagh², Kenneth Boyce², Genevieve Vansteeg², Chris Travis⁵, ¹Nokia, Finland, ²National Semiconductor, USA, ³Philips Research, The Netherlands, ⁴Philips Applied Technologies, Belgium, ⁵Wolfson Microelectronics, UK A new inter-chip interface standard under development, SLIMbus™, to be published by MIPI (Mobile Industry Processor Interface) Alliance is described. The primary target of the interface is isochronous transfer of digital audio signals, supporting all common sample rates and word lengths, and related device control, but the interface is equally well applicable for any application needing moderate data rates up to 28 Mbit/s. The serial multi-drop bus with separate data and clock wires supports high-quality mobile audio by providing high-quality clock, flexible power management, and a large number of devices and audio channels.
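
As a back-of-the-envelope check on the 28 Mbit/s figure in abstract 7-1, the sketch below computes how many uncompressed PCM channels would fit if the whole rate were available as payload. That is an assumption for illustration: framing, control, and guide overhead are ignored, so real capacity is lower. The function name is hypothetical.

```python
def max_pcm_channels(bus_bits_per_second: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Upper bound on PCM channels, ignoring all bus framing and control overhead."""
    bits_per_channel = sample_rate_hz * bits_per_sample
    return bus_bits_per_second // bits_per_channel
```

For example, a 48 kHz, 16-bit channel needs 768 kbit/s, so at most 36 such channels fit in 28 Mbit/s. With 24-bit samples the bound drops to 24 channels.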

7-2 Acoustic Communication with OFDM Signal Embedded in Audio Hosei Matsuoka, Yusuke Nakashima, Takeshi Yoshimura, Multimedia Laboratories, NTT DoCoMo, Japan This paper presents a method of aerial acoustic communication in which data is modulated using OFDM (Orthogonal Frequency Division Multiplexing) and embedded in regular audio material. It can transmit at a data rate of about 1kbps, which is much higher than is possible with other data hiding techniques. In our method, the high frequency band of the audio signal is replaced with OFDM carriers, each of which is power-controlled according to the spectrum envelope of the audio signal. The method enables the transmission of short messages from loudspeakers to mobile handheld devices without significantly degrading the quality of the original voice or music signals.

7-3 A Survey of Mobile Audio Architecture Issues, Pierre-Louis Bossart, Freescale Semiconductors, USA This paper aims at providing a broad overview of the status of audio support in mobile hand-held devices. The evolution from audio players or cell-phones into convergent devices supporting multiple and complex use-cases has generated new system constraints and new mobile audio architectures. By reviewing the complete mobile audio framework, from codecs to audio middleware, from platform architecture to hardware, we provide a down-to-earth explanation for critical and dimensioning factors and shed some light on existing standards, upcoming solutions and needed compromises.

Paper Session 8 3D Audio, Synthetic Audio	11:00 am-12:30 pm

8-1 Low Complexity 3D Audio Algorithms for Handheld Devices, Young-Cheol Park¹, Taik-sung Choi², Jae-woong Jung², Dae-Hee Youn², Si-wook Nam², Jung-min Song², ¹Yonsei University, ²LG Electronics, Korea In this paper, we present low-complexity 3D audio algorithms suitable for applications based on low-power and low-memory hardware. The algorithms include an artificial reverberator associated with a time-varying all-pass filter, IIR crosstalk cancellation filters implementing frequency warping, and a headphone externalization algorithm based on simplified head-related transfer functions (HRTFs). The performance and complexity of the presented algorithms are measured and compared with conventional ones. A memory reduction of 20 to 30% was gained using the time-varying artificial reverberator, and a 50% reduction in computation was achieved using the IIR crosstalk cancellation filters. Finally, 80% of the computational complexity and 50% of the memory were saved by using the simplified externalization system.

8-2 Stereo Widening For Loudspeakers in Mobile Devices Pauli Minnaar, Jan Abildgaard Pedersen, AM3D, Denmark A new cross-talk cancellation system is proposed for processing normal stereo signals to be played back through a pair of very closely-spaced loudspeakers, such as in a mobile phone. The goal of the proposed system is to substantially widen the stereo image while maintaining the timbre of the original stereo material. Several implementations of the system are shown and the advantages over traditional cross-talk cancellation systems are discussed.

8-3 Evaluation of Iterative Matching for Scalable Wavetable Synthesis, Simon Wun¹, Andrew Horner², ¹Institute for Infocomm Research, Singapore, ²Hong Kong University of Science and Technology, Hong Kong Wavetable synthesis of musical instrument tones has attracted some interest in its application in the generation of polyphonic ringtones on mobile phones. Recent work on wavetable synthesis assumes that a constant number of wavetables are used throughout the duration of a tone. A scalable approach would dynamically allocate wavetables to simultaneous voices to allow more polyphony and improve the sound quality. Iterative wavetable matching finds the basis spectra one by one over several iterations, and offers real-time control of the number of wavetables, supporting scalability. This paper describes and evaluates several iterative methods. Matching results for a range of instruments show that, on average, iterative local search can find matches with errors within 0.5% of near-optimal non-iterative solutions. Iterative local search only rarely gets stuck on bad local optima.