|Keynote Talk: More Than Mobile - Innovative Services of KTF
Byungki Oh, KTF, Korea
KTF, one of the biggest mobile carriers in Korea, provides the SHOW third-generation (3G) services, which are intended to deliver not only voice calls but also visual communication features such as video calling and data services. Details of the current SHOW 3G services and expectations for future services are discussed.
|Invited Lecture I:
DisplayPort™: Digital Multimedia Display Interface standard for PCs and Mobile Devices
Kewei Yang, Analogix Semiconductor
The newly developed DisplayPort™ standard is intended to provide an interface suited to a very wide range of applications, including both external ("box to box") and internal (e.g., notebook PC panel interface) connections. With widespread industry support, DisplayPort™ will replace just about every major digital multimedia display interface for PCs that exists today. Previous display interfaces, such as DVI, LVDS, and VGA, will no longer be included on displays in the future, as the PC industry vision is to have ultra-sleek displays with just a DisplayPort™ connector on the back of the PC. DisplayPort™ allows high-definition digital audio to be available to the display device over the same cable as the digital video signal. It delivers true plug-and-play with robust interoperability. When the optional content protection capability is active, DisplayPort™ will support viewing high-definition television, video, and audio types of protected content. This paper provides an overview of the proposed standard and its basic technical details.
|Invited Lecture II:
Speech Enhancement for Mobile Applications
Te-won Lee, Qualcomm, USA
We outline examples of machine learning algorithms that use graphical models to represent speech signals in a systematic manner. Linear data generative models have recently gained popularity because they are able to learn efficient codes for sound signals and allow the analysis of important sound features and their characteristics to model different types of sounds, individual speech and speaker characteristics, or classes of speakers. The generative model principle can be extended in time and space to handle dynamics and environmental acoustics. We present two examples of blind source separation in a graphical model: first, a method for solving the difficult problem of separating multiple sources given only a single-channel observation; second, a method for treating multi-channel observations that takes into account reverberation, sensor noise, and other real-environment challenges.
<SP1> Signal Processing, PART1
Discrimination of Music Signals for Mobile Broadcasting Receivers
Myungssuk Song, Hong-goo Kang, Yonsei University, Seoul, Korea
This paper proposes a Gaussian mixture model (GMM)-based music discrimination system for mobile broadcasting receivers. The objective of the system is to automatically archive music signals from audio broadcasting programs, in which music is normally mixed with human voices, acoustic noise, commercial advertisements, and so on. To enhance the robustness of the system and to sharply cut the starting and ending points of each recording, we also introduce a post-processing module whose features consist of signal duration, energy dynamics, and local variation of feature statistics. Experimental results on various input signals verify the superiority of the proposed system.
Non-linear Signal Processing for Low Frequency Enhancement
Pauli Minnaar, AM3D A/S, Aalborg, Denmark
A new method is introduced for enhancing the low frequency performance of small loudspeakers and headphones. The method overcomes some of the known problems and is able to create a powerful bass enhancement. Dynamics processing is used to maximize the bass output depending on the amount of headroom available in the original signal. This non-linear signal processing creates a series of harmonics. By carefully controlling the harmonics a rich low frequency sound can be created.
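The harmonic generation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the nonlinearity, so the polynomial waveshaper `x + 0.5*x**2` below is an assumption chosen only because its output is easy to check by hand (since sin² θ = (1 − cos 2θ)/2, the second harmonic has amplitude 0.25). `waveshape` and `bin_amplitude` are hypothetical helper names, not part of the paper.

```python
import cmath
import math

def waveshape(x, a=0.5):
    """Memoryless polynomial nonlinearity (illustrative assumption)."""
    return [s + a * s * s for s in x]

def bin_amplitude(x, k):
    """Amplitude of the sinusoidal component at DFT bin k (0 < k < N/2)."""
    n = len(x)
    acc = sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(x))
    return 2.0 * abs(acc) / n

N, K0 = 512, 8  # frame length and bin of the fundamental (arbitrary choices)
tone = [math.sin(2 * math.pi * K0 * t / N) for t in range(N)]
shaped = waveshape(tone)

# Compare the fundamental and its first two harmonics before and after shaping.
for k in (K0, 2 * K0, 3 * K0):
    print(k, bin_amplitude(tone, k), bin_amplitude(shaped, k))
```

The pure tone has energy only at the fundamental, while the shaped signal gains a component at twice that frequency. A real bass enhancer would also have to control the level of each harmonic and the available headroom, which this sketch does not attempt.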
<CAS1> Coding for Audio and Speech, PART1
Framework for Unified Speech and Audio Coding
Eunmi Oh and Miyoung Kim, Samsung Advanced Institute of Technology (SAIT), Suwon, Korea
The purpose of this study is to propose a framework for unified speech and audio coding that can compress speech and music equally well, and then to verify the feasibility of a highly efficient low-rate coding scheme. In this paper, a coding scheme is introduced that uses the flexible time and frequency representation of a filter bank called the Frequency Varying Modulated Lapped Transform (FV-MLT). The time/frequency resolution of the FV-MLT is determined by a psychoacoustic model. The output of the filter bank is quantized using rate/distortion optimization. A high temporal resolution coding tool can be used depending on the characteristics of the input signal.
Using Salient Envelope Features for Audio Coding
Joachim Thiemann, Peter Kabal, McGill University, Montreal, Quebec, Canada
In this paper, we present a perceptual audio coding method that encodes the audio using perceptually salient envelope features. These features are found by passing the audio through a set of gammatone filters, and then computing the Hilbert envelopes of the responses. Relevant points of these envelopes are isolated and transmitted to the decoder. The decoder reconstructs the audio in an iterative manner from these relevant envelope points. Initial experiments suggest that even without sophisticated entropy coding a moderate bitrate reduction is possible while retaining good quality.
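The envelope extraction step can be sketched as follows. This is a minimal illustration of a Hilbert envelope only, computed by zeroing the negative-frequency half of the spectrum; the gammatone filter bank, the selection of relevant points, and the iterative decoder described in the abstract are not modeled. The function names and the amplitude-modulated test signal are assumptions made for the example.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for a short illustration."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t, v in enumerate(x))
        out.append(acc / n if inverse else acc)
    return out

def hilbert_envelope(x):
    """Magnitude of the analytic signal of a real sequence x."""
    n = len(x)
    spec = dft(x)
    for k in range(n):
        if k == 0 or 2 * k == n:   # DC and Nyquist bins are kept as-is
            gain = 1.0
        elif 2 * k < n:            # positive frequencies are doubled
            gain = 2.0
        else:                      # negative frequencies are removed
            gain = 0.0
        spec[k] *= gain
    return [abs(v) for v in dft(spec, inverse=True)]

# Test signal: a carrier at bin 40 with a slow modulator at bin 4.
N = 256
modulator = [1.0 + 0.5 * math.cos(2 * math.pi * 4 * t / N) for t in range(N)]
signal = [m * math.cos(2 * math.pi * 40 * t / N) for t, m in enumerate(modulator)]
envelope = hilbert_envelope(signal)
max_error = max(abs(e - m) for e, m in zip(envelope, modulator))
print(max_error)
```

For this band-limited test signal the recovered envelope matches the modulator to within rounding error. In the coder described above, one such envelope would be computed per gammatone channel.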
Personalized music service based on parametric object-oriented spatial audio coding
Yangwon Jung, Hyun-o Oh, LG Electronics, Seoul, Korea
From the development of spatial audio coding such as MPEG Surround, the concept of object-oriented spatial audio coding has emerged, and efforts are now being made to standardize MPEG SAOC (Spatial Audio Object Coding). The key target applications suggested for MPEG SAOC are backward-compatible interactive remix, teleconferencing and telecommunications, and gaming. Among these, we focus on interactive remix applications and propose a personalized music service system.
A Bit Reduction Algorithm for Spectral Band Replication using the Masking Effect
Sang Bae Chon, Mingu Lee, Koeng-Mo Sung, Seoul National University, Seoul, Korea; Hee-Suk Pang, Sejong University, Seoul, Korea
Spectral Band Replication (SBR) is a state-of-the-art technology for enhancing audio or speech codecs, especially at low bitrates, based on harmonic redundancy in the frequency domain. With SBR, it is possible to generate the high-frequency components of a full-band audio signal with a bitrate of a few kbps. In this paper, a bit reduction algorithm for SBR is proposed that uses the masking effect and the threshold in quiet. The proposed algorithm reduces the SBR bitrate by modifying the envelope data of SBR in such a way that the modification cannot be perceived. Experiments show that the proposed algorithm achieves about 10 to 12% bit reduction for the SBR envelope data in the 3GPP Enhanced aacPlus codec at a bitrate of 24 kbps, with no perceptible degradation in sound quality.
Bit-rate Reduction Using Efficient Difference Coding of Sinusoid Amplitude
Namsuk Lee, Samsung Electronics, Suwon, Korea
In this paper, we develop a new method for sinusoid amplitude coding. In MPEG-4 SSC (SinuSoidal Coding, parametric coding for high-quality audio), the audio signal is analyzed into transients, sinusoids, and noise, and the frequency, amplitude, and phase of each sinusoid are extracted. Each sinusoid of the current frame is then linked to a sinusoid of the previous frame based on the similarity of frequency, amplitude, and phase. A sinusoid that is linked to a sinusoid of the previous frame is a continuation, and a sinusoid that is not linked is a birth. Continuations and births are quantized and coded with Huffman entropy coding. We propose a new difference coding method for the sinusoid amplitudes of births, which reduces the bit rate. In the SSC parametric codec implemented by Samsung Electronics, the bit rate for sinusoid amplitudes in births is reduced by 15.89%.
Implementation of 3D Sound Using Grouped HRTF
Seo Bo-Kug, Ryu Il-Hyun, Cha Hyung-Tai, Soongsil University, Seoul, Korea
Head Related Transfer Function (HRTF) databases, which describe how sounds arrive at the ears, are generally used to create 3D sound. Convolving the HRTF with the original sound is a general method for sound image localization. However, because each listener has unique characteristics, using a non-individual HRTF can cause confusion in the perceived direction of the source and degrade moving sound effects. In this paper we propose a new HRTF method, which generates 3D sound by grouping and averaging the HRTFs in the vicinity of the direction to be localized. MOS (Mean Opinion Score) test results show that the proposed method is much better than conventional methods in both sound localization and moving sound effects.
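The grouping-and-averaging idea can be sketched as below. The abstract does not state how the neighborhood is chosen or how the responses are weighted, so the fixed angular window, the unweighted mean, the single-ear time-domain impulse responses, and the toy database are all assumptions made for illustration; `grouped_hrir`, `angular_distance`, and `convolve` are hypothetical names.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two azimuths, in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)

def grouped_hrir(hrir_by_azimuth, target_deg, width_deg):
    """Average the impulse responses measured within width_deg of the target."""
    group = [h for az, h in hrir_by_azimuth.items()
             if angular_distance(az, target_deg) <= width_deg]
    if not group:
        raise ValueError("no response measured near the target direction")
    return [sum(taps) / len(group) for taps in zip(*group)]

def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y

# Toy two-tap "database" for one ear (made-up numbers, not measured data).
toy_db = {350.0: [1.0, 0.0], 0.0: [0.0, 1.0], 10.0: [0.5, 0.5], 90.0: [5.0, 5.0]}
h = grouped_hrir(toy_db, target_deg=0.0, width_deg=10.0)
out = convolve([1.0, 2.0, 3.0], h)
print(h, out)
```

Because averaging is linear, averaging impulse responses in the time domain is equivalent to averaging the corresponding complex transfer functions. A full renderer would repeat this for both ears and interpolate as the source moves.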
An improved weighting curve based on equal-loudness contour
Inseok Heo, Koeng-Mo Sung, Seoul National University, Seoul, Korea
A-weighting and ITU-R 468 weighting are commonly used these days. Both curves are based on the inverse shape of the equal-loudness contour, whose standard curve was revised in ISO 226:2003. In this paper, an improved equation for the weighting curve is discussed. Compared with existing weighting curves, the proposed curve is more similar to the inverse shape of the standard contour (ISO 226).
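For reference, the conventional A-weighting curve that serves as a baseline here can be evaluated with the widely published closed-form expression from IEC 61672. The sketch below implements only that standard curve; it is not the improved equation proposed in the paper, which the abstract does not give. The function name is an assumption.

```python
import math

def a_weighting_db(f_hz):
    """Standard A-weighting gain in dB at frequency f_hz (f_hz > 0)."""
    f2 = f_hz * f_hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # The +2.00 dB offset normalizes the curve to about 0 dB at 1 kHz.
    return 20.0 * math.log10(num / den) + 2.00

for f in (100.0, 1000.0, 10000.0):
    print(f, round(a_weighting_db(f), 2))
```

The curve attenuates low frequencies strongly (roughly 19 dB at 100 Hz) and is close to 0 dB at 1 kHz, which is the general shape an equal-loudness-based weighting is expected to follow.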
Low Carrier Frequency Noise-shaper for Digital Amplifier
Park Kyoungsoo, NeoFidelity, Seoul, Korea; Koeng-Mo Sung, Seoul National University, Seoul, Korea
The noise shaper has not been a severe concern in digital amplifiers, which convert PCM data to PWM digitally. A simple noise shaper of low order is good enough for most audio applications if the oversampling ratio is 8 and the sampling frequency is around 48 kHz, which are the most typical values in commercial chips. However, at a low oversampling ratio a low-order noise shaper lets more high-frequency noise smear into the audio band, so a more careful noise-shaper design is required to overcome the artifacts. A noise shaper for a low oversampling ratio can be designed by optimally spreading the zeros of the noise transfer function, with the sampling frequency, oversampling ratio, requantization resolution, and noise-shaper order as parameters. This technique can use up the in-band noise budget once a minimum SNR requirement for the application is defined. An optimal or marginal noise-shaper design also needs to consider nonlinearity at small signal inputs. Proper dithering is investigated to avoid that nonlinearity while keeping the SNR just within the requirement at a limited carrier frequency.
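As background for the design problem described above, the simplest case is a first-order error-feedback noise shaper, whose noise transfer function is 1 − z⁻¹ with a single zero at DC. The sketch below shows only that textbook structure; the optimized zero placement, higher orders, and dithering discussed in the abstract are not implemented, and the step size and test signal are arbitrary assumptions.

```python
import math

def quantize(v, step):
    """Uniform requantizer to the nearest multiple of step."""
    return step * round(v / step)

def noise_shape_first_order(x, step):
    """Requantize x, feeding the previous quantization error back to the input."""
    y, err = [], 0.0
    for sample in x:
        v = sample - err       # subtract the error made on the previous sample
        q = quantize(v, step)
        err = q - v            # error introduced by this quantization
        y.append(q)
    return y

STEP = 0.25
x = [0.3 * math.sin(2 * math.pi * t / 64) + 0.1 for t in range(256)]
y = noise_shape_first_order(x, STEP)

# Output minus input equals err[n] - err[n-1], so the accumulated error
# telescopes to the last err value and never exceeds half a step.
drift = abs(sum(y) - sum(x))
print(drift)
```

Keeping the accumulated error bounded is what pushes the quantization noise away from low frequencies. Spreading several zeros across the audio band, as the paper proposes, generalizes this single zero at DC.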
Smooth PCM Clipping of Audio
Mohammed Chalil, Analog Devices, Bangalore, India
This paper discusses the PCM clipping problem in audio. Two solutions that avoid this problem are discussed, and their results are compared.
Advanced Terrestrial DMB System Structure for Multi-channel Audio Services
Hyun Wook Kim, Kyungsun Cho, Han Gil Moon, Samsung Electronics, Suwon, Korea
An advanced Terrestrial Digital Multimedia Broadcasting (T-DMB) system structure is proposed for multi-channel audio services. The multi-channel audio signal is divided into a stereo audio signal and additional multi-channel side information, and each is transmitted through its own elementary stream. The advanced T-DMB system mainly proposes a separate OD Stream and a protected ES_DescriptorUpdate command for conveying the multi-channel side information. The first OD Stream, for the stereo audio signal, can convey a command for a new object descriptor, and the second OD Stream uses a protected command for the additional multi-channel side information.
Industrial Solution Introductions