Events of the AES: AES 112th Convention: Session Y: SIGNAL PROCESSING FORUM

Session Y: SIGNAL PROCESSING FORUM - PART 2

Monday, May 13, 09:00 – 13:00 h
Chair: John Mourjopoulos, University of Patras, Patras, Greece

09:00 h
Y1 DSP Hardware and Software Trade-offs for Professional Audio Applications – Remi Payan, Texas Instruments, Houston, TX, USA

Professional Audio systems target a wide variety of applications for recording, mixing, musical instruments, studios, and broadcasting. The challenge for Pro Audio systems designers is to find solutions not only to satisfy some of the more common requirements (including high sample rates, high performance and high precision) but also to meet system-specific design goals such as minimizing latency in effects processors and bit error tracking in digital mixing consoles.
This paper will first summarize these requirements; then discuss hardware architecture and software methodology trade-offs (e.g. sample-by-sample vs. block processing) and present a flexible DSP architecture for these systems.
Convention Paper 5622
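
The sample-by-sample versus block processing trade-off mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the one-pole smoother, the function names, and the 64-sample block at 96 kHz are assumptions chosen only for illustration. Both schemes compute the same output, but the block version has to wait for a full block before it can start.

```python
def smooth_per_sample(samples, alpha):
    """One-pole smoother producing one output per input sample (no buffering)."""
    state, out = 0.0, []
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def smooth_in_blocks(samples, alpha, block_size):
    """Same filter run once per block; the filter state is carried across blocks."""
    state, out = 0.0, []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]  # a full block must be collected first
        for x in block:
            state += alpha * (x - state)
            out.append(state)
    return out


def buffering_latency_ms(block_size, sample_rate_hz):
    """Time needed to collect one block before processing can begin."""
    return 1000.0 * block_size / sample_rate_hz


# Hypothetical test signal: a square wave with a 50-sample period.
signal = [1.0 if n % 50 < 25 else -1.0 for n in range(1000)]
per_sample = smooth_per_sample(signal, alpha=0.05)
blocked = smooth_in_blocks(signal, alpha=0.05, block_size=64)
```

Block processing usually amortizes per-call overhead over many samples, at the cost of the buffering delay returned by `buffering_latency_ms`.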

09:30 h
Y2 Time-Domain Polyphonic Piano Transcription using a Self-generating Database – Juan Bello, Laurent Daudet, Mark Sandler, Queen Mary University of London, London, UK

We describe a new method for estimating multiple-pitch information in recorded piano music. The method works in the time domain and makes use of a self-generating database of all possible notes. First, we show how accurate polyphonic pitch detection can be achieved given an adequate database. Then an algorithm is proposed that generates the database from the music, using estimation of predominant pitches in the frequency domain and pitch-shifting techniques. Both systems generate a MIDI representation of the original signal. This method, which can be generalized to any solo instrument, overcomes the usual constraints of the traditional frequency-domain approach regarding intervals and number of notes.
Convention Paper 5623

10:00 h
Y3 A System for Hybridizing Vocal Performance – Kim Hang Lau, Creative Technology Singapore, Singapore

This paper describes an experimental prototype system developed for vocal modification. The system aims to modify a source vocal sample to match the time evolution, pitch contour, and amplitude envelope of a similarly sung target vocal sample, simulating a non-parametric transfer of singing techniques from the target vocalist to the source vocalist. The system comprises a time-varying time/pitch/amplitude modification engine, a pitch detector, and a subsystem for generating modification parameters. Although the system has yet to attain a desirable level of robustness, it has successfully generated interesting synthesized samples that demonstrate the idea of hybridizing a vocal performance.
Convention Paper 5625

10:30 h
Y4 Phoneme Recognition for 3-D Modeled Digital Character Talking Emulation – George Kalliris, Charalampos Dimoulas, George Papanikolaou, Kostas Avdelidis, Taxiarchis Passias, John Stoitsis, Aristotle University of Thessaloniki, Thessaloniki, Greece

This paper focuses on the design and implementation of a phoneme recognition algorithm that extracts the parameters needed to drive a 3-D graphics facial expression and animation procedure, which is used to emulate speech for 3-D modeled digital characters. In the first development step, LPC, STFT analysis, wavelets, cepstrum, and pattern recognition techniques were tested for phoneme recognition and speaker classification. Then 3-D graphics facial expressions and phonemes were related in a library. Finally, a client-server application was designed that processes speech, combines library data via morphing techniques, and generates a digital character that appears to speak the given speech. Possible applications include cartoon dubbing and web-based virtual teleconferencing.
Convention Paper 5626

11:00 h
Y5 Efficient High-Frequency Bandwidth Extension of Audio and Speech – Erik Larsen¹, Michael Danessis², Ronald Aarts¹ - ¹Philips Research Labs, Eindhoven, Netherlands; ²University of Salford, Salford, UK

The use of perceptually based (lossy) audio codecs, like MPEG-1 Layer 3 ('mp3'), has become very popular in the last few years. However, at very high compression rates the perceptual quality of the signal is degraded, mainly through a loss of high frequencies. We propose an efficient algorithm for extending the bandwidth of an audio signal, with the goal of creating a more natural sound. This is done by adding an extra octave at the high-frequency end of the spectrum. The algorithm uses a non-linearity to generate the extended octave and can be applied to music as well as speech. This also enables application to fixed or mobile communication systems.
Convention Paper 5627
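
The idea of generating an extra octave with a non-linearity can be sketched as follows. The abstract does not say which non-linearity the authors use; the full-wave rectifier below, the `mix` parameter, and the helper names are assumptions for illustration only. Rectifying a 4 kHz tone produces even harmonics, so energy appears at 8 kHz where the input had none.

```python
import math


def tone_magnitude(signal, freq_hz, sample_rate):
    """Amplitude of one frequency component, measured with a single-bin DFT."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / sample_rate)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / sample_rate)
             for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


def extend_bandwidth(signal, mix=0.5):
    """Add harmonics from a full-wave rectifier (an assumed non-linearity)."""
    rectified = [abs(x) for x in signal]
    mean = sum(rectified) / len(rectified)  # remove the DC offset the rectifier adds
    return [x + mix * (r - mean) for x, r in zip(signal, rectified)]


SAMPLE_RATE = 48000
NUM_SAMPLES = 4800  # 0.1 s, a whole number of 4 kHz cycles
band_limited = [math.sin(2 * math.pi * 4000 * n / SAMPLE_RATE)
                for n in range(NUM_SAMPLES)]
extended = extend_bandwidth(band_limited)

octave_before = tone_magnitude(band_limited, 8000, SAMPLE_RATE)
octave_after = tone_magnitude(extended, 8000, SAMPLE_RATE)
fundamental_after = tone_magnitude(extended, 4000, SAMPLE_RATE)
```

A practical system would presumably also filter the generated harmonics so that only the new octave is added and aliased components are suppressed; that step is omitted here.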

11:30 h
Y6 Signaling Techniques for Broadcast Applications – Matthew Watson, Michael Truman, Dolby Laboratories Inc., San Francisco, CA, USA

This paper introduces two signaling techniques that may be useful in broadcast applications for combining supplemental data with program material. Such data may be used to monitor audience participation, advertising placement or to convey additional program information (including lyrics and URLs). The first technique involves replacing unused bits, or fill bits, in fixed data-rate compression systems with information-carrying data in a way that does not affect the quality of the program material and does not necessitate modification of encoders or decoders. A second technique involves modulating the bandwidth of a signal and adding perceptually shaped noise to create an inaudible, constant-rate supplemental signal. Both techniques may be integrated with the Dolby Digital perceptual coding system.
Convention Paper 5628

12:00 h
Y7 A New Concept of Interference Compensation for the Parametric and Graphic Equalizer Banks – Seyed Ali Azizi, Harman/Becker Automotive Systems, Karlsbad, Germany

The overall frequency response of a graphic or parametric equalizer bank, consisting of a number of serially connected cut or boost equalizers, may show serious deviations from the user-defined gain setting. The deviations are caused by mutual interference between the equalizers and depend on the gains, quality factors, and center frequencies of the individual equalizers. This paper discusses the known interference compensation strategies and then introduces a new approach to efficiently counteract the undesired interference effects. It is based on the "Opposite Filter Concept": starting from the user-defined parameter setting of an equalizer bank, a set of simple filters that counteract the interference is suitably parameterized and inserted serially into the equalizer bank. This substantially reduces the interference effects and yields an overall frequency response close to the desired one.
Convention Paper 5629
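
The interference problem described in this abstract (not the proposed compensation) can be reproduced with a small sketch. It assumes peaking filters built from the widely used Audio EQ Cookbook biquad formulas and two hypothetical bands at 1 kHz and 2 kHz, each set to +6 dB with Q = 1.4; none of these choices come from the paper.

```python
import cmath
import math


def peaking_biquad(center_hz, gain_db, q, sample_rate):
    """Peaking-EQ biquad coefficients (b, a) per the Audio EQ Cookbook."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    b = (1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin)
    a = (1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin)
    return b, a


def gain_db_at(coeffs, freq_hz, sample_rate):
    """Magnitude response of one biquad, in dB, at a single frequency."""
    b, a = coeffs
    z1 = cmath.exp(-1j * 2 * math.pi * freq_hz / sample_rate)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))


SAMPLE_RATE = 48000
# Hypothetical user settings: (center frequency in Hz, gain in dB, Q)
bands = [(1000, 6.0, 1.4), (2000, 6.0, 1.4)]
filters = [peaking_biquad(f, g, q, SAMPLE_RATE) for f, g, q in bands]

# Gain of the 1 kHz band alone, and of the serial cascade, both at 1 kHz.
single_db = gain_db_at(filters[0], 1000, SAMPLE_RATE)
cascade_db = sum(gain_db_at(c, 1000, SAMPLE_RATE) for c in filters)
deviation_db = cascade_db - single_db  # contribution of the neighboring band
```

Because the filters are connected in series, their dB responses add, so the skirt of the 2 kHz band pushes the response at 1 kHz above the 6 dB the user asked for.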

12:30 h
Y8 Improvements of Artificial Reverberation by use of Subband Feedback Delay Networks – Igor Nikolic, Mihajlo Pupin Institute, Belgrade, Yugoslavia

This paper proposes improving artificial reverberation by decomposing the input signal into subbands and using different subband feedback delay networks (FDNs). By this means, non-uniform modal density is achieved and the orders of the FDNs are lowered. The evaluation of filter banks and the corresponding subband FDNs is described. Comparison results, obtained through objective (simulation) and subjective (listening test) analysis, are presented.
Convention Paper 5630
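
A rough sketch of the subband FDN idea follows. Everything specific in it is an assumption rather than the paper's design: the crude two-band split with a one-pole low-pass filter, the Householder feedback matrix, the delay lengths, the gains, and the choice to give the high band the lower-order network.

```python
def fdn(signal, delays, gain):
    """Feedback delay network with a Householder feedback matrix."""
    n = len(delays)
    buffers = [[0.0] * d for d in delays]
    idx = [0] * n
    out = []
    for x in signal:
        taps = [buffers[i][idx[i]] for i in range(n)]  # delay-line outputs
        out.append(sum(taps))
        mean2 = 2.0 * sum(taps) / n
        for i in range(n):
            feedback = taps[i] - mean2  # row i of (I - (2/n) * ones) applied to taps
            buffers[i][idx[i]] = x + gain * feedback
            idx[i] = (idx[i] + 1) % delays[i]
    return out


def split_two_bands(signal, alpha=0.1):
    """Crude complementary split: one-pole low-pass and its residual."""
    low, state = [], 0.0
    for x in signal:
        state += alpha * (x - state)
        low.append(state)
    high = [x - l for x, l in zip(signal, low)]
    return low, high


def subband_reverb(signal):
    """Run a separate FDN per band; the high band uses a lower-order network."""
    low, high = split_two_bands(signal)
    wet_low = fdn(low, delays=[149, 211, 263, 293], gain=0.9)
    wet_high = fdn(high, delays=[113, 157], gain=0.7)
    return [a + b for a, b in zip(wet_low, wet_high)]


impulse = [1.0] + [0.0] * 7999
impulse_response = subband_reverb(impulse)
```

Giving each band its own delay lengths and feedback gain is what lets echo density and decay time differ between bands; a real design would use a proper filter bank in place of the one-pole split.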