Audio Meta Data Generation for the Continuous Media Web

Claudia Schremmer¹, Steve Cassidy², Silvia Pfeiffer¹
¹CSIRO, Epping, NSW, Australia
²Macquarie University, Sydney, Australia

The Continuous Media Web (CMWeb) integrates time-continuous media into the searching, linking, and browsing functions of the World Wide Web. Annodex, the file format underlying the CMWeb technology, streams the media content multiplexed with metadata in the Continuous Media Markup Language (CMML). This metadata contains information relevant to the whole media file (e.g., title, author, language) as well as time-sensitive information (e.g., topics, speakers, time-sensitive hyperlinks). This paper discusses the problem of generating Annodex streams from complex linguistic annotations, that is, annotated recordings collected for use in linguistic research. We are particularly interested in automatically annotated recordings of meetings and teleconferences, and we see automatically generated CMML files as one way of viewing such recordings. The paper presents experiments with generating Annodex files from hand-annotated meeting recordings.
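To make the split between file-level and time-sensitive metadata concrete, the following is a minimal sketch of turning time-aligned annotation segments into a CMML-like document: whole-file information goes into a `head` element and each annotated segment becomes a `clip` with start and end times. The `Segment` type, the `to_cmml` function, and the use of `meta` elements named `author`, `language`, and `speaker` are hypothetical choices for illustration, not the authors' conversion method. The element names reflect our understanding of CMML and should be checked against the CMML specification; the sketch also omits the `stream` element that references the media file.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical time-aligned annotation segment (times in seconds)."""
    start: float
    end: float
    speaker: str
    topic: str


def to_cmml(title: str, author: str, language: str, segments: list) -> str:
    """Build a simplified CMML-like document from annotation segments.

    Assumption: file-level metadata is stored as <title> and <meta> in <head>,
    and each segment becomes a <clip> with npt start/end times.
    """
    root = ET.Element("cmml")

    # Information relevant to the whole media file.
    head = ET.SubElement(root, "head")
    ET.SubElement(head, "title").text = title
    ET.SubElement(head, "meta", name="author", content=author)
    ET.SubElement(head, "meta", name="language", content=language)

    # Time-sensitive information: one clip per annotated segment.
    for i, seg in enumerate(segments):
        clip = ET.SubElement(
            root, "clip",
            id=f"clip{i}",
            start=f"npt:{seg.start:.3f}",
            end=f"npt:{seg.end:.3f}",
        )
        ET.SubElement(clip, "meta", name="speaker", content=seg.speaker)
        ET.SubElement(clip, "desc").text = seg.topic

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    segs = [
        Segment(0.0, 12.5, "A", "Introductions"),
        Segment(12.5, 40.0, "B", "Budget discussion"),
    ]
    print(to_cmml("Weekly meeting", "Example author", "en", segs))
```

A real pipeline would also need to decide how overlapping annotation tiers (for example, simultaneous speakers) map onto clips, since this sketch assumes one flat sequence of segments.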

