AES Munich 2009
Thursday, May 7, 10:00 — 11:30
Poster Session P3
P3 - Recording, Reproduction, and Delivery
P3-1 Audio Content Annotation, Description, and Management Using Joint Audio Detection, Segmentation, and Classification Techniques—Christos Vegiris, Charalambos Dimoulas, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
This paper addresses audio content management through joint audio segmentation and classification. We concentrate on separating typical audio classes, such as silence/background noise, speech, crowded speech, music, and their combinations. A compact feature subset is selected with a correlation-based feature subset evaluation algorithm, after the EM clustering algorithm has been applied to an initial audio data set. Temporal and spectral parameters are extracted using filter banks and wavelets, combined with sliding windows and exponential moving averaging. Features are extracted on a point-to-point basis at the finest possible time resolution, so that each sample can be classified individually into one of the available classes. Clustering algorithms such as EM and Simple K-means are tested to evaluate the final point-to-point classification results and, therefore, the joint audio detection-classification indexes. The resulting audio detection, segmentation, and classification data can be incorporated into description schemes that annotate audio events/segments for content description and management.
Convention Paper 7661
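The point-to-point idea in P3-1 (one feature vector per sample, smoothed by exponential moving averaging, then grouped by a clustering algorithm) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: the two features (an EMA-smoothed energy and an EMA-smoothed zero-crossing indicator), the smoothing constant alpha, the farthest-point initialization, and all function names are hypothetical, and plain K-means stands in for the EM and Simple K-means implementations tested in the paper.

```python
import math

def ema_features(x, alpha=0.01):
    """Per-sample (point-to-point) features smoothed by exponential moving averages.

    Hypothetical feature set: EMA of instantaneous energy and EMA of a
    zero-crossing indicator; alpha is an assumed smoothing constant.
    """
    energy, zcr, prev = 0.0, 0.0, 0.0
    feats = []
    for s in x:
        energy = (1 - alpha) * energy + alpha * s * s
        crossed = 1.0 if (s >= 0) != (prev >= 0) else 0.0
        zcr = (1 - alpha) * zcr + alpha * crossed
        prev = s
        feats.append((energy, zcr))
    return feats

def zscore(feats):
    """Normalize each feature dimension to zero mean and unit variance."""
    cols = list(zip(*feats))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(f, means, stds)) for f in feats]

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Plain K-means (a stand-in for EM / Simple K-means), farthest-point init."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

# Usage: 0.5 s of silence followed by 0.5 s of a 440 Hz tone at fs = 4 kHz.
fs = 4000
signal = [0.0] * 2000 + [0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(2000)]
labels = kmeans(zscore(ema_features(signal)), k=2)
```

Each sample receives its own class label, so segment boundaries fall out of the label sequence directly rather than being fixed by a frame grid.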
P3-2 Ambience Sound Recording Utilizing Dual MS (Mid-Side) Microphone Systems Based upon Frequency Dependent Spatial Cross Correlation (FSCC) [Part 3: Consideration of Microphones’ Locations]—Teruo Muraoka, Takahiro Miura, Tohru Ifukube, University of Tokyo - Tokyo, Japan
To achieve ambient music recordings with precise sound localization using fewer microphones, we studied the sound acquisition performance of microphone arrangements by means of their Frequency Dependent Spatial Cross Correlation (FSCC). We found that an MS microphone is best suited to this purpose. A directional azimuth setting of 132 degrees is best for ambient sound acquisition, and a setting of 120 degrees is best for on-stage sound acquisition. We made actual concert recordings with a combination of these MS microphones (a Dual MS microphone system) and obtained satisfactory results. We then studied the proper placement of these microphones. For ambient sound acquisition, suspending the microphone at the center of the concert hall is favorable; for on-stage sound acquisition, placing it nearly directly above the conductor’s position is also satisfactory. The process of these studies is reported.
Convention Paper 7662
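The P3-2 abstract does not define how FSCC is computed. One plausible reading, assumed here, is a zero-lag normalized cross-correlation between two microphone signals, evaluated band by band. The sketch below band-passes both channels with an RBJ-cookbook band-pass biquad (the filter type, Q = 2, the octave-spaced center frequencies, and the function names are all assumptions) and returns one correlation coefficient per band: near +1 means the channels are coherent in that band, near 0 means they are decorrelated, and near -1 means they are in antiphase.

```python
import math
import random

def bandpass(x, f0, fs, q=2.0):
    """RBJ-cookbook band-pass biquad (0 dB peak gain), direct form I."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha  # b1 is zero for this band-pass
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for s in x:
        out = (b0 * s + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, s      # shift input history
        y2, y1 = y1, out    # shift output history
        y.append(out)
    return y

def fscc(left, right, fs, centers=(125, 250, 500, 1000, 2000, 4000)):
    """Assumed FSCC estimate: zero-lag normalized correlation per frequency band."""
    result = {}
    for f0 in centers:
        a = bandpass(left, f0, fs)
        b = bandpass(right, f0, fs)
        num = sum(u * v for u, v in zip(a, b))
        den = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
        result[f0] = num / den if den > 0 else 0.0
    return result

# Usage with synthetic noise at fs = 16 kHz; real use would take two recorded channels.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
same = fscc(noise, noise, 16000)
inverted = fscc(noise, [-v for v in noise], 16000)
```

Feeding the same signal to both channels gives 1 in every band and a polarity-inverted copy gives -1; decorrelated ambience should move toward 0, which is the frequency-dependent behavior such a measure is meant to expose when comparing microphone arrangements.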
P3-3 A Comparative Approach to Sound Localization within a 3-D Sound Field—Martin J. Morrell, Joshua D. Reiss, Queen Mary, University of London - London, UK
In this paper we compare different methods for sound localization around and within a 3-D sound field. The first objective is to determine which panning method listeners consistently prefer for sources positioned around the loudspeaker array. The second objective, and the main focus of the paper, is localizing sources within the loudspeaker array. We seek to determine whether sound sources can be localized without movement or a secondary reference source. We compare techniques based on Ambisonics, vector base amplitude panning (VBAP), and time-delay-based panning. We report subjective listening tests that show which panning method listeners prefer and that rate the success of panning within a 3-D loudspeaker array.
Convention Paper 7663
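Of the techniques compared in P3-3, vector base amplitude panning is the easiest to show compactly. The sketch below is the two-dimensional, pairwise form of VBAP: the source direction is written as a weighted sum of the unit vectors of the two adjacent loudspeakers that enclose it, and the weights are then normalized to constant power. It is an illustration under assumptions, not the authors' test setup: the 3-D array in the paper would use loudspeaker triplets rather than pairs, and the square layout and function names are hypothetical.

```python
import math

def vbap_pair_gains(source_deg, spk1_deg, spk2_deg):
    """Solve g1*l1 + g2*l2 = p for one loudspeaker pair, then power-normalize."""
    t, a, b = (math.radians(d) for d in (source_deg, spk1_deg, spk2_deg))
    px, py = math.cos(t), math.sin(t)
    det = math.cos(a) * math.sin(b) - math.sin(a) * math.cos(b)  # equals sin(b - a)
    g1 = (px * math.sin(b) - py * math.cos(b)) / det  # Cramer's rule
    g2 = (py * math.cos(a) - px * math.sin(a)) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

def vbap_ring(source_deg, speakers_deg):
    """Pick the adjacent pair whose gains are both non-negative (the active arc).

    Assumes adjacent loudspeakers are less than 180 degrees apart.
    """
    n = len(speakers_deg)
    order = sorted(range(n), key=lambda i: speakers_deg[i] % 360)
    gains = [0.0] * n
    for k in range(n):
        i, j = order[k], order[(k + 1) % n]
        g1, g2 = vbap_pair_gains(source_deg, speakers_deg[i], speakers_deg[j])
        if g1 >= -1e-9 and g2 >= -1e-9:
            gains[i], gains[j] = max(g1, 0.0), max(g2, 0.0)
            return gains
    raise ValueError("no loudspeaker pair encloses the source direction")

square = [45, 135, 225, 315]  # hypothetical horizontal layout, degrees
front = vbap_ring(0, square)   # source midway between 315 and 45 degrees
on_axis = vbap_ring(45, square)
```

A source midway between two loudspeakers gets equal gains of 1/sqrt(2) on that pair, and a source exactly at a loudspeaker drives only that loudspeaker; only two loudspeakers are ever active, in contrast to Ambisonics.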
P3-4 The Effect of Listening Room on Audio Quality in Ambisonics Reproduction—Olli Santala, Helsinki University of Technology - Espoo, Finland; Heikki Vertanen, Helsinki University of Technology - Espoo, Finland, University of Helsinki, Helsinki, Finland; Jussi Pekonen, Jan Oksanen, Ville Pulkki, Helsinki University of Technology - Espoo, Finland
In multichannel reproduction of spatial audio with first-order Ambisonics, the loudspeaker signals are relatively coherent, which produces prominent coloration. It has been suggested that these coloration artifacts depend on the acoustics of the listening room. This dependency was investigated with subjective listening tests in an anechoic chamber with an octagonal loudspeaker setup. Different virtual listening rooms were created by adding diffuse reverberation with an RT60 of 0.25 s, reproduced over a 3-D 16-channel loudspeaker setup. In the test, subjects compared the audio quality in the virtual rooms. The results suggest that the best audio quality was obtained when the virtual room effect and the direct sound were at equal levels at the listening position.
Convention Paper 7664
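The coherence noted in P3-4 follows from how first-order Ambisonics is decoded. The sketch below encodes a single plane wave into horizontal first-order B-format (W, X, Y) and decodes it to a regular octagon with a basic decoder. The decoder type, the 1/sqrt(2) weighting of W, the loudspeaker azimuths at 45-degree steps, and the function names are assumptions, since the abstract states only that the setup was octagonal. For a source at 0 degrees, all eight loudspeakers receive a scaled copy of the same signal (3/8 at the front, -1/8 at the back), which is the inter-channel coherence that gives rise to coloration.

```python
import math

def encode_b_format_2d(azimuth_deg):
    """Horizontal first-order B-format gains (W, X, Y) for a unit plane wave."""
    a = math.radians(azimuth_deg)
    return 1 / math.sqrt(2), math.cos(a), math.sin(a)

def decode_basic_2d(w, x, y, speakers_deg):
    """Basic decoder for a regular horizontal ring of N loudspeakers.

    Gain for loudspeaker i at azimuth phi_i: (sqrt(2)*W + 2*(X cos phi_i + Y sin phi_i)) / N,
    i.e. (1 + 2 cos(phi_i - theta)) / N for a source at azimuth theta.
    """
    n = len(speakers_deg)
    gains = []
    for s in speakers_deg:
        phi = math.radians(s)
        gains.append((math.sqrt(2) * w + 2 * (x * math.cos(phi) + y * math.sin(phi))) / n)
    return gains

octagon = [45 * i for i in range(8)]  # assumed azimuths of the octagonal setup
gains = decode_basic_2d(*encode_b_format_2d(0), octagon)
```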
P3-5 Ontology-Based Information Management in Music Production—Gyorgy Fazekas, Mark Sandler, Queen Mary, University of London - London, UK
In information management, ontologies are used to define the concepts and relationships of a given domain. Using such a schema enables structuring, interoperability, and automatic interpretation of data, and thus allows information to be accessed through complex queries. In this paper we use ontologies to associate metadata captured during music production with explicit semantics. The collected data is used to find audio clips processed in a particular way, for instance, by engineering procedure or by acoustic signal features. Unlike existing metadata standards, our system builds on the Resource Description Framework (RDF), the data model of the Semantic Web, which provides flexible, open-ended knowledge representation. Using this model, we demonstrate a framework for managing information relevant to music production.
Convention Paper 7665
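P3-5 builds on RDF, whose data model is a set of subject-predicate-object triples. The stdlib-only sketch below shows that model and a query of the kind the abstract describes: finding clips processed in a particular way. Everything in it except the standard rdf:type URI is hypothetical: the example.org namespaces, the processedBy property, the Compressor and Equaliser classes, and the small pattern matcher that stands in for a real RDF store and query engine.

```python
# Standard rdf:type predicate URI.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical vocabulary and session namespaces (not the ontology from the paper).
STUDIO = "http://example.org/studio#"
SESSION = "http://example.org/session1/"

class TripleStore:
    """Minimal in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        for t in self.triples:
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
                yield t

def clips_processed_with(store, effect_class):
    """Clips linked via studio:processedBy to a step whose rdf:type is effect_class."""
    steps = {s for s, _, _ in store.match(p=RDF_TYPE, o=effect_class)}
    return sorted(clip for clip, _, step in store.match(p=STUDIO + "processedBy") if step in steps)

store = TripleStore()
store.add(SESSION + "vocals_take3", STUDIO + "processedBy", SESSION + "step1")
store.add(SESSION + "step1", RDF_TYPE, STUDIO + "Compressor")
store.add(SESSION + "bass_di", STUDIO + "processedBy", SESSION + "step2")
store.add(SESSION + "step2", RDF_TYPE, STUDIO + "Equaliser")
compressed = clips_processed_with(store, STUDIO + "Compressor")
```

The query joins two triple patterns on the processing step; in an actual Semantic Web deployment the same question would be written as a SPARQL basic graph pattern, and new kinds of facts (for example, signal features) can be added as further triples without changing any schema in the store.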