11th January 2000 - Surround sound in large spaces - the challenges

Christian Landone, King's College London

A new year, a new room and, of course, a new lecture. Christian started the first Y2K lecture on January 11th by introducing the two main methods for the reproduction of directional information through multi-loudspeaker layouts.

Transaural techniques, to which Spectral stereo and cross-talk compensated binaural encodings belong, provide the listener with synthetic auditory cues corresponding to those generated by a real sound source. Holographic systems, on the other hand, produce a 2D or 3D sound field within a confined area and deliver natural cues as a result of the diffraction of the combined wavefront, generated by the loudspeakers in the layout, around the listener's head and torso. Examples in this case include pair-wise amplitude panning, Ambisonic and Wave Field Synthesis (WFS). Christian explained that Transaural methods are generally not suitable for large audience areas, since prior knowledge of the listener's exact location is required; for this reason, the lecture focused mainly on holographic techniques.

The best known holographic reproduction methods were presented and their inherent shortcomings highlighted. Pair-wise panning allows the placement of the auditory image within an arc subtended by two loudspeakers and therefore requires a high number of information channels and fairly complex rendering algorithms in order to allow the smooth transition of the notional source between adjacent loudspeaker pairs. Ambisonic requires a fairly low number of transmission channels and is, therefore, well suited to applications where bandwidth and storage space are critical; this method, however, can suffer from auditory artifacts caused by the presence of phase-reversed feeds in some of the loudspeakers. WFS, on the other hand, is very attractive from the theoretical point of view, since it offers a complete solution to the problem of providing consistent imaging throughout the entire listening area; WFS, however, is based on spatial sampling and, therefore, the reproduction of a notional source with full bandwidth requires an unrealistic number of loudspeakers. Practical implementations of WFS systems hence must rely heavily on ‘psychoacoustic-based’ compromises.
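The spatial sampling constraint on WFS can be illustrated with a little arithmetic. The sketch below is not taken from the lecture: it assumes the commonly quoted worst-case rule that a loudspeaker array reproduces a wavefront without spatial aliasing only up to roughly c / (2 × spacing), and a speed of sound of 343 m/s. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def max_spacing_m(f_max_hz, c=SPEED_OF_SOUND):
    """Largest loudspeaker spacing that avoids spatial aliasing up to f_max_hz.

    Uses the assumed worst-case rule f_alias = c / (2 * spacing).
    """
    return c / (2.0 * f_max_hz)


def loudspeakers_needed(array_length_m, f_max_hz, c=SPEED_OF_SOUND):
    """Number of equally spaced loudspeakers along a straight array."""
    spacing = max_spacing_m(f_max_hz, c)
    return math.ceil(array_length_m / spacing) + 1


if __name__ == "__main__":
    for f_max in (1000.0, 5000.0, 20000.0):
        print(f_max, max_spacing_m(f_max), loudspeakers_needed(10.0, f_max))
```

Under these assumptions, full-bandwidth reproduction up to 20 kHz calls for a spacing of roughly 8.6 mm, which means well over a thousand loudspeakers along a single 10 m wall. This is the sense in which the required number becomes unrealistic.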

Following this discussion, Christian pointed out that, in general, holographic methods assume equal arrival time of the wavefronts generated by each transducer at the listening location; this condition can be satisfied at only one point in the audience area (the sweet spot). Plane wave propagation in unbounded spaces is also implicitly assumed in most holographic techniques but, in a large area with more than one listener, the proximity effect of the loudspeakers and the non-anechoic character of the performance space should be accounted for.
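The equal-arrival-time condition is easy to check numerically. The following sketch is an illustration only, not something shown in the lecture: it assumes free-field propagation at 343 m/s and a hypothetical ring of equally spaced loudspeakers, and reports the spread between the earliest and latest wavefront arrival at a given listener position.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def ring_layout(count, radius_m):
    """Hypothetical layout: loudspeakers equally spaced on a circle."""
    return [
        (radius_m * math.cos(2.0 * math.pi * k / count),
         radius_m * math.sin(2.0 * math.pi * k / count))
        for k in range(count)
    ]


def arrival_spread_ms(speakers, listener, c=SPEED_OF_SOUND):
    """Difference between latest and earliest arrival time, in milliseconds."""
    times = [math.dist(speaker, listener) / c for speaker in speakers]
    return (max(times) - min(times)) * 1000.0


if __name__ == "__main__":
    speakers = ring_layout(8, 10.0)
    print(arrival_spread_ms(speakers, (0.0, 0.0)))  # sweet spot at the centre
    print(arrival_spread_ms(speakers, (5.0, 0.0)))  # listener 5 m off centre
```

At the centre of the ring the spread is zero to within rounding error. For a listener 5 m off centre in a ring of 10 m radius it is about 29 ms, which shows how quickly the assumption breaks down away from the sweet spot in a large space.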

Christian then moved on to the actual topic of the lecture, performance evaluation methods for surround sound systems in large audience areas, and presented the various methods currently in use. Assessment methods were divided into two broad categories, taking into account either the wave-theoretical aspects of the reproduced sound field or its effect on the listener from a perceptual point of view.

The wave-theoretical methods, such as the Integrated-D error and the Interference Patterns, mainly evaluate the difference between the acoustical field generated in the listening area by the original source and that reproduced by the loudspeaker array. Perceptual methods, instead, quantify the spatial imaging accuracy on the basis of the quality and coherence of the auditory cues generated by diffraction of the combined wavefront around the listeners. The Velocity and Energy vectors were singled out as the main tools, within this particular category, for the assessment of holographic sound reproduction systems. An in-depth discussion of each assessment technique followed. The Integrated-D error, a quantity based on Huygens' principle, measures the pressure difference between the desired and reconstructed sound fields in a circular area centred on the sweet spot. The D-error appears to be an excellent technique for assessing the relative reconstruction accuracy of different surround layouts and encoding schemes, but it does not provide information regarding the nature of the error or its effect in terms of localisation impairment.

The Interference Pattern method visualises the acoustical field generated in the listening space and can be used to assess the extent of the sweet spot in the listening area. This technique, however, shows only the pressure (amplitude) distribution and neglects phase information. Christian pointed out that, because only a ‘partial picture’ of the acoustical field is presented, such a method could be misleading, and showed some cases where more than one sweet spot can appear within the listening area! Perceptual techniques generally use measures based on localisation vectors. These quantities were initially proposed as approximate psychoacoustic criteria for optimising Ambisonic encoders and decoders. The Velocity vector describes the reproduced image according to the low-frequency theory of human auditory localisation, which is based on interaural time differences (ITD); for this reason, it can be considered valid for frequencies below 800 Hz. Above 800 Hz the so-called ‘Energy vector’ is used, since in the high-frequency regime the main localisation cues on the horizontal plane are interaural level differences (ILD). When both vectors are used together, the image generated by the surround sound system could, in theory, be entirely characterised in terms of both its apparent angular location and its perceived sharpness.
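As an illustration of how the two localisation vectors are computed, the sketch below uses the definitions commonly quoted in the Ambisonic literature, which are an assumption here since the report does not give the formulae. The Velocity vector is the gain-weighted mean of the loudspeaker direction unit vectors, and the Energy vector is the same mean weighted by squared gains. The direction of each vector estimates the apparent angular location of the image, and its magnitude (1 for a single real source) indicates sharpness. The function names are hypothetical and the layout is restricted to the horizontal plane.

```python
import math


def localisation_vector(gains, angles_deg, power):
    """Return (magnitude, angle in degrees) of a localisation vector.

    power=1 weights each loudspeaker direction by its gain (Velocity vector);
    power=2 weights it by its squared gain (Energy vector).
    """
    weights = [g ** power for g in gains]
    total = sum(weights)
    x = sum(w * math.cos(math.radians(a)) for w, a in zip(weights, angles_deg)) / total
    y = sum(w * math.sin(math.radians(a)) for w, a in zip(weights, angles_deg)) / total
    return math.hypot(x, y), math.degrees(math.atan2(y, x))


def velocity_vector(gains, angles_deg):
    return localisation_vector(gains, angles_deg, 1)


def energy_vector(gains, angles_deg):
    return localisation_vector(gains, angles_deg, 2)


if __name__ == "__main__":
    # Hypothetical example: a loudspeaker pair at +/-30 degrees with equal gains.
    angles = [30.0, -30.0]
    gains = [0.7071, 0.7071]
    print(velocity_vector(gains, angles))
    print(energy_vector(gains, angles))
```

For the equal-gain pair at ±30 degrees, both vectors point straight ahead with a magnitude of cos 30° (about 0.87), slightly less sharp than a real source. Note that the gains carry no arrival-time information, so the calculation is identical for every seat in the audience.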

Christian then presented some non-ideal listening conditions that make all these assessment methods fail in one way or another, pointing out that the precedence effect can be considered the main culprit. The law of the first wavefront, a feature of the hearing system responsible for the collapse of auditory images into the loudspeaker nearest to the listener, is not taken into account by any of the assessment methods currently in use. By their very nature, in fact, neither the D-error nor the Interference Patterns consider the listener as ‘part of the equation’ and, although delays can be accounted for by the localisation vectors, these can only be used to quantify the degree of phasiness of the reconstructed image (i.e. they are valid for relative delays of microseconds rather than milliseconds).

Christian gave a brief tour of his own research experiments, concluding that the traditional methods used to evaluate holographic sound reproduction are not at all suitable in the generalised case of a large audience, and suggesting that only a binaural computational model that takes the precedence effect into account can be considered reliable. Only then will it be possible to optimise the surround sound impact on large audiences.

Wesley Maebe