15th June 1999 - DVD-Audio

Robert Stuart, Meridian Audio
Kirk Paulsen, Sonic Solutions

Bob Stuart of Meridian and Kirk Paulsen of Sonic Solutions presented the June lecture, an introduction to the DVD-Audio format. The event provided an opportunity for DVD-Audio proponents to put their case after April's lecture on Super Audio CD.

The DVD-A and SACD formats are currently competing to replace the CD for high-quality stereo and surround-sound music delivery.

Bob Stuart began by introducing the key points of the DVD-A specification - keenly awaited, much-refined, and very recently released to the industry by the 44 member companies of WG4. The mainstays of the format are high audio quality, multi-channel capability for surround reproduction, and the inclusion of additional multimedia content such as still pictures and video clips. In common with other new formats, there is also a strong copyright protection infrastructure.

The audio quality potential of the DVD-A format may, at last, exceed the range of human perception, at least in terms of bandwidth and dynamic range. It is hard to conceive that the maximum 192kHz sampling rate and 24-bit wordlength could be insufficient, although it is debatable whether any origination process will ever fully exploit the potential of such a carrier. Any number of channels from one to six may be encoded on the disc (although the 176.4kHz and 192kHz sampling rates are confined to the two-channel modes).
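The bandwidth and dynamic-range claims can be put into numbers with two textbook relations: the highest representable frequency is half the sampling rate, and an ideal N-bit quantizer reproducing a full-scale sine wave has a signal-to-noise ratio of about 6.02N + 1.76dB. A minimal sketch under those idealised assumptions (real converters fall short of the theoretical figure; `ideal_snr_db` and `nyquist_hz` are hypothetical helper names):

```python
import math


def ideal_snr_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine.

    Textbook result: 20*log10(2)*N + 10*log10(1.5), i.e. about 6.02N + 1.76 dB.
    Assumes ideal quantization; real converters do worse.
    """
    return 20 * math.log10(2) * bits + 10 * math.log10(1.5)


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at a given sampling rate."""
    return sample_rate_hz / 2


for bits, fs in [(16, 44_100), (24, 96_000), (24, 192_000)]:
    print(f"{bits}-bit / {fs / 1000:g}kHz: ~{ideal_snr_db(bits):.1f} dB, "
          f"bandwidth up to {nyquist_hz(fs) / 1000:g}kHz")
```

On these idealised terms, CD's 16 bits give roughly 98dB against about 146dB for 24 bits, and 192kHz sampling extends the bandwidth to 96kHz.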

The disc's physical technology is identical to its DVD brethren such as DVD-Video and DVD-ROM. The capacity of a single side is sufficient for 65 minutes of six-channel, 96kHz-sampled, 20-bit PCM, which seems quite respectable, except that this would require data to be read from the disc at an aggregate rate of 13.8Mb/s - the specified transport limit is only 9.6Mb/s. This limitation in data rate, as much as the modest playing time, is the rationale for the DVD-A's 'MLP' option (Meridian Lossless Packing), an ingenious system for reducing the amount of data needed to encode audio signals without any loss: the decoder recovers the original audio data bit for bit.
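The aggregate rate is simply channels × sampling rate × wordlength. A quick check, assuming 1Mb/s means 10^6 bits per second: six channels at 96kHz give 11.52Mb/s at 20 bits and 13.82Mb/s at 24 bits, so the quoted 13.8Mb/s corresponds to the 24-bit case. Either way the uncompressed stream exceeds the 9.6Mb/s transport limit. (`pcm_rate_mbps` is a hypothetical helper name.)

```python
def pcm_rate_mbps(channels: int, sample_rate_hz: int, bits: int) -> float:
    """Uncompressed PCM data rate in megabits per second (10**6 bits/s)."""
    return channels * sample_rate_hz * bits / 1e6


DVD_A_CEILING_MBPS = 9.6  # transport limit quoted in the lecture

for bits in (20, 24):
    rate = pcm_rate_mbps(6, 96_000, bits)
    verdict = "exceeds" if rate > DVD_A_CEILING_MBPS else "fits within"
    print(f"6 ch, 96kHz, {bits}-bit: {rate:.2f}Mb/s "
          f"({verdict} the {DVD_A_CEILING_MBPS}Mb/s limit)")
```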

With MLP, which offers average compression ratios between 2:1 and 3:1 depending on the music and the chosen sampling parameters, 74 minutes of six-channel, 96kHz-sampled, 24-bit PCM occupies only about 2/3 of the disc capacity, and the aggregate data rate can be maintained below the 9.6Mb/s ceiling. Other possible permutations are 74 minutes of 5.1-channel surround plus a two-channel stereo mix, all at 96kHz, 24-bit; or two hours of two-channel, 24-bit at 192kHz; or even 25 hours of 44.1kHz-sampled, 16-bit audio for a super-quality talking book.
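On the data-rate side alone, the quoted 2:1 to 3:1 averages leave a comfortable margin. A sketch of the minimum average ratio each stream needs to fit under the ceiling, assuming the average is what matters (short-term peaks are handled by MLP's buffering) and treating 1Mb/s as 10^6 bits per second; `min_ratio_for_ceiling` is a hypothetical name:

```python
def min_ratio_for_ceiling(channels: int, sample_rate_hz: int, bits: int,
                          ceiling_mbps: float = 9.6) -> float:
    """Smallest average compression ratio that brings raw PCM under the ceiling.

    Returns 1.0 when the uncompressed stream already fits.
    """
    raw_mbps = channels * sample_rate_hz * bits / 1e6
    return max(1.0, raw_mbps / ceiling_mbps)


print(min_ratio_for_ceiling(6, 96_000, 24))   # six-channel 96kHz/24-bit
print(min_ratio_for_ceiling(2, 192_000, 24))  # two-channel 192kHz/24-bit
```

Six channels at 96kHz/24-bit need only about 1.44:1 to meet the rate limit, and two-channel 192kHz/24-bit (9.216Mb/s) already fits uncompressed. So the rest of MLP's compression goes into playing time, which is why capacity rather than transport rate limits the long-duration permutations.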

For compatibility with two-channel replay systems, two approaches are supported - either a completely independent two-channel mix can be encoded alongside the multi-channel program, or the two-channel version can be down-mixed from the surround program according to mix coefficients (possibly dynamic) specified by the mastering engineer.
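The down-mix approach amounts to forming each stereo output as a weighted sum of the surround channels. A minimal sketch, assuming a six-channel order of L, R, C, LFE, Ls, Rs and purely illustrative coefficients (`EXAMPLE_COEFFS`, `downmix` and `downmix_blocks` are hypothetical names, not taken from the DVD-A specification). Dynamic coefficients are modelled crudely by allowing a different matrix for each block of samples:

```python
from typing import Sequence

# Assumed channel order for this sketch: L, R, C, LFE, Ls, Rs.
# Illustrative values only (about -3dB for centre and surrounds, LFE dropped);
# on a real disc the mastering engineer chooses them and they may vary.
EXAMPLE_COEFFS = [
    # L    R    C      LFE  Ls     Rs
    [1.0, 0.0, 0.707, 0.0, 0.707, 0.0],    # left output
    [0.0, 1.0, 0.707, 0.0, 0.0, 0.707],    # right output
]


def downmix(frame: Sequence[float],
            coeffs: Sequence[Sequence[float]]) -> list[float]:
    """Mix one multichannel sample frame to stereo with a coefficient matrix."""
    return [sum(c * x for c, x in zip(row, frame)) for row in coeffs]


def downmix_blocks(blocks, coeffs_per_block):
    """Apply a (possibly different) coefficient matrix to each block of frames,
    a simple way to model dynamic mix coefficients."""
    return [[downmix(frame, coeffs) for frame in block]
            for block, coeffs in zip(blocks, coeffs_per_block)]
```

For example, a frame carrying signal only in the centre channel, `downmix([0, 0, 1, 0, 0, 0], EXAMPLE_COEFFS)`, lands equally in both outputs at 0.707. A real player would also need to guard against overload, since these weighted sums can exceed full scale.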

MLP is an amazing technology, and one which is very counter-intuitive to most of us - how can the number of bits used to encode an audio signal be reduced, and yet all the original bits be recovered by the decoder? This process is not at all like those all-too-familiar perceptual coding schemes which don't reproduce the original data, but instead degrade the signal in ways intended to be imperceptible. MLP is more like 'PKZip', familiar from the world of PCs, except that PKZip wouldn't be very good at reducing the data rate of 24-bit, 96kHz-sampled audio signals.
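The apparent paradox disappears once the encoder and decoder share the same deterministic prediction rule: the decoder regenerates each prediction itself and adds back the transmitted error, so nothing is lost. A minimal sketch using the crudest possible predictor (each sample is predicted to equal the previous one) as a hypothetical stand-in for MLP's far more sophisticated filters. It omits the entropy-coding stage that actually turns small residuals into fewer bits; `encode` and `decode` are illustrative names:

```python
import math


def encode(samples: list[int]) -> list[int]:
    """Replace each sample with its difference from a prediction.

    Hypothetical, deliberately crude predictor: 'the next sample equals
    the previous one'. MLP's real prediction filters are far more elaborate.
    """
    residuals, prev = [], 0
    for s in samples:
        residuals.append(s - prev)  # prediction error
        prev = s
    return residuals


def decode(residuals: list[int]) -> list[int]:
    """Run the same predictor and add back the errors: exact reconstruction."""
    samples, prev = [], 0
    for r in residuals:
        prev = prev + r
        samples.append(prev)
    return samples


# One cycle of a 100Hz tone sampled at 48kHz, as integer PCM values.
signal = [round(20000 * math.sin(2 * math.pi * 100 * n / 48_000)) for n in range(480)]
res = encode(signal)
assert decode(res) == signal  # every original bit recovered
print(max(abs(s) for s in signal), max(abs(r) for r in res[1:]))
```

The samples swing to about ±20000, but after the first sample the residuals stay within a few hundred, so an entropy coder can represent them in far fewer bits while the decoder still reproduces the signal exactly.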

MLP takes advantage of the fact that audio is rather predictable. The preponderance of lower frequencies allows subsequent samples to be predicted from a knowledge of those which came before - not very accurately, nor with infallibility, but by encoding only the differences between the actual samples and the predictions of a sophisticated filter, a good saving in data rate can be achieved. Another important aspect of MLP is its FIFO buffering - if the instantaneous data rate required by an awkward passage exceeds the 9.6Mb/s maximum, the player can catch up later without being embarrassed by a lack of output samples in the meantime. In other words, peaks in the data rate can be ironed out using spare throughput available during nearby troughs.
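The buffering idea can be illustrated with a toy model: the disc delivers data at a constant rate, the decoder removes one variable-sized compressed block per period, and a short head start lets the buffer absorb passages whose blocks exceed the read rate. A sketch under those assumptions, with hypothetical block sizes in arbitrary units (`simulate_fifo` is an illustrative name, not part of MLP):

```python
def simulate_fifo(block_bits: list[int], read_rate: int, preload: int) -> list[int]:
    """Return buffer occupancy after each period; raise on underrun.

    Toy model: the transport reads up to read_rate units per period until the
    disc data is exhausted; after `preload` periods the decoder consumes one
    compressed block per period.
    """
    remaining = sum(block_bits)  # data still on the disc
    buffered = 0                 # data waiting in the FIFO
    history = []
    next_block = 0
    period = 0
    while next_block < len(block_bits):
        chunk = min(read_rate, remaining)
        remaining -= chunk
        buffered += chunk
        if period >= preload:
            need = block_bits[next_block]
            if need > buffered:
                raise RuntimeError(f"underrun at period {period}")
            buffered -= need
            next_block += 1
        history.append(buffered)
        period += 1
    return history


# Blocks of 14 exceed the read rate of 10, but the average is 9.
blocks = [8, 8, 14, 14, 6, 6, 8, 8]
print(simulate_fifo(blocks, read_rate=10, preload=1))
```

With a one-period head start the buffer rides through the two oversized blocks and drains to zero at the end; with no head start, the same stream runs dry at the second oversized block.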

Because MLP works on an entropy principle, it has some interesting properties: for example, if the original audio is sampled at a higher rate (say 96kHz instead of 44.1kHz), MLP will generally be able to compress the data more (surely not because there's nothing up there!). This means that high sample rates can be accommodated by DVD-A with a surprisingly small loss of capacity. On the other hand, increasing the wordlength may produce a disproportionate degradation in the compression ratio, since the additional low-order bits are very chaotic (surely they're not just noise!), especially if sigma-delta A/D converters have been used.
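The wordlength effect follows from simple arithmetic. If the extra low-order bits are unpredictable, each one costs a full coded bit per sample, so the savings stay fixed while the total grows. A sketch assuming a hypothetical 16-bit recording that codes to 6 bits per sample (an illustrative figure, not a measured MLP result):

```python
def compression_ratio(wordlength_bits: int, coded_bits_per_sample: float) -> float:
    """Ratio of PCM bits to coded bits per sample."""
    return wordlength_bits / coded_bits_per_sample


# Hypothetical starting point: 16-bit audio that codes to 6 bits/sample.
base_word, base_coded = 16, 6.0

for word in (16, 20, 24):
    extra = word - base_word
    # Assumption: every extra low-order bit is noise the predictor cannot
    # remove, so it adds one coded bit per sample.
    coded = base_coded + extra
    print(f"{word}-bit: {coded:.0f} coded bits/sample, "
          f"ratio {compression_ratio(word, coded):.2f}:1")
```

Under this assumption the ratio slides from about 2.7:1 at 16 bits to 2:1 at 20 bits and about 1.7:1 at 24 bits, even though the predictable part of the signal is unchanged.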

All very interesting, you might think, but surely it's bound to be ages before it catches on, and anyway you couldn't make a DVD-A yet even if you wanted to. Enter Kirk Paulsen of Sonic Solutions, manufacturer of a real-life DVD-A mastering system.

Although Sonic Solutions has already supplied more than 200 DVD-Video mastering systems in Europe over the three years since the format's introduction, Kirk was at pains to point out the very different workflow which is needed to master a DVD-Audio disc. Surprisingly, despite the bewildering range of quality / capacity trade-offs available in encoding video for DVD, a few years of operational experience have allowed the process to be virtually automated offline. Video content can be 'optimally' encoded by the computer overnight without the need for operator intervention. For audio mastering engineers, on the other hand, the workload may be about to get heavier.

The DVD-A mastering process starts with 'Audio Prep' - this is familiar territory for CD mastering engineers; it's simply capturing and editing the audio 'assets' (as we must learn to call them), only now there are more channels, and more quality options in terms of sampling rate and wordlength. The question of how (or whether) to provide the two-channel mix must be decided here. Next comes the assimilation of the ASVU assets - that's setting up the slide show to you (Audio Still Video Unit; presumably that's what they call slides in the CIA). There are lots of decisions to be made here - obviously choosing the slides, but also determining the transitions (cut, wipe etc.) and the degree of interaction allowed - will the listener (viewer?) be allowed to choose his own running order for the slides? Can it be random? Can he browse? Next comes Menu Creation, and finally Motion Video Prep. Hmm... don't hang up your headphones yet guys! It can't be THAT difficult. In fact, the key to this bewildering array of tools seems to be PROOFING - at each of the many stages, different options can be easily auditioned and evaluated, and the overall effect assessed.

We clearly have interesting times ahead. There will surely be many new things for us all to learn. But the potential in terms of the listening (and viewing) experience is incomparable to anything we've had before. I, for one, am really looking forward to it! Our thanks are due to Bob Stuart and Kirk Paulsen for their entertaining and informative introduction to the new format.

Ian Dennis