Meeting Topic: A Personal History of Perceptual Audio Coding
Speaker Name: James (JJ) Johnston, DTS Inc.
Meeting Location: Microsoft Studios, Redmond, WA
James (JJ) Johnston spoke at the October PNW Section meeting, giving a personal viewpoint on his participation in the evolution of perceptual audio coding. Some 15 AES members and 18 nonmembers attended the meeting held at Microsoft Studios in Redmond.
James D. (JJ) Johnston worked for AT&T Bell Labs and its successor AT&T Labs Research, retiring (temporarily) in 2002. He also worked for Microsoft as Windows Audio Architect for 6 years. He is currently Chief Scientist for DTS Inc. and an AES Fellow.
After decades of linear PCM on the compact disc, the music world is now quite familiar with reduced-bit-rate codecs such as MP3 (really MPEG-1 Audio Layer 3) and AAC (Advanced Audio Coding). JJ gave his personal view of his pioneering work on music codecs. He jokingly described his talk as a personal history of how a test program he wrote for a computer turned into MP3.
He reviewed the 1970s analog and digital technology he worked with while researching music codecs at AT&T. These systems were big, complex, power-hungry hybrids of digital and analog circuits, with lots of touchy adjustments. Adding analog dividers and multipliers allowed him to adjust the step sizes in the converters. Next came sub-band coders (SBCs) using analog filters. They worked well, showing that integer-band sampling and SBCs were practical concepts, even though the analog implementation was very complex. A 56 kb/s "commentary" coder built with these techniques in analog hardware also worked well, proving the concepts. The Quadrature Mirror Filter (QMF) could be implemented digitally, but no good filter designs existed. JJ's early filter designs for this, from 1979, are still his most cited papers.
By the early 1980s, computers could barely handle two-band SBC, and the results sounded poor. The cause, JJ learned, was the "upward spread of masking," his first hint of the need for perceptual coding.
In 1984, an Alliant FX minicomputer arrived at AT&T, and JJ finally had enough memory to work with: more than 32 kwords! He was assigned to "break the compiler" and put the machine through its paces. He developed a series of test programs to insert perceptual noise, measure perceptual entropy, and perform perceptual transform coding (PXFM). There was some pre-echo, but PXFM generally sounded much better than earlier techniques.
The test material came from four vinyl LP clips, copied to Compact Cassette at home and then played into a 12-bit converter on a minicomputer in the lab. It took weeks to process useful amounts of material. The arrival of CDs made things easier.
By 1986, JJ's computer was working well, and the first informal listening tests were held at AT&T. By 1987, however, JJ had moved on to video, and his audio work went unpublished because AT&T balked at the cost of patenting it. In 1988, after the patents were finally obtained, JJ took his concepts to the IEEE ICASSP conference. Next to his poster was Karlheinz Brandenburg's paper (presented by Heinz Gerhäuser). The two looked at each other's posters and realized they had the same concept, and they convinced their respective bosses that they should work together. This, he feels, was probably the birth of MP3.
JJ played some of the original test material, male voices singing in an African style:
1- the original digital track (originally 12-bit floating point, pseudo-16-bit, at 32 kHz)
2- with perceptual noise inserted at a 13.6 dB signal-to-noise ratio; it did not sound bad
3- with sample-modulated white noise at the same 13.6 dB S/N; very noisy
4- the white-noise difference signal: a spitty sound, often with a tonal character
5- the difference signal of the perceptual-noise example: it sounded somewhat like a badly distorted original, but it is buried under the original and you don't normally hear it
Next came the pain of dealing with the standards bodies. Four years late, JJ finally wrote the paper. With no money and no market in the idea, management told him to make it a standard, and MPEG was just starting up at the time. At the MPEG meeting, it seemed that the IRT (the German broadcast research institute) had its own ideas about audio codecs, but the 16 codec proposals were sorted into 4 groups, each told to combine its ideas and submit a single proposal for evaluation.
JJ's group created ASPEC (Adaptive Spectral Perceptual Entropy Coding), combining PXFM, an OCF (Optimum Coding in the Frequency Domain) filterbank based on the MDCT (Modified Discrete Cosine Transform), block switching, and other ideas. Despite hardware interface changes stipulated late in the game, which left ASPEC with some jitter, it still won the quality evaluation.
Oddly, however, the group was told that ASPEC was too complicated to implement, and it was rejected.
Instead, two audio parts for MPEG-1 (Layers 1 and 2) were specified. There was much acrimony between the various factions. Layers 3 and 4 were proposed, but because they were required to use the filter bank of Layers 1 and 2, a hybrid filter bank was eventually devised, and the result was called Layer 3. Oddly, the hybrid was not deemed too complicated, as the MDCT had been, even though it was. After many more intergroup battles, the standard was finally agreed upon.
And what of AAC? While MP3 was being finalized, researchers went on to propose further changes concerning backward compatibility and the MPEG-2 audio layers. JJ decided none of this was adequate, dropped out of the standards mess, and teamed up with Anibal Ferreira to build an improved codec, PAC (Perceptual Audio Coder). When MPEG held a test, it had to admit two non-backward-compatible codecs, including PAC. PAC won the test.
For this non-backward-compatible (NBC) project, the top NBC developers were made to collaborate on a single NBC codec that MPEG might agree to standardize. Most of ASPEC's features were replaced by those of PAC, and the result was renamed AAC.
A break followed, with snacks and door prizes: Opus 4 Studios CDs and a copy of Adobe Audition 3 (courtesy of Adobe/Charles Van Winkle).
After the break, a Q&A session covered codec test material, WMA, and dithering. JJ recommended higher bit rates for better quality, and lossless coding if possible. He does not own an iPod and listens to his CDs on speakers.
Written By: Gary Louie