Heyser Memorial Lecture
AES 133rd Convention
Moscone Center, San Francisco, USA
Saturday, October 27, 6:30 pm
In 1971, while at Carnegie-Mellon University, I was told that there were too many people in audio who simply didn’t make sense, that there was no money in audio, and that all the problems were solved, even though we were using magtape and LPs. Audio is a very interesting and tricky field. Presently, research in the USA is limited to a couple of commercial companies (who deserve some applause for keeping it going), and the budget for research, as opposed to advanced development, is rather tiny and begrudged. That alone makes it a tough field, but there are worse problems. Shortly after I joined the AES, I discovered a split within the AES into several warring groups. On one hand we had the analytical engineers, who read SNRs, did various system transfer function measurements, and worked from the hard-science side; on the other hand we had the artists, the people who actually made, recorded, mixed, and otherwise created the product that the AES depends on; and then there was the High End. When I started, the idea was strongly stated that we didn’t understand how people heard things well enough to apply one to the other. That idea still persists today, at least as expressed from the artistic side. On the other side, we see one-number THD measurements, SNR measurements, and such, which do little to encourage dialog or information transfer. In the middle, we have a growing crowd of people who would like to pull the two disciplines back together, because in fact we DO understand a lot more than either side usually credits.
In this talk, I will mention some experiences I’ve had, discuss our present understanding of human auditory perception in very broad terms, and point out how the way we actually work (meaning our ears and brains) encourages a dichotomy of knowledge that no longer exists. I will then suggest some ways that education on all sides (I think there are no longer “two” sides, if there ever were) can bring people closer together, apply some of the technical things we know on the artistic side, learn what the artistic side needs and wants, and resolve issues like “mix buss performance” claims, the performance of various processors (which are quite nonlinear, and for good reason), and so on. It is my hope that we can continue to push the understanding of perception into all sides of the equation more directly. That should help create the “really here” immersive sensation that is the goal of the realists, create the “you could never be here but you wish you could” sensation of the avant-garde, and encourage the delivery systems of the world to “get with the program”. I know it’s a dream, but there is a lot to gain by exchanging information, allowing information to be tested, learning what is really going on, and taking advantage of modern science in the service of art.
Many years ago, when I was an undergrad at Carnegie-Mellon University, I mentioned to a couple of the professors that I found audio and radio to be interesting. Their reactions were quite clear: none of them thought it was even a remotely good idea. When pressed for a reason why, they gave quite similar responses; in particular, I was told that there were too many people in audio who simply didn’t make sense, that there was no money in audio, and that all the problems were solved. Note that this was in the days of early cassette recorders, huge Crown tape decks (well, I remember them, but not fondly, because of the relay problems), as well as Scully, Ampex, and the other famous makers, and the idea of digital audio wasn’t even in the cards. Speech transmission had just started to switch over completely to digital signaling, with μ-law in the USA and A-law almost everywhere else. The computer I fed cards to ran at a very fast 1 MIPS (and filled a room), and the timesharing box I could sometimes get a terminal on managed a whopping 200 KIPS.
From there, I went to Bell Labs, hired by David Goodman and Jim Flanagan to do work on speech coding. Dave reported to Jim, who reported to Max Mathews, who was at that time still at the Labs, doing computer music. A long story ensued, but I wound up doing work on ADPCM at much higher sampling rates than 8 kHz, with 2 to 12 bits of resolution, switch-selectable. That work, as did some follow-up work, indicated rather clearly to me that perceptual issues were a key to audio in more or less every fashion. This turned out not to be very interesting to upper management, who, partially as a result of various legal issues, said, “We don’t do audio.” It was also conveyed to me very clearly that audio wasn’t the place to be: it was full of people who were a bit off, at least for the most part. As fate would have it, I spent my life working on audio signal processing at Bell Labs and AT&T Research, contributing heavily to MP3 and even more so to MPEG-2 AAC, then spent short periods at Microsoft in the multimedia division, and even shorter periods with Neural Audio and elsewhere, working on spatial sensation, room correction, and loudness modeling, at which point I decided it was time to pack it in with the corporate world.