
AES Section Meeting Reports

Pacific Northwest - January 25, 2022

AES Fellow James D. (jj) Johnston spoke on auditory masking at the PNW Section's January 2022 meeting, held via Zoom.

Meeting Topic: Masking — What is it, and when does it happen?

Speaker Name: James D. (jj) Johnston - Immersion Networks

Meeting Location: Zoom

Summary

PNW held its January 2022 meeting on Zoom, presenting AES Fellow James D. (jj) Johnston speaking on the basics of auditory masking. Of the 76 total attendees, about 55 were AES members.

PNW Chair Greg Dixon opened the meeting, and committeeperson Dan Mortensen then promoted the PNW Section Tea Time Topics, an informal audio Zoom gathering held on most Saturdays. Steve Turnidge introduced jj and handled moderating duties.

Mr. Johnston (hereinafter referred to as jj, in lowercase) explained that questions about sound mixing and about instruments disappearing led him to give this talk. It builds on his earlier tutorial on the human hearing mechanism, given in April 2019, which may be viewed on the PNW website here:
https://www.aes-media.org/sections/pnw/pnwrecaps/2019/apr2019/index.htm

The 2019 talk, dubbed "Hearing 099" (a nod to introductory courses, which are often numbered 101), introduced the basics of the human auditory system, including cochlear filtering and frequency analysis. Any references to "Hearing 096" are typos and may be ignored.

Demonstration sound files in uncompressed .wav format, along with full-resolution JPEGs of their spectral analyses, were also made available. A complete meeting recap, with links to the files, the Zoom chat, and the video, is in the PNW website meeting archive at:
https://www.aes-media.org/sections/pnw/pnwrecaps/2022/jan2022/index.htm

jj reviewed the basics of hearing, noting that the ear is a time and frequency analyzer. For the discussion of masking, he defined loudness and partial loudness, both of which are sensory (perceptual) terms rather than physical measurements. He also defined the ERB (Equivalent Rectangular Bandwidth), which expresses the same idea as a critical band.
A primary source of masking is that the signal-to-noise ratio (SNR) of the hair cells is only about 30dB; cochlear mechanics and the outer hair cells map that 30dB across a range of roughly 90dB. Masking occurs when the partial and total loudnesses do not change when something is added to or removed from the original signal.
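The report does not say which ERB formula jj used. As a minimal sketch, assuming the widely cited Glasberg and Moore (1990) approximation, the ERB at a given center frequency and the ERB-rate (the number of ERBs below that frequency) can be computed as follows. The function names are hypothetical.

```python
import math

def erb_hz(f_hz: float) -> float:
    """ERB in Hz at center frequency f_hz.
    Assumption: Glasberg & Moore (1990) approximation, not taken from the talk."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def erb_rate(f_hz: float) -> float:
    """Number of ERBs below f_hz on the Glasberg & Moore ERB-rate scale."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

if __name__ == "__main__":
    # Print the bandwidth and ERB-rate at a few center frequencies.
    for f in (100, 500, 1000, 4000, 10000):
        print(f"{f:>6} Hz: ERB = {erb_hz(f):7.1f} Hz, ERB-rate = {erb_rate(f):5.2f}")
```

At 1 kHz this gives an ERB of about 133 Hz. Because the bandwidth grows with frequency, a band of fixed width in hertz spans fewer ERBs at high frequencies than at low ones.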

Attendees were invited to download the demo files and listen to them locally rather than hear versions compromised by Zoom. Using headphones, compare nbnoise.wav, nbnoisesin.wav, and nbnoisesins.wav (narrowband noise with sine wave). The first two should sound the same, because the tone in the second file is masked by the noise. The third should have an audible extra tone that is not masked.
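The original demo files are on the PNW website. For readers who want to build similar stimuli themselves, here is a minimal sketch using only the Python standard library. It assumes narrowband noise can be approximated by a sum of random-phase cosines inside the band; the function names, band parameters, and level values are hypothetical choices, not how the original files were made.

```python
import math
import random
import struct
import wave

def rms(x):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def narrowband_noise(center_hz, bandwidth_hz, fs, n, seed=0, n_components=64):
    """Hypothetical stand-in for narrowband noise: a sum of equal-amplitude
    cosines with random frequencies and phases inside the band."""
    rng = random.Random(seed)
    lo = center_hz - bandwidth_hz / 2
    comps = [(rng.uniform(lo, lo + bandwidth_hz), rng.uniform(0, 2 * math.pi))
             for _ in range(n_components)]
    return [sum(math.cos(2 * math.pi * f * i / fs + ph) for f, ph in comps)
            for i in range(n)]

def add_tone(noise, tone_hz, db_below_noise, fs):
    """Add a sine whose RMS level is db_below_noise dB under the noise RMS
    (exact when the signal holds a whole number of tone cycles)."""
    tone_rms = rms(noise) * 10 ** (-db_below_noise / 20)
    amp = tone_rms * math.sqrt(2)  # a sine's peak is sqrt(2) times its RMS
    return [v + amp * math.sin(2 * math.pi * tone_hz * i / fs)
            for i, v in enumerate(noise)]

def write_wav(path, x, fs, peak=0.5):
    """Write mono 16-bit PCM, scaled so the largest sample reaches `peak`."""
    scale = peak * 32767 / max(abs(v) for v in x)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(b"".join(struct.pack("<h", round(v * scale)) for v in x))
```

For example, `add_tone(narrowband_noise(1000, 100, 48000, 48000), 1000, 15, 48000)` places a 1 kHz tone 15dB below one second of 100 Hz-wide noise centered at 1 kHz, and `write_wav` saves it for headphone listening. Whether a given tone is actually masked depends on the levels and the listener, so treat these numbers as a starting point for experimenting.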

So, principle (or rule) of masking #1: masking is frequency based, and the masker and the signal being masked need similar spectra. jj went on to introduce additional principles of masking, including:

2. Masking has temporal aspects that depend on the timing of the masker relative to the signal being masked: simultaneous masking, postmasking, and premasking (and the dreaded pre-echo codec effect).

3. Sometimes a difference of less than 30dB is enough to mask. When noise masks a tone, the required difference can be as low as 3.5dB. The demo files were tmn.wav (tone masking noise) and nmt.wav (noise masking tone), both with a 15dB difference.
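One way to see why two demos with the same 15dB difference are instructive is to treat the required masker-to-signal difference as a value that slides between the two figures quoted in the talk. The sketch below assumes about 3.5dB when noise masks a tone and, as a labeled assumption, the 30dB hair-cell SNR figure when a tone masks noise. It interpolates linearly on a hypothetical "tonality" value from 0 (noise-like masker) to 1 (tonal masker); this is a simplification for illustration, not jj's model.

```python
NOISE_MASKER_DB = 3.5   # figure quoted in the talk for noise masking a tone
TONE_MASKER_DB = 30.0   # assumption: ~30dB hair-cell SNR used for a tonal masker

def required_offset_db(tonality):
    """Masker-minus-signal level (dB) needed for masking under a hypothetical
    linear blend: tonality 0 = noise-like masker, 1 = pure-tone masker."""
    if not 0.0 <= tonality <= 1.0:
        raise ValueError("tonality must be between 0 and 1")
    return tonality * TONE_MASKER_DB + (1.0 - tonality) * NOISE_MASKER_DB

def is_masked(masker_minus_signal_db, tonality):
    """True if the signal sits far enough below the masker to be masked."""
    return masker_minus_signal_db >= required_offset_db(tonality)

print(is_masked(15.0, tonality=0.0))  # noise masking a tone 15dB down -> True
print(is_masked(15.0, tonality=1.0))  # tone masking noise 15dB down -> False
```

Under these assumptions, the same 15dB difference predicts a masked tone in the noise-masking-tone case but audible noise in the tone-masking-noise case.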

Of course, real audio isn't just tones and noise; it's complicated. For speech, about 15dB is a reasonable approximation. Music can span the full range from 3.5dB to 30dB. Stereo adds more complications. The "Suzanne Vega problem" involved a singer, mainly in mono, in a stereo recording. An effect called BMLD (Binaural Masking Level Difference) changes how masking behaves, so take care in stereo mixing and use a good codec that is BMLD-aware. Two demo files, inphase.wav and outphase.wav, have the same power spectra but don't sound the same.
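One way to make two stereo signals with identical per-channel power spectra that nonetheless sound different is to keep the noise identical in both channels and flip the polarity of a tone in one channel. Listeners generally detect a tone in noise more easily when the tone is inverted in one ear than when it is identical in both, which is the classic binaural masking level difference. The sketch below assumes that construction; the report does not say how inphase.wav and outphase.wav were made, and the multitone "noise" and all parameters are hypothetical.

```python
import math
import random

FS = 48000
N = 4800  # 0.1 s at 48 kHz: every frequency below fits a whole number of cycles

def multitone_noise(seed=0):
    """Stand-in 'noise': random-phase cosines from 900 to 1100 Hz in 20 Hz steps,
    skipping 1000 Hz so it shares no frequency with the tone."""
    rng = random.Random(seed)
    comps = [(f, rng.uniform(0, 2 * math.pi))
             for f in range(900, 1101, 20) if f != 1000]
    return [sum(math.cos(2 * math.pi * f * i / FS + ph) for f, ph in comps)
            for i in range(N)]

def stereo_pair(tone_inverted, tone_amp=0.5, seed=0):
    """Same noise in L and R; the 1 kHz tone is identical in both channels
    (in phase) or polarity-inverted in R (out of phase)."""
    noise = multitone_noise(seed)
    tone = [tone_amp * math.sin(2 * math.pi * 1000 * i / FS) for i in range(N)]
    sign = -1.0 if tone_inverted else 1.0
    left = [n + t for n, t in zip(noise, tone)]
    right = [n + sign * t for n, t in zip(noise, tone)]
    return left, right

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)
```

Because the tone and the noise occupy different frequencies, the right channel's magnitude spectrum (and therefore its energy) is the same whether the tone is added or subtracted. What changes is the interaural relationship: the difference signal L minus R is silent in the in-phase case and contains the tone in the out-of-phase case.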

By this point there were about six rules, including this one: if any ERB isn't masked, the signal isn't masked.
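The "any unmasked ERB" rule translates directly into a per-band check. As a minimal sketch, assume some masking model (not specified in the talk summary) has already produced a masked threshold for each ERB band; the signal counts as masked only if it stays at or below that threshold in every band. The function names and the band levels in the example are made up for illustration.

```python
def unmasked_bands(signal_db, threshold_db):
    """Indices of ERB bands where the signal level exceeds the masked threshold."""
    if len(signal_db) != len(threshold_db):
        raise ValueError("need exactly one threshold per band")
    return [i for i, (s, t) in enumerate(zip(signal_db, threshold_db)) if s > t]

def signal_is_masked(signal_db, threshold_db):
    """Masked only if no band is unmasked: one audible ERB is enough."""
    return not unmasked_bands(signal_db, threshold_db)

# Hypothetical per-band levels in dB for three adjacent ERB bands.
signal = [40.0, 35.0, 50.0]
threshold = [45.0, 40.0, 48.0]
print(unmasked_bands(signal, threshold))    # [2]
print(signal_is_masked(signal, threshold))  # False
```

Even though the signal is well below threshold in the first two bands, the single band that pokes above threshold is enough to make it audible.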

Some practical takeaways were:
- similar spectra may mask, or be masked
- signals with similar time-domain behavior in one or more ERBs may be masked
- a different time structure in two signals may prevent masking

A lively Q&A session followed, then two short get-acquainted breakout room sessions, and finally a group self-introduction and chat period.

The Presenter
James D. (jj) Johnston is Chief Scientist of Immersion Networks. He has had a long and distinguished career in electrical engineering, audio science, and digital signal processing. His research and product invention span hearing and psychoacoustics, perceptual encoding, and spatial audio methodologies.
He was one of the first investigators in the field of perceptual audio coding and one of the inventors and standardizers of MPEG-1/2 Audio Layer 3 and MPEG-2 AAC. Most recently, he has been working in the area of auditory perception and on ways to expand the limited sense of realism available in standard audio playback for both captured and synthetic performances.
Johnston worked for AT&T Bell Labs and its successor AT&T Labs Research for two and a half decades. He later worked at Microsoft and then at Neural Audio and its successors before joining Immersion. He is an IEEE Fellow, an AES Fellow, a New Jersey Inventor of the Year, an AT&T Technical Medalist and Standards Awardee, and a co-recipient of the IEEE Donald Fink Paper Award. In 2006, he received the James L. Flanagan Signal Processing Award from the IEEE Signal Processing Society. In 2012, he presented the Heyser Lecture at the AES 133rd Convention, titled "Audio, Radio, Acoustics and Signal Processing: the Way Forward." In 2021, Johnston and two colleagues received the Industrial Innovation Award from the Signal Processing Society "for contributions to the standardization of audio coding technology."
Mr. Johnston received the BSEE and MSEE degrees from Carnegie-Mellon University, Pittsburgh, PA in 1975 and 1976 respectively.

Written By: G Louie
