Meeting Topic: Dereverberation and Other Audio Trickery
Moderator Name: Blair Francey
Speaker Name: Dr. Gilbert Soulodre, FAES; Camden Labs
Other business or activities at the meeting:
Blair Francey welcomed everyone to the meeting.
He thanked the sponsors, the executive committee, and Norris publications.
He provided notification of future meetings and reminded members to renew their dues.
Meeting Location: Ryerson University: RCC 361, Communications Building - Toronto, Ontario
Blair Francey introduced Dr. Soulodre and provided the audience with his background. Dr. Soulodre invented the loudness measure adopted in the international standards ITU-R BS.1770 and EBU R 128, work for which he received an Emmy Award.
Gilbert's presentation used visual aids as well as many audio demonstrations run through his own software. The presentation was divided into two parts: the first half dealt with theory and background, along with applications and products, while the second half offered many audio examples.
He discussed signal decomposition technologies, referring to extracted audio signals as streams. In the case of dereverberation there would be two streams: the dry signal and the pure reverberation. As he later demonstrated, this software technology enables many other algorithms to be created, such as extracting individual sound sources from a stereo field.
Perceptually based processing is used because no mathematical or physical law says that this (e.g., dereverberation) is possible. He has developed a perceptual model that accounts for many aspects of human hearing, including loudness, frequency-domain effects, and temporal and spatial effects, to name a few.
Discussing his perceptual model with regard to phase, he noted that humans are relatively insensitive to phase on a single channel. Processing is therefore done so that errors in magnitude are reduced at the expense of greater errors in phase. Block-based processing is used to contain these errors.
His dereverberation algorithm relies on three assumptions: room impulse responses measured at nearby positions tend to have similar magnitude responses but very different phase responses; the human ear is relatively insensitive to phase errors over short periods of time (on the order of tens of milliseconds); and, if the processing is done over short 'blocks' of time, the phase of the input signal can be used as an estimate of the phase of the extracted dry stream. The input phase can likewise be used to estimate the phase of the reverberation.
He noted, however, that the ear is not insensitive to phase over longer periods of time, and that processing over long blocks will result in audible time-domain artifacts.
His algorithm operates in the frequency domain. It derives a perceptually relevant block-based estimate of the reverberant system. Then a perceptually relevant estimate of the reverberant energy is derived. This energy is then removed from the input signal.
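No code was shown in the talk, but the three steps above can be illustrated with a generic block-based spectral-subtraction sketch. This is not Dr. Soulodre's algorithm: the function name `dereverb_blocks`, the `floor` parameter, and the model in which a block's reverberant energy is the earlier dry-block energy weighted by later impulse-response block magnitudes are all illustrative assumptions. The block spectra `X` are assumed to come from a short-time Fourier transform computed elsewhere, and, following the phase assumption described above, only magnitudes are modified while the input phase is reused.

```python
import numpy as np

def dereverb_blocks(X, H_mag, floor=0.05):
    """Split block spectra X (frames x bins, complex) into dry and reverb streams.

    H_mag: (n_blocks x bins) ASSUMED block magnitudes of the room response;
    row 0 is the direct path and is not used in the reverb estimate.
    floor: hypothetical minimum fraction of input power kept in the dry stream.
    """
    n_frames, n_bins = X.shape
    dry_power = np.zeros((n_frames, n_bins))
    dry = np.zeros_like(X)
    for k in range(n_frames):
        in_power = np.abs(X[k]) ** 2
        # Reverberant energy: earlier dry-block energy smeared by later IR blocks.
        reverb_power = np.zeros(n_bins)
        for j in range(1, H_mag.shape[0]):
            if k - j < 0:
                break
            reverb_power += (H_mag[j] ** 2) * dry_power[k - j]
        # Subtract the estimate, but never go below a fraction of the input power.
        dry_power[k] = np.maximum(in_power - reverb_power, floor * in_power)
        # Real-valued gain only: the input phase is reused for the dry stream.
        gain = np.sqrt(dry_power[k] / np.maximum(in_power, 1e-12))
        dry[k] = gain * X[k]
    reverb = X - dry  # the two streams sum back to the input exactly
    return dry, reverb
```

Because the reverberant stream is defined as the input minus the dry stream, the two streams always add back to the original signal.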
In estimating the impulse response, a perceptually relevant estimate consists of the block-based magnitudes of the impulse response. The rate of decay of the signal is determined by the rates of decay of both the dry source and the reverberant system. The fastest rate of decay occurs when the dry source has stopped, leaving only the reverberant decay. The algorithm therefore looks for the fastest rate of decay to estimate the block-based magnitudes of the impulse response.
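A minimal sketch of the "fastest decay" idea, assuming the block spectra are available as a NumPy array: for each frequency bin and each block lag, it takes the smallest level ratio observed anywhere in the recording as the reverb-only decay. The function name `estimate_block_ir` and the `min_level` threshold are hypothetical, and a practical estimator would need smoothing and bias correction that this sketch omits.

```python
import numpy as np

def estimate_block_ir(X, n_blocks, min_level=1e-3):
    """Estimate block magnitudes of the room response from the fastest decay.

    X: (frames x bins) block spectra. Returns an (n_blocks x bins) array whose
    row 0 (direct path) is fixed at 1. min_level is a hypothetical threshold
    that skips near-silent blocks, where level ratios are meaningless.
    """
    mag = np.abs(X)
    n_frames, n_bins = mag.shape
    if n_frames < n_blocks:
        raise ValueError("need at least n_blocks frames")
    H = np.ones((n_blocks, n_bins))
    for j in range(1, n_blocks):
        # Level j blocks later relative to the current level, per bin.
        ratios = np.full((n_frames - j, n_bins), np.inf)
        valid = mag[:-j] > min_level
        ratios[valid] = mag[j:][valid] / mag[:-j][valid]
        # The smallest ratio (fastest decay) is taken as the reverb-only decay.
        H[j] = np.minimum(ratios.min(axis=0), 1.0)
    return H
```

For a single bin whose levels are 1, 0.5, 0.25, 0.125 (a source that stops, then decays) followed by a louder passage, the estimate is 1, 0.5, 0.25 for the first three blocks.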
The audio examples Dr. Soulodre played in the first half were demonstrations: they used unusually reverberant settings to make the processing easily audible and to 'challenge' the dereverberation algorithm. Among them were male speech in a noisy squash court and an 'anechoic' string ensemble with added artificial reverb.
The samples were played 'live' through his software on his laptop. At the most extreme settings (complete reverb removal), small artifacts were audible; however, such settings would rarely be needed in practical applications. The audience could audition either the dry component or the reverberant component, and the amount of reverb in the signal could be reduced with a graphic slider in the software. This software is not currently commercially available.
One other observation on phase approximation is that it's most accurate when it matters most, and least accurate when it matters least.
Suggested applications for this software included mastering tools, restoration, upmixing for surround, speech recognition, teleconferencing, hearing aids, and security and forensics, to name a few.
In the second half, Gilbert gave audio examples of signal extraction, in which elements within a stereo mix are separated. This only works on stereo and multichannel sources. By positioning a 'spatial filterbank' (for example, in the centre), vocals can be separated from the rest of the mix, along with any other elements in that centre position. These filters can take on any shape with regard to width and level.
As a demonstration, he played back a Frank Sinatra recording. The software allows the extracted components to be auditioned (soloed); the reverb component can then be extracted from each of those streams and auditioned as well. We could choose to hear the voice stream, the instrument stream, the isolated reverb, or the voice without reverb! The amount of extraction was varied with a slider and, again, the most extreme settings produced artifacts. The software interface was simple and elegant, and appeared very easy to navigate.
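One common way to build such a spatial filter, shown here as an assumption rather than as the method used in the demonstration, is to estimate a pan position for every time-frequency bin from the left and right magnitudes and apply a gain window around the target position. The function `spatial_filter` and its `centre` and `width` parameters are hypothetical, and a triangular window is used only for simplicity.

```python
import numpy as np

def spatial_filter(L, R, centre=0.0, width=0.2):
    """Split stereo block spectra into an 'in-window' stream and the remainder.

    L, R: complex arrays of the same shape (frames x bins).
    centre: target position, -1 (hard left) .. 0 (centre) .. +1 (hard right).
    width: half-width of the window, in the same pan units.
    """
    aL, aR = np.abs(L), np.abs(R)
    # Amplitude-based pan estimate per bin (silent bins come out as 0, harmlessly).
    pan = (aR - aL) / np.maximum(aL + aR, 1e-12)
    # Triangular window: gain 1 at the target position, 0 beyond +/- width.
    gain = np.clip(1.0 - np.abs(pan - centre) / width, 0.0, 1.0)
    inside = (gain * L, gain * R)
    outside = ((1.0 - gain) * L, (1.0 - gain) * R)
    return inside, outside
```

Because the inside and outside gains sum to one, the two streams add back to the original stereo signal; sweeping `centre` over time is one way to mimic 'panning' the filter.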
Playing back the Beatles' 'Penny Lane', he was able to extract various elements from the mix (eg: the flute solos) by 'panning' the spatial filterbank in real time.
One application of this is creating surround mixes from stereo sources. Different 'streams' can be placed independently, and the reverb streams can be used to create space and envelopment. Notably, the individual streams reconstruct perfectly: nothing is lost and nothing is added.
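The "nothing is lost, nothing is added" property can be illustrated with masks that sum to one in every time-frequency bin: the streams then add back exactly to the input, and a per-stream gain behaves like the reverb slider described earlier. The helper names `split_by_masks` and `remix` are hypothetical, and real systems derive their masks far more carefully than this.

```python
import numpy as np

def split_by_masks(X, masks):
    """Split block spectra X into streams using masks that sum to one per bin."""
    masks = np.asarray(masks, dtype=float)
    if not np.allclose(masks.sum(axis=0), 1.0):
        raise ValueError("masks must sum to one for perfect reconstruction")
    return [m * X for m in masks]

def remix(streams, gains):
    """Recombine streams with per-stream gains; all gains of 1.0 give the input back."""
    return sum(g * s for g, s in zip(gains, streams))

# Illustrative values only.
X = np.array([[1 + 1j, 0.5 - 2j], [3j, -1 + 0j]])
reverb_mask = np.array([[0.2, 0.7], [0.4, 0.0]])
dry, reverb = split_by_masks(X, [1 - reverb_mask, reverb_mask])
original = remix([dry, reverb], [1.0, 1.0])     # identical to X
less_reverb = remix([dry, reverb], [1.0, 0.3])  # reverb 'slider' at 30%
```

For an upmix, each stream could instead be routed to its own output channels before summing.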
Other uses for Dr. Soulodre's perceptual based processing involve reconstructing signals passed through lossy codecs, reconstructing stereo soundstages, and removing or reducing noise from music and speech recordings.
On MP3 reconstruction, he uses stream extraction techniques to perceptually reconstruct the full bandwidth, reduce coding artifacts, and reconstruct transient and ambient information. He played back a sample reconstructed from a file encoded at 96 kbit/s. Again, this can only be a reconstruction, not a true restoration.
On stereo reconstruction, he demonstrated an example of a project he did for the Chesky label where out-of-phase drum leakage into the piano mics in an audiophile jazz recording was very successfully removed. This was the most impressive example of Gilbert's presentation.
Another example involved track isolation, where singer and piano leakage into the 'wrong' mics was reduced.
Noise reduction involves decomposing the signal into noise and non-noise streams. The software adapts automatically and works in high-noise environments. This technology is currently available in TC Electronic's 'Back Drop' noise reduction software.
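A minimal sketch of a noise / non-noise decomposition, using a standard technique swapped in for illustration: the noise level in each bin is taken as a low percentile of the magnitude over the whole recording (which assumes roughly stationary noise, unlike the auto-adapting product), and a Wiener-style gain separates the two streams. The function `split_noise` and its `percentile` parameter are hypothetical.

```python
import numpy as np

def split_noise(X, percentile=10.0):
    """Decompose block spectra X (frames x bins) into non-noise and noise streams.

    ASSUMPTION: the noise is roughly stationary, so its per-bin level is taken
    as a low percentile of the magnitude over all frames.
    """
    mag = np.abs(X)
    noise_mag = np.percentile(mag, percentile, axis=0)  # one value per bin
    # Wiener-style gain from the estimated noise-to-signal power ratio.
    gain = np.clip(1.0 - noise_mag ** 2 / np.maximum(mag ** 2, 1e-12), 0.0, 1.0)
    signal = gain * X
    noise = X - signal  # the streams sum back to the input
    return signal, noise
```

Blocks near the noise floor are sent almost entirely to the noise stream, while much louder blocks pass through with a gain close to one.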
Many examples were demonstrated including John Lennon dialogue, film dialogue, speech with a very noisy background, and high level radio noise with static.
On removing noise and unwanted sounds from a speech recording, he was able to produce a 'hyper-directional' result from an 'omni' source. In the samples he played, noise was reduced, reverberation was lowered or removed, and wind effects were reduced, among other disturbances.
Summarizing his talk, he stated: "The performance of many audio processing tasks can be significantly improved by exploiting the various properties of the human auditory system."
At the close Blair Francey presented Dr. Soulodre with a Toronto AES Certificate of Appreciation and an AES coffee mug.
Written By: Karl Machat; with Frank Lockwood