6.4 Voice Activity Detection using Microphone Array
Jaeyoun Cho(1) & Ashok Krishnamurthy(2)
(1) Samsung Electronics, Digital Media R&D Center, Suwon, Korea. (2) Ohio State University, Columbus, OH 43210, USA.
It is useful to decide whether a microphone signal contains target speech at a given moment. This process, called
voice activity detection (VAD), can reduce redundant effort in speech coding or speech recognition, and it can
help provide more accurate noise estimates for speech enhancement. Speech and non-speech frames have
conventionally been distinguished simply by observing variations in energy level, zero-crossing rate, or
periodicity. With these features, however, the detection error increases exponentially as the background noise
grows. Unvoiced fricatives, whose low energy is spread over a wide band, are more vulnerable to background noise
than any other phonemes. This paper proposes that voice activity can be detected more robustly in noisy
environments by observing the subband power ratio between the noisy speech and its beamformed signal. The method
is also shown to be more effective for fricatives than for vowels. In any case, it delivers much better
performance than single-microphone VADs whenever beamforming clearly reduces the noise.
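
The subband power ratio idea can be illustrated with a minimal sketch. The abstract does not specify the beamformer, the subband layout, or the decision rule, so everything below is an assumption made for illustration: a delay-and-sum beamformer with zero steering delay (target at broadside), uniform subbands computed with a direct DFT, the first microphone taken as the noisy reference, and a hypothetical threshold on the largest per-band ratio. When the target is coherent across microphones and the noise is not, the ratio should stay near 1 in bands dominated by speech and drop well below 1 in bands dominated by noise.

```python
import cmath
import math


def subband_powers(frame, n_bands):
    """Power of one frame in n_bands uniform subbands, using a direct DFT."""
    n = len(frame)
    half = n // 2
    # Power of bins 1..half (the DC bin is skipped).
    spectrum = []
    for k in range(1, half + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    width = half // n_bands
    return [sum(spectrum[b * width:(b + 1) * width]) for b in range(n_bands)]


def delay_and_sum(frames):
    """Average the microphone frames sample by sample.

    Assumption: zero steering delay, i.e. the target reaches all
    microphones at the same time.
    """
    n_mics = len(frames)
    return [sum(samples) / n_mics for samples in zip(*frames)]


def subband_power_ratios(frames, n_bands=4, eps=1e-12):
    """Per-subband ratio of beamformed power to reference-microphone power.

    frames: one list of samples per microphone; frames[0] is taken as the
    noisy reference (an assumption). eps avoids division by zero and is
    only meaningful relative to the signal scale.
    """
    beam = subband_powers(delay_and_sum(frames), n_bands)
    ref = subband_powers(frames[0], n_bands)
    return [b / (r + eps) for b, r in zip(beam, ref)]


def is_speech(frames, threshold=0.6, n_bands=4):
    """Hypothetical decision rule: speech if any subband ratio is high."""
    return max(subband_power_ratios(frames, n_bands)) > threshold
```

For spatially uncorrelated noise of equal power at M microphones, the ratio in a noise-only band is about 1/M on average, so the illustrative threshold of 0.6 sits between that level and 1 for a four-microphone array. A practical detector would need a tuned threshold and some smoothing over frames.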