Dual-Microphone Voice Activity Detection Estimate in Handset Applications Based on Neural Network by Using Subband Signed Power Difference and Inter-Microphone Cross Correlation
L. Zhang, M. Zhang, and C. Li, "Dual-Microphone Voice Activity Detection Estimate in Handset Applications Based on Neural Network by Using Subband Signed Power Difference and Inter-Microphone Cross Correlation," J. Audio Eng. Soc., vol. 63, no. 12, pp. 1017-1024, Dec. 2015. doi: https://doi.org/10.17743/jaes.2015.0085
Abstract: Voice activity detection (VAD) is a critical part of some speech processing because a processing algorithm needs to distinguish between real voices and other unrelated background sounds. This report explores the combination of a neural network and dual microphones to improve VAD estimates in handset applications. Two new features are extracted from the dual microphones: subband signed power difference (SBSPD) and inter-microphone cross correlation (IMCC). SBSPD provides specific and accurate power difference information at various frequency bands and IMCC contains detailed spatial location information of both microphones. Extensive objective evaluation has been performed under various noise conditions including directional speech interference. Compared to existing methods based on the power level difference ratio, the proposed method is superior in terms of accuracy and robustness of VAD estimate under various noise environments, especially directional speech interferences. Because the method adapts to the sonic environment, parameter optimization is not needed and the approach is suitable for hand-held devices.
@article{zhang2016dual-microphone,
author={Zhang, LuoFei and Zhang, Ming and Li, Chen},
journal={Journal of the Audio Engineering Society},
title={Dual-Microphone Voice Activity Detection Estimate in Handset Applications Based on Neural Network by Using Subband Signed Power Difference and Inter-Microphone Cross Correlation},
year={2015},
volume={63},
number={12},
pages={1017--1024},
doi={10.17743/jaes.2015.0085},
month={december},}
TY - JOUR
TI - Dual-Microphone Voice Activity Detection Estimate in Handset Applications Based on Neural Network by Using Subband Signed Power Difference and Inter-Microphone Cross Correlation
SP - 1017
EP - 1024
AU - Zhang, LuoFei
AU - Zhang, Ming
AU - Li, Chen
PY - 2016
JO - Journal of the Audio Engineering Society
IS - 12
VL - 63
Y1 - December 2015
Authors:
Zhang, LuoFei; Zhang, Ming; Li, Chen
Affiliation:
Jiangsu Audio Engineering Lab, School of Physics and Technology, Nanjing Normal University, Nanjing, China
JAES Volume 63 Issue 12 pp. 1017-1024; December 2015
Publication Date:
January 6, 2016
Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=18059