dCollection Digital Academic Information Distribution System

Convolutional Maximum-Likelihood Distortionless Response Beamforming With Steering Vector Estimation for Robust Speech Recognition

Keywords: Array signal processing, Covariance matrices, Speech recognition, Noise measurement, Maximum likelihood estimation, Speech processing, Convolution, Beamforming, dereverberation, maximum-likelihood estimation, robust speech recognition, steering vector estimation
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Year of publication: 2021
Publication type: Journal
Language: English

Abstract

Beamforming with multiple microphones has been one of the most successful approaches to robust speech recognition. The recently proposed maximum-likelihood distortionless response (MLDR) beamformer achieves promising performance but, like many other beamformers, requires an accurate steering vector for the target speaker in advance. In this paper, we present a steering vector estimation (SVE) method that replaces the noise spatial covariance matrix estimate with a normalized version of the variance-weighted spatial covariance matrix estimate of the observed noisy speech, where this estimate is obtained by the iterative update rule of the MLDR beamforming framework. In addition, we present an MLDR beamforming method that does not require the target speaker's steering vector in advance, in which SVE and beamforming are repeated alternately. Furthermore, we derive an online algorithm based on recursive least squares (RLS) to handle practical applications, including time-varying situations, and introduce the power method for more efficient online processing. We also present batch and online convolutional MLDR beamforming with SVE for simultaneous beamforming and dereverberation, in which weighted prediction error (WPE) dereverberation and MLDR beamforming with SVE are jointly optimized by maximum-likelihood estimation (MLE) for a zero-mean complex Gaussian signal with time-varying variances. Moreover, the presented beamformers can be further improved by using input signals masked by a neural network (NN) that estimates the target speech or noise components. Experimental results on the CHiME-4 and REVERB challenge datasets demonstrate the effectiveness of the presented methods.
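
The abstract builds on the MLDR beamformer, which models the beamformer output as a zero-mean complex Gaussian signal with time-varying variances. The NumPy sketch below illustrates only the batch MLDR iteration for a single frequency bin, assuming the steering vector h is known in advance. It does not implement the paper's SVE, the RLS online update, the power method, or the convolutional (WPE) extension. With the variances lambda_t fixed, maximizing the likelihood under the distortionless constraint w^H h = 1 amounts to minimizing sum_t |w^H x_t|^2 / lambda_t, which gives w = R^{-1} h / (h^H R^{-1} h) with the variance-weighted spatial covariance matrix R = (1/T) sum_t x_t x_t^H / lambda_t. With w fixed, the ML variance estimate is lambda_t = |Y_t|^2. The function name `mldr_beamform`, the initialization of lambda, the floor `eps`, the diagonal loading, and the synthetic target-plus-interferer scenario are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def mldr_beamform(X, h, n_iter=10, eps=1e-6):
    """Batch MLDR beamforming for one frequency bin (hypothetical helper, sketch only).

    X : (M, T) complex STFT coefficients of M microphones over T frames.
    h : (M,) target steering vector, assumed known here (no SVE).
    Returns the weights w (satisfying w^H h = 1) and the output Y = w^H X.
    """
    M, T = X.shape
    # Initial per-frame variances: mean input power over microphones (assumption).
    lam = np.maximum(np.mean(np.abs(X) ** 2, axis=0), eps)
    for _ in range(n_iter):
        # Variance-weighted spatial covariance: R = (1/T) sum_t x_t x_t^H / lam_t.
        R = (X / lam) @ X.conj().T / T
        # Tiny diagonal loading for numerical safety (assumption).
        R += eps * np.eye(M)
        # Distortionless solution: w = R^{-1} h / (h^H R^{-1} h), hence w^H h = 1.
        Rinv_h = np.linalg.solve(R, h)
        w = Rinv_h / (h.conj() @ Rinv_h)
        Y = w.conj() @ X
        # ML re-estimate of the time-varying variance from the current output.
        lam = np.maximum(np.abs(Y) ** 2, eps)
    return w, Y


# Synthetic scenario (assumption): one target plus one strong directional interferer.
rng = np.random.default_rng(0)
M, T = 4, 2000


def cgauss(shape):
    """Circular complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


h_true = np.exp(1j * rng.uniform(-np.pi, np.pi, M))
h_true /= h_true[0]                                # reference mic 0 has unit gain
g = np.exp(1j * rng.uniform(-np.pi, np.pi, M))     # interferer steering vector
s = np.sqrt(rng.exponential(1.0, T)) * cgauss(T)   # target with time-varying variance
u = cgauss(T)                                      # interferer signal
X = np.outer(h_true, s) + np.outer(g, u) + 0.01 * cgauss((M, T))

w, Y = mldr_beamform(X, h_true)
mse_out = np.mean(np.abs(Y - s) ** 2)       # target error after beamforming
mse_ref = np.mean(np.abs(X[0] - s) ** 2)    # target error at the reference mic
print(f"MSE reference mic: {mse_ref:.4f}, MLDR output: {mse_out:.4f}")
```

Frames in which the target is weak receive large weights 1/lambda_t, and in those frames the interferer dominates the observation, so R emphasizes the interference and the constrained solution places a null toward it. Because h is normalized so that its first entry is 1, the constraint w^H h = 1 keeps the target undistorted as observed at the reference microphone.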

Sogang University
