Target Speech Extraction Using AuxIVA Exploiting Target Masks and Noise Dependency for Robust Speech Recognition

Abstract

This thesis presents a method for extracting target speech from ambient background noise for robust automatic speech recognition (ASR). The method is based on auxiliary-function-based independent vector analysis (AuxIVA). In AuxIVA, weighted covariance matrices whose variances are scaled by target masks are introduced so that the target speech is extracted on a fixed output channel. The target masks can be estimated from the diffuseness of microphone pairs, which is obtained from the coherence function. To strengthen channel fixation, the noise output channels are assumed to be mutually dependent by introducing multi-dimensional independent component analysis. The algorithm is also extended to the independent low-rank matrix analysis (ILRMA) framework. For ILRMA, the inter-channel dependency of the noise outputs is handled by introducing non-negative tensor factorization (NTF), which is an extension of non-negative matrix factorization (NMF). Furthermore, an online algorithm based on frame-by-frame processing is derived for practical ASR applications. The weighted covariance matrices are updated recursively by applying the matrix inversion lemma. In addition, recursive updates of the source variances are derived for both AuxIVA and ILRMA. Experimental results on the CHiME-4 datasets show that the proposed algorithms effectively extract target speech on the fixed channel.
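The recursive covariance update mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's exact recursion, so the form below is an assumption: an exponentially weighted rank-one update V_t = alpha * V_{t-1} + (1 - alpha) * weight * x x^H, whose inverse is tracked with the matrix inversion lemma (Sherman-Morrison form) instead of being recomputed every frame. The function name `update_inverse_covariance`, the forgetting factor `alpha`, and the scalar `weight` (standing in for a mask-scaled inverse variance) are hypothetical names chosen for illustration.

```python
import numpy as np

def update_inverse_covariance(V_inv, x, weight, alpha=0.98):
    """Return the inverse of alpha * V + (1 - alpha) * weight * x x^H.

    V_inv  : (M, M) complex array, inverse of the previous covariance V
    x      : (M,) complex array, current multichannel observation
    weight : non-negative scalar (assumed: mask-scaled inverse variance)
    alpha  : forgetting factor in (0, 1) (assumed parameter)
    """
    beta = (1.0 - alpha) * weight
    Vx = V_inv @ x                # V^{-1} x
    xV = x.conj() @ V_inv         # x^H V^{-1}
    denom = alpha + beta * (x.conj() @ Vx)
    # Matrix inversion lemma for a rank-one update of alpha * V
    return (V_inv - beta * np.outer(Vx, xV) / denom) / alpha
```

Each frame then costs O(M^2) operations for M microphones rather than the O(M^3) of a fresh matrix inversion, which is the usual motivation for this lemma in frame-by-frame processing.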
