Target Speech Extraction Using AuxIVA Exploiting Target Masks and Noise Dependency for Robust Speech Recognition
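- Subject (keywords): Speech processing, robust speech recognition, independent vector analysis, mask, online adaptation
- Publisher: Sogang University Graduate School
- Advisor: 박형민
- Year of publication: 2021
- Degree conferral date: February 2021
- Degree: Master's
- Department and major: Graduate School, Department of Electronic Engineering
- UCI: I804:11029-000000065942
- Language of text: English
- Copyright: Sogang University theses are protected by copyright.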
Abstract
This thesis presents a method for extracting target speech from ambient background noise for robust automatic speech recognition (ASR). The method is based on auxiliary-function-based independent vector analysis (AuxIVA). In AuxIVA, weighted covariance matrices with variances scaled by target masks are introduced to extract target speech on a fixed output channel. The target masks can be estimated from the diffuseness of microphone pairs, which is obtained from the coherence function. To strengthen channel fixation, the noise output channels are assumed to be mutually dependent by introducing multi-dimensional independent component analysis. The algorithm is also extended to the independent low-rank matrix analysis (ILRMA) framework. For ILRMA, the inter-channel dependency of the noise outputs is addressed by introducing non-negative tensor factorization (NTF), which is an extension of non-negative matrix factorization (NMF). Furthermore, an online algorithm based on frame-by-frame processing is derived for practical ASR applications. The weighted covariance matrices are updated recursively by applying the matrix inversion lemma. In addition, recursive updates of the source variances are derived for both AuxIVA and ILRMA. Experimental results on the CHiME-4 datasets show that the proposed algorithms effectively extract target speech on the fixed channel.
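
The recursive covariance update mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact update rule, so this sketch assumes the weighted covariance follows an exponential-forgetting recursion V_t = alpha * V_{t-1} + (1 - alpha) * weight * x x^H, where `alpha` (forgetting factor), `weight` (a scalar standing in for the mask-scaled variance weighting), and the function name `update_inverse_cov` are all hypothetical names introduced here. Under that assumption, the matrix inversion lemma (Sherman-Morrison form) gives the inverse of V_t directly from the inverse of V_{t-1}, without a full matrix inversion per frame.

```python
import numpy as np

def update_inverse_cov(V_inv, x, weight, alpha=0.98):
    """Rank-one update of an inverse covariance matrix.

    Assumed recursion (not taken from the thesis):
        V_new = alpha * V + (1 - alpha) * weight * x x^H
    Given V_inv = V^{-1} (Hermitian), return V_new^{-1} via Sherman-Morrison.
    """
    c = (1.0 - alpha) * weight           # scale of the rank-one term
    A_inv = V_inv / alpha                # inverse of alpha * V
    Ax = A_inv @ x                       # A^{-1} x
    # x^H A^{-1} x is real for a Hermitian positive definite A
    denom = 1.0 + c * np.real(np.vdot(x, Ax))
    # A^{-1} x x^H A^{-1} equals outer(Ax, conj(Ax)) because A_inv is Hermitian
    return A_inv - c * np.outer(Ax, Ax.conj()) / denom
```

In a frame-by-frame setting, `x` would be the multichannel observation at one frequency bin for the current frame, and the function would be called once per frame and bin with the current weight.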