Deep-Neural-Network-Based Spectral Feature Regression Using HMM-Based Feature Enhancement with DOA-Constrained ICA for Robust Speech Recognition
Lee, Ho Yong (Sogang University, General Graduate School)
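- Publisher: Sogang University, General Graduate School
- Advisor: Park, Hyung Min
- Year of publication: 2016
- Degree conferred: February 2016
- Degree: Master's
- Department and major: General Graduate School, Department of Electronic Engineering
- URI: http://www.dcollection.net/handler/sogang/000000058945
- Language of text: English
- Copyright: Sogang University theses are protected by copyright.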
- Automatic speech recognition (ASR) performance degrades in adverse real-world environments. Since deep learning has been successfully used as a recent breakthrough in acoustic modeling for ASR, deep-neural-network (DNN)-based speech feature enhancement approaches have attracted attention due to their powerful modeling capability. However, DNN-based approaches cannot achieve substantial recognition improvements for severely distorted speech when the test environment differs from the training environment. In this thesis, we propose a DNN-based feature enhancement method using multi-channel inputs that includes enhanced spectral features and estimated noise features in the DNN inputs to reconstruct noise-robust features. The enhanced spectral features are obtained from the multi-channel inputs by direction-of-arrival (DOA)-constrained independent component analysis (ICA), which performs efficient online target-speech extraction, followed by Bayesian feature enhancement using a hidden Markov model (HMM) prior, which performs feature enhancement for robust speech recognition. The DNN is then trained to reconstruct a clean spectral feature vector from a sequence of adjacent noisy-speech feature vectors acquired by one of the input channels, together with the corresponding enhanced feature vectors and estimated noise feature vectors. Experimental results demonstrate that the proposed method provides better ASR performance than the conventional DNN-based method that uses noisy speech features only, as well as other competing DNN-based methods, even in mismatched noise conditions.
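The abstract describes the DNN input as a window of adjacent noisy-speech feature vectors combined with the corresponding enhanced and estimated-noise feature vectors, regressed onto a clean spectral feature vector. The following plain-Python sketch illustrates that input assembly and a regression forward pass. It is not the thesis's implementation: the context width, applying the same window to the enhanced and noise streams, the concatenation order, edge padding by repeating boundary frames, ReLU hidden layers, and a mean-squared-error training target are all assumptions, and the names `splice`, `build_dnn_input`, `dnn_forward`, and `mse` are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weight matrix, bias vector)


def splice(frames: Sequence[Frame], t: int, context: int) -> List[float]:
    """Concatenate frames t-context .. t+context into one vector.
    Assumption: out-of-range indices are clamped, i.e. the first/last
    frame is repeated at utterance boundaries."""
    out: List[float] = []
    n = len(frames)
    for k in range(t - context, t + context + 1):
        out.extend(frames[min(max(k, 0), n - 1)])
    return out


def build_dnn_input(noisy: Sequence[Frame], enhanced: Sequence[Frame],
                    noise: Sequence[Frame], t: int, context: int = 2) -> List[float]:
    """Hypothetical input layout for frame t: spliced noisy frames, then
    spliced enhanced frames, then spliced noise-estimate frames."""
    if not (len(noisy) == len(enhanced) == len(noise)):
        raise ValueError("all three feature streams must have the same number of frames")
    return (splice(noisy, t, context)
            + splice(enhanced, t, context)
            + splice(noise, t, context))


def affine(W: List[List[float]], b: List[float], x: List[float]) -> List[float]:
    """Compute W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def dnn_forward(x: List[float], layers: Sequence[Layer]) -> List[float]:
    """Feed-forward regression network: ReLU on hidden layers and a
    linear output layer that predicts the clean feature vector."""
    h = x
    for i, (W, b) in enumerate(layers):
        h = affine(W, b, h)
        if i < len(layers) - 1:
            h = [max(0.0, a) for a in h]
    return h


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and clean feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


# Toy example with made-up values: 5 frames of 2-dim features per stream.
noisy = [[float(t), float(t) + 0.5] for t in range(5)]
enhanced = [[0.1 * t, 0.2 * t] for t in range(5)]
noise = [[1.0, 1.0] for _ in range(5)]
x = build_dnn_input(noisy, enhanced, noise, t=0, context=1)
print(len(x))  # 3 streams * 3 frames * 2 dims = 18
```

Keeping the three streams as separate spliced blocks lets the network see how the noisy observation, the HMM-enhanced estimate, and the noise estimate evolve over the same time window; in practice the feature dimension, context width, and network depth would follow the thesis's own configuration rather than the toy values used here.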