
Deep-Neural-Network-Based Spectral Feature Regression Using HMM-Based Feature Enhancement with DOA-Constrained ICA for Robust Speech Recognition

Lee, Ho Yong (Sogang University, General Graduate School)


Abstract
Automatic speech recognition (ASR) performance degrades in adverse real-world environments. Since deep learning has been used successfully as a recent breakthrough for acoustic modeling in ASR, deep-neural-network (DNN)-based speech feature enhancement approaches have attracted attention due to their powerful modeling capability. However, DNN-based approaches cannot achieve remarkable improvements in recognition performance for severely distorted speech when the test environments differ from the training environments. In this thesis, we propose a DNN-based feature enhancement method, using multi-channel inputs, that includes enhanced spectral features and estimated noise features in the DNN inputs to reconstruct noise-robust features. The enhanced spectral features are obtained from the multi-channel inputs by direction-of-arrival (DOA)-constrained independent component analysis (ICA) followed by Bayesian feature enhancement using a hidden-Markov-model (HMM) prior, exploiting efficient online target-speech extraction and feature enhancement for robust speech recognition, respectively. The DNN is therefore trained to reconstruct a clean spectral feature vector from a sequence of adjacent feature vectors of noisy speech acquired by an input, in addition to the corresponding enhanced feature vectors and estimated noise feature vectors. Experimental results demonstrate that the proposed method provides better ASR performance than the conventional DNN-based method using noisy speech features only or other competing DNN-based methods, even in mismatched noise conditions.
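
The DNN input described in the abstract can be pictured with a minimal sketch of how such an input vector might be assembled. This is an illustration under stated assumptions, not the thesis implementation: the function names `splice_frames` and `build_dnn_input`, the context width, the feature dimension, and the choice to splice only the noisy features while appending the enhanced and noise features for the current frame are all hypothetical, since the abstract does not specify them.

```python
import numpy as np


def splice_frames(features: np.ndarray, context: int) -> np.ndarray:
    """Stack each frame with `context` neighbours on each side.

    features: array of shape (num_frames, dim).
    Returns an array of shape (num_frames, (2 * context + 1) * dim).
    Edge frames are padded by repeating the first/last frame (an assumption).
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    # Window k holds, for every frame t, the frame at position t - context + k.
    windows = [padded[k:k + num_frames] for k in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)


def build_dnn_input(noisy: np.ndarray,
                    enhanced: np.ndarray,
                    noise: np.ndarray,
                    context: int = 2) -> np.ndarray:
    """Concatenate spliced noisy features with enhanced and noise features.

    All three inputs are (num_frames, dim) arrays aligned frame by frame.
    Hypothetical layout: [spliced noisy | enhanced | estimated noise].
    """
    return np.concatenate(
        [splice_frames(noisy, context), enhanced, noise], axis=1
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_frames, dim = 5, 3  # toy sizes, chosen only for illustration
    noisy = rng.normal(size=(num_frames, dim))
    enhanced = rng.normal(size=(num_frames, dim))  # stand-in for ICA + HMM output
    noise = rng.normal(size=(num_frames, dim))     # stand-in for a noise estimate
    print(build_dnn_input(noisy, enhanced, noise).shape)
```

In this sketch the regression target would be the clean spectral feature vector for the centre frame. The enhanced and noise inputs are random placeholders here; in the thesis they come from the DOA-constrained ICA and HMM-based Bayesian enhancement stages.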