
Feature Enhancement Based on Deep TDNN Neural Networks using Residual Learning for Robust Speech Recognition

Abstract

Many recent studies apply deep neural networks (DNNs) to speech recognition, but recognition performance in noise remains a challenging problem when the signal-to-noise ratio (SNR) is low. In addition, DNNs are difficult to apply in real environments because of the distortion caused by the mismatch between training and test data, and this distortion intensifies when the type of noise changes. This paper therefore proposes a feature enhancement algorithm that combines speech signal preprocessing with a DNN. The training data for the DNN are extracted from signals processed by weighted prediction error (WPE) based dereverberation, spectral-mask based steering vector estimation, and the Minimum Variance Distortionless Response (MVDR) algorithm. The DNN model draws its principal idea from residual learning: the network combines the various kinds of feature data obtained through the preprocessing and is structured in a manner similar to the preprocessing method. We also use a Time-Delay Neural Network (TDNN), which is well suited to processing speech data, and perform Noise-Aware Training (NAT). Finally, we evaluate the proposed method on the CHiME-3 data, which is widely accepted in speech recognition, and compare the results with those of the algorithm that won the CHiME-3 Challenge, with the acoustic and language models held fixed.
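
As an illustration of the MVDR step named in the abstract, the following is a minimal NumPy sketch of the standard MVDR weight formula w = R^{-1} d / (d^H R^{-1} d) for a single frequency bin. It is not the paper's implementation. The function name `mvdr_weights`, the four-microphone setup, the steering vector, and the noise covariance are all hypothetical values chosen for the example; in the method described above the steering vector would come from spectral-mask based estimation rather than a fixed formula.

```python
import numpy as np


def mvdr_weights(noise_cov, steering):
    """Return MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin."""
    # Solve R x = d instead of forming the inverse explicitly.
    rinv_d = np.linalg.solve(noise_cov, steering)
    return rinv_d / (steering.conj() @ rinv_d)


# Hypothetical toy setup: 4 microphones, one frequency bin.
rng = np.random.default_rng(0)
n_mics = 4
# Made-up steering vector (assumption, not estimated from data).
steering = np.exp(-1j * np.pi * 0.3 * np.arange(n_mics))
# Made-up noise covariance: sample covariance of random frames plus
# diagonal loading, so it is Hermitian and positive definite.
frames = rng.standard_normal((n_mics, 50)) + 1j * rng.standard_normal((n_mics, 50))
noise_cov = frames @ frames.conj().T / 50 + 1e-3 * np.eye(n_mics)

w = mvdr_weights(noise_cov, steering)

# Distortionless constraint: the response toward the target, w^H d, equals 1.
response = w.conj() @ steering

# Residual noise power w^H R w, compared with a delay-and-sum beamformer
# that satisfies the same constraint.
w_das = steering / (steering.conj() @ steering)
noise_power_mvdr = (w.conj() @ noise_cov @ w).real
noise_power_das = (w_das.conj() @ noise_cov @ w_das).real
```

Both beamformers pass the target direction undistorted, but MVDR minimizes the residual noise power under that constraint, so `noise_power_mvdr` should not exceed `noise_power_das`.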
