Feature Enhancement Based on Deep TDNN Neural Networks using Residual Learning for Robust Speech Recognition
- Subject (keywords) Feature enhancement, Speech recognition, Time-Delay Neural Network, Residual learning, Signal processing
- Publisher Sogang University Graduate School
- Advisor 박형민
- Year of publication 2019
- Degree conferral date 2019. 2
- Degree Master's
- Department and major Graduate School, Department of Electronic Engineering
- URI http://www.dcollection.net/handler/sogang/000000063904
- UCI I804:11029-000000063904
- Text language English
- Copyright Sogang University theses are protected by copyright.
Abstract/Summary
Many recent studies apply deep neural networks (DNNs) to speech recognition, but recognition in noise remains a challenging problem when the signal-to-noise ratio (SNR) is low. In addition, DNNs are difficult to apply in real environments because of the distortion caused by the mismatch between training and test data; this distortion intensifies when the type of noise changes. This paper therefore proposes a feature enhancement algorithm that combines speech signal preprocessing with a DNN. Training data for the DNN are extracted from signals processed by weighted prediction error (WPE) based dereverberation, spectral-mask-based steering vector estimation, and Minimum Variance Distortionless Response (MVDR) beamforming. The DNN model takes its principal idea from residual learning: the network combines the various kinds of feature data obtained through the preprocessing and is structured in a manner similar to the preprocessing method. In addition, we use a Time-Delay Neural Network (TDNN), which is well suited to processing speech data, and we also perform noise-aware training (NAT). Finally, we evaluate the proposed method on the CHiME-3 data, which is widely used in speech recognition research, and compare the results with those of the algorithm that won the CHiME-3 Challenge, with the acoustic and language models held fixed.
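The mask-based steering vector estimation and MVDR steps named in the abstract can be illustrated with a minimal NumPy sketch for a single frequency bin. This is not the thesis implementation: the function names, the principal-eigenvector steering estimate, and the diagonal loading term are assumptions chosen for illustration, and the time-frequency mask is taken as given rather than predicted by a network.

```python
import numpy as np

def masked_covariance(stft, mask):
    """Mask-weighted spatial covariance for one frequency bin.

    stft: complex array, shape (channels, frames)
    mask: real array with values in [0, 1], shape (frames,)
    """
    weighted = stft * mask  # mask broadcasts across channels
    return weighted @ stft.conj().T / max(mask.sum(), 1e-10)

def steering_vector(speech_cov):
    """Assumed estimator: principal eigenvector of the speech covariance."""
    _, vecs = np.linalg.eigh(speech_cov)  # eigenvalues in ascending order
    return vecs[:, -1]

def mvdr_weights(noise_cov, d, diag_load=1e-6):
    """MVDR weights w = R_n^{-1} d / (d^H R_n^{-1} d).

    diag_load is a hypothetical regularizer that keeps R_n invertible.
    """
    n = noise_cov.shape[0]
    loaded = noise_cov + diag_load * np.trace(noise_cov).real / n * np.eye(n)
    num = np.linalg.solve(loaded, d)
    return num / (d.conj() @ num)

def apply_beamformer(w, stft):
    """Beamformer output y[t] = w^H x[t] for every frame."""
    return w.conj() @ stft
```

Given a speech mask `m`, the speech and noise covariances would be `masked_covariance(X, m)` and `masked_covariance(X, 1 - m)`. The resulting weights satisfy the distortionless constraint w^H d = 1, which is the property that gives MVDR its name.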