
Multi-channel Voice Activity Detection Based on Joint Training with Speech Presence Probability Features

  • Subject (keywords): Voice activity detection
  • Issuing institution: Sogang University, Graduate School
  • Advisor: Park Hyung-Min (박형민)
  • Year of publication: 2021
  • Degree conferred: August 2021
  • Degree: Master's
  • Department and major: Department of Electronic Engineering, Graduate School
  • UCI: I804:11029-000000066079
  • Language of text: English
  • Copyright: Sogang University theses are protected by copyright.

Abstract

Recently, voice activity detection (VAD) algorithms for detecting speech in noisy reverberant environments have been intensively developed. One of the most widely used conventional approaches is the statistical-model-based single-channel VAD method built on the likelihood ratio test (LRT), but it performs poorly in noisy and non-stationary environments. To overcome this problem, a method that computes the LRT by estimating noise components through supervised learning has recently been developed. In this paper, we propose a jointly trained multi-channel VAD method. First, the noise components estimated by a speech enhancement (SE) network are used to estimate Gaussian-model-based speech presence probabilities (GM-SPPs). The SPP features are then concatenated with the magnitude spectrogram of the noisy speech and fed as input to the VAD network, which is trained with the VAD labels. Because the proposed method estimates the multi-channel SPP with the help of the SE network, it can detect speech reliably even in noisy and reverberant environments. In addition, the VAD performance can be further improved by training the SE and VAD networks jointly. Experiments on the REVERB challenge dataset confirmed that the proposed method outperforms the compared methods.
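As a rough illustration of the pipeline the abstract describes (an SE network estimating noise, a Gaussian-model SPP derived from that estimate, and the SPP concatenated with the noisy magnitude spectrogram as input to the VAD network trained jointly with the SE network), the sketch below shows a minimal single-channel joint training step in PyTorch. All class names, layer sizes, the simplified SPP formula, and the equal loss weighting are illustrative assumptions; the thesis's multi-channel GM-SPP computation and actual network architectures are not reproduced here.

```python
import torch
import torch.nn as nn

class SpeechEnhancementNet(nn.Module):
    """Estimates a noise magnitude spectrogram from the noisy input (illustrative)."""
    def __init__(self, n_freq=257, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_freq)

    def forward(self, noisy_mag):             # (batch, frames, n_freq)
        h, _ = self.rnn(noisy_mag)
        return torch.relu(self.out(h))        # estimated noise magnitude

class VADNet(nn.Module):
    """Predicts frame-wise speech activity from concatenated SPP + magnitude features."""
    def __init__(self, n_freq=257, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(2 * n_freq, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, feats):                  # (batch, frames, 2*n_freq)
        h, _ = self.rnn(feats)
        return torch.sigmoid(self.out(h)).squeeze(-1)   # frame-wise speech probability

def gaussian_spp(noisy_mag, noise_mag, eps=1e-8):
    """Crude SPP proxy: sigmoid of the log a-posteriori SNR (assumption, not the GM-SPP of the thesis)."""
    snr = noisy_mag.pow(2) / (noise_mag.pow(2) + eps)
    return torch.sigmoid(torch.log(snr + eps))

se_net, vad_net = SpeechEnhancementNet(), VADNet()
optimizer = torch.optim.Adam(
    list(se_net.parameters()) + list(vad_net.parameters()), lr=1e-4
)
bce = nn.BCELoss()

def joint_training_step(noisy_mag, noise_target, vad_label):
    """One joint step: noise-estimation loss for the SE net plus VAD loss on the speech labels."""
    noise_est = se_net(noisy_mag)
    spp = gaussian_spp(noisy_mag, noise_est)
    feats = torch.cat([noisy_mag, spp], dim=-1)
    vad_prob = vad_net(feats)
    loss = nn.functional.mse_loss(noise_est, noise_target) + bce(vad_prob, vad_label)
    optimizer.zero_grad()
    loss.backward()                            # gradients flow through both networks
    optimizer.step()
    return loss.item()
```

Because the SPP is computed from the SE network's output inside the training graph, gradients from the VAD loss also update the SE network, which is the essence of the joint training the abstract refers to.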
