
A real-time beamforming method based on complex Gaussian mixture model using frequency dependency for robust speech recognition

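  • Publisher: Sogang University Graduate School
  • Advisor: Hyung-Min Park
  • Year of publication: 2020
  • Degree conferred: February 2020
  • Degree: Master's
  • Department and major: Department of Electronic Engineering, Graduate School
  • UCI: I804:11029-000000064882
  • Language of text: English
  • Copyright: Sogang University theses are protected by copyright.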

Abstract

In this paper, we consider a beamforming algorithm for robust speech recognition in noisy environments. A beamformer enhances the target source arriving from the direction indicated by the steering vector, so estimating the steering vector accurately is essential. A conventional algorithm derives the steering vector from a complex Gaussian mixture model (CGMM). It estimates a noise mask by clustering each time-frequency component into one of the complex Gaussian distributions, computes the covariance matrix of the target signal, and derives the steering vector from that matrix. This method is accurate and does not depend on the positions of the target source or the microphones. However, it clusters the entire recording with the EM algorithm, which makes it slow to converge and unable to follow changes in the target source's position. We therefore propose a real-time algorithm based on recursive updates, yielding a steering vector estimator that converges quickly and is robust to changes in source position. A real-time algorithm must estimate the distributions from less data, which can cause permutation problems. To address this, we also propose a CGMM whose complex Gaussian distributions depend on the frequency axis. We compared the speech recognition rates of the beamformer outputs obtained with the estimated steering vectors. This comparison confirms that the proposed method performs similarly to the existing method.
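The covariance-to-steering-vector step described above can be sketched for a single frequency bin as follows. This is a minimal pure-Python sketch and not the thesis's algorithm. It assumes the speech mask for each time-frequency bin is already available, for example as a CGMM posterior. It also assumes an exponential forgetting-factor update as one possible form of recursive estimation. The forgetting factor `alpha` and all function names are hypothetical. The steering vector is taken as the principal eigenvector of the target covariance matrix, computed here by power iteration.

```python
import cmath
import math
from typing import List

Matrix = List[List[complex]]
Vector = List[complex]


def outer_hermitian(y: Vector) -> Matrix:
    """Return the rank-one matrix y y^H for one multichannel STFT observation."""
    return [[yi * yj.conjugate() for yj in y] for yi in y]


def recursive_covariance(prev: Matrix, y: Vector, mask: float, alpha: float) -> Matrix:
    """Hypothetical mask-weighted update: R <- alpha*R + (1 - alpha)*mask*y y^H.

    alpha is an assumed forgetting factor: values near 1 average over more
    frames, smaller values track a moving source faster. A frame with
    mask = 0 only rescales R, which leaves its eigenvectors unchanged.
    """
    yyh = outer_hermitian(y)
    m = len(y)
    return [[alpha * prev[i][j] + (1.0 - alpha) * mask * yyh[i][j] for j in range(m)]
            for i in range(m)]


def principal_eigenvector(r: Matrix, iters: int = 100) -> Vector:
    """Power iteration for the dominant eigenvector of a Hermitian PSD matrix."""
    m = len(r)
    v: Vector = [1.0 + 0j] * m
    for _ in range(iters):
        w = [sum(r[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Remove the arbitrary common phase so the first microphone's gain is real and positive.
    ref = v[0] / abs(v[0]) if abs(v[0]) > 0 else 1.0
    return [x / ref for x in v]


if __name__ == "__main__":
    theta = 0.7  # hypothetical inter-microphone phase difference (radians)
    h = [1.0 + 0j, cmath.exp(-1j * theta)]  # assumed 2-mic steering vector
    r: Matrix = [[0j, 0j], [0j, 0j]]
    for t in range(50):
        s = cmath.exp(1j * 0.3 * t) * (1.0 + 0.1 * t)  # synthetic source STFT value
        y = [s * hi for hi in h]
        r = recursive_covariance(r, y, mask=1.0, alpha=0.95)
    v = principal_eigenvector(r)
    print(v[1] / v[0])  # should be close to exp(-0.7j)
```

In a full system this update would run independently in every frequency bin. The frequency-dependent CGMM that the thesis proposes against the permutation problem is not modeled in this sketch.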

