Multi-channel speech enhancement combining Non-neural and Neural Beamformers
- Issuing institution: Sogang University Graduate School
- Advisor: 박형민
- Year of publication: 2023
- Degree conferred: February 2023
- Degree: Master's
- Department and major: Department of Electronic Engineering, Graduate School
- URI: http://www.dcollection.net/handler/sogang/000000070141
- UCI: I804:11029-000000070141
- Language of text: English
- Copyright: Theses of Sogang University are protected by copyright.
Abstract (Summary)
Multi-channel speech enhancement (MCSE) aims to extract only the target speech from a multi-channel signal recorded in a noisy environment. Speech enhancement seeks to improve speech quality by removing corrupting noise from noisy speech. It can be performed with a single-channel input, but the performance of single-channel speech enhancement (SCSE) methods is limited in noisy and reverberant far-field environments. Because multi-channel signals can exploit spatial information, spatial filtering techniques known as beamforming have been widely used, and various neural beamformers have recently been proposed. There have also been attempts to combine previously developed non-neural beamformers with neural beamformers. Conventional hybrid approaches typically follow two steps: a beamformer is applied first, and an SCSE model is then applied as a post-filter. This thesis proposes a new way to use the non-neural beamformer and the neural beamformer together: the output of the non-neural beamformer is used as one of the input features of the neural beamformer. Experiments were conducted with both early-fusion and late-fusion methods to determine which fusion strategy performs better. In addition, two conformer-based multi-channel speech enhancement models are proposed that maximize performance when using the non-neural beamformer features. By reducing the size of the encoder and decoder, these models achieve performance comparable to the conventional method at lower computational complexity.
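For illustration, the following is a minimal PyTorch sketch (not the thesis implementation) of the early-fusion idea described above: the non-neural beamformer output (e.g., an MVDR estimate) is concatenated with the multi-channel noisy spectra before the neural enhancer. The `EarlyFusionEnhancer` name, the use of magnitude spectrograms, the simple convolutional backbone standing in for the conformer blocks, and the mask-based decoder are all assumptions for illustration only.

```python
# Hypothetical sketch of input-level (early) fusion of a non-neural
# beamformer output with raw multi-channel features. Not the thesis code.
import torch
import torch.nn as nn

class EarlyFusionEnhancer(nn.Module):
    def __init__(self, n_mics: int, hidden: int = 64):
        super().__init__()
        # Input channels: n_mics noisy magnitude spectra + 1 beamformed spectrum.
        self.encoder = nn.Conv2d(n_mics + 1, hidden, kernel_size=3, padding=1)
        self.backbone = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Decoder predicts a single-channel mask applied to the beamformed spectrum.
        self.decoder = nn.Conv2d(hidden, 1, kernel_size=3, padding=1)

    def forward(self, noisy_mag, bf_mag):
        # noisy_mag: (batch, n_mics, freq, time); bf_mag: (batch, freq, time)
        x = torch.cat([noisy_mag, bf_mag.unsqueeze(1)], dim=1)  # early fusion
        mask = torch.sigmoid(self.decoder(self.backbone(self.encoder(x))))
        return mask.squeeze(1) * bf_mag  # enhanced magnitude spectrum

# Example with random tensors: 4 microphones, 257 frequency bins, 100 frames.
model = EarlyFusionEnhancer(n_mics=4)
noisy = torch.rand(2, 4, 257, 100)
bf_out = torch.rand(2, 257, 100)
enhanced = model(noisy, bf_out)  # shape: (2, 257, 100)
```

This sketch covers only input-level fusion; in a late-fusion variant, the beamformer features would instead be merged deeper in the network rather than at the first layer.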

