Sound Event Localization and Detection Using Dual Cross-modal Attention and Parameter Sharing
Korean title (translated): Sound Event Classification and Direction Detection Using Cross-modal Attention and Parameter Sharing
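- Keywords: Sound Event Localization and Detection, Deep Learning, Parameter Sharing, Cross-modal Transformer
- Publisher: Sogang University Graduate School
- Advisor: Hyung-Min Park (박형민)
- Publication year: 2022
- Degree conferral date: August 2022
- Degree: Master's
- Department and major: Graduate School, Department of Electronic Engineering
- Actual URI: http://www.dcollection.net/handler/sogang/000000067083
- UCI: I804:11029-000000067083
- Language of text: English
- Copyright: Sogang University theses are protected by copyright.
Abstract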
Sound event localization and detection (SELD) is a joint task that unifies sound event detection and direction-of-arrival (DOA) estimation. Combining detection and localization is natural: the sound of an event reaches the microphones from its source in a specific direction, so the temporal and spatial locations of the target events can be estimated together. The task has become popular enough that it was introduced as Task 3 of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge in 2019. In this thesis, we propose a method based on dual cross-modal attention (DCMA) and parameter sharing to detect and localize sound events simultaneously. Furthermore, we introduce various data augmentation methods and diverse types of acoustic features. Experimental results show that the proposed system significantly outperforms the baseline method. In addition, our model adopting the track-wise output format achieves a much higher LR_CD (class-dependent localization recall) than the highly ranked systems in the challenge.