DFHDRNet : Multi-image HDR Deghosting based on Deformable Convolutions
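- Publisher: Sogang University Graduate School (서강대학교 일반대학원)
- Advisor: Suk-Ju Kang (강석주)
- Year of publication: 2024
- Degree conferred: February 2024
- Degree: Master's
- Department and major: Department of Electronic Engineering, Graduate School
- URI: http://www.dcollection.net/handler/sogang/000000076700
- UCI: I804:11029-000000076700
- Language of text: English
- Copyright: Sogang University theses are protected by copyright.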
Abstract
Capturing the wide dynamic range of natural scenes directly requires expensive high dynamic range (HDR) cameras. Alternatively, an HDR image can be created with relatively inexpensive low dynamic range (LDR) cameras by merging LDR images taken at different exposure values. Unfortunately, this process produces ghost artifacts when the camera is not fixed or when the photographed subjects move substantially. Numerous methods have therefore been proposed to compensate for these artifacts. Among them, vision transformers (ViTs) have recently brought significant improvements. However, ghost artifacts caused by large motion or occluded regions remain challenging for ViTs, because their computational cost grows quadratically with the size of the input image. Because of this constraint, ViT-based methods not only train on very small patches but also run inference by dividing images into small patches, which forfeits the main advantage of ViTs: global attention. We therefore propose DFHDRNet, which fully leverages the advantages of deformable convolutions (DCNs) for HDR deghosting. Furthermore, we design the Deformable Fusion Block (DFFB), which fuses LDR images with different exposures based on DCNs. In addition, we propose the Depth-Wise Attention Module (DWAM) to effectively restore occluded regions. Extensive experiments on public HDR datasets show that the proposed method outperforms existing transformer-based methods across various metrics.
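The method is built on deformable convolutions. As a minimal, hedged illustration of that building block (not DFHDRNet's actual layer), the pure-Python sketch below computes one output pixel of a 3x3 deformable convolution on a single-channel image, y(p0) = Σ_k w_k · m_k · x(p0 + p_k + Δp_k), where p_k are the regular kernel taps, Δp_k are learned fractional offsets resolved with bilinear interpolation, and m_k are optional modulation scalars as in the modulated (DCNv2-style) variant. The function names `bilinear_sample` and `deform_conv_at` are hypothetical, and the offsets are passed in directly; in a real network a separate convolution would predict them from the input features.

```python
import math


def bilinear_sample(img, y, x):
    """Bilinearly sample a 2D list `img` at real-valued (y, x).

    Neighbors that fall outside the image contribute zero (zero padding).
    """
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    val = 0.0
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                val += wy * wx * img[yy][xx]
    return val


def deform_conv_at(img, weight, p0, offsets, mask=None):
    """One output value of a 3x3 deformable convolution at integer p0 = (y, x).

    weight:  3x3 kernel weights w_k
    offsets: 9 (dy, dx) pairs, one learned offset per kernel tap (Δp_k)
    mask:    optional 9 modulation scalars m_k (modulated variant); None means m_k = 1
    """
    taps = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]  # regular grid p_k
    out = 0.0
    for k, (gy, gx) in enumerate(taps):
        dy, dx = offsets[k]
        m = 1.0 if mask is None else mask[k]
        # Sample at the regular tap position shifted by the learned offset.
        sample = bilinear_sample(img, p0[0] + gy + dy, p0[1] + gx + dx)
        out += weight[gy + 1][gx + 1] * m * sample
    return out


if __name__ == "__main__":
    ramp = [[float(x) for x in range(5)] for _ in range(5)]  # value = column index
    box = [[1 / 9] * 3 for _ in range(3)]                     # 3x3 averaging kernel
    print(deform_conv_at(ramp, box, (2, 2), [(0.0, 0.0)] * 9))  # plain convolution
    print(deform_conv_at(ramp, box, (2, 2), [(0.0, 0.5)] * 9))  # taps shifted right
```

With all offsets set to zero and no mask, the result equals an ordinary 3x3 convolution; non-zero offsets let each tap move toward matching content, which is why DCNs are commonly used to align moving objects across differently exposed frames.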
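Abstract (Korean version, translated)
Capturing a scene with a very wide dynamic range requires an expensive high dynamic range (HDR) camera. However, an HDR image can also be produced with a relatively inexpensive LDR camera by merging low dynamic range (LDR) images captured at different exposure values. In this process, a camera that is not fixed and large motion of the subject cause ghost artifacts, so numerous methods have been proposed to remove them. Recently, vision transformers (ViTs) have been introduced to address this problem and have brought several major improvements. However, their large computational cost, which grows in proportion to the square of the input image size, still limits how well ViTs can solve it. Because of this constraint, ViT-based methods split images into very small patches not only for training but also for inference, which prevents them from exploiting the ViT's advantage of capturing long-range dependencies. We therefore propose DFHDRNet, which fully exploits the advantages of deformable convolution for HDR deghosting. We also design the Deformable Fusion Block (DFFB), which fuses LDR images of different exposures based on deformable convolution, and propose the Depth-wise Attention Module (DWAM) to effectively restore over- and under-exposed regions. Experiments on public HDR datasets show that the proposed method outperforms existing transformer-based methods across various metrics.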
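To make the quadratic cost concrete, the sketch below counts the entries of a single global self-attention map, assuming one token per `token_size` x `token_size` block of pixels, one head, one layer, and float32 storage. These assumptions and the helper names `attention_matrix_entries` and `attention_map_gib` are illustrative, not taken from the thesis.

```python
def attention_matrix_entries(height: int, width: int, token_size: int = 1) -> int:
    """Entries in one N x N global self-attention map.

    The image is assumed to be split into non-overlapping
    token_size x token_size tokens that tile it exactly.
    """
    if height % token_size or width % token_size:
        raise ValueError("token_size must divide height and width")
    n_tokens = (height // token_size) * (width // token_size)
    return n_tokens * n_tokens  # every token attends to every token


def attention_map_gib(height: int, width: int, token_size: int = 1,
                      bytes_per_entry: int = 4) -> float:
    """Memory of that single attention map in GiB (float32 by default)."""
    return attention_matrix_entries(height, width, token_size) * bytes_per_entry / 2**30


if __name__ == "__main__":
    for side in (128, 256, 512):
        print(side, attention_matrix_entries(side, side), attention_map_gib(side, side))
```

Doubling both sides of a 128x128 input quadruples the number of tokens and multiplies the attention map by 16 (from 1 GiB to 16 GiB at one token per pixel), which illustrates why ViT-based deghosting methods fall back to small patches at inference time.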
Table of Contents
I. Introduction 1
II. Related Work 8
2.1 Alignment-based methods 8
2.2 Pixel rejection-based methods 8
2.3 CNN-based methods 9
2.4 Transformer-based methods 10
2.5 Deformable Convolution Networks 10
III. Proposed Method 12
3.1 Network Architecture 12
3.2 Deformable Fusion Block 14
3.2.1 Deformable Convolution Layer 14
3.2.2 Deformable Alignment Module 17
3.2.3 Depth-Wise Attention Module 18
3.3 Loss Function 20
IV. Experimental Results 21
4.1 Dataset and Implementation Details 21
4.1.1 Datasets 21
4.1.2 Evaluation metrics 24
4.1.3 Optimization details 24
4.2 Comparison with Other Methods 25
4.2.1 Datasets with Ground Truth 25
4.2.2 Datasets without Ground Truth 27
4.3 Ablation Study 30
4.4 Comparison on the Computational Cost 34
V. Conclusion 35
Bibliography 36