Graph-based Approaches for 3D Hand Pose Estimation and Hand Gesture Recognition
- Keywords: 3D hand pose estimation, hand gesture recognition, graph convolutional network
- Institution: Sogang University, Graduate School
- Advisor: Suk-Ju Kang
- Publication year: 2023
- Degree conferred: February 2023
- Degree: Ph.D.
- Department and major: Graduate School, Department of Electronic Engineering
- URI: http://www.dcollection.net/handler/sogang/000000070052
- UCI: I804:11029-000000070052
- Language: English
- Copyright: Sogang University theses are protected by copyright.
Abstract
As computers become more commonplace, interest in developing new technologies for human-computer interaction is increasing. The ultimate goal is to make interactions with computers as natural as interactions between humans. To this end, studies using hand pose and gesture are being actively conducted. Typically, the hand pose estimation task estimates articulated hand poses, while the hand gesture recognition task classifies predefined gestures. In this dissertation, we study novel methods to improve the efficiency and performance of these two tasks. First, we propose a novel method with graph-based global and local relation reasoning modules for 3D hand pose estimation from a single depth image. A hand is an articulated object consisting of six parts: the palm and five fingers. Kinematic constraints can be obtained by modeling the dependencies between adjacent joints. We propose a convolutional neural network (CNN)-based approach that incorporates hand joint connections into features through both global relation inference over the entire hand and local relation inference for each finger. Modeling the relations between hand joints alleviates the critical problems of occlusion and self-similarity. We also present a hierarchical structure with six branches that independently estimate the positions of the palm and five fingers, adding the connections of each joint using graph reasoning based on graph convolutional networks (GCNs). Experimental results on public hand pose datasets show that the proposed method achieves the best accuracy among state-of-the-art methods. In addition, the proposed method is suitable for real-time applications, running at 103 fps on a single GPU.
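The graph reasoning described above can be illustrated as GCN propagation over a hand skeleton graph. The sketch below is a minimal, assumed version: the 21-joint layout (wrist plus four joints per finger), the 64-dimensional features, and the single-layer form are illustrative choices, not the dissertation's actual architecture. The wrist-to-finger-base edges stand in for global (palm-level) relations, and the chains along each finger for local (per-finger) relations.

```python
import numpy as np

# Hypothetical 21-joint hand skeleton: joint 0 is the wrist,
# joints 1..20 are four joints per finger (base to tip) for five fingers.
def hand_adjacency(num_joints=21):
    A = np.eye(num_joints)                  # self-loops
    for f in range(5):                      # five fingers
        base = 1 + 4 * f
        A[0, base] = A[base, 0] = 1         # wrist -> finger base (global link)
        for k in range(3):                  # chain along each finger (local links)
            A[base + k, base + k + 1] = 1
            A[base + k + 1, base + k] = 1
    return A

def gcn_layer(H, A, W):
    # Symmetrically normalized propagation: ReLU(D^-1/2 A D^-1/2 H W)
    D = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return np.maximum(0.0, D @ A @ D @ H @ W)

rng = np.random.default_rng(0)
H = rng.standard_normal((21, 64))           # per-joint CNN features (assumed dim 64)
W = rng.standard_normal((64, 64)) * 0.1     # learnable weights (random stand-in)
H_out = gcn_layer(H, hand_adjacency(), W)
print(H_out.shape)                          # (21, 64)
```

After such a layer, each joint's feature mixes information from its kinematic neighbors, which is what lets the network exploit joint dependencies under occlusion.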
Second, we propose a multi-stream improved spatio-temporal graph convolutional network (MS-ISTGCN) for skeleton-based dynamic hand gesture recognition. We adopt an adaptive spatial graph convolution that can learn relationships between distant hand joints, and propose an extended temporal graph convolution with multiple dilation rates that extracts informative temporal features over short to long periods. Furthermore, we add a new attention layer, consisting of effective spatio-temporal attention and channel attention, between the spatial and temporal graph convolution layers to find and focus on key features. Finally, we propose a multi-stream structure that feeds multiple data modalities (i.e., joints, bones, and motions) as inputs and improves performance through an ensemble technique. The three stream networks are trained independently, and their outputs are fused to predict the final hand gesture. The performance of the proposed method is verified through extensive experiments on two widely used public dynamic hand gesture datasets; it achieves the highest recognition accuracy across gesture categories on both datasets compared with state-of-the-art methods. Moreover, we further improve recognition performance by applying joint annotations generated by our 3D hand pose estimation method.
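The three modalities and the score-level fusion can be sketched as follows. This is an assumed illustration only: the 21-joint parent indexing, frame count, and 14-class score vectors are hypothetical stand-ins, and real streams would be full trained networks rather than random scores.

```python
import numpy as np

# seq: (T, J, 3) array of 3D joint coordinates over T frames.

def make_parents():
    # Hypothetical 21-joint skeleton: wrist (0) plus four joints per finger
    parents = [0]
    for f in range(5):
        b = 1 + 4 * f
        parents += [0, b, b + 1, b + 2]
    return np.array(parents)

def bone_stream(seq, parents):
    # Bone vectors: each joint minus its parent joint (zero at the wrist)
    return seq - seq[:, parents, :]

def motion_stream(seq):
    # Per-frame motion: temporal difference, zero-padded at t = 0
    m = np.zeros_like(seq)
    m[1:] = seq[1:] - seq[:-1]
    return m

def fuse_scores(score_list):
    # Ensemble: average the per-stream class scores
    return np.mean(np.stack(score_list), axis=0)

rng = np.random.default_rng(0)
seq = rng.standard_normal((32, 21, 3))       # 32 frames, 21 joints (illustrative)
bones = bone_stream(seq, make_parents())
motion = motion_stream(seq)
# Stand-ins for softmax scores of three independently trained streams
scores = [rng.random(14) for _ in range(3)]  # 14 gesture classes (illustrative)
fused = fuse_scores(scores)
print(fused.shape)                           # (14,)
```

The final gesture would then be `fused.argmax()`; averaging the per-stream scores is one common fusion choice for such ensembles.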

