Optimization of neural network accelerators for edge device computation
- Keywords: neural network, hardware accelerator, systolic array, computing-in-memory, network-on-chip, artificial intelligence
- Publisher: Sogang University Graduate School
- Advisor: 류성주
- Year of publication: 2025
- Degree conferred: February 2025
- Degree: Master's
- Department and major: Graduate School, Department of Electronic Engineering
- URI: http://www.dcollection.net/handler/sogang/000000079755
- UCI: I804:11029-000000079755
- Language of text: English
- Copyright: Sogang University theses are protected by copyright.
Abstract
This thesis introduces a set of optimization strategies for neural network accelerators that improve computation on edge devices. First, we present Teleport, a hardware accelerator designed for ShiftNet, a lightweight neural network. Conventional hardware executes shift convolution with low utilization, whereas Teleport uses an Address Translator to process shift convolution more efficiently. Second, we propose NexusCIM, which performs DNN computations while minimizing communication bottlenecks in a multi-CIM (computing-in-memory) architecture. In traditional multi-CIM architectures, simultaneous data transfers from each CIM unit during DNN computation create communication bottlenecks. To address this problem, NexusCIM replaces the router with a hub core and implements a C-Mesh NoC (network-on-chip).
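As a rough illustration of why address translation can help with shift convolution, the sketch below compares two ways of producing a shifted feature map: copying every channel into a new buffer, and reading the unshifted buffer at a translated address. It is a minimal software sketch under stated assumptions, not the Teleport Address Translator itself, whose design the abstract does not describe. The function names (`shift_by_copy`, `translated_read`), the channel-major flat layout, the zero padding at the borders, and the per-channel `(dy, dx)` offsets are all assumptions made for illustration.

```python
def shift_by_copy(fmap, offsets):
    """Materialize the shifted map: out[c][y][x] = fmap[c][y+dy][x+dx].

    fmap is a nested list indexed [channel][row][col]; offsets holds one
    (dy, dx) pair per channel. Positions that fall outside the map read
    as zero (assumed zero padding).
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0] * W for _ in range(H)] for _ in range(C)]
    for c, (dy, dx) in enumerate(offsets):
        for y in range(H):
            for x in range(W):
                sy, sx = y + dy, x + dx
                if 0 <= sy < H and 0 <= sx < W:
                    out[c][y][x] = fmap[c][sy][sx]
    return out


def translated_read(flat, shape, offsets, c, y, x):
    """Return the shifted value without moving any data.

    flat is the unshifted map stored channel-major in one list
    (hypothetical layout). The shift is folded into the read address.
    """
    C, H, W = shape
    dy, dx = offsets[c]
    sy, sx = y + dy, x + dx
    if not (0 <= sy < H and 0 <= sx < W):
        return 0  # out-of-range reads behave like zero padding
    return flat[(c * H + sy) * W + sx]


# Small example: two 3x3 channels, each with its own shift direction.
fmap = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[10, 11, 12], [13, 14, 15], [16, 17, 18]],
]
offsets = [(0, 1), (-1, 0)]
shape = (2, 3, 3)
flat = [v for channel in fmap for row in channel for v in row]

copied = shift_by_copy(fmap, offsets)
same = all(
    copied[c][y][x] == translated_read(flat, shape, offsets, c, y, x)
    for c in range(2) for y in range(3) for x in range(3)
)
```

Both paths yield the same values (`same` is true for this example), but the second never writes a shifted copy: the shift becomes index arithmetic on the read side, which is the general idea the term "address translation" suggests.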
Table of Contents
I. Research Overview
II. Introduction
III. Hardware Accelerator for ShiftNet (Computation Bottleneck)
3.1 Preliminaries
3.1.1 Shift Convolution
3.1.2 Systolic Array
3.1.3 Previous Work
3.2 Teleport Architecture
3.2.1 Address Translator
3.2.2 Low-Cost Systolic Loader
3.2.3 Top-Level Architecture
3.2.4 Network Mapping and Dataflow
3.3 Results
3.3.1 Experimental Setup
3.3.2 Results
3.4 Summary
IV. Multi-CIM Architecture for DNN (Communication Bottleneck)
4.1 Preliminaries
4.1.1 DNN on Multi-CIM
4.1.2 Challenges of DNN on Multi-CIM
4.1.3 Previous Work
4.2 NexusCIM Architecture
4.2.1 Top-Level Architecture
4.2.2 Nexus Block Dataflow
4.2.3 Hub Core
4.2.4 Reconfigurable CIMU Group Modes
4.2.5 Mapping Strategy
4.3 Results
4.3.1 Experimental Setup
4.3.2 Results
4.4 Summary
V. Conclusion
References