Low-overhead Dataflow Speculation for Accelerating Tiled Sparse Matrix Multiplication
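- Keywords: Sparse matrix multiplication, Tile-level Dataflow Speculation
- Publisher: Sogang University Graduate School
- Advisor: 류성주
- Year of publication: 2025
- Degree conferral date: February 2025
- Degree: Master's
- Department and major: Graduate School, Department of Electronic Engineering
- Actual URI: http://www.dcollection.net/handler/sogang/000000079748
- UCI: I804:11029-000000079748
- Language of text: English
- Copyright: Sogang University theses are protected by copyright.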
Abstract
This dissertation presents SpecBoost, a novel design methodology for efficiently handling sparse matrix-sparse matrix multiplication (SpMSpM) on custom hardware. SpMSpM is a key operation in areas such as scientific computing, sparse linear algebra, and machine learning, particularly when dealing with large, highly sparse workloads. Traditional methods, including the inner product, outer product, and Gustavson approaches, rely on fixed dataflows to accelerate SpMSpM. However, these fixed dataflows struggle to adapt to the diverse sparsity patterns of different matrices, leading to suboptimal performance. To address these challenges, SpecBoost analyzes the sparsity patterns of each matrix tile and speculates the most suitable tile-level dataflow before performing SpMSpM. By adapting the dataflow to the input patterns, SpecBoost ensures optimal performance across a variety of sparse matrix benchmarks. Compared to the previous accelerator ExTensor, SpecBoost achieves a 2.92x reduction in memory accesses and a 2.58x improvement in computational performance, demonstrating its efficiency and adaptability.
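The three fixed dataflows named in the abstract can be illustrated with a minimal software sketch. This is not SpecBoost's hardware design or its speculation logic; the dictionary-based sparse format and every function name below are assumptions chosen purely for illustration. The sketch only shows that the inner product, outer product, and Gustavson dataflows traverse the operands in different orders while producing the same product.

```python
from collections import defaultdict

def to_rows(dense):
    """Row-major sparse form {i: {j: value}}, nonzeros only (illustrative format)."""
    return {i: {j: v for j, v in enumerate(row) if v != 0}
            for i, row in enumerate(dense) if any(row)}

def to_cols(dense):
    """Column-major sparse form {j: {i: value}}, nonzeros only (illustrative format)."""
    cols = defaultdict(dict)
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                cols[j][i] = v
    return dict(cols)

def inner_product(a_rows, b_cols):
    """C[i][j] = dot(row i of A, column j of B), via index intersection."""
    c = defaultdict(dict)
    for i, a_row in a_rows.items():
        for j, b_col in b_cols.items():
            shared = a_row.keys() & b_col.keys()
            if shared:
                c[i][j] = sum(a_row[k] * b_col[k] for k in shared)
    return dict(c)

def outer_product(a_cols, b_rows):
    """C = sum over k of (column k of A) x (row k of B), merging partial outputs."""
    c = defaultdict(lambda: defaultdict(int))
    for k in a_cols.keys() & b_rows.keys():
        for i, a in a_cols[k].items():
            for j, b in b_rows[k].items():
                c[i][j] += a * b
    return {i: dict(row) for i, row in c.items()}

def gustavson(a_rows, b_rows):
    """Row i of C = sum over nonzero A[i][k] of A[i][k] * (row k of B)."""
    c = {}
    for i, a_row in a_rows.items():
        acc = defaultdict(int)
        for k, a in a_row.items():
            for j, b in b_rows.get(k, {}).items():
                acc[j] += a * b
        if acc:
            c[i] = dict(acc)
    return c

# Small example matrices (made up for this sketch).
A = [[1, 0, 2],
     [0, 0, 0],
     [0, 3, 0]]
B = [[0, 4, 0],
     [5, 0, 0],
     [0, 0, 6]]

c_inner = inner_product(to_rows(A), to_cols(B))
c_outer = outer_product(to_cols(A), to_rows(B))
c_gust = gustavson(to_rows(A), to_rows(B))
```

All three calls yield the same sparse result. They differ in which operand is read by rows or by columns and in whether work is spent intersecting index sets (inner product) or merging partial outputs (outer product and Gustavson), which is why the best choice depends on the sparsity pattern of the inputs.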
Table of Contents
I. Introduction
II. Preliminaries: Sparse Matrix Multiplication (SpMSpM)
2.1 Sparse Matrix Format
2.2 Matrix Multiplication Dataflows
2.2.1 Inner product
2.2.2 Outer product
2.2.3 Gustavson (Row-wise Product)
2.2.4 Hybrid Dataflow
III. Hardware Accelerator for SpMSpM
3.1 Previous Work
3.2 SpecBoost
3.2.1 Sparse Matrix Tile Sampling with Threshold
3.2.2 Tile-level Dataflow Speculation
3.2.3 Architecture
IV. Evaluation
4.1 Experiment Setup
4.2 Speculation Accuracy
4.3 SpMSpM Performance
4.4 Hardware Implementation
V. Conclusion
Reference

