An Architectural Framework for High-Performance and Energy-Efficient Large Language Model Acceleration on NAND Flash-Based In-Storage Computing Systems

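Subject (keywords): 3D NAND flash memory, in-storage computing, large language models, DRAM buffering, triple-level cell
Publisher: Sogang University Graduate School
Advisor: 류성주
Year of publication: 2026
Degree conferred: February 2026
Degree: Master's
Department and major: Graduate School, Department of Electronic Engineering
URI: http://www.dcollection.net/handler/sogang/000000082247
UCI: I804:11029-000000082247
Language of text: English
Copyright: This thesis is protected by copyright.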

Abstract

This paper presents an architectural framework for NAND flash-based in-storage computing (ISC) systems that addresses two challenges in accelerating large language models (LLMs): high energy consumption and low performance. For energy efficiency, the first component, E-Flash, introduces a novel data mapping methodology that reduces the power consumed when reading static weights. It uses a state-switching algorithm and cell-first allocation to map frequent data patterns to low-power cell states, achieving an energy reduction of up to 37.73% compared to the baseline. For performance, the second component, NITRO, addresses the latency of handling dynamic activations with a hybrid architecture that buffers these intermediate results in a fast DRAM subsystem. NITRO further maximizes throughput by using a distributed dataflow to exploit the parallelism of the NAND array, reducing inference latency by up to 85%. Together, these strategies enable fast and energy-efficient LLM deployment directly inside storage systems.

Table of Contents

Abstract
I. Introduction
II. Preliminaries
2.1. 3D NAND Flash
2.2. 3D NAND Flash-Based Matrix-Vector Multiplication
III. Bit-Pattern Aware Data Mapping Algorithm
3.1. Motivation
3.2. Proposed Methodology
3.2.1. State-Switching Algorithm
3.2.2. Cell-First Allocation
3.2.3. Top-Level Architecture
3.3. Experiments
3.3.1. Experimental Setup
3.3.2. Results
IV. Heterogeneous In-Storage Computing Architecture
4.1. Motivation
4.2. Proposed Methodology
4.2.1. Activation Buffering
4.2.2. Distributed Dataflow
4.2.3. Top-Level Architecture
4.3. Experiments
4.3.1. Experimental Setup
4.3.2. Results
V. Conclusion
Reference
