dCollection Digital Academic Information Distribution System

Optimization with Access Frequency-Based Remapping for Recommendation System Inference Accelerators


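Subject (keywords): Recommendation system, NAND flash memory, in-storage computing, hardware accelerator, data remapping
Publisher: Graduate School, Sogang University
Advisor: Sungju Ryu (류성주)
Year of publication: 2026
Degree conferred: February 2026
Degree: Master's
Department and major: Department of Electronic Engineering, Graduate School
URI: http://www.dcollection.net/handler/sogang/000000082259
UCI: I804:11029-000000082259
Language of text: English
Copyright: This thesis is protected by copyright.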

Abstract

We propose an optimization strategy that accelerates recommendation system inference on NAND flash-based in-storage computing (ISC). Modern recommendation systems produce personalized outputs from user activity such as clicks and streaming histories. The deep learning models used for these tasks rely on embedding layers, which require large amounts of memory and exhibit irregular access patterns. As data volumes increase, embedding tables often outgrow DRAM capacity, making NAND flash storage necessary. However, because embedding lookups are random, most of the data fetched in each large NAND flash page goes unused. This mismatch between small embedding vectors and large page buffers leaves internal bandwidth underutilized and degrades performance. To address this, we first apply access frequency-based remapping, which groups frequently accessed embedding vectors onto the same page, combined with plane distribution, which spreads these pages across multiple planes to maximize hardware parallelism. Second, we implement a page-wise cache in the SSD controller that keeps frequently accessed pages in SRAM. Experimental results show that the proposed method improves latency by up to 81% over existing ISC architectures.
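The three techniques named in the abstract (frequency-based remapping, plane distribution, and a page-wise SRAM cache) can be illustrated with a small Python model. This is a minimal sketch under stated assumptions, not the thesis's implementation: the page size (16 KiB), vector size (256 B), plane count (4), the round-robin plane assignment, the static "top-k hottest pages" cache policy, the one-read-per-plane-per-round parallelism model, and every function name (`build_remap`, `plane_of`, `read_rounds`, `hot_pages`, `flash_reads`, `utilization`) are hypothetical.

```python
from collections import Counter

# Hypothetical geometry for illustration only (not taken from the thesis).
PAGE_BYTES = 16 * 1024                          # NAND page size
VECTOR_BYTES = 64 * 4                           # 64-dim fp32 embedding vector
VECTORS_PER_PAGE = PAGE_BYTES // VECTOR_BYTES   # 64 vectors fit in one page
NUM_PLANES = 4


def build_remap(trace, num_vectors, vectors_per_page=VECTORS_PER_PAGE):
    """Frequency-based remapping: vector id -> (page, slot), hottest vectors first."""
    freq = Counter(trace)
    # Sort by descending access count; ties broken by id for a deterministic layout.
    order = sorted(range(num_vectors), key=lambda v: (-freq[v], v))
    return {v: divmod(rank, vectors_per_page) for rank, v in enumerate(order)}


def plane_of(page, num_planes=NUM_PLANES):
    """Assumed round-robin plane distribution: consecutive hot pages land on different planes."""
    return page % num_planes


def read_rounds(pages, num_planes=NUM_PLANES):
    """Simplified model: one page read per plane per round, planes operate in parallel."""
    per_plane = Counter(plane_of(p, num_planes) for p in pages)
    return max(per_plane.values(), default=0)


def hot_pages(trace, remap, cache_pages):
    """Assumed static cache policy: pin the cache_pages most-accessed pages in SRAM."""
    page_freq = Counter(remap[v][0] for v in trace)
    return {p for p, _ in page_freq.most_common(cache_pages)}


def flash_reads(batch, remap, cached=frozenset()):
    """Distinct pages that must still be read from NAND for one batch of lookups."""
    pages = {remap[v][0] for v in batch}
    return pages - set(cached)


def utilization(batch, remap):
    """Fraction of fetched page bytes that belong to the requested vectors."""
    pages = {remap[v][0] for v in batch}
    return len(set(batch)) * VECTOR_BYTES / (len(pages) * PAGE_BYTES)
```

As a usage example, take a 1,024-vector table in which 16 hot vectors are scattered one per page. Under the identity layout (`{v: divmod(v, VECTORS_PER_PAGE)}`), a batch of those 16 vectors touches 16 pages at 1/64 byte utilization; after `build_remap` they share a single page (utilization 0.25), and pinning that page with `hot_pages` removes the NAND read entirely. Round-robin placement is used here only because it is the simplest way to keep the hottest pages on distinct planes.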

Table of Contents

I. Introduction
II. Preliminary
2.1. Random Data Access Pattern on Recommendation System
2.2. Inefficient Bandwidth Utilization on NAND Flash Memory
2.3. Data Access Frequency on Recommendation System
III. Hardware Accelerator for Recommendation System
3.1. Motivation
3.2. Embedding Layers in Prior SSD-Based Accelerators
3.3. RecFlash
3.4. RecFlash Hardware Design
IV. Evaluation
4.1. Experimental Setup
4.2. Latency Analysis of Embedding Operation
4.3. Energy Consumption Analysis
4.4. End-to-End Model Latency Analysis
4.5. Cumulative Inference Time Analysis with Online Training
V. Conclusion
