Optimization with Access Frequency-Based Remapping for Recommendation System Inference Accelerators

Abstract

We propose an optimization strategy for recommendation system accelerators that enhances inference on NAND flash-based in-storage computing (ISC). Modern recommendation systems provide personalized outputs from user activities such as clicks and streaming histories. Deep learning models for these tasks employ embedding layers that require large memory and show irregular access patterns. As data volumes increase, embedding tables often grow beyond DRAM capacity, making NAND flash storage necessary. However, because of these irregular accesses, most of the data fetched from large NAND flash pages remains unused. This mismatch between small embedding vectors and large page buffers leads to underutilized internal bandwidth and degraded performance. To solve this, we first use access frequency-based remapping to group frequently accessed embedding vectors onto the same page. This is combined with plane distribution, which spreads these pages across multiple planes to maximize hardware parallelism. Second, we implement a page-wise cache in the SSD controller that stores frequently accessed pages in SRAM. Experimental results show that our proposed method improves latency by up to 81% over existing ISC architectures.
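The remapping and plane distribution described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the `(plane, page_in_plane, slot)` layout, the round-robin plane assignment, and the toy trace are all assumptions made for illustration. The sketch ranks embedding vectors by how often they appear in an access trace, packs the hottest ones into the same logical page, and assigns consecutive pages to different planes.

```python
from collections import Counter


def remap_by_frequency(access_trace, vectors_per_page, num_planes):
    """Hypothetical helper: map each embedding id seen in the trace
    to a (plane, page_in_plane, slot) location."""
    freq = Counter(access_trace)
    # Hottest ids first; ties are broken by id so the layout is deterministic.
    ranked = sorted(freq, key=lambda i: (-freq[i], i))
    mapping = {}
    for rank, emb_id in enumerate(ranked):
        page = rank // vectors_per_page   # hot vectors share the lowest pages
        slot = rank % vectors_per_page    # position of the vector inside its page
        plane = page % num_planes         # round-robin pages over planes (assumed policy)
        page_in_plane = page // num_planes
        mapping[emb_id] = (plane, page_in_plane, slot)
    return mapping


def pages_touched(ids, mapping):
    """Distinct (plane, page_in_plane) pairs needed to serve a lookup batch."""
    return {mapping[i][:2] for i in ids}


# Toy access trace (made-up data): id 7 is hottest, then 3, then 9.
trace = [7, 3, 7, 9, 3, 7, 1, 9, 7, 3, 4, 7]
mapping = remap_by_frequency(trace, vectors_per_page=2, num_planes=2)
print(mapping)
print(pages_touched([7, 3], mapping))
```

In this toy layout the two hottest vectors land on one page, so a lookup of both needs a single page read, while the next page sits on a different plane and could be read in parallel. Ids that never appear in the trace are left unmapped here; a real design would also need a placement rule for them.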

Table of Contents

I. Introduction 1
II. Preliminary 6
2.1. Random Data Access Pattern on Recommendation System 6
2.2. Inefficient Bandwidth Utilization on NAND Flash Memory 7
2.3. Data Access Frequency on Recommendation System 9
III. Hardware Accelerator for Recommendation System 11
3.1. Motivation 11
3.2. Embedding Layers in Prior SSD-Based Accelerators 16
3.3. RecFlash 17
3.4. RecFlash Hardware Design 34
IV. Evaluation 38
4.1. Experimental Setup 38
4.2. Latency Analysis of Embedding Operation 41
4.3. Energy Consumption Analysis 43
4.4. End-to-End Model Latency Analysis 45
4.5. Cumulative Inference Time Analysis with Online Training 48
V. Conclusion 51