
Accelerating Diffusion Transformers by Dynamically Skipping Redundant Operations

Abstract

Diffusion Transformers (DiTs) have demonstrated outstanding performance as generative models, but their computationally expensive iterative sampling process incurs significant latency and energy costs. We propose a software-hardware co-optimized acceleration framework that addresses these challenges by exploiting the inherent temporal redundancy in the DiT inference process. We introduce a redundancy-aware computing mechanism that selectively skips redundant operations and reuses computational results from the previous timestep. To minimize the accuracy degradation caused by cumulative approximation errors, a dynamic threshold scaling (DTS) method adjusts the similarity criteria across timesteps. Furthermore, we design dedicated units capable of efficient low-bit compression and comparison to reduce hardware overhead. We design an accelerator architecture based on this dynamic skipping method, and experiments confirm that it achieves substantial performance and energy gains while maintaining output quality.
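The skip-and-reuse mechanism described above can be sketched in a few lines. The following is a minimal, illustrative sketch, not the thesis's actual implementation: `transformer_block`, `maybe_skip_block`, and `dts_threshold` are hypothetical names, the relative L2 distance is one plausible similarity criterion, and the linear threshold decay is an assumed form of dynamic threshold scaling.

```python
import numpy as np

# Illustrative stand-in for an expensive DiT transformer block
# (a real block would be self-attention + MLP).
def transformer_block(x):
    return 2.0 * x

def maybe_skip_block(x, cache, block_id, threshold):
    """Redundancy-aware execution sketch: if this block's input is
    sufficiently similar to its input at the previous timestep, skip
    the computation and reuse the cached output."""
    entry = cache.get(block_id)
    if entry is not None:
        prev_x, prev_out = entry
        # Relative L2 distance as the similarity criterion (one plausible choice).
        dist = np.linalg.norm(x - prev_x) / (np.linalg.norm(prev_x) + 1e-8)
        if dist < threshold:
            return prev_out, True          # skip: reuse the previous result
    out = transformer_block(x)             # recompute and refresh the cache
    cache[block_id] = (x, out)
    return out, False

def dts_threshold(base, step, num_steps):
    """Hypothetical dynamic threshold scaling: tighten the similarity
    threshold as denoising progresses so that approximation errors
    cannot keep accumulating across timesteps."""
    return base * (1.0 - step / num_steps)
```

In a sampling loop, each block would call `maybe_skip_block` with the threshold returned by `dts_threshold` for the current timestep; the low-bit compression units in the hardware design would make the input comparison itself cheap.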


Table of Contents

Ⅰ. Introduction 1
1.1. Background 1
1.2. Problem Statement 2
Ⅱ. Preliminaries 4
2.1. Diffusion Model 4
2.2. Diffusion Transformer 7
Ⅲ. RADiT: Redundancy-Aware Diffusion Transformer 9
3.1. Observation 9
3.2. Overview 13
3.3. Redundancy-Aware Algorithms 15
3.3.1 Block-Level Redundancy Detection 16
3.3.2 Dynamic Threshold Scaling 19
3.3.3 Efficient Bit Compression 23
3.4. Hardware Architecture 27
3.5. Summary 29
Ⅳ. Experiments and Results 31
4.1. Experimental Setup 31
4.2. Evaluations 33
4.2.1 Accuracy Evaluation 33
4.2.2 Performance Evaluation 36
4.3. Summary 40
Ⅴ. Conclusion 42
References 43
