
Eliminating Data and Metadata I/O Bottlenecks in Parallel Utilities for HPC Environments

A Study on Eliminating Data and Metadata I/O Bottlenecks in Parallel Utilities in HPC Environments

Abstract (Summary)

Modern High-Performance Computing (HPC) environments face mounting challenges due to the shift from large-file to small-file datasets, combined with an increasing number of users and parallelized applications. As HPC systems rely on Parallel File Systems (PFS) such as Lustre for data processing, performance bottlenecks stemming from Object Storage Target (OST) contention have become a significant concern. Existing solutions, such as LADS with its object-level scheduling approach, fall short in large-scale HPC environments because they cannot effectively address metadata I/O bottlenecks or the growing number of I/O processes. This study highlights the pressing need for a comprehensive solution that tackles both OST contention and metadata I/O challenges in diverse HPC workloads. To address these challenges, we propose SwiftLoad, an object-level I/O scheduling framework that leverages a metadata catalog to improve the performance and efficiency of parallel HPC utilities. The metadata catalog mitigates the metadata I/O bottlenecks that commonly occur in HPC utilities, a challenge that is particularly pronounced in object-level I/O scheduling. SwiftLoad addresses OST contention and the uneven distribution of I/O processes across OSTs through mathematical modeling, and it incorporates a Loader Configuration Module to regulate the number of I/O processes. Evaluated on a production supercomputer with two representative utilities, data deduplication profiling and data augmentation, SwiftLoad demonstrated performance improvements of up to 5.63× and 11.0×, respectively.


Table of Contents

1 Introduction
2 Background and Related Work
2.1 Lustre File System
2.2 HPC Utilities
2.3 Related Work
3 Analysis of I/O bottlenecks in HPC utilities
3.1 OST Contention and Imbalance Analysis
3.2 Limitations of Object-Level Scheduling in HPC Utilities
3.2.1 Metadata I/O bottleneck
3.2.2 Inefficiencies of Numerous Processes on OSTs
4 Design of SwiftLoad
4.1 SwiftLoad Overview
4.2 Catalog Construction
4.3 Scheduling with Catalog
4.4 Loader Configuration Module
4.5 Catalog Consistency
4.6 Catalog Size Analysis
5 Evaluation
5.1 Experimental setup
5.2 Simulation Study
5.3 Performance Evaluation
5.3.1 Overall Performance
5.3.2 Metadata I/O Latency
5.3.3 I/O Read Latency
5.3.4 Overhead Analysis on Catalog Construction
6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
References
