Metadata Search and Discovery Services for Scientific Applications on HPC Storage Architectures
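A Study on Metadata Search and Discovery Services for Scientific Applications on HPC Storage Architectures
- Subject (keywords): HPC Storage Systems, Scientific Collaborations, Data Sharing, Scientific Metadata Indexing, Multi-attribute Querying, Scientific Data Management
- Publisher: Graduate School of Computer Science and Engineering, Sogang University
- Advisor: Youngjae Kim
- Year of publication: 2021
- Degree conferred: February 2021
- Degree: Ph.D.
- Department and major: Graduate School, Department of Computer Science and Engineering
- UCI: I804:11029-000000065587
- Language of text: English
- Copyright: Sogang University dissertations are protected by copyright.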
Abstract
High-performance computing (HPC) storage systems are among the critical components of computational, experimental, and observational science today. Scientific applications depend on the ability to selectively access desired information from large volumes of data at very high speed and with minimal overhead. Consequently, several efforts have been made to integrate scientific search and discovery services into HPC storage systems. However, because HPC storage architectures vary widely, spanning federated geo-distributed HPC data centers, distributed and parallel file systems, and high-speed persistent-memory-based storage pools, it is non-trivial to apply a single solution across the multiple storage architectures of the HPC paradigm. The main challenges include minimizing performance degradation, managing metadata effectively, enforcing data-sharing controls and policies, and remaining aware of the underlying storage. Accelerating scientific search and discovery services while addressing these challenges is therefore crucial, especially for the upcoming era of exascale storage architectures.

This dissertation addresses these challenges by building a scientific search and discovery service framework that targets each storage layer to accelerate HPC and scientific computing. In the first part of the dissertation (Chapter 3), we build a scientific-collaboration-friendly storage model for a wide-area storage network, i.e., geo-distributed HPC data centers. With our proposed multi-mode metadata indexing approach, applications and scientists can benefit from discovery services without losing performance. In the second part (Chapter 4), we present a solution that enables search services for scientists and applications running directly on scalable, distributed file systems by tightly integrating data management into the file system. Chapter 5 presents a memory-object-based scientific discovery service that fully utilizes emerging non-volatile memory pools. This dissertation shows that the proposed scientific metadata search and discovery services, built inside the storage layers, strongly complement HPC and scientific computing architectures.