Chronica: A Data-Imbalance-Aware Scheduler for Distributed Deep Learning
- Keywords: Distributed deep learning, Straggler, Scheduler, Data imbalance
- Publisher: Sogang University Graduate School
- Advisor: Sungyong Park (박성용)
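- Year of publication: 2023
- Degree conferred: February 2023
- Degree: Master's
- Department and major: Department of Computer Engineering, Graduate School
- URI: http://www.dcollection.net/handler/sogang/000000070054
- UCI: I804:11029-000000070054
- Language of the text: English
- Copyright: Sogang University theses are protected by copyright.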
Abstract
One of the major challenges in distributed deep learning is mitigating the straggler problem. Stragglers increase synchronization delay and significantly slow the convergence of deep learning models. We have empirically observed that imbalanced data samples worsen the straggler problem and further slow model convergence. However, existing approaches such as BOA and EP4DDL do not address data imbalance when solving the straggler problem. To overcome both the straggler and data imbalance problems, we propose Chronica, a new data-imbalance-aware scheduler. Based on the size of the data and the configuration of each worker, Chronica precisely predicts the training time each worker requires. Chronica then gives every worker an equivalent training time, thereby alleviating straggler problems at both the mini-batch level and the epoch level. Furthermore, to achieve fast convergence, Chronica introduces a new parameter synchronization scheme based on the weighted average of the training load on each worker. Our extensive evaluation using four deep learning models on eight Amazon EC2 GPU instances shows that Chronica achieves up to 2.55× speedup over BOA and EP4DDL.
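The two mechanisms named in the abstract are size-aware time prediction that equalizes per-worker training time and load-weighted parameter synchronization. Both can be illustrated with a small sketch. This is not Chronica's actual algorithm. It assumes a simple linear cost model in which a worker's training time is the total size of its assigned samples divided by a measured throughput. It also swaps in a standard greedy longest-processing-time heuristic for the assignment step. The function names `assign_samples` and `weighted_average`, and the choice of "load" as the total sample size each worker processed, are hypothetical.

```python
from typing import List, Tuple


def assign_samples(
    sample_sizes: List[float], throughputs: List[float]
) -> Tuple[List[List[int]], List[float]]:
    """Greedily assign variable-size samples so predicted finish times stay close.

    Hypothetical cost model (an assumption, not Chronica's predictor):
    time for a sample on worker w = sample size / throughputs[w].
    Samples are placed largest first, each on the worker whose predicted
    finish time after taking it would be smallest (greedy LPT heuristic).
    """
    finish = [0.0] * len(throughputs)           # predicted training time per worker
    assignment: List[List[int]] = [[] for _ in throughputs]
    order = sorted(range(len(sample_sizes)), key=lambda i: sample_sizes[i], reverse=True)
    for idx in order:
        best = min(
            range(len(throughputs)),
            key=lambda w: finish[w] + sample_sizes[idx] / throughputs[w],
        )
        finish[best] += sample_sizes[idx] / throughputs[best]
        assignment[best].append(idx)
    return assignment, finish


def weighted_average(params: List[List[float]], loads: List[float]) -> List[float]:
    """Average per-worker parameter vectors, weighting each by its training load.

    Assumption: 'load' is the total sample size a worker processed in the
    last synchronization round; Chronica's actual weighting may differ.
    """
    if not params or len(params) != len(loads):
        raise ValueError("need one load value per worker")
    total = sum(loads)
    if total <= 0:
        raise ValueError("total load must be positive")
    averaged = [0.0] * len(params[0])
    for vec, load in zip(params, loads):
        weight = load / total                   # this worker's share of the total load
        for i, value in enumerate(vec):
            averaged[i] += weight * value
    return averaged


if __name__ == "__main__":
    # Two workers: worker 0 is twice as fast as worker 1.
    groups, times = assign_samples([4, 3, 3, 2], [2.0, 1.0])
    print(groups, times)  # [[0, 2, 3], [1]] [4.5, 3.0]
    # Worker 1 processed three times the load of worker 0.
    print(weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

The greedy placement is only a heuristic. In the example above, an optimal split would finish in 4.0 time units rather than 4.5. It shows why a size-aware predictor matters: counting samples alone would treat the size-4 and size-2 samples as equal work.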