Korean title (translated): A Regularization Method Using Weight-Based Sampling for the Data Imbalance Problem
Novel Regularization Method with Weighted Negative Sampling for Data Imbalance Problem
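- Subject (keywords): Data Imbalance Problem, Regularization, Weighted Sampling, Sentence Pair Classification
- Publisher: Sogang University Graduate School
- Advisor: 서정연 (Jungyun Seo)
- Year of publication: 2020
- Degree conferral date: February 2020
- Degree: Master's
- Department and major: Graduate School, Department of Computer Engineering
- UCI: I804:11029-000000064912
- Language of text: English
- Copyright: Sogang University theses are protected by copyright.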
Abstract/Summary
Neural network models depend heavily on their training data, so obtaining a high-quality dataset is critical. A common problem is class imbalance: when the training data is not balanced, models tend to be biased toward the majority class. To overcome this problem, we propose a novel regularization method that adds a penalty to the loss function based on two facets of the distribution of the model's output p(y|x): 1) skewed mean and 2) variance divergence between p(y|x=1) and p(y|x=0). In addition, we use weighted negative sampling to reduce the gap between the amount of data in each class. We define several weight functions and investigate sampling strategies to find the most effective one. Experimental results demonstrate that our methods consistently improve performance in sentence pair classification tasks, and that the combination of the regularization methods and weighted negative sampling provides substantial performance gains on all five datasets; notably, state-of-the-art performance is achieved on the WikiQA and SelQA datasets.
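The abstract does not give the weight functions or the exact penalty, so the following is only a rough sketch of the two ideas under stated assumptions, not the thesis's formulation. It assumes that the sampling step draws distinct negatives with probability increasing in a per-example weight (here via Efraimidis-Spirakis keys, a swapped-in standard technique), that the "skewed mean" term measures how far the mean predicted probability drifts from 0.5, and that the variance term compares the predictions on positive and negative examples. The names `weighted_negative_sample` and `distribution_penalty` are hypothetical.

```python
import random
from statistics import fmean, pvariance


def weighted_negative_sample(negatives, weights, k, seed=0):
    """Draw up to k distinct negatives, favoring larger weights.

    Assumption: uses Efraimidis-Spirakis keys (u ** (1 / w)) as a generic
    weighted sampling-without-replacement scheme; the thesis's own weight
    functions and strategy are not given in the abstract.
    """
    if len(negatives) != len(weights):
        raise ValueError("negatives and weights must have the same length")
    rng = random.Random(seed)
    # Items with non-positive weight are never sampled.
    keyed = [
        (rng.random() ** (1.0 / w), item)
        for item, w in zip(negatives, weights)
        if w > 0
    ]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed[:k]]


def distribution_penalty(probs, labels):
    """Illustrative penalty on the distribution of predicted probabilities.

    Assumption: mean term = squared drift of the mean prediction from 0.5;
    variance term = squared gap between the variances of predictions on
    positive (label 1) and negative (label 0) examples.
    """
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    mean_term = (fmean(probs) - 0.5) ** 2
    # The variance gap is only defined when both classes are present.
    var_term = (pvariance(pos) - pvariance(neg)) ** 2 if pos and neg else 0.0
    return mean_term + var_term
```

In a training loop, a penalty of this kind would typically be scaled by a coefficient and added to the classification loss, while the sampler would choose which negatives enter each batch; both the coefficient and the batching scheme are left open here.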

