Regularization Method Using Weight-Based Sampling for the Data Imbalance Problem

Novel Regularization Method with Weighted Negative Sampling for Data Imbalance Problem

Abstract

Neural network models depend heavily on their training data, so obtaining a high-quality dataset is critical. A common problem is class imbalance: when the training data is not balanced, models tend to be biased toward the majority class. To overcome this problem, we propose a novel regularization method that adds a penalty to the loss function based on two facets of the distribution of the model's output p(y|x): 1) skewed mean and 2) variance divergence between p(y|x=1) and p(y|x=0). In addition, we use weighted negative sampling to reduce the gap between the amount of data in each class. We define several weight functions and investigate sampling strategies to find the most effective one. Experimental results demonstrate that our methods consistently improve performance on sentence pair classification tasks, and that the combination of the regularization methods and weighted negative sampling provides substantial performance gains on all five datasets; notably, state-of-the-art performance is achieved on the WikiQA and SelQA datasets.
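The abstract does not give the exact penalty or weight functions, so the following is only a rough sketch of the two ideas under stated assumptions. The names `distribution_penalty` and `sample_negatives`, the coefficients `alpha` and `beta`, the use of 0.5 as the reference mean, and the per-class variance gap are all hypothetical choices made for illustration, not the formulation used in the work itself.

```python
import random
import statistics


def _variance(values):
    # Population variance; treated as 0.0 for an empty group (assumption).
    return statistics.pvariance(values) if values else 0.0


def distribution_penalty(scores, labels, alpha=1.0, beta=1.0):
    """Hypothetical penalty computed from model output scores in [0, 1].

    Mean term: distance of the average score from 0.5, a stand-in for
    a "skewed mean" (outputs drifting toward one class).
    Variance term: absolute gap between the score variance of the
    positive-labelled and negative-labelled examples, a stand-in for
    "variance divergence".
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    mean_skew = abs(statistics.fmean(scores) - 0.5)
    var_gap = abs(_variance(pos) - _variance(neg))
    return alpha * mean_skew + beta * var_gap


def sample_negatives(negatives, weights, k, seed=0):
    """Draw up to k negatives without replacement, with probability
    proportional to the given non-negative weights (assumed scheme)."""
    rng = random.Random(seed)
    pool = list(zip(negatives, weights))
    chosen = []
    for _ in range(min(k, len(pool))):
        total = sum(w for _, w in pool)
        threshold = rng.uniform(0, total)
        acc = 0.0
        pick = len(pool) - 1  # fallback guards against float round-off
        for i, (_, w) in enumerate(pool):
            acc += w
            if threshold <= acc:
                pick = i
                break
        chosen.append(pool.pop(pick)[0])
    return chosen


# Usage: the penalty would be added to the ordinary task loss, e.g.
# total_loss = task_loss + distribution_penalty(scores, labels)
```

In this sketch the weights passed to `sample_negatives` could come from any weight function (for example, a similarity score between the two sentences in a pair); which function works best is what the abstract says was investigated experimentally.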
