
A Regularization Method Using Weight-Based Sampling for the Data Imbalance Problem

Novel Regularization Method with Weighted Negative Sampling for Data Imbalance Problem


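Subject (keywords): Data Imbalance Problem, Regularization, Weighted Sampling, Sentence Pair Classification
Publisher: Sogang University Graduate School
Advisor: 서정연
Publication year: 2020
Degree conferred: February 2020
Degree: Master's
Department and major: Graduate School, Department of Computer Engineering
UCI: I804:11029-000000064912
Language of text: English
Copyright: Sogang University theses are protected by copyright.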

Abstract

Neural network models depend heavily on their training data, so obtaining a high-quality dataset is critical. A common problem is class imbalance: when the training data is not balanced, models tend to be biased toward the majority class. To overcome this problem, we propose a novel regularization method that adds a penalty to the loss function based on two facets of the distribution of the model's output p(y|x): 1) skewed mean and 2) variance divergence between p(y|x=1) and p(y|x=0). In addition, we use weighted negative sampling to reduce the gap between the amount of data in each class. We define several weight functions and investigate sampling strategies to find the most effective one. Experimental results demonstrate that our methods consistently improve performance in sentence pair classification tasks, and that the combination of the regularization methods and weighted negative sampling provides substantial performance gains on all five datasets; notably, state-of-the-art performance is achieved on the WikiQA and SelQA datasets.
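The abstract does not spell out the weight functions or sampling strategies the thesis studies, so the following is only a minimal sketch of the general idea of weighted negative sampling for sentence pair data. It keeps every positive pair and draws a reduced set of negative pairs without replacement, with probability proportional to a weight. The names `weighted_negative_sample` and `overlap_weight`, the `ratio` parameter, and the word-overlap weight are all hypothetical choices made for illustration, not the author's definitions.

```python
import random


def overlap_weight(pair):
    """Hypothetical weight: 1 plus the number of words shared by the pair.

    This is an illustrative assumption, not a weight function from the thesis.
    """
    question, candidate = pair
    shared = set(question.lower().split()) & set(candidate.lower().split())
    return 1.0 + len(shared)


def weighted_negative_sample(positives, negatives, weight_fn, ratio=1.0, seed=0):
    """Keep all positives and sample negatives in proportion to weight_fn.

    At most int(len(positives) * ratio) negatives are drawn, without
    replacement, so the two classes end up closer in size.
    """
    rng = random.Random(seed)
    k = min(len(negatives), int(len(positives) * ratio))
    pool = list(negatives)
    weights = [weight_fn(example) for example in pool]
    chosen = []
    for _ in range(k):
        # Draw one index with probability proportional to its weight,
        # then remove it so the same negative is not picked twice.
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx))
        weights.pop(idx)
    return list(positives) + chosen
```

With `ratio=1.0` the returned training set contains as many negatives as positives. In this sketch, negatives that share more words with the question are more likely to be kept, which is one plausible way to favor harder negatives; other weight functions can be passed in through `weight_fn`.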
