Measuring Gender and Racial Bias in Natural Language Inference
GRiN: Evaluating Gender and Racial Bias in Natural Language Inference
- Subject (Keywords): Natural Language Understanding, Natural Language Inference, Pretrained Language Model, Fairness in NLP, AI Ethics
- Publisher: Sogang University Graduate School
- Advisor: 서정연
- Year of Publication: 2021
- Degree Conferred: August 2021
- Degree: Master's
- Department and Major: Computer Science and Engineering, Graduate School
- UCI: I804:11029-000000066206
- Language of Text: English
- Copyright: Theses of Sogang University are protected by copyright.
Abstract
We introduce a template-based framework for evaluating gender and racial bias in the natural language inference (NLI) task. We name the dataset used in our framework GRiN (Gender and Racial bias in Natural Language Inference). In this work, we define bias as an overgeneralized relation between a target (e.g., gender, race) and an attribute (e.g., occupation). To measure such bias, we design sentence pairs of three types. The first two templates pair the same bias-neutral premise, which illustrates an attribute, with target-specific hypotheses. The third type consists of sentences transformed into NLI format from crowdsourced stereotype benchmarks (CrowS-Pairs, StereoSet) to preserve their natural context. Our bias evaluation metrics are the overall accuracy and the standard deviation of predictions between sentence pairs. We demonstrate that NLI classifiers trained on the SNLI and MNLI corpora exhibit substantial gender and racial biases, regardless of their intrinsic biases.
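The evaluation idea described in the abstract can be sketched as follows: given an NLI model's label probabilities for the two target-specific hypotheses of each bias-neutral premise, compute the overall accuracy against the expected label and the standard deviation of predictions within each pair. This is a minimal illustrative sketch, not the thesis's actual implementation; the `bias_metrics` function, the `neutral` gold label, and all probability values are assumptions for demonstration.

```python
from statistics import pstdev

def bias_metrics(pairs, gold_label="neutral"):
    """pairs: list of (probs_a, probs_b), where each element is a dict
    mapping NLI labels to probabilities for one of the two
    target-specific hypotheses of the same bias-neutral premise."""
    correct = 0
    total = 0
    deviations = []
    for probs_a, probs_b in pairs:
        # Accuracy: an unbiased model should predict the bias-neutral
        # gold label for both target-specific hypotheses.
        for probs in (probs_a, probs_b):
            pred = max(probs, key=probs.get)
            correct += pred == gold_label
            total += 1
        # Spread of the gold-label probability across the pair:
        # an unbiased model should score both targets identically.
        deviations.append(pstdev([probs_a[gold_label], probs_b[gold_label]]))
    accuracy = correct / total
    avg_std = sum(deviations) / len(deviations)
    return accuracy, avg_std

# Illustrative example (made-up numbers): a model that stays neutral
# for one target but predicts entailment for the other.
pairs = [
    ({"entailment": 0.1, "neutral": 0.8, "contradiction": 0.1},
     {"entailment": 0.6, "neutral": 0.3, "contradiction": 0.1}),
]
acc, std = bias_metrics(pairs)  # acc = 0.5, std = 0.25
```

A lower within-pair standard deviation and a higher accuracy on the bias-neutral label would both indicate less target-dependent behavior.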