User Utterance Intent Classification in Unseen Domain Based on Smaller Large-Language Models through Prompt Engineering
- Korean title: 소형 대규모-언어모델의 프롬프트 엔지니어링을 통한 미학습 도메인에서의 사용자 발화 의도분류
- Keywords: Zero-shot Intent Classification, Instruction-tuned Models, Large Language Models; Intent Classification, Instruction-tuned Large Language Models, Retriever-augmented Architecture
- Institution: Graduate School, Sogang University
- Advisor: 서정연
- Year of publication: 2024
- Degree conferred: August 2024
- Degree: Ph.D.
- Department and major: Department of Computer Science and Engineering, Graduate School
- URI: http://www.dcollection.net/handler/sogang/000000078839
- UCI: I804:11029-000000078839
- Language: English
- Copyright: Theses from Sogang University are protected by copyright.
Abstract
The rapid development of large language models, together with successful instruction-tuning that enables them to perform a wide range of tasks following human instructions, has led to the use of instruction-tuned models as specialists for specific tasks. However, research applying instruction-tuned models to intent classification has been limited, and the intent classification accuracy of these state-of-the-art instruction-tuned models still lags behind that of previously studied models. This thesis explores methods for utilizing instruction-tuned large language models for zero-shot intent classification in unseen domains and provides experimental results to support them. Specifically, the thesis proposes diversifying the types and number of instruction templates, converting the task into a multiple-choice intent classification task, providing intent descriptions as input, and augmenting intent classification with a description retriever. Each of these methods was tested to determine the optimal way to use instruction-tuned models for zero-shot intent classification in unseen domains. By adopting these methods, absolute performance improvements of 11.08, 17.32, and 7.12 percentage points were achieved on the CLINC150, HWU64, and BANKING77 datasets, respectively, surpassing the performance of state-of-the-art models.
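As a rough illustration of the multiple-choice prompting with intent descriptions that the abstract describes, the following minimal sketch builds such a prompt for an off-the-shelf instruction-tuned model. The template wording, the toy intent inventory, and the choice of model are illustrative assumptions, not the thesis's actual setup.

```python
# Minimal sketch: zero-shot intent classification phrased as a multiple-choice
# question for an instruction-tuned model. All names below are hypothetical.
from transformers import pipeline

# Hypothetical intent inventory with short natural-language descriptions.
INTENTS = {
    "transfer_money": "the user wants to move money between accounts",
    "check_balance": "the user wants to know their account balance",
    "report_lost_card": "the user reports a lost or stolen card",
}

def build_prompt(utterance: str) -> str:
    # Render each candidate intent as a lettered option, multiple-choice style.
    options = "\n".join(
        f"{chr(ord('A') + i)}. {name}: {desc}"
        for i, (name, desc) in enumerate(INTENTS.items())
    )
    return (
        "Classify the intent of the user utterance.\n"
        f"Utterance: {utterance}\n"
        f"Options:\n{options}\n"
        "Answer with the letter of the best option."
    )

generator = pipeline("text2text-generation", model="google/flan-t5-base")
print(generator(build_prompt("I can't find my debit card anywhere"))[0]["generated_text"])
```

Framing the task as a choice among lettered options constrains the model's output space, which is one plausible reason this setup outperforms free-form intent generation.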
Abstract (Korean, translated)
The rapid development of large language models, together with advances in instruction-tuning techniques that enable them to appropriately perform a variety of tasks following human instructions, has spurred active research that uses instruction-tuned models as specialists in each area of natural language processing. However, research on utilizing instruction-tuned large language models for intent classification remains limited, and the intent classification accuracy of publicly available instruction-tuned models still lags behind that of previously studied fine-tuned models. This thesis explores methods for utilizing instruction-tuned large language models to classify intents accurately in unseen domains and provides experimental results to support them. Specifically, it proposes to 1) diversify the types and number of instruction templates, 2) convert the model's training objective and generation scheme into a multiple-choice intent classification task, 3) provide concrete descriptions of the candidate intents as model input, and 4) classify intents with a retriever-augmented architecture. By adopting these methods, performance improvements of 11.08%, 17.32%, and 7.12% were achieved on the public intent classification datasets CLINC150, HWU64, and BANKING77, respectively, surpassing the performance of state-of-the-art models.
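As a similar illustration of the retriever-augmented architecture (item 4 above), here is a minimal sketch of the candidate-retrieval step: embed the utterance and all intent descriptions, then keep only the most similar intents as options for the prompt. The encoder model, the similarity measure, and the top-k value are assumptions for illustration, not details taken from the thesis.

```python
# Minimal sketch of a description retriever: rank intents by the cosine
# similarity between the utterance embedding and each description embedding.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical encoder choice

def retrieve_candidates(utterance, intent_descriptions, k=3):
    """Return the names of the k intents whose descriptions best match the utterance."""
    names = list(intent_descriptions)
    desc_emb = encoder.encode(
        [intent_descriptions[n] for n in names], convert_to_tensor=True
    )
    query_emb = encoder.encode(utterance, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, desc_emb)[0]  # similarity to each description
    top = scores.topk(min(k, len(names)))
    return [names[i] for i in top.indices.tolist()]
```

The retrieved intent names can then populate the options of a multiple-choice prompt like the one sketched earlier, keeping the prompt short even when the full intent inventory is large.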
Table of Contents
1 Introduction
2 Related Work
2.1 Dialogue Systems
2.2 Intent Classification
2.3 Intent Classification Datasets
2.4 Intent Classification Setups
2.5 Advancements in Neural Network Architectures for Enhanced Intent Classification
2.5.1 The Role of Neural Networks in Intent Classification
2.5.2 Advances in Transformer-Based Architectures
2.5.3 Leveraging Transformer Models for Intent Classification
2.5.4 Recent Topics in Leveraging LLMs for Intent Classification
2.6 Few-shot and Zero-shot Intent Classification
2.6.1 Prototypical Networks
2.6.2 Retriever-Based Zero-shot Intent Classification
2.6.3 Leveraging Large Language Models for Zero-shot Intent Classification
2.7 Instruction-Tuning
2.8 Data Resources for Instruction-Tuning
2.8.1 Instruction-Tuning Datasets: Composition and Examples
2.8.2 Human-Annotated Instruction-Tuning Datasets
2.8.3 Model-Distillation Instruction-Tuning Datasets
2.9 The Rise of Instruction-Tuned Large Language Models
3 Fine-Tuning Instruction-Tuned Large Language Models for the Intent Classification Task
3.1 Learning Objectives for Training Large Language Models for Intent Classification
3.1.1 Encoder-Decoder and Decoder-Only Models
3.1.2 Instruction-Tuning LLMs for Intent Classification
3.1.3 Description Generation
3.1.4 Retriever-Augmented Intent Classification System
4 Experiments
4.1 Datasets
4.2 Training and Testing Setups
4.3 Zero-shot Performance of Open Instruction-Tuned Large Language Models
4.4 Ablation Study
4.5 Effect of the Number of Instruction Templates Used with Instruction-Tuned Models
4.6 Effect of the Number of Options Used in Training for Unseen-Domain Adaptation
4.7 Effect of Template Types for Instruction-Tuned Models
4.8 Effect of the Quality and Quantity of Descriptions
4.9 Description Retriever
4.10 In-Domain Baselines
4.11 Out-of-Domain Baselines
5 Error Analysis of Model Predictions on Intent Classification Datasets: CLINC and BANKING
5.1 Case Study
5.1.1 Case Study of Model Interpretation
5.1.2 Case Study of Data Completeness
5.1.3 Case Study of Conversational History Requirements
5.1.4 Error-Type Statistics for the BANKING Dataset
5.2 Error Mitigation Approaches
5.2.1 Calculation of the Model's Confidence Score
5.2.2 Confidence-Score-Based Error Mitigation Approach
5.2.3 Prompt-Varied Voting Error Mitigation Approach
5.2.4 Model-Varied Voting Error Mitigation Approach
6 Conclusion
Bibliography