
Adaptive Named Entity Recognition Using Distant Supervision for Contemporary Written Texts

Abstract

Named entity recognition (NER), the task of identifying and categorizing named entities in a given text, has long suffered from a lack of labeled corpora. Deep neural networks have been successfully applied to NER tasks, but they require a large amount of annotated data. Regardless of how much data is available, annotation requires significant human effort, which is expensive and time-consuming. Moreover, collecting labeled data that reflect current events requires continual follow-up and incurs correspondingly higher costs. Current NER systems typically rely on supervised learning from hand-annotated data. The best-known dataset for NER, released for the shared task at the 2003 Conference on Computational Natural Language Learning (CoNLL-2003), is widely used for basic training and evaluation. Although the data are of high quality, the dataset has low coverage of recent material. In this paper, we present methods for swiftly labeling up-to-date data via distant supervision. To tackle the difficulty of annotating contemporary written texts, we generate labeled data from articles that reflect the latest issues. We evaluated the proposed methods with a bidirectional long short-term memory conditional random field (BiLSTM-CRF) architecture using static and contextualized embedding methods. Our proposed models outperform state-of-the-art methods, with average F1-scores 3.09% higher with weakly labeled Wikipedia data and 3.47% higher with Cable News Network (CNN) data. When using the NER model with Flair embeddings, our method achieves 1.50% and 3.26% higher F1-scores with weakly labeled Wikipedia and news data, respectively. Qualitatively, the proposed model also performs better when extracting contemporary keywords.
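The abstract does not describe the labeling procedure itself, so the following is only a generic sketch of what distant supervision for NER can look like, not the authors' method. It assumes a small hypothetical gazetteer (in practice such a list might be derived from an external knowledge source) and tags tokens with BIO labels by greedy longest-match lookup. The names `GAZETTEER` and `distant_label` are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer: lowercased token tuple -> entity type.
# These entries are illustrative only and are not taken from the paper.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("cable", "news", "network"): "ORG",
    ("wikipedia",): "ORG",
    ("seoul",): "LOC",
}


def distant_label(
    tokens: List[str],
    gazetteer: Dict[Tuple[str, ...], str] = GAZETTEER,
) -> List[str]:
    """Assign weak BIO tags by greedy longest-match against a gazetteer."""
    max_len = max((len(key) for key in gazetteer), default=0)
    lowered = [tok.lower() for tok in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in gazetteer:
                etype = gazetteer[key]
                tags[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + etype
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


sentence = "Cable News Network opened an office in Seoul".split()
weak_tags = distant_label(sentence)
for token, tag in zip(sentence, weak_tags):
    print(token, tag)
```

Labels produced this way are noisy: names missing from the gazetteer stay tagged `O`, and ambiguous surface forms can be mislabeled, which is why such data are described as weakly labeled.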
