A Proposed Model for Bengali Named Entity Recognition Using Maximum Entropy Markov Model Incorporated with Rich Linguistic Feature Set

Author(s):  
Fahmida Alam ◽  
Md. Asiful Islam
Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 79 ◽  
Author(s):  
Xiaoyu Han ◽  
Yue Zhang ◽  
Wenkai Zhang ◽  
Tinglei Huang

Relation extraction is a vital task in natural language processing. It aims to identify the relationship between two specified entities in a sentence. Besides the information contained in the sentence, additional information about the entities has been shown to be helpful in relation extraction. Additional information such as the entity type obtained by NER (Named Entity Recognition) and the description provided by a knowledge base both have their limitations. Nevertheless, there is another way to provide additional information that can overcome these limitations in Chinese relation extraction. Because Chinese characters usually have explicit meanings and can carry more information than English letters, we suggest that the characters that constitute the entities can provide additional information which is helpful for the relation extraction task, especially in large-scale datasets. This assumption has never been verified before. The main obstacle is the lack of large-scale Chinese relation datasets. In this paper, first, we generate a large-scale Chinese relation extraction dataset based on a Chinese encyclopedia. Second, we propose an attention-based model using the characters that compose the entities. The results on the generated dataset show that these characters can provide useful information for the Chinese relation extraction task. By using this information, the attention mechanism can recognize the crucial part of the sentence that expresses the relation. The proposed model outperforms other baseline models on our Chinese relation extraction dataset.


Symmetry ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 1596
Author(s):  
Xiang Li ◽  
Junan Yang ◽  
Hui Liu ◽  
Pengjiang Hu

Named entity recognition (NER) aims to extract entities from unstructured text, and a nested structure often exists between entities. However, most previous studies paid more attention to flat named entity recognition while ignoring nested entities. The importance of words in the text should vary for different entity categories. In this paper, we propose a head-to-tail linker for nested NER. The proposed model exploits the extracted entity head as conditional information to locate the corresponding entity tails under different entity categories. This strategy takes part of the symmetric boundary information of the entity as a condition and effectively leverages the information from the text to improve entity boundary recognition. The proposed model considers the variability in the semantic correlation between tokens for different entity heads under different entity categories. To verify the effectiveness of the model, numerous experiments were conducted on three datasets: ACE2004, ACE2005, and GENIA, with F1-scores of 80.5%, 79.3%, and 76.4%, respectively. The experimental results show that our model is the most effective of all the methods used for comparison.
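
The head-to-tail idea described in this abstract can be illustrated with a minimal decoding sketch. It is not the authors' implementation: the function name `decode_head_to_tail`, the score tables, the 0.5 threshold, and the example sentence are all hypothetical, and the scores stand in for whatever a trained model would produce. The sketch only shows how conditioning the tail search on a detected head and a category lets nested spans come out naturally.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (head index, tail index inclusive, category)


def decode_head_to_tail(
    head_scores: Dict[str, List[float]],
    tail_scores: Dict[Tuple[str, int], List[float]],
    threshold: float = 0.5,
) -> List[Span]:
    """Link each detected entity head to its tails, category by category.

    head_scores[cat][i]      : assumed score that token i starts an entity of type cat.
    tail_scores[(cat, h)][j] : assumed score that token j ends an entity of type cat
                               whose head is token h (the head acts as the condition).
    """
    spans: List[Span] = []
    for cat, scores in head_scores.items():
        for h, head_score in enumerate(scores):
            if head_score < threshold:
                continue  # token h is not a head for this category
            conditioned = tail_scores.get((cat, h), [])
            for j, tail_score in enumerate(conditioned):
                # A tail can never precede its head.
                if j >= h and tail_score >= threshold:
                    spans.append((h, j, cat))
    return sorted(spans)


# Toy scores, made up for illustration only.
tokens = ["the", "University", "of", "Oxford", "library"]
head_scores = {
    "ORG": [0.1, 0.9, 0.0, 0.2, 0.0],
    "LOC": [0.0, 0.1, 0.0, 0.8, 0.0],
}
tail_scores = {
    ("ORG", 1): [0.0, 0.1, 0.0, 0.9, 0.1],
    ("LOC", 3): [0.0, 0.0, 0.0, 0.95, 0.0],
}

entities = decode_head_to_tail(head_scores, tail_scores)
for head, tail, category in entities:
    print(category, " ".join(tokens[head : tail + 1]))
```

In this toy input the LOC span "Oxford" sits inside the ORG span "University of Oxford", which is the kind of nesting a flat tagger cannot represent with one label per token.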


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Ming Cheng ◽  
Shufeng Xiong ◽  
Fei Li ◽  
Pan Liang ◽  
Jianbo Gao

Abstract. Background: Named entity recognition (NER) on Chinese electronic medical/healthcare records has attracted significant attention, as it can be applied to building applications that understand these records. Most previous methods have been purely data-driven, requiring high-quality and large-scale labeled medical data. However, labeled data is expensive to obtain, and these data-driven methods struggle to handle rare and unseen entities. Methods: To tackle these problems, this study presents a novel multi-task deep neural network model for Chinese NER in the medical domain. We incorporate dictionary features into neural networks, and a general secondary named entity segmentation is used as an auxiliary task to improve the performance of the primary task of named entity recognition. Results: To evaluate the proposed method, we compare it with other currently popular methods on three benchmark datasets. Two of the datasets are publicly available, and the other one was constructed by us. Experimental results show that the proposed model achieves a 91.07% average F-measure on the two public datasets and an 87.05% F-measure on the private dataset. Conclusions: The comparison results of different models demonstrate the effectiveness of our model. The proposed model outperformed traditional statistical models.
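
The abstract does not say how its dictionary features are built, so the following is only one common way to derive them, shown as a sketch. It assumes forward maximum matching against a lexicon and a B/I/E/S/O tag per character; the function name `dictionary_features`, the `max_len` parameter, and the two-entry lexicon are hypothetical.

```python
from typing import List, Set


def dictionary_features(text: str, lexicon: Set[str], max_len: int = 6) -> List[str]:
    """Tag each character by its position inside the longest lexicon match.

    B/I/E mark the begin, inside, and end of a multi-character match,
    S marks a single-character match, and O marks no match.
    """
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        match_len = 0
        # Try the longest candidate first (forward maximum matching).
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i : i + n] in lexicon:
                match_len = n
                break
        if match_len == 0:
            i += 1
            continue
        if match_len == 1:
            tags[i] = "S"
        else:
            tags[i] = "B"
            for k in range(i + 1, i + match_len - 1):
                tags[k] = "I"
            tags[i + match_len - 1] = "E"
        i += match_len
    return tags


# Hypothetical lexicon: "diabetes" and "hypertension".
lexicon = {"糖尿病", "高血压"}
features = dictionary_features("患者有糖尿病和高血压", lexicon)
print(features)
```

Tags like these would typically be embedded and concatenated with the character embeddings before the sequence encoder, which gives the network a signal for rare entities that appear in the lexicon but seldom in the labeled data.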


Author(s):  
Zeqi Tan ◽  
Yongliang Shen ◽  
Shuai Zhang ◽  
Weiming Lu ◽  
Yueting Zhuang

Named entity recognition (NER) is a widely studied task in natural language processing. Recently, a growing number of studies have focused on nested NER. Span-based methods, which treat entity recognition as a span classification task, can deal with nested entities naturally. However, they suffer from a huge search space and a lack of interactions between entities. To address these issues, we propose a novel sequence-to-set neural network for nested NER. Instead of specifying candidate spans in advance, we provide a fixed set of learnable vectors to learn the patterns of the valuable spans. We utilize a non-autoregressive decoder to predict the final set of entities in one pass, in which we are able to capture dependencies between entities. Compared with the sequence-to-sequence method, our model is more suitable for such an unordered recognition task, as it is insensitive to the label order. In addition, we utilize a loss function based on bipartite matching to compute the overall training loss. Experimental results show that our proposed model achieves state-of-the-art results on three nested NER corpora: ACE 2004, ACE 2005 and KBP 2017. The code is available at https://github.com/zqtan1024/sequence-to-set.
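
A loss based on bipartite matching can be sketched in a few lines. This is a simplified illustration rather than the code from the linked repository: it matches on entity labels only (a real model would also score span boundaries), it pads the gold set with a hypothetical "none" label, and it searches all assignments by brute force, which is only feasible for a handful of prediction slots. The names `set_loss`, `preds`, and `gold` are assumptions.

```python
from itertools import permutations
from math import log
from typing import Dict, List, Tuple


def set_loss(preds: List[Dict[str, float]], gold: List[str]) -> Tuple[float, Tuple[int, ...]]:
    """Return the lowest total negative log-likelihood over all one-to-one
    assignments of gold labels to prediction slots, plus that assignment.

    preds[s][label] is the assumed probability that slot s predicts `label`.
    The returned tuple maps position k of the padded gold list to a slot index.
    """
    if len(gold) > len(preds):
        raise ValueError("need at least as many prediction slots as gold entities")
    # Unmatched slots are trained to predict "none".
    padded = gold + ["none"] * (len(preds) - len(gold))
    best_loss, best_perm = float("inf"), ()
    for perm in permutations(range(len(preds))):
        loss = sum(-log(preds[slot][label]) for slot, label in zip(perm, padded))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy probabilities for three prediction slots, made up for illustration.
preds = [
    {"PER": 0.1, "ORG": 0.1, "none": 0.8},
    {"PER": 0.7, "ORG": 0.2, "none": 0.1},
    {"PER": 0.2, "ORG": 0.6, "none": 0.2},
]
loss_a, assignment = set_loss(preds, ["ORG", "PER"])
loss_b, _ = set_loss(preds, ["PER", "ORG"])
print(assignment, round(loss_a, 4), round(loss_b, 4))
```

Both orderings of the gold entities give the same loss, which is the order insensitivity the abstract refers to. For realistic set sizes the assignment is usually found with the Hungarian algorithm instead of enumeration.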

