Building the Knowledge Graph for Zakat (KGZ) in Indonesian Language

2021 ◽  
Vol 16 ◽  
pp. 1-10
Author(s):  
Husni Teja Sukmana ◽  
JM Muslimin ◽  
Asep Fajar Firmansyah ◽  
Lee Kyung Oh

In Indonesia, philanthropy is closely identified with Zakat. Zakat forms a specific domain because it has its own body of knowledge. This research studies a knowledge graph for the Zakat domain in Indonesia, called KGZ. Work in this area is still rare, so KGZ is the first knowledge graph for Zakat in Indonesia. It is designed to provide basic knowledge about Zakat and about managing Zakat in Indonesia. Building KGZ raises two issues. First, existing Indonesian named entity recognition (NER) is unrestricted and general-purpose, trained on data from general sources such as news. Second, there is no NER dataset for the Zakat domain. We define four steps to build KGZ: data acquisition, extraction of entities and their relationships, mapping to an ontology, and deployment of the knowledge graph and visualizations. This research contributes a knowledge graph for Zakat (KGZ) and an NER model for Zakat, called KGZ-NER. We defined 17 new named entity classes related to Zakat, with 272 entities and 169 relationships, and provide labelled datasets for KGZ-NER that are publicly accessible. We applied the Indonesian-Open Domain Information Extractor framework to identify the relationships between entities. We then modelled the information using the Resource Description Framework (RDF) to build the knowledge base for KGZ and stored it in GraphDB, a product from Ontotext. The NER model has a precision of 0.7641, a recall of 0.4544, and an F1-score of 0.5655. KGZ needs more data to cover all knowledge about Zakat and its management in Indonesia. Moreover, sufficient resources are required in future work.
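As a rough illustration of the RDF modelling step, the sketch below serializes a few subject-predicate-object statements as N-Triples lines, a plain-text RDF format that triple stores such as GraphDB can generally import. The namespace `http://example.org/kgz/` and the entity and relation names are hypothetical placeholders, not the classes or relationships defined in the paper.

```python
# Hypothetical namespace; the paper's actual ontology IRIs are not given in the abstract.
BASE = "http://example.org/kgz/"


def iri(local_name: str) -> str:
    """Wrap a local name in the hypothetical namespace as an N-Triples IRI."""
    return f"<{BASE}{local_name}>"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) local names, one N-Triples statement per line."""
    return "\n".join(f"{iri(s)} {iri(p)} {iri(o)} ." for s, p, o in triples)


# Illustrative statements only; names are assumptions, not the KGZ vocabulary.
triples = [
    ("ZakatFitrah", "isTypeOf", "Zakat"),
    ("Muzakki", "pays", "Zakat"),
]

ntriples_text = to_ntriples(triples)
print(ntriples_text)
```

The local names here contain only letters, so they are safe to place inside an IRI; real entity labels with spaces or punctuation would need escaping first.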

2020 ◽  
Author(s):  
Shintaro Tsuji ◽  
Andrew Wen ◽  
Naoki Takahashi ◽  
Hongjian Zhang ◽  
Katsuhiko Ogasawara ◽  
...  

BACKGROUND Named entity recognition (NER) plays an important role in extracting the features of descriptions when mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities they recognize depends on dictionary lookup. Recognizing compound terms is especially complicated because they follow a variety of patterns. OBJECTIVE The objective of the study is to develop and evaluate an NER tool that handles compound terms, using RadLex, for mining free-text radiology reports. METHODS We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general-purpose dictionary, GPD). We manually annotated 400 radiology reports for compound terms (Cts) in noun phrases and used them as the gold standard for the performance evaluation (precision, recall, and F-measure). We also created a compound-term-enhanced dictionary (CtED) by analyzing false negatives (FNs) and false positives (FPs), and applied it to another 100 radiology reports for validation. In addition, we evaluated the stem terms of compound terms by defining two measures: an occurrence ratio (OR) and a matching ratio (MR). RESULTS The F-measure of cTAKES+RadLex+GPD was 32.2% (precision 92.1%, recall 19.6%), and that of the pipeline combined with the CtED was 67.1% (precision 98.1%, recall 51.0%). The OR indicated that the stem terms "effusion", "node", "tube", and "disease" were used frequently, but the pipeline still failed to capture many Cts. The MR showed that 71.9% of stem terms matched those of the ontologies, and that RadLex improved the MR by about 22% over the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms have the potential to help generate synonymous phrases using ontologies. CONCLUSIONS We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that the CtED and stem term analysis have the potential to improve dictionary-based NER performance toward expanding vocabularies.
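The F-measure figures in this abstract follow from the reported precision and recall. The sketch below assumes the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name and the `beta` parameter are illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta**2 * precision + recall
    if denominator == 0:
        return 0.0  # both inputs are zero, so the score is defined as zero
    return (1 + beta**2) * precision * recall / denominator


# Precision and recall as reported in the abstract above.
baseline = f_measure(0.921, 0.196)   # cTAKES+RadLex+GPD
with_cted = f_measure(0.981, 0.510)  # combined with the CtED

print(f"baseline F1 ~ {baseline:.3f}, with CtED F1 ~ {with_cted:.3f}")
```

Both values land within about 0.1 percentage point of the reported 32.2% and 67.1%; the small gap is expected because the published precision and recall are themselves rounded.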


2019 ◽  
Vol 9 (1) ◽  
pp. 15 ◽  
Author(s):  
Runyu Fan ◽  
Lizhe Wang ◽  
Jining Yan ◽  
Weijing Song ◽  
Yingqian Zhu ◽  
...  

Constructing a knowledge graph of geological hazards literature can facilitate the reuse of that literature and provide a reference for geological hazard governance. Named entity recognition (NER), a core technology for constructing a geological hazard knowledge graph, faces the challenge that named entities in geological hazard literature are diverse in form, ambiguous in semantics, and uncertain in context. This makes it difficult to design practical features for NER classification. To address this problem, this paper proposes a deep learning-based NER model, the deep, multi-branch BiGRU-CRF model, which combines a multi-branch bidirectional gated recurrent unit (BiGRU) layer with a conditional random field (CRF) model. In an end-to-end, supervised process, the proposed model automatically learns and transforms features with the multi-branch BiGRU layer and enhances the output with a CRF layer. In addition to the model, we propose a pattern-based corpus construction method to build the corpus the model needs. Experimental results indicate that the proposed deep, multi-branch BiGRU-CRF model outperformed state-of-the-art models. The proposed model was used to construct a large-scale geological hazard literature knowledge graph containing 34,457 entity nodes and 84,561 relations.


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Wangping Xiong ◽  
Jun Cao ◽  
Xian Zhou ◽  
Jianqiang Du ◽  
Bin Nie ◽  
...  

Background. Chinese patent medicines are increasingly used clinically, and a prescription drug monitoring program is an effective tool for promoting drug safety and maintaining health. Methods. We constructed a prescription drug monitoring program for Chinese patent medicines based on knowledge graphs. First, we extracted the key information about Chinese patent medicines, diseases, and symptoms from a domain-specific corpus by information extraction. Second, based on the extracted entities and relationships, a knowledge graph was constructed to form a rule base for monitoring. Then, a named entity recognition model extracted the key information from the electronic medical record to be monitored and matched it against the knowledge graph to monitor the Chinese patent medicines in the prescription. Results. Named entity recognition based on the pretrained model achieved an F1 value of 83.3% on the Chinese patent medicines dataset. On the basis of entity recognition technology and the knowledge graph, we implemented a prescription drug monitoring program for Chinese patent medicines. The program's accuracy in monitoring combined medication with three or more drugs increased from 68% to 86.4%. The accuracy of drug control monitoring increased from 70% to 97%. The response time for conflicting prescriptions with two drugs was shortened from 1.3 s to 0.8 s. The response time for conflicting prescriptions with three or more drugs was shortened from 5.2 s to 1.4 s. Conclusions. The program constructed in this study responds quickly and improves the efficiency of monitoring prescriptions. It is of great significance for ensuring the safety of patients' medication.
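The matching step described in the Methods can be pictured as a lookup of extracted medicine names against a rule base derived from the knowledge graph. The sketch below is a simplified stand-in under stated assumptions: the rule base is a hypothetical set of unordered medicine pairs, the medicine names are placeholders, and the NER step is assumed to have already produced the list of prescribed medicines. It is not the authors' implementation.

```python
from itertools import combinations

# Hypothetical rule base: each entry is an unordered pair of medicines that
# should not be prescribed together. Names are placeholders, not real data.
CONFLICTS = {
    frozenset({"medicine_a", "medicine_b"}),
    frozenset({"medicine_b", "medicine_c"}),
}


def find_conflicts(prescribed, conflicts=CONFLICTS):
    """Return the pairs of prescribed medicines that the rule base marks as conflicting."""
    unique = sorted(set(prescribed))  # drop duplicates and fix a stable order
    return [pair for pair in combinations(unique, 2) if frozenset(pair) in conflicts]


# Medicine names as they might come out of an NER step (assumed input).
flagged = find_conflicts(["medicine_c", "medicine_a", "medicine_b"])
print(flagged)
```

Checking every pair grows quadratically with the number of medicines in a prescription, which is acceptable for the handful of drugs in a typical prescription.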


2020 ◽  
Vol 10 (18) ◽  
pp. 6429
Author(s):  
SungMin Yang ◽  
SoYeop Yoo ◽  
OkRan Jeong

Alongside studies on artificial intelligence technology, research is also being carried out actively in natural language processing, which aims to understand and process human language. For computers to learn on their own, the ability to understand natural language is very important. Natural language processing covers a wide variety of tasks, but we focus on named entity recognition and relation extraction, which are considered the most important for understanding sentences. We propose DeNERT-KG, a model that extracts the subject, the object, and their relationship to grasp the meaning inherent in a sentence. The named entity recognition (NER) model for extracting the subject and object is built on the BERT language model and a Deep Q-Network, and a knowledge graph is applied for relation extraction. With the DeNERT-KG model, it is possible to extract the subject, the type of the subject, the object, the type of the object, and the relationship from a sentence. We verify the model through experiments.


2021 ◽  
Vol 11 (23) ◽  
pp. 11425
Author(s):  
Nikolaos Giarelis ◽  
Nikos Karacapilidis

This paper aims to meaningfully analyse the Horizon 2020 data in the EU's CORDIS repository and, on that basis, to offer evidence and insights that help organizations form consortia to prepare and submit winning research proposals to forthcoming calls. The analysis is performed on aggregated data concerning 32,090 funded projects, the 34,295 organizations that participated in them, and the 87,067 public deliverables they produced. The data is modelled through a knowledge graph-based approach that aims to semantically capture existing relationships and reveal hidden information. The main contribution of this work lies in the proper utilization and orchestration of keyphrase extraction and named entity recognition models, together with meaningful graph analytics on top of an efficient graph database. The proposed approach enables users to ask complex questions about the interconnection of various entities related to previously funded research projects. A set of representative queries demonstrating our data representation and analysis approach is given at the end of the paper.


Author(s):  
Shuang Liu ◽  
Hui Yang ◽  
Jiayi Li ◽  
Simon Kolmanič

With the rapid development of the Internet, the way people obtain information has changed tremendously. In recent years, the knowledge graph has become a popular tool for the public to acquire knowledge. For knowledge graphs of Chinese history and culture, most researchers have adopted traditional named entity recognition methods to extract entity information from unstructured historical text. However, traditional named entity recognition methods have certain defects and easily ignore the associations between entities. To extract entities from a large amount of historical and cultural information more accurately and efficiently, this paper proposes a named entity recognition model combining Bidirectional Encoder Representations from Transformers and Bidirectional Long Short-Term Memory-Conditional Random Field (BERT-BiLSTM-CRF). First, a BERT pre-trained language model encodes each character to obtain a corresponding vector representation. Then a Bidirectional Long Short-Term Memory (BiLSTM) layer semantically encodes the input text. Finally, the Conditional Random Field (CRF) layer outputs the label with the highest probability to obtain each character's category. The model uses the BERT pre-trained language model in place of static word vectors trained in the traditional way. In comparison, the BERT pre-trained language model can dynamically generate semantic vectors according to the context of words, which improves the representation ability of the word vectors. The experimental results show that the proposed model achieves excellent results on named entity recognition in the field of historical culture. Compared with existing named entity recognition methods, the precision, recall, and $F_1$ value are significantly improved.
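The final CRF step, choosing the most probable label sequence, is commonly implemented with Viterbi decoding over per-character scores and label-to-label transition scores. The sketch below shows that decoding step only, in plain Python. The label names, the scores, and the penalty value are made-up assumptions for illustration; in the model described above the per-character scores would come from the BiLSTM layer and the transition scores would be learned.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions: list of dicts (one per character) mapping label -> score.
    transitions: dict mapping (previous_label, current_label) -> score;
                 missing pairs count as 0.0.
    """
    if not emissions:
        return []
    # best[t][label] = best total score of any path that ends in `label` at step t
    best = [{lab: emissions[0][lab] for lab in labels}]
    backpointers = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        backpointers.append(pointers)
    # Trace back from the best final label to recover the whole path.
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and scores, for illustration only.
LABELS = ["O", "B-PER", "I-PER"]
TRANSITIONS = {("O", "I-PER"): -10.0}  # discourage an inside tag right after "O"
EMISSIONS = [
    {"O": 1.0, "B-PER": 0.9, "I-PER": 0.0},
    {"O": 0.0, "B-PER": 0.0, "I-PER": 1.0},
]

decoded = viterbi_decode(EMISSIONS, TRANSITIONS, LABELS)
print(decoded)
```

In this toy input, picking the best label for each character independently would give "O" followed by "I-PER"; the transition penalty makes the decoder prefer a path that begins with "B-PER" instead.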


Author(s):  
Nona Naderi ◽  
Julien Knafou ◽  
Jenny Copara ◽  
Patrick Ruch ◽  
Douglas Teodoro

The health and life science domains are well known for the wealth of named entities found in their large free-text corpora, such as scientific literature and electronic health records. To unlock the value of such corpora, named entity recognition (NER) methods are proposed. Inspired by the success of transformer-based pretrained models for NER, we assess how individual deep masked language models and ensembles of them perform across corpora from different health and life science domains (biology, chemistry, and medicine) available in different languages (English and French). Individual deep masked language models, pretrained on external corpora, are fine-tuned on task-specific domain and language corpora and ensembled using classical majority voting strategies. Experiments show a statistically significant improvement of the ensemble models over an individual BERT-based baseline model, with an overall best performance of 77% macro F1-score. We further perform a detailed analysis of the ensemble results and show how their effectiveness changes according to entity properties, such as length, corpus frequency, and annotation consistency. The results suggest that ensembles of deep masked language models are an effective strategy for tackling NER across corpora from the health and life science domains.
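A minimal sketch of majority voting over NER outputs follows, assuming each model emits one label per token for the same tokenization. The abstract does not specify the voting unit or the tie-breaking rule, so token-level voting and the rule used here (ties go to the earliest model in the list) are assumptions, as are the label names.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token label sequences from several models by majority vote.

    predictions: list of label sequences, one per model, all of equal length.
    Ties are broken in favour of the earliest model in the list (an assumption).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all models must label the same tokens")
    voted = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        # Take the first label, in model order, that reaches the top count.
        voted.append(next(lab for lab in labels if counts[lab] == top))
    return voted


# Hypothetical outputs from three fine-tuned models on the same three tokens.
model_a = ["B-CHEM", "I-CHEM", "O"]
model_b = ["B-CHEM", "O", "O"]
model_c = ["O", "I-CHEM", "O"]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

Voting token by token can produce label sequences that are not valid under a BIO scheme (for example an inside tag without a preceding begin tag), so a practical pipeline would likely vote on whole entity spans or repair the sequence afterwards.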


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Guozhen Zhang ◽  
Xiangang Cao ◽  
Mengyuan Zhang

With the rapid development of intelligent coal mine technology, coal mine equipment has become increasingly complex and equipment maintenance resources have grown steadily richer. Traditional knowledge management technology for coal mine equipment maintenance can no longer meet current needs, and problems of low utilization, poor interoperability, and serious loss of knowledge have gradually emerged. New technology for knowledge system construction and knowledge management is urgently needed for large-scale coal mine equipment maintenance resources. A knowledge graph is a technical method that uses a graph model to describe the relationships between things in the objective world. It can effectively address dynamic knowledge mining and management over large-scale data. This paper therefore focuses on establishing a coal mine equipment maintenance knowledge graph system using knowledge graph technology. The main research contents are as follows. First, because the field of coal mine equipment maintenance has no unified basic knowledge system, this paper establishes the coal mine equipment maintenance ontology (CMEMO) to address the lack of unified representation, integration, and sharing of coal mine equipment maintenance knowledge and to support the construction of the knowledge graph. Then, because traditional named-entity recognition methods recognize entities poorly and rely too heavily on manual feature design, this paper proposes a neural network-based named-entity recognition model for coal mine equipment maintenance (BERT-BiLSTM-CRF) and applies it to a coal mine equipment maintenance data set for verification. The experimental results show that, on the same data set, this model recognizes entities better than the other models. Finally, through demand analysis and architecture design, and in combination with the constructed ontology model of the coal mine equipment maintenance field, entity recognition for coal mine equipment maintenance is completed with the BERT-BiLSTM-CRF model, and the Django application framework is used to build the coal mine equipment maintenance knowledge graph system and implement the functions of each of its modules.

