name disambiguation
Recently Published Documents


TOTAL DOCUMENTS: 280 (FIVE YEARS 75)

H-INDEX: 29 (FIVE YEARS 3)

2021 ◽  
pp. 1-30
Author(s):  
Michael Färber ◽  
David Lamprecht

Several scholarly knowledge graphs have been proposed to model and analyze the academic landscape. However, although the number of data sets has increased remarkably in recent years, these knowledge graphs do not primarily focus on data sets but rather on associated entities such as publications. Moreover, publicly available data set knowledge graphs do not systematically contain links to the publications in which the data sets are mentioned. In this paper, we present an approach for constructing an RDF knowledge graph that fulfills these criteria. Our data set knowledge graph, DSKG, is publicly available at http://dskg.org and contains metadata of data sets for all scientific disciplines. To ensure high data quality of the DSKG, we first identify suitable raw data set collections for creating the DSKG. We then establish links between the data sets and publications modeled in the Microsoft Academic Knowledge Graph that mention these data sets. As the author names of data sets can be ambiguous, we develop and evaluate a method for author name disambiguation and enrich the knowledge graph with links to ORCID. Overall, our knowledge graph contains more than 2,000 data sets with associated properties, as well as 814,000 links to 635,000 scientific publications. It can be used for a variety of scenarios, facilitating advanced data set search systems and new ways of measuring and awarding the provisioning of data sets.


2021 ◽  
Author(s):  
Luciano V. B. Espiridião ◽  
Laura L. Dias ◽  
Anderson A. Ferreira

Author name ambiguity is one of the most challenging issues that can compromise information quality in a scholarly digital library. For years, researchers have searched for solutions to this problem. Despite the many methods already proposed, the question remains open. In this study, we address the issue of producing a more accurate disambiguation function by applying data augmentation to the training data set. We also propose a SyGAR-based data augmentation approach and evaluate our proposal on three collections commonly used in work on the author name disambiguation task. The experimental results show scenarios where improvements are possible in the author name disambiguation task. The proposed data augmentation outperforms another data augmentation approach and also improves some machine learning techniques that were not specifically designed for the author name disambiguation task.


Information ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 383
Author(s):  
Xin Zheng ◽  
Pengyu Zhang ◽  
Yanjie Cui ◽  
Rong Du ◽  
Yong Zhang

Name disambiguation has long been a significant issue in many fields, such as literature management and social analysis. In recent years, methods based on graph networks have performed well in name disambiguation, but these works have rarely used heterogeneous graphs to capture relationships between nodes. Heterogeneous graphs can extract more comprehensive relationship information so that more accurate node embeddings can be learned. Therefore, a Dual-Channel Heterogeneous Graph Network is proposed to solve the name disambiguation problem. We use the heterogeneous graph network to capture various node information to ensure that our method can learn more accurate data structure information. In addition, we use fastText to extract the semantic information of the data. Then, a clustering method based on DBSCAN is used to group academic papers by different authors into different clusters. In many experiments based on real datasets, our method achieved high accuracy, which demonstrates its effectiveness.
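
The final clustering step described in this abstract can be illustrated with a small, self-contained sketch. It is not the authors' Dual-Channel Heterogeneous Graph Network: the 2-D vectors below are made-up stand-ins for learned paper embeddings, and the `eps` and `min_pts` values are illustrative assumptions. The sketch only shows how a DBSCAN-style procedure might split the papers published under one ambiguous name into per-author clusters.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Every point within eps of point i (point i itself included).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point joins the cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
        cluster += 1
    return labels

# Hypothetical embeddings of seven papers that share one ambiguous author name.
paper_vectors = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # papers that sit close together
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),   # a second tight group
    (10.0, 0.0),                          # an isolated paper
]

labels = dbscan(paper_vectors, eps=0.5, min_pts=2)
print(labels)
```

Each non-negative label would correspond to one inferred author; papers labelled -1 are left unassigned. One reason DBSCAN suits this task is that the number of distinct authors behind a name does not have to be known in advance.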


2021 ◽  
Author(s):  
Qian Zhou ◽  
Wei Chen ◽  
Weiqing Wang ◽  
Jiajie Xu ◽  
Lei Zhao

2021 ◽  
Author(s):  
Shivashankar Subramanian ◽  
Daniel King ◽  
Doug Downey ◽  
Sergey Feldman

2021 ◽  
Author(s):  
Zhiqiang Zhang ◽  
Chunqi Wu ◽  
Zhao Li ◽  
Juanjuan Peng ◽  
Haiyan Wu ◽  
...  

2021 ◽  
pp. 016555152110181
Author(s):  
Jinseok Kim ◽  
Jenna Kim ◽  
Jinmo Kim

Chinese author names are known to be more difficult to disambiguate than other ethnic names because they tend to share surnames and forenames, thus creating many homonyms. In this study, we demonstrate how using Chinese characters can affect machine learning for author name disambiguation. For analysis, 15K author names recorded in Chinese are transliterated into English and simplified by initialising their forenames to create counterfactual scenarios, reflecting real-world indexing practices in which Chinese characters are usually unavailable. The results show that Chinese author names that are highly ambiguous in English or with initialised forenames tend to become less confusing if their Chinese characters are included in the processing. Our findings indicate that recording Chinese author names in native script can help researchers and digital libraries enhance authority control of Chinese author names that continue to increase in size in bibliographic data.
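
The counterfactual setup described above can be sketched in a few lines. The records below are invented for illustration and are not taken from the study's 15K names; the field names and the `largest_block` helper are likewise hypothetical. The sketch groups the same authors under three name keys (native script, full romanised name, initialised forename) and reports how many distinct authors end up sharing a single key.

```python
from collections import defaultdict

def initialise(romanised):
    """'Wei Wang' -> 'W Wang': keep only the first letter of the forename."""
    forename, surname = romanised.split(" ", 1)
    return f"{forename[0]} {surname}"

def largest_block(records, key):
    """Size of the largest group of distinct authors that share one name key."""
    groups = defaultdict(set)
    for rec in records:
        groups[key(rec)].add(rec["id"])
    return max(len(ids) for ids in groups.values())

# Invented example: four different authors with similar names.
records = [
    {"id": "a1", "native": "王伟", "romanised": "Wei Wang"},
    {"id": "a2", "native": "王薇", "romanised": "Wei Wang"},
    {"id": "a3", "native": "王文", "romanised": "Wen Wang"},
    {"id": "a4", "native": "汪伟", "romanised": "Wei Wang"},
]

by_native = largest_block(records, key=lambda r: r["native"])
by_full = largest_block(records, key=lambda r: r["romanised"])
by_initial = largest_block(records, key=lambda r: initialise(r["romanised"]))

print(by_native, by_full, by_initial)
```

In this toy data the native-script key keeps all four authors apart, the romanised key merges three of them, and the initialised key merges all four, which mirrors the direction of the effect the abstract reports.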

