name matching
Recently Published Documents



Author(s):  
Dmitry Mozzherin

Biodiversity taxonomy provides a means to organize information about living organisms into maintainable tree- or graph-like structures (taxonomic backbones). Taxonomy is tightly bound to biodiversity nomenclature, a collection of recommendations, rules and conventions for naming living organisms. Species are often considered to be the most important unit of taxonomic structures. Keeping scientific names of species and other taxa accurate and up to date is a major challenge in creating and maintaining large taxonomic backbones. Global Names Architecture (Global Names) is an initiative that has developed tools and databases for detecting, parsing, and verifying scientific names. Verification tools also report which taxonomic and nomenclatural resources contain information for a given scientific name. Taxonomic intelligence provided by the resources aggregated by Global Names makes it possible to resolve taxon names from different backbones, even if their "current" scientific names vary. Parsing scientific names with GNparser normalizes them, making them comparable. Fast name matching (reconciliation) and discovery of taxonomic meaning (resolution) by GNverifier connect information from various resources. The most recently developed Global Names tools provide name verification and taxon matching on an unprecedented scale. In this presentation, we describe the Global Names tools and show how they can be used to reconcile lexical variants of scientific names, extract authorship metadata, verify and resolve names, and connect data to a variety of biodiversity resources.
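The idea that parsing makes name strings comparable can be illustrated with a deliberately simplified sketch. It is not GNparser and does not reproduce its rules or output; the function names and the regular expression are hypothetical, and the pattern only handles plain binomials (a capitalized genus, a lowercase epithet, and an optional trailing authorship string).

```python
import re

# Hypothetical, simplified pattern (an assumption, not GNparser's grammar):
# capitalized genus, lowercase specific epithet, and everything after that
# is treated as the authorship string.
_NAME = re.compile(r"^\s*([A-Z][a-z]+)\s+([a-z-]+)\s*(.*?)\s*$")


def split_name(name_string):
    """Return (canonical_form, authorship) for a simple binomial, else None."""
    m = _NAME.match(name_string)
    if m is None:
        return None
    genus, epithet, authorship = m.groups()
    return f"{genus} {epithet}", authorship


def same_canonical(a, b):
    """True when both strings parse and share the same canonical form."""
    pa, pb = split_name(a), split_name(b)
    return pa is not None and pb is not None and pa[0] == pb[0]
```

Under these assumptions, two lexical variants such as "Homo sapiens L." and "Homo sapiens Linnaeus, 1758" reduce to the same canonical form while their authorship strings stay available separately. Infraspecific names, hybrids, and uninomials are outside what this sketch handles.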


Author(s):  
William Ulate ◽  
Sunitha Katabathuni ◽  
Alan Elliott

The World Flora Online (WFO) is the collaborative, international initiative to achieve Target 1 of the Global Strategy for Plant Conservation (GSPC): "An online flora of all known plants." WFO provides an open-access, web-based compendium of the world’s plant species, which builds upon existing knowledge and published floras, checklists and revisions but will also require the collection and generation of new information on poorly known groups and unexplored regions (Borsch et al. 2020). The construction of the WFO Taxonomic Backbone is central to the entire WFO as it determines the accessibility of additional content data and at the same time, represents a taxonomic opinion on the circumscription of those taxa. The Plant List v.1.1 (TPL 2013) was the starting point for the backbone, as this was the most comprehensive resource covering all plants available. We have since curated the higher taxonomy of the backbone, based on the following published community-derived classifications: the Angiosperm Phylogeny Group (APG IV 2016), the Pteridophyte Phylogeny Group (PPG I 2016), Bryophytes (Buck et al. 2008), and Hornworts & Liverworts (Söderström et al. 2016). The WFO presents a community-supported consensus classification with the aim of being the authoritative global source of information on the world's plant diversity. The backbone is actively curated by our Taxonomic Expert Networks (TEN), consisting of specialists of taxonomic groups, ideally at the Family or Order level. There are currently 37 approved TENs, involving more than 280 specialists, working with the WFO. There are small TENs like the Begonia Resource Center and the Meconopsis Group (with five specialists), medium TENs like Ericaceae and Zingiberaceae Resource Centers or SolanaceaSource.org (around 15 experts), and larger TENs like Caryophyllales.org and the Legume Phylogeny Working Group, with more than 80 specialists involved. 
When we do not have taxonomic oversight, the World Checklist of Vascular Plants (WCVP 2019) has been used to update those families from the TPL 2013 classification. Full credit and acknowledgement given to the original sources is a key requirement of this collaborative project, allowing users to refer to the primary data. For example, an association with the original content is kept through the local identifiers used by the taxonomic content providers as a link to their own resources. A key requirement for the WFO Taxonomic Backbone is that every name should have a globally unique identifier that is maintained, ideally forever. After considering several options, the WFO Technology Working Group recommended that the WFO Council establish a WFO Identifier (WFO-ID), a 10-digit number with a “wfo-” prefix, aimed at establishing a resolvable identifier for all existing plant names, which will not only be used in the context of WFO but can be universally used to reference plant names. Management of the WFO Taxonomic Backbone has been a challenge as TPL v1.1 was derived from multiple taxonomic datasets, which led to duplication of records. For that reason, names can be excluded from the public portal by the WFO Taxonomic Working Group or the TENs, but not deleted. A WFO-ID is neither deleted nor reused after it has been excluded from the WFO Taxonomic Backbone. Keeping these identifiers allows for better matching when assigning WFO-IDs to data derived from content providers. Nevertheless, this implies certain considerations for new names and duplications. New names are added to the WFO Taxonomic Backbone via nomenclators like the International Plant Names Index (IPNI, The Royal Botanic Gardens, Kew et al. 2021) for Angiosperms, and Tropicos (Missouri Botanical Garden 2021) for Bryophytes, as well as by harvesting endemic and infraspecific names from Flora providers when they provide descriptive content. New names are passed to the TEN to make a judgement on their taxonomic status. 
When TENs provide a new authoritative taxonomic list for their group, we first produce a Name Matching report to ensure no names are missed. Several issues come from managing and maintaining taxonomic lists, but the process of curating an ever-growing integrated resource leads to an increase in the challenges we face with homonyms, non-standard author abbreviations, orthographic variants and duplicate names when Name Matching. The eMonocot database application, provided by Royal Botanic Gardens, Kew, (Santarsiero et al. 2013) and subsequently adapted by the Missouri Botanical Garden to provide the underlying functionality for WFO's current toolset, has also proven itself to be a challenging component to update. In this presentation, we will share our hands-on experience, technical solutions and workflows creating and maintaining the WFO Taxonomic Backbone.
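The identifier rules described in this abstract (a "wfo-" prefix followed by a 10-digit number, with excluded identifiers never deleted or reused) can be sketched as follows. This is only an illustration under stated assumptions: the class and function names are hypothetical, the lowercase-only check is an assumption, and nothing here reflects the actual WFO toolset.

```python
import re

# Format taken from the abstract: "wfo-" prefix plus a 10-digit number.
# Treating the prefix as case-sensitive is an assumption of this sketch.
_WFO_ID = re.compile(r"wfo-[0-9]{10}")


def is_wfo_id(value):
    """Check only the shape of the identifier, not whether it was issued."""
    return _WFO_ID.fullmatch(value) is not None


class NameRegistry:
    """Hypothetical registry: names can be excluded but never deleted."""

    def __init__(self):
        self._names = {}        # wfo_id -> name string, kept forever
        self._excluded = set()  # ids hidden from the public view

    def add(self, wfo_id, name):
        if not is_wfo_id(wfo_id):
            raise ValueError(f"malformed identifier: {wfo_id!r}")
        if wfo_id in self._names:
            # Covers excluded ids too, so an identifier is never reused.
            raise ValueError(f"identifier already assigned: {wfo_id}")
        self._names[wfo_id] = name

    def exclude(self, wfo_id):
        if wfo_id not in self._names:
            raise KeyError(wfo_id)
        self._excluded.add(wfo_id)

    def public_names(self):
        return {i: n for i, n in self._names.items() if i not in self._excluded}

    def lookup(self, wfo_id):
        # Excluded records stay resolvable, which helps later matching.
        return self._names.get(wfo_id)
```

Keeping excluded records resolvable mirrors the point made above: data from content providers may still carry an older identifier, and a match against it is more useful than a missing record.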


Author(s):  
Teresa Mayfield-Meyer ◽  
Phyllis Sharp ◽  
Dusty McDonald

In reality, there is no single “taxonomic backbone” but many: the Global Biodiversity Information Facility (GBIF) Backbone Taxonomy, the World Register of Marine Species (WoRMS) and MolluscaBase, to name a few. We could view each one of these as a vertebra on the taxonomic backbone, but even that isn’t quite correct, as some of these are nested within others (MolluscaBase contributes to WoRMS, which contributes to Catalogue of Life, which contributes to the GBIF Backbone Taxonomy). How is a collection manager without expertise in a given set of taxa, and with limited time to devote to finding the “most current” taxonomy, supposed to maintain research-grade identifications when there are so many seemingly authoritative taxonomic resources? And once a resource is chosen, how can they seamlessly use the information in that resource? This presentation will document how the Arctos community’s use of the taxon name matching service Global Names Architecture (GNA) led one volunteer team leader in a marine invertebrate collection to attempt to make use of WoRMS taxonomy, and how her persistence brought better identifications and classifications to a community of collections. It will also provide insight into some of the technical and curatorial challenges involved in using an outside resource, as well as the ongoing struggle to keep up with changes as they occur in the curated resource.


2021 ◽  
Vol 17 (9) ◽  
pp. 776-788
Author(s):  
Attia Nehar ◽  
Slimane Bellaouar ◽  
Djelloul Ziadi ◽  
Khaled Moulay Omar

Author(s):  
Vijay Barve

Research projects in ecology or biodiversity start with either an area of study or a target species list. Working with these species lists or taxonomic lists is not as straightforward as it seems. Taxonomic names that are considered “standard” are surprisingly dynamic. Over time, the names keep changing with ongoing research and advancements in taxonomy. Additionally, they undergo all sorts of reorganization, such as one species being split into multiple species and/or subspecies, the grouping of multiple species into a single species, and the reclassification of species from one genus to another. Compiling a consistent target species list can be very time consuming and tricky. However, it is the initial step in most research projects and needs to be completed in order to continue the research. Advancements in biodiversity informatics are helping simplify and automate some of these tasks. There are several web services that provide taxonomic data with either a taxonomic or a geographic focus. An increasing number of experts are opening access to their carefully curated taxonomic lists. Even with the help of these services, a lot of time needs to be spent to create a working list of names that can be linked to data such as Global Biodiversity Information Facility (GBIF) mediated occurrence data. The package “taxotools” (Barve 2021) provides basic taxonomic list processing functions within the R programming environment (R Core Team 2021). Even though it is a work in progress, the functions available so far are applicable to diverse projects. The tools available can be categorized into the following broad areas:
Name manipulation: A set of helper functions to check scientific names with global name resolution services like Global Names Architecture (GNA) & GBIF Name Parser, and to construct and deconstruct scientific names to and from components like genus, species and subspecific units.
Name matching: Matches names either with global name services or with user-created master taxonomy lists using fuzzy matching, testing combinations of genus level synonyms, subspecies elevation to species, trying to match with higher level taxonomic entities like genus and family, and employing a user-defined lookup table to manually resolve names.
List processing: Updates list fields such as unique identifiers (id), higher taxonomy and taxonomic ranks.
List matching: Compares user generated lists with each other and finds differences in the two lists, then prepares the lists for merging together to form a masterlist.
Format conversion: Converts taxolist to and from formats like HTML and Darwin Core (Wieczorek et al. 2021), which is useful in data exchange or checking the lists manually.
Name harvesting functions: Acquires additional names from Integrated Taxonomic Information System (ITIS) and Wikipedia (taxonomy infobox).
Detailed function listings for each category are given in Table 1. This package has been effectively used in several biodiversity studies and projects such as Map of Life, ButterflyNet, and Terrestrial Parasite Tracker. It has been successfully tested on a masterlist constructed with ~1M names from World Flora Online and performs well. The package is available on The Comprehensive R Archive Network (CRAN) [https://CRAN.R-project.org/package=taxotools] and the developmental release is on GitHub [https://github.com/vijaybarve/taxotools].
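The name matching step described above (a user-defined lookup table, fuzzy matching against a master list, and a fallback to a higher-level entity such as the genus) can be sketched in a few lines. This is a minimal illustration in Python, not the taxotools API, which is an R package: the function name, the order of the steps, and the 0.9 similarity cutoff are all assumptions, and the fuzzy step uses the standard library's `difflib` rather than whatever the package uses internally.

```python
from difflib import get_close_matches


def match_name(name, master, lookup=None, cutoff=0.9):
    """Return (matched_name, method) for `name` against a master list.

    Hypothetical cascade: manual lookup table, exact match, fuzzy match,
    then fallback to the genus (first word) if it is in the master list.
    """
    lookup = lookup or {}
    if name in lookup:
        return lookup[name], "lookup"
    if name in master:
        return name, "exact"
    # difflib scores candidates by similarity ratio; keep only the best one.
    close = get_close_matches(name, master, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    parts = name.split()
    if parts and parts[0] in master:
        return parts[0], "genus"
    return None, "unmatched"
```

Returning the method alongside the match lets a reviewer check the fuzzy and genus-level results by hand, since those are the steps most likely to produce a wrong link.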


2021 ◽  
Vol 7 ◽  
pp. e465
Author(s):  
Mohammed Hadwan ◽  
Mohammed A. Al-Hagery ◽  
Maher Al-Sanabani ◽  
Salah Al-Hagree

Background: Bi-gram distance (BI-DIST) is a recent approach to measuring the distance between two strings, which plays an important role in a wide range of applications in various areas. The importance of BI-DIST is due to its representational and computational efficiency, which has led to extensive research to further enhance its efficiency. However, developing an algorithm that can measure the distance between strings accurately and efficiently has posed a major challenge to many developers. Consequently, this research aims to design an algorithm that can match names accurately. BI-DIST is considered the best orthographic measure for name identification; nevertheless, it lacks a distance scale between the name bigrams.
Methods: In this research, the Soft Bigram Distance (Soft-Bidist) measure is proposed. It extends BI-DIST by softening the scale of comparison among the name bigrams to improve name matching. Different datasets are used to demonstrate the efficiency of the proposed method.
Results: The results show that Soft-Bidist outperforms the compared algorithms on different name matching datasets.
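To make the notion of comparing names by their bigrams concrete, here is a minimal sketch of a much simpler bigram measure, the Dice coefficient over character bigrams. It is neither BI-DIST nor the proposed Soft-Bidist, whose definitions are not given in this abstract; the function names and the lowercase normalization are assumptions made for illustration.

```python
from collections import Counter


def bigrams(s):
    """Multiset of adjacent character pairs, case-folded (an assumption)."""
    s = s.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice_similarity(a, b):
    """Dice coefficient on character bigrams, in the range 0.0 to 1.0."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        # Both strings are shorter than two characters: compare directly.
        return 1.0 if a.lower() == b.lower() else 0.0
    overlap = sum((ba & bb).values())  # multiset intersection
    return 2.0 * overlap / total
```

In this measure a pair of bigrams either matches or does not. The limitation the abstract attributes to BI-DIST, the lack of a distance scale between bigrams, is a similar all-or-nothing comparison, which Soft-Bidist is described as softening.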


2021 ◽  
pp. 271-284
Author(s):  
Philip Blair ◽  
Carmel Eliav ◽  
Fiona Hasanaj ◽  
Kfir Bar

2020 ◽  
Vol 1684 ◽  
pp. 012085
Author(s):  
KaiLi Song ◽  
YunLing Li ◽  
LuLu Yao ◽  
Yuan Wang
