Journal of Biomedical Semantics

Abstract Background The advancement of science and technologies plays an immense role in the way scientific experiments are being conducted. Understanding how experiments are performed and how results are derived has become significantly more complex with the recent explosive growth of heterogeneous research data and methods. Therefore, it is important that the provenance of results is tracked, described, and managed throughout the research lifecycle starting from the beginning of an experiment to its end to ensure reproducibility of results described in publications. However, there is a lack of interoperable representation of end-to-end provenance of scientific experiments that interlinks data, processing steps, and results from an experiment’s computational and non-computational processes. Results We present the “REPRODUCE-ME” data model and ontology to describe the end-to-end provenance of scientific experiments by extending existing standards in the semantic web. The ontology brings together different aspects of the provenance of scientific studies by interlinking non-computational data and steps with computational data and steps to achieve understandability and reproducibility. We explain the important classes and properties of the ontology and how they are mapped to existing ontologies like PROV-O and P-Plan. The ontology is evaluated by answering competency questions over the knowledge base of scientific experiments consisting of computational and non-computational data and steps. Conclusion We have designed and developed an interoperable way to represent the complete path of a scientific experiment consisting of computational and non-computational steps. We have applied and evaluated our approach to a set of scientific experiments in different subject domains like computational science, biological imaging, and microscopy.
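
To make the idea of answering a competency question over interlinked provenance concrete, the following is a minimal sketch and not the REPRODUCE-ME model itself. It assumes a toy knowledge base held as plain Python tuples; the `ex:` identifiers and the `upstream` helper are hypothetical, and only the property names `prov:used` and `prov:wasGeneratedBy` are taken from PROV-O. The question asked is: which data and steps, computational or not, contributed to a given result?

```python
from collections import deque

# Hypothetical toy knowledge base: (subject, predicate, object) triples.
# A wet-lab step (sample preparation) and a computational step (notebook run)
# are linked through the data they produce and consume.
TRIPLES = [
    ("ex:sample1", "prov:wasGeneratedBy", "ex:sample_preparation"),
    ("ex:microscopy_acquisition", "prov:used", "ex:sample1"),
    ("ex:image1", "prov:wasGeneratedBy", "ex:microscopy_acquisition"),
    ("ex:notebook_run", "prov:used", "ex:image1"),
    ("ex:notebook_run", "prov:used", "ex:analysis_script"),
    ("ex:figure1", "prov:wasGeneratedBy", "ex:notebook_run"),
]


def upstream(node):
    """Return every entity or activity reachable from node by following
    prov:wasGeneratedBy and prov:used edges (breadth-first)."""
    follow = {"prov:wasGeneratedBy", "prov:used"}
    seen = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for s, p, o in TRIPLES:
            if s == current and p in follow and o not in seen:
                seen.add(o)
                queue.append(o)
    return sorted(seen)


if __name__ == "__main__":
    # Competency question: what contributed to ex:figure1?
    for item in upstream("ex:figure1"):
        print(item)
```

In a real deployment the same question would more likely be posed as a SPARQL query over an RDF store; the traversal above only illustrates why a single graph that links both kinds of steps is enough to trace a result back to the bench.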

Residual refinement for interactive skin lesion segmentation

Journal of Biomedical Semantics ◽

10.1186/s13326-021-00255-z ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Dalei Jiang ◽

Yin Wang ◽

Feng Zhou ◽

Hongtao Ma ◽

Wenting Zhang ◽

...

Keyword(s):

Skin Lesion ◽

User Interaction ◽

Universal Method ◽

Lesion Segmentation ◽

Feature Maps ◽

Classic Problem ◽

Novel Approach ◽

Wide Range ◽

Public Datasets ◽

Segmentation Task

Abstract Background Image segmentation is a difficult and classic problem. It has a wide range of applications, one of which is skin lesion segmentation. Numerous researchers have made great efforts to tackle the problem, yet there is still no universal method in various application domains. Results We propose a novel approach that combines a deep convolutional neural network with a grabcut-like user interaction to tackle the interactive skin lesion segmentation problem. Slightly deviating from grabcut user interaction, our method uses boxes and clicks. In addition, contrary to existing interactive segmentation algorithms that combine the initial segmentation task with the following refinement task, we explicitly separate these tasks by designing individual sub-networks. One network is SBox-Net, and the other is Click-Net. SBox-Net is a full-fledged segmentation network that is built upon a pre-trained, state-of-the-art segmentation model, while Click-Net is a simple yet powerful network that combines feature maps extracted from SBox-Net and user clicks to residually refine the mistakes made by SBox-Net. Extensive experiments on two public datasets, PH2 and ISIC, confirm the effectiveness of our approach. Conclusions We present an interactive two-stage pipeline method for skin lesion segmentation, which was demonstrated to be effective in comprehensive experiments.

FAIR data representation in times of eScience: a comparison of instance-based and class-based semantic representations of empirical data using phenotype descriptions as example

Journal of Biomedical Semantics ◽

10.1186/s13326-021-00254-0 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Lars Vogt

Keyword(s):

Empirical Data ◽

Life Sciences ◽

Data Representation ◽

Point Of View ◽

Knowledge Graph ◽

Management Tools ◽

Semantic Graph ◽

Universal Usability ◽

Almost All ◽

Technical Difference

Abstract Background The size, velocity, and heterogeneity of Big Data outclass conventional data management tools and require data and metadata to be fully machine-actionable (i.e., eScience-compliant) and thus findable, accessible, interoperable, and reusable (FAIR). This can be achieved by using ontologies and through representing them as semantic graphs. Here, we discuss two different semantic graph approaches of representing empirical data and metadata in a knowledge graph, with phenotype descriptions as an example. Almost all phenotype descriptions are still being published as unstructured natural language texts, with far-reaching consequences for their FAIRness, substantially impeding their overall usability within the life sciences. However, with an increasing number of anatomy ontologies becoming available and semantic applications emerging, a solution to this problem becomes available. Researchers are starting to document and communicate phenotype descriptions through the Web in the form of highly formalized and structured semantic graphs that use ontology terms and Uniform Resource Identifiers (URIs) to circumvent the problems connected with unstructured texts. Results Using phenotype descriptions as an example, we compare and evaluate two basic representations of empirical data and their accompanying metadata in the form of semantic graphs: the class-based TBox semantic graph approach called Semantic Phenotype and the instance-based ABox semantic graph approach called Phenotype Knowledge Graph. Their main difference is that only the ABox approach allows for identifying every individual part and property mentioned in the description in a knowledge graph. This technical difference results in substantial practical consequences that significantly affect the overall usability of empirical data.
The consequences affect findability, accessibility, and explorability of empirical data as well as their comparability, expandability, universal usability and reusability, and overall machine-actionability. Moreover, TBox semantic graphs often require querying under entailment regimes, which is computationally more complex. Conclusions We conclude that, from a conceptual point of view, the advantages of the instance-based ABox semantic graph approach outweigh its shortcomings and outweigh the advantages of the class-based TBox semantic graph approach. Therefore, we recommend the instance-based ABox approach as a FAIR approach for documenting and communicating empirical data and metadata in a knowledge graph.
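
The contrast between the two representations can be illustrated with a small sketch. It is a deliberate simplification under stated assumptions: triples are plain Python tuples, all `ex:` identifiers are hypothetical, and the class-based description is flattened into a single Manchester-style string, whereas a real OWL serialization would use blank-node restrictions. The point is only that in the instance-based graph each part and quality is a node that simple pattern matching can reach, while the class-based description needs a reasoner to be unpacked.

```python
# Instance-based (ABox) sketch: every part and quality is its own node.
ABOX = {
    ("ex:specimen1", "rdf:type", "ex:Organism"),
    ("ex:specimen1", "ex:hasPart", "ex:head1"),
    ("ex:head1", "rdf:type", "ex:Head"),
    ("ex:head1", "ex:hasQuality", "ex:red1"),
    ("ex:red1", "rdf:type", "ex:Red"),
}

# Class-based (TBox) sketch: the whole description is one class expression.
TBOX = {
    ("ex:Phenotype1", "rdfs:subClassOf",
     "ex:hasPart some (ex:Head and ex:hasQuality some ex:Red)"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def red_things(graph):
    """Find nodes bearing a quality typed ex:Red, using pattern matching only."""
    red_nodes = {t[0] for t in match(graph, p="rdf:type", o="ex:Red")}
    return sorted(t[0] for t in match(graph, p="ex:hasQuality") if t[2] in red_nodes)


if __name__ == "__main__":
    print(red_things(ABOX))  # the individual head is directly addressable
    print(red_things(TBOX))  # nothing found without reasoning over the axiom
```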

Prefrontal fNIRS-based clinical data analysis of brain functions in individuals abusing different types of drugs

Journal of Biomedical Semantics ◽

10.1186/s13326-021-00256-y ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Xuelin Gu ◽

Banghua Yang ◽

Shouwei Gao ◽

Lin Feng Yan ◽

Ding Xu ◽

...

Keyword(s):

Density Functional ◽

Near Infrared ◽

Learning Algorithm ◽

Support Vector ◽

Drug Abusers ◽

Drug Rehabilitation ◽

Functional Near Infrared Spectroscopy ◽

Linear Discriminant ◽

Brain Functions ◽

Different Types

Abstract Background The activation degree of the orbitofrontal cortex (OFC) functional area in drug abusers is directly related to the craving for drugs and the tolerance to punishment. Currently, among the clinical research on drug rehabilitation, there has been little analysis of the OFC activation in individuals abusing different types of drugs, including heroin, methamphetamine, and mixed drugs. Therefore, it is urgently necessary to clinically investigate the abuse of different drugs, so as to explore the effects of different types of drugs on the human brain. Methods Based on prefrontal high-density functional near-infrared spectroscopy (fNIRS), this research designs an experiment that includes resting and drug addiction induction. Hemoglobin concentrations of 30 drug users (10 on methamphetamine, 10 on heroin, and 10 on mixed drugs) were collected using fNIRS and analyzed by combining algorithms and statistics. Results Linear discriminant analysis (LDA), support vector machine (SVM), and machine-learning algorithms were implemented to classify different drug abusers. Oxygenated hemoglobin (HbO2) activations in the OFC of different drug abusers were statistically analyzed, and the differences were confirmed. Innovative findings: in both the Right-OFC and Left-OFC areas, methamphetamine abusers had the highest degree of OFC activation, followed by those abusing mixed drugs, and heroin abusers had the lowest. The same result was obtained when OFC activation was investigated without distinguishing the left and right hemispheres. Conclusions The findings confirmed the significant differences among different drug abusers and the patterns of OFC activations, providing a theoretical basis for personalized clinical treatment of drug rehabilitation in the future.

An ontology network for Diabetes Mellitus in Mexico

Journal of Biomedical Semantics ◽

10.1186/s13326-021-00252-2 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Cecilia Reyes-Peña ◽

Mireya Tovar ◽

Maricela Bravo ◽

Regina Motz

Keyword(s):

Diabetes Mellitus ◽

Language Processing ◽

Medical Information ◽

Technical Information ◽

Quality Criteria ◽

International Standards ◽

Knowledge Generation ◽

Diabetic Patients ◽

SNOMED CT ◽

New Knowledge

Abstract Background Medical experts in the domain of Diabetes Mellitus (DM) acquire specific knowledge from diabetic patients through monitoring and interaction. This allows them to know the disease and information about other conditions or comorbidities, treatments, and typical consequences in the Mexican population. This indicates that an expert in a domain knows technical information about the domain and contextual factors that interact with it in the real world, contributing to new knowledge generation. For capturing and managing information about DM, it is necessary to design and implement techniques and methods that allow: determining the most relevant conceptual dimensions and their correct organization, the integration of existing medical and clinical information from different resources, and the generation of structures that represent the deduction process of the doctor. An Ontology Network is a collection of ontologies of diverse knowledge domains which can be interconnected by meta-relations. This article describes an Ontology Network for representing DM in Mexico, designed by a proposed methodology. Building the Ontology Network involves ontological resource reuse and non-ontological resource transformation for ontology design, and ontology extension by natural language processing techniques. The resources are medical information extracted from vocabularies, taxonomies, medical dictionaries, and ontologies, among others. Additionally, a set of semantic rules has been defined within the Ontology Network to derive new knowledge. Results An Ontology Network for DM in Mexico has been built from six well-defined domains, resulting in new classes, using ontological and non-ontological resources to offer a semantic structure for assisting in the medical diagnosis process. The network comprises 1367 classes, 20 object properties, 63 data properties, and 4268 individuals from seven different ontologies.
Ontology Network evaluation was carried out by verifying the purpose of its design and some quality criteria. Conclusions The composition of the Ontology Network offers a set of well-defined ontological modules facilitating the reuse of one or more of them. The inclusion of international vocabularies such as SNOMED CT or ICD-10 reinforces the representation by international standards. It increases the semantic interoperability of the network, providing the opportunity to integrate other ontologies with the same vocabularies. The ontology network design methodology offers a guide for ontology developers about how to use ontological and non-ontological resources in order to make maximum use of the information and knowledge from a set of domains that may or may not share information.

CIDO ontology updates and secondary analysis of host responses to COVID-19 infection based on ImmPort reports and literature

Journal of Biomedical Semantics ◽

10.1186/s13326-021-00250-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Anthony Huffman ◽

Anna Maria Masci ◽

Jie Zheng ◽

Nasim Sanati ◽

Timothy Brunson ◽

...

Keyword(s):

Secondary Analysis ◽

Secondary Data ◽

Host Responses ◽

Ontology Development ◽

Specific Gene ◽

Disease Ontology ◽

Female Patients ◽

Different Types ◽

Comprehensive Picture ◽

Male Patients

Abstract Background With COVID-19 still in its pandemic stage, extensive research has generated increasing amounts of data and knowledge. As many studies are published within a short span of time, we often lose an integrative and comprehensive picture of host-coronavirus interaction (HCI) mechanisms. As of early April 2021, the ImmPort database has stored 7 studies (with 6 having details) that cover topics including molecular immune signatures, epitopes, and sex differences in terms of mortality in COVID-19 patients. The Coronavirus Infectious Disease Ontology (CIDO) represents basic HCI information. We hypothesize that the CIDO can be used as the platform to represent newly recorded information from ImmPort, leading to the reinforcement of CIDO. Methods The CIDO was used as the semantic platform for logically modeling and representing newly identified knowledge reported in the 6 ImmPort studies. A recursive eXtensible Ontology Development (XOD) strategy was established to support the CIDO representation and enhancement. Secondary data analysis was also performed to analyze different aspects of the HCI from these ImmPort studies and other related literature reports. Results The topics covered by the 6 ImmPort papers were identified to overlap with existing CIDO representation. SARS-CoV-2 viral S protein related HCI knowledge was emphasized for CIDO modeling, including its binding with ACE2, mutations causing different variants, and epitope homology by comparison with other coronavirus S proteins. Different types of cytokine signatures were also identified and added to CIDO. Our secondary analysis of two cohort COVID-19 studies with cytokine panel detection found that a total of 11 cytokines were up-regulated in female patients after infection and 8 cytokines in male patients. These sex-specific gene responses were newly modeled and represented in CIDO. A new DL query was generated to demonstrate the benefits of such integrative ontology representation.
Furthermore, the IL-10 signaling pathway was found to be statistically significant for both male patients and female patients. Conclusion Using the recursive XOD strategy, six new ImmPort COVID-19 studies were systematically reviewed, and the results were modeled and represented in CIDO, leading to the enhancement of CIDO. The enhanced ontology and further secondary analysis supported a more comprehensive understanding of the molecular mechanism of host responses to COVID-19 infection.

Linking common human diseases to their phenotypes; development of a resource for human phenomics

Journal of Biomedical Semantics ◽

10.1186/s13326-021-00249-x ◽

2021 ◽

Vol 12 (1) ◽

Cited By ~ 1

Author(s):

Şenay Kafkas ◽

Sara Althubaiti ◽

Georgios V. Gkoutos ◽

Robert Hoehndorf ◽

Paul N. Schofield

Keyword(s):

Text Mining ◽

Validation Dataset ◽

Clinical Settings ◽

Mining Method ◽

Disease Phenotype ◽

Significant Information ◽

Sequencing Technologies ◽

Disease Associations ◽

ICD-10 ◽

Sporadic Disease

Abstract Background In recent years a large volume of clinical genomics data has become available due to rapid advances in sequencing technologies. Efficient exploitation of this genomics data requires linkage to patient phenotype profiles. Current resources providing disease-phenotype associations are not comprehensive, and they often do not have broad coverage of the disease terminologies, particularly ICD-10, which is still the primary terminology used in clinical settings. Methods We developed two approaches to gather disease-phenotype associations. First, we used a text mining method that utilizes semantic relations in phenotype ontologies, and applies statistical methods to extract associations between diseases in ICD-10 and phenotype ontology classes from the literature. Second, we developed a semi-automatic way to collect ICD-10–phenotype associations from existing resources containing known relationships. Results We generated four datasets. Two of them are independent datasets linking diseases to their phenotypes based on text mining and semi-automatic strategies. The remaining two datasets are generated from these datasets and cover a subset of ICD-10 classes of common diseases contained in UK Biobank. We extensively validated our text mined and semi-automatically curated datasets by: comparing them against an expert-curated validation dataset containing disease–phenotype associations, measuring their similarity to disease–phenotype associations found in public databases, and assessing how well they could be used to recover gene–disease associations using phenotype similarity. Conclusion We find that our text mining method can produce phenotype annotations of diseases that are correct but often too general to have significant information content, or too specific to accurately reflect the typical manifestations of the sporadic disease. 
On the other hand, the datasets generated from integrating multiple knowledgebases are more complete (i.e., cover more of the required phenotype annotations for a given disease). We make all data freely available at 10.5281/zenodo.4726713.
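
The abstract does not name the statistics used to score disease-phenotype co-occurrence, so the following is only an illustrative sketch of one common choice, normalized pointwise mutual information (NPMI), and should not be read as the authors' method. The function name `npmi` and the counts in the usage lines are hypothetical. The measure is 0 when a disease and a phenotype co-occur exactly as often as chance predicts and approaches 1 when they only ever appear together.

```python
import math


def npmi(n_xy, n_x, n_y, n_total):
    """Normalized pointwise mutual information from document counts.

    n_xy:    documents mentioning both the disease and the phenotype
    n_x:     documents mentioning the disease
    n_y:     documents mentioning the phenotype
    n_total: documents in the corpus
    """
    if n_xy == 0:
        return -1.0  # never co-occur: lower bound of the measure
    if n_xy == n_total:
        return 0.0  # both terms are in every document: no information
    p_xy = n_xy / n_total
    p_x = n_x / n_total
    p_y = n_y / n_total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


if __name__ == "__main__":
    # Hypothetical counts in a corpus of 1000 documents.
    print(round(npmi(40, 100, 50, 1000), 3))  # strong association
    print(round(npmi(5, 100, 50, 1000), 3))   # roughly chance level
```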

Syntax-based transfer learning for the task of biomedical relation extraction

Journal of Biomedical Semantics ◽

10.1186/s13326-021-00248-y ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Joël Legrand ◽

Yannick Toussaint ◽

Chedy Raïssi ◽

Adrien Coulet

Keyword(s):

Transfer Learning ◽

Language Processing ◽

Domain Adaptation ◽

Relation Extraction ◽

Training Data ◽

Learning Performance ◽

Promising Alternative ◽

Syntactic Features ◽

The Impact ◽

Biomedical Relation Extraction

Abstract Background Transfer learning aims at enhancing machine learning performance on a problem by reusing labeled data originally designed for a related, but distinct problem. In particular, domain adaptation consists, for a specific task, in reusing training data developed for the same task but a distinct domain. This is particularly relevant to the applications of deep learning in Natural Language Processing, because they usually require large annotated corpora that may not exist for the targeted domain, but exist for side domains. Results In this paper, we experiment with transfer learning for the task of relation extraction from biomedical texts, using the TreeLSTM model. We empirically show the impact of TreeLSTM alone and with domain adaptation by obtaining better performances than the state of the art on two biomedical relation extraction tasks and equal performances for two others, for which little annotated data are available. Furthermore, we propose an analysis of the role that syntactic features may play in transfer learning for relation extraction. Conclusion Given the difficulty of manually annotating corpora in the biomedical domain, the proposed transfer learning method offers a promising alternative to achieve good relation extraction performances for domains associated with scarce resources. Also, our analysis illustrates the importance that syntax plays in transfer learning, underlining the importance in this domain of privileging approaches that embed syntactic features.

Toward a systematic conflict resolution framework for ontologies

Journal of Biomedical Semantics ◽

10.1186/s13326-021-00246-0 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

C. Maria Keet ◽

Rolf Grütter

Keyword(s):

Conflict Resolution ◽

Domain Knowledge ◽

Ad Hoc ◽

Ontology Development ◽

Subject Domain ◽

Ontology Alignment ◽

Cardinality Constraint ◽

Actual Case ◽

Object Property ◽

Property Versus

Abstract Background The ontology authoring step in ontology development involves having to make choices about what subject domain knowledge to include. This may concern sorting out ontological differences and making choices between conflicting axioms due to limitations in the logic or the subject domain semantics. Examples are dealing with different foundational ontologies in ontology alignment and OWL 2 DL’s transitive object property versus a qualified cardinality constraint. Such conflicts have to be resolved somehow. However, only isolated and fragmented guidance for doing so is available, which therefore results in ad hoc decision-making that may not be the best choice or may be forgotten about later. Results This work aims to address this by taking steps towards a framework to deal with the various types of modeling conflicts through meaning negotiation and conflict resolution in a systematic way. It proposes an initial library of common conflicts, a conflict set, typical steps toward resolution, and the software availability and requirements needed for it. The approach was evaluated with an actual case of domain knowledge usage in the context of an epizootic disease outbreak, namely avian influenza, and running examples with COVID-19 ontologies. Conclusions The evaluation demonstrated the potential and feasibility of a conflict resolution framework for ontologies.

ResidueFinder: extracting individual residue mentions from protein literature

Journal of Biomedical Semantics ◽

10.1186/s13326-021-00243-3 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Ton E Becker ◽

Eric Jakobsson

Keyword(s):

Amino Acids ◽

Full Text ◽

Protein Function ◽

Regular Expression ◽

Computationally Efficient ◽

Expression Library ◽

Trade Offs ◽

Entire Sequence ◽

Individual Residue ◽

Efficient Program

Abstract Background The revolution in molecular biology has shown how protein function and structure are based on specific sequences of amino acids. Thus, an important feature in many papers is the mention of the significance of individual amino acids in the context of the entire sequence of the protein. MutationFinder is a widely used program for finding mentions of specific mutations in texts. We report on augmenting the positive attributes of MutationFinder with a more inclusive regular expression list to create ResidueFinder, which finds mentions of native amino acids as well as mutations. We also consider parameter options for both ResidueFinder and MutationFinder to explore trade-offs between precision, recall, and computational efficiency. We test our methods and software in full text as well as abstracts. Results We find there is much more variety of formats for mentioning residues in the entire text of papers than in abstracts alone. Failure to take these multiple formats into account results in many false negatives in the program. Since MutationFinder, like several other programs, was primarily tested on abstracts, we found it necessary to build an expanded regular expression list to achieve acceptable recall in full text searches. We also discovered a number of artifacts arising from PDF-to-text conversion, which we addressed by adding elements to the regular expression library. Taking into account those factors resulted in high recall on randomly selected primary research articles. We also developed a streamlined regular expression (called “cut”) which enables a several hundredfold speedup in both MutationFinder and ResidueFinder with only a modest compromise of recall. All regular expressions were tested using expanded F-measure statistics, i.e., we compute Fβ for various values of β, where the larger the value of β the more recall is weighted, and the smaller the value of β the more precision is weighted.
Conclusions ResidueFinder is a simple, effective, and efficient program for finding individual residue mentions in primary literature starting with text files, implemented in Python, and available on SourceForge.net. The most computationally efficient versions of ResidueFinder could enable creation and maintenance of a database of residue mentions encompassing all articles in PubMed.
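
The two ingredients described above, a regular expression for residue mentions and Fβ scoring, can be sketched briefly. This is a minimal illustration and not ResidueFinder's actual expression library: the pattern `RESIDUE_RE` is a hypothetical, heavily simplified one that only recognizes a three-letter amino acid code followed by a position, and the helper names `find_residues` and `f_beta` are assumptions. The Fβ formula itself is the standard one, (1 + β²)·P·R / (β²·P + R).

```python
import re

# Hypothetical simplified pattern: a three-letter amino acid code, an optional
# hyphen or space, and a sequence position, e.g. "Ser195", "His-57", "Asp 102".
THREE_LETTER = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)
RESIDUE_RE = re.compile(rf"\b({THREE_LETTER})[- ]?(\d+)\b")


def find_residues(text):
    """Return (amino_acid, position) pairs for every match in text."""
    return [(m.group(1), int(m.group(2))) for m in RESIDUE_RE.finditer(text)]


def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score from true positive, false positive, and false negative counts.

    beta > 1 weights recall more heavily; beta < 1 weights precision more heavily.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    print(find_residues("The catalytic triad is Ser195, His-57 and Asp 102."))
    # A matcher with perfect precision but recall of 0.6:
    for beta in (0.5, 1.0, 2.0):
        print(beta, round(f_beta(tp=6, fp=0, fn=4, beta=beta), 3))
```

With precision fixed at 1.0 and recall at 0.6, the score falls as β grows, which is the behaviour the abstract describes: larger β rewards recall, so a low-recall matcher is penalized more.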

Journal of Biomedical Semantics
Latest Publications

Published By Springer (Biomed Central Ltd.)

End-to-End provenance representation for the understandability and reproducibility of scientific experiments using a semantic approach

Residual refinement for interactive skin lesion segmentation

FAIR data representation in times of eScience: a comparison of instance-based and class-based semantic representations of empirical data using phenotype descriptions as example

Prefrontal fNIRS-based clinical data analysis of brain functions in individuals abusing different types of drugs

An ontology network for Diabetes Mellitus in Mexico

CIDO ontology updates and secondary analysis of host responses to COVID-19 infection based on ImmPort reports and literature

Linking common human diseases to their phenotypes; development of a resource for human phenomics

Syntax-based transfer learning for the task of biomedical relation extraction

Toward a systematic conflict resolution framework for ontologies

ResidueFinder: extracting individual residue mentions from protein literature
