biomedical domain Latest Research Papers

HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey

BMC Bioinformatics

10.1186/s12859-021-04539-0

2022

Vol 23 (1)

Author(s):

Juan J. Lastra-Díaz

Alicia Lara-Clares

Ana Garcia-Serrano

Keyword(s):

Real Time

Semantic Similarity

Similarity Measure

Shortest Path

State Of The Art

Biomedical Domain

SNOMED CT

Shortest Path Algorithm

Semantic Similarity Measure

Current State

Abstract
Background: Ontology-based semantic similarity measures based on SNOMED-CT and MeSH are extensively used in biomedical text mining, and those based on the Gene Ontology in genomics, which has encouraged the development of semantic measures libraries based on these ontologies. However, current state-of-the-art semantic measures libraries have performance and scalability drawbacks derived from their ontology representations, which are based either on relational databases or on naive in-memory graph representations. Likewise, a recent reproducible survey on word similarity shows that a hybrid IC-based measure integrating a shortest-path computation sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for its real-time computation prevents both its practical use in any application and the use of any other path-based semantic similarity measure.
Results: To bridge these two gaps, this work introduces an updated version of the HESML Java software library especially designed for the biomedical domain. It implements the most efficient and scalable ontology representation reported in the literature, together with a new method for approximating Dijkstra's algorithm on taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure.
Conclusions: We introduce a set of reproducible benchmarks showing that HESML outperforms the current state-of-the-art libraries by several orders of magnitude on the three aforementioned biomedical ontologies, and demonstrating the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. Likewise, we show that AncSPL scales linearly with the size of the common ancestor subgraph, regardless of the ontology size. Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies such as SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
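
The abstract states that AncSPL approximates Dijkstra's algorithm on taxonomies and that its cost scales with the ancestor subgraph rather than the whole ontology. A minimal Python sketch of that general idea, not HESML's Java implementation: it assumes a hypothetical `parents` dict mapping each concept to its direct is-a parents, unit-length edges, and a breadth-first search confined to the undirected subgraph induced by the ancestors of both concepts. All function and concept names are illustrative.

```python
from collections import deque


def ancestors(parents, concept):
    """Return `concept` plus all of its ancestors (child -> parent is-a edges)."""
    seen = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ancestor_restricted_path_length(parents, a, b):
    """Hypothetical helper: shortest path length between a and b, searching only
    the subgraph induced by the ancestors of a and b (unit-length edges).
    Returns None if no path exists inside that subgraph."""
    allowed = ancestors(parents, a) | ancestors(parents, b)
    # Undirected adjacency over the allowed nodes; the parents of an ancestor
    # are themselves ancestors, so every parent edge stays inside `allowed`.
    adjacency = {node: set() for node in allowed}
    for child in allowed:
        for parent in parents.get(child, ()):
            adjacency[child].add(parent)
            adjacency[parent].add(child)
    # Breadth-first search gives exact shortest paths when all edges cost 1.
    distance = {a: 0}
    queue = deque([a])
    while queue:
        current = queue.popleft()
        if current == b:
            return distance[current]
        for neighbour in adjacency[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return None


# Toy taxonomy (hypothetical concept names), child -> list of parents.
taxonomy = {
    "disease": ["entity"],
    "heart_disease": ["disease"],
    "lung_disease": ["disease"],
    "myocarditis": ["heart_disease"],
    "pneumonia": ["lung_disease"],
}
print(ancestor_restricted_path_length(taxonomy, "myocarditis", "pneumonia"))  # 4
```

Because the search never leaves the ancestor subgraph, its work depends on that subgraph's size, and the returned length can only be equal to or longer than the exact shortest path over the full ontology.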

Examining the Effect of the Ratio of Biomedical Domain to General Domain Data in Corpus in Biomedical Literature Mining

Applied Sciences

10.3390/app12010154

2021

Vol 12 (1)

pp. 154

Author(s):

Ziheng Zhang

Feng Han

Hongjian Zhang

Tomohiro Aoki

Katsuhiko Ogasawara

Keyword(s):

Language Processing

Relation Extraction

Medical Data

Biomedical Literature

Literature Mining

Biomedical Domain

PubMed Central

General Domain

Biomedical Information Retrieval

Science Community

Biomedical terms extracted using Word2vec, the most popular word embedding model in recent years, serve as the foundation for various natural language processing (NLP) applications, such as biomedical information retrieval, relation extraction, and recommendation systems. The objective of this study is to examine how changes in the ratio of biomedical-domain to general-domain data in the corpus affect the extraction of similar biomedical terms using Word2vec. We downloaded abstracts of 214,892 articles from PubMed Central (PMC) and the 3.9 GB Billion Word (BW) benchmark corpus from the computer science community. The datasets were preprocessed and grouped into 11 corpora based on the ratio of BW to PMC, ranging from 0:10 to 10:0, and a Word2vec model was trained on each corpus. The cosine similarities between biomedical terms obtained from each Word2vec model were then compared across models. The results indicated that the models trained on both BW and PMC data outperformed the model trained only on medical data. The similarity between the biomedical terms extracted by the Word2vec models increased when the ratio of biomedical-domain to general-domain data was between 3:7 and 5:5. This study can help NLP researchers apply Word2vec in a more informed way and increase the similarity of extracted biomedical terms, improving their effectiveness in NLP applications such as biomedical information extraction.
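
The study trains Word2vec on 11 BW:PMC mixtures and compares cosine similarities between term vectors. A minimal Python sketch of those two ingredients, under assumptions: vectors are plain lists, and `mix_corpora` is a hypothetical helper that samples a fixed number of sentences at a given general:biomedical split. The abstract does not describe the authors' actual sampling or preprocessing, and training the Word2vec models themselves (for example with a library such as gensim) is omitted.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mix_corpora(general, biomedical, general_parts, total_sentences, seed=0):
    """Hypothetical helper: sample `total_sentences` sentences so that the
    general:biomedical split is general_parts:(10 - general_parts)."""
    if not 0 <= general_parts <= 10:
        raise ValueError("general_parts must be between 0 and 10")
    rng = random.Random(seed)
    n_general = total_sentences * general_parts // 10
    n_biomedical = total_sentences - n_general
    # random.sample draws without replacement, so each source must be large enough.
    mixed = rng.sample(general, n_general) + rng.sample(biomedical, n_biomedical)
    rng.shuffle(mixed)
    return mixed


# The 11 mixtures from 0:10 to 10:0 (toy sentence lists stand in for BW and PMC).
bw = [f"general sentence {i}" for i in range(1000)]
pmc = [f"biomedical sentence {i}" for i in range(1000)]
corpora = [mix_corpora(bw, pmc, parts, 100) for parts in range(11)]
```

Fixing the total size per corpus keeps the mixtures comparable, so any change in term similarity can be attributed to the ratio rather than to corpus volume; whether the original study controlled size this way is not stated in the abstract.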

Interpretable ontology meta-matching in the biomedical domain using Mamdani fuzzy inference

Expert Systems with Applications

10.1016/j.eswa.2021.116025

2021

pp. 116025

Author(s):

Jorge Martinez-Gil

Jose Manuel Chaves-Gonzalez

Keyword(s):

Fuzzy Inference

Biomedical Domain

Interpretable entity meta-alignment in knowledge graphs using penalized regression: a case study in the biomedical domain

Progress in Artificial Intelligence

10.1007/s13748-021-00263-1

2021

Author(s):

Jorge Martinez-Gil

Riad Mokadem

Franck Morvan

Josef Küng

Abdelkader Hameurlain

Keyword(s):

Penalized Regression

Biomedical Domain

Knowledge Graphs

Disease Named Entity Recognition (D-NER) Evaluation

10.21203/rs.3.rs-911654/v1

2021

Author(s):

Xie-Yuan Xie

Keyword(s):

Natural Language Processing

Natural Language

Language Processing

Clinical Case

Case Reports

Named Entity Recognition

Entity Recognition

Biomedical Domain

Named Entity

Medical Domain

Abstract
Named Entity Recognition (NER) is a key task in Natural Language Processing (NLP). In the medical domain, NER is a very important phase in end-to-end systems. In this paper, we investigate the performance of disease NER (D-NER). TaggerOne, a well-known NER tool in the biomedical domain, was evaluated against hand annotations of diseases in 52 cardiovascular-related clinical case reports, using different training sets.
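
The abstract evaluates TaggerOne against hand annotations but does not state the scoring scheme. A common choice is precision, recall and F1 over exact-match entity spans; a minimal Python sketch, assuming spans are represented as hypothetical `(start, end)` character offsets:

```python
def span_prf(predicted, gold):
    """Precision, recall and F1 over exact-match entity spans.

    Each span is a hypothetical (start, end) character-offset pair; a predicted
    span counts as correct only if the identical span appears in `gold`.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Two of four predicted disease mentions match the three hand-annotated ones.
print(span_prf([(0, 12), (30, 45), (80, 90), (100, 110)],
               [(0, 12), (30, 45), (60, 70)]))
```

Exact matching is the strictest option; lenient schemes that credit overlapping spans would report higher scores on the same predictions.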

ParaMed: a parallel corpus for English–Chinese translation in the biomedical domain

BMC Medical Informatics and Decision Making

10.1186/s12911-021-01621-8

2021

Vol 21 (1)

Author(s):

Boxiang Liu

Liang Huang

Keyword(s):

New England

Machine Translation

Domain Knowledge

Language Translation

Fine Tuning

Biomedical Domain

Chinese Translation

Parallel Corpus

Translation Quality

Full Dataset

Abstract
Background: Biomedical language translation requires multilingual fluency as well as relevant domain knowledge. Such requirements make it challenging to train qualified translators and costly to generate high-quality translations. Machine translation represents an effective alternative, but accurate machine translation requires large amounts of in-domain data. While such datasets are abundant in general domains, they are less accessible in the biomedical domain. Chinese and English are two of the most widely spoken languages, yet to our knowledge, no parallel corpus exists for this language pair in the biomedical domain.
Description: We developed an effective pipeline to acquire and process an English-Chinese parallel corpus from the New England Journal of Medicine (NEJM). This corpus consists of about 100,000 sentence pairs and 3,000,000 tokens on each side. We showed that training on out-of-domain data and fine-tuning with as few as 4000 NEJM sentence pairs improves translation quality by 25.3 BLEU in the en→zh direction and 13.4 BLEU in the zh→en direction. Translation quality continues to improve at a slower pace on larger in-domain data subsets, with a total increase of 33.0 BLEU (en→zh) and 24.3 BLEU (zh→en) on the full dataset.
Conclusions: The code and data are available at https://github.com/boxiangliu/ParaMed.
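
The BLEU gains above are reported on a 0 to 100 scale. A minimal Python sketch of corpus-level BLEU (clipped n-gram precisions up to 4-grams, their geometric mean, and a brevity penalty), assuming pre-tokenized input, one reference per sentence and no smoothing. The abstract does not say which scoring tool or tokenization the ParaMed authors used, so numbers from this sketch are not directly comparable with theirs.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale: one reference per sentence,
    uniform n-gram weights, no smoothing."""
    clipped = [0] * max_n  # matched n-grams, clipped by reference counts
    totals = [0] * max_n   # all candidate n-grams
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            cand_ngrams = ngram_counts(cand, n)
            ref_ngrams = ngram_counts(ref, n)
            clipped[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
            totals[n - 1] += sum(cand_ngrams.values())
    # Without smoothing, any zero n-gram precision makes BLEU zero.
    if min(clipped) == 0:
        return 0.0
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty: punish candidates shorter than the references.
    brevity = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return 100 * brevity * math.exp(log_precision)


ref = "the patient was treated with aspirin".split()
hyp = "the patient was treated with ibuprofen".split()
print(corpus_bleu([hyp], [ref]))  # ≈ 75.98
```

Pooling counts over the whole test set before taking the geometric mean is what makes this a corpus-level score rather than an average of sentence scores.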

Knowledge-Guided Efficient Representation Learning for Biomedical Domain

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

10.1145/3447548.3467118

2021

Author(s):

Kishlay Jha

Guangxu Xun

Nan Du

Aidong Zhang

Keyword(s):

Representation Learning

Biomedical Domain

Efficient Representation

Semantic text mining and its application in biomedical domain

10.17918/etd-899

2021

Author(s):

Illhoi Yoo

Keyword(s):

Text Mining

Biomedical Domain

ITEXT-BIO: Intelligent Term EXTraction for BIOmedical analysis

Health Information Science and Systems

10.1007/s13755-021-00156-6

2021

Vol 9 (1)

Author(s):

Rodrique Kafando

Rémy Decoupes

Sarah Valentin

Lucile Sautot

Maguelonne Teisseire

...

Keyword(s):

Biomedical Domain

Biomedical Analysis

Terminology Extraction

Morphosyntactic Variation

Term Extraction

Statistical Measures

Analysis Strategies

Experimental Findings

Intelligent Process

Qualitative Analyses

Abstract
Here, we introduce ITEXT-BIO, an intelligent process for biomedical-domain terminology extraction from textual documents and subsequent analysis. The proposed methodology consists of two complementary approaches: free and driven term extraction. The first is based on term extraction with statistical measures, while the second uses morphosyntactic variation rules to extract term variants from the corpus. The combination of these two term extraction approaches and of the analysis strategies is the keystone of ITEXT-BIO. These strategies combine intra- and inter-corpus analysis, enabling term extraction and analysis either from a single corpus (intra) or across several corpora (inter). We assessed the two approaches, the corpus or corpora to be analysed, and the type of statistical measures used. Our experimental findings revealed that the proposed methodology could be used (1) to efficiently extract representative, discriminant and new terms from a given corpus or corpora, and (2) to provide quantitative and qualitative analyses of these terms with respect to the study domain.
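
The abstract does not name the statistical measures used for free term extraction. As an illustration only, C-value is a classic statistical measure for multi-word term extraction that discounts candidates occurring mostly inside longer candidates. A minimal Python sketch, assuming candidate terms have already been extracted as word tuples with their corpus frequencies; this is not claimed to be ITEXT-BIO's scoring.

```python
import math


def is_nested(longer, shorter):
    """True if word tuple `shorter` occurs contiguously inside a longer tuple."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )


def c_value(frequencies):
    """C-value for candidate terms given as {word tuple: corpus frequency}."""
    scores = {}
    for term, freq in frequencies.items():
        # Frequencies of longer candidates that contain this term.
        containers = [f for other, f in frequencies.items() if is_nested(other, term)]
        weight = math.log2(len(term))
        if containers:
            # Subtract the mean frequency of the longer candidates containing the term.
            scores[term] = weight * (freq - sum(containers) / len(containers))
        else:
            scores[term] = weight * freq
    return scores


candidates = {
    ("heart", "failure"): 10,
    ("congestive", "heart", "failure"): 4,
    ("chronic", "heart", "failure"): 2,
}
for term, score in sorted(c_value(candidates).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

The log2 length weight favours longer terms and gives single words a score of 0, which reflects the measure's focus on multi-word terms; variants that use log2(length + 1) exist to score single words as well.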

Applications for Flexible TFT Arrays Emerge in the Biomedical Domain

Information Display

10.1002/msid.1231

2021

Vol 37 (4)

pp. 26-33

Author(s):

Auke Jisk Kronemeijer

Gerwin H. Gelinck

Gerwin H. Gelinck

Keyword(s):

Biomedical Domain
