Concept Recognition in French Biomedical Text Using Automatic Translation

Abstract

Motivation: Automatic phenotype concept recognition from unstructured text remains a challenging task in biomedical text mining research. Previous work on this task typically uses dictionary-based matching methods, which can achieve high precision but suffer from lower recall. Recently, machine learning-based methods have been proposed to identify biomedical concepts; through automatic feature learning, they can recognize more unseen concept synonyms. However, most of these methods require large corpora of manually annotated data for model training, which are difficult to obtain because human annotation is costly.

Results: In this article, we propose PhenoTagger, a hybrid method that combines dictionary and machine learning-based methods to recognize Human Phenotype Ontology (HPO) concepts in unstructured biomedical text. We first use all concepts and synonyms in HPO to construct a dictionary, which is then used to automatically build a distantly supervised training dataset for machine learning. Next, a cutting-edge deep learning model is trained to classify each candidate phrase (an n-gram from the input sentence) into a corresponding concept label. Finally, the dictionary and machine learning-based prediction results are combined for improved performance. Our method is validated on two HPO corpora, and the results show that PhenoTagger compares favorably to previous methods. In addition, to demonstrate the generalizability of our method and to investigate the effect of training on different ontologies, we retrained PhenoTagger using the disease ontology MEDIC for disease concept recognition. Experimental results on the NCBI disease corpus show that PhenoTagger, without requiring manually annotated training data, achieves performance competitive with state-of-the-art supervised methods.

Availability and implementation: The source code, API information and data for PhenoTagger are freely available at https://github.com/ncbi-nlp/PhenoTagger.

Supplementary information: Supplementary data are available at Bioinformatics online.
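The dictionary side of the hybrid approach described in the abstract can be illustrated with a minimal Python sketch. It is not PhenoTagger's actual code: the function names, the toy ontology with `TOY:` identifiers, the whitespace tokenizer, the maximum n-gram length and the score threshold used when merging are all assumptions made for illustration. The abstract does not say how the two prediction sets are combined, so the merge rule here (keep every dictionary hit, add confident model predictions on spans not already tagged) is only one plausible choice.

```python
from typing import Dict, Iterator, List, Tuple

Hit = Tuple[int, int, str, str]            # (start, end, phrase, concept_id)
ScoredHit = Tuple[int, int, str, str, float]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (no punctuation handling in this sketch)."""
    return " ".join(text.lower().split())


def build_dictionary(ontology: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every normalized concept name or synonym to its concept ID."""
    lookup: Dict[str, str] = {}
    for concept_id, names in ontology.items():
        for name in names:
            lookup[normalize(name)] = concept_id
    return lookup


def candidate_ngrams(tokens: List[str], max_len: int) -> Iterator[Tuple[int, int, str]]:
    """Yield every n-gram of up to max_len tokens with its token span."""
    for start in range(len(tokens)):
        last = min(start + max_len, len(tokens))
        for end in range(start + 1, last + 1):
            yield start, end, " ".join(tokens[start:end])


def dictionary_tag(sentence: str, lookup: Dict[str, str], max_len: int = 5) -> List[Hit]:
    """Return candidate phrases that match a dictionary entry exactly."""
    tokens = normalize(sentence).split()
    hits: List[Hit] = []
    for start, end, phrase in candidate_ngrams(tokens, max_len):
        concept_id = lookup.get(phrase)
        if concept_id is not None:
            hits.append((start, end, phrase, concept_id))
    return hits


def combine(dict_hits: List[Hit], ml_hits: List[ScoredHit], threshold: float = 0.9) -> List[Hit]:
    """Assumed merge rule: keep dictionary hits, add confident model hits on new spans."""
    merged = list(dict_hits)
    seen = {(start, end) for start, end, _, _ in dict_hits}
    for start, end, phrase, concept_id, score in ml_hits:
        if score >= threshold and (start, end) not in seen:
            merged.append((start, end, phrase, concept_id))
            seen.add((start, end))
    return sorted(merged)


if __name__ == "__main__":
    # Toy ontology with made-up identifiers; a real run would load HPO names and synonyms.
    toy_ontology = {
        "TOY:0001": ["Seizure", "Seizures"],
        "TOY:0002": ["Small head", "Microcephaly"],
    }
    lookup = build_dictionary(toy_ontology)
    print(dictionary_tag("The patient had seizures and a small head", lookup))
```

In a distantly supervised setup of the kind the abstract outlines, the dictionary entries themselves would serve as labeled training phrases for the classifier, and the classifier would then score the same n-gram candidates that `candidate_ngrams` enumerates.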


Combining MEDLINE and publisher data to create parallel corpora for the automatic translation of biomedical text

BMC Bioinformatics ◽

10.1186/1471-2105-14-146 ◽

2013 ◽

Vol 14 (1) ◽

Cited By ~ 8

Author(s):

Antonio Jimeno Yepes ◽

Élise Prieur-Gaston ◽

Aurélie Névéol

Keyword(s):

Biomedical Text ◽

Parallel Corpora ◽

Automatic Translation


DARR: A Free-text Analysis System for the Automatic Documentation of Radiological Reports

Methods of Information in Medicine ◽

10.1055/s-0038-1636585 ◽

1977 ◽

Vol 16 (03) ◽

pp. 144-153 ◽

Cited By ~ 3

Author(s):

E. Vaccari ◽

W. Delaney ◽

A. Chiesa

Keyword(s):

Natural Language ◽

Text Analysis ◽

Automatic Documentation ◽

Free Text ◽

Software System ◽

Automatic Translation ◽

Radiological Report ◽

Analysis System ◽

Content Processing

A software system for the automatic free-text analysis and retrieval of radiological reports is presented. The software involves: (1) automatic translation of the specific natural language into a formalized metalanguage, in order to transform the radiological report into a »normalized report« that can be analyzed by computer; (2) content processing of the normalized report to select the desired information. The approach used to accomplish point (1) is described in detail with reference to a specific application.


Grammatical Difficulties of Automatic Translation of Scientific and Technical Literature

Naukovy Visnyk of South Ukrainian National Pedagogical University named after K D Ushynsky Linguistic Sciences ◽

10.24195/2616-5317-2018-27-16 ◽

2019 ◽

Vol 26 (27) ◽

pp. 134-141

Author(s):

Aliona Kolesnichenko ◽

Natalya Zhmayeva

Keyword(s):

Key Words ◽

Technical Literature ◽

Automatic Translation ◽

Advantages And Disadvantages ◽

Grammatical Errors ◽

Translation Service

The article analyzes the grammatical difficulties encountered in the process of automatic translation. It discusses the advantages and disadvantages of the SDL Trados automatic translation service, classifies the types of grammatical errors that arise when translating scientific and technical texts in SDL Trados, and outlines ways of overcoming them. Key words: scientific and technical literature, automatic translation, grammatical difficulties.
