Finding Gene Names

Computational Text Analysis ◽

10.1093/oso/9780198567400.003.0016 ◽

2006 ◽

Author(s):

Soumya Raychaudhuri

Keyword(s):

Natural Language Processing ◽

Language Processing ◽

Critical Issue ◽

High Accuracy ◽

Scientific Text ◽

Recognition Algorithms ◽

Wide Range ◽

Genomics Research ◽

Processing Techniques ◽

Mining Algorithms

Successful use of text mining algorithms to facilitate genomics research hinges on the ability to recognize the names of genes in scientific text. In this chapter we address the critical issue of gene name recognition. Once gene names can be recognized in scientific text, we can begin to understand what the text says about those genes. This is a much more challenging issue than one might appreciate at first glance. Gene names can be inconsistent and confusing; automated gene name recognition has therefore turned out to be quite challenging to implement with high accuracy. Gene name recognition algorithms have a wide range of useful applications. Until this chapter we have been avoiding this issue and have been using only gene-article indices. In practice these indices are manually assembled. Gene name recognition algorithms offer the possibility of automating and expediting the laborious task of building reference indices. Article indices can be built that associate articles with genes based on whether or not the article mentions the gene by name. In addition, gene name recognition is the first step in more detailed sentence-by-sentence text analysis. For example, in Chapter 10 we will talk about identifying relationships between genes from text. Frequently, this requires identifying sentences referring to two gene names, and understanding what sort of relationship the sentence is describing between these genes. Sophisticated natural language processing techniques that parse sentences and infer gene function cannot be applied in a meaningful way without recognizing where the gene names are in the first place. The major concepts of this chapter are presented in the frame box. We begin by describing the commonly used strategies that can be used alone or in concert to identify gene names. At the end of the chapter we introduce one successful name finding algorithm that combines many of the different strategies.
There are several commonly used approaches that can be exploited to recognize gene names in text (Chang, Schütze, et al. 2004). Often these approaches can be combined into even more effective multifaceted algorithms.
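As a rough illustration of the index-building idea in the abstract above (associating an article with a gene when the article mentions the gene by name), here is a minimal Python sketch of plain dictionary matching. It is not the algorithm from the chapter. The synonym dictionary, the article texts, and the function name `build_gene_article_index` are all hypothetical, and real gene names are ambiguous enough that simple matching like this will miss mentions and produce false positives.

```python
import re
from collections import defaultdict

# Hypothetical synonym dictionary: canonical gene symbol -> names it may appear under.
GENE_SYNONYMS = {
    "TP53": ["TP53", "p53", "tumor protein 53"],
    "BRCA1": ["BRCA1", "breast cancer 1"],
}


def build_gene_article_index(articles, synonyms=GENE_SYNONYMS):
    """Map each canonical gene symbol to the set of article ids that mention it."""
    # One whole-word, case-insensitive pattern per gene, covering all its synonyms.
    patterns = {
        gene: re.compile(
            r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b",
            re.IGNORECASE,
        )
        for gene, names in synonyms.items()
    }
    index = defaultdict(set)
    for article_id, text in articles.items():
        for gene, pattern in patterns.items():
            if pattern.search(text):
                index[gene].add(article_id)
    return dict(index)


# Hypothetical article texts, for illustration only.
articles = {
    "a1": "Mutations in p53 are common in human tumors.",
    "a2": "BRCA1 interacts with p53 during DNA repair.",
    "a3": "This article discusses yeast metabolism.",
}
index = build_gene_article_index(articles)
```

Here `index` associates TP53 with articles `a1` and `a2` and BRCA1 with `a2`, while `a3` is indexed under no gene. The same per-sentence matching could, under similar assumptions, be used to find sentences that mention two genes.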


A Natural Language Processing Approach to Measuring Treatment Adherence and Consistency Using Semantic Similarity

AERA Open ◽

10.1177/23328584211028615 ◽

2021 ◽

Vol 7 ◽

pp. 233285842110286

Author(s):

Kylie L. Anglin ◽

Vivian C. Wong ◽

Arielle Boguslav

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Semantic Similarity ◽

Language Processing ◽

Intervention Implementation ◽

Proof Of Concept ◽

Coaching Intervention ◽

Processing Techniques ◽

Teacher Coaching ◽

The Impact

Though there is widespread recognition of the importance of implementation research, evaluators often face intense logistical, budgetary, and methodological challenges in their efforts to assess intervention implementation in the field. This article proposes a set of natural language processing techniques called semantic similarity as an innovative and scalable method of measuring implementation constructs. Semantic similarity methods are an automated approach to quantifying the similarity between texts. By applying semantic similarity to transcripts of intervention sessions, researchers can use the method to determine whether an intervention was delivered with adherence to a structured protocol, and the extent to which an intervention was replicated with consistency across sessions, sites, and studies. This article provides an overview of semantic similarity methods, describes their application within the context of educational evaluations, and provides a proof of concept using an experimental study of the impact of a standardized teacher coaching intervention.
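To make "quantifying the similarity between texts" concrete, the following Python sketch computes cosine similarity over simple word-count vectors. This is only one basic member of the family of semantic similarity methods; the abstract does not say which specific method the authors use. The function names and the example protocol and transcript sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts (0.0 to 1.0)."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no similarity to anything
    return dot / (norm_a * norm_b)


# Hypothetical protocol text and session transcripts, for illustration only.
protocol = "Ask the teacher to restate the goal and model the strategy"
on_script = "I asked the teacher to restate the goal and then model the strategy"
off_script = "We talked about the weather and weekend plans"

adherence_on = cosine_similarity(protocol, on_script)
adherence_off = cosine_similarity(protocol, off_script)
```

Under this sketch, a transcript that follows the protocol scores higher against it than one that drifts off topic, which is the intuition behind using similarity to the protocol as an adherence measure and similarity between sessions as a consistency measure.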


Detecting Malicious Windows Commands Using Natural Language Processing Techniques

Innovative Security Solutions for Information Technology and Communications - Lecture Notes in Computer Science ◽

10.1007/978-3-030-12942-2_13 ◽

2019 ◽

pp. 157-169

Author(s):

Muhammad Mudassar Yamin ◽

Basel Katt

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Processing Techniques


Identification of spam comments using natural language processing techniques

2014 IEEE 10th International Conference on Intelligent Computer Communication and Processing (ICCP) ◽

10.1109/iccp.2014.6936976 ◽

2014 ◽

Cited By ~ 6

Author(s):

Cristina Radulescu ◽

Mihaela Dinsoreanu ◽

Rodica Potolea

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Processing Techniques


Using Natural Language Processing Techniques for Stock Return Predictions

SSRN Electronic Journal ◽

10.2139/ssrn.2940564 ◽

2017 ◽

Cited By ~ 1

Author(s):

Ming Li Chew ◽

Sahil Puri

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Stock Return ◽

Processing Techniques


Identifying and intercepting health misinformation on Reddit dermatology forums with artificially intelligent bots using natural language processing (Preprint)

10.2196/preprints.20975 ◽

2021 ◽

Author(s):

Monique B. Sager ◽

Aditya M. Kashyap ◽

Mila Tamminga ◽

Sadhana Ravoori ◽

Christopher Callison-Burch ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

The United States ◽

Test Accuracy ◽

Limited Data ◽

Test Environment ◽

Data Set ◽

Inappropriate Care ◽

Processing Techniques

BACKGROUND: Reddit, the fifth most popular website in the United States, boasts a large and engaged user base on its dermatology forums where users crowdsource free medical opinions. Unfortunately, much of the advice provided is unvalidated and could lead to inappropriate care. Initial testing has shown that artificially intelligent bots can detect misinformation on Reddit forums and may be able to produce responses to posts containing misinformation. OBJECTIVE: To analyze the ability of bots to find and respond to health misinformation on Reddit’s dermatology forums in a controlled test environment. METHODS: Using natural language processing techniques, we trained bots to target misinformation using relevant keywords and to post pre-fabricated responses. We compared the performance of different model architectures on a held-out test set. RESULTS: Our models yielded test accuracies ranging from 95% to 100%, with a fine-tuned BERT model achieving the highest test accuracy. Bots were then able to post corrective pre-fabricated responses to misinformation. CONCLUSIONS: Using a limited data set, bots had near-perfect ability to detect these examples of health misinformation within Reddit dermatology forums. Given that these bots can then post pre-fabricated responses, this technique may allow for interception of misinformation. Providing correct information, even instantly, however, does not mean users will be receptive or find such interventions persuasive. Further work should investigate this strategy’s effectiveness to inform future deployment of bots as a technique in combating health misinformation. CLINICALTRIAL: N/A


Recent Advances in Conversational Intelligent Tutoring Systems

AI Magazine ◽

10.1609/aimag.v34i3.2485 ◽

2013 ◽

Vol 34 (3) ◽

pp. 42-54 ◽

Cited By ~ 54

Author(s):

Vasile Rus ◽

Sidney D’Mello ◽

Xiangen Hu ◽

Arthur Graesser

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Intelligent Tutoring Systems ◽

Intelligent Tutoring ◽

Individual Student ◽

Learning Progressions ◽

Tutoring Systems ◽

Recent Advances ◽

Processing Techniques

We report recent advances in intelligent tutoring systems with conversational dialogue. We highlight progress in terms of macro- and microadaptivity. Macroadaptivity refers to a system’s capability to select appropriate instructional tasks for the learner to work on. Microadaptivity refers to a system’s capability to adapt its scaffolding while the learner is working on a particular task. The advances in macro- and microadaptivity that are presented here were made possible by the use of learning progressions, deeper dialogue and natural language processing techniques, and by the use of affect-enabled components. Learning progressions and deeper dialogue and natural language processing techniques are key features of DeepTutor, the first intelligent tutoring system based on learning progressions. These improvements extend the bandwidth of possibilities for tailoring instruction to each individual student, which is needed for maximizing engagement and ultimately learning.


Applying Natural Language Processing Techniques to Generate Open Data Web APIs Documentation

Lecture Notes in Computer Science - Web Engineering ◽

10.1007/978-3-030-50578-3_28 ◽

2020 ◽

pp. 416-432

Author(s):

César González-Mora ◽

Cristina Barros ◽

Irene Garrigós ◽

Jose Zubcoff ◽

Elena Lloret ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Open Data ◽

Processing Techniques


Automated Classroom Lecture Note Generation Using Natural Language Processing and Image Processing Techniques

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2019/16852019 ◽

2019 ◽

Vol 8 (5) ◽

pp. 1920-1926

Author(s):

Sandanayake T.C. ◽

Keyword(s):

Image Processing ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Lecture Note ◽

Image Processing Techniques ◽

Processing Techniques


Application of natural language processing methods to extract coded data from administrative data held in the Scottish Prescribing Information System

International Journal for Population Data Science ◽

10.23889/ijpds.v1i1.263 ◽

2017 ◽

Vol 1 (1) ◽

Author(s):

Clifford Nangle ◽

Stuart McTaggart ◽

Margaret MacLeod ◽

Jackie Caldwell ◽

Marion Bennie

Keyword(s):

Information System ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Drug Exposure ◽

Drug Dose ◽

Free Text ◽

Wide Range ◽

The Impact ◽

Prescribing Information

ABSTRACT Objectives: The Prescribing Information System (PIS) datamart, hosted by NHS National Services Scotland, receives around 90 million electronic prescription messages per year from GP practices across Scotland. Prescription messages contain information including drug name, quantity and strength stored as coded, machine-readable data, while prescription dose instructions are unstructured free text and difficult to interpret and analyse in volume. The aim, using Natural Language Processing (NLP), was to extract drug dose amount, unit and frequency metadata from freely typed text in dose instructions to support calculating the intended number of days’ treatment. This then allows comparison with actual prescription frequency, treatment adherence and the impact upon prescribing safety and effectiveness. Approach: An NLP algorithm was developed using the Ciao implementation of Prolog to extract dose amount, unit and frequency metadata from dose instructions held in the PIS datamart for drugs used in the treatment of gastrointestinal, cardiovascular and respiratory disease. Accuracy estimates were obtained by randomly sampling 0.1% of the distinct dose instructions from source records and comparing these with metadata extracted by the algorithm; an iterative approach was used to modify the algorithm to increase accuracy and coverage. Results: The NLP algorithm was applied to 39,943,465 prescription instructions issued in 2014, consisting of 575,340 distinct dose instructions. For drugs used in the gastrointestinal, cardiovascular and respiratory systems (i.e. chapters 1, 2 and 3 of the British National Formulary (BNF)) the NLP algorithm successfully extracted drug dose amount, unit and frequency metadata from 95.1%, 98.5% and 97.4% of prescriptions respectively. However, instructions containing terms such as ‘as directed’ or ‘as required’ reduce the usability of the metadata by making it difficult to calculate the total dose intended for a specific time period: 7.9%, 0.9% and 27.9% of dose instructions contained terms meaning ‘as required’, while 3.2%, 3.7% and 4.0% contained terms meaning ‘as directed’, for drugs used in BNF chapters 1, 2 and 3 respectively. Conclusion: The NLP algorithm developed can extract dose, unit and frequency metadata from text found in prescriptions issued to treat a wide range of conditions, and this information may be used to support calculating treatment durations, medicines adherence and cumulative drug exposure. The presence of terms such as ‘as required’ and ‘as directed’ has a negative impact on the usability of the metadata, and further work is required to determine the level of impact this has on calculating treatment durations and cumulative drug exposure.
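The extraction task described in this abstract (pulling dose amount, unit and frequency out of free-text instructions, then estimating the intended days of treatment) can be sketched with a small regular-expression parser. The study itself used an algorithm written in the Ciao implementation of Prolog; the Python below is a substitute illustration only. The pattern, the vocabulary of units and frequencies, and the function names `parse_dose` and `days_of_treatment` are hypothetical and cover only a few simple phrasings.

```python
import re

# Hypothetical, deliberately tiny vocabularies.
WORD_NUMBERS = {"one": 1.0, "two": 2.0, "three": 3.0, "four": 4.0}
TIMES_PER_DAY = {"once": 1, "twice": 2, "three times": 3, "four times": 4}

DOSE_PATTERN = re.compile(
    r"\b(?P<amount>\d+(?:\.\d+)?|one|two|three|four)\s+"
    r"(?P<unit>tablets?|capsules?|puffs?|ml)\b"
    r".*?\b(?P<freq>once|twice|three times|four times)\s+"
    r"(?:(?:a|per|each)\s+)?(?:daily|day)\b",
    re.IGNORECASE,
)
AS_REQUIRED = re.compile(r"\bas (?:required|needed)\b", re.IGNORECASE)
AS_DIRECTED = re.compile(r"\bas directed\b", re.IGNORECASE)


def parse_dose(instruction):
    """Extract amount, unit and daily frequency from one dose instruction."""
    result = {
        "amount": None,
        "unit": None,
        "per_day": None,
        "as_required": bool(AS_REQUIRED.search(instruction)),
        "as_directed": bool(AS_DIRECTED.search(instruction)),
    }
    match = DOSE_PATTERN.search(instruction)
    if match:
        raw = match.group("amount").lower()
        result["amount"] = WORD_NUMBERS[raw] if raw in WORD_NUMBERS else float(raw)
        result["unit"] = match.group("unit").lower().rstrip("s")  # crude singular form
        result["per_day"] = TIMES_PER_DAY[match.group("freq").lower()]
    return result


def days_of_treatment(quantity, parsed):
    """Intended days of treatment, or None when the daily dose is not fixed."""
    if parsed["amount"] is None or parsed["as_required"] or parsed["as_directed"]:
        return None
    return quantity / (parsed["amount"] * parsed["per_day"])


parsed = parse_dose("Take one tablet twice daily")
days = days_of_treatment(56, parsed)
```

In this sketch a supply of 56 tablets at one tablet twice daily gives 28 days of treatment, whereas an instruction containing 'as required' or 'as directed' yields no duration, which mirrors the usability limitation the abstract reports.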


Natural-language processing and automatic indexing

The Indexer The International Journal of Indexing ◽

10.3828/indexer.1990.17.1.8 ◽

1990 ◽

Vol 17 (1) ◽

pp. 21-29

Author(s):

C. Korycinski ◽

Alan F. Newell

Keyword(s):

Natural Language Processing ◽

Statistical Analysis ◽

Natural Language ◽

Language Processing ◽

Database Systems ◽

Statistical Techniques ◽

Free Text ◽

Automatic Indexing ◽

Text Database ◽

Processing Techniques

The task of producing satisfactory indexes by automatic means has been tackled on two fronts: by statistical analysis of text and by attempting content analysis of the text in much the same way as a human indexer does. Though statistical techniques have a lot to offer for free-text database systems, neither method has had much success with back-of-the-book indexing. This review examines some problems associated with the application of natural-language processing techniques to book texts.
