DICE: A Drug Indication Classification and Encyclopedia for AI-Based Indication Extraction

2021 ◽  
Vol 4 ◽  
Author(s):  
Arjun Bhatt ◽  
Ruth Roberts ◽  
Xi Chen ◽  
Ting Li ◽  
Skylar Connor ◽  
...  

Drug labeling contains an ‘INDICATIONS AND USAGE’ section that provides vital information to support clinical decision making and regulatory management. Effective extraction of drug indication information from free-text resources could facilitate drug repositioning projects and help collect real-world evidence in support of secondary use of approved medicines. To enable AI-powered language models for the extraction of drug indication information, we used manual reading and curation to develop a Drug Indication Classification and Encyclopedia (DICE) based on FDA-approved human prescription drug labeling. A DICE scheme with 7,231 sentences categorized into five classes (indications, contraindications, side effects, usage instructions, and clinical observations) was developed. To further elucidate the utility of the DICE, we developed nine different AI-based classifiers for the prediction of indications on the DICE corpus and comprehensively assessed their performance. We found that the transformer-based language models yielded an average MCC of 0.887, outperforming the word-embedding-based bidirectional long short-term memory (BiLSTM) models (0.862) with a 2.82% improvement on the test set. The best classifiers were also used to extract drug indication information from DrugBank and achieved a high enrichment rate (>0.930) for this task. We found that domain-specific training could provide more explainable models without sacrificing performance, along with better generalization to external validation datasets. Altogether, the proposed DICE could serve as a standard resource for the development and evaluation of task-specific, AI-powered natural language processing (NLP) models.
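To make the task concrete, here is a minimal sketch of five-class sentence classification scored with the Matthews correlation coefficient (MCC), the paper's headline metric. A TF-IDF plus logistic-regression baseline stands in for the paper's transformer and BiLSTM models, and all sentences and labels are invented placeholders:

```python
# Sketch of the 5-class DICE sentence-classification task, scored with
# MCC. The TF-IDF + logistic-regression pipeline is a stand-in baseline,
# not the paper's models; the sentences below are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.pipeline import make_pipeline

CLASSES = ["indication", "contraindication", "side_effect",
           "usage_instruction", "clinical_observation"]

# One invented example sentence per DICE class.
train_sentences = [
    "DrugX is indicated for the treatment of hypertension.",
    "DrugX is contraindicated in patients with severe hepatic impairment.",
    "The most common adverse reactions were nausea and headache.",
    "Take one tablet twice daily with food.",
    "Efficacy was demonstrated in two randomized controlled trials.",
]
train_labels = list(CLASSES)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(train_sentences, train_labels)

test_sentences = [
    "DrugX is indicated for the relief of acute migraine.",
    "Common side effects include dizziness and nausea.",
]
test_labels = ["indication", "side_effect"]
pred = clf.predict(test_sentences)
print("MCC:", matthews_corrcoef(test_labels, pred))
```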

2020 ◽  
Author(s):  
Dennis Shung ◽  
Cynthia Tsay ◽  
Loren Laine ◽  
Prem Thomas ◽  
Caitlin Partridge ◽  
...  

Background and Aim Guidelines recommend risk stratification scores for patients presenting with gastrointestinal bleeding (GIB), but such scores are uncommonly employed in practice. Automation and deployment of risk stratification scores in real time within electronic health records (EHRs) would overcome a major impediment. This requires an automated mechanism to accurately identify (“phenotype”) patients with GIB at the time of presentation. The goal is to identify patients with acute GIB by developing and evaluating EHR-based phenotyping algorithms for emergency department (ED) patients. Methods We specified criteria using structured data elements to create rules for identifying patients, and also developed a natural-language-processing (NLP)-based algorithm for automated phenotyping of patients; we tested both with tenfold cross-validation (n=7144) and external validation (n=2988), and compared them with the standard method for encoding patient conditions in the EHR, the Systematized Nomenclature of Medicine (SNOMED). The gold standard for GIB diagnosis was independent dual manual review of medical records. The primary outcome was positive predictive value (PPV). Results A decision rule using GIB-specific terms from ED triage and from the ED review-of-systems assessment performed better than SNOMED on internal validation (PPV=91% [90%-93%] vs. 74% [71%-76%], P<0.001) and external validation (PPV=85% [84%-87%] vs. 69% [67%-71%], P<0.001). The NLP algorithm (external validation PPV=80% [79%-82%]) was not superior to the decision rule based on structured data fields. Conclusions An automated decision rule employing GIB-specific triage and review-of-systems terms can be used to trigger EHR-based deployment of risk stratification models to guide clinical decision-making in real time for patients with acute GIB presenting to the ED.
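A minimal sketch of the kind of structured-data decision rule described: flag a visit when GIB-specific terms appear in the triage note or review-of-systems text, then score the rule by PPV against chart review. The term list and records here are illustrative, not the study's actual value set:

```python
# Hedged sketch of a term-based GIB phenotyping rule scored by PPV.
GIB_TERMS = {"hematemesis", "melena", "hematochezia", "gi bleed",
             "coffee-ground emesis", "bright red blood per rectum"}

def flag_gib(triage_text: str, ros_text: str) -> bool:
    """Flag the visit if any GIB-specific term appears in either field."""
    text = f"{triage_text} {ros_text}".lower()
    return any(term in text for term in GIB_TERMS)

# (visit_id, triage note, ROS text, chart-review gold label) -- invented.
visits = [
    ("v1", "Melena x2 days, dizzy", "GI: positive for melena", True),
    ("v2", "Chest pain", "GI: negative", False),
    ("v3", "Coffee-ground emesis this AM", "", True),
]

flagged = [gold for _, tri, ros, gold in visits if flag_gib(tri, ros)]
ppv = sum(flagged) / len(flagged)  # true positives / all flagged
print(f"PPV = {ppv:.2f} over {len(flagged)} flagged visits")
```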


Assessment ◽  
2021 ◽  
pp. 107319112199646
Author(s):  
Olivia Gratz ◽  
Duncan Vos ◽  
Megan Burke ◽  
Neelkamal Soares

To date, there is a paucity of research applying natural language processing (NLP) to the open-ended responses of behavior rating scales. Using three NLP lexicons for sentiment analysis of open-ended responses from the Behavior Assessment System for Children, Third Edition, the researchers found a moderately positive correlation between the human composite rating and the sentiment score under each lexicon for strengths comments, and a slightly positive correlation for concerns comments made by guardians and teachers. In addition, the researchers found that as word count increased for open-ended responses about a child’s strengths, sentiment ratings grew more positive. Conversely, as word count increased for responses about concerns, human raters scored comments more negatively. The authors offer a proof of concept for using NLP-based sentiment analysis of open-ended comments to complement other data in clinical decision making.
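A toy sketch of lexicon-based sentiment scoring of the kind used here, correlated against human ratings. The real study applied three established NLP lexicons; the mini-lexicon, comments, and composite ratings below are invented:

```python
# Hedged sketch: score comments with a word-valence lexicon, then
# correlate the scores with (hypothetical) human composite ratings.
from scipy.stats import pearsonr

LEXICON = {"kind": 1.0, "bright": 1.0, "helpful": 1.0, "friendly": 0.8,
           "aggressive": -1.0, "disruptive": -1.0, "struggles": -0.6}

def sentiment(comment: str) -> float:
    """Sum the valence of each known word; unknown words score 0."""
    return sum(LEXICON.get(w, 0.0) for w in comment.lower().split())

comments = ["kind bright and helpful with peers",
            "friendly but struggles with transitions",
            "aggressive and disruptive in class"]
human_ratings = [2.0, 0.5, -2.0]   # invented composite ratings

scores = [sentiment(c) for c in comments]
r, p = pearsonr(human_ratings, scores)
print(f"lexicon vs. human ratings: r = {r:.2f} (p = {p:.3f})")
```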


2015 ◽  
Vol 22 (6) ◽  
pp. 1220-1230 ◽  
Author(s):  
Huan Mo ◽  
William K Thompson ◽  
Luke V Rasmussen ◽  
Jennifer A Pacheco ◽  
Guoqian Jiang ◽  
...  

Abstract Background Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM). Methods A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms. Results We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility. Conclusion A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages.
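As a rough illustration of desiderata 4 through 6, here is a small sketch of phenotype criteria expressed as structured rules over coded events, composed with set operations, plus a simple temporal constraint. The codes, patients, and dates are invented; a real PheRM would sit on a common data model with standardized terminologies (desiderata 2 and 7):

```python
# Hedged sketch: phenotype criteria as structured rules combined with
# set operations and a temporal relation between events.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Event:
    patient_id: str
    code: str        # e.g., a diagnosis or medication code (invented)
    when: date

events = [
    Event("p1", "E11", date(2014, 3, 1)),        # type 2 diabetes dx
    Event("p1", "metformin", date(2014, 4, 2)),  # medication order
    Event("p2", "E11", date(2014, 6, 5)),
    Event("p3", "metformin", date(2014, 1, 9)),
]

def patients_with(code: str) -> set[str]:
    return {e.patient_id for e in events if e.code == code}

def dx_before_rx(dx: str, rx: str) -> set[str]:
    """Patients whose diagnosis precedes the medication order."""
    return {e.patient_id for e in events if e.code == dx
            and any(m.patient_id == e.patient_id and m.code == rx
                    and m.when > e.when for m in events)}

# Relational-algebra-style composition: intersection of two criteria.
cases = patients_with("E11") & dx_before_rx("E11", "metformin")
print(cases)   # {'p1'}
```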


2020 ◽  
Vol 14 ◽  
pp. 117954682095341 ◽  
Author(s):  
Todd C Villines ◽  
Mark J Cziraky ◽  
Alpesh N Amin

Real-world evidence (RWE) provides a potentially rich source of information to complement the body of data available from randomized clinical trials (RCTs), but there is a need to understand the strengths and limitations of RWE before it can be applied to clinical practice. To gain insight into current thinking on clinical decision making and the utility of different data sources, a representative sample of US cardiologists, drawn from the current active Fellows of the American College of Cardiology (ACC), was surveyed to evaluate their perceptions of findings from RCTs and RWE studies and their application in clinical practice. The survey was conducted online via the ACC web portal between 12 July and 11 August 2017. Of the 548 active ACC Fellows invited as panel members, 173 completed the survey (32% response rate), most of whom were board certified in general cardiology (n = 119, 69%) or interventional cardiology (n = 40, 23%). The survey results indicated a wide range of familiarity with and utilization of RWE amongst cardiologists. Most cardiologists were familiar with RWE and considered RWE in clinical practice at least some of the time. However, a significant minority of respondents had rarely or never applied RWE learnings in their clinical practice, and many did not feel confident in the results of RWE other than registry data. These findings suggest that additional education on how to assess and interpret RWE could help physicians integrate data and learnings from RCTs and RWE to best guide clinical decision making.


2021 ◽  
Vol 28 (1) ◽  
pp. e100267
Author(s):  
Keerthi Harish ◽  
Ben Zhang ◽  
Peter Stella ◽  
Kevin Hauck ◽  
Marwa M Moussa ◽  
...  

Objectives Predictive studies play important roles in the development of models informing care for patients with COVID-19. Our concern is that studies producing ill-performing models may lead to inappropriate clinical decision-making. Thus, our objective is to summarise and characterise the performance of prognostic models for COVID-19 on external data. Methods We performed a validation of parsimonious prognostic models for patients with COVID-19 identified by a literature search of published and preprint articles. Ten models meeting inclusion criteria were either (a) externally validated with our data using the reported model variables and weights or (b) rebuilt using the original features if no weights were provided. Nine studies had internally or externally validated their models on cohorts of between 18 and 320 inpatients with COVID-19; one model used cross-validation. Our external validation cohort consisted of 4444 patients with COVID-19 hospitalised between 1 March and 27 May 2020. Results Most models failed validation when applied to our institution’s data. Included studies reported an average validation area under the receiver operating characteristic curve (AUROC) of 0.828. Models applied with their reported features averaged an AUROC of 0.66 when validated on our data; models rebuilt with the same features averaged an AUROC of 0.755. In both cases, models failed to reproduce their studies’ reported AUROC values. Discussion Published and preprint prognostic models for patients infected with COVID-19 performed substantially worse when applied to external data. Further inquiry is required to elucidate the mechanisms underlying these performance deviations. Conclusions Clinicians should employ caution when applying models for clinical prediction without careful validation on local data.
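A minimal sketch of the external-validation step described: apply a published model's reported variables and weights to local data and recompute the AUROC. The logistic coefficients and the toy cohort below are invented:

```python
# Hedged sketch: external validation of a published logistic model.
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical published model: intercept plus weights on age (years),
# CRP (mg/L), and oxygen saturation (%). All coefficients invented.
INTERCEPT = -4.0
WEIGHTS = {"age": 0.04, "crp": 0.01, "spo2": -0.03}

def predicted_risk(row: dict) -> float:
    """Logistic model: sigmoid of the linear predictor."""
    z = INTERCEPT + sum(WEIGHTS[k] * row[k] for k in WEIGHTS)
    return 1.0 / (1.0 + np.exp(-z))

cohort = [  # local validation data (invented)
    {"age": 80, "crp": 150, "spo2": 88, "died": 1},
    {"age": 45, "crp": 20,  "spo2": 97, "died": 0},
    {"age": 70, "crp": 90,  "spo2": 92, "died": 1},
    {"age": 55, "crp": 30,  "spo2": 95, "died": 0},
]

y_true = [r["died"] for r in cohort]
y_score = [predicted_risk(r) for r in cohort]
print("external AUROC:", roc_auc_score(y_true, y_score))
```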


2020 ◽  
Vol 27 (6) ◽  
pp. 917-923
Author(s):  
Liqin Wang ◽  
Suzanne V Blackley ◽  
Kimberly G Blumenthal ◽  
Sharmitha Yerneni ◽  
Foster R Goss ◽  
...  

Abstract Objective Incomplete and static reaction picklists in the allergy module led to free-text and missing entries that inhibited the clinical decision support intended to prevent adverse drug reactions. We developed a novel, data-driven, “dynamic” reaction picklist to improve allergy documentation in the electronic health record (EHR). Materials and Methods We split 3 decades of allergy entries in the EHR of a large Massachusetts healthcare system into development and validation datasets. We consolidated duplicate allergens and those with the same ingredients or allergen groups. We created a reaction value set via expert review of a previously developed value set and then applied natural language processing to reconcile reactions from structured and free-text entries. Three association rule-mining measures were used to develop a comprehensive reaction picklist dynamically ranked by allergen. The dynamic picklist was assessed using recall at the top k suggested reactions, comparing performance to the static picklist. Results The modified reaction value set contained 490 reaction concepts. Among the 4 234 327 allergy entries collected, 7463 unique consolidated allergens and 469 unique reactions were identified. Of the 3 dynamic reaction picklists developed, the one with the optimal ranking achieved recall of 0.632, 0.763, and 0.822 at the top 5, 10, and 15 suggestions, respectively, significantly outperforming the static reaction picklist ranked by reaction frequency. Conclusion The dynamic reaction picklist developed using EHR data and a statistical measure was superior to the static picklist and suggested appropriate reactions for allergy documentation. Further studies might evaluate its usability and impact on allergy documentation in the EHR.
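A short sketch of the recall-at-top-k evaluation used to compare picklists: for each documented (allergen, reaction) pair, check whether the reaction appears among the top k suggestions for that allergen. The rankings and test pairs below are invented:

```python
# Hedged sketch: recall@k for per-allergen ranked reaction suggestions.
def recall_at_k(ranked: dict[str, list[str]],
                pairs: list[tuple[str, str]], k: int) -> float:
    """Fraction of documented pairs whose reaction is in the top k."""
    hits = sum(reaction in ranked.get(allergen, [])[:k]
               for allergen, reaction in pairs)
    return hits / len(pairs)

dynamic_picklist = {  # invented per-allergen rankings
    "penicillin": ["hives", "rash", "anaphylaxis", "itching", "swelling"],
    "ibuprofen":  ["rash", "GI upset", "swelling", "hives", "dyspnea"],
}
test_pairs = [("penicillin", "anaphylaxis"), ("ibuprofen", "dyspnea"),
              ("penicillin", "rash")]

for k in (5, 10, 15):
    print(f"recall@{k} = {recall_at_k(dynamic_picklist, test_pairs, k):.3f}")
```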


CJEM ◽  
2020 ◽  
Vol 22 (S1) ◽  
pp. S90-S90
Author(s):  
A. Kirubarajan ◽  
A. Taher ◽  
S. Khan ◽  
S. Masood

Introduction: The study of artificial intelligence (AI) in medicine has become increasingly popular over the last decade. The emergency department (ED) is uniquely situated to benefit from AI due to its power of diagnostic prediction and its ability to continuously improve with time. However, there is a lack of understanding of the breadth and scope of AI applications in emergency medicine, and of the evidence supporting their use. Methods: Our scoping review was completed according to PRISMA-ScR guidelines and was published a priori on the Open Science Framework. We systematically searched databases (Medline-OVID, EMBASE, CINAHL, and IEEE) for AI interventions relevant to the ED. Study selection and data extraction were performed independently by two investigators. We categorized studies based on type of AI model used, location of intervention, clinical focus, intervention sub-type, and type of comparator. Results: Of the 1483 original database citations, a total of 181 studies were included in the scoping review. Inter-rater reliability for study screening was 89.1% for titles and abstracts and 77.8% for full-text review. Overall, we found that 44 (24.3%) studies utilized supervised learning, 63 (34.8%) evaluated unsupervised learning, and 13 (7.2%) utilized natural language processing. 17 (9.4%) studies were conducted in the pre-hospital environment, with the remainder occurring either in the ED or the trauma bay. The majority of interventions centered around prediction (n = 73, 40.3%); 48 studies (25.5%) analyzed AI interventions for diagnosis, and 23 (12.7%) interventions focused on diagnostic imaging. 89 (49.2%) studies did not have a comparator to their AI intervention; 63 (34.8%) used statistical models as a comparator, 19 (10.5%) of which were clinical decision-making tools; and 15 (8.3%) used humans as comparators, with 12 of the 15 (80%) showing superiority in favour of the AI intervention when compared to a human. Conclusion: AI-related research is rapidly increasing in emergency medicine. AI interventions are heterogeneous in both purpose and design, but primarily focus on predictive modeling. Most studies do not involve a human comparator and lack information on patient-oriented outcomes. While some studies show promising results for AI-based interventions, there remains uncertainty regarding their superiority over standard practice, and further research is needed prior to clinical implementation.


BMJ Open ◽  
2019 ◽  
Vol 9 (12) ◽  
pp. e033374 ◽  
Author(s):  
Daniela Balzi ◽  
Giulia Carreras ◽  
Francesco Tonarelli ◽  
Luca Degli Esposti ◽  
Paola Michelozzi ◽  
...  

Objective Identification of older patients at risk, among those accessing the emergency department (ED), may support clinical decision-making. To this purpose, we developed and validated the Dynamic Silver Code (DSC), a score based on real-time linkage of administrative data. Design and setting The ‘Silver Code National Project (SCNP)’, a non-concurrent cohort study, was used for retrospective development and internal validation of the DSC. External validation was obtained in the ‘Anziani in DEA (AIDEA)’ concurrent cohort study, where the DSC was generated by the software routinely used in the ED. Participants The SCNP contained 281 321 records of 180 079 residents aged 75+ years from Tuscany and Lazio, Italy, admitted via the ED to Internal Medicine or Geriatrics units. The AIDEA study enrolled 4425 subjects aged 75+ years (5217 records) accessing two EDs in the area of Florence, Italy. Interventions None. Outcome measures Primary outcome: 1-year mortality. Secondary outcomes: 7- and 30-day mortality and 1-year recurrent ED visits. Results Advancing age, male gender, previous hospital admission, discharge diagnosis, time from discharge and polypharmacy predicted 1-year mortality and contributed to the DSC in the development subsample of the SCNP cohort. Based on score quartiles, participants were classified into low, medium, high and very high-risk classes. In the SCNP validation sample, mortality increased progressively from 144 to 367 per 1000 person-years across DSC classes, with HRs (95% CI) of 1.92 (1.85 to 1.99), 2.71 (2.61 to 2.81) and 5.40 (5.21 to 5.59) in classes II, III and IV, respectively, versus class I (p<0.001). Findings were similar in AIDEA, where the DSC also predicted recurrent ED visits within 1 year. In both databases, the DSC predicted 7- and 30-day mortality. Conclusions The DSC, based on administrative data available in real time, predicts the prognosis of older patients and might improve their management in the ED.
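A brief sketch of the quartile-based risk classification: cut the development-sample score distribution at its quartiles and map each patient's score to class I-IV. The simulated score distribution stands in for the actual DSC:

```python
# Hedged sketch: assign risk classes from development-sample quartiles.
import numpy as np

dev_scores = np.random.default_rng(0).normal(50, 15, size=1000)  # invented
q1, q2, q3 = np.percentile(dev_scores, [25, 50, 75])

def risk_class(score: float) -> str:
    if score <= q1:
        return "I (low)"
    if score <= q2:
        return "II (medium)"
    if score <= q3:
        return "III (high)"
    return "IV (very high)"

for s in (32.0, 49.0, 58.0, 80.0):
    print(s, "->", risk_class(s))
```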


Author(s):  
Tal Linzen ◽  
Emmanuel Dupoux ◽  
Yoav Goldberg

The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities. Linguistic regularities are often sensitive to syntactic structure; can such dependencies be captured by LSTMs, which do not have explicit structural representations? We begin addressing this question using number agreement in English subject-verb dependencies. We probe the architecture’s grammatical competence both using training objectives with an explicit grammatical target (number prediction, grammaticality judgments) and using language models. In the strongly supervised settings, the LSTM achieved very high overall accuracy (less than 1% errors), but errors increased when sequential and structural information conflicted. The frequency of such errors rose sharply in the language-modeling setting. We conclude that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.
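A toy sketch of the number-prediction objective: given the words up to, but not including, the verb, an LSTM predicts whether the upcoming verb should be singular or plural. The vocabulary, sentences, and hyperparameters are invented stand-ins for the paper's corpus-scale setup (PyTorch assumed):

```python
# Hedged sketch of the number-prediction task on attractor sentences
# ("the key to the cabinets" takes a singular verb despite the
# intervening plural noun). Data and model size are toy-scale.
import torch
import torch.nn as nn

sents = [("the key to the cabinets".split(), 0),   # 0 = singular verb
         ("the keys to the cabinet".split(), 1),   # 1 = plural verb
         ("the dog near the trees".split(), 0),
         ("the dogs near the tree".split(), 1)]
vocab = {w: i for i, w in
         enumerate(sorted({w for s, _ in sents for w in s}))}

class NumberPredictor(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, 2)

    def forward(self, ids):
        h, _ = self.lstm(self.emb(ids))
        return self.out(h[:, -1])   # predict from the final hidden state

model = NumberPredictor(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()
for _ in range(100):
    for words, label in sents:
        ids = torch.tensor([[vocab[w] for w in words]])
        loss = loss_fn(model(ids), torch.tensor([label]))
        opt.zero_grad()
        loss.backward()
        opt.step()

ids = torch.tensor([[vocab[w] for w in "the key to the cabinets".split()]])
print("P(plural verb) =", model(ids).softmax(-1)[0, 1].item())
```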

