Generating real-world evidence from unstructured clinical notes to examine clinical utility of genetic tests: use case in BRCAness

Yiqing Zhao; Saravut J. Weroha; Ellen L. Goode; Hongfang Liu; Chen Wang

doi:10.1186/s12911-020-01364-y

Generating real-world evidence from unstructured clinical notes to examine clinical utility of genetic tests: use case in BRCAness

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01364-y ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Yiqing Zhao ◽

Saravut J. Weroha ◽

Ellen L. Goode ◽

Hongfang Liu ◽

Chen Wang

Keyword(s):

Targeted Therapy ◽

Data Quality ◽

Real World ◽

Genetic Information ◽

Genetic Data ◽

Real World Data ◽

Rule Based ◽

Clinical Notes ◽

Real World Evidence ◽

F Measure

Abstract Background Next-generation sequencing provides comprehensive information about individuals’ genetic makeup and is commonplace in oncology clinical practice. However, the utility of genetic information in the clinical decision-making process has not been examined extensively from a real-world, data-driven perspective. By mining real-world data (RWD) from clinical notes, we can extract patients’ genetic information and further associate treatment decisions with that information. Methods We proposed a real-world evidence (RWE) study framework that incorporates context-based natural language processing (NLP) methods and data quality examination before the final association analysis. The framework was demonstrated in a Foundation-tested cohort of women with cancer (N = 196). After retrieving patients’ genetic information with the NLP system, we assessed the completeness of the genetic data captured in unstructured clinical notes against a genetic data model. We examined the distribution of different topics regarding BRCA1/2 throughout patients’ treatment, and then analyzed the association between BRCA1/2 mutation status and the discussion or prescription of targeted therapy. Results We identified seven topics in the clinical context of genetic mentions: Information, Evaluation, Insurance, Order, Negative, Positive, and Variants of unknown significance (VUS). Our rule-based system achieved a precision of 0.87, recall of 0.93, and F-measure of 0.91. Our machine learning system achieved a precision of 0.901, recall of 0.899, and F-measure of 0.9 for four-topic classification, and a precision of 0.833, recall of 0.823, and F-measure of 0.82 for seven-topic classification. In result-containing sentences, the capture rate of BRCA1/2 mutation information was 75%, but detailed variant information (e.g., variant types) was largely missing. Using cleaned RWD, we found significant associations between positive BRCA1/2 mutation status and targeted therapies. 
Conclusions We demonstrated a framework for generating RWE from RWD drawn from different clinical sources. The rule-based NLP system achieved the best performance in resolving contextual variability when extracting RWD from unstructured clinical notes. Data quality issues such as incompleteness and discrepancies exist, so manual data cleaning is needed before further analysis can be performed. Finally, we used the cleaned RWD to evaluate the real-world utility of genetic information in initiating a prescription of targeted therapy.
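The abstract describes a rule-based system that assigns each sentence mentioning BRCA1/2 to one of seven context topics. The sketch below illustrates that general idea with ordered regular-expression rules where the first match wins. The patterns, their priority order, and the "Information" fallback are hypothetical illustrations, not the authors' actual rules.

```python
import re

# Hypothetical keyword rules, checked in order; the first matching rule wins.
# Order matters: VUS is tested before generic result words, and Negative
# before Positive so that "no pathogenic mutation" is not read as positive.
TOPIC_RULES = [
    ("VUS", re.compile(r"\b(vus|variants? of (unknown|uncertain) significance)\b", re.I)),
    ("Negative", re.compile(r"\b(negative|no (pathogenic |deleterious )?mutations?)\b", re.I)),
    ("Positive", re.compile(r"\b(positive|pathogenic|deleterious)\b", re.I)),
    ("Insurance", re.compile(r"\b(insurance|coverage|prior authori[sz]ation)\b", re.I)),
    ("Order", re.compile(r"\b(ordered|order|sent)\b", re.I)),
    ("Evaluation", re.compile(r"\b(referr(ed|al)|genetic counsel(ing|or)|evaluat(e|ion))\b", re.I)),
]
# Matches "BRCA", "BRCA1", "BRCA2", and the "BRCA1" in "BRCA1/2".
GENE_MENTION = re.compile(r"\bBRCA[12]?\b", re.I)


def classify_sentence(sentence):
    """Return a topic label for a sentence mentioning BRCA1/2, or None if no mention."""
    if not GENE_MENTION.search(sentence):
        return None
    for topic, pattern in TOPIC_RULES:
        if pattern.search(sentence):
            return topic
    # Hypothetical fallback: a gene mention without a result or action cue.
    return "Information"


print(classify_sentence("BRCA1/2 testing was negative."))  # Negative
```

Ordered rules keep the system transparent and easy to audit, which is one reason rule-based approaches can cope well with the contextual variability of clinical notes; a real system would need far richer patterns and negation handling.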

Generating Real-World Evidence from Unstructured Clinical Notes to Examine Clinical Utility of Genetic Tests: Use Case in BRCAness

10.21203/rs.3.rs-41553/v1 ◽

2020 ◽

Author(s):

Yiqing ZHAO ◽

Saravut J Weroha ◽

Ellen Goode ◽

Hongfang Liu ◽

Chen Wang

Keyword(s):

Targeted Therapy ◽

Data Quality ◽

Real World ◽

Genetic Information ◽

Genetic Data ◽

Real World Data ◽

Rule Based ◽

Clinical Notes ◽

Real World Evidence ◽

F Measure

Abstract Background: Next-generation sequencing provides comprehensive information about individuals’ genetic makeup and is commonplace in oncology clinical practice. However, the utility of genetic information in the clinical decision-making process has not been examined extensively from a real-world, data-driven perspective. By mining real-world data (RWD) from clinical notes, we can extract patients’ genetic information and further associate treatment decisions with that information. Methods: We proposed a real-world evidence (RWE) study framework that incorporates context-based natural language processing (NLP) methods and data quality examination before the final association analysis. The framework was demonstrated on a Foundation-tested cohort of women with cancer (N = 196). After retrieving patients’ genetic information with the NLP system, we assessed the completeness of the genetic data captured in unstructured clinical notes against a genetic data model. We examined the distribution of different topics regarding BRCA1/2 throughout patients’ treatment, and then analyzed the association between BRCA1/2 mutation status and the discussion or prescription of targeted therapy. Results: We identified seven topics in the clinical context of genetic mentions: Information, Evaluation, Insurance, Order, Negative, Positive, and Variants of unknown significance (VUS). Our rule-based system achieved a precision of 0.87, recall of 0.93, and F-measure of 0.91. Our machine learning system achieved a precision of 0.901, recall of 0.899, and F-measure of 0.9 for four-topic classification, and a precision of 0.833, recall of 0.823, and F-measure of 0.82 for seven-topic classification. In result-containing sentences, the capture rate of BRCA1/2 mutation information was 75%, but detailed variant information (e.g., variant types) was largely missing. 
Using cleaned RWD, we found significant associations between positive BRCA1/2 mutation status and targeted therapies. Conclusions: We demonstrated a framework for generating RWE from RWD drawn from different clinical sources. The rule-based NLP system achieved the best performance in resolving contextual variability when extracting RWD from unstructured clinical notes. Data quality issues such as incompleteness and discrepancies exist, so manual data cleaning is needed before further analysis can be performed. Finally, we used the cleaned RWD to evaluate the real-world utility of genetic information in initiating a prescription of targeted therapy.
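The F-measure in the abstract above is presumably the F1 score, the harmonic mean of precision and recall. The sketch below computes it from the four-topic machine learning figures. The rule-based and seven-topic F-measures may have been averaged across topics (for example, macro-averaged). The abstract does not say, and if so the harmonic mean of the pooled precision and recall need not reproduce them.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Four-topic machine learning results reported in the abstract.
print(round(f_measure(0.901, 0.899), 3))  # 0.9
```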

Generating Real-world Evidence from Unstructured Clinical Notes to Examine Clinical Utility of Genetic Tests: Use Case in BRCAness

10.21203/rs.3.rs-41553/v2 ◽

2020 ◽

Author(s):

Yiqing ZHAO ◽

Saravut J Weroha ◽

Ellen Goode ◽

Hongfang Liu ◽

Chen Wang

Keyword(s):

Targeted Therapy ◽

Data Quality ◽

Real World ◽

Genetic Information ◽

Genetic Data ◽

Real World Data ◽

Rule Based ◽

Clinical Notes ◽

Real World Evidence ◽

F Measure

Abstract Background: Next-generation sequencing provides comprehensive information about individuals’ genetic makeup and is commonplace in oncology clinical practice. However, the utility of genetic information in the clinical decision-making process has not been examined extensively from a real-world, data-driven perspective. By mining real-world data (RWD) from clinical notes, we can extract patients’ genetic information and further associate treatment decisions with that information. Methods: We proposed a real-world evidence (RWE) study framework that incorporates context-based natural language processing (NLP) methods and data quality examination before the final association analysis. The framework was demonstrated in a Foundation-tested cohort of women with cancer (N = 196). After retrieving patients’ genetic information with the NLP system, we assessed the completeness of the genetic data captured in unstructured clinical notes against a genetic data model. We examined the distribution of different topics regarding BRCA1/2 throughout patients’ treatment, and then analyzed the association between BRCA1/2 mutation status and the discussion or prescription of targeted therapy. Results: We identified seven topics in the clinical context of genetic mentions: Information, Evaluation, Insurance, Order, Negative, Positive, and Variants of unknown significance (VUS). Our rule-based system achieved a precision of 0.87, recall of 0.93, and F-measure of 0.91. Our machine learning system achieved a precision of 0.901, recall of 0.899, and F-measure of 0.9 for four-topic classification, and a precision of 0.833, recall of 0.823, and F-measure of 0.82 for seven-topic classification. In result-containing sentences, the capture rate of BRCA1/2 mutation information was 75%, but detailed variant information (e.g., variant types) was largely missing. Using cleaned RWD, we found significant associations between positive BRCA1/2 mutation status and targeted therapies. 
Conclusions: We demonstrated a framework for generating RWE from RWD drawn from different clinical sources. The rule-based NLP system achieved the best performance in resolving contextual variability when extracting RWD from unstructured clinical notes. Data quality issues such as incompleteness and discrepancies exist, so manual data cleaning is needed before further analysis can be performed. Finally, we used the cleaned RWD to evaluate the real-world utility of genetic information in initiating a prescription of targeted therapy.
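The abstract above reports significant associations between positive BRCA1/2 status and targeted therapy, but it does not name the statistical test. For a 2×2 table with small counts, a common choice is Fisher's exact test. The standard-library sketch below is an assumption about how such an association could be tested, not the authors' stated method.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Example layout: rows = BRCA1/2 positive vs. not positive,
    columns = targeted therapy yes vs. no.
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Textbook "lady tasting tea" table, not study data.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

With larger cohorts, or to adjust for covariates, a chi-squared test or logistic regression could be used instead.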

Generating Real-world Evidence from Unstructured Clinical Notes to Examine Clinical Utility of Genetic Tests: Use Case in BRCAness

10.21203/rs.3.rs-41553/v3 ◽

2020 ◽

Author(s):

Yiqing ZHAO ◽

Saravut J Weroha ◽

Ellen Goode ◽

Hongfang Liu ◽

Chen Wang

Keyword(s):

Targeted Therapy ◽

Data Quality ◽

Real World ◽

Genetic Information ◽

Genetic Data ◽

Real World Data ◽

Rule Based ◽

Clinical Notes ◽

Real World Evidence ◽

F Measure

Abstract Background: Next-generation sequencing provides comprehensive information about individuals’ genetic makeup and is commonplace in oncology clinical practice. However, the utility of genetic information in the clinical decision-making process has not been examined extensively from a real-world, data-driven perspective. By mining real-world data (RWD) from clinical notes, we can extract patients’ genetic information and further associate treatment decisions with that information. Methods: We proposed a real-world evidence (RWE) study framework that incorporates context-based natural language processing (NLP) methods and data quality examination before the final association analysis. The framework was demonstrated in a Foundation-tested cohort of women with cancer (N = 196). After retrieving patients’ genetic information with the NLP system, we assessed the completeness of the genetic data captured in unstructured clinical notes against a genetic data model. We examined the distribution of different topics regarding BRCA1/2 throughout patients’ treatment, and then analyzed the association between BRCA1/2 mutation status and the discussion or prescription of targeted therapy. Results: We identified seven topics in the clinical context of genetic mentions: Information, Evaluation, Insurance, Order, Negative, Positive, and Variants of unknown significance (VUS). Our rule-based system achieved a precision of 0.87, recall of 0.93, and F-measure of 0.91. Our machine learning system achieved a precision of 0.901, recall of 0.899, and F-measure of 0.9 for four-topic classification, and a precision of 0.833, recall of 0.823, and F-measure of 0.82 for seven-topic classification. In result-containing sentences, the capture rate of BRCA1/2 mutation information was 75%, but detailed variant information (e.g., variant types) was largely missing. 
Using cleaned RWD, we found significant associations between positive BRCA1/2 mutation status and targeted therapies. Conclusions: We demonstrated a framework for generating RWE from RWD drawn from different clinical sources. The rule-based NLP system achieved the best performance in resolving contextual variability when extracting RWD from unstructured clinical notes. Data quality issues such as incompleteness and discrepancies exist, so manual data cleaning is needed before further analysis can be performed. Finally, we used the cleaned RWD to evaluate the real-world utility of genetic information in initiating a prescription of targeted therapy.
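The abstract above assesses the completeness of NLP-extracted genetic data against a genetic data model. It finds mutation status often captured but variant details largely missing. The sketch below computes per-field completeness over extracted records. The field names are hypothetical stand-ins for the data model's elements, and the records are toy examples, not study data.

```python
from typing import Dict, List, Optional

# Hypothetical data-model fields for one genetic test result mention.
FIELDS = ["gene", "mutation_status", "variant_type", "hgvs_notation"]


def field_completeness(records: List[Dict[str, Optional[str]]]) -> Dict[str, float]:
    """Fraction of records in which each data-model field is populated (non-empty)."""
    if not records:
        return {f: 0.0 for f in FIELDS}
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / len(records)
        for f in FIELDS
    }


# Toy records for illustration only.
records = [
    {"gene": "BRCA1", "mutation_status": "positive", "variant_type": "frameshift"},
    {"gene": "BRCA2", "mutation_status": "negative"},
    {"gene": "BRCA1", "mutation_status": "positive", "hgvs_notation": ""},
    {"gene": "BRCA2"},
]
print(field_completeness(records))
# {'gene': 1.0, 'mutation_status': 0.75, 'variant_type': 0.25, 'hgvs_notation': 0.0}
```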

Increasing Trust in Real-World Evidence Through Evaluation of Observational Data Quality

10.1101/2021.03.25.21254341 ◽

2021 ◽

Author(s):

Clair Blacketer ◽

Frank J Defalco ◽

Patrick B Ryan ◽

Peter R Rijnbeek

Keyword(s):

Data Quality ◽

Real World ◽

R Package ◽

Observational Research ◽

Real World Data ◽

Quality Reporting ◽

Healthcare Data ◽

Real World Evidence ◽

Quality Checks

Advances in standardization of observational healthcare data have enabled methodological breakthroughs, rapid global collaboration, and generation of real-world evidence to improve patient outcomes. Standardization of data structure, such as the use of Common Data Models (CDM), needs to be coupled with standardized approaches to data quality assessment. To ensure confidence in real-world evidence generated from the analysis of real-world data, one must first have confidence in the data itself. The Data Quality Dashboard is an open-source R package that reports potential quality issues in an OMOP CDM instance through the systematic execution and summarization of over 3,300 configurable data quality checks. We describe the implementation of check types across a data quality framework of conformance, completeness, and plausibility, with both verification and validation. We illustrate how data quality checks, paired with decision thresholds, can be configured to customize data quality reporting across a range of observational health data sources. We discuss how data quality reporting can become part of the overall real-world evidence generation and dissemination process to promote transparency and build confidence in the resulting output. Transparently communicating how well CDM-standardized databases adhere to a set of quality measures adds a crucial piece that is currently missing from observational research. Assessing and improving the quality of our data will inherently improve the quality of the evidence we generate.
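The Data Quality Dashboard itself is an R package, and the Python sketch below does not use or reproduce its API. It only illustrates the idea described above of pairing a check with a decision threshold. The sketch counts the rows that violate a rule and expresses them as a percentage of the denominator. The check fails when that percentage exceeds the configured threshold. The check name, the year-of-birth rule, and the 5% threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


@dataclass
class CheckResult:
    name: str
    num_violated: int
    num_denominator: int
    pct_violated: float
    threshold_pct: float
    failed: bool


def run_check(name: str, rows: Iterable[T], violates: Callable[[T], bool],
              threshold_pct: float) -> CheckResult:
    """Count rows violating a rule and compare the violation percentage to a threshold."""
    rows = list(rows)
    n = len(rows)
    violated = sum(1 for r in rows if violates(r))
    pct = 100.0 * violated / n if n else 0.0
    return CheckResult(name, violated, n, pct, threshold_pct, pct > threshold_pct)


# Hypothetical plausibility check: year of birth should not lie in the future.
persons = [{"year_of_birth": y} for y in (1950, 1980, 2030, 1975)]
result = run_check("year_of_birth_not_in_future", persons,
                   lambda r: r["year_of_birth"] > 2021, threshold_pct=5.0)
print(result.pct_violated, result.failed)  # 25.0 True
```

Keeping the threshold separate from the rule lets the same check be tuned per data source, which matches the customization of reporting that the abstract describes.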

Validate Real-World Data-based Endpoint Measures of Cancer Treatment Outcomes

10.1101/2021.06.10.21258706 ◽

2021 ◽

Author(s):

Qian Li ◽

Hansi Zhang ◽

Zhaoyi Chen ◽

Yi Guo ◽

Thomas George ◽

...

Keyword(s):

Overall Survival ◽

Clinical Trials ◽

Data Quality ◽

Treatment Outcomes ◽

Real World ◽

Gold Standard ◽

Surrogate Marker ◽

Real World Data ◽

World Data ◽

Real World Evidence

Recently, there has been growing interest in using real-world data (RWD) to generate real-world evidence (RWE) that complements clinical trials. However, to quantify treatment effects, it is important to develop meaningful RWD-based endpoints. In cancer trials, two real-world endpoints are of particular interest: real-world overall survival (rwOS) and real-world time to next treatment (rwTTNT). In this work, we identified ways to calculate these real-world endpoints from structured EHR data and validated them against gold-standard measurements derived from linked EHR and tumor registry (TR) data. We also examined and reported data quality issues, especially the inconsistencies between the EHR and TR data. Using a survival model, we found that patients (1) without subsequent chemotherapy or (2) with subsequent chemotherapy and a longer rwTTNT had a longer rwOS, showing the validity of using rwTTNT as a real-world surrogate marker for measuring cancer endpoints.
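The abstract above computes rwOS and rwTTNT from structured EHR data but does not spell out the definitions in this listing. The sketch below assumes common conventions. For rwOS, time runs from the index treatment start to death, censored at the last known contact. For rwTTNT, time runs from the index treatment start to the next treatment start or death, whichever comes first, also censored at the last contact. Both definitions and the `PatientTimeline` structure are assumptions, not the authors' exact specification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class PatientTimeline:
    # Hypothetical per-patient dates derived from structured EHR data.
    index_treatment_start: date
    last_contact: date
    death: Optional[date] = None
    next_treatment_start: Optional[date] = None


def rw_os(p: PatientTimeline) -> Tuple[int, bool]:
    """(days, event_observed) for real-world overall survival from the index treatment start."""
    if p.death is not None:
        return (p.death - p.index_treatment_start).days, True
    # No recorded death: censor at the last known contact.
    return (p.last_contact - p.index_treatment_start).days, False


def rw_ttnt(p: PatientTimeline) -> Tuple[int, bool]:
    """(days, event_observed) for real-world time to next treatment.

    Assumption: death before a next treatment also counts as an event.
    """
    events = [d for d in (p.next_treatment_start, p.death) if d is not None]
    if events:
        return (min(events) - p.index_treatment_start).days, True
    return (p.last_contact - p.index_treatment_start).days, False
```

The `(duration, event)` pairs are the usual input to survival methods such as Kaplan-Meier estimation or Cox regression, which is how rwTTNT could then be related to rwOS.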

Assessing Real-World Data Quality: The Application of Patient Registry Quality Criteria to Real-World Data and Real-World Evidence

Therapeutic Innovation & Regulatory Science ◽

10.1177/2168479019837520 ◽

2019 ◽

pp. 216847901983752

Author(s):

Richard E. Gliklich ◽

Michelle B. Leavy

Keyword(s):

Data Quality ◽

Real World ◽

Quality Criteria ◽

Patient Registry ◽

Real World Data ◽

World Data ◽

Real World Evidence

Assessing Real-World Data Quality: The Application of Patient Registry Quality Criteria to Real-World Data and Real-World Evidence

Therapeutic Innovation & Regulatory Science ◽

10.1007/s43441-019-00058-6 ◽

2020 ◽

Vol 54 (2) ◽

pp. 303-307 ◽

Author(s):

Richard E. Gliklich ◽

Michelle B. Leavy

Keyword(s):

Data Quality ◽

Real World ◽

Quality Criteria ◽

Patient Registry ◽

Real World Data ◽

World Data ◽

Real World Evidence

Real-world evidence: perspectives on challenges, value, and alignment of regulatory and national health technology assessment data collection requirements

International Journal of Technology Assessment in Health Care ◽

10.1017/s0266462321000131 ◽

2021 ◽

Vol 37 ◽

Author(s):

Hannah Sievers ◽

Angelika Joos ◽

Mickaël Hiligsmann

Keyword(s):

Health Technology Assessment ◽

Technology Assessment ◽

Real World ◽

Health Technology ◽

Practical Experience ◽

Added Value ◽

Real World Data ◽

Conflicting Demands ◽

Real World Evidence ◽

Semistructured Interviews

Abstract Objective This study aims to assess stakeholder perceptions of the challenges and value of real-world evidence (RWE) post approval, the differences in regulatory and health technology assessment (HTA) real-world data (RWD) collection requirements under the German regulation for more safety in drug supply (GSAV), and future alignment opportunities to create a complementary framework for postapproval RWE requirements. Methods Eleven semistructured interviews were conducted with purposively selected experts from the pharmaceutical industry, regulatory authorities, health technology assessment bodies (HTAbs), and academia. The interview questions focused on the role of RWE post approval, the added value and challenges of RWE, the most important requirements for RWD collection, experience with registries as a source of RWD, perceptions of the GSAV law, RWE requirements in other countries, and the differences between regulatory and HTA requirements and alignment opportunities. The interviews were recorded, transcribed, and translated for coding in NVivo to summarize the findings. Results All experts agreed that RWE could close evidence gaps by showing the actual value of medicines in patients under real-world conditions. However, experts acknowledged certain challenges, such as (i) heterogeneous perspectives and differences in outcome measures for RWE generation and (ii) missing practical experience with RWD collected through mandatory registries within the German benefit assessment due to an unclear implementation of the GSAV. Conclusions This study revealed that all stakeholder groups recognize the added value of RWE but experience conflicting demands for RWD collection. Harmonizing requirements can be achieved through common postlicensing evidence generation (PLEG) plans and joint scientific advice to address uncertainties regarding evidence needs and to optimize drug development.

Real-World Data and Real-World Evidence

Innovative Methods for Rare Disease Drug Development ◽

10.1201/9781003049364-8 ◽

2020 ◽

pp. 141-166

Author(s):

Shein-Chung Chow

Keyword(s):

Real World ◽

Real World Data ◽

World Data ◽

Real World Evidence

Disease Network Delineates the Disease Progression Profile of Cardiovascular Diseases

10.1101/2020.09.09.290585 ◽

2020 ◽

Author(s):

Zefang Tang ◽

Yiqin Yu ◽

Kenney Ng ◽

Daby Sow ◽

Jianying Hu ◽

...

Keyword(s):

Cardiovascular Diseases ◽

Disease Progression ◽

Real World ◽

Disease Risk ◽

Disease Network ◽

Real World Data ◽

Risk Models ◽

Age Bias ◽

Real World Evidence ◽

Tremendous Amount

Abstract As electronic health record (EHR) data have accumulated explosively in recent years, the tremendous amount of patient clinical data has provided opportunities to discover real-world evidence. In this study, a graphical disease network, named the progressive cardiovascular disease network (progCDN), was built from the EHR data of 14.3 million patients to delineate the progression profiles of cardiovascular diseases (CVD). The network depicted the dominant diseases in CVD development, such as heart failure and coronary arteriosclerosis. Novel progression relationships were also discovered, such as the progression path from long QT syndrome to major depression. In addition, three age-group progCDNs identified a series of age-associated disease progression paths and important successor diseases with age bias. Furthermore, we extracted a list of salient features to build a series of disease risk models based on the progression pairs in the disease network. The progCDN can be further used to validate or explore novel disease relationships in real-world data. Features with sufficient abundance and high correlation can be widely applied to train disease risk models using EHR data.
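The progCDN abstract above builds a directed network from disease "progression pairs" in EHR data. The sketch below shows one simple way such pairs could be derived, which is an assumption rather than the paper's method. For each patient, it takes the first diagnosis date of each disease and counts ordered pairs (A, B) where A is diagnosed strictly before B. It keeps only pairs seen in at least `min_support` patients. The paper's actual edge criteria (statistical tests, time windows, age stratification) are not described here, and the patient data are toy examples.

```python
from collections import Counter
from datetime import date
from itertools import permutations
from typing import Dict, List, Tuple


def progression_pairs(
    patients: List[Dict[str, date]], min_support: int = 2
) -> Dict[Tuple[str, str], int]:
    """Count directed pairs (A, B) where A's first diagnosis precedes B's in a patient.

    Each patient is a mapping from disease name to first diagnosis date.
    Same-day diagnoses produce no edge in either direction.
    """
    counts: Counter = Counter()
    for first_dx in patients:
        for a, b in permutations(first_dx, 2):
            if first_dx[a] < first_dx[b]:
                counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Toy patients for illustration only.
patients = [
    {"hypertension": date(2010, 1, 1), "coronary arteriosclerosis": date(2012, 5, 1),
     "heart failure": date(2015, 3, 1)},
    {"hypertension": date(2011, 2, 1), "heart failure": date(2014, 1, 1)},
    {"coronary arteriosclerosis": date(2009, 1, 1), "heart failure": date(2009, 1, 1)},
]
print(progression_pairs(patients))  # {('hypertension', 'heart failure'): 2}
```

The resulting pair counts can serve as weighted directed edges of a disease network, and stratifying `patients` by age group before counting would give separate age-group networks.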
