Statistical Development and Validation of Clinical Prediction Models

2021 ◽  
Author(s):  
Steven J. Staffa ◽  
David Zurakowski

Summary Clinical prediction models in anesthesia and surgery research have many clinical applications including preoperative risk stratification with implications for clinical utility in decision-making, resource utilization, and costs. It is imperative that predictive algorithms and multivariable models are validated in a suitable and comprehensive way in order to establish the robustness of the model in terms of accuracy, predictive ability, reliability, and generalizability. The purpose of this article is to educate anesthesia researchers at an introductory level on important statistical concepts involved in the development and validation of multivariable prediction models for a binary outcome. Methods covered include assessments of discrimination and calibration through internal and external validation. An anesthesia research publication is examined to illustrate the process and presentation of multivariable prediction model development and validation for a binary outcome. Properly assessing the statistical and clinical validity of a multivariable prediction model is essential for ensuring the generalizability and reproducibility of the published tool.
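
As a rough illustration of the two validation measures named in this summary, the Python sketch below computes a C statistic (discrimination) and compares the observed event rate with the mean predicted risk (a crude check of calibration-in-the-large). The function names and the toy data are hypothetical and are not taken from the article.

```python
from itertools import product


def c_statistic(y_true, y_prob):
    """Concordance: chance that a random event has a higher predicted
    risk than a random non-event (ties count as one half)."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            score += 1.0
        elif p_event == p_non_event:
            score += 0.5
    return score / (len(events) * len(non_events))


def observed_vs_expected(y_true, y_prob):
    """Return (observed event rate, mean predicted risk)."""
    observed = sum(y_true) / len(y_true)
    expected = sum(y_prob) / len(y_prob)
    return observed, expected


# Hypothetical outcomes (1 = event) and predicted risks, for illustration only.
outcomes = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
print("C statistic:", c_statistic(outcomes, risks))
print("Observed vs expected:", observed_vs_expected(outcomes, risks))
```

A C statistic of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation; a mean predicted risk well above or below the observed rate suggests systematic over- or under-prediction.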

2021 ◽  
Author(s):  
Cynthia Yang ◽  
Jan A. Kors ◽  
Solomon Ioannou ◽  
Luis H. John ◽  
Aniek F. Markus ◽  
...  

Objectives This systematic review aims to provide further insights into the conduct and reporting of clinical prediction model development and validation over time. We focus on assessing the reporting of information necessary to enable external validation by other investigators. Materials and Methods We searched Embase, Medline, Web-of-Science, Cochrane Library and Google Scholar to identify studies that developed one or more multivariable prognostic prediction models using electronic health record (EHR) data published in the period 2009-2019. Results We identified 422 studies that developed a total of 579 clinical prediction models using EHR data. We observed a steep increase over the years in the number of developed models. The percentage of models externally validated in the same paper remained at around 10%. Throughout 2009-2019, for both the target population and the outcome definitions, code lists were provided for less than 20% of the models. For about half of the models that were developed using regression analysis, the final model was not completely presented. Discussion Overall, we observed limited improvement over time in the conduct and reporting of clinical prediction model development and validation. In particular, the prediction problem definition was often not clearly reported, and the final model was often not completely presented. Conclusion Improvement in the reporting of information necessary to enable external validation by other investigators is still urgently needed to increase clinical adoption of developed models.


Author(s):  
Jianfeng Xie ◽  
Daniel Hungerford ◽  
Hui Chen ◽  
Simon T Abrams ◽  
Shusheng Li ◽  
...  

Summary Background The COVID-19 pandemic has developed rapidly and the ability to stratify the most vulnerable patients is vital. However, routinely used severity scores are often low on diagnosis, even in non-survivors. Therefore, clinical prediction models for mortality are urgently required. Methods We developed and internally validated a multivariable logistic regression model to predict inpatient mortality in COVID-19 positive patients using data collected retrospectively from Tongji Hospital, Wuhan (299 patients). External validation was conducted using a retrospective cohort from Jinyintan Hospital, Wuhan (145 patients). Nine variables commonly measured in these acute settings were considered for model development, including age, biomarkers and comorbidities. Backwards stepwise selection and bootstrap resampling were used for model development and internal validation. We assessed discrimination via the C statistic, and calibration using calibration-in-the-large, calibration slopes and plots. Findings The final model included age, lymphocyte count, lactate dehydrogenase and SpO2 as independent predictors of mortality. Discrimination of the model was excellent in both internal (c=0·89) and external (c=0·98) validation. Internal calibration was excellent (calibration slope=1). External validation showed some over-prediction of risk in low-risk individuals and under-prediction of risk in high-risk individuals prior to recalibration. Recalibration of the intercept and slope led to excellent performance of the model in independent data. Interpretation COVID-19 is a new disease and behaves differently from common critical illnesses. This study provides a new prediction model to identify patients with lethal COVID-19. Its practical reliance on commonly available parameters should improve usage of limited healthcare resources and patient survival rate. Funding This study was supported by the following funding: Key Research and Development Plan of Jiangsu Province (BE2018743 and BE2019749), National Institute for Health Research (NIHR) (PDF-2018-11-ST2-006), British Heart Foundation (BHF) (PG/16/65/32313) and Liverpool University Hospitals NHS Foundation Trust in the UK. Research in context Evidence before this study Since the outbreak of COVID-19, there has been a pressing need for development of a prognostic tool that is easy for clinicians to use. Recently, a Lancet publication showed that in a cohort of 191 patients with COVID-19, age, SOFA score and D-dimer measurements were associated with mortality. No other publication involving prognostic factors or models has been identified to date. Added value of this study In our cohorts of 444 patients from two hospitals, SOFA scores were low in the majority of patients on admission. The relevance of D-dimer could not be verified, as it is not included in routine laboratory tests. In this study, we have established a multivariable clinical prediction model using a development cohort of 299 patients from one hospital. After backwards selection, four variables (age, lymphocyte count, lactate dehydrogenase and SpO2) remained in the model to predict mortality. This has been validated internally and externally with a cohort of 145 patients from a different hospital. Discrimination of the model was excellent in both internal (c=0·89) and external (c=0·98) validation. Calibration plots showed excellent agreement between predicted and observed probabilities of mortality after recalibration of the model to account for underlying differences in the risk profile of the datasets. This demonstrated that the model is able to make reliable predictions in patients from different hospitals. In addition, these variables agree with pathological mechanisms and the model is easy to use in all types of clinical settings. Implication of all the available evidence After further external validation in different countries the model will enable better risk stratification and more targeted management of patients with COVID-19. With the nomogram, this model that is based on readily available parameters can help clinicians to stratify COVID-19 patients on diagnosis to use limited healthcare resources effectively and improve patient outcome.
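
The abstract reports recalibration of the intercept and slope but does not describe the procedure. One common approach, assumed here purely for illustration, is logistic recalibration: regress the observed outcome on the logit of the original predicted risk and use the fitted intercept and slope to adjust the predictions. The sketch below fits that two-parameter model by Newton-Raphson on hypothetical data; none of the numbers come from the study.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_recalibration(y_true, y_prob, n_iter=50, tol=1e-10):
    """Fit outcome ~ a + b * logit(predicted risk) by Newton-Raphson.

    Returns (a, b): the recalibration intercept and slope.
    """
    x = [logit(p) for p in y_prob]
    a, b = 0.0, 1.0  # start from "no recalibration needed"
    for _ in range(n_iter):
        mu = [expit(a + b * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y_true, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y_true, mu, x))
        # Information matrix (negative Hessian).
        w = [mi * (1.0 - mi) for mi in mu]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def recalibrate(y_prob, a, b):
    """Apply the fitted intercept and slope to the original risks."""
    return [expit(a + b * logit(p)) for p in y_prob]


# Hypothetical validation data in which the predictions are too extreme.
outcomes = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
risks = [0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95]
intercept, slope = fit_recalibration(outcomes, risks)
print("intercept:", intercept, "slope:", slope)
print("recalibrated risks:", recalibrate(risks, intercept, slope))
```

A fitted slope below 1 indicates that the original predictions were too extreme and are pulled toward the average risk; a slope near 1 with an intercept near 0 indicates that little adjustment is needed.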


2020 ◽  
Author(s):  
Fernanda Gonçalves Silva ◽  
Leonardo Oliveira Pena Costa ◽  
Mark J Hancock ◽  
Gabriele Alves Palomo ◽  
Luciola da Cunha Menezes Costa ◽  
...  

Abstract Background: The prognosis of acute low back pain is generally favourable in terms of pain and disability; however, outcomes vary substantially between individual patients. Clinical prediction models help in estimating the likelihood of an outcome at a certain time point. There are existing clinical prediction models focused on prognosis for patients with low back pain. To date, there is only one previous systematic review summarising the discrimination of validated clinical prediction models to identify the prognosis in patients with low back pain of less than 3 months duration. The aim of this systematic review is to identify existing developed and/or validated clinical prediction models on prognosis of patients with low back pain of less than 3 months duration, and to summarise their performance in terms of discrimination and calibration. Methods: MEDLINE, Embase and CINAHL databases will be searched, from the inception of these databases until January 2020. Eligibility criteria will be: (1) prognostic model development studies with or without external validation, or prognostic external validation studies with or without model updating; (2) with adults aged 18 or over, with ‘recent onset’ low back pain (i.e. less than 3 months duration), with or without leg pain; (3) outcomes of pain, disability, sick leave or days absent from work or return to work status, and self-reported recovery; and (4) study with a follow-up of at least 12 weeks duration. The risk of bias of the included studies will be assessed by the Prediction model Risk Of Bias ASsessment Tool, and the overall quality of evidence will be rated using the Hierarchy of Evidence for Clinical Prediction Rules. Discussion: This systematic review will identify, appraise, and summarize evidence on the performance of existing prediction models for prognosis of low back pain, and may help clinicians to choose the best option of prediction model to better inform patients about their likely prognosis. 
Systematic review registration: PROSPERO reference number CRD42020160988


2018 ◽  
Vol 22 (66) ◽  
pp. 1-294 ◽  
Author(s):  
Rachel Archer ◽  
Emma Hock ◽  
Jean Hamilton ◽  
John Stevens ◽  
Munira Essat ◽  
...  

Background Rheumatoid arthritis (RA) is a chronic, debilitating disease associated with reduced quality of life and substantial costs. It is unclear which tests and assessment tools allow the best assessment of prognosis in people with early RA and whether or not variables predict the response of patients to different drug treatments. Objective To systematically review evidence on the use of selected tests and assessment tools in patients with early RA (1) in the evaluation of a prognosis (review 1) and (2) as predictive markers of treatment response (review 2). Data sources Electronic databases (e.g. MEDLINE, EMBASE, The Cochrane Library, Web of Science Conference Proceedings; searched to September 2016), registers, key websites, hand-searching of reference lists of included studies and key systematic reviews and contact with experts. Study selection Review 1 – primary studies on the development, external validation and impact of clinical prediction models for selected outcomes in adult early RA patients. Review 2 – primary studies on the interaction between selected baseline covariates and treatment (conventional and biological disease-modifying antirheumatic drugs) on salient outcomes in adult early RA patients. Results Review 1 – 22 model development studies and one combined model development/external validation study reporting 39 clinical prediction models were included. Five external validation studies evaluating eight clinical prediction models for radiographic joint damage were also included. c-statistics from internal validation ranged from 0.63 to 0.87 for radiographic progression (different definitions, six studies) and 0.78 to 0.82 for the Health Assessment Questionnaire (HAQ). Predictive performance in external validations varied considerably. 
Three models [(1) Active controlled Study of Patients receiving Infliximab for the treatment of Rheumatoid arthritis of Early onset (ASPIRE) C-reactive protein (ASPIRE CRP), (2) ASPIRE erythrocyte sedimentation rate (ASPIRE ESR) and (3) Behandelings Strategie (BeSt)] were externally validated using the same outcome definition in more than one population. Results of the random-effects meta-analysis suggested substantial uncertainty in the expected predictive performance of models in a new sample of patients. Review 2 – 12 studies were identified. Covariates examined included anti-citrullinated protein/peptide anti-body (ACPA) status, smoking status, erosions, rheumatoid factor status, C-reactive protein level, erythrocyte sedimentation rate, swollen joint count (SJC), body mass index and vascularity of synovium on power Doppler ultrasound (PDUS). Outcomes examined included erosions/radiographic progression, disease activity, physical function and Disease Activity Score-28 remission. There was statistical evidence to suggest that ACPA status, SJC and PDUS status at baseline may be treatment effect modifiers, but not necessarily that they are prognostic of response for all treatments. Most of the results were subject to considerable uncertainty and were not statistically significant. Limitations The meta-analysis in review 1 was limited by the availability of only a small number of external validation studies. Studies rarely investigated the interaction between predictors and treatment. Suggested research priorities Collaborative research (including the use of individual participant data) is needed to further develop and externally validate the clinical prediction models. The clinical prediction models should be validated with respect to individual treatments. Future assessments of treatment by covariate interactions should follow good statistical practice. Conclusions Review 1 – uncertainty remains over the optimal prediction model(s) for use in clinical practice. 
Review 2 – in general, there was insufficient evidence that the effect of treatment depended on baseline characteristics. Study registration This study is registered as PROSPERO CRD42016042402. Funding The National Institute for Health Research Health Technology Assessment programme.


2021 ◽  
Author(s):  
Arjun Chandna ◽  
Raman Mahajan ◽  
Priyanka Gautam ◽  
Lazaro Mwandigha ◽  
Karthik Gunasekaran ◽  
...  

ABSTRACT Background In locations where few people have received COVID-19 vaccines, health systems remain vulnerable to surges in SARS-CoV-2 infections. Tools to identify patients suitable for community-based management are urgently needed. Methods We prospectively recruited adults presenting to two hospitals in India with moderate symptoms of laboratory-confirmed COVID-19 in order to develop and validate a clinical prediction model to rule-out progression to supplemental oxygen requirement. The primary outcome was defined as any of the following: SpO2 < 94%; respiratory rate > 30 bpm; SpO2/FiO2 < 400; or death. We specified a priori that each model would contain three clinical parameters (age, sex and SpO2) and one of seven shortlisted biochemical biomarkers measurable using near-patient tests (CRP, D-dimer, IL-6, NLR, PCT, sTREM-1 or suPAR), to ensure the models would be suitable for resource-limited settings. We evaluated discrimination, calibration and clinical utility of the models in a temporal external validation cohort. Findings 426 participants were recruited, of whom 89 (21·0%) met the primary outcome. 257 participants comprised the development cohort and 166 comprised the validation cohort. The three models containing NLR, suPAR or IL-6 demonstrated promising discrimination (c-statistics: 0·72 to 0·74) and calibration (calibration slopes: 1·01 to 1·05) in the validation cohort, and provided greater utility than a model containing the clinical parameters alone. Interpretation We present three clinical prediction models that could help clinicians identify patients with moderate COVID-19 suitable for community-based management. The models are readily implementable and of particular relevance for locations with limited resources. Funding Médecins Sans Frontières, India. RESEARCH IN CONTEXT Evidence before this study A living systematic review by Wynants et al. identified 137 COVID-19 prediction models, 47 of which were derived to predict whether patients with COVID-19 will have an adverse outcome. Most lacked external validation, relied on retrospective data, did not focus on patients with moderate disease, were at high risk of bias, and were not practical for use in resource-limited settings. To identify promising biochemical biomarkers which may have been evaluated independently of a prediction model and therefore not captured by this review, we searched PubMed on 1 June 2020 using synonyms of “SARS-CoV-2” AND [“biomarker” OR “prognosis”]. We identified 1,214 studies evaluating biochemical biomarkers of potential value in the prognostication of COVID-19 illness. In consultation with FIND (Geneva, Switzerland) we shortlisted seven candidates for evaluation in this study, all of which are measurable using near-patient tests which are either currently available or in late-stage development. Added value of this study We followed the TRIPOD guidelines to develop and validate three promising clinical prediction models to help clinicians identify which patients presenting with moderate COVID-19 can be safely managed in the community. Each model contains three easily ascertained clinical parameters (age, sex, and SpO2) and one biochemical biomarker (NLR, suPAR or IL-6), and would be practical for implementation in high-patient-throughput low resource settings. The models showed promising discrimination and calibration in the validation cohort. The inclusion of a biomarker test improved prognostication compared to a model containing the clinical parameters alone, and extended the range of contexts in which such a tool might provide utility to include situations when bed pressures are less critical, for example at earlier points in a COVID-19 surge. Implications of all the available evidence Prognostic models should be developed for clearly-defined clinical use-cases. We report the development and temporal validation of three clinical prediction models to rule-out progression to supplemental oxygen requirement amongst patients presenting with moderate COVID-19. The models are readily implementable and should prove useful in triage and resource allocation. We provide our full models to enable independent validation.
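
The composite primary outcome stated in the Methods (SpO2 < 94%; respiratory rate > 30 bpm; SpO2/FiO2 < 400; or death) can be written as a simple rule. The sketch below is only an illustration: the function and parameter names are hypothetical, and it assumes SpO2 is given in percent, respiratory rate in breaths per minute, and FiO2 as a fraction (0.21 for room air), none of which is spelled out in the abstract.

```python
def meets_primary_outcome(spo2_percent, resp_rate_bpm, fio2_fraction, died):
    """Return True if any component of the composite outcome is met.

    Units are assumptions: SpO2 in percent, respiratory rate in breaths
    per minute, FiO2 as a fraction between 0.21 and 1.0.
    """
    if not 0.0 < fio2_fraction <= 1.0:
        raise ValueError("FiO2 must be a fraction in (0, 1]")
    if died:
        return True
    if spo2_percent < 94:
        return True
    if resp_rate_bpm > 30:
        return True
    return (spo2_percent / fio2_fraction) < 400


# Hypothetical observations, for illustration only.
print(meets_primary_outcome(96, 20, 0.21, False))  # no component met
print(meets_primary_outcome(93, 20, 0.21, False))  # SpO2 below 94%
print(meets_primary_outcome(95, 22, 0.28, False))  # SpO2/FiO2 below 400
```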


2021 ◽  
Author(s):  
Richard D. Riley ◽  
Thomas P. A. Debray ◽  
Gary S. Collins ◽  
Lucinda Archer ◽  
Joie Ensor ◽  
...  

2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
A Youssef

Abstract Study question Which models that predict pregnancy outcome in couples with unexplained RPL exist and what is the performance of the most used model? Summary answer We identified seven prediction models; none followed the recommended prediction model development steps. Moreover, the most used model showed poor predictive performance. What is known already RPL remains unexplained in 50–75% of couples. For these couples, there is no effective treatment option and clinical management rests on supportive care. An essential part of supportive care consists of counselling on the prognosis of subsequent pregnancies. Multiple prediction models exist; however, the quality and validity of these models vary. In addition, the prediction model developed by Brigham et al is the most widely used model, but has never been externally validated. Study design, size, duration We performed a systematic review to identify prediction models for pregnancy outcome after unexplained RPL. In addition we performed an external validation of the Brigham model in a retrospective cohort, consisting of 668 couples with unexplained RPL that visited our RPL clinic between 2004 and 2019. Participants/materials, setting, methods A systematic search was performed in December 2020 in Pubmed, Embase, Web of Science and Cochrane library to identify relevant studies. Eligible studies were selected and assessed according to the TRIPOD guidelines, covering topics on model performance and validation statement. The performance of predicting live birth in the Brigham model was evaluated through calibration and discrimination, in which the observed pregnancy rates were compared to the predicted pregnancy rates. Main results and the role of chance Seven models were compared and assessed according to the TRIPOD statement. This resulted in two studies of low, three of moderate and two of above average reporting quality. These studies did not follow the recommended steps for model development and did not calculate a sample size. Furthermore, the predictive performance of none of these models was internally or externally validated. We performed an external validation of the Brigham model. Calibration showed overestimation by the model and too extreme predictions, with a negative calibration intercept of –0.52 (95% CI –0.68 to –0.36) and a calibration slope of 0.39 (95% CI 0.07 to 0.71). The discriminative ability of the model was very low with a concordance statistic of 0.55 (95% CI 0.50 to 0.59). Limitations, reasons for caution None of the studies are specifically named prediction models, therefore models may have been missed in the selection process. The external validation cohort used a retrospective design, in which only the first pregnancy after intake was registered. Follow-up time was not limited, which is important in counselling unexplained RPL couples. Wider implications of the findings Currently, there are no suitable models that predict pregnancy outcome after RPL. Moreover, we are in need of a model with several variables such that prognosis is individualized, and with factors from both the female and the male partner to enable a couple-specific prognosis. Trial registration number Not applicable


2019 ◽  
Vol 98 (10) ◽  
pp. 1088-1095 ◽  
Author(s):  
J. Krois ◽  
C. Graetz ◽  
B. Holtfreter ◽  
P. Brinkmann ◽  
T. Kocher ◽  
...  

Prediction models learn patterns from available data (training) and are then validated on new data (testing). Prediction modeling is increasingly common in dental research. We aimed to evaluate how different model development and validation steps affect the predictive performance of tooth loss prediction models of patients with periodontitis. Two independent cohorts (627 patients, 11,651 teeth) were followed over a mean ± SD 18.2 ± 5.6 y (Kiel cohort) and 6.6 ± 2.9 y (Greifswald cohort). Tooth loss and 10 patient- and tooth-level predictors were recorded. The impact of different model development and validation steps was evaluated: 1) model complexity (logistic regression, recursive partitioning, random forest, extreme gradient boosting), 2) sample size (full data set or 10%, 25%, or 75% of cases dropped at random), 3) prediction periods (maximum 10, 15, or 20 y or uncensored), and 4) validation schemes (internal or external by centers/time). Tooth loss was generally a rare event (880 teeth were lost). All models showed limited sensitivity but high specificity. Patients’ age and tooth loss at baseline as well as probing pocket depths showed high variable importance. More complex models (random forest, extreme gradient boosting) had no consistent advantages over simpler ones (logistic regression, recursive partitioning). Internal validation (in sample) overestimated the predictive power (area under the curve up to 0.90), while external validation (out of sample) found lower areas under the curve (range 0.62 to 0.82). Reducing the sample size decreased the predictive power, particularly for more complex models. Censoring the prediction period had only limited impact. When the model was trained in one period and tested in another, model outcomes were similar to the base case, indicating temporal validation as a valid option. No model showed higher accuracy than the no-information rate. 
In conclusion, none of the developed models would be useful in a clinical setting, despite high accuracy. During modeling, rigorous development and external validation should be applied and reported accordingly.
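
The contrast between high accuracy and lack of usefulness follows from the no-information rate (the accuracy obtained by always predicting the majority class). The sketch below works through the arithmetic using the pooled counts quoted in the abstract (880 teeth lost out of 11,651), on the assumption that both figures refer to the same denominator; the function names are hypothetical.

```python
def no_information_rate(n_events, n_total):
    """Accuracy of a rule that always predicts the majority class."""
    if n_total <= 0 or not (0 <= n_events <= n_total):
        raise ValueError("counts must satisfy 0 <= n_events <= n_total, n_total > 0")
    prevalence = n_events / n_total
    return max(prevalence, 1.0 - prevalence)


def always_negative_metrics(n_events, n_total):
    """Metrics of a trivial rule that predicts 'no event' for every case."""
    true_negatives = n_total - n_events
    return {
        "sensitivity": 0.0,  # no event is ever detected
        "specificity": 1.0,  # every non-event is classified correctly
        "accuracy": true_negatives / n_total,
    }


# Counts quoted in the abstract; pooling them is an assumption of this sketch.
teeth_lost, teeth_total = 880, 11651
print("No-information rate:", round(no_information_rate(teeth_lost, teeth_total), 3))
print("Always 'retained':", always_negative_metrics(teeth_lost, teeth_total))
```

Under that assumption a rule that predicts "retained" for every tooth is already correct for roughly 92% of teeth while detecting no tooth loss at all, so accuracy alone says little about a model for a rare outcome.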


2020 ◽  
Vol 35 (1) ◽  
pp. 100-116 ◽  
Author(s):  
M B Ratna ◽  
S Bhattacharya ◽  
B Abdulrahim ◽  
D J McLernon

Abstract STUDY QUESTION What are the best-quality clinical prediction models in IVF (including ICSI) treatment to inform clinicians and their patients of their chance of success? SUMMARY ANSWER The review recommends the McLernon post-treatment model for predicting the cumulative chance of live birth over and up to six complete cycles of IVF. WHAT IS KNOWN ALREADY Prediction models in IVF have not found widespread use in routine clinical practice. This could be due to their limited predictive accuracy and clinical utility. A previous systematic review of IVF prediction models, published a decade ago and which has never been updated, did not assess the methodological quality of existing models nor provided recommendations for the best-quality models for use in clinical practice. STUDY DESIGN, SIZE, DURATION The electronic databases OVID MEDLINE, OVID EMBASE and Cochrane library were searched systematically for primary articles published from 1978 to January 2019 using search terms on the development and/or validation (internal and external) of models in predicting pregnancy or live birth. No language or any other restrictions were applied. PARTICIPANTS/MATERIALS, SETTING, METHODS The PRISMA flowchart was used for the inclusion of studies after screening. All studies reporting on the development and/or validation of IVF prediction models were included. Articles reporting on women who had any treatment elements involving donor eggs or sperm and surrogacy were excluded. The CHARMS checklist was used to extract and critically appraise the methodological quality of the included articles. We evaluated models’ performance by assessing their c-statistics and plots of calibration in studies and assessed correct reporting by calculating the percentage of the TRIPOD 22 checklist items met in each study. MAIN RESULTS AND THE ROLE OF CHANCE We identified 33 publications reporting on 35 prediction models. Seventeen articles had been published since the last systematic review. 
The quality of models has improved over time with regard to clinical relevance, methodological rigour and utility. The percentage of TRIPOD score for all included studies ranged from 29 to 95%, and the c-statistics of all externally validated studies ranged between 0.55 and 0.77. Most of the models predicted the chance of pregnancy/live birth for a single fresh cycle. Six models aimed to predict the chance of pregnancy/live birth per individual treatment cycle, and three predicted more clinically relevant outcomes such as cumulative pregnancy/live birth. The McLernon (pre- and post-treatment) models predict the cumulative chance of live birth over multiple complete cycles of IVF per woman where a complete cycle includes all fresh and frozen embryo transfers from the same episode of ovarian stimulation. McLernon models were developed using national UK data and had the highest TRIPOD score, and the post-treatment model performed best on external validation. LIMITATIONS, REASONS FOR CAUTION To assess the reporting quality of all included studies, we used the TRIPOD checklist, but many of the earlier IVF prediction models were developed and validated before the formal TRIPOD reporting was published in 2015. It should also be noted that two of the authors of this systematic review are authors of the McLernon model article. However, we feel we have conducted our review and made our recommendations using a fair and transparent systematic approach. WIDER IMPLICATIONS OF THE FINDINGS This study provides a comprehensive picture of the evolving quality of IVF prediction models. Clinicians should use the most appropriate model to suit their patients’ needs. We recommend the McLernon post-treatment model as a counselling tool to inform couples of their predicted chance of success over and up to six complete cycles. However, it requires further external validation to assess applicability in countries with different IVF practices and policies. 
STUDY FUNDING/COMPETING INTEREST(S) The study was funded by the Elphinstone Scholarship Scheme and the Assisted Reproduction Unit, University of Aberdeen. Both D.J.M. and S.B. are authors of the McLernon model article and S.B. is Editor in Chief of Human Reproduction Open. They have completed and submitted the ICMJE forms for Disclosure of potential Conflicts of Interest. The other co-authors have no conflicts of interest to declare. REGISTRATION NUMBER N/A

