Abstract
Background
An estimated 293,300 healthcare-associated cases of Clostridium difficile infection (CDI) occur annually in the United States. Prior research on risk-prediction models for CDI has focused on a small number of risk factors with the goal of developing a model that works well across hospitals. We hypothesize that risk factors are, in part, hospital-specific. We applied a generalizable machine learning approach to discovering, or “learning”, hospital-specific risk-stratification models using electronic health record (EHR) data collected during the course of patient care from the Massachusetts General Hospital (MGH) and the University of Michigan Health System (UM).
Methods
We utilized EHR data from 115,958 adult inpatient admissions from 2012–2014 (MGH) and 258,050 adult inpatient admissions from 2010–2016 (UM) (Fig 1). We extracted patient demographics, admission details, patient history, and daily hospitalization details, resulting in 2,964 and 4,739 features in the MGH and UM models, respectively. We used L2-regularized logistic regression to learn the models and measured the discriminative performance of the models on a year of held-out data from each hospital.
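As a minimal illustration of the modeling step described above, the following sketch fits an L2-regularized logistic regression by gradient descent and computes the AUROC on held-out examples. It is not the study's implementation: the synthetic three-feature data, the hyperparameters (`lam`, `lr`, `epochs`), the helper names, and the simple ordered train/test split (standing in for the held-out year) are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l2_logistic(X, y, lam=0.01, lr=0.1, epochs=300):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    The bias term is left unpenalized. All hyperparameters are illustrative.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The lam * wj term is the gradient of the L2 penalty.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_risk(w, b, x):
    # Estimated probability of the positive class for one feature vector.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_synthetic(n, rng):
    # Hypothetical data: the outcome depends on the first two features only.
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(3)]
        p = sigmoid(1.5 * x[0] - 1.0 * x[1] - 1.0)
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y

rng = random.Random(0)
X_all, y_all = make_synthetic(600, rng)
# The first 400 rows train the model; the last 200 play the role of held-out data.
X_train, y_train = X_all[:400], y_all[:400]
X_test, y_test = X_all[400:], y_all[400:]

w, b = fit_l2_logistic(X_train, y_train)
test_auc = auroc(y_test, [predict_risk(w, b, x) for x in X_test])
print(f"held-out AUROC: {test_auc:.2f}")
```

In this sketch, a larger `lam` shrinks the weights more strongly toward zero, which is the usual reason for choosing an L2 penalty when the number of features is large relative to the number of positive cases.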
Results
The MGH and UM models achieved AUROCs of 0.74 (CI: 0.73–0.75) and 0.77 (CI: 0.75–0.80), respectively. The relative importance of risk factors varied significantly across hospitals. In particular, in-hospital locations appeared in the set of top risk factors at one hospital and in the set of protective factors at the other. On average, both models were able to predict CDI five days in advance of clinical diagnosis (Fig 2).
Conclusion
We used EHR data to generate a daily estimate of the risk of CDI for each inpatient hospitalization. We applied a generalizable data-driven approach to existing data from two large institutions with different patient populations and different data formats and content. In contrast to approaches that focus on learning models that apply generally across hospitals, our proposed approach yields risk-stratification models tailored to an institution’s EHR system and patient population. In turn, these hospital-specific models could allow for earlier and more accurate identification of high-risk patients.
Disclosures
All authors: No reported disclosures.