A Generalizable, Data-Driven Approach to Predict Daily Risk of Clostridium difficile Infection at Two Large Academic Health Centers

2018 ◽  
Vol 39 (4) ◽  
pp. 425-433 ◽  
Author(s):  
Jeeheh Oh ◽  
Maggie Makar ◽  
Christopher Fusco ◽  
Robert McCaffrey ◽  
Krishna Rao ◽  
...  

OBJECTIVE: An estimated 293,300 healthcare-associated cases of Clostridium difficile infection (CDI) occur annually in the United States. To date, research has focused on developing risk prediction models for CDI that work well across institutions. However, this one-size-fits-all approach ignores important hospital-specific factors. We focus on a generalizable method for building facility-specific models. We demonstrate the applicability of the approach using electronic health records (EHR) from the University of Michigan Hospitals (UM) and the Massachusetts General Hospital (MGH). METHODS: We utilized EHR data from 191,014 adult admissions to UM and 65,718 adult admissions to MGH. We extracted patient demographics, admission details, patient history, and daily hospitalization details, resulting in 4,836 features from patients at UM and 1,837 from patients at MGH. We used L2 regularized logistic regression to learn the models, and we measured the discriminative performance of the models on held-out data from each hospital. RESULTS: Using the UM and MGH test data, the models achieved area under the receiver operating characteristic curve (AUROC) values of 0.82 (95% confidence interval [CI], 0.80–0.84) and 0.75 (95% CI, 0.73–0.78), respectively. Some predictive factors were shared between the 2 models, but many of the top predictive factors differed between facilities. CONCLUSION: A data-driven approach to building models for estimating daily patient risk for CDI was used to build institution-specific models at 2 large hospitals with different patient populations and EHR systems. In contrast to traditional approaches that focus on developing models that apply across hospitals, our generalizable approach yields risk-stratification models tailored to an institution. These hospital-specific models allow for earlier and more accurate identification of high-risk patients and better targeting of infection prevention strategies. Infect Control Hosp Epidemiol 2018;39:425–433
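The abstract above reports AUROC values with 95% confidence intervals on held-out data. As a rough illustration of how such a figure can be computed, the sketch below calculates AUROC as the fraction of positive/negative pairs ranked correctly and estimates an interval with a percentile bootstrap. The abstract does not say how the authors derived their intervals, so the bootstrap, the function names `auroc` and `bootstrap_ci`, and the toy data are all assumptions made for illustration.

```python
import random


def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (an assumed method, see lead-in)."""
    auroc(labels, scores)  # validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data, not taken from either hospital.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
point_estimate = auroc(toy_labels, toy_scores)
interval = bootstrap_ci(toy_labels, toy_scores, n_boot=200)
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch but would likely be replaced by a rank-based computation on data of the size described in the abstract.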

2017 ◽  
Vol 4 (suppl_1) ◽  
pp. S403-S404
Author(s):  
Maggie Makar ◽  
Jeeheh Oh ◽  
Christopher Fusco ◽  
Joseph Marchesani ◽  
Robert McCaffrey ◽  
...  

Abstract. Background: An estimated 293,300 healthcare-associated cases of Clostridium difficile infection (CDI) occur annually in the United States. Prior research on risk-prediction models for CDI has focused on a small number of risk factors with the goal of developing a model that works well across hospitals. We hypothesize that risk factors are, in part, hospital-specific. We applied a generalizable machine learning approach to discovering, or “learning”, hospital-specific risk-stratification models using electronic health record (EHR) data collected during the course of patient care from the Massachusetts General Hospital (MGH) and the University of Michigan Health System (UM). Methods: We utilized EHR data from 115,958 adult inpatient admissions from 2012–2014 (MGH) and 258,050 adult inpatient admissions from 2010–2016 (UM) (Fig 1). We extracted patient demographics, admission details, patient history, and daily hospitalization details, resulting in 2,964 and 4,739 features in the MGH and UM models, respectively. We used L2 regularized logistic regression to learn the models and measured the discriminative performance of the models on a year of held-out data from each hospital. Results: The MGH and UM models achieved AUROCs of 0.74 (CI: 0.73–0.75) and 0.77 (CI: 0.75–0.80), respectively. The relative importance of risk factors varied significantly across hospitals. In particular, in-hospital locations appeared in the set of top risk factors at one hospital and in the set of protective factors at the other. On average, both models were able to predict CDI five days in advance of clinical diagnosis (Fig 2). Conclusion: We used EHR data to generate a daily estimate of the risk of CDI for each inpatient hospitalization. We applied a generalizable data-driven approach to existing data from two large institutions with different patient populations and different data formats and content.
In contrast to approaches that focus on learning models that apply generally across hospitals, our proposed approach yields risk stratification models tailored to an institution’s EHR system and patient population. In turn, these hospital-specific models could allow for earlier and more accurate identification of high-risk patients. Disclosures: All authors: No reported disclosures.
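Both CDI abstracts name L2 regularized logistic regression as the learning method. The following is a minimal sketch of that technique using plain batch gradient descent. It is not the authors' implementation: the optimizer, the hyperparameters (`lam`, `lr`, `epochs`), the function names, and the one-feature toy data are assumptions chosen only to show how the L2 penalty shrinks the learned weights.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l2_logreg(X, y, lam=0.1, lr=0.1, epochs=2000):
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    The intercept b is left unpenalized, a common convention.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # Data gradient plus the L2 penalty gradient lam * w[j].
            w[j] -= lr * (grad_w[j] / n + lam * w[j])
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive class for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy one-feature data set (hypothetical, not EHR data).
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0, 0, 1, 1]
w_fit, b_fit = fit_l2_logreg(X_toy, y_toy)
```

A larger `lam` pulls the weights toward zero, which matters when the feature count runs into the thousands, as it does in both abstracts.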


2021 ◽  
Vol 33 (4) ◽  
pp. 167-184
Author(s):  
Chih-Hung Yuan ◽  
Chia-Huei Wu ◽  
Dajiang Wang ◽  
Shiyun Yao ◽  
Yingying Feng

This study uses a content analysis method to systematically review 83 research papers from 2002-2018 to explore consumer-to-consumer (C2C) e-commerce research trends. The findings of this study indicate that (1) C2C e-commerce is discussed and investigated in many disciplines, but mainly published in e-commerce journals; (2) studies on C2C e-commerce increasingly focus on diverse topics, but concentrate on regions such as China and the United States; (3) the focus of academic collaboration has shifted from domestic to international collaboration and to collaboration within the same institution, although collaboration across different study teams remains scarce; (4) the data-driven approach is the main approach used in studies on C2C e-commerce; (5) while the number of recent C2C e-commerce studies that adopt theories is increasing, few have developed theoretical frameworks or models. Finally, study implications and future study suggestions are also discussed.


Author(s):  
Vedat Bayram ◽  
Gohram Baloch ◽  
Fatma Gzara ◽  
Samir Elhedhli

Optimizing warehouse processes has a direct impact on supply chain responsiveness, timely order fulfillment, and customer satisfaction. In this work, we focus on the picking process in warehouse management and study it from a data perspective. Using historical data from an industrial partner, we introduce, model, and study the robust order batching problem (ROBP), which groups orders into batches to minimize total order processing time while accounting for uncertainty caused by system congestion and human behavior. We provide a generalizable, data-driven approach that overcomes the warehouse-specific assumptions characterizing most of the work in the literature. We analyze historical data to understand the processes in the warehouse, to predict processing times, and to improve order processing. We develop an efficient learning-based branch-and-price algorithm based on simultaneous column and row generation, embedded with alternative prediction models such as linear regression and random forest that predict the processing time of a batch. We conduct extensive computational experiments to test the performance of the proposed approach and to derive managerial insights based on real data. The data-driven prescriptive analytics tool we propose achieves savings of seven to eight minutes per order, which translates into a 14.8% increase in the daily picking operations capacity of the warehouse.


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4469
Author(s):  
Fahad Kamran ◽  
Victor C. Le ◽  
Adam Frischknecht ◽  
Jenna Wiens ◽  
Kathleen H. Sienko

Dehydration beyond 2% bodyweight loss should be monitored to reduce the risk of heat-related injuries during exercise. However, assessments of hydration in athletic settings can be limited in their accuracy and accessibility. In this study, we sought to develop a data-driven noninvasive approach to measure hydration status, leveraging wearable sensors and normal orthostatic movements. Twenty participants (10 males, 25.0 ± 6.6 years; 10 females, 27.8 ± 4.3 years) completed two exercise sessions in a heated environment: one session was completed without fluid replacement. Before and after exercise, participants performed 12 postural movements that varied in length (up to 2 min). Logistic regression models were trained to estimate dehydration status given their heart rate responses to these postural movements. The area under the receiver operating characteristic curve (AUROC) was used to parameterize the model’s discriminative ability. Models achieved an AUROC of 0.79 (IQR: 0.75, 0.91) when discriminating 2% bodyweight loss. The AUROC for the longer supine-to-stand postural movements and shorter toe-touches were similar (0.89, IQR: 0.89, 1.00). Shorter orthostatic tests achieved similar accuracy to clinical tests. The findings suggest that data from wearable sensors can be used to accurately estimate mild dehydration in athletes. In practice, this method may provide an additional measurement for early intervention of severe dehydration.


2021 ◽  
Author(s):  
Cecilia E Thomas ◽  
Leo Dahl ◽  
Sanna Byström ◽  
Yan Chen ◽  
Mathias Uhlén ◽  
...  

Background: Risk prediction is crucial for early detection and prognosis of breast cancer. Circulating plasma proteins could provide a valuable source to increase the validity of risk prediction models; however, no such markers have yet been identified for clinical use. Methods: EDTA plasma samples from 183 breast cancer cases and 366 age-matched controls were collected prior to diagnosis from the Swedish breast cancer cohort KARMA. The samples were profiled on 700 circulating proteins using an exploratory affinity proteomics approach. Linear association analyses were performed on case-control status and a data-driven analysis strategy was applied to cluster the women on their plasma proteome profiles in an unsupervised manner. The resulting clusters were subsequently annotated for the differences in phenotypic characteristics, clinical parameters, and genetic risk. Results: Using the data-driven approach we identified five clusters with distinct proteomic plasma profiles. Women in a particular sub-group (cluster 1) were significantly more likely to have used menopausal hormonal therapy (MHT), more likely to get a breast cancer diagnosis, and were older compared to the remaining clusters. The levels of circulating proteins in cluster 1 were decreased for proteins related to DNA repair and cell replication and increased for proteins related to mammographic density and female tissues. In contrast, classical dichotomous case-control analyses did not reveal any proteins significantly associated with future breast cancer. Conclusion: Using a data-driven approach, we identified a subset of women with circulating proteins associated with previous use of MHT and risk of breast cancer. Our findings point to the potential long-lasting effects of MHT on the circulating proteome even after ending the treatment, and hence provide valuable insights concerning risk prediction of breast cancer.


2020 ◽  
Vol 10 (16) ◽  
pp. 5696 ◽  
Author(s):  
Samar A. Shilbayeh ◽  
Abdullah Abonamah ◽  
Ahmad A. Masri

Prediction models of coronavirus disease that utilize machine learning algorithms range from forecasting future suspect cases and predicting mortality rates to building a pattern for a country-specific pandemic end date. To predict future suspect infection and death cases, we categorized the approaches found in the literature into two groups. The first is a purely data-driven approach, whose goal is to build a mathematical model that relates the data variables, including outputs with inputs, to detect general patterns. The discovered patterns can then be used to predict future infected cases without any expert input. The second approach is partially data-driven; it uses historical data, but allows expert input such as the SIR epidemic algorithm. This approach assumes that the epidemic will end according to medical reasoning. In this paper, we compare the purely data-driven and partially data-driven approaches by applying them to data from three countries with different past pattern behavior: the US, Jordan, and Italy. We find that the two prediction approaches yield significantly different results. The purely data-driven approach depends entirely on past behavior and does not show any decline in the number of infected cases if the country has not yet experienced a decline in the number of cases. On the other hand, the partially data-driven approach guarantees a timely decline of the infected curve to zero. Using the two approaches highlights the importance of human intervention in pandemic prediction to guide the learning process, as opposed to the purely data-driven approach, which predicts future cases based only on the pattern detected in the data.
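The abstract above cites the SIR epidemic model as the expert input that makes the infected curve eventually decline. The sketch below simulates the standard SIR equations with simple Euler steps to show that behavior. The parameter values and the function name `simulate_sir` are hypothetical, and nothing here reproduces how the paper fitted its model to country data.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=1.0):
    """Euler integration of the classic SIR compartment model.

    beta  : transmission rate per day (illustrative value, not fitted)
    gamma : recovery rate per day (illustrative value, not fitted)
    Returns a list of (susceptible, infected, recovered) tuples, one per step.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(int(days / dt)):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical scenario: 1,000 people, one initial case, beta / gamma = 3.
trajectory = simulate_sir(1000, 1, beta=0.3, gamma=0.1, days=300)
infected_curve = [state[1] for state in trajectory]
peak_infected = max(infected_curve)
```

Because the susceptible pool only shrinks, the infected count in this model rises to a single peak and then decays, which is the structural assumption the abstract contrasts with purely data-driven extrapolation.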


2019 ◽  
Vol 30 (4) ◽  
pp. 524-531
Author(s):  
Taylor E. Purvis ◽  
Brian J. Neuman ◽  
Lee H. Riley ◽  
Richard L. Skolasky

OBJECTIVE: In this paper, the authors demonstrate to spine surgeons the prevalence and severity of anxiety and depression among patients presenting for surgery and explore the relationships between different legacy and Patient-Reported Outcomes Measurement Information System (PROMIS) screening measures. METHODS: A total of 512 adult spine surgery patients at a single institution completed the 7-item Generalized Anxiety Disorder questionnaire (GAD-7), 8-item Patient Health Questionnaire (PHQ-8) depression scale, and PROMIS Anxiety and Depression computer-adaptive tests (CATs) preoperatively. Correlation coefficients were calculated between PROMIS scores and GAD-7 and PHQ-8 scores. Published reference tables were used to determine the presence of anxiety or depression using GAD-7 and PHQ-8. Sensitivity and specificity of published guidance on the PROMIS Anxiety and Depression CATs were compared. Guidance from 3 sources was compared: published GAD-7 and PHQ-8 crosswalk tables, American Psychiatric Association scales, and expert clinical consensus. Receiver operating characteristic curves were used to determine data-driven cut-points for PROMIS Anxiety and Depression. Significance was accepted as p < 0.05. RESULTS: In 512 spine surgery patients, anxiety and depression were prevalent preoperatively (5% with any anxiety, 24% with generalized anxiety screen-positive; and 54% with any depression, 24% with probable major depression). Correlations were moderately strong between PROMIS Anxiety and GAD-7 scores (r = 0.72; p < 0.001) and between PROMIS Depression and PHQ-8 scores (r = 0.74; p < 0.001). The observed correlation of the PROMIS Depression score was greater with the PHQ-8 cognitive/affective score (r = 0.766) than with the somatic score (r = 0.601) (p < 0.001).
PROMIS Anxiety and Depression CATs were able to detect the presence of generalized anxiety screen-positive (sensitivity, 86.0%; specificity, 81.6%) and of probable major depression (sensitivity, 82.3%; specificity, 81.4%). Receiver operating characteristic curve analysis demonstrated data-driven cut-points for these groups. CONCLUSIONS: PROMIS Anxiety and Depression CATs are reliable tools for identifying generalized anxiety screen-positive spine surgery patients and those with probable major depression.
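The abstract above derives data-driven cut-points from ROC curves and reports the sensitivity and specificity at those points. One common way to pick such a cut-point is to maximize Youden's J (sensitivity + specificity - 1) over the observed scores. The abstract does not state which criterion the authors used, so Youden's J, the function names, and the toy scores below are assumptions for illustration only.

```python
def sens_spec(labels, scores, cut):
    """Sensitivity and specificity when scores >= cut are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cut)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cut)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cut)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)


def best_cut_youden(labels, scores):
    """Return the observed score that maximizes Youden's J.

    Ties are resolved in favor of the lowest cut-point, which favors sensitivity.
    """
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        sens, spec = sens_spec(labels, scores, cut)
        j = sens + spec - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut


# Hypothetical screening scores; 1 marks a screen-positive reference result.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_scores = [40, 45, 55, 50, 60, 65]
chosen_cut = best_cut_youden(toy_labels, toy_scores)
chosen_sens, chosen_spec = sens_spec(toy_labels, toy_scores, chosen_cut)
```

Other criteria, such as fixing a minimum sensitivity for a screening tool, would give a different cut-point from the same ROC curve.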


2012 ◽  
Author(s):  
Michael Ghil ◽  
Mickael D. Chekroun ◽  
Dmitri Kondrashov ◽  
Michael K. Tippett ◽  
Andrew Robertson ◽  
...  
