Understanding the complexity of sepsis mortality prediction via rule discovery and analysis: a pilot study
Abstract

Background
Sepsis, defined as life-threatening organ dysfunction caused by a dysregulated host response to infection, has become one of the major causes of death in Intensive Care Units (ICUs). Because the syndrome is heterogeneous and complex, there are no gold standards for its diagnosis, treatment, or prognosis. Early prediction of in-hospital mortality for sepsis patients is not only meaningful for medical decision making but, more importantly, bears directly on the well-being of patients.

Methods
In this paper, a rule discovery and analysis (rule-based) method is used to predict in-hospital death for 2021 ICU patients diagnosed with sepsis, drawn from the MIMIC-III database. The method has two phases: rule discovery and rule analysis. In the rule discovery phase, the RuleFit method is employed to mine multiple hidden rules that are capable of predicting individual in-hospital death events. In the rule analysis phase, survival analysis and decomposition analysis are carried out to test and justify the risk prediction ability of these rules. Then, by leveraging a subset of these rules, we establish a prediction model that is both more accurate at the in-hospital death prediction task and more interpretable than most comparable methods.

Results
In our experiment, RuleFit generates 77 risk prediction rules, and the average area under the curve (AUC) of the prediction model based on 62 of these rules reaches 0.781 (± 0.018), which is comparable to or even better than the AUC of existing methods (i.e., commonly used medical scoring systems and benchmark machine learning models). External validation of the predictive power of these 62 rules on another 1468 ICU sepsis patients not included in MIMIC-III provides further evidence for the superiority of the rule-based method. In addition, we discuss and explain in detail the rules with better risk prediction ability. Glasgow Coma Scale (GCS), serum potassium, and serum bilirubin are found to be the most important risk factors for predicting patient death.

Conclusion
Our study demonstrates that the rule-based method can not only accurately predict in-hospital death events of sepsis patients, but also reveal the complex relationships between sepsis-related risk factors through the rules themselves, thereby improving our understanding of the complexity of sepsis and of its patient population.
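To make the idea of rule-based risk prediction concrete, the following minimal sketch shows how a handful of conjunctive rules could be turned into binary features, combined into a weighted risk score, and evaluated by AUC. It is an illustration only, not the pipeline used in the study: the rule thresholds, the weights, the feature names (`gcs`, `potassium`, `bilirubin`), and the six toy patients are all hypothetical and are not taken from MIMIC-III or from the 77 rules reported here. In RuleFit itself, the rules would be extracted from a tree ensemble and the weights fitted by sparse (L1-penalized) regression rather than written by hand.

```python
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, float]
Rule = Callable[[Patient], bool]

# Hypothetical rules with hand-picked weights (illustrative assumptions only;
# these are NOT the rules or coefficients reported in the study).
RULES: List[Tuple[Rule, float]] = [
    (lambda p: p["gcs"] <= 8 and p["bilirubin"] > 2.0, 1.2),
    (lambda p: p["potassium"] > 5.0, 0.7),
    (lambda p: p["gcs"] >= 14 and p["potassium"] <= 4.5, -0.9),
]


def rule_features(patient: Patient) -> List[int]:
    """Encode a patient as one binary indicator per rule (1 = rule fires)."""
    return [1 if rule(patient) else 0 for rule, _ in RULES]


def risk_score(patient: Patient) -> float:
    """Linear score: sum of the weights of all rules that fire."""
    fired = rule_features(patient)
    return sum(weight * x for (_, weight), x in zip(RULES, fired))


def auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the fraction of (death, survivor) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in pos
        for sn in neg
    )
    return wins / (len(pos) * len(neg))


# Toy cohort (made-up values; label 1 = in-hospital death, 0 = survival).
patients: List[Patient] = [
    {"gcs": 6, "potassium": 5.4, "bilirubin": 3.1},
    {"gcs": 15, "potassium": 4.0, "bilirubin": 0.8},
    {"gcs": 7, "potassium": 4.2, "bilirubin": 2.5},
    {"gcs": 14, "potassium": 5.2, "bilirubin": 1.0},
    {"gcs": 10, "potassium": 4.1, "bilirubin": 1.5},
    {"gcs": 15, "potassium": 4.4, "bilirubin": 0.6},
]
labels: List[int] = [1, 0, 1, 0, 1, 0]

scores = [risk_score(p) for p in patients]
toy_auc = auc(scores, labels)
print(f"AUC on toy cohort: {toy_auc:.3f}")
```

Because each rule is a readable conjunction of thresholds, the contribution of any fired rule to a patient's score can be inspected directly, which is the sense in which such a model is interpretable.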