Date of Award

August 2020

Degree Type


Degree Name

Master of Science


Health Care Informatics

First Advisor

Jake Luo


Keywords

Clinical Trials, Comorbidities, Machine Learning, Sepsis


Sepsis is a potentially life-threatening condition characterized by a dysregulated, disproportionate immune response to infection in which the body attacks its own tissues, sometimes to the point of organ failure and, in the worst cases, death. According to the Centers for Disease Control and Prevention (CDC), sepsis kills upwards of 270,000 Americans annually, though the true figure may be greater given ambiguities in the currently accepted diagnostic framework for the disease.

This study first sought to establish an understanding of past definitions of sepsis, and then to recommend machine learning as an integral part of an eventual amended disease definition. Longitudinal clinical trial data (n = 30,915 trials) were vectorized into a machine-readable format compatible with predictive modeling, subjected to feature selection and dimensionality reduction, and used to predict the incidence of sepsis with several machine learning models: logistic regression, support vector machines (SVM), naïve Bayes classifiers, decision trees, and random forests. The intent of the study was to identify possible predictive features for sepsis through comparative analysis of these models, and to recommend subsequent study of sepsis prediction in which the trained models are applied to new, non-clinical-trial-derived data in the same format. If the models generalize to new data, it stands to reason that they could eventually become clinically useful. Judged by F1 and recall scores, the random forest classifier was the best performer among this cohort of models.
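As an illustration of the two metrics used to rank the models, the following minimal sketch computes recall and F1 for the positive class and picks the higher-scoring of two classifiers. The labels, the names `model_a` and `model_b`, and the helper `recall_and_f1` are hypothetical and are not drawn from the study's data or code.

```python
from typing import Sequence, Tuple


def recall_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (recall, F1) for the positive class, labeled 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Recall: share of actual positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of predicted positives that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


# Made-up labels for illustration only (1 = sepsis, 0 = no sepsis).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = {
    "model_a": [1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
    "model_b": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}

scores = {name: recall_and_f1(y_true, y_pred) for name, y_pred in predictions.items()}
best = max(scores, key=lambda name: scores[name][1])  # rank by F1

for name, (recall, f1) in scores.items():
    print(f"{name}: recall={recall:.2f}, F1={f1:.2f}")
print("best by F1:", best)
```

Recall matters in this setting because a missed sepsis case (a false negative) is typically costlier than a false alarm, while F1 tempers recall with precision so that a model cannot score well simply by predicting sepsis for every record.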