Date of Award


Degree Type


Degree Name

Master of Science



First Advisor

Shengtong Han


Infant Mortality, Survival


According to the Centers for Disease Control and Prevention, the infant mortality rate in the United States in 2018 was 5.6 deaths per 1,000 live births. Infant mortality is defined as the death of a live-born child before their first birthday. This study aimed to determine whether adding socioeconomic factors to traditional predictive survival models improved their predictive power for the survival of late neonatal and post-neonatal infants. It also aimed to develop a risk score to predict which mothers would be classified as “High” or “Low” risk for infant death.

Data from the 2016 Period Linked Birth/Infant Death Data Set from the Centers for Disease Control and Prevention were analyzed in a retrospective cohort study. Kaplan-Meier curves, which estimate survival functions, were created for the parameters of interest, and unadjusted survival was compared using the log-rank test. A risk score was developed from potential predictors using a Cox proportional hazards model. From the start of 2016 through the end of 2017 there were 20,334 infant deaths in the United States. Of these, 7,979 (39.2%) occurred after the first week of life, 7,477 of them in infants without congenital abnormalities. Time-dependent ROC curves were used to determine the AUC at each time point for a base model consisting of Apgar score at five minutes, gestational age at birth, and birthweight, and these values were compared with those from a model with socioeconomic factors added. Goodness-of-fit tests were also used to assess how well each model fit the data overall. Kaplan-Meier curves of the risk categories in the training and validation sets were not statistically significantly different from each other for either the “High” or the “Low” risk group (Brier score 0.096), indicating that the risk categories were predicted consistently across the two sets.
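As an illustration of the Kaplan-Meier estimate mentioned above, the following is a minimal sketch in plain Python. It is not the code used in the study: the function name `kaplan_meier` and the toy follow-up data are assumptions made for illustration only. At each distinct death time the running survival probability is multiplied by (1 - deaths / number at risk), and censored infants leave the risk set without lowering the curve.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed death time.

    times  -- follow-up time for each subject (e.g. age in months)
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)      # all subjects leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: conditional probability of surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set after time t
        at_risk -= exits[t]
    return curve


# Hypothetical toy data, NOT taken from the study
toy_times = [1, 2, 2, 3, 5, 8, 11, 11]
toy_events = [1, 1, 0, 1, 0, 1, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)

for t, s in toy_curve:
    print(f"month {t}: estimated survival {s:.3f}")
```

Curves computed this way for two groups (for example the “High” and “Low” risk groups) are what a log-rank test would then compare.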

The model with socioeconomic factors had better predictive power than the base model: AUC values were very similar for months 1-5 and higher for months 6-11. In addition, goodness-of-fit tests showed that the socioeconomic status (SES) model fit the data much better (base p < 0.001, SES p = 0.046). Concordance was also slightly higher for the SES model than for the base model (63.76% vs. 63.14%). The Kaplan-Meier curves indicate that baseline clinical information could be used to predict whether an infant should be considered at high risk of mortality within the first year of life. With this information, physicians will be able to direct their attention to patients who may require more social or medical interventions.
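The concordance values reported above can be read as the share of comparable infant pairs in which the model assigns the higher risk score to the infant who died earlier. The sketch below assumes Harrell's concordance index, which is the usual statistic reported with Cox models; the abstract does not state which estimator was used. The function name `concordance_index` and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i has an observed death
    and a strictly shorter follow-up time than subject j.
    Tied risk scores count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier death
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, NOT taken from the study
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.4, 0.5, 0.1]   # higher score = higher predicted risk
toy_c = concordance_index(toy_times, toy_events, toy_scores)
print(f"concordance: {toy_c:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported 63.76% and 63.14% in context.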