Date of Award

May 2023

Degree Type


Degree Name

Doctor of Philosophy


Educational Psychology

First Advisor

Razia Azen

Committee Members

Brian Patterson, Bo Zhang, Jake Luo


Binary Classification, Dataset Shift, Machine Learning


Deployment of a classifier into a machine learning application likely begins with training different types of algorithms on a subset of the available historical data and then evaluating them on datasets that are drawn from identical distributions. The goal of this evaluation process is to select the classifier that is believed to be most robust in maintaining good future performance, and then deploy that classifier to end-users who use it to make predictions on new data. Often times, predictive models are deployed in conditions that differ from those used in training, meaning that dataset shift occurred. In these situations, there are no guarantees that predictions made by the predictive model in deployment will still be as reliable and accurate as they were during the training of the model. This study demonstrated a technique that can be utilized by others when selecting a classifier for deployment, as well as the first comparative study that evaluates machine learning classifier performance on synthetic datasets with different levels of prior-probability, covariate, and concept dataset shifts.

The results from this study showed the impact of dataset shift on the performance of different classifiers for two real-world datasets related to teacher retention in Wisconsin and detecting fraud in testing, as well as demonstrated a framework that can be used by others when selecting a classifier for deployment. By using the methods from this study as a proactive approach to evaluate classifiers on synthetic dataset shift, different classifiers would have been considered for deployment of both predictive models, compared to only using evaluation datasets that were drawn from identical distributions. The results from both real-world datasets also showed that there was no classifier that dealt well with prior-probability shift and that classifiers were affected less by covariate and concept shift than was expected. Two supplemental demonstrations of the methodology showed that it can be extended for additional purposes of evaluating classifiers on dataset shift. Results from analyzing the effects of hyperparameter choices on classifier performance under dataset shift, as well as the effects of actual dataset shift on classifier performance, showed that different hyperparameter configurations have an impact on the performance of a classifier in general, but can also have an impact on how robust that classifier might be to dataset shift.