Date of Award

May 2024

Degree Type

Thesis

Degree Name

Master of Science

Department

Computer Science

First Advisor

Susan McRoy

Abstract

Anemia is a global health problem that affects over 2 billion people. Although the major cause of anemia is iron deficiency (ID), global estimates suggest that only about half of anemia cases can be attributed to ID. The typical test for anemia measures hemoglobin using a Complete Blood Count (CBC) test, which also provides information on blood cell counts and morphology. Diagnosing iron deficiency anemia (IDA, in which anemia and ID co-exist in a subject) requires an additional, expensive serum ferritin test. However, blood cell counts and morphology can also be used to diagnose IDA. The goal of this study is therefore to evaluate and compare methods for training, testing, and explaining machine learning (ML) models that use data from routine CBC tests to identify IDA. We evaluate data-driven ML methods to classify IDA from more widely available CBC data, using a US-NHANES dataset of over 19,500 instances, and explain the results as ranked feature importance. The results show that, using CBC variables, IDA can be classified with a precision-recall area under the curve (PR AUC) of 0.88 and a recall (sensitivity) of 0.98 on the original dataset and 0.84 on an unseen dataset collected in Kenya. The explanations indicate which aspects of the CBC results contribute most to a diagnosis, revealing that optimization made only minor changes to the model and that the features used remained consistent with professional practice, suggesting that the approach would be acceptable to health professionals.
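For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how recall and a step-wise PR AUC (average precision) can be computed. It is not the thesis's code: the function names, the toy labels and scores, and the 0.5 decision threshold are all illustrative assumptions, and the average-precision routine assumes no tied scores.

```python
# Illustrative sketch only: toy data, hypothetical function names,
# not the evaluation code used in the thesis.

def recall(y_true, y_pred):
    """Recall (sensitivity): fraction of true positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Ranks instances by descending score and averages the precision
    observed at each true positive. Assumes scores contain no ties.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            ap += tp / rank  # precision at this recall step
    return ap / total_pos


# Toy example: 1 = IDA, 0 = not IDA (made-up values for illustration).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("recall:", recall(y_true, y_pred))
print("PR AUC (average precision):", average_precision(y_true, scores))
```

Recall depends on the chosen threshold, while PR AUC summarizes performance across all thresholds, which is why the two are reported together for an imbalanced condition such as IDA.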