Date of Award
May 2024
Degree Type
Thesis
Degree Name
Master of Science
Department
Computer Science
First Advisor
Susan McRoy
Abstract
Anemia is a global health problem affecting over 2 billion people. Although the major cause of anemia is iron deficiency (ID), global estimates suggest that only about half of anemia cases can be attributed to ID. The typical test for anemia measures hemoglobin using a Complete Blood Count (CBC) test, which also provides information on blood cell counts and morphology. Diagnosing iron deficiency anemia (IDA, in which anemia and ID co-exist in a subject) requires an additional, expensive serum ferritin test. However, blood cell counts and morphology can also be used to diagnose IDA. The goal of this study is therefore to evaluate and compare methods for training, testing, and explaining machine learning (ML) models that use data from routine CBC tests to identify IDA. We evaluate data-driven ML methods for classifying IDA from the more widely available CBC data, using a US-NHANES dataset of over 19,500 instances, and explain the results as ranked feature importance. The results show that, using CBC variables, IDA can be classified with a precision-recall area under the curve (PR AUC) of 0.88 and a recall (sensitivity) of 0.98 and 0.84 for the original dataset and an unseen dataset collected in Kenya, respectively. The explanations indicate which aspects of the CBC results contribute most to a diagnosis, revealing that optimization made only minor changes to the model and that the features used remained consistent with professional practice, suggesting that the approach would be acceptable to health professionals.
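For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how recall (sensitivity) and PR AUC can be computed from a classifier's outputs. It is not the thesis's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the data are not drawn from NHANES or the Kenyan dataset. PR AUC is estimated here as average precision (a step-wise sum), which is one common estimator; the thesis may have used a different one.

```python
# Illustrative only: toy labels/scores, NOT data from the thesis.
# 1 = IDA, 0 = not IDA (hypothetical encoding).

def recall(y_true, y_pred):
    """Recall (sensitivity): fraction of true positives that were detected."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Samples are ranked by descending score; precision is accumulated at
    each rank where a positive is found. Tied scores are not treated
    specially in this simplified version.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            ap += tp / rank  # precision at this recall step
    return ap / total_pos


# Hypothetical model outputs for eight subjects.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

# Assumed decision threshold of 0.5 for turning scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"recall = {recall(y_true, y_pred):.2f}")
print(f"PR AUC (average precision) = {average_precision(y_true, scores):.3f}")
```

Recall depends on the chosen threshold, whereas PR AUC summarizes performance across all thresholds, which is why the two figures are reported separately.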
Recommended Citation
Pullakhandam, Siddartha, "CLASSIFICATION AND EXPLANATION OF IRON DEFICIENCY ANEMIA FROM COMPLETE BLOOD COUNT DATA USING MACHINE LEARNING" (2024). Theses and Dissertations. 3508.
https://dc.uwm.edu/etd/3508