Date of Award
December 2021
Degree Type
Thesis
Degree Name
Master of Science
Department
Computer Science
First Advisor
Susan W McRoy
Committee Members
Ethan V Munson, Jake Luo
Keywords
Classification, Machine Learning, Natural Language Processing, Sarcopenia
Abstract
Sarcopenia is a medical condition that involves loss of muscle mass. It has been difficult to define and was only recently assigned an official medical code, so many medical records lack a coded diagnosis even though the clinical note text may discuss the condition or its symptoms. This thesis investigates the application of machine learning and natural language processing to clinical note text, to see how well occurrence of the term ‘sarcopenia’ can be predicted in notes from records concerning the condition.
A variety of machine learning models, combined with different features and text processing, are evaluated using training data that mentions the term and test data that is coded for the condition, both drawn from small datasets from the Medical College of Wisconsin. By F1 score, no tested model configuration or combination of features performed exceptionally well. Still, some models did show promise, especially those classifying with a support vector machine, as well as other classifiers such as decision trees, gradient boosting and logistic regression. Although some of the ideas and approaches in this initial research did not perform well on the data studied, they provide some insight and paths forward for extending them and applying them to larger and more precise datasets.
Recommended Citation
Flasch, Kevin, "Predicting Occurrence of the Term Sarcopenia with Semi-Supervised Machine Learning" (2021). Theses and Dissertations. 2782.
https://dc.uwm.edu/etd/2782