Date of Award
Master of Science
Susan W McRoy
Ethan V Munson, Jake Luo
Classification, Machine Learning, Natural Language Processing, Sarcopenia
Sarcopenia is a medical condition that involves loss of muscle mass. It has been difficult to define and was only recently assigned an official medical code, so many medical records lack a coded diagnosis even though the clinical note text may discuss the condition or its symptoms. This thesis investigates the application of machine learning and natural language processing to clinical note text, to see how well the term 'sarcopenia' can be predicted in notes from records concerning the condition.
A variety of machine learning models, combined with different features and text processing, are tested using training data that mentions the term and test data that is coded for the condition, both drawn from small datasets from the Medical College of Wisconsin. This research showed that, based on the F1 score, no tested configuration or combination of features performed exceptionally well. Still, some models did show promise, especially those classifying with a support vector machine, as well as other classifiers such as decision trees, gradient boosting and logistic regression. Based on this initial research, while some of the ideas and approaches here did not perform well on the data studied, they provide some insight and paths forward for extending them and applying them to larger and more precise datasets.
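Since the abstract compares configurations by F1 score, a minimal sketch of how that metric is computed for a binary classifier may help. The function name, the label encoding (1 for a note associated with sarcopenia, 0 otherwise) and the example labels are hypothetical and are not taken from the thesis or its data.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score (harmonic mean of precision and recall)
    for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: report F1 as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight notes (illustration only, not thesis data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(f1_score(y_true, y_pred))
```

Because F1 ignores true negatives, it is a common choice when the positive class is rare, which is plausibly the case for a condition that is seldom coded.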
Flasch, Kevin, "Predicting Occurrence of the Term Sarcopenia with Semi-Supervised Machine Learning" (2021). Theses and Dissertations. 2782.