Date of Award

August 2020

Degree Type


Degree Name

Master of Science

Department

Computer Science

First Advisor

Susan W McRoy

Committee Members

Rohit Kate, Maryam Zolnoori

Keywords

healthcare, machine learning, opioid dependence medication, social media, text classification, topic modeling


Social media provides a convenient platform for patients to share their drug usage experiences with others; consequently, health researchers can leverage this data to gain valuable information about users’ satisfaction with drugs. Since the 1990s, opioid abuse has become a national crisis. Several drugs have been brought to market to reduce opioid dependence, but little is known about patient satisfaction with these treatments. Sentiment analysis is one method to measure and interpret patients’ satisfaction.

In the first phase of this study, we aimed to use social media posts to predict patients’ sentiment toward opioid dependence treatment. We focused on Suboxone, a well-known opioid dependence medication, as our target treatment, and on an online healthcare forum as our data source. We first collected 1,532 posts to create a training dataset, split the posts into sentences, and annotated 1,100 sentences for sentiment. To predict patients’ sentiment, we extracted features from the posts, including bigrams, trigrams, and features derived from topic modeling. We then trained two machine learning classifiers, Naïve Bayes and SVM, on these features. SVM performed best, with an accuracy of 61%.

In the second phase of this study, we aimed to understand patients’ behavior toward the targeted medication. To accomplish this, we used the Health Belief Model (HBM), a social psychological model that describes and predicts patients’ health-related attitudes in the categories of action, benefit, barrier, and threat, to predict such behavior from patients’ reviews. We used the same combinations of features and machine learning methods as in the first phase; the SVM classifier achieved the best accuracy, 47%, compared with 43% for our baseline.
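As a rough illustration of the first-phase pipeline (n-gram features feeding a classifier), the sketch below extracts bigrams and trigrams from a sentence and trains a multinomial Naïve Bayes classifier with add-one smoothing. It is a minimal sketch under stated assumptions, not the thesis’s implementation: the whitespace tokenizer, the helper names `ngrams`, `NaiveBayes`, and `accuracy`, the label names, and the toy sentences are all hypothetical, and the topic-modeling features and the SVM classifier are omitted.

```python
import math
from collections import Counter


def ngrams(sentence, n_values=(2, 3)):
    """Return word n-grams (bigrams and trigrams by default) for one sentence."""
    tokens = sentence.lower().split()  # assumption: simple lowercased whitespace tokenizer
    feats = []
    for n in n_values:
        feats.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return feats


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        label_counts = Counter(labels)
        # Log prior for each class: log P(c)
        self.log_priors = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        # Per-class n-gram frequency tables
        self.counts = {c: Counter() for c in self.classes}
        for sent, label in zip(sentences, labels):
            self.counts[label].update(ngrams(sent))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, sentence):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_priors[c]
            for f in ngrams(sentence):
                # Smoothed log likelihood: log((count + 1) / (total + |V|))
                score += math.log((self.counts[c][f] + 1) / (self.totals[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)  # class with the highest log posterior


def accuracy(model, sentences, labels):
    """Fraction of sentences whose predicted label matches the gold label."""
    correct = sum(model.predict(s) == y for s, y in zip(sentences, labels))
    return correct / len(labels)


# Hypothetical toy data, invented for illustration (not from the thesis corpus).
train_sents = [
    "suboxone really helped me stop using",
    "this medication really helped my cravings",
    "the withdrawal was terrible and painful",
    "the side effects were terrible for me",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_sents, train_labels)
print(model.predict("suboxone really helped my cravings"))  # prints "positive"
```

Because the class set is read from the training labels, the same sketch could in principle be trained on HBM categories (action, benefit, barrier, threat) instead of sentiment labels; comparing against an SVM would require swapping in a separate classifier, for example a linear SVM from an established library.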