Date of Award

December 2013

Degree Type

Thesis

Degree Name

Master of Science

Department

Engineering

First Advisor

Rashmi Prasad

Committee Members

Susan McRoy, Rohit J. Kate

Keywords

Biomedical Text, Classification, Drug-Drug Interaction, Natural Language Processing

Abstract

One of the critical causes of medical errors is drug-drug interaction (DDI), which occurs when one drug increases or decreases the effect of another drug. We propose a machine learning system to extract and classify drug-drug interactions from the biomedical literature, using the annotated corpus from the DDIExtraction 2013 shared task. Our approach applies a two-stage classifier to handle the highly unbalanced class distribution in the corpus. The first stage performs binary classification of drug pairs as interacting or non-interacting, and the second stage further classifies the interacting pairs into one of four interaction types: advise, effect, mechanism, and int. To find the best set of features for classification, we explored many features, including stemmed words, bigrams, part-of-speech tags, verb lists, parse tree information, mutual information, and similarity measures, among others. Because the two stages pose different classification tasks, binary and multi-class, we also explored various classifiers in each stage. Our results show that the best-performing classifier in both stages was the Support Vector Machine, and the best-performing features were the 1,000 most informative words and the part-of-speech tags between the two main drugs. We obtained an F-measure of 0.64, a 12% improvement over the system we submitted to the DDIExtraction 2013 competition.
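The two-stage design can be illustrated as a simple cascade. Stage 1 is trained on every drug pair with the four interaction types collapsed into one positive class. Stage 2 is trained only on the interacting pairs, so the large negative class does not dominate the four-way decision. This is a minimal sketch under stated assumptions, not the thesis implementation: a toy nearest-centroid classifier stands in for the SVMs, and the `none` label, the `w:...` bag-of-words feature names, and the toy training data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List

Features = Dict[str, float]  # sparse feature vector: feature name -> value
NEGATIVE = "none"            # hypothetical label for non-interacting pairs


class CentroidClassifier:
    """Toy nearest-centroid classifier standing in for an SVM (assumption)."""

    def fit(self, X: List[Features], y: List[str]) -> "CentroidClassifier":
        sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
        counts = Counter(y)
        for x, label in zip(X, y):
            acc = sums[label]  # touch the entry so empty vectors still create a class
            for name, value in x.items():
                acc[name] += value
        # Each class centroid is the mean of its training vectors.
        self.centroids = {
            label: {name: total / counts[label] for name, total in acc.items()}
            for label, acc in sums.items()
        }
        return self

    def predict_one(self, x: Features) -> str:
        def sq_dist(c: Features) -> float:
            keys = set(x) | set(c)
            return sum((x.get(k, 0.0) - c.get(k, 0.0)) ** 2 for k in keys)

        # Predict the class whose centroid is closest in squared Euclidean distance.
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


class TwoStageClassifier:
    """Stage 1: interacting vs. non-interacting; stage 2: interaction type."""

    def __init__(self, make_clf: Callable[[], CentroidClassifier] = CentroidClassifier):
        self.make_clf = make_clf

    def fit(self, X: List[Features], y: List[str]) -> "TwoStageClassifier":
        # Stage 1 sees every pair, with all positive types collapsed into one class.
        binary = [NEGATIVE if label == NEGATIVE else "interact" for label in y]
        self.stage1 = self.make_clf().fit(X, binary)
        # Stage 2 is trained only on interacting pairs (advise, effect, mechanism, int).
        positives = [(x, label) for x, label in zip(X, y) if label != NEGATIVE]
        self.stage2 = self.make_clf().fit([x for x, _ in positives],
                                          [label for _, label in positives])
        return self

    def predict(self, X: List[Features]) -> List[str]:
        predictions = []
        for x in X:
            if self.stage1.predict_one(x) == NEGATIVE:
                predictions.append(NEGATIVE)  # pair rejected by stage 1
            else:
                predictions.append(self.stage2.predict_one(x))  # assign a type
        return predictions


# Hypothetical toy training data: bag-of-words features for drug pairs.
X_train = [
    {"w:no": 1.0},
    {"w:no": 1.0, "w:and": 1.0},
    {"w:avoid": 1.0},
    {"w:increase": 1.0},
    {"w:metabolism": 1.0},
    {"w:interact": 1.0},
]
y_train = ["none", "none", "advise", "effect", "mechanism", "int"]

clf = TwoStageClassifier().fit(X_train, y_train)
print(clf.predict([{"w:avoid": 1.0}, {"w:no": 1.0}, {"w:metabolism": 1.0}]))
# prints ['advise', 'none', 'mechanism']
```

Using a real SVM in either stage would only require a wrapper exposing the same `fit` and `predict_one` methods; the cascade logic itself would not change.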
