Date of Award
May 2014
Degree Type
Thesis
Degree Name
Master of Science
Department
Engineering
First Advisor
Rohit J. Kate
Committee Members
Susan McRoy, Rashmi Prasad
Keywords
Clinical Text, Conditional Random Fields, MetaMap, Named Entity Recognition, Natural Language Processing, UMLS
Abstract
The aim of this thesis was to extract disease and disorder names from clinical text. We used Conditional Random Fields (CRF) as the main method for labeling diseases and disorders in clinical sentences, and used MetaMap and Stanford CoreNLP to extract key features. MetaMap was used to identify disease and disorder names that already appear in the UMLS Metathesaurus. Stanford CoreNLP supplied other important features, such as lemmatized word forms and part-of-speech (POS) tags. Additional features, including the semantic types of words, were extracted directly from the UMLS Metathesaurus. We participated in Task 7 of the SemEval 2014 competition and used its data to train and evaluate our system. The training data contained 199 clinical texts, the development data 99, and the test data 133; these included discharge summaries and echocardiogram, radiology, and ECG reports. We obtained competitive results on the disease/disorder name extraction task. An ablation study showed that while all features contributed, MetaMap matches, POS tags, and previous and next words were the most effective.
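To make the feature set named in the abstract concrete, the following is a minimal sketch of per-token feature extraction for a sequence labeler. It is an illustration only, not the thesis implementation: the function names, feature keys, boundary sentinels, and the B/I/O label scheme are all assumptions, and the lemmas, POS tags, and MetaMap match positions are hand-written stand-ins for what Stanford CoreNLP and MetaMap would supply.

```python
from typing import Dict, List, Set


def token_features(
    words: List[str],
    lemmas: List[str],
    pos_tags: List[str],
    metamap_hits: Set[int],
    i: int,
) -> Dict[str, str]:
    """Build a feature dict for token i (feature names are hypothetical)."""
    return {
        "word": words[i].lower(),
        "lemma": lemmas[i],
        "pos": pos_tags[i],
        # Context features: sentinel values mark sentence boundaries.
        "prev_word": words[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": words[i + 1].lower() if i < len(words) - 1 else "<EOS>",
        # "True" when a dictionary lookup flagged this token position.
        "in_metamap": str(i in metamap_hits),
    }


def sentence_features(
    words: List[str],
    lemmas: List[str],
    pos_tags: List[str],
    metamap_hits: Set[int],
) -> List[Dict[str, str]]:
    """Return one feature dict per token in the sentence."""
    return [
        token_features(words, lemmas, pos_tags, metamap_hits, i)
        for i in range(len(words))
    ]


# Hand-written toy sentence (assumed values, not taken from the task data).
words = ["Patient", "has", "atrial", "fibrillation", "."]
lemmas = ["patient", "have", "atrial", "fibrillation", "."]
pos_tags = ["NN", "VBZ", "JJ", "NN", "."]
metamap_hits = {2, 3}  # token positions assumed to match a known disorder name
labels = ["O", "O", "B", "I", "O"]  # assumed B/I/O encoding of the disorder span

features = sentence_features(words, lemmas, pos_tags, metamap_hits)
```

Feature dicts of this shape, paired with a label sequence of the same length, are the kind of input that linear-chain CRF toolkits typically accept. An ablation study like the one described could then be approximated by deleting one key at a time from every dict and retraining.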
Recommended Citation
Ghiasvand, Omid, "Disease Name Extraction from Clinical Text Using Conditional Random Fields" (2014). Theses and Dissertations. 495.
https://dc.uwm.edu/etd/495