Text Mining of Patient Demographics and Diagnoses from Psychiatric Assessments

Date of Award

December 2014

Degree Type

Thesis

Degree Name

Master of Science

Department

Health Care Informatics

First Advisor

Rashmi Prasad

Committee Members

Timothy Patrick, Rohit Kate

Keywords

Information Extraction, Patient Demographics, Patient Psychiatric Diagnoses, Psychology, Text Mining

Abstract

Automatic extraction of patient demographics and psychiatric diagnoses from clinical notes makes it possible to collect patient data on a large scale. Such data could support a variety of research purposes, including outcomes studies and the design of clinical trials. However, existing research has not yet examined the automatic extraction of demographics and psychiatric diagnoses in detail. The aim of this study is to apply text mining to extract patient demographics (age, gender, marital status, and education level) and admission diagnoses from psychiatric assessments at a mental health hospital, and to assign a code to each extracted value. Gender is coded as Male or Female; marital status as Single, Married, Divorced, or Widowed; and education level as one of several levels ranging from Some High School to Graduate Degree (PhD/JD/MD etc. level). Diagnoses are classified according to the DSM-IV. For each category, we developed a rule-based method that uses keyword-based regular expressions together with constituency trees and typed dependencies. The method has two steps: keyword-based patterns first maximize recall, and, where necessary, NLP-based rules then maximize precision by resolving ambiguity. To develop and evaluate the method, we annotated a corpus of 200 assessments, using one portion for development and the remainder as a test set. F-scores were satisfactory for each category (Age: 0.997; Gender: 0.989; Primary Diagnosis: 0.983; Marital Status: 0.875; Education Level: 0.851), as was coding accuracy (Age: 1.0; Gender: 0.989; Primary Diagnosis: 0.922; Marital Status: 0.889; Education Level: 0.778). These results suggest that a rule-based approach is worth considering for extracting these types of information in the psychiatric field.
At the same time, performance dropped from the development set to the test set, partly because the rules need to generalize better.
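The two-step strategy above (keyword patterns for recall, then rules for precision, then coding) can be sketched for a single category, marital status. This is a minimal illustration under stated assumptions, not the thesis's implementation: the keyword lexicon, the family-member filter (a deliberately simple stand-in for the constituency-tree and typed-dependency rules, which would require a parser), and the names `code_marital_status` and `f_score` are all hypothetical. It also assumes that the reported F-score is the balanced F1.

```python
import re
from typing import Optional

# Hypothetical lexicon: surface keywords mapped to the four marital-status
# codes named in the abstract. This is NOT the thesis's actual keyword list.
MARITAL_KEYWORDS = {
    "never married": "Single",
    "single": "Single",
    "married": "Married",
    "divorced": "Divorced",
    "widower": "Widowed",
    "widowed": "Widowed",
    "widow": "Widowed",
}

# Step 1 (maximize recall): one case-insensitive alternation over every
# keyword, longest first so that multi-word keywords are preferred.
_KEYWORD_RE = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(MARITAL_KEYWORDS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

# Step 2 (raise precision): a simple stand-in for parse-based rules.
# A keyword is rejected if a word naming another person appears earlier
# in the same sentence or clause (e.g. "Her parents are divorced").
_OTHER_PERSON_RE = re.compile(
    r"\b(mother|father|parents?|sisters?|brothers?|siblings?|sons?|daughters?|friends?)\b",
    re.IGNORECASE,
)


def code_marital_status(note: str) -> Optional[str]:
    """Return the first marital-status code that passes both steps, or None."""
    for match in _KEYWORD_RE.finditer(note):
        # Start of the current sentence/clause: just after the last '.' or ';'.
        start = max(note.rfind(".", 0, match.start()), note.rfind(";", 0, match.start())) + 1
        if _OTHER_PERSON_RE.search(note[start:match.start()]):
            continue  # keyword probably describes a relative, not the patient
        return MARITAL_KEYWORDS[match.group(1).lower()]
    return None


def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(code_marital_status("Patient is a 45 year old divorced male."))
    print(code_marital_status("Her parents are divorced; she has never married."))
    print(f_score(tp=8, fp=2, fn=2))
```

In this sketch the first call yields "Divorced". In the second, the keyword "divorced" is discarded because "parents" precedes it in the same clause, so "never married" is coded as "Single". A keyword-only pass would have returned the wrong code here, which illustrates why a precision step follows the recall-oriented patterns.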

Recommended Citation

Klosterman, Eric James, "Text Mining of Patient Demographics and Diagnoses from Psychiatric Assessments" (2014). Theses and Dissertations. 613.
https://dc.uwm.edu/etd/613

Included in

Computer Sciences Commons, Medicine and Health Sciences Commons, Psychology Commons