Date of Award

May 2023

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Engineering

First Advisor

Rohit Kate

Committee Members

Rohit Kate, Jake Luo, Tian Zhao, Jun Zhang, Zeyun Yu

Keywords

Bidirectional Encoder Representations from Transformers (BERT), Clinical Ontology, Medical Ontology, Ontology Embeddings, SNOMED CT, Word Embeddings

Abstract

Leveraging Biomedical Ontological Knowledge to Improve Clinical Term Embeddings
by Fuad Abu Zahra
The University of Wisconsin-Milwaukee, 2023
Under the Supervision of Dr. Rohit J. Kate

This research concerns obtaining and using word embeddings for natural language processing tasks in the biomedical domain. Word embeddings are vector representations of words, commonly obtained from large text corpora. This research leverages the biomedical ontology SNOMED CT as an alternative source of embeddings for clinical terms. Existing graph-based methods can only produce embeddings for the concepts of an ontology (i.e., the nodes of its graph); hence we developed a novel method to obtain embeddings for clinical words and terms from their concept embeddings. These embeddings were evaluated on benchmark datasets of clinical term similarity and on the clinical term normalization task, and were found to work better than corpus-based embeddings.

However, unlike corpus-based embeddings, the embeddings obtained from SNOMED CT do not incorporate linguistic knowledge, because the method was not trained on text data. Therefore, we also developed two new methods to combine the two sources of embeddings: generating a synthetic corpus from the SNOMED CT ontology and using it for additional training with corpus-based methods, and fine-tuning a corpus-based system on SNOMED CT concept embeddings. The evaluation showed that the combined embeddings obtained using these methods perform better than either type of embedding alone.
