Date of Award
August 2020
Degree Type
Thesis
Degree Name
Master of Science
Department
Mathematics
First Advisor
Istvan Lauko
Committee Members
Istvan Lauko, Jeb Willenbring, Gabriella Pinter
Keywords
BERT, Carthon, Machine Learning, Math, NLP, NLU
Abstract
Recent advances in natural language processing technology have led to the emergence of large, deep pre-trained neural networks. These networks are built for transfer learning: retraining or fine-tuning a pre-trained network achieves state-of-the-art performance on a variety of challenging natural language processing/understanding (NLP/NLU) tasks. In this thesis, we focus on identifying paraphrases at the sentence level using the network Bidirectional Encoder Representations from Transformers (BERT). It is well understood that in deep learning the volume and quality of the training data are determining factors of performance. The objective of this thesis is to develop a methodology for the algorithmic generation of high-quality training data for the paraphrasing task, an important NLU task, and to evaluate the resulting training data by fine-tuning BERT to identify paraphrases. Here we focus on elementary adverbial paraphrases, but the methodology extends to the general case. In this work, training data for adverbial paraphrasing was generated using an Oxford synonym dictionary, and we used the generated data to retrain BERT for the paraphrasing task with strong results, achieving a validation accuracy of 96.875%.
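To illustrate the dictionary-based generation step the abstract describes, here is a minimal sketch of producing labeled paraphrase pairs by substituting adverb synonyms into a sentence. The small ADVERB_SYNONYMS mapping and the example sentence are illustrative stand-ins, not the Oxford synonym dictionary or the data actually used in the thesis.

```python
# Sketch: generate (sentence, variant, label) triples by swapping an
# adverb for a dictionary synonym; label 1 marks a paraphrase pair.
# The dictionary below is a tiny hypothetical stand-in for the Oxford
# synonym dictionary referenced in the thesis.

ADVERB_SYNONYMS = {
    "quickly": ["rapidly", "swiftly"],
    "happily": ["cheerfully", "gladly"],
}

def generate_pairs(sentence: str) -> list[tuple[str, str, int]]:
    """Return (sentence, adverbial variant, label) triples."""
    pairs = []
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        for syn in ADVERB_SYNONYMS.get(tok.lower(), []):
            # Replace exactly one adverb to form an elementary
            # adverbial paraphrase of the original sentence.
            variant = " ".join(tokens[:i] + [syn] + tokens[i + 1:])
            pairs.append((sentence, variant, 1))
    return pairs

print(generate_pairs("She quickly finished the proof"))
```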
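And for the fine-tuning step, a minimal sketch of training BERT as a sentence-pair paraphrase classifier, here using the Hugging Face transformers library. The thesis does not specify its training framework, so this setup, the bert-base-uncased checkpoint, the learning rate, and the toy batch are all assumptions for illustration.

```python
# Sketch: fine-tune BERT for sentence-pair (paraphrase) classification.
# Framework, checkpoint, and hyperparameters are assumptions.
import torch
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 2 classes: paraphrase / not

# Toy batch: one generated paraphrase pair, one non-paraphrase pair.
pairs = [("She quickly finished the proof",
          "She rapidly finished the proof"),
         ("She quickly finished the proof",
          "The lemma is false")]
labels = torch.tensor([1, 0])

# The tokenizer joins each pair with [SEP] so BERT encodes it jointly.
batch = tokenizer([a for a, _ in pairs], [b for _, b in pairs],
                  padding=True, truncation=True, return_tensors="pt")

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()  # one gradient step of fine-tuning
optimizer.step()
```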
Recommended Citation
Carthon, Mark Anthony, "Dictionary-based Data Generation for Fine-Tuning Bert for Adverbial Paraphrasing Tasks" (2020). Theses and Dissertations. 2476.
https://dc.uwm.edu/etd/2476