Date of Award

August 2020

Degree Type

Thesis

Degree Name

Master of Science

Department

Mathematics

First Advisor

Istvan Lauko

Committee Members

Istvan Lauko, Jeb Willenbring, Gabriella Pinter

Keywords

BERT, Carthon, Machine Learning, Math, NLP, NLU

Abstract

Recent advances in natural language processing technology have led to the emergence of large, deep pre-trained neural networks. The focus of these networks is transfer learning: retraining or fine-tuning such pre-trained networks to achieve state-of-the-art performance on a variety of challenging natural language processing/understanding (NLP/NLU) tasks. In this thesis, we focus on identifying paraphrases at the sentence level using Bidirectional Encoder Representations from Transformers (BERT). It is well understood that in deep learning the volume and quality of training data are determining factors of performance. The objective of this thesis is to develop a methodology for the algorithmic generation of high-quality training data for the paraphrasing task, an important NLU task, as well as to evaluate the resulting training data by fine-tuning BERT to identify paraphrases. Here we focus on elementary adverbial paraphrases, but the methodology extends to the general case. In this work, training data for adverbial paraphrasing was generated using an Oxford synonym dictionary, and we used the generated data to re-train BERT for the paraphrasing task with strong results, achieving a validation accuracy of 96.875%.
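The abstract describes fine-tuning BERT as a sentence-pair classifier for paraphrase identification. The thesis does not specify its training code, so the following is only a minimal sketch of that setup, assuming the Hugging Face `transformers` library and example sentences that are not taken from the thesis data.

```python
# Illustrative sketch: BERT configured for sentence-pair (paraphrase) classification.
from transformers import BertTokenizer, BertForSequenceClassification
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Encode a candidate pair as one sequence: [CLS] sentence1 [SEP] sentence2 [SEP]
inputs = tokenizer(
    "She finished the proof quickly.",
    "She finished the proof rapidly.",
    return_tensors="pt",
)

# Forward pass; during fine-tuning, a cross-entropy loss over these logits
# (paraphrase vs. not paraphrase) would be backpropagated over the generated pairs.
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # predicted probabilities for the two labels
```

In practice, pairs produced by adverb/synonym substitution would be labeled and fed through this model in batches with a standard optimizer; the particulars here (model checkpoint, sentences, label count) are assumptions for illustration.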
