Date of Award

May 2023

Degree Type

Thesis

Degree Name

Master of Science

Department

Mathematics

First Advisor

David DS Spade

Committee Members

Gabriella Pinter, Jeb Willenbring

Keywords

Association Study, Bayesian Lasso Regression, Feature Selection, Genetic Data, Lasso Regression

Abstract

Association studies for genetic data are essential for understanding the genetic basis of complex traits. However, analyzing such high-dimensional data requires suitable feature selection methods. We therefore compare three methods, Lasso Regression, Bayesian Lasso Regression, and Ridge Regression combined with significance tests, to identify the most effective one for modeling quantitative trait expression in genetic data. All methods are applied to both simulated and real genetic data and evaluated using several measures of model performance, such as the mean absolute error, the mean squared error, the Akaike information criterion, and the Bayesian information criterion. The results show that all three methods outperform the ordinary least squares model in predicting future data. Moreover, Lasso Regression outperforms the other methods in execution time and model simplicity, which leads to better interpretability and makes it the best choice for association studies. Overall, this thesis provides valuable insights into the strengths and limitations of existing feature selection methods for modeling quantitative trait expression and highlights their importance in association studies for genetic data.
