Date of Award
May 2023
Degree Type
Thesis
Degree Name
Master of Science
Department
Mathematics
First Advisor
David DS Spade
Committee Members
Gabriella Pinter, Jeb Willenbring
Keywords
Association Study, Bayesian Lasso Regression, Feature Selection, Genetic data, Lasso Regression
Abstract
Association studies for genetic data are essential for understanding the genetic basis of complex traits. However, analyzing such high-dimensional data requires suitable feature selection methods. For this reason, we compare three methods, Lasso Regression, Bayesian Lasso Regression, and Ridge Regression combined with significance tests, to identify the most effective method for modeling quantitative trait expression in genetic data. All methods are applied to both simulated and real genetic data and evaluated with several measures of model performance, such as the mean absolute error, the mean squared error, the Akaike information criterion, and the Bayesian information criterion. The results show that all three methods predict future data better than the ordinary least squares model. Moreover, Lasso Regression outperforms the other methods in execution time and model simplicity; the simpler model is easier to interpret, which makes Lasso Regression the best choice for association studies. Overall, this thesis provides valuable insights into the strengths and limitations of existing feature selection methods for modeling quantitative trait expression and highlights their importance in association studies for genetic data.
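The abstract does not describe an implementation. As an illustration only, the following pure-Python sketch fits a Lasso by coordinate descent (setting the penalty to zero reduces it to ordinary least squares) on hypothetical simulated genotype-like data and computes the four performance measures named above. The simulation design, the penalty value 0.2, the Gaussian forms of AIC and BIC (up to an additive constant), and using the number of non-zero coefficients as the model size k are all assumptions, not the thesis's actual setup. Ridge Regression and Bayesian Lasso Regression are not shown.

```python
import math
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) used in the Lasso coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2n))||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    No intercept is fitted (assumes centred X and y). lam=0 gives least squares.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # (1/n) x_j^T (partial residual that excludes feature j)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def predict(X, beta):
    return [sum(xij * bj for xij, bj in zip(row, beta)) for row in X]


def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


def aic_bic(y, yhat, k):
    """Gaussian AIC and BIC up to an additive constant (assumed form).

    k is taken to be the number of non-zero coefficients (an assumption for the Lasso).
    """
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, yhat))
    base = n * math.log(rss / n)
    return base + 2 * k, base + k * math.log(n)


def simulate(n, p, beta_true, seed=1):
    """Hypothetical data: each feature is a 0/1/2 allele count, then centred."""
    rng = random.Random(seed)
    X = [[float((rng.random() < 0.5) + (rng.random() < 0.5)) for _ in range(p)]
         for _ in range(n)]
    means = [sum(row[j] for row in X) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in X]
    y = [sum(xj * bj for xj, bj in zip(row, beta_true)) + rng.gauss(0.0, 1.0) for row in X]
    ybar = sum(y) / n
    return X, [v - ybar for v in y]  # centred y, so no intercept is needed


if __name__ == "__main__":
    p = 20
    beta_true = [1.5, -1.0, 0.8] + [0.0] * (p - 3)  # hypothetical: 3 causal features
    X, y = simulate(100, p, beta_true)
    X_tr, y_tr, X_te, y_te = X[:70], y[:70], X[70:], y[70:]
    for name, lam in [("OLS (lam=0)", 0.0), ("Lasso (lam=0.2)", 0.2)]:
        beta = lasso_cd(X_tr, y_tr, lam)
        k = sum(1 for b in beta if b != 0.0)
        aic, bic = aic_bic(y_tr, predict(X_tr, beta), k)
        yhat = predict(X_te, beta)
        print(f"{name}: k={k}, test MAE={mae(y_te, yhat):.3f}, "
              f"test MSE={mse(y_te, yhat):.3f}, AIC={aic:.1f}, BIC={bic:.1f}")
```

The L1 penalty sets the coefficients of weakly associated features exactly to zero, which is the source of the simpler, more interpretable models the abstract attributes to Lasso Regression. Error measures are computed on held-out data, while AIC and BIC are computed on the training fit.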
Recommended Citation
Kubillus, Anna-Lena, "Comparative Study of Variable Selection Methods for Genetic Data" (2023). Theses and Dissertations. 3177.
https://dc.uwm.edu/etd/3177