Date of Award

May 2023

Degree Type


Degree Name

Master of Science



First Advisor

Gabriella Pinter

Committee Members

Istvan Lauko, David Spade


Baseball Analysis, Sabermetrics


From the statistics reported in newspapers in the 1840s to the present day, baseball has always been one of the most data-driven sports. We use the extensive publicly available baseball data to build models in R and Python that answer baseball-related questions about predicting and optimizing run production, evaluating player effectiveness, and forecasting the postseason. To predict and optimize run production, we present three models. The first builds a common tool in baseball analysis called a Run Expectancy Matrix, which assigns a value (in runs) to various in-game decisions. The second uses the batting statistics of a lineup of 9 players to predict the average number of runs per game that the team should score. The third finds the optimal position in the lineup for the 9th batter by calculating the average runs per game for each possible batting-order placement and returning the lineup that produces the most runs. To evaluate player effectiveness, we build a model that calculates a player's WAR (Wins Above Replacement). To forecast the postseason, we follow an updated version of Bill James' "World Series Prediction System" developed by Rob Mains and present code that implements this system. Each of these models provides data analysis that is useful for improving the performance of not just individual players, but entire teams.
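To make the Run Expectancy Matrix idea concrete, here is a minimal Python sketch. It is not the thesis's implementation: the function names, the event format (a base-state string such as "100" for a runner on first, the number of outs, and the runs scored from that point to the end of the inning), and the toy numbers are all assumptions made for illustration. The matrix is the average runs scored through the end of the inning from each base-out state, and an in-game decision is valued by the change in run expectancy plus any runs that score on the play.

```python
from collections import defaultdict


def run_expectancy_matrix(events):
    """Average runs scored through the end of the inning per (bases, outs) state.

    `events` is an iterable of (bases, outs, runs_rest_of_inning) tuples.
    This record layout is a hypothetical one chosen for the sketch.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for bases, outs, runs_rest_of_inning in events:
        totals[(bases, outs)] += runs_rest_of_inning
        counts[(bases, outs)] += 1
    return {state: totals[state] / counts[state] for state in totals}


def run_value(matrix, before, after, runs_on_play=0):
    """Value of a play in runs: RE(after) - RE(before) + runs scored on the play."""
    return matrix[after] - matrix[before] + runs_on_play


# Toy, made-up events used only to exercise the functions.
toy_events = [
    ("000", 0, 0),
    ("000", 0, 1),
    ("100", 0, 1),
    ("100", 0, 2),
    ("000", 1, 0),
]

matrix = run_expectancy_matrix(toy_events)
single_value = run_value(matrix, before=("000", 0), after=("100", 0))
```

With real play-by-play data, the same grouping would be applied to every plate appearance in a season, giving one entry for each of the 24 base-out states.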