Date of Award

May 2024

Degree Type


Degree Name

Doctor of Philosophy



First Advisor

Kundan Kishor

Committee Members

Jangsu Yoon, Tian Zhao, Omid Ardakani


This dissertation explores the use of machine learning (ML), deep learning (DL), and social media data analysis to improve the accuracy of stock return predictions in the housing market. It investigates different models and data sources with the aim of improving forecast accuracy beyond that of previous methods.

The first chapter of the dissertation investigates the efficacy of machine learning models in predicting housing values at different forecast horizons. The study uses several home price datasets and tunes hyperparameters by cross-validation. The findings indicate that machine learning models outperform traditional time series models, especially at longer forecast horizons. At shorter horizons, performance varies: simple models such as ARMA perform similarly to, or slightly worse than, more complex machine learning models.

The second chapter extends the research to Real Estate Investment Trusts (REITs), using both ML and DL models to forecast price changes. The study uses daily data for several REIT categories, such as office, retail, and data centers, covering the period from October 30, 2015, to October 13, 2023. The models include Random Forest, XGBoost, LightGBM, Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), Gated Recurrent Units (GRUs), and Bidirectional LSTMs (Bi-LSTMs). The results indicate that ML models outperform DL models in forecasting accuracy, as evidenced by their lower Root Mean Squared Errors (RMSEs). Notably, both ML and DL models outperform a simple random walk baseline.
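
As a rough illustration of the comparison described above, the sketch below computes the RMSE of a one-step-ahead random walk forecast and of a model forecast on the same series. It assumes the baseline forecasts each price as the previous observed price, which the abstract does not specify; the helper names and all numbers are hypothetical and are not taken from the dissertation.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not actual:
        raise ValueError("sequences must be non-empty")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def random_walk_forecast(prices):
    """One-step-ahead random walk: the next price is forecast as the current one.

    Assumption: a driftless random walk on price levels.
    """
    return list(prices[:-1])


# Hypothetical prices for illustration only (not data from the dissertation).
prices = [100.0, 102.0, 101.0, 103.0, 104.0]
actual = prices[1:]                        # values being forecast
baseline = random_walk_forecast(prices)    # previous price as the forecast
model = [101.5, 101.5, 102.5, 103.5]       # hypothetical model forecasts

baseline_rmse = rmse(actual, baseline)
model_rmse = rmse(actual, model)

# A model beats the baseline when its RMSE is lower.
print(f"random walk RMSE: {baseline_rmse:.4f}")
print(f"model RMSE:       {model_rmse:.4f}")
```

A lower RMSE than the random walk baseline is the sense in which a model is said to outperform it; the same comparison applies when ranking one model against another.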

Building on the performance of machine learning models in the earlier chapters, the third chapter explores the potential of social media data, particularly semantic information collected from Twitter, to improve predictions of homebuilder stock returns. The study compares the forecasting accuracy of features extracted from Twitter with that of a widely used macroeconomic indicator, the daily mortgage rate. Nine prominent machine learning models are used for prediction. The results indicate that all feature-based models perform better than the random walk baseline, underscoring the value of including additional data. Notably, using Twitter sentiment features increases prediction accuracy for four out of six homebuilders compared with relying on the mortgage rate alone. These findings suggest that social media sentiment analysis can be a useful way to capture market sentiment and improve predictions of stock returns in the homebuilding industry.

Overall, this dissertation contributes to the study of housing market prediction by demonstrating the efficacy of machine learning models and the potential of social media data to improve forecasting accuracy. The findings offer useful insights for investors, analysts, and researchers who aim to improve stock return forecasts in the housing market.

Included in

Economics Commons