Date of Award

August 2017

Degree Type


Degree Name

Doctor of Philosophy



First Advisor

Zengwang Xu

Committee Members

Changshan Wu, Ryan Holifield, Woonsup Choi, Hyejin Yoon


Keywords

Cluster Analysis, Iterative Proportional Fitting, Population Synthesis, Random Forest Model, Travel Activity Patterns


Population synthesis is a fundamental procedure for individual-based modeling in transportation research. It generates anonymized individuals whose selected socio-demographic variables have statistical distributions similar to those of samples drawn from the real population. Previous studies on population synthesis focused on generating general-purpose populations by fitting the joint distributions of multiple variables to their sampled distributions. In addition to fitting the joint distributions, this study focuses on generating a population for travel activity analysis by considering individuals’ travel activity patterns and the associated social, economic, and demographic characteristics.

A person’s daily movement is a time sequence of activities connected by trips. It can be described by vectors that capture important transportation attributes such as travel distance, travel mode, activity type, activity time, and activity sequence. This study uses a multidimensional pattern vector method, which combines time geography, sequence alignment, and pattern vectors, to represent an individual’s daily travel activities. Using the 2001 and 2009 National Household Travel Survey (NHTS), individuals’ travel distances and activity sequences are normalized, compared, and integrated into a dissimilarity matrix. Major travel activity patterns are then identified by cluster analysis. A random forest model is applied to identify the prominent socio-demographic characteristics that correlate with the activity patterns, and these characteristics are then used to synthesize population microdata. Because the computational complexity of population synthesis grows exponentially with the number of attributes, focusing on the variables most important for travel activity analysis reduces the computational intensity.
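The sequence-comparison step described above can be illustrated with a minimal sketch. It assumes that each day is encoded as a string of hypothetical activity codes (H for home, W for work, S for shopping) and that dissimilarity is the Levenshtein edit distance divided by the longer sequence length. The dissertation's actual alignment costs and normalization are not given in this abstract, so both choices are assumptions made only for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two activity sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def normalized_dissimilarity(a, b):
    """Edit distance scaled to [0, 1] by the longer sequence (an assumed normalization)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def dissimilarity_matrix(sequences):
    """Pairwise dissimilarities, suitable as input to a clustering routine."""
    n = len(sequences)
    return [[normalized_dissimilarity(sequences[i], sequences[j])
             for j in range(n)] for i in range(n)]


# Hypothetical daily activity sequences, not taken from the NHTS.
days = ["HWH", "HWSH", "HSH"]
matrix = dissimilarity_matrix(days)
```

A symmetric matrix of this kind is the usual input to hierarchical or medoid-based clustering, which is one way the major activity patterns could be extracted.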

This study also addresses another issue in traditional population synthesis algorithms: the probability distributions at the individual and household levels cannot be fitted simultaneously. In this study, the Iterative Proportional Fitting (IPF) algorithm is used to account for the distributions at both levels and to generate synthetic population microdata with the prominent socio-demographic characteristics. The performance of the synthesis algorithm is evaluated with scatter plots and Normalized Root Mean Square Error (NRMSE) analysis. In addition, the distributions of socio-demographic attributes in the synthesized data are compared with those of the same variables in the observed sample dataset. The verification results indicate that the new method can produce better population microdata.
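As a rough illustration of the two techniques named above, the following sketch runs classical two-dimensional IPF on a small seed table and computes an NRMSE. The seed counts and marginal targets are made-up numbers, and normalizing the RMSE by the mean of the observed values is an assumption (other conventions divide by the range). It is not the multi-level procedure developed in the dissertation.

```python
def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Scale a seed table until its margins match the target margins."""
    table = [row[:] for row in seed]
    for _ in range(max_iter):
        # Fit row totals.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                factor = target / total
                table[i] = [v * factor for v in table[i]]
        # Fit column totals.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = target / total
                for row in table:
                    row[j] *= factor
        # Columns now match exactly; stop once rows match as well.
        row_err = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if row_err < tol:
            break
    return table


def nrmse(estimated, observed):
    """RMSE divided by the mean of the observed values (assumed normalization)."""
    n = len(observed)
    rmse = (sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n) ** 0.5
    return rmse / (sum(observed) / n)


# Hypothetical seed cross-tabulation and marginal targets.
seed = [[1.0, 2.0], [3.0, 4.0]]
fitted = ipf(seed, row_targets=[40.0, 60.0], col_targets=[30.0, 70.0])
fitted_col_sums = [sum(row[j] for row in fitted) for j in range(2)]
error = nrmse(fitted_col_sums, [30.0, 70.0])
```

IPF only rescales rows and columns, so the fitted table keeps the association structure (the cross-product ratios) of the seed while matching the target margins.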

This dissertation describes how to generate a synthetic population for Milwaukee County, WI, using the socio-demographic variables most prominent for travel activity analysis. Selecting only these prominent factors reduces the computational intensity of population synthesis. The study also finds that, by aggregating the IPF-generated weights of individuals and applying them at the household level, the overall goodness-of-fit can be kept at a reasonable level and the distributions of socio-demographic factors can be fitted at both the individual and household levels.
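One simple way to picture the aggregation of individual weights to the household level is sketched below. It assumes that a household's weight is the mean of its members' IPF weights. The abstract does not state the aggregation rule, so the averaging, the function name, and the identifiers are all hypothetical.

```python
from collections import defaultdict


def household_weights(person_weights, household_of):
    """Average person-level weights within each household (assumed rule)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for person, weight in person_weights.items():
        household = household_of[person]
        totals[household] += weight
        counts[household] += 1
    return {h: totals[h] / counts[h] for h in totals}


# Hypothetical person weights and household membership.
person_weights = {"p1": 2.0, "p2": 4.0, "p3": 1.5}
household_of = {"p1": "h1", "p2": "h1", "p3": "h2"}
weights = household_weights(person_weights, household_of)
```

Because every member of a household then shares one weight, drawing households with these weights keeps persons and households consistent, at the cost of a somewhat looser fit to the person-level margins.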

Included in

Geography Commons