Date of Award
December 2013
Degree Type
Thesis
Degree Name
Master of Science
Department
Mathematics
First Advisor
Kyle Swanson
Committee Members
Clark Evans, Johnathan Kahl
Keywords
Bias, Crowdsourced, Hail, mPING, Non-Meteorological, PING
Abstract
Hail is a substantial severe weather hazard in the USA, with significant damage to property and
crops occurring annually. Traditional methods of forecasting hail size have limited accuracy, and despite
improvements in remote sensing of precipitation, the fall characteristics of hail make quantification of
hail imprecise. Research into hail is ongoing, but traditional hail datasets have known biases and low
spatiotemporal resolution. The increased usage of smartphones creates the opportunity to use a
crowdsourced dataset provided by the Precipitation Identification Near the Ground (PING) program, a
program developed by the National Severe Storms Laboratory. PING data is compared to approximate
ground truth in the form of preliminary Storm Prediction Center (SPC) hail reports and National
Weather Service (NWS)-issued severe warning polygons. Biases and inaccuracies in the dataset are also
explored through exploratory data analysis.
While PING reports did not suffer from biases based on time of day or day of week, the location
of PING reports was found to have a heavy bias towards high population density areas compared to SPC
reports. Skill scores of PING reports, compared to SPC reports, were low, with a remarkably high False
Alarm Rate (FAR), indicating that false reports are a problem in the PING dataset. Comparing PING reports
to severe polygons did not substantially improve the skill scores. The low number of severe PING reports
prevented any meaningful analysis of size accuracy. While the number of SPC reports was mostly
correlated with the number of warning polygons issued by each Weather Forecast Office, the PING
reports were not well correlated, with an anomalously high number of reports in the Oklahoma City
region. The inaccuracy of PING reports and strong population bias suggest that the PING hail database
may not have high utility, and should only be used in conjunction with other databases in order to
ensure quality.
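The abstract does not say which skill scores were computed or how reports were matched, so the following is only a minimal sketch of how such scores are commonly derived from a 2x2 contingency table. It assumes a PING report counts as a hit when it matches an SPC report, a false alarm when it does not, and that an unmatched SPC report is a miss. FAR is computed here as false alarms divided by all PING reports (often called the false alarm ratio); the function name and the counts are hypothetical and are not taken from the thesis.

```python
import math


def skill_scores(hits, false_alarms, misses):
    """Return POD, FAR, and CSI from contingency-table counts.

    hits:         PING reports matched by an SPC report (assumed definition)
    false_alarms: PING reports with no matching SPC report
    misses:       SPC reports with no matching PING report
    """
    if min(hits, false_alarms, misses) < 0:
        raise ValueError("counts must be non-negative")

    reported = hits + false_alarms          # all PING reports
    observed = hits + misses                # all SPC reports
    total = hits + false_alarms + misses    # every event in either dataset

    # Return NaN instead of dividing by zero when a denominator is empty.
    pod = hits / observed if observed else math.nan
    far = false_alarms / reported if reported else math.nan
    csi = hits / total if total else math.nan
    return {"POD": pod, "FAR": far, "CSI": csi}


# Hypothetical counts, for illustration only (not results from the thesis).
scores = skill_scores(hits=20, false_alarms=80, misses=60)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, most PING reports have no SPC counterpart, so FAR is high while POD and CSI stay low, which is the qualitative pattern the abstract describes.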
Recommended Citation
Pehoski, Joseph Robert, "A Crowdsourced Hail Dataset: Potential, Biases, and Inaccuracies" (2013). Theses and Dissertations. 301.
https://dc.uwm.edu/etd/301