Date of Award

December 2018

Degree Type


Degree Name

Doctor of Philosophy


Information Studies

First Advisor

Dietmar Wolfram

Committee Members

Richard P. Smiraglia, Jacques du Plessis, Margaret E.I. Kipp, Blake Catherine


data citation, data reuse, data sharing, research data


Despite the open science movement and mandates for the sharing of research data by major funding agencies and influential journals, the citation of data sharing and reuse has not become standard practice in the various science, technology, engineering and mathematics (STEM) fields. Advances in technology have lowered some barriers to data sharing, but it is a socio-technical phenomenon and the impact of the ongoing evolution in scholarly communication practices has yet to be quantified. Furthermore, there is need for a deeper and more nuanced understanding of author self-citation and recitation, the most often cited types of data, disciplinary differences regarding data citation and the extent of interdisciplinarity in data citation.

This study employed a mixed methods approach that combined coding with semi-automatic text-searching techniques in order to assess the impact of data sharing and reuse on data citation in STEM fields. The research considered over 500,000 open research data entities, such as datasets, software and data studies, from over 350 repositories worldwide. I also examined 705 bibliographic publications with a total of 15,261 instances of data sharing, reuse, and citation the data, article, discipline and interdisciplinary levels. More specifically, I measured the phenomenon of data sharing in terms of formal data citation, frequently cited data types, and author self-citation, and I explored recitation at the levels of both data- and bibliography-level, and data reuse practices in bibliographies, associations of disciplines, and interdisciplinary contexts.

The results of this research revealed, to begin with, disciplinary differences with regard to the impact of data sharing and reuse on data citation in STEM fields. This research also yielded the following additional findings regarding the citation of data by STEM researchers; 1) data sharing practices were diverse across disciplines: 2) data sharing has been increasing in recent years; 3) each discipline made use of major digital repositories; 4) these repositories took various forms depending on the discipline; 5) certain data types were more often cited in each discipline, so that the frequency distribution of the data types was highly skewed; 6) author self-citation and recitation followed similar trends at the data and bibliographic levels, but specific practices varied within each discipline; 7) associations between and across data and author self-citation and recitation at the bibliographic level were observed, with the self-citation rate differing significantly among disciplines;8) data reuse in bibliographies was rare yet diverse; 9) informal citation of data sharing and reuse at the bibliographic level was more common in certain fields, with astronomy/physics showing the highest amount (98%) and technology the lowest (69%); 10) within bibliographic publications, the documentation of data sharing and reuse occurred mainly in the main text; 11) publications in certain disciplines, such as chemistry, computing and engineering, did not attract citations from more than one field (i.e., showed no diversity); and, on the other hand,12) publications in other fields attracted a wide range of interdisciplinary data citations.

This dissertation, then, contributes to the understanding of two key areas aspects of the current citation systems. First, the findings have practical implications for individual researchers, decision makers, funding agencies and publishers with regard to giving due credits to those who share their data. Second, this research has methodological implications in terms of reducing the labor required to analyze the full text of associated articles in order to identify evidence of data citation.