Piwowar HA, Vision TJ. (2013) Data reuse and the open data citation advantage.PeerJ 1:e175  https://doi.org/10.7717/peerj.175

Conclusion. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered.

We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.


Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308.  doi:10.1371/journal.pone.0000308

Sharing research data provides benefit to the general scientific community, but the benefit is less obvious for the investigator who makes his or her data available.

We examined the citation history of 85 cancer microarray clinical trial publications with respect to the availability of their data. The 48% of trials with publicly available microarray data received 85% of the aggregate citations.

Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin using linear regression.


Belter CW (2014) Measuring the Value of Research Data: A Citation Analysis of Oceanographic Data Sets. PLoS ONE 9(3): e92590.  doi:10.1371/journal.pone.0092590

Evaluation of scientific research is becoming increasingly reliant on publication-based bibliometric indicators, which may result in the devaluation of other scientific activities – such as data curation – that do not necessarily result in the production of scientific publications. This issue may undermine the movement to openly share and cite data sets in scientific publications because researchers are unlikely to devote the effort necessary to curate their research data if they are unlikely to receive credit for doing so.

This analysis attempts to demonstrate the bibliometric impact of properly curated and openly accessible data sets by attempting to generate citation counts for three data sets archived at the National Oceanographic Data Center.

My findings suggest that all three data sets are highly cited, with estimated citation counts in most cases higher than 99% of all the journal articles published in Oceanography during the same years. I also find that methods of citing and referring to these data sets in scientific publications are highly inconsistent, despite the fact that a formal citation format is suggested for each data set.

These findings have important implications for developing a data citation format, encouraging researchers to properly curate their research data, and evaluating the bibliometric impact of individuals and institutions


Pienta, Amy M.; Alter, George C.; Lyle, Jared A. (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data  http://hdl.handle.net/2027.42/78307

Abstract : The goal of this paper is to examine the extent to which social science research data are shared and assess whether data sharing affects research productivity tied to the research data themselves. We construct a database from administrative records containing information about thousands of social science studies that have been conducted over the last 40 years.

Included in the database are descriptions of social science data collections funded by the National Science Foundation and the National Institutes of Health. A survey of the principal investigators of a subset of these social science awards was also conducted.

We report that very few social science data collections are preserved and disseminated by an archive or institutional repository. Informal sharing of data in the social sciences is much more common. The main analysis examines publication metrics that can be tied to the research data collected with NSF and NIH funding – total publications, primary publications (including PI), and secondary publications (non-research team).

Multivariate models of count of publications suggest that data sharing, especially sharing data through an archive, leads to many more times the publications than not sharing data. This finding is robust even when the models are adjusted for PI characteristics, grant award features, and institutional characteristics


Bertil Dorch. On the Citation Advantage of linking to data: Astrophysics. 2012.  <hprints-00714715v2>

Abstract : This paper present some indications of the existence of a Citation Advantage related to linked data, using astrophysics as a case. Using simple measures, I find that the Citation Advantage presently (at the least since 2009) amounts to papers with links to data receiving on the average 50% more citations per paper per year, than the papers without links to data.

A similar study by other authors should a cumulative effect after several years amounting to 20%. Hence, a Data Sharing Citation Advantage seems inevitable.


Edwin A. Henneken, Alberto Accomazzi (2011) Linking to Data – Effect on Citation Rates in Astronomy.  http://arxiv.org/abs/1111.3618

Abstract: Is there a difference in citation rates between articles that were published with links to data and articles that were not? Besides being interesting from a purely academic point of view, this question is also highly relevant for the process of furthering science. Data sharing not only helps the process of verification of claims, but also the discovery of new findings in archival data.

However, linking to data still is a far cry away from being a “practice”, especially where it comes to authors providing these links during the writing and submission process. You need to have both a willingness and a publication mechanism in order to create such a practice.

Showing that articles with links to data get higher citation rates might increase the willingness of scientists to take the extra steps of linking data sources to their publications. In this presentation we will show this is indeed the case: articles with links to data result in higher citation rates than articles without such link