Opinion: The Promise and Plight of Open Data

Open science serves to make the research process more transparent. But we are still waiting to realize the fruits of open-data policies at scientific journals.

As humans have faced crises such as COVID-19, climate change, and biodiversity loss, society has become more inclined to seek scientific answers that can rapidly solve our problems. This heightened demand for answers has imposed a profound pressure on scientists to conduct often complex research in a short time frame. As a result, there is less opportunity for review and error correction of results before publication. This rushed demand for information can also promote the spread of misinformation in the scientific literature. 

How can we, as scientists and members of the public, verify that published findings are robust and error free? 

A long-standing safeguard against errors in science is the peer-review process, where findings undergo extensive and meticulous review by experts in the field. In theory, these experts verify that research submitted for publication is scientifically sound and that findings are robustly supported. However, peer review is conducted by a limited number of experts who are often pressed for time and do not have access to a study’s raw data and analytical steps. 

Open science is a movement that aims to promote the accessibility and transparency of scientific research. A burgeoning dimension of open science is a move toward open data—the practice of publicly sharing the data underlying scientific findings. Access to the data used in a scientific study provides reviewers and readers the opportunity to better understand the process that led to the presented results. 

At the same time, open data allow anyone to reproduce a study’s analyses and validate its findings. Occasionally, readers identify errors in the data or analyses that slipped through the peer-review process. These errors can be handled through published corrections or retractions, depending on their severity. One would expect open data to result in more errors being identified and fixed in published papers. 

But are journals with open-data policies more likely than their traditional counterparts to correct published research with erroneous results? To answer this, we collected information on data policies and article retractions for 199 journals that publish research in the fields of ecology and evolution, and compared retraction rates before and after open-data policies were implemented. 

Surprisingly, we found no detectable link between data-sharing policies and annual rates of article retractions. We also found that the publication of corrections was not affected by requirements to share data, and that these results persisted after accounting for differences in publication rates among journals and time lags between policy implementation and study publication due to the peer-review process. While our analysis was restricted to studies in ecology and evolution, colleagues in psychology and medicine have suggested to us that they expect similar patterns in their fields of study. 
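As a rough illustration of the kind of before-and-after comparison described above, the sketch below models annual retraction counts with a Poisson regression, using the number of articles published as an offset to adjust for publication rates and lagging the policy indicator to allow for peer-review delays. This is an illustration only, not the analysis behind the results reported here; the journal names, counts, policy years, and two-year lag are all invented.

```python
# Minimal illustrative sketch only -- not the analysis reported in this piece.
# Journal names, counts, policy years, and the 2-year lag are all invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical journal-year records: retraction counts, articles published,
# and the year an open-data policy took effect (NaN = no policy).
records = pd.DataFrame({
    "journal":     ["J1", "J1", "J1", "J2", "J2", "J2"],
    "year":        [2014, 2016, 2018, 2014, 2016, 2018],
    "retractions": [1, 0, 2, 0, 1, 1],
    "articles":    [250, 270, 300, 400, 410, 430],
    "policy_year": [2015, 2015, 2015, np.nan, np.nan, np.nan],
})

# Lag the policy indicator to allow for the delay between policy adoption and
# the publication of papers that were actually reviewed under it.
LAG_YEARS = 2
records["policy_in_effect"] = (
    records["year"] >= records["policy_year"] + LAG_YEARS
).astype(int)

# Poisson regression of retraction counts on the policy indicator, with the
# log of articles published as an offset so the model compares rates.
model = smf.glm(
    "retractions ~ policy_in_effect",
    data=records,
    family=sm.families.Poisson(),
    offset=np.log(records["articles"]),
).fit()
print(model.summary())
```

Whatever the exact model, the core idea is to compare retraction rates, rather than raw counts, before and after a policy takes effect.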

Do these results mean that open-data policies are ineffective? No. There is no doubt that open data promote transparency, but our results suggest that a greater potential for error detection does not necessarily translate into greater error correction. We propose three additional measures, some of which could improve open-data practices themselves, to help science self-correct. 

First, journals should ensure that open-data files contain all the information required to reproduce results. Compliance with open-science policies has thus far proven less than ideal. A recent study we coauthored found that more than 50 percent of archived datasets among Canadian ecology and evolution researchers were either incomplete or difficult to reuse. Although this percentage seems high, it is an improvement over the state of data sharing indicated by earlier analyses. We are optimistic that this number will continue to shrink as researchers learn about and are recognized for their efforts in data transparency and open science.

Second, authors should provide the code used to analyze their data. New research can be more thoroughly reviewed if the analytical code required to produce the results is made openly available. Open code, even if it is imperfect, greatly facilitates validating data and results. Policies that mandate open code have been adopted by some academic journals, but they are rarely enforced.

Third, everyone should accept that scientists make mistakes. We need to work together to remove the negative stigma associated with error correction in science. Historically, article retractions have often been associated with research misconduct, such as data fabrication or fraud. It is critical to recognize that errors are frequently honest mistakes, and that correcting errors is a natural part of doing science. One way to destigmatize error correction is to relabel retractions using more-specific terminology (e.g., “self-retraction” and “amendment”). This distinction among error corrections can separate the cases of malice from those of conscientious researchers wanting to set the record straight after identifying mistakes. 

Integrating these three practices with open-data policies can strengthen the impact of open-science practices in academia. One possible outcome of open-data policies is encouraging authors to carefully check their work prior to publication, leading to fewer errors in the scientific record. If this were happening, we would expect to see a decrease in the rate of corrections and retractions at journals that mandate open data. While we did not detect such a decrease in our study, the benefits of careful data review before publication cannot currently be ruled out. Open-data policies are in their infancy, and many benefits are likely to arise as such policies become more common, both at journals and at funding agencies. 

For example, the White House recently updated US policy guidance to make federally funded research papers and data publicly available to all Americans as of 2026. International agencies and organizations, including the United Nations Educational, Scientific, and Cultural Organization (UNESCO) and the Organisation for Economic Co-operation and Development (OECD), are also highly supportive of open-science practices, including open data. 

Scientists and members of the public can improve and maintain the trustworthiness of scientific information by advocating for greater transparency in science. At a time when information has never been more plentiful, concerted efforts are needed to share data and code, and to destigmatize error correction, so that science is truly self-correcting.

Published courtesy of The Scientist.

Ilias Berberi is a PhD candidate studying the structure and dynamics of competitive interactions among bird species at Carleton University in Canada. Dominique Roche is a postdoctoral researcher studying ecology and the impacts of publicly shared data at the same institution and at the University of Neuchâtel in Switzerland. 

