Initial findings of a new study1 on cancer research appear to bolster the emerging consensus that biomedical research has a “reproducibility” problem, meaning that attempts to replicate published studies do not always produce the same results. But a closer look at what it actually takes to reproduce a study reveals a process beset by variables that make clear conclusions difficult, one of the study’s authors argues.

“It is hard to replicate something because there are so many factors that could influence it,” says Timothy Errington, PhD, manager of metascience at the Center for Open Science at the University of Virginia in Charlottesville.

To study the problem, the Reproducibility Project at UVA has undertaken an elaborate effort to replicate prior research. Its recently published results focus on cancer research and follow a similar effort on psychology studies. The researchers took particular care to ensure that failures to replicate were not caused by errors in the replication experiments themselves. Errington and co-author Brian Nosek, PhD, concluded, “The results of the first set of replication studies are mixed, and while it is too early to draw any conclusions, it is clear that assessing reproducibility in cancer biology is going to be as complex as it was in a similar project in psychology.”

An emerging body of research indicates that many past studies, some of which may form the basis of current policies and recommendations, cannot be replicated by investigators today. This lack of reproducible research may undermine current studies built on those prior findings, a particular concern when investigators weigh the risk-benefit ratio for people participating in a clinical trial. Some analyses of the problem estimate that 50% or more of published results in biomedical research cannot be validated.2

To address these concerns and shed some much-needed light on the subject, the UVA researchers conducted replications with high statistical power and sought to authenticate the original key biological materials, in studies designed to avoid bias. In addition, “the authors of the original papers were contacted in advance for details of the research methodology that may not have appeared in their paper, and were asked to share any original reagents, protocols, and data in order to maximize the quality and fidelity of the replication designs,” the researchers reported.

In the end, the results were “mixed” and were not reduced to a percentage of reproducibility at this phase. Instead, the authors offered caveats, noting that “there is no such thing as exact replication because there are always differences between the original study and the replication. These differences could be obvious, like the date, the location of the experiment, or the experimenters, or they could be more subtle, like small differences in reagents or the execution of experimental protocols.”

In addition, a failure to replicate does not necessarily mean the original research was incorrect, the authors concluded.

“It is possible, for example, that differences in the methodologies that were thought to be irrelevant are actually important,” the authors noted. “Indeed, a failed replication can lead to a better understanding of a phenomenon if it results in the generation of new hypotheses to explain how the original and replication methodologies produced different results and, critically, leads to follow-up experiments to test these hypotheses.”

Coverage of the findings in the scientific press was less equivocal. A news report in the journal Science concluded, “Of the five studies the [cancer research replication] project has tackled so far, some involving experimental treatments already in clinical trials, only two could be repeated, one could not, and technical problems stymied the remaining two replication efforts.”3

IRB Advisor asked Errington to comment on this and other aspects of this complex project.

IRB Advisor: Is the assessment correct that only two of five cancer research results from prior trials could be replicated?

Errington: At this stage — and even at the end — we try not to label. The truth is we don’t know what a lot of this means. That is [the journal’s] opinion, and that is important, but we are actually interested in exploring this further. You talk to some people and they would say, “None were replicated.” Others would say, “Things look just fine.” Two of them came up with technical issues — what that really means is that in the replication, the experimental systems behaved differently. So, whether that is technical or that is actually what occurred originally but was not reported is kind of hard to separate. Those [comments] are really broad strokes, and the truth is, there is a lot more nuance to this. That’s what we are trying to get into — and to actually discuss that more in detail.

IRB Advisor: This has initially been presented as a general, widespread problem, but you cite a host of variables that could undermine replication efforts, adding considerable complexity to the whole question.

Errington: That’s all the more reason to try to improve this process. We are making great leaps in knowledge, but we are probably not being as efficient as we can in that entire process. We should be able to minimize the variance that occurs just from our own communications as scientists, or our own incentives to only publish part of the results versus everything. We can definitely change the behavior to expose more of it.

The other thing is because it is hard to replicate — which really means because it is hard to do research — we often need to be very cautious how much weight we put on any one study. Just because someone has published a study and nobody has replicated it, that doesn’t dismiss the original, but it doesn’t mean we should put too much [weight] on the original. It’s one piece of evidence. It’s important to recheck that piece of evidence to ensure we have reliable [research] and not just assume that. We need to put that in context. We do want to be able to try to trust our research as much as possible. If we can’t, OK — let’s figure out how we can improve it so that we can build on each other’s work more efficiently.

IRB Advisor: Is one of the goals of your work to develop a process or methodology to look at this problem?

Errington: This project is one way of doing that. It is difficult because we don’t incentivize replications or showing all of [the research] process. We’re doing these projects as a means to get a rate [of reproducibility]. Nobody reports these [and there’s not an accepted method] of figuring it out. So this project is an initial attempt to say, “Well, what is that method?” It would also be good to have complementary mechanisms that ask, “How can we better track all of the studies that are going on?” Because right now, that is locked away: what we see is what they publish. What they publish are positive results.

IRB Advisor: How does this project compare to your previous reproducibility study4 of psychological research?

Errington: [The psychology study’s] abstract describes five different ways to examine reproducibility, but we still really don’t know what that means. Everybody wants to put [a number on it], but we don’t understand it. Unfortunately, everybody just latched onto the number 39% [of studies that were reproducible] when they reported on it. But the truth is that both of these projects have already exposed common themes, [including] not being able to have access to all the data, the materials, and the methods. That was a big challenge even getting our [cancer research] project launched, and it was a similar problem in psychology. These [general problems] are not unique to [either of] the disciplines (maybe the aspects are different), but there are shared commonalities across all of science that can basically hinder reproducibility, and there are ways to improve these.

IRB Advisor: What are some of the obstacles that have to be overcome?

Errington: Right now there is a lot of emphasis on getting positive results, doing it very quickly, and getting novel findings. What that generally leads to (and with the psychology study we had the same thing) is that you have these small sample sizes. [Our replication experiments] (every single one, I think) had a higher sample size than the original because we are powering up our experiments to find useful effects. Say, instead of using five mice per condition, we are using 15-plus mice per condition. You don’t want to use too many because that is wasteful in terms of resources and lives, but if you use too few and too many people do that, you can get misled really quickly.

REFERENCES

  1. Nosek BA, Errington TM. Reproducibility in cancer biology: Making sense of replications. eLife 2017;6:e23383. DOI: 10.7554/eLife.23383. Available at: http://bit.ly/2kO10tG.
  2. Redman BK, Caplan AL. Limited Reproducibility of Research Findings: Implications for the Welfare of Research Participants and Considerations for Institutional Review Boards. IRB: Ethics & Human Research July-August 2016;8-10.
  3. Kaiser J. Rigorous replication effort succeeds for just two of five cancer papers. Science. Jan. 18, 2017. Available at: http://bit.ly/2j9JMmo.
  4. Open Science Collaboration. Estimating the reproducibility of psychological science. Science 2015;349(6251):aac4716. DOI: 10.1126/science.aac4716.