“Reproducibility Project: Cancer Biology,” a collaboration of the Charlottesville-based Center for Open Science and the Palo Alto, Calif.-based Science Exchange, attempted to replicate five highly influential mouse experiments. Three of the attempts failed or were inconclusive. The other two found somewhat similar results, though with smaller effects.
The question now is why this second round of experiments was generally unsuccessful. The failure of a replication study does not mean that the original study was flawed. The flaws could be in the replication effort. Moreover, the journal eLife is publishing only the first five of 29 planned replication studies, and so the leaders of the effort say it is too soon to draw broad conclusions.
But these five studies support the argument that scientific findings — even ones that have gone through peer review and been published in elite journals — may not always be as robust as they first appear. At the very least, scientists should be more precise in describing their work so that others can come along and try to do the same thing. Reproducibility is a fundamental feature of solid science.
“Scientists are very smart people, but smart people can still fool themselves,” said John Ioannidis, a Stanford University professor who serves on the center's advisory board. “You have a high risk of seeing faces in the clouds. They just appear once, and then they’re gone.”
The center's founder, Brian Nosek, a University of Virginia professor of psychology, said there are many possible explanations for why those second attempts failed to come up with identical results. There may have been small differences in techniques, for example, or in the mice used in the experiments.
“We don’t know what to conclude. And that’s the most frustrating part of how science works. You do a study. You think you got something. We do it again. We don’t got it,” Nosek said.
The latest outcomes have distressed the scientists involved in the original experiments.
Atul Butte, a researcher at the University of California at San Francisco, said he was extremely “annoyed” about the way the project handled a study by him and other scientists. He said the replication team essentially found the same results but took an extra analytical step that rendered them statistically insignificant.
“They altered the methodology. If you replicate the study, why change the statistical methods at the end?” Butte said. He added, “Who watches the watchers?”
The replication researchers counter by saying they took extraordinary care to communicate with the original laboratories and submit the plans of what they intended to do, including their analytical methods, for peer review.
“The approach we’ve taken with the project is to make all of the data open and accessible to everyone. Anyone can go in and look at the data,” said Nicole Perfito, a project manager at Science Exchange.
An experiment involving mouse breast-cancer cells came from the laboratory of Irving Weissman, a renowned researcher at Stanford's School of Medicine. His study showed that a protein called CD47, which protects cancer from immune-system attacks, is on all human cancer cells. And it found that when an antibody blocked the protein, immune cells attacked tumors.
But the replication study was confounded by a spontaneous shrinkage of tumors in the “control” mice — the ones that didn’t get the anti-CD47 therapy.
Weissman harshly criticized the second team's work, saying the researchers tried to do only part of the experiment. He said their inability to grow tumors in the control mice showed they were having technical problems, ones that his lab offered to help figure out. “They turned us down,” he noted. An anti-CD47 antibody is now being tested in a clinical trial.
Another experiment, conducted as a collaboration that included the Dana-Farber Cancer Institute and the Broad Institute of the Massachusetts Institute of Technology and Harvard University, found that mutations in a gene known as PREX2 accelerated tumor growth in mice. The experiment used control mice that lacked the mutant gene. In the original, the control mice lived about two months before the growing tumors killed them, while mice with the mutant gene died much more quickly. But in the replication, the control mice mysteriously died in about a week, too quickly to test the hypothesis.
“I don’t believe the results from the reproducibility study calls into question the results of our original experiment,” said Lynda Chin, a co-principal investigator. She said the original result was validated in a subsequent experiment using a different technique.
Eric Lander, one of the country's top scientists, the head of the Broad Institute and one of 48 co-authors of the paper describing that first experiment, said, “If the controls differ in different papers, you can't actually compare the experiments. But it does not speak to anything wrong in the [original] paper. They couldn’t even test it. Now what they have to do is figure out why in the world in their hands are their mice dying in a week.”
Tim Errington, a cancer biologist who led the replication project for the Center for Open Science, countered that it “used the exact same system they did. It's their control system, not ours. And that did not replicate.”
He and Nosek said one possibility is that the cancerous cell line used in both sets of experiments had mutated over time.
Ioannidis has argued that scientists are prone to “selective reporting.” They do many experiments that come up with nothing new and don’t report those null results. Instead, they publish their more interesting, apparently significant findings, ideally in top journals such as Science, Nature, Cell and The Lancet. This selective reporting is not dishonest, much less fraudulent, but he thinks it can lead to the significance of observed effects being overstated. That's because random things happen. There are blips in the data: quirks, flukes, anomalies.
In 2011 and 2012, scientists at two major companies, Amgen and Bayer, reported that they had consistently struggled to replicate the findings of researchers in academic laboratories. These companies did not disclose the details of their experiments.
Enter Nosek, the University of Virginia professor. With grant money and the backing of many leaders in the scientific establishment, he founded the Center for Open Science. It advocates complete transparency in research, which includes publishing raw data.
In 2015, the center produced its first replication study. The researchers attempted 100 psychology experiments but could replicate only 39. A furor erupted. Harvard psychologist Daniel Gilbert led the attack, saying the Nosek group had made fundamental errors in its effort.
Then last year, the center extended its work into the field of cancer biology. One of its studies involved a longtime challenge in oncology: getting cancer treatments delivered into and around tumors. In 2010, a study reported that researchers had developed a peptide, a small piece of protein, that when combined with cancer drugs helped the drugs penetrate tumors. In the replication experiment, however, the peptide did not increase the drugs’ penetration.
Erkki Ruoslahti, a researcher at Sanford-Burnham Prebys Medical Discovery Institute in La Jolla, Calif., who was a main author of that original study, said the group seeking to replicate it proposed “a very, very limited study … and I was unhappy about it.” He decided he didn’t want to be involved in the center's project and declined to provide the peptide synthesized by his lab.
When the replicators' results didn’t match the original ones, Ruoslahti asserted, they did no troubleshooting. He said now he’s worried that negative publicity will hurt his efforts to bring the peptide into clinical use with patients. If so, he added, the replication team “didn’t do cancer patients a favor.”
Victor Velculescu, co-director of cancer biology for the Kimmel Cancer Center at Johns Hopkins, said reproducibility is important but that a balance needs to be struck with innovation. “If we want every study so perfectly reproduced that we won’t believe it until we do it 1,000 times, then it would take decades before people published anything,” he said.
Dinah Singer, director of the division of cancer biology at the National Cancer Institute, which provided at least some funding for all the original studies, said that the project's inability to replicate study results “doesn’t nullify the data but simply points out the difficulty in reproducing it. It provides us an opportunity to learn about the system and to move forward.” Similarly, Lawrence Tabak, the principal deputy director of the National Institutes of Health, of which NCI is a part, said the undertaking contributes to wide-ranging efforts by NIH, foundations and researchers themselves to increase reproducibility.
“Overall, this is not a setback for science. This is a contribution for science,” Tabak said. “Fundamental for science is its self-correcting nature. This was directed evolution, if you will, but still a way of self-correction.”