Last week a new Science paper (Gilbert et al.) came out criticizing a previous Science paper (Nosek et al.), which had tried to re-run 100 psychology findings and found that only about 40 percent replicated. In the new paper, Gilbert et al. suggest that this low rate was due to analytical mistakes and infidelities to the original methods, and that when these are accounted for, replication rates were quite high.
Harvard (home to three of the four Gilbert et al. authors) issued an embargoed press release to journalists in advance with the eminently clickable title “Researchers overturn landmark study on the replicability of psychological science.” When the embargo was lifted, newspapers were ready with headlines like “New Critique Sees Flaws in Landmark Analysis of Psychology Studies” and “Psychobabble: Study claiming academic psychologists' research is mostly untrue 'is itself flawed and biased.'” At least two outlets seem to have added comments from Nosek only after their articles were written and posted online.
Within a few days, though, critiques of the Gilbert et al. “debunking” paper started appearing (including three! from Andrew Gelman: 1, 2, 3), and Nosek et al. suggested that Gilbert et al. had misrepresented details as “infidelities.” Among the better methodological critiques are ones from Sanjay Srivastava and Uri Simonsohn (Gilbert et al. reply here).
[Side note: In an amazing irony bordering on divine providence, the American Statistical Association released a “reminder” of proper use of p-values for scientists right in the middle of this.]
IPA’s a big supporter of data sharing, transparency, and replications. But we seem to be in the early days of a new wave of replication methods being worked out, and it’s hard not to draw comparisons to last year’s “Worm Wars” debates over the effects of treating children for parasites on education outcomes. In that case a university also issued an embargoed press release announcing an upcoming paper that overturned a well-established finding (full disclosure: one that we have some ties to). In both cases, after a week of acrimonious back-and-forth, most researchers seemed to agree that the new findings were more shrug-worthy than earth-shattering, but that came long after the news headlines had settled into the back of readers’ minds. In the end, the best coverage of “repligate” is probably the more measured story that came out the following week from New York Magazine: “Is Psychology’s Replication Crisis Really Overblown?” Similarly, with deworming, the Guardian, which had initially printed the one-sided press release, later revisited it, but those later stories are effectively a footnote compared to the original coverage.
To be fair, journals like Science, which use embargoes that limit circulation of working papers for vetting by colleagues while maximizing news coverage timed to the release date, force journalists to work with limited information. When Vox covered deworming, it did so in a way that was slower and more iterative, and that went deeper into the process, explaining why there was a debate and why it was important.
“’NEW STUDY DEBUNKS…’ Breaking News Consumer Handbook”
The people who write press releases (and I’m guilty of this as well) want to see a story printed as close to the press release as possible, usually touting the brilliant researcher (and university/organization) involved. How close does the article sound to a press release? (Some examples, for Worms: Guardian vs. press release; Repligate: Harvard Gazette[1] vs. press release)
Does the story claim to “debunk” or dramatically overturn established wisdom? Science usually progresses through the slow accumulation of findings rather than eureka discoveries; what are the chances this time is different?
Does it look like both sides were heard from in the writing of the story, or was a blank left for an opposing quote to fill in later (or was one even added after publication)?
For better or worse, many researchers are attached to their own findings. Were any independent researchers consulted for the story? Did they mention whether the methods used are considered standard in the field?
Is the flashy debunking announcement brand new, or has the field had a chance to examine it thoroughly? Reporting straight from the press release is a bit like calling a presidential election after Iowa, or a tennis match after a strong serve. The better stories almost always come out later, after the field has reached consensus.
If you’re getting involved in the discussion online, try not to get distracted by the tone. It’s easy to ascribe nefarious intentions to researchers who come to a different conclusion, but most people are looking for the same truth at the end of the day. Before you click “post,” check whether your tone will distract from your message.
I’d like to hope that nobody needs these tips, but we’re in the early stages of a new movement toward a more robust science, and after seeing these two scenarios play out, I worry that both the media and even the data-minded among us can jump to conclusions too quickly. It’s best to read critical stories with a critical eye.
A couple of other resources:
- David Evans has an excellent presentation for journalists on how to look at impact evaluation research, which covers some of these issues.
- He also has an anthology of everything written on the Worm Wars.
- A list of readings on “replication crisis” is here.
- Sanjay Srivastava’s blog post has a list of other posts on repligate.
[1] The Harvard Gazette is a PR tool of Harvard University, so this shouldn’t be surprising; still, the article was re-circulated by some prominent researchers.