
Last year, The Monkey Cage ran a post about a survey experiment involving a photograph of President Obama:

First, we showed people either a lighter or darker image of Obama. Afterward, we asked those people to fill in the blanks to complete partial words, and we counted how often they provided stereotype-consistent completions of the following prompts: L A _ _ (LAZY); D _ _ _ Y (DIRTY); and _ _ O R (POOR). People who had seen the darker image were on average 36 percent more likely to use these stereotype-consistent words than were people who saw the lighter image.

After receiving the lighter image or darker image of President Obama, participants were asked to complete 11 prompts that could be completed in stereotype-consistent ways. These 11 prompts were followed by items asking participants to rate Obama’s competence and trustworthiness, as well as rating him on a feeling thermometer.

My re-analysis of the data indicated that the darker image of Obama did produce a statistically significant increase in stereotype-consistent responses to the LAZY/DIRTY/POOR combination of prompts, as originally reported.

But on re-analysis, I found that the darker image of Obama did not produce a statistically significant change in responses to the feeling thermometer, the competence item or the trustworthiness item, or for responses to a scale that combined the full set of 11 prompts.

What’s more, I found that it was possible to select a different set of three prompts whose responses indicated that the lighter image of Obama caused participants to complete more prompts in a stereotype-consistent way. In other words, in writing up the results of the experiment, the researchers could have reported a null result, a statistically significant result in one direction, or a statistically significant result in the other direction.
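To see how much room this kind of flexibility creates, consider a small simulation. The numbers here are made up — only the structure (two conditions, 11 prompts, a choice of which three to analyze) mirrors the experiment. Even when the image has no effect at all, searching across three-prompt subsets will turn up a subset where the “darker” group looks more stereotype-consistent and another where the “lighter” group does:

```python
import itertools
import random
import statistics

random.seed(1)

# Hypothetical data: 100 participants per condition, 11 prompts each.
# Both conditions complete any prompt in a stereotype-consistent way
# with the same 30 percent chance -- i.e., no true effect of the image.
def simulate_group(n=100, prompts=11, p=0.30):
    return [[1 if random.random() < p else 0 for _ in range(prompts)]
            for _ in range(n)]

darker, lighter = simulate_group(), simulate_group()

def mean_diff(subset):
    """Mean stereotype-consistent completions (darker minus lighter),
    counted over only the prompts in `subset`."""
    d = statistics.mean(sum(row[i] for i in subset) for row in darker)
    l = statistics.mean(sum(row[i] for i in subset) for row in lighter)
    return d - l

# Search every 3-of-11 prompt subset for the most extreme difference
# in either direction -- the "researcher degrees of freedom."
diffs = {s: mean_diff(s) for s in itertools.combinations(range(11), 3)}
best_pos = max(diffs, key=diffs.get)
best_neg = min(diffs, key=diffs.get)
print("subset favoring a 'darker image' effect: ", best_pos, round(diffs[best_pos], 2))
print("subset favoring a 'lighter image' effect:", best_neg, round(diffs[best_neg], 2))
```

There are 165 possible three-prompt subsets, so by chance alone some will point each way; which one gets reported is a choice the data cannot make for the researcher.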

What does this say about social science research?

This example highlights a challenge in reporting results of scientific studies known as “researcher degrees of freedom.” Researchers have a great deal of flexibility in determining how to report data, which results to report or whether to report them at all.

Often, choices about what gets reported are driven by reasonable decisions, such as what researchers perceived to be the most theoretically relevant research design or appropriate method of analysis. But what gets reported also might be influenced by researchers’, peer reviewers’ and journal editors’ preferences for certain results, such as statistically significant results. Such a “statistical significance filter” can produce inflated estimates of effect sizes and lead to erroneous conclusions.
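The inflation caused by a significance filter can be seen in a toy simulation. The parameters below are assumptions for illustration, not figures from any study discussed here: suppose a small true effect of 0.2 standard deviations and many small experiments, of which only the “significant” ones get published:

```python
import random
import statistics

random.seed(2)

TRUE_EFFECT = 0.2   # assumed small true effect (standardized units)
N = 20              # per-group sample size in each simulated study

def one_study():
    """Run one two-group study; return its estimated effect and
    whether it clears a rough |t| > 2 significance cutoff."""
    control = [random.gauss(0, 1) for _ in range(N)]
    treated = [random.gauss(TRUE_EFFECT, 1) for _ in range(N)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.variance(treated) / N + statistics.variance(control) / N) ** 0.5
    return diff, abs(diff / se) > 2

estimates = [one_study() for _ in range(5000)]
published = [d for d, sig in estimates if sig]  # the significance filter

print("true effect:                 ", TRUE_EFFECT)
print("mean estimate, all studies:  ", round(statistics.mean(d for d, _ in estimates), 2))
print("mean estimate, 'significant':", round(statistics.mean(published), 2))
```

Averaged over all simulated studies, the estimates are unbiased; averaged over only the studies that pass the filter, the apparent effect is several times larger than the truth.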

I recently published a comment on a meta-analysis from the psychological literature on stereotype threat. That body of work might suffer from this problem. In the re-analysis data set, the weighted estimated effect size for stereotype threat was -0.04 across the 13 studies with the largest participant samples, but was -0.73 across the 13 studies with the smallest participant samples, with negative values indicating that stereotypes harmed test performance.

In plain language, that means that there was a large difference in the size of the “stereotype threat” effect between the small studies and the large studies. It’s certainly possible that this difference might be because of factors other than selective publication, as indicated in my comment and the original authors’ reply to the comment. But it might also be that smaller studies that showed no evidence of the effect were less likely to be published than smaller studies showing evidence of a negative effect of stereotypes. There are many possible explanations for such potential selective publication, among them the possibility of ideological bias.
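Selective publication also predicts exactly the small-study/large-study gap described above. In a simulation with assumed numbers (a modest true effect of −0.1 and a filter that publishes only statistically significant “harm” findings), small studies must observe a large effect to reach significance, while large studies can reach it with a small one:

```python
import random
import statistics

random.seed(3)

def estimated_effect(n, true_effect=-0.1):
    """One two-group study of size n per group; returns the estimated
    effect and whether it clears a rough |t| > 2 cutoff (unit-variance
    outcomes assumed, so the standard error is sqrt(2/n))."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(true_effect, 1) for _ in range(n)]
    diff = statistics.mean(b) - statistics.mean(a)
    se = (2 / n) ** 0.5
    return diff, abs(diff / se) > 2

def published_mean(n, studies=4000):
    """Average effect among studies that survive a filter publishing
    only significant negative ('stereotypes harm performance') results."""
    results = [estimated_effect(n) for _ in range(studies)]
    significant_negative = [d for d, sig in results if sig and d < 0]
    return statistics.mean(significant_negative)

small = published_mean(25)
large = published_mean(400)
print("published mean, small studies (n=25): ", round(small, 2))
print("published mean, large studies (n=400):", round(large, 2))
```

The published small studies show a much larger average effect than the published large studies, even though every simulated study draws from the same modest true effect — the same qualitative pattern as the −0.73 versus −0.04 contrast in the meta-analysis.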

How can social science overcome this problem?

There are several ways to reduce potential bias in the reporting of research findings, as I noted in my article discussing selective reporting in the race and sex discrimination literature. In preregistered studies, researchers publicly declare a planned research design before analyzing the data. Journals might also consider pre-accepting articles based on a preregistered design, agreeing to publish the study no matter what the results are, which reduces the chance of selective publication.

Social science has value and should inform public policy decisions, but the credibility of social science studies can be undercut if researchers retain the ability to selectively report results.

L.J. Zigerell is an assistant professor of politics and government at Illinois State University. Find him on Twitter @LJZigerell.