I recently wrote a piece that argued there is an anti-winner bias in polling. For races expected to be competitive, I noted that the predicted poll margin underestimated the winner’s actual margin of victory.

I’ve thought more about this issue and talked with some people about it (including the Monkey Cage’s own Andrew Gelman), and I don’t think the claim holds up to closer scrutiny. It may even be a statistical illusion. The finding could illustrate the problem of “controlling for the outcome” in statistics, or it could reflect the more mundane fact that as the forecasted margin of victory approaches zero, the chance of understating the winner’s margin naturally gets larger (since the winner’s margin will always be bigger than zero).
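The second, mechanical explanation is easy to demonstrate with a toy simulation. This is my own illustrative setup, not the Election Lab model: I assume a forecast margin plus symmetric Gaussian noise, and the noise scale (`noise_sd`) is an arbitrary choice. Since the winner’s margin is always positive, a forecast margin near zero will understate it almost by construction:

```python
import random

def understate_prob(forecast_margin, noise_sd=5.0, trials=100_000):
    """Fraction of simulated races in which the forecast understates
    the winner's actual margin. The winner's margin is |actual|, which
    is always positive, so near-zero forecasts almost always understate it."""
    count = 0
    for _ in range(trials):
        # Actual margin for candidate A: forecast plus symmetric noise.
        actual = random.gauss(forecast_margin, noise_sd)
        winner_margin = abs(actual)
        # The forecast margin for whoever actually won:
        predicted = forecast_margin if actual > 0 else -forecast_margin
        if predicted < winner_margin:
            count += 1
    return count / trials
```

In this toy world, a race forecast as a toss-up (margin near zero) understates the winner’s margin in nearly every simulation, while a forecast 10-point blowout understates it only about half the time. No anti-winner bias is needed to produce that pattern.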

So this particular idea didn’t really pan out. Which is a shame, because there is an underlying problem that it was meant to address, and the problem still exists.

The issue is this: Virtually all the forecasting models in 2014 overstated the likely number of miscalled races. Models with similar margins of error would probably have overstated the number of miscalls in past years, too. But this is not because they were poorly constructed. Instead, there seems to be something strange in the polls.

A poll-averaging model that is well-calibrated for vote shares (what percent of the vote a candidate receives) will miss the actual two-party vote by about as much as it says it will. So, for example, about 90 percent of the outcomes would fall within the prediction’s 90 percent confidence interval, 95 percent within the 95 percent confidence interval, and so on.
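That coverage property can be checked directly by simulation. A minimal sketch, again assuming a stylized model (Gaussian errors with a known standard deviation, not the actual Election Lab machinery):

```python
import random

def coverage(n_races=100_000, sd=3.0, z90=1.645):
    """Share of simulated outcomes that fall inside the forecast's
    90 percent interval. A well-calibrated model lands near 0.90."""
    inside = 0
    for _ in range(n_races):
        forecast = random.uniform(40, 60)     # predicted vote share
        outcome = random.gauss(forecast, sd)  # actual vote share
        # 90 percent interval: forecast +/- 1.645 standard deviations.
        if abs(outcome - forecast) <= z90 * sd:
            inside += 1
    return inside / n_races
```

When the stated uncertainty matches the true error distribution, the empirical coverage comes out close to 90 percent, which is what calibration on vote shares means.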

It’s pretty straightforward to set up a model that meets this standard quite well for past outcomes. Not surprisingly, such a model also does a good job of predicting the past winners. And if everything is working as it should be, such a model should also predict the wrong winner about as often as it expects to. It should have a good sense of what it doesn’t know.
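The expected number of miscalls follows directly from the per-race win probabilities a model produces. A sketch of the arithmetic, with made-up probabilities standing in for any real forecast:

```python
def expected_miscalls(win_probs):
    """Expected number of wrong calls when each race is called for the
    modeled favorite. win_probs are the model's probabilities that a
    given candidate wins each race; the chance of a miss in a race is
    the probability that the underdog wins, i.e. min(p, 1 - p)."""
    return sum(min(p, 1 - p) for p in win_probs)
```

So a model that gives the favorite a 90 percent chance in ten races expects about one miss across them; if it keeps predicting more misses than actually occur, its stated uncertainty is too wide somewhere.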

But I’ve found that a forecast calibrated on vote shares overstates the number of likely misses. From 2004 through 2014, the Election Lab poll-averaging process miscalled a total of six Senate races, yet the model calibrated on vote shares thinks there is only a 0.4 percent chance of six or fewer misses. That’s not zero, of course. But even if the true probability were, say, 10 percent, the conclusion would be the same: the model is significantly overstating the number of likely miscalls.
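For readers who want to check that kind of figure, the probability of six or fewer misses across independent races with different miss probabilities is a Poisson-binomial calculation. Here is a sketch using simple dynamic programming; the miss probabilities fed in would come from a model’s own win probabilities, and the ones in the usage note are hypothetical:

```python
def prob_at_most(k, miss_probs):
    """P(number of misses <= k) across independent races, where
    miss_probs[i] is the model's probability of calling race i wrong.
    Computed exactly via the Poisson-binomial distribution."""
    dist = [1.0]  # dist[j] = probability of exactly j misses so far
    for p in miss_probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1 - p)      # this race called correctly
            new[j + 1] += q * p        # this race miscalled
        dist = new
    return sum(dist[: k + 1])
```

For example, `prob_at_most(6, miss_probs)` with a decade’s worth of per-race miss probabilities is the quantity the 0.4 percent figure refers to: the model’s own assessment of how likely it was to do as well as it actually did.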

The notion of an anti-winner bias was my attempt to resolve this apparent contradiction. Since I have no immediate follow-on explanation, I’ll open it up to the broader community. Assuming that I’ve identified the problem correctly, it seems worth pondering for the sake of forecasts in 2016.