With about two weeks to go until Election Day, a number of the forecasters have been investigating the potential accuracy of polling averages. See the discussion by Mark Blumenthal, Nate Silver, and Sean Trende, as well as the earlier analysis by Josh Katz at The Upshot. The main conclusion is that poll averages where one candidate leads by more than about three points are likely to call the winner correctly, but closer races are still up in the air.
At Election Lab, our forecast for party control of the Senate has been an outlier for a while now: we currently give Republicans about a 90 percent chance of taking the Senate, while the next most confident forecast is roughly 75 percent. If the polls can often be wrong, are we right to be so confident?
Like all the other forecasters, we tested our approach to see how well it predicted past outcomes. This is called “calibration”: ensuring that the probabilities in the forecast accurately reflect the uncertainty we have about the outcome. This conversation about accuracy encouraged us to take a second look.
If our forecast is overconfident, the most likely source of the problem is the standard errors. So the most straightforward solution would be to make our standard errors larger. The trick is that we only want to do that for the races where we should be uncertain. If larger standard errors end up making us more uncertain about races where we ought to be confident, then we’re not necessarily getting better predictions overall. In some races the predictions will be better, while in others they’ll be worse.
Here are the results of a calibration exercise we did using Senate polling data from 2004 through 2012. (We thank The Upshot at the New York Times for making these data available.) We focus on 2004-2012 because there were roughly three times as many polls per election during this period as in earlier years. For this exercise, we ran our model first with all the polls available three weeks before the election each year, and then again with the polls available in each remaining week.
For each week’s predictions, we then expanded the standard errors by 1 percentage point and 2 percentage points. This adds in quite a lot of uncertainty. It’s analogous to adding two and four percentage points to an individual poll’s margin of error.
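To make the effect of wider standard errors concrete, here is a minimal sketch for a single race. It is not the Election Lab model: it assumes the true lead is normally distributed around the polling average, that the extra uncertainty is simply added to the standard error, and that the margin of error is the conventional 1.96 times the standard error. The lead and baseline standard error are made-up numbers, and the function names are hypothetical.

```python
from statistics import NormalDist


def win_probability(lead, se):
    """Chance the true lead is above zero, assuming a normal distribution
    centered on the polling-average lead (an assumption of this sketch)."""
    return 1.0 - NormalDist(mu=lead, sigma=se).cdf(0.0)


def margin_of_error(se, z=1.96):
    """Conventional 95 percent margin of error for a given standard error."""
    return z * se


lead = 3.0     # hypothetical polling-average lead, in percentage points
base_se = 2.0  # hypothetical baseline standard error, in percentage points

# Widen the standard error by 0, 1, and 2 points, as in the exercise above.
for extra in (0.0, 1.0, 2.0):
    se = base_se + extra
    print(
        f"SE {se:.0f} pts: margin of error {margin_of_error(se):.1f} pts, "
        f"win probability {win_probability(lead, se):.3f}"
    )
```

With these illustrative numbers, each added point of standard error widens the margin of error by about two points, and the leader's win probability falls from roughly 0.93 to 0.84 to 0.77.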
We then calculated a Brier score for all the forecasts. The Brier score is the average squared difference between each forecast probability and what actually happened (1 if the predicted event occurred, 0 if it did not): the higher the score, the worse the average prediction. Because it is a squared-error metric, it punishes bigger errors more than smaller ones. That means that any missed prediction is going to count against us a lot. (To be sure, Brier scores are not beyond reproach as a measure of forecasting accuracy. So consider this just one way of evaluating the Election Lab model.)
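For readers who want to see the arithmetic, a Brier score can be computed in a few lines. This is a generic sketch of the standard formula, not our actual scoring code, and the probabilities and outcomes below are invented for illustration.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and outcomes.

    probs: forecast probabilities that an event happens (0.0 to 1.0)
    outcomes: 1 if the event happened, 0 if it did not
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if not probs:
        raise ValueError("need at least one forecast")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Hypothetical forecasts for three races: two called correctly, one missed.
probs = [0.9, 0.8, 0.6]
outcomes = [1, 1, 0]
print(round(brier_score(probs, outcomes), 3))

# Squaring is what makes a confident miss so costly:
# a 90 percent call that fails contributes 0.81, a 60 percent call only 0.36.
print(round(brier_score([0.9], [0]), 2), round(brier_score([0.6], [0]), 2))
```

A perfect forecast scores 0, and always saying 50 percent scores 0.25, so lower is better.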
The question is whether adding more uncertainty makes the forecasts better overall, by making us more properly uncertain about the forecasts that are dicey without unduly hurting our confidence about the forecasts that are good.
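The trade-off can be illustrated with a toy example. The sketch below is not our model: it assumes a normal distribution around each polling lead, widens every standard error by the same amount, and uses two invented races, one safe race that was called correctly and one close race that was miscalled. All names and numbers are hypothetical.

```python
from statistics import NormalDist


def win_probability(lead, se):
    """Chance the leader wins, assuming a normal distribution around the lead."""
    return 1.0 - NormalDist(mu=lead, sigma=se).cdf(0.0)


# Hypothetical races: (label, leader's polling lead, standard error,
# 1 if the leader won and 0 if the leader lost)
races = [
    ("safe race, called correctly", 8.0, 3.0, 1),
    ("close race, miscalled", 1.0, 3.0, 0),
]


def race_errors(extra_se):
    """Squared error for each race after widening every standard error."""
    return {
        label: (win_probability(lead, se + extra_se) - won) ** 2
        for label, lead, se, won in races
    }


for extra in (0.0, 1.0, 2.0):
    errors = race_errors(extra)
    brier = sum(errors.values()) / len(errors)
    print(f"extra SE {extra:.0f} pts: Brier score {brier:.4f}")
    for label, err in errors.items():
        print(f"    {label}: squared error {err:.4f}")
```

In this toy setup, widening the standard errors shrinks the penalty on the miscalled race but raises it on the race that was called correctly. Whether the overall score improves depends on how many races of each kind there are.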
The results are in the table below. The first thing to note is that adding 1 percentage point to the standard error doesn’t really affect the overall quality of the forecast. At three weeks out, it makes the Brier score only a tiny bit better. And after that it either offers no improvement or makes it a little worse.
So, as of last week, adding about a percentage point to our standard errors might have improved our forecast a little bit. If we had done that, we would have given the Republicans an 80 percent chance of taking the Senate, which is lower than we do now but still an outlier among the forecasters. No matter what, the Election Lab model is still pretty confident the Republicans are going to take the chamber.
If that extra uncertainty doesn’t improve the forecast very much, what does improve it? Time. The overall score falls as the election approaches, from around 0.05 with two and three weeks to go, down to around 0.04 with one week to go and about 0.02 on Election Day. The polls are simply more predictive the closer we get to Election Day, and this downward trend dwarfs any effect of adding more uncertainty. For the 2004-2012 elections, our process has never ended up miscalling more than two races in one election year, and one miscall has been the most common result.
This suggests that uncertainty is no substitute for information. Greater uncertainty is only useful if you add it to the races where you truly ought to be more uncertain. And the only way to do that is to get more information about how each race will turn out on Election Day, which we can do as Election Day approaches.
Does this mean there’s no difference in accuracy between all the forecasters, despite sometimes big differences in probability? Not necessarily. We did this test on our own poll-averaging process. Other forecasters may have a better process that more accurately gauges uncertainty in each individual race. Likewise, we expanded the standard errors across the board. If others can do it only in the races where that uncertainty is warranted, they might easily improve the results.
For these reasons, we don’t think this analysis tells us too much about the “nerdfight.” Moreover, there remains broad agreement among the modelers that sometimes gets lost in all the talk about differences. We’re all predicting virtually the same winner for every race, and probably roughly the same vote share as well. We’re all predicting a Republican Senate, with varying levels of certainty. That means the lower half of the probability range, everything below 50 percent, is simply not at issue. And any differences will likely shrink as Election Day approaches and the steady increase in information pulls everyone toward the final result.
Eric McGhee is a political scientist and, with Ben Highton and John Sides, helped produce the Election Lab forecasts.