The Washington Post's Election Lab -- our statistical model designed to predict the outcomes of the various races on the ballot this fall -- currently shows Republicans with a 95 percent chance of winning the Senate. While most political handicappers agree Republicans have an edge in the battle for the Senate majority, few would say it is as heavily tilted toward the GOP as Election Lab does. Even the other statistical models -- maintained by FiveThirtyEight and the New York Times -- are far more cautious about the likely outcome in 25 days' time. I reached out to John Sides, Ben Highton and Eric McGhee, who put together the model behind Election Lab, for some answers. Our conversation, edited only for grammar, is below.
FIX: The Election Lab model shows Republicans with a 95 percent chance of winning the Senate majority. Nate Silver's model pegs it at 58 percent; the New York Times', at 66 percent. Why is EL so bullish?
Election Lab: First, I should mention what I don't think is behind the difference: our prediction for the vote share in each race. I'll bet that none of the modelers differ by more than a couple of percentage points in terms of how many votes they think each candidate will receive. Likewise, I'll bet we're all within a seat or two in terms of the total number of Democratic and Republican seats we think most likely. Rather, the difference is more about the statistical confidence in our predictions. At Election Lab, we have always had a smaller margin of error than the other modelers. This reflects our conclusion that, by this point in the cycle, you can pretty much trust an average of the polls to tell you the winner. In defense of our approach, the races we gave a high probability at the end of the primary season almost two months ago are all predicted in the same direction today. All we've done is add to the list (all on the Republican side). So at least for those initially high-probability races, things so far have played out as expected.
But we don't begrudge the other modelers in their disagreement with us on this -- in fact, we think this is a useful conversation to have. And we would still be surprised if we got every race right, since even the high probabilities we have now suggest we could easily get a couple wrong.
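As an illustration of how race-level confidence compounds into an overall majority probability, here is a minimal Monte Carlo sketch in Python. The per-race probabilities and the number of seats needed are hypothetical stand-ins, not Election Lab's actual figures, and treating races as independent coin flips is a simplifying assumption the real models may relax:

```python
import random

def majority_probability(race_probs, needed, trials=100_000, seed=42):
    """Estimate the chance one party wins at least `needed` of the
    listed races, treating each race as an independent coin flip
    with the given win probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        wins = sum(rng.random() < p for p in race_probs)
        if wins >= needed:
            hits += 1
    return hits / trials

# Hypothetical win probabilities for ten competitive seats
probs = [0.95, 0.9, 0.9, 0.85, 0.8, 0.8, 0.78, 0.6, 0.5, 0.4]
print(round(majority_probability(probs, needed=6), 3))
```

Note how a collection of races that are individually only "fairly confident" can still produce a very high overall probability when only a subset of them needs to break the right way.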
FIX: On a race-by-race level, there is no state that EL rates as less than a 78 percent probability. Is that certainty explained by the same factors as the overall prediction? Why or why not?
EL: Yes, it also comes from the same basic decision to trust the polls explicitly. As part of this decision, our confidence in a prediction grows the more consistently one candidate leads. That can turn a small lead into a fairly confident prediction.
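The mechanics behind "a small lead plus a tight margin of error equals a confident prediction" can be sketched with a normal approximation: the win probability depends as much on the assumed standard error of the polling average as on the lead itself. The function and numbers below are illustrative, not Election Lab's actual method:

```python
import math

def win_probability(lead_pts, stderr_pts):
    """P(candidate's true margin > 0), assuming the polling average's
    error is roughly normal with the given standard error (in points)."""
    return 0.5 * (1 + math.erf(lead_pts / (stderr_pts * math.sqrt(2))))

# The same 3-point lead under two assumed standard errors:
print(round(win_probability(3, 2), 3))  # tight error band -> 0.933
print(round(win_probability(3, 5), 3))  # wide error band  -> 0.726
```

A modeler who believes poll averages are reliable at this stage uses a small standard error and gets probabilities like the first; a more cautious modeler uses a large one and gets probabilities like the second, even with identical vote-share forecasts.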
FIX: Can you explain how you decide to weight the various factors that contribute to the model as the election gets closer? Is the EL process different than processes used by other modelers?
EL: At this point, we're almost 100 percent weighted toward the polls, except in races where the polling is extremely light (fewer than about four polls total). In those cases, we are just using the prediction from our fundamentals model. I think that all the other modelers are pretty much the same way. The big differences now concern certainty about the averages, and maybe a few other design decisions about the averaging process.
FIX: Given how much polling is coming out every day now, are models, which increasingly depend on polling as the election draws nearer, a lagging indicator of where these races are? A leading indicator? Something in between?
EL: Since everyone is basically using publicly available polls, it all depends on how quickly the polls are released and how quickly the forecasters then fold those results into their predictions. Then there are two additional factors: how heavily a forecaster weights recent polls relative to the long-term average, and how much uncertainty the forecaster imposes on the prediction. The more you weight recent polls, the faster the prediction will change, but the greater the chance that it will just follow noise rather than real movement. The more uncertainty you impose, the more slowly the overall probabilities will change -- kind of like the difference between the way you experience a wave in the middle of the ocean (a slow, gradual rise and fall) versus at the shore (a sharp, sudden hit). Unfortunately, we won't know for certain what's leading and what's lagging until Election Day.
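One common way to weight recent polls -- which may or may not match what any particular modeler does -- is an exponential decay by poll age, where the half-life controls the responsiveness-versus-noise tradeoff described above. The polls and dates here are invented for illustration:

```python
from datetime import date

def weighted_average(polls, today, half_life_days=14.0):
    """Average poll margins, discounting older polls exponentially.
    A shorter half-life tracks movement faster but chases more noise."""
    num = den = 0.0
    for poll_date, margin in polls:
        age_days = (today - poll_date).days
        w = 0.5 ** (age_days / half_life_days)  # weight halves every half-life
        num += w * margin
        den += w
    return num / den

# Hypothetical margins (points) for one race, trending upward
polls = [(date(2014, 9, 20), 1.0),
         (date(2014, 10, 1), 2.0),
         (date(2014, 10, 8), 4.0)]
today = date(2014, 10, 10)
print(round(weighted_average(polls, today, half_life_days=7), 2))
print(round(weighted_average(polls, today, half_life_days=30), 2))
```

The short half-life average sits much closer to the newest 4-point margin; the long half-life average lags behind, closer to the season-long mean -- the ocean-wave versus shore-break distinction in miniature.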
FIX: What constitutes success as it relates to the model? Getting every race right? Something else?
EL: That's a very good question! A model should not be judged on whether it gets every race right, any more than a handicapper should. Rather, it should be judged on the probabilities it assigns to those outcomes. If the election were held tomorrow and the Democrats kept the majority, our prediction would look bad relative to the others. But the reverse is also true: if the Republicans claimed the Senate, ours would look better than the others'. That's true even though pretty much all of the forecasters at least tilt toward a Republican Senate. A confident prediction should be punished especially hard if it's wrong, but it should also be rewarded more than others if it's right. When the issue is probability (as opposed to, say, vote shares), being on the correct side of 50 percent is not the end of the story.
On the other hand, the most we can do with one election is to identify the model that performed best for that election in the sense that the actual outcome was considered most likely according to its predictions. But "less likely" almost never means "impossible," so there's always the chance that this year was a fluke and a different model would perform better over a series of years. So while we should definitely rate the modelers after the election, the conversation about what makes the most sense for multiple elections will continue.
The reality is that in order to confidently figure out if one model is really better than another model (as opposed to just being lucky in a given set of races or for a single election year), we would need many more elections and election years. With enough data we could probably sort it out, but certainly looking at predictions for 36 Senate elections from a single election year will not be enough to determine once and for all if we're better or worse than the folks at the Upshot, 538, etc. -- only whether we're better or worse this year.
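One standard way to score probabilistic forecasts after the fact, in the spirit described above, is the Brier score: confident-and-wrong forecasts are punished hardest, and confident-and-right forecasts score best. The two forecasters and the outcomes below are invented for illustration:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1
    outcomes; lower is better. A confident miss (p near 1, outcome 0)
    contributes far more error than a hedged one."""
    return sum((p, o) == () or (p - o) ** 2
               for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical races: 1 = the favored party wins. A bold forecaster
# and a cautious one score the same five outcomes.
outcomes = [1, 1, 1, 0, 1]
bold     = [0.95, 0.90, 0.85, 0.20, 0.80]
cautious = [0.70, 0.65, 0.60, 0.45, 0.60]
print(round(brier_score(bold, outcomes), 3))      # -> 0.023
print(round(brier_score(cautious, outcomes), 3))  # -> 0.147
```

Because the bold forecaster was right about which side of 50 percent most races fell on, its confidence is rewarded with a much lower score; had one of its near-certain calls flipped, the penalty would have been correspondingly severe.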
However, this kind of conversation is one of the great strengths of the modeling approach. For the most part, the modelers are not using information that is different than what's available to and used by everyone else watching these elections. In fact, the models often use less information than other, more qualitative, approaches. That's why these other approaches should continue to be part of the conversation. The difference is that the modelers make everything transparent and formal. They're really sticking their necks out in assigning probabilities to these specific events. That makes it easier to tell when someone got it wrong, and why. It makes our lives harder, of course, but I think it's useful for the community of people consuming these forecasts, and for advancing our understanding of what makes elections tick.