Moreover, our predictions in individual races were almost entirely correct. Assuming that Warner in Virginia, Sullivan in Alaska, and Cassidy in Louisiana ultimately win, we will have called 35 of 36 races correctly.
Brier scores take into account both whether races were called correctly and the underlying confidence of the forecast. The best outcome is to be 100 percent certain and correct (a Brier score of 0). The worst outcome is to be 100 percent certain and incorrect (a Brier score of 1). Lower scores are better. Here are some preliminary estimates, excluding Louisiana but assuming Warner and Sullivan victories:
| Forecast | Brier score |
|---|---|
| New York Times | 0.035 |
| Princeton Election Consortium | 0.043 |

*Note: Assumes Warner and Sullivan victories. Does not include Louisiana.*
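The Brier calculation itself is simple: average the squared difference between each forecast probability and the actual outcome. Here is a minimal sketch; the probabilities and outcomes are hypothetical, not the actual figures behind the table above.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and outcomes.

    probs    -- forecast probability that a given candidate wins each race
    outcomes -- 1 if that candidate actually won, 0 otherwise
    """
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Hypothetical three-race forecast: two confident correct calls and
# one miss (a candidate given a 23 percent chance who actually won... wait,
# here: a candidate given 23 percent who lost, scored against the winner).
probs = [0.95, 0.80, 0.23]
outcomes = [1, 1, 0]
print(round(brier_score(probs, outcomes), 4))  # 0.0318
```

Note how the one low-confidence race (0.23 vs. an outcome of 0) contributes most of the penalty even though it was "called" correctly, which is exactly why confident-and-right forecasts score so much better than timid ones.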
Because the forecasts were calling mostly the same winners, the differences here are mainly due to confidence. The forecast with the lowest score was the Daily Kos forecast, run by political scientist Drew Linzer. The Election Lab forecast had the second-lowest score. One way to put these differences in perspective: the gap between a score of 0.025 and a score of 0.05 is about the difference between what our forecasting model has previously scored on Election Day versus with two weeks to go.
So, what did we get wrong? To be sure, Warner’s race was closer than we and most others expected. It counts as a “correct” call, but forecasts relying on fundamentals or polling averages did not see a tight race coming.
We also called the North Carolina Senate race incorrectly, as the map at top shows. Our fundamentals model was always bullish on Hagan, and, although incorporating polling narrowed her advantage over Tillis, we still saw her as the likely winner. In our final forecast, we gave Tillis a 23 percent chance of winning, which, of course, is not the same as 0 percent. Still, this was a race that our model didn’t fully anticipate.
Assuming wins by Warner, Sullivan, and Cassidy, this will produce an 8-seat gain for the GOP. The most likely scenario in our final forecast was a 7-seat gain (49 percent of simulations), although an 8-seat gain was certainly possible, too (28 percent).
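Those simulation percentages come from running the model's per-race win probabilities through many simulated elections and tallying how often each seat total occurs. A minimal Monte Carlo sketch of that idea, using made-up race probabilities rather than the model's actual inputs:

```python
import random

def simulate_seat_gains(win_probs, n_sims=100_000, seed=42):
    """Simulate n_sims elections; return the share of simulations
    producing each total number of seats won.

    win_probs -- hypothetical GOP win probability in each contested race
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_sims):
        # Each race is an independent coin flip at its win probability.
        seats = sum(rng.random() < p for p in win_probs)
        counts[seats] = counts.get(seats, 0) + 1
    return {s: c / n_sims for s, c in sorted(counts.items())}

# Ten hypothetical competitive races.
probs = [0.9, 0.85, 0.8, 0.77, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
dist = simulate_seat_gains(probs)
for seats, share in dist.items():
    print(f"{seats} seats: {share:.1%}")
```

The modal outcome here sits near the sum of the probabilities, but adjacent totals remain quite likely, which is why a 7-seat forecast can coexist with a real 8-seat result. (A real model would also correlate races through shared polling error rather than treating them as independent flips.)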
This is just a first cut at evaluating our model. There are other criteria we need to use (again, see Bialik’s nice piece). Moreover, as we have noted, there is no way to determine from a single election whose model is “better.” We’ll also have more to say about the model’s performance in House races.
Most importantly, we’ll use the model to draw some broader lessons about the election itself.