With almost all of the Senate races called, we can start to evaluate the Election Lab forecasting model. Here is an early assessment of the Senate election forecasts.

First, here’s what we got right. We suggested very early this election year — in this Jan. 27 post — that the Republicans had a good chance of taking the Senate.  This was based on a simple fundamentals model of elections from 1980-2012.  These are the fundamentals established by the political science literature, and 2014 was a good year for the fundamentals.  Our final forecast — that the GOP had a very high (98 percent) chance of taking the Senate — was borne out Tuesday night.

Moreover, our predictions in individual races were almost entirely correct.  Assuming that Warner in Virginia, Sullivan in Alaska, and Cassidy in Louisiana ultimately win, we will have called 35 of 36 races correctly.

But counting winners isn’t the best way to evaluate forecasting models.  A better (though not perfect) tool is the Brier score.  Most forecasters favor making these scores one part of the evaluation process, as Carl Bialik of 538 notes.

Brier scores take into account both whether races were called correctly and the underlying confidence of the forecast.  The best outcome is to be 100 percent certain and correct (a Brier score of 0).  The worst outcome is to be 100 percent certain and incorrect (a Brier score of 1).  Lower scores are better.  Here are some preliminary estimates, excluding Louisiana but assuming Warner and Sullivan victories:
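The calculation itself is simple: for each race, take the squared difference between the forecast probability and the outcome (1 if the forecast party won, 0 if not), then average across races. A minimal sketch in Python, with two hypothetical races for illustration (the probabilities below are made up, not any forecaster's actual numbers):

```python
def brier_score(forecasts):
    """Average squared error between forecast probabilities and outcomes.

    forecasts: list of (probability, outcome) pairs, where probability is
    the forecast chance of a candidate winning and outcome is 1 if that
    candidate won, 0 otherwise.  0 is a perfect score; 1 is the worst.
    """
    return sum((p - won) ** 2 for p, won in forecasts) / len(forecasts)

# Hypothetical example: one confident correct call (98 percent) and one
# hedged incorrect call (23 percent on the eventual winner).
races = [(0.98, 1), (0.23, 1)]
print(round(brier_score(races), 3))
```

Note how the confident correct call contributes almost nothing to the score, while the hedged miss dominates it; that is why the table below mostly separates forecasters by how confident they were, not by whom they picked.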

Forecaster                      Brier score
Daily Kos                       0.024
Election Lab                    0.027
538                             0.032
PredictWise                     0.032
Pollster                        0.034
New York Times                  0.035
Princeton Election Consortium   0.043
Note: Assumes Warner and Sullivan victories. Does not include Louisiana.

Because the forecasts called mostly the same winners, the differences here are mainly due to confidence.  The forecast with the lowest score was the Daily Kos’s, which was run by political scientist Drew Linzer.  The Election Lab forecast had the second-lowest score.  One way to put these scores in context: the difference between a score of 0.025 and a score of 0.05 is about the difference between what our forecasting model has previously scored on Election Day and with two weeks to go.

So, what did we get wrong? To be sure, Warner’s race was closer than we and most others expected.  It counts as a “correct” call, but forecasts relying on fundamentals or polling averages did not see a tight race coming.

We also called the North Carolina Senate race incorrectly, as the map at top shows.  Our fundamentals model was always bullish on Hagan, and although incorporating polling narrowed her advantage over Tillis, we still saw her as the likely winner.  In our final forecast, we had Tillis at a 23 percent chance of winning, which, of course, is not the same as 0 percent.  Still, this was a race that our model didn’t fully anticipate.

Assuming wins by Warner, Sullivan, and Cassidy, the GOP will gain eight seats.  The most likely scenario in our final forecast was a 7-seat gain (49 percent of simulations), although an 8-seat gain was certainly possible, too (28 percent).
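Those simulation percentages come from treating each race as a separate coin flip weighted by its forecast probability and tallying total pickups across many simulated elections. A sketch of that Monte Carlo idea, using made-up per-race probabilities rather than our model’s actual numbers:

```python
import random

def simulate_gains(race_probs, n_sims=100_000, seed=42):
    """Monte Carlo seat-gain distribution.

    race_probs: hypothetical GOP win probabilities, one per contested
    Democratic-held seat.  Each simulation draws every race independently
    and counts total pickups; returns the share of simulations producing
    each gain.  (Real models may also correlate errors across races.)
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_sims):
        gains = sum(rng.random() < p for p in race_probs)
        counts[gains] = counts.get(gains, 0) + 1
    return {k: v / n_sims for k, v in sorted(counts.items())}

# Illustrative probabilities for ten hypothetical races.
probs = [0.99, 0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.23, 0.1]
dist = simulate_gains(probs)
for gain, share in dist.items():
    print(f"{gain}-seat gain: {share:.1%}")
```

The modal outcome of such a simulation is what a forecast reports as its “most likely scenario,” with nearby seat counts (like our 8-seat alternative) carrying substantial probability of their own.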

This is just a first cut at evaluating our model.  There are other criteria we need to use (again, see Bialik’s nice piece).  Moreover, as we have noted, there is no way to determine from a single election whose model is “better.” We’ll also have more to say about the model’s performance in House races.

Most importantly, we’ll use the model to draw some broader lessons about the election itself.