Some recent discussions of Senate forecasting models have looked at current differences among the models, mainly ours and The Upshot’s. One key factor that has differentiated them is polls. Our model hasn’t incorporated polling data, although we’ll be doing so shortly. The Upshot’s model has factored in available polling data all along.
This doesn’t stem from any philosophical differences in how to forecast elections. (There is no “War of the Senate models.”) As we noted in our first Senate forecast way back in January, our plan has always been to combine a forecasting model with polling data.
So we’ve recently done some further analysis on the value of early polling in Senate races. Here are three things we have learned from our look under the hood.
1) Senate polls have predictive value that increases as Election Day approaches.
Political science studies have shown that pre-election polls become more predictive as Election Day draws closer. This is true in American presidential elections, as Robert Erikson and Christopher Wlezien have shown in their book. It is true in House elections, when predicting the overall vote in all House elections with pre-election “generic ballot” polls.
What about in Senate elections? Below is a similar graph using Erikson and Wlezien’s methodology and drawing on polling data in 362 Senate races from 1990 to 2012. (These data combine our own and those graciously made public by the folks at The Upshot.) It shows how well the Democrats’ share of the major-party preferences in pre-election polls can predict the Democrats’ share of the vote in November.
The graph shows two important things. First, as in other types of elections, the predictive value of polling increases as the election draws nigh.
Second, early polls do appear to have value in predicting Senate elections. For example, 450 days before the election, the R-squared value is 0.5 — meaning that polls explain about 50 percent of the variation in outcomes. As of right now, about 130 days before the election, the polls explain roughly 75 percent of the variation.
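For readers who want to see what the R-squared figure measures, here is a minimal sketch. It assumes a simple one-variable least-squares fit of the final Democratic vote share on the Democratic poll share; the function name `r_squared` and all of the numbers are made up for illustration and are not drawn from our data.

```python
def r_squared(poll_share, vote_share):
    """R-squared from a one-variable least-squares fit of vote share on poll share."""
    n = len(poll_share)
    mean_x = sum(poll_share) / n
    mean_y = sum(vote_share) / n
    sxx = sum((x - mean_x) ** 2 for x in poll_share)
    syy = sum((y - mean_y) ** 2 for y in vote_share)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(poll_share, vote_share))
    # With a single predictor, R-squared is the squared correlation.
    return (sxy ** 2) / (sxx * syy)


# Hypothetical Democratic two-party shares (percent) in five invented races.
early_polls = [44.0, 52.0, 58.0, 47.0, 50.0]
late_polls = [41.0, 54.0, 60.0, 46.0, 51.0]
results = [40.0, 55.0, 61.0, 45.0, 52.0]

r2_early = r_squared(early_polls, results)
r2_late = r_squared(late_polls, results)
print(round(r2_early, 3), round(r2_late, 3))
```

In this toy example the later polls track the results more closely, so their R-squared is higher. Repeating the calculation for each day before the election would trace out a curve like the one described above.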
2) Early polls move toward both the eventual outcome and toward a simple model forecast.
If the ability of polls to predict outcomes increases as Election Day approaches, the accuracy of Senate polling should increase with time, too. It does. Here is the absolute difference between the Senate polls and the eventual outcome over time:
Senate polls tend to converge on the outcome, but, interestingly, that convergence begins only around this point in the cycle. That is to say, the accuracy of Senate polls doesn’t improve much until about June of the election year. But their accuracy tends to improve quite a bit from here on. (See also Josh Katz on this.)
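The accuracy measure here is just the average absolute gap between a poll and the final result, grouped by how far out the poll was taken. A minimal sketch, assuming polls are bucketed by months before the election (the bucketing, the function name `mean_abs_error_by_month`, and the data are all hypothetical):

```python
from collections import defaultdict


def mean_abs_error_by_month(polls):
    """polls: iterable of (months_before_election, poll_share, final_share)."""
    errors = defaultdict(list)
    for months_out, poll_share, final_share in polls:
        errors[months_out].append(abs(poll_share - final_share))
    # Order from the earliest polls (most months out) to the latest.
    return {m: sum(e) / len(e) for m, e in sorted(errors.items(), reverse=True)}


# Invented polls from two races whose final Democratic shares were 52 and 50.
polls = [
    (8, 47.0, 52.0), (8, 55.0, 50.0),
    (5, 48.0, 52.0), (5, 54.0, 50.0),
    (1, 51.0, 52.0), (1, 51.0, 50.0),
]
mae = mean_abs_error_by_month(polls)
print(mae)  # {8: 5.0, 5: 4.0, 1: 1.0}
```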
But perhaps an even more interesting question is whether polling moves in the direction of a model-based forecast. This is interesting because at this moment, there are several races in which the polls seem potentially out of line with forecasts. For example, based on the fundamentals, Mitch McConnell — a Republican incumbent running for reelection in a red state — should win more handily than polls currently suggest. So what might we expect to happen in these cases?
To illustrate, we took the 1980-2012 Senate elections and estimated out-of-sample forecasts for every Senate election between 1992 and 2012. The model used only a very small number of factors, many fewer than in the actual Election Lab model: economic growth, presidential approval, whether it’s a midterm or presidential year, and how the state voted in the most recent presidential election. The other kinds of factors in the model — which tap the kinds of candidates who are running, their fundraising, etc. — may themselves depend on polls. So we focus on factors that are arguably “prior” to election-year decisions and dynamics.
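To make "out-of-sample" concrete, here is one way such forecasts could be produced. It is a sketch under two loud assumptions: that each election is forecast using only earlier elections (a rolling window, which is our reading for this illustration), and that a single predictor, the state's most recent presidential vote, stands in for the handful of factors listed above. The helper names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def rolling_forecasts(races, first_forecast_year):
    """races: list of (year, pres_share, senate_share).

    Each race from first_forecast_year on is forecast with a line
    fit only to races from strictly earlier years.
    """
    forecasts = []
    for year, pres_share, actual in races:
        if year < first_forecast_year:
            continue
        train = [(p, s) for y, p, s in races if y < year]
        intercept, slope = fit_line([p for p, _ in train], [s for _, s in train])
        forecasts.append((year, intercept + slope * pres_share, actual))
    return forecasts


# Invented races: (year, Democratic presidential share, Democratic Senate share).
races = [
    (1988, 40.0, 42.0),
    (1988, 60.0, 58.0),
    (1990, 50.0, 50.0),
    (1992, 55.0, 56.0),
]
forecasts = rolling_forecasts(races, first_forecast_year=1992)
for year, predicted, actual in forecasts:
    print(year, round(predicted, 1), actual)
```

The point of the design is that the 1992 race never informs its own forecast, which is what keeps the comparison with the polls honest.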
The graph below shows the deviation between the Senate polls and forecasts based on this very simple model:
Interestingly, the polls move sharply toward the model’s prediction in the last few weeks of the campaign. This is exactly what should happen if the polls tend to converge on “the fundamentals” that go into a forecasting model like ours or The Upshot’s. This is also consistent with what happens in presidential elections and House elections. For example, Erikson, Wlezien and Joseph Bafumi have found that House generic ballot polls tend to trend away from the president’s party.
This suggests that certain “fundamentals” of Senate elections are not yet “fully baked” into polls. So if the polls look out of line with the fundamentals, on average they should move toward the fundamentals. Of course, “on average” implies that not every race may manifest this pattern, including McConnell’s.
It’s also clear that the polls will continue to provide a more and more accurate look at the race. Indeed, as the two graphs illustrate, the polls are closer to the outcome than to this very simple forecast. This leads to the next question: What is the right “blend” of a model-based forecast and a polling average?
3) Right now, it is best to base predictions on both the polls and the model.
If current Senate polls have predictive power, do we need models at all? Why not just look at the polls? There are several reasons why models are necessary.
For one, lots of races won’t have polling, especially on the House side. So much attention is focused on the Senate that the House is being overlooked. But given that we at The Monkey Cage are trying to forecast both individual House and Senate races, we need a model for all the races where there are few if any polls (which will likely include a few Senate races too).
But this leaves the question: Does a model-based forecast contribute anything over and above the polls, especially in races that have been polled relatively frequently? Cohn and Katz found:
“But the fundamentals contribute very little additional information in highly polled Senate races. That’s one reason, in states with a large number of polls, that The Upshot’s Senate forecasting model relies almost exclusively on polling data rather than fundamentals.”
Here is what we have found, focusing on Senate races in 2008-2012, when we have the same early fundraising data that we are using to help make predictions in 2014.
First, averaging across Senate races in these years, the polls are usually closer to the final vote share than is the model.
But vote share is not the only thing we care about. We also want to forecast the winner of each race and the overall odds that each party will control a majority of seats and thus the Senate. This is particularly important in 2014 because control of the Senate depends on a large number of races: Alaska, Arkansas, Colorado, Georgia, Iowa, Kentucky, Louisiana, Michigan and North Carolina. The polling in each of these races is close enough that the ground could easily shift between now and November. And whether it shifts or doesn’t will determine control of the chamber.
The Upshot’s analysis has shown that, at this point in the campaign, the polls are only slightly better at picking winners and losers than their model. The difference amounts to three or four seats out of 114 over five election cycles.
What we’ve found in examining 2008-2012 is that the combination of our model and the polls actually does offer a modest advantage for picking the winners of individual races, compared to just using the model or the polls alone. It’s not a large advantage, but it seems real and therefore worth taking into account.
What about forecasting control of the Senate? Here we have found, again looking at 2008 through 2012, that the combination of a model and polls was at least as good as, and sometimes better than, the polls alone at predicting who would control the chamber or the number of seats each party would control.
In combining the model and polls, we draw on an approach outlined by political scientist Simon Jackman. The model provides a baseline prediction — which can shift modestly based on trends in fundraising — and then new polls serve to “update” that prediction. The polls get more weight when we have more certainty about what the polls say, which is a function of how much polling there has been in a race, how much the polls have moved around, and how long it has been since the last poll. Jackman also describes how a forecaster could adjust the weight attached to the model’s prediction, and we have experimented with various possibilities.
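The logic of letting polls "update" a model can be sketched with a textbook precision-weighted average, in which the model is the prior and the polling average is the new evidence. This is a generic normal-normal update shown for intuition, not necessarily the exact procedure Jackman describes or the one we use; the function name `combine` and the numbers are hypothetical. The poll standard deviation is taken as given here, and would be larger when polls are few, volatile, or stale.

```python
def combine(model_mean, model_sd, poll_mean, poll_sd):
    """Precision-weighted blend of a model forecast and a polling average.

    Returns (blended_mean, blended_sd, weight_on_polls).
    """
    model_precision = 1.0 / model_sd ** 2
    poll_precision = 1.0 / poll_sd ** 2
    total_precision = model_precision + poll_precision
    poll_weight = poll_precision / total_precision
    blended_mean = poll_weight * poll_mean + (1.0 - poll_weight) * model_mean
    blended_sd = total_precision ** -0.5
    return blended_mean, blended_sd, poll_weight


# Hypothetical race: the model says 54, the polling average says 50.
# Equal uncertainty gives the model and the polls equal weight.
even_mean, even_sd, even_weight = combine(54.0, 4.0, 50.0, 4.0)

# More certainty about the polls (smaller sd) pulls the blend toward them.
firm_mean, firm_sd, firm_weight = combine(54.0, 4.0, 50.0, 2.0)

print(round(even_mean, 2), round(even_weight, 2))
print(round(firm_mean, 2), round(firm_weight, 2))
```

Adjusting the weight on the model, as mentioned above, amounts to inflating or shrinking `model_sd` in this sketch.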
Ultimately, what we found most accurate at this point in the 2008-2012 campaigns was to weight the model roughly the same as polls for heavily polled races, and still more for lightly polled races. Then, it was best to gradually shift that weighting scheme over the coming weeks, so that our predictions heavily favored the polls by early September.
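A gradual shift of this kind can be pictured as a weight on the polls that rises as Election Day approaches. The sketch below assumes a straight-line ramp; the specific day counts and the end weight of 0.9 are placeholders chosen for illustration, not the values we settled on.

```python
def poll_weight(days_out, start_days=130, end_days=60, start_w=0.5, end_w=0.9):
    """Weight on polls for a heavily polled race, ramping linearly over time.

    All default values are hypothetical: roughly equal weight about 130
    days out, heavily favoring the polls by about 60 days out.
    """
    if days_out >= start_days:
        return start_w
    if days_out <= end_days:
        return end_w
    progress = (start_days - days_out) / (start_days - end_days)
    return start_w + progress * (end_w - start_w)


def blended_forecast(model_share, poll_share, days_out):
    w = poll_weight(days_out)
    return w * poll_share + (1.0 - w) * model_share


# Hypothetical race: the model says 54, the polls say 50.
for days in (130, 95, 60):
    print(days, round(poll_weight(days), 2), round(blended_forecast(54.0, 50.0, days), 2))
```

A lightly polled race would start with a lower weight on the polls, which in this sketch means passing a smaller `start_w`.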
Probably very few of you have made it this deep into a wonky post on polls and forecasting models. I’ll just conclude by emphasizing again that there is far more philosophical agreement than disagreement among the forecasters, even if their precise estimates differ somewhat. Moreover, to the extent that forecasters ultimately rely on polls to forecast key races, any differences should soon begin to disappear.
In our next post, we’ll provide an updated forecast that is based on both our forecasting model and the polls.