There has been a lot of concern about the quality of the polls this cycle, and those concerns have merit. Alaska does present some real polling challenges. Colorado has been mispolled in recent years. The polls do sometimes have a systematic bias, and are sometimes just wrong because the race is close and polling is an imperfect science. These are all legitimate worries, even now, just one week from Election Day.
However, in worrying about the polls this year, we shouldn’t lose sight of an important long-term trend: there are a lot more polls than there used to be. And that matters for predictions.
In every election year from 1990 through 2002, there were roughly 100 to 200 likely voter polls for Senate elections by this point in the election cycle. (These data were generously provided by The Upshot at the New York Times.)
But starting in 2004, the number of polls began to increase rapidly. The average from 2008 through 2012 was more than three times that earlier level. The number of pollsters has also gone up, from an average of about 50 between 1990 and 2002 to about 80 between 2008 and 2012.
Of course, there may have been more polls in the past that have since been lost. When you’re putting together a dataset of polls after the fact, it’s harder to find older ones. But it’s tough to imagine that all this increase is a matter of accounting. There are simply more polls now than there used to be.
Whatever the reason for the increase, having more polls makes the forecasts a lot more accurate. The graph below shows, from 1990 through 2012, the volume of polls and the accuracy of the predictions (using the Election Lab process) at one week before Election Day. Accuracy is measured with Brier scores, which average the squared gap between each predicted probability and the actual outcome, so lower values are better.
The two are pretty clearly related: as more polls become available, the accuracy of the prediction gets better. Brier scores are less than half what they used to be. Likewise, the number of missed predictions is also down, from an average of 2.5 in the earlier period to an average of 1.3 more recently.
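To make the two accuracy measures concrete, here is a minimal Python sketch of a Brier score and a count of missed calls. The function names and the four example forecasts are hypothetical, made up for illustration; they are not Election Lab data, and this is not the Election Lab code.

```python
def brier_score(probabilities, outcomes):
    """Mean squared gap between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("need exactly one outcome per forecast")
    if not probabilities:
        raise ValueError("need at least one forecast")
    squared_gaps = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_gaps) / len(squared_gaps)


def missed_calls(probabilities, outcomes):
    """Count races where the side given more than a 50% chance lost.

    Assumption for this sketch: a forecast of exactly 0.5 is treated as
    favoring the other side.
    """
    return sum(
        1 for p, o in zip(probabilities, outcomes) if (p > 0.5) != (o == 1)
    )


# Hypothetical forecasts: probability that a given party wins each race.
forecasts = [0.9, 0.8, 0.6, 0.3]
# Hypothetical results: 1 if that party won, 0 if it lost.
results = [1, 1, 0, 0]

print(round(brier_score(forecasts, results), 3))  # 0.125
print(missed_calls(forecasts, results))           # 1
```

A confident miss costs far more than a hesitant one: the 0.6 forecast that went the wrong way contributes 0.36 of the 0.5 total squared error here, which is why the Brier score and the raw count of missed races can move differently.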
Some of the effect of more polls is about sampling. Any single poll can be wrong simply because it interviews a sample rather than everyone in the electorate, so the more polls you have, the smaller that random error becomes. But this can’t be the whole story, because our process takes sampling error into account when it comes up with a probability.
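The sampling part of the argument can be sketched with the textbook margin-of-error formula. This assumes simple random samples and treats several polls as independent draws from the same electorate, which real polls only approximate; the function names and sample sizes are hypothetical.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for a candidate's share in one poll."""
    return z * math.sqrt(share * (1 - share) / sample_size)


def pooled_margin(share, sample_sizes, z=1.96):
    """Margin of error if several polls are pooled as one large sample.

    Assumption: the polls are independent random samples of the same
    electorate, so pooling them simply adds up their sample sizes.
    """
    return margin_of_error(share, sum(sample_sizes), z)


# One hypothetical poll of 600 likely voters in a 50-50 race.
single = margin_of_error(0.5, 600)
# Four such polls pooled together.
pooled = pooled_margin(0.5, [600, 600, 600, 600])

print(f"one poll:   +/- {single * 100:.1f} points")
print(f"four polls: +/- {pooled * 100:.1f} points")
```

Quadrupling the total sample halves the random error, but it does nothing about a bias that all the polls share, which is the kind of systematic failure no amount of extra polling averages away.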
Rather, I suspect some of it is the benefit that comes from lots of smart people tackling the same problem. More pollsters means more likely voter models. If we assume there’s no fraud or outright incompetence (not always a good assumption), then even an outlier poll is telling us that it’s possible to draw a sample of people, weight that sample a certain way, and get a different result than everyone else is getting. If nobody produces such a poll—even though lots of people are polling the race—it increases our confidence in the current prediction.
The number of polls and pollsters this cycle is on a par with the high numbers of recent elections. Of course, that’s no guarantee that the quality is the same, and some kind of systematic polling failure is always possible. Moreover, the relationship between more polls and higher accuracy isn’t perfect: the highest poll volume came in 2010, a year when the Election Lab process also miscalled more races.
But despite all the legitimate worries about the state of the polling industry and the quality of the polls this cycle, we ought to take a step back and give credit where credit is due. Looked at a certain way, it’s a golden age of polls.